netdev.vger.kernel.org archive mirror
* [PATCH RFC 0/5] Containerize syslog
From: Rui Xiang @ 2012-11-19  8:16 UTC
  To: serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Eric W. Biederman

From: Xiang Rui <rui.xiang-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>

In Serge's patch (http://lwn.net/Articles/525629/), syslog_namespace was tied to the user
namespace. We instead tie syslog_ns to nsproxy, and use ns_printk in the iptables
context.

We add syslog_namespace as part of nsproxy, along with a new clone flag, CLONE_NEWSYSLOG,
to unshare the syslog area.
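A hedged sketch of the flag-driven sharing that a clone flag implies, modeled loosely on the copy_namespaces() pattern in kernel/nsproxy.c: without the flag the child shares (and takes a reference on) the parent's syslog namespace; with it, the child gets a fresh one. SKETCH_CLONE_NEWSYSLOG, the one-field struct nsproxy, copy_syslog_ns() and the plain int refcount are illustrative assumptions, not the patchset's definitions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical flag bit for this sketch only; the value the patchset
 * gives its clone flag is not shown in the thread. */
#define SKETCH_CLONE_NEWSYSLOG 0x1UL

struct syslog_ns {
	int refcount;  /* the kernel would use an atomic/kref instead */
};

/* Only the field relevant here; the real nsproxy has one pointer per
 * namespace type. */
struct nsproxy {
	struct syslog_ns *syslog_ns;
};

static struct syslog_ns *get_syslog_ns(struct syslog_ns *ns)
{
	ns->refcount++;
	return ns;
}

void put_syslog_ns(struct syslog_ns *ns)
{
	assert(ns->refcount > 0);
	if (--ns->refcount == 0)
		free(ns);
}

/* Share the parent's namespace unless the flag asks for a new one.
 * Returns 0 on success, -1 if allocation fails. */
int copy_syslog_ns(unsigned long flags, const struct nsproxy *parent,
		   struct nsproxy *child)
{
	if (!(flags & SKETCH_CLONE_NEWSYSLOG)) {
		child->syslog_ns = get_syslog_ns(parent->syslog_ns);
		return 0;
	}
	child->syslog_ns = calloc(1, sizeof(*child->syslog_ns));
	if (!child->syslog_ns)
		return -1;
	child->syslog_ns->refcount = 1;
	return 0;
}
```

Sharing by reference means an ordinary fork keeps writing to the same log, and only an explicit unshare starts a separate one.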

syslog_namespace contains the identifiers needed to manage a syslog buffer. When a
container creates a new syslog namespace, a containerized buffer is allocated to store
the log owned by that container. Containerized identifiers such as log_first_seq replace
the global variables and affect only their own buffer. The buffer is not freed until the
host destroys the syslog_namespace.
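A minimal userspace sketch of such a per-namespace buffer, assuming a fixed-size ring of records. The struct name comes from this patchset, but its layout here, the field names (borrowed from the globals in kernel/printk.c), and the helpers syslog_ns_create(), syslog_ns_destroy() and syslog_ns_store() are all hypothetical; kernel/printk.c itself stores variable-length records in a byte buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MSG_MAX 64  /* hypothetical fixed record size */

/* Hypothetical model of a syslog namespace: a private ring of records
 * plus counters that replace the former global variables. */
struct syslog_ns {
	char (*log_buf)[MSG_MAX];  /* ring of fixed-size records */
	size_t log_buf_len;        /* capacity, in records */
	uint64_t log_first_seq;    /* oldest record still stored */
	uint64_t log_next_seq;     /* sequence number of the next record */
};

/* Allocate a private buffer when a new namespace is created. */
struct syslog_ns *syslog_ns_create(size_t nr_records)
{
	struct syslog_ns *ns;

	if (nr_records == 0)
		return NULL;
	ns = calloc(1, sizeof(*ns));
	if (!ns)
		return NULL;
	ns->log_buf = calloc(nr_records, MSG_MAX);
	if (!ns->log_buf) {
		free(ns);
		return NULL;
	}
	ns->log_buf_len = nr_records;
	return ns;
}

/* The buffer lives exactly as long as its namespace. */
void syslog_ns_destroy(struct syslog_ns *ns)
{
	if (!ns)
		return;
	free(ns->log_buf);
	free(ns);
}

/* Append one record. When the ring is full the oldest record is
 * overwritten and log_first_seq advances; only this namespace's
 * counters change. */
void syslog_ns_store(struct syslog_ns *ns, const char *msg)
{
	assert(ns && ns->log_buf_len > 0);
	if (ns->log_next_seq - ns->log_first_seq == ns->log_buf_len)
		ns->log_first_seq++;
	snprintf(ns->log_buf[ns->log_next_seq % ns->log_buf_len], MSG_MAX,
		 "%s", msg);
	ns->log_next_seq++;
}
```

Because each namespace carries its own log_first_seq and log_next_seq, filling one container's ring never moves another namespace's counters.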

Printk has to be reimplemented because the log buffer is now isolated in syslog_ns. The
functions involved, including printk, /dev/kmsg, do_syslog and kmsg_dump, should work
inside a container. To make them available in a container, their interfaces need a
syslog_ns parameter.
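To illustrate what a syslog_ns parameter in these interfaces means, here is a userspace sketch in which every writer and reader is handed the namespace explicitly. The name ns_printk is taken from this patchset, but its signature here, vns_printk(), ns_read_record() and the fixed-size record ring are assumptions made for the example.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define NR_RECORDS 8
#define MSG_MAX 80

/* Hypothetical, heavily simplified namespace: a ring of formatted
 * records plus the sequence counters that used to be globals. */
struct syslog_ns {
	char rec[NR_RECORDS][MSG_MAX];
	unsigned long long first_seq, next_seq;
};

/* Every entry point takes the namespace explicitly instead of
 * touching one global buffer. Returns the formatted length. */
int vns_printk(struct syslog_ns *ns, const char *fmt, va_list args)
{
	char *slot;
	int len;

	assert(ns);
	if (ns->next_seq - ns->first_seq == NR_RECORDS)
		ns->first_seq++;  /* drop the oldest record */
	slot = ns->rec[ns->next_seq % NR_RECORDS];
	len = vsnprintf(slot, MSG_MAX, fmt, args);
	ns->next_seq++;
	return len;
}

int ns_printk(struct syslog_ns *ns, const char *fmt, ...)
{
	va_list args;
	int len;

	va_start(args, fmt);
	len = vns_printk(ns, fmt, args);
	va_end(args);
	return len;
}

/* A do_syslog()-style reader: copy record number seq out of this
 * namespace only. Returns -1 if it was overwritten or not yet written. */
int ns_read_record(const struct syslog_ns *ns, unsigned long long seq,
		   char *out, size_t size)
{
	if (seq < ns->first_seq || seq >= ns->next_seq)
		return -1;
	snprintf(out, size, "%s", ns->rec[seq % NR_RECORDS]);
	return 0;
}
```

A reader in one namespace simply cannot name another namespace's records, which is the isolation the cover letter is after.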

In container context, getting syslog_ns from current would not give a reasonable syslog
namespace for iptables logging, because the log messages that belong to each container
would then be printed on the host.

We add a pointer to the net namespace and use it to track the syslog_ns to which logs
generated in that container belong. We then add ns_printk, a new interface that takes
a syslog_ns.
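A sketch of the net-namespace pointer idea, under the assumption that the syslog namespace is recorded when the net namespace is created and that the packet-logging path (xt_LOG in the real patch) looks it up from the packet's struct net rather than from current. The struct layouts, net_init_syslog_ns() and log_packet() are hypothetical, and the log line only loosely imitates the LOG target's SRC=/DST= fields.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical minimal types for this sketch. */
struct syslog_ns {
	char last[128];      /* keep only the latest line, for brevity */
	unsigned int count;
};

struct net {
	struct syslog_ns *syslog_ns;  /* set when the netns is created */
};

/* Record the creator's syslog namespace in the new net namespace. */
void net_init_syslog_ns(struct net *net, struct syslog_ns *creator_ns)
{
	net->syslog_ns = creator_ns;
}

/* Called while processing a packet, where "current" may be an
 * unrelated task. The target namespace therefore comes from the
 * packet's net namespace, not from current. */
void log_packet(const struct net *net, const char *prefix,
		const char *src, const char *dst)
{
	struct syslog_ns *ns = net->syslog_ns;

	assert(ns);
	snprintf(ns->last, sizeof(ns->last), "%sSRC=%s DST=%s",
		 prefix, src, dst);
	ns->count++;
}
```

For example, a packet dropped inside the container's net namespace is logged only to the container's buffer, and the host's counter stays at zero.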

This patchset is based on the net-next development tree:
	https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git.

Libo Chen (3):
  printk: modify printk interface for syslog_namespace
  printk: add ns_printk for specific syslog_ns
  printk: use ns_printk in iptable context

Xiang Rui (2):
  Syslog_ns: add syslog_namespace struct and API
  Syslog_ns: add CLONE_NEWSYSLOG and create syslog_ns when copying
    process

 drivers/base/core.c              |    4 +-
 include/linux/nsproxy.h          |    2 +
 include/linux/printk.h           |    5 +-
 include/linux/syslog_namespace.h |   98 ++++++
 include/net/net_namespace.h      |    7 +-
 include/net/netfilter/xt_log.h   |    7 +-
 include/uapi/linux/sched.h       |    3 +-
 init/Kconfig                     |    7 +
 kernel/Makefile                  |    1 +
 kernel/nsproxy.c                 |   19 +-
 kernel/printk.c                  |  646 ++++++++++++++++++++++++--------------
 kernel/syslog_namespace.c        |   65 ++++
 net/core/net_namespace.c         |   12 +-
 net/netfilter/xt_LOG.c           |    4 +-
 14 files changed, 623 insertions(+), 257 deletions(-)
 create mode 100644 include/linux/syslog_namespace.h
 create mode 100644 kernel/syslog_namespace.c

* Re: [PATCH RFC 0/5] Containerize syslog
From: Eric W. Biederman @ 2012-11-19  9:51 UTC
  To: Rui Xiang
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	netdev-u79uwXL29TY76Z2rM5mHXA

Rui Xiang <leo.ruixiang-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> From: Xiang Rui <rui.xiang-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
>
> In Serge's patch (http://lwn.net/Articles/525629/), syslog_namespace was tied to the user
> namespace. We instead tie syslog_ns to nsproxy, and use ns_printk in the iptables
> context.
>
> We add syslog_namespace as part of nsproxy, along with a new clone flag, CLONE_NEWSYSLOG,
> to unshare the syslog area.
>
> syslog_namespace contains the identifiers needed to manage a syslog buffer. When a
> container creates a new syslog namespace, a containerized buffer is allocated to store
> the log owned by that container. Containerized identifiers such as log_first_seq replace
> the global variables and affect only their own buffer. The buffer is not freed until the
> host destroys the syslog_namespace.
>
> Printk has to be reimplemented because the log buffer is now isolated in syslog_ns. The
> functions involved, including printk, /dev/kmsg, do_syslog and kmsg_dump, should work
> inside a container. To make them available in a container, their interfaces need a
> syslog_ns parameter.
>
> In container context, getting syslog_ns from current would not give a reasonable syslog
> namespace for iptables logging, because the log messages that belong to each container
> would then be printed on the host.
>
> We add a pointer to the net namespace and use it to track the syslog_ns to which logs
> generated in that container belong. We then add ns_printk, a new interface that takes
> a syslog_ns.

It occurs to me that calling this a syslog namespace is a misnomer.
Syslog in general uses unix domain sockets.  This is about the
Linux-specific kernel log interface, whose contents tend to end up in syslog.

Are there any kernel print statements besides networking stack printks
that we want to move to show up in a new "kernel log" namespace?

For the interesting pieces of information the kernel generates (and
there don't seem to be many of those), would we be better off using
another kernel mechanism that is already per namespace, something like
netlink?

Eric

* Re: [PATCH RFC 0/5] Containerize syslog
From: Serge E. Hallyn @ 2012-11-19 14:37 UTC
  To: Rui Xiang
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Eric W. Biederman, netdev-u79uwXL29TY76Z2rM5mHXA

Quoting Rui Xiang (leo.ruixiang-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org):
> From: Xiang Rui <rui.xiang-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
> 
> In Serge's patch (http://lwn.net/Articles/525629/), syslog_namespace was tied to the user
> namespace. We instead tie syslog_ns to nsproxy, and use ns_printk in the iptables
> context.

Since you say 'we', I'm just wondering, which project is this a part of?

> We add syslog_namespace as part of nsproxy, along with a new clone flag, CLONE_NEWSYSLOG,
> to unshare the syslog area.

Thanks, looks like you saved me the time of having to add some users of
nsprintk :)

I understand that user namespaces aren't 100% usable yet, but looking
long term, is there a reason to have the syslog namespace separate
from user namespace?

thanks,
-serge

* Re: [PATCH RFC 0/5] Containerize syslog
From: Rui Xiang @ 2012-11-21  9:35 UTC
  To: Serge E. Hallyn
  Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Eric W. Biederman, netdev-u79uwXL29TY76Z2rM5mHXA

On 2012-11-19 22:37, Serge E. Hallyn wrote:
> Quoting Rui Xiang (leo.ruixiang-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org):
>> From: Xiang Rui <rui.xiang-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
>>
>> In Serge's patch (http://lwn.net/Articles/525629/), syslog_namespace was tied to the user
>> namespace. We instead tie syslog_ns to nsproxy, and use ns_printk in the iptables
>> context.
> 
> Since you say 'we', I'm just wondering, which project is this a part of?
> 

Hi Serge,

Thank you for your attention.

We may use containers in our company, and one of the missing parts we found is syslog
isolation (though we are not yet sure whether we require this feature), so we
made this patchset.

>> We add syslog_namespace as part of nsproxy, along with a new clone flag, CLONE_NEWSYSLOG,
>> to unshare the syslog area.
> 
> Thanks, looks like you saved me the time of having to add some users of
> nsprintk :)
> 
> I understand that user namespaces aren't 100% usable yet, but looking
> long term, is there a reason to have the syslog namespace separate
> from user namespace?

Actually we don't have a strong preference. We'll think more about it. We hope we can
reach consensus with Eric.

Thanks,
Rui Xiang

* Re: [PATCH RFC 0/5] Containerize syslog
From: Eric W. Biederman @ 2012-11-26 15:16 UTC
  To: Rui Xiang; +Cc: Serge E. Hallyn, serge.hallyn, containers, netdev

Rui Xiang <leo.ruixiang@gmail.com> writes:

> On 2012-11-19 22:37, Serge E. Hallyn wrote:

>> I understand that user namespaces aren't 100% usable yet, but looking
>> long term, is there a reason to have the syslog namespace separate
>> from user namespace?
>
> Actually we don't have a strong preference. We'll think more about it. We hope we can
> reach consensus with Eric.

I hope I am not hard to work with.  My primary concern is reasonable
looking code and good long term maintainable semantics.

I really don't care which namespace we file the kernel log
statements in.

I care much more about which kernel log print statements we want filed
differently.

Eric

* Re: [PATCH RFC 0/5] Containerize syslog
From: Andrew Morton @ 2012-12-07  9:03 UTC
  To: Eric W. Biederman
  Cc: Rui Xiang,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	netdev-u79uwXL29TY76Z2rM5mHXA

On Mon, 19 Nov 2012 01:51:09 -0800 ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org (Eric W. Biederman) wrote:

> Are there any kernel print statements besides networking stack printks
> that we want to move to show up in a new "kernel log" namespace?

That's a good question, and afaict it remains unanswered.

As so often happens, this patchset's changelogs forgot to describe the
reason for the existence of this patchset.  Via a bit of lwn reading
and my awesome telepathic skills, I divine that something in networking
is using syslog for kernel->userspace communications.

wtf?

Wouldn't it be better to just stop doing that, and to implement a
respectable and reliable kernel->userspace messaging scheme?

And leave syslog alone - it's a crude low-level thing for random
unexpected things which operators might want to know about.

* Re: [PATCH RFC 0/5] Containerize syslog
From: Serge Hallyn @ 2012-12-07 14:23 UTC
  To: Andrew Morton
  Cc: Rui Xiang, netdev-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Eric W. Biederman

Quoting Andrew Morton (akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org):
> On Mon, 19 Nov 2012 01:51:09 -0800 ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org (Eric W. Biederman) wrote:
> 
> > Are there any kernel print statements besides networking stack printks
> > that we want to move to show up in a new "kernel log" namespace?
> 
> That's a good question, and afaict it remains unanswered.

There are some other (not *terribly* compelling) cases.  For instance,
SELinux hooks will printk a warning if you, say, mount an fs without
xattr support or with unsupported options.  Things like stat.c and
capabilities and syslog print out warnings when userspace uses a
deprecated somethingorother: the old stat syscall, or sys_syslog without
CAP_SYSLOG.  Those should go to the container.  Filesystems may give
warnings (bad mount options for tmpfs, bad uid owner for many of them,
etc.) which belong in the container.  Obviously some belong on the host,
for instance if they show a corrupt superblock, which may indicate an
attempt by the container to crash the kernel.

> As so often happens, this patchset's changelogs forgot to describe the
> reason for the existence of this patchset.  Via a bit of lwn reading

Not as a separate justification admittedly, but the description was
meant to explain it:  right now /dev/kmsg and sys_syslog are not safe
and useful in a container;  syslog messages from host and containers
can be confusingly intermixed;  and helpful printks are not seen in
the container.

> and my awesome telepathic skills, I divine that something in networking
> is using syslog for kernel->userspace communications.
> 
> wtf?

Well, syslog is the kernel->userspace channel of last resort.

> Wouldn't it be better to just stop doing that, and to implement a
> respectable and reliable kernel->userspace messaging scheme?

Convenience functions on top of netlink?

> And leave syslog alone - it's a crude low-level thing for random
> unexpected things which operators might want to know about.

That sentence is a result of not calling a container admin an operator.
I can't argue it because I'm not sure whether to agree with that
classification.

-serge

* Re: [PATCH RFC 0/5] Containerize syslog
From: Glauber Costa @ 2012-12-07 14:30 UTC
  To: Serge Hallyn
  Cc: Rui Xiang, netdev-u79uwXL29TY76Z2rM5mHXA, Andrew Morton,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Eric W. Biederman

On 12/07/2012 06:23 PM, Serge Hallyn wrote:
> Quoting Andrew Morton (akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org):
>> On Mon, 19 Nov 2012 01:51:09 -0800 ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org (Eric W. Biederman) wrote:
>>
>>> Are there any kernel print statements besides networking stack printks
>>> that we want to move to show up in a new "kernel log" namespace?
>>
>> That's a good question, and afaict it remains unanswered.
> 
> There are some other (not *terribly* compelling) cases.  For instance,
> SELinux hooks will printk a warning if you, say, mount an fs without
> xattr support or with unsupported options.  Things like stat.c and
> capabilities and syslog print out warnings when userspace uses a
> deprecated somethingorother: the old stat syscall, or sys_syslog without
> CAP_SYSLOG.  Those should go to the container.  Filesystems may give
> warnings (bad mount options for tmpfs, bad uid owner for many of them,
> etc.) which belong in the container.  Obviously some belong on the host,
> for instance if they show a corrupt superblock, which may indicate an
> attempt by the container to crash the kernel.
> 
>> As so often happens, this patchset's changelogs forgot to describe the
>> reason for the existence of this patchset.  Via a bit of lwn reading
> 
> Not as a separate justification admittedly, but the description was
> meant to explain it:  right now /dev/kmsg and sys_syslog are not safe
> and useful in a container;  syslog messages from host and containers
> can be confusingly intermixed;  and helpful printks are not seen in
> the container.
> 
>> and my awesome telepathic skills, I divine that something in networking
>> is using syslog for kernel->userspace communications.
>>
>> wtf?
> 
> Well, syslog is the kernel->userspace channel of last resort.
> 
>> Wouldn't it be better to just stop doing that, and to implement a
>> respectable and reliable kernel->userspace messaging scheme?
> 
> Convenience functions on top of netlink?
> 
>> And leave syslog alone - it's a crude low-level thing for random
>> unexpected things which operators might want to know about.
> 
> That sentence is a result of not calling a container admin an operator.
> I can't argue it because I'm not sure whether to agree with that
> classification.
> 

I keep asking myself whether we shouldn't simply forward to a container
all messages printed in process context. That would obviously exclude
messages from kthreads, interrupts, etc., which will always belong to the
initial namespace anyway. There is no harm, for instance, in delivering
the same message twice: once to the container, and once to the host
system.

Isn't it natural that if the kernel printed something on behalf of a
process, whoever is the admin of the machine that process lives on
should see what it is about?
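The proposal above can be expressed as a small routing rule. This sketch assumes a boolean telling whether the message was printed on behalf of a user task (process context) rather than from a kthread or interrupt; struct task, route_printk() and the per-namespace message counters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: each namespace only counts what it received. */
struct syslog_ns {
	unsigned int nr_msgs;
};

static struct syslog_ns init_syslog_ns;  /* the host's log */

struct task {
	struct syslog_ns *syslog_ns;
};

/* Routing rule as sketched here:
 * - kthreads and interrupts (for_task == false): host log only;
 * - on behalf of a task: the host always keeps a copy, and the task's
 *   container gets one as well unless the task is on the host. */
void route_printk(const struct task *curr, bool for_task)
{
	init_syslog_ns.nr_msgs++;
	if (!for_task)
		return;
	assert(curr && curr->syslog_ns);
	if (curr->syslog_ns != &init_syslog_ns)
		curr->syslog_ns->nr_msgs++;
}
```

With this rule the host sees every message, which is exactly the double delivery whose cost is debated in the replies.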

* Re: [PATCH RFC 0/5] Containerize syslog
From: Eric W. Biederman @ 2012-12-07 18:05 UTC
  To: Glauber Costa
  Cc: Rui Xiang, Andrew Morton,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	netdev-u79uwXL29TY76Z2rM5mHXA

Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> writes:

> I keep asking myself whether we shouldn't simply forward to a container
> all messages printed in process context. That would obviously exclude
> messages from kthreads, interrupts, etc., which will always belong to the
> initial namespace anyway. There is no harm, for instance, in delivering
> the same message twice: once to the container, and once to the host
> system.

Except that there is harm in double printing.  One of the better
justifications for doing something with the kernel log is that it is
possible to overflow the kernel log with operations performed
exclusively in a container.

I do think the idea of process-context printks going to the current
container is one worth playing with.

Eric

* Re: [PATCH RFC 0/5] Containerize syslog
From: Eric W. Biederman @ 2012-12-07 18:21 UTC
  To: Serge Hallyn; +Cc: Andrew Morton, Rui Xiang, containers, netdev

Serge Hallyn <serge.hallyn@canonical.com> writes:

> Not as a separate justification admittedly, but the description was
> meant to explain it:  right now /dev/kmsg and sys_syslog are not safe
> and useful in a container;

The user namespace solves the biggest practical problem here, as it
solves so many other problems of excessive privileges in a container.

Since these patches are still mostly in the proof-of-concept/design
phase, I am inclined to see how getting a usable user namespace affects
the situation on the ground.

But I do think there are issues to be solved in some fashion.  We have
the possibility of configuring firewall logging rules that are not
usable in containers.  Similarly, the reasons for mount failures and a
number of other cases go silent.

I think it makes some sense to change where we put things in the kernel
log to solve these things but it also makes sense to ask the question
is there a better solution.  Hopefully a little more experience with
these issues and time playing with ideas can make things clear.

Eric

* Re: [PATCH RFC 0/5] Containerize syslog
From: Glauber Costa @ 2012-12-11  8:25 UTC
  To: Eric W. Biederman
  Cc: Serge Hallyn, Andrew Morton, Rui Xiang, netdev, containers

On 12/07/2012 10:05 PM, Eric W. Biederman wrote:
> Glauber Costa <glommer@parallels.com> writes:
> 
>> I keep asking myself whether we shouldn't simply forward to a container
>> all messages printed in process context. That would obviously exclude
>> messages from kthreads, interrupts, etc., which will always belong to the
>> initial namespace anyway. There is no harm, for instance, in delivering
>> the same message twice: once to the container, and once to the host
>> system.
> 
> Except that there is harm in double printing.  One of the better
> justifications for doing something with the kernel log is that it is
> possible to overflow the kernel log with operations performed
> exclusively in a container.
> 
I don't agree with you here.

If we are double printing, we are using up more memory, but we also have
an extra buffer anyway. The messages are printed on behalf of the user,
but still by the kernel.

So one of the following will necessarily hold:

1) There is no way that the process can overflow the main log, and, as a
consequence, no way it can overflow the container log, which has fewer
messages than the main log.

2) The process will overflow the main log. But since we are not printing
anything extra to the main log compared to the scenario in which the
process lives in the main namespace, this would already be a problem
independent of namespaces, and one that needs to be fixed.

IOW, double printing should not print anything *extra* to the main log.
It just prints to the container log, and leaves a copy to the box admin
to see. I think it is very reasonable to imagine that the main admin
would like to see anything the kernel has to tell him about the box.

> I do think the idea of process-context printks going to the current
> container is one worth playing with.
> 

It still leaves open the problem of printks outside process context that
should go to a namespace. But it is easy to extend this idea to do
both.

* Re: [PATCH RFC 0/5] Containerize syslog
From: Eric W. Biederman @ 2012-12-11 18:22 UTC
  To: Glauber Costa
  Cc: Rui Xiang, Andrew Morton,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	netdev-u79uwXL29TY76Z2rM5mHXA

Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> writes:

> On 12/07/2012 10:05 PM, Eric W. Biederman wrote:
>> Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> writes:
>> 
>>> I keep asking myself whether we shouldn't simply forward to a container
>>> all messages printed in process context. That would obviously exclude
>>> messages from kthreads, interrupts, etc., which will always belong to the
>>> initial namespace anyway. There is no harm, for instance, in delivering
>>> the same message twice: once to the container, and once to the host
>>> system.
>> 
>> Except that there is harm in double printing.  One of the better
>> justifications for doing something with the kernel log is that it is
>> possible to overflow the kernel log with operations performed
>> exclusively in a container.
>> 
> I don't agree with you here.
>
> If we are double printing, we are using up more memory, but we also have
> an extra buffer anyway. The messages are printed on behalf of the user,
> but still by the kernel.
>
> So one of the following will necessarily hold:
>
> 1) There is no way that the process can overflow the main log, and, as a
> consequence, no way it can overflow the container log, which has fewer
> messages than the main log.
>
> 2) The process will overflow the main log. But since we are not printing
> anything extra to the main log compared to the scenario in which the
> process lives in the main namespace, this would already be a problem
> independent of namespaces, and one that needs to be fixed.

Well, mounts, bringing network interfaces up and down, running packets
through our own choice of firewall rules, and possibly enabling debug
messages on network interfaces all have the potential to create messages
we aren't seeing today.

> IOW, double printing should not print anything *extra* to the main log.
> It just prints to the container log, and leaves a copy to the box admin
> to see. I think it is very reasonable to imagine that the main admin
> would like to see anything the kernel has to tell him about the box.

The only reason that I have seen for doing anything with printks is
because we are generating messages that would not be generated in a
non-container environment.  At which point double printing is scary
because it allows a container user to flood the kernel log ring buffer
and suppress interesting messages.
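The flooding concern above is easy to demonstrate with a toy model, assuming a tiny host ring buffer that drops its oldest record when full and a container that double prints every message. The ring size, struct ring, ring_store(), ring_contains() and container_printk() are made up for illustration and do not reflect the real kernel buffer size or record format.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define HOST_RECORDS 4  /* deliberately tiny ring for the demo */
#define MSG_MAX 32

struct ring {
	char rec[HOST_RECORDS][MSG_MAX];
	unsigned long first_seq, next_seq;
};

static void ring_store(struct ring *r, const char *msg)
{
	if (r->next_seq - r->first_seq == HOST_RECORDS)
		r->first_seq++;  /* the oldest record is lost */
	snprintf(r->rec[r->next_seq % HOST_RECORDS], MSG_MAX, "%s", msg);
	r->next_seq++;
}

/* Returns 1 if msg is still present in the ring, 0 otherwise. */
int ring_contains(const struct ring *r, const char *msg)
{
	for (unsigned long s = r->first_seq; s < r->next_seq; s++)
		if (strcmp(r->rec[s % HOST_RECORDS], msg) == 0)
			return 1;
	return 0;
}

/* Double printing: every container message is also copied to the host. */
void container_printk(struct ring *container, struct ring *host,
		      const char *msg)
{
	ring_store(container, msg);
	ring_store(host, msg);
}
```

In this model a single important host message is evicted after HOST_RECORDS container messages, so with double printing the container alone decides how long host messages survive.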

>> I do think the idea of process-context printks going to the current
>> container is one worth playing with.
>> 
>
> It still leaves open the problem of printks outside process context that
> should go to a namespace. But it is easy to extend this idea to do
> both.

Hmm.  For printks from process context I think I can see a point where
double printing makes sense, because that is a rather indiscriminate grab
of printk messages.

Eric

* Re: [PATCH RFC 0/5] Containerize syslog
From: Glauber Costa @ 2012-12-12  8:56 UTC
  To: Eric W. Biederman
  Cc: Serge Hallyn, Andrew Morton, Rui Xiang, netdev, containers

On 12/11/2012 10:22 PM, Eric W. Biederman wrote:
> Glauber Costa <glommer@parallels.com> writes:
> 
>> On 12/07/2012 10:05 PM, Eric W. Biederman wrote:
>>> Glauber Costa <glommer@parallels.com> writes:
>>>
>>>> I keep asking myself whether we shouldn't simply forward to a container
>>>> all messages printed in process context. That would obviously exclude
>>>> messages from kthreads, interrupts, etc., which will always belong to the
>>>> initial namespace anyway. There is no harm, for instance, in delivering
>>>> the same message twice: once to the container, and once to the host
>>>> system.
>>>
>>> Except that there is harm in double printing.  One of the better
>>> justifications for doing something with the kernel log is that it is
>>> possible to overflow the kernel log with operations performed
>>> exclusively in a container.
>>>
>> I don't agree with you here.
>>
>> If we are double printing, we are using up more memory, but we also have
>> an extra buffer anyway. The messages are printed on behalf of the user,
>> but still by the kernel.
>>
>> So one of the following will necessarily hold:
>>
>> 1) There is no way that the process can overflow the main log, and, as a
>> consequence, no way it can overflow the container log, which has fewer
>> messages than the main log.
>>
>> 2) The process will overflow the main log. But since we are not printing
>> anything extra to the main log compared to the scenario in which the
>> process lives in the main namespace, this would already be a problem
>> independent of namespaces, and one that needs to be fixed.
> 
> Well, mounts, bringing network interfaces up and down, running packets
> through our own choice of firewall rules, and possibly enabling debug
> messages on network interfaces all have the potential to create messages
> we aren't seeing today.
> 

There are two kinds of messages: the ones that would be printed anyway if
you were not running in an ns, which we have no reason to fear, and the
ones that only exist because we wrote the code for them, due to ns
support. They are no different from adding support for a new fs, a driver,
etc. Any new functionality can print new messages, and we have to be
sure they won't fill the log. We write that code, so it is on us to make
sure the messages are printed at a reasonable rate.

This should be true for all messages printed in process context. It is
either that, or we have a bug and are relying on a specific clone flag to
protect us against the buffer overrun.


>> IOW, double printing should not print anything *extra* to the main log.
>> It just prints to the container log, and leaves a copy to the box admin
>> to see. I think it is very reasonable to imagine that the main admin
>> would like to see anything the kernel has to tell him about the box.
> 
> The only reason that I have seen for doing anything with printks is
> because we are generating messages that would not be generated in a
> non-container environment.  At which point double printing is scary
> because it allows a container user to flood the kernel log ring buffer
> and suppress interesting messages.
> 
>>> I do think the idea of process-context printks going to the current
>>> container is one worth playing with.
>>>
>>
>> It still leaves open the problem of printks outside process context that
>> should go to a namespace. But it is easy to extend this idea to do
>> both.
> 
> Hmm.  For printks from process context I think I can see a point where
> double printing makes sense, because that is a rather indiscriminate grab
> of printk messages.
> 
Exactly. What I have had in mind this whole time is that if you are
printing a message of interest to the container admin, it is *very
likely* also of interest to the box admin, especially if it indicates
something going wrong. Maybe what does and does not go to the main
buffer can be determined by the log level of each buffer. But still, I
think that just hiding them from the box admin may not be exactly what
we want.
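One way to read the per-buffer log-level suggestion, sketched under assumptions: each buffer has its own maximum level (using the kernel's numbering, where KERN_EMERG is 0 and KERN_DEBUG is 7), and a process-context message is offered to both buffers, each applying its own threshold. struct log_buffer, log_buffer_accept() and deliver_with_levels() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Kernel numbering: lower is more severe (KERN_EMERG is 0,
 * KERN_DEBUG is 7). Only the levels used below are named. */
enum { LVL_ERR = 3, LVL_WARNING = 4, LVL_INFO = 6, LVL_DEBUG = 7 };

/* Hypothetical buffer with its own severity threshold. */
struct log_buffer {
	int max_level;        /* accept levels 0..max_level */
	unsigned int stored;  /* how many copies were kept */
};

static bool log_buffer_accept(struct log_buffer *b, int level)
{
	assert(level >= 0 && level <= LVL_DEBUG);
	if (level > b->max_level)
		return false;
	b->stored++;
	return true;
}

/* Offer one process-context message to both buffers; each applies
 * its own threshold. */
void deliver_with_levels(struct log_buffer *container,
			 struct log_buffer *host, int level)
{
	log_buffer_accept(container, level);
	log_buffer_accept(host, level);
}
```

For example, with the container accepting everything and the host accepting only warnings and worse, an informational message stays in the container while an error reaches both, so the box admin still sees things going wrong without taking the container's full volume.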

Cheers

* Re: [PATCH RFC 0/5] Containerize syslog
From: Eric W. Biederman @ 2012-12-12 20:08 UTC
  To: Glauber Costa
  Cc: Rui Xiang, Andrew Morton,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	netdev-u79uwXL29TY76Z2rM5mHXA


This conversation is really debating the wrong question.

The fundamental question is can we modify the kernel such that
containers don't care about anything that goes into the kernel log.

The fundamental question leads to the question of which functionality
that is logged to the kernel log we must see in containers.

Serge has collected a lot of cases and it seems reasonable to assume
that we will find those cases that must show up in a container or
require huge userspace reworking.

These messages that must show up in a container (or break userspace) are
messages that already exist today.  These messages quite likely are
specific to the container itself and I don't think we will need to
double log them.


But the first step of the process is to find the kernel messages that,
if we don't log them to user space via the kernel log, will break user space.

Once we have found those messages and the cases in which there are no
work-arounds we can refine the design with nice to haves.

But this conversation really needs to be about which messages must we
deliver to userspace via the kernel log.


There are a lot of nice-to-have messages out there: mount failures,
selinux failures, etc.  Some of those have a pretty compelling case to be
double logged.

I think for the compelling cases we won't want to double log them, and
those compelling cases need to drive the design.



The most compelling case I know of is firewall log messages that are
logged by explicit user request when selected packets hit the local
firewall.  Those messages I definitely don't want to double log and
those messages are most definitely not interesting to the operator of
the machine.  Those messages are only interesting to the admin who
configured his firewall to log them.

We should look to see if there are alternatives to user-configured
firewall log messages that a system administrator can use.  If not, we
have our first compelling case for this functionality.

Eric
