From: Petr Mladek <pmladek@suse.com>
To: Topi Miettinen <toiwoton@gmail.com>
Cc: "Serge E. Hallyn" <serge@hallyn.com>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	Tejun Heo <tj@kernel.org>, lkml <linux-kernel@vger.kernel.org>,
	luto@kernel.org, Kees Cook <keescook@chromium.org>,
	Jonathan Corbet <corbet@lwn.net>, Li Zefan <lizefan@huawei.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Serge Hallyn <serge.hallyn@canonical.com>,
	James Morris <james.l.morris@oracle.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	David Howells <dhowells@redhat.com>,
	David Woodhouse <David.Woodhouse@intel.com>,
	Ard Biesheuvel <ard.biesheuvel@linaro.org>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	"open list:DOCUMENTATION" <linux-doc@vger.kernel.org>,
	"open list:CONTROL GROUP (CGROUP)" <cgroups@vger.kernel.org>,
	"open list:CAPABILITIES" <linux-security-module@vger.kernel.org>
Subject: Re: [PATCH] capabilities: add capability cgroup controller
Date: Fri, 8 Jul 2016 11:13:32 +0200
Message-ID: <20160708091332.GD3556@pathway.suse.cz>
In-Reply-To: <e81401b8-a731-9c22-f5a3-ce25eefb1471@gmail.com>

On Thu 2016-07-07 20:27:13, Topi Miettinen wrote:
> On 07/07/16 09:16, Petr Mladek wrote:
> > On Sun 2016-07-03 15:08:07, Topi Miettinen wrote:
> >> The attached patch would make any uses of capabilities generate audit
> >> messages. It works for simple tests as you can see from the commit
> >> message, but unfortunately the call to audit_cgroup_list() deadlocks the
> >> system when booting a full blown OS. There's no deadlock when the call
> >> is removed.
> >>
> >> I guess that in some cases, cgroup_mutex and/or css_set_lock could be
> >> already held earlier before entering audit_cgroup_list(). Holding the
> >> locks is however required by task_cgroup_from_root(). Is there any way
> >> to avoid this? For example, only print some kind of cgroup ID numbers
> >> (are there unique and stable IDs, available without locks?) for those
> >> cgroups where the task is registered in the audit message?
> > 
> > I am not sure anyone knows what really happens here. I suggest
> > enabling lockdep. It might detect a possible deadlock even before it
> > really happens; see Documentation/locking/lockdep-design.txt
> > 
> > It can be enabled by
> > 
> >    CONFIG_PROVE_LOCKING=y
> > 
> > It depends on
> > 
> >     CONFIG_DEBUG_KERNEL=y
> > 
> > and maybe some more options, see lib/Kconfig.debug
> 
> Thanks a lot! I caught this stack dump:
> 
> starting version 230
> [    3.416647] ------------[ cut here ]------------
> [    3.417310] WARNING: CPU: 0 PID: 95 at /home/topi/d/linux.git/kernel/locking/lockdep.c:2871 lockdep_trace_alloc+0xb4/0xc0
> [    3.417605] DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags))
> [    3.417923] Modules linked in:
> [    3.418288] CPU: 0 PID: 95 Comm: systemd-udevd Not tainted 4.7.0-rc5+ #97
> [    3.418444] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014
> [    3.418726]  0000000000000086 000000007970f3b0 ffff88000016fb00 ffffffff813c9c45
> [    3.418993]  ffff88000016fb50 0000000000000000 ffff88000016fb40 ffffffff81091e9b
> [    3.419176]  00000b3705e2c798 0000000000000046 0000000000000410 00000000ffffffff
> [    3.419374] Call Trace:
> [    3.419511]  [<ffffffff813c9c45>] dump_stack+0x67/0x92
> [    3.419644]  [<ffffffff81091e9b>] __warn+0xcb/0xf0
> [    3.419745]  [<ffffffff81091f1f>] warn_slowpath_fmt+0x5f/0x80
> [    3.419868]  [<ffffffff810e9a84>] lockdep_trace_alloc+0xb4/0xc0
> [    3.419988]  [<ffffffff8120dc42>] kmem_cache_alloc_node+0x42/0x600
> [    3.420156]  [<ffffffff8110432d>] ? debug_lockdep_rcu_enabled+0x1d/0x20
> [    3.420170]  [<ffffffff8163183b>] __alloc_skb+0x5b/0x1d0
> [    3.420170]  [<ffffffff81144f6b>] audit_log_start+0x29b/0x480
> [    3.420170]  [<ffffffff810a2925>] ? __lock_task_sighand+0x95/0x270
> [    3.420170]  [<ffffffff81145cc9>] audit_log_cap_use+0x39/0xf0
> [    3.420170]  [<ffffffff8109cd75>] ns_capable+0x45/0x70
> [    3.420170]  [<ffffffff8109cdb7>] capable+0x17/0x20
> [    3.420170]  [<ffffffff812a2f50>] oom_score_adj_write+0x150/0x2f0
> [    3.420170]  [<ffffffff81230997>] __vfs_write+0x37/0x160
> [    3.420170]  [<ffffffff810e33b7>] ? update_fast_ctr+0x17/0x30
> [    3.420170]  [<ffffffff810e3449>] ? percpu_down_read+0x49/0x90
> [    3.420170]  [<ffffffff81233d47>] ? __sb_start_write+0xb7/0xf0
> [    3.420170]  [<ffffffff81233d47>] ? __sb_start_write+0xb7/0xf0
> [    3.420170]  [<ffffffff81231048>] vfs_write+0xb8/0x1b0
> [    3.420170]  [<ffffffff812533c6>] ? __fget_light+0x66/0x90
> [    3.420170]  [<ffffffff81232078>] SyS_write+0x58/0xc0
> [    3.420170]  [<ffffffff81001f2c>] do_syscall_64+0x5c/0x300
> [    3.420170]  [<ffffffff81849c9a>] entry_SYSCALL64_slow_path+0x25/0x25
> [    3.420170] ---[ end trace fb586899fb556a5e ]---
> [    3.447922] random: systemd-udevd urandom read with 3 bits of entropy available
> [    4.014078] clocksource: Switched to clocksource tsc
> Begin: Loading essential drivers ... done.
> 
> This is with qemu, and the boot continues normally. With a real computer,
> there's no such output and the system just seems to freeze.
> 
> Could it be possible that the deadlock happens because there's some IO
> towards /sys/fs/cgroup, which causes a capability check and that in turn
> causes locking problems when we try to print cgroup list?

The above warning is printed by this code at
kernel/locking/lockdep.c:2871:

static void __lockdep_trace_alloc(gfp_t gfp_mask, unsigned long flags)
{
[...]
	/* We're only interested __GFP_FS allocations for now */
	if (!(gfp_mask & __GFP_FS))
		return;

	/*
	 * Oi! Can't be having __GFP_FS allocations with IRQs disabled.
	 */
	if (DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags)))
		return;
[...]
}


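The backtrace shows that your new audit_log_cap_use() is called in the
vfs_write() path, via oom_score_adj_write() -> capable() -> ns_capable(),
and that audit_log_start() then allocates memory while IRQs are
disabled. You might try calling audit_log_start() with GFP_NOFS instead
of GFP_KERNEL.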
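
To make the reasoning concrete, here is a small user-space sketch of the
check quoted above. It is only a model: the SKETCH_* flag values and the
function sketch_trace_alloc_warns() are made up for illustration and are
not the kernel's definitions. The one assumption carried over is that
GFP_KERNEL includes __GFP_FS while GFP_NOFS does not. Note that it only
shows when this particular lockdep warning would fire. It says nothing
about whether the allocation is otherwise safe with IRQs disabled.

```c
#include <stdbool.h>

/*
 * Illustrative flag bits only. These are NOT the kernel's real values;
 * they exist so the check below can be compiled and run in user space.
 */
typedef unsigned int sketch_gfp_t;

#define SKETCH_GFP_IO      0x1u
#define SKETCH_GFP_FS      0x2u
#define SKETCH_GFP_RECLAIM 0x4u

/* Assumed composition: GFP_KERNEL carries the FS bit, GFP_NOFS does not. */
#define SKETCH_GFP_KERNEL (SKETCH_GFP_RECLAIM | SKETCH_GFP_IO | SKETCH_GFP_FS)
#define SKETCH_GFP_NOFS   (SKETCH_GFP_RECLAIM | SKETCH_GFP_IO)

/*
 * Model of the two tests from the excerpt. Returns true when the
 * modelled check would emit the warning.
 */
static bool sketch_trace_alloc_warns(sketch_gfp_t gfp_mask, bool irqs_disabled)
{
	/* Only FS-capable allocations are of interest. */
	if (!(gfp_mask & SKETCH_GFP_FS))
		return false;

	/* An FS-capable allocation with IRQs disabled triggers the warning. */
	return irqs_disabled;
}
```

With these definitions, sketch_trace_alloc_warns(SKETCH_GFP_KERNEL, true)
returns true, while the same call with SKETCH_GFP_NOFS returns false,
because the function returns before the IRQ test is reached.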

Note that this advice is based on intuition rather than certainty. I
still need to learn a lot about memory management, and the kernel in
general, to be more sure about the correct solution.

Best Regards,
Petr
