From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-ot1-f68.google.com ([209.85.210.68]:38893 "EHLO mail-ot1-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1731957AbeISTzZ (ORCPT ); Wed, 19 Sep 2018 15:55:25 -0400
Received: by mail-ot1-f68.google.com with SMTP id n5-v6so5926223otl.5 for ; Wed, 19 Sep 2018 07:17:17 -0700 (PDT)
MIME-Version: 1.0
References: <20180919070737.GB17524@uranus.lan> <20180919071056.GC17524@uranus.lan>
In-Reply-To: <20180919071056.GC17524@uranus.lan>
From: Jann Horn
Date: Wed, 19 Sep 2018 16:16:50 +0200
Message-ID:
Subject: Re: [linux-next] BUG triggered in ptraceme
To: Cyrill Gorcunov, Alexander Viro, linux-fsdevel@vger.kernel.org, Michal Hocko
Cc: Oleg Nesterov, avagin@virtuozzo.com, kernel list
Content-Type: text/plain; charset="UTF-8"
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

Adding FS people to figure out whether GFP_KERNEL allocations with
i_rwsem's held for writing are okay.

On Wed, Sep 19, 2018 at 9:10 AM Cyrill Gorcunov wrote:
> On Wed, Sep 19, 2018 at 10:07:37AM +0300, Cyrill Gorcunov wrote:
> > Hi Oleg! While been testing criu with linux-next we've triggered a BUG.
> > https://api.travis-ci.org/v3/job/430308998/log.txt
> >
> > [ 2.461618] BUG: sleeping function called from invalid context at security/apparmor/include/cred.h:154
> > [ 2.461794] in_atomic(): 1, irqs_disabled(): 1, pid: 152, name: init
> > [ 2.461890] 1 lock held by init/152:
> > [ 2.461981] #0: 00000000f30c3fda (tasklist_lock){.+.+}, at: ptrace_traceme+0x1c/0x70
> > [ 2.462114] irq event stamp: 2524
> > [ 2.462242] hardirqs last enabled at (2523): [] do_syscall_64+0x12/0x190
> > [ 2.462363] hardirqs last disabled at (2524): [] _raw_write_lock_irq+0xf/0x40
> > [ 2.462476] softirqs last enabled at (1904): [] unix_sock_destructor+0x4f/0xc0
> > [ 2.462586] softirqs last disabled at (1902): [] unix_sock_destructor+0x4f/0xc0
> > [ 2.462697] CPU: 1 PID: 152 Comm: init Not tainted 4.19.0-rc4-next-20180918+ #1
> >
> > Which is due to commit
> >
> > commit 4b105cbbaf7c06e01c27391957dc3c446328d087
> > Author: Oleg Nesterov
> > Date: Wed Jun 17 16:27:33 2009 -0700
> >
> >     ptrace: do not use task_lock() for attach
> >
> > because now after write_lock_irq(&tasklist_lock); apparmor calls for
> > traceme and
> >
> > static inline struct aa_label *begin_current_label_crit_section(void)
> > {
> >         struct aa_label *label = aa_current_raw_label();
> >
> > -->     might_sleep();
> >
> > Take a look please, once time permit.
>
> Heh, actually not :) It is due to commit
>
> commit 1f8266ff58840d698a1e96d2274189de1bdf7969
> Author: Jann Horn
> Date: Thu Sep 13 18:12:09 2018 +0200
>
> which introduced might_sleep. Seems it is bad idea to send bug report
> without having a cup of coffee at the morning :)

Yeah, I fixed one sleep-in-atomic bug and figured I'd throw a
might_sleep() in there for good measure... sigh.

I guess now I have to go through all the callers of
begin_current_label_crit_section() to see what else looks wrong...

apparmor_ptrace_traceme() is wrong, as reported...

apparmor_path_link() looks icky, but I'm not sure - from what I can
tell, it's called with an i_rwsem held for writing, and that probably
makes calling back into filesystem context from there a bad idea?
OTOH, it's just the i_rwsem of a newly-created path, so I don't know
whether that's actually an issue...

security_path_rename() is called with two i_rwsem's held, but again,
I'm not sure whether that's a problem.
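
For readers following the thread, the reported pattern (a hook annotated with might_sleep() being reached after write_lock_irq(&tasklist_lock)) can be modeled in user space. This is only a rough sketch, not kernel code: every `model_*` name below is hypothetical, and the check only loosely mirrors what the splat above reports via in_atomic() and irqs_disabled().

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model_ctx {
    int atomic_depth;    /* > 0 while a modeled spinning lock is held */
    bool irqs_disabled;  /* true while modeled interrupts are off */
    int violations;      /* count of sleep-in-atomic reports */
};

/* Models taking a lock that disables interrupts and forbids sleeping. */
static void model_write_lock_irq(struct model_ctx *ctx)
{
    ctx->irqs_disabled = true;
    ctx->atomic_depth++;
}

/* Models releasing that lock and re-enabling interrupts. */
static void model_write_unlock_irq(struct model_ctx *ctx)
{
    ctx->atomic_depth--;
    ctx->irqs_disabled = false;
}

/* Models a debug annotation: report if the caller is not allowed to sleep. */
static bool model_might_sleep(struct model_ctx *ctx, const char *where)
{
    if (ctx->atomic_depth > 0 || ctx->irqs_disabled) {
        ctx->violations++;
        fprintf(stderr,
                "model: sleeping function called from invalid context at %s\n",
                where);
        return false;
    }
    return true;
}

/* Stand-in for a security hook that annotates itself as possibly sleeping. */
static bool model_security_hook(struct model_ctx *ctx)
{
    return model_might_sleep(ctx, "model_security_hook");
}

/* Mirrors the reported ordering: take the lock first, then call the hook. */
static int model_traceme(struct model_ctx *ctx)
{
    model_write_lock_irq(ctx);
    bool ok = model_security_hook(ctx);
    model_write_unlock_irq(ctx);
    return ok ? 0 : -1;
}
```

Calling model_security_hook() on a zero-initialized context passes the check, while model_traceme() trips it once, because the hook runs with the modeled lock held.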