* [PATCH] vfs: Fix possible NULL pointer dereference in inode_permission() @ 2014-01-09 21:27 Steven Rostedt 2014-01-09 21:42 ` Matthew Wilcox 0 siblings, 1 reply; 20+ messages in thread From: Steven Rostedt @ 2014-01-09 21:27 UTC (permalink / raw) To: LKML, linux-fsdevel Cc: Alexander Viro, Stephen Smalley, Christoph Hellwig, Linus Torvalds, Eric Paris, Theodore Ts'o, Dave Chinner, James Morris, Paul Moore, Andrew Morton, Paul E. McKenney, stable While running stress tests on adding and deleting ftrace instances I hit this bug: BUG: unable to handle kernel NULL pointer dereference at 0000000000000020 IP: [<ffffffff812d8bc5>] selinux_inode_permission+0x85/0x160 PGD 63681067 PUD 7ddbe067 PMD 0 Oops: 0000 [#1] PREEMPT CPU: 0 PID: 5634 Comm: ftrace-test-mki Not tainted 3.13.0-rc4-test-00033-gd2a6dde-dirty #20 Hardware name: /DG965MQ, BIOS MQ96510J.86A.0372.2006.0605.1717 06/05/2006 task: ffff880078375800 ti: ffff88007ddb0000 task.ti: ffff88007ddb0000 RIP: 0010:[<ffffffff812d8bc5>] [<ffffffff812d8bc5>] selinux_inode_permission+0x85/0x160 RSP: 0018:ffff88007ddb1c48 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000800000 RCX: ffff88006dd43840 RDX: 0000000000000001 RSI: 0000000000000081 RDI: ffff88006ee46000 RBP: ffff88007ddb1c88 R08: 0000000000000000 R09: ffff88007ddb1c54 R10: 6e6576652f6f6f66 R11: 0000000000000003 R12: 0000000000000000 R13: 0000000000000081 R14: ffff88006ee46000 R15: 0000000000000000 FS: 00007f217b5b6700(0000) GS:ffffffff81e21000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033^M CR2: 0000000000000020 CR3: 000000006a0fe000 CR4: 00000000000007f0 Stack: 0000000000000081 ffff88006ee46000 0000000000000081 ffffffff812d8b45 ffff88006ee46000 0000000000000081 ffff880078375800 ffff880078375800 ffff88007ddb1c98 ffffffff812d358c ffff88007ddb1cb8 ffffffff811364f1 Call Trace:^M [<ffffffff812d8b45>] ? selinux_inode_permission+0x5/0x160 [<ffffffff812d358c>] security_inode_permission+0x1c/0x30 [<ffffffff811364f1>] __inode_permission+0x41/0xa0 [<ffffffff81136568>] inode_permission+0x18/0x50 [<ffffffff811378b6>] link_path_walk+0x66/0x920 [<ffffffff810875a5>] ? __rcu_read_lock+0x5/0x20 [<ffffffff8113a9e6>] path_openat+0xa6/0x6c0 [<ffffffff8113b000>] ? path_openat+0x6c0/0x6c0 [<ffffffff810c4e69>] ? __trace_graph_entry+0x49/0xc0 [<ffffffff8112b196>] ? do_sys_open+0x146/0x240 [<ffffffff8113b043>] do_filp_open+0x43/0xa0 [<ffffffff8113b005>] ? do_filp_open+0x5/0xa0 [<ffffffff8112b196>] do_sys_open+0x146/0x240 [<ffffffff8112b2ae>] SyS_open+0x1e/0x20 [<ffffffff81948cd0>] system_call_fastpath+0x16/0x1b Code: 84 a1 00 00 00 81 e3 00 20 00 00 89 d8 83 c8 02 40 f6 c6 04 0f 45 d8 40 f6 c6 08 74 71 80 cf 02 49 8b 46 38 4c 8d 4d cc 45 31 c0 <0f> b7 50 20 8b 70 1c 48 8b 41 70 89 d9 8b 78 04 e8 36 cf ff ff RIP [<ffffffff812d8bc5>] selinux_inode_permission+0x85/0x160 RSP <ffff88007ddb1c48> CR2: 0000000000000020 ---[ end trace 9d800e5ac5059462 ]--- Investigating, I found that the inode->i_security was NULL, and the dereference of it caused the oops. in selinux_inode_permission(): ---- isec = inode->i_security; rc = avc_has_perm_noaudit(sid, isec->sid, isec->sclass, perms, 0, &avd); ---- Note, the crash came from stressing the deletion and reading of debugfs files. I was not able to recreate this via normal files. But I'm not sure they are safe. It may just be that the race window is much harder to hit. What seems to have happened (and what I have traced), is the file is being opened at the same time the file or directory is being deleted. As the dentry and inode locks are not held during the path walk, nor is the inodes ref counts being incremented, there is nothing saving these structures from being discarded except for an rcu_read_lock(). The rcu_read_lock() protects against freeing of the inode, but it does not protect freeing of the inode_security_struct. Talking with Eric Paris about this, it seems to be a generic issue with the destroy_inode() calling security_inode_free() when the inode may still be in use (protected by rcu). It seems that the true destruction of the inode (done by __destroy_inode()) should also be protect by rcu. Cc: stable@vger.kernel.org Cc: Eric Paris <eparis@parisplace.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> --- This is based off of this thread: https://lkml.org/lkml/2014/1/9/349 And perhaps is the true fix for: http://oss.sgi.com/archives/xfs/2013-11/msg00709.html diff --git a/fs/inode.c b/fs/inode.c index 4bcdad3..a8f3b88 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -252,16 +252,17 @@ EXPORT_SYMBOL(__destroy_inode); static void i_callback(struct rcu_head *head) { struct inode *inode = container_of(head, struct inode, i_rcu); + __destroy_inode(inode); kmem_cache_free(inode_cachep, inode); } static void destroy_inode(struct inode *inode) { BUG_ON(!list_empty(&inode->i_lru)); - __destroy_inode(inode); - if (inode->i_sb->s_op->destroy_inode) + if (inode->i_sb->s_op->destroy_inode) { + __destroy_inode(inode); inode->i_sb->s_op->destroy_inode(inode); - else + } else call_rcu(&inode->i_rcu, i_callback); } ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH] vfs: Fix possible NULL pointer dereference in inode_permission() 2014-01-09 21:27 [PATCH] vfs: Fix possible NULL pointer dereference in inode_permission() Steven Rostedt @ 2014-01-09 21:42 ` Matthew Wilcox 2014-01-09 21:50 ` Steven Rostedt 0 siblings, 1 reply; 20+ messages in thread From: Matthew Wilcox @ 2014-01-09 21:42 UTC (permalink / raw) To: Steven Rostedt Cc: LKML, linux-fsdevel, Alexander Viro, Stephen Smalley, Christoph Hellwig, Linus Torvalds, Eric Paris, Theodore Ts'o, Dave Chinner, James Morris, Paul Moore, Andrew Morton, Paul E. McKenney, stable On Thu, Jan 09, 2014 at 04:27:31PM -0500, Steven Rostedt wrote: > Note, the crash came from stressing the deletion and reading of debugfs > files. I was not able to recreate this via normal files. But I'm not > sure they are safe. It may just be that the race window is much harder > to hit. But "normal" files have a 'destroy_inode' method. So you've basically only fixed it for debugfs (and maybe a few other unusual filesystems). Why doesn't the code look like this: static void i_callback(struct rcu_head *head) { struct inode *inode = container_of(head, struct inode, i_rcu); __destroy_inode(inode); if (inode->i_sb->s_op->destroy_inode) inode->i_sb->s_op->destroy_inode(inode); else kmem_cache_free(inode_cachep, inode); } static void destroy_inode(struct inode *inode) { BUG_ON(!list_empty(&inode->i_lru)); call_rcu(&inode->i_rcu, i_callback); } We'd then have to get rid of all the call_rcu() invocations in individual filesystems' destroy_inode methods, but that doesn't sound like a bad thing to me. -- Matthew Wilcox Intel Open Source Technology Centre "Bill, look, we understand that you're interested in selling us this operating system, but compare it to ours. We can't possibly take such a retrograde step." ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] vfs: Fix possible NULL pointer dereference in inode_permission() 2014-01-09 21:42 ` Matthew Wilcox @ 2014-01-09 21:50 ` Steven Rostedt 2014-01-09 22:31 ` Al Viro 0 siblings, 1 reply; 20+ messages in thread From: Steven Rostedt @ 2014-01-09 21:50 UTC (permalink / raw) To: Matthew Wilcox Cc: LKML, linux-fsdevel, Alexander Viro, Stephen Smalley, Christoph Hellwig, Linus Torvalds, Eric Paris, Theodore Ts'o, Dave Chinner, James Morris, Paul Moore, Andrew Morton, Paul E. McKenney, stable On Thu, 9 Jan 2014 14:42:39 -0700 Matthew Wilcox <matthew@wil.cx> wrote: > On Thu, Jan 09, 2014 at 04:27:31PM -0500, Steven Rostedt wrote: > > Note, the crash came from stressing the deletion and reading of debugfs > > files. I was not able to recreate this via normal files. But I'm not > > sure they are safe. It may just be that the race window is much harder > > to hit. > > But "normal" files have a 'destroy_inode' method. So you've basically > only fixed it for debugfs (and maybe a few other unusual filesystems). > Why doesn't the code look like this: Because I thought of that after I sent the email ;-) Well, that's not really true. I don't know the semantics of the destroy_inode() call. But I should have asked that in my change log. > > static void i_callback(struct rcu_head *head) > { > struct inode *inode = container_of(head, struct inode, i_rcu); > __destroy_inode(inode); > if (inode->i_sb->s_op->destroy_inode) > inode->i_sb->s_op->destroy_inode(inode); > else > kmem_cache_free(inode_cachep, inode); > } > > static void destroy_inode(struct inode *inode) > { > BUG_ON(!list_empty(&inode->i_lru)); > call_rcu(&inode->i_rcu, i_callback); > } > > We'd then have to get rid of all the call_rcu() invocations in individual > filesystems' destroy_inode methods, but that doesn't sound like a bad > thing to me. > Which is another reason that I didn't do it, as I didn't know all the happenings inside the ->destroy_inode() calls. But yeah, I agree with this. Also, can iput() sleep? If not then we are OK. Otherwise, we need to be careful about any mutex being grabbed in those call backs, as the rcu_callback can't sleep either. -- Steve ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] vfs: Fix possible NULL pointer dereference in inode_permission() 2014-01-09 21:50 ` Steven Rostedt @ 2014-01-09 22:31 ` Al Viro [not found] ` <CA+55aFzCTPYEQCPnLBi1CwmMTocVqCFiCuJ391HkVx1CMw61ug@mail.gmail.com> 0 siblings, 1 reply; 20+ messages in thread From: Al Viro @ 2014-01-09 22:31 UTC (permalink / raw) To: Steven Rostedt Cc: Matthew Wilcox, LKML, linux-fsdevel, Stephen Smalley, Christoph Hellwig, Linus Torvalds, Eric Paris, Theodore Ts'o, Dave Chinner, James Morris, Paul Moore, Andrew Morton, Paul E. McKenney, stable On Thu, Jan 09, 2014 at 04:50:12PM -0500, Steven Rostedt wrote: > > We'd then have to get rid of all the call_rcu() invocations in individual > > filesystems' destroy_inode methods, but that doesn't sound like a bad > > thing to me. Check what e.g. XFS is doing... > Which is another reason that I didn't do it, as I didn't know all the > happenings inside the ->destroy_inode() calls. But yeah, I agree with > this. > > Also, can iput() sleep? If not then we are OK. Otherwise, we need to be > careful about any mutex being grabbed in those call backs, as the > rcu_callback can't sleep either. iput() definitely can sleep (that's when actual truncation and inode freeing is done for opened-and-unlinked files - on the final iput() after close()), but that' irrelevant here - fsnotify_delete_inode() grabs a bunch of mutexes, which makes calling it from rcu callback no-go. ^ permalink raw reply [flat|nested] 20+ messages in thread
[parent not found: <CA+55aFzCTPYEQCPnLBi1CwmMTocVqCFiCuJ391HkVx1CMw61ug@mail.gmail.com>]
* Re: [PATCH] vfs: Fix possible NULL pointer dereference in inode_permission() [not found] ` <CA+55aFzCTPYEQCPnLBi1CwmMTocVqCFiCuJ391HkVx1CMw61ug@mail.gmail.com> @ 2014-01-09 23:10 ` Paul E. McKenney 2014-01-09 23:25 ` Steven Rostedt 1 sibling, 0 replies; 20+ messages in thread From: Paul E. McKenney @ 2014-01-09 23:10 UTC (permalink / raw) To: Linus Torvalds Cc: Al Viro, Dave Chinner, linux-fsdevel, James Morris, Andrew Morton, Stephen Smalley, Theodore Ts'o, Steven Rostedt, Eric Paris, stable, Paul Moore, LKML, Matthew Wilcox, Christoph Hellwig On Fri, Jan 10, 2014 at 06:41:03AM +0800, Linus Torvalds wrote: > I think the sane short term fix is to make the kfree() of the i_security > member be a rcu free, and not clear the member. Interesting use case. ;-) Thanx, Paul > Not pretty, but should did this case.. > > Linus > > On Jan 10, 2014 6:31 AM, "Al Viro" <viro@zeniv.linux.org.uk> wrote: > > > > iput() definitely can sleep (that's when actual truncation and inode > > freeing is done for opened-and-unlinked files - on the final iput() after > > close()), but that' irrelevant here - fsnotify_delete_inode() grabs > > a bunch of mutexes, which makes calling it from rcu callback no-go. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] vfs: Fix possible NULL pointer dereference in inode_permission() [not found] ` <CA+55aFzCTPYEQCPnLBi1CwmMTocVqCFiCuJ391HkVx1CMw61ug@mail.gmail.com> 2014-01-09 23:10 ` Paul E. McKenney @ 2014-01-09 23:25 ` Steven Rostedt 2014-01-09 23:27 ` Steven Rostedt 1 sibling, 1 reply; 20+ messages in thread From: Steven Rostedt @ 2014-01-09 23:25 UTC (permalink / raw) To: Linus Torvalds Cc: Al Viro, Paul McKenney, Dave Chinner, linux-fsdevel, James Morris, Andrew Morton, Stephen Smalley, Theodore Ts'o, Eric Paris, stable, Paul Moore, LKML, Matthew Wilcox, Christoph Hellwig On Fri, 10 Jan 2014 06:41:03 +0800 Linus Torvalds <torvalds@linux-foundation.org> wrote: > I think the sane short term fix is to make the kfree() of the i_security > member be a rcu free, and not clear the member. You mean my first patch? https://lkml.org/lkml/2014/1/9/349 -- Steve > > Not pretty, but should did this case.. > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] vfs: Fix possible NULL pointer dereference in inode_permission() 2014-01-09 23:25 ` Steven Rostedt @ 2014-01-09 23:27 ` Steven Rostedt 2014-01-09 23:37 ` Eric Paris 2014-01-09 23:45 ` Paul E. McKenney 0 siblings, 2 replies; 20+ messages in thread From: Steven Rostedt @ 2014-01-09 23:27 UTC (permalink / raw) To: Linus Torvalds Cc: Al Viro, Paul McKenney, Dave Chinner, linux-fsdevel, James Morris, Andrew Morton, Stephen Smalley, Theodore Ts'o, Eric Paris, stable, Paul Moore, LKML, Matthew Wilcox, Christoph Hellwig On Thu, 9 Jan 2014 18:25:23 -0500 Steven Rostedt <rostedt@goodmis.org> wrote: > On Fri, 10 Jan 2014 06:41:03 +0800 > Linus Torvalds <torvalds@linux-foundation.org> wrote: > > > I think the sane short term fix is to make the kfree() of the i_security > > member be a rcu free, and not clear the member. > > You mean my first patch? > > https://lkml.org/lkml/2014/1/9/349 > Oh wait, you said not to clear the member. Thus, the patch would look like this: Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Index: linux-trace.git/security/selinux/hooks.c =================================================================== --- linux-trace.git.orig/security/selinux/hooks.c +++ linux-trace.git/security/selinux/hooks.c @@ -234,6 +234,14 @@ static int inode_alloc_security(struct i return 0; } +static void inode_free_rcu(struct rcu_head *head) +{ + struct inode_security_struct *isec; + + isec = container_of(head, struct inode_security_struct, rcu); + kmem_cache_free(sel_inode_cache, isec); +} + static void inode_free_security(struct inode *inode) { struct inode_security_struct *isec = inode->i_security; @@ -244,8 +252,7 @@ static void inode_free_security(struct i list_del_init(&isec->list); spin_unlock(&sbsec->isec_lock); - inode->i_security = NULL; - kmem_cache_free(sel_inode_cache, isec); + call_rcu(&isec->rcu, inode_free_rcu); } static int file_alloc_security(struct file *file) Index: linux-trace.git/security/selinux/include/objsec.h =================================================================== --- linux-trace.git.orig/security/selinux/include/objsec.h +++ linux-trace.git/security/selinux/include/objsec.h @@ -38,7 +38,10 @@ struct task_security_struct { struct inode_security_struct { struct inode *inode; /* back pointer to inode object */ - struct list_head list; /* list of inode_security_struct */ + union { + struct list_head list; /* list of inode_security_struct */ + struct rcu_head rcu; /* for freeing the inode_security_struct */ + }; u32 task_sid; /* SID of creating task */ u32 sid; /* SID of this object */ u16 sclass; /* security class of this object */ ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] vfs: Fix possible NULL pointer dereference in inode_permission() 2014-01-09 23:27 ` Steven Rostedt @ 2014-01-09 23:37 ` Eric Paris 2014-01-09 23:45 ` Steven Rostedt 2014-01-09 23:53 ` Linus Torvalds 2014-01-09 23:45 ` Paul E. McKenney 1 sibling, 2 replies; 20+ messages in thread From: Eric Paris @ 2014-01-09 23:37 UTC (permalink / raw) To: Steven Rostedt Cc: Linus Torvalds, Al Viro, Paul McKenney, Dave Chinner, linux-fsdevel, James Morris, Andrew Morton, Stephen Smalley, Theodore Ts'o, stable, Paul Moore, LKML, Matthew Wilcox, Christoph Hellwig On Thu, 2014-01-09 at 18:27 -0500, Steven Rostedt wrote: > On Thu, 9 Jan 2014 18:25:23 -0500 > Steven Rostedt <rostedt@goodmis.org> wrote: > > > On Fri, 10 Jan 2014 06:41:03 +0800 > > Linus Torvalds <torvalds@linux-foundation.org> wrote: > > > > > I think the sane short term fix is to make the kfree() of the i_security > > > member be a rcu free, and not clear the member. > > > > You mean my first patch? > > > > https://lkml.org/lkml/2014/1/9/349 > > > > Oh wait, you said not to clear the member. Thus, the patch would look > like this: > > Signed-off-by: Steven Rostedt <rostedt@goodmis.org> SMACK also needs this change somehow in smack_inode_free_security() but at least from an SELinux PoV, I think it's quick and easy, but wrong for maintainability... > Index: linux-trace.git/security/selinux/hooks.c > =================================================================== > --- linux-trace.git.orig/security/selinux/hooks.c > +++ linux-trace.git/security/selinux/hooks.c > @@ -234,6 +234,14 @@ static int inode_alloc_security(struct i > return 0; > } > > +static void inode_free_rcu(struct rcu_head *head) > +{ > + struct inode_security_struct *isec; > + > + isec = container_of(head, struct inode_security_struct, rcu); > + kmem_cache_free(sel_inode_cache, isec); > +} > + > static void inode_free_security(struct inode *inode) > { > struct inode_security_struct *isec = inode->i_security; > @@ -244,8 +252,7 @@ static void inode_free_security(struct i > list_del_init(&isec->list); > spin_unlock(&sbsec->isec_lock); > > - inode->i_security = NULL; > - kmem_cache_free(sel_inode_cache, isec); > + call_rcu(&isec->rcu, inode_free_rcu); > } > > static int file_alloc_security(struct file *file) > Index: linux-trace.git/security/selinux/include/objsec.h > =================================================================== > --- linux-trace.git.orig/security/selinux/include/objsec.h > +++ linux-trace.git/security/selinux/include/objsec.h > @@ -38,7 +38,10 @@ struct task_security_struct { > > struct inode_security_struct { > struct inode *inode; /* back pointer to inode object */ > - struct list_head list; /* list of inode_security_struct */ > + union { > + struct list_head list; /* list of inode_security_struct */ > + struct rcu_head rcu; /* for freeing the inode_security_struct */ > + }; > u32 task_sid; /* SID of creating task */ > u32 sid; /* SID of this object */ > u16 sclass; /* security class of this object */ ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] vfs: Fix possible NULL pointer dereference in inode_permission() 2014-01-09 23:37 ` Eric Paris @ 2014-01-09 23:45 ` Steven Rostedt 2014-01-09 23:53 ` Linus Torvalds 1 sibling, 0 replies; 20+ messages in thread From: Steven Rostedt @ 2014-01-09 23:45 UTC (permalink / raw) To: Eric Paris Cc: Linus Torvalds, Al Viro, Paul McKenney, Dave Chinner, linux-fsdevel, James Morris, Andrew Morton, Stephen Smalley, Theodore Ts'o, stable, Paul Moore, LKML, Matthew Wilcox, Christoph Hellwig On Thu, 09 Jan 2014 18:37:06 -0500 Eric Paris <eparis@redhat.com> wrote: > > Oh wait, you said not to clear the member. Thus, the patch would look > > like this: > > > > Signed-off-by: Steven Rostedt <rostedt@goodmis.org> > > SMACK also needs this change somehow in smack_inode_free_security() > > but at least from an SELinux PoV, I think it's quick and easy, but wrong > for maintainability... I agree, but for this late in the -rc window, this may be the best thing for now. We can think of a better solution for the future. -- Steve ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] vfs: Fix possible NULL pointer dereference in inode_permission() 2014-01-09 23:37 ` Eric Paris 2014-01-09 23:45 ` Steven Rostedt @ 2014-01-09 23:53 ` Linus Torvalds 2014-01-10 0:06 ` Al Viro 1 sibling, 1 reply; 20+ messages in thread From: Linus Torvalds @ 2014-01-09 23:53 UTC (permalink / raw) To: Eric Paris Cc: Steven Rostedt, Al Viro, Paul McKenney, Dave Chinner, linux-fsdevel, James Morris, Andrew Morton, Stephen Smalley, Theodore Ts'o, stable, Paul Moore, LKML, Matthew Wilcox, Christoph Hellwig On Fri, Jan 10, 2014 at 7:37 AM, Eric Paris <eparis@redhat.com> wrote: > > but at least from an SELinux PoV, I think it's quick and easy, but wrong > for maintainability... Yeah, it's a hack, and it's wrong, and we should figure out how to do it right. Likely we should just tie the lifetime of the i_security member directly to the lifetime of the inode itself, and just make the rule be that security_inode_free() gets called from whatever frees the inode itself, and *not* have an extra rcu callback etc. But that sounds like a bigger change than I'm comfy with right now, so the hacky one might be the band-aid to do for stable.. The problem, of course, is that all the different filesystems have their own inode allocations/freeing. Of course, they all tend to share the same pattern ("call_rcu xyz_i_callback"), so maybe we could try to make that a more generic thing? Like have a "free_inode" vfs callback, and do the call_rcu delaying at the VFS level.. And maybe, just maybe, we could just say that that is what "destroy_inode()" is, and that we will just call it from rcu context. All the IO has hopefully been done earlier Yes/no? Linus ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] vfs: Fix possible NULL pointer dereference in inode_permission() 2014-01-09 23:53 ` Linus Torvalds @ 2014-01-10 0:06 ` Al Viro 2014-01-10 0:09 ` Al Viro ` (2 more replies) 0 siblings, 3 replies; 20+ messages in thread From: Al Viro @ 2014-01-10 0:06 UTC (permalink / raw) To: Linus Torvalds Cc: Eric Paris, Steven Rostedt, Paul McKenney, Dave Chinner, linux-fsdevel, James Morris, Andrew Morton, Stephen Smalley, Theodore Ts'o, stable, Paul Moore, LKML, Matthew Wilcox, Christoph Hellwig On Fri, Jan 10, 2014 at 07:53:41AM +0800, Linus Torvalds wrote: > On Fri, Jan 10, 2014 at 7:37 AM, Eric Paris <eparis@redhat.com> wrote: > > > > but at least from an SELinux PoV, I think it's quick and easy, but wrong > > for maintainability... > > Yeah, it's a hack, and it's wrong, and we should figure out how to do > it right. Likely we should just tie the lifetime of the i_security > member directly to the lifetime of the inode itself, and just make the > rule be that security_inode_free() gets called from whatever frees the > inode itself, and *not* have an extra rcu callback etc. But that > sounds like a bigger change than I'm comfy with right now, so the > hacky one might be the band-aid to do for stable.. > > The problem, of course, is that all the different filesystems have > their own inode allocations/freeing. Of course, they all tend to share > the same pattern ("call_rcu xyz_i_callback"), so maybe we could try to > make that a more generic thing? Like have a "free_inode" vfs callback, > and do the call_rcu delaying at the VFS level.. > > And maybe, just maybe, we could just say that that is what > "destroy_inode()" is, and that we will just call it from rcu context. > All the IO has hopefully been done earlier Yes/no? Check what XFS is doing ;-/ That's where those call_rcu() have come from. Sure, we can separate the simple "just do call_rcu(...->free_inode)" case and hit it whenever full ->free_inode is there and ->destroy_inode isn't. Not too pretty, but removal of tons of boilerplate might be worth doing that anyway. But ->destroy_inode() is still needed for cases where fs has its own idea of inode lifetime rules. Again, check what XFS is doing in that area... There's an extra source of headache, BTW - what about the "LSM stacking" crowd and their plans? ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] vfs: Fix possible NULL pointer dereference in inode_permission() 2014-01-10 0:06 ` Al Viro @ 2014-01-10 0:09 ` Al Viro 2014-01-10 0:18 ` Linus Torvalds 2014-01-10 9:31 ` Christoph Hellwig 2 siblings, 0 replies; 20+ messages in thread From: Al Viro @ 2014-01-10 0:09 UTC (permalink / raw) To: Linus Torvalds Cc: Eric Paris, Steven Rostedt, Paul McKenney, Dave Chinner, linux-fsdevel, James Morris, Andrew Morton, Stephen Smalley, Theodore Ts'o, stable, Paul Moore, LKML, Matthew Wilcox, Christoph Hellwig On Fri, Jan 10, 2014 at 12:06:42AM +0000, Al Viro wrote: > and hit it whenever full ->free_inode is there and ->destroy_inode isn't. Grr... "whenever ->free_inode is there and full ->destroy_inode isn't". Apologies for sloppy editing... ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] vfs: Fix possible NULL pointer dereference in inode_permission() 2014-01-10 0:06 ` Al Viro 2014-01-10 0:09 ` Al Viro @ 2014-01-10 0:18 ` Linus Torvalds 2014-01-10 2:36 ` James Morris 2014-01-10 9:31 ` Christoph Hellwig 2 siblings, 1 reply; 20+ messages in thread From: Linus Torvalds @ 2014-01-10 0:18 UTC (permalink / raw) To: Al Viro Cc: Eric Paris, Steven Rostedt, Paul McKenney, Dave Chinner, linux-fsdevel, James Morris, Andrew Morton, Stephen Smalley, Theodore Ts'o, stable, Paul Moore, LKML, Matthew Wilcox, Christoph Hellwig On Fri, Jan 10, 2014 at 8:06 AM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > Sure, we can separate the simple "just do call_rcu(...->free_inode)" case > and hit it whenever full ->free_inode is there and ->destroy_inode isn't. > Not too pretty, but removal of tons of boilerplate might be worth doing > that anyway. Yeah. > But ->destroy_inode() is still needed for cases where fs > has its own idea of inode lifetime rules. Again, check what XFS is doing > in that area... Ok, so we can't change destroy_inode, and we'd need to add a new op for just freeing it. Painful mainly because there are so many filesystems, but it shouldn't be *complicated*. > There's an extra source of headache, BTW - what about the "LSM stacking" > crowd and their plans? LSM stacking is a pipedream right now anyway, isn't it? It's been talked about for years and years, I've never seen a patch-set that is even remotely something we'd seriously consider. Linus ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] vfs: Fix possible NULL pointer dereference in inode_permission() 2014-01-10 0:18 ` Linus Torvalds @ 2014-01-10 2:36 ` James Morris 0 siblings, 0 replies; 20+ messages in thread From: James Morris @ 2014-01-10 2:36 UTC (permalink / raw) To: Linus Torvalds Cc: Al Viro, Eric Paris, Steven Rostedt, Paul McKenney, Dave Chinner, linux-fsdevel, James Morris, Andrew Morton, Stephen Smalley, Theodore Ts'o, stable, Paul Moore, LKML, Matthew Wilcox, Christoph Hellwig On Fri, 10 Jan 2014, Linus Torvalds wrote: > > There's an extra source of headache, BTW - what about the "LSM stacking" > > crowd and their plans? > > LSM stacking is a pipedream right now anyway, isn't it? It's been > talked about for years and years, I've never seen a patch-set that is > even remotely something we'd seriously consider. > And there's no good justification for it. -- James Morris <jmorris@namei.org> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] vfs: Fix possible NULL pointer dereference in inode_permission() 2014-01-10 0:06 ` Al Viro 2014-01-10 0:09 ` Al Viro 2014-01-10 0:18 ` Linus Torvalds @ 2014-01-10 9:31 ` Christoph Hellwig 2014-01-10 18:14 ` Ben Myers 2 siblings, 1 reply; 20+ messages in thread From: Christoph Hellwig @ 2014-01-10 9:31 UTC (permalink / raw) To: Al Viro Cc: Linus Torvalds, Eric Paris, Steven Rostedt, Paul McKenney, Dave Chinner, linux-fsdevel, James Morris, Andrew Morton, Stephen Smalley, Theodore Ts'o, stable, Paul Moore, LKML, Matthew Wilcox, xfs On Fri, Jan 10, 2014 at 12:06:42AM +0000, Al Viro wrote: > Check what XFS is doing ;-/ That's where those call_rcu() have come from. > Sure, we can separate the simple "just do call_rcu(...->free_inode)" case > and hit it whenever full ->free_inode is there and ->destroy_inode isn't. > Not too pretty, but removal of tons of boilerplate might be worth doing > that anyway. But ->destroy_inode() is still needed for cases where fs > has its own idea of inode lifetime rules. Again, check what XFS is doing > in that area... Btw, I'd really love to get rid of the XFS ->destroy_inode abuse, it's been a long time thorn in the flesh. What's really needed there to make XFS behave more similar to everyone else is a way for the filesystem to say: "I can't actually free this inode right now, but I'll come back to you later". That's what we actually do right now, except we pretend that the VFS inode gets freed, while its memory lives on (punt intended). ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] vfs: Fix possible NULL pointer dereference in inode_permission() 2014-01-10 9:31 ` Christoph Hellwig @ 2014-01-10 18:14 ` Ben Myers 2014-01-11 10:32 ` Christoph Hellwig 0 siblings, 1 reply; 20+ messages in thread From: Ben Myers @ 2014-01-10 18:14 UTC (permalink / raw) To: Christoph Hellwig Cc: Theodore Ts'o, Paul Moore, Matthew Wilcox, Linus Torvalds, Eric Paris, LKML, Steven Rostedt, xfs, Al Viro, James Morris, linux-fsdevel, stable, Andrew Morton, Paul McKenney, Stephen Smalley Christoph, On Fri, Jan 10, 2014 at 01:31:48AM -0800, Christoph Hellwig wrote: > On Fri, Jan 10, 2014 at 12:06:42AM +0000, Al Viro wrote: > > Check what XFS is doing ;-/ That's where those call_rcu() have come from. > > Sure, we can separate the simple "just do call_rcu(...->free_inode)" case > > and hit it whenever full ->free_inode is there and ->destroy_inode isn't. > > Not too pretty, but removal of tons of boilerplate might be worth doing > > that anyway. But ->destroy_inode() is still needed for cases where fs > > has its own idea of inode lifetime rules. Again, check what XFS is doing > > in that area... > > Btw, I'd really love to get rid of the XFS ->destroy_inode abuse, it's > been a long time thorn in the flesh. I believe this behavior is related to freeing of an inode cluster. > What's really needed there to make XFS behave more similar to everyone > else is a way for the filesystem to say: "I can't actually free this > inode right now, but I'll come back to you later". This test might read something like: "If my link count has gone to zero, and I am the last inode in my cluster to be freed, and there are other inodes from my cluster incore, I cannot be freed." Should be doable. Maybe there are other reasons. -Ben _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] vfs: Fix possible NULL pointer dereference in inode_permission() 2014-01-10 18:14 ` Ben Myers @ 2014-01-11 10:32 ` Christoph Hellwig 0 siblings, 0 replies; 20+ messages in thread From: Christoph Hellwig @ 2014-01-11 10:32 UTC (permalink / raw) To: Ben Myers Cc: Theodore Ts'o, Paul Moore, Matthew Wilcox, Linus Torvalds, Eric Paris, LKML, Steven Rostedt, xfs, Christoph Hellwig, Al Viro, James Morris, linux-fsdevel, stable, Andrew Morton, Paul McKenney, Stephen Smalley On Fri, Jan 10, 2014 at 12:14:34PM -0600, Ben Myers wrote: > > What's really needed there to make XFS behave more similar to everyone > > else is a way for the filesystem to say: "I can't actually free this > > inode right now, but I'll come back to you later". > > This test might read something like: "If my link count has gone to zero, and I > am the last inode in my cluster to be freed, and there are other inodes from my > cluster incore, I cannot be freed." It's more complicated than that. In theory we would free the inode easily as soon as the VFS wants it, but performance would be horrible as we would have to synchronously write back the inode. Note that it really matters for the interface, that just needs to be an: I won't free this right now, but I'll call you back later when I can. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] vfs: Fix possible NULL pointer dereference in inode_permission() 2014-01-09 23:27 ` Steven Rostedt 2014-01-09 23:37 ` Eric Paris @ 2014-01-09 23:45 ` Paul E. McKenney 2014-01-09 23:59 ` Steven Rostedt 1 sibling, 1 reply; 20+ messages in thread From: Paul E. McKenney @ 2014-01-09 23:45 UTC (permalink / raw) To: Steven Rostedt Cc: Linus Torvalds, Al Viro, Dave Chinner, linux-fsdevel, James Morris, Andrew Morton, Stephen Smalley, Theodore Ts'o, Eric Paris, stable, Paul Moore, LKML, Matthew Wilcox, Christoph Hellwig On Thu, Jan 09, 2014 at 06:27:56PM -0500, Steven Rostedt wrote: > On Thu, 9 Jan 2014 18:25:23 -0500 > Steven Rostedt <rostedt@goodmis.org> wrote: > > > On Fri, 10 Jan 2014 06:41:03 +0800 > > Linus Torvalds <torvalds@linux-foundation.org> wrote: > > > > > I think the sane short term fix is to make the kfree() of the i_security > > > member be a rcu free, and not clear the member. > > > > You mean my first patch? > > > > https://lkml.org/lkml/2014/1/9/349 > > > > Oh wait, you said not to clear the member. Thus, the patch would look > like this: > > Signed-off-by: Steven Rostedt <rostedt@goodmis.org> > > Index: linux-trace.git/security/selinux/hooks.c > =================================================================== > --- linux-trace.git.orig/security/selinux/hooks.c > +++ linux-trace.git/security/selinux/hooks.c > @@ -234,6 +234,14 @@ static int inode_alloc_security(struct i > return 0; > } > > +static void inode_free_rcu(struct rcu_head *head) > +{ > + struct inode_security_struct *isec; > + > + isec = container_of(head, struct inode_security_struct, rcu); > + kmem_cache_free(sel_inode_cache, isec); > +} > + > static void inode_free_security(struct inode *inode) > { > struct inode_security_struct *isec = inode->i_security; > @@ -244,8 +252,7 @@ static void inode_free_security(struct i > list_del_init(&isec->list); > spin_unlock(&sbsec->isec_lock); > > - inode->i_security = NULL; > - kmem_cache_free(sel_inode_cache, isec); > + call_rcu(&isec->rcu, inode_free_rcu); Does not clearing ->i_security mean that RCU readers can traverse this pointer after the invocation of call_rcu()? If so, this is problematic. (If something else already prevents readers from getting here, no problem.) Thanx, Paul > } > > static int file_alloc_security(struct file *file) > Index: linux-trace.git/security/selinux/include/objsec.h > =================================================================== > --- linux-trace.git.orig/security/selinux/include/objsec.h > +++ linux-trace.git/security/selinux/include/objsec.h > @@ -38,7 +38,10 @@ struct task_security_struct { > > struct inode_security_struct { > struct inode *inode; /* back pointer to inode object */ > - struct list_head list; /* list of inode_security_struct */ > + union { > + struct list_head list; /* list of inode_security_struct */ > + struct rcu_head rcu; /* for freeing the inode_security_struct */ > + }; > u32 task_sid; /* SID of creating task */ > u32 sid; /* SID of this object */ > u16 sclass; /* security class of this object */ > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] vfs: Fix possible NULL pointer dereference in inode_permission() 2014-01-09 23:45 ` Paul E. McKenney @ 2014-01-09 23:59 ` Steven Rostedt 2014-01-10 0:44 ` Paul E. McKenney 0 siblings, 1 reply; 20+ messages in thread From: Steven Rostedt @ 2014-01-09 23:59 UTC (permalink / raw) To: paulmck Cc: Linus Torvalds, Al Viro, Dave Chinner, linux-fsdevel, James Morris, Andrew Morton, Stephen Smalley, Theodore Ts'o, Eric Paris, stable, Paul Moore, LKML, Matthew Wilcox, Christoph Hellwig On Thu, 9 Jan 2014 15:45:37 -0800 "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote: > > static void inode_free_security(struct inode *inode) > > { > > struct inode_security_struct *isec = inode->i_security; > > @@ -244,8 +252,7 @@ static void inode_free_security(struct i > > list_del_init(&isec->list); > > spin_unlock(&sbsec->isec_lock); > > > > - inode->i_security = NULL; > > - kmem_cache_free(sel_inode_cache, isec); > > + call_rcu(&isec->rcu, inode_free_rcu); > > Does not clearing ->i_security mean that RCU readers can traverse > this pointer after the invocation of call_rcu()? If so, this is > problematic. (If something else already prevents readers from getting > here, no problem.) This is called when we are about to free the inode. Look at destroy_inode(). Basically, this is the same as doing: call_rcu(&isec->rcu, inode_free_rcu); call_rcu(&inode->i_rcu, i_callback); Where i_callback() does the free of the inode. If you can access inode->i_security, after a call_rcu, then you can also access the inode itself that has just been freed. Yes, technically, having two separate call_rcu(), the first grace period can end before the second, but everything to remove the inode from sight has already been set up before that first call_rcu() is made. That means when the first call_rcu() is executed, the inode should already be invisible to the readers. - Steve > > Thanx, Paul > > > } > > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] vfs: Fix possible NULL pointer dereference in inode_permission() 2014-01-09 23:59 ` Steven Rostedt @ 2014-01-10 0:44 ` Paul E. McKenney 0 siblings, 0 replies; 20+ messages in thread From: Paul E. McKenney @ 2014-01-10 0:44 UTC (permalink / raw) To: Steven Rostedt Cc: Linus Torvalds, Al Viro, Dave Chinner, linux-fsdevel, James Morris, Andrew Morton, Stephen Smalley, Theodore Ts'o, Eric Paris, stable, Paul Moore, LKML, Matthew Wilcox, Christoph Hellwig On Thu, Jan 09, 2014 at 06:59:07PM -0500, Steven Rostedt wrote: > On Thu, 9 Jan 2014 15:45:37 -0800 > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote: > > > > static void inode_free_security(struct inode *inode) > > > { > > > struct inode_security_struct *isec = inode->i_security; > > > @@ -244,8 +252,7 @@ static void inode_free_security(struct i > > > list_del_init(&isec->list); > > > spin_unlock(&sbsec->isec_lock); > > > > > > - inode->i_security = NULL; > > > - kmem_cache_free(sel_inode_cache, isec); > > > + call_rcu(&isec->rcu, inode_free_rcu); > > > > Does not clearing ->i_security mean that RCU readers can traverse > > this pointer after the invocation of call_rcu()? If so, this is > > problematic. (If something else already prevents readers from getting > > here, no problem.) > > This is called when we are about to free the inode. Look at > destroy_inode(). Basically, this is the same as doing: > > call_rcu(&isec->rcu, inode_free_rcu); > call_rcu(&inode->i_rcu, i_callback); > > Where i_callback() does the free of the inode. > > If you can access inode->i_security, after a call_rcu, then you can > also access the inode itself that has just been freed. > > Yes, technically, having two separate call_rcu(), the first grace > period can end before the second, but everything to remove the inode > from sight has already been set up before that first call_rcu() is > made. That means when the first call_rcu() is executed, the inode > should already be invisible to the readers. Got it, should be fine then, sorry for the noise. Thanx, Paul ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2014-01-11 10:32 UTC | newest] Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2014-01-09 21:27 [PATCH] vfs: Fix possible NULL pointer dereference in inode_permission() Steven Rostedt 2014-01-09 21:42 ` Matthew Wilcox 2014-01-09 21:50 ` Steven Rostedt 2014-01-09 22:31 ` Al Viro [not found] ` <CA+55aFzCTPYEQCPnLBi1CwmMTocVqCFiCuJ391HkVx1CMw61ug@mail.gmail.com> 2014-01-09 23:10 ` Paul E. McKenney 2014-01-09 23:25 ` Steven Rostedt 2014-01-09 23:27 ` Steven Rostedt 2014-01-09 23:37 ` Eric Paris 2014-01-09 23:45 ` Steven Rostedt 2014-01-09 23:53 ` Linus Torvalds 2014-01-10 0:06 ` Al Viro 2014-01-10 0:09 ` Al Viro 2014-01-10 0:18 ` Linus Torvalds 2014-01-10 2:36 ` James Morris 2014-01-10 9:31 ` Christoph Hellwig 2014-01-10 18:14 ` Ben Myers 2014-01-11 10:32 ` Christoph Hellwig 2014-01-09 23:45 ` Paul E. McKenney 2014-01-09 23:59 ` Steven Rostedt 2014-01-10 0:44 ` Paul E. McKenney
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).