From: Mimi Zohar <zohar@linux.vnet.ibm.com>
To: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Oleg Nesterov <oleg@redhat.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	". James Morris" <jmorris@namei.org>,
	linux-security-module@vger.kernel.org,
	linux-kernel <linux-kernel@vger.kernel.org>,
	David Howells <dhowells@redhat.com>
Subject: Re: [PATCH 0/4] Was: deferring __fput()
Date: Mon, 02 Jul 2012 07:49:50 -0400
Message-ID: <1341229790.2350.1.camel@falcor>
In-Reply-To: <20120702051155.GF22927@ZenIV.linux.org.uk>

On Mon, 2012-07-02 at 06:11 +0100, Al Viro wrote:
> On Mon, Jul 02, 2012 at 04:43:10AM +0100, Al Viro wrote:
> > On Sun, Jul 01, 2012 at 09:46:31PM -0400, Mimi Zohar wrote:
> > > On Sun, 2012-07-01 at 21:57 +0100, Al Viro wrote:
> > > > On Sun, Jul 01, 2012 at 03:50:02PM -0400, Mimi Zohar wrote:
> > > > > Replacing it with a call to __fput(), the system boots.
> > > > 
> > > > "it" being just the part under that if (unlikely(...)))?  Very interesting...  If so, we
> > > > > have some kernel thread ending up with delayed __fput() which somehow makes dracut (assuming
> > > > you are using fedora initramfs to go with fedora config) unhappy.  With your own patch,
> > > > doing async __fput() in a lot of cases when this one doesn't delay past the return to
> > > > userland managing to survive the boot...  I wonder which files end up triggering that fun
> > > > and which kernel thread is responsible...  Could you slap a printk() in there, showing
> > > > file->f_dentry->d_inode->i_mode (octal) and at least file->f_dentry->d_name.name?
> > > > Along with the current->comm[], all under that inner if ().  And see which ones end up
> > > > going that way by the time execve() of /sbin/init fails.
> > > 
> > > pid=1 uid=0 d_name=init comm=swapper/0 dev="rootfs" mode=100775
> > > pid=1 uid=0 d_name=bash comm=swapper/0 dev="rootfs" mode=100755
> > 
> > OK...  Here's what I suspect is going on:
> > 	* populating initramfs writes binaries there.  We open files (for write) from
> > the kernel thread (there's nothing other than kernel threads at that point), write to
> > them, then close().  Final fput() gets delayed.
> > 	* Then we proceed to execve().  Which means mapping the binary with MAP_DENYWRITE.
> > Which fails, since there's a struct file still opened for write on that sucker.
> > 
> > Your patch did not delay those fput() - they were done without ->mmap_sem held.  So
> > it survived.  Booting without initramfs always survives; booting with initramfs may
> > or may not survive, depending on the timings - if that scheduled work manages to
> > run by the time we do those execve(), we win.  Note that async_synchronize_full()
> > done in init_post() might easily affect that, depending on config.
> > 
> > As a quick test, could you try slapping a delay somewhere around the beginning
> > of init_post() and see if it rescues the system?
> 
> Ho-hum...  How about this (modulo missing documentation of the whole sad mess):

Sorry, neither adding the delay nor this patch helped.

> diff --git a/fs/file_table.c b/fs/file_table.c
> index 470da0b..00fd849 100644
> --- a/fs/file_table.c
> +++ b/fs/file_table.c
> @@ -284,6 +284,11 @@ static void ____fput(struct callback_head *work)
>  	__fput(container_of(work, struct file, f_u.fu_rcuhead));
>  }
> 
> +void flush_delayed_fput(void)
> +{
> +	delayed_fput(NULL);
> +}
> +
>  static DECLARE_WORK(delayed_fput_work, delayed_fput);
> 
>  void fput(struct file *file)
> diff --git a/include/linux/file.h b/include/linux/file.h
> index 58bf158..d9a4f5a 100644
> --- a/include/linux/file.h
> +++ b/include/linux/file.h
> @@ -39,4 +39,6 @@ extern void put_unused_fd(unsigned int fd);
> 
>  extern void fd_install(unsigned int fd, struct file *file);
> 
> +extern void flush_delayed_fput(void);
> +
>  #endif /* __LINUX_FILE_H */
> diff --git a/init/main.c b/init/main.c
> index b5cc0a7..3f151f6 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -68,6 +68,7 @@
>  #include <linux/shmem_fs.h>
>  #include <linux/slab.h>
>  #include <linux/perf_event.h>
> +#include <linux/file.h>
> 
>  #include <asm/io.h>
>  #include <asm/bugs.h>
> @@ -804,8 +805,8 @@ static noinline int init_post(void)
>  	system_state = SYSTEM_RUNNING;
>  	numa_default_policy();
> 
> -
>  	current->signal->flags |= SIGNAL_UNKILLABLE;
> +	flush_delayed_fput();
> 
>  	if (ramdisk_execute_command) {
>  		run_init_process(ramdisk_execute_command);
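
For reference, the hypothesis quoted above (a final fput() that is still
pending keeps the file counted as open for write, so the exec is refused)
and the intent of flush_delayed_fput() can be sketched as a small userspace
model. This is not kernel code: every toy_* name is hypothetical, and the
write-count rule is a simplified assumption modelled on the kernel's
i_writecount convention (positive while writers exist, negative while
writes are denied). It only shows what the hypothesis predicts; as reported
above, neither the delay nor the patch rescued the boot here, so the model
does not by itself explain the failure.

```c
/* Userspace toy model, not kernel code. All toy_* names are hypothetical. */
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_inode {
	int i_writecount;	/* > 0: writers exist, < 0: writes are denied */
};

struct toy_file {
	struct toy_inode *inode;
	struct toy_file *next;	/* link in the pending final-put list */
};

static struct toy_file *toy_pending;	/* files whose final put is deferred */

/* Open for write: refused while an exec-style deny-write is in force. */
static int toy_open_write(struct toy_file *f, struct toy_inode *inode)
{
	if (inode->i_writecount < 0)
		return -ETXTBSY;
	inode->i_writecount++;
	f->inode = inode;
	f->next = NULL;
	return 0;
}

/* What exec needs: refused while any writer is still counted. */
static int toy_deny_write_access(struct toy_inode *inode)
{
	if (inode->i_writecount > 0)
		return -ETXTBSY;
	inode->i_writecount--;
	return 0;
}

/* close(): the final put is only queued, so the write count stays held. */
static void toy_fput_deferred(struct toy_file *f)
{
	f->next = toy_pending;
	toy_pending = f;
}

/* Run every queued final put now, dropping the write counts. */
static void toy_flush_delayed_fput(void)
{
	while (toy_pending) {
		struct toy_file *f = toy_pending;

		toy_pending = f->next;
		f->inode->i_writecount--;
	}
}

/*
 * Write a binary, close it with a deferred final put, then try to exec it.
 * Returns the result of the exec-time deny-write attempt.
 */
static int toy_scenario(int flush_first)
{
	struct toy_inode init_inode = { 0 };
	struct toy_file f;
	int ret;

	toy_pending = NULL;
	ret = toy_open_write(&f, &init_inode);
	assert(ret == 0);		/* nothing denies writes yet */
	toy_fput_deferred(&f);

	if (flush_first)
		toy_flush_delayed_fput();

	ret = toy_deny_write_access(&init_inode);
	toy_flush_delayed_fput();	/* drain before f goes out of scope */
	return ret;
}
```

In this model toy_scenario(0) returns -ETXTBSY, because the queued put
still holds the write count, while toy_scenario(1) returns 0, which is the
ordering the flush_delayed_fput() call in init_post() is meant to enforce.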


