From: Calvin Owens <calvinowens@fb.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Miklos Szeredi <miklos@szeredi.hu>, Zefan Li <lizefan@huawei.com>,
	Oleg Nesterov <oleg@redhat.com>, Joe Perches <joe@perches.com>,
	David Howells <dhowells@redhat.com>,
	<linux-kernel@vger.kernel.org>, <kernel-team@fb.com>,
	Andy Lutomirski <luto@amacapital.net>,
	Cyrill Gorcunov <gorcunov@openvz.org>,
	Kees Cook <keescook@chromium.org>,
	"Kirill A. Shutemov" <kirill@shutemov.name>
Subject: Re: [PATCH v6] procfs: Always expose /proc/<pid>/map_files/ and make it readable
Date: Tue, 9 Jun 2015 18:39:02 -0700
Message-ID: <20150610013902.GA176908@mail.thefacebook.com>
In-Reply-To: <20150609141300.b80eeec15b2c379146816c06@linux-foundation.org>

On Tuesday 06/09 at 14:13 -0700, Andrew Morton wrote:
> On Mon, 8 Jun 2015 20:39:33 -0700 Calvin Owens <calvinowens@fb.com> wrote:
> 
> > Currently, /proc/<pid>/map_files/ is restricted to CAP_SYS_ADMIN, and
> > is only exposed if CONFIG_CHECKPOINT_RESTORE is set.
> > 
> > This interface is very useful because it allows userspace to stat()
> > deleted files that are still mapped by some process, which enables a
> > much quicker and more accurate answer to the question "How much disk
> > space is being consumed by files that are deleted but still mapped?"
> > than is currently possible.
> 
> Why is that information useful?
> 
> I could perhaps think of some use for "How much disk space is being
> consumed by files that are deleted but still open", but to count the
> mmapped-then-unlinked files while excluding the opened-then-unlinked
> files seems damned peculiar.

Let's phrase the question a bit more generically:

"How much disk space is being consumed by files that have been
unlinked, but are still referenced by some process?"

There are two pieces to this problem:
	1) Unlinked files that are still open (whether mapped or not)
	2) Unlinked files that are not open, but are still mapped

You can track down everything in (1) using /proc/<pid>/fd/*, and you
can use stat() to figure out how much space they're using.
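
As a rough sketch of (1), and not part of the patch: the function below
walks /proc/<pid>/fd, stat()s each entry through the magic symlink, and
adds up st_blocks for regular files whose link count has dropped to
zero. The name unlinked_open_bytes() is made up for illustration, the
512-byte unit for st_blocks is the Linux convention, and a file that is
open on several descriptors is counted once per descriptor.

```c
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Sum the disk space (in bytes) held by regular files that `pid` still has
 * open but that no longer have any name in the filesystem.
 * Returns -1 if /proc/<pid>/fd cannot be opened.
 * (Hypothetical helper, for illustration only.)
 */
long long unlinked_open_bytes(pid_t pid)
{
	char path[64];
	struct dirent *de;
	struct stat st;
	long long total = 0;
	DIR *dir;

	snprintf(path, sizeof(path), "/proc/%d/fd", (int)pid);
	dir = opendir(path);
	if (!dir)
		return -1;

	while ((de = readdir(dir)) != NULL) {
		if (de->d_name[0] == '.')
			continue;
		/* Without AT_SYMLINK_NOFOLLOW this stats the open file itself */
		if (fstatat(dirfd(dir), de->d_name, &st, 0) == -1)
			continue;
		/* st_nlink == 0: every name for this inode has been unlinked */
		if (S_ISREG(st.st_mode) && st.st_nlink == 0)
			total += (long long)st.st_blocks * 512;
	}
	closedir(dir);
	return total;
}
```

Calling unlinked_open_bytes(getpid()) needs no privileges; inspecting
another process is subject to the usual procfs permission checks.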

But directly measuring how much space (2) consumes is actually not
currently possible from userspace: there's no way to stat() the files.
You can get the inode number from /proc/<pid>/maps, but that still
doesn't get you anywhere because it's been unlinked from the
filesystem.

So I'm not looking to measure (2) and exclude (1): I'm looking to have
a way to directly measure (2) at all.

The reason I say "directly", and I say "quicker and more accurate" in
the original message, is that there is a very ugly way to answer this
question right now: you sum up the number of blocks used by every file
on the disk and subtract it from what statfs() tells you. This
obviously stinks, and becomes untenable once your filesystem is large
enough.
 
> IOW, this changelog failed to explain the value of the patch.  Bad
> changelog!  Please sell it to us.  Preferably with real-world use
> cases.

The real-world use case is catching long-lived processes that leak
references to temporary files and waste space on the disk. When such
processes leak file-backed mappings, this wasted space is especially
difficult to detect until it gets out of hand. The map_files/
interface eliminates this difficulty.
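
To make that concrete, here is a sketch of how a monitoring tool might
use the interface. It assumes the caller is permitted to stat() the
entries under /proc/<pid>/map_files/, which is what this patch is
about; where that is not allowed, the entries are simply skipped. The
name unlinked_mapped_bytes() and the MAX_SEEN cap are made up for
illustration, and st_blocks is taken to be in 512-byte units as on
Linux.

```c
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

#define MAX_SEEN 1024	/* arbitrary cap on distinct inodes for this sketch */

/*
 * Sum the bytes held by unlinked regular files that are mapped by `pid`.
 * Each (device, inode) pair is counted once, however many mappings refer
 * to it (up to MAX_SEEN distinct files).
 * Returns -1 if /proc/<pid>/map_files cannot be opened.
 */
long long unlinked_mapped_bytes(pid_t pid)
{
	struct { dev_t dev; ino_t ino; } seen[MAX_SEEN];
	char path[64];
	struct dirent *de;
	struct stat st;
	long long total = 0;
	int i, nseen = 0;
	DIR *dir;

	snprintf(path, sizeof(path), "/proc/%d/map_files", (int)pid);
	dir = opendir(path);
	if (!dir)
		return -1;

	while ((de = readdir(dir)) != NULL) {
		if (de->d_name[0] == '.')
			continue;
		/* Entries are named <start>-<end>; stat the file behind them */
		if (fstatat(dirfd(dir), de->d_name, &st, 0) == -1)
			continue;
		if (!S_ISREG(st.st_mode) || st.st_nlink != 0)
			continue;
		for (i = 0; i < nseen; i++)
			if (seen[i].dev == st.st_dev && seen[i].ino == st.st_ino)
				break;
		if (i < nseen)
			continue;	/* already counted via another mapping */
		if (nseen < MAX_SEEN) {
			seen[nseen].dev = st.st_dev;
			seen[nseen].ino = st.st_ino;
			nseen++;
		}
		total += (long long)st.st_blocks * 512;
	}
	closedir(dir);
	return total;
}
```

A file that is both open and mapped would also show up in a scan of
/proc/<pid>/fd/, so a real tool would need to deduplicate across the
two sources as well.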

I've included a little test program at the end of this message to
illustrate what I'm getting at here. It creates a file at /tmp/DELETEDFILE:

	calvinowens@Haydn:~$ gcc test.c 
	calvinowens@Haydn:~$ ./a.out &
	[1] 5832
	Holding mapping at 0x7fe74d1ea000
	calvinowens@Haydn:~$ lsof -p `pgrep a.out`
	COMMAND  PID        USER   FD   TYPE DEVICE SIZE/OFF    NODE NAME
	a.out   5832 calvinowens  cwd    DIR  254,1     4096 3413033 /home/calvinowens
	a.out   5832 calvinowens  rtd    DIR  254,1     4096       2 /
	a.out   5832 calvinowens  txt    REG  254,1     7512 3408268 /home/calvinowens/a.out
	a.out   5832 calvinowens  mem    REG  254,1  1729984 4456767 /lib/x86_64-linux-gnu/libc-2.19.so
	a.out   5832 calvinowens  mem    REG  254,1   140928 4456619 /lib/x86_64-linux-gnu/ld-2.19.so
	a.out   5832 calvinowens  mem    REG   0,32    32768  184946 /tmp/DELETEDFILE
	a.out   5832 calvinowens    0u   CHR  136,2      0t0       5 /dev/pts/2
	a.out   5832 calvinowens    1u   CHR  136,2      0t0       5 /dev/pts/2
	a.out   5832 calvinowens    2u   CHR  136,2      0t0       5 /dev/pts/2
	calvinowens@Haydn:~$ killall a.out
	[1]+  Terminated              ./a.out
	calvinowens@Haydn:~$ gcc -DDO_UNLINK test.c 
	calvinowens@Haydn:~$ ./a.out &
	[1] 5842
	Holding mapping at 0x7fec8ae63000
	calvinowens@Haydn:~$ lsof -p `pgrep a.out`
	COMMAND  PID        USER   FD   TYPE DEVICE SIZE/OFF    NODE NAME
	a.out   5842 calvinowens  cwd    DIR  254,1     4096 3413033 /home/calvinowens
	a.out   5842 calvinowens  rtd    DIR  254,1     4096       2 /
	a.out   5842 calvinowens  txt    REG  254,1     7640 3408268 /home/calvinowens/a.out
	a.out   5842 calvinowens  mem    REG  254,1  1729984 4456767 /lib/x86_64-linux-gnu/libc-2.19.so
	a.out   5842 calvinowens  mem    REG  254,1   140928 4456619 /lib/x86_64-linux-gnu/ld-2.19.so
	a.out   5842 calvinowens  DEL    REG   0,32           184946 /tmp/DELETEDFILE
	a.out   5842 calvinowens    0u   CHR  136,2      0t0       5 /dev/pts/2
	a.out   5842 calvinowens    1u   CHR  136,2      0t0       5 /dev/pts/2
	a.out   5842 calvinowens    2u   CHR  136,2      0t0       5 /dev/pts/2

Notice the gap under "SIZE/OFF" in the second output? This is because
lsof has no way to determine the size of the unlinked file.
That's the functionality "hole" I'm trying to fill with this patch.

Does that all seem sensible?

Thanks,
Calvin

--
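#include <stdlib.h>
#include <stdio.h>
#include <limits.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>

/*
 * Create a 32K file, map its first page, then close the descriptor so that
 * the mapping is the only reference left. Build with -DDO_UNLINK to also
 * remove the file's name while the mapping is held.
 */
int main(void)
{
	int ret, fd;
	void *map;

	fd = open("/tmp/DELETEDFILE", O_CREAT|O_TRUNC|O_RDWR, 0777);
	if (fd == -1)
		return EXIT_FAILURE;

	/* Give the file a size so lsof has something to report */
	ret = ftruncate(fd, 32768);
	if (ret == -1)
		return EXIT_FAILURE;

	map = mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE,
			fd, 0);
	if (map == MAP_FAILED)
		return EXIT_FAILURE;

	/* The mapping keeps the file referenced after the fd is gone */
	close(fd);
#ifdef DO_UNLINK
	unlink("/tmp/DELETEDFILE");
#endif

	printf("Holding mapping at %p\n", map);
	/* Sleep forever so the process can be inspected with lsof */
	while (1)
		sleep(UINT_MAX);
}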
