All of lore.kernel.org
 help / color / mirror / Atom feed
* /proc/pid/fd/ shows strange mode when executed via sudo.
@ 2012-05-02 13:40 Tetsuo Handa
  2012-05-03 15:42 ` Serge Hallyn
  0 siblings, 1 reply; 20+ messages in thread
From: Tetsuo Handa @ 2012-05-02 13:40 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-security-module

I noticed a strange difference in /proc/pid/fd/ directory
when a command is executed via /usr/bin/sudo.

Say, there are three files in some directory.
(In my environment, /tmp/ is a plain ext4 partition.)

# touch /tmp/1
# touch /tmp/2
# touch /tmp/3
# ls -l /tmp/?
-rw-r--r-- 1 root root 0 May  2 21:48 /tmp/1
-rw-r--r-- 1 root root 0 May  2 21:48 /tmp/2
-rw-r--r-- 1 root root 0 May  2 21:48 /tmp/3

Try to read one of them using "tail -f" from one terminal.

# tail -f /tmp/1

Show /proc/pid/fd/ from another terminal.

# ls -l /proc/`pidof tail`/fd/
total 0
lrwx------ 1 root root 64 May  2 21:54 0 -> /dev/pts/0
lrwx------ 1 root root 64 May  2 21:54 1 -> /dev/pts/0
lrwx------ 1 root root 64 May  2 21:54 2 -> /dev/pts/0
lr-x------ 1 root root 64 May  2 21:54 3 -> /tmp/1
lr-x------ 1 root root 64 May  2 21:54 4 -> anon_inode:inotify

Quit the "tail -f". Try to read two of them using "tail -f".

# tail -f /tmp/1 /tmp/2

Show /proc/pid/fd/ from another terminal.

# ls -l /proc/`pidof tail`/fd/
total 0
lrwx------ 1 root root 64 May  2 21:54 0 -> /dev/pts/0
lrwx------ 1 root root 64 May  2 21:54 1 -> /dev/pts/0
lrwx------ 1 root root 64 May  2 21:54 2 -> /dev/pts/0
lr-x------ 1 root root 64 May  2 21:54 3 -> /tmp/1
lr-x------ 1 root root 64 May  2 21:54 4 -> /tmp/2
lr-x------ 1 root root 64 May  2 21:54 5 -> anon_inode:inotify

Quit the "tail -f". Try to read three of them using "tail -f".

# tail -f /tmp/1 /tmp/2 /tmp/3

Show /proc/pid/fd/ from another terminal.

# ls -l /proc/`pidof tail`/fd/
total 0
lrwx------ 1 root root 64 May  2 21:55 0 -> /dev/pts/0
lrwx------ 1 root root 64 May  2 21:55 1 -> /dev/pts/0
lrwx------ 1 root root 64 May  2 21:55 2 -> /dev/pts/0
lr-x------ 1 root root 64 May  2 21:55 3 -> /tmp/1
lr-x------ 1 root root 64 May  2 21:55 4 -> /tmp/2
lr-x------ 1 root root 64 May  2 21:55 5 -> /tmp/3
lr-x------ 1 root root 64 May  2 21:55 6 -> anon_inode:inotify

Quit the "tail -f". You see, they are all fine.

However, the output is different when executed via /usr/bin/sudo .

Try to read one of them using "sudo tail -f" from one terminal.

# sudo tail -f /tmp/1

Show /proc/pid/fd/ from another terminal.

# ls -l /proc/`pidof tail`/fd/
total 0
lrwx------ 1 root root 64 May  2 21:55 0 -> /dev/pts/0
lrwx------ 1 root root 64 May  2 21:55 1 -> /dev/pts/0
lrwx------ 1 root root 64 May  2 21:55 2 -> /dev/pts/0
lrwx------ 1 root root 64 May  2 21:55 3 -> /tmp/1
lrwx------ 1 root root 64 May  2 21:55 4 -> anon_inode:inotify

Quit the "tail -f". Try to read two of them using "sudo tail -f".

# sudo tail -f /tmp/1 /tmp/2

Show /proc/pid/fd/ from another terminal.

# ls -l /proc/`pidof tail`/fd/
total 0
lrwx------ 1 root root 64 May  2 21:56 0 -> /dev/pts/0
lrwx------ 1 root root 64 May  2 21:56 1 -> /dev/pts/0
lrwx------ 1 root root 64 May  2 21:56 2 -> /dev/pts/0
lrwx------ 1 root root 64 May  2 21:56 3 -> /tmp/1
lrwx------ 1 root root 64 May  2 21:56 4 -> /tmp/2
lr-x------ 1 root root 64 May  2 21:56 5 -> anon_inode:inotify

Quit the "tail -f". Try to read three of them using "sudo tail -f".

# sudo tail -f /tmp/1 /tmp/2 /tmp/3

Show /proc/pid/fd/ from another terminal.

# ls -l /proc/`pidof tail`/fd/
total 0
lrwx------ 1 root root 64 May  2 21:56 0 -> /dev/pts/0
lrwx------ 1 root root 64 May  2 21:56 1 -> /dev/pts/0
lrwx------ 1 root root 64 May  2 21:56 2 -> /dev/pts/0
lrwx------ 1 root root 64 May  2 21:56 3 -> /tmp/1
lrwx------ 1 root root 64 May  2 21:56 4 -> /tmp/2
lr-x------ 1 root root 64 May  2 21:56 5 -> /tmp/3
lr-x------ 1 root root 64 May  2 21:56 6 -> anon_inode:inotify

Quit the "tail -f".

You see, when executed via /usr/bin/sudo , fd == 3 and fd == 4 are reported as
"lrwx------" whereas fd >= 5 are reported as "lr-x------".

# strace -f -e open sudo tail -f /tmp/1 /tmp/2 /tmp/3

shows that /usr/bin/tail is opening /tmp/1 /tmp/2 /tmp/3 as O_RDONLY.
/usr/bin/sudo can't set w bit before /usr/bin/tail opens them with r bit.
I wonder from where the w bit came...

Above result was obtained using kernel 3.2.0-24-generic-pae (3.2.0-24.37) on
Ubuntu 12.04, but below result (similar but not identical) was obtained using
vanilla 3.4-rc5 kernel on CentOS 6.2.

-- (normal case. normal result.)
# tail -f /tmp/1 /tmp/2

# ls -l /proc/`pidof tail`/fd/
total 0
lrwx------ 1 root root 64 May  2 21:04 0 -> /dev/pts/2
lrwx------ 1 root root 64 May  2 21:04 1 -> /dev/pts/2
lrwx------ 1 root root 64 May  2 21:04 2 -> /dev/pts/2
lr-x------ 1 root root 64 May  2 21:04 3 -> /tmp/1
lr-x------ 1 root root 64 May  2 21:04 4 -> /tmp/2
lr-x------ 1 root root 64 May  2 21:04 5 -> anon_inode:inotify
-- (sudo case. only fd == 3 got w bit.)
# sudo tail -f /tmp/1 /tmp/2

# ls -l /proc/`pidof tail`/fd/
total 0
lrwx------ 1 root root 64 May  2 21:05 0 -> /dev/pts/2
lrwx------ 1 root root 64 May  2 21:05 1 -> /dev/pts/2
lrwx------ 1 root root 64 May  2 21:05 2 -> /dev/pts/2
lrwx------ 1 root root 64 May  2 21:05 3 -> /tmp/1
lr-x------ 1 root root 64 May  2 21:05 4 -> /tmp/2
lr-x------ 1 root root 64 May  2 21:05 5 -> anon_inode:inotify
-- (normal case. normal result.)
# tail -f /tmp/1 /tmp/2 /tmp/3

# ls -l /proc/`pidof tail`/fd/
total 0
lrwx------ 1 root root 64 May  2 21:07 0 -> /dev/pts/2
lrwx------ 1 root root 64 May  2 21:07 1 -> /dev/pts/2
lrwx------ 1 root root 64 May  2 21:07 2 -> /dev/pts/2
lr-x------ 1 root root 64 May  2 21:07 3 -> /tmp/1
lr-x------ 1 root root 64 May  2 21:07 4 -> /tmp/2
lr-x------ 1 root root 64 May  2 21:07 5 -> /tmp/3
lr-x------ 1 root root 64 May  2 21:07 6 -> anon_inode:inotify
-- (sudo case. fd == 3 and fd == 6 got w bit.)
# sudo tail -f /tmp/1 /tmp/2 /tmp/3

# ls -l /proc/`pidof tail`/fd/
total 0
lrwx------ 1 root root 64 May  2 21:07 0 -> /dev/pts/2
lrwx------ 1 root root 64 May  2 21:07 1 -> /dev/pts/2
lrwx------ 1 root root 64 May  2 21:07 2 -> /dev/pts/2
lrwx------ 1 root root 64 May  2 21:07 3 -> /tmp/1
lr-x------ 1 root root 64 May  2 21:07 4 -> /tmp/2
lr-x------ 1 root root 64 May  2 21:07 5 -> /tmp/3
lrwx------ 1 root root 64 May  2 21:07 6 -> anon_inode:inotify

I guess something is wrong.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: /proc/pid/fd/ shows strange mode when executed via sudo.
  2012-05-02 13:40 /proc/pid/fd/ shows strange mode when executed via sudo Tetsuo Handa
@ 2012-05-03 15:42 ` Serge Hallyn
  2012-05-03 16:25   ` Tetsuo Handa
  0 siblings, 1 reply; 20+ messages in thread
From: Serge Hallyn @ 2012-05-03 15:42 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: linux-fsdevel, linux-security-module, keescook

Quoting Tetsuo Handa (penguin-kernel@I-love.SAKURA.ne.jp):
> I noticed a strange difference in /proc/pid/fd/ directory
> when a command is executed via /usr/bin/sudo.
> 
> Say, there are three files in some directory.
> (In my environment, /tmp/ is a plain ext4 partition.)
> 
> # touch /tmp/1
> # touch /tmp/2
> # touch /tmp/3
> # ls -l /tmp/?
> -rw-r--r-- 1 root root 0 May  2 21:48 /tmp/1
> -rw-r--r-- 1 root root 0 May  2 21:48 /tmp/2
> -rw-r--r-- 1 root root 0 May  2 21:48 /tmp/3
> 
> Try to read one of them using "tail -f" from one terminal.
> 
> # tail -f /tmp/1
> 
> Show /proc/pid/fd/ from another terminal.
> 
> # ls -l /proc/`pidof tail`/fd/
> total 0
> lrwx------ 1 root root 64 May  2 21:54 0 -> /dev/pts/0
> lrwx------ 1 root root 64 May  2 21:54 1 -> /dev/pts/0
> lrwx------ 1 root root 64 May  2 21:54 2 -> /dev/pts/0
> lr-x------ 1 root root 64 May  2 21:54 3 -> /tmp/1
> lr-x------ 1 root root 64 May  2 21:54 4 -> anon_inode:inotify
> 
> Quit the "tail -f". Try to read two of them using "tail -f".
> 
> # tail -f /tmp/1 /tmp/2
> 
> Show /proc/pid/fd/ from another terminal.
> 
> # ls -l /proc/`pidof tail`/fd/
> total 0
> lrwx------ 1 root root 64 May  2 21:54 0 -> /dev/pts/0
> lrwx------ 1 root root 64 May  2 21:54 1 -> /dev/pts/0
> lrwx------ 1 root root 64 May  2 21:54 2 -> /dev/pts/0
> lr-x------ 1 root root 64 May  2 21:54 3 -> /tmp/1
> lr-x------ 1 root root 64 May  2 21:54 4 -> /tmp/2
> lr-x------ 1 root root 64 May  2 21:54 5 -> anon_inode:inotify
> 
> Quit the "tail -f". Try to read three of them using "tail -f".
> 
> # tail -f /tmp/1 /tmp/2 /tmp/3
> 
> Show /proc/pid/fd/ from another terminal.
> 
> # ls -l /proc/`pidof tail`/fd/
> total 0
> lrwx------ 1 root root 64 May  2 21:55 0 -> /dev/pts/0
> lrwx------ 1 root root 64 May  2 21:55 1 -> /dev/pts/0
> lrwx------ 1 root root 64 May  2 21:55 2 -> /dev/pts/0
> lr-x------ 1 root root 64 May  2 21:55 3 -> /tmp/1
> lr-x------ 1 root root 64 May  2 21:55 4 -> /tmp/2
> lr-x------ 1 root root 64 May  2 21:55 5 -> /tmp/3
> lr-x------ 1 root root 64 May  2 21:55 6 -> anon_inode:inotify
> 
> Quit the "tail -f". You see, they are all fine.
> 
> However, the output is different when executed via /usr/bin/sudo .
> 
> Try to read one of them using "sudo tail -f" from one terminal.
> 
> # sudo tail -f /tmp/1
> 
> Show /proc/pid/fd/ from another terminal.
> 
> # ls -l /proc/`pidof tail`/fd/
> total 0
> lrwx------ 1 root root 64 May  2 21:55 0 -> /dev/pts/0
> lrwx------ 1 root root 64 May  2 21:55 1 -> /dev/pts/0
> lrwx------ 1 root root 64 May  2 21:55 2 -> /dev/pts/0
> lrwx------ 1 root root 64 May  2 21:55 3 -> /tmp/1
> lrwx------ 1 root root 64 May  2 21:55 4 -> anon_inode:inotify
> 
> Quit the "tail -f". Try to read two of them using "sudo tail -f".
> 
> # sudo tail -f /tmp/1 /tmp/2
> 
> Show /proc/pid/fd/ from another terminal.
> 
> # ls -l /proc/`pidof tail`/fd/
> total 0
> lrwx------ 1 root root 64 May  2 21:56 0 -> /dev/pts/0
> lrwx------ 1 root root 64 May  2 21:56 1 -> /dev/pts/0
> lrwx------ 1 root root 64 May  2 21:56 2 -> /dev/pts/0
> lrwx------ 1 root root 64 May  2 21:56 3 -> /tmp/1
> lrwx------ 1 root root 64 May  2 21:56 4 -> /tmp/2
> lr-x------ 1 root root 64 May  2 21:56 5 -> anon_inode:inotify
> 
> Quit the "tail -f". Try to read three of them using "sudo tail -f".
> 
> # sudo tail -f /tmp/1 /tmp/2 /tmp/3
> 
> Show /proc/pid/fd/ from another terminal.
> 
> # ls -l /proc/`pidof tail`/fd/
> total 0
> lrwx------ 1 root root 64 May  2 21:56 0 -> /dev/pts/0
> lrwx------ 1 root root 64 May  2 21:56 1 -> /dev/pts/0
> lrwx------ 1 root root 64 May  2 21:56 2 -> /dev/pts/0
> lrwx------ 1 root root 64 May  2 21:56 3 -> /tmp/1
> lrwx------ 1 root root 64 May  2 21:56 4 -> /tmp/2
> lr-x------ 1 root root 64 May  2 21:56 5 -> /tmp/3
> lr-x------ 1 root root 64 May  2 21:56 6 -> anon_inode:inotify
> 
> Quit the "tail -f".
> 
> You see, when executed via /usr/bin/sudo , fd == 3 and fd == 4 are reported as
> "lrwx------" whereas fd >= 5 are reported as "lr-x------".
> 
> # strace -f -e open sudo tail -f /tmp/1 /tmp/2 /tmp/3
> 
> shows that /usr/bin/tail is opening /tmp/1 /tmp/2 /tmp/3 as O_RDONLY.
> /usr/bin/sudo can't set w bit before /usr/bin/tail opens them with r bit.
> I wonder from where the w bit came...

Note that if you do

sudo strace -f -e open tail -f /tmp/{1,2,3,4}

then the fds are not opened with write perms.  But if you do as you did,

strace -f -e open sudo tail -f /tmp/1 /tmp/2 /tmp/3

they are.  Interesting.

The same thing also happens for me with tmpfs, and with a debian sid ec2
instance running 2.6.32-5-xen-amd64.

> Above result was obtained using kernel 3.2.0-24-generic-pae (3.2.0-24.37) on
> Ubuntu 12.04, but below result (similar but not identical) was obtained using
> vanilla 3.4-rc5 kernel on CentOS 6.2.
> 
> -- (normal case. normal result.)
> # tail -f /tmp/1 /tmp/2
> 
> # ls -l /proc/`pidof tail`/fd/
> total 0
> lrwx------ 1 root root 64 May  2 21:04 0 -> /dev/pts/2
> lrwx------ 1 root root 64 May  2 21:04 1 -> /dev/pts/2
> lrwx------ 1 root root 64 May  2 21:04 2 -> /dev/pts/2
> lr-x------ 1 root root 64 May  2 21:04 3 -> /tmp/1
> lr-x------ 1 root root 64 May  2 21:04 4 -> /tmp/2
> lr-x------ 1 root root 64 May  2 21:04 5 -> anon_inode:inotify
> -- (sudo case. only fd == 3 got w bit.)
> # sudo tail -f /tmp/1 /tmp/2
> 
> # ls -l /proc/`pidof tail`/fd/
> total 0
> lrwx------ 1 root root 64 May  2 21:05 0 -> /dev/pts/2
> lrwx------ 1 root root 64 May  2 21:05 1 -> /dev/pts/2
> lrwx------ 1 root root 64 May  2 21:05 2 -> /dev/pts/2
> lrwx------ 1 root root 64 May  2 21:05 3 -> /tmp/1
> lr-x------ 1 root root 64 May  2 21:05 4 -> /tmp/2
> lr-x------ 1 root root 64 May  2 21:05 5 -> anon_inode:inotify
> -- (normal case. normal result.)
> # tail -f /tmp/1 /tmp/2 /tmp/3
> 
> # ls -l /proc/`pidof tail`/fd/
> total 0
> lrwx------ 1 root root 64 May  2 21:07 0 -> /dev/pts/2
> lrwx------ 1 root root 64 May  2 21:07 1 -> /dev/pts/2
> lrwx------ 1 root root 64 May  2 21:07 2 -> /dev/pts/2
> lr-x------ 1 root root 64 May  2 21:07 3 -> /tmp/1
> lr-x------ 1 root root 64 May  2 21:07 4 -> /tmp/2
> lr-x------ 1 root root 64 May  2 21:07 5 -> /tmp/3
> lr-x------ 1 root root 64 May  2 21:07 6 -> anon_inode:inotify
> -- (sudo case. fd == 3 and fd == 6 got w bit.)
> # sudo tail -f /tmp/1 /tmp/2 /tmp/3
> 
> # ls -l /proc/`pidof tail`/fd/
> total 0
> lrwx------ 1 root root 64 May  2 21:07 0 -> /dev/pts/2
> lrwx------ 1 root root 64 May  2 21:07 1 -> /dev/pts/2
> lrwx------ 1 root root 64 May  2 21:07 2 -> /dev/pts/2
> lrwx------ 1 root root 64 May  2 21:07 3 -> /tmp/1
> lr-x------ 1 root root 64 May  2 21:07 4 -> /tmp/2
> lr-x------ 1 root root 64 May  2 21:07 5 -> /tmp/3
> lrwx------ 1 root root 64 May  2 21:07 6 -> anon_inode:inotify
> 
> I guess something is wrong.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: /proc/pid/fd/ shows strange mode when executed via sudo.
  2012-05-03 15:42 ` Serge Hallyn
@ 2012-05-03 16:25   ` Tetsuo Handa
  2012-05-18  2:39     ` Tetsuo Handa
  0 siblings, 1 reply; 20+ messages in thread
From: Tetsuo Handa @ 2012-05-03 16:25 UTC (permalink / raw)
  To: serge.hallyn; +Cc: linux-fsdevel, linux-security-module, keescook

Serge Hallyn wrote:
> Note that if you do
> 
> sudo strace -f -e open tail -f /tmp/{1,2,3,4}
> 
> then the fds are not opened with write perms.  But if you do as you did,
> 
> strace -f -e open sudo tail -f /tmp/1 /tmp/2 /tmp/3
> 
> they are.  Interesting.
> 
They are not opened with write perms, for vfs_write() rejects write requests
with -EBADF.

----- test.c start -----
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#define O_LARGEFILE 00100000

int main(int argc, char *argv[])
{
	int fd1 = open("/tmp/1", O_RDONLY | O_LARGEFILE);
	int fd2 = open("/tmp/2", O_RDONLY | O_LARGEFILE);
	int fd3 = open("/tmp/3", O_RDONLY | O_LARGEFILE);
	int fd4 = open("/tmp/4", O_RDONLY | O_LARGEFILE);
	printf("fd1=%d fd2=%d fd3=%d fd4=%d\n", fd1, fd2, fd3, fd4);
	printf("write(fd1)=%d write(fd2)=%d write(fd3)=%d write(fd4)=%d\n",
	       write(fd1, "", 1), write(fd2, "", 1),
	       write(fd3, "", 1), write(fd4, "", 1));
	getchar();
	return 0;
}
----- test.c end -----

# sudo ./a.out
fd1=3 fd2=4 fd3=5 fd4=6
write(fd1)=-1 write(fd2)=-1 write(fd3)=-1 write(fd4)=-1

Also, /proc/pid/fdinfo/ shows correct info.
So, the problem is nothing but that the /proc/pid/fd/ is showing strange mode.
But I can't imagine what can make /proc/pid/fd/ wrong when /proc/pid/fdinfo/ is
correct...

> The same thing also happens for me with tmpfs, and with a debian sid ec2
> instance running 2.6.32-5-xen-amd64.

2.6.32-220.13.1.el6 has this problem but
2.6.18-308.4.1.el5 does not have this problem.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: /proc/pid/fd/ shows strange mode when executed via sudo.
  2012-05-03 16:25   ` Tetsuo Handa
@ 2012-05-18  2:39     ` Tetsuo Handa
  2012-05-18  9:27       ` Tetsuo Handa
  0 siblings, 1 reply; 20+ messages in thread
From: Tetsuo Handa @ 2012-05-18  2:39 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: linux-security-module, serge.hallyn, keescook

It turned out that /usr/bin/sudo is using /proc/self/fd/ for closing already
opened files. I made a simple demo program that can reproduce this regression.

---------- test.c start ----------
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdlib.h>
#include <dirent.h>
#include <string.h>

static void opentest(void)
{
	FILE *fp = fopen("/dev/tty", "a");
	int i;
	char buffer[1024];
	memset(buffer, 0, sizeof(buffer));
	for (i = 0; i < 5; i++) {
		struct stat buf;
		int fd = open("/proc/self/exe", O_RDONLY);
		if (fd == EOF)
			break;
		snprintf(buffer, sizeof(buffer) - 1, "/proc/self/fd/%u", fd);
		if (lstat(buffer, &buf))
			continue;
		if ((buf.st_mode & 0700) == 0700) {
			char buffer2[1024];
			memset(buffer2, 0, sizeof(buffer2));
			readlink(buffer, buffer2, sizeof(buffer2) - 1);
			fprintf(fp, "%s -> %s \n", buffer, buffer2);
		}
	}
}

int main(int argc, char *argv[])
{
	DIR *dirp = (argc > 1) ? opendir("/proc/self/fd") : NULL;
	if (dirp) {
		struct dirent *dent;
		fprintf(stderr, "closefrom with /proc/self/fd/\n");
		while ((dent = readdir(dirp)) != NULL) {
			int fd;
			if (sscanf(dent->d_name, "%u", &fd) == 1 &&
			    fd != dirfd(dirp))
				close(fd);
		}
		closedir(dirp);
	} else {
		int fd;
		fprintf(stderr, "closefrom without /proc/self/fd/\n");
		for (fd = 0; fd < 1024; fd++)
			close(fd);
	}
	opentest();
	return 0;
}
---------- test.c end ----------

[root@ccsecurity tmp]# ./a.out 1
closefrom with /proc/self/fd/
/proc/self/fd/1 -> /tmp/a.out
/proc/self/fd/2 -> /tmp/a.out
[root@ccsecurity tmp]# ./a.out
closefrom without /proc/self/fd/
[root@ccsecurity tmp]#

I tried on three kernels.

  2.6.18-308.4.1.el5 : OK
  2.6.26-2-686 (2.6.26-26lenny4) : NG
  2.6.32-220.17.1.el6 : NG

This regression seems to be introduced between 2.6.19 and 2.6.26.
This regression seems to involve opendir()/closedir() usage.

Regards.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: /proc/pid/fd/ shows strange mode when executed via sudo.
  2012-05-18  2:39     ` Tetsuo Handa
@ 2012-05-18  9:27       ` Tetsuo Handa
  2012-05-18 16:08         ` Linus Torvalds
  0 siblings, 1 reply; 20+ messages in thread
From: Tetsuo Handa @ 2012-05-18  9:27 UTC (permalink / raw)
  To: ebiederm; +Cc: linux-fsdevel, akpm, torvalds

Hello.

I noticed that /proc/self/fd/ returns wrong information to lstat() when files
are open()ed after already opened files have been close()d in accordance with
entries returned from opendir("/proc/self/fd").

Bisection reached to below commit added between 2.6.18 and 2.6.19-rc1.

  git bisect start '3f2e05e90e0846c42626e3d272454f26be34a1bc' 'e9ff3990f08e9a0c2839cc22808b01732ea5b3e4' '1651e14e28a2d9f446018ef522882e0709a2ce4f' '43fa1adb9334bf4585cd53144eb5911488f85bc7' '609d7fa9565c754428d2520cac2accc9052e1245' 'ee0b3e671baff681d69fbf0db33b47603c0a8280' 'cf342e52e3117391868fb4bd900ce772a27a5a1a' 'v2.6.18' 'v2.6.17' 'v2.6.16' 'v2.6.15' 'v2.6.14' 'v2.6.13' 'v2.6.12'
  git bisect good 5f4c6bc1f369f20807a8e753c2308d1629478c61
  git bisect bad 5e6b3f42edc20e988b186fbfb9eec174294222ea
  git bisect good 801199ce805a2412bbcd9bfe213092ec656013dd
  git bisect bad 61a28784028e6d55755e4d0f39bee8d9bf2ee8d9
  git bisect good 444ceed8d186631fdded5e3f24dc20b93d0d3fda

Below commit replaced filldir() with proc_pident_fill_cache().
Eric, would you check (though no need to be urgent)?

commit 61a28784028e6d55755e4d0f39bee8d9bf2ee8d9
Author: Eric W. Biederman <ebiederm@xmission.com>
Date:   Mon Oct 2 02:18:49 2006 -0700

    [PATCH] proc: Remove the hard coded inode numbers

    The hard coded inode numbers in proc currently limit its maintainability,
    its flexibility, and what can be done with the rest of system.  /proc limits
    pid-max to 32768 on 32 bit systems it limits fd-max to 32768 on all systems,
    and placing the pid in the inode number really gets in the way of implementing
    subdirectories of per process information.

    Ever since people started adding to the middle of the file type enumeration we
    haven't been maintaing the historical inode numbers, all we have really
    succeeded in doing is keeping the pid in the proc inode number.  The pid is
    already available in the directory name so no information is lost removing it
    from the inode number.

    So if something in user space cares if we remove the inode number from the
    /proc inode it is almost certainly broken.

    Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>

:040000 040000 1bf9577e5bab5f938e780de96ab79003523688ec 46f890c72c75bfb6ecaee8c985f5127bcf82eef3 M      fs

Tetsuo Handa wrote:
> It turned out that /usr/bin/sudo is using /proc/self/fd/ for closing already
> opened files. I made a simple demo program that can reproduce this regression.
> 
> ---------- test.c start ----------
> #include <stdio.h>
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <unistd.h>
> #include <stdlib.h>
> #include <dirent.h>
> #include <string.h>
> 
> static void opentest(void)
> {
> 	FILE *fp = fopen("/dev/tty", "a");
> 	int i;
> 	char buffer[1024];
> 	memset(buffer, 0, sizeof(buffer));
> 	for (i = 0; i < 5; i++) {
> 		struct stat buf;
> 		int fd = open("/proc/self/exe", O_RDONLY);
> 		if (fd == EOF)
> 			break;
> 		snprintf(buffer, sizeof(buffer) - 1, "/proc/self/fd/%u", fd);
> 		if (lstat(buffer, &buf))
> 			continue;
> 		if ((buf.st_mode & 0700) == 0700) {
> 			char buffer2[1024];
> 			memset(buffer2, 0, sizeof(buffer2));
> 			readlink(buffer, buffer2, sizeof(buffer2) - 1);
> 			fprintf(fp, "%s -> %s \n", buffer, buffer2);
> 		}
> 	}
> }
> 
> int main(int argc, char *argv[])
> {
> 	DIR *dirp = (argc > 1) ? opendir("/proc/self/fd") : NULL;
> 	if (dirp) {
> 		struct dirent *dent;
> 		fprintf(stderr, "closefrom with /proc/self/fd/\n");
> 		while ((dent = readdir(dirp)) != NULL) {
> 			int fd;
> 			if (sscanf(dent->d_name, "%u", &fd) == 1 &&
> 			    fd != dirfd(dirp))
> 				close(fd);
> 		}
> 		closedir(dirp);
> 	} else {
> 		int fd;
> 		fprintf(stderr, "closefrom without /proc/self/fd/\n");
> 		for (fd = 0; fd < 1024; fd++)
> 			close(fd);
> 	}
> 	opentest();
> 	return 0;
> }
> ---------- test.c end ----------
> 
> [root@ccsecurity tmp]# ./a.out 1
> closefrom with /proc/self/fd/
> /proc/self/fd/1 -> /tmp/a.out
> /proc/self/fd/2 -> /tmp/a.out
> [root@ccsecurity tmp]# ./a.out
> closefrom without /proc/self/fd/
> [root@ccsecurity tmp]#
> 
> I tried on three kernels.
> 
>   2.6.18-308.4.1.el5 : OK
>   2.6.26-2-686 (2.6.26-26lenny4) : NG
>   2.6.32-220.17.1.el6 : NG
> 
> This regression seems to be introduced between 2.6.19 and 2.6.26.
> This regression seems to involve opendir()/closedir() usage.
> 
> Regards.
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: /proc/pid/fd/ shows strange mode when executed via sudo.
  2012-05-18  9:27       ` Tetsuo Handa
@ 2012-05-18 16:08         ` Linus Torvalds
  2012-05-18 16:25           ` Linus Torvalds
  2012-05-18 18:08           ` Al Viro
  0 siblings, 2 replies; 20+ messages in thread
From: Linus Torvalds @ 2012-05-18 16:08 UTC (permalink / raw)
  To: Tetsuo Handa, Al Viro; +Cc: ebiederm, linux-fsdevel, akpm

[-- Attachment #1: Type: text/plain, Size: 3218 bytes --]

[ Added Al explicitly, although I guess he sees the fsdevel posts too ]

On Fri, May 18, 2012 at 2:27 AM, Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> I noticed that /proc/self/fd/ returns wrong information to lstat() when files
> are open()ed after already opened files have been close()d in accordance with
> entries returned from opendir("/proc/self/fd").

So if I understand this correctly, the wrong information is otherwise
fine, except the link has the S_IWUSR bit set. Correct?

The code that determines the mode of the link is
proc_fd_instantiate(), which does

        inode->i_mode = S_IFLNK;

        /*
         * We are not taking a ref to the file structure, so we must
         * hold ->file_lock.
         */
        spin_lock(&files->file_lock);
        file = fcheck_files(files, fd);
        if (!file)
                goto out_unlock;
        if (file->f_mode & FMODE_READ)
                inode->i_mode |= S_IRUSR | S_IXUSR;
        if (file->f_mode & FMODE_WRITE)
                inode->i_mode |= S_IWUSR | S_IXUSR;
        spin_unlock(&files->file_lock);

and what *seems* to happen is that your test program basically
triggers a situation where the old /proc/self/fd/<x> dentry has not
been replaced, so it contains the previous one that was writable
(because you used to have a writable fd there before you closed it).
The "readdir()" you have probably only matters because it instantiates
the dentries in /proc/self/fd, and then the "close()" just doesn't
invalidate the cached dentry - and neither does the lookup().

In fact, I can show the issue much more simply with this program, no
readdir() required:

--- test.c ---
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>

static void show_stat(const char *mode, int fd)
{
	char buffer[100];
	struct stat st;

	snprintf(buffer, sizeof(buffer), "/proc/self/fd/%u", fd);
	if (!lstat(buffer, &st))
		printf("st_mode = %06o (%s)\n", st.st_mode, mode);
}

int main(int argc, char *argv[])
{
	int fdrw = open("/dev/null", O_RDWR);
	int fdro = open("/dev/null", O_RDONLY);

	show_stat("read-write", fdrw);
	show_stat("read-only", fdro);

	dup2(fdro, fdrw);
	show_stat("read-only", fdrw);
	show_stat("read-only", fdro);

	return 0;
}
--- end test.c ---

and running it results in:

    [torvalds@i5 ~]$ ./a.out
    st_mode = 120700 (read-write)
    st_mode = 120500 (read-only)
    st_mode = 120700 (read-only)
    st_mode = 120500 (read-only)

notice how the "dup2()" that closed the read-write fd and replaced it
with the read-only fd still shows 0700 in the lstat information.

The good news is that we still do check the inode permission when
actually following the link and doing the open(), so you cannot
actually open a read-only file using that link. So it's a "stale
information" thing, and ugly, but not a security issue.

I would suggest just moving the i_mode initialization from
proc_fd_instantiate() into the revalidate function that we already
have, and that already fixes up i_uid/i_gid etc. Attached is a TOTALLY
UNTESTED patch that does this, and actually seems to simplify things
in the process.

Al, Eric?

                         Linus

[-- Attachment #2: patch.diff --]
[-- Type: application/octet-stream, Size: 2527 bytes --]

 fs/proc/base.c | 43 ++++++++++++++-----------------------------
 1 file changed, 14 insertions(+), 29 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 1c8b280146d7..7d6ad98631f2 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1799,10 +1799,15 @@ static int tid_fd_revalidate(struct dentry *dentry, struct nameidata *nd)
 	if (task) {
 		files = get_files_struct(task);
 		if (files) {
+			struct file *file;
 			rcu_read_lock();
-			if (fcheck_files(files, fd)) {
+			file = fcheck_files(files, fd);
+			if (file) {
+				unsigned i_mode, f_mode = file->f_mode;
+
 				rcu_read_unlock();
 				put_files_struct(files);
+
 				if (task_dumpable(task)) {
 					rcu_read_lock();
 					cred = __task_cred(task);
@@ -1813,7 +1818,14 @@ static int tid_fd_revalidate(struct dentry *dentry, struct nameidata *nd)
 					inode->i_uid = 0;
 					inode->i_gid = 0;
 				}
-				inode->i_mode &= ~(S_ISUID | S_ISGID);
+
+				i_mode = S_IFLNK;
+				if (f_mode & FMODE_READ)
+					i_mode |= S_IRUSR | S_IXUSR;
+				if (f_mode & FMODE_WRITE)
+					i_mode |= S_IWUSR | S_IXUSR;
+				inode->i_mode = i_mode;
+
 				security_task_to_inode(task, inode);
 				put_task_struct(task);
 				return 1;
@@ -1837,8 +1849,6 @@ static struct dentry *proc_fd_instantiate(struct inode *dir,
 	struct dentry *dentry, struct task_struct *task, const void *ptr)
 {
 	unsigned fd = *(const unsigned *)ptr;
-	struct file *file;
-	struct files_struct *files;
  	struct inode *inode;
  	struct proc_inode *ei;
 	struct dentry *error = ERR_PTR(-ENOENT);
@@ -1848,25 +1858,6 @@ static struct dentry *proc_fd_instantiate(struct inode *dir,
 		goto out;
 	ei = PROC_I(inode);
 	ei->fd = fd;
-	files = get_files_struct(task);
-	if (!files)
-		goto out_iput;
-	inode->i_mode = S_IFLNK;
-
-	/*
-	 * We are not taking a ref to the file structure, so we must
-	 * hold ->file_lock.
-	 */
-	spin_lock(&files->file_lock);
-	file = fcheck_files(files, fd);
-	if (!file)
-		goto out_unlock;
-	if (file->f_mode & FMODE_READ)
-		inode->i_mode |= S_IRUSR | S_IXUSR;
-	if (file->f_mode & FMODE_WRITE)
-		inode->i_mode |= S_IWUSR | S_IXUSR;
-	spin_unlock(&files->file_lock);
-	put_files_struct(files);
 
 	inode->i_op = &proc_pid_link_inode_operations;
 	inode->i_size = 64;
@@ -1879,12 +1870,6 @@ static struct dentry *proc_fd_instantiate(struct inode *dir,
 
  out:
 	return error;
-out_unlock:
-	spin_unlock(&files->file_lock);
-	put_files_struct(files);
-out_iput:
-	iput(inode);
-	goto out;
 }
 
 static struct dentry *proc_lookupfd_common(struct inode *dir,

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: /proc/pid/fd/ shows strange mode when executed via sudo.
  2012-05-18 16:08         ` Linus Torvalds
@ 2012-05-18 16:25           ` Linus Torvalds
  2012-05-18 19:55             ` Eric W. Biederman
  2012-05-18 18:08           ` Al Viro
  1 sibling, 1 reply; 20+ messages in thread
From: Linus Torvalds @ 2012-05-18 16:25 UTC (permalink / raw)
  To: Tetsuo Handa, Al Viro; +Cc: ebiederm, linux-fsdevel, akpm

On Fri, May 18, 2012 at 9:08 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I would suggest just moving the i_mode initialization from
> proc_fd_instantiate() into the revalidate function that we already
> have, and that already fixes up i_uid/i_gid etc. Attached is a TOTALLY
> UNTESTED patch that does this, and actually seems to simplify things
> in the process.

Ok, so it's now "tested" in the sense that it works for me, and fixes
both my and your test-cases.

Which doesn't mean that it is bug-free of course, but it does seem to
be a sane patch that actually cleans things up.

I'll delay committing it in case somebody hollers, but I think I'll
mark it for stable too since this issue seems age-old, and the fix
looks good.

Ack/nak? Can anybody find anything wrong in that patch?

                   Linus

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: /proc/pid/fd/ shows strange mode when executed via sudo.
  2012-05-18 16:08         ` Linus Torvalds
  2012-05-18 16:25           ` Linus Torvalds
@ 2012-05-18 18:08           ` Al Viro
  2012-05-18 18:18             ` Linus Torvalds
  1 sibling, 1 reply; 20+ messages in thread
From: Al Viro @ 2012-05-18 18:08 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Tetsuo Handa, ebiederm, linux-fsdevel, akpm


> I would suggest just moving the i_mode initialization from
> proc_fd_instantiate() into the revalidate function that we already
> have, and that already fixes up i_uid/i_gid etc. Attached is a TOTALLY
> UNTESTED patch that does this, and actually seems to simplify things
> in the process.

I think this is bogus.  We don't give a fuck about *any* of those fields
for symlinks; the only problem here is that default ->getattr() uses
them to fill ->st_mode et.al.  The same goes for ->i_uid/->i_gid.
So how about simply adding ->getattr() for those guys?  And to hell
with assignments in that ->d_revalidate() instance...

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: /proc/pid/fd/ shows strange mode when executed via sudo.
  2012-05-18 18:08           ` Al Viro
@ 2012-05-18 18:18             ` Linus Torvalds
  2012-05-18 18:23               ` Linus Torvalds
  0 siblings, 1 reply; 20+ messages in thread
From: Linus Torvalds @ 2012-05-18 18:18 UTC (permalink / raw)
  To: Al Viro; +Cc: Tetsuo Handa, ebiederm, linux-fsdevel, akpm

On Fri, May 18, 2012 at 11:08 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> I think this is bogus.  We don't give a fuck about *any* of those fields
> for symlinks; the only problem here is that default ->getattr() uses
> them to fill ->st_mode et.al.  The same goes for ->i_uid/->i_gid.
> So how about simply adding ->getattr() for those guys?  And to hell
> with assignments in that ->d_revalidate() instance...

Doing it at getattr() time does sound good.

Let me try to cook something up.

               Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: /proc/pid/fd/ shows strange mode when executed via sudo.
  2012-05-18 18:18             ` Linus Torvalds
@ 2012-05-18 18:23               ` Linus Torvalds
  2012-05-18 18:45                 ` Al Viro
  0 siblings, 1 reply; 20+ messages in thread
From: Linus Torvalds @ 2012-05-18 18:23 UTC (permalink / raw)
  To: Al Viro; +Cc: Tetsuo Handa, ebiederm, linux-fsdevel, akpm

On Fri, May 18, 2012 at 11:18 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Doing it at getattr() time does sound good.
>
> Let me try to cook something up.

Ugh. It's a much bigger patch, because we share the inode operations
with other cases too.

So I think that would fall under the "further cleanup" heading, and
I'm not going to do it now. Not with 3.4 imminent.

But if you were to do such a cleanup later...

                 Linus

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: /proc/pid/fd/ shows strange mode when executed via sudo.
  2012-05-18 18:23               ` Linus Torvalds
@ 2012-05-18 18:45                 ` Al Viro
  2012-05-18 18:55                   ` Linus Torvalds
  0 siblings, 1 reply; 20+ messages in thread
From: Al Viro @ 2012-05-18 18:45 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Tetsuo Handa, ebiederm, linux-fsdevel, akpm

On Fri, May 18, 2012 at 11:23:27AM -0700, Linus Torvalds wrote:
> On Fri, May 18, 2012 at 11:18 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > Doing it at getattr() time does sound good.
> >
> > Let me try to cook something up.
> 
> Ugh. It's a much bigger patch, because we share the inode operations
> with other cases too.
> 
> So I think that would fall under the "further cleanup" heading, and
> I'm not going to do it now. Not with 3.4 imminent.
> 
> But if you were to do such a cleanup later...

It's actually not big.  Completely untested minimal variant (without
touching ->d_revalidate()):

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 1c8b280..b2a8b8f 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1491,6 +1491,44 @@ static const struct inode_operations proc_pid_link_inode_operations = {
 	.setattr	= proc_setattr,
 };
 
+static int proc_fd_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
+{
+	struct inode *inode = dentry->d_inode;
+	struct files_struct *files;
+	struct task_struct *task;
+	struct file *file;
+	fmode_t mode = 0;
+
+	generic_fillattr(inode, stat);
+
+	rcu_read_lock();
+	task = pid_task(proc_pid(inode), PIDTYPE_PID);
+	files = task ? get_files_struct(task) : NULL;
+	file = files ? fcheck_files(files, PROC_I(inode)->fd) : NULL;
+	if (file)
+		mode = file->f_mode;
+	rcu_read_unlock();
+
+	if (files)
+		put_files_struct(files);
+	if (!file)
+		return -ENOENT;
+
+	if (mode & FMODE_READ)
+		stat->mode |= S_IRUSR | S_IXUSR;
+	if (mode & FMODE_WRITE)
+		stat->mode |= S_IWUSR | S_IXUSR;
+	stat->size = 64;
+	return 0;
+}
+
+static const struct inode_operations proc_pid_fd_inode_operations = {
+	.readlink	= proc_pid_readlink,
+	.follow_link	= proc_pid_follow_link,
+	.setattr	= proc_setattr,
+	.getattr	= proc_fd_getattr,
+};
+
 
 /* building an inode */
 
@@ -1851,25 +1889,16 @@ static struct dentry *proc_fd_instantiate(struct inode *dir,
 	files = get_files_struct(task);
 	if (!files)
 		goto out_iput;
-	inode->i_mode = S_IFLNK;
 
-	/*
-	 * We are not taking a ref to the file structure, so we must
-	 * hold ->file_lock.
-	 */
-	spin_lock(&files->file_lock);
+	rcu_read_lock();
 	file = fcheck_files(files, fd);
-	if (!file)
-		goto out_unlock;
-	if (file->f_mode & FMODE_READ)
-		inode->i_mode |= S_IRUSR | S_IXUSR;
-	if (file->f_mode & FMODE_WRITE)
-		inode->i_mode |= S_IWUSR | S_IXUSR;
-	spin_unlock(&files->file_lock);
+	rcu_read_unlock();
 	put_files_struct(files);
+	if (!file)
+		goto out_iput;
 
-	inode->i_op = &proc_pid_link_inode_operations;
-	inode->i_size = 64;
+	inode->i_mode = S_IFLNK;
+	inode->i_op = &proc_pid_fd_inode_operations;
 	ei->op.proc_get_link = proc_fd_link;
 	d_set_d_op(dentry, &tid_fd_dentry_operations);
 	d_add(dentry, inode);
@@ -1879,9 +1908,6 @@ static struct dentry *proc_fd_instantiate(struct inode *dir,
 
  out:
 	return error;
-out_unlock:
-	spin_unlock(&files->file_lock);
-	put_files_struct(files);
 out_iput:
 	iput(inode);
 	goto out;

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: /proc/pid/fd/ shows strange mode when executed via sudo.
  2012-05-18 18:45                 ` Al Viro
@ 2012-05-18 18:55                   ` Linus Torvalds
  2012-05-18 19:10                     ` Al Viro
  0 siblings, 1 reply; 20+ messages in thread
From: Linus Torvalds @ 2012-05-18 18:55 UTC (permalink / raw)
  To: Al Viro; +Cc: Tetsuo Handa, ebiederm, linux-fsdevel, akpm

On Fri, May 18, 2012 at 11:45 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> It's actually not big.  Completely untested minimal variant (without
> touching ->d_revalidate()):

Get rid of all the now unnecessary files stuff and error handling in
proc_fd_instantiate() too (it's unnecessary - it only ends up being
re-done in the revalidate function anyway), and I'll take it.

                     Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: /proc/pid/fd/ shows strange mode when executed via sudo.
  2012-05-18 18:55                   ` Linus Torvalds
@ 2012-05-18 19:10                     ` Al Viro
  2012-05-18 20:49                       ` Linus Torvalds
  0 siblings, 1 reply; 20+ messages in thread
From: Al Viro @ 2012-05-18 19:10 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Tetsuo Handa, ebiederm, linux-fsdevel, akpm

On Fri, May 18, 2012 at 11:55:21AM -0700, Linus Torvalds wrote:
> On Fri, May 18, 2012 at 11:45 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > It's actually not big. ?Completely untested minimal variant (without
> > touching ->d_revalidate()):
> 
> Get rid of all the now unnecessary files stuff and error handling in
> proc_fd_instantiate() too (it's unnecessary - it only ends up being
> re-done in the revalidate function anyway), and I'll take it.

Umm...  I'd rather do it other way - do that test *before* bothering
with allocating an inode.  IOW, use it as early cutoff, with
tid_fd_revalidate() called in the end closing any races about
file getting closed while we'd been allocating an inode, etc.

IOW, how about this?  Note that it's _still_ completely untested wrt
weird LSM stuff.  I simply don't have tomoyo/apparmor/whatnot setups
to test on, so I'd rather see an ACK from the LSM people.

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 1c8b280..e663a04 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1491,6 +1491,44 @@ static const struct inode_operations proc_pid_link_inode_operations = {
 	.setattr	= proc_setattr,
 };
 
+static int proc_fd_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
+{
+	struct inode *inode = dentry->d_inode;
+	struct files_struct *files;
+	struct task_struct *task;
+	struct file *file;
+	fmode_t mode = 0;
+
+	generic_fillattr(inode, stat);
+
+	rcu_read_lock();
+	task = pid_task(proc_pid(inode), PIDTYPE_PID);
+	files = task ? get_files_struct(task) : NULL;
+	file = files ? fcheck_files(files, PROC_I(inode)->fd) : NULL;
+	if (file)
+		mode = file->f_mode;
+	rcu_read_unlock();
+
+	if (files)
+		put_files_struct(files);
+	if (!file)
+		return -ENOENT;
+
+	if (mode & FMODE_READ)
+		stat->mode |= S_IRUSR | S_IXUSR;
+	if (mode & FMODE_WRITE)
+		stat->mode |= S_IWUSR | S_IXUSR;
+	stat->size = 64;
+	return 0;
+}
+
+static const struct inode_operations proc_pid_fd_inode_operations = {
+	.readlink	= proc_pid_readlink,
+	.follow_link	= proc_pid_follow_link,
+	.setattr	= proc_setattr,
+	.getattr	= proc_fd_getattr,
+};
+
 
 /* building an inode */
 
@@ -1837,54 +1875,38 @@ static struct dentry *proc_fd_instantiate(struct inode *dir,
 	struct dentry *dentry, struct task_struct *task, const void *ptr)
 {
 	unsigned fd = *(const unsigned *)ptr;
-	struct file *file;
-	struct files_struct *files;
+	struct file *file = NULL;
+	struct files_struct *files = get_files_struct(task);
  	struct inode *inode;
  	struct proc_inode *ei;
-	struct dentry *error = ERR_PTR(-ENOENT);
+
+	if (files) {
+		rcu_read_lock();
+		file = fcheck_files(files, fd);
+		rcu_read_unlock();
+		put_files_struct(files);
+	}
+
+	if (!file)
+		return ERR_PTR(-ENOENT);
 
 	inode = proc_pid_make_inode(dir->i_sb, task);
 	if (!inode)
-		goto out;
+		return ERR_PTR(-ENOMEM);
 	ei = PROC_I(inode);
 	ei->fd = fd;
-	files = get_files_struct(task);
-	if (!files)
-		goto out_iput;
 	inode->i_mode = S_IFLNK;
-
-	/*
-	 * We are not taking a ref to the file structure, so we must
-	 * hold ->file_lock.
-	 */
-	spin_lock(&files->file_lock);
-	file = fcheck_files(files, fd);
-	if (!file)
-		goto out_unlock;
-	if (file->f_mode & FMODE_READ)
-		inode->i_mode |= S_IRUSR | S_IXUSR;
-	if (file->f_mode & FMODE_WRITE)
-		inode->i_mode |= S_IWUSR | S_IXUSR;
-	spin_unlock(&files->file_lock);
-	put_files_struct(files);
-
-	inode->i_op = &proc_pid_link_inode_operations;
-	inode->i_size = 64;
+	inode->i_op = &proc_pid_fd_inode_operations;
 	ei->op.proc_get_link = proc_fd_link;
 	d_set_d_op(dentry, &tid_fd_dentry_operations);
 	d_add(dentry, inode);
-	/* Close the race of the process dying before we return the dentry */
-	if (tid_fd_revalidate(dentry, NULL))
-		error = NULL;
-
- out:
-	return error;
-out_unlock:
-	spin_unlock(&files->file_lock);
-	put_files_struct(files);
-out_iput:
-	iput(inode);
-	goto out;
+	/*
+	 * Close the race of the process dying or file getting closed
+	 * before we return the dentry
+	 */
+	if (!tid_fd_revalidate(dentry, NULL))
+		return ERR_PTR(-ENOENT);
+	return NULL;
 }
 
 static struct dentry *proc_lookupfd_common(struct inode *dir,

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: /proc/pid/fd/ shows strange mode when executed via sudo.
  2012-05-18 16:25           ` Linus Torvalds
@ 2012-05-18 19:55             ` Eric W. Biederman
  0 siblings, 0 replies; 20+ messages in thread
From: Eric W. Biederman @ 2012-05-18 19:55 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Tetsuo Handa, Al Viro, linux-fsdevel, akpm

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Fri, May 18, 2012 at 9:08 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> I would suggest just moving the i_mode initialization from
>> proc_fd_instantiate() into the revalidate function that we already
>> have, and that already fixes up i_uid/i_gid etc. Attached is a TOTALLY
>> UNTESTED patch that does this, and actually seems to simplify things
>> in the process.
>
> Ok, so it's now "tested" in the sense that it works for me, and fixes
> both my and your test-cases.
>
> Which doesn't mean that it is bug-free of course, but it does seem to
> be a sane patch that actually cleans things up.
>
> I'll delay committing it in case somebody hollers, but I think I'll
> mark it for stable too since this issue seems age-old, and the fix
> looks good.
>
> Ack/nak? Can anybody find anything wrong in that patch?

Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>

It looks reasonable.

I am a tad leery of the d_add without setting inode->i_mode
but the dcache lookup will have to call revalidate before
we access the inode so I don't expect a problem there.

Eric

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: /proc/pid/fd/ shows strange mode when executed via sudo.
  2012-05-18 19:10                     ` Al Viro
@ 2012-05-18 20:49                       ` Linus Torvalds
  2012-05-18 21:23                         ` Al Viro
  0 siblings, 1 reply; 20+ messages in thread
From: Linus Torvalds @ 2012-05-18 20:49 UTC (permalink / raw)
  To: Al Viro; +Cc: Tetsuo Handa, ebiederm, linux-fsdevel, akpm

On Fri, May 18, 2012 at 12:10 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Umm...  I'd rather do it other way - do that test *before* bothering
> with allocating an inode.  IOW, use it as early cutoff, with
> tid_fd_revalidate() called in the end closing any races about
> file getting closed while we'd been allocating an inode, etc.

I think you're optimizing the wrong case, and adding code to do so.

The ENOENT case never happens in practice - there's no sane situation
where you'd look up a non-existend /proc/xyz/fd/X file.

So rather than "optimize" the case where you don't need an inode
allocation, you're just wasting time doing the file lookup twice, and
adding more code.

Also, thinking some more about it: while I do agree that doing things
at getattr() time is a really clean approach, we still do need the
dentry revalidation for the existence check. And at that point it's
actually pretty much free to just update the inode information, since
we had to do the file lookup anyway.

So I'm actually starting to think that I prefer my original simpler
patch after all. It handles the case we care about, and doesn't add
any unnecessary code.

                     Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: /proc/pid/fd/ shows strange mode when executed via sudo.
  2012-05-18 20:49                       ` Linus Torvalds
@ 2012-05-18 21:23                         ` Al Viro
  2012-05-18 21:26                           ` Linus Torvalds
  0 siblings, 1 reply; 20+ messages in thread
From: Al Viro @ 2012-05-18 21:23 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Tetsuo Handa, ebiederm, linux-fsdevel, akpm

On Fri, May 18, 2012 at 01:49:18PM -0700, Linus Torvalds wrote:
> I think you're optimizing the wrong case, and adding code to do so.
> 
> The ENOENT case never happens in practice - there's no sane situation
> where you'd look up a non-existend /proc/xyz/fd/X file.

I would agree if we only called that on lookup.  We also do that on
readdir(), for every descriptor in range 0..files->max_fds-1.  So
if you have sufficiently sparse set of descriptors, it will be
called a _lot_.  Moreover, that's the usual path for calling it,
exactly because we do getdents before trying to open/lstat/anything
else.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: /proc/pid/fd/ shows strange mode when executed via sudo.
  2012-05-18 21:23                         ` Al Viro
@ 2012-05-18 21:26                           ` Linus Torvalds
  2012-05-18 21:32                             ` Linus Torvalds
  0 siblings, 1 reply; 20+ messages in thread
From: Linus Torvalds @ 2012-05-18 21:26 UTC (permalink / raw)
  To: Al Viro; +Cc: Tetsuo Handa, ebiederm, linux-fsdevel, akpm

On Fri, May 18, 2012 at 2:23 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> I would agree if we only called that on lookup.  We also do that on
> readdir(), for every descriptor in range 0..files->max_fds-1.  So
> if you have sufficiently sparse set of descriptors, it will be
> called a _lot_.  Moreover, that's the usual path for calling it,
> exactly because we do getdents before trying to open/lstat/anything
> else.

Ahh, good catch.

However, wouldn't it be better to move the check you added to the
readdir() path, then, so that we don't do it unnecessarily for
lookups..

I'll check how that looks.

               Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: /proc/pid/fd/ shows strange mode when executed via sudo.
  2012-05-18 21:26                           ` Linus Torvalds
@ 2012-05-18 21:32                             ` Linus Torvalds
  2012-05-18 22:29                               ` Al Viro
  0 siblings, 1 reply; 20+ messages in thread
From: Linus Torvalds @ 2012-05-18 21:32 UTC (permalink / raw)
  To: Al Viro; +Cc: Tetsuo Handa, ebiederm, linux-fsdevel, akpm

On Fri, May 18, 2012 at 2:26 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> However, wouldn't it be better to move the check you added to the
> readdir() path, then, so that we don't do it unnecessarily for
> lookups..
>
> I'll check how that looks.

Actually, we already do that. It's the fcheck_files() check in
proc_readfd_common(), no?

So I don't think we actually call that proc_fd_instantiate() function
normally with out-of-range numbers. Afaik, only a (very rare) race, or
an insane lookup of non-existent names would do it, neither of which
looks like anything we should worry about.

Am I missing some other path?

                    Linus

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: /proc/pid/fd/ shows strange mode when executed via sudo.
  2012-05-18 21:32                             ` Linus Torvalds
@ 2012-05-18 22:29                               ` Al Viro
  2012-05-19  7:08                                 ` Tetsuo Handa
  0 siblings, 1 reply; 20+ messages in thread
From: Al Viro @ 2012-05-18 22:29 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Tetsuo Handa, ebiederm, linux-fsdevel, akpm

On Fri, May 18, 2012 at 02:32:58PM -0700, Linus Torvalds wrote:
> On Fri, May 18, 2012 at 2:26 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > However, wouldn't it be better to move the check you added to the
> > readdir() path, then, so that we don't do it unnecessarily for
> > lookups..
> >
> > I'll check how that looks.
> 
> Actually, we already do that. It's the fcheck_files() check in
> proc_readfd_common(), no?

Missed that.  Yes, you are right - there's no point keeping that
in proc_fd_instantiate().  OK, messing with fcheck_files() in there
dropped now...

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 1c8b280..3986db1 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1491,6 +1491,44 @@ static const struct inode_operations proc_pid_link_inode_operations = {
 	.setattr	= proc_setattr,
 };
 
+static int proc_fd_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
+{
+	struct inode *inode = dentry->d_inode;
+	struct files_struct *files;
+	struct task_struct *task;
+	struct file *file;
+	fmode_t mode = 0;
+
+	generic_fillattr(inode, stat);
+
+	rcu_read_lock();
+	task = pid_task(proc_pid(inode), PIDTYPE_PID);
+	files = task ? get_files_struct(task) : NULL;
+	file = files ? fcheck_files(files, PROC_I(inode)->fd) : NULL;
+	if (file)
+		mode = file->f_mode;
+	rcu_read_unlock();
+
+	if (files)
+		put_files_struct(files);
+	if (!file)
+		return -ENOENT;
+
+	if (mode & FMODE_READ)
+		stat->mode |= S_IRUSR | S_IXUSR;
+	if (mode & FMODE_WRITE)
+		stat->mode |= S_IWUSR | S_IXUSR;
+	stat->size = 64;
+	return 0;
+}
+
+static const struct inode_operations proc_pid_fd_inode_operations = {
+	.readlink	= proc_pid_readlink,
+	.follow_link	= proc_pid_follow_link,
+	.setattr	= proc_setattr,
+	.getattr	= proc_fd_getattr,
+};
+
 
 /* building an inode */
 
@@ -1837,54 +1875,26 @@ static struct dentry *proc_fd_instantiate(struct inode *dir,
 	struct dentry *dentry, struct task_struct *task, const void *ptr)
 {
 	unsigned fd = *(const unsigned *)ptr;
-	struct file *file;
-	struct files_struct *files;
  	struct inode *inode;
  	struct proc_inode *ei;
-	struct dentry *error = ERR_PTR(-ENOENT);
 
 	inode = proc_pid_make_inode(dir->i_sb, task);
 	if (!inode)
-		goto out;
+		return ERR_PTR(-ENOMEM);
 	ei = PROC_I(inode);
 	ei->fd = fd;
-	files = get_files_struct(task);
-	if (!files)
-		goto out_iput;
 	inode->i_mode = S_IFLNK;
-
-	/*
-	 * We are not taking a ref to the file structure, so we must
-	 * hold ->file_lock.
-	 */
-	spin_lock(&files->file_lock);
-	file = fcheck_files(files, fd);
-	if (!file)
-		goto out_unlock;
-	if (file->f_mode & FMODE_READ)
-		inode->i_mode |= S_IRUSR | S_IXUSR;
-	if (file->f_mode & FMODE_WRITE)
-		inode->i_mode |= S_IWUSR | S_IXUSR;
-	spin_unlock(&files->file_lock);
-	put_files_struct(files);
-
-	inode->i_op = &proc_pid_link_inode_operations;
-	inode->i_size = 64;
+	inode->i_op = &proc_pid_fd_inode_operations;
 	ei->op.proc_get_link = proc_fd_link;
 	d_set_d_op(dentry, &tid_fd_dentry_operations);
 	d_add(dentry, inode);
-	/* Close the race of the process dying before we return the dentry */
-	if (tid_fd_revalidate(dentry, NULL))
-		error = NULL;
-
- out:
-	return error;
-out_unlock:
-	spin_unlock(&files->file_lock);
-	put_files_struct(files);
-out_iput:
-	iput(inode);
-	goto out;
+	/*
+	 * Close the race of the process dying or file getting closed
+	 * before we return the dentry
+	 */
+	if (!tid_fd_revalidate(dentry, NULL))
+		return ERR_PTR(-ENOENT);
+	return NULL;
 }
 
 static struct dentry *proc_lookupfd_common(struct inode *dir,

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: /proc/pid/fd/ shows strange mode when executed via sudo.
  2012-05-18 22:29                               ` Al Viro
@ 2012-05-19  7:08                                 ` Tetsuo Handa
  0 siblings, 0 replies; 20+ messages in thread
From: Tetsuo Handa @ 2012-05-19  7:08 UTC (permalink / raw)
  To: viro, torvalds; +Cc: ebiederm, linux-fsdevel, akpm

Al Viro wrote:
> On Fri, May 18, 2012 at 02:32:58PM -0700, Linus Torvalds wrote:
> > On Fri, May 18, 2012 at 2:26 PM, Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> > >
> > > However, wouldn't it be better to move the check you added to the
> > > readdir() path, then, so that we don't do it unnecessarily for
> > > lookups..
> > >
> > > I'll check how that looks.
> > 
> > Actually, we already do that. It's the fcheck_files() check in
> > proc_readfd_common(), no?
> 
> Missed that.  Yes, you are right - there's no point keeping that
> in proc_fd_instantiate().  OK, messing with fcheck_files() in there
> dropped now...

OK. As of commit 30a08bf2d "proc: move fd symlink i_mode calculations into
tid_fd_revalidate()", things are working as expected. Thank you.

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2012-05-19  7:08 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-05-02 13:40 /proc/pid/fd/ shows strange mode when executed via sudo Tetsuo Handa
2012-05-03 15:42 ` Serge Hallyn
2012-05-03 16:25   ` Tetsuo Handa
2012-05-18  2:39     ` Tetsuo Handa
2012-05-18  9:27       ` Tetsuo Handa
2012-05-18 16:08         ` Linus Torvalds
2012-05-18 16:25           ` Linus Torvalds
2012-05-18 19:55             ` Eric W. Biederman
2012-05-18 18:08           ` Al Viro
2012-05-18 18:18             ` Linus Torvalds
2012-05-18 18:23               ` Linus Torvalds
2012-05-18 18:45                 ` Al Viro
2012-05-18 18:55                   ` Linus Torvalds
2012-05-18 19:10                     ` Al Viro
2012-05-18 20:49                       ` Linus Torvalds
2012-05-18 21:23                         ` Al Viro
2012-05-18 21:26                           ` Linus Torvalds
2012-05-18 21:32                             ` Linus Torvalds
2012-05-18 22:29                               ` Al Viro
2012-05-19  7:08                                 ` Tetsuo Handa

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.