linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH] Support ia32 exec domains without CONFIG_IA32_SUPPORT
       [not found]         ` <mailman.1097403036.11924@unix-os.sc.intel.com>
@ 2004-10-11 21:05           ` Arun Sharma
  2004-10-12 21:50             ` David Woodhouse
  0 siblings, 1 reply; 8+ messages in thread
From: Arun Sharma @ 2004-10-11 21:05 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-kernel, linux-ia64

[-- Attachment #1: Type: text/plain, Size: 1936 bytes --]


[ cc'ing LKML - since this really belongs there ]

Original thread on linux-ia64 about supporting alternate root for user level emulators:

http://marc.theaimsgroup.com/?t=109708875700003&r=1&w=2

I've reattached the patch here for new readers.

On 10/10/2004 3:10 AM, Christoph Hellwig wrote:
> On Fri, Oct 08, 2004 at 03:37:10PM -0700, Arun Sharma wrote:
>> On 10/8/2004 1:08 AM, David Mosberger wrote:
>> 
>> >Unfortunately, that thread ran out in a rather unhelpful manner, as
>> >far as I can see.  Rusty seemed to agree that the performance-hit of
>> >doing it all in user-level was unacceptably high, but I didn't see any
>> >actual numbers.  There was a suggestion to decouple the altroot from
>> >the personality which makes some sense, but nobody actually did
>> >anything about it?
>> >
>> 
>> I'd really like this issue to be resolved one way or the other. I'm not 
>> sure I've heard a convincing argument on why my original patch(which adds a 
>> new exec domain unconditionally) should not be applied.
>> 
>> I'm fine with the attached patch to set the altroot via a system call as 
>> well.
> 
> I'd still like to see some number on how much smaller a userland emulation
> is.  Best using qumu because that's opensource and we simply don't care
> about intel's propritary stuff.  Else your patch looks pretty okay - but
> I'd need to go through linux-kernel and hooking up to all architectures.

I've prototyped a generic userland solution that covers just open and stat system calls (for completeness, all path walk related syscalls need to be covered) using the LD_PRELOAD approach.

I saw a 16% degradation in system time on this benchmark:

find /usr/src/linux -name '*.[chS]' | xargs grep fsck

mainly due to the doubling of the number of calls to open. Also, there was a slight increase in user time as well, due to malloc/free overhead.

shar file with the benchmark and system call interposer attached.

	-Arun

[-- Attachment #2: ia32-exec-domain.patch --]
[-- Type: text/plain, Size: 2120 bytes --]


Register the ia32 exec domain even though CONFIG_IA32_SUPPORT and
CONFIG_COMPAT are turned off. This is necessary to support the alternate
root feature for programs running with PER_LINUX32.

Signed-off-by: Arun Sharma <arun.sharma@intel.com>

--- linux26/include/asm-ia64/ia32.h-	2004-10-05 15:30:05.000000000 -0700
+++ linux26/include/asm-ia64/ia32.h	2004-10-05 15:31:11.000000000 -0700
@@ -11,6 +11,8 @@
 
 #ifndef __ASSEMBLY__
 
+extern int register_ia32_exec_domain(void);
+
 # ifdef CONFIG_IA32_SUPPORT
 
 extern void ia32_cpu_init (void);
--- linux26/arch/ia64/kernel/process.c-	2004-10-05 15:20:04.000000000 -0700
+++ linux26/arch/ia64/kernel/process.c	2004-10-05 15:30:57.000000000 -0700
@@ -765,3 +765,19 @@
 }
 
 EXPORT_SYMBOL(machine_power_off);
+
+struct exec_domain ia32_exec_domain;
+
+int __init
+register_ia32_exec_domain()
+{
+	ia32_exec_domain.name = "Linux/x86";
+	ia32_exec_domain.handler = NULL;
+	ia32_exec_domain.pers_low = PER_LINUX32;
+	ia32_exec_domain.pers_high = PER_LINUX32;
+	ia32_exec_domain.signal_map = default_exec_domain.signal_map;
+	ia32_exec_domain.signal_invmap = default_exec_domain.signal_invmap;
+	return register_exec_domain(&ia32_exec_domain);
+}
+
+__initcall(register_ia32_exec_domain);
--- linux26/arch/ia64/ia32/ia32_support.c-	2004-10-05 15:19:24.000000000 -0700
+++ linux26/arch/ia64/ia32/ia32_support.c	2004-10-05 16:37:50.000000000 -0700
@@ -29,7 +29,6 @@
 
 extern void die_if_kernel (char *str, struct pt_regs *regs, long err);
 
-struct exec_domain ia32_exec_domain;
 struct page *ia32_shared_page[NR_CPUS];
 unsigned long *ia32_boot_gdt;
 unsigned long *cpu_gdt_table[NR_CPUS];
@@ -211,14 +210,6 @@
 static int __init
 ia32_init (void)
 {
-	ia32_exec_domain.name = "Linux/x86";
-	ia32_exec_domain.handler = NULL;
-	ia32_exec_domain.pers_low = PER_LINUX32;
-	ia32_exec_domain.pers_high = PER_LINUX32;
-	ia32_exec_domain.signal_map = default_exec_domain.signal_map;
-	ia32_exec_domain.signal_invmap = default_exec_domain.signal_invmap;
-	register_exec_domain(&ia32_exec_domain);
-
 #if PAGE_SHIFT > IA32_PAGE_SHIFT
 	{
 		extern kmem_cache_t *partial_page_cachep;

[-- Attachment #3: perf.shar.txt --]
[-- Type: text/plain, Size: 8661 bytes --]

#!/bin/sh
# This is a shell archive (produced by GNU sharutils 4.2.1).
# To extract the files from this archive, save it to some FILE, remove
# everything before the `!/bin/sh' line above, then type `sh FILE'.
#
# Made on 2004-10-11 13:43 PDT by <adsharma@linux-t08.sc.intel.com>.
# Source directory was `/home/adsharma'.
#
# Existing files will *not* be overwritten unless `-c' is specified.
#
# This shar contains:
# length mode       name
# ------ ---------- ------------------------------------------
#    157 -rw-r--r-- perf/Makefile
#   1227 -rw-r--r-- perf/namei.c
#     56 -rwxr-xr-x perf/test1.sh
#     88 -rwxr-xr-x perf/test.sh
#     85 -rw-r--r-- perf/results.txt
#
save_IFS="${IFS}"
IFS="${IFS}:"
gettext_dir=FAILED
locale_dir=FAILED
first_param="$1"
for dir in $PATH
do
  if test "$gettext_dir" = FAILED && test -f $dir/gettext \
     && ($dir/gettext --version >/dev/null 2>&1)
  then
    set `$dir/gettext --version 2>&1`
    if test "$3" = GNU
    then
      gettext_dir=$dir
    fi
  fi
  if test "$locale_dir" = FAILED && test -f $dir/shar \
     && ($dir/shar --print-text-domain-dir >/dev/null 2>&1)
  then
    locale_dir=`$dir/shar --print-text-domain-dir`
  fi
done
IFS="$save_IFS"
if test "$locale_dir" = FAILED || test "$gettext_dir" = FAILED
then
  echo=echo
else
  TEXTDOMAINDIR=$locale_dir
  export TEXTDOMAINDIR
  TEXTDOMAIN=sharutils
  export TEXTDOMAIN
  echo="$gettext_dir/gettext -s"
fi
if touch -am -t 200112312359.59 $$.touch >/dev/null 2>&1 && test ! -f 200112312359.59 -a -f $$.touch; then
  shar_touch='touch -am -t $1$2$3$4$5$6.$7 "$8"'
elif touch -am 123123592001.59 $$.touch >/dev/null 2>&1 && test ! -f 123123592001.59 -a ! -f 123123592001.5 -a -f $$.touch; then
  shar_touch='touch -am $3$4$5$6$1$2.$7 "$8"'
elif touch -am 1231235901 $$.touch >/dev/null 2>&1 && test ! -f 1231235901 -a -f $$.touch; then
  shar_touch='touch -am $3$4$5$6$2 "$8"'
else
  shar_touch=:
  echo
  $echo 'WARNING: not restoring timestamps.  Consider getting and'
  $echo "installing GNU \`touch', distributed in GNU File Utilities..."
  echo
fi
rm -f 200112312359.59 123123592001.59 123123592001.5 1231235901 $$.touch
#
if mkdir _sh21367; then
  $echo 'x -' 'creating lock directory'
else
  $echo 'failed to create lock directory'
  exit 1
fi
# ============= perf/Makefile ==============
if test ! -d 'perf'; then
  $echo 'x -' 'creating directory' 'perf'
  mkdir 'perf'
fi
if test -f 'perf/Makefile' && test "$first_param" != -c; then
  $echo 'x -' SKIPPING 'perf/Makefile' '(file already exists)'
else
  $echo 'x -' extracting 'perf/Makefile' '(text)'
  sed 's/^X//' << 'SHAR_EOF' > 'perf/Makefile' &&
CFLAGS		=	-D_GNU_SOURCE
X
all: libaltroot.so
X
libaltroot.so: namei.o
X	gcc -shared -g -fpic namei.o -o libaltroot.so -ldl
X
clean:
X	$(RM) namei.o libaltroot.so
SHAR_EOF
  (set 20 04 10 11 13 43 04 'perf/Makefile'; eval "$shar_touch") &&
  chmod 0644 'perf/Makefile' ||
  $echo 'restore of' 'perf/Makefile' 'failed'
  if ( md5sum --help 2>&1 | grep 'sage: md5sum \[' ) >/dev/null 2>&1 \
  && ( md5sum --version 2>&1 | grep -v 'textutils 1.12' ) >/dev/null; then
    md5sum -c << SHAR_EOF >/dev/null 2>&1 \
    || $echo 'perf/Makefile:' 'MD5 check failed'
c04da369bdfbb9e9a73346b393ac99a0  perf/Makefile
SHAR_EOF
  else
    shar_count="`LC_ALL= LC_CTYPE= LANG= wc -c < 'perf/Makefile'`"
    test 157 -eq "$shar_count" ||
    $echo 'perf/Makefile:' 'original size' '157,' 'current size' "$shar_count!"
  fi
fi
# ============= perf/namei.c ==============
if test -f 'perf/namei.c' && test "$first_param" != -c; then
  $echo 'x -' SKIPPING 'perf/namei.c' '(file already exists)'
else
  $echo 'x -' extracting 'perf/namei.c' '(text)'
  sed 's/^X//' << 'SHAR_EOF' > 'perf/namei.c' &&
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <dlfcn.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <linux/limits.h>
#include <fcntl.h>
X
#define EMULATE 1
X
int open(const char *pathname, int flags, ...)
{
X    static int (*func) (const char *, int, mode_t);
X    int ret;
X    mode_t mode;
X    char *buf;
X  
X    if (!func)
X	func = (int (*)()) dlsym(RTLD_NEXT, "open");
X
#ifdef EMULATE
X    if (*pathname == '/') {
X    	buf = malloc(PATH_MAX);
X	strncpy(buf, "/emul/ia32-linux", PATH_MAX);
X	strncat(buf, pathname, PATH_MAX-16);
X    	ret = func(buf, flags, mode);
X	if (ret != -1) {
X		free(buf);
X		return ret;
X	}
X	free(buf);
X    }
#endif
X
X    ret = func(pathname, flags, mode);
X
X    return ret;
}
X
int stat(const char *pathname, struct stat *buf1)
{
X    static int (*func) (const char *, struct stat *);
X    int ret;
X    char *buf;
X
X    if (!func)
X	func = (int (*)()) dlsym(RTLD_NEXT, "stat");
X
#ifdef EMULATE
X    if (*pathname == '/') {
X    	buf = malloc(PATH_MAX);
X	strncpy(buf, "/emul/ia32-linux", PATH_MAX);
X	strncat(buf, pathname, PATH_MAX-16);
X    	ret = func(buf, buf1);
X	if (ret != -1) {
X		free(buf);
X		return ret;
X	}
X	free(buf);
X    }
#endif
X
X    ret = func(pathname, buf1);
X
X    return ret;
}
SHAR_EOF
  (set 20 04 10 11 13 28 34 'perf/namei.c'; eval "$shar_touch") &&
  chmod 0644 'perf/namei.c' ||
  $echo 'restore of' 'perf/namei.c' 'failed'
  if ( md5sum --help 2>&1 | grep 'sage: md5sum \[' ) >/dev/null 2>&1 \
  && ( md5sum --version 2>&1 | grep -v 'textutils 1.12' ) >/dev/null; then
    md5sum -c << SHAR_EOF >/dev/null 2>&1 \
    || $echo 'perf/namei.c:' 'MD5 check failed'
968c756b5664cc2125b842b1750e40a8  perf/namei.c
SHAR_EOF
  else
    shar_count="`LC_ALL= LC_CTYPE= LANG= wc -c < 'perf/namei.c'`"
    test 1227 -eq "$shar_count" ||
    $echo 'perf/namei.c:' 'original size' '1227,' 'current size' "$shar_count!"
  fi
fi
# ============= perf/test1.sh ==============
if test -f 'perf/test1.sh' && test "$first_param" != -c; then
  $echo 'x -' SKIPPING 'perf/test1.sh' '(file already exists)'
else
  $echo 'x -' extracting 'perf/test1.sh' '(text)'
  sed 's/^X//' << 'SHAR_EOF' > 'perf/test1.sh' &&
#!/bin/sh
X
find `pwd` -name '*.[chS]' | xargs grep fsck
SHAR_EOF
  (set 20 04 10 11 13 41 25 'perf/test1.sh'; eval "$shar_touch") &&
  chmod 0755 'perf/test1.sh' ||
  $echo 'restore of' 'perf/test1.sh' 'failed'
  if ( md5sum --help 2>&1 | grep 'sage: md5sum \[' ) >/dev/null 2>&1 \
  && ( md5sum --version 2>&1 | grep -v 'textutils 1.12' ) >/dev/null; then
    md5sum -c << SHAR_EOF >/dev/null 2>&1 \
    || $echo 'perf/test1.sh:' 'MD5 check failed'
bd830f72d5ade56e00a8c5c4f0e4c5fe  perf/test1.sh
SHAR_EOF
  else
    shar_count="`LC_ALL= LC_CTYPE= LANG= wc -c < 'perf/test1.sh'`"
    test 56 -eq "$shar_count" ||
    $echo 'perf/test1.sh:' 'original size' '56,' 'current size' "$shar_count!"
  fi
fi
# ============= perf/test.sh ==============
if test -f 'perf/test.sh' && test "$first_param" != -c; then
  $echo 'x -' SKIPPING 'perf/test.sh' '(file already exists)'
else
  $echo 'x -' extracting 'perf/test.sh' '(text)'
  sed 's/^X//' << 'SHAR_EOF' > 'perf/test.sh' &&
#!/bin/sh
X
find `pwd` -name '*.[chS]' | LD_PRELOAD=~/perf/libaltroot.so xargs grep fsck
SHAR_EOF
  (set 20 04 10 11 13 41 25 'perf/test.sh'; eval "$shar_touch") &&
  chmod 0755 'perf/test.sh' ||
  $echo 'restore of' 'perf/test.sh' 'failed'
  if ( md5sum --help 2>&1 | grep 'sage: md5sum \[' ) >/dev/null 2>&1 \
  && ( md5sum --version 2>&1 | grep -v 'textutils 1.12' ) >/dev/null; then
    md5sum -c << SHAR_EOF >/dev/null 2>&1 \
    || $echo 'perf/test.sh:' 'MD5 check failed'
005c01ad066d70c7be266f1bec9074f6  perf/test.sh
SHAR_EOF
  else
    shar_count="`LC_ALL= LC_CTYPE= LANG= wc -c < 'perf/test.sh'`"
    test 88 -eq "$shar_count" ||
    $echo 'perf/test.sh:' 'original size' '88,' 'current size' "$shar_count!"
  fi
fi
# ============= perf/results.txt ==============
if test -f 'perf/results.txt' && test "$first_param" != -c; then
  $echo 'x -' SKIPPING 'perf/results.txt' '(file already exists)'
else
  $echo 'x -' extracting 'perf/results.txt' '(text)'
  sed 's/^X//' << 'SHAR_EOF' > 'perf/results.txt' &&
test1.sh:
X
real 14.49
user 14.09
sys 0.37
X
test.sh:
X
real 14.68
user 14.13
sys 0.43
X
SHAR_EOF
  (set 20 04 10 11 13 42 04 'perf/results.txt'; eval "$shar_touch") &&
  chmod 0644 'perf/results.txt' ||
  $echo 'restore of' 'perf/results.txt' 'failed'
  if ( md5sum --help 2>&1 | grep 'sage: md5sum \[' ) >/dev/null 2>&1 \
  && ( md5sum --version 2>&1 | grep -v 'textutils 1.12' ) >/dev/null; then
    md5sum -c << SHAR_EOF >/dev/null 2>&1 \
    || $echo 'perf/results.txt:' 'MD5 check failed'
db123bdf3d15d40c363be0924ccb2b6d  perf/results.txt
SHAR_EOF
  else
    shar_count="`LC_ALL= LC_CTYPE= LANG= wc -c < 'perf/results.txt'`"
    test 85 -eq "$shar_count" ||
    $echo 'perf/results.txt:' 'original size' '85,' 'current size' "$shar_count!"
  fi
fi
rm -fr _sh21367
exit 0

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Support ia32 exec domains without CONFIG_IA32_SUPPORT
  2004-10-11 21:05           ` [PATCH] Support ia32 exec domains without CONFIG_IA32_SUPPORT Arun Sharma
@ 2004-10-12 21:50             ` David Woodhouse
  2004-10-12 22:46               ` Arun Sharma
  0 siblings, 1 reply; 8+ messages in thread
From: David Woodhouse @ 2004-10-12 21:50 UTC (permalink / raw)
  To: Arun Sharma; +Cc: Christoph Hellwig, linux-kernel, linux-ia64

On Mon, 2004-10-11 at 14:05 -0700, Arun Sharma wrote:
> I've prototyped a generic userland solution that covers just open and
> stat system calls (for completeness, all path walk related syscalls
> need to be covered) using the LD_PRELOAD approach.
> 
> I saw a 16% degradation in system time on this benchmark:
> 
> find /usr/src/linux -name '*.[chS]' | xargs grep fsck
> 
> mainly due to the doubling of the number of calls to open. Also, there
> was a slight increase in user time as well, due to malloc/free
> overhead.

The patch is entirely bogus. This isn't at all ia64-specific, and
doesn't live in arch/ia64. It's just as applicable on _all_ systems
where we may want to do CPU or OS emulation.

If you make it generic so that qemu can use it for emulating i386 even
on machines like ppc64, perhaps it would be saner.

-- 
dwmw2


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Support ia32 exec domains without CONFIG_IA32_SUPPORT
  2004-10-12 21:50             ` David Woodhouse
@ 2004-10-12 22:46               ` Arun Sharma
  2004-10-13 22:27                 ` Arun Sharma
  2004-10-14  8:50                 ` Jakub Jelinek
  0 siblings, 2 replies; 8+ messages in thread
From: Arun Sharma @ 2004-10-12 22:46 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Christoph Hellwig, linux-kernel, linux-ia64

[-- Attachment #1: Type: text/plain, Size: 970 bytes --]

David Woodhouse wrote:

> If you make it generic so that qemu can use it for emulating i386 even
> on machines like ppc64, perhaps it would be saner.
> 

There is already an architecture independent solution to the problem - 
set_fs_altroot(). The only problem is that it's not easily accessible 
from userland. Accessing it via set_personality() works only if there 
are multiple exec domains.

I can't see how anyone would accept a patch that adds linux-i386 exec 
domain in generic code.

Christoph doesn't like the idea of adding exec-domains just for this 
purpose and has suggested adding a new system call to set the altroot. A 
prototype patch to do this already exists. I will be cleaning it up and 
  posting it to LKML later this week. The main purpose of moving the 
discussion to LKML was to see how receptive people were to the proposed 
new system call.

So far I haven't heard anyone complaining :)

	-Arun

PS: Prototype patch with known bugs attached.

[-- Attachment #2: sys-altroot.patch --]
[-- Type: text/plain, Size: 4045 bytes --]

Index: linux-2.6-cvs/arch/ia64/kernel/entry.S
===================================================================
RCS file: /home/adsharma/disk2/cvs/linux-2.5/arch/ia64/kernel/entry.S,v
retrieving revision 1.49
diff -u -r1.49 entry.S
--- linux-2.6-cvs/arch/ia64/kernel/entry.S	24 Aug 2004 18:27:07 -0000	1.49
+++ linux-2.6-cvs/arch/ia64/kernel/entry.S	8 Oct 2004 17:35:53 -0000
@@ -1527,7 +1527,7 @@
 	data8 sys_mq_getsetattr
 	data8 sys_ni_syscall			// reserved for kexec_load
 	data8 sys_ni_syscall
-	data8 sys_ni_syscall			// 1270
+	data8 sys_altroot			// 1270
 	data8 sys_ni_syscall
 	data8 sys_ni_syscall
 	data8 sys_ni_syscall
Index: linux-2.6-cvs/fs/namei.c
===================================================================
RCS file: /home/adsharma/disk2/cvs/linux-2.5/fs/namei.c,v
retrieving revision 1.110
diff -u -r1.110 namei.c
--- linux-2.6-cvs/fs/namei.c	2 Oct 2004 17:59:55 -0000	1.110
+++ linux-2.6-cvs/fs/namei.c	8 Oct 2004 18:14:11 -0000
@@ -897,7 +897,7 @@
 	return 1;
 }
 
-void set_fs_altroot(void)
+int set_fs_altroot(const char __user *altroot)
 {
 	char *emul = __emul_prefix();
 	struct nameidata nd;
@@ -905,12 +905,20 @@
 	struct dentry *dentry = NULL, *olddentry;
 	int err;
 
+	if (altroot) {
+		emul = getname(altroot);
+		if (IS_ERR(emul))
+			return PTR_ERR(emul);
+	}
+
 	if (!emul)
 		goto set_it;
 	err = path_lookup(emul, LOOKUP_FOLLOW|LOOKUP_DIRECTORY|LOOKUP_NOALT, &nd);
 	if (!err) {
 		mnt = nd.mnt;
 		dentry = nd.dentry;
+	} else {
+		return err;
 	}
 set_it:
 	write_lock(&current->fs->lock);
@@ -923,6 +931,33 @@
 		dput(olddentry);
 		mntput(oldmnt);
 	}
+	return 0;
+}
+
+asmlinkage int sys_altroot(const char __user * altroot)
+{
+	if (atomic_read(&current->fs->count) != 1) {
+		struct fs_struct *fsp, *ofsp;
+
+		fsp = copy_fs_struct(current->fs);
+		if (fsp == NULL) {
+			return -ENOMEM;
+		}
+
+		task_lock(current);
+		ofsp = current->fs;
+		current->fs = fsp;
+		task_unlock(current);
+
+		put_fs_struct(ofsp);
+	}
+
+	/*
+	 * At that point we are guaranteed to be the sole owner of
+	 * current->fs.
+	 */
+
+	return set_fs_altroot(altroot);
 }
 
 int fastcall path_lookup(const char *name, unsigned int flags, struct nameidata *nd)
Index: linux-2.6-cvs/fs/open.c
===================================================================
RCS file: /home/adsharma/disk2/cvs/linux-2.5/fs/open.c,v
retrieving revision 1.71
diff -u -r1.71 open.c
--- linux-2.6-cvs/fs/open.c	8 Aug 2004 01:54:18 -0000	1.71
+++ linux-2.6-cvs/fs/open.c	8 Oct 2004 17:38:03 -0000
@@ -584,7 +584,7 @@
 		goto dput_and_out;
 
 	set_fs_root(current->fs, nd.mnt, nd.dentry);
-	set_fs_altroot();
+	set_fs_altroot(NULL);
 	error = 0;
 dput_and_out:
 	path_release(&nd);
Index: linux-2.6-cvs/include/linux/fs_struct.h
===================================================================
RCS file: /home/adsharma/disk2/cvs/linux-2.5/include/linux/fs_struct.h,v
retrieving revision 1.7
diff -u -r1.7 fs_struct.h
--- linux-2.6-cvs/include/linux/fs_struct.h	17 Nov 2002 03:52:23 -0000	1.7
+++ linux-2.6-cvs/include/linux/fs_struct.h	8 Oct 2004 17:39:16 -0000
@@ -19,7 +19,7 @@
 }
 
 extern void exit_fs(struct task_struct *);
-extern void set_fs_altroot(void);
+extern int set_fs_altroot(const char __user *);
 extern void set_fs_root(struct fs_struct *, struct vfsmount *, struct dentry *);
 extern void set_fs_pwd(struct fs_struct *, struct vfsmount *, struct dentry *);
 extern struct fs_struct *copy_fs_struct(struct fs_struct *);
Index: linux-2.6-cvs/kernel/exec_domain.c
===================================================================
RCS file: /home/adsharma/disk2/cvs/linux-2.5/kernel/exec_domain.c,v
retrieving revision 1.18
diff -u -r1.18 exec_domain.c
--- linux-2.6-cvs/kernel/exec_domain.c	8 Sep 2004 14:50:53 -0000	1.18
+++ linux-2.6-cvs/kernel/exec_domain.c	8 Oct 2004 17:33:00 -0000
@@ -167,7 +167,7 @@
 	current->personality = personality;
 	oep = current_thread_info()->exec_domain;
 	current_thread_info()->exec_domain = ep;
-	set_fs_altroot();
+	set_fs_altroot(NULL);
 
 	module_put(oep->module);
 	return 0;

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Support ia32 exec domains without CONFIG_IA32_SUPPORT
  2004-10-12 22:46               ` Arun Sharma
@ 2004-10-13 22:27                 ` Arun Sharma
  2004-10-14  7:32                   ` David Mosberger
  2004-10-14  8:50                 ` Jakub Jelinek
  1 sibling, 1 reply; 8+ messages in thread
From: Arun Sharma @ 2004-10-13 22:27 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Christoph Hellwig, linux-kernel, linux-ia64

[-- Attachment #1: Type: text/plain, Size: 527 bytes --]

Arun Sharma wrote:

> Christoph doesn't like the idea of adding exec-domains just for this 
> purpose and has suggested adding a new system call to set the altroot. A 
> prototype patch to do this already exists. I will be cleaning it up and 
>  posting it to LKML later this week. The main purpose of moving the 
> discussion to LKML was to see how receptive people were to the proposed 
> new system call.
> 

Attached is the promised patch. It addresses Christoph's comments and 
fixes the bug Tony found as well.

	-Arun



[-- Attachment #2: sys_altroot.txt --]
[-- Type: text/plain, Size: 3034 bytes --]

Add a new system call sys_altroot. This allows using the 
altroot feature on systems where there is only one exec domain.

Signed-off-by: Zou Nanhai <nanhai.zou@intel.com>
Signed-off-by: Gordon Jin <gordon.jin@intel.com>
Signed-off-by: Arun Sharma <arun.sharma@intel.com>

diff -Nraup a/arch/ia64/kernel/entry.S b/arch/ia64/kernel/entry.S
--- a/arch/ia64/kernel/entry.S	2004-10-12 09:56:51.408496174 -0700
+++ b/arch/ia64/kernel/entry.S	2004-10-12 09:58:17.362596684 -0700
@@ -1527,7 +1527,7 @@ sys_call_table:
 	data8 sys_mq_getsetattr
 	data8 sys_ni_syscall			// reserved for kexec_load
 	data8 sys_ni_syscall
-	data8 sys_ni_syscall			// 1270
+	data8 sys_setaltroot			// 1270
 	data8 sys_ni_syscall
 	data8 sys_ni_syscall
 	data8 sys_ni_syscall
diff -Nraup a/fs/namei.c b/fs/namei.c
--- a/fs/namei.c	2004-10-12 09:56:56.895800795 -0700
+++ b/fs/namei.c	2004-10-12 09:58:33.524705861 -0700
@@ -897,20 +897,20 @@ static int __emul_lookup_dentry(const ch
 	return 1;
 }
 
-void set_fs_altroot(void)
+int __set_fs_altroot(const char *altroot)
 {
-	char *emul = __emul_prefix();
 	struct nameidata nd;
 	struct vfsmount *mnt = NULL, *oldmnt;
 	struct dentry *dentry = NULL, *olddentry;
 	int err;
-
-	if (!emul)
+	if (!altroot)
 		goto set_it;
-	err = path_lookup(emul, LOOKUP_FOLLOW|LOOKUP_DIRECTORY|LOOKUP_NOALT, &nd);
+	err = path_lookup(altroot, LOOKUP_FOLLOW|LOOKUP_DIRECTORY|LOOKUP_NOALT, &nd);
 	if (!err) {
 		mnt = nd.mnt;
 		dentry = nd.dentry;
+	} else {
+		return err;
 	}
 set_it:
 	write_lock(&current->fs->lock);
@@ -923,6 +923,58 @@ set_it:
 		dput(olddentry);
 		mntput(oldmnt);
 	}
+	return 0;
+}
+
+void set_fs_altroot()
+{
+	char *emul = __emul_prefix();
+
+	__set_fs_altroot(emul);
+}
+
+asmlinkage long sys_setaltroot(const char __user * altroot)
+{
+	char *emul = NULL;
+	int ret;
+
+	if (altroot) {
+		emul = getname(altroot);
+		if (IS_ERR(emul)) {
+			ret = PTR_ERR(emul);
+			goto out;
+		}
+	}
+
+	if (atomic_read(&current->fs->count) != 1) {
+		struct fs_struct *fsp, *ofsp;
+
+		fsp = copy_fs_struct(current->fs);
+		if (fsp == NULL) {
+			ret = -ENOMEM;
+			goto out_putname;
+		}
+
+		task_lock(current);
+		ofsp = current->fs;
+		current->fs = fsp;
+		task_unlock(current);
+
+		put_fs_struct(ofsp);
+	}
+
+	/*
+	 * At that point we are guaranteed to be the sole owner of
+	 * current->fs.
+	 */
+
+	ret = __set_fs_altroot(emul);
+
+out_putname:
+	if (emul)
+		putname(emul);
+out:
+	return ret;
 }
 
 int fastcall path_lookup(const char *name, unsigned int flags, struct nameidata *nd)
diff -Nraup a/include/linux/syscalls.h b/include/linux/syscalls.h
--- a/include/linux/syscalls.h	2004-10-12 09:56:58.124316405 -0700
+++ b/include/linux/syscalls.h	2004-10-12 09:58:17.362596684 -0700
@@ -489,6 +489,7 @@ asmlinkage long sys_nfsservctl(int cmd,
 				void __user *res);
 asmlinkage long sys_syslog(int type, char __user *buf, int len);
 asmlinkage long sys_uselib(const char __user *library);
+asmlinkage long sys_setaltroot(const char __user *altroot);
 asmlinkage long sys_ni_syscall(void);
 
 #endif

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Support ia32 exec domains without CONFIG_IA32_SUPPORT
  2004-10-13 22:27                 ` Arun Sharma
@ 2004-10-14  7:32                   ` David Mosberger
  2004-10-14  8:25                     ` David Woodhouse
  0 siblings, 1 reply; 8+ messages in thread
From: David Mosberger @ 2004-10-14  7:32 UTC (permalink / raw)
  To: Arun Sharma; +Cc: David Woodhouse, Christoph Hellwig, linux-kernel, linux-ia64

>>>>> On Wed, 13 Oct 2004 15:27:05 -0700, Arun Sharma <arun.sharma@intel.com> said:

  Arun> Arun Sharma wrote:
  >> Christoph doesn't like the idea of adding exec-domains just for
  >> this purpose and has suggested adding a new system call to set
  >> the altroot. A prototype patch to do this already exists. I will
  >> be cleaning it up and posting it to LKML later this week. The
  >> main purpose of moving the discussion to LKML was to see how
  >> receptive people were to the proposed new system call.

  Arun> Attached is the promised patch. It addresses Christoph's
  Arun> comments and fixes the bug Tony found as well.

I like it.  Once it's in (surely it will go in, no? ;-),
we can change asm-ia64/namei.h to define a non-NULL __emul_prefix()
only if CONFIG_IA32_SUPPORT is defined.  One less hardcoded path
to worry about!

	--david

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Support ia32 exec domains without CONFIG_IA32_SUPPORT
  2004-10-14  7:32                   ` David Mosberger
@ 2004-10-14  8:25                     ` David Woodhouse
  0 siblings, 0 replies; 8+ messages in thread
From: David Woodhouse @ 2004-10-14  8:25 UTC (permalink / raw)
  To: davidm; +Cc: Arun Sharma, Christoph Hellwig, linux-kernel, linux-ia64

On Thu, 2004-10-14 at 00:32 -0700, David Mosberger wrote:
>   Arun> Attached is the promised patch. It addresses Christoph's
>   Arun> comments and fixes the bug Tony found as well.
> 
> I like it.  Once it's in (surely it will go in, no? ;-),

It certainly looks saner, yes. If this is the way we're going, then the
new syscall needs to be added on all other architectures, and I'd
suggest that we should see a real open-source user of it too before it's
merged. Does anyone have the qemu patches?

-- 
dwmw2


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Support ia32 exec domains without CONFIG_IA32_SUPPORT
  2004-10-12 22:46               ` Arun Sharma
  2004-10-13 22:27                 ` Arun Sharma
@ 2004-10-14  8:50                 ` Jakub Jelinek
  2004-10-14 17:53                   ` Arun Sharma
  1 sibling, 1 reply; 8+ messages in thread
From: Jakub Jelinek @ 2004-10-14  8:50 UTC (permalink / raw)
  To: Arun Sharma; +Cc: David Woodhouse, Christoph Hellwig, linux-kernel, linux-ia64

On Tue, Oct 12, 2004 at 03:46:39PM -0700, Arun Sharma wrote:
> --- linux-2.6-cvs/fs/namei.c	2 Oct 2004 17:59:55 -0000	1.110
> +++ linux-2.6-cvs/fs/namei.c	8 Oct 2004 18:14:11 -0000
> @@ -897,7 +897,7 @@
>  	return 1;
>  }
>  
> -void set_fs_altroot(void)
> +int set_fs_altroot(const char __user *altroot)
>  {
>  	char *emul = __emul_prefix();

I think you should change this to char *emul;

>  	struct nameidata nd;
> @@ -905,12 +905,20 @@
>  	struct dentry *dentry = NULL, *olddentry;
>  	int err;
>  
> +	if (altroot) {
> +		emul = getname(altroot);
> +		if (IS_ERR(emul))
> +			return PTR_ERR(emul);
> +	}
> +

and add here
	else
		emul = __emul_prefix();

There is no point in calling __emul_prefix () when it will be thrown away.

	Jakub

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Support ia32 exec domains without CONFIG_IA32_SUPPORT
  2004-10-14  8:50                 ` Jakub Jelinek
@ 2004-10-14 17:53                   ` Arun Sharma
  0 siblings, 0 replies; 8+ messages in thread
From: Arun Sharma @ 2004-10-14 17:53 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: David Woodhouse, Christoph Hellwig, linux-kernel, linux-ia64

On 10/14/2004 1:50 AM, Jakub Jelinek wrote:

> There is no point in calling __emul_prefix () when it will be thrown away.

This is addressed in the most recent patch posted to the mailing list.

	-Arun

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2004-10-14 18:40 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <41643EC0.1010505@intel.com>
     [not found] ` <20041007142710.A12688@infradead.org>
     [not found]   ` <4165D4C9.2040804@intel.com>
     [not found]     ` <mailman.1097223239.25078@unix-os.sc.intel.com>
     [not found]       ` <41671696.1060706@intel.com>
     [not found]         ` <mailman.1097403036.11924@unix-os.sc.intel.com>
2004-10-11 21:05           ` [PATCH] Support ia32 exec domains without CONFIG_IA32_SUPPORT Arun Sharma
2004-10-12 21:50             ` David Woodhouse
2004-10-12 22:46               ` Arun Sharma
2004-10-13 22:27                 ` Arun Sharma
2004-10-14  7:32                   ` David Mosberger
2004-10-14  8:25                     ` David Woodhouse
2004-10-14  8:50                 ` Jakub Jelinek
2004-10-14 17:53                   ` Arun Sharma

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).