All of lore.kernel.org
 help / color / mirror / Atom feed
From: Fam Zheng <famz@redhat.com>
To: Omar Sandoval <osandov@osandov.com>
Cc: linux-kernel@vger.kernel.org,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
	x86@kernel.org, Alexander Viro <viro@zeniv.linux.org.uk>,
	Andrew Morton <akpm@linux-foundation.org>,
	Kees Cook <keescook@chromium.org>,
	Andy Lutomirski <luto@amacapital.net>,
	David Herrmann <dh.herrmann@gmail.com>,
	Alexei Starovoitov <ast@plumgrid.com>,
	Miklos Szeredi <mszeredi@suse.cz>,
	David Drysdale <drysdale@google.com>,
	Oleg Nesterov <oleg@redhat.com>,
	"David S. Miller" <davem@davemloft.net>,
	Vivek Goyal <vgoyal@redhat.com>,
	Mike Frysinger <vapier@gentoo.org>,
	"Theodore Ts'o" <tytso@mit.edu>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	Rasmus Villemoes <linux@rasmusvillemoes.dk>,
	Rashika Kheria <rashika.kheria@gmail.com>,
	Hugh Dickins <hughd@google.com>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Peter Zijlstra <peterz@infradead.org>,
	linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org,
	Josh Triplett <josh@joshtriplett.org>,
	"Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [PATCH RFC 5/6] epoll: Add implementation for epoll_mod_wait
Date: Wed, 21 Jan 2015 16:59:00 +0800	[thread overview]
Message-ID: <20150121085900.GA25434@ad.nay.redhat.com> (raw)
In-Reply-To: <20150121075617.GA21266@mew>

On Tue, 01/20 23:56, Omar Sandoval wrote:
> On Tue, Jan 20, 2015 at 05:57:57PM +0800, Fam Zheng wrote:
> > This syscall is a sequence of
> > 
> > 1) a number of epoll_ctl calls
> > 2) a epoll_pwait, with timeout enhancement.
> > 
> > The epoll_ctl operations are embeded so that application doesn't have to use
> > separate syscalls to insert/delete/update the fds before poll. It is more
> > efficient if the set of fds varies from one poll to another, which is the
> > common pattern for certain applications. For example, depending on the input
> > buffer status, a data reading program may decide to temporarily not polling an
> > fd.
> > 
> > Because the enablement of batching in this interface, even that regular
> > epoll_ctl call sequence, which manipulates several fds, can be optimized to one
> > single epoll_ctl_wait (while specifying spec=NULL to skip the poll part).
> > 
> > The only complexity is returning the result of each operation.  For each
> > epoll_mod_cmd in cmds, the field "error" is an output field that will be stored
> > the return code *iff* the command is executed (0 for success and -errno of the
> > equivalent epoll_ctl call), and will be left unchanged if the command is not
> > executed because some earlier error, for example due to failure of
> > copy_from_user to copy the array.
> > 
> > Applications can utilize this fact to do error handling: they could initialize
> > all the epoll_mod_wait.error to a positive value, which is by definition not a
> > possible output value from epoll_mod_wait. Then when the syscall returned, they
> > know whether or not the command is executed by comparing each error with the
> > init value, if they're different, they have the result of the command.
> > More roughly, they can put any non-zero and not distinguish "not run" from
> > failure.
> > 
> > Also, timeout parameter is enhanced: timespec is used, compared to the old ms
> > scalar. This provides higher precision. The parameter field in struct
> > epoll_wait_spec, "clockid", also makes it possible for users to use a different
> > clock than the default when it makes more sense.
> > 
> > Signed-off-by: Fam Zheng <famz@redhat.com>
> > ---
> >  fs/eventpoll.c           | 60 ++++++++++++++++++++++++++++++++++++++++++++++++
> >  include/linux/syscalls.h |  5 ++++
> >  2 files changed, 65 insertions(+)
> > 
> > diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> > index e7a116d..2cc22c9 100644
> > --- a/fs/eventpoll.c
> > +++ b/fs/eventpoll.c
> > @@ -2067,6 +2067,66 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct epoll_event __user *, events,
> >  			      sigmask ? &ksigmask : NULL);
> >  }
> >  
> > +SYSCALL_DEFINE5(epoll_mod_wait, int, epfd, int, flags,
> > +		int, ncmds, struct epoll_mod_cmd __user *, cmds,
> > +		struct epoll_wait_spec __user *, spec)
> > +{
> > +	struct epoll_mod_cmd *kcmds = NULL;
> > +	int i, ret = 0;
> > +	int cmd_size = sizeof(struct epoll_mod_cmd) * ncmds;
> > +
> > +	if (flags)
> > +		return -EINVAL;
> > +	if (ncmds) {
> > +		if (!cmds)
> > +			return -EINVAL;
> > +		kcmds = kmalloc(cmd_size, GFP_KERNEL);
> > +		if (!kcmds)
> > +			return -ENOMEM;
> > +		if (copy_from_user(kcmds, cmds, cmd_size)) {
> > +			ret = -EFAULT;
> > +			goto out;
> > +		}
> > +	}
> > +	for (i = 0; i < ncmds; i++) {
> > +		struct epoll_event ev = (struct epoll_event) {
> > +			.events = kcmds[i].events,
> > +			.data = kcmds[i].data,
> > +		};
> > +		if (kcmds[i].flags) {
> > +			kcmds[i].error = ret = -EINVAL;
> > +			goto out;
> > +		}
> > +		kcmds[i].error = ret = ep_ctl_do(epfd, kcmds[i].op, kcmds[i].fd, ev);
> > +		if (ret)
> > +			goto out;
> > +	}
> > +	if (spec) {
> > +		sigset_t ksigmask;
> > +		struct epoll_wait_spec kspec;
> > +		ktime_t timeout;
> > +
> > +		if(copy_from_user(&kspec, spec, sizeof(struct epoll_wait_spec)))
> > +			return -EFAULT;
> This should probably be goto out, or you'll leak kcmds.
> 
> > +		if (kspec.sigmask) {
> > +			if (kspec.sigsetsize != sizeof(sigset_t))
> > +				return -EINVAL;
> Same here...
> 
> > +			if (copy_from_user(&ksigmask, kspec.sigmask, sizeof(ksigmask)))
> > +				return -EFAULT;
> and here.
> 
> > +		}
> > +		timeout = timespec_to_ktime(kspec.timeout);
> > +		ret = epoll_pwait_do(epfd, kspec.events, kspec.maxevents,
> > +				     kspec.clockid, timeout,
> > +				     kspec.sigmask ? &ksigmask : NULL);
> > +	}
> > +
> > +out:
> > +	if (ncmds && copy_to_user(cmds, kcmds, cmd_size))
> > +		return -EFAULT;
> This will also leak kcmds, it should be ret = -EFAULT. This case, however, seems
> to lead to a weird corner case: if cmds is read-only, we'll end up executing
> every command but fail to copy out the return values, so when userspace gets the
> EFAULT, it won't know whether anything was executed. But, getting an EFAULT here
> means you're probably doing something wrong anyways, so maybe not the biggest
> concern.

Yes, thanks! Will fix this.

Fam

> 
> > +	kfree(kcmds);
> > +	return ret;
> > +}
> > +
> >  #ifdef CONFIG_COMPAT
> >  COMPAT_SYSCALL_DEFINE6(epoll_pwait, int, epfd,
> >  		       struct epoll_event __user *, events,
> > diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> > index 85893d7..7156c80 100644
> > --- a/include/linux/syscalls.h
> > +++ b/include/linux/syscalls.h
> > @@ -12,6 +12,8 @@
> >  #define _LINUX_SYSCALLS_H
> >  
> >  struct epoll_event;
> > +struct epoll_mod_cmd;
> > +struct epoll_wait_spec;
> >  struct iattr;
> >  struct inode;
> >  struct iocb;
> > @@ -630,6 +632,9 @@ asmlinkage long sys_epoll_pwait(int epfd, struct epoll_event __user *events,
> >  				int maxevents, int timeout,
> >  				const sigset_t __user *sigmask,
> >  				size_t sigsetsize);
> > +asmlinkage long sys_epoll_mod_wait(int epfd, int flags,
> > +				   int ncmds, struct epoll_mod_cmd __user * cmds,
> > +				   struct epoll_wait_spec __user * spec);
> >  asmlinkage long sys_gethostname(char __user *name, int len);
> >  asmlinkage long sys_sethostname(char __user *name, int len);
> >  asmlinkage long sys_setdomainname(char __user *name, int len);
> > -- 
> > 1.9.3
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> 
> -- 
> Omar

WARNING: multiple messages have this Message-ID (diff)
From: Fam Zheng <famz@redhat.com>
To: Omar Sandoval <osandov@osandov.com>
Cc: linux-kernel@vger.kernel.org,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
	x86@kernel.org, Alexander Viro <viro@zeniv.linux.org.uk>,
	Andrew Morton <akpm@linux-foundation.org>,
	Kees Cook <keescook@chromium.org>,
	Andy Lutomirski <luto@amacapital.net>,
	David Herrmann <dh.herrmann@gmail.com>,
	Alexei Starovoitov <ast@plumgrid.com>,
	Miklos Szeredi <mszeredi@suse.cz>,
	David Drysdale <drysdale@google.com>,
	Oleg Nesterov <oleg@redhat.com>,
	"David S. Miller" <davem@davemloft.net>,
	Vivek Goyal <vgoyal@redhat.com>,
	Mike Frysinger <vapier@gentoo.org>,
	"Theodore Ts'o" <tytso@mit.edu>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	Rasmus Villemoes <linux@rasmusvillemoes.dk>,
	Rashika Kheria <rashika.kheria@gmail.com>,
	Hugh Dickins <hughd@google.com>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Peter Zijlstra <peter
Subject: Re: [PATCH RFC 5/6] epoll: Add implementation for epoll_mod_wait
Date: Wed, 21 Jan 2015 16:59:00 +0800	[thread overview]
Message-ID: <20150121085900.GA25434@ad.nay.redhat.com> (raw)
In-Reply-To: <20150121075617.GA21266@mew>

On Tue, 01/20 23:56, Omar Sandoval wrote:
> On Tue, Jan 20, 2015 at 05:57:57PM +0800, Fam Zheng wrote:
> > This syscall is a sequence of
> > 
> > 1) a number of epoll_ctl calls
> > 2) a epoll_pwait, with timeout enhancement.
> > 
> > The epoll_ctl operations are embeded so that application doesn't have to use
> > separate syscalls to insert/delete/update the fds before poll. It is more
> > efficient if the set of fds varies from one poll to another, which is the
> > common pattern for certain applications. For example, depending on the input
> > buffer status, a data reading program may decide to temporarily not polling an
> > fd.
> > 
> > Because the enablement of batching in this interface, even that regular
> > epoll_ctl call sequence, which manipulates several fds, can be optimized to one
> > single epoll_ctl_wait (while specifying spec=NULL to skip the poll part).
> > 
> > The only complexity is returning the result of each operation.  For each
> > epoll_mod_cmd in cmds, the field "error" is an output field that will be stored
> > the return code *iff* the command is executed (0 for success and -errno of the
> > equivalent epoll_ctl call), and will be left unchanged if the command is not
> > executed because some earlier error, for example due to failure of
> > copy_from_user to copy the array.
> > 
> > Applications can utilize this fact to do error handling: they could initialize
> > all the epoll_mod_wait.error to a positive value, which is by definition not a
> > possible output value from epoll_mod_wait. Then when the syscall returned, they
> > know whether or not the command is executed by comparing each error with the
> > init value, if they're different, they have the result of the command.
> > More roughly, they can put any non-zero and not distinguish "not run" from
> > failure.
> > 
> > Also, timeout parameter is enhanced: timespec is used, compared to the old ms
> > scalar. This provides higher precision. The parameter field in struct
> > epoll_wait_spec, "clockid", also makes it possible for users to use a different
> > clock than the default when it makes more sense.
> > 
> > Signed-off-by: Fam Zheng <famz@redhat.com>
> > ---
> >  fs/eventpoll.c           | 60 ++++++++++++++++++++++++++++++++++++++++++++++++
> >  include/linux/syscalls.h |  5 ++++
> >  2 files changed, 65 insertions(+)
> > 
> > diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> > index e7a116d..2cc22c9 100644
> > --- a/fs/eventpoll.c
> > +++ b/fs/eventpoll.c
> > @@ -2067,6 +2067,66 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct epoll_event __user *, events,
> >  			      sigmask ? &ksigmask : NULL);
> >  }
> >  
> > +SYSCALL_DEFINE5(epoll_mod_wait, int, epfd, int, flags,
> > +		int, ncmds, struct epoll_mod_cmd __user *, cmds,
> > +		struct epoll_wait_spec __user *, spec)
> > +{
> > +	struct epoll_mod_cmd *kcmds = NULL;
> > +	int i, ret = 0;
> > +	int cmd_size = sizeof(struct epoll_mod_cmd) * ncmds;
> > +
> > +	if (flags)
> > +		return -EINVAL;
> > +	if (ncmds) {
> > +		if (!cmds)
> > +			return -EINVAL;
> > +		kcmds = kmalloc(cmd_size, GFP_KERNEL);
> > +		if (!kcmds)
> > +			return -ENOMEM;
> > +		if (copy_from_user(kcmds, cmds, cmd_size)) {
> > +			ret = -EFAULT;
> > +			goto out;
> > +		}
> > +	}
> > +	for (i = 0; i < ncmds; i++) {
> > +		struct epoll_event ev = (struct epoll_event) {
> > +			.events = kcmds[i].events,
> > +			.data = kcmds[i].data,
> > +		};
> > +		if (kcmds[i].flags) {
> > +			kcmds[i].error = ret = -EINVAL;
> > +			goto out;
> > +		}
> > +		kcmds[i].error = ret = ep_ctl_do(epfd, kcmds[i].op, kcmds[i].fd, ev);
> > +		if (ret)
> > +			goto out;
> > +	}
> > +	if (spec) {
> > +		sigset_t ksigmask;
> > +		struct epoll_wait_spec kspec;
> > +		ktime_t timeout;
> > +
> > +		if(copy_from_user(&kspec, spec, sizeof(struct epoll_wait_spec)))
> > +			return -EFAULT;
> This should probably be goto out, or you'll leak kcmds.
> 
> > +		if (kspec.sigmask) {
> > +			if (kspec.sigsetsize != sizeof(sigset_t))
> > +				return -EINVAL;
> Same here...
> 
> > +			if (copy_from_user(&ksigmask, kspec.sigmask, sizeof(ksigmask)))
> > +				return -EFAULT;
> and here.
> 
> > +		}
> > +		timeout = timespec_to_ktime(kspec.timeout);
> > +		ret = epoll_pwait_do(epfd, kspec.events, kspec.maxevents,
> > +				     kspec.clockid, timeout,
> > +				     kspec.sigmask ? &ksigmask : NULL);
> > +	}
> > +
> > +out:
> > +	if (ncmds && copy_to_user(cmds, kcmds, cmd_size))
> > +		return -EFAULT;
> This will also leak kcmds, it should be ret = -EFAULT. This case, however, seems
> to lead to a weird corner case: if cmds is read-only, we'll end up executing
> every command but fail to copy out the return values, so when userspace gets the
> EFAULT, it won't know whether anything was executed. But, getting an EFAULT here
> means you're probably doing something wrong anyways, so maybe not the biggest
> concern.

Yes, thanks! Will fix this.

Fam

> 
> > +	kfree(kcmds);
> > +	return ret;
> > +}
> > +
> >  #ifdef CONFIG_COMPAT
> >  COMPAT_SYSCALL_DEFINE6(epoll_pwait, int, epfd,
> >  		       struct epoll_event __user *, events,
> > diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> > index 85893d7..7156c80 100644
> > --- a/include/linux/syscalls.h
> > +++ b/include/linux/syscalls.h
> > @@ -12,6 +12,8 @@
> >  #define _LINUX_SYSCALLS_H
> >  
> >  struct epoll_event;
> > +struct epoll_mod_cmd;
> > +struct epoll_wait_spec;
> >  struct iattr;
> >  struct inode;
> >  struct iocb;
> > @@ -630,6 +632,9 @@ asmlinkage long sys_epoll_pwait(int epfd, struct epoll_event __user *events,
> >  				int maxevents, int timeout,
> >  				const sigset_t __user *sigmask,
> >  				size_t sigsetsize);
> > +asmlinkage long sys_epoll_mod_wait(int epfd, int flags,
> > +				   int ncmds, struct epoll_mod_cmd __user * cmds,
> > +				   struct epoll_wait_spec __user * spec);
> >  asmlinkage long sys_gethostname(char __user *name, int len);
> >  asmlinkage long sys_sethostname(char __user *name, int len);
> >  asmlinkage long sys_setdomainname(char __user *name, int len);
> > -- 
> > 1.9.3
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> 
> -- 
> Omar

WARNING: multiple messages have this Message-ID (diff)
From: Fam Zheng <famz@redhat.com>
To: Omar Sandoval <osandov@osandov.com>
Cc: linux-kernel@vger.kernel.org,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
	x86@kernel.org, Alexander Viro <viro@zeniv.linux.org.uk>,
	Andrew Morton <akpm@linux-foundation.org>,
	Kees Cook <keescook@chromium.org>,
	Andy Lutomirski <luto@amacapital.net>,
	David Herrmann <dh.herrmann@gmail.com>,
	Alexei Starovoitov <ast@plumgrid.com>,
	Miklos Szeredi <mszeredi@suse.cz>,
	David Drysdale <drysdale@google.com>,
	Oleg Nesterov <oleg@redhat.com>,
	"David S. Miller" <davem@davemloft.net>,
	Vivek Goyal <vgoyal@redhat.com>,
	Mike Frysinger <vapier@gentoo.org>, Theodore Ts'o <tytso@mit.edu>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	Rasmus Villemoes <linux@rasmusvillemoes.dk>,
	Rashika Kheria <rashika.kheria@gmail.com>,
	Hugh Dickins <hughd@google.com>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Peter Zijlstra <peter>
Subject: Re: [PATCH RFC 5/6] epoll: Add implementation for epoll_mod_wait
Date: Wed, 21 Jan 2015 16:59:00 +0800	[thread overview]
Message-ID: <20150121085900.GA25434@ad.nay.redhat.com> (raw)
In-Reply-To: <20150121075617.GA21266@mew>

On Tue, 01/20 23:56, Omar Sandoval wrote:
> On Tue, Jan 20, 2015 at 05:57:57PM +0800, Fam Zheng wrote:
> > This syscall is a sequence of
> > 
> > 1) a number of epoll_ctl calls
> > 2) a epoll_pwait, with timeout enhancement.
> > 
> > The epoll_ctl operations are embeded so that application doesn't have to use
> > separate syscalls to insert/delete/update the fds before poll. It is more
> > efficient if the set of fds varies from one poll to another, which is the
> > common pattern for certain applications. For example, depending on the input
> > buffer status, a data reading program may decide to temporarily not polling an
> > fd.
> > 
> > Because the enablement of batching in this interface, even that regular
> > epoll_ctl call sequence, which manipulates several fds, can be optimized to one
> > single epoll_ctl_wait (while specifying spec=NULL to skip the poll part).
> > 
> > The only complexity is returning the result of each operation.  For each
> > epoll_mod_cmd in cmds, the field "error" is an output field that will be stored
> > the return code *iff* the command is executed (0 for success and -errno of the
> > equivalent epoll_ctl call), and will be left unchanged if the command is not
> > executed because some earlier error, for example due to failure of
> > copy_from_user to copy the array.
> > 
> > Applications can utilize this fact to do error handling: they could initialize
> > all the epoll_mod_wait.error to a positive value, which is by definition not a
> > possible output value from epoll_mod_wait. Then when the syscall returned, they
> > know whether or not the command is executed by comparing each error with the
> > init value, if they're different, they have the result of the command.
> > More roughly, they can put any non-zero and not distinguish "not run" from
> > failure.
> > 
> > Also, timeout parameter is enhanced: timespec is used, compared to the old ms
> > scalar. This provides higher precision. The parameter field in struct
> > epoll_wait_spec, "clockid", also makes it possible for users to use a different
> > clock than the default when it makes more sense.
> > 
> > Signed-off-by: Fam Zheng <famz@redhat.com>
> > ---
> >  fs/eventpoll.c           | 60 ++++++++++++++++++++++++++++++++++++++++++++++++
> >  include/linux/syscalls.h |  5 ++++
> >  2 files changed, 65 insertions(+)
> > 
> > diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> > index e7a116d..2cc22c9 100644
> > --- a/fs/eventpoll.c
> > +++ b/fs/eventpoll.c
> > @@ -2067,6 +2067,66 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct epoll_event __user *, events,
> >  			      sigmask ? &ksigmask : NULL);
> >  }
> >  
> > +SYSCALL_DEFINE5(epoll_mod_wait, int, epfd, int, flags,
> > +		int, ncmds, struct epoll_mod_cmd __user *, cmds,
> > +		struct epoll_wait_spec __user *, spec)
> > +{
> > +	struct epoll_mod_cmd *kcmds = NULL;
> > +	int i, ret = 0;
> > +	int cmd_size = sizeof(struct epoll_mod_cmd) * ncmds;
> > +
> > +	if (flags)
> > +		return -EINVAL;
> > +	if (ncmds) {
> > +		if (!cmds)
> > +			return -EINVAL;
> > +		kcmds = kmalloc(cmd_size, GFP_KERNEL);
> > +		if (!kcmds)
> > +			return -ENOMEM;
> > +		if (copy_from_user(kcmds, cmds, cmd_size)) {
> > +			ret = -EFAULT;
> > +			goto out;
> > +		}
> > +	}
> > +	for (i = 0; i < ncmds; i++) {
> > +		struct epoll_event ev = (struct epoll_event) {
> > +			.events = kcmds[i].events,
> > +			.data = kcmds[i].data,
> > +		};
> > +		if (kcmds[i].flags) {
> > +			kcmds[i].error = ret = -EINVAL;
> > +			goto out;
> > +		}
> > +		kcmds[i].error = ret = ep_ctl_do(epfd, kcmds[i].op, kcmds[i].fd, ev);
> > +		if (ret)
> > +			goto out;
> > +	}
> > +	if (spec) {
> > +		sigset_t ksigmask;
> > +		struct epoll_wait_spec kspec;
> > +		ktime_t timeout;
> > +
> > +		if(copy_from_user(&kspec, spec, sizeof(struct epoll_wait_spec)))
> > +			return -EFAULT;
> This should probably be goto out, or you'll leak kcmds.
> 
> > +		if (kspec.sigmask) {
> > +			if (kspec.sigsetsize != sizeof(sigset_t))
> > +				return -EINVAL;
> Same here...
> 
> > +			if (copy_from_user(&ksigmask, kspec.sigmask, sizeof(ksigmask)))
> > +				return -EFAULT;
> and here.
> 
> > +		}
> > +		timeout = timespec_to_ktime(kspec.timeout);
> > +		ret = epoll_pwait_do(epfd, kspec.events, kspec.maxevents,
> > +				     kspec.clockid, timeout,
> > +				     kspec.sigmask ? &ksigmask : NULL);
> > +	}
> > +
> > +out:
> > +	if (ncmds && copy_to_user(cmds, kcmds, cmd_size))
> > +		return -EFAULT;
> This will also leak kcmds, it should be ret = -EFAULT. This case, however, seems
> to lead to a weird corner case: if cmds is read-only, we'll end up executing
> every command but fail to copy out the return values, so when userspace gets the
> EFAULT, it won't know whether anything was executed. But, getting an EFAULT here
> means you're probably doing something wrong anyways, so maybe not the biggest
> concern.

Yes, thanks! Will fix this.

Fam

> 
> > +	kfree(kcmds);
> > +	return ret;
> > +}
> > +
> >  #ifdef CONFIG_COMPAT
> >  COMPAT_SYSCALL_DEFINE6(epoll_pwait, int, epfd,
> >  		       struct epoll_event __user *, events,
> > diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> > index 85893d7..7156c80 100644
> > --- a/include/linux/syscalls.h
> > +++ b/include/linux/syscalls.h
> > @@ -12,6 +12,8 @@
> >  #define _LINUX_SYSCALLS_H
> >  
> >  struct epoll_event;
> > +struct epoll_mod_cmd;
> > +struct epoll_wait_spec;
> >  struct iattr;
> >  struct inode;
> >  struct iocb;
> > @@ -630,6 +632,9 @@ asmlinkage long sys_epoll_pwait(int epfd, struct epoll_event __user *events,
> >  				int maxevents, int timeout,
> >  				const sigset_t __user *sigmask,
> >  				size_t sigsetsize);
> > +asmlinkage long sys_epoll_mod_wait(int epfd, int flags,
> > +				   int ncmds, struct epoll_mod_cmd __user * cmds,
> > +				   struct epoll_wait_spec __user * spec);
> >  asmlinkage long sys_gethostname(char __user *name, int len);
> >  asmlinkage long sys_sethostname(char __user *name, int len);
> >  asmlinkage long sys_setdomainname(char __user *name, int len);
> > -- 
> > 1.9.3
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> 
> -- 
> Omar

  reply	other threads:[~2015-01-21  9:00 UTC|newest]

Thread overview: 80+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-01-20  9:57 [PATCH RFC 0/6] epoll: Introduce new syscall "epoll_mod_wait" Fam Zheng
2015-01-20  9:57 ` Fam Zheng
2015-01-20  9:57 ` Fam Zheng
2015-01-20  9:57 ` [PATCH RFC 1/6] epoll: Extract epoll_wait_do and epoll_pwait_do Fam Zheng
2015-01-20  9:57   ` Fam Zheng
2015-01-20  9:57   ` Fam Zheng
2015-01-20  9:57 ` [PATCH RFC 2/6] epoll: Specify clockid explicitly Fam Zheng
2015-01-20  9:57   ` Fam Zheng
2015-01-20  9:57   ` Fam Zheng
2015-01-20  9:57 ` [PATCH RFC 3/6] epoll: Add definition for epoll_mod_wait structures Fam Zheng
2015-01-20  9:57   ` Fam Zheng
2015-01-20  9:57   ` Fam Zheng
2015-01-20  9:57 ` [PATCH RFC 4/6] epoll: Extract ep_ctl_do Fam Zheng
2015-01-20  9:57   ` Fam Zheng
2015-01-20  9:57   ` Fam Zheng
2015-01-20  9:57 ` [PATCH RFC 5/6] epoll: Add implementation for epoll_mod_wait Fam Zheng
2015-01-20  9:57   ` Fam Zheng
2015-01-20  9:57   ` Fam Zheng
2015-01-20 12:50   ` Michael Kerrisk (man-pages)
2015-01-20 12:50     ` Michael Kerrisk (man-pages)
2015-01-20 12:50     ` Michael Kerrisk (man-pages)
2015-01-21  4:59     ` Fam Zheng
2015-01-21  4:59       ` Fam Zheng
2015-01-21  4:59       ` Fam Zheng
2015-01-21  7:52       ` Michael Kerrisk (man-pages)
2015-01-21  7:52         ` Michael Kerrisk (man-pages)
2015-01-21  7:52         ` Michael Kerrisk (man-pages)
2015-01-21  8:58         ` Fam Zheng
2015-01-21  8:58           ` Fam Zheng
2015-01-21 10:37           ` Paolo Bonzini
2015-01-21 10:37             ` Paolo Bonzini
2015-01-21 11:14             ` Fam Zheng
2015-01-21 11:14               ` Fam Zheng
2015-01-21 11:14               ` Fam Zheng
2015-01-21 11:50               ` Paolo Bonzini
2015-01-21 11:50                 ` Paolo Bonzini
2015-01-21 11:50                 ` Paolo Bonzini
2015-01-22 21:12                 ` Andy Lutomirski
2015-01-22 21:12                   ` Andy Lutomirski
2015-01-22 21:12                   ` Andy Lutomirski
2015-01-23  6:20                   ` Fam Zheng
2015-01-23  6:20                     ` Fam Zheng
2015-01-23  6:20                     ` Fam Zheng
2015-01-23  9:56                   ` Paolo Bonzini
2015-01-23  9:56                     ` Paolo Bonzini
2015-01-23  9:56                     ` Paolo Bonzini
2015-01-21 10:34         ` Paolo Bonzini
2015-01-21 10:34           ` Paolo Bonzini
2015-01-21 10:34           ` Paolo Bonzini
2015-01-21  7:56   ` Omar Sandoval
2015-01-21  7:56     ` Omar Sandoval
2015-01-21  7:56     ` Omar Sandoval
2015-01-21  8:59     ` Fam Zheng [this message]
2015-01-21  8:59       ` Fam Zheng
2015-01-21  8:59       ` Fam Zheng
2015-01-20  9:57 ` [PATCH RFC 6/6] x86: Hook up epoll_mod_wait syscall Fam Zheng
2015-01-20  9:57   ` Fam Zheng
2015-01-20  9:57   ` Fam Zheng
2015-01-20 10:37 ` [PATCH RFC 0/6] epoll: Introduce new syscall "epoll_mod_wait" Rasmus Villemoes
2015-01-20 10:37   ` Rasmus Villemoes
2015-01-20 10:53   ` Fam Zheng
2015-01-20 10:53     ` Fam Zheng
2015-01-20 12:48 ` Michael Kerrisk (man-pages)
2015-01-20 12:48   ` Michael Kerrisk (man-pages)
2015-01-20 12:48   ` Michael Kerrisk (man-pages)
2015-01-21  9:05   ` Fam Zheng
2015-01-21  9:05     ` Fam Zheng
2015-01-21  9:05     ` Fam Zheng
2015-01-20 22:40 ` Andy Lutomirski
2015-01-20 22:40   ` Andy Lutomirski
2015-01-20 22:40   ` Andy Lutomirski
2015-01-20 23:03   ` josh
2015-01-20 23:03     ` josh-iaAMLnmF4UmaiuxdJuQwMA
2015-01-20 23:03     ` josh-iaAMLnmF4UmaiuxdJuQwMA
2015-01-21  5:55   ` Michael Kerrisk (man-pages)
2015-01-21  5:55     ` Michael Kerrisk (man-pages)
2015-01-21  5:55     ` Michael Kerrisk (man-pages)
2015-01-21  9:07   ` Fam Zheng
2015-01-21  9:07     ` Fam Zheng
2015-01-21  9:07     ` Fam Zheng

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150121085900.GA25434@ad.nay.redhat.com \
    --to=famz@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=ast@plumgrid.com \
    --cc=davem@davemloft.net \
    --cc=dh.herrmann@gmail.com \
    --cc=drysdale@google.com \
    --cc=heiko.carstens@de.ibm.com \
    --cc=hpa@zytor.com \
    --cc=hughd@google.com \
    --cc=josh@joshtriplett.org \
    --cc=keescook@chromium.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@rasmusvillemoes.dk \
    --cc=luto@amacapital.net \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mingo@redhat.com \
    --cc=mszeredi@suse.cz \
    --cc=mtk.manpages@gmail.com \
    --cc=oleg@redhat.com \
    --cc=osandov@osandov.com \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rashika.kheria@gmail.com \
    --cc=tglx@linutronix.de \
    --cc=tytso@mit.edu \
    --cc=vapier@gentoo.org \
    --cc=vgoyal@redhat.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.