[RFC] a corner case of open(2)

All of lore.kernel.org
 help / color / mirror / Atom feed

* [RFC] a corner case of open(2)
@ 2016-04-26 17:55 Al Viro
  2016-04-26 18:05 ` Cedric Blancher
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Al Viro @ 2016-04-26 17:55 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-fsdevel, linux-kernel

	According to POSIX (and behaviour on other Unices) the following
should succeed: open("/tmp", O_CREAT, 0) does not have O_EXCL and the pathname
does refer to existing object, so O_CREAT is ignored and the call is
equivalent to open("/tmp", 0), which succeeds.

	We have it rejected with EISDIR.  The thing is, the standard behaviour
is actually less messy wrt code, and do_last()/lookup_open()/atomic_open()
badly needs untangling.

	Another place where we produce a bogus EISDIR is O_CREAT|O_EXCL on
an existing directory.  POSIX (and other Unices) have EEXIST there.  In some
cases we produce EEXIST, in some - EISDIR.  Uniform EEXIST is actually easier.

	It is a change of user-visible behaviour, but I would be very
surprised if anything broke from that change.  And it would help to simplify
the awful mess we have in there.

	Comments?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] a corner case of open(2)
  2016-04-26 17:55 [RFC] a corner case of open(2) Al Viro
@ 2016-04-26 18:05 ` Cedric Blancher
  2016-04-26 18:15   ` Al Viro
  2016-04-26 18:41 ` Valdis.Kletnieks
  2016-04-27  5:34 ` Al Viro
  2 siblings, 1 reply; 12+ messages in thread
From: Cedric Blancher @ 2016-04-26 18:05 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, linux-fsdevel, Linux Kernel Mailing List

Existing UNIX behaviour is better. Also, for open() a directory,
remember that int fd=open(), fchdir(fd) must work.

Ced

On 26 April 2016 at 19:55, Al Viro <viro@zeniv.linux.org.uk> wrote:
>         According to POSIX (and behaviour on other Unices) the following
> should succeed: open("/tmp", O_CREAT, 0) does not have O_EXCL and the pathname
> does refer to existing object, so O_CREAT is ignored and the call is
> equivalent to open("/tmp", 0), which succeeds.
>
>         We have it rejected with EISDIR.  The thing is, the standard behaviour
> is actually less messy wrt code, and do_last()/lookup_open()/atomic_open()
> badly needs untangling.
>
>         Another place where we produce a bogus EISDIR is O_CREAT|O_EXCL on
> an existing directory.  POSIX (and other Unices) have EEXIST there.  In some
> cases we produce EEXIST, in some - EISDIR.  Uniform EEXIST is actually easier.
>
>         It is a change of user-visible behaviour, but I would be very
> surprised if anything broke from that change.  And it would help to simplify
> the awful mess we have in there.
>
>         Comments?
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Cedric Blancher <cedric.blancher@gmail.com>
Institute Pasteur

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] a corner case of open(2)
  2016-04-26 18:05 ` Cedric Blancher
@ 2016-04-26 18:15   ` Al Viro
  0 siblings, 0 replies; 12+ messages in thread
From: Al Viro @ 2016-04-26 18:15 UTC (permalink / raw)
  To: Cedric Blancher; +Cc: Linus Torvalds, linux-fsdevel, Linux Kernel Mailing List

On Tue, Apr 26, 2016 at 08:05:50PM +0200, Cedric Blancher wrote:
> Existing UNIX behaviour is better. Also, for open() a directory,
> remember that int fd=open(), fchdir(fd) must work.

What's to remember?  That part is quite orthogonal to what I'm talking about...

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] a corner case of open(2)
  2016-04-26 17:55 [RFC] a corner case of open(2) Al Viro
  2016-04-26 18:05 ` Cedric Blancher
@ 2016-04-26 18:41 ` Valdis.Kletnieks
  2016-04-26 19:02   ` Al Viro
  2016-04-27  5:34 ` Al Viro
  2 siblings, 1 reply; 12+ messages in thread
From: Valdis.Kletnieks @ 2016-04-26 18:41 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, linux-fsdevel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 732 bytes --]

On Tue, 26 Apr 2016 18:55:38 +0100, Al Viro said:

> 	It is a change of user-visible behaviour, but I would be very
> surprised if anything broke from that change.  And it would help to simplify
> the awful mess we have in there.

I have to admit that over the past 3 decades of working with Unix-y systems,
there's been a number of times I've had to resort to 'od -cx /your/dir/here'
to debug issues (/bin/ls -fi is *almost* equivalent, but doesn't show holes
in the directory)

The biggest danger I can see is some shell script doing something like:

foobar > $dir/$targetfile

and $targetfile is unset. If we allow a program to get an open fd that refers
to a directory, what are the semantics of various operations on that fd?


[-- Attachment #2: Type: application/pgp-signature, Size: 848 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] a corner case of open(2)
  2016-04-26 18:41 ` Valdis.Kletnieks
@ 2016-04-26 19:02   ` Al Viro
  2016-04-26 19:25     ` Valdis.Kletnieks
  0 siblings, 1 reply; 12+ messages in thread
From: Al Viro @ 2016-04-26 19:02 UTC (permalink / raw)
  To: Valdis.Kletnieks; +Cc: Linus Torvalds, linux-fsdevel, linux-kernel

On Tue, Apr 26, 2016 at 02:41:37PM -0400, Valdis.Kletnieks@vt.edu wrote:
> On Tue, 26 Apr 2016 18:55:38 +0100, Al Viro said:
> 
> > 	It is a change of user-visible behaviour, but I would be very
> > surprised if anything broke from that change.  And it would help to simplify
> > the awful mess we have in there.
> 
> I have to admit that over the past 3 decades of working with Unix-y systems,
> there's been a number of times I've had to resort to 'od -cx /your/dir/here'
> to debug issues (/bin/ls -fi is *almost* equivalent, but doesn't show holes
> in the directory)
> 
> The biggest danger I can see is some shell script doing something like:
> 
> foobar > $dir/$targetfile
> 
> and $targetfile is unset. If we allow a program to get an open fd that refers
> to a directory, what are the semantics of various operations on that fd?

Huh?  We certainly do allow to get an open fd that refers to a directory -
how else could ls(1) possibly work?  See getdents(2) - it does use an
open file descriptor to specify the directory we operate upon.

We also do not allow opening directories for *write*, and in that case EISDIR
is the right error (and we do return it).  The corner case in question is
different:
	* O_CREAT present
	* O_EXCL absent
	* O_RDWR absent
	* O_WRONLY absent
	* pathname refers to existing directory

That's where POSIX says "just open it for read, as if O_CREAT hadn't been
there" and we fail with EISDIR.  With both O_CREAT and O_EXCL POSIX says
"fail with EEXIST" and we either do that or fail with EISDIR, depending on the
pathname details.  With either of O_RDWR and O_WRONLY POSIX says "fail with
EISDIR, O_CREAT or no O_CREAT" and that's what we do (and would certainly keep
doing so).

If you look at the code you'll see
        case S_IFDIR:  
                if (acc_mode & MAY_WRITE)
                        return -EISDIR;
in may_open() and
        error = -EISDIR;
        if ((open_flag & O_CREAT) && d_is_dir(nd->path.dentry))
                goto out;
in do_last().  The former is "can't open them rw or w", the latter - "can't
have O_CREAT on those".  With O_CREAT|O_RDWR as in your example either one
would trigger (in reality the latter will trigger first and the call of
may_open() several lines below won't be reached at all).

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] a corner case of open(2)
  2016-04-26 19:02   ` Al Viro
@ 2016-04-26 19:25     ` Valdis.Kletnieks
  2016-04-26 20:17         ` Al Viro
  0 siblings, 1 reply; 12+ messages in thread
From: Valdis.Kletnieks @ 2016-04-26 19:25 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, linux-fsdevel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 993 bytes --]

On Tue, 26 Apr 2016 20:02:48 +0100, Al Viro said:

> > The biggest danger I can see is some shell script doing something like:
> >
> > foobar > $dir/$targetfile
> >
> > and $targetfile is unset. If we allow a program to get an open fd that refers
> > to a directory, what are the semantics of various operations on that fd?
>
> Huh?  We certainly do allow to get an open fd that refers to a directory -
> how else could ls(1) possibly work?  See getdents(2) - it does use an
> open file descriptor to specify the directory we operate upon.

Gaah.. I lost a few words in there - /bin/ls is *expecting* to operate on
a directory, so to calls getdents.   I meant some generic program that
opened a directory in error, and was expecting to act on "stream of bytes"

> We also do not allow opening directories for *write*, and in that case EISDIR
> is the right error (and we do return it).

OK, that and ftruncate() are about the only ways to cause trouble with a
directory opened by accident...


[-- Attachment #2: Type: application/pgp-signature, Size: 848 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] a corner case of open(2)
  2016-04-26 19:25     ` Valdis.Kletnieks
@ 2016-04-26 20:17         ` Al Viro
  0 siblings, 0 replies; 12+ messages in thread
From: Al Viro @ 2016-04-26 20:17 UTC (permalink / raw)
  To: Valdis.Kletnieks; +Cc: Linus Torvalds, linux-fsdevel, linux-kernel

On Tue, Apr 26, 2016 at 03:25:16PM -0400, Valdis.Kletnieks@vt.edu wrote:
> Gaah.. I lost a few words in there - /bin/ls is *expecting* to operate on
> a directory, so to calls getdents.   I meant some generic program that
> opened a directory in error, and was expecting to act on "stream of bytes"
> 
> > We also do not allow opening directories for *write*, and in that case EISDIR
> > is the right error (and we do return it).
> 
> OK, that and ftruncate() are about the only ways to cause trouble with a
> directory opened by accident...

ftruncate() requires the file to be opened for write (which already excludes
directories) *and* it requires the file to be a regular one (which is
redundant in case of directories, but e.g. a block device can be opened
for write and ftruncate would still fail on that).  EINVAL in both cases.

truncate() for directories should fail with EISDIR (see vfs_truncate());
for anything that is neither directory nor regular - EINVAL (same place).

O_TRUNC ends up failing with EISDIR on directories - see
        /* O_TRUNC implies we need access checks for write permissions */
        if (flags & O_TRUNC)
                acc_mode |= MAY_WRITE;
in build_open_flags() and aforementioned bit in may_open().  POSIX is
bloody vague on that topic, but that's the common behaviour since 4.3BSD
has fixed an fs-corrupting bug in the original implementation (4.2BSD
allowed open(directory, O_TRUNC), which both succeeded *and* truncated the
damn thing to zero, to great joy of fsck).  Note that v7 didn't have O_TRUNC
at all - creat(2) was the only way to get it and that opened the sucker r/w,
so the usual rules re "no opening directories for write" applied.  When
O_TRUNC had been introduced, initially they'd missed the possibility of
somebody passing it to read-only open() and the need to reject those for
directories same as open for write.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] a corner case of open(2)
@ 2016-04-26 20:17         ` Al Viro
  0 siblings, 0 replies; 12+ messages in thread
From: Al Viro @ 2016-04-26 20:17 UTC (permalink / raw)
  To: Valdis.Kletnieks; +Cc: Linus Torvalds, linux-fsdevel, linux-kernel

On Tue, Apr 26, 2016 at 03:25:16PM -0400, Valdis.Kletnieks@vt.edu wrote:
> Gaah.. I lost a few words in there - /bin/ls is *expecting* to operate on
> a directory, so to calls getdents.   I meant some generic program that
> opened a directory in error, and was expecting to act on "stream of bytes"
> 
> > We also do not allow opening directories for *write*, and in that case EISDIR
> > is the right error (and we do return it).
> 
> OK, that and ftruncate() are about the only ways to cause trouble with a
> directory opened by accident...

ftruncate() requires the file to be opened for write (which already excludes
directories) *and* it requires the file to be a regular one (which is
redundant in case of directories, but e.g. a block device can be opened
for write and ftruncate would still fail on that).  EINVAL in both cases.

truncate() for directoriesï¿½should fail with EISDIR (see vfs_truncate());
for anything that is neither directory nor regular - EINVAL (same place).

O_TRUNC ends up failing with EISDIR on directories - see
        /* O_TRUNC implies we need access checks for write permissions */
        if (flags & O_TRUNC)
                acc_mode |= MAY_WRITE;
in build_open_flags() and aforementioned bit in may_open().  POSIX is
bloody vague on that topic, but that's the common behaviour since 4.3BSD
has fixed an fs-corrupting bug in the original implementation (4.2BSD
allowed open(directory, O_TRUNC), which both succeeded *and* truncated the
damn thing to zero, to great joy of fsck).  Note that v7 didn't have O_TRUNC
at all - creat(2) was the only way to get it and that opened the sucker r/w,
so the usual rules re "no opening directories for write" applied.  When
O_TRUNC had been introduced, initially they'd missed the possibility of
somebody passing it to read-only open() and the need to reject those for
directories same as open for write.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC] a corner case of open(2)
  2016-04-26 17:55 [RFC] a corner case of open(2) Al Viro
  2016-04-26 18:05 ` Cedric Blancher
  2016-04-26 18:41 ` Valdis.Kletnieks
@ 2016-04-27  5:34 ` Al Viro
  2016-04-27  9:33   ` Miklos Szeredi
  2016-04-27 19:29   ` another patch in #for-linus (was Re: [RFC] a corner case of open(2)) Al Viro
  2 siblings, 2 replies; 12+ messages in thread
From: Al Viro @ 2016-04-27  5:34 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-fsdevel, linux-kernel, Miklos Szeredi

Fun bugs caught while trying to massage atomic_open()...  Patch below is
in vfs.git#for-linus (along with two more fixes); I would like to get
an ACK from Miklos on that one - it's his code and this thing had been
present in there since the original merge.  I might be misreading what
it tries to do, but
        open("/mnt/no-such-file", O_CREAT | O_TRUNC);
        perror("open"); errno = 0;
        stat("/mnt/no-such-file", &st);
        perror("stat"); errno = 0;
        open("/mnt/no-such-file", O_CREAT | O_TRUNC);
        perror("open");
should *not* end up with
	open: Read-only file system
	stat: No such file or directory
	open: No such file or directory
no matter what.  And it's very easy to arrange just that - mount nfs4
read-only on /mnt and run the snippet above.  First open() will fail with
EROFS (as it should), but as soon as the thing is in dcache we start getting
ENOENT.  Obviously bogus.

commit 1aa57f2aaa108ead7d81481af68085b0a77708f1
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Wed Apr 27 01:11:55 2016 -0400

    atomic_open(): fix the handling of create_error
    
    * if we have a hashed negative dentry and either CREAT|EXCL on
    r/o filesystem, or CREAT|TRUNC on r/o filesystem, or CREAT|EXCL
    with failing may_o_create(), we should fail with EROFS or the
    error may_o_create() has returned, but not ENOENT.  Which is what
    the current code ends up returning.
    
    * if we have CREAT|TRUNC hitting a regular file on a read-only
    filesystem, we can't fail with EROFS here.  At the very least,
    not until we'd done follow_managed() - we might have a writable
    file (or a device, for that matter) bound on top of that one.
    Moreover, the code downstream will see that O_TRUNC and attempt
    to grab the write access (*after* following possible mount), so
    if we really should fail with EROFS, it will happen.  No need
    to do that inside atomic_open().
    
    The real logics is much simpler than what the current code is
    trying to do - if we decided to go for simple lookup, ended
    up with a negative dentry *and* had create_error set, fail with
    create_error.  No matter whether we'd got that negative dentry
    from lookup_real() or had found it in dcache.
    
    Cc: stable@vger.kernel.org # v3.6+
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

diff --git a/fs/namei.c b/fs/namei.c
index 1d9ca2d..b458992 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2942,22 +2942,10 @@ no_open:
 		dentry = lookup_real(dir, dentry, nd->flags);
 		if (IS_ERR(dentry))
 			return PTR_ERR(dentry);
-
-		if (create_error) {
-			int open_flag = op->open_flag;
-
-			error = create_error;
-			if ((open_flag & O_EXCL)) {
-				if (!dentry->d_inode)
-					goto out;
-			} else if (!dentry->d_inode) {
-				goto out;
-			} else if ((open_flag & O_TRUNC) &&
-				   d_is_reg(dentry)) {
-				goto out;
-			}
-			/* will fail later, go on to get the right error */
-		}
+	}
+	if (create_error && !dentry->d_inode) {
+		error = create_error;
+		goto out;
 	}
 looked_up:
 	path->dentry = dentry;

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [RFC] a corner case of open(2)
  2016-04-27  5:34 ` Al Viro
@ 2016-04-27  9:33   ` Miklos Szeredi
  2016-04-27 19:29   ` another patch in #for-linus (was Re: [RFC] a corner case of open(2)) Al Viro
  1 sibling, 0 replies; 12+ messages in thread
From: Miklos Szeredi @ 2016-04-27  9:33 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, linux-fsdevel, linux-kernel

On Wed, Apr 27, 2016 at 7:34 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> Fun bugs caught while trying to massage atomic_open()...  Patch below is
> in vfs.git#for-linus (along with two more fixes); I would like to get
> an ACK from Miklos on that one - it's his code and this thing had been
> present in there since the original merge.  I might be misreading what
> it tries to do, but
>         open("/mnt/no-such-file", O_CREAT | O_TRUNC);
>         perror("open"); errno = 0;
>         stat("/mnt/no-such-file", &st);
>         perror("stat"); errno = 0;
>         open("/mnt/no-such-file", O_CREAT | O_TRUNC);
>         perror("open");
> should *not* end up with
>         open: Read-only file system
>         stat: No such file or directory
>         open: No such file or directory
> no matter what.  And it's very easy to arrange just that - mount nfs4
> read-only on /mnt and run the snippet above.  First open() will fail with
> EROFS (as it should), but as soon as the thing is in dcache we start getting
> ENOENT.  Obviously bogus.
>
> commit 1aa57f2aaa108ead7d81481af68085b0a77708f1
> Author: Al Viro <viro@zeniv.linux.org.uk>
> Date:   Wed Apr 27 01:11:55 2016 -0400
>
>     atomic_open(): fix the handling of create_error
>
>     * if we have a hashed negative dentry and either CREAT|EXCL on
>     r/o filesystem, or CREAT|TRUNC on r/o filesystem, or CREAT|EXCL
>     with failing may_o_create(), we should fail with EROFS or the
>     error may_o_create() has returned, but not ENOENT.  Which is what
>     the current code ends up returning.
>
>     * if we have CREAT|TRUNC hitting a regular file on a read-only
>     filesystem, we can't fail with EROFS here.  At the very least,
>     not until we'd done follow_managed() - we might have a writable
>     file (or a device, for that matter) bound on top of that one.
>     Moreover, the code downstream will see that O_TRUNC and attempt
>     to grab the write access (*after* following possible mount), so
>     if we really should fail with EROFS, it will happen.  No need
>     to do that inside atomic_open().
>
>     The real logics is much simpler than what the current code is
>     trying to do - if we decided to go for simple lookup, ended
>     up with a negative dentry *and* had create_error set, fail with
>     create_error.  No matter whether we'd got that negative dentry
>     from lookup_real() or had found it in dcache.

Makes perfect sense.  Looks like I just fsckd up that part.

Acked-by: Miklos Szeredi <mszeredi@redhat.com>

>
>     Cc: stable@vger.kernel.org # v3.6+
>     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
>
> diff --git a/fs/namei.c b/fs/namei.c
> index 1d9ca2d..b458992 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -2942,22 +2942,10 @@ no_open:
>                 dentry = lookup_real(dir, dentry, nd->flags);
>                 if (IS_ERR(dentry))
>                         return PTR_ERR(dentry);
> -
> -               if (create_error) {
> -                       int open_flag = op->open_flag;
> -
> -                       error = create_error;
> -                       if ((open_flag & O_EXCL)) {
> -                               if (!dentry->d_inode)
> -                                       goto out;
> -                       } else if (!dentry->d_inode) {
> -                               goto out;
> -                       } else if ((open_flag & O_TRUNC) &&
> -                                  d_is_reg(dentry)) {
> -                               goto out;
> -                       }
> -                       /* will fail later, go on to get the right error */
> -               }
> +       }
> +       if (create_error && !dentry->d_inode) {
> +               error = create_error;
> +               goto out;
>         }
>  looked_up:
>         path->dentry = dentry;

^ permalink raw reply	[flat|nested] 12+ messages in thread

* another patch in #for-linus (was Re: [RFC] a corner case of open(2))
  2016-04-27  5:34 ` Al Viro
  2016-04-27  9:33   ` Miklos Szeredi
@ 2016-04-27 19:29   ` Al Viro
  2016-05-02 21:48     ` Pavel Machek
  1 sibling, 1 reply; 12+ messages in thread
From: Al Viro @ 2016-04-27 19:29 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-fsdevel, linux-kernel, Linus Torvalds

On Wed, Apr 27, 2016 at 06:34:58AM +0100, Al Viro wrote:
> Fun bugs caught while trying to massage atomic_open()...  Patch below is
> in vfs.git#for-linus (along with two more fixes); I would like to get
> an ACK from Miklos on that one [snip]

Now that Miklos has ACKed that one...  The other fix in there is for alignment
checks in blk_rq_map_user_iov() - also -stable fodder, even though this one
will be a bit more work to backport.  Jens, could you ACK that one?  This is
the patch I'd sent your way several weeks ago (and the one we'd been talking
about at LSFMM):

commit 357f435d8a0d32068c75f3c7176434d992b3adb7
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Fri Apr 8 19:05:19 2016 -0400

    fix the copy vs. map logics in blk_rq_map_user_iov()
    
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

diff --git a/block/blk-map.c b/block/blk-map.c
index a54f054..b9f88b77 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -9,24 +9,6 @@
 
 #include "blk.h"
 
-static bool iovec_gap_to_prv(struct request_queue *q,
-			     struct iovec *prv, struct iovec *cur)
-{
-	unsigned long prev_end;
-
-	if (!queue_virt_boundary(q))
-		return false;
-
-	if (prv->iov_base == NULL && prv->iov_len == 0)
-		/* prv is not set - don't check */
-		return false;
-
-	prev_end = (unsigned long)(prv->iov_base + prv->iov_len);
-
-	return (((unsigned long)cur->iov_base & queue_virt_boundary(q)) ||
-		prev_end & queue_virt_boundary(q));
-}
-
 int blk_rq_append_bio(struct request_queue *q, struct request *rq,
 		      struct bio *bio)
 {
@@ -125,31 +107,18 @@ int blk_rq_map_user_iov(struct request_queue *q, struct request *rq,
 			struct rq_map_data *map_data,
 			const struct iov_iter *iter, gfp_t gfp_mask)
 {
-	struct iovec iov, prv = {.iov_base = NULL, .iov_len = 0};
-	bool copy = (q->dma_pad_mask & iter->count) || map_data;
+	bool copy = false;
+	unsigned long align = q->dma_pad_mask | queue_dma_alignment(q);
 	struct bio *bio = NULL;
 	struct iov_iter i;
 	int ret;
 
-	if (!iter || !iter->count)
-		return -EINVAL;
-
-	iov_for_each(iov, i, *iter) {
-		unsigned long uaddr = (unsigned long) iov.iov_base;
-
-		if (!iov.iov_len)
-			return -EINVAL;
-
-		/*
-		 * Keep going so we check length of all segments
-		 */
-		if ((uaddr & queue_dma_alignment(q)) ||
-		    iovec_gap_to_prv(q, &prv, &iov))
-			copy = true;
-
-		prv.iov_base = iov.iov_base;
-		prv.iov_len = iov.iov_len;
-	}
+	if (map_data)
+		copy = true;
+	else if (iov_iter_alignment(iter) & align)
+		copy = true;
+	else if (queue_virt_boundary(q))
+		copy = queue_virt_boundary(q) & iov_iter_gap_alignment(iter);
 
 	i = *iter;
 	do {
diff --git a/include/linux/uio.h b/include/linux/uio.h
index fd9bcfe..1b5d1cd 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -87,6 +87,7 @@ size_t copy_from_iter(void *addr, size_t bytes, struct iov_iter *i);
 size_t copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i);
 size_t iov_iter_zero(size_t bytes, struct iov_iter *);
 unsigned long iov_iter_alignment(const struct iov_iter *i);
+unsigned long iov_iter_gap_alignment(const struct iov_iter *i);
 void iov_iter_init(struct iov_iter *i, int direction, const struct iovec *iov,
 			unsigned long nr_segs, size_t count);
 void iov_iter_kvec(struct iov_iter *i, int direction, const struct kvec *kvec,
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 5fecddc..ca5316e 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -569,6 +569,25 @@ unsigned long iov_iter_alignment(const struct iov_iter *i)
 }
 EXPORT_SYMBOL(iov_iter_alignment);
 
+unsigned long iov_iter_gap_alignment(const struct iov_iter *i)
+{
+        unsigned long res = 0;
+	size_t size = i->count;
+	if (!size)
+		return 0;
+
+	iterate_all_kinds(i, size, v,
+		(res |= (!res ? 0 : (unsigned long)v.iov_base) |
+			(size != v.iov_len ? size : 0), 0),
+		(res |= (!res ? 0 : (unsigned long)v.bv_offset) |
+			(size != v.bv_len ? size : 0)),
+		(res |= (!res ? 0 : (unsigned long)v.iov_base) |
+			(size != v.iov_len ? size : 0))
+		);
+		return res;
+}
+EXPORT_SYMBOL(iov_iter_gap_alignment);
+
 ssize_t iov_iter_get_pages(struct iov_iter *i,
 		   struct page **pages, size_t maxsize, unsigned maxpages,
 		   size_t *start)

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: another patch in #for-linus (was Re: [RFC] a corner case of open(2))
  2016-04-27 19:29   ` another patch in #for-linus (was Re: [RFC] a corner case of open(2)) Al Viro
@ 2016-05-02 21:48     ` Pavel Machek
  0 siblings, 0 replies; 12+ messages in thread
From: Pavel Machek @ 2016-05-02 21:48 UTC (permalink / raw)
  To: Al Viro; +Cc: Jens Axboe, linux-fsdevel, linux-kernel, Linus Torvalds

Hi!


> +unsigned long iov_iter_gap_alignment(const struct iov_iter *i)
> +{
> +        unsigned long res = 0;
> +	size_t size = i->count;

Something is wrong with indentation here.

> +	iterate_all_kinds(i, size, v,
> +		(res |= (!res ? 0 : (unsigned long)v.iov_base) |
> +			(size != v.iov_len ? size : 0), 0),
> +		(res |= (!res ? 0 : (unsigned long)v.bv_offset) |
> +			(size != v.bv_len ? size : 0)),
> +		(res |= (!res ? 0 : (unsigned long)v.iov_base) |
> +			(size != v.iov_len ? size : 0))
> +		);

Umm. Come on. What's that? Obfuscated C code contest? Could the logic
at least be modified to (res ? ... : 0)?

									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2016-05-02 21:49 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-26 17:55 [RFC] a corner case of open(2) Al Viro
2016-04-26 18:05 ` Cedric Blancher
2016-04-26 18:15   ` Al Viro
2016-04-26 18:41 ` Valdis.Kletnieks
2016-04-26 19:02   ` Al Viro
2016-04-26 19:25     ` Valdis.Kletnieks
2016-04-26 20:17       ` Al Viro
2016-04-26 20:17         ` Al Viro
2016-04-27  5:34 ` Al Viro
2016-04-27  9:33   ` Miklos Szeredi
2016-04-27 19:29   ` another patch in #for-linus (was Re: [RFC] a corner case of open(2)) Al Viro
2016-05-02 21:48     ` Pavel Machek

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.