linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/6] FMODE_NONOTIFY, FMODE_RANDOM and FMODE_NEG_OFFSET bits
@ 2010-01-15  1:39 Wu Fengguang
  2010-01-15  1:39 ` [PATCH 1/6] fanotify: fix FMODE_NONOTIFY bit number Wu Fengguang
                   ` (5 more replies)
  0 siblings, 6 replies; 22+ messages in thread
From: Wu Fengguang @ 2010-01-15  1:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Al Viro, Christoph Hellwig, Wu Fengguang, LKML, Eric Paris,
	Nick Piggin, Andi Kleen, David Howells, Jonathan Corbet,
	linux-fsdevel

Andrew,

This collects my recent patches related to FMODE_* and O_*
discussed in LKML. They are mainly bug fixes.

O_* and FMODE_NONOTIFY collision fix/check
	[PATCH 1/6] fanotify: fix FMODE_NONOTIFY bit number
	[PATCH 2/6] bitops: compile time optimization for hweight_long(CONSTANT)
	[PATCH 3/6] vfs: O_* bit numbers uniqueness check

POSIX_FADV_RANDOM misbehavior fix
	[PATCH 4/6] vfs: take f_lock on modifying f_mode after open time
	[PATCH 5/6] readahead: introduce FMODE_RANDOM for POSIX_FADV_RANDOM

allow negative f_pos for /dev/kmem
	[PATCH 6/6] vfs: introduce FMODE_NEG_OFFSET for allowing negative f_pos

Thanks,
Fengguang


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/6] fanotify: fix FMODE_NONOTIFY bit number
  2010-01-15  1:39 [PATCH 0/6] FMODE_NONOTIFY, FMODE_RANDOM and FMODE_NEG_OFFSET bits Wu Fengguang
@ 2010-01-15  1:39 ` Wu Fengguang
  2010-01-15  1:39 ` [PATCH 2/6] bitops: compile time optimization for hweight_long(CONSTANT) Wu Fengguang
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 22+ messages in thread
From: Wu Fengguang @ 2010-01-15  1:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Al Viro, Eric Paris, Wu Fengguang, Christoph Hellwig, LKML,
	Nick Piggin, Andi Kleen, David Howells, Jonathan Corbet,
	linux-fsdevel

[-- Attachment #1: fanotify-bit-fix --]
[-- Type: text/plain, Size: 1170 bytes --]

FMODE_NONOTIFY=0x800000 collides with __O_SYNC in sparc,
so change it to 0x1000000.

CC: Eric Paris <eparis@redhat.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 include/asm-generic/fcntl.h |    2 +-
 include/linux/fs.h          |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

--- linux.orig/include/asm-generic/fcntl.h	2010-01-05 10:42:36.000000000 +0800
+++ linux/include/asm-generic/fcntl.h	2010-01-05 10:42:57.000000000 +0800
@@ -5,7 +5,7 @@
 
 /*
  * FMODE_EXEC is 0x20
- * FMODE_NONOTIFY is 0x800000
+ * FMODE_NONOTIFY is 0x1000000
  * These cannot be used by userspace O_* until internal and external open
  * flags are split.
  * -Eric Paris
--- linux.orig/include/linux/fs.h	2010-01-05 10:40:33.000000000 +0800
+++ linux/include/linux/fs.h	2010-01-05 10:42:07.000000000 +0800
@@ -88,7 +88,7 @@ struct inodes_stat_t {
 #define FMODE_NOCMTIME		((__force fmode_t)2048)
 
 /* File was opened by fanotify and shouldn't generate fanotify events */
-#define FMODE_NONOTIFY		((__force fmode_t)8388608)
+#define FMODE_NONOTIFY		((__force fmode_t)0x1000000)
 
 /*
  * The below are the various read and write types that we support. Some of



^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 2/6] bitops: compile time optimization for hweight_long(CONSTANT)
  2010-01-15  1:39 [PATCH 0/6] FMODE_NONOTIFY, FMODE_RANDOM and FMODE_NEG_OFFSET bits Wu Fengguang
  2010-01-15  1:39 ` [PATCH 1/6] fanotify: fix FMODE_NONOTIFY bit number Wu Fengguang
@ 2010-01-15  1:39 ` Wu Fengguang
  2010-01-15  1:39 ` [PATCH 3/6] vfs: O_* bit numbers uniqueness check Wu Fengguang
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 22+ messages in thread
From: Wu Fengguang @ 2010-01-15  1:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Al Viro, Jamie Lokier, Roland Dreier, Wu Fengguang,
	Christoph Hellwig, LKML, Eric Paris, Nick Piggin, Andi Kleen,
	David Howells, Jonathan Corbet, linux-fsdevel

[-- Attachment #1: constant-hweight32.patch --]
[-- Type: text/plain, Size: 868 bytes --]

This allows use of hweight_long() in BUILD_BUG_ON().
Suggested by Jamie.

CC: Jamie Lokier <jamie@shareable.org>
CC: Roland Dreier <rdreier@cisco.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 include/linux/bitops.h |   12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

--- linux.orig/include/linux/bitops.h	2010-01-09 12:13:00.000000000 +0800
+++ linux/include/linux/bitops.h	2010-01-09 12:21:50.000000000 +0800
@@ -40,10 +40,14 @@ static __inline__ int get_count_order(un
 	return order;
 }
 
-static inline unsigned long hweight_long(unsigned long w)
-{
-	return sizeof(w) == 4 ? hweight32(w) : hweight64(w);
-}
+#define hweight_long(x)				\
+(						\
+	__builtin_constant_p(x) ?		\
+		__builtin_popcountl(x) :	\
+		(sizeof(x) <= 4 ?		\
+			 hweight32(x) :		\
+			 hweight64(x))		\
+)
 
 /**
  * rol32 - rotate a 32-bit value left



^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 3/6] vfs: O_* bit numbers uniqueness check
  2010-01-15  1:39 [PATCH 0/6] FMODE_NONOTIFY, FMODE_RANDOM and FMODE_NEG_OFFSET bits Wu Fengguang
  2010-01-15  1:39 ` [PATCH 1/6] fanotify: fix FMODE_NONOTIFY bit number Wu Fengguang
  2010-01-15  1:39 ` [PATCH 2/6] bitops: compile time optimization for hweight_long(CONSTANT) Wu Fengguang
@ 2010-01-15  1:39 ` Wu Fengguang
  2010-01-15  1:39 ` [PATCH 4/6] vfs: take f_lock on modifying f_mode after open time Wu Fengguang
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 22+ messages in thread
From: Wu Fengguang @ 2010-01-15  1:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Al Viro, David Miller, Stephen Rothwell, Christoph Hellwig,
	Eric Paris, Roland Dreier, Jamie Lokier, Andreas Schwab,
	Wu Fengguang, LKML, Nick Piggin, Andi Kleen, David Howells,
	Jonathan Corbet, linux-fsdevel

[-- Attachment #1: fcntl-bit-check.patch --]
[-- Type: text/plain, Size: 2095 bytes --]

The O_* bit numbers are defined in 20+ arch/*, and can silently overlap.
Add a compile time check to ensure the uniqueness as suggested by David
Miller.

v4: use the nice hweight_long() (suggested by Jamie)
    split O_SYNC to (__O_SYNC | O_DSYNC) (suggested by Andreas)
    take away the FMODE_* and O_RANDOM bits
v3: change to BUILD_BUG_ON() (suggested by Roland)

CC: David Miller <davem@davemloft.net>
CC: Stephen Rothwell <sfr@canb.auug.org.au>
CC: Al Viro <viro@zeniv.linux.org.uk>
CC: Christoph Hellwig <hch@infradead.org>
CC: Eric Paris <eparis@redhat.com>
CC: Roland Dreier <rdreier@cisco.com>
CC: Jamie Lokier <jamie@shareable.org>
CC: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 fs/fcntl.c                  |   14 ++++++++++++--
 include/asm-generic/fcntl.h |    2 ++
 2 files changed, 14 insertions(+), 2 deletions(-)

--- linux.orig/fs/fcntl.c	2010-01-09 11:02:57.000000000 +0800
+++ linux/fs/fcntl.c	2010-01-09 21:36:16.000000000 +0800
@@ -709,11 +709,21 @@ void kill_fasync(struct fasync_struct **
 }
 EXPORT_SYMBOL(kill_fasync);
 
-static int __init fasync_init(void)
+static int __init fcntl_init(void)
 {
+	/* please add new bits here to ensure allocation uniqueness */
+	BUILD_BUG_ON(17 != hweight_long(
+		O_RDONLY	| O_WRONLY	| O_RDWR	|
+		O_CREAT		| O_EXCL	| O_NOCTTY	|
+		O_TRUNC		| O_APPEND	| O_NONBLOCK	|
+		__O_SYNC	| O_DSYNC	| FASYNC	|
+		O_DIRECT	| O_LARGEFILE	| O_DIRECTORY	|
+		O_NOFOLLOW	| O_NOATIME	| O_CLOEXEC
+		));
+
 	fasync_cache = kmem_cache_create("fasync_cache",
 		sizeof(struct fasync_struct), 0, SLAB_PANIC, NULL);
 	return 0;
 }
 
-module_init(fasync_init)
+module_init(fcntl_init)
--- linux.orig/include/asm-generic/fcntl.h	2010-01-09 11:02:57.000000000 +0800
+++ linux/include/asm-generic/fcntl.h	2010-01-09 12:23:17.000000000 +0800
@@ -4,6 +4,8 @@
 #include <linux/types.h>
 
 /*
+ * When introducing new O_* bits, please check its uniqueness in fcntl_init().
+ *
  * FMODE_EXEC is 0x20
  * FMODE_NONOTIFY is 0x1000000
  * These cannot be used by userspace O_* until internal and external open



^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 4/6] vfs: take f_lock on modifying f_mode after open time
  2010-01-15  1:39 [PATCH 0/6] FMODE_NONOTIFY, FMODE_RANDOM and FMODE_NEG_OFFSET bits Wu Fengguang
                   ` (2 preceding siblings ...)
  2010-01-15  1:39 ` [PATCH 3/6] vfs: O_* bit numbers uniqueness check Wu Fengguang
@ 2010-01-15  1:39 ` Wu Fengguang
  2010-01-15  1:39 ` [PATCH 5/6] readahead: introduce FMODE_RANDOM for POSIX_FADV_RANDOM Wu Fengguang
  2010-01-15  1:40 ` [PATCH 6/6] vfs: introduce FMODE_NEG_OFFSET for allowing negative f_pos Wu Fengguang
  5 siblings, 0 replies; 22+ messages in thread
From: Wu Fengguang @ 2010-01-15  1:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Al Viro, Christoph Hellwig, Wu Fengguang, LKML, Eric Paris,
	Nick Piggin, Andi Kleen, David Howells, Jonathan Corbet,
	linux-fsdevel

[-- Attachment #1: fmode-lock.patch --]
[-- Type: text/plain, Size: 1167 bytes --]

We'll introduce FMODE_RANDOM which will be runtime modified.
So protect all runtime modification to f_mode with f_lock to
avoid races.

CC: Al Viro <viro@zeniv.linux.org.uk>
CC: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 fs/file_table.c     |    2 ++
 fs/nfsd/nfs4state.c |    2 ++
 2 files changed, 4 insertions(+)

--- linux.orig/fs/file_table.c	2010-01-15 09:11:07.000000000 +0800
+++ linux/fs/file_table.c	2010-01-15 09:11:15.000000000 +0800
@@ -392,7 +392,9 @@ retry:
 			continue;
 		if (!(f->f_mode & FMODE_WRITE))
 			continue;
+		spin_lock(&f->f_lock);
 		f->f_mode &= ~FMODE_WRITE;
+		spin_unlock(&f->f_lock);
 		if (file_check_writeable(f) != 0)
 			continue;
 		file_release_write(f);
--- linux.orig/fs/nfsd/nfs4state.c	2010-01-15 09:08:22.000000000 +0800
+++ linux/fs/nfsd/nfs4state.c	2010-01-15 09:11:15.000000000 +0800
@@ -1998,7 +1998,9 @@ nfs4_file_downgrade(struct file *filp, u
 {
 	if (share_access & NFS4_SHARE_ACCESS_WRITE) {
 		drop_file_write_access(filp);
+		spin_lock(&filp->f_lock);
 		filp->f_mode = (filp->f_mode | FMODE_READ) & ~FMODE_WRITE;
+		spin_unlock(&filp->f_lock);
 	}
 }
 



^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 5/6] readahead: introduce FMODE_RANDOM for POSIX_FADV_RANDOM
  2010-01-15  1:39 [PATCH 0/6] FMODE_NONOTIFY, FMODE_RANDOM and FMODE_NEG_OFFSET bits Wu Fengguang
                   ` (3 preceding siblings ...)
  2010-01-15  1:39 ` [PATCH 4/6] vfs: take f_lock on modifying f_mode after open time Wu Fengguang
@ 2010-01-15  1:39 ` Wu Fengguang
  2010-01-15  1:40 ` [PATCH 6/6] vfs: introduce FMODE_NEG_OFFSET for allowing negative f_pos Wu Fengguang
  5 siblings, 0 replies; 22+ messages in thread
From: Wu Fengguang @ 2010-01-15  1:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Al Viro, Nick Piggin, Andi Kleen, Steven Whitehouse,
	David Howells, Jonathan Corbet, Christoph Hellwig, Wu Fengguang,
	LKML, Eric Paris, linux-fsdevel

[-- Attachment #1: fadvise-random.patch --]
[-- Type: text/plain, Size: 3833 bytes --]

This fixes inefficient page-by-page reads on POSIX_FADV_RANDOM.

POSIX_FADV_RANDOM used to set ra_pages=0, which leads to poor
performance: a 16K read will be carried out in 4 _sync_ 1-page reads.

In other places, ra_pages==0 means
- it's ramfs/tmpfs/hugetlbfs/sysfs/configfs
- some IO error happened
where multi-page read IO won't help or should be avoided.

POSIX_FADV_RANDOM actually want a different semantics: to disable the
*heuristic* readahead algorithm, and to use a dumb one which faithfully
submit read IO for whatever application requests.

So introduce a flag FMODE_RANDOM for POSIX_FADV_RANDOM.

Note that the random hint is not likely to help random reads performance
noticeably. And it may be too permissive on huge request size (its IO
size is not limited by read_ahead_kb).

In Quentin's report (http://lkml.org/lkml/2009/12/24/145), the overall
(NFS read) performance of the application increased by 313%!

v6: use FMODE_RANDOM (proposed by Christoph Hellwig)
v5: use bit 0200000000; explicitly nuke the O_RANDOM bit in __dentry_open()
    (Stephen Rothwell)
v4: resolve bit conflicts with sparc and parisc;
    use bit 040000000(=FMODE_NONOTIFY), which will be masked out by
    __dentry_open(), so that open(O_RANDOM) is disabled
    (Stephen Rothwell and Christoph Hellwig)
v3: use O_RANDOM to indicate both read/write access pattern as in
    posix_fadvise(), although it only takes effect for read() now
    (proposed by Quentin)
v2: use O_RANDOM_READ to avoid race conditions (pointed out by Andi)

CC: Nick Piggin <npiggin@suse.de>
CC: Andi Kleen <andi@firstfloor.org>
CC: Steven Whitehouse <swhiteho@redhat.com>
CC: David Howells <dhowells@redhat.com>
CC: Al Viro <viro@zeniv.linux.org.uk>
CC: Jonathan Corbet <corbet@lwn.net>
CC: Christoph Hellwig <hch@infradead.org>
Tested-by: Quentin Barnes <qbarnes+nfs@yahoo-inc.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 include/linux/fs.h |    3 +++
 mm/fadvise.c       |   10 +++++++++-
 mm/readahead.c     |    6 ++++++
 3 files changed, 18 insertions(+), 1 deletion(-)

--- linux.orig/mm/fadvise.c	2010-01-09 11:00:11.000000000 +0800
+++ linux/mm/fadvise.c	2010-01-09 12:25:01.000000000 +0800
@@ -77,12 +77,20 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, lof
 	switch (advice) {
 	case POSIX_FADV_NORMAL:
 		file->f_ra.ra_pages = bdi->ra_pages;
+		spin_lock(&file->f_lock);
+		file->f_flags &= ~FMODE_RANDOM;
+		spin_unlock(&file->f_lock);
 		break;
 	case POSIX_FADV_RANDOM:
-		file->f_ra.ra_pages = 0;
+		spin_lock(&file->f_lock);
+		file->f_flags |= FMODE_RANDOM;
+		spin_unlock(&file->f_lock);
 		break;
 	case POSIX_FADV_SEQUENTIAL:
 		file->f_ra.ra_pages = bdi->ra_pages * 2;
+		spin_lock(&file->f_lock);
+		file->f_flags &= ~FMODE_RANDOM;
+		spin_unlock(&file->f_lock);
 		break;
 	case POSIX_FADV_WILLNEED:
 		if (!mapping->a_ops->readpage) {
--- linux.orig/mm/readahead.c	2010-01-09 11:00:11.000000000 +0800
+++ linux/mm/readahead.c	2010-01-09 12:25:01.000000000 +0800
@@ -501,6 +501,12 @@ void page_cache_sync_readahead(struct ad
 	if (!ra->ra_pages)
 		return;
 
+	/* be dumb */
+	if (filp->f_mode & FMODE_RANDOM) {
+		force_page_cache_readahead(mapping, filp, offset, req_size);
+		return;
+	}
+
 	/* do read-ahead */
 	ondemand_readahead(mapping, ra, filp, false, offset, req_size);
 }
--- linux.orig/include/linux/fs.h	2010-01-09 11:00:29.000000000 +0800
+++ linux/include/linux/fs.h	2010-01-09 12:26:33.000000000 +0800
@@ -90,6 +90,9 @@ struct inodes_stat_t {
 /* File was opened by fanotify and shouldn't generate fanotify events */
 #define FMODE_NONOTIFY		((__force fmode_t)0x1000000)
 
+/* Expect random access pattern */
+#define FMODE_RANDOM		((__force fmode_t)0x1000)
+
 /*
  * The below are the various read and write types that we support. Some of
  * them include behavioral modifiers that send information down to the



^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 6/6] vfs: introduce FMODE_NEG_OFFSET for allowing negative f_pos
  2010-01-15  1:39 [PATCH 0/6] FMODE_NONOTIFY, FMODE_RANDOM and FMODE_NEG_OFFSET bits Wu Fengguang
                   ` (4 preceding siblings ...)
  2010-01-15  1:39 ` [PATCH 5/6] readahead: introduce FMODE_RANDOM for POSIX_FADV_RANDOM Wu Fengguang
@ 2010-01-15  1:40 ` Wu Fengguang
  2010-01-16 12:54   ` OGAWA Hirofumi
  5 siblings, 1 reply; 22+ messages in thread
From: Wu Fengguang @ 2010-01-15  1:40 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Al Viro, Al Viro, Heiko Carstens, Wu Fengguang,
	KAMEZAWA Hiroyuki, Christoph Hellwig, LKML, Eric Paris,
	Nick Piggin, Andi Kleen, David Howells, Jonathan Corbet,
	linux-fsdevel

[-- Attachment #1: f_pos-fix --]
[-- Type: text/plain, Size: 3661 bytes --]

From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Now, rw_verify_area() checsk f_pos is negative or not. And if
negative, returns -EINVAL.

But, some special files as /dev/(k)mem and /proc/<pid>/mem etc..
has negative offsets. And we can't do any access via read/write
to the file(device).

So introduce FMODE_NEG_OFFSET to allow negative file offsets.

Changelog: v5->v6
 - use FMODE_NEG_OFFSET (suggested by Al)
 - rebased onto 2.6.33-rc1

Changelog: v4->v5
 - clean up patches dor /dev/mem.
 - rebased onto 2.6.32-rc1

Changelog: v3->v4
 - make changes in mem.c aligned.
 - change __negative_fpos_check() to return int. 
 - fixed bug in "pos" check.
 - added comments.

Changelog: v2->v3
 - fixed bug in rw_verify_area (it cannot be compiled)

CC: Al Viro <viro@ZenIV.linux.org.uk>
CC: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
---
 drivers/char/mem.c |    4 ++++
 fs/proc/base.c     |    2 ++
 fs/read_write.c    |   21 +++++++++++++++++++--
 include/linux/fs.h |    3 +++
 4 files changed, 28 insertions(+), 2 deletions(-)

--- linux.orig/fs/read_write.c	2010-01-14 21:28:00.000000000 +0800
+++ linux/fs/read_write.c	2010-01-14 21:30:41.000000000 +0800
@@ -205,6 +205,20 @@ bad:
 }
 #endif
 
+static int
+__negative_fpos_check(struct file *file, loff_t pos, size_t count)
+{
+	/*
+	 * pos or pos+count is negative here, check overflow.
+	 * too big "count" will be caught in rw_verify_area().
+	 */
+	if ((pos < 0) && (pos + count < pos))
+		return -EOVERFLOW;
+	if (file->f_mode & FMODE_NEG_OFFSET)
+		return 0;
+	return -EINVAL;
+}
+
 /*
  * rw_verify_area doesn't like huge counts. We limit
  * them to something that fits in "int" so that others
@@ -222,8 +236,11 @@ int rw_verify_area(int read_write, struc
 	if (unlikely((ssize_t) count < 0))
 		return retval;
 	pos = *ppos;
-	if (unlikely((pos < 0) || (loff_t) (pos + count) < 0))
-		return retval;
+	if (unlikely((pos < 0) || (loff_t) (pos + count) < 0)) {
+		retval = __negative_fpos_check(file, pos, count);
+		if (retval)
+			return retval;
+	}
 
 	if (unlikely(inode->i_flock && mandatory_lock(inode))) {
 		retval = locks_mandatory_area(
--- linux.orig/include/linux/fs.h	2010-01-14 21:28:00.000000000 +0800
+++ linux/include/linux/fs.h	2010-01-14 21:32:24.000000000 +0800
@@ -93,6 +93,9 @@ struct inodes_stat_t {
 /* Expect random access pattern */
 #define FMODE_RANDOM		((__force fmode_t)0x1000)
 
+/* File is huge (eg. /dev/kmem): treat loff_t as unsigned */
+#define FMODE_NEG_OFFSET	((__force fmode_t)0x2000)
+
 /*
  * The below are the various read and write types that we support. Some of
  * them include behavioral modifiers that send information down to the
--- linux.orig/drivers/char/mem.c	2010-01-14 21:28:00.000000000 +0800
+++ linux/drivers/char/mem.c	2010-01-14 21:33:20.000000000 +0800
@@ -861,6 +861,10 @@ static int memory_open(struct inode *ino
 	if (dev->dev_info)
 		filp->f_mapping->backing_dev_info = dev->dev_info;
 
+	/* Is /dev/mem or /dev/kmem ? */
+	if (dev->dev_info == &directly_mappable_cdev_bdi)
+		filp->f_mode |= FMODE_NEG_OFFSET;
+
 	if (dev->fops->open)
 		return dev->fops->open(inode, filp);
 
--- linux.orig/fs/proc/base.c	2010-01-14 21:28:00.000000000 +0800
+++ linux/fs/proc/base.c	2010-01-14 21:37:08.000000000 +0800
@@ -861,6 +861,8 @@ static const struct file_operations proc
 static int mem_open(struct inode* inode, struct file* file)
 {
 	file->private_data = (void*)((long)current->self_exec_id);
+	/* OK to pass negative loff_t, we can catch out-of-range */
+	file->f_mode |= FMODE_NEG_OFFSET;
 	return 0;
 }
 



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/6] vfs: introduce FMODE_NEG_OFFSET for allowing negative f_pos
  2010-01-15  1:40 ` [PATCH 6/6] vfs: introduce FMODE_NEG_OFFSET for allowing negative f_pos Wu Fengguang
@ 2010-01-16 12:54   ` OGAWA Hirofumi
  2010-01-18  0:15     ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 22+ messages in thread
From: OGAWA Hirofumi @ 2010-01-16 12:54 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Andrew Morton, Al Viro, Heiko Carstens, KAMEZAWA Hiroyuki,
	Christoph Hellwig, LKML, Eric Paris, Nick Piggin, Andi Kleen,
	David Howells, Jonathan Corbet, linux-fsdevel

Wu Fengguang <fengguang.wu@intel.com> writes:

> +static int
> +__negative_fpos_check(struct file *file, loff_t pos, size_t count)
> +{
> +	/*
> +	 * pos or pos+count is negative here, check overflow.
> +	 * too big "count" will be caught in rw_verify_area().
> +	 */
> +	if ((pos < 0) && (pos + count < pos))
> +		return -EOVERFLOW;
> +	if (file->f_mode & FMODE_NEG_OFFSET)
> +		return 0;
> +	return -EINVAL;
> +}
> +
>  /*
>   * rw_verify_area doesn't like huge counts. We limit
>   * them to something that fits in "int" so that others
> @@ -222,8 +236,11 @@ int rw_verify_area(int read_write, struc
>  	if (unlikely((ssize_t) count < 0))
>  		return retval;
>  	pos = *ppos;
> -	if (unlikely((pos < 0) || (loff_t) (pos + count) < 0))
> -		return retval;
> +	if (unlikely((pos < 0) || (loff_t) (pos + count) < 0)) {
> +		retval = __negative_fpos_check(file, pos, count);
> +		if (retval)
> +			return retval;
> +	}
>  
>  	if (unlikely(inode->i_flock && mandatory_lock(inode))) {
>  		retval = locks_mandatory_area(

Um... How do lseek() work? It sounds like to violate error code range.
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/6] vfs: introduce FMODE_NEG_OFFSET for allowing negative f_pos
  2010-01-16 12:54   ` OGAWA Hirofumi
@ 2010-01-18  0:15     ` KAMEZAWA Hiroyuki
  2010-01-18  1:17       ` OGAWA Hirofumi
  0 siblings, 1 reply; 22+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-01-18  0:15 UTC (permalink / raw)
  To: OGAWA Hirofumi
  Cc: Wu Fengguang, Andrew Morton, Al Viro, Heiko Carstens,
	Christoph Hellwig, LKML, Eric Paris, Nick Piggin, Andi Kleen,
	David Howells, Jonathan Corbet, linux-fsdevel

On Sat, 16 Jan 2010 21:54:39 +0900
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> wrote:

> Wu Fengguang <fengguang.wu@intel.com> writes:
> 
> > +static int
> > +__negative_fpos_check(struct file *file, loff_t pos, size_t count)
> > +{
> > +	/*
> > +	 * pos or pos+count is negative here, check overflow.
> > +	 * too big "count" will be caught in rw_verify_area().
> > +	 */
> > +	if ((pos < 0) && (pos + count < pos))
> > +		return -EOVERFLOW;
> > +	if (file->f_mode & FMODE_NEG_OFFSET)
> > +		return 0;
> > +	return -EINVAL;
> > +}
> > +
> >  /*
> >   * rw_verify_area doesn't like huge counts. We limit
> >   * them to something that fits in "int" so that others
> > @@ -222,8 +236,11 @@ int rw_verify_area(int read_write, struc
> >  	if (unlikely((ssize_t) count < 0))
> >  		return retval;
> >  	pos = *ppos;
> > -	if (unlikely((pos < 0) || (loff_t) (pos + count) < 0))
> > -		return retval;
> > +	if (unlikely((pos < 0) || (loff_t) (pos + count) < 0)) {
> > +		retval = __negative_fpos_check(file, pos, count);
> > +		if (retval)
> > +			return retval;
> > +	}
> >  
> >  	if (unlikely(inode->i_flock && mandatory_lock(inode))) {
> >  		retval = locks_mandatory_area(
> 
> Um... How do lseek() work? It sounds like to violate error code range.

This is for read-write. As far as I know, 
  - generic_file_llseek,
  - default_llseek
  - no_llseek

doesn't call this function. 

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/6] vfs: introduce FMODE_NEG_OFFSET for allowing negative f_pos
  2010-01-18  0:15     ` KAMEZAWA Hiroyuki
@ 2010-01-18  1:17       ` OGAWA Hirofumi
  2010-01-18  1:25         ` KAMEZAWA Hiroyuki
  2010-01-18  1:32         ` OGAWA Hirofumi
  0 siblings, 2 replies; 22+ messages in thread
From: OGAWA Hirofumi @ 2010-01-18  1:17 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Wu Fengguang, Andrew Morton, Al Viro, Heiko Carstens,
	Christoph Hellwig, LKML, Eric Paris, Nick Piggin, Andi Kleen,
	David Howells, Jonathan Corbet, linux-fsdevel

KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> writes:

>> > +static int
>> > +__negative_fpos_check(struct file *file, loff_t pos, size_t count)
>> > +{
>> > +	/*
>> > +	 * pos or pos+count is negative here, check overflow.
>> > +	 * too big "count" will be caught in rw_verify_area().
>> > +	 */
>> > +	if ((pos < 0) && (pos + count < pos))
>> > +		return -EOVERFLOW;
>> > +	if (file->f_mode & FMODE_NEG_OFFSET)
>> > +		return 0;
>> > +	return -EINVAL;
>> > +}
>> > +
>> >  /*
>> >   * rw_verify_area doesn't like huge counts. We limit
>> >   * them to something that fits in "int" so that others
>> > @@ -222,8 +236,11 @@ int rw_verify_area(int read_write, struc
>> >  	if (unlikely((ssize_t) count < 0))
>> >  		return retval;
>> >  	pos = *ppos;
>> > -	if (unlikely((pos < 0) || (loff_t) (pos + count) < 0))
>> > -		return retval;
>> > +	if (unlikely((pos < 0) || (loff_t) (pos + count) < 0)) {
>> > +		retval = __negative_fpos_check(file, pos, count);
>> > +		if (retval)
>> > +			return retval;
>> > +	}
>> >  
>> >  	if (unlikely(inode->i_flock && mandatory_lock(inode))) {
>> >  		retval = locks_mandatory_area(
>> 
>> Um... How do lseek() work? It sounds like to violate error code range.
>
> This is for read-write. As far as I know, 
>   - generic_file_llseek,
>   - default_llseek
>   - no_llseek
>
> doesn't call this function. 

It seems to allow to set negative value to ->f_pos, right? So, lseek()
returns (uses) it?

Thanks.
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/6] vfs: introduce FMODE_NEG_OFFSET for allowing negative f_pos
  2010-01-18  1:17       ` OGAWA Hirofumi
@ 2010-01-18  1:25         ` KAMEZAWA Hiroyuki
  2010-01-18  1:38           ` OGAWA Hirofumi
  2010-01-18  1:32         ` OGAWA Hirofumi
  1 sibling, 1 reply; 22+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-01-18  1:25 UTC (permalink / raw)
  To: OGAWA Hirofumi
  Cc: Wu Fengguang, Andrew Morton, Al Viro, Heiko Carstens,
	Christoph Hellwig, LKML, Eric Paris, Nick Piggin, Andi Kleen,
	David Howells, Jonathan Corbet, linux-fsdevel

On Mon, 18 Jan 2010 10:17:48 +0900
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> wrote:

> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> writes:
> 
> >> > +static int
> >> > +__negative_fpos_check(struct file *file, loff_t pos, size_t count)
> >> > +{
> >> > +	/*
> >> > +	 * pos or pos+count is negative here, check overflow.
> >> > +	 * too big "count" will be caught in rw_verify_area().
> >> > +	 */
> >> > +	if ((pos < 0) && (pos + count < pos))
> >> > +		return -EOVERFLOW;
> >> > +	if (file->f_mode & FMODE_NEG_OFFSET)
> >> > +		return 0;
> >> > +	return -EINVAL;
> >> > +}
> >> > +
> >> >  /*
> >> >   * rw_verify_area doesn't like huge counts. We limit
> >> >   * them to something that fits in "int" so that others
> >> > @@ -222,8 +236,11 @@ int rw_verify_area(int read_write, struc
> >> >  	if (unlikely((ssize_t) count < 0))
> >> >  		return retval;
> >> >  	pos = *ppos;
> >> > -	if (unlikely((pos < 0) || (loff_t) (pos + count) < 0))
> >> > -		return retval;
> >> > +	if (unlikely((pos < 0) || (loff_t) (pos + count) < 0)) {
> >> > +		retval = __negative_fpos_check(file, pos, count);
> >> > +		if (retval)
> >> > +			return retval;
> >> > +	}
> >> >  
> >> >  	if (unlikely(inode->i_flock && mandatory_lock(inode))) {
> >> >  		retval = locks_mandatory_area(
> >> 
> >> Um... How do lseek() work? It sounds like to violate error code range.
> >
> > This is for read-write. As far as I know, 
> >   - generic_file_llseek,
> >   - default_llseek
> >   - no_llseek
> >
> > doesn't call this function. 
> 
> It seems to allow to set negative value to ->f_pos, right?
yes. Some file (/dev/kmem) requires that. 

> So, lseek() returns (uses) it?

lseek can return negative value, as far as I know.

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/6] vfs: introduce FMODE_NEG_OFFSET for allowing negative f_pos
  2010-01-18  1:17       ` OGAWA Hirofumi
  2010-01-18  1:25         ` KAMEZAWA Hiroyuki
@ 2010-01-18  1:32         ` OGAWA Hirofumi
  2010-01-18  1:49           ` KAMEZAWA Hiroyuki
  1 sibling, 1 reply; 22+ messages in thread
From: OGAWA Hirofumi @ 2010-01-18  1:32 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Wu Fengguang, Andrew Morton, Al Viro, Heiko Carstens,
	Christoph Hellwig, LKML, Eric Paris, Nick Piggin, Andi Kleen,
	David Howells, Jonathan Corbet, linux-fsdevel

OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> writes:

>>> Um... How do lseek() work? It sounds like to violate error code range.
>>
>> This is for read-write. As far as I know, 
>>   - generic_file_llseek,
>>   - default_llseek
>>   - no_llseek
>>
>> doesn't call this function. 
>
> It seems to allow to set negative value to ->f_pos, right? So, lseek()
> returns (uses) it?

BTW, another concern by negative "pos" value is, the following like code

	pos >> shift_bits

it will break the above. So, I think it should be checked if not yet.
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/6] vfs: introduce FMODE_NEG_OFFSET for allowing negative f_pos
  2010-01-18  1:25         ` KAMEZAWA Hiroyuki
@ 2010-01-18  1:38           ` OGAWA Hirofumi
  2010-01-18  2:00             ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 22+ messages in thread
From: OGAWA Hirofumi @ 2010-01-18  1:38 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Wu Fengguang, Andrew Morton, Al Viro, Heiko Carstens,
	Christoph Hellwig, LKML, Eric Paris, Nick Piggin, Andi Kleen,
	David Howells, Jonathan Corbet, linux-fsdevel

KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> writes:

>> So, lseek() returns (uses) it?
>
> lseek can return negative value, as far as I know.

Umm..., how do you know the difference of -EOVERFLOW and fpos == -75? 

Thanks.
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/6] vfs: introduce FMODE_NEG_OFFSET for allowing negative f_pos
  2010-01-18  1:32         ` OGAWA Hirofumi
@ 2010-01-18  1:49           ` KAMEZAWA Hiroyuki
  2010-01-18  1:59             ` OGAWA Hirofumi
  0 siblings, 1 reply; 22+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-01-18  1:49 UTC (permalink / raw)
  To: OGAWA Hirofumi
  Cc: Wu Fengguang, Andrew Morton, Al Viro, Heiko Carstens,
	Christoph Hellwig, LKML, Eric Paris, Nick Piggin, Andi Kleen,
	David Howells, Jonathan Corbet, linux-fsdevel

On Mon, 18 Jan 2010 10:32:49 +0900
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> wrote:

> OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> writes:
> 
> >>> Um... How do lseek() work? It sounds like to violate error code range.
> >>
> >> This is for read-write. As far as I know, 
> >>   - generic_file_llseek,
> >>   - default_llseek
> >>   - no_llseek
> >>
> >> doesn't call this function. 
> >
> > It seems to allow to set negative value to ->f_pos, right? So, lseek()
> > returns (uses) it?
> 
> BTW, another concern by negative "pos" value is, the following like code
> 
> 	pos >> shift_bits
> 
> it will break the above. So, I think it should be checked if not yet.

Where do we check ?

FMODE_NEG_OFFSET is just used by /dev/mem and /proc/<pid>/mem. And I don't
think there are no additonal users. So, I myself don't have has such concerns...


Thanks,
-Kame


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/6] vfs: introduce FMODE_NEG_OFFSET for allowing negative f_pos
  2010-01-18  1:49           ` KAMEZAWA Hiroyuki
@ 2010-01-18  1:59             ` OGAWA Hirofumi
  0 siblings, 0 replies; 22+ messages in thread
From: OGAWA Hirofumi @ 2010-01-18  1:59 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Wu Fengguang, Andrew Morton, Al Viro, Heiko Carstens,
	Christoph Hellwig, LKML, Eric Paris, Nick Piggin, Andi Kleen,
	David Howells, Jonathan Corbet, linux-fsdevel

KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> writes:

>> BTW, another concern by negative "pos" value is, the following like code
>> 
>> 	pos >> shift_bits
>> 
>> it will break the above. So, I think it should be checked if not yet.
>
> Where do we check ?
>
> FMODE_NEG_OFFSET is just used by /dev/mem and /proc/<pid>/mem. And I don't
> think there are no additonal users. So, I myself don't have has such concerns...

Sorry, it's just my concern. I'm not checking real path (e.g. vfs) of
related to /dev/mem, if there is no user of such code, it's ok.

Thanks.
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/6] vfs: introduce FMODE_NEG_OFFSET for allowing negative f_pos
  2010-01-18  1:38           ` OGAWA Hirofumi
@ 2010-01-18  2:00             ` KAMEZAWA Hiroyuki
  2010-01-18  2:13               ` OGAWA Hirofumi
  0 siblings, 1 reply; 22+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-01-18  2:00 UTC (permalink / raw)
  To: OGAWA Hirofumi
  Cc: Wu Fengguang, Andrew Morton, Al Viro, Heiko Carstens,
	Christoph Hellwig, LKML, Eric Paris, Nick Piggin, Andi Kleen,
	David Howells, Jonathan Corbet, linux-fsdevel

On Mon, 18 Jan 2010 10:38:27 +0900
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> wrote:

> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> writes:
> 
> >> So, lseek() returns (uses) it?
> >
> > lseek can return negative value, as far as I know.
> 
> Umm..., how do you know the difference of -EOVERFLOW and fpos == -75? 
> 

Ah, sorry. I read wrong.

For /dev/mem, it uses its own lseek function which allows negative f_pos
value. Other usual file system doesn't allow negative f_pos.

It's ok not to return -EOVEFLOW for /dev/mem because there is no file end.

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/6] vfs: introduce FMODE_NEG_OFFSET for allowing negative f_pos
  2010-01-18  2:00             ` KAMEZAWA Hiroyuki
@ 2010-01-18  2:13               ` OGAWA Hirofumi
  2010-01-18  2:30                 ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 22+ messages in thread
From: OGAWA Hirofumi @ 2010-01-18  2:13 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Wu Fengguang, Andrew Morton, Al Viro, Heiko Carstens,
	Christoph Hellwig, LKML, Eric Paris, Nick Piggin, Andi Kleen,
	David Howells, Jonathan Corbet, linux-fsdevel

KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> writes:

> On Mon, 18 Jan 2010 10:38:27 +0900
> OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> wrote:
>
>> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> writes:
>> 
>> >> So, lseek() returns (uses) it?
>> >
>> > lseek can return negative value, as far as I know.
>> 
>> Umm..., how do you know the difference of -EOVERFLOW and fpos == -75? 
>> 
>
> Ah, sorry. I read wrong.
>
> For /dev/mem, it uses its own lseek function which allows negative f_pos
> value. Other usual file system doesn't allow negative f_pos.
>
> It's ok not to return -EOVEFLOW for /dev/mem because there is no file end.

No, no. I think it has the problem.

E.g. /dev/mem returns -75 as fpos, so, lseek(2) returns -75 to
userland. Then the userland (e.g. glibc) convert it as
error. I.e. finally, errno == -75, and lseek(3) returns -1, right?

Thanks.
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/6] vfs: introduce FMODE_NEG_OFFSET for allowing negative f_pos
  2010-01-18  2:13               ` OGAWA Hirofumi
@ 2010-01-18  2:30                 ` KAMEZAWA Hiroyuki
  2010-01-18  3:15                   ` Wu, Fengguang
  0 siblings, 1 reply; 22+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-01-18  2:30 UTC (permalink / raw)
  To: OGAWA Hirofumi
  Cc: Wu Fengguang, Andrew Morton, Al Viro, Heiko Carstens,
	Christoph Hellwig, LKML, Eric Paris, Nick Piggin, Andi Kleen,
	David Howells, Jonathan Corbet, linux-fsdevel

On Mon, 18 Jan 2010 11:13:04 +0900
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> wrote:

> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> writes:
> 
> > On Mon, 18 Jan 2010 10:38:27 +0900
> > OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> wrote:
> >
> >> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> writes:
> >> 
> >> >> So, lseek() returns (uses) it?
> >> >
> >> > lseek can return negative value, as far as I know.
> >> 
> >> Umm..., how do you know the difference of -EOVERFLOW and fpos == -75? 
> >> 
> >
> > Ah, sorry. I read wrong.
> >
> > For /dev/mem, it uses its own lseek function which allows negative f_pos
> > value. Other usual file system doesn't allow negative f_pos.
> >
> > It's ok not to return -EOVEFLOW for /dev/mem because there is no file end.
> 
> No, no. I think it has the problem.
> 
> E.g. /dev/mem returns -75 as fpos, so, lseek(2) returns -75 to
> userland. Then the userland (e.g. glibc) convert it as
> error. I.e. finally, errno == -75, and lseek(3) returns -1, right?
> 
Maybe possible.

Hmm. Then, /dev/mem's llseek need some fix not to return pos < -PAGESIZE.
Wu-san, could you add additional bug fix to lseek()'s f_pos handling in
/dev/mem ?

Thanks,
-Kame



^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH 6/6] vfs: introduce FMODE_NEG_OFFSET for allowing negative f_pos
  2010-01-18  2:30                 ` KAMEZAWA Hiroyuki
@ 2010-01-18  3:15                   ` Wu, Fengguang
  2010-01-18  3:22                     ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 22+ messages in thread
From: Wu, Fengguang @ 2010-01-18  3:15 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki, OGAWA Hirofumi
  Cc: Andrew Morton, Al Viro, Heiko Carstens, Christoph Hellwig, LKML,
	Eric Paris, Nick Piggin, Andi Kleen, David Howells,
	Jonathan Corbet, linux-fsdevel

Hi,

[replying from webmail, sorry for top-posting]

memory_lseek() calls force_successful_syscall_return() to force success on negative vals.
However that is a no-op for x86.

My experiment shows that lseek() does return negative pos. However,
manual says that "a value of (off_t) -1 is returned" on error. So it's OK
as long as your program is written as "err == -1" instead of "err < 0".

code:
        err = lseek64(fd, addr, SEEK_SET);
        if (err == -1)
                perror("seek " FILENAME);

output:
  # kmem-rw 0xffffffffa0094000
  addr=0xffffffffa0094000 val=0x441f0fe5894855

strace:
  open("/dev/kmem", O_RDWR)               = 3
  lseek(3, 18446744072099545088, SEEK_SET) = 18446744072099545088
  read(3, "UH\211\345\17\37D\0"..., 8)    = 8

Thanks,
Fengguang
________________________________________
From: KAMEZAWA Hiroyuki [kamezawa.hiroyu@jp.fujitsu.com]
Sent: Monday, January 18, 2010 10:30 AM
To: OGAWA Hirofumi
Cc: Wu, Fengguang; Andrew Morton; Al Viro; Heiko Carstens; Christoph Hellwig; LKML; Eric Paris; Nick Piggin; Andi Kleen; David Howells; Jonathan Corbet; linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 6/6] vfs: introduce FMODE_NEG_OFFSET for allowing negative f_pos

On Mon, 18 Jan 2010 11:13:04 +0900
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> wrote:

> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> writes:
>
> > On Mon, 18 Jan 2010 10:38:27 +0900
> > OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> wrote:
> >
> >> KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> writes:
> >>
> >> >> So, lseek() returns (uses) it?
> >> >
> >> > lseek can return negative value, as far as I know.
> >>
> >> Umm..., how do you know the difference of -EOVERFLOW and fpos == -75?
> >>
> >
> > Ah, sorry. I read wrong.
> >
> > For /dev/mem, it uses its own lseek function which allows negative f_pos
> > value. Other usual file system doesn't allow negative f_pos.
> >
> > It's ok not to return -EOVEFLOW for /dev/mem because there is no file end.
>
> No, no. I think it has the problem.
>
> E.g. /dev/mem returns -75 as fpos, so, lseek(2) returns -75 to
> userland. Then the userland (e.g. glibc) convert it as
> error. I.e. finally, errno == -75, and lseek(3) returns -1, right?
>
Maybe possible.

Hmm. Then, /dev/mem's llseek need some fix not to return pos < -PAGESIZE.
Wu-san, could you add additional bug fix to lseek()'s f_pos handling in
/dev/mem ?

Thanks,
-Kame



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/6] vfs: introduce FMODE_NEG_OFFSET for allowing negative f_pos
  2010-01-18  3:15                   ` Wu, Fengguang
@ 2010-01-18  3:22                     ` KAMEZAWA Hiroyuki
  2010-01-18  5:26                       ` Wu, Fengguang
  0 siblings, 1 reply; 22+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-01-18  3:22 UTC (permalink / raw)
  To: Wu, Fengguang
  Cc: OGAWA Hirofumi, Andrew Morton, Al Viro, Heiko Carstens,
	Christoph Hellwig, LKML, Eric Paris, Nick Piggin, Andi Kleen,
	David Howells, Jonathan Corbet, linux-fsdevel

On Mon, 18 Jan 2010 11:15:38 +0800
"Wu, Fengguang" <fengguang.wu@intel.com> wrote:

> Hi,
> 
> [replying from webmail, sorry for top-posting]
> 
> memory_lseek() calls force_successful_syscall_return() to force success on negative vals.
> However that is a no-op for x86.
> 
> My experiment shows that lseek() does return negative pos. However,
> manual says that "a value of (off_t) -1 is returned" on error. So it's OK
> as long as your program is written as "err == -1" instead of "err < 0".
> 
On error, the kernel returns -EOVERFLOW (via %eax) and libc hides
it by
  errno = EOVERFLOW
  ret = -1

The problem discussed here is the kernel's return value. So, the kernel's
lseek should check that, I think.

Anyway, this lseek problem is not related to this patch itself and has
existed for very long time. Fixing it later by another patch is not very
bad, I think.
(I'm sorry I myself is not ready for writing a patch...)

Thaks,
-Kame


^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: [PATCH 6/6] vfs: introduce FMODE_NEG_OFFSET for allowing negative f_pos
  2010-01-18  3:22                     ` KAMEZAWA Hiroyuki
@ 2010-01-18  5:26                       ` Wu, Fengguang
  2010-01-19  0:37                         ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 22+ messages in thread
From: Wu, Fengguang @ 2010-01-18  5:26 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: OGAWA Hirofumi, Andrew Morton, Al Viro, Heiko Carstens,
	Christoph Hellwig, LKML, Eric Paris, Nick Piggin, Andi Kleen,
	David Howells, Jonathan Corbet, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 181 bytes --]


> On error, the kernel returns -EOVERFLOW (via %eax) and libc hides
> it by
>  errno = EOVERFLOW
>  ret = -1

Ah got it. How about the attached patch?

Thanks,
Fengguang

[-- Attachment #2: mem-seek-fix --]
[-- Type: application/octet-stream, Size: 1404 bytes --]

devmem: don't allow seek to last page

So as to return a uniform error -EOVERFLOW instead of a random one:

# kmem-seek 0xfffffffffffffff0
seek /dev/kmem: Device or resource busy
# kmem-seek 0xfffffffffffffff1
seek /dev/kmem: Block device required

Suggested by OGAWA Hirofumi.

CC: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
CC: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> 
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 drivers/char/mem.c |   19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

--- linux-mm.orig/drivers/char/mem.c	2010-01-18 12:37:11.000000000 +0800
+++ linux-mm/drivers/char/mem.c	2010-01-18 13:18:24.000000000 +0800
@@ -725,16 +725,23 @@ static loff_t memory_lseek(struct file *
 
 	mutex_lock(&file->f_path.dentry->d_inode->i_mutex);
 	switch (orig) {
-		case 0:
+		case SEEK_CUR:
+			offset += file->f_pos;
+			if ((unsigned long long)offset <
+			    (unsigned long long)file->f_pos) {
+				ret = -EOVERFLOW;
+				break;
+			}
+		case SEEK_SET:
+			/* to avoid eg. mistaking f_pos=-9 as -EBADF=-9 */
+			if ((unsigned long long)offset >= ~0xFFFULL) {
+				ret = -EOVERFLOW;
+				break;
+			}
 			file->f_pos = offset;
 			ret = file->f_pos;
 			force_successful_syscall_return();
 			break;
-		case 1:
-			file->f_pos += offset;
-			ret = file->f_pos;
-			force_successful_syscall_return();
-			break;
 		default:
 			ret = -EINVAL;
 	}

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/6] vfs: introduce FMODE_NEG_OFFSET for allowing negative f_pos
  2010-01-18  5:26                       ` Wu, Fengguang
@ 2010-01-19  0:37                         ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 22+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-01-19  0:37 UTC (permalink / raw)
  To: Wu, Fengguang
  Cc: OGAWA Hirofumi, Andrew Morton, Al Viro, Heiko Carstens,
	Christoph Hellwig, LKML, Eric Paris, Nick Piggin, Andi Kleen,
	David Howells, Jonathan Corbet, linux-fsdevel

On Mon, 18 Jan 2010 13:26:44 +0800
"Wu, Fengguang" <fengguang.wu@intel.com> wrote:

> 
> > On error, the kernel returns -EOVERFLOW (via %eax) and libc hides
> > it by
> >  errno = EOVERFLOW
> >  ret = -1
> 
> Ah got it. How about the attached patch?
> 

Seems good to me. Thank you very much.

Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2010-01-19  0:40 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-01-15  1:39 [PATCH 0/6] FMODE_NONOTIFY, FMODE_RANDOM and FMODE_NEG_OFFSET bits Wu Fengguang
2010-01-15  1:39 ` [PATCH 1/6] fanotify: fix FMODE_NONOTIFY bit number Wu Fengguang
2010-01-15  1:39 ` [PATCH 2/6] bitops: compile time optimization for hweight_long(CONSTANT) Wu Fengguang
2010-01-15  1:39 ` [PATCH 3/6] vfs: O_* bit numbers uniqueness check Wu Fengguang
2010-01-15  1:39 ` [PATCH 4/6] vfs: take f_lock on modifying f_mode after open time Wu Fengguang
2010-01-15  1:39 ` [PATCH 5/6] readahead: introduce FMODE_RANDOM for POSIX_FADV_RANDOM Wu Fengguang
2010-01-15  1:40 ` [PATCH 6/6] vfs: introduce FMODE_NEG_OFFSET for allowing negative f_pos Wu Fengguang
2010-01-16 12:54   ` OGAWA Hirofumi
2010-01-18  0:15     ` KAMEZAWA Hiroyuki
2010-01-18  1:17       ` OGAWA Hirofumi
2010-01-18  1:25         ` KAMEZAWA Hiroyuki
2010-01-18  1:38           ` OGAWA Hirofumi
2010-01-18  2:00             ` KAMEZAWA Hiroyuki
2010-01-18  2:13               ` OGAWA Hirofumi
2010-01-18  2:30                 ` KAMEZAWA Hiroyuki
2010-01-18  3:15                   ` Wu, Fengguang
2010-01-18  3:22                     ` KAMEZAWA Hiroyuki
2010-01-18  5:26                       ` Wu, Fengguang
2010-01-19  0:37                         ` KAMEZAWA Hiroyuki
2010-01-18  1:32         ` OGAWA Hirofumi
2010-01-18  1:49           ` KAMEZAWA Hiroyuki
2010-01-18  1:59             ` OGAWA Hirofumi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).