linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/5] clean up and generalize swap-over-NFS
@ 2014-12-20  3:18 Omar Sandoval
  2014-12-20  3:18 ` [PATCH v2 1/5] iov_iter: add ITER_BVEC helpers Omar Sandoval
                   ` (5 more replies)
  0 siblings, 6 replies; 14+ messages in thread
From: Omar Sandoval @ 2014-12-20  3:18 UTC (permalink / raw)
  To: Alexander Viro, Andrew Morton, Trond Myklebust,
	Christoph Hellwig, linux-fsdevel, linux-mm, linux-nfs,
	linux-kernel
  Cc: Omar Sandoval

Hi,

This patch series (based on ecb5ec0 in Linus' tree) contains all of the
non-BTRFS work that I've done to implement swapfiles on BTRFS. The BTRFS
portion is still undergoing development and is now outweighed by the
non-BTRFS changes, so I want to get these in separately.

Version 2 changes the generic swapfile interface to use ->read_iter and
->write_iter instead of using ->direct_IO directly in response to
discussion on the previous submission. It also adds the iov_iter_is_bvec
helper to factor out some common checks.

Version 1 can be found here: https://lkml.org/lkml/2014/12/15/7

Omar Sandoval (5):
  iov_iter: add ITER_BVEC helpers
  direct-io: don't dirty ITER_BVEC pages on read
  nfs: don't dirty ITER_BVEC pages read through direct I/O
  swapfile: use ->read_iter and ->write_iter
  vfs: update swap_{,de}activate documentation

 Documentation/filesystems/Locking |  7 ++++---
 Documentation/filesystems/vfs.txt |  7 ++++---
 fs/direct-io.c                    |  8 ++++---
 fs/nfs/direct.c                   |  5 ++++-
 fs/splice.c                       |  7 ++-----
 include/linux/uio.h               |  7 +++++++
 mm/iov_iter.c                     | 12 +++++++++++
 mm/page_io.c                      | 44 +++++++++++++++++++++++++--------------
 mm/swapfile.c                     | 11 +++++++++-
 9 files changed, 76 insertions(+), 32 deletions(-)

-- 
2.2.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 1/5] iov_iter: add ITER_BVEC helpers
  2014-12-20  3:18 [PATCH v2 0/5] clean up and generalize swap-over-NFS Omar Sandoval
@ 2014-12-20  3:18 ` Omar Sandoval
  2014-12-20  3:18 ` [PATCH v2 2/5] direct-io: don't dirty ITER_BVEC pages on read Omar Sandoval
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Omar Sandoval @ 2014-12-20  3:18 UTC (permalink / raw)
  To: Alexander Viro, Andrew Morton, Trond Myklebust,
	Christoph Hellwig, linux-fsdevel, linux-mm, linux-nfs,
	linux-kernel
  Cc: Omar Sandoval

Add iov_iter_is_bvec and iov_iter_bvec and convert callers.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
---
 fs/splice.c         |  7 ++-----
 include/linux/uio.h |  7 +++++++
 mm/iov_iter.c       | 12 ++++++++++++
 mm/page_io.c        | 14 +++++---------
 4 files changed, 26 insertions(+), 14 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index 75c6058..7c7176f 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1006,11 +1006,8 @@ iter_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
 		}
 
 		/* ... iov_iter */
-		from.type = ITER_BVEC | WRITE;
-		from.bvec = array;
-		from.nr_segs = n;
-		from.count = sd.total_len - left;
-		from.iov_offset = 0;
+		iov_iter_bvec(&from, ITER_BVEC | WRITE, array, n,
+			      sd.total_len - left);
 
 		/* ... and iocb */
 		init_sync_kiocb(&kiocb, out);
diff --git a/include/linux/uio.h b/include/linux/uio.h
index 1c5e453..23bb5e4 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -63,6 +63,11 @@ static inline struct iovec iov_iter_iovec(const struct iov_iter *iter)
 	};
 }
 
+static inline int iov_iter_is_bvec(const struct iov_iter *iter)
+{
+	return (iter->type & ITER_BVEC) != 0;
+}
+
 #define iov_for_each(iov, iter, start)				\
 	if (!((start).type & ITER_BVEC))			\
 	for (iter = (start);					\
@@ -90,6 +95,8 @@ void iov_iter_init(struct iov_iter *i, int direction, const struct iovec *iov,
 			unsigned long nr_segs, size_t count);
 void iov_iter_kvec(struct iov_iter *i, int direction, const struct kvec *iov,
 			unsigned long nr_segs, size_t count);
+void iov_iter_bvec(struct iov_iter *i, int direction, const struct bio_vec *bv,
+		   unsigned long nr_segs, size_t count);
 ssize_t iov_iter_get_pages(struct iov_iter *i, struct page **pages,
 			size_t maxsize, unsigned maxpages, size_t *start);
 ssize_t iov_iter_get_pages_alloc(struct iov_iter *i, struct page ***pages,
diff --git a/mm/iov_iter.c b/mm/iov_iter.c
index a1599ca..c975bc4 100644
--- a/mm/iov_iter.c
+++ b/mm/iov_iter.c
@@ -513,6 +513,18 @@ void iov_iter_kvec(struct iov_iter *i, int direction,
 }
 EXPORT_SYMBOL(iov_iter_kvec);
 
+void iov_iter_bvec(struct iov_iter *i, int direction, const struct bio_vec *bv,
+		   unsigned long nr_segs, size_t count)
+{
+	BUG_ON(!(direction & ITER_BVEC));
+	i->type = direction;
+	i->bvec = bv;
+	i->nr_segs = nr_segs;
+	i->iov_offset = 0;
+	i->count = count;
+}
+EXPORT_SYMBOL(iov_iter_bvec);
+
 unsigned long iov_iter_alignment(const struct iov_iter *i)
 {
 	unsigned long res = 0;
diff --git a/mm/page_io.c b/mm/page_io.c
index 955db8b..532a39b 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -264,18 +264,14 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc,
 		struct kiocb kiocb;
 		struct file *swap_file = sis->swap_file;
 		struct address_space *mapping = swap_file->f_mapping;
+		struct iov_iter from;
 		struct bio_vec bv = {
 			.bv_page = page,
-			.bv_len  = PAGE_SIZE,
-			.bv_offset = 0
+			.bv_len = PAGE_SIZE,
+			.bv_offset = 0,
 		};
-		struct iov_iter from = {
-			.type = ITER_BVEC | WRITE,
-			.count = PAGE_SIZE,
-			.iov_offset = 0,
-			.nr_segs = 1,
-		};
-		from.bvec = &bv;	/* older gcc versions are broken */
+
+		iov_iter_bvec(&from, ITER_BVEC | WRITE, &bv, 1, PAGE_SIZE);
 
 		init_sync_kiocb(&kiocb, swap_file);
 		kiocb.ki_pos = page_file_offset(page);
-- 
2.2.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 2/5] direct-io: don't dirty ITER_BVEC pages on read
  2014-12-20  3:18 [PATCH v2 0/5] clean up and generalize swap-over-NFS Omar Sandoval
  2014-12-20  3:18 ` [PATCH v2 1/5] iov_iter: add ITER_BVEC helpers Omar Sandoval
@ 2014-12-20  3:18 ` Omar Sandoval
  2014-12-20  6:01   ` Al Viro
  2014-12-20  3:18 ` [PATCH v2 3/5] nfs: don't dirty ITER_BVEC pages read through direct I/O Omar Sandoval
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Omar Sandoval @ 2014-12-20  3:18 UTC (permalink / raw)
  To: Alexander Viro, Andrew Morton, Trond Myklebust,
	Christoph Hellwig, linux-fsdevel, linux-mm, linux-nfs,
	linux-kernel
  Cc: Omar Sandoval, Ming Lei

Reads through the iov_iter infrastructure for kernel pages shouldn't be
dirtied by the direct I/O code.

This is based on Dave Kleikamp's and Ming Lei's previously posted
patches.

Cc: Ming Lei <ming.lei@canonical.com>
Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: Omar Sandoval <osandov@osandov.com>
---
 fs/direct-io.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/direct-io.c b/fs/direct-io.c
index e181b6b..c71387b 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -120,6 +120,7 @@ struct dio {
 	spinlock_t bio_lock;		/* protects BIO fields below */
 	int page_errors;		/* errno from get_user_pages() */
 	int is_async;			/* is IO async ? */
+	int should_dirty;		/* should we mark read pages dirty? */
 	bool defer_completion;		/* defer AIO completion to workqueue? */
 	int io_error;			/* IO error in completion path */
 	unsigned long refcount;		/* direct_io_worker() and bios */
@@ -392,7 +393,7 @@ static inline void dio_bio_submit(struct dio *dio, struct dio_submit *sdio)
 	dio->refcount++;
 	spin_unlock_irqrestore(&dio->bio_lock, flags);
 
-	if (dio->is_async && dio->rw == READ)
+	if (dio->is_async && dio->rw == READ && dio->should_dirty)
 		bio_set_pages_dirty(bio);
 
 	if (sdio->submit_io)
@@ -463,13 +464,13 @@ static int dio_bio_complete(struct dio *dio, struct bio *bio)
 	if (!uptodate)
 		dio->io_error = -EIO;
 
-	if (dio->is_async && dio->rw == READ) {
+	if (dio->is_async && dio->rw == READ && dio->should_dirty) {
 		bio_check_pages_dirty(bio);	/* transfers ownership */
 	} else {
 		bio_for_each_segment_all(bvec, bio, i) {
 			struct page *page = bvec->bv_page;
 
-			if (dio->rw == READ && !PageCompound(page))
+			if (dio->rw == READ && !PageCompound(page) && dio->should_dirty)
 				set_page_dirty_lock(page);
 			page_cache_release(page);
 		}
@@ -1177,6 +1178,7 @@ do_blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
 
 	dio->inode = inode;
 	dio->rw = rw;
+	dio->should_dirty = !iov_iter_is_bvec(iter);
 
 	/*
 	 * For AIO O_(D)SYNC writes we need to defer completions to a workqueue
-- 
2.2.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 3/5] nfs: don't dirty ITER_BVEC pages read through direct I/O
  2014-12-20  3:18 [PATCH v2 0/5] clean up and generalize swap-over-NFS Omar Sandoval
  2014-12-20  3:18 ` [PATCH v2 1/5] iov_iter: add ITER_BVEC helpers Omar Sandoval
  2014-12-20  3:18 ` [PATCH v2 2/5] direct-io: don't dirty ITER_BVEC pages on read Omar Sandoval
@ 2014-12-20  3:18 ` Omar Sandoval
  2015-01-05 14:41   ` Anna Schumaker
  2014-12-20  3:18 ` [PATCH v2 4/5] swapfile: use ->read_iter and ->write_iter Omar Sandoval
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Omar Sandoval @ 2014-12-20  3:18 UTC (permalink / raw)
  To: Alexander Viro, Andrew Morton, Trond Myklebust,
	Christoph Hellwig, linux-fsdevel, linux-mm, linux-nfs,
	linux-kernel
  Cc: Omar Sandoval

As with the generic blockdev code, kernel pages shouldn't be dirtied by
the direct I/O path.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
---
 fs/nfs/direct.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 10bf072..b6ca65c 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -88,6 +88,7 @@ struct nfs_direct_req {
 	struct pnfs_ds_commit_info ds_cinfo;	/* Storage for cinfo */
 	struct work_struct	work;
 	int			flags;
+	int			should_dirty;	/* should we mark read pages dirty? */
 #define NFS_ODIRECT_DO_COMMIT		(1)	/* an unstable reply was received */
 #define NFS_ODIRECT_RESCHED_WRITES	(2)	/* write verification failed */
 	struct nfs_writeverf	verf;		/* unstable write verifier */
@@ -370,7 +371,8 @@ static void nfs_direct_read_completion(struct nfs_pgio_header *hdr)
 		struct nfs_page *req = nfs_list_entry(hdr->pages.next);
 		struct page *page = req->wb_page;
 
-		if (!PageCompound(page) && bytes < hdr->good_bytes)
+		if (!PageCompound(page) && bytes < hdr->good_bytes &&
+		    dreq->should_dirty)
 			set_page_dirty(page);
 		bytes += req->wb_bytes;
 		nfs_list_remove_request(req);
@@ -542,6 +544,7 @@ ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter,
 	dreq->inode = inode;
 	dreq->bytes_left = count;
 	dreq->ctx = get_nfs_open_context(nfs_file_open_context(iocb->ki_filp));
+	dreq->should_dirty = !iov_iter_is_bvec(iter);
 	l_ctx = nfs_get_lock_context(dreq->ctx);
 	if (IS_ERR(l_ctx)) {
 		result = PTR_ERR(l_ctx);
-- 
2.2.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 4/5] swapfile: use ->read_iter and ->write_iter
  2014-12-20  3:18 [PATCH v2 0/5] clean up and generalize swap-over-NFS Omar Sandoval
                   ` (2 preceding siblings ...)
  2014-12-20  3:18 ` [PATCH v2 3/5] nfs: don't dirty ITER_BVEC pages read through direct I/O Omar Sandoval
@ 2014-12-20  3:18 ` Omar Sandoval
  2014-12-20  6:13   ` Al Viro
  2014-12-20  3:18 ` [PATCH v2 5/5] vfs: update swap_{,de}activate documentation Omar Sandoval
  2015-01-14  3:18 ` [PATCH v2 0/5] clean up and generalize swap-over-NFS Omar Sandoval
  5 siblings, 1 reply; 14+ messages in thread
From: Omar Sandoval @ 2014-12-20  3:18 UTC (permalink / raw)
  To: Alexander Viro, Andrew Morton, Trond Myklebust,
	Christoph Hellwig, linux-fsdevel, linux-mm, linux-nfs,
	linux-kernel
  Cc: Omar Sandoval, Mel Gorman

Using ->direct_IO and ->readpage for the generic swap file
infrastructure requires all sorts of nasty workarounds. ->readpage
implementations don't play nicely with swap cache pages, and ->direct_IO
implementations have different locking conventions for every filesystem.
Instead, use ->read_iter/->write_iter with an ITER_BVEC and let the
filesystem take care of it. This will also allow us to easily transition
to kernel AIO if that gets merged in the future.

Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Omar Sandoval <osandov@osandov.com>
---
 mm/page_io.c  | 30 +++++++++++++++++++++++-------
 mm/swapfile.c | 11 ++++++++++-
 2 files changed, 33 insertions(+), 8 deletions(-)

diff --git a/mm/page_io.c b/mm/page_io.c
index 532a39b..61165b0 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -263,7 +263,6 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc,
 	if (sis->flags & SWP_FILE) {
 		struct kiocb kiocb;
 		struct file *swap_file = sis->swap_file;
-		struct address_space *mapping = swap_file->f_mapping;
 		struct iov_iter from;
 		struct bio_vec bv = {
 			.bv_page = page,
@@ -279,9 +278,7 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc,
 
 		set_page_writeback(page);
 		unlock_page(page);
-		ret = mapping->a_ops->direct_IO(ITER_BVEC | WRITE,
-						&kiocb, &from,
-						kiocb.ki_pos);
+		ret = swap_file->f_op->write_iter(&kiocb, &from);
 		if (ret == PAGE_SIZE) {
 			count_vm_event(PSWPOUT);
 			ret = 0;
@@ -344,12 +341,31 @@ int swap_readpage(struct page *page)
 	}
 
 	if (sis->flags & SWP_FILE) {
+		struct kiocb kiocb;
 		struct file *swap_file = sis->swap_file;
-		struct address_space *mapping = swap_file->f_mapping;
+		struct iov_iter to;
+		struct bio_vec bv = {
+			.bv_page = page,
+			.bv_len = PAGE_SIZE,
+			.bv_offset = 0,
+		};
+
+		iov_iter_bvec(&to, ITER_BVEC | READ, &bv, 1, PAGE_SIZE);
+
+		init_sync_kiocb(&kiocb, swap_file);
+		kiocb.ki_pos = page_file_offset(page);
+		kiocb.ki_nbytes = PAGE_SIZE;
 
-		ret = mapping->a_ops->readpage(swap_file, page);
-		if (!ret)
+		ret = swap_file->f_op->read_iter(&kiocb, &to);
+		if (ret == PAGE_SIZE) {
+			SetPageUptodate(page);
 			count_vm_event(PSWPIN);
+			ret = 0;
+		} else {
+			ClearPageUptodate(page);
+			SetPageError(page);
+		}
+		unlock_page(page);
 		return ret;
 	}
 
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 63f55cc..4e14122 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2379,7 +2379,16 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 		name = NULL;
 		goto bad_swap;
 	}
-	swap_file = file_open_name(name, O_RDWR|O_LARGEFILE, 0);
+	swap_file = file_open_name(name, O_RDWR | O_LARGEFILE | O_DIRECT, 0);
+	if (swap_file == ERR_PTR(-EINVAL)) {
+		/*
+		 * XXX: there are several filesystems that implement ->bmap but
+		 * not ->direct_IO. It's unlikely that anyone is using a
+		 * swapfile on, e.g., the MINIX fs, but this kludge will keep us
+		 * from getting a complaint from the one person who does.
+		 */
+		swap_file = file_open_name(name, O_RDWR | O_LARGEFILE, 0);
+	}
 	if (IS_ERR(swap_file)) {
 		error = PTR_ERR(swap_file);
 		swap_file = NULL;
-- 
2.2.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 5/5] vfs: update swap_{,de}activate documentation
  2014-12-20  3:18 [PATCH v2 0/5] clean up and generalize swap-over-NFS Omar Sandoval
                   ` (3 preceding siblings ...)
  2014-12-20  3:18 ` [PATCH v2 4/5] swapfile: use ->read_iter and ->write_iter Omar Sandoval
@ 2014-12-20  3:18 ` Omar Sandoval
  2015-01-14  3:18 ` [PATCH v2 0/5] clean up and generalize swap-over-NFS Omar Sandoval
  5 siblings, 0 replies; 14+ messages in thread
From: Omar Sandoval @ 2014-12-20  3:18 UTC (permalink / raw)
  To: Alexander Viro, Andrew Morton, Trond Myklebust,
	Christoph Hellwig, linux-fsdevel, linux-mm, linux-nfs,
	linux-kernel
  Cc: Omar Sandoval

Parameters were added to swap_activate in the same patch series that
introduced it without updating the documentation. Additionally, the
documentation claims that non-existent address space operations
->swap_{in,out} are used for swap I/O, but now we use
->{read,write}_iter.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
---
 Documentation/filesystems/Locking | 7 ++++---
 Documentation/filesystems/vfs.txt | 7 ++++---
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index b30753c..e72b4c3 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -205,7 +205,8 @@ prototypes:
 	int (*launder_page)(struct page *);
 	int (*is_partially_uptodate)(struct page *, unsigned long, unsigned long);
 	int (*error_remove_page)(struct address_space *, struct page *);
-	int (*swap_activate)(struct file *);
+	int (*swap_activate)(struct swap_info_struct *, struct file *,
+			     sector_t *);
 	int (*swap_deactivate)(struct file *);
 
 locking rules:
@@ -230,8 +231,8 @@ migratepage:		yes (both)
 launder_page:		yes
 is_partially_uptodate:	yes
 error_remove_page:	yes
-swap_activate:		no
-swap_deactivate:	no
+swap_activate:					yes
+swap_deactivate:				no
 
 	->write_begin(), ->write_end(), ->sync_page() and ->readpage()
 may be called from the request handler (/dev/loop).
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 43ce050..9c793a7 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -600,8 +600,9 @@ struct address_space_operations {
 					unsigned long);
 	void (*is_dirty_writeback) (struct page *, bool *, bool *);
 	int (*error_remove_page) (struct mapping *mapping, struct page *page);
-	int (*swap_activate)(struct file *);
-	int (*swap_deactivate)(struct file *);
+	int (*swap_activate)(struct swap_info_struct *, struct file *,
+			     sector_t *);
+	void (*swap_deactivate)(struct file *);
 };
 
   writepage: called by the VM to write a dirty page to backing store.
@@ -788,7 +789,7 @@ struct address_space_operations {
 	memory. A return value of zero indicates success,
 	in which case this file can be used to back swapspace. The
 	swapspace operations will be proxied to this address space's
-	->swap_{out,in} methods.
+	->{read,write}_iter methods with O_DIRECT.
 
   swap_deactivate: Called during swapoff on files where swap_activate
 	was successful.
-- 
2.2.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/5] direct-io: don't dirty ITER_BVEC pages on read
  2014-12-20  3:18 ` [PATCH v2 2/5] direct-io: don't dirty ITER_BVEC pages on read Omar Sandoval
@ 2014-12-20  6:01   ` Al Viro
  2014-12-22  7:12     ` Omar Sandoval
  0 siblings, 1 reply; 14+ messages in thread
From: Al Viro @ 2014-12-20  6:01 UTC (permalink / raw)
  To: Omar Sandoval
  Cc: Andrew Morton, Trond Myklebust, Christoph Hellwig, linux-fsdevel,
	linux-mm, linux-nfs, linux-kernel, Ming Lei

On Fri, Dec 19, 2014 at 07:18:26PM -0800, Omar Sandoval wrote:
> Reads through the iov_iter infrastructure for kernel pages shouldn't be
> dirtied by the direct I/O code.
> 
> This is based on Dave Kleikamp's and Ming Lei's previously posted
> patches.

Umm...  

> +	dio->should_dirty = !iov_iter_is_bvec(iter);

	dio->should_dirty = iter_is_iovec(iter);

perhaps?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 4/5] swapfile: use ->read_iter and ->write_iter
  2014-12-20  3:18 ` [PATCH v2 4/5] swapfile: use ->read_iter and ->write_iter Omar Sandoval
@ 2014-12-20  6:13   ` Al Viro
  2014-12-22  7:32     ` Omar Sandoval
  0 siblings, 1 reply; 14+ messages in thread
From: Al Viro @ 2014-12-20  6:13 UTC (permalink / raw)
  To: Omar Sandoval
  Cc: Andrew Morton, Trond Myklebust, Christoph Hellwig, linux-fsdevel,
	linux-mm, linux-nfs, linux-kernel, Mel Gorman

On Fri, Dec 19, 2014 at 07:18:28PM -0800, Omar Sandoval wrote:

> +		ret = swap_file->f_op->read_iter(&kiocb, &to);
> +		if (ret == PAGE_SIZE) {
> +			SetPageUptodate(page);
>  			count_vm_event(PSWPIN);
> +			ret = 0;
> +		} else {
> +			ClearPageUptodate(page);
> +			SetPageError(page);
> +		}
> +		unlock_page(page);

Umm...  What's to guarantee that ->read_iter() won't try lock_page() on what
turns out to be equal to page?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/5] direct-io: don't dirty ITER_BVEC pages on read
  2014-12-20  6:01   ` Al Viro
@ 2014-12-22  7:12     ` Omar Sandoval
  0 siblings, 0 replies; 14+ messages in thread
From: Omar Sandoval @ 2014-12-22  7:12 UTC (permalink / raw)
  To: Al Viro
  Cc: Andrew Morton, Trond Myklebust, Christoph Hellwig, linux-fsdevel,
	linux-mm, linux-nfs, linux-kernel, Ming Lei

On Sat, Dec 20, 2014 at 06:01:30AM +0000, Al Viro wrote:
> On Fri, Dec 19, 2014 at 07:18:26PM -0800, Omar Sandoval wrote:
> > Reads through the iov_iter infrastructure for kernel pages shouldn't be
> > dirtied by the direct I/O code.
> > 
> > This is based on Dave Kleikamp's and Ming Lei's previously posted
> > patches.
> 
> Umm...  
> 
> > +	dio->should_dirty = !iov_iter_is_bvec(iter);
> 
> 	dio->should_dirty = iter_is_iovec(iter);
> 
> perhaps?

Mm, yeah, I'll do that. That helper snuck in without me noticing it... I
see that we can't do iov_iter_get_pages on an ITER_KVEC, so a kvec
doesn't work for blockdev_direct_IO anyways, right?

-- 
Omar

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 4/5] swapfile: use ->read_iter and ->write_iter
  2014-12-20  6:13   ` Al Viro
@ 2014-12-22  7:32     ` Omar Sandoval
  0 siblings, 0 replies; 14+ messages in thread
From: Omar Sandoval @ 2014-12-22  7:32 UTC (permalink / raw)
  To: Al Viro
  Cc: Andrew Morton, Trond Myklebust, Christoph Hellwig, linux-fsdevel,
	linux-mm, linux-nfs, linux-kernel, Mel Gorman

On Sat, Dec 20, 2014 at 06:13:37AM +0000, Al Viro wrote:
> On Fri, Dec 19, 2014 at 07:18:28PM -0800, Omar Sandoval wrote:
> 
> > +		ret = swap_file->f_op->read_iter(&kiocb, &to);
> > +		if (ret == PAGE_SIZE) {
> > +			SetPageUptodate(page);
> >  			count_vm_event(PSWPIN);
> > +			ret = 0;
> > +		} else {
> > +			ClearPageUptodate(page);
> > +			SetPageError(page);
> > +		}
> > +		unlock_page(page);
> 
> Umm...  What's to guarantee that ->read_iter() won't try lock_page() on what
> turns out to be equal to page?

Ergh. I don't see why ->read_iter would be screwing around in the swap
cache or with the pages in the iterator, anything in particular you can
see happening?

-- 
Omar

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 3/5] nfs: don't dirty ITER_BVEC pages read through direct I/O
  2014-12-20  3:18 ` [PATCH v2 3/5] nfs: don't dirty ITER_BVEC pages read through direct I/O Omar Sandoval
@ 2015-01-05 14:41   ` Anna Schumaker
  2015-01-08  9:25     ` Omar Sandoval
  0 siblings, 1 reply; 14+ messages in thread
From: Anna Schumaker @ 2015-01-05 14:41 UTC (permalink / raw)
  To: Omar Sandoval, Alexander Viro, Andrew Morton, Trond Myklebust,
	Christoph Hellwig, linux-fsdevel, linux-mm, linux-nfs,
	linux-kernel

Hi Omar,

On 12/19/2014 10:18 PM, Omar Sandoval wrote:
> As with the generic blockdev code, kernel pages shouldn't be dirtied by
> the direct I/O path.
> 
> Signed-off-by: Omar Sandoval <osandov@osandov.com>
> ---
>  fs/nfs/direct.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
> index 10bf072..b6ca65c 100644
> --- a/fs/nfs/direct.c
> +++ b/fs/nfs/direct.c
> @@ -88,6 +88,7 @@ struct nfs_direct_req {
>  	struct pnfs_ds_commit_info ds_cinfo;	/* Storage for cinfo */
>  	struct work_struct	work;
>  	int			flags;
> +	int			should_dirty;	/* should we mark read pages dirty? */
>  #define NFS_ODIRECT_DO_COMMIT		(1)	/* an unstable reply was received */
>  #define NFS_ODIRECT_RESCHED_WRITES	(2)	/* write verification failed */
>  	struct nfs_writeverf	verf;		/* unstable write verifier */

Can you add should_dirty after the NFS_ODIRECT_* flags?

Thanks,
Anna

> @@ -370,7 +371,8 @@ static void nfs_direct_read_completion(struct nfs_pgio_header *hdr)
>  		struct nfs_page *req = nfs_list_entry(hdr->pages.next);
>  		struct page *page = req->wb_page;
>  
> -		if (!PageCompound(page) && bytes < hdr->good_bytes)
> +		if (!PageCompound(page) && bytes < hdr->good_bytes &&
> +		    dreq->should_dirty)
>  			set_page_dirty(page);
>  		bytes += req->wb_bytes;
>  		nfs_list_remove_request(req);
> @@ -542,6 +544,7 @@ ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter,
>  	dreq->inode = inode;
>  	dreq->bytes_left = count;
>  	dreq->ctx = get_nfs_open_context(nfs_file_open_context(iocb->ki_filp));
> +	dreq->should_dirty = !iov_iter_is_bvec(iter);
>  	l_ctx = nfs_get_lock_context(dreq->ctx);
>  	if (IS_ERR(l_ctx)) {
>  		result = PTR_ERR(l_ctx);
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 3/5] nfs: don't dirty ITER_BVEC pages read through direct I/O
  2015-01-05 14:41   ` Anna Schumaker
@ 2015-01-08  9:25     ` Omar Sandoval
  0 siblings, 0 replies; 14+ messages in thread
From: Omar Sandoval @ 2015-01-08  9:25 UTC (permalink / raw)
  To: Anna Schumaker
  Cc: Alexander Viro, Andrew Morton, Trond Myklebust,
	Christoph Hellwig, linux-fsdevel, linux-mm, linux-nfs,
	linux-kernel

On Mon, Jan 05, 2015 at 09:41:00AM -0500, Anna Schumaker wrote:
> Hi Omar,
> 
> On 12/19/2014 10:18 PM, Omar Sandoval wrote:
> > As with the generic blockdev code, kernel pages shouldn't be dirtied by
> > the direct I/O path.
> > 
> > Signed-off-by: Omar Sandoval <osandov@osandov.com>
> > ---
> >  fs/nfs/direct.c | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
> > index 10bf072..b6ca65c 100644
> > --- a/fs/nfs/direct.c
> > +++ b/fs/nfs/direct.c
> > @@ -88,6 +88,7 @@ struct nfs_direct_req {
> >  	struct pnfs_ds_commit_info ds_cinfo;	/* Storage for cinfo */
> >  	struct work_struct	work;
> >  	int			flags;
> > +	int			should_dirty;	/* should we mark read pages dirty? */
> >  #define NFS_ODIRECT_DO_COMMIT		(1)	/* an unstable reply was received */
> >  #define NFS_ODIRECT_RESCHED_WRITES	(2)	/* write verification failed */
> >  	struct nfs_writeverf	verf;		/* unstable write verifier */
> 
> Can you add should_dirty after the NFS_ODIRECT_* flags?
> 
> Thanks,
> Anna
> 
> > @@ -370,7 +371,8 @@ static void nfs_direct_read_completion(struct nfs_pgio_header *hdr)
> >  		struct nfs_page *req = nfs_list_entry(hdr->pages.next);
> >  		struct page *page = req->wb_page;
> >  
> > -		if (!PageCompound(page) && bytes < hdr->good_bytes)
> > +		if (!PageCompound(page) && bytes < hdr->good_bytes &&
> > +		    dreq->should_dirty)
> >  			set_page_dirty(page);
> >  		bytes += req->wb_bytes;
> >  		nfs_list_remove_request(req);
> > @@ -542,6 +544,7 @@ ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter,
> >  	dreq->inode = inode;
> >  	dreq->bytes_left = count;
> >  	dreq->ctx = get_nfs_open_context(nfs_file_open_context(iocb->ki_filp));
> > +	dreq->should_dirty = !iov_iter_is_bvec(iter);
> >  	l_ctx = nfs_get_lock_context(dreq->ctx);
> >  	if (IS_ERR(l_ctx)) {
> >  		result = PTR_ERR(l_ctx);
> > 
> 

Thanks, Anna, I'll fix that.

-- 
Omar

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 0/5] clean up and generalize swap-over-NFS
  2014-12-20  3:18 [PATCH v2 0/5] clean up and generalize swap-over-NFS Omar Sandoval
                   ` (4 preceding siblings ...)
  2014-12-20  3:18 ` [PATCH v2 5/5] vfs: update swap_{,de}activate documentation Omar Sandoval
@ 2015-01-14  3:18 ` Omar Sandoval
  2015-01-21 19:14   ` Omar Sandoval
  5 siblings, 1 reply; 14+ messages in thread
From: Omar Sandoval @ 2015-01-14  3:18 UTC (permalink / raw)
  To: Alexander Viro, Andrew Morton, Trond Myklebust,
	Christoph Hellwig, linux-fsdevel, linux-mm, linux-nfs,
	linux-kernel

On Fri, Dec 19, 2014 at 07:18:24PM -0800, Omar Sandoval wrote:
> Hi,
> 
> This patch series (based on ecb5ec0 in Linus' tree) contains all of the
> non-BTRFS work that I've done to implement swapfiles on BTRFS. The BTRFS
> portion is still undergoing development and is now outweighed by the
> non-BTRFS changes, so I want to get these in separately.
> 
> Version 2 changes the generic swapfile interface to use ->read_iter and
> ->write_iter instead of using ->direct_IO directly in response to
> discussion on the previous submission. It also adds the iov_iter_is_bvec
> helper to factor out some common checks.
> 
> Version 1 can be found here: https://lkml.org/lkml/2014/12/15/7
> 
> Omar Sandoval (5):
>   iov_iter: add ITER_BVEC helpers
>   direct-io: don't dirty ITER_BVEC pages on read
>   nfs: don't dirty ITER_BVEC pages read through direct I/O
>   swapfile: use ->read_iter and ->write_iter
>   vfs: update swap_{,de}activate documentation
> 
>  Documentation/filesystems/Locking |  7 ++++---
>  Documentation/filesystems/vfs.txt |  7 ++++---
>  fs/direct-io.c                    |  8 ++++---
>  fs/nfs/direct.c                   |  5 ++++-
>  fs/splice.c                       |  7 ++-----
>  include/linux/uio.h               |  7 +++++++
>  mm/iov_iter.c                     | 12 +++++++++++
>  mm/page_io.c                      | 44 +++++++++++++++++++++++++--------------
>  mm/swapfile.c                     | 11 +++++++++-
>  9 files changed, 76 insertions(+), 32 deletions(-)
> 
> -- 
> 2.2.1
> 

Hi, everyone,

Thanks for all of the feedback on the last few iterations of this
series. If it's alright, I'd like to revive the conversation around
these patches.

There are a couple of issues which we were discussing before the
holidays:

One concern that Al mentioned was ->read_iter and ->write_iter falling
back to the buffered I/O case. Like Christoph mentioned, this can be
prevented by doing the proper checks on the filesystem side (usually
just making sure that all blocks of a swapfile are allocated, but on
BTRFS, for example, we also have to check for compressed extents).

The other concern which Al brought up was that ->read_iter is passed a
locked page in the iter_bvec and could end up trying to lock it. I'm not
too sure under what conditions that would happen -- could someone give
an example? My intuition is that there's no path which will lead us to
deadlock on a page in the swapcache, but I don't have anything solid to
back that up.

Thanks!
-- 
Omar

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 0/5] clean up and generalize swap-over-NFS
  2015-01-14  3:18 ` [PATCH v2 0/5] clean up and generalize swap-over-NFS Omar Sandoval
@ 2015-01-21 19:14   ` Omar Sandoval
  0 siblings, 0 replies; 14+ messages in thread
From: Omar Sandoval @ 2015-01-21 19:14 UTC (permalink / raw)
  To: Alexander Viro, Andrew Morton, Trond Myklebust,
	Christoph Hellwig, linux-fsdevel, linux-mm, linux-nfs,
	linux-kernel

On Tue, Jan 13, 2015 at 07:18:36PM -0800, Omar Sandoval wrote:
> On Fri, Dec 19, 2014 at 07:18:24PM -0800, Omar Sandoval wrote:
> > Hi,
> > 
> > This patch series (based on ecb5ec0 in Linus' tree) contains all of the
> > non-BTRFS work that I've done to implement swapfiles on BTRFS. The BTRFS
> > portion is still undergoing development and is now outweighed by the
> > non-BTRFS changes, so I want to get these in separately.
> > 
> > Version 2 changes the generic swapfile interface to use ->read_iter and
> > ->write_iter instead of using ->direct_IO directly in response to
> > discussion on the previous submission. It also adds the iov_iter_is_bvec
> > helper to factor out some common checks.
> > 
> > Version 1 can be found here: https://lkml.org/lkml/2014/12/15/7
> > 
> > Omar Sandoval (5):
> >   iov_iter: add ITER_BVEC helpers
> >   direct-io: don't dirty ITER_BVEC pages on read
> >   nfs: don't dirty ITER_BVEC pages read through direct I/O
> >   swapfile: use ->read_iter and ->write_iter
> >   vfs: update swap_{,de}activate documentation
> > 
> >  Documentation/filesystems/Locking |  7 ++++---
> >  Documentation/filesystems/vfs.txt |  7 ++++---
> >  fs/direct-io.c                    |  8 ++++---
> >  fs/nfs/direct.c                   |  5 ++++-
> >  fs/splice.c                       |  7 ++-----
> >  include/linux/uio.h               |  7 +++++++
> >  mm/iov_iter.c                     | 12 +++++++++++
> >  mm/page_io.c                      | 44 +++++++++++++++++++++++++--------------
> >  mm/swapfile.c                     | 11 +++++++++-
> >  9 files changed, 76 insertions(+), 32 deletions(-)
> > 
> > -- 
> > 2.2.1
> > 
> 
> Hi, everyone,
> 
> Thanks for all of the feedback on the last few iterations of this
> series. If it's alright, I'd like to revive the conversation around
> these patches.
> 
> There are a couple of issues which we were discussing before the
> holidays:
> 
> One concern that Al mentioned was ->read_iter and ->write_iter falling
> back to the buffered I/O case. Like Christoph mentioned, this can be
> prevented by doing the proper checks on the filesystem side (usually
> just making sure that all blocks of a swapfile are allocated, but on
> BTRFS, for example, we also have to check for compressed extents).
> 
> The other concern which Al brought up was that ->read_iter is passed a
> locked page in the iter_bvec and could end up trying to lock it. I'm not
> too sure under what conditions that would happen -- could someone give
> an example? My intuition is that there's no path which will lead us to
> deadlock on a page in the swapcache, but I don't have anything solid to
> back that up.
> 
> Thanks!
> -- 
> Omar

Hi,

Any updates on this?

Thanks,
-- 
Omar

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2015-01-21 19:14 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-12-20  3:18 [PATCH v2 0/5] clean up and generalize swap-over-NFS Omar Sandoval
2014-12-20  3:18 ` [PATCH v2 1/5] iov_iter: add ITER_BVEC helpers Omar Sandoval
2014-12-20  3:18 ` [PATCH v2 2/5] direct-io: don't dirty ITER_BVEC pages on read Omar Sandoval
2014-12-20  6:01   ` Al Viro
2014-12-22  7:12     ` Omar Sandoval
2014-12-20  3:18 ` [PATCH v2 3/5] nfs: don't dirty ITER_BVEC pages read through direct I/O Omar Sandoval
2015-01-05 14:41   ` Anna Schumaker
2015-01-08  9:25     ` Omar Sandoval
2014-12-20  3:18 ` [PATCH v2 4/5] swapfile: use ->read_iter and ->write_iter Omar Sandoval
2014-12-20  6:13   ` Al Viro
2014-12-22  7:32     ` Omar Sandoval
2014-12-20  3:18 ` [PATCH v2 5/5] vfs: update swap_{,de}activate documentation Omar Sandoval
2015-01-14  3:18 ` [PATCH v2 0/5] clean up and generalize swap-over-NFS Omar Sandoval
2015-01-21 19:14   ` Omar Sandoval

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).