* [PATCH v2 0/5] clean up and generalize swap-over-NFS
@ 2014-12-20 3:18 Omar Sandoval
2014-12-20 3:18 ` [PATCH v2 1/5] iov_iter: add ITER_BVEC helpers Omar Sandoval
` (5 more replies)
0 siblings, 6 replies; 14+ messages in thread
From: Omar Sandoval @ 2014-12-20 3:18 UTC (permalink / raw)
To: Alexander Viro, Andrew Morton, Trond Myklebust,
Christoph Hellwig, linux-fsdevel, linux-mm, linux-nfs,
linux-kernel
Cc: Omar Sandoval
Hi,
This patch series (based on ecb5ec0 in Linus' tree) contains all of the
non-BTRFS work that I've done to implement swapfiles on BTRFS. The BTRFS
portion is still undergoing development and is now outweighed by the
non-BTRFS changes, so I want to get these in separately.
Version 2 changes the generic swapfile interface to use ->read_iter and
->write_iter instead of using ->direct_IO directly in response to
discussion on the previous submission. It also adds the iov_iter_is_bvec
helper to factor out some common checks.
Version 1 can be found here: https://lkml.org/lkml/2014/12/15/7
Omar Sandoval (5):
iov_iter: add ITER_BVEC helpers
direct-io: don't dirty ITER_BVEC pages on read
nfs: don't dirty ITER_BVEC pages read through direct I/O
swapfile: use ->read_iter and ->write_iter
vfs: update swap_{,de}activate documentation
Documentation/filesystems/Locking | 7 ++++---
Documentation/filesystems/vfs.txt | 7 ++++---
fs/direct-io.c | 8 ++++---
fs/nfs/direct.c | 5 ++++-
fs/splice.c | 7 ++-----
include/linux/uio.h | 7 +++++++
mm/iov_iter.c | 12 +++++++++++
mm/page_io.c | 44 +++++++++++++++++++++++++--------------
mm/swapfile.c | 11 +++++++++-
9 files changed, 76 insertions(+), 32 deletions(-)
--
2.2.1
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v2 1/5] iov_iter: add ITER_BVEC helpers
2014-12-20 3:18 [PATCH v2 0/5] clean up and generalize swap-over-NFS Omar Sandoval
@ 2014-12-20 3:18 ` Omar Sandoval
2014-12-20 3:18 ` [PATCH v2 2/5] direct-io: don't dirty ITER_BVEC pages on read Omar Sandoval
` (4 subsequent siblings)
5 siblings, 0 replies; 14+ messages in thread
From: Omar Sandoval @ 2014-12-20 3:18 UTC (permalink / raw)
To: Alexander Viro, Andrew Morton, Trond Myklebust,
Christoph Hellwig, linux-fsdevel, linux-mm, linux-nfs,
linux-kernel
Cc: Omar Sandoval
Add iov_iter_is_bvec and iov_iter_bvec and convert callers.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
---
fs/splice.c | 7 ++-----
include/linux/uio.h | 7 +++++++
mm/iov_iter.c | 12 ++++++++++++
mm/page_io.c | 14 +++++---------
4 files changed, 26 insertions(+), 14 deletions(-)
diff --git a/fs/splice.c b/fs/splice.c
index 75c6058..7c7176f 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1006,11 +1006,8 @@ iter_file_splice_write(struct pipe_inode_info *pipe, struct file *out,
}
/* ... iov_iter */
- from.type = ITER_BVEC | WRITE;
- from.bvec = array;
- from.nr_segs = n;
- from.count = sd.total_len - left;
- from.iov_offset = 0;
+ iov_iter_bvec(&from, ITER_BVEC | WRITE, array, n,
+ sd.total_len - left);
/* ... and iocb */
init_sync_kiocb(&kiocb, out);
diff --git a/include/linux/uio.h b/include/linux/uio.h
index 1c5e453..23bb5e4 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -63,6 +63,11 @@ static inline struct iovec iov_iter_iovec(const struct iov_iter *iter)
};
}
+static inline int iov_iter_is_bvec(const struct iov_iter *iter)
+{
+ return (iter->type & ITER_BVEC) != 0;
+}
+
#define iov_for_each(iov, iter, start) \
if (!((start).type & ITER_BVEC)) \
for (iter = (start); \
@@ -90,6 +95,8 @@ void iov_iter_init(struct iov_iter *i, int direction, const struct iovec *iov,
unsigned long nr_segs, size_t count);
void iov_iter_kvec(struct iov_iter *i, int direction, const struct kvec *iov,
unsigned long nr_segs, size_t count);
+void iov_iter_bvec(struct iov_iter *i, int direction, const struct bio_vec *bv,
+ unsigned long nr_segs, size_t count);
ssize_t iov_iter_get_pages(struct iov_iter *i, struct page **pages,
size_t maxsize, unsigned maxpages, size_t *start);
ssize_t iov_iter_get_pages_alloc(struct iov_iter *i, struct page ***pages,
diff --git a/mm/iov_iter.c b/mm/iov_iter.c
index a1599ca..c975bc4 100644
--- a/mm/iov_iter.c
+++ b/mm/iov_iter.c
@@ -513,6 +513,18 @@ void iov_iter_kvec(struct iov_iter *i, int direction,
}
EXPORT_SYMBOL(iov_iter_kvec);
+void iov_iter_bvec(struct iov_iter *i, int direction, const struct bio_vec *bv,
+ unsigned long nr_segs, size_t count)
+{
+ BUG_ON(!(direction & ITER_BVEC));
+ i->type = direction;
+ i->bvec = bv;
+ i->nr_segs = nr_segs;
+ i->iov_offset = 0;
+ i->count = count;
+}
+EXPORT_SYMBOL(iov_iter_bvec);
+
unsigned long iov_iter_alignment(const struct iov_iter *i)
{
unsigned long res = 0;
diff --git a/mm/page_io.c b/mm/page_io.c
index 955db8b..532a39b 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -264,18 +264,14 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc,
struct kiocb kiocb;
struct file *swap_file = sis->swap_file;
struct address_space *mapping = swap_file->f_mapping;
+ struct iov_iter from;
struct bio_vec bv = {
.bv_page = page,
- .bv_len = PAGE_SIZE,
- .bv_offset = 0
+ .bv_len = PAGE_SIZE,
+ .bv_offset = 0,
};
- struct iov_iter from = {
- .type = ITER_BVEC | WRITE,
- .count = PAGE_SIZE,
- .iov_offset = 0,
- .nr_segs = 1,
- };
- from.bvec = &bv; /* older gcc versions are broken */
+
+ iov_iter_bvec(&from, ITER_BVEC | WRITE, &bv, 1, PAGE_SIZE);
init_sync_kiocb(&kiocb, swap_file);
kiocb.ki_pos = page_file_offset(page);
--
2.2.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v2 2/5] direct-io: don't dirty ITER_BVEC pages on read
2014-12-20 3:18 [PATCH v2 0/5] clean up and generalize swap-over-NFS Omar Sandoval
2014-12-20 3:18 ` [PATCH v2 1/5] iov_iter: add ITER_BVEC helpers Omar Sandoval
@ 2014-12-20 3:18 ` Omar Sandoval
2014-12-20 6:01 ` Al Viro
2014-12-20 3:18 ` [PATCH v2 3/5] nfs: don't dirty ITER_BVEC pages read through direct I/O Omar Sandoval
` (3 subsequent siblings)
5 siblings, 1 reply; 14+ messages in thread
From: Omar Sandoval @ 2014-12-20 3:18 UTC (permalink / raw)
To: Alexander Viro, Andrew Morton, Trond Myklebust,
Christoph Hellwig, linux-fsdevel, linux-mm, linux-nfs,
linux-kernel
Cc: Omar Sandoval, Ming Lei
Reads through the iov_iter infrastructure for kernel pages shouldn't be
dirtied by the direct I/O code.
This is based on Dave Kleikamp's and Ming Lei's previously posted
patches.
Cc: Ming Lei <ming.lei@canonical.com>
Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: Omar Sandoval <osandov@osandov.com>
---
fs/direct-io.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/fs/direct-io.c b/fs/direct-io.c
index e181b6b..c71387b 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -120,6 +120,7 @@ struct dio {
spinlock_t bio_lock; /* protects BIO fields below */
int page_errors; /* errno from get_user_pages() */
int is_async; /* is IO async ? */
+ int should_dirty; /* should we mark read pages dirty? */
bool defer_completion; /* defer AIO completion to workqueue? */
int io_error; /* IO error in completion path */
unsigned long refcount; /* direct_io_worker() and bios */
@@ -392,7 +393,7 @@ static inline void dio_bio_submit(struct dio *dio, struct dio_submit *sdio)
dio->refcount++;
spin_unlock_irqrestore(&dio->bio_lock, flags);
- if (dio->is_async && dio->rw == READ)
+ if (dio->is_async && dio->rw == READ && dio->should_dirty)
bio_set_pages_dirty(bio);
if (sdio->submit_io)
@@ -463,13 +464,13 @@ static int dio_bio_complete(struct dio *dio, struct bio *bio)
if (!uptodate)
dio->io_error = -EIO;
- if (dio->is_async && dio->rw == READ) {
+ if (dio->is_async && dio->rw == READ && dio->should_dirty) {
bio_check_pages_dirty(bio); /* transfers ownership */
} else {
bio_for_each_segment_all(bvec, bio, i) {
struct page *page = bvec->bv_page;
- if (dio->rw == READ && !PageCompound(page))
+ if (dio->rw == READ && !PageCompound(page) && dio->should_dirty)
set_page_dirty_lock(page);
page_cache_release(page);
}
@@ -1177,6 +1178,7 @@ do_blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
dio->inode = inode;
dio->rw = rw;
+ dio->should_dirty = !iov_iter_is_bvec(iter);
/*
* For AIO O_(D)SYNC writes we need to defer completions to a workqueue
--
2.2.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v2 3/5] nfs: don't dirty ITER_BVEC pages read through direct I/O
2014-12-20 3:18 [PATCH v2 0/5] clean up and generalize swap-over-NFS Omar Sandoval
2014-12-20 3:18 ` [PATCH v2 1/5] iov_iter: add ITER_BVEC helpers Omar Sandoval
2014-12-20 3:18 ` [PATCH v2 2/5] direct-io: don't dirty ITER_BVEC pages on read Omar Sandoval
@ 2014-12-20 3:18 ` Omar Sandoval
2015-01-05 14:41 ` Anna Schumaker
2014-12-20 3:18 ` [PATCH v2 4/5] swapfile: use ->read_iter and ->write_iter Omar Sandoval
` (2 subsequent siblings)
5 siblings, 1 reply; 14+ messages in thread
From: Omar Sandoval @ 2014-12-20 3:18 UTC (permalink / raw)
To: Alexander Viro, Andrew Morton, Trond Myklebust,
Christoph Hellwig, linux-fsdevel, linux-mm, linux-nfs,
linux-kernel
Cc: Omar Sandoval
As with the generic blockdev code, kernel pages shouldn't be dirtied by
the direct I/O path.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
---
fs/nfs/direct.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 10bf072..b6ca65c 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -88,6 +88,7 @@ struct nfs_direct_req {
struct pnfs_ds_commit_info ds_cinfo; /* Storage for cinfo */
struct work_struct work;
int flags;
+ int should_dirty; /* should we mark read pages dirty? */
#define NFS_ODIRECT_DO_COMMIT (1) /* an unstable reply was received */
#define NFS_ODIRECT_RESCHED_WRITES (2) /* write verification failed */
struct nfs_writeverf verf; /* unstable write verifier */
@@ -370,7 +371,8 @@ static void nfs_direct_read_completion(struct nfs_pgio_header *hdr)
struct nfs_page *req = nfs_list_entry(hdr->pages.next);
struct page *page = req->wb_page;
- if (!PageCompound(page) && bytes < hdr->good_bytes)
+ if (!PageCompound(page) && bytes < hdr->good_bytes &&
+ dreq->should_dirty)
set_page_dirty(page);
bytes += req->wb_bytes;
nfs_list_remove_request(req);
@@ -542,6 +544,7 @@ ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter,
dreq->inode = inode;
dreq->bytes_left = count;
dreq->ctx = get_nfs_open_context(nfs_file_open_context(iocb->ki_filp));
+ dreq->should_dirty = !iov_iter_is_bvec(iter);
l_ctx = nfs_get_lock_context(dreq->ctx);
if (IS_ERR(l_ctx)) {
result = PTR_ERR(l_ctx);
--
2.2.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v2 4/5] swapfile: use ->read_iter and ->write_iter
2014-12-20 3:18 [PATCH v2 0/5] clean up and generalize swap-over-NFS Omar Sandoval
` (2 preceding siblings ...)
2014-12-20 3:18 ` [PATCH v2 3/5] nfs: don't dirty ITER_BVEC pages read through direct I/O Omar Sandoval
@ 2014-12-20 3:18 ` Omar Sandoval
2014-12-20 6:13 ` Al Viro
2014-12-20 3:18 ` [PATCH v2 5/5] vfs: update swap_{,de}activate documentation Omar Sandoval
2015-01-14 3:18 ` [PATCH v2 0/5] clean up and generalize swap-over-NFS Omar Sandoval
5 siblings, 1 reply; 14+ messages in thread
From: Omar Sandoval @ 2014-12-20 3:18 UTC (permalink / raw)
To: Alexander Viro, Andrew Morton, Trond Myklebust,
Christoph Hellwig, linux-fsdevel, linux-mm, linux-nfs,
linux-kernel
Cc: Omar Sandoval, Mel Gorman
Using ->direct_IO and ->readpage for the generic swap file
infrastructure requires all sorts of nasty workarounds. ->readpage
implementations don't play nicely with swap cache pages, and ->direct_IO
implementations have different locking conventions for every filesystem.
Instead, use ->read_iter/->write_iter with an ITER_BVEC and let the
filesystem take care of it. This will also allow us to easily transition
to kernel AIO if that gets merged in the future.
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Omar Sandoval <osandov@osandov.com>
---
mm/page_io.c | 30 +++++++++++++++++++++++-------
mm/swapfile.c | 11 ++++++++++-
2 files changed, 33 insertions(+), 8 deletions(-)
diff --git a/mm/page_io.c b/mm/page_io.c
index 532a39b..61165b0 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -263,7 +263,6 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc,
if (sis->flags & SWP_FILE) {
struct kiocb kiocb;
struct file *swap_file = sis->swap_file;
- struct address_space *mapping = swap_file->f_mapping;
struct iov_iter from;
struct bio_vec bv = {
.bv_page = page,
@@ -279,9 +278,7 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc,
set_page_writeback(page);
unlock_page(page);
- ret = mapping->a_ops->direct_IO(ITER_BVEC | WRITE,
- &kiocb, &from,
- kiocb.ki_pos);
+ ret = swap_file->f_op->write_iter(&kiocb, &from);
if (ret == PAGE_SIZE) {
count_vm_event(PSWPOUT);
ret = 0;
@@ -344,12 +341,31 @@ int swap_readpage(struct page *page)
}
if (sis->flags & SWP_FILE) {
+ struct kiocb kiocb;
struct file *swap_file = sis->swap_file;
- struct address_space *mapping = swap_file->f_mapping;
+ struct iov_iter to;
+ struct bio_vec bv = {
+ .bv_page = page,
+ .bv_len = PAGE_SIZE,
+ .bv_offset = 0,
+ };
+
+ iov_iter_bvec(&to, ITER_BVEC | READ, &bv, 1, PAGE_SIZE);
+
+ init_sync_kiocb(&kiocb, swap_file);
+ kiocb.ki_pos = page_file_offset(page);
+ kiocb.ki_nbytes = PAGE_SIZE;
- ret = mapping->a_ops->readpage(swap_file, page);
- if (!ret)
+ ret = swap_file->f_op->read_iter(&kiocb, &to);
+ if (ret == PAGE_SIZE) {
+ SetPageUptodate(page);
count_vm_event(PSWPIN);
+ ret = 0;
+ } else {
+ ClearPageUptodate(page);
+ SetPageError(page);
+ }
+ unlock_page(page);
return ret;
}
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 63f55cc..4e14122 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2379,7 +2379,16 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
name = NULL;
goto bad_swap;
}
- swap_file = file_open_name(name, O_RDWR|O_LARGEFILE, 0);
+ swap_file = file_open_name(name, O_RDWR | O_LARGEFILE | O_DIRECT, 0);
+ if (swap_file == ERR_PTR(-EINVAL)) {
+ /*
+ * XXX: there are several filesystems that implement ->bmap but
+ * not ->direct_IO. It's unlikely that anyone is using a
+ * swapfile on, e.g., the MINIX fs, but this kludge will keep us
+ * from getting a complaint from the one person who does.
+ */
+ swap_file = file_open_name(name, O_RDWR | O_LARGEFILE, 0);
+ }
if (IS_ERR(swap_file)) {
error = PTR_ERR(swap_file);
swap_file = NULL;
--
2.2.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v2 5/5] vfs: update swap_{,de}activate documentation
2014-12-20 3:18 [PATCH v2 0/5] clean up and generalize swap-over-NFS Omar Sandoval
` (3 preceding siblings ...)
2014-12-20 3:18 ` [PATCH v2 4/5] swapfile: use ->read_iter and ->write_iter Omar Sandoval
@ 2014-12-20 3:18 ` Omar Sandoval
2015-01-14 3:18 ` [PATCH v2 0/5] clean up and generalize swap-over-NFS Omar Sandoval
5 siblings, 0 replies; 14+ messages in thread
From: Omar Sandoval @ 2014-12-20 3:18 UTC (permalink / raw)
To: Alexander Viro, Andrew Morton, Trond Myklebust,
Christoph Hellwig, linux-fsdevel, linux-mm, linux-nfs,
linux-kernel
Cc: Omar Sandoval
Parameters were added to swap_activate in the same patch series that
introduced it without updating the documentation. Additionally, the
documentation claims that non-existent address space operations
->swap_{in,out} are used for swap I/O, but now we use
->{read,write}_iter.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
---
Documentation/filesystems/Locking | 7 ++++---
Documentation/filesystems/vfs.txt | 7 ++++---
2 files changed, 8 insertions(+), 6 deletions(-)
diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index b30753c..e72b4c3 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -205,7 +205,8 @@ prototypes:
int (*launder_page)(struct page *);
int (*is_partially_uptodate)(struct page *, unsigned long, unsigned long);
int (*error_remove_page)(struct address_space *, struct page *);
- int (*swap_activate)(struct file *);
+ int (*swap_activate)(struct swap_info_struct *, struct file *,
+ sector_t *);
int (*swap_deactivate)(struct file *);
locking rules:
@@ -230,8 +231,8 @@ migratepage: yes (both)
launder_page: yes
is_partially_uptodate: yes
error_remove_page: yes
-swap_activate: no
-swap_deactivate: no
+swap_activate: yes
+swap_deactivate: no
->write_begin(), ->write_end(), ->sync_page() and ->readpage()
may be called from the request handler (/dev/loop).
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 43ce050..9c793a7 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -600,8 +600,9 @@ struct address_space_operations {
unsigned long);
void (*is_dirty_writeback) (struct page *, bool *, bool *);
int (*error_remove_page) (struct mapping *mapping, struct page *page);
- int (*swap_activate)(struct file *);
- int (*swap_deactivate)(struct file *);
+ int (*swap_activate)(struct swap_info_struct *, struct file *,
+ sector_t *);
+ void (*swap_deactivate)(struct file *);
};
writepage: called by the VM to write a dirty page to backing store.
@@ -788,7 +789,7 @@ struct address_space_operations {
memory. A return value of zero indicates success,
in which case this file can be used to back swapspace. The
swapspace operations will be proxied to this address space's
- ->swap_{out,in} methods.
+ ->{read,write}_iter methods with O_DIRECT.
swap_deactivate: Called during swapoff on files where swap_activate
was successful.
--
2.2.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH v2 2/5] direct-io: don't dirty ITER_BVEC pages on read
2014-12-20 3:18 ` [PATCH v2 2/5] direct-io: don't dirty ITER_BVEC pages on read Omar Sandoval
@ 2014-12-20 6:01 ` Al Viro
2014-12-22 7:12 ` Omar Sandoval
0 siblings, 1 reply; 14+ messages in thread
From: Al Viro @ 2014-12-20 6:01 UTC (permalink / raw)
To: Omar Sandoval
Cc: Andrew Morton, Trond Myklebust, Christoph Hellwig, linux-fsdevel,
linux-mm, linux-nfs, linux-kernel, Ming Lei
On Fri, Dec 19, 2014 at 07:18:26PM -0800, Omar Sandoval wrote:
> Reads through the iov_iter infrastructure for kernel pages shouldn't be
> dirtied by the direct I/O code.
>
> This is based on Dave Kleikamp's and Ming Lei's previously posted
> patches.
Umm...
> + dio->should_dirty = !iov_iter_is_bvec(iter);
dio->should_dirty = iter_is_iovec(iter);
perhaps?
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 4/5] swapfile: use ->read_iter and ->write_iter
2014-12-20 3:18 ` [PATCH v2 4/5] swapfile: use ->read_iter and ->write_iter Omar Sandoval
@ 2014-12-20 6:13 ` Al Viro
2014-12-22 7:32 ` Omar Sandoval
0 siblings, 1 reply; 14+ messages in thread
From: Al Viro @ 2014-12-20 6:13 UTC (permalink / raw)
To: Omar Sandoval
Cc: Andrew Morton, Trond Myklebust, Christoph Hellwig, linux-fsdevel,
linux-mm, linux-nfs, linux-kernel, Mel Gorman
On Fri, Dec 19, 2014 at 07:18:28PM -0800, Omar Sandoval wrote:
> + ret = swap_file->f_op->read_iter(&kiocb, &to);
> + if (ret == PAGE_SIZE) {
> + SetPageUptodate(page);
> count_vm_event(PSWPIN);
> + ret = 0;
> + } else {
> + ClearPageUptodate(page);
> + SetPageError(page);
> + }
> + unlock_page(page);
Umm... What's to guarantee that ->read_iter() won't try lock_page() on what
turns out to be equal to page?
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 2/5] direct-io: don't dirty ITER_BVEC pages on read
2014-12-20 6:01 ` Al Viro
@ 2014-12-22 7:12 ` Omar Sandoval
0 siblings, 0 replies; 14+ messages in thread
From: Omar Sandoval @ 2014-12-22 7:12 UTC (permalink / raw)
To: Al Viro
Cc: Andrew Morton, Trond Myklebust, Christoph Hellwig, linux-fsdevel,
linux-mm, linux-nfs, linux-kernel, Ming Lei
On Sat, Dec 20, 2014 at 06:01:30AM +0000, Al Viro wrote:
> On Fri, Dec 19, 2014 at 07:18:26PM -0800, Omar Sandoval wrote:
> > Reads through the iov_iter infrastructure for kernel pages shouldn't be
> > dirtied by the direct I/O code.
> >
> > This is based on Dave Kleikamp's and Ming Lei's previously posted
> > patches.
>
> Umm...
>
> > + dio->should_dirty = !iov_iter_is_bvec(iter);
>
> dio->should_dirty = iter_is_iovec(iter);
>
> perhaps?
Mm, yeah, I'll do that. That helper snuck in without me noticing it... I
see that we can't do iov_iter_get_pages on an ITER_KVEC, so a kvec
doesn't work for blockdev_direct_IO anyways, right?
--
Omar
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 4/5] swapfile: use ->read_iter and ->write_iter
2014-12-20 6:13 ` Al Viro
@ 2014-12-22 7:32 ` Omar Sandoval
0 siblings, 0 replies; 14+ messages in thread
From: Omar Sandoval @ 2014-12-22 7:32 UTC (permalink / raw)
To: Al Viro
Cc: Andrew Morton, Trond Myklebust, Christoph Hellwig, linux-fsdevel,
linux-mm, linux-nfs, linux-kernel, Mel Gorman
On Sat, Dec 20, 2014 at 06:13:37AM +0000, Al Viro wrote:
> On Fri, Dec 19, 2014 at 07:18:28PM -0800, Omar Sandoval wrote:
>
> > + ret = swap_file->f_op->read_iter(&kiocb, &to);
> > + if (ret == PAGE_SIZE) {
> > + SetPageUptodate(page);
> > count_vm_event(PSWPIN);
> > + ret = 0;
> > + } else {
> > + ClearPageUptodate(page);
> > + SetPageError(page);
> > + }
> > + unlock_page(page);
>
> Umm... What's to guarantee that ->read_iter() won't try lock_page() on what
> turns out to be equal to page?
Ergh. I don't see why ->read_iter would be screwing around in the swap
cache or with the pages in the iterator, anything in particular you can
see happening?
--
Omar
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 3/5] nfs: don't dirty ITER_BVEC pages read through direct I/O
2014-12-20 3:18 ` [PATCH v2 3/5] nfs: don't dirty ITER_BVEC pages read through direct I/O Omar Sandoval
@ 2015-01-05 14:41 ` Anna Schumaker
2015-01-08 9:25 ` Omar Sandoval
0 siblings, 1 reply; 14+ messages in thread
From: Anna Schumaker @ 2015-01-05 14:41 UTC (permalink / raw)
To: Omar Sandoval, Alexander Viro, Andrew Morton, Trond Myklebust,
Christoph Hellwig, linux-fsdevel, linux-mm, linux-nfs,
linux-kernel
Hi Omar,
On 12/19/2014 10:18 PM, Omar Sandoval wrote:
> As with the generic blockdev code, kernel pages shouldn't be dirtied by
> the direct I/O path.
>
> Signed-off-by: Omar Sandoval <osandov@osandov.com>
> ---
> fs/nfs/direct.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
> index 10bf072..b6ca65c 100644
> --- a/fs/nfs/direct.c
> +++ b/fs/nfs/direct.c
> @@ -88,6 +88,7 @@ struct nfs_direct_req {
> struct pnfs_ds_commit_info ds_cinfo; /* Storage for cinfo */
> struct work_struct work;
> int flags;
> + int should_dirty; /* should we mark read pages dirty? */
> #define NFS_ODIRECT_DO_COMMIT (1) /* an unstable reply was received */
> #define NFS_ODIRECT_RESCHED_WRITES (2) /* write verification failed */
> struct nfs_writeverf verf; /* unstable write verifier */
Can you add should_dirty after the NFS_ODIRECT_* flags?
Thanks,
Anna
> @@ -370,7 +371,8 @@ static void nfs_direct_read_completion(struct nfs_pgio_header *hdr)
> struct nfs_page *req = nfs_list_entry(hdr->pages.next);
> struct page *page = req->wb_page;
>
> - if (!PageCompound(page) && bytes < hdr->good_bytes)
> + if (!PageCompound(page) && bytes < hdr->good_bytes &&
> + dreq->should_dirty)
> set_page_dirty(page);
> bytes += req->wb_bytes;
> nfs_list_remove_request(req);
> @@ -542,6 +544,7 @@ ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter,
> dreq->inode = inode;
> dreq->bytes_left = count;
> dreq->ctx = get_nfs_open_context(nfs_file_open_context(iocb->ki_filp));
> + dreq->should_dirty = !iov_iter_is_bvec(iter);
> l_ctx = nfs_get_lock_context(dreq->ctx);
> if (IS_ERR(l_ctx)) {
> result = PTR_ERR(l_ctx);
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 3/5] nfs: don't dirty ITER_BVEC pages read through direct I/O
2015-01-05 14:41 ` Anna Schumaker
@ 2015-01-08 9:25 ` Omar Sandoval
0 siblings, 0 replies; 14+ messages in thread
From: Omar Sandoval @ 2015-01-08 9:25 UTC (permalink / raw)
To: Anna Schumaker
Cc: Alexander Viro, Andrew Morton, Trond Myklebust,
Christoph Hellwig, linux-fsdevel, linux-mm, linux-nfs,
linux-kernel
On Mon, Jan 05, 2015 at 09:41:00AM -0500, Anna Schumaker wrote:
> Hi Omar,
>
> On 12/19/2014 10:18 PM, Omar Sandoval wrote:
> > As with the generic blockdev code, kernel pages shouldn't be dirtied by
> > the direct I/O path.
> >
> > Signed-off-by: Omar Sandoval <osandov@osandov.com>
> > ---
> > fs/nfs/direct.c | 5 ++++-
> > 1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
> > index 10bf072..b6ca65c 100644
> > --- a/fs/nfs/direct.c
> > +++ b/fs/nfs/direct.c
> > @@ -88,6 +88,7 @@ struct nfs_direct_req {
> > struct pnfs_ds_commit_info ds_cinfo; /* Storage for cinfo */
> > struct work_struct work;
> > int flags;
> > + int should_dirty; /* should we mark read pages dirty? */
> > #define NFS_ODIRECT_DO_COMMIT (1) /* an unstable reply was received */
> > #define NFS_ODIRECT_RESCHED_WRITES (2) /* write verification failed */
> > struct nfs_writeverf verf; /* unstable write verifier */
>
> Can you add should_dirty after the NFS_ODIRECT_* flags?
>
> Thanks,
> Anna
>
> > @@ -370,7 +371,8 @@ static void nfs_direct_read_completion(struct nfs_pgio_header *hdr)
> > struct nfs_page *req = nfs_list_entry(hdr->pages.next);
> > struct page *page = req->wb_page;
> >
> > - if (!PageCompound(page) && bytes < hdr->good_bytes)
> > + if (!PageCompound(page) && bytes < hdr->good_bytes &&
> > + dreq->should_dirty)
> > set_page_dirty(page);
> > bytes += req->wb_bytes;
> > nfs_list_remove_request(req);
> > @@ -542,6 +544,7 @@ ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter,
> > dreq->inode = inode;
> > dreq->bytes_left = count;
> > dreq->ctx = get_nfs_open_context(nfs_file_open_context(iocb->ki_filp));
> > + dreq->should_dirty = !iov_iter_is_bvec(iter);
> > l_ctx = nfs_get_lock_context(dreq->ctx);
> > if (IS_ERR(l_ctx)) {
> > result = PTR_ERR(l_ctx);
> >
>
Thanks, Anna, I'll fix that.
--
Omar
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 0/5] clean up and generalize swap-over-NFS
2014-12-20 3:18 [PATCH v2 0/5] clean up and generalize swap-over-NFS Omar Sandoval
` (4 preceding siblings ...)
2014-12-20 3:18 ` [PATCH v2 5/5] vfs: update swap_{,de}activate documentation Omar Sandoval
@ 2015-01-14 3:18 ` Omar Sandoval
2015-01-21 19:14 ` Omar Sandoval
5 siblings, 1 reply; 14+ messages in thread
From: Omar Sandoval @ 2015-01-14 3:18 UTC (permalink / raw)
To: Alexander Viro, Andrew Morton, Trond Myklebust,
Christoph Hellwig, linux-fsdevel, linux-mm, linux-nfs,
linux-kernel
On Fri, Dec 19, 2014 at 07:18:24PM -0800, Omar Sandoval wrote:
> Hi,
>
> This patch series (based on ecb5ec0 in Linus' tree) contains all of the
> non-BTRFS work that I've done to implement swapfiles on BTRFS. The BTRFS
> portion is still undergoing development and is now outweighed by the
> non-BTRFS changes, so I want to get these in separately.
>
> Version 2 changes the generic swapfile interface to use ->read_iter and
> ->write_iter instead of using ->direct_IO directly in response to
> discussion on the previous submission. It also adds the iov_iter_is_bvec
> helper to factor out some common checks.
>
> Version 1 can be found here: https://lkml.org/lkml/2014/12/15/7
>
> Omar Sandoval (5):
> iov_iter: add ITER_BVEC helpers
> direct-io: don't dirty ITER_BVEC pages on read
> nfs: don't dirty ITER_BVEC pages read through direct I/O
> swapfile: use ->read_iter and ->write_iter
> vfs: update swap_{,de}activate documentation
>
> Documentation/filesystems/Locking | 7 ++++---
> Documentation/filesystems/vfs.txt | 7 ++++---
> fs/direct-io.c | 8 ++++---
> fs/nfs/direct.c | 5 ++++-
> fs/splice.c | 7 ++-----
> include/linux/uio.h | 7 +++++++
> mm/iov_iter.c | 12 +++++++++++
> mm/page_io.c | 44 +++++++++++++++++++++++++--------------
> mm/swapfile.c | 11 +++++++++-
> 9 files changed, 76 insertions(+), 32 deletions(-)
>
> --
> 2.2.1
>
Hi, everyone,
Thanks for all of the feedback on the last few iterations of this
series. If it's alright, I'd like to revive the conversation around
these patches.
There are a couple of issues which we were discussing before the
holidays:
One concern that Al mentioned was ->read_iter and ->write_iter falling
back to the buffered I/O case. Like Christoph mentioned, this can be
prevented by doing the proper checks on the filesystem side (usually
just making sure that all blocks of a swapfile are allocated, but on
BTRFS, for example, we also have to check for compressed extents).
The other concern which Al brought up was that ->read_iter is passed a
locked page in the iter_bvec and could end up trying to lock it. I'm not
too sure under what conditions that would happen -- could someone give
an example? My intuition is that there's no path which will lead us to
deadlock on a page in the swapcache, but I don't have anything solid to
back that up.
Thanks!
--
Omar
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 0/5] clean up and generalize swap-over-NFS
2015-01-14 3:18 ` [PATCH v2 0/5] clean up and generalize swap-over-NFS Omar Sandoval
@ 2015-01-21 19:14 ` Omar Sandoval
0 siblings, 0 replies; 14+ messages in thread
From: Omar Sandoval @ 2015-01-21 19:14 UTC (permalink / raw)
To: Alexander Viro, Andrew Morton, Trond Myklebust,
Christoph Hellwig, linux-fsdevel, linux-mm, linux-nfs,
linux-kernel
On Tue, Jan 13, 2015 at 07:18:36PM -0800, Omar Sandoval wrote:
> On Fri, Dec 19, 2014 at 07:18:24PM -0800, Omar Sandoval wrote:
> > Hi,
> >
> > This patch series (based on ecb5ec0 in Linus' tree) contains all of the
> > non-BTRFS work that I've done to implement swapfiles on BTRFS. The BTRFS
> > portion is still undergoing development and is now outweighed by the
> > non-BTRFS changes, so I want to get these in separately.
> >
> > Version 2 changes the generic swapfile interface to use ->read_iter and
> > ->write_iter instead of using ->direct_IO directly in response to
> > discussion on the previous submission. It also adds the iov_iter_is_bvec
> > helper to factor out some common checks.
> >
> > Version 1 can be found here: https://lkml.org/lkml/2014/12/15/7
> >
> > Omar Sandoval (5):
> > iov_iter: add ITER_BVEC helpers
> > direct-io: don't dirty ITER_BVEC pages on read
> > nfs: don't dirty ITER_BVEC pages read through direct I/O
> > swapfile: use ->read_iter and ->write_iter
> > vfs: update swap_{,de}activate documentation
> >
> > Documentation/filesystems/Locking | 7 ++++---
> > Documentation/filesystems/vfs.txt | 7 ++++---
> > fs/direct-io.c | 8 ++++---
> > fs/nfs/direct.c | 5 ++++-
> > fs/splice.c | 7 ++-----
> > include/linux/uio.h | 7 +++++++
> > mm/iov_iter.c | 12 +++++++++++
> > mm/page_io.c | 44 +++++++++++++++++++++++++--------------
> > mm/swapfile.c | 11 +++++++++-
> > 9 files changed, 76 insertions(+), 32 deletions(-)
> >
> > --
> > 2.2.1
> >
>
> Hi, everyone,
>
> Thanks for all of the feedback on the last few iterations of this
> series. If it's alright, I'd like to revive the conversation around
> these patches.
>
> There are a couple of issues which we were discussing before the
> holidays:
>
> One concern that Al mentioned was ->read_iter and ->write_iter falling
> back to the buffered I/O case. Like Christoph mentioned, this can be
> prevented by doing the proper checks on the filesystem side (usually
> just making sure that all blocks of a swapfile are allocated, but on
> BTRFS, for example, we also have to check for compressed extents).
>
> The other concern which Al brought up was that ->read_iter is passed a
> locked page in the iter_bvec and could end up trying to lock it. I'm not
> too sure under what conditions that would happen -- could someone give
> an example? My intuition is that there's no path which will lead us to
> deadlock on a page in the swapcache, but I don't have anything solid to
> back that up.
>
> Thanks!
> --
> Omar
Hi,
Any updates on this?
Thanks,
--
Omar
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2015-01-21 19:14 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-12-20 3:18 [PATCH v2 0/5] clean up and generalize swap-over-NFS Omar Sandoval
2014-12-20 3:18 ` [PATCH v2 1/5] iov_iter: add ITER_BVEC helpers Omar Sandoval
2014-12-20 3:18 ` [PATCH v2 2/5] direct-io: don't dirty ITER_BVEC pages on read Omar Sandoval
2014-12-20 6:01 ` Al Viro
2014-12-22 7:12 ` Omar Sandoval
2014-12-20 3:18 ` [PATCH v2 3/5] nfs: don't dirty ITER_BVEC pages read through direct I/O Omar Sandoval
2015-01-05 14:41 ` Anna Schumaker
2015-01-08 9:25 ` Omar Sandoval
2014-12-20 3:18 ` [PATCH v2 4/5] swapfile: use ->read_iter and ->write_iter Omar Sandoval
2014-12-20 6:13 ` Al Viro
2014-12-22 7:32 ` Omar Sandoval
2014-12-20 3:18 ` [PATCH v2 5/5] vfs: update swap_{,de}activate documentation Omar Sandoval
2015-01-14 3:18 ` [PATCH v2 0/5] clean up and generalize swap-over-NFS Omar Sandoval
2015-01-21 19:14 ` Omar Sandoval
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).