* [PATCH v6 0/9] gfs2 iomap write support
@ 2018-06-02 9:57 ` Andreas Gruenbacher
0 siblings, 0 replies; 36+ messages in thread
From: Andreas Gruenbacher @ 2018-06-02 9:57 UTC
To: cluster-devel, Christoph Hellwig; +Cc: linux-fsdevel, Andreas Gruenbacher
Here's an update of my gfs2 iomap write patches, with support for
buffered writes as well as direct I/O reads and writes through iomap.
Significant changes since v5:
* Initial gfs2-specific cleanups split off and posted separately.
* New patch for generic iomap inline data handling.
* Iomap write_begin operation removed.
A 4.17-rc7 based version of the patches can be found here:
https://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git/log/?h=iomap-write
Thanks,
Andreas
Andreas Gruenbacher (8):
iomap: Mark newly allocated buffer heads as new
iomap: Complete partial direct I/O writes synchronously
iomap: Generic inline data handling
iomap: Add write_end iomap operation
gfs2: iomap buffered write support
gfs2: gfs2_extent_length cleanup
gfs2: iomap direct I/O support
gfs2: Remove gfs2_write_{begin,end}
Christoph Hellwig (1):
iomap: inline data should be an iomap type, not a flag
fs/buffer.c | 28 +--
fs/ext2/inode.c | 1 +
fs/ext4/inline.c | 4 +-
fs/ext4/inode.c | 1 +
fs/gfs2/aops.c | 328 ++------------------------------
fs/gfs2/aops.h | 22 +++
fs/gfs2/bmap.c | 367 +++++++++++++++++++++++++++++++-----
fs/gfs2/file.c | 168 ++++++++++++++++-
fs/iomap.c | 133 ++++++++++---
fs/xfs/xfs_iomap.c | 1 +
include/linux/buffer_head.h | 3 +
include/linux/iomap.h | 15 +-
12 files changed, 654 insertions(+), 417 deletions(-)
create mode 100644 fs/gfs2/aops.h
--
2.17.0
* [PATCH v6 1/9] iomap: inline data should be an iomap type, not a flag
2018-06-02 9:57 ` [Cluster-devel] " Andreas Gruenbacher
@ 2018-06-02 9:57 ` Andreas Gruenbacher
0 siblings, 0 replies; 36+ messages in thread
From: Andreas Gruenbacher @ 2018-06-02 9:57 UTC
To: cluster-devel, Christoph Hellwig; +Cc: linux-fsdevel, Andreas Gruenbacher
From: Christoph Hellwig <hch@lst.de>
Inline data is fundamentally different from our normal mapped case in that
it doesn't even have a block address. So instead of having a flag for it,
it should be an entirely separate iomap range type.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
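The effect on the fiemap translation can be sketched as a standalone
userspace model (this is illustrative code, not the kernel source; the
constants mirror the kernel's `iomap.h` and `fiemap.h` definitions, and
`type_to_fiemap_flags` is a hypothetical name for the mapping that
iomap_to_fiemap performs after this patch):

```c
/* Userspace sketch: inline data as a distinct mapping type lets the
 * fiemap translation handle it in the same switch as the other types,
 * with no separate IOMAP_F_DATA_INLINE flag check. */
#include <assert.h>

enum iomap_type {
	IOMAP_HOLE      = 0x01,	/* no blocks allocated, no data */
	IOMAP_DELALLOC  = 0x02,	/* delayed allocation blocks */
	IOMAP_MAPPED    = 0x03,	/* blocks allocated at @addr */
	IOMAP_UNWRITTEN = 0x04,	/* allocated, in unwritten state */
	IOMAP_INLINE    = 0x05,	/* new: data inline in the inode */
};

/* Values as in the kernel's uapi fiemap.h. */
#define FIEMAP_EXTENT_UNKNOWN     0x00000002
#define FIEMAP_EXTENT_DELALLOC    0x00000004
#define FIEMAP_EXTENT_DATA_INLINE 0x00000080
#define FIEMAP_EXTENT_UNWRITTEN   0x00000800

/* Model of the iomap_to_fiemap switch after this patch. */
static unsigned type_to_fiemap_flags(enum iomap_type type)
{
	unsigned flags = 0;

	switch (type) {
	case IOMAP_HOLE:
		return 0;	/* holes are skipped entirely */
	case IOMAP_MAPPED:
		break;
	case IOMAP_DELALLOC:
		flags |= FIEMAP_EXTENT_DELALLOC | FIEMAP_EXTENT_UNKNOWN;
		break;
	case IOMAP_UNWRITTEN:
		flags |= FIEMAP_EXTENT_UNWRITTEN;
		break;
	case IOMAP_INLINE:
		flags |= FIEMAP_EXTENT_DATA_INLINE;
		break;
	}
	return flags;
}
```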
---
fs/ext4/inline.c | 4 ++--
fs/gfs2/bmap.c | 4 ++--
fs/iomap.c | 8 ++++----
include/linux/iomap.h | 2 +-
4 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index 70cf4c7b268a..e1f00891ef95 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -1835,8 +1835,8 @@ int ext4_inline_data_iomap(struct inode *inode, struct iomap *iomap)
iomap->offset = 0;
iomap->length = min_t(loff_t, ext4_get_inline_size(inode),
i_size_read(inode));
- iomap->type = 0;
- iomap->flags = IOMAP_F_DATA_INLINE;
+ iomap->type = IOMAP_INLINE;
+ iomap->flags = 0;
out:
up_read(&EXT4_I(inode)->xattr_sem);
diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index a7b586e02693..59a78056530a 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -767,8 +767,8 @@ static void gfs2_stuffed_iomap(struct inode *inode, struct iomap *iomap)
sizeof(struct gfs2_dinode);
iomap->offset = 0;
iomap->length = i_size_read(inode);
- iomap->type = IOMAP_MAPPED;
- iomap->flags = IOMAP_F_DATA_INLINE;
+ iomap->type = IOMAP_INLINE;
+ iomap->flags = 0;
}
/**
diff --git a/fs/iomap.c b/fs/iomap.c
index afd163586aa0..54b693da3a35 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -498,22 +498,22 @@ static int iomap_to_fiemap(struct fiemap_extent_info *fi,
case IOMAP_HOLE:
/* skip holes */
return 0;
+ case IOMAP_MAPPED:
+ break;
case IOMAP_DELALLOC:
flags |= FIEMAP_EXTENT_DELALLOC | FIEMAP_EXTENT_UNKNOWN;
break;
case IOMAP_UNWRITTEN:
flags |= FIEMAP_EXTENT_UNWRITTEN;
break;
- case IOMAP_MAPPED:
- break;
+ case IOMAP_INLINE:
+ flags |= FIEMAP_EXTENT_DATA_INLINE;
}
if (iomap->flags & IOMAP_F_MERGED)
flags |= FIEMAP_EXTENT_MERGED;
if (iomap->flags & IOMAP_F_SHARED)
flags |= FIEMAP_EXTENT_SHARED;
- if (iomap->flags & IOMAP_F_DATA_INLINE)
- flags |= FIEMAP_EXTENT_DATA_INLINE;
return fiemap_fill_next_extent(fi, iomap->offset,
iomap->addr != IOMAP_NULL_ADDR ? iomap->addr : 0,
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 19a07de28212..918f14075702 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -18,6 +18,7 @@ struct vm_fault;
#define IOMAP_DELALLOC 0x02 /* delayed allocation blocks */
#define IOMAP_MAPPED 0x03 /* blocks allocated at @addr */
#define IOMAP_UNWRITTEN 0x04 /* blocks allocated at @addr in unwritten state */
+#define IOMAP_INLINE 0x05 /* data inline in the inode */
/*
* Flags for all iomap mappings:
@@ -34,7 +35,6 @@ struct vm_fault;
*/
#define IOMAP_F_MERGED 0x10 /* contains multiple blocks/extents */
#define IOMAP_F_SHARED 0x20 /* block shared with another file */
-#define IOMAP_F_DATA_INLINE 0x40 /* data inline in the inode */
/*
* Magic value for addr:
--
2.17.0
* [PATCH v6 2/9] iomap: Mark newly allocated buffer heads as new
2018-06-02 9:57 ` [Cluster-devel] " Andreas Gruenbacher
@ 2018-06-02 9:57 ` Andreas Gruenbacher
0 siblings, 0 replies; 36+ messages in thread
From: Andreas Gruenbacher @ 2018-06-02 9:57 UTC
To: cluster-devel, Christoph Hellwig; +Cc: linux-fsdevel, Andreas Gruenbacher
In iomap_to_bh, mark as new not only buffer heads in IOMAP_UNWRITTEN maps,
but also buffer heads in IOMAP_MAPPED maps that have the IOMAP_F_NEW flag
set. This will be used by filesystems like gfs2, which allocate blocks in
their ->iomap_begin operation.
Also make minor corrections to the comment for IOMAP_UNWRITTEN maps.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
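The widened condition can be modeled in isolation (a sketch with simplified
names; `bh_should_be_new` and `struct mini_iomap` are stand-ins for the
logic the patch adds to iomap_to_bh, not kernel symbols):

```c
/* Sketch of the "mark buffer new" decision after this patch: a mapped
 * buffer is new either when it lies at or past i_size, or when the
 * filesystem reported freshly allocated blocks via IOMAP_F_NEW. */
#include <assert.h>
#include <stdbool.h>

#define IOMAP_F_NEW 0x01	/* blocks were allocated for this mapping */

struct mini_iomap {
	unsigned flags;
};

static bool bh_should_be_new(const struct mini_iomap *iomap,
			     long long offset, long long i_size)
{
	return (iomap->flags & IOMAP_F_NEW) || offset >= i_size;
}
```

Before the patch, only the `offset >= i_size` half existed, so buffers for
blocks a filesystem had just allocated inside i_size were not zeroed.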
---
fs/buffer.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index 249b83fafe48..5220d9efcd18 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1900,15 +1900,15 @@ iomap_to_bh(struct inode *inode, sector_t block, struct buffer_head *bh,
break;
case IOMAP_UNWRITTEN:
/*
- * For unwritten regions, we always need to ensure that
- * sub-block writes cause the regions in the block we are not
- * writing to are zeroed. Set the buffer as new to ensure this.
+ * For unwritten regions, we always need to ensure that regions
+ * in the block we are not writing to are zeroed. Mark the
+ * buffer as new to ensure this.
*/
set_buffer_new(bh);
set_buffer_unwritten(bh);
/* FALLTHRU */
case IOMAP_MAPPED:
- if (offset >= i_size_read(inode))
+ if ((iomap->flags & IOMAP_F_NEW) || offset >= i_size_read(inode))
set_buffer_new(bh);
bh->b_blocknr = (iomap->addr + offset - iomap->offset) >>
inode->i_blkbits;
--
2.17.0
* [PATCH v6 3/9] iomap: Complete partial direct I/O writes synchronously
2018-06-02 9:57 ` [Cluster-devel] " Andreas Gruenbacher
@ 2018-06-02 9:57 ` Andreas Gruenbacher
0 siblings, 0 replies; 36+ messages in thread
From: Andreas Gruenbacher @ 2018-06-02 9:57 UTC
To: cluster-devel, Christoph Hellwig; +Cc: linux-fsdevel, Andreas Gruenbacher
According to xfstest generic/240, applications seem to expect direct I/O
writes to either complete as a whole or to fail; short direct I/O writes
are apparently not appreciated. This means that when only part of an
asynchronous direct I/O write succeeds, we can either fail the entire
write, or we can wait for the partial write to complete and retry the
remaining write as buffered I/O. The old __blockdev_direct_IO helper
has code for waiting for partial writes to complete; the new
iomap_dio_rw iomap helper does not.
The above-mentioned fallback mode is needed by gfs2, which doesn't allow
block allocation under direct I/O in order to avoid taking cluster-wide
exclusive locks. As a consequence, an asynchronous direct I/O write to
a file range that contains a hole will result in a short write. In that
case, wait for the short write to complete to allow gfs2 to recover.
This will make xfstest generic/240 work on gfs2.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
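The new completion policy can be sketched on its own (simplified,
illustrative names; `struct mini_dio` and `dio_handle_ret` are stand-ins
for the iomap_dio submission loop, not kernel symbols):

```c
/* Sketch of the policy this patch gives iomap_dio_rw: whether to wait
 * is tracked in wait_for_completion rather than re-derived from
 * is_sync_kiocb(), so a short write that hits the magic -ENOTBLK
 * fallback code can flip it and force the partial direct write to be
 * drained before falling back to buffered I/O. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct mini_dio {
	bool wait_for_completion;
};

/* Initialization as in the patch: synchronous iocbs always wait. */
static void dio_init(struct mini_dio *dio, bool is_sync_kiocb)
{
	dio->wait_for_completion = is_sync_kiocb;
}

/* Handle one iomap_apply return value; returns true when the
 * submission loop should stop. -ENOTBLK requests buffered fallback,
 * and now also forces synchronous completion of what was submitted. */
static bool dio_handle_ret(struct mini_dio *dio, long ret)
{
	if (ret <= 0) {
		if (ret == -ENOTBLK)
			dio->wait_for_completion = true;
		return true;
	}
	return false;
}
```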
---
fs/iomap.c | 22 ++++++++++++----------
1 file changed, 12 insertions(+), 10 deletions(-)
diff --git a/fs/iomap.c b/fs/iomap.c
index 54b693da3a35..a0d3b7742060 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -696,6 +696,7 @@ struct iomap_dio {
atomic_t ref;
unsigned flags;
int error;
+ bool wait_for_completion;
union {
/* used during submission and for synchronous completion: */
@@ -797,9 +798,8 @@ static void iomap_dio_bio_end_io(struct bio *bio)
iomap_dio_set_error(dio, blk_status_to_errno(bio->bi_status));
if (atomic_dec_and_test(&dio->ref)) {
- if (is_sync_kiocb(dio->iocb)) {
+ if (dio->wait_for_completion) {
struct task_struct *waiter = dio->submit.waiter;
-
WRITE_ONCE(dio->submit.waiter, NULL);
wake_up_process(waiter);
} else if (dio->flags & IOMAP_DIO_WRITE) {
@@ -990,13 +990,12 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
dio->end_io = end_io;
dio->error = 0;
dio->flags = 0;
+ dio->wait_for_completion = is_sync_kiocb(iocb);
dio->submit.iter = iter;
- if (is_sync_kiocb(iocb)) {
- dio->submit.waiter = current;
- dio->submit.cookie = BLK_QC_T_NONE;
- dio->submit.last_queue = NULL;
- }
+ dio->submit.waiter = current;
+ dio->submit.cookie = BLK_QC_T_NONE;
+ dio->submit.last_queue = NULL;
if (iov_iter_rw(iter) == READ) {
if (pos >= dio->i_size)
@@ -1033,7 +1032,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
dio_warn_stale_pagecache(iocb->ki_filp);
ret = 0;
- if (iov_iter_rw(iter) == WRITE && !is_sync_kiocb(iocb) &&
+ if (iov_iter_rw(iter) == WRITE && !dio->wait_for_completion &&
!inode->i_sb->s_dio_done_wq) {
ret = sb_init_dio_done_wq(inode->i_sb);
if (ret < 0)
@@ -1048,8 +1047,10 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
iomap_dio_actor);
if (ret <= 0) {
/* magic error code to fall back to buffered I/O */
- if (ret == -ENOTBLK)
+ if (ret == -ENOTBLK) {
+ dio->wait_for_completion = true;
ret = 0;
+ }
break;
}
pos += ret;
@@ -1062,8 +1063,9 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
if (ret < 0)
iomap_dio_set_error(dio, ret);
+ smp_mb__before_atomic();
if (!atomic_dec_and_test(&dio->ref)) {
- if (!is_sync_kiocb(iocb))
+ if (!dio->wait_for_completion)
return -EIOCBQUEUED;
for (;;) {
--
2.17.0
* [PATCH v6 4/9] iomap: Generic inline data handling
2018-06-02 9:57 ` [Cluster-devel] " Andreas Gruenbacher
@ 2018-06-02 9:57 ` Andreas Gruenbacher
0 siblings, 0 replies; 36+ messages in thread
From: Andreas Gruenbacher @ 2018-06-02 9:57 UTC
To: cluster-devel, Christoph Hellwig; +Cc: linux-fsdevel, Andreas Gruenbacher
Add generic inline data handling by adding a pointer to the inline data
region to struct iomap. When handling a buffered IOMAP_INLINE write,
iomap_write_begin will copy the current inline data from the inline data
region into the page cache, and iomap_write_end will copy the changes in
the page cache back to the inline data region.
This doesn't cover inline data reads or direct I/O yet because there are
no users so far.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
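The two inline-data copies can be modeled in userspace (a sketch only:
plain buffers stand in for the page cache, and `read_inline` /
`write_inline` are hypothetical analogues of the iomap_read_inline_data
and iomap_write_inline_data helpers added here):

```c
/* Userspace model of the patch's inline-data handling: "read" fills a
 * page-sized buffer from the inline region and zero-pads the tail, as
 * iomap_write_begin does for an IOMAP_INLINE mapping; "write" copies
 * the modified byte range back, as iomap_write_end does. */
#include <assert.h>
#include <string.h>

#define PAGE_SIZE 4096

/* iomap_read_inline_data analogue: page <- inline region, tail zeroed. */
static void read_inline(char page[PAGE_SIZE], const char *inline_data,
			size_t size)
{
	memcpy(page, inline_data, size);
	memset(page + size, 0, PAGE_SIZE - size);
}

/* iomap_write_inline_data analogue: inline region <- modified page bytes. */
static void write_inline(char *inline_data, const char page[PAGE_SIZE],
			 size_t pos, size_t copied)
{
	memcpy(inline_data + pos, page + pos, copied);
}
```

Zero-padding the tail on read matters because a write may only touch part
of the page, yet the whole page must be uptodate before it is exposed.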
---
fs/buffer.c | 20 ++++++++-----
fs/iomap.c | 56 ++++++++++++++++++++++++++++++++++---
include/linux/buffer_head.h | 3 ++
include/linux/iomap.h | 1 +
4 files changed, 69 insertions(+), 11 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index 5220d9efcd18..cfd1ad2a5805 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2112,15 +2112,12 @@ int block_write_end(struct file *file, struct address_space *mapping,
}
EXPORT_SYMBOL(block_write_end);
-int generic_write_end(struct file *file, struct address_space *mapping,
+void __generic_write_end(struct file *file, struct address_space *mapping,
loff_t pos, unsigned len, unsigned copied,
- struct page *page, void *fsdata)
+ struct page *page, void *fsdata, bool dirty_inode)
{
struct inode *inode = mapping->host;
loff_t old_size = inode->i_size;
- int i_size_changed = 0;
-
- copied = block_write_end(file, mapping, pos, len, copied, page, fsdata);
/*
* No need to use i_size_read() here, the i_size
@@ -2131,7 +2128,7 @@ int generic_write_end(struct file *file, struct address_space *mapping,
*/
if (pos+copied > inode->i_size) {
i_size_write(inode, pos+copied);
- i_size_changed = 1;
+ dirty_inode = true;
}
unlock_page(page);
@@ -2145,9 +2142,18 @@ int generic_write_end(struct file *file, struct address_space *mapping,
* ordering of page lock and transaction start for journaling
* filesystems.
*/
- if (i_size_changed)
+ if (dirty_inode)
mark_inode_dirty(inode);
+}
+int generic_write_end(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned len, unsigned copied,
+ struct page *page, void *fsdata)
+{
+ copied = block_write_end(file, mapping, pos, len, copied,
+ page, fsdata);
+ __generic_write_end(file, mapping, pos, len, copied,
+ page, fsdata, false);
return copied;
}
EXPORT_SYMBOL(generic_write_end);
diff --git a/fs/iomap.c b/fs/iomap.c
index a0d3b7742060..857c4b7b54eb 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -108,6 +108,40 @@ iomap_write_failed(struct inode *inode, loff_t pos, unsigned len)
truncate_pagecache_range(inode, max(pos, i_size), pos + len);
}
+static void
+iomap_read_inline_data(struct page *page, struct iomap *iomap, loff_t size)
+{
+ void *data = iomap->inline_data;
+ void *addr;
+
+ if (PageUptodate(page))
+ return;
+
+ BUG_ON(page->index);
+ BUG_ON(size > PAGE_SIZE - offset_in_page(data));
+
+ addr = kmap_atomic(page);
+ memcpy(addr, data, size);
+ memset(addr + size, 0, PAGE_SIZE - size);
+ kunmap_atomic(addr);
+ SetPageUptodate(page);
+}
+
+static void
+iomap_write_inline_data(struct page *page, struct iomap *iomap, off_t pos,
+ unsigned copied)
+{
+ void *data = iomap->inline_data;
+ void *addr;
+
+ BUG_ON(pos + copied > PAGE_SIZE - offset_in_page(data));
+ WARN_ON_ONCE(!PageUptodate(page));
+
+ addr = kmap_atomic(page);
+ memcpy(data + pos, addr + pos, copied);
+ kunmap_atomic(addr);
+}
+
static int
iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
struct page **pagep, struct iomap *iomap)
@@ -125,6 +159,11 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
if (!page)
return -ENOMEM;
+ if (iomap->type == IOMAP_INLINE) {
+ iomap_read_inline_data(page, iomap, inode->i_size);
+ goto out;
+ }
+
status = __block_write_begin_int(page, pos, len, NULL, iomap);
if (unlikely(status)) {
unlock_page(page);
@@ -134,16 +173,24 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
iomap_write_failed(inode, pos, len);
}
+out:
*pagep = page;
return status;
}
static int
iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
- unsigned copied, struct page *page)
+ unsigned copied, struct page *page, struct iomap *iomap)
{
int ret;
+ if (iomap->type == IOMAP_INLINE) {
+ iomap_write_inline_data(page, iomap, pos, copied);
+ __generic_write_end(NULL, inode->i_mapping, pos, len,
+ copied, page, NULL, true);
+ return copied;
+ }
+
ret = generic_write_end(NULL, inode->i_mapping, pos, len,
copied, page, NULL);
if (ret < len)
@@ -200,7 +247,8 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
flush_dcache_page(page);
- status = iomap_write_end(inode, pos, bytes, copied, page);
+ status = iomap_write_end(inode, pos, bytes, copied, page,
+ iomap);
if (unlikely(status < 0))
break;
copied = status;
@@ -294,7 +342,7 @@ iomap_dirty_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
WARN_ON_ONCE(!PageUptodate(page));
- status = iomap_write_end(inode, pos, bytes, bytes, page);
+ status = iomap_write_end(inode, pos, bytes, bytes, page, iomap);
if (unlikely(status <= 0)) {
if (WARN_ON_ONCE(status == 0))
return -EIO;
@@ -346,7 +394,7 @@ static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
zero_user(page, offset, bytes);
mark_page_accessed(page);
- return iomap_write_end(inode, pos, bytes, bytes, page);
+ return iomap_write_end(inode, pos, bytes, bytes, page, iomap);
}
static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes,
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index 894e5d125de6..c730ae8888a8 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -231,6 +231,9 @@ int __block_write_begin(struct page *page, loff_t pos, unsigned len,
int block_write_end(struct file *, struct address_space *,
loff_t, unsigned, unsigned,
struct page *, void *);
+void __generic_write_end(struct file *, struct address_space *,
+ loff_t, unsigned, unsigned,
+ struct page *, void *, bool);
int generic_write_end(struct file *, struct address_space *,
loff_t, unsigned, unsigned,
struct page *, void *);
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 918f14075702..c61113c71a60 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -47,6 +47,7 @@ struct iomap {
u64 length; /* length of mapping, bytes */
u16 type; /* type of mapping */
u16 flags; /* flags for mapping */
+ void *inline_data; /* inline data buffer */
struct block_device *bdev; /* block device for I/O */
struct dax_device *dax_dev; /* dax_dev for dax operations */
};
--
2.17.0
+ void *inline_data; /* inline data buffer */
struct block_device *bdev; /* block device for I/O */
struct dax_device *dax_dev; /* dax_dev for dax operations */
};
--
2.17.0
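The inline-data helpers added above reduce, outside the kernel, to a pair of buffer copies: reading fills a page from the inode's inline-data area and zeroes the tail; writing copies the dirtied range back. The sketch below is a userspace model under that assumption — `PAGE_SIZE`, `read_inline`, and `write_inline` are local stand-ins, not the kernel symbols, and kmap/page-flag handling is omitted.

```c
#include <assert.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Model of iomap_read_inline_data: copy the inline data that lives in
 * the inode block into the start of the page and zero the rest, so the
 * whole page can be treated as uptodate. */
static void read_inline(unsigned char *page, const unsigned char *data,
                        size_t size)
{
    assert(size <= PAGE_SIZE);
    memcpy(page, data, size);
    memset(page + size, 0, PAGE_SIZE - size);
}

/* Model of iomap_write_inline_data: after a write into the page cache,
 * copy the written range back into the inline-data buffer. */
static void write_inline(unsigned char *data, const unsigned char *page,
                         size_t pos, size_t copied)
{
    memcpy(data + pos, page + pos, copied);
}
```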
* [PATCH v6 5/9] iomap: Add write_end iomap operation
2018-06-02 9:57 ` [Cluster-devel] " Andreas Gruenbacher
@ 2018-06-02 9:57 ` Andreas Gruenbacher
-1 siblings, 0 replies; 36+ messages in thread
From: Andreas Gruenbacher @ 2018-06-02 9:57 UTC (permalink / raw)
To: cluster-devel, Christoph Hellwig; +Cc: linux-fsdevel, Andreas Gruenbacher
Add a write_end operation to struct iomap_ops to provide a way of
overriding the default behavior of iomap_write_end. This will be used
for implementing data journaling in gfs2: in the data journaling case,
pages are written into the journal before being written back to their
proper on-disk locations.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
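The shape of this change — a per-filesystem write_end hook that most filesystems point at the generic implementation, while gfs2 wraps it to do journaling work first — can be sketched in userspace C. The struct and function names below only mimic the kernel ones and the return convention (bytes committed) is simplified; this is an illustration of the dispatch, not the kernel API.

```c
#include <stddef.h>

/* Model of the write_end member added to struct iomap_ops: every
 * filesystem fills it in, most with the generic helper. */
struct ops_model {
    int (*write_end)(size_t pos, unsigned len, unsigned copied);
};

/* Stand-in for iomap_write_end: commit what was copied. */
static int generic_write_end_model(size_t pos, unsigned len, unsigned copied)
{
    (void)pos;
    return copied < len ? (int)copied : (int)len;
}

static int journaled;  /* counts journal bookkeeping calls */

/* Stand-in for a gfs2 data-journaling override: do the journal
 * bookkeeping (here just a counter), then fall through to the
 * generic commit path. */
static int gfs2_write_end_model(size_t pos, unsigned len, unsigned copied)
{
    journaled++;
    return generic_write_end_model(pos, len, copied);
}

static const struct ops_model ext2_like = { .write_end = generic_write_end_model };
static const struct ops_model gfs2_like = { .write_end = gfs2_write_end_model };
```

A caller only ever invokes `ops->write_end(...)`, so filesystems that need no special handling pay nothing beyond the indirect call.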
---
fs/ext2/inode.c | 1 +
fs/ext4/inode.c | 1 +
fs/iomap.c | 53 +++++++++++++++++++++++++++++++------------
fs/xfs/xfs_iomap.c | 1 +
include/linux/iomap.h | 12 ++++++++++
5 files changed, 53 insertions(+), 15 deletions(-)
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 71635909df3b..2744bf15b74b 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -848,6 +848,7 @@ ext2_iomap_end(struct inode *inode, loff_t offset, loff_t length,
const struct iomap_ops ext2_iomap_ops = {
.iomap_begin = ext2_iomap_begin,
+ .write_end = iomap_write_end,
.iomap_end = ext2_iomap_end,
};
#else
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 1e50c5efae67..16dd1b4cc9f9 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3604,6 +3604,7 @@ static int ext4_iomap_end(struct inode *inode, loff_t offset, loff_t length,
const struct iomap_ops ext4_iomap_ops = {
.iomap_begin = ext4_iomap_begin,
+ .write_end = iomap_write_end,
.iomap_end = ext4_iomap_end,
};
diff --git a/fs/iomap.c b/fs/iomap.c
index 857c4b7b54eb..8f7dca17300d 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -178,7 +178,7 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
return status;
}
-static int
+int
iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
unsigned copied, struct page *page, struct iomap *iomap)
{
@@ -197,12 +197,19 @@ iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
iomap_write_failed(inode, pos, len);
return ret;
}
+EXPORT_SYMBOL_GPL(iomap_write_end);
+
+struct iomap_write_args {
+ const struct iomap_ops *ops;
+ struct iov_iter *iter;
+};
static loff_t
iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
struct iomap *iomap)
{
- struct iov_iter *i = data;
+ struct iomap_write_args *args = data;
+ struct iov_iter *i = args->iter;
long status = 0;
ssize_t written = 0;
unsigned int flags = AOP_FLAG_NOFS;
@@ -247,7 +254,7 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
flush_dcache_page(page);
- status = iomap_write_end(inode, pos, bytes, copied, page,
+ status = args->ops->write_end(inode, pos, bytes, copied, page,
iomap);
if (unlikely(status < 0))
break;
@@ -285,10 +292,14 @@ iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *iter,
{
struct inode *inode = iocb->ki_filp->f_mapping->host;
loff_t pos = iocb->ki_pos, ret = 0, written = 0;
+ struct iomap_write_args args = {
+ .ops = ops,
+ .iter = iter,
+ };
while (iov_iter_count(iter)) {
ret = iomap_apply(inode, pos, iov_iter_count(iter),
- IOMAP_WRITE, ops, iter, iomap_write_actor);
+ IOMAP_WRITE, ops, &args, iomap_write_actor);
if (ret <= 0)
break;
pos += ret;
@@ -319,6 +330,7 @@ static loff_t
iomap_dirty_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
struct iomap *iomap)
{
+ const struct iomap_ops *ops = data;
long status = 0;
ssize_t written = 0;
@@ -335,14 +347,14 @@ iomap_dirty_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
return PTR_ERR(rpage);
status = iomap_write_begin(inode, pos, bytes,
- AOP_FLAG_NOFS, &page, iomap);
+ AOP_FLAG_NOFS, &page, iomap);
put_page(rpage);
if (unlikely(status))
return status;
WARN_ON_ONCE(!PageUptodate(page));
- status = iomap_write_end(inode, pos, bytes, bytes, page, iomap);
+ status = ops->write_end(inode, pos, bytes, bytes, page, iomap);
if (unlikely(status <= 0)) {
if (WARN_ON_ONCE(status == 0))
return -EIO;
@@ -368,7 +380,7 @@ iomap_file_dirty(struct inode *inode, loff_t pos, loff_t len,
loff_t ret;
while (len) {
- ret = iomap_apply(inode, pos, len, IOMAP_WRITE, ops, NULL,
+ ret = iomap_apply(inode, pos, len, IOMAP_WRITE, ops, (void *)ops,
iomap_dirty_actor);
if (ret <= 0)
return ret;
@@ -381,20 +393,21 @@ iomap_file_dirty(struct inode *inode, loff_t pos, loff_t len,
EXPORT_SYMBOL_GPL(iomap_file_dirty);
static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset,
- unsigned bytes, struct iomap *iomap)
+ unsigned bytes, const struct iomap_ops *ops,
+ struct iomap *iomap)
{
struct page *page;
int status;
status = iomap_write_begin(inode, pos, bytes, AOP_FLAG_NOFS, &page,
- iomap);
+ iomap);
if (status)
return status;
zero_user(page, offset, bytes);
mark_page_accessed(page);
- return iomap_write_end(inode, pos, bytes, bytes, page, iomap);
+ return ops->write_end(inode, pos, bytes, bytes, page, iomap);
}
static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes,
@@ -407,11 +420,16 @@ static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes,
offset, bytes);
}
+struct iomap_zero_range_args {
+ const struct iomap_ops *ops;
+ bool *did_zero;
+};
+
static loff_t
iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count,
void *data, struct iomap *iomap)
{
- bool *did_zero = data;
+ struct iomap_zero_range_args *args = data;
loff_t written = 0;
int status;
@@ -428,15 +446,16 @@ iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count,
if (IS_DAX(inode))
status = iomap_dax_zero(pos, offset, bytes, iomap);
else
- status = iomap_zero(inode, pos, offset, bytes, iomap);
+ status = iomap_zero(inode, pos, offset, bytes,
+ (void *)args->ops, iomap);
if (status < 0)
return status;
pos += bytes;
count -= bytes;
written += bytes;
- if (did_zero)
- *did_zero = true;
+ if (args->did_zero)
+ *args->did_zero = true;
} while (count > 0);
return written;
@@ -446,11 +465,15 @@ int
iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
const struct iomap_ops *ops)
{
+ struct iomap_zero_range_args args = {
+ .ops = ops,
+ .did_zero = did_zero,
+ };
loff_t ret;
while (len > 0) {
ret = iomap_apply(inode, pos, len, IOMAP_ZERO,
- ops, did_zero, iomap_zero_range_actor);
+ ops, &args, iomap_zero_range_actor);
if (ret <= 0)
return ret;
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 046469fcc1b8..6b817b83a078 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1199,6 +1199,7 @@ xfs_file_iomap_end(
const struct iomap_ops xfs_iomap_ops = {
.iomap_begin = xfs_file_iomap_begin,
+ .write_end = iomap_write_end,
.iomap_end = xfs_file_iomap_end,
};
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index c61113c71a60..8cd67afe78d7 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -62,6 +62,8 @@ struct iomap {
#define IOMAP_DIRECT (1 << 4) /* direct I/O */
#define IOMAP_NOWAIT (1 << 5) /* Don't wait for writeback */
+struct page;
+
struct iomap_ops {
/*
* Return the existing mapping at pos, or reserve space starting at
@@ -71,6 +73,13 @@ struct iomap_ops {
int (*iomap_begin)(struct inode *inode, loff_t pos, loff_t length,
unsigned flags, struct iomap *iomap);
+ /*
+ * End writing to a page. Usually initialized to iomap_write_end.
+ */
+ int (*write_end)(struct inode *inode, loff_t pos, unsigned len,
+ unsigned copied, struct page *page,
+ struct iomap *iomap);
+
/*
* Commit and/or unreserve space previous allocated using iomap_begin.
* Written indicates the length of the successful write operation which
@@ -81,6 +90,9 @@ struct iomap_ops {
ssize_t written, unsigned flags, struct iomap *iomap);
};
+int iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
+ unsigned copied, struct page *page, struct iomap *iomap);
+
ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
const struct iomap_ops *ops);
int iomap_file_dirty(struct inode *inode, loff_t pos, loff_t len,
--
2.17.0
* [PATCH v6 6/9] gfs2: iomap buffered write support
2018-06-02 9:57 ` [Cluster-devel] " Andreas Gruenbacher
@ 2018-06-02 9:57 ` Andreas Gruenbacher
-1 siblings, 0 replies; 36+ messages in thread
From: Andreas Gruenbacher @ 2018-06-02 9:57 UTC (permalink / raw)
To: cluster-devel, Christoph Hellwig; +Cc: linux-fsdevel, Andreas Gruenbacher
With traditional page-based writes, blocks are allocated separately
for each page written to. With iomap writes, we can allocate many more
blocks at once, at a fraction of the per-page allocation overhead.
Split the calculation of the number of blocks that can be allocated at
a given position (gfs2_alloc_size) out of gfs2_iomap_alloc: that size
determines the number of blocks to allocate and to reserve in the journal.
In gfs2_iomap_alloc, set the type of newly allocated extents to
IOMAP_MAPPED so that iomap_to_bh will set the bh states correctly:
otherwise, the bhs would not be marked as mapped, confusing
__mpage_writepage. This means that we need to check for the IOMAP_F_NEW
flag in fallocate_chunk now.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
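The extent-sizing step described above — at the bottom indirect block, count how many consecutive unallocated (zero) block pointers follow the write position, capped by the requested size — reduces to a simple scan. The sketch below is a userspace model of that scan only; the array stands in for the indirect block's pointer table, and the stuffed/short-tree cases of gfs2_alloc_size are omitted.

```c
#include <stdint.h>
#include <stddef.h>

/* Model of the gfs2_alloc_size pointer scan: starting at first, count
 * zero (unallocated) block pointers until hitting an allocated one or
 * the requested size in blocks, whichever comes first. */
static size_t alloc_size_model(const uint64_t *first, const uint64_t *blk_end,
                               size_t size)
{
    const uint64_t *end = blk_end, *ptr;

    if ((size_t)(end - first) > size)
        end = first + size;
    for (ptr = first; ptr < end; ptr++)
        if (*ptr)
            break;
    return (size_t)(ptr - first);
}
```

The result, shifted by the inode's block-size bits, bounds iomap->length, so one journal transaction and one allocation request can cover the whole run of free blocks.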
---
fs/gfs2/aops.c | 20 +--
fs/gfs2/aops.h | 22 ++++
fs/gfs2/bmap.c | 328 ++++++++++++++++++++++++++++++++++++++++++++-----
fs/gfs2/file.c | 46 +++++--
4 files changed, 370 insertions(+), 46 deletions(-)
create mode 100644 fs/gfs2/aops.h
diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index a1c6b5de9200..3d9633175aa8 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -22,6 +22,7 @@
#include <linux/backing-dev.h>
#include <linux/uio.h>
#include <trace/events/writeback.h>
+#include <linux/sched/signal.h>
#include "gfs2.h"
#include "incore.h"
@@ -36,10 +37,11 @@
#include "super.h"
#include "util.h"
#include "glops.h"
+#include "aops.h"
-static void gfs2_page_add_databufs(struct gfs2_inode *ip, struct page *page,
- unsigned int from, unsigned int len)
+void gfs2_page_add_databufs(struct gfs2_inode *ip, struct page *page,
+ unsigned int from, unsigned int len)
{
struct buffer_head *head = page_buffers(page);
unsigned int bsize = head->b_size;
@@ -462,7 +464,7 @@ static int gfs2_jdata_writepages(struct address_space *mapping,
* Returns: errno
*/
-static int stuffed_readpage(struct gfs2_inode *ip, struct page *page)
+int stuffed_readpage(struct gfs2_inode *ip, struct page *page)
{
struct buffer_head *dibh;
u64 dsize = i_size_read(&ip->i_inode);
@@ -773,7 +775,7 @@ static int gfs2_write_begin(struct file *file, struct address_space *mapping,
* adjust_fs_space - Adjusts the free space available due to gfs2_grow
* @inode: the rindex inode
*/
-static void adjust_fs_space(struct inode *inode)
+void adjust_fs_space(struct inode *inode)
{
struct gfs2_sbd *sdp = inode->i_sb->s_fs_info;
struct gfs2_inode *m_ip = GFS2_I(sdp->sd_statfs_inode);
@@ -819,11 +821,11 @@ static void adjust_fs_space(struct inode *inode)
* This copies the data from the page into the inode block after
* the inode data structure itself.
*
- * Returns: errno
+ * Returns: copied bytes or errno
*/
-static int gfs2_stuffed_write_end(struct inode *inode, struct buffer_head *dibh,
- loff_t pos, unsigned copied,
- struct page *page)
+int gfs2_stuffed_write_end(struct inode *inode, struct buffer_head *dibh,
+ loff_t pos, unsigned copied,
+ struct page *page)
{
struct gfs2_inode *ip = GFS2_I(inode);
u64 to = pos + copied;
@@ -862,7 +864,7 @@ static int gfs2_stuffed_write_end(struct inode *inode, struct buffer_head *dibh,
* The main write_end function for GFS2. We just put our locking around the VFS
* provided functions.
*
- * Returns: errno
+ * Returns: copied bytes or errno
*/
static int gfs2_write_end(struct file *file, struct address_space *mapping,
diff --git a/fs/gfs2/aops.h b/fs/gfs2/aops.h
new file mode 100644
index 000000000000..976bb32dd405
--- /dev/null
+++ b/fs/gfs2/aops.h
@@ -0,0 +1,22 @@
+/*
+ * Copyright (C) 2017 Red Hat, Inc. All rights reserved.
+ *
+ * This copyrighted material is made available to anyone wishing to use,
+ * modify, copy, or redistribute it subject to the terms and conditions
+ * of the GNU General Public License version 2.
+ */
+
+#ifndef __AOPS_DOT_H__
+#define __AOPS_DOT_H__
+
+#include "incore.h"
+
+extern int stuffed_readpage(struct gfs2_inode *ip, struct page *page);
+extern int gfs2_stuffed_write_end(struct inode *inode, struct buffer_head *dibh,
+ loff_t pos, unsigned copied,
+ struct page *page);
+extern void adjust_fs_space(struct inode *inode);
+extern void gfs2_page_add_databufs(struct gfs2_inode *ip, struct page *page,
+ unsigned int from, unsigned int len);
+
+#endif /* __AOPS_DOT_H__ */
diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 59a78056530a..6d4ee6e5c474 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -28,6 +28,7 @@
#include "trans.h"
#include "dir.h"
#include "util.h"
+#include "aops.h"
#include "trace_gfs2.h"
/* This doesn't need to be that large as max 64 bit pointers in a 4k
@@ -41,6 +42,8 @@ struct metapath {
int mp_aheight; /* actual height (lookup height) */
};
+static int punch_hole(struct gfs2_inode *ip, u64 offset, u64 length);
+
/**
* gfs2_unstuffer_page - unstuff a stuffed inode into a block cached by a page
* @ip: the inode
@@ -389,7 +392,7 @@ static int fillup_metapath(struct gfs2_inode *ip, struct metapath *mp, int h)
return mp->mp_aheight - x - 1;
}
-static inline void release_metapath(struct metapath *mp)
+static void release_metapath(struct metapath *mp)
{
int i;
@@ -397,6 +400,7 @@ static inline void release_metapath(struct metapath *mp)
if (mp->mp_bh[i] == NULL)
break;
brelse(mp->mp_bh[i]);
+ mp->mp_bh[i] = NULL;
}
}
@@ -609,11 +613,13 @@ enum alloc_state {
* ii) Indirect blocks to fill in lower part of the metadata tree
* iii) Data blocks
*
- * The function is in two parts. The first part works out the total
- * number of blocks which we need. The second part does the actual
- * allocation asking for an extent at a time (if enough contiguous free
- * blocks are available, there will only be one request per bmap call)
- * and uses the state machine to initialise the blocks in order.
+ * This function is called after gfs2_iomap_get, which works out the
+ * total number of blocks which we need via gfs2_alloc_size.
+ *
+ * We then do the actual allocation asking for an extent at a time (if
+ * enough contiguous free blocks are available, there will only be one
+ * allocation request per call) and use the state machine to initialise
+ * the blocks in order.
*
* Right now, this function will allocate at most one indirect block
* worth of data -- with a default block size of 4K, that's slightly
@@ -633,39 +639,26 @@ static int gfs2_iomap_alloc(struct inode *inode, struct iomap *iomap,
struct buffer_head *dibh = mp->mp_bh[0];
u64 bn;
unsigned n, i, blks, alloced = 0, iblks = 0, branch_start = 0;
- unsigned dblks = 0;
- unsigned ptrs_per_blk;
+ size_t dblks = iomap->length >> inode->i_blkbits;
const unsigned end_of_metadata = mp->mp_fheight - 1;
int ret;
enum alloc_state state;
__be64 *ptr;
__be64 zero_bn = 0;
- size_t maxlen = iomap->length >> inode->i_blkbits;
BUG_ON(mp->mp_aheight < 1);
BUG_ON(dibh == NULL);
+ BUG_ON(dblks < 1);
gfs2_trans_add_meta(ip->i_gl, dibh);
down_write(&ip->i_rw_mutex);
if (mp->mp_fheight == mp->mp_aheight) {
- struct buffer_head *bh;
- int eob;
-
- /* Bottom indirect block exists, find unalloced extent size */
- ptr = metapointer(end_of_metadata, mp);
- bh = mp->mp_bh[end_of_metadata];
- dblks = gfs2_extent_length(bh->b_data, bh->b_size, ptr,
- maxlen, &eob);
- BUG_ON(dblks < 1);
+ /* Bottom indirect block exists */
state = ALLOC_DATA;
} else {
/* Need to allocate indirect blocks */
- ptrs_per_blk = mp->mp_fheight > 1 ? sdp->sd_inptrs :
- sdp->sd_diptrs;
- dblks = min(maxlen, (size_t)(ptrs_per_blk -
- mp->mp_list[end_of_metadata]));
if (mp->mp_fheight == ip->i_height) {
/* Writing into existing tree, extend tree down */
iblks = mp->mp_fheight - mp->mp_aheight;
@@ -750,6 +743,7 @@ static int gfs2_iomap_alloc(struct inode *inode, struct iomap *iomap,
}
} while (iomap->addr == IOMAP_NULL_ADDR);
+ iomap->type = IOMAP_MAPPED;
iomap->length = (u64)dblks << inode->i_blkbits;
ip->i_height = mp->mp_fheight;
gfs2_add_inode_blocks(&ip->i_inode, alloced);
@@ -759,18 +753,63 @@ static int gfs2_iomap_alloc(struct inode *inode, struct iomap *iomap,
return ret;
}
-static void gfs2_stuffed_iomap(struct inode *inode, struct iomap *iomap)
+static void gfs2_stuffed_iomap(struct inode *inode, struct iomap *iomap,
+ u64 length)
{
struct gfs2_inode *ip = GFS2_I(inode);
iomap->addr = (ip->i_no_addr << inode->i_blkbits) +
sizeof(struct gfs2_dinode);
iomap->offset = 0;
- iomap->length = i_size_read(inode);
+ iomap->length = length;
iomap->type = IOMAP_INLINE;
iomap->flags = 0;
}
+/**
+ * gfs2_alloc_size - Compute the maximum allocation size
+ * @inode: The inode
+ * @mp: The metapath
+ * @size: Requested size in blocks
+ *
+ * Compute the maximum size of the next allocation at @mp.
+ *
+ * Returns: size in blocks
+ */
+static u64 gfs2_alloc_size(struct inode *inode, struct metapath *mp, u64 size)
+{
+ struct gfs2_inode *ip = GFS2_I(inode);
+ struct gfs2_sbd *sdp = GFS2_SB(inode);
+ const __be64 *first, *ptr, *end;
+
+ /*
+ * For writes to stuffed files, this function is called twice via
+ * gfs2_iomap_get, before and after unstuffing. The size we return the
+ * first time needs to be large enough to get the reservation and
+ * allocation sizes right. The size we return the second time must
+ * be exact or else gfs2_iomap_alloc won't do the right thing.
+ */
+
+ if (gfs2_is_stuffed(ip) || mp->mp_fheight != mp->mp_aheight) {
+ unsigned maxsize = mp->mp_fheight > 1 ?
+ sdp->sd_inptrs : sdp->sd_diptrs;
+ maxsize -= mp->mp_list[mp->mp_fheight - 1];
+ if (size > maxsize)
+ size = maxsize;
+ return size;
+ }
+
+ first = metapointer(ip->i_height - 1, mp);
+ end = metaend(ip->i_height - 1, mp);
+ if (end - first > size)
+ end = first + size;
+ for (ptr = first; ptr < end; ptr++) {
+ if (*ptr)
+ break;
+ }
+ return ptr - first;
+}
+
/**
* gfs2_iomap_get - Map blocks from an inode to disk blocks
* @inode: The inode
@@ -804,10 +843,15 @@ static int gfs2_iomap_get(struct inode *inode, loff_t pos, loff_t length,
if (flags & IOMAP_REPORT) {
if (pos >= i_size_read(inode))
return -ENOENT;
- gfs2_stuffed_iomap(inode, iomap);
+ gfs2_stuffed_iomap(inode, iomap,
+ i_size_read(inode));
+ return 0;
+ }
+ if (pos + length <= gfs2_max_stuffed_size(ip)) {
+ gfs2_stuffed_iomap(inode, iomap,
+ gfs2_max_stuffed_size(ip));
return 0;
}
- BUG_ON(!(flags & IOMAP_WRITE));
}
lblock = pos >> inode->i_blkbits;
iomap->offset = lblock << inode->i_blkbits;
@@ -867,10 +911,160 @@ static int gfs2_iomap_get(struct inode *inode, loff_t pos, loff_t length,
ret = gfs2_hole_size(inode, lblock, len, mp, iomap);
else
iomap->length = size - pos;
+ } else if (flags & IOMAP_WRITE) {
+ u64 size;
+ size = gfs2_alloc_size(inode, mp, len) << inode->i_blkbits;
+ if (size < iomap->length)
+ iomap->length = size;
+ } else {
+ if (height == ip->i_height)
+ ret = gfs2_hole_size(inode, lblock, len, mp, iomap);
}
goto out;
}
+static int gfs2_write_lock(struct inode *inode)
+{
+ struct gfs2_inode *ip = GFS2_I(inode);
+ struct gfs2_sbd *sdp = GFS2_SB(inode);
+ int error;
+
+ gfs2_holder_init(ip->i_gl, LM_ST_EXCLUSIVE, 0, &ip->i_gh);
+ error = gfs2_glock_nq(&ip->i_gh);
+ if (error)
+ goto out_uninit;
+ if (&ip->i_inode == sdp->sd_rindex) {
+ struct gfs2_inode *m_ip = GFS2_I(sdp->sd_statfs_inode);
+
+ error = gfs2_glock_nq_init(m_ip->i_gl, LM_ST_EXCLUSIVE,
+ GL_NOCACHE, &m_ip->i_gh);
+ if (error)
+ goto out_unlock;
+ }
+ return 0;
+
+out_unlock:
+ gfs2_glock_dq(&ip->i_gh);
+out_uninit:
+ gfs2_holder_uninit(&ip->i_gh);
+ return error;
+}
+
+static void gfs2_write_unlock(struct inode *inode)
+{
+ struct gfs2_inode *ip = GFS2_I(inode);
+ struct gfs2_sbd *sdp = GFS2_SB(inode);
+
+ if (&ip->i_inode == sdp->sd_rindex) {
+ struct gfs2_inode *m_ip = GFS2_I(sdp->sd_statfs_inode);
+
+ gfs2_glock_dq_uninit(&m_ip->i_gh);
+ }
+ gfs2_glock_dq_uninit(&ip->i_gh);
+}
+
+static int gfs2_iomap_begin_write(struct inode *inode, loff_t pos, loff_t length,
+ unsigned flags, struct iomap *iomap)
+{
+ struct metapath mp = { .mp_aheight = 1, };
+ struct gfs2_inode *ip = GFS2_I(inode);
+ struct gfs2_sbd *sdp = GFS2_SB(inode);
+ unsigned int data_blocks = 0, ind_blocks = 0, rblocks;
+ bool unstuff, alloc_required;
+ struct buffer_head *dibh = NULL;
+ int ret;
+
+ ret = gfs2_write_lock(inode);
+ if (ret)
+ return ret;
+
+ unstuff = gfs2_is_stuffed(ip) &&
+ pos + length > gfs2_max_stuffed_size(ip);
+
+ ret = gfs2_iomap_get(inode, pos, length, flags, iomap, &mp);
+ if (ret)
+ goto out_release;
+
+ if (iomap->type == IOMAP_INLINE) {
+ ret = gfs2_meta_inode_buffer(ip, &dibh);
+ if (ret)
+ goto out_release;
+ iomap->inline_data = dibh->b_data + sizeof(struct gfs2_dinode);
+ /* dibh reference put in gfs2_iomap_end */
+ }
+
+ alloc_required = unstuff || iomap->type == IOMAP_HOLE;
+
+ if (alloc_required || gfs2_is_jdata(ip))
+ gfs2_write_calc_reserv(ip, iomap->length, &data_blocks, &ind_blocks);
+
+ if (alloc_required) {
+ struct gfs2_alloc_parms ap = { .target = data_blocks + ind_blocks };
+ ret = gfs2_quota_lock_check(ip, &ap);
+ if (ret)
+ goto out_release;
+
+ ret = gfs2_inplace_reserve(ip, &ap);
+ if (ret)
+ goto out_qunlock;
+ }
+
+ rblocks = RES_DINODE + ind_blocks;
+ if (gfs2_is_jdata(ip))
+ rblocks += data_blocks;
+ if (ind_blocks || data_blocks)
+ rblocks += RES_STATFS + RES_QUOTA;
+ if (inode == sdp->sd_rindex)
+ rblocks += 2 * RES_STATFS;
+ if (alloc_required)
+ rblocks += gfs2_rg_blocks(ip, data_blocks + ind_blocks);
+
+ ret = gfs2_trans_begin(sdp, rblocks, iomap->length >> inode->i_blkbits);
+ if (ret)
+ goto out_trans_fail;
+
+ if (unstuff) {
+ ret = gfs2_unstuff_dinode(ip, NULL);
+ if (ret)
+ goto out_trans_end;
+ release_metapath(&mp);
+ ret = gfs2_iomap_get(inode, iomap->offset, iomap->length,
+ flags, iomap, &mp);
+ if (ret)
+ goto out_trans_end;
+ }
+
+ if (iomap->type == IOMAP_HOLE) {
+ ret = gfs2_iomap_alloc(inode, iomap, flags, &mp);
+ if (ret) {
+ gfs2_trans_end(sdp);
+ if (alloc_required)
+ gfs2_inplace_release(ip);
+ punch_hole(ip, iomap->offset, iomap->length);
+ goto out_qunlock;
+ }
+ }
+ release_metapath(&mp);
+ return 0;
+
+out_trans_end:
+ gfs2_trans_end(sdp);
+out_trans_fail:
+ if (alloc_required)
+ gfs2_inplace_release(ip);
+out_qunlock:
+ if (alloc_required)
+ gfs2_quota_unlock(ip);
+out_release:
+ release_metapath(&mp);
+ if (dibh) {
+ brelse(dibh);
+ iomap->inline_data = NULL;
+ }
+ gfs2_write_unlock(inode);
+ return ret;
+}
+
static int gfs2_iomap_begin(struct inode *inode, loff_t pos, loff_t length,
unsigned flags, struct iomap *iomap)
{
@@ -880,10 +1074,7 @@ static int gfs2_iomap_begin(struct inode *inode, loff_t pos, loff_t length,
trace_gfs2_iomap_start(ip, pos, length, flags);
if (flags & IOMAP_WRITE) {
- ret = gfs2_iomap_get(inode, pos, length, flags, iomap, &mp);
- if (!ret && iomap->type == IOMAP_HOLE)
- ret = gfs2_iomap_alloc(inode, iomap, flags, &mp);
- release_metapath(&mp);
+ ret = gfs2_iomap_begin_write(inode, pos, length, flags, iomap);
} else {
ret = gfs2_iomap_get(inode, pos, length, flags, iomap, &mp);
release_metapath(&mp);
@@ -892,8 +1083,85 @@ static int gfs2_iomap_begin(struct inode *inode, loff_t pos, loff_t length,
return ret;
}
+static int gfs2_iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
+ unsigned copied, struct page *page,
+ struct iomap *iomap)
+{
+ struct gfs2_inode *ip = GFS2_I(inode);
+
+ if (gfs2_is_jdata(ip) && iomap->type != IOMAP_INLINE)
+ gfs2_page_add_databufs(ip, page, offset_in_page(pos), len);
+
+ return iomap_write_end(inode, pos, len, copied, page, iomap);
+}
+
+static int gfs2_iomap_end(struct inode *inode, loff_t pos, loff_t length,
+ ssize_t written, unsigned flags, struct iomap *iomap)
+{
+ struct gfs2_inode *ip = GFS2_I(inode);
+ struct gfs2_sbd *sdp = GFS2_SB(inode);
+ struct gfs2_trans *tr = current->journal_info;
+
+ if (!(flags & IOMAP_WRITE))
+ return 0;
+
+ if (iomap->type == IOMAP_INLINE) {
+ struct buffer_head *dibh;
+
+ dibh = gfs2_getbuf(ip->i_gl, ip->i_no_addr, NO_CREATE);
+ if (likely(dibh)) {
+ gfs2_trans_add_meta(ip->i_gl, dibh);
+ put_bh(dibh); /* reference taken in gfs2_iomap_begin_write */
+ brelse(dibh); /* reference from gfs2_getbuf above */
+ }
+ iomap->inline_data = NULL;
+ goto out;
+ }
+
+ gfs2_ordered_add_inode(ip);
+
+ if (tr->tr_num_buf_new)
+ __mark_inode_dirty(inode, I_DIRTY_DATASYNC);
+ else {
+ struct buffer_head *dibh;
+
+ dibh = gfs2_getbuf(ip->i_gl, ip->i_no_addr, NO_CREATE);
+ if (likely(dibh)) {
+ gfs2_trans_add_meta(ip->i_gl, dibh);
+ brelse(dibh);
+ }
+ }
+
+out:
+ if (inode == sdp->sd_rindex) {
+ adjust_fs_space(inode);
+ sdp->sd_rindex_uptodate = 0;
+ }
+
+ gfs2_trans_end(sdp);
+ gfs2_inplace_release(ip);
+
+ if (length != written && (iomap->flags & IOMAP_F_NEW)) {
+ /* Deallocate blocks that were just allocated. */
+ loff_t blockmask = i_blocksize(inode) - 1;
+ loff_t end = (pos + length) & ~blockmask;
+ pos = (pos + written + blockmask) & ~blockmask;
+ if (pos < end) {
+ truncate_pagecache_range(inode, pos, end - 1);
+ punch_hole(ip, pos, end - pos);
+ }
+ }
+
+ if (ip->i_qadata && ip->i_qadata->qa_qd_num)
+ gfs2_quota_unlock(ip);
+ gfs2_write_unlock(inode);
+ return 0;
+}
+
const struct iomap_ops gfs2_iomap_ops = {
.iomap_begin = gfs2_iomap_begin,
+ .write_end = gfs2_iomap_write_end,
+ .iomap_end = gfs2_iomap_end,
};
/**
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 7137db7b0119..16dd395479a5 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -26,10 +26,12 @@
#include <linux/dlm.h>
#include <linux/dlm_plock.h>
#include <linux/delay.h>
+#include <linux/backing-dev.h>
#include "gfs2.h"
#include "incore.h"
#include "bmap.h"
+#include "aops.h"
#include "dir.h"
#include "glock.h"
#include "glops.h"
@@ -691,9 +693,7 @@ static int gfs2_fsync(struct file *file, loff_t start, loff_t end,
/**
* gfs2_file_write_iter - Perform a write to a file
* @iocb: The io context
- * @iov: The data to write
- * @nr_segs: Number of @iov segments
- * @pos: The file position
+ * @from: The data to write
*
* We have to do a lock/unlock here to refresh the inode size for
* O_APPEND writes, otherwise we can land up writing at the wrong
@@ -705,8 +705,9 @@ static int gfs2_fsync(struct file *file, loff_t start, loff_t end,
static ssize_t gfs2_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
{
struct file *file = iocb->ki_filp;
- struct gfs2_inode *ip = GFS2_I(file_inode(file));
- int ret;
+ struct inode *inode = file_inode(file);
+ struct gfs2_inode *ip = GFS2_I(inode);
+ ssize_t ret;
ret = gfs2_rsqa_alloc(ip);
if (ret)
@@ -723,7 +724,38 @@ static ssize_t gfs2_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
gfs2_glock_dq_uninit(&gh);
}
- return generic_file_write_iter(iocb, from);
+ if (iocb->ki_flags & IOCB_DIRECT)
+ return generic_file_write_iter(iocb, from);
+
+ inode_lock(inode);
+ ret = generic_write_checks(iocb, from);
+ if (ret <= 0)
+ goto out;
+
+ /* We can write back this queue in page reclaim */
+ current->backing_dev_info = inode_to_bdi(inode);
+
+ ret = file_remove_privs(file);
+ if (ret)
+ goto out2;
+
+ ret = file_update_time(file);
+ if (ret)
+ goto out2;
+
+ ret = iomap_file_buffered_write(iocb, from, &gfs2_iomap_ops);
+
+out2:
+ current->backing_dev_info = NULL;
+out:
+ inode_unlock(inode);
+ if (likely(ret > 0)) {
+ iocb->ki_pos += ret;
+
+ /* Handle various SYNC-type writes */
+ ret = generic_write_sync(iocb, ret);
+ }
+ return ret;
}
static int fallocate_chunk(struct inode *inode, loff_t offset, loff_t len,
@@ -754,7 +786,7 @@ static int fallocate_chunk(struct inode *inode, loff_t offset, loff_t len,
if (error)
goto out;
offset = iomap.offset + iomap.length;
- if (iomap.type != IOMAP_HOLE)
+ if (!(iomap.flags & IOMAP_F_NEW))
continue;
error = sb_issue_zeroout(sb, iomap.addr >> inode->i_blkbits,
iomap.length >> inode->i_blkbits,
--
2.17.0
* [PATCH v6 7/9] gfs2: gfs2_extent_length cleanup
2018-06-02 9:57 ` [Cluster-devel] " Andreas Gruenbacher
@ 2018-06-02 9:57 ` Andreas Gruenbacher
1 sibling, 0 replies; 36+ messages in thread
From: Andreas Gruenbacher @ 2018-06-02 9:57 UTC (permalink / raw)
To: cluster-devel, Christoph Hellwig; +Cc: linux-fsdevel, Andreas Gruenbacher
Now that gfs2_extent_length is no longer used for determining the size
of a hole and is always called with an upper size limit, the function
can be simplified.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
fs/gfs2/bmap.c | 24 ++++++++----------------
1 file changed, 8 insertions(+), 16 deletions(-)
diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 6d4ee6e5c474..022c21643129 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -406,22 +406,17 @@ static void release_metapath(struct metapath *mp)
/**
* gfs2_extent_length - Returns length of an extent of blocks
- * @start: Start of the buffer
- * @len: Length of the buffer in bytes
- * @ptr: Current position in the buffer
- * @limit: Max extent length to return (0 = unlimited)
+ * @bh: The metadata block
+ * @ptr: Current position in @bh
+ * @limit: Max extent length to return
* @eob: Set to 1 if we hit "end of block"
*
- * If the first block is zero (unallocated) it will return the number of
- * unallocated blocks in the extent, otherwise it will return the number
- * of contiguous blocks in the extent.
- *
* Returns: The length of the extent (minimum of one block)
*/
-static inline unsigned int gfs2_extent_length(void *start, unsigned int len, __be64 *ptr, size_t limit, int *eob)
+static inline unsigned int gfs2_extent_length(struct buffer_head *bh, __be64 *ptr, size_t limit, int *eob)
{
- const __be64 *end = (start + len);
+ const __be64 *end = (__be64 *)(bh->b_data + bh->b_size);
const __be64 *first = ptr;
u64 d = be64_to_cpu(*ptr);
@@ -430,14 +425,11 @@ static inline unsigned int gfs2_extent_length(void *start, unsigned int len, __b
ptr++;
if (ptr >= end)
break;
- if (limit && --limit == 0)
- break;
- if (d)
- d++;
+ d++;
} while(be64_to_cpu(*ptr) == d);
if (ptr >= end)
*eob = 1;
- return (ptr - first);
+ return ptr - first;
}
typedef const __be64 *(*gfs2_metadata_walker)(
@@ -883,7 +875,7 @@ static int gfs2_iomap_get(struct inode *inode, loff_t pos, loff_t length,
goto do_alloc;
bh = mp->mp_bh[ip->i_height - 1];
- len = gfs2_extent_length(bh->b_data, bh->b_size, ptr, len, &eob);
+ len = gfs2_extent_length(bh, ptr, len, &eob);
iomap->addr = be64_to_cpu(*ptr) << inode->i_blkbits;
iomap->length = len << inode->i_blkbits;
--
2.17.0
* [PATCH v6 8/9] gfs2: iomap direct I/O support
2018-06-02 9:57 ` [Cluster-devel] " Andreas Gruenbacher
@ 2018-06-02 9:57 ` Andreas Gruenbacher
-1 siblings, 0 replies; 36+ messages in thread
From: Andreas Gruenbacher @ 2018-06-02 9:57 UTC (permalink / raw)
To: cluster-devel, Christoph Hellwig; +Cc: linux-fsdevel, Andreas Gruenbacher
The page unmapping previously done in gfs2_direct_IO is now done
generically in iomap_dio_rw.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
fs/gfs2/aops.c | 100 +----------------------------------
fs/gfs2/bmap.c | 15 +++++-
fs/gfs2/file.c | 138 +++++++++++++++++++++++++++++++++++++++++++++----
3 files changed, 143 insertions(+), 110 deletions(-)
diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 3d9633175aa8..b0a219fae162 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -84,12 +84,6 @@ static int gfs2_get_block_noalloc(struct inode *inode, sector_t lblock,
return 0;
}
-static int gfs2_get_block_direct(struct inode *inode, sector_t lblock,
- struct buffer_head *bh_result, int create)
-{
- return gfs2_block_map(inode, lblock, bh_result, 0);
-}
-
/**
* gfs2_writepage_common - Common bits of writepage
* @page: The page to be written
@@ -1021,96 +1015,6 @@ static void gfs2_invalidatepage(struct page *page, unsigned int offset,
try_to_release_page(page, 0);
}
-/**
- * gfs2_ok_for_dio - check that dio is valid on this file
- * @ip: The inode
- * @offset: The offset at which we are reading or writing
- *
- * Returns: 0 (to ignore the i/o request and thus fall back to buffered i/o)
- * 1 (to accept the i/o request)
- */
-static int gfs2_ok_for_dio(struct gfs2_inode *ip, loff_t offset)
-{
- /*
- * Should we return an error here? I can't see that O_DIRECT for
- * a stuffed file makes any sense. For now we'll silently fall
- * back to buffered I/O
- */
- if (gfs2_is_stuffed(ip))
- return 0;
-
- if (offset >= i_size_read(&ip->i_inode))
- return 0;
- return 1;
-}
-
-
-
-static ssize_t gfs2_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
-{
- struct file *file = iocb->ki_filp;
- struct inode *inode = file->f_mapping->host;
- struct address_space *mapping = inode->i_mapping;
- struct gfs2_inode *ip = GFS2_I(inode);
- loff_t offset = iocb->ki_pos;
- struct gfs2_holder gh;
- int rv;
-
- /*
- * Deferred lock, even if its a write, since we do no allocation
- * on this path. All we need change is atime, and this lock mode
- * ensures that other nodes have flushed their buffered read caches
- * (i.e. their page cache entries for this inode). We do not,
- * unfortunately have the option of only flushing a range like
- * the VFS does.
- */
- gfs2_holder_init(ip->i_gl, LM_ST_DEFERRED, 0, &gh);
- rv = gfs2_glock_nq(&gh);
- if (rv)
- goto out_uninit;
- rv = gfs2_ok_for_dio(ip, offset);
- if (rv != 1)
- goto out; /* dio not valid, fall back to buffered i/o */
-
- /*
- * Now since we are holding a deferred (CW) lock at this point, you
- * might be wondering why this is ever needed. There is a case however
- * where we've granted a deferred local lock against a cached exclusive
- * glock. That is ok provided all granted local locks are deferred, but
- * it also means that it is possible to encounter pages which are
- * cached and possibly also mapped. So here we check for that and sort
- * them out ahead of the dio. The glock state machine will take care of
- * everything else.
- *
- * If in fact the cached glock state (gl->gl_state) is deferred (CW) in
- * the first place, mapping->nr_pages will always be zero.
- */
- if (mapping->nrpages) {
- loff_t lstart = offset & ~(PAGE_SIZE - 1);
- loff_t len = iov_iter_count(iter);
- loff_t end = PAGE_ALIGN(offset + len) - 1;
-
- rv = 0;
- if (len == 0)
- goto out;
- if (test_and_clear_bit(GIF_SW_PAGED, &ip->i_flags))
- unmap_shared_mapping_range(ip->i_inode.i_mapping, offset, len);
- rv = filemap_write_and_wait_range(mapping, lstart, end);
- if (rv)
- goto out;
- if (iov_iter_rw(iter) == WRITE)
- truncate_inode_pages_range(mapping, lstart, end);
- }
-
- rv = __blockdev_direct_IO(iocb, inode, inode->i_sb->s_bdev, iter,
- gfs2_get_block_direct, NULL, NULL, 0);
-out:
- gfs2_glock_dq(&gh);
-out_uninit:
- gfs2_holder_uninit(&gh);
- return rv;
-}
-
/**
* gfs2_releasepage - free the metadata associated with a page
* @page: the page that's being released
@@ -1191,7 +1095,7 @@ static const struct address_space_operations gfs2_writeback_aops = {
.bmap = gfs2_bmap,
.invalidatepage = gfs2_invalidatepage,
.releasepage = gfs2_releasepage,
- .direct_IO = gfs2_direct_IO,
+ .direct_IO = noop_direct_IO,
.migratepage = buffer_migrate_page,
.is_partially_uptodate = block_is_partially_uptodate,
.error_remove_page = generic_error_remove_page,
@@ -1208,7 +1112,7 @@ static const struct address_space_operations gfs2_ordered_aops = {
.bmap = gfs2_bmap,
.invalidatepage = gfs2_invalidatepage,
.releasepage = gfs2_releasepage,
- .direct_IO = gfs2_direct_IO,
+ .direct_IO = noop_direct_IO,
.migratepage = buffer_migrate_page,
.is_partially_uptodate = block_is_partially_uptodate,
.error_remove_page = generic_error_remove_page,
diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 022c21643129..e2ddf498da9d 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -905,6 +905,10 @@ static int gfs2_iomap_get(struct inode *inode, loff_t pos, loff_t length,
iomap->length = size - pos;
} else if (flags & IOMAP_WRITE) {
u64 size;
+
+ if (flags & IOMAP_DIRECT)
+ goto out;
+
size = gfs2_alloc_size(inode, mp, len) << inode->i_blkbits;
if (size < iomap->length)
iomap->length = size;
@@ -1066,7 +1070,14 @@ static int gfs2_iomap_begin(struct inode *inode, loff_t pos, loff_t length,
trace_gfs2_iomap_start(ip, pos, length, flags);
if (flags & IOMAP_WRITE) {
- ret = gfs2_iomap_begin_write(inode, pos, length, flags, iomap);
+ if (flags & IOMAP_DIRECT) {
+ ret = gfs2_iomap_get(inode, pos, length, flags, iomap, &mp);
+ release_metapath(&mp);
+ if (iomap->type != IOMAP_MAPPED)
+ ret = -ENOTBLK;
+ } else {
+ ret = gfs2_iomap_begin_write(inode, pos, length, flags, iomap);
+ }
} else {
ret = gfs2_iomap_get(inode, pos, length, flags, iomap, &mp);
release_metapath(&mp);
@@ -1094,7 +1105,7 @@ static int gfs2_iomap_end(struct inode *inode, loff_t pos, loff_t length,
struct gfs2_sbd *sdp = GFS2_SB(inode);
struct gfs2_trans *tr = current->journal_info;
- if (!(flags & IOMAP_WRITE))
+ if ((flags & (IOMAP_WRITE | IOMAP_DIRECT)) != IOMAP_WRITE)
return 0;
if (iomap->type == IOMAP_INLINE) {
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 16dd395479a5..8de92708f18b 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -690,6 +690,91 @@ static int gfs2_fsync(struct file *file, loff_t start, loff_t end,
return ret ? ret : ret1;
}
+static ssize_t gfs2_file_direct_read(struct kiocb *iocb, struct iov_iter *to)
+{
+ struct file *file = iocb->ki_filp;
+ struct gfs2_inode *ip = GFS2_I(file->f_mapping->host);
+ size_t count = iov_iter_count(to);
+ struct gfs2_holder gh;
+ ssize_t ret;
+
+ if (!count)
+ return 0; /* skip atime */
+
+ gfs2_holder_init(ip->i_gl, LM_ST_DEFERRED, 0, &gh);
+ ret = gfs2_glock_nq(&gh);
+ if (ret)
+ goto out_uninit;
+
+ /* fall back to buffered I/O for stuffed files */
+ ret = -ENOTBLK;
+ if (gfs2_is_stuffed(ip))
+ goto out;
+
+ ret = iomap_dio_rw(iocb, to, &gfs2_iomap_ops, NULL);
+
+out:
+ gfs2_glock_dq(&gh);
+out_uninit:
+ gfs2_holder_uninit(&gh);
+ return ret;
+}
+
+static ssize_t gfs2_file_direct_write(struct kiocb *iocb, struct iov_iter *from)
+{
+ struct file *file = iocb->ki_filp;
+ struct inode *inode = file->f_mapping->host;
+ struct gfs2_inode *ip = GFS2_I(inode);
+ size_t len = iov_iter_count(from);
+ loff_t offset = iocb->ki_pos;
+ struct gfs2_holder gh;
+ ssize_t ret;
+
+ /*
+ * Deferred lock, even if it's a write, since we do no allocation on
+ * this path. All we need to change is the atime, and this lock mode
+ * ensures that other nodes have flushed their buffered read caches
+ * (i.e. their page cache entries for this inode). We do not,
+ * unfortunately, have the option of only flushing a range like the
+ * VFS does.
+ */
+ gfs2_holder_init(ip->i_gl, LM_ST_DEFERRED, 0, &gh);
+ ret = gfs2_glock_nq(&gh);
+ if (ret)
+ goto out_uninit;
+
+ /* Silently fall back to buffered I/O for stuffed files */
+ if (gfs2_is_stuffed(ip))
+ goto out;
+
+ /* Silently fall back to buffered I/O when writing beyond EOF */
+ if (offset + len > i_size_read(&ip->i_inode))
+ goto out;
+
+ ret = iomap_dio_rw(iocb, from, &gfs2_iomap_ops, NULL);
+
+out:
+ gfs2_glock_dq(&gh);
+out_uninit:
+ gfs2_holder_uninit(&gh);
+ return ret;
+}
+
+static ssize_t gfs2_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
+{
+ ssize_t ret;
+
+ if (iocb->ki_flags & IOCB_DIRECT) {
+ ret = gfs2_file_direct_read(iocb, to);
+ if (likely(ret != -ENOTBLK))
+ goto out;
+ iocb->ki_flags &= ~IOCB_DIRECT;
+ }
+ ret = generic_file_read_iter(iocb, to);
+out:
+ return ret;
+}
+
/**
* gfs2_file_write_iter - Perform a write to a file
* @iocb: The io context
@@ -707,7 +792,7 @@ static ssize_t gfs2_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
struct file *file = iocb->ki_filp;
struct inode *inode = file_inode(file);
struct gfs2_inode *ip = GFS2_I(inode);
- ssize_t ret;
+ ssize_t written = 0, ret;
ret = gfs2_rsqa_alloc(ip);
if (ret)
@@ -724,9 +809,6 @@ static ssize_t gfs2_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
gfs2_glock_dq_uninit(&gh);
}
- if (iocb->ki_flags & IOCB_DIRECT)
- return generic_file_write_iter(iocb, from);
-
inode_lock(inode);
ret = generic_write_checks(iocb, from);
if (ret <= 0)
@@ -743,19 +825,55 @@ static ssize_t gfs2_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
if (ret)
goto out2;
- ret = iomap_file_buffered_write(iocb, from, &gfs2_iomap_ops);
+ if (iocb->ki_flags & IOCB_DIRECT) {
+ struct address_space *mapping = file->f_mapping;
+ loff_t pos, endbyte;
+ ssize_t buffered;
+
+ written = gfs2_file_direct_write(iocb, from);
+ if (written < 0 || !iov_iter_count(from))
+ goto out2;
+
+ ret = iomap_file_buffered_write(iocb, from, &gfs2_iomap_ops);
+ if (unlikely(ret < 0))
+ goto out2;
+ buffered = ret;
+
+ /*
+ * We need to ensure that the page cache pages are written to
+ * disk and invalidated to preserve the expected O_DIRECT
+ * semantics.
+ */
+ pos = iocb->ki_pos;
+ endbyte = pos + buffered - 1;
+ ret = filemap_write_and_wait_range(mapping, pos, endbyte);
+ if (!ret) {
+ iocb->ki_pos += buffered;
+ written += buffered;
+ invalidate_mapping_pages(mapping,
+ pos >> PAGE_SHIFT,
+ endbyte >> PAGE_SHIFT);
+ } else {
+ /*
+ * We don't know how much we wrote, so just return
+ * the number of bytes which were direct-written
+ */
+ }
+ } else {
+ ret = iomap_file_buffered_write(iocb, from, &gfs2_iomap_ops);
+ if (likely(ret > 0))
+ iocb->ki_pos += ret;
+ }
out2:
current->backing_dev_info = NULL;
out:
inode_unlock(inode);
if (likely(ret > 0)) {
- iocb->ki_pos += ret;
-
/* Handle various SYNC-type writes */
ret = generic_write_sync(iocb, ret);
}
- return ret;
+ return written ? written : ret;
}
static int fallocate_chunk(struct inode *inode, loff_t offset, loff_t len,
@@ -1157,7 +1275,7 @@ static int gfs2_flock(struct file *file, int cmd, struct file_lock *fl)
const struct file_operations gfs2_file_fops = {
.llseek = gfs2_llseek,
- .read_iter = generic_file_read_iter,
+ .read_iter = gfs2_file_read_iter,
.write_iter = gfs2_file_write_iter,
.unlocked_ioctl = gfs2_ioctl,
.mmap = gfs2_mmap,
@@ -1187,7 +1305,7 @@ const struct file_operations gfs2_dir_fops = {
const struct file_operations gfs2_file_fops_nolock = {
.llseek = gfs2_llseek,
- .read_iter = generic_file_read_iter,
+ .read_iter = gfs2_file_read_iter,
.write_iter = gfs2_file_write_iter,
.unlocked_ioctl = gfs2_ioctl,
.mmap = gfs2_mmap,
--
2.17.0
^ permalink raw reply related [flat|nested] 36+ messages in thread
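The rewritten gfs2_file_write_iter() performs the direct write first, then falls back to a buffered write for any remainder (for example, the portion of the request beyond EOF), and returns the combined byte count unless the buffered tail failed before anything was written. A user-space sketch of that return-value bookkeeping (combine_write_results is a hypothetical helper, not a kernel function):

```c
#include <assert.h>

/*
 * Model of the O_DIRECT accounting in the new write path:
 *  - direct_written < 0: the direct write failed outright, report it;
 *  - buffered_ret < 0:   the buffered tail failed; report the bytes
 *                        already direct-written, or the error if none;
 *  - otherwise:          both parts succeeded, return the sum.
 */
static long combine_write_results(long direct_written, long buffered_ret)
{
    long written = direct_written;

    if (written < 0)
        return written;
    if (buffered_ret < 0)
        return written ? written : buffered_ret;
    return written + buffered_ret;
}
```

This matches the patch's "return written ? written : ret" exit, which is also why iocb->ki_pos is only advanced after filemap_write_and_wait_range() succeeds for the buffered tail.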
* [PATCH v6 9/9] gfs2: Remove gfs2_write_{begin,end}
2018-06-02 9:57 ` [Cluster-devel] " Andreas Gruenbacher
@ 2018-06-02 9:57 ` Andreas Gruenbacher
0 siblings, 0 replies; 36+ messages in thread
From: Andreas Gruenbacher @ 2018-06-02 9:57 UTC (permalink / raw)
To: cluster-devel, Christoph Hellwig; +Cc: linux-fsdevel, Andreas Gruenbacher
Now that generic_file_write_iter is no longer used, there are no
remaining users of these address space operations.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
fs/gfs2/aops.c | 210 -------------------------------------------------
1 file changed, 210 deletions(-)
diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index b0a219fae162..cc80fd71f3dd 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -639,132 +639,6 @@ static int gfs2_readpages(struct file *file, struct address_space *mapping,
return ret;
}
-/**
- * gfs2_write_begin - Begin to write to a file
- * @file: The file to write to
- * @mapping: The mapping in which to write
- * @pos: The file offset at which to start writing
- * @len: Length of the write
- * @flags: Various flags
- * @pagep: Pointer to return the page
- * @fsdata: Pointer to return fs data (unused by GFS2)
- *
- * Returns: errno
- */
-
-static int gfs2_write_begin(struct file *file, struct address_space *mapping,
- loff_t pos, unsigned len, unsigned flags,
- struct page **pagep, void **fsdata)
-{
- struct gfs2_inode *ip = GFS2_I(mapping->host);
- struct gfs2_sbd *sdp = GFS2_SB(mapping->host);
- struct gfs2_inode *m_ip = GFS2_I(sdp->sd_statfs_inode);
- unsigned int data_blocks = 0, ind_blocks = 0, rblocks;
- unsigned requested = 0;
- int alloc_required;
- int error = 0;
- pgoff_t index = pos >> PAGE_SHIFT;
- unsigned from = pos & (PAGE_SIZE - 1);
- struct page *page;
-
- gfs2_holder_init(ip->i_gl, LM_ST_EXCLUSIVE, 0, &ip->i_gh);
- error = gfs2_glock_nq(&ip->i_gh);
- if (unlikely(error))
- goto out_uninit;
- if (&ip->i_inode == sdp->sd_rindex) {
- error = gfs2_glock_nq_init(m_ip->i_gl, LM_ST_EXCLUSIVE,
- GL_NOCACHE, &m_ip->i_gh);
- if (unlikely(error)) {
- gfs2_glock_dq(&ip->i_gh);
- goto out_uninit;
- }
- }
-
- alloc_required = gfs2_write_alloc_required(ip, pos, len);
-
- if (alloc_required || gfs2_is_jdata(ip))
- gfs2_write_calc_reserv(ip, len, &data_blocks, &ind_blocks);
-
- if (alloc_required) {
- struct gfs2_alloc_parms ap = { .aflags = 0, };
- requested = data_blocks + ind_blocks;
- ap.target = requested;
- error = gfs2_quota_lock_check(ip, &ap);
- if (error)
- goto out_unlock;
-
- error = gfs2_inplace_reserve(ip, &ap);
- if (error)
- goto out_qunlock;
- }
-
- rblocks = RES_DINODE + ind_blocks;
- if (gfs2_is_jdata(ip))
- rblocks += data_blocks ? data_blocks : 1;
- if (ind_blocks || data_blocks)
- rblocks += RES_STATFS + RES_QUOTA;
- if (&ip->i_inode == sdp->sd_rindex)
- rblocks += 2 * RES_STATFS;
- if (alloc_required)
- rblocks += gfs2_rg_blocks(ip, requested);
-
- error = gfs2_trans_begin(sdp, rblocks,
- PAGE_SIZE/sdp->sd_sb.sb_bsize);
- if (error)
- goto out_trans_fail;
-
- error = -ENOMEM;
- flags |= AOP_FLAG_NOFS;
- page = grab_cache_page_write_begin(mapping, index, flags);
- *pagep = page;
- if (unlikely(!page))
- goto out_endtrans;
-
- if (gfs2_is_stuffed(ip)) {
- error = 0;
- if (pos + len > gfs2_max_stuffed_size(ip)) {
- error = gfs2_unstuff_dinode(ip, page);
- if (error == 0)
- goto prepare_write;
- } else if (!PageUptodate(page)) {
- error = stuffed_readpage(ip, page);
- }
- goto out;
- }
-
-prepare_write:
- error = __block_write_begin(page, from, len, gfs2_block_map);
-out:
- if (error == 0)
- return 0;
-
- unlock_page(page);
- put_page(page);
-
- gfs2_trans_end(sdp);
- if (pos + len > ip->i_inode.i_size)
- gfs2_trim_blocks(&ip->i_inode);
- goto out_trans_fail;
-
-out_endtrans:
- gfs2_trans_end(sdp);
-out_trans_fail:
- if (alloc_required) {
- gfs2_inplace_release(ip);
-out_qunlock:
- gfs2_quota_unlock(ip);
- }
-out_unlock:
- if (&ip->i_inode == sdp->sd_rindex) {
- gfs2_glock_dq(&m_ip->i_gh);
- gfs2_holder_uninit(&m_ip->i_gh);
- }
- gfs2_glock_dq(&ip->i_gh);
-out_uninit:
- gfs2_holder_uninit(&ip->i_gh);
- return error;
-}
-
/**
* adjust_fs_space - Adjusts the free space available due to gfs2_grow
* @inode: the rindex inode
@@ -845,84 +719,6 @@ int gfs2_stuffed_write_end(struct inode *inode, struct buffer_head *dibh,
return copied;
}
-/**
- * gfs2_write_end
- * @file: The file to write to
- * @mapping: The address space to write to
- * @pos: The file position
- * @len: The length of the data
- * @copied: How much was actually copied by the VFS
- * @page: The page that has been written
- * @fsdata: The fsdata (unused in GFS2)
- *
- * The main write_end function for GFS2. We just put our locking around the VFS
- * provided functions.
- *
- * Returns: copied bytes or errno
- */
-
-static int gfs2_write_end(struct file *file, struct address_space *mapping,
- loff_t pos, unsigned len, unsigned copied,
- struct page *page, void *fsdata)
-{
- struct inode *inode = page->mapping->host;
- struct gfs2_inode *ip = GFS2_I(inode);
- struct gfs2_sbd *sdp = GFS2_SB(inode);
- struct gfs2_inode *m_ip = GFS2_I(sdp->sd_statfs_inode);
- struct buffer_head *dibh;
- int ret;
- struct gfs2_trans *tr = current->journal_info;
- BUG_ON(!tr);
-
- BUG_ON(gfs2_glock_is_locked_by_me(ip->i_gl) == NULL);
-
- ret = gfs2_meta_inode_buffer(ip, &dibh);
- if (unlikely(ret))
- goto out;
-
- if (gfs2_is_stuffed(ip)) {
- ret = gfs2_stuffed_write_end(inode, dibh, pos, copied, page);
- page = NULL;
- goto out2;
- }
-
- if (gfs2_is_jdata(ip))
- gfs2_page_add_databufs(ip, page, pos & ~PAGE_MASK, len);
- else
- gfs2_ordered_add_inode(ip);
-
- ret = generic_write_end(file, mapping, pos, len, copied, page, fsdata);
- page = NULL;
- if (tr->tr_num_buf_new)
- __mark_inode_dirty(inode, I_DIRTY_DATASYNC);
- else
- gfs2_trans_add_meta(ip->i_gl, dibh);
-
-out2:
- if (inode == sdp->sd_rindex) {
- adjust_fs_space(inode);
- sdp->sd_rindex_uptodate = 0;
- }
-
- brelse(dibh);
-out:
- if (page) {
- unlock_page(page);
- put_page(page);
- }
- gfs2_trans_end(sdp);
- gfs2_inplace_release(ip);
- if (ip->i_qadata && ip->i_qadata->qa_qd_num)
- gfs2_quota_unlock(ip);
- if (inode == sdp->sd_rindex) {
- gfs2_glock_dq(&m_ip->i_gh);
- gfs2_holder_uninit(&m_ip->i_gh);
- }
- gfs2_glock_dq(&ip->i_gh);
- gfs2_holder_uninit(&ip->i_gh);
- return ret;
-}
-
/**
* jdata_set_page_dirty - Page dirtying function
* @page: The page to dirty
@@ -1090,8 +886,6 @@ static const struct address_space_operations gfs2_writeback_aops = {
.writepages = gfs2_writepages,
.readpage = gfs2_readpage,
.readpages = gfs2_readpages,
- .write_begin = gfs2_write_begin,
- .write_end = gfs2_write_end,
.bmap = gfs2_bmap,
.invalidatepage = gfs2_invalidatepage,
.releasepage = gfs2_releasepage,
@@ -1106,8 +900,6 @@ static const struct address_space_operations gfs2_ordered_aops = {
.writepages = gfs2_writepages,
.readpage = gfs2_readpage,
.readpages = gfs2_readpages,
- .write_begin = gfs2_write_begin,
- .write_end = gfs2_write_end,
.set_page_dirty = __set_page_dirty_buffers,
.bmap = gfs2_bmap,
.invalidatepage = gfs2_invalidatepage,
@@ -1123,8 +915,6 @@ static const struct address_space_operations gfs2_jdata_aops = {
.writepages = gfs2_jdata_writepages,
.readpage = gfs2_readpage,
.readpages = gfs2_readpages,
- .write_begin = gfs2_write_begin,
- .write_end = gfs2_write_end,
.set_page_dirty = jdata_set_page_dirty,
.bmap = gfs2_bmap,
.invalidatepage = gfs2_invalidatepage,
--
2.17.0
^ permalink raw reply related [flat|nested] 36+ messages in thread
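With gfs2_write_{begin,end} gone, buffered writes go through iomap_file_buffered_write(), and the direct I/O paths added earlier in this series use -ENOTBLK as an internal "fall back to buffered I/O" signal. A user-space sketch of the read-side dispatch (fake_direct_read and read_iter are hypothetical stand-ins for gfs2_file_direct_read() and gfs2_file_read_iter(), not the kernel API):

```c
#include <assert.h>
#include <errno.h>

#define IOCB_DIRECT 0x1

/* Stand-in for gfs2_file_direct_read(): stuffed files can't do DIO. */
static long fake_direct_read(int is_stuffed, long count)
{
    return is_stuffed ? -ENOTBLK : count;
}

/*
 * Try direct I/O first when IOCB_DIRECT is set; on -ENOTBLK, clear
 * the flag and silently fall back to the buffered read path.
 */
static long read_iter(int *ki_flags, int is_stuffed, long count)
{
    long ret;

    if (*ki_flags & IOCB_DIRECT) {
        ret = fake_direct_read(is_stuffed, count);
        if (ret != -ENOTBLK)
            return ret;
        *ki_flags &= ~IOCB_DIRECT;   /* fall back to buffered */
    }
    return count;                     /* buffered path */
}
```

Clearing IOCB_DIRECT before falling back keeps the buffered path from re-entering the direct I/O machinery on the same request.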
- * @page: The page that has been written
- * @fsdata: The fsdata (unused in GFS2)
- *
- * The main write_end function for GFS2. We just put our locking around the VFS
- * provided functions.
- *
- * Returns: copied bytes or errno
- */
-
-static int gfs2_write_end(struct file *file, struct address_space *mapping,
- loff_t pos, unsigned len, unsigned copied,
- struct page *page, void *fsdata)
-{
- struct inode *inode = page->mapping->host;
- struct gfs2_inode *ip = GFS2_I(inode);
- struct gfs2_sbd *sdp = GFS2_SB(inode);
- struct gfs2_inode *m_ip = GFS2_I(sdp->sd_statfs_inode);
- struct buffer_head *dibh;
- int ret;
- struct gfs2_trans *tr = current->journal_info;
- BUG_ON(!tr);
-
- BUG_ON(gfs2_glock_is_locked_by_me(ip->i_gl) == NULL);
-
- ret = gfs2_meta_inode_buffer(ip, &dibh);
- if (unlikely(ret))
- goto out;
-
- if (gfs2_is_stuffed(ip)) {
- ret = gfs2_stuffed_write_end(inode, dibh, pos, copied, page);
- page = NULL;
- goto out2;
- }
-
- if (gfs2_is_jdata(ip))
- gfs2_page_add_databufs(ip, page, pos & ~PAGE_MASK, len);
- else
- gfs2_ordered_add_inode(ip);
-
- ret = generic_write_end(file, mapping, pos, len, copied, page, fsdata);
- page = NULL;
- if (tr->tr_num_buf_new)
- __mark_inode_dirty(inode, I_DIRTY_DATASYNC);
- else
- gfs2_trans_add_meta(ip->i_gl, dibh);
-
-out2:
- if (inode == sdp->sd_rindex) {
- adjust_fs_space(inode);
- sdp->sd_rindex_uptodate = 0;
- }
-
- brelse(dibh);
-out:
- if (page) {
- unlock_page(page);
- put_page(page);
- }
- gfs2_trans_end(sdp);
- gfs2_inplace_release(ip);
- if (ip->i_qadata && ip->i_qadata->qa_qd_num)
- gfs2_quota_unlock(ip);
- if (inode == sdp->sd_rindex) {
- gfs2_glock_dq(&m_ip->i_gh);
- gfs2_holder_uninit(&m_ip->i_gh);
- }
- gfs2_glock_dq(&ip->i_gh);
- gfs2_holder_uninit(&ip->i_gh);
- return ret;
-}
-
/**
* jdata_set_page_dirty - Page dirtying function
* @page: The page to dirty
@@ -1090,8 +886,6 @@ static const struct address_space_operations gfs2_writeback_aops = {
.writepages = gfs2_writepages,
.readpage = gfs2_readpage,
.readpages = gfs2_readpages,
- .write_begin = gfs2_write_begin,
- .write_end = gfs2_write_end,
.bmap = gfs2_bmap,
.invalidatepage = gfs2_invalidatepage,
.releasepage = gfs2_releasepage,
@@ -1106,8 +900,6 @@ static const struct address_space_operations gfs2_ordered_aops = {
.writepages = gfs2_writepages,
.readpage = gfs2_readpage,
.readpages = gfs2_readpages,
- .write_begin = gfs2_write_begin,
- .write_end = gfs2_write_end,
.set_page_dirty = __set_page_dirty_buffers,
.bmap = gfs2_bmap,
.invalidatepage = gfs2_invalidatepage,
@@ -1123,8 +915,6 @@ static const struct address_space_operations gfs2_jdata_aops = {
.writepages = gfs2_jdata_writepages,
.readpage = gfs2_readpage,
.readpages = gfs2_readpages,
- .write_begin = gfs2_write_begin,
- .write_end = gfs2_write_end,
.set_page_dirty = jdata_set_page_dirty,
.bmap = gfs2_bmap,
.invalidatepage = gfs2_invalidatepage,
--
2.17.0
^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH v6 2/9] iomap: Mark newly allocated buffer heads as new
2018-06-02 9:57 ` [Cluster-devel] " Andreas Gruenbacher
@ 2018-06-02 16:58 ` Christoph Hellwig
-1 siblings, 0 replies; 36+ messages in thread
From: Christoph Hellwig @ 2018-06-02 16:58 UTC (permalink / raw)
To: Andreas Gruenbacher; +Cc: cluster-devel, Christoph Hellwig, linux-fsdevel
> case IOMAP_MAPPED:
> - if (offset >= i_size_read(inode))
> + if ((iomap->flags & IOMAP_F_NEW) || offset >= i_size_read(inode))
Please split the line to stay within 80 characters.
Otherwise looks fine:
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v6 3/9] iomap: Complete partial direct I/O writes synchronously
2018-06-02 9:57 ` [Cluster-devel] " Andreas Gruenbacher
@ 2018-06-02 16:58 ` Christoph Hellwig
-1 siblings, 0 replies; 36+ messages in thread
From: Christoph Hellwig @ 2018-06-02 16:58 UTC (permalink / raw)
To: Andreas Gruenbacher; +Cc: cluster-devel, Christoph Hellwig, linux-fsdevel
Looks fine,
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v6 4/9] iomap: Generic inline data handling
2018-06-02 9:57 ` [Cluster-devel] " Andreas Gruenbacher
@ 2018-06-02 17:04 ` Christoph Hellwig
-1 siblings, 0 replies; 36+ messages in thread
From: Christoph Hellwig @ 2018-06-02 17:04 UTC (permalink / raw)
To: Andreas Gruenbacher; +Cc: cluster-devel, Christoph Hellwig, linux-fsdevel
On Sat, Jun 02, 2018 at 11:57:12AM +0200, Andreas Gruenbacher wrote:
> Add generic inline data handling by adding a pointer to the inline data
> region to struct iomap. When handling a buffered IOMAP_INLINE write,
> iomap_write_begin will copy the current inline data from the inline data
> region into the page cache, and iomap_write_end will copy the changes in
> the page cache back to the inline data region.
This approach looks good. A few comments below:
>
> This doesn't cover inline data reads and direct I/O yet because so far,
> we have no users.
I'm fine with that as a first step, but gfs2 should be able to do
proper direct I/O to inline data and use my new iomap_readpage(s)
easily at least for blocksize == PAGE_SIZE, so this should be added
soon.
> -int generic_write_end(struct file *file, struct address_space *mapping,
> +void __generic_write_end(struct file *file, struct address_space *mapping,
> loff_t pos, unsigned len, unsigned copied,
> - struct page *page, void *fsdata)
> + struct page *page, void *fsdata, bool dirty_inode)
This is going to clash with
http://git.infradead.org/users/hch/xfs.git/commitdiff/2733909d6b40046ce9c7302c2e742c5e993a0108
It should also be a separate prep patch with a good explanation
> iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
> - unsigned copied, struct page *page)
> + unsigned copied, struct page *page, struct iomap *iomap)
Note that I have another patch adding this parameter. I think we'll need
a common iomap base tree for the gfs2 and xfs changes for the next
merge window. I'd be happy to set one up.
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 918f14075702..c61113c71a60 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -47,6 +47,7 @@ struct iomap {
> u64 length; /* length of mapping, bytes */
> u16 type; /* type of mapping */
> u16 flags; /* flags for mapping */
> + void *inline_data; /* inline data buffer */
> struct block_device *bdev; /* block device for I/O */
> struct dax_device *dax_dev; /* dax_dev for dax operations */
> };
Eventually we need to find a way to union the inline_data, bdev and
dax_dev fields, but that can be left to later.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v6 5/9] iomap: Add write_end iomap operation
2018-06-02 9:57 ` [Cluster-devel] " Andreas Gruenbacher
@ 2018-06-02 17:06 ` Christoph Hellwig
-1 siblings, 0 replies; 36+ messages in thread
From: Christoph Hellwig @ 2018-06-02 17:06 UTC (permalink / raw)
To: Andreas Gruenbacher; +Cc: cluster-devel, Christoph Hellwig, linux-fsdevel
On Sat, Jun 02, 2018 at 11:57:13AM +0200, Andreas Gruenbacher wrote:
> Add a write_end operation to struct iomap_ops to provide a way of
> overriding the default behavior of iomap_write_end. This will be used
> for implementing data journaling in gfs2: in the data journaling case,
> pages are written into the journal before being written back to their
> proper on-disk locations.
Please name this page_write_end and make it an optional callout
just for the additional functionality, that is keep the call to
iomap_write_end hardcoded in iomap.c, just call the new method before
it.
> +
> +struct iomap_write_args {
> + const struct iomap_ops *ops;
> + struct iov_iter *iter;
> +};
Also I wonder if we should just pass the iomap_ops directly to the actor
callback.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v6 4/9] iomap: Generic inline data handling
2018-06-02 17:04 ` [Cluster-devel] " Christoph Hellwig
@ 2018-06-04 12:02 ` Andreas Grünbacher
-1 siblings, 0 replies; 36+ messages in thread
From: Andreas Grünbacher @ 2018-06-04 12:02 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Andreas Gruenbacher, cluster-devel, Linux FS-devel Mailing List
2018-06-02 19:04 GMT+02:00 Christoph Hellwig <hch@lst.de>:
> On Sat, Jun 02, 2018 at 11:57:12AM +0200, Andreas Gruenbacher wrote:
>> Add generic inline data handling by adding a pointer to the inline data
>> region to struct iomap. When handling a buffered IOMAP_INLINE write,
>> iomap_write_begin will copy the current inline data from the inline data
>> region into the page cache, and iomap_write_end will copy the changes in
>> the page cache back to the inline data region.
>
> This approach looks good. A few comments below:
>
>>
>> This doesn't cover inline data reads and direct I/O yet because so far,
>> we have no users.
>
> I'm fine with that as a first step, but gfs2 should be able to do
> proper direct I/O to inline data and use by new iomap_readpage(s)
> easily at least for blocksize == PAGE_SIZE, so this should be added
> soon.
Yes, there are a few open ends in gfs2's iomap support, but what we
have so far is already quite useful.
>> -int generic_write_end(struct file *file, struct address_space *mapping,
>> +void __generic_write_end(struct file *file, struct address_space *mapping,
>> loff_t pos, unsigned len, unsigned copied,
>> - struct page *page, void *fsdata)
>> + struct page *page, void *fsdata, bool dirty_inode)
>
> This is going to clash with
>
> http://git.infradead.org/users/hch/xfs.git/commitdiff/2733909d6b40046ce9c7302c2e742c5e993a0108
>
> It should also be a separate prep patch with a good explanation
I'll cherry-pick that commit for now.
>> iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
>> - unsigned copied, struct page *page)
>> + unsigned copied, struct page *page, struct iomap *iomap)
>
> Note that I have another patch adding this parameter. I think we'll need
> a common iomap base tree for the gfs2 and xfs changes for the next
> merge window. I'd be happy to one one up.
No objections there, I don't think we'll end up with major conflicts though.
>> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
>> index 918f14075702..c61113c71a60 100644
>> --- a/include/linux/iomap.h
>> +++ b/include/linux/iomap.h
>> @@ -47,6 +47,7 @@ struct iomap {
>> u64 length; /* length of mapping, bytes */
>> u16 type; /* type of mapping */
>> u16 flags; /* flags for mapping */
>> + void *inline_data; /* inline data buffer */
>> struct block_device *bdev; /* block device for I/O */
>> struct dax_device *dax_dev; /* dax_dev for dax operations */
>> };
>
> Eventually we need to find a way to union the inline_data, bdev and
> dax_dev fields, but that can be left to later.
For gfs2, passing a private pointer from iomap_begin to iomap_end (for
the buffer head holding the inode) would help as well, but it's not
critical.
Thanks,
Andreas
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v6 5/9] iomap: Add write_end iomap operation
2018-06-02 17:06 ` [Cluster-devel] " Christoph Hellwig
@ 2018-06-04 12:03 ` Andreas Grünbacher
-1 siblings, 0 replies; 36+ messages in thread
From: Andreas Grünbacher @ 2018-06-04 12:03 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Andreas Gruenbacher, cluster-devel, Linux FS-devel Mailing List
2018-06-02 19:06 GMT+02:00 Christoph Hellwig <hch@lst.de>:
> On Sat, Jun 02, 2018 at 11:57:13AM +0200, Andreas Gruenbacher wrote:
>> Add a write_end operation to struct iomap_ops to provide a way of
>> overriding the default behavior of iomap_write_end. This will be used
>> for implementing data journaling in gfs2: in the data journaling case,
>> pages are written into the journal before being written back to their
>> proper on-disk locations.
>
> Please names this page_write_end and make it an optional callout
> just for the additional functionality, that is keep the call to
> iomap_write_end hardcoded in iomap.c, just call the new method before
> it.
I'll send an updated patch queue shortly.
>> +
>> +struct iomap_write_args {
>> + const struct iomap_ops *ops;
>> + struct iov_iter *iter;
>> +};
>
> Also I wonder if we should just pass the iomap_ops diretly to the actor
> callback.
It would make more sense to put the iomap_ops pointer into struct
iomap itself. I'll put a patch doing that at the end of my patch queue
so you can have a look.
Thanks,
Andreas
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v6 4/9] iomap: Generic inline data handling
2018-06-04 12:02 ` [Cluster-devel] " Andreas Grünbacher
@ 2018-06-04 12:12 ` Christoph Hellwig
-1 siblings, 0 replies; 36+ messages in thread
From: Christoph Hellwig @ 2018-06-04 12:12 UTC (permalink / raw)
To: Andreas Grünbacher
Cc: Christoph Hellwig, Andreas Gruenbacher, cluster-devel,
Linux FS-devel Mailing List
On Mon, Jun 04, 2018 at 02:02:10PM +0200, Andreas Grünbacher wrote:
> For gfs2, passing a private pointer from iomap_begin to iomap_end (for
> the buffer head holding the inode) would help as well, but it's not
> critical.
A private pointer is fine with me.
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v6 4/9] iomap: Generic inline data handling
2018-06-04 12:12 ` [Cluster-devel] " Christoph Hellwig
@ 2018-06-04 17:01 ` Andreas Grünbacher
-1 siblings, 0 replies; 36+ messages in thread
From: Andreas Grünbacher @ 2018-06-04 17:01 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Andreas Gruenbacher, cluster-devel, Linux FS-devel Mailing List
2018-06-04 14:12 GMT+02:00 Christoph Hellwig <hch@lst.de>:
> On Mon, Jun 04, 2018 at 02:02:10PM +0200, Andreas Grünbacher wrote:
>> For gfs2, passing a private pointer from iomap_begin to iomap_end (for
>> the buffer head holding the inode) would help as well, but it's not
>> critical.
>
> A private pointer is fine with me.
Okay, that'll be a patch for later though.
Andreas
^ permalink raw reply [flat|nested] 36+ messages in thread
end of thread, other threads:[~2018-06-04 17:01 UTC | newest]
Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-02 9:57 [PATCH v6 0/9] gfs2 iomap write support Andreas Gruenbacher
2018-06-02 9:57 ` [Cluster-devel] " Andreas Gruenbacher
2018-06-02 9:57 ` [PATCH v6 1/9] iomap: inline data should be an iomap type, not a flag Andreas Gruenbacher
2018-06-02 9:57 ` [Cluster-devel] " Andreas Gruenbacher
2018-06-02 9:57 ` [PATCH v6 2/9] iomap: Mark newly allocated buffer heads as new Andreas Gruenbacher
2018-06-02 9:57 ` [Cluster-devel] " Andreas Gruenbacher
2018-06-02 16:58 ` Christoph Hellwig
2018-06-02 16:58 ` [Cluster-devel] " Christoph Hellwig
2018-06-02 9:57 ` [PATCH v6 3/9] iomap: Complete partial direct I/O writes synchronously Andreas Gruenbacher
2018-06-02 9:57 ` [Cluster-devel] " Andreas Gruenbacher
2018-06-02 16:58 ` Christoph Hellwig
2018-06-02 16:58 ` [Cluster-devel] " Christoph Hellwig
2018-06-02 9:57 ` [PATCH v6 4/9] iomap: Generic inline data handling Andreas Gruenbacher
2018-06-02 9:57 ` [Cluster-devel] " Andreas Gruenbacher
2018-06-02 17:04 ` Christoph Hellwig
2018-06-02 17:04 ` [Cluster-devel] " Christoph Hellwig
2018-06-04 12:02 ` Andreas Grünbacher
2018-06-04 12:02 ` [Cluster-devel] " Andreas Grünbacher
2018-06-04 12:12 ` Christoph Hellwig
2018-06-04 12:12 ` [Cluster-devel] " Christoph Hellwig
2018-06-04 17:01 ` Andreas Grünbacher
2018-06-04 17:01 ` [Cluster-devel] " Andreas Grünbacher
2018-06-02 9:57 ` [PATCH v6 5/9] iomap: Add write_end iomap operation Andreas Gruenbacher
2018-06-02 9:57 ` [Cluster-devel] " Andreas Gruenbacher
2018-06-02 17:06 ` Christoph Hellwig
2018-06-02 17:06 ` [Cluster-devel] " Christoph Hellwig
2018-06-04 12:03 ` Andreas Grünbacher
2018-06-04 12:03 ` [Cluster-devel] " Andreas Grünbacher
2018-06-02 9:57 ` [PATCH v6 6/9] gfs2: iomap buffered write support Andreas Gruenbacher
2018-06-02 9:57 ` [Cluster-devel] " Andreas Gruenbacher
2018-06-02 9:57 ` [PATCH v6 7/9] gfs2: gfs2_extent_length cleanup Andreas Gruenbacher
2018-06-02 9:57 ` [Cluster-devel] " Andreas Gruenbacher
2018-06-02 9:57 ` [PATCH v6 8/9] gfs2: iomap direct I/O support Andreas Gruenbacher
2018-06-02 9:57 ` [Cluster-devel] " Andreas Gruenbacher
2018-06-02 9:57 ` [PATCH v6 9/9] gfs2: Remove gfs2_write_{begin,end} Andreas Gruenbacher
2018-06-02 9:57 ` [Cluster-devel] [PATCH v6 9/9] gfs2: Remove gfs2_write_{begin, end} Andreas Gruenbacher