* [PATCH v2 1/2] iomap: Add a page_prepare callback
@ 2019-04-25 15:26 Andreas Gruenbacher
2019-04-25 15:26 ` [PATCH v2 2/2] gfs2: Fix iomap write page reclaim deadlock Andreas Gruenbacher
2019-04-25 15:29 ` [PATCH v2 1/2] iomap: Add a page_prepare callback Matthew Wilcox
0 siblings, 2 replies; 4+ messages in thread
From: Andreas Gruenbacher @ 2019-04-25 15:26 UTC
To: cluster-devel, Christoph Hellwig
Cc: Bob Peterson, Jan Kara, Dave Chinner, Ross Lagerwall, Mark Syms,
Edwin Török, linux-fsdevel, linux-mm,
Andreas Gruenbacher
Move the page_done callback into a separate iomap_page_ops structure and
add a page_prepare callback to be called before a page is written to. In
gfs2, we'll want to start a transaction in page_prepare and end it in
page_done, and other filesystems that implement data journaling will
require the same kind of mechanism.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
fs/iomap.c | 21 +++++++++++++++++----
include/linux/iomap.h | 18 +++++++++++++-----
2 files changed, 30 insertions(+), 9 deletions(-)
diff --git a/fs/iomap.c b/fs/iomap.c
index 97cb9d486a7d..967c985c5310 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -674,9 +674,17 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
if (fatal_signal_pending(current))
return -EINTR;
+ if (page_ops) {
+ status = page_ops->page_prepare(inode, pos, len, iomap);
+ if (status)
+ return status;
+ }
+
page = grab_cache_page_write_begin(inode->i_mapping, index, flags);
- if (!page)
- return -ENOMEM;
+ if (!page) {
+ status = -ENOMEM;
+ goto no_page;
+ }
if (iomap->type == IOMAP_INLINE)
iomap_read_inline_data(inode, page, iomap);
@@ -684,12 +692,16 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
status = __block_write_begin_int(page, pos, len, NULL, iomap);
else
status = __iomap_write_begin(inode, pos, len, page, iomap);
+
if (unlikely(status)) {
unlock_page(page);
put_page(page);
page = NULL;
iomap_write_failed(inode, pos, len);
+no_page:
+ if (page_ops)
+ page_ops->page_done(inode, pos, 0, NULL, iomap);
}
*pagep = page;
@@ -769,6 +781,7 @@ static int
iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
unsigned copied, struct page *page, struct iomap *iomap)
{
+ const struct iomap_page_ops *page_ops = iomap->page_ops;
int ret;
if (iomap->type == IOMAP_INLINE) {
@@ -780,8 +793,8 @@ iomap_write_end(struct inode *inode, loff_t pos, unsigned len,
ret = __iomap_write_end(inode, pos, len, copied, page, iomap);
}
- if (iomap->page_done)
- iomap->page_done(inode, pos, copied, page, iomap);
+ if (page_ops)
+ page_ops->page_done(inode, pos, copied, page, iomap);
if (ret < len)
iomap_write_failed(inode, pos, len);
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 0fefb5455bda..fd65f27d300e 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -53,6 +53,8 @@ struct vm_fault;
*/
#define IOMAP_NULL_ADDR -1ULL /* addr is not valid */
+struct iomap_page_ops;
+
struct iomap {
u64 addr; /* disk offset of mapping, bytes */
loff_t offset; /* file offset of mapping, bytes */
@@ -63,12 +65,18 @@ struct iomap {
struct dax_device *dax_dev; /* dax_dev for dax operations */
void *inline_data;
void *private; /* filesystem private */
+ const struct iomap_page_ops *page_ops;
+};
- /*
- * Called when finished processing a page in the mapping returned in
- * this iomap. At least for now this is only supported in the buffered
- * write path.
- */
+/*
+ * Called before / after processing a page in the mapping returned in this
+ * iomap. At least for now, this is only supported in the buffered write path.
+ * When page_prepare returns 0, page_done is called as well
+ * (possibly with page == NULL).
+ */
+struct iomap_page_ops {
+ int (*page_prepare)(struct inode *inode, loff_t pos, unsigned len,
+ struct iomap *iomap);
void (*page_done)(struct inode *inode, loff_t pos, unsigned copied,
struct page *page, struct iomap *iomap);
};
--
2.20.1
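The pairing rule stated in the new header comment (when page_prepare returns 0, page_done is called as well, possibly with page == NULL) can be illustrated with a small userspace model. This is only a sketch of the control flow in the patched iomap_write_begin, not kernel code: struct page_ops_model, struct txn_ctx, enum fail_point, and model_write_begin are hypothetical names invented for the illustration, and the -EIO used for the fill failure is an arbitrary placeholder.

```c
/* Userspace model of the page_prepare/page_done pairing; not kernel code. */
#include <errno.h>
#include <stddef.h>

struct page { int id; };		/* hypothetical stand-in for struct page */

struct page_ops_model {			/* hypothetical mirror of iomap_page_ops */
	int  (*page_prepare)(void *ctx);
	void (*page_done)(void *ctx, struct page *page);
};

struct txn_ctx {			/* hypothetical bookkeeping for the model */
	int open;			/* transactions currently open */
	int done_calls;			/* total page_done invocations */
	int done_with_null_page;	/* page_done calls that saw page == NULL */
};

enum fail_point { FAIL_NONE, FAIL_GRAB, FAIL_FILL };

static int model_prepare(void *p)
{
	struct txn_ctx *c = p;

	c->open++;			/* "start a transaction" */
	return 0;
}

static void model_done(void *p, struct page *page)
{
	struct txn_ctx *c = p;

	c->open--;			/* "end the transaction" */
	c->done_calls++;
	if (!page)
		c->done_with_null_page++;
}

static const struct page_ops_model model_ops = {
	.page_prepare = model_prepare,
	.page_done = model_done,
};

/*
 * Follows the shape of the patched iomap_write_begin: a failing prepare
 * returns early without page_done; any failure after a successful prepare
 * calls page_done with a NULL page so the callback pair stays balanced.
 */
static int model_write_begin(const struct page_ops_model *ops, void *ctx,
			     enum fail_point fail, struct page *slot,
			     struct page **pagep)
{
	struct page *page = NULL;
	int status = 0;

	if (ops) {
		status = ops->page_prepare(ctx);
		if (status)
			return status;
	}

	if (fail == FAIL_GRAB)
		status = -ENOMEM;	/* page cache lookup failed */
	else if (fail == FAIL_FILL)
		status = -EIO;		/* placeholder for a read/fill error */
	else
		page = slot;

	if (status && ops)
		ops->page_done(ctx, NULL);

	*pagep = page;
	return status;
}
```

On success the caller is expected to invoke page_done itself once the copy has finished, which is what iomap_write_end does in the patch; on either failure path the model has already closed the transaction by the time it returns.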
* [PATCH v2 2/2] gfs2: Fix iomap write page reclaim deadlock
2019-04-25 15:26 [PATCH v2 1/2] iomap: Add a page_prepare callback Andreas Gruenbacher
@ 2019-04-25 15:26 ` Andreas Gruenbacher
2019-04-25 15:29 ` [PATCH v2 1/2] iomap: Add a page_prepare callback Matthew Wilcox
1 sibling, 0 replies; 4+ messages in thread
From: Andreas Gruenbacher @ 2019-04-25 15:26 UTC
To: cluster-devel, Christoph Hellwig
Cc: Bob Peterson, Jan Kara, Dave Chinner, Ross Lagerwall, Mark Syms,
Edwin Török, linux-fsdevel, linux-mm,
Andreas Gruenbacher
Since commit 64bc06bb32ee ("gfs2: iomap buffered write support"), gfs2 is doing
buffered writes by starting a transaction in iomap_begin, writing a range of
pages, and ending that transaction in iomap_end. This approach suffers from
two problems:
(1) Any allocations necessary for the write are done in iomap_begin, so when
the data aren't journaled, there is no need for keeping the transaction open
until iomap_end.
(2) Transactions keep the gfs2 log flush lock held. When
iomap_file_buffered_write calls balance_dirty_pages, this can end up calling
gfs2_write_inode, which will try to flush the log. This requires taking the
log flush lock which is already held, resulting in a deadlock.
Fix both of these issues by not keeping transactions open from iomap_begin to
iomap_end. Instead, start a small transaction in page_prepare and end it in
page_done when necessary.
Reported-by: Edwin Török <edvin.torok@citrix.com>
Fixes: 64bc06bb32ee ("gfs2: iomap buffered write support")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
fs/gfs2/aops.c | 14 +++++--
fs/gfs2/bmap.c | 101 ++++++++++++++++++++++++++++---------------------
fs/iomap.c | 1 +
3 files changed, 68 insertions(+), 48 deletions(-)
diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 05dd78f4b2b3..6210d4429d84 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -649,7 +649,7 @@ static int gfs2_readpages(struct file *file, struct address_space *mapping,
*/
void adjust_fs_space(struct inode *inode)
{
- struct gfs2_sbd *sdp = inode->i_sb->s_fs_info;
+ struct gfs2_sbd *sdp = GFS2_SB(inode);
struct gfs2_inode *m_ip = GFS2_I(sdp->sd_statfs_inode);
struct gfs2_inode *l_ip = GFS2_I(sdp->sd_sc_inode);
struct gfs2_statfs_change_host *m_sc = &sdp->sd_statfs_master;
@@ -657,10 +657,13 @@ void adjust_fs_space(struct inode *inode)
struct buffer_head *m_bh, *l_bh;
u64 fs_total, new_free;
+ if (gfs2_trans_begin(sdp, 2 * RES_STATFS, 0) != 0)
+ return;
+
/* Total up the file system space, according to the latest rindex. */
fs_total = gfs2_ri_total(sdp);
if (gfs2_meta_inode_buffer(m_ip, &m_bh) != 0)
- return;
+ goto out;
spin_lock(&sdp->sd_statfs_spin);
gfs2_statfs_change_in(m_sc, m_bh->b_data +
@@ -675,11 +678,14 @@ void adjust_fs_space(struct inode *inode)
gfs2_statfs_change(sdp, new_free, new_free, 0);
if (gfs2_meta_inode_buffer(l_ip, &l_bh) != 0)
- goto out;
+ goto out2;
update_statfs(sdp, m_bh, l_bh);
brelse(l_bh);
-out:
+out2:
brelse(m_bh);
+out:
+ sdp->sd_rindex_uptodate = 0;
+ gfs2_trans_end(sdp);
}
/**
diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index 5da4ca9041c0..2fae3f4f5db6 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -991,15 +991,31 @@ static void gfs2_write_unlock(struct inode *inode)
gfs2_glock_dq_uninit(&ip->i_gh);
}
-static void gfs2_iomap_journaled_page_done(struct inode *inode, loff_t pos,
- unsigned copied, struct page *page,
- struct iomap *iomap)
+static int gfs2_iomap_page_prepare(struct inode *inode, loff_t pos,
+ unsigned len, struct iomap *iomap)
+{
+ struct gfs2_sbd *sdp = GFS2_SB(inode);
+
+ return gfs2_trans_begin(sdp, RES_DINODE + (len >> inode->i_blkbits), 0);
+}
+
+static void gfs2_iomap_page_done(struct inode *inode, loff_t pos,
+ unsigned copied, struct page *page,
+ struct iomap *iomap)
{
struct gfs2_inode *ip = GFS2_I(inode);
+ struct gfs2_sbd *sdp = GFS2_SB(inode);
- gfs2_page_add_databufs(ip, page, offset_in_page(pos), copied);
+ if (page && !gfs2_is_stuffed(ip))
+ gfs2_page_add_databufs(ip, page, offset_in_page(pos), copied);
+ gfs2_trans_end(sdp);
}
+const struct iomap_page_ops gfs2_iomap_page_ops = {
+ .page_prepare = gfs2_iomap_page_prepare,
+ .page_done = gfs2_iomap_page_done,
+};
+
static int gfs2_iomap_begin_write(struct inode *inode, loff_t pos,
loff_t length, unsigned flags,
struct iomap *iomap,
@@ -1052,32 +1068,46 @@ static int gfs2_iomap_begin_write(struct inode *inode, loff_t pos,
if (alloc_required)
rblocks += gfs2_rg_blocks(ip, data_blocks + ind_blocks);
- ret = gfs2_trans_begin(sdp, rblocks, iomap->length >> inode->i_blkbits);
- if (ret)
- goto out_trans_fail;
+ if (unstuff || iomap->type == IOMAP_HOLE) {
+ struct gfs2_trans *tr;
- if (unstuff) {
- ret = gfs2_unstuff_dinode(ip, NULL);
- if (ret)
- goto out_trans_end;
- release_metapath(mp);
- ret = gfs2_iomap_get(inode, iomap->offset, iomap->length,
- flags, iomap, mp);
+ ret = gfs2_trans_begin(sdp, rblocks,
+ iomap->length >> inode->i_blkbits);
if (ret)
- goto out_trans_end;
- }
+ goto out_trans_fail;
- if (iomap->type == IOMAP_HOLE) {
- ret = gfs2_iomap_alloc(inode, iomap, flags, mp);
- if (ret) {
- gfs2_trans_end(sdp);
- gfs2_inplace_release(ip);
- punch_hole(ip, iomap->offset, iomap->length);
- goto out_qunlock;
+ if (unstuff) {
+ ret = gfs2_unstuff_dinode(ip, NULL);
+ if (ret)
+ goto out_trans_end;
+ release_metapath(mp);
+ ret = gfs2_iomap_get(inode, iomap->offset,
+ iomap->length, flags, iomap, mp);
+ if (ret)
+ goto out_trans_end;
}
+
+ if (iomap->type == IOMAP_HOLE) {
+ ret = gfs2_iomap_alloc(inode, iomap, flags, mp);
+ if (ret) {
+ gfs2_trans_end(sdp);
+ gfs2_inplace_release(ip);
+ punch_hole(ip, iomap->offset, iomap->length);
+ goto out_qunlock;
+ }
+ }
+
+ tr = current->journal_info;
+ if (tr->tr_num_buf_new)
+ __mark_inode_dirty(inode, I_DIRTY_DATASYNC);
+ else
+ gfs2_trans_add_meta(ip->i_gl, mp->mp_bh[0]);
+
+ gfs2_trans_end(sdp);
}
- if (!gfs2_is_stuffed(ip) && gfs2_is_jdata(ip))
- iomap->page_done = gfs2_iomap_journaled_page_done;
+
+ if (gfs2_is_stuffed(ip) || gfs2_is_jdata(ip))
+ iomap->page_ops = &gfs2_iomap_page_ops;
return 0;
out_trans_end:
@@ -1116,10 +1146,6 @@ static int gfs2_iomap_begin(struct inode *inode, loff_t pos, loff_t length,
iomap->type != IOMAP_MAPPED)
ret = -ENOTBLK;
}
- if (!ret) {
- get_bh(mp.mp_bh[0]);
- iomap->private = mp.mp_bh[0];
- }
release_metapath(&mp);
trace_gfs2_iomap_end(ip, iomap, ret);
return ret;
@@ -1130,27 +1156,16 @@ static int gfs2_iomap_end(struct inode *inode, loff_t pos, loff_t length,
{
struct gfs2_inode *ip = GFS2_I(inode);
struct gfs2_sbd *sdp = GFS2_SB(inode);
- struct gfs2_trans *tr = current->journal_info;
- struct buffer_head *dibh = iomap->private;
if ((flags & (IOMAP_WRITE | IOMAP_DIRECT)) != IOMAP_WRITE)
goto out;
- if (iomap->type != IOMAP_INLINE) {
+ if (!gfs2_is_stuffed(ip))
gfs2_ordered_add_inode(ip);
- if (tr->tr_num_buf_new)
- __mark_inode_dirty(inode, I_DIRTY_DATASYNC);
- else
- gfs2_trans_add_meta(ip->i_gl, dibh);
- }
-
- if (inode == sdp->sd_rindex) {
+ if (inode == sdp->sd_rindex)
adjust_fs_space(inode);
- sdp->sd_rindex_uptodate = 0;
- }
- gfs2_trans_end(sdp);
gfs2_inplace_release(ip);
if (length != written && (iomap->flags & IOMAP_F_NEW)) {
@@ -1170,8 +1185,6 @@ static int gfs2_iomap_end(struct inode *inode, loff_t pos, loff_t length,
gfs2_write_unlock(inode);
out:
- if (dibh)
- brelse(dibh);
return 0;
}
diff --git a/fs/iomap.c b/fs/iomap.c
index 967c985c5310..667a822ecb7d 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -665,6 +665,7 @@ static int
iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
struct page **pagep, struct iomap *iomap)
{
+ const struct iomap_page_ops *page_ops = iomap->page_ops;
pgoff_t index = pos >> PAGE_SHIFT;
struct page *page;
int status = 0;
--
2.20.1
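Problem (2) from the commit message can be modeled in userspace to show why shortening the transaction removes the deadlock. The sketch below is a deliberate simplification under stated assumptions: the log flush lock is reduced to a single boolean, the chain balance_dirty_pages -> gfs2_write_inode -> log flush is collapsed into one function, and the model returns -EDEADLK where the real kernel would simply hang. All names (struct log_model, model_trans_begin, model_balance_dirty, write_old, write_new) are hypothetical.

```c
/* Userspace model of the log flush lock deadlock; not kernel code. */
#include <errno.h>
#include <stdbool.h>

struct log_model {
	bool flush_lock_held;	/* simplified, non-recursive log flush lock */
	int flushes;		/* completed log flushes */
};

static int model_trans_begin(struct log_model *log)
{
	if (log->flush_lock_held)
		return -EDEADLK;
	log->flush_lock_held = true;	/* an open transaction pins the lock */
	return 0;
}

static void model_trans_end(struct log_model *log)
{
	log->flush_lock_held = false;
}

/* Stand-in for dirty page throttling that ends up flushing the log. */
static int model_balance_dirty(struct log_model *log)
{
	if (log->flush_lock_held)
		return -EDEADLK;	/* the real code would block forever here */
	log->flushes++;
	return 0;
}

/* Old scheme: one transaction from iomap_begin to iomap_end. */
static int write_old(struct log_model *log, int npages)
{
	int err = model_trans_begin(log);

	if (err)
		return err;
	for (int i = 0; i < npages; i++) {
		/* ...copy data into page i... */
		err = model_balance_dirty(log);	/* runs inside the transaction */
		if (err)
			break;
	}
	model_trans_end(log);
	return err;
}

/* New scheme: a small transaction per page, closed before throttling. */
static int write_new(struct log_model *log, int npages)
{
	for (int i = 0; i < npages; i++) {
		int err = model_trans_begin(log);	/* page_prepare */

		if (err)
			return err;
		/* ...copy data into page i... */
		model_trans_end(log);			/* page_done */
		err = model_balance_dirty(log);
		if (err)
			return err;
	}
	return 0;
}
```

In this model write_old fails on the first page because throttling runs while the transaction is still open, whereas write_new completes one flush per page with the lock released each time.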
* Re: [PATCH v2 1/2] iomap: Add a page_prepare callback
2019-04-25 15:26 [PATCH v2 1/2] iomap: Add a page_prepare callback Andreas Gruenbacher
2019-04-25 15:26 ` [PATCH v2 2/2] gfs2: Fix iomap write page reclaim deadlock Andreas Gruenbacher
@ 2019-04-25 15:29 ` Matthew Wilcox
2019-04-25 15:52 ` Andreas Gruenbacher
1 sibling, 1 reply; 4+ messages in thread
From: Matthew Wilcox @ 2019-04-25 15:29 UTC
To: Andreas Gruenbacher
Cc: cluster-devel, Christoph Hellwig, Bob Peterson, Jan Kara,
Dave Chinner, Ross Lagerwall, Mark Syms, Edwin Török,
linux-fsdevel, linux-mm
On Thu, Apr 25, 2019 at 05:26:30PM +0200, Andreas Gruenbacher wrote:
This seems to be corrupted; there's no declaration of a page_ops in
iomap_write_begin ... unless you're basing on a patch I don't have?
> diff --git a/fs/iomap.c b/fs/iomap.c
> index 97cb9d486a7d..967c985c5310 100644
> --- a/fs/iomap.c
> +++ b/fs/iomap.c
> @@ -674,9 +674,17 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
> if (fatal_signal_pending(current))
> return -EINTR;
>
> + if (page_ops) {
> + status = page_ops->page_prepare(inode, pos, len, iomap);
> + if (status)
> + return status;
> + }
> +
> page = grab_cache_page_write_begin(inode->i_mapping, index, flags);
> - if (!page)
> - return -ENOMEM;
> + if (!page) {
> + status = -ENOMEM;
> + goto no_page;
> + }
>
> if (iomap->type == IOMAP_INLINE)
> iomap_read_inline_data(inode, page, iomap);
* Re: [PATCH v2 1/2] iomap: Add a page_prepare callback
2019-04-25 15:29 ` [PATCH v2 1/2] iomap: Add a page_prepare callback Matthew Wilcox
@ 2019-04-25 15:52 ` Andreas Gruenbacher
0 siblings, 0 replies; 4+ messages in thread
From: Andreas Gruenbacher @ 2019-04-25 15:52 UTC
To: Matthew Wilcox
Cc: cluster-devel, Christoph Hellwig, Bob Peterson, Jan Kara,
Dave Chinner, Ross Lagerwall, Mark Syms, Edwin Török,
linux-fsdevel, linux-mm
On Thu, 25 Apr 2019 at 17:29, Matthew Wilcox <willy@infradead.org> wrote:
>
> On Thu, Apr 25, 2019 at 05:26:30PM +0200, Andreas Gruenbacher wrote:
>
> This seems to be corrupted; there's no declaration of a page_ops in
> iomap_write_begin ... unless you're basing on a patch I don't have?
Oops, the declaration has slipped into the 2nd patch, sorry.
> > diff --git a/fs/iomap.c b/fs/iomap.c
> > index 97cb9d486a7d..967c985c5310 100644
> > --- a/fs/iomap.c
> > +++ b/fs/iomap.c
> > @@ -674,9 +674,17 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags,
> > if (fatal_signal_pending(current))
> > return -EINTR;
> >
> > + if (page_ops) {
> > + status = page_ops->page_prepare(inode, pos, len, iomap);
> > + if (status)
> > + return status;
> > + }
> > +
> > page = grab_cache_page_write_begin(inode->i_mapping, index, flags);
> > - if (!page)
> > - return -ENOMEM;
> > + if (!page) {
> > + status = -ENOMEM;
> > + goto no_page;
> > + }
> >
> > if (iomap->type == IOMAP_INLINE)
> > iomap_read_inline_data(inode, page, iomap);
Andreas