All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] ext4: Avoid transaction stalls during writeback
@ 2017-04-11 13:54 Jan Kara
  2017-04-11 13:54 ` [PATCH 1/3] ext4: Allow IO submission without io_end Jan Kara
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Jan Kara @ 2017-04-11 13:54 UTC (permalink / raw)
  To: Ted Tso; +Cc: linux-ext4, Jan Kara

Hello,

this series fixes stalls I've observed during a workload simulating Postgress
database. The series reduces transaction commit times about 100x in my test
setup. See patch 3/3 for more details about the workload and the results.
The series has passed testing with xfstests with 4k and 1k blocksize and
a test with 4k blocksize in dioreadnolock mode.

								Honza

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/3] ext4: Allow IO submission without io_end
  2017-04-11 13:54 [PATCH 0/3] ext4: Avoid transaction stalls during writeback Jan Kara
@ 2017-04-11 13:54 ` Jan Kara
  2017-04-30 22:01   ` Theodore Ts'o
  2017-04-11 13:54 ` [PATCH 2/3] ext4: Don't allocate io_end for writeback from ext4_writepage() Jan Kara
  2017-04-11 13:54 ` [PATCH 3/3] ext4: Avoid unnecessary transaction stalls during writeback Jan Kara
  2 siblings, 1 reply; 11+ messages in thread
From: Jan Kara @ 2017-04-11 13:54 UTC (permalink / raw)
  To: Ted Tso; +Cc: linux-ext4, Jan Kara

Allow submission of bio through ext4_bio_write_page() without io_end.
For the case where we are not converting unwritten extents we don't need
io_end for anything anyway.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/page-io.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index 208241b06662..a8a6caf4752a 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -298,23 +298,20 @@ static void ext4_end_bio(struct bio *bio)
 	ext4_io_end_t *io_end = bio->bi_private;
 	sector_t bi_sector = bio->bi_iter.bi_sector;
 
-	BUG_ON(!io_end);
 	bio->bi_end_io = NULL;
 
 	if (bio->bi_error) {
 		struct inode *inode = io_end->inode;
 
 		ext4_warning(inode->i_sb, "I/O error %d writing to inode %lu "
-			     "(offset %llu size %ld starting block %llu)",
+			     "(starting block %llu)",
 			     bio->bi_error, inode->i_ino,
-			     (unsigned long long) io_end->offset,
-			     (long) io_end->size,
 			     (unsigned long long)
 			     bi_sector >> (inode->i_blkbits - 9));
 		mapping_set_error(inode->i_mapping, bio->bi_error);
 	}
 
-	if (io_end->flag & EXT4_IO_END_UNWRITTEN) {
+	if (io_end && io_end->flag & EXT4_IO_END_UNWRITTEN) {
 		/*
 		 * Link bio into list hanging from io_end. We have to do it
 		 * atomically as bio completions can be racing against each
@@ -322,15 +319,17 @@ static void ext4_end_bio(struct bio *bio)
 		 */
 		bio->bi_private = xchg(&io_end->bio, bio);
 		ext4_put_io_end_defer(io_end);
-	} else {
+		return;
+	}
+	if (io_end) {
 		/*
 		 * Drop io_end reference early. Inode can get freed once
 		 * we finish the bio.
 		 */
 		ext4_put_io_end_defer(io_end);
-		ext4_finish_bio(bio);
-		bio_put(bio);
 	}
+	ext4_finish_bio(bio);
+	bio_put(bio);
 }
 
 void ext4_io_submit(struct ext4_io_submit *io)
@@ -366,7 +365,8 @@ static int io_submit_init_bio(struct ext4_io_submit *io,
 	bio->bi_iter.bi_sector = bh->b_blocknr * (bh->b_size >> 9);
 	bio->bi_bdev = bh->b_bdev;
 	bio->bi_end_io = ext4_end_bio;
-	bio->bi_private = ext4_get_io_end(io->io_end);
+	if (io->io_end)
+		bio->bi_private = ext4_get_io_end(io->io_end);
 	io->io_bio = bio;
 	io->io_next_block = bh->b_blocknr;
 	return 0;
-- 
2.12.0

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/3] ext4: Don't allocate io_end for writeback from ext4_writepage()
  2017-04-11 13:54 [PATCH 0/3] ext4: Avoid transaction stalls during writeback Jan Kara
  2017-04-11 13:54 ` [PATCH 1/3] ext4: Allow IO submission without io_end Jan Kara
@ 2017-04-11 13:54 ` Jan Kara
  2017-04-11 13:54 ` [PATCH 3/3] ext4: Avoid unnecessary transaction stalls during writeback Jan Kara
  2 siblings, 0 replies; 11+ messages in thread
From: Jan Kara @ 2017-04-11 13:54 UTC (permalink / raw)
  To: Ted Tso; +Cc: linux-ext4, Jan Kara

ext4_writepage() writes out only mapped buffers with allocated
underlying blocks. Thus there's no need for io_end structure
and we can avoid allocating it.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/inode.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 4247d8d25687..baa87e7d1426 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2107,16 +2107,8 @@ static int ext4_writepage(struct page *page,
 		return __ext4_journalled_writepage(page, len);
 
 	ext4_io_submit_init(&io_submit, wbc);
-	io_submit.io_end = ext4_init_io_end(inode, GFP_NOFS);
-	if (!io_submit.io_end) {
-		redirty_page_for_writepage(wbc, page);
-		unlock_page(page);
-		return -ENOMEM;
-	}
 	ret = ext4_bio_write_page(&io_submit, page, len, wbc, keep_towrite);
 	ext4_io_submit(&io_submit);
-	/* Drop io_end reference we got from init */
-	ext4_put_io_end_defer(io_submit.io_end);
 	return ret;
 }
 
-- 
2.12.0

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 3/3] ext4: Avoid unnecessary transaction stalls during writeback
  2017-04-11 13:54 [PATCH 0/3] ext4: Avoid transaction stalls during writeback Jan Kara
  2017-04-11 13:54 ` [PATCH 1/3] ext4: Allow IO submission without io_end Jan Kara
  2017-04-11 13:54 ` [PATCH 2/3] ext4: Don't allocate io_end for writeback from ext4_writepage() Jan Kara
@ 2017-04-11 13:54 ` Jan Kara
  2017-04-11 15:00   ` Amir Goldstein
  2017-04-30  0:47   ` Theodore Ts'o
  2 siblings, 2 replies; 11+ messages in thread
From: Jan Kara @ 2017-04-11 13:54 UTC (permalink / raw)
  To: Ted Tso; +Cc: linux-ext4, Jan Kara

Currently ext4_writepages() submits all pages with transaction started.
When no page needs block allocation or extent conversion we can submit
all dirty pages in the inode while holding a single transaction handle
and when device is congested this can take significant amount of time.
Thus ext4_writepages() can block transaction commits for extended
periods of time.

Take for example a simple benchmark simulating PostgreSQL database
(pgioperf in mmtest). The benchmark runs 16 processes doing random reads
from a huge file, one process doing random writes to the huge file, and
one process doing sequential writes to a small writes and frequently
running fsync. With unpatched kernel transaction commits take on average
~18s with standard deviation of ~41s, top 5 commit times are:

274.466639s, 126.467347s, 86.992429s, 34.351563s, 31.517653s.

After this patch transaction commits take on average 0.1s with standard
deviation of 0.15s, top 5 commit times are:

0.563792s, 0.519980s, 0.509841s, 0.471700s, 0.469899s

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/inode.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index baa87e7d1426..ff55d430938b 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2171,6 +2171,9 @@ static bool mpage_add_bh_to_extent(struct mpage_da_data *mpd, ext4_lblk_t lblk,
 
 	/* First block in the extent? */
 	if (map->m_len == 0) {
+		/* We cannot map unless handle is started... */
+		if (!mpd->io_submit.io_end)
+			return false;
 		map->m_lblk = lblk;
 		map->m_len = 1;
 		map->m_flags = bh->b_state & BH_FLAGS;
@@ -2223,6 +2226,9 @@ static int mpage_process_page_bufs(struct mpage_da_data *mpd,
 			/* Found extent to map? */
 			if (mpd->map.m_len)
 				return 0;
+			/* Buffer needs mapping and handle is not started? */
+			if (!mpd->io_submit.io_end)
+				return 0;
 			/* Everything mapped so far and we hit EOF */
 			break;
 		}
@@ -2739,6 +2745,21 @@ static int ext4_writepages(struct address_space *mapping,
 		tag_pages_for_writeback(mapping, mpd.first_page, mpd.last_page);
 	done = false;
 	blk_start_plug(&plug);
+
+	/*
+	 * First writeback pages that don't need mapping - we can avoid
+	 * starting a transaction unnecessarily and also avoid being blocked
+	 * in the block layer on device congestion while having transaction
+	 * started.
+	 */
+	ret = mpage_prepare_extent_to_map(&mpd);
+	/* Submit prepared bio */
+	ext4_io_submit(&mpd.io_submit);
+	/* Unlock pages we didn't use */
+	mpage_release_unused_pages(&mpd, false);
+	if (ret < 0)
+		goto unplug;
+
 	while (!done && mpd.first_page <= mpd.last_page) {
 		/* For each extent of pages we use new io_end */
 		mpd.io_submit.io_end = ext4_init_io_end(inode, GFP_KERNEL);
@@ -2767,6 +2788,7 @@ static int ext4_writepages(struct address_space *mapping,
 				wbc->nr_to_write, inode->i_ino, ret);
 			/* Release allocated io_end */
 			ext4_put_io_end(mpd.io_submit.io_end);
+			mpd.io_submit.io_end = NULL;
 			break;
 		}
 
@@ -2816,6 +2838,7 @@ static int ext4_writepages(struct address_space *mapping,
 			ext4_journal_stop(handle);
 		} else
 			ext4_put_io_end(mpd.io_submit.io_end);
+		mpd.io_submit.io_end = NULL;
 
 		if (ret == -ENOSPC && sbi->s_journal) {
 			/*
@@ -2831,6 +2854,7 @@ static int ext4_writepages(struct address_space *mapping,
 		if (ret)
 			break;
 	}
+unplug:
 	blk_finish_plug(&plug);
 	if (!ret && !cycled && wbc->nr_to_write > 0) {
 		cycled = 1;
-- 
2.12.0

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] ext4: Avoid unnecessary transaction stalls during writeback
  2017-04-11 13:54 ` [PATCH 3/3] ext4: Avoid unnecessary transaction stalls during writeback Jan Kara
@ 2017-04-11 15:00   ` Amir Goldstein
  2017-04-11 16:16     ` Jan Kara
  2017-04-30  0:47   ` Theodore Ts'o
  1 sibling, 1 reply; 11+ messages in thread
From: Amir Goldstein @ 2017-04-11 15:00 UTC (permalink / raw)
  To: Jan Kara; +Cc: Ted Tso, Ext4

On Tue, Apr 11, 2017 at 4:54 PM, Jan Kara <jack@suse.cz> wrote:
> Currently ext4_writepages() submits all pages with transaction started.
> When no page needs block allocation or extent conversion we can submit
> all dirty pages in the inode while holding a single transaction handle
> and when device is congested this can take significant amount of time.
> Thus ext4_writepages() can block transaction commits for extended
> periods of time.
>
> Take for example a simple benchmark simulating PostgreSQL database
> (pgioperf in mmtest). The benchmark runs 16 processes doing random reads
> from a huge file, one process doing random writes to the huge file, and
> one process doing sequential writes to a small writes and frequently

typo s/small writes/small file/

> running fsync. With unpatched kernel transaction commits take on average
> ~18s with standard deviation of ~41s, top 5 commit times are:
>
> 274.466639s, 126.467347s, 86.992429s, 34.351563s, 31.517653s.
>
> After this patch transaction commits take on average 0.1s with standard
> deviation of 0.15s, top 5 commit times are:
>
> 0.563792s, 0.519980s, 0.509841s, 0.471700s, 0.469899s

That's a very nice improvement! I wonder what the "commit time" metrics
means to end users though.
Perhaps you should additionally phrase the problem statement and the
improvement in metrics that end users understand?
i.e. the runtime of fsync on the small file? is that what it means?

Out of curiousity, I wonder how XFS performs in this benchmark
did you happen to check?
I am guessing it would be closer to the 'after' results?

>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/ext4/inode.c | 24 ++++++++++++++++++++++++
>  1 file changed, 24 insertions(+)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index baa87e7d1426..ff55d430938b 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -2171,6 +2171,9 @@ static bool mpage_add_bh_to_extent(struct mpage_da_data *mpd, ext4_lblk_t lblk,
>
>         /* First block in the extent? */
>         if (map->m_len == 0) {
> +               /* We cannot map unless handle is started... */
> +               if (!mpd->io_submit.io_end)
> +                       return false;
>                 map->m_lblk = lblk;
>                 map->m_len = 1;
>                 map->m_flags = bh->b_state & BH_FLAGS;
> @@ -2223,6 +2226,9 @@ static int mpage_process_page_bufs(struct mpage_da_data *mpd,
>                         /* Found extent to map? */
>                         if (mpd->map.m_len)
>                                 return 0;
> +                       /* Buffer needs mapping and handle is not started? */
> +                       if (!mpd->io_submit.io_end)
> +                               return 0;
>                         /* Everything mapped so far and we hit EOF */
>                         break;
>                 }
> @@ -2739,6 +2745,21 @@ static int ext4_writepages(struct address_space *mapping,
>                 tag_pages_for_writeback(mapping, mpd.first_page, mpd.last_page);
>         done = false;
>         blk_start_plug(&plug);
> +
> +       /*
> +        * First writeback pages that don't need mapping - we can avoid
> +        * starting a transaction unnecessarily and also avoid being blocked
> +        * in the block layer on device congestion while having transaction
> +        * started.
> +        */
> +       ret = mpage_prepare_extent_to_map(&mpd);
> +       /* Submit prepared bio */
> +       ext4_io_submit(&mpd.io_submit);
> +       /* Unlock pages we didn't use */
> +       mpage_release_unused_pages(&mpd, false);
> +       if (ret < 0)
> +               goto unplug;
> +
>         while (!done && mpd.first_page <= mpd.last_page) {
>                 /* For each extent of pages we use new io_end */
>                 mpd.io_submit.io_end = ext4_init_io_end(inode, GFP_KERNEL);
> @@ -2767,6 +2788,7 @@ static int ext4_writepages(struct address_space *mapping,
>                                 wbc->nr_to_write, inode->i_ino, ret);
>                         /* Release allocated io_end */
>                         ext4_put_io_end(mpd.io_submit.io_end);
> +                       mpd.io_submit.io_end = NULL;
>                         break;
>                 }
>
> @@ -2816,6 +2838,7 @@ static int ext4_writepages(struct address_space *mapping,
>                         ext4_journal_stop(handle);
>                 } else
>                         ext4_put_io_end(mpd.io_submit.io_end);
> +               mpd.io_submit.io_end = NULL;
>
>                 if (ret == -ENOSPC && sbi->s_journal) {
>                         /*
> @@ -2831,6 +2854,7 @@ static int ext4_writepages(struct address_space *mapping,
>                 if (ret)
>                         break;
>         }
> +unplug:
>         blk_finish_plug(&plug);
>         if (!ret && !cycled && wbc->nr_to_write > 0) {
>                 cycled = 1;
> --
> 2.12.0
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] ext4: Avoid unnecessary transaction stalls during writeback
  2017-04-11 15:00   ` Amir Goldstein
@ 2017-04-11 16:16     ` Jan Kara
  0 siblings, 0 replies; 11+ messages in thread
From: Jan Kara @ 2017-04-11 16:16 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Ted Tso, Ext4

On Tue 11-04-17 18:00:50, Amir Goldstein wrote:
> On Tue, Apr 11, 2017 at 4:54 PM, Jan Kara <jack@suse.cz> wrote:
> > Currently ext4_writepages() submits all pages with transaction started.
> > When no page needs block allocation or extent conversion we can submit
> > all dirty pages in the inode while holding a single transaction handle
> > and when device is congested this can take significant amount of time.
> > Thus ext4_writepages() can block transaction commits for extended
> > periods of time.
> >
> > Take for example a simple benchmark simulating PostgreSQL database
> > (pgioperf in mmtest). The benchmark runs 16 processes doing random reads
> > from a huge file, one process doing random writes to the huge file, and
> > one process doing sequential writes to a small writes and frequently
> 
> typo s/small writes/small file/

Yeah, thanks.

> > running fsync. With unpatched kernel transaction commits take on average
> > ~18s with standard deviation of ~41s, top 5 commit times are:
> >
> > 274.466639s, 126.467347s, 86.992429s, 34.351563s, 31.517653s.
> >
> > After this patch transaction commits take on average 0.1s with standard
> > deviation of 0.15s, top 5 commit times are:
> >
> > 0.563792s, 0.519980s, 0.509841s, 0.471700s, 0.469899s
> 
> That's a very nice improvement! I wonder what the "commit time" metrics
> means to end users though.
> Perhaps you should additionally phrase the problem statement and the
> improvement in metrics that end users understand?
> i.e. the runtime of fsync on the small file? is that what it means?

Well, I don't think kernel commit logs are really for "end users" :). But
generally any change in ext4 can wait for an outstanding transaction commit
so even a plain atime update or write(2) can block until transaction is
committed.

> Out of curiousity, I wonder how XFS performs in this benchmark
> did you happen to check?
> I am guessing it would be closer to the 'after' results?

Out of curiosity I've run the benchmark on XFS and yes, XFS is closer to
'after' results. The benchmark actually reports three things - times to
read, times to write, and times to fsync. Reads are very similar for ext4
after the patch and xfs. XFS is much faster for writes even after the patch
(I assume because we pay the overhead of transaction machinery in ext4
although it is not necessary for the writes we do here). ext4 is about 2x
faster for fsync calls.

								Honza

> >
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >  fs/ext4/inode.c | 24 ++++++++++++++++++++++++
> >  1 file changed, 24 insertions(+)
> >
> > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > index baa87e7d1426..ff55d430938b 100644
> > --- a/fs/ext4/inode.c
> > +++ b/fs/ext4/inode.c
> > @@ -2171,6 +2171,9 @@ static bool mpage_add_bh_to_extent(struct mpage_da_data *mpd, ext4_lblk_t lblk,
> >
> >         /* First block in the extent? */
> >         if (map->m_len == 0) {
> > +               /* We cannot map unless handle is started... */
> > +               if (!mpd->io_submit.io_end)
> > +                       return false;
> >                 map->m_lblk = lblk;
> >                 map->m_len = 1;
> >                 map->m_flags = bh->b_state & BH_FLAGS;
> > @@ -2223,6 +2226,9 @@ static int mpage_process_page_bufs(struct mpage_da_data *mpd,
> >                         /* Found extent to map? */
> >                         if (mpd->map.m_len)
> >                                 return 0;
> > +                       /* Buffer needs mapping and handle is not started? */
> > +                       if (!mpd->io_submit.io_end)
> > +                               return 0;
> >                         /* Everything mapped so far and we hit EOF */
> >                         break;
> >                 }
> > @@ -2739,6 +2745,21 @@ static int ext4_writepages(struct address_space *mapping,
> >                 tag_pages_for_writeback(mapping, mpd.first_page, mpd.last_page);
> >         done = false;
> >         blk_start_plug(&plug);
> > +
> > +       /*
> > +        * First writeback pages that don't need mapping - we can avoid
> > +        * starting a transaction unnecessarily and also avoid being blocked
> > +        * in the block layer on device congestion while having transaction
> > +        * started.
> > +        */
> > +       ret = mpage_prepare_extent_to_map(&mpd);
> > +       /* Submit prepared bio */
> > +       ext4_io_submit(&mpd.io_submit);
> > +       /* Unlock pages we didn't use */
> > +       mpage_release_unused_pages(&mpd, false);
> > +       if (ret < 0)
> > +               goto unplug;
> > +
> >         while (!done && mpd.first_page <= mpd.last_page) {
> >                 /* For each extent of pages we use new io_end */
> >                 mpd.io_submit.io_end = ext4_init_io_end(inode, GFP_KERNEL);
> > @@ -2767,6 +2788,7 @@ static int ext4_writepages(struct address_space *mapping,
> >                                 wbc->nr_to_write, inode->i_ino, ret);
> >                         /* Release allocated io_end */
> >                         ext4_put_io_end(mpd.io_submit.io_end);
> > +                       mpd.io_submit.io_end = NULL;
> >                         break;
> >                 }
> >
> > @@ -2816,6 +2838,7 @@ static int ext4_writepages(struct address_space *mapping,
> >                         ext4_journal_stop(handle);
> >                 } else
> >                         ext4_put_io_end(mpd.io_submit.io_end);
> > +               mpd.io_submit.io_end = NULL;
> >
> >                 if (ret == -ENOSPC && sbi->s_journal) {
> >                         /*
> > @@ -2831,6 +2854,7 @@ static int ext4_writepages(struct address_space *mapping,
> >                 if (ret)
> >                         break;
> >         }
> > +unplug:
> >         blk_finish_plug(&plug);
> >         if (!ret && !cycled && wbc->nr_to_write > 0) {
> >                 cycled = 1;
> > --
> > 2.12.0
> >
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] ext4: Avoid unnecessary transaction stalls during writeback
  2017-04-11 13:54 ` [PATCH 3/3] ext4: Avoid unnecessary transaction stalls during writeback Jan Kara
  2017-04-11 15:00   ` Amir Goldstein
@ 2017-04-30  0:47   ` Theodore Ts'o
  1 sibling, 0 replies; 11+ messages in thread
From: Theodore Ts'o @ 2017-04-30  0:47 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-ext4

On Tue, Apr 11, 2017 at 03:54:18PM +0200, Jan Kara wrote:
> Currently ext4_writepages() submits all pages with transaction started.
> When no page needs block allocation or extent conversion we can submit
> all dirty pages in the inode while holding a single transaction handle
> and when device is congested this can take significant amount of time.
> Thus ext4_writepages() can block transaction commits for extended
> periods of time.
> 
> Take for example a simple benchmark simulating PostgreSQL database
> (pgioperf in mmtest). The benchmark runs 16 processes doing random reads
> from a huge file, one process doing random writes to the huge file, and
> one process doing sequential writes to a small writes and frequently
> running fsync. With unpatched kernel transaction commits take on average
> ~18s with standard deviation of ~41s, top 5 commit times are:
> 
> 274.466639s, 126.467347s, 86.992429s, 34.351563s, 31.517653s.
> 
> After this patch transaction commits take on average 0.1s with standard
> deviation of 0.15s, top 5 commit times are:
> 
> 0.563792s, 0.519980s, 0.509841s, 0.471700s, 0.469899s
> 
> Signed-off-by: Jan Kara <jack@suse.cz>

Great work!   Many thanks.

Apologies for the late application; April has been crazy busy.  I'm
kicking off a full regression test, but it looks good.

	      	   	      	    	- Ted

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] ext4: Allow IO submission without io_end
  2017-04-11 13:54 ` [PATCH 1/3] ext4: Allow IO submission without io_end Jan Kara
@ 2017-04-30 22:01   ` Theodore Ts'o
  2017-04-30 22:30     ` [PATCH -v2] ext4: avoid unnecessary transaction stalls during writeback Theodore Ts'o
  0 siblings, 1 reply; 11+ messages in thread
From: Theodore Ts'o @ 2017-04-30 22:01 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-ext4

On Tue, Apr 11, 2017 at 03:54:16PM +0200, Jan Kara wrote:
> Allow submission of bio through ext4_bio_write_page() without io_end.
> For the case where we are not converting unwritten extents we don't need
> io_end for anything anyway.

So this patch doesn't work, because if there is an error, we need the
inode in io_end so we can set the error in the mapping:

> @@ -298,23 +298,20 @@ static void ext4_end_bio(struct bio *bio)
>  	ext4_io_end_t *io_end = bio->bi_private;
>  	sector_t bi_sector = bio->bi_iter.bi_sector;
>  
> -	BUG_ON(!io_end);
>  	bio->bi_end_io = NULL;
>  
>  	if (bio->bi_error) {
>  		struct inode *inode = io_end->inode;
                                      ^^^^^^^^^  NULL pointer dereference

Otherwise, we can crash above.  The best way to reproduce it is
"gce-xfstests -c ext3 -C 10 generic/081" (it will usually crash on the
very first attempt to run generic/081).  It doesn't repro using
kvm-xfstests for me, so there does seem to be a certain amount of
timing that's needed to trigger it.  I assume the reason using the
ext3 configuration is necessary to ticklet the bug is because it turns
off delayed allocation --- and we probably don't have any workloads in
xfstests where we are doing writes to preallocated blocks while we
trigger an I/O error by letting a snapshot fill up, which is why you
didn't notice this when doing your tests using 4k and 1k block sizes.

To fix this, I've dropped the first two patches in your series, and
just stuck with the last patch ("ext4: avoid unnecessary transaction
stalls during writeback") with the following patch applied on top (I
will merge it into your patch).  We still allocate and deallocate
io_end everywhere, but we do avoid the unnecessary handle starts and
thus, the transaction stalls.

Cheers,

						- Ted

commit d4d6313a2298ec91e6f8bc95ed0d437f33453b45
Author: Theodore Ts'o <tytso@mit.edu>
Date:   Sun Apr 30 16:03:06 2017 -0400

    patch fixup-avoid-unnecessary-transaction-stalls

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index c5eb8b861629..c65e29953800 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1643,6 +1643,7 @@ struct mpage_da_data {
 	 */
 	struct ext4_map_blocks map;
 	struct ext4_io_submit io_submit;	/* IO submission data */
+	unsigned int do_map:1;
 };
 
 static void mpage_release_unused_pages(struct mpage_da_data *mpd,
@@ -2180,7 +2181,7 @@ static bool mpage_add_bh_to_extent(struct mpage_da_data *mpd, ext4_lblk_t lblk,
 	/* First block in the extent? */
 	if (map->m_len == 0) {
 		/* We cannot map unless handle is started... */
-		if (!mpd->io_submit.io_end)
+		if (!mpd->do_map)
 			return false;
 		map->m_lblk = lblk;
 		map->m_len = 1;
@@ -2235,7 +2236,7 @@ static int mpage_process_page_bufs(struct mpage_da_data *mpd,
 			if (mpd->map.m_len)
 				return 0;
 			/* Buffer needs mapping and handle is not started? */
-			if (!mpd->io_submit.io_end)
+			if (!mpd->do_map)
 				return 0;
 			/* Everything mapped so far and we hit EOF */
 			break;
@@ -2760,9 +2761,17 @@ static int ext4_writepages(struct address_space *mapping,
 	 * in the block layer on device congestion while having transaction
 	 * started.
 	 */
+	mpd.do_map = 0;
+	mpd.io_submit.io_end = ext4_init_io_end(inode, GFP_KERNEL);
+	if (!mpd.io_submit.io_end) {
+		ret = -ENOMEM;
+		goto unplug;
+	}
 	ret = mpage_prepare_extent_to_map(&mpd);
 	/* Submit prepared bio */
 	ext4_io_submit(&mpd.io_submit);
+	ext4_put_io_end_defer(mpd.io_submit.io_end);
+	mpd.io_submit.io_end = NULL;
 	/* Unlock pages we didn't use */
 	mpage_release_unused_pages(&mpd, false);
 	if (ret < 0)
@@ -2799,6 +2808,7 @@ static int ext4_writepages(struct address_space *mapping,
 			mpd.io_submit.io_end = NULL;
 			break;
 		}
+		mpd.do_map = 1;
 
 		trace_ext4_da_write_pages(inode, mpd.first_page, mpd.wbc);
 		ret = mpage_prepare_extent_to_map(&mpd);
@@ -2829,6 +2839,7 @@ static int ext4_writepages(struct address_space *mapping,
 		if (!ext4_handle_valid(handle) || handle->h_sync == 0) {
 			ext4_journal_stop(handle);
 			handle = NULL;
+			mpd.do_map = 0;
 		}
 		/* Submit prepared bio */
 		ext4_io_submit(&mpd.io_submit);

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH -v2] ext4: avoid unnecessary transaction stalls during writeback
  2017-04-30 22:01   ` Theodore Ts'o
@ 2017-04-30 22:30     ` Theodore Ts'o
  2017-05-02 12:10       ` Jan Kara
  0 siblings, 1 reply; 11+ messages in thread
From: Theodore Ts'o @ 2017-04-30 22:30 UTC (permalink / raw)
  To: Ext4 Developers List; +Cc: Jan Kara, Theodore Ts'o

From: Jan Kara <jack@suse.cz>

Currently ext4_writepages() submits all pages with transaction started.
When no page needs block allocation or extent conversion we can submit
all dirty pages in the inode while holding a single transaction handle
and when device is congested this can take significant amount of time.
Thus ext4_writepages() can block transaction commits for extended
periods of time.

Take for example a simple benchmark simulating PostgreSQL database
(pgioperf in mmtest). The benchmark runs 16 processes doing random reads
from a huge file, one process doing random writes to the huge file, and
one process doing sequential writes to a small files and frequently
running fsync. With unpatched kernel transaction commits take on average
~18s with standard deviation of ~41s, top 5 commit times are:

274.466639s, 126.467347s, 86.992429s, 34.351563s, 31.517653s.

After this patch transaction commits take on average 0.1s with standard
deviation of 0.15s, top 5 commit times are:

0.563792s, 0.519980s, 0.509841s, 0.471700s, 0.469899s

[ Modified so we use an explicit do_map flag instead of relying on
  io_end not being allocated, the since io_end->inode is needed for I/O
  error handling. -- tytso ]

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
---
 fs/ext4/inode.c | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 178ac7c39403..c65e29953800 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1643,6 +1643,7 @@ struct mpage_da_data {
 	 */
 	struct ext4_map_blocks map;
 	struct ext4_io_submit io_submit;	/* IO submission data */
+	unsigned int do_map:1;
 };
 
 static void mpage_release_unused_pages(struct mpage_da_data *mpd,
@@ -2179,6 +2180,9 @@ static bool mpage_add_bh_to_extent(struct mpage_da_data *mpd, ext4_lblk_t lblk,
 
 	/* First block in the extent? */
 	if (map->m_len == 0) {
+		/* We cannot map unless handle is started... */
+		if (!mpd->do_map)
+			return false;
 		map->m_lblk = lblk;
 		map->m_len = 1;
 		map->m_flags = bh->b_state & BH_FLAGS;
@@ -2231,6 +2235,9 @@ static int mpage_process_page_bufs(struct mpage_da_data *mpd,
 			/* Found extent to map? */
 			if (mpd->map.m_len)
 				return 0;
+			/* Buffer needs mapping and handle is not started? */
+			if (!mpd->do_map)
+				return 0;
 			/* Everything mapped so far and we hit EOF */
 			break;
 		}
@@ -2747,6 +2754,29 @@ static int ext4_writepages(struct address_space *mapping,
 		tag_pages_for_writeback(mapping, mpd.first_page, mpd.last_page);
 	done = false;
 	blk_start_plug(&plug);
+
+	/*
+	 * First writeback pages that don't need mapping - we can avoid
+	 * starting a transaction unnecessarily and also avoid being blocked
+	 * in the block layer on device congestion while having transaction
+	 * started.
+	 */
+	mpd.do_map = 0;
+	mpd.io_submit.io_end = ext4_init_io_end(inode, GFP_KERNEL);
+	if (!mpd.io_submit.io_end) {
+		ret = -ENOMEM;
+		goto unplug;
+	}
+	ret = mpage_prepare_extent_to_map(&mpd);
+	/* Submit prepared bio */
+	ext4_io_submit(&mpd.io_submit);
+	ext4_put_io_end_defer(mpd.io_submit.io_end);
+	mpd.io_submit.io_end = NULL;
+	/* Unlock pages we didn't use */
+	mpage_release_unused_pages(&mpd, false);
+	if (ret < 0)
+		goto unplug;
+
 	while (!done && mpd.first_page <= mpd.last_page) {
 		/* For each extent of pages we use new io_end */
 		mpd.io_submit.io_end = ext4_init_io_end(inode, GFP_KERNEL);
@@ -2775,8 +2805,10 @@ static int ext4_writepages(struct address_space *mapping,
 				wbc->nr_to_write, inode->i_ino, ret);
 			/* Release allocated io_end */
 			ext4_put_io_end(mpd.io_submit.io_end);
+			mpd.io_submit.io_end = NULL;
 			break;
 		}
+		mpd.do_map = 1;
 
 		trace_ext4_da_write_pages(inode, mpd.first_page, mpd.wbc);
 		ret = mpage_prepare_extent_to_map(&mpd);
@@ -2807,6 +2839,7 @@ static int ext4_writepages(struct address_space *mapping,
 		if (!ext4_handle_valid(handle) || handle->h_sync == 0) {
 			ext4_journal_stop(handle);
 			handle = NULL;
+			mpd.do_map = 0;
 		}
 		/* Submit prepared bio */
 		ext4_io_submit(&mpd.io_submit);
@@ -2824,6 +2857,7 @@ static int ext4_writepages(struct address_space *mapping,
 			ext4_journal_stop(handle);
 		} else
 			ext4_put_io_end(mpd.io_submit.io_end);
+		mpd.io_submit.io_end = NULL;
 
 		if (ret == -ENOSPC && sbi->s_journal) {
 			/*
@@ -2839,6 +2873,7 @@ static int ext4_writepages(struct address_space *mapping,
 		if (ret)
 			break;
 	}
+unplug:
 	blk_finish_plug(&plug);
 	if (!ret && !cycled && wbc->nr_to_write > 0) {
 		cycled = 1;
-- 
2.11.0.rc0.7.gbe5a750

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH -v2] ext4: avoid unnecessary transaction stalls during writeback
  2017-04-30 22:30     ` [PATCH -v2] ext4: avoid unnecessary transaction stalls during writeback Theodore Ts'o
@ 2017-05-02 12:10       ` Jan Kara
  2017-05-02 14:18         ` Theodore Ts'o
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Kara @ 2017-05-02 12:10 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Ext4 Developers List, Jan Kara

On Sun 30-04-17 18:30:53, Ted Tso wrote:
> From: Jan Kara <jack@suse.cz>
> 
> Currently ext4_writepages() submits all pages with transaction started.
> When no page needs block allocation or extent conversion we can submit
> all dirty pages in the inode while holding a single transaction handle
> and when device is congested this can take significant amount of time.
> Thus ext4_writepages() can block transaction commits for extended
> periods of time.
> 
> Take for example a simple benchmark simulating PostgreSQL database
> (pgioperf in mmtest). The benchmark runs 16 processes doing random reads
> from a huge file, one process doing random writes to the huge file, and
> one process doing sequential writes to a small files and frequently
> running fsync. With unpatched kernel transaction commits take on average
> ~18s with standard deviation of ~41s, top 5 commit times are:
> 
> 274.466639s, 126.467347s, 86.992429s, 34.351563s, 31.517653s.
> 
> After this patch transaction commits take on average 0.1s with standard
> deviation of 0.15s, top 5 commit times are:
> 
> 0.563792s, 0.519980s, 0.509841s, 0.471700s, 0.469899s
> 
> [ Modified so we use an explicit do_map flag instead of relying on
>   io_end not being allocated, the since io_end->inode is needed for I/O
>   error handling. -- tytso ]

Yeah, thanks for finding the problem and fixing it up! The updated patch
looks good. It is actually relatively easy to fix the original approach so
that we don't have to allocate ioend unnecessarily (we just fetch the inode
pointer on error through bio_vec->bv_page->mapping->host) but I'll do that
separately on top of this change.

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH -v2] ext4: avoid unnecessary transaction stalls during writeback
  2017-05-02 12:10       ` Jan Kara
@ 2017-05-02 14:18         ` Theodore Ts'o
  0 siblings, 0 replies; 11+ messages in thread
From: Theodore Ts'o @ 2017-05-02 14:18 UTC (permalink / raw)
  To: Jan Kara; +Cc: Ext4 Developers List

On Tue, May 02, 2017 at 02:10:44PM +0200, Jan Kara wrote:
> 
> Yeah, thanks for finding the problem and fixing it up! The updated patch
> looks good. It is actually relatively easy to fix the original approach so
> that we don't have to allocate ioend unnecessarily (we just fetch the inode
> pointer on error through bio_vec->bv_page->mapping->host) but I'll do that
> separately on top of this change.

We'll need to be a little bit careful to make sure this works for the
encryption case (where we use a temporary scratch/bounce buffer), but
it should be possible to make that work (we just have to look at the
right struct page).

					- Ted

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2017-05-02 14:18 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-11 13:54 [PATCH 0/3] ext4: Avoid transaction stalls during writeback Jan Kara
2017-04-11 13:54 ` [PATCH 1/3] ext4: Allow IO submission without io_end Jan Kara
2017-04-30 22:01   ` Theodore Ts'o
2017-04-30 22:30     ` [PATCH -v2] ext4: avoid unnecessary transaction stalls during writeback Theodore Ts'o
2017-05-02 12:10       ` Jan Kara
2017-05-02 14:18         ` Theodore Ts'o
2017-04-11 13:54 ` [PATCH 2/3] ext4: Don't allocate io_end for writeback from ext4_writepage() Jan Kara
2017-04-11 13:54 ` [PATCH 3/3] ext4: Avoid unnecessary transaction stalls during writeback Jan Kara
2017-04-11 15:00   ` Amir Goldstein
2017-04-11 16:16     ` Jan Kara
2017-04-30  0:47   ` Theodore Ts'o

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.