linux-ext4.vger.kernel.org archive mirror
* Help: ext4 jbd2 IO requests slow down fsync
@ 2020-01-24  6:28 Colin Zou
  2020-01-25  1:57 ` Theodore Y. Ts'o
  0 siblings, 1 reply; 4+ messages in thread
From: Colin Zou @ 2020-01-24  6:28 UTC (permalink / raw)
  To: linux-ext4

Hi,

I used to run my application on ext3 on an SSD and recently switched
to ext4. However, my application now sees a performance regression.
The root cause, according to iosnoop, is that the workload includes a
lot of fsync calls and every fsync does data IO as well as jbd2 IO,
whereas on ext3 it seldom did journal IO. Is there a way to tune ext4
to improve fsync performance, say by reducing jbd2 IO requests?
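
In case it helps, the per-request pattern is roughly the sketch below
(illustrative only; the path, file size, and request count are
placeholders, not the real application code):

#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
        char buf[4096];
        memset(buf, 'x', sizeof(buf));

        /* Buffered (non-O_DIRECT) I/O, as in the real application. */
        int fd = open("/db/test.file", O_CREAT | O_WRONLY, 0644);
        if (fd < 0)
                return 1;

        for (int i = 0; i < 100000; i++) {
                /* One 4KB write at a random 4KB-aligned offset
                 * (here within a 1GB range)... */
                off_t off = (off_t)(rand() % 262144) * 4096;
                if (pwrite(fd, buf, sizeof(buf), off) != (ssize_t)sizeof(buf))
                        return 1;
                /* ...followed immediately by an fsync(), which is where
                 * the jbd2 journal I/O shows up in iosnoop. */
                if (fsync(fd) < 0)
                        return 1;
        }

        close(fd);
        return 0;
}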

Thanks,
Colin


* Re: Help: ext4 jbd2 IO requests slow down fsync
  2020-01-24  6:28 Help: ext4 jbd2 IO requests slow down fsync Colin Zou
@ 2020-01-25  1:57 ` Theodore Y. Ts'o
  2020-01-28  4:55   ` Colin Zou
  0 siblings, 1 reply; 4+ messages in thread
From: Theodore Y. Ts'o @ 2020-01-25  1:57 UTC (permalink / raw)
  To: Colin Zou; +Cc: linux-ext4

On Thu, Jan 23, 2020 at 10:28:47PM -0800, Colin Zou wrote:
> 
> I used to run my application on ext3 on an SSD and recently switched
> to ext4. However, my application now sees a performance regression.
> The root cause, according to iosnoop, is that the workload includes a
> lot of fsync calls and every fsync does data IO as well as jbd2 IO,
> whereas on ext3 it seldom did journal IO. Is there a way to tune ext4
> to improve fsync performance, say by reducing jbd2 IO requests?

If you're not seeing journal I/O from ext3 after an fsync, you're not
looking at things correctly.  At the very *least* there will be
journal I/O for the commit block, unless all of the work was done
earlier in a previous journal commit.

In general, ext4 and ext3 will be doing roughly the same amount of I/O
to the journal.  In some cases, depending on the workload, ext4
*might* need to do more data I/O for the file being synced.  That's
because with ext3, if there is an intervening periodic 5 second
journal commit, some or all of the data I/O may have been forced out
to disk earlier due to said 5 second sync.

What sort of workload does your application do?  How many data blocks
are you writing before each fsync(), and how often do the fsync()
operations occur?

						- Ted


* Re: Help: ext4 jbd2 IO requests slow down fsync
  2020-01-25  1:57 ` Theodore Y. Ts'o
@ 2020-01-28  4:55   ` Colin Zou
  2020-01-28 19:34     ` Theodore Y. Ts'o
  0 siblings, 1 reply; 4+ messages in thread
From: Colin Zou @ 2020-01-28  4:55 UTC (permalink / raw)
  To: Theodore Y. Ts'o; +Cc: linux-ext4

Thanks for the information and analysis. I ran some more tests. My app
runs a random 4KB workload on an SSD device, one write followed by one
fsync. Below is the fio test simulating the workload, along with the
results. Please take a look and let me know what you think.

=================
Ext3 (Kernel 3.2)
=================
# ./fio  --rw=randwrite --bs=4k --direct=0 --filesize=1G --numjobs=32
--runtime=240 --group_reporting --filename=/db/test.benchmark
--name=randomwrite --fsync=3 --invalidate=1

randomwrite: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
...
fio-2.16-5-g915ca
Starting 32 processes
randomwrite: Laying out IO file(s) (1 file(s) / 1024MB)
Jobs: 31 (f=31): [w(9),_(1),w(22)] [100.0% done] [0KB/294.4MB/0KB /s] [0/75.4K/0 iops] [eta 00m:00s]
randomwrite: (groupid=0, jobs=32): err= 0: pid=9050: Tue Jan  7 18:00:16 2020
  write: io=32768MB, bw=272130KB/s, iops=68032, runt=123303msec
     <<<<<<<<<<<  iops is much higher than 4.4 kernel test below.
    clat (usec): min=1, max=21783, avg=35.40, stdev=137.63
     lat (usec): min=1, max=21783, avg=35.51, stdev=137.74
    clat percentiles (usec):
     |  1.00th=[    1],  5.00th=[    2], 10.00th=[    2], 20.00th=[    3],
     | 30.00th=[    4], 40.00th=[    6], 50.00th=[    7], 60.00th=[   10],
     | 70.00th=[   14], 80.00th=[   23], 90.00th=[   94], 95.00th=[  211],
     | 99.00th=[  354], 99.50th=[  580], 99.90th=[ 1144], 99.95th=[ 1336],
     | 99.99th=[ 1816]
    lat (usec) : 2=2.18%, 4=19.43%, 10=36.36%, 20=19.23%, 50=10.43%
    lat (usec) : 100=2.60%, 250=6.43%, 500=2.75%, 750=0.23%, 1000=0.19%
    lat (msec) : 2=0.16%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
  cpu          : usr=0.35%, sys=8.89%, ctx=27987466, majf=0, minf=1120
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=8388608/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: io=32768MB, aggrb=272129KB/s, minb=272129KB/s, maxb=272129KB/s, mint=123303msec, maxt=123303msec

Disk stats (read/write):
    dm-4: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=1/8378314, aggrmerge=0/10665, aggrticks=0/3661470, aggrin_queue=3661970, aggrutil=97.88%
  sdb: ios=1/8378314, merge=0/10665, ticks=0/3661470, in_queue=3661970, util=97.88%

=================
Ext4 (Kernel 4.4)
=================
# ./fio  --rw=randwrite --bs=4k --direct=0 --filesize=1G --numjobs=32
--runtime=240 --group_reporting --filename=/db/test.benchmark
--name=randomwrite --fsync=3 --invalidate=1

randomwrite: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
...
fio-2.16-5-g915ca
Starting 32 processes
Jobs: 32 (f=32): [w(32)] [100.0% done] [0KB/219.3MB/0KB /s] [0/56.2K/0 iops] [eta 00m:00s]
randomwrite: (groupid=0, jobs=32): err= 0: pid=27703: Tue Jan  7 17:42:03 2020
  write: io=32768MB, bw=228297KB/s, iops=57074, runt=146977msec
    clat (usec): min=1, max=1647, avg=22.88, stdev=27.19
     lat (usec): min=1, max=1647, avg=22.96, stdev=27.22
    clat percentiles (usec):
     |  1.00th=[    2],  5.00th=[    3], 10.00th=[    5], 20.00th=[    8],
     | 30.00th=[   11], 40.00th=[   13], 50.00th=[   15], 60.00th=[   17],
     | 70.00th=[   22], 80.00th=[   33], 90.00th=[   52], 95.00th=[   72],
     | 99.00th=[  103], 99.50th=[  131], 99.90th=[  241], 99.95th=[  326],
     | 99.99th=[  852]
    lat (usec) : 2=0.68%, 4=5.79%, 10=18.95%, 20=40.42%, 50=22.92%
    lat (usec) : 100=9.96%, 250=1.19%, 500=0.06%, 750=0.01%, 1000=0.01%
    lat (msec) : 2=0.01%
  cpu          : usr=0.30%, sys=10.26%, ctx=24443801, majf=0, minf=295
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=8388608/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: io=32768MB, aggrb=228297KB/s, minb=228297KB/s, maxb=228297KB/s, mint=146977msec, maxt=146977msec

Disk stats (read/write):
    dm-3: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=0/8555222, aggrmerge=0/88454, aggrticks=0/4851095, aggrin_queue=4858514, aggrutil=91.66%
  sdb: ios=0/8555222, merge=0/88454, ticks=0/4851095, in_queue=4858514, util=91.66%

On Fri, Jan 24, 2020 at 5:57 PM Theodore Y. Ts'o <tytso@mit.edu> wrote:
>
> On Thu, Jan 23, 2020 at 10:28:47PM -0800, Colin Zou wrote:
> >
> > I used to run my application on ext3 on an SSD and recently switched
> > to ext4. However, my application now sees a performance regression.
> > The root cause, according to iosnoop, is that the workload includes a
> > lot of fsync calls and every fsync does data IO as well as jbd2 IO,
> > whereas on ext3 it seldom did journal IO. Is there a way to tune ext4
> > to improve fsync performance, say by reducing jbd2 IO requests?
>
> If you're not seeing journal I/O from ext3 after an fsync, you're not
> looking at things correctly.  At the very *least* there will be
> journal I/O for the commit block, unless all of the work was done
> earlier in a previous journal commit.
>
> In general, ext4 and ext3 will be doing roughly the same amount of I/O
> to the journal.  In some cases, depending on the workload, ext4
> *might* need to do more data I/O for the file being synced.  That's
> because with ext3, if there is an intervening periodic 5 second
> journal commit, some or all of the data I/O may have been forced out
> to disk earlier due to said 5 second sync.
>
> What sort of workload does your application do?  How many data blocks
> are you writing before each fsync(), and how often do the fsync()
> operations occur?
>
>                                                 - Ted


* Re: Help: ext4 jbd2 IO requests slow down fsync
  2020-01-28  4:55   ` Colin Zou
@ 2020-01-28 19:34     ` Theodore Y. Ts'o
  0 siblings, 0 replies; 4+ messages in thread
From: Theodore Y. Ts'o @ 2020-01-28 19:34 UTC (permalink / raw)
  To: Colin Zou; +Cc: linux-ext4

On Mon, Jan 27, 2020 at 08:55:04PM -0800, Colin Zou wrote:
> Thanks for the information and analysis. I ran some more tests. My app
> runs a random 4KB workload on an SSD device, one write followed by one
> fsync. Below is the fio test simulating the workload, along with the
> results. Please take a look and let me know what you think.

What changed, and what didn't, between the two tests?  I see you went
from the 3.2 kernel to the 4.4 kernel.  Was the hardware held
constant?  What about the file system configuration?  Did you use a
freshly formatted file system before running each test, and with what
configuration?  Ext4 tends to enable 64-bit support, 256-byte inodes,
and journal checksums.  On much older versions of e2fsprogs, an ext3
file system may be using 128-byte inodes.

I see that your test uses buffered I/O and issues an fsync after
every 12k of writes (--bs=4k --fsync=3).  With that sort of workload,
differences caused by ext4's use of delayed allocation are largely
mooted; in both cases, the data block writes *have* to be forced out
as part of the fsync operation.
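
Concretely, the unit of work each fio thread repeats is roughly the
following (just a sketch; the file name, open flags, and offsets
stand in for whatever fio does internally):

#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
        char buf[4096];
        memset(buf, 'x', sizeof(buf));

        int fd = open("/db/test.benchmark", O_CREAT | O_WRONLY, 0644);
        if (fd < 0)
                return 1;

        /* --bs=4k --fsync=3: three buffered 4k writes (12k of dirty data)... */
        for (int i = 0; i < 3; i++)
                if (pwrite(fd, buf, sizeof(buf), (off_t)i * 4096) != (ssize_t)sizeof(buf))
                        return 1;

        /* ...then an fsync(), which has to push out both the 12k of data
         * and the jbd2 journal commit, delayed allocation or not. */
        if (fsync(fd) < 0)
                return 1;

        close(fd);
        return 0;
}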

So something else is going on.  Looking at the output of dumpe2fs -h
on both file systems would be useful.  You can also try creating a
file system using mke2fs -t ext3 and mounting it with -t ext3 (making
sure CONFIG_FS_EXT3 is enabled on the 4.4 kernel) and see what sort of
results you get from that.  Although the ext3 code was removed from
the 4.4 kernels, we do have an ext3 emulation mode that disables all
of the ext4 optimizations and uses the ext3-style algorithms.

Note that with newer versions of e2fsprogs, the default inode size is
now 256 bytes, even if you create the file system using "mke2fs -t
ext3" or "mkfs.ext3".  The decision to go to a larger inode size was
to optimize SELinux performance, but if you're using a really ancient
distro, you might have an equally ancient version of e2fsprogs that is
using 128-byte inodes.  A smaller inode means we can put more inodes
in a 4k block, and this can decrease the need for metadata updates.
This could very much be an issue with this workload, since there are
32 threads writing in parallel.
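(For the arithmetic: a 4k block holds 4096 / 128 = 32 of the old
128-byte inodes but only 4096 / 256 = 16 of the 256-byte ones, so the
same number of inodes ends up spread across twice as many inode table
blocks.)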

The other thing that could be going on is that ext3 had a really,
really stupid allocator that didn't try to keep files contiguous.
Combined with the lack of preallocation, and a workload which has 32
threads doing "write 12k, fsync", it's very likely that the files are
horribly fragmented.  Using a 4-file example:

         BLOCKS
File A:  100, 101, 102, 112, 113, 114, 124, 125, 126, ...
File B:  103, 104, 105, 115, 116, 117, 127, 128, 129, ...
File C:  106, 107, 108, 118, 119, 120, 130, 131, 132, ...
File D:  109, 110, 111, 121, 122, 123, 133, 134, 135, ...

Each file's blocks end up interleaved with the other files', which is
terrible for any later sequential read of a single file.  But what it
does mean is that the workload could have had a very sequential I/O
*pattern* at write time:

100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, <CACHE FLUSH>
112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, <CACHE FLUSH>

With ext4 (and even ext4 in "ext3 emulation mode") the write patterns
will be less sequential, but the resulting files will be much more
contiguous.  And this could be causing the SSD to take more time to do
the write requests and the cache flush operations.

That could very well be what you are seeing.  Is your benchmark
workload of parallel, buffered writes with an fsync every 12k really
representative of what your application is actually doing in
production?

	       	       		   	    	     - Ted


Thread overview: 4+ messages
2020-01-24  6:28 Help: ext4 jbd2 IO requests slow down fsync Colin Zou
2020-01-25  1:57 ` Theodore Y. Ts'o
2020-01-28  4:55   ` Colin Zou
2020-01-28 19:34     ` Theodore Y. Ts'o
