Big I/O requests are split into small ones due to unaligned ext4 partition boundary?

From: Dexuan Cui @ 2016-12-15 11:47 UTC
  To: Jens Axboe, Theodore Ts'o, Andreas Dilger, linux-block, linux-ext4
  Cc: linux-kernel, Abel Hu, Thomas Shao, Matthew Wilcox, Long Li,
	KY Srinivasan

Hi, when I run "mkfs.ext4 /dev/sdc2" in a Linux virtual machine on Hyper-V,
where I have applied a disk limit of IOPS=500 [0], the command takes much
longer if the ext4 partition boundaries are not properly aligned:

Example 1 [1]: it takes ~7 minutes with average wMB/s = 0.3   (slow)
Example 2 [2]: it takes ~3.5 minutes with average wMB/s = 0.6 (slow)
Example 3 [3]: it takes ~0.5 minute with average wMB/s = 4 (expected)

strace shows that the mkfs.ext4 program calls seek()/write() a lot, and most of
the writes use 32KB buffers (this should be big enough). The program
invokes fsync() only once, after it has issued all the writes, and that fsync()
takes more than 99% of the time.

Logging the SCSI commands shows that the SCSI Write(10) command is used for the
userspace 32KB writes:
in example 1, *each* command writes 1 or 2 sectors only (1 sector = 512 bytes);
in example 2, *each* command writes 2 or 4 sectors only;
in example 3, each command writes 1024 sectors.

It looks like the kernel block I/O layer somehow splits big user-space buffers
into really small write requests (1, 2, and 4 sectors)?
This looks really strange to me.

Note: in my tests, this strange issue happens on the 4.4 and mainline 4.9 kernels,
but the stable 3.18.45 kernel doesn't have the issue, i.e. all three of the test
examples above finish in ~0.5 minute.

Any comments?

Thanks!
-- Dexuan

[0] The max IOPS are measured in 8KB increments, meaning the max
 throughput is 8KB * 500 = 4000KB/s.
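
As a rough cross-check of the figures in this message, the sketch below
estimates the throughput ceiling for each logged SCSI command size. It
assumes (this is not confirmed anywhere in this thread) that every write
command is charged at least one I/O against the limit, plus one more for
each additional 8KB. The function name is made up for illustration.

```python
import math

SECTOR_BYTES = 512

def throughput_ceiling_kb_s(sectors_per_cmd, iops_limit=500, unit_kb=8):
    """Upper bound on write throughput in KB/s for a fixed command size.

    Assumption: a command costs ceil(size / unit_kb) I/Os, minimum 1.
    """
    size_kb = sectors_per_cmd * SECTOR_BYTES / 1024
    cost = max(1, math.ceil(size_kb / unit_kb))
    commands_per_second = iops_limit / cost
    return commands_per_second * size_kb

# Command sizes reported for examples 1, 2, and 3.
for sectors in (1, 2, 4, 1024):
    print(sectors, "sectors:", throughput_ceiling_kb_s(sectors), "KB/s")
```

Under that assumption the ceiling is 250 to 500 KB/s for 1- or 2-sector
writes, 500 to 1000 KB/s for 2- or 4-sector writes, and 4000 KB/s for
1024-sector writes, which is in the same range as the observed 0.3, 0.6,
and 4 wMB/s.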

[1] This is the partition info of my 20GB disk:
# fdisk -l /dev/sdc
Disk /dev/sdc: 20 GiB, 21474836480 bytes, 41943040 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0x00000000

Device     Boot    Start      End  Sectors  Size Id Type
/dev/sdc1              1 14281784 14281784  6.8G 82 Linux swap / Solaris
/dev/sdc2       14281785 41929649 27647865 13.2G 83 Linux

Here, start_sector = 14281785, end_sector = 41929649.

[2] start_sector = 14282752, end_sector = 41929649

[3] start_sector = 14282752, end_sector = 41943039
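
For reference, the geometry of the three layouts can be compared with a
small sketch like the one below. It only does arithmetic on the start and
end sectors listed in [1], [2], and [3]; the function and key names are
hypothetical, and the 4096-byte cap is taken from the physical sector size
that fdisk reports.

```python
SECTOR_BYTES = 512

def partition_geometry(start, end, cap_sectors=8):
    """Describe how a partition lines up with 4096-byte (8-sector) units."""
    count = end - start + 1  # end sector is inclusive
    # Largest power-of-two number of sectors (up to the cap) dividing the length.
    granule = 1
    while granule < cap_sectors and count % (granule * 2) == 0:
        granule *= 2
    return {
        "start_offset_sectors": start % cap_sectors,  # 0 means 4096-byte aligned
        "sectors": count,
        "size_granule_bytes": granule * SECTOR_BYTES,
    }

examples = {
    1: (14281785, 41929649),
    2: (14282752, 41929649),
    3: (14282752, 41943039),
}
for number, (start, end) in examples.items():
    print(number, partition_geometry(start, end))
```

Only example 3 has both a 4096-byte-aligned start and a length that is a
multiple of 4096 bytes. Example 1 has an odd sector count, and example 2
has a length divisible by 1024 bytes but not by 2048. This restates the
geometry only; it does not show why the kernel issues the small writes.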
