* about the xfs performance
@ 2016-04-11 14:14 Songbo Wang
2016-04-11 16:10 ` Emmanuel Florac
2016-04-11 23:10 ` Dave Chinner
0 siblings, 2 replies; 8+ messages in thread
From: Songbo Wang @ 2016-04-11 14:14 UTC (permalink / raw)
To: xfs
Hi xfsers:
I have run into some performance trouble with XFS.
The environment is:
xfs version 3.2.1,
CentOS 7.1,
kernel version 3.10.0-229.el7.x86_64,
PCIe SSD card.
mkfs: mkfs.xfs /dev/hioa2 -f -n size=64k -i size=512 -d agcount=40 -l
size=1024m.
mount: mount /dev/hioa2 /mnt/ -t xfs -o
rw,noexec,nodev,noatime,nodiratime,nobarrier,discard,inode64,logbsize=256k,delaylog
I use the following command to test IOPS: fio -ioengine=libaio -bs=4k
-direct=1 -thread -rw=randwrite -size=50G -filename=/mnt/test -name="EBS
4KB randwrite test" -iodepth=64 -runtime=60
The results are normal at the beginning, about 210k IOPS, but a few
seconds later they drop to about 19k.
I did a second test:
umount /dev/hioa2,
fio -ioengine=libaio -bs=4k -direct=1 -thread -rw=randwrite
-filename=/dev/hioa2 -name="EBS 8KB randwrite test" -iodepth=64 -runtime=60
The results were normal; IOPS stayed around 210k the whole time.
Then I used perf-tools to trace xfs:
# ./funccount -i 1 'xfs*'
In the first test case, while IOPS was still normal, the results were as
follows:
FUNC COUNT
xfs_free_eofblocks 1
xfs_attr_get 5
xfs_attr_get_int 5
xfs_attr_name_to_xname 5
xfs_attr_shortform_getvalue 5
xfs_da_hashname 5
xfs_ilock_attr_map_shared 5
xfs_xattr_get 5
xfs_fs_statfs 7
xfs_icsb_count 7
xfs_icsb_sync_counters 7
xfs_icsb_sync_counters_locked 7
xfs_can_free_eofblocks 9
xfs_file_release 9
xfs_release 9
xfs_icsb_counter_disabled 21
xfs_file_open 30
xfs_file_aio_read 33
xfs_readlink 37
xfs_vn_follow_link 37
xfs_vn_put_link 37
xfs_buf_delwri_submit_nowait 41
xfs_trans_ail_cursor_first 41
xfs_trans_ail_cursor_init 41
xfs_vn_getattr 44
xfs_file_mmap 48
xfs_cil_prepare_item.isra.1 1004
xfs_extent_busy_clear 1004
xfs_inode_item_committing 1004
xfs_inode_item_data_fork_size. 1004
xfs_inode_item_format 1004
xfs_inode_item_format_data_for 1004
xfs_inode_item_size 1004
xfs_inode_item_unlock 1004
xfs_log_calc_unit_res 1004
xfs_log_commit_cil 1004
xfs_log_done 1004
xfs_log_reserve 1004
xfs_log_space_wake 1004
xfs_log_ticket_put 1004
xfs_trans_add_item 1004
xfs_trans_alloc 1004
xfs_trans_apply_dquot_deltas 1004
xfs_trans_commit 1004
xfs_trans_free 1004
xfs_trans_free_dqinfo 1004
xfs_trans_free_item_desc 1004
xfs_trans_free_items 1004
xfs_trans_ijoin 1004
xfs_trans_log_inode 1004
xfs_trans_reserve 1004
xfs_trans_unreserve_and_mod_sb 1004
xfs_vn_update_time 1004
xfs_destroy_ioend 84845
xfs_end_io_direct_write 84845
xfs_finish_ioend 84845
xfs_file_aio_write 84880
xfs_file_dio_aio_write 84882
xfs_vm_direct_IO 84882
xfs_get_blocks_direct 84883
xfs_bmapi_read 84885
xfs_bmap_search_extents 84886
xfs_bmap_search_multi_extents 84886
xfs_iext_bno_to_ext 84887
xfs_iext_bno_to_irec 84887
xfs_iext_get_ext 84890
xfs_iext_idx_to_irec 84890
xfs_ilock_data_map_shared 84891
xfs_map_buffer.isra.9 84892
xfs_file_aio_write_checks 84893
xfs_alloc_ioend 84895
xfs_bmapi_trim_map.isra.11 84900
xfs_fsb_to_db 84902
xfs_bmapi_update_map 84903
xfs_bmbt_get_all 169793
xfs_find_bdev_for_inode 169794
xfs_ilock 170868
xfs_iunlock 170872
xfs_bmbt_get_blockcount 555016
xfs_bmbt_get_startoff 2277490
When IOPS dropped to about 19k, the trace results were:
FUNC COUNT
xfs_bitmap_empty 1
xfs_free_eofblocks 1
xfs_attr_get 5
xfs_attr_get_int 5
xfs_attr_name_to_xname 5
xfs_attr_shortform_getvalue 5
xfs_da_hashname 5
xfs_ilock_attr_map_shared 5
xfs_xattr_get 5
xfs_fs_statfs 7
xfs_icsb_count 7
xfs_icsb_sync_counters 7
xfs_icsb_sync_counters_locked 7
xfs_can_free_eofblocks 9
xfs_file_release 9
xfs_release 9
xfs_file_open 30
xfs_file_aio_read 33
xfs_readlink 37
xfs_vn_follow_link 37
xfs_vn_put_link 37
xfs_buf_delwri_submit_nowait 41
xfs_trans_ail_cursor_first 41
xfs_trans_ail_cursor_init 41
xfs_alloc_ag_vextent 43
xfs_alloc_ag_vextent_near 43
xfs_alloc_find_best_extent 43
xfs_alloc_fix_freelist 43
xfs_alloc_fixup_trees 43
xfs_alloc_read_agf 43
xfs_alloc_read_agfl 43
xfs_alloc_update 43
xfs_alloc_update_counters.isra 43
xfs_alloc_vextent 43
xfs_bmbt_alloc_block 43
xfs_btree_get_buf_block.constp 43
xfs_btree_reada_bufs 43
xfs_btree_readahead_sblock.isr 43
xfs_btree_split 43
xfs_btree_split_worker 43
xfs_read_agf 43
xfs_trans_get_buf_map 43
xfs_allocbt_dup_cursor 44
xfs_allocbt_get_minrecs 44
xfs_allocbt_init_rec_from_cur 44
xfs_allocbt_update_lastrec 44
xfs_alloc_fix_minleft 44
xfs_alloc_log_agf 44
xfs_alloc_longest_free_extent 44
xfs_bmbt_init_rec_from_key 44
xfs_bmbt_update_cursor 44
xfs_btree_buf_to_ptr.isra.17 44
xfs_btree_check_sblock 44
xfs_btree_init_block_cur.isra. 44
xfs_btree_init_block_int 44
xfs_btree_islastblock 44
xfs_trans_mod_dquot_byino 44
xfs_trans_mod_sb 44
xfs_vn_getattr 44
xfs_file_mmap 48
xfs_iext_irec_new 57
xfs_iext_realloc_indirect 57
xfs_btree_copy_ptrs 66
xfs_btree_shift_keys.isra.23 66
xfs_btree_shift_ptrs 66
xfs_btree_log_ptrs 72
xfs_alloc_get_rec 86
xfs_alloc_lookup_eq 86
xfs_alloc_compute_aligned 88
xfs_btree_get_rec 88
xfs_extent_busy_trim 88
xfs_iext_add_indirect_multi 108
xfs_alloc_compute_diff 110
xfs_alloc_fix_len 110
xfs_allocbt_init_cursor 132
xfs_iext_irec_compact 163
xfs_iext_remove 163
xfs_iext_remove_indirect 163
xfs_buf_item_relse 173
xfs_allocbt_init_ptr_from_cur 176
xfs_btree_set_sibling.isra.12 176
xfs_buf_item_free 177
xfs_buf_item_free_format 177
xfs_buf_hold 248
xfs_buf_item_get_format 248
xfs_log_item_init 248
xfs_allocbt_get_maxrecs 264
xfs_bmbt_get_minrecs 326
xfs_btree_dec_cursor 369
xfs_btree_delete 369
xfs_btree_delrec 369
xfs_bmbt_set_allf 633
xfs_bmbt_set_startblock 1007
xfs_bmbt_set_startoff 1007
xfs_trans_alloc 1029
xfs_vn_update_time 1029
xfs_btree_lshift 1048
xfs_btree_decrement 1332
xfs_allocbt_init_key_from_rec 1481
xfs_allocbt_key_diff 2427
xfs_btree_check_lblock 3653
xfs_btree_lastrec 3653
xfs_buf_free 3660
xfs_bmbt_dup_cursor 3696
xfs_btree_check_block 3696
xfs_buf_allocate_memory 3703
xfs_btree_dup_cursor 3738
xfs_btree_increment 4324
xfs_btree_make_block_unfull 4699
xfs_btree_rshift 4699
xfs_btree_updkey 4738
xfs_btree_copy_keys.isra.22 4824
xfs_btree_log_keys 4832
xfs_iext_add 4876
xfs_iext_insert 4876
xfs_btree_reada_bufl 5465
xfs_btree_readahead_lblock.isr 5465
xfs_buf_readahead_map 5508
xfs_btree_readahead 5670
xfs_inode_item_format 6071
xfs_inode_item_format_data_for 6072
xfs_inode_item_size 6072
xfs_inode_item_data_fork_size. 6073
xfs_trans_log_inode 6115
xfs_buf_item_pin 6315
xfs_bmapi_write 6674
xfs_end_io 6674
xfs_iomap_write_unwritten 6674
xfs_bmap_add_extent_unwritten_ 6675
xfs_bmapi_convert_unwritten 6675
xfs_destroy_ioend 6675
xfs_end_io_direct_write 6676
xfs_finish_ioend 6677
xfs_bmap_finish 6683
xfs_file_aio_write 6784
xfs_bmapi_read 6785
xfs_file_aio_write_checks 6785
xfs_file_dio_aio_write 6785
xfs_get_blocks_direct 6785
xfs_ilock_data_map_shared 6785
xfs_map_buffer.isra.9 6785
xfs_vm_direct_IO 6785
xfs_alloc_ioend 6786
xfs_fsb_to_db 6786
xfs_bmbt_set_blockcount 7691
xfs_log_commit_cil 7704
xfs_trans_commit 7704
xfs_trans_free_items 7704
xfs_trans_reserve 7705
xfs_trans_unreserve_and_mod_sb 7705
xfs_log_done 7706
xfs_trans_free 7706
xfs_log_reserve 7708
xfs_trans_ijoin 7708
xfs_extent_busy_clear 7712
xfs_inode_item_unlock 7712
xfs_log_space_wake 7712
xfs_log_ticket_put 7712
xfs_trans_apply_dquot_deltas 7712
xfs_inode_item_committing 7713
xfs_log_calc_unit_res 7713
xfs_trans_free_dqinfo 7713
xfs_bmbt_update 8311
xfs_btree_update 8354
xfs_bmbt_init_rec_from_cur 9758
xfs_bmbt_disk_set_all 9760
xfs_bmbt_set_all 9760
xfs_btree_insert 9795
xfs_btree_insrec 9838
xfs_bmbt_init_cursor 10379
xfs_btree_del_cursor 10498
xfs_trans_del_item 11101
xfs_buf_item_dirty 11109
xfs_bmbt_lookup_eq 11551
xfs_bmbt_init_ptr_from_cur 11563
xfs_btree_get_iroot.isra.8 11563
xfs_buf_item_format 11690
xfs_buf_item_size 11690
xfs_buf_item_size_segment.isra 11690
xfs_btree_lookup 11723
xfs_btree_get_sibling.isra.11 11974
xfs_buf_item_unlock 12229
xfs_buf_item_committing 12237
xfs_icsb_modify_counters 13351
xfs_icsb_lock_cntr 13365
xfs_icsb_unlock_cntr 13365
xfs_iext_bno_to_irec 13458
xfs_bmap_search_extents 13459
xfs_bmap_search_multi_extents 13459
xfs_iext_bno_to_ext 13459
xfs_bmapi_trim_map.isra.11 13469
xfs_bmapi_update_map 13469
xfs_find_bdev_for_inode 13573
xfs_btree_shift_recs.isra.24 14798
xfs_cil_prepare_item.isra.1 17769
xfs_bmbt_disk_set_allf 18126
xfs_btree_is_lastrec 18581
xfs_btree_log_block 19648
xfs_btree_set_ptr_null.isra.10 19649
xfs_btree_offsets 19703
xfs_iunlock 21348
xfs_ilock 21349
xfs_btree_copy_recs.isra.21 22849
xfs_buf_item_init 23331
xfs_buf_unlock 23345
xfs_btree_setbuf 23948
xfs_btree_log_recs 24190
xfs_icsb_counter_disabled 26751
xfs_buf_read_map 28786
xfs_buf_get_map 28829
xfs_btree_ptr_is_null.isra.9 29274
xfs_btree_read_buf_block.isra. 29684
xfs_btree_set_refs.isra.13 29712
xfs_trans_add_item 31037
xfs_trans_free_item_desc 31052
xfs_buf_trylock 32463
xfs_perag_put 32606
xfs_perag_get 32633
xfs_buf_rele 32693
xfs_trans_brelse 35148
xfs_btree_ptr_addr 38591
xfs_btree_ptr_offset 38738
xfs_trans_read_buf_map 40938
xfs_trans_buf_item_match 41000
xfs_btree_ptr_to_daddr 44562
xfs_btree_lookup_get_block 46548
xfs_trans_log_buf 48783
xfs_buf_item_log 48806
xfs_trans_buf_set_type 48815
xfs_iext_get_ext 51712
xfs_bmbt_get_all 53669
xfs_btree_get_block 54424
xfs_iext_idx_to_irec 56768
xfs_bmbt_get_maxrecs 57774
xfs_bmbt_get_blockcount 89650
xfs_bmbt_disk_get_startoff 98285
xfs_bmbt_init_key_from_rec 98285
xfs_btree_rec_addr 115199
xfs_btree_rec_offset 163635
xfs_btree_key_addr 176837
xfs_btree_key_offset 186525
xfs_bmbt_key_diff 254763
xfs_lookup_get_search_key 257190
xfs_bmbt_get_startoff 363158
xfs_next_bit 631334
xfs_buf_offset 1166477
Comparing the two results above, I found that when IOPS drops to about
19k, many more functions show up in the trace, such as xfs_buf_trylock,
xfs_iext_insert, xfs_btree_insert, etc.
I cannot work out what causes IOPS to drop to about 19k - any suggestions?
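Comparing the two funccount snapshots by hand is tedious; a small awk
script can diff them. The file names fast.txt and slow.txt are
hypothetical captures of the two outputs above (a few sample lines are
created here only so the sketch is self-contained):

```shell
# fast.txt / slow.txt stand in for the two funccount snapshots
# ("function count" pairs, header line stripped); sample lines
# are created here so the script runs on its own.
printf '%s\n' 'xfs_ilock 170868' 'xfs_file_open 30' > fast.txt
printf '%s\n' 'xfs_ilock 171000' 'xfs_btree_insert 9795' 'xfs_file_open 31' > slow.txt

# Print functions whose count grew by more than 1000 between the
# snapshots; functions absent from fast.txt count as 0.
awk 'NR==FNR { fast[$1] = $2; next }
     { delta = $2 - (fast[$1] + 0)
       if (delta > 1000) printf "%-30s %8d -> %8d\n", $1, fast[$1] + 0, $2 }' \
    fast.txt slow.txt
```

Run against the real captures, this immediately surfaces the btree
insert/split paths that only appear in the slow phase.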
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: about the xfs performance
2016-04-11 14:14 about the xfs performance Songbo Wang
@ 2016-04-11 16:10 ` Emmanuel Florac
2016-04-11 19:33 ` Eric Sandeen
2016-04-12 13:27 ` Songbo Wang
2016-04-11 23:10 ` Dave Chinner
1 sibling, 2 replies; 8+ messages in thread
From: Emmanuel Florac @ 2016-04-11 16:10 UTC (permalink / raw)
To: Songbo Wang; +Cc: xfs
On Mon, 11 Apr 2016 22:14:06 +0800,
Songbo Wang <hack.coo@gmail.com> wrote:
> mkfs: mkfs.xfs /dev/hioa2 -f -n size=64k -i size=512 -d
> agcount=40 -l size=1024m.
> mount: mount /dev/hioa2 /mnt/ -t xfs -o
> rw,noexec,nodev,noatime,nodiratime,nobarrier,discard,inode64,logbsize=256k,delaylog
> I use the following command to test iops: fio -ioengine=libaio -bs=4k
> -direct=1 -thread -rw=randwrite -size=50G -filename=/mnt/test
> -name="EBS 4KB randwrite test" -iodepth=64 -runtime=60
> The results are normal at the beginning, about 210k IOPS, but a few
> seconds later they drop to about 19k.
You should first try default mkfs settings, with default mount options.
Normally mkfs.xfs should initiate a TRIM on the SSD, therefore
performance should remain predictable.
What model of SSD card are you using? With an HGST NVMe SN1x0 I've got
very consistent results (no degradation with time).
--
------------------------------------------------------------------------
Emmanuel Florac | Direction technique
| Intellique
| <eflorac@intellique.com>
| +33 1 78 94 84 02
------------------------------------------------------------------------
* Re: about the xfs performance
2016-04-11 16:10 ` Emmanuel Florac
@ 2016-04-11 19:33 ` Eric Sandeen
2016-04-12 13:27 ` Songbo Wang
1 sibling, 0 replies; 8+ messages in thread
From: Eric Sandeen @ 2016-04-11 19:33 UTC (permalink / raw)
To: xfs
On 4/11/16 11:10 AM, Emmanuel Florac wrote:
> On Mon, 11 Apr 2016 22:14:06 +0800,
> Songbo Wang <hack.coo@gmail.com> wrote:
>
>> mkfs: mkfs.xfs /dev/hioa2 -f -n size=64k -i size=512 -d
>> agcount=40 -l size=1024m.
>> mount: mount /dev/hioa2 /mnt/ -t xfs -o
>> rw,noexec,nodev,noatime,nodiratime,nobarrier,discard,inode64,logbsize=256k,delaylog
>> I use the following command to test iops: fio -ioengine=libaio -bs=4k
>> -direct=1 -thread -rw=randwrite -size=50G -filename=/mnt/test
>> -name="EBS 4KB randwrite test" -iodepth=64 -runtime=60
>> The results are normal at the beginning, about 210k IOPS, but a few
>> seconds later they drop to about 19k.
>
> You should first try default mkfs settings, with default mount options.
Agreed. Where did that set of options come from, in any case?
-Eric
> Normally mkfs.xfs should initiate a TRIM on the SSD, therefore
> performance should remain predictable.
>
> What model of SSD card are you using? With an HGST NVMe SN1x0 I've got
> very consistent results (no degradation with time).
>
* Re: about the xfs performance
2016-04-11 14:14 about the xfs performance Songbo Wang
2016-04-11 16:10 ` Emmanuel Florac
@ 2016-04-11 23:10 ` Dave Chinner
2016-04-12 14:07 ` Songbo Wang
1 sibling, 1 reply; 8+ messages in thread
From: Dave Chinner @ 2016-04-11 23:10 UTC (permalink / raw)
To: Songbo Wang; +Cc: xfs
On Mon, Apr 11, 2016 at 10:14:06PM +0800, Songbo Wang wrote:
> Hi xfsers:
>
> I have run into some performance trouble with XFS.
> The environment is ,
> xfs version is 3.2.1,
> centos 7.1,
> kernel version:3.10.0-229.el7.x86_64.
> pcie-ssd card,
> mkfs: mkfs.xfs /dev/hioa2 -f -n size=64k -i size=512 -d agcount=40 -l
> size=1024m.
> mount: mount /dev/hioa2 /mnt/ -t xfs -o
> rw,noexec,nodev,noatime,nodiratime,nobarrier,discard,inode64,logbsize=256k,delaylog
> I use the following command to test iops: fio -ioengine=libaio -bs=4k
> -direct=1 -thread -rw=randwrite -size=50G -filename=/mnt/test -name="EBS
> 4KB randwrite test" -iodepth=64 -runtime=60
> The results are normal at the beginning, about 210k IOPS, but a few
> seconds later they drop to about 19k.
Looks like the workload runs out of log space due to all the
allocation transactions being logged, which then causes new
transactions to start tail-pushing the log to flush dirty metadata.
This is needed to make more space in the log for incoming dio
writes that require allocation transactions. This will block IO
submission until there is space available in the log.
Let's face it, all that test does is create a massively fragmented
50GB file, so you're going to have a lot of metadata to log. Do the
maths - if it runs at 200kiops for a few seconds, it's created a
million extents.
And it's doing random insert on the extent btree, so
it's repeatedly dirtying the entire extent btree. This will trigger
journal commits quite frequently as this is a large amount of
metadata that is being dirtied. e.g. at 500 extent records per 4k
block, a million extents will require 2000 leaf blocks to store them
all. That's 80MB of metadata per million extents that this workload
is generating and repeatedly dirtying.
Then there's also other metadata, like the free space btrees, that
is also being repeatedly dirtied, etc, so it would not be unexpected
to see a workload like this on high IOPS devices allocating 100MB of
metadata every few seconds and the amount being journalled steadily
increasing until the file is fully populated.
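The extent arithmetic above can be sketched directly; note the
500-records-per-4k-leaf figure is the ballpark from the text, not an
exact on-disk constant:

```shell
# Back-of-envelope check: extents created while the test still runs fast,
# and the btree leaf blocks needed to hold their records.
iops=200000                 # ~200k allocations/sec during the fast phase
secs=5                      # "a few seconds"
recs_per_leaf=500           # ~500 extent records per 4k leaf (ballpark)

extents=$(( iops * secs ))                                  # 1,000,000
leaves=$(( (extents + recs_per_leaf - 1) / recs_per_leaf )) # 2,000
echo "$extents extents -> $leaves 4k leaf blocks"
```

And because the btree is being randomly inserted into, those leaves are
dirtied over and over, not written once.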
> I did a second test:
> umount /dev/hioa2,
> fio -ioengine=libaio -bs=4k -direct=1 -thread -rw=randwrite
> -filename=/dev/hioa2 -name="EBS 8KB randwrite test" -iodepth=64 -runtime=60
> The results were normal; IOPS stayed around 210k the whole time.
That's not an equivalent test - it's being run direct to the block
device, not to a file on the filesystem on the block device, and so
you won't see artifacts that are a result of creating worst-case
file fragmentation....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: about the xfs performance
2016-04-11 16:10 ` Emmanuel Florac
2016-04-11 19:33 ` Eric Sandeen
@ 2016-04-12 13:27 ` Songbo Wang
1 sibling, 0 replies; 8+ messages in thread
From: Songbo Wang @ 2016-04-12 13:27 UTC (permalink / raw)
To: Emmanuel Florac; +Cc: xfs
Hi Emmanuel,
Thank you for your reply. I have two types of PCIe SSD cards: Intel P3600
and ES3000 V2 PCIe SSD.
I ran the test as you suggested above, but the results were just as bad.
2016-04-12 0:10 GMT+08:00 Emmanuel Florac <eflorac@intellique.com>:
> On Mon, 11 Apr 2016 22:14:06 +0800,
> Songbo Wang <hack.coo@gmail.com> wrote:
>
> > mkfs: mkfs.xfs /dev/hioa2 -f -n size=64k -i size=512 -d
> > agcount=40 -l size=1024m.
> > mount: mount /dev/hioa2 /mnt/ -t xfs -o
> >
> rw,noexec,nodev,noatime,nodiratime,nobarrier,discard,inode64,logbsize=256k,delaylog
> > I use the following command to test iops: fio -ioengine=libaio -bs=4k
> > -direct=1 -thread -rw=randwrite -size=50G -filename=/mnt/test
> > -name="EBS 4KB randwrite test" -iodepth=64 -runtime=60
> > The results are normal at the beginning, about 210k IOPS, but a few
> > seconds later they drop to about 19k.
>
> You should first try default mkfs settings, with default mount options.
> Normally mkfs.xfs should initiate a TRIM on the SSD, therefore
> performance should remain predictable.
>
> What model of SSD card are you using? With an HGST NVMe SN1x0 I've got
> very consistent results (no degradation with time).
>
> --
> ------------------------------------------------------------------------
> Emmanuel Florac | Direction technique
> | Intellique
> | <eflorac@intellique.com>
> | +33 1 78 94 84 02
> ------------------------------------------------------------------------
>
* Re: about the xfs performance
2016-04-11 23:10 ` Dave Chinner
@ 2016-04-12 14:07 ` Songbo Wang
2016-04-12 21:31 ` Dave Chinner
0 siblings, 1 reply; 8+ messages in thread
From: Songbo Wang @ 2016-04-12 14:07 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
Hi Dave,
Thank you for your reply. I did some tests today, described as
follows:
I deleted the existing test file and redid the test: fio -ioengine=libaio
-bs=4k -direct=1 -thread -rw=randwrite -size=50G -filename=/mnt/test
-name="EBS 4KB randwrite test" -iodepth=64 -runtime=60
The iops result was about 19k (per second); I continued running fio against
this test file until it was completely full. Then I did another test using
the same test case, and the result was about 210k (per second). (The results
mentioned yesterday were partial: I had used the same test file several
times, and the results degraded because the test file was not completely
full.)
I tried remaking the filesystem with the following command to increase the
internal log size, inode size and agcount:
mkfs.xfs /dev/hioa2 -f -n size=64k -i size=2048,align=1 -d agcount=2045 -l
size=512m
but it did not help.
Any suggestions for dealing with this problem?
I'd very much appreciate your feedback.
songbo
2016-04-12 7:10 GMT+08:00 Dave Chinner <david@fromorbit.com>:
> On Mon, Apr 11, 2016 at 10:14:06PM +0800, Songbo Wang wrote:
> > Hi xfsers:
> >
> > I have run into some performance trouble with XFS.
> > The environment is ,
> > xfs version is 3.2.1,
> > centos 7.1,
> > kernel version:3.10.0-229.el7.x86_64.
> > pcie-ssd card,
> > mkfs: mkfs.xfs /dev/hioa2 -f -n size=64k -i size=512 -d agcount=40
> -l
> > size=1024m.
> > mount: mount /dev/hioa2 /mnt/ -t xfs -o
> >
> rw,noexec,nodev,noatime,nodiratime,nobarrier,discard,inode64,logbsize=256k,delaylog
> > I use the following command to test iops: fio -ioengine=libaio -bs=4k
> > -direct=1 -thread -rw=randwrite -size=50G -filename=/mnt/test -name="EBS
> > 4KB randwrite test" -iodepth=64 -runtime=60
> > The results are normal at the beginning, about 210k IOPS, but a few
> > seconds later they drop to about 19k.
>
> Looks like the workload runs out of log space due to all the
> allocation transactions being logged, which then causes new
> transactions to start tail-pushing the log to flush dirty metadata.
> This is needed to make more space in the log for incoming dio
> writes that require allocation transactions. This will block IO
> submission until there is space available in the log.
>
> Let's face it, all that test does is create a massively fragmented
> 50GB file, so you're going to have a lot of metadata to log. Do the
> maths - if it runs at 200kiops for a few seconds, it's created a
> million extents.
>
> And it's doing random insert on the extent btree, so
> it's repeatedly dirtying the entire extent btree. This will trigger
> journal commits quite frequently as this is a large amount of
> metadata that is being dirtied. e.g. at 500 extent records per 4k
> block, a million extents will require 2000 leaf blocks to store them
> all. That's 80MB of metadata per million extents that this workload
> is generating and repeatedly dirtying.
>
> Then there's also other metadata, like the free space btrees, that
> is also being repeatedly dirtied, etc, so it would not be unexpected
> to see a workload like this on high IOPS devices allocating 100MB of
> metadata every few seconds and the amount being journalled steadily
> increasing until the file is fully populated.
>
> > I did a second test:
> > umount /dev/hioa2,
> > fio -ioengine=libaio -bs=4k -direct=1 -thread -rw=randwrite
> > -filename=/dev/hioa2 -name="EBS 8KB randwrite test" -iodepth=64
> -runtime=60
> > The results were normal; IOPS stayed around 210k the whole time.
>
> That's not an equivalent test - it's being run direct to the block
> device, not to a file on the filesystem on the block device, and so
> you won't see artifacts that are a result of creating worst-case
> file fragmentation....
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>
* Re: about the xfs performance
2016-04-12 14:07 ` Songbo Wang
@ 2016-04-12 21:31 ` Dave Chinner
2016-04-13 17:27 ` Songbo Wang
0 siblings, 1 reply; 8+ messages in thread
From: Dave Chinner @ 2016-04-12 21:31 UTC (permalink / raw)
To: Songbo Wang; +Cc: xfs
On Tue, Apr 12, 2016 at 10:07:45PM +0800, Songbo Wang wrote:
> Hi Dave,
>
> Thank you for your reply. I did some tests today, described as
> follows:
>
> I deleted the existing test file and redid the test: fio -ioengine=libaio
> -bs=4k -direct=1 -thread -rw=randwrite -size=50G -filename=/mnt/test
> -name="EBS 4KB randwrite test" -iodepth=64 -runtime=60
> The iops result was about 19k (per second); I continued running fio against
> this test file until it was completely full. Then I did another test using
> the same test case, and the result was about 210k (per second). (The results mentioned
Yup, that's when the workload goes from allocation bound to being an
overwrite workload when there is no allocation occurring.
Perhaps you should preallocate the file using the fallocate=posix
option. This will move the initial overhead to IO completion, so
won't block submission, and the file will not end up a fragmented
mess as the written areas will merge back into large single extents
as more of the file is written.
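As a sketch of that suggestion (untested here, reusing the job
parameters from the original post): the same workload expressed as a
fio job file with the fallocate option Dave mentions, so the file is
preallocated with posix_fallocate() before any writes are issued:

```shell
# Hypothetical fio job file for the original 4k randwrite test, with
# fallocate=posix so /mnt/test is preallocated at job start.
cat > prealloc-randwrite.fio <<'EOF'
[global]
ioengine=libaio
direct=1
thread
bs=4k
iodepth=64
runtime=60

[EBS-4KB-randwrite]
rw=randwrite
size=50g
filename=/mnt/test
fallocate=posix
EOF
```

Run it with `fio prealloc-randwrite.fio`; the command-line equivalent is
adding `-fallocate=posix` to the original fio invocation.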
> yesterday were partial: I had used the same test file several times, and
> the results degraded because the test file was not completely full.)
>
> I tried remaking the filesystem with the following command to increase the
> internal log size, inode size and agcount:
> mkfs.xfs /dev/hioa2 -f -n size=64k -i size=2048,align=1 -d agcount=2045 -l
> size=512m
> but it did not help.
Of course it won't. Turning random knobs without knowing what they
do will not solve the problem. Indeed, if your workload is
performance limited because it is running out of log space, then
*reducing the log size* will not solve the issue.
Users tweaking knobs without understanding what they do or how
they affect the application are the leading cause of filesystem
performance and reliability issues on XFS. Just don't do it - all
you'll do is cause something to go wrong when you can least afford
it to happen.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: about the xfs performance
2016-04-12 21:31 ` Dave Chinner
@ 2016-04-13 17:27 ` Songbo Wang
0 siblings, 0 replies; 8+ messages in thread
From: Songbo Wang @ 2016-04-13 17:27 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
Hi Dave,
Thank you for your suggestion; I really appreciate your reply!
2016-04-13 5:31 GMT+08:00 Dave Chinner <david@fromorbit.com>:
> On Tue, Apr 12, 2016 at 10:07:45PM +0800, Songbo Wang wrote:
> > Hi Dave,
> >
> > Thank you for your reply. I did some tests today, described as
> > follows:
> >
> > I deleted the existing test file and redid the test: fio -ioengine=libaio
> > -bs=4k -direct=1 -thread -rw=randwrite -size=50G -filename=/mnt/test
> > -name="EBS 4KB randwrite test" -iodepth=64 -runtime=60
> > The iops result was about 19k (per second); I continued running fio against this test file
> > until it was completely full. Then I did another test using the same
> > test case, and the result was about 210k (per second). (The results mentioned
>
> Yup, that's when the workload goes from allocation bound to being an
> overwrite workload when there is no allocation occurring.
>
> Perhaps you should preallocate the file using the fallocate=posix
> option. This will move the initial overhead to IO completion, so
> won't block submission, and the file will not end up a fragmented
> mess as the written areas will merge back into large single extents
> as more of the file is written.
>
> > yesterday were partial: I had used the same test file several times, and the
> > results degraded because the test file was not completely full.)
> >
> > I tried remaking the filesystem with the following command to increase
> the
> > internal log size, inode size and agcount:
> > mkfs.xfs /dev/hioa2 -f -n size=64k -i size=2048,align=1 -d agcount=2045
> -l
> > size=512m
> > but it did not help.
>
> Of course it won't. Turning random knobs without knowing what they
> do will not solve the problem. Indeed, if your workload is
> performance limited because it is running out of log space, then
> *reducing the log size* will not solve the issue.
>
> Users tweaking knobs without understanding what they do or how
> they affect the application are the leading cause of filesystem
> performance and reliability issues on XFS. Just don't do it - all
> you'll do is cause something to go wrong when you can least afford
> it to happen.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>