* about the xfs performance
@ 2016-04-11 14:14 Songbo Wang
  2016-04-11 16:10 ` Emmanuel Florac
  2016-04-11 23:10 ` Dave Chinner
  0 siblings, 2 replies; 8+ messages in thread
From: Songbo Wang @ 2016-04-11 14:14 UTC (permalink / raw)
  To: xfs


Hi xfsers:

I have run into some performance trouble with xfs.
The environment is:
     xfs version: 3.2.1,
     CentOS 7.1,
     kernel version: 3.10.0-229.el7.x86_64,
     PCIe SSD card,
     mkfs: mkfs.xfs /dev/hioa2 -f -n size=64k -i size=512 -d agcount=40 -l
size=1024m.
     mount: mount /dev/hioa2 /mnt/  -t xfs -o
rw,noexec,nodev,noatime,nodiratime,nobarrier,discard,inode64,logbsize=256k,delaylog
I used the following command to test IOPS: fio -ioengine=libaio -bs=4k
-direct=1 -thread -rw=randwrite -size=50G -filename=/mnt/test -name="EBS
4KB randwrite test" -iodepth=64 -runtime=60
The result is normal at the beginning, around 210k, but a few seconds
later it drops to around 19k.

I did a second test:
     umount /dev/hioa2,
     fio -ioengine=libaio -bs=4k -direct=1  -thread -rw=randwrite
-filename=/dev/hioa2  -name="EBS 8KB randwrite test" -iodepth=64 -runtime=60
The result was normal; the IOPS stayed around 210k the whole time.

Then I used perf-tools to trace xfs:
     # ./funccount -i 1 'xfs*'

In the first test case, when the IOPS looks normal, the result is as
follows:
FUNC                              COUNT
xfs_free_eofblocks                    1
xfs_attr_get                          5
xfs_attr_get_int                      5
xfs_attr_name_to_xname                5
xfs_attr_shortform_getvalue           5
xfs_da_hashname                       5
xfs_ilock_attr_map_shared             5
xfs_xattr_get                         5
xfs_fs_statfs                         7
xfs_icsb_count                        7
xfs_icsb_sync_counters                7
xfs_icsb_sync_counters_locked         7
xfs_can_free_eofblocks                9
xfs_file_release                      9
xfs_release                           9
xfs_icsb_counter_disabled            21
xfs_file_open                        30
xfs_file_aio_read                    33
xfs_readlink                         37
xfs_vn_follow_link                   37
xfs_vn_put_link                      37
xfs_buf_delwri_submit_nowait         41
xfs_trans_ail_cursor_first           41
xfs_trans_ail_cursor_init            41
xfs_vn_getattr                       44
xfs_file_mmap                        48
xfs_cil_prepare_item.isra.1        1004
xfs_extent_busy_clear              1004
xfs_inode_item_committing          1004
xfs_inode_item_data_fork_size.     1004
xfs_inode_item_format              1004
xfs_inode_item_format_data_for     1004
xfs_inode_item_size                1004
xfs_inode_item_unlock              1004
xfs_log_calc_unit_res              1004
xfs_log_commit_cil                 1004
xfs_log_done                       1004
xfs_log_reserve                    1004
xfs_log_space_wake                 1004
xfs_log_ticket_put                 1004
xfs_trans_add_item                 1004
xfs_trans_alloc                    1004
xfs_trans_apply_dquot_deltas       1004
xfs_trans_commit                   1004
xfs_trans_free                     1004
xfs_trans_free_dqinfo              1004
xfs_trans_free_item_desc           1004
xfs_trans_free_items               1004
xfs_trans_ijoin                    1004
xfs_trans_log_inode                1004
xfs_trans_reserve                  1004
xfs_trans_unreserve_and_mod_sb     1004
xfs_vn_update_time                 1004
xfs_destroy_ioend                 84845
xfs_end_io_direct_write           84845
xfs_finish_ioend                  84845
xfs_file_aio_write                84880
xfs_file_dio_aio_write            84882
xfs_vm_direct_IO                  84882
xfs_get_blocks_direct             84883
xfs_bmapi_read                    84885
xfs_bmap_search_extents           84886
xfs_bmap_search_multi_extents     84886
xfs_iext_bno_to_ext               84887
xfs_iext_bno_to_irec              84887
xfs_iext_get_ext                  84890
xfs_iext_idx_to_irec              84890
xfs_ilock_data_map_shared         84891
xfs_map_buffer.isra.9             84892
xfs_file_aio_write_checks         84893
xfs_alloc_ioend                   84895
xfs_bmapi_trim_map.isra.11        84900
xfs_fsb_to_db                     84902
xfs_bmapi_update_map              84903
xfs_bmbt_get_all                 169793
xfs_find_bdev_for_inode          169794
xfs_ilock                        170868
xfs_iunlock                      170872
xfs_bmbt_get_blockcount          555016
xfs_bmbt_get_startoff           2277490

When the result drops to around 19k, the trace results are:
FUNC                              COUNT
xfs_bitmap_empty                      1
xfs_free_eofblocks                    1
xfs_attr_get                          5
xfs_attr_get_int                      5
xfs_attr_name_to_xname                5
xfs_attr_shortform_getvalue           5
xfs_da_hashname                       5
xfs_ilock_attr_map_shared             5
xfs_xattr_get                         5
xfs_fs_statfs                         7
xfs_icsb_count                        7
xfs_icsb_sync_counters                7
xfs_icsb_sync_counters_locked         7
xfs_can_free_eofblocks                9
xfs_file_release                      9
xfs_release                           9
xfs_file_open                        30
xfs_file_aio_read                    33
xfs_readlink                         37
xfs_vn_follow_link                   37
xfs_vn_put_link                      37
xfs_buf_delwri_submit_nowait         41
xfs_trans_ail_cursor_first           41
xfs_trans_ail_cursor_init            41
xfs_alloc_ag_vextent                 43
xfs_alloc_ag_vextent_near            43
xfs_alloc_find_best_extent           43
xfs_alloc_fix_freelist               43
xfs_alloc_fixup_trees                43
xfs_alloc_read_agf                   43
xfs_alloc_read_agfl                  43
xfs_alloc_update                     43
xfs_alloc_update_counters.isra       43
xfs_alloc_vextent                    43
xfs_bmbt_alloc_block                 43
xfs_btree_get_buf_block.constp       43
xfs_btree_reada_bufs                 43
xfs_btree_readahead_sblock.isr       43
xfs_btree_split                      43
xfs_btree_split_worker               43
xfs_read_agf                         43
xfs_trans_get_buf_map                43
xfs_allocbt_dup_cursor               44
xfs_allocbt_get_minrecs              44
xfs_allocbt_init_rec_from_cur        44
xfs_allocbt_update_lastrec           44
xfs_alloc_fix_minleft                44
xfs_alloc_log_agf                    44
xfs_alloc_longest_free_extent        44
xfs_bmbt_init_rec_from_key           44
xfs_bmbt_update_cursor               44
xfs_btree_buf_to_ptr.isra.17         44
xfs_btree_check_sblock               44
xfs_btree_init_block_cur.isra.       44
xfs_btree_init_block_int             44
xfs_btree_islastblock                44
xfs_trans_mod_dquot_byino            44
xfs_trans_mod_sb                     44
xfs_vn_getattr                       44
xfs_file_mmap                        48
xfs_iext_irec_new                    57
xfs_iext_realloc_indirect            57
xfs_btree_copy_ptrs                  66
xfs_btree_shift_keys.isra.23         66
xfs_btree_shift_ptrs                 66
xfs_btree_log_ptrs                   72
xfs_alloc_get_rec                    86
xfs_alloc_lookup_eq                  86
xfs_alloc_compute_aligned            88
xfs_btree_get_rec                    88
xfs_extent_busy_trim                 88
xfs_iext_add_indirect_multi         108
xfs_alloc_compute_diff              110
xfs_alloc_fix_len                   110
xfs_allocbt_init_cursor             132
xfs_iext_irec_compact               163
xfs_iext_remove                     163
xfs_iext_remove_indirect            163
xfs_buf_item_relse                  173
xfs_allocbt_init_ptr_from_cur       176
xfs_btree_set_sibling.isra.12       176
xfs_buf_item_free                   177
xfs_buf_item_free_format            177
xfs_buf_hold                        248
xfs_buf_item_get_format             248
xfs_log_item_init                   248
xfs_allocbt_get_maxrecs             264
xfs_bmbt_get_minrecs                326
xfs_btree_dec_cursor                369
xfs_btree_delete                    369
xfs_btree_delrec                    369
xfs_bmbt_set_allf                   633
xfs_bmbt_set_startblock            1007
xfs_bmbt_set_startoff              1007
xfs_trans_alloc                    1029
xfs_vn_update_time                 1029
xfs_btree_lshift                   1048
xfs_btree_decrement                1332
xfs_allocbt_init_key_from_rec      1481
xfs_allocbt_key_diff               2427
xfs_btree_check_lblock             3653
xfs_btree_lastrec                  3653
xfs_buf_free                       3660
xfs_bmbt_dup_cursor                3696
xfs_btree_check_block              3696
xfs_buf_allocate_memory            3703
xfs_btree_dup_cursor               3738
xfs_btree_increment                4324
xfs_btree_make_block_unfull        4699
xfs_btree_rshift                   4699
xfs_btree_updkey                   4738
xfs_btree_copy_keys.isra.22        4824
xfs_btree_log_keys                 4832
xfs_iext_add                       4876
xfs_iext_insert                    4876
xfs_btree_reada_bufl               5465
xfs_btree_readahead_lblock.isr     5465
xfs_buf_readahead_map              5508
xfs_btree_readahead                5670
xfs_inode_item_format              6071
xfs_inode_item_format_data_for     6072
xfs_inode_item_size                6072
xfs_inode_item_data_fork_size.     6073
xfs_trans_log_inode                6115
xfs_buf_item_pin                   6315
xfs_bmapi_write                    6674
xfs_end_io                         6674
xfs_iomap_write_unwritten          6674
xfs_bmap_add_extent_unwritten_     6675
xfs_bmapi_convert_unwritten        6675
xfs_destroy_ioend                  6675
xfs_end_io_direct_write            6676
xfs_finish_ioend                   6677
xfs_bmap_finish                    6683
xfs_file_aio_write                 6784
xfs_bmapi_read                     6785
xfs_file_aio_write_checks          6785
xfs_file_dio_aio_write             6785
xfs_get_blocks_direct              6785
xfs_ilock_data_map_shared          6785
xfs_map_buffer.isra.9              6785
xfs_vm_direct_IO                   6785
xfs_alloc_ioend                    6786
xfs_fsb_to_db                      6786
xfs_bmbt_set_blockcount            7691
xfs_log_commit_cil                 7704
xfs_trans_commit                   7704
xfs_trans_free_items               7704
xfs_trans_reserve                  7705
xfs_trans_unreserve_and_mod_sb     7705
xfs_log_done                       7706
xfs_trans_free                     7706
xfs_log_reserve                    7708
xfs_trans_ijoin                    7708
xfs_extent_busy_clear              7712
xfs_inode_item_unlock              7712
xfs_log_space_wake                 7712
xfs_log_ticket_put                 7712
xfs_trans_apply_dquot_deltas       7712
xfs_inode_item_committing          7713
xfs_log_calc_unit_res              7713
xfs_trans_free_dqinfo              7713
xfs_bmbt_update                    8311
xfs_btree_update                   8354
xfs_bmbt_init_rec_from_cur         9758
xfs_bmbt_disk_set_all              9760
xfs_bmbt_set_all                   9760
xfs_btree_insert                   9795
xfs_btree_insrec                   9838
xfs_bmbt_init_cursor              10379
xfs_btree_del_cursor              10498
xfs_trans_del_item                11101
xfs_buf_item_dirty                11109
xfs_bmbt_lookup_eq                11551
xfs_bmbt_init_ptr_from_cur        11563
xfs_btree_get_iroot.isra.8        11563
xfs_buf_item_format               11690
xfs_buf_item_size                 11690
xfs_buf_item_size_segment.isra    11690
xfs_btree_lookup                  11723
xfs_btree_get_sibling.isra.11     11974
xfs_buf_item_unlock               12229
xfs_buf_item_committing           12237
xfs_icsb_modify_counters          13351
xfs_icsb_lock_cntr                13365
xfs_icsb_unlock_cntr              13365
xfs_iext_bno_to_irec              13458
xfs_bmap_search_extents           13459
xfs_bmap_search_multi_extents     13459
xfs_iext_bno_to_ext               13459
xfs_bmapi_trim_map.isra.11        13469
xfs_bmapi_update_map              13469
xfs_find_bdev_for_inode           13573
xfs_btree_shift_recs.isra.24      14798
xfs_cil_prepare_item.isra.1       17769
xfs_bmbt_disk_set_allf            18126
xfs_btree_is_lastrec              18581
xfs_btree_log_block               19648
xfs_btree_set_ptr_null.isra.10    19649
xfs_btree_offsets                 19703
xfs_iunlock                       21348
xfs_ilock                         21349
xfs_btree_copy_recs.isra.21       22849
xfs_buf_item_init                 23331
xfs_buf_unlock                    23345
xfs_btree_setbuf                  23948
xfs_btree_log_recs                24190
xfs_icsb_counter_disabled         26751
xfs_buf_read_map                  28786
xfs_buf_get_map                   28829
xfs_btree_ptr_is_null.isra.9      29274
xfs_btree_read_buf_block.isra.    29684
xfs_btree_set_refs.isra.13        29712
xfs_trans_add_item                31037
xfs_trans_free_item_desc          31052
xfs_buf_trylock                   32463
xfs_perag_put                     32606
xfs_perag_get                     32633
xfs_buf_rele                      32693
xfs_trans_brelse                  35148
xfs_btree_ptr_addr                38591
xfs_btree_ptr_offset              38738
xfs_trans_read_buf_map            40938
xfs_trans_buf_item_match          41000
xfs_btree_ptr_to_daddr            44562
xfs_btree_lookup_get_block        46548
xfs_trans_log_buf                 48783
xfs_buf_item_log                  48806
xfs_trans_buf_set_type            48815
xfs_iext_get_ext                  51712
xfs_bmbt_get_all                  53669
xfs_btree_get_block               54424
xfs_iext_idx_to_irec              56768
xfs_bmbt_get_maxrecs              57774
xfs_bmbt_get_blockcount           89650
xfs_bmbt_disk_get_startoff        98285
xfs_bmbt_init_key_from_rec        98285
xfs_btree_rec_addr               115199
xfs_btree_rec_offset             163635
xfs_btree_key_addr               176837
xfs_btree_key_offset             186525
xfs_bmbt_key_diff                254763
xfs_lookup_get_search_key        257190
xfs_bmbt_get_startoff            363158
xfs_next_bit                     631334
xfs_buf_offset                  1166477


Comparing the two results above, I found that when the IOPS drops to
around 19k, many more functions show up in the trace, such as
xfs_buf_trylock, xfs_iext_insert, xfs_btree_insert, etc.

I cannot figure out what causes the IOPS to drop to around 19k. Any
suggestions?


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: about the xfs performance
  2016-04-11 14:14 about the xfs performance Songbo Wang
@ 2016-04-11 16:10 ` Emmanuel Florac
  2016-04-11 19:33   ` Eric Sandeen
  2016-04-12 13:27   ` Songbo Wang
  2016-04-11 23:10 ` Dave Chinner
  1 sibling, 2 replies; 8+ messages in thread
From: Emmanuel Florac @ 2016-04-11 16:10 UTC (permalink / raw)
  To: Songbo Wang; +Cc: xfs

On Mon, 11 Apr 2016 22:14:06 +0800,
Songbo Wang <hack.coo@gmail.com> wrote:

>      mkfs: mkfs.xfs /dev/hioa2 -f -n size=64k -i size=512 -d
> agcount=40 -l size=1024m.
>      mount: mount /dev/hioa2 /mnt/  -t xfs -o
> rw,noexec,nodev,noatime,nodiratime,nobarrier,discard,inode64,logbsize=256k,delaylog
> I use the following command to test iops: fio -ioengine=libaio -bs=4k
> -direct=1 -thread -rw=randwrite -size=50G -filename=/mnt/test
> -name="EBS 4KB randwrite test" -iodepth=64 -runtime=60
> The results is normal at the beginning which is about 210k±,but some
> seconds later, the results down to 19k±.

You should first try default mkfs settings, with default mount options.
Normally mkfs.xfs should initiate a TRIM on the SSD, therefore
performance should remain predictable.
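
For instance, a minimal sketch reusing your device and mount point
(adjust as needed):

    mkfs.xfs -f /dev/hioa2
    mount /dev/hioa2 /mnt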

What model of SSD card are you using? With an HGST NVMe SN1x0 I've got
very consistent results (no degradation with time).

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: about the xfs performance
  2016-04-11 16:10 ` Emmanuel Florac
@ 2016-04-11 19:33   ` Eric Sandeen
  2016-04-12 13:27   ` Songbo Wang
  1 sibling, 0 replies; 8+ messages in thread
From: Eric Sandeen @ 2016-04-11 19:33 UTC (permalink / raw)
  To: xfs



On 4/11/16 11:10 AM, Emmanuel Florac wrote:
> Le Mon, 11 Apr 2016 22:14:06 +0800
> Songbo Wang <hack.coo@gmail.com> écrivait:
> 
>>      mkfs: mkfs.xfs /dev/hioa2 -f -n size=64k -i size=512 -d
>> agcount=40 -l size=1024m.
>>      mount: mount /dev/hioa2 /mnt/  -t xfs -o
>> rw,noexec,nodev,noatime,nodiratime,nobarrier,discard,inode64,logbsize=256k,delaylog
>> I use the following command to test iops: fio -ioengine=libaio -bs=4k
>> -direct=1 -thread -rw=randwrite -size=50G -filename=/mnt/test
>> -name="EBS 4KB randwrite test" -iodepth=64 -runtime=60
>> The results is normal at the beginning which is about 210k±,but some
>> seconds later, the results down to 19k±.
> 
> You should first try default mkfs settings, with default mount options.

Agreed.  Where did that set of options come from, in any case?

-Eric

> Normally mkfs.xfs should initiate a TRIM on the SSD, therefore
> performance should remain predictable.
> 
> What model of SSD card are you using? With an HGST NVMe SN1x0 I've got
> very consistent results (no degradation with time).
> 


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: about the xfs performance
  2016-04-11 14:14 about the xfs performance Songbo Wang
  2016-04-11 16:10 ` Emmanuel Florac
@ 2016-04-11 23:10 ` Dave Chinner
  2016-04-12 14:07   ` Songbo Wang
  1 sibling, 1 reply; 8+ messages in thread
From: Dave Chinner @ 2016-04-11 23:10 UTC (permalink / raw)
  To: Songbo Wang; +Cc: xfs

On Mon, Apr 11, 2016 at 10:14:06PM +0800, Songbo Wang wrote:
> Hi xfsers:
> 
> I got some troubles on the performance of  xfs.
> The environment is ,
>      xfs version is 3.2.1,
>      centos 7.1,
>      kernel version:3.10.0-229.el7.x86_64.
>      pcie-ssd card,
>      mkfs: mkfs.xfs /dev/hioa2 -f -n size=64k -i size=512 -d agcount=40 -l
> size=1024m.
>      mount: mount /dev/hioa2 /mnt/  -t xfs -o
> rw,noexec,nodev,noatime,nodiratime,nobarrier,discard,inode64,logbsize=256k,delaylog
> I use the following command to test iops: fio -ioengine=libaio -bs=4k
> -direct=1 -thread -rw=randwrite -size=50G -filename=/mnt/test -name="EBS
> 4KB randwrite test" -iodepth=64 -runtime=60
> The results is normal at the beginning which is about 210k±,but some
> seconds later, the results down to 19k±.

Looks like the workload runs out of log space due to all the
allocation transactions being logged, which then causes new
transactions to start tail pushing the log to flush dirty metadata.
This is needed to make more space in the log for incoming dio
writes that require allocation transactions. This will block IO
submission until there is space available in the log.
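
If you want to sanity check how big the log actually ended up, a
quick way (path assumed to be your /mnt mount point) is:

    xfs_info /mnt    # the "log" line shows the log block size and block count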

Let's face it, all that test does is create a massively fragmented
50GB file, so you're going to have a lot of metadata to log. Do the
maths - if it runs at 200kiops for a few seconds, it's created a
million extents.

And it's doing random insert on the extent btree, so
it's repeatedly dirtying the entire extent btree. This will trigger
journal commits quite frequently as this is a large amount of
metadata that is being dirtied. e.g. at 500 extent records per 4k
block, a million extents will require 2000 leaf blocks to store them
all. That's 80MB of metadata per million extents that this workload
is generating and repeatedly dirtying.
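
As a quick back-of-the-envelope check from the shell (using the rough
200k IOPS and ~500 records per 4k block figures from above):

    # extents created after ~5 seconds of allocation at ~200k IOPS
    echo $((200000 * 5))
    # 4k btree leaf blocks needed to hold them at ~500 records per block
    echo $((1000000 / 500))

which prints 1000000 and 2000 respectively.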

Then there's also other metadata, like the free space btrees, that
is also being repeatedly dirtied, etc, so it would not be unexpected
to see a workload like this on high IOPS devices allocating 100MB of
metadata every few seconds and the amount being journalled steadily
increasing until the file is fully populated.

> I did a senond test ,
>      umount the /dev/hioa2,
>      fio -ioengine=libaio -bs=4k -direct=1  -thread -rw=randwrite
> -filename=/dev/hioa2  -name="EBS 8KB randwrite test" -iodepth=64 -runtime=60
> The results was normal, the iops is about 210k± all the time.

That's not an equivalent test - it's being run direct to the block
device, not to a file on the filesystem on the block device, and so
you won't see artifacts that are a result of creating worst case
file fragmentation....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: about the xfs performance
  2016-04-11 16:10 ` Emmanuel Florac
  2016-04-11 19:33   ` Eric Sandeen
@ 2016-04-12 13:27   ` Songbo Wang
  1 sibling, 0 replies; 8+ messages in thread
From: Songbo Wang @ 2016-04-12 13:27 UTC (permalink / raw)
  To: Emmanuel Florac; +Cc: xfs


Hi Emmanuel,
Thank you for your reply. I have two types of PCIe SSD cards: an Intel
P3600 and an ES3000 V2 PCIe SSD.
I did the testing as you suggested above, but the results are still bad.

2016-04-12 0:10 GMT+08:00 Emmanuel Florac <eflorac@intellique.com>:

> Le Mon, 11 Apr 2016 22:14:06 +0800
> Songbo Wang <hack.coo@gmail.com> écrivait:
>
> >      mkfs: mkfs.xfs /dev/hioa2 -f -n size=64k -i size=512 -d
> > agcount=40 -l size=1024m.
> >      mount: mount /dev/hioa2 /mnt/  -t xfs -o
> >
> rw,noexec,nodev,noatime,nodiratime,nobarrier,discard,inode64,logbsize=256k,delaylog
> > I use the following command to test iops: fio -ioengine=libaio -bs=4k
> > -direct=1 -thread -rw=randwrite -size=50G -filename=/mnt/test
> > -name="EBS 4KB randwrite test" -iodepth=64 -runtime=60
> > The results is normal at the beginning which is about 210k±,but some
> > seconds later, the results down to 19k±.
>
> You should first try default mkfs settings, with default mount options.
> Normally mkfs.xfs should initiate a TRIM on the SSD, therefore
> performance should remain predictable.
>
> What model of SSD card are you using? With an HGST NVMe SN1x0 I've got
> very consistent results (no degradation with time).
>
> --
> ------------------------------------------------------------------------
> Emmanuel Florac     |   Direction technique
>                     |   Intellique
>                     |   <eflorac@intellique.com>
>                     |   +33 1 78 94 84 02
> ------------------------------------------------------------------------
>


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: about the xfs performance
  2016-04-11 23:10 ` Dave Chinner
@ 2016-04-12 14:07   ` Songbo Wang
  2016-04-12 21:31     ` Dave Chinner
  0 siblings, 1 reply; 8+ messages in thread
From: Songbo Wang @ 2016-04-12 14:07 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs


Hi Dave,

Thank you for your reply. I did some tests today, described as follows:

I deleted the existing test file and redid the test: fio -ioengine=libaio
-bs=4k -direct=1 -thread -rw=randwrite -size=50G -filename=/mnt/test
-name="EBS 4KB randwrite test" -iodepth=64 -runtime=60
The IOPS result is around 19k per second. I kept running fio against this
test file until it was completely filled, then ran the same test case
again, and the result was around 210k per second. (The results I mentioned
yesterday were incomplete: I had reused the same test file several times,
and the results degraded because the file had not yet been filled
completely.)

I tried to remake the filesystem with the following command to increase
the internal log size, inode size and agcount:
mkfs.xfs /dev/hioa2 -f -n size=64k -i size=2048,align=1 -d agcount=2045 -l
size=512m
but it did not help the result.


Any suggestions on how to deal with this problem?
I really appreciate your feedback.

songbo





2016-04-12 7:10 GMT+08:00 Dave Chinner <david@fromorbit.com>:

> On Mon, Apr 11, 2016 at 10:14:06PM +0800, Songbo Wang wrote:
> > Hi xfsers:
> >
> > I got some troubles on the performance of  xfs.
> > The environment is ,
> >      xfs version is 3.2.1,
> >      centos 7.1,
> >      kernel version:3.10.0-229.el7.x86_64.
> >      pcie-ssd card,
> >      mkfs: mkfs.xfs /dev/hioa2 -f -n size=64k -i size=512 -d agcount=40
> -l
> > size=1024m.
> >      mount: mount /dev/hioa2 /mnt/  -t xfs -o
> >
> rw,noexec,nodev,noatime,nodiratime,nobarrier,discard,inode64,logbsize=256k,delaylog
> > I use the following command to test iops: fio -ioengine=libaio -bs=4k
> > -direct=1 -thread -rw=randwrite -size=50G -filename=/mnt/test -name="EBS
> > 4KB randwrite test" -iodepth=64 -runtime=60
> > The results is normal at the beginning which is about 210k±,but some
> > seconds later, the results down to 19k±.
>
> Looks like the workload runs out of log space due to all the
> allocation transactions being logged, which then causes new
> transactions to start tail pushing the log to flush dirty metadata.
> This is needed to to make more space in the log for on incoming dio
> writes that require allocation transactions. This will block IO
> submission until there is space available in the log.
>
> Let's face it, all that test does is create a massively fragmented
> 50GB file, so you're going to have a lot of metadata to log. Do the
> maths - if it runs at 200kiops for a few seconds, it's created a
> million extents.
>
> And it's doing random insert on the extent btree, so
> it's repeatedly dirtying the entire extent btree. This will trigger
> journal commits quite frequently as this is a large amount of
> metadata that is being dirtied. e.g. at 500 extent records per 4k
> block, a million extents will require 2000 leaf blocks to store them
> all. That's 80MB of metadata per million extents that this workload
> is generating and repeatedly dirtying.
>
> Then there's also other metadata, like the free space btrees, that
> is also being repeatedly dirtied, etc, so it would not be unexpected
> to see a workload like this on high IOPS devices allocating 100MB of
> metadata every few seconds and the amount being journalled steadily
> increasing until the file is fully populated.
>
> > I did a senond test ,
> >      umount the /dev/hioa2,
> >      fio -ioengine=libaio -bs=4k -direct=1  -thread -rw=randwrite
> > -filename=/dev/hioa2  -name="EBS 8KB randwrite test" -iodepth=64
> -runtime=60
> > The results was normal, the iops is about 210k± all the time.
>
> That's not an equivalent test - it's being run direct to the block
> device, not to a file on the filesytem on the block device, and so
> you won't see artifacts taht are a result of creating worst case
> file fragmentation....
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: about the xfs performance
  2016-04-12 14:07   ` Songbo Wang
@ 2016-04-12 21:31     ` Dave Chinner
  2016-04-13 17:27       ` Songbo Wang
  0 siblings, 1 reply; 8+ messages in thread
From: Dave Chinner @ 2016-04-12 21:31 UTC (permalink / raw)
  To: Songbo Wang; +Cc: xfs

On Tue, Apr 12, 2016 at 10:07:45PM +0800, Songbo Wang wrote:
> Hi Dave,
> 
> Thank you for your reply. I did some test today and described those as
> follows:
> 
> Delete the existing test file , and redo the test : fio -ioengine=libaio
> -bs=4k -direct=1 -thread -rw=randwrite -size=50G -filename=/mnt/test
> -name="EBS 4KB randwrite test" -iodepth=64 -runtime=60
> The iops resultes is 19k±(per second);  I continue to fio this test file
> untill it was filled to the full. Then I did another test using the same
> test case, the results was 210k±(per second).(The results mentioned

Yup, that's when the workload goes from being allocation bound to being
an overwrite workload in which no allocation occurs.

Perhaps you should preallocate the file using the fallocate=posix
option. This will move the initial overhead to IO completion, so it
won't block submission, and the file will not end up a fragmented
mess as the written areas will merge back into large single extents
as more of the file is written.
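
For example, a minimal sketch of the earlier fio command line with
preallocation enabled (assuming the fio build in use supports the
fallocate option):

    fio -ioengine=libaio -bs=4k -direct=1 -thread -rw=randwrite \
        -size=50G -filename=/mnt/test -fallocate=posix \
        -name="EBS 4KB randwrite test" -iodepth=64 -runtime=60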

> yesterday was partial.  I used the same test file several times, the
> results degraded because of the test file was not fill to the full)
> 
> I try to remake the filesystem using the following command to increase the
> internal log size , inode size and agcount num:
> mkfs.xfs /dev/hioa2 -f -n size=64k -i size=2048,align=1 -d agcount=2045 -l
> size=512m
> but it has no help to the result.

Of course it won't. Turning random knobs without knowing what they
do will not solve the problem. Indeed, if your workload is
performance limited because it is running out of log space, then
*reducing the log size* will not solve the issue.

Tweaking knobs without understanding what they do or how they
affect the application is the leading cause of filesystem
performance and reliability issues on XFS. Just don't do it - all
you'll do is cause something to go wrong when you can least afford
it to happen.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: about the xfs performance
  2016-04-12 21:31     ` Dave Chinner
@ 2016-04-13 17:27       ` Songbo Wang
  0 siblings, 0 replies; 8+ messages in thread
From: Songbo Wang @ 2016-04-13 17:27 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs


Hi Dave,

Thank you for your suggestion, and I really appreciate your reply!

2016-04-13 5:31 GMT+08:00 Dave Chinner <david@fromorbit.com>:

> On Tue, Apr 12, 2016 at 10:07:45PM +0800, Songbo Wang wrote:
> > Hi Dave,
> >
> > Thank you for your reply. I did some test today and described those as
> > follows:
> >
> > Delete the existing test file , and redo the test : fio -ioengine=libaio
> > -bs=4k -direct=1 -thread -rw=randwrite -size=50G -filename=/mnt/test
> > -name="EBS 4KB randwrite test" -iodepth=64 -runtime=60
> > The iops resultes is 19k±(per second);  I continue to fio this test file
> > untill it was filled to the full. Then I did another test using the same
> > test case, the results was 210k±(per second).(The results mentioned
>
> Yup, that's when the workload goes from allocation bound to being an
> overwrite workload when there is no allocation occurring.
>
> Perhaps you should preallocate the file using the fallocate=posix
> option. This will move the initial overhead to IO completion, so
> won't block submission, and the file will not end up a fragmented
> mess as the written areas will merge back into large single extents
> as more of the file is written.
>
> > yesterday was partial.  I used the same test file several times, the
> > results degraded because of the test file was not fill to the full)
> >
> > I try to remake the filesystem using the following command to increase
> the
> > internal log size , inode size and agcount num:
> > mkfs.xfs /dev/hioa2 -f -n size=64k -i size=2048,align=1 -d agcount=2045
> -l
> > size=512m
> > but it has no help to the result.
>
> Of course it won't. Turning random knobs without knowing what they
> do will not solve the problem. Indeed, if you're workload is
> performance limited because it is running out of log space, then
> *reducing the log size* will not solve the issue.
>
> Users who tweaking knobs without understanding what they do or how
> they affect the application is the leading cause of filesystem
> performance and reliability issues on XFS. Just don't do it - all
> you'll do is cause something to go wrong when you can least afford
> it to happen.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>


^ permalink raw reply	[flat|nested] 8+ messages in thread
