linux-nfs.vger.kernel.org archive mirror
* A NFS, xfs, reflink and rmapbt story
@ 2020-01-23  8:32 Murphy Zhou
  2020-01-24  1:10 ` Darrick J. Wong
  0 siblings, 1 reply; 8+ messages in thread
From: Murphy Zhou @ 2020-01-23  8:32 UTC (permalink / raw)
  To: linux-xfs, linux-nfs

Hi,

Deleting the files left by generic/175 costs too much time when testing
on NFSv4.2 exporting xfs with rmapbt=1.

"./check -nfs generic/175 generic/176" should reproduce it.

My test bed is a 16c8G vm.

NFSv4.2  rmapbt=1   24h+
NFSv4.2  rmapbt=0   1h-2h
xfs      rmapbt=1   10m+

At first I thought it hung, but it turns out it was just slow when deleting
2 massive reflinked files.

It's reproducible using latest Linus tree, and Darrick's deferred-inactivation
branch. Run latest for-next branch xfsprogs.

I'm not sure anything is actually wrong, just sharing with you guys. I don't
recall identifying this as a regression. It has probably been there for
a long time.

Sending to xfs and nfs because it looks like both are involved. :)

This almost got lost in my list. Not much information was recorded, just some
trace-cmd output for your info. It's easy to reproduce. If it's
interesting to you and you need any info, feel free to ask.

Thanks,


7)   0.279 us    |  xfs_btree_get_block [xfs]();
7)   0.303 us    |  xfs_btree_rec_offset [xfs]();
7)   0.301 us    |  xfs_rmapbt_init_high_key_from_rec [xfs]();
7)   0.356 us    |  xfs_rmapbt_diff_two_keys [xfs]();
7)   0.305 us    |  xfs_rmapbt_init_key_from_rec [xfs]();
7)   0.306 us    |  xfs_rmapbt_diff_two_keys [xfs]();
7)               |  xfs_rmap_query_range_helper [xfs]() {
7)   0.279 us    |    xfs_rmap_btrec_to_irec [xfs]();
7)               |    xfs_rmap_lookup_le_range_helper [xfs]() {
1)   0.786 us    |  _raw_spin_lock_irqsave();
7)               |      /* xfs_rmap_lookup_le_range_candidate: dev 8:34 agno 2 agbno 6416 len 256 owner 67160161 offset 99284480 flags 0x0 */
7)   0.506 us    |    }
7)   1.680 us    |  }
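
A snippet this short says little on its own, so a longer capture is usually summarised per function. The following is a minimal sketch, assuming the capture is in the function_graph format shown above; the helper name `leaf_time_by_function` is made up for illustration and is not part of trace-cmd. It sums the reported duration of leaf calls only and ignores closing-brace lines, nested-call openers and trace-event comments.

```python
import re
from collections import defaultdict

# Matches leaf-call lines such as:
#   7)   0.279 us    |  xfs_btree_get_block [xfs]();
# Lines without a duration, closing "}" lines and /* event */ lines do not match.
LEAF = re.compile(r"^\s*(\d+)\)\s+([\d.]+)\s+us\s+\|\s+(\w+)")

def leaf_time_by_function(lines):
    """Sum the microseconds reported for each leaf function (hypothetical helper)."""
    totals = defaultdict(float)
    for line in lines:
        m = LEAF.match(line)
        if m:
            totals[m.group(3)] += float(m.group(2))
    return dict(totals)

sample = """\
7)   0.279 us    |  xfs_btree_get_block [xfs]();
7)   0.303 us    |  xfs_btree_rec_offset [xfs]();
7)   0.301 us    |  xfs_rmapbt_init_high_key_from_rec [xfs]();
7)   0.356 us    |  xfs_rmapbt_diff_two_keys [xfs]();
7)   0.305 us    |  xfs_rmapbt_init_key_from_rec [xfs]();
7)   0.306 us    |  xfs_rmapbt_diff_two_keys [xfs]();
7)               |  xfs_rmap_query_range_helper [xfs]() {
7)   0.279 us    |    xfs_rmap_btrec_to_irec [xfs]();
7)               |    xfs_rmap_lookup_le_range_helper [xfs]() {
1)   0.786 us    |  _raw_spin_lock_irqsave();
7)   0.506 us    |    }
7)   1.680 us    |  }
"""

totals = leaf_time_by_function(sample.splitlines())
# Print the most expensive leaf functions first.
for name, us in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{us:8.3f} us  {name}")
```

Fed a capture covering the whole unlink, the same loop would show whether the rmapbt lookups dominate or whether the time goes elsewhere.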


* Re: A NFS, xfs, reflink and rmapbt story
  2020-01-23  8:32 A NFS, xfs, reflink and rmapbt story Murphy Zhou
@ 2020-01-24  1:10 ` Darrick J. Wong
  2020-01-27 22:36   ` J. Bruce Fields
  2020-01-27 23:56   ` Dave Chinner
  0 siblings, 2 replies; 8+ messages in thread
From: Darrick J. Wong @ 2020-01-24  1:10 UTC (permalink / raw)
  To: Murphy Zhou; +Cc: linux-xfs, linux-nfs

On Thu, Jan 23, 2020 at 04:32:17PM +0800, Murphy Zhou wrote:
> Hi,
> 
> Deleting the files left by generic/175 costs too much time when testing
> on NFSv4.2 exporting xfs with rmapbt=1.
> 
> "./check -nfs generic/175 generic/176" should reproduce it.
> 
> My test bed is a 16c8G vm.

What kind of storage?

> NFSv4.2  rmapbt=1   24h+

<URK> Wow.  I wonder what about NFS makes us so slow now?  Synchronous
transactions on the inactivation?  (speculates wildly at the end of the
workday)

I'll have a look in the morning.  It might take me a while to remember
how to set up NFS42 :)

--D

> NFSv4.2  rmapbt=0   1h-2h
> xfs      rmapbt=1   10m+
> 
> At first I thought it hung, turns out it was just slow when deleting
> 2 massive reflinked files.
> 
> It's reproducible using latest Linus tree, and Darrick's deferred-inactivation
> branch. Run latest for-next branch xfsprogs.
> 
> I'm not sure it's something wrong, just sharing with you guys. I don't
> remember I have identified this as a regression. It should be there for
> a long time.
> 
> Sending to xfs and nfs because it looks like all related. :)
> 
> This almost gets lost in my list. Not much information recorded, some
> trace-cmd outputs for your info. It's easy to reproduce. If it's
> interesting to you and need any info, feel free to ask.
> 
> Thanks,
> 
> 
> 7)   0.279 us    |  xfs_btree_get_block [xfs]();
> 7)   0.303 us    |  xfs_btree_rec_offset [xfs]();
> 7)   0.301 us    |  xfs_rmapbt_init_high_key_from_rec [xfs]();
> 7)   0.356 us    |  xfs_rmapbt_diff_two_keys [xfs]();
> 7)   0.305 us    |  xfs_rmapbt_init_key_from_rec [xfs]();
> 7)   0.306 us    |  xfs_rmapbt_diff_two_keys [xfs]();
> 7)               |  xfs_rmap_query_range_helper [xfs]() {
> 7)   0.279 us    |    xfs_rmap_btrec_to_irec [xfs]();
> 7)               |    xfs_rmap_lookup_le_range_helper [xfs]() {
> 1)   0.786 us    |  _raw_spin_lock_irqsave();
> 7)               |      /* xfs_rmap_lookup_le_range_candidate: dev 8:34 agno 2 agbno 6416 len 256 owner 67160161 offset 99284480 flags 0x0 */
> 7)   0.506 us    |    }
> 7)   1.680 us    |  }


* Re: A NFS, xfs, reflink and rmapbt story
  2020-01-24  1:10 ` Darrick J. Wong
@ 2020-01-27 22:36   ` J. Bruce Fields
  2020-02-05  6:22     ` Murphy Zhou
  2020-02-16  8:28     ` Murphy Zhou
  2020-01-27 23:56   ` Dave Chinner
  1 sibling, 2 replies; 8+ messages in thread
From: J. Bruce Fields @ 2020-01-27 22:36 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Murphy Zhou, linux-xfs, linux-nfs

On Thu, Jan 23, 2020 at 05:10:19PM -0800, Darrick J. Wong wrote:
> On Thu, Jan 23, 2020 at 04:32:17PM +0800, Murphy Zhou wrote:
> > Hi,
> > 
> > Deleting the files left by generic/175 costs too much time when testing
> > on NFSv4.2 exporting xfs with rmapbt=1.
> > 
> > "./check -nfs generic/175 generic/176" should reproduce it.
> > 
> > My test bed is a 16c8G vm.
> 
> What kind of storage?
> 
> > NFSv4.2  rmapbt=1   24h+
> 
> <URK> Wow.  I wonder what about NFS makes us so slow now?  Synchronous
> transactions on the inactivation?  (speculates wildly at the end of the
> workday)
> 
> I'll have a look in the morning.  It might take me a while to remember
> how to set up NFS42 :)

It may just be the default on a recent enough distro.

Though I'd be a little surprised if this behavior is specific to the
protocol version.

nfsd_unlink() is basically just vfs_unlink() followed by
commit_metadata().

--b.

> 
> --D
> 
> > NFSv4.2  rmapbt=0   1h-2h
> > xfs      rmapbt=1   10m+
> > 
> > At first I thought it hung, turns out it was just slow when deleting
> > 2 massive reflinked files.
> > 
> > It's reproducible using latest Linus tree, and Darrick's deferred-inactivation
> > branch. Run latest for-next branch xfsprogs.
> > 
> > I'm not sure it's something wrong, just sharing with you guys. I don't
> > remember I have identified this as a regression. It should be there for
> > a long time.
> > 
> > Sending to xfs and nfs because it looks like all related. :)
> > 
> > This almost gets lost in my list. Not much information recorded, some
> > trace-cmd outputs for your info. It's easy to reproduce. If it's
> > interesting to you and need any info, feel free to ask.
> > 
> > Thanks,
> > 
> > 
> > 7)   0.279 us    |  xfs_btree_get_block [xfs]();
> > 7)   0.303 us    |  xfs_btree_rec_offset [xfs]();
> > 7)   0.301 us    |  xfs_rmapbt_init_high_key_from_rec [xfs]();
> > 7)   0.356 us    |  xfs_rmapbt_diff_two_keys [xfs]();
> > 7)   0.305 us    |  xfs_rmapbt_init_key_from_rec [xfs]();
> > 7)   0.306 us    |  xfs_rmapbt_diff_two_keys [xfs]();
> > 7)               |  xfs_rmap_query_range_helper [xfs]() {
> > 7)   0.279 us    |    xfs_rmap_btrec_to_irec [xfs]();
> > 7)               |    xfs_rmap_lookup_le_range_helper [xfs]() {
> > 1)   0.786 us    |  _raw_spin_lock_irqsave();
> > 7)               |      /* xfs_rmap_lookup_le_range_candidate: dev 8:34 agno 2 agbno 6416 len 256 owner 67160161 offset 99284480 flags 0x0 */
> > 7)   0.506 us    |    }
> > 7)   1.680 us    |  }


* Re: A NFS, xfs, reflink and rmapbt story
  2020-01-24  1:10 ` Darrick J. Wong
  2020-01-27 22:36   ` J. Bruce Fields
@ 2020-01-27 23:56   ` Dave Chinner
  2020-02-05  6:52     ` Murphy Zhou
  1 sibling, 1 reply; 8+ messages in thread
From: Dave Chinner @ 2020-01-27 23:56 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Murphy Zhou, linux-xfs, linux-nfs

On Thu, Jan 23, 2020 at 05:10:19PM -0800, Darrick J. Wong wrote:
> On Thu, Jan 23, 2020 at 04:32:17PM +0800, Murphy Zhou wrote:
> > Hi,
> > 
> > Deleting the files left by generic/175 costs too much time when testing
> > on NFSv4.2 exporting xfs with rmapbt=1.
> > 
> > "./check -nfs generic/175 generic/176" should reproduce it.
> > 
> > My test bed is a 16c8G vm.
> 
> What kind of storage?

Is the NFS server the same machine as what the local XFS tests were
run on?

> > NFSv4.2  rmapbt=1   24h+
> 
> <URK> Wow.  I wonder what about NFS makes us so slow now?  Synchronous
> transactions on the inactivation?  (speculates wildly at the end of the
> workday)

Doubt it - NFS server uses ->commit_metadata after the async
operation to ensure that it is completed and on stable storage, so
the truncate on inactivation should run at pretty much the same
speed as on a local filesystem as it's still all async commits. i.e.
the only difference on the NFS server is the log force that follows
the inode inactivation...

> I'll have a look in the morning.  It might take me a while to remember
> how to set up NFS42 :)
> 
> --D
> 
> > NFSv4.2  rmapbt=0   1h-2h
> > xfs      rmapbt=1   10m+
> > 
> > At first I thought it hung, turns out it was just slow when deleting
> > 2 massive reflinked files.

Both tests run on the scratch device, so I don't see where there is
a large file unlink in either of these tests.

In which case, I'd expect that all the time is consumed in
generic/176 running punch_alternating to create a million extents
as that will effectively run a synchronous server-side hole punch
half a million times.
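
The arithmetic behind "half a million times" can be sketched as follows. This is only an illustration: the 4096-byte block size, the one-million-block file and the choice to start at block 0 are assumptions, not values read out of generic/176 or punch_alternating.

```python
def alternating_punch_ranges(file_size, block_size):
    """Yield (offset, length) for every second block, starting at block 0.

    Each range stands for one hole-punch call; over NFS each call would be
    a separate, synchronously committed server-side operation.
    """
    for offset in range(0, file_size, 2 * block_size):
        yield offset, min(block_size, file_size - offset)

block_size = 4096        # assumed filesystem block size
nr_blocks = 1_000_000    # assumed file length in blocks

punches = sum(1 for _ in alternating_punch_ranges(nr_blocks * block_size, block_size))
print(punches, "hole punches for", nr_blocks, "blocks")
```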

However, I'm guessing that the server side filesystem has a very
small log and is on spinning rust, hence the ->commit_metadata log
forces are preventing in-memory aggregation of modifications. This
results in the working set of metadata not fitting in the log and so
each new hole punch transaction ends up waiting on log tail pushing
(i.e. metadata writeback IO).  i.e. it's thrashing the disk, and
that's why it is slow.....
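
To make the aggregation argument concrete, here is a deliberately crude toy model. It is not how the XFS log works internally: the function name, the checkpoint interval and the ten-block working set are all invented for illustration. It only counts how many metadata blocks get written to the log when repeated changes to the same block can be merged in memory, compared with a log force after every operation.

```python
def log_writes(ops, force_every_op, checkpoint_every=1000):
    """Count metadata blocks written to the log (toy model, not XFS code).

    ops is a sequence of metadata block ids, one modification per operation.
    Modifications to the same block between two log writes are merged.
    """
    pending = set()
    written = 0
    for n, block in enumerate(ops, start=1):
        pending.add(block)
        if force_every_op or n % checkpoint_every == 0:
            written += len(pending)   # everything dirty goes out now
            pending.clear()
    return written + len(pending)     # final flush of whatever is left

# 10,000 operations that keep touching the same 10 metadata blocks.
ops = [i % 10 for i in range(10_000)]

aggregated = log_writes(ops, force_every_op=False)
forced = log_writes(ops, force_every_op=True)
print("aggregated:", aggregated, "forced:", forced)
```

In this model the forced case writes one block per operation while the aggregated case writes each hot block once per checkpoint, which is the shape of the difference described above, before any effect of a small log or slow disk is added.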

Storage details, please!

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: A NFS, xfs, reflink and rmapbt story
  2020-01-27 22:36   ` J. Bruce Fields
@ 2020-02-05  6:22     ` Murphy Zhou
  2020-02-16  8:28     ` Murphy Zhou
  1 sibling, 0 replies; 8+ messages in thread
From: Murphy Zhou @ 2020-02-05  6:22 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Darrick J. Wong, Murphy Zhou, linux-xfs, linux-nfs

On Mon, Jan 27, 2020 at 05:36:31PM -0500, J. Bruce Fields wrote:
> On Thu, Jan 23, 2020 at 05:10:19PM -0800, Darrick J. Wong wrote:
> > On Thu, Jan 23, 2020 at 04:32:17PM +0800, Murphy Zhou wrote:
> > > Hi,
> > > 
> > > Deleting the files left by generic/175 costs too much time when testing
> > > on NFSv4.2 exporting xfs with rmapbt=1.
> > > 
> > > "./check -nfs generic/175 generic/176" should reproduce it.
> > > 
> > > My test bed is a 16c8G vm.
> > 
> > What kind of storage?
> > 
> > > NFSv4.2  rmapbt=1   24h+
> > 
> > <URK> Wow.  I wonder what about NFS makes us so slow now?  Synchronous
> > transactions on the inactivation?  (speculates wildly at the end of the
> > workday)
> > 
> > I'll have a look in the morning.  It might take me a while to remember
> > how to set up NFS42 :)
> 
> It may just be the default on a recent enough distro.
> 
> Though I'd be a little surprised if this behavior is specific to the
> protocol version.

This testcase requires reflink, which is only available in v4.2.
On other protocol versions, this testcase does not run.
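
For reference, a reflink on Linux is requested with the FICLONE ioctl, and as far as I know the NFS client can only pass that on as the NFSv4.2 CLONE operation. The following is a minimal sketch of probing for it; the helper name `try_reflink` is hypothetical, and the probe simply reports False on any filesystem or mount that refuses the ioctl.

```python
import fcntl
import os
import tempfile

# FICLONE is _IOW(0x94, 9, int) in <linux/fs.h>.
FICLONE = 0x40049409

def try_reflink(src_path, dst_path):
    """Return True if dst_path now shares src_path's blocks, False if refused."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        try:
            fcntl.ioctl(dst.fileno(), FICLONE, src.fileno())
        except OSError:
            # e.g. EOPNOTSUPP on a filesystem or NFS version without clone support
            return False
    return True

with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "src")
    dst_path = os.path.join(tmp, "dst")
    with open(src_path, "wb") as f:
        f.write(b"x" * 4096)
    cloned = try_reflink(src_path, dst_path)
    print("reflink supported here:", cloned)
```

Pointing the temporary directory at the NFS mount under test would show whether the mount was negotiated at a version that supports clone.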

Murphy

> 
> nfsd_unlink() is basically just vfs_unlink() followed by
> commit_metadata().
> 
> --b.
> 
> > 
> > --D
> > 
> > > NFSv4.2  rmapbt=0   1h-2h
> > > xfs      rmapbt=1   10m+
> > > 
> > > At first I thought it hung, turns out it was just slow when deleting
> > > 2 massive reflinked files.
> > > 
> > > It's reproducible using latest Linus tree, and Darrick's deferred-inactivation
> > > branch. Run latest for-next branch xfsprogs.
> > > 
> > > I'm not sure it's something wrong, just sharing with you guys. I don't
> > > remember I have identified this as a regression. It should be there for
> > > a long time.
> > > 
> > > Sending to xfs and nfs because it looks like all related. :)
> > > 
> > > This almost gets lost in my list. Not much information recorded, some
> > > trace-cmd outputs for your info. It's easy to reproduce. If it's
> > > interesting to you and need any info, feel free to ask.
> > > 
> > > Thanks,
> > > 
> > > 
> > > 7)   0.279 us    |  xfs_btree_get_block [xfs]();
> > > 7)   0.303 us    |  xfs_btree_rec_offset [xfs]();
> > > 7)   0.301 us    |  xfs_rmapbt_init_high_key_from_rec [xfs]();
> > > 7)   0.356 us    |  xfs_rmapbt_diff_two_keys [xfs]();
> > > 7)   0.305 us    |  xfs_rmapbt_init_key_from_rec [xfs]();
> > > 7)   0.306 us    |  xfs_rmapbt_diff_two_keys [xfs]();
> > > 7)               |  xfs_rmap_query_range_helper [xfs]() {
> > > 7)   0.279 us    |    xfs_rmap_btrec_to_irec [xfs]();
> > > 7)               |    xfs_rmap_lookup_le_range_helper [xfs]() {
> > > 1)   0.786 us    |  _raw_spin_lock_irqsave();
> > > 7)               |      /* xfs_rmap_lookup_le_range_candidate: dev 8:34 agno 2 agbno 6416 len 256 owner 67160161 offset 99284480 flags 0x0 */
> > > 7)   0.506 us    |    }
> > > 7)   1.680 us    |  }


* Re: A NFS, xfs, reflink and rmapbt story
  2020-01-27 23:56   ` Dave Chinner
@ 2020-02-05  6:52     ` Murphy Zhou
  0 siblings, 0 replies; 8+ messages in thread
From: Murphy Zhou @ 2020-02-05  6:52 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Darrick J. Wong, Murphy Zhou, linux-xfs, linux-nfs

On Tue, Jan 28, 2020 at 10:56:17AM +1100, Dave Chinner wrote:
> On Thu, Jan 23, 2020 at 05:10:19PM -0800, Darrick J. Wong wrote:
> > On Thu, Jan 23, 2020 at 04:32:17PM +0800, Murphy Zhou wrote:
> > > Hi,
> > > 
> > > Deleting the files left by generic/175 costs too much time when testing
> > > on NFSv4.2 exporting xfs with rmapbt=1.
> > > 
> > > "./check -nfs generic/175 generic/176" should reproduce it.
> > > 
> > > My test bed is a 16c8G vm.
> > 
> > What kind of storage?

Loop device in guest.

# Host:

[root@ibm-x3850x5-03]$ lsblk
NAME                            MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                               8:0    0  2.7T  0 disk
├─sda1                            8:1    0    1M  0 part
├─sda2                            8:2    0    1G  0 part /boot
└─sda3                            8:3    0  2.7T  0 part
  ├─rhel_ibm--x3850x5--03-root  253:0    0  550G  0 lvm  /
  ├─rhel_ibm--x3850x5--03-swap  253:1    0 27.6G  0 lvm  [SWAP]
  ├─rhel_ibm--x3850x5--03-home  253:2    0  1.7T  0 lvm  /home
  ├─rhel_ibm--x3850x5--03-test1 253:3    0   10G  0 lvm
  └─rhel_ibm--x3850x5--03-test2 253:4    0   10G  0 lvm
loop0                             7:0    0    1G  0 loop
loop1                             7:1    0    1G  0 loop
[root@ibm-x3850x5-03]$ smartctl -a /dev/sda
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-3.10.0-1115.el7.x86_64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               IBM
Product:              ServeRAID M5015
Revision:             2.13
Compliance:           SPC-3
User Capacity:        2,996,997,980,160 bytes [2.99 TB]
Logical block size:   512 bytes
Logical Unit id:      0x600605b001665aa019cb17be1e9ce991
Serial number:        0091e99c1ebe17cb19a05a6601b00506
Device type:          disk
Local Time is:        Wed Feb  5 14:35:57 2020 CST
SMART support is:     Unavailable - device lacks SMART capability.

=== START OF READ SMART DATA SECTION ===
Current Drive Temperature:     0 C
Drive Trip Temperature:        0 C

Error Counter logging not supported

Device does not support Self Test logging
[root@ibm-x3850x5-03]$ virsh domblklist 8u
Target     Source
------------------------------------------------
hda        /home/8u.qcow2
hdb        /home/8ut.qcow2
hdc        /home/8ut1.qcow2

[root@ibm-x3850x5-03]$

# Guest:

[root@8u]$ lsblk
NAME          MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda             8:0    0  800G  0 disk
├─sda1          8:1    0    2G  0 part
│ └─rhel-swap 253:0    0    2G  0 lvm  [SWAP]
└─sda2          8:2    0  798G  0 part /
sdb             8:16   0  200G  0 disk /home
sdc             8:32   0  100G  0 disk
├─sdc1          8:33   0   50G  0 part
└─sdc2          8:34   0   50G  0 part
pmem0         259:0    0    5G  0 disk
[root@8u]$ smartctl -a /dev/sdb
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-5.5.0-v5.5-9386-g33b4013] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     QEMU HARDDISK
Serial Number:    QM00003
Firmware Version: 1.5.3
User Capacity:    214,748,364,800 bytes [214 GB]
Sector Size:      512 bytes logical/physical
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ATA/ATAPI-7, ATA/ATAPI-5 published, ANSI NCITS 340-2000
Local Time is:    Wed Feb  5 14:39:18 2020 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)	Offline data collection activity
					was completed without error.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever
					been run.
Total time to complete Offline
data collection: 		(  288) seconds.
Offline data collection
capabilities: 			 (0x19) SMART execute Offline immediate.
					No Auto Offline data collection support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					No Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					No General Purpose Logging support.
Short self-test routine
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 (  54) minutes.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0003   100   100   006    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0003   100   100   000    Pre-fail  Always       -       16
  4 Start_Stop_Count        0x0002   100   100   020    Old_age   Always       -       100
  5 Reallocated_Sector_Ct   0x0003   100   100   036    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0003   100   100   000    Pre-fail  Always       -       1
 12 Power_Cycle_Count       0x0003   100   100   000    Pre-fail  Always       -       0
190 Airflow_Temperature_Cel 0x0003   069   069   050    Pre-fail  Always       -       31 (Min/Max 31/31)

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

Selective Self-tests/Logging not supported

[root@8u]$

> 
> Is the NFS server the same machine as what the local XFS tests were
> run on?

Yes. It's also reproducible when testing on remote NFS mounts.

> 
> > > NFSv4.2  rmapbt=1   24h+
> > 
> > <URK> Wow.  I wonder what about NFS makes us so slow now?  Synchronous
> > transactions on the inactivation?  (speculates wildly at the end of the
> > workday)
> 
> Doubt it - NFS server uses ->commit_metadata after the async
> operation to ensure that it is completed and on stable storage, so
> the truncate on inactivation should run at pretty much the same
> speed as on a local filesystem as it's still all async commits. i.e.
> the only difference on the NFS server is the log force that follows
> the inode inactivation...
> 
> > I'll have a look in the morning.  It might take me a while to remember
> > how to set up NFS42 :)
> > 
> > --D
> > 
> > > NFSv4.2  rmapbt=0   1h-2h
> > > xfs      rmapbt=1   10m+
> > > 
> > > At first I thought it hung, turns out it was just slow when deleting
> > > 2 massive reflinked files.
> 
> Both tests run on the scratch device, so I don't see where there is
> a large file unlink in either of these tests.
> 
> In which case, I'd expect that all the time is consumed in
> generic/176 running punch_alternating to create a million extents
> as that will effectively run a synchronous server-side hole punch
> half a million times.

I've tracked this down. The time was spent in the "rm -rf" run by
_scratch_mkfs in generic/176. See the thread at
https://www.spinics.net/lists/fstests/msg13316.html

Thanks,
Murphy

> 
> However, I'm guessing that the server side filesystem has a very
> small log and is on spinning rust, hence the ->commit_metadata log
> forces are preventing in-memory aggregation of modifications. This
> results in the working set of metadata not fitting in the log and so
> each new hole punch transaction ends up waiting on log tail pushing
> (i.e. metadata writeback IO).  i.e. it's thrashing the disk, and
> that's why it is slow.....
> 
> Storage details, please!
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com


* Re: A NFS, xfs, reflink and rmapbt story
  2020-01-27 22:36   ` J. Bruce Fields
  2020-02-05  6:22     ` Murphy Zhou
@ 2020-02-16  8:28     ` Murphy Zhou
  2020-02-17  0:36       ` J. Bruce Fields
  1 sibling, 1 reply; 8+ messages in thread
From: Murphy Zhou @ 2020-02-16  8:28 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Darrick J. Wong, Murphy Zhou, linux-xfs, linux-nfs

Hi Bruce,

On Mon, Jan 27, 2020 at 05:36:31PM -0500, J. Bruce Fields wrote:
> On Thu, Jan 23, 2020 at 05:10:19PM -0800, Darrick J. Wong wrote:
> > On Thu, Jan 23, 2020 at 04:32:17PM +0800, Murphy Zhou wrote:
> > > Hi,
> > > 
> > > Deleting the files left by generic/175 costs too much time when testing
> > > on NFSv4.2 exporting xfs with rmapbt=1.
> > > 
> > > "./check -nfs generic/175 generic/176" should reproduce it.
> > > 
> > > My test bed is a 16c8G vm.
> > 
> > What kind of storage?
> > 
> > > NFSv4.2  rmapbt=1   24h+
> > 
> > <URK> Wow.  I wonder what about NFS makes us so slow now?  Synchronous
> > transactions on the inactivation?  (speculates wildly at the end of the
> > workday)
> > 
> > I'll have a look in the morning.  It might take me a while to remember
> > how to set up NFS42 :)
> 
> It may just be the default on a recent enough distro.
> 
> Though I'd be a little surprised if this behavior is specific to the
> protocol version.

Can the NFS client or server tell that a file has reflinked parts? Is there
anything like a flag or a bit that tracks this?

Thanks!
Murphy
> 
> nfsd_unlink() is basically just vfs_unlink() followed by
> commit_metadata().
> 
> --b.
> 
> > 
> > --D
> > 
> > > NFSv4.2  rmapbt=0   1h-2h
> > > xfs      rmapbt=1   10m+
> > > 
> > > At first I thought it hung, turns out it was just slow when deleting
> > > 2 massive reflinked files.
> > > 
> > > It's reproducible using latest Linus tree, and Darrick's deferred-inactivation
> > > branch. Run latest for-next branch xfsprogs.
> > > 
> > > I'm not sure it's something wrong, just sharing with you guys. I don't
> > > remember I have identified this as a regression. It should be there for
> > > a long time.
> > > 
> > > Sending to xfs and nfs because it looks like all related. :)
> > > 
> > > This almost gets lost in my list. Not much information recorded, some
> > > trace-cmd outputs for your info. It's easy to reproduce. If it's
> > > interesting to you and need any info, feel free to ask.
> > > 
> > > Thanks,
> > > 
> > > 
> > > 7)   0.279 us    |  xfs_btree_get_block [xfs]();
> > > 7)   0.303 us    |  xfs_btree_rec_offset [xfs]();
> > > 7)   0.301 us    |  xfs_rmapbt_init_high_key_from_rec [xfs]();
> > > 7)   0.356 us    |  xfs_rmapbt_diff_two_keys [xfs]();
> > > 7)   0.305 us    |  xfs_rmapbt_init_key_from_rec [xfs]();
> > > 7)   0.306 us    |  xfs_rmapbt_diff_two_keys [xfs]();
> > > 7)               |  xfs_rmap_query_range_helper [xfs]() {
> > > 7)   0.279 us    |    xfs_rmap_btrec_to_irec [xfs]();
> > > 7)               |    xfs_rmap_lookup_le_range_helper [xfs]() {
> > > 1)   0.786 us    |  _raw_spin_lock_irqsave();
> > > 7)               |      /* xfs_rmap_lookup_le_range_candidate: dev 8:34 agno 2 agbno 6416 len 256 owner 67160161 offset 99284480 flags 0x0 */
> > > 7)   0.506 us    |    }
> > > 7)   1.680 us    |  }


* Re: A NFS, xfs, reflink and rmapbt story
  2020-02-16  8:28     ` Murphy Zhou
@ 2020-02-17  0:36       ` J. Bruce Fields
  0 siblings, 0 replies; 8+ messages in thread
From: J. Bruce Fields @ 2020-02-17  0:36 UTC (permalink / raw)
  To: Murphy Zhou; +Cc: Darrick J. Wong, linux-xfs, linux-nfs

On Sun, Feb 16, 2020 at 04:28:51PM +0800, Murphy Zhou wrote:
> Hi Bruce,
> 
> On Mon, Jan 27, 2020 at 05:36:31PM -0500, J. Bruce Fields wrote:
> > On Thu, Jan 23, 2020 at 05:10:19PM -0800, Darrick J. Wong wrote:
> > > On Thu, Jan 23, 2020 at 04:32:17PM +0800, Murphy Zhou wrote:
> > > > Hi,
> > > > 
> > > > Deleting the files left by generic/175 costs too much time when testing
> > > > on NFSv4.2 exporting xfs with rmapbt=1.
> > > > 
> > > > "./check -nfs generic/175 generic/176" should reproduce it.
> > > > 
> > > > My test bed is a 16c8G vm.
> > > 
> > > What kind of storage?
> > > 
> > > > NFSv4.2  rmapbt=1   24h+
> > > 
> > > <URK> Wow.  I wonder what about NFS makes us so slow now?  Synchronous
> > > transactions on the inactivation?  (speculates wildly at the end of the
> > > workday)
> > > 
> > > I'll have a look in the morning.  It might take me a while to remember
> > > how to set up NFS42 :)
> > 
> > It may just be the default on a recent enough distro.
> > 
> > Though I'd be a little surprised if this behavior is specific to the
> > protocol version.
> 
> Can the NFS client or server tell that a file has reflinked parts? Is there
> anything like a flag or a bit that tracks this?

Not that I'm aware of.

--b.


end of thread, other threads:[~2020-02-17  0:36 UTC | newest]
