From: John Groves <John@groves.net>
To: Dave Chinner <david@fromorbit.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>,
John Groves <jgroves@micron.com>,
Jonathan Corbet <corbet@lwn.net>,
Dan Williams <dan.j.williams@intel.com>,
Vishal Verma <vishal.l.verma@intel.com>,
Dave Jiang <dave.jiang@intel.com>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
Matthew Wilcox <willy@infradead.org>,
linux-cxl@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
nvdimm@lists.linux.dev, john@jagalactic.com,
Christoph Hellwig <hch@infradead.org>,
dave.hansen@linux.intel.com, gregory.price@memverge.com
Subject: Re: [RFC PATCH 00/20] Introduce the famfs shared-memory file system
Date: Thu, 29 Feb 2024 08:52:48 -0600 [thread overview]
Message-ID: <5segby7xk6wbyblovpapdymiuvg63e5qarahc4pramhsqikx2x@y3zmih6mgs33> (raw)
In-Reply-To: <Zd/ovHqO/16PsUsp@dread.disaster.area>
Hi Dave!
On 24/02/29 01:15PM, Dave Chinner wrote:
> On Mon, Feb 26, 2024 at 08:05:58PM -0600, John Groves wrote:
> > On 24/02/26 04:58PM, Luis Chamberlain wrote:
> > > On Mon, Feb 26, 2024 at 1:16 PM John Groves <John@groves.net> wrote:
> > > >
> > > > On 24/02/26 07:53AM, Luis Chamberlain wrote:
> > > > > On Mon, Feb 26, 2024 at 07:27:18AM -0600, John Groves wrote:
> > > > > > Run status group 0 (all jobs):
> > > > > > WRITE: bw=29.6GiB/s (31.8GB/s), 29.6GiB/s-29.6GiB/s (31.8GB/s-31.8GB/s), io=44.7GiB (48.0GB), run=1511-1511msec
> > > > >
> > > > > > This is run on an xfs file system on a SATA ssd.
> > > > >
> > > > > To compare more closer apples to apples, wouldn't it make more sense
> > > > > to try this with XFS on pmem (with fio -direct=1)?
> > > > >
> > > > > Luis
> > > >
> > > > Makes sense. Here is the same command line I used with xfs before, but
> > > > now it's on /dev/pmem0 (the same 128G, but converted from devdax to pmem
> > > > because xfs requires that.
> > > >
> > > > fio -name=ten-256m-per-thread --nrfiles=10 -bs=2M --group_reporting=1 --alloc-size=1048576 --filesize=256MiB --readwrite=write --fallocate=none --numjobs=48 --create_on_open=0 --ioengine=io_uring --direct=1 --directory=/mnt/xfs
> > >
> > > Could you try with mkfs.xfs -d agcount=1024
>
> Won't change anything for the better, may make things worse.
I dropped that arg, though performance looked about the same either way.
>
> > bw ( MiB/s): min= 5085, max=27367, per=100.00%, avg=14361.95, stdev=165.61, samples=719
> > iops : min= 2516, max=13670, avg=7160.17, stdev=82.88, samples=719
> > lat (usec) : 4=0.05%, 10=0.72%, 20=2.23%, 50=2.48%, 100=3.02%
> > lat (usec) : 250=1.54%, 500=2.37%, 750=1.34%, 1000=0.75%
> > lat (msec) : 2=3.20%, 4=43.10%, 10=23.05%, 20=14.81%, 50=1.25%
>
> Most of the IO latencies are up round the 4-20ms marks. That seems
> kinda high for a 2MB IO. With a memcpy speed of 10GB/s, the 2MB
> should only take a couple of hundred microseconds. For Famfs, the
> latencies appear to be around 1-4ms.
>
> So where's all that extra time coming from?
Below, you will see two runs with performance and latency distribution
about the same as famfs (the answer for that was --fallocate=native).
>
>
> > lat (msec) : 100=0.08%
> > cpu : usr=10.18%, sys=0.79%, ctx=67227, majf=0, minf=38511
>
> And why is system time reporting at almost zero instead of almost
> all the remaining cpu time (i.e. up at 80-90%)?
Something weird is going on with the cpu reporting. Sometimes sys=~0, but other times
it's about what you would expect. I suspect some sort of measurement error,
like maybe the method doesn't work with my cpu model? (I'm grasping, but with
a somewhat rational basis...)
I pasted two xfs runs below. The first has the wonky cpu sys value, and
the second looks about like what one would expect.
>
> Can you run call-graph kernel profiles for XFS and famfs whilst
> running this workload so we have some insight into what is behaving
> differently here?
Can you point me to an example of how to do that?
>
> -Dave.
> --
> Dave Chinner
> david@fromorbit.com
I'd been thinking about the ~2x gap for a few days, and the most obvious
difference is famfs files must be preallocated (like fallocate, but works
a bit differently since allocation happens in user space). I just checked
one of the xfs files, and it had maybe 80 extents (whereas the famfs
files always have 1 extent here).
FWIW I ran xfs with and without io_uring, and there was no apparent
difference (which makes sense to me because it's not block I/O).
The prior ~2x gap still seems like a lot of overhead for extent list
mapping to memory, but adding --fallocate=native to the xfs test brought
it into line with famfs:
+ fio -name=ten-256m-per-thread --nrfiles=10 -bs=2M --group_reporting=1 --alloc-size=1048576 --filesize=256MiB --readwrite=write --fallocate=native --numjobs=48 --create_on_open=0 --ioengine=io_uring --direct=1 --directory=/mnt/xfs
ten-256m-per-thread: (g=0): rw=write, bs=(R) 2048KiB-2048KiB, (W) 2048KiB-2048KiB, (T) 2048KiB-2048KiB, ioengine=io_uring, iodepth=1
...
fio-3.33
Starting 48 processes
Jobs: 38 (f=380): [W(5),_(1),W(12),_(1),W(3),_(1),W(2),_(1),W(2),_(1),W(1),_(1),W(1),_(1),W(6),_(1),W(6),_(2)][57.1%][w=28.0GiB/s][w=14.3k IOPS][eta 00m:03s]
ten-256m-per-thread: (groupid=0, jobs=48): err= 0: pid=1452590: Thu Feb 29 07:46:06 2024
write: IOPS=15.3k, BW=29.8GiB/s (32.0GB/s)(114GiB/3838msec); 0 zone resets
slat (usec): min=17, max=55364, avg=668.20, stdev=1120.41
clat (nsec): min=1368, max=99619k, avg=1982477.32, stdev=2198309.32
lat (usec): min=179, max=99813, avg=2650.68, stdev=2485.15
clat percentiles (usec):
| 1.00th=[ 4], 5.00th=[ 14], 10.00th=[ 172], 20.00th=[ 420],
| 30.00th=[ 644], 40.00th=[ 1057], 50.00th=[ 1582], 60.00th=[ 2008],
| 70.00th=[ 2343], 80.00th=[ 3097], 90.00th=[ 4555], 95.00th=[ 5473],
| 99.00th=[ 8717], 99.50th=[11863], 99.90th=[20055], 99.95th=[27657],
| 99.99th=[49546]
bw ( MiB/s): min=20095, max=59216, per=100.00%, avg=35985.47, stdev=318.61, samples=280
iops : min=10031, max=29587, avg=17970.76, stdev=159.29, samples=280
lat (usec) : 2=0.06%, 4=1.02%, 10=2.33%, 20=4.29%, 50=1.85%
lat (usec) : 100=0.20%, 250=3.26%, 500=11.23%, 750=8.87%, 1000=5.82%
lat (msec) : 2=20.95%, 4=26.74%, 10=12.60%, 20=0.66%, 50=0.09%
lat (msec) : 100=0.01%
cpu : usr=15.48%, sys=1.17%, ctx=62654, majf=0, minf=22801
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,58560,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
WRITE: bw=29.8GiB/s (32.0GB/s), 29.8GiB/s-29.8GiB/s (32.0GB/s-32.0GB/s), io=114GiB (123GB), run=3838-3838msec
Disk stats (read/write):
pmem0: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
## Here is a run where the cpu looks "normal"
+ fio -name=ten-256m-per-thread --nrfiles=10 -bs=2M --group_reporting=1 --alloc-size=1048576 --filesize=256MiB --readwrite=write --fallocate=native --numjobs=48 --create_on_open=0 --direct=1 --directory=/mnt/xfs
ten-256m-per-thread: (g=0): rw=write, bs=(R) 2048KiB-2048KiB, (W) 2048KiB-2048KiB, (T) 2048KiB-2048KiB, ioengine=psync, iodepth=1
...
fio-3.33
Starting 48 processes
Jobs: 19 (f=190): [W(2),_(1),W(2),_(8),W(1),_(3),W(1),_(1),W(2),_(2),W(1),_(1),W(3),_(2),W(1),_(1),W(1),_(2),W(2),_(7),W(3),_(1)][55.6%][w=26.7GiB/s][w=13.6k IOPS][eta 00m:04s]
ten-256m-per-thread: (groupid=0, jobs=48): err= 0: pid=1463615: Thu Feb 29 08:19:53 2024
write: IOPS=12.4k, BW=24.1GiB/s (25.9GB/s)(114GiB/4736msec); 0 zone resets
clat (usec): min=138, max=117903, avg=2581.99, stdev=2704.61
lat (usec): min=152, max=120405, avg=3019.04, stdev=2964.47
clat percentiles (usec):
| 1.00th=[ 161], 5.00th=[ 249], 10.00th=[ 627], 20.00th=[ 1270],
| 30.00th=[ 1631], 40.00th=[ 1942], 50.00th=[ 2089], 60.00th=[ 2212],
| 70.00th=[ 2343], 80.00th=[ 2704], 90.00th=[ 5866], 95.00th=[ 6849],
| 99.00th=[12387], 99.50th=[14353], 99.90th=[26084], 99.95th=[38536],
| 99.99th=[78119]
bw ( MiB/s): min=21204, max=47040, per=100.00%, avg=29005.40, stdev=237.31, samples=329
iops : min=10577, max=23497, avg=14479.74, stdev=118.65, samples=329
lat (usec) : 250=5.04%, 500=4.03%, 750=2.37%, 1000=3.13%
lat (msec) : 2=29.39%, 4=41.05%, 10=13.37%, 20=1.45%, 50=0.15%
lat (msec) : 100=0.03%, 250=0.01%
cpu : usr=14.43%, sys=78.18%, ctx=5272, majf=0, minf=15708
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,58560,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
WRITE: bw=24.1GiB/s (25.9GB/s), 24.1GiB/s-24.1GiB/s (25.9GB/s-25.9GB/s), io=114GiB (123GB), run=4736-4736msec
Disk stats (read/write):
pmem0: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
Cheers,
John
next prev parent reply other threads:[~2024-02-29 14:52 UTC|newest]
Thread overview: 106+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-02-23 17:41 [RFC PATCH 00/20] Introduce the famfs shared-memory file system John Groves
2024-02-23 17:41 ` [RFC PATCH 01/20] famfs: Documentation John Groves
2024-02-23 17:41 ` [RFC PATCH 02/20] dev_dax_iomap: Add fs_dax_get() func to prepare dax for fs-dax usage John Groves
2024-02-26 12:05 ` Jonathan Cameron
2024-02-26 15:00 ` John Groves
2024-02-23 17:41 ` [RFC PATCH 03/20] dev_dax_iomap: Move dax_pgoff_to_phys from device.c to bus.c since both need it now John Groves
2024-02-26 12:10 ` Jonathan Cameron
2024-02-26 15:13 ` John Groves
2024-02-23 17:41 ` [RFC PATCH 04/20] dev_dax_iomap: Save the kva from memremap John Groves
2024-02-26 12:21 ` Jonathan Cameron
2024-02-26 15:48 ` John Groves
2024-02-23 17:41 ` [RFC PATCH 05/20] dev_dax_iomap: Add dax_operations for use by fs-dax on devdax John Groves
2024-02-26 12:32 ` Jonathan Cameron
2024-02-26 16:09 ` John Groves
2024-02-23 17:41 ` [RFC PATCH 06/20] dev_dax_iomap: Add CONFIG_DEV_DAX_IOMAP kernel build parameter John Groves
2024-02-26 12:34 ` Jonathan Cameron
2024-02-26 16:12 ` John Groves
2024-02-23 17:41 ` [RFC PATCH 07/20] famfs: Add include/linux/famfs_ioctl.h John Groves
2024-02-24 1:39 ` Randy Dunlap
2024-02-24 2:23 ` John Groves
2024-02-24 3:27 ` Randy Dunlap
2024-02-24 23:32 ` John Groves
2024-02-24 23:40 ` Randy Dunlap
2024-02-26 12:39 ` Jonathan Cameron
2024-02-26 16:44 ` John Groves
2024-02-26 16:56 ` Jonathan Cameron
2024-02-26 18:04 ` John Groves
2024-02-23 17:41 ` [RFC PATCH 08/20] famfs: Add famfs_internal.h John Groves
2024-02-26 12:48 ` Jonathan Cameron
2024-02-26 17:35 ` John Groves
2024-02-27 10:28 ` Jonathan Cameron
2024-02-28 1:06 ` John Groves
2024-02-27 13:38 ` Christian Brauner
2024-02-27 14:12 ` John Groves
2024-02-23 17:41 ` [RFC PATCH 09/20] famfs: Add super_operations John Groves
2024-02-26 12:51 ` Jonathan Cameron
2024-02-26 21:47 ` John Groves
2024-02-27 10:34 ` Jonathan Cameron
2024-02-27 17:48 ` John Groves
2024-02-23 17:41 ` [RFC PATCH 10/20] famfs: famfs_open_device() & dax_holder_operations John Groves
2024-02-26 12:56 ` Jonathan Cameron
2024-02-26 22:22 ` John Groves
2024-02-27 13:39 ` Christian Brauner
2024-02-27 18:38 ` John Groves
2024-02-23 17:41 ` [RFC PATCH 11/20] famfs: Add fs_context_operations John Groves
2024-02-26 13:20 ` Jonathan Cameron
2024-02-26 22:43 ` John Groves
2024-02-27 13:41 ` Christian Brauner
2024-02-28 0:59 ` John Groves
2024-02-28 1:49 ` Randy Dunlap
2024-02-28 8:17 ` Christian Brauner
2024-02-28 10:07 ` Christian Brauner
2024-02-28 12:01 ` Christian Brauner
2024-02-23 17:41 ` [RFC PATCH 12/20] famfs: Add inode_operations and file_system_type John Groves
2024-02-26 13:25 ` Jonathan Cameron
2024-02-26 22:53 ` John Groves
2024-02-23 17:41 ` [RFC PATCH 13/20] famfs: Add iomap_ops John Groves
2024-02-26 13:30 ` Jonathan Cameron
2024-02-26 23:00 ` John Groves
2024-02-23 17:41 ` [RFC PATCH 14/20] famfs: Add struct file_operations John Groves
2024-02-26 13:32 ` Jonathan Cameron
2024-02-26 23:09 ` John Groves
2024-02-23 17:41 ` [RFC PATCH 15/20] famfs: Add ioctl to file_operations John Groves
2024-02-26 13:44 ` Jonathan Cameron
2024-02-23 17:42 ` [RFC PATCH 16/20] famfs: Add fault counters John Groves
2024-02-23 18:23 ` Dave Hansen
2024-02-23 19:56 ` John Groves
2024-02-23 20:04 ` Dan Williams
2024-02-23 20:39 ` John Groves
2024-02-23 21:19 ` Dave Hansen
2024-02-23 23:50 ` Dan Williams
2024-02-24 3:59 ` Matthew Wilcox
2024-02-24 4:30 ` Dan Williams
2024-02-23 17:42 ` [RFC PATCH 17/20] famfs: Add module stuff John Groves
2024-02-26 13:47 ` Jonathan Cameron
2024-02-27 22:15 ` John Groves
2024-02-23 17:42 ` [RFC PATCH 18/20] famfs: Support character dax via the dev_dax_iomap patch John Groves
2024-02-26 13:52 ` Jonathan Cameron
2024-02-27 22:27 ` John Groves
2024-02-23 17:42 ` [RFC PATCH 19/20] famfs: Update MAINTAINERS file John Groves
2024-02-23 17:42 ` [RFC PATCH 20/20] famfs: Add Kconfig and Makefile plumbing John Groves
2024-02-24 1:50 ` Randy Dunlap
2024-02-24 2:24 ` John Groves
2024-02-24 0:07 ` [RFC PATCH 00/20] Introduce the famfs shared-memory file system Luis Chamberlain
2024-02-26 13:27 ` John Groves
2024-02-26 15:53 ` Luis Chamberlain
2024-02-26 21:16 ` John Groves
2024-02-27 0:58 ` Luis Chamberlain
2024-02-27 2:05 ` John Groves
2024-02-29 2:15 ` Dave Chinner
2024-02-29 14:52 ` John Groves [this message]
2024-03-11 1:29 ` Dave Chinner
2024-02-29 6:52 ` Amir Goldstein
2024-02-29 22:16 ` John Groves
2024-05-17 9:55 ` Miklos Szeredi
2024-05-19 5:59 ` Amir Goldstein
2024-05-22 2:05 ` John Groves
2024-05-22 8:58 ` Miklos Szeredi
2024-05-22 10:16 ` Amir Goldstein
2024-05-22 11:28 ` Miklos Szeredi
2024-05-22 13:41 ` Amir Goldstein
2024-05-23 2:49 ` John Groves
2024-05-23 13:57 ` Miklos Szeredi
2024-05-24 0:47 ` John Groves
2024-05-24 7:55 ` Miklos Szeredi
2024-05-25 22:54 ` Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5segby7xk6wbyblovpapdymiuvg63e5qarahc4pramhsqikx2x@y3zmih6mgs33 \
--to=john@groves.net \
--cc=brauner@kernel.org \
--cc=corbet@lwn.net \
--cc=dan.j.williams@intel.com \
--cc=dave.hansen@linux.intel.com \
--cc=dave.jiang@intel.com \
--cc=david@fromorbit.com \
--cc=gregory.price@memverge.com \
--cc=hch@infradead.org \
--cc=jack@suse.cz \
--cc=jgroves@micron.com \
--cc=john@jagalactic.com \
--cc=linux-cxl@vger.kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mcgrof@kernel.org \
--cc=nvdimm@lists.linux.dev \
--cc=viro@zeniv.linux.org.uk \
--cc=vishal.l.verma@intel.com \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).