* [PATCH] xfs_db: add extent count and file size histograms
@ 2019-05-14 18:50 Jorge Guerra
2019-05-14 19:52 ` Eric Sandeen
` (2 more replies)
0 siblings, 3 replies; 14+ messages in thread
From: Jorge Guerra @ 2019-05-14 18:50 UTC (permalink / raw)
To: linux-xfs; +Cc: osandov, Jorge Guerra
From: Jorge Guerra <jorgeguerra@fb.com>
In this change we add two features to the xfs_db 'frag' command:
1) Extent count histogram [-e]: This option tracks the
number of extents per inode (file) as we traverse the file
system. The end result is a histogram of the number of extents per
file in power-of-2 buckets.
2) File size histogram and file system internal fragmentation stats
[-s]: This option tracks file sizes both in terms of what
has been physically allocated and how much has been written to the
file. In addition, we track the amount of internal fragmentation
seen per file. This is particularly useful in the case of realtime
devices, where space is allocated in units of fixed-size
extents.
The man page for xfs_db has been updated to reflect these new command
line arguments.
Tests:
We tested this change on several XFS file systems with different
configurations:
1) regular XFS:
[root@m1 ~]# xfs_info /mnt/d0
meta-data=/dev/sdb1 isize=256 agcount=10, agsize=268435455 blks
= sectsz=4096 attr=2, projid32bit=1
= crc=0 finobt=0, sparse=0, rmapbt=0
= reflink=0
data = bsize=4096 blocks=2441608704, imaxpct=100
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0, ftype=1
log =internal log bsize=4096 blocks=521728, version=2
= sectsz=4096 sunit=1 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
[root@m1 ~]# echo "frag -e -s" | xfs_db -r /dev/sdb1
xfs_db> actual 494393, ideal 489246, fragmentation factor 1.04%
Note, this number is largely meaningless.
Files on this filesystem average 1.01 extents per file
Maximum extents in a file 17
Histogram of number of extents per file:
bucket = count % of total
<= 1 = 486157 99.573 %
<= 2 = 768 0.157 %
<= 4 = 371 0.076 %
<= 8 = 947 0.194 %
<= 16 = 0 0.000 %
<= 32 = 1 0.000 %
Maximum file size 64.512 MB
Histogram of file size:
bucket = used overhead(bytes)
<= 4 KB = 180515 0 0.00%
<= 8 KB = 23604 4666970112 44.31%
<= 16 KB = 2712 1961668608 18.62%
<= 32 KB = 1695 612319232 5.81%
<= 64 KB = 290 473210880 4.49%
<= 128 KB = 214 270184448 2.56%
<= 256 KB = 186 269856768 2.56%
<= 512 KB = 201 67203072 0.64%
<= 1 MB = 325 267558912 2.54%
<= 2 MB = 419 596860928 5.67%
<= 4 MB = 436 454148096 4.31%
<= 8 MB = 1864 184532992 1.75%
<= 16 MB = 16084 111964160 1.06%
<= 32 MB = 258910 395116544 3.75%
<= 64 MB = 61 202104832 1.92%
<= 128 MB = 728 0 0.00%
capacity used (bytes): 7210847514624 (6.558 TB)
block overhead (bytes): 10533699584 (0.146 %)
xfs_db>
2) XFS with a realtime device configured with 256 KiB extents:
[root@m2 ~]# xfs_info /mnt/d0
meta-data=/dev/nvme0n1p1 isize=2048 agcount=15, agsize=434112 blks
= sectsz=4096 attr=2, projid32bit=1
= crc=0 finobt=0, sparse=0, rmapbt=0
= reflink=0
data = bsize=4096 blocks=6104576, imaxpct=100
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0, ftype=1
log =internal log bsize=4096 blocks=2560, version=2
= sectsz=4096 sunit=1 blks, lazy-count=1
realtime =/dev/sdaa1 extsz=262144 blocks=2439872256, rtextents=38123004
[root@m2 ~]# echo "frag -s -e" | xfs_db -r /dev/nvme0n1p1
xfs_db> actual 11851552, ideal 1264416, fragmentation factor 89.33%
Note, this number is largely meaningless.
Files on this filesystem average 9.37 extents per file
Maximum extents in a file 129956
Histogram of number of extents per file:
bucket = count % of total
<= 1 = 331951 26.295 %
<= 2 = 82720 6.553 %
<= 4 = 160041 12.677 %
<= 8 = 205312 16.263 %
<= 16 = 267145 21.161 %
<= 32 = 197625 15.655 %
<= 64 = 17610 1.395 %
<= 128 = 8 0.001 %
<= 256 = 1 0.000 %
<= 512 = 0 0.000 %
<= 1024 = 0 0.000 %
<= 2048 = 0 0.000 %
<= 4096 = 0 0.000 %
<= 8192 = 0 0.000 %
<= 16384 = 0 0.000 %
<= 32768 = 0 0.000 %
<= 65536 = 0 0.000 %
<= 131072 = 1 0.000 %
Maximum file size 15.522 GB
Histogram of file size:
bucket = allocated used overhead(bytes)
<= 4 KB = 0 2054 8924143616 3.80%
<= 8 KB = 0 57684 14648967168 6.23%
<= 16 KB = 0 24280 6032441344 2.57%
<= 32 KB = 0 18351 4340473856 1.85%
<= 64 KB = 0 20064 4280770560 1.82%
<= 128 KB = 1002 25287 4138127360 1.76%
<= 256 KB = 163110 17548 1264742400 0.54%
<= 512 KB = 19898 19863 2843152384 1.21%
<= 1 MB = 32687 32617 4361404416 1.86%
<= 2 MB = 38395 38324 5388206080 2.29%
<= 4 MB = 82700 82633 10549821440 4.49%
<= 8 MB = 208576 208477 34238386176 14.57%
<= 16 MB = 715937 715092 134046113792 57.02%
<= 32 MB = 107 107 6332416 0.00%
<= 64 MB = 0 0 0 0.00%
<= 128 MB = 1 1 157611 0.00%
<= 256 MB = 0 0 0 0.00%
<= 512 MB = 0 0 0 0.00%
<= 1 GB = 0 0 0 0.00%
<= 2 GB = 0 0 0 0.00%
<= 4 GB = 0 0 0 0.00%
<= 8 GB = 0 0 0 0.00%
<= 16 GB = 1 1 0 0.00%
capacity used (bytes): 7679537216535 (6.984 TB)
capacity allocated (bytes): 7914608582656 (7.198 TB)
block overhead (bytes): 235071366121 (3.061 %)
xfs_db>
3) XFS with a realtime device configured with 1044 KiB extents:
[root@m3 ~]# xfs_info /mnt/d0
meta-data=/dev/sdb1 isize=2048 agcount=4, agsize=1041728 blks
= sectsz=4096 attr=2, projid32bit=1
= crc=0 finobt=0, sparse=0, rmapbt=0
= reflink=0
data = bsize=4096 blocks=4166912, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0, ftype=1
log =internal log bsize=4096 blocks=2560, version=2
= sectsz=4096 sunit=1 blks, lazy-count=1
realtime =/dev/sdc1 extsz=1069056 blocks=1949338880, rtextents=7468731
[root@m3 ~]# echo "frag -s -e" | /tmp/xfs_db -r /dev/sdc1
xfs_db: /dev/sdc1 is not a valid XFS filesystem (unexpected SB magic number 0x68656164)
Use -F to force a read attempt.
[root@m3 ~]# echo "frag -s -e" | /tmp/xfs_db -r /dev/sdb1
xfs_db> actual 732480, ideal 360707, fragmentation factor 50.76%
Note, this number is largely meaningless.
Files on this filesystem average 2.03 extents per file
Maximum extents in a file 14
Histogram of number of extents per file:
bucket = count % of total
<= 1 = 350934 97.696 %
<= 2 = 6231 1.735 %
<= 4 = 1001 0.279 %
<= 8 = 953 0.265 %
<= 16 = 92 0.026 %
Maximum file size 26.508 MB
Histogram of file size:
bucket = allocated used overhead(bytes)
<= 4 KB = 0 62 314048512 0.13%
<= 8 KB = 0 119911 127209263104 53.28%
<= 16 KB = 0 14543 15350194176 6.43%
<= 32 KB = 909 12330 11851161600 4.96%
<= 64 KB = 92 6704 6828642304 2.86%
<= 128 KB = 1 7132 6933372928 2.90%
<= 256 KB = 0 10013 8753799168 3.67%
<= 512 KB = 0 13616 9049227264 3.79%
<= 1 MB = 1 15056 4774912000 2.00%
<= 2 MB = 198662 17168 9690226688 4.06%
<= 4 MB = 28639 21073 11806654464 4.94%
<= 8 MB = 35169 29878 14200553472 5.95%
<= 16 MB = 95667 91633 11939287040 5.00%
<= 32 MB = 71 62 28471742 0.01%
capacity used (bytes): 1097735533058 (1022.346 GB)
capacity allocated (bytes): 1336497410048 (1.216 TB)
block overhead (bytes): 238761885182 (21.750 %)
xfs_db>
Signed-off-by: Jorge Guerra <jorgeguerra@fb.com>
---
db/frag.c | 210 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
man/man8/xfs_db.8 | 8 ++-
2 files changed, 211 insertions(+), 7 deletions(-)
diff --git a/db/frag.c b/db/frag.c
index 91395234..5d569325 100644
--- a/db/frag.c
+++ b/db/frag.c
@@ -15,6 +15,31 @@
#include "init.h"
#include "malloc.h"
+#define PERCENT(x, y) (((double)(x) * 100)/(y))
+//#define ARRAY_SIZE(a) (sizeof((a))/sizeof((a)[0]))
+#define BLOCKS_2_BYTES(b) ((b) << 12)
+#define CLZ(n) (__builtin_clzl(n))
+#define CTZ(n) (__builtin_ctzl(n))
+
+#define N_BUCKETS 64
+
+typedef struct extentstats {
+ uint64_t allocsize[N_BUCKETS + 1];
+ uint64_t usedsize[N_BUCKETS + 1];
+ uint64_t wastedsize[N_BUCKETS + 1];
+ uint64_t maxfilesize;
+ uint64_t logicalused;
+ uint64_t physicalused;
+ uint64_t wastedspace;
+ bool realtime;
+} extentstats_t;
+
+typedef struct fileextstats {
+ uint64_t extsbuckets[N_BUCKETS + 1];
+ uint64_t maxexts;
+ uint64_t numfiles;
+} fileextstats_t;
+
typedef struct extent {
xfs_fileoff_t startoff;
xfs_filblks_t blockcount;
@@ -38,6 +63,10 @@ static int qflag;
static int Rflag;
static int rflag;
static int vflag;
+static int eflag;
+static extentstats_t extstats;
+static int sflag;
+static fileextstats_t festats;
typedef void (*scan_lbtree_f_t)(struct xfs_btree_block *block,
int level,
@@ -49,7 +78,7 @@ typedef void (*scan_sbtree_f_t)(struct xfs_btree_block *block,
xfs_agf_t *agf);
static extmap_t *extmap_alloc(xfs_extnum_t nex);
-static xfs_extnum_t extmap_ideal(extmap_t *extmap);
+static xfs_extnum_t extmap_ideal(extmap_t *extmap, uint64_t *fallocsize);
static void extmap_set_ext(extmap_t **extmapp, xfs_fileoff_t o,
xfs_extlen_t c);
static int frag_f(int argc, char **argv);
@@ -77,9 +106,46 @@ static void scanfunc_ino(struct xfs_btree_block *block, int level,
static const cmdinfo_t frag_cmd =
{ "frag", NULL, frag_f, 0, -1, 0,
- "[-a] [-d] [-f] [-l] [-q] [-R] [-r] [-v]",
+ "[-a] [-d] [-e] [-f] [-l] [-q] [-R] [-r] [-s] [-v]",
"get file fragmentation data", NULL };
+// IEC 2^10 standard prefixes
+static const char iec_prefixes[] =
+ { ' ', 'K', 'M', 'G', 'T', 'P', 'E', 'Z'};
+
+static double
+bytes_2_human(
+ uint64_t bytes,
+ int *iecprefix)
+{
+ double answer;
+ int i;
+
+ for (i = 0, answer = (double)bytes;
+ answer > 1024 && i < ARRAY_SIZE(iec_prefixes);
+ i++, answer /= 1024);
+ *iecprefix = i;
+
+ return answer;
+}
+
+static uint8_t
+get_bucket(
+ uint64_t val)
+{
+ uint8_t bucket;
+ uint8_t msbidx = 63 - CLZ(val);
+ uint8_t lsbidx = CTZ(val);
+
+ /*
+ * The bucket is computed as ceiling(s, 2^CLZ(s)), but this method is
+ * faster.
+ */
+ bucket = msbidx + (msbidx != lsbidx ? 1 : 0);
+
+ return MIN(bucket, N_BUCKETS);
+}
+
static extmap_t *
extmap_alloc(
xfs_extnum_t nex)
@@ -96,18 +162,23 @@ extmap_alloc(
static xfs_extnum_t
extmap_ideal(
- extmap_t *extmap)
+ extmap_t *extmap,
+ uint64_t *fallocsize)
{
extent_t *ep;
xfs_extnum_t rval;
+ uint64_t fsize = 0;
for (ep = &extmap->ents[0], rval = 0;
ep < &extmap->ents[extmap->nents];
ep++) {
+ fsize += BLOCKS_2_BYTES(ep->blockcount);
if (ep == &extmap->ents[0] ||
ep->startoff != ep[-1].startoff + ep[-1].blockcount)
rval++;
}
+ *fallocsize = fsize;
+
return rval;
}
@@ -133,6 +204,80 @@ extmap_set_ext(
}
void
+print_extents_histo(void)
+{
+ int i;
+ int nfiles = 0;
+
+ dbprintf(_("Maximum extents in a file %lu\n"), festats.maxexts);
+ dbprintf(_("Histogram of number of extents per file:\n"));
+ dbprintf(_(" %7s =\t%8s\t%s\n"), "bucket", "count", "\% of total");
+ for (i = 0;
+ i <= N_BUCKETS && nfiles < festats.numfiles; i++) {
+ nfiles += festats.extsbuckets[i];
+ if (nfiles == 0)
+ continue;
+ dbprintf(_("<= %7u = \t%8u\t%.3f \%\n"), 1 << i, festats.extsbuckets[i],
+ PERCENT(festats.extsbuckets[i], festats.numfiles));
+ }
+}
+
+void
+print_file_size_histo(void)
+{
+ double answer;
+ int i;
+ int nfiles = 0;
+ int ufiles = 0;
+
+ answer = bytes_2_human(extstats.maxfilesize, &i);
+ dbprintf(_("Maximum file size %.3f %cB\n"), answer, iec_prefixes[i]);
+ dbprintf(_("Histogram of file size:\n"));
+ if (extstats.realtime) {
+ dbprintf(_(" %7s =\t%8s\t%8s\t%12s\n"),
+ "bucket", "allocated", "used", "overhead(bytes)");
+ for (i = 10; i <= N_BUCKETS && nfiles < festats.numfiles; i++) {
+ nfiles += extstats.allocsize[i];
+ ufiles += extstats.usedsize[i];
+ if (ufiles == 0)
+ continue;
+ dbprintf(_("<= %4u %cB =\t%8lu\t%8lu\t%12lu %.2f\%\n"), 1 << (i % 10),
+ iec_prefixes[i/10],
+ extstats.allocsize[i], extstats.usedsize[i],
+ extstats.wastedsize[i],
+ PERCENT(extstats.wastedsize[i], extstats.wastedspace));
+ }
+ answer = bytes_2_human(extstats.logicalused, &i);
+ dbprintf(_("capacity used (bytes): %llu (%.3f %cB)\n"),
+ extstats.logicalused, answer, iec_prefixes[i]);
+ answer = bytes_2_human(extstats.physicalused, &i);
+ dbprintf(_("capacity allocated (bytes): %llu (%.3f %cB)\n"),
+ extstats.physicalused, answer, iec_prefixes[i]);
+ answer = PERCENT(extstats.wastedspace, extstats.logicalused);
+ } else {
+ dbprintf(_(" %7s =\t%8s\t%12s\n"),
+ "bucket", "used", "overhead(bytes)");
+ for (i = 10; i <= N_BUCKETS && nfiles < festats.numfiles; i++) {
+ nfiles += extstats.allocsize[i];
+ ufiles += extstats.usedsize[i];
+ if (ufiles == 0)
+ continue;
+ dbprintf(_("<= %4u %cB =\t%8lu\t%12lu %.2f\%\n"), 1 << (i % 10),
+ iec_prefixes[i/10],
+ extstats.allocsize[i],
+ extstats.wastedsize[i],
+ PERCENT(extstats.wastedsize[i], extstats.wastedspace));
+ }
+ answer = bytes_2_human(extstats.physicalused, &i);
+ dbprintf(_("capacity used (bytes): %llu (%.3f %cB)\n"),
+ extstats.physicalused, answer, iec_prefixes[i]);
+ answer = PERCENT(extstats.wastedspace, extstats.physicalused);
+ }
+ dbprintf(_("block overhead (bytes): %llu (%.3f \%)\n"),
+ extstats.wastedspace, answer);
+}
+
+void
frag_init(void)
{
add_command(&frag_cmd);
@@ -164,6 +309,12 @@ frag_f(
answer = (double)extcount_actual / (double)extcount_ideal;
dbprintf(_("Files on this filesystem average %.2f extents per file\n"),
answer);
+ if (eflag) {
+ print_extents_histo();
+ }
+ if (sflag) {
+ print_file_size_histo();
+ }
return 0;
}
@@ -174,9 +325,10 @@ init(
{
int c;
- aflag = dflag = fflag = lflag = qflag = Rflag = rflag = vflag = 0;
+ aflag = dflag = eflag = fflag = lflag = qflag = Rflag =
+ rflag = sflag = vflag = 0;
optind = 0;
- while ((c = getopt(argc, argv, "adflqRrv")) != EOF) {
+ while ((c = getopt(argc, argv, "adeflqRrsv")) != EOF) {
switch (c) {
case 'a':
aflag = 1;
@@ -184,6 +336,9 @@ init(
case 'd':
dflag = 1;
break;
+ case 'e':
+ eflag = 1;
+ break;
case 'f':
fflag = 1;
break;
@@ -199,6 +354,9 @@ init(
case 'r':
rflag = 1;
break;
+ case 's':
+ sflag = 1;
+ break;
case 'v':
vflag = 1;
break;
@@ -210,6 +368,8 @@ init(
if (!aflag && !dflag && !fflag && !lflag && !qflag && !Rflag && !rflag)
aflag = dflag = fflag = lflag = qflag = Rflag = rflag = 1;
extcount_actual = extcount_ideal = 0;
+ memset(&extstats, 0 , sizeof(extstats));
+ memset(&festats, 0 , sizeof(festats));
return 1;
}
@@ -274,6 +434,10 @@ process_fork(
{
extmap_t *extmap;
int nex;
+ int bucket;
+ uint64_t fallocsize;
+ uint64_t fusedsize;
+ uint64_t fwastedsize;
nex = XFS_DFORK_NEXTENTS(dip, whichfork);
if (!nex)
@@ -288,7 +452,41 @@ process_fork(
break;
}
extcount_actual += extmap->nents;
- extcount_ideal += extmap_ideal(extmap);
+ extcount_ideal += extmap_ideal(extmap, &fallocsize);
+
+ if (sflag) {
+ // Record file size stats
+ fusedsize = be64_to_cpu(dip->di_size);
+ bucket = get_bucket(fallocsize);
+ extstats.allocsize[bucket]++;
+ bucket = get_bucket(fusedsize);
+ extstats.usedsize[bucket]++;
+
+ if (fallocsize > fusedsize) {
+ fwastedsize = fallocsize - fusedsize;
+ extstats.wastedspace += fwastedsize;
+ extstats.wastedsize[bucket] += fwastedsize;
+ }
+ extstats.logicalused += fusedsize;
+ extstats.physicalused += fallocsize;
+ extstats.maxfilesize = MAX(extstats.maxfilesize, fallocsize);
+ if (be16_to_cpu(dip->di_flags) & XFS_DIFLAG_REALTIME) {
+ extstats.realtime = true;
+ }
+ }
+
+ if (eflag) {
+ // Record file extent stats
+ bucket = get_bucket(extmap->nents);
+ if (be16_to_cpu(dip->di_flags) & XFS_DIFLAG_REALTIME) {
+ // Realtime inodes have an additional extent
+ bucket = get_bucket(MAX(extmap->nents - 1, 1));
+ }
+ festats.extsbuckets[bucket]++;
+ festats.maxexts = MAX(festats.maxexts, extmap->nents);
+ }
+ festats.numfiles++;
+
xfree(extmap);
}
diff --git a/man/man8/xfs_db.8 b/man/man8/xfs_db.8
index a1ee3514..52d5f18a 100644
--- a/man/man8/xfs_db.8
+++ b/man/man8/xfs_db.8
@@ -489,7 +489,7 @@ command.
.B forward
Move forward to the next entry in the position ring.
.TP
-.B frag [\-adflqRrv]
+.B frag [\-adeflqRrsv]
Get file fragmentation data. This prints information about fragmentation
of file data in the filesystem (as opposed to fragmentation of freespace,
for which see the
@@ -510,6 +510,9 @@ enables processing of attribute data.
.B \-d
enables processing of directory data.
.TP
+.B \-e
+enables computing extent count per inode histogram.
+.TP
.B \-f
enables processing of regular file data.
.TP
@@ -524,6 +527,9 @@ enables processing of realtime control file data.
.TP
.B \-r
enables processing of realtime file data.
+.TP
+.B \-s
+enables computing file size histogram and file system overheads.
.RE
.TP
.BI "freesp [\-bcds] [\-A " alignment "] [\-a " ag "] ... [\-e " i "] [\-h " h1 "] ... [\-m " m ]
--
2.13.5
* Re: [PATCH] xfs_db: add extent count and file size histograms
2019-05-14 18:50 [PATCH] xfs_db: add extent count and file size histograms Jorge Guerra
@ 2019-05-14 19:52 ` Eric Sandeen
2019-05-14 20:02 ` Eric Sandeen
2019-05-14 23:31 ` Dave Chinner
2 siblings, 0 replies; 14+ messages in thread
From: Eric Sandeen @ 2019-05-14 19:52 UTC (permalink / raw)
To: Jorge Guerra, linux-xfs; +Cc: osandov, Jorge Guerra
On 5/14/19 1:50 PM, Jorge Guerra wrote:
> From: Jorge Guerra <jorgeguerra@fb.com>
>
> In this change we add two features to the xfs_db 'frag' command:
>
> 1) Extent count histogram [-e]: This option tracks the
> number of extents per inode (file) as we traverse the file
> system. The end result is a histogram of the number of extents per
> file in power-of-2 buckets.
>
> 2) File size histogram and file system internal fragmentation stats
> [-s]: This option tracks file sizes both in terms of what
> has been physically allocated and how much has been written to the
> file. In addition, we track the amount of internal fragmentation
> seen per file. This is particularly useful in the case of realtime
> devices, where space is allocated in units of fixed-size
> extents.
>
> The man page for xfs_db has been updated to reflect these new command
> line arguments.
>
> Tests:
>
> We tested this change on several XFS file systems with different
> configurations:
>
> 1) regular XFS:
>
> [root@m1 ~]# xfs_info /mnt/d0
> meta-data=/dev/sdb1 isize=256 agcount=10, agsize=268435455 blks
> = sectsz=4096 attr=2, projid32bit=1
> = crc=0 finobt=0, sparse=0, rmapbt=0
> = reflink=0
> data = bsize=4096 blocks=2441608704, imaxpct=100
> = sunit=0 swidth=0 blks
> naming =version 2 bsize=4096 ascii-ci=0, ftype=1
> log =internal log bsize=4096 blocks=521728, version=2
> = sectsz=4096 sunit=1 blks, lazy-count=1
> realtime =none extsz=4096 blocks=0, rtextents=0
> [root@m1 ~]# echo "frag -e -s" | xfs_db -r /dev/sdb1
> xfs_db> actual 494393, ideal 489246, fragmentation factor 1.04%
> Note, this number is largely meaningless.
> Files on this filesystem average 1.01 extents per file
> Maximum extents in a file 17
> Histogram of number of extents per file:
> bucket = count % of total
> <= 1 = 486157 99.573 %
> <= 2 = 768 0.157 %
> <= 4 = 371 0.076 %
> <= 8 = 947 0.194 %
> <= 16 = 0 0.000 %
> <= 32 = 1 0.000 %
> Maximum file size 64.512 MB
One thing to note here is that by default, frag is collecting stats on everything -
files, dirs, symlinks, and even attributes. That may not be obvious, and it
may do interesting things to your stats. You can always pick "only file data"
with the -f argument.
Mostly cosmetic nitpicks below, though one technical point is that it's wrong
to assume 4k blocks as you seem to have done.
> Histogram of file size:
> bucket = used overhead(bytes)
> <= 4 KB = 180515 0 0.00%
> <= 8 KB = 23604 4666970112 44.31%
> <= 16 KB = 2712 1961668608 18.62%
> <= 32 KB = 1695 612319232 5.81%
> <= 64 KB = 290 473210880 4.49%
> <= 128 KB = 214 270184448 2.56%
> <= 256 KB = 186 269856768 2.56%
> <= 512 KB = 201 67203072 0.64%
> <= 1 MB = 325 267558912 2.54%
> <= 2 MB = 419 596860928 5.67%
> <= 4 MB = 436 454148096 4.31%
> <= 8 MB = 1864 184532992 1.75%
> <= 16 MB = 16084 111964160 1.06%
> <= 32 MB = 258910 395116544 3.75%
> <= 64 MB = 61 202104832 1.92%
> <= 128 MB = 728 0 0.00%
> capacity used (bytes): 7210847514624 (6.558 TB)
> block overhead (bytes): 10533699584 (0.146 %)
> xfs_db>
>
> 2) XFS with a realtime device configured with 256 KiB extents:
>
> [root@m2 ~]# xfs_info /mnt/d0
> meta-data=/dev/nvme0n1p1 isize=2048 agcount=15, agsize=434112 blks
> = sectsz=4096 attr=2, projid32bit=1
> = crc=0 finobt=0, sparse=0, rmapbt=0
> = reflink=0
> data = bsize=4096 blocks=6104576, imaxpct=100
> = sunit=0 swidth=0 blks
> naming =version 2 bsize=4096 ascii-ci=0, ftype=1
> log =internal log bsize=4096 blocks=2560, version=2
> = sectsz=4096 sunit=1 blks, lazy-count=1
> realtime =/dev/sdaa1 extsz=262144 blocks=2439872256, rtextents=38123004
>
> [root@m2 ~]# echo "frag -s -e" | xfs_db -r /dev/nvme0n1p1
> xfs_db> actual 11851552, ideal 1264416, fragmentation factor 89.33%
> Note, this number is largely meaningless.
> Files on this filesystem average 9.37 extents per file
> Maximum extents in a file 129956
> Histogram of number of extents per file:
> bucket = count % of total
> <= 1 = 331951 26.295 %
> <= 2 = 82720 6.553 %
> <= 4 = 160041 12.677 %
> <= 8 = 205312 16.263 %
> <= 16 = 267145 21.161 %
> <= 32 = 197625 15.655 %
> <= 64 = 17610 1.395 %
> <= 128 = 8 0.001 %
> <= 256 = 1 0.000 %
> <= 512 = 0 0.000 %
> <= 1024 = 0 0.000 %
> <= 2048 = 0 0.000 %
> <= 4096 = 0 0.000 %
> <= 8192 = 0 0.000 %
> <= 16384 = 0 0.000 %
> <= 32768 = 0 0.000 %
> <= 65536 = 0 0.000 %
> <= 131072 = 1 0.000 %
> Maximum file size 15.522 GB
> Histogram of file size:
> bucket = allocated used overhead(bytes)
> <= 4 KB = 0 2054 8924143616 3.80%
> <= 8 KB = 0 57684 14648967168 6.23%
> <= 16 KB = 0 24280 6032441344 2.57%
> <= 32 KB = 0 18351 4340473856 1.85%
> <= 64 KB = 0 20064 4280770560 1.82%
> <= 128 KB = 1002 25287 4138127360 1.76%
> <= 256 KB = 163110 17548 1264742400 0.54%
> <= 512 KB = 19898 19863 2843152384 1.21%
> <= 1 MB = 32687 32617 4361404416 1.86%
> <= 2 MB = 38395 38324 5388206080 2.29%
> <= 4 MB = 82700 82633 10549821440 4.49%
> <= 8 MB = 208576 208477 34238386176 14.57%
> <= 16 MB = 715937 715092 134046113792 57.02%
> <= 32 MB = 107 107 6332416 0.00%
> <= 64 MB = 0 0 0 0.00%
> <= 128 MB = 1 1 157611 0.00%
> <= 256 MB = 0 0 0 0.00%
> <= 512 MB = 0 0 0 0.00%
> <= 1 GB = 0 0 0 0.00%
> <= 2 GB = 0 0 0 0.00%
> <= 4 GB = 0 0 0 0.00%
> <= 8 GB = 0 0 0 0.00%
> <= 16 GB = 1 1 0 0.00%
> capacity used (bytes): 7679537216535 (6.984 TB)
> capacity allocated (bytes): 7914608582656 (7.198 TB)
> block overhead (bytes): 235071366121 (3.061 %)
> xfs_db>
>
> 3) XFS with a realtime device configured with 1044 KiB extents:
>
> [root@m3 ~]# xfs_info /mnt/d0
> meta-data=/dev/sdb1 isize=2048 agcount=4, agsize=1041728 blks
> = sectsz=4096 attr=2, projid32bit=1
> = crc=0 finobt=0, sparse=0, rmapbt=0
> = reflink=0
> data = bsize=4096 blocks=4166912, imaxpct=25
> = sunit=0 swidth=0 blks
> naming =version 2 bsize=4096 ascii-ci=0, ftype=1
> log =internal log bsize=4096 blocks=2560, version=2
> = sectsz=4096 sunit=1 blks, lazy-count=1
> realtime =/dev/sdc1 extsz=1069056 blocks=1949338880, rtextents=7468731
> [root@m3 ~]# echo "frag -s -e" | /tmp/xfs_db -r /dev/sdc1
> xfs_db: /dev/sdc1 is not a valid XFS filesystem (unexpected SB magic number 0x68656164)
> Use -F to force a read attempt.
> [root@m3 ~]# echo "frag -s -e" | /tmp/xfs_db -r /dev/sdb1
> xfs_db> actual 732480, ideal 360707, fragmentation factor 50.76%
> Note, this number is largely meaningless.
> Files on this filesystem average 2.03 extents per file
> Maximum extents in a file 14
> Histogram of number of extents per file:
> bucket = count % of total
> <= 1 = 350934 97.696 %
> <= 2 = 6231 1.735 %
> <= 4 = 1001 0.279 %
> <= 8 = 953 0.265 %
> <= 16 = 92 0.026 %
> Maximum file size 26.508 MB
> Histogram of file size:
> bucket = allocated used overhead(bytes)
> <= 4 KB = 0 62 314048512 0.13%
> <= 8 KB = 0 119911 127209263104 53.28%
> <= 16 KB = 0 14543 15350194176 6.43%
> <= 32 KB = 909 12330 11851161600 4.96%
> <= 64 KB = 92 6704 6828642304 2.86%
> <= 128 KB = 1 7132 6933372928 2.90%
> <= 256 KB = 0 10013 8753799168 3.67%
> <= 512 KB = 0 13616 9049227264 3.79%
> <= 1 MB = 1 15056 4774912000 2.00%
> <= 2 MB = 198662 17168 9690226688 4.06%
> <= 4 MB = 28639 21073 11806654464 4.94%
> <= 8 MB = 35169 29878 14200553472 5.95%
> <= 16 MB = 95667 91633 11939287040 5.00%
> <= 32 MB = 71 62 28471742 0.01%
> capacity used (bytes): 1097735533058 (1022.346 GB)
> capacity allocated (bytes): 1336497410048 (1.216 TB)
> block overhead (bytes): 238761885182 (21.750 %)
ok if you do this you have to handle the RFE for file tail packing too ;)
> xfs_db>
>
> Signed-off-by: Jorge Guerra <jorgeguerra@fb.com>
> ---
> db/frag.c | 210 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
> man/man8/xfs_db.8 | 8 ++-
> 2 files changed, 211 insertions(+), 7 deletions(-)
>
> diff --git a/db/frag.c b/db/frag.c
> index 91395234..5d569325 100644
> --- a/db/frag.c
> +++ b/db/frag.c
> @@ -15,6 +15,31 @@
> #include "init.h"
> #include "malloc.h"
>
> +#define PERCENT(x, y) (((double)(x) * 100)/(y))
> +//#define ARRAY_SIZE(a) (sizeof((a))/sizeof((a)[0]))
no need to add commented-out new #defines
> +#define BLOCKS_2_BYTES(b) ((b) << 12)
only for 4k blocks, right?
I think you want to use
XFS_FSB_TO_B(mp,fsbno) though getting mp might be fun
> +#define CLZ(n) (__builtin_clzl(n))
> +#define CTZ(n) (__builtin_ctzl(n))
> +
> +#define N_BUCKETS 64
> +
> +typedef struct extentstats {
> + uint64_t allocsize[N_BUCKETS + 1];
> + uint64_t usedsize[N_BUCKETS + 1];
> + uint64_t wastedsize[N_BUCKETS + 1];
> + uint64_t maxfilesize;
> + uint64_t logicalused;
> + uint64_t physicalused;
> + uint64_t wastedspace;
> + bool realtime;
> +} extentstats_t;
> +
> +typedef struct fileextstats {
> + uint64_t extsbuckets[N_BUCKETS + 1];
> + uint64_t maxexts;
> + uint64_t numfiles;
> +} fileextstats_t;
> +
> typedef struct extent {
> xfs_fileoff_t startoff;
> xfs_filblks_t blockcount;
> @@ -38,6 +63,10 @@ static int qflag;
> static int Rflag;
> static int rflag;
> static int vflag;
up until here it seems like declarations were ~alphabetical; I'd keep
it that way, or group all the flags together, vs. the randomness
you've introduced below. </nitpick>
> +static int eflag;
> +static extentstats_t extstats;
> +static int sflag;
> +static fileextstats_t festats;
We've been trying to avoid typedefs where we don't need them,
I /think/ just "struct extentstats extstats;" would
be preferred here. (yes there are typedefs in the code but we've
been trying to move the other way)
> typedef void (*scan_lbtree_f_t)(struct xfs_btree_block *block,
> int level,
> @@ -49,7 +78,7 @@ typedef void (*scan_sbtree_f_t)(struct xfs_btree_block *block,
> xfs_agf_t *agf);
>
> static extmap_t *extmap_alloc(xfs_extnum_t nex);
> -static xfs_extnum_t extmap_ideal(extmap_t *extmap);
> +static xfs_extnum_t extmap_ideal(extmap_t *extmap, uint64_t *fallocsize);
> static void extmap_set_ext(extmap_t **extmapp, xfs_fileoff_t o,
> xfs_extlen_t c);
> static int frag_f(int argc, char **argv);
> @@ -77,9 +106,46 @@ static void scanfunc_ino(struct xfs_btree_block *block, int level,
>
> static const cmdinfo_t frag_cmd =
> { "frag", NULL, frag_f, 0, -1, 0,
> - "[-a] [-d] [-f] [-l] [-q] [-R] [-r] [-v]",
> + "[-a] [-d] [-e] [-f] [-l] [-q] [-R] [-r] [-s] [-v]",
Heh, it might be time for
> + "[-adeflgRrsv]",
:)
> "get file fragmentation data", NULL };
>
> +// IEC 2^10 standard prefixes
/* C comments please */
> +static const char iec_prefixes[] =
> + { ' ', 'K', 'M', 'G', 'T', 'P', 'E', 'Z'};
> +
> +static double
> +bytes_2_human(
> + uint64_t bytes,
> + int *iecprefix)
> +{
> + double answer;
> + int i;
> +
> + for (i = 0, answer = (double)bytes;
> + answer > 1024 && i < ARRAY_SIZE(iec_prefixes);
> + i++, answer /= 1024);
> + *iecprefix = i;
> +
> + return answer;
> +}
> +
> +static uint8_t
> +get_bucket(
> + uint64_t val)
> +{
> + uint8_t bucket;
> + uint8_t msbidx = 63 - CLZ(val);
> + uint8_t lsbidx = CTZ(val);
> +
> + /*
> + * The bucket is computed as ceiling(s, 2^CLZ(s)), but this method is
> + * faster.
> + */
> + bucket = msbidx + (msbidx != lsbidx ? 1 : 0);
> +
> + return MIN(bucket, N_BUCKETS);
> +}
> +
> static extmap_t *
> extmap_alloc(
> xfs_extnum_t nex)
> @@ -96,18 +162,23 @@ extmap_alloc(
>
> static xfs_extnum_t
> extmap_ideal(
> - extmap_t *extmap)
> + extmap_t *extmap,
> + uint64_t *fallocsize)
"fallocsize" is a little bit of an odd choice given the existence
of "falloc" and "fallocate" - which are unrelated here.
maybe f_allocsize / f_usedsize / f_wastedsize? Not sure.
> {
> extent_t *ep;
> xfs_extnum_t rval;
> + uint64_t fsize = 0;
>
> for (ep = &extmap->ents[0], rval = 0;
> ep < &extmap->ents[extmap->nents];
> ep++) {
> + fsize += BLOCKS_2_BYTES(ep->blockcount);
XFS_FSB_TO_B(mp, ep->blockcount) except of course you don't have mp...
Could also use a file stat to get allocated blocks all at once, but
it's otherwise convenient here I suppose...
> if (ep == &extmap->ents[0] ||
> ep->startoff != ep[-1].startoff + ep[-1].blockcount)
> rval++;
> }
> + *fallocsize = fsize;
> +
> return rval;
> }
>
> @@ -133,6 +204,80 @@ extmap_set_ext(
> }
>
> void
> +print_extents_histo(void)
> +{
> + int i;
> + int nfiles = 0;
> +
> + dbprintf(_("Maximum extents in a file %lu\n"), festats.maxexts);
> + dbprintf(_("Histogram of number of extents per file:\n"));
> + dbprintf(_(" %7s =\t%8s\t%s\n"), "bucket", "count", "\% of total");
> + for (i = 0;
> + i <= N_BUCKETS && nfiles < festats.numfiles; i++) {
> + nfiles += festats.extsbuckets[i];
> + if (nfiles == 0)
> + continue;
> + dbprintf(_("<= %7u = \t%8u\t%.3f \%\n"), 1 << i, festats.extsbuckets[i],
<= 80 cols please
> + PERCENT(festats.extsbuckets[i], festats.numfiles));
> + }
> +}
> +
> +void
> +print_file_size_histo(void)
> +{
> + double answer;
> + int i;
> + int nfiles = 0;
> + int ufiles = 0;
> +
> + answer = bytes_2_human(extstats.maxfilesize, &i);
> + dbprintf(_("Maximum file size %.3f %cB\n"), answer, iec_prefixes[i]);
> + dbprintf(_("Histogram of file size:\n"));
> + if (extstats.realtime) {
> + dbprintf(_(" %7s =\t%8s\t%8s\t%12s\n"),
> + "bucket", "allocated", "used", "overhead(bytes)");
> + for (i = 10; i <= N_BUCKETS && nfiles < festats.numfiles; i++) {
> + nfiles += extstats.allocsize[i];
> + ufiles += extstats.usedsize[i];
> + if (ufiles == 0)
> + continue;
> + dbprintf(_("<= %4u %cB =\t%8lu\t%8lu\t%12lu %.2f\%\n"), 1 << (i % 10),
Please do your best to keep lines <= 80 cols
> + iec_prefixes[i/10],
> + extstats.allocsize[i], extstats.usedsize[i],
> + extstats.wastedsize[i],
> + PERCENT(extstats.wastedsize[i], extstats.wastedspace));
> + }
> + answer = bytes_2_human(extstats.logicalused, &i);
> + dbprintf(_("capacity used (bytes): %llu (%.3f %cB)\n"),
> + extstats.logicalused, answer, iec_prefixes[i]);
> + answer = bytes_2_human(extstats.physicalused, &i);
> + dbprintf(_("capacity allocated (bytes): %llu (%.3f %cB)\n"),
> + extstats.physicalused, answer, iec_prefixes[i]);
> + answer = PERCENT(extstats.wastedspace, extstats.logicalused);
> + } else {
> + dbprintf(_(" %7s =\t%8s\t%12s\n"),
> + "bucket", "used", "overhead(bytes)");
+ dbprintf(_(" %7s =\t%8s\t%12s\n"),
+ "bucket", "used", "overhead(bytes)");
(keep the continued printf lines indented enough to make it obvious)
> + for (i = 10; i <= N_BUCKETS && nfiles < festats.numfiles; i++) {
> + nfiles += extstats.allocsize[i];
> + ufiles += extstats.usedsize[i];
> + if (ufiles == 0)
> + continue;
> + dbprintf(_("<= %4u %cB =\t%8lu\t%12lu %.2f\%\n"), 1 << (i % 10),
> + iec_prefixes[i/10],
> + extstats.allocsize[i],
> + extstats.wastedsize[i],
> + PERCENT(extstats.wastedsize[i], extstats.wastedspace));
> + }
> + answer = bytes_2_human(extstats.physicalused, &i);
> + dbprintf(_("capacity used (bytes): %llu (%.3f %cB)\n"),
> + extstats.physicalused, answer, iec_prefixes[i]);
> + answer = PERCENT(extstats.wastedspace, extstats.physicalused);
> + }
> + dbprintf(_("block overhead (bytes): %llu (%.3f \%)\n"),
> + extstats.wastedspace, answer);
> +}
> +
> +void
> frag_init(void)
> {
> add_command(&frag_cmd);
> @@ -164,6 +309,12 @@ frag_f(
> answer = (double)extcount_actual / (double)extcount_ideal;
> dbprintf(_("Files on this filesystem average %.2f extents per file\n"),
> answer);
> + if (eflag) {
> + print_extents_histo();
> + }
> + if (sflag) {
> + print_file_size_histo();
> + }
+ if (eflag)
+ print_extents_histo();
+ if (sflag)
+ print_file_size_histo();
is fine, we don't generally curly-brace single lines (when in Rome...)
> return 0;
> }
>
> @@ -174,9 +325,10 @@ init(
> {
> int c;
>
> - aflag = dflag = fflag = lflag = qflag = Rflag = rflag = vflag = 0;
> + aflag = dflag = eflag = fflag = lflag = qflag = Rflag =
> + rflag = sflag = vflag = 0;
I'd prefer to not split the line:
+ aflag = dflag = eflag = fflag = lflag = qflag = Rflag = 0;
+ rflag = sflag = vflag = 0;
> optind = 0;
> - while ((c = getopt(argc, argv, "adflqRrv")) != EOF) {
> + while ((c = getopt(argc, argv, "adeflqRrsv")) != EOF) {
> switch (c) {
> case 'a':
> aflag = 1;
> @@ -184,6 +336,9 @@ init(
> case 'd':
> dflag = 1;
> break;
> + case 'e':
> + eflag = 1;
> + break;
> case 'f':
> fflag = 1;
> break;
> @@ -199,6 +354,9 @@ init(
> case 'r':
> rflag = 1;
> break;
> + case 's':
> + sflag = 1;
> + break;
> case 'v':
> vflag = 1;
> break;
> @@ -210,6 +368,8 @@ init(
> if (!aflag && !dflag && !fflag && !lflag && !qflag && !Rflag && !rflag)
> aflag = dflag = fflag = lflag = qflag = Rflag = rflag = 1;
> extcount_actual = extcount_ideal = 0;
> + memset(&extstats, 0 , sizeof(extstats));
> + memset(&festats, 0 , sizeof(festats));
No space before , :
+ memset(&extstats, 0, sizeof(extstats));
+ memset(&festats, 0, sizeof(festats));
> return 1;
> }
>
> @@ -274,6 +434,10 @@ process_fork(
> {
> extmap_t *extmap;
> int nex;
> + int bucket;
> + uint64_t fallocsize;
> + uint64_t fusedsize;
> + uint64_t fwastedsize;
+ int bucket;
+ uint64_t fallocsize;
+ uint64_t fusedsize;
+ uint64_t fwastedsize;
>
> nex = XFS_DFORK_NEXTENTS(dip, whichfork);
> if (!nex)
> @@ -288,7 +452,41 @@ process_fork(
> break;
> }
> extcount_actual += extmap->nents;
> - extcount_ideal += extmap_ideal(extmap);
> + extcount_ideal += extmap_ideal(extmap, &fallocsize);
> +
> + if (sflag) {
> + // Record file size stats
/* C comments please */
> + fusedsize = be64_to_cpu(dip->di_size);
> + bucket = get_bucket(fallocsize);
> + extstats.allocsize[bucket]++;
> + bucket = get_bucket(fusedsize);
> + extstats.usedsize[bucket]++;
> +
> + if (fallocsize > fusedsize) {
> + fwastedsize = fallocsize - fusedsize;
> + extstats.wastedspace += fwastedsize;
> + extstats.wastedsize[bucket] += fwastedsize;
> + }
> + extstats.logicalused += fusedsize;
> + extstats.physicalused += fallocsize;
> + extstats.maxfilesize = MAX(extstats.maxfilesize, fallocsize);
> + if (be16_to_cpu(dip->di_flags) & XFS_DIFLAG_REALTIME) {
> + extstats.realtime = true;
> + }
> + }
> +
> + if (eflag) {
> + // Record file extent stats
/* C comments ... */
> + bucket = get_bucket(extmap->nents);
> + if (be16_to_cpu(dip->di_flags) & XFS_DIFLAG_REALTIME) {
> + // Realtime inodes have an additional extent
> + bucket = get_bucket(MAX(extmap->nents - 1, 1));
> + }
> + festats.extsbuckets[bucket]++;
> + festats.maxexts = MAX(festats.maxexts, extmap->nents);
> + }
> + festats.numfiles++;
> +
> xfree(extmap);
> }
>
> diff --git a/man/man8/xfs_db.8 b/man/man8/xfs_db.8
> index a1ee3514..52d5f18a 100644
> --- a/man/man8/xfs_db.8
> +++ b/man/man8/xfs_db.8
> @@ -489,7 +489,7 @@ command.
> .B forward
> Move forward to the next entry in the position ring.
> .TP
> -.B frag [\-adflqRrv]
> +.B frag [\-adeflqRrsv]
> Get file fragmentation data. This prints information about fragmentation
> of file data in the filesystem (as opposed to fragmentation of freespace,
> for which see the
> @@ -510,6 +510,9 @@ enables processing of attribute data.
> .B \-d
> enables processing of directory data.
> .TP
> +.B \-e
> +enables computing extent count per inode histogram.
> +.TP
> .B \-f
> enables processing of regular file data.
> .TP
> @@ -524,6 +527,9 @@ enables processing of realtime control file data.
> .TP
> .B \-r
> enables processing of realtime file data.
> +.TP
> +.B \-s
> +enables computing file size histogram and file system overheads.
> .RE
> .TP
> .BI "freesp [\-bcds] [\-A " alignment "] [\-a " ag "] ... [\-e " i "] [\-h " h1 "] ... [\-m " m ]
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] xfs_db: add extent count and file size histograms
2019-05-14 18:50 [PATCH] xfs_db: add extent count and file size histograms Jorge Guerra
2019-05-14 19:52 ` Eric Sandeen
@ 2019-05-14 20:02 ` Eric Sandeen
2019-05-15 15:57 ` Jorge Guerra
2019-05-14 23:31 ` Dave Chinner
2 siblings, 1 reply; 14+ messages in thread
From: Eric Sandeen @ 2019-05-14 20:02 UTC (permalink / raw)
To: Jorge Guerra, linux-xfs; +Cc: osandov, Jorge Guerra
On 5/14/19 1:50 PM, Jorge Guerra wrote:
> + dbprintf(_("capacity used (bytes): %llu (%.3f %cB)\n"),
> + extstats.logicalused, answer, iec_prefixes[i]);
I think I missed this instance of "indent please" and probably others...
(I'm kind of wondering about carrying 'used' in bytes, but I suppose we're
ok until we really get zettabyte filesystems in the wild) ;)
-Eric
* Re: [PATCH] xfs_db: add extent count and file size histograms
2019-05-14 18:50 [PATCH] xfs_db: add extent count and file size histograms Jorge Guerra
2019-05-14 19:52 ` Eric Sandeen
2019-05-14 20:02 ` Eric Sandeen
@ 2019-05-14 23:31 ` Dave Chinner
2019-05-15 0:06 ` Eric Sandeen
2019-05-15 16:15 ` Jorge Guerra
2 siblings, 2 replies; 14+ messages in thread
From: Dave Chinner @ 2019-05-14 23:31 UTC (permalink / raw)
To: Jorge Guerra; +Cc: linux-xfs, osandov, Jorge Guerra
On Tue, May 14, 2019 at 11:50:26AM -0700, Jorge Guerra wrote:
> From: Jorge Guerra <jorgeguerra@fb.com>
>
> In this change we add two feature to the xfs_db 'frag' command:
>
> 1) Extent count histogram [-e]: This option enables tracking the
> number of extents per inode (file) as the we traverse the file
> system. The end result is a histogram of the number of extents per
> file in power of 2 buckets.
>
> 2) File size histogram and file system internal fragmentation stats
> [-s]: This option enables tracking file sizes both in terms of what
> has been physically allocated and how much has been written to the
> file. In addition, we track the amount of internal fragmentation
> seen per file. This is particularly useful in the case of real
> time devices where space is allocated in units of fixed sized
> extents.
I can see the usefulness of having such information, but xfs_db is
the wrong tool/interface for generating such usage reports.
> The man page for xfs_db has been updated to reflect these new command
> line arguments.
>
> Tests:
>
> We tested this change on several XFS file systems with different
> configurations:
>
> 1) regular XFS:
>
> [root@m1 ~]# xfs_info /mnt/d0
> meta-data=/dev/sdb1 isize=256 agcount=10, agsize=268435455 blks
> = sectsz=4096 attr=2, projid32bit=1
> = crc=0 finobt=0, sparse=0, rmapbt=0
> = reflink=0
> data = bsize=4096 blocks=2441608704, imaxpct=100
> = sunit=0 swidth=0 blks
> naming =version 2 bsize=4096 ascii-ci=0, ftype=1
> log =internal log bsize=4096 blocks=521728, version=2
> = sectsz=4096 sunit=1 blks, lazy-count=1
> realtime =none extsz=4096 blocks=0, rtextents=0
> [root@m1 ~]# echo "frag -e -s" | xfs_db -r /dev/sdb1
> xfs_db> actual 494393, ideal 489246, fragmentation factor 1.04%
For example, xfs_db is not the right tool for probing online, active
filesystems. It is not coherent with the active kernel filesystem,
and is quite capable of walking off into la-la land as a result of
mis-parsing the inconsistent filesystem that is on disk underneath
active mounted filesystems. This does not make for a robust, usable
tool, let alone one that can make use of things like rmap for
querying usage and ownership information really quickly.
To solve this problem, we now have the xfs_spaceman tool and the
GETFSMAP ioctl for running usage queries on mounted filesystems.
That avoids all the coherency and crash problems, and for rmap
enabled filesystems it does not require scanning the entire
filesystem to work out this information (i.e. it can all be derived
from the contents of the rmap tree).
So I'd much prefer that new online filesystem queries go into
xfs-spaceman and use GETFSMAP so they can be accelerated on rmap
configured filesystems rather than hoping xfs_db will parse the
entire mounted filesystem correctly while it is being actively
changed...
> Maximum extents in a file 14
> Histogram of number of extents per file:
> bucket = count % of total
> <= 1 = 350934 97.696 %
> <= 2 = 6231 1.735 %
> <= 4 = 1001 0.279 %
> <= 8 = 953 0.265 %
> <= 16 = 92 0.026 %
> Maximum file size 26.508 MB
> Histogram of file size:
> bucket = allocated used overhead(bytes)
> <= 4 KB = 0 62 314048512 0.13%
> <= 8 KB = 0 119911 127209263104 53.28%
> <= 16 KB = 0 14543 15350194176 6.43%
> <= 32 KB = 909 12330 11851161600 4.96%
> <= 64 KB = 92 6704 6828642304 2.86%
> <= 128 KB = 1 7132 6933372928 2.90%
> <= 256 KB = 0 10013 8753799168 3.67%
> <= 512 KB = 0 13616 9049227264 3.79%
> <= 1 MB = 1 15056 4774912000 2.00%
> <= 2 MB = 198662 17168 9690226688 4.06%
> <= 4 MB = 28639 21073 11806654464 4.94%
> <= 8 MB = 35169 29878 14200553472 5.95%
> <= 16 MB = 95667 91633 11939287040 5.00%
> <= 32 MB = 71 62 28471742 0.01%
> capacity used (bytes): 1097735533058 (1022.346 GB)
> capacity allocated (bytes): 1336497410048 (1.216 TB)
> block overhead (bytes): 238761885182 (21.750 %)
BTW, "bytes" as a display unit is stupidly verbose and largely
unnecessary. The byte count is /always/ going to be a multiple of
the filesystem block size, and the first thing anyone who wants to
use this for diagnosis is going to have to do is convert the byte
count back to filesystem blocks (which is what the filesystem itself
tracks everything in). And then when you have PB scale filesystems,
anything more than 3 significant digits is just impossible to read
and compare - that "overhead" column (what does "overhead" even
mean?) is largely impossible to read and determine what the actual
capacity used is without counting individual digits in each number.
FWIW, we already have extent histogram code in xfs_spaceman
(in spaceman/freesp.c) and in xfs_db (db/freesp.c) so we really
don't need re-implementation of the same functionality we already
have duplicate copies of. I'd suggest that the histogram code should
be factored and moved to libfrog/ and then enhanced if new histogram
functionality is required...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH] xfs_db: add extent count and file size histograms
2019-05-14 23:31 ` Dave Chinner
@ 2019-05-15 0:06 ` Eric Sandeen
2019-05-15 2:05 ` Dave Chinner
2019-05-15 16:15 ` Jorge Guerra
1 sibling, 1 reply; 14+ messages in thread
From: Eric Sandeen @ 2019-05-15 0:06 UTC (permalink / raw)
To: Dave Chinner, Jorge Guerra; +Cc: linux-xfs, osandov, Jorge Guerra
On 5/14/19 6:31 PM, Dave Chinner wrote:
> On Tue, May 14, 2019 at 11:50:26AM -0700, Jorge Guerra wrote:
>> From: Jorge Guerra <jorgeguerra@fb.com>
>>
>> In this change we add two feature to the xfs_db 'frag' command:
>>
>> 1) Extent count histogram [-e]: This option enables tracking the
>> number of extents per inode (file) as the we traverse the file
>> system. The end result is a histogram of the number of extents per
>> file in power of 2 buckets.
>>
>> 2) File size histogram and file system internal fragmentation stats
>> [-s]: This option enables tracking file sizes both in terms of what
>> has been physically allocated and how much has been written to the
>> file. In addition, we track the amount of internal fragmentation
>> seen per file. This is particularly useful in the case of real
>> time devices where space is allocated in units of fixed sized
>> extents.
>
> I can see the usefulness of having such information, but xfs_db is
> the wrong tool/interface for generating such usage reports.
>
>> The man page for xfs_db has been updated to reflect these new command
>> line arguments.
>>
>> Tests:
>>
>> We tested this change on several XFS file systems with different
>> configurations:
>>
>> 1) regular XFS:
>>
>> [root@m1 ~]# xfs_info /mnt/d0
>> meta-data=/dev/sdb1 isize=256 agcount=10, agsize=268435455 blks
>> = sectsz=4096 attr=2, projid32bit=1
>> = crc=0 finobt=0, sparse=0, rmapbt=0
>> = reflink=0
>> data = bsize=4096 blocks=2441608704, imaxpct=100
>> = sunit=0 swidth=0 blks
>> naming =version 2 bsize=4096 ascii-ci=0, ftype=1
>> log =internal log bsize=4096 blocks=521728, version=2
>> = sectsz=4096 sunit=1 blks, lazy-count=1
>> realtime =none extsz=4096 blocks=0, rtextents=0
>> [root@m1 ~]# echo "frag -e -s" | xfs_db -r /dev/sdb1
>> xfs_db> actual 494393, ideal 489246, fragmentation factor 1.04%
>
> For example, xfs_db is not the right tool for probing online, active
> filesystems.
yes, the usage example is poor. (I almost wonder if we should disallow
certain operations with -r ...)
> It is not coherent with the active kernel filesystem,
> and is quite capable of walking off into la-la land as a result of
> mis-parsing the inconsistent filesystem that is on disk underneath
> active mounted filesystems. This does not make for a robust, usable
> tool, let alone one that can make use of things like rmap for
> querying usage and ownership information really quickly.
>
> To solve this problem, we now have the xfs_spaceman tool and the
> GETFSMAP ioctl for running usage queries on mounted filesystems.
> That avoids all the coherency and crash problems, and for rmap
> enabled filesystems it does not require scanning the entire
> filesystem to work out this information (i.e. it can all be derived
> from the contents of the rmap tree).
>
> So I'd much prefer that new online filesystem queries go into
> xfs-spaceman and use GETFSMAP so they can be accelerated on rmap
> configured filesystems rather than hoping xfs_db will parse the
> entire mounted filesystem correctly while it is being actively
> changed...
Yeah fair point.
>> Maximum extents in a file 14
>> Histogram of number of extents per file:
>> bucket = count % of total
>> <= 1 = 350934 97.696 %
>> <= 2 = 6231 1.735 %
>> <= 4 = 1001 0.279 %
>> <= 8 = 953 0.265 %
>> <= 16 = 92 0.026 %
>> Maximum file size 26.508 MB
>> Histogram of file size:
>> bucket = allocated used overhead(bytes)
>> <= 4 KB = 0 62 314048512 0.13%
>> <= 8 KB = 0 119911 127209263104 53.28%
>> <= 16 KB = 0 14543 15350194176 6.43%
>> <= 32 KB = 909 12330 11851161600 4.96%
>> <= 64 KB = 92 6704 6828642304 2.86%
>> <= 128 KB = 1 7132 6933372928 2.90%
>> <= 256 KB = 0 10013 8753799168 3.67%
>> <= 512 KB = 0 13616 9049227264 3.79%
>> <= 1 MB = 1 15056 4774912000 2.00%
>> <= 2 MB = 198662 17168 9690226688 4.06%
>> <= 4 MB = 28639 21073 11806654464 4.94%
>> <= 8 MB = 35169 29878 14200553472 5.95%
>> <= 16 MB = 95667 91633 11939287040 5.00%
>> <= 32 MB = 71 62 28471742 0.01%
>> capacity used (bytes): 1097735533058 (1022.346 GB)
>> capacity allocated (bytes): 1336497410048 (1.216 TB)
>> block overhead (bytes): 238761885182 (21.750 %)
>
> BTW, "bytes" as a display unit is stupidly verbose and largely
> unnecessary. The byte count is /always/ going to be a multiple of
> the filesystem block size, and the first thing anyone who wants to
> use this for diagnosis is going to have to do is return the byte
> count to filesystem blocks (which is what the filesystem itself
> tracks everything in. ANd then when you have PB scale filesystems,
> anything more than 3 significant digits is just impossible to read
> and compare - that "overhead" column (what the "overhead" even
> mean?) is largely impossible to read and determine what the actual
> capacity used is without counting individual digits in each number.
But if the whole point is trying to figure out "internal fragmentation"
then it's the only unit that makes sense, right? This is the "15 bytes"
of a 15 byte file (or extent) allocated into a 4k block.
OTOH, for any random file distribution it's going to trend towards half
a block, so I'm not sure how useful this is in the end.
(however your example seems to show roughly 200x the waste expected,
so I kind of wonder if that points to a bug somewhere in your patch...)
> FWIW, we already have extent histogram code in xfs_spaceman
> (in spaceman/freesp.c) and in xfs_db (db/freesp.c) so we really
> don't need re-implementation of the same functionality we already
> have duplicate copies of. I'd suggest that the histogram code should
> be factored and moved to libfrog/ and then enhanced if new histogram
> functionality is required...
Also a fair point, I had forgotten about that.
Thanks,
-Eric
> Cheers,
>
> Dave.
>
* Re: [PATCH] xfs_db: add extent count and file size histograms
2019-05-15 0:06 ` Eric Sandeen
@ 2019-05-15 2:05 ` Dave Chinner
2019-05-15 16:39 ` Jorge Guerra
0 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2019-05-15 2:05 UTC (permalink / raw)
To: Eric Sandeen; +Cc: Jorge Guerra, linux-xfs, osandov, Jorge Guerra
On Tue, May 14, 2019 at 07:06:52PM -0500, Eric Sandeen wrote:
> On 5/14/19 6:31 PM, Dave Chinner wrote:
> > On Tue, May 14, 2019 at 11:50:26AM -0700, Jorge Guerra wrote:
> >> Maximum extents in a file 14
> >> Histogram of number of extents per file:
> >> bucket = count % of total
> >> <= 1 = 350934 97.696 %
> >> <= 2 = 6231 1.735 %
> >> <= 4 = 1001 0.279 %
> >> <= 8 = 953 0.265 %
> >> <= 16 = 92 0.026 %
> >> Maximum file size 26.508 MB
> >> Histogram of file size:
> >> bucket = allocated used overhead(bytes)
> >> <= 4 KB = 0 62 314048512 0.13%
> >> <= 8 KB = 0 119911 127209263104 53.28%
> >> <= 16 KB = 0 14543 15350194176 6.43%
> >> <= 32 KB = 909 12330 11851161600 4.96%
> >> <= 64 KB = 92 6704 6828642304 2.86%
> >> <= 128 KB = 1 7132 6933372928 2.90%
> >> <= 256 KB = 0 10013 8753799168 3.67%
> >> <= 512 KB = 0 13616 9049227264 3.79%
> >> <= 1 MB = 1 15056 4774912000 2.00%
> >> <= 2 MB = 198662 17168 9690226688 4.06%
> >> <= 4 MB = 28639 21073 11806654464 4.94%
> >> <= 8 MB = 35169 29878 14200553472 5.95%
> >> <= 16 MB = 95667 91633 11939287040 5.00%
> >> <= 32 MB = 71 62 28471742 0.01%
> >> capacity used (bytes): 1097735533058 (1022.346 GB)
> >> capacity allocated (bytes): 1336497410048 (1.216 TB)
> >> block overhead (bytes): 238761885182 (21.750 %)
> >
> > BTW, "bytes" as a display unit is stupidly verbose and largely
> > unnecessary. The byte count is /always/ going to be a multiple of
> > the filesystem block size, and the first thing anyone who wants to
> > use this for diagnosis is going to have to do is return the byte
> > count to filesystem blocks (which is what the filesystem itself
> > tracks everything in. ANd then when you have PB scale filesystems,
> > anything more than 3 significant digits is just impossible to read
> > and compare - that "overhead" column (what the "overhead" even
> > mean?) is largely impossible to read and determine what the actual
> > capacity used is without counting individual digits in each number.
>
> But if the whole point is trying to figure out "internal fragmentation"
> then it's the only unit that makes sense, right? This is the "15 bytes"
> of a 15 byte file (or extent) allocated into a 4k block.
Urk. I missed that - I saw "-s" and assumed that, like the other
extent histogram printing commands we have, it meant "print summary
information", i.e. the last 3 lines in the above output.
But the rest of it? It comes back to my comment "what does overhead
even mean"? All it is is a measure of how many bytes are allocated in
extents vs the file size. It assumes that if there are more bytes
allocated in extents than the file size, then the excess is "wasted
space".
This is not a measure of "internal fragmentation". It doesn't take
into account the fact we can (and do) allocate extents beyond EOF
that are there (temporarily or permanently) for the file to be
extended into without physically fragmenting the file. These can go
away at any time, so one scan might show massive "internal
fragmentation" and then a minute later after the EOF block scanner
runs there is none. i.e. without changing the file data, the layout
of the file within EOF, or file size, "internal fragmentation" can
just magically disappear.
It doesn't take into account sparse files. Well, it does by
ignoring them, which is another flag that this isn't measuring
internal fragmentation, because even sparse files can be internally
fragmented.
Which is another thing this doesn't take into account: the amount of
data actually written to the files. e.g. a preallocated, zero length
file is "internally fragmented" by this criteria, but the same empty
file with a file size that matches the preallocation is not
"internally fragmented". Yet an actual internally fragmented file
(e.g. preallocate 1MB, set size to 1MB, write 4k at 256k) will not
actually be noticed by this code....
IOWs, what is being reported here is exactly the same information
that "stat(blocks) vs stat(size)" will tell you, which makes me
wonder why the method of gathering it (full fs scan via xfs_db) is
being used when this could be done with a simple script based around
this:
$ find /mntpt -type f -exec stat -c "%s %b" {} \; | histogram_script
I have no problems with adding analysis and reporting functionality
to the filesystem tools, but they have to be done the right way, and
not duplicate functionality and information that can be trivially
obtained from userspace with a script and basic utilities. IMO,
there has to be some substantial benefit from implementing the
functionality using deep, dark filesystem gubbins that can't be
achieved in any other way for it to be worth the additional code
maintenance burden....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH] xfs_db: add extent count and file size histograms
2019-05-14 20:02 ` Eric Sandeen
@ 2019-05-15 15:57 ` Jorge Guerra
2019-05-15 16:02 ` Eric Sandeen
0 siblings, 1 reply; 14+ messages in thread
From: Jorge Guerra @ 2019-05-15 15:57 UTC (permalink / raw)
To: Eric Sandeen; +Cc: linux-xfs, Omar Sandoval, Jorge Guerra
Thanks Eric,
I'm addressing these comments. Will send an update once we have an
agreement with Dave on how and where to implement this.
On Tue, May 14, 2019 at 1:02 PM Eric Sandeen <sandeen@sandeen.net> wrote:
>
> On 5/14/19 1:50 PM, Jorge Guerra wrote:
> > + dbprintf(_("capacity used (bytes): %llu (%.3f %cB)\n"),
> > + extstats.logicalused, answer, iec_prefixes[i]);
>
> I think I missed this instance of "indent please" and probably others...
>
> (I'm kind of wondering about carrying 'used' in bytes, but I suppose we're
> ok until we really get zettabyte filesytems in the wild) ;)
>
> -Eric
--
Jorge E Guerra D
* Re: [PATCH] xfs_db: add extent count and file size histograms
2019-05-15 15:57 ` Jorge Guerra
@ 2019-05-15 16:02 ` Eric Sandeen
0 siblings, 0 replies; 14+ messages in thread
From: Eric Sandeen @ 2019-05-15 16:02 UTC (permalink / raw)
To: Jorge Guerra; +Cc: linux-xfs, Omar Sandoval, Jorge Guerra
On 5/15/19 10:57 AM, Jorge Guerra wrote:
> Thanks Eric,
>
> I'm addressing these comments. Will send an update once we have an
> agreement with Dave into how and where to implement this.
Might want to give Dave's concerns thought before doing too much
editing of this patch, but it's up to you. :)
(same style comments will apply to any solution, though)
Thanks,
-Eric
> On Tue, May 14, 2019 at 1:02 PM Eric Sandeen <sandeen@sandeen.net> wrote:
>>
>> On 5/14/19 1:50 PM, Jorge Guerra wrote:
>>> + dbprintf(_("capacity used (bytes): %llu (%.3f %cB)\n"),
>>> + extstats.logicalused, answer, iec_prefixes[i]);
>>
>> I think I missed this instance of "indent please" and probably others...
>>
>> (I'm kind of wondering about carrying 'used' in bytes, but I suppose we're
>> ok until we really get zettabyte filesytems in the wild) ;)
>>
>> -Eric
>
>
>
* Re: [PATCH] xfs_db: add extent count and file size histograms
2019-05-14 23:31 ` Dave Chinner
2019-05-15 0:06 ` Eric Sandeen
@ 2019-05-15 16:15 ` Jorge Guerra
2019-05-15 16:24 ` Eric Sandeen
1 sibling, 1 reply; 14+ messages in thread
From: Jorge Guerra @ 2019-05-15 16:15 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-xfs, Omar Sandoval, Jorge Guerra
Thanks Dave,
I appreciate you taking the time to review and comment.
On Tue, May 14, 2019 at 4:31 PM Dave Chinner <david@fromorbit.com> wrote:
>
> On Tue, May 14, 2019 at 11:50:26AM -0700, Jorge Guerra wrote:
> > From: Jorge Guerra <jorgeguerra@fb.com>
> >
> > In this change we add two feature to the xfs_db 'frag' command:
> >
> > 1) Extent count histogram [-e]: This option enables tracking the
> > number of extents per inode (file) as the we traverse the file
> > system. The end result is a histogram of the number of extents per
> > file in power of 2 buckets.
> >
> > 2) File size histogram and file system internal fragmentation stats
> > [-s]: This option enables tracking file sizes both in terms of what
> > has been physically allocated and how much has been written to the
> > file. In addition, we track the amount of internal fragmentation
> > seen per file. This is particularly useful in the case of real
> > time devices where space is allocated in units of fixed sized
> > extents.
>
> I can see the usefulness of having such information, but xfs_db is
> the wrong tool/interface for generating such usage reports.
>
> > The man page for xfs_db has been updated to reflect these new command
> > line arguments.
> >
> > Tests:
> >
> > We tested this change on several XFS file systems with different
> > configurations:
> >
> > 1) regular XFS:
> >
> > [root@m1 ~]# xfs_info /mnt/d0
> > meta-data=/dev/sdb1 isize=256 agcount=10, agsize=268435455 blks
> > = sectsz=4096 attr=2, projid32bit=1
> > = crc=0 finobt=0, sparse=0, rmapbt=0
> > = reflink=0
> > data = bsize=4096 blocks=2441608704, imaxpct=100
> > = sunit=0 swidth=0 blks
> > naming =version 2 bsize=4096 ascii-ci=0, ftype=1
> > log =internal log bsize=4096 blocks=521728, version=2
> > = sectsz=4096 sunit=1 blks, lazy-count=1
> > realtime =none extsz=4096 blocks=0, rtextents=0
> > [root@m1 ~]# echo "frag -e -s" | xfs_db -r /dev/sdb1
> > xfs_db> actual 494393, ideal 489246, fragmentation factor 1.04%
>
> For example, xfs_db is not the right tool for probing online, active
> filesystems. It is not coherent with the active kernel filesystem,
> and is quite capable of walking off into la-la land as a result of
> mis-parsing the inconsistent filesystem that is on disk underneath
> active mounted filesystems. This does not make for a robust, usable
> tool, let alone one that can make use of things like rmap for
> querying usage and ownership information really quickly.
I see your point that the FS is constantly changing and that we might
see an inconsistent view. But if we are generating bucketed
histograms, we are approximating the stats anyway.
> To solve this problem, we now have the xfs_spaceman tool and the
> GETFSMAP ioctl for running usage queries on mounted filesystems.
> That avoids all the coherency and crash problems, and for rmap
> enabled filesystems it does not require scanning the entire
> filesystem to work out this information (i.e. it can all be derived
> from the contents of the rmap tree).
>
> So I'd much prefer that new online filesystem queries go into
> xfs-spaceman and use GETFSMAP so they can be accelerated on rmap
> configured filesystems rather than hoping xfs_db will parse the
> entire mounted filesystem correctly while it is being actively
> changed...
Good to know, I wasn't aware of this tool. However, it seems like I
don't have that ioctl on my systems yet :(
# xfs_spaceman /mnt/d0
xfs_spaceman> frespc
command "frespc" not found
xfs_spaceman> fresp
command "fresp" not found
xfs_spaceman> freesp
xfs_spaceman: FS_IOC_GETFSMAP ["/mnt/d0"]: Inappropriate ioctl for device
xfs_spaceman: FS_IOC_GETFSMAP ["/mnt/d0"]: Inappropriate ioctl for device
xfs_spaceman: FS_IOC_GETFSMAP ["/mnt/d0"]: Inappropriate ioctl for device
xfs_spaceman: FS_IOC_GETFSMAP ["/mnt/d0"]: Inappropriate ioctl for device
xfs_spaceman: FS_IOC_GETFSMAP ["/mnt/d0"]: Inappropriate ioctl for device
xfs_spaceman: FS_IOC_GETFSMAP ["/mnt/d0"]: Inappropriate ioctl for device
xfs_spaceman: FS_IOC_GETFSMAP ["/mnt/d0"]: Inappropriate ioctl for device
xfs_spaceman: FS_IOC_GETFSMAP ["/mnt/d0"]: Inappropriate ioctl for device
xfs_spaceman: FS_IOC_GETFSMAP ["/mnt/d0"]: Inappropriate ioctl for device
xfs_spaceman: FS_IOC_GETFSMAP ["/mnt/d0"]: Inappropriate ioctl for device
xfs_spaceman: FS_IOC_GETFSMAP ["/mnt/d0"]: Inappropriate ioctl for device
xfs_spaceman: FS_IOC_GETFSMAP ["/mnt/d0"]: Inappropriate ioctl for device
xfs_spaceman: FS_IOC_GETFSMAP ["/mnt/d0"]: Inappropriate ioctl for device
xfs_spaceman: FS_IOC_GETFSMAP ["/mnt/d0"]: Inappropriate ioctl for device
xfs_spaceman: FS_IOC_GETFSMAP ["/mnt/d0"]: Inappropriate ioctl for device
from to extents blocks pct
xfs_spaceman>
One other thing: if we go this route, we would need to issue an
ioctl for every file, right? Wouldn't this be much slower?
>
> > Maximum extents in a file 14
> > Histogram of number of extents per file:
> > bucket = count % of total
> > <= 1 = 350934 97.696 %
> > <= 2 = 6231 1.735 %
> > <= 4 = 1001 0.279 %
> > <= 8 = 953 0.265 %
> > <= 16 = 92 0.026 %
> > Maximum file size 26.508 MB
> > Histogram of file size:
> > bucket = allocated used overhead(bytes)
> > <= 4 KB = 0 62 314048512 0.13%
> > <= 8 KB = 0 119911 127209263104 53.28%
> > <= 16 KB = 0 14543 15350194176 6.43%
> > <= 32 KB = 909 12330 11851161600 4.96%
> > <= 64 KB = 92 6704 6828642304 2.86%
> > <= 128 KB = 1 7132 6933372928 2.90%
> > <= 256 KB = 0 10013 8753799168 3.67%
> > <= 512 KB = 0 13616 9049227264 3.79%
> > <= 1 MB = 1 15056 4774912000 2.00%
> > <= 2 MB = 198662 17168 9690226688 4.06%
> > <= 4 MB = 28639 21073 11806654464 4.94%
> > <= 8 MB = 35169 29878 14200553472 5.95%
> > <= 16 MB = 95667 91633 11939287040 5.00%
> > <= 32 MB = 71 62 28471742 0.01%
> > capacity used (bytes): 1097735533058 (1022.346 GB)
> > capacity allocated (bytes): 1336497410048 (1.216 TB)
> > block overhead (bytes): 238761885182 (21.750 %)
>
> BTW, "bytes" as a display unit is stupidly verbose and largely
> unnecessary. The byte count is /always/ going to be a multiple of
> the filesystem block size, and the first thing anyone who wants to
> use this for diagnosis is going to have to do is return the byte
> count to filesystem blocks (which is what the filesystem itself
> tracks everything in. ANd then when you have PB scale filesystems,
> anything more than 3 significant digits is just impossible to read
> and compare - that "overhead" column (what the "overhead" even
> mean?) is largely impossible to read and determine what the actual
> capacity used is without counting individual digits in each number.
Sure, I'll remove the bytes and display them in human-readable units.
>
> FWIW, we already have extent histogram code in xfs_spaceman
> (in spaceman/freesp.c) and in xfs_db (db/freesp.c) so we really
> don't need re-implementation of the same functionality we already
Both of these tools query the free space; the code in this patch
queries the opposite: the size of the allocated extents and the count
of extents per file.
> have duplicate copies of. I'd suggest that the histogram code should
> be factored and moved to libfrog/ and then enhanced if new histogram
> functionality is required...
Makes sense, will do!
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
--
Jorge E Guerra D
* Re: [PATCH] xfs_db: add extent count and file size histograms
2019-05-15 16:15 ` Jorge Guerra
@ 2019-05-15 16:24 ` Eric Sandeen
2019-05-15 16:47 ` Jorge Guerra
0 siblings, 1 reply; 14+ messages in thread
From: Eric Sandeen @ 2019-05-15 16:24 UTC (permalink / raw)
To: Jorge Guerra, Dave Chinner; +Cc: linux-xfs, Omar Sandoval, Jorge Guerra
On 5/15/19 11:15 AM, Jorge Guerra wrote:
> Thanks Dave,
>
> I appreciate you taking the time to review and comment.
>
> On Tue, May 14, 2019 at 4:31 PM Dave Chinner <david@fromorbit.com> wrote:
>>
>> On Tue, May 14, 2019 at 11:50:26AM -0700, Jorge Guerra wrote:
>>> From: Jorge Guerra <jorgeguerra@fb.com>
>>>
>>> In this change we add two features to the xfs_db 'frag' command:
>>>
>>> 1) Extent count histogram [-e]: This option enables tracking the
>>> number of extents per inode (file) as we traverse the file
>>> system. The end result is a histogram of the number of extents per
>>> file in power of 2 buckets.
>>>
>>> 2) File size histogram and file system internal fragmentation stats
>>> [-s]: This option enables tracking file sizes both in terms of what
>>> has been physically allocated and how much has been written to the
>>> file. In addition, we track the amount of internal fragmentation
>>> seen per file. This is particularly useful in the case of real
>>> time devices where space is allocated in units of fixed-size
>>> extents.
>>
>> I can see the usefulness of having such information, but xfs_db is
>> the wrong tool/interface for generating such usage reports.
>>
>>> The man page for xfs_db has been updated to reflect these new command
>>> line arguments.
>>>
>>> Tests:
>>>
>>> We tested this change on several XFS file systems with different
>>> configurations:
>>>
>>> 1) regular XFS:
>>>
>>> [root@m1 ~]# xfs_info /mnt/d0
>>> meta-data=/dev/sdb1 isize=256 agcount=10, agsize=268435455 blks
>>> = sectsz=4096 attr=2, projid32bit=1
>>> = crc=0 finobt=0, sparse=0, rmapbt=0
>>> = reflink=0
>>> data = bsize=4096 blocks=2441608704, imaxpct=100
>>> = sunit=0 swidth=0 blks
>>> naming =version 2 bsize=4096 ascii-ci=0, ftype=1
>>> log =internal log bsize=4096 blocks=521728, version=2
>>> = sectsz=4096 sunit=1 blks, lazy-count=1
>>> realtime =none extsz=4096 blocks=0, rtextents=0
>>> [root@m1 ~]# echo "frag -e -s" | xfs_db -r /dev/sdb1
>>> xfs_db> actual 494393, ideal 489246, fragmentation factor 1.04%
>>
>> For example, xfs_db is not the right tool for probing online, active
>> filesystems. It is not coherent with the active kernel filesystem,
>> and is quite capable of walking off into la-la land as a result of
>> mis-parsing the inconsistent filesystem that is on disk underneath
>> active mounted filesystems. This does not make for a robust, usable
>> tool, let alone one that can make use of things like rmap for
>> querying usage and ownership information really quickly.
>
> I see your point, that the FS is constantly changing and that we might
> see an inconsistent view. But if we are generating bucketed
> histograms, we are approximating the stats anyway.
I think that Dave's "inconsistency" concern is literal - if the on-disk
metadata is not consistent, you may wander into what looks like corruption
if you try to traverse every inode while mounted.
It's pretty much never valid for userspace to try to traverse or read
the filesystem while mounted.
>> To solve this problem, we now have the xfs_spaceman tool and the
>> GETFSMAP ioctl for running usage queries on mounted filesystems.
>> That avoids all the coherency and crash problems, and for rmap
>> enabled filesystems it does not require scanning the entire
>> filesystem to work out this information (i.e. it can all be derived
>> from the contents of the rmap tree).
>>
>> So I'd much prefer that new online filesystem queries go into
>> xfs-spaceman and use GETFSMAP so they can be accelerated on rmap
>> configured filesystems rather than hoping xfs_db will parse the
>> entire mounted filesystem correctly while it is being actively
>> changed...
>
Good to know, I wasn't aware of this tool. However, it seems I
don't have that ioctl on my systems yet :(
It was added in 2017, in kernel-4.12 I believe.
What kernel did you test?
-Eric
* Re: [PATCH] xfs_db: add extent count and file size histograms
2019-05-15 2:05 ` Dave Chinner
@ 2019-05-15 16:39 ` Jorge Guerra
2019-05-15 22:55 ` Dave Chinner
0 siblings, 1 reply; 14+ messages in thread
From: Jorge Guerra @ 2019-05-15 16:39 UTC (permalink / raw)
To: Dave Chinner; +Cc: Eric Sandeen, linux-xfs, Omar Sandoval, Jorge Guerra
On Tue, May 14, 2019 at 7:05 PM Dave Chinner <david@fromorbit.com> wrote:
>
> On Tue, May 14, 2019 at 07:06:52PM -0500, Eric Sandeen wrote:
> > On 5/14/19 6:31 PM, Dave Chinner wrote:
> > > On Tue, May 14, 2019 at 11:50:26AM -0700, Jorge Guerra wrote:
> > >> Maximum extents in a file 14
> > >> Histogram of number of extents per file:
> > >> bucket = count % of total
> > >> <= 1 = 350934 97.696 %
> > >> <= 2 = 6231 1.735 %
> > >> <= 4 = 1001 0.279 %
> > >> <= 8 = 953 0.265 %
> > >> <= 16 = 92 0.026 %
> > >> Maximum file size 26.508 MB
> > >> Histogram of file size:
> > >> bucket = allocated used overhead(bytes)
> > >> <= 4 KB = 0 62 314048512 0.13%
> > >> <= 8 KB = 0 119911 127209263104 53.28%
> > >> <= 16 KB = 0 14543 15350194176 6.43%
> > >> <= 32 KB = 909 12330 11851161600 4.96%
> > >> <= 64 KB = 92 6704 6828642304 2.86%
> > >> <= 128 KB = 1 7132 6933372928 2.90%
> > >> <= 256 KB = 0 10013 8753799168 3.67%
> > >> <= 512 KB = 0 13616 9049227264 3.79%
> > >> <= 1 MB = 1 15056 4774912000 2.00%
> > >> <= 2 MB = 198662 17168 9690226688 4.06%
> > >> <= 4 MB = 28639 21073 11806654464 4.94%
> > >> <= 8 MB = 35169 29878 14200553472 5.95%
> > >> <= 16 MB = 95667 91633 11939287040 5.00%
> > >> <= 32 MB = 71 62 28471742 0.01%
> > >> capacity used (bytes): 1097735533058 (1022.346 GB)
> > >> capacity allocated (bytes): 1336497410048 (1.216 TB)
> > >> block overhead (bytes): 238761885182 (21.750 %)
> > >
> > > BTW, "bytes" as a display unit is stupidly verbose and largely
> > > unnecessary. The byte count is /always/ going to be a multiple of
> > > the filesystem block size, and the first thing anyone who wants to
> > > use this for diagnosis is going to have to do is return the byte
> > > count to filesystem blocks (which is what the filesystem itself
> > > tracks everything in. ANd then when you have PB scale filesystems,
> > > anything more than 3 significant digits is just impossible to read
> > > and compare - that "overhead" column (what the "overhead" even
> > > mean?) is largely impossible to read and determine what the actual
> > > capacity used is without counting individual digits in each number.
> >
> > But if the whole point is trying to figure out "internal fragmentation"
> > then it's the only unit that makes sense, right? This is the "15 bytes"
> > of a 15 byte file (or extent) allocated into a 4k block.
>
> Urk. I missed that - I saw "-s" and assumed that, like the other
> extent histogram printing commands we have, it meant "print summary
> information". i.e. the last 3 lines in the above output.
>
> But the rest of it? It comes back to my comment "what does overhead
> even mean"? It is just a measure of how many bytes are allocated in
> extents vs the file size. It assumes that if there are more bytes
> allocated in extents than the file size, then the excess is "wasted
> space".
Yes, the way I interpret "wasted space" is that if we allocate space
to an inode and the space is not used, then it's labeled as wasted,
since at that point we are consuming it and it's not available for
immediate use.
>
> This is not a measure of "internal fragmentation". It doesn't take
> into account the fact we can (and do) allocate extents beyond EOF
> that are there (temporarily or permanently) for the file to be
> extended into without physically fragmenting the file. These can go
> away at any time, so one scan might show massive "internal
> fragmentation" and then a minute later after the EOF block scanner
> runs there is none. i.e. without changing the file data, the layout
> of the file within EOF, or file size, "internal fragmentation" can
> just magically disappear.
I see; how much do we expect this to be (i.e., 1%, 10% of the file
size)? In other words, what's the order of magnitude of the
"preemptive" allocation compared to the total space in the file system?
>
> It doesn't take into account sparse files. Well, it does by
> ignoring them, which is another flag that this isn't measuring
> internal fragmentation because even sparse files can be internally
> fragmented.
>
> Which is another thing this doesn't take into account: the amount of
> data actually written to the files. e.g. a preallocated, zero length
> file is "internally fragmented" by this criteria, but the same empty
> file with a file size that matches the preallocation is not
> "internally fragmented". Yet an actual internally fragmented file
> (e.g. preallocate 1MB, set size to 1MB, write 4k at 256k) will not
> actually be noticed by this code....
Interesting, how can we better account for these?
>
> IOWs, what is being reported here is exactly the same information
> that "stat(blocks) vs stat(size)" will tell you, which makes me
> wonder why the method of gathering it (full fs scan via xfs_db) is
> being used when this could be done with a simple script based around
> this:
>
> $ find /mntpt -type f -exec stat -c "%s %b" {} \; | histogram_script
While it's true that this can be measured via a simple script, I'd
like to point out that it would be significantly less efficient,
for instance:
# time find /mnt/pt -type f -exec stat -c "%s %b" {} \; > /tmp/file-sizes
real 27m38.885s
user 3m29.774s
sys 17m9.272s
# echo "frag -s -e" | time /tmp/xfs_db -r /dev/sdb1
[...]
0.44user 2.48system 0:05.42elapsed 53%CPU (0avgtext+0avgdata 996000maxresident)k
2079416inputs+0outputs (0major+248446minor)pagefaults 0swaps
That's 5.4s vs 27+ minutes, without considering the time to build the histogram.
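For completeness, here is a minimal sketch of what the "histogram_script" half of Dave's pipeline could look like. This is entirely hypothetical: the bucket boundaries, the function name and the output format are mine, not xfs_db's or any existing tool's.

```shell
# Hypothetical "histogram_script": reads "<size> <blocks>" pairs (as
# printed by stat -c "%s %b") on stdin and emits a power-of-2 file
# size histogram plus byte totals. Bucket boundaries and output
# format are illustrative only.
histogram() {
    awk '
    {
        size = $1
        # smallest power of 2 that holds this file size
        b = 1
        while (b < size)
            b *= 2
        count[b]++
        total++
        used += size
        alloc += $2 * 512    # stat %b counts 512-byte blocks
    }
    END {
        for (b in count)
            printf "<= %d = %d (%.3f %%)\n", b, count[b], 100 * count[b] / total
        printf "used bytes: %d allocated bytes: %d\n", used, alloc
    }'
}

# e.g.: find /mnt/pt -type f -print0 | xargs -0 stat -c "%s %b" | histogram
```

The awk pass is O(files) with constant memory, so the cost is entirely in the traversal and stat calls, not in building the histogram.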
>
> I have no problems with adding analysis and reporting functionality
> to the filesystem tools, but they have to be done the right way, and
> not duplicate functionality and information that can be trivially
> obtained from userspace with a script and basic utilities. IMO,
> there has to be some substantial benefit from implementing the
> functionality using deep, dark filesystem gubbins that can't be
> acheived in any other way for it be worth the additional code
> maintenance burden....
In my view, the efficiency gain should justify the need for this
tool. In fact this was our main motivation: we were using "du -s
--apparent-size" and comparing that to the result of "df" to estimate
FS overhead, but this method was consuming a lot more IO than we
had budgeted for. With the proposed tool we reduced IO 15x compared
to the "du vs df" method and collected more information along the way.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
--
Jorge E Guerra D
* Re: [PATCH] xfs_db: add extent count and file size histograms
2019-05-15 16:24 ` Eric Sandeen
@ 2019-05-15 16:47 ` Jorge Guerra
2019-05-15 16:51 ` Eric Sandeen
0 siblings, 1 reply; 14+ messages in thread
From: Jorge Guerra @ 2019-05-15 16:47 UTC (permalink / raw)
To: Eric Sandeen; +Cc: Dave Chinner, linux-xfs, Omar Sandoval, Jorge Guerra
On Wed, May 15, 2019 at 9:24 AM Eric Sandeen <sandeen@sandeen.net> wrote:
> >> For example, xfs_db is not the right tool for probing online, active
> >> filesystems. It is not coherent with the active kernel filesystem,
> >> and is quite capable of walking off into la-la land as a result of
> >> mis-parsing the inconsistent filesystem that is on disk underneath
> >> active mounted filesystems. This does not make for a robust, usable
> >> tool, let alone one that can make use of things like rmap for
> >> querying usage and ownership information really quickly.
> >
> > I see your point, that the FS is constantly changing and that we might
> > see an inconsistent view. But if we are generating bucketed
> > histograms, we are approximating the stats anyway.
>
> I think that Dave's "inconsistency" concern is literal - if the on-disk
> metadata is not consistent, you may wander into what looks like corruption
> if you try to traverse every inode while mounted.
>
> It's pretty much never valid for userspace to try to traverse or read
> the filesystem while mounted.
Sure, I understand this point. Then can we:
1) Abort the scan if we detect "corrupt" metadata; the user would then
either restart the scan or decide not to.
2) Have a mechanism which detects if the FS changed while the scan was
in progress and tell the user the results might be stale?
>
> >> To solve this problem, we now have the xfs_spaceman tool and the
> >> GETFSMAP ioctl for running usage queries on mounted filesystems.
> >> That avoids all the coherency and crash problems, and for rmap
> >> enabled filesystems it does not require scanning the entire
> >> filesystem to work out this information (i.e. it can all be derived
> >> from the contents of the rmap tree).
> >>
> >> So I'd much prefer that new online filesystem queries go into
> >> xfs-spaceman and use GETFSMAP so they can be accelerated on rmap
> >> configured filesystems rather than hoping xfs_db will parse the
> >> entire mounted filesystem correctly while it is being actively
> >> changed...
> >
> > Good to know, I wasn't aware of this tool. However, it seems I
> > don't have that ioctl on my systems yet :(
>
> It was added in 2017, in kernel-4.12 I believe.
> What kernel did you test?
Yep, that's it; we tested on 4.11.
--
Jorge E Guerra D
* Re: [PATCH] xfs_db: add extent count and file size histograms
2019-05-15 16:47 ` Jorge Guerra
@ 2019-05-15 16:51 ` Eric Sandeen
0 siblings, 0 replies; 14+ messages in thread
From: Eric Sandeen @ 2019-05-15 16:51 UTC (permalink / raw)
To: Jorge Guerra; +Cc: Dave Chinner, linux-xfs, Omar Sandoval, Jorge Guerra
On 5/15/19 11:47 AM, Jorge Guerra wrote:
> On Wed, May 15, 2019 at 9:24 AM Eric Sandeen <sandeen@sandeen.net> wrote:
>>>> For example, xfs_db is not the right tool for probing online, active
>>>> filesystems. It is not coherent with the active kernel filesystem,
>>>> and is quite capable of walking off into la-la land as a result of
>>>> mis-parsing the inconsistent filesystem that is on disk underneath
>>>> active mounted filesystems. This does not make for a robust, usable
>>>> tool, let alone one that can make use of things like rmap for
>>>> querying usage and ownership information really quickly.
>>>
>>> I see your point, that the FS is constantly changing and that we might
>>> see an inconsistent view. But if we are generating bucketed
>>> histograms, we are approximating the stats anyway.
>>
>> I think that Dave's "inconsistency" concern is literal - if the on-disk
>> metadata is not consistent, you may wander into what looks like corruption
>> if you try to traverse every inode while mounted.
>>
>> It's pretty much never valid for userspace to try to traverse or read
>> the filesystem while mounted.
>
> Sure, I understand this point. Then can we:
>
> 1) Abort the scan if we detect "corrupt" metadata; the user would then
> either restart the scan or decide not to.
> 2) Have a mechanism which detects if the FS changed while the scan was
> in progress and tell the user the results might be stale?
None of that should be shoehorned into xfs_db, tbh. It's fine to use it
while unmounted. If you want to gather these stats on a mounted filesystem,
xfs_db is the wrong tool for the job. It's an offline inspection tool.
The fact that "-r" exists is because developers may need
it, but normal admin-facing tools should not be designed around it.
>>
>>>> To solve this problem, we now have the xfs_spaceman tool and the
>>>> GETFSMAP ioctl for running usage queries on mounted filesystems.
>>>> That avoids all the coherency and crash problems, and for rmap
>>>> enabled filesystems it does not require scanning the entire
>>>> filesystem to work out this information (i.e. it can all be derived
>>>> from the contents of the rmap tree).
>>>>
>>>> So I'd much prefer that new online filesystem queries go into
>>>> xfs-spaceman and use GETFSMAP so they can be accelerated on rmap
>>>> configured filesystems rather than hoping xfs_db will parse the
>>>> entire mounted filesystem correctly while it is being actively
>>>> changed...
>>>
>>> Good to know, I wasn't aware of this tool. However, it seems I
>>> don't have that ioctl on my systems yet :(
>>
>> It was added in 2017, in kernel-4.12 I believe.
>> What kernel did you test?
>
> Yep, that's it; we tested on 4.11.
Catch up! *grin*
-Eric
* Re: [PATCH] xfs_db: add extent count and file size histograms
2019-05-15 16:39 ` Jorge Guerra
@ 2019-05-15 22:55 ` Dave Chinner
0 siblings, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2019-05-15 22:55 UTC (permalink / raw)
To: Jorge Guerra; +Cc: Eric Sandeen, linux-xfs, Omar Sandoval, Jorge Guerra
On Wed, May 15, 2019 at 09:39:01AM -0700, Jorge Guerra wrote:
> On Tue, May 14, 2019 at 7:05 PM Dave Chinner <david@fromorbit.com> wrote:
> > On Tue, May 14, 2019 at 07:06:52PM -0500, Eric Sandeen wrote:
> > > On 5/14/19 6:31 PM, Dave Chinner wrote:
> > This is not a measure of "internal fragmentation". It doesn't take
> > into account the fact we can (and do) allocate extents beyond EOF
> > that are there (temporarily or permanently) for the file to be
> > extended into without physically fragmenting the file. These can go
> > away at any time, so one scan might show massive "internal
> > fragmentation" and then a minute later after the EOF block scanner
> > runs there is none. i.e. without changing the file data, the layout
> > of the file within EOF, or file size, "internal fragmentation" can
> > just magically disappear.
>
> I see; how much do we expect this to be (i.e., 1%, 10% of the file
> size)? In other words, what's the order of magnitude of the
> "preemptive" allocation compared to the total space in the file system?
Speculative delalloc can be up to MAXEXTLEN on large files. It is
typically the size of the file again as the file is growing, i.e. if
the file is 64k, we'll preallocate 64k; if it's 1GB, we'll prealloc
1GB; if it's over 8GB (MAXEXTLEN on a 4k block size filesystem),
then we'll prealloc 8GB.
This typically is not removed when the file is closed - it is
typically removed when the file has not been modified for a few
minutes and the EOF block scanner runs over it, the inode is cycled
out of cache or we hit an ENOSPC condition, in which case the EOF
block scanner is run to clean up such prealloc before we attempt
allocation again. The amount of speculative prealloc is dialled back
as the filesystem gets nearer to ENOSPC (>95% capacity) or the user
starts to run out of quota space.
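The sizing rule above could be sketched very loosely like this. To be clear, this is my illustration of the "prealloc the file size again, capped at MAXEXTLEN" behaviour described above, not the kernel's actual xfs_iomap logic; the dial-back near ENOSPC and the quota checks are deliberately omitted, and the function name is made up.

```shell
# Illustrative sketch of speculative prealloc sizing: roughly the
# current file size again, capped at MAXEXTLEN (8 GiB worth of blocks
# on a 4k block size filesystem). Not the kernel's real algorithm.
max_extlen=$((8 * 1024 * 1024 * 1024))   # assumed cap for 4k blocks

prealloc_for() {
    size=$1
    if [ "$size" -gt "$max_extlen" ]; then
        echo "$max_extlen"
    else
        echo "$size"
    fi
}

prealloc_for $((64 * 1024))                 # 64k file: ~64k more speculated
prealloc_for $((16 * 1024 * 1024 * 1024))   # 16G file: capped at 8G
```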
So, yes, it can be a large amount of space that is consumed
temporarily, but the amount is workload dependent. The
reality is that almost no-one notices that XFS does this or the
extent to which XFS makes liberal use of free space for
fragmentation avoidance...
Of course, the filesystem has no real control over user directed
preallocation beyond EOF (i.e. fallocate()) and we do not ever
remove that unless the user runs ftruncate(). Hence the space
beyond EOF might be a direct result of the applications that are
running and not filesystem behaviour related at all...
> > It doesn't take into account sparse files. Well, it does by
> > ignoring them, which is another flag that this isn't measuring
> > internal fragmentation because even sparse files can be internally
> > fragmented.
> >
> > Which is another thing this doesn't take into account: the amount of
> > data actually written to the files. e.g. a preallocated, zero length
> > file is "internally fragmented" by this criteria, but the same empty
> > file with a file size that matches the preallocation is not
> > "internally fragmented". Yet an actual internally fragmented file
> > (e.g. preallocate 1MB, set size to 1MB, write 4k at 256k) will not
> > actually be noticed by this code....
>
> Interesting, how can we better account for these?
If it is preallocated space, then you need to scan each file to
determine the ratio of written to unwritten extents in the file.
(i.e. allocated space that contains data vs allocated space that
does not contain data). Basically, you need something similar to
what xfs_fsr is doing to determine if files need defragmentation or
not...
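A rough userspace approximation of that written-vs-unwritten check can be had from FIEMAP output, e.g. via filefrag -v, where preallocated space carries the "unwritten" flag. This is only a sketch; the function name and output format are made up, and real xfs_fsr-style tooling does considerably more.

```shell
# Hypothetical sketch: count unwritten (allocated, no data) vs total
# extents for one file from "filefrag -v" output on stdin. Extent rows
# start with an index followed by a colon; preallocated extents carry
# the "unwritten" flag in the flags column.
count_unwritten() {
    awk '
    /^[[:space:]]*[0-9]+:/ {
        total++
        if ($0 ~ /unwritten/)
            unwritten++
    }
    END { printf "%d of %d extents unwritten\n", unwritten + 0, total + 0 }'
}

# Typical use (needs read access to the file):
#   filefrag -v /path/to/file | count_unwritten
```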
> > IOWs, what is being reported here is exactly the same information
> > that "stat(blocks) vs stat(size)" will tell you, which makes me
> > wonder why the method of gathering it (full fs scan via xfs_db) is
> > being used when this could be done with a simple script based around
> > this:
> >
> > $ find /mntpt -type f -exec stat -c "%s %b" {} \; | histogram_script
>
> While it's true that this can be measured via a simple script, I'd
> like to point out that it would be significantly less efficient,
> for instance:
>
> # time find /mnt/pt -type f -exec stat -c "%s %b" {} \; > /tmp/file-sizes
>
> real 27m38.885s
> user 3m29.774s
> sys 17m9.272s
That's close on CPU bound, so I'm betting most of that time is in
fork/exec for the stat binary for each file that find pipes out.
e.g. on my workstation, which has ~50,000 read iops @ QD=1 capability
on the root filesystem:
$ time sudo find / -type f > /tmp/files
real 0m8.707s
user 0m1.006s
sys 0m2.571s
$ wc -l /tmp/files
1832634 /tmp/files
So, 1.8m files traversed in under 9 seconds - this didn't stat the
inodes because of ftype. How long does the example script take?
$ time sudo find / -type f -exec stat -c "%s %b" {} \; > /tmp/file-sizes
<still waiting after 10 minutes>
.....
While we are waiting, lets just get rid of the fork/exec overhead,
eh?
$ time sudo sh -c 'find / -type f |xargs -d "\n" stat -c "%s %b" > /tmp/files-size-2'
real 0m4.712s
user 0m2.732s
sys 0m5.073s
Ok, a fair bunch of the directory hierarchy and inodes were cached,
but the actual stat takes almost no time at all and almost no CPU
usage.
Back to the fork-exec script, still waiting for it to finish,
despite the entire file set now residing in kernel memory.
Fmeh, I'm just going to kill it.
....
real 26m34.542s
user 18m58.183s
sys 7m38.125s
$ wc -l /tmp/file-sizes
1824062 /tmp/file-sizes
Oh, it was almost done. IOWs, the /implementation/ was the problem.
I think by now you understand that I gave an example of how the
information you are querying is already available through normal
POSIX APIs, not that it was the most optimal way of running that
query through those APIs.
It is, however, /trivial/ to do this traversal query at max IO
speed even using scripts - using xargs to batch arguments to
utilities to avoid fork/exec overhead is sysadmin 101 stuff...
> # echo "frag -s -e" | time /tmp/xfs_db -r /dev/sdb1
> [...]
> 0.44user 2.48system 0:05.42elapsed 53%CPU (0avgtext+0avgdata 996000maxresident)k
> 2079416inputs+0outputs (0major+248446minor)pagefaults 0swaps
>
> That's 5.4s vs 27+ minutes, without considering the time to build the histogram.
Yup, but now you've explained why you are trying to use xfs_db in
inappropriate ways: performance.
Despite the fact that directory traversal can be fast, it is still not
the most IO efficient way to iterate inodes. xfs_db does that by
reading the inodes in ascending order from the AGI btrees, meaning
it's a single sequential pass across the filesystem to parse inodes
a chunk at a time.
The sad fact about all this is that we've been able to do this from
userspace with XFS since .... 1994. It's called bulkstat. I say
this is sad because I've lost count of the number of times people
have wasted time trying to re-invent the wheel rather than just
asking the experts a simple question and being told about bulkstat
or GETFSMAP....
Yup, there's even a basic test program in fstests that outputs stat
information from bulkstat that you can filter to report the info you
are generating histograms from:
$ time sudo src/bstat / | grep -A1 "mode 01" | awk -e '/blksize/ { print $4, $6 }' > /tmp/bstat-sizes
real 0m11.317s
user 0m8.686s
sys 0m5.909s
Again, this is /not an optimal implementation/ but just an example
that this functionality is available to userspace. Targeted
implementations can be found in tools like xfs_fsr and xfsdump which
use bulkstat to find the inodes they need to operate on much faster
than a directory walk will ever achieve....
> > I have no problems with adding analysis and reporting functionality
> > to the filesystem tools, but they have to be done the right way, and
> > not duplicate functionality and information that can be trivially
> > obtained from userspace with a script and basic utilities. IMO,
> > there has to be some substantial benefit from implementing the
> > functionality using deep, dark filesystem gubbins that can't be
> > achieved in any other way for it to be worth the additional code
> > maintenance burden....
>
> In my view, the efficiency gain should justify the need for this
> tool.
As I said, I have no objections to such functionality if it is /done
the right way/. I'm not arguing against providing such functionality
to users, I'm pointing out that the implementation has issues that
will cause problems for users who try to use this functionality,
and I'm trying to let you know how to implement it safely without
giving up any of the efficiency gains.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
end of thread, other threads:[~2019-05-15 22:55 UTC | newest]
Thread overview: 14+ messages
2019-05-14 18:50 [PATCH] xfs_db: add extent count and file size histograms Jorge Guerra
2019-05-14 19:52 ` Eric Sandeen
2019-05-14 20:02 ` Eric Sandeen
2019-05-15 15:57 ` Jorge Guerra
2019-05-15 16:02 ` Eric Sandeen
2019-05-14 23:31 ` Dave Chinner
2019-05-15 0:06 ` Eric Sandeen
2019-05-15 2:05 ` Dave Chinner
2019-05-15 16:39 ` Jorge Guerra
2019-05-15 22:55 ` Dave Chinner
2019-05-15 16:15 ` Jorge Guerra
2019-05-15 16:24 ` Eric Sandeen
2019-05-15 16:47 ` Jorge Guerra
2019-05-15 16:51 ` Eric Sandeen