From: Avi Kivity <avi@scylladb.com>
To: Chris Murphy <lists@colorremedies.com>
Cc: Linux-RAID <linux-raid@vger.kernel.org>
Subject: Re: raid0 vs. mkfs
Date: Mon, 28 Nov 2016 09:28:18 +0200
Message-ID: <9ce71838-0a1d-e4d8-5786-9ab0422688af@scylladb.com>
In-Reply-To: <CAJCQCtR+PnJXq9k37V1pXVO4UmW0dbhvAZYo=Pius2eCz9X_kg@mail.gmail.com>
On 11/28/2016 06:11 AM, Chris Murphy wrote:
> On Sun, Nov 27, 2016 at 8:24 AM, Avi Kivity <avi@scylladb.com> wrote:
>> mkfs /dev/md0 can take a very long time, if /dev/md0 is a very large disk
>> that supports TRIM/DISCARD (erase whichever is inappropriate)
> Trim is the appropriate term. Term discard refers to a specific mount
> time implementation of FITRIM ioctl, and fstrim refers to a user space
> tool that does the same and can be scheduled or issued manually.
That's good to know.
>> That is
>> because mkfs issues a TRIM/DISCARD (erase whichever is inappropriate) for
>> the entire partition. As far as I can tell, md converts the large
>> TRIM/DISCARD (erase whichever is inappropriate) into a large number of
>> TRIM/DISCARD (erase whichever is inappropriate) requests, one per chunk-size
>> worth of disk, and issues them to the RAID components individually.
> You could strace the mkfs command.
I did, and saw that it was running a single syscall for the entire run.
I verified in the sources that mkfs.xfs issues a single BLKDISCARD (?!)
ioctl spanning the entire device.
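For anyone who wants to see what such a call looks like, here is a minimal Python sketch of a whole-device discard. It is not the mkfs.xfs code (that is C); the helper names `discard_range_arg` and `discard_whole_device` are made up for illustration. The only parts taken from the kernel headers are the ioctl number, `_IO(0x12, 119)`, and the argument layout, a pair of 64-bit values (start, length) in bytes.

```python
import struct

# BLKDISCARD is defined in <linux/fs.h> as _IO(0x12, 119).
BLKDISCARD = (0x12 << 8) | 119  # 0x1277


def discard_range_arg(start_bytes, length_bytes):
    """Pack the uint64_t range[2] = {start, length} argument for BLKDISCARD."""
    return struct.pack("=QQ", start_bytes, length_bytes)


def discard_whole_device(path):
    """DESTRUCTIVE: discard every block of the device at `path` in one ioctl.

    Hypothetical helper, shown only to illustrate the single-call pattern.
    """
    import fcntl
    import os

    fd = os.open(path, os.O_WRONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)  # device size in bytes
        fcntl.ioctl(fd, BLKDISCARD, discard_range_arg(0, size))
    finally:
        os.close(fd)
```

The point is that user space hands the kernel one range covering the whole device; any splitting into smaller requests happens below the syscall.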
> Each filesystem is doing it a
> little differently the last time I compared mkfs.xfs and mkfs.btrfs;
> but I can't qualify the differences relative to how the device is
> going to react to those commands.
>
> It's also possible to enable block device tracing and see the actual
> SCSI or ATA commands sent to a drive.
I did, and saw a ton of half-megabyte TRIMs. It's an NVMe device so not
SCSI or SATA.
Here's a sample (I only blktraced one of the members):
259,1 10 1090 0.379688898 4801 Q D 3238067200 + 1024 [mkfs.xfs]
259,1 10 1091 0.379689222 4801 G D 3238067200 + 1024 [mkfs.xfs]
259,1 10 1092 0.379690304 4801 I D 3238067200 + 1024 [mkfs.xfs]
259,1 10 1093 0.379703110 2307 D D 3238067200 + 1024 [kworker/10:1H]
259,1 1 589 0.379718918 0 C D 3231849472 + 1024 [0]
259,1 10 1094 0.379735215 4801 Q D 3238068224 + 1024 [mkfs.xfs]
259,1 10 1095 0.379735548 4801 G D 3238068224 + 1024 [mkfs.xfs]
259,1 10 1096 0.379736598 4801 I D 3238068224 + 1024 [mkfs.xfs]
259,1 10 1097 0.379753077 2307 D D 3238068224 + 1024 [kworker/10:1H]
259,1 1 590 0.379782139 0 C D 3231850496 + 1024 [0]
259,1 10 1098 0.379785399 4801 Q D 3238069248 + 1024 [mkfs.xfs]
259,1 10 1099 0.379785657 4801 G D 3238069248 + 1024 [mkfs.xfs]
259,1 10 1100 0.379786562 4801 I D 3238069248 + 1024 [mkfs.xfs]
259,1 10 1101 0.379800116 2307 D D 3238069248 + 1024 [kworker/10:1H]
259,1 10 1102 0.379829822 4801 Q D 3238070272 + 1024 [mkfs.xfs]
259,1 10 1103 0.379830156 4801 G D 3238070272 + 1024 [mkfs.xfs]
259,1 10 1104 0.379831015 4801 I D 3238070272 + 1024 [mkfs.xfs]
259,1 10 1105 0.379844120 2307 D D 3238070272 + 1024 [kworker/10:1H]
259,1 10 1106 0.379877825 4801 Q D 3238071296 + 1024 [mkfs.xfs]
259,1 10 1107 0.379878173 4801 G D 3238071296 + 1024 [mkfs.xfs]
259,1 10 1108 0.379879028 4801 I D 3238071296 + 1024 [mkfs.xfs]
259,1 1 591 0.379886451 0 C D 3231851520 + 1024 [0]
259,1 10 1109 0.379898178 2307 D D 3238071296 + 1024 [kworker/10:1H]
259,1 10 1110 0.379923982 4801 Q D 3238072320 + 1024 [mkfs.xfs]
259,1 10 1111 0.379924229 4801 G D 3238072320 + 1024 [mkfs.xfs]
259,1 10 1112 0.379925054 4801 I D 3238072320 + 1024 [mkfs.xfs]
259,1 10 1113 0.379937716 2307 D D 3238072320 + 1024 [kworker/10:1H]
259,1 1 592 0.379954380 0 C D 3231852544 + 1024 [0]
259,1 10 1114 0.379970091 4801 Q D 3238073344 + 1024 [mkfs.xfs]
259,1 10 1115 0.379970341 4801 G D 3238073344 + 1024 [mkfs.xfs]
259,1 10 1116 0.379971260 4801 I D 3238073344 + 1024 [mkfs.xfs]
259,1 10 1117 0.379984303 2307 D D 3238073344 + 1024 [kworker/10:1H]
259,1 10 1118 0.380014754 4801 Q D 3238074368 + 1024 [mkfs.xfs]
259,1 10 1119 0.380015075 4801 G D 3238074368 + 1024 [mkfs.xfs]
259,1 10 1120 0.380015903 4801 I D 3238074368 + 1024 [mkfs.xfs]
259,1 10 1121 0.380028655 2307 D D 3238074368 + 1024 [kworker/10:1H]
259,1 2 170 0.380054279 0 C D 3218706432 + 1024 [0]
259,1 10 1122 0.380060773 4801 Q D 3238075392 + 1024 [mkfs.xfs]
259,1 10 1123 0.380061024 4801 G D 3238075392 + 1024 [mkfs.xfs]
259,1 10 1124 0.380062093 4801 I D 3238075392 + 1024 [mkfs.xfs]
259,1 10 1125 0.380072940 2307 D D 3238075392 + 1024 [kworker/10:1H]
259,1 10 1126 0.380107437 4801 Q D 3238076416 + 1024 [mkfs.xfs]
259,1 10 1127 0.380107882 4801 G D 3238076416 + 1024 [mkfs.xfs]
259,1 10 1128 0.380109258 4801 I D 3238076416 + 1024 [mkfs.xfs]
259,1 10 1129 0.380123914 2307 D D 3238076416 + 1024 [kworker/10:1H]
259,1 2 171 0.380130823 0 C D 3218707456 + 1024 [0]
259,1 10 1130 0.380156971 4801 Q D 3238077440 + 1024 [mkfs.xfs]
259,1 10 1131 0.380157308 4801 G D 3238077440 + 1024 [mkfs.xfs]
259,1 10 1132 0.380158354 4801 I D 3238077440 + 1024 [mkfs.xfs]
259,1 10 1133 0.380168948 2307 D D 3238077440 + 1024 [kworker/10:1H]
259,1 2 172 0.380186647 0 C D 3218708480 + 1024 [0]
259,1 10 1134 0.380197495 4801 Q D 3238078464 + 1024 [mkfs.xfs]
259,1 10 1135 0.380197848 4801 G D 3238078464 + 1024 [mkfs.xfs]
259,1 10 1136 0.380198724 4801 I D 3238078464 + 1024 [mkfs.xfs]
259,1 10 1137 0.380202964 2307 D D 3238078464 + 1024 [kworker/10:1H]
259,1 10 1138 0.380237133 4801 Q D 3238079488 + 1024 [mkfs.xfs]
259,1 10 1139 0.380237393 4801 G D 3238079488 + 1024 [mkfs.xfs]
259,1 10 1140 0.380238333 4801 I D 3238079488 + 1024 [mkfs.xfs]
259,1 10 1141 0.380252580 2307 D D 3238079488 + 1024 [kworker/10:1H]
259,1 2 173 0.380260605 0 C D 3218709504 + 1024 [0]
259,1 10 1142 0.380283800 4801 Q D 3238080512 + 1024 [mkfs.xfs]
259,1 10 1143 0.380284158 4801 G D 3238080512 + 1024 [mkfs.xfs]
259,1 10 1144 0.380285150 4801 I D 3238080512 + 1024 [mkfs.xfs]
259,1 10 1145 0.380297127 2307 D D 3238080512 + 1024 [kworker/10:1H]
259,1 10 1146 0.380324340 4801 Q D 3238081536 + 1024 [mkfs.xfs]
259,1 10 1147 0.380324648 4801 G D 3238081536 + 1024 [mkfs.xfs]
259,1 10 1148 0.380325663 4801 I D 3238081536 + 1024 [mkfs.xfs]
259,1 2 174 0.380328083 0 C D 3218710528 + 1024 [0]
So we see these half-megabyte requests (1024 sectors of 512 bytes
each); moreover, they are issued sequentially.
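The request size and the sequential pattern can be checked mechanically from the trace. Below is a rough Python sketch that parses a few of the events quoted above, assuming the default blkparse column layout (device, CPU, sequence, timestamp, PID, action, RWBS, sector, "+", sector count, process) and 512-byte sectors. The `parse_event` helper is my own name, not part of any blktrace tooling.

```python
SECTOR_BYTES = 512  # blktrace reports offsets and lengths in 512-byte sectors

# A few events copied from the trace above, one per line.
SAMPLE = """\
259,1 10 1090 0.379688898 4801 Q D 3238067200 + 1024 [mkfs.xfs]
259,1 10 1093 0.379703110 2307 D D 3238067200 + 1024 [kworker/10:1H]
259,1 10 1094 0.379735215 4801 Q D 3238068224 + 1024 [mkfs.xfs]
259,1 10 1098 0.379785399 4801 Q D 3238069248 + 1024 [mkfs.xfs]
"""


def parse_event(line):
    """Split one blkparse line into the fields used below."""
    f = line.split()
    return {
        "time": float(f[3]),
        "action": f[5],   # Q = queued, D = dispatched, C = completed, ...
        "rwbs": f[6],     # D = discard
        "sector": int(f[7]),
        "nsectors": int(f[9]),
    }


events = [parse_event(line) for line in SAMPLE.splitlines()]
queued = [e for e in events if e["action"] == "Q" and e["rwbs"] == "D"]

# Size of each queued discard in bytes, and the distance between
# the start sectors of consecutive queued discards.
sizes = {e["nsectors"] * SECTOR_BYTES for e in queued}
gaps = [b["sector"] - a["sector"] for a, b in zip(queued, queued[1:])]

print(sizes)  # {524288}, i.e. 512 KiB per request
print(gaps)   # [1024, 1024], i.e. each request starts where the last ended
```

Each queued discard covers 512 KiB and begins exactly where the previous one ended, so this member is receiving one contiguous range cut into chunk-sized pieces.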
> There's a metric f tonne of bugs in this area so before anything I'd
> consider researching if there's a firmware update for your hardware
> and applying that and retesting.
I don't have access to that machine any more (I could get some with a
bit of trouble). But I think it's clear from the traces that the
problem is in the RAID layer?
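To make the suspected behaviour concrete, here is a toy Python model of a striped array cutting one large discard into per-chunk requests. It is only a sketch under simplifying assumptions (equal-size members, a single zone, a 512 KiB chunk to match the trace); it is not the md raid0 code, and `chunk_discards` is a hypothetical name.

```python
def chunk_discards(start, length, chunk_bytes, members):
    """Yield (member_index, member_offset, nbytes) for each chunk-sized piece.

    Toy RAID0 mapping: chunk N of the array lives on member N % members,
    at offset (N // members) * chunk_bytes within that member.
    """
    end = start + length
    pos = start
    while pos < end:
        chunk_no = pos // chunk_bytes
        in_chunk = pos % chunk_bytes
        nbytes = min(chunk_bytes - in_chunk, end - pos)
        member = chunk_no % members
        member_off = (chunk_no // members) * chunk_bytes + in_chunk
        yield member, member_off, nbytes
        pos += nbytes


CHUNK = 512 * 1024

# Four chunks over two members alternate between the members.
print(list(chunk_discards(0, 4 * CHUNK, CHUNK, 2)))
# [(0, 0, 524288), (1, 0, 524288), (0, 524288, 524288), (1, 524288, 524288)]

# One GiB of array space already turns into 2048 separate requests.
print(sum(1 for _ in chunk_discards(0, 1 << 30, CHUNK, 2)))  # 2048
```

The request count grows linearly with the array size, which would fit with mkfs taking a very long time on a very large device.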
> And then also after testing your
> ideal deployed version, use something much close to upstream (Arch or
> Fedora) and see if the problem is reproducible.
I'm hoping the RAID maintainers can confirm at a glance whether the
problem exists. It doesn't look like a minor glitch; rather, this
code path simply doesn't take the issue into account.