From: Logan Gunthorpe <logang@deltatee.com>
To: Xiao Ni <xni@redhat.com>
Cc: linux-raid@vger.kernel.org, Jes Sorensen <jes@trained-monkey.org>,
	Guoqing Jiang <guoqing.jiang@linux.dev>,
	Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>,
	Coly Li <colyli@suse.de>,
	Chaitanya Kulkarni <chaitanyak@nvidia.com>,
	Jonmichael Hands <jm@chia.net>,
	Stephen Bates <sbates@raithlin.com>,
	Martin Oliveira <Martin.Oliveira@eideticom.com>,
	David Sloan <David.Sloan@eideticom.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>
Subject: Re: [PATCH mdadm v4 0/7] Write Zeroes option for Creating Arrays
Date: Wed, 12 Oct 2022 10:59:45 -0600
Message-ID: <8ee5368c-1808-d2bc-9ad2-2f8332d2704e@deltatee.com>
In-Reply-To: <CALTww28HQUPbB647oP9WKvkLX=9PqZv+9am-884zZVM923H-KA@mail.gmail.com>

@ccing Martin, hoping he has an opinion on the write-zeroes interface.

On 2022-10-11 19:09, Xiao Ni wrote:
> Hi Logan
> 
> I did a test with the patchset. There is a problem like this:
> 
> mdadm -CR /dev/md0 -l5 -n3 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme0n1 --write-zero
> mdadm: zeroing data from 135266304 to 960061505536 on: /dev/nvme1n1
> mdadm: zeroing data from 135266304 to 960061505536 on: /dev/nvme2n1
> mdadm: zeroing data from 135266304 to 960061505536 on: /dev/nvme0n1
> 
> I pressed ctrl+c while waiting, and after that the raid can't be created
> anymore, because the processes that write zeroes to the NVMe devices are stuck.
> 
> ps auxf | grep mdadm
> root       68764  0.0  0.0   9216  1104 pts/0    S+   21:09   0:00  \_ grep --color=auto mdadm
> root       68633  0.1  0.0  27808   336 pts/0    D    21:04   0:00 mdadm -CR /dev/md0 -l5 -n3 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme0n1 --write-zero
> root       68634  0.2  0.0  27808   336 pts/0    D    21:04   0:00 mdadm -CR /dev/md0 -l5 -n3 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme0n1 --write-zero
> root       68635  0.0  0.0  27808   336 pts/0    D    21:04   0:00 mdadm -CR /dev/md0 -l5 -n3 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme0n1 --write-zero

Yes, this is because the fallocate() call that the child processes use
to write zeroes submits a large number of bios in the kernel and then
waits with submit_bio_wait(), which is non-interruptible. So when the
child processes get the SIGINT, they will not stop until the
fallocate() call completes, which will be pretty much after the entire
disk is zeroed. So if you are zeroing a very large disk, those processes
will stick around for several minutes after the parent process
terminates, though they do go away eventually.
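A simplified sketch of the pattern described above, assuming each per-device child zeroes its whole range with a single fallocate(FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE) call. The helper names (zero_range_single, spawn_zeroing_child, wait_for_child) and the exact mode flags are hypothetical and not taken from the patch; the point is only that the whole disk is covered by one syscall, so the child sits in uninterruptible sleep (the "D" state in the ps output) until that call returns.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>      /* fallocate(), FALLOC_FL_* (glibc, needs _GNU_SOURCE) */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical: ONE fallocate() over the whole range [start, end).
 * On a block device the kernel submits the zeroing bios and waits for
 * all of them before returning, so a SIGINT that arrives meanwhile is
 * only acted on once this call completes.
 */
static int zero_range_single(int fd, off_t start, off_t end)
{
	assert(start <= end);
	return fallocate(fd, FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE,
			 start, end - start);
}

/* Fork one child per device; the child exits 0 on success, 1 on failure. */
static pid_t spawn_zeroing_child(int fd, off_t start, off_t end)
{
	pid_t pid = fork();

	if (pid == 0)
		_exit(zero_range_single(fd, start, end) == 0 ? 0 : 1);
	return pid; /* -1 if fork() failed */
}

/* Reap one child: its exit code, or -1 on error or death by signal. */
static int wait_for_child(pid_t pid)
{
	int status;

	while (waitpid(pid, &status, 0) < 0)
		if (errno != EINTR)
			return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With the default SIGINT disposition, ctrl+c terminates the parent right away, while each child is only killed when its fallocate() returns to user space, which matches the behaviour Xiao observed.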

There aren't many great solutions for this:

1) We could install a signal handler in the parent so it sticks around
until the zeroing is complete. This would mean mdadm could not be
terminated while the zeroing is in progress, and the user would have to wait.

2) We could split the single fallocate() call into multiple smaller calls
that together zero the entire disk. This would let ctrl-c take effect
sooner; however, it's not clear what the best chunk size would be. Even
zeroing 1GB can take a few seconds, but the smaller we go, the less
efficient it will be if the block layer and devices ever get write-zeroes
optimized in the same way discard has been optimized. (With NVMe, discard
only requires a single command to handle the entire disk, whereas
write-zeroes requires a minimum of one command per 2MB of data to zero.)
I was hoping write-zeroes could be made faster in the future, at least
for NVMe.
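Option 2 could look roughly like the following sketch. The 1 GiB chunk size, the helper names, and the approach of a SIGINT handler that only sets a flag which the loop polls between chunks are all assumptions for illustration; the thread does not settle any of them, and the fallocate() mode flags are likewise assumed rather than copied from the patch.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>      /* fallocate(), FALLOC_FL_* (glibc, needs _GNU_SOURCE) */
#include <signal.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical chunk size; choosing it is exactly the open question. */
#define ZERO_CHUNK_BYTES ((off_t)1 << 30) /* 1 GiB */

/* Set asynchronously by the SIGINT handler, polled between chunks. */
static volatile sig_atomic_t zero_interrupted;

static void on_sigint(int sig)
{
	(void)sig;
	zero_interrupted = 1;
}

/* Install on_sigint for SIGINT (no SA_RESTART). Returns 0 on success. */
static int install_sigint_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigint;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGINT, &sa, NULL);
}

/* Length of the next chunk starting at pos, clamped so it never passes end. */
static off_t next_chunk_len(off_t pos, off_t end, off_t chunk)
{
	return (end - pos < chunk) ? end - pos : chunk;
}

/*
 * Zero [start, end) on fd in chunks of at most 'chunk' bytes.
 * Returns 0 on success, 1 if SIGINT was seen between chunks,
 * -1 if fallocate() failed (errno is left set).
 */
static int zero_range_chunked(int fd, off_t start, off_t end, off_t chunk)
{
	assert(chunk > 0 && start <= end);

	for (off_t pos = start; pos < end; ) {
		off_t len;

		if (zero_interrupted)
			return 1;
		len = next_chunk_len(pos, end, chunk);
		if (fallocate(fd, FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE,
			      pos, len) < 0) {
			if (errno == EINTR)
				continue; /* re-check the flag, then retry */
			return -1;
		}
		pos += len;
	}
	return 0;
}
```

The worst-case ctrl-c latency is then roughly the time to zero one chunk. Treating the numbers in Xiao's log as byte offsets, each disk covers 959,926,239,232 bytes, so a 1 GiB chunk means 895 fallocate() calls (894 full chunks plus a 1 MiB tail), while at one command per 2 MiB the device would see about 457,729 write-zeroes commands whether or not the range is chunked.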

Thoughts?

Logan


Thread overview: 16+ messages
2022-10-07 20:10 [PATCH mdadm v4 0/7] Write Zeroes option for Creating Arrays Logan Gunthorpe
2022-10-07 20:10 ` [PATCH mdadm v4 1/7] Create: goto abort_locked instead of return 1 in error path Logan Gunthorpe
2022-10-07 20:10 ` [PATCH mdadm v4 2/7] Create: remove safe_mode_delay local variable Logan Gunthorpe
2022-10-07 20:10 ` [PATCH mdadm v4 3/7] Create: Factor out add_disks() helpers Logan Gunthorpe
2022-10-07 20:10 ` [PATCH mdadm v4 4/7] mdadm: Introduce pr_info() Logan Gunthorpe
2022-10-07 20:10 ` [PATCH mdadm v4 5/7] mdadm: Add --write-zeros option for Create Logan Gunthorpe
2022-10-07 20:10 ` [PATCH mdadm v4 6/7] tests/00raid5-zero: Introduce test to exercise --write-zeros Logan Gunthorpe
2022-10-07 20:10 ` [PATCH mdadm v4 7/7] manpage: Add --write-zeroes option to manpage Logan Gunthorpe
2022-10-12  1:09 ` [PATCH mdadm v4 0/7] Write Zeroes option for Creating Arrays Xiao Ni
2022-10-12 16:59   ` Logan Gunthorpe [this message]
2022-10-13  1:33     ` Martin K. Petersen
2022-10-13  7:51       ` Xiao Ni
2022-10-26  2:41         ` Martin K. Petersen
2022-10-27  8:44           ` Xiao Ni
2022-11-16 17:11       ` Logan Gunthorpe
2022-11-03  8:14 ` Kinga Tanska
