From: Eric Wheeler <bcache@lists.ewheeler.net>
To: Adriano Silva <adriano_da_silva@yahoo.com.br>
Cc: Keith Busch <kbusch@kernel.org>,
	Matthias Ferdinand <bcache@mfedv.net>,
	Bcache Linux <linux-bcache@vger.kernel.org>,
	Coly Li <colyli@suse.de>, Christoph Hellwig <hch@infradead.org>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>
Subject: Re: [RFC] Add sysctl option to drop disk flushes in bcache? (was: Bcache in writes direct with fsync)
Date: Wed, 1 Jun 2022 14:11:35 -0700 (PDT)	[thread overview]
Message-ID: <8a95d4f-b263-5231-537d-b1f88fdd5090@ewheeler.net> (raw)
In-Reply-To: <1295433800.3263424.1654111657911@mail.yahoo.com>


On Wed, 1 Jun 2022, Adriano Silva wrote:
> I don't know if my NVMe devices use a 4K LBA format. I don't think so. 
> They are all the same model and manufacturer. I know they accept 
> 512-byte blocks, but their latency is very high when processing blocks 
> of that size.

Ok, since they accept 512b IOs they should be safe from the possible 
bcache bug I was referring to.
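
If you want to confirm the LBA format, the logical block size is visible 
from sysfs or nvme-cli (the device name below is only an example, adjust 
to yours):

  # cat /sys/block/nvme0n1/queue/logical_block_size
  # nvme id-ns /dev/nvme0n1 --human-readable | grep "LBA Format"

The LBA format line marked "(in use)" is the active one.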

> However, in all the tests I run with 4K blocks the results are much 
> better, so I always use 4K blocks. In real life I don't expect to use 
> blocks smaller than 4K anyway.

Makes sense, format with -w 4k.  There is probably some CPU benefit to 
having page-aligned IOs, too.
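
For example, something along these lines when creating the bcache devices 
(a sketch only -- the device names are placeholders; creating the cache 
and backing device in one invocation applies the 4k block size to both):

  # make-bcache -B /dev/sdX -C /dev/nvme0n1 -w 4k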

> > You can remove the kernel interpretation using passthrough commands. Here's an
> > example comparing with and without FUA assuming a 512b logical block format:
> > 
> >   # echo "" | nvme write /dev/nvme0n1 --block-count=7 --data-size=4k --force-unit-access --latency
> >   # echo "" | nvme write /dev/nvme0n1 --block-count=7 --data-size=4k --latency
> > 
> > If you have a 4k LBA format, use "--block-count=0".
> > 
> > And you may want to run each of the above several times to get an average since
> > other factors can affect the reported latency.
> 
> I wrote a bash script that runs each of the two commands you suggested 
> repeatedly for 10 seconds, to get a more representative average. The 
> results are the following:
> 
> root@pve-21:~# for i in /sys/block/*/queue/write_cache; do echo 'write back' > $i; done
> root@pve-21:~# cat /sys/block/nvme0n1/queue/write_cache
> write back
> root@pve-21:~# ./nvme_write.sh
> Total: 10 seconds, 3027 tests. Latency (us) : min: 29  /  avr: 37   /  max: 98
> root@pve-21:~# ./nvme_write.sh --force-unit-access
> Total: 10 seconds, 2985 tests. Latency (us) : min: 29  /  avr: 37   /  max: 111
> root@pve-21:~#
> root@pve-21:~# ./nvme_write.sh --force-unit-access --block-count=0
> Total: 10 seconds, 2556 tests. Latency (us) : min: 404  /  avr: 428   /  max: 492
> root@pve-21:~# ./nvme_write.sh --block-count=0
> Total: 10 seconds, 2521 tests. Latency (us) : min: 403  /  avr: 428   /  max: 496
> root@pve-21:~#
> root@pve-21:~#
> root@pve-21:~# for i in /sys/block/*/queue/write_cache; do echo 'write through' > $i; done
> root@pve-21:~# cat /sys/block/nvme0n1/queue/write_cache
> write through
> root@pve-21:~# ./nvme_write.sh
> Total: 10 seconds, 2988 tests. Latency (us) : min: 29  /  avr: 37   /  max: 114
> root@pve-21:~# ./nvme_write.sh --force-unit-access
> Total: 10 seconds, 2926 tests. Latency (us) : min: 29  /  avr: 36   /  max: 71
> root@pve-21:~#
> root@pve-21:~# ./nvme_write.sh --force-unit-access --block-count=0
> Total: 10 seconds, 2456 tests. Latency (us) : min: 31  /  avr: 428   /  max: 496
> root@pve-21:~# ./nvme_write.sh --block-count=0
> Total: 10 seconds, 2627 tests. Latency (us) : min: 402  /  avr: 428   /  max: 509
> 
> Well, as we can see above, with almost 3k runs of each command over a 
> period of ten seconds, I got even better results than I had with 
> ioping. I also ran the commands in isolation, but wrote the bash script 
> so I could execute many commands in a short time and average them. The 
> average is about 37us in every case. Very low!
> 
> However, with the suggested --block-count=0 variant the latency is much 
> higher in every case, around 428us.
> 
> But as we can see, with the nvme command the latency is the same 
> whether or not --force-unit-access is used; the only difference is 
> between the 4K write and the --block-count=0 variant meant for 
> 4K-LBA-formatted devices (which mine apparently are not).
> 
> What do you think?

It looks like the NVMe performs well except for single 512b writes.  It's 
interesting that --force-unit-access doesn't increase the latency: perhaps 
the controller ignores the FUA/flush flags because it knows its write 
cache is non-volatile.
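
For reference, since your nvme_write.sh wasn't posted, here is a rough 
sketch of what such a timing loop might look like (it assumes a 512b LBA 
format with a 4KiB write by default, GNU date for microsecond timestamps, 
and that extra flags such as --force-unit-access are simply appended to 
the nvme write command):

  #!/bin/bash
  # Rough sketch (hypothetical; not the script used above): hammer
  # 'nvme write' for 10 seconds and report min/avg/max latency in us.
  dev=/dev/nvme0n1
  end=$((SECONDS + 10))
  count=0; total=0; min=; max=
  while [ "$SECONDS" -lt "$end" ]; do
      t0=$(date +%s%6N)                              # microseconds
      echo "" | nvme write "$dev" --block-count=7 --data-size=4k \
          "$@" > /dev/null
      t1=$(date +%s%6N)
      lat=$((t1 - t0))
      total=$((total + lat)); count=$((count + 1))
      if [ -z "$min" ] || [ "$lat" -lt "$min" ]; then min=$lat; fi
      if [ -z "$max" ] || [ "$lat" -gt "$max" ]; then max=$lat; fi
  done
  echo "Total: 10 seconds, $count tests." \
       "Latency (us) : min: $min  /  avr: $((total / count))  /  max: $max"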

-Eric

> 
> Thanks,
> 
> 
> On Monday, May 30, 2022 at 10:45:37 BRT, Keith Busch <kbusch@kernel.org> wrote: 
> 
> On Sun, May 29, 2022 at 11:50:57AM +0000, Adriano Silva wrote:
> 
> > So why the slowness? Is it just the time spent in kernel code to set 
> > FUA and Flush Cache bits on writes that would cause all this latency 
> > increment (84us to 1.89ms) ?
> 
> 
> I don't think the kernel's handling accounts for that great of a difference. I
> think the difference is probably on the controller side.
> 
> The NVMe spec says that a Write command with FUA set:
> 
> "the controller shall write that data and metadata, if any, to non-volatile
> media before indicating command completion."
> 
> So if the memory is non-volatile, it can complete the command without writing
> to the backing media. It can also commit the data to the backing media before
> completing the command if it wants to, but that's an implementation-specific
> detail.
> 
> You can remove the kernel interpretation using passthrough commands. Here's an
> example comparing with and without FUA assuming a 512b logical block format:
> 
>   # echo "" | nvme write /dev/nvme0n1 --block-count=7 --data-size=4k --force-unit-access --latency
>   # echo "" | nvme write /dev/nvme0n1 --block-count=7 --data-size=4k --latency
> 
> If you have a 4k LBA format, use "--block-count=0".
> 
> And you may want to run each of the above several times to get an average since
> other factors can affect the reported latency.
> 

