* nvme: split bios issued in reverse order @ 2022-05-23 16:16 Jonathan Nicklin 2022-05-24 12:58 ` Sagi Grimberg 0 siblings, 1 reply; 13+ messages in thread From: Jonathan Nicklin @ 2022-05-23 16:16 UTC (permalink / raw) To: linux-nvme There seems to be an inconsistency in the order of writes that are issued after splitting a bio. Ordering depends on how the application write is submitted and the number of I/O queues configured. In our testing nvme/tcp, a 128K write issued with fio/pvsync is split into four 32K I/Os (the target maximum data transfer size is set to 32K, and max_sectors_kb is therefore 32K). As expected, the four write I/Os are issued to the target in sequential order. However, if the 128K write is issued using fio/libaio, the four 32K writes are issued in reverse order: fio-8098 [001] ..... 254009.711080: nvme_setup_cmd: nvme1: disk=nvme1c1n1, qid=2, cmdid=16468, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_write slba=192, len=63, ctrl=0x0, dsmgmt=0, reftag=0) fio-8098 [001] ..... 254009.711083: nvme_setup_cmd: nvme1: disk=nvme1c1n1, qid=2, cmdid=16467, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_write slba=128, len=63, ctrl=0x0, dsmgmt=0, reftag=0) fio-8098 [001] ..... 254009.711084: nvme_setup_cmd: nvme1: disk=nvme1c1n1, qid=2, cmdid=16466, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_write slba=64, len=63, ctrl=0x0, dsmgmt=0, reftag=0) fio-8098 [001] ..... 254009.711085: nvme_setup_cmd: nvme1: disk=nvme1c1n1, qid=2, cmdid=16465, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_write slba=0, len=63, ctrl=0x0, dsmgmt=0, reftag=0) Further investigation found that if the number of I/Os queues is limited to 1 at connect time, the issue order is sequential for both pwritev and libaio. I've spent some time tracing through the bio/blk_mq code and can't seem to find what might be causing the difference in behavior. Can anyone confirm that this is expected or desired behavior? Thanks, -Jonathan ^ permalink raw reply [flat|nested] 13+ messages in thread
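For readers following the trace, here is a minimal sketch of the arithmetic behind the four commands. It assumes a 512-byte logical block size (not stated in the thread; inferred from slba advancing by 64 per 32K command) and that the len field in the trace is the zero-based NVMe block count. The function name split_write is hypothetical and only illustrates the split, not the kernel's bio-splitting code.

```python
def split_write(offset_bytes, length_bytes, max_bytes, lba_size=512):
    """Return (slba, len) pairs in ascending LBA order for one large write.

    Assumptions (not from the thread): 512-byte logical blocks and a
    zero-based block count, matching the len=63 seen for 32K commands.
    """
    cmds = []
    end = offset_bytes + length_bytes
    while offset_bytes < end:
        chunk = min(max_bytes, end - offset_bytes)
        slba = offset_bytes // lba_size      # starting logical block
        nlb = chunk // lba_size - 1          # zero-based number of blocks
        cmds.append((slba, nlb))
        offset_bytes += chunk
    return cmds


# A 128K write with max_sectors_kb=32 yields four 32K commands.
cmds = split_write(0, 128 * 1024, 32 * 1024)
print(cmds)  # [(0, 63), (64, 63), (128, 63), (192, 63)]
```

The trace above shows the same four (slba, len) pairs, only issued from slba=192 down to slba=0.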
* Re: nvme: split bios issued in reverse order 2022-05-23 16:16 nvme: split bios issued in reverse order Jonathan Nicklin @ 2022-05-24 12:58 ` Sagi Grimberg 2022-05-24 13:25 ` Jonathan Nicklin 0 siblings, 1 reply; 13+ messages in thread From: Sagi Grimberg @ 2022-05-24 12:58 UTC (permalink / raw) To: Jonathan Nicklin, linux-nvme > There seems to be an inconsistency in the order of writes that are > issued after splitting a bio. Ordering depends on how the application > write is submitted and the number of I/O queues configured. > > In our testing nvme/tcp, Is this specific to nvme-tcp? > a 128K write issued with fio/pvsync is this specific to the io engine? > is split > into four 32K I/Os (the target maximum data transfer size is set to > 32K, and max_sectors_kb is therefore 32K). As expected, the four write > I/Os are issued to the target in sequential order. However, if the > 128K write is issued using fio/libaio, the four 32K writes are issued > in reverse order: > > fio-8098 [001] ..... 254009.711080: nvme_setup_cmd: nvme1: > disk=nvme1c1n1, qid=2, cmdid=16468, nsid=1, flags=0x0, meta=0x0, > cmd=(nvme_cmd_write slba=192, len=63, ctrl=0x0, dsmgmt=0, reftag=0) > > fio-8098 [001] ..... 254009.711083: nvme_setup_cmd: nvme1: > disk=nvme1c1n1, qid=2, cmdid=16467, nsid=1, flags=0x0, meta=0x0, > cmd=(nvme_cmd_write slba=128, len=63, ctrl=0x0, dsmgmt=0, reftag=0) > > fio-8098 [001] ..... 254009.711084: nvme_setup_cmd: nvme1: > disk=nvme1c1n1, qid=2, cmdid=16466, nsid=1, flags=0x0, meta=0x0, > cmd=(nvme_cmd_write slba=64, len=63, ctrl=0x0, dsmgmt=0, reftag=0) > > fio-8098 [001] ..... 254009.711085: nvme_setup_cmd: nvme1: > disk=nvme1c1n1, qid=2, cmdid=16465, nsid=1, flags=0x0, meta=0x0, > cmd=(nvme_cmd_write slba=0, len=63, ctrl=0x0, dsmgmt=0, reftag=0) > > Further investigation found that if the number of I/Os queues is > limited to 1 at connect time, Is this specific to a single I/O queue? > the issue order is sequential for both > pwritev and libaio. 
I'm assuming that this is 100% repeatable? > > I've spent some time tracing through the bio/blk_mq code and > can't seem to find what might be causing the difference in > behavior. Can anyone confirm that this is expected or desired > behavior? What is the controller mdts? does the 32k go in-capsule? or does the controller send r2t? Also, if we assume that this is indeed the case, is this a fundamental issue? ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: nvme: split bios issued in reverse order 2022-05-24 12:58 ` Sagi Grimberg @ 2022-05-24 13:25 ` Jonathan Nicklin 2022-05-24 16:08 ` Keith Busch 0 siblings, 1 reply; 13+ messages in thread From: Jonathan Nicklin @ 2022-05-24 13:25 UTC (permalink / raw) To: Sagi Grimberg; +Cc: linux-nvme On Tue, May 24, 2022 at 8:58 AM Sagi Grimberg <sagi@grimberg.me> wrote: > > > > There seems to be an inconsistency in the order of writes that are > > issued after splitting a bio. Ordering depends on how the application > > write is submitted and the number of I/O queues configured. > > > > In our testing nvme/tcp, > > Is this specific to nvme-tcp? No. This is not specific to nvme-tcp. I confirmed the same behavior directly to a pci device. > > > a 128K write issued with fio/pvsync > > is this specific to the io engine? Yes. With ioengine=libaio, the IOs are reversed. With ioengine=pvsync the IOs are sequential. > > > is split > > into four 32K I/Os (the target maximum data transfer size is set to > > 32K, and max_sectors_kb is therefore 32K). As expected, the four write > > I/Os are issued to the target in sequential order. However, if the > > 128K write is issued using fio/libaio, the four 32K writes are issued > > in reverse order: > > > > fio-8098 [001] ..... 254009.711080: nvme_setup_cmd: nvme1: > > disk=nvme1c1n1, qid=2, cmdid=16468, nsid=1, flags=0x0, meta=0x0, > > cmd=(nvme_cmd_write slba=192, len=63, ctrl=0x0, dsmgmt=0, reftag=0) > > > > fio-8098 [001] ..... 254009.711083: nvme_setup_cmd: nvme1: > > disk=nvme1c1n1, qid=2, cmdid=16467, nsid=1, flags=0x0, meta=0x0, > > cmd=(nvme_cmd_write slba=128, len=63, ctrl=0x0, dsmgmt=0, reftag=0) > > > > fio-8098 [001] ..... 254009.711084: nvme_setup_cmd: nvme1: > > disk=nvme1c1n1, qid=2, cmdid=16466, nsid=1, flags=0x0, meta=0x0, > > cmd=(nvme_cmd_write slba=64, len=63, ctrl=0x0, dsmgmt=0, reftag=0) > > > > fio-8098 [001] ..... 
254009.711085: nvme_setup_cmd: nvme1: > > disk=nvme1c1n1, qid=2, cmdid=16465, nsid=1, flags=0x0, meta=0x0, > > cmd=(nvme_cmd_write slba=0, len=63, ctrl=0x0, dsmgmt=0, reftag=0) > > > > Further investigation found that if the number of I/Os queues is > > limited to 1 at connect time, > > Is this specific to a single I/O queue? With ioengine=libaio && queues > 1, the IOs are issued in reverse order. With ioengine=libaio && queues == 1, the IOs are in sequential order. > > > the issue order is sequential for both > > pwritev and libaio. > > I'm assuming that this is 100% repeatable? Yes. 100% repeatable. > > > > > I've spent some time tracing through the bio/blk_mq code and > > can't seem to find what might be causing the difference in > > behavior. Can anyone confirm that this is expected or desired > > behavior? > > What is the controller mdts? does the 32k go in-capsule? or does > the controller send r2t? mdts=32K, io capsule size=32K, no R2T > > > Also, if we assume that this is indeed the case, is this a fundamental > issue? Maybe it is fundamental since it occurs for both PCI and TCP devices? The part that I can't reconcile is why there is a difference in behavior for ioengine=libaio when multiple queues are present. It feels like it has something to do with the interaction with bio splitting and plugging. Here are a couple more details: - you can reproduce it on a PCI device by setting max_sectors_kb to 32 - the reversed issue order does not occur if the submitted IO is a read. I'm happy to run additional testing to shed more light on the behavior. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: nvme: split bios issued in reverse order 2022-05-24 13:25 ` Jonathan Nicklin @ 2022-05-24 16:08 ` Keith Busch 2022-05-24 16:12 ` Jonathan Nicklin 0 siblings, 1 reply; 13+ messages in thread From: Keith Busch @ 2022-05-24 16:08 UTC (permalink / raw) To: Jonathan Nicklin; +Cc: Sagi Grimberg, linux-nvme On Tue, May 24, 2022 at 09:25:20AM -0400, Jonathan Nicklin wrote: > On Tue, May 24, 2022 at 8:58 AM Sagi Grimberg <sagi@grimberg.me> wrote: > > > > > > > There seems to be an inconsistency in the order of writes that are > > > issued after splitting a bio. Ordering depends on how the application > > > write is submitted and the number of I/O queues configured. > > > > > > In our testing nvme/tcp, > > > > Is this specific to nvme-tcp? > > No. This is not specific to nvme-tcp. I confirmed the same behavior > directly to a pci device. I just tried this on a qemu pci device with MDTS set to '3' for 32k splits. Both psync and libaio dispatch the submitted bio's in ascending LBA order. I'm using kernel 5.18 with these fio commands: # fio --name=global --filename=/dev/nvme0n1 --bs=128k --direct=1 --size=128k --rw=read --ioengine=psync --name=job # fio --name=global --filename=/dev/nvme0n1 --bs=128k --direct=1 --size=128k --rw=read --ioengine=libaio --name=job The first command produced this trace: fio-438 .. 57.799950: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=3, cmdid=29184, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_read slba=0, len=63, ctrl=0x0, dsmgmt=0, reftag=0) fio-438 .. 57.799995: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=3, cmdid=513, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_read slba=64, len=63, ctrl=0x0, dsmgmt=0, reftag=0) fio-438 .. 57.800002: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=3, cmdid=514, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_read slba=128, len=63, ctrl=0x0, dsmgmt=0, reftag=0) fio-438 .. 
57.800009: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=3, cmdid=515, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_read slba=192, len=63, ctrl=0x0, dsmgmt=0, reftag=0) fio-438 .. 57.800017: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=3, cmdid=516, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_read slba=256, len=63, ctrl=0x0, dsmgmt=0, reftag=0) fio-438 .. 57.800023: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=3, cmdid=517, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_read slba=320, len=63, ctrl=0x0, dsmgmt=0, reftag=0) fio-438 .. 57.800029: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=3, cmdid=518, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_read slba=384, len=63, ctrl=0x0, dsmgmt=0, reftag=0) fio-438 .. 57.800046: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=3, cmdid=519, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_read slba=448, len=63, ctrl=0x0, dsmgmt=0, reftag=0) And the second command produced this: fio-445 .. 83.440824: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=10, cmdid=32896, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_read slba=0, len=63, ctrl=0x0, dsmgmt=0, reftag=0) fio-445 .. 83.440870: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=10, cmdid=28801, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_read slba=64, len=63, ctrl=0x0, dsmgmt=0, reftag=0) fio-445 .. 83.440920: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=10, cmdid=24706, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_read slba=128, len=63, ctrl=0x0, dsmgmt=0, reftag=0) fio-445 .. 83.440931: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=10, cmdid=20611, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_read slba=192, len=63, ctrl=0x0, dsmgmt=0, reftag=0) fio-445 .. 83.441246: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=10, cmdid=36992, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_read slba=256, len=63, ctrl=0x0, dsmgmt=0, reftag=0) fio-445 .. 83.441259: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=10, cmdid=32897, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_read slba=320, len=63, ctrl=0x0, dsmgmt=0, reftag=0) fio-445 .. 
83.441287: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=10, cmdid=28802, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_read slba=384, len=63, ctrl=0x0, dsmgmt=0, reftag=0) fio-445 .. 83.441348: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=10, cmdid=41088, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_read slba=448, len=63, ctrl=0x0, dsmgmt=0, reftag=0) ^ permalink raw reply [flat|nested] 13+ messages in thread
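As a side note on "MDTS set to '3' for 32k splits": in the NVMe specification, MDTS is a power-of-two exponent in units of the controller's minimum memory page size (CAP.MPSMIN), and a value of 0 means no limit is reported. The sketch below assumes a 4 KiB minimum page size, which the thread does not state; the helper name mdts_to_bytes is hypothetical.

```python
def mdts_to_bytes(mdts, min_page_size=4096):
    """Convert an NVMe MDTS value to a maximum transfer size in bytes.

    Assumption: min_page_size is the controller's minimum memory page
    size (4 KiB here, not stated in the thread). MDTS == 0 means the
    controller reports no transfer size limit, so None is returned.
    """
    if mdts == 0:
        return None
    return (1 << mdts) * min_page_size


print(mdts_to_bytes(3))          # 32768
print(mdts_to_bytes(3) // 1024)  # 32
```

Under that assumption, MDTS=3 gives the same 32K limit that Jonathan sets through max_sectors_kb.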
* Re: nvme: split bios issued in reverse order 2022-05-24 16:08 ` Keith Busch @ 2022-05-24 16:12 ` Jonathan Nicklin 2022-05-24 19:25 ` Keith Busch 0 siblings, 1 reply; 13+ messages in thread From: Jonathan Nicklin @ 2022-05-24 16:12 UTC (permalink / raw) To: Keith Busch; +Cc: Sagi Grimberg, linux-nvme On Tue, May 24, 2022 at 12:08 PM Keith Busch <kbusch@kernel.org> wrote: > > On Tue, May 24, 2022 at 09:25:20AM -0400, Jonathan Nicklin wrote: > > On Tue, May 24, 2022 at 8:58 AM Sagi Grimberg <sagi@grimberg.me> wrote: > > > > > > > > > > There seems to be an inconsistency in the order of writes that are > > > > issued after splitting a bio. Ordering depends on how the application > > > > write is submitted and the number of I/O queues configured. > > > > > > > > In our testing nvme/tcp, > > > > > > Is this specific to nvme-tcp? > > > > No. This is not specific to nvme-tcp. I confirmed the same behavior > > directly to a pci device. > > I just tried this on a qemu pci device with MDTS set to '3' for 32k splits. > Both psync and libaio dispatch the submitted bio's in ascending LBA order. I'm > using kernel 5.18 with these fio commands: > > # fio --name=global --filename=/dev/nvme0n1 --bs=128k --direct=1 --size=128k --rw=read --ioengine=psync --name=job > # fio --name=global --filename=/dev/nvme0n1 --bs=128k --direct=1 --size=128k --rw=read --ioengine=libaio --name=job > > The first command produced this trace: > > fio-438 .. 57.799950: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=3, cmdid=29184, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_read slba=0, len=63, ctrl=0x0, dsmgmt=0, reftag=0) > fio-438 .. 57.799995: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=3, cmdid=513, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_read slba=64, len=63, ctrl=0x0, dsmgmt=0, reftag=0) > fio-438 .. 57.800002: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=3, cmdid=514, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_read slba=128, len=63, ctrl=0x0, dsmgmt=0, reftag=0) > fio-438 .. 
57.800009: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=3, cmdid=515, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_read slba=192, len=63, ctrl=0x0, dsmgmt=0, reftag=0) > fio-438 .. 57.800017: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=3, cmdid=516, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_read slba=256, len=63, ctrl=0x0, dsmgmt=0, reftag=0) > fio-438 .. 57.800023: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=3, cmdid=517, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_read slba=320, len=63, ctrl=0x0, dsmgmt=0, reftag=0) > fio-438 .. 57.800029: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=3, cmdid=518, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_read slba=384, len=63, ctrl=0x0, dsmgmt=0, reftag=0) > fio-438 .. 57.800046: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=3, cmdid=519, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_read slba=448, len=63, ctrl=0x0, dsmgmt=0, reftag=0) > > And the second command produced this: > > fio-445 .. 83.440824: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=10, cmdid=32896, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_read slba=0, len=63, ctrl=0x0, dsmgmt=0, reftag=0) > fio-445 .. 83.440870: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=10, cmdid=28801, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_read slba=64, len=63, ctrl=0x0, dsmgmt=0, reftag=0) > fio-445 .. 83.440920: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=10, cmdid=24706, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_read slba=128, len=63, ctrl=0x0, dsmgmt=0, reftag=0) > fio-445 .. 83.440931: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=10, cmdid=20611, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_read slba=192, len=63, ctrl=0x0, dsmgmt=0, reftag=0) > fio-445 .. 83.441246: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=10, cmdid=36992, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_read slba=256, len=63, ctrl=0x0, dsmgmt=0, reftag=0) > fio-445 .. 83.441259: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=10, cmdid=32897, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_read slba=320, len=63, ctrl=0x0, dsmgmt=0, reftag=0) > fio-445 .. 
83.441287: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=10, cmdid=28802, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_read slba=384, len=63, ctrl=0x0, dsmgmt=0, reftag=0) > fio-445 .. 83.441348: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=10, cmdid=41088, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_read slba=448, len=63, ctrl=0x0, dsmgmt=0, reftag=0) The command lines you have are for read operations. The behavior seems only to appear with writes. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: nvme: split bios issued in reverse order
  2022-05-24 16:12 ` Jonathan Nicklin
@ 2022-05-24 19:25 ` Keith Busch
  2022-05-24 19:37 ` Jonathan Nicklin
  0 siblings, 1 reply; 13+ messages in thread
From: Keith Busch @ 2022-05-24 19:25 UTC (permalink / raw)
To: Jonathan Nicklin; +Cc: Sagi Grimberg, linux-nvme

On Tue, May 24, 2022 at 12:12:29PM -0400, Jonathan Nicklin wrote:
>
> The command lines you have are for read operations. The behavior seems
> only to appear with writes.

Huh, I'll be darn...

I think it's because the writes are plugged and the reads are not. The plug
appends requests to the head of the plug list, and unplugging will dispatch the
requests in the reverse order.

^ permalink raw reply [flat|nested] 13+ messages in thread
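The mechanism Keith describes can be illustrated with a toy model. This is only a sketch of head insertion followed by a front-to-back flush, not the kernel's plug implementation; the class Plug and its methods add and flush are hypothetical names.

```python
class Plug:
    """Toy model of a plug list where each new request goes to the head."""

    def __init__(self):
        self.requests = []  # index 0 is the head of the list

    def add(self, req):
        # Head insertion: the newest request ends up first in the list.
        self.requests.insert(0, req)

    def flush(self):
        # "Unplug": hand the requests over walking from head to tail.
        issued, self.requests = self.requests, []
        return issued


plug = Plug()
for slba in (0, 64, 128, 192):  # split bios arrive in ascending LBA order
    plug.add(slba)

print(plug.flush())  # [192, 128, 64, 0]
```

The output matches the descending slba order in the reported write traces. A path that dispatches each request immediately, without plugging, would keep the ascending order.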
* Re: nvme: split bios issued in reverse order
  2022-05-24 19:25 ` Keith Busch
@ 2022-05-24 19:37 ` Jonathan Nicklin
  2022-05-24 20:29 ` Keith Busch
  0 siblings, 1 reply; 13+ messages in thread
From: Jonathan Nicklin @ 2022-05-24 19:37 UTC (permalink / raw)
To: Keith Busch; +Cc: Sagi Grimberg, linux-nvme

On Tue, May 24, 2022 at 3:25 PM Keith Busch <kbusch@kernel.org> wrote:
>
> On Tue, May 24, 2022 at 12:12:29PM -0400, Jonathan Nicklin wrote:
> >
> > The command lines you have are for read operations. The behavior seems
> > only to appear with writes.
>
> Huh, I'll be darn...
>
> I think it's because the writes are plugged and the reads are not. The plug
> appends requests to the head of the plug list, and unplugging will dispatch the
> requests in the reverse order.

Thanks for confirming! That's about where I got to. Do you have any
ideas on what might explain the difference in behavior between
fio/pvsync and fio/libaio? And, why does this not seem to occur when
only one nvme queue is present? Perhaps the in-order cases are an
indication of not being plugged?

^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: nvme: split bios issued in reverse order
  2022-05-24 19:37 ` Jonathan Nicklin
@ 2022-05-24 20:29 ` Keith Busch
  2022-05-24 20:32 ` Jonathan Nicklin
  0 siblings, 1 reply; 13+ messages in thread
From: Keith Busch @ 2022-05-24 20:29 UTC (permalink / raw)
To: Jonathan Nicklin; +Cc: Sagi Grimberg, linux-nvme

On Tue, May 24, 2022 at 03:37:56PM -0400, Jonathan Nicklin wrote:
> On Tue, May 24, 2022 at 3:25 PM Keith Busch <kbusch@kernel.org> wrote:
> >
> > On Tue, May 24, 2022 at 12:12:29PM -0400, Jonathan Nicklin wrote:
> > >
> > > The command lines you have are for read operations. The behavior seems
> > > only to appear with writes.
> >
> > Huh, I'll be darn...
> >
> > I think it's because the writes are plugged and the reads are not. The plug
> > appends requests to the head of the plug list, and unplugging will dispatch the
> > requests in the reverse order.
>
> Thanks for confirming! That's about where I got to. Do you have any
> ideas on what might explain the difference in behavior between
> fio/pvsync and fio/libaio? And, why does this not seem to occur when
> only one nvme queue is present? Perhaps the in-order cases are an
> indication of not being plugged?

I actually didn't see a difference between libaio or psync, and also io_uring.
They all plugged and reversed the dispatch order. Do you have a scheduler
enabled?

^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: nvme: split bios issued in reverse order
  2022-05-24 20:29 ` Keith Busch
@ 2022-05-24 20:32 ` Jonathan Nicklin
  2022-05-24 21:44 ` Damien Le Moal
  0 siblings, 1 reply; 13+ messages in thread
From: Jonathan Nicklin @ 2022-05-24 20:32 UTC (permalink / raw)
To: Keith Busch; +Cc: Sagi Grimberg, linux-nvme

On Tue, May 24, 2022 at 4:29 PM Keith Busch <kbusch@kernel.org> wrote:
>
> On Tue, May 24, 2022 at 03:37:56PM -0400, Jonathan Nicklin wrote:
> > On Tue, May 24, 2022 at 3:25 PM Keith Busch <kbusch@kernel.org> wrote:
> > >
> > > On Tue, May 24, 2022 at 12:12:29PM -0400, Jonathan Nicklin wrote:
> > > >
> > > > The command lines you have are for read operations. The behavior seems
> > > > only to appear with writes.
> > >
> > > Huh, I'll be darn...
> > >
> > > I think it's because the writes are plugged and the reads are not. The plug
> > > appends requests to the head of the plug list, and unplugging will dispatch the
> > > requests in the reverse order.
> >
> > Thanks for confirming! That's about where I got to. Do you have any
> > ideas on what might explain the difference in behavior between
> > fio/pvsync and fio/libaio? And, why does this not seem to occur when
> > only one nvme queue is present? Perhaps the in-order cases are an
> > indication of not being plugged?
>
> I actually didn't see a difference between libaio or psync, and also io_uring.
> They all plugged and reversed the dispatch order. Do you have a scheduler
> enabled?

Nope, there's no scheduler in the way.

^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: nvme: split bios issued in reverse order
  2022-05-24 20:32 ` Jonathan Nicklin
@ 2022-05-24 21:44 ` Damien Le Moal
  2022-05-24 22:01 ` Jonathan Nicklin
  0 siblings, 1 reply; 13+ messages in thread
From: Damien Le Moal @ 2022-05-24 21:44 UTC (permalink / raw)
To: Jonathan Nicklin, Keith Busch; +Cc: Sagi Grimberg, linux-nvme

On 5/25/22 05:32, Jonathan Nicklin wrote:
> On Tue, May 24, 2022 at 4:29 PM Keith Busch <kbusch@kernel.org> wrote:
>>
>> On Tue, May 24, 2022 at 03:37:56PM -0400, Jonathan Nicklin wrote:
>>> On Tue, May 24, 2022 at 3:25 PM Keith Busch <kbusch@kernel.org> wrote:
>>>>
>>>> On Tue, May 24, 2022 at 12:12:29PM -0400, Jonathan Nicklin wrote:
>>>>>
>>>>> The command lines you have are for read operations. The behavior seems
>>>>> only to appear with writes.
>>>>
>>>> Huh, I'll be darn...
>>>>
>>>> I think it's because the writes are plugged and the reads are not. The plug
>>>> appends requests to the head of the plug list, and unplugging will dispatch the
>>>> requests in the reverse order.
>>>
>>> Thanks for confirming! That's about where I got to. Do you have any
>>> ideas on what might explain the difference in behavior between
>>> fio/pvsync and fio/libaio? And, why does this not seem to occur when
>>> only one nvme queue is present? Perhaps the in-order cases are an
>>> indication of not being plugged?
>>
>> I actually didn't see a difference between libaio or psync, and also io_uring.
>> They all plugged and reversed the dispatch order. Do you have a scheduler
>> enabled?
>
> Nope, there's no scheduler in the way.

If the drive is exercised at QD=1, e.g. psync() and libaio with iodepth=1,
then plugging does not matter as there will be no merge (so no at-head
insertion in the plug). Commands will be executed in the user submission
order. At higher qd, if merge happen while plugged, then order will be
reversed (libaio with iodepth > 1 only).

>

--
Damien Le Moal
Western Digital Research

^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: nvme: split bios issued in reverse order 2022-05-24 21:44 ` Damien Le Moal @ 2022-05-24 22:01 ` Jonathan Nicklin 2022-05-24 22:54 ` Jonathan Nicklin 2022-05-25 1:18 ` Damien Le Moal 0 siblings, 2 replies; 13+ messages in thread From: Jonathan Nicklin @ 2022-05-24 22:01 UTC (permalink / raw) To: Damien Le Moal; +Cc: Keith Busch, Sagi Grimberg, linux-nvme With the drive exercised as follows: app: fio engine: libaio queue depth: 1 block size: 128K device max_sectors_kb: 32K A 128K user IO is split into four 32K I/Os which are then issued in reverse order as follows: fio-12103 [001] ..... 89587.120514: nvme_setup_cmd: nvme1: disk=nvme1c1n1, qid=2, cmdid=12328, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_write slba=192, len=63, ctrl=0x0, dsmgmt=0, reftag=0) fio-12103 [001] ..... 89587.120515: nvme_setup_cmd: nvme1: disk=nvme1c1n1, qid=2, cmdid=12327, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_write slba=128, len=63, ctrl=0x0, dsmgmt=0, reftag=0) fio-12103 [001] ..... 89587.120518: nvme_setup_cmd: nvme1: disk=nvme1c1n1, qid=2, cmdid=12326, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_write slba=64, len=63, ctrl=0x0, dsmgmt=0, reftag=0) fio-12103 [001] ..... 89587.120518: nvme_setup_cmd: nvme1: disk=nvme1c1n1, qid=2, cmdid=12325, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_write slba=0, len=63, ctrl=0x0, dsmgmt=0, reftag=0) On Tue, May 24, 2022 at 5:44 PM Damien Le Moal <damien.lemoal@opensource.wdc.com> wrote: > > On 5/25/22 05:32, Jonathan Nicklin wrote: > > On Tue, May 24, 2022 at 4:29 PM Keith Busch <kbusch@kernel.org> wrote: > >> > >> On Tue, May 24, 2022 at 03:37:56PM -0400, Jonathan Nicklin wrote: > >>> On Tue, May 24, 2022 at 3:25 PM Keith Busch <kbusch@kernel.org> wrote: > >>>> > >>>> On Tue, May 24, 2022 at 12:12:29PM -0400, Jonathan Nicklin wrote: > >>>>> > >>>>> The command lines you have are for read operations. The behavior seems > >>>>> only to appear with writes. > >>>> > >>>> Huh, I'll be darn... 
> >>>> > >>>> I think it's because the writes are plugged and the reads are not. The plug > >>>> appends requests to the head of the plug list, and unplugging will dispatch the > >>>> requests in the reverse order. > >>> > >>> Thanks for confirming! That's about where I got to. Do you have any > >>> ideas on what might explain the difference in behavior between > >>> fio/pvsync and fio/libaio? And, why does this not seem to occur when > >>> only one nvme queue is present? Perhaps the in-order cases are an > >>> indication of not being plugged? > >> > >> I actually didn't see a difference between libaio or psync, and also io_uring. > >> They all plugged and reversed the dispatch order. Do you have a scheduler > >> enabled? > > > > Nope, there's no scheduler in the way. > > If the drive is exercised at QD=1, e.g. psync() and libaio with iodepth=1, > then plugging does not matter as there will be no merge (so no at-head > insertion in the plug). Commands will be executed in the user submission > order. At higher qd, if merge happen while plugged, then order will be > reversed (libaio with iodepth > 1 only). > > > > > > -- > Damien Le Moal > Western Digital Research ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: nvme: split bios issued in reverse order 2022-05-24 22:01 ` Jonathan Nicklin @ 2022-05-24 22:54 ` Jonathan Nicklin 2022-05-25 1:18 ` Damien Le Moal 1 sibling, 0 replies; 13+ messages in thread From: Jonathan Nicklin @ 2022-05-24 22:54 UTC (permalink / raw) To: Damien Le Moal; +Cc: Keith Busch, Sagi Grimberg, linux-nvme Just to clear up confusion on the observed behavior, Here's a trace from a virtual NVME device in QEMU (thanks for the idea Keith). # CPUs >1 to ensure multiple nvme queues 4 # Kernel Linux debian 5.16.0-0.bpo.4-amd64 #1 SMP PREEMPT # Simulate effect of NVME MDTS @ 32K echo 32 > /sys/block/nvme0n1/queue/max_sectors_kb # Basic QD1 Write w/ libaio (split IOs get issued in descending LBA order) fio --name dbg --filename=/dev/nvme0n1 --rw=write --iodepth=1 --bs=128K --ioengine=libaio --direct=1 --size=1M fio-3201 [002] ..... 1070.001305: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=3, cmdid=4867, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_write slba=192, len=63, ctrl=0x0, dsmgmt=0, reftag=0) fio-3201 [002] ..... 1070.001308: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=3, cmdid=17154, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_write slba=128, len=63, ctrl=0x0, dsmgmt=0, reftag=0) fio-3201 [002] ..... 1070.001309: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=3, cmdid=29441, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_write slba=64, len=63, ctrl=0x0, dsmgmt=0, reftag=0) fio-3201 [002] ..... 1070.001310: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=3, cmdid=13056, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_write slba=0, len=63, ctrl=0x0, dsmgmt=0, reftag=0) fio-3201 [002] ..... 1070.001559: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=3, cmdid=8963, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_write slba=448, len=63, ctrl=0x0, dsmgmt=0, reftag=0) fio-3201 [002] ..... 1070.001560: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=3, cmdid=21250, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_write slba=384, len=63, ctrl=0x0, dsmgmt=0, reftag=0) fio-3201 [002] ..... 
1070.001561: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=3, cmdid=33537, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_write slba=320, len=63, ctrl=0x0, dsmgmt=0, reftag=0) fio-3201 [002] ..... 1070.001561: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=3, cmdid=17152, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_write slba=256, len=63, ctrl=0x0, dsmgmt=0, reftag=0) # Basic QD1 Write w/ pvsync (split IOs get issued in ascending LBA order) fio --name dbg --filename=/dev/nvme0n1 --rw=write --iodepth=1 --bs=128K --ioengine=pvsync --direct=1 --size=1M kworker/1:1H-139 [001] ..... 1392.956314: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=2, cmdid=33088, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_write slba=0, len=63, ctrl=0x0, dsmgmt=0, reftag=0) kworker/1:1H-139 [001] ..... 1392.956316: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=2, cmdid=33089, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_write slba=64, len=63, ctrl=0x0, dsmgmt=0, reftag=0) kworker/1:1H-139 [001] ..... 1392.956318: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=2, cmdid=16706, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_write slba=128, len=63, ctrl=0x0, dsmgmt=0, reftag=0) kworker/1:1H-139 [001] ..... 1392.956320: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=2, cmdid=4419, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_write slba=192, len=63, ctrl=0x0, dsmgmt=0, reftag=0) kworker/1:1H-139 [001] ..... 1392.956537: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=2, cmdid=37184, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_write slba=256, len=63, ctrl=0x0, dsmgmt=0, reftag=0) kworker/1:1H-139 [001] ..... 1392.956538: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=2, cmdid=37185, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_write slba=320, len=63, ctrl=0x0, dsmgmt=0, reftag=0) kworker/1:1H-139 [001] ..... 1392.956539: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=2, cmdid=20802, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_write slba=384, len=63, ctrl=0x0, dsmgmt=0, reftag=0) kworker/1:1H-139 [001] ..... 
1392.956540: nvme_setup_cmd: nvme0: disk=nvme0n1, qid=2, cmdid=8515, nsid=1, flags=0x0, meta=0x0, cmd=(nvme_cmd_write slba=448, len=63, ctrl=0x0, dsmgmt=0, reftag=0) On Tue, May 24, 2022 at 6:01 PM Jonathan Nicklin <jnicklin@blockbridge.com> wrote: > > With the drive exercised as follows: > > app: fio > engine: libaio > queue depth: 1 > block size: 128K > device max_sectors_kb: 32K > > A 128K user IO is split into four 32K I/Os which are then issued in > reverse order as follows: > > fio-12103 [001] ..... 89587.120514: nvme_setup_cmd: > nvme1: disk=nvme1c1n1, qid=2, cmdid=12328, nsid=1, flags=0x0, > meta=0x0, cmd=(nvme_cmd_write slba=192, len=63, ctrl=0x0, dsmgmt=0, > reftag=0) > fio-12103 [001] ..... 89587.120515: nvme_setup_cmd: > nvme1: disk=nvme1c1n1, qid=2, cmdid=12327, nsid=1, flags=0x0, > meta=0x0, cmd=(nvme_cmd_write slba=128, len=63, ctrl=0x0, dsmgmt=0, > reftag=0) > fio-12103 [001] ..... 89587.120518: nvme_setup_cmd: > nvme1: disk=nvme1c1n1, qid=2, cmdid=12326, nsid=1, flags=0x0, > meta=0x0, cmd=(nvme_cmd_write slba=64, len=63, ctrl=0x0, dsmgmt=0, > reftag=0) > fio-12103 [001] ..... 89587.120518: nvme_setup_cmd: > nvme1: disk=nvme1c1n1, qid=2, cmdid=12325, nsid=1, flags=0x0, > meta=0x0, cmd=(nvme_cmd_write slba=0, len=63, ctrl=0x0, dsmgmt=0, > reftag=0) > > On Tue, May 24, 2022 at 5:44 PM Damien Le Moal > <damien.lemoal@opensource.wdc.com> wrote: > > > > On 5/25/22 05:32, Jonathan Nicklin wrote: > > > On Tue, May 24, 2022 at 4:29 PM Keith Busch <kbusch@kernel.org> wrote: > > >> > > >> On Tue, May 24, 2022 at 03:37:56PM -0400, Jonathan Nicklin wrote: > > >>> On Tue, May 24, 2022 at 3:25 PM Keith Busch <kbusch@kernel.org> wrote: > > >>>> > > >>>> On Tue, May 24, 2022 at 12:12:29PM -0400, Jonathan Nicklin wrote: > > >>>>> > > >>>>> The command lines you have are for read operations. The behavior seems > > >>>>> only to appear with writes. > > >>>> > > >>>> Huh, I'll be darn... 
> > >>>> > > >>>> I think it's because the writes are plugged and the reads are not. The plug > > >>>> appends requests to the head of the plug list, and unplugging will dispatch the > > >>>> requests in the reverse order. > > >>> > > >>> Thanks for confirming! That's about where I got to. Do you have any > > >>> ideas on what might explain the difference in behavior between > > >>> fio/pvsync and fio/libaio? And, why does this not seem to occur when > > >>> only one nvme queue is present? Perhaps the in-order cases are an > > >>> indication of not being plugged? > > >> > > >> I actually didn't see a difference between libaio or psync, and also io_uring. > > >> They all plugged and reversed the dispatch order. Do you have a scheduler > > >> enabled? > > > > > > Nope, there's no scheduler in the way. > > > > If the drive is exercised at QD=1, e.g. psync() and libaio with iodepth=1, > > then plugging does not matter as there will be no merge (so no at-head > > insertion in the plug). Commands will be executed in the user submission > > order. At higher qd, if merge happen while plugged, then order will be > > reversed (libaio with iodepth > 1 only). > > > > > > > > > > > -- > > Damien Le Moal > > Western Digital Research ^ permalink raw reply [flat|nested] 13+ messages in thread
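Keith's explanation quoted above (requests are added at the head of the plug list, and unplugging dispatches them from the head) can be illustrated with a toy model. This is only a sketch in Python, not kernel code: the function name plug_and_flush is hypothetical, and the list stands in for whatever structure the block layer actually uses.

```python
# Toy model of the plug behaviour described in the thread; not kernel code.
# plug_and_flush is a hypothetical name used only for this illustration.

def plug_and_flush(requests):
    """Add each request at the head of a plug list, then dispatch head to tail."""
    plug = []
    for req in requests:
        plug.insert(0, req)  # head insertion, as described in the thread
    return list(plug)        # the flush walks the list starting at the head


# The four 32K pieces of one 128K write, identified by starting LBA.
split_slbas = [0, 64, 128, 192]
print(plug_and_flush(split_slbas))  # [192, 128, 64, 0]
```

Under this model the dispatch order matches the reversed slba sequence in the libaio trace (192, 128, 64, 0). It does not explain why some configurations dispatch in ascending order; that question is what the rest of the thread discusses.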
* Re: nvme: split bios issued in reverse order
  2022-05-24 22:01 ` Jonathan Nicklin
  2022-05-24 22:54 ` Jonathan Nicklin
@ 2022-05-25 1:18 ` Damien Le Moal
  1 sibling, 0 replies; 13+ messages in thread
From: Damien Le Moal @ 2022-05-25 1:18 UTC (permalink / raw)
To: Jonathan Nicklin; +Cc: Keith Busch, Sagi Grimberg, linux-nvme

On 5/25/22 07:01, Jonathan Nicklin wrote:
> With the drive exercised as follows:
>
> app: fio
> engine: libaio
> queue depth: 1
> block size: 128K
> device max_sectors_kb: 32K
>
> A 128K user IO is split into four 32K I/Os which are then issued in
> reverse order as follows:
>
> fio-12103 [001] ..... 89587.120514: nvme_setup_cmd:
> nvme1: disk=nvme1c1n1, qid=2, cmdid=12328, nsid=1, flags=0x0,
> meta=0x0, cmd=(nvme_cmd_write slba=192, len=63, ctrl=0x0, dsmgmt=0,
> reftag=0)
> fio-12103 [001] ..... 89587.120515: nvme_setup_cmd:
> nvme1: disk=nvme1c1n1, qid=2, cmdid=12327, nsid=1, flags=0x0,
> meta=0x0, cmd=(nvme_cmd_write slba=128, len=63, ctrl=0x0, dsmgmt=0,
> reftag=0)
> fio-12103 [001] ..... 89587.120518: nvme_setup_cmd:
> nvme1: disk=nvme1c1n1, qid=2, cmdid=12326, nsid=1, flags=0x0,
> meta=0x0, cmd=(nvme_cmd_write slba=64, len=63, ctrl=0x0, dsmgmt=0,
> reftag=0)
> fio-12103 [001] ..... 89587.120518: nvme_setup_cmd:
> nvme1: disk=nvme1c1n1, qid=2, cmdid=12325, nsid=1, flags=0x0,
> meta=0x0, cmd=(nvme_cmd_write slba=0, len=63, ctrl=0x0, dsmgmt=0,
> reftag=0)

Yep, expected. This is the same as doing libaio with iodepth=4 and bs=32K.

Your max_sectors_kb is really very small though. Can't you increase its
value? (max_hw_sectors_kb tells you the device max but other limitations
may apply, e.g.
prp vs sgl) > > On Tue, May 24, 2022 at 5:44 PM Damien Le Moal > <damien.lemoal@opensource.wdc.com> wrote: >> >> On 5/25/22 05:32, Jonathan Nicklin wrote: >>> On Tue, May 24, 2022 at 4:29 PM Keith Busch <kbusch@kernel.org> wrote: >>>> >>>> On Tue, May 24, 2022 at 03:37:56PM -0400, Jonathan Nicklin wrote: >>>>> On Tue, May 24, 2022 at 3:25 PM Keith Busch <kbusch@kernel.org> wrote: >>>>>> >>>>>> On Tue, May 24, 2022 at 12:12:29PM -0400, Jonathan Nicklin wrote: >>>>>>> >>>>>>> The command lines you have are for read operations. The behavior seems >>>>>>> only to appear with writes. >>>>>> >>>>>> Huh, I'll be darn... >>>>>> >>>>>> I think it's because the writes are plugged and the reads are not. The plug >>>>>> appends requests to the head of the plug list, and unplugging will dispatch the >>>>>> requests in the reverse order. >>>>> >>>>> Thanks for confirming! That's about where I got to. Do you have any >>>>> ideas on what might explain the difference in behavior between >>>>> fio/pvsync and fio/libaio? And, why does this not seem to occur when >>>>> only one nvme queue is present? Perhaps the in-order cases are an >>>>> indication of not being plugged? >>>> >>>> I actually didn't see a difference between libaio or psync, and also io_uring. >>>> They all plugged and reversed the dispatch order. Do you have a scheduler >>>> enabled? >>> >>> Nope, there's no scheduler in the way. >> >> If the drive is exercised at QD=1, e.g. psync() and libaio with iodepth=1, >> then plugging does not matter as there will be no merge (so no at-head >> insertion in the plug). Commands will be executed in the user submission >> order. At higher qd, if merge happen while plugged, then order will be >> reversed (libaio with iodepth > 1 only). >> >>> >> >> >> -- >> Damien Le Moal >> Western Digital Research -- Damien Le Moal Western Digital Research ^ permalink raw reply [flat|nested] 13+ messages in thread
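The split arithmetic behind the traces in this thread can be checked with a short sketch. It assumes a 512-byte LBA size (inferred from the slba values stepping by 64 for each 32K piece, not stated in the thread) and the zero-based length field shown as len=63 in nvme_setup_cmd. The helper name split_io is hypothetical.

```python
# Sketch of the split arithmetic only; split_io is a hypothetical helper and
# the 512-byte LBA size is an assumption inferred from the trace output.

def split_io(offset_bytes, length_bytes, max_bytes, lba_bytes=512):
    """Return (slba, len) pairs, with len zero-based as in the trace."""
    cmds = []
    end = offset_bytes + length_bytes
    while offset_bytes < end:
        chunk = min(max_bytes, end - offset_bytes)
        # Starting LBA of this piece, and its block count minus one.
        cmds.append((offset_bytes // lba_bytes, chunk // lba_bytes - 1))
        offset_bytes += chunk
    return cmds


# One 128K write at offset 0 with max_sectors_kb = 32.
print(split_io(0, 128 * 1024, 32 * 1024))
# [(0, 63), (64, 63), (128, 63), (192, 63)]
```

Raising the max_bytes argument to 128K or more yields a single command, which is the effect Damien's suggestion to increase max_sectors_kb would have on this workload, provided the device limit allows it.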
Thread overview: 13+ messages
2022-05-23 16:16 nvme: split bios issued in reverse order Jonathan Nicklin
2022-05-24 12:58 ` Sagi Grimberg
2022-05-24 13:25   ` Jonathan Nicklin
2022-05-24 16:08     ` Keith Busch
2022-05-24 16:12       ` Jonathan Nicklin
2022-05-24 19:25         ` Keith Busch
2022-05-24 19:37           ` Jonathan Nicklin
2022-05-24 20:29             ` Keith Busch
2022-05-24 20:32               ` Jonathan Nicklin
2022-05-24 21:44                 ` Damien Le Moal
2022-05-24 22:01                   ` Jonathan Nicklin
2022-05-24 22:54                     ` Jonathan Nicklin
2022-05-25  1:18                     ` Damien Le Moal