linux-nvme.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] nvmet: use try_cmpxchg in nvmet_update_sq_head
@ 2022-10-20 15:35 Uros Bizjak
  2022-10-23  8:22 ` Sagi Grimberg
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Uros Bizjak @ 2022-10-20 15:35 UTC (permalink / raw)
  To: linux-nvme, linux-kernel
  Cc: Uros Bizjak, Christoph Hellwig, Sagi Grimberg, Chaitanya Kulkarni

Use try_cmpxchg instead of cmpxchg (*ptr, old, new) == old in
nvmet_update_sq_head.  x86 CMPXCHG instruction returns success in ZF flag, so
this change saves a compare after cmpxchg (and related move instruction in
front of cmpxchg).

Also, try_cmpxchg implicitly assigns old *ptr value to "old" when cmpxchg
fails. There is no need to re-read the value in the loop.

Note that the value from *ptr should be read using READ_ONCE to prevent
the compiler from merging, refetching or reordering the read.

No functional change intended.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
---
 drivers/nvme/target/core.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index 14677145bbba..a384a0927aed 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -695,11 +695,10 @@ static void nvmet_update_sq_head(struct nvmet_req *req)
 	if (req->sq->size) {
 		u32 old_sqhd, new_sqhd;
 
+		old_sqhd = READ_ONCE(req->sq->sqhd);
 		do {
-			old_sqhd = req->sq->sqhd;
 			new_sqhd = (old_sqhd + 1) % req->sq->size;
-		} while (cmpxchg(&req->sq->sqhd, old_sqhd, new_sqhd) !=
-					old_sqhd);
+		} while (!try_cmpxchg(&req->sq->sqhd, &old_sqhd, new_sqhd));
 	}
 	req->cqe->sq_head = cpu_to_le16(req->sq->sqhd & 0x0000FFFF);
 }
-- 
2.37.3



^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] nvmet: use try_cmpxchg in nvmet_update_sq_head
  2022-10-20 15:35 [PATCH] nvmet: use try_cmpxchg in nvmet_update_sq_head Uros Bizjak
@ 2022-10-23  8:22 ` Sagi Grimberg
  2022-10-25  0:47 ` Chaitanya Kulkarni
  2022-11-01  9:45 ` Christoph Hellwig
  2 siblings, 0 replies; 5+ messages in thread
From: Sagi Grimberg @ 2022-10-23  8:22 UTC (permalink / raw)
  To: Uros Bizjak, linux-nvme, linux-kernel
  Cc: Christoph Hellwig, Chaitanya Kulkarni

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] nvmet: use try_cmpxchg in nvmet_update_sq_head
  2022-10-20 15:35 [PATCH] nvmet: use try_cmpxchg in nvmet_update_sq_head Uros Bizjak
  2022-10-23  8:22 ` Sagi Grimberg
@ 2022-10-25  0:47 ` Chaitanya Kulkarni
  2022-10-25  6:48   ` Uros Bizjak
  2022-11-01  9:45 ` Christoph Hellwig
  2 siblings, 1 reply; 5+ messages in thread
From: Chaitanya Kulkarni @ 2022-10-25  0:47 UTC (permalink / raw)
  To: Uros Bizjak, linux-nvme, linux-kernel
  Cc: Christoph Hellwig, Sagi Grimberg, Chaitanya Kulkarni

On 10/20/22 08:35, Uros Bizjak wrote:
> Use try_cmpxchg instead of cmpxchg (*ptr, old, new) == old in
> nvmet_update_sq_head.  x86 CMPXCHG instruction returns success in ZF flag, so
> this change saves a compare after cmpxchg (and related move instruction in
> front of cmpxchg).
> 

Is it worth a share delts of assembly instructions of the changes above?
as developers on block mailing list are sharing the delta between before
and after patch including the assembly.

I also hope that you have tested this change with blktests nvme.

Either way:-

Reviewed-by: ChaItanya Kulkarni <kch@nvidia.com>

-ck


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] nvmet: use try_cmpxchg in nvmet_update_sq_head
  2022-10-25  0:47 ` Chaitanya Kulkarni
@ 2022-10-25  6:48   ` Uros Bizjak
  0 siblings, 0 replies; 5+ messages in thread
From: Uros Bizjak @ 2022-10-25  6:48 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: linux-nvme, linux-kernel, Christoph Hellwig, Sagi Grimberg

On Tue, Oct 25, 2022 at 2:47 AM Chaitanya Kulkarni
<chaitanyak@nvidia.com> wrote:
>
> On 10/20/22 08:35, Uros Bizjak wrote:
> > Use try_cmpxchg instead of cmpxchg (*ptr, old, new) == old in
> > nvmet_update_sq_head.  x86 CMPXCHG instruction returns success in ZF flag, so
> > this change saves a compare after cmpxchg (and related move instruction in
> > front of cmpxchg).
> >
>
> Is it worth a share delts of assembly instructions of the changes above?
> as developers on block mailing list are sharing the delta between before
> and after patch including the assembly.

The difference in the assembly of nvmet_update_sq_head function is:

before:

0000000000001d30 <nvmet_update_sq_head>:
    1d30:    48 8b 4f 10              mov    0x10(%rdi),%rcx
    1d34:    49 89 f8                 mov    %rdi,%r8
    1d37:    0f b7 71 1a              movzwl 0x1a(%rcx),%esi
    1d3b:    66 85 f6                 test   %si,%si
    1d3e:    75 14                    jne    1d54 <nvmet_update_sq_head+0x24>
    1d40:    49 8b 40 08              mov    0x8(%r8),%rax
    1d44:    8b 51 1c                 mov    0x1c(%rcx),%edx
    1d47:    66 89 50 08              mov    %dx,0x8(%rax)
    1d4b:    e9 00 00 00 00           jmpq   1d50 <nvmet_update_sq_head+0x20>
            1d4c: R_X86_64_PLT32    __x86_return_thunk-0x4
    1d50:    0f b7 71 1a              movzwl 0x1a(%rcx),%esi
    1d54:    8b 79 1c                 mov    0x1c(%rcx),%edi
    1d57:    31 d2                    xor    %edx,%edx
    1d59:    8d 47 01                 lea    0x1(%rdi),%eax
    1d5c:    f7 f6                    div    %esi
    1d5e:    89 f8                    mov    %edi,%eax
    1d60:    f0 0f b1 51 1c           lock cmpxchg %edx,0x1c(%rcx)
    1d65:    49 8b 48 10              mov    0x10(%r8),%rcx
    1d69:    39 c7                    cmp    %eax,%edi
    1d6b:    75 e3                    jne    1d50 <nvmet_update_sq_head+0x20>
    1d6d:    49 8b 40 08              mov    0x8(%r8),%rax
    1d71:    8b 51 1c                 mov    0x1c(%rcx),%edx
    1d74:    66 89 50 08              mov    %dx,0x8(%rax)
    1d78:    e9 00 00 00 00           jmpq   1d7d <nvmet_update_sq_head+0x4d>
            1d79: R_X86_64_PLT32    __x86_return_thunk-0x4
    1d7d:

after:

0000000000001d30 <nvmet_update_sq_head>:
    1d30:    48 8b 4f 10              mov    0x10(%rdi),%rcx
    1d34:    0f b7 51 1a              movzwl 0x1a(%rcx),%edx
    1d38:    66 85 d2                 test   %dx,%dx
    1d3b:    74 1e                    je     1d5b <nvmet_update_sq_head+0x2b>
    1d3d:    8b 71 1c                 mov    0x1c(%rcx),%esi
    1d40:    44 0f b7 c2              movzwl %dx,%r8d
    1d44:    8d 46 01                 lea    0x1(%rsi),%eax
    1d47:    31 d2                    xor    %edx,%edx
    1d49:    41 f7 f0                 div    %r8d
    1d4c:    89 f0                    mov    %esi,%eax
    1d4e:    f0 0f b1 51 1c           lock cmpxchg %edx,0x1c(%rcx)
    1d53:    48 8b 4f 10              mov    0x10(%rdi),%rcx
    1d57:    89 c6                    mov    %eax,%esi
    1d59:    75 10                    jne    1d6b <nvmet_update_sq_head+0x3b>
    1d5b:    48 8b 47 08              mov    0x8(%rdi),%rax
    1d5f:    8b 51 1c                 mov    0x1c(%rcx),%edx
    1d62:    66 89 50 08              mov    %dx,0x8(%rax)
    1d66:    e9 00 00 00 00           jmpq   1d6b <nvmet_update_sq_head+0x3b>
            1d67: R_X86_64_PLT32    __x86_return_thunk-0x4
    1d6b:    0f b7 51 1a              movzwl 0x1a(%rcx),%edx
    1d6f:    eb cf                    jmp    1d40 <nvmet_update_sq_head+0x10>
    1d71:

You can see that in addition to the smaller size of the function, the
load of req->sq->size at 1d6b got moved to a cold path. As the main
benefit, the load at 1d3d is now out of the loop, and the value in
%esi is now provided by cmpxchg insn itself at 1d4e (plus move at
1d57). Unfortunately, the division clobbers %eax, so some reg-reg
moves are necessary. Note also that the compare at 1d69 is now gone.

> I also hope that you have tested this change with blktests nvme.

No, I didn't test the patch that thoroughly, but the change is the
same as some similar recent changes in the generic code, so I
confirmed the patch by inspecting asm code. OTOH, the kernel booted
from a NVME SSD.

> Either way:-
>
> Reviewed-by: ChaItanya Kulkarni <kch@nvidia.com>

Thanks!

Uros.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] nvmet: use try_cmpxchg in nvmet_update_sq_head
  2022-10-20 15:35 [PATCH] nvmet: use try_cmpxchg in nvmet_update_sq_head Uros Bizjak
  2022-10-23  8:22 ` Sagi Grimberg
  2022-10-25  0:47 ` Chaitanya Kulkarni
@ 2022-11-01  9:45 ` Christoph Hellwig
  2 siblings, 0 replies; 5+ messages in thread
From: Christoph Hellwig @ 2022-11-01  9:45 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: linux-nvme, linux-kernel, Christoph Hellwig, Sagi Grimberg,
	Chaitanya Kulkarni

Thanks,

applied to nvme-6.2.


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2022-11-01  9:45 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-20 15:35 [PATCH] nvmet: use try_cmpxchg in nvmet_update_sq_head Uros Bizjak
2022-10-23  8:22 ` Sagi Grimberg
2022-10-25  0:47 ` Chaitanya Kulkarni
2022-10-25  6:48   ` Uros Bizjak
2022-11-01  9:45 ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).