* Re: [SPDK] Why multiple threads achieve lower IOPS than a single thread in nvme?
From: Li Wang @ 2017-11-09  5:43 UTC
  To: spdk

Hi Ziye, 

Thanks for your prompt reply. I understand your general idea, but I have a couple of further questions, and I would appreciate your help.

> However, there will still be contention among the different qpairs in the firmware code.
Is there any reference about this?

> SPDK does provide qpair priority, but there will still be contention.
Could you please give more details or a link about qpair priority in SPDK? I would appreciate that.

> And in your case, do all those threads allocate memory from dedicated pools, or do all threads request from one pool? The latter case may hurt performance.
Sorry, I am new to SPDK and I don’t know exactly what the term “pool” means here. I allocated a dedicated read/write buffer for each thread using spdk_dma_zmalloc().
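Concretely, each thread does roughly the following (a minimal sketch; the size and alignment values are only illustrative):

    #include "spdk/env.h"

    /* One dedicated DMA-safe buffer per thread (illustrative size/alignment). */
    void *buf = spdk_dma_zmalloc(4096, 4096, NULL);
    if (buf == NULL) {
        /* handle allocation failure */
    }
    /* ... issue reads/writes through this thread's own qpair using buf ... */
    spdk_dma_free(buf);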

Thank you again.

Regards,
Li


* Re: [SPDK] Why multiple threads achieve lower IOPS than a single thread in nvme?
From: Yang, Ziye @ 2017-11-09  5:49 UTC
  To: spdk

From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Li Wang
Sent: Thursday, November 9, 2017 1:43 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] Why multiple threads achieve lower IOPS than a single thread in nvme?

Hi Ziye,

Thanks for your prompt reply. I understand your general idea, but I have a couple of further questions, and I would appreciate your help.

However, there will still be contention among the different qpairs in the firmware code.
Is there any reference about this?
[Ziye] I don't know the details of how the firmware code works.

SPDK does provide qpair priority, but there will still be contention.
Could you please give more details or a link about qpair priority in SPDK? I would appreciate that.
[Ziye] See:

    struct spdk_nvme_qpair *spdk_nvme_ctrlr_alloc_io_qpair(struct spdk_nvme_ctrlr *ctrlr,
                                                           const struct spdk_nvme_io_qpair_opts *opts,
                                                           size_t opts_size);

You can refer to the qprio field in struct spdk_nvme_io_qpair_opts.
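A minimal sketch of allocating a prioritized qpair (assuming ctrlr is an opened struct spdk_nvme_ctrlr *; note that qprio should only take effect when weighted round robin arbitration is enabled on the controller):

    #include "spdk/nvme.h"

    struct spdk_nvme_io_qpair_opts opts;
    struct spdk_nvme_qpair *qpair;

    /* Start from the controller's defaults, then override the priority. */
    spdk_nvme_ctrlr_get_default_io_qpair_opts(ctrlr, &opts, sizeof(opts));
    opts.qprio = SPDK_NVME_QPRIO_HIGH;

    qpair = spdk_nvme_ctrlr_alloc_io_qpair(ctrlr, &opts, sizeof(opts));
    if (qpair == NULL) {
        /* Allocation failed, e.g. the arbitration method does not support priorities. */
    }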

And in your case, do all those threads allocate memory from dedicated pools, or do all threads request from one pool? The latter case may hurt performance.
Sorry, I am new to SPDK and I don’t know exactly what the term “pool” means here. I allocated a dedicated read/write buffer for each thread using spdk_dma_zmalloc().
[Ziye] Call the spdk_mempool_create() function: you can pre-allocate a separate memory pool for each thread, and each thread then allocates memory from its own pool. For usage examples, search the SPDK code.
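A minimal sketch of the per-thread pool idea (the pool name, element count, and element size below are only illustrative):

    #include "spdk/env.h"

    /* One pool per thread; each pool needs a unique name. */
    struct spdk_mempool *pool = spdk_mempool_create("thread0_bufs",
                                                    1024,  /* elements in the pool */
                                                    4096,  /* bytes per element */
                                                    SPDK_MEMPOOL_DEFAULT_CACHE_SIZE,
                                                    SPDK_ENV_SOCKET_ID_ANY);

    void *buf = spdk_mempool_get(pool);  /* this thread takes from its own pool */
    /* ... use buf as an I/O buffer ... */
    spdk_mempool_put(pool, buf);         /* return it after the I/O completes */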

Thank you again.

Regards,
Li


* Re: [SPDK] Why multiple threads achieve lower IOPS than a single thread in nvme?
From: Yang, Ziye @ 2017-11-09  5:08 UTC
  To: spdk

Hi Li,

If you use a dedicated qpair for each thread, there will be no locking issues. However, there will still be contention among the different qpairs in the firmware code. SPDK does provide qpair priority, but there will still be contention. And in your case, do all those threads allocate memory from dedicated pools, or do all threads request from one pool? The latter case may hurt performance. Generally, with multiple threads the IOPS or performance limit should be less than or equal to the one-thread, one-queue-pair case.
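For illustration, the dedicated-qpair-per-thread pattern looks roughly like this (a sketch only; controller and namespace setup are omitted, and the LBA values and completion flag are placeholders):

    #include <stdbool.h>
    #include "spdk/nvme.h"

    static __thread bool done;  /* per-thread completion flag (placeholder) */

    static void io_complete_cb(void *cb_arg, const struct spdk_nvme_cpl *cpl)
    {
        done = true;
    }

    /* Each worker thread owns one qpair, so the submit/poll path needs no locks. */
    static void worker(struct spdk_nvme_ctrlr *ctrlr, struct spdk_nvme_ns *ns, void *buf)
    {
        struct spdk_nvme_qpair *qpair =
            spdk_nvme_ctrlr_alloc_io_qpair(ctrlr, NULL, 0);  /* default options */

        /* Submit one read of 8 LBAs starting at LBA 0 (illustrative values). */
        spdk_nvme_ns_cmd_read(ns, qpair, buf, 0, 8, io_complete_cb, NULL, 0);

        while (!done) {
            spdk_nvme_qpair_process_completions(qpair, 0);  /* poll this thread's own qpair */
        }

        spdk_nvme_ctrlr_free_io_qpair(qpair);
    }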

From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Li Wang
Sent: Thursday, November 9, 2017 12:55 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] Why multiple threads achieve lower IOPS than a single thread in nvme?

Hi Jim,

Thank you very much for your response. I appreciate it and your points are taken. I'm sorry I forgot to mention that I used a dedicated queue pair for each thread, so there wasn't any locking overhead on the queue pairs. Also, the performance of random access is quite similar to that of sequential access on my device. When the access is random, the overall IOPS achieved by multiple threads is still lower than that of a single thread. Could you suggest any other possible reasons behind this phenomenon?



Regards,
Li Wang

On 8 November 2017 at 23:28, Harris, James R <james.r.harris(a)intel.com> wrote:
Hi Li,

Sequential workloads will almost always degrade when going from 1 to 2+ cores.  This is because each thread in isolation is doing sequential I/O, but when two threads are each doing sequential I/O, the device no longer sees a sequential I/O stream – it needs to process the two queue pairs in parallel and the I/O stream will no longer be sequential.

In general, using multiple queue pairs has little to no performance difference compared to using a single queue pair.  Any performance degradation seen when moving to multiple threads/queue pairs is typically a result of changes to the observed I/O pattern by the SSD (as I just described), or locking overhead (for example, two threads sharing a single queue pair).

Regards,

-Jim





From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Li Wang <wangli1426(a)gmail.com>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>
Date: Tuesday, November 7, 2017 at 8:29 PM
To: "SPDK(a)lists.01.org" <SPDK(a)lists.01.org>
Subject: [SPDK] Why multiple threads achieve lower IOPS than a single thread in nvme?

Hi guys,

I am planning to design a new B+ tree implementation tailored to the characteristics of NVMe, and I have run extensive experiments to test NVMe performance. I found that one thread always outperforms multiple threads in terms of IOPS achieved. The same trend is seen across different workloads and queue depths. The experimental results are attached.

My guess is that multiple threads utilize multiple queue pairs, which causes contention on the hardware side. Am I right? Is there any reference about the overhead of multiple queue pairs or the contention between queue pairs?

Any help will be appreciated!

Thanks,
Li Wang

[image: Inline images 1]
[image: Inline images 2]

_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org
https://lists.01.org/mailman/listinfo/spdk



* Re: [SPDK] Why multiple threads achieve lower IOPS than a single thread in nvme?
From: Li Wang @ 2017-11-09  4:54 UTC
  To: spdk

Hi Jim,

Thank you very much for your response. I appreciate it and your points are
taken. I'm sorry I forgot to mention that I used a dedicated queue pair for
each thread, so there wasn't any locking overhead on the queue pairs. Also,
the performance of random access is quite similar to that of sequential
access on my device. When the access is random, the overall IOPS achieved by
multiple threads is still lower than that of a single thread. Could you
suggest any other possible reasons behind this phenomenon?



Regards,
Li Wang

On 8 November 2017 at 23:28, Harris, James R <james.r.harris(a)intel.com>
wrote:

> Hi Li,
>
> Sequential workloads will almost always degrade when going from 1 to 2+
> cores.  This is because each thread in isolation is doing sequential I/O,
> but when two threads are each doing sequential I/O, the device no longer
> sees a sequential I/O stream – it needs to process the two queue pairs in
> parallel and the I/O stream will no longer be sequential.
>
> In general, using multiple queue pairs has little to no performance
> difference compared to using a single queue pair.  Any performance
> degradation seen when moving to multiple threads/queue pairs is typically a
> result of changes to the observed I/O pattern by the SSD (as I just
> described), or locking overhead (for example, two threads sharing a single
> queue pair).
>
> Regards,
>
> -Jim
>
> *From: *SPDK <spdk-bounces(a)lists.01.org> on behalf of Li Wang <wangli1426(a)gmail.com>
> *Reply-To: *Storage Performance Development Kit <spdk(a)lists.01.org>
> *Date: *Tuesday, November 7, 2017 at 8:29 PM
> *To: *"SPDK(a)lists.01.org" <SPDK(a)lists.01.org>
> *Subject: *[SPDK] Why multiple threads achieve lower IOPS than a single
> thread in nvme?
>
> Hi guys,
>
> I am planning to design a new B+ tree implementation tailored to the
> characteristics of NVMe, and I have run extensive experiments to test
> NVMe performance. I found that one thread always outperforms multiple
> threads in terms of IOPS achieved. The same trend is seen across
> different workloads and queue depths. The experimental results are
> attached.
>
> My guess is that multiple threads utilize multiple queue pairs, which
> causes contention on the hardware side. Am I right? Is there any
> reference about the overhead of multiple queue pairs or the contention
> between queue pairs?
>
> Any help will be appreciated!
>
> Thanks,
> Li Wang
>
> [image: Inline images 1]
> [image: Inline images 2]
>
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk


* Re: [SPDK] Why multiple threads achieve lower IOPS than a single thread in nvme?
From: Harris, James R @ 2017-11-08 15:28 UTC
  To: spdk

Hi Li,

Sequential workloads will almost always degrade when going from 1 to 2+ cores.  This is because each thread in isolation is doing sequential I/O, but when two threads are each doing sequential I/O, the device no longer sees a sequential I/O stream – it needs to process the two queue pairs in parallel and the I/O stream will no longer be sequential.

In general, using multiple queue pairs has little to no performance difference compared to using a single queue pair.  Any performance degradation seen when moving to multiple threads/queue pairs is typically a result of changes to the observed I/O pattern by the SSD (as I just described), or locking overhead (for example, two threads sharing a single queue pair).

Regards,

-Jim





From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Li Wang <wangli1426(a)gmail.com>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>
Date: Tuesday, November 7, 2017 at 8:29 PM
To: "SPDK(a)lists.01.org" <SPDK(a)lists.01.org>
Subject: [SPDK] Why multiple threads achieve lower IOPS than a single thread in nvme?

Hi guys,

I am planning to design a new B+ tree implementation tailored to the characteristics of NVMe, and I have run extensive experiments to test NVMe performance. I found that one thread always outperforms multiple threads in terms of IOPS achieved. The same trend is seen across different workloads and queue depths. The experimental results are attached.

My guess is that multiple threads utilize multiple queue pairs, which causes contention on the hardware side. Am I right? Is there any reference about the overhead of multiple queue pairs or the contention between queue pairs?

Any help will be appreciated!

Thanks,
Li Wang

[image: Inline images 1]
[image: Inline images 2]


* [SPDK] Why multiple threads achieve lower IOPS than a single thread in nvme?
From: Li Wang @ 2017-11-08  3:29 UTC
  To: spdk

Hi guys,

I am planning to design a new B+ tree implementation tailored to the
characteristics of NVMe, and I have run extensive experiments to test
NVMe performance. I found that one thread always outperforms multiple
threads in terms of IOPS achieved. The same trend is seen across
different workloads and queue depths. The experimental results are
attached.

My guess is that multiple threads utilize multiple queue pairs, which
causes contention on the hardware side. Am I right? Is there any
reference about the overhead of multiple queue pairs or the contention
between queue pairs?

Any help will be appreciated!

Thanks,
Li Wang

[image: Inline images 1]
[image: Inline images 2]

