linux-kernel.vger.kernel.org archive mirror
* [PATCH] block/mq: blk map queues by core id
@ 2019-03-22 10:09 luferry
  2019-03-22 11:53 ` Dongli Zhang
  2019-03-22 11:58 ` Dongli Zhang
  0 siblings, 2 replies; 9+ messages in thread
From: luferry @ 2019-03-22 10:09 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Ming Lei, Christoph Hellwig, xuyun, linux-block, linux-kernel

In a virtual machine environment, the CPU topology may differ from that of a
typical physical server.
For example (machine with 4 cores, 2 threads per core):

normal physical server:
core-id   thread-0-id  thread-1-id
0         0            4
1         1            5
2         2            6
3         3            7

virtual machine:
core-id   thread-0-id  thread-1-id
0         0            1
1         2            3
2         4            5
3         6            7

When a disk with two queues is attached, every even-numbered CPU is mapped to
queue 0, because the queue index is derived from the CPU number. In the
virtual machine each CPU is immediately followed by its sibling, so before
this patch every odd-numbered CPU inherits its sibling's mapping and also ends
up on queue 0. This causes a serious imbalance and hurts system I/O
performance.

So allocate the CPU map by core id instead, which spreads the CPUs evenly
across the queues.

Signed-off-by: luferry <luferry@163.com>
---
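For illustration only (not part of the patch): a minimal user-space sketch of
the mapping logic, assuming the VM topology above, two queues, and a queue
index derived as cpu % nr_queues, shows the before/after spread.

/* Stand-alone sketch, not kernel code: 8 CPUs, siblings adjacent, 2 queues. */
#include <stdio.h>

#define NR_CPUS   8
#define NR_QUEUES 2

/* In this VM topology, siblings are (0,1), (2,3), (4,5), (6,7). */
static unsigned int first_sibling(unsigned int cpu)
{
	return cpu & ~1U;
}

int main(void)
{
	unsigned int map_old[NR_CPUS], map_new[NR_CPUS];
	unsigned int cpu, core = 0;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (first_sibling(cpu) == cpu) {
			map_old[cpu] = cpu % NR_QUEUES;    /* old: index by cpu id */
			map_new[cpu] = core++ % NR_QUEUES; /* new: index by core id */
		} else {
			map_old[cpu] = map_old[first_sibling(cpu)];
			map_new[cpu] = map_new[first_sibling(cpu)];
		}
	}

	/* old: every cpu lands on queue 0; new: cpus 0,1,4,5 -> 0, cpus 2,3,6,7 -> 1 */
	for (cpu = 0; cpu < NR_CPUS; cpu++)
		printf("cpu %u: old queue %u, new queue %u\n",
		       cpu, map_old[cpu], map_new[cpu]);
	return 0;
}
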
 block/blk-mq-cpumap.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
index 03a534820271..4125e8e77679 100644
--- a/block/blk-mq-cpumap.c
+++ b/block/blk-mq-cpumap.c
@@ -35,7 +35,7 @@ int blk_mq_map_queues(struct blk_mq_queue_map *qmap)
 {
 	unsigned int *map = qmap->mq_map;
 	unsigned int nr_queues = qmap->nr_queues;
-	unsigned int cpu, first_sibling;
+	unsigned int cpu, first_sibling, core = 0;
 
 	for_each_possible_cpu(cpu) {
 		/*
@@ -48,9 +48,10 @@ int blk_mq_map_queues(struct blk_mq_queue_map *qmap)
 			map[cpu] = cpu_to_queue_index(qmap, nr_queues, cpu);
 		} else {
 			first_sibling = get_first_sibling(cpu);
-			if (first_sibling == cpu)
-				map[cpu] = cpu_to_queue_index(qmap, nr_queues, cpu);
-			else
+			if (first_sibling == cpu) {
+				map[cpu] = cpu_to_queue_index(qmap, nr_queues, core);
+				core++;
+			} else
 				map[cpu] = map[first_sibling];
 		}
 	}
-- 
2.14.1.40.g8e62ba1



* Re: [PATCH] block/mq: blk map queues by core id
  2019-03-22 10:09 [PATCH] block/mq: blk map queues by core id luferry
@ 2019-03-22 11:53 ` Dongli Zhang
  2019-03-22 11:58 ` Dongli Zhang
  1 sibling, 0 replies; 9+ messages in thread
From: Dongli Zhang @ 2019-03-22 11:53 UTC (permalink / raw)
  To: luferry
  Cc: Jens Axboe, Ming Lei, Christoph Hellwig, linux-block, linux-kernel



On 03/22/2019 06:09 PM, luferry wrote:
> under virtual machine environment, cpu topology may differ from normal

Would you mind sharing the name of the virtual machine monitor, the command
line if available, and which device is used to reproduce this?

For instance, I am not able to reproduce it with qemu nvme or virtio-blk, as I
assume they use the pci- or virtio-specific mapper to establish the mapping.

E.g., with qemu and nvme: -smp 8,sockets=1,cores=4,threads=2

Indeed I use three queues instead of two, as one is reserved for admin.

# ls /sys/block/nvme0n1/mq/*
/sys/block/nvme0n1/mq/0:
cpu0  cpu1  cpu2  cpu3  cpu_list  nr_reserved_tags  nr_tags

/sys/block/nvme0n1/mq/1:
cpu4  cpu5  cpu6  cpu7  cpu_list  nr_reserved_tags  nr_tags

> physical server.
> for example (machine with 4 cores, 2 threads per core):
> 
> normal physical server:
> core-id   thread-0-id  thread-1-id
> 0         0            4
> 1         1            5
> 2         2            6
> 3         3            7
> 
> virtual machine:
> core-id   thread-0-id  thread-1-id
> 0         0            1
> 1         2            3
> 2         4            5
> 3         6            7
> 
> When attach disk with two queues, all the even numbered cpus will be
> mapped to queue 0. Under virtual machine, all the cpus is followed by
> its sibling cpu.Before this patch, all the odd numbered cpus will also
> be mapped to queue 0, can cause serious imbalance.this will lead to
> performance impact on system IO
> 
> So suggest to allocate cpu map by core id, this can be more currency
> 
> Signed-off-by: luferry <luferry@163.com>
> ---
>  block/blk-mq-cpumap.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
> index 03a534820271..4125e8e77679 100644
> --- a/block/blk-mq-cpumap.c
> +++ b/block/blk-mq-cpumap.c
> @@ -35,7 +35,7 @@ int blk_mq_map_queues(struct blk_mq_queue_map *qmap)
>  {
>  	unsigned int *map = qmap->mq_map;
>  	unsigned int nr_queues = qmap->nr_queues;
> -	unsigned int cpu, first_sibling;
> +	unsigned int cpu, first_sibling, core = 0;
>  
>  	for_each_possible_cpu(cpu) {
>  		/*
> @@ -48,9 +48,10 @@ int blk_mq_map_queues(struct blk_mq_queue_map *qmap)
>  			map[cpu] = cpu_to_queue_index(qmap, nr_queues, cpu);
>  		} else {
>  			first_sibling = get_first_sibling(cpu);
> -			if (first_sibling == cpu)
> -				map[cpu] = cpu_to_queue_index(qmap, nr_queues, cpu);
> -			else
> +			if (first_sibling == cpu) {
> +				map[cpu] = cpu_to_queue_index(qmap, nr_queues, core);
> +				core++;
> +			} else
>  				map[cpu] = map[first_sibling];
>  		}
>  	}
> 


* Re: [PATCH] block/mq: blk map queues by core id
  2019-03-22 10:09 [PATCH] block/mq: blk map queues by core id luferry
  2019-03-22 11:53 ` Dongli Zhang
@ 2019-03-22 11:58 ` Dongli Zhang
  2019-03-23  6:34   ` luferry
  1 sibling, 1 reply; 9+ messages in thread
From: Dongli Zhang @ 2019-03-22 11:58 UTC (permalink / raw)
  To: luferry
  Cc: Jens Axboe, Ming Lei, Christoph Hellwig, linux-block, linux-kernel



On 03/22/2019 06:09 PM, luferry wrote:
> under virtual machine environment, cpu topology may differ from normal
> physical server.

Would you mind sharing the name of the virtual machine monitor, the command
line if available, and which device is used to reproduce this?

For instance, I am not able to reproduce it with qemu nvme or virtio-blk, as I
assume they use the pci- or virtio-specific mapper to establish the mapping.

E.g., with qemu and nvme: -smp 8,sockets=1,cores=4,threads=2

Indeed I use three queues instead of two, as one is reserved for admin.

# ls /sys/block/nvme0n1/mq/*
/sys/block/nvme0n1/mq/0:
cpu0  cpu1  cpu2  cpu3  cpu_list  nr_reserved_tags  nr_tags

/sys/block/nvme0n1/mq/1:
cpu4  cpu5  cpu6  cpu7  cpu_list  nr_reserved_tags  nr_tags


Thank you very much!

Dongli Zhang

> for example (machine with 4 cores, 2 threads per core):
> 
> normal physical server:
> core-id   thread-0-id  thread-1-id
> 0         0            4
> 1         1            5
> 2         2            6
> 3         3            7
> 
> virtual machine:
> core-id   thread-0-id  thread-1-id
> 0         0            1
> 1         2            3
> 2         4            5
> 3         6            7
> 
> When attach disk with two queues, all the even numbered cpus will be
> mapped to queue 0. Under virtual machine, all the cpus is followed by
> its sibling cpu.Before this patch, all the odd numbered cpus will also
> be mapped to queue 0, can cause serious imbalance.this will lead to
> performance impact on system IO
> 
> So suggest to allocate cpu map by core id, this can be more currency
> 
> Signed-off-by: luferry <luferry@163.com>
> ---
>  block/blk-mq-cpumap.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
> index 03a534820271..4125e8e77679 100644
> --- a/block/blk-mq-cpumap.c
> +++ b/block/blk-mq-cpumap.c
> @@ -35,7 +35,7 @@ int blk_mq_map_queues(struct blk_mq_queue_map *qmap)
>  {
>  	unsigned int *map = qmap->mq_map;
>  	unsigned int nr_queues = qmap->nr_queues;
> -	unsigned int cpu, first_sibling;
> +	unsigned int cpu, first_sibling, core = 0;
>  
>  	for_each_possible_cpu(cpu) {
>  		/*
> @@ -48,9 +48,10 @@ int blk_mq_map_queues(struct blk_mq_queue_map *qmap)
>  			map[cpu] = cpu_to_queue_index(qmap, nr_queues, cpu);
>  		} else {
>  			first_sibling = get_first_sibling(cpu);
> -			if (first_sibling == cpu)
> -				map[cpu] = cpu_to_queue_index(qmap, nr_queues, cpu);
> -			else
> +			if (first_sibling == cpu) {
> +				map[cpu] = cpu_to_queue_index(qmap, nr_queues, core);
> +				core++;
> +			} else
>  				map[cpu] = map[first_sibling];
>  		}
>  	}
> 


* Re:Re: [PATCH] block/mq: blk map queues by core id
  2019-03-22 11:58 ` Dongli Zhang
@ 2019-03-23  6:34   ` luferry
  2019-03-23 11:14     ` Dongli Zhang
  0 siblings, 1 reply; 9+ messages in thread
From: luferry @ 2019-03-23  6:34 UTC (permalink / raw)
  To: Dongli Zhang
  Cc: Jens Axboe, Ming Lei, Christoph Hellwig, linux-block, linux-kernel


I just bought one VM, so I cannot access the hypervisor. I will try to build the environment on my desktop.
I am sure about a few things, though.
The hypervisor is KVM and the disk driver is virtio-blk.
[root@blk-mq ~]# dmesg |grep KVM
[    0.000000] Hypervisor detected: KVM
[    0.186330] Booting paravirtualized kernel on KVM
[    0.279106] KVM setup async PF for cpu 0
[    0.420819] KVM setup async PF for cpu 1
[    0.421682] KVM setup async PF for cpu 2
[    0.422113] KVM setup async PF for cpu 3
[    0.422434] KVM setup async PF for cpu 4
[    0.422434] KVM setup async PF for cpu 5
[    0.423312] KVM setup async PF for cpu 6
[    0.423394] KVM setup async PF for cpu 7
[    0.424125] KVM setup async PF for cpu 8
[    0.424414] KVM setup async PF for cpu 9
[    0.424415] KVM setup async PF for cpu 10
[    0.425329] KVM setup async PF for cpu 11
[    0.425420] KVM setup async PF for cpu 12
[    0.426156] KVM setup async PF for cpu 13
[    0.426431] KVM setup async PF for cpu 14
[    0.426431] KVM setup async PF for cpu 15
[root@blk-mq ~]# lspci |grep block
00:05.0 SCSI storage controller: Red Hat, Inc. Virtio block device
00:06.0 SCSI storage controller: Red Hat, Inc. Virtio block device

[root@blk-mq ~]# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                16
On-line CPU(s) list:   0-15
Thread(s) per core:    2
Core(s) per socket:    8

[root@blk-mq ~]# ls /sys/block/vdb/mq/
0  1  2  3

[root@blk-mq ~]# cat /proc/cpuinfo |egrep 'processor|core id'
processor	: 0
core id		: 0
processor	: 1
core id		: 0
processor	: 2
core id		: 1
processor	: 3
core id		: 1
processor	: 4
core id		: 2
processor	: 5
core id		: 2
processor	: 6
core id		: 3
processor	: 7
core id		: 3
processor	: 8
core id		: 4
processor	: 9
core id		: 4
processor	: 10
core id		: 5
processor	: 11
core id		: 5
processor	: 12
core id		: 6
processor	: 13
core id		: 6
processor	: 14
core id		: 7
processor	: 15
core id		: 7

--before this patch--
[root@blk-mq ~]# cat /sys/block/vdb/mq/*/cpu_list
0, 4, 5, 8, 9, 12, 13
1
2, 6, 7, 10, 11, 14, 15
3

--after this patch--
[root@blk-mq ~]# cat /sys/block/vdb/mq/*/cpu_list
0, 4, 5, 12, 13
1, 6, 7, 14, 15
2, 8, 9
3, 10, 11


I added a dump_stack() call inside blk_mq_map_queues(); the resulting trace is below, and a sketch of the debug hunk follows it.

[    1.378680] Call Trace:
[    1.378690]  dump_stack+0x5a/0x73
[    1.378695]  blk_mq_map_queues+0x29/0xb0
[    1.378700]  blk_mq_alloc_tag_set+0x1bd/0x2d0
[    1.378705]  virtblk_probe+0x1ae/0x8e4 [virtio_blk]
[    1.378709]  virtio_dev_probe+0x18a/0x230 [virtio]
[    1.378713]  really_probe+0x215/0x3f0
[    1.378716]  driver_probe_device+0x115/0x130
[    1.378718]  device_driver_attach+0x50/0x60
[    1.378720]  __driver_attach+0xbd/0x140
[    1.378722]  ? device_driver_attach+0x60/0x60
[    1.378724]  bus_for_each_dev+0x67/0xc0
[    1.378727]  ? klist_add_tail+0x57/0x70
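
For reference, the trace above came from a debug-only hunk roughly like the one
below; it is not part of the patch and the exact placement inside
blk_mq_map_queues() is illustrative.

--- a/block/blk-mq-cpumap.c
+++ b/block/blk-mq-cpumap.c
@@ ... @@ int blk_mq_map_queues(struct blk_mq_queue_map *qmap)
 	unsigned int cpu, first_sibling;
 
+	/* debug only: show how we ended up in the generic mapper */
+	dump_stack();
+
 	for_each_possible_cpu(cpu) {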


At 2019-03-22 19:58:08, "Dongli Zhang" <dongli.zhang@oracle.com> wrote:
>
>
>On 03/22/2019 06:09 PM, luferry wrote:
>> under virtual machine environment, cpu topology may differ from normal
>> physical server.
>
>Would mind share the name of virtual machine monitor, the command line if
>available, and which device to reproduce.
>
>For instance, I am not able to reproduce with qemu nvme or virtio-blk as I
>assume they use pci or virtio specific mapper to establish the mapping.
>
>E.g., with qemu and nvme: -smp 8,sockets=1,cores=4,threads=2
>
>Indeed I use three queues instead of twp as one is reserved for admin.
>
># ls /sys/block/nvme0n1/mq/*
>/sys/block/nvme0n1/mq/0:
>cpu0  cpu1  cpu2  cpu3  cpu_list  nr_reserved_tags  nr_tags
>
>/sys/block/nvme0n1/mq/1:
>cpu4  cpu5  cpu6  cpu7  cpu_list  nr_reserved_tags  nr_tags
>
>
>Thank you very much!
>
>Dongli Zhang
>
>> for example (machine with 4 cores, 2 threads per core):
>> 
>> normal physical server:
>> core-id   thread-0-id  thread-1-id
>> 0         0            4
>> 1         1            5
>> 2         2            6
>> 3         3            7
>> 
>> virtual machine:
>> core-id   thread-0-id  thread-1-id
>> 0         0            1
>> 1         2            3
>> 2         4            5
>> 3         6            7
>> 
>> When attach disk with two queues, all the even numbered cpus will be
>> mapped to queue 0. Under virtual machine, all the cpus is followed by
>> its sibling cpu.Before this patch, all the odd numbered cpus will also
>> be mapped to queue 0, can cause serious imbalance.this will lead to
>> performance impact on system IO
>> 
>> So suggest to allocate cpu map by core id, this can be more currency
>> 
>> Signed-off-by: luferry <luferry@163.com>
>> ---
>>  block/blk-mq-cpumap.c | 9 +++++----
>>  1 file changed, 5 insertions(+), 4 deletions(-)
>> 
>> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
>> index 03a534820271..4125e8e77679 100644
>> --- a/block/blk-mq-cpumap.c
>> +++ b/block/blk-mq-cpumap.c
>> @@ -35,7 +35,7 @@ int blk_mq_map_queues(struct blk_mq_queue_map *qmap)
>>  {
>>  	unsigned int *map = qmap->mq_map;
>>  	unsigned int nr_queues = qmap->nr_queues;
>> -	unsigned int cpu, first_sibling;
>> +	unsigned int cpu, first_sibling, core = 0;
>>  
>>  	for_each_possible_cpu(cpu) {
>>  		/*
>> @@ -48,9 +48,10 @@ int blk_mq_map_queues(struct blk_mq_queue_map *qmap)
>>  			map[cpu] = cpu_to_queue_index(qmap, nr_queues, cpu);
>>  		} else {
>>  			first_sibling = get_first_sibling(cpu);
>> -			if (first_sibling == cpu)
>> -				map[cpu] = cpu_to_queue_index(qmap, nr_queues, cpu);
>> -			else
>> +			if (first_sibling == cpu) {
>> +				map[cpu] = cpu_to_queue_index(qmap, nr_queues, core);
>> +				core++;
>> +			} else
>>  				map[cpu] = map[first_sibling];
>>  		}
>>  	}
>> 


* Re: [PATCH] block/mq: blk map queues by core id
  2019-03-23  6:34   ` luferry
@ 2019-03-23 11:14     ` Dongli Zhang
  2019-03-25  9:49       ` luferry
  0 siblings, 1 reply; 9+ messages in thread
From: Dongli Zhang @ 2019-03-23 11:14 UTC (permalink / raw)
  To: luferry
  Cc: Jens Axboe, Ming Lei, Christoph Hellwig, linux-block, linux-kernel



On 03/23/2019 02:34 PM, luferry wrote:
> 
> I just bought one vm , so i cannot access to hypervisor. I will try to build the environment on my desktop.
> I'm sure about something.
> The hypervisor is KVM and disk driver is virtio-blk.
> [root@blk-mq ~]# dmesg |grep KVM
> [    0.000000] Hypervisor detected: KVM
> [    0.186330] Booting paravirtualized kernel on KVM
> [    0.279106] KVM setup async PF for cpu 0
> [    0.420819] KVM setup async PF for cpu 1
> [    0.421682] KVM setup async PF for cpu 2
> [    0.422113] KVM setup async PF for cpu 3
> [    0.422434] KVM setup async PF for cpu 4
> [    0.422434] KVM setup async PF for cpu 5
> [    0.423312] KVM setup async PF for cpu 6
> [    0.423394] KVM setup async PF for cpu 7
> [    0.424125] KVM setup async PF for cpu 8
> [    0.424414] KVM setup async PF for cpu 9
> [    0.424415] KVM setup async PF for cpu 10
> [    0.425329] KVM setup async PF for cpu 11
> [    0.425420] KVM setup async PF for cpu 12
> [    0.426156] KVM setup async PF for cpu 13
> [    0.426431] KVM setup async PF for cpu 14
> [    0.426431] KVM setup async PF for cpu 15
> [root@blk-mq ~]# lspci |grep block
> 00:05.0 SCSI storage controller: Red Hat, Inc. Virtio block device
> 00:06.0 SCSI storage controller: Red Hat, Inc. Virtio block device
> 
> [root@blk-mq ~]# lscpu
> Architecture:          x86_64
> CPU op-mode(s):        32-bit, 64-bit
> Byte Order:            Little Endian
> CPU(s):                16
> On-line CPU(s) list:   0-15
> Thread(s) per core:    2
> Core(s) per socket:    8
> 
> [root@blk-mq ~]# ls /sys/block/vdb/mq/
> 0  1  2  3
> 
> [root@blk-mq ~]# cat /proc/cpuinfo |egrep 'processor|core id'
> processor	: 0
> core id		: 0
> processor	: 1
> core id		: 0
> processor	: 2
> core id		: 1
> processor	: 3
> core id		: 1
> processor	: 4
> core id		: 2
> processor	: 5
> core id		: 2
> processor	: 6
> core id		: 3
> processor	: 7
> core id		: 3
> processor	: 8
> core id		: 4
> processor	: 9
> core id		: 4
> processor	: 10
> core id		: 5
> processor	: 11
> core id		: 5
> processor	: 12
> core id		: 6
> processor	: 13
> core id		: 6
> processor	: 14
> core id		: 7
> processor	: 15
> core id		: 7
> 
> --before this patch--
> [root@blk-mq ~]# cat /sys/block/vdb/mq/*/cpu_list
> 0, 4, 5, 8, 9, 12, 13
> 1
> 2, 6, 7, 10, 11, 14, 15
> 3
> 
> --after this patch--
> [root@blk-mq ~]# cat /sys/block/vdb/mq/*/cpu_list
> 0, 4, 5, 12, 13
> 1, 6, 7, 14, 15
> 2, 8, 9
> 3, 10, 11
> 
> 
> I add dump_stack insert blk_mq_map_queues.
> 
> [    1.378680] Call Trace:
> [    1.378690]  dump_stack+0x5a/0x73
> [    1.378695]  blk_mq_map_queues+0x29/0xb0
> [    1.378700]  blk_mq_alloc_tag_set+0x1bd/0x2d0
> [    1.378705]  virtblk_probe+0x1ae/0x8e4 [virtio_blk]
> [    1.378709]  virtio_dev_probe+0x18a/0x230 [virtio]
> [    1.378713]  really_probe+0x215/0x3f0
> [    1.378716]  driver_probe_device+0x115/0x130
> [    1.378718]  device_driver_attach+0x50/0x60
> [    1.378720]  __driver_attach+0xbd/0x140
> [    1.378722]  ? device_driver_attach+0x60/0x60
> [    1.378724]  bus_for_each_dev+0x67/0xc0
> [    1.378727]  ? klist_add_tail+0x57/0x70

I am not able to reproduce the above call stack when virtio-blk is assigned 4 queues
and my qemu cmdline is "-smp 16,sockets=1,cores=8,threads=2".

# cat /sys/block/vda/mq/0/cpu_list
0, 1, 2, 3
# cat /sys/block/vda/mq/1/cpu_list
4, 5, 6, 7
# cat /sys/block/vda/mq/2/cpu_list
8, 9, 10, 11
# cat /sys/block/vda/mq/3/cpu_list
12, 13, 14, 15


I do agree that in the above case we would have an issue if the mapping is
established by blk_mq_map_queues().


However, I am just curious how we finally reach blk_mq_map_queues() from
blk_mq_alloc_tag_set().

It should be something like:

blk_mq_alloc_tag_set()
 -> blk_mq_update_queue_map()
      -> if (set->ops->map_queues && !is_kdump_kernel())
             return set->ops->map_queues(set);
      -> else
             return blk_mq_map_queues(&set->map[HCTX_TYPE_DEFAULT]);

Wouldn't we always have set->ops->map_queues = virtblk_map_queues()?

Or does the execution reach:

virtblk_map_queues()
 -> blk_mq_virtio_map_queues()
     -> if (!vdev->config->get_vq_affinity)
            return blk_mq_map_queues(qmap);
     -> else
            establish the mapping via get_vq_affinity

but vdev->config->get_vq_affinity == NULL?

For virtio pci, get_vq_affinity is always set. It seems virtio mmio does not set
get_vq_affinity.


I used to play with firecracker (by Amazon), and it is interesting that firecracker
uses mmio to set up virtio-blk.


Sorry for disturbing the review of this patch. I just would like to clarify in
which scenario we would hit this issue, e.g., when virtio-blk is based on mmio.

Dongli Zhang

> 
> 
> At 2019-03-22 19:58:08, "Dongli Zhang" <dongli.zhang@oracle.com> wrote:
>>
>>
>> On 03/22/2019 06:09 PM, luferry wrote:
>>> under virtual machine environment, cpu topology may differ from normal
>>> physical server.
>>
>> Would mind share the name of virtual machine monitor, the command line if
>> available, and which device to reproduce.
>>
>> For instance, I am not able to reproduce with qemu nvme or virtio-blk as I
>> assume they use pci or virtio specific mapper to establish the mapping.
>>
>> E.g., with qemu and nvme: -smp 8,sockets=1,cores=4,threads=2
>>
>> Indeed I use three queues instead of twp as one is reserved for admin.
>>
>> # ls /sys/block/nvme0n1/mq/*
>> /sys/block/nvme0n1/mq/0:
>> cpu0  cpu1  cpu2  cpu3  cpu_list  nr_reserved_tags  nr_tags
>>
>> /sys/block/nvme0n1/mq/1:
>> cpu4  cpu5  cpu6  cpu7  cpu_list  nr_reserved_tags  nr_tags
>>
>>
>> Thank you very much!
>>
>> Dongli Zhang
>>
>>> for example (machine with 4 cores, 2 threads per core):
>>>
>>> normal physical server:
>>> core-id   thread-0-id  thread-1-id
>>> 0         0            4
>>> 1         1            5
>>> 2         2            6
>>> 3         3            7
>>>
>>> virtual machine:
>>> core-id   thread-0-id  thread-1-id
>>> 0         0            1
>>> 1         2            3
>>> 2         4            5
>>> 3         6            7
>>>
>>> When attach disk with two queues, all the even numbered cpus will be
>>> mapped to queue 0. Under virtual machine, all the cpus is followed by
>>> its sibling cpu.Before this patch, all the odd numbered cpus will also
>>> be mapped to queue 0, can cause serious imbalance.this will lead to
>>> performance impact on system IO
>>>
>>> So suggest to allocate cpu map by core id, this can be more currency
>>>
>>> Signed-off-by: luferry <luferry@163.com>
>>> ---
>>>  block/blk-mq-cpumap.c | 9 +++++----
>>>  1 file changed, 5 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
>>> index 03a534820271..4125e8e77679 100644
>>> --- a/block/blk-mq-cpumap.c
>>> +++ b/block/blk-mq-cpumap.c
>>> @@ -35,7 +35,7 @@ int blk_mq_map_queues(struct blk_mq_queue_map *qmap)
>>>  {
>>>  	unsigned int *map = qmap->mq_map;
>>>  	unsigned int nr_queues = qmap->nr_queues;
>>> -	unsigned int cpu, first_sibling;
>>> +	unsigned int cpu, first_sibling, core = 0;
>>>  
>>>  	for_each_possible_cpu(cpu) {
>>>  		/*
>>> @@ -48,9 +48,10 @@ int blk_mq_map_queues(struct blk_mq_queue_map *qmap)
>>>  			map[cpu] = cpu_to_queue_index(qmap, nr_queues, cpu);
>>>  		} else {
>>>  			first_sibling = get_first_sibling(cpu);
>>> -			if (first_sibling == cpu)
>>> -				map[cpu] = cpu_to_queue_index(qmap, nr_queues, cpu);
>>> -			else
>>> +			if (first_sibling == cpu) {
>>> +				map[cpu] = cpu_to_queue_index(qmap, nr_queues, core);
>>> +				core++;
>>> +			} else
>>>  				map[cpu] = map[first_sibling];
>>>  		}
>>>  	}
>>>


* Re:Re: [PATCH] block/mq: blk map queues by core id
  2019-03-23 11:14     ` Dongli Zhang
@ 2019-03-25  9:49       ` luferry
  2019-03-25  9:53         ` luferry
  0 siblings, 1 reply; 9+ messages in thread
From: luferry @ 2019-03-25  9:49 UTC (permalink / raw)
  To: Dongli Zhang
  Cc: Jens Axboe, Ming Lei, Christoph Hellwig, linux-block, linux-kernel


After reading the code and comparing the pci device info,
I can reproduce this imbalance under KVM.
When virtio-blk is enabled with multiple queues but only 2 msix vectors,
vp_dev->per_vq_vectors will be false, so blk_mq_virtio_map_queues will fall back to virtio_mq_map_queues.

Since virtual machine users cannot change the number of vectors, I think virtio_mq_map_queues should be more friendly to different cpu topologies.

448 const struct cpumask *vp_get_vq_affinity(struct virtio_device *vdev, int index)
449 {
450     struct virtio_pci_device *vp_dev = to_vp_device(vdev);
451
452     if (!vp_dev->per_vq_vectors ||
453         vp_dev->vqs[index]->msix_vector == VIRTIO_MSI_NO_VECTOR)
454         return NULL;
455
456     return pci_irq_get_affinity(vp_dev->pci_dev,
457                     vp_dev->vqs[index]->msix_vector);
458 }

 32 int blk_mq_virtio_map_queues(struct blk_mq_queue_map *qmap,
 33         struct virtio_device *vdev, int first_vec)
 34 {
 35     const struct cpumask *mask;
 36     unsigned int queue, cpu;
 37
 38     if (!vdev->config->get_vq_affinity)
 39         goto fallback;
 40
 41     for (queue = 0; queue < qmap->nr_queues; queue++) {
 42         mask = vdev->config->get_vq_affinity(vdev, first_vec + queue); //vp_get_vq_affinity
 43         if (!mask)
 44             goto fallback;
 45
 46         for_each_cpu(cpu, mask)
 47             qmap->mq_map[cpu] = qmap->queue_offset + queue;
 48     }
 49
 50     return 0;
 51 fallback:
 52     return blk_mq_map_queues(qmap);
 53 }








At 2019-03-23 19:14:34, "Dongli Zhang" <dongli.zhang@oracle.com> wrote:
>
>
>On 03/23/2019 02:34 PM, luferry wrote:
>> 
>> I just bought one vm , so i cannot access to hypervisor. I will try to build the environment on my desktop.
>> I'm sure about something.
>> The hypervisor is KVM and disk driver is virtio-blk.
>> [root@blk-mq ~]# dmesg |grep KVM
>> [    0.000000] Hypervisor detected: KVM
>> [    0.186330] Booting paravirtualized kernel on KVM
>> [    0.279106] KVM setup async PF for cpu 0
>> [    0.420819] KVM setup async PF for cpu 1
>> [    0.421682] KVM setup async PF for cpu 2
>> [    0.422113] KVM setup async PF for cpu 3
>> [    0.422434] KVM setup async PF for cpu 4
>> [    0.422434] KVM setup async PF for cpu 5
>> [    0.423312] KVM setup async PF for cpu 6
>> [    0.423394] KVM setup async PF for cpu 7
>> [    0.424125] KVM setup async PF for cpu 8
>> [    0.424414] KVM setup async PF for cpu 9
>> [    0.424415] KVM setup async PF for cpu 10
>> [    0.425329] KVM setup async PF for cpu 11
>> [    0.425420] KVM setup async PF for cpu 12
>> [    0.426156] KVM setup async PF for cpu 13
>> [    0.426431] KVM setup async PF for cpu 14
>> [    0.426431] KVM setup async PF for cpu 15
>> [root@blk-mq ~]# lspci |grep block
>> 00:05.0 SCSI storage controller: Red Hat, Inc. Virtio block device
>> 00:06.0 SCSI storage controller: Red Hat, Inc. Virtio block device
>> 
>> [root@blk-mq ~]# lscpu
>> Architecture:          x86_64
>> CPU op-mode(s):        32-bit, 64-bit
>> Byte Order:            Little Endian
>> CPU(s):                16
>> On-line CPU(s) list:   0-15
>> Thread(s) per core:    2
>> Core(s) per socket:    8
>> 
>> [root@blk-mq ~]# ls /sys/block/vdb/mq/
>> 0  1  2  3
>> 
>> [root@blk-mq ~]# cat /proc/cpuinfo |egrep 'processor|core id'
>> processor	: 0
>> core id		: 0
>> processor	: 1
>> core id		: 0
>> processor	: 2
>> core id		: 1
>> processor	: 3
>> core id		: 1
>> processor	: 4
>> core id		: 2
>> processor	: 5
>> core id		: 2
>> processor	: 6
>> core id		: 3
>> processor	: 7
>> core id		: 3
>> processor	: 8
>> core id		: 4
>> processor	: 9
>> core id		: 4
>> processor	: 10
>> core id		: 5
>> processor	: 11
>> core id		: 5
>> processor	: 12
>> core id		: 6
>> processor	: 13
>> core id		: 6
>> processor	: 14
>> core id		: 7
>> processor	: 15
>> core id		: 7
>> 
>> --before this patch--
>> [root@blk-mq ~]# cat /sys/block/vdb/mq/*/cpu_list
>> 0, 4, 5, 8, 9, 12, 13
>> 1
>> 2, 6, 7, 10, 11, 14, 15
>> 3
>> 
>> --after this patch--
>> [root@blk-mq ~]# cat /sys/block/vdb/mq/*/cpu_list
>> 0, 4, 5, 12, 13
>> 1, 6, 7, 14, 15
>> 2, 8, 9
>> 3, 10, 11
>> 
>> 
>> I add dump_stack insert blk_mq_map_queues.
>> 
>> [    1.378680] Call Trace:
>> [    1.378690]  dump_stack+0x5a/0x73
>> [    1.378695]  blk_mq_map_queues+0x29/0xb0
>> [    1.378700]  blk_mq_alloc_tag_set+0x1bd/0x2d0
>> [    1.378705]  virtblk_probe+0x1ae/0x8e4 [virtio_blk]
>> [    1.378709]  virtio_dev_probe+0x18a/0x230 [virtio]
>> [    1.378713]  really_probe+0x215/0x3f0
>> [    1.378716]  driver_probe_device+0x115/0x130
>> [    1.378718]  device_driver_attach+0x50/0x60
>> [    1.378720]  __driver_attach+0xbd/0x140
>> [    1.378722]  ? device_driver_attach+0x60/0x60
>> [    1.378724]  bus_for_each_dev+0x67/0xc0
>> [    1.378727]  ? klist_add_tail+0x57/0x70
>
>I am not able to reproduce above call stack when virtio-blk is assigned 4 queues
>while my qemu cmdline is "-smp 16,sockets=1,cores=8,threads=2".
>
># cat /sys/block/vda/mq/0/cpu_list
>0, 1, 2, 3
># cat /sys/block/vda/mq/1/cpu_list
>4, 5, 6, 7
># cat /sys/block/vda/mq/2/cpu_list
>8, 9, 10, 11
># cat /sys/block/vda/mq/3/cpu_list
>12, 13, 14, 15
>
>
>I do agree in above case we would have issue if the mapping is established by
>blk_mq_map_queues().
>
>
>However, I am just curious how we finally reach at blk_mq_map_queues() from
>blk_mq_alloc_tag_set()?
>
>It should be something like:
>
>blk_mq_alloc_tag_set()
> -> blk_mq_update_queue_map()
>      -> if (set->ops->map_queues && !is_kdump_kernel())
>             return set->ops->map_queues(set);
>      -> else
>             return blk_mq_map_queues(&set->map[HCTX_TYPE_DEFAULT]);
>
>Wouldn't we always have set->ops->map_queues = virtblk_map_queues()?
>
>Or the execution reach at:
>
>virtblk_map_queues()
> -> blk_mq_virtio_map_queues()
>     -> if (!vdev->config->get_vq_affinity)
>            return blk_mq_map_queues(qmap);
>     -> else
>            establish the mapping via get_vq_affinity
>
>but vdev->config->get_vq_affinity == NULL?
>
>For virtio pci, get_vq_affinity is always set. Seems virtio mmio would not set
>get_vq_affinity.
>
>
>I used to play with firecracker (by amazon) and it is interesting firecracker
>uses mmio to setup virtio-blk.
>
>
>Sorry for disturbing the review of this patch. I just would like to clarify in
>which scenario we would hit this issue, e.g., when virtio-blk is based on mmio?
>
>Dongli Zhang
>
>> 
>> 
>> At 2019-03-22 19:58:08, "Dongli Zhang" <dongli.zhang@oracle.com> wrote:
>>>
>>>
>>> On 03/22/2019 06:09 PM, luferry wrote:
>>>> under virtual machine environment, cpu topology may differ from normal
>>>> physical server.
>>>
>>> Would mind share the name of virtual machine monitor, the command line if
>>> available, and which device to reproduce.
>>>
>>> For instance, I am not able to reproduce with qemu nvme or virtio-blk as I
>>> assume they use pci or virtio specific mapper to establish the mapping.
>>>
>>> E.g., with qemu and nvme: -smp 8,sockets=1,cores=4,threads=2
>>>
>>> Indeed I use three queues instead of twp as one is reserved for admin.
>>>
>>> # ls /sys/block/nvme0n1/mq/*
>>> /sys/block/nvme0n1/mq/0:
>>> cpu0  cpu1  cpu2  cpu3  cpu_list  nr_reserved_tags  nr_tags
>>>
>>> /sys/block/nvme0n1/mq/1:
>>> cpu4  cpu5  cpu6  cpu7  cpu_list  nr_reserved_tags  nr_tags
>>>
>>>
>>> Thank you very much!
>>>
>>> Dongli Zhang
>>>
>>>> for example (machine with 4 cores, 2 threads per core):
>>>>
>>>> normal physical server:
>>>> core-id   thread-0-id  thread-1-id
>>>> 0         0            4
>>>> 1         1            5
>>>> 2         2            6
>>>> 3         3            7
>>>>
>>>> virtual machine:
>>>> core-id   thread-0-id  thread-1-id
>>>> 0         0            1
>>>> 1         2            3
>>>> 2         4            5
>>>> 3         6            7
>>>>
>>>> When attach disk with two queues, all the even numbered cpus will be
>>>> mapped to queue 0. Under virtual machine, all the cpus is followed by
>>>> its sibling cpu.Before this patch, all the odd numbered cpus will also
>>>> be mapped to queue 0, can cause serious imbalance.this will lead to
>>>> performance impact on system IO
>>>>
>>>> So suggest to allocate cpu map by core id, this can be more currency
>>>>
>>>> Signed-off-by: luferry <luferry@163.com>
>>>> ---
>>>>  block/blk-mq-cpumap.c | 9 +++++----
>>>>  1 file changed, 5 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
>>>> index 03a534820271..4125e8e77679 100644
>>>> --- a/block/blk-mq-cpumap.c
>>>> +++ b/block/blk-mq-cpumap.c
>>>> @@ -35,7 +35,7 @@ int blk_mq_map_queues(struct blk_mq_queue_map *qmap)
>>>>  {
>>>>  	unsigned int *map = qmap->mq_map;
>>>>  	unsigned int nr_queues = qmap->nr_queues;
>>>> -	unsigned int cpu, first_sibling;
>>>> +	unsigned int cpu, first_sibling, core = 0;
>>>>  
>>>>  	for_each_possible_cpu(cpu) {
>>>>  		/*
>>>> @@ -48,9 +48,10 @@ int blk_mq_map_queues(struct blk_mq_queue_map *qmap)
>>>>  			map[cpu] = cpu_to_queue_index(qmap, nr_queues, cpu);
>>>>  		} else {
>>>>  			first_sibling = get_first_sibling(cpu);
>>>> -			if (first_sibling == cpu)
>>>> -				map[cpu] = cpu_to_queue_index(qmap, nr_queues, cpu);
>>>> -			else
>>>> +			if (first_sibling == cpu) {
>>>> +				map[cpu] = cpu_to_queue_index(qmap, nr_queues, core);
>>>> +				core++;
>>>> +			} else
>>>>  				map[cpu] = map[first_sibling];
>>>>  		}
>>>>  	}
>>>>


* Re:Re:Re: [PATCH] block/mq: blk map queues by core id
  2019-03-25  9:49       ` luferry
@ 2019-03-25  9:53         ` luferry
  2019-03-25 13:53           ` Dongli Zhang
  0 siblings, 1 reply; 9+ messages in thread
From: luferry @ 2019-03-25  9:53 UTC (permalink / raw)
  To: luferry
  Cc: Dongli Zhang, Jens Axboe, Ming Lei, Christoph Hellwig,
	linux-block, linux-kernel


Sorry, I messed up some function names.

When virtio-blk is enabled with multiple queues but only 2 msix vectors,
vp_dev->per_vq_vectors will be false, so blk_mq_virtio_map_queues will fall back to blk_mq_map_queues.

Since virtual machine users cannot change the number of vectors, I think blk_mq_map_queues should be more friendly to different cpu topologies.




At 2019-03-25 17:49:33, "luferry" <luferry@163.com> wrote:
>
>After reading the code and compare pci device info.
>I can reproduce this imbalance under KVM.
>When enable virtio-blk with multi queues but with only 2 msix-vector.
>vp_dev->per_vq_vectors will be false, so blk_mq_virtio_map_queues will fallback to virtio_mq_map_queues
>
>Since virtual machine users cannot change the vector numbers, I think virtio_mq_map_queues should be more friendly with different cpu topology. 
>
>448 const struct cpumask *vp_get_vq_affinity(struct virtio_device *vdev, int index)
>449 {
>450     struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>451
>452     if (!vp_dev->per_vq_vectors ||
>453         vp_dev->vqs[index]->msix_vector == VIRTIO_MSI_NO_VECTOR)
>454         return NULL;
>455
>456     return pci_irq_get_affinity(vp_dev->pci_dev,
>457                     vp_dev->vqs[index]->msix_vector);
>458 }
>
> 32 int blk_mq_virtio_map_queues(struct blk_mq_queue_map *qmap,
> 33         struct virtio_device *vdev, int first_vec)
> 34 {
> 35     const struct cpumask *mask;
> 36     unsigned int queue, cpu;
> 37
> 38     if (!vdev->config->get_vq_affinity)
> 39         goto fallback;
> 40
> 41     for (queue = 0; queue < qmap->nr_queues; queue++) {
> 42         mask = vdev->config->get_vq_affinity(vdev, first_vec + queue); //vp_get_vq_affinity
> 43         if (!mask)
> 44             goto fallback;
> 45
> 46         for_each_cpu(cpu, mask)
> 47             qmap->mq_map[cpu] = qmap->queue_offset + queue;
> 48     }
> 49
> 50     return 0;
> 51 fallback:
> 52     return blk_mq_map_queues(qmap);
> 53 }
>
>
>
>
>
>
>
>
>At 2019-03-23 19:14:34, "Dongli Zhang" <dongli.zhang@oracle.com> wrote:
>>
>>
>>On 03/23/2019 02:34 PM, luferry wrote:
>>> 
>>> I just bought one vm , so i cannot access to hypervisor. I will try to build the environment on my desktop.
>>> I'm sure about something.
>>> The hypervisor is KVM and disk driver is virtio-blk.
>>> [root@blk-mq ~]# dmesg |grep KVM
>>> [    0.000000] Hypervisor detected: KVM
>>> [    0.186330] Booting paravirtualized kernel on KVM
>>> [    0.279106] KVM setup async PF for cpu 0
>>> [    0.420819] KVM setup async PF for cpu 1
>>> [    0.421682] KVM setup async PF for cpu 2
>>> [    0.422113] KVM setup async PF for cpu 3
>>> [    0.422434] KVM setup async PF for cpu 4
>>> [    0.422434] KVM setup async PF for cpu 5
>>> [    0.423312] KVM setup async PF for cpu 6
>>> [    0.423394] KVM setup async PF for cpu 7
>>> [    0.424125] KVM setup async PF for cpu 8
>>> [    0.424414] KVM setup async PF for cpu 9
>>> [    0.424415] KVM setup async PF for cpu 10
>>> [    0.425329] KVM setup async PF for cpu 11
>>> [    0.425420] KVM setup async PF for cpu 12
>>> [    0.426156] KVM setup async PF for cpu 13
>>> [    0.426431] KVM setup async PF for cpu 14
>>> [    0.426431] KVM setup async PF for cpu 15
>>> [root@blk-mq ~]# lspci |grep block
>>> 00:05.0 SCSI storage controller: Red Hat, Inc. Virtio block device
>>> 00:06.0 SCSI storage controller: Red Hat, Inc. Virtio block device
>>> 
>>> [root@blk-mq ~]# lscpu
>>> Architecture:          x86_64
>>> CPU op-mode(s):        32-bit, 64-bit
>>> Byte Order:            Little Endian
>>> CPU(s):                16
>>> On-line CPU(s) list:   0-15
>>> Thread(s) per core:    2
>>> Core(s) per socket:    8
>>> 
>>> [root@blk-mq ~]# ls /sys/block/vdb/mq/
>>> 0  1  2  3
>>> 
>>> [root@blk-mq ~]# cat /proc/cpuinfo |egrep 'processor|core id'
>>> processor	: 0
>>> core id		: 0
>>> processor	: 1
>>> core id		: 0
>>> processor	: 2
>>> core id		: 1
>>> processor	: 3
>>> core id		: 1
>>> processor	: 4
>>> core id		: 2
>>> processor	: 5
>>> core id		: 2
>>> processor	: 6
>>> core id		: 3
>>> processor	: 7
>>> core id		: 3
>>> processor	: 8
>>> core id		: 4
>>> processor	: 9
>>> core id		: 4
>>> processor	: 10
>>> core id		: 5
>>> processor	: 11
>>> core id		: 5
>>> processor	: 12
>>> core id		: 6
>>> processor	: 13
>>> core id		: 6
>>> processor	: 14
>>> core id		: 7
>>> processor	: 15
>>> core id		: 7
>>> 
>>> --before this patch--
>>> [root@blk-mq ~]# cat /sys/block/vdb/mq/*/cpu_list
>>> 0, 4, 5, 8, 9, 12, 13
>>> 1
>>> 2, 6, 7, 10, 11, 14, 15
>>> 3
>>> 
>>> --after this patch--
>>> [root@blk-mq ~]# cat /sys/block/vdb/mq/*/cpu_list
>>> 0, 4, 5, 12, 13
>>> 1, 6, 7, 14, 15
>>> 2, 8, 9
>>> 3, 10, 11
>>> 
>>> 
>>> I add dump_stack insert blk_mq_map_queues.
>>> 
>>> [    1.378680] Call Trace:
>>> [    1.378690]  dump_stack+0x5a/0x73
>>> [    1.378695]  blk_mq_map_queues+0x29/0xb0
>>> [    1.378700]  blk_mq_alloc_tag_set+0x1bd/0x2d0
>>> [    1.378705]  virtblk_probe+0x1ae/0x8e4 [virtio_blk]
>>> [    1.378709]  virtio_dev_probe+0x18a/0x230 [virtio]
>>> [    1.378713]  really_probe+0x215/0x3f0
>>> [    1.378716]  driver_probe_device+0x115/0x130
>>> [    1.378718]  device_driver_attach+0x50/0x60
>>> [    1.378720]  __driver_attach+0xbd/0x140
>>> [    1.378722]  ? device_driver_attach+0x60/0x60
>>> [    1.378724]  bus_for_each_dev+0x67/0xc0
>>> [    1.378727]  ? klist_add_tail+0x57/0x70
>>
>>I am not able to reproduce above call stack when virtio-blk is assigned 4 queues
>>while my qemu cmdline is "-smp 16,sockets=1,cores=8,threads=2".
>>
>># cat /sys/block/vda/mq/0/cpu_list
>>0, 1, 2, 3
>># cat /sys/block/vda/mq/1/cpu_list
>>4, 5, 6, 7
>># cat /sys/block/vda/mq/2/cpu_list
>>8, 9, 10, 11
>># cat /sys/block/vda/mq/3/cpu_list
>>12, 13, 14, 15
>>
>>
>>I do agree in above case we would have issue if the mapping is established by
>>blk_mq_map_queues().
>>
>>
>>However, I am just curious how we finally reach at blk_mq_map_queues() from
>>blk_mq_alloc_tag_set()?
>>
>>It should be something like:
>>
>>blk_mq_alloc_tag_set()
>> -> blk_mq_update_queue_map()
>>      -> if (set->ops->map_queues && !is_kdump_kernel())
>>             return set->ops->map_queues(set);
>>      -> else
>>             return blk_mq_map_queues(&set->map[HCTX_TYPE_DEFAULT]);
>>
>>Wouldn't we always have set->ops->map_queues = virtblk_map_queues()?
>>
>>Or the execution reach at:
>>
>>virtblk_map_queues()
>> -> blk_mq_virtio_map_queues()
>>     -> if (!vdev->config->get_vq_affinity)
>>            return blk_mq_map_queues(qmap);
>>     -> else
>>            establish the mapping via get_vq_affinity
>>
>>but vdev->config->get_vq_affinity == NULL?
>>
>>For virtio pci, get_vq_affinity is always set. Seems virtio mmio would not set
>>get_vq_affinity.
>>
>>
>>I used to play with firecracker (by amazon) and it is interesting firecracker
>>uses mmio to setup virtio-blk.
>>
>>
>>Sorry for disturbing the review of this patch. I just would like to clarify in
>>which scenario we would hit this issue, e.g., when virtio-blk is based on mmio?
>>
>>Dongli Zhang
>>
>>> 
>>> 
>>> At 2019-03-22 19:58:08, "Dongli Zhang" <dongli.zhang@oracle.com> wrote:
>>>>
>>>>
>>>> On 03/22/2019 06:09 PM, luferry wrote:
>>>>> under virtual machine environment, cpu topology may differ from normal
>>>>> physical server.
>>>>
>>>> Would mind share the name of virtual machine monitor, the command line if
>>>> available, and which device to reproduce.
>>>>
>>>> For instance, I am not able to reproduce with qemu nvme or virtio-blk as I
>>>> assume they use pci or virtio specific mapper to establish the mapping.
>>>>
>>>> E.g., with qemu and nvme: -smp 8,sockets=1,cores=4,threads=2
>>>>
>>>> Indeed I use three queues instead of twp as one is reserved for admin.
>>>>
>>>> # ls /sys/block/nvme0n1/mq/*
>>>> /sys/block/nvme0n1/mq/0:
>>>> cpu0  cpu1  cpu2  cpu3  cpu_list  nr_reserved_tags  nr_tags
>>>>
>>>> /sys/block/nvme0n1/mq/1:
>>>> cpu4  cpu5  cpu6  cpu7  cpu_list  nr_reserved_tags  nr_tags
>>>>
>>>>
>>>> Thank you very much!
>>>>
>>>> Dongli Zhang
>>>>
>>>>> for example (machine with 4 cores, 2 threads per core):
>>>>>
>>>>> normal physical server:
>>>>> core-id   thread-0-id  thread-1-id
>>>>> 0         0            4
>>>>> 1         1            5
>>>>> 2         2            6
>>>>> 3         3            7
>>>>>
>>>>> virtual machine:
>>>>> core-id   thread-0-id  thread-1-id
>>>>> 0         0            1
>>>>> 1         2            3
>>>>> 2         4            5
>>>>> 3         6            7
>>>>>
>>>>> When attach disk with two queues, all the even numbered cpus will be
>>>>> mapped to queue 0. Under virtual machine, all the cpus is followed by
>>>>> its sibling cpu.Before this patch, all the odd numbered cpus will also
>>>>> be mapped to queue 0, can cause serious imbalance.this will lead to
>>>>> performance impact on system IO
>>>>>
>>>>> So suggest to allocate cpu map by core id, this can be more currency
>>>>>
>>>>> Signed-off-by: luferry <luferry@163.com>
>>>>> ---
>>>>>  block/blk-mq-cpumap.c | 9 +++++----
>>>>>  1 file changed, 5 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
>>>>> index 03a534820271..4125e8e77679 100644
>>>>> --- a/block/blk-mq-cpumap.c
>>>>> +++ b/block/blk-mq-cpumap.c
>>>>> @@ -35,7 +35,7 @@ int blk_mq_map_queues(struct blk_mq_queue_map *qmap)
>>>>>  {
>>>>>  	unsigned int *map = qmap->mq_map;
>>>>>  	unsigned int nr_queues = qmap->nr_queues;
>>>>> -	unsigned int cpu, first_sibling;
>>>>> +	unsigned int cpu, first_sibling, core = 0;
>>>>>  
>>>>>  	for_each_possible_cpu(cpu) {
>>>>>  		/*
>>>>> @@ -48,9 +48,10 @@ int blk_mq_map_queues(struct blk_mq_queue_map *qmap)
>>>>>  			map[cpu] = cpu_to_queue_index(qmap, nr_queues, cpu);
>>>>>  		} else {
>>>>>  			first_sibling = get_first_sibling(cpu);
>>>>> -			if (first_sibling == cpu)
>>>>> -				map[cpu] = cpu_to_queue_index(qmap, nr_queues, cpu);
>>>>> -			else
>>>>> +			if (first_sibling == cpu) {
>>>>> +				map[cpu] = cpu_to_queue_index(qmap, nr_queues, core);
>>>>> +				core++;
>>>>> +			} else
>>>>>  				map[cpu] = map[first_sibling];
>>>>>  		}
>>>>>  	}
>>>>>


* Re: [PATCH] block/mq: blk map queues by core id
  2019-03-25  9:53         ` luferry
@ 2019-03-25 13:53           ` Dongli Zhang
  2019-03-25 15:17             ` luferry
  0 siblings, 1 reply; 9+ messages in thread
From: Dongli Zhang @ 2019-03-25 13:53 UTC (permalink / raw)
  To: luferry
  Cc: Jens Axboe, Ming Lei, Christoph Hellwig, linux-block, linux-kernel



On 03/25/2019 05:53 PM, luferry wrote:
> 
> sorry,  messed up some func name
> 
> When enable virtio-blk with multi queues but with only 2 msix-vector.
> vp_dev->per_vq_vectors will be false, so blk_mq_virtio_map_queues will fallback to blk_mq_map_queues
> 
> Since virtual machine users cannot change the vector numbers, I think blk_mq_map_queues should be more friendly with different cpu topology.


Yes, we will fall back to blk_mq_map_queues if there is a lack of vectors.

Here is the qemu cmdline:
"-smp 8,sockets=1,cores=4,threads=2"
"-device virtio-blk-pci,drive=drive0,id=device0,num-queues=4,vectors=2"

# cat /sys/block/vda/mq/0/cpu_list
0, 4, 5
root@vm:/home/zhang# cat /sys/block/vda/mq/1/cpu_list
1
root@vm:/home/zhang# cat /sys/block/vda/mq/2/cpu_list
2, 6, 7
root@vm:/home/zhang# cat /sys/block/vda/mq/3/cpu_list
3
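
For reference, one way to confirm from inside the guest that the device only
exposes two MSI-X vectors (and will therefore take the blk_mq_map_queues()
fallback) is to look at its MSI-X capability; the slot address and the output
line below are illustrative:

# lspci -vv -s 00:05.0 | grep MSI-X
	Capabilities: [98] MSI-X: Enable+ Count=2 Masked-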

How about putting how to hit the issue in the commit message, so people would
be able to know the scenario?

Dongli Zhang

> 
> 
> 
> 
> At 2019-03-25 17:49:33, "luferry" <luferry@163.com> wrote:
>>
>> After reading the code and compare pci device info.
>> I can reproduce this imbalance under KVM.
>> When enable virtio-blk with multi queues but with only 2 msix-vector.
>> vp_dev->per_vq_vectors will be false, so blk_mq_virtio_map_queues will fallback to virtio_mq_map_queues
>>
>> Since virtual machine users cannot change the vector numbers, I think virtio_mq_map_queues should be more friendly with different cpu topology. 
>>
>> 448 const struct cpumask *vp_get_vq_affinity(struct virtio_device *vdev, int index)
>> 449 {
>> 450     struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>> 451
>> 452     if (!vp_dev->per_vq_vectors ||
>> 453         vp_dev->vqs[index]->msix_vector == VIRTIO_MSI_NO_VECTOR)
>> 454         return NULL;
>> 455
>> 456     return pci_irq_get_affinity(vp_dev->pci_dev,
>> 457                     vp_dev->vqs[index]->msix_vector);
>> 458 }
>>
>> 32 int blk_mq_virtio_map_queues(struct blk_mq_queue_map *qmap,
>> 33         struct virtio_device *vdev, int first_vec)
>> 34 {
>> 35     const struct cpumask *mask;
>> 36     unsigned int queue, cpu;
>> 37
>> 38     if (!vdev->config->get_vq_affinity)
>> 39         goto fallback;
>> 40
>> 41     for (queue = 0; queue < qmap->nr_queues; queue++) {
>> 42         mask = vdev->config->get_vq_affinity(vdev, first_vec + queue); //vp_get_vq_affinity
>> 43         if (!mask)
>> 44             goto fallback;
>> 45
>> 46         for_each_cpu(cpu, mask)
>> 47             qmap->mq_map[cpu] = qmap->queue_offset + queue;
>> 48     }
>> 49
>> 50     return 0;
>> 51 fallback:
>> 52     return blk_mq_map_queues(qmap);
>> 53 }
>>
>>
>>
>>
>>
>>
>>
>>
>> At 2019-03-23 19:14:34, "Dongli Zhang" <dongli.zhang@oracle.com> wrote:
>>>
>>>
>>> On 03/23/2019 02:34 PM, luferry wrote:
>>>>
>>>> I just bought one vm , so i cannot access to hypervisor. I will try to build the environment on my desktop.
>>>> I'm sure about something.
>>>> The hypervisor is KVM and disk driver is virtio-blk.
>>>> [root@blk-mq ~]# dmesg |grep KVM
>>>> [    0.000000] Hypervisor detected: KVM
>>>> [    0.186330] Booting paravirtualized kernel on KVM
>>>> [    0.279106] KVM setup async PF for cpu 0
>>>> [    0.420819] KVM setup async PF for cpu 1
>>>> [    0.421682] KVM setup async PF for cpu 2
>>>> [    0.422113] KVM setup async PF for cpu 3
>>>> [    0.422434] KVM setup async PF for cpu 4
>>>> [    0.422434] KVM setup async PF for cpu 5
>>>> [    0.423312] KVM setup async PF for cpu 6
>>>> [    0.423394] KVM setup async PF for cpu 7
>>>> [    0.424125] KVM setup async PF for cpu 8
>>>> [    0.424414] KVM setup async PF for cpu 9
>>>> [    0.424415] KVM setup async PF for cpu 10
>>>> [    0.425329] KVM setup async PF for cpu 11
>>>> [    0.425420] KVM setup async PF for cpu 12
>>>> [    0.426156] KVM setup async PF for cpu 13
>>>> [    0.426431] KVM setup async PF for cpu 14
>>>> [    0.426431] KVM setup async PF for cpu 15
>>>> [root@blk-mq ~]# lspci |grep block
>>>> 00:05.0 SCSI storage controller: Red Hat, Inc. Virtio block device
>>>> 00:06.0 SCSI storage controller: Red Hat, Inc. Virtio block device
>>>>
>>>> [root@blk-mq ~]# lscpu
>>>> Architecture:          x86_64
>>>> CPU op-mode(s):        32-bit, 64-bit
>>>> Byte Order:            Little Endian
>>>> CPU(s):                16
>>>> On-line CPU(s) list:   0-15
>>>> Thread(s) per core:    2
>>>> Core(s) per socket:    8
>>>>
>>>> [root@blk-mq ~]# ls /sys/block/vdb/mq/
>>>> 0  1  2  3
>>>>
>>>> [root@blk-mq ~]# cat /proc/cpuinfo |egrep 'processor|core id'
>>>> processor	: 0
>>>> core id		: 0
>>>> processor	: 1
>>>> core id		: 0
>>>> processor	: 2
>>>> core id		: 1
>>>> processor	: 3
>>>> core id		: 1
>>>> processor	: 4
>>>> core id		: 2
>>>> processor	: 5
>>>> core id		: 2
>>>> processor	: 6
>>>> core id		: 3
>>>> processor	: 7
>>>> core id		: 3
>>>> processor	: 8
>>>> core id		: 4
>>>> processor	: 9
>>>> core id		: 4
>>>> processor	: 10
>>>> core id		: 5
>>>> processor	: 11
>>>> core id		: 5
>>>> processor	: 12
>>>> core id		: 6
>>>> processor	: 13
>>>> core id		: 6
>>>> processor	: 14
>>>> core id		: 7
>>>> processor	: 15
>>>> core id		: 7
>>>>
>>>> --before this patch--
>>>> [root@blk-mq ~]# cat /sys/block/vdb/mq/*/cpu_list
>>>> 0, 4, 5, 8, 9, 12, 13
>>>> 1
>>>> 2, 6, 7, 10, 11, 14, 15
>>>> 3
>>>>
>>>> --after this patch--
>>>> [root@blk-mq ~]# cat /sys/block/vdb/mq/*/cpu_list
>>>> 0, 4, 5, 12, 13
>>>> 1, 6, 7, 14, 15
>>>> 2, 8, 9
>>>> 3, 10, 11
>>>>
>>>>
>>>> I add dump_stack insert blk_mq_map_queues.
>>>>
>>>> [    1.378680] Call Trace:
>>>> [    1.378690]  dump_stack+0x5a/0x73
>>>> [    1.378695]  blk_mq_map_queues+0x29/0xb0
>>>> [    1.378700]  blk_mq_alloc_tag_set+0x1bd/0x2d0
>>>> [    1.378705]  virtblk_probe+0x1ae/0x8e4 [virtio_blk]
>>>> [    1.378709]  virtio_dev_probe+0x18a/0x230 [virtio]
>>>> [    1.378713]  really_probe+0x215/0x3f0
>>>> [    1.378716]  driver_probe_device+0x115/0x130
>>>> [    1.378718]  device_driver_attach+0x50/0x60
>>>> [    1.378720]  __driver_attach+0xbd/0x140
>>>> [    1.378722]  ? device_driver_attach+0x60/0x60
>>>> [    1.378724]  bus_for_each_dev+0x67/0xc0
>>>> [    1.378727]  ? klist_add_tail+0x57/0x70
>>>
>>> I am not able to reproduce above call stack when virtio-blk is assigned 4 queues
>>> while my qemu cmdline is "-smp 16,sockets=1,cores=8,threads=2".
>>>
>>> # cat /sys/block/vda/mq/0/cpu_list
>>> 0, 1, 2, 3
>>> # cat /sys/block/vda/mq/1/cpu_list
>>> 4, 5, 6, 7
>>> # cat /sys/block/vda/mq/2/cpu_list
>>> 8, 9, 10, 11
>>> # cat /sys/block/vda/mq/3/cpu_list
>>> 12, 13, 14, 15
>>>
>>>
>>> I do agree in above case we would have issue if the mapping is established by
>>> blk_mq_map_queues().
>>>
>>>
>>> However, I am just curious how we finally reach at blk_mq_map_queues() from
>>> blk_mq_alloc_tag_set()?
>>>
>>> It should be something like:
>>>
>>> blk_mq_alloc_tag_set()
>>> -> blk_mq_update_queue_map()
>>>      -> if (set->ops->map_queues && !is_kdump_kernel())
>>>             return set->ops->map_queues(set);
>>>      -> else
>>>             return blk_mq_map_queues(&set->map[HCTX_TYPE_DEFAULT]);
>>>
>>> Wouldn't we always have set->ops->map_queues = virtblk_map_queues()?
>>>
>>> Or the execution reach at:
>>>
>>> virtblk_map_queues()
>>> -> blk_mq_virtio_map_queues()
>>>     -> if (!vdev->config->get_vq_affinity)
>>>            return blk_mq_map_queues(qmap);
>>>     -> else
>>>            establish the mapping via get_vq_affinity
>>>
>>> but vdev->config->get_vq_affinity == NULL?
>>>
>>> For virtio pci, get_vq_affinity is always set. Seems virtio mmio would not set
>>> get_vq_affinity.
>>>
>>>
>>> I used to play with firecracker (by amazon) and it is interesting firecracker
>>> uses mmio to setup virtio-blk.
>>>
>>>
>>> Sorry for disturbing the review of this patch. I just would like to clarify in
>>> which scenario we would hit this issue, e.g., when virtio-blk is based on mmio?
>>>
>>> Dongli Zhang
>>>
>>>>
>>>>
>>>> At 2019-03-22 19:58:08, "Dongli Zhang" <dongli.zhang@oracle.com> wrote:
>>>>>
>>>>>
>>>>> On 03/22/2019 06:09 PM, luferry wrote:
>>>>>> under virtual machine environment, cpu topology may differ from normal
>>>>>> physical server.
>>>>>
>>>>> Would mind share the name of virtual machine monitor, the command line if
>>>>> available, and which device to reproduce.
>>>>>
>>>>> For instance, I am not able to reproduce with qemu nvme or virtio-blk as I
>>>>> assume they use pci or virtio specific mapper to establish the mapping.
>>>>>
>>>>> E.g., with qemu and nvme: -smp 8,sockets=1,cores=4,threads=2
>>>>>
>>>>> Indeed I use three queues instead of twp as one is reserved for admin.
>>>>>
>>>>> # ls /sys/block/nvme0n1/mq/*
>>>>> /sys/block/nvme0n1/mq/0:
>>>>> cpu0  cpu1  cpu2  cpu3  cpu_list  nr_reserved_tags  nr_tags
>>>>>
>>>>> /sys/block/nvme0n1/mq/1:
>>>>> cpu4  cpu5  cpu6  cpu7  cpu_list  nr_reserved_tags  nr_tags
>>>>>
>>>>>
>>>>> Thank you very much!
>>>>>
>>>>> Dongli Zhang
>>>>>
>>>>>> for example (machine with 4 cores, 2 threads per core):
>>>>>>
>>>>>> normal physical server:
>>>>>> core-id   thread-0-id  thread-1-id
>>>>>> 0         0            4
>>>>>> 1         1            5
>>>>>> 2         2            6
>>>>>> 3         3            7
>>>>>>
>>>>>> virtual machine:
>>>>>> core-id   thread-0-id  thread-1-id
>>>>>> 0         0            1
>>>>>> 1         2            3
>>>>>> 2         4            5
>>>>>> 3         6            7
>>>>>>
>>>>>> When attach disk with two queues, all the even numbered cpus will be
>>>>>> mapped to queue 0. Under virtual machine, all the cpus is followed by
>>>>>> its sibling cpu.Before this patch, all the odd numbered cpus will also
>>>>>> be mapped to queue 0, can cause serious imbalance.this will lead to
>>>>>> performance impact on system IO
>>>>>>
>>>>>> So suggest to allocate cpu map by core id, this can be more currency
>>>>>>
>>>>>> Signed-off-by: luferry <luferry@163.com>
>>>>>> ---
>>>>>>  block/blk-mq-cpumap.c | 9 +++++----
>>>>>>  1 file changed, 5 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
>>>>>> index 03a534820271..4125e8e77679 100644
>>>>>> --- a/block/blk-mq-cpumap.c
>>>>>> +++ b/block/blk-mq-cpumap.c
>>>>>> @@ -35,7 +35,7 @@ int blk_mq_map_queues(struct blk_mq_queue_map *qmap)
>>>>>>  {
>>>>>>  	unsigned int *map = qmap->mq_map;
>>>>>>  	unsigned int nr_queues = qmap->nr_queues;
>>>>>> -	unsigned int cpu, first_sibling;
>>>>>> +	unsigned int cpu, first_sibling, core = 0;
>>>>>>  
>>>>>>  	for_each_possible_cpu(cpu) {
>>>>>>  		/*
>>>>>> @@ -48,9 +48,10 @@ int blk_mq_map_queues(struct blk_mq_queue_map *qmap)
>>>>>>  			map[cpu] = cpu_to_queue_index(qmap, nr_queues, cpu);
>>>>>>  		} else {
>>>>>>  			first_sibling = get_first_sibling(cpu);
>>>>>> -			if (first_sibling == cpu)
>>>>>> -				map[cpu] = cpu_to_queue_index(qmap, nr_queues, cpu);
>>>>>> -			else
>>>>>> +			if (first_sibling == cpu) {
>>>>>> +				map[cpu] = cpu_to_queue_index(qmap, nr_queues, core);
>>>>>> +				core++;
>>>>>> +			} else
>>>>>>  				map[cpu] = map[first_sibling];
>>>>>>  		}
>>>>>>  	}
>>>>>>


* Re:Re: [PATCH] block/mq: blk map queues by core id
  2019-03-25 13:53           ` Dongli Zhang
@ 2019-03-25 15:17             ` luferry
  0 siblings, 0 replies; 9+ messages in thread
From: luferry @ 2019-03-25 15:17 UTC (permalink / raw)
  To: Dongli Zhang
  Cc: Jens Axboe, Ming Lei, Christoph Hellwig, linux-block, linux-kernel




Thanks for your kind suggestions, I will reformat this patch later!




At 2019-03-25 21:53:00, "Dongli Zhang" <dongli.zhang@oracle.com> wrote:
>
>
>On 03/25/2019 05:53 PM, luferry wrote:
>> 
>> sorry,  messed up some func name
>> 
>> When enable virtio-blk with multi queues but with only 2 msix-vector.
>> vp_dev->per_vq_vectors will be false, so blk_mq_virtio_map_queues will fallback to blk_mq_map_queues
>> 
>> Since virtual machine users cannot change the vector numbers, I think blk_mq_map_queues should be more friendly with different cpu topology.
>
>
>Yes, we will fallback to blk_mq_map_queues if there is lack of vectors.
>
>Here is the qemu cmdline:
>"-smp 8,sockets=1,cores=4,threads=2"
>"-device virtio-blk-pci,drive=drive0,id=device0,num-queues=4,vectors=2"
>
># cat /sys/block/vda/mq/0/cpu_list
>0, 4, 5
>root@vm:/home/zhang# cat /sys/block/vda/mq/1/cpu_list
>1
>root@vm:/home/zhang# cat /sys/block/vda/mq/2/cpu_list
>2, 6, 7
>root@vm:/home/zhang# cat /sys/block/vda/mq/3/cpu_list
>3
>
>How about to put how to hit the issue in commit message so people would be able
>to know the scenario?
>
>Dongli Zhang
>
>> 
>> 
>> 
>> 
>> At 2019-03-25 17:49:33, "luferry" <luferry@163.com> wrote:
>>>
>>> After reading the code and compare pci device info.
>>> I can reproduce this imbalance under KVM.
>>> When enable virtio-blk with multi queues but with only 2 msix-vector.
>>> vp_dev->per_vq_vectors will be false, so blk_mq_virtio_map_queues will fallback to virtio_mq_map_queues
>>>
>>> Since virtual machine users cannot change the vector numbers, I think virtio_mq_map_queues should be more friendly with different cpu topology. 
>>>
>>> 448 const struct cpumask *vp_get_vq_affinity(struct virtio_device *vdev, int index)
>>> 449 {
>>> 450     struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>>> 451
>>> 452     if (!vp_dev->per_vq_vectors ||
>>> 453         vp_dev->vqs[index]->msix_vector == VIRTIO_MSI_NO_VECTOR)
>>> 454         return NULL;
>>> 455
>>> 456     return pci_irq_get_affinity(vp_dev->pci_dev,
>>> 457                     vp_dev->vqs[index]->msix_vector);
>>> 458 }
>>>
>>> 32 int blk_mq_virtio_map_queues(struct blk_mq_queue_map *qmap,
>>> 33         struct virtio_device *vdev, int first_vec)
>>> 34 {
>>> 35     const struct cpumask *mask;
>>> 36     unsigned int queue, cpu;
>>> 37
>>> 38     if (!vdev->config->get_vq_affinity)
>>> 39         goto fallback;
>>> 40
>>> 41     for (queue = 0; queue < qmap->nr_queues; queue++) {
>>> 42         mask = vdev->config->get_vq_affinity(vdev, first_vec + queue); //vp_get_vq_affinity
>>> 43         if (!mask)
>>> 44             goto fallback;
>>> 45
>>> 46         for_each_cpu(cpu, mask)
>>> 47             qmap->mq_map[cpu] = qmap->queue_offset + queue;
>>> 48     }
>>> 49
>>> 50     return 0;
>>> 51 fallback:
>>> 52     return blk_mq_map_queues(qmap);
>>> 53 }
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> At 2019-03-23 19:14:34, "Dongli Zhang" <dongli.zhang@oracle.com> wrote:
>>>>
>>>>
>>>> On 03/23/2019 02:34 PM, luferry wrote:
>>>>>
>>>>> I just bought one VM, so I cannot access the hypervisor. I will try to build the environment on my desktop.
>>>>> I am sure about the following:
>>>>> The hypervisor is KVM and disk driver is virtio-blk.
>>>>> [root@blk-mq ~]# dmesg |grep KVM
>>>>> [    0.000000] Hypervisor detected: KVM
>>>>> [    0.186330] Booting paravirtualized kernel on KVM
>>>>> [    0.279106] KVM setup async PF for cpu 0
>>>>> [    0.420819] KVM setup async PF for cpu 1
>>>>> [    0.421682] KVM setup async PF for cpu 2
>>>>> [    0.422113] KVM setup async PF for cpu 3
>>>>> [    0.422434] KVM setup async PF for cpu 4
>>>>> [    0.422434] KVM setup async PF for cpu 5
>>>>> [    0.423312] KVM setup async PF for cpu 6
>>>>> [    0.423394] KVM setup async PF for cpu 7
>>>>> [    0.424125] KVM setup async PF for cpu 8
>>>>> [    0.424414] KVM setup async PF for cpu 9
>>>>> [    0.424415] KVM setup async PF for cpu 10
>>>>> [    0.425329] KVM setup async PF for cpu 11
>>>>> [    0.425420] KVM setup async PF for cpu 12
>>>>> [    0.426156] KVM setup async PF for cpu 13
>>>>> [    0.426431] KVM setup async PF for cpu 14
>>>>> [    0.426431] KVM setup async PF for cpu 15
>>>>> [root@blk-mq ~]# lspci |grep block
>>>>> 00:05.0 SCSI storage controller: Red Hat, Inc. Virtio block device
>>>>> 00:06.0 SCSI storage controller: Red Hat, Inc. Virtio block device
>>>>>
>>>>> [root@blk-mq ~]# lscpu
>>>>> Architecture:          x86_64
>>>>> CPU op-mode(s):        32-bit, 64-bit
>>>>> Byte Order:            Little Endian
>>>>> CPU(s):                16
>>>>> On-line CPU(s) list:   0-15
>>>>> Thread(s) per core:    2
>>>>> Core(s) per socket:    8
>>>>>
>>>>> [root@blk-mq ~]# ls /sys/block/vdb/mq/
>>>>> 0  1  2  3
>>>>>
>>>>> [root@blk-mq ~]# cat /proc/cpuinfo |egrep 'processor|core id'
>>>>> processor	: 0
>>>>> core id		: 0
>>>>> processor	: 1
>>>>> core id		: 0
>>>>> processor	: 2
>>>>> core id		: 1
>>>>> processor	: 3
>>>>> core id		: 1
>>>>> processor	: 4
>>>>> core id		: 2
>>>>> processor	: 5
>>>>> core id		: 2
>>>>> processor	: 6
>>>>> core id		: 3
>>>>> processor	: 7
>>>>> core id		: 3
>>>>> processor	: 8
>>>>> core id		: 4
>>>>> processor	: 9
>>>>> core id		: 4
>>>>> processor	: 10
>>>>> core id		: 5
>>>>> processor	: 11
>>>>> core id		: 5
>>>>> processor	: 12
>>>>> core id		: 6
>>>>> processor	: 13
>>>>> core id		: 6
>>>>> processor	: 14
>>>>> core id		: 7
>>>>> processor	: 15
>>>>> core id		: 7
>>>>>
>>>>> --before this patch--
>>>>> [root@blk-mq ~]# cat /sys/block/vdb/mq/*/cpu_list
>>>>> 0, 4, 5, 8, 9, 12, 13
>>>>> 1
>>>>> 2, 6, 7, 10, 11, 14, 15
>>>>> 3
>>>>>
>>>>> --after this patch--
>>>>> [root@blk-mq ~]# cat /sys/block/vdb/mq/*/cpu_list
>>>>> 0, 4, 5, 12, 13
>>>>> 1, 6, 7, 14, 15
>>>>> 2, 8, 9
>>>>> 3, 10, 11
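
To see where both layouts come from, here is a minimal user-space sketch (not
kernel code) that mimics the mapping logic for this 16-CPU, 4-queue case. It
assumes the sibling-adjacent topology shown by the core id listing above
(threads 2n and 2n+1 share a core) and treats cpu_to_queue_index() as
index % nr_queues; the hypothetical by_core flag switches between the old
behaviour and the patched one:

#include <stdio.h>

#define NR_CPUS   16
#define NR_QUEUES  4

/* threads 2n and 2n+1 are siblings on the same core (VM-style topology) */
static unsigned int first_sibling(unsigned int cpu)
{
        return cpu & ~1u;
}

/* cpu_to_queue_index() is modeled here as a plain modulo */
static void map_queues(unsigned int *map, int by_core)
{
        unsigned int cpu, sib, core = 0;

        for (cpu = 0; cpu < NR_CPUS; cpu++) {
                if (cpu < NR_QUEUES) {
                        map[cpu] = cpu % NR_QUEUES;     /* sequential part */
                } else {
                        sib = first_sibling(cpu);
                        if (sib == cpu)
                                /* old: key on cpu id; patched: key on core counter */
                                map[cpu] = (by_core ? core++ : cpu) % NR_QUEUES;
                        else
                                map[cpu] = map[sib];    /* follow the sibling */
                }
        }
}

static void print_map(const char *tag, const unsigned int *map)
{
        unsigned int q, cpu;

        printf("%s\n", tag);
        for (q = 0; q < NR_QUEUES; q++) {
                printf("  queue %u:", q);
                for (cpu = 0; cpu < NR_CPUS; cpu++)
                        if (map[cpu] == q)
                                printf(" %u", cpu);
                printf("\n");
        }
}

int main(void)
{
        unsigned int map[NR_CPUS];

        map_queues(map, 0);
        print_map("before the patch", map);
        map_queues(map, 1);
        print_map("after the patch", map);
        return 0;
}

Running it prints the same before/after queue memberships as the cpu_list
output above: the patched variant spreads whole cores round-robin across the
queues instead of keying on the raw cpu id.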
>>>>>
>>>>>
>>>>> I add dump_stack insert blk_mq_map_queues.
>>>>>
>>>>> [    1.378680] Call Trace:
>>>>> [    1.378690]  dump_stack+0x5a/0x73
>>>>> [    1.378695]  blk_mq_map_queues+0x29/0xb0
>>>>> [    1.378700]  blk_mq_alloc_tag_set+0x1bd/0x2d0
>>>>> [    1.378705]  virtblk_probe+0x1ae/0x8e4 [virtio_blk]
>>>>> [    1.378709]  virtio_dev_probe+0x18a/0x230 [virtio]
>>>>> [    1.378713]  really_probe+0x215/0x3f0
>>>>> [    1.378716]  driver_probe_device+0x115/0x130
>>>>> [    1.378718]  device_driver_attach+0x50/0x60
>>>>> [    1.378720]  __driver_attach+0xbd/0x140
>>>>> [    1.378722]  ? device_driver_attach+0x60/0x60
>>>>> [    1.378724]  bus_for_each_dev+0x67/0xc0
>>>>> [    1.378727]  ? klist_add_tail+0x57/0x70
>>>>
>>>> I am not able to reproduce the above call stack when virtio-blk is assigned 4 queues
>>>> while my qemu cmdline is "-smp 16,sockets=1,cores=8,threads=2".
>>>>
>>>> # cat /sys/block/vda/mq/0/cpu_list
>>>> 0, 1, 2, 3
>>>> # cat /sys/block/vda/mq/1/cpu_list
>>>> 4, 5, 6, 7
>>>> # cat /sys/block/vda/mq/2/cpu_list
>>>> 8, 9, 10, 11
>>>> # cat /sys/block/vda/mq/3/cpu_list
>>>> 12, 13, 14, 15
>>>>
>>>>
>>>> I do agree that in the above case we would have an issue if the mapping is
>>>> established by blk_mq_map_queues().
>>>>
>>>>
>>>> However, I am just curious how we finally reach blk_mq_map_queues() from
>>>> blk_mq_alloc_tag_set()?
>>>>
>>>> It should be something like:
>>>>
>>>> blk_mq_alloc_tag_set()
>>>> -> blk_mq_update_queue_map()
>>>>      -> if (set->ops->map_queues && !is_kdump_kernel())
>>>>             return set->ops->map_queues(set);
>>>>      -> else
>>>>             return blk_mq_map_queues(&set->map[HCTX_TYPE_DEFAULT]);
>>>>
>>>> Wouldn't we always have set->ops->map_queues = virtblk_map_queues()?
>>>>
>>>> Or does the execution reach:
>>>>
>>>> virtblk_map_queues()
>>>> -> blk_mq_virtio_map_queues()
>>>>     -> if (!vdev->config->get_vq_affinity)
>>>>            return blk_mq_map_queues(qmap);
>>>>     -> else
>>>>            establish the mapping via get_vq_affinity
>>>>
>>>> but vdev->config->get_vq_affinity == NULL?
>>>>
>>>> For virtio pci, get_vq_affinity is always set. It seems virtio mmio does not set
>>>> get_vq_affinity.
>>>>
>>>>
>>>> I used to play with firecracker (by Amazon), and it is interesting that firecracker
>>>> uses mmio to set up virtio-blk.
>>>>
>>>>
>>>> Sorry for disturbing the review of this patch. I just would like to clarify in
>>>> which scenario we would hit this issue, e.g., when virtio-blk is based on mmio?
>>>>
>>>> Dongli Zhang
>>>>
>>>>>
>>>>>
>>>>> At 2019-03-22 19:58:08, "Dongli Zhang" <dongli.zhang@oracle.com> wrote:
>>>>>>
>>>>>>
>>>>>> On 03/22/2019 06:09 PM, luferry wrote:
>>>>>>> under virtual machine environment, cpu topology may differ from normal
>>>>>>> physical server.
>>>>>>
>>>>>> Would mind share the name of virtual machine monitor, the command line if
>>>>>> available, and which device to reproduce.
>>>>>>
>>>>>> For instance, I am not able to reproduce with qemu nvme or virtio-blk as I
>>>>>> assume they use pci or virtio specific mapper to establish the mapping.
>>>>>>
>>>>>> E.g., with qemu and nvme: -smp 8,sockets=1,cores=4,threads=2
>>>>>>
>>>>>> Indeed I use three queues instead of twp as one is reserved for admin.
>>>>>>
>>>>>> # ls /sys/block/nvme0n1/mq/*
>>>>>> /sys/block/nvme0n1/mq/0:
>>>>>> cpu0  cpu1  cpu2  cpu3  cpu_list  nr_reserved_tags  nr_tags
>>>>>>
>>>>>> /sys/block/nvme0n1/mq/1:
>>>>>> cpu4  cpu5  cpu6  cpu7  cpu_list  nr_reserved_tags  nr_tags
>>>>>>
>>>>>>
>>>>>> Thank you very much!
>>>>>>
>>>>>> Dongli Zhang
>>>>>>
>>>>>>> for example (machine with 4 cores, 2 threads per core):
>>>>>>>
>>>>>>> normal physical server:
>>>>>>> core-id   thread-0-id  thread-1-id
>>>>>>> 0         0            4
>>>>>>> 1         1            5
>>>>>>> 2         2            6
>>>>>>> 3         3            7
>>>>>>>
>>>>>>> virtual machine:
>>>>>>> core-id   thread-0-id  thread-1-id
>>>>>>> 0         0            1
>>>>>>> 1         2            3
>>>>>>> 2         4            5
>>>>>>> 3         6            7
>>>>>>>
>>>>>>> When attach disk with two queues, all the even numbered cpus will be
>>>>>>> mapped to queue 0. Under virtual machine, all the cpus is followed by
>>>>>>> its sibling cpu.Before this patch, all the odd numbered cpus will also
>>>>>>> be mapped to queue 0, can cause serious imbalance.this will lead to
>>>>>>> performance impact on system IO
>>>>>>>
>>>>>>> So suggest to allocate cpu map by core id, this can be more currency
>>>>>>>
>>>>>>> Signed-off-by: luferry <luferry@163.com>
>>>>>>> ---
>>>>>>>  block/blk-mq-cpumap.c | 9 +++++----
>>>>>>>  1 file changed, 5 insertions(+), 4 deletions(-)
>>>>>>>
>>>>>>> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
>>>>>>> index 03a534820271..4125e8e77679 100644
>>>>>>> --- a/block/blk-mq-cpumap.c
>>>>>>> +++ b/block/blk-mq-cpumap.c
>>>>>>> @@ -35,7 +35,7 @@ int blk_mq_map_queues(struct blk_mq_queue_map *qmap)
>>>>>>>  {
>>>>>>>  	unsigned int *map = qmap->mq_map;
>>>>>>>  	unsigned int nr_queues = qmap->nr_queues;
>>>>>>> -	unsigned int cpu, first_sibling;
>>>>>>> +	unsigned int cpu, first_sibling, core = 0;
>>>>>>>  
>>>>>>>  	for_each_possible_cpu(cpu) {
>>>>>>>  		/*
>>>>>>> @@ -48,9 +48,10 @@ int blk_mq_map_queues(struct blk_mq_queue_map *qmap)
>>>>>>>  			map[cpu] = cpu_to_queue_index(qmap, nr_queues, cpu);
>>>>>>>  		} else {
>>>>>>>  			first_sibling = get_first_sibling(cpu);
>>>>>>> -			if (first_sibling == cpu)
>>>>>>> -				map[cpu] = cpu_to_queue_index(qmap, nr_queues, cpu);
>>>>>>> -			else
>>>>>>> +			if (first_sibling == cpu) {
>>>>>>> +				map[cpu] = cpu_to_queue_index(qmap, nr_queues, core);
>>>>>>> +				core++;
>>>>>>> +			} else
>>>>>>>  				map[cpu] = map[first_sibling];
>>>>>>>  		}
>>>>>>>  	}
>>>>>>>
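
Applied to the commit-message example (4 cores, 2 threads per core, siblings on
adjacent CPU ids, a disk with 2 queues), and again treating
cpu_to_queue_index() as index % nr_queues, the diff changes the mapping roughly
as follows:

  before:  queue 0 <- cpus 0, 2, 3, 4, 5, 6, 7
           queue 1 <- cpu  1

  after:   queue 0 <- cpus 0, 2, 3, 6, 7
           queue 1 <- cpus 1, 4, 5

The first nr_queues CPUs are still mapped sequentially by cpu id, so the split
is not perfectly even, but the remaining cores now alternate between the queues
instead of all collapsing onto queue 0.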

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2019-03-25 15:25 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-22 10:09 [PATCH] block/mq: blk map queues by core id luferry
2019-03-22 11:53 ` Dongli Zhang
2019-03-22 11:58 ` Dongli Zhang
2019-03-23  6:34   ` luferry
2019-03-23 11:14     ` Dongli Zhang
2019-03-25  9:49       ` luferry
2019-03-25  9:53         ` luferry
2019-03-25 13:53           ` Dongli Zhang
2019-03-25 15:17             ` luferry

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).