linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: John Garry <john.garry@huawei.com>
To: Sumit Saxena <sumit.saxena@broadcom.com>, Hannes Reinecke <hare@suse.de>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
	Jens Axboe <axboe@kernel.dk>, Christoph Hellwig <hch@lst.de>,
	James Bottomley <james.bottomley@hansenpartnership.com>,
	Ming Lei <ming.lei@redhat.com>,
	Linux SCSI List <linux-scsi@vger.kernel.org>,
	<linux-block@vger.kernel.org>, Hannes Reinecke <hare@suse.com>
Subject: Re: [PATCH 09/11] megaraid_sas: switch fusion adapters to MQ
Date: Fri, 10 Jan 2020 12:18:47 +0000	[thread overview]
Message-ID: <244b9da3-b9ef-de61-addf-880bb757bb77@huawei.com> (raw)
In-Reply-To: <CAL2rwxotoWakFS4DPe85hZ4VAgd_zw8pL+B5ckHR9NwEf+-L=g@mail.gmail.com>

On 10/01/2020 04:00, Sumit Saxena wrote:
> On Mon, Dec 9, 2019 at 4:32 PM Hannes Reinecke <hare@suse.de> wrote:
>>
>> On 12/9/19 11:10 AM, Sumit Saxena wrote:
>>> On Mon, Dec 2, 2019 at 9:09 PM Hannes Reinecke <hare@suse.de> wrote:
>>>>
>>>> Fusion adapters can steer completions to individual queues, and
>>>> we now have support for shared host-wide tags.
>>>> So we can enable multiqueue support for fusion adapters and
>>>> drop the hand-crafted interrupt affinity settings.
>>>
>>> Hi Hannes,
>>>
>>> Ming Lei also proposed similar changes in megaraid_sas driver some
>>> time back and it had resulted in performance drop-
>>> https://patchwork.kernel.org/patch/10969511/
>>>
>>> So, we will do some performance tests with this patch and update you.
>>> Thank you.
>>
>> I'm aware of the results of Ming Leis work, but I do hope this patchset
>> performs better.
>>
>> And when you do performance measurements, can you please run with both,
>> 'none' I/O scheduler and 'mq-deadline' I/O scheduler?
>> I've measured quite a performance improvements when using mq-deadline,
>> up to the point where I've gotten on-par performance with the original,
>> non-mq, implementation.
>> (As a data point, on my setup I've measured about 270k IOPS and 1092
>> MB/s througput, running on just 2 SSDs).
>>
>> But thanks for doing a performance test here.
> 

Hi Sumit,

Thanks for this. We'll have a look ...

Cheers,
John

EOM

> Hi Hannes,
> 
> Sorry for the delay in replying, I observed a few issues with this patchset:
> 
> 1. "blk_mq_unique_tag_to_hwq(tag)" does not return MSI-x vector to
> which IO submitter CPU is affined with. Due to this IO submission and
> completion CPUs are different which causes performance drop for low
> latency workloads.
> 
> lspcu:
> 
> # lscpu
> Architecture:          x86_64
> CPU op-mode(s):        32-bit, 64-bit
> Byte Order:            Little Endian
> CPU(s):                72
> On-line CPU(s) list:   0-71
> Thread(s) per core:    2
> Core(s) per socket:    18
> Socket(s):             2
> NUMA node(s):          2
> Vendor ID:             GenuineIntel
> CPU family:            6
> Model:                 85
> Model name:            Intel(R) Xeon(R) Gold 6150 CPU @ 2.70GHz
> Stepping:              4
> CPU MHz:               3204.246
> CPU max MHz:           3700.0000
> CPU min MHz:           1200.0000
> BogoMIPS:              5400.00
> Virtualization:        VT-x
> L1d cache:             32K
> L1i cache:             32K
> L2 cache:              1024K
> L3 cache:              25344K
> NUMA node0 CPU(s):     0-17,36-53
> NUMA node1 CPU(s):     18-35,54-71
> Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep
> mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht
> tm pbe s
> yscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts
> rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq
> dtes64 monitor
> ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1
> sse4_2 x2apic movbe popcnt tsc_deadline_timer xsave avx f16c rdrand
> lahf_lm abm
> 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single intel_ppin
> mba tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust
> bmi1 hle
> avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed
> adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt
> xsavec
> xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_lo
> 
> 
> fio test:
> 
> #numactl -N 0 fio jbod.fio
> 
> # cat jbod.fio
> [global]
> allow_mounted_write=0
> ioengine=libaio
> buffered=0
> direct=1
> group_reporting
> iodepth=1
> bs=4096
> readwrite=randread
> ..
> 
> In this test, IOs are submitted on Node 0 but observed IO completions on Node 1.
> 
> IRQs / 1 second(s)
> IRQ#  TOTAL  NODE0   NODE1  NAME
> 176  48387  48387       0  IR-PCI-MSI 34603073-edge megasas14-msix65
> 178  47966  47966       0  IR-PCI-MSI 34603075-edge megasas14-msix67
> 170  47706  47706       0  IR-PCI-MSI 34603067-edge megasas14-msix59
> 180  47291  47291       0  IR-PCI-MSI 34603077-edge megasas14-msix69
> 181  47155  47155       0  IR-PCI-MSI 34603078-edge megasas14-msix70
> 173  46806  46806       0  IR-PCI-MSI 34603070-edge megasas14-msix62
> 179  46773  46773       0  IR-PCI-MSI 34603076-edge megasas14-msix68
> 169  46600  46600       0  IR-PCI-MSI 34603066-edge megasas14-msix58
> 175  46447  46447       0  IR-PCI-MSI 34603072-edge megasas14-msix64
> 172  46184  46184       0  IR-PCI-MSI 34603069-edge megasas14-msix61
> 182  46117  46117       0  IR-PCI-MSI 34603079-edge megasas14-msix71
> 165  46070  46070       0  IR-PCI-MSI 34603062-edge megasas14-msix54
> 164  45892  45892       0  IR-PCI-MSI 34603061-edge megasas14-msix53
> 174  45864  45864       0  IR-PCI-MSI 34603071-edge megasas14-msix63
> 156  45348  45348       0  IR-PCI-MSI 34603053-edge megasas14-msix45
> 147  45302      0   45302  IR-PCI-MSI 34603044-edge megasas14-msix36
> 151  44922      0   44922  IR-PCI-MSI 34603048-edge megasas14-msix40
> 171  44876  44876       0  IR-PCI-MSI 34603068-edge megasas14-msix60
> 159  44755  44755       0  IR-PCI-MSI 34603056-edge megasas14-msix48
> 148  44695      0   44695  IR-PCI-MSI 34603045-edge megasas14-msix37
> 157  44304  44304       0  IR-PCI-MSI 34603054-edge megasas14-msix46
> 167  42552  42552       0  IR-PCI-MSI 34603064-edge megasas14-msix56
> 154  35937      0   35937  IR-PCI-MSI 34603051-edge megasas14-msix43
> 166  16004  16004       0  IR-PCI-MSI 34603063-edge megasas14-msix55
> 
> 
> IRQ-CPU affinity:
> 
> Ran below script to get IRQ-CPU affinity:
> --
> #!/bin/bash
> PCID=`lspci | grep "SAS39xx" | cut -c1-7`
> PCIP=`find /sys/devices -name *$PCID | grep pci`
> IRQS=`ls $PCIP/msi_irqs`
> 
> echo "kernel version: "
> uname -a
> 
> for IRQ in $IRQS; do
>      CPUS=`cat /proc/irq/$IRQ/smp_affinity_list`
>      echo "irq $IRQ, cpu list $CPUS"
> done
> 
> --
> 
> irq 103, cpu list 0-17,36-53
> irq 112, cpu list 0-17,36-53
> irq 113, cpu list 0-17,36-53
> irq 114, cpu list 0-17,36-53
> irq 115, cpu list 0-17,36-53
> irq 116, cpu list 0-17,36-53
> irq 117, cpu list 0-17,36-53
> irq 118, cpu list 0-17,36-53
> irq 119, cpu list 18
> irq 120, cpu list 19
> irq 121, cpu list 20
> irq 122, cpu list 21
> irq 123, cpu list 22
> irq 124, cpu list 23
> irq 125, cpu list 24
> irq 126, cpu list 25
> irq 127, cpu list 26
> irq 128, cpu list 27
> irq 129, cpu list 28
> irq 130, cpu list 29
> irq 131, cpu list 30
> irq 132, cpu list 31
> irq 133, cpu list 32
> irq 134, cpu list 33
> irq 135, cpu list 34
> irq 136, cpu list 35
> irq 137, cpu list 54
> irq 138, cpu list 55
> irq 139, cpu list 56
> irq 140, cpu list 57
> irq 141, cpu list 58
> irq 142, cpu list 59
> irq 143, cpu list 60
> irq 144, cpu list 61
> irq 145, cpu list 62
> irq 146, cpu list 63
> irq 147, cpu list 64
> irq 148, cpu list 65
> irq 149, cpu list 66
> irq 150, cpu list 67
> irq 151, cpu list 68
> irq 152, cpu list 69
> irq 153, cpu list 70
> irq 154, cpu list 71
> irq 155, cpu list 0
> irq 156, cpu list 1
> irq 157, cpu list 2
> irq 158, cpu list 3
> irq 159, cpu list 4
> irq 160, cpu list 5
> irq 161, cpu list 6
> irq 162, cpu list 7
> irq 163, cpu list 8
> irq 164, cpu list 9
> irq 165, cpu list 10
> irq 166, cpu list 11
> irq 167, cpu list 12
> irq 168, cpu list 13
> irq 169, cpu list 14
> irq 170, cpu list 15
> irq 171, cpu list 16
> irq 172, cpu list 17
> irq 173, cpu list 36
> irq 174, cpu list 37
> irq 175, cpu list 38
> irq 176, cpu list 39
> irq 177, cpu list 40
> irq 178, cpu list 41
> irq 179, cpu list 42
> irq 180, cpu list 43
> irq 181, cpu list 44
> irq 182, cpu list 45
> irq 183, cpu list 46
> irq 184, cpu list 47
> irq 185, cpu list 48
> irq 186, cpu list 49
> irq 187, cpu list 50
> irq 188, cpu list 51
> irq 189, cpu list 52
> irq 190, cpu list 53
> 
> 
> I added prints in megaraid_sas driver's IO path to catch when MSI-x
> affined to IO submitter CPU does not match with what is returned by
> API "blk_mq_unique_tag_to_hwq(tag)".
> I have copied below few prints from dmesg logs- in these prints CPU is
> submitter CPU, calculated MSX-x is what is returned by
> "blk_mq_unique_tag_to_hwq(tag)" and affined MSI-x is actual MSI-x to
> which submitting
> CPU is affined to:
> 
> [2536843.629877] BRCM DBG: CPU:6 calculated MSI-x: 153 affined MSIx: 161
> [2536843.641674] BRCM DBG: CPU:39 calculated MSI-x: 168 affined MSIx: 176
> [2536843.641674] BRCM DBG: CPU:13 calculated MSI-x: 160 affined MSIx: 168
> 
> 
> 2. Seeing below stack traces/messages in dmesg during driver unload –
> 
> [2565601.054366] Call Trace:
> [2565601.054368]  blk_mq_free_map_and_requests+0x28/0x50
> [2565601.054369]  blk_mq_free_tag_set+0x1d/0x90
> [2565601.054370]  scsi_host_dev_release+0x8a/0xf0
> [2565601.054370]  device_release+0x27/0x80
> [2565601.054371]  kobject_cleanup+0x61/0x190
> [2565601.054373]  megasas_detach_one+0x4c1/0x650 [megaraid_sas]
> [2565601.054374]  pci_device_remove+0x3b/0xc0
> [2565601.054375]  device_release_driver_internal+0xec/0x1b0
> [2565601.054376]  driver_detach+0x46/0x90
> [2565601.054377]  bus_remove_driver+0x58/0xd0
> [2565601.054378]  pci_unregister_driver+0x26/0xa0
> [2565601.054379]  megasas_exit+0x91/0x882 [megaraid_sas]
> [2565601.054381]  __x64_sys_delete_module+0x16c/0x250
> [2565601.054382]  do_syscall_64+0x5b/0x1b0
> [2565601.054383]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [2565601.054383] RIP: 0033:0x7f7212a82837
> [2565601.054384] RSP: 002b:00007ffdfa2dcea8 EFLAGS: 00000202 ORIG_RAX:
> 00000000000000b0
> [2565601.054385] RAX: ffffffffffffffda RBX: 0000000000b6e2e0 RCX:
> 00007f7212a82837
> [2565601.054385] RDX: 00007f7212af3ac0 RSI: 0000000000000800 RDI:
> 0000000000b6e348
> [2565601.054386] RBP: 0000000000000000 R08: 00007f7212d47060 R09:
> 00007f7212af3ac0
> [2565601.054386] R10: 00007ffdfa2dcbc0 R11: 0000000000000202 R12:
> 00007ffdfa2dd71c
> [2565601.054387] R13: 0000000000000000 R14: 0000000000b6e2e0 R15:
> 0000000000b6e010
> [2565601.054387] ---[ end trace 38899303bd85e838 ]---
> [2565601.054390] ------------[ cut here ]------------
> [2565601.054391] WARNING: CPU: 31 PID: 50996 at block/blk-mq.c:2056
> blk_mq_free_rqs+0x10b/0x120
> [2565601.054391] Modules linked in: megaraid_sas(-) ses enclosure
> scsi_transport_sas xt_CHECKSUM xt_MASQUERADE tun bridge stp llc
> ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6
> xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute ip6table_nat
> ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_nat
> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle
> iptable_security iptable_raw ebtable_filter ebtables ip6table_filter
> ip6_tables iptable_filter intel_rapl_msr intel_rapl_common skx_edac
> nfit libnvdimm ftdi_sio x86_pkg_temp_thermal intel_powerclamp coretemp
> kvm_intel snd_hda_codec_realtek kvm snd_hda_codec_generic
> ledtrig_audio snd_hda_intel snd_intel_nhlt snd_hda_codec snd_hda_core
> irqbypass snd_hwdep crct10dif_pclmul snd_seq crc32_pclmul
> ghash_clmulni_intel snd_seq_device snd_pcm iTCO_wdt mei_me
> iTCO_vendor_support cryptd snd_timer joydev snd mei soundcore ioatdma
> sg pcspkr ipmi_si ipmi_devintf ipmi_msghandler dca i2c_i801 lpc_ich
> acpi_power_meter wmi
> [2565601.054400]  acpi_pad nfsd auth_rpcgss nfs_acl lockd grace sunrpc
> ip_tables xfs libcrc32c sd_mod ast i2c_algo_bit drm_vram_helper ttm
> drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm
> e1000e ahci libahci crc32c_intel nvme libata nvme_core i2c_core
> dm_mirror dm_region_hash dm_log dm_mod [last unloaded: megaraid_sas]
> [2565601.054404] CPU: 31 PID: 50996 Comm: rmmod Kdump: loaded Tainted:
> G        W  OE     5.4.0-rc1+ #2
> [2565601.054405] Hardware name: Supermicro Super Server/X11DPG-QT,
> BIOS 1.0 06/22/2017
> [2565601.054406] RIP: 0010:blk_mq_free_rqs+0x10b/0x120
> [2565601.054407] Code: 89 10 48 8b 73 20 48 89 1b 4c 89 e7 48 89 5b 08
> e8 2a 54 e7 ff 48 8b 85 b0 00 00 00 49 39 c5 75 bc 5b 5d 41 5c 41 5d
> 41 5e c3 <0f> 0b 48 c7 02 00 00 00 00 e9 2b ff ff ff 0f 1f 80 00 00 00
> 00 0f
> [2565601.054407] RSP: 0018:ffffb37446a6bd58 EFLAGS: 00010286
> [2565601.054408] RAX: 0000000000000747 RBX: ffff92219cb280a8 RCX:
> 0000000000000747
> [2565601.054408] RDX: ffff92219b153a38 RSI: ffff9221692bb5c0 RDI:
> ffff92219cb280a8
> [2565601.054408] RBP: ffff9221692bb5c0 R08: 0000000000000001 R09:
> 0000000000000000
> [2565601.054409] R10: ffff9221692bb680 R11: ffff9221bffd5f60 R12:
> ffff9221970dd678
> [2565601.054409] R13: 0000000000000030 R14: ffff92219cb280a8 R15:
> ffff922199ada010
> [2565601.054410] FS:  00007f72135a1740(0000) GS:ffff92299f540000(0000)
> knlGS:0000000000000000
> [2565601.054410] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [2565601.054410] CR2: 0000000000b77c78 CR3: 0000000f27db0006 CR4:
> 00000000007606e0
> [2565601.054412] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [2565601.054412] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
> [2565601.054413] PKRU: 55555554
> 
> 
> 3. For High IOs(outstanding IOs per physical disk > 8) oriented
> workloads, performance numbers are good(no performance drop) as in
> that case driver uses non-managed affinity high IOPs reply queues and
> this patchset does not touch driver's high IOPs IO path.
> 
> 4. This patch removes below code from driver so what this piece of
> code does is broken-
> 
> 
> -                               if (instance->adapter_type >= INVADER_SERIES &&
> -                                   !instance->msix_combined) {
> -                                       instance->msix_load_balance = true;
> -                                       instance->smp_affinity_enable = false;
> -                               }
> 
> Thanks,
> Sumit
>>
>> Cheers,
>>
>> Hannes
>> --
>> Dr. Hannes Reinecke            Teamlead Storage & Networking
>> hare@suse.de                               +49 911 74053 688
>> SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
>> HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
> .
> 


  reply	other threads:[~2020-01-10 12:18 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-02 15:39 [PATCH RFC v5 00/11] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs Hannes Reinecke
2019-12-02 15:39 ` [PATCH 01/11] blk-mq: Remove some unused function arguments Hannes Reinecke
2019-12-02 15:39 ` [PATCH 02/11] blk-mq: rename BLK_MQ_F_TAG_SHARED as BLK_MQ_F_TAG_QUEUE_SHARED Hannes Reinecke
2019-12-02 15:39 ` [PATCH 03/11] blk-mq: rename blk_mq_update_tag_set_depth() Hannes Reinecke
2019-12-03 14:30   ` John Garry
2019-12-03 14:53     ` Hannes Reinecke
2019-12-02 15:39 ` [PATCH 04/11] blk-mq: Facilitate a shared sbitmap per tagset Hannes Reinecke
2019-12-03 14:54   ` John Garry
2019-12-03 15:02     ` Hannes Reinecke
2019-12-04 10:24       ` John Garry
2019-12-03 16:38     ` Bart Van Assche
2019-12-02 15:39 ` [PATCH 05/11] blk-mq: add WARN_ON in blk_mq_free_rqs() Hannes Reinecke
2019-12-02 15:39 ` [PATCH 06/11] blk-mq: move shared sbitmap into elevator queue Hannes Reinecke
2019-12-02 15:39 ` [PATCH 07/11] scsi: Add template flag 'host_tagset' Hannes Reinecke
2019-12-02 15:39 ` [PATCH 08/11] scsi: hisi_sas: Switch v3 hw to MQ Hannes Reinecke
2019-12-02 15:39 ` [PATCH 09/11] megaraid_sas: switch fusion adapters " Hannes Reinecke
2019-12-09 10:10   ` Sumit Saxena
2019-12-09 11:02     ` Hannes Reinecke
2020-01-10  4:00       ` Sumit Saxena
2020-01-10 12:18         ` John Garry [this message]
2020-01-13 17:42         ` John Garry
2020-01-14  7:05           ` Hannes Reinecke
2020-01-16 15:47             ` John Garry
2020-01-16 17:45               ` Hannes Reinecke
2020-01-17 11:40             ` Sumit Saxena
2020-01-17 11:18           ` Sumit Saxena
2020-02-13 10:07             ` John Garry
2020-02-17 10:09               ` Sumit Saxena
2020-01-09 11:55     ` John Garry
2020-01-09 15:19       ` Hannes Reinecke
2020-01-09 18:17         ` John Garry
2020-01-10  2:00       ` Ming Lei
2020-01-10 12:09         ` John Garry
2020-01-14 13:57           ` John Garry
2019-12-02 15:39 ` [PATCH 10/11] smartpqi: enable host tagset Hannes Reinecke
2019-12-02 15:39 ` [PATCH 11/11] hpsa: enable host_tagset and switch to MQ Hannes Reinecke
2020-02-26 11:09 ` [PATCH RFC v5 00/11] blk-mq/scsi: Provide hostwide shared tags for SCSI HBAs John Garry

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=244b9da3-b9ef-de61-addf-880bb757bb77@huawei.com \
    --to=john.garry@huawei.com \
    --cc=axboe@kernel.dk \
    --cc=hare@suse.com \
    --cc=hare@suse.de \
    --cc=hch@lst.de \
    --cc=james.bottomley@hansenpartnership.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=ming.lei@redhat.com \
    --cc=sumit.saxena@broadcom.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).