From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,NICE_REPLY_A,SPF_HELO_NONE, SPF_PASS,URIBL_BLOCKED,USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1ED74C433DB for ; Tue, 23 Feb 2021 09:00:26 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B0F3664DDA for ; Tue, 23 Feb 2021 09:00:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230482AbhBWJAX (ORCPT ); Tue, 23 Feb 2021 04:00:23 -0500 Received: from frasgout.his.huawei.com ([185.176.79.56]:2595 "EHLO frasgout.his.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231143AbhBWJAV (ORCPT ); Tue, 23 Feb 2021 04:00:21 -0500 Received: from fraeml714-chm.china.huawei.com (unknown [172.18.147.226]) by frasgout.his.huawei.com (SkyGuard) with ESMTP id 4DlCZ13ynYz67rv9; Tue, 23 Feb 2021 16:55:37 +0800 (CST) Received: from lhreml724-chm.china.huawei.com (10.201.108.75) by fraeml714-chm.china.huawei.com (10.206.15.33) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2106.2; Tue, 23 Feb 2021 09:59:37 +0100 Received: from [10.47.0.222] (10.47.0.222) by lhreml724-chm.china.huawei.com (10.201.108.75) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2106.2; Tue, 23 Feb 2021 08:59:36 +0000 Subject: Re: [PATCH] scsi: scsi_host_queue_ready: increase busy count early To: Roger Willcocks , CC: , , , , , , , , , References: <20210120184548.20219-1-mwilck@suse.com> <37579c64-1cdb-8864-6a30-4d912836f28a@huawei.com> <231d9fcd-14f4-6abf-c41a-56315877a3dc@molgen.mpg.de> <87b7f873-46c4-140b-ee45-f724b50b6aca@huawei.com> <361d5a2f-fb8e-c400-2818-29aea435aff2@huawei.com> <2e4cca87aaa27220e186025573ae7c24579e8b7b.camel@suse.com> <0DB85ADC-B962-4AF9-B106-3F3B412CE4DB@filmlight.ltd.uk> From: John Garry Message-ID: <4bff6232-6abd-dae8-c240-07a1a40178bf@huawei.com> Date: Tue, 23 Feb 2021 08:57:49 +0000 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:68.0) Gecko/20100101 Thunderbird/68.1.2 MIME-Version: 1.0 In-Reply-To: <0DB85ADC-B962-4AF9-B106-3F3B412CE4DB@filmlight.ltd.uk> Content-Type: text/plain; charset="utf-8"; format=flowed Content-Language: en-US Content-Transfer-Encoding: 8bit X-Originating-IP: [10.47.0.222] X-ClientProxiedBy: lhreml742-chm.china.huawei.com (10.201.108.192) To lhreml724-chm.china.huawei.com (10.201.108.75) X-CFilter-Loop: Reflected Precedence: bulk List-ID: X-Mailing-List: linux-scsi@vger.kernel.org On 22/02/2021 14:23, Roger Willcocks wrote: > FYI we have exactly this issue on a machine here running CentOS 8.3 (kernel 4.18.0-240.1.1) (so presumably this happens in RHEL 8 too.) > > Controller is MSCC / Adaptec 3154-8i16e driving 60 x 12TB HGST drives configured as five x twelve-drive raid-6, software striped using md, and formatted with xfs. > > Test software writes to the array using multiple threads in parallel. > > The smartpqi driver would report controller offline within ten minutes or so, with status code 0x6100c > > Changed the driver to set 'nr_hw_queues = 1’ and then tested by filling the array with random files (which took a couple of days), which completed fine, so it looks like that one-line change fixes it. > That just makes the driver single-queue. As such, since the driver uses blk_mq_unique_tag_to_hwq(), only hw queue #0 will ever be used in the driver. And then, since the driver still spreads MSI-X interrupt vectors over all CPUs [from pci_alloc_vectors(PCI_IRQ_AFFINITY)], if CPUs associated with HW queue #0 are offlined (probably just cpu0), there is no CPUs available to service queue #0 interrupt. That's what I think would happen, from a quick glance at the code. > Would, of course, be helpful if this was back-ported. > > — > Roger > > > >> On 3 Feb 2021, at 15:56, Don.Brace@microchip.com wrote: >> >> -----Original Message----- >> From: Martin Wilck [mailto:mwilck@suse.com] >> Subject: Re: [PATCH] scsi: scsi_host_queue_ready: increase busy count early >> >>> >>> >>> Confirmed my suspicions - it looks like the host is sent more commands >>> than it can handle. We would need many disks to see this issue though, >>> which you have. >>> >>> So for stable kernels, 6eb045e092ef is not in 5.4 . Next is 5.10, and >>> I suppose it could be simply fixed by setting .host_tagset in scsi >>> host template there. >>> >>> Thanks, >>> John >>> -- >>> Don: Even though this works for current kernels, what would chances of >>> this getting back-ported to 5.9 or even further? >>> >>> Otherwise the original patch smartpqi_fix_host_qdepth_limit would >>> correct this issue for older kernels. >> >> True. However this is 5.12 material, so we shouldn't be bothered by that here. For 5.5 up to 5.9, you need a workaround. But I'm unsure whether smartpqi_fix_host_qdepth_limit would be the solution. >> You could simply divide can_queue by nr_hw_queues, as suggested before, or even simpler, set nr_hw_queues = 1. >> >> How much performance would that cost you? >> >> Don: For my HBA disk tests... >> >> Dividing can_queue / nr_hw_queues is about a 40% drop. >> ~380K - 400K IOPS >> Setting nr_hw_queues = 1 results in a 1.5 X gain in performance. >> ~980K IOPS >> Setting host_tagset = 1 >> ~640K IOPS >> >> So, it seem that setting nr_hw_queues = 1 results in the best performance. >> >> Is this expected? Would this also be true for the future? >> >> Thanks, >> Don Brace >> >> Below is my setup. >> --- >> [3:0:0:0] disk HP EG0900FBLSK HPD7 /dev/sdd >> [3:0:1:0] disk HP EG0900FBLSK HPD7 /dev/sde >> [3:0:2:0] disk HP EG0900FBLSK HPD7 /dev/sdf >> [3:0:3:0] disk HP EH0300FBQDD HPD5 /dev/sdg >> [3:0:4:0] disk HP EG0900FDJYR HPD4 /dev/sdh >> [3:0:5:0] disk HP EG0300FCVBF HPD9 /dev/sdi >> [3:0:6:0] disk HP EG0900FBLSK HPD7 /dev/sdj >> [3:0:7:0] disk HP EG0900FBLSK HPD7 /dev/sdk >> [3:0:8:0] disk HP EG0900FBLSK HPD7 /dev/sdl >> [3:0:9:0] disk HP MO0200FBRWB HPD9 /dev/sdm >> [3:0:10:0] disk HP MM0500FBFVQ HPD8 /dev/sdn >> [3:0:11:0] disk ATA MM0500GBKAK HPGC /dev/sdo >> [3:0:12:0] disk HP EG0900FBVFQ HPDC /dev/sdp >> [3:0:13:0] disk HP VO006400JWZJT HP00 /dev/sdq >> [3:0:14:0] disk HP VO015360JWZJN HP00 /dev/sdr >> [3:0:15:0] enclosu HP D3700 5.04 - >> [3:0:16:0] enclosu HP D3700 5.04 - >> [3:0:17:0] enclosu HPE Smart Adapter 3.00 - >> [3:1:0:0] disk HPE LOGICAL VOLUME 3.00 /dev/sds >> [3:2:0:0] storage HPE P408e-p SR Gen10 3.00 - >> ----- >> [global] >> ioengine=libaio >> ; rw=randwrite >> ; percentage_random=40 >> rw=write >> size=100g >> bs=4k >> direct=1 >> ramp_time=15 >> ; filename=/mnt/fio_test >> ; cpus_allowed=0-27 >> iodepth=4096 >> >> [/dev/sdd] >> [/dev/sde] >> [/dev/sdf] >> [/dev/sdg] >> [/dev/sdh] >> [/dev/sdi] >> [/dev/sdj] >> [/dev/sdk] >> [/dev/sdl] >> [/dev/sdm] >> [/dev/sdn] >> [/dev/sdo] >> [/dev/sdp] >> [/dev/sdq] >> [/dev/sdr] >> >> >> Distribution kernels would be yet another issue, distros can backport host_tagset and get rid of the issue. >> >> Regards >> Martin >> >> >> >> >> >> >> >> >> >> > > . >