Date: Thu, 4 Jun 2020 20:08:16 +0800
From: Ming Lei
To: John Garry
Cc: Christoph Hellwig, Dongli Zhang, Jens Axboe,
 "linux-block@vger.kernel.org", Hannes Reinecke, Daniel Wagner
Subject: Re: [PATCH] blk-mq: don't fail driver tag allocation because of inactive hctx
Message-ID: <20200604120747.GB2336493@T590>
References: <20200603105128.2147139-1-ming.lei@redhat.com>
 <20200603115347.GA8653@lst.de>
 <20200603133608.GA2149752@T590>
 <6b58e473-16a4-4ce2-a4ac-50b952d364d7@huawei.com>
 <6fbd3669-4358-6d9f-5c94-e1bc7acecb86@oracle.com>
 <20200604112615.GA2336493@T590>
 <7291fd02-3c2c-f3f9-f3eb-725cd85d5523@huawei.com>
In-Reply-To: <7291fd02-3c2c-f3f9-f3eb-725cd85d5523@huawei.com>

On Thu, Jun 04, 2020 at 12:50:32PM +0100, John Garry wrote:
> On 04/06/2020 12:26, Ming Lei wrote:
> > > And another:
> > >
> > > 51.336698] CPU53: shutdown
> > > [ 251.336737] run queue from wrong CPU 0, hctx active
> > > [ 251.339545] run queue from wrong CPU 0, hctx active
> > > [ 251.344695] psci: CPU53 killed (polled 0 ms)
> > > [ 251.349302] run queue from wrong CPU 0, hctx active
> > > [ 251.358447] run queue from wrong CPU 0, hctx active
> > > [ 251.363336] run queue from wrong CPU 0, hctx active
> > > [ 251.368222] run queue from wrong CPU 0, hctx active
> > > [ 251.373108] run queue from wrong CPU 0, hctx active
> > > [ 252.488831] CPU54: shutdown=1094MiB/s,w=0KiB/s][r=280k,w=0 IOPS][eta 00m:04s]
> > > [ 252.491629] psci: CPU54 killed (polled 0 ms)
> > > [ 252.536533] irq_shutdown irq146
> > > [ 252.540007] CPU55: shutdown
> > > [ 252.543273] psci: CPU55 killed (polled 0 ms)
> > > [ 252.656656] CPU56: shutdown
> > > [ 252.659446] psci: CPU56 killed (polled 0 ms)
> > > [ 252.704576] CPU57: shutdown=947MiB/s,w=0KiB/s][r=242k,w=0 IOPS][eta 00m:02s]
> > > [ 252.707369] psci: CPU57 killed (polled 0 ms)
> > > [ 254.352646] CPU58: shutdown=677MiB/s,w=0KiB/s][r=173k,w=0 IOPS][eta 00m:02s]
> > > [ 254.355442] psci: CPU58 killed (polled 0 ms)
> > > long sleep 1
> > > [ 279.288227] scsi_times_out req=0xffff041fa10b6600[r=0,w=0 IOPS][eta 04m:37s]
> > > [ 279.320281] sas: Enter sas_scsi_recover_host busy: 1 failed: 1
> > > [ 279.326144] sas: trying to find task 0x00000000e8dca422
> > > [ 279.331379] sas: sas_scsi_find_task: aborting task 0x00000000e8dca422
> > >
> > > none scheduler seems ok.
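
FWIW, that 'run queue from wrong CPU' message is the sanity check at the
top of __blk_mq_run_hw_queue() firing. Roughly the following, paraphrased
from mainline around this cycle, so treat it as a sketch rather than your
exact tree:

	/*
	 * The queue is expected to be run from one of the CPUs in
	 * hctx->cpumask; warn when it is run elsewhere while the
	 * preferred CPU is still online.
	 */
	if (!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
	    cpu_online(hctx->next_cpu)) {
		printk(KERN_WARNING "run queue from wrong CPU %d, hctx %s\n",
			raw_smp_processor_id(),
			cpumask_empty(hctx->cpumask) ? "inactive" : "active");
		dump_stack();
	}

So it only tells us the queue was run from CPU 0 even though CPU 0 is not
in that hctx's cpumask.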
> > Can you reproduce it by just applying the patch of 'blk-mq: don't fail
> > driver tag allocation because of inactive hctx'?
>
> That's your patch - ok, I can try.
>
> So I added some debug for when using Christoph's:
>
> static bool blk_mq_hctx_has_requests2(struct blk_mq_hw_ctx *hctx)
> {
> 	/* note: hctx->tags is the driver tag set, not the sched tags */
> 	struct blk_mq_tags *tags = hctx->tags;
> 	struct rq_iter_data data = {
> 		.hctx = hctx,
> 	};
>
> 	/* sets data.has_rq if any tag maps to a request on this hctx */
> 	blk_mq_all_tag_iter(tags, blk_mq_has_request, &data);
> 	return data.has_rq;
> }
>
> static inline bool blk_mq_last_cpu_in_hctx(unsigned int cpu,
> 		struct blk_mq_hw_ctx *hctx)
> {
> @@ -2386,6 +2398,8 @@ static int blk_mq_hctx_notify_offline(unsigned int cpu, struct hlist_node *node)
> 	if (percpu_ref_tryget(&hctx->queue->q_usage_counter)) {
> 		while (blk_mq_hctx_has_requests(hctx))
> 			msleep(5);
> +		if (blk_mq_hctx_has_requests2(hctx))
> +			pr_err("%s still has driver tag - but how? hctx->sched_tags=%pS\n", __func__, hctx->sched_tags);
> 		percpu_ref_put(&hctx->queue->q_usage_counter);
> 	} else {
>
> And I get this:
>
> [ 366.839260] CPU46: shutdown
> [ 366.842052] psci: CPU46 killed (polled 0 ms)
> [ 366.910923] blk_mq_hctx_notify_offline still has driver tag - but how? hctx->sched_tags=0xffff041fa59e0d00
> [ 366.920870] irq_migrate_all_off_this_cpu: 5 callbacks suppressed
> [ 366.920871] IRQ 113: no longer affine to CPU47
> [ 366.931299] IRQ 116: no longer affine to CPU47
> [ 366.935734] IRQ 119: no longer affine to CPU47
> [ 366.940169] IRQ 123: no longer affine to CPU47
> [ 366.944607] irq_shutdown irq144
> [ 366.947783] IRQ 330: no longer affine to CPU47
> [ 366.952217] IRQ 333: no longer affine to CPU47
> [ 366.956651] IRQ 336: no longer affine to CPU47
> [ 366.961085] IRQ 339: no longer affine to CPU47
> [ 366.965519] IRQ 342: no longer affine to CPU47
> [ 366.969953] IRQ 345: no longer affine to CPU47
> [ 366.974475] CPU47: shutdown
> [ 366.977273] psci: CPU47 killed (polled 0 ms)
> [ 367.063066] CPU48: shutdown
> [ 367.065857] psci: CPU48 killed (polled 0 ms)
> [ 367.107071] CPU49: shutdown
> [ 367.109858] psci: CPU49 killed (polled 0 ms)
> [ 367.147057] CPU50: shutdown
> [ 367.149843] psci: CPU50 killed (polled 0 ms)
> [ 367.175006] irq_shutdown irq145
> [ 367.178272] CPU51: shutdown
> [ 367.181070] psci: CPU51 killed (polled 0 ms)
> [ 367.223067] CPU52: shutdown
> [ 367.225856] psci: CPU52 killed (polled 0 ms)
> [ 367.267024] CPU53: shutdown
> [ 367.269821] psci: CPU53 killed (polled 0 ms)
> [ 367.303004] CPU54: shutdown
> [ 367.305792] psci: CPU54 killed (polled 0 ms)
> [ 367.342948] irq_shutdown irq146
> [ 367.346169] CPU55: shutdown
> [ 367.348965] psci: CPU55 killed (polled 0 ms)
> [ 367.387004] CPU56: shutdown][16.9%][r=869MiB/s,w=0KiB/s][r=223k,w=0 IOPS][eta 04m:59s]
> [ 367.389791] psci: CPU56 killed (polled 0 ms)
> [ 367.419017] CPU57: shutdown
> [ 367.421804] psci: CPU57 killed (polled 0 ms)
> [ 367.447010] CPU58: shutdown
> [ 367.449796] psci: CPU58 killed (polled 0 ms)
> long sleep 2
> [ 398.070707] scsi_times_out req=0xffff041fa1a89c00iB/s][r=0,w=0 IOPS][eta 04m:29s]
>
> I thought that if all the sched tags are put, then we should have no driver
> tag for that same hctx, right? That seems to coincide with the timeout (30
> seconds later)

That is weird. If a driver tag is found, that means the request is
in-flight and hasn't been completed by the HW. I assume you have
integrated the global host tags patch in your test, and I suggest you
double-check hisi_sas's queue mapping, which has to be exactly the same
as blk-mq's mapping.
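
To show the kind of alignment I mean: a driver using managed PCI irq
vectors can simply let blk-mq derive the cpu->hctx mapping from the irq
affinity, instead of open-coding its own table. This is only an untested
sketch with made-up names (my_dev, MY_PRE_VECTORS), not hisi_sas's actual
code:

#include <linux/blk-mq.h>
#include <linux/blk-mq-pci.h>
#include <linux/pci.h>

/* hypothetical driver instance, just enough for the sketch */
struct my_dev {
	struct pci_dev *pdev;
};

/* vectors reserved for non-I/O purposes, ahead of the queue vectors */
#define MY_PRE_VECTORS	1

/*
 * ->map_queues: take blk-mq's cpu->hctx mapping from the device's
 * managed irq affinity, so each CPU submits on the hw queue whose
 * interrupt it services.
 */
static int my_map_queues(struct blk_mq_tag_set *set)
{
	struct my_dev *dev = set->driver_data;

	return blk_mq_pci_map_queues(&set->map[HCTX_TYPE_DEFAULT],
				     dev->pdev, MY_PRE_VECTORS);
}

The point is that blk-mq's per-hctx accounting only reflects reality if
the driver really submits and completes each request on the hw queue that
blk-mq mapped it to.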
> > If yes, can you collect debugfs log after the timeout is triggered?
>
> Same limitation as before - once a SCSI timeout happens, SCSI error
> handling kicks in and the shost no longer accepts commands; since that
> same shost provides the rootfs, the system becomes unresponsive. But I
> can try.

Just wondering: why not install two disks in your test machine? :-)

Thanks,
Ming