Date: Sat, 23 Nov 2019 06:30:19 +0800
From: Ming Lei
To: Jens Axboe
Cc: Sagi Grimberg, Long Li, "linux-nvme@lists.infradead.org",
	Nadolski Edmund, Keith Busch, Thomas Gleixner, Christoph Hellwig
Subject: Re: [PATCH V3 0/2] nvme-pci: check CQ after batch submission for Microsoft device
Message-ID: <20191122223019.GE8700@ming.t460p>
In-Reply-To: <9ef6c1da-99c5-14f8-edb7-af50c935ce76@fb.com>
References: <20191121031154.GJ24548@ming.t460p>
 <20191121061431.GA22886@lst.de>
 <20191121074643.GB4755@ming.t460p>
 <20191121154531.GB17852@redsun51.ssa.fujisawa.hgst.com>
 <20191122094457.GA23632@ming.t460p>
 <20191122095743.GA21087@lst.de>
 <20191122102517.GA30001@ming.t460p>
 <20191122214954.GB8700@ming.t460p>
 <9ef6c1da-99c5-14f8-edb7-af50c935ce76@fb.com>

On Fri, Nov 22, 2019 at 09:58:36PM +0000, Jens Axboe wrote:
> On 11/22/19 2:49 PM, Ming Lei wrote:
> > On Fri, Nov 22, 2019 at 02:04:52PM +0000, Jens Axboe wrote:
> >> On 11/22/19 3:25 AM, Ming Lei wrote:
> >>>> as that will still overload the one cpu that the interrupt handler was
> >>>> assigned to. A dumb fix would be a cpu mask for the threaded interrupt
> >>>
> >>> Actually one CPU is fast enough to handle several drive's interrupt
> >>> handling. Also there is per-queue depth limit, and the interrupt flood
> >>> issue in network can't be serious on storage.
> >>
> >> This is true today, but it won't be true in the future. Lets aim for a
> >> solution that's a little more future proof than just "enough today", if
> >> we're going to make changes in this area.
> >
> > That should be a new feature for future hardware, and we don't know any
> > performance details, and it can be hard to prepare for it now. Maybe
> > such hardware or case never comes:
>
> Oh it'll surely come, and maybe sooner than you think. My point is that
> using "one CPU is fast enough to handle several drive interrupts" is
> very shortsighted, and probably not even true today.

A single CPU ends up responsible for handling more than one drive only
when the following condition is true (see the worked example appended at
the end of this mail):

	nr_drives * nr_io_hw_queue > nr_cpus

>
> > - storage device has queue depth, which limits the max in-flight requests
> > to be handled in each queue's interrupt handler.
>
> Only if new requests aren't also coming in and completing while you are
> doing that work.
>
> > - Suppose such fast hardware comes, it isn't reasonable for them
> > to support N:1 mapping(N is big).
>
> Very true, in fact that's already pretty damn dumb today...

OK, I guess that is because lots of NVMe devices only support a limited
number of hw queues (32).

>
> > - Also IRQ matrix has balanced interrupt handling loading already, that
> > said most of times, one CPU is just responsible for handing one hw queue's
> > interrupt. Even in Azure's case, 8 CPUs are mapped to one hw queue, but
> > there is just several CPUs which is for responsible for at most 2 hw queues.

It also depends on how many drives are used in a single machine. The issue
is possible only when the number of drives is big enough, and I guess that
isn't unusual.

> >
> > So could we focus on now and fix the regression first?
>
> As far as I could tell from the other message, sounds like they both
> have broken interrupt coalescing? Makes it harder to care, honestly...

Yeah, I found two reports on two different drives, and both can be fixed
by this patch. I haven't seen other reports caused by too much interrupt
load on a single CPU; that is why I tried to avoid a generic approach...

>
> But yes, I think we should do something about this. This really isn't a
> new issue, if a core gets overloaded just doing completions from
> interrupts, we should punt the work. NAPI has been doing that for ages,
> and the block layer also used to have support it, but nobody used it.
> Would be a great idea to make a blk-mq friendly version of that, with
> the kinds of IOPS and latencies in mind that we see today and in the
> coming years. I don't think hacking around this in the nvme driver is a
> very good way to go about it.

OK, I will look at this approach; Sagi has posted one such patch (a rough
sketch of that style of completion punting is appended below).

thanks,
Ming
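
A worked example of the nr_drives * nr_io_hw_queue > nr_cpus condition
above, using assumed numbers rather than figures from this thread: on a
64-CPU machine with 4 NVMe drives that each expose 32 I/O hw queues,

	nr_drives * nr_io_hw_queue = 4 * 32 = 128 > 64 = nr_cpus

so managed irq affinity has to map at least two hw queues to some CPUs,
while a single such drive (32 <= 64) leaves every CPU servicing at most
one hw queue.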
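
One way to punt completion work out of hard interrupt context, in the
spirit of the NAPI-style approach discussed above, is the kernel's
irq_poll API. The sketch below is illustrative only and is not taken from
the nvme driver or from the patch mentioned above; the struct layout and
the my_* names are hypothetical placeholders. The hard interrupt handler
only masks its vector and schedules polling, and the softirq callback
reaps at most a fixed budget of completions per round:

#include <linux/kernel.h>
#include <linux/types.h>
#include <linux/interrupt.h>
#include <linux/irq_poll.h>

/* hypothetical per-hw-queue context; only what the sketch needs */
struct my_queue {
	int		irq;	/* vector assigned to this hw queue */
	struct irq_poll	iop;
};

/* assumed helper: reaps one CQ entry, returns true if one was found */
bool my_reap_one_completion(struct my_queue *q);

/* hard interrupt: mask the vector and hand the work to irq_poll */
static irqreturn_t my_irq(int irq, void *data)
{
	struct my_queue *q = data;

	disable_irq_nosync(irq);
	irq_poll_sched(&q->iop);
	return IRQ_HANDLED;
}

/* softirq context: process at most 'budget' completions per round */
static int my_irqpoll(struct irq_poll *iop, int budget)
{
	struct my_queue *q = container_of(iop, struct my_queue, iop);
	int done = 0;

	while (done < budget && my_reap_one_completion(q))
		done++;

	if (done < budget) {
		/* CQ drained: stop polling and unmask the vector */
		irq_poll_complete(iop);
		enable_irq(q->irq);
	}
	return done;
}

/* at queue init time, after request_irq() has installed my_irq() */
static void my_queue_init_poll(struct my_queue *q)
{
	irq_poll_init(&q->iop, 32 /* completions per poll round */, my_irqpoll);
}

Note that irq_poll runs the callback from softirq context on the same
CPU; under sustained load the work falls back to that CPU's ksoftirqd
thread, where it competes with ordinary tasks instead of monopolizing
hard interrupt context, which is roughly the NAPI behaviour referenced
above.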