From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752357AbeAQHvq (ORCPT + 1 other);
	Wed, 17 Jan 2018 02:51:46 -0500
Received: from mga14.intel.com ([192.55.52.115]:39212 "EHLO mga14.intel.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1752271AbeAQHvp (ORCPT );
	Wed, 17 Jan 2018 02:51:45 -0500
X-Amp-Result: UNKNOWN
X-Amp-Original-Verdict: FILE UNKNOWN
X-Amp-File-Uploaded: False
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.46,372,1511856000"; d="scan'208";a="11197950"
Date: Wed, 17 Jan 2018 00:55:00 -0700
From: Keith Busch
To: Thomas Gleixner
Cc: LKML
Subject: Re: [BUG 4.15-rc7] IRQ matrix management errors
Message-ID: <20180117075500.GB7562@localhost.localdomain>
References: <20180115025759.GG13580@localhost.localdomain>
	<20180115030255.GA13921@localhost.localdomain>
	<20180116061641.GB32639@localhost.localdomain>
	<20180116071145.GA5643@localhost.localdomain>
	<20180117022511.GD6259@localhost.localdomain>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.9.1 (2017-09-22)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Return-Path:

On Wed, Jan 17, 2018 at 08:34:22AM +0100, Thomas Gleixner wrote:
> Can you trace the matrix allocations from the very beginning or tell me how
> to reproduce. I'd like to figure out why this is happening.

Sure, I'll get the irq_matrix events.

I reproduce this on a machine with 112 CPUs and 3 NVMe controllers. The
first two NVMe controllers want 112 MSI-X vectors each, and the last only
31 vectors. The test runs 'modprobe nvme' and 'modprobe -r nvme' in a loop
with a 10-second delay between each step. The failure reproduces within a
few iterations, and sometimes the system is already broken after the
initial boot.
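
The load/unload loop described above could be scripted roughly as follows. This is only a sketch, not the script actually used for the report: the function name `run_cycles`, its parameters, and the injectable `run`/`sleep` hooks are hypothetical and exist only so the command sequence can be checked without touching the module. The two commands and the 10-second delay are the only details taken from the message.

```python
import subprocess
import time

# The two commands named in the report.
LOAD_CMD = ["modprobe", "nvme"]
UNLOAD_CMD = ["modprobe", "-r", "nvme"]


def run_cycles(iterations, delay_s=10.0, run=None, sleep=time.sleep):
    """Load and unload the nvme module `iterations` times.

    `run` and `sleep` are hypothetical hooks (not from the report) so the
    sequence can be dry-run; by default the commands are really executed,
    which requires root.
    Returns the commands issued, in order.
    """
    if run is None:
        def run(cmd):
            # check=True stops the loop on the first modprobe failure.
            subprocess.run(cmd, check=True)

    issued = []
    for _ in range(iterations):
        for cmd in (LOAD_CMD, UNLOAD_CMD):
            run(cmd)
            issued.append(list(cmd))
            # Delay after each step, as in the described test.
            sleep(delay_s)
    return issued
```

Calling `run_cycles(20)` as root would perform twenty load/unload cycles; the iteration count is an arbitrary choice here, since the report only says the failure shows up within a few iterations. Passing `run=print` and `sleep=lambda s: None` gives a dry run that only prints the commands.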