Subject: Re: [PROBLEM] Frequently get "irq 31: nobody cared" when passing through 2x GPUs that share same pci switch via vfio
To: Alex Williamson
Cc: linux-pci@vger.kernel.org, lkml, kvm@vger.kernel.org, nathan.langford@xcelesunifiedtechnologies.com
References: <20210914104301.48270518.alex.williamson@redhat.com> <9e8d0e9e-1d94-35e8-be1f-cf66916c24b2@canonical.com> <20210915103235.097202d2.alex.williamson@redhat.com>
From: Matthew Ruffell
Message-ID: <2fadf33d-8487-94c2-4460-2a20fdb2ea12@canonical.com>
Date: Tue, 5 Oct 2021 18:02:24 +1300
In-Reply-To: <20210915103235.097202d2.alex.williamson@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Alex,

Have you had an opportunity to have a look at this a bit deeper?

On 16/09/21 4:32 am, Alex Williamson wrote:
>
> Adding debugging to the vfio-pci interrupt handler, it's correctly
> deferring the interrupt as the GPU device is not identifying itself as
> the source of the interrupt via the status register.  In fact, setting
> the disable INTx bit in the GPU command register while the interrupt
> storm occurs does not stop the interrupts.
>
> The interrupt storm does seem to be related to the bus resets, but I
> can't figure out yet how multiple devices per switch factors into the
> issue.  Serializing all bus resets via a mutex doesn't seem to change
> the behavior.
>
> I'm still investigating, but if anyone knows how to get access to the
> Broadcom datasheet or errata for this switch, please let me know.

We have managed to obtain a recent errata for this switch, and it
doesn't mention any interrupt storms with nested switches. What would I
be looking for in the errata? I cannot share our copy, sorry.

Is there anything that we can do to help?

Thanks,
Matthew