Date: Tue, 17 Nov 2020 16:09:49 +0100
From: Manuel Bouyer
To: xen-devel@lists.xenproject.org
Subject: NetBSD dom0 PVH: hardware interrupts stalls
Message-ID: <20201117150949.GA3791@antioche.eu.org>
List-Id: Xen developer discussion
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
X-Greylist: Sender succeeded STARTTLS authentication, not delayed by milter-greylist-4.4.3 (chassiron.antioche.eu.org [151.127.5.145]); Tue, 17 Nov 2020
 16:09:55 +0100 (MET)

Hello,
so, after fixing an issue in the NetBSD kernel related to PV clock
interrupts, I'm back with physical interrupt issues.
At some point during initialisation, the dom0 kernel stops receiving
interrupts for its disk controller. The disk controller is:

[ 1.0000030] mfii0 at pci6 dev 0 function 0: "PERC H740P Adapter ", firmware 51.13.0-3485, 8192MB cache
(XEN) d0: bind: m_gsi=34 g_gsi=34
[ 1.0000030] allocated pic ioapic2 type level pin 2 level 6 to cpu0 slot 2 idt entry 103
[ 1.0000030] mfii0: interrupting at ioapic2 pin 2

Entering the NetBSD kernel debugger and looking at the interrupt counters,
I see that some interrupts did trigger on ioapic2 pin 2, as well as for
some other hardware controllers.

I printed the controller's status when the command times out, and the
controller says that there is an interrupt pending. So I guess that the
command was executed, but the dom0 kernel didn't get interrupted.
At this point I can't say whether interrupts from other hardware
controllers are working (because of the lockdown I don't have physical
access to the hardware).

What's strange is that some Xen console activity seems to be enough to
resume interrupt activity. Hitting ^A 3 times is enough to get some
progress on the dom0's disk controller, and hitting 'v' is usually enough
to get the dom0 to multiuser. Once there, the system looks stable and I
can log in from the network. But I/O may stall again on reboot, maybe
because the dom0 kernel is back to using synchronous console output.

Any idea what to look at from here?

-- 
Manuel Bouyer
     NetBSD: 26 years of experience will always make the difference
--