From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C4112C433F5 for ; Fri, 3 Dec 2021 10:55:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1380163AbhLCK67 (ORCPT ); Fri, 3 Dec 2021 05:58:59 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:26630 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237048AbhLCK67 (ORCPT ); Fri, 3 Dec 2021 05:58:59 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1638528935; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=nFj9rWN7S7vuib57ob5QlML0gmhUnKVbSla0HzR2Qjs=; b=bmkNcHd7ViJAjiSJEI1gLM8QPW5JjTrviAZ+xviBaSsJTzwUm8OqJ6t3ntdiXeKtoBsrj+ LSs1Cxl7kfkConMbBVN73uK90uVGGV1IUkw3Bc4zkds7jkzzgWBWW476gafKaPIVH4ujdD yd+pvjZ7ELKXCcG2mhD246M1ZE+kwmM= Received: from mail-wm1-f70.google.com (mail-wm1-f70.google.com [209.85.128.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-514-zi3fnaDIPDWWUac8YmQUtw-1; Fri, 03 Dec 2021 05:55:34 -0500 X-MC-Unique: zi3fnaDIPDWWUac8YmQUtw-1 Received: by mail-wm1-f70.google.com with SMTP id a64-20020a1c7f43000000b003335e5dc26bso1348337wmd.8 for ; Fri, 03 Dec 2021 02:55:33 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:date:mime-version:user-agent :content-language:to:cc:references:from:subject:in-reply-to :content-transfer-encoding; bh=nFj9rWN7S7vuib57ob5QlML0gmhUnKVbSla0HzR2Qjs=; b=As24zYbIM9USGLMykrEjVZLeqRZ27L6fczQsWFSzOwFKS1YESCU1CE2f0m4Fz7s8tB W2fuum+S/1Fa4AMB2/IZdotF1fT1s1+hlhqH0xiddzavzukKf2bMq9l2o5eFhk9DCcSb ni7WD6Dh70VIAii0nRQz4zPwsGbxNBjhMl/O7b6asO1wO+Ll0ascBLpoAeqVoZcVWTuQ TeoU3lp9XGAzukPmmcZVg8JjVcPEsSpmdnh+k1m7fOgh8q/mMviTEewC0C6AY1wYwTYw SFGfb/4qYCaddeLl8DOJgx23EsQXsP3KIyKoFbKjQcGRZaNozT7h/F+y93gAt5afd5fU AiRg== X-Gm-Message-State: AOAM530yHVP0azmZ79rIkzcEIjdag3iYDFFn+FhE5cML3MC4me8SVpwq SVq8zqv7lJPPNT+P5muc/LES85lULYtUIgztMjb4HNKMa8xuav0SsDZTdBk/s1V421WSnTPUXvP 5fFBR93/dn1hmUlR4sO+FzQ== X-Received: by 2002:adf:db04:: with SMTP id s4mr20392073wri.467.1638528932755; Fri, 03 Dec 2021 02:55:32 -0800 (PST) X-Google-Smtp-Source: ABdhPJxZEQ+qZhV43zN2BozwBHr6roxUn9gyXW8s8bQmG1Ykj1VkU+fnEngvgmzKikpNKHkJ0EK/Zg== X-Received: by 2002:adf:db04:: with SMTP id s4mr20392048wri.467.1638528932519; Fri, 03 Dec 2021 02:55:32 -0800 (PST) Received: from [10.33.192.183] (nat-pool-str-t.redhat.com. [149.14.88.106]) by smtp.gmail.com with ESMTPSA id o10sm2914435wri.15.2021.12.03.02.55.31 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Fri, 03 Dec 2021 02:55:32 -0800 (PST) Message-ID: <11f0ff2f-2bae-0f1b-753f-b0e9dc24b345@redhat.com> Date: Fri, 3 Dec 2021 11:55:31 +0100 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.3.0 Content-Language: en-US To: David Hildenbrand , kvm@vger.kernel.org Cc: Janosch Frank , Christian Borntraeger , Claudio Imbrenda , Sebastian Mitterle , Halil Pasic , linux-s390@vger.kernel.org References: <20211202123553.96412-1-david@redhat.com> <20211202123553.96412-3-david@redhat.com> From: Thomas Huth Subject: Re: [kvm-unit-tests PATCH v2 2/2] s390x: firq: floating interrupt test In-Reply-To: <20211202123553.96412-3-david@redhat.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Precedence: bulk List-ID: X-Mailing-List: linux-s390@vger.kernel.org On 02/12/2021 13.35, David Hildenbrand wrote: > We had a KVM BUG fixed by kernel commit a3e03bc1368c ("KVM: s390: index > kvm->arch.idle_mask by vcpu_idx"), whereby a floating interrupt might get > stuck forever because a CPU in the wait state would not get woken up. > > The issue can be triggered when CPUs are created in a nonlinear fashion, > such that the CPU address ("core-id") and the KVM cpu id don't match. > > So let's start with a floating interrupt test that will trigger a > floating interrupt (via SCLP) to be delivered to a CPU in the wait state. Thank you very much for tackling this! Some remarks below... > Signed-off-by: David Hildenbrand > --- > lib/s390x/sclp.c | 11 ++-- > lib/s390x/sclp.h | 1 + > s390x/Makefile | 1 + > s390x/firq.c | 122 ++++++++++++++++++++++++++++++++++++++++++++ > s390x/unittests.cfg | 10 ++++ > 5 files changed, 142 insertions(+), 3 deletions(-) > create mode 100644 s390x/firq.c > > diff --git a/lib/s390x/sclp.c b/lib/s390x/sclp.c > index 0272249..33985eb 100644 > --- a/lib/s390x/sclp.c > +++ b/lib/s390x/sclp.c > @@ -60,9 +60,7 @@ void sclp_setup_int(void) > void sclp_handle_ext(void) > { > ctl_clear_bit(0, CTL0_SERVICE_SIGNAL); > - spin_lock(&sclp_lock); > - sclp_busy = false; > - spin_unlock(&sclp_lock); > + sclp_clear_busy(); > } > > void sclp_wait_busy(void) > @@ -89,6 +87,13 @@ void sclp_mark_busy(void) > } > } > > +void sclp_clear_busy(void) > +{ > + spin_lock(&sclp_lock); > + sclp_busy = false; > + spin_unlock(&sclp_lock); > +} > + > static void sclp_read_scp_info(ReadInfo *ri, int length) > { > unsigned int commands[] = { SCLP_CMDW_READ_SCP_INFO_FORCED, > diff --git a/lib/s390x/sclp.h b/lib/s390x/sclp.h > index 61e9cf5..fead007 100644 > --- a/lib/s390x/sclp.h > +++ b/lib/s390x/sclp.h > @@ -318,6 +318,7 @@ void sclp_setup_int(void); > void sclp_handle_ext(void); > void sclp_wait_busy(void); > void sclp_mark_busy(void); > +void sclp_clear_busy(void); > void sclp_console_setup(void); > void sclp_print(const char *str); > void sclp_read_info(void); > diff --git a/s390x/Makefile b/s390x/Makefile > index f95f2e6..1e567c1 100644 > --- a/s390x/Makefile > +++ b/s390x/Makefile > @@ -25,6 +25,7 @@ tests += $(TEST_DIR)/uv-host.elf > tests += $(TEST_DIR)/edat.elf > tests += $(TEST_DIR)/mvpg-sie.elf > tests += $(TEST_DIR)/spec_ex-sie.elf > +tests += $(TEST_DIR)/firq.elf > > tests_binary = $(patsubst %.elf,%.bin,$(tests)) > ifneq ($(HOST_KEY_DOCUMENT),) > diff --git a/s390x/firq.c b/s390x/firq.c > new file mode 100644 > index 0000000..1f87718 > --- /dev/null > +++ b/s390x/firq.c > @@ -0,0 +1,122 @@ > +/* SPDX-License-Identifier: GPL-2.0-only */ > +/* > + * Floating interrupt tests. > + * > + * Copyright 2021 Red Hat Inc > + * > + * Authors: > + * David Hildenbrand > + */ > +#include > +#include > +#include > +#include > +#include > + > +#include > +#include > +#include > + > +static void wait_for_sclp_int(void) > +{ > + /* Enable SCLP interrupts on this CPU only. */ > + ctl_set_bit(0, CTL0_SERVICE_SIGNAL); > + > + /* Enable external interrupts and go to the wait state. */ > + wait_for_interrupt(PSW_MASK_EXT); > +} What happens if the CPU got an interrupt? Should there be a "while (true)" at the end of the function to avoid that the CPU ends up crashing at the end of the function? > +/* > + * Some KVM versions might mix CPUs when looking for a floating IRQ target, > + * accidentially detecting a stopped CPU as waiting and resulting in the actually > + * waiting CPU not getting woken up for the interrupt. > + */ > +static void test_wait_state_delivery(void) > +{ > + struct psw psw; > + SCCBHeader *h; > + int ret; > + > + report_prefix_push("wait state delivery"); > + > + if (smp_query_num_cpus() < 3) { > + report_skip("need at least 3 CPUs for this test"); > + goto out; > + } > + > + if (stap()) { > + report_skip("need to start on CPU #0"); > + goto out; > + } I think I'd rather turn this into an assert() instead ... no strong opinion about it, though. > + > + /* > + * We want CPU #2 to be stopped. This should be the case at this > + * point, however, we want to sense if it even exists as well. > + */ > + ret = smp_cpu_stop(2); > + if (ret) { > + report_skip("CPU #2 not found"); Since you already queried for the availablity of at least 3 CPUs above, I think you could turn this into a report_fail() instead? > + goto out; > + } > + > + /* > + * We're going to perform an SCLP service call but expect > + * the interrupt on CPU #1 while it is in the wait state. > + */ > + sclp_mark_busy(); > + > + /* Start CPU #1 and let it wait for the interrupt. */ > + psw.mask = extract_psw_mask(); > + psw.addr = (unsigned long)wait_for_sclp_int; > + ret = smp_cpu_setup(1, psw); > + if (ret) { > + sclp_clear_busy(); > + report_skip("cpu #1 not found"); > + goto out; > + } > + > + /* > + * We'd have to jump trough some hoops to sense e.g., via SIGP > + * CONDITIONAL EMERGENCY SIGNAL if CPU #1 is already in the > + * wait state. > + * > + * Although not completely reliable, use SIGP SENSE RUNNING STATUS > + * until not reported as running -- after all, our SCLP processing > + * will take some time as well and smp_cpu_setup() returns when we're > + * either already in wait_for_sclp_int() or just about to execute it. > + */ > + while(smp_sense_running_status(1)); > + > + h = alloc_page(); > + h->length = 4096; > + ret = servc(SCLP_CMDW_READ_CPU_INFO, __pa(h)); > + if (ret) { > + sclp_clear_busy(); > + report_fail("SCLP_CMDW_READ_CPU_INFO failed"); > + goto out_destroy; > + } > + > + /* > + * Wait until the interrupt gets delivered on CPU #1, marking the > + * SCLP requests as done. > + */ > + sclp_wait_busy(); > + > + report(true, "sclp interrupt delivered"); > + > +out_destroy: > + free_page(h); > + smp_cpu_destroy(1); > +out: > + report_prefix_pop(); > +} Anyway, code looks fine for me, either with my comments addressed or not: Reviewed-by: Thomas Huth