From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 1CCCDC433EF
	for <linux-kernel@archiver.kernel.org>; Thu, 24 Feb 2022 18:09:15 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S232533AbiBXSJn (ORCPT <rfc822;linux-kernel@archiver.kernel.org>);
        Thu, 24 Feb 2022 13:09:43 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37900 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S229662AbiBXSJk (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Thu, 24 Feb 2022 13:09:40 -0500
Received: from mail-wr1-x42e.google.com (mail-wr1-x42e.google.com [IPv6:2a00:1450:4864:20::42e])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F3B101D67D5
        for <linux-kernel@vger.kernel.org>; Thu, 24 Feb 2022 10:09:07 -0800 (PST)
Received: by mail-wr1-x42e.google.com with SMTP id r10so941089wrp.3
        for <linux-kernel@vger.kernel.org>; Thu, 24 Feb 2022 10:09:07 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=google.com; s=20210112;
        h=mime-version:references:in-reply-to:from:date:message-id:subject:to
         :cc;
        bh=JBod19rkADe4RLTH6pIu+iewF45ltCOcT+bAF+Wv0jI=;
        b=jbWx65rXhidUWKLzjE5t5/ELOwE9H4N7N1Py6DVq329fFUHtli6eeZN9hfBsWJ/VZg
         JZBK/CdFrLEGqce2ZoQ9Wl3P5s2ApAyvLgpss0vbEIWK9I2Lf/N57ZOuoH5ccWswZ4Ss
         7prwfrJSMgCv1bsliPkng8j4B9yloXNYc6hf4OgZBEo7WKGoxQxQ3MjRCy/SU1wPurNK
         vuI2Z/Li4QP2k2hxhW0zNY2l/eo5ISqLSD9U+LTnSGGDYsjJhLKpRrgbbCuUWHerRBsA
         ZygKhLNN1sV2ch5mVvtnMwFFaJuPbj3iw03WfuwBriXD/MXhG0lSCUo8yaeuirAX2jyZ
         +Tsw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=x-gm-message-state:mime-version:references:in-reply-to:from:date
         :message-id:subject:to:cc;
        bh=JBod19rkADe4RLTH6pIu+iewF45ltCOcT+bAF+Wv0jI=;
        b=QbEv1w7BRLM3qNpG9427XPAczySaehsybjxprhCCVzixRujSdqGf2jRtAtmLtjNXCS
         lDL/q/pPVzx7cliWoUH1OHoCfSGSwHvFyPshgBASuS67Dr8eFTs4FcAR3mTtUGZAbHTY
         yQszp42sRMjIdvZBF8GicK5cz0EgW70mLxPPvl7CmUzNsg3ZZigOOjslRuL7w6ALUUXG
         GVh9rd7YFU215a+85NGeeQuhmmHR0xN3TeS3mWC2vQ/hA5/UBzZy5lBv/8LeflhbjqD1
         Zxmw++3va0IMW2Ni2E1HQLD5Ui7fuEMOFRasdx4eejdUdC3fd09Dax+Kw8BdtF/TI2un
         CXsg==
X-Gm-Message-State: AOAM530l4zjmJQX9tBSdMfTdrVVZnY6mEgFobjcFVRshYrgoC06kjzxk
        v9AjFNds9w0Rx1JKL0cjGa5m2KP8uLj2qEXcKTVy4w==
X-Google-Smtp-Source: ABdhPJw3fGngrJIeXA2AngyWEbeuzVREcXS2h92NXlwzoHHBHZ2ErN9+yU7GNB6Mu3tbLl54mrAmF7kecbeXj1BnSLA=
X-Received: by 2002:a5d:4b87:0:b0:1ed:f948:7bcf with SMTP id
 b7-20020a5d4b87000000b001edf9487bcfmr2752186wrt.699.1645726146117; Thu, 24
 Feb 2022 10:09:06 -0800 (PST)
MIME-Version: 1.0
References: <20220224051439.640768-1-kaleshsingh@google.com>
 <20220224051439.640768-8-kaleshsingh@google.com> <CA+EHjTwMvQzvyA2Nxq_TKNjM-0_XvTtoKPtBPVPjSn7cboBNUA@mail.gmail.com>
In-Reply-To: <CA+EHjTwMvQzvyA2Nxq_TKNjM-0_XvTtoKPtBPVPjSn7cboBNUA@mail.gmail.com>
From:   Kalesh Singh <kaleshsingh@google.com>
Date:   Thu, 24 Feb 2022 10:08:55 -0800
Message-ID: <CAC_TJvef9=gtidBJ1T1fEMY6prDAo8dTYr5uCiS=i3o03miuWg@mail.gmail.com>
Subject: Re: [PATCH v3 7/8] KVM: arm64: Unwind and dump nVHE HYP stacktrace
To:     Fuad Tabba <tabba@google.com>
Cc:     Will Deacon <will@kernel.org>, Marc Zyngier <maz@kernel.org>,
        Quentin Perret <qperret@google.com>,
        Suren Baghdasaryan <surenb@google.com>,
        "Cc: Android Kernel" <kernel-team@android.com>,
        James Morse <james.morse@arm.com>,
        Alexandru Elisei <alexandru.elisei@arm.com>,
        Suzuki K Poulose <suzuki.poulose@arm.com>,
        Catalin Marinas <catalin.marinas@arm.com>,
        Mark Rutland <mark.rutland@arm.com>,
        Mark Brown <broonie@kernel.org>,
        Masami Hiramatsu <mhiramat@kernel.org>,
        Peter Collingbourne <pcc@google.com>,
        "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>,
        Andrew Scull <ascull@google.com>,
        Paolo Bonzini <pbonzini@redhat.com>,
        Ard Biesheuvel <ardb@kernel.org>,
        "moderated list:ARM64 PORT (AARCH64 ARCHITECTURE)" 
        <linux-arm-kernel@lists.infradead.org>,
        kvmarm@lists.cs.columbia.edu, LKML <linux-kernel@vger.kernel.org>
Content-Type: text/plain; charset="UTF-8"
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Feb 24, 2022 at 4:28 AM Fuad Tabba <tabba@google.com> wrote:
>
> Hi Kalesh,
>
> On Thu, Feb 24, 2022 at 5:22 AM Kalesh Singh <kaleshsingh@google.com> wrote:
> >
> > Unwind the stack in EL1, when CONFIG_NVHE_EL2_DEBUG is enabled. This is
> > possible because CONFIG_NVHE_EL2_DEBUG disables the host stage 2 protection
> > which allows host to access the hypervisor stack pages in EL1.
>
> For this comment to be clearer, and if my understanding is correct, I
> think that it should say that CONFIG_NVHE_EL2_DEBUG allows host stage
> 2 protection to be disabled on a hyp_panic. Otherwise, on reading the
> comment one might think that CONFIG_NVHE_EL2_DEBUG runs without host
> stage 2 protection at all.

Your understanding is correct: the host stage 2 protection is only
disabled on a hyp_panic(). I'll rephrase to make it clearer.

>
> >
> > Unwinding and dumping hyp call traces is gated on CONFIG_NVHE_EL2_DEBUG
> > to avoid the potential leaking of information to the host.
> >
> > A simple stack overflow test produces the following output:
> >
> > [  580.376051][  T412] kvm: nVHE hyp panic at: ffffffc0116145c4!
> > [  580.378034][  T412] kvm [412]: nVHE HYP call trace:
> > [  580.378591][  T412] kvm [412]:  [<ffffffc011614934>]
> > [  580.378993][  T412] kvm [412]:  [<ffffffc01160fa48>]
> > [  580.379386][  T412] kvm [412]:  [<ffffffc0116145dc>]  // Non-terminating recursive call
> > [  580.379772][  T412] kvm [412]:  [<ffffffc0116145dc>]
> > [  580.380158][  T412] kvm [412]:  [<ffffffc0116145dc>]
> > [  580.380544][  T412] kvm [412]:  [<ffffffc0116145dc>]
> > [  580.380928][  T412] kvm [412]:  [<ffffffc0116145dc>]
> > . . .
> >
> > Since nVHE hyp symbols are not included by kallsyms to avoid issues
> > with aliasing, we fall back to the vmlinux addresses. Symbolizing the
> > addresses is handled in the next patch in this series.
> >
> > Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> > ---
> >
> > Changes in v3:
> >   - The nvhe hyp stack unwinder now makes use of the core logic from the
> >     regular kernel unwinder to avoid duplication, per Mark
> >
> > Changes in v2:
> >   - Add cpu_prepare_nvhe_panic_info()
> >   - Move updating the panic info to hyp_panic(), so that unwinding also
> >     works for conventional nVHE Hyp-mode.
> >
> >  arch/arm64/include/asm/kvm_asm.h    |  19 +++
> >  arch/arm64/include/asm/stacktrace.h |  12 ++
> >  arch/arm64/kernel/stacktrace.c      | 210 +++++++++++++++++++++++++---
> >  arch/arm64/kvm/Kconfig              |   5 +-
> >  arch/arm64/kvm/arm.c                |   2 +-
> >  arch/arm64/kvm/handle_exit.c        |   3 +
> >  arch/arm64/kvm/hyp/nvhe/switch.c    |  18 +++
> >  7 files changed, 243 insertions(+), 26 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> > index 2e277f2ed671..16efdf150a37 100644
> > --- a/arch/arm64/include/asm/kvm_asm.h
> > +++ b/arch/arm64/include/asm/kvm_asm.h
> > @@ -176,6 +176,25 @@ struct kvm_nvhe_init_params {
> >         unsigned long vtcr;
> >  };
> >
> > +#ifdef CONFIG_NVHE_EL2_DEBUG
> > +/*
> > + * Used by the host in EL1 to dump the nVHE hypervisor backtrace on
> > + * hyp_panic. This is possible because CONFIG_NVHE_EL2_DEBUG disables
> > + * the host stage 2 protection. See: __hyp_do_panic()
>
> Same as my comment above.

Ack
>
> > + *
> > + * @hyp_stack_base:             hyp VA of the hyp_stack base.
> > + * @hyp_overflow_stack_base:    hyp VA of the hyp_overflow_stack base.
> > + * @fp:                         hyp FP where the backtrace begins.
> > + * @pc:                         hyp PC where the backtrace begins.
> > + */
> > +struct kvm_nvhe_panic_info {
> > +       unsigned long hyp_stack_base;
> > +       unsigned long hyp_overflow_stack_base;
> > +       unsigned long fp;
> > +       unsigned long pc;
> > +};
> > +#endif /* CONFIG_NVHE_EL2_DEBUG */
> > +
> >  /* Translate a kernel address @ptr into its equivalent linear mapping */
> >  #define kvm_ksym_ref(ptr)                                              \
> >         ({                                                              \
> > diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
> > index e77cdef9ca29..18611a51cf14 100644
> > --- a/arch/arm64/include/asm/stacktrace.h
> > +++ b/arch/arm64/include/asm/stacktrace.h
> > @@ -22,6 +22,10 @@ enum stack_type {
> >         STACK_TYPE_OVERFLOW,
> >         STACK_TYPE_SDEI_NORMAL,
> >         STACK_TYPE_SDEI_CRITICAL,
> > +#ifdef CONFIG_NVHE_EL2_DEBUG
> > +       STACK_TYPE_KVM_NVHE_HYP,
> > +       STACK_TYPE_KVM_NVHE_OVERFLOW,
> > +#endif /* CONFIG_NVHE_EL2_DEBUG */
> >         __NR_STACK_TYPES
> >  };
> >
> > @@ -147,4 +151,12 @@ static inline bool on_accessible_stack(const struct task_struct *tsk,
> >         return false;
> >  }
> >
> > +#ifdef CONFIG_NVHE_EL2_DEBUG
> > +void kvm_nvhe_dump_backtrace(unsigned long hyp_offset);
> > +#else
> > +static inline void kvm_nvhe_dump_backtrace(unsigned long hyp_offset)
> > +{
> > +}
> > +#endif /* CONFIG_NVHE_EL2_DEBUG */
> > +
> >  #endif /* __ASM_STACKTRACE_H */
> > diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
> > index e4103e085681..6ec85cb69b1f 100644
> > --- a/arch/arm64/kernel/stacktrace.c
> > +++ b/arch/arm64/kernel/stacktrace.c
> > @@ -15,6 +15,8 @@
> >
> >  #include <asm/irq.h>
> >  #include <asm/pointer_auth.h>
> > +#include <asm/kvm_asm.h>
> > +#include <asm/kvm_hyp.h>
> >  #include <asm/stack_pointer.h>
> >  #include <asm/stacktrace.h>
> >
> > @@ -64,26 +66,15 @@ NOKPROBE_SYMBOL(start_backtrace);
> >   * records (e.g. a cycle), determined based on the location and fp value of A
> >   * and the location (but not the fp value) of B.
> >   */
> > -static int notrace unwind_frame(struct task_struct *tsk,
> > -                               struct stackframe *frame)
> > +static int notrace __unwind_frame(struct stackframe *frame, struct stack_info *info,
> > +               unsigned long (*translate_fp)(unsigned long, enum stack_type))
> >  {
> >         unsigned long fp = frame->fp;
> > -       struct stack_info info;
> > -
> > -       if (!tsk)
> > -               tsk = current;
> > -
> > -       /* Final frame; nothing to unwind */
> > -       if (fp == (unsigned long)task_pt_regs(tsk)->stackframe)
> > -               return -ENOENT;
> >
> >         if (fp & 0x7)
> >                 return -EINVAL;
> >
> > -       if (!on_accessible_stack(tsk, fp, 16, &info))
> > -               return -EINVAL;
> > -
> > -       if (test_bit(info.type, frame->stacks_done))
> > +       if (test_bit(info->type, frame->stacks_done))
> >                 return -EINVAL;
> >
> >         /*
> > @@ -94,28 +85,62 @@ static int notrace unwind_frame(struct task_struct *tsk,
> >          *
> >          * TASK -> IRQ -> OVERFLOW -> SDEI_NORMAL
> >          * TASK -> SDEI_NORMAL -> SDEI_CRITICAL -> OVERFLOW
> > +        * KVM_NVHE_HYP -> KVM_NVHE_OVERFLOW
> >          *
> >          * ... but the nesting itself is strict. Once we transition from one
> >          * stack to another, it's never valid to unwind back to that first
> >          * stack.
> >          */
> > -       if (info.type == frame->prev_type) {
> > +       if (info->type == frame->prev_type) {
> >                 if (fp <= frame->prev_fp)
> >                         return -EINVAL;
> >         } else {
> >                 set_bit(frame->prev_type, frame->stacks_done);
> >         }
> >
> > +       /* Record fp as prev_fp before attempting to get the next fp */
> > +       frame->prev_fp = fp;
> > +
> > +       /*
> > +        * If fp is not from the current address space perform the
> > +        * necessary translation before dereferencing it to get next fp.
> > +        */
> > +       if (translate_fp)
> > +               fp = translate_fp(fp, info->type);
> > +       if (!fp)
> > +               return -EINVAL;
> > +
> >         /*
> >          * Record this frame record's values and location. The prev_fp and
> > -        * prev_type are only meaningful to the next unwind_frame() invocation.
> > +        * prev_type are only meaningful to the next __unwind_frame() invocation.
> >          */
> >         frame->fp = READ_ONCE_NOCHECK(*(unsigned long *)(fp));
> >         frame->pc = READ_ONCE_NOCHECK(*(unsigned long *)(fp + 8));
> > -       frame->prev_fp = fp;
> > -       frame->prev_type = info.type;
> > -
> >         frame->pc = ptrauth_strip_insn_pac(frame->pc);
> > +       frame->prev_type = info->type;
> > +
> > +       return 0;
> > +}
> > +
> > +static int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
> > +{
> > +       unsigned long fp = frame->fp;
> > +       struct stack_info info;
> > +       int err;
> > +
> > +       if (!tsk)
> > +               tsk = current;
> > +
> > +       /* Final frame; nothing to unwind */
> > +       if (fp == (unsigned long)task_pt_regs(tsk)->stackframe)
> > +               return -ENOENT;
> > +
> > +       if (!on_accessible_stack(tsk, fp, 16, &info))
> > +               return -EINVAL;
> > +
> > +       err = __unwind_frame(frame, &info, NULL);
> > +       if (err)
> > +               return err;
> >
> >  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
> >         if (tsk->ret_stack &&
> > @@ -143,20 +168,27 @@ static int notrace unwind_frame(struct task_struct *tsk,
> >  }
> >  NOKPROBE_SYMBOL(unwind_frame);
> >
> > -static void notrace walk_stackframe(struct task_struct *tsk,
> > -                                   struct stackframe *frame,
> > -                                   bool (*fn)(void *, unsigned long), void *data)
> > +static void notrace __walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
> > +               bool (*fn)(void *, unsigned long), void *data,
> > +               int (*unwind_frame_fn)(struct task_struct *tsk, struct stackframe *frame))
> >  {
> >         while (1) {
> >                 int ret;
> >
> >                 if (!fn(data, frame->pc))
> >                         break;
> > -               ret = unwind_frame(tsk, frame);
> > +               ret = unwind_frame_fn(tsk, frame);
> >                 if (ret < 0)
> >                         break;
> >         }
> >  }
> > +
> > +static void notrace walk_stackframe(struct task_struct *tsk,
> > +                                   struct stackframe *frame,
> > +                                   bool (*fn)(void *, unsigned long), void *data)
> > +{
> > +       __walk_stackframe(tsk, frame, fn, data, unwind_frame);
> > +}
> >  NOKPROBE_SYMBOL(walk_stackframe);
> >
> >  static bool dump_backtrace_entry(void *arg, unsigned long where)
> > @@ -210,3 +242,135 @@ noinline notrace void arch_stack_walk(stack_trace_consume_fn consume_entry,
> >
> >         walk_stackframe(task, &frame, consume_entry, cookie);
> >  }
> > +
> > +#ifdef CONFIG_NVHE_EL2_DEBUG
> > +DECLARE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
> > +DECLARE_KVM_NVHE_PER_CPU(unsigned long [PAGE_SIZE/sizeof(long)], hyp_overflow_stack);
> > +DECLARE_KVM_NVHE_PER_CPU(struct kvm_nvhe_panic_info, kvm_panic_info);
> > +
> > +static inline bool kvm_nvhe_on_overflow_stack(unsigned long sp, unsigned long size,
> > +                                struct stack_info *info)
> > +{
> > +       struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
> > +       unsigned long low = (unsigned long)panic_info->hyp_overflow_stack_base;
> > +       unsigned long high = low + PAGE_SIZE;
> > +
> > +       return on_stack(sp, size, low, high, STACK_TYPE_KVM_NVHE_OVERFLOW, info);
> > +}
> > +
> > +static inline bool kvm_nvhe_on_hyp_stack(unsigned long sp, unsigned long size,
> > +                                struct stack_info *info)
> > +{
> > +       struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
> > +       unsigned long low = (unsigned long)panic_info->hyp_stack_base;
> > +       unsigned long high = low + PAGE_SIZE;
> > +
> > +       return on_stack(sp, size, low, high, STACK_TYPE_KVM_NVHE_HYP, info);
> > +}
> > +
> > +static inline bool kvm_nvhe_on_accessible_stack(unsigned long sp, unsigned long size,
> > +                                      struct stack_info *info)
> > +{
> > +       if (info)
> > +               info->type = STACK_TYPE_UNKNOWN;
> > +
> > +       if (kvm_nvhe_on_hyp_stack(sp, size, info))
> > +               return true;
> > +       if (kvm_nvhe_on_overflow_stack(sp, size, info))
> > +               return true;
> > +
> > +       return false;
> > +}
> > +
> > +static unsigned long kvm_nvhe_hyp_stack_kern_va(unsigned long addr)
> > +{
> > +       struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
> > +       unsigned long hyp_base, kern_base, hyp_offset;
> > +
> > +       hyp_base = (unsigned long)panic_info->hyp_stack_base;
> > +       hyp_offset = addr - hyp_base;
> > +
> > +       kern_base = (unsigned long)*this_cpu_ptr(&kvm_arm_hyp_stack_page);
> > +
> > +       return kern_base + hyp_offset;
> > +}
> > +
> > +static unsigned long kvm_nvhe_overflow_stack_kern_va(unsigned long addr)
> > +{
> > +       struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
> > +       unsigned long hyp_base, kern_base, hyp_offset;
> > +
> > +       hyp_base = (unsigned long)panic_info->hyp_overflow_stack_base;
> > +       hyp_offset = addr - hyp_base;
> > +
> > +       kern_base = (unsigned long)this_cpu_ptr_nvhe_sym(hyp_overflow_stack);
> > +
> > +       return kern_base + hyp_offset;
> > +}
> > +
> > +/*
> > + * Convert KVM nVHE hypervisor stack VA to a kernel VA.
> > + *
> > + * The nVHE hypervisor stack is mapped in the flexible 'private' VA range, to allow
> > + * for guard pages below the stack. Consequently, the fixed offset address
> > + * translation macros won't work here.
> > + *
> > + * The kernel VA is calculated as an offset from the kernel VA of the hypervisor
> > + * stack base. See: kvm_nvhe_hyp_stack_kern_va(),  kvm_nvhe_overflow_stack_kern_va()
> > + */
> > +static unsigned long kvm_nvhe_stack_kern_va(unsigned long addr,
> > +                                       enum stack_type type)
> > +{
> > +       switch (type) {
> > +       case STACK_TYPE_KVM_NVHE_HYP:
> > +               return kvm_nvhe_hyp_stack_kern_va(addr);
> > +       case STACK_TYPE_KVM_NVHE_OVERFLOW:
> > +               return kvm_nvhe_overflow_stack_kern_va(addr);
> > +       default:
> > +               return 0UL;
> > +       }
> > +}
> > +
> > +static int notrace kvm_nvhe_unwind_frame(struct task_struct *tsk,
> > +                                       struct stackframe *frame)
> > +{
> > +       struct stack_info info;
> > +
> > +       if (!kvm_nvhe_on_accessible_stack(frame->fp, 16, &info))
> > +               return -EINVAL;
> > +
> > +       return  __unwind_frame(frame, &info, kvm_nvhe_stack_kern_va);
> > +}
> > +
> > +static bool kvm_nvhe_dump_backtrace_entry(void *arg, unsigned long where)
> > +{
> > +       unsigned long va_mask = GENMASK_ULL(vabits_actual - 1, 0);
> > +       unsigned long hyp_offset = (unsigned long)arg;
> > +
> > +       where &= va_mask;       /* Mask tags */
> > +       where += hyp_offset;    /* Convert to kern addr */
> > +
> > +       kvm_err("[<%016lx>] %pB\n", where, (void *)where);
> > +
> > +       return true;
> > +}
> > +
> > +static void notrace kvm_nvhe_walk_stackframe(struct task_struct *tsk,
> > +                                   struct stackframe *frame,
> > +                                   bool (*fn)(void *, unsigned long), void *data)
> > +{
> > +       __walk_stackframe(tsk, frame, fn, data, kvm_nvhe_unwind_frame);
> > +}
> > +
> > +void kvm_nvhe_dump_backtrace(unsigned long hyp_offset)
> > +{
> > +       struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
> > +       struct stackframe frame;
> > +
> > +       start_backtrace(&frame, panic_info->fp, panic_info->pc);
> > +       pr_err("nVHE HYP call trace:\n");
> > +       kvm_nvhe_walk_stackframe(NULL, &frame, kvm_nvhe_dump_backtrace_entry,
> > +                                       (void *)hyp_offset);
> > +       pr_err("---- end of nVHE HYP call trace ----\n");
> > +}
> > +#endif /* CONFIG_NVHE_EL2_DEBUG */
> > diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
> > index 8a5fbbf084df..75f2c8255ff0 100644
> > --- a/arch/arm64/kvm/Kconfig
> > +++ b/arch/arm64/kvm/Kconfig
> > @@ -51,8 +51,9 @@ config NVHE_EL2_DEBUG
> >         depends on KVM
> >         help
> >           Say Y here to enable the debug mode for the non-VHE KVM EL2 object.
> > -         Failure reports will BUG() in the hypervisor. This is intended for
> > -         local EL2 hypervisor development.
> > +         Failure reports will BUG() in the hypervisor; and panics will print
> > +         the hypervisor call stack. This is intended for local EL2 hypervisor
> > +         development.
>
> Nit: maybe for clarity you could rephrase as "calls to hyp_panic()
> will result in printing the hypervisor call stack".

Ack. I'll update in the next version.

Thanks,
Kalesh
>
> Thanks,
> /fuad
>
>
> >
> >           If unsure, say N.
> >
> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > index 7a23630c4a7f..66c07c04eb52 100644
> > --- a/arch/arm64/kvm/arm.c
> > +++ b/arch/arm64/kvm/arm.c
> > @@ -49,7 +49,7 @@ DEFINE_STATIC_KEY_FALSE(kvm_protected_mode_initialized);
> >
> >  DECLARE_KVM_HYP_PER_CPU(unsigned long, kvm_hyp_vector);
> >
> > -static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
> > +DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
> >  unsigned long kvm_arm_hyp_percpu_base[NR_CPUS];
> >  DECLARE_KVM_NVHE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
> >
> > diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> > index e3140abd2e2e..ff69dff33700 100644
> > --- a/arch/arm64/kvm/handle_exit.c
> > +++ b/arch/arm64/kvm/handle_exit.c
> > @@ -17,6 +17,7 @@
> >  #include <asm/kvm_emulate.h>
> >  #include <asm/kvm_mmu.h>
> >  #include <asm/debug-monitors.h>
> > +#include <asm/stacktrace.h>
> >  #include <asm/traps.h>
> >
> >  #include <kvm/arm_hypercalls.h>
> > @@ -326,6 +327,8 @@ void __noreturn __cold nvhe_hyp_panic_handler(u64 esr, u64 spsr,
> >                 kvm_err("nVHE hyp panic at: %016llx!\n", elr_virt + hyp_offset);
> >         }
> >
> > +       kvm_nvhe_dump_backtrace(hyp_offset);
> > +
> >         /*
> >          * Hyp has panicked and we're going to handle that by panicking the
> >          * kernel. The kernel offset will be revealed in the panic so we're
> > diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
> > index efc20273a352..b8ecffc47424 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/switch.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/switch.c
> > @@ -37,6 +37,22 @@ DEFINE_PER_CPU(unsigned long, kvm_hyp_vector);
> >  #ifdef CONFIG_NVHE_EL2_DEBUG
> >  DEFINE_PER_CPU(unsigned long [PAGE_SIZE/sizeof(long)], hyp_overflow_stack)
> >         __aligned(16);
> > +DEFINE_PER_CPU(struct kvm_nvhe_panic_info, kvm_panic_info);
> > +
> > +static inline void cpu_prepare_nvhe_panic_info(void)
> > +{
> > +       struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr(&kvm_panic_info);
> > +       struct kvm_nvhe_init_params *params = this_cpu_ptr(&kvm_init_params);
> > +
> > +       panic_info->hyp_stack_base = (unsigned long)(params->stack_hyp_va - PAGE_SIZE);
> > +       panic_info->hyp_overflow_stack_base = (unsigned long)this_cpu_ptr(hyp_overflow_stack);
> > +       panic_info->fp = (unsigned long)__builtin_frame_address(0);
> > +       panic_info->pc = _THIS_IP_;
> > +}
> > + #else
> > +static inline void cpu_prepare_nvhe_panic_info(void)
> > +{
> > +}
> >  #endif
> >
> >  static void __activate_traps(struct kvm_vcpu *vcpu)
> > @@ -360,6 +376,8 @@ asmlinkage void __noreturn hyp_panic(void)
> >         struct kvm_cpu_context *host_ctxt;
> >         struct kvm_vcpu *vcpu;
> >
> > +       cpu_prepare_nvhe_panic_info();
> > +
> >         host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
> >         vcpu = host_ctxt->__hyp_running_vcpu;
> >
> > --
> > 2.35.1.473.g83b2b277ed-goog
> >
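
For anyone following the address translation in kvm_nvhe_stack_kern_va():
the arithmetic is "take the offset of addr from the hyp VA of the stack
base and apply it to the kernel VA of the same page". Below is a minimal
userspace sketch of that idea only. It is not the kernel code:
struct sketch_stack_map, sketch_hyp_to_kern_va() and SKETCH_PAGE_SIZE are
made-up names, the 4 KiB page size is an assumption, and the bounds check
is folded in here just so the sketch is safe to call on its own (the patch
performs that check separately through on_stack()).

```c
#include <stdint.h>

/* Assumed page size for this sketch; the kernel uses PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096UL

/* Hypothetical stand-in for the two per-CPU stack bases the patch uses. */
struct sketch_stack_map {
	uintptr_t hyp_base;	/* hyp VA of the lowest byte of the stack page */
	uintptr_t kern_base;	/* kernel VA at which the same page is mapped */
};

/*
 * Translate a hyp VA that lies inside the stack page into the matching
 * kernel VA. Returns 0 when addr is outside the page.
 */
static uintptr_t sketch_hyp_to_kern_va(const struct sketch_stack_map *map,
				       uintptr_t addr)
{
	uintptr_t offset;

	if (addr < map->hyp_base)
		return 0;

	offset = addr - map->hyp_base;
	if (offset >= SKETCH_PAGE_SIZE)
		return 0;

	return map->kern_base + offset;
}
```

With the arbitrary illustrative values hyp_base = 0x40000000 and
kern_base = 0x80001000, a hyp FP of 0x40000010 maps to 0x80001010. An
address outside the page yields 0, mirroring how __unwind_frame() rejects
a zero result from translate_fp.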

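The translate_fp hook added to __unwind_frame() can be pictured with a
small userspace model: frame records live at "foreign" addresses, and each
fp is translated into a locally readable address before it is
dereferenced. This is a simplified sketch under stated assumptions, not
the patch itself. All sketch_* names are hypothetical, the stack-type
tracking (stacks_done, prev_type) is left out, and pointer authentication
is ignored. As in the patch, the untranslated fp is what gets recorded as
prev_fp.

```c
#include <stddef.h>
#include <stdint.h>

/* Frame record layout: saved fp first, then the saved return address. */
struct sketch_frame_record {
	uintptr_t fp;
	uintptr_t pc;
};

/* Hypothetical window mapping a foreign region onto local memory. */
struct sketch_window {
	uintptr_t foreign_base;	/* region address as the foreign side sees it */
	uintptr_t local_base;	/* address of the same bytes locally */
	size_t size;		/* region size in bytes */
};

typedef uintptr_t (*sketch_translate_fn)(uintptr_t fp, void *ctx);

/* Returns the local address for fp, or 0 if the record does not fit. */
static uintptr_t sketch_window_translate(uintptr_t fp, void *ctx)
{
	const struct sketch_window *w = ctx;
	uintptr_t offset;

	if (fp < w->foreign_base)
		return 0;

	offset = fp - w->foreign_base;
	if (offset + sizeof(struct sketch_frame_record) > w->size)
		return 0;

	return w->local_base + offset;
}

/*
 * Walk frame records starting at fp and store up to max return addresses
 * in pcs. Returns the number of entries stored.
 */
static size_t sketch_walk(uintptr_t fp, uintptr_t *pcs, size_t max,
			  sketch_translate_fn translate, void *ctx)
{
	uintptr_t prev_fp = 0;
	size_t n = 0;

	while (n < max) {
		const struct sketch_frame_record *rec;
		uintptr_t local;

		if (fp & 0x7)		/* misaligned frame pointer */
			break;
		if (fp <= prev_fp)	/* must move up the stack; stops cycles */
			break;

		/* Record the foreign fp before translating it. */
		prev_fp = fp;

		local = translate ? translate(fp, ctx) : fp;
		if (!local)
			break;

		rec = (const struct sketch_frame_record *)local;
		pcs[n++] = rec->pc;
		fp = rec->fp;
	}

	return n;
}
```

Passing NULL as the translate argument corresponds to the regular kernel
case where fp can be dereferenced directly, while passing
sketch_window_translate plays the role of kvm_nvhe_stack_kern_va().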
From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.lore.kernel.org (Postfix) with ESMTPS id CEDFDC433EF
	for <linux-arm-kernel@archiver.kernel.org>; Thu, 24 Feb 2022 18:10:37 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;
	d=lists.infradead.org; s=bombadil.20210309; h=Sender:
	Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post:
	List-Archive:List-Unsubscribe:List-Id:Cc:To:Subject:Message-ID:Date:From:
	In-Reply-To:References:MIME-Version:Reply-To:Content-ID:Content-Description:
	Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:
	List-Owner; bh=jhAbkS3SfIJpDXGTbflYOzKlzH1eJhHGT0IGT0OdzVk=; b=fRKhqSyF9a2DaT
	yBH0rrQoXKSAie9gU/MT5xWKamlOop3OI1DxoD0E4uOoW3dDnYx0/DhfQDiGsSifcJ49uq8ZZ7gJs
	5yyYsONdTqtdTgAH9yzyQ74+nFU2/LjfqH5FCGlJ10swRgvwXBMifsP6iOcKXXVPJ24giBuUC5VsR
	uJJ9An89D+IZL+2ZiOWyciV4BbsZv5OM524RuxYBc3vk8ZmApHAMfACeb6HNkRrBr/aIaX0hUg2eH
	lBRorYEqCCNuDe5njXLgkjvTsN6MXfB8w1OfMWb/+J+2tdQYjhNaHNe71iNNuU9auazKrUx/bWT71
	lGGwGUB3Kt0hgbWe9IIg==;
Received: from localhost ([::1] helo=bombadil.infradead.org)
	by bombadil.infradead.org with esmtp (Exim 4.94.2 #2 (Red Hat Linux))
	id 1nNIYD-001xNx-JY; Thu, 24 Feb 2022 18:09:13 +0000
Received: from mail-wr1-x436.google.com ([2a00:1450:4864:20::436])
 by bombadil.infradead.org with esmtps (Exim 4.94.2 #2 (Red Hat Linux))
 id 1nNIY9-001xN2-4W
 for linux-arm-kernel@lists.infradead.org; Thu, 24 Feb 2022 18:09:11 +0000
Received: by mail-wr1-x436.google.com with SMTP id d17so900354wrc.9
 for <linux-arm-kernel@lists.infradead.org>;
 Thu, 24 Feb 2022 10:09:07 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112;
 h=mime-version:references:in-reply-to:from:date:message-id:subject:to
 :cc; bh=JBod19rkADe4RLTH6pIu+iewF45ltCOcT+bAF+Wv0jI=;
 b=jbWx65rXhidUWKLzjE5t5/ELOwE9H4N7N1Py6DVq329fFUHtli6eeZN9hfBsWJ/VZg
 JZBK/CdFrLEGqce2ZoQ9Wl3P5s2ApAyvLgpss0vbEIWK9I2Lf/N57ZOuoH5ccWswZ4Ss
 7prwfrJSMgCv1bsliPkng8j4B9yloXNYc6hf4OgZBEo7WKGoxQxQ3MjRCy/SU1wPurNK
 vuI2Z/Li4QP2k2hxhW0zNY2l/eo5ISqLSD9U+LTnSGGDYsjJhLKpRrgbbCuUWHerRBsA
 ZygKhLNN1sV2ch5mVvtnMwFFaJuPbj3iw03WfuwBriXD/MXhG0lSCUo8yaeuirAX2jyZ
 +Tsw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=x-gm-message-state:mime-version:references:in-reply-to:from:date
 :message-id:subject:to:cc;
 bh=JBod19rkADe4RLTH6pIu+iewF45ltCOcT+bAF+Wv0jI=;
 b=sXFlix9DhalnKWieyoHlDeJAi30HZCGnhwm+XrHpg4n4ha8irECO5wJFiwgA6dIa+I
 eLRO9Q6kp+9I600//mgRJ7wxJYuh0d+CNH+WAG06oIGshQigIQM3x3MKhHicNgYZN9sw
 WEdKYqXlpyk3jDx+KPESIn/SzxauunTrbiPxByUsz56WeXArMQjoQ0GCQQ2KtEvveAVx
 DhQxnDSRlEFZgbHxqbvzN6zpAyoiYMCFAdSmrlczcDMCAfnogkeHbsnUJ7fHlLwqIDmt
 izQfjyDfDHL/gFDhCThClVYmApf6WlKaWdIWP7FMzayG2IEGKzQoo+9DZS6eA3sueZkg
 qZsw==
X-Gm-Message-State: AOAM531hlBBgwm903iiToRfcL1RtNlrecmG3mDjvkeLt5Q3yfzjgHQQ2
 q49IwmStrE8kTm6BwaI7IIGYs2tRBPJwywP2CL0/Uw==
X-Google-Smtp-Source: ABdhPJw3fGngrJIeXA2AngyWEbeuzVREcXS2h92NXlwzoHHBHZ2ErN9+yU7GNB6Mu3tbLl54mrAmF7kecbeXj1BnSLA=
X-Received: by 2002:a5d:4b87:0:b0:1ed:f948:7bcf with SMTP id
 b7-20020a5d4b87000000b001edf9487bcfmr2752186wrt.699.1645726146117; Thu, 24
 Feb 2022 10:09:06 -0800 (PST)
MIME-Version: 1.0
References: <20220224051439.640768-1-kaleshsingh@google.com>
 <20220224051439.640768-8-kaleshsingh@google.com>
 <CA+EHjTwMvQzvyA2Nxq_TKNjM-0_XvTtoKPtBPVPjSn7cboBNUA@mail.gmail.com>
In-Reply-To: <CA+EHjTwMvQzvyA2Nxq_TKNjM-0_XvTtoKPtBPVPjSn7cboBNUA@mail.gmail.com>
From: Kalesh Singh <kaleshsingh@google.com>
Date: Thu, 24 Feb 2022 10:08:55 -0800
Message-ID: <CAC_TJvef9=gtidBJ1T1fEMY6prDAo8dTYr5uCiS=i3o03miuWg@mail.gmail.com>
Subject: Re: [PATCH v3 7/8] KVM: arm64: Unwind and dump nVHE HYP stacktrace
To: Fuad Tabba <tabba@google.com>
Cc: Will Deacon <will@kernel.org>, Marc Zyngier <maz@kernel.org>,
 Quentin Perret <qperret@google.com>, 
 Suren Baghdasaryan <surenb@google.com>,
 "Cc: Android Kernel" <kernel-team@android.com>, 
 James Morse <james.morse@arm.com>, Alexandru Elisei <alexandru.elisei@arm.com>,
 Suzuki K Poulose <suzuki.poulose@arm.com>,
 Catalin Marinas <catalin.marinas@arm.com>, 
 Mark Rutland <mark.rutland@arm.com>, Mark Brown <broonie@kernel.org>, 
 Masami Hiramatsu <mhiramat@kernel.org>, Peter Collingbourne <pcc@google.com>, 
 "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>,
 Andrew Scull <ascull@google.com>, 
 Paolo Bonzini <pbonzini@redhat.com>, Ard Biesheuvel <ardb@kernel.org>, 
 "moderated list:ARM64 PORT (AARCH64 ARCHITECTURE)"
 <linux-arm-kernel@lists.infradead.org>, kvmarm@lists.cs.columbia.edu, 
 LKML <linux-kernel@vger.kernel.org>
X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 
X-CRM114-CacheID: sfid-20220224_100909_231144_32E5CE98 
X-CRM114-Status: GOOD (  42.76  )
X-BeenThere: linux-arm-kernel@lists.infradead.org
X-Mailman-Version: 2.1.34
Precedence: list
List-Id: <linux-arm-kernel.lists.infradead.org>
List-Unsubscribe: <http://lists.infradead.org/mailman/options/linux-arm-kernel>, 
 <mailto:linux-arm-kernel-request@lists.infradead.org?subject=unsubscribe>
List-Archive: <http://lists.infradead.org/pipermail/linux-arm-kernel/>
List-Post: <mailto:linux-arm-kernel@lists.infradead.org>
List-Help: <mailto:linux-arm-kernel-request@lists.infradead.org?subject=help>
List-Subscribe: <http://lists.infradead.org/mailman/listinfo/linux-arm-kernel>, 
 <mailto:linux-arm-kernel-request@lists.infradead.org?subject=subscribe>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: "linux-arm-kernel" <linux-arm-kernel-bounces@lists.infradead.org>
Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org

On Thu, Feb 24, 2022 at 4:28 AM Fuad Tabba <tabba@google.com> wrote:
>
> Hi Kalesh,
>
> On Thu, Feb 24, 2022 at 5:22 AM Kalesh Singh <kaleshsingh@google.com> wrote:
> >
> > Unwind the stack in EL1, when CONFIG_NVHE_EL2_DEBUG is enabled. This is
> > possible because CONFIG_NVHE_EL2_DEBUG disables the host stage 2 protection
> > which allows host to access the hypervisor stack pages in EL1.
>
> For this comment to be clearer, and if my understanding is correct, I
> think that it should say that CONFIG_NVHE_EL2_DEBUG allows host stage
> 2 protection to be disabled on a hyp_panic. Otherwise, on reading the
> comment one might think that CONFIG_NVHE_EL2_DEBUG runs without host
> stage 2 protection at all.

Your understanding is correct: the host stage 2  protection is only
disabled on a hyp_panic(). I'll rephrase to make it clearer.

>
> >
> > Unwinding and dumping hyp call traces is gated on CONFIG_NVHE_EL2_DEBUG
> > to avoid the potential leaking of information to the host.
> >
> > A simple stack overflow test produces the following output:
> >
> > [  580.376051][  T412] kvm: nVHE hyp panic at: ffffffc0116145c4!
> > [  580.378034][  T412] kvm [412]: nVHE HYP call trace:
> > [  580.378591][  T412] kvm [412]:  [<ffffffc011614934>]
> > [  580.378993][  T412] kvm [412]:  [<ffffffc01160fa48>]
> > [  580.379386][  T412] kvm [412]:  [<ffffffc0116145dc>]  // Non-terminating recursive call
> > [  580.379772][  T412] kvm [412]:  [<ffffffc0116145dc>]
> > [  580.380158][  T412] kvm [412]:  [<ffffffc0116145dc>]
> > [  580.380544][  T412] kvm [412]:  [<ffffffc0116145dc>]
> > [  580.380928][  T412] kvm [412]:  [<ffffffc0116145dc>]
> > . . .
> >
> > Since nVHE hyp symbols are not included by kallsyms to avoid issues
> > with aliasing, we fall back to the vmlinux addresses. Symbolizing the
> > addresses is handled in the next patch in this series.
> >
> > Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> > ---
> >
> > Changes in v3:
> >   - The nvhe hyp stack unwinder now makes use of the core logic from the
> >     regular kernel unwinder to avoid duplication, per Mark
> >
> > Changes in v2:
> >   - Add cpu_prepare_nvhe_panic_info()
> >   - Move updating the panic info to hyp_panic(), so that unwinding also
> >     works for conventional nVHE Hyp-mode.
> >
> >  arch/arm64/include/asm/kvm_asm.h    |  19 +++
> >  arch/arm64/include/asm/stacktrace.h |  12 ++
> >  arch/arm64/kernel/stacktrace.c      | 210 +++++++++++++++++++++++++---
> >  arch/arm64/kvm/Kconfig              |   5 +-
> >  arch/arm64/kvm/arm.c                |   2 +-
> >  arch/arm64/kvm/handle_exit.c        |   3 +
> >  arch/arm64/kvm/hyp/nvhe/switch.c    |  18 +++
> >  7 files changed, 243 insertions(+), 26 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> > index 2e277f2ed671..16efdf150a37 100644
> > --- a/arch/arm64/include/asm/kvm_asm.h
> > +++ b/arch/arm64/include/asm/kvm_asm.h
> > @@ -176,6 +176,25 @@ struct kvm_nvhe_init_params {
> >         unsigned long vtcr;
> >  };
> >
> > +#ifdef CONFIG_NVHE_EL2_DEBUG
> > +/*
> > + * Used by the host in EL1 to dump the nVHE hypervisor backtrace on
> > + * hyp_panic. This is possible because CONFIG_NVHE_EL2_DEBUG disables
> > + * the host stage 2 protection. See: __hyp_do_panic()
>
> Same as my comment above.

Ack
>
> > + *
> > + * @hyp_stack_base:             hyp VA of the hyp_stack base.
> > + * @hyp_overflow_stack_base:    hyp VA of the hyp_overflow_stack base.
> > + * @fp:                         hyp FP where the backtrace begins.
> > + * @pc:                         hyp PC where the backtrace begins.
> > + */
> > +struct kvm_nvhe_panic_info {
> > +       unsigned long hyp_stack_base;
> > +       unsigned long hyp_overflow_stack_base;
> > +       unsigned long fp;
> > +       unsigned long pc;
> > +};
> > +#endif /* CONFIG_NVHE_EL2_DEBUG */
> > +
> >  /* Translate a kernel address @ptr into its equivalent linear mapping */
> >  #define kvm_ksym_ref(ptr)                                              \
> >         ({                                                              \
> > diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
> > index e77cdef9ca29..18611a51cf14 100644
> > --- a/arch/arm64/include/asm/stacktrace.h
> > +++ b/arch/arm64/include/asm/stacktrace.h
> > @@ -22,6 +22,10 @@ enum stack_type {
> >         STACK_TYPE_OVERFLOW,
> >         STACK_TYPE_SDEI_NORMAL,
> >         STACK_TYPE_SDEI_CRITICAL,
> > +#ifdef CONFIG_NVHE_EL2_DEBUG
> > +       STACK_TYPE_KVM_NVHE_HYP,
> > +       STACK_TYPE_KVM_NVHE_OVERFLOW,
> > +#endif /* CONFIG_NVHE_EL2_DEBUG */
> >         __NR_STACK_TYPES
> >  };
> >
> > @@ -147,4 +151,12 @@ static inline bool on_accessible_stack(const struct task_struct *tsk,
> >         return false;
> >  }
> >
> > +#ifdef CONFIG_NVHE_EL2_DEBUG
> > +void kvm_nvhe_dump_backtrace(unsigned long hyp_offset);
> > +#else
> > +static inline void kvm_nvhe_dump_backtrace(unsigned long hyp_offset)
> > +{
> > +}
> > +#endif /* CONFIG_NVHE_EL2_DEBUG */
> > +
> >  #endif /* __ASM_STACKTRACE_H */
> > diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
> > index e4103e085681..6ec85cb69b1f 100644
> > --- a/arch/arm64/kernel/stacktrace.c
> > +++ b/arch/arm64/kernel/stacktrace.c
> > @@ -15,6 +15,8 @@
> >
> >  #include <asm/irq.h>
> >  #include <asm/pointer_auth.h>
> > +#include <asm/kvm_asm.h>
> > +#include <asm/kvm_hyp.h>
> >  #include <asm/stack_pointer.h>
> >  #include <asm/stacktrace.h>
> >
> > @@ -64,26 +66,15 @@ NOKPROBE_SYMBOL(start_backtrace);
> >   * records (e.g. a cycle), determined based on the location and fp value of A
> >   * and the location (but not the fp value) of B.
> >   */
> > -static int notrace unwind_frame(struct task_struct *tsk,
> > -                               struct stackframe *frame)
> > +static int notrace __unwind_frame(struct stackframe *frame, struct stack_info *info,
> > +               unsigned long (*translate_fp)(unsigned long, enum stack_type))
> >  {
> >         unsigned long fp = frame->fp;
> > -       struct stack_info info;
> > -
> > -       if (!tsk)
> > -               tsk = current;
> > -
> > -       /* Final frame; nothing to unwind */
> > -       if (fp == (unsigned long)task_pt_regs(tsk)->stackframe)
> > -               return -ENOENT;
> >
> >         if (fp & 0x7)
> >                 return -EINVAL;
> >
> > -       if (!on_accessible_stack(tsk, fp, 16, &info))
> > -               return -EINVAL;
> > -
> > -       if (test_bit(info.type, frame->stacks_done))
> > +       if (test_bit(info->type, frame->stacks_done))
> >                 return -EINVAL;
> >
> >         /*
> > @@ -94,28 +85,62 @@ static int notrace unwind_frame(struct task_struct *tsk,
> >          *
> >          * TASK -> IRQ -> OVERFLOW -> SDEI_NORMAL
> >          * TASK -> SDEI_NORMAL -> SDEI_CRITICAL -> OVERFLOW
> > +        * KVM_NVHE_HYP -> KVM_NVHE_OVERFLOW
> >          *
> >          * ... but the nesting itself is strict. Once we transition from one
> >          * stack to another, it's never valid to unwind back to that first
> >          * stack.
> >          */
> > -       if (info.type == frame->prev_type) {
> > +       if (info->type == frame->prev_type) {
> >                 if (fp <= frame->prev_fp)
> >                         return -EINVAL;
> >         } else {
> >                 set_bit(frame->prev_type, frame->stacks_done);
> >         }
> >
> > +       /* Record fp as prev_fp before attempting to get the next fp */
> > +       frame->prev_fp = fp;
> > +
> > +       /*
> > +        * If fp is not from the current address space perform the
> > +        * necessary translation before dereferencing it to get next fp.
> > +        */
> > +       if (translate_fp)
> > +               fp = translate_fp(fp, info->type);
> > +       if (!fp)
> > +               return -EINVAL;
> > +
> >         /*
> >          * Record this frame record's values and location. The prev_fp and
> > -        * prev_type are only meaningful to the next unwind_frame() invocation.
> > +        * prev_type are only meaningful to the next __unwind_frame() invocation.
> >          */
> >         frame->fp = READ_ONCE_NOCHECK(*(unsigned long *)(fp));
> >         frame->pc = READ_ONCE_NOCHECK(*(unsigned long *)(fp + 8));
> > -       frame->prev_fp = fp;
> > -       frame->prev_type = info.type;
> > -
> >         frame->pc = ptrauth_strip_insn_pac(frame->pc);
> > +       frame->prev_type = info->type;
> > +
> > +       return 0;
> > +}
> > +
> > +static int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
> > +{
> > +       unsigned long fp = frame->fp;
> > +       struct stack_info info;
> > +       int err;
> > +
> > +       if (!tsk)
> > +               tsk = current;
> > +
> > +       /* Final frame; nothing to unwind */
> > +       if (fp == (unsigned long)task_pt_regs(tsk)->stackframe)
> > +               return -ENOENT;
> > +
> > +       if (!on_accessible_stack(tsk, fp, 16, &info))
> > +               return -EINVAL;
> > +
> > +       err = __unwind_frame(frame, &info, NULL);
> > +       if (err)
> > +               return err;
> >
> >  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
> >         if (tsk->ret_stack &&
> > @@ -143,20 +168,27 @@ static int notrace unwind_frame(struct task_struct *tsk,
> >  }
> >  NOKPROBE_SYMBOL(unwind_frame);
> >
> > -static void notrace walk_stackframe(struct task_struct *tsk,
> > -                                   struct stackframe *frame,
> > -                                   bool (*fn)(void *, unsigned long), void *data)
> > +static void notrace __walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
> > +               bool (*fn)(void *, unsigned long), void *data,
> > +               int (*unwind_frame_fn)(struct task_struct *tsk, struct stackframe *frame))
> >  {
> >         while (1) {
> >                 int ret;
> >
> >                 if (!fn(data, frame->pc))
> >                         break;
> > -               ret = unwind_frame(tsk, frame);
> > +               ret = unwind_frame_fn(tsk, frame);
> >                 if (ret < 0)
> >                         break;
> >         }
> >  }
> > +
> > +static void notrace walk_stackframe(struct task_struct *tsk,
> > +                                   struct stackframe *frame,
> > +                                   bool (*fn)(void *, unsigned long), void *data)
> > +{
> > +       __walk_stackframe(tsk, frame, fn, data, unwind_frame);
> > +}
> >  NOKPROBE_SYMBOL(walk_stackframe);
> >
> >  static bool dump_backtrace_entry(void *arg, unsigned long where)
> > @@ -210,3 +242,135 @@ noinline notrace void arch_stack_walk(stack_trace_consume_fn consume_entry,
> >
> >         walk_stackframe(task, &frame, consume_entry, cookie);
> >  }
> > +
> > +#ifdef CONFIG_NVHE_EL2_DEBUG
> > +DECLARE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
> > +DECLARE_KVM_NVHE_PER_CPU(unsigned long [PAGE_SIZE/sizeof(long)], hyp_overflow_stack);
> > +DECLARE_KVM_NVHE_PER_CPU(struct kvm_nvhe_panic_info, kvm_panic_info);
> > +
> > +static inline bool kvm_nvhe_on_overflow_stack(unsigned long sp, unsigned long size,
> > +                                struct stack_info *info)
> > +{
> > +       struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
> > +       unsigned long low = (unsigned long)panic_info->hyp_overflow_stack_base;
> > +       unsigned long high = low + PAGE_SIZE;
> > +
> > +       return on_stack(sp, size, low, high, STACK_TYPE_KVM_NVHE_OVERFLOW, info);
> > +}
> > +
> > +static inline bool kvm_nvhe_on_hyp_stack(unsigned long sp, unsigned long size,
> > +                                struct stack_info *info)
> > +{
> > +       struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
> > +       unsigned long low = (unsigned long)panic_info->hyp_stack_base;
> > +       unsigned long high = low + PAGE_SIZE;
> > +
> > +       return on_stack(sp, size, low, high, STACK_TYPE_KVM_NVHE_HYP, info);
> > +}
> > +
> > +static inline bool kvm_nvhe_on_accessible_stack(unsigned long sp, unsigned long size,
> > +                                      struct stack_info *info)
> > +{
> > +       if (info)
> > +               info->type = STACK_TYPE_UNKNOWN;
> > +
> > +       if (kvm_nvhe_on_hyp_stack(sp, size, info))
> > +               return true;
> > +       if (kvm_nvhe_on_overflow_stack(sp, size, info))
> > +               return true;
> > +
> > +       return false;
> > +}
> > +
> > +static unsigned long kvm_nvhe_hyp_stack_kern_va(unsigned long addr)
> > +{
> > +       struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
> > +       unsigned long hyp_base, kern_base, hyp_offset;
> > +
> > +       hyp_base = (unsigned long)panic_info->hyp_stack_base;
> > +       hyp_offset = addr - hyp_base;
> > +
> > +       kern_base = (unsigned long)*this_cpu_ptr(&kvm_arm_hyp_stack_page);
> > +
> > +       return kern_base + hyp_offset;
> > +}
> > +
> > +static unsigned long kvm_nvhe_overflow_stack_kern_va(unsigned long addr)
> > +{
> > +       struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
> > +       unsigned long hyp_base, kern_base, hyp_offset;
> > +
> > +       hyp_base = (unsigned long)panic_info->hyp_overflow_stack_base;
> > +       hyp_offset = addr - hyp_base;
> > +
> > +       kern_base = (unsigned long)this_cpu_ptr_nvhe_sym(hyp_overflow_stack);
> > +
> > +       return kern_base + hyp_offset;
> > +}
> > +
> > +/*
> > + * Convert KVM nVHE hypervisor stack VA to a kernel VA.
> > + *
> > + * The nVHE hypervisor stack is mapped in the flexible 'private' VA range, to allow
> > + * for guard pages below the stack. Consequently, the fixed offset address
> > + * translation macros won't work here.
> > + *
> > + * The kernel VA is calculated as an offset from the kernel VA of the hypervisor
> > + * stack base. See: kvm_nvhe_hyp_stack_kern_va(), kvm_nvhe_overflow_stack_kern_va()
> > + */
> > +static unsigned long kvm_nvhe_stack_kern_va(unsigned long addr,
> > +                                       enum stack_type type)
> > +{
> > +       switch (type) {
> > +       case STACK_TYPE_KVM_NVHE_HYP:
> > +               return kvm_nvhe_hyp_stack_kern_va(addr);
> > +       case STACK_TYPE_KVM_NVHE_OVERFLOW:
> > +               return kvm_nvhe_overflow_stack_kern_va(addr);
> > +       default:
> > +               return 0UL;
> > +       }
> > +}
> > +
> > +static int notrace kvm_nvhe_unwind_frame(struct task_struct *tsk,
> > +                                       struct stackframe *frame)
> > +{
> > +       struct stack_info info;
> > +
> > +       if (!kvm_nvhe_on_accessible_stack(frame->fp, 16, &info))
> > +               return -EINVAL;
> > +
> > +       return __unwind_frame(frame, &info, kvm_nvhe_stack_kern_va);
> > +}
> > +
> > +static bool kvm_nvhe_dump_backtrace_entry(void *arg, unsigned long where)
> > +{
> > +       unsigned long va_mask = GENMASK_ULL(vabits_actual - 1, 0);
> > +       unsigned long hyp_offset = (unsigned long)arg;
> > +
> > +       where &= va_mask;       /* Mask tags */
> > +       where += hyp_offset;    /* Convert to kern addr */
> > +
> > +       kvm_err("[<%016lx>] %pB\n", where, (void *)where);
> > +
> > +       return true;
> > +}
> > +
> > +static void notrace kvm_nvhe_walk_stackframe(struct task_struct *tsk,
> > +                                   struct stackframe *frame,
> > +                                   bool (*fn)(void *, unsigned long), void *data)
> > +{
> > +       __walk_stackframe(tsk, frame, fn, data, kvm_nvhe_unwind_frame);
> > +}
> > +
> > +void kvm_nvhe_dump_backtrace(unsigned long hyp_offset)
> > +{
> > +       struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
> > +       struct stackframe frame;
> > +
> > +       start_backtrace(&frame, panic_info->fp, panic_info->pc);
> > +       pr_err("nVHE HYP call trace:\n");
> > +       kvm_nvhe_walk_stackframe(NULL, &frame, kvm_nvhe_dump_backtrace_entry,
> > +                                       (void *)hyp_offset);
> > +       pr_err("---- end of nVHE HYP call trace ----\n");
> > +}
> > +#endif /* CONFIG_NVHE_EL2_DEBUG */
> > diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
> > index 8a5fbbf084df..75f2c8255ff0 100644
> > --- a/arch/arm64/kvm/Kconfig
> > +++ b/arch/arm64/kvm/Kconfig
> > @@ -51,8 +51,9 @@ config NVHE_EL2_DEBUG
> >         depends on KVM
> >         help
> >           Say Y here to enable the debug mode for the non-VHE KVM EL2 object.
> > -         Failure reports will BUG() in the hypervisor. This is intended for
> > -         local EL2 hypervisor development.
> > +         Failure reports will BUG() in the hypervisor; and panics will print
> > +         the hypervisor call stack. This is intended for local EL2 hypervisor
> > +         development.
>
> Nit: maybe for clarity you could rephrase as "calls to hyp_panic()
> will result in printing the hypervisor call stack".

Ack. I'll update in the next version.

Thanks,
Kalesh
>
> Thanks,
> /fuad
>
>
> >
> >           If unsure, say N.
> >
> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > index 7a23630c4a7f..66c07c04eb52 100644
> > --- a/arch/arm64/kvm/arm.c
> > +++ b/arch/arm64/kvm/arm.c
> > @@ -49,7 +49,7 @@ DEFINE_STATIC_KEY_FALSE(kvm_protected_mode_initialized);
> >
> >  DECLARE_KVM_HYP_PER_CPU(unsigned long, kvm_hyp_vector);
> >
> > -static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
> > +DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
> >  unsigned long kvm_arm_hyp_percpu_base[NR_CPUS];
> >  DECLARE_KVM_NVHE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
> >
> > diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> > index e3140abd2e2e..ff69dff33700 100644
> > --- a/arch/arm64/kvm/handle_exit.c
> > +++ b/arch/arm64/kvm/handle_exit.c
> > @@ -17,6 +17,7 @@
> >  #include <asm/kvm_emulate.h>
> >  #include <asm/kvm_mmu.h>
> >  #include <asm/debug-monitors.h>
> > +#include <asm/stacktrace.h>
> >  #include <asm/traps.h>
> >
> >  #include <kvm/arm_hypercalls.h>
> > @@ -326,6 +327,8 @@ void __noreturn __cold nvhe_hyp_panic_handler(u64 esr, u64 spsr,
> >                 kvm_err("nVHE hyp panic at: %016llx!\n", elr_virt + hyp_offset);
> >         }
> >
> > +       kvm_nvhe_dump_backtrace(hyp_offset);
> > +
> >         /*
> >          * Hyp has panicked and we're going to handle that by panicking the
> >          * kernel. The kernel offset will be revealed in the panic so we're
> > diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
> > index efc20273a352..b8ecffc47424 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/switch.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/switch.c
> > @@ -37,6 +37,22 @@ DEFINE_PER_CPU(unsigned long, kvm_hyp_vector);
> >  #ifdef CONFIG_NVHE_EL2_DEBUG
> >  DEFINE_PER_CPU(unsigned long [PAGE_SIZE/sizeof(long)], hyp_overflow_stack)
> >         __aligned(16);
> > +DEFINE_PER_CPU(struct kvm_nvhe_panic_info, kvm_panic_info);
> > +
> > +static inline void cpu_prepare_nvhe_panic_info(void)
> > +{
> > +       struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr(&kvm_panic_info);
> > +       struct kvm_nvhe_init_params *params = this_cpu_ptr(&kvm_init_params);
> > +
> > +       panic_info->hyp_stack_base = (unsigned long)(params->stack_hyp_va - PAGE_SIZE);
> > +       panic_info->hyp_overflow_stack_base = (unsigned long)this_cpu_ptr(hyp_overflow_stack);
> > +       panic_info->fp = (unsigned long)__builtin_frame_address(0);
> > +       panic_info->pc = _THIS_IP_;
> > +}
> > +#else
> > +static inline void cpu_prepare_nvhe_panic_info(void)
> > +{
> > +}
> >  #endif
> >
> >  static void __activate_traps(struct kvm_vcpu *vcpu)
> > @@ -360,6 +376,8 @@ asmlinkage void __noreturn hyp_panic(void)
> >         struct kvm_cpu_context *host_ctxt;
> >         struct kvm_vcpu *vcpu;
> >
> > +       cpu_prepare_nvhe_panic_info();
> > +
> >         host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
> >         vcpu = host_ctxt->__hyp_running_vcpu;
> >
> > --
> > 2.35.1.473.g83b2b277ed-goog
> >
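
For readers following the address translation in the quoted patch: the
hyp stack sits in the private VA range, so the host cannot use a fixed
offset and instead rebases each frame pointer against the stack base
recorded at panic time. Below is a minimal user-space sketch of that
idea only. The names (panic_info_sketch, hyp_stack_to_kern_va), the
4 KiB page size and the explicit bounds check are assumptions made for
illustration; this is not the code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: one 4 KiB stack page. */
#define SKETCH_PAGE_SIZE UINT64_C(4096)

/* Hypothetical stand-in for the per-CPU panic info recorded by the hyp. */
struct panic_info_sketch {
	uint64_t hyp_stack_base;	/* hyp VA of the lowest stack address */
};

/*
 * Rebase a hyp stack address onto the host's mapping of the same page.
 *
 * Returns 0 when [hyp_va, hyp_va + size) is not fully inside the stack
 * page, mirroring the convention that a zero result means "cannot
 * translate".
 */
static uint64_t hyp_stack_to_kern_va(const struct panic_info_sketch *info,
				     uint64_t kern_base, uint64_t hyp_va,
				     uint64_t size)
{
	uint64_t offset;

	if (hyp_va < info->hyp_stack_base)
		return 0;

	offset = hyp_va - info->hyp_stack_base;
	if (size > SKETCH_PAGE_SIZE || offset > SKETCH_PAGE_SIZE - size)
		return 0;

	/* Same offset, different base: that is the whole translation. */
	return kern_base + offset;
}
```

A caller would pass the 16-byte frame record size, so a frame pointer
in the last 8 bytes of the page is rejected rather than read across
the page boundary.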

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <kvmarm-bounces@lists.cs.columbia.edu>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from mm01.cs.columbia.edu (mm01.cs.columbia.edu [128.59.11.253])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 2451FC433F5
	for <kvmarm@archiver.kernel.org>; Fri, 25 Feb 2022 14:59:53 +0000 (UTC)
Received: from localhost (localhost [127.0.0.1])
	by mm01.cs.columbia.edu (Postfix) with ESMTP id CF4954B9EB;
	Fri, 25 Feb 2022 09:59:52 -0500 (EST)
X-Virus-Scanned: at lists.cs.columbia.edu
Authentication-Results: mm01.cs.columbia.edu (amavisd-new); dkim=softfail
	(fail, message has been altered) header.i=@google.com
Received: from mm01.cs.columbia.edu ([127.0.0.1])
	by localhost (mm01.cs.columbia.edu [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id CPaI2iX-eoj3; Fri, 25 Feb 2022 09:59:50 -0500 (EST)
Received: from mm01.cs.columbia.edu (localhost [127.0.0.1])
	by mm01.cs.columbia.edu (Postfix) with ESMTP id 789C14BA58;
	Fri, 25 Feb 2022 09:59:47 -0500 (EST)
Received: from localhost (localhost [127.0.0.1])
 by mm01.cs.columbia.edu (Postfix) with ESMTP id 0766E4BA3C
 for <kvmarm@lists.cs.columbia.edu>; Thu, 24 Feb 2022 13:09:10 -0500 (EST)
X-Virus-Scanned: at lists.cs.columbia.edu
Received: from mm01.cs.columbia.edu ([127.0.0.1])
 by localhost (mm01.cs.columbia.edu [127.0.0.1]) (amavisd-new, port 10024)
 with ESMTP id gRZtAbNMy34H for <kvmarm@lists.cs.columbia.edu>;
 Thu, 24 Feb 2022 13:09:07 -0500 (EST)
Received: from mail-wr1-f53.google.com (mail-wr1-f53.google.com
 [209.85.221.53])
 by mm01.cs.columbia.edu (Postfix) with ESMTPS id 8252B4BA36
 for <kvmarm@lists.cs.columbia.edu>; Thu, 24 Feb 2022 13:09:07 -0500 (EST)
Received: by mail-wr1-f53.google.com with SMTP id x15so910526wrg.8
 for <kvmarm@lists.cs.columbia.edu>; Thu, 24 Feb 2022 10:09:07 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112;
 h=mime-version:references:in-reply-to:from:date:message-id:subject:to
 :cc; bh=JBod19rkADe4RLTH6pIu+iewF45ltCOcT+bAF+Wv0jI=;
 b=jbWx65rXhidUWKLzjE5t5/ELOwE9H4N7N1Py6DVq329fFUHtli6eeZN9hfBsWJ/VZg
 JZBK/CdFrLEGqce2ZoQ9Wl3P5s2ApAyvLgpss0vbEIWK9I2Lf/N57ZOuoH5ccWswZ4Ss
 7prwfrJSMgCv1bsliPkng8j4B9yloXNYc6hf4OgZBEo7WKGoxQxQ3MjRCy/SU1wPurNK
 vuI2Z/Li4QP2k2hxhW0zNY2l/eo5ISqLSD9U+LTnSGGDYsjJhLKpRrgbbCuUWHerRBsA
 ZygKhLNN1sV2ch5mVvtnMwFFaJuPbj3iw03WfuwBriXD/MXhG0lSCUo8yaeuirAX2jyZ
 +Tsw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=x-gm-message-state:mime-version:references:in-reply-to:from:date
 :message-id:subject:to:cc;
 bh=JBod19rkADe4RLTH6pIu+iewF45ltCOcT+bAF+Wv0jI=;
 b=qmcgwDcbSjIMfEmoDzHKKGwjfR35fRwmuxuII7LGknQPLtD/hNXwL3YzfB4QyDrnvT
 T0JjnhJdMmFoRVicuq9vIRscy4/tZIlp3gCkT2eOoTtjAROtnquzU9jL41z+Wuv7ucx/
 xY4WvkRO1Qxe5KqGGgLo16UszwfD76F7dCIapOUjvprQsutTBBqHhIOoAF8cQSRDKH5b
 2DbrXHbs5zLMnkx9jKKQRDIxVsNDPAo+EnlX/k2APyTnYWBpCdGABcgHjIfciwOOYqZZ
 F7YJG/UN07kjQ1MD+0kV7JcUPbhm5k2Ryrjli3lcFdR2ZctfW5XVhQy9Gk53Oo4T3nJ9
 kmSg==
X-Gm-Message-State: AOAM533HSNushFxnBdlJUT+PnBWIc76eNVaOcdzytwyT4wDla00fcXhq
 pt96OShWST7kiy1cQCNjqWiYJCanUGISRi2ZGRPw7Q==
X-Google-Smtp-Source: ABdhPJw3fGngrJIeXA2AngyWEbeuzVREcXS2h92NXlwzoHHBHZ2ErN9+yU7GNB6Mu3tbLl54mrAmF7kecbeXj1BnSLA=
X-Received: by 2002:a5d:4b87:0:b0:1ed:f948:7bcf with SMTP id
 b7-20020a5d4b87000000b001edf9487bcfmr2752186wrt.699.1645726146117; Thu, 24
 Feb 2022 10:09:06 -0800 (PST)
MIME-Version: 1.0
References: <20220224051439.640768-1-kaleshsingh@google.com>
 <20220224051439.640768-8-kaleshsingh@google.com>
 <CA+EHjTwMvQzvyA2Nxq_TKNjM-0_XvTtoKPtBPVPjSn7cboBNUA@mail.gmail.com>
In-Reply-To: <CA+EHjTwMvQzvyA2Nxq_TKNjM-0_XvTtoKPtBPVPjSn7cboBNUA@mail.gmail.com>
From: Kalesh Singh <kaleshsingh@google.com>
Date: Thu, 24 Feb 2022 10:08:55 -0800
Message-ID: <CAC_TJvef9=gtidBJ1T1fEMY6prDAo8dTYr5uCiS=i3o03miuWg@mail.gmail.com>
Subject: Re: [PATCH v3 7/8] KVM: arm64: Unwind and dump nVHE HYP stacktrace
To: Fuad Tabba <tabba@google.com>
X-Mailman-Approved-At: Fri, 25 Feb 2022 09:59:45 -0500
Cc: "Cc: Android Kernel" <kernel-team@android.com>,
 Will Deacon <will@kernel.org>, Peter Collingbourne <pcc@google.com>,
 Marc Zyngier <maz@kernel.org>, LKML <linux-kernel@vger.kernel.org>,
 kvmarm@lists.cs.columbia.edu,
 "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>,
 Mark Brown <broonie@kernel.org>, Masami Hiramatsu <mhiramat@kernel.org>,
 Catalin Marinas <catalin.marinas@arm.com>, Paolo Bonzini <pbonzini@redhat.com>,
 Suren Baghdasaryan <surenb@google.com>,
 "moderated list:ARM64 PORT \(AARCH64 ARCHITECTURE\)"
 <linux-arm-kernel@lists.infradead.org>
X-BeenThere: kvmarm@lists.cs.columbia.edu
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Where KVM/ARM decisions are made <kvmarm.lists.cs.columbia.edu>
List-Unsubscribe: <https://lists.cs.columbia.edu/mailman/options/kvmarm>,
 <mailto:kvmarm-request@lists.cs.columbia.edu?subject=unsubscribe>
List-Archive: <https://lists.cs.columbia.edu/pipermail/kvmarm>
List-Post: <mailto:kvmarm@lists.cs.columbia.edu>
List-Help: <mailto:kvmarm-request@lists.cs.columbia.edu?subject=help>
List-Subscribe: <https://lists.cs.columbia.edu/mailman/listinfo/kvmarm>,
 <mailto:kvmarm-request@lists.cs.columbia.edu?subject=subscribe>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: kvmarm-bounces@lists.cs.columbia.edu
Sender: kvmarm-bounces@lists.cs.columbia.edu

On Thu, Feb 24, 2022 at 4:28 AM Fuad Tabba <tabba@google.com> wrote:
>
> Hi Kalesh,
>
> On Thu, Feb 24, 2022 at 5:22 AM Kalesh Singh <kaleshsingh@google.com> wrote:
> >
> > Unwind the stack in EL1, when CONFIG_NVHE_EL2_DEBUG is enabled. This is
> > possible because CONFIG_NVHE_EL2_DEBUG disables the host stage 2 protection
> > which allows host to access the hypervisor stack pages in EL1.
>
> For this comment to be clearer, and if my understanding is correct, I
> think that it should say that CONFIG_NVHE_EL2_DEBUG allows host stage
> 2 protection to be disabled on a hyp_panic. Otherwise, on reading the
> comment one might think that CONFIG_NVHE_EL2_DEBUG runs without host
> stage 2 protection at all.

Your understanding is correct: the host stage 2 protection is only
disabled on a hyp_panic(). I'll rephrase to make it clearer.

>
> >
> > Unwinding and dumping hyp call traces is gated on CONFIG_NVHE_EL2_DEBUG
> > to avoid the potential leaking of information to the host.
> >
> > A simple stack overflow test produces the following output:
> >
> > [  580.376051][  T412] kvm: nVHE hyp panic at: ffffffc0116145c4!
> > [  580.378034][  T412] kvm [412]: nVHE HYP call trace:
> > [  580.378591][  T412] kvm [412]:  [<ffffffc011614934>]
> > [  580.378993][  T412] kvm [412]:  [<ffffffc01160fa48>]
> > [  580.379386][  T412] kvm [412]:  [<ffffffc0116145dc>]  // Non-terminating recursive call
> > [  580.379772][  T412] kvm [412]:  [<ffffffc0116145dc>]
> > [  580.380158][  T412] kvm [412]:  [<ffffffc0116145dc>]
> > [  580.380544][  T412] kvm [412]:  [<ffffffc0116145dc>]
> > [  580.380928][  T412] kvm [412]:  [<ffffffc0116145dc>]
> > . . .
> >
> > Since nVHE hyp symbols are not included by kallsyms to avoid issues
> > with aliasing, we fall back to the vmlinux addresses. Symbolizing the
> > addresses is handled in the next patch in this series.
> >
> > Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
> > ---
> >
> > Changes in v3:
> >   - The nvhe hyp stack unwinder now makes use of the core logic from the
> >     regular kernel unwinder to avoid duplication, per Mark
> >
> > Changes in v2:
> >   - Add cpu_prepare_nvhe_panic_info()
> >   - Move updating the panic info to hyp_panic(), so that unwinding also
> >     works for conventional nVHE Hyp-mode.
> >
> >  arch/arm64/include/asm/kvm_asm.h    |  19 +++
> >  arch/arm64/include/asm/stacktrace.h |  12 ++
> >  arch/arm64/kernel/stacktrace.c      | 210 +++++++++++++++++++++++++---
> >  arch/arm64/kvm/Kconfig              |   5 +-
> >  arch/arm64/kvm/arm.c                |   2 +-
> >  arch/arm64/kvm/handle_exit.c        |   3 +
> >  arch/arm64/kvm/hyp/nvhe/switch.c    |  18 +++
> >  7 files changed, 243 insertions(+), 26 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> > index 2e277f2ed671..16efdf150a37 100644
> > --- a/arch/arm64/include/asm/kvm_asm.h
> > +++ b/arch/arm64/include/asm/kvm_asm.h
> > @@ -176,6 +176,25 @@ struct kvm_nvhe_init_params {
> >         unsigned long vtcr;
> >  };
> >
> > +#ifdef CONFIG_NVHE_EL2_DEBUG
> > +/*
> > + * Used by the host in EL1 to dump the nVHE hypervisor backtrace on
> > + * hyp_panic. This is possible because CONFIG_NVHE_EL2_DEBUG disables
> > + * the host stage 2 protection. See: __hyp_do_panic()
>
> Same as my comment above.

Ack
>
> > + *
> > + * @hyp_stack_base:             hyp VA of the hyp_stack base.
> > + * @hyp_overflow_stack_base:    hyp VA of the hyp_overflow_stack base.
> > + * @fp:                         hyp FP where the backtrace begins.
> > + * @pc:                         hyp PC where the backtrace begins.
> > + */
> > +struct kvm_nvhe_panic_info {
> > +       unsigned long hyp_stack_base;
> > +       unsigned long hyp_overflow_stack_base;
> > +       unsigned long fp;
> > +       unsigned long pc;
> > +};
> > +#endif /* CONFIG_NVHE_EL2_DEBUG */
> > +
> >  /* Translate a kernel address @ptr into its equivalent linear mapping */
> >  #define kvm_ksym_ref(ptr)                                              \
> >         ({                                                              \
> > diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
> > index e77cdef9ca29..18611a51cf14 100644
> > --- a/arch/arm64/include/asm/stacktrace.h
> > +++ b/arch/arm64/include/asm/stacktrace.h
> > @@ -22,6 +22,10 @@ enum stack_type {
> >         STACK_TYPE_OVERFLOW,
> >         STACK_TYPE_SDEI_NORMAL,
> >         STACK_TYPE_SDEI_CRITICAL,
> > +#ifdef CONFIG_NVHE_EL2_DEBUG
> > +       STACK_TYPE_KVM_NVHE_HYP,
> > +       STACK_TYPE_KVM_NVHE_OVERFLOW,
> > +#endif /* CONFIG_NVHE_EL2_DEBUG */
> >         __NR_STACK_TYPES
> >  };
> >
> > @@ -147,4 +151,12 @@ static inline bool on_accessible_stack(const struct task_struct *tsk,
> >         return false;
> >  }
> >
> > +#ifdef CONFIG_NVHE_EL2_DEBUG
> > +void kvm_nvhe_dump_backtrace(unsigned long hyp_offset);
> > +#else
> > +static inline void kvm_nvhe_dump_backtrace(unsigned long hyp_offset)
> > +{
> > +}
> > +#endif /* CONFIG_NVHE_EL2_DEBUG */
> > +
> >  #endif /* __ASM_STACKTRACE_H */
> > diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
> > index e4103e085681..6ec85cb69b1f 100644
> > --- a/arch/arm64/kernel/stacktrace.c
> > +++ b/arch/arm64/kernel/stacktrace.c
> > @@ -15,6 +15,8 @@
> >
> >  #include <asm/irq.h>
> >  #include <asm/pointer_auth.h>
> > +#include <asm/kvm_asm.h>
> > +#include <asm/kvm_hyp.h>
> >  #include <asm/stack_pointer.h>
> >  #include <asm/stacktrace.h>
> >
> > @@ -64,26 +66,15 @@ NOKPROBE_SYMBOL(start_backtrace);
> >   * records (e.g. a cycle), determined based on the location and fp value of A
> >   * and the location (but not the fp value) of B.
> >   */
> > -static int notrace unwind_frame(struct task_struct *tsk,
> > -                               struct stackframe *frame)
> > +static int notrace __unwind_frame(struct stackframe *frame, struct stack_info *info,
> > +               unsigned long (*translate_fp)(unsigned long, enum stack_type))
> >  {
> >         unsigned long fp = frame->fp;
> > -       struct stack_info info;
> > -
> > -       if (!tsk)
> > -               tsk = current;
> > -
> > -       /* Final frame; nothing to unwind */
> > -       if (fp == (unsigned long)task_pt_regs(tsk)->stackframe)
> > -               return -ENOENT;
> >
> >         if (fp & 0x7)
> >                 return -EINVAL;
> >
> > -       if (!on_accessible_stack(tsk, fp, 16, &info))
> > -               return -EINVAL;
> > -
> > -       if (test_bit(info.type, frame->stacks_done))
> > +       if (test_bit(info->type, frame->stacks_done))
> >                 return -EINVAL;
> >
> >         /*
> > @@ -94,28 +85,62 @@ static int notrace unwind_frame(struct task_struct *tsk,
> >          *
> >          * TASK -> IRQ -> OVERFLOW -> SDEI_NORMAL
> >          * TASK -> SDEI_NORMAL -> SDEI_CRITICAL -> OVERFLOW
> > +        * KVM_NVHE_HYP -> KVM_NVHE_OVERFLOW
> >          *
> >          * ... but the nesting itself is strict. Once we transition from one
> >          * stack to another, it's never valid to unwind back to that first
> >          * stack.
> >          */
> > -       if (info.type == frame->prev_type) {
> > +       if (info->type == frame->prev_type) {
> >                 if (fp <= frame->prev_fp)
> >                         return -EINVAL;
> >         } else {
> >                 set_bit(frame->prev_type, frame->stacks_done);
> >         }
> >
> > +       /* Record fp as prev_fp before attempting to get the next fp */
> > +       frame->prev_fp = fp;
> > +
> > +       /*
> > +        * If fp is not from the current address space perform the
> > +        * necessary translation before dereferencing it to get next fp.
> > +        */
> > +       if (translate_fp)
> > +               fp = translate_fp(fp, info->type);
> > +       if (!fp)
> > +               return -EINVAL;
> > +
> >         /*
> >          * Record this frame record's values and location. The prev_fp and
> > -        * prev_type are only meaningful to the next unwind_frame() invocation.
> > +        * prev_type are only meaningful to the next __unwind_frame() invocation.
> >          */
> >         frame->fp = READ_ONCE_NOCHECK(*(unsigned long *)(fp));
> >         frame->pc = READ_ONCE_NOCHECK(*(unsigned long *)(fp + 8));
> > -       frame->prev_fp = fp;
> > -       frame->prev_type = info.type;
> > -
> >         frame->pc = ptrauth_strip_insn_pac(frame->pc);
> > +       frame->prev_type = info->type;
> > +
> > +       return 0;
> > +}
> > +
> > +static int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
> > +{
> > +       unsigned long fp = frame->fp;
> > +       struct stack_info info;
> > +       int err;
> > +
> > +       if (!tsk)
> > +               tsk = current;
> > +
> > +       /* Final frame; nothing to unwind */
> > +       if (fp == (unsigned long)task_pt_regs(tsk)->stackframe)
> > +               return -ENOENT;
> > +
> > +       if (!on_accessible_stack(tsk, fp, 16, &info))
> > +               return -EINVAL;
> > +
> > +       err = __unwind_frame(frame, &info, NULL);
> > +       if (err)
> > +               return err;
> >
> >  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
> >         if (tsk->ret_stack &&
> > @@ -143,20 +168,27 @@ static int notrace unwind_frame(struct task_struct *tsk,
> >  }
> >  NOKPROBE_SYMBOL(unwind_frame);
> >
> > -static void notrace walk_stackframe(struct task_struct *tsk,
> > -                                   struct stackframe *frame,
> > -                                   bool (*fn)(void *, unsigned long), void *data)
> > +static void notrace __walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
> > +               bool (*fn)(void *, unsigned long), void *data,
> > +               int (*unwind_frame_fn)(struct task_struct *tsk, struct stackframe *frame))
> >  {
> >         while (1) {
> >                 int ret;
> >
> >                 if (!fn(data, frame->pc))
> >                         break;
> > -               ret = unwind_frame(tsk, frame);
> > +               ret = unwind_frame_fn(tsk, frame);
> >                 if (ret < 0)
> >                         break;
> >         }
> >  }
> > +
> > +static void notrace walk_stackframe(struct task_struct *tsk,
> > +                                   struct stackframe *frame,
> > +                                   bool (*fn)(void *, unsigned long), void *data)
> > +{
> > +       __walk_stackframe(tsk, frame, fn, data, unwind_frame);
> > +}
> >  NOKPROBE_SYMBOL(walk_stackframe);
> >
> >  static bool dump_backtrace_entry(void *arg, unsigned long where)
> > @@ -210,3 +242,135 @@ noinline notrace void arch_stack_walk(stack_trace_consume_fn consume_entry,
> >
> >         walk_stackframe(task, &frame, consume_entry, cookie);
> >  }
> > +
> > +#ifdef CONFIG_NVHE_EL2_DEBUG
> > +DECLARE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
> > +DECLARE_KVM_NVHE_PER_CPU(unsigned long [PAGE_SIZE/sizeof(long)], hyp_overflow_stack);
> > +DECLARE_KVM_NVHE_PER_CPU(struct kvm_nvhe_panic_info, kvm_panic_info);
> > +
> > +static inline bool kvm_nvhe_on_overflow_stack(unsigned long sp, unsigned long size,
> > +                                struct stack_info *info)
> > +{
> > +       struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
> > +       unsigned long low = (unsigned long)panic_info->hyp_overflow_stack_base;
> > +       unsigned long high = low + PAGE_SIZE;
> > +
> > +       return on_stack(sp, size, low, high, STACK_TYPE_KVM_NVHE_OVERFLOW, info);
> > +}
> > +
> > +static inline bool kvm_nvhe_on_hyp_stack(unsigned long sp, unsigned long size,
> > +                                struct stack_info *info)
> > +{
> > +       struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
> > +       unsigned long low = (unsigned long)panic_info->hyp_stack_base;
> > +       unsigned long high = low + PAGE_SIZE;
> > +
> > +       return on_stack(sp, size, low, high, STACK_TYPE_KVM_NVHE_HYP, info);
> > +}
> > +
> > +static inline bool kvm_nvhe_on_accessible_stack(unsigned long sp, unsigned long size,
> > +                                      struct stack_info *info)
> > +{
> > +       if (info)
> > +               info->type = STACK_TYPE_UNKNOWN;
> > +
> > +       if (kvm_nvhe_on_hyp_stack(sp, size, info))
> > +               return true;
> > +       if (kvm_nvhe_on_overflow_stack(sp, size, info))
> > +               return true;
> > +
> > +       return false;
> > +}
> > +
> > +static unsigned long kvm_nvhe_hyp_stack_kern_va(unsigned long addr)
> > +{
> > +       struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
> > +       unsigned long hyp_base, kern_base, hyp_offset;
> > +
> > +       hyp_base = (unsigned long)panic_info->hyp_stack_base;
> > +       hyp_offset = addr - hyp_base;
> > +
> > +       kern_base = (unsigned long)*this_cpu_ptr(&kvm_arm_hyp_stack_page);
> > +
> > +       return kern_base + hyp_offset;
> > +}
> > +
> > +static unsigned long kvm_nvhe_overflow_stack_kern_va(unsigned long addr)
> > +{
> > +       struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
> > +       unsigned long hyp_base, kern_base, hyp_offset;
> > +
> > +       hyp_base = (unsigned long)panic_info->hyp_overflow_stack_base;
> > +       hyp_offset = addr - hyp_base;
> > +
> > +       kern_base = (unsigned long)this_cpu_ptr_nvhe_sym(hyp_overflow_stack);
> > +
> > +       return kern_base + hyp_offset;
> > +}
> > +
> > +/*
> > + * Convert KVM nVHE hypervisor stack VA to a kernel VA.
> > + *
> > + * The nVHE hypervisor stack is mapped in the flexible 'private' VA range, to allow
> > + * for guard pages below the stack. Consequently, the fixed offset address
> > + * translation macros won't work here.
> > + *
> > + * The kernel VA is calculated as an offset from the kernel VA of the hypervisor
> > + * stack base. See: kvm_nvhe_hyp_stack_kern_va(), kvm_nvhe_overflow_stack_kern_va()
> > + */
> > +static unsigned long kvm_nvhe_stack_kern_va(unsigned long addr,
> > +                                       enum stack_type type)
> > +{
> > +       switch (type) {
> > +       case STACK_TYPE_KVM_NVHE_HYP:
> > +               return kvm_nvhe_hyp_stack_kern_va(addr);
> > +       case STACK_TYPE_KVM_NVHE_OVERFLOW:
> > +               return kvm_nvhe_overflow_stack_kern_va(addr);
> > +       default:
> > +               return 0UL;
> > +       }
> > +}
> > +
> > +static int notrace kvm_nvhe_unwind_frame(struct task_struct *tsk,
> > +                                       struct stackframe *frame)
> > +{
> > +       struct stack_info info;
> > +
> > +       if (!kvm_nvhe_on_accessible_stack(frame->fp, 16, &info))
> > +               return -EINVAL;
> > +
> > +       return __unwind_frame(frame, &info, kvm_nvhe_stack_kern_va);
> > +}
> > +
> > +static bool kvm_nvhe_dump_backtrace_entry(void *arg, unsigned long where)
> > +{
> > +       unsigned long va_mask = GENMASK_ULL(vabits_actual - 1, 0);
> > +       unsigned long hyp_offset = (unsigned long)arg;
> > +
> > +       where &= va_mask;       /* Mask tags */
> > +       where += hyp_offset;    /* Convert to kern addr */
> > +
> > +       kvm_err("[<%016lx>] %pB\n", where, (void *)where);
> > +
> > +       return true;
> > +}
> > +
> > +static void notrace kvm_nvhe_walk_stackframe(struct task_struct *tsk,
> > +                                   struct stackframe *frame,
> > +                                   bool (*fn)(void *, unsigned long), void *data)
> > +{
> > +       __walk_stackframe(tsk, frame, fn, data, kvm_nvhe_unwind_frame);
> > +}
> > +
> > +void kvm_nvhe_dump_backtrace(unsigned long hyp_offset)
> > +{
> > +       struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
> > +       struct stackframe frame;
> > +
> > +       start_backtrace(&frame, panic_info->fp, panic_info->pc);
> > +       pr_err("nVHE HYP call trace:\n");
> > +       kvm_nvhe_walk_stackframe(NULL, &frame, kvm_nvhe_dump_backtrace_entry,
> > +                                       (void *)hyp_offset);
> > +       pr_err("---- end of nVHE HYP call trace ----\n");
> > +}
> > +#endif /* CONFIG_NVHE_EL2_DEBUG */
> > diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
> > index 8a5fbbf084df..75f2c8255ff0 100644
> > --- a/arch/arm64/kvm/Kconfig
> > +++ b/arch/arm64/kvm/Kconfig
> > @@ -51,8 +51,9 @@ config NVHE_EL2_DEBUG
> >         depends on KVM
> >         help
> >           Say Y here to enable the debug mode for the non-VHE KVM EL2 object.
> > -         Failure reports will BUG() in the hypervisor. This is intended for
> > -         local EL2 hypervisor development.
> > +         Failure reports will BUG() in the hypervisor; and panics will print
> > +         the hypervisor call stack. This is intended for local EL2 hypervisor
> > +         development.
>
> Nit: maybe for clarity you could rephrase as "calls to hyp_panic()
> will result in printing the hypervisor call stack".

Ack. I'll update in the next version.
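Roughly the following, assuming I take your wording as-is (a tentative
sketch of the help text only, not final; the exact text may still change):

	  Say Y here to enable the debug mode for the non-VHE KVM EL2 object.
	  Failure reports will BUG() in the hypervisor, and calls to
	  hyp_panic() will result in printing the hypervisor call stack.
	  This is intended for local EL2 hypervisor development.

	  If unsure, say N.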

Thanks,
Kalesh
>
> Thanks,
> /fuad
>
>
> >
> >           If unsure, say N.
> >
> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > index 7a23630c4a7f..66c07c04eb52 100644
> > --- a/arch/arm64/kvm/arm.c
> > +++ b/arch/arm64/kvm/arm.c
> > @@ -49,7 +49,7 @@ DEFINE_STATIC_KEY_FALSE(kvm_protected_mode_initialized);
> >
> >  DECLARE_KVM_HYP_PER_CPU(unsigned long, kvm_hyp_vector);
> >
> > -static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
> > +DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
> >  unsigned long kvm_arm_hyp_percpu_base[NR_CPUS];
> >  DECLARE_KVM_NVHE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
> >
> > diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> > index e3140abd2e2e..ff69dff33700 100644
> > --- a/arch/arm64/kvm/handle_exit.c
> > +++ b/arch/arm64/kvm/handle_exit.c
> > @@ -17,6 +17,7 @@
> >  #include <asm/kvm_emulate.h>
> >  #include <asm/kvm_mmu.h>
> >  #include <asm/debug-monitors.h>
> > +#include <asm/stacktrace.h>
> >  #include <asm/traps.h>
> >
> >  #include <kvm/arm_hypercalls.h>
> > @@ -326,6 +327,8 @@ void __noreturn __cold nvhe_hyp_panic_handler(u64 esr, u64 spsr,
> >                 kvm_err("nVHE hyp panic at: %016llx!\n", elr_virt + hyp_offset);
> >         }
> >
> > +       kvm_nvhe_dump_backtrace(hyp_offset);
> > +
> >         /*
> >          * Hyp has panicked and we're going to handle that by panicking the
> >          * kernel. The kernel offset will be revealed in the panic so we're
> > diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
> > index efc20273a352..b8ecffc47424 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/switch.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/switch.c
> > @@ -37,6 +37,22 @@ DEFINE_PER_CPU(unsigned long, kvm_hyp_vector);
> >  #ifdef CONFIG_NVHE_EL2_DEBUG
> >  DEFINE_PER_CPU(unsigned long [PAGE_SIZE/sizeof(long)], hyp_overflow_stack)
> >         __aligned(16);
> > +DEFINE_PER_CPU(struct kvm_nvhe_panic_info, kvm_panic_info);
> > +
> > +static inline void cpu_prepare_nvhe_panic_info(void)
> > +{
> > +       struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr(&kvm_panic_info);
> > +       struct kvm_nvhe_init_params *params = this_cpu_ptr(&kvm_init_params);
> > +
> > +       panic_info->hyp_stack_base = (unsigned long)(params->stack_hyp_va - PAGE_SIZE);
> > +       panic_info->hyp_overflow_stack_base = (unsigned long)this_cpu_ptr(hyp_overflow_stack);
> > +       panic_info->fp = (unsigned long)__builtin_frame_address(0);
> > +       panic_info->pc = _THIS_IP_;
> > +}
> > +#else
> > +static inline void cpu_prepare_nvhe_panic_info(void)
> > +{
> > +}
> >  #endif
> >
> >  static void __activate_traps(struct kvm_vcpu *vcpu)
> > @@ -360,6 +376,8 @@ asmlinkage void __noreturn hyp_panic(void)
> >         struct kvm_cpu_context *host_ctxt;
> >         struct kvm_vcpu *vcpu;
> >
> > +       cpu_prepare_nvhe_panic_info();
> > +
> >         host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
> >         vcpu = host_ctxt->__hyp_running_vcpu;
> >
> > --
> > 2.35.1.473.g83b2b277ed-goog
> >