All of lore.kernel.org
 help / color / mirror / Atom feed
From: Marco Elver <elver@google.com>
To: andrey.konovalov@linux.dev
Cc: Catalin Marinas <catalin.marinas@arm.com>,
	Andrey Konovalov <andreyknvl@gmail.com>,
	Alexander Potapenko <glider@google.com>,
	Dmitry Vyukov <dvyukov@google.com>,
	Andrey Ryabinin <ryabinin.a.a@gmail.com>,
	kasan-dev@googlegroups.com,
	Vincenzo Frascino <vincenzo.frascino@arm.com>,
	Will Deacon <will@kernel.org>,
	linux-arm-kernel@lists.infradead.org,
	Peter Collingbourne <pcc@google.com>,
	Evgenii Stepanov <eugenis@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, Weizhao Ouyang <ouyangweizhao@zeku.com>,
	linux-kernel@vger.kernel.org,
	Andrey Konovalov <andreyknvl@google.com>
Subject: Re: [PATCH 5/5] kasan: suppress recursive reports for HW_TAGS
Date: Mon, 13 Mar 2023 12:19:30 +0100	[thread overview]
Message-ID: <CANpmjNMpjREcMc2iUS2ycUih9SRbP93mUaNPXcDZAd-ZDT2d+g@mail.gmail.com> (raw)
In-Reply-To: <59f433e00f7fa985e8bf9f7caf78574db16b67ab.1678491668.git.andreyknvl@google.com>

On Sat, 11 Mar 2023 at 00:43, <andrey.konovalov@linux.dev> wrote:
>
> From: Andrey Konovalov <andreyknvl@google.com>
>
> KASAN suppresses reports for bad accesses done by the KASAN reporting
> code. The reporting code might access poisoned memory for reporting
> purposes.
>
> Software KASAN modes do this by suppressing reports during reporting
> via current->kasan_depth, the same way they suppress reports during
> accesses to poisoned slab metadata.
>
> Hardware Tag-Based KASAN does not use current->kasan_depth, and instead
> resets pointer tags for accesses to poisoned memory done by the reporting
> code.
>
> Despite that, a recursive report can still happen:
>
> 1. On hardware with faulty MTE support. This was observed by Weizhao
>    Ouyang on a faulty hardware that caused memory tags to randomly change
>    from time to time.
>
> 2. Theoretically, due to a previous MTE-undetected memory corruption.
>
> A recursive report can happen via:
>
> 1. Accessing a pointer with a non-reset tag in the reporting code, e.g.
>    slab->slab_cache, which is what Weizhao Ouyang observed.
>
> 2. Theoretically, via external non-annotated routines, e.g. stackdepot.
>
> To resolve this issue, resetting tags for all of the pointers in the
> reporting code and all the used external routines would be impractical.
>
> Instead, disable tag checking done by the CPU for the duration of KASAN
> reporting for Hardware Tag-Based KASAN.
>
> Without this fix, Hardware Tag-Based KASAN reporting code might deadlock.
>
> Fixes: 2e903b914797 ("kasan, arm64: implement HW_TAGS runtime")
> Reported-by: Weizhao Ouyang <ouyangweizhao@zeku.com>
> Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
>
> ---
>
> Considering that 1. the bug this patch fixes was only observed on faulty
> MTE hardware, and 2. the patch depends on the other patches in this series,
> I don't think it's worth backporting it into stable.
> ---
>  mm/kasan/report.c | 59 ++++++++++++++++++++++++++++++++++++++---------
>  1 file changed, 48 insertions(+), 11 deletions(-)
>
> diff --git a/mm/kasan/report.c b/mm/kasan/report.c
> index 89078f912827..77a88d85c0a7 100644
> --- a/mm/kasan/report.c
> +++ b/mm/kasan/report.c
> @@ -72,10 +72,18 @@ static int __init kasan_set_multi_shot(char *str)
>  __setup("kasan_multi_shot", kasan_set_multi_shot);
>
>  /*
> - * Used to suppress reports within kasan_disable/enable_current() critical
> - * sections, which are used for marking accesses to slab metadata.
> + * This function is used to check whether KASAN reports are suppressed for
> + * software KASAN modes via kasan_disable/enable_current() critical sections.
> + *
> + * This is done to avoid:
> + * 1. False-positive reports when accessing slab metadata,
> + * 2. Deadlocking when poisoned memory is accessed by the reporting code.
> + *
> + * Hardware Tag-Based KASAN instead relies on:
> + * For #1: Resetting tags via kasan_reset_tag().
> + * For #2: Supression of tag checks via CPU, see report_suppress_start/end().

Typo: "Suppression"

>   */
> -static bool report_suppressed(void)
> +static bool report_suppressed_sw(void)
>  {
>  #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
>         if (current->kasan_depth)
> @@ -84,6 +92,30 @@ static bool report_suppressed(void)
>         return false;
>  }
>
> +static void report_suppress_start(void)
> +{
> +#ifdef CONFIG_KASAN_HW_TAGS
> +       /*
> +        * Disable migration for the duration of printing a KASAN report, as
> +        * hw_suppress_tag_checks_start() disables checks on the current CPU.
> +        */
> +       migrate_disable();

This still allows this task to be preempted by another task. If the
other task is scheduled in right after hw_suppress_tag_checks_start()
then there won't be any tag checking in that task. If HW-tags KASAN is
used as a mitigation technique, that may unnecessarily weaken KASAN,
because right after report_suppress_start(), it does
spin_lock_irqsave() which disables interrupts (and thereby preemption)
anyway.

Why not just use preempt_disable()?

> +       hw_suppress_tag_checks_start();
> +#else
> +       kasan_disable_current();
> +#endif
> +}
> +
> +static void report_suppress_stop(void)
> +{
> +#ifdef CONFIG_KASAN_HW_TAGS
> +       hw_suppress_tag_checks_stop();
> +       migrate_enable();
> +#else
> +       kasan_enable_current();
> +#endif
> +}
> +
>  /*
>   * Used to avoid reporting more than one KASAN bug unless kasan_multi_shot
>   * is enabled. Note that KASAN tests effectively enable kasan_multi_shot
> @@ -174,7 +206,7 @@ static void start_report(unsigned long *flags, bool sync)
>         /* Do not allow LOCKDEP mangling KASAN reports. */
>         lockdep_off();
>         /* Make sure we don't end up in loop. */
> -       kasan_disable_current();
> +       report_suppress_start();
>         spin_lock_irqsave(&report_lock, *flags);
>         pr_err("==================================================================\n");
>  }
> @@ -192,7 +224,7 @@ static void end_report(unsigned long *flags, void *addr)
>                 panic("kasan.fault=panic set ...\n");
>         add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
>         lockdep_on();
> -       kasan_enable_current();
> +       report_suppress_stop();
>  }
>
>  static void print_error_description(struct kasan_report_info *info)
> @@ -480,9 +512,13 @@ void kasan_report_invalid_free(void *ptr, unsigned long ip, enum kasan_report_ty
>         struct kasan_report_info info;
>
>         /*
> -        * Do not check report_suppressed(), as an invalid-free cannot be
> -        * caused by accessing slab metadata and thus should not be
> -        * suppressed by kasan_disable/enable_current() critical sections.
> +        * Do not check report_suppressed_sw(), as an invalid-free cannot be
> +        * caused by accessing poisoned memory and thus should not be suppressed
> +        * by kasan_disable/enable_current() critical sections.
> +        *
> +        * Note that for Hardware Tag-Based KASAN, kasan_report_invalid_free()
> +        * is triggered by explicit tag checks and not by the ones performed by
> +        * the CPU. Thus, reporting invalid-free is not suppressed as well.
>          */
>         if (unlikely(!report_enabled()))
>                 return;
> @@ -517,7 +553,7 @@ bool kasan_report(unsigned long addr, size_t size, bool is_write,
>         unsigned long irq_flags;
>         struct kasan_report_info info;
>
> -       if (unlikely(report_suppressed()) || unlikely(!report_enabled())) {
> +       if (unlikely(report_suppressed_sw()) || unlikely(!report_enabled())) {
>                 ret = false;
>                 goto out;
>         }
> @@ -549,8 +585,9 @@ void kasan_report_async(void)
>         unsigned long flags;
>
>         /*
> -        * Do not check report_suppressed(), as kasan_disable/enable_current()
> -        * critical sections do not affect Hardware Tag-Based KASAN.
> +        * Do not check report_suppressed_sw(), as
> +        * kasan_disable/enable_current() critical sections do not affect
> +        * Hardware Tag-Based KASAN.
>          */
>         if (unlikely(!report_enabled()))
>                 return;
> --
> 2.25.1
>

WARNING: multiple messages have this Message-ID (diff)
From: Marco Elver <elver@google.com>
To: andrey.konovalov@linux.dev
Cc: Catalin Marinas <catalin.marinas@arm.com>,
	Andrey Konovalov <andreyknvl@gmail.com>,
	 Alexander Potapenko <glider@google.com>,
	Dmitry Vyukov <dvyukov@google.com>,
	 Andrey Ryabinin <ryabinin.a.a@gmail.com>,
	kasan-dev@googlegroups.com,
	 Vincenzo Frascino <vincenzo.frascino@arm.com>,
	Will Deacon <will@kernel.org>,
	 linux-arm-kernel@lists.infradead.org,
	Peter Collingbourne <pcc@google.com>,
	 Evgenii Stepanov <eugenis@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org,  Weizhao Ouyang <ouyangweizhao@zeku.com>,
	linux-kernel@vger.kernel.org,
	 Andrey Konovalov <andreyknvl@google.com>
Subject: Re: [PATCH 5/5] kasan: suppress recursive reports for HW_TAGS
Date: Mon, 13 Mar 2023 12:19:30 +0100	[thread overview]
Message-ID: <CANpmjNMpjREcMc2iUS2ycUih9SRbP93mUaNPXcDZAd-ZDT2d+g@mail.gmail.com> (raw)
In-Reply-To: <59f433e00f7fa985e8bf9f7caf78574db16b67ab.1678491668.git.andreyknvl@google.com>

On Sat, 11 Mar 2023 at 00:43, <andrey.konovalov@linux.dev> wrote:
>
> From: Andrey Konovalov <andreyknvl@google.com>
>
> KASAN suppresses reports for bad accesses done by the KASAN reporting
> code. The reporting code might access poisoned memory for reporting
> purposes.
>
> Software KASAN modes do this by suppressing reports during reporting
> via current->kasan_depth, the same way they suppress reports during
> accesses to poisoned slab metadata.
>
> Hardware Tag-Based KASAN does not use current->kasan_depth, and instead
> resets pointer tags for accesses to poisoned memory done by the reporting
> code.
>
> Despite that, a recursive report can still happen:
>
> 1. On hardware with faulty MTE support. This was observed by Weizhao
>    Ouyang on a faulty hardware that caused memory tags to randomly change
>    from time to time.
>
> 2. Theoretically, due to a previous MTE-undetected memory corruption.
>
> A recursive report can happen via:
>
> 1. Accessing a pointer with a non-reset tag in the reporting code, e.g.
>    slab->slab_cache, which is what Weizhao Ouyang observed.
>
> 2. Theoretically, via external non-annotated routines, e.g. stackdepot.
>
> To resolve this issue, resetting tags for all of the pointers in the
> reporting code and all the used external routines would be impractical.
>
> Instead, disable tag checking done by the CPU for the duration of KASAN
> reporting for Hardware Tag-Based KASAN.
>
> Without this fix, Hardware Tag-Based KASAN reporting code might deadlock.
>
> Fixes: 2e903b914797 ("kasan, arm64: implement HW_TAGS runtime")
> Reported-by: Weizhao Ouyang <ouyangweizhao@zeku.com>
> Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
>
> ---
>
> Considering that 1. the bug this patch fixes was only observed on faulty
> MTE hardware, and 2. the patch depends on the other patches in this series,
> I don't think it's worth backporting it into stable.
> ---
>  mm/kasan/report.c | 59 ++++++++++++++++++++++++++++++++++++++---------
>  1 file changed, 48 insertions(+), 11 deletions(-)
>
> diff --git a/mm/kasan/report.c b/mm/kasan/report.c
> index 89078f912827..77a88d85c0a7 100644
> --- a/mm/kasan/report.c
> +++ b/mm/kasan/report.c
> @@ -72,10 +72,18 @@ static int __init kasan_set_multi_shot(char *str)
>  __setup("kasan_multi_shot", kasan_set_multi_shot);
>
>  /*
> - * Used to suppress reports within kasan_disable/enable_current() critical
> - * sections, which are used for marking accesses to slab metadata.
> + * This function is used to check whether KASAN reports are suppressed for
> + * software KASAN modes via kasan_disable/enable_current() critical sections.
> + *
> + * This is done to avoid:
> + * 1. False-positive reports when accessing slab metadata,
> + * 2. Deadlocking when poisoned memory is accessed by the reporting code.
> + *
> + * Hardware Tag-Based KASAN instead relies on:
> + * For #1: Resetting tags via kasan_reset_tag().
> + * For #2: Supression of tag checks via CPU, see report_suppress_start/end().

Typo: "Suppression"

>   */
> -static bool report_suppressed(void)
> +static bool report_suppressed_sw(void)
>  {
>  #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
>         if (current->kasan_depth)
> @@ -84,6 +92,30 @@ static bool report_suppressed(void)
>         return false;
>  }
>
> +static void report_suppress_start(void)
> +{
> +#ifdef CONFIG_KASAN_HW_TAGS
> +       /*
> +        * Disable migration for the duration of printing a KASAN report, as
> +        * hw_suppress_tag_checks_start() disables checks on the current CPU.
> +        */
> +       migrate_disable();

This still allows this task to be preempted by another task. If the
other task is scheduled in right after hw_suppress_tag_checks_start()
then there won't be any tag checking in that task. If HW-tags KASAN is
used as a mitigation technique, that may unnecessarily weaken KASAN,
because right after report_suppress_start(), it does
spin_lock_irqsave() which disables interrupts (and thereby preemption)
anyway.

Why not just use preempt_disable()?

> +       hw_suppress_tag_checks_start();
> +#else
> +       kasan_disable_current();
> +#endif
> +}
> +
> +static void report_suppress_stop(void)
> +{
> +#ifdef CONFIG_KASAN_HW_TAGS
> +       hw_suppress_tag_checks_stop();
> +       migrate_enable();
> +#else
> +       kasan_enable_current();
> +#endif
> +}
> +
>  /*
>   * Used to avoid reporting more than one KASAN bug unless kasan_multi_shot
>   * is enabled. Note that KASAN tests effectively enable kasan_multi_shot
> @@ -174,7 +206,7 @@ static void start_report(unsigned long *flags, bool sync)
>         /* Do not allow LOCKDEP mangling KASAN reports. */
>         lockdep_off();
>         /* Make sure we don't end up in loop. */
> -       kasan_disable_current();
> +       report_suppress_start();
>         spin_lock_irqsave(&report_lock, *flags);
>         pr_err("==================================================================\n");
>  }
> @@ -192,7 +224,7 @@ static void end_report(unsigned long *flags, void *addr)
>                 panic("kasan.fault=panic set ...\n");
>         add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
>         lockdep_on();
> -       kasan_enable_current();
> +       report_suppress_stop();
>  }
>
>  static void print_error_description(struct kasan_report_info *info)
> @@ -480,9 +512,13 @@ void kasan_report_invalid_free(void *ptr, unsigned long ip, enum kasan_report_ty
>         struct kasan_report_info info;
>
>         /*
> -        * Do not check report_suppressed(), as an invalid-free cannot be
> -        * caused by accessing slab metadata and thus should not be
> -        * suppressed by kasan_disable/enable_current() critical sections.
> +        * Do not check report_suppressed_sw(), as an invalid-free cannot be
> +        * caused by accessing poisoned memory and thus should not be suppressed
> +        * by kasan_disable/enable_current() critical sections.
> +        *
> +        * Note that for Hardware Tag-Based KASAN, kasan_report_invalid_free()
> +        * is triggered by explicit tag checks and not by the ones performed by
> +        * the CPU. Thus, reporting invalid-free is not suppressed as well.
>          */
>         if (unlikely(!report_enabled()))
>                 return;
> @@ -517,7 +553,7 @@ bool kasan_report(unsigned long addr, size_t size, bool is_write,
>         unsigned long irq_flags;
>         struct kasan_report_info info;
>
> -       if (unlikely(report_suppressed()) || unlikely(!report_enabled())) {
> +       if (unlikely(report_suppressed_sw()) || unlikely(!report_enabled())) {
>                 ret = false;
>                 goto out;
>         }
> @@ -549,8 +585,9 @@ void kasan_report_async(void)
>         unsigned long flags;
>
>         /*
> -        * Do not check report_suppressed(), as kasan_disable/enable_current()
> -        * critical sections do not affect Hardware Tag-Based KASAN.
> +        * Do not check report_suppressed_sw(), as
> +        * kasan_disable/enable_current() critical sections do not affect
> +        * Hardware Tag-Based KASAN.
>          */
>         if (unlikely(!report_enabled()))
>                 return;
> --
> 2.25.1
>

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2023-03-13 11:20 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-10 23:43 [PATCH 1/5] kasan: drop empty tagging-related defines andrey.konovalov
2023-03-10 23:43 ` andrey.konovalov
2023-03-10 23:43 ` [PATCH 2/5] kasan, arm64: rename tagging-related routines andrey.konovalov
2023-03-10 23:43   ` andrey.konovalov
2023-03-10 23:43 ` [PATCH 3/5] arm64: mte: Rename TCO routines andrey.konovalov
2023-03-10 23:43   ` andrey.konovalov
2023-03-10 23:43 ` [PATCH 4/5] kasan, arm64: add arch_suppress_tag_checks_start/stop andrey.konovalov
2023-03-10 23:43   ` andrey.konovalov
2023-03-10 23:43 ` [PATCH 5/5] kasan: suppress recursive reports for HW_TAGS andrey.konovalov
2023-03-10 23:43   ` andrey.konovalov
2023-03-13 11:19   ` Marco Elver [this message]
2023-03-13 11:19     ` Marco Elver
2023-03-14 17:56     ` Andrey Konovalov
2023-03-14 17:56       ` Andrey Konovalov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CANpmjNMpjREcMc2iUS2ycUih9SRbP93mUaNPXcDZAd-ZDT2d+g@mail.gmail.com \
    --to=elver@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=andrey.konovalov@linux.dev \
    --cc=andreyknvl@gmail.com \
    --cc=andreyknvl@google.com \
    --cc=catalin.marinas@arm.com \
    --cc=dvyukov@google.com \
    --cc=eugenis@google.com \
    --cc=glider@google.com \
    --cc=kasan-dev@googlegroups.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=ouyangweizhao@zeku.com \
    --cc=pcc@google.com \
    --cc=ryabinin.a.a@gmail.com \
    --cc=vincenzo.frascino@arm.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.