From: Marco Elver <elver@google.com>
To: Liu Shixin <liushixin2@huawei.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>,
	akpm@linux-foundation.org,  glider@google.com,
	dvyukov@google.com, jannh@google.com, mark.rutland@arm.com,
	 linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	kasan-dev@googlegroups.com,  hdanton@sina.com
Subject: Re: [PATCH v2 2/3] kfence: maximize allocation wait timeout duration
Date: Sat, 18 Sep 2021 11:45:15 +0200
Message-ID: <CANpmjNOUt5is7iHCAz9aOdD2nBb_7tqAKXmuWtitY_VNOkmv5w@mail.gmail.com>
In-Reply-To: <CANpmjNPj5aMPu_7D=cwrDyAwz9i-rVcXYgGapYdB+vdHcR3RZg@mail.gmail.com>

On Sat, 18 Sept 2021 at 11:37, Marco Elver <elver@google.com> wrote:
>
> On Sat, 18 Sept 2021 at 10:07, Liu Shixin <liushixin2@huawei.com> wrote:
> >
> > On 2021/9/16 16:49, Marco Elver wrote:
> > > On Thu, 16 Sept 2021 at 03:20, Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
> > >> Hi Marco,
> > >>
> > >> We found that kfence_test fails on ARM64 with this patch, with or without
> > >> CONFIG_DETECT_HUNG_TASK.
> > >>
> > >> Any thoughts?
> > > Please share the log and instructions to reproduce if possible. Also, if
> > > possible, please share the bisection log that led you to this patch.
> > >
> > > I currently do not see how this patch would cause that; it only
> > > increases the timeout duration.
> > >
> > > I know that under QEMU TCG mode, there are occasionally timeouts in
> > > the test simply due to QEMU being extremely slow or other weirdness.
> > >
> > >
> > Hi Marco,
> >
> > Here are some results from the current test:
> > 1. Using qemu-kvm on an arm64 machine, all test cases pass.
> > 2. Using qemu-system-aarch64 on an x86_64 machine, some test cases fail at random.
> > 3. Using qemu-system-aarch64 on x86_64, but with the kfence_allocation_key check removed from kfence_alloc(), all test cases pass.
> >
> > I added some printing to the kernel and got very strange results.
> > I added a new variable, kfence_allocation_key_gate, to track the
> > state of kfence_allocation_key. As shown in the following code, in theory,
> > if kfence_allocation_key_gate is zero, then kfence_allocation_key must be
> > enabled, so the value of the variable error in kfence_alloc() should always be
> > zero. All the passing test cases are consistent with this. But as shown in the
> > following failure log, although kfence_allocation_key has been enabled, the
> > check on it still fails here.
> >
> > So I think static keys might be problematic in my QEMU environment.
> > The timeout change is not itself the problem; it only made this problem observable.
> > I tried changing the wait_event to a loop: I set the timeout to HZ and disabled and
> > re-enabled the key on each iteration, and then the failing test cases disappeared.
>
> Nice analysis, thanks! What I gather is that static_keys/jump_labels
> are somehow broken in QEMU.
>
> This does remind me that I found a bug in QEMU that might be relevant:
> https://bugs.launchpad.net/qemu/+bug/1920934
> Looks like it was never fixed. :-/
>
> The failures I encountered caused the kernel to crash, but I never saw
> the kfence test fail because of that (I never managed to get that far).
> Though the bug I saw was in x86 TCG mode, and I never tried arm64. If

[ ... that is, I didn't try running QEMU-ASan in arm64 TCG mode ... of
course I use QEMU arm64 to test. ;-) ]

> you can, try to build a QEMU with ASan and see if you also get the
> same use-after-free bug.
>
> Unless we observe the problem on a real machine, I think for now we
> can conclude with fairly high confidence that QEMU TCG still has
> issues and cannot be fully trusted here (see bug above).
>
> Thanks,
> -- Marco

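The invariant Liu Shixin describes in the quoted text (if the tracking variable is zero, the key must read as enabled, so "error" stays zero) can be sketched as a small single-threaded userspace model. This is only an illustration under stated assumptions: the names model_key_enabled, model_key_gate, model_open(), model_close(), model_check() and model_run() are hypothetical, a plain atomic flag stands in for the kernel's static key, and none of this is the instrumentation that was actually used in the thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-ins with hypothetical names; this is not kernel code. */
static atomic_bool model_key_enabled;	/* stands in for kfence_allocation_key */
static atomic_int model_key_gate = 1;	/* stands in for kfence_allocation_key_gate; 0 means "key must be on" */

/* Enable the key first, then publish gate == 0. */
static void model_open(void)
{
	atomic_store(&model_key_enabled, true);
	atomic_store(&model_key_gate, 0);
}

/* Raise the gate first, then disable the key. */
static void model_close(void)
{
	atomic_store(&model_key_gate, 1);
	atomic_store(&model_key_enabled, false);
}

/*
 * Models the check on the allocation path: returns 1 ("error") when the
 * gate says the key must be enabled but the key reads as disabled.
 */
static int model_check(void)
{
	int gate = atomic_load(&model_key_gate);
	bool enabled = atomic_load(&model_key_enabled);

	return gate == 0 && !enabled;
}

/* Toggle the gate `rounds` times and count invariant violations. */
static int model_run(int rounds)
{
	int errors = 0;

	for (int i = 0; i < rounds; i++) {
		model_open();
		errors += model_check();
		model_close();
		errors += model_check();
	}
	return errors;
}
```

In this model model_run() never reports a violation, which matches what the passing test cases showed. A nonzero count in a comparable kernel check would suggest that the branch guarded by the key did not see the enabled state, which is the behaviour reported under TCG.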

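The workaround mentioned in the quoted text (wait at most HZ, then disable and re-enable the key on each iteration) can likewise be sketched as a deterministic userspace model. Everything here is an assumption made for illustration: MODEL_HZ, struct model_state and the model_* helpers are hypothetical, and the enables_needed field simulates an enable that does not take effect, which is only a guess at the kind of misbehaviour suspected in the thread, not a confirmed mechanism.

```c
#include <stdbool.h>

#define MODEL_HZ 100	/* hypothetical number of ticks in one timeout period */

struct model_state {
	bool key_enabled;	/* stands in for kfence_allocation_key */
	int gate;		/* nonzero once an allocation has been served */
	int enables_needed;	/* hypothetical: enable calls before the branch "takes" */
	int enables_seen;	/* enable calls made so far */
};

static void model_enable(struct model_state *s)
{
	s->key_enabled = true;
	s->enables_seen++;
}

static void model_disable(struct model_state *s)
{
	s->key_enabled = false;
}

/* One tick: the allocation path fires only if the enable actually took effect. */
static void model_tick(struct model_state *s)
{
	if (s->key_enabled && s->enables_seen >= s->enables_needed)
		s->gate = 1;
}

/* Bounded wait: true if the gate was set within `timeout` ticks. */
static bool model_wait_timeout(struct model_state *s, int timeout)
{
	for (int t = 0; t < timeout; t++) {
		model_tick(s);
		if (s->gate)
			return true;
	}
	return false;
}

/*
 * Wait at most MODEL_HZ ticks per round and toggle the key again on
 * timeout. Returns the round in which an allocation was served, or -1
 * if none was served within max_rounds.
 */
static int model_toggle_loop(struct model_state *s, int max_rounds)
{
	for (int round = 1; round <= max_rounds; round++) {
		model_enable(s);
		if (model_wait_timeout(s, MODEL_HZ)) {
			model_disable(s);
			return round;
		}
		model_disable(s);
	}
	return -1;
}
```

With an unbounded wait, a single enable that fails to take effect stalls the model forever. With the bounded wait and re-toggle, a later enable gets another chance, which is consistent with the failing test cases disappearing once the loop was introduced.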
Thread overview: 12+ messages
2021-04-21 10:51 [PATCH v2 0/3] kfence: optimize timer scheduling Marco Elver
2021-04-21 10:51 ` [PATCH v2 1/3] kfence: await for allocation using wait_event Marco Elver
2021-04-21 10:51 ` [PATCH v2 2/3] kfence: maximize allocation wait timeout duration Marco Elver
2021-09-16  1:02   ` Kefeng Wang
2021-09-16  1:20     ` Kefeng Wang
2021-09-16  8:49       ` Marco Elver
2021-09-18  8:07         ` Liu Shixin
2021-09-18  9:37           ` Marco Elver
2021-09-18  9:45             ` Marco Elver [this message]
2021-09-16 15:45       ` David Laight
2021-09-16 15:48         ` Marco Elver
2021-04-21 10:51 ` [PATCH v2 3/3] kfence: use power-efficient work queue to run delayed work Marco Elver
