From: Mark Rutland <mark.rutland@arm.com>
To: Marco Elver <elver@google.com>
Cc: paulmck@kernel.org, peterz@infradead.org,
	catalin.marinas@arm.com, james.morse@arm.com,
	linux-arm-kernel@lists.infradead.org, will@kernel.org,
	dvyukov@google.com
Subject: Re: [PATCH 00/11] arm64: entry lockdep/rcu/tracing fixes
Date: Mon, 30 Nov 2020 16:54:27 +0000
Message-ID: <20201130165427.GD1251@C02TD0UTHF1T.local>
In-Reply-To: <20201130133245.GA1307615@elver.google.com>

On Mon, Nov 30, 2020 at 02:32:45PM +0100, Marco Elver wrote:
> On Mon, Nov 30, 2020 at 12:38PM +0000, Mark Rutland wrote:
> > On Mon, Nov 30, 2020 at 01:03:05PM +0100, Marco Elver wrote:

> > > So, I was hoping that this would fix all the problems I was seeing when
> > > running the ftrace tests ... unfortunately, it didn't. :-( Perhaps the
> > > WIP version you had only worked because it ended up disabling lockdep
> > > early?
> > 
> > Possibly, yes. Either that or the way we do / do-not treat debug
> > exceptions as true NMIs. Either way this appears to be a latent issue
> > rather than something introduced by this series.
> > 
> > From the log below I see you're using:
> > 
> >   5.10.0-rc4-next-20201119-00002-gc88aca8827ce #1 Not tainted
> > 
> > ... and it's possible that the issue you're seeing now is a delta
> > between v5.10-rc3 and what's queued in linux-next -- I've been running
> > the ftrace tests locally without issue atop v5.10-rc3 and v5.10-rc5.
> > 
> > Are you able to reproduce this on my branch alone? If so that gives us a
> > stable tree to investigate, and if not that gives us a stable base for a
> > bisect against linux-next.
> 
> It's the same problem as before and that I've been reporting in the
> other thread [1]. We know mainline is fine, however, -next is broken. We
> also know that next-20201105 was still fine, and next-202010 started
> breaking:
> 
> 	https://lkml.kernel.org/r/20201111133813.GA81547@elver.google.com
> 
> The recent tests have been on next-20201119 (including the logs from
> previous email).
>
> I tried bisection, but results are never conclusive (the closest I got
> was a -rcu merge commit). As discussed in the thread at [1] (and its
> ancestors) we never really got anywhere and really exhausted all options
> (several bisection attempts, etc.).

Ah; I'd lost track and missed that you'd already identified this was
introduced in linux-next, and that bisection wasn't getting anywhere.
Thanks for bearing with me! :)

> > This area is really sensitive to config options, so if you can reproduce
> > this on a stable base, could you share your exact config?
> 
> No, it's not reproducible on mainline.
> 
> Which might also mean that it's something else in -next and your work is
> unrelated.
> 
> But I was surprised your WIP series fixed the problems on next-20201119
> (or so it seemed). So, given all the confusion in [1], I was really
> hoping this would be it...

The major difference between that and the version upstreamed is the way
debug exceptions (including BRKs) got handled as true NMIs, which hints
that there could be a subtle interaction in that area (or that the
lockdep disable calls in the NMI paths simply masked the problem).

One simple thing to try would be to hack the debug exception cases to
enter/exit as true NMIs and see whether that hides the issue again. If
so, we can start teasing that apart to narrow it down.

> > > I've attached the log and the symbolized report.
> > 
> > Thanks for all this. I'll see if I can tickle this locally while waiting
> > for the above. If you could share your config from this time around
> > that'd be a great head-start!
> 
> It's the same as I've been using for the work in
> 
> 	[1] https://lore.kernel.org/r/20201119193819.GA2601289@elver.google.com
> 
> In summary, to repro:
> 
> 	1. Switch to next-20201119 (possibly even latest, but I haven't tested)
> 
> 	2. Apply provoke-bug.diff
> 
> 	3. Use the attached .config
> 
> 	4. Run with 
> 
> 	   qemu-system-aarch64 -kernel $KERNEL_WORKTREE/arch/arm64/boot/Image \
> 		-append "console=ttyAMA0 root=/dev/sda debug earlycon earlyprintk=serial workqueue.watchdog_thresh=10" \
> 		-nographic -smp 1 -machine virt -cpu cortex-a57 -m 2G

Thanks for the comprehensive repro information!

I note that you're using QEMU in TCG mode, whereas I've been testing
with KVM acceleration. Those differ in speed by orders of magnitude, so
I wonder if the stalls you see are down to TCG simply being slow, and my
patches just happened to shuffle where that slowness was felt.
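
To compare the two modes on an equal footing, something like the sketch
below could be used. It only prints the two command lines, which differ
only in accelerator and CPU model. The qemu_cmd helper name, the explicit
-accel option, and the use of -cpu host under KVM are assumptions for
illustration and are not taken from the repro steps above; the remaining
arguments are copied from them.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the QEMU command line for one
# accelerator. KERNEL_WORKTREE comes from the repro steps; it falls back
# to "." here only so the sketch works when the variable is unset.
qemu_cmd() {
	accel="$1"		# "tcg" or "kvm"
	cpu="cortex-a57"	# CPU model used in the original repro (TCG)
	# Assumption: under KVM the guest generally needs the host CPU model.
	[ "$accel" = "kvm" ] && cpu="host"
	printf '%s\n' "qemu-system-aarch64 -kernel ${KERNEL_WORKTREE:-.}/arch/arm64/boot/Image -append \"console=ttyAMA0 root=/dev/sda debug earlycon earlyprintk=serial workqueue.watchdog_thresh=10\" -nographic -smp 1 -machine virt -accel $accel -cpu $cpu -m 2G"
}

qemu_cmd tcg
qemu_cmd kvm
```

Timing the same ftrace test run under each printed command would give a
rough idea of whether the stalls track emulation speed.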

I gave the above a go, but I wasn't able to reproduce the issue under
either TCG or KVM acceleration after a few attempts. I'm not sure
whether this is intermittent and I'm just getting lucky, or if something
is different between our setups that's causing me to not hit this.

FWIW I'm testing on a ThunderX2 workstation running Debian 10.6, using
the packaged GCC 8.3.0-6, and a locally-built QEMU 5.1.50
(v5.1.0-2347-g1f3081f6de). The QEMU has a couple of test patches atop
upstream commit ba2a9a9e6318bfd93a2306dec40137e198205b86.

> The tests I ran on your WIP series and just now were applied on top of
> next-20201119+provoke-bug.diff. Your WIP series seemed to fix whatever
> it was we were debugging in [1] (but with some new warnings), but this
> latest series shows no difference and behaviour is unchanged again.
> 
> I also want to emphasize it is really hard to say if your series here is
> related or the fact that the WIP series worked was some other
> side-effect we don't understand.

Sure; I think we're aligned on that understanding. There are a
sufficient number of moving parts here that the WIP might have been
masking a problem, or might have unintentionally solved a problem we
haven't realised exists.

> So I leave it to your judgement to decide to what extent this series
> could possibly help, because I wouldn't want to make you go down a
> rabbit hole that doesn't lead anywhere (as I had already done to
> somehow debug the problem in [1]).

I think as you say it's not at all clear, but I'd hope this series at
least removes a number of potential problems from the search space.

Thanks,
Mark.