All of lore.kernel.org
 help / color / mirror / Atom feed
From: Don Zickus <dzickus@redhat.com>
To: Jeffrey Merkey <jeffmerkey@gmail.com>
Cc: linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	atomlin@redhat.com, cmetcalf@ezchip.com, fweisbec@gmail.com,
	hidehiro.kawai.ez@hitachi.com, mhocko@suse.cz, tj@kernel.org,
	uobergfe@redhat.com
Subject: Re: [PATCH v5 3/3] Add BUG_XX() debugging hard/soft lockup detection
Date: Wed, 3 Feb 2016 10:45:15 -0500	[thread overview]
Message-ID: <20160203154515.GP26637@redhat.com> (raw)
In-Reply-To: <CA+ekxPXUNBvS9QjKWLGo_0zXJ=-fbC_iry5Jcv4HVyE1hpRbsw@mail.gmail.com>

On Tue, Feb 02, 2016 at 03:40:34PM -0700, Jeffrey Merkey wrote:
> >
> > Please remember to add version history, so I can tell what changed.
> >
> 
> What command do I give to git when it creates the patch from git
> format-patch that outputs what you are looking for or do I have to add
> that manually.  The diff of files changed?
> 
> >
> > I am not sure I am a fan of this.  You are taking a known macro BUG_ON with
> > known expectations and perversely converting it into an 'asm'.  So now when
> > folks read the code they scratch their heads why we are dumping the stack
> > twice when in fact we are not. It seems misleading. :-/
> >
> 
> 1.  Does not dump the stack at all the way it is coded -- look again.
> The current code dumps it only once.  Just executes an int3 and
> returns instead of crashing.  If you called panic all the time instead
> of conditionally in this code, this change would not be needed, since
> panic is setup already to call debuggers.  It's the failure of the
> current code to do that requires this change.  How about you call
> panic when this condition ocurrs, then the debugger will get called.
> 
> 2.  BUG() outputs an asm("ud2") and triggers an invalid instruction
> and system crash.  All that was added is the ability to switch that
> ud2 to an int3.    So what is more perverse here:
> 
> BUG() = ud2 -> invalid instruction -> trap  -> call crash code ->
> debugger -> then hang
> BUG() = int3 -> int3 trap -> enter debugger -> return - system can recover

Yes, I understand that.  I was referring to reading the code.  With your
patch I read the code and see BUG() and think the machine will panic right
there, when it fact it does not because of the tricks you play.

> 
> Because:
> 
> BUG() = Debugger = int3
> and
> BUG() != ud2 (undefined instruction) = crash = non recoverable
> 
> int3 (0xCC) has always been understood to mean BUG().  int3
> breakpoints are an integral part of Intel's architecture.  There is no
> reason for not exploiting this capability of their processors to help
> kernel developers use intel technology better.
> 
> > I still don't understand why we can't use Ingo's or tglx's approach?  Your
> > changelog doesn't point out the problems there.
> >
> 
> Because when you catch a bug in the hard lockup detector the system
> just sits there hard hung and you are not able to get into a debugger
> console since the system has crashed and the watchdog code has already
> killed off the other processors and locked up all the NMI interrupt
> handlers, thereby preventing any debugger at all from functioning
> other than a hardware ice, so it's a hell of a lot easier just to
> trigger a break when you detect the first instance of a hard lockup
> before the system is completely hosed.

Hmm, I am confused here.  So you are saying because we are in the nmi
handler you can not break into the system?  The nmi handler prints some
stuff to the screen, pokes the other cpus to print stuff to the screen and
then returns to a normal operation.  Unless you are saying the act of
sending NMI IPIs never completes (because a cpu is blocking IPI interrupts),
so the cpu hangs in nmi context and the debugger never has a chance to
'break' in and see what is going on?

Cheers,
Don

  parent reply	other threads:[~2016-02-03 15:45 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-02-02  2:33 [PATCH v5 1/3] Add BUG_XX() debugging options Jeffrey Merkey
2016-02-02  2:33 ` [PATCH v5 2/3] Add BUG_XX() debugging options Kconfig.debug Jeffrey Merkey
2016-02-02  2:33 ` [PATCH v5 3/3] Add BUG_XX() debugging hard/soft lockup detection Jeffrey Merkey
2016-02-02 17:30   ` Don Zickus
2016-02-02 22:40     ` Jeffrey Merkey
2016-02-03  4:17       ` Jeffrey Merkey
2016-02-03  4:39         ` Jeffrey Merkey
2016-02-03 15:45       ` Don Zickus [this message]
2016-02-03 17:23         ` Jeffrey Merkey
2016-02-03 20:14           ` Don Zickus
2016-02-03 20:18             ` Jeffrey Merkey
2016-02-04  2:48               ` Jeffrey Merkey
2016-02-03 15:47       ` Don Zickus
2016-02-03 17:26         ` Jeffrey Merkey

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160203154515.GP26637@redhat.com \
    --to=dzickus@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=atomlin@redhat.com \
    --cc=cmetcalf@ezchip.com \
    --cc=fweisbec@gmail.com \
    --cc=hidehiro.kawai.ez@hitachi.com \
    --cc=jeffmerkey@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mhocko@suse.cz \
    --cc=tj@kernel.org \
    --cc=uobergfe@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.