All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org, Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ming Lei <minlei@redhat.com>, Juri Lelli <juri.lelli@redhat.com>
Subject: Re: [PATCH] x86/vector: Allow to free vector for managed IRQ
Date: Sat, 14 Mar 2020 19:53:56 -0400	[thread overview]
Message-ID: <20200314235356.GD95517@xz-x1> (raw)
In-Reply-To: <87v9n7erq3.fsf@nanos.tec.linutronix.de>

On Sat, Mar 14, 2020 at 12:54:12AM +0100, Thomas Gleixner wrote:
> Peter Xu <peterx@redhat.com> writes:
> > On Fri, Mar 13, 2020 at 03:24:08PM +0100, Thomas Gleixner wrote:
> >> Peter Xu <peterx@redhat.com> writes:
> >> What is this backtrace for? It's completly useless as it merily shows
> >> that the warning triggers. Also even if it'd be useful then it wants to
> >> be trimmed properly.
> >
> > I thought it was a good habit to keep the facts of issues.  Backtrace
> > is one of them so I kept them.  It could, for example, help people who
> > spot the same issue in an old/downstream kernel so when they google or
> > grepping git-log they know the exact issue has been solved by some
> > commit, even without much knowledge on the internals (because they can
> > exactly compare the whole dmesg error).
> 
> This is not really good habit. Changelogs should contain factual
> information which is relevant to understand the problem. Backtraces are
> useful when the callchain leading to a bug/warn/oops _is_ relevant for
> understanding the issue. For things like this where the backtrace is
> completly out of context it's more of an distraction than of value.
> 
> Aside of that your 'spot the same issue' argument is wrong in this
> context as this does not affect anything old/downstream. The feature was
> merged during the 5.6 merge window. So it has not reached anything
> downstream which needs backports. If distro people pick out such a
> feature and backport it to their frankenkernels before the final release
> then I really dont care.
> 
> One way to preserve the backtrace for google's sake is to write a cover
> letter and stick the back trace there if it is not useful in the
> changelog.
> 
> Also having 40+ lines of untrimmed backtrace is just a horrible
> distraction. The timestamps are completely useless and depending on the
> type of problem most of the other gunk which is emitted by a
> bug/warn/oops backtrace is useless as well.
> 
> Why would you be interested in the register values for this problem or
> the code or the interrupt flags tracer state? That thing triggered a
> warning and nothing in the backtrace gives any clue about why that
> happened, IOW it's pure distraction.
> 
> If the call chain is relevant to explain the problem, then a trimmed
> down version is really sufficient. Here is an example:
> 
>      BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
>      RIP: 0010:apic_ack_edge+0x1e/0x40
>      Call Trace:
>        handle_edge_irq+0x7d/0x1e0
>        generic_handle_irq+0x27/0x30
>        aer_inject_write+0x53a/0x720
> 
> This helps, because it illustrates exactly how this BUG was triggered
> and it's reduced to exactly the 6 lines because everything else in the
> original 50+ lines is useless.

Fair enough.

> 
> > However I think I still miss one thing in the puzzle (although it
> > turns out that we've agreed on removing the warning already, but just
> > in case I missed something important) - do you mean that offlining all
> > the non-isolated CPUs in the mask won't trigger this already?  Because
> > I also saw some similar comment somewhere else...
> >
> > Here's my understanding - when offlining, we'll disable the CPU and
> > reach:
> >
> >   - irq_migrate_all_off_this_cpu
> >     - migrate_one_irq
> >       - irq_do_set_affinity
> >         - calculate HK_FLAG_MANAGED_IRQ and so on...
> >
> > Then we can still trigger this irq move event even before we bring
> > another housekeeping cpu online, right?  Or could you guide me on what
> > I have missed?
> 
> The migration when offlining the CPU to which an interrupt is affine
> does not trigger this because that uses a different mechanism.
> 
> It clears the vector on the outgoing CPU brute force simply because this
> CPU can't handle any interrupts anymore. There is some logic to catch
> the device interrupt racing against this, but while careful it's not
> perfect. There is a theoretical hole there which probably could be
> triggered by carefully orchestrating things, but we can't do anything
> about it except disabling CPU unplug :)

Ah I haven't thought about the "theoretical hole" and that far...  I
think I was only focusing on whether set_affinity() would happen, but
I ignored the fact that x86 __send_cleanup_vector() treated offlining
CPU in the special way to avoid sending IRQ_MOVE_CLEANUP_VECTOR at
all.  And that part makes perfect sense since we can't do much with an
offlined core.

> 
> So the only two ways to trigger this are the ones I described in the
> changelog.
> 
> Hope that helps.

Definitely.

Thanks for writting this up, Thomas!

-- 
Peter Xu


  reply	other threads:[~2020-03-15  1:55 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-12 20:58 [PATCH] x86/vector: Allow to free vector for managed IRQ Peter Xu
2020-03-13  3:13 ` Ming Lei
2020-03-13 12:32   ` Peter Xu
2020-03-13 14:24 ` Thomas Gleixner
2020-03-13 15:19   ` Peter Xu
2020-03-13 23:54     ` Thomas Gleixner
2020-03-14 23:53       ` Peter Xu [this message]
2020-03-14  2:04   ` Ming Lei
2020-03-13 14:31 ` [tip: x86/urgent] x86/vector: Remove warning on managed interrupt migration tip-bot2 for Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200314235356.GD95517@xz-x1 \
    --to=peterx@redhat.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=minlei@redhat.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.