linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
From: Thomas Gleixner <tglx@linutronix.de>
To: Marc Zyngier <maz@kernel.org>
Cc: mark.rutland@arm.com,
	Android Kernel Team <kernel-team@android.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Will Deacon <will@kernel.org>,
	Valentin Schneider <Valentin.Schneider@arm.com>,
	LAK <linux-arm-kernel@lists.infradead.org>
Subject: Re: [PATCH 0/2] arm64: Allow the rescheduling IPI to bypass irq_enter/exit
Date: Fri, 20 Nov 2020 15:17:12 +0100	[thread overview]
Message-ID: <87lfewnmdz.fsf@nanos.tec.linutronix.de> (raw)
In-Reply-To: <91cde5eeb22eb2926515dd27113c664a@kernel.org>

Marc,

On Fri, Nov 20 2020 at 09:20, Marc Zyngier wrote:
> On 2020-11-03 20:32, Thomas Gleixner wrote:
>> On Sun, Nov 01 2020 at 13:14, Marc Zyngier wrote:
>

> More seriously, it seems to me that we have a bit of a
> cross-architecture disconnect here. I have been trying to join the
> dots between what you describe above, and the behaviour of arm64 (and
> probably a large number of the non-x86 architectures), and I feel
> massively confused.
>
> Up to 5.9, our handling of the rescheduling IPI was "do as little as
> possible": decode the interrupt from the lowest possible level (the
> root irqchip), call into arch code, end-up in scheduler_ipi(), the end.
>
> No lockdep, no RCU, no nothing.
>
> What changed? Have we missed some radical change in the way the core
> kernel expects the arch code to do thing? I'm aware of the 
> kernel/entry/common.c stuff, which implements most of the goodies you
> mention, but this effectively is x86-only at the moment.
>
> If arm64 has forever been broken, I'd really like to know and fix it.

So with the pre 5.8 scheduler IPI we had scheduler_ipi() doing wonderful
things

scheduler_ipi()
       ...
       if (llist_empty(&this_rq()->wake_list) && !got_nohz_idle_kick())
               return;
       irq_enter();
       ...
       irq_exit();

When Peter and I started to investigate the correctness and robustness
of syscalls, interrupts and exceptions versus RCU, instrumentation,
breakpoints for live patching etc., we stumbled over this and a lot of
other things.

Especially instrumentation had grown warts over time to deal with the
problem that it can hit into a RCU idle section. That started to get
worse which made us look deeper and the decision was to have strict
non-instrumentable sections which call out into instrumentable sections
_after_ establishing state. That moved all of the context tracking, RCU,
tracing muck out into C code which is what we have in kernel/entry/ now.

Back to the scheduler IPI. 5.8 made the scheduler IPI an almost empty
function which just folds need resched on architectures which need that.

Of course when I wrote the last reply I had the current x86
implementation in mind which does:

DEFINE_IDTENTRY_SYSVEC_SIMPLE(sysvec_reschedule_ipi)
{
        ack_APIC_irq();
        trace_reschedule_entry(RESCHEDULE_VECTOR);
        inc_irq_stat(irq_resched_count);
        scheduler_ipi();
        trace_reschedule_exit(RESCHEDULE_VECTOR);
}

ack_APIC_irq() can be traced and the tracepoints of course want a proper
established context too.

Now you surely could do:

ASM-IPI-enter:
       asm_ack_irq();
       asm_fold_need_resched();
       reti();

But that's not enough, because the point of the scheduler_ipi is
unsurprisingly to reevaluate need_resched() and to schedule. So you get:

ASM-IPI-enter:
       asm_ack_irq();
       asm_fold_need_resched()
       if (!asm_need_resched())
            reti();

       /* context tracking, rcu, .... */
       handle_enter_from_user_or_kernel();

       preempt_schedule_irq();

       /* context tracking, rcu, .... */
       handle_exit_to_user_or_kernel();
       reti();

We did not do that because we have the tracepoints there and I couldn't
find a reason why having yet another slightly different codepath to get
straight vs. context tracking and RCU was desirable. So we ended up
with:

ASM-IPI-enter:
  asm_enter();

    irqentry_enter() {
      handle_enter_from_user_or_kernel();

      /* Start of safe and instrumentable section */
    }

    __irq_enter_raw();

    sysvec_scheduler_ipi();

    __irq_exit_raw();

    irqentry_exit() {
      if (need_resched())
        preempt_schedule_irq();

      /* End of safe and instrumentable section */

      handle_exit_to_user_or_kernel();
    }

  asm_exit();

The only difference to other IPIs is that it avoids the accounting
overhead because accounting the almost empty scheduler_ipi() function is
pretty much pointless.

Hope that clarifies it.

> If arm64 has forever been broken, I'd really like to know and fix it.

Define broken. It kinda works with all the warts and bolts in tracing,
context tracking and other places. Is it entirely correct? Probably not.

The scheduler IPI is trivial compared to the more interesting ones like
NMI, MCE, breakpoint, debug exceptions etc. We found quite some
interesting issues with all of that muck during our 'noinstr' chase.

Thanks,

        tglx

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2020-11-20 14:18 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-01 13:14 [PATCH 0/2] arm64: Allow the rescheduling IPI to bypass irq_enter/exit Marc Zyngier
2020-11-01 13:14 ` [PATCH 1/2] genirq: Allow an interrupt to be marked as 'naked' Marc Zyngier
2020-11-01 14:33   ` David Laight
2020-11-01 13:14 ` [PATCH 2/2] arm64: Mark the recheduling IPI as naked interrupt Marc Zyngier
2020-11-01 14:30 ` [PATCH 0/2] arm64: Allow the rescheduling IPI to bypass irq_enter/exit David Laight
2020-11-02 10:30 ` Valentin Schneider
2020-11-10 13:03   ` Peter Zijlstra
     [not found]     ` <19286daf276f46aa@fake-msgid>
2020-11-10 15:48       ` Valentin Schneider
2020-11-03 20:32 ` Thomas Gleixner
2020-11-20  9:20   ` Marc Zyngier
2020-11-20 14:17     ` Thomas Gleixner [this message]
2020-11-22 16:13       ` Marc Zyngier

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87lfewnmdz.fsf@nanos.tec.linutronix.de \
    --to=tglx@linutronix.de \
    --cc=Valentin.Schneider@arm.com \
    --cc=catalin.marinas@arm.com \
    --cc=kernel-team@android.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=maz@kernel.org \
    --cc=peterz@infradead.org \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).