From: Cristian Marussi <cristian.marussi@arm.com> To: linux-kernel@vger.kernel.org Cc: linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org, dave.martin@arm.com, james.morse@arm.com, mark.rutland@arm.com, catalin.marinas@arm.com, will@kernel.org, tglx@linutronix.de, peterz@infradead.org, takahiro.akashi@linaro.org, hidehiro.kawai.ez@hitachi.com Subject: [RFC PATCH 4/7] smp: address races of starting CPUs while stopping Date: Fri, 23 Aug 2019 12:57:17 +0100 [thread overview] Message-ID: <20190823115720.605-5-cristian.marussi@arm.com> (raw) In-Reply-To: <20190823115720.605-1-cristian.marussi@arm.com> Add to SMP stop common code a best-effort retry logic, re-issuing a stop request when any CPU is detected to be still online after the first stop request cycle has completed. Address the case in which some CPUs could be still starting up when the stop process is started, remaining so undetected and coming fully online only after the SMP stop procedure was already started: such officially still offline CPUs would be missed by an ongoing stop procedure. (not considering here the special case of stuck CPUs) Using here a best effort approach, though, it is not anyway guaranteed to be able to stop any CPU that happened to finally come online after the whole SMP stop retry cycle has completed. (i.e. the race-window is reduced but not eliminated) Signed-off-by: Cristian Marussi <cristian.marussi@arm.com> --- A more deterministic approach has been also attempted in order to catch any concurrently starting CPUs at the very last moment and make them kill themselves by: - adding a new set_cpus_stopping() in cpumask.h used to set a __cpus_stopping atomic internal flag - modifying set_cpu_online() to check on __cpus_stopping only when coming online, and force the offending CPU to kill itself in that case Anyway it has proved tricky and complex (beside faulty) to implement the above 'kill-myself' phase in a reliable way while remaining architecture agnostic and still distingushing properly regular stops from crash kexec. So given that the main idea underlying this patch series was instead of simplifying and unifying code and the residual races not caught by the best-effort logic seemed not very likely, this more deterministic approach has been dropped in favour of the best effort retry logic. --- kernel/smp.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/kernel/smp.c b/kernel/smp.c index ea8a1cc0ec3e..10d3120494f2 100644 --- a/kernel/smp.c +++ b/kernel/smp.c @@ -847,6 +847,8 @@ void __weak arch_smp_crash_call(cpumask_t *cpus) #define REASON_STOP 1 #define REASON_CRASH 2 +#define MAX_STOP_RETRIES 2 + /* * This centralizes the common logic to: * @@ -860,7 +862,7 @@ void __weak arch_smp_crash_call(cpumask_t *cpus) */ static inline void __smp_send_stop_all(bool reason_crash) { - unsigned int this_cpu_id; + unsigned int this_cpu_id, retries = MAX_STOP_RETRIES; cpumask_t mask; static atomic_t stopping; int was_reason; @@ -894,7 +896,7 @@ static inline void __smp_send_stop_all(bool reason_crash) arch_smp_cpu_park(); } } - if (any_other_cpus_online(&mask, this_cpu_id)) { + while (retries-- && any_other_cpus_online(&mask, this_cpu_id)) { unsigned long timeout; unsigned int this_cpu_online = cpu_online(this_cpu_id); @@ -921,9 +923,12 @@ static inline void __smp_send_stop_all(bool reason_crash) udelay(1); /* ensure any stopping-CPUs memory access is made visible */ smp_rmb(); - if (num_online_cpus() > this_cpu_online) + if (num_online_cpus() > this_cpu_online) { pr_warn("failed to stop secondary CPUs %*pbl\n", cpumask_pr_args(cpu_online_mask)); + if (retries) + pr_warn("Retrying SMP stop call...\n"); + } } /* Perform final (possibly arch-specific) work on this CPU */ arch_smp_cpus_stop_complete(); -- 2.17.1
WARNING: multiple messages have this Message-ID (diff)
From: Cristian Marussi <cristian.marussi@arm.com> To: linux-kernel@vger.kernel.org Cc: linux-arch@vger.kernel.org, mark.rutland@arm.com, peterz@infradead.org, catalin.marinas@arm.com, takahiro.akashi@linaro.org, james.morse@arm.com, hidehiro.kawai.ez@hitachi.com, tglx@linutronix.de, will@kernel.org, dave.martin@arm.com, linux-arm-kernel@lists.infradead.org Subject: [RFC PATCH 4/7] smp: address races of starting CPUs while stopping Date: Fri, 23 Aug 2019 12:57:17 +0100 [thread overview] Message-ID: <20190823115720.605-5-cristian.marussi@arm.com> (raw) In-Reply-To: <20190823115720.605-1-cristian.marussi@arm.com> Add to SMP stop common code a best-effort retry logic, re-issuing a stop request when any CPU is detected to be still online after the first stop request cycle has completed. Address the case in which some CPUs could be still starting up when the stop process is started, remaining so undetected and coming fully online only after the SMP stop procedure was already started: such officially still offline CPUs would be missed by an ongoing stop procedure. (not considering here the special case of stuck CPUs) Using here a best effort approach, though, it is not anyway guaranteed to be able to stop any CPU that happened to finally come online after the whole SMP stop retry cycle has completed. (i.e. the race-window is reduced but not eliminated) Signed-off-by: Cristian Marussi <cristian.marussi@arm.com> --- A more deterministic approach has been also attempted in order to catch any concurrently starting CPUs at the very last moment and make them kill themselves by: - adding a new set_cpus_stopping() in cpumask.h used to set a __cpus_stopping atomic internal flag - modifying set_cpu_online() to check on __cpus_stopping only when coming online, and force the offending CPU to kill itself in that case Anyway it has proved tricky and complex (beside faulty) to implement the above 'kill-myself' phase in a reliable way while remaining architecture agnostic and still distingushing properly regular stops from crash kexec. So given that the main idea underlying this patch series was instead of simplifying and unifying code and the residual races not caught by the best-effort logic seemed not very likely, this more deterministic approach has been dropped in favour of the best effort retry logic. --- kernel/smp.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/kernel/smp.c b/kernel/smp.c index ea8a1cc0ec3e..10d3120494f2 100644 --- a/kernel/smp.c +++ b/kernel/smp.c @@ -847,6 +847,8 @@ void __weak arch_smp_crash_call(cpumask_t *cpus) #define REASON_STOP 1 #define REASON_CRASH 2 +#define MAX_STOP_RETRIES 2 + /* * This centralizes the common logic to: * @@ -860,7 +862,7 @@ void __weak arch_smp_crash_call(cpumask_t *cpus) */ static inline void __smp_send_stop_all(bool reason_crash) { - unsigned int this_cpu_id; + unsigned int this_cpu_id, retries = MAX_STOP_RETRIES; cpumask_t mask; static atomic_t stopping; int was_reason; @@ -894,7 +896,7 @@ static inline void __smp_send_stop_all(bool reason_crash) arch_smp_cpu_park(); } } - if (any_other_cpus_online(&mask, this_cpu_id)) { + while (retries-- && any_other_cpus_online(&mask, this_cpu_id)) { unsigned long timeout; unsigned int this_cpu_online = cpu_online(this_cpu_id); @@ -921,9 +923,12 @@ static inline void __smp_send_stop_all(bool reason_crash) udelay(1); /* ensure any stopping-CPUs memory access is made visible */ smp_rmb(); - if (num_online_cpus() > this_cpu_online) + if (num_online_cpus() > this_cpu_online) { pr_warn("failed to stop secondary CPUs %*pbl\n", cpumask_pr_args(cpu_online_mask)); + if (retries) + pr_warn("Retrying SMP stop call...\n"); + } } /* Perform final (possibly arch-specific) work on this CPU */ arch_smp_cpus_stop_complete(); -- 2.17.1 _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
next prev parent reply other threads:[~2019-08-23 11:58 UTC|newest] Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top 2019-08-23 11:57 [RFC PATCH 0/7] Unify SMP stop generic logic to common code Cristian Marussi 2019-08-23 11:57 ` Cristian Marussi 2019-08-23 11:57 ` [RFC PATCH 1/7] smp: add generic SMP-stop support " Cristian Marussi 2019-08-23 11:57 ` Cristian Marussi 2019-08-23 11:57 ` [RFC PATCH 2/7] smp: unify crash_ and smp_send_stop() logic Cristian Marussi 2019-08-23 11:57 ` Cristian Marussi 2019-08-23 11:57 ` [RFC PATCH 3/7] smp: coordinate concurrent crash/smp stop calls Cristian Marussi 2019-08-23 11:57 ` Cristian Marussi 2019-08-23 11:57 ` Cristian Marussi [this message] 2019-08-23 11:57 ` [RFC PATCH 4/7] smp: address races of starting CPUs while stopping Cristian Marussi 2019-08-23 11:57 ` [RFC PATCH 5/7] arm64: smp: use generic SMP stop common code Cristian Marussi 2019-08-23 11:57 ` Cristian Marussi 2019-08-26 15:32 ` Christoph Hellwig 2019-08-26 15:32 ` Christoph Hellwig 2019-08-26 19:58 ` Cristian Marussi 2019-08-26 19:58 ` Cristian Marussi 2019-08-26 22:26 ` Thomas Gleixner 2019-08-26 22:26 ` Thomas Gleixner 2019-08-27 14:34 ` Cristian Marussi 2019-08-27 14:34 ` Cristian Marussi 2019-08-23 11:57 ` [RFC PATCH 6/7] arm64: smp: use SMP crash-stop " Cristian Marussi 2019-08-23 11:57 ` Cristian Marussi 2019-08-23 11:57 ` [RFC PATCH 7/7] arm64: smp: add arch specific cpu parking helper Cristian Marussi 2019-08-23 11:57 ` Cristian Marussi 2019-08-26 15:34 ` [RFC PATCH 0/7] Unify SMP stop generic logic to common code Christoph Hellwig 2019-08-26 15:34 ` Christoph Hellwig 2019-08-26 19:33 ` Cristian Marussi 2019-08-26 19:33 ` Cristian Marussi
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20190823115720.605-5-cristian.marussi@arm.com \ --to=cristian.marussi@arm.com \ --cc=catalin.marinas@arm.com \ --cc=dave.martin@arm.com \ --cc=hidehiro.kawai.ez@hitachi.com \ --cc=james.morse@arm.com \ --cc=linux-arch@vger.kernel.org \ --cc=linux-arm-kernel@lists.infradead.org \ --cc=linux-kernel@vger.kernel.org \ --cc=mark.rutland@arm.com \ --cc=peterz@infradead.org \ --cc=takahiro.akashi@linaro.org \ --cc=tglx@linutronix.de \ --cc=will@kernel.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.