From: Andrea Parri <parri.andrea@gmail.com> To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: paulmck@kernel.org, palmer@dabbelt.com, paul.walmsley@sifive.com, aou@eecs.berkeley.edu, mmaas@google.com, hboehm@google.com, striker@us.ibm.com, charlie@rivosinc.com, rehn@rivosinc.com, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH 2/2] membarrier: riscv: Provide core serializing command Date: Tue, 28 Nov 2023 16:13:56 +0100 [thread overview] Message-ID: <ZWYDtB/otYvTMZWd@andrea> (raw) In-Reply-To: <91ab0210-07f9-42c4-af7f-a98799250cf7@efficios.com> > I am concerned about the possibility that this change lacks two barriers in the > following scenario: > > On a transition from uthread -> uthread on [CPU 0], from a thread belonging to > another mm to a thread belonging to the mm [!mm -> mm] for which a concurrent > membarrier sync-core is done on [CPU 1]: > > - [CPU 1] sets all bits in the mm icache_stale_mask [A]. There are no barriers > associated with these stores. > > - [CPU 0] store to rq->curr [B] (by the scheduler) vs [CPU 1] loads rq->curr [C] > within membarrier to decide if the IPI should be skipped. Let's say CPU 1 observes > cpu_rq(0)->curr->mm != mm, so it skips the IPI. > > - This means membarrier relies on switch_mm() to issue the sync-core. > > - [CPU 0] switch_mm() loads [D] the icache_stale_mask. If the bit is zero, switch_mm() > may incorrectly skip the sync-core. > > AFAIU, [C] can be reordered before [A] because there is no barrier between those > operations within membarrier. I suspect it can cause the switch_mm() code to skip > a needed sync-core. > > AFAIU, [D] can be reordered before [B] because there is no documented barrier > between those operations within the scheduler, which can also cause switch_mm() > to skip a needed sync-core. > > We possibly have a similar scenario for uthread->uthread when the scheduler > switches between mm -> !mm. > > One way to fix this would be to add the following barriers: > > - A smp_mb() between [A] and [C], possibly just after cpumask_setall() in > prepare_sync_core_cmd(), with comments detailing the ordering it guarantees, > - A smp_mb() between [B] and [D], possibly just before cpumask_test_cpu() in > flush_icache_deferred(), with appropriate comments. > > Am I missing something ? Thanks for the detailed analysis. AFAIU, the following barrier (in membarrier_private_expedited()) /* * Matches memory barriers around rq->curr modification in * scheduler. */ smp_mb(); /* system call entry is not a mb. */ can serve the purpose of ordering [A] before [C] (to be documented in v2). But I agree that [B] and [D] are unordered /missing suitable synchronization. Worse, RISC-V has currently no full barrier after [B] and before returning to user-space: I'm thinking (inspired by the PowerPC implementation), diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c index 217fd4de61342..f63222513076d 100644 --- a/arch/riscv/mm/context.c +++ b/arch/riscv/mm/context.c @@ -323,6 +323,23 @@ void switch_mm(struct mm_struct *prev, struct mm_struct *next, if (unlikely(prev == next)) return; +#if defined(CONFIG_MEMBARRIER) && defined(CONFIG_SMP) + /* + * The membarrier system call requires a full memory barrier + * after storing to rq->curr, before going back to user-space. + * + * Only need the full barrier when switching between processes: + * barrier when switching from kernel to userspace is not + * required here, given that it is implied by mmdrop(); barrier + * when switching from userspace to kernel is not needed after + * store to rq->curr. + */ + if (unlikely(atomic_read(&next->membarrier_state) & + (MEMBARRIER_STATE_PRIVATE_EXPEDITED | + MEMBARRIER_STATE_GLOBAL_EXPEDITED)) && prev) + smp_mb(); +#endif + /* * Mark the current MM context as inactive, and the next as * active. This is at least used by the icache flushing diff --git a/kernel/sched/core.c b/kernel/sched/core.c index a708d225c28e8..a1c749fddd095 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -6670,8 +6670,9 @@ static void __sched notrace __schedule(unsigned int sched_mode) * * Here are the schemes providing that barrier on the * various architectures: - * - mm ? switch_mm() : mmdrop() for x86, s390, sparc, PowerPC. - * switch_mm() rely on membarrier_arch_switch_mm() on PowerPC. + * - mm ? switch_mm() : mmdrop() for x86, s390, sparc, PowerPC, + * RISC-V. switch_mm() relies on membarrier_arch_switch_mm() + * on PowerPC. * - finish_lock_switch() for weakly-ordered * architectures where spin_unlock is a full barrier, * - switch_to() for arm64 (weakly-ordered, spin_unlock The silver lining is that similar changes (probably as a separate/preliminary patch) also restore the desired order between [B] and [D] AFAIU; so with them, 2/2 would just need additions to document the above SYNC_CORE scenario. Thoughts? Andrea
WARNING: multiple messages have this Message-ID (diff)
From: Andrea Parri <parri.andrea@gmail.com> To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: paulmck@kernel.org, palmer@dabbelt.com, paul.walmsley@sifive.com, aou@eecs.berkeley.edu, mmaas@google.com, hboehm@google.com, striker@us.ibm.com, charlie@rivosinc.com, rehn@rivosinc.com, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH 2/2] membarrier: riscv: Provide core serializing command Date: Tue, 28 Nov 2023 16:13:56 +0100 [thread overview] Message-ID: <ZWYDtB/otYvTMZWd@andrea> (raw) In-Reply-To: <91ab0210-07f9-42c4-af7f-a98799250cf7@efficios.com> > I am concerned about the possibility that this change lacks two barriers in the > following scenario: > > On a transition from uthread -> uthread on [CPU 0], from a thread belonging to > another mm to a thread belonging to the mm [!mm -> mm] for which a concurrent > membarrier sync-core is done on [CPU 1]: > > - [CPU 1] sets all bits in the mm icache_stale_mask [A]. There are no barriers > associated with these stores. > > - [CPU 0] store to rq->curr [B] (by the scheduler) vs [CPU 1] loads rq->curr [C] > within membarrier to decide if the IPI should be skipped. Let's say CPU 1 observes > cpu_rq(0)->curr->mm != mm, so it skips the IPI. > > - This means membarrier relies on switch_mm() to issue the sync-core. > > - [CPU 0] switch_mm() loads [D] the icache_stale_mask. If the bit is zero, switch_mm() > may incorrectly skip the sync-core. > > AFAIU, [C] can be reordered before [A] because there is no barrier between those > operations within membarrier. I suspect it can cause the switch_mm() code to skip > a needed sync-core. > > AFAIU, [D] can be reordered before [B] because there is no documented barrier > between those operations within the scheduler, which can also cause switch_mm() > to skip a needed sync-core. > > We possibly have a similar scenario for uthread->uthread when the scheduler > switches between mm -> !mm. > > One way to fix this would be to add the following barriers: > > - A smp_mb() between [A] and [C], possibly just after cpumask_setall() in > prepare_sync_core_cmd(), with comments detailing the ordering it guarantees, > - A smp_mb() between [B] and [D], possibly just before cpumask_test_cpu() in > flush_icache_deferred(), with appropriate comments. > > Am I missing something ? Thanks for the detailed analysis. AFAIU, the following barrier (in membarrier_private_expedited()) /* * Matches memory barriers around rq->curr modification in * scheduler. */ smp_mb(); /* system call entry is not a mb. */ can serve the purpose of ordering [A] before [C] (to be documented in v2). But I agree that [B] and [D] are unordered /missing suitable synchronization. Worse, RISC-V has currently no full barrier after [B] and before returning to user-space: I'm thinking (inspired by the PowerPC implementation), diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c index 217fd4de61342..f63222513076d 100644 --- a/arch/riscv/mm/context.c +++ b/arch/riscv/mm/context.c @@ -323,6 +323,23 @@ void switch_mm(struct mm_struct *prev, struct mm_struct *next, if (unlikely(prev == next)) return; +#if defined(CONFIG_MEMBARRIER) && defined(CONFIG_SMP) + /* + * The membarrier system call requires a full memory barrier + * after storing to rq->curr, before going back to user-space. + * + * Only need the full barrier when switching between processes: + * barrier when switching from kernel to userspace is not + * required here, given that it is implied by mmdrop(); barrier + * when switching from userspace to kernel is not needed after + * store to rq->curr. + */ + if (unlikely(atomic_read(&next->membarrier_state) & + (MEMBARRIER_STATE_PRIVATE_EXPEDITED | + MEMBARRIER_STATE_GLOBAL_EXPEDITED)) && prev) + smp_mb(); +#endif + /* * Mark the current MM context as inactive, and the next as * active. This is at least used by the icache flushing diff --git a/kernel/sched/core.c b/kernel/sched/core.c index a708d225c28e8..a1c749fddd095 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -6670,8 +6670,9 @@ static void __sched notrace __schedule(unsigned int sched_mode) * * Here are the schemes providing that barrier on the * various architectures: - * - mm ? switch_mm() : mmdrop() for x86, s390, sparc, PowerPC. - * switch_mm() rely on membarrier_arch_switch_mm() on PowerPC. + * - mm ? switch_mm() : mmdrop() for x86, s390, sparc, PowerPC, + * RISC-V. switch_mm() relies on membarrier_arch_switch_mm() + * on PowerPC. * - finish_lock_switch() for weakly-ordered * architectures where spin_unlock is a full barrier, * - switch_to() for arm64 (weakly-ordered, spin_unlock The silver lining is that similar changes (probably as a separate/preliminary patch) also restore the desired order between [B] and [D] AFAIU; so with them, 2/2 would just need additions to document the above SYNC_CORE scenario. Thoughts? Andrea _______________________________________________ linux-riscv mailing list linux-riscv@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-riscv
next prev parent reply other threads:[~2023-11-28 15:14 UTC|newest] Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top 2023-11-27 10:32 [PATCH 0/2] membarrier: riscv: Provide core serializing command Andrea Parri 2023-11-27 10:32 ` Andrea Parri 2023-11-27 10:32 ` [PATCH 1/2] locking: Introduce prepare_sync_core_cmd() Andrea Parri 2023-11-27 10:32 ` Andrea Parri 2023-11-27 12:53 ` Mathieu Desnoyers 2023-11-27 12:53 ` Mathieu Desnoyers 2023-11-27 10:32 ` [PATCH 2/2] membarrier: riscv: Provide core serializing command Andrea Parri 2023-11-27 10:32 ` Andrea Parri 2023-11-27 13:28 ` Mathieu Desnoyers 2023-11-27 13:28 ` Mathieu Desnoyers 2023-11-28 15:13 ` Andrea Parri [this message] 2023-11-28 15:13 ` Andrea Parri 2023-11-28 18:39 ` Mathieu Desnoyers 2023-11-28 18:39 ` Mathieu Desnoyers 2023-11-29 18:29 ` Andrea Parri 2023-11-29 18:29 ` Andrea Parri 2023-11-29 20:00 ` Mathieu Desnoyers 2023-11-29 20:00 ` Mathieu Desnoyers 2023-11-29 21:25 ` Andrea Parri 2023-11-29 21:25 ` Andrea Parri 2023-11-29 21:32 ` Mathieu Desnoyers 2023-11-29 21:32 ` Mathieu Desnoyers 2023-11-29 22:43 ` Andrea Parri 2023-11-29 22:43 ` Andrea Parri 2023-12-06 13:05 ` Palmer Dabbelt 2023-12-06 13:05 ` Palmer Dabbelt 2023-12-06 14:11 ` Andrea Parri 2023-12-06 14:11 ` Andrea Parri 2023-12-06 14:15 ` Palmer Dabbelt 2023-12-06 14:15 ` Palmer Dabbelt 2023-12-06 17:42 ` Andrea Parri 2023-12-06 17:42 ` Andrea Parri 2023-12-06 17:56 ` Palmer Dabbelt 2023-12-06 17:56 ` Palmer Dabbelt
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=ZWYDtB/otYvTMZWd@andrea \ --to=parri.andrea@gmail.com \ --cc=aou@eecs.berkeley.edu \ --cc=charlie@rivosinc.com \ --cc=hboehm@google.com \ --cc=linux-kernel@vger.kernel.org \ --cc=linux-riscv@lists.infradead.org \ --cc=mathieu.desnoyers@efficios.com \ --cc=mmaas@google.com \ --cc=palmer@dabbelt.com \ --cc=paul.walmsley@sifive.com \ --cc=paulmck@kernel.org \ --cc=rehn@rivosinc.com \ --cc=striker@us.ibm.com \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.