From: Jayachandran Chandrasekharan Nair <jnair@marvell.com>
To: Will Deacon <will.deacon@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
    Jan Glauber <jglauber@marvell.com>,
    "catalin.marinas@arm.com" <catalin.marinas@arm.com>,
    "linux-arm-kernel@lists.infradead.org" <linux-arm-kernel@lists.infradead.org>,
    "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [RFC] Disable lockref on arm64
Date: Sat, 18 May 2019 04:24:43 +0000
Message-ID: <20190518042424.GA28517@dc5-eodlnx05.marvell.com>
In-Reply-To: <20190506181039.GA2875@brain-police>

On Mon, May 06, 2019 at 07:10:40PM +0100, Will Deacon wrote:
> On Mon, May 06, 2019 at 06:13:12AM +0000, Jayachandran Chandrasekharan Nair wrote:
> > Perhaps someone from ARM can chime in here how the cas/yield combo
> > is expected to work when there is contention. ThunderX2 does not
> > do much with the yield, but I don't expect any ARM implementation
> > to treat YIELD as a hint not to yield, but to get/keep exclusive
> > access to the last failed CAS location.
>
> Just picking up on this as "someone from ARM".
>
> The yield instruction in our implementation of cpu_relax() is *only* there
> as a scheduling hint to QEMU so that it can treat it as an internal
> scheduling hint and run some other thread; see 1baa82f48030 ("arm64:
> Implement cpu_relax as yield"). We can't use WFE or WFI blindly here, as it
> could be a long time before we see a wake-up event such as an interrupt. Our
> implementation of smp_cond_load_acquire() is much better for that kind of
> thing, but doesn't help at all for a contended CAS loop where the variable
> is actually changing constantly.

Looking thru the perf output of this case (open/close of a file from
multiple CPUs), I see that refcount is a significant factor in most
kernel configurations - and that too uses cmpxchg (without yield).
x86 has an optimized inline version of refcount that helps
significantly. Do you think this is worth looking at for arm64?

> Implementing yield in the CPU may generally be beneficial for SMT designs so
> that the hardware resources aren't wasted when spinning round a busy loop.

Yield is probably used in sub-optimal implementations of delay or wait.
It is going to be different across multiple implementations and
revisions (given the description in ARM spec). Having a more yielding(?)
implementation would be equally problematic especially in the lockref
case.

> For this particular discussion (i.e. lockref), however, it seems as though
> the cpu_relax() call is questionable to start with.

In case of lockref, taking out the yield/pause and dropping to queued
spinlock after some cycles appears to me to be a better approach.
Relying on the quality of cpu_relax() on the specific processor to
mitigate against contention is going to be tricky anyway.

We will do some more work here, but would appreciate any pointers
based on your experience here.

Thanks,
JC
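
The fallback suggested above for lockref (try the cmpxchg a bounded number of times with no yield/pause in the loop, then take the lock) might be sketched in user-space C11 roughly as follows. This is an illustration only, not the code in the kernel's lib/lockref.c: the names lockref_sketch, sketch_lock, sketch_unlock, LOCK_BIT and MAX_RETRY are hypothetical, the retry budget of 100 is an arbitrary assumption, the single-word layout is invented for the example, and the lock here is a plain test-and-set spinlock rather than a queued spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define LOCK_BIT  (1ULL << 63)  /* hypothetical layout: top bit means "lock held" */
#define MAX_RETRY 100           /* assumed retry budget, not a tuned value */

/* Lock bit and reference count packed into one word so one CAS covers both. */
struct lockref_sketch {
    _Atomic uint64_t lock_count;
};

/* Plain test-and-set spinlock on LOCK_BIT (stand-in for a queued spinlock). */
static void sketch_lock(struct lockref_sketch *lr)
{
    uint64_t old = atomic_load_explicit(&lr->lock_count, memory_order_relaxed);

    for (;;) {
        if (old & LOCK_BIT) {
            /* Someone else holds the lock: reload and try again. */
            old = atomic_load_explicit(&lr->lock_count, memory_order_relaxed);
            continue;
        }
        /* On failure, 'old' is refreshed with the current value. */
        if (atomic_compare_exchange_strong_explicit(&lr->lock_count, &old,
                                                    old | LOCK_BIT,
                                                    memory_order_acquire,
                                                    memory_order_relaxed))
            return;
    }
}

static void sketch_unlock(struct lockref_sketch *lr)
{
    uint64_t prev = atomic_fetch_and_explicit(&lr->lock_count, ~LOCK_BIT,
                                              memory_order_release);
    assert(prev & LOCK_BIT);    /* unlocking a lock that was not held is a bug */
}

/*
 * Take one reference.
 * Returns 1 if the lockless fast path succeeded, 0 if it fell back to the lock.
 */
static int lockref_get_sketch(struct lockref_sketch *lr)
{
    uint64_t old = atomic_load_explicit(&lr->lock_count, memory_order_relaxed);
    int retry = MAX_RETRY;

    /* Fast path: only attempted while the lock is not held; no yield/pause. */
    while (!(old & LOCK_BIT)) {
        if (atomic_compare_exchange_strong_explicit(&lr->lock_count, &old,
                                                    old + 1,
                                                    memory_order_relaxed,
                                                    memory_order_relaxed))
            return 1;
        if (--retry == 0)
            break;              /* too much contention: stop spinning on the CAS */
    }

    /* Slow path: serialize through the lock, then bump the count. */
    sketch_lock(lr);
    atomic_fetch_add_explicit(&lr->lock_count, 1, memory_order_relaxed);
    sketch_unlock(lr);
    return 0;
}
```

A caller would initialize the word with atomic_init(&lr.lock_count, 0) and then call lockref_get_sketch(&lr). The point of the bounded loop is that the worst-case behavior no longer depends on how a particular CPU treats the hint inside cpu_relax(); whether a given retry budget actually helps on a given machine would still have to be measured.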