Linux-ARM-Kernel Archive on lore.kernel.org
 help / color / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Jan Glauber <jglauber@marvell.com>
Cc: "catalin.marinas@arm.com" <catalin.marinas@arm.com>,
	Jayachandran Chandrasekharan Nair <jnair@marvell.com>,
	"will.deacon@arm.com" <will.deacon@arm.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>
Subject: Re: [RFC] Disable lockref on arm64
Date: Wed, 1 May 2019 09:41:08 -0700
Message-ID: <CAHk-=wjPqcPYkiWKFc=R3+18DXqEhV+Nfbo=JWa32Xp8Nze67g@mail.gmail.com> (raw)
In-Reply-To: <20190429145159.GA29076@hc>

[-- Attachment #1: Type: text/plain, Size: 2932 bytes --]

On Mon, Apr 29, 2019 at 7:52 AM Jan Glauber <jglauber@marvell.com> wrote:
>
> It turned out the issue we have on ThunderX2 is the file open-close sequence
> with small read sizes. If the used files are opened read-only the
> lockref code (enabled by ARCH_USE_CMPXCHG_LOCKREF) is used.
>
> The lockref CMPXCHG_LOOP uses an unbound (as long as the associated
> spinlock isn't taken) while loop to change the lock count. This behaves
> badly under heavy contention

Ok, excuse me when I rant a bit.

Since you're at Marvell, maybe you can forward this rant to the proper
guilty parties?

Who was the absolute *GENIUS* who went

 Step 1: "Oh, we have a middling CPU that isn't world-class on its own"

 Step 2: "BUT! We can put a lot of them on a die, because that's 'easy'"

 Step 3: "But let's make sure the interconnect isn't all that special,
because that would negate the the whole 'easy' part, and really strong
interconnects are even harder than CPU's and use even more power, so
that wouldn't work"

 Step 4: "I wonder why this thing scales badly?"

Seriously. Why are you guys doing this? Has nobody ever looked at the
fundamental thought process above and gone "Hmm"?

If you try to compensate for a not-great core by putting twice the
number of them in a system, you need a cache system and interconnect
between them that is more than twice as good as the competition.

And honestly, from everything that I hear, you don't have it. The
whole chip is designed for "throughput when there is no contention".
Is it really a huge surprise that it then falls flat on its face when
there's something fancy going on?

So now you want to penalize everybody else in the ARM community
because you have a badly balanced system?

Ok, rant over.

The good news is that we can easily fix _this_ particular case by just
limiting the CMPXCHG_LOOP to a maximum number of retries, since the
loop is already designed to fail quickly if the spin lock value isn't
unlocked, and all the lockref code is already organized to fall back
to spinlocks.

So the attached three-liner patch may just work for you. Once _one_
thread hits the maximum retry case and goes into the spinlocked case,
everybody else will also fall back to spinlocks because they now see
that the lockref is contended. So the "retry" value probably isn't all
that important, but let's make it big enough that it probably never
happens on a well-balanced system.

But seriously: the whole "let's just do lots of CPU cores because it's
easy" needs to stop. It's fine if you have a network processor and
you're doing independent things, but it's not a GP processor approach.

Your hardware people need to improve on your CPU core (maybe the
server version of Cortex A76 is starting to approach being good
enough?) and your interconnect (seriously!) instead of just slapping
32 cores on a die and calling it a day.

                Linus "not a fan of the flock of chickens" Torvalds

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 649 bytes --]

 lib/lockref.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/lib/lockref.c b/lib/lockref.c
index 3d468b53d4c9..a6762f8f45c9 100644
--- a/lib/lockref.c
+++ b/lib/lockref.c
@@ -9,6 +9,7 @@
  * failure case.
  */
 #define CMPXCHG_LOOP(CODE, SUCCESS) do {					\
+	int retry = 15;		/* Guaranteed random number */			\
 	struct lockref old;							\
 	BUILD_BUG_ON(sizeof(old) != 8);						\
 	old.lock_count = READ_ONCE(lockref->lock_count);			\
@@ -21,6 +22,8 @@
 		if (likely(old.lock_count == prev.lock_count)) {		\
 			SUCCESS;						\
 		}								\
+		if (!--retry)							\
+			break;							\
 		cpu_relax();							\
 	}									\
 } while (0)

[-- Attachment #3: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  parent reply index

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-29 14:52 Jan Glauber
2019-05-01 16:01 ` Will Deacon
2019-05-02  8:38   ` Jan Glauber
2019-05-01 16:41 ` Linus Torvalds [this message]
2019-05-02  8:27   ` Jan Glauber
2019-05-02 16:12     ` Linus Torvalds
2019-05-02 23:19       ` Jayachandran Chandrasekharan Nair
2019-05-03 19:40         ` Linus Torvalds
2019-05-06  6:13           ` [EXT] " Jayachandran Chandrasekharan Nair
2019-05-06 17:13             ` Linus Torvalds
2019-05-06 18:10             ` Will Deacon
2019-05-18  4:24               ` Jayachandran Chandrasekharan Nair
2019-05-18 10:00                 ` Ard Biesheuvel
2019-05-22 16:04                   ` Will Deacon
2019-06-12  4:10                     ` Jayachandran Chandrasekharan Nair
2019-06-12  9:31                       ` Will Deacon
2019-06-14  7:09                         ` Jayachandran Chandrasekharan Nair
2019-06-14  9:58                           ` Will Deacon
2019-06-14 10:24                             ` Ard Biesheuvel
2019-06-14 10:38                               ` Will Deacon
2019-06-15  4:21                                 ` Kees Cook
2019-06-15  8:47                                   ` Ard Biesheuvel
2019-06-15 13:59                                     ` Kees Cook
2019-06-15 14:18                                       ` Ard Biesheuvel
2019-06-16 21:31                                         ` Kees Cook
2019-06-17 11:33                                           ` Ard Biesheuvel
2019-06-17 17:26                                             ` Will Deacon
2019-06-17 20:07                                               ` Jayachandran Chandrasekharan Nair
2019-06-18  5:41                                               ` Kees Cook
2019-06-13  9:53                       ` Hanjun Guo
2019-06-05 13:48   ` [PATCH] lockref: Limit number of cmpxchg loop retries Jan Glauber
2019-06-05 20:16     ` Linus Torvalds
2019-06-06  8:03       ` Jan Glauber
2019-06-06  9:41         ` Will Deacon
2019-06-06 10:28           ` Jan Glauber
2019-06-07  7:27             ` Jan Glauber
2019-06-07 20:14               ` Linus Torvalds

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHk-=wjPqcPYkiWKFc=R3+18DXqEhV+Nfbo=JWa32Xp8Nze67g@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=catalin.marinas@arm.com \
    --cc=jglauber@marvell.com \
    --cc=jnair@marvell.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=will.deacon@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-ARM-Kernel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-arm-kernel/0 linux-arm-kernel/git/0.git
	git clone --mirror https://lore.kernel.org/linux-arm-kernel/1 linux-arm-kernel/git/1.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-arm-kernel linux-arm-kernel/ https://lore.kernel.org/linux-arm-kernel \
		linux-arm-kernel@lists.infradead.org infradead-linux-arm-kernel@archiver.kernel.org
	public-inbox-index linux-arm-kernel


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.infradead.lists.linux-arm-kernel


AGPL code for this site: git clone https://public-inbox.org/ public-inbox