All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/12] arm+arm64: vdso unification to lib/vdso/
@ 2017-10-27 22:23 ` Mark Salyzyn
  0 siblings, 0 replies; 10+ messages in thread
From: Mark Salyzyn @ 2017-10-27 22:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: Mark Salyzyn, James Morse, Russell King, Catalin Marinas,
	Will Deacon, Andy Lutomirski, Dmitry Safonov, John Stultz,
	Mark Rutland, Laura Abbott, Kees Cook, Ard Biesheuvel,
	Andy Gross, Kevin Brodsky, Andrew Pinski, linux-arm-kernel

Take an effort to recode the arm64 vdso code from assembler to C
previously submitted by Andrew Pinski <apinski@cavium.com>, rework
it for use in both arm and arm64, overlapping any optimizations
for each architecture. But instead of landing it in arm64, land the
result into lib/vdso and unify both implementations to simplify
future maintenance. This will act as the basis for implementing
arm64 vdso32 in the future.


apinski@cavium.com made the following claims in the original patch:

This allows the compiler to optimize the divide by 1000 and remove
the other divides.

On ThunderX, gettimeofday improves by 32%.  On ThunderX 2,
gettimeofday improves by 18%.

Note I noticed a bug in the old implementation of __kernel_clock_getres;
it was checking only the lower 32bits of the pointer; this would work
for most cases but could fail in a few.


Signed-off-by: Mark Salyzyn <salyzyn@android.com>
Cc: James Morse <james.morse@arm.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Andy Gross <andy.gross@linaro.org>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Andrew Pinski <apinski@cavium.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org

v2:
- split first CL into 7 pieces, there were cosmetic adjustments.
- make sure profiling is turned off.
- kept quiet_cmd_vdsoas.

v3:
- changed are a result of private email review comments
- rebase
- move arch/arm/vdso/vgettimeofday.c to lib/vdso/vgettimeofday.c
- adjust vgettimeofday.c to be a better global candidate, switch to using
  ARCH_PROVIDES_TIMER and __arch_counter_get() as more generic.
- do not expose gettimeofday if arch does not support user space timer

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v3 0/12] arm+arm64: vdso unification to lib/vdso/
@ 2017-10-27 22:23 ` Mark Salyzyn
  0 siblings, 0 replies; 10+ messages in thread
From: Mark Salyzyn @ 2017-10-27 22:23 UTC (permalink / raw)
  To: linux-arm-kernel

Take an effort to recode the arm64 vdso code from assembler to C
previously submitted by Andrew Pinski <apinski@cavium.com>, rework
it for use in both arm and arm64, overlapping any optimizations
for each architecture. But instead of landing it in arm64, land the
result into lib/vdso and unify both implementations to simplify
future maintenance. This will act as the basis for implementing
arm64 vdso32 in the future.


apinski at cavium.com made the following claims in the original patch:

This allows the compiler to optimize the divide by 1000 and remove
the other divides.

On ThunderX, gettimeofday improves by 32%.  On ThunderX 2,
gettimeofday improves by 18%.

Note I noticed a bug in the old implementation of __kernel_clock_getres;
it was checking only the lower 32bits of the pointer; this would work
for most cases but could fail in a few.


Signed-off-by: Mark Salyzyn <salyzyn@android.com>
Cc: James Morse <james.morse@arm.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Andy Gross <andy.gross@linaro.org>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Andrew Pinski <apinski@cavium.com>
Cc: linux-kernel at vger.kernel.org
Cc: linux-arm-kernel at lists.infradead.org

v2:
- split first CL into 7 pieces, there were cosmetic adjustments.
- make sure profiling is turned off.
- kept quiet_cmd_vdsoas.

v3:
- changed are a result of private email review comments
- rebase
- move arch/arm/vdso/vgettimeofday.c to lib/vdso/vgettimeofday.c
- adjust vgettimeofday.c to be a better global candidate, switch to using
  ARCH_PROVIDES_TIMER and __arch_counter_get() as more generic.
- do not expose gettimeofday if arch does not support user space timer

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 0/12] arm+arm64: vdso unification to lib/vdso/
  2017-10-27 22:23 ` Mark Salyzyn
@ 2017-10-30 14:18   ` Mark Rutland
  -1 siblings, 0 replies; 10+ messages in thread
From: Mark Rutland @ 2017-10-30 14:18 UTC (permalink / raw)
  To: Mark Salyzyn
  Cc: linux-kernel, James Morse, Russell King, Catalin Marinas,
	Will Deacon, Andy Lutomirski, Dmitry Safonov, John Stultz,
	Laura Abbott, Kees Cook, Ard Biesheuvel, Andy Gross,
	Kevin Brodsky, Andrew Pinski, linux-arm-kernel

Hi,

On Fri, Oct 27, 2017 at 03:23:48PM -0700, Mark Salyzyn wrote:
> Note I noticed a bug in the old implementation of __kernel_clock_getres;
> it was checking only the lower 32bits of the pointer; this would work
> for most cases but could fail in a few.

Sorry if this is a stupid question, but do you mean from a prior version
of this series, or the one in the kernel today?

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v3 0/12] arm+arm64: vdso unification to lib/vdso/
@ 2017-10-30 14:18   ` Mark Rutland
  0 siblings, 0 replies; 10+ messages in thread
From: Mark Rutland @ 2017-10-30 14:18 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On Fri, Oct 27, 2017 at 03:23:48PM -0700, Mark Salyzyn wrote:
> Note I noticed a bug in the old implementation of __kernel_clock_getres;
> it was checking only the lower 32bits of the pointer; this would work
> for most cases but could fail in a few.

Sorry if this is a stupid question, but do you mean from a prior version
of this series, or the one in the kernel today?

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 0/12] arm+arm64: vdso unification to lib/vdso/
  2017-10-30 14:18   ` Mark Rutland
@ 2017-10-30 20:34     ` Mark Salyzyn
  -1 siblings, 0 replies; 10+ messages in thread
From: Mark Salyzyn @ 2017-10-30 20:34 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-kernel, James Morse, Russell King, Catalin Marinas,
	Will Deacon, Andy Lutomirski, Dmitry Safonov, John Stultz,
	Laura Abbott, Kees Cook, Ard Biesheuvel, Andy Gross,
	Kevin Brodsky, Andrew Pinski, linux-arm-kernel

On 10/30/2017 07:18 AM, Mark Rutland wrote:
> Hi,
>
> On Fri, Oct 27, 2017 at 03:23:48PM -0700, Mark Salyzyn wrote:
>> Note I noticed a bug in the old implementation of __kernel_clock_getres;
>> it was checking only the lower 32bits of the pointer; this would work
>> for most cases but could fail in a few.
> Sorry if this is a stupid question, but do you mean from a prior version
> of this series, or the one in the kernel today?

apinski@cavium.com noticed this as part of the existing upstream arm64 
assembler when he did the original conversion to C.

Yes, I am aware that a separate patch will need to be made on the 
assembler code for 'stable' trees. And it needs to be addressed 
(separately from this patch series IMHO, but if forced, it would be a 
pre-patch that gets swallowed in patch 10 of the series).

I have not looked into it in depth, because from Android's point of 
view, this patch series (once approved) is going to be back-ported to 
our android-common kernels and will be the solution to the problem. I 
must declare I have to be humble about my understanding of arm64 
assembler, so there is _that_ ...

-- MarkS

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v3 0/12] arm+arm64: vdso unification to lib/vdso/
@ 2017-10-30 20:34     ` Mark Salyzyn
  0 siblings, 0 replies; 10+ messages in thread
From: Mark Salyzyn @ 2017-10-30 20:34 UTC (permalink / raw)
  To: linux-arm-kernel

On 10/30/2017 07:18 AM, Mark Rutland wrote:
> Hi,
>
> On Fri, Oct 27, 2017 at 03:23:48PM -0700, Mark Salyzyn wrote:
>> Note I noticed a bug in the old implementation of __kernel_clock_getres;
>> it was checking only the lower 32bits of the pointer; this would work
>> for most cases but could fail in a few.
> Sorry if this is a stupid question, but do you mean from a prior version
> of this series, or the one in the kernel today?

apinski at cavium.com noticed this as part of the existing upstream arm64 
assembler when he did the original conversion to C.

Yes, I am aware that a separate patch will need to be made on the 
assembler code for 'stable' trees. And it needs to be addressed 
(separately from this patch series IMHO, but if forced, it would be a 
pre-patch that gets swallowed in patch 10 of the series).

I have not looked into it in depth, because from Android's point of 
view, this patch series (once approved) is going to be back-ported to 
our android-common kernels and will be the solution to the problem. I 
must declare I have to be humble about my understanding of arm64 
assembler, so there is _that_ ...

-- MarkS

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 0/12] arm+arm64: vdso unification to lib/vdso/
  2017-10-30 20:34     ` Mark Salyzyn
@ 2017-10-30 21:44       ` Mark Rutland
  -1 siblings, 0 replies; 10+ messages in thread
From: Mark Rutland @ 2017-10-30 21:44 UTC (permalink / raw)
  To: Mark Salyzyn
  Cc: linux-kernel, James Morse, Russell King, Catalin Marinas,
	Will Deacon, Andy Lutomirski, Dmitry Safonov, John Stultz,
	Laura Abbott, Kees Cook, Ard Biesheuvel, Andy Gross,
	Kevin Brodsky, Andrew Pinski, linux-arm-kernel

On Mon, Oct 30, 2017 at 01:34:13PM -0700, Mark Salyzyn wrote:
> On 10/30/2017 07:18 AM, Mark Rutland wrote:
> > On Fri, Oct 27, 2017 at 03:23:48PM -0700, Mark Salyzyn wrote:
> > > Note I noticed a bug in the old implementation of __kernel_clock_getres;
> > > it was checking only the lower 32bits of the pointer; this would work
> > > for most cases but could fail in a few.
> > Sorry if this is a stupid question, but do you mean from a prior version
> > of this series, or the one in the kernel today?
> 
> apinski@cavium.com noticed this as part of the existing upstream arm64
> assembler when he did the original conversion to C.

Just to check, does the below address the issue, or is there something that
I've missed?

Thanks,
Mark.

---->8----
>From 3557ca259f0a6dbb35a057256552545ec0cef677 Mon Sep 17 00:00:00 2001
From: Mark Rutland <mark.rutland@arm.com>
Date: Mon, 30 Oct 2017 21:23:19 +0000
Subject: [PATCH] arm64: vdso: fix clock_getres for 4GiB-aligned res

The vdso tries to check for a NULL res pointer in __kernel_clock_getres,
but only checks the lower 32 bits as is uses CBZ on the W register the
res pointer is held in.

Thus, if the res pointer happened to be aligned to a 4GiB boundary, we'd
spuriously skip storing the timespec to it, while returning a zero error code
to the caller.

Prevent this by checking the whole pointer, using CBZ on the X register
the res pointer is held in.

Fixes: 9031fefde6f2ac1d ("arm64: VDSO support")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Andrew Pinski <apinski@cavium.com>
Reported-by: Mark Salyzyn <salyzyn@android.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/kernel/vdso/gettimeofday.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/vdso/gettimeofday.S b/arch/arm64/kernel/vdso/gettimeofday.S
index 76320e920965..c39872a7b03c 100644
--- a/arch/arm64/kernel/vdso/gettimeofday.S
+++ b/arch/arm64/kernel/vdso/gettimeofday.S
@@ -309,7 +309,7 @@ ENTRY(__kernel_clock_getres)
 	b.ne	4f
 	ldr	x2, 6f
 2:
-	cbz	w1, 3f
+	cbz	x1, 3f
 	stp	xzr, x2, [x1]
 
 3:	/* res == NULL. */
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v3 0/12] arm+arm64: vdso unification to lib/vdso/
@ 2017-10-30 21:44       ` Mark Rutland
  0 siblings, 0 replies; 10+ messages in thread
From: Mark Rutland @ 2017-10-30 21:44 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Oct 30, 2017 at 01:34:13PM -0700, Mark Salyzyn wrote:
> On 10/30/2017 07:18 AM, Mark Rutland wrote:
> > On Fri, Oct 27, 2017 at 03:23:48PM -0700, Mark Salyzyn wrote:
> > > Note I noticed a bug in the old implementation of __kernel_clock_getres;
> > > it was checking only the lower 32bits of the pointer; this would work
> > > for most cases but could fail in a few.
> > Sorry if this is a stupid question, but do you mean from a prior version
> > of this series, or the one in the kernel today?
> 
> apinski at cavium.com noticed this as part of the existing upstream arm64
> assembler when he did the original conversion to C.

Just to check, does the below address the issue, or is there something that
I've missed?

Thanks,
Mark.

---->8----
>From 3557ca259f0a6dbb35a057256552545ec0cef677 Mon Sep 17 00:00:00 2001
From: Mark Rutland <mark.rutland@arm.com>
Date: Mon, 30 Oct 2017 21:23:19 +0000
Subject: [PATCH] arm64: vdso: fix clock_getres for 4GiB-aligned res

The vdso tries to check for a NULL res pointer in __kernel_clock_getres,
but only checks the lower 32 bits as is uses CBZ on the W register the
res pointer is held in.

Thus, if the res pointer happened to be aligned to a 4GiB boundary, we'd
spuriously skip storing the timespec to it, while returning a zero error code
to the caller.

Prevent this by checking the whole pointer, using CBZ on the X register
the res pointer is held in.

Fixes: 9031fefde6f2ac1d ("arm64: VDSO support")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Andrew Pinski <apinski@cavium.com>
Reported-by: Mark Salyzyn <salyzyn@android.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/kernel/vdso/gettimeofday.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/vdso/gettimeofday.S b/arch/arm64/kernel/vdso/gettimeofday.S
index 76320e920965..c39872a7b03c 100644
--- a/arch/arm64/kernel/vdso/gettimeofday.S
+++ b/arch/arm64/kernel/vdso/gettimeofday.S
@@ -309,7 +309,7 @@ ENTRY(__kernel_clock_getres)
 	b.ne	4f
 	ldr	x2, 6f
 2:
-	cbz	w1, 3f
+	cbz	x1, 3f
 	stp	xzr, x2, [x1]
 
 3:	/* res == NULL. */
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 0/12] arm+arm64: vdso unification to lib/vdso/
  2017-10-30 21:44       ` Mark Rutland
@ 2017-10-31  9:49         ` Will Deacon
  -1 siblings, 0 replies; 10+ messages in thread
From: Will Deacon @ 2017-10-31  9:49 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Mark Salyzyn, linux-kernel, James Morse, Russell King,
	Catalin Marinas, Andy Lutomirski, Dmitry Safonov, John Stultz,
	Laura Abbott, Kees Cook, Ard Biesheuvel, Andy Gross,
	Kevin Brodsky, Andrew Pinski, linux-arm-kernel

On Mon, Oct 30, 2017 at 09:44:35PM +0000, Mark Rutland wrote:
> On Mon, Oct 30, 2017 at 01:34:13PM -0700, Mark Salyzyn wrote:
> > On 10/30/2017 07:18 AM, Mark Rutland wrote:
> > > On Fri, Oct 27, 2017 at 03:23:48PM -0700, Mark Salyzyn wrote:
> > > > Note I noticed a bug in the old implementation of __kernel_clock_getres;
> > > > it was checking only the lower 32bits of the pointer; this would work
> > > > for most cases but could fail in a few.
> > > Sorry if this is a stupid question, but do you mean from a prior version
> > > of this series, or the one in the kernel today?
> > 
> > apinski@cavium.com noticed this as part of the existing upstream arm64
> > assembler when he did the original conversion to C.
> 
> Just to check, does the below address the issue, or is there something that
> I've missed?

Looks fine to me. Applied.

Will

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v3 0/12] arm+arm64: vdso unification to lib/vdso/
@ 2017-10-31  9:49         ` Will Deacon
  0 siblings, 0 replies; 10+ messages in thread
From: Will Deacon @ 2017-10-31  9:49 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Oct 30, 2017 at 09:44:35PM +0000, Mark Rutland wrote:
> On Mon, Oct 30, 2017 at 01:34:13PM -0700, Mark Salyzyn wrote:
> > On 10/30/2017 07:18 AM, Mark Rutland wrote:
> > > On Fri, Oct 27, 2017 at 03:23:48PM -0700, Mark Salyzyn wrote:
> > > > Note I noticed a bug in the old implementation of __kernel_clock_getres;
> > > > it was checking only the lower 32bits of the pointer; this would work
> > > > for most cases but could fail in a few.
> > > Sorry if this is a stupid question, but do you mean from a prior version
> > > of this series, or the one in the kernel today?
> > 
> > apinski at cavium.com noticed this as part of the existing upstream arm64
> > assembler when he did the original conversion to C.
> 
> Just to check, does the below address the issue, or is there something that
> I've missed?

Looks fine to me. Applied.

Will

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2017-10-31  9:49 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-27 22:23 [PATCH v3 0/12] arm+arm64: vdso unification to lib/vdso/ Mark Salyzyn
2017-10-27 22:23 ` Mark Salyzyn
2017-10-30 14:18 ` Mark Rutland
2017-10-30 14:18   ` Mark Rutland
2017-10-30 20:34   ` Mark Salyzyn
2017-10-30 20:34     ` Mark Salyzyn
2017-10-30 21:44     ` Mark Rutland
2017-10-30 21:44       ` Mark Rutland
2017-10-31  9:49       ` Will Deacon
2017-10-31  9:49         ` Will Deacon

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.