From: Chris Metcalf <cmetcalf@mellanox.com>
To: Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>
Cc: LKML <linux-kernel@vger.kernel.org>,
	John Stultz <john.stultz@linaro.org>,
	Ingo Molnar <mingo@kernel.org>,
	David Gibson <david@gibson.dropbear.id.au>,
	Liav Rehana <liavr@mellanox.com>,
	Richard Cochran <richardcochran@gmail.com>,
	Prarit Bhargava <prarit@redhat.com>,
	Laurent Vivier <lvivier@redhat.com>,
	"Christopher S. Hall" <christopher.s.hall@intel.com>
Subject: Re: [patch 5/6] [RFD] timekeeping: Provide optional 128bit math
Date: Fri, 9 Dec 2016 12:32:07 -0500
Message-ID: <486a28a2-b7de-67fd-f731-1487b141319b@mellanox.com>
In-Reply-To: <20161209083011.GD15765@worktop.programming.kicks-ass.net>

On 12/9/2016 3:30 AM, Peter Zijlstra wrote:
> On Fri, Dec 09, 2016 at 07:38:47AM +0100, Peter Zijlstra wrote:
>> On Fri, Dec 09, 2016 at 06:26:38AM +0100, Peter Zijlstra wrote:
>>> Just for giggles, on tilegx the branch is actually slower than doing the
>>> mult unconditionally.
>>>
>>> The problem is that the two multiplies would otherwise completely
>>> pipeline, whereas with the conditional you serialize them.
>> On my Haswell laptop the unconditional version is faster too.
> Only when using x86_64 instructions, once I fixed the i386 variant it
> was slower, probably due to register pressure and the like.
>
>>> (came to light while talking about why the mul_u64_u32_shr() fallback
>>> didn't work right for them, which was a combination of the above issue
>>> and the fact that their compiler 'lost' the fact that these are
>>> 32x32->64 mults and did 64x64 ones instead).
>> Turns out using GCC-6.2.1 we have the same problem on i386, GCC doesn't
>> recognise the 32x32 mults and generates crap.
>>
>> This used to work :/
> Do we want something like so?
>
> ---
>   arch/tile/include/asm/Kbuild  |  1 -
>   arch/tile/include/asm/div64.h | 14 ++++++++++++++
>   arch/x86/include/asm/div64.h  | 10 ++++++++++
>   include/linux/math64.h        | 26 ++++++++++++++++++--------
>   4 files changed, 42 insertions(+), 9 deletions(-)

Untested, but I looked at it closely, and it seems like a decent idea.

Acked-by: Chris Metcalf <cmetcalf@mellanox.com> [for tile]

Of course if this is pushed up, it will then probably be too tempting for me not
to add the tilegx-specific mul_u64_u32_shr() to take advantage of pipelining
the two 32x32->64 multiplies :-)
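
For readers following along, here is a rough userspace sketch of the two
shapes being compared in this thread: a mul_u64_u32_shr() fallback that
branches on the high word, and one that does both 32x32->64 multiplies
unconditionally so they can pipeline. It is not the patch itself. The
_branch/_nobranch names are made up for illustration, stdint types stand in
for the kernel's u32/u64, and shift is assumed to be in the range 0..32.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Widen exactly one operand so the compiler can see a 32x32->64 multiply
 * instead of falling back to a full 64x64 one.
 */
static inline uint64_t mul_u32_u32(uint32_t a, uint32_t b)
{
	return (uint64_t)a * b;
}

/* Hypothetical name: skips the high multiply when a fits in 32 bits. */
static inline uint64_t mul_u64_u32_shr_branch(uint64_t a, uint32_t mul,
					      unsigned int shift)
{
	uint32_t al = (uint32_t)a;
	uint32_t ah = (uint32_t)(a >> 32);
	uint64_t ret;

	assert(shift <= 32);	/* assumption: (32 - shift) must not underflow */

	ret = mul_u32_u32(al, mul) >> shift;
	if (ah)
		ret += mul_u32_u32(ah, mul) << (32 - shift);

	return ret;
}

/* Hypothetical name: always issues both multiplies, with no branch between them. */
static inline uint64_t mul_u64_u32_shr_nobranch(uint64_t a, uint32_t mul,
						unsigned int shift)
{
	uint32_t al = (uint32_t)a;
	uint32_t ah = (uint32_t)(a >> 32);

	assert(shift <= 32);	/* same assumption as above */

	return (mul_u32_u32(al, mul) >> shift) +
	       (mul_u32_u32(ah, mul) << (32 - shift));
}
```

Both variants return the same value for any input in the assumed range. Which
one is faster depends on the target, as the numbers quoted above suggest, so
treat this only as a starting point for measuring.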

-- 
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com

Thread overview: 35+ messages
2016-12-08 20:49 [patch 0/6] timekeeping: Cure the signed/unsigned wreckage Thomas Gleixner
2016-12-08 20:49 ` [patch 1/6] timekeeping: Force unsigned clocksource to nanoseconds conversion Thomas Gleixner
2016-12-08 23:38   ` David Gibson
2016-12-09 11:13   ` [tip:timers/core] timekeeping_Force_unsigned_clocksource_to_nanoseconds_conversion tip-bot for Thomas Gleixner
2016-12-08 20:49 ` [patch 2/6] timekeeping: Make the conversion call chain consistently unsigned Thomas Gleixner
2016-12-08 23:39   ` David Gibson
2016-12-09 11:13   ` [tip:timers/core] " tip-bot for Thomas Gleixner
2016-12-08 20:49 ` [patch 3/6] timekeeping: Get rid of pointless typecasts Thomas Gleixner
2016-12-08 23:40   ` David Gibson
2016-12-09 11:14   ` [tip:timers/core] " tip-bot for Thomas Gleixner
2016-12-08 20:49 ` [patch 4/6] timekeeping: Use mul_u64_u32_shr() instead of open coding it Thomas Gleixner
2016-12-08 23:41   ` David Gibson
2016-12-09 11:14   ` [tip:timers/core] " tip-bot for Thomas Gleixner
2016-12-08 20:49 ` [patch 5/6] [RFD] timekeeping: Provide optional 128bit math Thomas Gleixner
2016-12-09  4:08   ` Ingo Molnar
2016-12-09  4:29     ` Ingo Molnar
2016-12-09  4:39       ` John Stultz
2016-12-09  4:48     ` Peter Zijlstra
2016-12-09  5:22       ` Ingo Molnar
2016-12-09  5:41         ` Peter Zijlstra
2016-12-09  5:11   ` Peter Zijlstra
2016-12-09  6:08     ` Peter Zijlstra
2016-12-09  5:26   ` Peter Zijlstra
2016-12-09  6:38     ` Peter Zijlstra
2016-12-09  8:30       ` Peter Zijlstra
2016-12-09  9:11         ` Peter Zijlstra
2016-12-09 10:01         ` Peter Zijlstra
2016-12-09 17:32         ` Chris Metcalf [this message]
2017-01-14 12:51         ` [tip:timers/core] math64, timers: Fix 32bit mul_u64_u32_shr() and friends tip-bot for Peter Zijlstra
2016-12-09 10:18       ` [patch 5/6] [RFD] timekeeping: Provide optional 128bit math Peter Zijlstra
2016-12-09 17:20         ` Chris Metcalf
2016-12-08 20:49 ` [patch 6/6] [RFD] timekeeping: Get rid of cycle_t Thomas Gleixner
2016-12-08 23:43   ` David Gibson
2016-12-09  4:52 ` [patch 0/6] timekeeping: Cure the signed/unsigned wreckage John Stultz
2016-12-09  5:30 ` Peter Zijlstra
