Util-Linux Archive on lore.kernel.org
From: Peter Cordes <peter@cordes.ca>
To: "Aurélien Lajoie" <orel@melix.net>
Cc: util-linux@vger.kernel.org
Subject: Re: [PATCH] libuuid: improve uuid_unparse() performance
Date: Wed, 25 Mar 2020 23:13:38 -0300
Message-ID: <CA+bjHURiQDMEp2UzxUX4ceop+o3Ebzr1z4zfSZWJDcaYTyN6Dg@mail.gmail.com>
In-Reply-To: <CAA0A08WYct_BhybReWMkK_LXBS2L0DmmyNh2W8cp=kys9Q0R7g@mail.gmail.com>

On Wed, Mar 25, 2020 at 10:07 PM Aurélien Lajoie <orel@melix.net> wrote:
> On Wed, Mar 25, 2020 at 3:16 PM Peter Cordes <peter@cordes.ca> wrote:
> > If you really are bottlenecking on UUID throughput, see my SIMD answer
> > on https://stackoverflow.com/questions/53823756/how-to-convert-a-binary-integer-number-to-a-hex-string
> > with x86 SSE2 (baseline for x86-64), SSSE3, AVX2 variable-shift, and
> > AVX512VBMI integer -> hex manual vectorization
> I will take a look at it, but later; I get your idea.
> I am not familiar with this, nice way to jump into SIMD operations.

I can write that code with _mm_cmpgt_epi8 intrinsics from immintrin.h
if libuuid actually wants a patch to add an #ifdef __SSE2__ version that
x86-64 can use all the time instead of the scalar version.  I'm very
familiar with x86 SIMD intrinsics, so it would be easy for me to write
the code I'm already imagining.  But it might not be worth the trouble
if it won't get merged because nobody wants to maintain it.

Also __SSSE3__, __AVX2__, and __AVX512VBMI__ versions if we want
them, but those would only get enabled for people compiling libuuid
with -march=native on their machines, or something similar.
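
A minimal sketch of what an #ifdef __SSE2__ version could look like.
This is an illustration under stated assumptions, not the patch
discussed in this thread: the names hex32() and uuid_unparse_sketch()
are hypothetical, the output is lowercase, and the '-' separators are
placed with plain memcpy rather than with shuffles.

```c
#include <string.h>

#ifdef __SSE2__
#include <immintrin.h>

/* Map 16 nibbles (each 0..15) to ASCII hex digits, lowercase. */
static __m128i nibbles_to_hex(__m128i n)
{
	/* 0xFF in every byte where the nibble is 10..15 */
	__m128i is_alpha = _mm_cmpgt_epi8(n, _mm_set1_epi8(9));
	/* extra offset needed to get from '0'+10 up to 'a' */
	__m128i adj = _mm_and_si128(is_alpha, _mm_set1_epi8('a' - '0' - 10));

	return _mm_add_epi8(_mm_add_epi8(n, _mm_set1_epi8('0')), adj);
}

/* 16 input bytes -> 32 hex characters, no terminator (hypothetical helper). */
static void hex32(const unsigned char in[16], char out[32])
{
	__m128i v = _mm_loadu_si128((const __m128i *)in);
	__m128i mask = _mm_set1_epi8(0x0f);
	/* There is no 8-bit shift; shift 16-bit lanes and mask off the spill. */
	__m128i hi = _mm_and_si128(_mm_srli_epi16(v, 4), mask);
	__m128i lo = _mm_and_si128(v, mask);

	/* Interleave so each byte's high nibble comes first. */
	_mm_storeu_si128((__m128i *)out,
			 nibbles_to_hex(_mm_unpacklo_epi8(hi, lo)));
	_mm_storeu_si128((__m128i *)(out + 16),
			 nibbles_to_hex(_mm_unpackhi_epi8(hi, lo)));
}
#else
/* Scalar fallback for targets without SSE2. */
static void hex32(const unsigned char in[16], char out[32])
{
	static const char digits[] = "0123456789abcdef";

	for (int i = 0; i < 16; i++) {
		out[2 * i] = digits[in[i] >> 4];
		out[2 * i + 1] = digits[in[i] & 0x0f];
	}
}
#endif

/* Format as 8-4-4-4-12 with '-' separators; out needs 37 bytes. */
static void uuid_unparse_sketch(const unsigned char uu[16], char out[37])
{
	char h[32];

	hex32(uu, h);
	memcpy(out, h, 8);
	out[8] = '-';
	memcpy(out + 9, h + 8, 4);
	out[13] = '-';
	memcpy(out + 14, h + 12, 4);
	out[18] = '-';
	memcpy(out + 19, h + 16, 4);
	out[23] = '-';
	memcpy(out + 24, h + 20, 12);
	out[36] = '\0';
}
```

The compare-and-add step replaces a table lookup, which is why plain
SSE2 is enough here; an SSSE3 version would instead use a byte shuffle
as the lookup table.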

Or we could even do runtime CPU detection to set a function pointer to
the version that's best for the current CPU.  SSSE3 helps a lot (byte
shuffle as a hex-digit LUT, and to line up data for the '-' gaps).  And
AVX512VBMI is fantastic for this on Ice Lake client/server.  It's only
called internally, so we don't need the dynamic-link-time CPU
detection that glibc uses to resolve memset to, for example,
__memset_avx2_unaligned_erms, using a custom symbol resolver function.
We can see how much speedup we get from using more than SSE2 and
decide if it's worth the trouble.
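
The function-pointer idea could be sketched as follows.  This is an
assumption-laden illustration, not code from libuuid: the names
hex16(), hex16_scalar(), hex16_ssse3() and hex16_resolve() are
hypothetical, and it relies on the GCC/Clang builtins
__builtin_cpu_init() and __builtin_cpu_supports() plus the target
function attribute, so the SSSE3 path is only built on x86-64 with
those compilers.

```c
#include <string.h>

typedef void (*hex16_fn)(const unsigned char in[16], char out[32]);

/* Portable version: table lookup per nibble. */
static void hex16_scalar(const unsigned char in[16], char out[32])
{
	static const char digits[] = "0123456789abcdef";

	for (int i = 0; i < 16; i++) {
		out[2 * i] = digits[in[i] >> 4];
		out[2 * i + 1] = digits[in[i] & 0x0f];
	}
}

#if defined(__GNUC__) && defined(__x86_64__)
#define HAVE_SSSE3_VARIANT 1
#include <immintrin.h>

/* SSSE3 version: pshufb uses a 16-byte register as the hex-digit LUT. */
static __attribute__((target("ssse3")))
void hex16_ssse3(const unsigned char in[16], char out[32])
{
	const __m128i lut = _mm_setr_epi8('0', '1', '2', '3', '4', '5', '6', '7',
					  '8', '9', 'a', 'b', 'c', 'd', 'e', 'f');
	__m128i v = _mm_loadu_si128((const __m128i *)in);
	__m128i mask = _mm_set1_epi8(0x0f);
	__m128i hi = _mm_and_si128(_mm_srli_epi16(v, 4), mask);
	__m128i lo = _mm_and_si128(v, mask);

	/* Interleave high/low nibbles, then look each one up in the LUT. */
	_mm_storeu_si128((__m128i *)out,
			 _mm_shuffle_epi8(lut, _mm_unpacklo_epi8(hi, lo)));
	_mm_storeu_si128((__m128i *)(out + 16),
			 _mm_shuffle_epi8(lut, _mm_unpackhi_epi8(hi, lo)));
}
#endif

static void hex16_resolve(const unsigned char in[16], char out[32]);

/* Starts at the resolver; the first call replaces it with the best version. */
static hex16_fn hex16_impl = hex16_resolve;

static void hex16_resolve(const unsigned char in[16], char out[32])
{
	hex16_fn fn = hex16_scalar;

#ifdef HAVE_SSSE3_VARIANT
	__builtin_cpu_init();
	if (__builtin_cpu_supports("ssse3"))
		fn = hex16_ssse3;
#endif
	/* Plain store: a real implementation would want an atomic store
	 * or a one-time initializer if several threads can race here. */
	hex16_impl = fn;
	fn(in, out);
}

/* Internal entry point: one indirect call, no dynamic-linker involvement. */
static void hex16(const unsigned char in[16], char out[32])
{
	hex16_impl(in, out);
}
```

Because the pointer is private to the library, nothing here needs a
custom symbol resolver; further variants (AVX2, AVX512VBMI) would be
added as extra checks in hex16_resolve(), most capable first.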


Thread overview: 7+ messages
2020-03-24 21:26 Aurelien LAJOIE
2020-03-25 11:10 ` Karel Zak
2020-03-26  0:54   ` Aurélien Lajoie
2020-03-25 14:16 ` Peter Cordes
2020-03-26  1:06   ` Aurélien Lajoie
2020-03-26  2:13     ` Peter Cordes [this message]
2020-03-26 23:22       ` Peter Cordes
