From: wei.guo.simon@gmail.com
To: linuxppc-dev@lists.ozlabs.org
Cc: Paul Mackerras <paulus@ozlabs.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	"Naveen N.  Rao" <naveen.n.rao@linux.vnet.ibm.com>,
	Cyril Bur <cyrilbur@gmail.com>,
	Simon Guo <wei.guo.simon@gmail.com>
Subject: [PATCH v7 0/5] powerpc/64: memcmp() optimization
Date: Wed, 30 May 2018 17:20:58 +0800
Message-ID: <1527672063-6953-1-git-send-email-wei.guo.simon@gmail.com>

From: Simon Guo <wei.guo.simon@gmail.com>

There is room to optimize the 64-bit powerpc memcmp() in the following
2 cases:
(1) Even if the src/dst addresses are not 8-byte aligned at the start,
memcmp() can align them and use the .Llong comparison mode instead of
falling back to the .Lshort mode, which compares the buffers byte by byte.
(2) VMX instructions can speed up large comparisons; the threshold is
currently set at 4K bytes. Note that using VMX instructions incurs a
penalty for saving/restoring the VMX registers, so this patch set
includes a patch that adds a 32-byte pre-check to minimize it.
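Case (1) can be modeled in portable C. This is a hedged sketch, not the patch's memcmp_64.S, and aligned_memcmp() is a hypothetical name: it compares leading bytes until s1 reaches an 8-byte boundary, then compares 8 bytes per iteration, loading s2 through memcpy() so that s2 may sit at a different offset from s1. On a word mismatch it re-compares those 8 bytes in memory order, which sidesteps endianness; an optimized version would instead load words in big-endian order so the integer comparison itself yields the sign.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Compare n bytes one at a time; returns -1, 0 or 1 with memcmp() sign. */
static int byte_cmp(const unsigned char *a, const unsigned char *b, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (a[i] != b[i])
			return a[i] < b[i] ? -1 : 1;
	return 0;
}

/*
 * Hypothetical model of case (1): align s1 to 8 bytes first, then
 * compare 8 bytes per iteration instead of staying byte-by-byte.
 */
int aligned_memcmp(const void *s1, const void *s2, size_t n)
{
	const unsigned char *a = s1, *b = s2;
	size_t head = (8 - ((uintptr_t)a & 7)) & 7;	/* bytes until a is aligned */
	int r;

	if (head > n)
		head = n;
	r = byte_cmp(a, b, head);
	if (r)
		return r;
	a += head;
	b += head;
	n -= head;

	while (n >= 8) {
		uint64_t wa, wb;

		memcpy(&wa, a, 8);	/* a is 8-byte aligned here */
		memcpy(&wb, b, 8);	/* b may still be unaligned */
		if (wa != wb)
			/* Integer order depends on endianness, so resolve
			 * the mismatch in memory (byte) order instead. */
			return byte_cmp(a, b, 8);
		a += 8;
		b += 8;
		n -= 8;
	}
	return byte_cmp(a, b, n);	/* 0..7 trailing bytes */
}
```

Only the sign of the result is meaningful, as with memcmp() itself.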

This is similar to glibc commit dec4a7105e (powerpc: Improve memcmp
performance for POWER8). Thanks to Cyril Bur for the information.
This patch set also updates the memcmp selftest so that it compiles
and covers large-size comparisons.
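A selftest of this kind typically checks the optimized routine against a reference over many alignments and sizes. The sketch below is hypothetical and is not the memcmp.c from this series: test_memcmp() is a placeholder byte-by-byte implementation to be swapped for the code under test, and run_case() is an illustrative name. For every src/dst offset pair in 0..15 it makes the buffers equal, flips one byte, and checks that the sign matches libc memcmp() and that the matching prefix compares equal. Passing a size above 4096 exercises the large-size (VMX) path of an implementation that has one.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical function under test: replace with the implementation to check. */
static int test_memcmp(const void *s1, const void *s2, size_t n)
{
	const unsigned char *a = s1, *b = s2;

	for (size_t i = 0; i < n; i++)
		if (a[i] != b[i])
			return a[i] < b[i] ? -1 : 1;
	return 0;
}

static int sign(int v)
{
	return (v > 0) - (v < 0);
}

/*
 * Returns the number of disagreements with libc memcmp(), or -1 if
 * allocation fails. size must be non-zero.
 */
static int run_case(size_t size, unsigned int seed)
{
	unsigned char *a = malloc(size + 16);
	unsigned char *b = malloc(size + 16);
	int failures = 0;

	if (!a || !b) {
		free(a);
		free(b);
		return -1;
	}

	srand(seed);
	for (size_t i = 0; i < size + 16; i++)
		a[i] = (unsigned char)rand();

	for (size_t o1 = 0; o1 < 16; o1++) {
		for (size_t o2 = 0; o2 < 16; o2++) {
			size_t pos = (size_t)rand() % size;

			memcpy(b + o2, a + o1, size);	/* equal contents */
			b[o2 + pos] ^= 0x5a;		/* exactly one byte differs */

			if (sign(test_memcmp(a + o1, b + o2, size)) !=
			    sign(memcmp(a + o1, b + o2, size)))
				failures++;
			if (test_memcmp(a + o1, b + o2, pos) != 0)
				failures++;	/* prefix before pos must match */
		}
	}

	free(a);
	free(b);
	return failures;
}
```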

v6 -> v7:
- add .long macros for vcmpequd/vcmpequb
- add a CPU_FTR pair so that Power7 won't invoke the Altivec instructions.
- rework some instructions for better performance or readability.

v5 -> v6:
- correct some comments/commit message.
- rename VMX_OPS_THRES to VMX_THRESH

v4 -> v5:
- Extend the 32-byte pre-check to the case where src/dst have different
offsets, and remove the KSM-specific label/comment.

v3 -> v4:
- Add a 32-byte pre-check before using VMX instructions.
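The threshold and pre-check amount to a small dispatch, sketched below as a portable C model rather than the actual assembly. Assumptions, all hypothetical: stub_enter_vmx_ops()/stub_exit_vmx_ops() stand in for the kernel's enter_vmx_ops()/exit_vmx_ops() and only update counters, and vector_cmp() imitates a 16-byte-wide vector loop with scalar code. Buffers of at least VMX_THRESH (4K) bytes first compare 32 bytes, and only if those match is the VMX save/restore cost paid. The vmx_depth counter also illustrates the enter/exit pairing that the v3 test revision checks.

```c
#include <stddef.h>
#include <string.h>

#define VMX_THRESH 4096	/* 4K threshold described in this cover letter */

/* Hypothetical instrumentation for the stubs below. */
static int vmx_entries;	/* times the VMX path was entered */
static int vmx_depth;	/* enter/exit balance; 0 between calls */

/* Stand-ins for enter_vmx_ops()/exit_vmx_ops(): counters only. */
static void stub_enter_vmx_ops(void)
{
	vmx_entries++;
	vmx_depth++;
}

static void stub_exit_vmx_ops(void)
{
	vmx_depth--;
}

static int byte_cmp(const unsigned char *a, const unsigned char *b, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (a[i] != b[i])
			return a[i] < b[i] ? -1 : 1;
	return 0;
}

/* Scalar model of a loop that compares 16 bytes (one vector) per step. */
static int vector_cmp(const unsigned char *a, const unsigned char *b, size_t n)
{
	while (n >= 16) {
		int r = byte_cmp(a, b, 16);

		if (r)
			return r;
		a += 16;
		b += 16;
		n -= 16;
	}
	return byte_cmp(a, b, n);
}

int precheck_memcmp(const void *s1, const void *s2, size_t n)
{
	const unsigned char *a = s1, *b = s2;
	int r;

	if (n < VMX_THRESH)
		return byte_cmp(a, b, n);	/* short buffers: scalar only */

	/* Pre-check: a mismatch in the first 32 bytes is answered
	 * without paying for the VMX register save/restore. */
	r = byte_cmp(a, b, 32);
	if (r)
		return r;

	stub_enter_vmx_ops();
	r = vector_cmp(a + 32, b + 32, n - 32);
	stub_exit_vmx_ops();	/* single exit path keeps enter/exit paired */
	return r;
}
```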

v2 -> v3:
- add optimization for src/dst with different offsets relative to the
8-byte boundary.
- renamed some label names.
- addressed some comments from Cyril Bur, such as filling the pipeline
and using VMX when size == 4K.
- fix a bug where enter/exit_vmx_ops calls were not paired, and revise
the test case to check that enter/exit_vmx_ops are paired.

v1 -> v2:
- update the comparison method for bytes that are not 8-byte aligned.
- fix a VMX comparison bug.
- enhance the original memcmp() selftest.
- add powerpc/64 to subject/commit message.


Simon Guo (5):
  powerpc/64: Align bytes before fall back to .Lshort in powerpc64
    memcmp()
  powerpc: add vcmpequd/vcmpequb ppc instruction macro
  powerpc/64: enhance memcmp() with VMX instruction for long bytes
    comparision
  powerpc/64: add 32 bytes prechecking before using VMX optimization on
    memcmp()
  powerpc:selftest update memcmp_64 selftest for VMX implementation

 arch/powerpc/include/asm/asm-prototypes.h          |   4 +-
 arch/powerpc/include/asm/ppc-opcode.h              |  11 +
 arch/powerpc/lib/copypage_power7.S                 |   4 +-
 arch/powerpc/lib/memcmp_64.S                       | 412 ++++++++++++++++++++-
 arch/powerpc/lib/memcpy_power7.S                   |   6 +-
 arch/powerpc/lib/vmx-helper.c                      |   4 +-
 .../selftests/powerpc/copyloops/asm/ppc_asm.h      |   4 +-
 .../selftests/powerpc/stringloops/asm/ppc-opcode.h |  39 ++
 .../selftests/powerpc/stringloops/asm/ppc_asm.h    |  24 ++
 .../testing/selftests/powerpc/stringloops/memcmp.c |  98 +++--
 10 files changed, 566 insertions(+), 40 deletions(-)
 create mode 100644 tools/testing/selftests/powerpc/stringloops/asm/ppc-opcode.h

-- 
1.8.3.1
