From: Ingo Molnar <mingo@kernel.org>
To: Alexey Dobriyan <adobriyan@gmail.com>
Cc: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	hpa@zytor.com, x86@kernel.org, linux-kernel@vger.kernel.org,
	torvalds@linux-foundation.org
Subject: Re: [PATCH v-1] x86_64: new and improved memset() + question
Date: Mon, 11 Feb 2019 13:47:16 +0100
Message-ID: <20190211124716.GA13062@gmail.com>
In-Reply-To: <20190117222318.GA10338@avx2>


* Alexey Dobriyan <adobriyan@gmail.com> wrote:

> Current memset() implementation does silly things:
> * multiplication to get wide constant:
> 	waste of cycles if filler is known at compile time,
> 
> * REP STOSQ followed by REP STOSB:
> 	this path is selected when REP STOSB is slow, yet REP STOSB is
> 	still used for the short tail (< 8 bytes), where its setup
> 	overhead is relatively large,
> 
> * suboptimal calling convention:
> 	REP STOSB/STOSQ favours (rdi, rcx)
> 
> * memset_orig():
> 	it is hard to even look at it :^)
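
As a rough illustration of the first two points quoted above, here is a
portable C sketch of the "widen by multiplication, fill qwords, then fill
the byte tail" shape. It is not the kernel's assembly: widen_filler() and
two_phase_memset() are hypothetical names, and the plain loops only stand
in for REP STOSQ and REP STOSB.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Widen one filler byte to 8 identical bytes with a multiplication. */
static uint64_t widen_filler(unsigned char c)
{
	return (uint64_t)c * 0x0101010101010101ULL;
}

/*
 * Hypothetical stand-in for "REP STOSQ followed by REP STOSB":
 * whole 8-byte words first, then the remaining 0..7 bytes one by one.
 */
static void *two_phase_memset(void *dst, int c, size_t len)
{
	unsigned char *p = dst;
	uint64_t wide = widen_filler((unsigned char)c);
	size_t qwords = len / 8;	/* count for the qword phase */
	size_t tail = len % 8;		/* leftover bytes for the byte phase */

	while (qwords--) {
		memcpy(p, &wide, sizeof(wide));	/* unaligned-safe 8-byte store */
		p += sizeof(wide);
	}
	while (tail--)
		*p++ = (unsigned char)c;

	return dst;
}
```

Both the multiplication and the tail loop run on every call here, even
when the filler is a compile-time constant and the length is a multiple
of 8, which is the overhead the quoted list objects to.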
> 
> New implementation is based on the following observations:
> * c == 0 is the most common form,
> 	filler can be done with "xor eax, eax" and pushed into memset()
> 	saving 2 bytes per call and multiplication
> 
> * len divisible by 8 is the most common form:
> 	all it takes is one pointer or unsigned long inside structure,
> 	dispatch at compile time to code without those ugly "let's fill
> 	at most 7 bytes" tails,
> 
> * multiplication to get wider filler value can be done at compile time
>   for "c != 0", costing at most 1 insn (10 bytes) and saving the
>   runtime multiplication.
> 
> * those leaner forms of memset can be done within 3/4 registers (RDI,
>   RCX, RAX, [RSI]) saving the rest from clobbering.
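
The compile-time dispatch idea in the observations above can be sketched
in C roughly as follows. This is only an illustration under assumptions,
not the code from the patch: sketch_memset() and fill_qwords() are
hypothetical names, and __builtin_constant_p() is a GCC/Clang builtin.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: store `len` bytes of the 8-byte pattern `wide`.
 * The caller guarantees that len is a multiple of 8, so no tail is needed.
 */
static inline void *fill_qwords(void *dst, uint64_t wide, size_t len)
{
	unsigned char *p = dst;

	for (size_t i = 0; i < len; i += sizeof(wide))
		memcpy(p + i, &wide, sizeof(wide));
	return dst;
}

/*
 * Hypothetical dispatcher (assumes GCC or Clang):
 * - constant length divisible by 8: tail-free qword fill; the filler
 *   multiplication folds to a constant when c is a constant too,
 * - anything else: fall back to the generic memset().
 */
#define sketch_memset(dst, c, len)					\
	((__builtin_constant_p(len) && (len) % 8 == 0)			\
		? fill_qwords((dst), (uint64_t)(unsigned char)(c) *	\
				     0x0101010101010101ULL, (len))	\
		: memset((dst), (c), (len)))
```

With a call such as sketch_memset(&s, 0, sizeof(s)) on a structure whose
size is a multiple of 8, the compiler can pick the tail-free branch at
build time; a length only known at run time takes the memset() branch.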

Ok, sorry about the belated reply - all that sounds like very nice 
improvements!

> Note: "memset0" name is chosen because "bzero" is officially deprecated.
> Note: memset(,0,) form is interleaved into memset(,c,) form to save
> space.
> 
> QUESTION: is it possible to tell gcc "this function is semantically
> equivalent to memset(3) so make high level optimizations but call it
> when it is necessary"? I suspect the answer is "no" :-\

No idea ...

> TODO:
> 	CONFIG_FORTIFY_SOURCE is enabled by distros
> 	benchmarks
> 	testing
> 	more comments
> 	check with memset_io() so that no surprises pop up

I'd only like to make happy noises here to make sure you continue with 
this work - it does look promising. :-)
 
Thanks,

	Ingo


Thread overview: 3+ messages
2019-01-17 22:23 [PATCH v-1] x86_64: new and improved memset() + question Alexey Dobriyan
2019-02-11 12:47 ` Ingo Molnar [this message]
2019-02-11 17:10   ` Alexey Dobriyan
