* x86 rte_memcpy_aligned possible optimization
@ 2023-03-27 11:45 Morten Brørup
  0 siblings, 0 replies; only message in thread
From: Morten Brørup @ 2023-03-27 11:45 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev, Zhihong Wang

Hi Bruce,

I think one of the loops in rte_memcpy_aligned() takes one too many rounds in the case where the catch-up could replace the last round.

Consider e.g. n = 128:

The 64-byte block copy loop will take two rounds, and the catch-up will then copy the last 64 bytes a second time.

I think the 64-byte block copy loop could take only one round and let the catch-up copy the last 64 bytes.

I'm not sure if my suggested method is generally faster than the current method, so I'm passing the ball.

PS: It looks like something similar could be done for the other block-copy loops in this file. I haven't dug into the details.


static __rte_always_inline void *
rte_memcpy_aligned(void *dst, const void *src, size_t n)
{
	void *ret = dst;

	/* Copy size < 16 bytes */
	if (n < 16) {
		return rte_mov15_or_less(dst, src, n);
	}

	/* Copy 16 <= size <= 32 bytes */
	if (n <= 32) {
		rte_mov16((uint8_t *)dst, (const uint8_t *)src);
		rte_mov16((uint8_t *)dst - 16 + n,
				(const uint8_t *)src - 16 + n);

		return ret;
	}

	/* Copy 32 < size <= 64 bytes */
	if (n <= 64) {
		rte_mov32((uint8_t *)dst, (const uint8_t *)src);
		rte_mov32((uint8_t *)dst - 32 + n,
				(const uint8_t *)src - 32 + n);

		return ret;
	}

	/* Copy 64 bytes blocks */
-	for (; n >= 64; n -= 64) {
+	for (; n > 64; n -= 64) {
		rte_mov64((uint8_t *)dst, (const uint8_t *)src);
		dst = (uint8_t *)dst + 64;
		src = (const uint8_t *)src + 64;
	}

	/* Copy whatever left */
	rte_mov64((uint8_t *)dst - 64 + n,
			(const uint8_t *)src - 64 + n);

	return ret;
}


Med venlig hilsen / Kind regards,
-Morten Brørup


