util-linux.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] libuuid: use kernel crypto api
@ 2018-08-04 19:17 Sami Kerola
  2018-08-04 19:27 ` Dmitry V. Levin
  2018-08-04 19:46 ` Theodore Y. Ts'o
  0 siblings, 2 replies; 6+ messages in thread
From: Sami Kerola @ 2018-08-04 19:17 UTC (permalink / raw)
  To: util-linux; +Cc: Sami Kerola

Try to speed up md5 checksums by using hardware accelerators when they are
present with use of kernel crypto api.

Reference: http://www.chronox.de/libkcapi/html/ch01s02.html
Signed-off-by: Sami Kerola <kerolasa@iki.fi>
---
 libuuid/src/gen_uuid.c | 51 +++++++++++++++++++++++++++++++++++-------
 1 file changed, 43 insertions(+), 8 deletions(-)

diff --git a/libuuid/src/gen_uuid.c b/libuuid/src/gen_uuid.c
index a374e75c9..a1867460e 100644
--- a/libuuid/src/gen_uuid.c
+++ b/libuuid/src/gen_uuid.c
@@ -80,6 +80,8 @@
 #if defined(__linux__) && defined(HAVE_SYS_SYSCALL_H)
 #include <sys/syscall.h>
 #endif
+#include <linux/if_alg.h>
+#include <sys/uio.h>
 
 #include "all-io.h"
 #include "uuidP.h"
@@ -564,17 +566,50 @@ void uuid_generate(uuid_t out)
  */
 void uuid_generate_md5(uuid_t out, const uuid_t ns, const char *name, size_t len)
 {
-	UL_MD5_CTX ctx;
-	char hash[UL_MD5LENGTH];
+	int tfmfd;
+	struct sockaddr_alg sa = {
+		.salg_family = AF_ALG,
+		.salg_type = "hash",
+		.salg_name = "md5"
+	};
+
+	tfmfd = socket(AF_ALG, SOCK_SEQPACKET, 0);
+ old_md5:
+	if (tfmfd != -1) {
+		int opfd = -1, fail = 0;
+		struct iovec const iov[2] = {
+			{.iov_base = (void *)ns, .iov_len = sizeof(uuid_t)},
+			{.iov_base = (void *)name, .iov_len = len}
+		};
+
+		if (bind(tfmfd, (struct sockaddr *)&sa, sizeof(sa)) < 0)
+			fail = 1;
+		if (!fail && (opfd = accept(tfmfd, NULL, 0)) < 0)
+			fail = 1;
+		if (!fail && writev(opfd, iov, 2) < 0)
+			fail = 1;
+		if (!fail && read(opfd, out, sizeof(uuid_t)) == -1)
+			fail = 1;
+		if (opfd != -1)
+			close(opfd);
+		close(tfmfd);
+		if (fail) {
+			tfmfd = -1;
+			goto old_md5;
+		}
+	} else {
+		UL_MD5_CTX ctx;
+		char hash[UL_MD5LENGTH];
 
-	ul_MD5Init(&ctx);
-	/* hash concatenation of well-known UUID with name */
-	ul_MD5Update(&ctx, ns, sizeof(uuid_t));
-	ul_MD5Update(&ctx, (const unsigned char *)name, len);
+		ul_MD5Init(&ctx);
+		/* hash concatenation of well-known UUID with name */
+		ul_MD5Update(&ctx, ns, sizeof(uuid_t));
+		ul_MD5Update(&ctx, (const unsigned char *)name, len);
 
-	ul_MD5Final((unsigned char *)hash, &ctx);
+		ul_MD5Final((unsigned char *)hash, &ctx);
 
-	memcpy(out, hash, sizeof(uuid_t));
+		memcpy(out, hash, sizeof(uuid_t));
+	}
 
 	out[6] &= ~(UUID_TYPE_MASK << UUID_TYPE_SHIFT);
 	out[6] |= (UUID_TYPE_DCE_MD5 << UUID_TYPE_SHIFT);
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] libuuid: use kernel crypto api
  2018-08-04 19:17 [PATCH] libuuid: use kernel crypto api Sami Kerola
@ 2018-08-04 19:27 ` Dmitry V. Levin
  2018-08-04 19:46 ` Theodore Y. Ts'o
  1 sibling, 0 replies; 6+ messages in thread
From: Dmitry V. Levin @ 2018-08-04 19:27 UTC (permalink / raw)
  To: util-linux

[-- Attachment #1: Type: text/plain, Size: 380 bytes --]

On Sat, Aug 04, 2018 at 08:17:06PM +0100, Sami Kerola wrote:
> Try to speed up md5 checksums by using hardware accelerators when they are
> present with use of kernel crypto api.
> 
> Reference: http://www.chronox.de/libkcapi/html/ch01s02.html

What if there is no speedup?

See lengthy discussions in the gnulib mailing list (April..June) on this subject.


-- 
ldv

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] libuuid: use kernel crypto api
  2018-08-04 19:17 [PATCH] libuuid: use kernel crypto api Sami Kerola
  2018-08-04 19:27 ` Dmitry V. Levin
@ 2018-08-04 19:46 ` Theodore Y. Ts'o
  2018-08-05 10:42   ` Sami Kerola
  1 sibling, 1 reply; 6+ messages in thread
From: Theodore Y. Ts'o @ 2018-08-04 19:46 UTC (permalink / raw)
  To: Sami Kerola; +Cc: util-linux

On Sat, Aug 04, 2018 at 08:17:06PM +0100, Sami Kerola wrote:
> Try to speed up md5 checksums by using hardware accelerators when they are
> present with use of kernel crypto api.
> 
> Reference: http://www.chronox.de/libkcapi/html/ch01s02.html
> Signed-off-by: Sami Kerola <kerolasa@iki.fi>

I'm not sure I see the point of this change.  Is there a use case you
have in mind where people would be using crypto hash based UUID's
where performance is a concern?  In in that case where someone was
trying to create a huge number of crypto hash based UUID's, how much
overhead is there in going through the kernel API if there is *not* a
hardware accelerator present?

						- Ted

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] libuuid: use kernel crypto api
  2018-08-04 19:46 ` Theodore Y. Ts'o
@ 2018-08-05 10:42   ` Sami Kerola
  2018-08-05 23:41     ` Peter Cordes
  2018-08-06  6:56     ` Karel Zak
  0 siblings, 2 replies; 6+ messages in thread
From: Sami Kerola @ 2018-08-05 10:42 UTC (permalink / raw)
  To: Theodore Y. Ts'o; +Cc: Dmitry V. Levin, util-linux

On Sat, 4 Aug 2018, Theodore Y. Ts'o wrote:

> On Sat, Aug 04, 2018 at 08:17:06PM +0100, Sami Kerola wrote:
> > Try to speed up md5 checksums by using hardware accelerators when they are
> > present with use of kernel crypto api.
> > 
> > Reference: http://www.chronox.de/libkcapi/html/ch01s02.html
> > Signed-off-by: Sami Kerola <kerolasa@iki.fi>
> 
> I'm not sure I see the point of this change.  Is there a use case you
> have in mind where people would be using crypto hash based UUID's
> where performance is a concern?  In in that case where someone was
> trying to create a huge number of crypto hash based UUID's, how much
> overhead is there in going through the kernel API if there is *not* a
> hardware accelerator present?
> 
> 						- Ted

Hi Dmitry & Ted,

I should have told in that commit message part of the motivation was to 
deprecate util-linux local md5 implementation. But since both of you 
raised concern about performance I decided to test kernel api and 
util-linux implementations as close the same way as they are used in 
libuuid.

Executive summary: kernel api is surprisingly slow.

The following test is using short input messages (1 to 64 bytes) to md5 
functions. I assume most uuid use is in this range. As you can see in perf 
print out the kernel api is about 10x slower. When inputs are up to 1024 
bytes slowdown is about x4. With 10 kilobyte chucks these are about as 
quick. Maybe I am wrong when thinking most uuid's tend to be based on 
small inputs, so the slow down is relevant.

Thank you both for pointing out the change smelled rats. I did not realize 
kernel api could be so expensive to use. That said I am no longer in 
favour of merging this change.

/*
$ dd if=/dev/urandom of=./test.in count=1 bs=10M
$ cat ./test.in >/dev/null # ensure page cache is warm

$ perf stat md5-test --kernel-api ./test.in

 Performance counter stats for 'md5-test --kernel-api ./test.in':

       2462.555389      task-clock:u (msec)       #    0.963 CPUs utilized
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
                56      page-faults:u             #    0.023 K/sec
       213,119,356      cycles:u                  #    0.087 GHz
        65,763,609      instructions:u            #    0.31  insn per cycle
        19,412,627      branches:u                #    7.883 M/sec
           131,530      branch-misses:u           #    0.68% of all branches

       2.557148339 seconds time elapsed

$ perf stat md5-test --util-linux ./test.in

 Performance counter stats for 'md5-test --util-linux ./test.in':

        225.212947      task-clock:u (msec)       #    0.998 CPUs utilized
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
                57      page-faults:u             #    0.253 K/sec
       225,921,565      cycles:u                  #    1.003 GHz
       375,010,554      instructions:u            #    1.66  insn per cycle
        21,918,050      branches:u                #   97.321 M/sec
            48,356      branch-misses:u           #    0.22% of all branches

       0.225656336 seconds time elapsed
*/

#include <getopt.h>
#include <linux/if_alg.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#include "c.h"
#include "closestream.h"
#include "nls.h"
#include "strutils.h"
#include "uuid.h"
#include "xalloc.h"
#include "md5.h"

struct run_state {
	unsigned int api;
	unsigned int seed;
	size_t max_read;
	int fd;
};

static void __attribute__((__noreturn__)) usage(void)
{
	fputs(USAGE_HEADER, stdout);
	printf(" %s [options] file\n", program_invocation_short_name);

	fputs(USAGE_SEPARATOR, stdout);
	puts("Compare util-linux md5 with kernel crypto api.");

	fputs(USAGE_OPTIONS, stdout);
	puts(" -u, --util-linux       use util-linux md5 implementation (default)");
	puts(" -k, --kernel-api       use kernel api");
	puts(" -s, --seed <number>    srand(seed) value (default: 42)");
	puts(" -m, --max <number>     maximum read chunk from file (default: 64)");
	fputs(USAGE_SEPARATOR, stdout);
	printf(USAGE_HELP_OPTIONS(24));
	printf(USAGE_MAN_TAIL("md5-test(1)"));
	exit(EXIT_SUCCESS);
}

static void kernel_api(uuid_t out, const uuid_t ns, char *buf, size_t sz)
{
	int tfmfd;
	struct sockaddr_alg sa = {
		.salg_family = AF_ALG,
		.salg_type = "hash",
		.salg_name = "md5"
	};
	int opfd = -1, fail = 0;
	struct iovec const iov[2] = {
		{.iov_base = (void *)ns, .iov_len = sizeof(uuid_t)},
		{.iov_base = (void *)buf, .iov_len = sz}
	};

	tfmfd = socket(AF_ALG, SOCK_SEQPACKET, 0);
	if (bind(tfmfd, (struct sockaddr *)&sa, sizeof(sa)) < 0)
		fail = 1;
	if (!fail && (opfd = accept(tfmfd, NULL, 0)) < 0)
		fail = 1;
	if (!fail && writev(opfd, iov, 2) < 0)
		fail = 1;
	if (!fail && read(opfd, out, sizeof(uuid_t)) == -1)
		fail = 1;
	if (opfd != -1)
		close(opfd);
	close(tfmfd);
}

static void util_linux_md5(uuid_t out, const uuid_t ns, char *buf, size_t sz)
{
	UL_MD5_CTX ctx;
	char hash[UL_MD5LENGTH];

	ul_MD5Init(&ctx);
	/* hash concatenation of well-known UUID with name */
	ul_MD5Update(&ctx, ns, sizeof(uuid_t));
	ul_MD5Update(&ctx, (const unsigned char *)buf, sz);

	ul_MD5Final((unsigned char *)hash, &ctx);

	memcpy(out, hash, sizeof(uuid_t));
}

int main(int argc, char **argv)
{
	struct run_state rst = {
		.api = 0,
		.seed = 42,
		.max_read = 64,
		.fd = -1
	};
	static const struct option longopts[] = {
		{ "util-linux", no_argument,       NULL, 'u' },
		{ "kernel-api", no_argument,       NULL, 'k' },
		{ "seed",       required_argument, NULL, 's' },
		{ "max",        required_argument, NULL, 'm' },
		{ "version",    no_argument,       NULL, 'V' },
		{ "help",       no_argument,       NULL, 'h' },
		{ NULL, 0, NULL, 0 }
	};
	int c;
	char *buf;
	uuid_t out;
	uuid_t ns;

	atexit(close_stdout);

	while ((c = getopt_long(argc, argv, "uks:m:Vh", longopts, NULL)) != -1)
		switch (c) {
		case 'u':
			rst.api = 0;
			break;
		case 'k':
			rst.api = 1;
			break;
		case 's':
			rst.seed = strtou32_or_err(optarg, "failed to parse seed");
			break;
		case 'm':
			rst.max_read = strtou32_or_err(optarg, "failed to parse max chunk");
			if (rst.max_read == 0)
				err(EXIT_FAILURE, "max cannot be zero");
			break;
		case 'V':
			printf(UTIL_LINUX_VERSION);
			return EXIT_SUCCESS;
		case 'h':
			usage();
		default:
			errtryhelp(EXIT_FAILURE);
		}

	argc -= optind;
	argv += optind;
	if (0 < argc) {
		rst.fd = open(argv[0], O_RDONLY);
		if (rst.fd < 0)
			err(EXIT_FAILURE, "could not open: %s", argv[0]);
	} else
		errx(EXIT_FAILURE, "mandatory file argument missing");

	buf = xmalloc(rst.max_read);
	srand(rst.seed);

	while(1) {
		ssize_t sz;

		sz = (rand() % rst.max_read) + 1;
		if (read(rst.fd, buf, sz) != sz)
			break;
		if (rst.api)
			kernel_api(out, ns, buf, sz);
		else
			util_linux_md5(out, ns, buf, sz);
	}
	free(buf);
	close(rst.fd);
	return EXIT_SUCCESS;
}

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] libuuid: use kernel crypto api
  2018-08-05 10:42   ` Sami Kerola
@ 2018-08-05 23:41     ` Peter Cordes
  2018-08-06  6:56     ` Karel Zak
  1 sibling, 0 replies; 6+ messages in thread
From: Peter Cordes @ 2018-08-05 23:41 UTC (permalink / raw)
  To: Sami Kerola; +Cc: Theodore Y. Ts'o, Dmitry V. Levin, util-linux

On Sun, Aug 05, 2018 at 11:42:09AM +0100, Sami Kerola wrote:
> 
> I should have told in that commit message part of the motivation was to 
> deprecate util-linux local md5 implementation. But since both of you 
> raised concern about performance I decided to test kernel api and 
> util-linux implementations as close the same way as they are used in 
> libuuid.
> 
> Executive summary: kernel api is surprisingly slow.

 You're probably testing on an x86-64 system with kernel mitigation
for Spectre and Meltdown.

Both of those add *significant* overhead to every system call (or
other kernel entry/exit, like interrupts).

e.g. in comments on Stack Overflow, @BeeOnRope found that a `syscall`
instruction with an invalid call number takes about 1800 cycles on a
Skylake CPU running Linux (in late February 2018).
https://stackoverflow.com/questions/48913091/fastest-linux-system-call#comment84843442_48914200

(Unfortunately IDK if there's a better / more details analysis of
system call costs anywhere.)

Most of that cost is in the WRMSR that flushes branch predictors,
using Intel's newly-introduced (and *not* fast) microcode assistance
for Spectre.  Possibly future hardware will make this cheaper, but on
current hardware it just sucks to make system calls.

Thanks to Meltdown mitigation, you get extra TLB misses in the kernel
and after returning to user-space.  (This may be less bad than in
early patches, thanks to using hardware PCIDs).  But even just the MOV
to CR3 to change the top level page table takes some time.

I'm not surprised that you found a 10x slowdown for short messages.  
Amortizing the kernel entry/exit over a larger buffer is the only way
for it not to be horrible.

If you're curious, you could try booting with the workarounds disabled
(or an old kernel) to see how much perf difference that makes.  The
SYSCALL / SYSRET instructions themselves only take something in the
ballpark of 50 cycles on Skylake or Ryzen, IIRC, and Linux's
system-call dispatch code is pretty efficient for the fast path.  Even
that might still be measurable overhead for MD5 on a short message,
though.

-- 
#define X(x,y) x##y
Peter Cordes ;  e-mail: X(peter@cor , des.ca)

"The gods confound the man who first found out how to distinguish the hours!
 Confound him, too, who in this place set up a sundial, to cut and hack
 my day so wretchedly into small pieces!" -- Plautus, 200 BC

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] libuuid: use kernel crypto api
  2018-08-05 10:42   ` Sami Kerola
  2018-08-05 23:41     ` Peter Cordes
@ 2018-08-06  6:56     ` Karel Zak
  1 sibling, 0 replies; 6+ messages in thread
From: Karel Zak @ 2018-08-06  6:56 UTC (permalink / raw)
  To: Sami Kerola; +Cc: Theodore Y. Ts'o, Dmitry V. Levin, util-linux

On Sun, Aug 05, 2018 at 11:42:09AM +0100, Sami Kerola wrote:
> I should have told in that commit message part of the motivation was to 
> deprecate util-linux local md5 implementation.

Hmm... it will be difficult if we want to support non-Linux systems
(GNU Hurd, OSX, BSD) for some of our tools ;-)

    Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2018-08-06  9:04 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-04 19:17 [PATCH] libuuid: use kernel crypto api Sami Kerola
2018-08-04 19:27 ` Dmitry V. Levin
2018-08-04 19:46 ` Theodore Y. Ts'o
2018-08-05 10:42   ` Sami Kerola
2018-08-05 23:41     ` Peter Cordes
2018-08-06  6:56     ` Karel Zak

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).