Re: Memory keys and io_uring.

From: Jens Axboe <axboe@kernel.dk>
To: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Michael Ellerman <mpe@ellerman.id.au>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: Memory keys and io_uring.
Date: Fri, 12 Feb 2021 08:37:57 -0700	[thread overview]
Message-ID: <0ec1943b-4004-66bd-5a8f-2daf86de3349@kernel.dk> (raw)
In-Reply-To: <4ed6cbf6-b850-dac5-88c6-03e58dfc6631@linux.ibm.com>

On 2/12/21 8:33 AM, Aneesh Kumar K.V wrote:
> On 2/12/21 8:45 PM, Jens Axboe wrote:
>> On 2/11/21 11:59 PM, Aneesh Kumar K.V wrote:
>>>
>>> Hi,
>>>
>>> I am trying to estabilish the behaviour we should expect when passing a
>>> buffer with memory keys attached to io_uring syscalls. As show  in the
>>> blow test
>>>
>>> /*
>>>   * gcc -Wall -O2 -D_GNU_SOURCE -o pkey_uring pkey_uring.c -luring
>>>   */
>>> #include <stdio.h>
>>> #include <fcntl.h>
>>> #include <string.h>
>>> #include <stdlib.h>
>>> #include <unistd.h>
>>> #include <sys/mman.h>
>>> #include "liburing.h"
>>>
>>> #define PAGE_SIZE  (64 << 10)
>>>
>>> int main(int argc, char *argv[])
>>> {
>>> 	int fd, ret, pkey;
>>> 	struct io_uring ring;
>>> 	struct io_uring_sqe *sqe;
>>> 	struct io_uring_cqe *cqe;
>>> 	struct iovec iovec;
>>> 	void *buf;
>>>
>>> 	if (argc < 2) {
>>> 		printf("%s: file\n", argv[0]);
>>> 		return 1;
>>> 	}
>>>
>>> 	ret = io_uring_queue_init(1, &ring, IORING_SETUP_SQPOLL);
>>> 	if (ret < 0) {
>>> 		fprintf(stderr, "queue_init: %s\n", strerror(-ret));
>>> 		return 1;
>>> 	}
>>>
>>> 	fd = open(argv[1], O_RDONLY | O_DIRECT);
>>> 	if (fd < 0) {
>>> 		perror("open");
>>> 		return 1;
>>> 	}
>>>
>>> 	if (posix_memalign(&buf, PAGE_SIZE, PAGE_SIZE))
>>> 		return 1;
>>> 	iovec.iov_base = buf;
>>> 	iovec.iov_len = PAGE_SIZE;
>>>
>>> 	//mprotect(buf, PAGE_SIZE, PROT_NONE);
>>> 	pkey = pkey_alloc(0, PKEY_DISABLE_WRITE);
>>> 	pkey_mprotect(buf, PAGE_SIZE, PROT_READ | PROT_WRITE, pkey);
>>>
>>>
>>> 	sqe = io_uring_get_sqe(&ring);
>>> 	if (!sqe) {
>>> 		perror("io_uring_get_sqe");
>>> 		return 1;
>>> 	}
>>> 	io_uring_prep_readv(sqe, fd, &iovec, 1, 0);
>>>
>>> 	ret = io_uring_submit(&ring);
>>> 	if (ret != 1) {
>>> 		fprintf(stderr, "io_uring_submit: %s\n", strerror(-ret));
>>> 		return 1;
>>> 	}
>>>
>>> 	ret = io_uring_wait_cqe(&ring, &cqe);
>>>
>>> 	if (cqe->res < 0)
>>> 		fprintf(stderr, "iouring submit failed %s\n", strerror(-cqe->res));
>>> 	else
>>> 		fprintf(stderr, "iouring submit success\n");
>>>
>>> 	io_uring_cqe_seen(&ring, cqe);
>>>
>>> 	/*
>>> 	 * let's access this via a read syscall
>>> 	 */
>>> 	ret = read(fd, buf, PAGE_SIZE);
>>> 	if (ret < 0)
>>> 		fprintf(stderr, "read failed : %s\n", strerror(errno));
>>>
>>> 	close(fd);
>>> 	io_uring_queue_exit(&ring);
>>>
>>> 	return 0;
>>> }
>>>
>>> A read syscall do fail with EFAULT. But we allow read via io_uring
>>> syscalls. Is that ok? Considering memory keys are thread-specific we
>>> could debate that kernel thread can be considered to be the one that got all access
>>> allowed via keys or we could update that access is denied via kernel
>>> thread for any key value other than default key (key 0). Other option
>>> is to inherit the memory key restrictions when doing
>>> io_uring_submit() and use the same when accessing the userspace from
>>> kernel thread.
>>>
>>> Any thoughts here with respect to what should be behaviour?
>>
>> It this a powerpc thing? I get -EFAULT on x86 for both reads, io_uring
>> and regular syscall. That includes SQPOLL, not using SQPOLL, or
>> explicitly setting IOSQE_ASYNC on the sqe.
>>
> 
> Interesting, I didn't check x86 because i don't have hardware that 
> supports memory keys. I am trying to make ppc64 behavior compatible with 
> other archs here.
> 
> IIUC, in your test io_wqe/sqe kernel thread did hit access fault when 
> touching the buffer on x86? That is different from what Dave explained 
> earlier.

Yes, all four methods (task inline, task_work, SQPOLL, io-wq offload)
return -EFAULT for me on x86.

> With the patch 8c511eff1827 ("powerpc/kuap: Allow kernel thread to 
> access userspace after kthread_use_mm") I now have key 0 access  allowed 
> but all other keys denied with ppc64. I was planning to change that to 
> allow all key access based on reply from Dave.  I would be curious to 
> understand what made x86 deny the access and how did kthread inherit the 
> key details.

I'm not very familiar with the memory protection for pkeys and how it's
done on various archs, so not going to be of much help there... But
io_uring assumes the right mm for any of these accesses, so if it's tied
to that, then it should work as it does on x86.

-- 
Jens Axboe