linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Doubts regarding iov_iter_get_pages{,_alloc}
@ 2016-12-11 20:23 Wolfgang Draxinger
  0 siblings, 0 replies; only message in thread
From: Wolfgang Draxinger @ 2016-12-11 20:23 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: viro, matthew

Hi,

I'm currently reworking a device driver (unfortunately it's mostly
under NDA and I can't share details about it) and am at the point of
replacing the old jenga tower of "DIY async read/write via custom
ioctls" with something that uses the kernel's AIO infrastructure.

Essentially where I'm currently at is taking an iov_iter and produce a
DMA scatterlist from it (and I definitely think that THAT should be
something readily available, will probably contribute my implementation
once it's done). There are a few extra quirks in what I need, but those
are not important here.

Anyway, in trying to understand what I have to do to this end I was
looking at iov_iter_get_pages (and its _alloc variant), which raised
more questions than answering. So here's the code as found in linux-4.8
(linux/lib/iov_iter.c +585)

ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
		   struct page ***pages, size_t maxsize,
		   size_t *start)
{
	struct page **p;

	if (maxsize > i->count)
		maxsize = i->count;

	if (!maxsize)
		return 0;

	iterate_all_kinds(i, maxsize, v, ({
		unsigned long addr = (unsigned long)v.iov_base;
		size_t len = v.iov_len + (*start = addr & (PAGE_SIZE -	1));
		int n;
		int res;

		addr &= ~(PAGE_SIZE - 1);
		n = DIV_ROUND_UP(len, PAGE_SIZE);
		p = get_pages_array(n);
		if (!p)
			return -ENOMEM;
		res = get_user_pages_fast(addr, n, (i->type & WRITE) != WRITE, p);
		if (unlikely(res < 0)) {
			kvfree(p);
			return res;
		}
		*pages = p;
		return (res == n ? len : res * PAGE_SIZE) - *start;
	0;}),({
		/* can't be more than PAGE_SIZE */
		*start = v.bv_offset;
		*pages = p = get_pages_array(1);
		if (!p)
			return -ENOMEM;
		get_page(*p = v.bv_page);
		return v.bv_len;
	}),({
		return -EFAULT;
	})
	)
	return 0;
}
EXPORT_SYMBOL(iov_iter_get_pages_alloc);

Here's the thing I'm dumbfounded about: This thing will never iterate
past the first entry of an iov_iter (never mind the superfluous(?) `0;`
at the end of the ITER_IOVEC step; I mean whatever that compound statement
expression is the r-value of, that return preempts it).
The execution path of every kind of step ends in an unconditional return
statement, thereby (seemingly?) cutting the whole thing short.

In case of the _alloc variant this is probably a good thing, as
otherwise the memory allocated for the page array would be leaked in
subsequent iterations.

Which leaves me with the following doubts:

What is the rationale for looking at only the very first element in an
iov_iter?

If this somehow does give the pages of everything an iov_iter covers,
how does this work?

If this gives the pages of only the very first element in an iov_iter
is this the intended behaviour of the API and what's the recommended
way of retrieving the pages of all the elements in an iov_iter?


Kind Regards,

Wolfgang

^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2016-12-11 20:33 UTC | newest]

Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-12-11 20:23 Doubts regarding iov_iter_get_pages{,_alloc} Wolfgang Draxinger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).