Date: Mon, 12 Dec 2016 17:15:55 -0800 (PST)
From: Stefano Stabellini
To: Al Viro
cc: Stefano Stabellini, v9fs-developer@lists.sourceforge.net,
    ericvh@gmail.com, rminnich@sandia.gov, lucho@ionkov.net,
    linux-kernel@vger.kernel.org
Subject: Re: [PATCH 4/5] 9p: introduce async read requests
In-Reply-To: <20161210015059.GD1555@ZenIV.linux.org.uk>
References: <1481230746-16741-1-git-send-email-sstabellini@kernel.org>
            <1481230746-16741-4-git-send-email-sstabellini@kernel.org>
            <20161210015059.GD1555@ZenIV.linux.org.uk>
User-Agent: Alpine 2.10 (DEB 1266 2009-07-14)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

Hi Al,

thanks for looking at the patch.

On Sat, 10 Dec 2016, Al Viro wrote:
> On Thu, Dec 08, 2016 at 12:59:05PM -0800, Stefano Stabellini wrote:
> >
> > +	} else {
> > +		req = p9_client_get_req(clnt, P9_TREAD, "dqd", fid->fid, offset, rsize);
> > +		if (IS_ERR(req)) {
> > +			*err = PTR_ERR(req);
> > +			break;
> > +		}
> > +		req->rsize = iov_iter_get_pages_alloc(to, &req->pagevec,
> > +				(size_t)rsize, &req->offset);
> > +		req->kiocb = iocb;
> > +		for (i = 0; i < req->rsize; i += PAGE_SIZE)
> > +			page_cache_get_speculative(req->pagevec[i/PAGE_SIZE]);
> > +		req->callback = p9_client_read_complete;
> > +
> > +		*err = clnt->trans_mod->request(clnt, req);
> > +		if (*err < 0) {
> > +			clnt->status = Disconnected;
> > +			release_pages(req->pagevec,
> > +				      (req->rsize + PAGE_SIZE - 1) / PAGE_SIZE,
> > +				      true);
> > +			kvfree(req->pagevec);
> > +			p9_free_req(clnt, req);
> > +			break;
> > +		}
> > +
> > +		*err = -EIOCBQUEUED;
>
> IDGI.  AFAICS, your code will result in shitloads of short reads - every
> time when you give it a multi-iovec array, only the first one will be
> issued and the rest won't be even looked at.  Sure, it is technically
> legal, but I very much doubt that aio users will be happy with that.
>
> What am I missing here?

There is a problem with short reads, but I don't think it is the one you
described, unless I made a mistake somewhere. The code above issues one
request as big as possible (the size is limited by clnt->msize, which is
the size of the channel). No matter how many segments are in the
iov_iter, the request covers the maximum number of bytes allowed by the
channel, typically 4K but possibly larger. So multi-iovec arrays should
be handled correctly up to msize. Please let me know if I am wrong; I am
not an expert on this. I tried, but could not actually get any
multi-iovec arrays in my tests.

That said, reading the code again, there are indeed two possible causes
of short reads. The first is responses that carry a smaller byte count
than requested. Currently this case is not handled, and the short read
is simply forwarded up to the user. But I have written and successfully
tested a patch that issues another follow-up request from the completion
function.
Example:

  p9_client_read            issue request, size 4K
  p9_client_read_complete   receive response, count 2K
  p9_client_read_complete   issue request, size 4K - 2K = 2K
  p9_client_read_complete   receive response, count 2K
  p9_client_read_complete   call ki_complete

The second possible cause of short reads is requests that are bigger
than msize, for example a 2MB read over a 4K channel. In this patch we
simply issue one request as big as msize and eventually return a smaller
byte count to the caller. One option is to simply fall back to sync IO
in these cases. Another solution is to use the same technique described
above: we issue the first request as big as msize, then, from the
callback function, we issue a follow-up request, and again, and again,
until the large read is fully completed. This way, although we are not
issuing any simultaneous requests for a single large read, we can at
least issue other aio requests in parallel for other reads or writes. It
should still perform better than sync IO. I am thinking of implementing
this second option in the next version of the series.
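To make the callback-chaining idea concrete, below is a rough sketch of
a completion function that keeps re-issuing TREAD requests until the
originally requested byte count is satisfied. It is illustrative only:
p9_client_get_req(), req->callback and req->kiocb come from this patch
series rather than mainline, the callback signature is a guess, and the
a_fid/a_offset/a_count/a_total fields are hypothetical bookkeeping
invented for the example. Copying the payload into req->pagevec and
releasing the pages is omitted for brevity.

/*
 * Illustrative sketch, not the posted patch: chain TREAD requests from
 * the completion callback until the original iocb is satisfied, then
 * report the accumulated byte count via ki_complete().
 */
static void p9_client_read_complete(struct p9_client *clnt,
                                    struct p9_req_t *req, int status)
{
        struct p9_req_t *next;
        char *dataptr;
        int count = 0;

        if (status >= 0)
                /* parse the Rread byte count, as the sync read path does */
                status = p9pdu_readf(req->rc, clnt->proto_version, "D",
                                     &count, &dataptr);

        if (status < 0 || count == 0)
                goto complete;                  /* error or EOF */

        /* hypothetical bookkeeping fields, not part of the patch */
        req->a_total  += count;
        req->a_offset += count;
        req->a_count  -= count;
        if (req->a_count == 0)
                goto complete;                  /* read fully satisfied */

        /* short response: issue a follow-up TREAD for the leftover bytes */
        next = p9_client_get_req(clnt, P9_TREAD, "dqd", req->a_fid,
                                 req->a_offset, req->a_count);
        if (IS_ERR(next)) {
                status = PTR_ERR(next);
                goto complete;
        }

        /* carry the state over; handing over req->pagevec is omitted */
        next->kiocb    = req->kiocb;
        next->a_fid    = req->a_fid;
        next->a_offset = req->a_offset;
        next->a_count  = req->a_count;
        next->a_total  = req->a_total;
        next->callback = p9_client_read_complete;

        status = clnt->trans_mod->request(clnt, next);
        if (status < 0) {
                p9_free_req(clnt, next);
                goto complete;
        }

        p9_free_req(clnt, req);
        return;                                 /* wait for the next Rread */

complete:
        req->kiocb->ki_complete(req->kiocb,
                                req->a_total ? req->a_total : status, 0);
        p9_free_req(clnt, req);
}

The same loop would naturally cover reads larger than msize: the first
request is capped at msize and each completion issues the next chunk, so
a 2MB read over a 4K channel proceeds as a chain of msize-sized TREADs
while other aio requests can still be submitted in parallel.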