Subject: Re: [RFC 1/7] mm: Add new vma flag VM_LOCAL_CPU
From: Boaz Harrosh
To: Miklos Szeredi
CC: Matthew Wilcox, linux-fsdevel, Ric Wheeler, Steve French, Steven Whitehouse,
    Jeff Moyer, Sage Weil, Jan Kara, Amir Goldstein, Andy Rudoff, Anna Schumaker,
    Amit Golander, Sagi Manole, Shachar Sharon
Date: Thu, 15 Mar 2018 18:30:47 +0200
Message-ID: <9bfa8d53-5693-7953-9dcf-79a8cff0b97f@netapp.com>
References: <443fea57-f165-6bed-8c8a-0a32f72b9cd2@netapp.com>
 <20180313185658.GB21538@bombadil.infradead.org>
 <07cda3e5-c911-a49b-fceb-052f8ca57e66@netapp.com>

On 15/03/18 18:10, Miklos Szeredi wrote:
<>
>> This can never properly translate. Even a simple file on disk
>> is linear for the app (unaligned buffer) but is scattered over
>> multiple blocks on disk. Yes, perhaps networking can somewhat work
>> if you pre/post-pend the headers you need.
>> And you force direct-IO semantics on everything, especially the APP.
>> With my system you can do zero copy with any kind of application.
>
> I lost you there, sorry.
>
> How will your scheme deal with alignment issues better than my scheme?
>

In my pmem case it is an easy memcpy. I agree this will not work if you
need to go to a hard disk (which is not a priority for me).

>> And this assumes networking or some device, which means going back
>> to the Kernel; under ZUFS rules you must return -ASYNC to the zuf
>> and complete in a background ASYNC thread. This is an order of
>> magnitude higher latency than what I showed here.
>
> Indeed.
>
>> And what about the SYNC copy from Server to APP? With a pipe you
>> are forcing me to go back to the Kernel to execute the copy, which
>> means two more crossings. This will double the round trips.
>
> If you are trying to minimize the roundtrips, why not cache the
> mapping in the kernel? That way you don't necessarily have to go to
> userspace at all. With readahead logic, the server will be able to
> preload the mapping before the reads happen, and you basically get the
> same speed as an in-kernel fs would.
>

Yes, as I said, that was my first approach. But in the end this is always
a special-workload optimization; in the general case it actually adds a
round trip and a lot of complexity that always comes back to bite you.

> Also I don't quite understand how you are planning to generalize beyond
> the pmem case. The interface is ready for that, sure. But what about
> caching? Will that be done in the server? Does that make sense?
> The Kernel already has a page cache for that purpose, and a userspace
> cache won't ever be as good as the kernel cache.
>

I explained about that. We can easily support page-cache in zufs. Here is
what I wrote:

> Please note that it will be very easy with this API to also support
> page-cache for FSs that want it, like the network FSs you mentioned.
> The FS will set a bit in the fs_register call to say that it would
> rather use the page cache. These types of FSs will run on a different
> kind of BDI which says "Yes, page cache please". All the IO entry
> vectors point to the generic_iter API, and instead we implement
> read/write_pages(). At read/write_pages() we do the exact same
> OP_READ/WRITE as today: map the cache pages into the zus VM, dispatch,
> return, release the page lock. All is happy. Anyone wanting to
> contribute this is very welcome.

Yes please, no caching at the zus level; that's insane.
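For illustration only, a rough sketch of what such a page-cache read hook
could look like. The zus/zuf-side names here (zus_map_pages(),
zus_unmap_pages(), zufs_dispatch(), ZUFS_OP_READ_PAGES) are made up for the
example and are not the actual zufs API; only the VFS/page-cache calls are
real kernel interfaces.

/*
 * Sketch only -- not real zufs code.  The zus_*()/zufs_*() helpers and
 * ZUFS_OP_READ_PAGES are hypothetical stand-ins for "map the cache pages
 * into the zus (server) VM, dispatch the operation, release the page lock".
 */
#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/pagemap.h>

static int zufs_cached_readpage(struct file *file, struct page *page)
{
	struct inode *inode = page->mapping->host;
	loff_t pos = page_offset(page);
	int err;

	/* Expose the page-cache page to the user-mode server (zus) */
	err = zus_map_pages(inode, &page, 1);		/* hypothetical */
	if (err)
		goto out;

	/* Same OP_READ style dispatch as the pmem/dax path uses today */
	err = zufs_dispatch(inode, ZUFS_OP_READ_PAGES,	/* hypothetical */
			    pos, PAGE_SIZE);

	zus_unmap_pages(inode, &page, 1);		/* hypothetical */
out:
	if (!err)
		SetPageUptodate(page);
	else
		SetPageError(page);

	/* The VFS locked the page before calling ->readpage() */
	unlock_page(page);
	return err;
}

static const struct address_space_operations zufs_cached_aops = {
	.readpage	= zufs_cached_readpage,
	/* .writepages would dispatch an OP_WRITE_PAGES the same way */
};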
> Thanks,
> Miklos
>

Thanks
Boaz