From: Miklos Szeredi
Date: Thu, 15 Mar 2018 21:42:17 +0100
Subject: Re: [RFC 1/7] mm: Add new vma flag VM_LOCAL_CPU
To: Boaz Harrosh
Cc: Matthew Wilcox, linux-fsdevel, Ric Wheeler, Steve French,
    Steven Whitehouse, Jeff Moyer, Sage Weil, Jan Kara, Amir Goldstein,
    Andy Rudof, Anna Schumaker, Amit Golander, Sagi Manole, Shachar Sharon
In-Reply-To: <9bfa8d53-5693-7953-9dcf-79a8cff0b97f@netapp.com>
References: <443fea57-f165-6bed-8c8a-0a32f72b9cd2@netapp.com>
 <20180313185658.GB21538@bombadil.infradead.org>
 <07cda3e5-c911-a49b-fceb-052f8ca57e66@netapp.com>
 <9bfa8d53-5693-7953-9dcf-79a8cff0b97f@netapp.com>

On Thu, Mar 15, 2018 at 5:30 PM, Boaz Harrosh wrote:
> On 15/03/18 18:10, Miklos Szeredi wrote:
> <>
>>> This can never properly translate. Even a simple file on disk
>>> is linear for the app (unaligned buffer) but is scattered across
>>> multiple blocks on disk. Yes, perhaps networking can somewhat work
>>> if you pre/post-pend the headers you need.
>>> And you restrict direct-IO semantics on everything, especially the
>>> APP. With my system you can do zero copy with any kind of application.
>>
>> I lost you there, sorry.
>>
>> How will your scheme deal with alignment issues better than my scheme?
>>
>
> In my pmem case it is an easy memcpy. This will not work if you need
> to go to a hard disk, I agree. (Which is not a priority for me.)
>
>>> And this assumes networking or some device, which means going back
>>> to the kernel, where under ZUFS rules you must return -ASYNC to the
>>> zuf and complete in a background ASYNC thread. This is an order of
>>> magnitude higher latency than what I showed here.
>>
>> Indeed.
>>
>>> And what about the SYNC copy from server to APP? With a pipe you
>>> are forcing me to go back to the kernel to execute the copy, which
>>> means two more crossings. This will double the round trips.
>>
>> If you are trying to minimize the roundtrips, why not cache the
>> mapping in the kernel? That way you don't necessarily have to go to
>> userspace at all. With readahead logic, the server will be able to
>> preload the mapping before the reads happen, and you basically get
>> the same speed as an in-kernel fs would.
>>
>
> Yes, as I said, that was my first approach. But in the end this is
> always a special-workload optimization; in the general case it
> actually adds a round trip, and a huge complexity that always comes
> back to bite you.

Ideally most of the complexity would be in the page cache. I'm not
sure how ready it is to handle pmem pages, though.

The general case (non-pmem) will always have to be handled
differently; you've just stated that it's much less latency sensitive
and needs async handling. Basing the design on trying to make both
cases use the same mechanism (a userspace copy) is flawed in my
opinion, since it's suboptimal in either case.

(Rough sketches of the two copy paths discussed above are appended
after the signature, for illustration.)

Thanks,
Miklos
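
A rough user-space sketch of the copy path Boaz describes above: the
application's (possibly unaligned) buffer is mapped into the server's
address space, so a read is served by a single memcpy from the
pmem-backed file data into that buffer. The names (zus_read_op,
serve_read) and the anonymous mapping standing in for the DAX/pmem
file are invented for illustration; this is not the actual ZUFS code.

/*
 * Minimal model: "pmem_base" stands in for the DAX-mapped file data,
 * and "app_buf_in_server" stands in for the app buffer that the server
 * sees through its per-CPU VMA. The whole IO is one memcpy, with no
 * alignment restrictions and no pipe/splice step.
 */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

struct zus_read_op {
	void   *app_buf_in_server;  /* app buffer, as seen by the server */
	size_t  len;
	size_t  file_off;
};

static char *pmem_base;             /* stand-in for the pmem mapping */

static void serve_read(const struct zus_read_op *op)
{
	memcpy(op->app_buf_in_server, pmem_base + op->file_off, op->len);
}

int main(void)
{
	const size_t fsize = 4096;

	pmem_base = mmap(NULL, fsize, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (pmem_base == MAP_FAILED)
		return 1;
	memcpy(pmem_base + 100, "hello from pmem", 16);

	char app_buf[32];           /* deliberately unaligned offset/size */
	struct zus_read_op op = {
		.app_buf_in_server = app_buf,
		.len = 16,
		.file_off = 100,
	};
	serve_read(&op);
	printf("%s\n", app_buf);
	return 0;
}

In the real patch set the interesting part is how app_buf_in_server
comes to exist at all (the VM_LOCAL_CPU per-CPU mapping); the copy
itself is the trivial bit, which is the latency point Boaz is making.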
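
And a user-space model of the alternative suggested above: the kernel
caches the file-offset-to-pmem-extent mapping, so a read that hits the
cache never leaves the kernel, and only a miss pays the roundtrip to
the userspace server (which readahead-style preloading could hide).
Everything here (extent_map, server_resolve_extent, the cache sizing)
is invented for illustration; it is not kernel code.

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define EXTENT_SIZE 4096
#define CACHE_SLOTS 16

struct extent_map {
	bool    valid;
	size_t  file_off;    /* extent-aligned file offset */
	char   *pmem_addr;   /* where that extent lives in "pmem" */
};

static struct extent_map cache[CACHE_SLOTS];
static size_t roundtrips;    /* how often the "server" was consulted */

/* Stand-in for asking the userspace server to resolve an extent. */
static char *server_resolve_extent(char *pmem, size_t ext_off)
{
	roundtrips++;
	return pmem + ext_off;   /* trivially linear in this toy model */
}

static void cached_read(char *pmem, size_t off, void *buf, size_t len)
{
	size_t ext_off = off & ~(size_t)(EXTENT_SIZE - 1);
	struct extent_map *m = &cache[(ext_off / EXTENT_SIZE) % CACHE_SLOTS];

	if (!m->valid || m->file_off != ext_off) {
		/* Miss: one roundtrip installs the mapping for later use. */
		m->pmem_addr = server_resolve_extent(pmem, ext_off);
		m->file_off = ext_off;
		m->valid = true;
	}
	/* Hit (or freshly installed): copy without touching the server. */
	memcpy(buf, m->pmem_addr + (off - ext_off), len);
}

int main(void)
{
	static char pmem[4 * EXTENT_SIZE] = "served from the mapping cache";
	char buf[64];

	for (int i = 0; i < 1000; i++)
		cached_read(pmem, 0, buf, sizeof("served from the mapping cache"));

	printf("%s, roundtrips=%zu\n", buf, roundtrips);
	return 0;
}

Whether this is a win is exactly the disagreement in the thread: it
removes the per-read userspace crossing for cached extents, at the
cost of keeping the cached mapping coherent with the server, which is
the extra roundtrip and complexity Boaz objects to.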