From mboxrd@z Thu Jan 1 00:00:00 1970 From: Raymond Jennings Subject: Re: [FYI] tux3: Core changes Date: Fri, 31 Jul 2015 08:37:35 -0700 Message-ID: References: <67294911-1776-46b8-916d-0e5642a38725@phunq.net> <20150526070910.GA3307@quack.suse.cz> <20150526090058.GA8024@quack.suse.cz> <5564D60E.6000306@phunq.net> <20150527084138.GD2590@quack.suse.cz> <87a8vtdqfz.fsf@mail.parknet.co.jp> <20150623161247.GP2427@quack.suse.cz> <87k2ueepd6.fsf@mail.parknet.co.jp> <20150709160528.GK2900@quack.suse.cz> <874mklaqbn.fsf@mail.parknet.co.jp> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0309240263==" Cc: David Lang , Rik van Riel , Jan Kara , tux3@tux3.org, Linux Kernel Mailing List , Daniel Phillips , linux-fsdevel@vger.kernel.org To: OGAWA Hirofumi Return-path: In-Reply-To: <874mklaqbn.fsf@mail.parknet.co.jp> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: tux3-bounces@phunq.net Sender: "Tux3" List-Id: linux-fsdevel.vger.kernel.org --===============0309240263== Content-Type: multipart/alternative; boundary=089e0149d294d110b1051c2d9b95 --089e0149d294d110b1051c2d9b95 Content-Type: text/plain; charset=UTF-8 Returning ENOSPC when you have free space you can't yet prove is safer than not returning it and risking a data loss when you get hit by a write/commit storm. :) On Thu, Jul 30, 2015 at 9:44 PM, OGAWA Hirofumi wrote: > Jan Kara writes: > > >> > Yes, if userspace truncates the file, the situation we end up with is > >> > basically the same. However for truncate to happen some malicious > process > >> > has to come and truncate the file - a failure scenario that is > acceptable > >> > for most use cases since it doesn't happen unless someone is actively > >> > trying to screw you. With page forking it is enough for flusher thread > >> > to start writeback for that page to trigger the problem - event that > is > >> > basically bound to happen without any other userspace application > >> > interfering. > >> > >> Acceptable conclusion is where came from? That pseudocode logic doesn't > >> say about usage at all. And even if assume it is acceptable, as far as I > >> can see, for example /proc/sys/vm/drop_caches is enough to trigger, or a > >> page on non-exists block (sparse file. i.e. missing disk space check in > >> your logic). And if really no any lock/check, there would be another > >> races. > > > > So drop_caches won't cause any issues because it avoids mmaped pages. > > Also page reclaim or page migration don't cause any issues because > > they avoid pages with increased refcount (and increased refcount would > stop > > drop_caches from reclaiming the page as well if it was not for the mmaped > > check before). Generally, elevated page refcount currently guarantees > page > > isn't migrated, reclaimed, or otherwise detached from the mapping (except > > for truncate where the combination of mapping-index becomes invalid) and > > your page forking would change that assumption - which IMHO has a big > > potential for some breakage somewhere. > > Lifetime and visibility from user are different topic. The issue here > is visibility. Of course, those has relation more or less though, > refcount doesn't stop to drop page from radix-tree at all. > > Well, anyway, your claim seems to be assuming the userspace app > workarounds the issues. And it sounds like still not workarounds the > ENOSPC issue (validate at page fault/GUP) even if assuming userspace > behave as perfect. Calling it as kernel assumption is strange. > > If you claim, there is strange logic widely used already, and of course, > we can't simply break it because of compatibility. I would be able to > agree. But your claim sounds like that logic is sane and well designed > behavior. So I disagree. > > > And frankly I fail to see why you and Daniel care so much about this > > corner case because from performance POV it's IMHO a non-issue and you > > bother with page forking because of performance, don't you? > > Trying to penalize the corner case path, instead of normal path, should > try at first. Penalizing normal path to allow corner case path is insane > basically. > > Make normal path faster and more reliable is what we are trying. > > > So you can have a look for example at > > drivers/media/v4l2-core/videobuf2-dma-contig.c which implements setting > up > > of a video device buffer at virtual address specified by user. Now I > don't > > know whether there really is any userspace video program that sets up the > > video buffer in mmaped file. I would agree with you that it would be a > > strange thing to do but I've seen enough strange userspace code that I > > would not be too surprised. > > > > Another example of similar kind is at > > drivers/infiniband/core/umem.c where we again set up buffer for > infiniband > > cards at users specified virtual address. And there are more drivers in > > kernel like that. > > Unfortunately, I'm not looking those yet though. I guess those would be > helpful to see the details. > > Thanks. > -- > OGAWA Hirofumi > > _______________________________________________ > Tux3 mailing list > Tux3@phunq.net > http://phunq.net/mailman/listinfo/tux3 > --089e0149d294d110b1051c2d9b95 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
Returning ENOSPC when you have free space you can't ye= t prove is safer than not returning it and risking a data loss when you get= hit by a write/commit storm. :)

On Thu, Jul 30, 2015 at 9:44 PM, OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> wrote:
Jan Kara <jack@suse.cz> writes:

>> > Yes, if userspace truncates the file, the situation we end up= with is
>> > basically the same. However for truncate to happen some malic= ious process
>> > has to come and truncate the file - a failure scenario that i= s acceptable
>> > for most use cases since it doesn't happen unless someone= is actively
>> > trying to screw you. With page forking it is enough for flush= er thread
>> > to start writeback for that page to trigger the problem - eve= nt that is
>> > basically bound to happen without any other userspace applica= tion
>> > interfering.
>>
>> Acceptable conclusion is where came from? That pseudocode logic do= esn't
>> say about usage at all. And even if assume it is acceptable, as fa= r as I
>> can see, for example /proc/sys/vm/drop_caches is enough to trigger= , or a
>> page on non-exists block (sparse file. i.e. missing disk space che= ck in
>> your logic). And if really no any lock/check, there would be anoth= er
>> races.
>
> So drop_caches won't cause any issues because it avoids mmaped pag= es.
> Also page reclaim or page migration don't cause any issues because=
> they avoid pages with increased refcount (and increased refcount would= stop
> drop_caches from reclaiming the page as well if it was not for the mma= ped
> check before). Generally, elevated page refcount currently guarantees = page
> isn't migrated, reclaimed, or otherwise detached from the mapping = (except
> for truncate where the combination of mapping-index becomes invalid) a= nd
> your page forking would change that assumption - which IMHO has a big<= br> > potential for some breakage somewhere.

Lifetime and visibility from user are different topic.=C2=A0 The iss= ue here
is visibility. Of course, those has relation more or less though,
refcount doesn't stop to drop page from radix-tree at all.

Well, anyway, your claim seems to be assuming the userspace app
workarounds the issues. And it sounds like still not workarounds the
ENOSPC issue (validate at page fault/GUP) even if assuming userspace
behave as perfect. Calling it as kernel assumption is strange.

If you claim, there is strange logic widely used already, and of course, we can't simply break it because of compatibility. I would be able to agree. But your claim sounds like that logic is sane and well designed
behavior. So I disagree.

> And frankly I fail to see why you and Daniel care so much about this > corner case because from performance POV it's IMHO a non-issue and= you
> bother with page forking because of performance, don't you?

Trying to penalize the corner case path, instead of normal path, sho= uld
try at first. Penalizing normal path to allow corner case path is insane basically.

Make normal path faster and more reliable is what we are trying.

> So you can have a look for example at
> drivers/media/v4l2-core/videobuf2-dma-contig.c which implements settin= g up
> of a video device buffer at virtual address specified by user. Now I d= on't
> know whether there really is any userspace video program that sets up = the
> video buffer in mmaped file. I would agree with you that it would be a=
> strange thing to do but I've seen enough strange userspace code th= at I
> would not be too surprised.
>
> Another example of similar kind is at
> drivers/infiniband/core/umem.c where we again set up buffer for infini= band
> cards at users specified virtual address. And there are more drivers i= n
> kernel like that.

Unfortunately, I'm not looking those yet though. I guess those w= ould be
helpful to see the details.

Thanks.
--
OGAWA Hirofumi <hirofumi@= mail.parknet.co.jp>

____________________________= ___________________
Tux3 mailing list
Tux3@phunq.net
http://phunq.net/mailman/listinfo/tux3

--089e0149d294d110b1051c2d9b95-- --===============0309240263== Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline _______________________________________________ Tux3 mailing list Tux3@phunq.net http://phunq.net/mailman/listinfo/tux3 --===============0309240263==--