On Sat, Apr 9, 2016 at 1:08 AM, Dave Chinner <david@fromorbit.com> wrote:
On Fri, Apr 08, 2016 at 12:06:35PM +0200, Jan Tulak wrote:
> On Fri, Apr 8, 2016 at 2:09 AM, Dave Chinner <david@fromorbit.com> wrote:
>
> > On Thu, Mar 24, 2016 at 12:15:34PM +0100, jtulak@redhat.com wrote:
> > > From: Jan Tulak <jtulak@redhat.com>
> > >
> > > Unify mkfs.xfs behaviour a bit and never truncate files. If the user
> > > is trying to mkfs an existing file, we don't want to destroy anything
> > > he did with the file before (sparse file, allocations...)
> >
> > Why not? We do that with discard-by-default to block devices, and
> > O_TRUNC is exactly the same situation with a file: we completely
> > re-initialise the file from a known state if mkfs has been asked to
> > create the file.
> >
> But AFAIK, we don't zero out entire spindle devices,

Unless the controller above them supports discard or whatever
implementation the storage protocol uses (e.g. UNMAP or WRITE_SAME).
e.g. the "spindle devices" often are big raid arrays that are using
thin provisioning, compression and dedupe internally, so running
discard on them does make a significant difference to their
behaviour.

> we don't ask if the drive skips some blocks (e.g. because they are bad),

That's irrelevant to the issue at hand.

> and we don't care
> about what an underlying layer (like LVM) did with the block device.

Actually, we do, because users care about their storage stack doing
sane management operations automatically.

That's why we issue a discard - it tells the underlying devices to
re-initialise the storage on this device *if they care about such
things*. Stuff like thinly provisioned devices rely on mkfs
behaviour like this to recycle used storage efficiently and
transparently. The user expects things to "just work" and this is
one of those things that makes it "just work".

> From
> this point of view, we shouldn't care about the file either.
>
> I could be missing something, though.

I think you're missing the fact that we don't know what the
*underlying storage* cares about, so we need to tell them in some
way that a device or image file is being re-initialised from
scratch. Whether that is by truncating the image file (so the
filesystem can issue discards on the now unused space) or by issuing
discard ioctls ourselves, it really doesn't matter. The key point is
that we have a mechanism that allows us to notify the underlying
storage of the "this is re-initialised storage" intent of mkfs.
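To make the two notification paths concrete, here is a rough sketch in C. It is not the mkfs.xfs code; the helper name reinit_target() and its image_size parameter are hypothetical, and it assumes Linux, where BLKDISCARD and BLKGETSIZE64 come from <linux/fs.h>. For a block device it asks the device to discard its whole range (treating failure as advisory, since not every device supports discard). For a regular file it opens with O_TRUNC so the host filesystem can free the old blocks, then extends the file back to the image size as a sparse file.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <linux/fs.h>   /* BLKDISCARD, BLKGETSIZE64 (Linux-specific) */

/*
 * Hypothetical helper: open the mkfs target and tell the storage below
 * that it is being re-initialised from scratch. Returns an open fd or -1.
 */
static int reinit_target(const char *path, off_t image_size)
{
    struct stat st;

    if (stat(path, &st) == 0 && S_ISBLK(st.st_mode)) {
        /* Block device: discard the whole device range. */
        int fd = open(path, O_RDWR);
        if (fd < 0)
            return -1;
        uint64_t size;
        if (ioctl(fd, BLKGETSIZE64, &size) < 0) {
            close(fd);
            return -1;
        }
        uint64_t range[2] = { 0, size };   /* start, length in bytes */
        /* Discard is only a hint; ignore EOPNOTSUPP and friends. */
        (void)ioctl(fd, BLKDISCARD, range);
        return fd;
    }

    /*
     * Regular (or not yet existing) file: O_TRUNC drops the old contents,
     * letting the host filesystem release the previously used blocks.
     */
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0666);
    if (fd < 0)
        return -1;
    /* Grow back to the requested size; the new range reads as zeroes. */
    if (ftruncate(fd, image_size) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

With the file path, any stale data written before mkfs is gone and the image comes back as a sparse file of the requested size, which is the "known state" described above.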

So from that perspective, the O_TRUNC behaviour should remain.

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

All right, I will keep the O_TRUNC there. However, should it truncate the file every time, or should we offer a way to avoid truncating the file? Until now, mkfs behaved differently based on whether -d file was given or not. Your explanation suggests that we should truncate every time, right?


Cheers,
Jan
