All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrew Bartlett <abartlet@samba.org>
To: Namjae Jeon <linkinjeon@gmail.com>
Cc: hirofumi@mail.parknet.co.jp, linux-kernel@vger.kernel.org,
	Ravishankar N <cyberax82@gmail.com>,
	Amit Sahrawat <amit.sahrawat83@gmail.com>,
	Nam-Jae Jeon <namjae.jeon@samsung.com>,
	Ravishankar N <ravi.n1@samsung.com>,
	Amit Sahrawat <a.sahrawat@samsung.com>
Subject: Re: Read support for fat_fallocate()? (was [v2] fat: editions to support fat_fallocate())
Date: Fri, 15 Feb 2013 14:49:38 +1100	[thread overview]
Message-ID: <1360900178.1727.393.camel@jesse> (raw)
In-Reply-To: <CAKYAXd9tVzvP_Oa+BoUwKbpaMoVZKZgesDgrLirqoyk6AHrJ2A@mail.gmail.com>

On Thu, 2013-02-14 at 18:52 +0900, Namjae Jeon wrote:
> [snip]
> >> >
> >> > Thanks,
> >> Hi Andrew.
> >>
> >> First, Thanks for your interest !
> >> A mismatch between inode size and reserved blocks can be either due to
> >> pre-allocation (after our changes) or due to corruption (sudden unplug
> >> of media etc).
> >> We don’t think it is right to include only read only support (i.e.
> >> without fallocate support) for such files because if such files are
> >> encountered it only means that the file is corrupted, as there is no
> >> current method to check if the issue is due to pre-allocation.
> >> If it is to be included in the kernel, then the whole patch has to go
> >> in.
> >
> > I don't see why that is the case.
> If we consider that there is no FALLOCATE support, then the condition
> of file size and blocks not matching can be only possible in case of
> corruption, right?

Sure.  I was just suggesting we transparently recover from that, by
using the blocks.  Think of it more as an online fsck not about
fallocate. 

Anyway, if you don't think it's reasonable to use those blocks, or to
'just fix it', then we just have to continue to do as we currently do.
That is on first sign of FS corruption just stop doing writes, and await
an FSCK.  

> >> But then again, since the FAT specifications do not accommodate
> >> for pre-allocation, then it is up to OGAWA to decide if this is
> >> acceptable.
> >> In any case, the patch will definitely break backward compatibility
> >> (on an older fat driver without fallocate support) and also in case
> >> for the two variants for the same kernel versions and only one has
> >> FALLOCATE enabled, in such cases also, the behavior will assume
> >> corruption in one case.
> >
> > I agree that the sudden unplug is a concern, but why not make the
> > filesystem more robust against that inevitable occurrence?  If the
> > blocks appear to be allocated to the file, why not use them?
> We also agree that there should be pre-allocation feature on FAT, and
> we had shared the scenarios where this could be required for a TV/
> recorder.
> But there are certain drawbacks which were raised by OGAWA with
> respect to compatibility and we also tend to agree on them.
> There could possibly be an issue where we are unable to distinguish
> between pre-allocation and corruption. Perhaps we could set a status
> bit on the file to indicate whether the file has pre-allocated blocks.
> This will make it clear whether the allocation is genuine through the
> FAT Fallocate request or is a result of corruption. Depending on the
> status of the flag - the decision can be made regard to reading
> blocks.
> But, the main issue in this will be storing this bit in on-disk
> directory entry for that file. From the feature and usability point of
> view, we should have fallocate on FAT too.
> 
> But it needs initial ACK from OGAWA to continue to work on this so
> that we can figure out the proper solution to move forward.

OK.  Given the need to find other approaches, I wanted to suggest some
ideas - some of which you may have already considered:

What about having a shadow FAT in a file, say called 'allocated space',
that would contain inode -> cluster list pairs, and where that file
would itself contain the free space the 'belongs' to other files?

As new clusters become needed in a file, they would simply be removed
from the 'allocated space' file, and assigned to the file they really
belong to.  That way, another OS just sees a large file, nothing more. 

Or, if we cannot make any changes to the on-disk format, what about
keeping such a database in memory, allocating some of the existing free
list to files that have had fallocate() called on them?  (Naturally,
this makes it non-persistent, and instead more of a 'hint', but could at
least solve our mutual performance issues). 

Or, could we leave allocated but unused clusters in the free cluster
list, but maintain a file with a hint that a particular file should use
a particular free cluster next, if available?  That list of 'allocated
free' clusters could be honoured by fallocate-aware OSs to reduce df and
increase du, but be ignored by other OSs, ensuring you could not run out
of space expanding a file in another OS.  

If a cluster was observed no longer to be in the real free list, it
would be ignored in the 'allocated free' list, to avoid corruption. 

In short, I see the restriction on not breaking existing implementations
as a difficult, but certainty not impossible problem. 

Thanks,

Andrew Bartlett

-- 
Andrew Bartlett                                http://samba.org/~abartlet/
Authentication Developer, Samba Team           http://samba.org



  reply	other threads:[~2013-02-15  3:49 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-10-13 14:31 [PATCH v2] fat: editions to support fat_fallocate() Namjae Jeon
2012-10-14 16:20 ` OGAWA Hirofumi
2012-10-16  4:12   ` Namjae Jeon
2012-10-16 10:14     ` OGAWA Hirofumi
2012-10-17 10:57       ` Namjae Jeon
2012-10-17 10:57         ` Namjae Jeon
2012-10-21 23:54         ` OGAWA Hirofumi
2012-10-22 15:10           ` Namjae Jeon
2012-10-22 15:10             ` Namjae Jeon
2012-10-23  7:19             ` OGAWA Hirofumi
2012-10-23  7:24               ` Namjae Jeon
2013-02-14  2:40 ` Read support for fat_fallocate()? (was [v2] fat: editions to support fat_fallocate()) Andrew Bartlett
2013-02-14  2:48 ` Andrew Bartlett
2013-02-14  6:44   ` Namjae Jeon
2013-02-14  7:07     ` Andrew Bartlett
2013-02-14  9:52       ` Namjae Jeon
2013-02-15  3:49         ` Andrew Bartlett [this message]
2013-02-18 11:36           ` OGAWA Hirofumi
2013-02-18 13:11             ` Andrew Bartlett
2013-02-18 14:25             ` Namjae Jeon
2013-02-18 14:59               ` OGAWA Hirofumi
2013-02-18 15:15                 ` Namjae Jeon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1360900178.1727.393.camel@jesse \
    --to=abartlet@samba.org \
    --cc=a.sahrawat@samsung.com \
    --cc=amit.sahrawat83@gmail.com \
    --cc=cyberax82@gmail.com \
    --cc=hirofumi@mail.parknet.co.jp \
    --cc=linkinjeon@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=namjae.jeon@samsung.com \
    --cc=ravi.n1@samsung.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.