All of lore.kernel.org
 help / color / mirror / Atom feed
From: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
To: Andrew Bartlett <abartlet@samba.org>
Cc: Namjae Jeon <linkinjeon@gmail.com>,
	linux-kernel@vger.kernel.org, Ravishankar N <cyberax82@gmail.com>,
	Amit Sahrawat <amit.sahrawat83@gmail.com>,
	Nam-Jae Jeon <namjae.jeon@samsung.com>,
	Ravishankar N <ravi.n1@samsung.com>,
	Amit Sahrawat <a.sahrawat@samsung.com>
Subject: Re: Read support for fat_fallocate()? (was [v2] fat: editions to support fat_fallocate())
Date: Mon, 18 Feb 2013 20:36:57 +0900	[thread overview]
Message-ID: <87fw0t26qu.fsf@devron.myhome.or.jp> (raw)
In-Reply-To: <1360900178.1727.393.camel@jesse> (Andrew Bartlett's message of "Fri, 15 Feb 2013 14:49:38 +1100")

Andrew Bartlett <abartlet@samba.org> writes:

>> >> First, Thanks for your interest !
>> >> A mismatch between inode size and reserved blocks can be either due to
>> >> pre-allocation (after our changes) or due to corruption (sudden unplug
>> >> of media etc).
>> >> We don’t think it is right to include only read only support (i.e.
>> >> without fallocate support) for such files because if such files are
>> >> encountered it only means that the file is corrupted, as there is no
>> >> current method to check if the issue is due to pre-allocation.
>> >> If it is to be included in the kernel, then the whole patch has to go
>> >> in.
>> >
>> > I don't see why that is the case.
>> If we consider that there is no FALLOCATE support, then the condition
>> of file size and blocks not matching can be only possible in case of
>> corruption, right?
>
> Sure.  I was just suggesting we transparently recover from that, by
> using the blocks.  Think of it more as an online fsck not about
> fallocate. 
>
> Anyway, if you don't think it's reasonable to use those blocks, or to
> 'just fix it', then we just have to continue to do as we currently do.
> That is on first sign of FS corruption just stop doing writes, and await
> an FSCK.  

I'm not sure what is suggesting actually though. We have to consider
about synchronous runtime fsck makes normal path enough slower.

E.g. probably, in this case, all first open(2) of the inode will have to
walk cluster chain until end of cluster mark, to verify cluster chain.

>> >> But then again, since the FAT specifications do not accommodate
>> >> for pre-allocation, then it is up to OGAWA to decide if this is
>> >> acceptable.
>> >> In any case, the patch will definitely break backward compatibility
>> >> (on an older fat driver without fallocate support) and also in case
>> >> for the two variants for the same kernel versions and only one has
>> >> FALLOCATE enabled, in such cases also, the behavior will assume
>> >> corruption in one case.
>> >
>> > I agree that the sudden unplug is a concern, but why not make the
>> > filesystem more robust against that inevitable occurrence?  If the
>> > blocks appear to be allocated to the file, why not use them?
>> We also agree that there should be pre-allocation feature on FAT, and
>> we had shared the scenarios where this could be required for a TV/
>> recorder.
>> But there are certain drawbacks which were raised by OGAWA with
>> respect to compatibility and we also tend to agree on them.
>> There could possibly be an issue where we are unable to distinguish
>> between pre-allocation and corruption. Perhaps we could set a status
>> bit on the file to indicate whether the file has pre-allocated blocks.
>> This will make it clear whether the allocation is genuine through the
>> FAT Fallocate request or is a result of corruption. Depending on the
>> status of the flag - the decision can be made regard to reading
>> blocks.
>> But, the main issue in this will be storing this bit in on-disk
>> directory entry for that file. From the feature and usability point of
>> view, we should have fallocate on FAT too.
>> 
>> But it needs initial ACK from OGAWA to continue to work on this so
>> that we can figure out the proper solution to move forward.
>
> OK.  Given the need to find other approaches, I wanted to suggest some
> ideas - some of which you may have already considered:
>
> What about having a shadow FAT in a file, say called 'allocated space',
> that would contain inode -> cluster list pairs, and where that file
> would itself contain the free space the 'belongs' to other files?
>
> As new clusters become needed in a file, they would simply be removed
> from the 'allocated space' file, and assigned to the file they really
> belong to.  That way, another OS just sees a large file, nothing more. 
>
> Or, if we cannot make any changes to the on-disk format, what about
> keeping such a database in memory, allocating some of the existing free
> list to files that have had fallocate() called on them?  (Naturally,
> this makes it non-persistent, and instead more of a 'hint', but could at
> least solve our mutual performance issues).

[...]

Hm. My concerns are compatibility and reliability. Although We can
change on-disk format if need, but I don't think it can be compatible
and reliable. If so, who wants to use it? I feel there is no reason to
use FAT if there is no compatible.

Well, anyway, possible solution would be, we can pre-allocate physical
blocks via fallocate(2) or something, but discard pre-allocated blocks
at ->release() (or before unmount at least). This way would have
compatibility (no on-disk change over unmount) and possible breakage
would be same with normal extend write patterns on kernel crash
(i.e. Windows or fsck will truncate after i_size).

Thanks.
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

  reply	other threads:[~2013-02-18 11:37 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-10-13 14:31 [PATCH v2] fat: editions to support fat_fallocate() Namjae Jeon
2012-10-14 16:20 ` OGAWA Hirofumi
2012-10-16  4:12   ` Namjae Jeon
2012-10-16 10:14     ` OGAWA Hirofumi
2012-10-17 10:57       ` Namjae Jeon
2012-10-17 10:57         ` Namjae Jeon
2012-10-21 23:54         ` OGAWA Hirofumi
2012-10-22 15:10           ` Namjae Jeon
2012-10-22 15:10             ` Namjae Jeon
2012-10-23  7:19             ` OGAWA Hirofumi
2012-10-23  7:24               ` Namjae Jeon
2013-02-14  2:40 ` Read support for fat_fallocate()? (was [v2] fat: editions to support fat_fallocate()) Andrew Bartlett
2013-02-14  2:48 ` Andrew Bartlett
2013-02-14  6:44   ` Namjae Jeon
2013-02-14  7:07     ` Andrew Bartlett
2013-02-14  9:52       ` Namjae Jeon
2013-02-15  3:49         ` Andrew Bartlett
2013-02-18 11:36           ` OGAWA Hirofumi [this message]
2013-02-18 13:11             ` Andrew Bartlett
2013-02-18 14:25             ` Namjae Jeon
2013-02-18 14:59               ` OGAWA Hirofumi
2013-02-18 15:15                 ` Namjae Jeon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87fw0t26qu.fsf@devron.myhome.or.jp \
    --to=hirofumi@mail.parknet.co.jp \
    --cc=a.sahrawat@samsung.com \
    --cc=abartlet@samba.org \
    --cc=amit.sahrawat83@gmail.com \
    --cc=cyberax82@gmail.com \
    --cc=linkinjeon@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=namjae.jeon@samsung.com \
    --cc=ravi.n1@samsung.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.