* Re: A few thoughts on hfsplus dev (Re: [PATCH v2] hfsplus: add journal replay)
@ 2014-04-05  1:02 Hin-Tak Leung
  2014-04-06 16:01 ` Sergei Antonov
  0 siblings, 1 reply; 11+ messages in thread
From: Hin-Tak Leung @ 2014-04-05  1:02 UTC
  To: saproj; +Cc: slava, linux-fsdevel, viro, hch, akpm

Hi Sergei,

------------------------------
On Tue, Mar 25, 2014 1:37 AM GMT Sergei Antonov wrote:

>Let me inject a remark here.
>A reader of my discussion with Vyacheslav here may get the impression that my code cares less about data safety. This impression is false. To make a long story short: replaying and removing transactions one by one is as safe as replaying them all at once and then removing them altogether. Maybe it is not so for other journal formats, but for the HFS+ journal it is.
>

I think a reader of your commit message might get the impression that you are more
concerned about performance/elegance/cleverness than correctness.

"Doing A is as safe as doing B" - you still have to pick one way or another.
Or supply a patch to do both, to demonstrate switching between the two.

>
>Sigh. You have been misled. That's probably because claims like "you do not follow this excerpt from the spec" are more impressive and easier to remember than my subsequent explanations of why I am not.
>

You seem to be assuming that you will always be around to explain this code
to future people who might want to understand it, and that you are the
only person who will ever work on it. How would you feel about a future person
making a commit to revert or correct your work, for example?

>> - I have a more reproducible recipe for making disk images with a strange
>> location of the 2nd volume header using hdiutil (Apple's loopback-mounting
>> and disk-image manipulation tool). At some point we should revisit this issue.
>
>Well, present your recipe. But! There is only one proper placement for this header. All other placements are improper, and the spec even explicitly gives one example of an improper placement. I can imagine no way to get it wrong. I see no need for any elaborate code for finding the 2nd header.
>

Apple owns the spec, and owns the rights of interpretation and misinterpretation
of it. They are not bound by it; they can freely deviate and change their minds.
It is not an ISO spec, and it was not submitted to external bodies for
standardization - it is just provided as is.

You can say Apple's tool is wrong - but a more pragmatic way of seeing it is that
they don't have to keep the spec up to date or correct. Keeping the spec current
costs somebody's time, and there is no business incentive to spend the
engineering hours on it. Their tools and implementations in software are more
authoritative than the words in the spec.

hdiutil is proprietary Apple software; its source code is not available.

Hin-Tak

* Re: A few thoughts on hfsplus dev (Re: [PATCH v2] hfsplus: add journal replay)
@ 2014-04-04 23:35 Hin-Tak Leung
  2014-04-06 20:59 ` Sergei Antonov
  0 siblings, 1 reply; 11+ messages in thread
From: Hin-Tak Leung @ 2014-04-04 23:35 UTC
  To: saproj, slava; +Cc: linux-fsdevel, viro, hch, akpm

Hi Sergei,

------------------------------
On Fri, Mar 28, 2014 1:42 PM GMT Sergei Antonov wrote:

>> I am ready for further convincing efforts.

As I said, I am happy to review your work, and I hope you and Vyacheslav can work together.
However, I wish to point out that Vyacheslav has a track record of contributing
to the hfs+ driver going back a few years, and I believe a patch was submitted a few
months ago to make him the maintainer. In the hierarchy of Linux kernel development,
informal as it is, he is still probably one of the people you need to convince. The way
you are going about the matter is not helping to get your ideas accepted.

>> Though now that Vyacheslav has stopped replying in the thread, there is no progress. I have suggested two compromises in the thread; I'll recap them here for you or anyone who wasn't following.
>> 1. Since collecting all transactions requires more memory (up to 12 MB in my tests, but theoretically more), I suggested a threshold. Let's say it is 1 MB. Once the collected data reaches one meg, flush it, free the buffers, and go on collecting.
>> 2. A very simple suggestion that makes my logic closer to Vyacheslav's one-by-one transaction processing approach. In case we encounter a corrupted transaction, replay the (good) transactions collected so far. A corrupted transaction is a rare thing (caused by cosmic rays!), but a partial replay looks sane and is not a big change to the code.
>>

...

>Could you clarify "transaction based"? Seriously. Because since there
>are transactions in the HFS+ journal, all replay implementations are
>transaction-based in a way.

Transactions are what they are. The journal is an ordered list of transactions.
Each of them has a checksum for integrity and is treated as one unit of
change. Theoretically, you should apply one after the other in the correct order,
until the list is exhausted, or until you encounter the first unit
that does not pass sanity checks (the checksum, other out-of-bounds
conditions, etc.).
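
To spell that out in C (purely a sketch of mine; every name below is made
up for illustration and comes from no actual patch):

static int replay_journal(struct hfsplus_journal *jnl)
{
        struct hfsplus_transaction tr;
        int err;

        /* Apply committed transactions strictly in order. */
        while ((err = read_next_transaction(jnl, &tr)) > 0) {
                if (!transaction_checksum_ok(&tr))
                        return -EIO;    /* first bad unit: stop right here */
                err = apply_transaction(jnl, &tr); /* write to final location */
                if (err)
                        return err;
        }
        return err;     /* read_next_transaction() returns 0 at end of list */
}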

How you go about it, such as buffering a number of transactions for performance
reasons, is up to you (or the implementor), *up to a certain point*. I
suggest you supply a patch on top of Vyacheslav's patch set to
demonstrate how you want to go about it. Going around in circles about
words and the meaning of words is vague and not constructive. So I
suggest that you show us the code: how you would like to modify
Vyacheslav's to work in your way.

Be really careful about saying things like "all replay implementations". There is
no such thing as "all" replay implementations: there is an official specification
- TN1150 - and there is an official implementation, the one inside xnu and
diskdev_cmds from Apple. Anything else amounts to you making a Sergei's file
system that is not quite HFS+.

If you read/write the journal in a way qualitatively different from how Apple
does it, you are by definition wrong, even if your implementation of journalling
is faster and "cleverer" than Apple's.

Hin-Tak

* Re: [PATCH v2] hfsplus: add journal replay
@ 2014-03-13 20:04 Sergei Antonov
  2014-03-23 23:15 ` A few thoughts on hfsplus dev (Re: [PATCH v2] hfsplus: add journal replay) Hin-Tak Leung
  0 siblings, 1 reply; 11+ messages in thread
From: Sergei Antonov @ 2014-03-13 20:04 UTC
  To: Vyacheslav Dubeyko
  Cc: linux-fsdevel, Al Viro, Christoph Hellwig, Andrew Morton, Hin-Tak Leung

On 13 March 2014 11:20, Vyacheslav Dubeyko <slava@dubeyko.com> wrote:

> > I checked your code and made a bug report.
>
> Your report is related to the fact that my code does not re-read the
> volume header after journal replay. This bug can be fixed easily. I've
> already written to you about it.

But you do not reread the volume header on any device, and the error showed
up only on a 4K-sector device. OK, thanks for telling me anyway.

> So, in such a situation we have some freedom in field naming, because
> the specification doesn't contain any description.

The spec is not to be contradicted; the reference implementation allows some
freedom. OK, you've made your point :).

Will you at least not object to renaming block_info::next to
block_info::checksum (i.e. "checksum" instead of "next")? I just want to
know the limits of your esteem for this (dated) spec.

> The "folderCount" name contradicts to kernel coding style. I suppose
> that "subfolders" is much sensible and shorter name as "folder_count".
> Moreover, you've accepted my suggestion.
>
>> > (3) to distort a structure compared with Technical Note TN1150 (I
>> > mean struct block_list_header, for example).
>>
>> I'll explain.
>> block_list_header is not good because it describes sequential block
>> runs, not blocks. To avoid a double meaning of "block" I came up with:
>>   struct block_list_header (not really a block is meant) -> struct hfs_jrnl_list
>>   struct block_info (not really a block is meant) -> struct hfs_jrnl_run
>> Also changed fields:
>>   block_info::bnum (now comes the real block!) -> hfs_jrnl_run::first_block (the 1st block of the run)
>>   block_info::bsize (size in blocks? no, it is in bytes! make it clear) -> hfs_jrnl_run::byte_len
>
> I don't think that you've suggested better names. They are completely
> obscure and not informative for me, personally.

My reasoning did not have an effect. Oh, well.

By the way, did you notice how elegantly I fixed the
block_list_header::binfo problem, where binfo[0] means something
completely different from the other elements? I shifted the first
element's fields into the structure:

struct hfs_jrnl_list {
        u16 reserved;
        u16 count;                      /* number of runs plus 1 */
        u32 length_with_data;           /* length of the list and data */
        u32 checksum;                   /* checksum of the first 32 bytes */
        u32 flags;                      /* see possible flags below */
        u64 reserved1;                  /* unused part of 1st fake run */
        u32 reserved2;                  /* unused part of 1st fake run */
        u32 tr_end_or_seq_num;          /* Before sequence numbers introduction:
                                           zero value means end of transaction.
                                           After sequence numbers introduction:
                                           a non-zero sequence number. */
        struct hfs_jrnl_run runs[0];    /* number of elements is count-1 */
};

Isn't that clearer than the original struct? IMO it is the original
structure that really deserves to be called "obscure and not
informative".

> And I will always say NACK to your patch while the naming of structures
> and fields is not as described in the HFS+ specification (Technical Note
> TN1150). The specification is the common point for everybody who tries to
> understand HFS+ internals and functionality, and if your code doesn't
> comply with the specification then everybody will have trouble
> understanding the code. Simply write your own file system if you always
> want to use your own naming. We have a specification for HFS+.
>
> I prefer to leave mainline untouched rather than have incorrect or obscure
> code for HFS+.
>
> Of course, it is possible to discuss every name, but the common remarks
> from my side are: (1) your names are longer; (2) your names are obscure.
>
>> > First of all, the journal contains a sequence of transactions. Technical Note
>> > TN1150: "A group of related changes is called a transaction. When all of
>> > the changes of a transaction have been written to their normal locations
>> > on disk, that transaction has been committed, and is removed from the
>> > journal."
>>
>> This is about journaling changes, not about replay.
>>
>
> This relates to journal replay as much as to journaling.
>
>> > Secondly, suppose, for example, you had a sudden power-off. It means
>> > that the last transaction will not be finished.
>>
>> You have a poor understanding of the HFS+ journal. After a sudden power-off
>> the last transaction is good. The unfinished (not completely written)
>> transaction may be the one beyond header->data_end. The header
>> is updated only after the transaction data is completely on the disk.
>> This is why the header is 1 sector long - the header update is atomic,
>> and the journal is consistent at all times.
>>
>> If you've found a corrupted transaction, it is not a sudden power-off;
>> rather it is your bug :) or a bug in the FS driver that wrote it. Wise
>> handling is: cry about it using printk, leave the journal unchanged, mount read-only. This is what my code does.
>
> I insist on transaction-based journal replay. I don't think that I have
> a poor understanding of journaling as a whole.
>
> There are many possible reasons that a transaction in the journal can be
> broken:
> (1) You can't guarantee that an initiated flush will succeed. As a result,
> the file system driver can think that all is OK with the written data while
> the storage really holds corrupted data.
> (2) Even if data was written successfully, that doesn't mean the data will
> be read successfully, because of various storage troubles. We do not live
> in an ideal world.

All these rare corruption cases are "cry about it using printk, leave
the journal unchanged".

Ok, what do you think of an implementation that works as follows?

if (a broken transaction is read) {
  1. stop reading journal /* obvious */
  2. replay all good transactions /* they are good and deserve it */
  3. do not update the journal header /* we do not want to deal with it */
  4. mount read-only /* safe behavior for any volume with a journal,
                        but in this case it is an absolute must */
}

It differs from my current logic only in the added step 2.
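
In C it would be something like this (a sketch; transaction_is_broken()
and replay_collected() are invented names):

        if (transaction_is_broken(&tr)) {
                printk(KERN_ERR "hfsplus: corrupted journal transaction\n");
                replay_collected(jnl);          /* step 2: the good ones */
                /* step 3: journal header deliberately left untouched */
                sb->s_flags |= MS_RDONLY;       /* step 4 */
                return -EIO;                    /* step 1: stop reading */
        }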

> So, any transaction can potentially be broken. Moreover, you can't
> guarantee anything about the last transaction after a sudden power-off.
> As a result, it makes sense to replay as many valid transactions as
> possible before discovering an invalid one.
>
>> > And, finally, if you first check all transactions, it means that
>> > you need to read all of them into memory before the real replay. And, as far
>> > as I can judge, you simply keep all sectors in memory before the replay. For
>> > example, suppose your system has 512 MB of RAM and the HFS+ volume has a
>> > fully filled journal 512 MB in size. How lucky will you be with journal
>> > replay on such a system without swap?
>>
>> Pretty lucky :) because:
>> 1st, I store only the data to be written to sectors, not the whole journal.
>> 2nd, there is a feature of my code which significantly reduces memory
>> usage. Different journal entries are often destined for the same sectors,
>> and I avoid allocating separate memory buffers for them. The awesome
>> <linux/interval_tree_generic.h> data structure helps me do it; see
>> hfsplus_replay_data_add().
>>
>> To defend my approach further, I can measure the memory
>> consumption for a 512 MB journal. Would you like me to?
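
(To illustrate: the idea behind hfsplus_replay_data_add() is roughly the
following - simplified pseudo-C from me, not the actual code, and the
names are invented. The interval tree just makes the overlap lookup fast.)

        /* A newer journal entry hitting sectors we already hold pending
         * data for reuses the existing buffer instead of allocating a
         * second one.  Simplified: the real code also handles partial
         * overlaps. */
        range = find_covering_range(&tree, start, len);
        if (range)
                memcpy(range->buf + (start - range->start), data, len);
        else
                insert_range(&tree, start, len, data);
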
>
> There are many possible situations that you don't take into account:
> (1) The journal can be 1 GB in size, for example, while the system has
> only 512 MB of RAM. And the journal could potentially be even larger.
> (2) You can't predict in what environment the system will work and under
> what memory pressure an HFS+ volume will be mounted. So, trying to read
> all transactions into memory can result in a memory allocation
> failure with significant probability. And such an approach is a very
> dangerous way to go, from my point of view.

A threshold, as I already suggested, seems to be a good protection.
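
Roughly, in C (a sketch; the constant and all the names are illustrative):

        #define REPLAY_FLUSH_THRESHOLD (1 << 20)        /* 1 MB, say */

        while (read_next_good_transaction(jnl, &tr) == 0) {
                collect(&pending, &tr);
                if (pending_bytes(&pending) >= REPLAY_FLUSH_THRESHOLD) {
                        flush_pending(&pending); /* write to final locations */
                        free_pending(&pending);
                }
        }
        flush_pending(&pending);        /* whatever is left */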


Thread overview: 11+ messages
2014-04-05  1:02 A few thoughts on hfsplus dev (Re: [PATCH v2] hfsplus: add journal replay) Hin-Tak Leung
2014-04-06 16:01 ` Sergei Antonov
  -- strict thread matches above, loose matches on Subject: below --
2014-04-04 23:35 Hin-Tak Leung
2014-04-06 20:59 ` Sergei Antonov
2014-03-13 20:04 [PATCH v2] hfsplus: add journal replay Sergei Antonov
2014-03-23 23:15 ` A few thoughts on hfsplus dev (Re: [PATCH v2] hfsplus: add journal replay) Hin-Tak Leung
2014-03-24  6:56   ` Vyacheslav Dubeyko
2014-03-25  1:37   ` Sergei Antonov
2014-03-25  6:30     ` Vyacheslav Dubeyko
2014-03-28 13:42       ` Sergei Antonov
2014-03-28 15:09         ` Vyacheslav Dubeyko
2014-03-28 20:04           ` Sergei Antonov
