* [RFC] udftools: steps towards fsck
From: Steve Magnani @ 2019-03-07  2:44 UTC
  To: Jan Kara, pali.rohar, reinoud; +Cc: Colin King, linux-kernel, linux-fsdevel

(Please remove at least LKML when responding. Mailing lists are a 
scattershot attempt to reach others who might be interested in this 
topic since I'm not aware of any linux-udf mailing list. )

A few months ago I stumbled across an interesting bit of abandonware in 
the Sourceforge CVS repo that hosted UDF development through about 2004. 
Code that originated here eventually became the modern-day udftools:

     https://sourceforge.net/p/linux-udf/code/

The 'udf' module in that repo contains a program from 1999 named 
'chkudf', which appears to have been written by Rob Simms. Being from 
the Y2K era, the program has no awareness of anything beyond UDF2.01; in 
particular, its comprehension of VAT reflects UDF1.50 and not the 
revamped design introduced in UDF2.00. But it does have an ability to 
analyze the major UDF data structures and to walk the filesystem.

I've spent quite a bit of time enhancing and fixing bugs in this code, 
with a short term goal of being able to report damage to UDF2.01 
filesystems on "hard disk" (magnetic and SSD) media. It's not quite to 
the point of being release-ready, but I think the code is on the cusp of 
becoming useful to others so I wanted to get some feedback on the approach.

I posted a Git conversion (done via SVN) of the CVS repo here, including 
all the changes I've made so far:

     https://github.com/smagnani/chkudf.git

If you're interested in building the code you should be able to just run 
'make' within the chkudf folder. On Debian-derived systems you'll need 
libblkid-dev installed in order to build.
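
For the record, on a Debian-derived box the whole sequence boils down to 
roughly the following (assuming a working C toolchain is already present; 
the exact path to the chkudf sources depends on the repo layout):

  $ sudo apt-get install libblkid-dev     # build dependency mentioned above
  $ git clone https://github.com/smagnani/chkudf.git
  $ cd chkudf                             # then into the folder holding the chkudf Makefile
  $ make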

Some questions for consideration:

* Would a udffsck limited to checking of UDF2.01 and earlier on "hard 
disk" media be a sufficiently useful starting point to justify inclusion 
in udftools? Obviously a tool with such limitations would have to be 
particularly vigilant about ensuring that media-under-test doesn't 
exceed its capabilities.

* If so, do you think the chkudf implementation could qualify? It's not 
ready yet, but with an investment of some time and energy it could be 
made more functionally complete and (maybe more importantly) more 
user-friendly.

In part this is a question of whether the chkudf design can support 
enhancements to get (eventually) to UDF2.60 and optical media support, 
balanced against the many years without an open-source udffsck and not 
"letting the perfect become the enemy of the good."

* For any standards-based parser it's important to have examples of as 
many variations as possible (both normal and pathological) in order to 
ensure that corner cases and less common features are tested properly. 
Can anyone point me to any good sources of UDF data for testing? There 
are always commercial DVDs and Blu-Ray discs, of course, and I've 
cobbled together a few special cases by hand (i.e., a filesystem with 
directory cycles), but I have no examples with extended attributes or 
stream data. If I could find a DVD of Mac software in a resale shop 
would that help? [Side note: I've thought of enhancing chkudf to support 
a tool that would store all the UDF structures of a filesystem in a 
tarball that could be used to reconstitute that filesystem within a 
sparse file. Since none of the file contents would be stored, the 
tarballs would be relatively small even if they represent terabyte-scale 
filesystems. A rough sketch of this idea appears after the list below.]

* Are there versions (or features) of UDF that are less important to 
support than others (1.50? Strategy 4096? Named streams? etc.)? I know 
1.02, 2.01, and 2.50 are in wide use.

* What kinds of repairs are most important to implement? I was thinking 
that regeneration of the Logical Volume Integrity Descriptor and the 
unallocated space bitmap are both important and hopefully relatively 
straightforward. Beyond that...recovering ICBs to "lost+found"?
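
(A rough sketch of the metadata-tarball side note from the bullet above - 
the file names are invented, and the middle step is exactly the piece a 
chkudf-derived tool would have to implement:)

  $ truncate -s 2T udf-meta.img      # sparse stand-in, same size as the real fs
  #   ... the tool writes only the UDF descriptors into udf-meta.img at
  #       their original offsets, leaving all file data as holes ...
  $ tar --sparse -czf udf-meta.tar.gz udf-meta.img   # stores only non-hole data
  $ tar --sparse -xzf udf-meta.tar.gz                # reconstitutes the sparse image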


My 2 cents:
I didn't write this program. There are things I would have done 
differently, but to this point I have tried to work within the existing 
design and code style. After becoming more aware of differences between 
the various UDF standards (in particular, the increase in complexity 
since 2.01) and the many errata involved, I have a gut feeling that an 
implementation in a language that supports inheritance might be a lot 
more manageable over the long term - but it's not something I've spent a 
lot of time thinking about. I've only recently become aware of 
UDFclient, and haven't had time to look over its design yet. And, I can 
see the potential for follow-on utilities such as a filesystem resizer - 
which might argue for making more of the code library-based and not so 
heavy on printed output.

Bottom line...udffsck has to start somewhere, could it start with chkudf?

Thanks for reading.
------------------------------------------------------------------------
  Steven J. Magnani               "I claim this network for MARS!
  www.digidescorp.com              Earthling, return my space modulator!"

  #include <standard.disclaimer>



* Re: [RFC] udftools: steps towards fsck
From: Pali Rohár @ 2019-03-07 10:23 UTC
  To: Steve Magnani
  Cc: Jan Kara, reinoud, Colin King, Vojtěch Vladyka,
	linux-kernel, linux-fsdevel

On Wednesday 06 March 2019 20:44:54 Steve Magnani wrote:
> (Please remove at least LKML when responding. Mailing lists are a
> scattershot attempt to reach others who might be interested in this topic
> since I'm not aware of any linux-udf mailing list. )

IIRC there is no linux-udf mailing list, but I do not see any reason not
to use linux-fsdevel or linux-kernel.

> A few months ago I stumbled across an interesting bit of abandonware in the
> Sourceforge CVS repo that hosted UDF development through about 2004. Code
> that originated here eventually became the modern-day udftools:
> 
>     https://sourceforge.net/p/linux-udf/code/

The udftools project was moved to GitHub:
https://github.com/pali/udftools/

Ben (the original project developer) also updated the SourceForge page;
you can see there a big blue box, "This project can now be found here.",
which points to GitHub.

> The 'udf' module in that repo contains a program from 1999 named 'chkudf',
> which appears to have been written by Rob Simms. Being from the Y2K era, the
> program has no awareness of anything beyond UDF2.01; in particular, its
> comprehension of VAT reflects UDF1.50 and not the revamped design introduced
> in UDF2.00. But it does have an ability to analyze the major UDF data
> structures and to walk the filesystem.

As the project page was moved to GitHub I also converted the whole
source code history, so you can still find the old chkudf code there:

https://github.com/pali/udftools/tree/87acf1a2306b7b60ed9d61b53c2a487ea5f3396c/src/chkudf

But I would like to let you know that Vojtěch (CCed) has started working
on a udffsck implementation as part of his master's thesis, and the
current WIP code is available on GitHub in a pull request:

https://github.com/pali/udftools/pull/7

So it would be great if you could look at the new code and perhaps help
Vojtěch finish that effort, rather than trying to port and fix 20-year-old
code which was already removed from the udftools project...

> I've spent quite a bit of time enhancing and fixing bugs in this code, with
> a short term goal of being able to report damage to UDF2.01 filesystems on
> "hard disk" (magnetic and SSD) media. It's not quite to the point of being
> release-ready, but I think the code is on the cusp of becoming useful to
> others so I wanted to get some feedback on the approach.
> 
> I posted a GIT port (via SVN) of the CVS repo here, including all the
> changes I've made so far:
> 
>     https://github.com/smagnani/chkudf.git
> 
> If you're interested in building the code you should be able to just run
> 'make' within the chkudf folder. On Debian-derived systems you'll need
> libblkid-dev installed in order to build.
> 
> Some questions for consideration:
> 
> * Would a udffsck limited to checking of UDF2.01 and earlier on "hard disk"
> media be a sufficiently useful starting point to justify inclusion in
> udftools? Obviously a tool with such limitations would have to be
> particularly vigilant about ensuring that media-under-test doesn't exceed
> its capabilities.
> 
> * If so, do you think the chkudf implementation could qualify? It's not
> ready yet, but with an investment of some time and energy it could be made
> more functionally complete and (maybe more importantly) more user-friendly.
> 
> In part this is a question of whether the chkudf design can support
> enhancements to get (eventually) to UDF2.60 and optical media support,
> balanced against the many years without an open-source udffsck and not
> "letting the perfect become the enemy of the good."
> 
> * For any standards-based parser it's important to have examples of as many
> variations as possible (both normal and pathological) in order to ensure
> that corner cases and less common features are tested properly. Can anyone
> point me to any good sources of UDF data for testing? There are always
> commercial DVDs and Blu-Ray discs, of course, and I've cobbled together a
> few special cases by hand (i.e., a filesystem with directory cycles), but I
> have no examples with extended attributes or stream data. If I could find a
> DVD of Mac software in a resale shop would that help? [Side note, I've
> thought of enhancing chkudf to support a tool that would store all the UDF
> structures of a filesystem in a tarball that could be used to reconstitute
> that filesystem within a sparse file. Since none of the file contents would
> be stored the tarballs would be relatively small even if they represent
> terabyte-scale filesystems.
> 
> * Are there versions (or features) of UDF that are less important to support
> than others (1.50? Strategy 4096? Named streams? etc.) I know 1.02, 2.01,
> and 2.50 are in wide use.

Currently udftools supports UDF revisions 1.01, 1.02, 1.50, 2.00 and
2.01, and for BD-R (without a metadata partition) also 2.50 and 2.60.

> * What kinds of repairs are most important to implement? I was thinking that
> regeneration of the Logical Volume Integrity Descriptor and the unallocated
> space bitmap are both important and hopefully relatively straightforward.
> Beyond that...recovering ICBs to "lost+found"?
> 
> 
> My 2 cents:
> I didn't write this program. There are things I would have done differently,
> but to this point I have tried to work within the existing design and code
> style. After becoming more aware of differences between the various UDF
> standards (in particular, the increase in complexity since 2.01) and the
> many errata involved, I have a gut feeling that an implementation in a
> language that supports inheritance might be a lot more manageable over the
> long term - but it's not something I've spent a lot of time thinking about.
> I've only recently become aware of UDFclient, and haven't had time to look
> over its design yet. And, I can see the potential for followon utilities
> such as a filesystem resizer - which might argue for making more of the code
> library-based and not so heavy on printed output.
> 
> Bottom line...udffsck has to start somewhere, could it start with chkudf?
> 
> Thanks for reading.
> ------------------------------------------------------------------------
>  Steven J. Magnani               "I claim this network for MARS!
>  www.digidescorp.com              Earthling, return my space modulator!"
> 
>  #include <standard.disclaimer>
> 

-- 
Pali Rohár
pali.rohar@gmail.com


* Re: [RFC] udftools: steps towards fsck
From: Steve Magnani @ 2019-03-08 16:50 UTC
  To: Pali Rohár
  Cc: Jan Kara, reinoud, Colin King, Vojtěch Vladyka, linux-fsdevel

Hi Pali,

On 3/7/19 4:23 AM, Pali Rohár wrote:
> On Wednesday 06 March 2019 20:44:54 Steve Magnani wrote:
>> ...
>>   
> udftools project was moved to github:
> https://github.com/pali/udftools/
>
> Ben (original project developer) also updated sourceforce page and you
> can see there a big blue box "This project can now be found here." which
> points to github.
>> ...
>>   
> As project page was moved to github I converted also whole source code
> history. You can find there also that that old chkudf code...
>
> https://github.com/pali/udftools/tree/87acf1a2306b7b60ed9d61b53c2a487ea5f3396c/src/chkudf


Right. But AFAICS there is no way to get _from_ the link in the "big 
blue box" _to_ the chkudf path you cite via normal browser navigation. 
Also I see no evidence of chkudf when using qgit or git-cola to browse a 
clone of the udftools repo. This is why I was working from the original 
CVS repo.

Since your git-fu is stronger than mine, can you educate me on how to 
navigate to this tree without a magic URL? Are there better tools that I 
should be using to work with local Git repo clones?


> But, I would like to let you know that Vojtěch (CCed) started working on
> udffsck implementation as part of his master thesis and current WIP code
> is available on github in pull request:
>
> https://github.com/pali/udftools/pull/7
>
> So it would be great if you look at new code and probably help Vojtěch
> to finish new effort as trying to port and fix 20 years old code which
> was already removed from udftools project...

Sure, I was hoping to pool resources.

One suggestion - since udffsck is much-desired, and you have a pull 
request, a reference to it either in a README within the stub udffsck 
folder or a printf() to that effect added to main() would make it more 
obvious that work is in progress and where to find it. I checked the 
udftools Github repo when I started this adventure but it never crossed 
my mind to look through pull requests.

>> Some questions for consideration:
>>
>> ...
>>
>> * For any standards-based parser it's important to have examples of as many
>> variations as possible (both normal and pathological) in order to ensure
>> that corner cases and less common features are tested properly. Can anyone
>> point me to any good sources of UDF data for testing? There are always
>> commercial DVDs and Blu-Ray discs, of course, and I've cobbled together a
>> few special cases by hand (i.e., a filesystem with directory cycles), but I
>> have no examples with extended attributes or stream data. If I could find a
>> DVD of Mac software in a resale shop would that help? [Side note, I've
>> thought of enhancing chkudf to support a tool that would store all the UDF
>> structures of a filesystem in a tarball that could be used to reconstitute
>> that filesystem within a sparse file. Since none of the file contents would
>> be stored the tarballs would be relatively small even if they represent
>> terabyte-scale filesystems.
Any thoughts on this? It would seem like a library of test cases would 
help both udftools and kernel driver development.
>>
>> * Are there versions (or features) of UDF that are less important to support
>> than others (1.50? Strategy 4096? Named streams? etc.) I know 1.02, 2.01,
>> and 2.50 are in wide use.
> Currently udftools support UDF revisions 1.01, 1.02, 1.50, 2.00, 2.01
> and for BD-R (without metadata partition) also 2.50 and 2.60.

I'm not quite sure how to read this. Currently-functional tools such as 
mkudffs support those versions? Vojtěch's code supports those versions? 
All of the versions are equally represented in field use? The reason I 
was asking was to try to prioritize development, to avoid getting bogged 
down (at least initially) in details of UDF that don't have as much 
practical significance.

Regards,

------------------------------------------------------------------------
  Steven J. Magnani               "I claim this network for MARS!
  www.digidescorp.com              Earthling, return my space modulator!"

  #include <standard.disclaimer>



* Re: [RFC] udftools: steps towards fsck
From: Pali Rohár @ 2019-03-09 14:25 UTC
  To: Steve Magnani
  Cc: Jan Kara, reinoud, Colin King, Vojtěch Vladyka, linux-fsdevel


Hi!

On Friday 08 March 2019 10:50:34 Steve Magnani wrote:
> Hi Pali,
> 
> On 3/7/19 4:23 AM, Pali Rohár wrote:
> > On Wednesday 06 March 2019 20:44:54 Steve Magnani wrote:
> > > ...
> > udftools project was moved to github:
> > https://github.com/pali/udftools/
> > 
> > Ben (original project developer) also updated sourceforce page and you
> > can see there a big blue box "This project can now be found here." which
> > points to github.
> > > ...
> > As project page was moved to github I converted also whole source code
> > history. You can find there also that that old chkudf code...
> > 
> > https://github.com/pali/udftools/tree/87acf1a2306b7b60ed9d61b53c2a487ea5f3396c/src/chkudf
> 
> 
> Right. But AFAICS there is no way to get _from_ the link in the "big blue
> box" _to_ the chkudf path you cite via normal browser navigation. Also I see
> no evidence of chkudf when using qgit or git-cola to browse a clone of the
> udftools repo. This is why I was working from the original CVS repo.

I do not know or use qgit or git-cola. But git itself lets you find the
commit which removed chkudf from the repository, and git checkout lets
you check out the chkudf code. E.g.:

  $ git log -- src/chkudf

  $ git checkout -b my_chkudf_branch 0bccda9

> Since your git-fu is stronger than mine, can you educate me on how to
> navigate to this tree without a magic URL? Are there better tools that I
> should be using to work with local GIT repo clones?

  $ git clone git://github.com/pali/udftools.git
  $ cd udftools
  $ git checkout 87acf1a2306b7b60ed9d61b53c2a487ea5f3396c
  $ cd src/chkudf

> > But, I would like to let you know that Vojtěch (CCed) started working on
> > udffsck implementation as part of his master thesis and current WIP code
> > is available on github in pull request:
> > 
> > https://github.com/pali/udftools/pull/7
> > 
> > So it would be great if you look at new code and probably help Vojtěch
> > to finish new effort as trying to port and fix 20 years old code which
> > was already removed from udftools project...
> 
> Sure, I was hoping to pool resources.
> 
> One suggestion - since udffsck is much-desired, and you have a pull request,
> a reference to it either in a README within the stub udffsck folder or a
> printf() to that effect added to main() would make it more obvious that work
> is in progress and where to find it. I checked the udftools Github repo when
> I started this adventure but it never crossed my mind to look through pull
> requests.

Heh, it is a good idea to look at reported issues and pull requests.
There is already a reported issue that fsck.udf is missing, with a link
to the pull request where the WIP implementation lives. Also, it is a
good idea to ask whether somebody else has already started doing the
same thing :-)

> > > Some questions for consideration:
> > > 
> > > ...
> > > 
> > > * For any standards-based parser it's important to have examples of as many
> > > variations as possible (both normal and pathological) in order to ensure
> > > that corner cases and less common features are tested properly. Can anyone
> > > point me to any good sources of UDF data for testing? There are always
> > > commercial DVDs and Blu-Ray discs, of course, and I've cobbled together a
> > > few special cases by hand (i.e., a filesystem with directory cycles), but I
> > > have no examples with extended attributes or stream data. If I could find a
> > > DVD of Mac software in a resale shop would that help? [Side note, I've
> > > thought of enhancing chkudf to support a tool that would store all the UDF
> > > structures of a filesystem in a tarball that could be used to reconstitute
> > > that filesystem within a sparse file. Since none of the file contents would
> > > be stored the tarballs would be relatively small even if they represent
> > > terabyte-scale filesystems.
> Any thoughts on this? It would seem like a library of test cases would help
> both udftools and kernel driver development.

I'm not aware of any "standards-based" test UDF filesystem images.

For the purpose of parsing and testing UDF labels and UUIDs I generated
UDF filesystem images which are part of the util-linux project, home of
the libblkid library (used e.g. by mount or by GUI programs for showing
the fs LABEL). You can find them in the util-linux repository; the online
links are:

https://github.com/karelzak/util-linux/tree/master/tests/ts/blkid/images-fs
https://github.com/karelzak/util-linux/tree/master/tests/expected/blkid

In "images-fs" are compressed filesystem images and in "blkid" are
parsed outputs, information about image.
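
For example, to poke at one of them locally (the image name below is just
illustrative - pick a real udf-*.img.xz from that directory):

  $ git clone https://github.com/karelzak/util-linux.git
  $ cd util-linux/tests/ts/blkid/images-fs
  $ xz -dk udf-hdd.img.xz            # illustrative name; -k keeps the .xz
  $ blkid -p -o export udf-hdd.img   # low-level probe: TYPE, LABEL, UUID, ...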

> > > 
> > > * Are there versions (or features) of UDF that are less important to support
> > > than others (1.50? Strategy 4096? Named streams? etc.) I know 1.02, 2.01,
> > > and 2.50 are in wide use.
> > Currently udftools support UDF revisions 1.01, 1.02, 1.50, 2.00, 2.01
> > and for BD-R (without metadata partition) also 2.50 and 2.60.
> 
> I'm not quite sure how to read this. Currently-functional tools such as
> mkudffs support those versions?

Yes, it applies to mkudffs, udflabel and udfinfo tools.

> Vojtěch's code supports those versions?

IIRC Vojtěch's udffsck code supports only UDF revision 2.01 and is
restricted to UDF filesystems which track free space with an Unallocated
Space Bitmap, no VAT, no Sparing Tables, ... and the other options which
are the default values for mkudffs when formatting "hard disk" media.
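
In other words, roughly the kind of image you get from mkudffs defaults,
e.g. (a sparse file stands in for a block device here; option spellings
are from current udftools, so double-check against your mkudffs version):

  $ truncate -s 1G udf201-test.img
  $ mkudffs --media-type=hd --udfrev=0x0201 --label=TEST udf201-test.img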

> All of the versions are equally represented in field use?

There are some differences between UDF revisions. E.g. some features are
unsupported by older revisions (VAT and Sparing Tables prior to 1.50);
some were changed in newer revisions (the VAT is different in 1.50 and in
2.00+); and some are introduced in newer revisions as mandatory (the
Metadata partition in UDF 2.50+).

> The reason I was
> asking was to try to prioritize development, to avoid getting bogged down
> (at least initially) in details of UDF that don't have as much practical
> significance.

The Linux kernel udf.ko driver supports R/W mode only up to revision 2.01.
That is because it does not support write operations to the Metadata
partition, and in UDF 2.50+ the Metadata partition is required for
overwritable media (like hard disks).

So check & repair support has the highest priority for UDF revision 2.01
on hard disks (overwritable media).

> Regards,
> 
> ------------------------------------------------------------------------
>  Steven J. Magnani               "I claim this network for MARS!
>  www.digidescorp.com              Earthling, return my space modulator!"
> 
>  #include <standard.disclaimer>
> 

-- 
Pali Rohár
pali.rohar@gmail.com



* Re: [RFC] udftools: steps towards fsck
From: Jan Kara @ 2019-03-11 17:05 UTC
  To: Steve Magnani
  Cc: Pali Rohár, Jan Kara, reinoud, Colin King,
	Vojtěch Vladyka, linux-fsdevel

On Fri 08-03-19 10:50:34, Steve Magnani wrote:
> > > * For any standards-based parser it's important to have examples of as many
> > > variations as possible (both normal and pathological) in order to ensure
> > > that corner cases and less common features are tested properly. Can anyone
> > > point me to any good sources of UDF data for testing? There are always
> > > commercial DVDs and Blu-Ray discs, of course, and I've cobbled together a
> > > few special cases by hand (i.e., a filesystem with directory cycles), but I
> > > have no examples with extended attributes or stream data. If I could find a
> > > DVD of Mac software in a resale shop would that help? [Side note, I've
> > > thought of enhancing chkudf to support a tool that would store all the UDF
> > > structures of a filesystem in a tarball that could be used to reconstitute
> > > that filesystem within a sparse file. Since none of the file contents would
> > > be stored the tarballs would be relatively small even if they represent
> > > terabyte-scale filesystems.
> Any thoughts on this? It would seem like a library of test cases would help
> both udftools and kernel driver development.

Agreed. E.g. e2fsprogs has a relatively large set of fs images to test
fsck functionality. Initially I think it is important to have just several
images checking basic functionality and an easy way to run them. New test
images can then be added as we find various corruptions in the wild... I
have a couple of images that I test the kernel driver against which I can
contribute.
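
A minimal runner over such a set of images could be as simple as the
following (the directory, the .img.xz naming and the checker binary name
are placeholders):

  $ for img in tests/udf/*.img.xz; do
  >     xz -dkf "$img"                            # decompress, keep the .xz
  >     ./chkudf "${img%.xz}" || echo "FAIL: ${img%.xz}"
  > done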

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

