linux-btrfs.vger.kernel.org archive mirror
From: Hubert Kario <hka@qbs.com.pl>
To: Paul Millar <paul.millar@desy.de>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: A couple of questions
Date: Thu, 27 May 2010 16:56:00 +0200
Message-ID: <201005271656.00398.hka@qbs.com.pl>
In-Reply-To: <201005271539.55644.paul.millar@desy.de>

On Thursday 27 May 2010 15:39:54 Paul Millar wrote:
> Hi,
>
> I've been looking at Btrfs and have a couple of naive questions that don't
> seem to be answered on the wiki or in the articles I've read on the
> filesystem.
>
>
> First: discovering a file's checksum value.
>
> Here's the scenario: software is writing some data as a fresh file.  This
> software happens to know (a priori) the checksum of this data; for example,
> a storage server receives the file's data and checksum independently.
>
> I've some confidence that, once the data is stored in btrfs, any corruption
> (from the storage fabric) will be spotted; however, the data may have
> become corrupt before being stored (e.g., from the network).  To catch
> this, the checksum of the stored data needs to be calculated and checked.
>
> One approach is to calculate the checksum (in user-space) after the data is
> stored.  This adds extra IO- and CPU-load and there's also the possibility
> of false-negative results due to the filesystem cache (although btrfs may
> remove this risk).
>
> Another approach would be to ask btrfs for the checksum.  It seems that
> it's possible to combine multiple CRC-32C values to figure out the
> checksum of the combined data [e.g., zlib's crc32_combine() function].
> So, obtaining a file's checksum might be a light-weight operation.
>
> Yet another possibility would be to push the desired checksum value (via
> fcntl?) and have btrfs compare the desired checksum with the file's actual
> checksum on close(2), failing that call if the checksums don't match.
>
> Would any of this be possible (without an awful lot of work)?

IMO, if an application receives data with a checksum, it can calculate the
checksum of the data on the fly as it writes it to disk. That won't add any
additional IO to the storage subsystem. It won't detect in-memory corruption,
though; if you want to be resilient to that, you should be looking at ECC
RAM, as subsequent checks can be affected by it too.
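
A minimal sketch of the on-the-fly approach, assuming the data arrives as
an iterable of byte chunks. The function name write_with_checksum and its
parameters are hypothetical, and Python's zlib.crc32 (plain CRC-32, not
CRC-32C) stands in for whatever checksum the sender actually supplies:

```python
import os
import zlib


def write_with_checksum(chunks, path, expected_crc):
    """Write chunks to path while computing a running CRC-32.

    Returns True if the checksum of the bytes handed to write()
    matches expected_crc. Assumption: the sender used CRC-32 as
    implemented by zlib; substitute the real algorithm as needed.
    """
    crc = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            # Update the running checksum from the same buffer that is
            # written, so no extra read of the file is required.
            crc = zlib.crc32(chunk, crc)
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())
    return crc == expected_crc
```

This only checks the bytes as they pass through the application; as noted
above, it cannot catch corruption that happens in memory afterwards.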

Second, you shouldn't tie an application or network protocol to the CRC scheme
used by the filesystem on the server! Especially when other CRC algorithms
can be used, not only CRC-32C.
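
One way to keep the protocol independent of the filesystem is to have the
sender name its own checksum algorithm alongside the value. The sketch
below is illustrative only: verify_payload and its parameters are
hypothetical, and it assumes the sender uses a digest known to Python's
hashlib rather than a CRC:

```python
import hashlib


def verify_payload(data, algorithm, expected_hex):
    """Check data against a digest whose algorithm the sender names.

    Assumption: algorithm is a name known to hashlib (e.g. "sha256");
    nothing here depends on what the filesystem uses internally.
    """
    if algorithm not in hashlib.algorithms_available:
        raise ValueError(f"unsupported checksum algorithm: {algorithm}")
    digest = hashlib.new(algorithm, data).hexdigest()
    return digest == expected_hex.lower()
```

Because the algorithm travels with the data, the server's filesystem can
change its internal checksum without affecting clients.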

If the checksum algorithm used by the FS were set in stone, then userspace
could employ it somehow, but if different CRCs can be used, I see no reason
to allow userspace to read them.


-- 
Hubert Kario
QBS - Quality Business Software
02-656 Warszawa, ul. Ksawerów 30/85
tel. +48 (22) 646-61-51, 646-74-24
www.qbs.com.pl

Quality Management System
compliant with ISO 9001:2000

Thread overview: 11+ messages
2010-05-27 13:39 A couple of questions Paul Millar
2010-05-27 14:56 ` Hubert Kario [this message]
2010-05-31 17:59   ` Paul Millar
2010-06-02 16:19     ` Hubert Kario
2010-05-27 16:00 ` Chris Mason
2010-05-31 18:06   ` Paul Millar
2010-05-31 20:33     ` Mike Fedyk
2010-06-02 11:56       ` Paul Millar
2010-06-01 13:39     ` Martin K. Petersen
2010-06-02 13:40       ` Paul Millar
2010-06-04  1:17         ` Martin K. Petersen