From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: james harvey <jamespharvey20@gmail.com>
Cc: Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: Questions from aspiring btrfs mini-debugger/mini-developer
Date: Tue, 5 Jun 2018 09:05:49 +0800
Message-ID: <6ac66452-928a-0609-3be5-438dcb67f8e7@gmx.com>
In-Reply-To: <CA+X5Wn5H9643dWbw-YoQ_tLy7Az=EBNLrs1CnDZVv12yOOjrQQ@mail.gmail.com>


On 2018-06-05 08:27, james harvey wrote:
> On Mon, May 28, 2018 at 8:48 AM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>> On 2018-05-28 17:21, james harvey wrote:
>>> #29, through btrfs-tree-debug, is:
>>>
>>>         item 49 key (71469 EXTENT_DATA 3768320) itemoff 13232 itemsize 53
>>>                 generation 218 type 1 (regular)
>>>                 extent data disk byte 2373160960 nr 8384512
>>>                 extent data offset 3764224 nr 425984 ram 8384512
>>>                 extent compression 0 (none)
>>>
>>> Its extents without a data offset (i.e. filefrag #30) look like:
>>>
>>>         item 50 key (71469 EXTENT_DATA 4194304) itemoff 13179 itemsize 53
>>>                 generation 310 type 1 (regular)
>>>                 extent data disk byte 2445152256 nr 49152
>>>                 extent data offset 0 nr 131072 ram 131072
>>>                 extent compression 2 (lzo)
>>>
>>> So, item 49 is saying there's 8,384,512 bytes on disk, but for this
>>> file extent, only read starting 3,764,224 into the extent_data, and
>>> only read 425,984 bytes?
>>
>> Yep, reading from on-disk logical address 2373160960 + 3764224, len 425984.
>>
>>>  This is a snapshotted file.  At first, I was
>>> thinking this might mean most of this extent had changed, but 425,984
>>> bytes in the "middle" were the same, so btrfs was re-using that
>>> portion.  Is that why data_offset is used?
>>
>> Yep.
> 
> Thanks for taking all the time to respond!  Been working through all
> this (and some other things) since your response.  Few follow-up
> questions.
> 
> Can a compressed COW file wind up with an offset, like this hypothetical output:

Yes, it is completely allowed, and such an extent is easy to create.
For example (see the line marked <<<):

	item 6 key (257 EXTENT_DATA 0) itemoff 15816 itemsize 53
		generation 11 type 1 (regular)
		extent data disk byte 13897728 nr 16384
		extent data offset 0 nr 16384 ram 16384
		extent compression 0 (none)
	item 7 key (257 EXTENT_DATA 16384) itemoff 15763 itemsize 53
		generation 9 type 1 (regular)
		extent data disk byte 13893632 nr 4096
		extent data offset 16384 nr 98304 ram 131072 <<<
		extent compression 2 (lzo)
	item 8 key (257 EXTENT_DATA 114688) itemoff 15710 itemsize 53
		generation 11 type 1 (regular)
		extent data disk byte 13914112 nr 16384
		extent data offset 0 nr 16384 ram 16384
		extent compression 0 (none)


> 
>          item 50 key (71469 EXTENT_DATA 4194304) itemoff 13179 itemsize 53
>                  generation 310 type 1 (regular)
>                  extent data disk byte 2445152256 nr 49152
>                  extent data offset 4096 nr 65536 ram 131072
>                  extent compression 2 (lzo)
> 
> I checked my main volume, and don't see any with compression and an
> offset.  So, I'm thinking it might not be allowed.
> 
> If it is allowed, things get a bit confusing.  Is the offset for
> on-disk (compressed) data?

No, the offset always refers to the uncompressed data.

>  Would it be reading from on-disk logical
> address 2445152256  + 4096, len 65536 of the compressed data?
> 
> Or, is the offset for in-memory (uncompressed) data?  If this is the case, un
> 
> But, with it being on the second line here (maybe that's just for line
> length though), and being referred to as btrfs_file_extent_offset
> rather than btrfs_file_extent_***disk****_offset, it makes me think
> offset might always be an in memory (uncompressed) offset.

Yep
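
To make the difference concrete, here is a minimal sketch of how the
offset is applied in the two cases. It is not btrfs-progs code: the
FileExtent and FakeDisk types and read_file_extent() are hypothetical
names made up for illustration, and zlib stands in for lzo only so that
the example runs with the Python standard library. The numbers are taken
from item 49 (uncompressed) and item 7 (compressed) above, except that
the compressed on-disk length is whatever zlib happens to produce.

```python
import zlib
from dataclasses import dataclass


@dataclass
class FileExtent:
    disk_bytenr: int     # logical address of the on-disk extent
    disk_num_bytes: int  # on-disk length (compressed length if compressed)
    offset: int          # offset into the *uncompressed* extent
    num_bytes: int       # number of file bytes this item refers to
    ram_bytes: int       # uncompressed size of the whole extent
    compressed: bool


class FakeDisk:
    """Toy stand-in for the logical address space (hypothetical)."""

    def __init__(self):
        self.chunks = {}

    def write(self, addr, data):
        self.chunks[addr] = bytes(data)

    def read(self, addr, length):
        for start, data in self.chunks.items():
            if start <= addr and addr + length <= start + len(data):
                return data[addr - start:addr - start + length]
        raise ValueError("range is not mapped")


def read_file_extent(fe, disk, decompress):
    if not fe.compressed:
        # Uncompressed: disk layout equals memory layout, so the offset
        # can be added to the logical address directly.
        return disk.read(fe.disk_bytenr + fe.offset, fe.num_bytes)
    # Compressed: read the whole on-disk extent, decompress it, and only
    # then apply the offset to the uncompressed bytes.
    whole = decompress(disk.read(fe.disk_bytenr, fe.disk_num_bytes))
    if len(whole) != fe.ram_bytes:
        raise ValueError("decompressed size does not match ram_bytes")
    return whole[fe.offset:fe.offset + fe.num_bytes]


def pattern(size):
    # Deterministic test data with a period that is not a power of two.
    return (bytes(range(251)) * (size // 251 + 1))[:size]


disk = FakeDisk()

# Item 49: uncompressed, read from 2373160960 + 3764224, len 425984.
plain49 = pattern(8384512)
disk.write(2373160960, plain49)
item49 = FileExtent(2373160960, 8384512, 3764224, 425984, 8384512, False)
data49 = read_file_extent(item49, disk, zlib.decompress)

# Item 7: compressed, offset 16384 and len 98304 within 131072 bytes.
plain7 = pattern(131072)
packed7 = zlib.compress(plain7)  # zlib is an assumption, not btrfs lzo
disk.write(13893632, packed7)
item7 = FileExtent(13893632, len(packed7), 16384, 98304, 131072, True)
data7 = read_file_extent(item7, disk, zlib.decompress)

print(len(data49), len(data7))
```

In the compressed case the logical address is never adjusted by the
offset; only the decompressed buffer is sliced.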

> 
> With checksums being on 4k blocks, in theory, it seems to me like
> on-disk offsetting should be able to happen.

However, csums cover only the on-disk data. That is to say, for a
compressed extent, the csums cover only the range described by
disk_bytenr and disk_len, i.e. the compressed bytes.
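
As a rough sketch of what that means for the number of csums, assuming
4 KiB blocks as in your question: csum_block_addrs() below is a
hypothetical helper, not something from btrfs-progs, and it only does
the arithmetic on the "extent data disk byte ... nr ..." values.

```python
SECTORSIZE = 4096  # assumption: 4 KiB csum blocks, as discussed above


def csum_block_addrs(disk_bytenr, disk_num_bytes, sectorsize=SECTORSIZE):
    """Logical addresses of the on-disk blocks that each carry one csum."""
    if disk_num_bytes % sectorsize:
        raise ValueError("on-disk length is not a multiple of the sector size")
    count = disk_num_bytes // sectorsize
    return [disk_bytenr + i * sectorsize for i in range(count)]


# Item 7 above: 4096 bytes on disk, 131072 bytes once decompressed.
item7_blocks = csum_block_addrs(13893632, 4096)

# The hypothetical item 50: 49152 bytes on disk, 131072 bytes in memory.
item50_blocks = csum_block_addrs(2445152256, 49152)

print(len(item7_blocks), len(item50_blocks))  # 1 12
print(131072 // SECTORSIZE)                   # 32 blocks of uncompressed data
```

So there is no csum for an individual 4 KiB block of the uncompressed
data; the csums only describe the compressed blocks on disk.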

> 
> 
>>> Am I right that preallocated means no data has actually been written
>>> there?
>>
>> Yes, but the space is already allocated for possible later writes.
>> That's why we call it pre-allocated.
> 
> Ahh, I was misunderstanding pre-allocated to mean before allocation.
> 
> 
>>> Given an extent_buffer, btrfs_item, slot, and btrfs_file_extent_item,
>>> if the extent type is BTRFS_FILE_EXTENT_INLINE, how would one get the
>>> on-disk (so if compressed, in compressed format) data?
>>
>> Read from the leaf.
>> As the name "inline" suggests, the data is recorded directly in the
>> leaf, so there is no need to use disk_bytenr.
>> In fact, the inlined data is stored starting at the offset where
>> disk_bytenr would otherwise be.
>>
>>>  With
>>> non-inline, non-prealloc extents, I'm using bytenr as location and
>>> num_bytes as length, and code based off btrfs-map-logical, which winds
>>> up using read_extent_data with a mirror number argument, which uses
>>> btrfs_map_block() on that logical address and mirror and pread64() to
>>> do the read.  For inline data, there's no logical address.
> 
> Sorry, my question wasn't clear.  Assuming it's mirrored, I was
> wondering how to get both copies of the metadata,

You don't really need to care or worry about this.

In theory, you could read out a specific mirror in btrfs-progs by using
the mirror number (0 means the first good copy, 1 means the first copy,
and 2 means the second copy for RAID1).

But normally nothing will go wrong here: we have checksums for metadata,
so it won't be a problem.
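
The mirror number convention can be sketched like this. pick_copy() is
a hypothetical helper written only for illustration, not a btrfs-progs
function, and zlib.crc32 is just a convenient stand-in for the real
metadata checksum.

```python
import zlib


def pick_copy(copies, mirror_num, is_good):
    """Select a copy by mirror number: 0 = first good copy, n = n-th copy."""
    if mirror_num == 0:
        for copy in copies:
            if is_good(copy):
                return copy
        raise IOError("no good copy found")
    if not 1 <= mirror_num <= len(copies):
        raise ValueError("no such mirror")
    return copies[mirror_num - 1]


good = b"tree block contents"
expected_csum = zlib.crc32(good)  # stand-in checksum (assumption)

# RAID1 with a corrupted first copy and an intact second copy.
copies = [b"tree block c0ntents", good]


def is_good(block):
    return zlib.crc32(block) == expected_csum


first = pick_copy(copies, 1, is_good)        # first copy, even if bad
second = pick_copy(copies, 2, is_good)       # second copy
first_good = pick_copy(copies, 0, is_good)   # skips the corrupted copy

print(first == good, second == good, first_good == good)
```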

> which would give
> both copies of the inline data, so the mirrored data could be
> compared.

We have a csum for the whole tree block, which means that before you can
read anything from the leaf, the block must match its csum.
So this is much less likely to cause a problem.

>  I've since realized that since it's in the metadata, the
> metadata checksumming which (I think) can't be turned off will cover
> it.  So, there's no need to examine these whatsoever in the context of
> checking for mismatched mirrored data.  A NOCOW/NODATASUM flag on the
> inode would be irrelevant.  Am I right here?

Yep.

> 
> Does scrub cover inline data marked NOCOW/NODATASUM?

Nope.
Btrfs scrub only checks extents.
Inline data doesn't have any extent of its own. Only the tree leaf
containing the inlined data is an extent.

In that case, btrfs just checks the csum of the tree block.

Furthermore, since metadata is always CoWed, the NOCOW/NODATASUM flags
don't make any sense for inlined data, even if they are set.
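
For the earlier point that inlined data starts where disk_bytenr would
otherwise be, here is a minimal parsing sketch. It assumes the usual
little-endian, packed layout of the file extent item (generation,
ram_bytes, compression, encryption, other_encoding, type, then
disk_bytenr and the other three u64 fields for regular extents).
inline_payload() is a hypothetical helper, and the item bytes in the
example are made up.

```python
import struct

# Fixed part shared by all file extent items:
# generation (u64), ram_bytes (u64), compression (u8),
# encryption (u8), other_encoding (u16), type (u8)
HEADER = struct.Struct("<QQBBHB")

FILE_EXTENT_INLINE = 0  # type 1 is "regular", as in the dumps above


def inline_payload(item):
    """Return the inlined bytes exactly as stored in the leaf."""
    fields = HEADER.unpack_from(item)
    extent_type = fields[5]
    if extent_type != FILE_EXTENT_INLINE:
        raise ValueError("not an inline extent")
    # The payload begins where disk_bytenr would start in a regular item.
    # If the extent is compressed, this is still the compressed form.
    return item[HEADER.size:]


# Made-up inline item: generation 11, ram_bytes 5, no compression.
item = HEADER.pack(11, 5, 0, 0, 0, FILE_EXTENT_INLINE) + b"hello"
payload = inline_payload(item)

print(HEADER.size, payload)
```

The header is 21 bytes, and a regular item adds four u64 fields after
it, which matches the "itemsize 53" in the dumps above.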

Thanks,
Qu

