* [PATCH 00/17] btrfs: add read-only support for subpage sector size
@ 2020-09-08 7:52 Qu Wenruo
2020-09-08 7:52 ` [PATCH 01/17] btrfs: extent-io-tests: remove invalid tests Qu Wenruo
` (18 more replies)
0 siblings, 19 replies; 49+ messages in thread
From: Qu Wenruo @ 2020-09-08 7:52 UTC (permalink / raw)
To: linux-btrfs
Patches can be fetched from github:
https://github.com/adam900710/linux/tree/subpage
Currently btrfs only allows mounting a filesystem whose sectorsize equals
PAGE_SIZE. That means a 64K page size system can only use filesystems
with 64K sector size.
This is a big compatibility problem for btrfs.
This patchset partially addresses the problem by allowing a 64K page
size system to mount a 4K sectorsize fs in read-only mode.
The main objective here is to remove the blockage in the code base, and
pave the road to full RW mount support.
== What works ==
Existing regular page sized sector size support
Subpage read-only mount (with all self tests and ASSERTs)
Subpage metadata read (including all trees and inline extents, and csum checking)
Subpage uncompressed data read (with csum checking)
== What doesn't work ==
Read-write mount (see the subject)
Compressed data read
== Challenges ==
The main problem is metadata, where we have several limitations:
- We always read a full page for metadata
In the subpage case, one full page can contain several tree blocks.
- We use page::private to point to the extent buffer
This means we currently only support a one-page-to-one-extent-buffer
mapping.
For subpage support, we need a one-page-to-multiple-extent-buffer
mapping.
== Solutions ==
So for the metadata part, we use the following methods to work around
the problem:
- Introduce a subpage_eb_mapping structure to do bitmap tracking
For subpage, page::private now points to a subpage_eb_mapping
structure, which has a bitmap mapping one page to multiple extent
buffers.
- Still do full page reads for metadata
This means that at read time, we're not reading just one extent buffer,
but possibly many more.
In that case, we first verify the tree block that triggered the read,
and mark the page uptodate.
Any later incoming tree block read will check whether the tree block is
already verified. If not, even if the page is uptodate, we still
verify the extent buffer.
This way, all subpage extent buffers are verified properly.
For the data part, it's pretty simple: all existing infrastructure can
be easily converted to support subpage read, without any subpage
specific handling yet.
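To make the bitmap idea above concrete, here is a small user-space
sketch. The structure layout, field names and the subpage_slot() helper
are purely illustrative assumptions (the real subpage_eb_mapping is
introduced in patch 16), but they show how one 64K page can map up to
16 tree blocks of 4K each:

```c
#include <assert.h>

/*
 * Hypothetical sketch of the subpage_eb_mapping idea: one page holds
 * PAGE_SIZE / sectorsize slots, each slot optionally backed by an
 * extent buffer, with a bitmap tracking which slots are in use.
 */
#define SUBPAGE_PAGE_SIZE	(64 * 1024)
#define SUBPAGE_SECTORSIZE	(4 * 1024)
#define SUBPAGE_NR_SLOTS	(SUBPAGE_PAGE_SIZE / SUBPAGE_SECTORSIZE)

struct subpage_eb_mapping {
	/* One bit per slot: is an extent buffer mapped here? */
	unsigned int bitmap;
	/* Extent buffers sharing this page (opaque here) */
	void *ebs[SUBPAGE_NR_SLOTS];
};

/* Slot index of a tree block inside its page, from its logical start */
static inline int subpage_slot(unsigned long long eb_start)
{
	return (int)((eb_start % SUBPAGE_PAGE_SIZE) / SUBPAGE_SECTORSIZE);
}
```

With this layout, marking a tree block verified is just setting its
slot bit, and a later read of another tree block in the same page can
check its own bit even though the page is already uptodate.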
== Patchset structure ==
The structure of the patchset:
Patch 01~11: Preparation patches for metadata subpage read support.
These patches can be merged without problem, and work for
both the regular and subpage cases.
Patch 12~14: Patches for data subpage read support.
These patches work for both cases.
That means patches 01~14 can be applied to the current kernel, and
shouldn't affect any existing behavior.
Patch 15~17: Subpage metadata read specific patches.
These patches introduce the main part of the subpage
metadata read support.
The number of patches is the main reason I'm submitting them to the
mailing list now, as there are already quite a few preparation patches.
Qu Wenruo (17):
btrfs: extent-io-tests: remove invalid tests
btrfs: calculate inline extent buffer page size based on page size
btrfs: remove the open-code to read disk-key
btrfs: make btrfs_fs_info::buffer_radix to take sector size divided
values
btrfs: don't allow tree block to cross page boundary for subpage
support
btrfs: handle sectorsize < PAGE_SIZE case for extent buffer accessors
btrfs: make csum_tree_block() handle sectorsize smaller than page size
btrfs: refactor how we extract extent buffer from page for
alloc_extent_buffer()
btrfs: refactor btrfs_release_extent_buffer_pages()
btrfs: add assert_spin_locked() for attach_extent_buffer_page()
btrfs: extract the extent buffer verification from
btree_readpage_end_io_hook()
btrfs: remove the unnecessary parameter @start and @len for
check_data_csum()
btrfs: extent_io: only require sector size alignment for page read
btrfs: make btrfs_readpage_end_io_hook() follow sector size
btrfs: introduce subpage_eb_mapping for extent buffers
btrfs: handle extent buffer verification properly for subpage size
btrfs: allow RO mount of 4K sector size fs on 64K page system
fs/btrfs/ctree.c | 13 +-
fs/btrfs/ctree.h | 38 ++-
fs/btrfs/disk-io.c | 111 ++++---
fs/btrfs/disk-io.h | 1 +
fs/btrfs/extent_io.c | 538 +++++++++++++++++++++++++------
fs/btrfs/extent_io.h | 19 +-
fs/btrfs/inode.c | 40 ++-
fs/btrfs/struct-funcs.c | 18 +-
fs/btrfs/super.c | 7 +
fs/btrfs/tests/extent-io-tests.c | 26 +-
10 files changed, 633 insertions(+), 178 deletions(-)
--
2.28.0
* [PATCH 01/17] btrfs: extent-io-tests: remove invalid tests
2020-09-08 7:52 [PATCH 00/17] btrfs: add read-only support for subpage sector size Qu Wenruo
@ 2020-09-08 7:52 ` Qu Wenruo
2020-09-09 12:26 ` Nikolay Borisov
2020-09-08 7:52 ` [PATCH 02/17] btrfs: calculate inline extent buffer page size based on page size Qu Wenruo
From: Qu Wenruo @ 2020-09-08 7:52 UTC (permalink / raw)
To: linux-btrfs
In extent-io-test, there are two invalid tests:
- Invalid nodesize for test_eb_bitmaps()
Instead of the sectorsize and nodesize combination passed in, we're
always using a hand-crafted nodesize.
Although there is an extra check for 64K page size, we can still hit
a case where PAGE_SIZE == 32K: then we get a 128K nodesize, which is
larger than the maximum valid nodesize.
Thankfully most machines have either 4K or 64K page size, thus we
haven't hit such a case yet.
- Invalid extent buffer bytenr
For 64K page size, the only combination we're going to test is
sectorsize = nodesize = 64K.
In that case, we'll try to create an extent buffer with a 32K bytenr,
which is not aligned to sectorsize and thus invalid.
This patch fixes both problems by:
- Honoring the sectorsize/nodesize combination
Now we won't bother to hand-craft a strange length and use it as
nodesize.
- Using sectorsize as the start of the second extent buffer run
This tests the case where the extent buffer is aligned to sectorsize
but not necessarily to nodesize.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/tests/extent-io-tests.c | 26 +++++++++++---------------
1 file changed, 11 insertions(+), 15 deletions(-)
diff --git a/fs/btrfs/tests/extent-io-tests.c b/fs/btrfs/tests/extent-io-tests.c
index df7ce874a74b..73e96d505f4f 100644
--- a/fs/btrfs/tests/extent-io-tests.c
+++ b/fs/btrfs/tests/extent-io-tests.c
@@ -379,54 +379,50 @@ static int __test_eb_bitmaps(unsigned long *bitmap, struct extent_buffer *eb,
static int test_eb_bitmaps(u32 sectorsize, u32 nodesize)
{
struct btrfs_fs_info *fs_info;
- unsigned long len;
unsigned long *bitmap = NULL;
struct extent_buffer *eb = NULL;
int ret;
test_msg("running extent buffer bitmap tests");
- /*
- * In ppc64, sectorsize can be 64K, thus 4 * 64K will be larger than
- * BTRFS_MAX_METADATA_BLOCKSIZE.
- */
- len = (sectorsize < BTRFS_MAX_METADATA_BLOCKSIZE)
- ? sectorsize * 4 : sectorsize;
-
- fs_info = btrfs_alloc_dummy_fs_info(len, len);
+ fs_info = btrfs_alloc_dummy_fs_info(nodesize, sectorsize);
if (!fs_info) {
test_std_err(TEST_ALLOC_FS_INFO);
return -ENOMEM;
}
- bitmap = kmalloc(len, GFP_KERNEL);
+ bitmap = kmalloc(nodesize, GFP_KERNEL);
if (!bitmap) {
test_err("couldn't allocate test bitmap");
ret = -ENOMEM;
goto out;
}
- eb = __alloc_dummy_extent_buffer(fs_info, 0, len);
+ eb = __alloc_dummy_extent_buffer(fs_info, 0, nodesize);
if (!eb) {
test_std_err(TEST_ALLOC_ROOT);
ret = -ENOMEM;
goto out;
}
- ret = __test_eb_bitmaps(bitmap, eb, len);
+ ret = __test_eb_bitmaps(bitmap, eb, nodesize);
if (ret)
goto out;
- /* Do it over again with an extent buffer which isn't page-aligned. */
free_extent_buffer(eb);
- eb = __alloc_dummy_extent_buffer(fs_info, nodesize / 2, len);
+
+ /*
+ * Test again for case where the tree block is sectorsize aligned but
+ * not nodesize aligned.
+ */
+ eb = __alloc_dummy_extent_buffer(fs_info, sectorsize, nodesize);
if (!eb) {
test_std_err(TEST_ALLOC_ROOT);
ret = -ENOMEM;
goto out;
}
- ret = __test_eb_bitmaps(bitmap, eb, len);
+ ret = __test_eb_bitmaps(bitmap, eb, nodesize);
out:
free_extent_buffer(eb);
kfree(bitmap);
--
2.28.0
* [PATCH 02/17] btrfs: calculate inline extent buffer page size based on page size
2020-09-08 7:52 [PATCH 00/17] btrfs: add read-only support for subpage sector size Qu Wenruo
2020-09-08 7:52 ` [PATCH 01/17] btrfs: extent-io-tests: remove invalid tests Qu Wenruo
@ 2020-09-08 7:52 ` Qu Wenruo
2020-09-11 9:56 ` Nikolay Borisov
2020-09-08 7:52 ` [PATCH 03/17] btrfs: remove the open-code to read disk-key Qu Wenruo
From: Qu Wenruo @ 2020-09-08 7:52 UTC (permalink / raw)
To: linux-btrfs
Btrfs only supports 64K as the maximum nodesize, thus for a 4K page
system we would have at most 16 pages for one extent buffer.
But for a 64K page system, we only and always need one page per extent
buffer.
This stays true even for future subpage sector size support (as long as
an extent buffer doesn't cross the 64K boundary).
So this patch changes how INLINE_EXTENT_BUFFER_PAGES is calculated.
Instead of a fixed 16 pages, use (64K / PAGE_SIZE) as the result.
This should save some bytes in the extent buffer structure on 64K
systems.
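The calculation is easy to model outside the kernel. The sketch below
is not the kernel macros themselves (those take PAGE_SIZE from the
build); it just demonstrates that sizing the inline page array for the
64K maximum nodesize yields 16 entries on 4K pages and a single entry
on 64K pages:

```c
#include <assert.h>

/* Mirror of MAX_INLINE_EXTENT_BUFFER_SIZE: the maximum btrfs nodesize */
#define MAX_INLINE_EXTENT_BUFFER_SIZE	(64 * 1024)

/* Number of inline extent buffer pages for a given page size */
static inline int inline_eb_pages(unsigned long page_size)
{
	return (int)(MAX_INLINE_EXTENT_BUFFER_SIZE / page_size);
}
```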
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/extent_io.h | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 00a88f2eb5ab..e16c5449ba48 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -86,8 +86,8 @@ struct extent_io_ops {
};
-#define INLINE_EXTENT_BUFFER_PAGES 16
-#define MAX_INLINE_EXTENT_BUFFER_SIZE (INLINE_EXTENT_BUFFER_PAGES * PAGE_SIZE)
+#define MAX_INLINE_EXTENT_BUFFER_SIZE SZ_64K
+#define INLINE_EXTENT_BUFFER_PAGES (MAX_INLINE_EXTENT_BUFFER_SIZE / PAGE_SIZE)
struct extent_buffer {
u64 start;
unsigned long len;
@@ -227,8 +227,15 @@ void wait_on_extent_buffer_writeback(struct extent_buffer *eb);
static inline int num_extent_pages(const struct extent_buffer *eb)
{
- return (round_up(eb->start + eb->len, PAGE_SIZE) >> PAGE_SHIFT) -
- (eb->start >> PAGE_SHIFT);
+ /*
+ * For sectorsize == PAGE_SIZE case, since eb is always aligned to
+ * sectorsize, it's just eb->len >> PAGE_SHIFT.
+ *
+ * For sectorsize < PAGE_SIZE case, we only want to support 64K
+ * PAGE_SIZE, and ensured all tree blocks won't cross page boundary.
+ * So in that case we always get 1 page.
+ */
+ return (round_up(eb->len, PAGE_SIZE) >> PAGE_SHIFT);
}
static inline int extent_buffer_uptodate(const struct extent_buffer *eb)
--
2.28.0
* [PATCH 03/17] btrfs: remove the open-code to read disk-key
2020-09-08 7:52 [PATCH 00/17] btrfs: add read-only support for subpage sector size Qu Wenruo
2020-09-08 7:52 ` [PATCH 01/17] btrfs: extent-io-tests: remove invalid tests Qu Wenruo
2020-09-08 7:52 ` [PATCH 02/17] btrfs: calculate inline extent buffer page size based on page size Qu Wenruo
@ 2020-09-08 7:52 ` Qu Wenruo
2020-09-11 10:07 ` Nikolay Borisov
2020-09-08 7:52 ` [PATCH 04/17] btrfs: make btrfs_fs_info::buffer_radix to take sector size devided values Qu Wenruo
From: Qu Wenruo @ 2020-09-08 7:52 UTC (permalink / raw)
To: linux-btrfs
There is some ancient code from the old days where we handle the
disk_key read manually when the disk key fits inside one page.
But that's totally unnecessary, as we have read_extent_buffer() to
handle everything.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/ctree.c | 13 ++-----------
1 file changed, 2 insertions(+), 11 deletions(-)
diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index cd1cd673bc0b..e204e1320745 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1697,7 +1697,6 @@ static noinline int generic_bin_search(struct extent_buffer *eb,
}
while (low < high) {
- unsigned long oip;
unsigned long offset;
struct btrfs_disk_key *tmp;
struct btrfs_disk_key unaligned;
@@ -1705,17 +1704,9 @@ static noinline int generic_bin_search(struct extent_buffer *eb,
mid = (low + high) / 2;
offset = p + mid * item_size;
- oip = offset_in_page(offset);
- if (oip + key_size <= PAGE_SIZE) {
- const unsigned long idx = offset >> PAGE_SHIFT;
- char *kaddr = page_address(eb->pages[idx]);
-
- tmp = (struct btrfs_disk_key *)(kaddr + oip);
- } else {
- read_extent_buffer(eb, &unaligned, offset, key_size);
- tmp = &unaligned;
- }
+ read_extent_buffer(eb, &unaligned, offset, key_size);
+ tmp = &unaligned;
ret = comp_keys(tmp, key);
--
2.28.0
* [PATCH 04/17] btrfs: make btrfs_fs_info::buffer_radix to take sector size divided values
2020-09-08 7:52 [PATCH 00/17] btrfs: add read-only support for subpage sector size Qu Wenruo
2020-09-08 7:52 ` [PATCH 03/17] btrfs: remove the open-code to read disk-key Qu Wenruo
@ 2020-09-08 7:52 ` Qu Wenruo
2020-09-08 18:03 ` kernel test robot
2020-09-11 10:11 ` Nikolay Borisov
2020-09-08 7:52 ` [PATCH 05/17] btrfs: don't allow tree block to cross page boundary for subpage support Qu Wenruo
From: Qu Wenruo @ 2020-09-08 7:52 UTC (permalink / raw)
To: linux-btrfs
For subpage sector size support, one page can contain multiple tree
blocks, thus we can no longer use (eb->start >> PAGE_SHIFT) any more, or
we could easily get an extent buffer that doesn't belong to us.
This patch uses (extent_buffer::start / sectorsize) as the index for the
radix tree, so that we can get the correct extent buffer for subpage
support.
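The indexing change can be demonstrated in isolation. In the sketch
below (illustrative only; the real lookups go through the kernel radix
tree API), two tree blocks that share one 64K page collide under a
page-shift index, but get distinct indexes once we divide by
sectorsize:

```c
#include <assert.h>

/* New buffer_radix index: one slot per sector, not per page */
static inline unsigned long eb_radix_index(unsigned long long start,
					   unsigned int sectorsize)
{
	return (unsigned long)(start / sectorsize);
}
```

For example, extent buffers at 0x10000 and 0x14000 both live in the
same 64K page (so start >> 16 would give 1 for both), but with 4K
sectorsize they get indexes 16 and 20 respectively.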
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/extent_io.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 6def411b2eba..5d969340275e 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5142,7 +5142,7 @@ struct extent_buffer *find_extent_buffer(struct btrfs_fs_info *fs_info,
rcu_read_lock();
eb = radix_tree_lookup(&fs_info->buffer_radix,
- start >> PAGE_SHIFT);
+ start / fs_info->sectorsize);
if (eb && atomic_inc_not_zero(&eb->refs)) {
rcu_read_unlock();
/*
@@ -5194,7 +5194,7 @@ struct extent_buffer *alloc_test_extent_buffer(struct btrfs_fs_info *fs_info,
}
spin_lock(&fs_info->buffer_lock);
ret = radix_tree_insert(&fs_info->buffer_radix,
- start >> PAGE_SHIFT, eb);
+ start / fs_info->sectorsize, eb);
spin_unlock(&fs_info->buffer_lock);
radix_tree_preload_end();
if (ret == -EEXIST) {
@@ -5302,7 +5302,7 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
spin_lock(&fs_info->buffer_lock);
ret = radix_tree_insert(&fs_info->buffer_radix,
- start >> PAGE_SHIFT, eb);
+ start / fs_info->sectorsize, eb);
spin_unlock(&fs_info->buffer_lock);
radix_tree_preload_end();
if (ret == -EEXIST) {
@@ -5358,7 +5358,7 @@ static int release_extent_buffer(struct extent_buffer *eb)
spin_lock(&fs_info->buffer_lock);
radix_tree_delete(&fs_info->buffer_radix,
- eb->start >> PAGE_SHIFT);
+ eb->start / fs_info->sectorsize);
spin_unlock(&fs_info->buffer_lock);
} else {
spin_unlock(&eb->refs_lock);
--
2.28.0
* [PATCH 05/17] btrfs: don't allow tree block to cross page boundary for subpage support
2020-09-08 7:52 [PATCH 00/17] btrfs: add read-only support for subpage sector size Qu Wenruo
2020-09-08 7:52 ` [PATCH 04/17] btrfs: make btrfs_fs_info::buffer_radix to take sector size divided values Qu Wenruo
@ 2020-09-08 7:52 ` Qu Wenruo
2020-09-11 10:26 ` Nikolay Borisov
2020-09-08 7:52 ` [PATCH 06/17] btrfs: handle sectorsize < PAGE_SIZE case for extent buffer accessors Qu Wenruo
From: Qu Wenruo @ 2020-09-08 7:52 UTC (permalink / raw)
To: linux-btrfs
As a preparation for subpage sector size support (allowing a sector
size smaller than the page size to be mounted), if the sector size is
smaller than the page size, we don't allow a tree block to be read if it
crosses a page boundary (normally 64K).
This ensures that tree blocks are always contained in one page on 64K
systems, which greatly simplifies the handling.
Otherwise we would need complex multi-page handling for tree blocks.
Currently the only way to create such tree blocks crossing the 64K
boundary is btrfs-convert, which will be fixed soon and doesn't have
widespread usage.
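The rejection condition added in this patch is worth spelling out: a
block crosses a page boundary exactly when its first and last bytes
round down to different pages. A standalone model (round_down_ull here
is a local helper, not the kernel macro):

```c
#include <assert.h>

static inline unsigned long long round_down_ull(unsigned long long v,
						unsigned long align)
{
	return v - (v % align);
}

/* Return 1 if a tree block [start, start + len) crosses a page boundary */
static int eb_crosses_page(unsigned long long start, unsigned long len,
			   unsigned long page_size)
{
	return round_down_ull(start, page_size) !=
	       round_down_ull(start + len - 1, page_size);
}
```

E.g. with 64K pages, a 16K tree block at 0xF000 spills into the next
page and is rejected, while one at 0x10000 is fine.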
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/extent_io.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 5d969340275e..119193166cec 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5232,6 +5232,13 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
btrfs_err(fs_info, "bad tree block start %llu", start);
return ERR_PTR(-EINVAL);
}
+ if (fs_info->sectorsize < PAGE_SIZE && round_down(start, PAGE_SIZE) !=
+ round_down(start + len - 1, PAGE_SIZE)) {
+ btrfs_err(fs_info,
+ "tree block crosses page boundary, start %llu nodesize %lu",
+ start, len);
+ return ERR_PTR(-EINVAL);
+ }
eb = find_extent_buffer(fs_info, start);
if (eb)
--
2.28.0
* [PATCH 06/17] btrfs: handle sectorsize < PAGE_SIZE case for extent buffer accessors
2020-09-08 7:52 [PATCH 00/17] btrfs: add read-only support for subpage sector size Qu Wenruo
2020-09-08 7:52 ` [PATCH 05/17] btrfs: don't allow tree block to cross page boundary for subpage support Qu Wenruo
@ 2020-09-08 7:52 ` Qu Wenruo
2020-09-08 7:52 ` [PATCH 07/17] btrfs: make csum_tree_block() handle sectorsize smaller than page size Qu Wenruo
From: Qu Wenruo @ 2020-09-08 7:52 UTC (permalink / raw)
To: linux-btrfs; +Cc: Goldwyn Rodrigues
To support the sectorsize < PAGE_SIZE case, we need to take extra care
with extent buffer accessors.
Since sectorsize is smaller than PAGE_SIZE, one page can contain
multiple tree blocks, so we must use eb->start to determine the real
offset to read/write for extent buffer accessors.
This patch introduces two helpers to do this:
- get_eb_page_index()
This calculates the index to access extent_buffer::pages.
It's just a simple wrapper around "start >> PAGE_SHIFT".
For the sectorsize == PAGE_SIZE case, nothing is changed.
For the sectorsize < PAGE_SIZE case, we always get index 0, and
the existing page shift also works fine.
- get_eb_page_offset()
This calculates the offset to access extent_buffer::pages.
It needs to take extent_buffer::start into consideration.
For the sectorsize == PAGE_SIZE case, extent_buffer::start is always
aligned to PAGE_SIZE, thus adding extent_buffer::start to
offset_in_page() won't change the result.
For the sectorsize < PAGE_SIZE case, adding extent_buffer::start gives
us the correct offset to access.
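The two helpers can be modeled in user space to check both claims
above. These are self-contained sketches (offset_in_page() is
open-coded; the real helpers live in ctree.h and use the kernel's
PAGE_SIZE):

```c
#include <assert.h>

/* Open-coded offset_in_page() for a power-of-two page size */
static inline unsigned long offset_in_page_sketch(unsigned long long off,
						  unsigned long page_size)
{
	return (unsigned long)(off & (page_size - 1));
}

/* Model of get_eb_page_offset(): add eb->start before masking */
static inline unsigned long
get_eb_page_offset_sketch(unsigned long long eb_start, unsigned long start,
			  unsigned long page_size)
{
	return offset_in_page_sketch(eb_start + start, page_size);
}

/* Model of get_eb_page_index(): plain shift, index 0 for subpage */
static inline unsigned long
get_eb_page_index_sketch(unsigned long start, unsigned long page_size)
{
	return start / page_size;
}
```

With a 64K page and an eb at 0x14000 (sectorsize aligned, not page
aligned), offset 0x100 inside the eb maps to 0x4100 inside the page,
with page index 0; with a page-aligned eb the added eb->start term
cancels out and the result matches plain offset_in_page().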
This patch will touch the following parts to cover all extent buffer
accessors:
- BTRFS_SETGET_HEADER_FUNCS()
- read_extent_buffer()
- read_extent_buffer_to_user()
- memcmp_extent_buffer()
- write_extent_buffer_chunk_tree_uuid()
- write_extent_buffer_fsid()
- write_extent_buffer()
- memzero_extent_buffer()
- copy_extent_buffer_full()
- copy_extent_buffer()
- memcpy_extent_buffer()
- memmove_extent_buffer()
- btrfs_get_token_##bits()
- btrfs_get_##bits()
- btrfs_set_token_##bits()
- btrfs_set_##bits()
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/ctree.h | 38 ++++++++++++++++++++++--
fs/btrfs/extent_io.c | 66 ++++++++++++++++++++++++-----------------
fs/btrfs/struct-funcs.c | 18 ++++++-----
3 files changed, 85 insertions(+), 37 deletions(-)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 9a72896bed2e..81d5a6cc97b5 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1448,14 +1448,15 @@ static inline void btrfs_set_token_##name(struct btrfs_map_token *token,\
#define BTRFS_SETGET_HEADER_FUNCS(name, type, member, bits) \
static inline u##bits btrfs_##name(const struct extent_buffer *eb) \
{ \
- const type *p = page_address(eb->pages[0]); \
+ const type *p = page_address(eb->pages[0]) + \
+ offset_in_page(eb->start); \
u##bits res = le##bits##_to_cpu(p->member); \
return res; \
} \
static inline void btrfs_set_##name(const struct extent_buffer *eb, \
u##bits val) \
{ \
- type *p = page_address(eb->pages[0]); \
+ type *p = page_address(eb->pages[0]) + offset_in_page(eb->start); \
p->member = cpu_to_le##bits(val); \
}
@@ -3241,6 +3242,39 @@ static inline void assertfail(const char *expr, const char* file, int line) { }
#define ASSERT(expr) (void)(expr)
#endif
+/*
+ * Get the correct offset inside the page of extent buffer.
+ *
+ * Will handle both sectorsize == PAGE_SIZE and sectorsize < PAGE_SIZE cases.
+ *
+ * @eb: The target extent buffer
+ * @start: The offset inside the extent buffer
+ */
+static inline size_t get_eb_page_offset(const struct extent_buffer *eb,
+ unsigned long start)
+{
+ /*
+ * For sectorsize == PAGE_SIZE case, eb->start will always be aligned
+ * to PAGE_SIZE, thus adding it won't cause any difference.
+ *
+ * For sectorsize < PAGE_SIZE, we must only read the data that belongs
+ * to the eb, thus we have to take eb->start into consideration.
+ */
+ return offset_in_page(start + eb->start);
+}
+
+static inline unsigned long get_eb_page_index(unsigned long start)
+{
+ /*
+ * For sectorsize == PAGE_SIZE case, plain >> PAGE_SHIFT is enough.
+ *
+ * For sectorsize < PAGE_SIZE case, we only support 64K PAGE_SIZE,
+ * and have ensured all tree blocks are contained in one page, thus
+ * we always get index == 0.
+ */
+ return start >> PAGE_SHIFT;
+}
+
/*
* Use that for functions that are conditionally exported for sanity tests but
* otherwise static
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 119193166cec..6fafbc1d047b 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5637,7 +5637,7 @@ void read_extent_buffer(const struct extent_buffer *eb, void *dstv,
struct page *page;
char *kaddr;
char *dst = (char *)dstv;
- unsigned long i = start >> PAGE_SHIFT;
+ unsigned long i = get_eb_page_index(start);
if (start + len > eb->len) {
WARN(1, KERN_ERR "btrfs bad mapping eb start %llu len %lu, wanted %lu %lu\n",
@@ -5646,7 +5646,7 @@ void read_extent_buffer(const struct extent_buffer *eb, void *dstv,
return;
}
- offset = offset_in_page(start);
+ offset = get_eb_page_offset(eb, start);
while (len > 0) {
page = eb->pages[i];
@@ -5671,13 +5671,13 @@ int read_extent_buffer_to_user(const struct extent_buffer *eb,
struct page *page;
char *kaddr;
char __user *dst = (char __user *)dstv;
- unsigned long i = start >> PAGE_SHIFT;
+ unsigned long i = get_eb_page_index(start);
int ret = 0;
WARN_ON(start > eb->len);
WARN_ON(start + len > eb->start + eb->len);
- offset = offset_in_page(start);
+ offset = get_eb_page_offset(eb, start);
while (len > 0) {
page = eb->pages[i];
@@ -5706,13 +5706,13 @@ int memcmp_extent_buffer(const struct extent_buffer *eb, const void *ptrv,
struct page *page;
char *kaddr;
char *ptr = (char *)ptrv;
- unsigned long i = start >> PAGE_SHIFT;
+ unsigned long i = get_eb_page_index(start);
int ret = 0;
WARN_ON(start > eb->len);
WARN_ON(start + len > eb->start + eb->len);
- offset = offset_in_page(start);
+ offset = get_eb_page_offset(eb, start);
while (len > 0) {
page = eb->pages[i];
@@ -5738,7 +5738,7 @@ void write_extent_buffer_chunk_tree_uuid(const struct extent_buffer *eb,
char *kaddr;
WARN_ON(!PageUptodate(eb->pages[0]));
- kaddr = page_address(eb->pages[0]);
+ kaddr = page_address(eb->pages[0]) + get_eb_page_offset(eb, 0);
memcpy(kaddr + offsetof(struct btrfs_header, chunk_tree_uuid), srcv,
BTRFS_FSID_SIZE);
}
@@ -5748,7 +5748,7 @@ void write_extent_buffer_fsid(const struct extent_buffer *eb, const void *srcv)
char *kaddr;
WARN_ON(!PageUptodate(eb->pages[0]));
- kaddr = page_address(eb->pages[0]);
+ kaddr = page_address(eb->pages[0]) + get_eb_page_offset(eb, 0);
memcpy(kaddr + offsetof(struct btrfs_header, fsid), srcv,
BTRFS_FSID_SIZE);
}
@@ -5761,12 +5761,12 @@ void write_extent_buffer(const struct extent_buffer *eb, const void *srcv,
struct page *page;
char *kaddr;
char *src = (char *)srcv;
- unsigned long i = start >> PAGE_SHIFT;
+ unsigned long i = get_eb_page_index(start);
WARN_ON(start > eb->len);
WARN_ON(start + len > eb->start + eb->len);
- offset = offset_in_page(start);
+ offset = get_eb_page_offset(eb, start);
while (len > 0) {
page = eb->pages[i];
@@ -5790,12 +5790,12 @@ void memzero_extent_buffer(const struct extent_buffer *eb, unsigned long start,
size_t offset;
struct page *page;
char *kaddr;
- unsigned long i = start >> PAGE_SHIFT;
+ unsigned long i = get_eb_page_index(start);
WARN_ON(start > eb->len);
WARN_ON(start + len > eb->start + eb->len);
- offset = offset_in_page(start);
+ offset = get_eb_page_offset(eb, start);
while (len > 0) {
page = eb->pages[i];
@@ -5819,10 +5819,22 @@ void copy_extent_buffer_full(const struct extent_buffer *dst,
ASSERT(dst->len == src->len);
- num_pages = num_extent_pages(dst);
- for (i = 0; i < num_pages; i++)
- copy_page(page_address(dst->pages[i]),
- page_address(src->pages[i]));
+ if (dst->fs_info->sectorsize == PAGE_SIZE) {
+ num_pages = num_extent_pages(dst);
+ for (i = 0; i < num_pages; i++)
+ copy_page(page_address(dst->pages[i]),
+ page_address(src->pages[i]));
+ } else {
+ unsigned long src_index = get_eb_page_index(src->start);
+ unsigned long dst_index = get_eb_page_index(dst->start);
+ size_t src_offset = get_eb_page_offset(src, 0);
+ size_t dst_offset = get_eb_page_offset(dst, 0);
+
+ ASSERT(src_index == 0 && dst_index == 0);
+ memcpy(page_address(dst->pages[dst_index]) + dst_offset,
+ page_address(src->pages[src_index]) + src_offset,
+ src->len);
+ }
}
void copy_extent_buffer(const struct extent_buffer *dst,
@@ -5835,11 +5847,11 @@ void copy_extent_buffer(const struct extent_buffer *dst,
size_t offset;
struct page *page;
char *kaddr;
- unsigned long i = dst_offset >> PAGE_SHIFT;
+ unsigned long i = get_eb_page_index(dst_offset);
WARN_ON(src->len != dst_len);
- offset = offset_in_page(dst_offset);
+ offset = get_eb_page_offset(dst, dst_offset);
while (len > 0) {
page = dst->pages[i];
@@ -5883,7 +5895,7 @@ static inline void eb_bitmap_offset(const struct extent_buffer *eb,
* the bitmap item in the extent buffer + the offset of the byte in the
* bitmap item.
*/
- offset = start + byte_offset;
+ offset = start + offset_in_page(eb->start) + byte_offset;
*page_index = offset >> PAGE_SHIFT;
*page_offset = offset_in_page(offset);
@@ -6047,11 +6059,11 @@ void memcpy_extent_buffer(const struct extent_buffer *dst,
}
while (len > 0) {
- dst_off_in_page = offset_in_page(dst_offset);
- src_off_in_page = offset_in_page(src_offset);
+ dst_off_in_page = get_eb_page_offset(dst, dst_offset);
+ src_off_in_page = get_eb_page_offset(dst, src_offset);
- dst_i = dst_offset >> PAGE_SHIFT;
- src_i = src_offset >> PAGE_SHIFT;
+ dst_i = get_eb_page_index(dst_offset);
+ src_i = get_eb_page_index(src_offset);
cur = min(len, (unsigned long)(PAGE_SIZE -
src_off_in_page));
@@ -6097,11 +6109,11 @@ void memmove_extent_buffer(const struct extent_buffer *dst,
return;
}
while (len > 0) {
- dst_i = dst_end >> PAGE_SHIFT;
- src_i = src_end >> PAGE_SHIFT;
+ dst_i = get_eb_page_index(dst_end);
+ src_i = get_eb_page_index(src_end);
- dst_off_in_page = offset_in_page(dst_end);
- src_off_in_page = offset_in_page(src_end);
+ dst_off_in_page = get_eb_page_offset(dst, dst_end);
+ src_off_in_page = get_eb_page_offset(dst, src_end);
cur = min_t(unsigned long, len, src_off_in_page + 1);
cur = min(cur, dst_off_in_page + 1);
diff --git a/fs/btrfs/struct-funcs.c b/fs/btrfs/struct-funcs.c
index 079b059818e9..769901c2b3c9 100644
--- a/fs/btrfs/struct-funcs.c
+++ b/fs/btrfs/struct-funcs.c
@@ -67,8 +67,9 @@ u##bits btrfs_get_token_##bits(struct btrfs_map_token *token, \
const void *ptr, unsigned long off) \
{ \
const unsigned long member_offset = (unsigned long)ptr + off; \
- const unsigned long idx = member_offset >> PAGE_SHIFT; \
- const unsigned long oip = offset_in_page(member_offset); \
+ const unsigned long idx = get_eb_page_index(member_offset); \
+ const unsigned long oip = get_eb_page_offset(token->eb, \
+ member_offset); \
const int size = sizeof(u##bits); \
u8 lebytes[sizeof(u##bits)]; \
const int part = PAGE_SIZE - oip; \
@@ -95,8 +96,8 @@ u##bits btrfs_get_##bits(const struct extent_buffer *eb, \
const void *ptr, unsigned long off) \
{ \
const unsigned long member_offset = (unsigned long)ptr + off; \
- const unsigned long oip = offset_in_page(member_offset); \
- const unsigned long idx = member_offset >> PAGE_SHIFT; \
+ const unsigned long oip = get_eb_page_offset(eb, member_offset);\
+ const unsigned long idx = get_eb_page_index(member_offset); \
char *kaddr = page_address(eb->pages[idx]); \
const int size = sizeof(u##bits); \
const int part = PAGE_SIZE - oip; \
@@ -116,8 +117,9 @@ void btrfs_set_token_##bits(struct btrfs_map_token *token, \
u##bits val) \
{ \
const unsigned long member_offset = (unsigned long)ptr + off; \
- const unsigned long idx = member_offset >> PAGE_SHIFT; \
- const unsigned long oip = offset_in_page(member_offset); \
+ const unsigned long idx = get_eb_page_index(member_offset); \
+ const unsigned long oip = get_eb_page_offset(token->eb, \
+ member_offset); \
const int size = sizeof(u##bits); \
u8 lebytes[sizeof(u##bits)]; \
const int part = PAGE_SIZE - oip; \
@@ -146,8 +148,8 @@ void btrfs_set_##bits(const struct extent_buffer *eb, void *ptr, \
unsigned long off, u##bits val) \
{ \
const unsigned long member_offset = (unsigned long)ptr + off; \
- const unsigned long oip = offset_in_page(member_offset); \
- const unsigned long idx = member_offset >> PAGE_SHIFT; \
+ const unsigned long oip = get_eb_page_offset(eb, member_offset);\
+ const unsigned long idx = get_eb_page_index(member_offset); \
char *kaddr = page_address(eb->pages[idx]); \
const int size = sizeof(u##bits); \
const int part = PAGE_SIZE - oip; \
--
2.28.0
* [PATCH 07/17] btrfs: make csum_tree_block() handle sectorsize smaller than page size
2020-09-08 7:52 [PATCH 00/17] btrfs: add read-only support for subpage sector size Qu Wenruo
2020-09-08 7:52 ` [PATCH 06/17] btrfs: handle sectorsize < PAGE_SIZE case for extent buffer accessors Qu Wenruo
@ 2020-09-08 7:52 ` Qu Wenruo
2020-09-11 11:10 ` Nikolay Borisov
2020-09-08 7:52 ` [PATCH 08/17] btrfs: refactor how we extract extent buffer from page for alloc_extent_buffer() Qu Wenruo
From: Qu Wenruo @ 2020-09-08 7:52 UTC (permalink / raw)
To: linux-btrfs; +Cc: Goldwyn Rodrigues
For subpage support, we only need to handle the first page.
To make the code work for both cases, we modify the following behaviors:
- num_pages calculation
Instead of "nodesize >> PAGE_SHIFT", use
"DIV_ROUND_UP(nodesize, PAGE_SIZE)". This ensures we get at least one
page for subpage support, while still getting the same result for
regular page sizes.
- The length of the first run
Instead of PAGE_SIZE - BTRFS_CSUM_SIZE, use min(PAGE_SIZE, nodesize)
- BTRFS_CSUM_SIZE.
This handles both cases well.
- The start location of the first run
Instead of always using BTRFS_CSUM_SIZE as the csum start position, add
offset_in_page(eb->start) to get the proper offset for both cases.
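The first two behavior changes are pure arithmetic and can be checked
standalone. A sketch (not kernel code; BTRFS_CSUM_SIZE is the 32-byte
checksum area at the start of every tree block):

```c
#include <assert.h>

#define BTRFS_CSUM_SIZE	32

/* Length of the first checksummed chunk: min(PAGE_SIZE, nodesize) - csum */
static unsigned long first_csum_run_len(unsigned long page_size,
					unsigned long nodesize)
{
	unsigned long min_len = page_size < nodesize ? page_size : nodesize;

	return min_len - BTRFS_CSUM_SIZE;
}

/* Pages covered by one tree block, at least 1 (models DIV_ROUND_UP) */
static int csum_num_pages(unsigned long nodesize, unsigned long page_size)
{
	return (int)((nodesize + page_size - 1) / page_size);
}
```

On a 64K page with 16K nodesize, num_pages is 1 (the old shift would
give 0) and the first run covers the whole tree block minus the csum.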
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/disk-io.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index f6bba7eb1fa1..62dbd9bbd381 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -257,16 +257,16 @@ struct extent_map *btree_get_extent(struct btrfs_inode *inode,
static void csum_tree_block(struct extent_buffer *buf, u8 *result)
{
struct btrfs_fs_info *fs_info = buf->fs_info;
- const int num_pages = fs_info->nodesize >> PAGE_SHIFT;
+ const int num_pages = DIV_ROUND_UP(fs_info->nodesize, PAGE_SIZE);
SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
char *kaddr;
int i;
shash->tfm = fs_info->csum_shash;
crypto_shash_init(shash);
- kaddr = page_address(buf->pages[0]);
+ kaddr = page_address(buf->pages[0]) + offset_in_page(buf->start);
crypto_shash_update(shash, kaddr + BTRFS_CSUM_SIZE,
- PAGE_SIZE - BTRFS_CSUM_SIZE);
+ min_t(u32, PAGE_SIZE, fs_info->nodesize) - BTRFS_CSUM_SIZE);
for (i = 1; i < num_pages; i++) {
kaddr = page_address(buf->pages[i]);
--
2.28.0
* [PATCH 08/17] btrfs: refactor how we extract extent buffer from page for alloc_extent_buffer()
2020-09-08 7:52 [PATCH 00/17] btrfs: add read-only support for subpage sector size Qu Wenruo
` (6 preceding siblings ...)
2020-09-08 7:52 ` [PATCH 07/17] btrfs: make csum_tree_block() handle sectorsize smaller than page size Qu Wenruo
@ 2020-09-08 7:52 ` Qu Wenruo
2020-09-11 11:14 ` Nikolay Borisov
2020-09-08 7:52 ` [PATCH 09/17] btrfs: refactor btrfs_release_extent_buffer_pages() Qu Wenruo
` (10 subsequent siblings)
18 siblings, 1 reply; 49+ messages in thread
From: Qu Wenruo @ 2020-09-08 7:52 UTC (permalink / raw)
To: linux-btrfs
This patch extracts the code that grabs an extent_buffer from
page::private into its own function, grab_extent_buffer_from_page().
Although it's just one line now, for later subpage size support it will
become much larger.
Also add some extra comments explaining why we need to do such
page::private dancing.
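The race described below is resolved by taking a reference on the existing eb only when its refcount is still nonzero. A user-space sketch of that pattern, using C11 atomics with illustrative names (the kernel uses atomic_inc_not_zero()):

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <assert.h>

/* Minimal stand-in for an extent buffer; only the refcount matters
 * for this sketch. */
struct eb {
	atomic_int refs;
};

/* Take a reference only if the object is still live (refs > 0),
 * mirroring the semantics of the kernel's atomic_inc_not_zero(). */
static bool eb_try_get(struct eb *eb)
{
	int old = atomic_load(&eb->refs);

	while (old != 0) {
		/* On failure the CAS reloads 'old' and we retry */
		if (atomic_compare_exchange_weak(&eb->refs, &old, old + 1))
			return true;
	}
	return false;
}
```

If the grab fails (refs already dropped to zero), the caller knows it can safely overwrite page->private with its own eb, which is exactly the decision alloc_extent_buffer() makes under mapping->private_lock.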
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/extent_io.c | 49 ++++++++++++++++++++++++++++++++++++--------
1 file changed, 40 insertions(+), 9 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 6fafbc1d047b..3c8fe40f67fa 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5214,6 +5214,44 @@ struct extent_buffer *alloc_test_extent_buffer(struct btrfs_fs_info *fs_info,
}
#endif
+/*
+ * A helper to grab the existing extent buffer from a page.
+ *
+ * There is a small race window where two callers of alloc_extent_buffer():
+ * Thread 1 | Thread 2
+ * -------------------------------------+---------------------------------------
+ * alloc_extent_buffer() | alloc_extent_buffer()
+ * |- eb = __alloc_extent_buffer() | |- eb = __alloc_extent_buffer()
+ * |- p = find_or_create_page() | |- p = find_or_create_page()
+ *
+ * In the above case, two ebs get allocated for the same bytenr and get the
+ * same page.
+ * We have to rely on the page->mapping->private_lock to make one of them
+ * give up and reuse the allocated eb:
+ *
+ * | |- grab_extent_buffer_from_page()
+ * | |- get nothing
+ * | |- attach_extent_buffer_page()
+ * | | |- Now page->private is set
+ * | |- spin_unlock(&mapping->private_lock);
+ * |- spin_lock(private_lock); | |- Continue to insert radix tree.
+ * |- grab_extent_buffer_from_page() |
+ * |- got eb from thread 2 |
+ * |- spin_unlock(private_lock); |
+ * |- goto free_eb; |
+ *
+ * The function here is to ensure we have proper locking and detect such a
+ * race so we won't allocate an eb twice.
+ */
+static struct extent_buffer *grab_extent_buffer_from_page(struct page *page)
+{
+ /*
+ * For the PAGE_SIZE == sectorsize case, a btree_inode page should have its
+ * private pointer pointing to the extent buffer that owns this page.
+ */
+ return (struct extent_buffer *)page->private;
+}
+
struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
u64 start)
{
@@ -5258,15 +5296,8 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
spin_lock(&mapping->private_lock);
if (PagePrivate(p)) {
- /*
- * We could have already allocated an eb for this page
- * and attached one so lets see if we can get a ref on
- * the existing eb, and if we can we know it's good and
- * we can just return that one, else we know we can just
- * overwrite page->private.
- */
- exists = (struct extent_buffer *)p->private;
- if (atomic_inc_not_zero(&exists->refs)) {
+ exists = grab_extent_buffer_from_page(p);
+ if (exists && atomic_inc_not_zero(&exists->refs)) {
spin_unlock(&mapping->private_lock);
unlock_page(p);
put_page(p);
--
2.28.0
* [PATCH 09/17] btrfs: refactor btrfs_release_extent_buffer_pages()
2020-09-08 7:52 [PATCH 00/17] btrfs: add read-only support for subpage sector size Qu Wenruo
` (7 preceding siblings ...)
2020-09-08 7:52 ` [PATCH 08/17] btrfs: refactor how we extract extent buffer from page for alloc_extent_buffer() Qu Wenruo
@ 2020-09-08 7:52 ` Qu Wenruo
2020-09-11 11:17 ` Nikolay Borisov
2020-09-08 7:52 ` [PATCH 10/17] btrfs: add assert_spin_locked() for attach_extent_buffer_page() Qu Wenruo
` (9 subsequent siblings)
18 siblings, 1 reply; 49+ messages in thread
From: Qu Wenruo @ 2020-09-08 7:52 UTC (permalink / raw)
To: linux-btrfs
We have attach_extent_buffer_page(), which is used in
btrfs_clone_extent_buffer() and alloc_extent_buffer().
But in btrfs_release_extent_buffer_pages() we manually call
detach_page_private().
This is fine for the current code, but if we're going to support subpage
size, we will do a lot more work than just calling
detach_page_private().
This patch extracts the main work of btrfs_release_extent_buffer_pages()
into detach_extent_buffer_page() so that later subpage size support can
put its own code there.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/extent_io.c | 58 +++++++++++++++++++-------------------------
1 file changed, 25 insertions(+), 33 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 3c8fe40f67fa..1cb41dab7a1d 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4920,6 +4920,29 @@ int extent_buffer_under_io(const struct extent_buffer *eb)
test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags));
}
+static void detach_extent_buffer_page(struct extent_buffer *eb,
+ struct page *page)
+{
+ bool mapped = !test_bit(EXTENT_BUFFER_UNMAPPED, &eb->bflags);
+
+ if (!page)
+ return;
+
+ if (mapped)
+ spin_lock(&page->mapping->private_lock);
+ if (PagePrivate(page) && page->private == (unsigned long)eb) {
+ BUG_ON(test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags));
+ BUG_ON(PageDirty(page));
+ BUG_ON(PageWriteback(page));
+ /* We need to make sure we haven't been attached to a new eb. */
+ detach_page_private(page);
+ }
+ if (mapped)
+ spin_unlock(&page->mapping->private_lock);
+ /* One for when we allocated the page */
+ put_page(page);
+}
+
/*
* Release all pages attached to the extent buffer.
*/
@@ -4927,43 +4950,12 @@ static void btrfs_release_extent_buffer_pages(struct extent_buffer *eb)
{
int i;
int num_pages;
- int mapped = !test_bit(EXTENT_BUFFER_UNMAPPED, &eb->bflags);
BUG_ON(extent_buffer_under_io(eb));
num_pages = num_extent_pages(eb);
- for (i = 0; i < num_pages; i++) {
- struct page *page = eb->pages[i];
-
- if (!page)
- continue;
- if (mapped)
- spin_lock(&page->mapping->private_lock);
- /*
- * We do this since we'll remove the pages after we've
- * removed the eb from the radix tree, so we could race
- * and have this page now attached to the new eb. So
- * only clear page_private if it's still connected to
- * this eb.
- */
- if (PagePrivate(page) &&
- page->private == (unsigned long)eb) {
- BUG_ON(test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags));
- BUG_ON(PageDirty(page));
- BUG_ON(PageWriteback(page));
- /*
- * We need to make sure we haven't be attached
- * to a new eb.
- */
- detach_page_private(page);
- }
-
- if (mapped)
- spin_unlock(&page->mapping->private_lock);
-
- /* One for when we allocated the page */
- put_page(page);
- }
+ for (i = 0; i < num_pages; i++)
+ detach_extent_buffer_page(eb, eb->pages[i]);
}
/*
--
2.28.0
* [PATCH 10/17] btrfs: add assert_spin_locked() for attach_extent_buffer_page()
2020-09-08 7:52 [PATCH 00/17] btrfs: add read-only support for subpage sector size Qu Wenruo
` (8 preceding siblings ...)
2020-09-08 7:52 ` [PATCH 09/17] btrfs: refactor btrfs_release_extent_buffer_pages() Qu Wenruo
@ 2020-09-08 7:52 ` Qu Wenruo
2020-09-11 11:22 ` Nikolay Borisov
2020-09-08 7:52 ` [PATCH 11/17] btrfs: extract the extent buffer verification from btree_readpage_end_io_hook() Qu Wenruo
` (8 subsequent siblings)
18 siblings, 1 reply; 49+ messages in thread
From: Qu Wenruo @ 2020-09-08 7:52 UTC (permalink / raw)
To: linux-btrfs
When calling attach_extent_buffer_page(), we're either attaching
anonymous pages, called from btrfs_clone_extent_buffer(),
or attaching btree_inode pages, called from alloc_extent_buffer().
For the latter case, we should hold page->mapping->private_lock to
avoid races modifying page->private.
Add assert_spin_locked() for the case where we're called from
alloc_extent_buffer().
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/extent_io.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 1cb41dab7a1d..81e43d99feda 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3096,6 +3096,9 @@ static int submit_extent_page(unsigned int opf,
static void attach_extent_buffer_page(struct extent_buffer *eb,
struct page *page)
{
+ if (page->mapping)
+ assert_spin_locked(&page->mapping->private_lock);
+
if (!PagePrivate(page))
attach_page_private(page, eb);
else
--
2.28.0
* [PATCH 11/17] btrfs: extract the extent buffer verification from btree_readpage_end_io_hook()
2020-09-08 7:52 [PATCH 00/17] btrfs: add read-only support for subpage sector size Qu Wenruo
` (9 preceding siblings ...)
2020-09-08 7:52 ` [PATCH 10/17] btrfs: add assert_spin_locked() for attach_extent_buffer_page() Qu Wenruo
@ 2020-09-08 7:52 ` Qu Wenruo
2020-09-11 13:00 ` Nikolay Borisov
2020-09-08 7:52 ` [PATCH 12/17] btrfs: remove the unnecessary parameter @start and @len for check_data_csum() Qu Wenruo
` (7 subsequent siblings)
18 siblings, 1 reply; 49+ messages in thread
From: Qu Wenruo @ 2020-09-08 7:52 UTC (permalink / raw)
To: linux-btrfs
Currently btree_readpage_end_io_hook() only needs to handle one extent
buffer, as one page currently maps to exactly one extent buffer.
But for the incoming subpage support, one page can be mapped to multiple
extent buffers, thus we can no longer use the current code.
This refactor allows us to call btrfs_check_extent_buffer() on
all involved extent buffers in btree_readpage_end_io_hook() and other
locations.
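The basic read-time checks that move into btrfs_check_extent_buffer() can be sketched as follows; the struct and field names here are simplified stand-ins for the real extent buffer header accessors:

```c
#include <stdint.h>
#include <errno.h>
#include <assert.h>

#define BTRFS_MAX_LEVEL 8

/* Illustrative stand-in for the fields btrfs_check_extent_buffer()
 * reads via btrfs_header_bytenr() / btrfs_header_level(). */
struct eb_hdr {
	uint64_t bytenr;	/* logical address recorded in the header */
	uint8_t  level;		/* tree level recorded in the header */
};

/* Read-time sanity checks, mirroring the shape of the extracted
 * function: the header must match the location we actually read from,
 * and the level must be within bounds. */
static int check_header(const struct eb_hdr *hdr, uint64_t expected_start)
{
	if (hdr->bytenr != expected_start)
		return -EIO;	/* bad tree block start */
	if (hdr->level >= BTRFS_MAX_LEVEL)
		return -EIO;	/* bad tree block level */
	return 0;
}
```

The real function additionally verifies the fsid and the checksum; those need fs_info context and are omitted from this sketch.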
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/disk-io.c | 78 ++++++++++++++++++++++++++--------------------
1 file changed, 45 insertions(+), 33 deletions(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 62dbd9bbd381..f6e562979682 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -574,60 +574,37 @@ static int check_tree_block_fsid(struct extent_buffer *eb)
return ret;
}
-static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
- u64 phy_offset, struct page *page,
- u64 start, u64 end, int mirror)
+/* Do basic extent buffer check at read time */
+static int btrfs_check_extent_buffer(struct extent_buffer *eb)
{
- u64 found_start;
- int found_level;
- struct extent_buffer *eb;
- struct btrfs_fs_info *fs_info;
+ struct btrfs_fs_info *fs_info = eb->fs_info;
u16 csum_size;
- int ret = 0;
+ u64 found_start;
+ u8 found_level;
u8 result[BTRFS_CSUM_SIZE];
- int reads_done;
-
- if (!page->private)
- goto out;
+ int ret = 0;
- eb = (struct extent_buffer *)page->private;
- fs_info = eb->fs_info;
csum_size = btrfs_super_csum_size(fs_info->super_copy);
- /* the pending IO might have been the only thing that kept this buffer
- * in memory. Make sure we have a ref for all this other checks
- */
- atomic_inc(&eb->refs);
-
- reads_done = atomic_dec_and_test(&eb->io_pages);
- if (!reads_done)
- goto err;
-
- eb->read_mirror = mirror;
- if (test_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags)) {
- ret = -EIO;
- goto err;
- }
-
found_start = btrfs_header_bytenr(eb);
if (found_start != eb->start) {
btrfs_err_rl(fs_info, "bad tree block start, want %llu have %llu",
eb->start, found_start);
ret = -EIO;
- goto err;
+ goto out;
}
if (check_tree_block_fsid(eb)) {
btrfs_err_rl(fs_info, "bad fsid on block %llu",
eb->start);
ret = -EIO;
- goto err;
+ goto out;
}
found_level = btrfs_header_level(eb);
if (found_level >= BTRFS_MAX_LEVEL) {
btrfs_err(fs_info, "bad tree block level %d on %llu",
(int)btrfs_header_level(eb), eb->start);
ret = -EIO;
- goto err;
+ goto out;
}
btrfs_set_buffer_lockdep_class(btrfs_header_owner(eb),
@@ -647,7 +624,7 @@ static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
fs_info->sb->s_id, eb->start,
val, found, btrfs_header_level(eb));
ret = -EUCLEAN;
- goto err;
+ goto out;
}
/*
@@ -669,6 +646,41 @@ static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
btrfs_err(fs_info,
"block=%llu read time tree block corruption detected",
eb->start);
+out:
+ return ret;
+}
+
+static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
+ u64 phy_offset, struct page *page,
+ u64 start, u64 end, int mirror)
+{
+ struct extent_buffer *eb;
+ int ret = 0;
+ int reads_done;
+
+ if (!page->private)
+ goto out;
+
+ eb = (struct extent_buffer *)page->private;
+
+ /*
+ * The pending IO might have been the only thing that kept this buffer
+ * in memory. Make sure we have a ref for all the other checks
+ */
+ atomic_inc(&eb->refs);
+
+ reads_done = atomic_dec_and_test(&eb->io_pages);
+ if (!reads_done)
+ goto err;
+
+ eb->read_mirror = mirror;
+ if (test_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags)) {
+ ret = -EIO;
+ goto err;
+ }
+
+ ret = btrfs_check_extent_buffer(eb);
+
err:
if (reads_done &&
test_and_clear_bit(EXTENT_BUFFER_READAHEAD, &eb->bflags))
--
2.28.0
* [PATCH 12/17] btrfs: remove the unnecessary parameter @start and @len for check_data_csum()
2020-09-08 7:52 [PATCH 00/17] btrfs: add read-only support for subpage sector size Qu Wenruo
` (10 preceding siblings ...)
2020-09-08 7:52 ` [PATCH 11/17] btrfs: extract the extent buffer verification from btree_readpage_end_io_hook() Qu Wenruo
@ 2020-09-08 7:52 ` Qu Wenruo
2020-09-11 13:50 ` Nikolay Borisov
2020-09-08 7:52 ` [PATCH 13/17] btrfs: extent_io: only require sector size alignment for page read Qu Wenruo
` (6 subsequent siblings)
18 siblings, 1 reply; 49+ messages in thread
From: Qu Wenruo @ 2020-09-08 7:52 UTC (permalink / raw)
To: linux-btrfs
For check_data_csum(), the page we're using comes directly from the
inode's mapping, thus it has a valid page_offset().
We can use (page_offset() + pg_off) to replace the @start parameter
completely, while @len should always be sectorsize.
While we're at it, also add some comments, since there is quite some
confusion around words like start/offset, without explaining whether
they mean a file offset or a logical bytenr.
This should not affect existing behavior, as for the current sectorsize
== PAGE_SIZE case, @pgoff is always 0, and len is always PAGE_SIZE (or
sectorsize from the dio read path).
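A minimal sketch of the two offset computations this change relies on; the helper names are assumptions for illustration:

```c
#include <stdint.h>
#include <assert.h>

/* File offset of the sector being verified: the page's offset in the
 * file plus the offset inside the page. This is what replaces the
 * removed @start parameter. */
static uint64_t sector_file_offset(uint64_t page_offset, uint32_t pgoff)
{
	return page_offset + pgoff;
}

/* Byte offset of the expected csum inside io_bio->csum: @icsum counts
 * sectors, and each sector owns csum_size bytes of checksum. */
static uint32_t csum_slot_offset(uint32_t icsum, uint16_t csum_size)
{
	return (uint32_t)icsum * csum_size;
}
```

Since both values are derivable from the page and the sector index, @start and @len carry no extra information and can be dropped.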
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/inode.c | 27 +++++++++++++++++++--------
1 file changed, 19 insertions(+), 8 deletions(-)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 9570458aa847..078735aa0f68 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2793,17 +2793,30 @@ void btrfs_writepage_endio_finish_ordered(struct page *page, u64 start,
btrfs_queue_work(wq, &ordered_extent->work);
}
+/*
+ * Verify the checksum of one sector of uncompressed data.
+ *
+ * @inode: The inode.
+ * @io_bio: The btrfs_io_bio which contains the csum.
+ * @icsum: The csum offset (by number of sectors).
+ * @page: The page where the data will be written to.
+ * @pgoff: The offset inside the page.
+ *
+ * The length of such check is always one sector size.
+ */
static int check_data_csum(struct inode *inode, struct btrfs_io_bio *io_bio,
- int icsum, struct page *page, int pgoff, u64 start,
- size_t len)
+ int icsum, struct page *page, int pgoff)
{
struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
char *kaddr;
+ u32 len = fs_info->sectorsize;
u16 csum_size = btrfs_super_csum_size(fs_info->super_copy);
u8 *csum_expected;
u8 csum[BTRFS_CSUM_SIZE];
+ ASSERT(pgoff + len <= PAGE_SIZE);
+
csum_expected = ((u8 *)io_bio->csum) + icsum * csum_size;
kaddr = kmap_atomic(page);
@@ -2817,8 +2830,8 @@ static int check_data_csum(struct inode *inode, struct btrfs_io_bio *io_bio,
kunmap_atomic(kaddr);
return 0;
zeroit:
- btrfs_print_data_csum_error(BTRFS_I(inode), start, csum, csum_expected,
- io_bio->mirror_num);
+ btrfs_print_data_csum_error(BTRFS_I(inode), page_offset(page) + pgoff,
+ csum, csum_expected, io_bio->mirror_num);
if (io_bio->device)
btrfs_dev_stat_inc_and_print(io_bio->device,
BTRFS_DEV_STAT_CORRUPTION_ERRS);
@@ -2857,8 +2870,7 @@ static int btrfs_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
}
phy_offset >>= inode->i_sb->s_blocksize_bits;
- return check_data_csum(inode, io_bio, phy_offset, page, offset, start,
- (size_t)(end - start + 1));
+ return check_data_csum(inode, io_bio, phy_offset, page, offset);
}
/*
@@ -7545,8 +7557,7 @@ static blk_status_t btrfs_check_read_dio_bio(struct inode *inode,
ASSERT(pgoff < PAGE_SIZE);
if (uptodate &&
(!csum || !check_data_csum(inode, io_bio, icsum,
- bvec.bv_page, pgoff,
- start, sectorsize))) {
+ bvec.bv_page, pgoff))) {
clean_io_failure(fs_info, failure_tree, io_tree,
start, bvec.bv_page,
btrfs_ino(BTRFS_I(inode)),
--
2.28.0
* [PATCH 13/17] btrfs: extent_io: only require sector size alignment for page read
2020-09-08 7:52 [PATCH 00/17] btrfs: add read-only support for subpage sector size Qu Wenruo
` (11 preceding siblings ...)
2020-09-08 7:52 ` [PATCH 12/17] btrfs: remove the unnecessary parameter @start and @len for check_data_csum() Qu Wenruo
@ 2020-09-08 7:52 ` Qu Wenruo
2020-09-11 13:55 ` Nikolay Borisov
2020-09-08 7:52 ` [PATCH 14/17] btrfs: make btrfs_readpage_end_io_hook() follow sector size Qu Wenruo
` (5 subsequent siblings)
18 siblings, 1 reply; 49+ messages in thread
From: Qu Wenruo @ 2020-09-08 7:52 UTC (permalink / raw)
To: linux-btrfs
If we're reading a partial page, btrfs will warn about it, as our
reads and writes are always done in sector size, which currently equals
page size.
But for the incoming subpage RO support, our data read is only aligned
to sectorsize, which can be smaller than page size.
Thus change the warning condition to check against sectorsize; the
behavior is unchanged for the regular sectorsize == PAGE_SIZE case,
while no error is reported for subpage reads.
Also, pass the proper start/end with bv_offset applied for
check_data_csum() to handle.
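The relaxed alignment check can be sketched like this; bvec_misaligned() is an illustrative name, not a kernel function:

```c
#include <stdbool.h>
#include <stdint.h>
#include <assert.h>

/* Same shape as the kernel's IS_ALIGNED(): true when x is a multiple
 * of the (power-of-two) alignment a. */
#define IS_ALIGNED(x, a) (((x) & ((a) - 1)) == 0)

/* The patched condition: a bvec is acceptable when both its start and
 * its end are sector-aligned, instead of requiring a full page. */
static bool bvec_misaligned(uint32_t bv_offset, uint32_t bv_len,
			    uint32_t sectorsize)
{
	return !IS_ALIGNED(bv_offset, sectorsize) ||
	       !IS_ALIGNED(bv_offset + bv_len, sectorsize);
}
```

A 4K read at offset 4K inside a 64K page would have triggered the old full-page warning (bv_offset != 0), but passes the sector-aligned check.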
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/extent_io.c | 19 ++++++++++++-------
1 file changed, 12 insertions(+), 7 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 81e43d99feda..a83b63ecc5f8 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2819,6 +2819,7 @@ static void end_bio_extent_readpage(struct bio *bio)
struct page *page = bvec->bv_page;
struct inode *inode = page->mapping->host;
struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+ u32 sectorsize = fs_info->sectorsize;
bool data_inode = btrfs_ino(BTRFS_I(inode))
!= BTRFS_BTREE_INODE_OBJECTID;
@@ -2829,13 +2830,17 @@ static void end_bio_extent_readpage(struct bio *bio)
tree = &BTRFS_I(inode)->io_tree;
failure_tree = &BTRFS_I(inode)->io_failure_tree;
- /* We always issue full-page reads, but if some block
+ /*
+ * We always issue full-sector reads, but if some block
* in a page fails to read, blk_update_request() will
* advance bv_offset and adjust bv_len to compensate.
- * Print a warning for nonzero offsets, and an error
- * if they don't add up to a full page. */
- if (bvec->bv_offset || bvec->bv_len != PAGE_SIZE) {
- if (bvec->bv_offset + bvec->bv_len != PAGE_SIZE)
+ * Print a warning for unaligned offsets, and an error
+ * if they don't add up to a full sector.
+ */
+ if (!IS_ALIGNED(bvec->bv_offset, sectorsize) ||
+ !IS_ALIGNED(bvec->bv_offset + bvec->bv_len, sectorsize)) {
+ if (!IS_ALIGNED(bvec->bv_offset + bvec->bv_len,
+ sectorsize))
btrfs_err(fs_info,
"partial page read in btrfs with offset %u and length %u",
bvec->bv_offset, bvec->bv_len);
@@ -2845,8 +2850,8 @@ static void end_bio_extent_readpage(struct bio *bio)
bvec->bv_offset, bvec->bv_len);
}
- start = page_offset(page);
- end = start + bvec->bv_offset + bvec->bv_len - 1;
+ start = page_offset(page) + bvec->bv_offset;
+ end = start + bvec->bv_len - 1;
len = bvec->bv_len;
mirror = io_bio->mirror_num;
--
2.28.0
* [PATCH 14/17] btrfs: make btrfs_readpage_end_io_hook() follow sector size
2020-09-08 7:52 [PATCH 00/17] btrfs: add read-only support for subpage sector size Qu Wenruo
` (12 preceding siblings ...)
2020-09-08 7:52 ` [PATCH 13/17] btrfs: extent_io: only require sector size alignment for page read Qu Wenruo
@ 2020-09-08 7:52 ` Qu Wenruo
2020-09-09 17:34 ` Goldwyn Rodrigues
2020-09-08 7:52 ` [PATCH 15/17] btrfs: introduce subpage_eb_mapping for extent buffers Qu Wenruo
` (4 subsequent siblings)
18 siblings, 1 reply; 49+ messages in thread
From: Qu Wenruo @ 2020-09-08 7:52 UTC (permalink / raw)
To: linux-btrfs; +Cc: Goldwyn Rodrigues
Currently btrfs_readpage_end_io_hook() just passes the whole page to
check_data_csum(), which is fine since we only support sectorsize ==
PAGE_SIZE.
To support subpage RO mount, we need to properly honor per-sector
checksum verification, just like what we did in the dio read path.
This patch does the csum verification in a for loop, starting at
pg_off == start - page_offset(page) and incrementing by sectorsize on
each iteration.
For the sectorsize == PAGE_SIZE case, pg_off will always be 0, and we
finish in just one iteration.
For subpage, we do the proper loop.
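The shape of the per-sector loop can be sketched as follows; count_checked_sectors() is an illustrative stand-in that counts how many check_data_csum() calls the loop would make (@end is inclusive, as in btrfs):

```c
#include <stdint.h>
#include <assert.h>

/* Walk the [start, end] range one sector at a time, exactly like the
 * loop added to btrfs_readpage_end_io_hook(); the csum index
 * (phy_offset in the patch) advances in lockstep with pg_off. */
static int count_checked_sectors(uint64_t start, uint64_t end,
				 uint64_t page_offset, uint32_t sectorsize)
{
	uint64_t offset = start - page_offset;
	uint64_t pg_off;
	int checked = 0;

	for (pg_off = offset; pg_off < end - page_offset;
	     pg_off += sectorsize)
		checked++;	/* one check_data_csum() per sector */
	return checked;
}
```

For sectorsize == PAGE_SIZE this does one iteration per page; for a 16K range of 4K sectors inside a 64K page it does four.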
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/inode.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 078735aa0f68..8bd14dda2067 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2851,9 +2851,12 @@ static int btrfs_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
u64 start, u64 end, int mirror)
{
size_t offset = start - page_offset(page);
+ size_t pg_off;
struct inode *inode = page->mapping->host;
struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
struct btrfs_root *root = BTRFS_I(inode)->root;
+ u32 sectorsize = root->fs_info->sectorsize;
+ bool found_err = false;
if (PageChecked(page)) {
ClearPageChecked(page);
@@ -2870,7 +2873,17 @@ static int btrfs_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
}
phy_offset >>= inode->i_sb->s_blocksize_bits;
- return check_data_csum(inode, io_bio, phy_offset, page, offset);
+ for (pg_off = offset; pg_off < end - page_offset(page);
+ pg_off += sectorsize, phy_offset++) {
+ int ret;
+
+ ret = check_data_csum(inode, io_bio, phy_offset, page, pg_off);
+ if (ret < 0)
+ found_err = true;
+ }
+ if (found_err)
+ return -EIO;
+ return 0;
}
/*
--
2.28.0
* [PATCH 15/17] btrfs: introduce subpage_eb_mapping for extent buffers
2020-09-08 7:52 [PATCH 00/17] btrfs: add read-only support for subpage sector size Qu Wenruo
` (13 preceding siblings ...)
2020-09-08 7:52 ` [PATCH 14/17] btrfs: make btrfs_readpage_end_io_hook() follow sector size Qu Wenruo
@ 2020-09-08 7:52 ` Qu Wenruo
2020-09-08 10:22 ` kernel test robot
` (2 more replies)
2020-09-08 7:52 ` [PATCH 16/17] btrfs: handle extent buffer verification proper for subpage size Qu Wenruo
` (3 subsequent siblings)
18 siblings, 3 replies; 49+ messages in thread
From: Qu Wenruo @ 2020-09-08 7:52 UTC (permalink / raw)
To: linux-btrfs
One of the design blockages for subpage support is the btree inode
page::private mapping.
Currently page::private for the btree inode is a pointer to the extent
buffer that owns the page.
This is fine for the sectorsize == PAGE_SIZE case, but not suitable for
subpage size support, as in that case one page can hold multiple tree
blocks.
So to support subpage, introduce a new structure,
subpage_eb_mapping, to record which extent buffers are referring to
one page.
It uses a bitmap (at most 16 bits used) to record tree blocks, and an
extent buffer pointer array (at most 16 entries as well) to record the
owners.
This patch modifies the following functions to add subpage support
using the subpage_eb_mapping structure:
- attach_extent_buffer_page()
- detach_extent_buffer_page()
- grab_extent_buffer_from_page()
- try_release_extent_buffer()
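The bitmap bookkeeping can be sketched in isolation; the struct and helper names below are simplified stand-ins for subpage_eb_mapping and its attach/detach code, assuming a 64K page with a 4K sectorsize as in the patch:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <assert.h>

/* One bit (and one pointer slot) per sector in the page */
#define NR_SLOTS (65536 / 4096)

struct eb;	/* opaque for this sketch */

struct subpage_map {
	uint16_t bitmap;		/* which sectors host a tree block */
	struct eb *buffers[NR_SLOTS];	/* owning eb per sector */
};

/* Record that @eb occupies @nr_bits sectors starting at @index_start,
 * where index_start = (eb->start - page_offset) / sectorsize and
 * nr_bits = nodesize / sectorsize. */
static void map_attach(struct subpage_map *m, struct eb *eb,
		       int index_start, int nr_bits)
{
	for (int i = index_start; i < index_start + nr_bits; i++) {
		m->bitmap |= (uint16_t)(1u << i);
		m->buffers[i] = eb;
	}
}

/* Clear only the slots still owned by @eb; when the bitmap reaches
 * zero the real code frees the mapping and detaches page->private. */
static void map_detach(struct subpage_map *m, struct eb *eb,
		       int index_start, int nr_bits)
{
	for (int i = index_start; i < index_start + nr_bits; i++) {
		if ((m->bitmap & (1u << i)) && m->buffers[i] == eb) {
			m->bitmap &= (uint16_t)~(1u << i);
			m->buffers[i] = NULL;
		}
	}
}
```

Checking `mapping->bitmap == 0` after a detach is how detach_subpage_mapping() decides it was the last owner and can release the page.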
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/extent_io.c | 221 ++++++++++++++++++++++++++++++++++++++++---
fs/btrfs/extent_io.h | 3 +
2 files changed, 212 insertions(+), 12 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index a83b63ecc5f8..87b3bb781532 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -29,6 +29,34 @@ static struct kmem_cache *extent_state_cache;
static struct kmem_cache *extent_buffer_cache;
static struct bio_set btrfs_bioset;
+/* Upper limit of how many extent buffers can be stored in one page */
+#define SUBPAGE_NR_EXTENT_BUFFERS (SZ_64K / SZ_4K)
+/*
+ * Structure for subpage support, recording the page -> extent buffer mapping
+ *
+ * For subpage support, one 64K page can contain several tree blocks, rather
+ * than the 1:1 page <-> extent buffer mapping of the sectorsize == PAGE_SIZE
+ * case.
+ */
+struct subpage_eb_mapping {
+ /*
+ * Which range has extent buffer.
+ *
+ * One bit represents one sector, bit nr represents the offset in page.
+ * At most 16 bits are utilized.
+ */
+ unsigned long bitmap;
+
+ /* We only support 64K PAGE_SIZE system to mount 4K sectorsize fs */
+ struct extent_buffer *buffers[SUBPAGE_NR_EXTENT_BUFFERS];
+};
+
+struct btrfs_fs_info *page_to_fs_info(struct page *page)
+{
+ ASSERT(page && page->mapping);
+
+ return BTRFS_I(page->mapping->host)->root->fs_info;
+}
+
static inline bool extent_state_in_tree(const struct extent_state *state)
{
return !RB_EMPTY_NODE(&state->rb_node);
@@ -3098,12 +3126,50 @@ static int submit_extent_page(unsigned int opf,
return ret;
}
+static void attach_subpage_mapping(struct extent_buffer *eb,
+ struct page *page,
+ struct subpage_eb_mapping *mapping)
+{
+ u32 sectorsize = eb->fs_info->sectorsize;
+ u32 nodesize = eb->fs_info->nodesize;
+ int index_start = (eb->start - page_offset(page)) / sectorsize;
+ int nr_bits = nodesize / sectorsize;
+ int i;
+
+ ASSERT(mapping);
+ if (!PagePrivate(page)) {
+ /* Attach mapping to page::private and initialize */
+ memset(mapping, 0, sizeof(*mapping));
+ attach_page_private(page, mapping);
+ } else {
+ /* Use the existing page::private as mapping */
+ kfree(mapping);
+ mapping = (struct subpage_eb_mapping *) page->private;
+ }
+
+ /* Set the bitmap and pointers */
+ for (i = index_start; i < index_start + nr_bits; i++) {
+ set_bit(i, &mapping->bitmap);
+ mapping->buffers[i] = eb;
+ }
+}
+
static void attach_extent_buffer_page(struct extent_buffer *eb,
- struct page *page)
+ struct page *page,
+ struct subpage_eb_mapping *mapping)
{
+ bool subpage = (eb->fs_info->sectorsize < PAGE_SIZE);
if (page->mapping)
assert_spin_locked(&page->mapping->private_lock);
+ if (subpage && page->mapping) {
+ attach_subpage_mapping(eb, page, mapping);
+ return;
+ }
+ /*
+ * Anonymous page and sectorsize == PAGE_SIZE uses page::private as a
+ * pointer to eb directly.
+ */
if (!PagePrivate(page))
attach_page_private(page, eb);
else
@@ -4928,16 +4994,61 @@ int extent_buffer_under_io(const struct extent_buffer *eb)
test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags));
}
+static void detach_subpage_mapping(struct extent_buffer *eb, struct page *page)
+{
+ struct subpage_eb_mapping *mapping;
+ u32 sectorsize = eb->fs_info->sectorsize;
+ int start_index;
+ int nr_bits = eb->fs_info->nodesize / sectorsize;
+ int i;
+
+ /* Page already detached */
+ if (!PagePrivate(page))
+ return;
+
+ assert_spin_locked(&page->mapping->private_lock);
+ ASSERT(eb->start >= page_offset(page) &&
+ eb->start < page_offset(page) + PAGE_SIZE);
+
+ mapping = (struct subpage_eb_mapping *)page->private;
+ start_index = (eb->start - page_offset(page)) / sectorsize;
+
+ for (i = start_index; i < start_index + nr_bits; i++) {
+ if (test_bit(i, &mapping->bitmap) &&
+ mapping->buffers[i] == eb) {
+ clear_bit(i, &mapping->bitmap);
+ mapping->buffers[i] = NULL;
+ }
+ }
+
+	/* Are we the last owner? */
+ if (mapping->bitmap == 0) {
+ kfree(mapping);
+ detach_page_private(page);
+		/* One for when we first allocated the page */
+ put_page(page);
+ }
+}
+
static void detach_extent_buffer_page(struct extent_buffer *eb,
struct page *page)
{
bool mapped = !test_bit(EXTENT_BUFFER_UNMAPPED, &eb->bflags);
+ bool subpage = (eb->fs_info->sectorsize < PAGE_SIZE);
if (!page)
return;
if (mapped)
spin_lock(&page->mapping->private_lock);
+
+ if (subpage && page->mapping) {
+ detach_subpage_mapping(eb, page);
+ if (mapped)
+ spin_unlock(&page->mapping->private_lock);
+ return;
+ }
+
if (PagePrivate(page) && page->private == (unsigned long)eb) {
BUG_ON(test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags));
BUG_ON(PageDirty(page));
@@ -5035,7 +5146,7 @@ struct extent_buffer *btrfs_clone_extent_buffer(const struct extent_buffer *src)
btrfs_release_extent_buffer(new);
return NULL;
}
- attach_extent_buffer_page(new, p);
+ attach_extent_buffer_page(new, p, NULL);
WARN_ON(PageDirty(p));
SetPageUptodate(p);
new->pages[i] = p;
@@ -5243,8 +5354,31 @@ struct extent_buffer *alloc_test_extent_buffer(struct btrfs_fs_info *fs_info,
* The function here is to ensure we have proper locking and detect such a
* race so we won't allocate an eb twice.
*/
-static struct extent_buffer *grab_extent_buffer_from_page(struct page *page)
+static struct extent_buffer *grab_extent_buffer_from_page(struct page *page,
+ u64 bytenr)
{
+ struct btrfs_fs_info *fs_info = page_to_fs_info(page);
+ bool subpage = (fs_info->sectorsize < PAGE_SIZE);
+
+ if (!PagePrivate(page))
+ return NULL;
+
+ if (subpage) {
+ struct subpage_eb_mapping *mapping;
+ u32 sectorsize = fs_info->sectorsize;
+ int start_index;
+
+ ASSERT(bytenr >= page_offset(page) &&
+ bytenr < page_offset(page) + PAGE_SIZE);
+
+ start_index = (bytenr - page_offset(page)) / sectorsize;
+ mapping = (struct subpage_eb_mapping *)page->private;
+
+ if (test_bit(start_index, &mapping->bitmap))
+ return mapping->buffers[start_index];
+ return NULL;
+ }
+
/*
* For the PAGE_SIZE == sectorsize case, a btree_inode page should have its
* private pointer pointing to the extent buffer that owns this page.
@@ -5263,6 +5397,8 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
struct extent_buffer *exists = NULL;
struct page *p;
struct address_space *mapping = fs_info->btree_inode->i_mapping;
+ struct subpage_eb_mapping *subpage_mapping = NULL;
+ bool subpage = (fs_info->sectorsize < PAGE_SIZE);
int uptodate = 1;
int ret;
@@ -5286,6 +5422,14 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
if (!eb)
return ERR_PTR(-ENOMEM);
+ if (subpage) {
+ subpage_mapping = kmalloc(sizeof(*subpage_mapping), GFP_NOFS);
+		if (!subpage_mapping) {
+ exists = ERR_PTR(-ENOMEM);
+ goto free_eb;
+ }
+ }
+
num_pages = num_extent_pages(eb);
for (i = 0; i < num_pages; i++, index++) {
p = find_or_create_page(mapping, index, GFP_NOFS|__GFP_NOFAIL);
@@ -5296,7 +5440,7 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
spin_lock(&mapping->private_lock);
if (PagePrivate(p)) {
- exists = grab_extent_buffer_from_page(p);
+ exists = grab_extent_buffer_from_page(p, start);
if (exists && atomic_inc_not_zero(&exists->refs)) {
spin_unlock(&mapping->private_lock);
unlock_page(p);
@@ -5306,16 +5450,19 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
}
exists = NULL;
- /*
- * Do this so attach doesn't complain and we need to
- * drop the ref the old guy had.
- */
- ClearPagePrivate(p);
- WARN_ON(PageDirty(p));
- put_page(p);
+ if (!subpage) {
+ /*
+ * Do this so attach doesn't complain and we
+ * need to drop the ref the old guy had.
+ */
+ ClearPagePrivate(p);
+ WARN_ON(PageDirty(p));
+ put_page(p);
+ }
}
- attach_extent_buffer_page(eb, p);
+ attach_extent_buffer_page(eb, p, subpage_mapping);
spin_unlock(&mapping->private_lock);
+ subpage_mapping = NULL;
WARN_ON(PageDirty(p));
eb->pages[i] = p;
if (!PageUptodate(p))
@@ -5365,6 +5512,7 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
free_eb:
WARN_ON(!atomic_dec_and_test(&eb->refs));
+ kfree(subpage_mapping);
for (i = 0; i < num_pages; i++) {
if (eb->pages[i])
unlock_page(eb->pages[i]);
@@ -6158,8 +6306,49 @@ void memmove_extent_buffer(const struct extent_buffer *dst,
}
}
+int try_release_subpage_ebs(struct page *page)
+{
+ struct subpage_eb_mapping *mapping;
+ int i;
+
+ assert_spin_locked(&page->mapping->private_lock);
+ if (!PagePrivate(page))
+ return 1;
+
+ mapping = (struct subpage_eb_mapping *)page->private;
+ for (i = 0; i < SUBPAGE_NR_EXTENT_BUFFERS && PagePrivate(page); i++) {
+ struct btrfs_fs_info *fs_info = page_to_fs_info(page);
+ struct extent_buffer *eb;
+ int ret;
+
+ if (!test_bit(i, &mapping->bitmap))
+ continue;
+
+ eb = mapping->buffers[i];
+ spin_unlock(&page->mapping->private_lock);
+ spin_lock(&eb->refs_lock);
+ ret = release_extent_buffer(eb);
+ spin_lock(&page->mapping->private_lock);
+
+ /*
+ * Extent buffer can't be freed yet, must jump to next slot
+ * and avoid calling release_extent_buffer().
+ */
+ if (!ret)
+ i += (fs_info->nodesize / fs_info->sectorsize - 1);
+ }
+ /*
+ * detach_subpage_mapping() from release_extent_buffer() has detached
+ * all ebs from this page. All related ebs are released.
+ */
+ if (!PagePrivate(page))
+ return 1;
+ return 0;
+}
+
int try_release_extent_buffer(struct page *page)
{
+ bool subpage = (page_to_fs_info(page)->sectorsize < PAGE_SIZE);
struct extent_buffer *eb;
/*
@@ -6172,6 +6361,14 @@ int try_release_extent_buffer(struct page *page)
return 1;
}
+ if (subpage) {
+ int ret;
+
+ ret = try_release_subpage_ebs(page);
+ spin_unlock(&page->mapping->private_lock);
+ return ret;
+ }
+
eb = (struct extent_buffer *)page->private;
BUG_ON(!eb);
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index e16c5449ba48..6593b6883438 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -184,6 +184,9 @@ static inline int extent_compress_type(unsigned long bio_flags)
return bio_flags >> EXTENT_BIO_FLAG_SHIFT;
}
+/* Unable to inline it due to the requirement for both ASSERT() and BTRFS_I() */
+struct btrfs_fs_info *page_to_fs_info(struct page *page);
+
struct extent_map_tree;
typedef struct extent_map *(get_extent_t)(struct btrfs_inode *inode,
--
2.28.0
^ permalink raw reply related [flat|nested] 49+ messages in thread
* [PATCH 16/17] btrfs: handle extent buffer verification proper for subpage size
2020-09-08 7:52 [PATCH 00/17] btrfs: add read-only support for subpage sector size Qu Wenruo
` (14 preceding siblings ...)
2020-09-08 7:52 ` [PATCH 15/17] btrfs: introduce subpage_eb_mapping for extent buffers Qu Wenruo
@ 2020-09-08 7:52 ` Qu Wenruo
2020-09-08 7:52 ` [PATCH 17/17] btrfs: allow RO mount of 4K sector size fs on 64K page system Qu Wenruo
` (2 subsequent siblings)
18 siblings, 0 replies; 49+ messages in thread
From: Qu Wenruo @ 2020-09-08 7:52 UTC (permalink / raw)
To: linux-btrfs
Unlike the regular PAGE_SIZE == sectorsize case, one btree inode page can
contain several tree blocks.
This makes csum and other basic tree block verification tricky, as in
btree_readpage_end_io_hook() we can only check the extent buffer that
triggered this page read, not the remaining tree blocks in the same
page.
So this patch moves tree block verification to the following times:
- btree_readpage_end_io_hook()
This is the old timing, but now we check all known extent buffers of
the page.
- read_extent_buffer_pages()
This is the new timing, exclusive to subpage support.
Now if an extent buffer finds all its pages (only one for subpage)
already uptodate, it still needs to check whether the extent buffer
itself has already been verified.
If not, it calls btrfs_check_extent_buffer() to verify the extent
buffer.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/disk-io.c | 5 +-
fs/btrfs/disk-io.h | 1 +
fs/btrfs/extent_io.c | 111 ++++++++++++++++++++++++++++++++++++++++++-
fs/btrfs/extent_io.h | 1 +
4 files changed, 116 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index f6e562979682..5153c0420e7e 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -575,7 +575,7 @@ static int check_tree_block_fsid(struct extent_buffer *eb)
}
/* Do basic extent buffer check at read time */
-static int btrfs_check_extent_buffer(struct extent_buffer *eb)
+int btrfs_check_extent_buffer(struct extent_buffer *eb)
{
struct btrfs_fs_info *fs_info = eb->fs_info;
u16 csum_size;
@@ -661,6 +661,9 @@ static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
if (!page->private)
goto out;
+ if (page_to_fs_info(page)->sectorsize < PAGE_SIZE)
+ return btrfs_verify_subpage_extent_buffers(page, mirror);
+
eb = (struct extent_buffer *)page->private;
/*
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index 00dc39d47ed3..ac42b622f113 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -129,6 +129,7 @@ struct extent_map *btree_get_extent(struct btrfs_inode *inode,
int btrfs_get_num_tolerated_disk_barrier_failures(u64 flags);
int __init btrfs_end_io_wq_init(void);
void __cold btrfs_end_io_wq_exit(void);
+int btrfs_check_extent_buffer(struct extent_buffer *eb);
#ifdef CONFIG_DEBUG_LOCK_ALLOC
void btrfs_init_lockdep(void);
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 87b3bb781532..8c5bb53265ab 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -46,6 +46,9 @@ struct subpage_eb_mapping {
*/
unsigned long bitmap;
+ /* Which range of ebs has been verified */
+ unsigned long verified;
+
/* We only support 64K PAGE_SIZE system to mount 4K sectorsize fs */
struct extent_buffer *buffers[SUBPAGE_NR_EXTENT_BUFFERS];
};
@@ -5017,6 +5020,7 @@ static void detach_subpage_mapping(struct extent_buffer *eb, struct page *page)
if (test_bit(i, &mapping->bitmap) &&
mapping->buffers[i] == eb) {
clear_bit(i, &mapping->bitmap);
+ clear_bit(i, &mapping->verified);
mapping->buffers[i] = NULL;
}
}
@@ -5696,6 +5700,38 @@ void set_extent_buffer_uptodate(struct extent_buffer *eb)
}
}
+/*
+ * For subpage, one btree page can already be uptodate (read by other tree
+ * blocks in the same page), but we haven't verified the csum of the tree
+ * block.
+ *
+ * So we need to do extra check for uptodate page of the extent buffer.
+ */
+static int check_uptodate_extent_buffer_page(struct extent_buffer *eb)
+{
+ struct btrfs_fs_info *fs_info = eb->fs_info;
+ struct subpage_eb_mapping *eb_mapping;
+ struct page *page = eb->pages[0];
+ int nr_bit;
+ int ret;
+
+ if (fs_info->sectorsize == PAGE_SIZE)
+ return 0;
+
+ nr_bit = (eb->start - page_offset(page)) / fs_info->sectorsize;
+ spin_lock(&page->mapping->private_lock);
+ eb_mapping = (struct subpage_eb_mapping *)page->private;
+ if (test_bit(nr_bit, &eb_mapping->verified)) {
+ spin_unlock(&page->mapping->private_lock);
+ return 0;
+ }
+ spin_unlock(&page->mapping->private_lock);
+ ret = btrfs_check_extent_buffer(eb);
+ if (!ret)
+ set_bit(nr_bit, &eb_mapping->verified);
+ return ret;
+}
+
int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num)
{
int i;
@@ -5737,7 +5773,9 @@ int read_extent_buffer_pages(struct extent_buffer *eb, int wait, int mirror_num)
}
if (all_uptodate) {
- set_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
+ ret = check_uptodate_extent_buffer_page(eb);
+ if (!ret)
+ set_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
goto unlock_exit;
}
@@ -6396,3 +6434,74 @@ int try_release_extent_buffer(struct page *page)
return release_extent_buffer(eb);
}
+
+/*
+ * Verify all referred extent buffers in one page for subpage support.
+ *
+ * This is called in btree_readpage_end_io_hook(), where we still have the
+ * page locked.
+ * Here we only check the extent buffer who triggers the page read, so it
+ * doesn't cover all extent buffers contained by this page.
+ *
+ * We still need to do the same check for read_extent_buffer_pages() where
+ * the page of the extent buffer is already uptodate.
+ */
+int btrfs_verify_subpage_extent_buffers(struct page *page, int mirror)
+{
+ struct btrfs_fs_info *fs_info = page_to_fs_info(page);
+ struct extent_buffer *eb;
+ struct subpage_eb_mapping *eb_mapping;
+ int nr_bits = (fs_info->nodesize / fs_info->sectorsize);
+ int i;
+ int ret = 0;
+
+ spin_lock(&page->mapping->private_lock);
+ eb_mapping = (struct subpage_eb_mapping *)page->private;
+ for (i = 0; i < SUBPAGE_NR_EXTENT_BUFFERS; i++) {
+ int reads_done;
+ int j;
+
+ if (!test_bit(i, &eb_mapping->bitmap))
+ continue;
+
+ eb = eb_mapping->buffers[i];
+ spin_unlock(&page->mapping->private_lock);
+
+ atomic_inc(&eb->refs);
+ reads_done = atomic_dec_and_test(&eb->io_pages);
+
+ /*
+ * For subpage tree block, all tree read should be contained in
+ * one page, thus the read should always be done.
+ */
+ ASSERT(reads_done);
+
+ eb->read_mirror = mirror;
+ if (test_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags)) {
+ ret = -EIO;
+ atomic_inc(&eb->io_pages);
+ clear_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
+ free_extent_buffer(eb);
+ goto out;
+ }
+
+ ret = btrfs_check_extent_buffer(eb);
+ if (ret < 0) {
+ atomic_inc(&eb->io_pages);
+ clear_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
+ free_extent_buffer(eb);
+ goto out;
+ }
+ for (j = i; j < i + nr_bits; j++)
+ set_bit(j, &eb_mapping->verified);
+
+ /* Go to next eb directly */
+ i += (nr_bits - 1);
+
+ free_extent_buffer(eb);
+ spin_lock(&page->mapping->private_lock);
+ }
+ spin_unlock(&page->mapping->private_lock);
+out:
+ return ret;
+}
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 6593b6883438..d714e05178b5 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -330,6 +330,7 @@ blk_status_t btrfs_submit_read_repair(struct inode *inode,
struct page *page, unsigned int pgoff,
u64 start, u64 end, int failed_mirror,
submit_bio_hook_t *submit_bio_hook);
+int btrfs_verify_subpage_extent_buffers(struct page *page, int mirror);
#ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
bool find_lock_delalloc_range(struct inode *inode,
--
2.28.0
^ permalink raw reply related [flat|nested] 49+ messages in thread
* [PATCH 17/17] btrfs: allow RO mount of 4K sector size fs on 64K page system
2020-09-08 7:52 [PATCH 00/17] btrfs: add read-only support for subpage sector size Qu Wenruo
` (15 preceding siblings ...)
2020-09-08 7:52 ` [PATCH 16/17] btrfs: handle extent buffer verification proper for subpage size Qu Wenruo
@ 2020-09-08 7:52 ` Qu Wenruo
2020-09-08 8:03 ` [PATCH 00/17] btrfs: add read-only support for subpage sector size Qu Wenruo
2020-09-11 10:24 ` Qu Wenruo
18 siblings, 0 replies; 49+ messages in thread
From: Qu Wenruo @ 2020-09-08 7:52 UTC (permalink / raw)
To: linux-btrfs
This adds basic RO mount support for 4K sector size on 64K page
systems.
Currently we only plan to support 4K and 64K page sizes.
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/disk-io.c | 24 +++++++++++++++++++++---
fs/btrfs/super.c | 7 +++++++
2 files changed, 28 insertions(+), 3 deletions(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 5153c0420e7e..9e3938e68355 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2446,13 +2446,21 @@ static int validate_super(struct btrfs_fs_info *fs_info,
btrfs_err(fs_info, "invalid sectorsize %llu", sectorsize);
ret = -EINVAL;
}
- /* Only PAGE SIZE is supported yet */
- if (sectorsize != PAGE_SIZE) {
+
+ /*
+ * For 4K page size, we only support 4K sector size.
+ * For 64K page size, we support RW for 64K sector size, and RO for
+ * 4K sector size.
+ */
+ if ((PAGE_SIZE == SZ_4K && sectorsize != PAGE_SIZE) ||
+ (PAGE_SIZE == SZ_64K && (sectorsize != SZ_4K &&
+ sectorsize != SZ_64K))) {
btrfs_err(fs_info,
- "sectorsize %llu not supported yet, only support %lu",
+ "sectorsize %llu not supported yet for page size %lu",
sectorsize, PAGE_SIZE);
ret = -EINVAL;
}
+
if (!is_power_of_2(nodesize) || nodesize < sectorsize ||
nodesize > BTRFS_MAX_METADATA_BLOCKSIZE) {
btrfs_err(fs_info, "invalid nodesize %llu", nodesize);
@@ -3100,6 +3108,16 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
goto fail_alloc;
}
+ /* For 4K sector size support, it's only read-only yet */
+ if (PAGE_SIZE == SZ_64K && sectorsize == SZ_4K) {
+ if (!sb_rdonly(sb) || btrfs_super_log_root(disk_super)) {
+ btrfs_err(fs_info,
+ "subpage sector size only support RO yet");
+ err = -EINVAL;
+ goto fail_alloc;
+ }
+ }
+
ret = btrfs_init_workqueues(fs_info, fs_devices);
if (ret) {
err = ret;
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 25967ecaaf0a..edc731780d64 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1922,6 +1922,13 @@ static int btrfs_remount(struct super_block *sb, int *flags, char *data)
ret = -EINVAL;
goto restore;
}
+ if (fs_info->sectorsize < PAGE_SIZE) {
+ btrfs_warn(fs_info,
+ "read-write mount is not yet allowed for sector size %u page size %lu",
+ fs_info->sectorsize, PAGE_SIZE);
+ ret = -EINVAL;
+ goto restore;
+ }
ret = btrfs_cleanup_fs_roots(fs_info);
if (ret)
--
2.28.0
^ permalink raw reply related [flat|nested] 49+ messages in thread
* Re: [PATCH 00/17] btrfs: add read-only support for subpage sector size
2020-09-08 7:52 [PATCH 00/17] btrfs: add read-only support for subpage sector size Qu Wenruo
` (16 preceding siblings ...)
2020-09-08 7:52 ` [PATCH 17/17] btrfs: allow RO mount of 4K sector size fs on 64K page system Qu Wenruo
@ 2020-09-08 8:03 ` Qu Wenruo
2020-09-11 10:24 ` Qu Wenruo
18 siblings, 0 replies; 49+ messages in thread
From: Qu Wenruo @ 2020-09-08 8:03 UTC (permalink / raw)
To: linux-btrfs
On 2020/9/8 3:52 PM, Qu Wenruo wrote:
> Patches can be fetched from github:
> https://github.com/adam900710/linux/tree/subpage
>
> Currently btrfs only allows to mount fs with sectorsize == PAGE_SIZE.
>
> That means, for 64K page size system, they can only use 64K sector size
> fs.
> This brings a big compatible problem for btrfs.
>
> This patch is going to slightly solve the problem by, allowing 64K
> system to mount 4K sectorsize fs in read-only mode.
>
> The main objective here, is to remove the blockage in the code base, and
> pave the road to full RW mount support.
>
> == What works ==
>
> Existing regular page sized sector size support
> Subpage read-only Mount (with all self tests and ASSERT)
> Subpage metadata read (including all trees and inline extents, and csum checking)
> Subpage uncompressed data read (with csum checking)
>
> == What doesn't work ==
>
> Read-write mount (see the subject)
> Compressed data read
>
> == Challenge we meet ==
>
> The main problem is metadata, where we have several limitations:
> - We always read the full page of a metadata
> In subpage case, one full page can contain several tree blocks.
>
> - We use page::private to point to extent buffer
> This means we currently can only support one-page-to-one-extent-buffer
> mapping.
> For subpage size support, we need one-page-to-multiple-extent-buffer
> mapping.
>
>
> == Solutions ==
>
> So here for the metadata part, we use the following methods to
> workaround the problem:
This is pretty different from what Chanda submitted several years ago.
Chanda chose to base that work on Josef's attempt to kill btree_inode
and use kmalloc'ed memory for tree blocks.
That idea is in fact pretty awesome; it solves a lot of problems and
makes btree read/write way easier.
The problem is that killing btree_inode is a big behavior change, which
brings a lot of uncertainty to subsequent development.
This patchset instead builds on the existing btree_inode mechanism to
make it easier to merge.
Personally, I still like the idea of killing btree_inode.
If we could agree on the direction to take, it would be much better for
the future.
Thanks,
Qu
>
> - Introduce subpage_eb_mapping structure to do bitmap
> Now for subpage, page::private points to a subpage_eb_mapping
> structure, which has a bitmap to mapping one page to multiple extent
> buffers.
>
> - Still do full page read for metadata
> This means, at read time, we're not reading just one extent buffer,
> but possibly many more.
> In that case, we first do tree block verification for the tree blocks
> triggering the read, and mark the page uptodate.
>
> For newly incoming tree block read, they will check if the tree block
> is verified. If not verified, even if the page is uptodate, we still
> need to check the extent buffer.
>
> By this all subpage extent buffers are verified properly.
>
> For data part, it's pretty simple, all existing infrastructure can be
> easily converted to support subpage read, without any subpage specific
> handing yet.
>
> == Patchset structure ==
>
> The structure of the patchset:
> Patch 01~11: Preparation patches for metadata subpage read support.
> These patches can be merged without problem, and work for
> both regular and subpage case.
> Patch 12~14: Patches for data subpage read support.
> These patches works for both cases.
>
> That means, patch 01~14 can be applied to current kernel, and shouldn't
> affect any existing behavior.
>
> Patch 15~17: Subpage metadata read specific patches.
> These patches introduces the main part of the subpage
> metadata read support.
>
> The number of patches is the main reason I'm submitting them to the mail
> list. As there are too many preparation patches already.
>
> Qu Wenruo (17):
> btrfs: extent-io-tests: remove invalid tests
> btrfs: calculate inline extent buffer page size based on page size
> btrfs: remove the open-code to read disk-key
> btrfs: make btrfs_fs_info::buffer_radix to take sector size devided
> values
> btrfs: don't allow tree block to cross page boundary for subpage
> support
> btrfs: handle sectorsize < PAGE_SIZE case for extent buffer accessors
> btrfs: make csum_tree_block() handle sectorsize smaller than page size
> btrfs: refactor how we extract extent buffer from page for
> alloc_extent_buffer()
> btrfs: refactor btrfs_release_extent_buffer_pages()
> btrfs: add assert_spin_locked() for attach_extent_buffer_page()
> btrfs: extract the extent buffer verification from
> btree_readpage_end_io_hook()
> btrfs: remove the unnecessary parameter @start and @len for
> check_data_csum()
> btrfs: extent_io: only require sector size alignment for page read
> btrfs: make btrfs_readpage_end_io_hook() follow sector size
> btrfs: introduce subpage_eb_mapping for extent buffers
> btrfs: handle extent buffer verification proper for subpage size
> btrfs: allow RO mount of 4K sector size fs on 64K page system
>
> fs/btrfs/ctree.c | 13 +-
> fs/btrfs/ctree.h | 38 ++-
> fs/btrfs/disk-io.c | 111 ++++---
> fs/btrfs/disk-io.h | 1 +
> fs/btrfs/extent_io.c | 538 +++++++++++++++++++++++++------
> fs/btrfs/extent_io.h | 19 +-
> fs/btrfs/inode.c | 40 ++-
> fs/btrfs/struct-funcs.c | 18 +-
> fs/btrfs/super.c | 7 +
> fs/btrfs/tests/extent-io-tests.c | 26 +-
> 10 files changed, 633 insertions(+), 178 deletions(-)
>
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH 15/17] btrfs: introduce subpage_eb_mapping for extent buffers
2020-09-08 7:52 ` [PATCH 15/17] btrfs: introduce subpage_eb_mapping for extent buffers Qu Wenruo
@ 2020-09-08 10:22 ` kernel test robot
2020-09-08 12:11 ` kernel test robot
2020-09-08 14:24 ` Dan Carpenter
2 siblings, 0 replies; 49+ messages in thread
From: kernel test robot @ 2020-09-08 10:22 UTC (permalink / raw)
To: Qu Wenruo, linux-btrfs; +Cc: kbuild-all
[-- Attachment #1: Type: text/plain, Size: 3111 bytes --]
Hi Qu,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on v5.9-rc4]
[also build test WARNING on next-20200903]
[cannot apply to kdave/for-next btrfs/next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Qu-Wenruo/btrfs-add-read-only-support-for-subpage-sector-size/20200908-155601
base: f4d51dffc6c01a9e94650d95ce0104964f8ae822
config: xtensa-allyesconfig (attached as .config)
compiler: xtensa-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=xtensa
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All warnings (new ones prefixed by >>):
>> fs/btrfs/extent_io.c:6309:5: warning: no previous prototype for 'try_release_subpage_ebs' [-Wmissing-prototypes]
6309 | int try_release_subpage_ebs(struct page *page)
| ^~~~~~~~~~~~~~~~~~~~~~~
# https://github.com/0day-ci/linux/commit/3ef1cb4eb96ce4dce4dc94e3f06c4dd41879e977
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Qu-Wenruo/btrfs-add-read-only-support-for-subpage-sector-size/20200908-155601
git checkout 3ef1cb4eb96ce4dce4dc94e3f06c4dd41879e977
vim +/try_release_subpage_ebs +6309 fs/btrfs/extent_io.c
6308
> 6309 int try_release_subpage_ebs(struct page *page)
6310 {
6311 struct subpage_eb_mapping *mapping;
6312 int i;
6313
6314 assert_spin_locked(&page->mapping->private_lock);
6315 if (!PagePrivate(page))
6316 return 1;
6317
6318 mapping = (struct subpage_eb_mapping *)page->private;
6319 for (i = 0; i < SUBPAGE_NR_EXTENT_BUFFERS && PagePrivate(page); i++) {
6320 struct btrfs_fs_info *fs_info = page_to_fs_info(page);
6321 struct extent_buffer *eb;
6322 int ret;
6323
6324 if (!test_bit(i, &mapping->bitmap))
6325 continue;
6326
6327 eb = mapping->buffers[i];
6328 spin_unlock(&page->mapping->private_lock);
6329 spin_lock(&eb->refs_lock);
6330 ret = release_extent_buffer(eb);
6331 spin_lock(&page->mapping->private_lock);
6332
6333 /*
6334 * Extent buffer can't be freed yet, must jump to next slot
6335 * and avoid calling release_extent_buffer().
6336 */
6337 if (!ret)
6338 i += (fs_info->nodesize / fs_info->sectorsize - 1);
6339 }
6340 /*
6341 * detach_subpage_mapping() from release_extent_buffer() has detached
6342 * all ebs from this page. All related ebs are released.
6343 */
6344 if (!PagePrivate(page))
6345 return 1;
6346 return 0;
6347 }
6348
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 65078 bytes --]
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH 15/17] btrfs: introduce subpage_eb_mapping for extent buffers
2020-09-08 7:52 ` [PATCH 15/17] btrfs: introduce subpage_eb_mapping for extent buffers Qu Wenruo
2020-09-08 10:22 ` kernel test robot
@ 2020-09-08 12:11 ` kernel test robot
2020-09-08 14:24 ` Dan Carpenter
2 siblings, 0 replies; 49+ messages in thread
From: kernel test robot @ 2020-09-08 12:11 UTC (permalink / raw)
To: kbuild-all
[-- Attachment #1: Type: text/plain, Size: 7504 bytes --]
Hi Qu,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on v5.9-rc4]
[also build test WARNING on next-20200903]
[cannot apply to kdave/for-next btrfs/next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Qu-Wenruo/btrfs-add-read-only-support-for-subpage-sector-size/20200908-155601
base: f4d51dffc6c01a9e94650d95ce0104964f8ae822
config: x86_64-randconfig-a006-20200908 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 5f5a0bb0872a9673bad08b38bc0b14c42263902a)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install x86_64 cross compiling tool for clang build
# apt-get install binutils-x86-64-linux-gnu
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All warnings (new ones prefixed by >>):
>> fs/btrfs/extent_io.c:5427:7: warning: variable 'num_pages' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
if (!mapping) {
^~~~~~~~
fs/btrfs/extent_io.c:5516:18: note: uninitialized use occurs here
for (i = 0; i < num_pages; i++) {
^~~~~~~~~
fs/btrfs/extent_io.c:5427:3: note: remove the 'if' if its condition is always false
if (!mapping) {
^~~~~~~~~~~~~~~
fs/btrfs/extent_io.c:5393:15: note: initialize the variable 'num_pages' to silence this warning
int num_pages;
^
= 0
>> fs/btrfs/extent_io.c:6309:5: warning: no previous prototype for function 'try_release_subpage_ebs' [-Wmissing-prototypes]
int try_release_subpage_ebs(struct page *page)
^
fs/btrfs/extent_io.c:6309:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
int try_release_subpage_ebs(struct page *page)
^
static
2 warnings generated.
# https://github.com/0day-ci/linux/commit/3ef1cb4eb96ce4dce4dc94e3f06c4dd41879e977
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Qu-Wenruo/btrfs-add-read-only-support-for-subpage-sector-size/20200908-155601
git checkout 3ef1cb4eb96ce4dce4dc94e3f06c4dd41879e977
vim +5427 fs/btrfs/extent_io.c
5388
5389 struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
5390 u64 start)
5391 {
5392 unsigned long len = fs_info->nodesize;
5393 int num_pages;
5394 int i;
5395 unsigned long index = start >> PAGE_SHIFT;
5396 struct extent_buffer *eb;
5397 struct extent_buffer *exists = NULL;
5398 struct page *p;
5399 struct address_space *mapping = fs_info->btree_inode->i_mapping;
5400 struct subpage_eb_mapping *subpage_mapping = NULL;
5401 bool subpage = (fs_info->sectorsize < PAGE_SIZE);
5402 int uptodate = 1;
5403 int ret;
5404
5405 if (!IS_ALIGNED(start, fs_info->sectorsize)) {
5406 btrfs_err(fs_info, "bad tree block start %llu", start);
5407 return ERR_PTR(-EINVAL);
5408 }
5409 if (fs_info->sectorsize < PAGE_SIZE && round_down(start, PAGE_SIZE) !=
5410 round_down(start + len - 1, PAGE_SIZE)) {
5411 btrfs_err(fs_info,
5412 "tree block crosses page boundary, start %llu nodesize %lu",
5413 start, len);
5414 return ERR_PTR(-EINVAL);
5415 }
5416
5417 eb = find_extent_buffer(fs_info, start);
5418 if (eb)
5419 return eb;
5420
5421 eb = __alloc_extent_buffer(fs_info, start, len);
5422 if (!eb)
5423 return ERR_PTR(-ENOMEM);
5424
5425 if (subpage) {
5426 subpage_mapping = kmalloc(sizeof(*subpage_mapping), GFP_NOFS);
> 5427 if (!mapping) {
5428 exists = ERR_PTR(-ENOMEM);
5429 goto free_eb;
5430 }
5431 }
5432
5433 num_pages = num_extent_pages(eb);
5434 for (i = 0; i < num_pages; i++, index++) {
5435 p = find_or_create_page(mapping, index, GFP_NOFS|__GFP_NOFAIL);
5436 if (!p) {
5437 exists = ERR_PTR(-ENOMEM);
5438 goto free_eb;
5439 }
5440
5441 spin_lock(&mapping->private_lock);
5442 if (PagePrivate(p)) {
5443 exists = grab_extent_buffer_from_page(p, start);
5444 if (exists && atomic_inc_not_zero(&exists->refs)) {
5445 spin_unlock(&mapping->private_lock);
5446 unlock_page(p);
5447 put_page(p);
5448 mark_extent_buffer_accessed(exists, p);
5449 goto free_eb;
5450 }
5451 exists = NULL;
5452
5453 if (!subpage) {
5454 /*
5455 * Do this so attach doesn't complain and we
5456 * need to drop the ref the old guy had.
5457 */
5458 ClearPagePrivate(p);
5459 WARN_ON(PageDirty(p));
5460 put_page(p);
5461 }
5462 }
5463 attach_extent_buffer_page(eb, p, subpage_mapping);
5464 spin_unlock(&mapping->private_lock);
5465 subpage_mapping = NULL;
5466 WARN_ON(PageDirty(p));
5467 eb->pages[i] = p;
5468 if (!PageUptodate(p))
5469 uptodate = 0;
5470
5471 /*
5472 * We can't unlock the pages just yet since the extent buffer
5473 * hasn't been properly inserted in the radix tree, this
5474 * opens a race with btree_releasepage which can free a page
5475 * while we are still filling in all pages for the buffer and
5476 * we could crash.
5477 */
5478 }
5479 if (uptodate)
5480 set_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
5481 again:
5482 ret = radix_tree_preload(GFP_NOFS);
5483 if (ret) {
5484 exists = ERR_PTR(ret);
5485 goto free_eb;
5486 }
5487
5488 spin_lock(&fs_info->buffer_lock);
5489 ret = radix_tree_insert(&fs_info->buffer_radix,
5490 start / fs_info->sectorsize, eb);
5491 spin_unlock(&fs_info->buffer_lock);
5492 radix_tree_preload_end();
5493 if (ret == -EEXIST) {
5494 exists = find_extent_buffer(fs_info, start);
5495 if (exists)
5496 goto free_eb;
5497 else
5498 goto again;
5499 }
5500 /* add one reference for the tree */
5501 check_buffer_tree_ref(eb);
5502 set_bit(EXTENT_BUFFER_IN_TREE, &eb->bflags);
5503
5504 /*
5505 * Now it's safe to unlock the pages because any calls to
5506 * btree_releasepage will correctly detect that a page belongs to a
5507 * live buffer and won't free them prematurely.
5508 */
5509 for (i = 0; i < num_pages; i++)
5510 unlock_page(eb->pages[i]);
5511 return eb;
5512
5513 free_eb:
5514 WARN_ON(!atomic_dec_and_test(&eb->refs));
5515 kfree(subpage_mapping);
5516 for (i = 0; i < num_pages; i++) {
5517 if (eb->pages[i])
5518 unlock_page(eb->pages[i]);
5519 }
5520
5521 btrfs_release_extent_buffer(eb);
5522 return exists;
5523 }
5524
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 36417 bytes --]
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH 15/17] btrfs: introduce subpage_eb_mapping for extent buffers
2020-09-08 7:52 ` [PATCH 15/17] btrfs: introduce subpage_eb_mapping for extent buffers Qu Wenruo
2020-09-08 10:22 ` kernel test robot
@ 2020-09-08 14:24 ` Dan Carpenter
2020-09-08 14:24 ` Dan Carpenter
2 siblings, 0 replies; 49+ messages in thread
From: Dan Carpenter @ 2020-09-08 14:24 UTC (permalink / raw)
To: kbuild, Qu Wenruo, linux-btrfs; +Cc: lkp, kbuild-all
[-- Attachment #1: Type: text/plain, Size: 12257 bytes --]
Hi Qu,
url: https://github.com/0day-ci/linux/commits/Qu-Wenruo/btrfs-add-read-only-support-for-subpage-sector-size/20200908-155601
base: f4d51dffc6c01a9e94650d95ce0104964f8ae822
config: x86_64-randconfig-m001-20200907 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
If you fix the issue, kindly add the following tags as appropriate:
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
New smatch warnings:
fs/btrfs/extent_io.c:5516 alloc_extent_buffer() error: uninitialized symbol 'num_pages'.
Old smatch warnings:
fs/btrfs/extent_io.c:6397 try_release_extent_buffer() warn: inconsistent returns 'eb->refs_lock'.
# https://github.com/0day-ci/linux/commit/3ef1cb4eb96ce4dce4dc94e3f06c4dd41879e977
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Qu-Wenruo/btrfs-add-read-only-support-for-subpage-sector-size/20200908-155601
git checkout 3ef1cb4eb96ce4dce4dc94e3f06c4dd41879e977
vim +/subpage_mapping +5511 fs/btrfs/extent_io.c
f28491e0a6c46d Josef Bacik 2013-12-16 5389 struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
ce3e69847e3ec7 David Sterba 2014-06-15 5390 u64 start)
d1310b2e0cd98e Chris Mason 2008-01-24 5391 {
da17066c40472c Jeff Mahoney 2016-06-15 5392 unsigned long len = fs_info->nodesize;
cc5e31a4775d0d David Sterba 2018-03-01 5393 int num_pages;
cc5e31a4775d0d David Sterba 2018-03-01 5394 int i;
09cbfeaf1a5a67 Kirill A. Shutemov 2016-04-01 5395 unsigned long index = start >> PAGE_SHIFT;
d1310b2e0cd98e Chris Mason 2008-01-24 5396 struct extent_buffer *eb;
6af118ce51b52c Chris Mason 2008-07-22 5397 struct extent_buffer *exists = NULL;
d1310b2e0cd98e Chris Mason 2008-01-24 5398 struct page *p;
f28491e0a6c46d Josef Bacik 2013-12-16 5399 struct address_space *mapping = fs_info->btree_inode->i_mapping;
3ef1cb4eb96ce4 Qu Wenruo 2020-09-08 5400 struct subpage_eb_mapping *subpage_mapping = NULL;
3ef1cb4eb96ce4 Qu Wenruo 2020-09-08 5401 bool subpage = (fs_info->sectorsize < PAGE_SIZE);
d1310b2e0cd98e Chris Mason 2008-01-24 5402 int uptodate = 1;
19fe0a8b787d7c Miao Xie 2010-10-26 5403 int ret;
d1310b2e0cd98e Chris Mason 2008-01-24 5404
da17066c40472c Jeff Mahoney 2016-06-15 5405 if (!IS_ALIGNED(start, fs_info->sectorsize)) {
c871b0f2fd27e7 Liu Bo 2016-06-06 5406 btrfs_err(fs_info, "bad tree block start %llu", start);
c871b0f2fd27e7 Liu Bo 2016-06-06 5407 return ERR_PTR(-EINVAL);
c871b0f2fd27e7 Liu Bo 2016-06-06 5408 }
039b297b76d816 Qu Wenruo 2020-09-08 5409 if (fs_info->sectorsize < PAGE_SIZE && round_down(start, PAGE_SIZE) !=
039b297b76d816 Qu Wenruo 2020-09-08 5410 round_down(start + len - 1, PAGE_SIZE)) {
039b297b76d816 Qu Wenruo 2020-09-08 5411 btrfs_err(fs_info,
039b297b76d816 Qu Wenruo 2020-09-08 5412 "tree block crosses page boundary, start %llu nodesize %lu",
039b297b76d816 Qu Wenruo 2020-09-08 5413 start, len);
039b297b76d816 Qu Wenruo 2020-09-08 5414 return ERR_PTR(-EINVAL);
039b297b76d816 Qu Wenruo 2020-09-08 5415 }
c871b0f2fd27e7 Liu Bo 2016-06-06 5416
f28491e0a6c46d Josef Bacik 2013-12-16 5417 eb = find_extent_buffer(fs_info, start);
452c75c3d21870 Chandra Seetharaman 2013-10-07 5418 if (eb)
6af118ce51b52c Chris Mason 2008-07-22 5419 return eb;
6af118ce51b52c Chris Mason 2008-07-22 5420
23d79d81b13431 David Sterba 2014-06-15 5421 eb = __alloc_extent_buffer(fs_info, start, len);
2b114d1d33551a Peter 2008-04-01 5422 if (!eb)
c871b0f2fd27e7 Liu Bo 2016-06-06 5423 return ERR_PTR(-ENOMEM);
d1310b2e0cd98e Chris Mason 2008-01-24 5424
3ef1cb4eb96ce4 Qu Wenruo 2020-09-08 5425 if (subpage) {
3ef1cb4eb96ce4 Qu Wenruo 2020-09-08 5426 subpage_mapping = kmalloc(sizeof(*subpage_mapping), GFP_NOFS);
3ef1cb4eb96ce4 Qu Wenruo 2020-09-08 5427 if (!mapping) {
This was probably supposed to be "if (!subpage_mapping)" instead of
"if (!mapping)".
3ef1cb4eb96ce4 Qu Wenruo 2020-09-08 5428 exists = ERR_PTR(-ENOMEM);
3ef1cb4eb96ce4 Qu Wenruo 2020-09-08 5429 goto free_eb;
The "num_pages" variable is uninitialized on this goto path.
3ef1cb4eb96ce4 Qu Wenruo 2020-09-08 5430 }
3ef1cb4eb96ce4 Qu Wenruo 2020-09-08 5431 }
3ef1cb4eb96ce4 Qu Wenruo 2020-09-08 5432
65ad010488a5cc David Sterba 2018-06-29 5433 num_pages = num_extent_pages(eb);
727011e07cbdf8 Chris Mason 2010-08-06 5434 for (i = 0; i < num_pages; i++, index++) {
d1b5c5671d010d Michal Hocko 2015-08-19 5435 p = find_or_create_page(mapping, index, GFP_NOFS|__GFP_NOFAIL);
c871b0f2fd27e7 Liu Bo 2016-06-06 5436 if (!p) {
c871b0f2fd27e7 Liu Bo 2016-06-06 5437 exists = ERR_PTR(-ENOMEM);
6af118ce51b52c Chris Mason 2008-07-22 5438 goto free_eb;
c871b0f2fd27e7 Liu Bo 2016-06-06 5439 }
4f2de97acee653 Josef Bacik 2012-03-07 5440
4f2de97acee653 Josef Bacik 2012-03-07 5441 spin_lock(&mapping->private_lock);
4f2de97acee653 Josef Bacik 2012-03-07 5442 if (PagePrivate(p)) {
3ef1cb4eb96ce4 Qu Wenruo 2020-09-08 5443 exists = grab_extent_buffer_from_page(p, start);
b76bd038ff9290 Qu Wenruo 2020-09-08 5444 if (exists && atomic_inc_not_zero(&exists->refs)) {
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This increment doesn't pair perfectly with the decrement in the error
handling. In other words, we sometimes decrement it when it was never
incremented. This presumably will lead to a use after free.
4f2de97acee653 Josef Bacik 2012-03-07 5445 spin_unlock(&mapping->private_lock);
4f2de97acee653 Josef Bacik 2012-03-07 5446 unlock_page(p);
09cbfeaf1a5a67 Kirill A. Shutemov 2016-04-01 5447 put_page(p);
2457aec63745e2 Mel Gorman 2014-06-04 5448 mark_extent_buffer_accessed(exists, p);
4f2de97acee653 Josef Bacik 2012-03-07 5449 goto free_eb;
4f2de97acee653 Josef Bacik 2012-03-07 5450 }
5ca64f45e92dc5 Omar Sandoval 2015-02-24 5451 exists = NULL;
4f2de97acee653 Josef Bacik 2012-03-07 5452
3ef1cb4eb96ce4 Qu Wenruo 2020-09-08 5453 if (!subpage) {
4f2de97acee653 Josef Bacik 2012-03-07 5454 /*
3ef1cb4eb96ce4 Qu Wenruo 2020-09-08 5455 * Do this so attach doesn't complain and we
3ef1cb4eb96ce4 Qu Wenruo 2020-09-08 5456 * need to drop the ref the old guy had.
4f2de97acee653 Josef Bacik 2012-03-07 5457 */
4f2de97acee653 Josef Bacik 2012-03-07 5458 ClearPagePrivate(p);
0b32f4bbb423f0 Josef Bacik 2012-03-13 5459 WARN_ON(PageDirty(p));
09cbfeaf1a5a67 Kirill A. Shutemov 2016-04-01 5460 put_page(p);
d1310b2e0cd98e Chris Mason 2008-01-24 5461 }
3ef1cb4eb96ce4 Qu Wenruo 2020-09-08 5462 }
3ef1cb4eb96ce4 Qu Wenruo 2020-09-08 5463 attach_extent_buffer_page(eb, p, subpage_mapping);
4f2de97acee653 Josef Bacik 2012-03-07 5464 spin_unlock(&mapping->private_lock);
3ef1cb4eb96ce4 Qu Wenruo 2020-09-08 5465 subpage_mapping = NULL;
0b32f4bbb423f0 Josef Bacik 2012-03-13 5466 WARN_ON(PageDirty(p));
727011e07cbdf8 Chris Mason 2010-08-06 5467 eb->pages[i] = p;
d1310b2e0cd98e Chris Mason 2008-01-24 5468 if (!PageUptodate(p))
d1310b2e0cd98e Chris Mason 2008-01-24 5469 uptodate = 0;
eb14ab8ed24a04 Chris Mason 2011-02-10 5470
eb14ab8ed24a04 Chris Mason 2011-02-10 5471 /*
b16d011e79fb35 Nikolay Borisov 2018-07-04 5472 * We can't unlock the pages just yet since the extent buffer
b16d011e79fb35 Nikolay Borisov 2018-07-04 5473 * hasn't been properly inserted in the radix tree, this
b16d011e79fb35 Nikolay Borisov 2018-07-04 5474 * opens a race with btree_releasepage which can free a page
b16d011e79fb35 Nikolay Borisov 2018-07-04 5475 * while we are still filling in all pages for the buffer and
b16d011e79fb35 Nikolay Borisov 2018-07-04 5476 * we could crash.
eb14ab8ed24a04 Chris Mason 2011-02-10 5477 */
d1310b2e0cd98e Chris Mason 2008-01-24 5478 }
d1310b2e0cd98e Chris Mason 2008-01-24 5479 if (uptodate)
b4ce94de9b4d64 Chris Mason 2009-02-04 5480 set_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
115391d2315239 Josef Bacik 2012-03-09 5481 again:
e1860a7724828a David Sterba 2016-05-09 5482 ret = radix_tree_preload(GFP_NOFS);
c871b0f2fd27e7 Liu Bo 2016-06-06 5483 if (ret) {
c871b0f2fd27e7 Liu Bo 2016-06-06 5484 exists = ERR_PTR(ret);
19fe0a8b787d7c Miao Xie 2010-10-26 5485 goto free_eb;
c871b0f2fd27e7 Liu Bo 2016-06-06 5486 }
19fe0a8b787d7c Miao Xie 2010-10-26 5487
f28491e0a6c46d Josef Bacik 2013-12-16 5488 spin_lock(&fs_info->buffer_lock);
f28491e0a6c46d Josef Bacik 2013-12-16 5489 ret = radix_tree_insert(&fs_info->buffer_radix,
d31cf6896006df Qu Wenruo 2020-09-08 5490 start / fs_info->sectorsize, eb);
f28491e0a6c46d Josef Bacik 2013-12-16 5491 spin_unlock(&fs_info->buffer_lock);
19fe0a8b787d7c Miao Xie 2010-10-26 5492 radix_tree_preload_end();
452c75c3d21870 Chandra Seetharaman 2013-10-07 5493 if (ret == -EEXIST) {
f28491e0a6c46d Josef Bacik 2013-12-16 5494 exists = find_extent_buffer(fs_info, start);
452c75c3d21870 Chandra Seetharaman 2013-10-07 5495 if (exists)
6af118ce51b52c Chris Mason 2008-07-22 5496 goto free_eb;
452c75c3d21870 Chandra Seetharaman 2013-10-07 5497 else
452c75c3d21870 Chandra Seetharaman 2013-10-07 5498 goto again;
6af118ce51b52c Chris Mason 2008-07-22 5499 }
6af118ce51b52c Chris Mason 2008-07-22 5500 /* add one reference for the tree */
0b32f4bbb423f0 Josef Bacik 2012-03-13 5501 check_buffer_tree_ref(eb);
34b41acec1ccc0 Josef Bacik 2013-12-13 5502 set_bit(EXTENT_BUFFER_IN_TREE, &eb->bflags);
eb14ab8ed24a04 Chris Mason 2011-02-10 5503
eb14ab8ed24a04 Chris Mason 2011-02-10 5504 /*
b16d011e79fb35 Nikolay Borisov 2018-07-04 5505 * Now it's safe to unlock the pages because any calls to
b16d011e79fb35 Nikolay Borisov 2018-07-04 5506 * btree_releasepage will correctly detect that a page belongs to a
b16d011e79fb35 Nikolay Borisov 2018-07-04 5507 * live buffer and won't free them prematurely.
eb14ab8ed24a04 Chris Mason 2011-02-10 5508 */
28187ae569e8a6 Nikolay Borisov 2018-07-04 5509 for (i = 0; i < num_pages; i++)
28187ae569e8a6 Nikolay Borisov 2018-07-04 5510 unlock_page(eb->pages[i]);
d1310b2e0cd98e Chris Mason 2008-01-24 @5511 return eb;
d1310b2e0cd98e Chris Mason 2008-01-24 5512
6af118ce51b52c Chris Mason 2008-07-22 5513 free_eb:
5ca64f45e92dc5 Omar Sandoval 2015-02-24 5514 WARN_ON(!atomic_dec_and_test(&eb->refs));
3ef1cb4eb96ce4 Qu Wenruo 2020-09-08 5515 kfree(subpage_mapping);
727011e07cbdf8 Chris Mason 2010-08-06 @5516 for (i = 0; i < num_pages; i++) {
727011e07cbdf8 Chris Mason 2010-08-06 5517 if (eb->pages[i])
727011e07cbdf8 Chris Mason 2010-08-06 5518 unlock_page(eb->pages[i]);
727011e07cbdf8 Chris Mason 2010-08-06 5519 }
eb14ab8ed24a04 Chris Mason 2011-02-10 5520
897ca6e9b4fef8 Miao Xie 2010-10-26 5521 btrfs_release_extent_buffer(eb);
6af118ce51b52c Chris Mason 2008-07-22 5522 return exists;
d1310b2e0cd98e Chris Mason 2008-01-24 5523 }
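Taken together, the reports above amount to two ordering/naming fixes on the early-error path: test the pointer that was just allocated, and make sure the cleanup loop bound is initialized before the first goto that can reach it. A minimal stand-alone sketch of the corrected flow (hypothetical stub types and a demo allocator, not the real btrfs structures):

```c
#include <stdlib.h>

/* Hypothetical stand-in for the extent buffer; not the real btrfs type. */
struct demo_eb {
	int refs;
	void *pages[4];
};

/*
 * Sketch of the corrected early-error path:
 *  - num_pages starts at 0, so the cleanup loop under free_eb is a
 *    harmless no-op when we bail out before any page is attached;
 *  - the NULL check tests the pointer we just allocated
 *    (subpage_mapping), not an unrelated one.
 */
static int demo_alloc(int subpage, int force_enomem, struct demo_eb **out)
{
	struct demo_eb *eb;
	void *subpage_mapping = NULL;
	int num_pages = 0;	/* initialized before any goto free_eb */
	int ret = 0;
	int i;

	eb = calloc(1, sizeof(*eb));
	if (!eb)
		return -12;	/* -ENOMEM */
	eb->refs = 1;

	if (subpage) {
		subpage_mapping = force_enomem ? NULL : malloc(16);
		if (!subpage_mapping) {	/* check the freshly allocated pointer */
			ret = -12;
			goto free_eb;
		}
	}

	num_pages = 2;	/* pretend pages get attached from here on */
	for (i = 0; i < num_pages; i++)
		eb->pages[i] = malloc(8);

	/* In the real code the mapping is consumed by attach; here just drop it. */
	free(subpage_mapping);
	*out = eb;
	return 0;

free_eb:
	free(subpage_mapping);
	for (i = 0; i < num_pages; i++)	/* 0 iterations on the early exit */
		free(eb->pages[i]);
	free(eb);
	*out = NULL;
	return ret;
}
```

The refcount pairing issue is separate: the `atomic_inc_not_zero()` taken-branch is the only path that actually acquires a reference, so the unconditional `atomic_dec_and_test()` under `free_eb` can drop a reference this function never took.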
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 38592 bytes --]
897ca6e9b4fef8 Miao Xie 2010-10-26 5521 btrfs_release_extent_buffer(eb);
6af118ce51b52c Chris Mason 2008-07-22 5522 return exists;
d1310b2e0cd98e Chris Mason 2008-01-24 5523 }
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org
[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 38592 bytes --]
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [PATCH 04/17] btrfs: make btrfs_fs_info::buffer_radix to take sector size devided values
2020-09-08 7:52 ` [PATCH 04/17] btrfs: make btrfs_fs_info::buffer_radix to take sector size devided values Qu Wenruo
@ 2020-09-08 18:03 ` kernel test robot
2020-09-11 10:11 ` Nikolay Borisov
1 sibling, 0 replies; 49+ messages in thread
From: kernel test robot @ 2020-09-08 18:03 UTC (permalink / raw)
To: kbuild-all
[-- Attachment #1: Type: text/plain, Size: 3111 bytes --]
Hi Qu,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on v5.9-rc4]
[also build test ERROR on next-20200908]
[cannot apply to kdave/for-next btrfs/next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Qu-Wenruo/btrfs-add-read-only-support-for-subpage-sector-size/20200908-155601
base: f4d51dffc6c01a9e94650d95ce0104964f8ae822
config: i386-randconfig-a016-20200908 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
reproduce (this is a W=1 build):
# https://github.com/0day-ci/linux/commit/d31cf6896006df8bbb0adfd39919bfb7926ded08
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Qu-Wenruo/btrfs-add-read-only-support-for-subpage-sector-size/20200908-155601
git checkout d31cf6896006df8bbb0adfd39919bfb7926ded08
# save the attached .config to linux build tree
make W=1 ARCH=i386
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
ld: fs/btrfs/extent_io.o: in function `release_extent_buffer':
>> fs/btrfs/extent_io.c:5361: undefined reference to `__udivdi3'
ld: fs/btrfs/extent_io.o: in function `find_extent_buffer':
fs/btrfs/extent_io.c:5145: undefined reference to `__udivdi3'
ld: fs/btrfs/extent_io.o: in function `alloc_extent_buffer':
fs/btrfs/extent_io.c:5305: undefined reference to `__udivdi3'
vim +5361 fs/btrfs/extent_io.c
5346
5347 static int release_extent_buffer(struct extent_buffer *eb)
5348 __releases(&eb->refs_lock)
5349 {
5350 lockdep_assert_held(&eb->refs_lock);
5351
5352 WARN_ON(atomic_read(&eb->refs) == 0);
5353 if (atomic_dec_and_test(&eb->refs)) {
5354 if (test_and_clear_bit(EXTENT_BUFFER_IN_TREE, &eb->bflags)) {
5355 struct btrfs_fs_info *fs_info = eb->fs_info;
5356
5357 spin_unlock(&eb->refs_lock);
5358
5359 spin_lock(&fs_info->buffer_lock);
5360 radix_tree_delete(&fs_info->buffer_radix,
> 5361 eb->start / fs_info->sectorsize);
5362 spin_unlock(&fs_info->buffer_lock);
5363 } else {
5364 spin_unlock(&eb->refs_lock);
5365 }
5366
5367 btrfs_leak_debug_del(&eb->fs_info->eb_leak_lock, &eb->leak_list);
5368 /* Should be safe to release our pages at this point */
5369 btrfs_release_extent_buffer_pages(eb);
5370 #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
5371 if (unlikely(test_bit(EXTENT_BUFFER_UNMAPPED, &eb->bflags))) {
5372 __free_extent_buffer(eb);
5373 return 1;
5374 }
5375 #endif
5376 call_rcu(&eb->rcu_head, btrfs_release_extent_buffer_rcu);
5377 return 1;
5378 }
5379 spin_unlock(&eb->refs_lock);
5380
5381 return 0;
5382 }
5383
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org
[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 32426 bytes --]
* Re: [PATCH 01/17] btrfs: extent-io-tests: remove invalid tests
2020-09-08 7:52 ` [PATCH 01/17] btrfs: extent-io-tests: remove invalid tests Qu Wenruo
@ 2020-09-09 12:26 ` Nikolay Borisov
2020-09-09 13:06 ` Qu Wenruo
0 siblings, 1 reply; 49+ messages in thread
From: Nikolay Borisov @ 2020-09-09 12:26 UTC (permalink / raw)
To: Qu Wenruo, linux-btrfs
On 8.09.2020 at 10:52, Qu Wenruo wrote:
> In extent-io-test, there are two invalid tests:
> - Invalid nodesize for test_eb_bitmaps()
> Instead of the sectorsize and nodesize combination passed in, we're
> always using hand-crafted nodesize.
> Although it has some extra check for 64K page size, we can still hit
> a case where PAGE_SIZE == 32K, then we got 128K nodesize which is
> larger than max valid node size.
>
> Thankfully most machines are either 4K or 64K page size, thus we
> haven't yet hit such case.
>
> - Invalid extent buffer bytenr
> For 64K page size, the only combination we're going to test is
> sectorsize = nodesize = 64K.
> In that case, we'll try to create an extent buffer with 32K bytenr,
> which is not aligned to sectorsize thus invalid.
>
> This patch will fix both problems by:
> - Honor the sectorsize/nodesize combination
> Now we won't bother to hand-craft a strange length and use it as
> nodesize.
>
> - Use sectorsize as the 2nd run extent buffer start
> This would test the case where extent buffer is aligned to sectorsize
> but not always aligned to nodesize.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
Shouldn't you also modify btrfs_run_sanity_tests to extend
test_sectorsize such that it contains a subpage blocksize? As it stands
now test_eb_bitmaps will be called with sectorsize always being
PAGE_SIZE and nodesize being a multiple of the PAGE_SIZE i.e for a 4k
page that would be 4/8/16/32/64 k nodes
<snip>
* Re: [PATCH 01/17] btrfs: extent-io-tests: remove invalid tests
2020-09-09 12:26 ` Nikolay Borisov
@ 2020-09-09 13:06 ` Qu Wenruo
0 siblings, 0 replies; 49+ messages in thread
From: Qu Wenruo @ 2020-09-09 13:06 UTC (permalink / raw)
To: Nikolay Borisov, linux-btrfs
On 2020/9/9 8:26 PM, Nikolay Borisov wrote:
>
>
> On 8.09.2020 at 10:52, Qu Wenruo wrote:
>> In extent-io-test, there are two invalid tests:
>> - Invalid nodesize for test_eb_bitmaps()
>> Instead of the sectorsize and nodesize combination passed in, we're
>> always using hand-crafted nodesize.
>> Although it has some extra check for 64K page size, we can still hit
>> a case where PAGE_SIZE == 32K, then we got 128K nodesize which is
>> larger than max valid node size.
>>
>> Thankfully most machines are either 4K or 64K page size, thus we
>> haven't yet hit such case.
>>
>> - Invalid extent buffer bytenr
>> For 64K page size, the only combination we're going to test is
>> sectorsize = nodesize = 64K.
>> In that case, we'll try to create an extent buffer with 32K bytenr,
>> which is not aligned to sectorsize thus invalid.
>>
>> This patch will fix both problems by:
>> - Honor the sectorsize/nodesize combination
>> Now we won't bother to hand-craft a strange length and use it as
>> nodesize.
>>
>> - Use sectorsize as the 2nd run extent buffer start
>> This would test the case where extent buffer is aligned to sectorsize
>> but not always aligned to nodesize.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>
> Shouldn't you also modify btrfs_run_sanity_tests to extend
> test_sectorsize such that it contains a subpage blocksize? As it stands
> now test_eb_bitmaps will be called with sectorsize always being
> PAGE_SIZE and nodesize being a multiple of the PAGE_SIZE i.e for a 4k
> page that would be 4/8/16/32/64 k nodes
Not yet. Since it's currently just RO support, I'm not confident enough
about the set_extent_bits() path.
Thanks,
Qu
>
>
> <snip>
>
* Re: [PATCH 14/17] btrfs: make btrfs_readpage_end_io_hook() follow sector size
2020-09-08 7:52 ` [PATCH 14/17] btrfs: make btrfs_readpage_end_io_hook() follow sector size Qu Wenruo
@ 2020-09-09 17:34 ` Goldwyn Rodrigues
2020-09-10 0:05 ` Qu Wenruo
0 siblings, 1 reply; 49+ messages in thread
From: Goldwyn Rodrigues @ 2020-09-09 17:34 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs
On 15:52 08/09, Qu Wenruo wrote:
> Currently btrfs_readpage_end_io_hook() just passes the whole page to
> check_data_csum(), which is fine since we only support sectorsize ==
> PAGE_SIZE.
>
> To support subpage RO support, we need to properly honor per-sector
> checksum verification, just like what we did in dio read path.
>
> This patch will do the csum verification in a for loop, starting at
> pg_off == start - page_offset(page) and advancing by sectorsize on
> each iteration.
>
> For sectorsize == PAGE_SIZE case, the pg_off will always be 0, and we
> will only finish with just one loop.
>
> For subpage, we do the proper loop.
>
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> fs/btrfs/inode.c | 15 ++++++++++++++-
> 1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 078735aa0f68..8bd14dda2067 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -2851,9 +2851,12 @@ static int btrfs_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
> u64 start, u64 end, int mirror)
> {
> size_t offset = start - page_offset(page);
> + size_t pg_off;
> struct inode *inode = page->mapping->host;
> struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
> struct btrfs_root *root = BTRFS_I(inode)->root;
> + u32 sectorsize = root->fs_info->sectorsize;
> + bool found_err = false;
>
> if (PageChecked(page)) {
> ClearPageChecked(page);
> @@ -2870,7 +2873,17 @@ static int btrfs_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
> }
>
> phy_offset >>= inode->i_sb->s_blocksize_bits;
> - return check_data_csum(inode, io_bio, phy_offset, page, offset);
> + for (pg_off = offset; pg_off < end - page_offset(page);
> + pg_off += sectorsize, phy_offset++) {
> + int ret;
> +
> + ret = check_data_csum(inode, io_bio, phy_offset, page, pg_off);
> + if (ret < 0)
> + found_err = true;
> + }
> + if (found_err)
> + return -EIO;
> + return 0;
> }
We don't need found_err here. Just return ret when you encounter the
first error.
--
Goldwyn
* Re: [PATCH 14/17] btrfs: make btrfs_readpage_end_io_hook() follow sector size
2020-09-09 17:34 ` Goldwyn Rodrigues
@ 2020-09-10 0:05 ` Qu Wenruo
2020-09-10 14:26 ` Goldwyn Rodrigues
0 siblings, 1 reply; 49+ messages in thread
From: Qu Wenruo @ 2020-09-10 0:05 UTC (permalink / raw)
To: Goldwyn Rodrigues, Qu Wenruo; +Cc: linux-btrfs
[-- Attachment #1.1: Type: text/plain, Size: 2416 bytes --]
On 2020/9/10 1:34 AM, Goldwyn Rodrigues wrote:
> On 15:52 08/09, Qu Wenruo wrote:
>> Currently btrfs_readpage_end_io_hook() just passes the whole page to
>> check_data_csum(), which is fine since we only support sectorsize ==
>> PAGE_SIZE.
>>
>> To support subpage RO support, we need to properly honor per-sector
>> checksum verification, just like what we did in dio read path.
>>
>> This patch will do the csum verification in a for loop, starting at
>> pg_off == start - page_offset(page) and advancing by sectorsize on
>> each iteration.
>>
>> For sectorsize == PAGE_SIZE case, the pg_off will always be 0, and we
>> will only finish with just one loop.
>>
>> For subpage, we do the proper loop.
>>
>> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>> fs/btrfs/inode.c | 15 ++++++++++++++-
>> 1 file changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>> index 078735aa0f68..8bd14dda2067 100644
>> --- a/fs/btrfs/inode.c
>> +++ b/fs/btrfs/inode.c
>> @@ -2851,9 +2851,12 @@ static int btrfs_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
>> u64 start, u64 end, int mirror)
>> {
>> size_t offset = start - page_offset(page);
>> + size_t pg_off;
>> struct inode *inode = page->mapping->host;
>> struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
>> struct btrfs_root *root = BTRFS_I(inode)->root;
>> + u32 sectorsize = root->fs_info->sectorsize;
>> + bool found_err = false;
>>
>> if (PageChecked(page)) {
>> ClearPageChecked(page);
>> @@ -2870,7 +2873,17 @@ static int btrfs_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
>> }
>>
>> phy_offset >>= inode->i_sb->s_blocksize_bits;
>> - return check_data_csum(inode, io_bio, phy_offset, page, offset);
>> + for (pg_off = offset; pg_off < end - page_offset(page);
>> + pg_off += sectorsize, phy_offset++) {
>> + int ret;
>> +
>> + ret = check_data_csum(inode, io_bio, phy_offset, page, pg_off);
>> + if (ret < 0)
>> + found_err = true;
>> + }
>> + if (found_err)
>> + return -EIO;
>> + return 0;
>> }
>
> We don't need found_err here. Just return ret when you encounter the
> first error.
>
But that means the whole range will be marked as an error, while some
sectors may still contain good data and pass the csum check.
Thanks,
Qu
* Re: [PATCH 14/17] btrfs: make btrfs_readpage_end_io_hook() follow sector size
2020-09-10 0:05 ` Qu Wenruo
@ 2020-09-10 14:26 ` Goldwyn Rodrigues
0 siblings, 0 replies; 49+ messages in thread
From: Goldwyn Rodrigues @ 2020-09-10 14:26 UTC (permalink / raw)
To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs
[-- Attachment #1: Type: text/plain, Size: 2985 bytes --]
On 8:05 10/09, Qu Wenruo wrote:
>
>
> On 2020/9/10 1:34 AM, Goldwyn Rodrigues wrote:
> > On 15:52 08/09, Qu Wenruo wrote:
> >> Currently btrfs_readpage_end_io_hook() just passes the whole page to
> >> check_data_csum(), which is fine since we only support sectorsize ==
> >> PAGE_SIZE.
> >>
> >> To support subpage RO support, we need to properly honor per-sector
> >> checksum verification, just like what we did in dio read path.
> >>
> >> This patch will do the csum verification in a for loop, starting at
> >> pg_off == start - page_offset(page) and advancing by sectorsize on
> >> each iteration.
> >>
> >> For sectorsize == PAGE_SIZE case, the pg_off will always be 0, and we
> >> will only finish with just one loop.
> >>
> >> For subpage, we do the proper loop.
> >>
> >> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> >> Signed-off-by: Qu Wenruo <wqu@suse.com>
> >> ---
> >> fs/btrfs/inode.c | 15 ++++++++++++++-
> >> 1 file changed, 14 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> >> index 078735aa0f68..8bd14dda2067 100644
> >> --- a/fs/btrfs/inode.c
> >> +++ b/fs/btrfs/inode.c
> >> @@ -2851,9 +2851,12 @@ static int btrfs_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
> >> u64 start, u64 end, int mirror)
> >> {
> >> size_t offset = start - page_offset(page);
> >> + size_t pg_off;
> >> struct inode *inode = page->mapping->host;
> >> struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
> >> struct btrfs_root *root = BTRFS_I(inode)->root;
> >> + u32 sectorsize = root->fs_info->sectorsize;
> >> + bool found_err = false;
> >>
> >> if (PageChecked(page)) {
> >> ClearPageChecked(page);
> >> @@ -2870,7 +2873,17 @@ static int btrfs_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
> >> }
> >>
> >> phy_offset >>= inode->i_sb->s_blocksize_bits;
> >> - return check_data_csum(inode, io_bio, phy_offset, page, offset);
> >> + for (pg_off = offset; pg_off < end - page_offset(page);
> >> + pg_off += sectorsize, phy_offset++) {
> >> + int ret;
> >> +
> >> + ret = check_data_csum(inode, io_bio, phy_offset, page, pg_off);
> >> + if (ret < 0)
> >> + found_err = true;
> >> + }
> >> + if (found_err)
> >> + return -EIO;
> >> + return 0;
> >> }
> >
> > We don't need found_err here. Just return ret when you encounter the
> > first error.
> >
> But that means the whole range will be marked as an error, while some
> sectors may still contain good data and pass the csum check.
>
It would make sense if you were storing the per-block validity of the
page so it could be referenced later. As it stands, the function only
checks whether the data read into the page is correct; whether part of
the data is correct is of no consequence to the caller. The earlier it
returns on an error the better, rather than checking the whole range
just to return the same -EIO.
--
Goldwyn
* Re: [PATCH 02/17] btrfs: calculate inline extent buffer page size based on page size
2020-09-08 7:52 ` [PATCH 02/17] btrfs: calculate inline extent buffer page size based on page size Qu Wenruo
@ 2020-09-11 9:56 ` Nikolay Borisov
2020-09-11 10:13 ` Qu Wenruo
0 siblings, 1 reply; 49+ messages in thread
From: Nikolay Borisov @ 2020-09-11 9:56 UTC (permalink / raw)
To: Qu Wenruo, linux-btrfs
On 8.09.2020 at 10:52, Qu Wenruo wrote:
> Btrfs only support 64K as max node size, thus for 4K page system, we
> would have at most 16 pages for one extent buffer.
>
> But for 64K system, we only need and always need one page for extent
> buffer.
-EAMBIGUOUS. It should be "For a system using 64k pages we would really
have just a single page"
> This stays true even for future subpage sized sector size support (as
> long as extent buffer doesn't cross 64K boundary).
>
> So this patch will change how INLINE_EXTENT_BUFFER_PAGES is calculated.
>
> Instead of using fixed 16 pages, use (64K / PAGE_SIZE) as the result.
> This should save some bytes for extent buffer structure for 64K
> systems.
I'd rephrase the whole changelog as something along the lines of :
"Currently btrfs hardcodes the number of inline pages it uses to 16
which in turn has an effect on MAX_INLINE_EXTENT_BUFFER_SIZE effectively
defining the upper limit of the size of extent buffer. For systems using
4k pages this works out fine but on 64k page systems this results in
unnecessary memory overhead. That's due to the fact
BTRFS_MAX_METADATA_BLOCKSIZE is defined as 64k so having
INLINE_EXTENT_BUFFER_PAGES set to 16 on a 64k system results in
extent_buffer::pages being unnecessarily large since an eb can be mapped
with just a single page but the pages array would be 16 entries large.
Fix this by changing the way we calculate the size of the pages array by
exploiting the fact an eb can't be larger than 64k so choosing enough
pages to fit it"
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
This patch must be split into 2:
1. Changing the defines
2. Simplifying num_extent_pages
> ---
> fs/btrfs/extent_io.h | 15 +++++++++++----
> 1 file changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
> index 00a88f2eb5ab..e16c5449ba48 100644
> --- a/fs/btrfs/extent_io.h
> +++ b/fs/btrfs/extent_io.h
> @@ -86,8 +86,8 @@ struct extent_io_ops {
> };
>
>
> -#define INLINE_EXTENT_BUFFER_PAGES 16
> -#define MAX_INLINE_EXTENT_BUFFER_SIZE (INLINE_EXTENT_BUFFER_PAGES * PAGE_SIZE)
> +#define MAX_INLINE_EXTENT_BUFFER_SIZE SZ_64K
> +#define INLINE_EXTENT_BUFFER_PAGES (MAX_INLINE_EXTENT_BUFFER_SIZE / PAGE_SIZE)
Actually having the defines like that it makes no sense to keep
MAX_INLINE_EXTENT_BUFFER_SIZE around since it has the same value as
BTRFS_MAX_METADATA_BLOCKSIZE. So why not just remove
MAX_INLINE_EXTENT_BUFFER_SIZE and use BTRFS_MAX_METADATA_BLOCKSIZE when
calculating INLINE_EXTENT_BUFFER_PAGES.
> struct extent_buffer {
> u64 start;
> unsigned long len;
> @@ -227,8 +227,15 @@ void wait_on_extent_buffer_writeback(struct extent_buffer *eb);
>
> static inline int num_extent_pages(const struct extent_buffer *eb)
> {
> - return (round_up(eb->start + eb->len, PAGE_SIZE) >> PAGE_SHIFT) -
> - (eb->start >> PAGE_SHIFT);
> + /*
> + * For sectorsize == PAGE_SIZE case, since eb is always aligned to
> + * sectorsize, it's just (eb->len / PAGE_SIZE) >> PAGE_SHIFT.
> + *
> + * For sectorsize < PAGE_SIZE case, we only want to support 64K
> + * PAGE_SIZE, and ensured all tree blocks won't cross page boundary.
> + * So in that case we always got 1 page.
> + */
> + return (round_up(eb->len, PAGE_SIZE) >> PAGE_SHIFT);
> }
>
> static inline int extent_buffer_uptodate(const struct extent_buffer *eb)
>
* Re: [PATCH 03/17] btrfs: remove the open-code to read disk-key
2020-09-08 7:52 ` [PATCH 03/17] btrfs: remove the open-code to read disk-key Qu Wenruo
@ 2020-09-11 10:07 ` Nikolay Borisov
0 siblings, 0 replies; 49+ messages in thread
From: Nikolay Borisov @ 2020-09-11 10:07 UTC (permalink / raw)
To: Qu Wenruo, linux-btrfs
On 8.09.2020 at 10:52, Qu Wenruo wrote:
> There is some ancient code from the old days where we handle the
> disk_key read manually when the disk key is in one page.
> But that's totally unnecessary, as we have read_extent_buffer() to
> handle everything.
I'd rephrase this to something like:
"generic_bin_search distinguishes between reading a key which doesn't
cross a page and one which does. However this distinction is not
necessary since read_extent_buffer handles both cases transparently.
Just use read_extent_buffer to streamline the code"
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> fs/btrfs/ctree.c | 13 ++-----------
> 1 file changed, 2 insertions(+), 11 deletions(-)
>
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index cd1cd673bc0b..e204e1320745 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -1697,7 +1697,6 @@ static noinline int generic_bin_search(struct extent_buffer *eb,
> }
>
> while (low < high) {
> - unsigned long oip;
> unsigned long offset;
> struct btrfs_disk_key *tmp;
> struct btrfs_disk_key unaligned;
> @@ -1705,17 +1704,9 @@ static noinline int generic_bin_search(struct extent_buffer *eb,
>
> mid = (low + high) / 2;
> offset = p + mid * item_size;
> - oip = offset_in_page(offset);
>
> - if (oip + key_size <= PAGE_SIZE) {
> - const unsigned long idx = offset >> PAGE_SHIFT;
> - char *kaddr = page_address(eb->pages[idx]);
> -
> - tmp = (struct btrfs_disk_key *)(kaddr + oip);
> - } else {
> - read_extent_buffer(eb, &unaligned, offset, key_size);
> - tmp = &unaligned;
> - }
> + read_extent_buffer(eb, &unaligned, offset, key_size);
> + tmp = &unaligned;
>
> ret = comp_keys(tmp, key);
>
>
* Re: [PATCH 04/17] btrfs: make btrfs_fs_info::buffer_radix to take sector size devided values
2020-09-08 7:52 ` [PATCH 04/17] btrfs: make btrfs_fs_info::buffer_radix to take sector size devided values Qu Wenruo
2020-09-08 18:03 ` kernel test robot
@ 2020-09-11 10:11 ` Nikolay Borisov
2020-09-11 10:15 ` Qu Wenruo
1 sibling, 1 reply; 49+ messages in thread
From: Nikolay Borisov @ 2020-09-11 10:11 UTC (permalink / raw)
To: Qu Wenruo, linux-btrfs
On 8.09.2020 at 10:52, Qu Wenruo wrote:
> For subpage sector size support, one page can contain multiple tree
> blocks, thus we can no longer use (eb->start >> PAGE_SHIFT) any more, or
> we can easily get an extent buffer that doesn't belong to us.
>
> This patch will use (extent_buffer::start / sectorsize) as index for radix
> tree so that we can get correct extent buffer for subpage size support.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
That's fine. However, now that we are moving towards sectorsize, I wonder
if it would make more sense to add fs_info->sector_bits, which would be
log2(fs_info->sectorsize), and have expressions such as:
start >> fs_info->sector_bits. I *think* we can rely on the compiler
doing the right thing given fs_info->sectorsize is guaranteed to be a
power of 2 value.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
> ---
> fs/btrfs/extent_io.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 6def411b2eba..5d969340275e 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -5142,7 +5142,7 @@ struct extent_buffer *find_extent_buffer(struct btrfs_fs_info *fs_info,
>
> rcu_read_lock();
> eb = radix_tree_lookup(&fs_info->buffer_radix,
> - start >> PAGE_SHIFT);
> + start / fs_info->sectorsize);
> if (eb && atomic_inc_not_zero(&eb->refs)) {
> rcu_read_unlock();
> /*
> @@ -5194,7 +5194,7 @@ struct extent_buffer *alloc_test_extent_buffer(struct btrfs_fs_info *fs_info,
> }
> spin_lock(&fs_info->buffer_lock);
> ret = radix_tree_insert(&fs_info->buffer_radix,
> - start >> PAGE_SHIFT, eb);
> + start / fs_info->sectorsize, eb);
> spin_unlock(&fs_info->buffer_lock);
> radix_tree_preload_end();
> if (ret == -EEXIST) {
> @@ -5302,7 +5302,7 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
>
> spin_lock(&fs_info->buffer_lock);
> ret = radix_tree_insert(&fs_info->buffer_radix,
> - start >> PAGE_SHIFT, eb);
> + start / fs_info->sectorsize, eb);
> spin_unlock(&fs_info->buffer_lock);
> radix_tree_preload_end();
> if (ret == -EEXIST) {
> @@ -5358,7 +5358,7 @@ static int release_extent_buffer(struct extent_buffer *eb)
>
> spin_lock(&fs_info->buffer_lock);
> radix_tree_delete(&fs_info->buffer_radix,
> - eb->start >> PAGE_SHIFT);
> + eb->start / fs_info->sectorsize);
> spin_unlock(&fs_info->buffer_lock);
> } else {
> spin_unlock(&eb->refs_lock);
>
* Re: [PATCH 02/17] btrfs: calculate inline extent buffer page size based on page size
2020-09-11 9:56 ` Nikolay Borisov
@ 2020-09-11 10:13 ` Qu Wenruo
0 siblings, 0 replies; 49+ messages in thread
From: Qu Wenruo @ 2020-09-11 10:13 UTC (permalink / raw)
To: Nikolay Borisov, linux-btrfs
On 2020/9/11 5:56 PM, Nikolay Borisov wrote:
>
>
> On 8.09.2020 at 10:52, Qu Wenruo wrote:
>> Btrfs only support 64K as max node size, thus for 4K page system, we
>> would have at most 16 pages for one extent buffer.
>>
>> But for 64K system, we only need and always need one page for extent
>> buffer.
>
> -EAMBIGUOUS. It should be "For a system using 64k pages we would really
> have just a single page"
>
>> This stays true even for future subpage sized sector size support (as
>> long as extent buffer doesn't cross 64K boundary).
>>
>> So this patch will change how INLINE_EXTENT_BUFFER_PAGES is calculated.
>>
>> Instead of using fixed 16 pages, use (64K / PAGE_SIZE) as the result.
>> This should save some bytes for extent buffer structure for 64K
>> systems.
>
> I'd rephrase the whole changelog as something along the lines of :
>
> "Currently btrfs hardcodes the number of inline pages it uses to 16
> which in turn has an effect on MAX_INLINE_EXTENT_BUFFER_SIZE effectively
> defining the upper limit of the size of extent buffer. For systems using
> 4k pages this works out fine but on 64k page systems this results in
> unnecessary memory overhead. That's due to the fact
> BTRFS_MAX_METADATA_BLOCKSIZE is defined as 64k so having
> INLINE_EXTENT_BUFFER_PAGES set to 16 on a 64k system results in
> extent_buffer::pages being unnecessarily large since an eb can be mapped
> with just a single page but the pages array would be 16 entries large.
Really? Turning 3 small sentences into one paragraph without much
separation?
It doesn't improve the readability to me...
>
> Fix this by changing the way we calculate the size of the pages array by
> exploiting the fact an eb can't be larger than 64k so choosing enough
> pages to fit it"
>
>
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>
> This patch must be split into 2:
> 1. Changing the defines
> 2. Simplifying num_extent_pages
That's OK to do.
>
>> ---
>> fs/btrfs/extent_io.h | 15 +++++++++++----
>> 1 file changed, 11 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
>> index 00a88f2eb5ab..e16c5449ba48 100644
>> --- a/fs/btrfs/extent_io.h
>> +++ b/fs/btrfs/extent_io.h
>> @@ -86,8 +86,8 @@ struct extent_io_ops {
>> };
>>
>>
>> -#define INLINE_EXTENT_BUFFER_PAGES 16
>> -#define MAX_INLINE_EXTENT_BUFFER_SIZE (INLINE_EXTENT_BUFFER_PAGES * PAGE_SIZE)
>> +#define MAX_INLINE_EXTENT_BUFFER_SIZE SZ_64K
>> +#define INLINE_EXTENT_BUFFER_PAGES (MAX_INLINE_EXTENT_BUFFER_SIZE / PAGE_SIZE)
>
> Actually having the defines like that it makes no sense to keep
> MAX_INLINE_EXTENT_BUFFER_SIZE around since it has the same value as
> BTRFS_MAX_METADATA_BLOCKSIZE. So why not just remove
> MAX_INLINE_EXTENT_BUFFER_SIZE and use BTRFS_MAX_METADATA_BLOCKSIZE when
> calculating INLINE_EXTENT_BUFFER_PAGES.
That's indeed much better.
Thanks,
Qu
>
>
>> struct extent_buffer {
>> u64 start;
>> unsigned long len;
>> @@ -227,8 +227,15 @@ void wait_on_extent_buffer_writeback(struct extent_buffer *eb);
>>
>> static inline int num_extent_pages(const struct extent_buffer *eb)
>> {
>> - return (round_up(eb->start + eb->len, PAGE_SIZE) >> PAGE_SHIFT) -
>> - (eb->start >> PAGE_SHIFT);
>> + /*
>> + * For sectorsize == PAGE_SIZE case, since eb is always aligned to
>> + * sectorsize, it's just (eb->len / PAGE_SIZE) >> PAGE_SHIFT.
>> + *
>> + * For sectorsize < PAGE_SIZE case, we only want to support 64K
>> + * PAGE_SIZE, and ensured all tree blocks won't cross page boundary.
>> + * So in that case we always got 1 page.
>> + */
>> + return (round_up(eb->len, PAGE_SIZE) >> PAGE_SHIFT);
>> }
>>
>> static inline int extent_buffer_uptodate(const struct extent_buffer *eb)
>>
>
* Re: [PATCH 04/17] btrfs: make btrfs_fs_info::buffer_radix to take sector size devided values
2020-09-11 10:11 ` Nikolay Borisov
@ 2020-09-11 10:15 ` Qu Wenruo
0 siblings, 0 replies; 49+ messages in thread
From: Qu Wenruo @ 2020-09-11 10:15 UTC (permalink / raw)
To: Nikolay Borisov, linux-btrfs
On 2020/9/11 6:11 PM, Nikolay Borisov wrote:
>
>
> On 8.09.2020 at 10:52, Qu Wenruo wrote:
>> For subpage sector size support, one page can contain multiple tree
>> blocks, thus we can no longer use (eb->start >> PAGE_SHIFT) any more, or
>> we can easily get an extent buffer that doesn't belong to us.
>>
>> This patch will use (extent_buffer::start / sectorsize) as index for radix
>> tree so that we can get correct extent buffer for subpage size support.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
> That's fine, however now that we are moving towards sectorsize I wonder
> if it would make more sense to add fs_info->sector_bits, which would be
Exactly what david is doing.
IIRC he mentioned such shift bits simplification in IRC, and I'm just
waiting for that bit to land and use that to further simplify the code.
Thanks,
Qu
>
> log2(fs_info->sectorsize) and have expressions such as:
>
> start >> fs_info->sector_bits. I *think* we can rely on the compiler
> doing the right thing given fs_info->sectorsize is guaranteed to be a
> power of 2 value.
>
>
> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
>
>> ---
>> fs/btrfs/extent_io.c | 8 ++++----
>> 1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>> index 6def411b2eba..5d969340275e 100644
>> --- a/fs/btrfs/extent_io.c
>> +++ b/fs/btrfs/extent_io.c
>> @@ -5142,7 +5142,7 @@ struct extent_buffer *find_extent_buffer(struct btrfs_fs_info *fs_info,
>>
>> rcu_read_lock();
>> eb = radix_tree_lookup(&fs_info->buffer_radix,
>> - start >> PAGE_SHIFT);
>> + start / fs_info->sectorsize);
>> if (eb && atomic_inc_not_zero(&eb->refs)) {
>> rcu_read_unlock();
>> /*
>> @@ -5194,7 +5194,7 @@ struct extent_buffer *alloc_test_extent_buffer(struct btrfs_fs_info *fs_info,
>> }
>> spin_lock(&fs_info->buffer_lock);
>> ret = radix_tree_insert(&fs_info->buffer_radix,
>> - start >> PAGE_SHIFT, eb);
>> + start / fs_info->sectorsize, eb);
>> spin_unlock(&fs_info->buffer_lock);
>> radix_tree_preload_end();
>> if (ret == -EEXIST) {
>> @@ -5302,7 +5302,7 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
>>
>> spin_lock(&fs_info->buffer_lock);
>> ret = radix_tree_insert(&fs_info->buffer_radix,
>> - start >> PAGE_SHIFT, eb);
>> + start / fs_info->sectorsize, eb);
>> spin_unlock(&fs_info->buffer_lock);
>> radix_tree_preload_end();
>> if (ret == -EEXIST) {
>> @@ -5358,7 +5358,7 @@ static int release_extent_buffer(struct extent_buffer *eb)
>>
>> spin_lock(&fs_info->buffer_lock);
>> radix_tree_delete(&fs_info->buffer_radix,
>> - eb->start >> PAGE_SHIFT);
>> + eb->start / fs_info->sectorsize);
>> spin_unlock(&fs_info->buffer_lock);
>> } else {
>> spin_unlock(&eb->refs_lock);
>>
>
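Nikolay's suggestion relies on division by a power of two being equivalent to a right shift. A small sketch of that equivalence (sector_bits() is a hypothetical helper name used here only for illustration; the kernel side would precompute it, e.g. via ilog2()):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: log2 of the sector size, valid because btrfs
 * guarantees sectorsize is a power of two. */
static unsigned int sector_bits(uint32_t sectorsize)
{
	unsigned int bits = 0;

	while ((1U << bits) < sectorsize)
		bits++;
	return bits;
}

/* The radix tree index used by the patch. */
static uint64_t radix_index(uint64_t start, uint32_t sectorsize)
{
	return start / sectorsize;
}
```

For any start, radix_index(start, s) == start >> sector_bits(s), which is the transformation the compiler is expected to do on its own for a power-of-two divisor.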
* Re: [PATCH 00/17] btrfs: add read-only support for subpage sector size
2020-09-08 7:52 [PATCH 00/17] btrfs: add read-only support for subpage sector size Qu Wenruo
` (17 preceding siblings ...)
2020-09-08 8:03 ` [PATCH 00/17] btrfs: add read-only support for subpage sector size Qu Wenruo
@ 2020-09-11 10:24 ` Qu Wenruo
18 siblings, 0 replies; 49+ messages in thread
From: Qu Wenruo @ 2020-09-11 10:24 UTC (permalink / raw)
To: linux-btrfs
On 2020/9/8 下午3:52, Qu Wenruo wrote:
> Patches can be fetched from github:
> https://github.com/adam900710/linux/tree/subpage
>
> Currently btrfs only allows mounting a fs with sectorsize == PAGE_SIZE.
>
> That means systems with 64K page size can only use filesystems with
> 64K sector size.
> This brings a big compatibility problem for btrfs.
>
> This patchset partially solves the problem by allowing systems with 64K
> pages to mount 4K sectorsize filesystems in read-only mode.
>
> The main objective here is to remove the blockers in the code base and
> pave the road to full RW mount support.
>
> == What works ==
>
> Existing regular page sized sector size support
> Subpage read-only Mount (with all self tests and ASSERT)
> Subpage metadata read (including all trees and inline extents, and csum checking)
> Subpage uncompressed data read (with csum checking)
>
> == What doesn't work ==
>
> Read-write mount (see the subject)
> Compressed data read
>
> == Challenges we meet ==
>
> The main problem is metadata, where we have several limitations:
> - We always read the full page of a metadata
> In subpage case, one full page can contain several tree blocks.
>
> - We use page::private to point to extent buffer
> This means we currently can only support one-page-to-one-extent-buffer
> mapping.
> For subpage size support, we need one-page-to-multiple-extent-buffer
> mapping.
>
>
> == Solutions ==
>
> So here for the metadata part, we use the following methods to
> work around the problem:
>
> - Introduce subpage_eb_mapping structure to do bitmap
> Now for subpage, page::private points to a subpage_eb_mapping
> structure, which has a bitmap mapping one page to multiple extent
> buffers.
>
> - Still do full page read for metadata
> This means, at read time, we're not reading just one extent buffer,
> but possibly many more.
> In that case, we first do tree block verification for the tree blocks
> triggering the read, and mark the page uptodate.
>
> For newly incoming tree block read, they will check if the tree block
> is verified. If not verified, even if the page is uptodate, we still
> need to check the extent buffer.
>
> By this all subpage extent buffers are verified properly.
>
> For data part, it's pretty simple, all existing infrastructure can be
> easily converted to support subpage read, without any subpage specific
> handing yet.
>
> == Patchset structure ==
>
> The structure of the patchset:
> Patch 01~11: Preparation patches for metadata subpage read support.
> These patches can be merged without problem, and work for
> both regular and subpage case.
> Patch 12~14: Patches for data subpage read support.
> These patches work for both cases.
>
> That means, patch 01~14 can be applied to current kernel, and shouldn't
> affect any existing behavior.
>
> Patch 15~17: Subpage metadata read specific patches.
> These patches introduces the main part of the subpage
> metadata read support.
For the last 3 patches, it turns out that, we may get rid of
page::private pointer completely, and greatly simplify the bits handling
by relying on extent_io_tree.
So if possible, please ignore these last 3 patches for now. They would
be the backup solution if the extent_io_tree idea doesn't go well.
Thanks,
Qu
>
> The number of patches is the main reason I'm submitting them to the mailing
> list, as there are too many preparation patches already.
>
> Qu Wenruo (17):
> btrfs: extent-io-tests: remove invalid tests
> btrfs: calculate inline extent buffer page size based on page size
> btrfs: remove the open-code to read disk-key
> btrfs: make btrfs_fs_info::buffer_radix to take sector size divided
> values
> btrfs: don't allow tree block to cross page boundary for subpage
> support
> btrfs: handle sectorsize < PAGE_SIZE case for extent buffer accessors
> btrfs: make csum_tree_block() handle sectorsize smaller than page size
> btrfs: refactor how we extract extent buffer from page for
> alloc_extent_buffer()
> btrfs: refactor btrfs_release_extent_buffer_pages()
> btrfs: add assert_spin_locked() for attach_extent_buffer_page()
> btrfs: extract the extent buffer verification from
> btree_readpage_end_io_hook()
> btrfs: remove the unnecessary parameter @start and @len for
> check_data_csum()
> btrfs: extent_io: only require sector size alignment for page read
> btrfs: make btrfs_readpage_end_io_hook() follow sector size
> btrfs: introduce subpage_eb_mapping for extent buffers
> btrfs: handle extent buffer verification proper for subpage size
> btrfs: allow RO mount of 4K sector size fs on 64K page system
>
> fs/btrfs/ctree.c | 13 +-
> fs/btrfs/ctree.h | 38 ++-
> fs/btrfs/disk-io.c | 111 ++++---
> fs/btrfs/disk-io.h | 1 +
> fs/btrfs/extent_io.c | 538 +++++++++++++++++++++++++------
> fs/btrfs/extent_io.h | 19 +-
> fs/btrfs/inode.c | 40 ++-
> fs/btrfs/struct-funcs.c | 18 +-
> fs/btrfs/super.c | 7 +
> fs/btrfs/tests/extent-io-tests.c | 26 +-
> 10 files changed, 633 insertions(+), 178 deletions(-)
>
* Re: [PATCH 05/17] btrfs: don't allow tree block to cross page boundary for subpage support
2020-09-08 7:52 ` [PATCH 05/17] btrfs: don't allow tree block to cross page boundary for subpage support Qu Wenruo
@ 2020-09-11 10:26 ` Nikolay Borisov
2020-09-11 11:36 ` Qu Wenruo
0 siblings, 1 reply; 49+ messages in thread
From: Nikolay Borisov @ 2020-09-11 10:26 UTC (permalink / raw)
To: Qu Wenruo, linux-btrfs
On 8.09.20 г. 10:52 ч., Qu Wenruo wrote:
> As a preparation for subpage sector size support (allow sector size
> smaller than page size to be mounted), if the sector size is smaller
nit: (allowing filesystem with sector size smaller than page size to be
mounted)
> than page size, we don't allow tree block to be read if it cross page
> boundary (normally 64K).
Why normally 64k ? I suspect you assume readers are familiar with the
motivation for this work which they don't necessarily need to be. Please
make your sentences explicit. Correct me if I'm wrong but you are
ensuring that an eb doesn't cross the largest possible sectorsize?
>
> This ensures that, tree blocks are always contained in one page for 64K
> system, which can greatly simplify the handling.
"For system with 64k page size" the term "64k system" is ambiguous, heed
this when rewording other patches as well since I've seen this used in
multiple places.
> Or we need to do complex multi-page handling for tree blocks.
>
> Currently the only way to create such tree blocks crossing 64K boundary
> is by btrfs-convert, which will get fixed soon and doesn't get
> wide-spread usage.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> fs/btrfs/extent_io.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 5d969340275e..119193166cec 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -5232,6 +5232,13 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
> btrfs_err(fs_info, "bad tree block start %llu", start);
> return ERR_PTR(-EINVAL);
> }
> + if (fs_info->sectorsize < PAGE_SIZE && round_down(start, PAGE_SIZE) !=
> + round_down(start + len - 1, PAGE_SIZE)) {
> + btrfs_err(fs_info,
> + "tree block crosses page boundary, start %llu nodesize %lu",
> + start, len);
> + return ERR_PTR(-EINVAL);
> + }
>
> eb = find_extent_buffer(fs_info, start);
> if (eb)
>
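The rejection condition the patch adds can be exercised in isolation. A userspace sketch, with PAGE_SIZE fixed at 64K for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 65536ULL  /* illustration only: a 64K page system */

static uint64_t round_down_u64(uint64_t val, uint64_t align)
{
	return val & ~(align - 1);  /* align must be a power of two */
}

/* True if tree block [start, start + len) spans a page boundary,
 * i.e. the condition the patch rejects with -EINVAL. */
static bool crosses_page_boundary(uint64_t start, uint64_t len)
{
	return round_down_u64(start, PAGE_SIZE) !=
	       round_down_u64(start + len - 1, PAGE_SIZE);
}
```

A 16K block starting at 48K fits entirely in the first page, while one starting at 60K straddles the 64K boundary and would be rejected.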
* Re: [PATCH 07/17] btrfs: make csum_tree_block() handle sectorsize smaller than page size
2020-09-08 7:52 ` [PATCH 07/17] btrfs: make csum_tree_block() handle sectorsize smaller than page size Qu Wenruo
@ 2020-09-11 11:10 ` Nikolay Borisov
0 siblings, 0 replies; 49+ messages in thread
From: Nikolay Borisov @ 2020-09-11 11:10 UTC (permalink / raw)
To: Qu Wenruo, linux-btrfs; +Cc: Goldwyn Rodrigues
On 8.09.20 г. 10:52 ч., Qu Wenruo wrote:
> For subpage size support, we only need to handle the first page.
>
> To make the code work for both cases, we modify the following behaviors:
>
> - num_pages calculation
> Instead of "nodesize >> PAGE_SHIFT", we use
> "DIV_ROUND_UP(nodesize, PAGE_SIZE)"; this ensures we get at least one
> page for subpage size support, while still getting the same result for
> regular page size.
>
> - The length for the first run
> Instead of PAGE_SIZE - BTRFS_CSUM_SIZE, we use min(PAGE_SIZE, nodesize)
> - BTRFS_CSUM_SIZE.
> This allows us to handle both cases well.
>
> - The start location of the first run
> Instead of always using BTRFS_CSUM_SIZE as the csum start position, add
> offset_in_page(eb->start) to get proper offset for both cases.
>
> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
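The two calculations described in the commit message are easy to sanity-check. A sketch of both, using BTRFS_CSUM_SIZE == 32 (the csum reservation at the start of a tree block) and treating nodesize and page size as plain parameters:

```c
#include <assert.h>

#define BTRFS_CSUM_SIZE 32

/* DIV_ROUND_UP(nodesize, PAGE_SIZE): at least one page even when
 * nodesize < page size (the subpage case). */
static int num_pages(unsigned long nodesize, unsigned long page_size)
{
	return (int)((nodesize + page_size - 1) / page_size);
}

/* Length of the first csum run: min(PAGE_SIZE, nodesize) - BTRFS_CSUM_SIZE,
 * so a 16K node on a 64K page only hashes 16K - 32 bytes in one run. */
static unsigned long first_run_len(unsigned long nodesize,
				   unsigned long page_size)
{
	unsigned long limit = page_size < nodesize ? page_size : nodesize;

	return limit - BTRFS_CSUM_SIZE;
}
```

Both the subpage case (16K node, 64K page) and the regular case (16K node, 4K page) come out right: one page versus four, and a first run that never exceeds the node.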
* Re: [PATCH 08/17] btrfs: refactor how we extract extent buffer from page for alloc_extent_buffer()
2020-09-08 7:52 ` [PATCH 08/17] btrfs: refactor how we extract extent buffer from page for alloc_extent_buffer() Qu Wenruo
@ 2020-09-11 11:14 ` Nikolay Borisov
0 siblings, 0 replies; 49+ messages in thread
From: Nikolay Borisov @ 2020-09-11 11:14 UTC (permalink / raw)
To: Qu Wenruo, linux-btrfs
On 8.09.20 г. 10:52 ч., Qu Wenruo wrote:
> This patch will move the code that extracts the extent_buffer from
> page::private into its own function, grab_extent_buffer_from_page().
>
> Although it's just one line, for later subpage size support it will
> become much larger.
>
> Also add some extra comments on why we need to do such page::private
> dancing.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> fs/btrfs/extent_io.c | 49 ++++++++++++++++++++++++++++++++++++--------
> 1 file changed, 40 insertions(+), 9 deletions(-)
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 6fafbc1d047b..3c8fe40f67fa 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -5214,6 +5214,44 @@ struct extent_buffer *alloc_test_extent_buffer(struct btrfs_fs_info *fs_info,
> }
> #endif
>
> +/*
> + * A helper to grab the existing extent buffer from a page.
> + *
> + * There is a small race window where two callers of alloc_extent_buffer() race:
> + * Thread 1 | Thread 2
> + * -------------------------------------+---------------------------------------
> + * alloc_extent_buffer() | alloc_extent_buffer()
> + * |- eb = __alloc_extent_buffer() | |- eb = __alloc_extent_buffer()
> + * |- p = find_or_create_page() | |- p = find_or_create_page()
> + *
> + * In the above case, two ebs get allocated for the same bytenr and get the
> + * same page.
> + * We have to rely on page->mapping->private_lock to make one of them give
> + * up and reuse the eb allocated by the other:
> + *
> + * | |- grab_extent_buffer_from_page()
> + * | |- get nothing
> + * | |- attach_extent_buffer_page()
> + * | | |- Now page->private is set
> + * | |- spin_unlock(&mapping->private_lock);
> + * |- spin_lock(private_lock); | |- Continue to insert radix tree.
> + * |- grab_extent_buffer_from_page() |
> + * |- got eb from thread 2 |
> + * |- spin_unlock(private_lock); |
> + * |- goto free_eb; |
This comment is slightly misleading because it's not
grab_extent_buffer_from_page()'s return value which decides who wins the
race, but rather the return value of PagePrivate(), which is invoked
_before_ grab_extent_buffer_from_page().
> + *
> + * The function here is to ensure we have proper locking and detect such race
> + * so we won't allocate an eb twice.
> + */
> +static struct extent_buffer *grab_extent_buffer_from_page(struct page *page)
> +{
> + /*
> + * For PAGE_SIZE == sectorsize case, a btree_inode page should have its
> + * private pointer as extent buffer who owns this page.
> + */
> + return (struct extent_buffer *)page->private;
> +}
> +
> struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
> u64 start)
> {
> @@ -5258,15 +5296,8 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
>
> spin_lock(&mapping->private_lock);
> if (PagePrivate(p)) {
> - /*
> - * We could have already allocated an eb for this page
> - * and attached one so lets see if we can get a ref on
> - * the existing eb, and if we can we know it's good and
> - * we can just return that one, else we know we can just
> - * overwrite page->private.
> - */
> - exists = (struct extent_buffer *)p->private;
> - if (atomic_inc_not_zero(&exists->refs)) {
> + exists = grab_extent_buffer_from_page(p);
> + if (exists && atomic_inc_not_zero(&exists->refs)) {
grab_extent_buffer_from_page is guaranteed to return an extent buffer
due to the PagePrivate() check above. So simply doing atomic_inc_not_zero
is fine.
> spin_unlock(&mapping->private_lock);
> unlock_page(p);
> put_page(p);
>
* Re: [PATCH 09/17] btrfs: refactor btrfs_release_extent_buffer_pages()
2020-09-08 7:52 ` [PATCH 09/17] btrfs: refactor btrfs_release_extent_buffer_pages() Qu Wenruo
@ 2020-09-11 11:17 ` Nikolay Borisov
2020-09-11 11:39 ` Qu Wenruo
0 siblings, 1 reply; 49+ messages in thread
From: Nikolay Borisov @ 2020-09-11 11:17 UTC (permalink / raw)
To: Qu Wenruo, linux-btrfs
On 8.09.20 г. 10:52 ч., Qu Wenruo wrote:
> We have attach_extent_buffer_page() and it is used in
> btrfs_clone_extent_buffer() and alloc_extent_buffer().
>
> But in btrfs_release_extent_buffer_pages() we manually call
> detach_page_private().
>
> This is fine for current code, but if we're going to support subpage
> size, we will do a lot more work than just calling
> detach_page_private().
>
> This patch will extract the main work of btrfs_clone_extent_buffer()
Did you mean to type btrfs_release_extent_buffer_pages instead of
clone_extent_buffer ?
> into detach_extent_buffer_page() so that later subpage size support can
> put their own code into them.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> fs/btrfs/extent_io.c | 58 +++++++++++++++++++-------------------------
> 1 file changed, 25 insertions(+), 33 deletions(-)
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 3c8fe40f67fa..1cb41dab7a1d 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -4920,6 +4920,29 @@ int extent_buffer_under_io(const struct extent_buffer *eb)
> test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags));
> }
>
> +static void detach_extent_buffer_page(struct extent_buffer *eb,
> + struct page *page)
> +{
> + bool mapped = !test_bit(EXTENT_BUFFER_UNMAPPED, &eb->bflags);
nit: Now you are performing the atomic op once per page rather than once
per-eb.
> +
> + if (!page)
> + return;
> +
> + if (mapped)
> + spin_lock(&page->mapping->private_lock);
> + if (PagePrivate(page) && page->private == (unsigned long)eb) {
> + BUG_ON(test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags));
> + BUG_ON(PageDirty(page));
> + BUG_ON(PageWriteback(page));
> + /* We need to make sure we haven't been attached to a new eb. */
> + detach_page_private(page);
> + }
> + if (mapped)
> + spin_unlock(&page->mapping->private_lock);
> + /* One for when we allocated the page */
> + put_page(page);
> +}
> +
> /*
> * Release all pages attached to the extent buffer.
> */
> @@ -4927,43 +4950,12 @@ static void btrfs_release_extent_buffer_pages(struct extent_buffer *eb)
> {
> int i;
> int num_pages;
> - int mapped = !test_bit(EXTENT_BUFFER_UNMAPPED, &eb->bflags);
>
> BUG_ON(extent_buffer_under_io(eb));
>
> num_pages = num_extent_pages(eb);
> - for (i = 0; i < num_pages; i++) {
> - struct page *page = eb->pages[i];
> -
> - if (!page)
> - continue;
> - if (mapped)
> - spin_lock(&page->mapping->private_lock);
> - /*
> - * We do this since we'll remove the pages after we've
> - * removed the eb from the radix tree, so we could race
> - * and have this page now attached to the new eb. So
> - * only clear page_private if it's still connected to
> - * this eb.
> - */
> - if (PagePrivate(page) &&
> - page->private == (unsigned long)eb) {
> - BUG_ON(test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags));
> - BUG_ON(PageDirty(page));
> - BUG_ON(PageWriteback(page));
> - /*
> - * We need to make sure we haven't be attached
> - * to a new eb.
> - */
> - detach_page_private(page);
> - }
> -
> - if (mapped)
> - spin_unlock(&page->mapping->private_lock);
> -
> - /* One for when we allocated the page */
> - put_page(page);
> - }
> + for (i = 0; i < num_pages; i++)
> + detach_extent_buffer_page(eb, eb->pages[i]);
> }
>
> /*
>
* Re: [PATCH 10/17] btrfs: add assert_spin_locked() for attach_extent_buffer_page()
2020-09-08 7:52 ` [PATCH 10/17] btrfs: add assert_spin_locked() for attach_extent_buffer_page() Qu Wenruo
@ 2020-09-11 11:22 ` Nikolay Borisov
0 siblings, 0 replies; 49+ messages in thread
From: Nikolay Borisov @ 2020-09-11 11:22 UTC (permalink / raw)
To: Qu Wenruo, linux-btrfs
On 8.09.20 г. 10:52 ч., Qu Wenruo wrote:
> When calling attach_extent_buffer_page(), either we're attaching
> anonymous pages, called from btrfs_clone_extent_buffer().
>
> Or we're attaching btree_inode pages, called from alloc_extent_buffer().
>
> For the latter case, we should hold page->mapping->private_lock to
> avoid races modifying page->private.
>
> Add assert_spin_locked() if we're calling from alloc_extent_buffer().
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com> but see one nit below.
> ---
> fs/btrfs/extent_io.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 1cb41dab7a1d..81e43d99feda 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -3096,6 +3096,9 @@ static int submit_extent_page(unsigned int opf,
> static void attach_extent_buffer_page(struct extent_buffer *eb,
> struct page *page)
> {
> + if (page->mapping)
> + assert_spin_locked(&page->mapping->private_lock);
nit: In addition to the changelog I'd rather have a comment explaining
the distinction we make. Something like:
"Only pages allocated through alloc_extent_buffer will have their
mapping set"
> +
> if (!PagePrivate(page))
> attach_page_private(page, eb);
> else
>
* Re: [PATCH 05/17] btrfs: don't allow tree block to cross page boundary for subpage support
2020-09-11 10:26 ` Nikolay Borisov
@ 2020-09-11 11:36 ` Qu Wenruo
2020-09-11 12:08 ` Nikolay Borisov
0 siblings, 1 reply; 49+ messages in thread
From: Qu Wenruo @ 2020-09-11 11:36 UTC (permalink / raw)
To: Nikolay Borisov, linux-btrfs
On 2020/9/11 下午6:26, Nikolay Borisov wrote:
>
>
> On 8.09.20 г. 10:52 ч., Qu Wenruo wrote:
>> As a preparation for subpage sector size support (allow sector size
>> smaller than page size to be mounted), if the sector size is smaller
>
> nit: (allowing filesystem with sector size smaller than page size to be
> mounted)
>
>> than page size, we don't allow tree block to be read if it cross page
>> boundary (normally 64K).
>
> Why normally 64k ?
Because there are only two major arches supporting non-4k page size,
ppc64 which uses 64K page size, and arm which can support 4K, 16K and
64K page size.
Currently our plan is only to support 64K page size and 4K page size.
Furthermore, that 64K magic number matches with 64K stripe length of
btrfs, thus we choose 64K as the boundary.
> I suspect you assume readers are familiar with the
> motivation for this work which they don't necessarily need to be. Please
> make your sentences explicit. Correct me if I'm wrong but you are
> ensuring that an eb doesn't cross the largest possible sectorsize?
That's correct, and by coincidence, it's also the stripe length of btrfs
RAID, and scrub stripe length too.
(And scrub can't handle such tree block either)
Thanks,
Qu
>>
>> This ensures that, tree blocks are always contained in one page for 64K
>> system, which can greatly simplify the handling.
>
> "For system with 64k page size" the term "64k system" is ambiguous, heed
> this when rewording other patches as well since I've seen this used in
> multiple places.
>
>
>> Or we need to do complex multi-page handling for tree blocks.
>>
>> Currently the only way to create such tree blocks crossing 64K boundary
>> is by btrfs-convert, which will get fixed soon and doesn't get
>> wide-spread usage.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>> fs/btrfs/extent_io.c | 7 +++++++
>> 1 file changed, 7 insertions(+)
>>
>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>> index 5d969340275e..119193166cec 100644
>> --- a/fs/btrfs/extent_io.c
>> +++ b/fs/btrfs/extent_io.c
>> @@ -5232,6 +5232,13 @@ struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info,
>> btrfs_err(fs_info, "bad tree block start %llu", start);
>> return ERR_PTR(-EINVAL);
>> }
>> + if (fs_info->sectorsize < PAGE_SIZE && round_down(start, PAGE_SIZE) !=
>> + round_down(start + len - 1, PAGE_SIZE)) {
>> + btrfs_err(fs_info,
>> + "tree block crosses page boundary, start %llu nodesize %lu",
>> + start, len);
>> + return ERR_PTR(-EINVAL);
>> + }
>>
>> eb = find_extent_buffer(fs_info, start);
>> if (eb)
>>
>
* Re: [PATCH 09/17] btrfs: refactor btrfs_release_extent_buffer_pages()
2020-09-11 11:17 ` Nikolay Borisov
@ 2020-09-11 11:39 ` Qu Wenruo
0 siblings, 0 replies; 49+ messages in thread
From: Qu Wenruo @ 2020-09-11 11:39 UTC (permalink / raw)
To: Nikolay Borisov, linux-btrfs
On 2020/9/11 下午7:17, Nikolay Borisov wrote:
>
>
> On 8.09.20 г. 10:52 ч., Qu Wenruo wrote:
>> We have attach_extent_buffer_page() and it is used in
>> btrfs_clone_extent_buffer() and alloc_extent_buffer().
>>
>> But in btrfs_release_extent_buffer_pages() we manually call
>> detach_page_private().
>>
>> This is fine for current code, but if we're going to support subpage
>> size, we will do a lot more work than just calling
>> detach_page_private().
>>
>> This patch will extract the main work of btrfs_clone_extent_buffer()
>
> Did you mean to type btrfs_release_extent_buffer_pages instead of
> clone_extent_buffer ?
Oh, that's what I meant exactly...
>
>> into detach_extent_buffer_page() so that later subpage size support can
>> put their own code into them.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>> fs/btrfs/extent_io.c | 58 +++++++++++++++++++-------------------------
>> 1 file changed, 25 insertions(+), 33 deletions(-)
>>
>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>> index 3c8fe40f67fa..1cb41dab7a1d 100644
>> --- a/fs/btrfs/extent_io.c
>> +++ b/fs/btrfs/extent_io.c
>> @@ -4920,6 +4920,29 @@ int extent_buffer_under_io(const struct extent_buffer *eb)
>> test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags));
>> }
>>
>> +static void detach_extent_buffer_page(struct extent_buffer *eb,
>> + struct page *page)
>> +{
>> + bool mapped = !test_bit(EXTENT_BUFFER_UNMAPPED, &eb->bflags);
>
> nit: Now you are performing the atomic op once per page rather than once
> per-eb.
Makes sense, I should just extract the for () loop into a function
rather than the loop body.
Thanks,
Qu
>
>> +
>> + if (!page)
>> + return;
>> +
>> + if (mapped)
>> + spin_lock(&page->mapping->private_lock);
>> + if (PagePrivate(page) && page->private == (unsigned long)eb) {
>> + BUG_ON(test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags));
>> + BUG_ON(PageDirty(page));
>> + BUG_ON(PageWriteback(page));
>> + /* We need to make sure we haven't been attached to a new eb. */
>> + detach_page_private(page);
>> + }
>> + if (mapped)
>> + spin_unlock(&page->mapping->private_lock);
>> + /* One for when we allocated the page */
>> + put_page(page);
>> +}
>> +
>> /*
>> * Release all pages attached to the extent buffer.
>> */
>> @@ -4927,43 +4950,12 @@ static void btrfs_release_extent_buffer_pages(struct extent_buffer *eb)
>> {
>> int i;
>> int num_pages;
>> - int mapped = !test_bit(EXTENT_BUFFER_UNMAPPED, &eb->bflags);
>>
>> BUG_ON(extent_buffer_under_io(eb));
>>
>> num_pages = num_extent_pages(eb);
>> - for (i = 0; i < num_pages; i++) {
>> - struct page *page = eb->pages[i];
>> -
>> - if (!page)
>> - continue;
>> - if (mapped)
>> - spin_lock(&page->mapping->private_lock);
>> - /*
>> - * We do this since we'll remove the pages after we've
>> - * removed the eb from the radix tree, so we could race
>> - * and have this page now attached to the new eb. So
>> - * only clear page_private if it's still connected to
>> - * this eb.
>> - */
>> - if (PagePrivate(page) &&
>> - page->private == (unsigned long)eb) {
>> - BUG_ON(test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags));
>> - BUG_ON(PageDirty(page));
>> - BUG_ON(PageWriteback(page));
>> - /*
>> - * We need to make sure we haven't be attached
>> - * to a new eb.
>> - */
>> - detach_page_private(page);
>> - }
>> -
>> - if (mapped)
>> - spin_unlock(&page->mapping->private_lock);
>> -
>> - /* One for when we allocated the page */
>> - put_page(page);
>> - }
>> + for (i = 0; i < num_pages; i++)
>> + detach_extent_buffer_page(eb, eb->pages[i]);
>> }
>>
>> /*
>>
>
* Re: [PATCH 05/17] btrfs: don't allow tree block to cross page boundary for subpage support
2020-09-11 11:36 ` Qu Wenruo
@ 2020-09-11 12:08 ` Nikolay Borisov
0 siblings, 0 replies; 49+ messages in thread
From: Nikolay Borisov @ 2020-09-11 12:08 UTC (permalink / raw)
To: Qu Wenruo, linux-btrfs
On 11.09.20 г. 14:36 ч., Qu Wenruo wrote:
>
>
> On 2020/9/11 下午6:26, Nikolay Borisov wrote:
>>
>>
>> On 8.09.20 г. 10:52 ч., Qu Wenruo wrote:
>>> As a preparation for subpage sector size support (allow sector size
>>> smaller than page size to be mounted), if the sector size is smaller
>>
>> nit: (allowing filesystem with sector size smaller than page size to be
>> mounted)
>>
>>> than page size, we don't allow tree block to be read if it cross page
>>> boundary (normally 64K).
>>
>> Why normally 64k ?
>
> Because there are only two major arches supporting non-4k page size,
> ppc64 which uses 64K page size, and arm which can support 4K, 16K and
> 64K page size.
>
> Currently our plan is only to support 64K page size and 4K page size.
I'd really rather make the support generic than box ourselves into
some "artificial" constraints.
>
> Furthermore, that 64K magic number matches with 64K stripe length of
> btrfs, thus we choose 64K as the boundary.
>
>> I suspect you assume readers are familiar with the
>> motivation for this work which they don't necessarily need to be. Please
>> make your sentences explicit. Correct me if I'm wrong but you are
>> ensuring that an eb doesn't cross the largest possible sectorsize?
>
> That's correct, and by coincidence, it's also the stripe length of btrfs
> RAID, and scrub stripe length too.
>
> (And scrub can't handle such tree block either)
And that by itself is a good reason why we may want to support just 64k.
So state that explicitly.
>
> Thanks,
> Qu
>
<snip>
* Re: [PATCH 11/17] btrfs: extract the extent buffer verification from btree_readpage_end_io_hook()
2020-09-08 7:52 ` [PATCH 11/17] btrfs: extract the extent buffer verification from btree_readpage_end_io_hook() Qu Wenruo
@ 2020-09-11 13:00 ` Nikolay Borisov
0 siblings, 0 replies; 49+ messages in thread
From: Nikolay Borisov @ 2020-09-11 13:00 UTC (permalink / raw)
To: Qu Wenruo, linux-btrfs
On 8.09.20 г. 10:52 ч., Qu Wenruo wrote:
> Currently btree_readpage_end_io_hook() only needs to handle one extent
> buffer, as one page maps to exactly one extent buffer.
>
> But for incoming subpage support, one page can be mapped to multiple
> extent buffers, thus we can no longer use current code.
>
> This refactor would allow us to call btrfs_check_extent_buffer() on
> all involved extent buffers at btree_readpage_end_io_hook() and other
> locations.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
> ---
> fs/btrfs/disk-io.c | 78 ++++++++++++++++++++++++++--------------------
> 1 file changed, 45 insertions(+), 33 deletions(-)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 62dbd9bbd381..f6e562979682 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -574,60 +574,37 @@ static int check_tree_block_fsid(struct extent_buffer *eb)
> return ret;
> }
>
<snip>
> +static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
> + u64 phy_offset, struct page *page,
> + u64 start, u64 end, int mirror)
> +{
> + struct extent_buffer *eb;
> + int ret = 0;
> + int reads_done;
nit: while at it just turn it into a bool.
> +
> + if (!page->private)
> + goto out;
> +
nit: metadata pages are guaranteed to have their ->private set to the
extent buffer pointer, so this check can go away.
> + eb = (struct extent_buffer *)page->private;
> +
> + /*
> + * The pending IO might have been the only thing that kept this buffer
> + * in memory. Make sure we have a ref for all these other checks.
> + */
> + atomic_inc(&eb->refs);
> +
> + reads_done = atomic_dec_and_test(&eb->io_pages);
> + if (!reads_done)
> + goto err;
> +
> + eb->read_mirror = mirror;
> + if (test_bit(EXTENT_BUFFER_READ_ERR, &eb->bflags)) {
> + ret = -EIO;
> + goto err;
> + }
> +
> + ret = btrfs_check_extent_buffer(eb);
> +
> err:
> if (reads_done &&
> test_and_clear_bit(EXTENT_BUFFER_READAHEAD, &eb->bflags))
>
* Re: [PATCH 12/17] btrfs: remove the unnecessary parameter @start and @len for check_data_csum()
2020-09-08 7:52 ` [PATCH 12/17] btrfs: remove the unnecessary parameter @start and @len for check_data_csum() Qu Wenruo
@ 2020-09-11 13:50 ` Nikolay Borisov
0 siblings, 0 replies; 49+ messages in thread
From: Nikolay Borisov @ 2020-09-11 13:50 UTC (permalink / raw)
To: Qu Wenruo, linux-btrfs
On 8.09.20, 10:52, Qu Wenruo wrote:
> For check_data_csum(), the page we're using comes directly from the
> inode mapping, thus it has a valid page_offset().
>
> We can use (page_offset() + pg_off) to replace the @start parameter
> completely, while @len should always be sectorsize.
>
> While we're here, also add some comments, since there is quite some
> confusion around words like start/offset, without explaining whether
> they mean file_offset or logical bytenr.
>
> This should not affect the existing behavior, as for the current
> sectorsize == PAGE_SIZE case, @pgoff should always be 0, and @len is
> always PAGE_SIZE (or sectorsize from the dio read path).
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
<snip>
> @@ -2857,8 +2870,7 @@ static int btrfs_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
> }
>
> phy_offset >>= inode->i_sb->s_blocksize_bits;
> - return check_data_csum(inode, io_bio, phy_offset, page, offset, start,
> - (size_t)(end - start + 1));
> + return check_data_csum(inode, io_bio, phy_offset, page, offset);
offset in this function is calculated as:
size_t offset = start - page_offset(page);
where start is an input parameter derived in end_bio_extent_readpage():
start = page_offset(page);
So it is always 0 and can simply be removed.
> }
>
> /*
<snip>
* Re: [PATCH 13/17] btrfs: extent_io: only require sector size alignment for page read
2020-09-08 7:52 ` [PATCH 13/17] btrfs: extent_io: only require sector size alignment for page read Qu Wenruo
@ 2020-09-11 13:55 ` Nikolay Borisov
2020-09-15 1:54 ` Qu Wenruo
0 siblings, 1 reply; 49+ messages in thread
From: Nikolay Borisov @ 2020-09-11 13:55 UTC (permalink / raw)
To: Qu Wenruo, linux-btrfs
On 8.09.20, 10:52, Qu Wenruo wrote:
> If we're reading a partial page, btrfs will warn about this, as our
> reads/writes are always done in sector size, which equals page size.
>
> But for the incoming subpage RO support, our data read is only aligned
> to sectorsize, which can be smaller than page size.
>
> Thus here we change the warning condition to check against sectorsize,
> so the behavior is unchanged for the regular sectorsize == PAGE_SIZE
> case, while no error is reported for subpage reads.
>
> Also, pass the proper start/end with bv_offset for check_data_csum()
> to handle.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> fs/btrfs/extent_io.c | 19 ++++++++++++-------
> 1 file changed, 12 insertions(+), 7 deletions(-)
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 81e43d99feda..a83b63ecc5f8 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -2819,6 +2819,7 @@ static void end_bio_extent_readpage(struct bio *bio)
> struct page *page = bvec->bv_page;
> struct inode *inode = page->mapping->host;
> struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
> + u32 sectorsize = fs_info->sectorsize;
> bool data_inode = btrfs_ino(BTRFS_I(inode))
> != BTRFS_BTREE_INODE_OBJECTID;
>
> @@ -2829,13 +2830,17 @@ static void end_bio_extent_readpage(struct bio *bio)
> tree = &BTRFS_I(inode)->io_tree;
> failure_tree = &BTRFS_I(inode)->io_failure_tree;
>
> - /* We always issue full-page reads, but if some block
> + /*
> + * We always issue full-sector reads, but if some block
> * in a page fails to read, blk_update_request() will
> * advance bv_offset and adjust bv_len to compensate.
> - * Print a warning for nonzero offsets, and an error
> - * if they don't add up to a full page. */
> - if (bvec->bv_offset || bvec->bv_len != PAGE_SIZE) {
> - if (bvec->bv_offset + bvec->bv_len != PAGE_SIZE)
> + * Print a warning for unaligned offsets, and an error
> + * if they don't add up to a full sector.
> + */
> + if (!IS_ALIGNED(bvec->bv_offset, sectorsize) ||
> + !IS_ALIGNED(bvec->bv_offset + bvec->bv_len, sectorsize)) {
> + if (!IS_ALIGNED(bvec->bv_offset + bvec->bv_len,
> + sectorsize))
Duplicated check ...
> btrfs_err(fs_info,
> "partial page read in btrfs with offset %u and length %u",
> bvec->bv_offset, bvec->bv_len);
> @@ -2845,8 +2850,8 @@ static void end_bio_extent_readpage(struct bio *bio)
> bvec->bv_offset, bvec->bv_len);
> }
>
> - start = page_offset(page);
> - end = start + bvec->bv_offset + bvec->bv_len - 1;
> + start = page_offset(page) + bvec->bv_offset;
> + end = start + bvec->bv_len - 1;
nit: 'start' and 'end' really must be renamed - to file_offset and
file_end - because they represent values in the logical namespace of
the file. And given the context in which they are used, i.e. the endio
handler, where we also deal with extent starts and physical offsets,
such a rename is long overdue. Perhaps you can create a separate patch
when you are resending the series; alternatively I'll make a sweep
across those low-level functions to clean that up.
> len = bvec->bv_len;
>
> mirror = io_bio->mirror_num;
>
* Re: [PATCH 13/17] btrfs: extent_io: only require sector size alignment for page read
2020-09-11 13:55 ` Nikolay Borisov
@ 2020-09-15 1:54 ` Qu Wenruo
0 siblings, 0 replies; 49+ messages in thread
From: Qu Wenruo @ 2020-09-15 1:54 UTC (permalink / raw)
To: Nikolay Borisov, linux-btrfs
On 2020/9/11 9:55 PM, Nikolay Borisov wrote:
>
>
> On 8.09.20, 10:52, Qu Wenruo wrote:
<snip>
>> + if (!IS_ALIGNED(bvec->bv_offset, sectorsize) ||
>> + !IS_ALIGNED(bvec->bv_offset + bvec->bv_len, sectorsize)) {
>> + if (!IS_ALIGNED(bvec->bv_offset + bvec->bv_len,
>> + sectorsize))
>
> Duplicated check ...
BTW, this is not a duplicated check, it's there to distinguish two
different error patterns:
one for a read request which doesn't end at a sector boundary, and the
other for one which doesn't start at a sector boundary.
>
>> btrfs_err(fs_info,
>> "partial page read in btrfs with offset %u and length %u",
>> bvec->bv_offset, bvec->bv_len);
>> @@ -2845,8 +2850,8 @@ static void end_bio_extent_readpage(struct bio *bio)
>> bvec->bv_offset, bvec->bv_len);
>> }
>>
>> - start = page_offset(page);
>> - end = start + bvec->bv_offset + bvec->bv_len - 1;
>> + start = page_offset(page) + bvec->bv_offset;
>> + end = start + bvec->bv_len - 1;
>
> nit: 'start' and 'end' must really be renamed - to file_offset and
> file_end because they represent values in the logical namespace of the
> file. And given the context they are used i.e endio handler where we
> also deal with extent starts and physical offsets such a rename is long
> over due. Perhaps you can create a separate patch when you are
> resending the series alternatively I'll make a sweep across those
> low-level functions to clean that up.
I guess we could do that in another patchset.

The naming really is awful, but there are tons of other similar
situations across the code base.
It would be a big batch of work to properly unify the naming,
and the naming itself will take some time to mature.

We have a lot of different terms which share similar meanings but
are still slightly different:

- bytenr
  the btrfs logical bytenr

- file_offset
  the offset inside a file

And in this particular case, for the btree inode, bytenr == file_offset,
which may make things more complex.
For regular file inodes, on the other hand, file_offset is different
from the extent bytenr.

So we really need to come up with a proper term table for this...

Thanks,
Qu
>
>> len = bvec->bv_len;
>>
>> mirror = io_bio->mirror_num;
>>
>
Thread overview: 49+ messages
2020-09-08 7:52 [PATCH 00/17] btrfs: add read-only support for subpage sector size Qu Wenruo
2020-09-08 7:52 ` [PATCH 01/17] btrfs: extent-io-tests: remove invalid tests Qu Wenruo
2020-09-09 12:26 ` Nikolay Borisov
2020-09-09 13:06 ` Qu Wenruo
2020-09-08 7:52 ` [PATCH 02/17] btrfs: calculate inline extent buffer page size based on page size Qu Wenruo
2020-09-11 9:56 ` Nikolay Borisov
2020-09-11 10:13 ` Qu Wenruo
2020-09-08 7:52 ` [PATCH 03/17] btrfs: remove the open-code to read disk-key Qu Wenruo
2020-09-11 10:07 ` Nikolay Borisov
2020-09-08 7:52 ` [PATCH 04/17] btrfs: make btrfs_fs_info::buffer_radix to take sector size divided values Qu Wenruo
2020-09-08 18:03 ` kernel test robot
2020-09-11 10:11 ` Nikolay Borisov
2020-09-11 10:15 ` Qu Wenruo
2020-09-08 7:52 ` [PATCH 05/17] btrfs: don't allow tree block to cross page boundary for subpage support Qu Wenruo
2020-09-11 10:26 ` Nikolay Borisov
2020-09-11 11:36 ` Qu Wenruo
2020-09-11 12:08 ` Nikolay Borisov
2020-09-08 7:52 ` [PATCH 06/17] btrfs: handle sectorsize < PAGE_SIZE case for extent buffer accessors Qu Wenruo
2020-09-08 7:52 ` [PATCH 07/17] btrfs: make csum_tree_block() handle sectorsize smaller than page size Qu Wenruo
2020-09-11 11:10 ` Nikolay Borisov
2020-09-08 7:52 ` [PATCH 08/17] btrfs: refactor how we extract extent buffer from page for alloc_extent_buffer() Qu Wenruo
2020-09-11 11:14 ` Nikolay Borisov
2020-09-08 7:52 ` [PATCH 09/17] btrfs: refactor btrfs_release_extent_buffer_pages() Qu Wenruo
2020-09-11 11:17 ` Nikolay Borisov
2020-09-11 11:39 ` Qu Wenruo
2020-09-08 7:52 ` [PATCH 10/17] btrfs: add assert_spin_locked() for attach_extent_buffer_page() Qu Wenruo
2020-09-11 11:22 ` Nikolay Borisov
2020-09-08 7:52 ` [PATCH 11/17] btrfs: extract the extent buffer verification from btree_readpage_end_io_hook() Qu Wenruo
2020-09-11 13:00 ` Nikolay Borisov
2020-09-08 7:52 ` [PATCH 12/17] btrfs: remove the unnecessary parameter @start and @len for check_data_csum() Qu Wenruo
2020-09-11 13:50 ` Nikolay Borisov
2020-09-08 7:52 ` [PATCH 13/17] btrfs: extent_io: only require sector size alignment for page read Qu Wenruo
2020-09-11 13:55 ` Nikolay Borisov
2020-09-15 1:54 ` Qu Wenruo
2020-09-08 7:52 ` [PATCH 14/17] btrfs: make btrfs_readpage_end_io_hook() follow sector size Qu Wenruo
2020-09-09 17:34 ` Goldwyn Rodrigues
2020-09-10 0:05 ` Qu Wenruo
2020-09-10 14:26 ` Goldwyn Rodrigues
2020-09-08 7:52 ` [PATCH 15/17] btrfs: introduce subpage_eb_mapping for extent buffers Qu Wenruo
2020-09-08 10:22 ` kernel test robot
2020-09-08 10:22 ` kernel test robot
2020-09-08 12:11 ` kernel test robot
2020-09-08 14:24 ` Dan Carpenter
2020-09-08 14:24 ` Dan Carpenter
2020-09-08 14:24 ` Dan Carpenter
2020-09-08 7:52 ` [PATCH 16/17] btrfs: handle extent buffer verification proper for subpage size Qu Wenruo
2020-09-08 7:52 ` [PATCH 17/17] btrfs: allow RO mount of 4K sector size fs on 64K page system Qu Wenruo
2020-09-08 8:03 ` [PATCH 00/17] btrfs: add read-only support for subpage sector size Qu Wenruo
2020-09-11 10:24 ` Qu Wenruo