* [PATCH] btrfs: relocation: Fix KASAN reports caused by extended reloc tree lifespan
@ 2020-01-04 13:56 Qu Wenruo
2020-01-05 14:49 ` Nikolay Borisov
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Qu Wenruo @ 2020-01-04 13:56 UTC (permalink / raw)
To: linux-btrfs; +Cc: Zygo Blaxell, David Sterba
[BUG]
There are several different KASAN reports for balance + snapshot
workloads.
Involved call paths include:
should_ignore_root+0x54/0xb0 [btrfs]
build_backref_tree+0x11af/0x2280 [btrfs]
relocate_tree_blocks+0x391/0xb80 [btrfs]
relocate_block_group+0x3e5/0xa00 [btrfs]
btrfs_relocate_block_group+0x240/0x4d0 [btrfs]
btrfs_relocate_chunk+0x53/0xf0 [btrfs]
btrfs_balance+0xc91/0x1840 [btrfs]
btrfs_ioctl_balance+0x416/0x4e0 [btrfs]
btrfs_ioctl+0x8af/0x3e60 [btrfs]
do_vfs_ioctl+0x831/0xb10
ksys_ioctl+0x67/0x90
__x64_sys_ioctl+0x43/0x50
do_syscall_64+0x79/0xe0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
create_reloc_root+0x9f/0x460 [btrfs]
btrfs_reloc_post_snapshot+0xff/0x6c0 [btrfs]
create_pending_snapshot+0xa9b/0x15f0 [btrfs]
create_pending_snapshots+0x111/0x140 [btrfs]
btrfs_commit_transaction+0x7a6/0x1360 [btrfs]
btrfs_mksubvol+0x915/0x960 [btrfs]
btrfs_ioctl_snap_create_transid+0x1d5/0x1e0 [btrfs]
btrfs_ioctl_snap_create_v2+0x1d3/0x270 [btrfs]
btrfs_ioctl+0x241b/0x3e60 [btrfs]
do_vfs_ioctl+0x831/0xb10
ksys_ioctl+0x67/0x90
__x64_sys_ioctl+0x43/0x50
do_syscall_64+0x79/0xe0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
btrfs_reloc_pre_snapshot+0x85/0xc0 [btrfs]
create_pending_snapshot+0x209/0x15f0 [btrfs]
create_pending_snapshots+0x111/0x140 [btrfs]
btrfs_commit_transaction+0x7a6/0x1360 [btrfs]
btrfs_mksubvol+0x915/0x960 [btrfs]
btrfs_ioctl_snap_create_transid+0x1d5/0x1e0 [btrfs]
btrfs_ioctl_snap_create_v2+0x1d3/0x270 [btrfs]
btrfs_ioctl+0x241b/0x3e60 [btrfs]
do_vfs_ioctl+0x831/0xb10
ksys_ioctl+0x67/0x90
__x64_sys_ioctl+0x43/0x50
do_syscall_64+0x79/0xe0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
[CAUSE]
All these call sites are only relying on root->reloc_root, which can
undergo btrfs_drop_snapshot(), and since we don't have real refcount
based protection to reloc roots, we can reach already dropped reloc
root, triggering KASAN.
[FIX]
To avoid such access to unstable root->reloc_root, we should check
BTRFS_ROOT_DEAD_RELOC_TREE bit first.
This patch introduces a new wrapper, have_reloc_root(), to do the proper
check for most callers who don't distinguish merged reloc tree and no
reloc tree.
The only exception is should_ignore_root(), as merged reloc tree can be
ignored, while no reloc tree shouldn't.
Also, set_bit()/clear_bit()/test_bit() doesn't imply a memory barrier,
and BTRFS_ROOT_DEAD_RELOC_TREE is the only indicator, also add extra
memory barrier for that bit.
Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Fixes: d2311e698578 ("btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots")
Singed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
Difference between this and David's diff:
- Use proper smp_mb__after_atomic() for clear_bit()
- Use test_bit() only check for should_ignore_root()
That call site is an except, can't go regular have_reloc_root() check
- Add extra comment for have_reloc_root()
---
fs/btrfs/relocation.c | 32 ++++++++++++++++++++++++++++----
1 file changed, 28 insertions(+), 4 deletions(-)
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index d897a8e5e430..586f045bb6dc 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -517,6 +517,23 @@ static int update_backref_cache(struct btrfs_trans_handle *trans,
return 1;
}
+/*
+ * Check if this subvolume tree has valid reloc(*) tree.
+ *
+ * *: Reloc tree after swap is considered dead, thus not considered as valid.
+ * This is enough for most callers, as they don't distinguish dead reloc
+ * root from no reloc root.
+ * But should_ignore_root() below is a special case.
+ */
+static bool have_reloc_root(struct btrfs_root *root)
+{
+ smp_mb__before_atomic();
+ if (test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state))
+ return false;
+ if (!root->reloc_root)
+ return false;
+ return true;
+}
static int should_ignore_root(struct btrfs_root *root)
{
@@ -525,6 +542,11 @@ static int should_ignore_root(struct btrfs_root *root)
if (!test_bit(BTRFS_ROOT_REF_COWS, &root->state))
return 0;
+ /* This root has been merged with its reloc tree, we can ignore it */
+ smp_mb__before_atomic();
+ if (test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state))
+ return 1;
+
reloc_root = root->reloc_root;
if (!reloc_root)
return 0;
@@ -1439,6 +1461,7 @@ int btrfs_init_reloc_root(struct btrfs_trans_handle *trans,
* The subvolume has reloc tree but the swap is finished, no need to
* create/update the dead reloc tree
*/
+ smp_mb__before_atomic();
if (test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state))
return 0;
@@ -1478,8 +1501,7 @@ int btrfs_update_reloc_root(struct btrfs_trans_handle *trans,
struct btrfs_root_item *root_item;
int ret;
- if (test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state) ||
- !root->reloc_root)
+ if (!have_reloc_root(root))
goto out;
reloc_root = root->reloc_root;
@@ -1489,6 +1511,7 @@ int btrfs_update_reloc_root(struct btrfs_trans_handle *trans,
if (fs_info->reloc_ctl->merge_reloc_tree &&
btrfs_root_refs(root_item) == 0) {
set_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state);
+ smp_mb__after_atomic();
__del_reloc_root(reloc_root);
}
@@ -2202,6 +2225,7 @@ static int clean_dirty_subvols(struct reloc_control *rc)
ret = ret2;
}
clear_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state);
+ smp_mb__after_atomic();
btrfs_put_fs_root(root);
} else {
/* Orphan reloc tree, just clean it up */
@@ -4717,7 +4741,7 @@ void btrfs_reloc_pre_snapshot(struct btrfs_pending_snapshot *pending,
struct btrfs_root *root = pending->root;
struct reloc_control *rc = root->fs_info->reloc_ctl;
- if (!root->reloc_root || !rc)
+ if (!rc || !have_reloc_root(root))
return;
if (!rc->merge_reloc_tree)
@@ -4751,7 +4775,7 @@ int btrfs_reloc_post_snapshot(struct btrfs_trans_handle *trans,
struct reloc_control *rc = root->fs_info->reloc_ctl;
int ret;
- if (!root->reloc_root || !rc)
+ if (!rc || !have_reloc_root(root))
return 0;
rc = root->fs_info->reloc_ctl;
--
2.24.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] btrfs: relocation: Fix KASAN reports caused by extended reloc tree lifespan
2020-01-04 13:56 [PATCH] btrfs: relocation: Fix KASAN reports caused by extended reloc tree lifespan Qu Wenruo
@ 2020-01-05 14:49 ` Nikolay Borisov
[not found] ` <b58caea4-476b-bf83-292d-ea71052bbea7@toxicpanda.com>
2020-01-06 18:15 ` [PATCH] btrfs: relocation: Fix KASAN reports caused by extended reloc tree lifespan David Sterba
2 siblings, 0 replies; 9+ messages in thread
From: Nikolay Borisov @ 2020-01-05 14:49 UTC (permalink / raw)
To: Qu Wenruo, linux-btrfs; +Cc: Zygo Blaxell, David Sterba
On 4.01.20 г. 15:56 ч., Qu Wenruo wrote:
> [BUG]
> There are several different KASAN reports for balance + snapshot
> workloads.
> Involved call paths include:
>
> should_ignore_root+0x54/0xb0 [btrfs]
> build_backref_tree+0x11af/0x2280 [btrfs]
> relocate_tree_blocks+0x391/0xb80 [btrfs]
> relocate_block_group+0x3e5/0xa00 [btrfs]
> btrfs_relocate_block_group+0x240/0x4d0 [btrfs]
> btrfs_relocate_chunk+0x53/0xf0 [btrfs]
> btrfs_balance+0xc91/0x1840 [btrfs]
> btrfs_ioctl_balance+0x416/0x4e0 [btrfs]
> btrfs_ioctl+0x8af/0x3e60 [btrfs]
> do_vfs_ioctl+0x831/0xb10
> ksys_ioctl+0x67/0x90
> __x64_sys_ioctl+0x43/0x50
> do_syscall_64+0x79/0xe0
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> create_reloc_root+0x9f/0x460 [btrfs]
> btrfs_reloc_post_snapshot+0xff/0x6c0 [btrfs]
> create_pending_snapshot+0xa9b/0x15f0 [btrfs]
> create_pending_snapshots+0x111/0x140 [btrfs]
> btrfs_commit_transaction+0x7a6/0x1360 [btrfs]
> btrfs_mksubvol+0x915/0x960 [btrfs]
> btrfs_ioctl_snap_create_transid+0x1d5/0x1e0 [btrfs]
> btrfs_ioctl_snap_create_v2+0x1d3/0x270 [btrfs]
> btrfs_ioctl+0x241b/0x3e60 [btrfs]
> do_vfs_ioctl+0x831/0xb10
> ksys_ioctl+0x67/0x90
> __x64_sys_ioctl+0x43/0x50
> do_syscall_64+0x79/0xe0
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> btrfs_reloc_pre_snapshot+0x85/0xc0 [btrfs]
> create_pending_snapshot+0x209/0x15f0 [btrfs]
> create_pending_snapshots+0x111/0x140 [btrfs]
> btrfs_commit_transaction+0x7a6/0x1360 [btrfs]
> btrfs_mksubvol+0x915/0x960 [btrfs]
> btrfs_ioctl_snap_create_transid+0x1d5/0x1e0 [btrfs]
> btrfs_ioctl_snap_create_v2+0x1d3/0x270 [btrfs]
> btrfs_ioctl+0x241b/0x3e60 [btrfs]
> do_vfs_ioctl+0x831/0xb10
> ksys_ioctl+0x67/0x90
> __x64_sys_ioctl+0x43/0x50
> do_syscall_64+0x79/0xe0
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> [CAUSE]
> All these call sites are only relying on root->reloc_root, which can
> undergo btrfs_drop_snapshot(), and since we don't have real refcount
> based protection to reloc roots, we can reach already dropped reloc
> root, triggering KASAN.
>
> [FIX]
> To avoid such access to unstable root->reloc_root, we should check
> BTRFS_ROOT_DEAD_RELOC_TREE bit first.
>
> This patch introduces a new wrapper, have_reloc_root(), to do the proper
> check for most callers who don't distinguish merged reloc tree and no
> reloc tree.
>
> The only exception is should_ignore_root(), as merged reloc tree can be
> ignored, while no reloc tree shouldn't.
>
> Also, set_bit()/clear_bit()/test_bit() doesn't imply a memory barrier,
> and BTRFS_ROOT_DEAD_RELOC_TREE is the only indicator, also add extra
> memory barrier for that bit.
>
> Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
> Fixes: d2311e698578 ("btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots")
> Singed-off-by: David Sterba <dsterba@suse.com>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> Difference between this and David's diff:
> - Use proper smp_mb__after_atomic() for clear_bit()
> - Use test_bit() only check for should_ignore_root()
> That call site is an except, can't go regular have_reloc_root() check
> - Add extra comment for have_reloc_root()
> ---
> fs/btrfs/relocation.c | 32 ++++++++++++++++++++++++++++----
> 1 file changed, 28 insertions(+), 4 deletions(-)
>
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index d897a8e5e430..586f045bb6dc 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -517,6 +517,23 @@ static int update_backref_cache(struct btrfs_trans_handle *trans,
> return 1;
> }
>
> +/*
> + * Check if this subvolume tree has valid reloc(*) tree.
> + *
> + * *: Reloc tree after swap is considered dead, thus not considered as valid.
> + * This is enough for most callers, as they don't distinguish dead reloc
> + * root from no reloc root.
> + * But should_ignore_root() below is a special case.
> + */
> +static bool have_reloc_root(struct btrfs_root *root)
> +{
> + smp_mb__before_atomic();
> + if (test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state))
> + return false;
> + if (!root->reloc_root)
> + return false;
> + return true;
> +}
>
> static int should_ignore_root(struct btrfs_root *root)
> {
> @@ -525,6 +542,11 @@ static int should_ignore_root(struct btrfs_root *root)
> if (!test_bit(BTRFS_ROOT_REF_COWS, &root->state))
> return 0;
>
> + /* This root has been merged with its reloc tree, we can ignore it */
> + smp_mb__before_atomic();
Haven't analyzed the patch deeply but if you add memory barriers you
*must* add comments explaining the ordering guarantees those barriers
provide.
<snip>
^ permalink raw reply [flat|nested] 9+ messages in thread
[parent not found: <b58caea4-476b-bf83-292d-ea71052bbea7@toxicpanda.com>]
* Re: r
[not found] ` <b58caea4-476b-bf83-292d-ea71052bbea7@toxicpanda.com>
@ 2020-01-06 18:04 ` David Sterba
2020-01-06 19:26 ` r Josef Bacik
0 siblings, 1 reply; 9+ messages in thread
From: David Sterba @ 2020-01-06 18:04 UTC (permalink / raw)
To: Josef Bacik; +Cc: Qu Wenruo, linux-btrfs, Zygo Blaxell, David Sterba
On Mon, Jan 06, 2020 at 11:33:51AM -0500, Josef Bacik wrote:
> This took me a minute to figure out, but from what I can tell you are doing the
> mb's around the BTRFS_ROOT_DEAD_RELOC_TREE flag so that in clean_dirty_subvols()
> where we clear the bit and then set root->reloc_root = NULL we are sure to
> either see the bit or that reloc_root == NULL.
>
> That's fine, but man all these random memory barriers around the bit messing
> make 0 sense and confuse the issue, what we really want is the
> smp_mb__after_atomic() in clean_dirty_subvols() and the smp_mb__before_atomic()
> in have_reloc_root().
The barriers around test_bit are required, test_bit could be reordered
as it's not a RMW operation. I suggest reding docs/atomic_t.rst on that
topic.
> But instead since we really want to know the right answer for root->reloc_root,
> and we clear that _before_ we clear the BTRFS_ROOT_DEAD_RELOC_TREE let's just do
> READ_ONCE/WRITE_ONCE everywhere we access the reloc_root. In fact you could just do
But READ/WRITE_ONCE don't guarantee CPU-ordering, only that compiler
will not reload the variable in case it's used more than once.
> static struct btrfs_root get_reloc_root(struct btrfs_root *root)
> {
> if (test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state))
> return NULL;
> return READ_ONCE(root->reloc_root);
Use of READ_ONCE has no effect here and produces the same buggy code as
we have now.
I sent the code to Qu in the previous discussion as work in progress,
with uncommented barriers, expecting that they will be documented in the
final version. So don't blame him, I should have not let barriers
reasoning left only on him. I'll comment under the patch.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: r
2020-01-06 18:04 ` r David Sterba
@ 2020-01-06 19:26 ` Josef Bacik
0 siblings, 0 replies; 9+ messages in thread
From: Josef Bacik @ 2020-01-06 19:26 UTC (permalink / raw)
To: dsterba, Qu Wenruo, linux-btrfs, Zygo Blaxell, David Sterba
On 1/6/20 1:04 PM, David Sterba wrote:
> On Mon, Jan 06, 2020 at 11:33:51AM -0500, Josef Bacik wrote:
>> This took me a minute to figure out, but from what I can tell you are doing the
>> mb's around the BTRFS_ROOT_DEAD_RELOC_TREE flag so that in clean_dirty_subvols()
>> where we clear the bit and then set root->reloc_root = NULL we are sure to
>> either see the bit or that reloc_root == NULL.
>>
>> That's fine, but man all these random memory barriers around the bit messing
>> make 0 sense and confuse the issue, what we really want is the
>> smp_mb__after_atomic() in clean_dirty_subvols() and the smp_mb__before_atomic()
>> in have_reloc_root().
>
> The barriers around test_bit are required, test_bit could be reordered
> as it's not a RMW operation. I suggest reding docs/atomic_t.rst on that
> topic.
>
>> But instead since we really want to know the right answer for root->reloc_root,
>> and we clear that _before_ we clear the BTRFS_ROOT_DEAD_RELOC_TREE let's just do
>> READ_ONCE/WRITE_ONCE everywhere we access the reloc_root. In fact you could just do
>
> But READ/WRITE_ONCE don't guarantee CPU-ordering, only that compiler
> will not reload the variable in case it's used more than once.
>
>> static struct btrfs_root get_reloc_root(struct btrfs_root *root)
>> {
>> if (test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state))
>> return NULL;
>> return READ_ONCE(root->reloc_root);
>
> Use of READ_ONCE has no effect here and produces the same buggy code as
> we have now.
>
Hmm didn't follow smp_read_barrier_depends() all the way down, I assumed it at
least protected from re-ordering. Looks like it only does something on alpha.
> I sent the code to Qu in the previous discussion as work in progress,
> with uncommented barriers, expecting that they will be documented in the
> final version. So don't blame him, I should have not let barriers
> reasoning left only on him. I'll comment under the patch.
>
There's still just too many of them, like I said before we're only worried about
either BTRFS_ROOT_DEAD_RELOC_TREE or !root->reloc_root. So I guess instead do
something like
static struct btrfs_root *get_reloc_root(struct btrfs_root *root)
{
if (test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state))
return NULL;
smp_mb__after_atomic();
return root->reloc_root;
}
And then in clean_dirty_subvols() do the smp_mb__before_atomic() before the
clear_bit. There's no reason for the random mb's around the other test_bit's.
Thanks,
Josef
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] btrfs: relocation: Fix KASAN reports caused by extended reloc tree lifespan
2020-01-04 13:56 [PATCH] btrfs: relocation: Fix KASAN reports caused by extended reloc tree lifespan Qu Wenruo
2020-01-05 14:49 ` Nikolay Borisov
[not found] ` <b58caea4-476b-bf83-292d-ea71052bbea7@toxicpanda.com>
@ 2020-01-06 18:15 ` David Sterba
2020-01-07 2:30 ` Qu Wenruo
2 siblings, 1 reply; 9+ messages in thread
From: David Sterba @ 2020-01-06 18:15 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs, Zygo Blaxell, David Sterba
On Sat, Jan 04, 2020 at 09:56:02PM +0800, Qu Wenruo wrote:
> }
>
> +/*
> + * Check if this subvolume tree has valid reloc(*) tree.
> + *
> + * *: Reloc tree after swap is considered dead, thus not considered as valid.
> + * This is enough for most callers, as they don't distinguish dead reloc
> + * root from no reloc root.
> + * But should_ignore_root() below is a special case.
> + */
> +static bool have_reloc_root(struct btrfs_root *root)
> +{
> + smp_mb__before_atomic();
That one should be the easiest, to get an up to date value of the bit,
sync before reading it. Similar to smp_rmb.
> + if (test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state))
> + return false;
> + if (!root->reloc_root)
> + return false;
> + return true;
> +}
>
> static int should_ignore_root(struct btrfs_root *root)
> {
> @@ -525,6 +542,11 @@ static int should_ignore_root(struct btrfs_root *root)
> if (!test_bit(BTRFS_ROOT_REF_COWS, &root->state))
> return 0;
>
> + /* This root has been merged with its reloc tree, we can ignore it */
> + smp_mb__before_atomic();
This could be replaced by have_reloc_root but the reloc_root has to be
check twice in that function. Here it was slightly optimized as it
partially opencodes have_reloc_root. For clarity and fewer standalone
barriers using the helper might be better.
> + if (test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state))
> + return 1;
> +
> reloc_root = root->reloc_root;
> if (!reloc_root)
> return 0;
> @@ -1439,6 +1461,7 @@ int btrfs_init_reloc_root(struct btrfs_trans_handle *trans,
> * The subvolume has reloc tree but the swap is finished, no need to
> * create/update the dead reloc tree
> */
> + smp_mb__before_atomic();
Another partial have_reloc_root, could be used here as well with
additional reloc_tree check.
> if (test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state))
> return 0;
>
> @@ -1478,8 +1501,7 @@ int btrfs_update_reloc_root(struct btrfs_trans_handle *trans,
> struct btrfs_root_item *root_item;
> int ret;
>
> - if (test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state) ||
> - !root->reloc_root)
> + if (!have_reloc_root(root))
> goto out;
>
> reloc_root = root->reloc_root;
> @@ -1489,6 +1511,7 @@ int btrfs_update_reloc_root(struct btrfs_trans_handle *trans,
> if (fs_info->reloc_ctl->merge_reloc_tree &&
> btrfs_root_refs(root_item) == 0) {
> set_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state);
First set the bit, so anybody who properly uses barriers before checking
the bit will see it set
> + smp_mb__after_atomic();
since the reloc_root pointer is not safe to be accessed since this point
> __del_reloc_root(reloc_root);
> }
>
> @@ -2202,6 +2225,7 @@ static int clean_dirty_subvols(struct reloc_control *rc)
> ret = ret2;
> }
> clear_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state);
This one looks misplaced and reverse, root->reloc_root is set to NULL a
few lines before and the barrier must be between this and clear_bit.
This was not in my proposed version, why did you change that?
> + smp_mb__after_atomic();
> btrfs_put_fs_root(root);
> } else {
> /* Orphan reloc tree, just clean it up */
> @@ -4717,7 +4741,7 @@ void btrfs_reloc_pre_snapshot(struct btrfs_pending_snapshot *pending,
> struct btrfs_root *root = pending->root;
> struct reloc_control *rc = root->fs_info->reloc_ctl;
>
> - if (!root->reloc_root || !rc)
> + if (!rc || !have_reloc_root(root))
> return;
>
> if (!rc->merge_reloc_tree)
> @@ -4751,7 +4775,7 @@ int btrfs_reloc_post_snapshot(struct btrfs_trans_handle *trans,
> struct reloc_control *rc = root->fs_info->reloc_ctl;
> int ret;
>
> - if (!root->reloc_root || !rc)
> + if (!rc || !have_reloc_root(root))
> return 0;
>
> rc = root->fs_info->reloc_ctl;
> --
> 2.24.1
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] btrfs: relocation: Fix KASAN reports caused by extended reloc tree lifespan
2020-01-06 18:15 ` [PATCH] btrfs: relocation: Fix KASAN reports caused by extended reloc tree lifespan David Sterba
@ 2020-01-07 2:30 ` Qu Wenruo
2020-01-07 2:35 ` Qu Wenruo
0 siblings, 1 reply; 9+ messages in thread
From: Qu Wenruo @ 2020-01-07 2:30 UTC (permalink / raw)
To: dsterba, Qu Wenruo, linux-btrfs, Zygo Blaxell, David Sterba
[-- Attachment #1.1: Type: text/plain, Size: 5319 bytes --]
On 2020/1/7 上午2:15, David Sterba wrote:
> On Sat, Jan 04, 2020 at 09:56:02PM +0800, Qu Wenruo wrote:
>> }
>>
>> +/*
>> + * Check if this subvolume tree has valid reloc(*) tree.
>> + *
>> + * *: Reloc tree after swap is considered dead, thus not considered as valid.
>> + * This is enough for most callers, as they don't distinguish dead reloc
>> + * root from no reloc root.
>> + * But should_ignore_root() below is a special case.
>> + */
>> +static bool have_reloc_root(struct btrfs_root *root)
>> +{
>> + smp_mb__before_atomic();
>
> That one should be the easiest, to get an up to date value of the bit,
> sync before reading it. Similar to smp_rmb.
Yep, if we just go plain rmb/wmb everything would be much easier to
understand.
But since full rmb/wmb is expensive and in this case we're only address
certain arches which doesn't follower Total Store Order, we still need
to use that variant.
One more question.
Why not use before_atomic() and after_atomic() to surround the
set_bit()/test_bit()?
If before and after acts as rmb/wmb, then we don't really need this
before_atomic call aroudn test_bit()
>
>> + if (test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state))
>> + return false;
>> + if (!root->reloc_root)
>> + return false;
>> + return true;
>> +}
>>
>> static int should_ignore_root(struct btrfs_root *root)
>> {
>> @@ -525,6 +542,11 @@ static int should_ignore_root(struct btrfs_root *root)
>> if (!test_bit(BTRFS_ROOT_REF_COWS, &root->state))
>> return 0;
>>
>> + /* This root has been merged with its reloc tree, we can ignore it */
>> + smp_mb__before_atomic();
>
> This could be replaced by have_reloc_root but the reloc_root has to be
> check twice in that function. Here it was slightly optimized as it
> partially opencodes have_reloc_root. For clarity and fewer standalone
> barriers using the helper might be better.
The problem is, have_reloc_root() returns false if either:
- DEAD_RELOC_TREE is set
- no reloc_root
What we really want is, if bit set, return 1, but if no reloc root,
return 0 instead.
So we can't use that helper at all.
>
>> + if (test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state))
>> + return 1;
>> +
>> reloc_root = root->reloc_root;
>> if (!reloc_root)
>> return 0;
>> @@ -1439,6 +1461,7 @@ int btrfs_init_reloc_root(struct btrfs_trans_handle *trans,
>> * The subvolume has reloc tree but the swap is finished, no need to
>> * create/update the dead reloc tree
>> */
>> + smp_mb__before_atomic();
>
> Another partial have_reloc_root, could be used here as well with
> additional reloc_tree check.
Same problem as should_ignore_root().
Or we need to do extra reloc_root out of the helper.
>
>> if (test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state))
>> return 0;
>>
>> @@ -1478,8 +1501,7 @@ int btrfs_update_reloc_root(struct btrfs_trans_handle *trans,
>> struct btrfs_root_item *root_item;
>> int ret;
>>
>> - if (test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state) ||
>> - !root->reloc_root)
>> + if (!have_reloc_root(root))
>> goto out;
>>
>> reloc_root = root->reloc_root;
>> @@ -1489,6 +1511,7 @@ int btrfs_update_reloc_root(struct btrfs_trans_handle *trans,
>> if (fs_info->reloc_ctl->merge_reloc_tree &&
>> btrfs_root_refs(root_item) == 0) {
>> set_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state);
>
> First set the bit, so anybody who properly uses barriers before checking
> the bit will see it set
Still the same question, why not use before and after version around
set_bit()/clear_bit() so test_bit() doesn't need extra before_atomic call?
>
>> + smp_mb__after_atomic();
>
> since the reloc_root pointer is not safe to be accessed since this point
>
>> __del_reloc_root(reloc_root);
>> }
>>
>> @@ -2202,6 +2225,7 @@ static int clean_dirty_subvols(struct reloc_control *rc)
>> ret = ret2;
>> }
>> clear_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state);
>
> This one looks misplaced and reverse, root->reloc_root is set to NULL a
> few lines before and the barrier must be between this and clear_bit.
Got the point.
> This was not in my proposed version, why did you change that?
I thought the clear_bit() must be visible for all later operations, thus
the after_atomic() is needed.
But forgot the reloc_root is set to NULL.
So in that case, I guess we need both barriers.
Thanks,
Qu
>
>
>
>> + smp_mb__after_atomic();
>> btrfs_put_fs_root(root);
>> } else {
>> /* Orphan reloc tree, just clean it up */
>> @@ -4717,7 +4741,7 @@ void btrfs_reloc_pre_snapshot(struct btrfs_pending_snapshot *pending,
>> struct btrfs_root *root = pending->root;
>> struct reloc_control *rc = root->fs_info->reloc_ctl;
>>
>> - if (!root->reloc_root || !rc)
>> + if (!rc || !have_reloc_root(root))
>> return;
>>
>> if (!rc->merge_reloc_tree)
>> @@ -4751,7 +4775,7 @@ int btrfs_reloc_post_snapshot(struct btrfs_trans_handle *trans,
>> struct reloc_control *rc = root->fs_info->reloc_ctl;
>> int ret;
>>
>> - if (!root->reloc_root || !rc)
>> + if (!rc || !have_reloc_root(root))
>> return 0;
>>
>> rc = root->fs_info->reloc_ctl;
>> --
>> 2.24.1
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 520 bytes --]
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] btrfs: relocation: Fix KASAN reports caused by extended reloc tree lifespan
2020-01-07 2:30 ` Qu Wenruo
@ 2020-01-07 2:35 ` Qu Wenruo
0 siblings, 0 replies; 9+ messages in thread
From: Qu Wenruo @ 2020-01-07 2:35 UTC (permalink / raw)
To: dsterba, Qu Wenruo, linux-btrfs, Zygo Blaxell, David Sterba
[-- Attachment #1.1: Type: text/plain, Size: 5612 bytes --]
On 2020/1/7 上午10:30, Qu Wenruo wrote:
>
>
> On 2020/1/7 上午2:15, David Sterba wrote:
>> On Sat, Jan 04, 2020 at 09:56:02PM +0800, Qu Wenruo wrote:
>>> }
>>>
>>> +/*
>>> + * Check if this subvolume tree has valid reloc(*) tree.
>>> + *
>>> + * *: Reloc tree after swap is considered dead, thus not considered as valid.
>>> + * This is enough for most callers, as they don't distinguish dead reloc
>>> + * root from no reloc root.
>>> + * But should_ignore_root() below is a special case.
>>> + */
>>> +static bool have_reloc_root(struct btrfs_root *root)
>>> +{
>>> + smp_mb__before_atomic();
>>
>> That one should be the easiest, to get an up to date value of the bit,
>> sync before reading it. Similar to smp_rmb.
>
> Yep, if we just go plain rmb/wmb everything would be much easier to
> understand.
>
> But since full rmb/wmb is expensive and in this case we're only address
> certain arches which doesn't follower Total Store Order, we still need
> to use that variant.
>
>
> One more question.
>
> Why not use before_atomic() and after_atomic() to surround the
> set_bit()/test_bit()?
Typo, it's set_bit()/clear_bit().
>
> If before and after acts as rmb/wmb, then we don't really need this
> before_atomic call aroudn test_bit()
>
>>
>>> + if (test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state))
>>> + return false;
>>> + if (!root->reloc_root)
>>> + return false;
>>> + return true;
>>> +}
>>>
>>> static int should_ignore_root(struct btrfs_root *root)
>>> {
>>> @@ -525,6 +542,11 @@ static int should_ignore_root(struct btrfs_root *root)
>>> if (!test_bit(BTRFS_ROOT_REF_COWS, &root->state))
>>> return 0;
>>>
>>> + /* This root has been merged with its reloc tree, we can ignore it */
>>> + smp_mb__before_atomic();
>>
>> This could be replaced by have_reloc_root but the reloc_root has to be
>> check twice in that function. Here it was slightly optimized as it
>> partially opencodes have_reloc_root. For clarity and fewer standalone
>> barriers using the helper might be better.
>
> The problem is, have_reloc_root() returns false if either:
> - DEAD_RELOC_TREE is set
> - no reloc_root
>
> What we really want is, if bit set, return 1, but if no reloc root,
> return 0 instead.
>
> So we can't use that helper at all.
>
>>
>>> + if (test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state))
>>> + return 1;
>>> +
>>> reloc_root = root->reloc_root;
>>> if (!reloc_root)
>>> return 0;
>>> @@ -1439,6 +1461,7 @@ int btrfs_init_reloc_root(struct btrfs_trans_handle *trans,
>>> * The subvolume has reloc tree but the swap is finished, no need to
>>> * create/update the dead reloc tree
>>> */
>>> + smp_mb__before_atomic();
>>
>> Another partial have_reloc_root, could be used here as well with
>> additional reloc_tree check.
>
> Same problem as should_ignore_root().
> Or we need to do extra reloc_root out of the helper.
>
>>
>>> if (test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state))
>>> return 0;
>>>
>>> @@ -1478,8 +1501,7 @@ int btrfs_update_reloc_root(struct btrfs_trans_handle *trans,
>>> struct btrfs_root_item *root_item;
>>> int ret;
>>>
>>> - if (test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state) ||
>>> - !root->reloc_root)
>>> + if (!have_reloc_root(root))
>>> goto out;
>>>
>>> reloc_root = root->reloc_root;
>>> @@ -1489,6 +1511,7 @@ int btrfs_update_reloc_root(struct btrfs_trans_handle *trans,
>>> if (fs_info->reloc_ctl->merge_reloc_tree &&
>>> btrfs_root_refs(root_item) == 0) {
>>> set_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state);
>>
>> First set the bit, so anybody who properly uses barriers before checking
>> the bit will see it set
>
> Still the same question, why not use before and after version around
> set_bit()/clear_bit() so test_bit() doesn't need extra before_atomic call?
>>
>>> + smp_mb__after_atomic();
>>
>> since the reloc_root pointer is not safe to be accessed since this point
>>
>>> __del_reloc_root(reloc_root);
>>> }
>>>
>>> @@ -2202,6 +2225,7 @@ static int clean_dirty_subvols(struct reloc_control *rc)
>>> ret = ret2;
>>> }
>>> clear_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state);
>>
>> This one looks misplaced and reverse, root->reloc_root is set to NULL a
>> few lines before and the barrier must be between this and clear_bit.
>
> Got the point.
>
>> This was not in my proposed version, why did you change that?
>
> I thought the clear_bit() must be visible for all later operations, thus
> the after_atomic() is needed.
> But forgot the reloc_root is set to NULL.
>
> So in that case, I guess we need both barriers.
>
> Thanks,
> Qu
>
>>
>>
>>
>>> + smp_mb__after_atomic();
>>> btrfs_put_fs_root(root);
>>> } else {
>>> /* Orphan reloc tree, just clean it up */
>>> @@ -4717,7 +4741,7 @@ void btrfs_reloc_pre_snapshot(struct btrfs_pending_snapshot *pending,
>>> struct btrfs_root *root = pending->root;
>>> struct reloc_control *rc = root->fs_info->reloc_ctl;
>>>
>>> - if (!root->reloc_root || !rc)
>>> + if (!rc || !have_reloc_root(root))
>>> return;
>>>
>>> if (!rc->merge_reloc_tree)
>>> @@ -4751,7 +4775,7 @@ int btrfs_reloc_post_snapshot(struct btrfs_trans_handle *trans,
>>> struct reloc_control *rc = root->fs_info->reloc_ctl;
>>> int ret;
>>>
>>> - if (!root->reloc_root || !rc)
>>> + if (!rc || !have_reloc_root(root))
>>> return 0;
>>>
>>> rc = root->fs_info->reloc_ctl;
>>> --
>>> 2.24.1
>
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 520 bytes --]
^ permalink raw reply [flat|nested] 9+ messages in thread
* freevxfs: hp-ux support. patchset 1-7/7
@ 2016-05-26 14:45 Krzysztof Błaszkowski
2016-05-26 15:53 ` r Christoph Hellwig
0 siblings, 1 reply; 9+ messages in thread
From: Krzysztof Błaszkowski @ 2016-05-26 14:45 UTC (permalink / raw)
To: Christoph Hellwig, Carlos Maiolino; +Cc: linux-fsdevel
Hi,
So then let it roll.
Thanks
>From 3d3b4e1ed5df014ae191e0566ff86a17d7d9ac05 Mon Sep 17 00:00:00 2001
From: KB <kb@sysmikro.com.pl>
Date: Wed, 25 May 2016 21:50:11 +0200
Subject: [PATCH 1/7] kconfig note
Signed-off-by: KB <kb@sysmikro.com.pl>
---
fs/freevxfs/Kconfig | 10 +++++++++-
1 files changed, 9 insertions(+), 1 deletions(-)
diff --git a/fs/freevxfs/Kconfig b/fs/freevxfs/Kconfig
index 8dc1cd5..a4c9075 100644
--- a/fs/freevxfs/Kconfig
+++ b/fs/freevxfs/Kconfig
@@ -5,12 +5,20 @@ config VXFS_FS
FreeVxFS is a file system driver that support the VERITAS VxFS(TM)
file system format. VERITAS VxFS(TM) is the standard file system
of SCO UnixWare (and possibly others) and optionally available
- for Sunsoft Solaris, HP-UX and many other operating systems.
+ for Sunsoft Solaris, HP-UX and many other operating systems. However
+ these particular OS implementations of vxfs may differ in on-disk
+ data endianess and/or superblock offset. The vxfs module has been
+ tested with SCO UnixWare and HP-UX B.10.20 (pa-risc 1.1 arch.)
Currently only readonly access is supported.
NOTE: the file system type as used by mount(1), mount(2) and
fstab(5) is 'vxfs' as it describes the file system format, not
the actual driver.
+ There is a userspace utility for HP-UX logical volumes which makes
+ creating HP-UX logical volumes easy from HP-UX disk block device file
+ or regular file with image of the disk. See:
+ https://sourceforge.net/projects/linux-vxfs/
+
To compile this as a module, choose M here: the module will be
called freevxfs. If unsure, say N.
--
1.7.3.4
>From c617f6bceedc2f68c62e7432f12b59124bab34f7 Mon Sep 17 00:00:00 2001
From: KB <kb@sysmikro.com.pl>
Date: Wed, 25 May 2016 22:12:05 +0200
Subject: [PATCH 2/7] cpu endian vs file system endian
Signed-off-by: KB <kb@sysmikro.com.pl>
---
fs/freevxfs/vxfs.h | 25 +++++++++++-
fs/freevxfs/vxfs_bmap.c | 24 ++++++-----
fs/freevxfs/vxfs_fshead.c | 25 +++++++++++-
fs/freevxfs/vxfs_inode.c | 61 +++++++++++++++++++++++++++-
fs/freevxfs/vxfs_inode.h | 8 ++--
fs/freevxfs/vxfs_lookup.c | 15 ++++++-
fs/freevxfs/vxfs_olt.c | 15 ++++---
fs/freevxfs/vxfs_super.c | 97 ++++++++++++++++++++++++++++++++-------------
8 files changed, 215 insertions(+), 55 deletions(-)
diff --git a/fs/freevxfs/vxfs.h b/fs/freevxfs/vxfs.h
index c8a9265..5dc8949 100644
--- a/fs/freevxfs/vxfs.h
+++ b/fs/freevxfs/vxfs.h
@@ -152,7 +152,7 @@ struct vxfs_sb {
/*
* Actually much more...
*/
-};
+} __packed;
/*
@@ -168,9 +168,32 @@ struct vxfs_sb_info {
ino_t vsi_fshino; /* fileset header inode */
daddr_t vsi_oltext; /* OLT extent */
daddr_t vsi_oltsize; /* OLT size */
+ int byte_order;
+ int silent;
+};
+
+enum {
+ BO_LE = 1,
+ BO_BE
};
+static inline u32 fs32_to_cpu(int bo, u32 a)
+{
+ return (bo == BO_BE) ? be32_to_cpu(a) : le32_to_cpu(a);
+}
+
+static inline u16 fs16_to_cpu(int bo, u16 a)
+{
+ return (bo == BO_BE) ? be16_to_cpu(a) : le16_to_cpu(a);
+}
+
+static inline u64 fs64_to_cpu(int bo, u64 a)
+{
+ return (bo == BO_BE) ? be64_to_cpu(a) : le64_to_cpu(a);
+}
+
+
/*
* File modes. File types above 0xf000 are vxfs internal only, they should
* not be passed back to higher levels of the system. vxfs file types must
diff --git a/fs/freevxfs/vxfs_bmap.c b/fs/freevxfs/vxfs_bmap.c
index f86fd3c..95afd98 100644
--- a/fs/freevxfs/vxfs_bmap.c
+++ b/fs/freevxfs/vxfs_bmap.c
@@ -92,7 +92,8 @@ vxfs_bmap_ext4(struct inode *ip, long bn)
goto fail_buf;
indir = (u32 *)buf->b_data;
- bno = indir[(bn/indsize) % (indsize*bn)] + (bn%indsize);
+ bno = fs32_to_cpu(VXFS_SBI(sb)->byte_order,
+ indir[(bn/indsize) % (indsize*bn)]) + (bn % indsize);
brelse(buf);
return bno;
@@ -130,6 +131,7 @@ vxfs_bmap_indir(struct inode *ip, long indir, int size, long block)
struct buffer_head *bp = NULL;
daddr_t pblock = 0;
int i;
+ int bo = VXFS_SBI(ip->i_sb)->byte_order;
for (i = 0; i < size * VXFS_TYPED_PER_BLOCK(ip->i_sb); i++) {
struct vxfs_typed *typ;
@@ -142,24 +144,24 @@ vxfs_bmap_indir(struct inode *ip, long indir, int size, long block)
typ = ((struct vxfs_typed *)bp->b_data) +
(i % VXFS_TYPED_PER_BLOCK(ip->i_sb));
- off = (typ->vt_hdr & VXFS_TYPED_OFFSETMASK);
+ off = fs64_to_cpu(bo, typ->vt_hdr) & VXFS_TYPED_OFFSETMASK;
if (block < off) {
brelse(bp);
continue;
}
- switch ((u_int32_t)(typ->vt_hdr >> VXFS_TYPED_TYPESHIFT)) {
+ switch ((u_int32_t)(fs64_to_cpu(bo, typ->vt_hdr) >> VXFS_TYPED_TYPESHIFT)) {
case VXFS_TYPED_INDIRECT:
- pblock = vxfs_bmap_indir(ip, typ->vt_block,
- typ->vt_size, block - off);
+ pblock = vxfs_bmap_indir(ip, fs32_to_cpu(bo, typ->vt_block),
+ fs32_to_cpu(bo, typ->vt_size), block - off);
if (pblock == -2)
break;
goto out;
case VXFS_TYPED_DATA:
- if ((block - off) >= typ->vt_size)
+ if ((block - off) >= fs32_to_cpu(bo, typ->vt_size))
break;
- pblock = (typ->vt_block + block - off);
+ pblock = fs32_to_cpu(bo, typ->vt_block) + block - off;
goto out;
case VXFS_TYPED_INDIRECT_DEV4:
case VXFS_TYPED_DATA_DEV4: {
@@ -168,12 +170,14 @@ vxfs_bmap_indir(struct inode *ip, long indir, int size, long block)
printk(KERN_INFO "\n\nTYPED_DEV4 detected!\n");
printk(KERN_INFO "block: %Lu\tsize: %Ld\tdev: %d\n",
- (unsigned long long) typ4->vd4_block,
- (unsigned long long) typ4->vd4_size,
- typ4->vd4_dev);
+ (unsigned long long) fs64_to_cpu(bo, typ4->vd4_block),
+ (unsigned long long) fs64_to_cpu(bo, typ4->vd4_size),
+ fs32_to_cpu(bo, typ4->vd4_dev));
goto fail;
}
default:
+ printk(KERN_ERR "%s:%d vt_hdr %llu\n", __func__, __LINE__,
+ fs64_to_cpu(bo, typ->vt_hdr));
BUG();
}
brelse(bp);
diff --git a/fs/freevxfs/vxfs_fshead.c b/fs/freevxfs/vxfs_fshead.c
index c9a6a94..6cbdde7 100644
--- a/fs/freevxfs/vxfs_fshead.c
+++ b/fs/freevxfs/vxfs_fshead.c
@@ -60,6 +60,29 @@ vxfs_dumpfsh(struct vxfs_fsh *fhp)
}
#endif
+#define VXFS_FS32(field1, field2) fhp->field1 = fs32_to_cpu(bo, dbh->field2)
+static inline void dbh2fhp(struct vxfs_fsh *fhp, void *_dbh, int bo)
+{
+ struct vxfs_fsh *dbh = (struct vxfs_fsh *)_dbh;
+
+ VXFS_FS32(fsh_version, fsh_version);
+ VXFS_FS32(fsh_fsindex, fsh_fsindex);
+ VXFS_FS32(fsh_time, fsh_time);
+ VXFS_FS32(fsh_utime, fsh_utime);
+ VXFS_FS32(fsh_extop, fsh_extop);
+ VXFS_FS32(fsh_ninodes, fsh_ninodes);
+ VXFS_FS32(fsh_nau, fsh_nau);
+ VXFS_FS32(fsh_old_ilesize, fsh_old_ilesize);
+ VXFS_FS32(fsh_dflags, fsh_dflags);
+ VXFS_FS32(fsh_quota, fsh_quota);
+ VXFS_FS32(fsh_maxinode, fsh_maxinode);
+ VXFS_FS32(fsh_iauino, fsh_iauino);
+ VXFS_FS32(fsh_ilistino[0], fsh_ilistino[0]);
+ VXFS_FS32(fsh_ilistino[1], fsh_ilistino[1]);
+ VXFS_FS32(fsh_lctino, fsh_lctino);
+}
+
+
/**
* vxfs_getfsh - read fileset header into memory
* @ip: the (fake) fileset header inode
@@ -83,7 +106,7 @@ vxfs_getfsh(struct inode *ip, int which)
if (!(fhp = kmalloc(sizeof(*fhp), GFP_KERNEL)))
goto out;
- memcpy(fhp, bp->b_data, sizeof(*fhp));
+ dbh2fhp(fhp, bp->b_data, VXFS_SBI(ip->i_sb)->byte_order);
put_bh(bp);
return (fhp);
diff --git a/fs/freevxfs/vxfs_inode.c b/fs/freevxfs/vxfs_inode.c
index 363e3ae..53b8757 100644
--- a/fs/freevxfs/vxfs_inode.c
+++ b/fs/freevxfs/vxfs_inode.c
@@ -67,6 +67,63 @@ vxfs_dumpi(struct vxfs_inode_info *vip, ino_t ino)
}
#endif
+#define VXFS_FS32(field1, field2) vip->field1 = fs32_to_cpu(bo, dip->field2)
+#define VXFS_FS64(field1, field2) vip->field1 = fs64_to_cpu(bo, dip->field2)
+#define VXFS_FS16(field1, field2) vip->field1 = fs16_to_cpu(bo, dip->field2)
+
+static inline void dip2vip_cpy(struct vxfs_inode_info *vip, struct vxfs_dinode *dip, int bo)
+{
+ int j;
+
+ VXFS_FS32(vdi_mode, vdi_mode);
+ VXFS_FS32(vdi_nlink, vdi_nlink);
+ VXFS_FS32(vdi_uid, vdi_uid);
+ VXFS_FS32(vdi_gid, vdi_gid);
+ VXFS_FS64(vdi_size, vdi_size);
+ VXFS_FS32(vdi_atime, vdi_atime);
+ VXFS_FS32(vdi_autime, vdi_autime);
+ VXFS_FS32(vdi_mtime, vdi_mtime);
+ VXFS_FS32(vdi_mutime, vdi_mutime);
+ VXFS_FS32(vdi_ctime, vdi_ctime);
+ VXFS_FS32(vdi_cutime, vdi_cutime);
+ vip->vdi_aflags = dip->vdi_aflags;
+ vip->vdi_orgtype = dip->vdi_orgtype;
+ VXFS_FS16(vdi_eopflags, vdi_eopflags);
+ VXFS_FS32(vdi_eopdata, vdi_eopdata);
+
+ VXFS_FS32(vdi_ftarea.i_regular.reserved, vdi_ftarea.i_regular.reserved);
+ VXFS_FS32(vdi_ftarea.i_regular.fixextsize, vdi_ftarea.i_regular.fixextsize);
+ VXFS_FS32(vdi_blocks, vdi_blocks);
+ VXFS_FS32(vdi_gen, vdi_gen);
+ VXFS_FS64(vdi_version, vdi_version);
+
+ switch (dip->vdi_orgtype) {
+ case VXFS_ORG_EXT4:
+ VXFS_FS32(vdi_org.ext4.ve4_spare, vdi_org.ext4.ve4_spare);
+ VXFS_FS32(vdi_org.ext4.ve4_indsize, vdi_org.ext4.ve4_indsize);
+ for (j = 0; j < VXFS_NIADDR; j++) {
+ VXFS_FS32(vdi_org.ext4.ve4_indir[j], vdi_org.ext4.ve4_indir[j]);
+ }
+ for (j = 0; j < VXFS_NDADDR; j++) {
+ VXFS_FS32(vdi_org.ext4.ve4_direct[j].extent, vdi_org.ext4.ve4_direct[j].extent);
+ VXFS_FS32(vdi_org.ext4.ve4_direct[j].size, vdi_org.ext4.ve4_direct[j].size);
+ }
+ break;
+ case VXFS_ORG_IMMED:
+ memcpy(&vip->vdi_org.immed, &dip->vdi_org.immed, sizeof(vip->vdi_org.immed));
+ break;
+ case VXFS_ORG_TYPED:
+ for (j = 0; j < VXFS_NTYPED; j++) {
+ VXFS_FS64(vdi_org.typed[j].vt_hdr, vdi_org.typed[j].vt_hdr);
+ VXFS_FS32(vdi_org.typed[j].vt_block, vdi_org.typed[j].vt_block);
+ VXFS_FS32(vdi_org.typed[j].vt_size, vdi_org.typed[j].vt_size);
+ }
+ break;
+ };
+
+ VXFS_FS32(vdi_iattrino, vdi_iattrino);
+}
+
/**
* vxfs_blkiget - find inode based on extent #
@@ -101,7 +158,7 @@ vxfs_blkiget(struct super_block *sbp, u_long extent, ino_t ino)
if (!(vip = kmem_cache_alloc(vxfs_inode_cachep, GFP_KERNEL)))
goto fail;
dip = (struct vxfs_dinode *)(bp->b_data + offset);
- memcpy(vip, dip, sizeof(*vip));
+ dip2vip_cpy(vip, dip, VXFS_SBI(sbp)->byte_order);
#ifdef DIAGNOSTIC
vxfs_dumpi(vip, ino);
#endif
@@ -143,7 +200,7 @@ __vxfs_iget(ino_t ino, struct inode *ilistp)
if (!(vip = kmem_cache_alloc(vxfs_inode_cachep, GFP_KERNEL)))
goto fail;
dip = (struct vxfs_dinode *)(kaddr + offset);
- memcpy(vip, dip, sizeof(*vip));
+ dip2vip_cpy(vip, dip, VXFS_SBI(ilistp->i_sb)->byte_order);
#ifdef DIAGNOSTIC
vxfs_dumpi(vip, ino);
#endif
diff --git a/fs/freevxfs/vxfs_inode.h b/fs/freevxfs/vxfs_inode.h
index 240aeb1..9a2c376 100644
--- a/fs/freevxfs/vxfs_inode.h
+++ b/fs/freevxfs/vxfs_inode.h
@@ -77,13 +77,13 @@ struct vxfs_ext4 {
vx_daddr_t extent; /* Extent number */
int32_t size; /* Size of extent */
} ve4_direct[VXFS_NDADDR];
-};
+} __packed;
struct vxfs_typed {
u_int64_t vt_hdr; /* Header, 0xTTOOOOOOOOOOOOOO; T=type,O=offs */
vx_daddr_t vt_block; /* Extent block */
int32_t vt_size; /* Size in blocks */
-};
+} __packed;
struct vxfs_typed_dev4 {
u_int64_t vd4_hdr; /* Header, 0xTTOOOOOOOOOOOOOO; T=type,O=offs */
@@ -91,7 +91,7 @@ struct vxfs_typed_dev4 {
u_int64_t vd4_size; /* Size in blocks */
int32_t vd4_dev; /* Device ID */
u_int32_t __pad1;
-};
+} __packed;
/*
* The inode as contained on the physical device.
@@ -134,7 +134,7 @@ struct vxfs_dinode {
struct vxfs_typed typed[VXFS_NTYPED];
} vdi_org;
u_int32_t vdi_iattrino;
-};
+} __packed;
#define vdi_rdev vdi_ftarea.rdev
#define vdi_dotdot vdi_ftarea.dotdot
diff --git a/fs/freevxfs/vxfs_lookup.c b/fs/freevxfs/vxfs_lookup.c
index 99c7f0a..cea158a 100644
--- a/fs/freevxfs/vxfs_lookup.c
+++ b/fs/freevxfs/vxfs_lookup.c
@@ -96,6 +96,15 @@ vxfs_next_entry(struct vxfs_direct *de)
return ((struct vxfs_direct *)((char*)de + de->d_reclen));
}
+/*
+ * VXFS_dirblk_ovh is the overhead of a specific dirblock.
+ */
+static inline u_long VXFS_dirblk_ovh(struct vxfs_dirblk *dbp, int bo)
+{
+ return (sizeof(short) * fs16_to_cpu(bo, dbp->d_nhash)) + 4;
+}
+
+
/**
* vxfs_find_entry - find a mathing directory entry for a dentry
* @ip: directory inode
@@ -242,6 +251,8 @@ vxfs_readdir(struct file *fp, struct dir_context *ctx)
u_long bsize = sbp->s_blocksize;
u_long page, npages, block, pblocks, nblocks, offset;
loff_t pos;
+ int bo = VXFS_SBI(sbp)->byte_order;
+
if (ctx->pos == 0) {
if (!dir_emit_dot(fp, ctx))
@@ -297,8 +308,8 @@ vxfs_readdir(struct file *fp, struct dir_context *ctx)
offset = (char *)de - kaddr;
ctx->pos = ((page << PAGE_CACHE_SHIFT) | offset) + 2;
- if (!dir_emit(ctx, de->d_name, de->d_namelen,
- de->d_ino, DT_UNKNOWN)) {
+ if (!dir_emit(ctx, de->d_name, fs16_to_cpu(bo, de->d_namelen),
+ fs32_to_cpu(bo, de->d_ino), DT_UNKNOWN)) {
vxfs_put_page(pp);
return 0;
}
diff --git a/fs/freevxfs/vxfs_olt.c b/fs/freevxfs/vxfs_olt.c
index 0495008..6b50188 100644
--- a/fs/freevxfs/vxfs_olt.c
+++ b/fs/freevxfs/vxfs_olt.c
@@ -43,14 +43,14 @@ static inline void
vxfs_get_fshead(struct vxfs_oltfshead *fshp, struct vxfs_sb_info *infp)
{
BUG_ON(infp->vsi_fshino);
- infp->vsi_fshino = fshp->olt_fsino[0];
+ infp->vsi_fshino = fs32_to_cpu(infp->byte_order, fshp->olt_fsino[0]);
}
static inline void
vxfs_get_ilist(struct vxfs_oltilist *ilistp, struct vxfs_sb_info *infp)
{
BUG_ON(infp->vsi_iext);
- infp->vsi_iext = ilistp->olt_iext[0];
+ infp->vsi_iext = fs32_to_cpu(infp->byte_order, ilistp->olt_iext[0]);
}
static inline u_long
@@ -80,6 +80,7 @@ vxfs_read_olt(struct super_block *sbp, u_long bsize)
struct buffer_head *bp;
struct vxfs_olt *op;
char *oaddr, *eaddr;
+ int bo = infp->byte_order;
bp = sb_bread(sbp, vxfs_oblock(sbp, infp->vsi_oltext, bsize));
@@ -87,7 +88,7 @@ vxfs_read_olt(struct super_block *sbp, u_long bsize)
goto fail;
op = (struct vxfs_olt *)bp->b_data;
- if (op->olt_magic != VXFS_OLT_MAGIC) {
+ if (fs32_to_cpu(bo, op->olt_magic) != VXFS_OLT_MAGIC) {
printk(KERN_NOTICE "vxfs: ivalid olt magic number\n");
goto fail;
}
@@ -102,14 +103,14 @@ vxfs_read_olt(struct super_block *sbp, u_long bsize)
goto fail;
}
- oaddr = bp->b_data + op->olt_size;
+ oaddr = bp->b_data + fs32_to_cpu(bo, op->olt_size);
eaddr = bp->b_data + (infp->vsi_oltsize * sbp->s_blocksize);
while (oaddr < eaddr) {
struct vxfs_oltcommon *ocp =
(struct vxfs_oltcommon *)oaddr;
- switch (ocp->olt_type) {
+ switch (fs32_to_cpu(bo, ocp->olt_type)) {
case VXFS_OLT_FSHEAD:
vxfs_get_fshead((struct vxfs_oltfshead *)oaddr, infp);
break;
@@ -118,11 +119,11 @@ vxfs_read_olt(struct super_block *sbp, u_long bsize)
break;
}
- oaddr += ocp->olt_size;
+ oaddr += fs32_to_cpu(bo, ocp->olt_size);
}
brelse(bp);
- return 0;
+ return (infp->vsi_fshino && infp->vsi_iext) ? 0 : -EINVAL;
fail:
brelse(bp);
diff --git a/fs/freevxfs/vxfs_super.c b/fs/freevxfs/vxfs_super.c
index 7ca8c75..6a19802 100644
--- a/fs/freevxfs/vxfs_super.c
+++ b/fs/freevxfs/vxfs_super.c
@@ -109,14 +109,15 @@ static int
vxfs_statfs(struct dentry *dentry, struct kstatfs *bufp)
{
struct vxfs_sb_info *infp = VXFS_SBI(dentry->d_sb);
+ struct vxfs_sb *raw_sb = infp->vsi_raw;
bufp->f_type = VXFS_SUPER_MAGIC;
bufp->f_bsize = dentry->d_sb->s_blocksize;
- bufp->f_blocks = infp->vsi_raw->vs_dsize;
- bufp->f_bfree = infp->vsi_raw->vs_free;
+ bufp->f_blocks = fs32_to_cpu(infp->byte_order, raw_sb->vs_dsize);
+ bufp->f_bfree = fs32_to_cpu(infp->byte_order, raw_sb->vs_free);
bufp->f_bavail = 0;
bufp->f_files = 0;
- bufp->f_ffree = infp->vsi_raw->vs_ifree;
+ bufp->f_ffree = fs32_to_cpu(infp->byte_order, raw_sb->vs_ifree);
bufp->f_namelen = VXFS_NAMELEN;
return 0;
@@ -129,6 +130,46 @@ static int vxfs_remount(struct super_block *sb, int *flags, char *data)
return 0;
}
+
+static int vxfs_try_sb_magic(struct super_block *sbp, int blk, u32 magic)
+{
+ struct buffer_head *bp;
+ struct vxfs_sb *rsbp;
+ struct vxfs_sb_info *infp = VXFS_SBI(sbp);
+ int rc = -ENOMEM;
+
+ bp = sb_bread(sbp, blk);
+ do {
+ if (!bp || !buffer_mapped(bp)) {
+ if (!infp->silent) {
+ printk(KERN_WARNING "vxfs: unable to read"
+ " disk superblock at %d\n", blk);
+ }
+ break;
+ }
+
+ rc = -EINVAL;
+ rsbp = (struct vxfs_sb *)bp->b_data;
+ if (rsbp->vs_magic != magic) {
+ if (!infp->silent)
+ printk(KERN_NOTICE "vxfs: WRONG superblock magic\n");
+ break;
+ }
+
+ rc = 0;
+ infp->vsi_raw = rsbp;
+ infp->vsi_bp = bp;
+ } while (0);
+
+ if (rc) {
+ infp->vsi_raw = NULL;
+ infp->vsi_bp = NULL;
+ brelse(bp);
+ }
+
+ return rc;
+}
+
/**
* vxfs_read_super - read superblock into memory and initialize filesystem
* @sbp: VFS superblock (to fill)
@@ -149,10 +190,10 @@ static int vxfs_fill_super(struct super_block *sbp, void *dp, int silent)
{
struct vxfs_sb_info *infp;
struct vxfs_sb *rsbp;
- struct buffer_head *bp = NULL;
u_long bsize;
struct inode *root;
int ret = -EINVAL;
+ u32 j;
sbp->s_flags |= MS_RDONLY;
@@ -162,48 +203,47 @@ static int vxfs_fill_super(struct super_block *sbp, void *dp, int silent)
return -ENOMEM;
}
+ infp->silent = silent;
bsize = sb_min_blocksize(sbp, BLOCK_SIZE);
if (!bsize) {
printk(KERN_WARNING "vxfs: unable to set blocksize\n");
goto out;
}
- bp = sb_bread(sbp, 1);
- if (!bp || !buffer_mapped(bp)) {
- if (!silent) {
- printk(KERN_WARNING
- "vxfs: unable to read disk superblock\n");
+ sbp->s_fs_info = infp;
+ do {
+ if (!vxfs_try_sb_magic(sbp, 1, cpu_to_le32(VXFS_SUPER_MAGIC))) {
+ infp->byte_order = BO_LE; /* SCO */
+ break;
+ }
+
+ if (!vxfs_try_sb_magic(sbp, 8, cpu_to_be32(VXFS_SUPER_MAGIC))) {
+ infp->byte_order = BO_BE; /* HP-UX pa-risc likely */
+ break;
}
- goto out;
- }
- rsbp = (struct vxfs_sb *)bp->b_data;
- if (rsbp->vs_magic != VXFS_SUPER_MAGIC) {
- if (!silent)
- printk(KERN_NOTICE "vxfs: WRONG superblock magic\n");
goto out;
- }
+ } while (0);
- if ((rsbp->vs_version < 2 || rsbp->vs_version > 4) && !silent) {
- printk(KERN_NOTICE "vxfs: unsupported VxFS version (%d)\n",
- rsbp->vs_version);
+ rsbp = infp->vsi_raw;
+ j = fs32_to_cpu(infp->byte_order, rsbp->vs_version);
+ if ((j < 2 || j > 4) && !silent) {
+ printk(KERN_NOTICE "vxfs: unsupported VxFS version (%d)\n", j);
goto out;
}
#ifdef DIAGNOSTIC
- printk(KERN_DEBUG "vxfs: supported VxFS version (%d)\n", rsbp->vs_version);
+ printk(KERN_DEBUG "vxfs: supported VxFS version (%d)\n", j);
printk(KERN_DEBUG "vxfs: blocksize: %d\n", rsbp->vs_bsize);
#endif
- sbp->s_magic = rsbp->vs_magic;
- sbp->s_fs_info = infp;
+ sbp->s_magic = fs32_to_cpu(infp->byte_order, rsbp->vs_magic);
- infp->vsi_raw = rsbp;
- infp->vsi_bp = bp;
- infp->vsi_oltext = rsbp->vs_oltext[0];
- infp->vsi_oltsize = rsbp->vs_oltsize;
+ infp->vsi_oltext = fs32_to_cpu(infp->byte_order, rsbp->vs_oltext[0]);
+ infp->vsi_oltsize = fs32_to_cpu(infp->byte_order, rsbp->vs_oltsize);
- if (!sb_set_blocksize(sbp, rsbp->vs_bsize)) {
+ j = fs32_to_cpu(infp->byte_order, rsbp->vs_bsize);
+ if (!sb_set_blocksize(sbp, j)) {
printk(KERN_WARNING "vxfs: unable to set final block size\n");
goto out;
}
@@ -237,7 +277,8 @@ out_free_ilist:
vxfs_put_fake_inode(infp->vsi_ilist);
vxfs_put_fake_inode(infp->vsi_stilist);
out:
- brelse(bp);
+ if (infp->vsi_bp)
+ brelse(infp->vsi_bp);
kfree(infp);
return ret;
}
--
1.7.3.4
>From b95e9b749b85be347c953ce33d2b3c5dde4e56d6 Mon Sep 17 00:00:00 2001
From: KB <kb@sysmikro.com.pl>
Date: Wed, 25 May 2016 22:13:20 +0200
Subject: [PATCH 3/7] missing kfree and kfree on kmem_cache obj
Signed-off-by: KB <kb@sysmikro.com.pl>
---
fs/freevxfs/vxfs_extern.h | 1 +
fs/freevxfs/vxfs_fshead.c | 8 +++++---
fs/freevxfs/vxfs_inode.c | 5 +++++
3 files changed, 11 insertions(+), 3 deletions(-)
diff --git a/fs/freevxfs/vxfs_extern.h b/fs/freevxfs/vxfs_extern.h
index 881aa3d..0a3ff86 100644
--- a/fs/freevxfs/vxfs_extern.h
+++ b/fs/freevxfs/vxfs_extern.h
@@ -64,6 +64,7 @@ extern struct vxfs_inode_info * vxfs_blkiget(struct super_block *, u_long, ino_t
extern struct vxfs_inode_info * vxfs_stiget(struct super_block *, ino_t);
extern struct inode * vxfs_iget(struct super_block *, ino_t);
extern void vxfs_evict_inode(struct inode *);
+extern void vxfs_inode_info_free(struct vxfs_inode_info *vip);
/* vxfs_lookup.c */
extern const struct inode_operations vxfs_dir_inode_ops;
diff --git a/fs/freevxfs/vxfs_fshead.c b/fs/freevxfs/vxfs_fshead.c
index 6cbdde7..67ca2f9 100644
--- a/fs/freevxfs/vxfs_fshead.c
+++ b/fs/freevxfs/vxfs_fshead.c
@@ -183,7 +183,7 @@ vxfs_read_fshead(struct super_block *sbp)
infp->vsi_stilist = vxfs_get_fake_inode(sbp, tip);
if (!infp->vsi_stilist) {
printk(KERN_ERR "vxfs: unable to get structural list inode\n");
- kfree(tip);
+ vxfs_inode_info_free(tip);
goto out_free_pfp;
}
if (!VXFS_ISILT(VXFS_INO(infp->vsi_stilist))) {
@@ -198,7 +198,7 @@ vxfs_read_fshead(struct super_block *sbp)
infp->vsi_ilist = vxfs_get_fake_inode(sbp, tip);
if (!infp->vsi_ilist) {
printk(KERN_ERR "vxfs: unable to get inode list inode\n");
- kfree(tip);
+ vxfs_inode_info_free(tip);
goto out_iput_stilist;
}
if (!VXFS_ISILT(VXFS_INO(infp->vsi_ilist))) {
@@ -206,6 +206,8 @@ vxfs_read_fshead(struct super_block *sbp)
VXFS_INO(infp->vsi_ilist)->vii_mode & VXFS_TYPE_MASK);
goto out_iput_ilist;
}
+ kfree(pfp);
+ kfree(sfp);
return 0;
@@ -221,6 +223,6 @@ vxfs_read_fshead(struct super_block *sbp)
iput(infp->vsi_fship);
return -EINVAL;
out_free_fship:
- kfree(vip);
+ vxfs_inode_info_free(vip);
return -EINVAL;
}
diff --git a/fs/freevxfs/vxfs_inode.c b/fs/freevxfs/vxfs_inode.c
index 53b8757..16d8d27 100644
--- a/fs/freevxfs/vxfs_inode.c
+++ b/fs/freevxfs/vxfs_inode.c
@@ -415,3 +415,8 @@ vxfs_evict_inode(struct inode *ip)
clear_inode(ip);
call_rcu(&ip->i_rcu, vxfs_i_callback);
}
+
+void vxfs_inode_info_free(struct vxfs_inode_info *vip)
+{
+ kmem_cache_free(vxfs_inode_cachep, vip);
+}
--
1.7.3.4
>From e4979c6a29eb79d59d8bc2eca9acc0d2b416780f Mon Sep 17 00:00:00 2001
From: KB <kb@sysmikro.com.pl>
Date: Wed, 25 May 2016 22:15:20 +0200
Subject: [PATCH 4/7] super_operations.destroy_inode
Signed-off-by: KB <kb@sysmikro.com.pl>
---
fs/freevxfs/vxfs_extern.h | 1 +
fs/freevxfs/vxfs_inode.c | 16 ++++++++++++----
fs/freevxfs/vxfs_super.c | 1 +
3 files changed, 14 insertions(+), 4 deletions(-)
diff --git a/fs/freevxfs/vxfs_extern.h b/fs/freevxfs/vxfs_extern.h
index 0a3ff86..4d8298b 100644
--- a/fs/freevxfs/vxfs_extern.h
+++ b/fs/freevxfs/vxfs_extern.h
@@ -65,6 +65,7 @@ extern struct vxfs_inode_info * vxfs_stiget(struct super_block *, ino_t);
extern struct inode * vxfs_iget(struct super_block *, ino_t);
extern void vxfs_evict_inode(struct inode *);
extern void vxfs_inode_info_free(struct vxfs_inode_info *vip);
+extern void vxfs_destroy_inode(struct inode *ip);
/* vxfs_lookup.c */
extern const struct inode_operations vxfs_dir_inode_ops;
diff --git a/fs/freevxfs/vxfs_inode.c b/fs/freevxfs/vxfs_inode.c
index 16d8d27..9b45ad7 100644
--- a/fs/freevxfs/vxfs_inode.c
+++ b/fs/freevxfs/vxfs_inode.c
@@ -397,7 +397,15 @@ vxfs_iget(struct super_block *sbp, ino_t ino)
static void vxfs_i_callback(struct rcu_head *head)
{
struct inode *inode = container_of(head, struct inode, i_rcu);
- kmem_cache_free(vxfs_inode_cachep, inode->i_private);
+ void *priv = inode->i_private;
+
+ inode->i_private = NULL;
+ kmem_cache_free(vxfs_inode_cachep, priv);
+}
+
+void vxfs_destroy_inode(struct inode *ip)
+{
+ call_rcu(&ip->i_rcu, vxfs_i_callback);
}
/**
@@ -405,17 +413,17 @@ static void vxfs_i_callback(struct rcu_head *head)
* @ip: inode to discard.
*
* Description:
- * vxfs_evict_inode() is called on the final iput and frees the private
- * inode area.
+ * vxfs_evict_inode() is called on the final iput
*/
void
vxfs_evict_inode(struct inode *ip)
{
truncate_inode_pages_final(&ip->i_data);
+ invalidate_inode_buffers(ip);
clear_inode(ip);
- call_rcu(&ip->i_rcu, vxfs_i_callback);
}
+
void vxfs_inode_info_free(struct vxfs_inode_info *vip)
{
kmem_cache_free(vxfs_inode_cachep, vip);
diff --git a/fs/freevxfs/vxfs_super.c b/fs/freevxfs/vxfs_super.c
index 6a19802..11a535a 100644
--- a/fs/freevxfs/vxfs_super.c
+++ b/fs/freevxfs/vxfs_super.c
@@ -59,6 +59,7 @@ static int vxfs_statfs(struct dentry *, struct kstatfs *);
static int vxfs_remount(struct super_block *, int *, char *);
static const struct super_operations vxfs_super_ops = {
+ .destroy_inode = vxfs_destroy_inode,
.evict_inode = vxfs_evict_inode,
.put_super = vxfs_put_super,
.statfs = vxfs_statfs,
--
1.7.3.4
>From 4a6ae0e351b6046367bbd2939547f292ed2b6e47 Mon Sep 17 00:00:00 2001
From: KB <kb@sysmikro.com.pl>
Date: Wed, 25 May 2016 22:28:37 +0200
Subject: [PATCH 5/7] refactoring of vxfs_readir() and find_entry()
Signed-off-by: KB <kb@sysmikro.com.pl>
---
fs/freevxfs/vxfs_lookup.c | 267 +++++++++++++++++++++------------------------
1 files changed, 124 insertions(+), 143 deletions(-)
diff --git a/fs/freevxfs/vxfs_lookup.c b/fs/freevxfs/vxfs_lookup.c
index cea158a..8eacb27 100644
--- a/fs/freevxfs/vxfs_lookup.c
+++ b/fs/freevxfs/vxfs_lookup.c
@@ -61,48 +61,6 @@ const struct file_operations vxfs_dir_operations = {
.iterate = vxfs_readdir,
};
-
-static inline u_long
-dir_pages(struct inode *inode)
-{
- return (inode->i_size + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
-}
-
-static inline u_long
-dir_blocks(struct inode *ip)
-{
- u_long bsize = ip->i_sb->s_blocksize;
- return (ip->i_size + bsize - 1) & ~(bsize - 1);
-}
-
-/*
- * NOTE! unlike strncmp, vxfs_match returns 1 for success, 0 for failure.
- *
- * len <= VXFS_NAMELEN and de != NULL are guaranteed by caller.
- */
-static inline int
-vxfs_match(int len, const char * const name, struct vxfs_direct *de)
-{
- if (len != de->d_namelen)
- return 0;
- if (!de->d_ino)
- return 0;
- return !memcmp(name, de->d_name, len);
-}
-
-static inline struct vxfs_direct *
-vxfs_next_entry(struct vxfs_direct *de)
-{
- return ((struct vxfs_direct *)((char*)de + de->d_reclen));
-}
-
-/*
- * VXFS_dirblk_ovh is the overhead of a specific dirblock.
- */
-static inline u_long VXFS_dirblk_ovh(struct vxfs_dirblk *dbp, int bo)
-{
- return (sizeof(short) * fs16_to_cpu(bo, dbp->d_nhash)) + 4;
-}
/**
@@ -122,50 +80,65 @@ static inline u_long VXFS_dirblk_ovh(struct vxfs_dirblk *dbp, int bo)
static struct vxfs_direct *
vxfs_find_entry(struct inode *ip, struct dentry *dp, struct page **ppp)
{
- u_long npages, page, nblocks, pblocks, block;
- u_long bsize = ip->i_sb->s_blocksize;
- const char *name = dp->d_name.name;
- int namelen = dp->d_name.len;
-
- npages = dir_pages(ip);
- nblocks = dir_blocks(ip);
- pblocks = VXFS_BLOCK_PER_PAGE(ip->i_sb);
-
- for (page = 0; page < npages; page++) {
- caddr_t kaddr;
- struct page *pp;
-
- pp = vxfs_get_page(ip->i_mapping, page);
- if (IS_ERR(pp))
- continue;
- kaddr = (caddr_t)page_address(pp);
-
- for (block = 0; block <= nblocks && block <= pblocks; block++) {
- caddr_t baddr, limit;
- struct vxfs_dirblk *dbp;
- struct vxfs_direct *de;
-
- baddr = kaddr + (block * bsize);
- limit = baddr + bsize - VXFS_DIRLEN(1);
-
- dbp = (struct vxfs_dirblk *)baddr;
- de = (struct vxfs_direct *)(baddr + VXFS_DIRBLKOV(dbp));
-
- for (; (caddr_t)de <= limit; de = vxfs_next_entry(de)) {
- if (!de->d_reclen)
- break;
- if (!de->d_ino)
- continue;
- if (vxfs_match(namelen, name, de)) {
- *ppp = pp;
- return (de);
- }
+ u_long bsize = ip->i_sb->s_blocksize;
+ const char *name = dp->d_name.name;
+ int namelen = dp->d_name.len;
+ loff_t limit = VXFS_DIRROUND(ip->i_size);
+ struct vxfs_direct *de_exit = NULL;
+ loff_t pos = 0;
+ int bo = VXFS_SBI(ip->i_sb)->byte_order;
+
+ while (pos < limit) {
+ struct page *pp;
+ char *kaddr;
+ int pg_ofs = pos & ~PAGE_CACHE_MASK;
+
+ pp = vxfs_get_page(ip->i_mapping, pos >> PAGE_CACHE_SHIFT);
+ if (IS_ERR(pp)) {
+ return NULL;
+ }
+ kaddr = (char *)page_address(pp);
+
+ while (pg_ofs < PAGE_SIZE && pos < limit) {
+ struct vxfs_direct *de;
+
+ if ((pos & (bsize - 1)) < 4) {
+ struct vxfs_dirblk *dbp =
+ (struct vxfs_dirblk *)(kaddr + (pos & ~PAGE_CACHE_MASK));
+ int overhead = (sizeof(short) * fs16_to_cpu(bo, dbp->d_nhash)) + 4;
+
+ pos += overhead;
+ pg_ofs += overhead;
+ }
+ de = (struct vxfs_direct *)(kaddr + pg_ofs);
+
+ if (!de->d_reclen) {
+ pos += bsize - 1;
+ pos &= ~(bsize - 1);
+ break;
+ }
+
+ pg_ofs += fs16_to_cpu(bo, de->d_reclen);
+ pos += fs16_to_cpu(bo, de->d_reclen);
+ if (!de->d_ino) {
+ continue;
+ }
+
+ if (namelen != fs16_to_cpu(bo, de->d_namelen))
+ continue;
+ if (!memcmp(name, de->d_name, namelen)) {
+ *ppp = pp;
+ de_exit = de;
+ break;
}
}
- vxfs_put_page(pp);
+ if (!de_exit)
+ vxfs_put_page(pp);
+ else
+ break;
}
- return NULL;
+ return de_exit;
}
/**
@@ -185,15 +158,17 @@ vxfs_inode_by_name(struct inode *dip, struct dentry *dp)
{
struct vxfs_direct *de;
struct page *pp;
- ino_t ino = 0;
+ ino_t ino = 0;
de = vxfs_find_entry(dip, dp, &pp);
if (de) {
- ino = de->d_ino;
+ int bo = VXFS_SBI(dip->i_sb)->byte_order;
+
+ ino = fs32_to_cpu(bo, de->d_ino);
kunmap(pp);
page_cache_release(pp);
}
-
+
return (ino);
}
@@ -225,8 +200,8 @@ vxfs_lookup(struct inode *dip, struct dentry *dp, unsigned int flags)
ip = vxfs_iget(dip->i_sb, ino);
if (IS_ERR(ip))
return ERR_CAST(ip);
+ d_add(dp, ip);
}
- d_add(dp, ip);
return NULL;
}
@@ -249,76 +224,82 @@ vxfs_readdir(struct file *fp, struct dir_context *ctx)
struct inode *ip = file_inode(fp);
struct super_block *sbp = ip->i_sb;
u_long bsize = sbp->s_blocksize;
- u_long page, npages, block, pblocks, nblocks, offset;
- loff_t pos;
+ loff_t pos, limit;
int bo = VXFS_SBI(sbp)->byte_order;
-
if (ctx->pos == 0) {
if (!dir_emit_dot(fp, ctx))
- return 0;
- ctx->pos = 1;
+ goto out;
+ ctx->pos++;
}
if (ctx->pos == 1) {
if (!dir_emit(ctx, "..", 2, VXFS_INO(ip)->vii_dotdot, DT_DIR))
- return 0;
- ctx->pos = 2;
+ goto out;
+ ctx->pos++;
+ }
+
+ limit = VXFS_DIRROUND(ip->i_size);
+ if (ctx->pos > limit) {
+#if 0
+ ctx->pos = 0;
+#endif
+ goto out;
}
- pos = ctx->pos - 2;
-
- if (pos > VXFS_DIRROUND(ip->i_size))
- return 0;
-
- npages = dir_pages(ip);
- nblocks = dir_blocks(ip);
- pblocks = VXFS_BLOCK_PER_PAGE(sbp);
-
- page = pos >> PAGE_CACHE_SHIFT;
- offset = pos & ~PAGE_CACHE_MASK;
- block = (u_long)(pos >> sbp->s_blocksize_bits) % pblocks;
-
- for (; page < npages; page++, block = 0) {
- char *kaddr;
- struct page *pp;
-
- pp = vxfs_get_page(ip->i_mapping, page);
- if (IS_ERR(pp))
- continue;
+
+ pos = ctx->pos & ~3L;
+
+ while (pos < limit) {
+ struct page *pp;
+ char *kaddr;
+ int pg_ofs = pos & ~PAGE_CACHE_MASK;
+ int rc = 0;
+
+ pp = vxfs_get_page(ip->i_mapping, pos >> PAGE_CACHE_SHIFT);
+ if (IS_ERR(pp)) {
+ return -ENOMEM;
+ }
kaddr = (char *)page_address(pp);
- for (; block <= nblocks && block <= pblocks; block++) {
- char *baddr, *limit;
- struct vxfs_dirblk *dbp;
- struct vxfs_direct *de;
-
- baddr = kaddr + (block * bsize);
- limit = baddr + bsize - VXFS_DIRLEN(1);
-
- dbp = (struct vxfs_dirblk *)baddr;
- de = (struct vxfs_direct *)
- (offset ?
- (kaddr + offset) :
- (baddr + VXFS_DIRBLKOV(dbp)));
-
- for (; (char *)de <= limit; de = vxfs_next_entry(de)) {
- if (!de->d_reclen)
- break;
- if (!de->d_ino)
- continue;
-
- offset = (char *)de - kaddr;
- ctx->pos = ((page << PAGE_CACHE_SHIFT) | offset) + 2;
- if (!dir_emit(ctx, de->d_name, fs16_to_cpu(bo, de->d_namelen),
- fs32_to_cpu(bo, de->d_ino), DT_UNKNOWN)) {
- vxfs_put_page(pp);
- return 0;
- }
+ while (pg_ofs < PAGE_SIZE && pos < limit) {
+ struct vxfs_direct *de;
+
+ if ((pos & (bsize - 1)) < 4) {
+ struct vxfs_dirblk *dbp =
+ (struct vxfs_dirblk *)(kaddr + (pos & ~PAGE_CACHE_MASK));
+ int overhead = (sizeof(short) * fs16_to_cpu(bo, dbp->d_nhash)) + 4;
+
+ pos += overhead;
+ pg_ofs += overhead;
+ }
+ de = (struct vxfs_direct *)(kaddr + pg_ofs);
+
+ if (!de->d_reclen) {
+ pos += bsize - 1;
+ pos &= ~(bsize - 1);
+ break;
+ }
+
+ pg_ofs += fs16_to_cpu(bo, de->d_reclen);
+ pos += fs16_to_cpu(bo, de->d_reclen);
+ if (!de->d_ino) {
+ continue;
+ }
+
+ rc = dir_emit(ctx, de->d_name, fs16_to_cpu(bo, de->d_namelen),
+ fs32_to_cpu(bo, de->d_ino), DT_UNKNOWN);
+ if (!rc) {
+ /* the dir entry was not submitted, so fix pos. */
+ pos -= fs16_to_cpu(bo, de->d_reclen);
+ break;
}
- offset = 0;
}
vxfs_put_page(pp);
- offset = 0;
+ if (!rc)
+ break;
}
- ctx->pos = ((page << PAGE_CACHE_SHIFT) | offset) + 2;
+
+ ctx->pos = pos | 2;
+
+out:
return 0;
}
--
1.7.3.4
>From e7f68291aada1535016c767a4f5a5fcfb19a1de6 Mon Sep 17 00:00:00 2001
From: KB <kb@sysmikro.com.pl>
Date: Wed, 25 May 2016 22:41:22 +0200
Subject: [PATCH 6/7] static cachep
Signed-off-by: KB <kb@sysmikro.com.pl>
---
fs/freevxfs/vxfs_extern.h | 4 +++-
fs/freevxfs/vxfs_inode.c | 24 +++++++++++++++++++++++-
fs/freevxfs/vxfs_super.c | 26 ++++++++++----------------
3 files changed, 36 insertions(+), 18 deletions(-)
diff --git a/fs/freevxfs/vxfs_extern.h b/fs/freevxfs/vxfs_extern.h
index 4d8298b..cc43fd0 100644
--- a/fs/freevxfs/vxfs_extern.h
+++ b/fs/freevxfs/vxfs_extern.h
@@ -55,7 +55,6 @@ extern const struct inode_operations vxfs_immed_symlink_iops;
/* vxfs_inode.c */
extern const struct address_space_operations vxfs_immed_aops;
-extern struct kmem_cache *vxfs_inode_cachep;
extern void vxfs_dumpi(struct vxfs_inode_info *, ino_t);
extern struct inode * vxfs_get_fake_inode(struct super_block *,
struct vxfs_inode_info *);
@@ -66,6 +65,9 @@ extern struct inode * vxfs_iget(struct super_block *, ino_t);
extern void vxfs_evict_inode(struct inode *);
extern void vxfs_inode_info_free(struct vxfs_inode_info *vip);
extern void vxfs_destroy_inode(struct inode *ip);
+extern int vxfs_ii_cache_init(void);
+extern void vxfs_ii_cache_destroy(void);
+
/* vxfs_lookup.c */
extern const struct inode_operations vxfs_dir_inode_ops;
diff --git a/fs/freevxfs/vxfs_inode.c b/fs/freevxfs/vxfs_inode.c
index 9b45ad7..73ac417 100644
--- a/fs/freevxfs/vxfs_inode.c
+++ b/fs/freevxfs/vxfs_inode.c
@@ -41,7 +41,7 @@
#include "vxfs_extern.h"
-struct kmem_cache *vxfs_inode_cachep;
+static struct kmem_cache *vxfs_inode_cachep;
#ifdef DIAGNOSTIC
@@ -428,3 +428,25 @@ void vxfs_inode_info_free(struct vxfs_inode_info *vip)
{
kmem_cache_free(vxfs_inode_cachep, vip);
}
+
+
+int vxfs_ii_cache_init(void)
+{
+ vxfs_inode_cachep = kmem_cache_create("vxfs_inode",
+ sizeof(struct vxfs_inode_info), 0,
+ SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD, NULL);
+
+ return vxfs_inode_cachep ? 0 : -ENOMEM;
+}
+
+
+void vxfs_ii_cache_destroy(void)
+{
+ /*
+ * Make sure all delayed rcu free inodes are flushed before we
+ * destroy cache.
+ */
+ rcu_barrier();
+ kmem_cache_destroy(vxfs_inode_cachep);
+}
+
diff --git a/fs/freevxfs/vxfs_super.c b/fs/freevxfs/vxfs_super.c
index 11a535a..7579500 100644
--- a/fs/freevxfs/vxfs_super.c
+++ b/fs/freevxfs/vxfs_super.c
@@ -306,29 +306,23 @@ MODULE_ALIAS("vxfs");
static int __init
vxfs_init(void)
{
- int rv;
+ int rc = vxfs_ii_cache_init();
- vxfs_inode_cachep = kmem_cache_create("vxfs_inode",
- sizeof(struct vxfs_inode_info), 0,
- SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD, NULL);
- if (!vxfs_inode_cachep)
- return -ENOMEM;
- rv = register_filesystem(&vxfs_fs_type);
- if (rv < 0)
- kmem_cache_destroy(vxfs_inode_cachep);
- return rv;
+ if (!rc) {
+ rc = register_filesystem(&vxfs_fs_type);
+ if (rc < 0)
+ vxfs_ii_cache_destroy();
+ }
+ printk(KERN_DEBUG "%s: **** %s %s rc %d\n", __func__, __DATE__, __TIME__, rc);
+
+ return rc;
}
static void __exit
vxfs_cleanup(void)
{
unregister_filesystem(&vxfs_fs_type);
- /*
- * Make sure all delayed rcu free inodes are flushed before we
- * destroy cache.
- */
- rcu_barrier();
- kmem_cache_destroy(vxfs_inode_cachep);
+ vxfs_ii_cache_destroy();
}
module_init(vxfs_init);
--
1.7.3.4
>From 2c0008e9f2f4e62bb3e10eecad54e2a0140e0c4c Mon Sep 17 00:00:00 2001
From: KB <kb@sysmikro.com.pl>
Date: Wed, 25 May 2016 22:58:08 +0200
Subject: [PATCH 7/7] the credits
Signed-off-by: KB <kb@sysmikro.com.pl>
---
fs/freevxfs/vxfs.h | 3 +++
fs/freevxfs/vxfs_fshead.c | 4 ++++
fs/freevxfs/vxfs_inode.c | 4 ++++
fs/freevxfs/vxfs_lookup.c | 4 ++++
fs/freevxfs/vxfs_super.c | 6 +++++-
5 files changed, 20 insertions(+), 1 deletions(-)
diff --git a/fs/freevxfs/vxfs.h b/fs/freevxfs/vxfs.h
index 5dc8949..35f56b7 100644
--- a/fs/freevxfs/vxfs.h
+++ b/fs/freevxfs/vxfs.h
@@ -2,6 +2,9 @@
* Copyright (c) 2000-2001 Christoph Hellwig.
* All rights reserved.
*
+ * (c) 2016 Krzysztof Blaszkowski
+ * Many bug fixes, improvements & tests with HP-UX B.10.20 (pa-risc)
+ *
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
diff --git a/fs/freevxfs/vxfs_fshead.c b/fs/freevxfs/vxfs_fshead.c
index 67ca2f9..44b87d0 100644
--- a/fs/freevxfs/vxfs_fshead.c
+++ b/fs/freevxfs/vxfs_fshead.c
@@ -2,6 +2,10 @@
* Copyright (c) 2000-2001 Christoph Hellwig.
* All rights reserved.
*
+ *
+ * (c) 2016 Krzysztof Blaszkowski
+ * Many bug fixes, improvements & tests with HP-UX B.10.20 (pa-risc)
+ *
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
diff --git a/fs/freevxfs/vxfs_inode.c b/fs/freevxfs/vxfs_inode.c
index 73ac417..4c8a625 100644
--- a/fs/freevxfs/vxfs_inode.c
+++ b/fs/freevxfs/vxfs_inode.c
@@ -2,6 +2,10 @@
* Copyright (c) 2000-2001 Christoph Hellwig.
* All rights reserved.
*
+ *
+ * (c) 2016 Krzysztof Blaszkowski
+ * Many bug fixes, improvements & tests with HP-UX B.10.20 (pa-risc)
+ *
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
diff --git a/fs/freevxfs/vxfs_lookup.c b/fs/freevxfs/vxfs_lookup.c
index 8eacb27..173aeea 100644
--- a/fs/freevxfs/vxfs_lookup.c
+++ b/fs/freevxfs/vxfs_lookup.c
@@ -2,6 +2,10 @@
* Copyright (c) 2000-2001 Christoph Hellwig.
* All rights reserved.
*
+ *
+ * (c) 2016 Krzysztof Blaszkowski
+ * Many bug fixes, improvements & tests with HP-UX B.10.20 (pa-risc)
+ *
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
diff --git a/fs/freevxfs/vxfs_super.c b/fs/freevxfs/vxfs_super.c
index 7579500..bd121aa 100644
--- a/fs/freevxfs/vxfs_super.c
+++ b/fs/freevxfs/vxfs_super.c
@@ -2,6 +2,10 @@
* Copyright (c) 2000-2001 Christoph Hellwig.
* All rights reserved.
*
+ *
+ * (c) 2016 Krzysztof Blaszkowski
+ * Many bug fixes, improvements & tests with HP-UX B.10.20 (pa-risc)
+ *
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
@@ -48,7 +52,7 @@
#include "vxfs_inode.h"
-MODULE_AUTHOR("Christoph Hellwig");
+MODULE_AUTHOR("Christoph Hellwig, Krzysztof Blaszkowski");
MODULE_DESCRIPTION("Veritas Filesystem (VxFS) driver");
MODULE_LICENSE("Dual BSD/GPL");
--
1.7.3.4
--
Krzysztof Blaszkowski
^ permalink raw reply related [flat|nested] 9+ messages in thread
* r
2016-05-26 14:45 freevxfs: hp-ux support. patchset 1-7/7 Krzysztof Błaszkowski
@ 2016-05-26 15:53 ` Christoph Hellwig
2016-05-26 17:44 ` r Krzysztof Błaszkowski
0 siblings, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2016-05-26 15:53 UTC (permalink / raw)
To: Krzysztof B??aszkowski; +Cc: Christoph Hellwig, Carlos Maiolino, linux-fsdevel
Hi Krzysztof,
first round of reviews below:
> Date: Wed, 25 May 2016 21:50:11 +0200
> Subject: [PATCH 1/7] kconfig note
>
>
> Signed-off-by: KB <kb@sysmikro.com.pl>
This part looks good, but I'd add a littlle more text and move it last
in the series. The other patches could also use a little more
descriptive text in the changelogs.
> diff --git a/fs/freevxfs/vxfs.h b/fs/freevxfs/vxfs.h
> index c8a9265..5dc8949 100644
> --- a/fs/freevxfs/vxfs.h
> +++ b/fs/freevxfs/vxfs.h
> @@ -152,7 +152,7 @@ struct vxfs_sb {
> /*
> * Actually much more...
> */
> -};
> +} __packed;
Can you explain why this is added?
>
>
> /*
> @@ -168,9 +168,32 @@ struct vxfs_sb_info {
> ino_t vsi_fshino; /* fileset header inode */
> daddr_t vsi_oltext; /* OLT extent */
> daddr_t vsi_oltsize; /* OLT size */
> + int byte_order;
> + int silent;
> +};
Can you align these with the rest of the fields? Also where does the
silent flag come from here?
> +#define VXFS_FS32(field1, field2) fhp->field1 = fs32_to_cpu(bo, dbh->field2)
Please remove this wrapper.
> +#define VXFS_FS32(field1, field2) vip->field1 = fs32_to_cpu(bo, dip->field2)
> +#define VXFS_FS64(field1, field2) vip->field1 = fs64_to_cpu(bo, dip->field2)
> +#define VXFS_FS16(field1, field2) vip->field1 = fs16_to_cpu(bo, dip->field2)
and these.
> + sbp->s_fs_info = infp;
> + do {
> + if (!vxfs_try_sb_magic(sbp, 1, cpu_to_le32(VXFS_SUPER_MAGIC))) {
> + infp->byte_order = BO_LE; /* SCO */
> + break;
> + }
> +
> + if (!vxfs_try_sb_magic(sbp, 8, cpu_to_be32(VXFS_SUPER_MAGIC))) {
> + infp->byte_order = BO_BE; /* HP-UX pa-risc likely */
> + break;
I'd normally move this into a separate patch. But even if this
stay in here it should be mentioned in the patch changelog.
> vxfs_put_fake_inode(infp->vsi_ilist);
> vxfs_put_fake_inode(infp->vsi_stilist);
> out:
> - brelse(bp);
> + if (infp->vsi_bp)
> + brelse(infp->vsi_bp);
brelse handles a null argument just fine.
> extern void vxfs_inode_info_free(struct vxfs_inode_info *vip);
> +extern void vxfs_destroy_inode(struct inode *ip);
>
> /* vxfs_lookup.c */
> extern const struct inode_operations vxfs_dir_inode_ops;
> diff --git a/fs/freevxfs/vxfs_inode.c b/fs/freevxfs/vxfs_inode.c
> index 16d8d27..9b45ad7 100644
> --- a/fs/freevxfs/vxfs_inode.c
> +++ b/fs/freevxfs/vxfs_inode.c
> @@ -397,7 +397,15 @@ vxfs_iget(struct super_block *sbp, ino_t ino)
> static void vxfs_i_callback(struct rcu_head *head)
> {
> struct inode *inode = container_of(head, struct inode, i_rcu);
> - kmem_cache_free(vxfs_inode_cachep, inode->i_private);
> + void *priv = inode->i_private;
> +
> + inode->i_private = NULL;
> + kmem_cache_free(vxfs_inode_cachep, priv);
> +}
> +
> +void vxfs_destroy_inode(struct inode *ip)
> +{
> + call_rcu(&ip->i_rcu, vxfs_i_callback);
> }
I'd rather go for embedding the VFS inode in the VxFS inode like most
modern file systems do. Take a look at alloc_inode and destroy_inode
in ext2 for an example.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: r
2016-05-26 15:53 ` r Christoph Hellwig
@ 2016-05-26 17:44 ` Krzysztof Błaszkowski
0 siblings, 0 replies; 9+ messages in thread
From: Krzysztof Błaszkowski @ 2016-05-26 17:44 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Carlos Maiolino, linux-fsdevel
Hi Christoph,
On Thu, 2016-05-26 at 08:53 -0700, Christoph Hellwig wrote:
> Hi Krzysztof,
>
> first round of reviews below:
>
> > Date: Wed, 25 May 2016 21:50:11 +0200
> > Subject: [PATCH 1/7] kconfig note
> >
> >
> > Signed-off-by: KB <kb@sysmikro.com.pl>
>
> This part looks good, but I'd add a littlle more text and move it last
> in the series.
okay, what would you write in this section ?
Will move the patch.
> The other patches could also use a little more
> descriptive text in the changelogs.
>
> > diff --git a/fs/freevxfs/vxfs.h b/fs/freevxfs/vxfs.h
> > index c8a9265..5dc8949 100644
> > --- a/fs/freevxfs/vxfs.h
> > +++ b/fs/freevxfs/vxfs.h
> > @@ -152,7 +152,7 @@ struct vxfs_sb {
> > /*
> > * Actually much more...
> > */
> > -};
> > +} __packed;
>
> Can you explain why this is added?
>
I was thinking that on-disk memory mapped structures should be "packed"
especially if they contain 1 or 2 bytes fields in the middle.
If the structure does not have the packed attribute then its size may be
bigger than the size of similar structure (by name) used on another OS
e.g. 32bit. An access to fields of such structure aligned differently
(due to e.g. 64 bit compiler) will end up with data corruption.
(that's the basics, why i write this ?)
the vxfs_sb structure has u8 and u16's so usage of __pack is desired
over here. the __pack will not cause any harm (because if we count all
u8 and u16, they are still u32 aligned) and it will prevent from
hypothetical misalignment (just in case). It is also a tip that the
structure is on-disk and must be compiler alignment independent.
> >
> >
> > /*
> > @@ -168,9 +168,32 @@ struct vxfs_sb_info {
> > ino_t vsi_fshino; /* fileset header inode */
> > daddr_t vsi_oltext; /* OLT extent */
> > daddr_t vsi_oltsize; /* OLT size */
> > + int byte_order;
> > + int silent;
> > +};
>
> Can you align these with the rest of the fields? Also where does the
> silent flag come from here?
I reckon aligning it is not necessary, because this structure is
run-time, cpu endian and it lives in memory only (but storage device)
"silent" int comes from arg passed over to vxfs_fill_super() of the same
name. My workaround, I do not like functions which have long arguments
list. 2 args is fine, 3 - maybe too long, 4 - it must be done something,
5+ - bad day ? ;)
>
> > +#define VXFS_FS32(field1, field2) fhp->field1 = fs32_to_cpu(bo, dbh->field2)
>
> Please remove this wrapper.
>
> > +#define VXFS_FS32(field1, field2) vip->field1 = fs32_to_cpu(bo, dip->field2)
> > +#define VXFS_FS64(field1, field2) vip->field1 = fs64_to_cpu(bo, dip->field2)
> > +#define VXFS_FS16(field1, field2) vip->field1 = fs16_to_cpu(bo, dip->field2)
>
> and these.
well, this will be difficult. These macros help saving lots of typing
over and over again.
although they raise checkpatch errors I will not opt-out because scope
of these macros is local. These macros are not intended to be used
anywhere else than functions like: dip2vip_cpy() and dbh2fhp().
I can #undef them later. Will you agree ?
>
> > + sbp->s_fs_info = infp;
> > + do {
> > + if (!vxfs_try_sb_magic(sbp, 1, cpu_to_le32(VXFS_SUPER_MAGIC))) {
> > + infp->byte_order = BO_LE; /* SCO */
> > + break;
> > + }
> > +
> > + if (!vxfs_try_sb_magic(sbp, 8, cpu_to_be32(VXFS_SUPER_MAGIC))) {
> > + infp->byte_order = BO_BE; /* HP-UX pa-risc likely */
> > + break;
>
> I'd normally move this into a separate patch. But even if this
> stay in here it should be mentioned in the patch changelog.
indeed, that's quite important new feature.
changelog ?, where is it ?
>
> > vxfs_put_fake_inode(infp->vsi_ilist);
> > vxfs_put_fake_inode(infp->vsi_stilist);
> > out:
> > - brelse(bp);
> > + if (infp->vsi_bp)
> > + brelse(infp->vsi_bp);
>
> brelse handles a null argument just fine.
wasn't sure so added the "if" just in case of it didn't handle.
that "if" may stay there still.
>
> > extern void vxfs_inode_info_free(struct vxfs_inode_info *vip);
> > +extern void vxfs_destroy_inode(struct inode *ip);
> >
> > /* vxfs_lookup.c */
> > extern const struct inode_operations vxfs_dir_inode_ops;
> > diff --git a/fs/freevxfs/vxfs_inode.c b/fs/freevxfs/vxfs_inode.c
> > index 16d8d27..9b45ad7 100644
> > --- a/fs/freevxfs/vxfs_inode.c
> > +++ b/fs/freevxfs/vxfs_inode.c
> > @@ -397,7 +397,15 @@ vxfs_iget(struct super_block *sbp, ino_t ino)
> > static void vxfs_i_callback(struct rcu_head *head)
> > {
> > struct inode *inode = container_of(head, struct inode, i_rcu);
> > - kmem_cache_free(vxfs_inode_cachep, inode->i_private);
> > + void *priv = inode->i_private;
> > +
> > + inode->i_private = NULL;
> > + kmem_cache_free(vxfs_inode_cachep, priv);
> > +}
> > +
> > +void vxfs_destroy_inode(struct inode *ip)
> > +{
> > + call_rcu(&ip->i_rcu, vxfs_i_callback);
> > }
>
> I'd rather go for embedding the VFS inode in the VxFS inode like most
> modern file systems do. Take a look at alloc_inode and destroy_inode
> in ext2 for an example.
right, and then alloc callback will be necessary. this will introduce
more changes to e.g. get_fake_inode() and around this function.
Do you really want to have this "more perfect" ?
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Krzysztof Blaszkowski
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2020-01-07 2:36 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-04 13:56 [PATCH] btrfs: relocation: Fix KASAN reports caused by extended reloc tree lifespan Qu Wenruo
2020-01-05 14:49 ` Nikolay Borisov
[not found] ` <b58caea4-476b-bf83-292d-ea71052bbea7@toxicpanda.com>
2020-01-06 18:04 ` r David Sterba
2020-01-06 19:26 ` r Josef Bacik
2020-01-06 18:15 ` [PATCH] btrfs: relocation: Fix KASAN reports caused by extended reloc tree lifespan David Sterba
2020-01-07 2:30 ` Qu Wenruo
2020-01-07 2:35 ` Qu Wenruo
-- strict thread matches above, loose matches on Subject: below --
2016-05-26 14:45 freevxfs: hp-ux support. patchset 1-7/7 Krzysztof Błaszkowski
2016-05-26 15:53 ` r Christoph Hellwig
2016-05-26 17:44 ` r Krzysztof Błaszkowski
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.