* [PATCH] btrfs: fix invalid memory access with journal_info
@ 2018-05-09 10:35 robbieko
2018-05-10 4:53 ` Omar Sandoval
0 siblings, 1 reply; 5+ messages in thread
From: robbieko @ 2018-05-09 10:35 UTC (permalink / raw)
To: linux-btrfs; +Cc: Robbie Ko
From: Robbie Ko <robbieko@synology.com>
When send process requires memory allocation, shrinker may be triggered
due to insufficient memory.
Then evict_inode gets called when inode is freed, and this function
may need to start transaction.
However, the journal_info is already points to BTRFS_SEND_TRANS_STUB, it
passed the if condition,
and the following use yields illegal memory access.
if (current->journal_info) {
WARN_ON(type & TRANS_EXTWRITERS);
h = current->journal_info;
refcount_inc(&h->use_count);
WARN_ON(refcount_read(&h->use_count) > 2);
h->orig_rsv = h->block_rsv;
h->block_rsv = NULL;
goto got_it;
}
Direct IO has a similar problem, journal_info will store btrfs_dio_data,
which will lead to illegal memory access.
We fixed the problem by save the journal_info and restore afterwards.
CallTrace looks like this:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000021
IP: [<ffffffffa086f2d4>] start_transaction+0x64/0x450 [btrfs]
PGD 8fea4b067 PUD a33bea067 PMD 0
Oops: 0000 [#1] SMP
CPU: 3 PID: 12681 Comm: btrfs Tainted: P C O 3.10.102 #15266
RIP: 0010:[<ffffffffa086f2d4>] start_transaction+0x64/0x450 [btrfs]
Call Trace:
[<ffffffffa087a838>] ? btrfs_evict_inode+0x3d8/0x580 [btrfs]
[<ffffffff81115932>] ? evict+0xa2/0x1a0
[<ffffffff81112888>] ? shrink_dentry_list+0x308/0x3d0
[<ffffffff811137f3>] ? prune_dcache_sb+0x133/0x160
[<ffffffff810fa51f>] ? prune_super+0xcf/0x1a0
[<ffffffff810bf6bf>] ? shrink_slab+0x11f/0x1d0
[<ffffffff810c19f2>] ? do_try_to_free_pages+0x452/0x560
[<ffffffff810bf054>] ? throttle_direct_reclaim+0x74/0x240
[<ffffffff810c1bae>] ? try_to_free_pages+0xae/0xc0
[<ffffffff810ba16b>] ? __alloc_pages_nodemask+0x53b/0x9f0
[<ffffffff810bc89c>] ? __do_page_cache_readahead+0xec/0x270
[<ffffffff810bcb2b>] ? ondemand_readahead+0xbb/0x220
[<ffffffffa08d7c43>] ? fill_read_buf+0x2b3/0x3a0 [btrfs]
[<ffffffffa08dbf5e>] ? send_extent_data+0x10e/0x300 [btrfs]
[<ffffffffa08dc34b>] ? process_extent+0x1fb/0x1310 [btrfs]
[<ffffffffa08d8300>] ? iterate_dir_item.isra.28+0x1b0/0x250 [btrfs]
[<ffffffffa08dd500>] ? send_set_xattr+0xa0/0xa0 [btrfs]
[<ffffffffa08de565>] ? changed_cb+0xd5/0xc40 [btrfs]
[<ffffffffa08df1c2>] ? full_send_tree+0xf2/0x1a0 [btrfs]
[<ffffffffa08e022b>] ? btrfs_ioctl_send+0xfbb/0x1040 [btrfs]
[<ffffffffa08a9864>] ? btrfs_ioctl+0x1084/0x32a0 [btrfs]
Signed-off-by: Robbie Ko <robbieko@synology.com>
---
fs/btrfs/inode.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index f534701..77aec8d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5295,6 +5295,7 @@ void btrfs_evict_inode(struct inode *inode)
int steal_from_global = 0;
u64 min_size;
int ret;
+ void *journal_info = NULL;
trace_btrfs_inode_evict(inode);
@@ -5303,6 +5304,16 @@ void btrfs_evict_inode(struct inode *inode)
return;
}
+ /*
+ * Send or Direct IO may store information in journal_info.
+ * However, this function may use start_transaction and
+ * start_transaction will use journal_info.
+ * To avoid accessing invalid memory, we can save the journal_info
+ * and restore it later.
+ */
+ journal_info = current->journal_info;
+ current->journal_info = NULL;
+
min_size = btrfs_calc_trunc_metadata_size(fs_info, 1);
evict_inode_truncate_pages(inode);
@@ -5462,6 +5473,7 @@ void btrfs_evict_inode(struct inode *inode)
no_delete:
btrfs_remove_delayed_node(BTRFS_I(inode));
clear_inode(inode);
+ current->journal_info = journal_info;
}
/*
--
1.9.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH] btrfs: fix invalid memory access with journal_info
2018-05-09 10:35 [PATCH] btrfs: fix invalid memory access with journal_info robbieko
@ 2018-05-10 4:53 ` Omar Sandoval
2018-05-10 5:55 ` robbieko
2018-05-10 13:01 ` David Sterba
0 siblings, 2 replies; 5+ messages in thread
From: Omar Sandoval @ 2018-05-10 4:53 UTC (permalink / raw)
To: robbieko; +Cc: linux-btrfs
On Wed, May 09, 2018 at 06:35:25PM +0800, robbieko wrote:
> From: Robbie Ko <robbieko@synology.com>
>
> When send process requires memory allocation, shrinker may be triggered
> due to insufficient memory.
> Then evict_inode gets called when inode is freed, and this function
> may need to start transaction.
> However, the journal_info is already points to BTRFS_SEND_TRANS_STUB, it
> passed the if condition,
> and the following use yields illegal memory access.
>
> if (current->journal_info) {
> WARN_ON(type & TRANS_EXTWRITERS);
> h = current->journal_info;
> refcount_inc(&h->use_count);
> WARN_ON(refcount_read(&h->use_count) > 2);
> h->orig_rsv = h->block_rsv;
> h->block_rsv = NULL;
> goto got_it;
> }
start_transaction() has
ASSERT(current->journal_info != BTRFS_SEND_TRANS_STUB);
Are you saying that's wrong? Are there other cases where the shrinker
can end up starting a transaction?
> Direct IO has a similar problem, journal_info will store btrfs_dio_data,
> which will lead to illegal memory access.
I have patches getting rid of this for direct I/O here:
https://github.com/osandov/linux/tree/btrfs-journal-info-abuse
I originally did that for btrfs swapfile support, but if it actually
fixes an existing bug it should be easy to get merged.
> We fixed the problem by save the journal_info and restore afterwards.
>
> CallTrace looks like this:
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000021
> IP: [<ffffffffa086f2d4>] start_transaction+0x64/0x450 [btrfs]
> PGD 8fea4b067 PUD a33bea067 PMD 0
> Oops: 0000 [#1] SMP
> CPU: 3 PID: 12681 Comm: btrfs Tainted: P C O 3.10.102 #15266
> RIP: 0010:[<ffffffffa086f2d4>] start_transaction+0x64/0x450 [btrfs]
> Call Trace:
> [<ffffffffa087a838>] ? btrfs_evict_inode+0x3d8/0x580 [btrfs]
> [<ffffffff81115932>] ? evict+0xa2/0x1a0
> [<ffffffff81112888>] ? shrink_dentry_list+0x308/0x3d0
> [<ffffffff811137f3>] ? prune_dcache_sb+0x133/0x160
> [<ffffffff810fa51f>] ? prune_super+0xcf/0x1a0
> [<ffffffff810bf6bf>] ? shrink_slab+0x11f/0x1d0
> [<ffffffff810c19f2>] ? do_try_to_free_pages+0x452/0x560
> [<ffffffff810bf054>] ? throttle_direct_reclaim+0x74/0x240
> [<ffffffff810c1bae>] ? try_to_free_pages+0xae/0xc0
> [<ffffffff810ba16b>] ? __alloc_pages_nodemask+0x53b/0x9f0
> [<ffffffff810bc89c>] ? __do_page_cache_readahead+0xec/0x270
> [<ffffffff810bcb2b>] ? ondemand_readahead+0xbb/0x220
> [<ffffffffa08d7c43>] ? fill_read_buf+0x2b3/0x3a0 [btrfs]
> [<ffffffffa08dbf5e>] ? send_extent_data+0x10e/0x300 [btrfs]
> [<ffffffffa08dc34b>] ? process_extent+0x1fb/0x1310 [btrfs]
> [<ffffffffa08d8300>] ? iterate_dir_item.isra.28+0x1b0/0x250 [btrfs]
> [<ffffffffa08dd500>] ? send_set_xattr+0xa0/0xa0 [btrfs]
> [<ffffffffa08de565>] ? changed_cb+0xd5/0xc40 [btrfs]
> [<ffffffffa08df1c2>] ? full_send_tree+0xf2/0x1a0 [btrfs]
> [<ffffffffa08e022b>] ? btrfs_ioctl_send+0xfbb/0x1040 [btrfs]
> [<ffffffffa08a9864>] ? btrfs_ioctl+0x1084/0x32a0 [btrfs]
>
> Signed-off-by: Robbie Ko <robbieko@synology.com>
> ---
> fs/btrfs/inode.c | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index f534701..77aec8d 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -5295,6 +5295,7 @@ void btrfs_evict_inode(struct inode *inode)
> int steal_from_global = 0;
> u64 min_size;
> int ret;
> + void *journal_info = NULL;
This initialization isn't necessary.
> trace_btrfs_inode_evict(inode);
>
> @@ -5303,6 +5304,16 @@ void btrfs_evict_inode(struct inode *inode)
> return;
> }
>
> + /*
> + * Send or Direct IO may store information in journal_info.
> + * However, this function may use start_transaction and
> + * start_transaction will use journal_info.
> + * To avoid accessing invalid memory, we can save the journal_info
> + * and restore it later.
> + */
> + journal_info = current->journal_info;
> + current->journal_info = NULL;
> +
> min_size = btrfs_calc_trunc_metadata_size(fs_info, 1);
>
> evict_inode_truncate_pages(inode);
> @@ -5462,6 +5473,7 @@ void btrfs_evict_inode(struct inode *inode)
> no_delete:
> btrfs_remove_delayed_node(BTRFS_I(inode));
> clear_inode(inode);
> + current->journal_info = journal_info;
> }
>
> /*
> --
> 1.9.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] btrfs: fix invalid memory access with journal_info
2018-05-10 4:53 ` Omar Sandoval
@ 2018-05-10 5:55 ` robbieko
2018-05-10 12:56 ` David Sterba
2018-05-10 13:01 ` David Sterba
1 sibling, 1 reply; 5+ messages in thread
From: robbieko @ 2018-05-10 5:55 UTC (permalink / raw)
To: Omar Sandoval; +Cc: linux-btrfs, linux-btrfs-owner
Omar Sandoval 於 2018-05-10 12:53 寫到:
> On Wed, May 09, 2018 at 06:35:25PM +0800, robbieko wrote:
>> From: Robbie Ko <robbieko@synology.com>
>>
>> When send process requires memory allocation, shrinker may be
>> triggered
>> due to insufficient memory.
>> Then evict_inode gets called when inode is freed, and this function
>> may need to start transaction.
>> However, the journal_info is already points to BTRFS_SEND_TRANS_STUB,
>> it
>> passed the if condition,
>> and the following use yields illegal memory access.
>>
>> if (current->journal_info) {
>> WARN_ON(type & TRANS_EXTWRITERS);
>> h = current->journal_info;
>> refcount_inc(&h->use_count);
>> WARN_ON(refcount_read(&h->use_count) > 2);
>> h->orig_rsv = h->block_rsv;
>> h->block_rsv = NULL;
>> goto got_it;
>> }
>
> start_transaction() has
>
> ASSERT(current->journal_info != BTRFS_SEND_TRANS_STUB);
I didn't turn on the configuration CONFIG_BTRFS_ASSERT, so this ASSERT
has no effect
4506 #ifdef CONFIG_BTRFS_ASSERT
4507
4508 static inline void assfail(char *expr, char *file, int line)
4509 {
4510 pr_err("BTRFS: assertion failed: %s, file: %s, line: %d",
4511 expr, file, line);
4512 BUG();
4513 }
4514
4515 #define ASSERT(expr) \
4516 (likely(expr) ? (void)0 : assfail(#expr, __FILE__, __LINE__))
4517 #else
4518 #define ASSERT(expr) ((void)0)
4519 #endif
>
> Are you saying that's wrong? Are there other cases where the shrinker
> can end up starting a transaction?
I'm not sure if there are other cases where shrinker might trigger
start_transaction.
But I confirm that btrfs_evict_inode triggers strat_transaction
>
>> Direct IO has a similar problem, journal_info will store
>> btrfs_dio_data,
>> which will lead to illegal memory access.
>
> I have patches getting rid of this for direct I/O here:
> https://github.com/osandov/linux/tree/btrfs-journal-info-abuse
>
> I originally did that for btrfs swapfile support, but if it actually
> fixes an existing bug it should be easy to get merged.
>
>> We fixed the problem by save the journal_info and restore afterwards.
>>
>> CallTrace looks like this:
>> BUG: unable to handle kernel NULL pointer dereference at
>> 0000000000000021
>> IP: [<ffffffffa086f2d4>] start_transaction+0x64/0x450 [btrfs]
>> PGD 8fea4b067 PUD a33bea067 PMD 0
>> Oops: 0000 [#1] SMP
>> CPU: 3 PID: 12681 Comm: btrfs Tainted: P C O 3.10.102 #15266
>> RIP: 0010:[<ffffffffa086f2d4>] start_transaction+0x64/0x450 [btrfs]
>> Call Trace:
>> [<ffffffffa087a838>] ? btrfs_evict_inode+0x3d8/0x580 [btrfs]
>> [<ffffffff81115932>] ? evict+0xa2/0x1a0
>> [<ffffffff81112888>] ? shrink_dentry_list+0x308/0x3d0
>> [<ffffffff811137f3>] ? prune_dcache_sb+0x133/0x160
>> [<ffffffff810fa51f>] ? prune_super+0xcf/0x1a0
>> [<ffffffff810bf6bf>] ? shrink_slab+0x11f/0x1d0
>> [<ffffffff810c19f2>] ? do_try_to_free_pages+0x452/0x560
>> [<ffffffff810bf054>] ? throttle_direct_reclaim+0x74/0x240
>> [<ffffffff810c1bae>] ? try_to_free_pages+0xae/0xc0
>> [<ffffffff810ba16b>] ? __alloc_pages_nodemask+0x53b/0x9f0
>> [<ffffffff810bc89c>] ? __do_page_cache_readahead+0xec/0x270
>> [<ffffffff810bcb2b>] ? ondemand_readahead+0xbb/0x220
>> [<ffffffffa08d7c43>] ? fill_read_buf+0x2b3/0x3a0 [btrfs]
>> [<ffffffffa08dbf5e>] ? send_extent_data+0x10e/0x300 [btrfs]
>> [<ffffffffa08dc34b>] ? process_extent+0x1fb/0x1310 [btrfs]
>> [<ffffffffa08d8300>] ? iterate_dir_item.isra.28+0x1b0/0x250 [btrfs]
>> [<ffffffffa08dd500>] ? send_set_xattr+0xa0/0xa0 [btrfs]
>> [<ffffffffa08de565>] ? changed_cb+0xd5/0xc40 [btrfs]
>> [<ffffffffa08df1c2>] ? full_send_tree+0xf2/0x1a0 [btrfs]
>> [<ffffffffa08e022b>] ? btrfs_ioctl_send+0xfbb/0x1040 [btrfs]
>> [<ffffffffa08a9864>] ? btrfs_ioctl+0x1084/0x32a0 [btrfs]
>>
>> Signed-off-by: Robbie Ko <robbieko@synology.com>
>> ---
>> fs/btrfs/inode.c | 12 ++++++++++++
>> 1 file changed, 12 insertions(+)
>>
>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>> index f534701..77aec8d 100644
>> --- a/fs/btrfs/inode.c
>> +++ b/fs/btrfs/inode.c
>> @@ -5295,6 +5295,7 @@ void btrfs_evict_inode(struct inode *inode)
>> int steal_from_global = 0;
>> u64 min_size;
>> int ret;
>> + void *journal_info = NULL;
>
> This initialization isn't necessary.
>
>> trace_btrfs_inode_evict(inode);
>>
>> @@ -5303,6 +5304,16 @@ void btrfs_evict_inode(struct inode *inode)
>> return;
>> }
>>
>> + /*
>> + * Send or Direct IO may store information in journal_info.
>> + * However, this function may use start_transaction and
>> + * start_transaction will use journal_info.
>> + * To avoid accessing invalid memory, we can save the journal_info
>> + * and restore it later.
>> + */
>> + journal_info = current->journal_info;
>> + current->journal_info = NULL;
>> +
>> min_size = btrfs_calc_trunc_metadata_size(fs_info, 1);
>>
>> evict_inode_truncate_pages(inode);
>> @@ -5462,6 +5473,7 @@ void btrfs_evict_inode(struct inode *inode)
>> no_delete:
>> btrfs_remove_delayed_node(BTRFS_I(inode));
>> clear_inode(inode);
>> + current->journal_info = journal_info;
>> }
>>
>> /*
>> --
>> 1.9.1
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
>> in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
> in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] btrfs: fix invalid memory access with journal_info
2018-05-10 5:55 ` robbieko
@ 2018-05-10 12:56 ` David Sterba
0 siblings, 0 replies; 5+ messages in thread
From: David Sterba @ 2018-05-10 12:56 UTC (permalink / raw)
To: robbieko; +Cc: Omar Sandoval, linux-btrfs, linux-btrfs-owner
On Thu, May 10, 2018 at 01:55:45PM +0800, robbieko wrote:
> Omar Sandoval 於 2018-05-10 12:53 寫到:
> > On Wed, May 09, 2018 at 06:35:25PM +0800, robbieko wrote:
> >> From: Robbie Ko <robbieko@synology.com>
> >>
> >> When send process requires memory allocation, shrinker may be
> >> triggered
> >> due to insufficient memory.
> >> Then evict_inode gets called when inode is freed, and this function
> >> may need to start transaction.
> >> However, the journal_info is already points to BTRFS_SEND_TRANS_STUB,
> >> it
> >> passed the if condition,
> >> and the following use yields illegal memory access.
> >>
> >> if (current->journal_info) {
> >> WARN_ON(type & TRANS_EXTWRITERS);
> >> h = current->journal_info;
> >> refcount_inc(&h->use_count);
> >> WARN_ON(refcount_read(&h->use_count) > 2);
> >> h->orig_rsv = h->block_rsv;
> >> h->block_rsv = NULL;
> >> goto got_it;
> >> }
> >
> > start_transaction() has
> >
> > ASSERT(current->journal_info != BTRFS_SEND_TRANS_STUB);
>
> I didn't turn on the configuration CONFIG_BTRFS_ASSERT, so this ASSERT
> has no effect
Asserts are usually turned on, at least during development so we can
catch all sorts of problems early. In this case it caught problem that
needs to be fixed regardless of the assert being on or off.
>
> 4506 #ifdef CONFIG_BTRFS_ASSERT
> 4507
> 4508 static inline void assfail(char *expr, char *file, int line)
> 4509 {
> 4510 pr_err("BTRFS: assertion failed: %s, file: %s, line: %d",
> 4511 expr, file, line);
> 4512 BUG();
> 4513 }
> 4514
> 4515 #define ASSERT(expr) \
> 4516 (likely(expr) ? (void)0 : assfail(#expr, __FILE__, __LINE__))
> 4517 #else
> 4518 #define ASSERT(expr) ((void)0)
> 4519 #endif
>
>
> >
> > Are you saying that's wrong? Are there other cases where the shrinker
> > can end up starting a transaction?
>
> I'm not sure if there are other cases where shrinker might trigger
> start_transaction.
> But I confirm that btrfs_evict_inode triggers strat_transaction
The shrinker and allocator can call to filesystems to let them write
data and free some memory, but this is typically guarded by the GFP_NOFS
so it does not recurse to the same filesystem (and deadlock).
Playing games with the journal_info can fix that but I'd rather not
spread the pattern, Omar's removal of that from dio looks cleaner and
the right way to go.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] btrfs: fix invalid memory access with journal_info
2018-05-10 4:53 ` Omar Sandoval
2018-05-10 5:55 ` robbieko
@ 2018-05-10 13:01 ` David Sterba
1 sibling, 0 replies; 5+ messages in thread
From: David Sterba @ 2018-05-10 13:01 UTC (permalink / raw)
To: Omar Sandoval; +Cc: robbieko, linux-btrfs
On Wed, May 09, 2018 at 09:53:03PM -0700, Omar Sandoval wrote:
> On Wed, May 09, 2018 at 06:35:25PM +0800, robbieko wrote:
> > From: Robbie Ko <robbieko@synology.com>
> >
> > When send process requires memory allocation, shrinker may be triggered
> > due to insufficient memory.
> > Then evict_inode gets called when inode is freed, and this function
> > may need to start transaction.
> > However, the journal_info is already points to BTRFS_SEND_TRANS_STUB, it
> > passed the if condition,
> > and the following use yields illegal memory access.
> >
> > if (current->journal_info) {
> > WARN_ON(type & TRANS_EXTWRITERS);
> > h = current->journal_info;
> > refcount_inc(&h->use_count);
> > WARN_ON(refcount_read(&h->use_count) > 2);
> > h->orig_rsv = h->block_rsv;
> > h->block_rsv = NULL;
> > goto got_it;
> > }
>
> start_transaction() has
>
> ASSERT(current->journal_info != BTRFS_SEND_TRANS_STUB);
>
> Are you saying that's wrong? Are there other cases where the shrinker
> can end up starting a transaction?
>
> > Direct IO has a similar problem, journal_info will store btrfs_dio_data,
> > which will lead to illegal memory access.
>
> I have patches getting rid of this for direct I/O here:
> https://github.com/osandov/linux/tree/btrfs-journal-info-abuse
>
> I originally did that for btrfs swapfile support, but if it actually
> fixes an existing bug it should be easy to get merged.
Yes please.
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2018-05-10 13:04 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-09 10:35 [PATCH] btrfs: fix invalid memory access with journal_info robbieko
2018-05-10 4:53 ` Omar Sandoval
2018-05-10 5:55 ` robbieko
2018-05-10 12:56 ` David Sterba
2018-05-10 13:01 ` David Sterba
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.