* [PATCH 0/4] btrfs: workaround exhausted anonymous block device pool @ 2020-06-16 2:17 Qu Wenruo 2020-06-16 2:17 ` [PATCH 1/4] btrfs: disk-io: don't allocate anonymous block device for user invisible roots Qu Wenruo ` (4 more replies) 0 siblings, 5 replies; 27+ messages in thread From: Qu Wenruo @ 2020-06-16 2:17 UTC (permalink / raw) To: linux-btrfs There is a user report about transaction abort with a rare error number, -EMFILE: ------------[ cut here ]------------ BTRFS: Transaction aborted (error -24) WARNING: CPU: 17 PID: 17041 at fs/btrfs/transaction.c:1576 create_pending_snapshot+0xbc4/0xd10 [btrfs] RIP: 0010:create_pending_snapshot+0xbc4/0xd10 [btrfs] Call Trace: create_pending_snapshots+0x82/0xa0 [btrfs] btrfs_commit_transaction+0x275/0x8c0 [btrfs] btrfs_mksubvol+0x4b9/0x500 [btrfs] btrfs_ioctl_snap_create_transid+0x174/0x180 [btrfs] btrfs_ioctl_snap_create_v2+0x11c/0x180 [btrfs] btrfs_ioctl+0x11a4/0x2da0 [btrfs] do_vfs_ioctl+0xa9/0x640 ksys_ioctl+0x67/0x90 __x64_sys_ioctl+0x1a/0x20 do_syscall_64+0x5a/0x110 entry_SYSCALL_64_after_hwframe+0x44/0xa9 ---[ end trace 33f2f83f3d5250e9 ]--- BTRFS: error (device sda1) in create_pending_snapshot:1576: errno=-24 unknown BTRFS info (device sda1): forced readonly BTRFS warning (device sda1): Skipping commit of aborted transaction. BTRFS: error (device sda1) in cleanup_transaction:1831: errno=-24 unknown It turns out that, it's caused by exhausted anonymous block device pool. For that case, we don't have any good way to enlarge that pool to match btrfs subvolume limits. But still we can improve the situation not to abort transaction, but report error gracefully to user. This patchset will: - Reduce the user of anonymous block device pool Although it can only reduce the user by data reloc tree and orphan roots, it should still save save several ids. - Catch the unintialized btrfs_root::anon_dev for user visible roots Just an extra safe net for previous patch - Preallocate anon_dev for snapshot/subvolume creation So user will get a graceful error other than transaction abort This patch should be the core of the patchset, as the remaining part won't have much user visible effect. - Free anon_dev earlier to reduce the lifespan of anon_dev Qu Wenruo (4): btrfs: disk-io: don't allocate anonymous block device for user invisible roots btrfs: detect uninitialized btrfs_root::anon_dev for user visible subvolumes btrfs: preallocate anon_dev for subvolume and snapshot creation btrfs: free anon_dev earlier to prevent exhausting anonymous block device pool fs/btrfs/disk-io.c | 59 +++++++++++++++++++++++++++++++++++++----- fs/btrfs/disk-io.h | 2 ++ fs/btrfs/inode.c | 11 +++++++- fs/btrfs/ioctl.c | 21 ++++++++++++++- fs/btrfs/transaction.c | 3 ++- fs/btrfs/transaction.h | 2 ++ 6 files changed, 88 insertions(+), 10 deletions(-) -- 2.27.0 ^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH 1/4] btrfs: disk-io: don't allocate anonymous block device for user invisible roots 2020-06-16 2:17 [PATCH 0/4] btrfs: workaround exhausted anonymous block device pool Qu Wenruo @ 2020-06-16 2:17 ` Qu Wenruo 2020-06-16 19:21 ` Josef Bacik 2020-06-16 2:17 ` [PATCH 2/4] btrfs: detect uninitialized btrfs_root::anon_dev for user visible subvolumes Qu Wenruo ` (3 subsequent siblings) 4 siblings, 1 reply; 27+ messages in thread From: Qu Wenruo @ 2020-06-16 2:17 UTC (permalink / raw) To: linux-btrfs; +Cc: Greed Rong [BUG] When a lot of subvolumes are created, there is a user report about transaction aborted: ------------[ cut here ]------------ BTRFS: Transaction aborted (error -24) WARNING: CPU: 17 PID: 17041 at fs/btrfs/transaction.c:1576 create_pending_snapshot+0xbc4/0xd10 [btrfs] RIP: 0010:create_pending_snapshot+0xbc4/0xd10 [btrfs] Call Trace: create_pending_snapshots+0x82/0xa0 [btrfs] btrfs_commit_transaction+0x275/0x8c0 [btrfs] btrfs_mksubvol+0x4b9/0x500 [btrfs] btrfs_ioctl_snap_create_transid+0x174/0x180 [btrfs] btrfs_ioctl_snap_create_v2+0x11c/0x180 [btrfs] btrfs_ioctl+0x11a4/0x2da0 [btrfs] do_vfs_ioctl+0xa9/0x640 ksys_ioctl+0x67/0x90 __x64_sys_ioctl+0x1a/0x20 do_syscall_64+0x5a/0x110 entry_SYSCALL_64_after_hwframe+0x44/0xa9 ---[ end trace 33f2f83f3d5250e9 ]--- BTRFS: error (device sda1) in create_pending_snapshot:1576: errno=-24 unknown BTRFS info (device sda1): forced readonly BTRFS warning (device sda1): Skipping commit of aborted transaction. BTRFS: error (device sda1) in cleanup_transaction:1831: errno=-24 unknown [CAUSE] The root cause is we don't have unlimited resource for anonymous block device number. The anonymous block device pool only contains 1<<20 devices, and is shared across a several fses, like ceph and overlayfs. While btrfs has support for 1<<48 subvolumes, so it's just a problem of time to hit such limit. [WORKAROUND] Since it's not possible to completely solve the problem, we can only workaround it. Firstly, we can reduce the user of anon_dev. Data reloc tree is not visible to users, thus it doesn't need anon_dev at all. This patch will do extra check on root objectid, to rule out roots who don't need anon_dev. Although currently it's only data reloc tree and orphan roots. Reported-by: Greed Rong <greedrong@gmail.com> Link: https://lore.kernel.org/linux-btrfs/CA+UqX+NTrZ6boGnWHhSeZmEY5J76CTqmYjO2S+=tHJX7nb9DPw@mail.gmail.com/ Signed-off-by: Qu Wenruo <wqu@suse.com> --- fs/btrfs/disk-io.c | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index c70d47b8090a..cfc0ff288238 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1428,9 +1428,17 @@ static int btrfs_init_fs_root(struct btrfs_root *root) spin_lock_init(&root->ino_cache_lock); init_waitqueue_head(&root->ino_cache_wait); - ret = get_anon_bdev(&root->anon_dev); - if (ret) - goto fail; + /* + * Anonymous block device pool has limited size (1M), which is way + * smaller than btrfs subvolumes limits (1<<48). + * We shouldn't allocate any if it's not a user visible subvolume. + */ + if (is_fstree(root->root_key.objectid) && + btrfs_root_refs(&root->root_item)) { + ret = get_anon_bdev(&root->anon_dev); + if (ret) + goto fail; + } mutex_lock(&root->objectid_mutex); ret = btrfs_find_highest_objectid(root, -- 2.27.0 ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH 1/4] btrfs: disk-io: don't allocate anonymous block device for user invisible roots 2020-06-16 2:17 ` [PATCH 1/4] btrfs: disk-io: don't allocate anonymous block device for user invisible roots Qu Wenruo @ 2020-06-16 19:21 ` Josef Bacik 0 siblings, 0 replies; 27+ messages in thread From: Josef Bacik @ 2020-06-16 19:21 UTC (permalink / raw) To: Qu Wenruo, linux-btrfs; +Cc: Greed Rong On 6/15/20 10:17 PM, Qu Wenruo wrote: > [BUG] > When a lot of subvolumes are created, there is a user report about > transaction aborted: > > ------------[ cut here ]------------ > BTRFS: Transaction aborted (error -24) > WARNING: CPU: 17 PID: 17041 at fs/btrfs/transaction.c:1576 create_pending_snapshot+0xbc4/0xd10 [btrfs] > RIP: 0010:create_pending_snapshot+0xbc4/0xd10 [btrfs] > Call Trace: > create_pending_snapshots+0x82/0xa0 [btrfs] > btrfs_commit_transaction+0x275/0x8c0 [btrfs] > btrfs_mksubvol+0x4b9/0x500 [btrfs] > btrfs_ioctl_snap_create_transid+0x174/0x180 [btrfs] > btrfs_ioctl_snap_create_v2+0x11c/0x180 [btrfs] > btrfs_ioctl+0x11a4/0x2da0 [btrfs] > do_vfs_ioctl+0xa9/0x640 > ksys_ioctl+0x67/0x90 > __x64_sys_ioctl+0x1a/0x20 > do_syscall_64+0x5a/0x110 > entry_SYSCALL_64_after_hwframe+0x44/0xa9 > ---[ end trace 33f2f83f3d5250e9 ]--- > BTRFS: error (device sda1) in create_pending_snapshot:1576: errno=-24 unknown > BTRFS info (device sda1): forced readonly > BTRFS warning (device sda1): Skipping commit of aborted transaction. > BTRFS: error (device sda1) in cleanup_transaction:1831: errno=-24 unknown > > [CAUSE] > The root cause is we don't have unlimited resource for anonymous block > device number. > The anonymous block device pool only contains 1<<20 devices, and is > shared across a several fses, like ceph and overlayfs. > > While btrfs has support for 1<<48 subvolumes, so it's just a problem of > time to hit such limit. > > [WORKAROUND] > Since it's not possible to completely solve the problem, we can only > workaround it. > > Firstly, we can reduce the user of anon_dev. Data reloc tree is not visible > to users, thus it doesn't need anon_dev at all. > > This patch will do extra check on root objectid, to rule out roots who > don't need anon_dev. > Although currently it's only data reloc tree and orphan roots. > > Reported-by: Greed Rong <greedrong@gmail.com> > Link: https://lore.kernel.org/linux-btrfs/CA+UqX+NTrZ6boGnWHhSeZmEY5J76CTqmYjO2S+=tHJX7nb9DPw@mail.gmail.com/ > Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Thanks, Josef ^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH 2/4] btrfs: detect uninitialized btrfs_root::anon_dev for user visible subvolumes 2020-06-16 2:17 [PATCH 0/4] btrfs: workaround exhausted anonymous block device pool Qu Wenruo 2020-06-16 2:17 ` [PATCH 1/4] btrfs: disk-io: don't allocate anonymous block device for user invisible roots Qu Wenruo @ 2020-06-16 2:17 ` Qu Wenruo 2020-06-16 19:25 ` Josef Bacik 2020-06-16 2:17 ` [PATCH 3/4] btrfs: preallocate anon_dev for subvolume and snapshot creation Qu Wenruo ` (2 subsequent siblings) 4 siblings, 1 reply; 27+ messages in thread From: Qu Wenruo @ 2020-06-16 2:17 UTC (permalink / raw) To: linux-btrfs btrfs_root::anon_dev is an indicator for different subvolume namespaces. Thus it should always be initialized to an anonymous block device. Add a safe net to catch such uninitialized values. Signed-off-by: Qu Wenruo <wqu@suse.com> --- fs/btrfs/inode.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 00647e8cf2df..195aac71fe32 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -8697,6 +8697,7 @@ static int btrfs_getattr(const struct path *path, struct kstat *stat, { u64 delalloc_bytes; struct inode *inode = d_inode(path->dentry); + struct btrfs_root *root = BTRFS_I(inode)->root; u32 blocksize = inode->i_sb->s_blocksize; u32 bi_flags = BTRFS_I(inode)->flags; @@ -8718,7 +8719,13 @@ static int btrfs_getattr(const struct path *path, struct kstat *stat, STATX_ATTR_NODUMP); generic_fillattr(inode, stat); - stat->dev = BTRFS_I(inode)->root->anon_dev; + if (unlikely(!root->anon_dev)) { + WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG)); + btrfs_warn(root->fs_info, + "uninitialized btrfs_root::anon_dev detected, root=%llu", + root->root_key.objectid); + } + stat->dev = root->anon_dev; spin_lock(&BTRFS_I(inode)->lock); delalloc_bytes = BTRFS_I(inode)->new_delalloc_bytes; -- 2.27.0 ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH 2/4] btrfs: detect uninitialized btrfs_root::anon_dev for user visible subvolumes 2020-06-16 2:17 ` [PATCH 2/4] btrfs: detect uninitialized btrfs_root::anon_dev for user visible subvolumes Qu Wenruo @ 2020-06-16 19:25 ` Josef Bacik 2020-06-16 22:49 ` Qu Wenruo 0 siblings, 1 reply; 27+ messages in thread From: Josef Bacik @ 2020-06-16 19:25 UTC (permalink / raw) To: Qu Wenruo, linux-btrfs On 6/15/20 10:17 PM, Qu Wenruo wrote: > btrfs_root::anon_dev is an indicator for different subvolume namespaces. > Thus it should always be initialized to an anonymous block device. > > Add a safe net to catch such uninitialized values. > > Signed-off-by: Qu Wenruo <wqu@suse.com> Can we handle stat->dev not having a device set? Or will this blow up in other ways? Thanks, Josef ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 2/4] btrfs: detect uninitialized btrfs_root::anon_dev for user visible subvolumes 2020-06-16 19:25 ` Josef Bacik @ 2020-06-16 22:49 ` Qu Wenruo 2020-06-16 23:32 ` Josef Bacik 0 siblings, 1 reply; 27+ messages in thread From: Qu Wenruo @ 2020-06-16 22:49 UTC (permalink / raw) To: Josef Bacik, Qu Wenruo, linux-btrfs [-- Attachment #1.1: Type: text/plain, Size: 746 bytes --] On 2020/6/17 上午3:25, Josef Bacik wrote: > On 6/15/20 10:17 PM, Qu Wenruo wrote: >> btrfs_root::anon_dev is an indicator for different subvolume namespaces. >> Thus it should always be initialized to an anonymous block device. >> >> Add a safe net to catch such uninitialized values. >> >> Signed-off-by: Qu Wenruo <wqu@suse.com> > > Can we handle stat->dev not having a device set? Or will this blow up > in other ways? Thanks, We can handle it without any problem, just users get confused. As a common practice, we use different bdev as a namespace for different subvolumes. Without a valid bdev, some user space tools may not be able to distinguish inodes in different subvolumes. Thanks, Qu > > Josef [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 2/4] btrfs: detect uninitialized btrfs_root::anon_dev for user visible subvolumes 2020-06-16 22:49 ` Qu Wenruo @ 2020-06-16 23:32 ` Josef Bacik 2020-06-16 23:49 ` Qu Wenruo 0 siblings, 1 reply; 27+ messages in thread From: Josef Bacik @ 2020-06-16 23:32 UTC (permalink / raw) To: Qu Wenruo, Qu Wenruo, linux-btrfs On 6/16/20 6:49 PM, Qu Wenruo wrote: > > > On 2020/6/17 上午3:25, Josef Bacik wrote: >> On 6/15/20 10:17 PM, Qu Wenruo wrote: >>> btrfs_root::anon_dev is an indicator for different subvolume namespaces. >>> Thus it should always be initialized to an anonymous block device. >>> >>> Add a safe net to catch such uninitialized values. >>> >>> Signed-off-by: Qu Wenruo <wqu@suse.com> >> >> Can we handle stat->dev not having a device set? Or will this blow up >> in other ways? Thanks, > > We can handle it without any problem, just users get confused. > > As a common practice, we use different bdev as a namespace for different > subvolumes. > Without a valid bdev, some user space tools may not be able to > distinguish inodes in different subvolumes. > Alright that's fine then. But I feel like stat is one of those things that'll flood the console, can we put this somewhere else that's going to be hit less? Thanks, Josef ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 2/4] btrfs: detect uninitialized btrfs_root::anon_dev for user visible subvolumes 2020-06-16 23:32 ` Josef Bacik @ 2020-06-16 23:49 ` Qu Wenruo 2020-06-17 11:31 ` David Sterba 0 siblings, 1 reply; 27+ messages in thread From: Qu Wenruo @ 2020-06-16 23:49 UTC (permalink / raw) To: Josef Bacik, Qu Wenruo, linux-btrfs On 2020/6/17 上午7:32, Josef Bacik wrote: > On 6/16/20 6:49 PM, Qu Wenruo wrote: >> >> >> On 2020/6/17 上午3:25, Josef Bacik wrote: >>> On 6/15/20 10:17 PM, Qu Wenruo wrote: >>>> btrfs_root::anon_dev is an indicator for different subvolume >>>> namespaces. >>>> Thus it should always be initialized to an anonymous block device. >>>> >>>> Add a safe net to catch such uninitialized values. >>>> >>>> Signed-off-by: Qu Wenruo <wqu@suse.com> >>> >>> Can we handle stat->dev not having a device set? Or will this blow up >>> in other ways? Thanks, >> >> We can handle it without any problem, just users get confused. >> >> As a common practice, we use different bdev as a namespace for different >> subvolumes. >> Without a valid bdev, some user space tools may not be able to >> distinguish inodes in different subvolumes. >> > > Alright that's fine then. But I feel like stat is one of those things > that'll flood the console, can we put this somewhere else that's going > to be hit less? Thanks, Unfortunately, stat() is the only user of btrfs_root::anon_dev. While fortunately, the logical is pretty simple, even without the safe net we can understand the lifespan pretty well. I'm fine to drop this patch if you're concerned about the possible warning flood, as the benefit is really not that much. Thanks, Qu > > Josef ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 2/4] btrfs: detect uninitialized btrfs_root::anon_dev for user visible subvolumes 2020-06-16 23:49 ` Qu Wenruo @ 2020-06-17 11:31 ` David Sterba 2020-06-17 13:37 ` Josef Bacik 0 siblings, 1 reply; 27+ messages in thread From: David Sterba @ 2020-06-17 11:31 UTC (permalink / raw) To: Qu Wenruo; +Cc: Josef Bacik, Qu Wenruo, linux-btrfs On Wed, Jun 17, 2020 at 07:49:33AM +0800, Qu Wenruo wrote: > >>> Can we handle stat->dev not having a device set? Or will this blow up > >>> in other ways? Thanks, > >> > >> We can handle it without any problem, just users get confused. > >> > >> As a common practice, we use different bdev as a namespace for different > >> subvolumes. > >> Without a valid bdev, some user space tools may not be able to > >> distinguish inodes in different subvolumes. > >> > > > > Alright that's fine then. But I feel like stat is one of those things > > that'll flood the console, can we put this somewhere else that's going > > to be hit less? Thanks, > > Unfortunately, stat() is the only user of btrfs_root::anon_dev. > > While fortunately, the logical is pretty simple, even without the safe > net we can understand the lifespan pretty well. > > I'm fine to drop this patch if you're concerned about the possible > warning flood, as the benefit is really not that much. It could be a developer-only warning but if there's a root with a bad anon_dev, a simple 'ls -l' would flood the log for sure. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 2/4] btrfs: detect uninitialized btrfs_root::anon_dev for user visible subvolumes 2020-06-17 11:31 ` David Sterba @ 2020-06-17 13:37 ` Josef Bacik 2020-06-17 23:39 ` Qu Wenruo 0 siblings, 1 reply; 27+ messages in thread From: Josef Bacik @ 2020-06-17 13:37 UTC (permalink / raw) To: dsterba, Qu Wenruo, Qu Wenruo, linux-btrfs On 6/17/20 7:31 AM, David Sterba wrote: > On Wed, Jun 17, 2020 at 07:49:33AM +0800, Qu Wenruo wrote: >>>>> Can we handle stat->dev not having a device set? Or will this blow up >>>>> in other ways? Thanks, >>>> >>>> We can handle it without any problem, just users get confused. >>>> >>>> As a common practice, we use different bdev as a namespace for different >>>> subvolumes. >>>> Without a valid bdev, some user space tools may not be able to >>>> distinguish inodes in different subvolumes. >>>> >>> >>> Alright that's fine then. But I feel like stat is one of those things >>> that'll flood the console, can we put this somewhere else that's going >>> to be hit less? Thanks, >> >> Unfortunately, stat() is the only user of btrfs_root::anon_dev. >> >> While fortunately, the logical is pretty simple, even without the safe >> net we can understand the lifespan pretty well. >> >> I'm fine to drop this patch if you're concerned about the possible >> warning flood, as the benefit is really not that much. > > It could be a developer-only warning but if there's a root with a bad > anon_dev, a simple 'ls -l' would flood the log for sure. > We'll know in btrfs_init_fs_root() when get_anon_bdev() fails right? Can't we just complain then? That seems less spammy. Thanks, Josef ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 2/4] btrfs: detect uninitialized btrfs_root::anon_dev for user visible subvolumes 2020-06-17 13:37 ` Josef Bacik @ 2020-06-17 23:39 ` Qu Wenruo 0 siblings, 0 replies; 27+ messages in thread From: Qu Wenruo @ 2020-06-17 23:39 UTC (permalink / raw) To: Josef Bacik, dsterba, Qu Wenruo, linux-btrfs On 2020/6/17 下午9:37, Josef Bacik wrote: > On 6/17/20 7:31 AM, David Sterba wrote: >> On Wed, Jun 17, 2020 at 07:49:33AM +0800, Qu Wenruo wrote: >>>>>> Can we handle stat->dev not having a device set? Or will this >>>>>> blow up >>>>>> in other ways? Thanks, >>>>> >>>>> We can handle it without any problem, just users get confused. >>>>> >>>>> As a common practice, we use different bdev as a namespace for >>>>> different >>>>> subvolumes. >>>>> Without a valid bdev, some user space tools may not be able to >>>>> distinguish inodes in different subvolumes. >>>>> >>>> >>>> Alright that's fine then. But I feel like stat is one of those things >>>> that'll flood the console, can we put this somewhere else that's going >>>> to be hit less? Thanks, >>> >>> Unfortunately, stat() is the only user of btrfs_root::anon_dev. >>> >>> While fortunately, the logical is pretty simple, even without the safe >>> net we can understand the lifespan pretty well. >>> >>> I'm fine to drop this patch if you're concerned about the possible >>> warning flood, as the benefit is really not that much. >> >> It could be a developer-only warning but if there's a root with a bad >> anon_dev, a simple 'ls -l' would flood the log for sure. >> > > We'll know in btrfs_init_fs_root() when get_anon_bdev() fails right? > Can't we just complain then? That seems less spammy. Thanks, That's exactly where we do the allocation, that would be something like warning about the allocation result. That's why I'm fine to drop this patch if it's too spammy. Thanks, Qu > > Josef ^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH 3/4] btrfs: preallocate anon_dev for subvolume and snapshot creation 2020-06-16 2:17 [PATCH 0/4] btrfs: workaround exhausted anonymous block device pool Qu Wenruo 2020-06-16 2:17 ` [PATCH 1/4] btrfs: disk-io: don't allocate anonymous block device for user invisible roots Qu Wenruo 2020-06-16 2:17 ` [PATCH 2/4] btrfs: detect uninitialized btrfs_root::anon_dev for user visible subvolumes Qu Wenruo @ 2020-06-16 2:17 ` Qu Wenruo 2020-06-16 15:10 ` David Sterba 2020-06-16 2:17 ` [PATCH 4/4] btrfs: free anon_dev earlier to prevent exhausting anonymous block device pool Qu Wenruo 2020-06-30 14:14 ` [PATCH 0/4] btrfs: workaround exhausted " David Sterba 4 siblings, 1 reply; 27+ messages in thread From: Qu Wenruo @ 2020-06-16 2:17 UTC (permalink / raw) To: linux-btrfs; +Cc: Greed Rong [BUG] When a lot of subvolumes are created, there is a user report about transaction aborted: ------------[ cut here ]------------ BTRFS: Transaction aborted (error -24) WARNING: CPU: 17 PID: 17041 at fs/btrfs/transaction.c:1576 create_pending_snapshot+0xbc4/0xd10 [btrfs] RIP: 0010:create_pending_snapshot+0xbc4/0xd10 [btrfs] Call Trace: create_pending_snapshots+0x82/0xa0 [btrfs] btrfs_commit_transaction+0x275/0x8c0 [btrfs] btrfs_mksubvol+0x4b9/0x500 [btrfs] btrfs_ioctl_snap_create_transid+0x174/0x180 [btrfs] btrfs_ioctl_snap_create_v2+0x11c/0x180 [btrfs] btrfs_ioctl+0x11a4/0x2da0 [btrfs] do_vfs_ioctl+0xa9/0x640 ksys_ioctl+0x67/0x90 __x64_sys_ioctl+0x1a/0x20 do_syscall_64+0x5a/0x110 entry_SYSCALL_64_after_hwframe+0x44/0xa9 ---[ end trace 33f2f83f3d5250e9 ]--- BTRFS: error (device sda1) in create_pending_snapshot:1576: errno=-24 unknown BTRFS info (device sda1): forced readonly BTRFS warning (device sda1): Skipping commit of aborted transaction. BTRFS: error (device sda1) in cleanup_transaction:1831: errno=-24 unknown [CAUSE] When the global anonymous block device pool is exhausted, the following call chain will fail, and lead to transaction abort: btrfs_ioctl_snap_create_v2() |- btrfs_ioctl_snap_create_transid() |- btrfs_mksubvol() |- btrfs_commit_transaction() |- create_pending_snapshot() |- btrfs_get_fs_root() |- btrfs_init_fs_root() |- get_anon_bdev() [FIX] Although we can't enlarge the anonymous block device pool, at least we can preallocate anon_dev for subvolume/snapshot creation. So that when the pool is exhausted, user will get an error other than aborting transaction later. Reported-by: Greed Rong <greedrong@gmail.com> Link: https://lore.kernel.org/linux-btrfs/CA+UqX+NTrZ6boGnWHhSeZmEY5J76CTqmYjO2S+=tHJX7nb9DPw@mail.gmail.com/ Signed-off-by: Qu Wenruo <wqu@suse.com> --- fs/btrfs/disk-io.c | 51 ++++++++++++++++++++++++++++++++++++------ fs/btrfs/disk-io.h | 2 ++ fs/btrfs/ioctl.c | 21 ++++++++++++++++- fs/btrfs/transaction.c | 3 ++- fs/btrfs/transaction.h | 2 ++ 5 files changed, 70 insertions(+), 9 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index cfc0ff288238..14fd69b71cb8 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1395,7 +1395,7 @@ struct btrfs_root *btrfs_read_tree_root(struct btrfs_root *tree_root, goto out; } -static int btrfs_init_fs_root(struct btrfs_root *root) +static int btrfs_init_fs_root(struct btrfs_root *root, dev_t anon_dev) { int ret; unsigned int nofs_flag; @@ -1435,9 +1435,13 @@ static int btrfs_init_fs_root(struct btrfs_root *root) */ if (is_fstree(root->root_key.objectid) && btrfs_root_refs(&root->root_item)) { - ret = get_anon_bdev(&root->anon_dev); - if (ret) - goto fail; + if (!anon_dev) { + ret = get_anon_bdev(&root->anon_dev); + if (ret) + goto fail; + } else { + root->anon_dev = anon_dev; + } } mutex_lock(&root->objectid_mutex); @@ -1542,8 +1546,27 @@ void btrfs_free_fs_info(struct btrfs_fs_info *fs_info) } -struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info, - u64 objectid, bool check_ref) +/* + * Get a fs root. + * + * For essential trees like root/extent tree, we grab it from fs_info directly. + * For subvolume trees, we check the cached fs roots first. If miss then + * read it from disk and add it to cached fs roots. + * + * Caller should release the root by calling btrfs_put_root() after the usage. + * + * NOTE: Reloc and log trees can't be read by this function as they share the + * same root objectid. + * + * @objectid: Root (subvolume) id + * @anon_dev: Preallocated anonymous block device number for new roots. + * Pass 0 for automatic allocation. + * @check_ref: Whether to check root refs. If true, return -ENOENT for orphan + * roots. + */ +static struct btrfs_root *__get_fs_root(struct btrfs_fs_info *fs_info, + u64 objectid, dev_t anon_dev, + bool check_ref) { struct btrfs_root *root; struct btrfs_path *path; @@ -1572,6 +1595,8 @@ struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info, again: root = btrfs_lookup_fs_root(fs_info, objectid); if (root) { + /* Shouldn't get preallocated anon_dev for cached roots */ + ASSERT(!anon_dev); if (check_ref && btrfs_root_refs(&root->root_item) == 0) { btrfs_put_root(root); return ERR_PTR(-ENOENT); @@ -1591,7 +1616,7 @@ struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info, goto fail; } - ret = btrfs_init_fs_root(root); + ret = btrfs_init_fs_root(root, anon_dev); if (ret) goto fail; @@ -1624,6 +1649,18 @@ struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info, return ERR_PTR(ret); } +struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info, + u64 objectid, bool check_ref) +{ + return __get_fs_root(fs_info, objectid, 0, check_ref); +} + +struct btrfs_root *btrfs_get_new_fs_root(struct btrfs_fs_info *fs_info, + u64 objectid, dev_t anon_dev) +{ + return __get_fs_root(fs_info, objectid, anon_dev, true); +} + static int btrfs_congested_fn(void *congested_data, int bdi_bits) { struct btrfs_fs_info *info = (struct btrfs_fs_info *)congested_data; diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h index bf43245406c4..00dc39d47ed3 100644 --- a/fs/btrfs/disk-io.h +++ b/fs/btrfs/disk-io.h @@ -67,6 +67,8 @@ void btrfs_free_fs_roots(struct btrfs_fs_info *fs_info); struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info, u64 objectid, bool check_ref); +struct btrfs_root *btrfs_get_new_fs_root(struct btrfs_fs_info *fs_info, + u64 objectid, dev_t anon_dev); void btrfs_free_fs_info(struct btrfs_fs_info *fs_info); int btrfs_cleanup_fs_roots(struct btrfs_fs_info *fs_info); diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 168deb8ef68a..644bfe53ff61 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -566,6 +566,7 @@ static noinline int create_subvol(struct inode *dir, struct inode *inode; int ret; int err; + dev_t anon_dev = 0; u64 objectid; u64 new_dirid = BTRFS_FIRST_FREE_OBJECTID; u64 index = 0; @@ -578,6 +579,10 @@ static noinline int create_subvol(struct inode *dir, if (ret) goto fail_free; + ret = get_anon_bdev(&anon_dev); + if (ret < 0) + goto fail_free; + /* * Don't create subvolume whose level is not zero. Or qgroup will be * screwed up since it assumes subvolume qgroup's level to be 0. @@ -660,12 +665,15 @@ static noinline int create_subvol(struct inode *dir, goto fail; key.offset = (u64)-1; - new_root = btrfs_get_fs_root(fs_info, objectid, true); + new_root = btrfs_get_new_fs_root(fs_info, objectid, anon_dev); if (IS_ERR(new_root)) { + free_anon_bdev(anon_dev); ret = PTR_ERR(new_root); btrfs_abort_transaction(trans, ret); goto fail; } + /* Freeing would be done in btrfs_put_root() of @new_root */ + anon_dev = 0; btrfs_record_root_in_trans(trans, new_root); @@ -735,6 +743,8 @@ static noinline int create_subvol(struct inode *dir, return ret; fail_free: + if (anon_dev) + free_anon_bdev(anon_dev); kfree(root_item); return ret; } @@ -762,6 +772,9 @@ static int create_snapshot(struct btrfs_root *root, struct inode *dir, if (!pending_snapshot) return -ENOMEM; + ret = get_anon_bdev(&pending_snapshot->anon_dev); + if (ret < 0) + goto free_pending; pending_snapshot->root_item = kzalloc(sizeof(struct btrfs_root_item), GFP_KERNEL); pending_snapshot->path = btrfs_alloc_path(); @@ -823,10 +836,16 @@ static int create_snapshot(struct btrfs_root *root, struct inode *dir, d_instantiate(dentry, inode); ret = 0; + pending_snapshot->anon_dev = 0; fail: + /* Prevent doulbe freeing of anon_dev */ + if (ret && pending_snapshot->snap) + pending_snapshot->snap->anon_dev = 0; btrfs_put_root(pending_snapshot->snap); btrfs_subvolume_release_metadata(fs_info, &pending_snapshot->block_rsv); free_pending: + if (pending_snapshot->anon_dev) + free_anon_bdev(pending_snapshot->anon_dev); kfree(pending_snapshot->root_item); btrfs_free_path(pending_snapshot->path); kfree(pending_snapshot); diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index b359d4b17658..48f3db64cd40 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -1630,7 +1630,8 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans, } key.offset = (u64)-1; - pending->snap = btrfs_get_fs_root(fs_info, objectid, true); + pending->snap = btrfs_get_new_fs_root(fs_info, objectid, + pending->anon_dev); if (IS_ERR(pending->snap)) { ret = PTR_ERR(pending->snap); btrfs_abort_transaction(trans, ret); diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h index bf102e64bfb2..a122a712f5cc 100644 --- a/fs/btrfs/transaction.h +++ b/fs/btrfs/transaction.h @@ -151,6 +151,8 @@ struct btrfs_pending_snapshot { struct btrfs_block_rsv block_rsv; /* extra metadata reservation for relocation */ int error; + /* Preallocated anonymous block device number */ + dev_t anon_dev; bool readonly; struct list_head list; }; -- 2.27.0 ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH 3/4] btrfs: preallocate anon_dev for subvolume and snapshot creation 2020-06-16 2:17 ` [PATCH 3/4] btrfs: preallocate anon_dev for subvolume and snapshot creation Qu Wenruo @ 2020-06-16 15:10 ` David Sterba 2020-06-16 22:54 ` Qu Wenruo 2020-07-01 3:25 ` Qu Wenruo 0 siblings, 2 replies; 27+ messages in thread From: David Sterba @ 2020-06-16 15:10 UTC (permalink / raw) To: Qu Wenruo; +Cc: linux-btrfs, Greed Rong On Tue, Jun 16, 2020 at 10:17:36AM +0800, Qu Wenruo wrote: > [BUG] > When a lot of subvolumes are created, there is a user report about > transaction aborted: > > ------------[ cut here ]------------ > BTRFS: Transaction aborted (error -24) > WARNING: CPU: 17 PID: 17041 at fs/btrfs/transaction.c:1576 create_pending_snapshot+0xbc4/0xd10 [btrfs] > RIP: 0010:create_pending_snapshot+0xbc4/0xd10 [btrfs] > Call Trace: > create_pending_snapshots+0x82/0xa0 [btrfs] > btrfs_commit_transaction+0x275/0x8c0 [btrfs] > btrfs_mksubvol+0x4b9/0x500 [btrfs] > btrfs_ioctl_snap_create_transid+0x174/0x180 [btrfs] > btrfs_ioctl_snap_create_v2+0x11c/0x180 [btrfs] > btrfs_ioctl+0x11a4/0x2da0 [btrfs] > do_vfs_ioctl+0xa9/0x640 > ksys_ioctl+0x67/0x90 > __x64_sys_ioctl+0x1a/0x20 > do_syscall_64+0x5a/0x110 > entry_SYSCALL_64_after_hwframe+0x44/0xa9 > ---[ end trace 33f2f83f3d5250e9 ]--- > BTRFS: error (device sda1) in create_pending_snapshot:1576: errno=-24 unknown > BTRFS info (device sda1): forced readonly > BTRFS warning (device sda1): Skipping commit of aborted transaction. > BTRFS: error (device sda1) in cleanup_transaction:1831: errno=-24 unknown > > [CAUSE] > When the global anonymous block device pool is exhausted, the following > call chain will fail, and lead to transaction abort: > > btrfs_ioctl_snap_create_v2() > |- btrfs_ioctl_snap_create_transid() > |- btrfs_mksubvol() > |- btrfs_commit_transaction() > |- create_pending_snapshot() > |- btrfs_get_fs_root() > |- btrfs_init_fs_root() > |- get_anon_bdev() > > [FIX] > Although we can't enlarge the anonymous block device pool, at least we > can preallocate anon_dev for subvolume/snapshot creation. > So that when the pool is exhausted, user will get an error other than > aborting transaction later. > > Reported-by: Greed Rong <greedrong@gmail.com> > Link: https://lore.kernel.org/linux-btrfs/CA+UqX+NTrZ6boGnWHhSeZmEY5J76CTqmYjO2S+=tHJX7nb9DPw@mail.gmail.com/ > Signed-off-by: Qu Wenruo <wqu@suse.com> > --- > fs/btrfs/disk-io.c | 51 ++++++++++++++++++++++++++++++++++++------ > fs/btrfs/disk-io.h | 2 ++ > fs/btrfs/ioctl.c | 21 ++++++++++++++++- > fs/btrfs/transaction.c | 3 ++- > fs/btrfs/transaction.h | 2 ++ > 5 files changed, 70 insertions(+), 9 deletions(-) > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > index cfc0ff288238..14fd69b71cb8 100644 > --- a/fs/btrfs/disk-io.c > +++ b/fs/btrfs/disk-io.c > @@ -1395,7 +1395,7 @@ struct btrfs_root *btrfs_read_tree_root(struct btrfs_root *tree_root, > goto out; > } > > -static int btrfs_init_fs_root(struct btrfs_root *root) > +static int btrfs_init_fs_root(struct btrfs_root *root, dev_t anon_dev) > { > int ret; > unsigned int nofs_flag; > @@ -1435,9 +1435,13 @@ static int btrfs_init_fs_root(struct btrfs_root *root) > */ > if (is_fstree(root->root_key.objectid) && > btrfs_root_refs(&root->root_item)) { > - ret = get_anon_bdev(&root->anon_dev); > - if (ret) > - goto fail; > + if (!anon_dev) { > + ret = get_anon_bdev(&root->anon_dev); > + if (ret) > + goto fail; > + } else { > + root->anon_dev = anon_dev; > + } > } > > mutex_lock(&root->objectid_mutex); > @@ -1542,8 +1546,27 @@ void btrfs_free_fs_info(struct btrfs_fs_info *fs_info) > } > > > -struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info, > - u64 objectid, bool check_ref) > +/* > + * Get a fs root. > + * > + * For essential trees like root/extent tree, we grab it from fs_info directly. > + * For subvolume trees, we check the cached fs roots first. If miss then > + * read it from disk and add it to cached fs roots. > + * > + * Caller should release the root by calling btrfs_put_root() after the usage. > + * > + * NOTE: Reloc and log trees can't be read by this function as they share the > + * same root objectid. > + * > + * @objectid: Root (subvolume) id > + * @anon_dev: Preallocated anonymous block device number for new roots. > + * Pass 0 for automatic allocation. > + * @check_ref: Whether to check root refs. If true, return -ENOENT for orphan > + * roots. > + */ > +static struct btrfs_root *__get_fs_root(struct btrfs_fs_info *fs_info, > + u64 objectid, dev_t anon_dev, > + bool check_ref) > +struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info, > + u64 objectid, bool check_ref) > +{ > + return __get_fs_root(fs_info, objectid, 0, check_ref); > +} > + > +struct btrfs_root *btrfs_get_new_fs_root(struct btrfs_fs_info *fs_info, > + u64 objectid, dev_t anon_dev) > +{ > + return __get_fs_root(fs_info, objectid, anon_dev, true); > +} This does not look like a good API, we should keep btrfs_get_fs_root and add the anon_bdev initialization to the callers, there are only a few. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 3/4] btrfs: preallocate anon_dev for subvolume and snapshot creation 2020-06-16 15:10 ` David Sterba @ 2020-06-16 22:54 ` Qu Wenruo 2020-07-01 3:25 ` Qu Wenruo 1 sibling, 0 replies; 27+ messages in thread From: Qu Wenruo @ 2020-06-16 22:54 UTC (permalink / raw) To: dsterba, linux-btrfs, Greed Rong On 2020/6/16 下午11:10, David Sterba wrote: > On Tue, Jun 16, 2020 at 10:17:36AM +0800, Qu Wenruo wrote: >> [BUG] >> When a lot of subvolumes are created, there is a user report about >> transaction aborted: >> >> ------------[ cut here ]------------ >> BTRFS: Transaction aborted (error -24) >> WARNING: CPU: 17 PID: 17041 at fs/btrfs/transaction.c:1576 create_pending_snapshot+0xbc4/0xd10 [btrfs] >> RIP: 0010:create_pending_snapshot+0xbc4/0xd10 [btrfs] >> Call Trace: >> create_pending_snapshots+0x82/0xa0 [btrfs] >> btrfs_commit_transaction+0x275/0x8c0 [btrfs] >> btrfs_mksubvol+0x4b9/0x500 [btrfs] >> btrfs_ioctl_snap_create_transid+0x174/0x180 [btrfs] >> btrfs_ioctl_snap_create_v2+0x11c/0x180 [btrfs] >> btrfs_ioctl+0x11a4/0x2da0 [btrfs] >> do_vfs_ioctl+0xa9/0x640 >> ksys_ioctl+0x67/0x90 >> __x64_sys_ioctl+0x1a/0x20 >> do_syscall_64+0x5a/0x110 >> entry_SYSCALL_64_after_hwframe+0x44/0xa9 >> ---[ end trace 33f2f83f3d5250e9 ]--- >> BTRFS: error (device sda1) in create_pending_snapshot:1576: errno=-24 unknown >> BTRFS info (device sda1): forced readonly >> BTRFS warning (device sda1): Skipping commit of aborted transaction. >> BTRFS: error (device sda1) in cleanup_transaction:1831: errno=-24 unknown >> >> [CAUSE] >> When the global anonymous block device pool is exhausted, the following >> call chain will fail, and lead to transaction abort: >> >> btrfs_ioctl_snap_create_v2() >> |- btrfs_ioctl_snap_create_transid() >> |- btrfs_mksubvol() >> |- btrfs_commit_transaction() >> |- create_pending_snapshot() >> |- btrfs_get_fs_root() >> |- btrfs_init_fs_root() >> |- get_anon_bdev() >> >> [FIX] >> Although we can't enlarge the anonymous block device pool, at least we >> can preallocate anon_dev for subvolume/snapshot creation. >> So that when the pool is exhausted, user will get an error other than >> aborting transaction later. >> >> Reported-by: Greed Rong <greedrong@gmail.com> >> Link: https://lore.kernel.org/linux-btrfs/CA+UqX+NTrZ6boGnWHhSeZmEY5J76CTqmYjO2S+=tHJX7nb9DPw@mail.gmail.com/ >> Signed-off-by: Qu Wenruo <wqu@suse.com> >> --- >> fs/btrfs/disk-io.c | 51 ++++++++++++++++++++++++++++++++++++------ >> fs/btrfs/disk-io.h | 2 ++ >> fs/btrfs/ioctl.c | 21 ++++++++++++++++- >> fs/btrfs/transaction.c | 3 ++- >> fs/btrfs/transaction.h | 2 ++ >> 5 files changed, 70 insertions(+), 9 deletions(-) >> >> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c >> index cfc0ff288238..14fd69b71cb8 100644 >> --- a/fs/btrfs/disk-io.c >> +++ b/fs/btrfs/disk-io.c >> @@ -1395,7 +1395,7 @@ struct btrfs_root *btrfs_read_tree_root(struct btrfs_root *tree_root, >> goto out; >> } >> >> -static int btrfs_init_fs_root(struct btrfs_root *root) >> +static int btrfs_init_fs_root(struct btrfs_root *root, dev_t anon_dev) >> { >> int ret; >> unsigned int nofs_flag; >> @@ -1435,9 +1435,13 @@ static int btrfs_init_fs_root(struct btrfs_root *root) >> */ >> if (is_fstree(root->root_key.objectid) && >> btrfs_root_refs(&root->root_item)) { >> - ret = get_anon_bdev(&root->anon_dev); >> - if (ret) >> - goto fail; >> + if (!anon_dev) { >> + ret = get_anon_bdev(&root->anon_dev); >> + if (ret) >> + goto fail; >> + } else { >> + root->anon_dev = anon_dev; >> + } >> } >> >> mutex_lock(&root->objectid_mutex); >> @@ -1542,8 +1546,27 @@ void btrfs_free_fs_info(struct btrfs_fs_info *fs_info) >> } >> >> >> -struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info, >> - u64 objectid, bool check_ref) >> +/* >> + * Get a fs root. >> + * >> + * For essential trees like root/extent tree, we grab it from fs_info directly. >> + * For subvolume trees, we check the cached fs roots first. If miss then >> + * read it from disk and add it to cached fs roots. >> + * >> + * Caller should release the root by calling btrfs_put_root() after the usage. >> + * >> + * NOTE: Reloc and log trees can't be read by this function as they share the >> + * same root objectid. >> + * >> + * @objectid: Root (subvolume) id >> + * @anon_dev: Preallocated anonymous block device number for new roots. >> + * Pass 0 for automatic allocation. >> + * @check_ref: Whether to check root refs. If true, return -ENOENT for orphan >> + * roots. >> + */ >> +static struct btrfs_root *__get_fs_root(struct btrfs_fs_info *fs_info, >> + u64 objectid, dev_t anon_dev, >> + bool check_ref) > > >> +struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info, >> + u64 objectid, bool check_ref) >> +{ >> + return __get_fs_root(fs_info, objectid, 0, check_ref); >> +} >> + >> +struct btrfs_root *btrfs_get_new_fs_root(struct btrfs_fs_info *fs_info, >> + u64 objectid, dev_t anon_dev) >> +{ >> + return __get_fs_root(fs_info, objectid, anon_dev, true); >> +} > > This does not look like a good API, we should keep btrfs_get_fs_root and > add the anon_bdev initialization to the callers, there are only a few. > OK, I'd go that direction. Thanks, Qu ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 3/4] btrfs: preallocate anon_dev for subvolume and snapshot creation 2020-06-16 15:10 ` David Sterba 2020-06-16 22:54 ` Qu Wenruo @ 2020-07-01 3:25 ` Qu Wenruo 2020-07-01 17:39 ` David Sterba 1 sibling, 1 reply; 27+ messages in thread From: Qu Wenruo @ 2020-07-01 3:25 UTC (permalink / raw) To: dsterba, Qu Wenruo, linux-btrfs, Greed Rong [-- Attachment #1.1: Type: text/plain, Size: 6198 bytes --] On 2020/6/16 下午11:10, David Sterba wrote: > On Tue, Jun 16, 2020 at 10:17:36AM +0800, Qu Wenruo wrote: >> [BUG] >> When a lot of subvolumes are created, there is a user report about >> transaction aborted: >> >> ------------[ cut here ]------------ >> BTRFS: Transaction aborted (error -24) >> WARNING: CPU: 17 PID: 17041 at fs/btrfs/transaction.c:1576 create_pending_snapshot+0xbc4/0xd10 [btrfs] >> RIP: 0010:create_pending_snapshot+0xbc4/0xd10 [btrfs] >> Call Trace: >> create_pending_snapshots+0x82/0xa0 [btrfs] >> btrfs_commit_transaction+0x275/0x8c0 [btrfs] >> btrfs_mksubvol+0x4b9/0x500 [btrfs] >> btrfs_ioctl_snap_create_transid+0x174/0x180 [btrfs] >> btrfs_ioctl_snap_create_v2+0x11c/0x180 [btrfs] >> btrfs_ioctl+0x11a4/0x2da0 [btrfs] >> do_vfs_ioctl+0xa9/0x640 >> ksys_ioctl+0x67/0x90 >> __x64_sys_ioctl+0x1a/0x20 >> do_syscall_64+0x5a/0x110 >> entry_SYSCALL_64_after_hwframe+0x44/0xa9 >> ---[ end trace 33f2f83f3d5250e9 ]--- >> BTRFS: error (device sda1) in create_pending_snapshot:1576: errno=-24 unknown >> BTRFS info (device sda1): forced readonly >> BTRFS warning (device sda1): Skipping commit of aborted transaction. >> BTRFS: error (device sda1) in cleanup_transaction:1831: errno=-24 unknown >> >> [CAUSE] >> When the global anonymous block device pool is exhausted, the following >> call chain will fail, and lead to transaction abort: >> >> btrfs_ioctl_snap_create_v2() >> |- btrfs_ioctl_snap_create_transid() >> |- btrfs_mksubvol() >> |- btrfs_commit_transaction() >> |- create_pending_snapshot() >> |- btrfs_get_fs_root() >> |- btrfs_init_fs_root() >> |- get_anon_bdev() >> >> [FIX] >> Although we can't enlarge the anonymous block device pool, at least we >> can preallocate anon_dev for subvolume/snapshot creation. >> So that when the pool is exhausted, user will get an error other than >> aborting transaction later. >> >> Reported-by: Greed Rong <greedrong@gmail.com> >> Link: https://lore.kernel.org/linux-btrfs/CA+UqX+NTrZ6boGnWHhSeZmEY5J76CTqmYjO2S+=tHJX7nb9DPw@mail.gmail.com/ >> Signed-off-by: Qu Wenruo <wqu@suse.com> >> --- >> fs/btrfs/disk-io.c | 51 ++++++++++++++++++++++++++++++++++++------ >> fs/btrfs/disk-io.h | 2 ++ >> fs/btrfs/ioctl.c | 21 ++++++++++++++++- >> fs/btrfs/transaction.c | 3 ++- >> fs/btrfs/transaction.h | 2 ++ >> 5 files changed, 70 insertions(+), 9 deletions(-) >> >> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c >> index cfc0ff288238..14fd69b71cb8 100644 >> --- a/fs/btrfs/disk-io.c >> +++ b/fs/btrfs/disk-io.c >> @@ -1395,7 +1395,7 @@ struct btrfs_root *btrfs_read_tree_root(struct btrfs_root *tree_root, >> goto out; >> } >> >> -static int btrfs_init_fs_root(struct btrfs_root *root) >> +static int btrfs_init_fs_root(struct btrfs_root *root, dev_t anon_dev) >> { >> int ret; >> unsigned int nofs_flag; >> @@ -1435,9 +1435,13 @@ static int btrfs_init_fs_root(struct btrfs_root *root) >> */ >> if (is_fstree(root->root_key.objectid) && >> btrfs_root_refs(&root->root_item)) { >> - ret = get_anon_bdev(&root->anon_dev); >> - if (ret) >> - goto fail; >> + if (!anon_dev) { >> + ret = get_anon_bdev(&root->anon_dev); >> + if (ret) >> + goto fail; >> + } else { >> + root->anon_dev = anon_dev; >> + } >> } >> >> mutex_lock(&root->objectid_mutex); >> @@ -1542,8 +1546,27 @@ void btrfs_free_fs_info(struct btrfs_fs_info *fs_info) >> } >> >> >> -struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info, >> - u64 objectid, bool check_ref) >> +/* >> + * Get a fs root. >> + * >> + * For essential trees like root/extent tree, we grab it from fs_info directly. >> + * For subvolume trees, we check the cached fs roots first. If miss then >> + * read it from disk and add it to cached fs roots. >> + * >> + * Caller should release the root by calling btrfs_put_root() after the usage. >> + * >> + * NOTE: Reloc and log trees can't be read by this function as they share the >> + * same root objectid. >> + * >> + * @objectid: Root (subvolume) id >> + * @anon_dev: Preallocated anonymous block device number for new roots. >> + * Pass 0 for automatic allocation. >> + * @check_ref: Whether to check root refs. If true, return -ENOENT for orphan >> + * roots. >> + */ >> +static struct btrfs_root *__get_fs_root(struct btrfs_fs_info *fs_info, >> + u64 objectid, dev_t anon_dev, >> + bool check_ref) > > >> +struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info, >> + u64 objectid, bool check_ref) >> +{ >> + return __get_fs_root(fs_info, objectid, 0, check_ref); >> +} >> + >> +struct btrfs_root *btrfs_get_new_fs_root(struct btrfs_fs_info *fs_info, >> + u64 objectid, dev_t anon_dev) >> +{ >> + return __get_fs_root(fs_info, objectid, anon_dev, true); >> +} > > This does not look like a good API, we should keep btrfs_get_fs_root and > add the anon_bdev initialization to the callers, there are only a few. > A few = over 25? I have switched to keep btrfs_get_fs_root(), but you won't like the summary: Old: fs/btrfs/disk-io.h | 2 ++ fs/btrfs/ioctl.c | 21 ++++++++++++++++- fs/btrfs/transaction.c | 3 ++- fs/btrfs/transaction.h | 2 ++ 5 files changed, 70 insertions(+), 9 deletions(-) New: fs/btrfs/backref.c | 4 ++-- fs/btrfs/disk-io.c | 42 ++++++++++++++++++++++++++++++++++-------- fs/btrfs/disk-io.h | 3 ++- fs/btrfs/export.c | 2 +- fs/btrfs/file.c | 2 +- fs/btrfs/inode.c | 2 +- fs/btrfs/ioctl.c | 31 +++++++++++++++++++++++++------ fs/btrfs/relocation.c | 11 ++++++----- fs/btrfs/root-tree.c | 2 +- fs/btrfs/scrub.c | 2 +- fs/btrfs/send.c | 4 ++-- fs/btrfs/super.c | 2 +- fs/btrfs/transaction.c | 3 ++- fs/btrfs/transaction.h | 2 ++ fs/btrfs/tree-log.c | 2 +- fs/btrfs/uuid-tree.c | 2 +- 16 files changed, 83 insertions(+), 33 deletions(-) Do we really go that direction? Thanks, Qu [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 3/4] btrfs: preallocate anon_dev for subvolume and snapshot creation 2020-07-01 3:25 ` Qu Wenruo @ 2020-07-01 17:39 ` David Sterba 2020-07-01 23:56 ` Qu Wenruo 0 siblings, 1 reply; 27+ messages in thread From: David Sterba @ 2020-07-01 17:39 UTC (permalink / raw) To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs, Greed Rong On Wed, Jul 01, 2020 at 11:25:27AM +0800, Qu Wenruo wrote: > >> +struct btrfs_root *btrfs_get_new_fs_root(struct btrfs_fs_info *fs_info, > >> + u64 objectid, dev_t anon_dev) > >> +{ > >> + return __get_fs_root(fs_info, objectid, anon_dev, true); > >> +} > > > > This does not look like a good API, we should keep btrfs_get_fs_root and > > add the anon_bdev initialization to the callers, there are only a few. > > > > A few = over 25? > > I have switched to keep btrfs_get_fs_root(), but you won't like the summary: > > Old: > fs/btrfs/disk-io.h | 2 ++ > fs/btrfs/ioctl.c | 21 ++++++++++++++++- > fs/btrfs/transaction.c | 3 ++- > fs/btrfs/transaction.h | 2 ++ > 5 files changed, 70 insertions(+), 9 deletions(-) > > New: > fs/btrfs/backref.c | 4 ++-- > fs/btrfs/disk-io.c | 42 ++++++++++++++++++++++++++++++++++-------- > fs/btrfs/disk-io.h | 3 ++- > fs/btrfs/export.c | 2 +- > fs/btrfs/file.c | 2 +- > fs/btrfs/inode.c | 2 +- > fs/btrfs/ioctl.c | 31 +++++++++++++++++++++++++------ > fs/btrfs/relocation.c | 11 ++++++----- > fs/btrfs/root-tree.c | 2 +- > fs/btrfs/scrub.c | 2 +- > fs/btrfs/send.c | 4 ++-- > fs/btrfs/super.c | 2 +- > fs/btrfs/transaction.c | 3 ++- > fs/btrfs/transaction.h | 2 ++ > fs/btrfs/tree-log.c | 2 +- > fs/btrfs/uuid-tree.c | 2 +- > 16 files changed, 83 insertions(+), 33 deletions(-) > > Do we really go that direction? You're right, I don't like the summary and I don't like the code either. Adding the anon_dev argument to btrfs_get_fs_root is wrong and I have never suggested that. What I meant is to put the actual id allocation to the callers where the subvolume is created, ie only 2 places. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 3/4] btrfs: preallocate anon_dev for subvolume and snapshot creation 2020-07-01 17:39 ` David Sterba @ 2020-07-01 23:56 ` Qu Wenruo 2020-07-02 16:08 ` David Sterba 0 siblings, 1 reply; 27+ messages in thread From: Qu Wenruo @ 2020-07-01 23:56 UTC (permalink / raw) To: dsterba, Qu Wenruo, linux-btrfs, Greed Rong [-- Attachment #1.1: Type: text/plain, Size: 2186 bytes --] On 2020/7/2 上午1:39, David Sterba wrote: > On Wed, Jul 01, 2020 at 11:25:27AM +0800, Qu Wenruo wrote: >>>> +struct btrfs_root *btrfs_get_new_fs_root(struct btrfs_fs_info *fs_info, >>>> + u64 objectid, dev_t anon_dev) >>>> +{ >>>> + return __get_fs_root(fs_info, objectid, anon_dev, true); >>>> +} >>> >>> This does not look like a good API, we should keep btrfs_get_fs_root and >>> add the anon_bdev initialization to the callers, there are only a few. >>> >> >> A few = over 25? >> >> I have switched to keep btrfs_get_fs_root(), but you won't like the summary: >> >> Old: >> fs/btrfs/disk-io.h | 2 ++ >> fs/btrfs/ioctl.c | 21 ++++++++++++++++- >> fs/btrfs/transaction.c | 3 ++- >> fs/btrfs/transaction.h | 2 ++ >> 5 files changed, 70 insertions(+), 9 deletions(-) >> >> New: >> fs/btrfs/backref.c | 4 ++-- >> fs/btrfs/disk-io.c | 42 ++++++++++++++++++++++++++++++++++-------- >> fs/btrfs/disk-io.h | 3 ++- >> fs/btrfs/export.c | 2 +- >> fs/btrfs/file.c | 2 +- >> fs/btrfs/inode.c | 2 +- >> fs/btrfs/ioctl.c | 31 +++++++++++++++++++++++++------ >> fs/btrfs/relocation.c | 11 ++++++----- >> fs/btrfs/root-tree.c | 2 +- >> fs/btrfs/scrub.c | 2 +- >> fs/btrfs/send.c | 4 ++-- >> fs/btrfs/super.c | 2 +- >> fs/btrfs/transaction.c | 3 ++- >> fs/btrfs/transaction.h | 2 ++ >> fs/btrfs/tree-log.c | 2 +- >> fs/btrfs/uuid-tree.c | 2 +- >> 16 files changed, 83 insertions(+), 33 deletions(-) >> >> Do we really go that direction? > > You're right, I don't like the summary and I don't like the code either. > > Adding the anon_dev argument to btrfs_get_fs_root is wrong and I have > never suggested that. What I meant is to put the actual id allocation > to the callers where the subvolume is created, ie only 2 places. > You mean to extract btrfs_init_fs_root() out of btrfs_get_fs_root()? That looks a little risky and I can't find any good solution to make it more elegant than the current one. Although I would definitely remove the "__" prefix as we shouldn't add such prefix anymore. Thanks, Qu [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 3/4] btrfs: preallocate anon_dev for subvolume and snapshot creation 2020-07-01 23:56 ` Qu Wenruo @ 2020-07-02 16:08 ` David Sterba 2020-07-02 23:46 ` David Sterba 0 siblings, 1 reply; 27+ messages in thread From: David Sterba @ 2020-07-02 16:08 UTC (permalink / raw) To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs, Greed Rong On Thu, Jul 02, 2020 at 07:56:57AM +0800, Qu Wenruo wrote: > On 2020/7/2 上午1:39, David Sterba wrote: > > On Wed, Jul 01, 2020 at 11:25:27AM +0800, Qu Wenruo wrote: > > Adding the anon_dev argument to btrfs_get_fs_root is wrong and I have > > never suggested that. What I meant is to put the actual id allocation > > to the callers where the subvolume is created, ie only 2 places. > > You mean to extract btrfs_init_fs_root() out of btrfs_get_fs_root()? > > That looks a little risky and I can't find any good solution to make it > more elegant than the current one. I spent more time reading through the get-fs-root functions and the main problem is that btrfs_get_fs_root is doing several things, and it makes a lot of code simple, I certainly want to keep it that way. The idea was to pre-insert the new root (similar to the root item insertion, btrfs_insert_root) and not letting btrfs_get_fs_root call to btrfs_init_fs_info where the anon_bdev allocation happens for all the other non-ioctl cases. Which could be done by factoring out btrfs_init_fs_root from btrfs_get_fs_root. This would allow to extend only btrfs_init_fs_root arguments with the anon_bdev, and keep btrfs_get_fs_root intact. So this is splitting the API from the end. What you originally proposed is a split from the begnning, ie. add a common implementation for existing and new and provide btrfs_get_fs_root and btrfs_get_new_fs_root that would hide the additional parameters. Both ways are IMO valid but I thought it would be easier to pass the anon bdev inside ioctl callbacks. The problem that makes my proposal less appealing is that btrfs_read_tree_root gets called earlier than I'd like so factoring everything after btrfs_init_fs_root would not be so straightforward. In conclusion, your proposal is better and I'm going to merge it. > Although I would definitely remove the "__" prefix as we shouldn't add > such prefix anymore. Yeah with the small naming fixups. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 3/4] btrfs: preallocate anon_dev for subvolume and snapshot creation 2020-07-02 16:08 ` David Sterba @ 2020-07-02 23:46 ` David Sterba 2020-07-03 5:19 ` Qu Wenruo 0 siblings, 1 reply; 27+ messages in thread From: David Sterba @ 2020-07-02 23:46 UTC (permalink / raw) To: dsterba, Qu Wenruo, Qu Wenruo, linux-btrfs, Greed Rong On Thu, Jul 02, 2020 at 06:08:21PM +0200, David Sterba wrote: > On Thu, Jul 02, 2020 at 07:56:57AM +0800, Qu Wenruo wrote: > > On 2020/7/2 上午1:39, David Sterba wrote: > > > On Wed, Jul 01, 2020 at 11:25:27AM +0800, Qu Wenruo wrote: > > > Adding the anon_dev argument to btrfs_get_fs_root is wrong and I have > > > never suggested that. What I meant is to put the actual id allocation > > > to the callers where the subvolume is created, ie only 2 places. > > > > You mean to extract btrfs_init_fs_root() out of btrfs_get_fs_root()? > > > > That looks a little risky and I can't find any good solution to make it > > more elegant than the current one. > > I spent more time reading through the get-fs-root functions and the main > problem is that btrfs_get_fs_root is doing several things, and it makes > a lot of code simple, I certainly want to keep it that way. > > The idea was to pre-insert the new root (similar to the root item > insertion, btrfs_insert_root) and not letting btrfs_get_fs_root call to > btrfs_init_fs_info where the anon_bdev allocation happens for all the > other non-ioctl cases. > > Which could be done by factoring out btrfs_init_fs_root from > btrfs_get_fs_root. This would allow to extend only btrfs_init_fs_root > arguments with the anon_bdev, and keep btrfs_get_fs_root intact. > So this is splitting the API from the end. > > What you originally proposed is a split from the begnning, ie. add a > common implementation for existing and new and provide btrfs_get_fs_root > and btrfs_get_new_fs_root that would hide the additional parameters. > > Both ways are IMO valid but I thought it would be easier to pass the > anon bdev inside ioctl callbacks. The problem that makes my proposal > less appealing is that btrfs_read_tree_root gets called earlier than > I'd like so factoring everything after btrfs_init_fs_root would not be > so straightforward. > > In conclusion, your proposal is better and I'm going to merge it. > > > Although I would definitely remove the "__" prefix as we shouldn't add > > such prefix anymore. > > Yeah with the small naming fixups. It's in for-next-20200703. I've updated the changelogs to reflect what we found during debugging the issue, the __ function renamed to btrfs_get_root_ref and some function comments added. All patches reordered and tagged for stable though the preallocation is not within the size limit. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 3/4] btrfs: preallocate anon_dev for subvolume and snapshot creation 2020-07-02 23:46 ` David Sterba @ 2020-07-03 5:19 ` Qu Wenruo 2020-07-03 12:29 ` David Sterba 0 siblings, 1 reply; 27+ messages in thread From: Qu Wenruo @ 2020-07-03 5:19 UTC (permalink / raw) To: dsterba, Qu Wenruo, linux-btrfs, Greed Rong [-- Attachment #1.1: Type: text/plain, Size: 2868 bytes --] On 2020/7/3 上午7:46, David Sterba wrote: > On Thu, Jul 02, 2020 at 06:08:21PM +0200, David Sterba wrote: >> On Thu, Jul 02, 2020 at 07:56:57AM +0800, Qu Wenruo wrote: >>> On 2020/7/2 上午1:39, David Sterba wrote: >>>> On Wed, Jul 01, 2020 at 11:25:27AM +0800, Qu Wenruo wrote: >>>> Adding the anon_dev argument to btrfs_get_fs_root is wrong and I have >>>> never suggested that. What I meant is to put the actual id allocation >>>> to the callers where the subvolume is created, ie only 2 places. >>> >>> You mean to extract btrfs_init_fs_root() out of btrfs_get_fs_root()? >>> >>> That looks a little risky and I can't find any good solution to make it >>> more elegant than the current one. >> >> I spent more time reading through the get-fs-root functions and the main >> problem is that btrfs_get_fs_root is doing several things, and it makes >> a lot of code simple, I certainly want to keep it that way. >> >> The idea was to pre-insert the new root (similar to the root item >> insertion, btrfs_insert_root) and not letting btrfs_get_fs_root call to >> btrfs_init_fs_info where the anon_bdev allocation happens for all the >> other non-ioctl cases. >> >> Which could be done by factoring out btrfs_init_fs_root from >> btrfs_get_fs_root. This would allow to extend only btrfs_init_fs_root >> arguments with the anon_bdev, and keep btrfs_get_fs_root intact. >> So this is splitting the API from the end. >> >> What you originally proposed is a split from the begnning, ie. add a >> common implementation for existing and new and provide btrfs_get_fs_root >> and btrfs_get_new_fs_root that would hide the additional parameters. >> >> Both ways are IMO valid but I thought it would be easier to pass the >> anon bdev inside ioctl callbacks. The problem that makes my proposal >> less appealing is that btrfs_read_tree_root gets called earlier than >> I'd like so factoring everything after btrfs_init_fs_root would not be >> so straightforward. >> >> In conclusion, your proposal is better and I'm going to merge it. >> >>> Although I would definitely remove the "__" prefix as we shouldn't add >>> such prefix anymore. >> >> Yeah with the small naming fixups. > > It's in for-next-20200703. I've updated the changelogs to reflect what > we found during debugging the issue, the __ function renamed to > btrfs_get_root_ref and some function comments added. All patches > reordered and tagged for stable though the preallocation is not within > the size limit. > Thanks for the merge and dropping the unneeded check patch. All the modification looks good to me. Just a small nitpick for commit a561defc34aa ("btrfs: don't allocate anonymous block device for user invisible roots"), there is an unnecessary new line after "[CAUSE]". Thanks for your daily work of maintaining btrfs, Qu [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 3/4] btrfs: preallocate anon_dev for subvolume and snapshot creation 2020-07-03 5:19 ` Qu Wenruo @ 2020-07-03 12:29 ` David Sterba 2020-07-03 12:39 ` Qu Wenruo 0 siblings, 1 reply; 27+ messages in thread From: David Sterba @ 2020-07-03 12:29 UTC (permalink / raw) To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs, Greed Rong On Fri, Jul 03, 2020 at 01:19:14PM +0800, Qu Wenruo wrote: > >> In conclusion, your proposal is better and I'm going to merge it. > >> > >>> Although I would definitely remove the "__" prefix as we shouldn't add > >>> such prefix anymore. > >> > >> Yeah with the small naming fixups. > > > > It's in for-next-20200703. I've updated the changelogs to reflect what > > we found during debugging the issue, the __ function renamed to > > btrfs_get_root_ref and some function comments added. All patches > > reordered and tagged for stable though the preallocation is not within > > the size limit. > > > > Thanks for the merge and dropping the unneeded check patch. > > All the modification looks good to me. > > Just a small nitpick for commit a561defc34aa ("btrfs: don't allocate > anonymous block device for user invisible roots"), there is an > unnecessary new line after "[CAUSE]". Fixed, thanks for checking. I'm not entirely satisfied with the name of btrfs_get_root_ref, as it could be confused with the on-disk btrfs_root_ref. The get-ref functions could use some cleanup as btrfs_get_fs_root is sometimes used for non-filesystem roots. Adding a generic get_any_root that would accept any tree and btrfs_get_fs_root would be only for subvolume roots or perhaps related trees like data reloc. But this is not essential for the anon bdev fixes so that's for later. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 3/4] btrfs: preallocate anon_dev for subvolume and snapshot creation 2020-07-03 12:29 ` David Sterba @ 2020-07-03 12:39 ` Qu Wenruo 0 siblings, 0 replies; 27+ messages in thread From: Qu Wenruo @ 2020-07-03 12:39 UTC (permalink / raw) To: dsterba, Qu Wenruo, linux-btrfs, Greed Rong [-- Attachment #1.1: Type: text/plain, Size: 1601 bytes --] On 2020/7/3 下午8:29, David Sterba wrote: > On Fri, Jul 03, 2020 at 01:19:14PM +0800, Qu Wenruo wrote: >>>> In conclusion, your proposal is better and I'm going to merge it. >>>> >>>>> Although I would definitely remove the "__" prefix as we shouldn't add >>>>> such prefix anymore. >>>> >>>> Yeah with the small naming fixups. >>> >>> It's in for-next-20200703. I've updated the changelogs to reflect what >>> we found during debugging the issue, the __ function renamed to >>> btrfs_get_root_ref and some function comments added. All patches >>> reordered and tagged for stable though the preallocation is not within >>> the size limit. >>> >> >> Thanks for the merge and dropping the unneeded check patch. >> >> All the modification looks good to me. >> >> Just a small nitpick for commit a561defc34aa ("btrfs: don't allocate >> anonymous block device for user invisible roots"), there is an >> unnecessary new line after "[CAUSE]". > > Fixed, thanks for checking. I'm not entirely satisfied with the name of > btrfs_get_root_ref, as it could be confused with the on-disk > btrfs_root_ref. My original plan is even simpler, just get_fs_root(), remove the btrfs_ prefix... > The get-ref functions could use some cleanup as > btrfs_get_fs_root is sometimes used for non-filesystem roots. Adding a > generic get_any_root that would accept any tree and btrfs_get_fs_root > would be only for subvolume roots or perhaps related trees like data > reloc. But this is not essential for the anon bdev fixes so that's for > later. No problem. Thanks, Qu [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH 4/4] btrfs: free anon_dev earlier to prevent exhausting anonymous block device pool 2020-06-16 2:17 [PATCH 0/4] btrfs: workaround exhausted anonymous block device pool Qu Wenruo ` (2 preceding siblings ...) 2020-06-16 2:17 ` [PATCH 3/4] btrfs: preallocate anon_dev for subvolume and snapshot creation Qu Wenruo @ 2020-06-16 2:17 ` Qu Wenruo 2020-06-16 19:23 ` Josef Bacik 2020-06-30 14:14 ` [PATCH 0/4] btrfs: workaround exhausted " David Sterba 4 siblings, 1 reply; 27+ messages in thread From: Qu Wenruo @ 2020-06-16 2:17 UTC (permalink / raw) To: linux-btrfs; +Cc: Greed Rong [BUG] When a lot of subvolumes are created, there is a user report about transaction aborted: ------------[ cut here ]------------ BTRFS: Transaction aborted (error -24) WARNING: CPU: 17 PID: 17041 at fs/btrfs/transaction.c:1576 create_pending_snapshot+0xbc4/0xd10 [btrfs] RIP: 0010:create_pending_snapshot+0xbc4/0xd10 [btrfs] Call Trace: create_pending_snapshots+0x82/0xa0 [btrfs] btrfs_commit_transaction+0x275/0x8c0 [btrfs] btrfs_mksubvol+0x4b9/0x500 [btrfs] btrfs_ioctl_snap_create_transid+0x174/0x180 [btrfs] btrfs_ioctl_snap_create_v2+0x11c/0x180 [btrfs] btrfs_ioctl+0x11a4/0x2da0 [btrfs] do_vfs_ioctl+0xa9/0x640 ksys_ioctl+0x67/0x90 __x64_sys_ioctl+0x1a/0x20 do_syscall_64+0x5a/0x110 entry_SYSCALL_64_after_hwframe+0x44/0xa9 ---[ end trace 33f2f83f3d5250e9 ]--- BTRFS: error (device sda1) in create_pending_snapshot:1576: errno=-24 unknown BTRFS info (device sda1): forced readonly BTRFS warning (device sda1): Skipping commit of aborted transaction. BTRFS: error (device sda1) in cleanup_transaction:1831: errno=-24 unknown [CAUSE] The root cause is we don't have unlimited resource for anonymous block device number. The anonymous block device pool only contains 1<<20 devices, and is shared across a several fses, like ceph and overlayfs. While btrfs has support for 1<<48 subvolumes, so it's just a problem of time to hit such limit. [WORKAROUND] Here we can free btrfs_root::anon_dev as long as the subvolume is no longer visible to users. By freeing it earlier we reclaim the anon_dev quicker, hopefully to reduce the chance of exhausting the pool. Reported-by: Greed Rong <greedrong@gmail.com> Link: https://lore.kernel.org/linux-btrfs/CA+UqX+NTrZ6boGnWHhSeZmEY5J76CTqmYjO2S+=tHJX7nb9DPw@mail.gmail.com/ Signed-off-by: Qu Wenruo <wqu@suse.com> --- fs/btrfs/inode.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 195aac71fe32..e9f5a166d056 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -4045,6 +4045,8 @@ int btrfs_delete_subvolume(struct inode *dir, struct dentry *dentry) } } + free_anon_bdev(dest->anon_dev); + dest->anon_dev = 0; out_end_trans: trans->block_rsv = NULL; trans->bytes_reserved = 0; -- 2.27.0 ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH 4/4] btrfs: free anon_dev earlier to prevent exhausting anonymous block device pool 2020-06-16 2:17 ` [PATCH 4/4] btrfs: free anon_dev earlier to prevent exhausting anonymous block device pool Qu Wenruo @ 2020-06-16 19:23 ` Josef Bacik 2020-06-16 22:48 ` David Sterba 0 siblings, 1 reply; 27+ messages in thread From: Josef Bacik @ 2020-06-16 19:23 UTC (permalink / raw) To: Qu Wenruo, linux-btrfs; +Cc: Greed Rong On 6/15/20 10:17 PM, Qu Wenruo wrote: > [BUG] > When a lot of subvolumes are created, there is a user report about > transaction aborted: > > ------------[ cut here ]------------ > BTRFS: Transaction aborted (error -24) > WARNING: CPU: 17 PID: 17041 at fs/btrfs/transaction.c:1576 create_pending_snapshot+0xbc4/0xd10 [btrfs] > RIP: 0010:create_pending_snapshot+0xbc4/0xd10 [btrfs] > Call Trace: > create_pending_snapshots+0x82/0xa0 [btrfs] > btrfs_commit_transaction+0x275/0x8c0 [btrfs] > btrfs_mksubvol+0x4b9/0x500 [btrfs] > btrfs_ioctl_snap_create_transid+0x174/0x180 [btrfs] > btrfs_ioctl_snap_create_v2+0x11c/0x180 [btrfs] > btrfs_ioctl+0x11a4/0x2da0 [btrfs] > do_vfs_ioctl+0xa9/0x640 > ksys_ioctl+0x67/0x90 > __x64_sys_ioctl+0x1a/0x20 > do_syscall_64+0x5a/0x110 > entry_SYSCALL_64_after_hwframe+0x44/0xa9 > ---[ end trace 33f2f83f3d5250e9 ]--- > BTRFS: error (device sda1) in create_pending_snapshot:1576: errno=-24 unknown > BTRFS info (device sda1): forced readonly > BTRFS warning (device sda1): Skipping commit of aborted transaction. > BTRFS: error (device sda1) in cleanup_transaction:1831: errno=-24 unknown > > [CAUSE] > The root cause is we don't have unlimited resource for anonymous block > device number. > The anonymous block device pool only contains 1<<20 devices, and is > shared across a several fses, like ceph and overlayfs. > > While btrfs has support for 1<<48 subvolumes, so it's just a problem of > time to hit such limit. > > [WORKAROUND] > Here we can free btrfs_root::anon_dev as long as the subvolume is no > longer visible to users. > > By freeing it earlier we reclaim the anon_dev quicker, hopefully to > reduce the chance of exhausting the pool. > Why isn't this happening as part of the root teardown once all the references to it are gone? Thanks, Josef ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 4/4] btrfs: free anon_dev earlier to prevent exhausting anonymous block device pool 2020-06-16 19:23 ` Josef Bacik @ 2020-06-16 22:48 ` David Sterba 2020-06-16 23:31 ` Josef Bacik 0 siblings, 1 reply; 27+ messages in thread From: David Sterba @ 2020-06-16 22:48 UTC (permalink / raw) To: Josef Bacik; +Cc: Qu Wenruo, linux-btrfs, Greed Rong On Tue, Jun 16, 2020 at 03:23:03PM -0400, Josef Bacik wrote: > > By freeing it earlier we reclaim the anon_dev quicker, hopefully to > > reduce the chance of exhausting the pool. > > Why isn't this happening as part of the root teardown once all the references to > it are gone? Thanks, This is where it happens now and is correct. The problem is that deleted subvolumes keep the id allocated until they are cleaned up, ie. all the dead roots consume the id though we don't need it anymore. Creating and deleting snapshots at large will produce a long list of dead subvolumes. THis patch will return the ids at the earlies possible moment so they can get reused. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 4/4] btrfs: free anon_dev earlier to prevent exhausting anonymous block device pool 2020-06-16 22:48 ` David Sterba @ 2020-06-16 23:31 ` Josef Bacik 0 siblings, 0 replies; 27+ messages in thread From: Josef Bacik @ 2020-06-16 23:31 UTC (permalink / raw) To: dsterba, Qu Wenruo, linux-btrfs, Greed Rong On 6/16/20 6:48 PM, David Sterba wrote: > On Tue, Jun 16, 2020 at 03:23:03PM -0400, Josef Bacik wrote: >>> By freeing it earlier we reclaim the anon_dev quicker, hopefully to >>> reduce the chance of exhausting the pool. >> >> Why isn't this happening as part of the root teardown once all the references to >> it are gone? Thanks, > > This is where it happens now and is correct. The problem is that deleted > subvolumes keep the id allocated until they are cleaned up, ie. all the > dead roots consume the id though we don't need it anymore. Creating and > deleting snapshots at large will produce a long list of dead subvolumes. > THis patch will return the ids at the earlies possible moment so they > can get reused. > Oh ok I misread, we're doing it earlier on purpose. Alright that's fine, you can add Reviewed-by: Josef Bacik <josef@toxicpanda.com> Thanks, Josef ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/4] btrfs: workaround exhausted anonymous block device pool 2020-06-16 2:17 [PATCH 0/4] btrfs: workaround exhausted anonymous block device pool Qu Wenruo ` (3 preceding siblings ...) 2020-06-16 2:17 ` [PATCH 4/4] btrfs: free anon_dev earlier to prevent exhausting anonymous block device pool Qu Wenruo @ 2020-06-30 14:14 ` David Sterba 4 siblings, 0 replies; 27+ messages in thread From: David Sterba @ 2020-06-30 14:14 UTC (permalink / raw) To: Qu Wenruo; +Cc: linux-btrfs On Tue, Jun 16, 2020 at 10:17:33AM +0800, Qu Wenruo wrote: > Qu Wenruo (4): > btrfs: disk-io: don't allocate anonymous block device for user > invisible roots > btrfs: detect uninitialized btrfs_root::anon_dev for user visible > subvolumes > btrfs: preallocate anon_dev for subvolume and snapshot creation > btrfs: free anon_dev earlier to prevent exhausting anonymous block > device pool I haven't seen an update to this patchset and would really like to get it fixed. Let me know if I missed an update or you need help, there weren't any big issues left. ^ permalink raw reply [flat|nested] 27+ messages in thread
end of thread, other threads:[~2020-07-03 12:39 UTC | newest] Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-06-16 2:17 [PATCH 0/4] btrfs: workaround exhausted anonymous block device pool Qu Wenruo 2020-06-16 2:17 ` [PATCH 1/4] btrfs: disk-io: don't allocate anonymous block device for user invisible roots Qu Wenruo 2020-06-16 19:21 ` Josef Bacik 2020-06-16 2:17 ` [PATCH 2/4] btrfs: detect uninitialized btrfs_root::anon_dev for user visible subvolumes Qu Wenruo 2020-06-16 19:25 ` Josef Bacik 2020-06-16 22:49 ` Qu Wenruo 2020-06-16 23:32 ` Josef Bacik 2020-06-16 23:49 ` Qu Wenruo 2020-06-17 11:31 ` David Sterba 2020-06-17 13:37 ` Josef Bacik 2020-06-17 23:39 ` Qu Wenruo 2020-06-16 2:17 ` [PATCH 3/4] btrfs: preallocate anon_dev for subvolume and snapshot creation Qu Wenruo 2020-06-16 15:10 ` David Sterba 2020-06-16 22:54 ` Qu Wenruo 2020-07-01 3:25 ` Qu Wenruo 2020-07-01 17:39 ` David Sterba 2020-07-01 23:56 ` Qu Wenruo 2020-07-02 16:08 ` David Sterba 2020-07-02 23:46 ` David Sterba 2020-07-03 5:19 ` Qu Wenruo 2020-07-03 12:29 ` David Sterba 2020-07-03 12:39 ` Qu Wenruo 2020-06-16 2:17 ` [PATCH 4/4] btrfs: free anon_dev earlier to prevent exhausting anonymous block device pool Qu Wenruo 2020-06-16 19:23 ` Josef Bacik 2020-06-16 22:48 ` David Sterba 2020-06-16 23:31 ` Josef Bacik 2020-06-30 14:14 ` [PATCH 0/4] btrfs: workaround exhausted " David Sterba
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.