On 2020/6/16 下午11:10, David Sterba wrote: > On Tue, Jun 16, 2020 at 10:17:36AM +0800, Qu Wenruo wrote: >> [BUG] >> When a lot of subvolumes are created, there is a user report about >> transaction aborted: >> >> ------------[ cut here ]------------ >> BTRFS: Transaction aborted (error -24) >> WARNING: CPU: 17 PID: 17041 at fs/btrfs/transaction.c:1576 create_pending_snapshot+0xbc4/0xd10 [btrfs] >> RIP: 0010:create_pending_snapshot+0xbc4/0xd10 [btrfs] >> Call Trace: >> create_pending_snapshots+0x82/0xa0 [btrfs] >> btrfs_commit_transaction+0x275/0x8c0 [btrfs] >> btrfs_mksubvol+0x4b9/0x500 [btrfs] >> btrfs_ioctl_snap_create_transid+0x174/0x180 [btrfs] >> btrfs_ioctl_snap_create_v2+0x11c/0x180 [btrfs] >> btrfs_ioctl+0x11a4/0x2da0 [btrfs] >> do_vfs_ioctl+0xa9/0x640 >> ksys_ioctl+0x67/0x90 >> __x64_sys_ioctl+0x1a/0x20 >> do_syscall_64+0x5a/0x110 >> entry_SYSCALL_64_after_hwframe+0x44/0xa9 >> ---[ end trace 33f2f83f3d5250e9 ]--- >> BTRFS: error (device sda1) in create_pending_snapshot:1576: errno=-24 unknown >> BTRFS info (device sda1): forced readonly >> BTRFS warning (device sda1): Skipping commit of aborted transaction. >> BTRFS: error (device sda1) in cleanup_transaction:1831: errno=-24 unknown >> >> [CAUSE] >> When the global anonymous block device pool is exhausted, the following >> call chain will fail, and lead to transaction abort: >> >> btrfs_ioctl_snap_create_v2() >> |- btrfs_ioctl_snap_create_transid() >> |- btrfs_mksubvol() >> |- btrfs_commit_transaction() >> |- create_pending_snapshot() >> |- btrfs_get_fs_root() >> |- btrfs_init_fs_root() >> |- get_anon_bdev() >> >> [FIX] >> Although we can't enlarge the anonymous block device pool, at least we >> can preallocate anon_dev for subvolume/snapshot creation. >> So that when the pool is exhausted, user will get an error other than >> aborting transaction later. >> >> Reported-by: Greed Rong >> Link: https://lore.kernel.org/linux-btrfs/CA+UqX+NTrZ6boGnWHhSeZmEY5J76CTqmYjO2S+=tHJX7nb9DPw@mail.gmail.com/ >> Signed-off-by: Qu Wenruo >> --- >> fs/btrfs/disk-io.c | 51 ++++++++++++++++++++++++++++++++++++------ >> fs/btrfs/disk-io.h | 2 ++ >> fs/btrfs/ioctl.c | 21 ++++++++++++++++- >> fs/btrfs/transaction.c | 3 ++- >> fs/btrfs/transaction.h | 2 ++ >> 5 files changed, 70 insertions(+), 9 deletions(-) >> >> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c >> index cfc0ff288238..14fd69b71cb8 100644 >> --- a/fs/btrfs/disk-io.c >> +++ b/fs/btrfs/disk-io.c >> @@ -1395,7 +1395,7 @@ struct btrfs_root *btrfs_read_tree_root(struct btrfs_root *tree_root, >> goto out; >> } >> >> -static int btrfs_init_fs_root(struct btrfs_root *root) >> +static int btrfs_init_fs_root(struct btrfs_root *root, dev_t anon_dev) >> { >> int ret; >> unsigned int nofs_flag; >> @@ -1435,9 +1435,13 @@ static int btrfs_init_fs_root(struct btrfs_root *root) >> */ >> if (is_fstree(root->root_key.objectid) && >> btrfs_root_refs(&root->root_item)) { >> - ret = get_anon_bdev(&root->anon_dev); >> - if (ret) >> - goto fail; >> + if (!anon_dev) { >> + ret = get_anon_bdev(&root->anon_dev); >> + if (ret) >> + goto fail; >> + } else { >> + root->anon_dev = anon_dev; >> + } >> } >> >> mutex_lock(&root->objectid_mutex); >> @@ -1542,8 +1546,27 @@ void btrfs_free_fs_info(struct btrfs_fs_info *fs_info) >> } >> >> >> -struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info, >> - u64 objectid, bool check_ref) >> +/* >> + * Get a fs root. >> + * >> + * For essential trees like root/extent tree, we grab it from fs_info directly. >> + * For subvolume trees, we check the cached fs roots first. If miss then >> + * read it from disk and add it to cached fs roots. >> + * >> + * Caller should release the root by calling btrfs_put_root() after the usage. >> + * >> + * NOTE: Reloc and log trees can't be read by this function as they share the >> + * same root objectid. >> + * >> + * @objectid: Root (subvolume) id >> + * @anon_dev: Preallocated anonymous block device number for new roots. >> + * Pass 0 for automatic allocation. >> + * @check_ref: Whether to check root refs. If true, return -ENOENT for orphan >> + * roots. >> + */ >> +static struct btrfs_root *__get_fs_root(struct btrfs_fs_info *fs_info, >> + u64 objectid, dev_t anon_dev, >> + bool check_ref) > > >> +struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info, >> + u64 objectid, bool check_ref) >> +{ >> + return __get_fs_root(fs_info, objectid, 0, check_ref); >> +} >> + >> +struct btrfs_root *btrfs_get_new_fs_root(struct btrfs_fs_info *fs_info, >> + u64 objectid, dev_t anon_dev) >> +{ >> + return __get_fs_root(fs_info, objectid, anon_dev, true); >> +} > > This does not look like a good API, we should keep btrfs_get_fs_root and > add the anon_bdev initialization to the callers, there are only a few. > A few = over 25? I have switched to keep btrfs_get_fs_root(), but you won't like the summary: Old: fs/btrfs/disk-io.h | 2 ++ fs/btrfs/ioctl.c | 21 ++++++++++++++++- fs/btrfs/transaction.c | 3 ++- fs/btrfs/transaction.h | 2 ++ 5 files changed, 70 insertions(+), 9 deletions(-) New: fs/btrfs/backref.c | 4 ++-- fs/btrfs/disk-io.c | 42 ++++++++++++++++++++++++++++++++++-------- fs/btrfs/disk-io.h | 3 ++- fs/btrfs/export.c | 2 +- fs/btrfs/file.c | 2 +- fs/btrfs/inode.c | 2 +- fs/btrfs/ioctl.c | 31 +++++++++++++++++++++++++------ fs/btrfs/relocation.c | 11 ++++++----- fs/btrfs/root-tree.c | 2 +- fs/btrfs/scrub.c | 2 +- fs/btrfs/send.c | 4 ++-- fs/btrfs/super.c | 2 +- fs/btrfs/transaction.c | 3 ++- fs/btrfs/transaction.h | 2 ++ fs/btrfs/tree-log.c | 2 +- fs/btrfs/uuid-tree.c | 2 +- 16 files changed, 83 insertions(+), 33 deletions(-) Do we really go that direction? Thanks, Qu