linux-btrfs.vger.kernel.org archive mirror
* BTRFS: Transaction aborted (error -24)
@ 2020-06-11 10:29 Greed Rong
  2020-06-11 11:20 ` David Sterba
  0 siblings, 1 reply; 13+ messages in thread
From: Greed Rong @ 2020-06-11 10:29 UTC (permalink / raw)
  To: linux-btrfs

Hi,
I have got this error several times. Are there any suggestions to avoid this?

# dmesg
[7142286.563596] ------------[ cut here ]------------
[7142286.564499] BTRFS: Transaction aborted (error -24)
[7142286.565053] WARNING: CPU: 17 PID: 17041 at fs/btrfs/transaction.c:1576 create_pending_snapshot+0xbc4/0xd10 [btrfs]
[7142286.565482] Modules linked in: vhost_net vhost tap xt_CHECKSUM
iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat
xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT
nf_reject_ipv4 xt_tcpudp ebtable_filter ebtables ip6table_filter
ip6_tables iptable_filter bpfilter iscsi_target_mod target_core_mod
overlay 8021q garp mrp bonding bridge stp llc ipmi_ssif nls_iso8859_1
intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp
dcdbas kvm_intel kvm joydev input_leds irqbypass intel_cstate
intel_rapl_perf lpc_ich mei_me mei ipmi_si ipmi_devintf
ipmi_msghandler acpi_power_meter mac_hid sch_fq_codel ib_iser rdma_cm
iw_cm ib_cm iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi
ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456
async_raid6_recov async_memcpy async_pq async_xor async_tx xor
raid6_pq libcrc32c raid1 raid0 multipath linear mlx4_ib ib_uverbs
mlx4_en ib_core hid_generic usbhid hid crct10dif_pclmul crc32_pclmul
mgag200 i2c_algo_bit
[7142286.565508]  ghash_clmulni_intel ttm drm_kms_helper aesni_intel
syscopyarea sysfillrect mxm_wmi aes_x86_64 sysimgblt crypto_simd
fb_sys_fops cryptd glue_helper ahci drm mlx4_core tg3 libahci
megaraid_sas devlink wmi
[7142286.570322] CPU: 17 PID: 17041 Comm: btrfs Tainted: G      D      5.0.10-050010-generic #201904270832
[7142286.570893] Hardware name: Dell Inc. PowerEdge R730xd/0H21J3, BIOS 2.10.5 07/25/2019
[7142286.571479] RIP: 0010:create_pending_snapshot+0xbc4/0xd10 [btrfs]
[7142286.572063] Code: f0 48 0f ba aa 40 ce 00 00 02 72 27 83 f8 fb 0f 84 54 8d 08 00 89 c6 48 c7 c7 10 41 95 c0 48 89 85 60 ff ff ff e8 6e 17 3e f3 <0f> 0b 48 8b 85 60 ff ff ff 89 c1 ba 28 06 00 00 48 c7 c6 b0 7e 94
[7142286.573357] RSP: 0018:ffffac950ff0fa10 EFLAGS: 00010282
[7142286.573981] RAX: 0000000000000000 RBX: ffff9ec1602e2e38 RCX: 0000000000000006
[7142286.574615] RDX: 0000000000000007 RSI: 0000000000000082 RDI: ffff9eefbf816440
[7142286.575247] RBP: ffffac950ff0fb00 R08: 0000000000000000 R09: 0000000000002595
[7142286.575888] R10: 0000000000010101 R11: ffff9ebfab630c90 R12: ffff9ebf8998c3c0
[7142286.576561] R13: 00000000ffffffe8 R14: ffff9e92f1bde000 R15: ffff9e9701297e00
[7142286.577251] FS:  00007fbcad9e18c0(0000) GS:ffff9eefbf800000(0000) knlGS:0000000000000000
[7142286.577910] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[7142286.578569] CR2: 00007fbcac82e100 CR3: 00000030ad800004 CR4: 00000000001626e0
[7142286.579240] Call Trace:
[7142286.579915]  create_pending_snapshots+0x82/0xa0 [btrfs]
[7142286.580627]  ? create_pending_snapshots+0x82/0xa0 [btrfs]
[7142286.581350]  btrfs_commit_transaction+0x275/0x8c0 [btrfs]
[7142286.582049]  ? btrfs_subvolume_reserve_metadata+0x41/0x180 [btrfs]
[7142286.582750]  btrfs_mksubvol+0x4b9/0x500 [btrfs]
[7142286.583447]  ? security_capable+0x3c/0x60
[7142286.584144]  btrfs_ioctl_snap_create_transid+0x174/0x180 [btrfs]
[7142286.584928]  btrfs_ioctl_snap_create_v2+0x11c/0x180 [btrfs]
[7142286.585646]  btrfs_ioctl+0x11a4/0x2da0 [btrfs]
[7142286.586346]  ? filemap_map_pages+0x1ae/0x380
[7142286.587047]  do_vfs_ioctl+0xa9/0x640
[7142286.587739]  ? do_vfs_ioctl+0xa9/0x640
[7142286.588441]  ksys_ioctl+0x67/0x90
[7142286.589202]  __x64_sys_ioctl+0x1a/0x20
[7142286.589899]  do_syscall_64+0x5a/0x110
[7142286.590604]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[7142286.591308] RIP: 0033:0x7fbcac7c75d7
[7142286.592011] Code: b3 66 90 48 8b 05 b1 48 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 48 2d 00 f7 d8 64 89 01 48
[7142286.593558] RSP: 002b:00007fff93db05c8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
[7142286.594296] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fbcac7c75d7
[7142286.595034] RDX: 00007fff93db0610 RSI: 0000000050009417 RDI: 0000000000000003
[7142286.595765] RBP: 0000000000000003 R08: 0000000000000000 R09: 000000000000000d
[7142286.596492] R10: 00000000fffffff3 R11: 0000000000000206 R12: 000056203917f260
[7142286.597241] R13: 00007fff93db1e46 R14: 00007fff93db0610 R15: 0000000000000004
[7142286.597926] ---[ end trace 33f2f83f3d5250e9 ]---
[7142286.598635] BTRFS: error (device sda1) in create_pending_snapshot:1576: errno=-24 unknown
[7142286.599388] BTRFS info (device sda1): forced readonly
[7142286.600037] BTRFS warning (device sda1): Skipping commit of aborted transaction.
[7142286.600731] BTRFS: error (device sda1) in cleanup_transaction:1831: errno=-24 unknown

# lsb_release -a
No LSB modules are available.
Distributor ID:    Ubuntu
Description:    Ubuntu 18.04.2 LTS
Release:    18.04
Codename:    bionic

# uname -a
Linux gz-cached-10 5.0.10-050010-generic #201904270832 SMP Sat Apr 27 08:34:43 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

# btrfs --version
btrfs-progs v4.20.2

# btrfs fi show
Label: none  uuid: 209cc44a-3e68-46bd-ad69-184228c73ece
    Total devices 1 FS bytes used 4.31TiB
    devid    1 size 8.73TiB used 5.00TiB path /dev/sda1

# btrfs fi df /snapshot/
Data, single: total=4.92TiB, used=4.28TiB
System, DUP: total=8.00MiB, used=544.00KiB
Metadata, DUP: total=45.00GiB, used=24.60GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: BTRFS: Transaction aborted (error -24)
  2020-06-11 10:29 BTRFS: Transaction aborted (error -24) Greed Rong
@ 2020-06-11 11:20 ` David Sterba
  2020-06-11 12:37   ` Qu Wenruo
  0 siblings, 1 reply; 13+ messages in thread
From: David Sterba @ 2020-06-11 11:20 UTC (permalink / raw)
  To: Greed Rong; +Cc: linux-btrfs

On Thu, Jun 11, 2020 at 06:29:34PM +0800, Greed Rong wrote:
> Hi,
> I have got this error several times. Are there any suggestions to avoid this?
> 
> # dmesg
> [7142286.563596] ------------[ cut here ]------------
> [7142286.564499] BTRFS: Transaction aborted (error -24)

EMFILE          24      /* Too many open files */

You can increase the open file limit, but it's strange that this happens;
this is the first time I've seen it.

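The -24 in the kernel message is a negated errno value. A small, purely illustrative userspace helper (the function names are made up for this example) that decodes it with the C library: on Linux it maps to EMFILE, "Too many open files", although later in this thread the error is traced to the anonymous device pool rather than to the open file limit.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Kernel messages report errors as negative errno values, e.g. the -24
 * in "Transaction aborted (error -24)". Negate before decoding. */
static int is_emfile(int kernel_err)
{
    return -kernel_err == EMFILE;
}

/* Print the C library's description of a negative kernel error code. */
static void describe_kernel_error(int kernel_err)
{
    assert(kernel_err < 0);
    printf("error %d = %s%s\n", kernel_err, strerror(-kernel_err),
           is_emfile(kernel_err) ? " (EMFILE)" : "");
}
```

On glibc, describe_kernel_error(-24) prints the EMFILE message "Too many open files".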
>      5.0.10-050010-generic #201904270832

5.0.10 is quite old, but that shouldn't affect it.

* Re: BTRFS: Transaction aborted (error -24)
  2020-06-11 11:20 ` David Sterba
@ 2020-06-11 12:37   ` Qu Wenruo
  2020-06-11 13:52     ` David Sterba
  0 siblings, 1 reply; 13+ messages in thread
From: Qu Wenruo @ 2020-06-11 12:37 UTC (permalink / raw)
  To: dsterba, Greed Rong, linux-btrfs

On 2020/6/11 7:20 PM, David Sterba wrote:
> On Thu, Jun 11, 2020 at 06:29:34PM +0800, Greed Rong wrote:
>> Hi,
>> I have got this error several times. Are there any suggestions to avoid this?
>>
>> # dmesg
>> [7142286.563596] ------------[ cut here ]------------
>> [7142286.564499] BTRFS: Transaction aborted (error -24)
> 
> EMFILE          24      /* Too many open files */
> 
> You can increase the open file limit, but it's strange that this happens;
> this is the first time I've seen it.

Not something from btrfs code, thus it must come from the VFS/MM code.

The offending abort transaction is from btrfs_read_fs_root_no_name(),
which is updated to btrfs_get_fs_root() in upstream kernel.
Overall, it's not much different between the upstream and the 5.0.10 kernel.

But with latest btrfs_get_fs_root(), after a quick glance, there isn't
any obvious location to introduce the EMFILE error.

Any extra info about the workload that triggers the bug?

Thanks,
Qu

* Re: BTRFS: Transaction aborted (error -24)
  2020-06-11 12:37   ` Qu Wenruo
@ 2020-06-11 13:52     ` David Sterba
  2020-06-12  3:15       ` Greed Rong
  2020-06-12  5:38       ` Qu Wenruo
  0 siblings, 2 replies; 13+ messages in thread
From: David Sterba @ 2020-06-11 13:52 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Greed Rong, linux-btrfs

On Thu, Jun 11, 2020 at 08:37:11PM +0800, Qu Wenruo wrote:
> 
> 
> On 2020/6/11 7:20 PM, David Sterba wrote:
> > On Thu, Jun 11, 2020 at 06:29:34PM +0800, Greed Rong wrote:
> >> Hi,
> >> I have got this error several times. Are there any suggestions to avoid this?
> >>
> >> # dmesg
> >> [7142286.563596] ------------[ cut here ]------------
> >> [7142286.564499] BTRFS: Transaction aborted (error -24)
> > 
> > EMFILE          24      /* Too many open files */
> > 
> > You can increase the open file limit, but it's strange that this happens;
> > this is the first time I've seen it.
> 
> Not something from btrfs code, thus it must come from the VFS/MM code.

Yeah, this is VFS. Creating a new root will need a new inode and dentry
and the limits are applied.

> The offending abort transaction is from btrfs_read_fs_root_no_name(),
> which is updated to btrfs_get_fs_root() in upstream kernel.
> Overall, it's not much different between the upstream and the 5.0.10 kernel.
> 
> But with latest btrfs_get_fs_root(), after a quick glance, there isn't
> any obvious location to introduce the EMFILE error.
> 
> Any extra info about the workload that triggers the bug?

I think it's from get_anon_bdev(), which is called from btrfs_init_fs_root()
(in btrfs_get_fs_root()):

int get_anon_bdev(dev_t *p)
{
        int dev;

        /*
         * Many userspace utilities consider an FSID of 0 invalid.
         * Always return at least 1 from get_anon_bdev.
         */
        dev = ida_alloc_range(&unnamed_dev_ida, 1, (1 << MINORBITS) - 1,
                        GFP_ATOMIC);
        if (dev == -ENOSPC)
                dev = -EMFILE;
        if (dev < 0)
                return dev;

        *p = MKDEV(0, dev);
        return 0;
}
EXPORT_SYMBOL(get_anon_bdev);

And the comment says "Return: 0 on success, -EMFILE if there are no
anonymous bdevs left".

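To make the failure mode concrete, here is a userspace model (not kernel code; all names are hypothetical) of what get_anon_bdev() does: hand out the lowest free id in [1, (1 << MINORBITS) - 1] and report an exhausted range as -EMFILE instead of -ENOSPC. A plain array scan stands in for the kernel IDA, so only the behaviour, not the data structure, is being modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Mirrors the kernel's MINORBITS (20): ids 1 .. (1 << 20) - 1 are usable. */
#define MODEL_MINORBITS 20
#define MODEL_MAX_ID ((1u << MODEL_MINORBITS) - 1)

struct id_pool {
    unsigned max_id;  /* highest allocatable id, inclusive */
    bool *used;       /* used[id] is true while id is allocated */
};

static int pool_init(struct id_pool *p, unsigned max_id)
{
    assert(max_id >= 1 && max_id <= MODEL_MAX_ID);
    p->max_id = max_id;
    p->used = calloc((size_t)max_id + 1, sizeof(*p->used));
    return p->used ? 0 : -ENOMEM;
}

/* Like ida_alloc_range(ida, 1, max, gfp): lowest free id >= 1, or
 * -ENOSPC once every id in the range is taken. */
static int pool_alloc(struct id_pool *p)
{
    for (unsigned id = 1; id <= p->max_id; id++) {
        if (!p->used[id]) {
            p->used[id] = true;
            return (int)id;
        }
    }
    return -ENOSPC;
}

static void pool_free(struct id_pool *p, unsigned id)
{
    assert(id >= 1 && id <= p->max_id && p->used[id]);
    p->used[id] = false;
}

/* Same shape as get_anon_bdev(): an exhausted range is reported as -EMFILE. */
static int model_get_anon_id(struct id_pool *p, unsigned *out)
{
    int id = pool_alloc(p);

    if (id == -ENOSPC)
        id = -EMFILE;
    if (id < 0)
        return id;
    *out = (unsigned)id;
    return 0;
}
```

A freed id is immediately reusable, so the pool only runs dry when ids are held for longer than they are being consumed, which is the question raised below about why unnamed_dev_ida is exhausted.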
The fs tree roots are created later than the actual command is executed,
so all the errors are also delayed. For that reason I moved e.g. the root
item and path allocation to the first phase. We could do the same for
the anonymous bdev.

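A sketch of the two-phase idea, assuming hypothetical function names that do not exist in btrfs: reserve the id while the ioctl can still fail cleanly, so that the commit phase has nothing left to allocate. A tiny fixed-capacity counter stands in for the real allocator so the failure path can be exercised.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical allocator stand-in: hands out ids 1..model_capacity and
 * then fails the way get_anon_bdev() does when its pool is empty. */
static unsigned model_next_id = 1;
static const unsigned model_capacity = 2;

static int model_alloc_anon_id(unsigned *out)
{
    if (model_next_id > model_capacity)
        return -EMFILE;
    *out = model_next_id++;
    return 0;
}

struct pending_snapshot {
    unsigned anon_id;   /* 0 means "not reserved"; valid ids start at 1 */
};

/* Phase 1, hypothetical: called from the snapshot ioctl before anything
 * is queued for commit. An error here goes straight back to userspace. */
static int snapshot_prepare(struct pending_snapshot *ps)
{
    return model_alloc_anon_id(&ps->anon_id);
}

/* Phase 2, hypothetical: called during transaction commit. The id was
 * reserved in phase 1, so no allocation is left here that could force
 * a transaction abort. */
static int snapshot_commit(const struct pending_snapshot *ps)
{
    assert(ps->anon_id != 0);
    /* ... create the root using ps->anon_id ... */
    return 0;
}
```

With this split, pool exhaustion surfaces as an ioctl error for the one snapshot request instead of an aborted transaction and a read-only filesystem.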
The problem won't go away though; the question is why the IDA range
unnamed_dev_ida is exhausted.

* Re: BTRFS: Transaction aborted (error -24)
  2020-06-11 13:52     ` David Sterba
@ 2020-06-12  3:15       ` Greed Rong
  2020-06-12  6:41         ` Qu Wenruo
  2020-06-12 17:13         ` David Sterba
  2020-06-12  5:38       ` Qu Wenruo
  1 sibling, 2 replies; 13+ messages in thread
From: Greed Rong @ 2020-06-12  3:15 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, Greed Rong, linux-btrfs

This server is used for network storage. When a new client arrives, I
create a snapshot of the workspace subvolume for this client, and
delete it when the client disconnects.
Most workspaces are PC game programs. Each contains thousands of files
and its size ranges from 1GB to 20GB.
About 200 Windows clients access this server through Samba. About 20
snapshots are created/deleted per minute.

# lsof | wc -l
47405

# sysctl fs.file-max
fs.file-max = 39579457

# sysctl fs.file-nr
fs.file-nr = 5120    0    39579457

# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 1547267
max locked memory       (kbytes, -l) 16384
max memory size         (kbytes, -m) unlimited
open files                      (-n) 102400
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1547267
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

* Re: BTRFS: Transaction aborted (error -24)
  2020-06-11 13:52     ` David Sterba
  2020-06-12  3:15       ` Greed Rong
@ 2020-06-12  5:38       ` Qu Wenruo
  1 sibling, 0 replies; 13+ messages in thread
From: Qu Wenruo @ 2020-06-12  5:38 UTC (permalink / raw)
  To: dsterba, Greed Rong, linux-btrfs

On 2020/6/11 9:52 PM, David Sterba wrote:
> The fs tree roots are created later than the actual command is executed,
> so all the errors are also delayed. For that reason I moved eg. the root
> item and path allocation to the first phase. We could do the same for
> the anonymous bdev.

The first question is: do we really need a per-root anonymous bdev?

IMHO btrfs can share the same anonymous bdev across the whole fs; there
is no need for each root to own one.

The user-visible change would be that stat() always returns the same
device (st_dev) for all roots.
Users would lose the ability to distinguish different roots of the same
fs, but I doubt that would really impact any use case.

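The user-visible effect Qu describes would show up in the st_dev field that stat() returns, which btrfs currently fills with the subvolume's own anonymous device. A minimal sketch of how a userspace program typically compares it (the helper name is hypothetical); with one shared anonymous bdev per filesystem, two paths in different subvolumes of the same btrfs would compare equal.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/* Return 1 if both paths report the same st_dev, 0 if not, -1 on error.
 * Programs use a change in st_dev to spot a filesystem boundary; on btrfs
 * each subvolume currently reports its own anonymous device here, so a
 * subvolume boundary looks like one too. */
static int same_st_dev(const char *a, const char *b)
{
    struct stat sa, sb;

    assert(a != NULL && b != NULL);
    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return -1;
    return sa.st_dev == sb.st_dev;
}
```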
Thanks,
Qu

* Re: BTRFS: Transaction aborted (error -24)
  2020-06-12  3:15       ` Greed Rong
@ 2020-06-12  6:41         ` Qu Wenruo
  2020-06-12 17:13         ` David Sterba
  1 sibling, 0 replies; 13+ messages in thread
From: Qu Wenruo @ 2020-06-12  6:41 UTC (permalink / raw)
  To: Greed Rong, dsterba, linux-btrfs

On 2020/6/12 11:15 AM, Greed Rong wrote:
> This server is used for network storage. When a new client arrives, I
> create a snapshot of the workspace subvolume for this client, and
> delete it when the client disconnects.
> Most workspaces are PC game programs. Each contains thousands of files
> and its size ranges from 1GB to 20GB.
> About 200 Windows clients access this server through Samba. About 20
> snapshots are created/deleted per minute.

After checking the IDA code: for anonymous block devices we only have
about 1<<20 ids to allocate.
That is not many, but it should be enough for regular usage (it lasts
about 12 days at one snapshot per second).

But in your workload, snapshot creation is not that frequent, not to
mention that snapshots are also deleted.
Although btrfs snapshot deletion is delayed, unless snapshots go
undeleted for around a month, you shouldn't exhaust the pool.

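Qu's estimate can be checked with a few lines of arithmetic. The sketch assumes the usable range is 1 to (1 << 20) - 1, as in get_anon_bdev(), and the worst case in which no id is ever returned and nothing else uses the pool; the function name is illustrative.

```c
#include <assert.h>

/* Usable ids in unnamed_dev_ida: 1 .. (1 << 20) - 1, with MINORBITS = 20. */
#define ANON_ID_COUNT ((1UL << 20) - 1)

/* Days until the pool is empty when ids are allocated at per_minute and
 * none are ever freed (no snapshot is cleaned, no other pool users). */
static double days_to_exhaustion(double per_minute)
{
    assert(per_minute > 0.0);
    return (double)ANON_ID_COUNT / (per_minute * 60.0 * 24.0);
}
```

days_to_exhaustion(60.0), one snapshot per second, comes to about 12.1 days; days_to_exhaustion(20.0), the rate reported above, comes to about 36.4 days, matching "around a month".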
Have you ever experienced strange performance problems, such as snapshot
creation taking too long?
Or seen space not being reclaimed?

Anyway, I'll send out an RFC patch to explore the possibility of using a
single anonymous block device for a whole btrfs filesystem.

Thanks,
Qu

* Re: BTRFS: Transaction aborted (error -24)
  2020-06-12  3:15       ` Greed Rong
  2020-06-12  6:41         ` Qu Wenruo
@ 2020-06-12 17:13         ` David Sterba
  2020-06-15 12:50           ` Greed Rong
  1 sibling, 1 reply; 13+ messages in thread
From: David Sterba @ 2020-06-12 17:13 UTC (permalink / raw)
  To: Greed Rong; +Cc: dsterba, Qu Wenruo, linux-btrfs

On Fri, Jun 12, 2020 at 11:15:43AM +0800, Greed Rong wrote:
> This server is used for network storage. When a new client arrives, I
> create a snapshot of the workspace subvolume for this client, and
> delete it when the client disconnects.

NFS, cephfs and overlayfs use the same pool of ids; in combination with
btrfs snapshots, the consumption might be higher than in other setups.

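There is no counter for this shared pool, but filesystems that draw from it sit on major-0 (anonymous) devices, and the third field of /proc/self/mountinfo is the major:minor pair. A rough, hypothetical sketch that counts such mount entries; treat the result as an indication only, since bind mounts of one filesystem appear more than once and btrfs subvolumes that are not mounted on their own hold an id without appearing at all.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse the third field ("major:minor") of one mountinfo line.
 * Returns 0 on success, -1 if the line does not start as expected. */
static int mountinfo_dev(const char *line, unsigned *major, unsigned *minor)
{
    return sscanf(line, "%*d %*d %u:%u", major, minor) == 2 ? 0 : -1;
}

/* Count mount entries whose device has major number 0 (an anonymous
 * device). Returns -1 if the file cannot be opened. */
static long count_anon_mount_entries(const char *path)
{
    FILE *f = fopen(path, "r");
    char *line = NULL;
    size_t cap = 0;
    long n = 0;
    unsigned maj, min;

    if (!f)
        return -1;
    while (getline(&line, &cap, f) != -1)
        if (mountinfo_dev(line, &maj, &min) == 0 && maj == 0)
            n++;
    free(line);
    fclose(f);
    return n;
}
```

Usage: count_anon_mount_entries("/proc/self/mountinfo").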
> Most workspaces are PC game programs. Each contains thousands of files
> and its size ranges from 1GB to 20GB.

We can rule out regular files, they don't affect that, and the numbers
you posted are all normal.

> About 200 Windows clients access this server through Samba. About 20
> snapshots are created/deleted per minute.

This contributes to the overall consumption of ids from the pool, which
is now shared between the network filesystem and btrfs.

One possible explanation would be a leak of the ids; once that state is
hit it's permanent, so no new snapshots could be created, or the network
clients would start getting some other error.

If there's no leak, then all objects that have an id attached would
need to be active, i.e. a snapshot that is part of a path, or a network
client connected to its path. This also implies some sort of caching, so
the ids are not returned right away.

For subvolumes, the ids get returned once the subvolume is deleted
and cleaned, which might take time and contribute to the pool
exhaustion. I need to do some tests to see if we could release the ids
earlier.

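The point that an id stays allocated until the deleted subvolume is cleaned can be put in numbers with Little's law: in steady state, the ids held by snapshots equal the creation rate multiplied by the average time each id stays allocated. This is a simplified model with a hypothetical function name, and the figures in the usage note are made-up illustrations, not measurements from this system.

```c
#include <assert.h>

/* Steady-state estimate (Little's law): ids in use = arrival rate x
 * average holding time. A snapshot holds its id while in use and then
 * until its deletion has been fully cleaned up. */
static double ids_in_use(double creates_per_min, double lifetime_min,
                         double cleanup_delay_min)
{
    assert(creates_per_min >= 0.0);
    return creates_per_min * (lifetime_min + cleanup_delay_min);
}
```

With 20 snapshots per minute and a hypothetical one-hour client session, ids_in_use(20, 60, 0) is 1200; a one-day cleanup backlog, ids_in_use(20, 60, 1440), raises it to 30000. If deleted snapshots are never cleaned, the holding time has no bound and neither does pool usage.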
* Re: BTRFS: Transaction aborted (error -24)
  2020-06-12 17:13         ` David Sterba
@ 2020-06-15 12:50           ` Greed Rong
  2020-06-16  0:38             ` Qu Wenruo
  2020-06-18 12:34             ` David Sterba
  0 siblings, 2 replies; 13+ messages in thread
From: Greed Rong @ 2020-06-15 12:50 UTC (permalink / raw)
  To: dsterba, Greed Rong, Qu Wenruo, linux-btrfs

Does that mean at most about 2^20 subvolumes can be created in one btrfs filesystem?

The snapshot delete service was stopped a few weeks ago. I think this
is the reason why the id pool is exhausted.
I will try to run it again and see if it works.

Thanks

* Re: BTRFS: Transaction aborted (error -24)
  2020-06-15 12:50           ` Greed Rong
@ 2020-06-16  0:38             ` Qu Wenruo
  2020-06-18 12:34             ` David Sterba
  1 sibling, 0 replies; 13+ messages in thread
From: Qu Wenruo @ 2020-06-16  0:38 UTC (permalink / raw)
  To: Greed Rong, dsterba, linux-btrfs

On 2020/6/15 8:50 PM, Greed Rong wrote:
> Does that mean at most about 2^20 subvolumes can be created in one btrfs filesystem?

Unfortunately that 1<<20 limit is shared across a lot of filesystems,
like overlayfs, ceph and btrfs.

Furthermore, it is a global pool, which means it's shared by all
btrfs filesystems.

So the share available to one btrfs is way smaller than 1<<20.

> 
> The snapshot delete service was stopped a few weeks ago. I think this
> is the reason why the id pool is exhausted.
> I will try to run it again and see if it works.

At least we're working on working around the limit by:
- Reducing known unnecessary users of the pool
  The reloc tree and data reloc tree don't need to use the pool.

- Preallocating the id to prevent the transaction abort
  So the user would get an error from the ioctl, rather than the whole
  fs being forced read-only later.

Thanks,
Qu

* Re: BTRFS: Transaction aborted (error -24)
  2020-06-15 12:50           ` Greed Rong
  2020-06-16  0:38             ` Qu Wenruo
@ 2020-06-18 12:34             ` David Sterba
  2020-06-19  4:04               ` Greed Rong
  1 sibling, 1 reply; 13+ messages in thread
From: David Sterba @ 2020-06-18 12:34 UTC (permalink / raw)
  To: Greed Rong; +Cc: dsterba, Qu Wenruo, linux-btrfs

On Mon, Jun 15, 2020 at 08:50:28PM +0800, Greed Rong wrote:
> Does that mean at most about 2^20 subvolumes can be created in one btrfs filesystem?

No, subvolume ids are assigned incrementally and the id space is 2^64, so
this shouldn't be a problem in practice.

> The snapshot delete service was stopped a few weeks ago. I think this
> is the reason why the id pool is exhausted.
> I will try to run it again and see if it works.

The patches to reclaim the anon bdevs faster are small enough to be
backported to older stable kernels, so you should be able to use them
eventually.

As a workaround, you can still delete the old subvolumes to get the
space in the pool back, perhaps at a slower rate, and wait until the
deleted subvolumes are cleaned. Unfortunately, there's no way to get the
number of used anon bdevs, which makes this harder.

* Re: BTRFS: Transaction aborted (error -24)
  2020-06-18 12:34             ` David Sterba
@ 2020-06-19  4:04               ` Greed Rong
  2020-06-19  4:41                 ` Qu Wenruo
  0 siblings, 1 reply; 13+ messages in thread
From: Greed Rong @ 2020-06-19  4:04 UTC
  To: dsterba, Greed Rong, Qu Wenruo, linux-btrfs

I have restarted the delete service, and unfortunately it happened again.
I am confused about the following:
    1. When will an anon bdev be allocated in btrfs?
    2. When will an anon bdev return to the pool?
    3. Are there any tools to find out how many subvolumes have been
deleted but not committed?

Thanks


* Re: BTRFS: Transaction aborted (error -24)
  2020-06-19  4:04               ` Greed Rong
@ 2020-06-19  4:41                 ` Qu Wenruo
  0 siblings, 0 replies; 13+ messages in thread
From: Qu Wenruo @ 2020-06-19  4:41 UTC
  To: Greed Rong, dsterba, linux-btrfs



On 2020/6/19 12:04 PM, Greed Rong wrote:
> I have restarted the delete service, and unfortunately it happened again.
> I am confused about the following:
>     1. When will an anon bdev be allocated in btrfs?

When a new subvolume is created, or when an existing subvolume is read
for the first time after mount.

>     2. When will an anon bdev return to the pool?

When a subvolume is unlinked (with the latest patch), when a subvolume
is completely deleted (current behavior), or when the whole btrfs
filesystem is unmounted.
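The lifecycle described above can be modeled as a toy allocator: an id is taken when a subvolume is created or first read after mount, and given back on unlink (patched kernels), on final cleanup (current behavior), or on unmount. This is a hypothetical Python model for reasoning about pool exhaustion, not kernel code; the class and method names are illustrative, and only the EMFILE result on exhaustion is meant to mirror the real behavior.

```python
import errno
from typing import Dict, Set


class AnonBdevPool:
    """Toy model of the anon bdev lifecycle (hypothetical, not a kernel API)."""

    def __init__(self, size: int, release_on_unlink: bool = False) -> None:
        self.free: Set[int] = set(range(1, size + 1))  # minor 0 is never used
        self.held: Dict[str, int] = {}                 # subvolume -> minor it pins
        self.release_on_unlink = release_on_unlink     # True models the newer patch

    def _alloc(self, subvol: str) -> int:
        if subvol in self.held:
            return self.held[subvol]
        if not self.free:
            # The kernel reports an exhausted pool as EMFILE (error -24).
            raise OSError(errno.EMFILE, "anon bdev pool exhausted")
        minor = min(self.free)
        self.free.remove(minor)
        self.held[subvol] = minor
        return minor

    def _release(self, subvol: str) -> None:
        minor = self.held.pop(subvol, None)
        if minor is not None:
            self.free.add(minor)

    def create(self, subvol: str) -> int:
        """A new subvolume or snapshot gets an id immediately."""
        return self._alloc(subvol)

    def first_read(self, subvol: str) -> int:
        """An existing subvolume gets an id when first read after mount."""
        return self._alloc(subvol)

    def unlink(self, subvol: str) -> None:
        """Deletion request: frees the id only with the newer patch."""
        if self.release_on_unlink:
            self._release(subvol)

    def cleaned(self, subvol: str) -> None:
        """Cleaner finished dropping the subvolume: current behavior frees here."""
        self._release(subvol)

    def unmount(self) -> None:
        """Unmounting the filesystem returns every id it holds."""
        for subvol in list(self.held):
            self._release(subvol)
```

With `release_on_unlink=False`, a subvolume that has been deleted but not yet cleaned still pins its id, which is how a backlog of uncleaned deletions can exhaust the pool.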


>     3. Are there any tools to find out how many subvolumes have been
> deleted but not committed?

I'm not sure, but you can always wait for all orphan subvolumes to be
completely deleted by using the "btrfs sub sync" command.
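For question 3, btrfs-progs has, to my knowledge, a `-d` option to `btrfs subvolume list` that lists deleted subvolumes not yet cleaned; verify it against `btrfs subvolume list --help` on your version. The sketch below counts those entries, assuming each one is printed on its own line starting with `ID `. The count is only a rough proxy for the cleanup backlog, not the number of anon bdevs in use, which (as David noted) cannot be queried directly. The function names are hypothetical.

```python
import subprocess


def count_uncleaned(output: str) -> int:
    """Count entries in `btrfs subvolume list -d` output.

    Assumes one subvolume per line, each line starting with "ID ".
    """
    return sum(1 for line in output.splitlines() if line.startswith("ID "))


def uncleaned_subvolumes(mountpoint: str) -> int:
    """Run `btrfs subvolume list -d` (typically needs root) and count entries."""
    result = subprocess.run(
        ["btrfs", "subvolume", "list", "-d", mountpoint],
        check=True, capture_output=True, text=True,
    )
    return count_uncleaned(result.stdout)
```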

Thanks,
Qu


Thread overview: 13+ messages
2020-06-11 10:29 BTRFS: Transaction aborted (error -24) Greed Rong
2020-06-11 11:20 ` David Sterba
2020-06-11 12:37   ` Qu Wenruo
2020-06-11 13:52     ` David Sterba
2020-06-12  3:15       ` Greed Rong
2020-06-12  6:41         ` Qu Wenruo
2020-06-12 17:13         ` David Sterba
2020-06-15 12:50           ` Greed Rong
2020-06-16  0:38             ` Qu Wenruo
2020-06-18 12:34             ` David Sterba
2020-06-19  4:04               ` Greed Rong
2020-06-19  4:41                 ` Qu Wenruo
2020-06-12  5:38       ` Qu Wenruo