* applications hang on a btrfs spanning two partitions
@ 2019-01-08 19:38 Florian Stecker
  2019-01-09  6:24 ` Nikolay Borisov
  0 siblings, 1 reply; 12+ messages in thread

From: Florian Stecker @ 2019-01-08 19:38 UTC (permalink / raw)
To: linux-btrfs

Hi everyone,

I extended the btrfs volume on my laptop by adding a second partition to
it which lies on the same SSD (using btrfs device add). Since I did
this, all kinds of applications regularly hang for up to 30 seconds. It
seems they are stuck in the fdatasync syscall. For example:

$ strace -tt -T gajim 2>&1 | grep fdatasync
[...]
11:36:31.112200 fdatasync(25) = 0 <0.006958>
11:36:32.147525 fdatasync(25) = 0 <0.008138>
11:36:32.156882 fdatasync(25) = 0 <0.006866>
11:36:32.165979 fdatasync(25) = 0 <0.011797>
11:36:32.178867 fdatasync(25) = 0 <23.636614>
11:36:55.827726 fdatasync(25) = 0 <0.009595>
11:36:55.838702 fdatasync(25) = 0 <0.007261>
11:36:55.850440 fdatasync(25) = 0 <0.006807>
11:36:55.858168 fdatasync(25) = 0 <0.006767>
[...]

File descriptor 25 here points to a file which is just ~90KB, so it
really shouldn't take that long.

Removing the second partition again resolves the problem. Does anyone
know about this issue? Is it related to btrfs? Or am I just doing
something wrong?
Best,
Florian

Some more info:

$ btrfs device usage /
/dev/sda2, ID: 2
   Device size:            52.16GiB
   Device slack:              0.00B
   Data,single:             1.00GiB
   Unallocated:            51.16GiB

/dev/sda8, ID: 1
   Device size:           174.92GiB
   Device slack:              0.00B
   Data,single:           168.91GiB
   Metadata,single:         3.01GiB
   System,single:           4.00MiB
   Unallocated:             3.00GiB

$ fdisk -l /dev/sda
Disk /dev/sda: 238.5 GiB, 256060514304 bytes, 500118192 sectors
Disk model: SAMSUNG SSD PM87
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: A48B5A25-AA84-4D3F-90DD-E8A4991BDF03

Device         Start       End   Sectors   Size Type
/dev/sda1       2048   1026047   1024000   500M EFI System
/dev/sda2    1026048 110422015 109395968  52.2G Linux filesystem
/dev/sda8  110422016 477263871 366841856 174.9G Linux filesystem
/dev/sda9  477263872 481458175   4194304     2G Linux swap

$ uname -a
Linux dell 4.20.0-arch1-1-ARCH #1 SMP PREEMPT Mon Dec 24 03:00:40 UTC
2018 x86_64 GNU/Linux

^ permalink raw reply [flat|nested] 12+ messages in thread
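[Editorial aside] Stalls like the one in the strace log above can be picked out mechanically instead of by eye. The following is a small sketch, assuming the strace -tt -T output format shown (where the trailing angle-bracketed number is the call duration in seconds); the sample lines are copied from the log, and the 0.1 s threshold is an arbitrary choice:

```shell
# Sample strace -tt -T lines, copied from the log above.
log='11:36:32.165979 fdatasync(25) = 0 <0.011797>
11:36:32.178867 fdatasync(25) = 0 <23.636614>
11:36:55.827726 fdatasync(25) = 0 <0.009595>'

# Split each line on "<" and ">"; field 2 is then the duration in
# seconds. Keep only the calls that took longer than 0.1 s.
slow=$(printf '%s\n' "$log" | awk -F'[<>]' '$2 + 0 > 0.1')
printf '%s\n' "$slow"
```

Run against a full capture (strace -tt -T -o trace.log gajim, then awk -F'[<>]' '$2+0 > 0.1' trace.log), this isolates the outliers without scanning the log by hand.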
* Re: applications hang on a btrfs spanning two partitions
  2019-01-08 19:38 applications hang on a btrfs spanning two partitions Florian Stecker
@ 2019-01-09  6:24 ` Nikolay Borisov
  2019-01-09  9:16   ` Florian Stecker
  0 siblings, 1 reply; 12+ messages in thread

From: Nikolay Borisov @ 2019-01-09  6:24 UTC (permalink / raw)
To: Florian Stecker, linux-btrfs

On 8.01.19 г. 21:38 ч., Florian Stecker wrote:
> Hi everyone,
>
> I extended the btrfs volume on my laptop by adding a second partition to
> it which lies on the same SSD (using btrfs device add). Since I did
> this, all kinds of applications regularly hang for up to 30 seconds. It
> seems they are stuck in the fdatasync syscall. For example:
[...]

Provide the output of echo w > /proc/sysrq-trigger when the hang occurs;
otherwise it's hard to figure out what's going on.
* Re: applications hang on a btrfs spanning two partitions
  2019-01-09  6:24 ` Nikolay Borisov
@ 2019-01-09  9:16   ` Florian Stecker
  2019-01-09 10:03     ` Nikolay Borisov
  0 siblings, 1 reply; 12+ messages in thread

From: Florian Stecker @ 2019-01-09  9:16 UTC (permalink / raw)
To: Nikolay Borisov, linux-btrfs

> Provide output of echo w > /proc/sysrq-trigger when the hang occurs
> otherwise it's hard to figure what's going on.

Here's one, again in gajim. This time, fdatasync() took "only" 2 seconds:

[42481.243491] sysrq: SysRq : Show Blocked State
[42481.243494]  task             PC stack   pid father
[42481.243566] gajim           D    0 15778  15774 0x00000083
[42481.243569] Call Trace:
[42481.243575]  ? __schedule+0x29b/0x8b0
[42481.243576]  ? bit_wait+0x50/0x50
[42481.243578]  schedule+0x32/0x90
[42481.243580]  io_schedule+0x12/0x40
[42481.243582]  bit_wait_io+0xd/0x50
[42481.243583]  __wait_on_bit+0x6c/0x80
[42481.243585]  out_of_line_wait_on_bit+0x91/0xb0
[42481.243587]  ? init_wait_var_entry+0x40/0x40
[42481.243605]  write_all_supers+0x418/0xa70 [btrfs]
[42481.243622]  btrfs_sync_log+0x695/0x910 [btrfs]
[42481.243625]  ? _raw_spin_lock_irqsave+0x25/0x50
[42481.243641]  ? btrfs_log_dentry_safe+0x54/0x70 [btrfs]
[42481.243655]  btrfs_sync_file+0x3a9/0x3d0 [btrfs]
[42481.243659]  do_fsync+0x38/0x70
[42481.243661]  __x64_sys_fdatasync+0x13/0x20
[42481.243663]  do_syscall_64+0x5b/0x170
[42481.243666]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[42481.243667] RIP: 0033:0x7fd4022f873f
[42481.243671] Code: Bad RIP value.
[42481.243672] RSP: 002b:00007ffd3710a300 EFLAGS: 00000293 ORIG_RAX: 000000000000004b
[42481.243674] RAX: ffffffffffffffda RBX: 0000000000000019 RCX: 00007fd4022f873f
[42481.243675] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000019
[42481.243675] RBP: 0000000000000000 R08: 000055d8d8649f68 R09: 00007ffd3710a320
[42481.243676] R10: 0000000000013000 R11: 0000000000000293 R12: 0000000000000000
[42481.243677] R13: 0000000000000000 R14: 000055d8d8363fa0 R15: 000055d8d8613040

On 1/9/19 7:24 AM, Nikolay Borisov wrote:
> On 8.01.19 г. 21:38 ч., Florian Stecker wrote:
[...]
* Re: applications hang on a btrfs spanning two partitions
  2019-01-09  9:16   ` Florian Stecker
@ 2019-01-09 10:03     ` Nikolay Borisov
  2019-01-09 20:10       ` Florian Stecker
  0 siblings, 1 reply; 12+ messages in thread

From: Nikolay Borisov @ 2019-01-09 10:03 UTC (permalink / raw)
To: Florian Stecker, linux-btrfs

On 9.01.19 г. 11:16 ч., Florian Stecker wrote:
>> Provide output of echo w > /proc/sysrq-trigger when the hang occurs
>> otherwise it's hard to figure what's going on.
>
> Here's one, again in gajim. This time, fdatasync() took "only" 2 seconds:
>
> [42481.243491] sysrq: SysRq : Show Blocked State
> [42481.243566] gajim           D    0 15778  15774 0x00000083
> [42481.243569] Call Trace:
> [42481.243575]  ? __schedule+0x29b/0x8b0
> [42481.243576]  ? bit_wait+0x50/0x50
> [42481.243578]  schedule+0x32/0x90
> [42481.243580]  io_schedule+0x12/0x40
> [42481.243582]  bit_wait_io+0xd/0x50
> [42481.243583]  __wait_on_bit+0x6c/0x80
> [42481.243585]  out_of_line_wait_on_bit+0x91/0xb0
> [42481.243587]  ? init_wait_var_entry+0x40/0x40
> [42481.243605]  write_all_supers+0x418/0xa70 [btrfs]
> [42481.243622]  btrfs_sync_log+0x695/0x910 [btrfs]
> [42481.243625]  ? _raw_spin_lock_irqsave+0x25/0x50
> [42481.243641]  ? btrfs_log_dentry_safe+0x54/0x70 [btrfs]
> [42481.243655]  btrfs_sync_file+0x3a9/0x3d0 [btrfs]
> [42481.243659]  do_fsync+0x38/0x70
> [42481.243661]  __x64_sys_fdatasync+0x13/0x20
> [42481.243663]  do_syscall_64+0x5b/0x170
> [42481.243666]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [42481.243667] RIP: 0033:0x7fd4022f873f
> [42481.243671] Code: Bad RIP value.
[...]

This shows that IO was sent to disk to write the super blocks following
an fsync, and it's waiting for that IO to finish. This seems like a
problem in the storage layer, i.e. IOs being stuck. Check your dmesg for
any errors.
* Re: applications hang on a btrfs spanning two partitions
  2019-01-09 10:03     ` Nikolay Borisov
@ 2019-01-09 20:10       ` Florian Stecker
  2019-01-12  2:12         ` Chris Murphy
  0 siblings, 1 reply; 12+ messages in thread

From: Florian Stecker @ 2019-01-09 20:10 UTC (permalink / raw)
To: Nikolay Borisov, linux-btrfs

On 1/9/19 11:03 AM, Nikolay Borisov wrote:
>> Here's one, again in gajim. This time, fdatasync() took "only" 2 seconds:
[...]
>
> This shows that IO was send to disk to write the supper blocks following
> an fsync and it's waiting for IO to finish. This seems like a problem in
> the storage layer, i.e IOs being stuck. Check your dmesg for any error.

There are no IO errors in dmesg. Also, I never had any problems with
this disk, SMART reports nothing, and also btrfs dev stats and btrfs
scrub say everything's ok.

I now found a way to reproduce this issue more reliably: if I just write
10KB of random data to a file and sync, this usually takes only a few
ms, but on my setup, if I do it 1000 times, about 10 of them will take
longer than 100ms, sometimes much longer:

$ for i in $(seq 0 1000); do dd if=/dev/urandom of=/home/stecker/test bs=10k count=1 conv=fdatasync 2>&1 && sleep 0.1; done | grep '([1-9][0-9]*\.|0\.[1-9])[0-9]* s'
10240 bytes (10 kB, 10 KiB) copied, 1.12436 s, 9.1 kB/s
10240 bytes (10 kB, 10 KiB) copied, 1.33179 s, 7.7 kB/s
10240 bytes (10 kB, 10 KiB) copied, 1.27658 s, 8.0 kB/s
10240 bytes (10 kB, 10 KiB) copied, 0.401769 s, 25.5 kB/s
10240 bytes (10 kB, 10 KiB) copied, 1.019 s, 10.0 kB/s
10240 bytes (10 kB, 10 KiB) copied, 1.95148 s, 5.2 kB/s
10240 bytes (10 kB, 10 KiB) copied, 1.48939 s, 6.9 kB/s
10240 bytes (10 kB, 10 KiB) copied, 1.9071 s, 5.4 kB/s
10240 bytes (10 kB, 10 KiB) copied, 1.90988 s, 5.4 kB/s
10240 bytes (10 kB, 10 KiB) copied, 0.845141 s, 12.1 kB/s
10240 bytes (10 kB, 10 KiB) copied, 0.184172 s, 55.6 kB/s

If I use the two partitions /dev/sda2 and /dev/sda8 not as part of a
single fs, but as separate btrfs filesystems, this does not happen; all
writes are fast. But this should not make a difference for the storage
layer, should it? I mean, it writes the superblocks to the exact same
positions on the disk.

By the way, thanks a lot for your help!

On 1/9/19 7:24 AM, Nikolay Borisov wrote:
> On 8.01.19 г. 21:38 ч., Florian Stecker wrote:
[...]
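[Editorial aside] The dd loop above can be reduced to a small self-contained probe. This sketch, which assumes GNU dd, writes a few 10 KiB blocks with conv=fdatasync to a temporary file (not the /home/stecker/test path from the thread) and collects the timing line dd prints for each synchronous write:

```shell
# Write 10 KiB three times, forcing an fdatasync after each write, and
# keep the "... bytes ... copied, X s, Y/s" line dd prints on stderr.
tmpfile=$(mktemp)
timings=$(for i in 1 2 3; do
    dd if=/dev/urandom of="$tmpfile" bs=10k count=1 conv=fdatasync 2>&1 |
        grep 'copied'
done)
rm -f "$tmpfile"
printf '%s\n' "$timings"
```

On a healthy SSD every line should report a few milliseconds; on a filesystem showing the problem in this thread, occasional much slower lines stand out immediately.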
* Re: applications hang on a btrfs spanning two partitions
  2019-01-09 20:10       ` Florian Stecker
@ 2019-01-12  2:12         ` Chris Murphy
  2019-01-12 10:19           ` Florian Stecker
  0 siblings, 1 reply; 12+ messages in thread

From: Chris Murphy @ 2019-01-12  2:12 UTC (permalink / raw)
To: Florian Stecker; +Cc: Nikolay Borisov, Btrfs BTRFS

On Wed, Jan 9, 2019 at 1:10 PM Florian Stecker <m19@florianstecker.de> wrote:
> On 1/9/19 11:03 AM, Nikolay Borisov wrote:
[...]
> > This shows that IO was send to disk to write the supper blocks following
> > an fsync and it's waiting for IO to finish. This seems like a problem in
> > the storage layer, i.e IOs being stuck. Check your dmesg for any error.
>
> There are no IO errors in dmesg. Also, I never had any problems with
> this disk, SMART reports nothing, and also btrfs dev stats and btrfs
> scrub say everything's ok.

What do you get for:

mount | grep btrfs
btrfs insp dump-s -f /dev/sda8

I ran in this same configuration for a long time, maybe 5 months, and
never ran into this problem. But it was with a much older kernel,
perhaps circa 4.8 era.

-- 
Chris Murphy
* Re: applications hang on a btrfs spanning two partitions
  2019-01-12  2:12         ` Chris Murphy
@ 2019-01-12 10:19           ` Florian Stecker
  2019-01-14  5:49             ` Duncan
  0 siblings, 1 reply; 12+ messages in thread

From: Florian Stecker @ 2019-01-12 10:19 UTC (permalink / raw)
To: Chris Murphy; +Cc: Nikolay Borisov, Btrfs BTRFS

I found out a few things in the meantime:

* My IO scheduler is mq-deadline by default. When I switch it to none,
  the problem disappears.
* What hangs is the call to wait_on_buffer inside wait_dev_supers,
  while waiting for superblock 0 of device 1 to be written. That is
  /dev/sda8, i.e. the one which physically lies behind /dev/sda2 but
  has the lower device id.

So it seems to me as if btrfs produces some strange sequence of writes
which confuses the scheduler and causes it to hang? Could this have to
do with the fact that the order of the devids is different from the
physical order of the partitions?

If you guys want me to, I can definitely put some printks etc. into my
kernel. I just know too little about the code to see what information
could be useful.

> What do you get for:
>
> mount | grep btrfs
> btrfs insp dump-s -f /dev/sda8

$ mount | grep btrfs
/dev/sda8 on / type btrfs (rw,relatime,ssd,space_cache,subvolid=5,subvol=/)

$ btrfs insp dump-s -f /dev/sda8
superblock: bytenr=65536, device=/dev/sda8
---------------------------------------------------------
csum_type		0 (crc32c)
csum_size		4
csum			0x5a2fbdf1 [match]
bytenr			65536
flags			0x1
			( WRITTEN )
magic			_BHRfS_M [match]
fsid			c4c0b512-00d3-42f2-a2e1-dcc62a2acd98
label
generation		575201
root			622264320
sys_array_size		97
chunk_root_generation	574575
root_level		1
chunk_root		241407393792
chunk_root_level	1
log_root		0
log_root_transid	0
log_root_level		0
total_bytes		243285360640
bytes_used		184232562688
sectorsize		4096
nodesize		16384
leafsize (deprecated)	16384
stripesize		4096
root_dir		6
num_devices		2
compat_flags		0x0
compat_ro_flags		0x0
incompat_flags		0x161
			( MIXED_BACKREF | BIG_METADATA | EXTENDED_IREF | SKINNY_METADATA )
cache_generation	575201
uuid_tree_generation	574411
dev_item.uuid		3e8d6ecb-a595-4c6a-aae8-e6a09e5b151d
dev_item.fsid		c4c0b512-00d3-42f2-a2e1-dcc62a2acd98 [match]
dev_item.type		0
dev_item.total_bytes	187823030272
dev_item.bytes_used	183523868672
dev_item.io_align	4096
dev_item.io_width	4096
dev_item.sector_size	4096
dev_item.devid		1
dev_item.dev_group	0
dev_item.seek_speed	0
dev_item.bandwidth	0
dev_item.generation	0
sys_chunk_array[2048]:
	item 0 key (FIRST_CHUNK_TREE CHUNK_ITEM 241407361024)
		length 33554432 owner 2 stripe_len 65536 type SYSTEM
		io_align 65536 io_width 65536 sector_size 4096
		num_stripes 1 sub_stripes 1
			stripe 0 devid 2 offset 1074790400
			dev_uuid d56afd3b-d89c-4e59-8850-7dedd84900b9
backup_roots[4]:
	backup 0:
		backup_tree_root:	618283008	gen: 575198	level: 1
		backup_chunk_root:	241407393792	gen: 574575	level: 1
		backup_extent_root:	616038400	gen: 575198	level: 2
		backup_fs_root:		593952768	gen: 575198	level: 2
		backup_dev_root:	487751680	gen: 575188	level: 0
		backup_csum_root:	594149376	gen: 575198	level: 2
		backup_total_bytes:	243285360640
		backup_bytes_used:	184231972864
		backup_num_devices:	2
	backup 1:
		backup_tree_root:	613908480	gen: 575199	level: 1
		backup_chunk_root:	241407393792	gen: 574575	level: 1
		backup_extent_root:	607731712	gen: 575199	level: 2
		backup_fs_root:		622444544	gen: 575200	level: 2
		backup_dev_root:	487751680	gen: 575188	level: 0
		backup_csum_root:	603389952	gen: 575199	level: 2
		backup_total_bytes:	243285360640
		backup_bytes_used:	184232038400
		backup_num_devices:	2
	backup 2:
		backup_tree_root:	628539392	gen: 575200	level: 1
		backup_chunk_root:	241407393792	gen: 574575	level: 1
		backup_extent_root:	616103936	gen: 575200	level: 2
		backup_fs_root:		622444544	gen: 575200	level: 2
		backup_dev_root:	487751680	gen: 575188	level: 0
		backup_csum_root:	616890368	gen: 575200	level: 2
		backup_total_bytes:	243285360640
		backup_bytes_used:	184232468480
		backup_num_devices:	2
	backup 3:
		backup_tree_root:	622264320	gen: 575201	level: 1
		backup_chunk_root:	241407393792	gen: 574575	level: 1
		backup_extent_root:	617791488	gen: 575201	level: 2
		backup_fs_root:		615432192	gen: 575201	level: 2
		backup_dev_root:	487751680	gen: 575188	level: 0
		backup_csum_root:	615841792	gen: 575201	level: 2
		backup_total_bytes:	243285360640
		backup_bytes_used:	184232562688
		backup_num_devices:	2
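[Editorial aside] Since switching the scheduler from mq-deadline to none made the stalls disappear, the active scheduler is the first thing to compare across machines. A sketch for listing it (which block devices exist will vary; the bracketed name in each scheduler file is the one currently in use):

```shell
# List the active IO scheduler for every block device that exposes one.
report=$(
    for q in /sys/block/*/queue/scheduler; do
        [ -e "$q" ] || continue
        dev=${q#/sys/block/}
        printf '%s: %s\n' "${dev%/queue/scheduler}" "$(cat "$q")"
    done
)
[ -n "$report" ] || report='no block devices with a scheduler found'
printf '%s\n' "$report"
```

Switching is then a root-only write, e.g. echo none > /sys/block/sda/queue/scheduler; as in the thread, that is a diagnostic data point rather than a fix.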
* Re: applications hang on a btrfs spanning two partitions
  2019-01-12 10:19           ` Florian Stecker
@ 2019-01-14  5:49             ` Duncan
  2019-01-14 11:35               ` Marc Joliet
  0 siblings, 1 reply; 12+ messages in thread

From: Duncan @ 2019-01-14  5:49 UTC (permalink / raw)
To: linux-btrfs

Florian Stecker posted on Sat, 12 Jan 2019 11:19:14 +0100 as excerpted:

> $ mount | grep btrfs
> /dev/sda8 on / type btrfs
> (rw,relatime,ssd,space_cache,subvolid=5,subvol=/)

Unlikely to be apropos to the problem at hand, but FYI...

Unless you have a known reason not to[1], running noatime with btrfs
instead of the kernel-default relatime is strongly recommended,
especially if you use btrfs snapshotting on the filesystem.

The reasoning is that even tho relatime reduces the default access-time
updates to once a day, it still likely-unnecessarily turns otherwise
read-only operations into read-write operations, and atimes are
metadata, which btrfs always COWs (copy-on-writes), meaning atime
updates can trigger cascading metadata block-writes and much larger
than anticipated[2] write-amplification, potentially hurting
performance, yes, even for relatime, depending on your usage.

In addition, if you're using snapshotting and not using noatime, it can
easily happen that a large portion of the change between one snapshot
and the next is simply atime updates, thus making the space referenced
exclusively by individual affected snapshots far larger than it would
otherwise be.

---
[1] mutt is AFAIK the only widely used application that still depends
on atime updates, and it only does so in certain modes, not with
mbox-format mailboxes, for instance. So unless you're using it, or your
backup solution happens to use atime, chances are quite high that
noatime won't disrupt your usage at all.

[2] Larger than anticipated write-amplification: especially when you
/thought/ you were only reading the files and hadn't considered the
atime update that read could trigger, thus effectively generating
infinite write amplification, because the read access did an atime
update and turned what otherwise wouldn't be a write operation at all
into one!

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
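[Editorial aside] Acting on this advice is a one-word change in the mount options. An illustrative fstab line for the root filesystem discussed in this thread (the UUID is a placeholder; the remaining options mirror the mount output Florian posted):

```
UUID=<filesystem-uuid>  /  btrfs  rw,noatime,ssd,space_cache,subvolid=5,subvol=/  0  0
```

The change can also be tried without editing fstab via mount -o remount,noatime / (as root); it reverts on the next boot.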
* Re: applications hang on a btrfs spanning two partitions
  2019-01-14  5:49             ` Duncan
@ 2019-01-14 11:35               ` Marc Joliet
  2019-01-15  8:33                 ` Duncan
  0 siblings, 1 reply; 12+ messages in thread

From: Marc Joliet @ 2019-01-14 11:35 UTC (permalink / raw)
To: linux-btrfs

Am Montag, 14. Januar 2019, 06:49:58 CET schrieb Duncan:
[...]
> Unless you have a known reason not to[1], running noatime with btrfs
> instead of the kernel-default relatime is strongly recommended,
> especially if you use btrfs snapshotting on the filesystem.
[...]

The one reason I decided to remove noatime from my systems' mount
options is because I use systemd-tmpfiles to clean up cache
directories, for which it is necessary to leave atime intact (since
caches are often Write Once Read Many).

-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who
know we don't" - Bjarne Stroustrup
* Re: applications hang on a btrfs spanning two partitions 2019-01-14 11:35 ` Marc Joliet @ 2019-01-15 8:33 ` Duncan 2019-01-15 22:40 ` Marc Joliet 0 siblings, 1 reply; 12+ messages in thread From: Duncan @ 2019-01-15 8:33 UTC (permalink / raw) To: linux-btrfs Marc Joliet posted on Mon, 14 Jan 2019 12:35:05 +0100 as excerpted: > Am Montag, 14. Januar 2019, 06:49:58 CET schrieb Duncan: > [...] >> Unless you have a known reason not to[1], running noatime with btrfs >> instead of the kernel-default relatime is strongly recommended, >> especially if you use btrfs snapshotting on the filesystem. > [...] > > The one reason I decided to remove noatime from my systems' mount > options is because I use systemd-tmpfiles to clean up cache directories, > for which it is necessary to leave atime intact (since caches are often > Write Once Read Many). Thanks for the reply. I hadn't really thought of that use, but it makes sense... FWIW systemd here too, but I suppose it depends on what's being cached and particularly on the expense of recreation of cached data. I actually have many of my caches (user/browser caches, etc) on tmpfs and reboot several times a week, so much of the cached data is only trivially cached as it's trivial to recreate/redownload. OTOH, running gentoo, my ccache and binpkg cache are seriously CPU-cycle expensive to recreate, so you can bet those are _not_ tmpfs, but OTTH, they're not managed by systemd-tmpfiles either. (Ccache manages its own cache and together with the source-tarballs cache and git-managed repo trees along with binpkgs, I have a dedicated packages btrfs containing all of them, so I eclean binpkgs and distfiles whenever the 24-gigs space (48-gig total, 24-gig each on pair-device btrfs raid1) gets too close to full, then btrfs balance with -dusage= to reclaim partial chunks to unallocated.) 
Anyway, if you're not regularly snapshotting, relatime is reasonably fine, tho I'd still keep the atime effects in mind and switch to noatime if you end up in a recovery situation that requires writable mounting. (Losing a device in btrfs raid1 and mounting writable in order to replace it and rebalance comes to mind as one example of a writable-mount recovery scenario where noatime until full replace/rebalance/scrub completion would prevent unnecessary writes until the raid1 is safely complete and scrub-verified again.) -- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman ^ permalink raw reply [flat|nested] 12+ messages in thread
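[Editorial note: the switch to noatime that Duncan recommends for recovery can be applied to a live mount without touching fstab. The mountpoint below is an assumption for illustration; the commands need root.]

```
# Stop atime updates on an already-mounted btrfs
mount -o remount,noatime /mnt/recovery

# For the lost-device raid1 scenario, noatime can be combined with the
# degraded option when first mounting the surviving device:
# mount -o degraded,noatime /dev/sda8 /mnt/recovery
```

noatime is a generic VFS mount option and degraded is btrfs-specific; a remount takes effect for all subsequent accesses.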
* Re: applications hang on a btrfs spanning two partitions 2019-01-15 8:33 ` Duncan @ 2019-01-15 22:40 ` Marc Joliet 2019-01-17 11:15 ` Duncan 0 siblings, 1 reply; 12+ messages in thread From: Marc Joliet @ 2019-01-15 22:40 UTC (permalink / raw) To: linux-btrfs [-- Attachment #1: Type: text/plain, Size: 4466 bytes --] On Tuesday, 15 January 2019, 09:33:40 CET, Duncan wrote: > Marc Joliet posted on Mon, 14 Jan 2019 12:35:05 +0100 as excerpted: > > On Monday, 14 January 2019, 06:49:58 CET, Duncan wrote: > > [...] > > > >> Unless you have a known reason not to[1], running noatime with btrfs > >> instead of the kernel-default relatime is strongly recommended, > >> especially if you use btrfs snapshotting on the filesystem. > > > > [...] > > > > The one reason I decided to remove noatime from my systems' mount > > options is because I use systemd-tmpfiles to clean up cache directories, > > for which it is necessary to leave atime intact (since caches are often > > Write Once Read Many). > > Thanks for the reply. I hadn't really thought of that use, but it makes > sense... Specifically, I mean ~/.cache/ (plus a separate entry for ~/.cache/thumbnails/, since I want thumbnails to live longer):

% grep \^[\^#] .config/user-tmpfiles.d/*.conf
.config/user-tmpfiles.d/clean.conf:d %C/thumbnails - - - 730d -
.config/user-tmpfiles.d/subvolumes.conf:q %C 0700 - - 60d -
.config/user-tmpfiles.d/subvolumes.conf:q %h/tmp 0700 - - - -

I don't use qgroups now, but will probably in the future, hence the use of "q" instead of "v". ~/tmp/ is just a scratch space that I don't want snapshotted. I haven't bothered configuring /var/cache/, other than making it a subvolume so it's not a part of my snapshots (overriding the systemd default of creating it as a directory). It appears to me that it's managed just fine by pre-existing tmpfiles.d snippets and by the applications that use it cleaning up after themselves (except for portage, see below).
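[Editorial note: the atime-driven aging these entries rely on can be sketched in shell. This is a simplification — systemd-tmpfiles actually considers atime, mtime, and ctime, and handles subdirectories, exclusions, and open files; GNU touch and find are assumed.]

```shell
# Create a throwaway "cache" with one fresh and one stale file
cache=$(mktemp -d)
touch "$cache/fresh.bin" "$cache/stale.bin"

# Push stale.bin's access time 100 days into the past
touch -a -d "100 days ago" "$cache/stale.bin"

# Delete files not read in the last 60 days, like a 60d age field
find "$cache" -type f -atime +60 -delete

ls "$cache"   # only fresh.bin remains
```

This is why leaving atime intact matters for Marc's setup: with noatime, reads never refresh the timestamp, and Write Once Read Many cache entries would age out even while in active use.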
> FWIW systemd here too, but I suppose it depends on what's being cached > and particularly on the expense of recreation of cached data. I actually > have many of my caches (user/browser caches, etc) on tmpfs and reboot > several times a week, so much of the cached data is only trivially cached > as it's trivial to recreate/redownload. While that sort of tmpfs hackery is definitely cool, my system is, despite its age, fast enough for me that I don't want to bother with that (plus I like my 8 GB of RAM to be used just for applications and whatever Linux decides to cache in RAM). Also, modern SSDs live long enough that I'm not worried about wearing them out through my daily usage (which IIRC was a major reason for you to do things that way). > OTOH, running gentoo, my ccache and binpkg cache are seriously CPU-cycle > expensive to recreate, so you can bet those are _not_ tmpfs, but OTTH, > they're not managed by systemd-tmpfiles either. (Ccache manages its own > cache and together with the source-tarballs cache and git-managed repo > trees along with binpkgs, I have a dedicated packages btrfs containing > all of them, so I eclean binpkgs and distfiles whenever the 24-gigs space > (48-gig total, 24-gig each on pair-device btrfs raid1) gets too close to > full, then btrfs balance with -dusage= to reclaim partial chunks to > unallocated.) For distfiles I just have a weekly systemd timer that runs "eclean-dist -d" (I stopped using the buildpkg feature, so no eclean-pkg), and have moved both $DISTDIR and $PKGDIR to their future default locations in /var/cache/. (They used to reside on my desktop's HDD RAID1 as distinct subvolumes, but I recently bought a larger SSD, so I set up the above and got rid of two fstab entries.) > Anyway, if you're not regularly snapshotting, relatime is reasonably > fine, Personally, I don't notice the difference between noatime and relatime in day-to-day usage (perhaps I just don't snapshot often enough).
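[Editorial note: the weekly timer Marc mentions could be a unit pair along these lines. Unit names and the install path are assumptions; only the "eclean-dist -d" invocation is from his mail.]

```
# /etc/systemd/system/eclean-dist.service
[Unit]
Description=Clean outdated Gentoo distfiles

[Service]
Type=oneshot
ExecStart=/usr/bin/eclean-dist -d

# /etc/systemd/system/eclean-dist.timer
[Unit]
Description=Run eclean-dist weekly

[Timer]
OnCalendar=weekly
Persistent=true

[Install]
WantedBy=timers.target
```

Enabled with "systemctl enable --now eclean-dist.timer"; Persistent=true makes up a run missed while the machine was powered off.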
> tho I'd still keep the atime effects in mind and switch to noatime > if you end up in a recovery situation that requires writable mounting. > (Losing a device in btrfs raid1 and mounting writable in order to > replace it and rebalance comes to mind as one example of a writable-mount > recovery scenario where noatime until full replace/rebalance/scrub > completion would prevent unnecessary writes until the raid1 is safely > complete and scrub-verified again.) That all makes sense.  I was going to argue that I can't imagine randomly reading files in a recovery situation, but eventually realized that "ls" would be enough to trigger a directory atime update.  So yeah, one should keep the above in mind. -- Marc Joliet -- "People who think they know everything really annoy those of us who know we don't" - Bjarne Stroustrup [-- Attachment #2: This is a digitally signed message part. --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: applications hang on a btrfs spanning two partitions 2019-01-15 22:40 ` Marc Joliet @ 2019-01-17 11:15 ` Duncan 0 siblings, 0 replies; 12+ messages in thread From: Duncan @ 2019-01-17 11:15 UTC (permalink / raw) To: linux-btrfs Marc Joliet posted on Tue, 15 Jan 2019 23:40:18 +0100 as excerpted: > On Tuesday, 15 January 2019, 09:33:40 CET, Duncan wrote: >> Marc Joliet posted on Mon, 14 Jan 2019 12:35:05 +0100 as excerpted: >> > On Monday, 14 January 2019, 06:49:58 CET, Duncan wrote: >> > >> >> ... noatime ... >> > >> > The one reason I decided to remove noatime from my systems' mount >> > options is because I use systemd-tmpfiles to clean up cache >> > directories, for which it is necessary to leave atime intact >> > (since caches are often Write Once Read Many). >> >> Thanks for the reply.  I hadn't really thought of that use, but it >> makes sense... I really enjoy these "tips" subthreads.  As I said I hadn't really thought of that use, and seeing and understanding other people's solutions helps when I later find reason to review/change my own. =:^) One example is an ssd brand reliability discussion from a couple years ago.  I had the main system on ssds then and wasn't planning on an immediate upgrade, but later on, I got tired of the media partition and a main system backup being on slow spinning rust, and dug out that ssd discussion to help me decide what to buy.  (Samsung 1 TB evo 850s, FWIW.) > Specifically, I mean ~/.cache/ (plus a separate entry for ~/.cache/ > thumbnails/, since I want thumbnails to live longer): Here, ~/.cache -> tmp/cache/ and ~/tmp -> /tmp/tmp-$USER/, plus XDG_CACHE_HOME=$HOME/tmp/cache/, with /tmp being tmpfs.  So as I said, user cache is on tmpfs. Thumbnails... I actually did an experiment with the .thumbnails backed up elsewhere and empty, and found that with my ssds anyway, rethumbnailing was close enough to having them cached that it didn't really matter to my visual browsing experience.
So not only do I not mind thumbnails being on tmpfs, I actually have gwenview, my primary image browser, set to delete its thumbnails dir on close. > I haven't bothered configuring /var/cache/, other than making it a > subvolume so it's not a part of my snapshots (overriding the systemd > default of creating it as a directory).  It appears to me that it's > managed just fine by pre-existing tmpfiles.d snippets and by the > applications that use it cleaning up after themselves (except for > portage, see below). Here, /var/cache/ is on /, which remains mounted read-only by default. The only things using it are package-updates related, and I obviously have to mount / rw for package updates, so it works fine.  (My sync script mounts the dedicated packages filesystem containing the repos, ccache, distdir, and binpkgs, and remounts / rw, and that's the first thing I run doing an update, so I don't even have to worry about doing the mounts manually.) >> FWIW systemd here too, but I suppose it depends on what's being cached >> and particularly on the expense of recreation of cached data.  I >> actually have many of my caches (user/browser caches, etc) on tmpfs and >> reboot several times a week, so much of the cached data is only >> trivially cached as it's trivial to recreate/redownload. > > While that sort of tmpfs hackery is definitely cool, my system is, > despite its age, fast enough for me that I don't want to bother with > that (plus I like my 8 GB of RAM to be used just for applications and > whatever Linux decides to cache in RAM).  Also, modern SSDs live long > enough that I'm not worried about wearing them out through my daily > usage (which IIRC was a major reason for you to do things that way). 16 gigs RAM here, and except for building chromium (in tmpfs), I seldom fill it even with cache -- most of the time several gigs remain entirely empty.
With 8 gig I'd obviously have to worry a bit more about what I put in tmpfs, but given that I have the RAM space, I might as well use it. When I set up this system I was upgrading from a 4-core (original 2-socket dual-core 3-digit Opterons, purchased in 2003 and ran until the caps started dying in 2011), this system being a 6-core fx-series, and based on the experience with the quad-core, I figured 12 gig RAM for the 6-core.  But with pairs of RAM sticks for dual-channel, powers of two worked better, so it was 8 gig or 16 gig.  And given that I had worked with 8 gig on the quad-core, I knew that would be OK, but 12 gig would mean less cache dumping, so 16 gig it was. And my estimate was right on.  Since 2011, I've typically run up to ~12 gigs RAM used including cache, leaving ~4 gigs of the 16 entirely unused most of the time, tho I do use the full 16 gig sometimes when doing updates, since I have PORTAGE_TMPDIR set to tmpfs. Of course since my purchase in 2011 I've upgraded to SSDs and RAM-based storage cache isn't as important as it was back on spinning rust, so for my routine usage 8 gig RAM with ssds would be just fine, today. But building chromium on tmpfs is the exception.  Until recently I was running firefox, but for various reasons including firefox upstream requiring pulse-audio now so I can't just run upstream firefox binaries, and gentoo's firefox updates unfortunately sometimes being uncomfortably late for a security-minded user aware that their primary browser is the single most security-exposed application they run, and often build or run problems after gentoo /did/ have a firefox build, making reliably running a secure-as-possible firefox even *more* of a problem, a few months ago I switched to chromium.  And chromium is over a half-gig of compressed sources that expands to several gigs of build dir.
Put that in tmpfs along with the memory requirements of a multi-threaded build, with USE=jumbo-build and a couple gigs of other stuff (an X/kde-plasma session, building in a konsole window, often with chromium and minitube running) in memory too, and... That 16 gig RAM isn't enough for that sort of chromium build. =:^( So for the first time on the ssds, I reconfigured and rebuilt the kernel with swap support, and added a pair of 16-gig each swap partitions on the ssds, for now 16 gig RAM and 32 gig swap.  With the parallel-jobs cut down slightly via a package.env setting to better control memory usage, to -j7 from the normal -j8, and with PORTAGE_TMPDIR still pointed at tmpfs, I run about 16 gig into swap building chromium now.  So for that I could now use 32 gig of RAM. Meanwhile, it's 2019, and this 2011 system's starting to feel a bit dated in other ways too, now, and I'm already at the ~8 years my last system lasted, so I'm thinking about upgrading.  I've upgraded to SSDs and to big-screen monitors (a 65-inch/165cm 4K TV as primary) on this system, but I've not done the CPU or memory upgrades on it that I did on the last one, and having to enable swap to build chromium just seems so last century. So I'm thinking about upgrading later this year, probably to a zen-2-based system with hardware spectre mitigations.  And I want at least 32-gig RAM when I do, depending on the number of cores/threads.  I'm figuring 4-gig/thread now, 4-core/8-thread minimum, which would be the 32-gig.  But 8-core/16-thread, 64-gig RAM, would be nice. But I'm moving this spring and am busy with that first.  When that's done and I'm settled in the new place I'll see what my financials look like and go from there. >> OTOH, running gentoo, my ccache and binpkg cache are seriously >> CPU-cycle expensive to recreate, so you can bet those are _not_ tmpfs, >> but OTTH, they're not managed by systemd-tmpfiles either.
>> (Ccache manages its own cache and together with the source-tarballs cache and >> git-managed repo trees along with binpkgs, I have a dedicated packages >> btrfs containing all of them, so I eclean binpkgs and distfiles >> whenever the 24-gigs space (48-gig total, 24-gig each on pair-device >> btrfs raid1) gets too close to full, then btrfs balance with -dusage= >> to reclaim partial chunks to unallocated.) > For distfiles I just have a weekly systemd timer that runs "eclean-dist > -d" (I stopped using the buildpkg feature, so no eclean-pkg), and have > moved both $DISTDIR and $PKGDIR to their future default locations in > /var/cache/.  (They used to reside on my desktop's HDD RAID1 as distinct > subvolumes, but I recently bought a larger SSD, so I set up the above > and got rid of two fstab entries.) I like short paths.  So my packages filesystem mountpoint is /p, with /p/gentoo and /p/kde being my main repos, DISTDIR=/p/src, PKGDIR=/p/pkw (w=workstation, back when I had my 32-bit netbook and 32-bit chroot build image on the workstation too, I had its packages in pkn, IIRC), /p/linux for the linux git tree, /p/kpatch for local kernel patches, /p/cc for ccache, and /p/initramfs for my (dracut-generated) initramfs. And FWIW, /h is the home mountpoint, /lg the log mountpoint (with /var/log -> /lg), /l the system-local dir (with /var/local -> /l) on /, /mnt for auxiliary mounts, /bk the root-backup mountpoint, etc. You stopped using binpkgs?  I can't imagine doing that.  Not only does it make the occasional downgrade easier, older binpkgs come in handy for checking whether a file location moved in recent versions, looking up default configs and seeing how they've changed, checking the dates on them to know when I was running version X or whether I upgraded package Y before or after package Z, etc.
Of course I could use btrfs snapshotting for most of that and could get the other info in other ways, but I had this setup working and tested long before btrfs, and it seems less risky and easier to quantify and manage than btrfs snapshotting.  But surely that's because I /did/ have it up, running and tested, before btrfs, so it's old hat to me now.  If I were starting with it now, I imagine I might well find the btrfs snapshotting thing simpler to manage, and covering a broader use-case too. >> tho I'd still keep the atime effects in mind and switch to noatime if >> you end up in a recovery situation that requires writable mounting. >> (Losing a device in btrfs raid1 and mounting writable in order to >> replace it and rebalance comes to mind as one example of a >> writable-mount recovery scenario where noatime until full >> replace/rebalance/scrub completion would prevent unnecessary writes >> until the raid1 is safely complete and scrub-verified again.) > That all makes sense.  I was going to argue that I can't imagine > randomly reading files in a recovery situation, but eventually realized > that "ls" would be enough to trigger a directory atime update.  So yeah, > one should keep the above in mind. Not just ls, etc, either.  Consider manpage access, etc, as well.  Plus of course any executable binaries you run, the libs they load, scripts... If atime's on, all those otherwise read-only accesses will trigger atime-update writes, and with btrfs, updating that bit of metadata copies and writes the entire updated metadata block, triggering an update and thus a COW of the metadata block tracking the one just written... all the way up the metadata tree. In a recovery situation where every write is an additional risk, that's a lot of additional risk, all for not-so-necessary atime updates! -- Duncan - List replies preferred.  No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master."
Richard Stallman ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2019-01-17 11:18 UTC | newest] Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-01-08 19:38 applications hang on a btrfs spanning two partitions Florian Stecker 2019-01-09 6:24 ` Nikolay Borisov 2019-01-09 9:16 ` Florian Stecker 2019-01-09 10:03 ` Nikolay Borisov 2019-01-09 20:10 ` Florian Stecker 2019-01-12 2:12 ` Chris Murphy 2019-01-12 10:19 ` Florian Stecker 2019-01-14 5:49 ` Duncan 2019-01-14 11:35 ` Marc Joliet 2019-01-15 8:33 ` Duncan 2019-01-15 22:40 ` Marc Joliet 2019-01-17 11:15 ` Duncan
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).