Re: Bcache still unstable for me (memory problems)

* Re: Bcache still unstable for me (memory problems)
@ 2016-03-08 23:04 Richard Bade
  2016-03-09  3:51 ` Eric Wheeler
  0 siblings, 1 reply; 20+ messages in thread
From: Richard Bade @ 2016-03-08 23:04 UTC (permalink / raw)
  To: linux-bcache

Hi Guys,
I've also seen this issue when registering a device.
[Fri Feb  5 14:06:59 2016] bcache: register_cache() error opening
dm-6: cannot allocate memory

For me the situation happens because I'm running Ceph OSD's on the
host. These are using all the available memory and for some reason the
register cannot allocate memory.
I've reproduced this several times, but basically if I register on a
system that has been up for a while and therefore using all it's RAM
this will happen.
Here's two specific time's it's happened:
when creating using "make-bcache --cache /dev/vg-raid0/lv-ceph_j19
--bdev /dev/sdf1". It failed due to xfs still remaining on /dev/sdf1.
Then run wipefs and "make-bcache --cache /dev/vg-raid0/lv-ceph_j19
--bdev /dev/sdf1" again. CRASH.
when manually creating a cache device using "make-bcache --cache
/dev/vg-raid0/lv-bcache". The cache set didn't show up in
/sys/fs/bcache. echo "/dev/vg-raid0/lv-bcache" >
/sys/fs/bcache/register. CRASH.
I was able to work around this issue by stopping the osd's before
registering and therefore freeing up memory.
Because I am able to reproduce this relatively easily, I'm happy to
test any patches.
Apart from the register crash, the cache sets have been up and stable
for over 1 month.

Regards,
Richard

On Mon, 7 Mar 2016, Marc MERLIN wrote:

> I started a btrfs copy from a bcached device to another one (both are in
> the 5 to 15TB range).
> A minute later, my system that had been up for several days, crashed.
>
> Ultimately, it crashed on this:
> bcache: register_cache() error opening sdl2: cannot allocate memory
> BUG: unable to handle kernel NULL pointer dereference at 00000000000009b8
> IP: [<ffffffffc0516e8d>] cache_set_flush+0x102/0x15c [bcache]

Strange about memory allocation issues.  Do you have
/proc/sys/vm/min_free_kbytes set to something like $((256*1024)) ? Is this
a multi-socket machine with all memory plugged into only one CPU?

I'm curious though, why was registration called a second time? Was the
drive external?  Could udev be re-registering the device?

You might find where the registration is being done and prevent it from
running automatically.  At least that might solve the re-registration
problem.

As for the memory allocation issue, the backtrace indicates that this is a
registration-time problem, not a runtime issue.  I'm guessing it is one of
the threads attempting to proceed after a memory allocation error similar
to the writeback thread issue you had last time which was fixed by adding
some locking around the initialization.

I'll look and see if I can come up with a reasonable patch to cleanup the
allocator or gc thread under OOM conditions at registration time.

-Eric

>
> I'm not sure why it tried to register sdl2 then when it was already
> registered from boot. Did it time out and had to be registered again?
>
> At this point, things with bcache have been bad enough, that I'm
> considering removing it. My system has been very unstable with it :(
> Or maybe I can just remove the caching devices, leave the backing ones,
> and run without cache for now, giving me the option to re-add the cache later.
>
> I actually just added 16GB of RAM to the server after adding bcache, and
> clearly going from 8 to 24GB made no difference. I'm wondering if there
> is a memory leak somewhere?
>
> Just before the system crashed, memory looked like this, which isn't bad:
>              total       used       free     shared    buffers     cached
> Mem:      24392440   24241440     151000          0         88   20116748
> -/+ buffers/cache:    4124604   20267836
> Swap:     15616764     244452   15372312
>
> I'm using btrfs on top of dmcrypt on top of bcache.
> btrfs on top of dmcrypt was stable, I had been using it for years. Adding bcache is what
> caused all my recent problems to start.
>
> Any ideas?
>
> Thanks,
> Marc
>
>
> [290569.458707] Process accounting resumed
> [290623.673871] bcache-register: page allocation failure: order:7, mode:0x24080c0
> [290623.697143] CPU: 0 PID: 1284 Comm: bcache-register Tainted: G        W       4.4.2-amd64-i915-volpreempt-20160214bc3 #5
> [290623.732143] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
> [290623.762428]  0000000000000000 ffff88001df7bb08 ffffffff8134ae0a 0000000000000001
> [290623.787470]  ffff88001df7bba0 ffffffff81124ab6 024080c01df7bc44 024080c000000040
> [290623.811775]  0000000700000001 0000000000000007 0000000000000007 0000000000000040
> [290623.836443] Call Trace:
> [290623.846280]  [<ffffffff8134ae0a>] dump_stack+0x44/0x55
> [290623.864123]  [<ffffffff81124ab6>] warn_alloc_failed+0x114/0x12c
> [290623.883425]  [<ffffffff811274b8>] __alloc_pages_nodemask+0x7cb/0x84c
> [290623.904159]  [<ffffffff8107df80>] ? try_to_wake_up+0x1d9/0x1eb
> [290623.923173]  [<ffffffff8115f6d7>] alloc_pages_current+0xa9/0xcd
> [290623.942435]  [<ffffffff8112377e>] __get_free_pages+0xe/0x3c
> [290623.960621]  [<ffffffffc05192f0>] bch_cache_set_alloc+0x3c7/0x479 [bcache]
> [290623.983825]  [<ffffffffc051a5e3>] register_bcache+0x1241/0x1452 [bcache]
> [290624.005456]  [<ffffffff8134c853>] kobj_attr_store+0x10/0x1a
> [290624.023729]  [<ffffffff811df9df>] sysfs_kf_write+0x39/0x3b
> [290624.041739]  [<ffffffff811df2e1>] kernfs_fop_write+0xed/0x130
> [290624.060394]  [<ffffffff8117d97d>] __vfs_write+0x26/0xa5
> [290624.077452]  [<ffffffff810b2969>] ? current_kernel_time64+0x10/0x36
> [290624.097921]  [<ffffffff812d6050>] ? security_file_permission+0x3b/0x42
> [290624.119197]  [<ffffffff810915ea>] ? percpu_down_read+0x12/0x41
> [290624.138945]  [<ffffffff8117fb61>] ? __sb_start_write+0x2b/0x48
> [290624.158291]  [<ffffffff8117dfe8>] vfs_write+0x9d/0xe8
> [290624.174919]  [<ffffffff8117e7bd>] SyS_write+0x4d/0x78
> [290624.191455]  [<ffffffff810039c3>] do_fast_syscall_32+0xb3/0xf3
> [290624.210331]  [<ffffffff816e3e32>] sysenter_flags_fixed+0x8/0x12
> [290624.229355] Mem-Info:
> [290624.237432] active_anon:607145 inactive_anon:208571 isolated_anon:0
> [290624.237432]  active_file:2103407 inactive_file:1565209 isolated_file:0
> [290624.237432]  unevictable:1307 dirty:188846 writeback:138775 unstable:0
> [290624.237432]  slab_reclaimable:180112 slab_unreclaimable:73373
> [290624.237432]  mapped:419640 shmem:466581 pagetables:5778 bounce:0
> [290624.237432]  free:988612 free_pcp:2130 free_cma:14
> [290624.351715] Node 0 DMA free:15884kB min:12kB low:12kB high:16kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15976kB managed:15892kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
> [290624.483875] lowmem_reserve[]: 0 3201 23800 23800
> [290624.500998] Node 0 DMA32 free:581000kB min:2648kB low:3308kB high:3972kB active_anon:229220kB inactive_anon:248156kB active_file:1102468kB inactive_file:832264kB unevictable:948kB isolated(anon):0kB isolated(file):68kB present:3362068kB managed:3283032kB mlocked:68719477684kB dirty:94316kB writeback:65996kB mapped:253400kB shmem:282168kB slab_reclaimable:96344kB slab_unreclaimable:41308kB kernel_stack:2480kB pagetables:2952kB unstable:0kB bounce:0kB free_pcp:4008kB local_pcp:140kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
> [290624.655741] lowmem_reserve[]: 0 0 20599 20599
> [290624.674232] Node 0 Normal free:3368136kB min:17048kB low:21308kB high:25572kB active_anon:2203780kB inactive_anon:586104kB active_file:7311280kB inactive_file:5430068kB unevictable:4280kB isolated(anon):0kB isolated(file):0kB present:21485568kB managed:21093516kB mlocked:481036341432kB dirty:662172kB writeback:472472kB mapped:1425288kB shmem:1584132kB slab_reclaimable:624104kB slab_unreclaimable:251680kB kernel_stack:11696kB pagetables:20180kB unstable:0kB bounce:0kB free_pcp:4304kB local_pcp:588kB free_cma:56kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
> [290624.834378] lowmem_reserve[]: 0 0 0 0
> [290624.848459] Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15884kB
> [290624.895688] Node 0 DMA32: 30635*4kB (UME) 23925*8kB (UME) 7478*16kB (UME) 2420*32kB (UME) 794*64kB (UME) 139*128kB (UME) 10*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 582196kB
> [290624.947950] Node 0 Normal: 241499*4kB (UME) 161875*8kB (UME) 33424*16kB (UME) 10915*32kB (UME) 2795*64kB (UME) 334*128kB (UME) 15*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3370532kB
> [290625.003724] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> [290625.030463] 4142937 total pagecache pages
> [290625.043969] 5782 pages in swap cache
> [290625.056118] Swap cache stats: add 102837, delete 97055, find 627323/635133
> [290625.078329] Free swap  = 15372324kB
> [290625.090190] Total swap = 15616764kB
> [290625.102186] 6215903 pages RAM
> [290625.112399] 0 pages HighMem/MovableOnly
> [290625.125747] 117793 pages reserved
> [290625.137495] 4096 pages cma reserved
> [290625.149199] 0 pages hwpoisoned
> [290625.159961] bcache: register_cache() error opening sdl2: cannot allocate memory
> [290625.183410] BUG: unable to handle kernel NULL pointer dereference at 00000000000009b8
> [290625.208648] IP: [<ffffffffc0516e8d>] cache_set_flush+0x102/0x15c [bcache]
> [290625.230734] PGD 0
> [290625.238446] Oops: 0000 [#1] SMP
> [290625.249857] Modules linked in: udp_diag tcp_diag inet_diag veth ip6table_filter ip6_tables ebtable_nat ebtables ppdev lp xt_addrtype tun bridge stp llc autofs4 softdog binfmt_misc ftdi_sio nfsd nfs_acl auth_rpcgss nfs fscache lockd grace sunrpc ipt_REJECT nf_reject_ipv4 xt_conntrack xt_mark xt_nat xt_tcpudp nf_log_ipv4 nf_log_common xt_LOG iptable_mangle iptable_filter bcache lm85 hwmon_vid pl2303 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 ip_tables nf_conntrack_ftp ipt_MASQUERADE x_tables nf_nat_masquerade_ipv4 nf_nat nf_conntrack sg st snd_pcm_oss snd_mixer_oss kvm_intel snd_hda_codec_realtek snd_hda_codec_generic snd_opl3_synth kvm snd_seq_midi_emul snd_hda_intel snd_cmipci gameport snd_hda_codec snd_opl3_lib snd_mpu401_uart snd_seq_midi irqbypass snd_seq_midi_event 
 snd_seq snd_hda_core snd_hwdep snd_rawmidi asix coretemp eeepc_wmi tpm_infineon rc_ati_x10 snd_pcm libphy snd_seq_device asus_wmi snd_timer sparse_keymap rfkill intel_rapl 8250_fintek ati_re
 mo
>  te usbnet tpm_tis snd rc_core battery tpm evdev iosf_mbi wmi pcspkr input_leds intel_powerclamp parport_pc processor i2c_i801 xhci_pci x86_pkg_temp_thermal usbserial soundcore parport xhci_hcd lpc_ich e1000e ptp pps_core fuse raid456 multipath mmc_block mmc_core dm_snapshot dm_bufio dm_mirror dm_region_hash dm_log dm_crypt dm_mod async_raid6_recov async_pq async_xor async_memcpy async_tx blowfish_x86_64 blowfish_common ecb xts ansi_cprng drbg aesni_intel ablk_helper lrw gf128mul glue_helper aes_x86_64 crct10dif_pclmul crc32_pclmul crc32c_intel r8169 ehci_pci fjes ehci_hcd thermal mii sata_mv fan cryptd usbcore usb_common sata_sil24 [last unloaded: ftdi_sio]
> [290625.713109] CPU: 0 PID: 25842 Comm: kworker/0:0 Tainted: G        W       4.4.2-amd64-i915-volpreempt-20160214bc3 #5
> [290625.747597] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
> [290625.776918] Workqueue: events cache_set_flush [bcache]
> [290625.794260] task: ffff880467caa0c0 ti: ffff8805af184000 task.ti: ffff8805af184000
> [290625.818544] RIP: 0010:[<ffffffffc0516e8d>]  [<ffffffffc0516e8d>] cache_set_flush+0x102/0x15c [bcache]
> [290625.848082] RSP: 0018:ffff8805af187e10  EFLAGS: 00010202
> [290625.866890] RAX: 0000000000000001 RBX: ffff880153a40ce8 RCX: 0000000000000282
> [290625.891136] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> [290625.914365] RBP: ffff8805af187e30 R08: 00000000000000fa R09: ffffe8ffffc01900
> [290625.937588] R10: ffff880088430838 R11: ffff880088430838 R12: 0000000000000000
> [290625.960781] R13: ffff880153a40de8 R14: 0000000000000000 R15: ffffe8ffffc01900
> [290625.983951] FS:  0000000000000000(0000) GS:ffff88061e200000(0000) knlGS:0000000000000000
> [290626.010918] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [290626.030800] CR2: 00000000000009b8 CR3: 0000000001e0a000 CR4: 00000000001406f0
> [290626.053914] Stack:
> [290626.061687]  ffff8805d6183280 ffff88061e216040 ffff880153a40ce8 0000000000000000
> [290626.085725]  ffff8805af187e78 ffffffff81070fcf ffff8805d61832b0 000000001e216058
> [290626.110510]  ffff8805d6183280 ffff88061e216040 ffff8805d61832b0 ffff88061e216058
> [290626.135329] Call Trace:
> [290626.145108]  [<ffffffff81070fcf>] process_one_work+0x15b/0x260
> [290626.164241]  [<ffffffff81071574>] worker_thread+0x1f0/0x29d
> [290626.182561]  [<ffffffff81071384>] ? rescuer_thread+0x281/0x281
> [290626.201721]  [<ffffffff810758e9>] kthread+0xa5/0xad
> [290626.217966]  [<ffffffff81075844>] ? kthread_parkme+0x24/0x24
> [290626.236505]  [<ffffffff816e1c0f>] ret_from_fork+0x3f/0x70
> [290626.255052]  [<ffffffff81075844>] ? kthread_parkme+0x24/0x24
> [290626.274460] Code: 89 1c c1 4d 8b a4 24 f0 01 00 00 49 81 ec f0 01 00 00 eb b4 45 31 e4 0f b7 83 4c f7 ff ff 4a 8b 94 e3 58 ff ff ff 44 39 e0 76 16 <48> 8b ba b8 09 00 00 48 85 ff 74 05 e8 61 e8 b5 c0 49 ff c4 eb
> [290626.337443] RIP  [<ffffffffc0516e8d>] cache_set_flush+0x102/0x15c [bcache]
> [290626.359833]  RSP <ffff8805af187e10>
> [290626.372050] CR2: 00000000000009b8
> [290626.383646] ---[ end trace f97d88b8bb456800 ]---
> [290626.399977] Kernel panic - not syncing: Fatal exception
> [290626.418169] Kernel Offset: disabled
> [290626.430278] Rebooting in 20 seconds..
> [290646.378903] ACPI MEMORY or I/O RESET_REG.
> --
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems ....
>                                       .... what McDonalds is to gourmet cooking
> Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901
> --
> To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply	[flat|nested] 20+ messages in thread