All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: Bcache still unstable for me (memory problems)
@ 2016-03-08 23:04 Richard Bade
  2016-03-09  3:51 ` Eric Wheeler
  0 siblings, 1 reply; 20+ messages in thread
From: Richard Bade @ 2016-03-08 23:04 UTC (permalink / raw)
  To: linux-bcache

Hi Guys,
I've also seen this issue when registering a device.
[Fri Feb  5 14:06:59 2016] bcache: register_cache() error opening
dm-6: cannot allocate memory

For me the situation happens because I'm running Ceph OSD's on the
host. These are using all the available memory and for some reason the
register cannot allocate memory.
I've reproduced this several times, but basically if I register on a
system that has been up for a while and therefore using all it's RAM
this will happen.
Here's two specific time's it's happened:
when creating using "make-bcache --cache /dev/vg-raid0/lv-ceph_j19
--bdev /dev/sdf1". It failed due to xfs still remaining on /dev/sdf1.
Then run wipefs and "make-bcache --cache /dev/vg-raid0/lv-ceph_j19
--bdev /dev/sdf1" again. CRASH.
when manually creating a cache device using "make-bcache --cache
/dev/vg-raid0/lv-bcache". The cache set didn't show up in
/sys/fs/bcache. echo "/dev/vg-raid0/lv-bcache" >
/sys/fs/bcache/register. CRASH.
I was able to work around this issue by stopping the osd's before
registering and therefore freeing up memory.
Because I am able to reproduce this relatively easily, I'm happy to
test any patches.
Apart from the register crash, the cache sets have been up and stable
for over 1 month.

Regards,
Richard

On Mon, 7 Mar 2016, Marc MERLIN wrote:

> I started a btrfs copy from a bcached device to another one (both are in
> the 5 to 15TB range).
> A minute later, my system that had been up for several days, crashed.
>
> Ultimately, it crashed on this:
> bcache: register_cache() error opening sdl2: cannot allocate memory
> BUG: unable to handle kernel NULL pointer dereference at 00000000000009b8
> IP: [<ffffffffc0516e8d>] cache_set_flush+0x102/0x15c [bcache]

Strange about memory allocation issues.  Do you have
/proc/sys/vm/min_free_kbytes set to something like $((256*1024)) ? Is this
a multi-socket machine with all memory plugged into only one CPU?

I'm curious though, why was registration called a second time? Was the
drive external?  Could udev be re-registering the device?

You might find where the registration is being done and prevent it from
running automatically.  At least that might solve the re-registration
problem.

As for the memory allocation issue, the backtrace indicates that this is a
registration-time problem, not a runtime issue.  I'm guessing it is one of
the threads attempting to proceed after a memory allocation error similar
to the writeback thread issue you had last time which was fixed by adding
some locking around the initialization.

I'll look and see if I can come up with a reasonable patch to cleanup the
allocator or gc thread under OOM conditions at registration time.

-Eric

>
> I'm not sure why it tried to register sdl2 then when it was already
> registered from boot. Did it time out and had to be registered again?
>
> At this point, things with bcache have been bad enough, that I'm
> considering removing it. My system has been very unstable with it :(
> Or maybe I can just remove the caching devices, leave the backing ones,
> and run without cache for now, giving me the option to re-add the cache later.
>
> I actually just added 16GB of RAM to the server after adding bcache, and
> clearly going from 8 to 24GB made no difference. I'm wondering if there
> is a memory leak somewhere?
>
> Just before the system crashed, memory looked like this, which isn't bad:
>              total       used       free     shared    buffers     cached
> Mem:      24392440   24241440     151000          0         88   20116748
> -/+ buffers/cache:    4124604   20267836
> Swap:     15616764     244452   15372312
>
> I'm using btrfs on top of dmcrypt on top of bcache.
> btrfs on top of dmcrypt was stable, I had been using it for years. Adding bcache is what
> caused all my recent problems to start.
>
> Any ideas?
>
> Thanks,
> Marc
>
>
> [290569.458707] Process accounting resumed
> [290623.673871] bcache-register: page allocation failure: order:7, mode:0x24080c0
> [290623.697143] CPU: 0 PID: 1284 Comm: bcache-register Tainted: G        W       4.4.2-amd64-i915-volpreempt-20160214bc3 #5
> [290623.732143] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
> [290623.762428]  0000000000000000 ffff88001df7bb08 ffffffff8134ae0a 0000000000000001
> [290623.787470]  ffff88001df7bba0 ffffffff81124ab6 024080c01df7bc44 024080c000000040
> [290623.811775]  0000000700000001 0000000000000007 0000000000000007 0000000000000040
> [290623.836443] Call Trace:
> [290623.846280]  [<ffffffff8134ae0a>] dump_stack+0x44/0x55
> [290623.864123]  [<ffffffff81124ab6>] warn_alloc_failed+0x114/0x12c
> [290623.883425]  [<ffffffff811274b8>] __alloc_pages_nodemask+0x7cb/0x84c
> [290623.904159]  [<ffffffff8107df80>] ? try_to_wake_up+0x1d9/0x1eb
> [290623.923173]  [<ffffffff8115f6d7>] alloc_pages_current+0xa9/0xcd
> [290623.942435]  [<ffffffff8112377e>] __get_free_pages+0xe/0x3c
> [290623.960621]  [<ffffffffc05192f0>] bch_cache_set_alloc+0x3c7/0x479 [bcache]
> [290623.983825]  [<ffffffffc051a5e3>] register_bcache+0x1241/0x1452 [bcache]
> [290624.005456]  [<ffffffff8134c853>] kobj_attr_store+0x10/0x1a
> [290624.023729]  [<ffffffff811df9df>] sysfs_kf_write+0x39/0x3b
> [290624.041739]  [<ffffffff811df2e1>] kernfs_fop_write+0xed/0x130
> [290624.060394]  [<ffffffff8117d97d>] __vfs_write+0x26/0xa5
> [290624.077452]  [<ffffffff810b2969>] ? current_kernel_time64+0x10/0x36
> [290624.097921]  [<ffffffff812d6050>] ? security_file_permission+0x3b/0x42
> [290624.119197]  [<ffffffff810915ea>] ? percpu_down_read+0x12/0x41
> [290624.138945]  [<ffffffff8117fb61>] ? __sb_start_write+0x2b/0x48
> [290624.158291]  [<ffffffff8117dfe8>] vfs_write+0x9d/0xe8
> [290624.174919]  [<ffffffff8117e7bd>] SyS_write+0x4d/0x78
> [290624.191455]  [<ffffffff810039c3>] do_fast_syscall_32+0xb3/0xf3
> [290624.210331]  [<ffffffff816e3e32>] sysenter_flags_fixed+0x8/0x12
> [290624.229355] Mem-Info:
> [290624.237432] active_anon:607145 inactive_anon:208571 isolated_anon:0
> [290624.237432]  active_file:2103407 inactive_file:1565209 isolated_file:0
> [290624.237432]  unevictable:1307 dirty:188846 writeback:138775 unstable:0
> [290624.237432]  slab_reclaimable:180112 slab_unreclaimable:73373
> [290624.237432]  mapped:419640 shmem:466581 pagetables:5778 bounce:0
> [290624.237432]  free:988612 free_pcp:2130 free_cma:14
> [290624.351715] Node 0 DMA free:15884kB min:12kB low:12kB high:16kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15976kB managed:15892kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
> [290624.483875] lowmem_reserve[]: 0 3201 23800 23800
> [290624.500998] Node 0 DMA32 free:581000kB min:2648kB low:3308kB high:3972kB active_anon:229220kB inactive_anon:248156kB active_file:1102468kB inactive_file:832264kB unevictable:948kB isolated(anon):0kB isolated(file):68kB present:3362068kB managed:3283032kB mlocked:68719477684kB dirty:94316kB writeback:65996kB mapped:253400kB shmem:282168kB slab_reclaimable:96344kB slab_unreclaimable:41308kB kernel_stack:2480kB pagetables:2952kB unstable:0kB bounce:0kB free_pcp:4008kB local_pcp:140kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
> [290624.655741] lowmem_reserve[]: 0 0 20599 20599
> [290624.674232] Node 0 Normal free:3368136kB min:17048kB low:21308kB high:25572kB active_anon:2203780kB inactive_anon:586104kB active_file:7311280kB inactive_file:5430068kB unevictable:4280kB isolated(anon):0kB isolated(file):0kB present:21485568kB managed:21093516kB mlocked:481036341432kB dirty:662172kB writeback:472472kB mapped:1425288kB shmem:1584132kB slab_reclaimable:624104kB slab_unreclaimable:251680kB kernel_stack:11696kB pagetables:20180kB unstable:0kB bounce:0kB free_pcp:4304kB local_pcp:588kB free_cma:56kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
> [290624.834378] lowmem_reserve[]: 0 0 0 0
> [290624.848459] Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15884kB
> [290624.895688] Node 0 DMA32: 30635*4kB (UME) 23925*8kB (UME) 7478*16kB (UME) 2420*32kB (UME) 794*64kB (UME) 139*128kB (UME) 10*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 582196kB
> [290624.947950] Node 0 Normal: 241499*4kB (UME) 161875*8kB (UME) 33424*16kB (UME) 10915*32kB (UME) 2795*64kB (UME) 334*128kB (UME) 15*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3370532kB
> [290625.003724] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> [290625.030463] 4142937 total pagecache pages
> [290625.043969] 5782 pages in swap cache
> [290625.056118] Swap cache stats: add 102837, delete 97055, find 627323/635133
> [290625.078329] Free swap  = 15372324kB
> [290625.090190] Total swap = 15616764kB
> [290625.102186] 6215903 pages RAM
> [290625.112399] 0 pages HighMem/MovableOnly
> [290625.125747] 117793 pages reserved
> [290625.137495] 4096 pages cma reserved
> [290625.149199] 0 pages hwpoisoned
> [290625.159961] bcache: register_cache() error opening sdl2: cannot allocate memory
> [290625.183410] BUG: unable to handle kernel NULL pointer dereference at 00000000000009b8
> [290625.208648] IP: [<ffffffffc0516e8d>] cache_set_flush+0x102/0x15c [bcache]
> [290625.230734] PGD 0
> [290625.238446] Oops: 0000 [#1] SMP
> [290625.249857] Modules linked in: udp_diag tcp_diag inet_diag veth ip6table_filter ip6_tables ebtable_nat ebtables ppdev lp xt_addrtype tun bridge stp llc autofs4 softdog binfmt_misc ftdi_sio nfsd nfs_acl auth_rpcgss nfs fscache lockd grace sunrpc ipt_REJECT nf_reject_ipv4 xt_conntrack xt_mark xt_nat xt_tcpudp nf_log_ipv4 nf_log_common xt_LOG iptable_mangle iptable_filter bcache lm85 hwmon_vid pl2303 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 ip_tables nf_conntrack_ftp ipt_MASQUERADE x_tables nf_nat_masquerade_ipv4 nf_nat nf_conntrack sg st snd_pcm_oss snd_mixer_oss kvm_intel snd_hda_codec_realtek snd_hda_codec_generic snd_opl3_synth kvm snd_seq_midi_emul snd_hda_intel snd_cmipci gameport snd_hda_codec snd_opl3_lib snd_mpu401_uart snd_seq_midi irqbypass snd_seq_midi_event 
 snd_seq snd_hda_core snd_hwdep snd_rawmidi asix coretemp eeepc_wmi tpm_infineon rc_ati_x10 snd_pcm libphy snd_seq_device asus_wmi snd_timer sparse_keymap rfkill intel_rapl 8250_fintek ati_re
 mo
>  te usbnet tpm_tis snd rc_core battery tpm evdev iosf_mbi wmi pcspkr input_leds intel_powerclamp parport_pc processor i2c_i801 xhci_pci x86_pkg_temp_thermal usbserial soundcore parport xhci_hcd lpc_ich e1000e ptp pps_core fuse raid456 multipath mmc_block mmc_core dm_snapshot dm_bufio dm_mirror dm_region_hash dm_log dm_crypt dm_mod async_raid6_recov async_pq async_xor async_memcpy async_tx blowfish_x86_64 blowfish_common ecb xts ansi_cprng drbg aesni_intel ablk_helper lrw gf128mul glue_helper aes_x86_64 crct10dif_pclmul crc32_pclmul crc32c_intel r8169 ehci_pci fjes ehci_hcd thermal mii sata_mv fan cryptd usbcore usb_common sata_sil24 [last unloaded: ftdi_sio]
> [290625.713109] CPU: 0 PID: 25842 Comm: kworker/0:0 Tainted: G        W       4.4.2-amd64-i915-volpreempt-20160214bc3 #5
> [290625.747597] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
> [290625.776918] Workqueue: events cache_set_flush [bcache]
> [290625.794260] task: ffff880467caa0c0 ti: ffff8805af184000 task.ti: ffff8805af184000
> [290625.818544] RIP: 0010:[<ffffffffc0516e8d>]  [<ffffffffc0516e8d>] cache_set_flush+0x102/0x15c [bcache]
> [290625.848082] RSP: 0018:ffff8805af187e10  EFLAGS: 00010202
> [290625.866890] RAX: 0000000000000001 RBX: ffff880153a40ce8 RCX: 0000000000000282
> [290625.891136] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> [290625.914365] RBP: ffff8805af187e30 R08: 00000000000000fa R09: ffffe8ffffc01900
> [290625.937588] R10: ffff880088430838 R11: ffff880088430838 R12: 0000000000000000
> [290625.960781] R13: ffff880153a40de8 R14: 0000000000000000 R15: ffffe8ffffc01900
> [290625.983951] FS:  0000000000000000(0000) GS:ffff88061e200000(0000) knlGS:0000000000000000
> [290626.010918] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [290626.030800] CR2: 00000000000009b8 CR3: 0000000001e0a000 CR4: 00000000001406f0
> [290626.053914] Stack:
> [290626.061687]  ffff8805d6183280 ffff88061e216040 ffff880153a40ce8 0000000000000000
> [290626.085725]  ffff8805af187e78 ffffffff81070fcf ffff8805d61832b0 000000001e216058
> [290626.110510]  ffff8805d6183280 ffff88061e216040 ffff8805d61832b0 ffff88061e216058
> [290626.135329] Call Trace:
> [290626.145108]  [<ffffffff81070fcf>] process_one_work+0x15b/0x260
> [290626.164241]  [<ffffffff81071574>] worker_thread+0x1f0/0x29d
> [290626.182561]  [<ffffffff81071384>] ? rescuer_thread+0x281/0x281
> [290626.201721]  [<ffffffff810758e9>] kthread+0xa5/0xad
> [290626.217966]  [<ffffffff81075844>] ? kthread_parkme+0x24/0x24
> [290626.236505]  [<ffffffff816e1c0f>] ret_from_fork+0x3f/0x70
> [290626.255052]  [<ffffffff81075844>] ? kthread_parkme+0x24/0x24
> [290626.274460] Code: 89 1c c1 4d 8b a4 24 f0 01 00 00 49 81 ec f0 01 00 00 eb b4 45 31 e4 0f b7 83 4c f7 ff ff 4a 8b 94 e3 58 ff ff ff 44 39 e0 76 16 <48> 8b ba b8 09 00 00 48 85 ff 74 05 e8 61 e8 b5 c0 49 ff c4 eb
> [290626.337443] RIP  [<ffffffffc0516e8d>] cache_set_flush+0x102/0x15c [bcache]
> [290626.359833]  RSP <ffff8805af187e10>
> [290626.372050] CR2: 00000000000009b8
> [290626.383646] ---[ end trace f97d88b8bb456800 ]---
> [290626.399977] Kernel panic - not syncing: Fatal exception
> [290626.418169] Kernel Offset: disabled
> [290626.430278] Rebooting in 20 seconds..
> [290646.378903] ACPI MEMORY or I/O RESET_REG.
> --
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems ....
>                                       .... what McDonalds is to gourmet cooking
> Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901
> --
> To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Bcache still unstable for me (memory problems)
  2016-03-08 23:04 Bcache still unstable for me (memory problems) Richard Bade
@ 2016-03-09  3:51 ` Eric Wheeler
  2016-03-10  1:34   ` Eric Wheeler
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Wheeler @ 2016-03-09  3:51 UTC (permalink / raw)
  To: Richard Bade; +Cc: linux-bcache

On Wed, 9 Mar 2016, Richard Bade wrote:

> Hi Guys,
> I've also seen this issue when registering a device.
> [Fri Feb  5 14:06:59 2016] bcache: register_cache() error opening
> dm-6: cannot allocate memory
> 
> For me the situation happens because I'm running Ceph OSD's on the
> host. These are using all the available memory and for some reason the
> register cannot allocate memory.
> I've reproduced this several times, but basically if I register on a
> system that has been up for a while and therefore using all it's RAM
> this will happen.
> Here's two specific time's it's happened:
> when creating using "make-bcache --cache /dev/vg-raid0/lv-ceph_j19
> --bdev /dev/sdf1". It failed due to xfs still remaining on /dev/sdf1.

Do you get a kernel failure, or is it just make-bcache being safe?

> Then run wipefs and "make-bcache --cache /dev/vg-raid0/lv-ceph_j19
> --bdev /dev/sdf1" again. CRASH.

make-bcache will auto register.  (Maybe there's an option to stop that, 
I've not looked.)

> when manually creating a cache device using "make-bcache --cache
> /dev/vg-raid0/lv-bcache". The cache set didn't show up in
> /sys/fs/bcache. echo "/dev/vg-raid0/lv-bcache" >
> /sys/fs/bcache/register. CRASH.
> I was able to work around this issue by stopping the osd's before
> registering and therefore freeing up memory.

I would be curious if setting /proc/sys/vm/min_free_kbytes to 256*1024 
fixes that.

> Because I am able to reproduce this relatively easily, I'm happy to
> test any patches.

Yes, thank you, please do:

Try the three patches at the top of these two branches and see if you 
crash.  I'm hoping it fails gracefully on OOM:
	
https://bitbucket.org/ewheelerinc/linux/branch/v4.5-rc6-bcache-fixes
	commits 84a2edd, a704484
https://bitbucket.org/ewheelerinc/linux/branch/v4.5-rc7-bcache-fixes
	commit 87264e9

If it stops crashing, then maybe we can work out a way to allocate memory 
such that it succeeds.  Please send any dmesg output.

-Eric

> Apart from the register crash, the cache sets have been up and stable
> for over 1 month.
> 
> Regards,
> Richard
> 
> On Mon, 7 Mar 2016, Marc MERLIN wrote:
> 
> > I started a btrfs copy from a bcached device to another one (both are in
> > the 5 to 15TB range).
> > A minute later, my system that had been up for several days, crashed.
> >
> > Ultimately, it crashed on this:
> > bcache: register_cache() error opening sdl2: cannot allocate memory
> > BUG: unable to handle kernel NULL pointer dereference at 00000000000009b8
> > IP: [<ffffffffc0516e8d>] cache_set_flush+0x102/0x15c [bcache]
> 
> Strange about memory allocation issues.  Do you have
> /proc/sys/vm/min_free_kbytes set to something like $((256*1024)) ? Is this
> a multi-socket machine with all memory plugged into only one CPU?
> 
> I'm curious though, why was registration called a second time? Was the
> drive external?  Could udev be re-registering the device?
> 
> You might find where the registration is being done and prevent it from
> running automatically.  At least that might solve the re-registration
> problem.
> 
> As for the memory allocation issue, the backtrace indicates that this is a
> registration-time problem, not a runtime issue.  I'm guessing it is one of
> the threads attempting to proceed after a memory allocation error similar
> to the writeback thread issue you had last time which was fixed by adding
> some locking around the initialization.
> 
> I'll look and see if I can come up with a reasonable patch to cleanup the
> allocator or gc thread under OOM conditions at registration time.
> 
> -Eric
> 
> >
> > I'm not sure why it tried to register sdl2 then when it was already
> > registered from boot. Did it time out and had to be registered again?
> >
> > At this point, things with bcache have been bad enough, that I'm
> > considering removing it. My system has been very unstable with it :(
> > Or maybe I can just remove the caching devices, leave the backing ones,
> > and run without cache for now, giving me the option to re-add the cache later.
> >
> > I actually just added 16GB of RAM to the server after adding bcache, and
> > clearly going from 8 to 24GB made no difference. I'm wondering if there
> > is a memory leak somewhere?
> >
> > Just before the system crashed, memory looked like this, which isn't bad:
> >              total       used       free     shared    buffers     cached
> > Mem:      24392440   24241440     151000          0         88   20116748
> > -/+ buffers/cache:    4124604   20267836
> > Swap:     15616764     244452   15372312
> >
> > I'm using btrfs on top of dmcrypt on top of bcache.
> > btrfs on top of dmcrypt was stable, I had been using it for years. Adding bcache is what
> > caused all my recent problems to start.
> >
> > Any ideas?
> >
> > Thanks,
> > Marc
> >
> >
> > [290569.458707] Process accounting resumed
> > [290623.673871] bcache-register: page allocation failure: order:7, mode:0x24080c0
> > [290623.697143] CPU: 0 PID: 1284 Comm: bcache-register Tainted: G        W       4.4.2-amd64-i915-volpreempt-20160214bc3 #5
> > [290623.732143] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
> > [290623.762428]  0000000000000000 ffff88001df7bb08 ffffffff8134ae0a 0000000000000001
> > [290623.787470]  ffff88001df7bba0 ffffffff81124ab6 024080c01df7bc44 024080c000000040
> > [290623.811775]  0000000700000001 0000000000000007 0000000000000007 0000000000000040
> > [290623.836443] Call Trace:
> > [290623.846280]  [<ffffffff8134ae0a>] dump_stack+0x44/0x55
> > [290623.864123]  [<ffffffff81124ab6>] warn_alloc_failed+0x114/0x12c
> > [290623.883425]  [<ffffffff811274b8>] __alloc_pages_nodemask+0x7cb/0x84c
> > [290623.904159]  [<ffffffff8107df80>] ? try_to_wake_up+0x1d9/0x1eb
> > [290623.923173]  [<ffffffff8115f6d7>] alloc_pages_current+0xa9/0xcd
> > [290623.942435]  [<ffffffff8112377e>] __get_free_pages+0xe/0x3c
> > [290623.960621]  [<ffffffffc05192f0>] bch_cache_set_alloc+0x3c7/0x479 [bcache]
> > [290623.983825]  [<ffffffffc051a5e3>] register_bcache+0x1241/0x1452 [bcache]
> > [290624.005456]  [<ffffffff8134c853>] kobj_attr_store+0x10/0x1a
> > [290624.023729]  [<ffffffff811df9df>] sysfs_kf_write+0x39/0x3b
> > [290624.041739]  [<ffffffff811df2e1>] kernfs_fop_write+0xed/0x130
> > [290624.060394]  [<ffffffff8117d97d>] __vfs_write+0x26/0xa5
> > [290624.077452]  [<ffffffff810b2969>] ? current_kernel_time64+0x10/0x36
> > [290624.097921]  [<ffffffff812d6050>] ? security_file_permission+0x3b/0x42
> > [290624.119197]  [<ffffffff810915ea>] ? percpu_down_read+0x12/0x41
> > [290624.138945]  [<ffffffff8117fb61>] ? __sb_start_write+0x2b/0x48
> > [290624.158291]  [<ffffffff8117dfe8>] vfs_write+0x9d/0xe8
> > [290624.174919]  [<ffffffff8117e7bd>] SyS_write+0x4d/0x78
> > [290624.191455]  [<ffffffff810039c3>] do_fast_syscall_32+0xb3/0xf3
> > [290624.210331]  [<ffffffff816e3e32>] sysenter_flags_fixed+0x8/0x12
> > [290624.229355] Mem-Info:
> > [290624.237432] active_anon:607145 inactive_anon:208571 isolated_anon:0
> > [290624.237432]  active_file:2103407 inactive_file:1565209 isolated_file:0
> > [290624.237432]  unevictable:1307 dirty:188846 writeback:138775 unstable:0
> > [290624.237432]  slab_reclaimable:180112 slab_unreclaimable:73373
> > [290624.237432]  mapped:419640 shmem:466581 pagetables:5778 bounce:0
> > [290624.237432]  free:988612 free_pcp:2130 free_cma:14
> > [290624.351715] Node 0 DMA free:15884kB min:12kB low:12kB high:16kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15976kB managed:15892kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
> > [290624.483875] lowmem_reserve[]: 0 3201 23800 23800
> > [290624.500998] Node 0 DMA32 free:581000kB min:2648kB low:3308kB high:3972kB active_anon:229220kB inactive_anon:248156kB active_file:1102468kB inactive_file:832264kB unevictable:948kB isolated(anon):0kB isolated(file):68kB present:3362068kB managed:3283032kB mlocked:68719477684kB dirty:94316kB writeback:65996kB mapped:253400kB shmem:282168kB slab_reclaimable:96344kB slab_unreclaimable:41308kB kernel_stack:2480kB pagetables:2952kB unstable:0kB bounce:0kB free_pcp:4008kB local_pcp:140kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
> > [290624.655741] lowmem_reserve[]: 0 0 20599 20599
> > [290624.674232] Node 0 Normal free:3368136kB min:17048kB low:21308kB high:25572kB active_anon:2203780kB inactive_anon:586104kB active_file:7311280kB inactive_file:5430068kB unevictable:4280kB isolated(anon):0kB isolated(file):0kB present:21485568kB managed:21093516kB mlocked:481036341432kB dirty:662172kB writeback:472472kB mapped:1425288kB shmem:1584132kB slab_reclaimable:624104kB slab_unreclaimable:251680kB kernel_stack:11696kB pagetables:20180kB unstable:0kB bounce:0kB free_pcp:4304kB local_pcp:588kB free_cma:56kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
> > [290624.834378] lowmem_reserve[]: 0 0 0 0
> > [290624.848459] Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15884kB
> > [290624.895688] Node 0 DMA32: 30635*4kB (UME) 23925*8kB (UME) 7478*16kB (UME) 2420*32kB (UME) 794*64kB (UME) 139*128kB (UME) 10*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 582196kB
> > [290624.947950] Node 0 Normal: 241499*4kB (UME) 161875*8kB (UME) 33424*16kB (UME) 10915*32kB (UME) 2795*64kB (UME) 334*128kB (UME) 15*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3370532kB
> > [290625.003724] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> > [290625.030463] 4142937 total pagecache pages
> > [290625.043969] 5782 pages in swap cache
> > [290625.056118] Swap cache stats: add 102837, delete 97055, find 627323/635133
> > [290625.078329] Free swap  = 15372324kB
> > [290625.090190] Total swap = 15616764kB
> > [290625.102186] 6215903 pages RAM
> > [290625.112399] 0 pages HighMem/MovableOnly
> > [290625.125747] 117793 pages reserved
> > [290625.137495] 4096 pages cma reserved
> > [290625.149199] 0 pages hwpoisoned
> > [290625.159961] bcache: register_cache() error opening sdl2: cannot allocate memory
> > [290625.183410] BUG: unable to handle kernel NULL pointer dereference at 00000000000009b8
> > [290625.208648] IP: [<ffffffffc0516e8d>] cache_set_flush+0x102/0x15c [bcache]
> > [290625.230734] PGD 0
> > [290625.238446] Oops: 0000 [#1] SMP
> > [290625.249857] Modules linked in: udp_diag tcp_diag inet_diag veth ip6table_filter ip6_tables ebtable_nat ebtables ppdev lp xt_addrtype tun bridge stp llc autofs4 softdog binfmt_misc ftdi_sio nfsd nfs_acl auth_rpcgss nfs fscache lockd grace sunrpc ipt_REJECT nf_reject_ipv4 xt_conntrack xt_mark xt_nat xt_tcpudp nf_log_ipv4 nf_log_common xt_LOG iptable_mangle iptable_filter bcache lm85 hwmon_vid pl2303 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 ip_tables nf_conntrack_ftp ipt_MASQUERADE x_tables nf_nat_masquerade_ipv4 nf_nat nf_conntrack sg st snd_pcm_oss snd_mixer_oss kvm_intel snd_hda_codec_realtek snd_hda_codec_generic snd_opl3_synth kvm snd_seq_midi_emul snd_hda_intel snd_cmipci gameport snd_hda_codec snd_opl3_lib snd_mpu401_uart snd_seq_midi irqbypass snd_seq_midi_even
 t snd_seq snd_hda_core snd_hwdep snd_rawmidi asix coretemp eeepc_wmi tpm_infineon rc_ati_x10 snd_pcm libphy snd_seq_device asus_wmi snd_timer sparse_keymap rfkill intel_rapl 8250_fintek ati_
 re
>  mo
> >  te usbnet tpm_tis snd rc_core battery tpm evdev iosf_mbi wmi pcspkr input_leds intel_powerclamp parport_pc processor i2c_i801 xhci_pci x86_pkg_temp_thermal usbserial soundcore parport xhci_hcd lpc_ich e1000e ptp pps_core fuse raid456 multipath mmc_block mmc_core dm_snapshot dm_bufio dm_mirror dm_region_hash dm_log dm_crypt dm_mod async_raid6_recov async_pq async_xor async_memcpy async_tx blowfish_x86_64 blowfish_common ecb xts ansi_cprng drbg aesni_intel ablk_helper lrw gf128mul glue_helper aes_x86_64 crct10dif_pclmul crc32_pclmul crc32c_intel r8169 ehci_pci fjes ehci_hcd thermal mii sata_mv fan cryptd usbcore usb_common sata_sil24 [last unloaded: ftdi_sio]
> > [290625.713109] CPU: 0 PID: 25842 Comm: kworker/0:0 Tainted: G        W       4.4.2-amd64-i915-volpreempt-20160214bc3 #5
> > [290625.747597] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
> > [290625.776918] Workqueue: events cache_set_flush [bcache]
> > [290625.794260] task: ffff880467caa0c0 ti: ffff8805af184000 task.ti: ffff8805af184000
> > [290625.818544] RIP: 0010:[<ffffffffc0516e8d>]  [<ffffffffc0516e8d>] cache_set_flush+0x102/0x15c [bcache]
> > [290625.848082] RSP: 0018:ffff8805af187e10  EFLAGS: 00010202
> > [290625.866890] RAX: 0000000000000001 RBX: ffff880153a40ce8 RCX: 0000000000000282
> > [290625.891136] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> > [290625.914365] RBP: ffff8805af187e30 R08: 00000000000000fa R09: ffffe8ffffc01900
> > [290625.937588] R10: ffff880088430838 R11: ffff880088430838 R12: 0000000000000000
> > [290625.960781] R13: ffff880153a40de8 R14: 0000000000000000 R15: ffffe8ffffc01900
> > [290625.983951] FS:  0000000000000000(0000) GS:ffff88061e200000(0000) knlGS:0000000000000000
> > [290626.010918] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [290626.030800] CR2: 00000000000009b8 CR3: 0000000001e0a000 CR4: 00000000001406f0
> > [290626.053914] Stack:
> > [290626.061687]  ffff8805d6183280 ffff88061e216040 ffff880153a40ce8 0000000000000000
> > [290626.085725]  ffff8805af187e78 ffffffff81070fcf ffff8805d61832b0 000000001e216058
> > [290626.110510]  ffff8805d6183280 ffff88061e216040 ffff8805d61832b0 ffff88061e216058
> > [290626.135329] Call Trace:
> > [290626.145108]  [<ffffffff81070fcf>] process_one_work+0x15b/0x260
> > [290626.164241]  [<ffffffff81071574>] worker_thread+0x1f0/0x29d
> > [290626.182561]  [<ffffffff81071384>] ? rescuer_thread+0x281/0x281
> > [290626.201721]  [<ffffffff810758e9>] kthread+0xa5/0xad
> > [290626.217966]  [<ffffffff81075844>] ? kthread_parkme+0x24/0x24
> > [290626.236505]  [<ffffffff816e1c0f>] ret_from_fork+0x3f/0x70
> > [290626.255052]  [<ffffffff81075844>] ? kthread_parkme+0x24/0x24
> > [290626.274460] Code: 89 1c c1 4d 8b a4 24 f0 01 00 00 49 81 ec f0 01 00 00 eb b4 45 31 e4 0f b7 83 4c f7 ff ff 4a 8b 94 e3 58 ff ff ff 44 39 e0 76 16 <48> 8b ba b8 09 00 00 48 85 ff 74 05 e8 61 e8 b5 c0 49 ff c4 eb
> > [290626.337443] RIP  [<ffffffffc0516e8d>] cache_set_flush+0x102/0x15c [bcache]
> > [290626.359833]  RSP <ffff8805af187e10>
> > [290626.372050] CR2: 00000000000009b8
> > [290626.383646] ---[ end trace f97d88b8bb456800 ]---
> > [290626.399977] Kernel panic - not syncing: Fatal exception
> > [290626.418169] Kernel Offset: disabled
> > [290626.430278] Rebooting in 20 seconds..
> > [290646.378903] ACPI MEMORY or I/O RESET_REG.
> > --
> > "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> > Microsoft is to operating systems ....
> >                                       .... what McDonalds is to gourmet cooking
> > Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >
> --
> To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Bcache still unstable for me (memory problems)
  2016-03-09  3:51 ` Eric Wheeler
@ 2016-03-10  1:34   ` Eric Wheeler
  2016-03-10  2:15     ` Marc MERLIN
  2016-03-21  0:05     ` Richard Bade
  0 siblings, 2 replies; 20+ messages in thread
From: Eric Wheeler @ 2016-03-10  1:34 UTC (permalink / raw)
  To: Richard Bade; +Cc: linux-bcache, Marc MERLIN

Hi Richard, Marc,

>>> [290623.673871] bcache-register: page allocation failure: order:7, mode:0x24080c0

Do you still have the backtraces that show the function call stack for 
errors that look like this?
	%s: page allocation failure: order:%d, mode:0x%x 

Please send as many relevant OOM failure traces that you can.  I would 
like to see which memory allocation(s) are failing and if they are always 
the same stack trace.

In the example above, order 7 means 2^7 of 4k pages, so it means the 
kernel can't find 512k of contiguous memory that can be allocated.

It looks like the OOM is triggered in bch_cache_set_alloc, but might be 
cache_alloc too.  I'm not sure if an alternate allocation mechanism can be 
used safely, but thats what I want to look into.

Thanks!

-Eric

--
Eric Wheeler

On Wed, 9 Mar 2016, Eric Wheeler wrote:

> On Wed, 9 Mar 2016, Richard Bade wrote:
> 
> > Hi Guys,
> > I've also seen this issue when registering a device.
> > [Fri Feb  5 14:06:59 2016] bcache: register_cache() error opening
> > dm-6: cannot allocate memory
> > 
> > For me the situation happens because I'm running Ceph OSD's on the
> > host. These are using all the available memory and for some reason the
> > register cannot allocate memory.
> > I've reproduced this several times, but basically if I register on a
> > system that has been up for a while and therefore using all it's RAM
> > this will happen.
> > Here's two specific time's it's happened:
> > when creating using "make-bcache --cache /dev/vg-raid0/lv-ceph_j19
> > --bdev /dev/sdf1". It failed due to xfs still remaining on /dev/sdf1.
> 
> Do you get a kernel failure, or is it just make-bcache being safe?
> 
> > Then run wipefs and "make-bcache --cache /dev/vg-raid0/lv-ceph_j19
> > --bdev /dev/sdf1" again. CRASH.
> 
> make-bcache will auto register.  (Maybe there's an option to stop that, 
> I've not looked.)
> 
> > when manually creating a cache device using "make-bcache --cache
> > /dev/vg-raid0/lv-bcache". The cache set didn't show up in
> > /sys/fs/bcache. echo "/dev/vg-raid0/lv-bcache" >
> > /sys/fs/bcache/register. CRASH.
> > I was able to work around this issue by stopping the osd's before
> > registering and therefore freeing up memory.
> 
> I would be curious if setting /proc/sys/vm/min_free_kbytes to 256*1024 
> fixes that.
> 
> > Because I am able to reproduce this relatively easily, I'm happy to
> > test any patches.
> 
> Yes, thank you, please do:
> 
> Try the three patches at the top of these two branches and see if you 
> crash.  I'm hoping it fails gracefully on OOM:
> 	
> https://bitbucket.org/ewheelerinc/linux/branch/v4.5-rc6-bcache-fixes
> 	commits 84a2edd, a704484
> https://bitbucket.org/ewheelerinc/linux/branch/v4.5-rc7-bcache-fixes
> 	commit 87264e9
> 
> If it stops crashing, then maybe we can work out a way to allocate memory 
> such that it succeeds.  Please send any dmesg output.
> 
> -Eric
> 
> > Apart from the register crash, the cache sets have been up and stable
> > for over 1 month.
> > 
> > Regards,
> > Richard
> > 
> > On Mon, 7 Mar 2016, Marc MERLIN wrote:
> > 
> > > I started a btrfs copy from a bcached device to another one (both are in
> > > the 5 to 15TB range).
> > > A minute later, my system that had been up for several days, crashed.
> > >
> > > Ultimately, it crashed on this:
> > > bcache: register_cache() error opening sdl2: cannot allocate memory
> > > BUG: unable to handle kernel NULL pointer dereference at 00000000000009b8
> > > IP: [<ffffffffc0516e8d>] cache_set_flush+0x102/0x15c [bcache]
> > 
> > Strange about memory allocation issues.  Do you have
> > /proc/sys/vm/min_free_kbytes set to something like $((256*1024)) ? Is this
> > a multi-socket machine with all memory plugged into only one CPU?
> > 
> > I'm curious though, why was registration called a second time? Was the
> > drive external?  Could udev be re-registering the device?
> > 
> > You might find where the registration is being done and prevent it from
> > running automatically.  At least that might solve the re-registration
> > problem.
> > 
> > As for the memory allocation issue, the backtrace indicates that this is a
> > registration-time problem, not a runtime issue.  I'm guessing it is one of
> > the threads attempting to proceed after a memory allocation error similar
> > to the writeback thread issue you had last time which was fixed by adding
> > some locking around the initialization.
> > 
> > I'll look and see if I can come up with a reasonable patch to cleanup the
> > allocator or gc thread under OOM conditions at registration time.
> > 
> > -Eric
> > 
> > >
> > > I'm not sure why it tried to register sdl2 then when it was already
> > > registered from boot. Did it time out and had to be registered again?
> > >
> > > At this point, things with bcache have been bad enough, that I'm
> > > considering removing it. My system has been very unstable with it :(
> > > Or maybe I can just remove the caching devices, leave the backing ones,
> > > and run without cache for now, giving me the option to re-add the cache later.
> > >
> > > I actually just added 16GB of RAM to the server after adding bcache, and
> > > clearly going from 8 to 24GB made no difference. I'm wondering if there
> > > is a memory leak somewhere?
> > >
> > > Just before the system crashed, memory looked like this, which isn't bad:
> > >              total       used       free     shared    buffers     cached
> > > Mem:      24392440   24241440     151000          0         88   20116748
> > > -/+ buffers/cache:    4124604   20267836
> > > Swap:     15616764     244452   15372312
> > >
> > > I'm using btrfs on top of dmcrypt on top of bcache.
> > > btrfs on top of dmcrypt was stable, I had been using it for years. Adding bcache is what
> > > caused all my recent problems to start.
> > >
> > > Any ideas?
> > >
> > > Thanks,
> > > Marc
> > >
> > >
> > > [290569.458707] Process accounting resumed
> > > [290623.673871] bcache-register: page allocation failure: order:7, mode:0x24080c0
> > > [290623.697143] CPU: 0 PID: 1284 Comm: bcache-register Tainted: G        W       4.4.2-amd64-i915-volpreempt-20160214bc3 #5
> > > [290623.732143] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
> > > [290623.762428]  0000000000000000 ffff88001df7bb08 ffffffff8134ae0a 0000000000000001
> > > [290623.787470]  ffff88001df7bba0 ffffffff81124ab6 024080c01df7bc44 024080c000000040
> > > [290623.811775]  0000000700000001 0000000000000007 0000000000000007 0000000000000040
> > > [290623.836443] Call Trace:
> > > [290623.846280]  [<ffffffff8134ae0a>] dump_stack+0x44/0x55
> > > [290623.864123]  [<ffffffff81124ab6>] warn_alloc_failed+0x114/0x12c
> > > [290623.883425]  [<ffffffff811274b8>] __alloc_pages_nodemask+0x7cb/0x84c
> > > [290623.904159]  [<ffffffff8107df80>] ? try_to_wake_up+0x1d9/0x1eb
> > > [290623.923173]  [<ffffffff8115f6d7>] alloc_pages_current+0xa9/0xcd
> > > [290623.942435]  [<ffffffff8112377e>] __get_free_pages+0xe/0x3c
> > > [290623.960621]  [<ffffffffc05192f0>] bch_cache_set_alloc+0x3c7/0x479 [bcache]
> > > [290623.983825]  [<ffffffffc051a5e3>] register_bcache+0x1241/0x1452 [bcache]
> > > [290624.005456]  [<ffffffff8134c853>] kobj_attr_store+0x10/0x1a
> > > [290624.023729]  [<ffffffff811df9df>] sysfs_kf_write+0x39/0x3b
> > > [290624.041739]  [<ffffffff811df2e1>] kernfs_fop_write+0xed/0x130
> > > [290624.060394]  [<ffffffff8117d97d>] __vfs_write+0x26/0xa5
> > > [290624.077452]  [<ffffffff810b2969>] ? current_kernel_time64+0x10/0x36
> > > [290624.097921]  [<ffffffff812d6050>] ? security_file_permission+0x3b/0x42
> > > [290624.119197]  [<ffffffff810915ea>] ? percpu_down_read+0x12/0x41
> > > [290624.138945]  [<ffffffff8117fb61>] ? __sb_start_write+0x2b/0x48
> > > [290624.158291]  [<ffffffff8117dfe8>] vfs_write+0x9d/0xe8
> > > [290624.174919]  [<ffffffff8117e7bd>] SyS_write+0x4d/0x78
> > > [290624.191455]  [<ffffffff810039c3>] do_fast_syscall_32+0xb3/0xf3
> > > [290624.210331]  [<ffffffff816e3e32>] sysenter_flags_fixed+0x8/0x12
> > > [290624.229355] Mem-Info:
> > > [290624.237432] active_anon:607145 inactive_anon:208571 isolated_anon:0
> > > [290624.237432]  active_file:2103407 inactive_file:1565209 isolated_file:0
> > > [290624.237432]  unevictable:1307 dirty:188846 writeback:138775 unstable:0
> > > [290624.237432]  slab_reclaimable:180112 slab_unreclaimable:73373
> > > [290624.237432]  mapped:419640 shmem:466581 pagetables:5778 bounce:0
> > > [290624.237432]  free:988612 free_pcp:2130 free_cma:14
> > > [290624.351715] Node 0 DMA free:15884kB min:12kB low:12kB high:16kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15976kB managed:15892kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
> > > [290624.483875] lowmem_reserve[]: 0 3201 23800 23800
> > > [290624.500998] Node 0 DMA32 free:581000kB min:2648kB low:3308kB high:3972kB active_anon:229220kB inactive_anon:248156kB active_file:1102468kB inactive_file:832264kB unevictable:948kB isolated(anon):0kB isolated(file):68kB present:3362068kB managed:3283032kB mlocked:68719477684kB dirty:94316kB writeback:65996kB mapped:253400kB shmem:282168kB slab_reclaimable:96344kB slab_unreclaimable:41308kB kernel_stack:2480kB pagetables:2952kB unstable:0kB bounce:0kB free_pcp:4008kB local_pcp:140kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
> > > [290624.655741] lowmem_reserve[]: 0 0 20599 20599
> > > [290624.674232] Node 0 Normal free:3368136kB min:17048kB low:21308kB high:25572kB active_anon:2203780kB inactive_anon:586104kB active_file:7311280kB inactive_file:5430068kB unevictable:4280kB isolated(anon):0kB isolated(file):0kB present:21485568kB managed:21093516kB mlocked:481036341432kB dirty:662172kB writeback:472472kB mapped:1425288kB shmem:1584132kB slab_reclaimable:624104kB slab_unreclaimable:251680kB kernel_stack:11696kB pagetables:20180kB unstable:0kB bounce:0kB free_pcp:4304kB local_pcp:588kB free_cma:56kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
> > > [290624.834378] lowmem_reserve[]: 0 0 0 0
> > > [290624.848459] Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15884kB
> > > [290624.895688] Node 0 DMA32: 30635*4kB (UME) 23925*8kB (UME) 7478*16kB (UME) 2420*32kB (UME) 794*64kB (UME) 139*128kB (UME) 10*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 582196kB
> > > [290624.947950] Node 0 Normal: 241499*4kB (UME) 161875*8kB (UME) 33424*16kB (UME) 10915*32kB (UME) 2795*64kB (UME) 334*128kB (UME) 15*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3370532kB
> > > [290625.003724] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> > > [290625.030463] 4142937 total pagecache pages
> > > [290625.043969] 5782 pages in swap cache
> > > [290625.056118] Swap cache stats: add 102837, delete 97055, find 627323/635133
> > > [290625.078329] Free swap  = 15372324kB
> > > [290625.090190] Total swap = 15616764kB
> > > [290625.102186] 6215903 pages RAM
> > > [290625.112399] 0 pages HighMem/MovableOnly
> > > [290625.125747] 117793 pages reserved
> > > [290625.137495] 4096 pages cma reserved
> > > [290625.149199] 0 pages hwpoisoned
> > > [290625.159961] bcache: register_cache() error opening sdl2: cannot allocate memory
> > > [290625.183410] BUG: unable to handle kernel NULL pointer dereference at 00000000000009b8
> > > [290625.208648] IP: [<ffffffffc0516e8d>] cache_set_flush+0x102/0x15c [bcache]
> > > [290625.230734] PGD 0
> > > [290625.238446] Oops: 0000 [#1] SMP
> > > [290625.249857] Modules linked in: udp_diag tcp_diag inet_diag veth ip6table_filter ip6_tables ebtable_nat ebtables ppdev lp xt_addrtype tun bridge stp llc autofs4 softdog binfmt_misc ftdi_sio nfsd nfs_acl auth_rpcgss nfs fscache lockd grace sunrpc ipt_REJECT nf_reject_ipv4 xt_conntrack xt_mark xt_nat xt_tcpudp nf_log_ipv4 nf_log_common xt_LOG iptable_mangle iptable_filter bcache lm85 hwmon_vid pl2303 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 ip_tables nf_conntrack_ftp ipt_MASQUERADE x_tables nf_nat_masquerade_ipv4 nf_nat nf_conntrack sg st snd_pcm_oss snd_mixer_oss kvm_intel snd_hda_codec_realtek snd_hda_codec_generic snd_opl3_synth kvm snd_seq_midi_emul snd_hda_intel snd_cmipci gameport snd_hda_codec snd_opl3_lib snd_mpu401_uart snd_seq_midi irqbypass snd_seq_midi_ev
 ent snd_seq snd_hda_core snd_hwdep snd_rawmidi asix coretemp eeepc_wmi tpm_infineon rc_ati_x10 snd_pcm libphy snd_seq_device asus_wmi snd_timer sparse_keymap rfkill intel_rapl 8250_fintek at
 i_
>  re
> >  mo
> > >  te usbnet tpm_tis snd rc_core battery tpm evdev iosf_mbi wmi pcspkr input_leds intel_powerclamp parport_pc processor i2c_i801 xhci_pci x86_pkg_temp_thermal usbserial soundcore parport xhci_hcd lpc_ich e1000e ptp pps_core fuse raid456 multipath mmc_block mmc_core dm_snapshot dm_bufio dm_mirror dm_region_hash dm_log dm_crypt dm_mod async_raid6_recov async_pq async_xor async_memcpy async_tx blowfish_x86_64 blowfish_common ecb xts ansi_cprng drbg aesni_intel ablk_helper lrw gf128mul glue_helper aes_x86_64 crct10dif_pclmul crc32_pclmul crc32c_intel r8169 ehci_pci fjes ehci_hcd thermal mii sata_mv fan cryptd usbcore usb_common sata_sil24 [last unloaded: ftdi_sio]
> > > [290625.713109] CPU: 0 PID: 25842 Comm: kworker/0:0 Tainted: G        W       4.4.2-amd64-i915-volpreempt-20160214bc3 #5
> > > [290625.747597] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
> > > [290625.776918] Workqueue: events cache_set_flush [bcache]
> > > [290625.794260] task: ffff880467caa0c0 ti: ffff8805af184000 task.ti: ffff8805af184000
> > > [290625.818544] RIP: 0010:[<ffffffffc0516e8d>]  [<ffffffffc0516e8d>] cache_set_flush+0x102/0x15c [bcache]
> > > [290625.848082] RSP: 0018:ffff8805af187e10  EFLAGS: 00010202
> > > [290625.866890] RAX: 0000000000000001 RBX: ffff880153a40ce8 RCX: 0000000000000282
> > > [290625.891136] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> > > [290625.914365] RBP: ffff8805af187e30 R08: 00000000000000fa R09: ffffe8ffffc01900
> > > [290625.937588] R10: ffff880088430838 R11: ffff880088430838 R12: 0000000000000000
> > > [290625.960781] R13: ffff880153a40de8 R14: 0000000000000000 R15: ffffe8ffffc01900
> > > [290625.983951] FS:  0000000000000000(0000) GS:ffff88061e200000(0000) knlGS:0000000000000000
> > > [290626.010918] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [290626.030800] CR2: 00000000000009b8 CR3: 0000000001e0a000 CR4: 00000000001406f0
> > > [290626.053914] Stack:
> > > [290626.061687]  ffff8805d6183280 ffff88061e216040 ffff880153a40ce8 0000000000000000
> > > [290626.085725]  ffff8805af187e78 ffffffff81070fcf ffff8805d61832b0 000000001e216058
> > > [290626.110510]  ffff8805d6183280 ffff88061e216040 ffff8805d61832b0 ffff88061e216058
> > > [290626.135329] Call Trace:
> > > [290626.145108]  [<ffffffff81070fcf>] process_one_work+0x15b/0x260
> > > [290626.164241]  [<ffffffff81071574>] worker_thread+0x1f0/0x29d
> > > [290626.182561]  [<ffffffff81071384>] ? rescuer_thread+0x281/0x281
> > > [290626.201721]  [<ffffffff810758e9>] kthread+0xa5/0xad
> > > [290626.217966]  [<ffffffff81075844>] ? kthread_parkme+0x24/0x24
> > > [290626.236505]  [<ffffffff816e1c0f>] ret_from_fork+0x3f/0x70
> > > [290626.255052]  [<ffffffff81075844>] ? kthread_parkme+0x24/0x24
> > > [290626.274460] Code: 89 1c c1 4d 8b a4 24 f0 01 00 00 49 81 ec f0 01 00 00 eb b4 45 31 e4 0f b7 83 4c f7 ff ff 4a 8b 94 e3 58 ff ff ff 44 39 e0 76 16 <48> 8b ba b8 09 00 00 48 85 ff 74 05 e8 61 e8 b5 c0 49 ff c4 eb
> > > [290626.337443] RIP  [<ffffffffc0516e8d>] cache_set_flush+0x102/0x15c [bcache]
> > > [290626.359833]  RSP <ffff8805af187e10>
> > > [290626.372050] CR2: 00000000000009b8
> > > [290626.383646] ---[ end trace f97d88b8bb456800 ]---
> > > [290626.399977] Kernel panic - not syncing: Fatal exception
> > > [290626.418169] Kernel Offset: disabled
> > > [290626.430278] Rebooting in 20 seconds..
> > > [290646.378903] ACPI MEMORY or I/O RESET_REG.
> > > --
> > > "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> > > Microsoft is to operating systems ....
> > >                                       .... what McDonalds is to gourmet cooking
> > > Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Bcache still unstable for me (memory problems)
  2016-03-10  1:34   ` Eric Wheeler
@ 2016-03-10  2:15     ` Marc MERLIN
  2016-03-10  7:07       ` sysrq-o for safe bcache shutdown, what about reboot? Eric Wheeler
  2016-03-10 15:29       ` Bcache still unstable for me (memory problems) Marc MERLIN
  2016-03-21  0:05     ` Richard Bade
  1 sibling, 2 replies; 20+ messages in thread
From: Marc MERLIN @ 2016-03-10  2:15 UTC (permalink / raw)
  To: Eric Wheeler; +Cc: Richard Bade, linux-bcache

On Thu, Mar 10, 2016 at 01:34:56AM +0000, Eric Wheeler wrote:
> Hi Richard, Marc,
> 
> >>> [290623.673871] bcache-register: page allocation failure: order:7, mode:0x24080c0
> 
> Do you still have the backtraces that show the function call stack for 
> errors that look like this?
> 	%s: page allocation failure: order:%d, mode:0x%x 
> 
> Please send as many relevant OOM failure traces that you can.  I would 
> like to see which memory allocation(s) are failing and if they are always 
> the same stack trace.
 
It's the same one I already sent you, just from syslog instead of serial
console (I was looking for other relevant cronjobs or errors per your
request)

> In the example above, order 7 means 2^7 of 4k pages, so it means the 
> kernel can't find 512k of contiguous memory that can be allocated.
> 
> It looks like the OOM is triggered in bch_cache_set_alloc, but might be 
> cache_alloc too.  I'm not sure if an alternate allocation mechanism can be 
> used safely, but thats what I want to look into.

That was before your patches of course, so I'll report back further
crashes if any.

By the way, slightly related question. If I have a slightly hung system
that will not reboot with 'reboot', if I use sysrq - e + u + s + b, I
get:
[213056.198133] sysrq: SysRq : Emergency Remount R/O
[213058.266112] sysrq: SysRq : Emergency Sync
[213061.704158] sysrq: SysRq : Resetting
[213061.716559] ACPI MEMORY or I/O RESET_REG.

This does not properly stop bcache (I believe) or sw raid, or flush
things properly.
Instead of 'b', I usually use 'o', it does properly shut everything
down, flush all IO and everything, but then also turns off my machine,
and I have to rely on wake on lan to bring it back up, which mostly
works, until maybe it won't one day :)

'o' gives me the much reassuring:
[ 1744.758691] sysrq: SysRq : Emergency Remount R/O
[ 1745.867719] sysrq: SysRq : Emergency Sync
[ 1747.482890] sysrq: SysRq : Power Off
[ 1754.242984] Emergency Remount complete
[ 1758.535234] bcache: bcache_reboot() Stopping all devices:
[ 1758.551562] bcache: bcache_device_free() bcache0 stopped
[ 1760.539050] bcache: bcache_reboot() Timeout waiting for devices to be closed
[ 1760.560249] kvm: exiting hardware virtualization
[ 1760.574844] sd 17:0:0:0: [sdr] Synchronizing SCSI cache
[ 1760.590730] sd 17:0:0:0: [sdr] Stopping disk
[ 1760.891076] sd 16:0:0:0: [sdq] Synchronizing SCSI cache
[ 1760.911070] sd 16:0:0:0: [sdq] Stopping disk
[ 1761.219149] sd 15:0:0:0: [sdp] Synchronizing SCSI cache
[ 1761.235053] sd 15:0:0:0: [sdp] Stopping disk
[ 1761.535120] sd 14:0:0:0: [sdo] Synchronizing SCSI cache
[ 1761.555095] sd 14:0:0:0: [sdo] Stopping disk
[ 1761.855112] sd 13:0:0:0: [sdn] Synchronizing SCSI cache
[ 1761.870920] sd 13:0:0:0: [sdn] Stopping disk
[ 1762.751983] sd 11:4:0:0: [sdm] Synchronizing SCSI cache
[ 1762.767882] sd 11:4:0:0: [sdm] Stopping disk
[ 1763.191203] sd 11:3:0:0: [sdl] Synchronizing SCSI cache
[ 1763.207428] sd 11:3:0:0: [sdl] Stopping disk
[ 1763.631534] sd 11:2:0:0: [sdk] Synchronizing SCSI cache
[ 1763.647524] sd 11:2:0:0: [sdk] Stopping disk
[ 1764.071512] sd 11:1:0:0: [sdj] Synchronizing SCSI cache
[ 1764.087396] sd 11:1:0:0: [sdj] Stopping disk
[ 1764.510467] sd 11:0:0:0: [sdi] Synchronizing SCSI cache
[ 1764.526819] sd 11:0:0:0: [sdi] Stopping disk
[ 1764.950319] sd 9:0:0:0: [sdh] Synchronizing SCSI cache
[ 1764.966079] sd 9:0:0:0: [sdh] Stopping disk
[ 1765.960508] sd 8:0:0:0: [sdg] Synchronizing SCSI cache
[ 1765.978370] sd 8:0:0:0: [sdg] Stopping disk
[ 1766.278896] r8169 0000:05:00.0: System wakeup enabled by ACPI
[ 1766.442869] sd 3:0:0:0: [sdf] Synchronizing SCSI cache
[ 1766.519912] sd 3:0:0:0: [sdf] Stopping disk
[ 1767.014799] sd 2:0:0:0: [sde] Synchronizing SCSI cache
[ 1767.042979] sd 2:0:0:0: [sde] Stopping disk
[ 1767.864325] sd 1:0:1:0: [sdd] Synchronizing SCSI cache
[ 1767.976656] sd 1:0:1:0: [sdd] Stopping disk
[ 1768.754903] sd 1:0:0:0: [sdc] Synchronizing SCSI cache
[ 1770.197116] sd 1:0:0:0: [sdc] Stopping disk
[ 1771.084250] sd 0:0:1:0: [sdb] Synchronizing SCSI cache
[ 1771.125229] sd 0:0:1:0: [sdb] Stopping disk
[ 1771.558552] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[ 1771.574145] sd 0:0:0:0: [sda] Stopping disk
[ 1772.008787] ACPI: Preparing to enter system sleep state S5
[ 1772.026660] reboot: Power down
[ 1772.037064] acpi_power_off called

Is there another way to get a proper flush of everything and still
reboot instead of powering off?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: sysrq-o for safe bcache shutdown, what about reboot?
  2016-03-10  2:15     ` Marc MERLIN
@ 2016-03-10  7:07       ` Eric Wheeler
  2016-03-10 17:46         ` Marc MERLIN
  2016-03-10 15:29       ` Bcache still unstable for me (memory problems) Marc MERLIN
  1 sibling, 1 reply; 20+ messages in thread
From: Eric Wheeler @ 2016-03-10  7:07 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: Richard Bade, linux-bcache

On Wed, 9 Mar 2016, Marc MERLIN wrote:
> That was before your patches of course, so I'll report back further
> crashes if any.
> 
> By the way, slightly related question. If I have a slightly hung system
> that will not reboot with 'reboot', if I use sysrq - e + u + s + b, I

I use r-e-i-s-u-b, but o is better I guess if you're at the console.  In 
severe situations, o might not work where b might.

IMHO, s before u.  If u fails, s may not complete depending on how hard 
unmount(s) failed.  But if there's a counter argument for u before s then 
I'd love to hear it.

bcache does he a reboot notifier in super.c:bcache_reboot(), not sure if 
's' will sync it, but if it syncs all filesystems, it would sync all upper 
layers down through bcache.  Probably.  I'm not sure if 's' or 'u' could 
be hooked by bcache.  

It would be interesting to add a sysrq that calls all reboot notifiers, or 
does an 'o' but reboots instead.  It would be interesting to hunt these 
down and find out what they do, exactly.  hmmm...

http://lxr.free-electrons.com/source/Documentation/sysrq.txt
 196 *  I want to add SysRQ key events to a module, how does it work?
 198 In order to register a basic function with the table, you must first include
 199 the header 'include/linux/sysrq.h' [...]

Hey, try this and see if it works.  Use '.' as the sysrq key for reboot or 
`echo . > /proc/sysrq-trigger`.  Not sure if that's a valid key from 
sysrq, but if it does what you like, I'll post to lmkl and see what they 
have to say about it:

========================================
diff --git a/kernel/reboot.c b/kernel/reboot.c
index d20c85d..e680383 100644
--- a/kernel/reboot.c
+++ b/kernel/reboot.c
@@ -16,6 +16,7 @@
 #include <linux/syscalls.h>
 #include <linux/syscore_ops.h>
 #include <linux/uaccess.h>
+#include <linux/sysrq.h>
 
 /*
  * this indicates whether you can reboot with ctrl-alt-del: the default is yes
@@ -555,3 +556,42 @@ static int __init reboot_setup(char *str)
 	return 1;
 }
 __setup("reboot=", reboot_setup);
+
+
+/*
+ * When the user hits Sys-Rq '.' to kernel_restart the machine this is the
+ * callback we use.
+ */
+
+static void do_krestart(struct work_struct *dummy)
+{
+	/* are these two necessary? */
+	lockdep_off();
+	local_irq_enable();
+
+	/* restart */
+	kernel_restart(NULL);
+}
+
+static DECLARE_WORK(krestart_work, do_krestart);
+
+static void handle_krestart(int key)
+{
+	/* run sysrq krestart on boot cpu */
+	schedule_work_on(cpumask_first(cpu_online_mask), &krestart_work);
+}
+
+static struct sysrq_key_op	sysrq_krestart_op = {
+	.handler        = handle_krestart,
+	.help_msg       = "kernel_restart(.)",
+	.action_msg     = "Gracefullish Restart",
+	.enable_mask	= SYSRQ_ENABLE_BOOT,
+};
+
+static int __init pm_sysrq_krestart_init(void)
+{
+	register_sysrq_key('.', &sysrq_krestart_op);
+	return 0;
+}
+
+subsys_initcall(pm_sysrq_krestart_init);
========================================

-Eric



> get:
> [213056.198133] sysrq: SysRq : Emergency Remount R/O
> [213058.266112] sysrq: SysRq : Emergency Sync
> [213061.704158] sysrq: SysRq : Resetting
> [213061.716559] ACPI MEMORY or I/O RESET_REG.
> 
> This does not properly stop bcache (I believe) or sw raid, or flush
> things properly.
> Instead of 'b', I usually use 'o', it does properly shut everything
> down, flush all IO and everything, but then also turns off my machine,
> and I have to rely on wake on lan to bring it back up, which mostly
> works, until maybe it won't one day :)
> 
> 'o' gives me the much reassuring:
> [ 1744.758691] sysrq: SysRq : Emergency Remount R/O
> [ 1745.867719] sysrq: SysRq : Emergency Sync
> [ 1747.482890] sysrq: SysRq : Power Off
> [ 1754.242984] Emergency Remount complete
> [ 1758.535234] bcache: bcache_reboot() Stopping all devices:
> [ 1758.551562] bcache: bcache_device_free() bcache0 stopped
> [ 1760.539050] bcache: bcache_reboot() Timeout waiting for devices to be closed
> [ 1760.560249] kvm: exiting hardware virtualization
> [ 1760.574844] sd 17:0:0:0: [sdr] Synchronizing SCSI cache
> [ 1760.590730] sd 17:0:0:0: [sdr] Stopping disk
> [ 1760.891076] sd 16:0:0:0: [sdq] Synchronizing SCSI cache
> [ 1760.911070] sd 16:0:0:0: [sdq] Stopping disk
> [ 1761.219149] sd 15:0:0:0: [sdp] Synchronizing SCSI cache
> [ 1761.235053] sd 15:0:0:0: [sdp] Stopping disk
> [ 1761.535120] sd 14:0:0:0: [sdo] Synchronizing SCSI cache
> [ 1761.555095] sd 14:0:0:0: [sdo] Stopping disk
> [ 1761.855112] sd 13:0:0:0: [sdn] Synchronizing SCSI cache
> [ 1761.870920] sd 13:0:0:0: [sdn] Stopping disk
> [ 1762.751983] sd 11:4:0:0: [sdm] Synchronizing SCSI cache
> [ 1762.767882] sd 11:4:0:0: [sdm] Stopping disk
> [ 1763.191203] sd 11:3:0:0: [sdl] Synchronizing SCSI cache
> [ 1763.207428] sd 11:3:0:0: [sdl] Stopping disk
> [ 1763.631534] sd 11:2:0:0: [sdk] Synchronizing SCSI cache
> [ 1763.647524] sd 11:2:0:0: [sdk] Stopping disk
> [ 1764.071512] sd 11:1:0:0: [sdj] Synchronizing SCSI cache
> [ 1764.087396] sd 11:1:0:0: [sdj] Stopping disk
> [ 1764.510467] sd 11:0:0:0: [sdi] Synchronizing SCSI cache
> [ 1764.526819] sd 11:0:0:0: [sdi] Stopping disk
> [ 1764.950319] sd 9:0:0:0: [sdh] Synchronizing SCSI cache
> [ 1764.966079] sd 9:0:0:0: [sdh] Stopping disk
> [ 1765.960508] sd 8:0:0:0: [sdg] Synchronizing SCSI cache
> [ 1765.978370] sd 8:0:0:0: [sdg] Stopping disk
> [ 1766.278896] r8169 0000:05:00.0: System wakeup enabled by ACPI
> [ 1766.442869] sd 3:0:0:0: [sdf] Synchronizing SCSI cache
> [ 1766.519912] sd 3:0:0:0: [sdf] Stopping disk
> [ 1767.014799] sd 2:0:0:0: [sde] Synchronizing SCSI cache
> [ 1767.042979] sd 2:0:0:0: [sde] Stopping disk
> [ 1767.864325] sd 1:0:1:0: [sdd] Synchronizing SCSI cache
> [ 1767.976656] sd 1:0:1:0: [sdd] Stopping disk
> [ 1768.754903] sd 1:0:0:0: [sdc] Synchronizing SCSI cache
> [ 1770.197116] sd 1:0:0:0: [sdc] Stopping disk
> [ 1771.084250] sd 0:0:1:0: [sdb] Synchronizing SCSI cache
> [ 1771.125229] sd 0:0:1:0: [sdb] Stopping disk
> [ 1771.558552] sd 0:0:0:0: [sda] Synchronizing SCSI cache
> [ 1771.574145] sd 0:0:0:0: [sda] Stopping disk
> [ 1772.008787] ACPI: Preparing to enter system sleep state S5
> [ 1772.026660] reboot: Power down
> [ 1772.037064] acpi_power_off called
> 
> Is there another way to get a proper flush of everything and still
> reboot instead of powering off?
> 
> Thanks,
> Marc
> -- 
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems ....
>                                       .... what McDonalds is to gourmet cooking
> Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901
> 

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: Bcache still unstable for me (memory problems)
  2016-03-10  2:15     ` Marc MERLIN
  2016-03-10  7:07       ` sysrq-o for safe bcache shutdown, what about reboot? Eric Wheeler
@ 2016-03-10 15:29       ` Marc MERLIN
  2016-03-10 15:41         ` Christoph Nelles
  2016-03-24 21:25         ` Marc MERLIN
  1 sibling, 2 replies; 20+ messages in thread
From: Marc MERLIN @ 2016-03-10 15:29 UTC (permalink / raw)
  To: Eric Wheeler; +Cc: Richard Bade, linux-bcache

On Wed, Mar 09, 2016 at 06:15:16PM -0800, Marc MERLIN wrote:
> On Thu, Mar 10, 2016 at 01:34:56AM +0000, Eric Wheeler wrote:
> > Hi Richard, Marc,
> > 
> > >>> [290623.673871] bcache-register: page allocation failure: order:7, mode:0x24080c0
> > 
> > Do you still have the backtraces that show the function call stack for 
> > errors that look like this?
> > 	%s: page allocation failure: order:%d, mode:0x%x 
> > 
> > Please send as many relevant OOM failure traces that you can.  I would 
> > like to see which memory allocation(s) are failing and if they are always 
> > the same stack trace.
>  
> It's the same one I already sent you, just from syslog instead of serial
> console (I was looking for other relevant cronjobs or errors per your
> request)
 
Ok, here's a new one with your patches installed. Good thing is that
kernel didn't crash. But adding caching device still failed.

I can probably fix this by rebooting and re-registering my device at boot, but 
before I do this, an idea why I'm still getting memory issues?

gargamel:~# cat /proc/sys/vm/min_free_kbytes
262144

gargamel:/sys/block/md5/bcache# echo /dev/sdl2 > /sys/fs/bcache/register
bash: echo: write error: Invalid argument

[34688.183049] bash: page allocation failure: order:7, mode:0x24080c0
[34688.203133] CPU: 0 PID: 25275 Comm: bash Not tainted 4.4.2-amd64-i915-volpreempt-20160214bc4 #6
[34688.230667] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
[34688.259444]  0000000000000000 ffff88043e58bb30 ffffffff8134ae0a 0000000000000001
[34688.284107]  ffff88043e58bbc8 ffffffff81124ab6 024080c03e58bc6c 024080c000000040
[34688.308281]  0000000700000001 0000000000000007 0000000000000007 0000000000000040
[34688.332378] Call Trace:
[34688.341070]  [<ffffffff8134ae0a>] dump_stack+0x44/0x55
[34688.358078]  [<ffffffff81124ab6>] warn_alloc_failed+0x114/0x12c
[34688.377326]  [<ffffffff811274b8>] __alloc_pages_nodemask+0x7cb/0x84c
[34688.397732]  [<ffffffff8115f6d7>] alloc_pages_current+0xa9/0xcd
[34688.417080]  [<ffffffff8112377e>] __get_free_pages+0xe/0x3c
[34688.435374]  [<ffffffffc05a631f>] register_bcache+0xf62/0x143c [bcache]
[34688.456857]  [<ffffffff8134c853>] kobj_attr_store+0x10/0x1a
[34688.475189]  [<ffffffff811df9df>] sysfs_kf_write+0x39/0x3b
[34688.493102]  [<ffffffff811df2e1>] kernfs_fop_write+0xed/0x130
[34688.511850]  [<ffffffff8117d97d>] __vfs_write+0x26/0xa5
[34688.528864]  [<ffffffff8117f228>] ? fput+0x16/0x88
[34688.545066]  [<ffffffff810b2969>] ? current_kernel_time64+0x10/0x36
[34688.565037]  [<ffffffff812d6050>] ? security_file_permission+0x3b/0x42
[34688.585872]  [<ffffffff810915ea>] ? percpu_down_read+0x12/0x41
[34688.604737]  [<ffffffff8117fb61>] ? __sb_start_write+0x2b/0x48
[34688.623560]  [<ffffffff8117dfe8>] vfs_write+0x9d/0xe8
[34688.640019]  [<ffffffff8117e7bd>] SyS_write+0x4d/0x78
[34688.656471]  [<ffffffff810039c3>] do_fast_syscall_32+0xb3/0xf3
[34688.675266]  [<ffffffff816e3e32>] sysenter_flags_fixed+0x8/0x12
[34688.694310] Mem-Info:
[34688.702368] active_anon:565495 inactive_anon:189125 isolated_anon:0
[34688.702368]  active_file:1996870 inactive_file:2265222 isolated_file:0
[34688.702368]  unevictable:1279 dirty:54238 writeback:0 unstable:0
[34688.702368]  slab_reclaimable:406092 slab_unreclaimable:66680
[34688.702368]  mapped:426230 shmem:411337 pagetables:5114 bounce:0
[34688.702368]  free:299621 free_pcp:762 free_cma:67
[34688.813209] Node 0 DMA free:15876kB min:168kB low:208kB high:252kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15976kB managed:15892kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[34688.941523] lowmem_reserve[]: 0 3201 23800 23800
[34688.957205] Node 0 DMA32 free:425612kB min:35280kB low:44100kB high:52920kB active_anon:194436kB inactive_anon:205776kB active_file:718524kB inactive_file:1408480kB unevictable:584kB isolated(anon):0kB isolated(file):0kB present:3362068kB managed:3283032kB mlocked:68719477320kB dirty:2440kB writeback:0kB mapped:253824kB shmem:248508kB slab_reclaimable:118200kB slab_unreclaimable:34760kB kernel_stack:2304kB pagetables:1944kB unstable:0kB bounce:0kB free_pcp:1804kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[34689.105915] lowmem_reserve[]: 0 0 20599 20599
[34689.120492] Node 0 Normal free:755312kB min:226688kB low:283360kB high:340032kB active_anon:2068316kB inactive_anon:550724kB active_file:7268956kB inactive_file:7653284kB unevictable:4532kB isolated(anon):0kB isolated(file):0kB present:21485568kB managed:21093516kB mlocked:51539612084kB dirty:215820kB writeback:0kB mapped:1451096kB shmem:1396840kB slab_reclaimable:1506168kB slab_unreclaimable:231960kB kernel_stack:9248kB pagetables:18512kB unstable:0kB bounce:0kB free_pcp:1940kB local_pcp:92kB free_cma:268kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[34689.274142] lowmem_reserve[]: 0 0 0 0
[34689.286213] Node 0 DMA: 1*4kB (U) 0*8kB 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15876kB
[34689.328035] Node 0 DMA32: 42110*4kB (UME) 31724*8kB (UME) 79*16kB (UME) 5*32kB (UM) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 423656kB
[34689.371488] Node 0 Normal: 154990*4kB (UMEC) 16220*8kB (UMEC) 356*16kB (UME) 3*32kB (U) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 755512kB
[34689.416083] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[34689.442305] 4676126 total pagecache pages
[34689.455229] 1342 pages in swap cache
[34689.466865] Swap cache stats: add 7173, delete 5831, find 3279/3833
[34689.486591] Free swap  = 15592768kB
[34689.497974] Total swap = 15616764kB
[34689.509335] 6215903 pages RAM
[34689.519147] 0 pages HighMem/MovableOnly
[34689.531592] 117793 pages reserved
[34689.542475] 4096 pages cma reserved
[34689.553954] 0 pages hwpoisoned
[34689.564116] bcache: register_bcache() error opening /dev/sdl2: (null)

gargamel:~# cat /proc/meminfo 
MemTotal:       24392440 kB
MemFree:         1345756 kB
MemAvailable:   18880940 kB
Buffers:           15068 kB
Cached:         18575392 kB
SwapCached:         5356 kB
Active:         10383364 kB
Inactive:        9582652 kB
Active(anon):    2264292 kB
Inactive(anon):   759480 kB
Active(file):    8119072 kB
Inactive(file):  8823172 kB
Unevictable:        5116 kB
Mlocked:        120259089404 kB
SwapTotal:      15616764 kB
SwapFree:       15592168 kB
Dirty:            237576 kB
Writeback:             4 kB
AnonPages:       1375628 kB
Mapped:          1705744 kB
Shmem:           1645364 kB
Slab:            1842444 kB
SReclaimable:    1575944 kB
SUnreclaim:       266500 kB
KernelStack:       11488 kB
PageTables:        20376 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    27812984 kB
Committed_AS:    7463904 kB
VmallocTotal:   34359738367 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
CmaTotal:          16384 kB
CmaFree:             540 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      394516 kB
DirectMap2M:    24469504 kB

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Bcache still unstable for me (memory problems)
  2016-03-10 15:29       ` Bcache still unstable for me (memory problems) Marc MERLIN
@ 2016-03-10 15:41         ` Christoph Nelles
  2016-03-10 15:47           ` Marc MERLIN
  2016-03-24 21:25         ` Marc MERLIN
  1 sibling, 1 reply; 20+ messages in thread
From: Christoph Nelles @ 2016-03-10 15:41 UTC (permalink / raw)
  Cc: linux-bcache


Am 10.03.2016 um 16:29 schrieb Marc MERLIN:
> On Wed, Mar 09, 2016 at 06:15:16PM -0800, Marc MERLIN wrote:
>> On Thu, Mar 10, 2016 at 01:34:56AM +0000, Eric Wheeler wrote:
>>> Hi Richard, Marc,
>>>
>>>>>> [290623.673871] bcache-register: page allocation failure: order:7, mode:0x24080c0
>>> Do you still have the backtraces that show the function call stack for
>>> errors that look like this?
>>> 	%s: page allocation failure: order:%d, mode:0x%x
>>>
>>> Please send as many relevant OOM failure traces that you can.  I would
>>> like to see which memory allocation(s) are failing and if they are always
>>> the same stack trace.
>>   
>> It's the same one I already sent you, just from syslog instead of serial
>> console (I was looking for other relevant cronjobs or errors per your
>> request)
>   
>
> gargamel:~# cat /proc/meminfo
> MemTotal:       24392440 kB
> Mlocked:        120259089404 kB
> VmallocTotal:   34359738367 kB
>
> Thanks,
> Marc

Just out of curiosity, how can 112TB be in the state mlocked on a 24GB 
machine?

Regards

Christoph

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Bcache still unstable for me (memory problems)
  2016-03-10 15:41         ` Christoph Nelles
@ 2016-03-10 15:47           ` Marc MERLIN
  0 siblings, 0 replies; 20+ messages in thread
From: Marc MERLIN @ 2016-03-10 15:47 UTC (permalink / raw)
  To: Christoph Nelles; +Cc: linux-bcache

On Thu, Mar 10, 2016 at 04:41:09PM +0100, Christoph Nelles wrote:
> >gargamel:~# cat /proc/meminfo
> >MemTotal:       24392440 kB
> >Mlocked:        120259089404 kB
> >VmallocTotal:   34359738367 kB
> >
> >Thanks,
> >Marc
> 
> Just out of curiosity, how can 112TB be in the state mlocked on a 24GB
> machine?

That's a very good question. Sadly I'm not knowledgeable about the
different kinds of memory pools and their exactly meaning/use, so I have
no idea.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: sysrq-o for safe bcache shutdown, what about reboot?
  2016-03-10  7:07       ` sysrq-o for safe bcache shutdown, what about reboot? Eric Wheeler
@ 2016-03-10 17:46         ` Marc MERLIN
  0 siblings, 0 replies; 20+ messages in thread
From: Marc MERLIN @ 2016-03-10 17:46 UTC (permalink / raw)
  To: Eric Wheeler; +Cc: Richard Bade, linux-bcache

On Thu, Mar 10, 2016 at 07:07:02AM +0000, Eric Wheeler wrote:
> It would be interesting to add a sysrq that calls all reboot notifiers, or 
> does an 'o' but reboots instead.  It would be interesting to hunt these 
> down and find out what they do, exactly.  hmmm...
> 
> http://lxr.free-electrons.com/source/Documentation/sysrq.txt
>  196 *  I want to add SysRQ key events to a module, how does it work?
>  198 In order to register a basic function with the table, you must first include
>  199 the header 'include/linux/sysrq.h' [...]
> 
> Hey, try this and see if it works.  Use '.' as the sysrq key for reboot or 
> `echo . > /proc/sysrq-trigger`.  Not sure if that's a valid key from 
> sysrq, but if it does what you like, I'll post to lmkl and see what they 
> have to say about it:

Eh, talk about extra credit, thanks much for that.

I changed the key to 'x' because '.' didn't work, but otherwise it seems
to have worked fine, so thanks a bunch for that.
I'm surprised that we'd be the first people to notice this need, but
it's never too late to make linux better :) so thanks for your patch,
and indeed I think it's worth sending upstream for consideration.
 
[  622.392830] sysrq: SysRq : HELP : loglevel(0-9) reboot(b) crash(c) terminate-all-tasks(e) memory-full-oom-kill(f) kill-all-tasks(i) thaw-filesystems(j) sak(k) show-backtrace-all-active-cpus(l) show-memory-usage(m) nice-all-RT-tasks(n) poweroff(o) show-registers(p) show-all-timers(q) unraw(r) sync(s) show-task-states(t) unmount(u) force-fb(V) show-blocked-tasks(w) kernel_restart(x) dump-ftrace-buffer(z) 
[  630.587827] sysrq: SysRq : Gracefullish Restart
[  653.840897] bcache: bcache_reboot() Stopping all devices:
[  653.857210] bcache: bcache_device_free() bcache0 stopped
[  655.841896] bcache: bcache_reboot() Timeout waiting for devices to be closed
[  655.863099] kvm: exiting hardware virtualization
[  655.877707] sd 17:0:0:0: [sdr] Synchronizing SCSI cache
[  655.897948] sd 16:0:0:0: [sdq] Synchronizing SCSI cache
[  655.917940] sd 15:0:0:0: [sdp] Synchronizing SCSI cache
[  655.937941] sd 14:0:0:0: [sdo] Synchronizing SCSI cache
[  655.957938] sd 13:0:0:0: [sdn] Synchronizing SCSI cache
[  655.973718] sd 11:4:0:0: [sdm] Synchronizing SCSI cache
[  655.991024] sd 11:3:0:0: [sdl] Synchronizing SCSI cache
[  656.008218] sd 11:2:0:0: [sdk] Synchronizing SCSI cache
[  656.025367] sd 11:1:0:0: [sdj] Synchronizing SCSI cache
[  656.042531] sd 11:0:0:0: [sdi] Synchronizing SCSI cache
[  656.059546] sd 9:0:0:0: [sdh] Synchronizing SCSI cache
[  656.076342] sd 8:0:0:0: [sdg] Synchronizing SCSI cache
[  656.167161] sd 3:0:0:0: [sdf] Synchronizing SCSI cache
[  656.184725] sd 2:0:0:0: [sde] Synchronizing SCSI cache
[  656.202491] sd 1:0:1:0: [sdd] Synchronizing SCSI cache
[  656.219932] sd 1:0:0:0: [sdc] Synchronizing SCSI cache
[  656.237140] sd 0:0:1:0: [sdb] Synchronizing SCSI cache
[  656.254206] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[  656.271897] reboot: Restarting system
[  656.284158] reboot: machine restart
[  656.296061] ACPI MEMORY or I/O RESET_REG.

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Bcache still unstable for me (memory problems)
  2016-03-10  1:34   ` Eric Wheeler
  2016-03-10  2:15     ` Marc MERLIN
@ 2016-03-21  0:05     ` Richard Bade
  2016-03-21  0:46       ` Marc MERLIN
  2016-03-25  3:59       ` Eric Wheeler
  1 sibling, 2 replies; 20+ messages in thread
From: Richard Bade @ 2016-03-21  0:05 UTC (permalink / raw)
  To: Eric Wheeler; +Cc: linux-bcache, Marc MERLIN

Hi Eric,
We built a kernel with the 3 patches applied and I still got the box to lock up.
I don't have any of the traces any more. Also I'm wondering what is
the best way to get these? When this happens my machine completely
locks up so can't grab anything from dmesg.
Let me know what you want me to grab and I'll crash it again.

Regards,
Richard

On 10 March 2016 at 14:34, Eric Wheeler <bcache@lists.ewheeler.net> wrote:
> Hi Richard, Marc,
>
>>>> [290623.673871] bcache-register: page allocation failure: order:7, mode:0x24080c0
>
> Do you still have the backtraces that show the function call stack for
> errors that look like this?
>         %s: page allocation failure: order:%d, mode:0x%x
>
> Please send as many relevant OOM failure traces that you can.  I would
> like to see which memory allocation(s) are failing and if they are always
> the same stack trace.
>
> In the example above, order 7 means 2^7 of 4k pages, so it means the
> kernel can't find 512k of contiguous memory that can be allocated.
>
> It looks like the OOM is triggered in bch_cache_set_alloc, but might be
> cache_alloc too.  I'm not sure if an alternate allocation mechanism can be
> used safely, but thats what I want to look into.
>
> Thanks!
>
> -Eric
>
> --
> Eric Wheeler
>
> On Wed, 9 Mar 2016, Eric Wheeler wrote:
>
>> On Wed, 9 Mar 2016, Richard Bade wrote:
>>
>> > Hi Guys,
>> > I've also seen this issue when registering a device.
>> > [Fri Feb  5 14:06:59 2016] bcache: register_cache() error opening
>> > dm-6: cannot allocate memory
>> >
>> > For me the situation happens because I'm running Ceph OSD's on the
>> > host. These are using all the available memory and for some reason the
>> > register cannot allocate memory.
>> > I've reproduced this several times, but basically if I register on a
>> > system that has been up for a while and therefore using all it's RAM
>> > this will happen.
>> > Here's two specific time's it's happened:
>> > when creating using "make-bcache --cache /dev/vg-raid0/lv-ceph_j19
>> > --bdev /dev/sdf1". It failed due to xfs still remaining on /dev/sdf1.
>>
>> Do you get a kernel failure, or is it just make-bcache being safe?
>>
>> > Then run wipefs and "make-bcache --cache /dev/vg-raid0/lv-ceph_j19
>> > --bdev /dev/sdf1" again. CRASH.
>>
>> make-bcache will auto register.  (Maybe there's an option to stop that,
>> I've not looked.)
>>
>> > when manually creating a cache device using "make-bcache --cache
>> > /dev/vg-raid0/lv-bcache". The cache set didn't show up in
>> > /sys/fs/bcache. echo "/dev/vg-raid0/lv-bcache" >
>> > /sys/fs/bcache/register. CRASH.
>> > I was able to work around this issue by stopping the osd's before
>> > registering and therefore freeing up memory.
>>
>> I would be curious if setting /proc/sys/vm/min_free_kbytes to 256*1024
>> fixes that.
>>
>> > Because I am able to reproduce this relatively easily, I'm happy to
>> > test any patches.
>>
>> Yes, thank you, please do:
>>
>> Try the three patches at the top of these two branches and see if you
>> crash.  I'm hoping it fails gracefully on OOM:
>>
>> https://bitbucket.org/ewheelerinc/linux/branch/v4.5-rc6-bcache-fixes
>>       commits 84a2edd, a704484
>> https://bitbucket.org/ewheelerinc/linux/branch/v4.5-rc7-bcache-fixes
>>       commit 87264e9
>>
>> If it stops crashing, then maybe we can work out a way to allocate memory
>> such that it succeeds.  Please send any dmesg output.
>>
>> -Eric
>>
>> > Apart from the register crash, the cache sets have been up and stable
>> > for over 1 month.
>> >
>> > Regards,
>> > Richard
>> >
>> > On Mon, 7 Mar 2016, Marc MERLIN wrote:
>> >
>> > > I started a btrfs copy from a bcached device to another one (both are in
>> > > the 5 to 15TB range).
>> > > A minute later, my system that had been up for several days, crashed.
>> > >
>> > > Ultimately, it crashed on this:
>> > > bcache: register_cache() error opening sdl2: cannot allocate memory
>> > > BUG: unable to handle kernel NULL pointer dereference at 00000000000009b8
>> > > IP: [<ffffffffc0516e8d>] cache_set_flush+0x102/0x15c [bcache]
>> >
>> > Strange about memory allocation issues.  Do you have
>> > /proc/sys/vm/min_free_kbytes set to something like $((256*1024)) ? Is this
>> > a multi-socket machine with all memory plugged into only one CPU?
>> >
>> > I'm curious though, why was registration called a second time? Was the
>> > drive external?  Could udev be re-registering the device?
>> >
>> > You might find where the registration is being done and prevent it from
>> > running automatically.  At least that might solve the re-registration
>> > problem.
>> >
>> > As for the memory allocation issue, the backtrace indicates that this is a
>> > registration-time problem, not a runtime issue.  I'm guessing it is one of
>> > the threads attempting to proceed after a memory allocation error similar
>> > to the writeback thread issue you had last time which was fixed by adding
>> > some locking around the initialization.
>> >
>> > I'll look and see if I can come up with a reasonable patch to cleanup the
>> > allocator or gc thread under OOM conditions at registration time.
>> >
>> > -Eric
>> >
>> > >
>> > > I'm not sure why it tried to register sdl2 then when it was already
>> > > registered from boot. Did it time out and had to be registered again?
>> > >
>> > > At this point, things with bcache have been bad enough, that I'm
>> > > considering removing it. My system has been very unstable with it :(
>> > > Or maybe I can just remove the caching devices, leave the backing ones,
>> > > and run without cache for now, giving me the option to re-add the cache later.
>> > >
>> > > I actually just added 16GB of RAM to the server after adding bcache, and
>> > > clearly going from 8 to 24GB made no difference. I'm wondering if there
>> > > is a memory leak somewhere?
>> > >
>> > > Just before the system crashed, memory looked like this, which isn't bad:
>> > >              total       used       free     shared    buffers     cached
>> > > Mem:      24392440   24241440     151000          0         88   20116748
>> > > -/+ buffers/cache:    4124604   20267836
>> > > Swap:     15616764     244452   15372312
>> > >
>> > > I'm using btrfs on top of dmcrypt on top of bcache.
>> > > btrfs on top of dmcrypt was stable, I had been using it for years. Adding bcache is what
>> > > caused all my recent problems to start.
>> > >
>> > > Any ideas?
>> > >
>> > > Thanks,
>> > > Marc
>> > >
>> > >
>> > > [290569.458707] Process accounting resumed
>> > > [290623.673871] bcache-register: page allocation failure: order:7, mode:0x24080c0
>> > > [290623.697143] CPU: 0 PID: 1284 Comm: bcache-register Tainted: G        W       4.4.2-amd64-i915-volpreempt-20160214bc3 #5
>> > > [290623.732143] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
>> > > [290623.762428]  0000000000000000 ffff88001df7bb08 ffffffff8134ae0a 0000000000000001
>> > > [290623.787470]  ffff88001df7bba0 ffffffff81124ab6 024080c01df7bc44 024080c000000040
>> > > [290623.811775]  0000000700000001 0000000000000007 0000000000000007 0000000000000040
>> > > [290623.836443] Call Trace:
>> > > [290623.846280]  [<ffffffff8134ae0a>] dump_stack+0x44/0x55
>> > > [290623.864123]  [<ffffffff81124ab6>] warn_alloc_failed+0x114/0x12c
>> > > [290623.883425]  [<ffffffff811274b8>] __alloc_pages_nodemask+0x7cb/0x84c
>> > > [290623.904159]  [<ffffffff8107df80>] ? try_to_wake_up+0x1d9/0x1eb
>> > > [290623.923173]  [<ffffffff8115f6d7>] alloc_pages_current+0xa9/0xcd
>> > > [290623.942435]  [<ffffffff8112377e>] __get_free_pages+0xe/0x3c
>> > > [290623.960621]  [<ffffffffc05192f0>] bch_cache_set_alloc+0x3c7/0x479 [bcache]
>> > > [290623.983825]  [<ffffffffc051a5e3>] register_bcache+0x1241/0x1452 [bcache]
>> > > [290624.005456]  [<ffffffff8134c853>] kobj_attr_store+0x10/0x1a
>> > > [290624.023729]  [<ffffffff811df9df>] sysfs_kf_write+0x39/0x3b
>> > > [290624.041739]  [<ffffffff811df2e1>] kernfs_fop_write+0xed/0x130
>> > > [290624.060394]  [<ffffffff8117d97d>] __vfs_write+0x26/0xa5
>> > > [290624.077452]  [<ffffffff810b2969>] ? current_kernel_time64+0x10/0x36
>> > > [290624.097921]  [<ffffffff812d6050>] ? security_file_permission+0x3b/0x42
>> > > [290624.119197]  [<ffffffff810915ea>] ? percpu_down_read+0x12/0x41
>> > > [290624.138945]  [<ffffffff8117fb61>] ? __sb_start_write+0x2b/0x48
>> > > [290624.158291]  [<ffffffff8117dfe8>] vfs_write+0x9d/0xe8
>> > > [290624.174919]  [<ffffffff8117e7bd>] SyS_write+0x4d/0x78
>> > > [290624.191455]  [<ffffffff810039c3>] do_fast_syscall_32+0xb3/0xf3
>> > > [290624.210331]  [<ffffffff816e3e32>] sysenter_flags_fixed+0x8/0x12
>> > > [290624.229355] Mem-Info:
>> > > [290624.237432] active_anon:607145 inactive_anon:208571 isolated_anon:0
>> > > [290624.237432]  active_file:2103407 inactive_file:1565209 isolated_file:0
>> > > [290624.237432]  unevictable:1307 dirty:188846 writeback:138775 unstable:0
>> > > [290624.237432]  slab_reclaimable:180112 slab_unreclaimable:73373
>> > > [290624.237432]  mapped:419640 shmem:466581 pagetables:5778 bounce:0
>> > > [290624.237432]  free:988612 free_pcp:2130 free_cma:14
>> > > [290624.351715] Node 0 DMA free:15884kB min:12kB low:12kB high:16kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15976kB managed:15892kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
>> > > [290624.483875] lowmem_reserve[]: 0 3201 23800 23800
>> > > [290624.500998] Node 0 DMA32 free:581000kB min:2648kB low:3308kB high:3972kB active_anon:229220kB inactive_anon:248156kB active_file:1102468kB inactive_file:832264kB unevictable:948kB isolated(anon):0kB isolated(file):68kB present:3362068kB managed:3283032kB mlocked:68719477684kB dirty:94316kB writeback:65996kB mapped:253400kB shmem:282168kB slab_reclaimable:96344kB slab_unreclaimable:41308kB kernel_stack:2480kB pagetables:2952kB unstable:0kB bounce:0kB free_pcp:4008kB local_pcp:140kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
>> > > [290624.655741] lowmem_reserve[]: 0 0 20599 20599
>> > > [290624.674232] Node 0 Normal free:3368136kB min:17048kB low:21308kB high:25572kB active_anon:2203780kB inactive_anon:586104kB active_file:7311280kB inactive_file:5430068kB unevictable:4280kB isolated(anon):0kB isolated(file):0kB present:21485568kB managed:21093516kB mlocked:481036341432kB dirty:662172kB writeback:472472kB mapped:1425288kB shmem:1584132kB slab_reclaimable:624104kB slab_unreclaimable:251680kB kernel_stack:11696kB pagetables:20180kB unstable:0kB bounce:0kB free_pcp:4304kB local_pcp:588kB free_cma:56kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
>> > > [290624.834378] lowmem_reserve[]: 0 0 0 0
>> > > [290624.848459] Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15884kB
>> > > [290624.895688] Node 0 DMA32: 30635*4kB (UME) 23925*8kB (UME) 7478*16kB (UME) 2420*32kB (UME) 794*64kB (UME) 139*128kB (UME) 10*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 582196kB
>> > > [290624.947950] Node 0 Normal: 241499*4kB (UME) 161875*8kB (UME) 33424*16kB (UME) 10915*32kB (UME) 2795*64kB (UME) 334*128kB (UME) 15*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3370532kB
>> > > [290625.003724] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
>> > > [290625.030463] 4142937 total pagecache pages
>> > > [290625.043969] 5782 pages in swap cache
>> > > [290625.056118] Swap cache stats: add 102837, delete 97055, find 627323/635133
>> > > [290625.078329] Free swap  = 15372324kB
>> > > [290625.090190] Total swap = 15616764kB
>> > > [290625.102186] 6215903 pages RAM
>> > > [290625.112399] 0 pages HighMem/MovableOnly
>> > > [290625.125747] 117793 pages reserved
>> > > [290625.137495] 4096 pages cma reserved
>> > > [290625.149199] 0 pages hwpoisoned
>> > > [290625.159961] bcache: register_cache() error opening sdl2: cannot allocate memory
>> > > [290625.183410] BUG: unable to handle kernel NULL pointer dereference at 00000000000009b8
>> > > [290625.208648] IP: [<ffffffffc0516e8d>] cache_set_flush+0x102/0x15c [bcache]
>> > > [290625.230734] PGD 0
>> > > [290625.238446] Oops: 0000 [#1] SMP
>> > > [290625.249857] Modules linked in: udp_diag tcp_diag inet_diag veth ip6table_filter ip6_tables ebtable_nat ebtables ppdev lp xt_addrtype tun bridge stp llc autofs4 softdog binfmt_misc ftdi_sio nfsd nfs_acl auth_rpcgss nfs fscache lockd grace sunrpc ipt_REJECT nf_reject_ipv4 xt_conntrack xt_mark xt_nat xt_tcpudp nf_log_ipv4 nf_log_common xt_LOG iptable_mangle iptable_filter bcache lm85 hwmon_vid pl2303 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 ip_tables nf_conntrack_ftp ipt_MASQUERADE x_tables nf_nat_masquerade_ipv4 nf_nat nf_conntrack sg st snd_pcm_oss snd_mixer_oss kvm_intel snd_hda_codec_realtek snd_hda_codec_generic snd_opl3_synth kvm snd_seq_midi_emul snd_hda_intel snd_cmipci gameport snd_hda_codec snd_opl3_lib snd_mpu401_uart snd_seq_midi irqbypass snd_seq_midi_e
 vent snd_seq snd_hda_core snd_hwdep snd_rawmidi asix coretemp eeepc_wmi tpm_infineon rc_ati_x10 snd_pcm libphy snd_seq_device asus_wmi snd_timer sparse_keymap rfkill intel_rapl 8250_fintek at
>  i_
>>  re
>> >  mo
>> > >  te usbnet tpm_tis snd rc_core battery tpm evdev iosf_mbi wmi pcspkr input_leds intel_powerclamp parport_pc processor i2c_i801 xhci_pci x86_pkg_temp_thermal usbserial soundcore parport xhci_hcd lpc_ich e1000e ptp pps_core fuse raid456 multipath mmc_block mmc_core dm_snapshot dm_bufio dm_mirror dm_region_hash dm_log dm_crypt dm_mod async_raid6_recov async_pq async_xor async_memcpy async_tx blowfish_x86_64 blowfish_common ecb xts ansi_cprng drbg aesni_intel ablk_helper lrw gf128mul glue_helper aes_x86_64 crct10dif_pclmul crc32_pclmul crc32c_intel r8169 ehci_pci fjes ehci_hcd thermal mii sata_mv fan cryptd usbcore usb_common sata_sil24 [last unloaded: ftdi_sio]
>> > > [290625.713109] CPU: 0 PID: 25842 Comm: kworker/0:0 Tainted: G        W       4.4.2-amd64-i915-volpreempt-20160214bc3 #5
>> > > [290625.747597] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
>> > > [290625.776918] Workqueue: events cache_set_flush [bcache]
>> > > [290625.794260] task: ffff880467caa0c0 ti: ffff8805af184000 task.ti: ffff8805af184000
>> > > [290625.818544] RIP: 0010:[<ffffffffc0516e8d>]  [<ffffffffc0516e8d>] cache_set_flush+0x102/0x15c [bcache]
>> > > [290625.848082] RSP: 0018:ffff8805af187e10  EFLAGS: 00010202
>> > > [290625.866890] RAX: 0000000000000001 RBX: ffff880153a40ce8 RCX: 0000000000000282
>> > > [290625.891136] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
>> > > [290625.914365] RBP: ffff8805af187e30 R08: 00000000000000fa R09: ffffe8ffffc01900
>> > > [290625.937588] R10: ffff880088430838 R11: ffff880088430838 R12: 0000000000000000
>> > > [290625.960781] R13: ffff880153a40de8 R14: 0000000000000000 R15: ffffe8ffffc01900
>> > > [290625.983951] FS:  0000000000000000(0000) GS:ffff88061e200000(0000) knlGS:0000000000000000
>> > > [290626.010918] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> > > [290626.030800] CR2: 00000000000009b8 CR3: 0000000001e0a000 CR4: 00000000001406f0
>> > > [290626.053914] Stack:
>> > > [290626.061687]  ffff8805d6183280 ffff88061e216040 ffff880153a40ce8 0000000000000000
>> > > [290626.085725]  ffff8805af187e78 ffffffff81070fcf ffff8805d61832b0 000000001e216058
>> > > [290626.110510]  ffff8805d6183280 ffff88061e216040 ffff8805d61832b0 ffff88061e216058
>> > > [290626.135329] Call Trace:
>> > > [290626.145108]  [<ffffffff81070fcf>] process_one_work+0x15b/0x260
>> > > [290626.164241]  [<ffffffff81071574>] worker_thread+0x1f0/0x29d
>> > > [290626.182561]  [<ffffffff81071384>] ? rescuer_thread+0x281/0x281
>> > > [290626.201721]  [<ffffffff810758e9>] kthread+0xa5/0xad
>> > > [290626.217966]  [<ffffffff81075844>] ? kthread_parkme+0x24/0x24
>> > > [290626.236505]  [<ffffffff816e1c0f>] ret_from_fork+0x3f/0x70
>> > > [290626.255052]  [<ffffffff81075844>] ? kthread_parkme+0x24/0x24
>> > > [290626.274460] Code: 89 1c c1 4d 8b a4 24 f0 01 00 00 49 81 ec f0 01 00 00 eb b4 45 31 e4 0f b7 83 4c f7 ff ff 4a 8b 94 e3 58 ff ff ff 44 39 e0 76 16 <48> 8b ba b8 09 00 00 48 85 ff 74 05 e8 61 e8 b5 c0 49 ff c4 eb
>> > > [290626.337443] RIP  [<ffffffffc0516e8d>] cache_set_flush+0x102/0x15c [bcache]
>> > > [290626.359833]  RSP <ffff8805af187e10>
>> > > [290626.372050] CR2: 00000000000009b8
>> > > [290626.383646] ---[ end trace f97d88b8bb456800 ]---
>> > > [290626.399977] Kernel panic - not syncing: Fatal exception
>> > > [290626.418169] Kernel Offset: disabled
>> > > [290626.430278] Rebooting in 20 seconds..
>> > > [290646.378903] ACPI MEMORY or I/O RESET_REG.
>> > > --
>> > > "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
>> > > Microsoft is to operating systems ....
>> > >                                       .... what McDonalds is to gourmet cooking
>> > > Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901
>> > > --
>> > > To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
>> > > the body of a message to majordomo@vger.kernel.org
>> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> > >
>> > --
>> > To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
>> > the body of a message to majordomo@vger.kernel.org
>> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> > --
>> > To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
>> > the body of a message to majordomo@vger.kernel.org
>> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> >
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Bcache still unstable for me (memory problems)
  2016-03-21  0:05     ` Richard Bade
@ 2016-03-21  0:46       ` Marc MERLIN
  2016-03-21  0:52         ` Richard Bade
  2016-03-25  3:59       ` Eric Wheeler
  1 sibling, 1 reply; 20+ messages in thread
From: Marc MERLIN @ 2016-03-21  0:46 UTC (permalink / raw)
  To: Richard Bade; +Cc: Eric Wheeler, linux-bcache

On Mon, Mar 21, 2016 at 01:05:28PM +1300, Richard Bade wrote:
> Hi Eric,
> We built a kernel with the 3 patches applied and I still got the box to lock up.
> I don't have any of the traces any more. Also I'm wondering what is
> the best way to get these? When this happens my machine completely
> locks up so can't grab anything from dmesg.
> Let me know what you want me to grab and I'll crash it again.
 
On the plus side, my machine hasn't crashed with bcache recently :)

I think in your case, your best bet is serial console, which is what I
have. It's the only way to get a trace in a case like that (unless you
see it on the screen, and get a crash dump there that you can take a
picture of, assuming you have a screen).
Next best option is netconsole, but it's a bit of work to setup.

Marc

> Regards,
> Richard
> 
> On 10 March 2016 at 14:34, Eric Wheeler <bcache@lists.ewheeler.net> wrote:
> > Hi Richard, Marc,
> >
> >>>> [290623.673871] bcache-register: page allocation failure: order:7, mode:0x24080c0
> >
> > Do you still have the backtraces that show the function call stack for
> > errors that look like this?
> >         %s: page allocation failure: order:%d, mode:0x%x
> >
> > Please send as many relevant OOM failure traces that you can.  I would
> > like to see which memory allocation(s) are failing and if they are always
> > the same stack trace.
> >
> > In the example above, order 7 means 2^7 of 4k pages, so it means the
> > kernel can't find 512k of contiguous memory that can be allocated.
> >
> > It looks like the OOM is triggered in bch_cache_set_alloc, but might be
> > cache_alloc too.  I'm not sure if an alternate allocation mechanism can be
> > used safely, but thats what I want to look into.
> >
> > Thanks!
> >
> > -Eric
> >
> > --
> > Eric Wheeler
> >
> > On Wed, 9 Mar 2016, Eric Wheeler wrote:
> >
> >> On Wed, 9 Mar 2016, Richard Bade wrote:
> >>
> >> > Hi Guys,
> >> > I've also seen this issue when registering a device.
> >> > [Fri Feb  5 14:06:59 2016] bcache: register_cache() error opening
> >> > dm-6: cannot allocate memory
> >> >
> >> > For me the situation happens because I'm running Ceph OSD's on the
> >> > host. These are using all the available memory and for some reason the
> >> > register cannot allocate memory.
> >> > I've reproduced this several times, but basically if I register on a
> >> > system that has been up for a while and therefore using all it's RAM
> >> > this will happen.
> >> > Here's two specific time's it's happened:
> >> > when creating using "make-bcache --cache /dev/vg-raid0/lv-ceph_j19
> >> > --bdev /dev/sdf1". It failed due to xfs still remaining on /dev/sdf1.
> >>
> >> Do you get a kernel failure, or is it just make-bcache being safe?
> >>
> >> > Then run wipefs and "make-bcache --cache /dev/vg-raid0/lv-ceph_j19
> >> > --bdev /dev/sdf1" again. CRASH.
> >>
> >> make-bcache will auto register.  (Maybe there's an option to stop that,
> >> I've not looked.)
> >>
> >> > when manually creating a cache device using "make-bcache --cache
> >> > /dev/vg-raid0/lv-bcache". The cache set didn't show up in
> >> > /sys/fs/bcache. echo "/dev/vg-raid0/lv-bcache" >
> >> > /sys/fs/bcache/register. CRASH.
> >> > I was able to work around this issue by stopping the osd's before
> >> > registering and therefore freeing up memory.
> >>
> >> I would be curious if setting /proc/sys/vm/min_free_kbytes to 256*1024
> >> fixes that.
> >>
> >> > Because I am able to reproduce this relatively easily, I'm happy to
> >> > test any patches.
> >>
> >> Yes, thank you, please do:
> >>
> >> Try the three patches at the top of these two branches and see if you
> >> crash.  I'm hoping it fails gracefully on OOM:
> >>
> >> https://bitbucket.org/ewheelerinc/linux/branch/v4.5-rc6-bcache-fixes
> >>       commits 84a2edd, a704484
> >> https://bitbucket.org/ewheelerinc/linux/branch/v4.5-rc7-bcache-fixes
> >>       commit 87264e9
> >>
> >> If it stops crashing, then maybe we can work out a way to allocate memory
> >> such that it succeeds.  Please send any dmesg output.
> >>
> >> -Eric
> >>
> >> > Apart from the register crash, the cache sets have been up and stable
> >> > for over 1 month.
> >> >
> >> > Regards,
> >> > Richard
> >> >
> >> > On Mon, 7 Mar 2016, Marc MERLIN wrote:
> >> >
> >> > > I started a btrfs copy from a bcached device to another one (both are in
> >> > > the 5 to 15TB range).
> >> > > A minute later, my system that had been up for several days, crashed.
> >> > >
> >> > > Ultimately, it crashed on this:
> >> > > bcache: register_cache() error opening sdl2: cannot allocate memory
> >> > > BUG: unable to handle kernel NULL pointer dereference at 00000000000009b8
> >> > > IP: [<ffffffffc0516e8d>] cache_set_flush+0x102/0x15c [bcache]
> >> >
> >> > Strange about memory allocation issues.  Do you have
> >> > /proc/sys/vm/min_free_kbytes set to something like $((256*1024)) ? Is this
> >> > a multi-socket machine with all memory plugged into only one CPU?
> >> >
> >> > I'm curious though, why was registration called a second time? Was the
> >> > drive external?  Could udev be re-registering the device?
> >> >
> >> > You might find where the registration is being done and prevent it from
> >> > running automatically.  At least that might solve the re-registration
> >> > problem.
> >> >
> >> > As for the memory allocation issue, the backtrace indicates that this is a
> >> > registration-time problem, not a runtime issue.  I'm guessing it is one of
> >> > the threads attempting to proceed after a memory allocation error similar
> >> > to the writeback thread issue you had last time which was fixed by adding
> >> > some locking around the initialization.
> >> >
> >> > I'll look and see if I can come up with a reasonable patch to cleanup the
> >> > allocator or gc thread under OOM conditions at registration time.
> >> >
> >> > -Eric
> >> >
> >> > >
> >> > > I'm not sure why it tried to register sdl2 then when it was already
> >> > > registered from boot. Did it time out and had to be registered again?
> >> > >
> >> > > At this point, things with bcache have been bad enough, that I'm
> >> > > considering removing it. My system has been very unstable with it :(
> >> > > Or maybe I can just remove the caching devices, leave the backing ones,
> >> > > and run without cache for now, giving me the option to re-add the cache later.
> >> > >
> >> > > I actually just added 16GB of RAM to the server after adding bcache, and
> >> > > clearly going from 8 to 24GB made no difference. I'm wondering if there
> >> > > is a memory leak somewhere?
> >> > >
> >> > > Just before the system crashed, memory looked like this, which isn't bad:
> >> > >              total       used       free     shared    buffers     cached
> >> > > Mem:      24392440   24241440     151000          0         88   20116748
> >> > > -/+ buffers/cache:    4124604   20267836
> >> > > Swap:     15616764     244452   15372312
> >> > >
> >> > > I'm using btrfs on top of dmcrypt on top of bcache.
> >> > > btrfs on top of dmcrypt was stable, I had been using it for years. Adding bcache is what
> >> > > caused all my recent problems to start.
> >> > >
> >> > > Any ideas?
> >> > >
> >> > > Thanks,
> >> > > Marc
> >> > >
> >> > >
> >> > > [290569.458707] Process accounting resumed
> >> > > [290623.673871] bcache-register: page allocation failure: order:7, mode:0x24080c0
> >> > > [290623.697143] CPU: 0 PID: 1284 Comm: bcache-register Tainted: G        W       4.4.2-amd64-i915-volpreempt-20160214bc3 #5
> >> > > [290623.732143] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
> >> > > [290623.762428]  0000000000000000 ffff88001df7bb08 ffffffff8134ae0a 0000000000000001
> >> > > [290623.787470]  ffff88001df7bba0 ffffffff81124ab6 024080c01df7bc44 024080c000000040
> >> > > [290623.811775]  0000000700000001 0000000000000007 0000000000000007 0000000000000040
> >> > > [290623.836443] Call Trace:
> >> > > [290623.846280]  [<ffffffff8134ae0a>] dump_stack+0x44/0x55
> >> > > [290623.864123]  [<ffffffff81124ab6>] warn_alloc_failed+0x114/0x12c
> >> > > [290623.883425]  [<ffffffff811274b8>] __alloc_pages_nodemask+0x7cb/0x84c
> >> > > [290623.904159]  [<ffffffff8107df80>] ? try_to_wake_up+0x1d9/0x1eb
> >> > > [290623.923173]  [<ffffffff8115f6d7>] alloc_pages_current+0xa9/0xcd
> >> > > [290623.942435]  [<ffffffff8112377e>] __get_free_pages+0xe/0x3c
> >> > > [290623.960621]  [<ffffffffc05192f0>] bch_cache_set_alloc+0x3c7/0x479 [bcache]
> >> > > [290623.983825]  [<ffffffffc051a5e3>] register_bcache+0x1241/0x1452 [bcache]
> >> > > [290624.005456]  [<ffffffff8134c853>] kobj_attr_store+0x10/0x1a
> >> > > [290624.023729]  [<ffffffff811df9df>] sysfs_kf_write+0x39/0x3b
> >> > > [290624.041739]  [<ffffffff811df2e1>] kernfs_fop_write+0xed/0x130
> >> > > [290624.060394]  [<ffffffff8117d97d>] __vfs_write+0x26/0xa5
> >> > > [290624.077452]  [<ffffffff810b2969>] ? current_kernel_time64+0x10/0x36
> >> > > [290624.097921]  [<ffffffff812d6050>] ? security_file_permission+0x3b/0x42
> >> > > [290624.119197]  [<ffffffff810915ea>] ? percpu_down_read+0x12/0x41
> >> > > [290624.138945]  [<ffffffff8117fb61>] ? __sb_start_write+0x2b/0x48
> >> > > [290624.158291]  [<ffffffff8117dfe8>] vfs_write+0x9d/0xe8
> >> > > [290624.174919]  [<ffffffff8117e7bd>] SyS_write+0x4d/0x78
> >> > > [290624.191455]  [<ffffffff810039c3>] do_fast_syscall_32+0xb3/0xf3
> >> > > [290624.210331]  [<ffffffff816e3e32>] sysenter_flags_fixed+0x8/0x12
> >> > > [290624.229355] Mem-Info:
> >> > > [290624.237432] active_anon:607145 inactive_anon:208571 isolated_anon:0
> >> > > [290624.237432]  active_file:2103407 inactive_file:1565209 isolated_file:0
> >> > > [290624.237432]  unevictable:1307 dirty:188846 writeback:138775 unstable:0
> >> > > [290624.237432]  slab_reclaimable:180112 slab_unreclaimable:73373
> >> > > [290624.237432]  mapped:419640 shmem:466581 pagetables:5778 bounce:0
> >> > > [290624.237432]  free:988612 free_pcp:2130 free_cma:14
> >> > > [290624.351715] Node 0 DMA free:15884kB min:12kB low:12kB high:16kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15976kB managed:15892kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
> >> > > [290624.483875] lowmem_reserve[]: 0 3201 23800 23800
> >> > > [290624.500998] Node 0 DMA32 free:581000kB min:2648kB low:3308kB high:3972kB active_anon:229220kB inactive_anon:248156kB active_file:1102468kB inactive_file:832264kB unevictable:948kB isolated(anon):0kB isolated(file):68kB present:3362068kB managed:3283032kB mlocked:68719477684kB dirty:94316kB writeback:65996kB mapped:253400kB shmem:282168kB slab_reclaimable:96344kB slab_unreclaimable:41308kB kernel_stack:2480kB pagetables:2952kB unstable:0kB bounce:0kB free_pcp:4008kB local_pcp:140kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
> >> > > [290624.655741] lowmem_reserve[]: 0 0 20599 20599
> >> > > [290624.674232] Node 0 Normal free:3368136kB min:17048kB low:21308kB high:25572kB active_anon:2203780kB inactive_anon:586104kB active_file:7311280kB inactive_file:5430068kB unevictable:4280kB isolated(anon):0kB isolated(file):0kB present:21485568kB managed:21093516kB mlocked:481036341432kB dirty:662172kB writeback:472472kB mapped:1425288kB shmem:1584132kB slab_reclaimable:624104kB slab_unreclaimable:251680kB kernel_stack:11696kB pagetables:20180kB unstable:0kB bounce:0kB free_pcp:4304kB local_pcp:588kB free_cma:56kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
> >> > > [290624.834378] lowmem_reserve[]: 0 0 0 0
> >> > > [290624.848459] Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15884kB
> >> > > [290624.895688] Node 0 DMA32: 30635*4kB (UME) 23925*8kB (UME) 7478*16kB (UME) 2420*32kB (UME) 794*64kB (UME) 139*128kB (UME) 10*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 582196kB
> >> > > [290624.947950] Node 0 Normal: 241499*4kB (UME) 161875*8kB (UME) 33424*16kB (UME) 10915*32kB (UME) 2795*64kB (UME) 334*128kB (UME) 15*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3370532kB
> >> > > [290625.003724] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> >> > > [290625.030463] 4142937 total pagecache pages
> >> > > [290625.043969] 5782 pages in swap cache
> >> > > [290625.056118] Swap cache stats: add 102837, delete 97055, find 627323/635133
> >> > > [290625.078329] Free swap  = 15372324kB
> >> > > [290625.090190] Total swap = 15616764kB
> >> > > [290625.102186] 6215903 pages RAM
> >> > > [290625.112399] 0 pages HighMem/MovableOnly
> >> > > [290625.125747] 117793 pages reserved
> >> > > [290625.137495] 4096 pages cma reserved
> >> > > [290625.149199] 0 pages hwpoisoned
> >> > > [290625.159961] bcache: register_cache() error opening sdl2: cannot allocate memory
> >> > > [290625.183410] BUG: unable to handle kernel NULL pointer dereference at 00000000000009b8
> >> > > [290625.208648] IP: [<ffffffffc0516e8d>] cache_set_flush+0x102/0x15c [bcache]
> >> > > [290625.230734] PGD 0
> >> > > [290625.238446] Oops: 0000 [#1] SMP
> >> > > [290625.249857] Modules linked in: udp_diag tcp_diag inet_diag veth ip6table_filter ip6_tables ebtable_nat ebtables ppdev lp xt_addrtype tun bridge stp llc autofs4 softdog binfmt_misc ftdi_sio nfsd nfs_acl auth_rpcgss nfs fscache lockd grace sunrpc ipt_REJECT nf_reject_ipv4 xt_conntrack xt_mark xt_nat xt_tcpudp nf_log_ipv4 nf_log_common xt_LOG iptable_mangle iptable_filter bcache lm85 hwmon_vid pl2303 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 ip_tables nf_conntrack_ftp ipt_MASQUERADE x_tables nf_nat_masquerade_ipv4 nf_nat nf_conntrack sg st snd_pcm_oss snd_mixer_oss kvm_intel snd_hda_codec_realtek snd_hda_codec_generic snd_opl3_synth kvm snd_seq_midi_emul snd_hda_intel snd_cmipci gameport snd_hda_codec snd_opl3_lib snd_mpu401_uart snd_seq_midi irqbypass snd_seq_midi
 _event snd_seq snd_hda_core snd_hwdep snd_rawmidi asix coretemp eeepc_wmi tpm_infineon rc_ati_x10 snd_pcm libphy snd_seq_device asus_wmi snd_timer sparse_keymap rfkill intel_rapl 8250_fintek at
> >  i_
> >>  re
> >> >  mo
> >> > >  te usbnet tpm_tis snd rc_core battery tpm evdev iosf_mbi wmi pcspkr input_leds intel_powerclamp parport_pc processor i2c_i801 xhci_pci x86_pkg_temp_thermal usbserial soundcore parport xhci_hcd lpc_ich e1000e ptp pps_core fuse raid456 multipath mmc_block mmc_core dm_snapshot dm_bufio dm_mirror dm_region_hash dm_log dm_crypt dm_mod async_raid6_recov async_pq async_xor async_memcpy async_tx blowfish_x86_64 blowfish_common ecb xts ansi_cprng drbg aesni_intel ablk_helper lrw gf128mul glue_helper aes_x86_64 crct10dif_pclmul crc32_pclmul crc32c_intel r8169 ehci_pci fjes ehci_hcd thermal mii sata_mv fan cryptd usbcore usb_common sata_sil24 [last unloaded: ftdi_sio]
> >> > > [290625.713109] CPU: 0 PID: 25842 Comm: kworker/0:0 Tainted: G        W       4.4.2-amd64-i915-volpreempt-20160214bc3 #5
> >> > > [290625.747597] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
> >> > > [290625.776918] Workqueue: events cache_set_flush [bcache]
> >> > > [290625.794260] task: ffff880467caa0c0 ti: ffff8805af184000 task.ti: ffff8805af184000
> >> > > [290625.818544] RIP: 0010:[<ffffffffc0516e8d>]  [<ffffffffc0516e8d>] cache_set_flush+0x102/0x15c [bcache]
> >> > > [290625.848082] RSP: 0018:ffff8805af187e10  EFLAGS: 00010202
> >> > > [290625.866890] RAX: 0000000000000001 RBX: ffff880153a40ce8 RCX: 0000000000000282
> >> > > [290625.891136] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> >> > > [290625.914365] RBP: ffff8805af187e30 R08: 00000000000000fa R09: ffffe8ffffc01900
> >> > > [290625.937588] R10: ffff880088430838 R11: ffff880088430838 R12: 0000000000000000
> >> > > [290625.960781] R13: ffff880153a40de8 R14: 0000000000000000 R15: ffffe8ffffc01900
> >> > > [290625.983951] FS:  0000000000000000(0000) GS:ffff88061e200000(0000) knlGS:0000000000000000
> >> > > [290626.010918] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> > > [290626.030800] CR2: 00000000000009b8 CR3: 0000000001e0a000 CR4: 00000000001406f0
> >> > > [290626.053914] Stack:
> >> > > [290626.061687]  ffff8805d6183280 ffff88061e216040 ffff880153a40ce8 0000000000000000
> >> > > [290626.085725]  ffff8805af187e78 ffffffff81070fcf ffff8805d61832b0 000000001e216058
> >> > > [290626.110510]  ffff8805d6183280 ffff88061e216040 ffff8805d61832b0 ffff88061e216058
> >> > > [290626.135329] Call Trace:
> >> > > [290626.145108]  [<ffffffff81070fcf>] process_one_work+0x15b/0x260
> >> > > [290626.164241]  [<ffffffff81071574>] worker_thread+0x1f0/0x29d
> >> > > [290626.182561]  [<ffffffff81071384>] ? rescuer_thread+0x281/0x281
> >> > > [290626.201721]  [<ffffffff810758e9>] kthread+0xa5/0xad
> >> > > [290626.217966]  [<ffffffff81075844>] ? kthread_parkme+0x24/0x24
> >> > > [290626.236505]  [<ffffffff816e1c0f>] ret_from_fork+0x3f/0x70
> >> > > [290626.255052]  [<ffffffff81075844>] ? kthread_parkme+0x24/0x24
> >> > > [290626.274460] Code: 89 1c c1 4d 8b a4 24 f0 01 00 00 49 81 ec f0 01 00 00 eb b4 45 31 e4 0f b7 83 4c f7 ff ff 4a 8b 94 e3 58 ff ff ff 44 39 e0 76 16 <48> 8b ba b8 09 00 00 48 85 ff 74 05 e8 61 e8 b5 c0 49 ff c4 eb
> >> > > [290626.337443] RIP  [<ffffffffc0516e8d>] cache_set_flush+0x102/0x15c [bcache]
> >> > > [290626.359833]  RSP <ffff8805af187e10>
> >> > > [290626.372050] CR2: 00000000000009b8
> >> > > [290626.383646] ---[ end trace f97d88b8bb456800 ]---
> >> > > [290626.399977] Kernel panic - not syncing: Fatal exception
> >> > > [290626.418169] Kernel Offset: disabled
> >> > > [290626.430278] Rebooting in 20 seconds..
> >> > > [290646.378903] ACPI MEMORY or I/O RESET_REG.
> >> > > --
> >> > > "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> >> > > Microsoft is to operating systems ....
> >> > >                                       .... what McDonalds is to gourmet cooking
> >> > > Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901
> >> > > --
> >> > > To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> >> > > the body of a message to majordomo@vger.kernel.org
> >> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >> > >
> >> > --
> >> > To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> >> > the body of a message to majordomo@vger.kernel.org
> >> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >> > --
> >> > To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> >> > the body of a message to majordomo@vger.kernel.org
> >> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >> >
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>
> 

-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Bcache still unstable for me (memory problems)
  2016-03-21  0:46       ` Marc MERLIN
@ 2016-03-21  0:52         ` Richard Bade
  0 siblings, 0 replies; 20+ messages in thread
From: Richard Bade @ 2016-03-21  0:52 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: Eric Wheeler, linux-bcache

On 21 March 2016 at 13:46, Marc MERLIN <marc@merlins.org> wrote:
> On Mon, Mar 21, 2016 at 01:05:28PM +1300, Richard Bade wrote:
>> Hi Eric,
>> We built a kernel with the 3 patches applied and I still got the box to lock up.
>> I don't have any of the traces any more. Also I'm wondering what is
>> the best way to get these? When this happens my machine completely
>> locks up so can't grab anything from dmesg.
>> Let me know what you want me to grab and I'll crash it again.
>
> On the plus side, my machine hasn't crashed with bcache recently :)
>
> I think in your case, your best bet is serial console, which is what I
> have. It's the only way to get a trace in a case like that (unless you
> see it on the screen, and get a crash dump there that you can take a
> picture of, assuming you have a screen).
> Next best option is netconsole, but it's a bit of work to setup.

Thanks Marc, I'll see if I can get a serial console set up.

Richard

>
> Marc
>
>> Regards,
>> Richard
>>
>> On 10 March 2016 at 14:34, Eric Wheeler <bcache@lists.ewheeler.net> wrote:
>> > Hi Richard, Marc,
>> >
>> >>>> [290623.673871] bcache-register: page allocation failure: order:7, mode:0x24080c0
>> >
>> > Do you still have the backtraces that show the function call stack for
>> > errors that look like this?
>> >         %s: page allocation failure: order:%d, mode:0x%x
>> >
>> > Please send as many relevant OOM failure traces that you can.  I would
>> > like to see which memory allocation(s) are failing and if they are always
>> > the same stack trace.
>> >
>> > In the example above, order 7 means 2^7 of 4k pages, so it means the
>> > kernel can't find 512k of contiguous memory that can be allocated.
>> >
>> > It looks like the OOM is triggered in bch_cache_set_alloc, but might be
>> > cache_alloc too.  I'm not sure if an alternate allocation mechanism can be
>> > used safely, but thats what I want to look into.
>> >
>> > Thanks!
>> >
>> > -Eric
>> >
>> > --
>> > Eric Wheeler
>> >
>> > On Wed, 9 Mar 2016, Eric Wheeler wrote:
>> >
>> >> On Wed, 9 Mar 2016, Richard Bade wrote:
>> >>
>> >> > Hi Guys,
>> >> > I've also seen this issue when registering a device.
>> >> > [Fri Feb  5 14:06:59 2016] bcache: register_cache() error opening
>> >> > dm-6: cannot allocate memory
>> >> >
>> >> > For me the situation happens because I'm running Ceph OSD's on the
>> >> > host. These are using all the available memory and for some reason the
>> >> > register cannot allocate memory.
>> >> > I've reproduced this several times, but basically if I register on a
>> >> > system that has been up for a while and therefore using all it's RAM
>> >> > this will happen.
>> >> > Here's two specific time's it's happened:
>> >> > when creating using "make-bcache --cache /dev/vg-raid0/lv-ceph_j19
>> >> > --bdev /dev/sdf1". It failed due to xfs still remaining on /dev/sdf1.
>> >>
>> >> Do you get a kernel failure, or is it just make-bcache being safe?
>> >>
>> >> > Then run wipefs and "make-bcache --cache /dev/vg-raid0/lv-ceph_j19
>> >> > --bdev /dev/sdf1" again. CRASH.
>> >>
>> >> make-bcache will auto register.  (Maybe there's an option to stop that,
>> >> I've not looked.)
>> >>
>> >> > when manually creating a cache device using "make-bcache --cache
>> >> > /dev/vg-raid0/lv-bcache". The cache set didn't show up in
>> >> > /sys/fs/bcache. echo "/dev/vg-raid0/lv-bcache" >
>> >> > /sys/fs/bcache/register. CRASH.
>> >> > I was able to work around this issue by stopping the osd's before
>> >> > registering and therefore freeing up memory.
>> >>
>> >> I would be curious if setting /proc/sys/vm/min_free_kbytes to 256*1024
>> >> fixes that.
>> >>
>> >> > Because I am able to reproduce this relatively easily, I'm happy to
>> >> > test any patches.
>> >>
>> >> Yes, thank you, please do:
>> >>
>> >> Try the three patches at the top of these two branches and see if you
>> >> crash.  I'm hoping it fails gracefully on OOM:
>> >>
>> >> https://bitbucket.org/ewheelerinc/linux/branch/v4.5-rc6-bcache-fixes
>> >>       commits 84a2edd, a704484
>> >> https://bitbucket.org/ewheelerinc/linux/branch/v4.5-rc7-bcache-fixes
>> >>       commit 87264e9
>> >>
>> >> If it stops crashing, then maybe we can work out a way to allocate memory
>> >> such that it succeeds.  Please send any dmesg output.
>> >>
>> >> -Eric
>> >>
>> >> > Apart from the register crash, the cache sets have been up and stable
>> >> > for over 1 month.
>> >> >
>> >> > Regards,
>> >> > Richard
>> >> >
>> >> > On Mon, 7 Mar 2016, Marc MERLIN wrote:
>> >> >
>> >> > > I started a btrfs copy from a bcached device to another one (both are in
>> >> > > the 5 to 15TB range).
>> >> > > A minute later, my system that had been up for several days, crashed.
>> >> > >
>> >> > > Ultimately, it crashed on this:
>> >> > > bcache: register_cache() error opening sdl2: cannot allocate memory
>> >> > > BUG: unable to handle kernel NULL pointer dereference at 00000000000009b8
>> >> > > IP: [<ffffffffc0516e8d>] cache_set_flush+0x102/0x15c [bcache]
>> >> >
>> >> > Strange about memory allocation issues.  Do you have
>> >> > /proc/sys/vm/min_free_kbytes set to something like $((256*1024)) ? Is this
>> >> > a multi-socket machine with all memory plugged into only one CPU?
>> >> >
>> >> > I'm curious though, why was registration called a second time? Was the
>> >> > drive external?  Could udev be re-registering the device?
>> >> >
>> >> > You might find where the registration is being done and prevent it from
>> >> > running automatically.  At least that might solve the re-registration
>> >> > problem.
>> >> >
>> >> > As for the memory allocation issue, the backtrace indicates that this is a
>> >> > registration-time problem, not a runtime issue.  I'm guessing it is one of
>> >> > the threads attempting to proceed after a memory allocation error similar
>> >> > to the writeback thread issue you had last time which was fixed by adding
>> >> > some locking around the initialization.
>> >> >
>> >> > I'll look and see if I can come up with a reasonable patch to cleanup the
>> >> > allocator or gc thread under OOM conditions at registration time.
>> >> >
>> >> > -Eric
>> >> >
>> >> > >
>> >> > > I'm not sure why it tried to register sdl2 then when it was already
>> >> > > registered from boot. Did it time out and had to be registered again?
>> >> > >
>> >> > > At this point, things with bcache have been bad enough, that I'm
>> >> > > considering removing it. My system has been very unstable with it :(
>> >> > > Or maybe I can just remove the caching devices, leave the backing ones,
>> >> > > and run without cache for now, giving me the option to re-add the cache later.
>> >> > >
>> >> > > I actually just added 16GB of RAM to the server after adding bcache, and
>> >> > > clearly going from 8 to 24GB made no difference. I'm wondering if there
>> >> > > is a memory leak somewhere?
>> >> > >
>> >> > > Just before the system crashed, memory looked like this, which isn't bad:
>> >> > >              total       used       free     shared    buffers     cached
>> >> > > Mem:      24392440   24241440     151000          0         88   20116748
>> >> > > -/+ buffers/cache:    4124604   20267836
>> >> > > Swap:     15616764     244452   15372312
>> >> > >
>> >> > > I'm using btrfs on top of dmcrypt on top of bcache.
>> >> > > btrfs on top of dmcrypt was stable, I had been using it for years. Adding bcache is what
>> >> > > caused all my recent problems to start.
>> >> > >
>> >> > > Any ideas?
>> >> > >
>> >> > > Thanks,
>> >> > > Marc
>> >> > >
>> >> > >
>> >> > > [290569.458707] Process accounting resumed
>> >> > > [290623.673871] bcache-register: page allocation failure: order:7, mode:0x24080c0
>> >> > > [290623.697143] CPU: 0 PID: 1284 Comm: bcache-register Tainted: G        W       4.4.2-amd64-i915-volpreempt-20160214bc3 #5
>> >> > > [290623.732143] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
>> >> > > [290623.762428]  0000000000000000 ffff88001df7bb08 ffffffff8134ae0a 0000000000000001
>> >> > > [290623.787470]  ffff88001df7bba0 ffffffff81124ab6 024080c01df7bc44 024080c000000040
>> >> > > [290623.811775]  0000000700000001 0000000000000007 0000000000000007 0000000000000040
>> >> > > [290623.836443] Call Trace:
>> >> > > [290623.846280]  [<ffffffff8134ae0a>] dump_stack+0x44/0x55
>> >> > > [290623.864123]  [<ffffffff81124ab6>] warn_alloc_failed+0x114/0x12c
>> >> > > [290623.883425]  [<ffffffff811274b8>] __alloc_pages_nodemask+0x7cb/0x84c
>> >> > > [290623.904159]  [<ffffffff8107df80>] ? try_to_wake_up+0x1d9/0x1eb
>> >> > > [290623.923173]  [<ffffffff8115f6d7>] alloc_pages_current+0xa9/0xcd
>> >> > > [290623.942435]  [<ffffffff8112377e>] __get_free_pages+0xe/0x3c
>> >> > > [290623.960621]  [<ffffffffc05192f0>] bch_cache_set_alloc+0x3c7/0x479 [bcache]
>> >> > > [290623.983825]  [<ffffffffc051a5e3>] register_bcache+0x1241/0x1452 [bcache]
>> >> > > [290624.005456]  [<ffffffff8134c853>] kobj_attr_store+0x10/0x1a
>> >> > > [290624.023729]  [<ffffffff811df9df>] sysfs_kf_write+0x39/0x3b
>> >> > > [290624.041739]  [<ffffffff811df2e1>] kernfs_fop_write+0xed/0x130
>> >> > > [290624.060394]  [<ffffffff8117d97d>] __vfs_write+0x26/0xa5
>> >> > > [290624.077452]  [<ffffffff810b2969>] ? current_kernel_time64+0x10/0x36
>> >> > > [290624.097921]  [<ffffffff812d6050>] ? security_file_permission+0x3b/0x42
>> >> > > [290624.119197]  [<ffffffff810915ea>] ? percpu_down_read+0x12/0x41
>> >> > > [290624.138945]  [<ffffffff8117fb61>] ? __sb_start_write+0x2b/0x48
>> >> > > [290624.158291]  [<ffffffff8117dfe8>] vfs_write+0x9d/0xe8
>> >> > > [290624.174919]  [<ffffffff8117e7bd>] SyS_write+0x4d/0x78
>> >> > > [290624.191455]  [<ffffffff810039c3>] do_fast_syscall_32+0xb3/0xf3
>> >> > > [290624.210331]  [<ffffffff816e3e32>] sysenter_flags_fixed+0x8/0x12
>> >> > > [290624.229355] Mem-Info:
>> >> > > [290624.237432] active_anon:607145 inactive_anon:208571 isolated_anon:0
>> >> > > [290624.237432]  active_file:2103407 inactive_file:1565209 isolated_file:0
>> >> > > [290624.237432]  unevictable:1307 dirty:188846 writeback:138775 unstable:0
>> >> > > [290624.237432]  slab_reclaimable:180112 slab_unreclaimable:73373
>> >> > > [290624.237432]  mapped:419640 shmem:466581 pagetables:5778 bounce:0
>> >> > > [290624.237432]  free:988612 free_pcp:2130 free_cma:14
>> >> > > [290624.351715] Node 0 DMA free:15884kB min:12kB low:12kB high:16kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15976kB managed:15892kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
>> >> > > [290624.483875] lowmem_reserve[]: 0 3201 23800 23800
>> >> > > [290624.500998] Node 0 DMA32 free:581000kB min:2648kB low:3308kB high:3972kB active_anon:229220kB inactive_anon:248156kB active_file:1102468kB inactive_file:832264kB unevictable:948kB isolated(anon):0kB isolated(file):68kB present:3362068kB managed:3283032kB mlocked:68719477684kB dirty:94316kB writeback:65996kB mapped:253400kB shmem:282168kB slab_reclaimable:96344kB slab_unreclaimable:41308kB kernel_stack:2480kB pagetables:2952kB unstable:0kB bounce:0kB free_pcp:4008kB local_pcp:140kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
>> >> > > [290624.655741] lowmem_reserve[]: 0 0 20599 20599
>> >> > > [290624.674232] Node 0 Normal free:3368136kB min:17048kB low:21308kB high:25572kB active_anon:2203780kB inactive_anon:586104kB active_file:7311280kB inactive_file:5430068kB unevictable:4280kB isolated(anon):0kB isolated(file):0kB present:21485568kB managed:21093516kB mlocked:481036341432kB dirty:662172kB writeback:472472kB mapped:1425288kB shmem:1584132kB slab_reclaimable:624104kB slab_unreclaimable:251680kB kernel_stack:11696kB pagetables:20180kB unstable:0kB bounce:0kB free_pcp:4304kB local_pcp:588kB free_cma:56kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
>> >> > > [290624.834378] lowmem_reserve[]: 0 0 0 0
>> >> > > [290624.848459] Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15884kB
>> >> > > [290624.895688] Node 0 DMA32: 30635*4kB (UME) 23925*8kB (UME) 7478*16kB (UME) 2420*32kB (UME) 794*64kB (UME) 139*128kB (UME) 10*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 582196kB
>> >> > > [290624.947950] Node 0 Normal: 241499*4kB (UME) 161875*8kB (UME) 33424*16kB (UME) 10915*32kB (UME) 2795*64kB (UME) 334*128kB (UME) 15*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3370532kB
>> >> > > [290625.003724] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
>> >> > > [290625.030463] 4142937 total pagecache pages
>> >> > > [290625.043969] 5782 pages in swap cache
>> >> > > [290625.056118] Swap cache stats: add 102837, delete 97055, find 627323/635133
>> >> > > [290625.078329] Free swap  = 15372324kB
>> >> > > [290625.090190] Total swap = 15616764kB
>> >> > > [290625.102186] 6215903 pages RAM
>> >> > > [290625.112399] 0 pages HighMem/MovableOnly
>> >> > > [290625.125747] 117793 pages reserved
>> >> > > [290625.137495] 4096 pages cma reserved
>> >> > > [290625.149199] 0 pages hwpoisoned
>> >> > > [290625.159961] bcache: register_cache() error opening sdl2: cannot allocate memory
>> >> > > [290625.183410] BUG: unable to handle kernel NULL pointer dereference at 00000000000009b8
>> >> > > [290625.208648] IP: [<ffffffffc0516e8d>] cache_set_flush+0x102/0x15c [bcache]
>> >> > > [290625.230734] PGD 0
>> >> > > [290625.238446] Oops: 0000 [#1] SMP
>> >> > > [290625.249857] Modules linked in: udp_diag tcp_diag inet_diag veth ip6table_filter ip6_tables ebtable_nat ebtables ppdev lp xt_addrtype tun bridge stp llc autofs4 softdog binfmt_misc ftdi_sio nfsd nfs_acl auth_rpcgss nfs fscache lockd grace sunrpc ipt_REJECT nf_reject_ipv4 xt_conntrack xt_mark xt_nat xt_tcpudp nf_log_ipv4 nf_log_common xt_LOG iptable_mangle iptable_filter bcache lm85 hwmon_vid pl2303 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 ip_tables nf_conntrack_ftp ipt_MASQUERADE x_tables nf_nat_masquerade_ipv4 nf_nat nf_conntrack sg st snd_pcm_oss snd_mixer_oss kvm_intel snd_hda_codec_realtek snd_hda_codec_generic snd_opl3_synth kvm snd_seq_midi_emul snd_hda_intel snd_cmipci gameport snd_hda_codec snd_opl3_lib snd_mpu401_uart snd_seq_midi irqbypass snd_seq_mid
 i_event snd_seq snd_hda_core snd_hwdep snd_rawmidi asix coretemp eeepc_wmi tpm_infineon rc_ati_x10 snd_pcm libphy snd_seq_device asus_wmi snd_timer sparse_keymap rfkill intel_rapl 8250_fintek at
>> >  i_
>> >>  re
>> >> >  mo
>> >> > >  te usbnet tpm_tis snd rc_core battery tpm evdev iosf_mbi wmi pcspkr input_leds intel_powerclamp parport_pc processor i2c_i801 xhci_pci x86_pkg_temp_thermal usbserial soundcore parport xhci_hcd lpc_ich e1000e ptp pps_core fuse raid456 multipath mmc_block mmc_core dm_snapshot dm_bufio dm_mirror dm_region_hash dm_log dm_crypt dm_mod async_raid6_recov async_pq async_xor async_memcpy async_tx blowfish_x86_64 blowfish_common ecb xts ansi_cprng drbg aesni_intel ablk_helper lrw gf128mul glue_helper aes_x86_64 crct10dif_pclmul crc32_pclmul crc32c_intel r8169 ehci_pci fjes ehci_hcd thermal mii sata_mv fan cryptd usbcore usb_common sata_sil24 [last unloaded: ftdi_sio]
>> >> > > [290625.713109] CPU: 0 PID: 25842 Comm: kworker/0:0 Tainted: G        W       4.4.2-amd64-i915-volpreempt-20160214bc3 #5
>> >> > > [290625.747597] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
>> >> > > [290625.776918] Workqueue: events cache_set_flush [bcache]
>> >> > > [290625.794260] task: ffff880467caa0c0 ti: ffff8805af184000 task.ti: ffff8805af184000
>> >> > > [290625.818544] RIP: 0010:[<ffffffffc0516e8d>]  [<ffffffffc0516e8d>] cache_set_flush+0x102/0x15c [bcache]
>> >> > > [290625.848082] RSP: 0018:ffff8805af187e10  EFLAGS: 00010202
>> >> > > [290625.866890] RAX: 0000000000000001 RBX: ffff880153a40ce8 RCX: 0000000000000282
>> >> > > [290625.891136] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
>> >> > > [290625.914365] RBP: ffff8805af187e30 R08: 00000000000000fa R09: ffffe8ffffc01900
>> >> > > [290625.937588] R10: ffff880088430838 R11: ffff880088430838 R12: 0000000000000000
>> >> > > [290625.960781] R13: ffff880153a40de8 R14: 0000000000000000 R15: ffffe8ffffc01900
>> >> > > [290625.983951] FS:  0000000000000000(0000) GS:ffff88061e200000(0000) knlGS:0000000000000000
>> >> > > [290626.010918] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> >> > > [290626.030800] CR2: 00000000000009b8 CR3: 0000000001e0a000 CR4: 00000000001406f0
>> >> > > [290626.053914] Stack:
>> >> > > [290626.061687]  ffff8805d6183280 ffff88061e216040 ffff880153a40ce8 0000000000000000
>> >> > > [290626.085725]  ffff8805af187e78 ffffffff81070fcf ffff8805d61832b0 000000001e216058
>> >> > > [290626.110510]  ffff8805d6183280 ffff88061e216040 ffff8805d61832b0 ffff88061e216058
>> >> > > [290626.135329] Call Trace:
>> >> > > [290626.145108]  [<ffffffff81070fcf>] process_one_work+0x15b/0x260
>> >> > > [290626.164241]  [<ffffffff81071574>] worker_thread+0x1f0/0x29d
>> >> > > [290626.182561]  [<ffffffff81071384>] ? rescuer_thread+0x281/0x281
>> >> > > [290626.201721]  [<ffffffff810758e9>] kthread+0xa5/0xad
>> >> > > [290626.217966]  [<ffffffff81075844>] ? kthread_parkme+0x24/0x24
>> >> > > [290626.236505]  [<ffffffff816e1c0f>] ret_from_fork+0x3f/0x70
>> >> > > [290626.255052]  [<ffffffff81075844>] ? kthread_parkme+0x24/0x24
>> >> > > [290626.274460] Code: 89 1c c1 4d 8b a4 24 f0 01 00 00 49 81 ec f0 01 00 00 eb b4 45 31 e4 0f b7 83 4c f7 ff ff 4a 8b 94 e3 58 ff ff ff 44 39 e0 76 16 <48> 8b ba b8 09 00 00 48 85 ff 74 05 e8 61 e8 b5 c0 49 ff c4 eb
>> >> > > [290626.337443] RIP  [<ffffffffc0516e8d>] cache_set_flush+0x102/0x15c [bcache]
>> >> > > [290626.359833]  RSP <ffff8805af187e10>
>> >> > > [290626.372050] CR2: 00000000000009b8
>> >> > > [290626.383646] ---[ end trace f97d88b8bb456800 ]---
>> >> > > [290626.399977] Kernel panic - not syncing: Fatal exception
>> >> > > [290626.418169] Kernel Offset: disabled
>> >> > > [290626.430278] Rebooting in 20 seconds..
>> >> > > [290646.378903] ACPI MEMORY or I/O RESET_REG.
>> >> > > --
>> >> > > "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
>> >> > > Microsoft is to operating systems ....
>> >> > >                                       .... what McDonalds is to gourmet cooking
>> >> > > Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901
>> >> > > --
>> >> > > To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
>> >> > > the body of a message to majordomo@vger.kernel.org
>> >> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> >> > >
>> >> > --
>> >> > To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
>> >> > the body of a message to majordomo@vger.kernel.org
>> >> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> >> > --
>> >> > To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
>> >> > the body of a message to majordomo@vger.kernel.org
>> >> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> >> >
>> >> --
>> >> To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
>> >> the body of a message to majordomo@vger.kernel.org
>> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> >>
>>
>
> --
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems ....
>                                       .... what McDonalds is to gourmet cooking
> Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Bcache still unstable for me (memory problems)
  2016-03-10 15:29       ` Bcache still unstable for me (memory problems) Marc MERLIN
  2016-03-10 15:41         ` Christoph Nelles
@ 2016-03-24 21:25         ` Marc MERLIN
  2016-03-25  3:52           ` Eric Wheeler
  1 sibling, 1 reply; 20+ messages in thread
From: Marc MERLIN @ 2016-03-24 21:25 UTC (permalink / raw)
  To: Eric Wheeler; +Cc: Richard Bade, linux-bcache

On Thu, Mar 10, 2016 at 07:29:58AM -0800, Marc MERLIN wrote:
> On Wed, Mar 09, 2016 at 06:15:16PM -0800, Marc MERLIN wrote:
> > On Thu, Mar 10, 2016 at 01:34:56AM +0000, Eric Wheeler wrote:
> > > Hi Richard, Marc,
> > > 
> > > >>> [290623.673871] bcache-register: page allocation failure: order:7, mode:0x24080c0
> > > 
> > > Do you still have the backtraces that show the function call stack for 
> > > errors that look like this?
> > > 	%s: page allocation failure: order:%d, mode:0x%x 
> > > 
> > > Please send as many relevant OOM failure traces that you can.  I would 
> > > like to see which memory allocation(s) are failing and if they are always 
> > > the same stack trace.
> >  
> > It's the same one I already sent you, just from syslog instead of serial
> > console (I was looking for other relevant cronjobs or errors per your
> > request)
>  
> Ok, here's a new one with your patches installed. Good thing is that
> kernel didn't crash. But adding caching device still failed.

Mmmh, so I just stopped a bcache, made another one with make-bcache, and my machine died.

Sadly, the actual crash wasn't captured due to my serial console server having failed.
I got this below from remote syslog, the real crash happened later and wasn't logged.

Hope this helps.

bcache-register: page allocation failure: order:7, mode:0x24080c0
CPU: 0 PID: 13517 Comm: bcache-register Not tainted 4.4.2-amd64-i915-volpreempt-sysrq-20160214bc5 #8
Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
 0000000000000000 ffff88037cb4fb00 ffffffff8134ae4a 0000000000000001
 ffff88037cb4fb98 ffffffff81124b00 024080c07cb4fc3c 024080c000000040
 0000000700000001 0000000000000007 0000000000000007 0000000000000040
Call Trace:
 [<ffffffff8134ae4a>] dump_stack+0x44/0x55
 [<ffffffff81124b00>] warn_alloc_failed+0x114/0x12c
 [<ffffffff81127502>] __alloc_pages_nodemask+0x7cb/0x84c
 [<ffffffff8107dfca>] ? try_to_wake_up+0x1d9/0x1eb
 [<ffffffff8115f721>] alloc_pages_current+0xa9/0xcd
 [<ffffffff811237c8>] __get_free_pages+0xe/0x3c
 [<ffffffffc059530b>] bch_cache_set_alloc+0x3c7/0x479 [bcache]
 [<ffffffffc05965ea>] register_bcache+0x122d/0x143c [bcache]
 [<ffffffff8134c893>] kobj_attr_store+0x10/0x1a
 [<ffffffff811dfa29>] sysfs_kf_write+0x39/0x3b
 [<ffffffff811df32b>] kernfs_fop_write+0xed/0x130
 [<ffffffff8117d9c7>] __vfs_write+0x26/0xa5
 [<ffffffff810b29b3>] ? current_kernel_time64+0x10/0x36
 [<ffffffff812d609a>] ? security_file_permission+0x3b/0x42
 [<ffffffff81091634>] ? percpu_down_read+0x12/0x41
 [<ffffffff8117fbab>] ? __sb_start_write+0x2b/0x48
 [<ffffffff8117e032>] vfs_write+0x9d/0xe8
 [<ffffffff8117e807>] SyS_write+0x4d/0x78
 [<ffffffff810039c3>] do_fast_syscall_32+0xb3/0xf3
 [<ffffffff816e3e72>] sysenter_flags_fixed+0x8/0x12
Mem-Info:
active_anon:468787 inactive_anon:167077 isolated_anon:0
 active_file:1380425 inactive_file:1192252 isolated_file:0
 unevictable:1474 dirty:39397 writeback:3372 unstable:0
 slab_reclaimable:164804 slab_unreclaimable:70389
 mapped:416857 shmem:409963 pagetables:4551 bounce:0
 free:302751 free_pcp:1140 free_cma:384
Node 0 DMA free:15892kB min:256kB low:320kB high:384kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB ilowmem_reserve[]: 0 3201 15736 15736
Node 0 DMA32 free:554388kB min:53336kB low:66668kB high:80004kB active_anon:185968kB inactive_anon:196384kB active_file:1429216kB inactive_flowmem_reserve[]: 0 0 12535 12535
Node 0 Normal free:651784kB min:208544kB low:260680kB high:312816kB active_anon:1688260kB inactive_anon:471920kB active_file:4092556kB inact lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 1*4kB (U) 0*8kB 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15892kB
Node 0 DMA32: 20169*4kB (UME) 59104*8kB (UME) 46*16kB (UM) 4*32kB (UM) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 554372kB
Node 0 Normal: 58345*4kB (UMEC) 40698*8kB (UMEC) 4028*16kB (UMEC) 430*32kB (UMEC) 160*64kB (UM) 35*128kB (UM) 10*256kB (M) 0*512kB 0*1024kB 
Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
2991157 total pagecache pages
7348 pages in swap cache
Swap cache stats: add 516663, delete 509315, find 10110495/10164991
Free swap  = 14686944kB
Total swap = 15616764kB
4118751 pages RAM
0 pages HighMem/MovableOnly
85016 pages reserved
4096 pages cma reserved
0 pages hwpoisoned
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Bcache still unstable for me (memory problems)
  2016-03-24 21:25         ` Marc MERLIN
@ 2016-03-25  3:52           ` Eric Wheeler
  0 siblings, 0 replies; 20+ messages in thread
From: Eric Wheeler @ 2016-03-25  3:52 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: Richard Bade, linux-bcache

On Thu, 24 Mar 2016, Marc MERLIN wrote:

> On Thu, Mar 10, 2016 at 07:29:58AM -0800, Marc MERLIN wrote:
> > On Wed, Mar 09, 2016 at 06:15:16PM -0800, Marc MERLIN wrote:
> > > On Thu, Mar 10, 2016 at 01:34:56AM +0000, Eric Wheeler wrote:
> > > > Hi Richard, Marc,
> > > > 
> > > > >>> [290623.673871] bcache-register: page allocation failure: order:7, mode:0x24080c0
> > > > 
> > > > Do you still have the backtraces that show the function call stack for 
> > > > errors that look like this?
> > > > 	%s: page allocation failure: order:%d, mode:0x%x 
> > > > 
> > > > Please send as many relevant OOM failure traces that you can.  I would 
> > > > like to see which memory allocation(s) are failing and if they are always 
> > > > the same stack trace.
> > >  
> > > It's the same one I already sent you, just from syslog instead of serial
> > > console (I was looking for other relevant cronjobs or errors per your
> > > request)
> >  
> > Ok, here's a new one with your patches installed. Good thing is that
> > kernel didn't crash. But adding caching device still failed.
> 
> Mmmh, so I just stopped a bcache, made another one with make-bcache, and my machine died.
> 
> Sadly, the actual crash wasn't captured due to my serial console server having failed.
> I got this below from remote syslog, the real crash happened later and wasn't logged.
> 
> Hope this helps.
> 
> bcache-register: page allocation failure: order:7, mode:0x24080c0

Unfortunately it only shows me that you're out of memory and that the 
kernel can't find 512k of contiguous memory to allocate.  The crash might 
be related to high memory pressure, or it could be something else---but 
without the crash trace I can only speculate.

-Eric

> CPU: 0 PID: 13517 Comm: bcache-register Not tainted 4.4.2-amd64-i915-volpreempt-sysrq-20160214bc5 #8
> Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
>  0000000000000000 ffff88037cb4fb00 ffffffff8134ae4a 0000000000000001
>  ffff88037cb4fb98 ffffffff81124b00 024080c07cb4fc3c 024080c000000040
>  0000000700000001 0000000000000007 0000000000000007 0000000000000040
> Call Trace:
>  [<ffffffff8134ae4a>] dump_stack+0x44/0x55
>  [<ffffffff81124b00>] warn_alloc_failed+0x114/0x12c
>  [<ffffffff81127502>] __alloc_pages_nodemask+0x7cb/0x84c
>  [<ffffffff8107dfca>] ? try_to_wake_up+0x1d9/0x1eb
>  [<ffffffff8115f721>] alloc_pages_current+0xa9/0xcd
>  [<ffffffff811237c8>] __get_free_pages+0xe/0x3c
>  [<ffffffffc059530b>] bch_cache_set_alloc+0x3c7/0x479 [bcache]
>  [<ffffffffc05965ea>] register_bcache+0x122d/0x143c [bcache]
>  [<ffffffff8134c893>] kobj_attr_store+0x10/0x1a
>  [<ffffffff811dfa29>] sysfs_kf_write+0x39/0x3b
>  [<ffffffff811df32b>] kernfs_fop_write+0xed/0x130
>  [<ffffffff8117d9c7>] __vfs_write+0x26/0xa5
>  [<ffffffff810b29b3>] ? current_kernel_time64+0x10/0x36
>  [<ffffffff812d609a>] ? security_file_permission+0x3b/0x42
>  [<ffffffff81091634>] ? percpu_down_read+0x12/0x41
>  [<ffffffff8117fbab>] ? __sb_start_write+0x2b/0x48
>  [<ffffffff8117e032>] vfs_write+0x9d/0xe8
>  [<ffffffff8117e807>] SyS_write+0x4d/0x78
>  [<ffffffff810039c3>] do_fast_syscall_32+0xb3/0xf3
>  [<ffffffff816e3e72>] sysenter_flags_fixed+0x8/0x12
> Mem-Info:
> active_anon:468787 inactive_anon:167077 isolated_anon:0
>  active_file:1380425 inactive_file:1192252 isolated_file:0
>  unevictable:1474 dirty:39397 writeback:3372 unstable:0
>  slab_reclaimable:164804 slab_unreclaimable:70389
>  mapped:416857 shmem:409963 pagetables:4551 bounce:0
>  free:302751 free_pcp:1140 free_cma:384
> Node 0 DMA free:15892kB min:256kB low:320kB high:384kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB ilowmem_reserve[]: 0 3201 15736 15736
> Node 0 DMA32 free:554388kB min:53336kB low:66668kB high:80004kB active_anon:185968kB inactive_anon:196384kB active_file:1429216kB inactive_flowmem_reserve[]: 0 0 12535 12535
> Node 0 Normal free:651784kB min:208544kB low:260680kB high:312816kB active_anon:1688260kB inactive_anon:471920kB active_file:4092556kB inact lowmem_reserve[]: 0 0 0 0
> Node 0 DMA: 1*4kB (U) 0*8kB 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15892kB
> Node 0 DMA32: 20169*4kB (UME) 59104*8kB (UME) 46*16kB (UM) 4*32kB (UM) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 554372kB
> Node 0 Normal: 58345*4kB (UMEC) 40698*8kB (UMEC) 4028*16kB (UMEC) 430*32kB (UMEC) 160*64kB (UM) 35*128kB (UM) 10*256kB (M) 0*512kB 0*1024kB 
> Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> 2991157 total pagecache pages
> 7348 pages in swap cache
> Swap cache stats: add 516663, delete 509315, find 10110495/10164991
> Free swap  = 14686944kB
> Total swap = 15616764kB
> 4118751 pages RAM
> 0 pages HighMem/MovableOnly
> 85016 pages reserved
> 4096 pages cma reserved
> 0 pages hwpoisoned
> -- 
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems ....
>                                       .... what McDonalds is to gourmet cooking
> Home page: http://marc.merlins.org/  
> --
> To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Bcache still unstable for me (memory problems)
  2016-03-21  0:05     ` Richard Bade
  2016-03-21  0:46       ` Marc MERLIN
@ 2016-03-25  3:59       ` Eric Wheeler
  1 sibling, 0 replies; 20+ messages in thread
From: Eric Wheeler @ 2016-03-25  3:59 UTC (permalink / raw)
  To: Richard Bade; +Cc: linux-bcache, Marc MERLIN

On Mon, 21 Mar 2016, Richard Bade wrote:

> Hi Eric,
> We built a kernel with the 3 patches applied and I still got the box to lock up.
> I don't have any of the traces any more. Also I'm wondering what is
> the best way to get these? When this happens my machine completely
> locks up so can't grab anything from dmesg.
> Let me know what you want me to grab and I'll crash it again.

Netconsole is my favorite (modprobe netconsole args).  On distro's that 
provide /etc/sysconfig/netconsole its pretty easy to point the logs at a 
syslog server---otherwise you'll need to get the MAC addr of the 
destination host (which might be your router if its on a different layer3 
network).  Then just configure syslog to receive logs.  Other ways include 
serial consoles (add 'console=ttyS0,115200n81 console=tty1' to kernel 
cmdline).

For virtual machines (KVM at least) you can usually configure a virtual 
serial console and get it that way.  In the absence of all of that, 
vga=xxx will make your display bigger if you find the right resolution and 
somethimes that is useful if the others aren't an option.

-Eric

> 
> Regards,
> Richard
> 
> On 10 March 2016 at 14:34, Eric Wheeler <bcache@lists.ewheeler.net> wrote:
> > Hi Richard, Marc,
> >
> >>>> [290623.673871] bcache-register: page allocation failure: order:7, mode:0x24080c0
> >
> > Do you still have the backtraces that show the function call stack for
> > errors that look like this?
> >         %s: page allocation failure: order:%d, mode:0x%x
> >
> > Please send as many relevant OOM failure traces that you can.  I would
> > like to see which memory allocation(s) are failing and if they are always
> > the same stack trace.
> >
> > In the example above, order 7 means 2^7 of 4k pages, so it means the
> > kernel can't find 512k of contiguous memory that can be allocated.
> >
> > It looks like the OOM is triggered in bch_cache_set_alloc, but might be
> > cache_alloc too.  I'm not sure if an alternate allocation mechanism can be
> > used safely, but thats what I want to look into.
> >
> > Thanks!
> >
> > -Eric
> >
> > --
> > Eric Wheeler
> >
> > On Wed, 9 Mar 2016, Eric Wheeler wrote:
> >
> >> On Wed, 9 Mar 2016, Richard Bade wrote:
> >>
> >> > Hi Guys,
> >> > I've also seen this issue when registering a device.
> >> > [Fri Feb  5 14:06:59 2016] bcache: register_cache() error opening
> >> > dm-6: cannot allocate memory
> >> >
> >> > For me the situation happens because I'm running Ceph OSD's on the
> >> > host. These are using all the available memory and for some reason the
> >> > register cannot allocate memory.
> >> > I've reproduced this several times, but basically if I register on a
> >> > system that has been up for a while and therefore using all it's RAM
> >> > this will happen.
> >> > Here's two specific time's it's happened:
> >> > when creating using "make-bcache --cache /dev/vg-raid0/lv-ceph_j19
> >> > --bdev /dev/sdf1". It failed due to xfs still remaining on /dev/sdf1.
> >>
> >> Do you get a kernel failure, or is it just make-bcache being safe?
> >>
> >> > Then run wipefs and "make-bcache --cache /dev/vg-raid0/lv-ceph_j19
> >> > --bdev /dev/sdf1" again. CRASH.
> >>
> >> make-bcache will auto register.  (Maybe there's an option to stop that,
> >> I've not looked.)
> >>
> >> > when manually creating a cache device using "make-bcache --cache
> >> > /dev/vg-raid0/lv-bcache". The cache set didn't show up in
> >> > /sys/fs/bcache. echo "/dev/vg-raid0/lv-bcache" >
> >> > /sys/fs/bcache/register. CRASH.
> >> > I was able to work around this issue by stopping the osd's before
> >> > registering and therefore freeing up memory.
> >>
> >> I would be curious if setting /proc/sys/vm/min_free_kbytes to 256*1024
> >> fixes that.
> >>
> >> > Because I am able to reproduce this relatively easily, I'm happy to
> >> > test any patches.
> >>
> >> Yes, thank you, please do:
> >>
> >> Try the three patches at the top of these two branches and see if you
> >> crash.  I'm hoping it fails gracefully on OOM:
> >>
> >> https://bitbucket.org/ewheelerinc/linux/branch/v4.5-rc6-bcache-fixes
> >>       commits 84a2edd, a704484
> >> https://bitbucket.org/ewheelerinc/linux/branch/v4.5-rc7-bcache-fixes
> >>       commit 87264e9
> >>
> >> If it stops crashing, then maybe we can work out a way to allocate memory
> >> such that it succeeds.  Please send any dmesg output.
> >>
> >> -Eric
> >>
> >> > Apart from the register crash, the cache sets have been up and stable
> >> > for over 1 month.
> >> >
> >> > Regards,
> >> > Richard
> >> >
> >> > On Mon, 7 Mar 2016, Marc MERLIN wrote:
> >> >
> >> > > I started a btrfs copy from a bcached device to another one (both are in
> >> > > the 5 to 15TB range).
> >> > > A minute later, my system that had been up for several days, crashed.
> >> > >
> >> > > Ultimately, it crashed on this:
> >> > > bcache: register_cache() error opening sdl2: cannot allocate memory
> >> > > BUG: unable to handle kernel NULL pointer dereference at 00000000000009b8
> >> > > IP: [<ffffffffc0516e8d>] cache_set_flush+0x102/0x15c [bcache]
> >> >
> >> > Strange about memory allocation issues.  Do you have
> >> > /proc/sys/vm/min_free_kbytes set to something like $((256*1024)) ? Is this
> >> > a multi-socket machine with all memory plugged into only one CPU?
> >> >
> >> > I'm curious though, why was registration called a second time? Was the
> >> > drive external?  Could udev be re-registering the device?
> >> >
> >> > You might find where the registration is being done and prevent it from
> >> > running automatically.  At least that might solve the re-registration
> >> > problem.
> >> >
> >> > As for the memory allocation issue, the backtrace indicates that this is a
> >> > registration-time problem, not a runtime issue.  I'm guessing it is one of
> >> > the threads attempting to proceed after a memory allocation error similar
> >> > to the writeback thread issue you had last time which was fixed by adding
> >> > some locking around the initialization.
> >> >
> >> > I'll look and see if I can come up with a reasonable patch to cleanup the
> >> > allocator or gc thread under OOM conditions at registration time.
> >> >
> >> > -Eric
> >> >
> >> > >
> >> > > I'm not sure why it tried to register sdl2 then when it was already
> >> > > registered from boot. Did it time out and had to be registered again?
> >> > >
> >> > > At this point, things with bcache have been bad enough, that I'm
> >> > > considering removing it. My system has been very unstable with it :(
> >> > > Or maybe I can just remove the caching devices, leave the backing ones,
> >> > > and run without cache for now, giving me the option to re-add the cache later.
> >> > >
> >> > > I actually just added 16GB of RAM to the server after adding bcache, and
> >> > > clearly going from 8 to 24GB made no difference. I'm wondering if there
> >> > > is a memory leak somewhere?
> >> > >
> >> > > Just before the system crashed, memory looked like this, which isn't bad:
> >> > >              total       used       free     shared    buffers     cached
> >> > > Mem:      24392440   24241440     151000          0         88   20116748
> >> > > -/+ buffers/cache:    4124604   20267836
> >> > > Swap:     15616764     244452   15372312
> >> > >
> >> > > I'm using btrfs on top of dmcrypt on top of bcache.
> >> > > btrfs on top of dmcrypt was stable, I had been using it for years. Adding bcache is what
> >> > > caused all my recent problems to start.
> >> > >
> >> > > Any ideas?
> >> > >
> >> > > Thanks,
> >> > > Marc
> >> > >
> >> > >
> >> > > [290569.458707] Process accounting resumed
> >> > > [290623.673871] bcache-register: page allocation failure: order:7, mode:0x24080c0
> >> > > [290623.697143] CPU: 0 PID: 1284 Comm: bcache-register Tainted: G        W       4.4.2-amd64-i915-volpreempt-20160214bc3 #5
> >> > > [290623.732143] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
> >> > > [290623.762428]  0000000000000000 ffff88001df7bb08 ffffffff8134ae0a 0000000000000001
> >> > > [290623.787470]  ffff88001df7bba0 ffffffff81124ab6 024080c01df7bc44 024080c000000040
> >> > > [290623.811775]  0000000700000001 0000000000000007 0000000000000007 0000000000000040
> >> > > [290623.836443] Call Trace:
> >> > > [290623.846280]  [<ffffffff8134ae0a>] dump_stack+0x44/0x55
> >> > > [290623.864123]  [<ffffffff81124ab6>] warn_alloc_failed+0x114/0x12c
> >> > > [290623.883425]  [<ffffffff811274b8>] __alloc_pages_nodemask+0x7cb/0x84c
> >> > > [290623.904159]  [<ffffffff8107df80>] ? try_to_wake_up+0x1d9/0x1eb
> >> > > [290623.923173]  [<ffffffff8115f6d7>] alloc_pages_current+0xa9/0xcd
> >> > > [290623.942435]  [<ffffffff8112377e>] __get_free_pages+0xe/0x3c
> >> > > [290623.960621]  [<ffffffffc05192f0>] bch_cache_set_alloc+0x3c7/0x479 [bcache]
> >> > > [290623.983825]  [<ffffffffc051a5e3>] register_bcache+0x1241/0x1452 [bcache]
> >> > > [290624.005456]  [<ffffffff8134c853>] kobj_attr_store+0x10/0x1a
> >> > > [290624.023729]  [<ffffffff811df9df>] sysfs_kf_write+0x39/0x3b
> >> > > [290624.041739]  [<ffffffff811df2e1>] kernfs_fop_write+0xed/0x130
> >> > > [290624.060394]  [<ffffffff8117d97d>] __vfs_write+0x26/0xa5
> >> > > [290624.077452]  [<ffffffff810b2969>] ? current_kernel_time64+0x10/0x36
> >> > > [290624.097921]  [<ffffffff812d6050>] ? security_file_permission+0x3b/0x42
> >> > > [290624.119197]  [<ffffffff810915ea>] ? percpu_down_read+0x12/0x41
> >> > > [290624.138945]  [<ffffffff8117fb61>] ? __sb_start_write+0x2b/0x48
> >> > > [290624.158291]  [<ffffffff8117dfe8>] vfs_write+0x9d/0xe8
> >> > > [290624.174919]  [<ffffffff8117e7bd>] SyS_write+0x4d/0x78
> >> > > [290624.191455]  [<ffffffff810039c3>] do_fast_syscall_32+0xb3/0xf3
> >> > > [290624.210331]  [<ffffffff816e3e32>] sysenter_flags_fixed+0x8/0x12
> >> > > [290624.229355] Mem-Info:
> >> > > [290624.237432] active_anon:607145 inactive_anon:208571 isolated_anon:0
> >> > > [290624.237432]  active_file:2103407 inactive_file:1565209 isolated_file:0
> >> > > [290624.237432]  unevictable:1307 dirty:188846 writeback:138775 unstable:0
> >> > > [290624.237432]  slab_reclaimable:180112 slab_unreclaimable:73373
> >> > > [290624.237432]  mapped:419640 shmem:466581 pagetables:5778 bounce:0
> >> > > [290624.237432]  free:988612 free_pcp:2130 free_cma:14
> >> > > [290624.351715] Node 0 DMA free:15884kB min:12kB low:12kB high:16kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15976kB managed:15892kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
> >> > > [290624.483875] lowmem_reserve[]: 0 3201 23800 23800
> >> > > [290624.500998] Node 0 DMA32 free:581000kB min:2648kB low:3308kB high:3972kB active_anon:229220kB inactive_anon:248156kB active_file:1102468kB inactive_file:832264kB unevictable:948kB isolated(anon):0kB isolated(file):68kB present:3362068kB managed:3283032kB mlocked:68719477684kB dirty:94316kB writeback:65996kB mapped:253400kB shmem:282168kB slab_reclaimable:96344kB slab_unreclaimable:41308kB kernel_stack:2480kB pagetables:2952kB unstable:0kB bounce:0kB free_pcp:4008kB local_pcp:140kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
> >> > > [290624.655741] lowmem_reserve[]: 0 0 20599 20599
> >> > > [290624.674232] Node 0 Normal free:3368136kB min:17048kB low:21308kB high:25572kB active_anon:2203780kB inactive_anon:586104kB active_file:7311280kB inactive_file:5430068kB unevictable:4280kB isolated(anon):0kB isolated(file):0kB present:21485568kB managed:21093516kB mlocked:481036341432kB dirty:662172kB writeback:472472kB mapped:1425288kB shmem:1584132kB slab_reclaimable:624104kB slab_unreclaimable:251680kB kernel_stack:11696kB pagetables:20180kB unstable:0kB bounce:0kB free_pcp:4304kB local_pcp:588kB free_cma:56kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
> >> > > [290624.834378] lowmem_reserve[]: 0 0 0 0
> >> > > [290624.848459] Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15884kB
> >> > > [290624.895688] Node 0 DMA32: 30635*4kB (UME) 23925*8kB (UME) 7478*16kB (UME) 2420*32kB (UME) 794*64kB (UME) 139*128kB (UME) 10*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 582196kB
> >> > > [290624.947950] Node 0 Normal: 241499*4kB (UME) 161875*8kB (UME) 33424*16kB (UME) 10915*32kB (UME) 2795*64kB (UME) 334*128kB (UME) 15*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3370532kB
> >> > > [290625.003724] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> >> > > [290625.030463] 4142937 total pagecache pages
> >> > > [290625.043969] 5782 pages in swap cache
> >> > > [290625.056118] Swap cache stats: add 102837, delete 97055, find 627323/635133
> >> > > [290625.078329] Free swap  = 15372324kB
> >> > > [290625.090190] Total swap = 15616764kB
> >> > > [290625.102186] 6215903 pages RAM
> >> > > [290625.112399] 0 pages HighMem/MovableOnly
> >> > > [290625.125747] 117793 pages reserved
> >> > > [290625.137495] 4096 pages cma reserved
> >> > > [290625.149199] 0 pages hwpoisoned
> >> > > [290625.159961] bcache: register_cache() error opening sdl2: cannot allocate memory
> >> > > [290625.183410] BUG: unable to handle kernel NULL pointer dereference at 00000000000009b8
> >> > > [290625.208648] IP: [<ffffffffc0516e8d>] cache_set_flush+0x102/0x15c [bcache]
> >> > > [290625.230734] PGD 0
> >> > > [290625.238446] Oops: 0000 [#1] SMP
> >> > > [290625.249857] Modules linked in: udp_diag tcp_diag inet_diag veth ip6table_filter ip6_tables ebtable_nat ebtables ppdev lp xt_addrtype tun bridge stp llc autofs4 softdog binfmt_misc ftdi_sio nfsd nfs_acl auth_rpcgss nfs fscache lockd grace sunrpc ipt_REJECT nf_reject_ipv4 xt_conntrack xt_mark xt_nat xt_tcpudp nf_log_ipv4 nf_log_common xt_LOG iptable_mangle iptable_filter bcache lm85 hwmon_vid pl2303 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 ip_tables nf_conntrack_ftp ipt_MASQUERADE x_tables nf_nat_masquerade_ipv4 nf_nat nf_conntrack sg st snd_pcm_oss snd_mixer_oss kvm_intel snd_hda_codec_realtek snd_hda_codec_generic snd_opl3_synth kvm snd_seq_midi_emul snd_hda_intel snd_cmipci gameport snd_hda_codec snd_opl3_lib snd_mpu401_uart snd_seq_midi irqbypass snd_seq_midi
 _event snd_seq snd_hda_core snd_hwdep snd_rawmidi asix coretemp eeepc_wmi tpm_infineon rc_ati_x10 snd_pcm libphy snd_seq_device asus_wmi snd_timer sparse_keymap rfkill intel_rapl 8250_fintek
  a
>  t
> >  i_
> >>  re
> >> >  mo
> >> > >  te usbnet tpm_tis snd rc_core battery tpm evdev iosf_mbi wmi pcspkr input_leds intel_powerclamp parport_pc processor i2c_i801 xhci_pci x86_pkg_temp_thermal usbserial soundcore parport xhci_hcd lpc_ich e1000e ptp pps_core fuse raid456 multipath mmc_block mmc_core dm_snapshot dm_bufio dm_mirror dm_region_hash dm_log dm_crypt dm_mod async_raid6_recov async_pq async_xor async_memcpy async_tx blowfish_x86_64 blowfish_common ecb xts ansi_cprng drbg aesni_intel ablk_helper lrw gf128mul glue_helper aes_x86_64 crct10dif_pclmul crc32_pclmul crc32c_intel r8169 ehci_pci fjes ehci_hcd thermal mii sata_mv fan cryptd usbcore usb_common sata_sil24 [last unloaded: ftdi_sio]
> >> > > [290625.713109] CPU: 0 PID: 25842 Comm: kworker/0:0 Tainted: G        W       4.4.2-amd64-i915-volpreempt-20160214bc3 #5
> >> > > [290625.747597] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
> >> > > [290625.776918] Workqueue: events cache_set_flush [bcache]
> >> > > [290625.794260] task: ffff880467caa0c0 ti: ffff8805af184000 task.ti: ffff8805af184000
> >> > > [290625.818544] RIP: 0010:[<ffffffffc0516e8d>]  [<ffffffffc0516e8d>] cache_set_flush+0x102/0x15c [bcache]
> >> > > [290625.848082] RSP: 0018:ffff8805af187e10  EFLAGS: 00010202
> >> > > [290625.866890] RAX: 0000000000000001 RBX: ffff880153a40ce8 RCX: 0000000000000282
> >> > > [290625.891136] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> >> > > [290625.914365] RBP: ffff8805af187e30 R08: 00000000000000fa R09: ffffe8ffffc01900
> >> > > [290625.937588] R10: ffff880088430838 R11: ffff880088430838 R12: 0000000000000000
> >> > > [290625.960781] R13: ffff880153a40de8 R14: 0000000000000000 R15: ffffe8ffffc01900
> >> > > [290625.983951] FS:  0000000000000000(0000) GS:ffff88061e200000(0000) knlGS:0000000000000000
> >> > > [290626.010918] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> > > [290626.030800] CR2: 00000000000009b8 CR3: 0000000001e0a000 CR4: 00000000001406f0
> >> > > [290626.053914] Stack:
> >> > > [290626.061687]  ffff8805d6183280 ffff88061e216040 ffff880153a40ce8 0000000000000000
> >> > > [290626.085725]  ffff8805af187e78 ffffffff81070fcf ffff8805d61832b0 000000001e216058
> >> > > [290626.110510]  ffff8805d6183280 ffff88061e216040 ffff8805d61832b0 ffff88061e216058
> >> > > [290626.135329] Call Trace:
> >> > > [290626.145108]  [<ffffffff81070fcf>] process_one_work+0x15b/0x260
> >> > > [290626.164241]  [<ffffffff81071574>] worker_thread+0x1f0/0x29d
> >> > > [290626.182561]  [<ffffffff81071384>] ? rescuer_thread+0x281/0x281
> >> > > [290626.201721]  [<ffffffff810758e9>] kthread+0xa5/0xad
> >> > > [290626.217966]  [<ffffffff81075844>] ? kthread_parkme+0x24/0x24
> >> > > [290626.236505]  [<ffffffff816e1c0f>] ret_from_fork+0x3f/0x70
> >> > > [290626.255052]  [<ffffffff81075844>] ? kthread_parkme+0x24/0x24
> >> > > [290626.274460] Code: 89 1c c1 4d 8b a4 24 f0 01 00 00 49 81 ec f0 01 00 00 eb b4 45 31 e4 0f b7 83 4c f7 ff ff 4a 8b 94 e3 58 ff ff ff 44 39 e0 76 16 <48> 8b ba b8 09 00 00 48 85 ff 74 05 e8 61 e8 b5 c0 49 ff c4 eb
> >> > > [290626.337443] RIP  [<ffffffffc0516e8d>] cache_set_flush+0x102/0x15c [bcache]
> >> > > [290626.359833]  RSP <ffff8805af187e10>
> >> > > [290626.372050] CR2: 00000000000009b8
> >> > > [290626.383646] ---[ end trace f97d88b8bb456800 ]---
> >> > > [290626.399977] Kernel panic - not syncing: Fatal exception
> >> > > [290626.418169] Kernel Offset: disabled
> >> > > [290626.430278] Rebooting in 20 seconds..
> >> > > [290646.378903] ACPI MEMORY or I/O RESET_REG.
> >> > > --
> >> > > "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> >> > > Microsoft is to operating systems ....
> >> > >                                       .... what McDonalds is to gourmet cooking
> >> > > Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901
> >> > > --
> >> > > To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> >> > > the body of a message to majordomo@vger.kernel.org
> >> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >> > >
> >> > --
> >> > To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> >> > the body of a message to majordomo@vger.kernel.org
> >> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >> > --
> >> > To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> >> > the body of a message to majordomo@vger.kernel.org
> >> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >> >
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Bcache still unstable for me (memory problems)
  2016-03-09  3:59     ` Eric Wheeler
@ 2016-03-09 20:55       ` Marc MERLIN
  0 siblings, 0 replies; 20+ messages in thread
From: Marc MERLIN @ 2016-03-09 20:55 UTC (permalink / raw)
  To: Eric Wheeler; +Cc: linux-bcache

On Wed, Mar 09, 2016 at 03:59:30AM +0000, Eric Wheeler wrote:
> > gargamel:/mnt/mnt# cat /proc/sys/vm/min_free_kbytes
> > 19712
> 
> Ours is set to 256mb (256*1024) and I've never had a problem.  
> 
> > Should I change it?
> 
> Could try it, shrug.

Done :)

> > So clearly on this boot too, it got registered late (20h-ish after boot)
> 
> I find it interesting that it re-registered md5 within 5 minutes of 24 
> hours after initial registration: (86242-102)/3600 = 23:55:40
 
Oh, I didn't do the math, but yeah, that looks very suspicous :)

> Is there some kind of cron.daily thing going on?  If you have timestamps 
> for that kernel log, maybe check cron for logs too.

I don't have any cronjobs that do anything with bcache, but I have
cronjobs that scan all drives and save all partitions plus other related
stuff.
That said, if I recall correctly, it died just when I restarted a copy
to that filesystem (but my memory of that event is getting hazy now).

I checked syslog around that time though 
Mar  6 06:29:50 gargamel kernel: [204255.678548] bcache-register: page allocation failure: order:7, mode:0x24080c0
and found nothing related to a cronjob or /dev/sdl disappearing and
coming back.

> Are there any intevening non-bcache lines indicating a disk was removed 
> (eg, bad usb cable) and re-added?  

Good question, none that I can find.
 
> > > Do you have this patch?
> > > https://bitbucket.org/ewheelerinc/linux/commits/a7044848050ac60e178798d20ea8a3ef2be36bc7?at=master
> >  
> > I got the other patches you sent me last time, but didn't end up with
> > this one, sorry if you sent it to me and I dropped it.
> > I'll apply it now, thanks.
> 
> All of the patches related to troubleshooting with you are here:
> 
>   https://bitbucket.org/ewheelerinc/linux/branch/v4.5-rc6-bcache-fixes
> and here:
>   https://bitbucket.org/ewheelerinc/linux/branch/v4.5-rc7-bcache-fixes
> 
> so make sure all 3 are applied.  It might still OOM, but it shouldn't 
> crash if we got it all.

Thanks. I just checked that I have all those paatches now.

I will report back if I get more interesting output :)

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Bcache still unstable for me (memory problems)
  2016-03-08 23:52   ` Marc MERLIN
@ 2016-03-09  3:59     ` Eric Wheeler
  2016-03-09 20:55       ` Marc MERLIN
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Wheeler @ 2016-03-09  3:59 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: linux-bcache

On Tue, 8 Mar 2016, Marc MERLIN wrote:

> On Mon, Mar 07, 2016 at 07:56:56PM +0000, Eric Wheeler wrote:
> > Strange about memory allocation issues.  Do you have 
> > /proc/sys/vm/min_free_kbytes set to something like $((256*1024)) ? Is this 
> > a multi-socket machine with all memory plugged into only one CPU?
>  
> gargamel:/mnt/mnt# cat /proc/sys/vm/min_free_kbytes
> 19712

Ours is set to 256mb (256*1024) and I've never had a problem.  

> Should I change it?

Could try it, shrug.

> 
> > I'm curious though, why was registration called a second time? Was the 
> > drive external?  Could udev be re-registering the device?
> 
> Yeah, this puzzled me.
> The filesystem was already mounted, I made a long copy via btrfs send,
> it failed before the end, I came back a day or so later, so the copy
> failed, restarted it, and then the kernel crashed.
> It seems that accessing the filesystem (that was already mounted) caused
> bcache to register the cache device then?
> I have no idea why though.
> 
> This is kind of weird:
> [   86.612756] bcache: register_bdev() registered backing device md5
> [  102.097299] bcache: bch_journal_replay() journal replay done, 41 keys in 4 entries, seq 22200
> [  102.124135] bcache: register_cache() registered cache device dm-4
> [  102.151653] bcache: register_bdev() registered backing device dm-1
> [  102.221977] bcache: bch_cached_dev_attach() Caching dm-1 as bcache1 on set 0226553a-37cf-41d5-b3ce-8b1e944543a8
> [  102.253183] bcache: register_bcache() error opening /dev/md5: device already registered
> [86240.547242] bcache: bch_journal_replay() journal replay done, 0 keys in 2 entries, seq 215862
> [86242.109874] bcache: bch_cached_dev_attach() Caching md5 as bcache0 on set 5bc072a8-ab17-446d-9744-e247949913c1
> [86242.141648] bcache: register_cache() registered cache device sdh2
> [86253.186416] bcache: register_bcache() error opening /dev/sdh2: device already registered
> 
> So clearly on this boot too, it got registered late (20h-ish after boot)

I find it interesting that it re-registered md5 within 5 minutes of 24 
hours after initial registration: (86242-102)/3600 = 23:55:40

Is there some kind of cron.daily thing going on?  If you have timestamps 
for that kernel log, maybe check cron for logs too.

Are there any intevening non-bcache lines indicating a disk was removed 
(eg, bad usb cable) and re-added?  

> > Do you have this patch?
> > https://bitbucket.org/ewheelerinc/linux/commits/a7044848050ac60e178798d20ea8a3ef2be36bc7?at=master
>  
> I got the other patches you sent me last time, but didn't end up with
> this one, sorry if you sent it to me and I dropped it.
> I'll apply it now, thanks.

All of the patches related to troubleshooting with you are here:

  https://bitbucket.org/ewheelerinc/linux/branch/v4.5-rc6-bcache-fixes
and here:
  https://bitbucket.org/ewheelerinc/linux/branch/v4.5-rc7-bcache-fixes

so make sure all 3 are applied.  It might still OOM, but it shouldn't 
crash if we got it all.

-Eric

> 
> Marc
> -- 
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems ....
>                                       .... what McDonalds is to gourmet cooking
> Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901
> --
> To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Bcache still unstable for me (memory problems)
  2016-03-07 19:56 ` Eric Wheeler
@ 2016-03-08 23:52   ` Marc MERLIN
  2016-03-09  3:59     ` Eric Wheeler
  0 siblings, 1 reply; 20+ messages in thread
From: Marc MERLIN @ 2016-03-08 23:52 UTC (permalink / raw)
  To: Eric Wheeler; +Cc: linux-bcache, kent.overstreet

On Mon, Mar 07, 2016 at 07:56:56PM +0000, Eric Wheeler wrote:
> Strange about memory allocation issues.  Do you have 
> /proc/sys/vm/min_free_kbytes set to something like $((256*1024)) ? Is this 
> a multi-socket machine with all memory plugged into only one CPU?
 
gargamel:/mnt/mnt# cat /proc/sys/vm/min_free_kbytes
19712

Should I change it?

> I'm curious though, why was registration called a second time? Was the 
> drive external?  Could udev be re-registering the device?

Yeah, this puzzled me.
The filesystem was already mounted, I made a long copy via btrfs send,
it failed before the end, I came back a day or so later, so the copy
failed, restarted it, and then the kernel crashed.
It seems that accessing the filesystem (that was already mounted) caused
bcache to register the cache device then?
I have no idea why though.

This is kind of weird:
[   86.612756] bcache: register_bdev() registered backing device md5
[  102.097299] bcache: bch_journal_replay() journal replay done, 41 keys in 4 entries, seq 22200
[  102.124135] bcache: register_cache() registered cache device dm-4
[  102.151653] bcache: register_bdev() registered backing device dm-1
[  102.221977] bcache: bch_cached_dev_attach() Caching dm-1 as bcache1 on set 0226553a-37cf-41d5-b3ce-8b1e944543a8
[  102.253183] bcache: register_bcache() error opening /dev/md5: device already registered
[86240.547242] bcache: bch_journal_replay() journal replay done, 0 keys in 2 entries, seq 215862
[86242.109874] bcache: bch_cached_dev_attach() Caching md5 as bcache0 on set 5bc072a8-ab17-446d-9744-e247949913c1
[86242.141648] bcache: register_cache() registered cache device sdh2
[86253.186416] bcache: register_bcache() error opening /dev/sdh2: device already registered

So clearly on this boot too, it got registered late (20h-ish after boot)

> You might find where the registration is being done and prevent it from 
> running automatically.  At least that might solve the re-registration 
> problem.

Right.

> As for the memory allocation issue, the backtrace indicates that this is a 
> registration-time problem, not a runtime issue.  I'm guessing it is one of 
> the threads attempting to proceed after a memory allocation error similar 
> to the writeback thread issue you had last time which was fixed by adding 
> some locking around the initialization.

Makes snese.

On Mon, Mar 07, 2016 at 08:35:00PM +0000, Eric Wheeler wrote:
> Looking at the stack trace, bch_cache_set_alloc() appears to fail doing a 
> kzalloc() and returns NULL.  This causes register_cache_set() to return 
> "cannot allocate memory" but that error path isn't handled without my 
> upstream commit that went to Jens.  
> 
> Marc,
> 
> Do you have this patch?
> https://bitbucket.org/ewheelerinc/linux/commits/a7044848050ac60e178798d20ea8a3ef2be36bc7?at=master
 
I got the other patches you sent me last time, but didn't end up with
this one, sorry if you sent it to me and I dropped it.
I'll apply it now, thanks.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Bcache still unstable for me (memory problems)
  2016-03-07 14:45 Marc MERLIN
@ 2016-03-07 19:56 ` Eric Wheeler
  2016-03-08 23:52   ` Marc MERLIN
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Wheeler @ 2016-03-07 19:56 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: linux-bcache


On Mon, 7 Mar 2016, Marc MERLIN wrote:

> I started a btrfs copy from a bcached device to another one (both are in
> the 5 to 15TB range).
> A minute later, my system that had been up for several days, crashed.
> 
> Ultimately, it crashed on this:
> bcache: register_cache() error opening sdl2: cannot allocate memory
> BUG: unable to handle kernel NULL pointer dereference at 00000000000009b8
> IP: [<ffffffffc0516e8d>] cache_set_flush+0x102/0x15c [bcache]

Strange about memory allocation issues.  Do you have 
/proc/sys/vm/min_free_kbytes set to something like $((256*1024)) ? Is this 
a multi-socket machine with all memory plugged into only one CPU?

I'm curious though, why was registration called a second time? Was the 
drive external?  Could udev be re-registering the device?

You might find where the registration is being done and prevent it from 
running automatically.  At least that might solve the re-registration 
problem.

As for the memory allocation issue, the backtrace indicates that this is a 
registration-time problem, not a runtime issue.  I'm guessing it is one of 
the threads attempting to proceed after a memory allocation error similar 
to the writeback thread issue you had last time which was fixed by adding 
some locking around the initialization.

I'll look and see if I can come up with a reasonable patch to cleanup the 
allocator or gc thread under OOM conditions at registration time.

-Eric

> 
> I'm not sure why it tried to register sdl2 then when it was already
> registered from boot. Did it time out and had to be registered again?
> 
> At this point, things with bcache have been bad enough, that I'm
> considering removing it. My system has been very unstable with it :(
> Or maybe I can just remove the caching devices, leave the backing ones, 
> and run without cache for now, giving me the option to re-add the cache later.
> 
> I actually just added 16GB of RAM to the server after adding bcache, and
> clearly going from 8 to 24GB made no difference. I'm wondering if there
> is a memory leak somewhere?
> 
> Just before the system crashed, memory looked like this, which isn't bad:
>              total       used       free     shared    buffers     cached
> Mem:      24392440   24241440     151000          0         88   20116748
> -/+ buffers/cache:    4124604   20267836
> Swap:     15616764     244452   15372312
> 
> I'm using btrfs on top of dmcrypt on top of bcache.
> btrfs on top of dmcrypt was stable, I had been using it for years. Adding bcache is what
> caused all my recent problems to start.
> 
> Any ideas?
> 
> Thanks,
> Marc
> 
> 
> [290569.458707] Process accounting resumed
> [290623.673871] bcache-register: page allocation failure: order:7, mode:0x24080c0
> [290623.697143] CPU: 0 PID: 1284 Comm: bcache-register Tainted: G        W       4.4.2-amd64-i915-volpreempt-20160214bc3 #5
> [290623.732143] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
> [290623.762428]  0000000000000000 ffff88001df7bb08 ffffffff8134ae0a 0000000000000001
> [290623.787470]  ffff88001df7bba0 ffffffff81124ab6 024080c01df7bc44 024080c000000040
> [290623.811775]  0000000700000001 0000000000000007 0000000000000007 0000000000000040
> [290623.836443] Call Trace:
> [290623.846280]  [<ffffffff8134ae0a>] dump_stack+0x44/0x55
> [290623.864123]  [<ffffffff81124ab6>] warn_alloc_failed+0x114/0x12c
> [290623.883425]  [<ffffffff811274b8>] __alloc_pages_nodemask+0x7cb/0x84c
> [290623.904159]  [<ffffffff8107df80>] ? try_to_wake_up+0x1d9/0x1eb
> [290623.923173]  [<ffffffff8115f6d7>] alloc_pages_current+0xa9/0xcd
> [290623.942435]  [<ffffffff8112377e>] __get_free_pages+0xe/0x3c
> [290623.960621]  [<ffffffffc05192f0>] bch_cache_set_alloc+0x3c7/0x479 [bcache]
> [290623.983825]  [<ffffffffc051a5e3>] register_bcache+0x1241/0x1452 [bcache]
> [290624.005456]  [<ffffffff8134c853>] kobj_attr_store+0x10/0x1a
> [290624.023729]  [<ffffffff811df9df>] sysfs_kf_write+0x39/0x3b
> [290624.041739]  [<ffffffff811df2e1>] kernfs_fop_write+0xed/0x130
> [290624.060394]  [<ffffffff8117d97d>] __vfs_write+0x26/0xa5
> [290624.077452]  [<ffffffff810b2969>] ? current_kernel_time64+0x10/0x36
> [290624.097921]  [<ffffffff812d6050>] ? security_file_permission+0x3b/0x42
> [290624.119197]  [<ffffffff810915ea>] ? percpu_down_read+0x12/0x41
> [290624.138945]  [<ffffffff8117fb61>] ? __sb_start_write+0x2b/0x48
> [290624.158291]  [<ffffffff8117dfe8>] vfs_write+0x9d/0xe8
> [290624.174919]  [<ffffffff8117e7bd>] SyS_write+0x4d/0x78
> [290624.191455]  [<ffffffff810039c3>] do_fast_syscall_32+0xb3/0xf3
> [290624.210331]  [<ffffffff816e3e32>] sysenter_flags_fixed+0x8/0x12
> [290624.229355] Mem-Info:
> [290624.237432] active_anon:607145 inactive_anon:208571 isolated_anon:0
> [290624.237432]  active_file:2103407 inactive_file:1565209 isolated_file:0
> [290624.237432]  unevictable:1307 dirty:188846 writeback:138775 unstable:0
> [290624.237432]  slab_reclaimable:180112 slab_unreclaimable:73373
> [290624.237432]  mapped:419640 shmem:466581 pagetables:5778 bounce:0
> [290624.237432]  free:988612 free_pcp:2130 free_cma:14
> [290624.351715] Node 0 DMA free:15884kB min:12kB low:12kB high:16kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15976kB managed:15892kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
> [290624.483875] lowmem_reserve[]: 0 3201 23800 23800
> [290624.500998] Node 0 DMA32 free:581000kB min:2648kB low:3308kB high:3972kB active_anon:229220kB inactive_anon:248156kB active_file:1102468kB inactive_file:832264kB unevictable:948kB isolated(anon):0kB isolated(file):68kB present:3362068kB managed:3283032kB mlocked:68719477684kB dirty:94316kB writeback:65996kB mapped:253400kB shmem:282168kB slab_reclaimable:96344kB slab_unreclaimable:41308kB kernel_stack:2480kB pagetables:2952kB unstable:0kB bounce:0kB free_pcp:4008kB local_pcp:140kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
> [290624.655741] lowmem_reserve[]: 0 0 20599 20599
> [290624.674232] Node 0 Normal free:3368136kB min:17048kB low:21308kB high:25572kB active_anon:2203780kB inactive_anon:586104kB active_file:7311280kB inactive_file:5430068kB unevictable:4280kB isolated(anon):0kB isolated(file):0kB present:21485568kB managed:21093516kB mlocked:481036341432kB dirty:662172kB writeback:472472kB mapped:1425288kB shmem:1584132kB slab_reclaimable:624104kB slab_unreclaimable:251680kB kernel_stack:11696kB pagetables:20180kB unstable:0kB bounce:0kB free_pcp:4304kB local_pcp:588kB free_cma:56kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
> [290624.834378] lowmem_reserve[]: 0 0 0 0
> [290624.848459] Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15884kB
> [290624.895688] Node 0 DMA32: 30635*4kB (UME) 23925*8kB (UME) 7478*16kB (UME) 2420*32kB (UME) 794*64kB (UME) 139*128kB (UME) 10*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 582196kB
> [290624.947950] Node 0 Normal: 241499*4kB (UME) 161875*8kB (UME) 33424*16kB (UME) 10915*32kB (UME) 2795*64kB (UME) 334*128kB (UME) 15*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3370532kB
> [290625.003724] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> [290625.030463] 4142937 total pagecache pages
> [290625.043969] 5782 pages in swap cache
> [290625.056118] Swap cache stats: add 102837, delete 97055, find 627323/635133
> [290625.078329] Free swap  = 15372324kB
> [290625.090190] Total swap = 15616764kB
> [290625.102186] 6215903 pages RAM
> [290625.112399] 0 pages HighMem/MovableOnly
> [290625.125747] 117793 pages reserved
> [290625.137495] 4096 pages cma reserved
> [290625.149199] 0 pages hwpoisoned
> [290625.159961] bcache: register_cache() error opening sdl2: cannot allocate memory
> [290625.183410] BUG: unable to handle kernel NULL pointer dereference at 00000000000009b8
> [290625.208648] IP: [<ffffffffc0516e8d>] cache_set_flush+0x102/0x15c [bcache]
> [290625.230734] PGD 0
> [290625.238446] Oops: 0000 [#1] SMP
> [290625.249857] Modules linked in: udp_diag tcp_diag inet_diag veth ip6table_filter ip6_tables ebtable_nat ebtables ppdev lp xt_addrtype tun bridge stp llc autofs4 softdog binfmt_misc ftdi_sio nfsd nfs_acl auth_rpcgss nfs fscache lockd grace sunrpc ipt_REJECT nf_reject_ipv4 xt_conntrack xt_mark xt_nat xt_tcpudp nf_log_ipv4 nf_log_common xt_LOG iptable_mangle iptable_filter bcache lm85 hwmon_vid pl2303 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 ip_tables nf_conntrack_ftp ipt_MASQUERADE x_tables nf_nat_masquerade_ipv4 nf_nat nf_conntrack sg st snd_pcm_oss snd_mixer_oss kvm_intel snd_hda_codec_realtek snd_hda_codec_generic snd_opl3_synth kvm snd_seq_midi_emul snd_hda_intel snd_cmipci gameport snd_hda_codec snd_opl3_lib snd_mpu401_uart snd_seq_midi irqbypass snd_seq_midi_event 
 snd_seq snd_hda_core snd_hwdep snd_rawmidi asix coretemp eeepc_wmi tpm_infineon rc_ati_x10 snd_pcm libphy snd_seq_device asus_wmi snd_timer sparse_keymap rfkill intel_rapl 8250_fintek ati_re
 mo
>  te usbnet tpm_tis snd rc_core battery tpm evdev iosf_mbi wmi pcspkr input_leds intel_powerclamp parport_pc processor i2c_i801 xhci_pci x86_pkg_temp_thermal usbserial soundcore parport xhci_hcd lpc_ich e1000e ptp pps_core fuse raid456 multipath mmc_block mmc_core dm_snapshot dm_bufio dm_mirror dm_region_hash dm_log dm_crypt dm_mod async_raid6_recov async_pq async_xor async_memcpy async_tx blowfish_x86_64 blowfish_common ecb xts ansi_cprng drbg aesni_intel ablk_helper lrw gf128mul glue_helper aes_x86_64 crct10dif_pclmul crc32_pclmul crc32c_intel r8169 ehci_pci fjes ehci_hcd thermal mii sata_mv fan cryptd usbcore usb_common sata_sil24 [last unloaded: ftdi_sio]
> [290625.713109] CPU: 0 PID: 25842 Comm: kworker/0:0 Tainted: G        W       4.4.2-amd64-i915-volpreempt-20160214bc3 #5
> [290625.747597] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
> [290625.776918] Workqueue: events cache_set_flush [bcache]
> [290625.794260] task: ffff880467caa0c0 ti: ffff8805af184000 task.ti: ffff8805af184000
> [290625.818544] RIP: 0010:[<ffffffffc0516e8d>]  [<ffffffffc0516e8d>] cache_set_flush+0x102/0x15c [bcache]
> [290625.848082] RSP: 0018:ffff8805af187e10  EFLAGS: 00010202
> [290625.866890] RAX: 0000000000000001 RBX: ffff880153a40ce8 RCX: 0000000000000282
> [290625.891136] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> [290625.914365] RBP: ffff8805af187e30 R08: 00000000000000fa R09: ffffe8ffffc01900
> [290625.937588] R10: ffff880088430838 R11: ffff880088430838 R12: 0000000000000000
> [290625.960781] R13: ffff880153a40de8 R14: 0000000000000000 R15: ffffe8ffffc01900
> [290625.983951] FS:  0000000000000000(0000) GS:ffff88061e200000(0000) knlGS:0000000000000000
> [290626.010918] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [290626.030800] CR2: 00000000000009b8 CR3: 0000000001e0a000 CR4: 00000000001406f0
> [290626.053914] Stack:
> [290626.061687]  ffff8805d6183280 ffff88061e216040 ffff880153a40ce8 0000000000000000
> [290626.085725]  ffff8805af187e78 ffffffff81070fcf ffff8805d61832b0 000000001e216058
> [290626.110510]  ffff8805d6183280 ffff88061e216040 ffff8805d61832b0 ffff88061e216058
> [290626.135329] Call Trace:
> [290626.145108]  [<ffffffff81070fcf>] process_one_work+0x15b/0x260
> [290626.164241]  [<ffffffff81071574>] worker_thread+0x1f0/0x29d
> [290626.182561]  [<ffffffff81071384>] ? rescuer_thread+0x281/0x281
> [290626.201721]  [<ffffffff810758e9>] kthread+0xa5/0xad
> [290626.217966]  [<ffffffff81075844>] ? kthread_parkme+0x24/0x24
> [290626.236505]  [<ffffffff816e1c0f>] ret_from_fork+0x3f/0x70
> [290626.255052]  [<ffffffff81075844>] ? kthread_parkme+0x24/0x24
> [290626.274460] Code: 89 1c c1 4d 8b a4 24 f0 01 00 00 49 81 ec f0 01 00 00 eb b4 45 31 e4 0f b7 83 4c f7 ff ff 4a 8b 94 e3 58 ff ff ff 44 39 e0 76 16 <48> 8b ba b8 09 00 00 48 85 ff 74 05 e8 61 e8 b5 c0 49 ff c4 eb
> [290626.337443] RIP  [<ffffffffc0516e8d>] cache_set_flush+0x102/0x15c [bcache]
> [290626.359833]  RSP <ffff8805af187e10>
> [290626.372050] CR2: 00000000000009b8
> [290626.383646] ---[ end trace f97d88b8bb456800 ]---
> [290626.399977] Kernel panic - not syncing: Fatal exception
> [290626.418169] Kernel Offset: disabled
> [290626.430278] Rebooting in 20 seconds..
> [290646.378903] ACPI MEMORY or I/O RESET_REG.
> -- 
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems ....
>                                       .... what McDonalds is to gourmet cooking
> Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901
> --
> To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Bcache still unstable for me (memory problems)
@ 2016-03-07 14:45 Marc MERLIN
  2016-03-07 19:56 ` Eric Wheeler
  0 siblings, 1 reply; 20+ messages in thread
From: Marc MERLIN @ 2016-03-07 14:45 UTC (permalink / raw)
  To: linux-bcache

I started a btrfs copy from a bcached device to another one (both are in
the 5 to 15TB range).
A minute later, my system that had been up for several days, crashed.

Ultimately, it crashed on this:
bcache: register_cache() error opening sdl2: cannot allocate memory
BUG: unable to handle kernel NULL pointer dereference at 00000000000009b8
IP: [<ffffffffc0516e8d>] cache_set_flush+0x102/0x15c [bcache]

I'm not sure why it tried to register sdl2 then when it was already
registered from boot. Did it time out and had to be registered again?

At this point, things with bcache have been bad enough, that I'm
considering removing it. My system has been very unstable with it :(
Or maybe I can just remove the caching devices, leave the backing ones, 
and run without cache for now, giving me the option to re-add the cache later.

I actually just added 16GB of RAM to the server after adding bcache, and
clearly going from 8 to 24GB made no difference. I'm wondering if there
is a memory leak somewhere?

Just before the system crashed, memory looked like this, which isn't bad:
             total       used       free     shared    buffers     cached
Mem:      24392440   24241440     151000          0         88   20116748
-/+ buffers/cache:    4124604   20267836
Swap:     15616764     244452   15372312

I'm using btrfs on top of dmcrypt on top of bcache.
btrfs on top of dmcrypt was stable, I had been using it for years. Adding bcache is what
caused all my recent problems to start.

Any ideas?

Thanks,
Marc


[290569.458707] Process accounting resumed
[290623.673871] bcache-register: page allocation failure: order:7, mode:0x24080c0
[290623.697143] CPU: 0 PID: 1284 Comm: bcache-register Tainted: G        W       4.4.2-amd64-i915-volpreempt-20160214bc3 #5
[290623.732143] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
[290623.762428]  0000000000000000 ffff88001df7bb08 ffffffff8134ae0a 0000000000000001
[290623.787470]  ffff88001df7bba0 ffffffff81124ab6 024080c01df7bc44 024080c000000040
[290623.811775]  0000000700000001 0000000000000007 0000000000000007 0000000000000040
[290623.836443] Call Trace:
[290623.846280]  [<ffffffff8134ae0a>] dump_stack+0x44/0x55
[290623.864123]  [<ffffffff81124ab6>] warn_alloc_failed+0x114/0x12c
[290623.883425]  [<ffffffff811274b8>] __alloc_pages_nodemask+0x7cb/0x84c
[290623.904159]  [<ffffffff8107df80>] ? try_to_wake_up+0x1d9/0x1eb
[290623.923173]  [<ffffffff8115f6d7>] alloc_pages_current+0xa9/0xcd
[290623.942435]  [<ffffffff8112377e>] __get_free_pages+0xe/0x3c
[290623.960621]  [<ffffffffc05192f0>] bch_cache_set_alloc+0x3c7/0x479 [bcache]
[290623.983825]  [<ffffffffc051a5e3>] register_bcache+0x1241/0x1452 [bcache]
[290624.005456]  [<ffffffff8134c853>] kobj_attr_store+0x10/0x1a
[290624.023729]  [<ffffffff811df9df>] sysfs_kf_write+0x39/0x3b
[290624.041739]  [<ffffffff811df2e1>] kernfs_fop_write+0xed/0x130
[290624.060394]  [<ffffffff8117d97d>] __vfs_write+0x26/0xa5
[290624.077452]  [<ffffffff810b2969>] ? current_kernel_time64+0x10/0x36
[290624.097921]  [<ffffffff812d6050>] ? security_file_permission+0x3b/0x42
[290624.119197]  [<ffffffff810915ea>] ? percpu_down_read+0x12/0x41
[290624.138945]  [<ffffffff8117fb61>] ? __sb_start_write+0x2b/0x48
[290624.158291]  [<ffffffff8117dfe8>] vfs_write+0x9d/0xe8
[290624.174919]  [<ffffffff8117e7bd>] SyS_write+0x4d/0x78
[290624.191455]  [<ffffffff810039c3>] do_fast_syscall_32+0xb3/0xf3
[290624.210331]  [<ffffffff816e3e32>] sysenter_flags_fixed+0x8/0x12
[290624.229355] Mem-Info:
[290624.237432] active_anon:607145 inactive_anon:208571 isolated_anon:0
[290624.237432]  active_file:2103407 inactive_file:1565209 isolated_file:0
[290624.237432]  unevictable:1307 dirty:188846 writeback:138775 unstable:0
[290624.237432]  slab_reclaimable:180112 slab_unreclaimable:73373
[290624.237432]  mapped:419640 shmem:466581 pagetables:5778 bounce:0
[290624.237432]  free:988612 free_pcp:2130 free_cma:14
[290624.351715] Node 0 DMA free:15884kB min:12kB low:12kB high:16kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15976kB managed:15892kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[290624.483875] lowmem_reserve[]: 0 3201 23800 23800
[290624.500998] Node 0 DMA32 free:581000kB min:2648kB low:3308kB high:3972kB active_anon:229220kB inactive_anon:248156kB active_file:1102468kB inactive_file:832264kB unevictable:948kB isolated(anon):0kB isolated(file):68kB present:3362068kB managed:3283032kB mlocked:68719477684kB dirty:94316kB writeback:65996kB mapped:253400kB shmem:282168kB slab_reclaimable:96344kB slab_unreclaimable:41308kB kernel_stack:2480kB pagetables:2952kB unstable:0kB bounce:0kB free_pcp:4008kB local_pcp:140kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[290624.655741] lowmem_reserve[]: 0 0 20599 20599
[290624.674232] Node 0 Normal free:3368136kB min:17048kB low:21308kB high:25572kB active_anon:2203780kB inactive_anon:586104kB active_file:7311280kB inactive_file:5430068kB unevictable:4280kB isolated(anon):0kB isolated(file):0kB present:21485568kB managed:21093516kB mlocked:481036341432kB dirty:662172kB writeback:472472kB mapped:1425288kB shmem:1584132kB slab_reclaimable:624104kB slab_unreclaimable:251680kB kernel_stack:11696kB pagetables:20180kB unstable:0kB bounce:0kB free_pcp:4304kB local_pcp:588kB free_cma:56kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[290624.834378] lowmem_reserve[]: 0 0 0 0
[290624.848459] Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15884kB
[290624.895688] Node 0 DMA32: 30635*4kB (UME) 23925*8kB (UME) 7478*16kB (UME) 2420*32kB (UME) 794*64kB (UME) 139*128kB (UME) 10*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 582196kB
[290624.947950] Node 0 Normal: 241499*4kB (UME) 161875*8kB (UME) 33424*16kB (UME) 10915*32kB (UME) 2795*64kB (UME) 334*128kB (UME) 15*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3370532kB
[290625.003724] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[290625.030463] 4142937 total pagecache pages
[290625.043969] 5782 pages in swap cache
[290625.056118] Swap cache stats: add 102837, delete 97055, find 627323/635133
[290625.078329] Free swap  = 15372324kB
[290625.090190] Total swap = 15616764kB
[290625.102186] 6215903 pages RAM
[290625.112399] 0 pages HighMem/MovableOnly
[290625.125747] 117793 pages reserved
[290625.137495] 4096 pages cma reserved
[290625.149199] 0 pages hwpoisoned
[290625.159961] bcache: register_cache() error opening sdl2: cannot allocate memory
[290625.183410] BUG: unable to handle kernel NULL pointer dereference at 00000000000009b8
[290625.208648] IP: [<ffffffffc0516e8d>] cache_set_flush+0x102/0x15c [bcache]
[290625.230734] PGD 0
[290625.238446] Oops: 0000 [#1] SMP
[290625.249857] Modules linked in: udp_diag tcp_diag inet_diag veth ip6table_filter ip6_tables ebtable_nat ebtables ppdev lp xt_addrtype tun bridge stp llc autofs4 softdog binfmt_misc ftdi_sio nfsd nfs_acl auth_rpcgss nfs fscache lockd grace sunrpc ipt_REJECT nf_reject_ipv4 xt_conntrack xt_mark xt_nat xt_tcpudp nf_log_ipv4 nf_log_common xt_LOG iptable_mangle iptable_filter bcache lm85 hwmon_vid pl2303 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 ip_tables nf_conntrack_ftp ipt_MASQUERADE x_tables nf_nat_masquerade_ipv4 nf_nat nf_conntrack sg st snd_pcm_oss snd_mixer_oss kvm_intel snd_hda_codec_realtek snd_hda_codec_generic snd_opl3_synth kvm snd_seq_midi_emul snd_hda_intel snd_cmipci gameport snd_hda_codec snd_opl3_lib snd_mpu401_uart snd_seq_midi irqbypass snd_seq_midi_event sn
 d_seq snd_hda_core snd_hwdep snd_rawmidi asix coretemp eeepc_wmi tpm_infineon rc_ati_x10 snd_pcm libphy snd_seq_device asus_wmi snd_timer sparse_keymap rfkill intel_rapl 8250_fintek ati_remote usbnet tpm_tis snd rc_core battery tpm evdev iosf_mbi wmi pcspkr input_leds intel_powerclamp parport_pc processor i2c_i801 xhci_pci x86_pkg_temp_thermal usbserial soundcore parport xhci_hcd lpc_ich e1000e ptp pps_core fuse raid456 multipath mmc_block mmc_core dm_snapshot dm_bufio dm_mirror dm_region_hash dm_log dm_crypt dm_mod async_raid6_recov async_pq async_xor async_memcpy async_tx blowfish_x86_64 blowfish_common ecb xts ansi_cprng drbg aesni_intel ablk_helper lrw gf128mul glue_helper aes_x86_64 crct10dif_pclmul crc32_pclmul crc32c_intel r8169 ehci_pci fjes ehci_hcd thermal mii sata_mv fan crypt
 d usbcore usb_common sata_sil24 [last unloaded: ftdi_sio]
[290625.713109] CPU: 0 PID: 25842 Comm: kworker/0:0 Tainted: G        W       4.4.2-amd64-i915-volpreempt-20160214bc3 #5
[290625.747597] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
[290625.776918] Workqueue: events cache_set_flush [bcache]
[290625.794260] task: ffff880467caa0c0 ti: ffff8805af184000 task.ti: ffff8805af184000
[290625.818544] RIP: 0010:[<ffffffffc0516e8d>]  [<ffffffffc0516e8d>] cache_set_flush+0x102/0x15c [bcache]
[290625.848082] RSP: 0018:ffff8805af187e10  EFLAGS: 00010202
[290625.866890] RAX: 0000000000000001 RBX: ffff880153a40ce8 RCX: 0000000000000282
[290625.891136] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[290625.914365] RBP: ffff8805af187e30 R08: 00000000000000fa R09: ffffe8ffffc01900
[290625.937588] R10: ffff880088430838 R11: ffff880088430838 R12: 0000000000000000
[290625.960781] R13: ffff880153a40de8 R14: 0000000000000000 R15: ffffe8ffffc01900
[290625.983951] FS:  0000000000000000(0000) GS:ffff88061e200000(0000) knlGS:0000000000000000
[290626.010918] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[290626.030800] CR2: 00000000000009b8 CR3: 0000000001e0a000 CR4: 00000000001406f0
[290626.053914] Stack:
[290626.061687]  ffff8805d6183280 ffff88061e216040 ffff880153a40ce8 0000000000000000
[290626.085725]  ffff8805af187e78 ffffffff81070fcf ffff8805d61832b0 000000001e216058
[290626.110510]  ffff8805d6183280 ffff88061e216040 ffff8805d61832b0 ffff88061e216058
[290626.135329] Call Trace:
[290626.145108]  [<ffffffff81070fcf>] process_one_work+0x15b/0x260
[290626.164241]  [<ffffffff81071574>] worker_thread+0x1f0/0x29d
[290626.182561]  [<ffffffff81071384>] ? rescuer_thread+0x281/0x281
[290626.201721]  [<ffffffff810758e9>] kthread+0xa5/0xad
[290626.217966]  [<ffffffff81075844>] ? kthread_parkme+0x24/0x24
[290626.236505]  [<ffffffff816e1c0f>] ret_from_fork+0x3f/0x70
[290626.255052]  [<ffffffff81075844>] ? kthread_parkme+0x24/0x24
[290626.274460] Code: 89 1c c1 4d 8b a4 24 f0 01 00 00 49 81 ec f0 01 00 00 eb b4 45 31 e4 0f b7 83 4c f7 ff ff 4a 8b 94 e3 58 ff ff ff 44 39 e0 76 16 <48> 8b ba b8 09 00 00 48 85 ff 74 05 e8 61 e8 b5 c0 49 ff c4 eb
[290626.337443] RIP  [<ffffffffc0516e8d>] cache_set_flush+0x102/0x15c [bcache]
[290626.359833]  RSP <ffff8805af187e10>
[290626.372050] CR2: 00000000000009b8
[290626.383646] ---[ end trace f97d88b8bb456800 ]---
[290626.399977] Kernel panic - not syncing: Fatal exception
[290626.418169] Kernel Offset: disabled
[290626.430278] Rebooting in 20 seconds..
[290646.378903] ACPI MEMORY or I/O RESET_REG.
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2016-03-25  3:59 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-08 23:04 Bcache still unstable for me (memory problems) Richard Bade
2016-03-09  3:51 ` Eric Wheeler
2016-03-10  1:34   ` Eric Wheeler
2016-03-10  2:15     ` Marc MERLIN
2016-03-10  7:07       ` sysrq-o for safe bcache shutdown, what about reboot? Eric Wheeler
2016-03-10 17:46         ` Marc MERLIN
2016-03-10 15:29       ` Bcache still unstable for me (memory problems) Marc MERLIN
2016-03-10 15:41         ` Christoph Nelles
2016-03-10 15:47           ` Marc MERLIN
2016-03-24 21:25         ` Marc MERLIN
2016-03-25  3:52           ` Eric Wheeler
2016-03-21  0:05     ` Richard Bade
2016-03-21  0:46       ` Marc MERLIN
2016-03-21  0:52         ` Richard Bade
2016-03-25  3:59       ` Eric Wheeler
  -- strict thread matches above, loose matches on Subject: below --
2016-03-07 20:35 [PATCH] " Eric Wheeler
2016-03-07 14:45 Marc MERLIN
2016-03-07 19:56 ` Eric Wheeler
2016-03-08 23:52   ` Marc MERLIN
2016-03-09  3:59     ` Eric Wheeler
2016-03-09 20:55       ` Marc MERLIN

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.