* echo dev > /sys/fs/bcache/register gives page allocation failure: order:4, mode:0x2040d0
@ 2016-02-15  6:04 Marc MERLIN
  2016-02-15 12:02 ` Johannes Thumshirn
                   ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Marc MERLIN @ 2016-02-15  6:04 UTC (permalink / raw)
  To: linux-bcache

I was able to set up one bcache device without trouble, but when I try to
create a second one with only a backing device, I get repeated page
allocation failures.

Just to be clear, this is what I want to do: every new HD-backed device
gets bcache on top, even if I don't have a cache device for it yet, so
that I can attach a cache device later.
Is that a reasonable thing to do, i.e. set up partitions as bcache backing
devices without a cache device for them?

So right now, I'm trying to do
md5 - bcache - dmcrypt - btrfs

So I did
make-bcache -B /dev/md5
echo /dev/md5 > /sys/fs/bcache/register

and it fails:
bash: page allocation failure: order:4, mode:0x2040d0
CPU: 2 PID: 28043 Comm: bash Not tainted 4.3.3-amd64-i915-volpreempt-20150421 #2
Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
 0000000000000000 ffff88006f65ba78 ffffffff8134150e 0000000000000001
 ffff88006f65bb10 ffffffff8111f6ce ffff88021f5f4e38 00000004002040d0
 0000000400000040 0000000000000286 0000000000000004 0000000000000040
Call Trace:
 [<ffffffff8134150e>] dump_stack+0x44/0x55
 [<ffffffff8111f6ce>] warn_alloc_failed+0x111/0x129
 [<ffffffff811220f8>] __alloc_pages_nodemask+0x6ae/0x70d
 [<ffffffff8115fb98>] kmem_getpages+0x6a/0x162
 [<ffffffff8115fd89>] fallback_alloc+0xf9/0x193
 [<ffffffff8115ff46>] ____cache_alloc_node+0x123/0x130
 [<ffffffff81160f1e>] __kmalloc+0xf8/0x175
 [<ffffffffc057e141>] ? kzalloc.constprop.22+0xe/0x10 [bcache]
 [<ffffffffc057e141>] kzalloc.constprop.22+0xe/0x10 [bcache]
 [<ffffffffc0580986>] register_bcache+0x61b/0x1452 [bcache]
 [<ffffffff81342ce0>] kobj_attr_store+0x10/0x1a
 [<ffffffff811d9677>] sysfs_kf_write+0x39/0x3b
 [<ffffffff811d8f79>] kernfs_fop_write+0xed/0x130
 [<ffffffff81177a05>] __vfs_write+0x26/0xa5
 [<ffffffff816c0fdd>] ? _raw_spin_lock+0xe/0x10
 [<ffffffff81179277>] ? fput+0x16/0x88
 [<ffffffff812cd113>] ? security_file_permission+0x3b/0x42
 [<ffffffff8108e79b>] ? percpu_down_read+0x14/0x46
 [<ffffffff81179ba7>] ? __sb_start_write+0x25/0x3c
 [<ffffffff81178067>] vfs_write+0xa2/0xe6
 [<ffffffff81178835>] SyS_write+0x4d/0x78
 [<ffffffff816c3962>] sysenter_dispatch+0xf/0x29
Mem-Info:
active_anon:412014 inactive_anon:222645 isolated_anon:0
 active_file:461209 inactive_file:499016 isolated_file:0
 unevictable:1166 dirty:28999 writeback:712 unstable:0
 slab_reclaimable:69514 slab_unreclaimable:47678
 mapped:420280 shmem:410182 pagetables:5369 bounce:0
 free:131404 free_pcp:1088 free_cma:0
Node 0 DMA free:15888kB min:20kB low:24kB high:28kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15976kB managed:15892kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:4kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 3203 7674 7674
Node 0 DMA32 free:462160kB min:4644kB low:5804kB high:6964kB active_anon:443844kB inactive_anon:470452kB active_file:755044kB inactive_file:897988kB unevictable:1488kB isolated(anon):0kB isolated(file):0kB present:3362068kB managed:3283400kB mlocked:1488kB dirty:21656kB writeback:2912kB mapped:703428kB shmem:665604kB slab_reclaimable:71416kB slab_unreclaimable:69108kB kernel_stack:7056kB pagetables:6732kB unstable:0kB bounce:0kB free_pcp:3008kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 4471 4471
Node 0 Normal free:26576kB min:6480kB low:8100kB high:9720kB active_anon:1199644kB inactive_anon:420120kB active_file:1091280kB inactive_file:1101072kB unevictable:3176kB isolated(anon):0kB isolated(file):0kB present:4708352kB managed:4578508kB mlocked:3176kB dirty:78692kB writeback:8900kB mapped:977740kB shmem:975112kB slab_reclaimable:206628kB slab_unreclaimable:124308kB kernel_stack:5120kB pagetables:14888kB unstable:0kB bounce:0kB free_pcp:3364kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 0*4kB 0*8kB 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15888kB
Node 0 DMA32: 79732*4kB (UEM) 17590*8kB (UEM) 151*16kB (UM) 1*32kB (U) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 462096kB
Node 0 Normal: 4435*4kB (UEM) 5*8kB (UE) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 17780kB
Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
1387415 total pagecache pages
11051 pages in swap cache
Swap cache stats: add 4480800, delete 4469749, find 1201429612/1202240702
Free swap  = 14096744kB
Total swap = 15616764kB
2021599 pages RAM
0 pages HighMem/MovableOnly
52149 pages reserved
4096 pages cma reserved
0 pages hwpoisoned
bcache: register_bcache() error opening /dev/md5: (null)
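
The numbers in this dump can be checked by hand. An order:4 request asks
for 2^4 = 16 physically contiguous pages, which is 64 kB assuming the usual
4 kB page size on x86-64. The per-zone free lists above show no free block
that large in DMA32 or Normal, so the request can fail even with plenty of
memory free in total. The DMA zone does list larger blocks, but it is only
about 15 MB and is presumably held back by lowmem_reserve. Below is a
minimal Python sketch of that check; the helper names are made up for
illustration and the two input lines are copied from the dump.

```python
import re

PAGE_KB = 4  # assumption: 4 kB base pages, as on x86-64


def order_to_kb(order, page_kb=PAGE_KB):
    """Size in kB of one physically contiguous block of the given order."""
    return page_kb << order


def parse_free_list(line):
    """Turn 'Node 0 Normal: 4435*4kB (UEM) 5*8kB ...' into {block_kb: count}."""
    return {int(size): int(count)
            for count, size in re.findall(r"(\d+)\*(\d+)kB", line)}


def blocks_at_least(free_list, need_kb):
    """Number of free blocks whose size is at least need_kb."""
    return sum(count for size, count in free_list.items() if size >= need_kb)


# Free-list lines copied verbatim from the dump above.
dma32 = ("Node 0 DMA32: 79732*4kB (UEM) 17590*8kB (UEM) 151*16kB (UM) "
         "1*32kB (U) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB "
         "0*4096kB = 462096kB")
normal = ("Node 0 Normal: 4435*4kB (UEM) 5*8kB (UE) 0*16kB 0*32kB 0*64kB "
          "0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 17780kB")

need_kb = order_to_kb(4)  # the failing request was order:4
for line in (dma32, normal):
    zone = line.split(":")[0]
    usable = blocks_at_least(parse_free_list(line), need_kb)
    print(f"{zone}: {usable} free blocks of {need_kb} kB or more")
```

Both zones report zero usable blocks, which fits a fragmentation problem
rather than an outright shortage of memory.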

Any idea what's going on?

gargamel:~# free
             total       used       free     shared    buffers     cached
Mem:       7877800    6065272    1812528          0        224    4193616
-/+ buffers/cache:    1871432    6006368
Swap:     15616764    1519056   14097708
gargamel:~# cat /proc/meminfo 
MemTotal:        7877800 kB
MemFree:         1799936 kB
MemAvailable:    4602768 kB
Buffers:             224 kB
Cached:          4208328 kB
SwapCached:        44472 kB
Active:          3595380 kB
Inactive:        1508064 kB
Active(anon):    1647420 kB
Inactive(anon):   891972 kB
Active(file):    1947960 kB
Inactive(file):   616092 kB
Unevictable:        4664 kB
Mlocked:            4664 kB
SwapTotal:      15616764 kB
SwapFree:       14097712 kB
Dirty:            104932 kB
Writeback:             0 kB
AnonPages:        876480 kB
Mapped:          1679836 kB
Shmem:           1640716 kB
Slab:             469492 kB
SReclaimable:     280564 kB
SUnreclaim:       188928 kB
KernelStack:       12288 kB
PageTables:        21264 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    19555664 kB
Committed_AS:    7995116 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      409412 kB
VmallocChunk:   34358947836 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
CmaTotal:          16384 kB
CmaFree:             232 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      740628 kB
DirectMap2M:     7346176 kB

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901


* Re: echo dev > /sys/fs/bcache/register gives page allocation failure: order:4, mode:0x2040d0
  2016-02-15  6:04 echo dev > /sys/fs/bcache/register gives page allocation failure: order:4, mode:0x2040d0 Marc MERLIN
@ 2016-02-15 12:02 ` Johannes Thumshirn
  2016-02-15 15:32   ` Marc MERLIN
  2016-02-15 12:11 ` echo dev > /sys/fs/bcache/register gives page allocation failure: order:4, mode:0x2040d0 Kent Overstreet
  2016-02-24  6:53 ` Eric Wheeler
  2 siblings, 1 reply; 28+ messages in thread
From: Johannes Thumshirn @ 2016-02-15 12:02 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: linux-bcache

On Sun, Feb 14, 2016 at 10:04:10PM -0800, Marc MERLIN wrote:
> I was able to make one bcache ok, but when trying to make a 2nd one
> where I only have a backing device, I'm getting repeated page allocation
> failures.
> 
> Just to be clear, this is what I want to do: every new HD backed device
> will have bcache on top even if I don't have a cache device for it, so
> that I can add bcache later.
> Is it something reasonable to do? Setup partitions as bcache backing
> devices without a cache device for them?
> 
> So right now, I'm trying to do
> md5 - bcache - dmcrypt - btrfs
> 
> So I did
> make-bcache -B /dev/md5
> echo /dev/md5 > /sys/fs/bcache/register
> 
> and it fails:
> bash: page allocation failure: order:4, mode:0x2040d0

Is this system under some kind of memory pressure? Did this happen right
after boot, or after the system had already been running for quite some time?

> [...]

-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


* Re: echo dev > /sys/fs/bcache/register gives page allocation failure: order:4, mode:0x2040d0
  2016-02-15  6:04 echo dev > /sys/fs/bcache/register gives page allocation failure: order:4, mode:0x2040d0 Marc MERLIN
  2016-02-15 12:02 ` Johannes Thumshirn
@ 2016-02-15 12:11 ` Kent Overstreet
  2016-02-24  6:53 ` Eric Wheeler
  2 siblings, 0 replies; 28+ messages in thread
From: Kent Overstreet @ 2016-02-15 12:11 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: linux-bcache

On Sun, Feb 14, 2016 at 10:04:10PM -0800, Marc MERLIN wrote:
> I was able to make one bcache ok, but when trying to make a 2nd one
> where I only have a backing device, I'm getting repeated page allocation
> failures.
> 
> Just to be clear, this is what I want to do: every new HD backed device
> will have bcache on top even if I don't have a cache device for it, so
> that I can add bcache later.
> Is it something reasonable to do? Setup partitions as bcache backing
> devices without a cache device for them?
> 
> So right now, I'm trying to do
> md5 - bcache - dmcrypt - btrfs
> 
> So I did
> make-bcache -B /dev/md5
> echo /dev/md5 > /sys/fs/bcache/register
> 
> and it fails:
> bash: page allocation failure: order:4, mode:0x2040d0

You want CONFIG_COMPACTION=y


* Re: echo dev > /sys/fs/bcache/register gives page allocation failure: order:4, mode:0x2040d0
  2016-02-15 12:02 ` Johannes Thumshirn
@ 2016-02-15 15:32   ` Marc MERLIN
  2016-02-15 15:45     ` Christoph Nelles
  0 siblings, 1 reply; 28+ messages in thread
From: Marc MERLIN @ 2016-02-15 15:32 UTC (permalink / raw)
  To: Johannes Thumshirn, Kent Overstreet; +Cc: linux-bcache

On Mon, Feb 15, 2016 at 01:02:55PM +0100, Johannes Thumshirn wrote:
> > So I did
> > make-bcache -B /dev/md5
> > echo /dev/md5 > /sys/fs/bcache/register
> > 
> > and it fails:
> > bash: page allocation failure: order:4, mode:0x2040d0
> 
> Is this system under some kind of memory pressure? Did this happen right
> after boot or when the system was already running for quite some time?
 
Uptime 17 days, it's a server that does a lot of I/O, but free shows I have
6GB available for buffer cache out of 8, so that shouldn't be too bad. Seems
like I'm hitting an issue with memory fragmentation or a specific kind of
memory not being available?
(but I don't have any other kernel processes complaining about memory
allocation failure, at least not consistently like here)
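
One way to test the fragmentation theory on a live system is
/proc/buddyinfo, which lists the number of free blocks per order for each
zone. The sketch below parses that format and counts the blocks that could
satisfy an order-4 request. The function names are hypothetical, and the
sample text is not a capture from this machine: it is the free-list counts
from the first dump re-laid out in /proc/buddyinfo format.

```python
def parse_buddyinfo(text):
    """Return {(node, zone): [free block count per order, order 0 first]}."""
    zones = {}
    for line in text.splitlines():
        parts = line.split()
        # Expected shape: Node 0, zone Normal c0 c1 ... c10
        if len(parts) < 5 or parts[0] != "Node" or parts[2] != "zone":
            continue
        node = int(parts[1].rstrip(","))
        zones[(node, parts[3])] = [int(count) for count in parts[4:]]
    return zones


def free_blocks_from_order(counts, order):
    """Free blocks of the given order or larger."""
    return sum(counts[order:])


# Illustrative input: counts taken from the first dump, not a real capture.
SAMPLE = """\
Node 0, zone      DMA      0      0      1      0      2      1      1      0      1      1      3
Node 0, zone    DMA32  79732  17590    151      1      0      0      0      0      0      0      0
Node 0, zone   Normal   4435      5      0      0      0      0      0      0      0      0      0
"""

for (node, zone), counts in parse_buddyinfo(SAMPLE).items():
    usable = free_blocks_from_order(counts, 4)
    print(f"node {node} {zone}: {usable} free blocks of order 4 or higher")
```

On a running system the same functions can be fed
open("/proc/buddyinfo").read() instead of SAMPLE.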

On Mon, Feb 15, 2016 at 03:11:04AM -0900, Kent Overstreet wrote:
> > and it fails:
> > bash: page allocation failure: order:4, mode:0x2040d0
> 
> You want CONFIG_COMPACTION=y

Good suggestion, but I already have it:
gargamel:~# grep CONFIG_COMPACTION /boot/config-4.3.3-amd64-i915-volpreempt-20150421
CONFIG_COMPACTION=y
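
With CONFIG_COMPACTION=y the kernel also exposes
/proc/sys/vm/compact_memory; writing 1 to it requests compaction of all
zones. Nobody in this thread reports trying it, so whether it would free
a large enough block here is an open question. A hedged sketch of "compact,
then register" follows; compact_then_register is a made-up helper, it needs
root, and the paths are parameters only so that it can be exercised against
scratch files.

```python
from pathlib import Path


def compact_then_register(device,
                          compact_path="/proc/sys/vm/compact_memory",
                          register_path="/sys/fs/bcache/register"):
    """Request memory compaction, then ask bcache to register the device.

    Assumption: compaction may leave a large enough contiguous block free.
    This is untested in the thread and not guaranteed to help.
    """
    # Same effect as: echo 1 > /proc/sys/vm/compact_memory
    Path(compact_path).write_text("1\n")
    # Same effect as: echo /dev/md5 > /sys/fs/bcache/register
    Path(register_path).write_text(device + "\n")


# Example call (as root, on the real paths):
#   compact_then_register("/dev/md5")
```

A failed registration surfaces as an OSError from the second write, much as
the shell reports "write error" for the echo.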

Thanks for the replies,
Marc

* Re: echo dev > /sys/fs/bcache/register gives page allocation failure: order:4, mode:0x2040d0
  2016-02-15 15:32   ` Marc MERLIN
@ 2016-02-15 15:45     ` Christoph Nelles
  2016-02-23 16:32       ` Marc MERLIN
  2016-02-24 20:45       ` BUG: drivers/md/bcache/writeback.c:237 Marc MERLIN
  0 siblings, 2 replies; 28+ messages in thread
From: Christoph Nelles @ 2016-02-15 15:45 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: linux-bcache

Hello Marc,
On 15.02.2016 at 16:32, Marc MERLIN wrote:
>>> and it fails:
>>> bash: page allocation failure: order:4, mode:0x2040d0
>> You want CONFIG_COMPACTION=y
> Good suggestion, but I already have it:
> gargamel:~# grep CONFIG_COMPACTION /boot/config-4.3.3-amd64-i915-volpreempt-20150421
> CONFIG_COMPACTION=y
>
> Thanks for the replies,
> Marc
Maybe increasing vm.min_free_kbytes helps you.

Regards

Christoph


* Re: echo dev > /sys/fs/bcache/register gives page allocation failure: order:4, mode:0x2040d0
  2016-02-15 15:45     ` Christoph Nelles
@ 2016-02-23 16:32       ` Marc MERLIN
  2016-02-23 20:57         ` Marc MERLIN
  2016-02-24 20:45       ` BUG: drivers/md/bcache/writeback.c:237 Marc MERLIN
  1 sibling, 1 reply; 28+ messages in thread
From: Marc MERLIN @ 2016-02-23 16:32 UTC (permalink / raw)
  To: Christoph Nelles; +Cc: linux-bcache

On Mon, Feb 15, 2016 at 04:45:40PM +0100, Christoph Nelles wrote:
> Hello Marc,
> On 15.02.2016 at 16:32, Marc MERLIN wrote:
> >>>and it fails:
> >>>bash: page allocation failure: order:4, mode:0x2040d0
> >>You want CONFIG_COMPACTION=y
> >Good suggestion, but I already have it:
> >gargamel:~# grep CONFIG_COMPACTION /boot/config-4.3.3-amd64-i915-volpreempt-20150421
> >CONFIG_COMPACTION=y
> >
> >Thanks for the replies,
> >Marc
> Maybe increasing vm.min_free_kbytes helps you.

That was a good suggestion, but it didn't help.
Looks like I'm going to have to reboot, even though everything else works,
and it's not a good time to reboot that machine...

gargamel:/sys/fs/bcache# echo 256000 > /proc/sys/vm/min_free_kbytes
gargamel:/sys/fs/bcache# echo /dev/md5 >  /sys/fs/bcache/register
kernel: bash: page allocation failure: order:4, mode:0x2040d0
kernel: CPU: 4 PID: 24535 Comm: bash Not tainted 4.3.3-amd64-i915-volpreempt-20150421 #2
kernel: Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
kernel:  0000000000000000 ffff88001856ba78 ffffffff8134150e 0000000000000001
kernel:  ffff88001856bb10 ffffffff8111f6ce ffff88021f5f4e38 00000004002040d0
kernel:  0000000400000040 0000000000000000 0000000000000004 0000000000000040
kernel: Call Trace:
kernel:  [<ffffffff8134150e>] dump_stack+0x44/0x55
kernel:  [<ffffffff8111f6ce>] warn_alloc_failed+0x111/0x129
kernel:  [<ffffffff811220f8>] __alloc_pages_nodemask+0x6ae/0x70d
kernel:  [<ffffffff8115fb98>] kmem_getpages+0x6a/0x162
kernel:  [<ffffffff8115fd89>] fallback_alloc+0xf9/0x193
kernel:  [<ffffffff8115ff46>] ____cache_alloc_node+0x123/0x130
kernel:  [<ffffffff81160f1e>] __kmalloc+0xf8/0x175
kernel:  [<ffffffffc057e141>] ? kzalloc.constprop.22+0xe/0x10 [bcache]
kernel:  [<ffffffffc057e141>] kzalloc.constprop.22+0xe/0x10 [bcache]
kernel:  [<ffffffffc0580986>] register_bcache+0x61b/0x1452 [bcache]
kernel:  [<ffffffff81342ce0>] kobj_attr_store+0x10/0x1a
kernel:  [<ffffffff811d9677>] sysfs_kf_write+0x39/0x3b
kernel:  [<ffffffff811d8f79>] kernfs_fop_write+0xed/0x130
kernel:  [<ffffffff81177a05>] __vfs_write+0x26/0xa5
kernel:  [<ffffffff816c0fdd>] ? _raw_spin_lock+0xe/0x10
kernel:  [<ffffffff81179277>] ? fput+0x16/0x88
kernel:  [<ffffffff812cd113>] ? security_file_permission+0x3b/0x42
kernel:  [<ffffffff8108e79b>] ? percpu_down_read+0x14/0x46
kernel:  [<ffffffff81179ba7>] ? __sb_start_write+0x25/0x3c
kernel:  [<ffffffff81178067>] vfs_write+0xa2/0xe6
kernel:  [<ffffffff81178835>] SyS_write+0x4d/0x78
kernel:  [<ffffffff816c3962>] sysenter_dispatch+0xf/0x29
kernel: Mem-Info:
kernel: active_anon:382212 inactive_anon:207983 isolated_anon:0
kernel:  active_file:382287 inactive_file:380967 isolated_file:0
kernel:  unevictable:1682 dirty:42944 writeback:138650 unstable:0
kernel:  slab_reclaimable:74240 slab_unreclaimable:64114
kernel:  mapped:419342 shmem:410117 pagetables:5535 bounce:0
kernel:  free:294984 free_pcp:1311 free_cma:58
kernel: Node 0 DMA free:15892kB min:516kB low:644kB high:772kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15976kB managed:15892kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
kernel: lowmem_reserve[]: 0 3203 7674 7674
kernel: Node 0 DMA32 free:797024kB min:106696kB low:133368kB high:160044kB active_anon:409972kB inactive_anon:435264kB active_file:684968kB inactive_file:679124kB unevictable:3612kB isolated(anon):0kB isolated(file):0kB present:3362068kB managed:3283400kB mlocked:3612kB dirty:136096kB writeback:33480kB mapped:698284kB shmem:663532kB slab_reclaimable:79300kB slab_unreclaimable:80900kB kernel_stack:7680kB pagetables:5920kB unstable:0kB bounce:0kB free_pcp:684kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
kernel: lowmem_reserve[]: 0 0 4471 4471
kernel: Node 0 Normal free:372484kB min:148784kB low:185980kB high:223176kB active_anon:1117868kB inactive_anon:396668kB active_file:842520kB inactive_file:842900kB unevictable:3116kB isolated(anon):0kB isolated(file):0kB present:4708352kB managed:4578508kB mlocked:3116kB dirty:37936kB writeback:521120kB mapped:979084kB shmem:976936kB slab_reclaimable:217660kB slab_unreclaimable:175556kB kernel_stack:4448kB pagetables:16220kB unstable:0kB bounce:0kB free_pcp:1864kB local_pcp:0kB free_cma:232kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
kernel: lowmem_reserve[]: 0 0 0 0
kernel: Node 0 DMA: 1*4kB (U) 0*8kB 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15892kB
kernel: Node 0 DMA32: 72789*4kB (UEM) 59674*8kB (UE) 1691*16kB (U) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 795604kB
kernel: Node 0 Normal: 75526*4kB (UE) 13664*8kB (U) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 411416kB
kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
kernel: 1200434 total pagecache pages
kernel: 26303 pages in swap cache
kernel: Swap cache stats: add 6374359, delete 6348056, find 1218002292/1219256059
kernel: Free swap  = 13686312kB
kernel: Total swap = 15616764kB
kernel: 2021599 pages RAM
kernel: 0 pages HighMem/MovableOnly
kernel: 52149 pages reserved
kernel: 4096 pages cma reserved
kernel: 0 pages hwpoisoned
bash: echo: write error: Invalid argument
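
Comparing this dump with the first one suggests why raising
vm.min_free_kbytes did not help: total free memory in DMA32 and Normal went
up considerably, but the largest free block in either zone stayed well below
the 64 kB an order:4 request needs (assuming 4 kB pages). A small sketch of
that comparison, with made-up helper names and the free-list lines copied
from the two dumps (the "kernel: " prefix dropped):

```python
import re


def parse_free_list(line):
    """Turn 'Node 0 Normal: 4435*4kB (UEM) 5*8kB ...' into {block_kb: count}."""
    return {int(size): int(count)
            for count, size in re.findall(r"(\d+)\*(\d+)kB", line)}


def total_and_largest_kb(line):
    """Return (total free kB, size in kB of the largest free block)."""
    free_list = parse_free_list(line)
    total = sum(size * count for size, count in free_list.items())
    largest = max((size for size, count in free_list.items() if count),
                  default=0)
    return total, largest


# First dump, before changing vm.min_free_kbytes.
BEFORE = {
    "DMA32": "Node 0 DMA32: 79732*4kB (UEM) 17590*8kB (UEM) 151*16kB (UM) "
             "1*32kB (U) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB "
             "0*4096kB = 462096kB",
    "Normal": "Node 0 Normal: 4435*4kB (UEM) 5*8kB (UE) 0*16kB 0*32kB 0*64kB "
              "0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 17780kB",
}
# This dump, after echo 256000 > /proc/sys/vm/min_free_kbytes.
AFTER = {
    "DMA32": "Node 0 DMA32: 72789*4kB (UEM) 59674*8kB (UE) 1691*16kB (U) "
             "0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB "
             "0*4096kB = 795604kB",
    "Normal": "Node 0 Normal: 75526*4kB (UE) 13664*8kB (U) 0*16kB 0*32kB "
              "0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB "
              "= 411416kB",
}

for zone in ("DMA32", "Normal"):
    before = total_and_largest_kb(BEFORE[zone])
    after = total_and_largest_kb(AFTER[zone])
    print(f"{zone}: (total kB, largest kB) before={before} after={after}")
```

The reclaimed memory came back as 4 kB, 8 kB and 16 kB blocks only, so
more free memory did not translate into larger contiguous blocks.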

Cheers,
Marc

* Re: echo dev > /sys/fs/bcache/register gives page allocation failure: order:4, mode:0x2040d0
  2016-02-23 16:32       ` Marc MERLIN
@ 2016-02-23 20:57         ` Marc MERLIN
  0 siblings, 0 replies; 28+ messages in thread
From: Marc MERLIN @ 2016-02-23 20:57 UTC (permalink / raw)
  To: Christoph Nelles; +Cc: linux-bcache

On Tue, Feb 23, 2016 at 08:32:43AM -0800, Marc MERLIN wrote:
> On Mon, Feb 15, 2016 at 04:45:40PM +0100, Christoph Nelles wrote:
> > Hello Marc,
> > On 15.02.2016 at 16:32, Marc MERLIN wrote:
> > >>>and it fails:
> > >>>bash: page allocation failure: order:4, mode:0x2040d0
> > >>You want CONFIG_COMPACTION=y
> > >Good suggestion, but I already have it:
> > >gargamel:~# grep CONFIG_COMPACTION /boot/config-4.3.3-amd64-i915-volpreempt-20150421
> > >CONFIG_COMPACTION=y
> > >
> > >Thanks for the replies,
> > >Marc
> > Maybe increasing vm.min_free_kbytes helps you.
> 
> That was a good suggestion, but it didn't help.
> Looks like I'm going to have to reboot, even though everything else works,
> and it's not a good time to reboot that machine...
> 
> gargamel:/sys/fs/bcache# echo 256000 > /proc/sys/vm/min_free_kbytes
> gargamel:/sys/fs/bcache# echo /dev/md5 >  /sys/fs/bcache/register

So I rebooted and registering the backing device worked, but then adding the
cache device failed in a similar way:
gargamel:/sys/block/dm-4/bcache# echo /dev/sdh2  > /sys/fs/bcache/register

Dump below. I had to reboot a 2nd time and register the cache device
quickly enough after boot, and then things worked.

I don't seem to have other memory issues on that (busy) server. Would
there be a way for bcache to allocate memory in a different way?
Either way, I'm set now, but the reboots were not great.

bash: page allocation failure: order:7, mode:0x24080c0
CPU: 3 PID: 20478 Comm: bash Not tainted 4.4.2-amd64-i915-volpreempt-20160213 #2
Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
 0000000000000000 ffff88001a987b38 ffffffff8134ae0a 0000000000000001
 ffff88001a987bd0 ffffffff81124ab6 024080c01a987c74 024080c000000040
 0000000700000001 0000000000000007 0000000000000007 0000000000000040
Call Trace:
 [<ffffffff8134ae0a>] dump_stack+0x44/0x55
 [<ffffffff81124ab6>] warn_alloc_failed+0x114/0x12c
 [<ffffffff811274b8>] __alloc_pages_nodemask+0x7cb/0x84c
 [<ffffffff8115f6d7>] alloc_pages_current+0xa9/0xcd
 [<ffffffff8112377e>] __get_free_pages+0xe/0x3c
 [<ffffffffc0514315>] register_bcache+0xf98/0x1452 [bcache]
 [<ffffffff8134c853>] kobj_attr_store+0x10/0x1a
 [<ffffffff811df9df>] sysfs_kf_write+0x39/0x3b
 [<ffffffff811df2e1>] kernfs_fop_write+0xed/0x130
 [<ffffffff8117d97d>] __vfs_write+0x26/0xa5
 [<ffffffff8117f228>] ? fput+0x16/0x88
 [<ffffffff810b2969>] ? current_kernel_time64+0x10/0x36
 [<ffffffff812d6050>] ? security_file_permission+0x3b/0x42
 [<ffffffff810915ea>] ? percpu_down_read+0x12/0x41
 [<ffffffff8117fb61>] ? __sb_start_write+0x2b/0x48
 [<ffffffff8117dfe8>] vfs_write+0x9d/0xe8
 [<ffffffff8117e7bd>] SyS_write+0x4d/0x78
 [<ffffffff810039c3>] do_fast_syscall_32+0xb3/0xf3
 [<ffffffff816e3e32>] sysenter_flags_fixed+0x8/0x12
Mem-Info:
active_anon:518645 inactive_anon:178721 isolated_anon:0
 active_file:447391 inactive_file:384565 isolated_file:0
 unevictable:1224 dirty:47288 writeback:32 unstable:0
 slab_reclaimable:60271 slab_unreclaimable:63130
 mapped:420136 shmem:411469 pagetables:4548 bounce:0
 free:23747 free_pcp:1820 free_cma:1140
Node 0 DMA free:15888kB min:20kB low:24kB high:28kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15976kB managed:15892kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 3201 7672 7672  
Node 0 DMA32 free:56180kB min:4640kB low:5800kB high:6960kB active_anon:826344kB inactive_anon:284052kB active_file:880532kB inactive_file:596332kB unevictable:1564kB isolated(anon):0kB isolated(file):0kB present:3362068kB managed:3283032kB mlocked:1564kB dirty:26784kB writeback:48kB mapped:701412kB shmem:683748kB slab_reclaimable:86248kB slab_unreclaimable:99064kB kernel_stack:4912kB pagetables:6712kB unstable:0kB bounce:0kB free_pcp:3668kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 4471 4471  
Node 0 Normal free:18320kB min:6480kB low:8100kB high:9720kB active_anon:1248180kB inactive_anon:430832kB active_file:909944kB inactive_file:945776kB unevictable:3332kB isolated(anon):0kB isolated(file):0kB present:4708352kB managed:4578512kB mlocked:68719480068kB dirty:164332kB writeback:80kB mapped:979132kB shmem:962128kB slab_reclaimable:154832kB slab_unreclaimable:153456kB kernel_stack:6608kB pagetables:11484kB unstable:0kB bounce:0kB free_pcp:3744kB local_pcp:356kB free_cma:4560kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no  
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 0*4kB 0*8kB 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15888kB
Node 0 DMA32: 12*4kB (UME) 5672*8kB (UME) 577*16kB (UM) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 54656kB
Node 0 Normal: 691*4kB (UMEC) 276*8kB (UMEC) 131*16kB (UMEC) 88*32kB (UMEC) 75*64kB (UM) 11*128kB (UM) 2*256kB (U) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 16604kB
Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
1246159 total pagecache pages
18 pages in swap cache
Swap cache stats: add 536, delete 518, find 1/1
Free swap  = 15614620kB
Total swap = 15616764kB
2021599 pages RAM
0 pages HighMem/MovableOnly
52240 pages reserved
4096 pages cma reserved
0 pages hwpoisoned
bcache: register_cache() error opening sdh2: cannot allocate memory
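
For anyone reading these dumps: order:4 means one physically contiguous
run of 2^4 pages, i.e. 64kB with 4kB pages, and the per-zone free lists
above show how many blocks of each size were left. A small sketch for
reading them follows. The helper names order_to_kb and
free_blocks_at_least are made up for illustration, and 4kB pages are
assumed. As far as I understand, a non-zero count does not guarantee the
allocation succeeds, because the allocator also applies its watermark
checks.

```sh
#!/bin/sh
# Sketch only: helper names are illustrative, 4kB pages are assumed.

# Size in kB of one allocation of the given order (2^order pages of 4kB).
order_to_kb() {
    echo $(( (1 << $1) * 4 ))
}

# Read one "Node N ZONE: a*4kB b*8kB ..." free-list line on stdin and print
# how many free blocks are at least MIN_KB kilobytes in size.
free_blocks_at_least() {
    awk -v min="$1" '{
        total = 0
        for (i = 1; i <= NF; i++) {
            # Only fields shaped like "577*16kB" are block counts.
            if ($i ~ /^[0-9]+\*[0-9]+kB$/) {
                split($i, part, "*")
                if (part[2] + 0 >= min + 0)
                    total += part[1]
            }
        }
        print total
    }'
}
```

Piping the DMA32 line above through `free_blocks_at_least 64` reports 0
blocks, and the Normal line reports 88.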


-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901


* Re: echo dev > /sys/fs/bcache/register gives page allocation failure: order:4, mode:0x2040d0
  2016-02-15  6:04 echo dev > /sys/fs/bcache/register gives page allocation failure: order:4, mode:0x2040d0 Marc MERLIN
  2016-02-15 12:02 ` Johannes Thumshirn
  2016-02-15 12:11 ` echo dev > /sys/fs/bcache/register gives page allocation failure: order:4, mode:0x2040d0 Kent Overstreet
@ 2016-02-24  6:53 ` Eric Wheeler
  2016-02-24 16:37   ` Disabling bcache from boot when it crashes? Marc MERLIN
  2 siblings, 1 reply; 28+ messages in thread
From: Eric Wheeler @ 2016-02-24  6:53 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: linux-bcache


On Sun, 14 Feb 2016, Marc MERLIN wrote:

> I was able to make one bcache ok, but when trying to make a 2nd one
> where I only have a backing device, I'm getting repeated page allocation
> failures.
> 
> Just to be clear, this is what I want to do: every new HD backed device
> will have bcache on top even if I don't have a cache device for it, so
> that I can add bcache later.
> Is it something reasonable to do? Setup partitions as bcache backing
> devices without a cache device for them?
> 
> So right now, I'm trying to do
> md5 - bcache - dmcrypt - btrfs
> 
> So I did
> make-bcache -B /dev/md5
> echo /dev/md5 > /sys/fs/bcache/register
>
> and it fails:
> bash: page allocation failure: order:4, mode:0x2040d0
> CPU: 2 PID: 28043 Comm: bash Not tainted 4.3.3-amd64-i915-volpreempt-20150421 #2

Do you have the bcache stability patches? 4.3.3 might be missing some 
critical patches.

Be sure to cherry-pick these from linux 4.5-rc1:
	git cherry-pick 2ef9ccbf~1..627ccd20
or use one of the 4.1 or 3.18 longterm kernels.

I've not seen any memory allocation issues before in bcache, but you 
definitely want those patches for general stability.

-Eric


> Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
>  0000000000000000 ffff88006f65ba78 ffffffff8134150e 0000000000000001
>  ffff88006f65bb10 ffffffff8111f6ce ffff88021f5f4e38 00000004002040d0
>  0000000400000040 0000000000000286 0000000000000004 0000000000000040
> Call Trace:
>  [<ffffffff8134150e>] dump_stack+0x44/0x55
>  [<ffffffff8111f6ce>] warn_alloc_failed+0x111/0x129
>  [<ffffffff811220f8>] __alloc_pages_nodemask+0x6ae/0x70d
>  [<ffffffff8115fb98>] kmem_getpages+0x6a/0x162
>  [<ffffffff8115fd89>] fallback_alloc+0xf9/0x193
>  [<ffffffff8115ff46>] ____cache_alloc_node+0x123/0x130
>  [<ffffffff81160f1e>] __kmalloc+0xf8/0x175
>  [<ffffffffc057e141>] ? kzalloc.constprop.22+0xe/0x10 [bcache]
>  [<ffffffffc057e141>] kzalloc.constprop.22+0xe/0x10 [bcache]
>  [<ffffffffc0580986>] register_bcache+0x61b/0x1452 [bcache]
>  [<ffffffff81342ce0>] kobj_attr_store+0x10/0x1a
>  [<ffffffff811d9677>] sysfs_kf_write+0x39/0x3b
>  [<ffffffff811d8f79>] kernfs_fop_write+0xed/0x130
>  [<ffffffff81177a05>] __vfs_write+0x26/0xa5
>  [<ffffffff816c0fdd>] ? _raw_spin_lock+0xe/0x10
>  [<ffffffff81179277>] ? fput+0x16/0x88
>  [<ffffffff812cd113>] ? security_file_permission+0x3b/0x42
>  [<ffffffff8108e79b>] ? percpu_down_read+0x14/0x46
>  [<ffffffff81179ba7>] ? __sb_start_write+0x25/0x3c
>  [<ffffffff81178067>] vfs_write+0xa2/0xe6
>  [<ffffffff81178835>] SyS_write+0x4d/0x78
>  [<ffffffff816c3962>] sysenter_dispatch+0xf/0x29
> Mem-Info:
> active_anon:412014 inactive_anon:222645 isolated_anon:0
>  active_file:461209 inactive_file:499016 isolated_file:0
>  unevictable:1166 dirty:28999 writeback:712 unstable:0
>  slab_reclaimable:69514 slab_unreclaimable:47678
>  mapped:420280 shmem:410182 pagetables:5369 bounce:0
>  free:131404 free_pcp:1088 free_cma:0
> Node 0 DMA free:15888kB min:20kB low:24kB high:28kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15976kB managed:15892kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:4kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
> lowmem_reserve[]: 0 3203 7674 7674
> Node 0 DMA32 free:462160kB min:4644kB low:5804kB high:6964kB active_anon:443844kB inactive_anon:470452kB active_file:755044kB inactive_file:897988kB unevictable:1488kB isolated(anon):0kB isolated(file):0kB present:3362068kB managed:3283400kB mlocked:1488kB dirty:21656kB writeback:2912kB mapped:703428kB shmem:665604kB slab_reclaimable:71416kB slab_unreclaimable:69108kB kernel_stack:7056kB pagetables:6732kB unstable:0kB bounce:0kB free_pcp:3008kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
> lowmem_reserve[]: 0 0 4471 4471
> Node 0 Normal free:26576kB min:6480kB low:8100kB high:9720kB active_anon:1199644kB inactive_anon:420120kB active_file:1091280kB inactive_file:1101072kB unevictable:3176kB isolated(anon):0kB isolated(file):0kB present:4708352kB managed:4578508kB mlocked:3176kB dirty:78692kB writeback:8900kB mapped:977740kB shmem:975112kB slab_reclaimable:206628kB slab_unreclaimable:124308kB kernel_stack:5120kB pagetables:14888kB unstable:0kB bounce:0kB free_pcp:3364kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
> lowmem_reserve[]: 0 0 0 0
> Node 0 DMA: 0*4kB 0*8kB 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15888kB
> Node 0 DMA32: 79732*4kB (UEM) 17590*8kB (UEM) 151*16kB (UM) 1*32kB (U) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 462096kB
> Node 0 Normal: 4435*4kB (UEM) 5*8kB (UE) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 17780kB
> Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
> 1387415 total pagecache pages
> 11051 pages in swap cache
> Swap cache stats: add 4480800, delete 4469749, find 1201429612/1202240702
> Free swap  = 14096744kB
> Total swap = 15616764kB
> 2021599 pages RAM
> 0 pages HighMem/MovableOnly
> 52149 pages reserved
> 4096 pages cma reserved
> 0 pages hwpoisoned
> bcache: register_bcache() error opening /dev/md5: (null)
> 
> Any idea what's going on?
> 
> gargamel:~# free
>              total       used       free     shared    buffers     cached
> Mem:       7877800    6065272    1812528          0        224    4193616
> -/+ buffers/cache:    1871432    6006368
> Swap:     15616764    1519056   14097708
> gargamel:~# cat /proc/meminfo 
> MemTotal:        7877800 kB
> MemFree:         1799936 kB
> MemAvailable:    4602768 kB
> Buffers:             224 kB
> Cached:          4208328 kB
> SwapCached:        44472 kB
> Active:          3595380 kB
> Inactive:        1508064 kB
> Active(anon):    1647420 kB
> Inactive(anon):   891972 kB
> Active(file):    1947960 kB
> Inactive(file):   616092 kB
> Unevictable:        4664 kB
> Mlocked:            4664 kB
> SwapTotal:      15616764 kB
> SwapFree:       14097712 kB
> Dirty:            104932 kB
> Writeback:             0 kB
> AnonPages:        876480 kB
> Mapped:          1679836 kB
> Shmem:           1640716 kB
> Slab:             469492 kB
> SReclaimable:     280564 kB
> SUnreclaim:       188928 kB
> KernelStack:       12288 kB
> PageTables:        21264 kB
> NFS_Unstable:          0 kB
> Bounce:                0 kB
> WritebackTmp:          0 kB
> CommitLimit:    19555664 kB
> Committed_AS:    7995116 kB
> VmallocTotal:   34359738367 kB
> VmallocUsed:      409412 kB
> VmallocChunk:   34358947836 kB
> HardwareCorrupted:     0 kB
> AnonHugePages:         0 kB
> CmaTotal:          16384 kB
> CmaFree:             232 kB
> HugePages_Total:       0
> HugePages_Free:        0
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> Hugepagesize:       2048 kB
> DirectMap4k:      740628 kB
> DirectMap2M:     7346176 kB
> 
> Thanks,
> Marc
> -- 
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems ....
>                                       .... what McDonalds is to gourmet cooking
> Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901
> --
> To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 




--
Eric Wheeler, President           eWheeler, Inc. dba Global Linux Security
888-LINUX26 (888-546-8926)        Fax: 503-716-3878           PO Box 25107
www.GlobalLinuxSecurity.pro       Linux since 1996!     Portland, OR 97298


* Disabling bcache from boot when it crashes?
  2016-02-24  6:53 ` Eric Wheeler
@ 2016-02-24 16:37   ` Marc MERLIN
  2016-02-24 19:10     ` Eric Wheeler
  0 siblings, 1 reply; 28+ messages in thread
From: Marc MERLIN @ 2016-02-24 16:37 UTC (permalink / raw)
  To: Eric Wheeler; +Cc: linux-bcache

On Wed, Feb 24, 2016 at 06:53:05AM +0000, Eric Wheeler wrote:
> Do you have the bcache stability patches? 4.3.3 might be missing some 
> critical patches.
> 
> Be sure to cherry-pick these from linux 4.5-rc1:
> 	git cherry-pick 2ef9ccbf~1..627ccd20
> or use one of the 4.1 or 3.18 longterm kernels.
> 
> I've not seen any memory allocation issues before in bcache, but you 
> definitely want those patches for general stability.

I'm running 4.4.2, so I'm assuming I don't have those fixes, thanks for the
heads up.

Well, your message is timely, just as you wrote this, I got bcache that
crashed my system as I was shutting down, and then I was unable to ever
reboot because bcache would detect my partitions, start bcache, and crash
before I could do anything to fix it.

A few questions though:
1) is there any bcache boot option I can give to disable bcache at boot
time?

2) I had to boot from rescue media and sadly the version of wipefs there
wasn't good enough to find the bcache sig and remove it.
I then tried to change the bcache cache partition type to 0, but that didn't
help either.
Eventually I had to shrink the bcache cache partition to 1 cylinder and
finally then it stopped being detected at boot and the crashes stopped.
(dd of /dev/zero would have worked, but it's an ssd, and I didn't want to
allocate blocks on the flash that had not been used yet to give more room
for garbage collection).

Was there a better way of doing this?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


* Re: Disabling bcache from boot when it crashes?
  2016-02-24 16:37   ` Disabling bcache from boot when it crashes? Marc MERLIN
@ 2016-02-24 19:10     ` Eric Wheeler
  2016-02-25  5:48       ` Marc MERLIN
  0 siblings, 1 reply; 28+ messages in thread
From: Eric Wheeler @ 2016-02-24 19:10 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: linux-bcache

On Wed, 24 Feb 2016, Marc MERLIN wrote:

> On Wed, Feb 24, 2016 at 06:53:05AM +0000, Eric Wheeler wrote:
> > Do you have the bcache stability patches? 4.3.3 might be missing some 
> > critical patches.
> > 
> > Be sure to cherry-pick these from linux 4.5-rc1:
> > 	git cherry-pick 2ef9ccbf~1..627ccd20
> > or use one of the 4.1 or 3.18 longterm kernels.
> > 
> > I've not seen any memory allocation issues before in bcache, but you 
> > definitely want those patches for general stability.
> 
> I'm running 4.4.2, so I'm assuming I don't have those fixes, thanks for the
> heads up.

4.1.18 has the patches, so unless there is something specific in 4.4 that 
you need, I recommend 4.1.  We've been running 4.1.17 with patches in 
production for a while and it works great.  Haven't tried vanilla 4.1.18 
yet, but I plan to soon.
 
> Well, your message is timely, just as you wrote this, I got bcache that
> crashed my system as I was shutting down, and then I was unable to ever
> reboot because bcache would detect my partitions, start bcache, and crash
> before I could do anything to fix it.
> 
> A few questions though:
> 1) is there any bcache boot option I can give to disable bcache at boot
> time?

This is probably distribution specific.  Exclude bcache from your initrd 
unless your rootfs is bcache (update-initramfs, dracut, etc).  Maybe 
blacklist the module and manually modprobe it when you are ready to load 
it.
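
A minimal sketch of the blacklist approach, assuming bcache is built as a
module (it cannot help when bcache is built into the kernel). The helper
name and the file name bcache-blacklist.conf are arbitrary choices for
illustration; the directory is a parameter so this can be tried on a
scratch directory first.

```sh
#!/bin/sh
# Sketch only: writes "blacklist bcache" into a modprobe.d-style directory.
# CONF_DIR defaults to /etc/modprobe.d; the file name is an arbitrary choice.
blacklist_bcache() {
    conf_dir=${1:-/etc/modprobe.d}
    mkdir -p "$conf_dir" || return 1
    printf 'blacklist bcache\n' > "$conf_dir/bcache-blacklist.conf"
}

# Afterwards, regenerate the initramfs with the distribution's own tool so
# the entry is picked up early, for example:
#   update-initramfs -u     (Debian and derivatives)
#   dracut -f               (dracut-based distributions)
```

A blacklist entry only stops automatic, alias-based loading; an explicit
`modprobe bcache` still loads the module when you are ready. On systems
using kmod, passing modprobe.blacklist=bcache on the kernel command line
should have a similar effect for a single boot.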
 
> 2) I had to boot from rescue media and sadly the version of wipefs there
> wasn't good enough to find the bcache sig and remove it.
> I then tried to change the bcache cache partition type to 0, but that didn't
> help either.
> Eventually I had to shrink the bcache cache partition to 1 cylinder and
> finally then it stopped being detected at boot and the crashes stopped.
> (dd of /dev/zero would have worked, but it's an ssd, and I didn't want to
> allocate blocks on the flash that had not been used yet to give more room
> for garbage collection).
> 
> Was there a better way of doing this?

Probably just `make-bcache -C` of the cache with a force option?  

I think the superblock is in the first 2MB of the disk, so you could do 
something like this:
	dd if=/dev/zero bs=1M count=2 of=/cachedev
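
A slightly more defensive variant of the dd above, as a sketch. The helper
name zero_first_2mb is made up, and the 2MB figure is only my guess from
above, not a verified superblock location. conv=notrunc only matters when
trying it on an image file, where it keeps the rest of the file intact.
This is destructive, so double-check the target first.

```sh
#!/bin/sh
# Sketch only: overwrite the first 2MB of TARGET with zeros.
# TARGET is a block device (e.g. the cache partition) or an image file.
zero_first_2mb() {
    target=$1
    if [ ! -e "$target" ]; then
        echo "no such device or file: $target" >&2
        return 1
    fi
    # conv=notrunc: do not truncate a regular file; harmless on a device.
    dd if=/dev/zero of="$target" bs=1M count=2 conv=notrunc 2>/dev/null
}
```

Usage would be `zero_first_2mb /cachedev`, with /cachedev standing in for
the real cache device as in the command above.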

> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.

Funny, I use my mouse the same way!

-Eric

> Microsoft is to operating systems ....
>                                       .... what McDonalds is to gourmet cooking
> Home page: http://marc.merlins.org/  
> 


* Re: BUG: drivers/md/bcache/writeback.c:237
  2016-02-15 15:45     ` Christoph Nelles
  2016-02-23 16:32       ` Marc MERLIN
@ 2016-02-24 20:45       ` Marc MERLIN
  2016-02-25  0:58         ` Eric Wheeler
  2016-02-25 10:18         ` Zhu Yanhai
  1 sibling, 2 replies; 28+ messages in thread
From: Marc MERLIN @ 2016-02-24 20:45 UTC (permalink / raw)
  To: Christoph Nelles, Eric Wheeler; +Cc: linux-bcache

On Wed, Feb 24, 2016 at 06:53:05AM +0000, Eric Wheeler wrote:
> Be sure to cherry-pick these from linux 4.5-rc1:
> 	git cherry-pick 2ef9ccbf~1..627ccd20
> or use one of the 4.1 or 3.18 longterm kernels.
 
So, I added these patches to my 4.4.2 kernel, but it still crashes when
seeing one cache device at boot.

Crash:
https://goo.gl/photos/8H1DtYjSijK4ngFv6

	while (!kthread_should_stop()) {
		try_to_freeze();

		w = bch_keybuf_next(&dc->writeback_keys);
		if (!w)
			break;

>>>>>		BUG_ON(ptr_stale(dc->disk.c, &w->key, 0));

		if (KEY_START(&w->key) != dc->last_read ||

I have to remove the partition for my system to boot.

Before I destroy it, any other patches I should try?

And to be fair, it's a huge pain to deal with this, there should be an
easier way to just turn bcache off from the kernel command line. In this
case it was really a lot of work to get back to even a booting system.

You also said:
> 4.1.18 has the patches, so unless there is something specific in 4.4 that
> you need, I recommend 4.1.  We've been running 4.1.17 with patches in
> production for a while and it works great.  Haven't tried vanilla 4.1.18
> yet, but I plan to soon.

Sadly, I run btrfs, I can't just go to random old kernels like this.
Is bcache not stable in up to date kernels?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901


* Re: BUG: drivers/md/bcache/writeback.c:237
  2016-02-24 20:45       ` BUG: drivers/md/bcache/writeback.c:237 Marc MERLIN
@ 2016-02-25  0:58         ` Eric Wheeler
  2016-02-25  6:41           ` Eric Wheeler
  2016-02-25 10:18         ` Zhu Yanhai
  1 sibling, 1 reply; 28+ messages in thread
From: Eric Wheeler @ 2016-02-25  0:58 UTC (permalink / raw)
  To: kent.overstreet; +Cc: Christoph Nelles, linux-bcache, Marc MERLIN

[ +cc: kent ]

On Wed, 24 Feb 2016, Marc MERLIN wrote:

> On Wed, Feb 24, 2016 at 06:53:05AM +0000, Eric Wheeler wrote:
> > Be sure to cherry-pick these from linux 4.5-rc1:
> > 	git cherry-pick 2ef9ccbf~1..627ccd20
> > or use one of the 4.1 or 3.18 longterm kernels.
>  
> So, I added these patches to my 4.4.2 kernel, but it still crashes when
> seeing one cache device at boot.
> 
> Crash:
> https://goo.gl/photos/8H1DtYjSijK4ngFv6
> 

> static void read_dirty(struct cached_dev *dc)
> [...]
> 	while (!kthread_should_stop()) {
> 		try_to_freeze();
> 
> 		w = bch_keybuf_next(&dc->writeback_keys);
> 		if (!w)
> 			break;
> 
> >>>>>		BUG_ON(ptr_stale(dc->disk.c, &w->key, 0));
> 
> 		if (KEY_START(&w->key) != dc->last_read ||

Kent, any idea what's going on here?  What is this BUG_ON checking?

It looks like dirty data is being read immediately after register, 
possibly due to a crash.

-Eric

> 
> I have to remove the partition for my system to boot.
> 
> Before I destroy it, any other patches I should try?
> 
> And to be fair, it's a huge pain to deal with this, there should be an
> easier way to just turn bcache off from the kernel command line. In this
> case it was really a lot of work to get back to even a booting system.
> 
> You also said:
> > 4.1.18 has the patches, so unless there is something specific in 4.4 that
> > you need, I recommend 4.1.  We've been running 4.1.17 with patches in
> > production for a while and it works great.  Haven't tried vanilla 4.1.18
> > yet, but I plan to soon.
> 
> Sadly, I run btrfs, I can't just go to random old kernels like this.
> Is bcache not stable in up to date kernels?
> 
> Thanks,
> Marc
> -- 
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems ....
>                                       .... what McDonalds is to gourmet cooking
> Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901


* Re: Disabling bcache from boot when it crashes?
  2016-02-24 19:10     ` Eric Wheeler
@ 2016-02-25  5:48       ` Marc MERLIN
  0 siblings, 0 replies; 28+ messages in thread
From: Marc MERLIN @ 2016-02-25  5:48 UTC (permalink / raw)
  To: Eric Wheeler; +Cc: linux-bcache

On Wed, Feb 24, 2016 at 07:10:07PM +0000, Eric Wheeler wrote:
> On Wed, 24 Feb 2016, Marc MERLIN wrote:
> 
> > On Wed, Feb 24, 2016 at 06:53:05AM +0000, Eric Wheeler wrote:
> > > Do you have the bcache stability patches? 4.3.3 might be missing some 
> > > critical patches.
> > > 
> > > Be sure to cherry-pick these from linux 4.5-rc1:
> > > 	git cherry-pick 2ef9ccbf~1..627ccd20
> > > or use one of the 4.1 or 3.18 longterm kernels.
> > > 
> > > I've not seen any memory allocation issues before in bcache, but you 
> > > definitely want those patches for general stability.
> > 
> > I'm running 4.4.2, so I'm assuming I don't have those fixes, thanks for the
> > heads up.
> 
> 4.1.18 has the patches, so unless there is something specific in 4.4 that 
> you need, I recommend 4.1.  We've been running 4.1.17 with patches in 
> production for a while and it works great.  Haven't tried vanilla 4.1.18 
> yet, but I plan to soon.
>  
> > Well, your message is timely, just as you wrote this, I got bcache that
> > crashed my system as I was shutting down, and then I was unable to ever
> > reboot because bcache would detect my partitions, start bcache, and crash
> > before I could do anything to fix it.
> > 
> > A few questions though:
> > 1) is there any bcache boot option I can give to disable bcache at boot
> > time?
> 
> This is probably distribution specific.  Exclude bcache from your initrd 
> unless your rootfs is bcache (update-initramfs, dracut, etc).  Maybe 
> blacklist the module and manually modprobe it when you are ready to load 
> it.
  
So I gave this some more thought. This is not how things work.
You have bcache either built in your kernel, so you can't remove it, or as
an early loaded module in initrd, and it's pretty damn hard to stop that
module from loading if your kernel crashes before you can even get a shell.

Either way, when you hit a crashing bug with bcache, your kernel will
consistently crash until you remove or delete the contents of the partition,
which you can only do from rescue media.
In my experience, the rescue media doesn't have make-bcache, or even a
version of wipefs recent enough to clear the cache partition, so you're left
with dd or just deleting the partition.
Needless to say, that's very suboptimal for the average user who doesn't
have rescue media on hand and can't easily search the internet to find out
what to do after their system stops booting.

So, I hope it makes sense that it would be good to have something like
bcache=off as a command line option to recover when things go wrong (in my
job, it's 'when', not 'if' ;) ).

> I think the superblock is in the first 2MB of the disk, so you could do 
> something like this:
> 	dd if=/dev/zero bs=1M count=2 of=/cachedev
 
Thanks.
I'll still wait for you to tell me that there is nothing you'd like me to
try to help you find out why bcache is crashing before I wipe the data on my
now corrupted cache partition.

> > "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> 
> Funny, I use my mouse the same way!

:)

Cheers,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


* Re: BUG: drivers/md/bcache/writeback.c:237
  2016-02-25  0:58         ` Eric Wheeler
@ 2016-02-25  6:41           ` Eric Wheeler
  2016-02-25  7:36             ` Eric Wheeler
  0 siblings, 1 reply; 28+ messages in thread
From: Eric Wheeler @ 2016-02-25  6:41 UTC (permalink / raw)
  To: kent.overstreet; +Cc: Christoph Nelles, linux-bcache, Marc MERLIN

On Thu, 25 Feb 2016, Eric Wheeler wrote:

> [ +cc: kent ]
> 
> On Wed, 24 Feb 2016, Marc MERLIN wrote:
> 
> > On Wed, Feb 24, 2016 at 06:53:05AM +0000, Eric Wheeler wrote:
> > > Be sure to cherry-pick these from linux 4.5-rc1:
> > > 	git cherry-pick 2ef9ccbf~1..627ccd20
> > > or use one of the 4.1 or 3.18 longterm kernels.
> >  
> > So, I added these patches to my 4.4.2 kernel, but it still crashes when
> > seeing one cache device at boot.
> > 
> > Crash:
> > https://goo.gl/photos/8H1DtYjSijK4ngFv6
> > 
> 
> > static void read_dirty(struct cached_dev *dc)
> > [...]
> > 	while (!kthread_should_stop()) {
> > 		try_to_freeze();
> > 
> > 		w = bch_keybuf_next(&dc->writeback_keys);
> > 		if (!w)
> > 			break;
> > 
> > >>>>>		BUG_ON(ptr_stale(dc->disk.c, &w->key, 0));
> > 
> > 		if (KEY_START(&w->key) != dc->last_read ||
> 
> Kent, any idea what's going on here?  What is this BUG_ON checking?
> 
> It looks like dirty data is being read immediately after register, 
> possibly due to a crash.

The calltrace in the image [https://goo.gl/photos/8H1DtYjSijK4ngFv6] 
indicates something about kthread_parkme.  Is our thread being woken 
unexpectedly by kthread_park?

If so, maybe we can do a better job handling the BUG condition.  

What would happen if we did something like this:

	+if (kthread_should_park()) {
	+	kthread_parkme();
	+	break;
	+}

	BUG_ON(ptr_stale(dc->disk.c, &w->key, 0));


/* and maybe this too: */
	-BUG_ON(ptr_stale(dc->disk.c, &w->key, 0));
	+if (ptr_stale(dc->disk.c, &w->key, 0))
	+	break;

Or would `continue` be more appropriate?  I don't think anything is lost 
at the point we break because the writeback thread will continue to 
iterate and retry the call to read_dirty().  It hasn't kzalloc'ed yet, so 
no cleanup necessary.

Clearly there is a condition that isn't being handled gracefully enough, and 
clearly it cannot continue if the BUG condition is met. But it's in a loop, 
so can we safely iterate to retry instead of BUGing?


Kent,

Do you think this patch would solve the BUG_ON condition in this case?


==============================================
diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index ca38362..529310a 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -234,7 +234,8 @@ static void read_dirty(struct cached_dev *dc)
		if (!w)
			break;
 
-	       BUG_ON(ptr_stale(dc->disk.c, &w->key, 0));
+	       if (ptr_stale(dc->disk.c, &w->key, 0))
+		       goto err;
 
		if (KEY_START(&w->key) != dc->last_read ||
		    jiffies_to_msecs(delay) > 50)
@@ -282,6 +283,10 @@ err:
	 * freed) before refilling again
	 */
	closure_sync(&cl);
+
+       if (kthread_should_park())
+	       kthread_parkme();
+
 }
 
 /* Scan for dirty data */
==============================================


-Eric

> 
> -Eric
> 
> > 
> > I have to remove the partition for my system to boot.
> > 
> > Before I destroy it, any other patches I should try?
> > 
> > And to be fair, it's a huge pain to deal with this, there should be an
> > easier way to just turn bcache off from the kernel command line. In this
> > case it was really a lot of work to get back to even a booting system.
> > 
> > You also said:
> > > 4.1.18 has the patches, so unless there is something specific in 4.4 that
> > > you need, I recommend 4.1.  We've been running 4.1.17 with patches in
> > > production for a while and it works great.  Haven't tried vanilla 4.1.18
> > > yet, but I plan to soon.
> > 
> > Sadly, I run btrfs, I can't just go to random old kernels like this.
> > Is bcache not stable in up to date kernels?
> > 
> > Thanks,
> > Marc
> > -- 
> > "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> > Microsoft is to operating systems ....
> >                                       .... what McDonalds is to gourmet cooking
> > Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901


* Re: BUG: drivers/md/bcache/writeback.c:237
  2016-02-25  6:41           ` Eric Wheeler
@ 2016-02-25  7:36             ` Eric Wheeler
  2016-02-25 10:08               ` Zhu Yanhai
  0 siblings, 1 reply; 28+ messages in thread
From: Eric Wheeler @ 2016-02-25  7:36 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: kent.overstreet, Christoph Nelles, linux-bcache

On Thu, 25 Feb 2016, Eric Wheeler wrote:

> On Thu, 25 Feb 2016, Eric Wheeler wrote:
> 
> > [ +cc: kent ]
> > 
> > On Wed, 24 Feb 2016, Marc MERLIN wrote:
> > 
> > > On Wed, Feb 24, 2016 at 06:53:05AM +0000, Eric Wheeler wrote:
> > > > Be sure to cherry-pick these from linux 4.5-rc1:
> > > > 	git cherry-pick 2ef9ccbf~1..627ccd20
> > > > or use one of the 4.1 or 3.18 longterm kernels.
> > >  
> > > So, I added these patches to my 4.4.2 kernel, but it still crashes when
> > > seeing one cache device at boot.
> > > 
> > > Crash:
> > > https://goo.gl/photos/8H1DtYjSijK4ngFv6
> > > 
> > 
> > > static void read_dirty(struct cached_dev *dc)
> > > [...]
> > > 	while (!kthread_should_stop()) {
> > > 		try_to_freeze();
> > > 
> > > 		w = bch_keybuf_next(&dc->writeback_keys);
> > > 		if (!w)
> > > 			break;
> > > 
> > > >>>>>		BUG_ON(ptr_stale(dc->disk.c, &w->key, 0));
> > > 
> > > 		if (KEY_START(&w->key) != dc->last_read ||
> > 
> > Kent, any idea what's going on here?  What is this BUG_ON checking?
> > 
> > It looks like dirty data is being read immediately after register, 
> > possibly due to a crash.
> 
> The calltrace in the image [https://goo.gl/photos/8H1DtYjSijK4ngFv6] 
> indicates something about kthread_parkme.  Is our thread being woken 
> unexpectedly by kthread_park?
> 
> If so, maybe we can do a better job handling the BUG condition.  
> 
> What would happen if we did something like this:
> 
> 	+if (kthread_should_park()) {
> 	+	kthread_parkme();
> 	+	break;
> 	+}
> 
> 	BUG_ON(ptr_stale(dc->disk.c, &w->key, 0));
> 
> 
> /* and maybe this too: */
> 	-BUG_ON(ptr_stale(dc->disk.c, &w->key, 0));
> 	+if (ptr_stale(dc->disk.c, &w->key, 0))
> 	+	break;
> 
> Or would `continue` be more appropriate?  I don't think anything is lost 
> at the point we break because the writeback thread will continue to 
> iterate and retry the call to read_dirty().  It hasn't kzalloc'ed yet, so 
> no cleanup necessary.
> 
> Clearly there is a condition that isn't being handled gracefully enough, and 
> clearly it cannot continue if the BUG condition is met. But it's in a loop, 
> so can we safely iterate to retry instead of BUGing?
> 
> 
> Kent,
> 
> Do you think this patch would solve the BUG_ON condition in this case?
> 
> 
> ==============================================
> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
> index ca38362..529310a 100644
> --- a/drivers/md/bcache/writeback.c
> +++ b/drivers/md/bcache/writeback.c
> @@ -234,7 +234,8 @@ static void read_dirty(struct cached_dev *dc)
> 		if (!w)
> 			break;
>  
> -	       BUG_ON(ptr_stale(dc->disk.c, &w->key, 0));
> +	       if (ptr_stale(dc->disk.c, &w->key, 0))
> +		       goto err;
>  
> 		if (KEY_START(&w->key) != dc->last_read ||
> 		    jiffies_to_msecs(delay) > 50)
> @@ -282,6 +283,10 @@ err:
> 	 * freed) before refilling again
> 	 */
> 	closure_sync(&cl);
> +
> +       if (kthread_should_park())
> +	       kthread_parkme();
> +
>  }

It looks like I can't call the kthread_park* functions from the module; 
perhaps those are handled internally:

	WARNING: "kthread_should_park" [drivers/md/bcache//bcache.ko] undefined!
	WARNING: "kthread_parkme" [drivers/md/bcache//bcache.ko] undefined!
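
One possible explanation is that these symbols are simply not exported to
modules in this tree, which modpost reports as "undefined". A quick way
to check, as a sketch: it assumes a built kernel tree with a
Module.symvers file whose columns are CRC, symbol, module and export
type, and the helper name export_type is made up for illustration.

```sh
#!/bin/sh
# Sketch only: print the export type of SYMBOL as recorded in a
# Module.symvers file, or return non-zero if the symbol is not listed.
# Assumes columns: CRC, symbol name, providing module, export type.
export_type() {
    awk -v sym="$1" '
        $2 == sym { print $4; found = 1 }
        END { exit !found }
    ' "$2"
}
```

For example, `export_type kthread_parkme Module.symvers` run at the top of
the build tree either prints something like EXPORT_SYMBOL_GPL or fails,
in which case the module cannot link against that symbol.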

I think it would still work if the kthread_*park* function calls were 
removed from the earlier patch based on the code flow.  It should just 
iterate and retry---assuming that it can retry.  At least it 
wouldn't BUG.

If this happens at cache registration, is it a race between writeback 
running too soon and the data structures not being fully populated?

Can anyone else comment here?  

-Eric

  
>  /* Scan for dirty data */
> ==============================================
> 
> 
> -Eric
> 
> > 
> > -Eric
> > 
> > > 
> > > I have to remove the partition for my system to boot.
> > > 
> > > Before I destroy it, any other patches I should try?
> > > 
> > > And to be fair, it's a huge pain to deal with this, there should be an
> > > easier way to just turn bcache off from the kernel command line. In this
> > > case it was really a lot of work to get back to even a booting system.
> > > 
> > > You also said:
> > > > 4.1.18 has the patches, so unless there is something specific in 4.4 that
> > > > you need, I recommend 4.1.  We've been running 4.1.17 with patches in
> > > > production for a while and it works great.  Haven't tried vanilla 4.1.18
> > > > yet, but I plan to soon.
> > > 
> > > Sadly, I run btrfs, I can't just go to random old kernels like this.
> > > Is bcache not stable in up to date kernels?
> > > 
> > > Thanks,
> > > Marc
> > > -- 
> > > "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> > > Microsoft is to operating systems ....
> > >                                       .... what McDonalds is to gourmet cooking
> > > Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: BUG: drivers/md/bcache/writeback.c:237
  2016-02-25  7:36             ` Eric Wheeler
@ 2016-02-25 10:08               ` Zhu Yanhai
  2016-02-26  2:38                 ` Eric Wheeler
  0 siblings, 1 reply; 28+ messages in thread
From: Zhu Yanhai @ 2016-02-25 10:08 UTC (permalink / raw)
  To: Eric Wheeler; +Cc: Marc MERLIN, Kent Overstreet, Christoph Nelles, linux-bcache

2016-02-25 15:36 GMT+08:00 Eric Wheeler <bcache@lists.ewheeler.net>:
>
> On Thu, 25 Feb 2016, Eric Wheeler wrote:
>
> > On Thu, 25 Feb 2016, Eric Wheeler wrote:
> >
> > > [ +cc: kent ]
> > >
> > > On Wed, 24 Feb 2016, Marc MERLIN wrote:
> > >
> > > > On Wed, Feb 24, 2016 at 06:53:05AM +0000, Eric Wheeler wrote:
> > > > > Be sure to cherry-pick these from linux 4.5-rc1:
> > > > >         git cherry-pick 2ef9ccbf~1..627ccd20
> > > > > or use one of the 4.1 or 3.18 longterm kernels.
> > > >
> > > > So, I added these patches to my 4.4.2 kernel, but it still crashes when
> > > > seeing one cache device at boot.
> > > >
> > > > Crash:
> > > > https://goo.gl/photos/8H1DtYjSijK4ngFv6
> > > >
> > >
> > > > static void read_dirty(struct cached_dev *dc)
> > > > [...]
> > > >   while (!kthread_should_stop()) {
> > > >           try_to_freeze();
> > > >
> > > >           w = bch_keybuf_next(&dc->writeback_keys);
> > > >           if (!w)
> > > >                   break;
> > > >
> > > > >>>>>             BUG_ON(ptr_stale(dc->disk.c, &w->key, 0));
> > > >
> > > >           if (KEY_START(&w->key) != dc->last_read ||
> > >
> > > Kent, any idea whats going on here?  What is this BUG_ON checking?
> > >
> > > It looks like dirty data is being read immediately after register,
> > > possibly due to a crash.
> >
> > The calltrace in the image [https://goo.gl/photos/8H1DtYjSijK4ngFv6]
> > indicates something about kthread_parkme.  Is our thread being woken
> > unexpectedly by kthread_park?
> >
> > If so, maybe we can do a better job handling the BUG condition.
> >
> > What would happen if we did something like this:
> >
> >       +if (kthread_should_park()) {
> >       +       kthread_parkme();
> >       +       break;
> >       +}
> >
> >       BUG_ON(ptr_stale(dc->disk.c, &w->key, 0));
> >
> >
> > /* and maybe this too: */
> >       -BUG_ON(ptr_stale(dc->disk.c, &w->key, 0));
> >       +if (ptr_stale(dc->disk.c, &w->key, 0))
> >       +       break;
> >
> > Or would `continue` be more appropriate?  I don't think anything is lost
> > at the point we break because the writeback thread will continue to
> > iterate and retry the call to read_dirty().  It hasn't kzalloc'ed yet, so
> > no cleanup necessary.
> >
> > Clearly there is a condition that isn't being handled gracefully enough, and
> > clearly it cannot continue if the BUG condition is met---but it's in a loop
> > so can we safely iterate to retry instead of BUGing??
> >
> >
> > Kent,
> >
> > Do you think this patch would solve the BUG_ON condition in this case?
> >
> >
> > ==============================================
> > diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
> > index ca38362..529310a 100644
> > --- a/drivers/md/bcache/writeback.c
> > +++ b/drivers/md/bcache/writeback.c
> > @@ -234,7 +234,8 @@ static void read_dirty(struct cached_dev *dc)
> >               if (!w)
> >                       break;
> >
> > -            BUG_ON(ptr_stale(dc->disk.c, &w->key, 0));
> > +            if (ptr_stale(dc->disk.c, &w->key, 0))
> > +                    goto err;
> >
> >               if (KEY_START(&w->key) != dc->last_read ||
> >                   jiffies_to_msecs(delay) > 50)
> > @@ -282,6 +283,10 @@ err:
> >        * freed) before refilling again
> >        */
> >       closure_sync(&cl);
> > +
> > +       if (kthread_should_park())
> > +            kthread_parkme();
> > +
> >  }
>
> It looks like I can't call kthread_park* functions, perhaps those are
> handled internally:
>
>         WARNING: "kthread_should_park" [drivers/md/bcache//bcache.ko] undefined!
>         WARNING: "kthread_parkme" [drivers/md/bcache//bcache.ko] undefined!
>
> I think it would still work if the kthread_*park* function calls were
> removed from the earlier patch based on the code flow.  It should just
> iterate and retry---assuming that it can retry.  At least it
> wouldn't BUG.
>
> If this happens at cache registration, is it a race between writeback
> running too soon and the datastructures not being fully populated?
>
> Can anyone else comment here?

Hi Eric,
I'm not sure why you think it's caused by a kthread park. It is
undoubtedly a BUG_ON, since all keys in writeback_keys should be
non-stale; otherwise the writeback thread might write back garbage
to the backing device.
See bch_btree_gc_finish(): the buckets pointed to by writeback_keys are
marked as GC_MARK_DIRTY to prevent the allocator from reclaiming them
later, so theoretically the keys won't be stale.
I guess there is some race between the device register path, the early
stage GC and writeback. I think you won't see this BUG_ON once the
whole system is up and running. But once it happens, the bucket with the
wrong generation gets persisted to the SSD, which means you can't use
the cache device any more.
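
To illustrate what "stale" means here, the following is a simplified
userspace sketch, not the kernel code. It assumes that a pointer records
the bucket's generation at the time the key was written, and that the
pointer counts as stale once the bucket's generation has moved past that
value. The demo_* names are hypothetical, and the 8-bit wraparound
handling is an assumption about how the generation counters behave.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for a cache bucket and a key pointer. */
struct demo_bucket {
	uint8_t gen;			/* bumped each time the bucket is reused */
};

struct demo_ptr {
	uint8_t gen;			/* bucket generation seen when the key was written */
	struct demo_bucket *bucket;
};

/* How far generation a is ahead of b, tolerating 8-bit wraparound. */
static uint8_t demo_gen_after(uint8_t a, uint8_t b)
{
	uint8_t r = (uint8_t)(a - b);

	return r > 128U ? 0 : r;
}

/* True when the bucket has been reused since the pointer was created. */
static bool demo_ptr_stale(const struct demo_ptr *p)
{
	return demo_gen_after(p->bucket->gen, p->gen) != 0;
}
```

With this model, a key whose recorded generation equals the bucket's
generation is valid, and any later reuse of the bucket makes every older
pointer into it stale, which is the condition the BUG_ON trips on.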

-zyh

>
>
> -Eric
>
>
> >  /* Scan for dirty data */
> > ==============================================
> >
> >
> > -Eric
> >
> > >
> > > -Eric
> > >
> > > >
> > > > I have to remove the partition for my system to boot.
> > > >
> > > > Before I destroy it, any other patches I should try?
> > > >
> > > > And to be fair, it's a huge pain to deal with this, there should be an
> > > > easier way to just turn bcache off from the kernel command line. In this
> > > > case it was really a lot of work to get back to even a booting system.
> > > >
> > > > You also said:
> > > > > 4.1.18 has the patches, so unless there is something specific in 4.4 that
> > > > > you need, I recommend 4.1.  We've been running 4.1.17 with patches in
> > > > > production for a while and it works great.  Haven't tried vanilla 4.1.18
> > > > > yet, but I plan to soon.
> > > >
> > > > Sadly, I run btrfs, I can't just go to random old kernels like this.
> > > > Is bcache not stable in up to date kernels?
> > > >
> > > > Thanks,
> > > > Marc
> > > > --
> > > > "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> > > > Microsoft is to operating systems ....
> > > >                                       .... what McDonalds is to gourmet cooking
> > > > Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> > > > the body of a message to majordomo@vger.kernel.org
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > >
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >
> --
> To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: BUG: drivers/md/bcache/writeback.c:237
  2016-02-24 20:45       ` BUG: drivers/md/bcache/writeback.c:237 Marc MERLIN
  2016-02-25  0:58         ` Eric Wheeler
@ 2016-02-25 10:18         ` Zhu Yanhai
  2016-02-25 15:20           ` Marc MERLIN
  1 sibling, 1 reply; 28+ messages in thread
From: Zhu Yanhai @ 2016-02-25 10:18 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: Christoph Nelles, Eric Wheeler, linux-bcache

2016-02-25 4:45 GMT+08:00 Marc MERLIN <marc@merlins.org>:
> On Wed, Feb 24, 2016 at 06:53:05AM +0000, Eric Wheeler wrote:
>> Be sure to cherry-pick these from linux 4.5-rc1:
>>       git cherry-pick 2ef9ccbf~1..627ccd20
>> or use one of the 4.1 or 3.18 longterm kernels.
>
> So, I added these patches to my 4.4.2 kernel, but it still crashes when
> seeing one cache device at boot.
>
> Crash:
> https://goo.gl/photos/8H1DtYjSijK4ngFv6
>
>         while (!kthread_should_stop()) {
>                 try_to_freeze();
>
>                 w = bch_keybuf_next(&dc->writeback_keys);
>                 if (!w)
>                         break;
>
>>>>>>           BUG_ON(ptr_stale(dc->disk.c, &w->key, 0));
>
>                 if (KEY_START(&w->key) != dc->last_read ||
>
> I have to remove the partition for my system to boot.
>
> Before I destroy it, any other patches I should try?
>
> And to be fair, it's a huge pain to deal with this, there should be an
> easier way to just turn bcache off from the kernel command line. In this
> case it was really a lot of work to get back to even a booting system.
>
> You also said:
>> 4.1.18 has the patches, so unless there is something specific in 4.4 that
>> you need, I recommend 4.1.  We've been running 4.1.17 with patches in
>> production for a while and it works great.  Haven't tried vanilla 4.1.18
>> yet, but I plan to soon.
>
> Sadly, I run btrfs, I can't just go to random old kernels like this.
> Is bcache not stable in up to date kernels?
>
> Thanks,
> Marc
> --
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems ....
>                                       .... what McDonalds is to gourmet cooking
> Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

Marc,
When did you *first* see this BUG_ON? During boot up or far after the
whole system is up?

-zyh

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: BUG: drivers/md/bcache/writeback.c:237
  2016-02-25 10:18         ` Zhu Yanhai
@ 2016-02-25 15:20           ` Marc MERLIN
  2016-02-25 23:44             ` Eric Wheeler
  0 siblings, 1 reply; 28+ messages in thread
From: Marc MERLIN @ 2016-02-25 15:20 UTC (permalink / raw)
  To: Zhu Yanhai; +Cc: Christoph Nelles, Eric Wheeler, linux-bcache

On Thu, Feb 25, 2016 at 06:18:03PM +0800, Zhu Yanhai wrote:
> Marc,
> When did you *first* see this BUG_ON? During boot up or far after the
> whole system is up?

I setup bcache, had it work a bit.
During shutdown, it crapped out when syncing/unmounting
After each subsequent boot, the system crashed as soon as the module loaded
and scanned my disks.

While I understand I did hit a bug, I've been on a crusade to get BUG_ON
removed from btrfs code and replaced with warns/abort/remount read only.
On a production system, BUG_ON really sucks, especially if it happens as
soon as the kernel loads and before you can even do anything to fix the
issue :)

If there is any chance this can be changed into some kind of abort that
disables bcache or just prevents bcache from making damaging writes to the
backing device, by simply disabling the cache, that would be much better :)

In the meantime, I've stopped using writeback since it seems that it's been
the reason why I hit this problem.

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: BUG: drivers/md/bcache/writeback.c:237
  2016-02-25 15:20           ` Marc MERLIN
@ 2016-02-25 23:44             ` Eric Wheeler
  2016-02-26  0:17               ` Marc MERLIN
  0 siblings, 1 reply; 28+ messages in thread
From: Eric Wheeler @ 2016-02-25 23:44 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: Zhu Yanhai, Christoph Nelles, linux-bcache


On Thu, 25 Feb 2016, Marc MERLIN wrote:

> On Thu, Feb 25, 2016 at 06:18:03PM +0800, Zhu Yanhai wrote:
> > Marc,
> > When did you *first* see this BUG_ON? During boot up or far after the
> > whole system is up?
> 
> I setup bcache, had it work a bit.
> During shutdown, it crapped out when syncing/unmounting

Do you have more information about what crapped out on shutdown?  
   Memory?
   IO Error?
   Just hung so the reset button was pushed?

Was there a backtrace?

It might be a good idea to use netconsole and point it at a syslog 
server to catch the whole backtrace.

> After each subsequent boot, the system crashed as soon as the module loaded
> and scanned my disks.

So just to clarify, the current BUG_ON discussed is happening at boot, and 
is not the mid-shutdown error that first happened?

-Eric

> 
> While I understand I did hit a bug, I've been on a crusade to get BUG_ON
> removed from btrfs code and replaced with warns/abort/remount read only.
> On a production system, BUG_ON really sucks, especially if it happens as
> soon as the kernel loads and before you can even do anything to fix the
> issue :)
> 
> If there is any chance this can be changed into some kind of abort that
> disables bcache or just prevents bcache from making damaging writes to the
> backing device, by simply disabling the cache, that would be much better :)
> 
> In the meantime, I've stopped using writeback since it seems that it's been
> the reason why I hit this problem.
> 
> Thanks,
> Marc
> -- 
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems ....
>                                       .... what McDonalds is to gourmet cooking
> Home page: http://marc.merlins.org/  
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: BUG: drivers/md/bcache/writeback.c:237
  2016-02-25 23:44             ` Eric Wheeler
@ 2016-02-26  0:17               ` Marc MERLIN
  0 siblings, 0 replies; 28+ messages in thread
From: Marc MERLIN @ 2016-02-26  0:17 UTC (permalink / raw)
  To: Eric Wheeler; +Cc: Zhu Yanhai, Christoph Nelles, linux-bcache

On Thu, Feb 25, 2016 at 11:44:01PM +0000, Eric Wheeler wrote:
> 
> On Thu, 25 Feb 2016, Marc MERLIN wrote:
> 
> > On Thu, Feb 25, 2016 at 06:18:03PM +0800, Zhu Yanhai wrote:
> > > Marc,
> > > When did you *first* see this BUG_ON? During boot up or far after the
> > > whole system is up?
> > 
> > I setup bcache, had it work a bit.
> > During shutdown, it crapped out when syncing/unmounting
> 
> Do you have more information about what crapped out on shutdown?  
>    Memory?
>    IO Error?
>    Just hung so the reset button was pushed?
 
Sadly, I didn't record that carefully. I remember a kernel traceback and
system hang, but that's about it.

> Was there a backtrace?

Yes, but not captured :(

> It might be a good idea to use netconsole and point it at a syslog 
> server to catch the whole backtrace.

Next time sure, but too late now, and no netconsole if the system crashes
before I can even bring the ethernet up.

> > After each subsequent boot, the system crashed as soon as the module loaded
> > and scanned my disks.
> 
> So just to clarify, the current BUG_ON discussed is happening at boot, and 
> is not the mid-shutdown error that first happened?

Correct. The only reason my system is booting right now is that I hid the
partition where the bcache cache is, so it doesn't get seen at boot.
If it's there, as soon as the kernel boots and bcache activates, it crashes.
I'm keeping it in case you want me to try a patch to see if it'll stop the
crashing at boot.

Hence my request for a way to turn off bcache as a kernel command line option 
to allow for recovery in such cases.

Sadly even if we don't have perfect state on how we got there, bad data
shouldn't cause bcache to crash the kernel at boot. It could refuse to make
the cache active, log an error, and move on.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: BUG: drivers/md/bcache/writeback.c:237
  2016-02-25 10:08               ` Zhu Yanhai
@ 2016-02-26  2:38                 ` Eric Wheeler
  2016-02-26  2:46                   ` Marc MERLIN
  0 siblings, 1 reply; 28+ messages in thread
From: Eric Wheeler @ 2016-02-26  2:38 UTC (permalink / raw)
  To: Zhu Yanhai; +Cc: Marc MERLIN, Kent Overstreet, Christoph Nelles, linux-bcache

On Thu, 25 Feb 2016, Zhu Yanhai wrote:
> > If this happens at cache registration, is it a race between writeback
> > running too soon and the datastructures not being fully populated?
> >
> > Can anyone else comment here?
> 
> Hi Eric,
> I'm not sure why you think it's caused by some kthread park. 

Because of the backtrace in his photo, but I agree, this is definitely a 
BUG_ON in read_dirty(). 

> See bch_btree_gc_finish(), the buckets pointed by writeback_keys are
> marked as GC_MARK_DIRTY, to prevent them be reclaimed by the allocator
> in the next, so theoretically the keys won't be stale.

Interesting.  What does stale mean in this context?  

Sounds like this original problem was caused at shutdown and the bug we 
are working on is the result of that issue.  It would be nice to know what 
caused that if we see any future bug reports like this. 

Could this mean his on-SSD writeback cache (or controller writeback cache) 
didn't get the btree consistent on disk at shutdown---or that there was a 
bug at shutdown that prevented it?  Unfortunately we don't have a trace of 
the problem at shutdown.

> I guess there is some race between the device register path, the early
> stage GC and writeback. I think you won't see this BUG_ON after the
> whole system take off. 

Agreed.  Please comment on the patch below.  

> But once it happens the bucket with wrong generation get persistence in 
> SSD, which means you can't use the cache device any more.

Can you clarify what "it" refers to in "once it happens"?  What is it that 
bumps the generation proximate to the BUG_ON in read_dirty()?

How does this mean you can't use the cache device?  Is it completely 
invalid?


Marc, 

Disclaimer: if you are comfortable forcing the writeback thread to proceed 
instead of BUG, then try this patch.  It may cause other problems, or it 
may not work at all.  If the cache is attached and corrupt, then its 
backing dev could become corrupt during writeback if bcache still thinks 
they are associated.

Here is how it works:

First, in super.c, I hold the writeback lock until initialization is 
complete on the referenced cached_dev *dc and release it when it should be 
safe to proceed; this should work because bch_writeback_thread downs 
writeback_lock at the top of its loop.

Next, in writeback.c's read_dirty(), I jump to some additional error 
handling instead of BUG_ON.  Except for debug output, it handles the early 
exit in the same way as the out of memory path.

If you feel that it is safe to run this code, then I would be interested 
to know the result.  If it works, then I wonder which of the two patches 
solved the problem.  If the problem is persistent, then you should get the 
printk(KERN_WARNING) every time writeback runs.  OTOH, if it only printk's 
a few times, then it is an initial startup issue (but the locking should 
prevent that).

-Eric


====================================================================
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index a542b58..c0362a0 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1016,8 +1016,12 @@ int bch_cached_dev_attach(struct cached_dev *dc, struct cache_set *c)
 	 */
 	atomic_set(&dc->count, 1);
 
-	if (bch_cached_dev_writeback_start(dc))
+	/* Block writeback thread, but spawn it */
+	down_write(&dc->writeback_lock);
+	if (bch_cached_dev_writeback_start(dc)) {
+		up_write(&dc->writeback_lock);
 		return -ENOMEM;
+	}
 
 	if (BDEV_STATE(&dc->sb) == BDEV_STATE_DIRTY) {
 		bch_sectors_dirty_init(dc);
@@ -1029,6 +1033,9 @@ int bch_cached_dev_attach(struct cached_dev *dc, struct cache_set *c)
 	bch_cached_dev_run(dc);
 	bcache_device_link(&dc->disk, c, "bdev");
 
+	/* Allow the writeback thread to proceed */
+	up_write(&dc->writeback_lock);
+
 	pr_info("Caching %s as %s on set %pU",
 		bdevname(dc->bdev, buf), dc->disk.disk->disk_name,
 		dc->disk.c->sb.set_uuid);
diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index ca38362..0fe5de0 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -234,7 +234,8 @@ static void read_dirty(struct cached_dev *dc)
 		if (!w)
 			break;
 
-		BUG_ON(ptr_stale(dc->disk.c, &w->key, 0));
+		if (ptr_stale(dc->disk.c, &w->key, 0))
+			goto err_ptr_stale;
 
 		if (KEY_START(&w->key) != dc->last_read ||
 		    jiffies_to_msecs(delay) > 50)
@@ -273,7 +274,13 @@ static void read_dirty(struct cached_dev *dc)
 	if (0) {
 err_free:
 		kfree(w->private);
+
+err_ptr_stale:
+		printk(KERN_WARNING "ptr_stale(dc->disk.c, &w->key, 0) in read_dirty() with dc->disk.flags=%lx\n",
+			dc->disk.flags);
+		dump_stack();
 err:
+
 		bch_keybuf_del(&dc->writeback_keys, w);
 	}
 
====================================================================

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: BUG: drivers/md/bcache/writeback.c:237
  2016-02-26  2:38                 ` Eric Wheeler
@ 2016-02-26  2:46                   ` Marc MERLIN
  2016-02-26  3:19                     ` Marc MERLIN
  0 siblings, 1 reply; 28+ messages in thread
From: Marc MERLIN @ 2016-02-26  2:46 UTC (permalink / raw)
  To: Eric Wheeler; +Cc: Zhu Yanhai, Kent Overstreet, Christoph Nelles, linux-bcache

On Fri, Feb 26, 2016 at 02:38:11AM +0000, Eric Wheeler wrote:
> Disclaimer: if you are comfortable forcing the writeback thread to proceed 
> instead of BUG, then try this patch.  It may cause other problems, or it 
> may not work at all.  If the cache is attached and corrupt, then its 
> backing dev could become corrupt during writeback if bcache still thinks 
> they are associated.

Thanks for the patch.

So, I could lose the data on the target filesystem, but it's 2TB and if it
gets mangled in a subtle way, that wouldn't be so great.
Btrfs is happier with data that never gets written, since it's atomic, than
with garbage that shows up later.

Actually, I've now already mounted and used the backing device without the
cache, so it would be unsafe to bring up the cache and write it back, considering
the backing device is further ahead than the cache, correct?

Given that, I'm happy to run code that spits out diagnostics, but probably
not something that will attempt to write an outdated cache on a filesystem
that's already ahead.
Or does bcache understand that the backing device is ahead already and the
cache should just be discarded?

Thanks,
Marc

> Here is how it works:
> 
> First, in super.c, I hold the writeback lock until initialization is 
> complete on the referenced cached_dev *dc and release it when it should be 
> safe to proceed; this should work because bch_writeback_thread downs 
> writeback_lock at the top of its loop.
> 
> Next, in writeback.c's read_dirty(), I jump to some additional error 
> handling instead of BUG_ON.  Except for debug output, it handles the early 
> exit in the same way as the out of memory path.
> 
> If you feel that it is safe to run this code, then I would be interested 
> to know the result.  If it works, then I wonder which of the two patches 
> solved the problem.  If the problem is persistent, then you should get the 
> printk(KERN_WARNING) every time writeback runs.  OTOH, if it only printk's 
> a few times, then it is an initial startup issue (but the locking should 
> prevent that).
> 
> -Eric
> 
> 
> ====================================================================
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index a542b58..c0362a0 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -1016,8 +1016,12 @@ int bch_cached_dev_attach(struct cached_dev *dc, struct cache_set *c)
>  	 */
>  	atomic_set(&dc->count, 1);
>  
> -	if (bch_cached_dev_writeback_start(dc))
> +	/* Block writeback thread, but spawn it */
> +	down_write(&dc->writeback_lock);
> +	if (bch_cached_dev_writeback_start(dc)) {
> +		up_write(&dc->writeback_lock);
>  		return -ENOMEM;
> +	}
>  
>  	if (BDEV_STATE(&dc->sb) == BDEV_STATE_DIRTY) {
>  		bch_sectors_dirty_init(dc);
> @@ -1029,6 +1033,9 @@ int bch_cached_dev_attach(struct cached_dev *dc, struct cache_set *c)
>  	bch_cached_dev_run(dc);
>  	bcache_device_link(&dc->disk, c, "bdev");
>  
> +	/* Allow the writeback thread to proceed */
> +	up_write(&dc->writeback_lock);
> +
>  	pr_info("Caching %s as %s on set %pU",
>  		bdevname(dc->bdev, buf), dc->disk.disk->disk_name,
>  		dc->disk.c->sb.set_uuid);
> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
> index ca38362..0fe5de0 100644
> --- a/drivers/md/bcache/writeback.c
> +++ b/drivers/md/bcache/writeback.c
> @@ -234,7 +234,8 @@ static void read_dirty(struct cached_dev *dc)
>  		if (!w)
>  			break;
>  
> -		BUG_ON(ptr_stale(dc->disk.c, &w->key, 0));
> +		if (ptr_stale(dc->disk.c, &w->key, 0))
> +			goto err_ptr_stale;
>  
>  		if (KEY_START(&w->key) != dc->last_read ||
>  		    jiffies_to_msecs(delay) > 50)
> @@ -273,7 +274,13 @@ static void read_dirty(struct cached_dev *dc)
>  	if (0) {
>  err_free:
>  		kfree(w->private);
> +
> +err_ptr_stale:
> +		printk(KERN_WARNING "ptr_stale(dc->disk.c, &w->key, 0) in read_dirty() with dc->disk.flags=%lx\n",
> +			dc->disk.flags);
> +		dump_stack();
>  err:
> +
>  		bch_keybuf_del(&dc->writeback_keys, w);
>  	}
>  
> ====================================================================
> 
> 
> 

-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: BUG: drivers/md/bcache/writeback.c:237
  2016-02-26  2:46                   ` Marc MERLIN
@ 2016-02-26  3:19                     ` Marc MERLIN
  2016-02-26  4:55                       ` Eric Wheeler
  0 siblings, 1 reply; 28+ messages in thread
From: Marc MERLIN @ 2016-02-26  3:19 UTC (permalink / raw)
  To: Eric Wheeler; +Cc: Zhu Yanhai, Kent Overstreet, Christoph Nelles, linux-bcache

On Thu, Feb 25, 2016 at 06:46:33PM -0800, Marc MERLIN wrote:
> Given that, I'm happy to run code that spits out diagnostics, but probably
> not something that will attempt to write an outdated cache on a filesystem
> that's already ahead.
> Or does bcache understand that the backing device is ahead already and the
> cache should just be discarded?

Yeah, sorry, I did forget to mention that I did a force run on the backing
device so that I could get to its data.
In doing so, I probably invalidated the option of replaying the cache to
see if things would crash or not, sorry about that.

Now, if I try to make the cache available again, will the code be smart and
not try to reapply a stale cache (never mind that it's probably corrupted in
addition to being old?)

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: BUG: drivers/md/bcache/writeback.c:237
  2016-02-26  3:19                     ` Marc MERLIN
@ 2016-02-26  4:55                       ` Eric Wheeler
  2016-02-26 16:27                         ` Marc MERLIN
  0 siblings, 1 reply; 28+ messages in thread
From: Eric Wheeler @ 2016-02-26  4:55 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: Zhu Yanhai, Kent Overstreet, Christoph Nelles, linux-bcache

On Thu, 25 Feb 2016, Marc MERLIN wrote:
> On Thu, Feb 25, 2016 at 06:46:33PM -0800, Marc MERLIN wrote:
> > Given that, I'm happy to run code that spits out diagnostics, but probably
> > not something that will attempt to write an outdated cache on a filesystem
> > that's already ahead.
> > Or does bcache understand that the backing device is ahead already and the
> > cache should just be discarded?
> 
> Yeah, sorry, I did forget to mention that I did a force run on the 
> backing device so that I could get to its data. In doing so, I probably 
> invalidated the option of replaying the cache to see if things would 
> crash, or not, sorry about that.
> 
> Now, if I try to make the cache available again, will the code be smart and
> not try to reapply a stale cache (never mind that it's probably corrupted in
> addition to being old?)

According to Documentation/bcache.txt:
	"" If you're booting up and your cache device is gone and never
	coming back, you can force run the backing device:
	  echo 1 > /sys/block/sdb/bcache/running
	[...]
	The backing device will still use that cache set if it shows up
	in the future, but all the cached data will be invalidated.  ""

So it seems that you are safe.  (It would be interesting to know how it 
invalidates the cache.  Maybe bumps the Set UUID?  Not sure.)

This is what the patch does:

The code in super.c would do what bcache has always done in your kernel, 
except that it holds the writeback_lock to prevent a writeback kthread 
race before initialization is complete.  All that does is delay the 
writeback thread slightly.  If that fixes it, then it is a trivial patch 
from which everyone may benefit.

The code in writeback.c just jumps to the OOM case instead of bugging. If 
it does meet the printk(KERN_WARNING)+dump_stack() condition, then it 
might give us useful data to troubleshoot from.  I think this is quite 
safe since the OOM path just sleeps the kthread until the next writeback 
is scheduled and will try again.  If it dump_stack()'s every scheduled 
writeback interval, then something is seriously wrong somewhere else.
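
As a rough illustration of the writeback.c change, here is a small
userspace sketch of replacing an assertion with a jump into a shared error
path. The demo_* names and the struct fields are hypothetical; only the
layout of a label inside an `if (0)` block is taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a dirty key queued for writeback. */
struct demo_key {
	bool stale;	/* the condition the original code asserted against */
	bool dropped;	/* set when the key is removed from the queue */
};

/* Returns how many keys were issued before the pass ended. */
static size_t demo_read_dirty(struct demo_key *keys, size_t n)
{
	size_t i, issued = 0;

	for (i = 0; i < n; i++) {
		/* Previously a hard assertion; now bail out of this pass. */
		if (keys[i].stale)
			goto err_stale;

		issued++;
	}

	/* Only reachable through the goto above, as in the patch. */
	if (0) {
err_stale:
		fprintf(stderr, "stale key at index %zu, ending this pass\n", i);
		keys[i].dropped = true;
	}

	return issued;
}
```

The caller simply sees a shorter pass and can try again on the next
scheduled run, which mirrors the "sleep and retry" behaviour described
for the out of memory path.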

-Eric



> 
> Thanks,
> Marc
> -- 
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems ....
>                                       .... what McDonalds is to gourmet cooking
> Home page: http://marc.merlins.org/  
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: BUG: drivers/md/bcache/writeback.c:237
  2016-02-26  4:55                       ` Eric Wheeler
@ 2016-02-26 16:27                         ` Marc MERLIN
  2016-02-26 21:17                           ` Eric Wheeler
  0 siblings, 1 reply; 28+ messages in thread
From: Marc MERLIN @ 2016-02-26 16:27 UTC (permalink / raw)
  To: Eric Wheeler; +Cc: Zhu Yanhai, Kent Overstreet, Christoph Nelles, linux-bcache

On Fri, Feb 26, 2016 at 04:55:02AM +0000, Eric Wheeler wrote:
> According to Documentation/bcache.txt:
> 	"" If you're booting up and your cache device is gone and never
> 	coming back, you can force run the backing device:
> 	  echo 1 > /sys/block/sdb/bcache/running
> 	[...]
> 	The backing device will still use that cache set if it shows up
> 	in the future, but all the cached data will be invalidated.  ""
> 
> So it seems that you are safe.  (It would be interesting to know how it 
> invalidates the cache.  Maybe bumps the Set UUID?  Not sure.)
 
Yeah, that was my understanding too, but I wanted to make sure.
Strangely (worryingly so?) the cache was replayed at boot, and this time
nothing crashed, and there was no traceback.

Now I'm wondering if it pushed garbage onto my filesystem :-/

Again, no netconsole, sorry, this happens before my ethernet interface
comes up.
https://goo.gl/photos/suqp9sHyijdt9iUG7

sda6 was the partition I hid and just came back.
sdb1 is the bcache linked to it.

On the plus side, no crash, although this didn't get to exercise your
new code either.

Either way, I'm really starting to have mixed feelings about using
writeback if it's going to give me random crashes and subsequent
corruption (which is a risk listed in the doc, admittedly).

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901


* Re: BUG: drivers/md/bcache/writeback.c:237
  2016-02-26 16:27                         ` Marc MERLIN
@ 2016-02-26 21:17                           ` Eric Wheeler
  2016-03-03  4:17                             ` Eric Wheeler
  0 siblings, 1 reply; 28+ messages in thread
From: Eric Wheeler @ 2016-02-26 21:17 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: Zhu Yanhai, Kent Overstreet, Christoph Nelles, linux-bcache

On Fri, 26 Feb 2016, Marc MERLIN wrote:

> On Fri, Feb 26, 2016 at 04:55:02AM +0000, Eric Wheeler wrote:
> > According to Documentation/bcache.txt:
> > 	"" If you're booting up and your cache device is gone and never
> > 	coming back, you can force run the backing device:
> > 	  echo 1 > /sys/block/sdb/bcache/running
> > 	[...]
> > 	The backing device will still use that cache set if it shows up
> > 	in the future, but all the cached data will be invalidated.  ""
> > 
> > So it seems that you are safe.  (It would be interesting to know how it 
> > invalidates the cache.  Maybe bumps the Set UUID?  Not sure.)
>  
> Yeah, that was my understanding too, but I wanted to make sure.
> Strangely (worryingly so?) the cache was replayed at boot, and this time
> nothing crashed, and there was no traceback.

No crash is a good thing!  I think the lock solved it then.  If the lock 
wasn't the problem, then you would get tracebacks---and possibly lots of 
them.

> Now I'm wondering if it pushed garbage onto my filesystem :-/

Read "THE JOURNAL" here:
  https://evilpiepirate.org/git/linux-bcache.git/tree/drivers/md/bcache/bcache.h

 "Bcache's journal is not necessary for consistency [...] Rather, the 
 journal is purely a performance optimization; we can't complete a write 
 until we've updated the index on disk, otherwise the cache would be 
 inconsistent in the event of an unclean shutdown."

I'm not convinced that journal replay will writeback, especially because 
of the documentation stating that forcing a bdev into a running state 
invalidates its cache.  I think it just keeps the data structures in good 
shape on the cachedev, even though the cachedev was invalidated by forcing 
a 'running' state.

See super.c:bch_cached_dev_run(), which was called when you ran 
`echo 1 > running`.  It looks like it sets BDEV_STATE_STALE on the bdev 
superblock.  

Is this the flag that invalidates the cache?

Zhu, Kent, can you confirm this?
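
One way to check this from userspace is the backing device's sysfs "state"
attribute. The sketch below assumes the four values listed in
Documentation/bcache.txt (no cache, clean, dirty, inconsistent) and assumes
that "inconsistent" is how a superblock marked BDEV_STATE_STALE shows up;
that mapping is my reading and has not been confirmed in this thread. The
function name is made up, and /sys/block/sdb/bcache is just the example path
from the documentation quote.

```shell
#!/bin/sh
# Sketch: report the state of a bcache backing device from sysfs.
# $1 is the device's bcache directory, e.g. /sys/block/sdb/bcache
# (sdb is the documentation's example device, not a real one here).
bcache_backing_state() {
    dir=$1
    if [ ! -r "$dir/state" ]; then
        echo "no readable state file under $dir" >&2
        return 1
    fi
    state=$(cat "$dir/state")
    # Notes paraphrase Documentation/bcache.txt; treat them as a guide only.
    case $state in
        "no cache")   note="never attached to a cache set" ;;
        clean)        note="part of a cache set, no cached dirty data" ;;
        dirty)        note="part of a cache set, cached dirty data present" ;;
        inconsistent) note="force-run while dirty data was cached and the cache set was unavailable" ;;
        *)            note="unrecognized state" ;;
    esac
    printf '%s: %s\n' "$state" "$note"
}
```

Running bcache_backing_state /sys/block/sdb/bcache before and after
reattaching the cache should show whether the backing device was flagged as
force-run.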

> Again, no netconsole, sorry, this happens before my ethernet interface
> comes up.
> https://goo.gl/photos/suqp9sHyijdt9iUG7
> 
> sda6 was the partition I hid and just came back.
> sdb1 is the bcache linked to it.
> 
> On the plus side, no crash, although this didn't get to exercise your
> new code either.

Actually, I'm glad it didn't execute my tracing code.  The BUG_ON can 
stay, it just wasn't initialized at the time the writeback kthread was 
started.
 
> Either way, I'm really starting to have mixed feelings about using
> writeback if it's going to give me random crashes and subsequent
> corruption (which is a risk listed in the doc, admittedly).

Of course writeback comes with increased risk, but read this from 
bcache.h:

   "[...] we always strictly order metadata writes so that the btree and 
   everything else is consistent on disk in the event of an unclean 
   shutdown [...] and in fact bcache had writeback caching (with recovery 
   from unclean shutdown) before journalling was implemented."

So except for unexpected races like this one, bcache should recover 
gracefully from an unexpected outage.  I think the greater risk of 
writeback cache failure has to do with device wearout and bitflips---so 
watch your TBW values on the caches.
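
A rough way to keep an eye on TBW is to parse the output of `smartctl -A`.
The sketch below makes several assumptions that may not hold for your drive:
that the attribute is named Total_LBAs_Written, that its raw value is the
tenth column, and that one LBA is 512 bytes. Some vendors count in other
units, and NVMe devices report written data differently, so check your
drive's documentation. The function name is made up.

```shell
#!/bin/sh
# Sketch: estimate terabytes written from `smartctl -A` text on stdin.
# Assumptions: attribute name Total_LBAs_Written, raw value in column 10,
# and 512-byte LBAs unless another size is passed as $1.
tbw_from_smart() {
    awk -v lba_bytes="${1:-512}" '
        $2 == "Total_LBAs_Written" {
            # bytes written divided by 10^12 gives decimal terabytes
            printf "%.2f\n", $10 * lba_bytes / 1e12
            found = 1
        }
        END { exit !found }   # non-zero exit if the attribute was absent
    '
}
```

Typical use would be `smartctl -A /dev/sdX | tbw_from_smart`, comparing the
result against the endurance rating the manufacturer publishes for the cache
device.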

-Eric



* Re: BUG: drivers/md/bcache/writeback.c:237
  2016-02-26 21:17                           ` Eric Wheeler
@ 2016-03-03  4:17                             ` Eric Wheeler
  2016-03-03  4:25                               ` Marc MERLIN
  0 siblings, 1 reply; 28+ messages in thread
From: Eric Wheeler @ 2016-03-03  4:17 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: Zhu Yanhai, Kent Overstreet, Christoph Nelles, linux-bcache

On Fri, 26 Feb 2016, Eric Wheeler wrote:
> On Fri, 26 Feb 2016, Marc MERLIN wrote:
> 
> > On Fri, Feb 26, 2016 at 04:55:02AM +0000, Eric Wheeler wrote:
> > > According to Documentation/bcache.txt:
> > > 	"" If you're booting up and your cache device is gone and never
> > > 	coming back, you can force run the backing device:
> > > 	  echo 1 > /sys/block/sdb/bcache/running
> > > 	[...]
> > > 	The backing device will still use that cache set if it shows up
> > > 	in the future, but all the cached data will be invalidated.  ""
> > > 
> > > So it seems that you are safe.  (It would be interesting to know how it 
> > > invalidates the cache.  Maybe bumps the Set UUID?  Not sure.)
> >  
> > Yeah, that was my understanding too, but I wanted to make sure.
> > Strangely (worryingly so?) the cache was replayed at boot, and this time
> > nothing crashed, and there was no traceback.
> > Now I'm wondering if it pushed garbage onto my filesystem :-/
> 
> I'm not convinced that journal replay will writeback, especially because 
> of the documentation stating that forcing a bdev into a running state 
> invalidates its cache.  I think it just keeps the data structures in good 
> shape on the cachedev, even though the cachedev was invalidated by forcing 
> a 'running' state.
 
Hi Marc,

Thank you for your help investigating.  The two patches resulting from our 
testing are on their way into 4.5.

How has it been running since?  Any new backtraces to investigate?

-Eric



* Re: BUG: drivers/md/bcache/writeback.c:237
  2016-03-03  4:17                             ` Eric Wheeler
@ 2016-03-03  4:25                               ` Marc MERLIN
  0 siblings, 0 replies; 28+ messages in thread
From: Marc MERLIN @ 2016-03-03  4:25 UTC (permalink / raw)
  To: Eric Wheeler; +Cc: Zhu Yanhai, Kent Overstreet, Christoph Nelles, linux-bcache

On Thu, Mar 03, 2016 at 04:17:24AM +0000, Eric Wheeler wrote:
> On Fri, 26 Feb 2016, Eric Wheeler wrote:
> > On Fri, 26 Feb 2016, Marc MERLIN wrote:
> > 
> > > On Fri, Feb 26, 2016 at 04:55:02AM +0000, Eric Wheeler wrote:
> > > > According to Documentation/bcache.txt:
> > > > 	"" If you're booting up and your cache device is gone and never
> > > > 	coming back, you can force run the backing device:
> > > > 	  echo 1 > /sys/block/sdb/bcache/running
> > > > 	[...]
> > > > 	The backing device will still use that cache set if it shows up
> > > > 	in the future, but all the cached data will be invalidated.  ""
> > > > 
> > > > So it seems that you are safe.  (It would be interesting to know how it 
> > > > invalidates the cache.  Maybe bumps the Set UUID?  Not sure.)
> > >  
> > > Yeah, that was my understanding too, but I wanted to make sure.
> > > Strangely (worryingly so?) the cache was replayed at boot, and this time
> > > nothing crashed, and there was no traceback.
> > > Now I'm wondering if it pushed garbage onto my filesystem :-/
> > 
> > I'm not convinced that journal replay will writeback, especially because 
> > of the documentation stating that forcing a bdev into a running state 
> > invalidates its cache.  I think it just keeps the data structures in good 
> > shape on the cachedev, even though the cachedev was invalidated by forcing 
> > a 'running' state.
>  
> Hi Marc,
> 
> Thank you for your help investigating.  The two patches resulting from our 
> testing are on their way into 4.5.
 
I saw that, thank you.

> How has it been running since?  Any new backtraces to investigate?

No crashes since then, although my last one started when I was shutting
the laptop down, and I don't shut down very often (actually virtually
never unless I upgrade kernels :) ).

But that's another way to say so far so good :)

Thanks for your work looking into this.
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901


Thread overview: 28+ messages
2016-02-15  6:04 echo dev > /sys/fs/bcache/register gives page allocation failure: order:4, mode:0x2040d0 Marc MERLIN
2016-02-15 12:02 ` Johannes Thumshirn
2016-02-15 15:32   ` Marc MERLIN
2016-02-15 15:45     ` Christoph Nelles
2016-02-23 16:32       ` Marc MERLIN
2016-02-23 20:57         ` Marc MERLIN
2016-02-24 20:45       ` BUG: drivers/md/bcache/writeback.c:237 Marc MERLIN
2016-02-25  0:58         ` Eric Wheeler
2016-02-25  6:41           ` Eric Wheeler
2016-02-25  7:36             ` Eric Wheeler
2016-02-25 10:08               ` Zhu Yanhai
2016-02-26  2:38                 ` Eric Wheeler
2016-02-26  2:46                   ` Marc MERLIN
2016-02-26  3:19                     ` Marc MERLIN
2016-02-26  4:55                       ` Eric Wheeler
2016-02-26 16:27                         ` Marc MERLIN
2016-02-26 21:17                           ` Eric Wheeler
2016-03-03  4:17                             ` Eric Wheeler
2016-03-03  4:25                               ` Marc MERLIN
2016-02-25 10:18         ` Zhu Yanhai
2016-02-25 15:20           ` Marc MERLIN
2016-02-25 23:44             ` Eric Wheeler
2016-02-26  0:17               ` Marc MERLIN
2016-02-15 12:11 ` echo dev > /sys/fs/bcache/register gives page allocation failure: order:4, mode:0x2040d0 Kent Overstreet
2016-02-24  6:53 ` Eric Wheeler
2016-02-24 16:37   ` Disabling bcache from boot when it crashes? Marc MERLIN
2016-02-24 19:10     ` Eric Wheeler
2016-02-25  5:48       ` Marc MERLIN
