On Tue, Dec 22 2015, Stanislav Samsonov wrote:

> Hi,
>
> Kernel 4.1.3 : there is some troubling kernel message that shows up
> after enabling CONFIG_DEBUG_ATOMIC_SLEEP and testing DMA XOR
> acceleration for raid5:
>
> BUG: sleeping function called from invalid context at mm/mempool.c:320
> in_atomic(): 1, irqs_disabled(): 0, pid: 1048, name: md127_raid5
> INFO: lockdep is turned off.
> CPU: 1 PID: 1048 Comm: md127_raid5 Not tainted 4.1.15.alpine.1-dirty #1
> Hardware name: Annapurna Labs Alpine
> [] (unwind_backtrace) from [] (show_stack+0x10/0x14)
> [] (show_stack) from [] (dump_stack+0x80/0xb4)
> [] (dump_stack) from [] (mempool_alloc+0x68/0x13c)
> [] (mempool_alloc) from [] (dmaengine_get_unmap_data+0x24/0x4c)
> [] (dmaengine_get_unmap_data) from [] (async_xor_val+0x60/0x3a0)
> [] (async_xor_val) from [] (raid_run_ops+0xb70/0x1248)
> [] (raid_run_ops) from [] (handle_stripe+0x1068/0x22a8)
> [] (handle_stripe) from [] (handle_active_stripes+0x2d0/0x3dc)
> [] (handle_active_stripes) from [] (raid5d+0x384/0x5b0)
> [] (raid5d) from [] (md_thread+0x114/0x138)
> [] (md_thread) from [] (kthread+0xe4/0x104)
> [] (kthread) from [] (ret_from_fork+0x14/0x3c)
>
> The reason is that async_xor_val() in crypto/async_tx/async_xor.c is
> called in atomic context (preemption disabled) by raid_run_ops(). Then
> it calls dmaengine_get_unmap_data() and then mempool_alloc() with
> GFP_NOIO flag - this allocation type might sleep under some condition.
>
> Checked latest kernel 4.3 and it has exactly same flow.
>
> Any advice regarding this issue?

Changing the GFP_NOIO to GFP_ATOMIC in all the calls to
dmaengine_get_unmap_data() in crypto/async_tx/ would probably fix the
issue... or make it crash even worse :-)

Dan: do you have any wisdom here? The xor is using the percpu data in
raid5, so it cannot sleep, but GFP_NOIO allows sleeping. Does the code
handle a failure of dmaengine_get_unmap_data() safely? It looks like it
probably does.

NeilBrown
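
To make the trade-off discussed above concrete, here is a small userspace
toy model, not kernel code. Every toy_* name is hypothetical and only
stands in for the kernel concept named in its comment. It assumes, as
suggested above but not confirmed, that a failed unmap-data allocation
makes the xor fall back to a synchronous path rather than crash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for GFP_NOIO (may sleep) and GFP_ATOMIC (never sleeps). */
enum toy_gfp { TOY_GFP_NOIO, TOY_GFP_ATOMIC };

/* Which path the toy xor ended up taking. */
enum toy_path { TOY_PATH_DMA, TOY_PATH_SYNC };

static int toy_preempt_count;   /* > 0 models in_atomic() == 1 */
static int toy_sleep_warnings;  /* counts "sleeping function called ..." reports */

static void toy_preempt_disable(void) { toy_preempt_count++; }

static void toy_preempt_enable(void)
{
    toy_preempt_count--;
    assert(toy_preempt_count >= 0);  /* unbalanced enable would be a bug */
}

static bool toy_may_sleep(enum toy_gfp gfp) { return gfp == TOY_GFP_NOIO; }

/*
 * Models the allocation step: a request that is allowed to sleep, issued
 * while preemption is disabled, trips the debug check. An atomic request
 * cannot wait for the pool to refill, so it fails when the pool is empty.
 * A sleeping request would wait instead; that is modelled here as success.
 */
static void *toy_pool_alloc(enum toy_gfp gfp, bool pool_empty)
{
    if (toy_may_sleep(gfp) && toy_preempt_count > 0)
        toy_sleep_warnings++;
    if (pool_empty && !toy_may_sleep(gfp))
        return NULL;
    return malloc(64);
}

/*
 * Models the xor being called with per-CPU data in use (preemption off).
 * If no unmap data is available, fall back to the synchronous path
 * (an assumption about the real code, see the lead-in).
 */
enum toy_path toy_xor_val(enum toy_gfp gfp, bool pool_empty)
{
    enum toy_path path = TOY_PATH_SYNC;
    void *unmap;

    toy_preempt_disable();
    unmap = toy_pool_alloc(gfp, pool_empty);
    if (unmap) {
        path = TOY_PATH_DMA;
        free(unmap);
    }
    toy_preempt_enable();
    return path;
}
```

In this model, toy_xor_val(TOY_GFP_NOIO, false) takes the DMA path but
bumps toy_sleep_warnings, which mirrors the BUG report. Calling
toy_xor_val(TOY_GFP_ATOMIC, true) raises no warning and takes the
synchronous path, which is only safe if the real caller handles a NULL
return the same way.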