> On Oct 7, 2019, at 4:07 AM, Michal Hocko wrote:
>
> I do not think that removing the printk is the right long term solution.
> While I do agree that removing the debugging printk __offline_isolated_pages
> does make sense because it is essentially of a very limited use, this
> doesn't really solve the underlying problem. There are likely other
> printks from zone->lock. It would be much more saner to actually
> disallow consoles to allocate any memory while printk is called from an
> atomic context.

No, there are only a handful of places that call printk() with zone->lock
held. Normally, callers quietly modify "struct zone" in a short section
with zone->lock held.

No, it is not about "allocate any memory while printk is called from an
atomic context". It is about opposite lock chains on different processors,
which have the same effect. For example,

CPU0:            CPU1:          CPU2:
console_owner
                 sclp_lock
sclp_lock        zone_lock
                                zone_lock
                                console_owner

This is a deadlock.

>
>> The problem is probably there forever, but neither many developers will
>> run memory offline with the lockdep enabled nor admins in the field are
>> lucky enough yet to hit a perfect timing which required to trigger a
>> real deadlock. In addition, there aren't many places that call printk()
>> while zone->lock was held.
>>
>> WARNING: possible circular locking dependency detected
>> ------------------------------------------------------
>> test.sh/1724 is trying to acquire lock:
>> 0000000052059ec0 (console_owner){-...}, at: console_unlock+0x
>> 01: 328/0xa30
>>
>> but task is already holding lock:
>> 000000006ffd89c8 (&(&zone->lock)->rlock){-.-.}, at: start_iso
>> 01: late_page_range+0x216/0x538
>
> I am also wondering what does this lockdep report actually say. How come
> we have a dependency between a start_kernel path and a syscall?

Petr explained it correctly.
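
To make the three-CPU lock chain above concrete, here is a minimal sketch that models each CPU's acquisition order as "held -> wanted" edges and looks for a cycle. It is a toy model only, not how lockdep is implemented; the helper names build_lock_graph and find_cycle are hypothetical, and the chains are simply the ones from the example.

```python
from collections import defaultdict


def build_lock_graph(chains):
    """Turn per-CPU acquisition orders into 'held -> wanted' edges."""
    graph = defaultdict(set)
    for order in chains.values():
        for held, wanted in zip(order, order[1:]):
            graph[held].add(wanted)
    return graph


def find_cycle(graph):
    """Return one lock cycle (first lock repeated at the end), or None."""
    WHITE, GREY, BLACK = 0, 1, 2
    state = defaultdict(int)  # every lock starts WHITE (unvisited)
    path = []

    def visit(lock):
        state[lock] = GREY
        path.append(lock)
        for nxt in sorted(graph.get(lock, ())):
            if state[nxt] == GREY:
                # Back edge to a lock already on the current path: a cycle.
                return path[path.index(nxt):] + [nxt]
            if state[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        state[lock] = BLACK
        return None

    for lock in sorted(graph):
        if state[lock] == WHITE:
            found = visit(lock)
            if found:
                return found
    return None


# Acquisition order per CPU, taken from the example (illustrative model).
chains = {
    "CPU0": ["console_owner", "sclp_lock"],
    "CPU1": ["sclp_lock", "zone_lock"],
    "CPU2": ["zone_lock", "console_owner"],
}

cycle = find_cycle(build_lock_graph(chains))
print(" -> ".join(cycle))
# console_owner -> sclp_lock -> zone_lock -> console_owner
```

Dropping any one of the three chains (for instance, not calling printk() with zone->lock held, which removes the CPU2 edge) breaks the cycle in this model, and find_cycle returns None.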