Linux-mm Archive on lore.kernel.org
 help / color / Atom feed
From: Qian Cai <cai@lca.pw>
To: Anshuman Khandual <anshuman.khandual@arm.com>,
	Michal Hocko <mhocko@kernel.org>
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	Oscar Salvador <osalvador@suse.de>,
	Mel Gorman <mgorman@techsingularity.net>,
	Mike Rapoport <rppt@linux.ibm.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Pavel Tatashin <pavel.tatashin@microsoft.com>,
	 linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm/page_alloc: Add a reason for reserved pages in has_unmovable_pages()
Date: Fri, 04 Oct 2019 08:56:16 -0400
Message-ID: <1570193776.5576.270.camel@lca.pw> (raw)
In-Reply-To: <91128b73-9a47-100b-d3de-e83f0b941e9f@arm.com>

On Fri, 2019-10-04 at 17:14 +0530, Anshuman Khandual wrote:
> 
> On 10/04/2019 04:28 PM, Michal Hocko wrote:
> > On Thu 03-10-19 13:40:57, Anshuman Khandual wrote:
> > > Having unmovable pages on a given pageblock should be reported correctly
> > > when required with REPORT_FAILURE flag. But there can be a scenario where a
> > > reserved page in the page block will get reported as a generic "unmovable"
> > > reason code. Instead this should be changed to a more appropriate reason
> > > code like "Reserved page".
> > 
> > Others have already pointed out this is just redundant but I will have a
> 
> Sure.
> 
> > more generic comment on the changelog. There is essentially no
> > information why the current state is bad/unhelpful and why the chnage is
> 
> The current state is not necessarily bad or unhelpful. I just though that it
> could be improved upon. Some how calling out explicitly only the CMA page
> failure case just felt adhoc, where as there are other reasons like HugeTLB
> immovability which might depend on other factors apart from just page flags
> (though I did not propose that originally).
> 
> > needed. All you claim is that something is a certain way and then assert
> > that it should be done differently. That is not how changelogs should
> > look like.
> > 
> 
> Okay, probably I should have explained more on why "unmovable" is less than
> adequate to capture the exact reason for specific failure cases and how
> "Reserved Page" instead would been better. But got the point, will improve.
> 

It might be a good time to rethink if it is really a good idea to dump_page()
at all inside has_unmovable_pages(). As it is right now, it is a a potential
deadlock between console vs memory offline. More details are in this thread,

https://lore.kernel.org/lkml/1568817579.5576.172.camel@lca.pw/

01: [  672.875392] WARNING: possible circular locking dependency detected       
01: [  672.875394] 5.4.0-rc1-next-20191004+ #64 Not tainted                     
01: [  672.875396] ------------------------------------------------------       
01: [  672.875398] test.sh/1724 is trying to acquire lock:                      
01: [  672.875400] 0000000052059ec0 (console_owner){-...}, at: console_unlock+0x
01: 328/0xa30                                                                   
01: [  672.875406]                                                              
01: [  672.875408] but task is already holding lock:                            
01: [  672.875409] 000000006ffd89c8 (&(&zone->lock)->rlock){-.-.}, at: start_iso
01: late_page_range+0x216/0x538                                                 
01: [  672.875415]                                                              
01: [  672.875417] which lock already depends on the new lock.                  
01: [  672.875418]                                                              
01: [  672.875419]                                                              
01: [  672.875421] the existing dependency chain (in reverse order) is:         
01: [  672.875423]                                                              
01: [  672.875424] -> #2 (&(&zone->lock)->rlock){-.-.}:                         
01: [  672.875430]        lock_acquire+0x21a/0x468                              
01: [  672.875431]        _raw_spin_lock+0x54/0x68                              
01: [  672.875433]        get_page_from_freelist+0x8b6/0x2d28                   
01: [  672.875435]        __alloc_pages_nodemask+0x246/0x658                    
01: [  672.875437]        __get_free_pages+0x34/0x78                            
01: [  672.875438]        sclp_init+0x106/0x690                                 
01: [  672.875440]        sclp_register+0x2e/0x248                              
01: [  672.875442]        sclp_rw_init+0x4a/0x70                                
01: [  672.875443]        sclp_console_init+0x4a/0x1b8                          
01: [  672.875445]        console_init+0x2c8/0x410                              
01: [  672.875447]        start_kernel+0x530/0x6a0                              
01: [  672.875448]        startup_continue+0x70/0xd0                            
01: [  672.875449]                                                              
01: [  672.875450] -> #1 (sclp_lock){-.-.}:                                     
01: [  672.875458]        lock_acquire+0x21a/0x468                              
01: [  672.875460]        _raw_spin_lock_irqsave+0xcc/0xe8                      
01: [  672.875462]        sclp_add_request+0x34/0x308                           
01: [  672.875464]        sclp_conbuf_emit+0x100/0x138                          
01: [  672.875465]        sclp_console_write+0x96/0x3b8                         
01: [  672.875467]        console_unlock+0x6dc/0xa30                            
01: [  672.875469]        vprintk_emit+0x184/0x3c8                              
01: [  672.875470]        vprintk_default+0x44/0x50                             
01: [  672.875472]        printk+0xa8/0xc0                                      
01: [  672.875473]        iommu_debugfs_setup+0xf2/0x108                        
01: [  672.875475]        iommu_init+0x6c/0x78                                  
01: [  672.875477]        do_one_initcall+0x162/0x680                           
01: [  672.875478]        kernel_init_freeable+0x4e8/0x5a8                      
01: [  672.875480]        kernel_init+0x2a/0x188                                
01: [  672.875484]        ret_from_fork+0x30/0x34                               
01: [  672.875486]        kernel_thread_starter+0x0/0xc                         
01: [  672.875487]                                                              
01: [  672.875488] -> #0 (console_owner){-...}:                                 
01: [  672.875495]        check_noncircular+0x338/0x3e0                         
01: [  672.875496]        __lock_acquire+0x1e66/0x2d88                          
01: [  672.875498]        lock_acquire+0x21a/0x468                              
01: [  672.875499]        console_unlock+0x3a6/0xa30                            
01: [  672.875501]        vprintk_emit+0x184/0x3c8                              
01: [  672.875503]        vprintk_default+0x44/0x50                             
01: [  672.875504]        printk+0xa8/0xc0                                      
01: [  672.875506]        __dump_page+0x1dc/0x710                               
01: [  672.875507]        dump_page+0x2e/0x58                                   
01: [  672.875509]        has_unmovable_pages+0x2e8/0x470                       
01: [  672.875511]        start_isolate_page_range+0x404/0x538                  
01: [  672.875513]        __offline_pages+0x22c/0x1338                          
01: [  672.875514]        memory_subsys_offline+0xa6/0xe8                       
01: [  672.875516]        device_offline+0xe6/0x118                             
01: [  672.875517]        state_store+0xf0/0x110                                
01: [  672.875519]        kernfs_fop_write+0x1bc/0x270                          
01: [  672.875521]        vfs_write+0xce/0x220                                  
01: [  672.875522]        ksys_write+0xea/0x190                                 
01: [  672.875524]        system_call+0xd8/0x2b4                                
01: [  672.875525]                                                              
01: [  672.875527] other info that might help us debug this:                    
01: [  672.875528]                                                              
01: [  672.875529] Chain exists of:                                             
01: [  672.875530]   console_owner --> sclp_lock --> &(&zone->lock)->rlock      
01: [  672.875538]                                                              
01: [  672.875540]  Possible unsafe locking scenario:                           
01: [  672.875541]                                                              
01: [  672.875543]        CPU0                    CPU1                          
01: [  672.875544]        ----                    ----                          
01: [  672.875545]   lock(&(&zone->lock)->rlock);                               
01: [  672.875549]                                lock(sclp_lock);              
01: [  672.875553]                                lock(&(&zone->lock)->rlock);  
01: [  672.875557]   lock(console_owner);                                       
01: [  672.875560]                                                              
01: [  672.875562]  *** DEADLOCK ***                                            
01: [  672.875563]                                                              
01: [  672.875564] 9 locks held by test.sh/1724:                                
01: [  672.875565]  #0: 000000000e925408 (sb_writers#4){.+.+}, at: vfs_write+0x2
01: 06/0x220                                                                    
01: [  672.875574]  #1: 0000000050aa4280 (&of->mutex){+.+.}, at: kernfs_fop_writ
01: e+0x154/0x270                                                               
01: [  672.875581]  #2: 0000000062e5c628 (kn->count#198){.+.+}, at: kernfs_fop_w
01: rite+0x16a/0x270                                                            
01: [  672.875590]  #3: 00000000523236a0 (device_hotplug_lock){+.+.}, at: lock_d
01: evice_hotplug_sysfs+0x30/0x80                                               
01: [  672.875598]  #4: 0000000062e70990 (&dev->mutex){....}, at: device_offline
01: +0x78/0x118                                                                 
01: [  672.875605]  #5: 0000000051fd36b0 (cpu_hotplug_lock.rw_sem){++++}, at: __
01: offline_pages+0xec/0x1338                                                   
01: [  672.875613]  #6: 00000000521ca470 (mem_hotplug_lock.rw_sem){++++}, at: pe
01: rcpu_down_write+0x38/0x210                                                  
01: [  672.875620]  #7: 000000006ffd89c8 (&(&zone->lock)->rlock){-.-.}, at: star
01: t_isolate_page_range+0x216/0x538                                            
01: [  672.875628]  #8: 000000005205a100 (console_lock){+.+.}, at: vprintk_emit+
01: 0x178/0x3c8                                                                 
01: [  672.875635]                                                              
01: [  672.875636] stack backtrace:                                             
01: [  672.875639] CPU: 1 PID: 1724 Comm: test.sh Not tainted 5.4.0-rc1-next-201
01: 91004+ #64                                                                  
01: [  672.875640] Hardware name: IBM 2964 N96 400 (z/VM 6.4.0)                 
01: [  672.875642] Call Trace:                                                  
01: [  672.875644] ([<00000000512ae218>] show_stack+0x110/0x1b0)                
01: [  672.875645]  [<0000000051b6d506>] dump_stack+0x126/0x178                 
01: [  672.875648]  [<00000000513a4b08>] check_noncircular+0x338/0x3e0          
01: [  672.875650]  [<00000000513aaaf6>] __lock_acquire+0x1e66/0x2d88           
01: [  672.875652]  [<00000000513a7e12>] lock_acquire+0x21a/0x468               
01: [  672.875654]  [<00000000513bb2fe>] console_unlock+0x3a6/0xa30             
01: [  672.875655]  [<00000000513bde2c>] vprintk_emit+0x184/0x3c8               
01: [  672.875657]  [<00000000513be0b4>] vprintk_default+0x44/0x50              
01: [  672.875659]  [<00000000513beb60>] printk+0xa8/0xc0                       
01: [  672.875661]  [<000000005158c364>] __dump_page+0x1dc/0x710                
01: [  672.875663]  [<000000005158c8c6>] dump_page+0x2e/0x58                    
01: [  672.875665]  [<00000000515d87c8>] has_unmovable_pages+0x2e8/0x470        
01: [  672.875667]  [<000000005167072c>] start_isolate_page_range+0x404/0x538   
01: [  672.875669]  [<0000000051b96de4>] __offline_pages+0x22c/0x1338           
01: [  672.875671]  [<0000000051908586>] memory_subsys_offline+0xa6/0xe8        
01: [  672.875673]  [<00000000518e561e>] device_offline+0xe6/0x118              
01: [  672.875675]  [<0000000051908170>] state_store+0xf0/0x110                 
01: [  672.875677]  [<0000000051796384>] kernfs_fop_write+0x1bc/0x270           
01: [  672.875679]  [<000000005168972e>] vfs_write+0xce/0x220                   
01: [  672.875681]  [<0000000051689b9a>] ksys_write+0xea/0x190                  
01: [  672.875685]  [<0000000051ba9990>] system_call+0xd8/0x2b4                 
01: [  672.875687] INFO: lockdep is turned off.


  reply index

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-03  8:10 Anshuman Khandual
2019-10-03  9:05 ` Qian Cai
2019-10-03  9:32   ` Anshuman Khandual
2019-10-03  9:53     ` Anshuman Khandual
2019-10-03 11:19     ` Qian Cai
2019-10-03 11:32       ` Anshuman Khandual
2019-10-03 11:50         ` Qian Cai
2019-10-03 12:02           ` Anshuman Khandual
2019-10-03 12:14             ` Qian Cai
2019-10-04  8:25               ` David Hildenbrand
2020-01-14  8:19                 ` Anshuman Khandual
2020-01-14  8:30                   ` David Hildenbrand
2020-01-14  9:10                   ` Michal Hocko
2020-01-14 10:23                     ` Vlastimil Babka
2020-01-14 11:03                       ` Anshuman Khandual
2020-01-14 11:32                       ` Michal Hocko
2020-01-14 12:04                         ` [PATCH] mm, debug: always print flags in dump_page() Vlastimil Babka
2020-01-14 13:35                           ` Michal Hocko
2020-01-14 18:22                         ` [PATCH] mm/page_alloc: Add a reason for reserved pages in has_unmovable_pages() Ralph Campbell
2019-10-04 10:58 ` Michal Hocko
2019-10-04 11:44   ` Anshuman Khandual
2019-10-04 12:56     ` Qian Cai [this message]
2019-10-04 13:07       ` Michal Hocko
2019-10-04 13:30         ` Qian Cai
2019-10-04 13:38           ` Michal Hocko
2019-10-04 13:56             ` Qian Cai
2019-10-04 14:41               ` Michal Hocko
2019-10-05 21:22                 ` Andrew Morton
2019-10-05 22:38                   ` Qian Cai

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1570193776.5576.270.camel@lca.pw \
    --to=cai@lca.pw \
    --cc=akpm@linux-foundation.org \
    --cc=anshuman.khandual@arm.com \
    --cc=dan.j.williams@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@kernel.org \
    --cc=osalvador@suse.de \
    --cc=pavel.tatashin@microsoft.com \
    --cc=rppt@linux.ibm.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-mm Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-mm/0 linux-mm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-mm linux-mm/ https://lore.kernel.org/linux-mm \
		linux-mm@kvack.org
	public-inbox-index linux-mm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kvack.linux-mm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git