From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754181AbdKFQRc (ORCPT ); Mon, 6 Nov 2017 11:17:32 -0500
Received: from out0-250.mail.aliyun.com ([140.205.0.250]:41818 "EHLO
	out0-250.mail.aliyun.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1753207AbdKFQRX (ORCPT ); Mon, 6 Nov 2017 11:17:23 -0500
X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R871e4;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e02c03280;MF=yang.s@alibaba-inc.com;NM=1;PH=DS;RN=8;SR=0;TI=SMTPD_---.9L8Km2k_1509985024;
Subject: Re: [PATCH] mm: do not rely on preempt_count in print_vma_addr
To: Michal Hocko , Peter Zijlstra
Cc: Bart Van Assche , "akpm@linux-foundation.org" ,
	"joe@perches.com" , "linux-kernel@vger.kernel.org" ,
	"linux-mm@kvack.org" , "mingo@redhat.com"
References: <1509572313-102989-1-git-send-email-yang.s@alibaba-inc.com>
	<20171102075744.whhxjmqbdkfaxghd@dhcp22.suse.cz>
	<20171103110245.7049460a05cc18c7e8a9feb2@linux-foundation.org>
	<1509739786.2473.33.camel@wdc.com>
	<20171105081946.yr2pvalbegxygcky@dhcp22.suse.cz>
	<20171106100558.GD3165@worktop.lehotels.local>
	<20171106104354.2jlgd2m4j4gxx4qo@dhcp22.suse.cz>
	<20171106120025.GH3165@worktop.lehotels.local>
	<20171106121222.nnzrr4cb7s7y5h74@dhcp22.suse.cz>
	<20171106134031.g6dbelg55mrbyc6i@dhcp22.suse.cz>
From: "Yang Shi"
Message-ID: <8665ccad-fa48-b835-c2e0-e50a4f05f319@alibaba-inc.com>
Date: Tue, 07 Nov 2017 00:16:58 +0800
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:52.0)
	Gecko/20100101 Thunderbird/52.2.1
MIME-Version: 1.0
In-Reply-To: <20171106134031.g6dbelg55mrbyc6i@dhcp22.suse.cz>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 11/6/17 5:40 AM, Michal Hocko wrote:
> On Mon 06-11-17 13:12:22, Michal Hocko wrote:
>> On Mon 06-11-17 13:00:25, Peter Zijlstra wrote:
>>> On Mon, Nov 06, 2017 at 11:43:54AM +0100, Michal Hocko wrote:
>>>>> Yes the comment is very much accurate.
>>>>
>>>> Which suggests that print_vma_addr might be problematic, right?
>>>> Shouldn't we do trylock on mmap_sem instead?
>>>
>>> Yes that's complete rubbish. trylock will get spurious failures to print
>>> when the lock is contended.
>>
>> Yes, but I guess that it is acceptable to to not print the state under
>> that condition.
>
> So what do you think about this? I think this is more robust than
> playing tricks with the explicit preempt count checks and less tedious
> than checking to make it conditional on the context. This is on top of
> Linus tree and if accepted it should replace the patch discussed here.
> ---
> From 0de6d57cbc54ee2686d1f1e4ffcc4ed490ded8aa Mon Sep 17 00:00:00 2001
> From: Michal Hocko
> Date: Mon, 6 Nov 2017 14:31:20 +0100
> Subject: [PATCH] mm: do not rely on preempt_count in print_vma_addr
>
> The preempt count check on print_vma_addr has been added by e8bff74afbdb
> ("x86: fix "BUG: sleeping function called from invalid context" in
> print_vma_addr()") and it relied on the elevated preempt count from
> preempt_conditional_sti because preempt_count check doesn't work on
> non preemptive kernels by default. The code has evolved though and
> d99e1bd175f4 ("x86/entry/traps: Refactor preemption and interrupt flag
> handling") has replaced preempt_conditional_sti by an explicit
> preempt_disable which is noop on !PREEMPT so the check in print_vma_addr
> is broken.
>
> Fix the issue by using trylock on mmap_sem rather than chacking the

s/chacking/checking

> preempt count. The allocation we are relying on has to be GFP_NOWAIT
> as well. There is a chance that we won't dump the vma state if the lock
> is contended or the memory short but this is acceptable outcome and much
> less fragile than the not working preemption check or tricks around it.
>
> Fixes: d99e1bd175f4 ("x86/entry/traps: Refactor preemption and interrupt flag handling")
> Signed-off-by: Michal Hocko

Acked-by: Yang Shi

Regards,
Yang

> ---
>  mm/memory.c | 8 +++-----
>  1 file changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index a728bed16c20..1e308ac8ca0a 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4457,17 +4457,15 @@ void print_vma_addr(char *prefix, unsigned long ip)
>  	struct vm_area_struct *vma;
>
>  	/*
> -	 * Do not print if we are in atomic
> -	 * contexts (in exception stacks, etc.):
> +	 * we might be running from an atomic context so we cannot sleep
>  	 */
> -	if (preempt_count())
> +	if (!down_read_trylock(&mm->mmap_sem))
>  		return;
>
> -	down_read(&mm->mmap_sem);
>  	vma = find_vma(mm, ip);
>  	if (vma && vma->vm_file) {
>  		struct file *f = vma->vm_file;
> -		char *buf = (char *)__get_free_page(GFP_KERNEL);
> +		char *buf = (char *)__get_free_page(GFP_NOWAIT);
>  		if (buf) {
>  			char *p;
>
>