Date: Mon, 31 Oct 2022 15:12:40 +0800
From: Baoquan He
To: Uladzislau Rezki
Cc: Stephen Brennan, Andrew Morton, linux-mm@kvack.org, Christoph Hellwig,
 Matthew Wilcox
Subject: Re: /proc/kcore reads 0's for vmap_block
References: <87ilk6gos2.fsf@oracle.com>
On 10/28/22 at 09:52am, Uladzislau Rezki wrote:
> On Thu, Oct 27, 2022 at 04:54:38PM +0800, Baoquan He wrote:
> > On 10/26/22 at 04:15pm, Stephen Brennan wrote:
> > > Hi all,
> > >
> > > The /proc/kcore interface uses vread() to read memory addresses which
> > > are in the vmalloc region, and it seems that vread() doesn't support
> > > vmap_areas which are associated with a vmap_block. As far as I can tell,
> > > those have vmap_area.vm == NULL, and get skipped during vread()'s
> > > iteration. So the end result is that the read simply returns 0's.
> >
> > Hmm, to my understanding it is vm_map_ram() that is called without an
> > associated struct vm_struct, and it's the only interface that does so.
> > vmap_block is the percpu-based optimization used to reduce contention
> > on the vmap lock.
> >
> > >
> > > This impacts live debuggers like gdb and drgn, which is how I stumbled
> > > upon it[1]. It looks like crash avoids the issue by doing a page table
> > > walk and reading the physical address.
> > >
> > > I'm wondering if there's any rationale for this omission from vread():
> > > is it a simple oversight, or was it omitted due to the difficulty? Is
> > > it possible for /proc/kcore to simply take the page faults when it reads
> > > unmapped memory and handle them? (I'm sure that's already discussed or
> > > is obviously infeasible for some reason beyond me.)
> >
> > From the git history, vread() iterated over vmlist in the first place.
> > Later, in the commit below, when the iteration was switched to
> > vmap_area_list, the old code logic was simply inherited. I guess that's
> > why vmap areas with a NULL ->vm are skipped.
> >
> > commit e81ce85f960c ("mm, vmalloc: iterate vmap_area_list, instead of vmlist in vread/vwrite()")
> >
> > >
> > > Ideally, I'm just looking for a way forward that allows the debugger to
> > > *work* as expected, meaning either that /proc/kcore always reads the
> > > correct data, or that the debugger can know ahead of time that it will
> > > need to do some processing (like a page table walk) first.
> >
> > I think we can adjust vread() to allow those vmap_area with a NULL ->vm
> > to be read out. I made a draft patch, below; please feel free to test it.
> > Not sure if there's any risk.
> >
> > From 9f1b786730f3ee0a8d5b48a94dbefa674102d7b9 Mon Sep 17 00:00:00 2001
> > From: Baoquan He
> > Date: Thu, 27 Oct 2022 16:20:26 +0800
> > Subject: [PATCH] mm/vmalloc.c: allow to read out vm_map_ram() areas in vread()
> > Content-type: text/plain
> >
> > Currently, vread() can read out vmalloc areas which are associated with
> > a vm_struct. However, this doesn't work for areas created by the
> > vm_map_ram() interface, because it doesn't allocate a vm_struct; in
> > vread(), these areas are simply skipped.
> >
> > Pages are passed into vm_map_ram() and mapped onto a free vmap area, so
> > it should be safe to read them out. Change the code to allow reading
> > out these vm_map_ram() areas in vread().
> >
> > Signed-off-by: Baoquan He
> > ---
> >  mm/vmalloc.c | 15 +++++++--------
> >  1 file changed, 7 insertions(+), 8 deletions(-)
> >
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index ccaa461998f3..f899ab784671 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -3526,7 +3526,7 @@ long vread(char *buf, char *addr, unsigned long count)
> >  	struct vm_struct *vm;
> >  	char *vaddr, *buf_start = buf;
> >  	unsigned long buflen = count;
> > -	unsigned long n;
> > +	unsigned long n, size;
> >
> >  	addr = kasan_reset_tag(addr);
> >
> > @@ -3547,12 +3547,11 @@ long vread(char *buf, char *addr, unsigned long count)
> >  		if (!count)
> >  			break;
> >
> > -		if (!va->vm)
> > -			continue;
> > -
> >  		vm = va->vm;
> > -		vaddr = (char *) vm->addr;
> > -		if (addr >= vaddr + get_vm_area_size(vm))
> > +		vaddr = (char *) va->va_start;
> > +		size = vm ? get_vm_area_size(vm) : va_size(va);
> > +
> > +		if (addr >= vaddr + size)
> >  			continue;
> >  		while (addr < vaddr) {
> >  			if (count == 0)
> > @@ -3562,10 +3561,10 @@ long vread(char *buf, char *addr, unsigned long count)
> >  			addr++;
> >  			count--;
> >  		}
> > -		n = vaddr + get_vm_area_size(vm) - addr;
> > +		n = vaddr + size - addr;
> >  		if (n > count)
> >  			n = count;
> > -		if (!(vm->flags & VM_IOREMAP))
> > +		if (!vm || !(vm->flags & VM_IOREMAP))
> >  			aligned_vread(buf, addr, n);
> >
> What happens if, during the read, a page is unmapped by vm_unmap_ram()?
> I can see that occurring concurrently.

You are right, thanks for pointing that out. Currently, vmap_block doesn't
track which pages in a whole vmap block are used or dirty/free; the old
alloc_map and dirty_map bitmaps have been removed. I plan to add a used_map
bitmap to track the pages being used. The added code looks simple and
shouldn't degrade efficiency. I will post it for review once it's finished.
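
For illustration only, not the actual patch (which has not been posted yet),
a rough sketch of what that direction could look like. It assumes the bitmap
is named used_map, lives in struct vmap_block, and is maintained from
vb_alloc()/vb_free(); those names and placements are placeholders until the
real patch is sent out:

        /*
         * Sketch: let each vmap_block record which pages inside it are
         * currently handed out, so vread() can tell used space from free
         * space. The existing fields and VMAP_BBMAP_BITS are as found in
         * mm/vmalloc.c; only used_map and the two bitmap updates are new.
         */
        struct vmap_block {
                spinlock_t lock;
                struct vmap_area *va;
                unsigned long free, dirty;
                DECLARE_BITMAP(used_map, VMAP_BBMAP_BITS);      /* proposed */
                unsigned long dirty_min, dirty_max;
                struct list_head free_list;
                struct rcu_head rcu_head;
                struct list_head purge;
        };

        /* In vb_alloc(): mark the 2^order pages just handed out as used. */
                pages_off = VMAP_BBMAP_BITS - vb->free;
                bitmap_set(vb->used_map, pages_off, 1UL << order);

        /* In vb_free(): clear the bits when the caller unmaps the chunk. */
                offset = (addr & (VMAP_BLOCK_SIZE - 1)) >> PAGE_SHIFT;
                bitmap_clear(vb->used_map, offset, 1UL << order);

vread() could then consult used_map under vb->lock before copying out of a
vm_map_ram() area, which would also cover the concurrent vm_unmap_ram() case
raised above; whether that is how the final patch handles it remains to be
seen.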