All of lore.kernel.org
 help / color / mirror / Atom feed
From: Baoquan He <bhe@redhat.com>
To: Uladzislau Rezki <urezki@gmail.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, Christoph Hellwig <hch@infradead.org>,
	Matthew Wilcox <willy@infradead.org>
Subject: Re: /proc/kcore reads 0's for vmap_block
Date: Mon, 31 Oct 2022 15:12:40 +0800	[thread overview]
Message-ID: <Y191aGqqaCLMF/P3@MiWiFi-R3L-srv> (raw)
In-Reply-To: <Y1uKSmgURNEa3nQu@pc636>

On 10/28/22 at 09:52am, Uladzislau Rezki wrote:
> On Thu, Oct 27, 2022 at 04:54:38PM +0800, Baoquan He wrote:
> > On 10/26/22 at 04:15pm, Stephen Brennan wrote:
> > > Hi all,
> > > 
> > > The /proc/kcore interface uses vread() to read memory addresses which
> > > are in the vmalloc region, and it seems that vread() doesn't support
> > > vmap_areas which are associated with a vmap_block. As far as I can tell,
> > > those have vmap_area.vm == NULL, and get skipped during vread()'s
> > > iteration. So the end result is that the read simply returns 0's.
> > 
> > Hmm, with my understanding, it should be vm_map_ram() which is called
> > without a struct vm_struct associated, and it's the only interface do
> > to so. vmap_block is the optimization way based on percpu to reduce vmap
> > lock race.
> > 
> > > 
> > > This impacts live debuggers like gdb and drgn, which is how I stumbled
> > > upon it[1]. It looks like crash avoids the issue by doing a page table
> > > walk and reading the physical address.
> > > 
> > > I'm wondering if there's any rationale for this omission from vread():
> > > is it a simple oversight, or was it omitted due to the difficulty? Is
> > > it possible for /proc/kcore to simply take the page faults when it reads
> > > unmapped memory and handle them? (I'm sure that's already discussed or
> > > is obviously infeasible for some reason beyond me.)
> > 
> > From git history, the vmlist iterating was taken in vread() at the
> > first place. Later, in below commit, when people changed to iterate
> > vmap_area_list instead, they just inherited the old code logic. I guess
> > that's why vmap with NULL ->vm is skipped.
> > 
> > commit e81ce85f960c ("mm, vmalloc: iterate vmap_area_list, instead of vmlist in vread/vwrite()")
> > 
> > > 
> > > Ideally, I'm just looking for a way forward that allows the debugger to
> > > *work* as expected, meaning either that /proc/kcore always reads the
> > > correct data, or that the debugger can know ahead of time that it will
> > > need to do some processing (like a page table walk) first. 
> > 
> > I think we can adjust vread() to allow those vmap_area with NULL ->vm
> > being read out? Made a draft patch at below, please feel free to have a
> > test. Not sure if there's any risk.
> > 
> > From 9f1b786730f3ee0a8d5b48a94dbefa674102d7b9 Mon Sep 17 00:00:00 2001
> > From: Baoquan He <bhe@redhat.com>
> > Date: Thu, 27 Oct 2022 16:20:26 +0800
> > Subject: [PATCH] mm/vmalloc.c: allow to read out vm_map_ram() areas in vread()
> > Content-type: text/plain
> > 
> > Currently, vread can read out vmalloc areas who is associated with
> > a vm_struct. While this doesn't work for areas created by vm_map_ram()
> > interface because it doesn't allocate a vm_struct. Then in vread(),
> > these areas will be skipped.
> > 
> > Pages are passed into vm_map_ram() and mapped onto frea vmap area,
> > it should be safe to read them out. Change code to allow to read
> > out these vm_map_ram() areas in vread().
> > 
> > Signed-off-by: Baoquan He <bhe@redhat.com>
> > ---
> >  mm/vmalloc.c | 15 +++++++--------
> >  1 file changed, 7 insertions(+), 8 deletions(-)
> > 
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index ccaa461998f3..f899ab784671 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -3526,7 +3526,7 @@ long vread(char *buf, char *addr, unsigned long count)
> >  	struct vm_struct *vm;
> >  	char *vaddr, *buf_start = buf;
> >  	unsigned long buflen = count;
> > -	unsigned long n;
> > +	unsigned long n, size;
> >  
> >  	addr = kasan_reset_tag(addr);
> >  
> > @@ -3547,12 +3547,11 @@ long vread(char *buf, char *addr, unsigned long count)
> >  		if (!count)
> >  			break;
> >  
> > -		if (!va->vm)
> > -			continue;
> > -
> >  		vm = va->vm;
> > -		vaddr = (char *) vm->addr;
> > -		if (addr >= vaddr + get_vm_area_size(vm))
> > +		vaddr = (char *) va->va_start;
> > +		size = vm ? get_vm_area_size(vm) : va_size(va);
> > +
> > +		if (addr >= vaddr + size)
> >  			continue;
> >  		while (addr < vaddr) {
> >  			if (count == 0)
> > @@ -3562,10 +3561,10 @@ long vread(char *buf, char *addr, unsigned long count)
> >  			addr++;
> >  			count--;
> >  		}
> > -		n = vaddr + get_vm_area_size(vm) - addr;
> > +		n = vaddr + size - addr;
> >  		if (n > count)
> >  			n = count;
> > -		if (!(vm->flags & VM_IOREMAP))
> > +		if (!vm || !(vm->flags & VM_IOREMAP))
> >  			aligned_vread(buf, addr, n);
> >
> What happens if during the read a page is unmapped by the
> vm_unamp_ram()? I see that it concurrently can occur.

You are right, thanks for pointing that out.

Currently, vmap_block doesn't track the used or dirty/free pages in one
whole vmap block. The old alloc_map and dirty_map bits have been
removed. I plan to add used_map to track the pages being used. The
added code looks simple and won't degrade efficiency. Will post for
reviewing once finished.



      reply	other threads:[~2022-10-31  7:13 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-26 23:15 /proc/kcore reads 0's for vmap_block Stephen Brennan
2022-10-27  8:54 ` Baoquan He
2022-10-27 18:07   ` Stephen Brennan
2022-10-28  7:52   ` Uladzislau Rezki
2022-10-31  7:12     ` Baoquan He [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y191aGqqaCLMF/P3@MiWiFi-R3L-srv \
    --to=bhe@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=hch@infradead.org \
    --cc=linux-mm@kvack.org \
    --cc=stephen.s.brennan@oracle.com \
    --cc=urezki@gmail.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.