All of lore.kernel.org
 help / color / mirror / Atom feed
From: Stephen Brennan <stephen.s.brennan@oracle.com>
To: Baoquan He <bhe@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, Uladzislau Rezki <urezki@gmail.com>,
	Christoph Hellwig <hch@infradead.org>,
	Matthew Wilcox <willy@infradead.org>
Subject: Re: /proc/kcore reads 0's for vmap_block
Date: Thu, 27 Oct 2022 11:07:36 -0700	[thread overview]
Message-ID: <87fsf9gmxj.fsf@oracle.com> (raw)
In-Reply-To: <Y1pHTj2wuhoWmeV3@MiWiFi-R3L-srv>

Hi Baoquan,

Thanks for taking a look and coming up with a patch!

(responses inline)

Baoquan He <bhe@redhat.com> writes:
> On 10/26/22 at 04:15pm, Stephen Brennan wrote:
>> Hi all,
>> 
>> The /proc/kcore interface uses vread() to read memory addresses which
>> are in the vmalloc region, and it seems that vread() doesn't support
>> vmap_areas which are associated with a vmap_block. As far as I can tell,
>> those have vmap_area.vm == NULL, and get skipped during vread()'s
>> iteration. So the end result is that the read simply returns 0's.
>
> Hmm, with my understanding, it should be vm_map_ram() which is called
> without a struct vm_struct associated, and it's the only interface do
> to so. vmap_block is the optimization way based on percpu to reduce vmap
> lock race.
>
>> 
>> This impacts live debuggers like gdb and drgn, which is how I stumbled
>> upon it[1]. It looks like crash avoids the issue by doing a page table
>> walk and reading the physical address.
>> 
>> I'm wondering if there's any rationale for this omission from vread():
>> is it a simple oversight, or was it omitted due to the difficulty? Is
>> it possible for /proc/kcore to simply take the page faults when it reads
>> unmapped memory and handle them? (I'm sure that's already discussed or
>> is obviously infeasible for some reason beyond me.)
>
> From git history, the vmlist iterating was taken in vread() at the
> first place. Later, in below commit, when people changed to iterate
> vmap_area_list instead, they just inherited the old code logic. I guess
> that's why vmap with NULL ->vm is skipped.
>
> commit e81ce85f960c ("mm, vmalloc: iterate vmap_area_list, instead of vmlist in vread/vwrite()")
>
>> 
>> Ideally, I'm just looking for a way forward that allows the debugger to
>> *work* as expected, meaning either that /proc/kcore always reads the
>> correct data, or that the debugger can know ahead of time that it will
>> need to do some processing (like a page table walk) first. 
>
> I think we can adjust vread() to allow those vmap_area with NULL ->vm
> being read out? Made a draft patch at below, please feel free to have a
> test. Not sure if there's any risk.

I tested this out and it fully resolved the problem! I didn't encounter
any issues with the debuggers using /proc/kcore. My test kernel module
is here:
https://github.com/brenns10/kernel_stuff/blob/master/drgn_vmalloc_test/drgn_vmalloc_test.c

I was able to read the full contents of the mapped area correctly with
drgn thanks to this patch.

>
> From 9f1b786730f3ee0a8d5b48a94dbefa674102d7b9 Mon Sep 17 00:00:00 2001
> From: Baoquan He <bhe@redhat.com>
> Date: Thu, 27 Oct 2022 16:20:26 +0800
> Subject: [PATCH] mm/vmalloc.c: allow to read out vm_map_ram() areas in vread()
> Content-type: text/plain
>
> Currently, vread can read out vmalloc areas who is associated with
> a vm_struct. While this doesn't work for areas created by vm_map_ram()
> interface because it doesn't allocate a vm_struct. Then in vread(),
> these areas will be skipped.
>
> Pages are passed into vm_map_ram() and mapped onto frea vmap area,
> it should be safe to read them out. Change code to allow to read
> out these vm_map_ram() areas in vread().

I see, we don't even need to worry about page faults because
vm_map_ram() is guaranteed to have mapped them. That does make things
easier :)

>
> Signed-off-by: Baoquan He <bhe@redhat.com>

Reviewed-by: Stephen Brennan <stephen.s.brennan@oracle.com>

(or Tested-by if you prefer)

> ---
>  mm/vmalloc.c | 15 +++++++--------
>  1 file changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index ccaa461998f3..f899ab784671 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -3526,7 +3526,7 @@ long vread(char *buf, char *addr, unsigned long count)
>  	struct vm_struct *vm;
>  	char *vaddr, *buf_start = buf;
>  	unsigned long buflen = count;
> -	unsigned long n;
> +	unsigned long n, size;
>  
>  	addr = kasan_reset_tag(addr);
>  
> @@ -3547,12 +3547,11 @@ long vread(char *buf, char *addr, unsigned long count)
>  		if (!count)
>  			break;
>  
> -		if (!va->vm)
> -			continue;
> -
>  		vm = va->vm;
> -		vaddr = (char *) vm->addr;
> -		if (addr >= vaddr + get_vm_area_size(vm))
> +		vaddr = (char *) va->va_start;
> +		size = vm ? get_vm_area_size(vm) : va_size(va);
> +
> +		if (addr >= vaddr + size)
>  			continue;
>  		while (addr < vaddr) {
>  			if (count == 0)
> @@ -3562,10 +3561,10 @@ long vread(char *buf, char *addr, unsigned long count)
>  			addr++;
>  			count--;
>  		}
> -		n = vaddr + get_vm_area_size(vm) - addr;
> +		n = vaddr + size - addr;
>  		if (n > count)
>  			n = count;
> -		if (!(vm->flags & VM_IOREMAP))
> +		if (!vm || !(vm->flags & VM_IOREMAP))
>  			aligned_vread(buf, addr, n);
>  		else /* IOREMAP area is treated as memory hole */
>  			memset(buf, 0, n);
> -- 
> 2.34.1


  reply	other threads:[~2022-10-27 18:08 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-26 23:15 /proc/kcore reads 0's for vmap_block Stephen Brennan
2022-10-27  8:54 ` Baoquan He
2022-10-27 18:07   ` Stephen Brennan [this message]
2022-10-28  7:52   ` Uladzislau Rezki
2022-10-31  7:12     ` Baoquan He

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87fsf9gmxj.fsf@oracle.com \
    --to=stephen.s.brennan@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=bhe@redhat.com \
    --cc=hch@infradead.org \
    --cc=linux-mm@kvack.org \
    --cc=urezki@gmail.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.