From: Michael Holzheu <holzheu@linux.vnet.ibm.com>
To: Vivek Goyal <vgoyal@redhat.com>
Cc: Zhang Yanfei <zhangyanfei.yes@gmail.com>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>,
	Jan Willeke <willeke@de.ibm.com>,
	Martin Schwidefsky <schwidefsky@de.ibm.com>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	linux-kernel@vger.kernel.org, kexec@lists.infradead.org,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH 0/2] kdump/mmap: Fix mmap of /proc/vmcore for s390
Date: Fri, 31 May 2013 16:21:27 +0200	[thread overview]
Message-ID: <20130531162127.6d512233@holzheu> (raw)
In-Reply-To: <20130530203847.GB5968@redhat.com>

On Thu, 30 May 2013 16:38:47 -0400
Vivek Goyal <vgoyal@redhat.com> wrote:

> On Wed, May 29, 2013 at 01:51:44PM +0200, Michael Holzheu wrote:
> 
> [..]
> > >>> START QUOTE
> > 
> > [PATCH v3 1/3] kdump: Introduce ELF header in new memory feature
> > 
> > Currently for s390 we create the ELF core header in the 2nd kernel
> > with a small trick. We relocate the addresses in the ELF header in
> > a way that for the /proc/vmcore code it seems to be in the 1st
> > kernel (old) memory and the read_from_oldmem() returns the correct
> > data. This allows the /proc/vmcore code to use the ELF header in
> > the 2nd kernel.
> > 
> > >>> END QUOTE
> > 
> > For our current zfcpdump project (see "[PATCH 3/3]s390/kdump: Use
> > vmcore for zfcpdump") we could no longer use this trick. Therefore
> > we sent you the patches to get a clean interface for ELF header
> > creation in the 2nd kernel.
> 
> Hi Michael,
> 
> Few more questions.
> 
> - What's the difference between zfcpdump and kdump? I thought zfcpdump
>   just boots a specific kernel from a fixed drive? If yes, why can't that
>   kernel prepare headers in a similar way as the regular kdump kernel does
>   and gain from the kdump kernel swap trick?

Correct, the zfcpdump kernel is booted from a fixed disk drive. The
difference is that the zfcpdump HSA memory is not mapped into real
memory. It is accessed through a read interface, memcpy_hsa(), that
copies memory from the hypervisor-owned HSA memory into Linux memory.

So it looks like the following:

+----------+                 +------------+
|          |   memcpy_hsa()  |            |
| zfcpdump | <-------------- | HSA memory |
|          |                 |            |
+----------+                 +------------+
|          |
| old mem  |
|          |
+----------+

In the copy_oldmem_page() function for zfcpdump we do the following:

copy_oldmem_page_zfcpdump(...)
{
	if (src < ZFCPDUMP_HSA_SIZE) {
		if (memcpy_hsa(buf, src, csize, userbuf) < 0)
			return -EINVAL;
	} else {
		if (userbuf)
			copy_to_user_real(buf, src, csize);
		else
			memcpy_real(buf, src, csize);
	}
	return csize;
}

So I think for zfcpdump we can only use the read() interface
of /proc/vmcore. But this is sufficient for us since we also provide
the s390 specific zfcpdump user space that copies /proc/vmcore.

> Also, we are accessing the contents of elf headers using physical
> address. If that's the case, does it make a difference if data is
> in old kernel's memory or new kernel's memory. We will use the
> physical address and create a temporary mapping and it should not
> make a difference whether same physical page is already mapped in
> current kernel or not.
>
> Only restriction this places is that the entire ELF header needs to be
> contiguous. I see that s390 code already creates elf headers using
> kzalloc_panic(). So memory allocated should be physically contiguous.
> 
> So can't we just put __pa(elfcorebuf) in elfcorehdr_addr. And same
> is true for p_offset fields in PT_NOTE headers and everything should
> work fine?
> 
> Only problem we can face is that at some point of time kzalloc() might
> not be able to satisfy a contiguous memory request. We can handle that
> once s390 runs into those issues. You are anyway allocating memory
> using kzalloc().
> 
> And if this works for s390 kdump, it should work for zfcpdump too?

So your suggestion is that copy_oldmem_page() should also be used for
copying memory from the new kernel, correct?

For kdump on s390 I think this will work with the new "ELF header swap"
patch. With that patch access to [0, OLDMEM_SIZE] will uniquely identify
an address in the new kernel and access to [OLDMEM_BASE, OLDMEM_BASE +
OLDMEM_SIZE] will identify an address in the old kernel.
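Just to illustrate the disambiguation (a user-space sketch, not the actual kernel code; the OLDMEM_BASE/OLDMEM_SIZE values are made up):

```c
#include <stdbool.h>

/* Illustrative values only -- the real ones come from kernel setup. */
#define OLDMEM_BASE 0x10000000UL /* hypothetical start of old kernel memory */
#define OLDMEM_SIZE 0x10000000UL /* hypothetical size of the crashed kernel */

/* With the "ELF header swap" scheme, an address below OLDMEM_SIZE refers
 * to the new (2nd) kernel, while [OLDMEM_BASE, OLDMEM_BASE + OLDMEM_SIZE)
 * refers to the old (1st) kernel -- the two ranges never overlap. */
static bool addr_is_newmem(unsigned long addr)
{
	return addr < OLDMEM_SIZE;
}

static bool addr_is_oldmem(unsigned long addr)
{
	return addr >= OLDMEM_BASE && addr < OLDMEM_BASE + OLDMEM_SIZE;
}
```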

For zfcpdump we currently add a load segment for [0, HSA_SIZE] where
p_offset equals p_paddr. Therefore we can't distinguish in
copy_oldmem_page() if we read from oldmem (HSA) or newmem; the range
[0, HSA_SIZE] is used twice. As a workaround we could use an artificial
p_offset for the HSA memory chunk that is not used by the 1st kernel
physical memory. This is not really beautiful, but probably doable.
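The workaround could look something like this (sketch only; the helper name and the MAX_PHYSMEM bound are hypothetical):

```c
/* Pick an artificial p_offset for the HSA PT_LOAD segment that cannot
 * collide with any 1st kernel physical address. MAX_PHYSMEM is an
 * assumed upper bound of real memory, for illustration only. */
#define MAX_PHYSMEM 0x80000000UL

static unsigned long hsa_artificial_p_offset(unsigned long hsa_paddr)
{
	/* Shift HSA offsets into a range above all real physical memory,
	 * so copy_oldmem_page() can tell HSA reads from newmem reads. */
	return MAX_PHYSMEM + hsa_paddr;
}
```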

When I tried to implement this for kdump, I noticed another problem
with the vmcore mmap patches. Our copy_oldmem_page() function uses
memcpy_real() to access the old 1st kernel memory. This function
switches to real mode and therefore does not require any page tables.
But as a side effect of that we can't copy to vmalloc memory. The mmap
patches use vmalloc memory for "notes_buf". So currently using our
copy_oldmem_page() fails here.

If copy_oldmem_page() now also must be able to copy to vmalloc memory,
we would have to add new code for that:

* oldmem -> newmem (real): Use direct memcpy_real()
* oldmem -> newmem (vmalloc): Use intermediate buffer with memcpy_real()
* newmem -> newmem: Use memcpy()
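The three cases above could be dispatched roughly like this (user-space sketch; the predicate arguments stand in for the real kernel checks such as is_vmalloc_addr() and the oldmem address-range test):

```c
#include <stdbool.h>

enum copy_method {
	COPY_MEMCPY_REAL,   /* oldmem -> real (linear-mapped) destination */
	COPY_BOUNCE_BUFFER, /* oldmem -> vmalloc destination, via an
	                     * intermediate buffer filled by memcpy_real() */
	COPY_MEMCPY,        /* newmem -> newmem, plain memcpy() */
};

static enum copy_method pick_copy_method(bool src_is_oldmem,
					 bool dst_is_vmalloc)
{
	if (!src_is_oldmem)
		return COPY_MEMCPY;
	return dst_is_vmalloc ? COPY_BOUNCE_BUFFER : COPY_MEMCPY_REAL;
}
```

The point is only that the destination type now matters too, since memcpy_real() runs without page tables and cannot target vmalloc memory directly.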

What do you think?

Best Regards,
Michael

