* [PATCH 0/13] makedumpfile: Avoid two pass filtering by using bitmap file.
@ 2015-05-11  6:13 Atsushi Kumagai
  2015-05-12  8:20 ` HATAYAMA Daisuke
  0 siblings, 1 reply; 5+ messages in thread
From: Atsushi Kumagai @ 2015-05-11  6:13 UTC (permalink / raw)
  To: kexec

Hello,

This is the patch set to avoid two-pass filtering; it is the
finished version of the previous patch set below:

  http://lists.infradead.org/pipermail/kexec/2015-March/013497.html


Cyclic mode has to take a two-pass approach to filtering in order to save
memory; that's a disadvantage of cyclic mode and it's basically
unavoidable. However, even cyclic mode can avoid two-pass filtering if
there is enough free memory to store the whole 1st and 2nd bitmaps, but
the current version doesn't do so.
The main purpose of this patch set is to avoid that useless extra
filtering pass, but before that, I merged non-cyclic mode into cyclic mode
as a code clean-up because the code paths are almost the same. Instead, I
introduce another way to guarantee one-pass filtering by using disk space.
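The decision described above can be sketched roughly as follows. All names and
the threshold logic here are hypothetical, for illustration only; this is not
makedumpfile's actual code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch of the one-pass vs. two-pass decision described
 * above; names and logic are illustrative, not makedumpfile's code. */
enum filter_strategy {
	ONE_PASS_MEMORY,	/* whole 1st and 2nd bitmaps fit in free memory */
	ONE_PASS_FILE,		/* bitmaps stored in a file under --work-dir */
	TWO_PASS_CYCLIC,	/* fall back to filtering in multiple cycles */
};

static enum filter_strategy
choose_strategy(unsigned long long bitmap_bytes,	/* size of ONE bitmap */
		unsigned long long free_bytes,
		bool have_work_dir)
{
	if (2 * bitmap_bytes <= free_bytes)	/* both bitmaps fit in RAM */
		return ONE_PASS_MEMORY;
	if (have_work_dir)			/* spill the bitmaps to disk */
		return ONE_PASS_FILE;
	return TWO_PASS_CYCLIC;			/* two passes, as before */
}
```

Only the last case still pays the cost of filtering twice.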


MAJOR CHANGES

1. Introduce --work-dir option instead of --non-cyclic

  The --non-cyclic option was used to choose non-cyclic mode, which realizes
  one-pass filtering by creating a temporary bitmap file in TMPDIR.
  The new option "--work-dir" is used to specify a working directory in
  which to store the bitmap file, so it is the alternative to the
  combination of --non-cyclic and TMPDIR.

2. Remove extra page filtering

  If free memory space is enough to store the whole 1st and 2nd bitmaps
  (or --work-dir is specified), page filtering and bitmap creation are
  done only once, before the writing process. Otherwise the bitmaps are
  created twice as usual: once to decide the offset of the page header
  region, and once for the actual page writing process.

INTERNAL CHANGE

  info->flag_cyclic indicated whether the internal mode was cyclic or non-cyclic.
  Now that the two modes are merged, the flag means that the filtering process
  will take multiple cycles, i.e. it has to take a two-pass approach.


Atsushi Kumagai (13):
  Organize bitmap structure for cyclic logic.
  Add option to specify working directory for the bitmap.
  Integrate the entry point of is_dumpable().
  Integrate the main logic of writing kdump file.
  Communalize the function for creating 1st bitmap.
  Remove the old logic of writing kdump pages.
  Integrate filtering process for ELF path.
  Remove the old logic of writing ELF pages.
  Adjust --mem-usage path to the new code.
  Adjust --split/--reassemble path to the new code.
  Adjust refiltering path to the new code.
  Adjust sadump path to the new code.
  Remove --non-cyclic option.

 README         |   16 -
 makedumpfile.8 |   32 +-
 makedumpfile.c | 1624 ++++++++++++++++++--------------------------------------
 makedumpfile.h |   41 +-
 print_info.c   |   23 +-
 sadump_info.c  |    8 +-
 6 files changed, 588 insertions(+), 1156 deletions(-)

--
1.9.0

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 0/13] makedumpfile: Avoid two pass filtering by using bitmap file.
  2015-05-11  6:13 [PATCH 0/13] makedumpfile: Avoid two pass filtering by using bitmap file Atsushi Kumagai
@ 2015-05-12  8:20 ` HATAYAMA Daisuke
  2015-05-13  8:04   ` Atsushi Kumagai
  0 siblings, 1 reply; 5+ messages in thread
From: HATAYAMA Daisuke @ 2015-05-12  8:20 UTC (permalink / raw)
  To: ats-kumagai; +Cc: kexec

Hello Kumagai-san,

From: Atsushi Kumagai <ats-kumagai@wm.jp.nec.com>
Subject: [PATCH 0/13] makedumpfile: Avoid two pass filtering by using bitmap file.
Date: Mon, 11 May 2015 06:13:51 +0000

> Hello,
> 
> This is the patch set to avoid two pass filtering, it is the
> finished version of the previous patch set below:
> 
>   http://lists.infradead.org/pipermail/kexec/2015-March/013497.html
> 
> 
> Cyclic mode has to take a two-pass approach to filtering in order to save
> memory; that's a disadvantage of cyclic mode and it's basically
> unavoidable. However, even cyclic mode can avoid two-pass filtering if
> there is enough free memory to store the whole 1st and 2nd bitmaps, but
> the current version doesn't do so.
> The main purpose of this patch set is to avoid that useless extra
> filtering pass, but before that, I merged non-cyclic mode into cyclic mode
> as a code clean-up because the code paths are almost the same. Instead, I
> introduce another way to guarantee one-pass filtering by using disk space.
> 

How about compromising the progress information to some extent? The first
pass is intended to count the exact number of dumpable pages just
to provide precise progress information. Is such precision really
needed?

For example, how about another simple progress information:

   pfn / max_mapnr

where pfn is the number of the page frame that is currently being
processed. We know max_mapnr from the beginning, so this is possible
within one pass. It's less precise but might be precise enough.
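As a sketch, the proposed metric needs nothing beyond the current pfn; the
function name here is hypothetical, not makedumpfile's actual progress code:

```c
#include <assert.h>

/* Sketch of the simpler progress metric proposed above: report
 * pfn / max_mapnr instead of pre-counting dumpable pages.
 * Illustrative only; not makedumpfile's actual implementation. */
static unsigned int progress_percent(unsigned long long pfn,
				     unsigned long long max_mapnr)
{
	return (unsigned int)(pfn * 100 / max_mapnr);
}
```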

--
Thanks.
HATAYAMA, Daisuke




* RE: [PATCH 0/13] makedumpfile: Avoid two pass filtering by using bitmap file.
  2015-05-12  8:20 ` HATAYAMA Daisuke
@ 2015-05-13  8:04   ` Atsushi Kumagai
  2015-05-14  1:08     ` HATAYAMA Daisuke
  0 siblings, 1 reply; 5+ messages in thread
From: Atsushi Kumagai @ 2015-05-13  8:04 UTC (permalink / raw)
  To: d.hatayama; +Cc: kexec

>> Cyclic mode has to take a two-pass approach to filtering in order to save
>> memory; that's a disadvantage of cyclic mode and it's basically
>> unavoidable. However, even cyclic mode can avoid two-pass filtering if
>> there is enough free memory to store the whole 1st and 2nd bitmaps, but
>> the current version doesn't do so.
>> The main purpose of this patch set is to avoid that useless extra
>> filtering pass, but before that, I merged non-cyclic mode into cyclic mode
>> as a code clean-up because the code paths are almost the same. Instead, I
>> introduce another way to guarantee one-pass filtering by using disk space.
>>
>
>How about compromising the progress information to some extent? The first
>pass is intended to count the exact number of dumpable pages just
>to provide precise progress information. Is such precision really
>needed?

The first pass counts up num_dumpable *to calculate the offset of the
starting page data region in advance*; otherwise makedumpfile can't start
to write page data, except by creating a sparse file.

   7330 write_kdump_pages_and_bitmap_cyclic(struct cache_data *cd_header, struct cache_data *cd_page)
   7331 {
   7332         struct page_desc pd_zero;
   7333         off_t offset_data=0;
   7334         struct disk_dump_header *dh = info->dump_header;
   7335         unsigned char buf[info->page_size];
   7336         struct timeval tv_start;
   7337
   7338         /*
   7339          * Reset counter for debug message.
   7340          */
   7341         pfn_zero = pfn_cache = pfn_cache_private = 0;
   7342         pfn_user = pfn_free = pfn_hwpoison = 0;
   7343         pfn_memhole = info->max_mapnr;
   7344
   7345         cd_header->offset
   7346                 = (DISKDUMP_HEADER_BLOCKS + dh->sub_hdr_size + dh->bitmap_blocks)
   7347                 * dh->block_size;
   7348         cd_page->offset = cd_header->offset + sizeof(page_desc_t)*info->num_dumpable;
                                                                                ^^^^^^^^^^^^
   7349         offset_data = cd_page->offset;


>For example, how about another simple progress information:
>
>   pfn / max_mapnr
>
>where pfn is the number of a page frame that is currently
>processed. We know max_mapnr from the beginning, so this is possible
>within one pass. It's less precise but might be precise enough.

I also think that's enough for progress information, but in any case the
1st pass is necessary, as shown above.


Thanks
Atsushi Kumagai



* Re: [PATCH 0/13] makedumpfile: Avoid two pass filtering by using bitmap file.
  2015-05-13  8:04   ` Atsushi Kumagai
@ 2015-05-14  1:08     ` HATAYAMA Daisuke
  2015-05-15  4:51       ` Atsushi Kumagai
  0 siblings, 1 reply; 5+ messages in thread
From: HATAYAMA Daisuke @ 2015-05-14  1:08 UTC (permalink / raw)
  To: ats-kumagai; +Cc: kexec

From: Atsushi Kumagai <ats-kumagai@wm.jp.nec.com>
Subject: RE: [PATCH 0/13] makedumpfile: Avoid two pass filtering by using bitmap file.
Date: Wed, 13 May 2015 08:04:27 +0000

>>> Cyclic mode has to take a two-pass approach to filtering in order to save
>>> memory; that's a disadvantage of cyclic mode and it's basically
>>> unavoidable. However, even cyclic mode can avoid two-pass filtering if
>>> there is enough free memory to store the whole 1st and 2nd bitmaps, but
>>> the current version doesn't do so.
>>> The main purpose of this patch set is to avoid that useless extra
>>> filtering pass, but before that, I merged non-cyclic mode into cyclic mode
>>> as a code clean-up because the code paths are almost the same. Instead, I
>>> introduce another way to guarantee one-pass filtering by using disk space.
>>>
>>
>>How about compromising the progress information to some extent? The first
>>pass is intended to count the exact number of dumpable pages just
>>to provide precise progress information. Is such precision really
>>needed?
> 
> The first pass counts up num_dumpable *to calculate the offset of the
> starting page data region in advance*; otherwise makedumpfile can't start
> to write page data, except by creating a sparse file.
> 
>    7330 write_kdump_pages_and_bitmap_cyclic(struct cache_data *cd_header, struct cache_data *cd_page)
>    7331 {
>    7332         struct page_desc pd_zero;
>    7333         off_t offset_data=0;
>    7334         struct disk_dump_header *dh = info->dump_header;
>    7335         unsigned char buf[info->page_size];
>    7336         struct timeval tv_start;
>    7337
>    7338         /*
>    7339          * Reset counter for debug message.
>    7340          */
>    7341         pfn_zero = pfn_cache = pfn_cache_private = 0;
>    7342         pfn_user = pfn_free = pfn_hwpoison = 0;
>    7343         pfn_memhole = info->max_mapnr;
>    7344
>    7345         cd_header->offset
>    7346                 = (DISKDUMP_HEADER_BLOCKS + dh->sub_hdr_size + dh->bitmap_blocks)
>    7347                 * dh->block_size;
>    7348         cd_page->offset = cd_header->offset + sizeof(page_desc_t)*info->num_dumpable;
>                                                                                 ^^^^^^^^^^^^
>    7349         offset_data = cd_page->offset;
> 
> 

I overlooked this, sorry.

The size of a page description header is 24 bytes. This corresponds to 6 GB
per 1 TB. Can this become a big problem? Of course, I think it odd
that the page description table could be larger than the memory data part.
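The 6 GB per 1 TB figure works out as one 24-byte descriptor per page,
assuming a 4 KiB page size (the mail doesn't state the page size):

```c
#include <assert.h>

/* Descriptor-table size for mem_bytes of RAM: one 24-byte page
 * descriptor per 4 KiB page (page size is an assumption here). */
static unsigned long long desc_table_bytes(unsigned long long mem_bytes)
{
	return (mem_bytes / 4096) * 24;
}
```

For 1 TiB this gives 24 * 2^28 bytes = 6 GiB, matching the figure above.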

There's another approach: construct the page description table for each
cycle separately in the dump file and connect the tables with a linked
list.

This changes the dump format and needs crash utility support to be added;
there is no compatibility with the current crash utility.
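One possible shape for such a chained table is sketched below. This is a
purely hypothetical layout, not an existing diskdump/kdump structure:

```c
#include <assert.h>
#include <sys/types.h>

/* Hypothetical trailer linking per-cycle page-descriptor tables into a
 * list, as proposed above; not an existing diskdump/kdump format. */
struct pd_table_header {
	unsigned long long first_pfn;	/* first pfn covered by this table */
	unsigned long long n_descs;	/* number of descriptors that follow */
	off_t next_table;		/* file offset of next table, 0 = end */
};
/* On disk, each header would be followed by n_descs page_desc_t entries
 * written during one cycle, so no global descriptor count is needed
 * before writing starts. */
```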

>>For example, how about another simple progress information:
>>
>>   pfn / max_mapnr
>>
>>where pfn is the number of a page frame that is currently
>>processed. We know max_mapnr from the beginning, so this is possible
>>within one pass. It's less precise but might be precise enough.
> 
> I also think it's enough for progress information, but anyway the 1st
> pass is necessary as above.
> 
> 
> Thanks
> Atsushi Kumagai
--
Thanks.
HATAYAMA, Daisuke




* RE: [PATCH 0/13] makedumpfile: Avoid two pass filtering by using bitmap file.
  2015-05-14  1:08     ` HATAYAMA Daisuke
@ 2015-05-15  4:51       ` Atsushi Kumagai
  0 siblings, 0 replies; 5+ messages in thread
From: Atsushi Kumagai @ 2015-05-15  4:51 UTC (permalink / raw)
  To: d.hatayama; +Cc: kexec

[-- Attachment #1: Type: text/plain, Size: 4253 bytes --]

>>>How about compromising the progress information to some extent? The first
>>>pass is intended to count the exact number of dumpable pages just
>>>to provide precise progress information. Is such precision really
>>>needed?
>>
>> The first pass counts up num_dumpable *to calculate the offset of the
>> starting page data region in advance*; otherwise makedumpfile can't start
>> to write page data, except by creating a sparse file.
>>
>>    7330 write_kdump_pages_and_bitmap_cyclic(struct cache_data *cd_header, struct cache_data *cd_page)
>>    7331 {
>>    7332         struct page_desc pd_zero;
>>    7333         off_t offset_data=0;
>>    7334         struct disk_dump_header *dh = info->dump_header;
>>    7335         unsigned char buf[info->page_size];
>>    7336         struct timeval tv_start;
>>    7337
>>    7338         /*
>>    7339          * Reset counter for debug message.
>>    7340          */
>>    7341         pfn_zero = pfn_cache = pfn_cache_private = 0;
>>    7342         pfn_user = pfn_free = pfn_hwpoison = 0;
>>    7343         pfn_memhole = info->max_mapnr;
>>    7344
>>    7345         cd_header->offset
>>    7346                 = (DISKDUMP_HEADER_BLOCKS + dh->sub_hdr_size + dh->bitmap_blocks)
>>    7347                 * dh->block_size;
>>    7348         cd_page->offset = cd_header->offset + sizeof(page_desc_t)*info->num_dumpable;
>>                                                                                 ^^^^^^^^^^^^
>>    7349         offset_data = cd_page->offset;
>>
>>
>
>I overlooked this, sorry.
>
>The size of a page description header is 24 bytes. This corresponds to 6 GB
>per 1 TB. Can this become a big problem? Of course, I think it odd
>that the page description table could be larger than the memory data part.

At least, it looks like the member "page_flags" can be removed, since
makedumpfile always just sets it to 0 and crash doesn't refer to it.

  typedef struct page_desc {
	off_t offset; 				/* the offset of the page data */
	unsigned int size; 			/* the size of this dump page */
	unsigned int flags; 			/* flags */
	unsigned long long page_flags; 		/* page flags */  <--- always 0, these 8 bytes are useless.
  } page_desc_t;
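Assuming the usual LP64 layout, dropping page_flags shrinks each descriptor
from 24 to 16 bytes; the struct names below are hypothetical, for
comparison only:

```c
#include <assert.h>
#include <sys/types.h>

/* Current descriptor layout (24 bytes on LP64) next to the same struct
 * with the unused page_flags member removed (16 bytes). Names are
 * hypothetical, for illustration. */
typedef struct page_desc_old {
	off_t offset;			/* offset of the page data */
	unsigned int size;		/* size of this dump page */
	unsigned int flags;		/* flags */
	unsigned long long page_flags;	/* always written as 0 today */
} page_desc_old_t;

typedef struct page_desc_new {
	off_t offset;			/* offset of the page data */
	unsigned int size;		/* size of this dump page */
	unsigned int flags;		/* flags */
} page_desc_new_t;
```

Per 1 TB of 4 KiB pages, those 8 bytes per descriptor are roughly 2 GB of
metadata.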


(Sorry for getting off track here)
Further, I have another idea that would reduce the total size of the
page descriptors: assign one page descriptor to a number of pages,
so that multiple pages are managed as a single data block.

The original purpose of the idea is to improve compression performance
by compressing several pages in a lump.
We know that compression with zlib is too slow. I suspect that one of
the causes is the buffer size passed to compress2().
When compressing a 100MB file, I expect that compressing a 10MB block 10 times
will be faster than compressing a 1MB block 100 times.
Actually, I did a simple verification with the attached program:

  # ./zlib_compress 1024 testdata
  TOTAL COMPRESSION TIME: 18.478064
  # ./zlib_compress 10240 testdata
  TOTAL COMPRESSION TIME: 5.940524
  # ./zlib_compress 102400 testdata
  TOTAL COMPRESSION TIME: 2.088867
  #

Unfortunately I haven't had a chance to work on it for a long time,
but I think it would be better to consider it together if we design
a new dump format.

>There's another approach: construct the page description table for each
>cycle separately in the dump file and connect the tables with a linked
>list.
>
>This changes the dump format and needs crash utility support to be added;
>there is no compatibility with the current crash utility.

It's interesting. I think we should improve the format if there is a
good reason; the format shouldn't be an obstacle.
Of course, the new format should be optional at first, but it would
be great if there were a choice to get better performance.


Thanks
Atsushi Kumagai

>>>For example, how about another simple progress information:
>>>
>>>   pfn / max_mapnr
>>>
>>>where pfn is the number of a page frame that is currently
>>>processed. We know max_mapnr from the beginning, so this is possible
>>>within one pass. It's less precise but might be precise enough.
>>
>> I also think it's enough for progress information, but anyway the 1st
>> pass is necessary as above.
>>
>>
>> Thanks
>> Atsushi Kumagai
>--
>Thanks.
>HATAYAMA, Daisuke

[-- Attachment #2: zlib_compress.c --]
[-- Type: text/plain, Size: 1243 bytes --]

#include <zlib.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <sys/types.h>
#include <fcntl.h>
#include <sys/time.h>


static inline double getdtime(void)
{
	struct timeval tv;

	gettimeofday(&tv, NULL);

	return (double)tv.tv_sec + (double)tv.tv_usec * 0.001 * 0.001;
}

int main(int argc, char * argv[]) {
	char *buf_in, *buf_out;
	unsigned long size_in;
	uLongf bound, size_out;
	ssize_t nread;
	int fd;
	double d_start, d_end, d_compress;
	
	if (argc != 3) {
		printf("./command <size_in> <input file>\n");
		exit(1);
	}

	d_compress = 0;

	// buffer for input data
	size_in = atoi(argv[1]);
	buf_in = (char *)malloc(size_in);

	if (!buf_in) {
		printf("malloc failed. %lu bytes\n", size_in);
		exit(1);
	}

	// buffer for output data
	bound = compressBound(size_in);
	buf_out = (char *)malloc(bound);

	if (!buf_out) {
		printf("malloc failed. %lu bytes\n", (unsigned long)bound);
		exit(1);
	}

	// open(2) returns -1 on failure, not 0
	if ((fd = open(argv[2], O_RDONLY)) < 0) {
		printf("file open failed. %s\n", argv[2]);
		exit(1);
	}

	while ((nread = read(fd, buf_in, size_in)) > 0) {
		// compress2() shrinks *destLen to the compressed size,
		// so reset it to the full buffer size for every block
		size_out = bound;
		d_start = getdtime();
		compress2((Bytef *)buf_out, &size_out,
			  (const Bytef *)buf_in, nread, Z_BEST_SPEED);
		d_end = getdtime();
		d_compress += d_end - d_start;
	}

	close(fd);
	free(buf_in);
	free(buf_out);

	printf("TOTAL COMPRESSION TIME: %lf\n", d_compress);
	return 0;
}



end of thread, other threads:[~2015-05-15  4:53 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-11  6:13 [PATCH 0/13] makedumpfile: Avoid two pass filtering by using bitmap file Atsushi Kumagai
2015-05-12  8:20 ` HATAYAMA Daisuke
2015-05-13  8:04   ` Atsushi Kumagai
2015-05-14  1:08     ` HATAYAMA Daisuke
2015-05-15  4:51       ` Atsushi Kumagai
