* makedumpfile memory usage seems high with option -E
@ 2014-04-10 21:44 Vivek Goyal
  2014-04-11  9:22 ` Atsushi Kumagai
  0 siblings, 1 reply; 38+ messages in thread
From: Vivek Goyal @ 2014-04-10 21:44 UTC (permalink / raw)
  To: Kexec Mailing List, Atsushi Kumagai, HATAYAMA Daisuke; +Cc: Arthur Zou

Hi Atsushi/Hatayama,

We noticed in our testing that makedumpfile gets OOM-killed if we happen
to use the -E option. Saving to the compressed kdump format works just fine.

We also noticed that with -E, if we disable cyclic mode, then it works just
fine.

So it looks like something is going on with -E and cyclic mode enabled. I am
not sure what it is.

Do you suspect something?

Thanks
Vivek


* RE: makedumpfile memory usage seems high with option -E
  2014-04-10 21:44 makedumpfile memory usage seems high with option -E Vivek Goyal
@ 2014-04-11  9:22 ` Atsushi Kumagai
  2014-04-11 10:19   ` Arthur Zou
  0 siblings, 1 reply; 38+ messages in thread
From: Atsushi Kumagai @ 2014-04-11  9:22 UTC (permalink / raw)
  To: vgoyal; +Cc: d.hatayama, kexec, zzou

Hello Vivek,

>Hi Atsushi/Hatayama,
>
>We noticed in our testing that makedumpfile gets OOM killed if we happen
>to use -E option. Saving to compressed kdump format works just fine.
>
>Also we noticed that with -E if we disable cyclic mode then it works just
>fine.
>
>So looks like something is going on with -E and cyclic mode enabled. I am
>not sure what it is.
>
>Do you suspect something?

At first, I suspected that the function which calculates the cyclic buffer
size may be related to this issue, but I haven't found the answer yet...

int
calculate_cyclic_buffer_size() {

        if (info->flag_elf_dumpfile) {
                free_size = get_free_memory_size() * 0.4;
                needed_size = (info->max_mapnr * 2) / BITPERBYTE;
        } else {
                free_size = get_free_memory_size() * 0.8;
                needed_size = info->max_mapnr / BITPERBYTE;
        }
        [...]
        info->bufsize_cyclic = (free_size <= needed_size) ? free_size : needed_size;


I've found that this function has an issue with memory allocation.
When -E is specified, info->bufsize_cyclic will be the total size of
the 1st and 2nd bitmap if free memory is enough. Then,
info->bufsize_cyclic will be used to allocate each bitmap in
prepare_bitmap_buffer_cyclic() like below:

        if ((info->partial_bitmap1 = (char *)malloc(info->bufsize_cyclic)) == NULL) {
                ERRMSG("Can't allocate memory for the 1st-bitmap. %s\n",
                       strerror(errno));
                return FALSE;
        }
        if ((info->partial_bitmap2 = (char *)malloc(info->bufsize_cyclic)) == NULL) {
                ERRMSG("Can't allocate memory for the 2nd-bitmap. %s\n",
                       strerror(errno));
                return FALSE;
        }

This is definitely an over-allocation, but it can't exceed 80% of free
memory due to the condition check in calculate_cyclic_buffer_size(), so
I don't think this issue alone causes the OOM.
I'll fix the over-allocation with the patch below, but it will not
resolve your OOM issue...
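
To put rough numbers on this, here is a minimal standalone sketch of the
calculation above (not makedumpfile code; the 100GB of RAM and 16MB of free
memory are assumed values chosen only for illustration, and BITPERBYTE is 8
as in makedumpfile):

#include <stdio.h>

#define BITPERBYTE 8ULL

int main(void)
{
	/* assumed values, for illustration only */
	unsigned long long max_mapnr = (100ULL << 30) >> 12;	/* 100GB of RAM, 4KB pages */
	unsigned long long free_mem  = 16ULL << 20;		/* 16MB free in the 2nd kernel */

	/* current ELF path: needed_size already covers BOTH bitmaps */
	unsigned long long free_size   = free_mem * 0.4;
	unsigned long long needed_size = (max_mapnr * 2) / BITPERBYTE;
	unsigned long long bufsize     = (free_size <= needed_size) ? free_size : needed_size;

	/* but prepare_bitmap_buffer_cyclic() allocates bufsize_cyclic twice */
	printf("bufsize_cyclic: %llu MB, both bitmaps: %llu MB, free: %llu MB\n",
	       bufsize >> 20, (2 * bufsize) >> 20, free_mem >> 20);
	return 0;
}

With those assumed numbers the two bitmaps alone take 12 of the 16 MB, i.e.
roughly 80% of free memory rather than 40%, which is the over-allocation
described above.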

BTW, what are your makedumpfile version, crashkernel= size, and
system memory size? And does the issue happen even if you specify a
--cyclic-buffer value small enough to fit in the available memory?
I'm curious about the exact conditions that cause the issue.


Thanks
Atsushi Kumagai


diff --git a/makedumpfile.c b/makedumpfile.c
index 75092a8..ae9e69a 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -8996,7 +8996,7 @@ out:
  */
 int
 calculate_cyclic_buffer_size(void) {
-	unsigned long long free_size, needed_size;
+	unsigned long long limit_size, bitmap_size;
 
 	if (info->max_mapnr <= 0) {
 		ERRMSG("Invalid max_mapnr(%llu).\n", info->max_mapnr);
@@ -9009,18 +9009,17 @@ calculate_cyclic_buffer_size(void) {
 	 * within 80% of free memory.
 	 */
 	if (info->flag_elf_dumpfile) {
-		free_size = get_free_memory_size() * 0.4;
-		needed_size = (info->max_mapnr * 2) / BITPERBYTE;
+		limit_size = get_free_memory_size() * 0.4;
 	} else {
-		free_size = get_free_memory_size() * 0.8;
-		needed_size = info->max_mapnr / BITPERBYTE;
+		limit_size = get_free_memory_size() * 0.8;
 	}
+	bitmap_size = info->max_mapnr / BITPERBYTE;
 
 	/* if --split was specified cyclic buffer allocated per dump file */
 	if (info->num_dumpfile > 1)
-		needed_size /= info->num_dumpfile;
+		bitmap_size /= info->num_dumpfile;
 
-	info->bufsize_cyclic = (free_size <= needed_size) ? free_size : needed_size;
+	info->bufsize_cyclic = (limit_size <= bitmap_size) ? limit_size : bitmap_size;
 
 	return TRUE;
 }
-- 
1.8.0.2


* Re: makedumpfile memory usage seems high with option -E
  2014-04-11  9:22 ` Atsushi Kumagai
@ 2014-04-11 10:19   ` Arthur Zou
  2014-04-14  8:02     ` [PATCH] makedumpfile: change the wrong code to calculate bufsize_cyclic for elf dump Baoquan He
  0 siblings, 1 reply; 38+ messages in thread
From: Arthur Zou @ 2014-04-11 10:19 UTC (permalink / raw)
  To: Atsushi Kumagai; +Cc: d hatayama, kexec, vgoyal



----- Original Message -----
> Hello Vivek,
> 
> >Hi Atsushi/Hatayama,
> >
> >We noticed in our testing that makedumpfile gets OOM killed if we happen
> >to use -E option. Saving to compressed kdump format works just fine.
> >
> >Also we noticed that with -E if we disable cyclic mode then it works just
> >fine.
> >
> >So looks like something is going on with -E and cyclic mode enabled. I am
> >not sure what it is.
> >
> >Do you suspect something?
> 
> At first, I supposed the function to calculate cyclic buffer size may
> be related to this issue, but I haven't found the answer yet...
> 
> int
> calculate_cyclic_buffer_size() {
> 
>         if (info->flag_elf_dumpfile) {
>                 free_size = get_free_memory_size() * 0.4;
>                 needed_size = (info->max_mapnr * 2) / BITPERBYTE;
>         } else {
>                 free_size = get_free_memory_size() * 0.8;
>                 needed_size = info->max_mapnr / BITPERBYTE;
>         }
>         [...]
>         info->bufsize_cyclic = (free_size <= needed_size) ? free_size :
>         needed_size;
> 
> 
> I've found this function has an issue about memory allocation.
> When -E is specified, info->bufsize_cyclic will be the total size of
> the 1st and 2nd bitmap if free memory is enough. Then,
> info->bufsize_cyclic will be used to allocate each bitmap in
> prepare_bitmap_buffer_cyclic() like below:
> 
>         if ((info->partial_bitmap1 = (char *)malloc(info->bufsize_cyclic)) ==
>         NULL) {
>                 ERRMSG("Can't allocate memory for the 1st-bitmap. %s\n",
>                        strerror(errno));
>                 return FALSE;
>         }
>         if ((info->partial_bitmap2 = (char *)malloc(info->bufsize_cyclic)) ==
>         NULL) {
>                 ERRMSG("Can't allocate memory for the 2nd-bitmap. %s\n",
>                        strerror(errno));
>                 return FALSE;
>         }
> 
> It's a too much allocation definitely, but it mustn't exceed 80% of free
> memory due to the condition check in calculate_cyclic_buffer_size(), so
> I think the OOM issue will not happen by this issue.
> I'll fix this too much allocation with the patch below, but it will not
> resolve your OOM issue...
> 
> BTW, what are your version of makedumpfile and crashkernel= size and
> the system memory size? and does the issue happen even if you specify
> --cyclic-buffer which is small enough to fit the available memory ?
> I'm curious to know the details of the condition which cause the issue.
> 
> 

Hi Atsushi Kumagai,
   The makedumpfile version is 1.5.4, and crashkernel=auto is used, which
works out to 166M. The system memory size is about 96G, the arch is x86_64,
and the kernel version is 3.10.110. The same issue happens with the newest
makedumpfile from the devel branch. After adding --cyclic-buffer=100, the
dump succeeds.
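
For reference, the workaround amounts to an invocation along these lines
(illustrative only: the dump level and output path are assumptions;
--cyclic-buffer takes the buffer size in kilobytes, so this caps each
partial bitmap at 100KB):

    makedumpfile -E -d 31 --cyclic-buffer=100 /proc/vmcore /var/crash/vmcore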

Thanks
arthur 

> Thanks
> Atsushi Kumagai
> 
> 
> diff --git a/makedumpfile.c b/makedumpfile.c
> index 75092a8..ae9e69a 100644
> --- a/makedumpfile.c
> +++ b/makedumpfile.c
> @@ -8996,7 +8996,7 @@ out:
>   */
>  int
>  calculate_cyclic_buffer_size(void) {
> -	unsigned long long free_size, needed_size;
> +	unsigned long long limit_size, bitmap_size;
>  
>  	if (info->max_mapnr <= 0) {
>  		ERRMSG("Invalid max_mapnr(%llu).\n", info->max_mapnr);
> @@ -9009,18 +9009,17 @@ calculate_cyclic_buffer_size(void) {
>  	 * within 80% of free memory.
>  	 */
>  	if (info->flag_elf_dumpfile) {
> -		free_size = get_free_memory_size() * 0.4;
> -		needed_size = (info->max_mapnr * 2) / BITPERBYTE;
> +		limit_size = get_free_memory_size() * 0.4;
>  	} else {
> -		free_size = get_free_memory_size() * 0.8;
> -		needed_size = info->max_mapnr / BITPERBYTE;
> +		limit_size = get_free_memory_size() * 0.8;
>  	}
> +	bitmap_size = info->max_mapnr / BITPERBYTE;
>  
>  	/* if --split was specified cyclic buffer allocated per dump file */
>  	if (info->num_dumpfile > 1)
> -		needed_size /= info->num_dumpfile;
> +		bitmap_size /= info->num_dumpfile;
>  
> -	info->bufsize_cyclic = (free_size <= needed_size) ? free_size :
> needed_size;
> +	info->bufsize_cyclic = (limit_size <= bitmap_size) ? limit_size :
> bitmap_size;
>  
>  	return TRUE;
>  }
> --
> 1.8.0.2
> 
> _______________________________________________
> kexec mailing list
> kexec@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec
> 


* [PATCH] makedumpfile: change the wrong code to calculate bufsize_cyclic for elf dump
  2014-04-11 10:19   ` Arthur Zou
@ 2014-04-14  8:02     ` Baoquan He
  2014-04-14  8:11       ` Baoquan He
  2014-04-16  6:44       ` Baoquan He
  0 siblings, 2 replies; 38+ messages in thread
From: Baoquan He @ 2014-04-14  8:02 UTC (permalink / raw)
  To: Arthur Zou; +Cc: kexec, d hatayama, Atsushi Kumagai, vgoyal

For an ELF dump, the code that calculates the cyclic buffer size is not
correct. Since an ELF dump needs both bitmap1 and bitmap2, the memory
needed is double bufsize_cyclic; hence free_size is taken as 40% of free
memory. However, the needed_size that free_size is compared with should be
info->max_mapnr / BITPERBYTE, not 2 times that.

Because of this, when free memory in the 2nd kernel is not plentiful,
OOM happens very often.

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 makedumpfile.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/makedumpfile.c b/makedumpfile.c
index 75092a8..01ec516 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -9010,7 +9010,7 @@ calculate_cyclic_buffer_size(void) {
 	 */
 	if (info->flag_elf_dumpfile) {
 		free_size = get_free_memory_size() * 0.4;
-		needed_size = (info->max_mapnr * 2) / BITPERBYTE;
+		needed_size = info->max_mapnr / BITPERBYTE;
 	} else {
 		free_size = get_free_memory_size() * 0.8;
 		needed_size = info->max_mapnr / BITPERBYTE;
-- 
1.8.5.3



* Re: [PATCH] makedumpfile: change the wrong code to calculate bufsize_cyclic for elf dump
  2014-04-14  8:02     ` [PATCH] makedumpfile: change the wrong code to calculate bufsize_cyclic for elf dump Baoquan He
@ 2014-04-14  8:11       ` Baoquan He
  2014-04-16  6:44       ` Baoquan He
  1 sibling, 0 replies; 38+ messages in thread
From: Baoquan He @ 2014-04-14  8:11 UTC (permalink / raw)
  To: Arthur Zou; +Cc: Atsushi Kumagai, d hatayama, kexec, vgoyal

Hi Atsushi,

Since both of the test machines where this bug was reproduced are
reserved, I haven't tested this patch yet. But after going through the
code flow of the ELF dump several times, this is the only suspect I
have found.

Could you also help check this?

The change below would also work, and its meaning is more obvious, but my
previous patch keeps the code more readable.

int
calculate_cyclic_buffer_size(void) {
        unsigned long long free_size, needed_size;
 
        if (info->max_mapnr <= 0) {
                ERRMSG("Invalid max_mapnr(%llu).\n", info->max_mapnr);
                return FALSE;
        }
 
        /*
         * free_size will be used to allocate 1st and 2nd bitmap, so it
         * should be 40% of free memory to keep the size of cyclic buffer
         * within 80% of free memory.
         */
        if (info->flag_elf_dumpfile) {
                free_size = get_free_memory_size() * 0.4 * 2;
                needed_size = (info->max_mapnr * 2) / BITPERBYTE;
        } else {
                free_size = get_free_memory_size() * 0.8;
                needed_size = info->max_mapnr / BITPERBYTE;
        }
 
        /* if --split was specified cyclic buffer allocated per dump file */
        if (info->num_dumpfile > 1)
                needed_size /= info->num_dumpfile;

        info->bufsize_cyclic = (free_size <= needed_size) ? free_size/2 : needed_size/2;
 
        return TRUE;
}

Thanks
Baoquan


On 04/14/14 at 04:02pm, Baoquan He wrote:
> In case elf dump, the code to calculate the cyclic buffer size is
> not correct. Since elf need bitmap1/2, so the needed memory for
> bufsize_cyclic need be double. Hence free size should be 40% of
> free memory, however the needed_size which free_size is compared
> with should be info->max_mapnr / BITPERBYTE, but not 2 times of it.
> 
> Because of this, in case of free memory in 2nd kernel is not too much,
> OOM will happen very often.
> 
> Signed-off-by: Baoquan He <bhe@redhat.com>
> ---
>  makedumpfile.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/makedumpfile.c b/makedumpfile.c
> index 75092a8..01ec516 100644
> --- a/makedumpfile.c
> +++ b/makedumpfile.c
> @@ -9010,7 +9010,7 @@ calculate_cyclic_buffer_size(void) {
>  	 */
>  	if (info->flag_elf_dumpfile) {
>  		free_size = get_free_memory_size() * 0.4;
> -		needed_size = (info->max_mapnr * 2) / BITPERBYTE;
> +		needed_size = info->max_mapnr / BITPERBYTE;
>  	} else {
>  		free_size = get_free_memory_size() * 0.8;
>  		needed_size = info->max_mapnr / BITPERBYTE;
> -- 
> 1.8.5.3
> 
> 
> _______________________________________________
> kexec mailing list
> kexec@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec


* Re: [PATCH] makedumpfile: change the wrong code to calculate bufsize_cyclic for elf dump
  2014-04-14  8:02     ` [PATCH] makedumpfile: change the wrong code to calculate bufsize_cyclic for elf dump Baoquan He
  2014-04-14  8:11       ` Baoquan He
@ 2014-04-16  6:44       ` Baoquan He
  2014-04-17  4:01         ` Atsushi Kumagai
  1 sibling, 1 reply; 38+ messages in thread
From: Baoquan He @ 2014-04-16  6:44 UTC (permalink / raw)
  To: Atsushi Kumagai; +Cc: d hatayama, kexec, Arthur Zou, vgoyal

Hi Atsushi,

I have gotten hold of the test machine where the bug was reported and did
a test. The changed code makes the ELF dump succeed.

Thanks
Baoquan


On 04/14/14 at 04:02pm, Baoquan He wrote:
> In case elf dump, the code to calculate the cyclic buffer size is
> not correct. Since elf need bitmap1/2, so the needed memory for
> bufsize_cyclic need be double. Hence free size should be 40% of
> free memory, however the needed_size which free_size is compared
> with should be info->max_mapnr / BITPERBYTE, but not 2 times of it.
> 
> Because of this, in case of free memory in 2nd kernel is not too much,
> OOM will happen very often.
> 
> Signed-off-by: Baoquan He <bhe@redhat.com>
> ---
>  makedumpfile.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/makedumpfile.c b/makedumpfile.c
> index 75092a8..01ec516 100644
> --- a/makedumpfile.c
> +++ b/makedumpfile.c
> @@ -9010,7 +9010,7 @@ calculate_cyclic_buffer_size(void) {
>  	 */
>  	if (info->flag_elf_dumpfile) {
>  		free_size = get_free_memory_size() * 0.4;
> -		needed_size = (info->max_mapnr * 2) / BITPERBYTE;
> +		needed_size = info->max_mapnr / BITPERBYTE;
>  	} else {
>  		free_size = get_free_memory_size() * 0.8;
>  		needed_size = info->max_mapnr / BITPERBYTE;
> -- 
> 1.8.5.3
> 
> 
> _______________________________________________
> kexec mailing list
> kexec@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec


* RE: [PATCH] makedumpfile: change the wrong code to calculate bufsize_cyclic for elf dump
  2014-04-16  6:44       ` Baoquan He
@ 2014-04-17  4:01         ` Atsushi Kumagai
  2014-04-17  4:52           ` bhe
  0 siblings, 1 reply; 38+ messages in thread
From: Atsushi Kumagai @ 2014-04-17  4:01 UTC (permalink / raw)
  To: bhe; +Cc: d.hatayama, kexec, zzou, vgoyal

Hello Baoquan,

>Hi Atsushi,
>
>I have got the test machine where bug reported and did a test. The
>changed code can make elf dump successful.

Great, thanks for your help!
However, I still have questions.

First, what is the difference between yours and mine?

http://lists.infradead.org/pipermail/kexec/2014-April/011535.html

My patch includes renaming some values, but the purpose looks
the same as yours.
Further, you described it as below:

>On 04/14/14 at 04:02pm, Baoquan He wrote:
>> In case elf dump, the code to calculate the cyclic buffer size is
>> not correct. Since elf need bitmap1/2, so the needed memory for
>> bufsize_cyclic need be double. Hence free size should be 40% of
>> free memory, however the needed_size which free_size is compared
>> with should be info->max_mapnr / BITPERBYTE, but not 2 times of it.
>>
>> Because of this, in case of free memory in 2nd kernel is not too much,
>> OOM will happen very often.

but I still don't think this bug causes the OOM.
Even if needed_size is wrongly calculated to be that large, bufsize_cyclic
will not exceed 40% of free memory because of the check below:

    info->bufsize_cyclic = (free_size <= needed_size) ? free_size : needed_size;

So it looks like bitmap1 (40%) and bitmap2 (40%) will fit within 80% of free
memory in any case.

I may be misunderstanding something, since your patch does have an effect
on this issue in practice; could you correct me?


Thanks
Atsushi Kumagai

>> Signed-off-by: Baoquan He <bhe@redhat.com>
>> ---
>>  makedumpfile.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/makedumpfile.c b/makedumpfile.c
>> index 75092a8..01ec516 100644
>> --- a/makedumpfile.c
>> +++ b/makedumpfile.c
>> @@ -9010,7 +9010,7 @@ calculate_cyclic_buffer_size(void) {
>>  	 */
>>  	if (info->flag_elf_dumpfile) {
>>  		free_size = get_free_memory_size() * 0.4;
>> -		needed_size = (info->max_mapnr * 2) / BITPERBYTE;
>> +		needed_size = info->max_mapnr / BITPERBYTE;
>>  	} else {
>>  		free_size = get_free_memory_size() * 0.8;
>>  		needed_size = info->max_mapnr / BITPERBYTE;
>> --
>> 1.8.5.3
>>
>>
>> _______________________________________________
>> kexec mailing list
>> kexec@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/kexec


* Re: [PATCH] makedumpfile: change the wrong code to calculate bufsize_cyclic for elf dump
  2014-04-17  4:01         ` Atsushi Kumagai
@ 2014-04-17  4:52           ` bhe
  2014-04-17  5:02             ` bhe
  0 siblings, 1 reply; 38+ messages in thread
From: bhe @ 2014-04-17  4:52 UTC (permalink / raw)
  To: Atsushi Kumagai; +Cc: d.hatayama, kexec, zzou, vgoyal

On 04/17/14 at 04:01am, Atsushi Kumagai wrote:
> Hello Baoquan,
> 
> >Hi Atsushi,
> >
> >I have got the test machine where bug reported and did a test. The
> >changed code can make elf dump successful.
> 
> Great, thanks for your help!
> However, I still have questions.
> 
> First, what is the difference between yours and mine?
> 
> http://lists.infradead.org/pipermail/kexec/2014-April/011535.html

Yeah, you are right, it's the same on changing the code bug. I mush
haven't read your patch carefully.

> 
> My patch includes renaming some values, but the purpose looks
> the same as yours.
> Further, you described as below, 
> 
> >On 04/14/14 at 04:02pm, Baoquan He wrote:
> >> In case elf dump, the code to calculate the cyclic buffer size is
> >> not correct. Since elf need bitmap1/2, so the needed memory for
> >> bufsize_cyclic need be double. Hence free size should be 40% of
> >> free memory, however the needed_size which free_size is compared
> >> with should be info->max_mapnr / BITPERBYTE, but not 2 times of it.
> >>
> >> Because of this, in case of free memory in 2nd kernel is not too much,
> >> OOM will happen very often.
> 
> but I still don't think this bug causes OOM.
> Even if needed_size is calculated as so much size wrongly, bufsize_cyclic
> will not exceed 40% of free memory by the check below:
> 
>     info->bufsize_cyclic = (free_size <= needed_size) ? free_size : needed_size;
> 
> So it looks that bitmap1(40%) and bitmap2(40%) will fit in 80% of free
> memory in any case.
> 
> I may misunderstand something since your patch has an effect on this
> issue in practice, could you correct me?

It definitely can cause OOM. My test machine has 100G of memory, so per
the old code its needed_size is 3200K*2 == 6.4M. If only 15M of free
memory is currently left, free_size will be 15M*0.4, which is 6M, so
info->bufsize_cyclic is assigned 6M. Since that amount is allocated for
each of the two bitmaps, only 3M is left for other use, e.g. page cache
and dynamic allocation. OOM will happen.
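
Spelled out, with the double allocation made explicit:

    free_size      = 15M * 0.4         = 6M
    needed_size    = 3200K * 2         = 6.4M
    bufsize_cyclic = min(6M, 6.4M)     = 6M
    both bitmaps   = 2 * 6M            = 12M
    left for everything else           = 15M - 12M = 3M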

> 
> 
> Thanks
> Atsushi Kumagai
> 
> >> Signed-off-by: Baoquan He <bhe@redhat.com>
> >> ---
> >>  makedumpfile.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/makedumpfile.c b/makedumpfile.c
> >> index 75092a8..01ec516 100644
> >> --- a/makedumpfile.c
> >> +++ b/makedumpfile.c
> >> @@ -9010,7 +9010,7 @@ calculate_cyclic_buffer_size(void) {
> >>  	 */
> >>  	if (info->flag_elf_dumpfile) {
> >>  		free_size = get_free_memory_size() * 0.4;
> >> -		needed_size = (info->max_mapnr * 2) / BITPERBYTE;
> >> +		needed_size = info->max_mapnr / BITPERBYTE;
> >>  	} else {
> >>  		free_size = get_free_memory_size() * 0.8;
> >>  		needed_size = info->max_mapnr / BITPERBYTE;
> >> --
> >> 1.8.5.3
> >>
> >>
> >> _______________________________________________
> >> kexec mailing list
> >> kexec@lists.infradead.org
> >> http://lists.infradead.org/mailman/listinfo/kexec


* Re: [PATCH] makedumpfile: change the wrong code to calculate bufsize_cyclic for elf dump
  2014-04-17  4:52           ` bhe
@ 2014-04-17  5:02             ` bhe
  2014-04-18  9:22               ` Atsushi Kumagai
  0 siblings, 1 reply; 38+ messages in thread
From: bhe @ 2014-04-17  5:02 UTC (permalink / raw)
  To: Atsushi Kumagai; +Cc: d.hatayama, kexec, zzou, vgoyal

On 04/17/14 at 12:52pm, Baoquan He wrote:
> On 04/17/14 at 04:01am, Atsushi Kumagai wrote:
> > Hello Baoquan,
> > 
> > >Hi Atsushi,
> > >
> > >I have got the test machine where bug reported and did a test. The
> > >changed code can make elf dump successful.
> > 
> > Great, thanks for your help!
> > However, I still have questions.
> > 
> > First, what is the difference between yours and mine?
> > 
> > http://lists.infradead.org/pipermail/kexec/2014-April/011535.html
> 
> Yeah, you are right, it's the same on changing the code bug. I mush
> haven't read your patch carefully. 
                                                          must<--
> 
> > 
> > My patch includes renaming some values, but the purpose looks
> > the same as yours.
> > Further, you described as below, 
> > 
> > >On 04/14/14 at 04:02pm, Baoquan He wrote:
> > but I still don't think this bug causes OOM.
> > Even if needed_size is calculated as so much size wrongly, bufsize_cyclic
> > will not exceed 40% of free memory by the check below:
> > 
> >     info->bufsize_cyclic = (free_size <= needed_size) ? free_size : needed_size;
> > 
> > So it looks that bitmap1(40%) and bitmap2(40%) will fit in 80% of free
> > memory in any case.
> > 
> > I may misunderstand something since your patch has an effect on this
> > issue in practice, could you correct me?
> 
> It definitely will cause OOM. On my test machine, it has 100G memory. So
> per old code, its needed_size is 3200K*2 == 6.4M, if currently free
> memory is only 15M left, the free_size will be 15M*0.4 which is 6M. So
> info->bufsize_cyclic is assigned to be 6M. and only 3M is left for other
> use, e.g page cache, dynamic allocation. OOM will happen.
> 

BTW, in our case there was about 30M of free memory when we started saving
the dump; the 15M figure above was just my coarse estimate.




* RE: [PATCH] makedumpfile: change the wrong code to calculate bufsize_cyclic for elf dump
  2014-04-17  5:02             ` bhe
@ 2014-04-18  9:22               ` Atsushi Kumagai
  2014-04-18 14:29                 ` bhe
  2014-04-21 15:12                 ` Vivek Goyal
  0 siblings, 2 replies; 38+ messages in thread
From: Atsushi Kumagai @ 2014-04-18  9:22 UTC (permalink / raw)
  To: bhe; +Cc: d.hatayama, kexec, zzou, vgoyal

>On 04/17/14 at 12:52pm, Baoquan He wrote:
>> On 04/17/14 at 04:01am, Atsushi Kumagai wrote:
>> > Hello Baoquan,
>> >
>> > >Hi Atsushi,
>> > >
>> > >I have got the test machine where bug reported and did a test. The
>> > >changed code can make elf dump successful.
>> >
>> > Great, thanks for your help!
>> > However, I still have questions.
>> >
>> > First, what is the difference between yours and mine?
>> >
>> > http://lists.infradead.org/pipermail/kexec/2014-April/011535.html
>>
>> Yeah, you are right, it's the same on changing the code bug. I mush
>> haven't read your patch carefully.
>                                                          must<--
>>
>> >
>> > My patch includes renaming some values, but the purpose looks
>> > the same as yours.
>> > Further, you described as below,
>> >
>> > >On 04/14/14 at 04:02pm, Baoquan He wrote:
>> > but I still don't think this bug causes OOM.
>> > Even if needed_size is calculated as so much size wrongly, bufsize_cyclic
>> > will not exceed 40% of free memory by the check below:
>> >
>> >     info->bufsize_cyclic = (free_size <= needed_size) ? free_size : needed_size;
>> >
>> > So it looks that bitmap1(40%) and bitmap2(40%) will fit in 80% of free
>> > memory in any case.
>> >
>> > I may misunderstand something since your patch has an effect on this
>> > issue in practice, could you correct me?
>>
>> It definitely will cause OOM. On my test machine, it has 100G memory. So
>> per old code, its needed_size is 3200K*2 == 6.4M, if currently free
>> memory is only 15M left, the free_size will be 15M*0.4 which is 6M. So
>> info->bufsize_cyclic is assigned to be 6M. and only 3M is left for other
>> use, e.g page cache, dynamic allocation. OOM will happen.
>>
>
>BTW, in our case, there's about 30M free memory when we started saving
>dump. It should be caused by my coarse estimation above.

Thanks for your description, I understand that situation and
the nature of the problem.

That is, the assumption that 20% of free memory is enough for
makedumpfile can be broken if free memory is too small.
If your machine has 200GB of memory, OOM will happen even after fixing
the over-allocation bug.

I don't think this in itself is a problem, it's natural that a lack of
memory causes OOM. However, there is something we can do for improvement.

What I think is:

  1. Use a constant value as the safe limit when calculating bufsize_cyclic,
     instead of 80% of free memory. This value must be enough for
     makedumpfile's work other than the bitmaps.

  2. If free memory is smaller than that value, makedumpfile gives up
     early.

This change may reduce the chance of running out of memory, but the
required memory size will change with every version, so maintaining
it sounds tough to me.
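
As a rough sketch of what items 1 and 2 above could look like (hypothetical
only: SAFE_LIMIT, the helper name and the example numbers are made up for
illustration, this is not a proposed patch):

#include <stdbool.h>
#include <stdio.h>

#define SAFE_LIMIT (16ULL << 20)	/* hypothetical fixed reserve for non-bitmap work */

/* keep SAFE_LIMIT bytes for everything except the bitmaps, give up early otherwise */
static bool pick_bufsize_cyclic(unsigned long long free_mem,
				unsigned long long bitmap_size,
				unsigned long long *bufsize_cyclic)
{
	if (free_mem <= SAFE_LIMIT)
		return false;		/* item 2: too little memory, give up early */

	/* item 1: whatever remains above the fixed reserve is the bitmap budget */
	unsigned long long limit = (free_mem - SAFE_LIMIT) / 2;
	*bufsize_cyclic = (bitmap_size < limit) ? bitmap_size : limit;
	return true;
}

int main(void)
{
	unsigned long long bufsize;

	/* e.g. 30MB free and a 3200KB bitmap, as in Baoquan's report */
	if (pick_bufsize_cyclic(30ULL << 20, 3200ULL << 10, &bufsize))
		printf("bufsize_cyclic = %llu KB\n", bufsize >> 10);
	else
		printf("not enough free memory, giving up\n");
	return 0;
}

Whether a single SAFE_LIMIT holds across versions is exactly the maintenance
concern above.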

Any comments are welcome.


Thanks
Atsushi Kumagai


* Re: [PATCH] makedumpfile: change the wrong code to calculate bufsize_cyclic for elf dump
  2014-04-18  9:22               ` Atsushi Kumagai
@ 2014-04-18 14:29                 ` bhe
  2014-04-18 19:41                   ` Petr Tesarik
  2014-04-21 15:14                   ` Vivek Goyal
  2014-04-21 15:12                 ` Vivek Goyal
  1 sibling, 2 replies; 38+ messages in thread
From: bhe @ 2014-04-18 14:29 UTC (permalink / raw)
  To: Atsushi Kumagai; +Cc: d.hatayama, kexec, zzou, vgoyal


> >> It definitely will cause OOM. On my test machine, it has 100G memory. So
> >> per old code, its needed_size is 3200K*2 == 6.4M, if currently free
> >> memory is only 15M left, the free_size will be 15M*0.4 which is 6M. So
> >> info->bufsize_cyclic is assigned to be 6M. and only 3M is left for other
> >> use, e.g page cache, dynamic allocation. OOM will happen.
> >>
> >
> >BTW, in our case, there's about 30M free memory when we started saving
> >dump. It should be caused by my coarse estimation above.
> 
> Thanks for your description, I understand that situation and
> the nature of the problem.
> 
> That is, the assumption that 20% of free memory is enough for
> makedumpfile can be broken if free memory is too small.
> If your machine has 200GB memory, OOM will happen even after fix
> the too allocation bug.

Well, we have done some experiments to try to determine, statistically, how
much memory kdump really needs. A final reservation is then calculated
automatically as (base_value + linear growth with total memory).
If a machine has 200GB of memory, its reservation grows too, since apart
from the bitmap cost the other memory costs are almost fixed.

Under this scheme things should go well; if memory always ends up at the
edge of OOM, an adjustment of base_value is needed. So a constant value as
you suggest may not be needed.

Instead, I am wondering where the 80% comes from, and why leaving only 20%
of free memory is assumed to be safe.
> 
> I don't think this is a problem, it's natural that a lack of memory
> causes OOM. However, there is a thing we can do for improvement. 
> 
> What I think is:
> 
>   1. Use a constant value as safe limit to calculate bufsize_cyclic
>      instead of 80% of free memory. This value must be enough for
>      makedumpfile's work except bitmap.
> 
>   2. If free memory is smaller than the value, makedumpfile gives up
>      to work early.
> 
> This change may reduce the possibility of lack of memory, but the
> required memory size will be changing every version, so maintaining
> it sounds tough to me.
> 
> Any comments are welcome.
> 
> 
> Thanks
> Atsushi Kumagai


* Re: [PATCH] makedumpfile: change the wrong code to calculate bufsize_cyclic for elf dump
  2014-04-18 14:29                 ` bhe
@ 2014-04-18 19:41                   ` Petr Tesarik
  2014-04-21 15:19                     ` Vivek Goyal
  2014-04-21 15:14                   ` Vivek Goyal
  1 sibling, 1 reply; 38+ messages in thread
From: Petr Tesarik @ 2014-04-18 19:41 UTC (permalink / raw)
  To: bhe; +Cc: kexec, d.hatayama, Atsushi Kumagai, zzou, vgoyal

On Fri, 18 Apr 2014 22:29:12 +0800
"bhe@redhat.com" <bhe@redhat.com> wrote:

> 
> > >> It definitely will cause OOM. On my test machine, it has 100G memory. So
> > >> per old code, its needed_size is 3200K*2 == 6.4M, if currently free
> > >> memory is only 15M left, the free_size will be 15M*0.4 which is 6M. So
> > >> info->bufsize_cyclic is assigned to be 6M. and only 3M is left for other
> > >> use, e.g page cache, dynamic allocation. OOM will happen.
> > >>
> > >
> > >BTW, in our case, there's about 30M free memory when we started saving
> > >dump. It should be caused by my coarse estimation above.
> > 
> > Thanks for your description, I understand that situation and
> > the nature of the problem.
> > 
> > That is, the assumption that 20% of free memory is enough for
> > makedumpfile can be broken if free memory is too small.
> > If your machine has 200GB memory, OOM will happen even after fix
> > the too allocation bug.
> 
> Well, we have done some experiments to try to get the statistical memory
> range which kdump really need. Then a final reservation will be
> calculated automatically as (base_value + linear growth of total memory). 
> If one machine has 200GB memory, its reservation will grow too. Since
> except of the bitmap cost, other memory cost is almost fixed. 
> 
> Per this scheme things should be go well, if memory always goes to the
> edge of OOM, an adjust of base_value is needed. So a constant value as
> you said may not be needed.
> 
> Instead, I am wondering how the 80% comes from, and why 20% of free
> memory must be safe.

I believe this 80% comes from the default value of vm.dirty_ratio,
which is 20%. In other words, the kernel won't block further writes
until 20% of available RAM is used up by dirty cache. But if you
fill up all free memory with dirty pages and then touch another (though
allocated) page, the kernel will go into direct reclaim, and if nothing
can be written out ATM, it will invoke the OOM Killer.

I figured out that the actual requirements also depend on the target
device driver, the filesystem (e.g. NFS is considerably more memory-hungry
than ext3) and, not least, the number of online CPUs.

The calculation is quite complex, and that's probably why nobody has
done it properly yet.
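
For reference, the current value can be checked and lowered at run time with
the standard sysctl interface, e.g.:

    sysctl vm.dirty_ratio
    sysctl -w vm.dirty_ratio=5

though what value is actually appropriate in the kdump environment is an
open question.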

Petr T


* Re: [PATCH] makedumpfile: change the wrong code to calculate bufsize_cyclic for elf dump
  2014-04-18  9:22               ` Atsushi Kumagai
  2014-04-18 14:29                 ` bhe
@ 2014-04-21 15:12                 ` Vivek Goyal
  2014-04-23  7:55                   ` Atsushi Kumagai
  1 sibling, 1 reply; 38+ messages in thread
From: Vivek Goyal @ 2014-04-21 15:12 UTC (permalink / raw)
  To: Atsushi Kumagai; +Cc: d.hatayama, kexec, zzou, bhe

On Fri, Apr 18, 2014 at 09:22:26AM +0000, Atsushi Kumagai wrote:
> >On 04/17/14 at 12:52pm, Baoquan He wrote:
> >> On 04/17/14 at 04:01am, Atsushi Kumagai wrote:
> >> > Hello Baoquan,
> >> >
> >> > >Hi Atsushi,
> >> > >
> >> > >I have got the test machine where bug reported and did a test. The
> >> > >changed code can make elf dump successful.
> >> >
> >> > Great, thanks for your help!
> >> > However, I still have questions.
> >> >
> >> > First, what is the difference between yours and mine?
> >> >
> >> > http://lists.infradead.org/pipermail/kexec/2014-April/011535.html
> >>
> >> Yeah, you are right, it's the same on changing the code bug. I mush
> >> haven't read your patch carefully.
> >                                                          must<--
> >>
> >> >
> >> > My patch includes renaming some values, but the purpose looks
> >> > the same as yours.
> >> > Further, you described as below,
> >> >
> >> > >On 04/14/14 at 04:02pm, Baoquan He wrote:
> >> > but I still don't think this bug causes OOM.
> >> > Even if needed_size is calculated as so much size wrongly, bufsize_cyclic
> >> > will not exceed 40% of free memory by the check below:
> >> >
> >> >     info->bufsize_cyclic = (free_size <= needed_size) ? free_size : needed_size;
> >> >
> >> > So it looks that bitmap1(40%) and bitmap2(40%) will fit in 80% of free
> >> > memory in any case.
> >> >
> >> > I may misunderstand something since your patch has an effect on this
> >> > issue in practice, could you correct me?
> >>
> >> It definitely will cause OOM. On my test machine, it has 100G memory. So
> >> per old code, its needed_size is 3200K*2 == 6.4M, if currently free
> >> memory is only 15M left, the free_size will be 15M*0.4 which is 6M. So
> >> info->bufsize_cyclic is assigned to be 6M. and only 3M is left for other
> >> use, e.g page cache, dynamic allocation. OOM will happen.
> >>
> >
> >BTW, in our case, there's about 30M free memory when we started saving
> >dump. It should be caused by my coarse estimation above.
> 
> Thanks for your description, I understand that situation and
> the nature of the problem.
> 
> That is, the assumption that 20% of free memory is enough for
> makedumpfile can be broken if free memory is too small.
> If your machine has 200GB memory, OOM will happen even after fix
> the too allocation bug.

Why? In cyclic mode, shouldn't makedumpfile's memory usage be fixed
and not depend on the amount of RAM present in the system?

Also, is even 30MB not sufficient to run makedumpfile? That looks
like a lot of free memory to me.

> 
> I don't think this is a problem, it's natural that a lack of memory
> causes OOM. However, there is a thing we can do for improvement. 
> 
> What I think is:
> 
>   1. Use a constant value as safe limit to calculate bufsize_cyclic
>      instead of 80% of free memory. This value must be enough for
>      makedumpfile's work except bitmap.
> 
>   2. If free memory is smaller than the value, makedumpfile gives up
>      to work early.

What do we gain by makedumpfile giving up? The system will reboot; it
will reboot anyway after an OOM.

> 
> This change may reduce the possibility of lack of memory, but the
> required memory size will be changing every version, so maintaining
> it sounds tough to me.

I think we need to dive deeper to figure out why 30MB of free memory
is not sufficient. To me something looks wrong here.

Secondly, I think using absolute values is not a good idea. It will be
very hard to keep track of and update that value.

At most we can warn that makedumpfile needs X MB of memory and only
Y MB of free memory is available.

Thanks
Vivek


* Re: [PATCH] makedumpfile: change the wrong code to calculate bufsize_cyclic for elf dump
  2014-04-18 14:29                 ` bhe
  2014-04-18 19:41                   ` Petr Tesarik
@ 2014-04-21 15:14                   ` Vivek Goyal
  2014-04-23 11:09                     ` bhe
  1 sibling, 1 reply; 38+ messages in thread
From: Vivek Goyal @ 2014-04-21 15:14 UTC (permalink / raw)
  To: bhe; +Cc: kexec, d.hatayama, Atsushi Kumagai, zzou

On Fri, Apr 18, 2014 at 10:29:12PM +0800, bhe@redhat.com wrote:
> 
> > >> It definitely will cause OOM. On my test machine, it has 100G memory. So
> > >> per old code, its needed_size is 3200K*2 == 6.4M, if currently free
> > >> memory is only 15M left, the free_size will be 15M*0.4 which is 6M. So
> > >> info->bufsize_cyclic is assigned to be 6M. and only 3M is left for other
> > >> use, e.g page cache, dynamic allocation. OOM will happen.
> > >>
> > >
> > >BTW, in our case, there's about 30M free memory when we started saving
> > >dump. It should be caused by my coarse estimation above.
> > 
> > Thanks for your description, I understand that situation and
> > the nature of the problem.
> > 
> > That is, the assumption that 20% of free memory is enough for
> > makedumpfile can be broken if free memory is too small.
> > If your machine has 200GB memory, OOM will happen even after fix
> > the too allocation bug.
> 
> Well, we have done some experiments to try to get the statistical memory
> range which kdump really need. Then a final reservation will be
> calculated automatically as (base_value + linear growth of total memory). 
> If one machine has 200GB memory, its reservation will grow too. Since
> except of the bitmap cost, other memory cost is almost fixed. 
> 
> Per this scheme things should be go well, if memory always goes to the
> edge of OOM, an adjust of base_value is needed. So a constant value as
> you said may not be needed.

That logic is old and we should probably get rid of it at some point. We
don't want makedumpfile's memory usage to go up just because the system has
more physical RAM. That's why cyclic mode was introduced.

> 
> Instead, I am wondering how the 80% comes from, and why 20% of free
> memory must be safe.

I had come up with this 80% number randomly. So you think that's the
problem?

I am still scratching my head over why 30MB is not sufficient for
makedumpfile.

Thanks
Vivek


* Re: [PATCH] makedumpfile: change the wrong code to calculate bufsize_cyclic for elf dump
  2014-04-18 19:41                   ` Petr Tesarik
@ 2014-04-21 15:19                     ` Vivek Goyal
  2014-04-21 15:46                       ` Petr Tesarik
  0 siblings, 1 reply; 38+ messages in thread
From: Vivek Goyal @ 2014-04-21 15:19 UTC (permalink / raw)
  To: Petr Tesarik; +Cc: kexec, d.hatayama, Atsushi Kumagai, zzou, bhe

On Fri, Apr 18, 2014 at 09:41:33PM +0200, Petr Tesarik wrote:
> On Fri, 18 Apr 2014 22:29:12 +0800
> "bhe@redhat.com" <bhe@redhat.com> wrote:
> 
> > 
> > > >> It definitely will cause OOM. On my test machine, it has 100G memory. So
> > > >> per old code, its needed_size is 3200K*2 == 6.4M, if currently free
> > > >> memory is only 15M left, the free_size will be 15M*0.4 which is 6M. So
> > > >> info->bufsize_cyclic is assigned to be 6M. and only 3M is left for other
> > > >> use, e.g page cache, dynamic allocation. OOM will happen.
> > > >>
> > > >
> > > >BTW, in our case, there's about 30M free memory when we started saving
> > > >dump. It should be caused by my coarse estimation above.
> > > 
> > > Thanks for your description, I understand that situation and
> > > the nature of the problem.
> > > 
> > > That is, the assumption that 20% of free memory is enough for
> > > makedumpfile can be broken if free memory is too small.
> > > If your machine has 200GB memory, OOM will happen even after fix
> > > the too allocation bug.
> > 
> > Well, we have done some experiments to try to get the statistical memory
> > range which kdump really need. Then a final reservation will be
> > calculated automatically as (base_value + linear growth of total memory). 
> > If one machine has 200GB memory, its reservation will grow too. Since
> > except of the bitmap cost, other memory cost is almost fixed. 
> > 
> > Per this scheme things should be go well, if memory always goes to the
> > edge of OOM, an adjust of base_value is needed. So a constant value as
> > you said may not be needed.
> > 
> > Instead, I am wondering how the 80% comes from, and why 20% of free
> > memory must be safe.
> 
> I believe these 80% come from the default value of vm.dirty_ratio,

Actually I had suggested this 80% number when --cyclic feature was
implemented. And I did not base it on dirty_ratio. Just a random
suggestion.

> which is 20%. In other words, the kernel won't block further writes
> until 20% of available RAM is used up by dirty cache. But if you
> fill up all free memory with dirty pages and then touch another (though
> allocated) page, the kernel will go into direct reclaim, and if nothing
> can be written out ATM, it will invoke the OOM Killer.

We can start playing with reducing dirty_ratio too and see how it goes.

Thanks
Vivek


* Re: [PATCH] makedumpfile: change the wrong code to calculate bufsize_cyclic for elf dump
  2014-04-21 15:19                     ` Vivek Goyal
@ 2014-04-21 15:46                       ` Petr Tesarik
  2014-04-21 15:51                         ` Vivek Goyal
  0 siblings, 1 reply; 38+ messages in thread
From: Petr Tesarik @ 2014-04-21 15:46 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: Atsushi Kumagai, d.hatayama, kexec, zzou, bhe

On Mon, 21 Apr 2014 11:19:14 -0400
Vivek Goyal <vgoyal@redhat.com> wrote:

> On Fri, Apr 18, 2014 at 09:41:33PM +0200, Petr Tesarik wrote:
> > On Fri, 18 Apr 2014 22:29:12 +0800
> > "bhe@redhat.com" <bhe@redhat.com> wrote:
>[...]
> > > Instead, I am wondering how the 80% comes from, and why 20% of free
> > > memory must be safe.
> > 
> > I believe these 80% come from the default value of vm.dirty_ratio,
> 
> Actually I had suggested this 80% number when --cyclic feature was
> implemented. And I did not base it on dirty_ratio. Just a random
> suggestion.

Hm. OK, you seem to have very good suggestions, even if they are
random. ;-)

> > which is 20%. In other words, the kernel won't block further writes
> > until 20% of available RAM is used up by dirty cache. But if you
> > fill up all free memory with dirty pages and then touch another (though
> > allocated) page, the kernel will go into direct reclaim, and if nothing
> > can be written out ATM, it will invoke the OOM Killer.
> 
> We can start playig with reducing dirty_raio too and see how does it go.

Or use direct I/O... Is it still broken?

Petr T


* Re: [PATCH] makedumpfile: change the wrong code to calculate bufsize_cyclic for elf dump
  2014-04-21 15:46                       ` Petr Tesarik
@ 2014-04-21 15:51                         ` Vivek Goyal
  0 siblings, 0 replies; 38+ messages in thread
From: Vivek Goyal @ 2014-04-21 15:51 UTC (permalink / raw)
  To: Petr Tesarik; +Cc: Atsushi Kumagai, d.hatayama, kexec, zzou, bhe

On Mon, Apr 21, 2014 at 05:46:30PM +0200, Petr Tesarik wrote:
> On Mon, 21 Apr 2014 11:19:14 -0400
> Vivek Goyal <vgoyal@redhat.com> wrote:
> 
> > On Fri, Apr 18, 2014 at 09:41:33PM +0200, Petr Tesarik wrote:
> > > On Fri, 18 Apr 2014 22:29:12 +0800
> > > "bhe@redhat.com" <bhe@redhat.com> wrote:
> >[...]
> > > > Instead, I am wondering how the 80% comes from, and why 20% of free
> > > > memory must be safe.
> > > 
> > > I believe these 80% come from the default value of vm.dirty_ratio,
> > 
> > Actually I had suggested this 80% number when --cyclic feature was
> > implemented. And I did not base it on dirty_ratio. Just a random
> > suggestion.
> 
> Hm. OK, you seem to have very good suggestions, even if they are
> random. ;-)
> 
> > > which is 20%. In other words, the kernel won't block further writes
> > > until 20% of available RAM is used up by dirty cache. But if you
> > > fill up all free memory with dirty pages and then touch another (though
> > > allocated) page, the kernel will go into direct reclaim, and if nothing
> > > can be written out ATM, it will invoke the OOM Killer.
> > 
> > We can start playig with reducing dirty_raio too and see how does it go.
> 
> Or use direct I/O... Is it still broken?

Maybe, if that turns out to be an issue. Or just reduce dirty_ratio to, say,
5 instead of the default 20. Different things can be tried once we know
what the root cause of the issue is.

Thanks
Vivek


* RE: [PATCH] makedumpfile: change the wrong code to calculate bufsize_cyclic for elf dump
  2014-04-21 15:12                 ` Vivek Goyal
@ 2014-04-23  7:55                   ` Atsushi Kumagai
  2014-04-23 11:55                     ` bhe
  2014-04-23 17:08                     ` Vivek Goyal
  0 siblings, 2 replies; 38+ messages in thread
From: Atsushi Kumagai @ 2014-04-23  7:55 UTC (permalink / raw)
  To: vgoyal; +Cc: d.hatayama, kexec, zzou, bhe

>On Fri, Apr 18, 2014 at 09:22:26AM +0000, Atsushi Kumagai wrote:
>> >On 04/17/14 at 12:52pm, Baoquan He wrote:
>> >> On 04/17/14 at 04:01am, Atsushi Kumagai wrote:
>> >> > Hello Baoquan,
>> >> >
>> >> > >Hi Atsushi,
>> >> > >
>> >> > >I have got the test machine where bug reported and did a test. The
>> >> > >changed code can make elf dump successful.
>> >> >
>> >> > Great, thanks for your help!
>> >> > However, I still have questions.
>> >> >
>> >> > First, what is the difference between yours and mine?
>> >> >
>> >> > http://lists.infradead.org/pipermail/kexec/2014-April/011535.html
>> >>
>> >> Yeah, you are right, it's the same on changing the code bug. I mush
>> >> haven't read your patch carefully.
>> >                                                          must<--
>> >>
>> >> >
>> >> > My patch includes renaming some values, but the purpose looks
>> >> > the same as yours.
>> >> > Further, you described as below,
>> >> >
>> >> > >On 04/14/14 at 04:02pm, Baoquan He wrote:
>> >> > but I still don't think this bug causes OOM.
>> >> > Even if needed_size is calculated as so much size wrongly, bufsize_cyclic
>> >> > will not exceed 40% of free memory by the check below:
>> >> >
>> >> >     info->bufsize_cyclic = (free_size <= needed_size) ? free_size : needed_size;
>> >> >
>> >> > So it looks that bitmap1(40%) and bitmap2(40%) will fit in 80% of free
>> >> > memory in any case.
>> >> >
>> >> > I may misunderstand something since your patch has an effect on this
>> >> > issue in practice, could you correct me?
>> >>
>> >> It definitely will cause OOM. On my test machine, it has 100G memory. So
>> >> per old code, its needed_size is 3200K*2 == 6.4M, if currently free
>> >> memory is only 15M left, the free_size will be 15M*0.4 which is 6M. So
>> >> info->bufsize_cyclic is assigned to be 6M. and only 3M is left for other
>> >> use, e.g page cache, dynamic allocation. OOM will happen.
>> >>
>> >
>> >BTW, in our case, there's about 30M free memory when we started saving
>> >dump. It should be caused by my coarse estimation above.
>>
>> Thanks for your description, I understand that situation and
>> the nature of the problem.
>>
>> That is, the assumption that 20% of free memory is enough for
>> makedumpfile can be broken if free memory is too small.
>> If your machine has 200GB memory, OOM will happen even after fix
>> the too allocation bug.
>
>Why? In cyclic mode, shouldn't makedumpfile's memory usage be fixed
>and should not be dependent on amount of RAM present in the system?

Strictly speaking, it's not fixed but just restricted by the safe
limit (80% of free memory), like below:

 - bitmap size: used for the 1st and 2nd bitmaps
 - remains: can be used for the other work of makedumpfile (e.g. I/O buffers)

                 pattern                      |  bitmap size  |   remains
----------------------------------------------+---------------+-------------
  A. 100G memory with the too allocation bug  |    12.8 MB    |   17.2 MB
  B. 100G memory with fixed makedumpfile      |     6.4 MB    |   23.6 MB
  C. 200G memory with fixed makedumpfile      |    12.8 MB    |   17.2 MB
  D. 300G memory with fixed makedumpfile      |    19.2 MB    |   10.8 MB
  E. 400G memory with fixed makedumpfile      |    24.0 MB    |    6.0 MB
  F. 500G memory with fixed makedumpfile      |    24.0 MB    |    6.0 MB
  ...

Baoquan got OOM in pattern A and didn't get it in B, so C must also
fail due to OOM. That is just what I wanted to say.
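
The "fixed makedumpfile" rows can be reproduced with a short standalone
calculation like the one below (an illustrative sketch, not makedumpfile
code; it assumes 4KB pages, the ~30MB of free memory Baoquan reported, and
the same loose "MB" unit of 1000 KiB that the figures above use):

#include <stdio.h>

#define BITPERBYTE 8ULL
#define MB (1000ULL * 1024)	/* loose "MB" (1000 KiB), to match the table */

int main(void)
{
	unsigned long long free_mem = 30 * MB;		/* ~30MB free, as reported */
	unsigned long long limit = free_mem * 0.4;	/* fixed code: 40% of free per bitmap */

	for (unsigned long long ram_gb = 100; ram_gb <= 500; ram_gb += 100) {
		unsigned long long max_mapnr = (ram_gb << 30) >> 12;	/* 4KB pages */
		unsigned long long bitmap  = max_mapnr / BITPERBYTE;	/* one bitmap */
		unsigned long long bufsize = (limit <= bitmap) ? limit : bitmap;
		unsigned long long total   = 2 * bufsize;		/* 1st + 2nd bitmap */

		printf("%3lluG RAM: bitmaps %4.1f MB, remains %4.1f MB\n",
		       ram_gb, (double)total / MB, (double)(free_mem - total) / MB);
	}
	return 0;
}

With ~30MB free, the remains shrink as RAM grows until the 80% cap takes over
around 400G, leaving only about 6MB for everything else.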

>Also, so even 30MB is not sufficient to run makedumpfile. That looks
>like a lot of free memory to me.

According to the table above and Baoquan's report, 23.6 MB was enough
but 17.2 MB was too little for the other work; that sounds like too much
of a requirement to me as well.

>> I don't think this is a problem, it's natural that a lack of memory
>> causes OOM. However, there is a thing we can do for improvement.
>>
>> What I think is:
>>
>>   1. Use a constant value as safe limit to calculate bufsize_cyclic
>>      instead of 80% of free memory. This value must be enough for
>>      makedumpfile's work except bitmap.
>>
>>   2. If free memory is smaller than the value, makedumpfile gives up
>>      to work early.
>
>What do we gain by makedumpfile giving up. System will reboot. System
>will reboot anyway after OOM.

Oh, you got a point. 

>>
>> This change may reduce the possibility of lack of memory, but the
>> required memory size will be changing every version, so maintaining
>> it sounds tough to me.
>
>I think we need to dive deeper to figure out why 30MB of free memory
>is not sufficient. To me something looks wrong here.

I agree. I'm trying to measure the peak memory footprint excluding the
cyclic buffer with a memory cgroup (memory.max_usage_in_bytes), but I
haven't managed it yet.

>Secondly, I think using absolute values is not a good idea. It will be
>very hard to keep track and udpate that value.
>
>At max we can warn saying makedumpfile needs X MB of memory and only
>Y MB of free memory is available.

Yes, I don't want to go that way, and maintaining a constant value
just to warn about the possibility of OOM sounds worthless.


Thanks
Atsushi Kumagai


* Re: [PATCH] makedumpfile: change the wrong code to calculate bufsize_cyclic for elf dump
  2014-04-21 15:14                   ` Vivek Goyal
@ 2014-04-23 11:09                     ` bhe
  0 siblings, 0 replies; 38+ messages in thread
From: bhe @ 2014-04-23 11:09 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: Atsushi Kumagai, d.hatayama, kexec, zzou


> > 
> > Well, we have done some experiments to try to get the statistical memory
> > range which kdump really need. Then a final reservation will be
> > calculated automatically as (base_value + linear growth of total memory). 
> > If one machine has 200GB memory, its reservation will grow too. Since
> > except of the bitmap cost, other memory cost is almost fixed. 
> > 
> > Per this scheme things should be go well, if memory always goes to the
> > edge of OOM, an adjust of base_value is needed. So a constant value as
> > you said may not be needed.
> 
> That logic is old and we probably should get rid of at some point. We
> don't want makedumpfile's memory usage to go up because system has
> more physical RAM. That's why cyclic mode was introduced.


> 
> > 
> > Instead, I am wondering how the 80% comes from, and why 20% of free
> > memory must be safe.
> 
> I had come up with this 80% number randomly. So you think that's the
> problem?


As Atsushi said, when the amount of memory goes up, the bitmap also grows.
Cyclic mode should solve this problem, but it seems it doesn't: in the bug
we found, the whole bitmap size always ends up in bufsize_cyclic, and that
is because of the 80% number.

We are trying to add debug code to see what is going on while makedumpfile
is running.

> 
> I am still scratching my head that why 30MB is not sufficient for
> makdumpfile.
> 
> Thanks
> Vivek
> 
> _______________________________________________
> kexec mailing list
> kexec@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec


* Re: [PATCH] makedumpfile: change the wrong code to calculate bufsize_cyclic for elf dump
  2014-04-23  7:55                   ` Atsushi Kumagai
@ 2014-04-23 11:55                     ` bhe
  2014-04-23 17:08                     ` Vivek Goyal
  1 sibling, 0 replies; 38+ messages in thread
From: bhe @ 2014-04-23 11:55 UTC (permalink / raw)
  To: Atsushi Kumagai; +Cc: d.hatayama, kexec, zzou, vgoyal

> >> Thanks for your description, I understand that situation and
> >> the nature of the problem.
> >>
> >> That is, the assumption that 20% of free memory is enough for
> >> makedumpfile can be broken if free memory is too small.
> >> If your machine has 200GB memory, OOM will happen even after fix
> >> the too allocation bug.
> >
> >Why? In cyclic mode, shouldn't makedumpfile's memory usage be fixed
> >and should not be dependent on amount of RAM present in the system?
> 
> Strictly speaking, it's not fixed but just restricted by the safe
> limit(80% of free memory) like below:
> 
>  - bitmap size: used for 1st and 2nd bitmaps
>  - remains: can be used for the other works of makedumpfile (e.g. I/O buffer)
> 
>                  pattern                      |  bitmap size  |   remains
> ----------------------------------------------+---------------+-------------
>   A. 100G memory with the too allocation bug  |    12.8 MB    |   17.2 MB
>   B. 100G memory with fixed makedumpfile      |     6.4 MB    |   23.6 MB
>   C. 200G memory with fixed makedumpfile      |    12.8 MB    |   17.2 MB
>   D. 300G memory with fixed makedumpfile      |    19.2 MB    |   10.8 MB
>   E. 400G memory with fixed makedumpfile      |    24.0 MB    |    6.0 MB
>   F. 500G memory with fixed makedumpfile      |    24.0 MB    |    6.0 MB
>   ...
> 
> Baoquan got OOM in A pattern and didn't get it in B, so C must also
> fail due to OOM. This is just what I wanted to say.

Thanks Atsushi for this detailed table.

In fact, in cyclic mode the cyclic bufsize is the dynamic value that should
be shrunk, not the remaining memory, which always ends up being squeezed.
Since the user chose cyclic mode, they are willing to accept that dumping
may take longer; but giving them an OOM instead is not acceptable.

So I think we need to think about how cyclic_bufsize can be calculated
better. Maybe adjust the 80% number dynamically?

Thanks
Baoquan



_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] makedumpfile: change the wrong code to calculate bufsize_cyclic for elf dump
  2014-04-23  7:55                   ` Atsushi Kumagai
  2014-04-23 11:55                     ` bhe
@ 2014-04-23 17:08                     ` Vivek Goyal
  2014-04-23 23:50                       ` bhe
  2014-04-28  5:04                       ` Atsushi Kumagai
  1 sibling, 2 replies; 38+ messages in thread
From: Vivek Goyal @ 2014-04-23 17:08 UTC (permalink / raw)
  To: Atsushi Kumagai; +Cc: d.hatayama, kexec, zzou, bhe

On Wed, Apr 23, 2014 at 07:55:19AM +0000, Atsushi Kumagai wrote:
> >On Fri, Apr 18, 2014 at 09:22:26AM +0000, Atsushi Kumagai wrote:
> >> >On 04/17/14 at 12:52pm, Baoquan He wrote:
> >> >> On 04/17/14 at 04:01am, Atsushi Kumagai wrote:
> >> >> > Hello Baoquan,
> >> >> >
> >> >> > >Hi Atsushi,
> >> >> > >
> >> >> > >I have got the test machine where bug reported and did a test. The
> >> >> > >changed code can make elf dump successful.
> >> >> >
> >> >> > Great, thanks for your help!
> >> >> > However, I still have questions.
> >> >> >
> >> >> > First, what is the difference between yours and mine?
> >> >> >
> >> >> > http://lists.infradead.org/pipermail/kexec/2014-April/011535.html
> >> >>
> >> >> Yeah, you are right, it's the same on changing the code bug. I mush
> >> >> haven't read your patch carefully.
> >> >                                                          must<--
> >> >>
> >> >> >
> >> >> > My patch includes renaming some values, but the purpose looks
> >> >> > the same as yours.
> >> >> > Further, you described as below,
> >> >> >
> >> >> > >On 04/14/14 at 04:02pm, Baoquan He wrote:
> >> >> > but I still don't think this bug causes OOM.
> >> >> > Even if needed_size is calculated as so much size wrongly, bufsize_cyclic
> >> >> > will not exceed 40% of free memory by the check below:
> >> >> >
> >> >> >     info->bufsize_cyclic = (free_size <= needed_size) ? free_size : needed_size;
> >> >> >
> >> >> > So it looks that bitmap1(40%) and bitmap2(40%) will fit in 80% of free
> >> >> > memory in any case.
> >> >> >
> >> >> > I may misunderstand something since your patch has an effect on this
> >> >> > issue in practice, could you correct me?
> >> >>
> >> >> It definitely will cause OOM. On my test machine, it has 100G memory. So
> >> >> per old code, its needed_size is 3200K*2 == 6.4M, if currently free
> >> >> memory is only 15M left, the free_size will be 15M*0.4 which is 6M. So
> >> >> info->bufsize_cyclic is assigned to be 6M. and only 3M is left for other
> >> >> use, e.g page cache, dynamic allocation. OOM will happen.
> >> >>
> >> >
> >> >BTW, in our case, there's about 30M free memory when we started saving
> >> >dump. It should be caused by my coarse estimation above.
> >>
> >> Thanks for your description, I understand that situation and
> >> the nature of the problem.
> >>
> >> That is, the assumption that 20% of free memory is enough for
> >> makedumpfile can be broken if free memory is too small.
> >> If your machine has 200GB memory, OOM will happen even after fix
> >> the too allocation bug.
> >
> >Why? In cyclic mode, shouldn't makedumpfile's memory usage be fixed
> >and should not be dependent on amount of RAM present in the system?
> 
> Strictly speaking, it's not fixed but just restricted by the safe
> limit(80% of free memory) like below:
> 

What I meant was memory allocation is of the order O(1) and not O(N).

>  - bitmap size: used for 1st and 2nd bitmaps
>  - remains: can be used for the other works of makedumpfile (e.g. I/O buffer)
> 
>                  pattern                      |  bitmap size  |   remains
> ----------------------------------------------+---------------+-------------
>   A. 100G memory with the too allocation bug  |    12.8 MB    |   17.2 MB
>   B. 100G memory with fixed makedumpfile      |     6.4 MB    |   23.6 MB
>   C. 200G memory with fixed makedumpfile      |    12.8 MB    |   17.2 MB
>   D. 300G memory with fixed makedumpfile      |    19.2 MB    |   10.8 MB
>   E. 400G memory with fixed makedumpfile      |    24.0 MB    |    6.0 MB
>   F. 500G memory with fixed makedumpfile      |    24.0 MB    |    6.0 MB
>   ...
> 
> Baoquan got OOM in A pattern and didn't get it in B, so C must also
> fail due to OOM. This is just what I wanted to say.

OK, so here the bitmap size is growing because we have not hit the
80%-of-available-memory limit yet, but it gets capped at 24MB once we do.
I think that's fine; that's what I was looking for.

Now the key question that remains is whether using 80% of free memory for
bitmaps is too much. Are other things happening in the system that consume
memory, so that OOM hits because memory is not available? If that's the
case we probably need to lower the amount of memory allocated to bitmaps,
say to 70%, 60%, or maybe 50%. But this should be data driven.


> 
> >Also, so even 30MB is not sufficient to run makedumpfile. That looks
> >like a lot of free memory to me.
> 
> According to the table above and Baoquan's report, 23.6MB was enough
> but 17.2 MB was short for the other works, it sounds too much requirement
> also to me.
> 
> >> I don't think this is a problem, it's natural that a lack of memory
> >> causes OOM. However, there is a thing we can do for improvement.
> >>
> >> What I think is:
> >>
> >>   1. Use a constant value as safe limit to calculate bufsize_cyclic
> >>      instead of 80% of free memory. This value must be enough for
> >>      makedumpfile's work except bitmap.
> >>
> >>   2. If free memory is smaller than the value, makedumpfile gives up
> >>      to work early.
> >
> >What do we gain by makedumpfile giving up. System will reboot. System
> >will reboot anyway after OOM.
> 
> Oh, you got a point. 
> 
> >>
> >> This change may reduce the possibility of lack of memory, but the
> >> required memory size will be changing every version, so maintaining
> >> it sounds tough to me.
> >
> >I think we need to dive deeper to figure out why 30MB of free memory
> >is not sufficient. To me something looks wrong here.
> 
> I agree. I'm trying to get the peak size of memory footprint except 
> cyclic_buffer with memory cgroup (memory.max_usage_in_bytes), but
> I don't make it yet.

That sounds like a good idea.

I am wondering if we are doing anything that requires the kernel to allocate
memory and thereby leads to OOM.

Thanks
Vivek

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] makedumpfile: change the wrong code to calculate bufsize_cyclic for elf dump
  2014-04-23 17:08                     ` Vivek Goyal
@ 2014-04-23 23:50                       ` bhe
  2014-04-24  2:05                         ` bhe
  2014-04-25 13:22                         ` Vivek Goyal
  2014-04-28  5:04                       ` Atsushi Kumagai
  1 sibling, 2 replies; 38+ messages in thread
From: bhe @ 2014-04-23 23:50 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: kexec, d.hatayama, Atsushi Kumagai, zzou

On 04/23/14 at 01:08pm, Vivek Goyal wrote:
 
> >  - bitmap size: used for 1st and 2nd bitmaps
> >  - remains: can be used for the other works of makedumpfile (e.g. I/O buffer)
> > 
> >                  pattern                      |  bitmap size  |   remains
> > ----------------------------------------------+---------------+-------------
> >   A. 100G memory with the too allocation bug  |    12.8 MB    |   17.2 MB
> >   B. 100G memory with fixed makedumpfile      |     6.4 MB    |   23.6 MB
> >   C. 200G memory with fixed makedumpfile      |    12.8 MB    |   17.2 MB
> >   D. 300G memory with fixed makedumpfile      |    19.2 MB    |   10.8 MB
> >   E. 400G memory with fixed makedumpfile      |    24.0 MB    |    6.0 MB
> >   F. 500G memory with fixed makedumpfile      |    24.0 MB    |    6.0 MB
> >   ...
> > 
> > Baoquan got OOM in A pattern and didn't get it in B, so C must also
> > fail due to OOM. This is just what I wanted to say.
> 
> ok, So here bitmap size is growing because we have not hit the 80% of 
> available memory limit yet. But it gets limited at 24MB once we hit
> 80% limit. I think that's fine. That's what I was looking for.
> 
> Now key question will remain is that is using 80% of free memory by
> bitmaps too much. Are other things happening in system which consume
> memory and because memory is not available OOM hits. If that's the
> case we probably need to lower the amount of memory allocated to
> bit maps. Say 70% or 60% or may be 50%. But this should be data driven.

How about adding another limit, say a leftover-memory safety limit, e.g. 20M?
If the remaining memory, i.e. the 20% of free memory, is bigger than 20M, 80%
can be taken to calculate the bitmap size. If it is smaller than 20M, we just
take (total memory - safety limit) for the bitmap size.
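
Roughly like this, as an untested sketch (the 20M constant is just an
example, and I'm reading "total memory" here as the free memory that
get_free_memory_size() reports):

#define SAFETY_LIMIT (20ULL << 20)      /* example: 20M kept free for everything else */

/* Sketch only: pick the bitmap budget from the current free memory size. */
static unsigned long long
calc_bitmap_budget(unsigned long long free_size)
{
        /* If the 20% we leave over today is already >= 20M, keep the 80% rule. */
        if (free_size / 5 >= SAFETY_LIMIT)
                return free_size * 4 / 5;

        /* Otherwise keep the safety limit free and use the rest for bitmaps. */
        if (free_size <= SAFETY_LIMIT)
                return 0;
        return free_size - SAFETY_LIMIT;
}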

Thanks
Baoquan

> 
> 

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] makedumpfile: change the wrong code to calculate bufsize_cyclic for elf dump
  2014-04-23 23:50                       ` bhe
@ 2014-04-24  2:05                         ` bhe
  2014-04-25 13:22                         ` Vivek Goyal
  1 sibling, 0 replies; 38+ messages in thread
From: bhe @ 2014-04-24  2:05 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: kexec, d.hatayama, Atsushi Kumagai, zzou

On 04/24/14 at 07:50am, Baoquan He wrote:
> On 04/23/14 at 01:08pm, Vivek Goyal wrote:
>  
> > >  - bitmap size: used for 1st and 2nd bitmaps
> > >  - remains: can be used for the other works of makedumpfile (e.g. I/O buffer)
> > > 
> > >                  pattern                      |  bitmap size  |   remains
> > > ----------------------------------------------+---------------+-------------
> > >   A. 100G memory with the too allocation bug  |    12.8 MB    |   17.2 MB
> > >   B. 100G memory with fixed makedumpfile      |     6.4 MB    |   23.6 MB
> > >   C. 200G memory with fixed makedumpfile      |    12.8 MB    |   17.2 MB
> > >   D. 300G memory with fixed makedumpfile      |    19.2 MB    |   10.8 MB
> > >   E. 400G memory with fixed makedumpfile      |    24.0 MB    |    6.0 MB
> > >   F. 500G memory with fixed makedumpfile      |    24.0 MB    |    6.0 MB
> > >   ...
> > > 
> > > Baoquan got OOM in A pattern and didn't get it in B, so C must also
> > > fail due to OOM. This is just what I wanted to say.
> > 
> > ok, So here bitmap size is growing because we have not hit the 80% of 
> > available memory limit yet. But it gets limited at 24MB once we hit
> > 80% limit. I think that's fine. That's what I was looking for.
> > 
> > Now key question will remain is that is using 80% of free memory by
> > bitmaps too much. Are other things happening in system which consume
> > memory and because memory is not available OOM hits. If that's the
> > case we probably need to lower the amount of memory allocated to
> > bit maps. Say 70% or 60% or may be 50%. But this should be data driven.
> 
> How about add anoter limit, say left memory safety limit, e.g 20M. If
> the remaining memory which is 20% of free memory is bigger than 20M, 80%
> can be taken to calculate the bitmap size. If smaller than 20M, we just
> take (total memory - safety limit) for bitmap size.

Oh, this is what Atsushi suggested earlier in his comments.


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] makedumpfile: change the wrong code to calculate bufsize_cyclic for elf dump
  2014-04-23 23:50                       ` bhe
  2014-04-24  2:05                         ` bhe
@ 2014-04-25 13:22                         ` Vivek Goyal
  2014-04-28  5:05                           ` Atsushi Kumagai
  1 sibling, 1 reply; 38+ messages in thread
From: Vivek Goyal @ 2014-04-25 13:22 UTC (permalink / raw)
  To: bhe; +Cc: kexec, d.hatayama, Atsushi Kumagai, zzou

On Thu, Apr 24, 2014 at 07:50:41AM +0800, bhe@redhat.com wrote:
> On 04/23/14 at 01:08pm, Vivek Goyal wrote:
>  
> > >  - bitmap size: used for 1st and 2nd bitmaps
> > >  - remains: can be used for the other works of makedumpfile (e.g. I/O buffer)
> > > 
> > >                  pattern                      |  bitmap size  |   remains
> > > ----------------------------------------------+---------------+-------------
> > >   A. 100G memory with the too allocation bug  |    12.8 MB    |   17.2 MB
> > >   B. 100G memory with fixed makedumpfile      |     6.4 MB    |   23.6 MB
> > >   C. 200G memory with fixed makedumpfile      |    12.8 MB    |   17.2 MB
> > >   D. 300G memory with fixed makedumpfile      |    19.2 MB    |   10.8 MB
> > >   E. 400G memory with fixed makedumpfile      |    24.0 MB    |    6.0 MB
> > >   F. 500G memory with fixed makedumpfile      |    24.0 MB    |    6.0 MB
> > >   ...
> > > 
> > > Baoquan got OOM in A pattern and didn't get it in B, so C must also
> > > fail due to OOM. This is just what I wanted to say.
> > 
> > ok, So here bitmap size is growing because we have not hit the 80% of 
> > available memory limit yet. But it gets limited at 24MB once we hit
> > 80% limit. I think that's fine. That's what I was looking for.
> > 
> > Now key question will remain is that is using 80% of free memory by
> > bitmaps too much. Are other things happening in system which consume
> > memory and because memory is not available OOM hits. If that's the
> > case we probably need to lower the amount of memory allocated to
> > bit maps. Say 70% or 60% or may be 50%. But this should be data driven.
> 
> How about add anoter limit, say left memory safety limit, e.g 20M. If
> the remaining memory which is 20% of free memory is bigger than 20M, 80%
> can be taken to calculate the bitmap size. If smaller than 20M, we just
> take (total memory - safety limit) for bitmap size.

I think adding another internal limit for makedumpfile usage sounds fine.
So say, if makedumpfile needs 5MB of memory for purposes other than the
bitmap, then subtract 5MB from total memory and take 80% of the remainder
to calculate the bitmap size. I think that should be reasonable.

The tricky bit here is figuring out how much memory makedumpfile needs.

A simpler solution would be to just reserve 60% of total memory for bitmaps
and leave the rest for makedumpfile, the kernel, and other components.

Thanks
Vivek

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH] makedumpfile: change the wrong code to calculate bufsize_cyclic for elf dump
  2014-04-23 17:08                     ` Vivek Goyal
  2014-04-23 23:50                       ` bhe
@ 2014-04-28  5:04                       ` Atsushi Kumagai
  2014-05-09  5:35                         ` Atsushi Kumagai
  1 sibling, 1 reply; 38+ messages in thread
From: Atsushi Kumagai @ 2014-04-28  5:04 UTC (permalink / raw)
  To: vgoyal; +Cc: d.hatayama, kexec, zzou, bhe

>> >> >> It definitely will cause OOM. On my test machine, it has 100G memory. So
>> >> >> per old code, its needed_size is 3200K*2 == 6.4M, if currently free
>> >> >> memory is only 15M left, the free_size will be 15M*0.4 which is 6M. So
>> >> >> info->bufsize_cyclic is assigned to be 6M. and only 3M is left for other
>> >> >> use, e.g page cache, dynamic allocation. OOM will happen.
>> >> >>
>> >> >
>> >> >BTW, in our case, there's about 30M free memory when we started saving
>> >> >dump. It should be caused by my coarse estimation above.
>> >>
>> >> Thanks for your description, I understand that situation and
>> >> the nature of the problem.
>> >>
>> >> That is, the assumption that 20% of free memory is enough for
>> >> makedumpfile can be broken if free memory is too small.
>> >> If your machine has 200GB memory, OOM will happen even after fix
>> >> the too allocation bug.
>> >
>> >Why? In cyclic mode, shouldn't makedumpfile's memory usage be fixed
>> >and should not be dependent on amount of RAM present in the system?
>>
>> Strictly speaking, it's not fixed but just restricted by the safe
>> limit(80% of free memory) like below:
>>
>
>What I meant was memory allocation is of the order O(1) and not O(N).
>
>>  - bitmap size: used for 1st and 2nd bitmaps
>>  - remains: can be used for the other works of makedumpfile (e.g. I/O buffer)
>>
>>                  pattern                      |  bitmap size  |   remains
>> ----------------------------------------------+---------------+-------------
>>   A. 100G memory with the too allocation bug  |    12.8 MB    |   17.2 MB
>>   B. 100G memory with fixed makedumpfile      |     6.4 MB    |   23.6 MB
>>   C. 200G memory with fixed makedumpfile      |    12.8 MB    |   17.2 MB
>>   D. 300G memory with fixed makedumpfile      |    19.2 MB    |   10.8 MB
>>   E. 400G memory with fixed makedumpfile      |    24.0 MB    |    6.0 MB
>>   F. 500G memory with fixed makedumpfile      |    24.0 MB    |    6.0 MB
>>   ...
>>
>> Baoquan got OOM in A pattern and didn't get it in B, so C must also
>> fail due to OOM. This is just what I wanted to say.
>
>ok, So here bitmap size is growing because we have not hit the 80% of
>available memory limit yet. But it gets limited at 24MB once we hit
>80% limit. I think that's fine. That's what I was looking for.
>
>Now key question will remain is that is using 80% of free memory by
>bitmaps too much. Are other things happening in system which consume
>memory and because memory is not available OOM hits. If that's the
>case we probably need to lower the amount of memory allocated to
>bit maps. Say 70% or 60% or may be 50%. But this should be data driven.
>
>
>>
>> >Also, so even 30MB is not sufficient to run makedumpfile. That looks
>> >like a lot of free memory to me.
>>
>> According to the table above and Baoquan's report, 23.6MB was enough
>> but 17.2 MB was short for the other works, it sounds too much requirement
>> also to me.
>>
>> >> I don't think this is a problem, it's natural that a lack of memory
>> >> causes OOM. However, there is a thing we can do for improvement.
>> >>
>> >> What I think is:
>> >>
>> >>   1. Use a constant value as safe limit to calculate bufsize_cyclic
>> >>      instead of 80% of free memory. This value must be enough for
>> >>      makedumpfile's work except bitmap.
>> >>
>> >>   2. If free memory is smaller than the value, makedumpfile gives up
>> >>      to work early.
>> >
>> >What do we gain by makedumpfile giving up. System will reboot. System
>> >will reboot anyway after OOM.
>>
>> Oh, you got a point.
>>
>> >>
>> >> This change may reduce the possibility of lack of memory, but the
>> >> required memory size will be changing every version, so maintaining
>> >> it sounds tough to me.
>> >
>> >I think we need to dive deeper to figure out why 30MB of free memory
>> >is not sufficient. To me something looks wrong here.
>>
>> I agree. I'm trying to get the peak size of memory footprint except
>> cyclic_buffer with memory cgroup (memory.max_usage_in_bytes), but
>> I don't make it yet.
>
>That sounds like a good idea.
>
>I am wondering if we are doing anything which requires kernel to allocate
>memory and that leads to OOM.

max_usage_in_bytes seems to include page cache, so I used a simpler way,
the patch at the bottom of this mail, instead.

Here is the result:

            parameter                  |      result
 dump_lv | buffer[KiB] |  mmap (=4MiB) |    VmHWM [KiB]
 --------+-------------+---------------+------------------
    d31  |       1     |       on      |        5872
   Ed31  |       1     |       on      |        5804
    d31  |       1     |      off      |        1784
   Ed31  |       1     |      off      |        1720
    d31  |    1000     |       on      |        8884  (*1)
   Ed31  |    1000     |       on      |        7812
    d31  |    1000     |      off      |        4788  (*1)
   Ed31  |    1000     |      off      |        3720

  *1. 2nd bitmap is allocated twice by the bug found by Arthur Zou.

According to this result and the estimate below, derived from the
design of makedumpfile,

  base size + (buffer size * 2) + (mmap size) = VmHWM

the base size may be about 2 MB, so about 6MB (base + mmap) will
be the threshold. If 20% of the available memory is less than 6MB,
OOM will happen.
As for Baoquan's case, the remaining 17.2MB sounds like enough even if
makedumpfile consumes 6MB for the other work.
So I don't know why OOM happened yet, but the current safety
limit still looks reasonable to me at least.

By the way, I noticed it's better to remove 4MB (for mmap)
from the available memory before calculating the bitmap buffer
size. I'll do this anyway.
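
For example (just a sketch, not the actual change; MAP_REGION_SIZE is an
assumed name for the 4MB mmap window on /proc/vmcore):

#define MAP_REGION_SIZE (4ULL << 20)    /* assumed 4MB mmap() window */

/* Sketch: take the mmap window off the table before applying the
 * existing 80% (or 40% for ELF) limit to the free memory. */
static unsigned long long
usable_free_memory(unsigned long long free_size)
{
        if (free_size <= MAP_REGION_SIZE)
                return 0;
        return free_size - MAP_REGION_SIZE;
}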


Thanks
Atsushi Kumagai

diff --git a/makedumpfile.c b/makedumpfile.c
index 0b31932..4b565f1 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -37,6 +37,14 @@ struct DumpInfo		*info = NULL;
 
 char filename_stdout[] = FILENAME_STDOUT;
 
+void
+print_VmHWM(void)
+{
+	char cmd[64];
+	sprintf(cmd, "grep VmHWM /proc/%d/status", getpid());
+	system(cmd);
+}
+
 static void first_cycle(mdf_pfn_t start, mdf_pfn_t max, struct cycle *cycle)
 {
 	cycle->start_pfn = round(start, info->pfn_cyclic);
@@ -9357,5 +9365,7 @@ out:
 	}
 	free_elf_info();
 
+	print_VmHWM();
+
 	return retcd;
 }

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* RE: [PATCH] makedumpfile: change the wrong code to calculate bufsize_cyclic for elf dump
  2014-04-25 13:22                         ` Vivek Goyal
@ 2014-04-28  5:05                           ` Atsushi Kumagai
  2014-04-28 12:50                             ` Vivek Goyal
  0 siblings, 1 reply; 38+ messages in thread
From: Atsushi Kumagai @ 2014-04-28  5:05 UTC (permalink / raw)
  To: vgoyal; +Cc: d.hatayama, kexec, zzou, bhe

>On Thu, Apr 24, 2014 at 07:50:41AM +0800, bhe@redhat.com wrote:
>> On 04/23/14 at 01:08pm, Vivek Goyal wrote:
>>
>> > >  - bitmap size: used for 1st and 2nd bitmaps
>> > >  - remains: can be used for the other works of makedumpfile (e.g. I/O buffer)
>> > >
>> > >                  pattern                      |  bitmap size  |   remains
>> > > ----------------------------------------------+---------------+-------------
>> > >   A. 100G memory with the too allocation bug  |    12.8 MB    |   17.2 MB
>> > >   B. 100G memory with fixed makedumpfile      |     6.4 MB    |   23.6 MB
>> > >   C. 200G memory with fixed makedumpfile      |    12.8 MB    |   17.2 MB
>> > >   D. 300G memory with fixed makedumpfile      |    19.2 MB    |   10.8 MB
>> > >   E. 400G memory with fixed makedumpfile      |    24.0 MB    |    6.0 MB
>> > >   F. 500G memory with fixed makedumpfile      |    24.0 MB    |    6.0 MB
>> > >   ...
>> > >
>> > > Baoquan got OOM in A pattern and didn't get it in B, so C must also
>> > > fail due to OOM. This is just what I wanted to say.
>> >
>> > ok, So here bitmap size is growing because we have not hit the 80% of
>> > available memory limit yet. But it gets limited at 24MB once we hit
>> > 80% limit. I think that's fine. That's what I was looking for.
>> >
>> > Now key question will remain is that is using 80% of free memory by
>> > bitmaps too much. Are other things happening in system which consume
>> > memory and because memory is not available OOM hits. If that's the
>> > case we probably need to lower the amount of memory allocated to
>> > bit maps. Say 70% or 60% or may be 50%. But this should be data driven.
>>
>> How about add anoter limit, say left memory safety limit, e.g 20M. If
>> the remaining memory which is 20% of free memory is bigger than 20M, 80%
>> can be taken to calculate the bitmap size. If smaller than 20M, we just
>> take (total memory - safety limit) for bitmap size.
>
>I think doing another internal limit for makedumpfile usage sounds fine.
>So say, if makedumpfile needs 5MB of memory for purposes other than
>bitmap, then remove 5MB from total memory and then take 80% of remaining
>memory to calculate bitmap size. I think that should be reasonable.
>
>Tricky bit here is to figure out how much memory does makedumpfile need.

Didn't you say that using such a value is a bad idea since it's hard to
keep it updated? Even if we got the needed memory size, it would change
with every version. I think this may be ideal in theory but not practical.

>A simpler solution will be to just reserve 60% of total memory for bitmaps
>and leave rest for makedumpfile and kernel and other components.

That's just tuning specific to you and Baoquan.

Now, I think this case is just a lack of free memory caused by an
inappropriate parameter setting for your environment. You should
increase crashkernel= to get enough free memory; 166M may be too
small for your environment.

By the way, I'm going on holiday for 8 days, I can't reply
during that period. Thanks in advance.


Atsushi Kumagai

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] makedumpfile: change the wrong code to calculate bufsize_cyclic for elf dump
  2014-04-28  5:05                           ` Atsushi Kumagai
@ 2014-04-28 12:50                             ` Vivek Goyal
  2014-05-09  5:36                               ` Atsushi Kumagai
  0 siblings, 1 reply; 38+ messages in thread
From: Vivek Goyal @ 2014-04-28 12:50 UTC (permalink / raw)
  To: Atsushi Kumagai; +Cc: d.hatayama, kexec, zzou, bhe

On Mon, Apr 28, 2014 at 05:05:00AM +0000, Atsushi Kumagai wrote:
> >On Thu, Apr 24, 2014 at 07:50:41AM +0800, bhe@redhat.com wrote:
> >> On 04/23/14 at 01:08pm, Vivek Goyal wrote:
> >>
> >> > >  - bitmap size: used for 1st and 2nd bitmaps
> >> > >  - remains: can be used for the other works of makedumpfile (e.g. I/O buffer)
> >> > >
> >> > >                  pattern                      |  bitmap size  |   remains
> >> > > ----------------------------------------------+---------------+-------------
> >> > >   A. 100G memory with the too allocation bug  |    12.8 MB    |   17.2 MB
> >> > >   B. 100G memory with fixed makedumpfile      |     6.4 MB    |   23.6 MB
> >> > >   C. 200G memory with fixed makedumpfile      |    12.8 MB    |   17.2 MB
> >> > >   D. 300G memory with fixed makedumpfile      |    19.2 MB    |   10.8 MB
> >> > >   E. 400G memory with fixed makedumpfile      |    24.0 MB    |    6.0 MB
> >> > >   F. 500G memory with fixed makedumpfile      |    24.0 MB    |    6.0 MB
> >> > >   ...
> >> > >
> >> > > Baoquan got OOM in A pattern and didn't get it in B, so C must also
> >> > > fail due to OOM. This is just what I wanted to say.
> >> >
> >> > ok, So here bitmap size is growing because we have not hit the 80% of
> >> > available memory limit yet. But it gets limited at 24MB once we hit
> >> > 80% limit. I think that's fine. That's what I was looking for.
> >> >
> >> > Now key question will remain is that is using 80% of free memory by
> >> > bitmaps too much. Are other things happening in system which consume
> >> > memory and because memory is not available OOM hits. If that's the
> >> > case we probably need to lower the amount of memory allocated to
> >> > bit maps. Say 70% or 60% or may be 50%. But this should be data driven.
> >>
> >> How about add anoter limit, say left memory safety limit, e.g 20M. If
> >> the remaining memory which is 20% of free memory is bigger than 20M, 80%
> >> can be taken to calculate the bitmap size. If smaller than 20M, we just
> >> take (total memory - safety limit) for bitmap size.
> >
> >I think doing another internal limit for makedumpfile usage sounds fine.
> >So say, if makedumpfile needs 5MB of memory for purposes other than
> >bitmap, then remove 5MB from total memory and then take 80% of remaining
> >memory to calculate bitmap size. I think that should be reasonable.
> >
> >Tricky bit here is to figure out how much memory does makedumpfile need.
> 
> Did you said using such value is bad idea since it's hard to update it?
> If we got the needed memory size, it would be changing every version.
> At least I think this may be an ideal way but not practical.

Yep, I am not too convinced about fixing makedumpfile memory usage at
a particular value.

> 
> >A simpler solution will be to just reserve 60% of total memory for bitmaps
> >and leave rest for makedumpfile and kernel and other components.
> 
> That's just specific tuning for you and Baoquan.
> 
> Now, I think this case is just lack of free memory caused by
> inappropriate parameter setting for your environment. You should
> increase crashkernel= to get enough free memory, 166M may be too
> small for your environment.

I don't think it is bad tuning on our side. makedumpfile had 30MB of free
memory when it was launched and OOM still happened.

30MB should be more than enough to save the dump.

> 
> By the way, I'm going on holiday for 8 days, I can't reply
> during that period. Thanks in advance.

Sure, talk to you more about this once you are back.

Thanks
Vivek

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH] makedumpfile: change the wrong code to calculate bufsize_cyclic for elf dump
  2014-04-28  5:04                       ` Atsushi Kumagai
@ 2014-05-09  5:35                         ` Atsushi Kumagai
  0 siblings, 0 replies; 38+ messages in thread
From: Atsushi Kumagai @ 2014-05-09  5:35 UTC (permalink / raw)
  To: kexec; +Cc: d.hatayama, bhe, zzou, vgoyal

>>I am wondering if we are doing anything which requires kernel to allocate
>>memory and that leads to OOM.
>
>max_usage_in_bytes seems to include page cache, so I used simpler way
>like the patch at the bottom of this mail instead.
>
>Here is the result:
>
>            parameter                  |      result
> dump_lv | buffer[KiB] |  mmap (=4MiB) |    VmHWM [KiB]
> --------+-------------+---------------+------------------
>    d31  |       1     |       on      |        5872
>   Ed31  |       1     |       on      |        5804
>    d31  |       1     |      off      |        1784
>   Ed31  |       1     |      off      |        1720
>    d31  |    1000     |       on      |        8884  (*1)
>   Ed31  |    1000     |       on      |        7812
>    d31  |    1000     |      off      |        4788  (*1)
>   Ed31  |    1000     |      off      |        3720
>
>  *1. 2nd bitmap is allocated twice by the bug found by Arthur Zou.
>
>According to this result and the estimation below produced from the
>design of makedumpfile,
>
>  base size + (buffer size * 2) + (for mmap size) = VmHWM
>
>the base size may be about 2 MB, so about 6MB (base + mmap) will
>be the deadline. If 20% of the available memory is less than 6MB,
>OOM will happen.
>As for Baoquan's case, remained 17.2MB sounds enough even if
>makedumpfile consumes 6MB as for the other works.
>So I don't know why OOM happened yet, but the current safety
>limit looks still reasonable to me at least.
>
>By the way, I noticed it's better to remove 4MB(for mmap)
>from the available memory before calculate the bitmap buffer
>size. I'll do this anyway.

Sorry, I did my test in a 1st kernel environment.
The dump file was on a disk and I used mmap() for it, so __do_fault()
occurred and the VmHWM (MAX(MM_FILEPAGES + MM_ANONPAGES)) increased
by 4MB.

Here is the result on a 2nd kernel environment:

             parameter                  |      result
  dump_lv | buffer[KiB] |  mmap (=4MiB) |    VmHWM [KiB]
  --------+-------------+---------------+------------------
     d31  |       1     |       on      |         776
    Ed31  |       1     |       on      |         712
     d31  |       1     |      off      |         704
    Ed31  |       1     |      off      |         708
     d31  |    1000     |       on      |        1776
    Ed31  |    1000     |       on      |        2716
     d31  |    1000     |      off      |        1660
    Ed31  |    1000     |      off      |        2556


According to this, I think there is no need to take care of the memory
usage of mmap() because the actual memory usage is small enough.


Thanks
Atsushi Kumagai

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH] makedumpfile: change the wrong code to calculate bufsize_cyclic for elf dump
  2014-04-28 12:50                             ` Vivek Goyal
@ 2014-05-09  5:36                               ` Atsushi Kumagai
  2014-05-09 20:49                                 ` Vivek Goyal
  2014-05-14  5:44                                 ` bhe
  0 siblings, 2 replies; 38+ messages in thread
From: Atsushi Kumagai @ 2014-05-09  5:36 UTC (permalink / raw)
  To: vgoyal; +Cc: d.hatayama, kexec, zzou, bhe

>On Mon, Apr 28, 2014 at 05:05:00AM +0000, Atsushi Kumagai wrote:
>> >On Thu, Apr 24, 2014 at 07:50:41AM +0800, bhe@redhat.com wrote:
>> >> On 04/23/14 at 01:08pm, Vivek Goyal wrote:
>> >>
>> >> > >  - bitmap size: used for 1st and 2nd bitmaps
>> >> > >  - remains: can be used for the other works of makedumpfile (e.g. I/O buffer)
>> >> > >
>> >> > >                  pattern                      |  bitmap size  |   remains
>> >> > > ----------------------------------------------+---------------+-------------
>> >> > >   A. 100G memory with the too allocation bug  |    12.8 MB    |   17.2 MB
>> >> > >   B. 100G memory with fixed makedumpfile      |     6.4 MB    |   23.6 MB
>> >> > >   C. 200G memory with fixed makedumpfile      |    12.8 MB    |   17.2 MB
>> >> > >   D. 300G memory with fixed makedumpfile      |    19.2 MB    |   10.8 MB
>> >> > >   E. 400G memory with fixed makedumpfile      |    24.0 MB    |    6.0 MB
>> >> > >   F. 500G memory with fixed makedumpfile      |    24.0 MB    |    6.0 MB
>> >> > >   ...
>> >> > >
>> >> > > Baoquan got OOM in A pattern and didn't get it in B, so C must also
>> >> > > fail due to OOM. This is just what I wanted to say.
>> >> >
>> >> > ok, So here bitmap size is growing because we have not hit the 80% of
>> >> > available memory limit yet. But it gets limited at 24MB once we hit
>> >> > 80% limit. I think that's fine. That's what I was looking for.
>> >> >
>> >> > Now key question will remain is that is using 80% of free memory by
>> >> > bitmaps too much. Are other things happening in system which consume
>> >> > memory and because memory is not available OOM hits. If that's the
>> >> > case we probably need to lower the amount of memory allocated to
>> >> > bit maps. Say 70% or 60% or may be 50%. But this should be data driven.
>> >>
>> >> How about add anoter limit, say left memory safety limit, e.g 20M. If
>> >> the remaining memory which is 20% of free memory is bigger than 20M, 80%
>> >> can be taken to calculate the bitmap size. If smaller than 20M, we just
>> >> take (total memory - safety limit) for bitmap size.
>> >
>> >I think doing another internal limit for makedumpfile usage sounds fine.
>> >So say, if makedumpfile needs 5MB of memory for purposes other than
>> >bitmap, then remove 5MB from total memory and then take 80% of remaining
>> >memory to calculate bitmap size. I think that should be reasonable.
>> >
>> >Tricky bit here is to figure out how much memory does makedumpfile need.
>>
>> Did you said using such value is bad idea since it's hard to update it?
>> If we got the needed memory size, it would be changing every version.
>> At least I think this may be an ideal way but not practical.
>
>Yep, I am not too convinced about fixing makedumpfile memory usage at
>a particular value.
>
>>
>> >A simpler solution will be to just reserve 60% of total memory for bitmaps
>> >and leave rest for makedumpfile and kernel and other components.
>>
>> That's just specific tuning for you and Baoquan.
>>
>> Now, I think this case is just lack of free memory caused by
>> inappropriate parameter setting for your environment. You should
>> increase crashkernel= to get enough free memory, 166M may be too
>> small for your environment.
>
>I don't think it is bad tuning from our side. makedumpfile has 30MB free
>memory when it was launched and still OOM happened.
>
>30MB should be more than enough to save dump.

There was more than 10 MB of memory remaining in your case (pattern A),
and it was probably used mostly as page cache, so it sounds like OOM
shouldn't have happened, since such pages are reclaimable.

I tried to reproduce OOM in my environment.
Unfortunately, I couldn't get a chance to use a large memory machine,
so I controlled the bitmap buffer size with --cyclic-buffer like below:

/ # free
              total         used         free       shared      buffers
  Mem:        37544        19796        17748            0           56
 Swap:            0            0            0
Total:        37544        19796        17748
/ # /mnt/usr/sbin/makedumpfile_static -E --cyclic-buffer=8000 /proc/vmcore /mnt/tmp/dumpfile.E
Copying data                       : [100.0 %] |

The dumpfile is saved to /mnt/tmp/dumpfile.E.

makedumpfile Completed.
VmHWM:     16456 kB
/ #

As above, OOM didn't happen even when makedumpfile consumed most of the
available memory (the remains were only 1MB).

Of course, OOM happened when the memory usage exceeded the limit:

/ # free
              total         used         free       shared      buffers
  Mem:        37544        21608        15936            0          368
 Swap:            0            0            0
Total:        37544        21608        15936
/ # /mnt/usr/sbin/makedumpfile_static -E --cyclic-buffer=8192 /proc/vmcore /mnt/tmp/dumpfile.E
Copying data                       : [  0.0 %] /Out of memory: Kill process 1389 (makedumpfile_st) score 428 or sacrifice child
Killed process 1389 (makedumpfile_st) total-vm:22196kB, anon-rss:16524kB, file-rss:8kB
KILL
/ #


I think we should investigate why OOM happened in your environment,
otherwise we can't decide on a safe limit for a user process's memory usage.


Thanks
Atsushi Kumagai

>>
>> By the way, I'm going on holiday for 8 days, I can't reply
>> during that period. Thanks in advance.
>
>Sure, talk to you more about this once you are back.
>
>Thanks
>Vivek
>
>_______________________________________________
>kexec mailing list
>kexec@lists.infradead.org
>http://lists.infradead.org/mailman/listinfo/kexec

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] makedumpfile: change the wrong code to calculate bufsize_cyclic for elf dump
  2014-05-09  5:36                               ` Atsushi Kumagai
@ 2014-05-09 20:49                                 ` Vivek Goyal
  2014-05-15  7:22                                   ` bhe
  2014-05-14  5:44                                 ` bhe
  1 sibling, 1 reply; 38+ messages in thread
From: Vivek Goyal @ 2014-05-09 20:49 UTC (permalink / raw)
  To: Atsushi Kumagai; +Cc: d.hatayama, kexec, zzou, bhe

On Fri, May 09, 2014 at 05:36:13AM +0000, Atsushi Kumagai wrote:

[..]
> I tried to reproduce OOM in my environment.
> Unfortunately, I couldn't get a chance to use a large memory machine,
> so I controlled the bitmap buffer size with --cyclic-buffer like below:
> 
> / # free
>               total         used         free       shared      buffers
>   Mem:        37544        19796        17748            0           56
>  Swap:            0            0            0
> Total:        37544        19796        17748
> / # /mnt/usr/sbin/makedumpfile_static -E --cyclic-buffer=8000 /proc/vmcore /mnt/tmp/dumpfile.E
> Copying data                       : [100.0 %] |
> 
> The dumpfile is saved to /mnt/tmp/dumpfile.E.
> 
> makedumpfile Completed.
> VmHWM:     16456 kB
> / #
> 
> As above, OOM didn't happen even when makedumpfile consumed most of the
> available memory (the remains were only 1MB).
> 
> Of course, OOM happened when the memory usage exceeded the limit:

Hi Atsushi,

I think this is the key point: how did makedumpfile exceed the limit? If
you don't specify --cyclic-buffer=X, then makedumpfile will allocate
80% of available memory. That would be roughly 16MB of cyclic buffer
(instead of 8MB).

In your test case it looks like makedumpfile used 16MB of memory, out of
which 8MB must have been used for bitmaps and the remaining 8MB for other purposes.

So clearly our calculation of using 80% of available memory for bitmaps
is not right. The remaining 20% of memory might not be enough to fulfil
the needs of makedumpfile.

We don't have a good way to determine how much *non-bitmap* memory
makedumpfile requires. I am wondering about making some assumptions
and modifying the formula for the bitmap memory.

- Assume makedumpfile requires 16MB of free memory for various purposes.
- Subtract 16MB from available memory and take 80% of remaining memory
  to calculate the bitmap memory size.
- Keep 4MB as minimum for bitmap buffer size.
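
In code, the idea would look roughly like this (a sketch only; the 16MB and
4MB figures are the assumptions above, not measured values):

#define NON_BITMAP_RESERVE (16ULL << 20)  /* assumed non-bitmap needs of makedumpfile */
#define MIN_BITMAP_BUFFER   (4ULL << 20)  /* floor for the cyclic bitmap buffer */

/* Sketch: subtract the assumed overhead first, then take 80% of the rest. */
static unsigned long long
calc_bitmap_budget(unsigned long long free_size)
{
        unsigned long long budget = MIN_BITMAP_BUFFER;

        if (free_size > NON_BITMAP_RESERVE)
                budget = (free_size - NON_BITMAP_RESERVE) * 4 / 5;

        return (budget < MIN_BITMAP_BUFFER) ? MIN_BITMAP_BUFFER : budget;
}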

One more person has reported a makedumpfile OOM (even without the -E option)
on a machine where 30MB of memory was also free. I think it is the same
issue: 24MB must have been allocated for bitmaps and the remaining
6MB is not sufficient to satisfy makedumpfile's needs.

What do you think?

Also, does anybody know what the impact of the bitmap buffer size is on dump
speed? Does dump speed change significantly with the bitmap buffer size?

Thanks
Vivek

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] makedumpfile: change the wrong code to calculate bufsize_cyclic for elf dump
  2014-05-09  5:36                               ` Atsushi Kumagai
  2014-05-09 20:49                                 ` Vivek Goyal
@ 2014-05-14  5:44                                 ` bhe
  1 sibling, 0 replies; 38+ messages in thread
From: bhe @ 2014-05-14  5:44 UTC (permalink / raw)
  To: Atsushi Kumagai; +Cc: d.hatayama, kexec, zzou, vgoyal

On 05/09/14 at 05:36am, Atsushi Kumagai wrote:
 
> There were more than 10 MB remain memory in your case (A pattern),
> and probably they were used as page cache mostly, so it sounds OOM
> couldn't happen since such pages are reclaimable.
> 
> I tried to reproduce OOM in my environment.
> Unfortunately, I couldn't get a chance to use a large memory machine,
> so I controlled the bitmap buffer size with --cyclic-buffer like below:
> 
> / # free
>               total         used         free       shared      buffers
>   Mem:        37544        19796        17748            0           56
>  Swap:            0            0            0
> Total:        37544        19796        17748
> / # /mnt/usr/sbin/makedumpfile_static -E --cyclic-buffer=8000 /proc/vmcore /mnt/tmp/dumpfile.E
> Copying data                       : [100.0 %] |
> 
> The dumpfile is saved to /mnt/tmp/dumpfile.E.
> 
> makedumpfile Completed.
> VmHWM:     16456 kB

Hi Atsushi,

What is the dump target in your test? The OOM bugs happened only when the
target was NFS. With other dump targets, including ssh and local fs, it
never happened even though those cyclic buffer size code bugs existed.

We added some debug code and ran tests; the results show the OOM happened
when the remaining memory was only about 2M. But when we drop the page
cache every 1000 writes, the dump completes fine. The OOM happened when
the page cache needed to allocate a page for writing.
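
The debug workaround was basically the following (a sketch, not our exact
patch; we call it after every 1000 writes):

#include <stdio.h>

/* Ask the kernel to drop clean page cache so that NFS writeback does
 * not eat the little free memory that is left. Writing "1" to
 * /proc/sys/vm/drop_caches frees pagecache only (needs root). */
static void drop_page_cache(void)
{
        FILE *f = fopen("/proc/sys/vm/drop_caches", "w");

        if (f) {
                fputs("1\n", f);
                fclose(f);
        }
}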

So could you adjust your test to an NFS dump with elf format or lzo
compression? I think an NFS dump has a heavy page cache effect which is
different from the others.

I will test separately on machines with 100G memory and 10G memory, and
will paste the results along with our configuration.

Thanks
Baoquan


> / #
> 
> As above, OOM didn't happen even when makedumpfile consumed most of the
> available memory (the remains were only 1MB).
> 
> Of course, OOM happened when the memory usage exceeded the limit:
> 
> / # free
>               total         used         free       shared      buffers
>   Mem:        37544        21608        15936            0          368
>  Swap:            0            0            0
> Total:        37544        21608        15936
> / # /mnt/usr/sbin/makedumpfile_static -E --cyclic-buffer=8192 /proc/vmcore /mnt/tmp/dumpfile.E
> Copying data                       : [  0.0 %] /Out of memory: Kill process 1389 (makedumpfile_st) score 428 or sacrifice child
> Killed process 1389 (makedumpfile_st) total-vm:22196kB, anon-rss:16524kB, file-rss:8kB
> KILL
> / #
> 
> 
> I think we should investigate why OOM happened in your environment,
> otherwise we can't decide a safety limit of a user process's memory usage.
> 
> 
> Thanks
> Atsushi Kumagai
> 
> >>
> >> By the way, I'm going on holiday for 8 days, I can't reply
> >> during that period. Thanks in advance.
> >

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] makedumpfile: change the wrong code to calculate bufsize_cyclic for elf dump
  2014-05-09 20:49                                 ` Vivek Goyal
@ 2014-05-15  7:22                                   ` bhe
  2014-05-15  9:10                                     ` Atsushi Kumagai
  0 siblings, 1 reply; 38+ messages in thread
From: bhe @ 2014-05-15  7:22 UTC (permalink / raw)
  To: Vivek Goyal, Atsushi Kumagai; +Cc: d.hatayama, kexec, zzou, bhe

On 05/09/14 at 04:49pm, Vivek Goyal wrote:
> On Fri, May 09, 2014 at 05:36:13AM +0000, Atsushi Kumagai wrote:
> 
> [..]
> > I tried to reproduce OOM in my environment.
> > Unfortunately, I couldn't get a chance to use a large memory machine,
> > so I controlled the bitmap buffer size with --cyclic-buffer like below:
> > 
> > / # free
> >               total         used         free       shared      buffers
> >   Mem:        37544        19796        17748            0           56
> >  Swap:            0            0            0
> > Total:        37544        19796        17748
> > / # /mnt/usr/sbin/makedumpfile_static -E --cyclic-buffer=8000 /proc/vmcore /mnt/tmp/dumpfile.E
> > Copying data                       : [100.0 %] |
> > 
> > The dumpfile is saved to /mnt/tmp/dumpfile.E.
> > 
> > makedumpfile Completed.
> > VmHWM:     16456 kB
> > / #
> > 
> > As above, OOM didn't happen even when makedumpfile consumed most of the
> > available memory (the remains were only 1MB).
> > 
> > Of course, OOM happened when the memory usage exceeded the limit:
> 
> Hi Atsushi,
> 
> I think this is the key point. How did makedumpfile exceed the limit. So
> if you don't specify --cyclic-buffer=X, then makedumpfile will allocate
> 80% of available memory. That would be roughly 16MB of cyclic buffer
> (instead of 8MB).

Hi,

Here I think Vivek is right. In Atsushi's test, with --cyclic-buffer=8000
and an elf dump, info->bufsize_cyclic is only 4M, because it needs to be
divided by 2 in the elf case.

I ran tests on my local machine with 16G memory, and the result surprised
me: even 70% is not safe. I didn't specify --cyclic-buffer=X like Atsushi
did; I just changed the code as below, which I think is more accurate.

@@ -9049,8 +9055,9 @@ calculate_cyclic_buffer_size(void) {
         * should be 40% of free memory to keep the size of cyclic
         * buffer within 80% of free memory.
         */
        if (info->flag_elf_dumpfile) {
-               limit_size = get_free_memory_size() * 0.4;
+               limit_size = get_free_memory_size() * 0.30;
        } else {
                limit_size = get_free_memory_size() * 0.8;
        }
@@ -9060,7 +9067,8 @@ calculate_cyclic_buffer_size(void) {
        if (info->num_dumpfile > 1)
                bitmap_size /= info->num_dumpfile;

-       info->bufsize_cyclic = MIN(limit_size, bitmap_size);
+       info->bufsize_cyclic = limit_size;

        return TRUE;
 }

All dumps are in elf format, with nfs and local fs targets tested
separately, and crashkernel=168M.

For the nfs dump, when bufsize_cyclic is 80% of memory, OOM happened 3
times out of 5 runs; when bufsize_cyclic is 70% of memory, OOM happened
2 times out of 7 runs. I paste the output in the (aaaa) section below,
which was taken with bufsize_cyclic at 80% of memory.


Since 70% is not safe, I thought a local fs dump might be better. So I
tried a local fs dump: when bufsize_cyclic is 80%, OOM happened 3 times
out of 5 runs; when it is 70%, 2 out of 5; at 60% it finally became safe.
I paste the output in the (bbbb) section below, which was taken with
bufsize_cyclic at 80% of memory.

I can't draw a conclusion yet; I'm just giving the test results for your reference.

(aaaa)
pre-pivot:/# cat /etc/kdump.conf 

nfs 10.66.16.116:/var/crash
path /var/crash
core_collector makedumpfile -E --message-level 1 -d 31
default shell
pre-pivot:/# 
pre-pivot:/# exit
kdump: dump target is 10.66.16.116:/var/crash
kdump: saving to
/sysroot/mnt//var/crash/10.66.16.106-2014.05.15-09:08:03/
kdump: saving vmcore-dmesg.txt
kdump: saving vmcore-dmesg.txt complete
kdump: saving vmcore

calculate_cyclic_buffer_size, free memory 34795520

Buffer size for the cyclic mode: 13918208
Excluding unnecessary pages        : [100.0 %] |
writeout_dumpfile,before write_elf_headerfree, memory 6819840
Excluding unnecessary pages        : [100.0 %] \
writeout_dumpfile,before write_elf_pages_cyclic, memory 6791168

In calculate_cyclic_buffer_size, free memory 6791168
Copying data                       : [  1.1 %] /
In calculate_cyclic_buffer_size, free memory 2412544
Excluding unnecessary pages        : [100.0 %] |
In calculate_cyclic_buffer_size, free memory 2691072
Excluding unnecessary pages        : [100.0 %] \
In calculate_cyclic_buffer_size, free memory 2756608
Copying data                       : [  2.8 %] |
In calculate_cyclic_buffer_size, free memory 2363392
Copying data                       : [ 17.2 %] \[   56.567080] usb
2-1.1: USB disconnect, device number 3
[   58.025437] usb 2-1.1: new low-speed USB device number 5 using
ehci-pci
[   58.125797] usb 2-1.1: New USB device found, idVendor=03f0,
idProduct=0b4a
[   58.132907] usb 2-1.1: New USB device strings: Mfr=1, Product=2,
SerialNumber=0
[   58.140431] usb 2-1.1: Product: USB Optical Mouse
[   58.145345] usb 2-1.1: Manufacturer: Logitech
[   58.153006] input: Logitech USB Optical Mouse as
/devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.1/2-1.1:1.0/input/input4
[   58.164942] hid-generic 0003:03F0:0B4A.0003: input,hidraw0: USB HID
v1.11 Mouse [Logitech USB Optical Mouse] on usb-0000:00:1d.0-1.1/inp0
[   58.646175] makedumpfile invoked oom-killer: gfp_mask=0x10200da,
order=0, oom_score_adj=0
[   58.654581] makedumpfile cpuset=/ mems_allowed=0
[   58.659411] CPU: 0 PID: 517 Comm: makedumpfile Not tainted
3.10.0-114.el7.x86_64 #1
[   58.667275] Hardware name: Hewlett-Packard HP Z420 Workstation/1589,
BIOS J61 v01.02 03/09/2012
[   58.676177]  ffff880032db4440 000000003ed297af ffff88002e4b3880
ffffffff815ec0ca
[   58.683882]  ffff88002e4b3910 ffffffff815e773d ffffffff810b6918
ffff880033aca0c0
[   58.691584]  ffffffff00000206 ffffffff00000000 0000000000000000
ffffffff81102e03
[   58.699293] Call Trace:
[   58.701959]  [<ffffffff815ec0ca>] dump_stack+0x19/0x1b
[   58.707309]  [<ffffffff815e773d>] dump_header+0x8e/0x214
[   58.712827]  [<ffffffff810b6918>] ? ktime_get_ts+0x48/0xe0
[   58.718529]  [<ffffffff81102e03>] ? delayacct_end+0x33/0xb0
[   58.724310]  [<ffffffff8114516e>] oom_kill_process+0x24e/0x3b0
[   58.730339]  [<ffffffff8106af3e>] ? has_capability_noaudit+0x1e/0x30
[   58.736895]  [<ffffffff81145996>] out_of_memory+0x4b6/0x4f0
[   58.742665]  [<ffffffff8114b55b>] __alloc_pages_nodemask+0xa7b/0xac0
[   58.749222]  [<ffffffff811885f9>] alloc_pages_current+0xa9/0x170
[   58.755440]  [<ffffffff81141957>] __page_cache_alloc+0x87/0xb0
[   58.761479]  [<ffffffff81142566>]
grab_cache_page_write_begin+0x76/0xd0
[   58.768304]  [<ffffffffa03f4767>] nfs_write_begin+0x77/0x210 [nfs]
[   58.774694]  [<ffffffff8114158e>]
generic_file_buffered_write+0x11e/0x290
[   58.781693]  [<ffffffff81050f00>] ?
rbt_memtype_copy_nth_element+0xa0/0xa0
[   58.788779]  [<ffffffff811436e5>]
__generic_file_aio_write+0x1d5/0x3e0
[   58.795512]  [<ffffffff8114394d>] generic_file_aio_write+0x5d/0xc0
[   58.801903]  [<ffffffffa03f3c1b>] nfs_file_write+0xbb/0x1d0 [nfs]
[   58.808206]  [<ffffffff811af21d>] do_sync_write+0x8d/0xd0
[   58.813796]  [<ffffffff811af9bd>] vfs_write+0xbd/0x1e0
[   58.819141]  [<ffffffff811af4a4>] ? generic_file_llseek+0x24/0x30
[   58.825434]  [<ffffffff811b0408>] SyS_write+0x58/0xb0
[   58.830676]  [<ffffffff815fc819>] system_call_fastpath+0x16/0x1b
[   58.836874] Mem-Info:
[   58.839349] Node 0 DMA per-cpu:
[   58.842714] CPU    0: hi:    0, btch:   1 usd:   0
[   58.847704] Node 0 DMA32 per-cpu:
[   58.851244] CPU    0: hi:   42, btch:   7 usd:   0
[   58.856235] active_anon:11294 inactive_anon:925 isolated_anon:0
[   58.856235]  active_file:1032 inactive_file:1653 isolated_file:32
[   58.856235]  unevictable:11030 dirty:0 writeback:1800 unstable:0
[   58.856235]  free:473 slab_reclaimable:2202 slab_unreclaimable:3849
[   58.856235]  mapped:1245 shmem:1103 pagetables:283 bounce:0
[   58.856235]  free_cma:0
[   58.891859] Node 0 DMA free:492kB min:4kB low:4kB high:4kB
active_anon:0kB inactive_anon:0kB active_file:4kB inactive_file:8kB
unevictabs
[   58.931901] lowmem_reserve[]: 0 122 122 122
[   58.936393] Node 0 DMA32 free:1400kB min:1412kB low:1764kB
high:2116kB active_anon:45176kB inactive_anon:3700kB active_file:4124kB
inacts
[   58.981849] lowmem_reserve[]: 0 0 0 0
[   58.985815] Node 0 DMA: 3*4kB (UM) 2*8kB (U) 1*16kB (M) 0*32kB 1*64kB
(U) 1*128kB (U) 1*256kB (U) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 4B
[   58.999026] Node 0 DMA32: 38*4kB (EM) 78*8kB (UEMR) 11*16kB (UER)
0*32kB 3*64kB (R) 0*128kB 1*256kB (R) 0*512kB 0*1024kB 0*2048kB 0*4096B
[   59.012831] Node 0 hugepages_total=0 hugepages_free=0
hugepages_surp=0 hugepages_size=2048kB
[   59.021469] 14852 total pagecache pages
[   59.025522] 0 pages in swap cache
[   59.029048] Swap cache stats: add 0, delete 0, find 0/0
[   59.034491] Free swap  = 0kB
[   59.037583] Total swap = 0kB
[   59.041703] 90211 pages RAM
[   59.044712] 53920 pages reserved
[   59.048157] 267612 pages shared
[   59.051514] 31588 pages non-shared
[   59.055133] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents
oom_score_adj name
[   59.063173] [   64]     0    64    13020      570      24        0
0 systemd-journal
[   59.072172] [  170]    32   170     9977      282      23        0
0 rpcbind
[   59.080474] [  174]     0   174    11120      416      25        0
0 rpc.statd
[   59.088960] [  179]     0   179     6366      169      16        0
0 rpc.idmapd
[   59.097551] [  201]     0   201     8729      425      22        0
-1000 systemd-udevd
[   59.106402] [  324]     0   324    25589     3179      47        0
0 dhclient
[   59.114815] [  383]     0   383     3101      553      16        0
0 dracut-pre-pivo
[   59.123832] [  517]     0   517    13333     7103      42        0
0 makedumpfile
[   59.132594] [  518]     0   518     8728      272      21        0
0 systemd-udevd
[   59.141442] [  519]     0   519     8728      214      21        0
0 systemd-udevd
[   59.150287] Out of memory: Kill process 517 (makedumpfile) score 190
or sacrifice child
[   59.158527] Killed process 517 (makedumpfile) total-vm:53332kB,
anon-rss:27524kB, file-rss:888kB
//lib/dracut/hooks/pre-pivot/9999-kdump.sh: line
Generating "/run/initramfs/rdsosreport.txt" 


(bbbb)
kdump: dump target is /dev/sda2
kdump: saving t[   13.492083] EXT4-fs (sda2): re-mounted. Opts:
data=ordered
o /sysroot//var/crash/127.0.0.1-2014.05.15-11:24:51/
kdump: saving vmcore-dmesg.txt
kdump: saving vmcore-dmesg.txt complete
kdump: saving vmcore

calculate_cyclic_buffer_size, free memory 77033472

Buffer size for the cyclic mode: 30813388
Excluding unnecessary pages        : [100.0 %] |
writeout_dumpfile,before write_elf_headerfree, memory 15233024
Excluding unnecessary pages        : [100.0 %] -
writeout_dumpfile,before write_elf_pages_cyclic, memory 15204352

In calculate_cyclic_buffer_size, free memory 15204352
Excluding unnecessary pages        : [100.0 %] /
In calculate_cyclic_buffer_size, free memory 2932736
Excluding unnecessary pages        : [100.0 %] |
In calculate_cyclic_buffer_size, free memory 2387968
Excluding unnecessary pages        : [100.0 %] \
In calculate_cyclic_buffer_size, free memory 2555904
Excluding unnecessary pages        : [100.0 %] -
In calculate_cyclic_buffer_size, free memory 2838528
Copying data                       : [  3.8 %] |[   15.406789]
makedumpfile invoked oom-killer: gfp_mask=0x10200da, order=0,
oom_score_adj=0
[   15.415222] makedumpfile cpuset=/ mems_allowed=0
[   15.420191] CPU: 0 PID: 287 Comm: makedumpfile Not tainted
3.10.0-114.el7.x86_64 #1
[   15.428092] Hardware name: Hewlett-Packard HP Z420 Workstation/1589,
BIOS J61 v01.02 03/09/2012
[   15.437280]  ffff88002f8196c0 0000000043db25d9 ffff88002ff2b7e0
ffffffff815ec0ca
[   15.445105]  ffff88002ff2b870 ffffffff815e773d ffffffff810b6918
ffff880030018bb0
[   15.452935]  ffffffff00000206 ffffffff00000000 0000000000000000
ffffffff81102e03
[   15.460693] Call Trace:
[   15.463375]  [<ffffffff815ec0ca>] dump_stack+0x19/0x1b
[   15.468743]  [<ffffffff815e773d>] dump_header+0x8e/0x214
[   15.474288]  [<ffffffff810b6918>] ? ktime_get_ts+0x48/0xe0
[   15.479999]  [<ffffffff81102e03>] ? delayacct_end+0x33/0xb0
[   15.485790]  [<ffffffff8114516e>] oom_kill_process+0x24e/0x3b0
[   15.491848]  [<ffffffff8106af3e>] ? has_capability_noaudit+0x1e/0x30
[   15.498430]  [<ffffffff81145996>] out_of_memory+0x4b6/0x4f0
[   15.504222]  [<ffffffff8114b55b>] __alloc_pages_nodemask+0xa7b/0xac0
[   15.510793]  [<ffffffff811885f9>] alloc_pages_current+0xa9/0x170
[   15.517020]  [<ffffffff81141957>] __page_cache_alloc+0x87/0xb0
[   15.523061]  [<ffffffff81142566>]
grab_cache_page_write_begin+0x76/0xd0
[   15.529898]  [<ffffffffa02ab133>] ext4_da_write_begin+0xa3/0x330
[ext4]
[   15.536728]  [<ffffffff8114158e>]
generic_file_buffered_write+0x11e/0x290
[   15.543724]  [<ffffffff811436e5>]
__generic_file_aio_write+0x1d5/0x3e0
[   15.550462]  [<ffffffff81050f00>] ?
rbt_memtype_copy_nth_element+0xa0/0xa0
[   15.557551]  [<ffffffff8114394d>] generic_file_aio_write+0x5d/0xc0
[   15.563952]  [<ffffffffa02a1189>] ext4_file_write+0xa9/0x450 [ext4]
[   15.570434]  [<ffffffff811797fc>] ? free_vmap_area_noflush+0x7c/0x90
[   15.577009]  [<ffffffff811af21d>] do_sync_write+0x8d/0xd0
[   15.582616]  [<ffffffff811af9bd>] vfs_write+0xbd/0x1e0
[   15.587967]  [<ffffffff811b0408>] SyS_write+0x58/0xb0
[   15.593223]  [<ffffffff815fc819>] system_call_fastpath+0x16/0x1b
[   15.599437] Mem-Info:
[   15.601912] Node 0 DMA per-cpu:
[   15.605283] CPU    0: hi:    0, btch:   1 usd:   0
[   15.610279] Node 0 DMA32 per-cpu:
[   15.613825] CPU    0: hi:   42, btch:   7 usd:  29
[   15.618827] active_anon:16072 inactive_anon:924 isolated_anon:0
[   15.618827]  active_file:2075 inactive_file:2232 isolated_file:0
[   15.618827]  unevictable:6464 dirty:0 writeback:3 unstable:0
[   15.618827]  free:511 slab_reclaimable:2077 slab_unreclaimable:3528
[   15.618827]  mapped:1055 shmem:1088 pagetables:146 bounce:0
[   15.618827]  free_cma:0
[   15.654098] Node 0 DMA free:508kB min:4kB low:4kB high:4kB
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB
unevictabs
[   15.693926] lowmem_reserve[]: 0 129 129 129
[   15.698429] Node 0 DMA32 free:1536kB min:1452kB low:1812kB
high:2176kB active_anon:64288kB inactive_anon:3696kB active_file:8300kB
inacto
[   15.742999] lowmem_reserve[]: 0 0 0 0
[   15.746981] Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 1*32kB (U)
1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 0*1024kB 0*2048kB 0*4096kB B
[   15.760499] Node 0 DMA32: 6*4kB (M) 7*8kB (R) 3*16kB (R) 2*32kB (M)
1*64kB (M) 0*128kB 1*256kB (R) 0*512kB 1*1024kB (R) 0*2048kB 0*4096kB
[   15.774255] Node 0 hugepages_total=0 hugepages_free=0
hugepages_surp=0 hugepages_size=2048kB
[   15.782928] 11860 total pagecache pages
[   15.786986] 0 pages in swap cache
[   15.790524] Swap cache stats: add 0, delete 0, find 0/0
[   15.795976] Free swap  = 0kB
[   15.799080] Total swap = 0kB
[   15.803169] 90211 pages RAM
[   15.806188] 53919 pages reserved
[   15.809635] 6131 pages shared
[   15.812820] 29843 pages non-shared
[   15.816437] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents
oom_score_adj name
[   15.824495] [   64]     0    64    13020      559      23        0
0 systemd-journal
[   15.833506] [  119]     0   119     8864      549      23        0
-1000 systemd-udevd
[   15.842347] [  204]     0   204     3071      531      15        0
0 dracut-pre-pivo
[   15.851353] [  287]     0   287    21581    15351      58        0
0 makedumpfile
[   15.860103] Out of memory: Kill process 287 (makedumpfile) score 411
or sacrifice child
[   15.868333] Killed process 287 (makedumpfile) total-vm:86324kB,
anon-rss:60516kB, file-rss:888kB
//lib/dracut/hooks/pre-pivot/999
Generating "/run/initramfs/rdsosreport.txt"



_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH] makedumpfile: change the wrong code to calculate bufsize_cyclic for elf dump
  2014-05-15  7:22                                   ` bhe
@ 2014-05-15  9:10                                     ` Atsushi Kumagai
  2014-05-19 11:15                                       ` bhe
  0 siblings, 1 reply; 38+ messages in thread
From: Atsushi Kumagai @ 2014-05-15  9:10 UTC (permalink / raw)
  To: bhe, vgoyal; +Cc: d.hatayama, kexec, zzou

>On 05/09/14 at 04:49pm, Vivek Goyal wrote:
>> On Fri, May 09, 2014 at 05:36:13AM +0000, Atsushi Kumagai wrote:
>>
>> [..]
>> > I tried to reproduce OOM in my environment.
>> > Unfortunately, I couldn't get a chance to use a large memory machine,
>> > so I controlled the bitmap buffer size with --cyclic-buffer like below:
>> >
>> > / # free
>> >               total         used         free       shared      buffers
>> >   Mem:        37544        19796        17748            0           56
>> >  Swap:            0            0            0
>> > Total:        37544        19796        17748
>> > / # /mnt/usr/sbin/makedumpfile_static -E --cyclic-buffer=8000 /proc/vmcore /mnt/tmp/dumpfile.E
>> > Copying data                       : [100.0 %] |
>> >
>> > The dumpfile is saved to /mnt/tmp/dumpfile.E.
>> >
>> > makedumpfile Completed.
>> > VmHWM:     16456 kB
>> > / #
>> >
>> > As above, OOM didn't happen even when makedumpfile consumed most of the
>> > available memory (the remains were only 1MB).
>> >
>> > Of course, OOM happened when the memory usage exceeded the limit:
>>
>> Hi Atsushi,
>>
>> I think this is the key point. How did makedumpfile exceed the limit? So
>> if you don't specify --cyclic-buffer=X, then makedumpfile will allocate
>> 80% of available memory. That would be roughly 16MB of cyclic buffer
>> (instead of 8MB).
>
>Hi,
>
>Here I think Vivek is right. In Atsushi's test, when
>--cyclic-buffer=8000 and an elf dump, info->bufsize_cyclic is only 4M,
>because it needs to be divided by 2 for the elf case.

No, 16MB was used for bitmaps in my case too.
The --cyclic-buffer option specifies the size of each bitmap, so the allocated
bitmap size is double the --cyclic-buffer value.
(But currently this behavior applies only to the ELF case; just the specified
size is allocated in the kdump case. Yes, it's confusing...)

       --cyclic-buffer buffer_size
              Specify the buffer size in kilo bytes for analysis in the cyclic
              mode.  Actually, the double of buffer_size kilo  bytes  will  be
              allocated  in  memory.  In the cyclic mode, the number of cycles
              is represented as:

So only 456KB was required for other purposes; 20% of available
memory (3.5MB) was enough, so the current assumption is safe even
when available memory is only 17MB, as in my test.
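
To put the allocation behavior above into rough pseudo-C (just my illustration,
not the actual makedumpfile code; buffer_size is a placeholder for the
--cyclic-buffer value in bytes):

        /* --cyclic-buffer specifies the size of each bitmap */
        if (info->flag_elf_dumpfile)
                /* ELF path: the 1st and the 2nd bitmap are both allocated */
                total_bitmap_memory = buffer_size * 2;
        else
                /* kdump-compressed path: only the specified size is allocated */
                total_bitmap_memory = buffer_size;

So with --cyclic-buffer=8000, the ELF path really consumed about 16MB for
bitmaps in my test.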

According to my test:

http://lists.infradead.org/pipermail/kexec/2014-May/011784.html

> Here is the result on a 2nd kernel environment:
>
>              parameter                  |      result
>   dump_lv | buffer[KiB] |  mmap (=4MiB) |    VmHWM [KiB]
>   --------+-------------+---------------+------------------
>      d31  |       1     |       on      |         776    // about 700
>     Ed31  |       1     |       on      |         712
>      d31  |       1     |      off      |         704
>     Ed31  |       1     |      off      |         708
>      d31  |    1000     |       on      |        1776    // about 700 + 1000
>     Ed31  |    1000     |       on      |        2716    // about 700 + 1000 * 2
>      d31  |    1000     |      off      |        1660
>     Ed31  |    1000     |      off      |        2556

The required memory size for other purposes seems to be about 700KB when the
dump level is 31; it's so small that we can ignore it.
In practice the size will increase on the order of KB with the system memory
size (this assumption comes from my code investigation), but 6MB (20% of 30MB)
still sounds like more than enough as a safety limit.
This is why I was wondering why OOM happened on your machine.

Now that I've heard that the OOM happened only on nfs, I guess nfs itself
requires a lot of memory. (I did my test only on a local fs.)
If we can determine the required size, we could reflect it in the safety limit.
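
For example, something like this rough sketch (not real code;
estimated_fs_overhead is a hypothetical value we would have to measure first):

        /* leave room for the filesystem's dirty/writeback pages
         * before applying the percentage limit */
        free_size = get_free_memory_size();
        if (free_size > estimated_fs_overhead)
                free_size -= estimated_fs_overhead;

        if (info->flag_elf_dumpfile)
                limit_size = free_size * 0.4;
        else
                limit_size = free_size * 0.8;

But first we need to know how much memory nfs actually needs.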

I haven't checked your test result below yet, so I'll mention it later.

BTW, I prepared a temporary branch "oom" to investigate this issue,
let's use it from now on:

http://sourceforge.net/p/makedumpfile/code/ci/oom/tree/


Thanks
Atsushi Kumagai

>I ran tests on my local machine with 16G memory, and the result surprised
>me. In my tests, even 70% is not safe. I didn't specify
>--cyclic-buffer=X like Atsushi did, I just changed the code like below, which
>I think is more accurate.
>
>@@ -9049,8 +9055,9 @@ calculate_cyclic_buffer_size(void) {
>         * should be 40% of free memory to keep the size of cyclic
>         * buffer
>         * within 80% of free memory.
>         */
>        if (info->flag_elf_dumpfile) {
>-               limit_size = get_free_memory_size() * 0.4;
>+               limit_size = get_free_memory_size() * 0.30;
>        } else {
>                limit_size = get_free_memory_size() * 0.8;
>        }
>@@ -9060,7 +9067,8 @@ calculate_cyclic_buffer_size(void) {
>        if (info->num_dumpfile > 1)
>                bitmap_size /= info->num_dumpfile;
>
>-       info->bufsize_cyclic = MIN(limit_size, bitmap_size);
>+       info->bufsize_cyclic = limit_size;
>
>        return TRUE;
> }
>
>All dumps are in elf format, and the targets are nfs and a local fs,
>tested separately. And crashkernel=168M.
>
>Then for the nfs dump, when bufsize_cyclic is 80% of memory, OOM happened 3
>times out of 5 runs. When bufsize_cyclic is 70% of memory, OOM happened
>2 times out of 7 runs. I paste the output in the following
>(aaaa) section; it uses 80% of memory for bufsize_cyclic.
>
>
>Then I thought 70% is not safe, and maybe a local fs dump would be better. So
>I tried a local fs dump: when bufsize_cyclic is 80%, OOM happened 3 times out
>of 5 test runs. When bufsize_cyclic is 70%, 2 of 5. When 60%, it was finally
>safe. I paste the output in the following (bbbb)
>section; it uses 80% of memory for bufsize_cyclic.
>
>I can't draw a conclusion, I'm just giving the test results for your reference.
>
>(aaaa)
>pre-pivot:/# cat /etc/kdump.conf
>
>nfs 10.66.16.116:/var/crash
>path /var/crash
>core_collector makedumpfile -E --message-level 1 -d 31
>default shell
>pre-pivot:/#
>pre-pivot:/# exit
>kdump: dump target is 10.66.16.116:/var/crash
>kdump: saving to
>/sysroot/mnt//var/crash/10.66.16.106-2014.05.15-09:08:03/
>kdump: saving vmcore-dmesg.txt
>kdump: saving vmcore-dmesg.txt complete
>kdump: saving vmcore
>
>calculate_cyclic_buffer_size, free memory 34795520
>
>Buffer size for the cyclic mode: 13918208
>Excluding unnecessary pages        : [100.0 %] |
>writeout_dumpfile,before write_elf_headerfree, memory 6819840
>Excluding unnecessary pages        : [100.0 %] \
>writeout_dumpfile,before write_elf_pages_cyclic, memory 6791168
>
>In calculate_cyclic_buffer_size, free memory 6791168
>Copying data                       : [  1.1 %] /
>In calculate_cyclic_buffer_size, free memory 2412544
>Excluding unnecessary pages        : [100.0 %] |
>In calculate_cyclic_buffer_size, free memory 2691072
>Excluding unnecessary pages        : [100.0 %] \
>In calculate_cyclic_buffer_size, free memory 2756608
>Copying data                       : [  2.8 %] |
>In calculate_cyclic_buffer_size, free memory 2363392
>Copying data                       : [ 17.2 %] \[   56.567080] usb
>2-1.1: USB disconnect, device number 3
>[   58.025437] usb 2-1.1: new low-speed USB device number 5 using
>ehci-pci
>[   58.125797] usb 2-1.1: New USB device found, idVendor=03f0,
>idProduct=0b4a
>[   58.132907] usb 2-1.1: New USB device strings: Mfr=1, Product=2,
>SerialNumber=0
>[   58.140431] usb 2-1.1: Product: USB Optical Mouse
>[   58.145345] usb 2-1.1: Manufacturer: Logitech
>[   58.153006] input: Logitech USB Optical Mouse as
>/devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.1/2-1.1:1.0/input/input4
>[   58.164942] hid-generic 0003:03F0:0B4A.0003: input,hidraw0: USB HID
>v1.11 Mouse [Logitech USB Optical Mouse] on usb-0000:00:1d.0-1.1/inp0
>[   58.646175] makedumpfile invoked oom-killer: gfp_mask=0x10200da,
>order=0, oom_score_adj=0
>[   58.654581] makedumpfile cpuset=/ mems_allowed=0
>[   58.659411] CPU: 0 PID: 517 Comm: makedumpfile Not tainted
>3.10.0-114.el7.x86_64 #1
>[   58.667275] Hardware name: Hewlett-Packard HP Z420 Workstation/1589,
>BIOS J61 v01.02 03/09/2012
>[   58.676177]  ffff880032db4440 000000003ed297af ffff88002e4b3880
>ffffffff815ec0ca
>[   58.683882]  ffff88002e4b3910 ffffffff815e773d ffffffff810b6918
>ffff880033aca0c0
>[   58.691584]  ffffffff00000206 ffffffff00000000 0000000000000000
>ffffffff81102e03
>[   58.699293] Call Trace:
>[   58.701959]  [<ffffffff815ec0ca>] dump_stack+0x19/0x1b
>[   58.707309]  [<ffffffff815e773d>] dump_header+0x8e/0x214
>[   58.712827]  [<ffffffff810b6918>] ? ktime_get_ts+0x48/0xe0
>[   58.718529]  [<ffffffff81102e03>] ? delayacct_end+0x33/0xb0
>[   58.724310]  [<ffffffff8114516e>] oom_kill_process+0x24e/0x3b0
>[   58.730339]  [<ffffffff8106af3e>] ? has_capability_noaudit+0x1e/0x30
>[   58.736895]  [<ffffffff81145996>] out_of_memory+0x4b6/0x4f0
>[   58.742665]  [<ffffffff8114b55b>] __alloc_pages_nodemask+0xa7b/0xac0
>[   58.749222]  [<ffffffff811885f9>] alloc_pages_current+0xa9/0x170
>[   58.755440]  [<ffffffff81141957>] __page_cache_alloc+0x87/0xb0
>[   58.761479]  [<ffffffff81142566>]
>grab_cache_page_write_begin+0x76/0xd0
>[   58.768304]  [<ffffffffa03f4767>] nfs_write_begin+0x77/0x210 [nfs]
>[   58.774694]  [<ffffffff8114158e>]
>generic_file_buffered_write+0x11e/0x290
>[   58.781693]  [<ffffffff81050f00>] ?
>rbt_memtype_copy_nth_element+0xa0/0xa0
>[   58.788779]  [<ffffffff811436e5>]
>__generic_file_aio_write+0x1d5/0x3e0
>[   58.795512]  [<ffffffff8114394d>] generic_file_aio_write+0x5d/0xc0
>[   58.801903]  [<ffffffffa03f3c1b>] nfs_file_write+0xbb/0x1d0 [nfs]
>[   58.808206]  [<ffffffff811af21d>] do_sync_write+0x8d/0xd0
>[   58.813796]  [<ffffffff811af9bd>] vfs_write+0xbd/0x1e0
>[   58.819141]  [<ffffffff811af4a4>] ? generic_file_llseek+0x24/0x30
>[   58.825434]  [<ffffffff811b0408>] SyS_write+0x58/0xb0
>[   58.830676]  [<ffffffff815fc819>] system_call_fastpath+0x16/0x1b
>[   58.836874] Mem-Info:
>[   58.839349] Node 0 DMA per-cpu:
>[   58.842714] CPU    0: hi:    0, btch:   1 usd:   0
>[   58.847704] Node 0 DMA32 per-cpu:
>[   58.851244] CPU    0: hi:   42, btch:   7 usd:   0
>[   58.856235] active_anon:11294 inactive_anon:925 isolated_anon:0
>[   58.856235]  active_file:1032 inactive_file:1653 isolated_file:32
>[   58.856235]  unevictable:11030 dirty:0 writeback:1800 unstable:0
>[   58.856235]  free:473 slab_reclaimable:2202 slab_unreclaimable:3849
>[   58.856235]  mapped:1245 shmem:1103 pagetables:283 bounce:0
>[   58.856235]  free_cma:0
>[   58.891859] Node 0 DMA free:492kB min:4kB low:4kB high:4kB
>active_anon:0kB inactive_anon:0kB active_file:4kB inactive_file:8kB
>unevictabs
>[   58.931901] lowmem_reserve[]: 0 122 122 122
>[   58.936393] Node 0 DMA32 free:1400kB min:1412kB low:1764kB
>high:2116kB active_anon:45176kB inactive_anon:3700kB active_file:4124kB
>inacts
>[   58.981849] lowmem_reserve[]: 0 0 0 0
>[   58.985815] Node 0 DMA: 3*4kB (UM) 2*8kB (U) 1*16kB (M) 0*32kB 1*64kB
>(U) 1*128kB (U) 1*256kB (U) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 4B
>[   58.999026] Node 0 DMA32: 38*4kB (EM) 78*8kB (UEMR) 11*16kB (UER)
>0*32kB 3*64kB (R) 0*128kB 1*256kB (R) 0*512kB 0*1024kB 0*2048kB 0*4096B
>[   59.012831] Node 0 hugepages_total=0 hugepages_free=0
>hugepages_surp=0 hugepages_size=2048kB
>[   59.021469] 14852 total pagecache pages
>[   59.025522] 0 pages in swap cache
>[   59.029048] Swap cache stats: add 0, delete 0, find 0/0
>[   59.034491] Free swap  = 0kB
>[   59.037583] Total swap = 0kB
>[   59.041703] 90211 pages RAM
>[   59.044712] 53920 pages reserved
>[   59.048157] 267612 pages shared
>[   59.051514] 31588 pages non-shared
>[   59.055133] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents
>oom_score_adj name
>[   59.063173] [   64]     0    64    13020      570      24        0
>0 systemd-journal
>[   59.072172] [  170]    32   170     9977      282      23        0
>0 rpcbind
>[   59.080474] [  174]     0   174    11120      416      25        0
>0 rpc.statd
>[   59.088960] [  179]     0   179     6366      169      16        0
>0 rpc.idmapd
>[   59.097551] [  201]     0   201     8729      425      22        0
>-1000 systemd-udevd
>[   59.106402] [  324]     0   324    25589     3179      47        0
>0 dhclient
>[   59.114815] [  383]     0   383     3101      553      16        0
>0 dracut-pre-pivo
>[   59.123832] [  517]     0   517    13333     7103      42        0
>0 makedumpfile
>[   59.132594] [  518]     0   518     8728      272      21        0
>0 systemd-udevd
>[   59.141442] [  519]     0   519     8728      214      21        0
>0 systemd-udevd
>[   59.150287] Out of memory: Kill process 517 (makedumpfile) score 190
>or sacrifice child
>[   59.158527] Killed process 517 (makedumpfile) total-vm:53332kB,
>anon-rss:27524kB, file-rss:888kB
>//lib/dracut/hooks/pre-pivot/9999-kdump.sh: line
>Generating "/run/initramfs/rdsosreport.txt"
>
>
>(bbbb)
>kdump: dump target is /dev/sda2
>kdump: saving t[   13.492083] EXT4-fs (sda2): re-mounted. Opts:
>data=ordered
>o /sysroot//var/crash/127.0.0.1-2014.05.15-11:24:51/
>kdump: saving vmcore-dmesg.txt
>kdump: saving vmcore-dmesg.txt complete
>kdump: saving vmcore
>
>calculate_cyclic_buffer_size, free memory 77033472
>
>Buffer size for the cyclic mode: 30813388
>Excluding unnecessary pages        : [100.0 %] |
>writeout_dumpfile,before write_elf_headerfree, memory 15233024
>Excluding unnecessary pages        : [100.0 %] -
>writeout_dumpfile,before write_elf_pages_cyclic, memory 15204352
>
>In calculate_cyclic_buffer_size, free memory 15204352
>Excluding unnecessary pages        : [100.0 %] /
>In calculate_cyclic_buffer_size, free memory 2932736
>Excluding unnecessary pages        : [100.0 %] |
>In calculate_cyclic_buffer_size, free memory 2387968
>Excluding unnecessary pages        : [100.0 %] \
>In calculate_cyclic_buffer_size, free memory 2555904
>Excluding unnecessary pages        : [100.0 %] -
>In calculate_cyclic_buffer_size, free memory 2838528
>Copying data                       : [  3.8 %] |[   15.406789]
>makedumpfile invoked oom-killer: gfp_mask=0x10200da, order=0,
>oom_score_adj=0
>[   15.415222] makedumpfile cpuset=/ mems_allowed=0
>[   15.420191] CPU: 0 PID: 287 Comm: makedumpfile Not tainted
>3.10.0-114.el7.x86_64 #1
>[   15.428092] Hardware name: Hewlett-Packard HP Z420 Workstation/1589,
>BIOS J61 v01.02 03/09/2012
>[   15.437280]  ffff88002f8196c0 0000000043db25d9 ffff88002ff2b7e0
>ffffffff815ec0ca
>[   15.445105]  ffff88002ff2b870 ffffffff815e773d ffffffff810b6918
>ffff880030018bb0
>[   15.452935]  ffffffff00000206 ffffffff00000000 0000000000000000
>ffffffff81102e03
>[   15.460693] Call Trace:
>[   15.463375]  [<ffffffff815ec0ca>] dump_stack+0x19/0x1b
>[   15.468743]  [<ffffffff815e773d>] dump_header+0x8e/0x214
>[   15.474288]  [<ffffffff810b6918>] ? ktime_get_ts+0x48/0xe0
>[   15.479999]  [<ffffffff81102e03>] ? delayacct_end+0x33/0xb0
>[   15.485790]  [<ffffffff8114516e>] oom_kill_process+0x24e/0x3b0
>[   15.491848]  [<ffffffff8106af3e>] ? has_capability_noaudit+0x1e/0x30
>[   15.498430]  [<ffffffff81145996>] out_of_memory+0x4b6/0x4f0
>[   15.504222]  [<ffffffff8114b55b>] __alloc_pages_nodemask+0xa7b/0xac0
>[   15.510793]  [<ffffffff811885f9>] alloc_pages_current+0xa9/0x170
>[   15.517020]  [<ffffffff81141957>] __page_cache_alloc+0x87/0xb0
>[   15.523061]  [<ffffffff81142566>]
>grab_cache_page_write_begin+0x76/0xd0
>[   15.529898]  [<ffffffffa02ab133>] ext4_da_write_begin+0xa3/0x330
>[ext4]
>[   15.536728]  [<ffffffff8114158e>]
>generic_file_buffered_write+0x11e/0x290
>[   15.543724]  [<ffffffff811436e5>]
>__generic_file_aio_write+0x1d5/0x3e0
>[   15.550462]  [<ffffffff81050f00>] ?
>rbt_memtype_copy_nth_element+0xa0/0xa0
>[   15.557551]  [<ffffffff8114394d>] generic_file_aio_write+0x5d/0xc0
>[   15.563952]  [<ffffffffa02a1189>] ext4_file_write+0xa9/0x450 [ext4]
>[   15.570434]  [<ffffffff811797fc>] ? free_vmap_area_noflush+0x7c/0x90
>[   15.577009]  [<ffffffff811af21d>] do_sync_write+0x8d/0xd0
>[   15.582616]  [<ffffffff811af9bd>] vfs_write+0xbd/0x1e0
>[   15.587967]  [<ffffffff811b0408>] SyS_write+0x58/0xb0
>[   15.593223]  [<ffffffff815fc819>] system_call_fastpath+0x16/0x1b
>[   15.599437] Mem-Info:
>[   15.601912] Node 0 DMA per-cpu:
>[   15.605283] CPU    0: hi:    0, btch:   1 usd:   0
>[   15.610279] Node 0 DMA32 per-cpu:
>[   15.613825] CPU    0: hi:   42, btch:   7 usd:  29
>[   15.618827] active_anon:16072 inactive_anon:924 isolated_anon:0
>[   15.618827]  active_file:2075 inactive_file:2232 isolated_file:0
>[   15.618827]  unevictable:6464 dirty:0 writeback:3 unstable:0
>[   15.618827]  free:511 slab_reclaimable:2077 slab_unreclaimable:3528
>[   15.618827]  mapped:1055 shmem:1088 pagetables:146 bounce:0
>[   15.618827]  free_cma:0
>[   15.654098] Node 0 DMA free:508kB min:4kB low:4kB high:4kB
>active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB
>unevictabs
>[   15.693926] lowmem_reserve[]: 0 129 129 129
>[   15.698429] Node 0 DMA32 free:1536kB min:1452kB low:1812kB
>high:2176kB active_anon:64288kB inactive_anon:3696kB active_file:8300kB
>inacto
>[   15.742999] lowmem_reserve[]: 0 0 0 0
>[   15.746981] Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 1*32kB (U)
>1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 0*1024kB 0*2048kB 0*4096kB B
>[   15.760499] Node 0 DMA32: 6*4kB (M) 7*8kB (R) 3*16kB (R) 2*32kB (M)
>1*64kB (M) 0*128kB 1*256kB (R) 0*512kB 1*1024kB (R) 0*2048kB 0*4096kB
>[   15.774255] Node 0 hugepages_total=0 hugepages_free=0
>hugepages_surp=0 hugepages_size=2048kB
>[   15.782928] 11860 total pagecache pages
>[   15.786986] 0 pages in swap cache
>[   15.790524] Swap cache stats: add 0, delete 0, find 0/0
>[   15.795976] Free swap  = 0kB
>[   15.799080] Total swap = 0kB
>[   15.803169] 90211 pages RAM
>[   15.806188] 53919 pages reserved
>[   15.809635] 6131 pages shared
>[   15.812820] 29843 pages non-shared
>[   15.816437] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents
>oom_score_adj name
>[   15.824495] [   64]     0    64    13020      559      23        0
>0 systemd-journal
>[   15.833506] [  119]     0   119     8864      549      23        0
>-1000 systemd-udevd
>[   15.842347] [  204]     0   204     3071      531      15        0
>0 dracut-pre-pivo
>[   15.851353] [  287]     0   287    21581    15351      58        0
>0 makedumpfile
>[   15.860103] Out of memory: Kill process 287 (makedumpfile) score 411
>or sacrifice child
>[   15.868333] Killed process 287 (makedumpfile) total-vm:86324kB,
>anon-rss:60516kB, file-rss:888kB
>//lib/dracut/hooks/pre-pivot/999
>Generating "/run/initramfs/rdsosreport.txt"

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] makedumpfile: change the wrong code to calculate bufsize_cyclic for elf dump
  2014-05-15  9:10                                     ` Atsushi Kumagai
@ 2014-05-19 11:15                                       ` bhe
  2014-05-19 15:11                                         ` Vivek Goyal
  2014-05-23  7:18                                         ` Atsushi Kumagai
  0 siblings, 2 replies; 38+ messages in thread
From: bhe @ 2014-05-19 11:15 UTC (permalink / raw)
  To: Atsushi Kumagai; +Cc: d.hatayama, kexec, zzou, vgoyal

Hi Atsushi,
 
> No, 16MB was used for bitmaps in my case too.
> The --cyclic-buffer option specifies the size of each bitmap, so the allocated
> bitmap size is double the --cyclic-buffer value.
> (But currently this behavior applies only to the ELF case; just the specified
> size is allocated in the kdump case. Yes, it's confusing...)
> 
>        --cyclic-buffer buffer_size
>               Specify the buffer size in kilo bytes for analysis in the cyclic
>               mode.  Actually, the double of buffer_size kilo  bytes  will  be
>               allocated  in  memory.  In the cyclic mode, the number of cycles
>               is represented as:


Yeah, I was wrong about this. If the specified --cyclic-buffer doesn't
exceed the memory limit, it is equal to info->bufsize_cyclic. But for an
elf dump it needs to be doubled, e.g. if free memory is 30M and --cyclic-buffer
is 16M, it will absolutely fail. So the check in the code should handle this
case separately, since it's meaningless for an elf dump if 2 x --cyclic-buffer
exceeds the free memory.

	if (info->bufsize_cyclic > free_memory) {
	.......
	}

I think the code here needs to be changed.
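
For example, something along these lines (only a rough sketch of what I mean,
not a tested patch):

	/* the ELF path allocates two bitmaps of bufsize_cyclic bytes each,
	 * so the sanity check has to compare the doubled size */
	needed_size = info->bufsize_cyclic;
	if (info->flag_elf_dumpfile)
		needed_size *= 2;

	if (needed_size > free_memory) {
		/* shrink info->bufsize_cyclic or bail out with an error */
	}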

> 
> So only 456KB was required for other purposes; 20% of available
> memory (3.5MB) was enough, so the current assumption is safe even
> when available memory is only 17MB, as in my test.
> 
> According to my test:
> 
> http://lists.infradead.org/pipermail/kexec/2014-May/011784.html
> 
> > Here is the result on a 2nd kernel environment:
> >
> >              parameter                  |      result
> >   dump_lv | buffer[KiB] |  mmap (=4MiB) |    VmHWM [KiB]
> >   --------+-------------+---------------+------------------
> >      d31  |       1     |       on      |         776    // about 700
> >     Ed31  |       1     |       on      |         712
> >      d31  |       1     |      off      |         704
> >     Ed31  |       1     |      off      |         708
> >      d31  |    1000     |       on      |        1776    // about 700 + 1000
> >     Ed31  |    1000     |       on      |        2716    // about 700 + 1000 * 2
> >      d31  |    1000     |      off      |        1660
> >     Ed31  |    1000     |      off      |        2556
> 
> The required memory size for other purposes seems to be about 700KB when the
> dump level is 31; it's so small that we can ignore it.
> In practice the size will increase on the order of KB with the system memory
> size (this assumption comes from my code investigation), but 6MB (20% of 30MB)
> still sounds like more than enough as a safety limit.
> This is why I was wondering why OOM happened on your machine.
> 
> Now that I've heard that the OOM happened only on nfs, I guess nfs itself
> requires a lot of memory. (I did my test only on a local fs.)
> If we can determine the required size, we could reflect it in the safety limit.
> 
> I haven't checked your test result below yet, so I'll mention it later.
> 
> BTW, I prepared a temporary branch "oom" to investigate this issue,
> let's use it from now on:
> 
> http://sourceforge.net/p/makedumpfile/code/ci/oom/tree/

About the OOM, I don't know why the difference between your results and mine is
so big. My test machine has 16G memory, and the reserved crashkernel=161M.
The OOM happened very frequently. You can check the last section of the failed
kdump log: OOM happened when using 80% of free memory for
info->bufsize_cyclic, even though the remaining free memory was 13M.

I changed the code like below. 

---------------------------------------------------
diff --git a/makedumpfile.c b/makedumpfile.c
index 8dc1181..cd710a3 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -3157,6 +3157,7 @@ out:
                return FALSE;
 
        if (info->flag_cyclic) {
+               printf("\ncalculate_cyclic_buffer_size, get_free_memory_size: %llu\n", get_free_memory_size());
                if (info->bufsize_cyclic == 0) {
                        if (!calculate_cyclic_buffer_size())
                                return FALSE;
@@ -3204,6 +3205,7 @@ out:
 
                DEBUG_MSG("\n");
                DEBUG_MSG("Buffer size for the cyclic mode: %ld\n",
info->bufsize_cyclic);
+               printf("\n Buffer size for the cyclic mode: %ld\n",
info->bufsize_cyclic);
        }
 
        if (!is_xen_memory() && !cache_init())
@@ -9060,7 +9062,7 @@ calculate_cyclic_buffer_size(void) {
        if (info->num_dumpfile > 1)
                bitmap_size /= info->num_dumpfile;
 
-       info->bufsize_cyclic = MIN(limit_size, bitmap_size);
+       info->bufsize_cyclic = limit_size;

-------------------------------------------------
bhe# cat /etc/kdump.conf
path /var/crash
core_collector makedumpfile -E --message-level 1 -d 31

------------------------------------------
kdump: dump target is /dev/sda2
kdump: saving [    9.595153] EXT4-fs (sda2): re-mounted. Opts:
data=ordered
to /sysroot//var/crash/127.0.0.1-2014.05.19-18:50:18/
kdump: saving vmcore-dmesg.txt
kdump: saving vmcore-dmesg.txt complete
kdump: saving vmcore

calculate_cyclic_buffer_size, get_free_memory_size: 68857856

 Buffer size for the cyclic mode: 27543142
Copying data                       : [ 15.9 %] -[   14.955468]
makedumpfile invoked oom-killer: gfp_mask=0x10200da, order=0,
oom_score_adj=0
[   14.963876] makedumpfile cpuset=/ mems_allowed=0
[   14.968723] CPU: 0 PID: 286 Comm: makedumpfile Not tainted
3.10.0-123.el7.x86_64 #1
[   14.976606] Hardware name: Hewlett-Packard HP Z420 Workstation/1589,
BIOS J61 v01.02 03/09/2012
[   14.985567]  ffff88002fedc440 00000000f650c592 ffff88002fcb57d0
ffffffff815e19ba
[   14.993291]  ffff88002fcb5860 ffffffff815dd02d ffffffff810b68f8
ffff8800359dc0c0
[   15.001013]  ffffffff00000206 ffffffff00000000 0000000000000000
ffffffff81102e03
[   15.008733] Call Trace:
[   15.011413]  [<ffffffff815e19ba>] dump_stack+0x19/0x1b
[   15.016778]  [<ffffffff815dd02d>] dump_header+0x8e/0x214
[   15.022321]  [<ffffffff810b68f8>] ? ktime_get_ts+0x48/0xe0
[   15.028036]  [<ffffffff81102e03>] ? proc_do_uts_string+0xe3/0x130
[   15.034383]  [<ffffffff8114520e>] oom_kill_process+0x24e/0x3b0
[   15.040446]  [<ffffffff8106af3e>] ? has_capability_noaudit+0x1e/0x30
[   15.047068]  [<ffffffff81145a36>] out_of_memory+0x4b6/0x4f0
[   15.052864]  [<ffffffff8114b579>] __alloc_pages_nodemask+0xa09/0xb10
[   15.059482]  [<ffffffff81188779>] alloc_pages_current+0xa9/0x170
[   15.065711]  [<ffffffff811419f7>] __page_cache_alloc+0x87/0xb0
[   15.071804]  [<ffffffff81142606>]
grab_cache_page_write_begin+0x76/0xd0
[   15.078646]  [<ffffffffa02aa133>] ext4_da_write_begin+0xa3/0x330
[ext4]
[   15.085495]  [<ffffffff8114162e>]
generic_file_buffered_write+0x11e/0x290
[   15.092504]  [<ffffffff81143785>]
__generic_file_aio_write+0x1d5/0x3e0
[   15.099294]  [<ffffffff81050f00>] ?
rbt_memtype_copy_nth_element+0xa0/0xa0
[   15.106385]  [<ffffffff811439ed>] generic_file_aio_write+0x5d/0xc0
[   15.112841]  [<ffffffffa02a0189>] ext4_file_write+0xa9/0x450 [ext4]
[   15.119321]  [<ffffffff8117997c>] ? free_vmap_area_noflush+0x7c/0x90
[   15.125884]  [<ffffffff811af36d>] do_sync_write+0x8d/0xd0
[   15.131492]  [<ffffffff811afb0d>] vfs_write+0xbd/0x1e0
[   15.136839]  [<ffffffff811b0558>] SyS_write+0x58/0xb0
[   15.142091]  [<ffffffff815f2119>] system_call_fastpath+0x16/0x1b
[   15.148293] Mem-Info:
[   15.150770] Node 0 DMA per-cpu:
[   15.154138] CPU    0: hi:    0, btch:   1 usd:   0
[   15.159133] Node 0 DMA32 per-cpu:
[   15.162741] CPU    0: hi:   42, btch:   7 usd:  12
[   15.167732] active_anon:14395 inactive_anon:1034 isolated_anon:0
[   15.167732]  active_file:2406 inactive_file:2533 isolated_file:0
[   15.167732]  unevictable:7137 dirty:2 writeback:1511 unstable:0
[   15.167732]  free:488 slab_reclaimable:2371 slab_unreclaimable:3533
[   15.167732]  mapped:1110 shmem:1065 pagetables:166 bounce:0
[   15.167732]  free_cma:0
[   15.203076] Node 0 DMA free:508kB min:4kB low:4kB high:4kB
active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB
unevictabs
[   15.242882] lowmem_reserve[]: 0 128 128 128
[   15.247447] Node 0 DMA32 free:1444kB min:1444kB low:1804kB
high:2164kB active_anon:57580kB inactive_anon:4136kB active_file:9624kB
inacts
[   15.292683] lowmem_reserve[]: 0 0 0 0
[   15.296761] Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 1*32kB (U)
1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 0*1024kB 0*2048kB 0*4096kB B
[   15.310372] Node 0 DMA32: 78*4kB (UEM) 52*8kB (UEM) 17*16kB (UM)
12*32kB (UM) 2*64kB (UM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*40B
[   15.324412] Node 0 hugepages_total=0 hugepages_free=0
hugepages_surp=0 hugepages_size=2048kB
[   15.333088] 13144 total pagecache pages
[   15.337161] 0 pages in swap cache
[   15.340708] Swap cache stats: add 0, delete 0, find 0/0
[   15.346165] Free swap  = 0kB
[   15.349280] Total swap = 0kB
[   15.353385] 90211 pages RAM
[   15.356420] 53902 pages reserved
[   15.359880] 6980 pages shared
[   15.363088] 29182 pages non-shared
[   15.366719] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents
oom_score_adj name
[   15.374788] [   85]     0    85    13020      553      24        0
0 systemd-journal
[   15.383818] [  134]     0   134     8860      547      22        0
-1000 systemd-udevd
[   15.392664] [  146]     0   146     5551      245      23        0
0 plymouthd
[   15.401167] [  230]     0   230     3106      537      16        0
0 dracut-pre-pivo
[   15.410181] [  286]     0   286    19985    13756      55        0
0 makedumpfile
[   15.418942] Out of memory: Kill process 286 (makedumpfile) score 368
or sacrifice child
[   15.427173] Killed process 286 (makedumpfile) total-vm:79940kB,
anon-rss:54132kB, file-rss:892kB
//lib/dracut/hooks/pre-pivot/9999-kdump.sh: line
Generating "/run/initramfs/rdsosreport.txt"

> 
> 
> Thanks
> Atsushi Kumagai
> 

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH] makedumpfile: change the wrong code to calculate bufsize_cyclic for elf dump
  2014-05-19 11:15                                       ` bhe
@ 2014-05-19 15:11                                         ` Vivek Goyal
  2014-05-27  5:34                                           ` Atsushi Kumagai
  2014-05-23  7:18                                         ` Atsushi Kumagai
  1 sibling, 1 reply; 38+ messages in thread
From: Vivek Goyal @ 2014-05-19 15:11 UTC (permalink / raw)
  To: bhe; +Cc: kexec, d.hatayama, Atsushi Kumagai, zzou, Larry Woodman

On Mon, May 19, 2014 at 07:15:38PM +0800, bhe@redhat.com wrote:

[..]
> -------------------------------------------------
> bhe# cat /etc/kdump.conf
> path /var/crash
> core_collector makedumpfile -E --message-level 1 -d 31
> 
> ------------------------------------------
> kdump: dump target is /dev/sda2
> kdump: saving [    9.595153] EXT4-fs (sda2): re-mounted. Opts:
> data=ordered
> to /sysroot//var/crash/127.0.0.1-2014.05.19-18:50:18/
> kdump: saving vmcore-dmesg.txt
> kdump: saving vmcore-dmesg.txt complete
> kdump: saving vmcore
> 
> calculate_cyclic_buffer_size, get_free_memory_size: 68857856
> 
>  Buffer size for the cyclic mode: 27543142

Bao,

So 68857856 is 65MB. So we have around 65MB free when makedumpfile
started.

27543142  is 26MB. So we reserved 26MB for bitmaps or we reserved
52MB for bitmaps?

Looking at the backtrace, Larry pointed out a few things.

- makedumpfile has already allocated around 52MB of anonymous memory. I
  guess this primarily comes from bitmaps and looks like we are reserving
  52MB in bitmaps and not 26MB. I think this could be consistent with
  current 80% logic as 80% of 65MB is around 52MB.

	[   15.427173] Killed process 286 (makedumpfile) total-vm:79940kB,
			anon-rss:54132kB, file-rss:892kB

- So we are left with 65-52 = 13MB of total memory for kernel as well
  as makedumpfile.
 
- We have around 1500 pages in the page cache which are in the writeback stage.
  That means around 6MB of pages are dirty and being written back to
  disk. So makedumpfile itself might not require a lot of memory, but the
  kernel does need free memory for dirty/writeback pages while the dump
  file is being written.

	[   15.167732]  unevictable:7137 dirty:2 writeback:1511 unstable:0

- Larry mentioned that there are around 5000 pages (20MB of memory)
  sitting in file pages in the page cache which ideally should be reclaimable.
  It is not clear why that memory is not being reclaimed fast enough.

	[   15.167732]  active_file:2406 inactive_file:2533 isolated_file:0

So to me the bottom line is that once the writeout starts, the kernel needs
memory for holding dirty and writeback pages in the cache too. So we probably
are being too aggressive in allocating 80% of free memory for bitmaps. Maybe
we should drop it down to 50-60% of free memory for bitmaps.

Thanks
Vivek




> Copying data                       : [ 15.9 %] -[   14.955468]
> makedumpfile invoked oom-killer: gfp_mask=0x10200da, order=0,
> oom_score_adj=0
> [   14.963876] makedumpfile cpuset=/ mems_allowed=0
> [   14.968723] CPU: 0 PID: 286 Comm: makedumpfile Not tainted
> 3.10.0-123.el7.x86_64 #1
> [   14.976606] Hardware name: Hewlett-Packard HP Z420 Workstation/1589,
> BIOS J61 v01.02 03/09/2012
> [   14.985567]  ffff88002fedc440 00000000f650c592 ffff88002fcb57d0
> ffffffff815e19ba
> [   14.993291]  ffff88002fcb5860 ffffffff815dd02d ffffffff810b68f8
> ffff8800359dc0c0
> [   15.001013]  ffffffff00000206 ffffffff00000000 0000000000000000
> ffffffff81102e03
> [   15.008733] Call Trace:
> [   15.011413]  [<ffffffff815e19ba>] dump_stack+0x19/0x1b
> [   15.016778]  [<ffffffff815dd02d>] dump_header+0x8e/0x214
> [   15.022321]  [<ffffffff810b68f8>] ? ktime_get_ts+0x48/0xe0
> [   15.028036]  [<ffffffff81102e03>] ? proc_do_uts_string+0xe3/0x130
> [   15.034383]  [<ffffffff8114520e>] oom_kill_process+0x24e/0x3b0
> [   15.040446]  [<ffffffff8106af3e>] ? has_capability_noaudit+0x1e/0x30
> [   15.047068]  [<ffffffff81145a36>] out_of_memory+0x4b6/0x4f0
> [   15.052864]  [<ffffffff8114b579>] __alloc_pages_nodemask+0xa09/0xb10
> [   15.059482]  [<ffffffff81188779>] alloc_pages_current+0xa9/0x170
> [   15.065711]  [<ffffffff811419f7>] __page_cache_alloc+0x87/0xb0
> [   15.071804]  [<ffffffff81142606>]
> grab_cache_page_write_begin+0x76/0xd0
> [   15.078646]  [<ffffffffa02aa133>] ext4_da_write_begin+0xa3/0x330
> [ext4]
> [   15.085495]  [<ffffffff8114162e>]
> generic_file_buffered_write+0x11e/0x290
> [   15.092504]  [<ffffffff81143785>]
> __generic_file_aio_write+0x1d5/0x3e0
> [   15.099294]  [<ffffffff81050f00>] ?
> rbt_memtype_copy_nth_element+0xa0/0xa0
> [   15.106385]  [<ffffffff811439ed>] generic_file_aio_write+0x5d/0xc0
> [   15.112841]  [<ffffffffa02a0189>] ext4_file_write+0xa9/0x450 [ext4]
> [   15.119321]  [<ffffffff8117997c>] ? free_vmap_area_noflush+0x7c/0x90
> [   15.125884]  [<ffffffff811af36d>] do_sync_write+0x8d/0xd0
> [   15.131492]  [<ffffffff811afb0d>] vfs_write+0xbd/0x1e0
> [   15.136839]  [<ffffffff811b0558>] SyS_write+0x58/0xb0
> [   15.142091]  [<ffffffff815f2119>] system_call_fastpath+0x16/0x1b
> [   15.148293] Mem-Info:
> [   15.150770] Node 0 DMA per-cpu:
> [   15.154138] CPU    0: hi:    0, btch:   1 usd:   0
> [   15.159133] Node 0 DMA32 per-cpu:
> [   15.162741] CPU    0: hi:   42, btch:   7 usd:  12
> [   15.167732] active_anon:14395 inactive_anon:1034 isolated_anon:0
> [   15.167732]  active_file:2406 inactive_file:2533 isolated_file:0
> [   15.167732]  unevictable:7137 dirty:2 writeback:1511 unstable:0
> [   15.167732]  free:488 slab_reclaimable:2371 slab_unreclaimable:3533
> [   15.167732]  mapped:1110 shmem:1065 pagetables:166 bounce:0
> [   15.167732]  free_cma:0
> [   15.203076] Node 0 DMA free:508kB min:4kB low:4kB high:4kB
> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB
> unevictabs
> [   15.242882] lowmem_reserve[]: 0 128 128 128
> [   15.247447] Node 0 DMA32 free:1444kB min:1444kB low:1804kB
> high:2164kB active_anon:57580kB inactive_anon:4136kB active_file:9624kB
> inacts
> [   15.292683] lowmem_reserve[]: 0 0 0 0
> [   15.296761] Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 1*32kB (U)
> 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 0*1024kB 0*2048kB 0*4096kB B
> [   15.310372] Node 0 DMA32: 78*4kB (UEM) 52*8kB (UEM) 17*16kB (UM)
> 12*32kB (UM) 2*64kB (UM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*40B
> [   15.324412] Node 0 hugepages_total=0 hugepages_free=0
> hugepages_surp=0 hugepages_size=2048kB
> [   15.333088] 13144 total pagecache pages
> [   15.337161] 0 pages in swap cache
> [   15.340708] Swap cache stats: add 0, delete 0, find 0/0
> [   15.346165] Free swap  = 0kB
> [   15.349280] Total swap = 0kB
> [   15.353385] 90211 pages RAM
> [   15.356420] 53902 pages reserved
> [   15.359880] 6980 pages shared
> [   15.363088] 29182 pages non-shared
> [   15.366719] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents
> oom_score_adj name
> [   15.374788] [   85]     0    85    13020      553      24        0
> 0 systemd-journal
> [   15.383818] [  134]     0   134     8860      547      22        0
> -1000 systemd-udevd
> [   15.392664] [  146]     0   146     5551      245      23        0
> 0 plymouthd
> [   15.401167] [  230]     0   230     3106      537      16        0
> 0 dracut-pre-pivo
> [   15.410181] [  286]     0   286    19985    13756      55        0
> 0 makedumpfile
> [   15.418942] Out of memory: Kill process 286 (makedumpfile) score 368
> or sacrifice child
> [   15.427173] Killed process 286 (makedumpfile) total-vm:79940kB,
> anon-rss:54132kB, file-rss:892kB
> //lib/dracut/hooks/pre-pivot/9999-kdump.sh: line
> Generating "/run/initramfs/rdsosreport.txt"
> 
> > 
> > 
> > Thanks
> > Atsushi Kumagai
> > 

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH] makedumpfile: change the wrong code to calculate bufsize_cyclic for elf dump
  2014-05-19 11:15                                       ` bhe
  2014-05-19 15:11                                         ` Vivek Goyal
@ 2014-05-23  7:18                                         ` Atsushi Kumagai
  1 sibling, 0 replies; 38+ messages in thread
From: Atsushi Kumagai @ 2014-05-23  7:18 UTC (permalink / raw)
  To: bhe; +Cc: d.hatayama, kexec, zzou, vgoyal

Hello Bao,

>Hi Atsushi,
>
>> No, 16MB was used for bitmaps in my case too.
>> The --cyclic-buffer option specifies the size of each bitmap, so the allocated
>> bitmap size is double the --cyclic-buffer value.
>> (But currently this behavior applies only to the ELF case; just the specified
>> size is allocated in the kdump case. Yes, it's confusing...)
>>
>>        --cyclic-buffer buffer_size
>>               Specify the buffer size in kilo bytes for analysis in the cyclic
>>               mode.  Actually, the double of buffer_size kilo  bytes  will  be
>>               allocated  in  memory.  In the cyclic mode, the number of cycles
>>               is represented as:
>
>
>Yeah, I was wrong about this. If the specified --cyclic-buffer doesn't
>exceed the memory limit, it is equal to info->bufsize_cyclic. But for an
>elf dump it needs to be doubled, e.g. if free memory is 30M and --cyclic-buffer
>is 16M, it will absolutely fail. So the check in the code should handle this
>case separately, since it's meaningless for an elf dump if 2 x --cyclic-buffer
>exceeds the free memory.
>
>	if (info->bufsize_cyclic > free_memory) {
>	.......
>	}
>
>I think the code here needs to be changed.

Exactly, this is a bug.
I can fix it simply by modifying the check above, but I think it's
better to change the ELF path to get rid of the 1st bitmap buffer since it's
unused. Then the required buffer size will be just <--cyclic-buffer> in both
cases, so I'll post a patch to do that later.
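
Roughly what I have in mind for prepare_bitmap_buffer_cyclic() (just a sketch
to show the idea, the actual patch will follow later):

        /* the 1st bitmap is never used in the ELF path, so don't allocate it */
        if (!info->flag_elf_dumpfile) {
                if ((info->partial_bitmap1 = (char *)malloc(info->bufsize_cyclic)) == NULL) {
                        ERRMSG("Can't allocate memory for the 1st-bitmap. %s\n",
                               strerror(errno));
                        return FALSE;
                }
        }
        if ((info->partial_bitmap2 = (char *)malloc(info->bufsize_cyclic)) == NULL) {
                ERRMSG("Can't allocate memory for the 2nd-bitmap. %s\n",
                       strerror(errno));
                return FALSE;
        }

Of course the users of partial_bitmap1 in the ELF path would have to be
checked as well.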


Thanks
Atsushi Kumagai

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH] makedumpfile: change the wrong code to calculate bufsize_cyclic for elf dump
  2014-05-19 15:11                                         ` Vivek Goyal
@ 2014-05-27  5:34                                           ` Atsushi Kumagai
  2014-05-27 14:49                                             ` Vivek Goyal
  0 siblings, 1 reply; 38+ messages in thread
From: Atsushi Kumagai @ 2014-05-27  5:34 UTC (permalink / raw)
  To: vgoyal, bhe; +Cc: d.hatayama, kexec, zzou, lwoodman

>On Mon, May 19, 2014 at 07:15:38PM +0800, bhe@redhat.com wrote:
>
>[..]
>> -------------------------------------------------
>> bhe# cat /etc/kdump.conf
>> path /var/crash
>> core_collector makedumpfile -E --message-level 1 -d 31
>>
>> ------------------------------------------
>> kdump: dump target is /dev/sda2
>> kdump: saving [    9.595153] EXT4-fs (sda2): re-mounted. Opts:
>> data=ordered
>> to /sysroot//var/crash/127.0.0.1-2014.05.19-18:50:18/
>> kdump: saving vmcore-dmesg.txt
>> kdump: saving vmcore-dmesg.txt complete
>> kdump: saving vmcore
>>
>> calculate_cyclic_buffer_size, get_free_memory_size: 68857856
>>
>>  Buffer size for the cyclic mode: 27543142
>
>Bao,
>
>So 68857856 is 65MB. So we have around 65MB free when makedumpfile
>started.
>
>27543142  is 26MB. So we reserved 26MB for bitmaps or we reserved
>52MB for bitmaps?

52MB is correct, so Larry's view below looks right.

>Looking at the backtrace, Larry pointed out a few things.
>
>- makedumpfile has already allocated around 52MB of anonymous memory. I
>  guess this primarily comes from bitmaps and looks like we are reserving
>  52MB in bitmaps and not 26MB. I think this could be consistent with
>  current 80% logic as 80% of 65MB is around 52MB.
>
>	[   15.427173] Killed process 286 (makedumpfile) total-vm:79940kB,
>			anon-rss:54132kB, file-rss:892kB
>
>- So we are left with 65-52 = 13MB of total memory for kernel as well
>  as makedumpfile.
>
>- We have around 1500 pages in the page cache which are in the writeback stage.
>  That means around 6MB of pages are dirty and being written back to
>  disk. So makedumpfile itself might not require a lot of memory, but the
>  kernel does need free memory for dirty/writeback pages while the dump
>  file is being written.
>
>	[   15.167732]  unevictable:7137 dirty:2 writeback:1511 unstable:0
>
>- Larry mentioned that there are around 5000 pages (20MB of memory)
>  sitting in file pages in the page cache which ideally should be reclaimable.
>  It is not clear why that memory is not being reclaimed fast enough.
>
>	[   15.167732]  active_file:2406 inactive_file:2533 isolated_file:0
>
>So to me the bottom line is that once the writeout starts, the kernel needs
>memory for holding dirty and writeback pages in the cache too. So we probably
>are being too aggressive in allocating 80% of free memory for bitmaps. Maybe
>we should drop it down to 50-60% of free memory for bitmaps.

I don't disagree with changing the 80% limit, but I would prefer to remove such
a percentage threshold entirely because it's dependent on the environment.
Actually, I think it makes this problem more complex.

Now, thanks to page_is_buddy(), the performance degradation caused by
multi-cycle processing looks very small according to the benchmark on
2TB memory:

  https://lkml.org/lkml/2013/3/26/914

This result means we don't need to make an effort to allocate the bitmap
buffer as large as possible. So how about just setting a small fixed value
like 5MB as a safety limit?
It may be safer, and it will be easier to estimate the total memory usage of
makedumpfile, so I think it's a better way if most users, especially large
machine users, accept it.


Thanks
Atsushi Kumagai

>> Copying data                       : [ 15.9 %] -[   14.955468]
>> makedumpfile invoked oom-killer: gfp_mask=0x10200da, order=0,
>> oom_score_adj=0
>> [   14.963876] makedumpfile cpuset=/ mems_allowed=0
>> [   14.968723] CPU: 0 PID: 286 Comm: makedumpfile Not tainted
>> 3.10.0-123.el7.x86_64 #1
>> [   14.976606] Hardware name: Hewlett-Packard HP Z420 Workstation/1589,
>> BIOS J61 v01.02 03/09/2012
>> [   14.985567]  ffff88002fedc440 00000000f650c592 ffff88002fcb57d0
>> ffffffff815e19ba
>> [   14.993291]  ffff88002fcb5860 ffffffff815dd02d ffffffff810b68f8
>> ffff8800359dc0c0
>> [   15.001013]  ffffffff00000206 ffffffff00000000 0000000000000000
>> ffffffff81102e03
>> [   15.008733] Call Trace:
>> [   15.011413]  [<ffffffff815e19ba>] dump_stack+0x19/0x1b
>> [   15.016778]  [<ffffffff815dd02d>] dump_header+0x8e/0x214
>> [   15.022321]  [<ffffffff810b68f8>] ? ktime_get_ts+0x48/0xe0
>> [   15.028036]  [<ffffffff81102e03>] ? proc_do_uts_string+0xe3/0x130
>> [   15.034383]  [<ffffffff8114520e>] oom_kill_process+0x24e/0x3b0
>> [   15.040446]  [<ffffffff8106af3e>] ? has_capability_noaudit+0x1e/0x30
>> [   15.047068]  [<ffffffff81145a36>] out_of_memory+0x4b6/0x4f0
>> [   15.052864]  [<ffffffff8114b579>] __alloc_pages_nodemask+0xa09/0xb10
>> [   15.059482]  [<ffffffff81188779>] alloc_pages_current+0xa9/0x170
>> [   15.065711]  [<ffffffff811419f7>] __page_cache_alloc+0x87/0xb0
>> [   15.071804]  [<ffffffff81142606>]
>> grab_cache_page_write_begin+0x76/0xd0
>> [   15.078646]  [<ffffffffa02aa133>] ext4_da_write_begin+0xa3/0x330
>> [ext4]
>> [   15.085495]  [<ffffffff8114162e>]
>> generic_file_buffered_write+0x11e/0x290
>> [   15.092504]  [<ffffffff81143785>]
>> __generic_file_aio_write+0x1d5/0x3e0
>> [   15.099294]  [<ffffffff81050f00>] ?
>> rbt_memtype_copy_nth_element+0xa0/0xa0
>> [   15.106385]  [<ffffffff811439ed>] generic_file_aio_write+0x5d/0xc0
>> [   15.112841]  [<ffffffffa02a0189>] ext4_file_write+0xa9/0x450 [ext4]
>> [   15.119321]  [<ffffffff8117997c>] ? free_vmap_area_noflush+0x7c/0x90
>> [   15.125884]  [<ffffffff811af36d>] do_sync_write+0x8d/0xd0
>> [   15.131492]  [<ffffffff811afb0d>] vfs_write+0xbd/0x1e0
>> [   15.136839]  [<ffffffff811b0558>] SyS_write+0x58/0xb0
>> [   15.142091]  [<ffffffff815f2119>] system_call_fastpath+0x16/0x1b
>> [   15.148293] Mem-Info:
>> [   15.150770] Node 0 DMA per-cpu:
>> [   15.154138] CPU    0: hi:    0, btch:   1 usd:   0
>> [   15.159133] Node 0 DMA32 per-cpu:
>> [   15.162741] CPU    0: hi:   42, btch:   7 usd:  12
>> [   15.167732] active_anon:14395 inactive_anon:1034 isolated_anon:0
>> [   15.167732]  active_file:2406 inactive_file:2533 isolated_file:0
>> [   15.167732]  unevictable:7137 dirty:2 writeback:1511 unstable:0
>> [   15.167732]  free:488 slab_reclaimable:2371 slab_unreclaimable:3533
>> [   15.167732]  mapped:1110 shmem:1065 pagetables:166 bounce:0
>> [   15.167732]  free_cma:0
>> [   15.203076] Node 0 DMA free:508kB min:4kB low:4kB high:4kB
>> active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB
>> unevictabs
>> [   15.242882] lowmem_reserve[]: 0 128 128 128
>> [   15.247447] Node 0 DMA32 free:1444kB min:1444kB low:1804kB
>> high:2164kB active_anon:57580kB inactive_anon:4136kB active_file:9624kB
>> inacts
>> [   15.292683] lowmem_reserve[]: 0 0 0 0
>> [   15.296761] Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 1*32kB (U)
>> 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 0*1024kB 0*2048kB 0*4096kB B
>> [   15.310372] Node 0 DMA32: 78*4kB (UEM) 52*8kB (UEM) 17*16kB (UM)
>> 12*32kB (UM) 2*64kB (UM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*40B
>> [   15.324412] Node 0 hugepages_total=0 hugepages_free=0
>> hugepages_surp=0 hugepages_size=2048kB
>> [   15.333088] 13144 total pagecache pages
>> [   15.337161] 0 pages in swap cache
>> [   15.340708] Swap cache stats: add 0, delete 0, find 0/0
>> [   15.346165] Free swap  = 0kB
>> [   15.349280] Total swap = 0kB
>> [   15.353385] 90211 pages RAM
>> [   15.356420] 53902 pages reserved
>> [   15.359880] 6980 pages shared
>> [   15.363088] 29182 pages non-shared
>> [   15.366719] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents
>> oom_score_adj name
>> [   15.374788] [   85]     0    85    13020      553      24        0
>> 0 systemd-journal
>> [   15.383818] [  134]     0   134     8860      547      22        0
>> -1000 systemd-udevd
>> [   15.392664] [  146]     0   146     5551      245      23        0
>> 0 plymouthd
>> [   15.401167] [  230]     0   230     3106      537      16        0
>> 0 dracut-pre-pivo
>> [   15.410181] [  286]     0   286    19985    13756      55        0
>> 0 makedumpfile
>> [   15.418942] Out of memory: Kill process 286 (makedumpfile) score 368
>> or sacrifice child
>> [   15.427173] Killed process 286 (makedumpfile) total-vm:79940kB,
>> anon-rss:54132kB, file-rss:892kB
>> //lib/dracut/hooks/pre-pivot/9999-kdump.sh: line
>> Generating "/run/initramfs/rdsosreport.txt"
>>
>> >
>> >
>> > Thanks
>> > Atsushi Kumagai
>> >

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] makedumpfile: change the wrong code to calculate bufsize_cyclic for elf dump
  2014-05-27  5:34                                           ` Atsushi Kumagai
@ 2014-05-27 14:49                                             ` Vivek Goyal
  0 siblings, 0 replies; 38+ messages in thread
From: Vivek Goyal @ 2014-05-27 14:49 UTC (permalink / raw)
  To: Atsushi Kumagai; +Cc: d.hatayama, kexec, zzou, bhe, lwoodman

On Tue, May 27, 2014 at 05:34:05AM +0000, Atsushi Kumagai wrote:

[..]
> >So to me the bottom line is that once the writeout starts, the kernel needs
> >memory for holding dirty and writeback pages in the cache too. So we probably
> >are being too aggressive in allocating 80% of free memory for bitmaps. Maybe
> >we should drop it down to 50-60% of free memory for bitmaps.
> 
> I don't disagree with changing the 80% limit, but I would prefer to remove such
> a percentage threshold entirely because it's dependent on the environment.
> Actually, I think it makes this problem more complex.
> 
> Now, thanks to page_is_buddy(), the performance degradation caused by
> multi-cycle processing looks very small according to the benchmark on
> 2TB memory:
> 
>   https://lkml.org/lkml/2013/3/26/914
> 
> This result means we don't need to make an effort to allocate the bitmap
> buffer as large as possible. So how about just setting a small fixed value
> like 5MB as a safety limit?
> It may be safer, and it will be easier to estimate the total memory usage of
> makedumpfile, so I think it's a better way if most users, especially large
> machine users, accept it.

Hi Atsushi,

If increasing the buffer size does not significantly increase dump
time, then it is reasonable to have a fixed buffer size for bitmaps
(instead of trying to maximize the bitmap size).

We can probably go for 4MB as the bitmap size (instead of 5MB).

Also, can we modify the logic a bit so that we automatically shrink the
size of the bitmap if sufficient memory is not available? Say, assume that 60%
of available memory can be used for bitmaps. If that is less than 4MB, then
we drop the buffer size, which hopefully still lets makedumpfile complete
successfully (instead of being OOMed).
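
Something along these lines perhaps (hypothetical sketch only, the constant
name is made up and this is not a patch against the actual code):

#define FIXED_BITMAP_SIZE	(4 * 1024 * 1024)	/* 4MB per bitmap */

int
calculate_cyclic_buffer_size(void)
{
        unsigned long long limit_size;

        /* assume at most 60% of free memory may be used for bitmaps */
        limit_size = get_free_memory_size() * 0.6;

        /* the ELF path keeps two bitmaps in memory, so each one
         * can only get half of that budget */
        if (info->flag_elf_dumpfile)
                limit_size /= 2;

        /* use the fixed size, but shrink it automatically when the
         * budget cannot hold it, instead of getting OOM killed */
        info->bufsize_cyclic = MIN(FIXED_BITMAP_SIZE, limit_size);

        return TRUE;
}

That way a machine with a generous crashkernel= reservation simply uses 4MB
per bitmap, and a tight one falls back to a smaller buffer and just runs
more cycles.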

Thanks
Vivek

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2014-05-27 14:49 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-04-10 21:44 makedumpfile memmory usage seems high with option -E Vivek Goyal
2014-04-11  9:22 ` Atsushi Kumagai
2014-04-11 10:19   ` Arthur Zou
2014-04-14  8:02     ` [PATCH] makedumpfile: change the wrong code to calculate bufsize_cyclic for elf dump Baoquan He
2014-04-14  8:11       ` Baoquan He
2014-04-16  6:44       ` Baoquan He
2014-04-17  4:01         ` Atsushi Kumagai
2014-04-17  4:52           ` bhe
2014-04-17  5:02             ` bhe
2014-04-18  9:22               ` Atsushi Kumagai
2014-04-18 14:29                 ` bhe
2014-04-18 19:41                   ` Petr Tesarik
2014-04-21 15:19                     ` Vivek Goyal
2014-04-21 15:46                       ` Petr Tesarik
2014-04-21 15:51                         ` Vivek Goyal
2014-04-21 15:14                   ` Vivek Goyal
2014-04-23 11:09                     ` bhe
2014-04-21 15:12                 ` Vivek Goyal
2014-04-23  7:55                   ` Atsushi Kumagai
2014-04-23 11:55                     ` bhe
2014-04-23 17:08                     ` Vivek Goyal
2014-04-23 23:50                       ` bhe
2014-04-24  2:05                         ` bhe
2014-04-25 13:22                         ` Vivek Goyal
2014-04-28  5:05                           ` Atsushi Kumagai
2014-04-28 12:50                             ` Vivek Goyal
2014-05-09  5:36                               ` Atsushi Kumagai
2014-05-09 20:49                                 ` Vivek Goyal
2014-05-15  7:22                                   ` bhe
2014-05-15  9:10                                     ` Atsushi Kumagai
2014-05-19 11:15                                       ` bhe
2014-05-19 15:11                                         ` Vivek Goyal
2014-05-27  5:34                                           ` Atsushi Kumagai
2014-05-27 14:49                                             ` Vivek Goyal
2014-05-23  7:18                                         ` Atsushi Kumagai
2014-05-14  5:44                                 ` bhe
2014-04-28  5:04                       ` Atsushi Kumagai
2014-05-09  5:35                         ` Atsushi Kumagai
