[RFC] vmcore: Add a kernel cmdline device_dump_limit
diff mbox series

Message ID 20190510102051.25647-1-kasong@redhat.com
State New, archived
Headers show
Series
  • [RFC] vmcore: Add a kernel cmdline device_dump_limit
Related show

Commit Message

Kairui Song May 10, 2019, 10:20 a.m. UTC
Device dump allow drivers to add device related dump data to vmcore as
they want. This have a potential issue, the data is stored in memory,
drivers may append too much data and use too much memory. The vmcore is
typically used in a kdump kernel which runs in a pre-reserved small
chunk of memory. So as a result it will make kdump unusable at all due
to OOM issues.

So introduce new device_dump_limit= kernel parameter, and set the
default limit to 0, so device dump is not enabled unless user specify
the accetable maxiam memory usage for device dump data. In this way user
will also have the chance to adjust the kdump reserved memory
accordingly.

Signed-off-by: Kairui Song <kasong@redhat.com>
---
 fs/proc/vmcore.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

Comments

Bhupesh Sharma May 10, 2019, 11:17 a.m. UTC | #1
Hi Kairui,

Thanks for the patch. Please see my comments in-line:

On 05/10/2019 03:50 PM, Kairui Song wrote:
> Device dump allow drivers to add device related dump data to vmcore as
> they want. This have a potential issue, the data is stored in memory,
> drivers may append too much data and use too much memory. The vmcore is
> typically used in a kdump kernel which runs in a pre-reserved small
> chunk of memory. So as a result it will make kdump unusable at all due
> to OOM issues.
> 
> So introduce new device_dump_limit= kernel parameter, and set the
> default limit to 0, so device dump is not enabled unless user specify
> the accetable maxiam 

       ^^^^ acceptable maximum

> memory usage for device dump data. In this way user
> will also have the chance to adjust the kdump reserved memory
> accordingly.

Hmmm., this doesn't give much confidence with the 
PROC_VMCORE_DEVICE_DUMP feature in its current shape. Rather shouldn't 
we be enabling config PROC_VMCORE_DEVICE_DUMP only under EXPERT mode for 
now, considering that this feature needs further thrashing and testing 
with real setups including platforms where drivers append large amounts 
of data to vmcore:

diff --git a/fs/proc/Kconfig b/fs/proc/Kconfig
index 817c02b13b1d..c47a12cf7fc0 100644
--- a/fs/proc/Kconfig
+++ b/fs/proc/Kconfig
@@ -45,7 +45,7 @@ config PROC_VMCORE
          Exports the dump image of crashed kernel in ELF format.

  config PROC_VMCORE_DEVICE_DUMP
-       bool "Device Hardware/Firmware Log Collection"
+       bool "Device Hardware/Firmware Log Collection" if EXPERT
         depends on PROC_VMCORE
         default n
         help
@@ -59,6 +59,12 @@ config PROC_VMCORE_DEVICE_DUMP
           If you say Y here, the collected device dumps will be added
           as ELF notes to /proc/vmcore.

+         Considering that there can be device drivers which append
+         large amounts of data to vmcore, you should say N here unless
+         you are reserving a large chunk of memory for crashdump
+         kernel, because otherwise the crashdump kernel might become
+         unusable due to OOM issues.
+

May be you can add a 'Fixes:' tag here.

> Signed-off-by: Kairui Song <kasong@redhat.com>
> ---
>   fs/proc/vmcore.c | 20 ++++++++++++++++++++
>   1 file changed, 20 insertions(+)
> 
> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> index 3fe90443c1bb..e28695ef2439 100644
> --- a/fs/proc/vmcore.c
> +++ b/fs/proc/vmcore.c
> @@ -53,6 +53,9 @@ static struct proc_dir_entry *proc_vmcore;
>   /* Device Dump list and mutex to synchronize access to list */
>   static LIST_HEAD(vmcoredd_list);
>   static DEFINE_MUTEX(vmcoredd_mutex);
> +
> +/* Device Dump Limit */
> +static size_t vmcoredd_limit;
>   #endif /* CONFIG_PROC_VMCORE_DEVICE_DUMP */
>   
>   /* Device Dump Size */
> @@ -1465,6 +1468,11 @@ int vmcore_add_device_dump(struct vmcoredd_data *data)
>   	data_size = roundup(sizeof(struct vmcoredd_header) + data->size,
>   			    PAGE_SIZE);
>   
> +	if (vmcoredd_orig_sz + data_size >= vmcoredd_limit) {
> +		ret = -ENOMEM;

Should we be adding a WARN() here to let the user know that the device 
dump data will not be available in vmcore?

> +		goto out_err;
> +	}
> +
>   	/* Allocate buffer for driver's to write their dumps */
>   	buf = vmcore_alloc_buf(data_size);
>   	if (!buf) {
> @@ -1502,6 +1510,18 @@ int vmcore_add_device_dump(struct vmcoredd_data *data)
>   	return ret;
>   }
>   EXPORT_SYMBOL(vmcore_add_device_dump);
> +
> +static int __init parse_vmcoredd_limit(char *arg)
> +{
> +	char *end;
> +
> +	if (!arg)
> +		return -EINVAL;
> +	vmcoredd_limit = memparse(arg, &end);
> +	return end > arg ? 0 : -EINVAL;
> +
> +}
> +__setup("device_dump_limit=", parse_vmcoredd_limit);

We should be adding this boot argument and its description to 
'Documentation/admin-guide/kernel-parameters.txt'

>   #endif /* CONFIG_PROC_VMCORE_DEVICE_DUMP */
>   
>   /* Free all dumps in vmcore device dump list */
> 

Thanks,
Bhupesh
Dave Young May 13, 2019, 1:52 a.m. UTC | #2
On 05/10/19 at 06:20pm, Kairui Song wrote:
> Device dump allow drivers to add device related dump data to vmcore as
> they want. This have a potential issue, the data is stored in memory,
> drivers may append too much data and use too much memory. The vmcore is
> typically used in a kdump kernel which runs in a pre-reserved small
> chunk of memory. So as a result it will make kdump unusable at all due
> to OOM issues.
> 
> So introduce new device_dump_limit= kernel parameter, and set the
> default limit to 0, so device dump is not enabled unless user specify
> the accetable maxiam memory usage for device dump data. In this way user
> will also have the chance to adjust the kdump reserved memory
> accordingly.

The device dump is only affective in kdump 2nd kernel, so add the
limitation seems not useful.  One is hard to know the correct size
unless one does some crash test.  If one did the test and want to eanble
the device dump he needs increase crashkernel= size in 1st kernel and
add the limit param in 2nd kernel.

So a global on/off param sounds easier and better, something like
vmcore_device_dump=on  (default is off) 

> 
> Signed-off-by: Kairui Song <kasong@redhat.com>
> ---
>  fs/proc/vmcore.c | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
> 
> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> index 3fe90443c1bb..e28695ef2439 100644
> --- a/fs/proc/vmcore.c
> +++ b/fs/proc/vmcore.c
> @@ -53,6 +53,9 @@ static struct proc_dir_entry *proc_vmcore;
>  /* Device Dump list and mutex to synchronize access to list */
>  static LIST_HEAD(vmcoredd_list);
>  static DEFINE_MUTEX(vmcoredd_mutex);
> +
> +/* Device Dump Limit */
> +static size_t vmcoredd_limit;
>  #endif /* CONFIG_PROC_VMCORE_DEVICE_DUMP */
>  
>  /* Device Dump Size */
> @@ -1465,6 +1468,11 @@ int vmcore_add_device_dump(struct vmcoredd_data *data)
>  	data_size = roundup(sizeof(struct vmcoredd_header) + data->size,
>  			    PAGE_SIZE);
>  
> +	if (vmcoredd_orig_sz + data_size >= vmcoredd_limit) {
> +		ret = -ENOMEM;
> +		goto out_err;
> +	}
> +
>  	/* Allocate buffer for driver's to write their dumps */
>  	buf = vmcore_alloc_buf(data_size);
>  	if (!buf) {
> @@ -1502,6 +1510,18 @@ int vmcore_add_device_dump(struct vmcoredd_data *data)
>  	return ret;
>  }
>  EXPORT_SYMBOL(vmcore_add_device_dump);
> +
> +static int __init parse_vmcoredd_limit(char *arg)
> +{
> +	char *end;
> +
> +	if (!arg)
> +		return -EINVAL;
> +	vmcoredd_limit = memparse(arg, &end);
> +	return end > arg ? 0 : -EINVAL;
> +
> +}
> +__setup("device_dump_limit=", parse_vmcoredd_limit);
>  #endif /* CONFIG_PROC_VMCORE_DEVICE_DUMP */
>  
>  /* Free all dumps in vmcore device dump list */
> -- 
> 2.20.1
> 

Thanks
Dave
Kairui Song May 13, 2019, 2:19 a.m. UTC | #3
On Mon, May 13, 2019 at 9:52 AM Dave Young <dyoung@redhat.com> wrote:
>
> On 05/10/19 at 06:20pm, Kairui Song wrote:
> > Device dump allow drivers to add device related dump data to vmcore as
> > they want. This have a potential issue, the data is stored in memory,
> > drivers may append too much data and use too much memory. The vmcore is
> > typically used in a kdump kernel which runs in a pre-reserved small
> > chunk of memory. So as a result it will make kdump unusable at all due
> > to OOM issues.
> >
> > So introduce new device_dump_limit= kernel parameter, and set the
> > default limit to 0, so device dump is not enabled unless user specify
> > the accetable maxiam memory usage for device dump data. In this way user
> > will also have the chance to adjust the kdump reserved memory
> > accordingly.
>
> The device dump is only affective in kdump 2nd kernel, so add the
> limitation seems not useful.  One is hard to know the correct size
> unless one does some crash test.  If one did the test and want to eanble
> the device dump he needs increase crashkernel= size in 1st kernel and
> add the limit param in 2nd kernel.
>
> So a global on/off param sounds easier and better, something like
> vmcore_device_dump=on  (default is off)

Yes, on/off could be another way to solve this issue, the size limit
could being more flexibility, if device dump is not asking for too
much memory then it would just work but bring extra complexity indeed.
Considering it's actually hard to know how much memory is needed for
the device dump drivers to work, I'll update to use the on/off cmdline
then.

>
> >
> > Signed-off-by: Kairui Song <kasong@redhat.com>
> > ---
> >  fs/proc/vmcore.c | 20 ++++++++++++++++++++
> >  1 file changed, 20 insertions(+)
> >
> > diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> > index 3fe90443c1bb..e28695ef2439 100644
> > --- a/fs/proc/vmcore.c
> > +++ b/fs/proc/vmcore.c
> > @@ -53,6 +53,9 @@ static struct proc_dir_entry *proc_vmcore;
> >  /* Device Dump list and mutex to synchronize access to list */
> >  static LIST_HEAD(vmcoredd_list);
> >  static DEFINE_MUTEX(vmcoredd_mutex);
> > +
> > +/* Device Dump Limit */
> > +static size_t vmcoredd_limit;
> >  #endif /* CONFIG_PROC_VMCORE_DEVICE_DUMP */
> >
> >  /* Device Dump Size */
> > @@ -1465,6 +1468,11 @@ int vmcore_add_device_dump(struct vmcoredd_data *data)
> >       data_size = roundup(sizeof(struct vmcoredd_header) + data->size,
> >                           PAGE_SIZE);
> >
> > +     if (vmcoredd_orig_sz + data_size >= vmcoredd_limit) {
> > +             ret = -ENOMEM;
> > +             goto out_err;
> > +     }
> > +
> >       /* Allocate buffer for driver's to write their dumps */
> >       buf = vmcore_alloc_buf(data_size);
> >       if (!buf) {
> > @@ -1502,6 +1510,18 @@ int vmcore_add_device_dump(struct vmcoredd_data *data)
> >       return ret;
> >  }
> >  EXPORT_SYMBOL(vmcore_add_device_dump);
> > +
> > +static int __init parse_vmcoredd_limit(char *arg)
> > +{
> > +     char *end;
> > +
> > +     if (!arg)
> > +             return -EINVAL;
> > +     vmcoredd_limit = memparse(arg, &end);
> > +     return end > arg ? 0 : -EINVAL;
> > +
> > +}
> > +__setup("device_dump_limit=", parse_vmcoredd_limit);
> >  #endif /* CONFIG_PROC_VMCORE_DEVICE_DUMP */
> >
> >  /* Free all dumps in vmcore device dump list */
> > --
> > 2.20.1
> >
>
> Thanks
> Dave
Kairui Song May 16, 2019, 8:19 a.m. UTC | #4
On Fri, May 10, 2019 at 7:17 PM Bhupesh Sharma <bhsharma@redhat.com> wrote:
>
> Hi Kairui,
>
> Thanks for the patch. Please see my comments in-line:
>
> On 05/10/2019 03:50 PM, Kairui Song wrote:
> > Device dump allow drivers to add device related dump data to vmcore as
> > they want. This have a potential issue, the data is stored in memory,
> > drivers may append too much data and use too much memory. The vmcore is
> > typically used in a kdump kernel which runs in a pre-reserved small
> > chunk of memory. So as a result it will make kdump unusable at all due
> > to OOM issues.
> >
> > So introduce new device_dump_limit= kernel parameter, and set the
> > default limit to 0, so device dump is not enabled unless user specify
> > the accetable maxiam
>
>        ^^^^ acceptable maximum

Will fix this typo.

>
> > memory usage for device dump data. In this way user
> > will also have the chance to adjust the kdump reserved memory
> > accordingly.
>
> Hmmm., this doesn't give much confidence with the
> PROC_VMCORE_DEVICE_DUMP feature in its current shape. Rather shouldn't
> we be enabling config PROC_VMCORE_DEVICE_DUMP only under EXPERT mode for
> now, considering that this feature needs further thrashing and testing
> with real setups including platforms where drivers append large amounts
> of data to vmcore:

I think no need to move it to expert mode, just leave it disabled by
default should be better, that should be enough to make sure driver
won't append that much memory and cause OOM, while it could still be
enabled without changing the kernel, so this feature won't bring extra
risk, and could be enabled anytime easily.

>
> diff --git a/fs/proc/Kconfig b/fs/proc/Kconfig
> index 817c02b13b1d..c47a12cf7fc0 100644
> --- a/fs/proc/Kconfig
> +++ b/fs/proc/Kconfig
> @@ -45,7 +45,7 @@ config PROC_VMCORE
>           Exports the dump image of crashed kernel in ELF format.
>
>   config PROC_VMCORE_DEVICE_DUMP
> -       bool "Device Hardware/Firmware Log Collection"
> +       bool "Device Hardware/Firmware Log Collection" if EXPERT
>          depends on PROC_VMCORE
>          default n
>          help
> @@ -59,6 +59,12 @@ config PROC_VMCORE_DEVICE_DUMP
>            If you say Y here, the collected device dumps will be added
>            as ELF notes to /proc/vmcore.
>
> +         Considering that there can be device drivers which append
> +         large amounts of data to vmcore, you should say N here unless
> +         you are reserving a large chunk of memory for crashdump
> +         kernel, because otherwise the crashdump kernel might become
> +         unusable due to OOM issues.
> +
>
> May be you can add a 'Fixes:' tag here.

Problem is previous commit seems not broken, just bring extra memory
stress. Is "Fixes:" tag suitable for this commit?

>
> > Signed-off-by: Kairui Song <kasong@redhat.com>
> > ---
> >   fs/proc/vmcore.c | 20 ++++++++++++++++++++
> >   1 file changed, 20 insertions(+)
> >
> > diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> > index 3fe90443c1bb..e28695ef2439 100644
> > --- a/fs/proc/vmcore.c
> > +++ b/fs/proc/vmcore.c
> > @@ -53,6 +53,9 @@ static struct proc_dir_entry *proc_vmcore;
> >   /* Device Dump list and mutex to synchronize access to list */
> >   static LIST_HEAD(vmcoredd_list);
> >   static DEFINE_MUTEX(vmcoredd_mutex);
> > +
> > +/* Device Dump Limit */
> > +static size_t vmcoredd_limit;
> >   #endif /* CONFIG_PROC_VMCORE_DEVICE_DUMP */
> >
> >   /* Device Dump Size */
> > @@ -1465,6 +1468,11 @@ int vmcore_add_device_dump(struct vmcoredd_data *data)
> >       data_size = roundup(sizeof(struct vmcoredd_header) + data->size,
> >                           PAGE_SIZE);
> >
> > +     if (vmcoredd_orig_sz + data_size >= vmcoredd_limit) {
> > +             ret = -ENOMEM;
>
> Should we be adding a WARN() here to let the user know that the device
> dump data will not be available in vmcore?

Yes, that could be very helpful. How about pr_err_once? WARN is too
noise, just give a hint to the user that device dump is disabled
should be enough, so user will know why device dump data is not
present and will just enable it.

>
> > +             goto out_err;
> > +     }
> > +
> >       /* Allocate buffer for driver's to write their dumps */
> >       buf = vmcore_alloc_buf(data_size);
> >       if (!buf) {
> > @@ -1502,6 +1510,18 @@ int vmcore_add_device_dump(struct vmcoredd_data *data)
> >       return ret;
> >   }
> >   EXPORT_SYMBOL(vmcore_add_device_dump);
> > +
> > +static int __init parse_vmcoredd_limit(char *arg)
> > +{
> > +     char *end;
> > +
> > +     if (!arg)
> > +             return -EINVAL;
> > +     vmcoredd_limit = memparse(arg, &end);
> > +     return end > arg ? 0 : -EINVAL;
> > +
> > +}
> > +__setup("device_dump_limit=", parse_vmcoredd_limit);
>
> We should be adding this boot argument and its description to
> 'Documentation/admin-guide/kernel-parameters.txt'

Good suggestion, will update the document.

>
> >   #endif /* CONFIG_PROC_VMCORE_DEVICE_DUMP */
> >
> >   /* Free all dumps in vmcore device dump list */
> >
>
> Thanks,
> Bhupesh

Thanks for the review!



--
Best Regards,
Kairui Song
Bhupesh Sharma May 20, 2019, 5:55 a.m. UTC | #5
On 05/16/2019 01:49 PM, Kairui Song wrote:
> On Fri, May 10, 2019 at 7:17 PM Bhupesh Sharma <bhsharma@redhat.com> wrote:
>>
>> Hi Kairui,
>>
>> Thanks for the patch. Please see my comments in-line:
>>
>> On 05/10/2019 03:50 PM, Kairui Song wrote:
>>> Device dump allow drivers to add device related dump data to vmcore as
>>> they want. This have a potential issue, the data is stored in memory,
>>> drivers may append too much data and use too much memory. The vmcore is
>>> typically used in a kdump kernel which runs in a pre-reserved small
>>> chunk of memory. So as a result it will make kdump unusable at all due
>>> to OOM issues.
>>>
>>> So introduce new device_dump_limit= kernel parameter, and set the
>>> default limit to 0, so device dump is not enabled unless user specify
>>> the accetable maxiam
>>
>>         ^^^^ acceptable maximum
> 
> Will fix this typo.

Ok.

>>> memory usage for device dump data. In this way user
>>> will also have the chance to adjust the kdump reserved memory
>>> accordingly.
>>
>> Hmmm., this doesn't give much confidence with the
>> PROC_VMCORE_DEVICE_DUMP feature in its current shape. Rather shouldn't
>> we be enabling config PROC_VMCORE_DEVICE_DUMP only under EXPERT mode for
>> now, considering that this feature needs further thrashing and testing
>> with real setups including platforms where drivers append large amounts
>> of data to vmcore:
> 
> I think no need to move it to expert mode, just leave it disabled by
> default should be better, that should be enough to make sure driver
> won't append that much memory and cause OOM, while it could still be
> enabled without changing the kernel, so this feature won't bring extra
> risk, and could be enabled anytime easily.

I have seen some arm64 users report issues on mailing lists with 
PROC_VMCORE_DEVICE_DUMP enabled as this causes frequent OOM in the arm64 
crash dump kernel.

I think they are using this infrastructure to extend/enable device 
driver debugging on some arm64 platforms and finding issues with the 
crash dump kernel.

I will do some analysis later-on (when I get some spare time) and post a 
patch (if needed) to put the same under EXPERT mode for now.

>> diff --git a/fs/proc/Kconfig b/fs/proc/Kconfig
>> index 817c02b13b1d..c47a12cf7fc0 100644
>> --- a/fs/proc/Kconfig
>> +++ b/fs/proc/Kconfig
>> @@ -45,7 +45,7 @@ config PROC_VMCORE
>>            Exports the dump image of crashed kernel in ELF format.
>>
>>    config PROC_VMCORE_DEVICE_DUMP
>> -       bool "Device Hardware/Firmware Log Collection"
>> +       bool "Device Hardware/Firmware Log Collection" if EXPERT
>>           depends on PROC_VMCORE
>>           default n
>>           help
>> @@ -59,6 +59,12 @@ config PROC_VMCORE_DEVICE_DUMP
>>             If you say Y here, the collected device dumps will be added
>>             as ELF notes to /proc/vmcore.
>>
>> +         Considering that there can be device drivers which append
>> +         large amounts of data to vmcore, you should say N here unless
>> +         you are reserving a large chunk of memory for crashdump
>> +         kernel, because otherwise the crashdump kernel might become
>> +         unusable due to OOM issues.
>> +
>>
>> May be you can add a 'Fixes:' tag here.
> 
> Problem is previous commit seems not broken, just bring extra memory
> stress. Is "Fixes:" tag suitable for this commit?

I think since the earlier patch causes an OOM, it would be better to 
atleast mention it in the git log (for easier git bisect later on).

If not the 'Fixes:' tag may be we can use a 'Since commit ..' like 
wording in the commit log.

>>> Signed-off-by: Kairui Song <kasong@redhat.com>
>>> ---
>>>    fs/proc/vmcore.c | 20 ++++++++++++++++++++
>>>    1 file changed, 20 insertions(+)
>>>
>>> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
>>> index 3fe90443c1bb..e28695ef2439 100644
>>> --- a/fs/proc/vmcore.c
>>> +++ b/fs/proc/vmcore.c
>>> @@ -53,6 +53,9 @@ static struct proc_dir_entry *proc_vmcore;
>>>    /* Device Dump list and mutex to synchronize access to list */
>>>    static LIST_HEAD(vmcoredd_list);
>>>    static DEFINE_MUTEX(vmcoredd_mutex);
>>> +
>>> +/* Device Dump Limit */
>>> +static size_t vmcoredd_limit;
>>>    #endif /* CONFIG_PROC_VMCORE_DEVICE_DUMP */
>>>
>>>    /* Device Dump Size */
>>> @@ -1465,6 +1468,11 @@ int vmcore_add_device_dump(struct vmcoredd_data *data)
>>>        data_size = roundup(sizeof(struct vmcoredd_header) + data->size,
>>>                            PAGE_SIZE);
>>>
>>> +     if (vmcoredd_orig_sz + data_size >= vmcoredd_limit) {
>>> +             ret = -ENOMEM;
>>
>> Should we be adding a WARN() here to let the user know that the device
>> dump data will not be available in vmcore?
> 
> Yes, that could be very helpful. How about pr_err_once? WARN is too
> noise, just give a hint to the user that device dump is disabled
> should be enough, so user will know why device dump data is not
> present and will just enable it.

Sure, pr_err() should be OK as well.

>>> +             goto out_err;
>>> +     }
>>> +
>>>        /* Allocate buffer for driver's to write their dumps */
>>>        buf = vmcore_alloc_buf(data_size);
>>>        if (!buf) {
>>> @@ -1502,6 +1510,18 @@ int vmcore_add_device_dump(struct vmcoredd_data *data)
>>>        return ret;
>>>    }
>>>    EXPORT_SYMBOL(vmcore_add_device_dump);
>>> +
>>> +static int __init parse_vmcoredd_limit(char *arg)
>>> +{
>>> +     char *end;
>>> +
>>> +     if (!arg)
>>> +             return -EINVAL;
>>> +     vmcoredd_limit = memparse(arg, &end);
>>> +     return end > arg ? 0 : -EINVAL;
>>> +
>>> +}
>>> +__setup("device_dump_limit=", parse_vmcoredd_limit);
>>
>> We should be adding this boot argument and its description to
>> 'Documentation/admin-guide/kernel-parameters.txt'
> 
> Good suggestion, will update the document.
> 
>>
>>>    #endif /* CONFIG_PROC_VMCORE_DEVICE_DUMP */
>>>
>>>    /* Free all dumps in vmcore device dump list */
>>>

Thanks,
Bhupesh
Kairui Song May 20, 2019, 6:20 a.m. UTC | #6
On Mon, May 20, 2019 at 1:55 PM Bhupesh Sharma <bhsharma@redhat.com> wrote:
>
> On 05/16/2019 01:49 PM, Kairui Song wrote:
> > On Fri, May 10, 2019 at 7:17 PM Bhupesh Sharma <bhsharma@redhat.com> wrote:
> >>
> >> Hi Kairui,
> >>
> >> Thanks for the patch. Please see my comments in-line:
> >>
> >> On 05/10/2019 03:50 PM, Kairui Song wrote:
> >>> Device dump allow drivers to add device related dump data to vmcore as
> >>> they want. This have a potential issue, the data is stored in memory,
> >>> drivers may append too much data and use too much memory. The vmcore is
> >>> typically used in a kdump kernel which runs in a pre-reserved small
> >>> chunk of memory. So as a result it will make kdump unusable at all due
> >>> to OOM issues.
> >>>
> >>> So introduce new device_dump_limit= kernel parameter, and set the
> >>> default limit to 0, so device dump is not enabled unless user specify
> >>> the accetable maxiam
> >>
> >>         ^^^^ acceptable maximum
> >
> > Will fix this typo.
>
> Ok.
>
> >>> memory usage for device dump data. In this way user
> >>> will also have the chance to adjust the kdump reserved memory
> >>> accordingly.
> >>
> >> Hmmm., this doesn't give much confidence with the
> >> PROC_VMCORE_DEVICE_DUMP feature in its current shape. Rather shouldn't
> >> we be enabling config PROC_VMCORE_DEVICE_DUMP only under EXPERT mode for
> >> now, considering that this feature needs further thrashing and testing
> >> with real setups including platforms where drivers append large amounts
> >> of data to vmcore:
> >
> > I think no need to move it to expert mode, just leave it disabled by
> > default should be better, that should be enough to make sure driver
> > won't append that much memory and cause OOM, while it could still be
> > enabled without changing the kernel, so this feature won't bring extra
> > risk, and could be enabled anytime easily.
>
> I have seen some arm64 users report issues on mailing lists with
> PROC_VMCORE_DEVICE_DUMP enabled as this causes frequent OOM in the arm64
> crash dump kernel.
>
> I think they are using this infrastructure to extend/enable device
> driver debugging on some arm64 platforms and finding issues with the
> crash dump kernel.
>
> I will do some analysis later-on (when I get some spare time) and post a
> patch (if needed) to put the same under EXPERT mode for now.
>
> >> diff --git a/fs/proc/Kconfig b/fs/proc/Kconfig
> >> index 817c02b13b1d..c47a12cf7fc0 100644
> >> --- a/fs/proc/Kconfig
> >> +++ b/fs/proc/Kconfig
> >> @@ -45,7 +45,7 @@ config PROC_VMCORE
> >>            Exports the dump image of crashed kernel in ELF format.
> >>
> >>    config PROC_VMCORE_DEVICE_DUMP
> >> -       bool "Device Hardware/Firmware Log Collection"
> >> +       bool "Device Hardware/Firmware Log Collection" if EXPERT
> >>           depends on PROC_VMCORE
> >>           default n
> >>           help
> >> @@ -59,6 +59,12 @@ config PROC_VMCORE_DEVICE_DUMP
> >>             If you say Y here, the collected device dumps will be added
> >>             as ELF notes to /proc/vmcore.
> >>
> >> +         Considering that there can be device drivers which append
> >> +         large amounts of data to vmcore, you should say N here unless
> >> +         you are reserving a large chunk of memory for crashdump
> >> +         kernel, because otherwise the crashdump kernel might become
> >> +         unusable due to OOM issues.
> >> +
> >>
> >> May be you can add a 'Fixes:' tag here.
> >
> > Problem is previous commit seems not broken, just bring extra memory
> > stress. Is "Fixes:" tag suitable for this commit?
>
> I think since the earlier patch causes an OOM, it would be better to
> atleast mention it in the git log (for easier git bisect later on).
>
> If not the 'Fixes:' tag may be we can use a 'Since commit ..' like
> wording in the commit log.
>
> >>> Signed-off-by: Kairui Song <kasong@redhat.com>
> >>> ---
> >>>    fs/proc/vmcore.c | 20 ++++++++++++++++++++
> >>>    1 file changed, 20 insertions(+)
> >>>
> >>> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> >>> index 3fe90443c1bb..e28695ef2439 100644
> >>> --- a/fs/proc/vmcore.c
> >>> +++ b/fs/proc/vmcore.c
> >>> @@ -53,6 +53,9 @@ static struct proc_dir_entry *proc_vmcore;
> >>>    /* Device Dump list and mutex to synchronize access to list */
> >>>    static LIST_HEAD(vmcoredd_list);
> >>>    static DEFINE_MUTEX(vmcoredd_mutex);
> >>> +
> >>> +/* Device Dump Limit */
> >>> +static size_t vmcoredd_limit;
> >>>    #endif /* CONFIG_PROC_VMCORE_DEVICE_DUMP */
> >>>
> >>>    /* Device Dump Size */
> >>> @@ -1465,6 +1468,11 @@ int vmcore_add_device_dump(struct vmcoredd_data *data)
> >>>        data_size = roundup(sizeof(struct vmcoredd_header) + data->size,
> >>>                            PAGE_SIZE);
> >>>
> >>> +     if (vmcoredd_orig_sz + data_size >= vmcoredd_limit) {
> >>> +             ret = -ENOMEM;
> >>
> >> Should we be adding a WARN() here to let the user know that the device
> >> dump data will not be available in vmcore?
> >
> > Yes, that could be very helpful. How about pr_err_once? WARN is too
> > noise, just give a hint to the user that device dump is disabled
> > should be enough, so user will know why device dump data is not
> > present and will just enable it.
>
> Sure, pr_err() should be OK as well.
>
> >>> +             goto out_err;
> >>> +     }
> >>> +
> >>>        /* Allocate buffer for driver's to write their dumps */
> >>>        buf = vmcore_alloc_buf(data_size);
> >>>        if (!buf) {
> >>> @@ -1502,6 +1510,18 @@ int vmcore_add_device_dump(struct vmcoredd_data *data)
> >>>        return ret;
> >>>    }
> >>>    EXPORT_SYMBOL(vmcore_add_device_dump);
> >>> +
> >>> +static int __init parse_vmcoredd_limit(char *arg)
> >>> +{
> >>> +     char *end;
> >>> +
> >>> +     if (!arg)
> >>> +             return -EINVAL;
> >>> +     vmcoredd_limit = memparse(arg, &end);
> >>> +     return end > arg ? 0 : -EINVAL;
> >>> +
> >>> +}
> >>> +__setup("device_dump_limit=", parse_vmcoredd_limit);
> >>
> >> We should be adding this boot argument and its description to
> >> 'Documentation/admin-guide/kernel-parameters.txt'
> >
> > Good suggestion, will update the document.
> >
> >>
> >>>    #endif /* CONFIG_PROC_VMCORE_DEVICE_DUMP */
> >>>
> >>>    /* Free all dumps in vmcore device dump list */
> >>>
>
> Thanks,
> Bhupesh

Thanks for the reply, I've updated the patch accordingly and sent V2.

Patch
diff mbox series

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 3fe90443c1bb..e28695ef2439 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -53,6 +53,9 @@  static struct proc_dir_entry *proc_vmcore;
 /* Device Dump list and mutex to synchronize access to list */
 static LIST_HEAD(vmcoredd_list);
 static DEFINE_MUTEX(vmcoredd_mutex);
+
+/* Device Dump Limit */
+static size_t vmcoredd_limit;
 #endif /* CONFIG_PROC_VMCORE_DEVICE_DUMP */
 
 /* Device Dump Size */
@@ -1465,6 +1468,11 @@  int vmcore_add_device_dump(struct vmcoredd_data *data)
 	data_size = roundup(sizeof(struct vmcoredd_header) + data->size,
 			    PAGE_SIZE);
 
+	if (vmcoredd_orig_sz + data_size >= vmcoredd_limit) {
+		ret = -ENOMEM;
+		goto out_err;
+	}
+
 	/* Allocate buffer for driver's to write their dumps */
 	buf = vmcore_alloc_buf(data_size);
 	if (!buf) {
@@ -1502,6 +1510,18 @@  int vmcore_add_device_dump(struct vmcoredd_data *data)
 	return ret;
 }
 EXPORT_SYMBOL(vmcore_add_device_dump);
+
+static int __init parse_vmcoredd_limit(char *arg)
+{
+	char *end;
+
+	if (!arg)
+		return -EINVAL;
+	vmcoredd_limit = memparse(arg, &end);
+	return end > arg ? 0 : -EINVAL;
+
+}
+__setup("device_dump_limit=", parse_vmcoredd_limit);
 #endif /* CONFIG_PROC_VMCORE_DEVICE_DUMP */
 
 /* Free all dumps in vmcore device dump list */