linux-kernel.vger.kernel.org archive mirror
* Re: doing lots of disk writes causes oom killer to kill processes
@ 2013-03-12  2:15 Hillf Danton
  2013-03-12  9:03 ` Michal Suchanek
  2013-08-26 13:51 ` Michal Suchanek
  0 siblings, 2 replies; 13+ messages in thread
From: Hillf Danton @ 2013-03-12  2:15 UTC (permalink / raw)
  To: Michal Suchanek, LKML, Linux-MM

>On 11 March 2013 13:15, Michal Suchanek <hramrach@gmail.com> wrote:
>>On 8 February 2013 17:31, Michal Suchanek <hramrach@gmail.com> wrote:
>> Hello,
>>
>> I am dealing with VM disk images and performing something like wiping
>> free space to prepare image for compressing and storing on server or
>> copying it to external USB disk causes
>>
>> 1) a system lockup on the order of a few tens of seconds when all CPU cores
>> are 100% used by the system and the machine is basically unusable
>>
>> 2) oom killer killing processes
>>
>> This is all on a system with 8G RAM so there should be plenty of space to work with.
>>
>> This happens with kernels 3.6.4 or 3.7.1
>>
>> With earlier kernel versions (some 3.0 or 3.2 kernels) this was not a
>> problem even with less ram.
>>
>> I have  vm.swappiness = 0 set for a long  time already.
>>
>>
>I did some testing with 3.7.1 and with swappiness as much as 75 the
>kernel still causes all cores to loop somewhere in system when writing
>lots of data to disk.
>
>With swappiness as much as 90 processes still get killed on large disk writes.
>
>Given that the max is 100, the interval in which mm works at all is
>going to be very narrow, less than 10% of the parameter range. This is
>a severe regression, as is the CPU time consumed by the kernel.
>
>The io scheduler is the default cfq.
>
>If you have any idea what to try other than downgrading to an earlier
>unaffected kernel I would like to hear.
>
Can you try commit 3cf23841b4b7(mm/vmscan.c: avoid possible
deadlock caused by too_many_isolated())?

Or try 3.8 and/or 3.9, additionally?
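
For reference, one way to test a single mainline commit like that is to
cherry-pick it onto the stable kernel already in use. A rough, untested
sketch, assuming a clone of the kernel git tree that has both the mainline
commit and the stable tags (the v3.7.1 tag and branch name are only examples):

  git checkout -b test-too-many-isolated v3.7.1   # base: the kernel in use
  git cherry-pick 3cf23841b4b7                    # the suggested fix
  # then rebuild, install and boot this kernel as usual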

Hillf


* Re: doing lots of disk writes causes oom killer to kill processes
  2013-03-12  2:15 doing lots of disk writes causes oom killer to kill processes Hillf Danton
@ 2013-03-12  9:03 ` Michal Suchanek
  2013-08-26 13:51 ` Michal Suchanek
  1 sibling, 0 replies; 13+ messages in thread
From: Michal Suchanek @ 2013-03-12  9:03 UTC (permalink / raw)
  To: Hillf Danton; +Cc: LKML, Linux-MM

On 12 March 2013 03:15, Hillf Danton <dhillf@gmail.com> wrote:
>>On 11 March 2013 13:15, Michal Suchanek <hramrach@gmail.com> wrote:
>>>On 8 February 2013 17:31, Michal Suchanek <hramrach@gmail.com> wrote:
>>> Hello,
>>>
>>> I am dealing with VM disk images and performing something like wiping
>>> free space to prepare image for compressing and storing on server or
>>> copying it to external USB disk causes
>>>
>>> 1) a system lockup on the order of a few tens of seconds when all CPU cores
>>> are 100% used by the system and the machine is basically unusable
>>>
>>> 2) oom killer killing processes
>>>
>>> This is all on a system with 8G RAM so there should be plenty of space to work with.
>>>
>>> This happens with kernels 3.6.4 or 3.7.1
>>>
>>> With earlier kernel versions (some 3.0 or 3.2 kernels) this was not a
>>> problem even with less ram.
>>>
>>> I have  vm.swappiness = 0 set for a long  time already.
>>>
>>>
>>I did some testing with 3.7.1 and with swappiness as much as 75 the
>>kernel still causes all cores to loop somewhere in system when writing
>>lots of data to disk.
>>
>>With swappiness as much as 90 processes still get killed on large disk writes.
>>
>>Given that the max is 100, the interval in which mm works at all is
>>going to be very narrow, less than 10% of the parameter range. This is
>>a severe regression, as is the CPU time consumed by the kernel.
>>
>>The io scheduler is the default cfq.
>>
>>If you have any idea what to try other than downgrading to an earlier
>>unaffected kernel I would like to hear.
>>
> Can you try commit 3cf23841b4b7(mm/vmscan.c: avoid possible
> deadlock caused by too_many_isolated())?
>
> Or try 3.8 and/or 3.9, additionally?

Hello,

In the meantime I tried setting the IO scheduler to deadline because I
remember using that one in my self-built kernels due to cfq breaking
some obscure block driver.

With the deadline IO scheduler I can set swappiness back to 0 and the
system works normally even for a moderate amount of IO - restoring disk
images from the network. The same workload would cause lockups and the
OOM killer running loose with the cfq scheduler.
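
For anyone wanting to reproduce the comparison: the scheduler can be switched
per device at runtime, or picked at boot, roughly like this (sda is just an
example device):

  cat /sys/block/sda/queue/scheduler            # e.g. "noop deadline [cfq]"
  echo deadline > /sys/block/sda/queue/scheduler
  # or for all block devices, add "elevator=deadline" to the kernel command line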

So I guess I found what breaks the system and it is not so much the
kernel version. It's using pre-built kernels with the default
scheduler.

Thanks

Michal


* Re: doing lots of disk writes causes oom killer to kill processes
  2013-03-12  2:15 doing lots of disk writes causes oom killer to kill processes Hillf Danton
  2013-03-12  9:03 ` Michal Suchanek
@ 2013-08-26 13:51 ` Michal Suchanek
  2013-09-05 10:12   ` Michal Suchanek
  1 sibling, 1 reply; 13+ messages in thread
From: Michal Suchanek @ 2013-08-26 13:51 UTC (permalink / raw)
  To: Hillf Danton; +Cc: LKML, Linux-MM

On 12 March 2013 03:15, Hillf Danton <dhillf@gmail.com> wrote:
>>On 11 March 2013 13:15, Michal Suchanek <hramrach@gmail.com> wrote:
>>>On 8 February 2013 17:31, Michal Suchanek <hramrach@gmail.com> wrote:
>>> Hello,
>>>
>>> I am dealing with VM disk images and performing something like wiping
>>> free space to prepare image for compressing and storing on server or
>>> copying it to external USB disk causes
>>>
>>> 1) a system lockup on the order of a few tens of seconds when all CPU cores
>>> are 100% used by the system and the machine is basically unusable
>>>
>>> 2) oom killer killing processes
>>>
>>> This is all on a system with 8G RAM so there should be plenty of space to work with.
>>>
>>> This happens with kernels 3.6.4 or 3.7.1
>>>
>>> With earlier kernel versions (some 3.0 or 3.2 kernels) this was not a
>>> problem even with less ram.
>>>
>>> I have  vm.swappiness = 0 set for a long  time already.
>>>
>>>
>>I did some testing with 3.7.1 and with swappiness as much as 75 the
>>kernel still causes all cores to loop somewhere in system when writing
>>lots of data to disk.
>>
>>With swappiness as much as 90 processes still get killed on large disk writes.
>>
>>Given that the max is 100, the interval in which mm works at all is
>>going to be very narrow, less than 10% of the parameter range. This is
>>a severe regression, as is the CPU time consumed by the kernel.
>>
>>The io scheduler is the default cfq.
>>
>>If you have any idea what to try other than downgrading to an earlier
>>unaffected kernel I would like to hear.
>>
> Can you try commit 3cf23841b4b7(mm/vmscan.c: avoid possible
> deadlock caused by too_many_isolated())?
>
> Or try 3.8 and/or 3.9, additionally?
>

Hello,

with deadline IO scheduler I experience this issue less often but it
still happens.

I am on 3.9.6 Debian kernel so 3.8 did not fix this problem.

Do you have some idea what to log so that useful information about the
lockup is gathered?

Thanks

Michal


* Re: doing lots of disk writes causes oom killer to kill processes
  2013-08-26 13:51 ` Michal Suchanek
@ 2013-09-05 10:12   ` Michal Suchanek
  2013-09-17 13:31     ` Michal Suchanek
  0 siblings, 1 reply; 13+ messages in thread
From: Michal Suchanek @ 2013-09-05 10:12 UTC (permalink / raw)
  To: Hillf Danton; +Cc: LKML, Linux-MM

Hello

On 26 August 2013 15:51, Michal Suchanek <hramrach@gmail.com> wrote:
> On 12 March 2013 03:15, Hillf Danton <dhillf@gmail.com> wrote:
>>>On 11 March 2013 13:15, Michal Suchanek <hramrach@gmail.com> wrote:
>>>>On 8 February 2013 17:31, Michal Suchanek <hramrach@gmail.com> wrote:
>>>> Hello,
>>>>
>>>> I am dealing with VM disk images and performing something like wiping
>>>> free space to prepare image for compressing and storing on server or
>>>> copying it to external USB disk causes
>>>>
>>>> 1) a system lockup on the order of a few tens of seconds when all CPU cores
>>>> are 100% used by the system and the machine is basically unusable
>>>>
>>>> 2) oom killer killing processes
>>>>
>>>> This is all on a system with 8G RAM so there should be plenty of space to work with.
>>>>
>>>> This happens with kernels 3.6.4 or 3.7.1
>>>>
>>>> With earlier kernel versions (some 3.0 or 3.2 kernels) this was not a
>>>> problem even with less ram.
>>>>
>>>> I have  vm.swappiness = 0 set for a long  time already.
>>>>
>>>>
>>>I did some testing with 3.7.1 and with swappiness as much as 75 the
>>>kernel still causes all cores to loop somewhere in system when writing
>>>lots of data to disk.
>>>
>>>With swappiness as much as 90 processes still get killed on large disk writes.
>>>
>>>Given that the max is 100, the interval in which mm works at all is
>>>going to be very narrow, less than 10% of the parameter range. This is
>>>a severe regression, as is the CPU time consumed by the kernel.
>>>
>>>The io scheduler is the default cfq.
>>>
>>>If you have any idea what to try other than downgrading to an earlier
>>>unaffected kernel I would like to hear.
>>>
>> Can you try commit 3cf23841b4b7(mm/vmscan.c: avoid possible
>> deadlock caused by too_many_isolated())?
>>
>> Or try 3.8 and/or 3.9, additionally?
>>
>
> Hello,
>
> with deadline IO scheduler I experience this issue less often but it
> still happens.
>
> I am on 3.9.6 Debian kernel so 3.8 did not fix this problem.
>
> Do you have some idea what to log so that useful information about the
> lockup is gathered?
>

This appears to be fixed in vanilla 3.11 kernel.

I still get short intermittent lockups and cpu usage spikes up to 20%
on a core but nowhere near the minute+ long lockups with all cores
100% on earlier kernels.

Thanks

Michal


* Re: doing lots of disk writes causes oom killer to kill processes
  2013-09-05 10:12   ` Michal Suchanek
@ 2013-09-17 13:31     ` Michal Suchanek
  2013-09-17 21:13       ` Jan Kara
  0 siblings, 1 reply; 13+ messages in thread
From: Michal Suchanek @ 2013-09-17 13:31 UTC (permalink / raw)
  To: Hillf Danton; +Cc: LKML, Linux-MM

On 5 September 2013 12:12, Michal Suchanek <hramrach@gmail.com> wrote:
> Hello
>
> On 26 August 2013 15:51, Michal Suchanek <hramrach@gmail.com> wrote:
>> On 12 March 2013 03:15, Hillf Danton <dhillf@gmail.com> wrote:
>>>>On 11 March 2013 13:15, Michal Suchanek <hramrach@gmail.com> wrote:
>>>>>On 8 February 2013 17:31, Michal Suchanek <hramrach@gmail.com> wrote:
>>>>> Hello,
>>>>>
>>>>> I am dealing with VM disk images and performing something like wiping
>>>>> free space to prepare image for compressing and storing on server or
>>>>> copying it to external USB disk causes
>>>>>
>>>>> 1) a system lockup on the order of a few tens of seconds when all CPU cores
>>>>> are 100% used by the system and the machine is basically unusable
>>>>>
>>>>> 2) oom killer killing processes
>>>>>
>>>>> This is all on a system with 8G RAM so there should be plenty of space to work with.
>>>>>
>>>>> This happens with kernels 3.6.4 or 3.7.1
>>>>>
>>>>> With earlier kernel versions (some 3.0 or 3.2 kernels) this was not a
>>>>> problem even with less ram.
>>>>>
>>>>> I have  vm.swappiness = 0 set for a long  time already.
>>>>>
>>>>>
>>>>I did some testing with 3.7.1 and with swappiness as much as 75 the
>>>>kernel still causes all cores to loop somewhere in system when writing
>>>>lots of data to disk.
>>>>
>>>>With swappiness as much as 90 processes still get killed on large disk writes.
>>>>
>>>>Given that the max is 100, the interval in which mm works at all is
>>>>going to be very narrow, less than 10% of the parameter range. This is
>>>>a severe regression, as is the CPU time consumed by the kernel.
>>>>
>>>>The io scheduler is the default cfq.
>>>>
>>>>If you have any idea what to try other than downgrading to an earlier
>>>>unaffected kernel I would like to hear.
>>>>
>>> Can you try commit 3cf23841b4b7(mm/vmscan.c: avoid possible
>>> deadlock caused by too_many_isolated())?
>>>
>>> Or try 3.8 and/or 3.9, additionally?
>>>
>>
>> Hello,
>>
>> with deadline IO scheduler I experience this issue less often but it
>> still happens.
>>
>> I am on 3.9.6 Debian kernel so 3.8 did not fix this problem.
>>
>> Do you have some idea what to log so that useful information about the
>> lockup is gathered?
>>
>
> This appears to be fixed in vanilla 3.11 kernel.
>
> I still get short intermittent lockups and cpu usage spikes up to 20%
> on a core but nowhere near the minute+ long lockups with all cores
> 100% on earlier kernels.
>

So I did more testing on the 3.11 kernel and while it works OK with
tar you can get severe lockups with mc or kvm. The difference is
probably the fact that sane tools do fsync() on files they close
forcing the file to be written out and the kernel to return possible write
errors before they move on to the next file.
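
As an illustration of that difference, the free-space wipe itself can be done
in a way that never piles up gigabytes of dirty cache, for example by
bypassing the page cache with O_DIRECT. A sketch only, not what mc or kvm do
internally; the file name is an example:

  dd if=/dev/zero of=/mnt/image/zerofill bs=1M oflag=direct   # fills the fs, then fails with ENOSPC
  rm /mnt/image/zerofill
  sync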

With kvm writing to a file used as virtual disk the system would stall
indefinitely until the disk driver in the emulated system would time
out, return disk IO error, and the emulated system would stop writing.
In top I see all CPU cores 90%+ in wait. System is unusable. With mc
the lockups would be indefinite, probably because there is no timeout
on writing a file in mc.
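
For the kvm case specifically, the guest image can be opened with O_DIRECT so
that guest writes do not accumulate as host dirty pages at all. Illustrative
invocation only, adjust to the real command line:

  kvm -m 2048 -drive file=disk.img,if=virtio,cache=none   # cache=none opens the image O_DIRECT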

I tried tuning swappiness and elevators but the basic problem is
solved by neither: the dirty buffers fill up memory and the system stalls
trying to resolve the situation.

Obviously the kernel puts off writing any dirty buffers until the
memory pressure is overwhelming and the vmm flops.

At least the OOM killer does not get invoked anymore since there is
lots of memory - just Linux does not know how to use it.

The solution to this problem is quite simple - use the ancient
userspace bdflushd or whatever it was called. I emulate it with
{ while true ; do sleep 5; sync ; done } &

The system performance suddenly increases - to the awesome Debian stable levels.
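
A less crude variant of the same idea is to make the kernel's own flusher
threads behave like that loop, by lowering how long dirty data may sit in
memory before it is written back (the values below are only illustrative):

  sysctl -w vm.dirty_expire_centisecs=500      # write back data dirtied more than ~5 s ago
  sysctl -w vm.dirty_writeback_centisecs=100   # wake the flusher threads every ~1 s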

Thanks

Michal


* Re: doing lots of disk writes causes oom killer to kill processes
  2013-09-17 13:31     ` Michal Suchanek
@ 2013-09-17 21:13       ` Jan Kara
  2013-09-17 22:22         ` Michal Suchanek
  2013-09-18 14:56         ` Michal Suchanek
  0 siblings, 2 replies; 13+ messages in thread
From: Jan Kara @ 2013-09-17 21:13 UTC (permalink / raw)
  To: Michal Suchanek; +Cc: Hillf Danton, LKML, Linux-MM

  Hello,

On Tue 17-09-13 15:31:31, Michal Suchanek wrote:
> On 5 September 2013 12:12, Michal Suchanek <hramrach@gmail.com> wrote:
> > On 26 August 2013 15:51, Michal Suchanek <hramrach@gmail.com> wrote:
> >> On 12 March 2013 03:15, Hillf Danton <dhillf@gmail.com> wrote:
> >>>>On 11 March 2013 13:15, Michal Suchanek <hramrach@gmail.com> wrote:
> >>>>>On 8 February 2013 17:31, Michal Suchanek <hramrach@gmail.com> wrote:
> >>>>> Hello,
> >>>>>
> >>>>> I am dealing with VM disk images and performing something like wiping
> >>>>> free space to prepare image for compressing and storing on server or
> >>>>> copying it to external USB disk causes
> >>>>>
> >>>>> 1) a system lockup on the order of a few tens of seconds when all CPU cores
> >>>>> are 100% used by the system and the machine is basically unusable
> >>>>>
> >>>>> 2) oom killer killing processes
> >>>>>
> >>>>> This is all on a system with 8G RAM so there should be plenty of space to work with.
> >>>>>
> >>>>> This happens with kernels 3.6.4 or 3.7.1
> >>>>>
> >>>>> With earlier kernel versions (some 3.0 or 3.2 kernels) this was not a
> >>>>> problem even with less ram.
> >>>>>
> >>>>> I have  vm.swappiness = 0 set for a long  time already.
> >>>>>
> >>>>>
> >>>>I did some testing with 3.7.1 and with swappiness as much as 75 the
> >>>>kernel still causes all cores to loop somewhere in system when writing
> >>>>lots of data to disk.
> >>>>
> >>>>With swappiness as much as 90 processes still get killed on large disk writes.
> >>>>
> >>>>Given that the max is 100, the interval in which mm works at all is
> >>>>going to be very narrow, less than 10% of the parameter range. This is
> >>>>a severe regression, as is the CPU time consumed by the kernel.
> >>>>
> >>>>The io scheduler is the default cfq.
> >>>>
> >>>>If you have any idea what to try other than downgrading to an earlier
> >>>>unaffected kernel I would like to hear.
> >>>>
> >>> Can you try commit 3cf23841b4b7(mm/vmscan.c: avoid possible
> >>> deadlock caused by too_many_isolated())?
> >>>
> >>> Or try 3.8 and/or 3.9, additionally?
> >>>
> >>
> >> Hello,
> >>
> >> with deadline IO scheduler I experience this issue less often but it
> >> still happens.
> >>
> >> I am on 3.9.6 Debian kernel so 3.8 did not fix this problem.
> >>
> >> Do you have some idea what to log so that useful information about the
> >> lockup is gathered?
> >>
> >
> > This appears to be fixed in vanilla 3.11 kernel.
> >
> > I still get short intermittent lockups and cpu usage spikes up to 20%
> > on a core but nowhere near the minute+ long lockups with all cores
> > 100% on earlier kernels.
> >
> 
> So I did more testing on the 3.11 kernel and while it works OK with
> tar you can get severe lockups with mc or kvm. The difference is
> probably the fact that sane tools do fsync() on files they close
> forcing the file to be written out and the kernel to return possible write
> errors before they move on to the next file.
  Sorry for chiming in a bit late. But is this really writing to a normal
disk? SATA drive or something else?

> With kvm writing to a file used as virtual disk the system would stall
> indefinitely until the disk driver in the emulated system would time
> out, return disk IO error, and the emulated system would stop writing.
> In top I see all CPU cores 90%+ in wait. System is unusable. With mc
> the lockups would be indefinite, probably because there is no timeout
> on writing a file in mc.
> 
> I tried tuning swappiness and elevators but the basic problem is
> solved by neither: the dirty buffers fill up memory and the system stalls
> trying to resolve the situation.
  This is really strange. There is /proc/sys/vm/dirty_ratio, which limits
amount of dirty memory. By default it is set to 20% of memory which tends
to be too much for 8 GB machine. Can you set it to something like 5% and
/proc/sys/vm/dirty_background_ratio to 2%? That would be more appropriate
sizing (assuming standard SATA drive). Does it change anything?
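
For a quick test that would be something like the following (illustrative;
make it permanent via /etc/sysctl.conf if it helps):

  echo 5 > /proc/sys/vm/dirty_ratio
  echo 2 > /proc/sys/vm/dirty_background_ratio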

If the problem doesn't go away, can you install systemtap on your system
and run the attached script? It should report where exactly processes stall
and for how long, which should help us address the issue. Thanks.

> Obviously the kernel puts off writing any dirty buffers until the
> memory pressure is overwhelming and the vmm flops.
> 
> At least the OOM killer does not get invoked anymore since there is
> lots of memory - just Linux does not know how to use it.
> 
> The solution to this problem is quite simple - use the ancient
> userspace bdflushd or whatever it was called. I emulate it with
> { while true ; do sleep 5; sync ; done } &
> 
> The system performance suddenly increases - to the awesome Debian stable levels.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: doing lots of disk writes causes oom killer to kill processes
  2013-09-17 21:13       ` Jan Kara
@ 2013-09-17 22:22         ` Michal Suchanek
  2013-09-18 14:56         ` Michal Suchanek
  1 sibling, 0 replies; 13+ messages in thread
From: Michal Suchanek @ 2013-09-17 22:22 UTC (permalink / raw)
  To: Jan Kara; +Cc: Hillf Danton, LKML, Linux-MM

On 17 September 2013 23:13, Jan Kara <jack@suse.cz> wrote:
>   Hello,
>
> On Tue 17-09-13 15:31:31, Michal Suchanek wrote:
>> On 5 September 2013 12:12, Michal Suchanek <hramrach@gmail.com> wrote:
>> > On 26 August 2013 15:51, Michal Suchanek <hramrach@gmail.com> wrote:
>> >> On 12 March 2013 03:15, Hillf Danton <dhillf@gmail.com> wrote:
>> >>>>On 11 March 2013 13:15, Michal Suchanek <hramrach@gmail.com> wrote:
>> >>>>>On 8 February 2013 17:31, Michal Suchanek <hramrach@gmail.com> wrote:
>> >>>>> Hello,
>> >>>>>
>> >>>>> I am dealing with VM disk images and performing something like wiping
>> >>>>> free space to prepare image for compressing and storing on server or
>> >>>>> copying it to external USB disk causes
>> >>>>>
>> >>>>> 1) a system lockup on the order of a few tens of seconds when all CPU cores
>> >>>>> are 100% used by the system and the machine is basically unusable
>> >>>>>
>> >>>>> 2) oom killer killing processes
>> >>>>>
>> >>>>> This is all on a system with 8G RAM so there should be plenty of space to work with.
>> >>>>>
>> >>>>> This happens with kernels 3.6.4 or 3.7.1
>> >>>>>
>> >>>>> With earlier kernel versions (some 3.0 or 3.2 kernels) this was not a
>> >>>>> problem even with less ram.
>> >>>>>
>> >>>>> I have  vm.swappiness = 0 set for a long  time already.
>> >>>>>
>> >>>>>
>> >>>>I did some testing with 3.7.1 and with swappiness as much as 75 the
>> >>>>kernel still causes all cores to loop somewhere in system when writing
>> >>>>lots of data to disk.
>> >>>>
>> >>>>With swappiness as much as 90 processes still get killed on large disk writes.
>> >>>>
>> >>>>Given that the max is 100, the interval in which mm works at all is
>> >>>>going to be very narrow, less than 10% of the parameter range. This is
>> >>>>a severe regression, as is the CPU time consumed by the kernel.
>> >>>>
>> >>>>The io scheduler is the default cfq.
>> >>>>
>> >>>>If you have any idea what to try other than downgrading to an earlier
>> >>>>unaffected kernel I would like to hear.
>> >>>>
>> >>> Can you try commit 3cf23841b4b7(mm/vmscan.c: avoid possible
>> >>> deadlock caused by too_many_isolated())?
>> >>>
>> >>> Or try 3.8 and/or 3.9, additionally?
>> >>>
>> >>
>> >> Hello,
>> >>
>> >> with deadline IO scheduler I experience this issue less often but it
>> >> still happens.
>> >>
>> >> I am on 3.9.6 Debian kernel so 3.8 did not fix this problem.
>> >>
>> >> Do you have some idea what to log so that useful information about the
>> >> lockup is gathered?
>> >>
>> >
>> > This appears to be fixed in vanilla 3.11 kernel.
>> >
>> > I still get short intermittent lockups and cpu usage spikes up to 20%
>> > on a core but nowhere near the minute+ long lockups with all cores
>> > 100% on earlier kernels.
>> >
>>
>> So I did more testing on the 3.11 kernel and while it works OK with
>> tar you can get severe lockups with mc or kvm. The difference is
>> probably the fact that sane tools do fsync() on files they close
>> forcing the file to be written out and the kernel to return possible write
>> errors before they move on to the next file.
>   Sorry for chiming in a bit late. But is this really writing to a normal
> disk? SATA drive or something else?

It's an LVM volume on a SATA drive. I sometimes use USB disks as well
but most of the time it's SATA or eSATA.

>
>> With kvm writing to a file used as virtual disk the system would stall
>> indefinitely until the disk driver in the emulated system would time
>> out, return disk IO error, and the emulated system would stop writing.
>> In top I see all CPU cores 90%+ in wait. System is unusable. With mc
>> the lockups would be indefinite, probably because there is no timeout
>> on writing a file in mc.
>>
>> I tried tuning swappiness and elevators but the basic problem is
>> solved by neither: the dirty buffers fill up memory and the system stalls
>> trying to resolve the situation.
>   This is really strange. There is /proc/sys/vm/dirty_ratio, which limits
> amount of dirty memory. By default it is set to 20% of memory which tends
> to be too much for 8 GB machine. Can you set it to something like 5% and
> /proc/sys/vm/dirty_background_ratio to 2%? That would be more appropriate
> sizing (assuming standard SATA drive). Does it change anything?

I can try that but I don't really mind if the kernel uses 2G of RAM for
buffers. The problem is that it cannot manage those buffers. Does some
kernel structure grow out of proportion when the buffers reach this
size or something?

Thanks

Michal


* Re: doing lots of disk writes causes oom killer to kill processes
  2013-09-17 21:13       ` Jan Kara
  2013-09-17 22:22         ` Michal Suchanek
@ 2013-09-18 14:56         ` Michal Suchanek
  2013-09-19 10:13           ` Jan Kara
       [not found]           ` <CAJd=RBD_6FMHS3Dg_Zqugs4YCHHDeCgrxypANpPP5K2xTLE0bA@mail.gmail.com>
  1 sibling, 2 replies; 13+ messages in thread
From: Michal Suchanek @ 2013-09-18 14:56 UTC (permalink / raw)
  To: Jan Kara; +Cc: Hillf Danton, LKML, Linux-MM

On 17 September 2013 23:13, Jan Kara <jack@suse.cz> wrote:
>   Hello,
>
> On Tue 17-09-13 15:31:31, Michal Suchanek wrote:
>> On 5 September 2013 12:12, Michal Suchanek <hramrach@gmail.com> wrote:
>> > On 26 August 2013 15:51, Michal Suchanek <hramrach@gmail.com> wrote:
>> >> On 12 March 2013 03:15, Hillf Danton <dhillf@gmail.com> wrote:
>> >>>>On 11 March 2013 13:15, Michal Suchanek <hramrach@gmail.com> wrote:
>> >>>>>On 8 February 2013 17:31, Michal Suchanek <hramrach@gmail.com> wrote:
>> >>>>> Hello,
>> >>>>>
>> >>>>> I am dealing with VM disk images and performing something like wiping
>> >>>>> free space to prepare image for compressing and storing on server or
>> >>>>> copying it to external USB disk causes
>> >>>>>
>> >>>>> 1) a system lockup on the order of a few tens of seconds when all CPU cores
>> >>>>> are 100% used by the system and the machine is basically unusable
>> >>>>>
>> >>>>> 2) oom killer killing processes
>> >>>>>
>> >>>>> This is all on a system with 8G RAM so there should be plenty of space to work with.
>> >>>>>
>> >>>>> This happens with kernels 3.6.4 or 3.7.1
>> >>>>>
>> >>>>> With earlier kernel versions (some 3.0 or 3.2 kernels) this was not a
>> >>>>> problem even with less ram.
>> >>>>>
>> >>>>> I have  vm.swappiness = 0 set for a long  time already.
>> >>>>>
>> >>>>>
>> >>>>I did some testing with 3.7.1 and with swappiness as much as 75 the
>> >>>>kernel still causes all cores to loop somewhere in system when writing
>> >>>>lots of data to disk.
>> >>>>
>> >>>>With swappiness as much as 90 processes still get killed on large disk writes.
>> >>>>
>> >>>>Given that the max is 100, the interval in which mm works at all is
>> >>>>going to be very narrow, less than 10% of the parameter range. This is
>> >>>>a severe regression, as is the CPU time consumed by the kernel.
>> >>>>
>> >>>>The io scheduler is the default cfq.
>> >>>>
>> >>>>If you have any idea what to try other than downgrading to an earlier
>> >>>>unaffected kernel I would like to hear.
>> >>>>
>> >>> Can you try commit 3cf23841b4b7(mm/vmscan.c: avoid possible
>> >>> deadlock caused by too_many_isolated())?
>> >>>
>> >>> Or try 3.8 and/or 3.9, additionally?
>> >>>
>> >>
>> >> Hello,
>> >>
>> >> with deadline IO scheduler I experience this issue less often but it
>> >> still happens.
>> >>
>> >> I am on 3.9.6 Debian kernel so 3.8 did not fix this problem.
>> >>
>> >> Do you have some idea what to log so that useful information about the
>> >> lockup is gathered?
>> >>
>> >
>> > This appears to be fixed in vanilla 3.11 kernel.
>> >
>> > I still get short intermittent lockups and cpu usage spikes up to 20%
>> > on a core but nowhere near the minute+ long lockups with all cores
>> > 100% on earlier kernels.
>> >
>>
>> So I did more testing on the 3.11 kernel and while it works OK with
>> tar you can get severe lockups with mc or kvm. The difference is
>> probably the fact that sane tools do fsync() on files they close
>> forcing the file to be written out and the kernel to return possible write
>> errors before they move on to the next file.
>   Sorry for chiming in a bit late. But is this really writing to a normal
> disk? SATA drive or something else?
>
>> With kvm writing to a file used as virtual disk the system would stall
>> indefinitely until the disk driver in the emulated system would time
>> out, return disk IO error, and the emulated system would stop writing.
>> In top I see all CPU cores 90%+ in wait. System is unusable. With mc
>> the lockups would be indefinite, probably because there is no timeout
>> on writing a file in mc.
>>
>> I tried tuning swappiness and elevators but the basic problem is
>> solved by neither: the dirty buffers fill up memory and the system stalls
>> trying to resolve the situation.
>   This is really strange. There is /proc/sys/vm/dirty_ratio, which limits
> amount of dirty memory. By default it is set to 20% of memory which tends
> to be too much for 8 GB machine. Can you set it to something like 5% and
> /proc/sys/vm/dirty_background_ratio to 2%? That would be more appropriate
> sizing (assuming standard SATA drive). Does it change anything?

The default for dirty_ratio/dirty_background_ratio is 60/40. Setting
these to 5/2 gives about the same result as running the script that
syncs every 5s. Setting to 30/10 gives larger data chunks and
intermittent lockup before every chunk is written.

It is quite possible to set kernel parameters that kill the kernel but

1) this is the default
2) the parameter is set in units that do not prevent the issue in
general (% RAM vs #blocks)
3) WTH is the system doing? It's 4core 3GHz cpu so it can handle
traversing a structure holding 800M data in the background. Something
is seriously rotten somewhere.

Thanks

Michal


* Re: doing lots of disk writes causes oom killer to kill processes
  2013-09-18 14:56         ` Michal Suchanek
@ 2013-09-19 10:13           ` Jan Kara
  2013-10-09 14:19             ` Michal Suchanek
       [not found]           ` <CAJd=RBD_6FMHS3Dg_Zqugs4YCHHDeCgrxypANpPP5K2xTLE0bA@mail.gmail.com>
  1 sibling, 1 reply; 13+ messages in thread
From: Jan Kara @ 2013-09-19 10:13 UTC (permalink / raw)
  To: Michal Suchanek; +Cc: Jan Kara, Hillf Danton, LKML, Linux-MM

[-- Attachment #1: Type: text/plain, Size: 6403 bytes --]

On Wed 18-09-13 16:56:08, Michal Suchanek wrote:
> On 17 September 2013 23:13, Jan Kara <jack@suse.cz> wrote:
> >   Hello,
> >
> > On Tue 17-09-13 15:31:31, Michal Suchanek wrote:
> >> On 5 September 2013 12:12, Michal Suchanek <hramrach@gmail.com> wrote:
> >> > On 26 August 2013 15:51, Michal Suchanek <hramrach@gmail.com> wrote:
> >> >> On 12 March 2013 03:15, Hillf Danton <dhillf@gmail.com> wrote:
> >> >>>>On 11 March 2013 13:15, Michal Suchanek <hramrach@gmail.com> wrote:
> >> >>>>>On 8 February 2013 17:31, Michal Suchanek <hramrach@gmail.com> wrote:
> >> >>>>> Hello,
> >> >>>>>
> >> >>>>> I am dealing with VM disk images and performing something like wiping
> >> >>>>> free space to prepare image for compressing and storing on server or
> >> >>>>> copying it to external USB disk causes
> >> >>>>>
> >> >>>>> 1) a system lockup on the order of a few tens of seconds when all CPU cores
> >> >>>>> are 100% used by the system and the machine is basically unusable
> >> >>>>>
> >> >>>>> 2) oom killer killing processes
> >> >>>>>
> >> >>>>> This is all on a system with 8G RAM so there should be plenty of space to work with.
> >> >>>>>
> >> >>>>> This happens with kernels 3.6.4 or 3.7.1
> >> >>>>>
> >> >>>>> With earlier kernel versions (some 3.0 or 3.2 kernels) this was not a
> >> >>>>> problem even with less ram.
> >> >>>>>
> >> >>>>> I have  vm.swappiness = 0 set for a long  time already.
> >> >>>>>
> >> >>>>>
> >> >>>>I did some testing with 3.7.1 and with swappiness as much as 75 the
> >> >>>>kernel still causes all cores to loop somewhere in system when writing
> >> >>>>lots of data to disk.
> >> >>>>
> >> >>>>With swappiness as much as 90 processes still get killed on large disk writes.
> >> >>>>
> >> >>>>Given that the max is 100, the interval in which mm works at all is
> >> >>>>going to be very narrow, less than 10% of the parameter range. This is
> >> >>>>a severe regression, as is the CPU time consumed by the kernel.
> >> >>>>
> >> >>>>The io scheduler is the default cfq.
> >> >>>>
> >> >>>>If you have any idea what to try other than downgrading to an earlier
> >> >>>>unaffected kernel I would like to hear.
> >> >>>>
> >> >>> Can you try commit 3cf23841b4b7(mm/vmscan.c: avoid possible
> >> >>> deadlock caused by too_many_isolated())?
> >> >>>
> >> >>> Or try 3.8 and/or 3.9, additionally?
> >> >>>
> >> >>
> >> >> Hello,
> >> >>
> >> >> with deadline IO scheduler I experience this issue less often but it
> >> >> still happens.
> >> >>
> >> >> I am on 3.9.6 Debian kernel so 3.8 did not fix this problem.
> >> >>
> >> >> Do you have some idea what to log so that useful information about the
> >> >> lockup is gathered?
> >> >>
> >> >
> >> > This appears to be fixed in vanilla 3.11 kernel.
> >> >
> >> > I still get short intermittent lockups and cpu usage spikes up to 20%
> >> > on a core but nowhere near the minute+ long lockups with all cores
> >> > 100% on earlier kernels.
> >> >
> >>
> >> So I did more testing on the 3.11 kernel and while it works OK with
> >> tar you can get severe lockups with mc or kvm. The difference is
> >> probably the fact that sane tools do fsync() on files they close
> >> forcing the file to be written out and the kernel to return possible write
> >> errors before they move on to the next file.
> >   Sorry for chiming in a bit late. But is this really writing to a normal
> > disk? SATA drive or something else?
> >
> >> With kvm writing to a file used as virtual disk the system would stall
> >> indefinitely until the disk driver in the emulated system would time
> >> out, return disk IO error, and the emulated system would stop writing.
> >> In top I see all CPU cores 90%+ in wait. System is unusable. With mc
> >> the lockups would be indefinite, probably because there is no timeout
> >> on writing a file in mc.
> >>
> >> I tried tuning swappiness and elevators but the basic problem is
> >> solved by neither: the dirty buffers fill up memory and the system stalls
> >> trying to resolve the situation.
> >   This is really strange. There is /proc/sys/vm/dirty_ratio, which limits
> > amount of dirty memory. By default it is set to 20% of memory which tends
> > to be too much for 8 GB machine. Can you set it to something like 5% and
> > /proc/sys/vm/dirty_background_ratio to 2%? That would be more appropriate
> > sizing (assuming standard SATA drive). Does it change anything?
> 
> The default for dirty_ratio/dirty_background_ratio is 60/40. Setting
  Ah, that's not the upstream default. Upstream has 20/10. In SLES we use 40/10
to better accommodate some workloads, but 60/40 on 8 GB machines with
a SATA drive really seems too much. That is going to give memory management a
headache.

The problem is that a good SATA drive can do ~100 MB/s if we are
lucky and IO is sequential. Thus if you have 5 GB of dirty data to write,
it takes 50s at best to write it, with more random IO to image file it can
well take several minutes to write. That may cause some increased latency
when memory reclaim waits for writeback to clean some pages.
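
A quick back-of-the-envelope for this machine with the 60/40 settings you
mention, using shell arithmetic (all numbers approximate):

  ram_mb=8192; rate_mb_s=100
  echo "dirty limit:      $((ram_mb * 60 / 100)) MB"              # ~4.8 GB before writers block
  echo "background limit: $((ram_mb * 40 / 100)) MB"              # background flush only starts at ~3.2 GB
  echo "best-case drain:  $((ram_mb * 60 / 100 / rate_mb_s)) s"   # ~49 s of purely sequential writeback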

> these to 5/2 gives about the same result as running the script that
> syncs every 5s. Setting to 30/10 gives larger data chunks and
> intermittent lockup before every chunk is written.
> 
> It is quite possible to set kernel parameters that kill the kernel but
> 
> 1) this is the default
  Not the upstream one, so you should raise this with Debian I guess. 60/40
looks way out of the reasonable range for today's machines.

> 2) the parameter is set in units that do not prevent the issue in
> general (% RAM vs #blocks)
  You can set the number of bytes instead of percentage -
/proc/sys/vm/dirty_bytes / dirty_background_bytes. It's just that proper
sizing depends on amount of memory, storage HW, workload. So it's more an
administrative task to set this tunable properly.
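
For example (values only illustrative, sized for a single ~100 MB/s SATA disk
rather than as a fraction of RAM; note that writing the *_bytes variant clears
the corresponding *_ratio and vice versa):

  sysctl -w vm.dirty_background_bytes=$((64 * 1024 * 1024))
  sysctl -w vm.dirty_bytes=$((256 * 1024 * 1024))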

> 3) WTH is the system doing? It's 4core 3GHz cpu so it can handle
> traversing a structure holding 800M data in the background. Something
> is seriously rotten somewhere.
  Likely processes are waiting in direct reclaim for IO to finish. But that
is just guessing. Try running attached script (forgot to attach it to
previous email). You will need systemtap and kernel debuginfo installed.
The script doesn't work with all versions of systemtap (as it is sadly a
moving target) so if it fails, tell me your version of systemtap and I'll
update the script accordingly.
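
If systemtap turns out to be a hassle, a cruder first look is to watch the
relevant counters while reproducing the stall (allocstall and pgscan_direct
going up means processes are doing direct reclaim themselves):

  while sleep 1; do
          date
          grep -E 'nr_dirty |nr_writeback |allocstall|pgscan_direct' /proc/vmstat
  done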

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

[-- Attachment #2: watch-dstate.pl --]
[-- Type: application/x-perl, Size: 11084 bytes --]


* Re: doing lots of disk writes causes oom killer to kill processes
       [not found]           ` <CAJd=RBD_6FMHS3Dg_Zqugs4YCHHDeCgrxypANpPP5K2xTLE0bA@mail.gmail.com>
@ 2013-09-20 11:20             ` Michal Suchanek
  0 siblings, 0 replies; 13+ messages in thread
From: Michal Suchanek @ 2013-09-20 11:20 UTC (permalink / raw)
  To: Hillf Danton, Linux-MM, Linux Kernel Mailing List, Jan Kara

Hello,

On 19 September 2013 10:07, Hillf Danton <dhillf@gmail.com> wrote:
> Hello Michal
>
> Take it easy please, the kernel is made by human hands.
>
> Can you please try the diff (and sorry if the mail agent reformats it)?
>
> Best Regards
> Hillf
>
>
> --- a/mm/vmscan.c Wed Sep 18 08:44:08 2013
> +++ b/mm/vmscan.c Wed Sep 18 09:31:34 2013
> @@ -1543,8 +1543,11 @@ shrink_inactive_list(unsigned long nr_to
>   * implies that pages are cycling through the LRU faster than
>   * they are written so also forcibly stall.
>   */
> - if (nr_unqueued_dirty == nr_taken || nr_immediate)
> + if (nr_unqueued_dirty == nr_taken || nr_immediate) {
> + if (current_is_kswapd())
> + wakeup_flusher_threads(0, WB_REASON_TRY_TO_FREE_PAGES);
>   congestion_wait(BLK_RW_ASYNC, HZ/10);
> + }
>   }
>
>   /*
> --

I applied the patch and raised the dirty block ratios to 30/10 and the
default 60/40 while imaging a VM and did not observe any problems so I
guess this solves it.

Thanks

Michal


* Re: doing lots of disk writes causes oom killer to kill processes
  2013-09-19 10:13           ` Jan Kara
@ 2013-10-09 14:19             ` Michal Suchanek
  2013-10-15 14:15               ` Michal Suchanek
  2014-07-07 11:34               ` Michal Suchanek
  0 siblings, 2 replies; 13+ messages in thread
From: Michal Suchanek @ 2013-10-09 14:19 UTC (permalink / raw)
  To: Jan Kara; +Cc: Hillf Danton, LKML, Linux-MM

Hello,

On 19 September 2013 12:13, Jan Kara <jack@suse.cz> wrote:
> On Wed 18-09-13 16:56:08, Michal Suchanek wrote:
>> On 17 September 2013 23:13, Jan Kara <jack@suse.cz> wrote:
>> >   Hello,
>>
>> The default for dirty_ratio/dirty_background_ratio is 60/40. Setting
>   Ah, that's not the upstream default. Upstream has 20/10. In SLES we use 40/10
> to better accommodate some workloads, but 60/40 on 8 GB machines with
> a SATA drive really seems too much. That is going to give memory management a
> headache.
>
> The problem is that a good SATA drive can do ~100 MB/s if we are
> lucky and IO is sequential. Thus if you have 5 GB of dirty data to write,
> it takes 50s at best to write it, with more random IO to image file it can
> well take several minutes to write. That may cause some increased latency
> when memory reclaim waits for writeback to clean some pages.
>
>> these to 5/2 gives about the same result as running the script that
>> syncs every 5s. Setting to 30/10 gives larger data chunks and
>> intermittent lockup before every chunk is written.
>>
>> It is quite possible to set kernel parameters that kill the kernel but
>>
>> 1) this is the default
>   Not the upstream one, so you should raise this with Debian I guess. 60/40
> looks way out of the reasonable range for today's machines.
>
>> 2) the parameter is set in units that do not prevent the issue in
>> general (% RAM vs #blocks)
>   You can set the number of bytes instead of percentage -
> /proc/sys/vm/dirty_bytes / dirty_background_bytes. It's just that proper
> sizing depends on amount of memory, storage HW, workload. So it's more an
> administrative task to set this tunable properly.
>
>> 3) WTH is the system doing? It's 4core 3GHz cpu so it can handle
>> traversing a structure holding 800M data in the background. Something
>> is seriously rotten somewhere.
>   Likely processes are waiting in direct reclaim for IO to finish. But that
> is just guessing. Try running attached script (forgot to attach it to
> previous email). You will need systemtap and kernel debuginfo installed.
> The script doesn't work with all versions of systemtap (as it is sadly a
> moving target) so if it fails, tell me your version of systemtap and I'll
> update the script accordingly.

This was fixed for me by the patch posted earlier by Hillf Danton so I
guess this answers what the system was (not) doing:

--- a/mm/vmscan.c Wed Sep 18 08:44:08 2013
+++ b/mm/vmscan.c Wed Sep 18 09:31:34 2013
@@ -1543,8 +1543,11 @@ shrink_inactive_list(unsigned long nr_to
  * implies that pages are cycling through the LRU faster than
  * they are written so also forcibly stall.
  */
- if (nr_unqueued_dirty == nr_taken || nr_immediate)
+ if (nr_unqueued_dirty == nr_taken || nr_immediate) {
+ if (current_is_kswapd())
+ wakeup_flusher_threads(0, WB_REASON_TRY_TO_FREE_PAGES);
  congestion_wait(BLK_RW_ASYNC, HZ/10);
+ }
  }

  /*

Also 75485363 is hopefully addressing this issue in mainline.
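
For the record, the mainline release that first contains that commit can be
checked from a kernel git checkout with, e.g.:

  git describe --contains 75485363
  git tag --contains 75485363 | sort -V | head -n1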

Thanks

Michal


* Re: doing lots of disk writes causes oom killer to kill processes
  2013-10-09 14:19             ` Michal Suchanek
@ 2013-10-15 14:15               ` Michal Suchanek
  2014-07-07 11:34               ` Michal Suchanek
  1 sibling, 0 replies; 13+ messages in thread
From: Michal Suchanek @ 2013-10-15 14:15 UTC (permalink / raw)
  To: Jan Kara; +Cc: Hillf Danton, LKML, Linux-MM

On 9 October 2013 16:19, Michal Suchanek <hramrach@gmail.com> wrote:
> Hello,
>
> On 19 September 2013 12:13, Jan Kara <jack@suse.cz> wrote:
>> On Wed 18-09-13 16:56:08, Michal Suchanek wrote:
>>> On 17 September 2013 23:13, Jan Kara <jack@suse.cz> wrote:
>>> >   Hello,
>>>
>>> The default for dirty_ratio/dirty_background_ratio is 60/40. Setting
>>   Ah, that's not the upstream default. Upstream has 20/10. In SLES we use 40/10
>> to better accommodate some workloads, but 60/40 on 8 GB machines with
>> a SATA drive really seems too much. That is going to give memory management a
>> headache.
>>
>> The problem is that a good SATA drive can do ~100 MB/s if we are
>> lucky and IO is sequential. Thus if you have 5 GB of dirty data to write,
>> it takes 50s at best to write it, with more random IO to image file it can
>> well take several minutes to write. That may cause some increased latency
>> when memory reclaim waits for writeback to clean some pages.
>>
>>> these to 5/2 gives about the same result as running the script that
>>> syncs every 5s. Setting to 30/10 gives larger data chunks and
>>> intermittent lockup before every chunk is written.
>>>
>>> It is quite possible to set kernel parameters that kill the kernel but
>>>
>>> 1) this is the default
>>   Not the upstream one, so you should raise this with Debian I guess. 60/40
>> looks way out of the reasonable range for today's machines.
>>
>>> 2) the parameter is set in units that do not prevent the issue in
>>> general (% RAM vs #blocks)
>>   You can set the number of bytes instead of percentage -
>> /proc/sys/vm/dirty_bytes / dirty_background_bytes. It's just that proper
>> sizing depends on amount of memory, storage HW, workload. So it's more an
>> administrative task to set this tunable properly.
>>
>>> 3) WTH is the system doing? It's 4core 3GHz cpu so it can handle
>>> traversing a structure holding 800M data in the background. Something
>>> is seriously rotten somewhere.
>>   Likely processes are waiting in direct reclaim for IO to finish. But that
>> is just guessing. Try running attached script (forgot to attach it to
>> previous email). You will need systemtap and kernel debuginfo installed.
>> The script doesn't work with all versions of systemtap (as it is sadly a
>> moving target) so if it fails, tell me your version of systemtap and I'll
>> update the script accordingly.
>
> This was fixed for me by the patch posted earlier by Hillf Danton so I
> guess this answers what the system was (not) doing:
>
> --- a/mm/vmscan.c Wed Sep 18 08:44:08 2013
> +++ b/mm/vmscan.c Wed Sep 18 09:31:34 2013
> @@ -1543,8 +1543,11 @@ shrink_inactive_list(unsigned long nr_to
>   * implies that pages are cycling through the LRU faster than
>   * they are written so also forcibly stall.
>   */
> - if (nr_unqueued_dirty == nr_taken || nr_immediate)
> + if (nr_unqueued_dirty == nr_taken || nr_immediate) {
> + if (current_is_kswapd())
> + wakeup_flusher_threads(0, WB_REASON_TRY_TO_FREE_PAGES);
>   congestion_wait(BLK_RW_ASYNC, HZ/10);
> + }
>   }
>
>   /*
>
> Also 75485363 is hopefully addressing this issue in mainline.
>

Actually, this was in 3.11 already and it did make the behaviour a bit
better but was not enough.

So is something like the vmscan.c patch going to make it into the
mainline kernel?

Thanks

Michal


* Re: doing lots of disk writes causes oom killer to kill processes
  2013-10-09 14:19             ` Michal Suchanek
  2013-10-15 14:15               ` Michal Suchanek
@ 2014-07-07 11:34               ` Michal Suchanek
  1 sibling, 0 replies; 13+ messages in thread
From: Michal Suchanek @ 2014-07-07 11:34 UTC (permalink / raw)
  To: Jan Kara; +Cc: Hillf Danton, LKML, Linux-MM

On 9 October 2013 16:19, Michal Suchanek <hramrach@gmail.com> wrote:
> Hello,
>
> On 19 September 2013 12:13, Jan Kara <jack@suse.cz> wrote:
>> On Wed 18-09-13 16:56:08, Michal Suchanek wrote:
>>> On 17 September 2013 23:13, Jan Kara <jack@suse.cz> wrote:
>>> >   Hello,
>>>
>>> The default for dirty_ratio/dirty_background_ratio is 60/40. Setting
>>   Ah, that's not the upstream default. Upstream has 20/10. In SLES we use 40/10
>> to better accommodate some workloads, but 60/40 on 8 GB machines with
>> a SATA drive really seems too much. That is going to give memory management a
>> headache.
>>
>> The problem is that a good SATA drive can do ~100 MB/s if we are
>> lucky and IO is sequential. Thus if you have 5 GB of dirty data to write,
>> it takes 50s at best to write it, with more random IO to image file it can
>> well take several minutes to write. That may cause some increased latency
>> when memory reclaim waits for writeback to clean some pages.
>>
>>> these to 5/2 gives about the same result as running the script that
>>> syncs every 5s. Setting to 30/10 gives larger data chunks and
>>> intermittent lockup before every chunk is written.
>>>
>>> It is quite possible to set kernel parameters that kill the kernel but
>>>
>>> 1) this is the default
>>   Not the upstream one, so you should raise this with Debian I guess. 60/40
>> looks way out of the reasonable range for today's machines.
>>
>>> 2) the parameter is set in units that do not prevent the issue in
>>> general (% RAM vs #blocks)
>>   You can set the number of bytes instead of percentage -
>> /proc/sys/vm/dirty_bytes / dirty_background_bytes. It's just that proper
>> sizing depends on amount of memory, storage HW, workload. So it's more an
>> administrative task to set this tunable properly.
>>
>>> 3) WTH is the system doing? It's 4core 3GHz cpu so it can handle
>>> traversing a structure holding 800M data in the background. Something
>>> is seriously rotten somewhere.
>>   Likely processes are waiting in direct reclaim for IO to finish. But that
>> is just guessing. Try running attached script (forgot to attach it to
>> previous email). You will need systemtap and kernel debuginfo installed.
>> The script doesn't work with all versions of systemtap (as it is sadly a
>> moving target) so if it fails, tell me your version of systemtap and I'll
>> update the script accordingly.
>
> This was fixed for me by the patch posted earlier by Hillf Danton so I
> guess this answers what the system was (not) doing:
>
> --- a/mm/vmscan.c Wed Sep 18 08:44:08 2013
> +++ b/mm/vmscan.c Wed Sep 18 09:31:34 2013
> @@ -1543,8 +1543,11 @@ shrink_inactive_list(unsigned long nr_to
>   * implies that pages are cycling through the LRU faster than
>   * they are written so also forcibly stall.
>   */
> - if (nr_unqueued_dirty == nr_taken || nr_immediate)
> + if (nr_unqueued_dirty == nr_taken || nr_immediate) {
> + if (current_is_kswapd())
> + wakeup_flusher_threads(0, WB_REASON_TRY_TO_FREE_PAGES);
>   congestion_wait(BLK_RW_ASYNC, HZ/10);
> + }
>   }
>
>   /*
>

Hello,

Is this being addressed somehow?

It seems the 3.15 kernel still has this issue... unless it happens to
lock up for some other reason in similar situations.

Thanks

Michal


Thread overview: 13+ messages
2013-03-12  2:15 doing lots of disk writes causes oom killer to kill processes Hillf Danton
2013-03-12  9:03 ` Michal Suchanek
2013-08-26 13:51 ` Michal Suchanek
2013-09-05 10:12   ` Michal Suchanek
2013-09-17 13:31     ` Michal Suchanek
2013-09-17 21:13       ` Jan Kara
2013-09-17 22:22         ` Michal Suchanek
2013-09-18 14:56         ` Michal Suchanek
2013-09-19 10:13           ` Jan Kara
2013-10-09 14:19             ` Michal Suchanek
2013-10-15 14:15               ` Michal Suchanek
2014-07-07 11:34               ` Michal Suchanek
     [not found]           ` <CAJd=RBD_6FMHS3Dg_Zqugs4YCHHDeCgrxypANpPP5K2xTLE0bA@mail.gmail.com>
2013-09-20 11:20             ` Michal Suchanek
