* (unknown)
@ 2013-06-25  9:25 Nagachandra P
  2013-06-26 14:02 ` Memory allocation can cause ext4 filesystem to be remounted r/o Theodore Ts'o
  0 siblings, 1 reply; 12+ messages in thread

From: Nagachandra P @ 2013-06-25  9:25 UTC (permalink / raw)
To: Vikram MP; +Cc: linux-ext4

Hi,

Here are some details on the platform:

Linux kernel version - 3.4.5
Android - 4.2.2
ext4 mounted with the *errors=panic* option.

We see memory allocation failures, mostly caused by the low memory
killer killing the ext4 process while it is waiting for an allocation
on the slow path. Below is one such instance:

select 26413 (AsyncTask #3), score_adj 647, adj 10, size 15287, to kill
send sigkill to 26413 (AsyncTask #3), score_adj 647, adj 10, size 15287 with ofree -450 10896, cfree 27845 984 msa 529 ma 8
AsyncTask #3: page allocation failure: order:0, mode:0x80050
[<c001595c>] (unwind_backtrace+0x0/0x11c) from [<c00dc064>] (warn_alloc_failed+0xe8/0x110)
[<c00dc064>] (warn_alloc_failed+0xe8/0x110) from [<c00dee54>] (__alloc_pages_nodemask+0x6d4/0x800)
[<c00dee54>] (__alloc_pages_nodemask+0x6d4/0x800) from [<c05fe250>] (cache_alloc_refill+0x30c/0x6a4)
[<c05fe250>] (cache_alloc_refill+0x30c/0x6a4) from [<c0104eb4>] (kmem_cache_alloc+0xa0/0x1b8)
[<c0104eb4>] (kmem_cache_alloc+0xa0/0x1b8) from [<c0192c34>] (ext4_free_blocks+0x9c4/0xa08)
[<c0192c34>] (ext4_free_blocks+0x9c4/0xa08) from [<c01873bc>] (ext4_ext_remove_space+0x690/0xd9c)
[<c01873bc>] (ext4_ext_remove_space+0x690/0xd9c) from [<c01897f8>] (ext4_ext_truncate+0x100/0x1c8)
[<c01897f8>] (ext4_ext_truncate+0x100/0x1c8) from [<c016447c>] (ext4_truncate+0xf4/0x194)
[<c016447c>] (ext4_truncate+0xf4/0x194) from [<c0166cc0>] (ext4_setattr+0x36c/0x3f8)
[<c0166cc0>] (ext4_setattr+0x36c/0x3f8) from [<c011f540>] (notify_change+0x1dc/0x2a8)
[<c011f540>] (notify_change+0x1dc/0x2a8) from [<c0107cc8>] (do_truncate+0x74/0x90)
[<c0107cc8>] (do_truncate+0x74/0x90) from [<c0107e20>] (do_sys_ftruncate+0x13c/0x144)
[<c0107e20>] (do_sys_ftruncate+0x13c/0x144) from [<c0108020>] (sys_ftruncate+0x18/0x1c)
[<c0108020>] (sys_ftruncate+0x18/0x1c) from [<c000e140>] (ret_fast_syscall+0x0/0x48)

followed by:

SLAB: Unable to allocate memory on node 0 (gfp=0x80050)
  cache: ext4_free_data, object size: 64, order: 0
  node 0: slabs: 0/0, objs: 0/0, free: 0
EXT4-fs error (device mmcblk0p26) in ext4_free_blocks:4700: Out of memory
Aborting journal on device mmcblk0p26-8.
EXT4-fs error (device mmcblk0p26): ext4_journal_start_sb:328: Detected aborted journal
EXT4-fs (mmcblk0p26): Remounting filesystem read-only
Kernel panic - not syncing: EXT4-fs panic from previous error
[<c001595c>] (unwind_backtrace+0x0/0x11c) from [<c05fc5b4>] (panic+0x80/0x1cc)
[<c05fc5b4>] (panic+0x80/0x1cc) from [<c017ddec>] (__ext4_abort+0xc0/0xe0)
[<c017ddec>] (__ext4_abort+0xc0/0xe0) from [<c017dfa0>] (ext4_journal_start_sb+0x194/0x1c4)
[<c017dfa0>] (ext4_journal_start_sb+0x194/0x1c4) from [<c0168c60>] (ext4_dirty_inode+0x14/0x40)
[<c0168c60>] (ext4_dirty_inode+0x14/0x40) from [<c01293c0>] (__mark_inode_dirty+0x2c/0x1b4)
[<c01293c0>] (__mark_inode_dirty+0x2c/0x1b4) from [<c011d6b8>] (file_update_time+0xfc/0x11c)
[<c011d6b8>] (file_update_time+0xfc/0x11c) from [<c00d8f34>] (__generic_file_aio_write+0x2d8/0x40c)
[<c00d8f34>] (__generic_file_aio_write+0x2d8/0x40c) from [<c00d90c8>] (generic_file_aio_write+0x60/0xc8)
[<c00d90c8>] (generic_file_aio_write+0x60/0xc8) from [<c015f74c>] (ext4_file_write+0x244/0x2b4)
[<c015f74c>] (ext4_file_write+0x244/0x2b4) from [<c0108a38>] (do_sync_write+0x9c/0xd8)
[<c0108a38>] (do_sync_write+0x9c/0xd8) from [<c0109304>] (vfs_write+0xb0/0x128)
[<c0109304>] (vfs_write+0xb0/0x128) from [<c010953c>] (sys_write+0x38/0x64)
[<c010953c>] (sys_write+0x38/0x64) from [<c000e140>] (ret_fast_syscall+0x0/0x48)

Is there a way in which we could avoid the ext4 panic caused by an
allocation failure (a method other than setting errors=continue :-) )?
(Or is a memory allocation failure considered as fatal as any other
I/O error?)

Thanks
Nagachandra

^ permalink raw reply	[flat|nested] 12+ messages in thread
* Re: Memory allocation can cause ext4 filesystem to be remounted r/o
  2013-06-25  9:25 (unknown), Nagachandra P
@ 2013-06-26 14:02 ` Theodore Ts'o
  2013-06-26 14:54   ` Theodore Ts'o
  0 siblings, 1 reply; 12+ messages in thread

From: Theodore Ts'o @ 2013-06-26 14:02 UTC (permalink / raw)
To: Nagachandra P; +Cc: Vikram MP, linux-ext4

On Tue, Jun 25, 2013 at 02:55:33PM +0530, Nagachandra P wrote:
>
> Here are some details on the platform:
>
> Linux kernel version - 3.4.5
> Android - 4.2.2
> ext4 mounted with the *errors=panic* option.
>
> We see memory allocation failures, mostly caused by the low memory
> killer killing the ext4 process while it is waiting for an allocation
> on the slow path. Below is one such instance:
>
> Is there a way in which we could avoid the ext4 panic caused by an
> allocation failure (a method other than setting errors=continue :-) )?
> (Or is a memory allocation failure considered as fatal as any other
> I/O error?)

In this particular case, we could reflect the error all the way up to
the ftruncate(2) system call. Fixing this is going to be a bit
involved, unfortunately; we'll need to update a fairly large number of
function signatures, including ext4_truncate(), ext4_ext_truncate(),
ext4_free_blocks(), and a number of others.

One of the problems is that there are code paths where there is no
good error recovery. For example, if a file descriptor was holding an
inode open at the time it was unlinked, the only place we can delete
the file (which involves a call to ext4_truncate()) is in
ext4_evict_inode(), and there is no caller to reflect an error back
to at that point.

Probably the best short-term fix for now is to add a flag to
ext4_free_blocks() which retries the memory allocation in a loop (see
the retry_alloc loop in jbd2_journal_write_metadata_buffer() in
fs/jbd2/journal.c) and then initially add this flag to all of the
callers of ext4_free_blocks(). We'll then need to fix the various
callers where we can reflect the error back to userspace to do so, and
then drop the flag. In the case of ext4_evict_inode(), what we can do
is perform the ext4_truncate() inode truncation in the unlink() system
call if there are no other file descriptors keeping the inode from
being deleted immediately.

					- Ted
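[Editor's note: a minimal user-space sketch of the retry-until-success pattern Ted refers to (the retry_alloc loop in jbd2_journal_write_metadata_buffer()). The function name alloc_retry() is illustrative and not actual kernel code; usleep() stands in for the kernel's congestion_wait() back-off.]

```c
#include <stdlib.h>
#include <unistd.h>

/* Retry-until-success allocation, modeled on the retry_alloc loop in
 * fs/jbd2/journal.c: on failure, sleep briefly so that exiting
 * (OOM-killed) tasks have a chance to return memory, then try again.
 * Kernel code would call congestion_wait() instead of usleep(). */
static void *alloc_retry(size_t size)
{
    void *p;

    while ((p = malloc(size)) == NULL) {
        /* Back off for ~20ms before retrying the allocation. */
        usleep(20 * 1000);
    }
    return p;
}
```

As the rest of the thread points out, this trades a forced read-only remount for a possible livelock if the retrying task is itself the one the low memory killer keeps selecting.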
* Re: Memory allocation can cause ext4 filesystem to be remounted r/o
  2013-06-26 14:02 ` Memory allocation can cause ext4 filesystem to be remounted r/o Theodore Ts'o
@ 2013-06-26 14:54   ` Theodore Ts'o
  2013-06-26 15:20     ` Nagachandra P
  2013-06-26 18:53     ` Joseph D. Wagner
  0 siblings, 2 replies; 12+ messages in thread

From: Theodore Ts'o @ 2013-06-26 14:54 UTC (permalink / raw)
To: Nagachandra P; +Cc: Vikram MP, linux-ext4

On Wed, Jun 26, 2013 at 10:02:05AM -0400, Theodore Ts'o wrote:
>
> In this particular case, we could reflect the error all the way up to
> the ftruncate(2) system call. Fixing this is going to be a bit
> involved, unfortunately; we'll need to update a fairly large number of
> function signatures, including ext4_truncate(), ext4_ext_truncate(),
> ext4_free_blocks(), and a number of others.

One thing that comes to mind: if we change things so that ftruncate
reflects an ENOMEM error all the way up to userspace, one side effect
is that the file may be partially truncated when ENOMEM is returned.
Applications may not be prepared for this.

There would be a similar issue if we do the truncate in the unlink
call and return ENOMEM on failure: the file might not be unlinked,
and in fact we might have a partially truncated file left in the
directory, which would probably cause all sorts of confusion. So
we're probably better off putting the inode on a list of inodes in
memory, and on the orphan list on disk, and then retrying the
truncation when memory is available. Messy, but that probably gives
the best result for applications living constantly in high memory
pressure environments.

					- Ted
* Re: Memory allocation can cause ext4 filesystem to be remounted r/o
  2013-06-26 14:54 ` Theodore Ts'o
@ 2013-06-26 15:20   ` Nagachandra P
  2013-06-26 16:34     ` Theodore Ts'o
  2013-06-26 18:53   ` Joseph D. Wagner
  1 sibling, 1 reply; 12+ messages in thread

From: Nagachandra P @ 2013-06-26 15:20 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: Vikram MP, linux-ext4

Thanks Theodore,

We have also seen cases where the current allocation itself causes the
lowmem shrinker to be called, which in turn chooses the current
process for killing because of its oom_adj value (oom_adj is a
weighting value associated with each process, based on which the
Android low memory killer selects a process to kill in order to
reclaim memory). If we choose to retry in such a case, we could end up
in an endless loop retrying the allocation. It would be better to
handle this without retrying.

We could use your suggestion above, which would address this specific
path. But there are quite a number of allocations in ext4 which could
call ext4_std_error on failure, and we may need to look at each one of
them to decide how to handle it. Do you think this is something that
could be done?

In the past we have tried some ugly hacks to work around the problem
(adjusting oom_adj values to guard processes from being killed), but
they don't provide a foolproof mechanism in a high memory pressure
environment. Any advice on how we could fix the issue in general would
be appreciated.

Thanks again.

Best regards
Nagachandra

On Wed, Jun 26, 2013 at 8:24 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> On Wed, Jun 26, 2013 at 10:02:05AM -0400, Theodore Ts'o wrote:
>>
>> In this particular case, we could reflect the error all the way up to
>> the ftruncate(2) system call. Fixing this is going to be a bit
>> involved, unfortunately; we'll need to update a fairly large number of
>> function signatures, including ext4_truncate(), ext4_ext_truncate(),
>> ext4_free_blocks(), and a number of others.
>
> One thing that comes to mind: if we change things so that ftruncate
> reflects an ENOMEM error all the way up to userspace, one side effect
> is that the file may be partially truncated when ENOMEM is returned.
> Applications may not be prepared for this.
>
> There would be a similar issue if we do the truncate in the unlink
> call and return ENOMEM on failure: the file might not be unlinked,
> and in fact we might have a partially truncated file left in the
> directory, which would probably cause all sorts of confusion. So
> we're probably better off putting the inode on a list of inodes in
> memory, and on the orphan list on disk, and then retrying the
> truncation when memory is available. Messy, but that probably gives
> the best result for applications living constantly in high memory
> pressure environments.
>
> - Ted
* Re: Memory allocation can cause ext4 filesystem to be remounted r/o
  2013-06-26 15:20 ` Nagachandra P
@ 2013-06-26 16:34   ` Theodore Ts'o
  2013-06-26 17:05     ` Nagachandra P
  0 siblings, 1 reply; 12+ messages in thread

From: Theodore Ts'o @ 2013-06-26 16:34 UTC (permalink / raw)
To: Nagachandra P; +Cc: Vikram MP, linux-ext4

On Wed, Jun 26, 2013 at 08:50:50PM +0530, Nagachandra P wrote:
>
> We have also seen cases where the current allocation itself causes the
> lowmem shrinker to be called, which in turn chooses the current
> process for killing because of its oom_adj value (oom_adj is a
> weighting value associated with each process, based on which the
> Android low memory killer selects a process to kill in order to
> reclaim memory). If we choose to retry in such a case, we could end up
> in an endless loop retrying the allocation. It would be better to
> handle this without retrying.

The challenge is that in some cases there's no good way to return an
error back upwards, and in other cases, backing out of the middle of a
file system operation is incredibly hard. This is why we have the
retry loop in the jbd2 code; the presumption is that some other
process is schedulable, so that other processes can exit when the OOM
killer takes them out.

It's not an ideal solution, but in practice it's been good enough. In
general the OOM killer will be able to take out some other process and
free up memory that way.

Are you seeing this a lot? If so, I think it's fair to ask why; from
what I can tell it's not a situation that happens often on most
systems using ext4 (including Android devices, of which I have
several).

> We could use your suggestion above, which would address this specific
> path. But there are quite a number of allocations in ext4 which could
> call ext4_std_error on failure, and we may need to look at each one of
> them to decide how to handle it. Do you think this is something that
> could be done?

There aren't that many places where ext4 does memory allocations,
actually. And once you exclude those which are used when the file
system is initially mounted, there is quite a manageable number. It's
probably better to audit all of those and make sure we have good error
recovery if any of these calls to kmalloc() or kmem_cache_alloc()
fail.

In many of the cases where we end up calling ext4_std_error(), the
most common cause is an I/O error while trying to read some critical
metadata block, and in that case, declaring the file system corrupted
is in fact the appropriate thing to do.

> In the past we have tried some ugly hacks to work around the problem
> (adjusting oom_adj values to guard processes from being killed), but
> they don't provide a foolproof mechanism in a high memory pressure
> environment. Any advice on how we could fix the issue in general
> would be appreciated.

What version of the kernel are you using? And do you understand why
you are under so much memory pressure? Is it due to applications not
getting killed quickly enough? Are applications dirtying too much
memory too quickly? Is write throttling not working? Are they
allocating too much memory when they start up their JVM? Or is it just
that your Android device has far less memory than most of the other
devices out there?

Speaking generally, if you're regularly seeing kmem_cache_alloc fail,
that means free memory has fallen to zero. To me that sounds like the
OOM killer should be killing processes more aggressively, and more
generally you should be trying to make sure the kernel maintains a
somewhat larger amount of free memory. The fact that you mentioned
trying to prevent certain processes from being killed may mean that
you are approaching this problem from the wrong direction. It may be
more fruitful to encourage the system to kill the most deserving user
applications _earlier_.

Regards,

					- Ted
* Re: Memory allocation can cause ext4 filesystem to be remounted r/o
  2013-06-26 16:34 ` Theodore Ts'o
@ 2013-06-26 17:05   ` Nagachandra P
  2013-06-26 18:03     ` Theodore Ts'o
  0 siblings, 1 reply; 12+ messages in thread

From: Nagachandra P @ 2013-06-26 17:05 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: Vikram MP, linux-ext4

Hi Theodore,

The kernel version we are using is 3.4.5 (AOSP based).

These issues are not easy to reproduce! We run multiple applications
(of different memory sizes) over a period of 24 to 36 hours and hit
this once. We have seen these issues reproduce more easily with around
512MB of memory (typically in 16 to 20 hours), and they are harder to
reproduce with 1GB of memory.

Most of the time we get into this situation when an application
(typically an AsyncTask in Android) doing ext4 fs ops has a low
priority adj value (> 9, typically 10 - 12) and hence is fairly likely
to be killed (and there is no way to distinguish this from the
application's perspective); this is one of the challenges we are
facing. Also, we don't have to be completely out of memory, just
within the LMK band for the process's adj value.

But, on rethinking, your idea of retrying may work if we have some
tweaks in the LMK as well (like killing multiple tasks instead of just
one).

Thanks
Naga

On Wed, Jun 26, 2013 at 10:04 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> On Wed, Jun 26, 2013 at 08:50:50PM +0530, Nagachandra P wrote:
>>
>> We have also seen cases where the current allocation itself causes the
>> lowmem shrinker to be called, which in turn chooses the current
>> process for killing because of its oom_adj value (oom_adj is a
>> weighting value associated with each process, based on which the
>> Android low memory killer selects a process to kill in order to
>> reclaim memory). If we choose to retry in such a case, we could end up
>> in an endless loop retrying the allocation. It would be better to
>> handle this without retrying.
>
> The challenge is that in some cases there's no good way to return an
> error back upwards, and in other cases, backing out of the middle of a
> file system operation is incredibly hard. This is why we have the
> retry loop in the jbd2 code; the presumption is that some other
> process is schedulable, so that other processes can exit when the OOM
> killer takes them out.
>
> It's not an ideal solution, but in practice it's been good enough. In
> general the OOM killer will be able to take out some other process and
> free up memory that way.
>
> Are you seeing this a lot? If so, I think it's fair to ask why; from
> what I can tell it's not a situation that happens often on most
> systems using ext4 (including Android devices, of which I have
> several).
>
>> We could use your suggestion above, which would address this specific
>> path. But there are quite a number of allocations in ext4 which could
>> call ext4_std_error on failure, and we may need to look at each one of
>> them to decide how to handle it. Do you think this is something that
>> could be done?
>
> There aren't that many places where ext4 does memory allocations,
> actually. And once you exclude those which are used when the file
> system is initially mounted, there is quite a manageable number. It's
> probably better to audit all of those and make sure we have good error
> recovery if any of these calls to kmalloc() or kmem_cache_alloc()
> fail.
>
> In many of the cases where we end up calling ext4_std_error(), the
> most common cause is an I/O error while trying to read some critical
> metadata block, and in that case, declaring the file system corrupted
> is in fact the appropriate thing to do.
>
>> In the past we have tried some ugly hacks to work around the problem
>> (adjusting oom_adj values to guard processes from being killed), but
>> they don't provide a foolproof mechanism in a high memory pressure
>> environment. Any advice on how we could fix the issue in general
>> would be appreciated.
>
> What version of the kernel are you using? And do you understand why
> you are under so much memory pressure? Is it due to applications not
> getting killed quickly enough? Are applications dirtying too much
> memory too quickly? Is write throttling not working? Are they
> allocating too much memory when they start up their JVM? Or is it just
> that your Android device has far less memory than most of the other
> devices out there?
>
> Speaking generally, if you're regularly seeing kmem_cache_alloc fail,
> that means free memory has fallen to zero. To me that sounds like the
> OOM killer should be killing processes more aggressively, and more
> generally you should be trying to make sure the kernel maintains a
> somewhat larger amount of free memory. The fact that you mentioned
> trying to prevent certain processes from being killed may mean that
> you are approaching this problem from the wrong direction. It may be
> more fruitful to encourage the system to kill the most deserving user
> applications _earlier_.
>
> Regards,
>
> - Ted
* Re: Memory allocation can cause ext4 filesystem to be remounted r/o
  2013-06-26 17:05 ` Nagachandra P
@ 2013-06-26 18:03   ` Theodore Ts'o
  2013-06-27 12:58     ` Nagachandra P
  0 siblings, 1 reply; 12+ messages in thread

From: Theodore Ts'o @ 2013-06-26 18:03 UTC (permalink / raw)
To: Nagachandra P; +Cc: Vikram MP, linux-ext4

On Wed, Jun 26, 2013 at 10:35:22PM +0530, Nagachandra P wrote:
>
> These issues are not easy to reproduce! We run multiple applications
> (of different memory sizes) over a period of 24 to 36 hours and hit
> this once. We have seen these issues reproduce more easily with
> around 512MB of memory (typically in 16 to 20 hours), and they are
> harder to reproduce with 1GB of memory.
>
> Most of the time we get into this situation when an application
> (typically an AsyncTask in Android) doing ext4 fs ops has a low
> priority adj value (> 9, typically 10 - 12) and hence is fairly
> likely to be killed (and there is no way to distinguish this from the
> application's perspective); this is one of the challenges we are
> facing. Also, we don't have to be completely out of memory, just
> within the LMK band for the process's adj value.

To be clear, if the application is killed by the low memory killer,
we're not going to trigger the ext4_std_error() codepath.
ext4_std_error() is getting called because free memory has fallen to
_zero_, so kmem_cache_alloc() returns an error. Should ext4 do a
better job of handling this? Yes, absolutely. I do consider this a fs
bug that we should try to fix. The reality, though, is that if free
memory has gone to zero, multiple kernel subsystems are going to be
under stress.

It is good to hear that this is only happening on highly memory
constrained devices --- speaking as an owner of a Nexus 4 with 2GB of
memory. :-P

That's why the bigger issue is why free memory went to zero in the
first place. That means the LMK was probably not being aggressive
enough, or something started consuming a lot of memory too quickly,
before the page cleaner and write throttling algorithms could kick in
and deal with it.

> But, on rethinking, your idea of retrying may work if we have some
> tweaks in the LMK as well (like killing multiple tasks instead of
> just one).

You might also consider tweaking the mm low watermark and minimum
watermark; see the tunable /proc/sys/vm/min_free_kbytes.

You might also want to simply monitor the free memory level on a
continuous basis, and see how often it drops below some minimum. This
will give you a figure of merit by which you can tune your system,
without needing to wait for a file system error.

Cheers,

					- Ted
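[Editor's note: the "monitor free memory continuously" suggestion can be prototyped from user space by sampling /proc/meminfo, sketched below. MemFree is a standard field of that file; the function name and any alert threshold built on top of it are up to the implementer.]

```c
#include <stdio.h>

/* Return the current MemFree value from /proc/meminfo in kB,
 * or -1 if the file cannot be read or parsed. */
static long read_mem_free_kb(void)
{
    FILE *f = fopen("/proc/meminfo", "r");
    char line[128];
    long kb = -1;

    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        /* Lines look like "MemFree:  123456 kB". */
        if (sscanf(line, "MemFree: %ld kB", &kb) == 1)
            break;
    }
    fclose(f);
    return kb;
}
```

Sampling this periodically and logging how often it dips below the chosen watermark gives the figure of merit Ted describes, without waiting for a filesystem error.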
* Re: Memory allocation can cause ext4 filesystem to be remounted r/o
  2013-06-26 18:03 ` Theodore Ts'o
@ 2013-06-27 12:58   ` Nagachandra P
  2013-06-27 17:36     ` Theodore Ts'o
  0 siblings, 1 reply; 12+ messages in thread

From: Nagachandra P @ 2013-06-27 12:58 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: Vikram MP, linux-ext4

Hi Theodore,

Could you point me to the code where ext4_std_error is not triggered
because of the LMK? As I see it, if a memory allocation returns an
error, in some cases ext4_std_error would invariably be called. Please
consider the following call stack:

send sigkill to 5648 (id.app.sbrowser), score_adj 1000, adj 15, size 13257 with ofree -2010 20287, cfree 18597 902 msa 1000 ma 15
id.app.sbrowser: page allocation failure: order:0, mode:0x50
[<c0013aa8>] (unwind_backtrace+0x0/0x11c) from [<c00d6530>] (warn_alloc_failed+0xe8/0x110)
[<c00d6530>] (warn_alloc_failed+0xe8/0x110) from [<c00d9308>] (__alloc_pages_nodemask+0x6d4/0x804)
[<c00d9308>] (__alloc_pages_nodemask+0x6d4/0x804) from [<c00d2b34>] (find_or_create_page+0x40/0x84)
[<c00d2b34>] (find_or_create_page+0x40/0x84) from [<c0188858>] (ext4_mb_load_buddy+0xd4/0x2b4)
[<c0188858>] (ext4_mb_load_buddy+0xd4/0x2b4) from [<c018c69c>] (ext4_free_blocks+0x5d4/0xa08)
[<c018c69c>] (ext4_free_blocks+0x5d4/0xa08) from [<c0181218>] (ext4_ext_remove_space+0x690/0xd9c)
[<c0181218>] (ext4_ext_remove_space+0x690/0xd9c) from [<c0183654>] (ext4_ext_truncate+0x100/0x1c8)
[<c0183654>] (ext4_ext_truncate+0x100/0x1c8) from [<c015e2ec>] (ext4_truncate+0xf4/0x194)
[<c015e2ec>] (ext4_truncate+0xf4/0x194) from [<c01629dc>] (ext4_evict_inode+0x3b4/0x4ac)
[<c01629dc>] (ext4_evict_inode+0x3b4/0x4ac) from [<c011871c>] (evict+0x8c/0x150)
[<c011871c>] (evict+0x8c/0x150) from [<c010f030>] (do_unlinkat+0xdc/0x134)
[<c010f030>] (do_unlinkat+0xdc/0x134) from [<c000e100>] (ret_fast_syscall+0x0/0x30)

The failure to allocate memory in the above case is because of the
kill signal received: __alloc_pages_slowpath will return NULL if the
task has received a KILL signal. (I don't see any code in 3.4.5 that
checks for something like TIF_MEMDIE to decide whether or not to call
ext4_std_error; was this added recently?)

Thanks
Naga

On Wed, Jun 26, 2013 at 11:33 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> On Wed, Jun 26, 2013 at 10:35:22PM +0530, Nagachandra P wrote:
>>
>> These issues are not easy to reproduce! We run multiple applications
>> (of different memory sizes) over a period of 24 to 36 hours and hit
>> this once. We have seen these issues reproduce more easily with
>> around 512MB of memory (typically in 16 to 20 hours), and they are
>> harder to reproduce with 1GB of memory.
>>
>> Most of the time we get into this situation when an application
>> (typically an AsyncTask in Android) doing ext4 fs ops has a low
>> priority adj value (> 9, typically 10 - 12) and hence is fairly
>> likely to be killed (and there is no way to distinguish this from
>> the application's perspective); this is one of the challenges we are
>> facing. Also, we don't have to be completely out of memory, just
>> within the LMK band for the process's adj value.
>
> To be clear, if the application is killed by the low memory killer,
> we're not going to trigger the ext4_std_error() codepath.
> ext4_std_error() is getting called because free memory has fallen to
> _zero_, so kmem_cache_alloc() returns an error. Should ext4 do a
> better job of handling this? Yes, absolutely. I do consider this a fs
> bug that we should try to fix. The reality, though, is that if free
> memory has gone to zero, multiple kernel subsystems are going to be
> under stress.
>
> It is good to hear that this is only happening on highly memory
> constrained devices --- speaking as an owner of a Nexus 4 with 2GB of
> memory. :-P
>
> That's why the bigger issue is why free memory went to zero in the
> first place. That means the LMK was probably not being aggressive
> enough, or something started consuming a lot of memory too quickly,
> before the page cleaner and write throttling algorithms could kick in
> and deal with it.
>
>> But, on rethinking, your idea of retrying may work if we have some
>> tweaks in the LMK as well (like killing multiple tasks instead of
>> just one).
>
> You might also consider tweaking the mm low watermark and minimum
> watermark; see the tunable /proc/sys/vm/min_free_kbytes.
>
> You might also want to simply monitor the free memory level on a
> continuous basis, and see how often it drops below some minimum. This
> will give you a figure of merit by which you can tune your system,
> without needing to wait for a file system error.
>
> Cheers,
>
> - Ted
* Re: Memory allocation can cause ext4 filesystem to be remounted r/o
  2013-06-27 12:58 ` Nagachandra P
@ 2013-06-27 17:36   ` Theodore Ts'o
  2013-06-28 13:52     ` Nagachandra P
  0 siblings, 1 reply; 12+ messages in thread

From: Theodore Ts'o @ 2013-06-27 17:36 UTC (permalink / raw)
To: Nagachandra P; +Cc: Vikram MP, linux-ext4

On Thu, Jun 27, 2013 at 06:28:21PM +0530, Nagachandra P wrote:
> Hi Theodore,
>
> Could you point me to the code where ext4_std_error is not triggered
> because of the LMK? As I see it, if a memory allocation returns an
> error, in some cases ext4_std_error would invariably be called.
> Please consider the following call stack

Yes, that's one example where a memory allocation failure can lead to
ext4_std_error() getting called, and I've already acknowledged that's
one we need to fix (although, as I said, fixing it may be tricky,
short of calling congestion_wait() and then retrying the allocation,
hoping that in the meantime the OOM killer has freed up some memory).

If you could give me a list of other memory allocations where
ext4_std_error() could get called, please let me know. Note that in
the jbd2 layer, though, we handle a memory allocation failure by
retrying the allocation, to avoid the file system getting marked
read-only. Examples of this include
jbd2_journal_write_metadata_buffer(), and
jbd2_journal_add_journal_head() when it calls
journal_alloc_journal_head(). (Although the way we do the retry in the
latter case is a bit ugly and we're not sleeping with a call to
congestion_wait(), so it's something we should clean up.)

To give you an example of the intended use of ext4_std_error(): if the
journal commit code runs into a disk I/O error while writing to the
journal, the jbd2 code has to mark the journal as aborted. This could
happen because the disk has gone off-line, or because the HDD has run
out of spare sectors in its bad block replacement pool and has to
return a write error to the OS. Once the journal has been marked as
aborted, the next time the ext4 code tries to access the journal, by
starting a new journal handle or marking a metadata block dirty, the
jbd2 function will return an error, and this will cause
ext4_std_error() to be called so the file system can be marked as
requiring a file system check.

Regards,

					- Ted
* Re: Memory allocation can cause ext4 filesystem to be remounted r/o
  2013-06-27 17:36 ` Theodore Ts'o
@ 2013-06-28 13:52   ` Nagachandra P
  0 siblings, 0 replies; 12+ messages in thread

From: Nagachandra P @ 2013-06-28 13:52 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: Vikram MP, linux-ext4

Thanks a lot for explaining this. I will have a look at the jbd2 code
with a view to a similar implementation in ext4. I will keep you
posted on any patches we try out and get your opinion.

Best regards
Naga

On Thu, Jun 27, 2013 at 11:06 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> On Thu, Jun 27, 2013 at 06:28:21PM +0530, Nagachandra P wrote:
>> Hi Theodore,
>>
>> Could you point me to the code where ext4_std_error is not triggered
>> because of the LMK? As I see it, if a memory allocation returns an
>> error, in some cases ext4_std_error would invariably be called.
>> Please consider the following call stack
>
> Yes, that's one example where a memory allocation failure can lead to
> ext4_std_error() getting called, and I've already acknowledged that's
> one we need to fix (although, as I said, fixing it may be tricky,
> short of calling congestion_wait() and then retrying the allocation,
> hoping that in the meantime the OOM killer has freed up some memory).
>
> If you could give me a list of other memory allocations where
> ext4_std_error() could get called, please let me know. Note that in
> the jbd2 layer, though, we handle a memory allocation failure by
> retrying the allocation, to avoid the file system getting marked
> read-only. Examples of this include
> jbd2_journal_write_metadata_buffer(), and
> jbd2_journal_add_journal_head() when it calls
> journal_alloc_journal_head(). (Although the way we do the retry in
> the latter case is a bit ugly and we're not sleeping with a call to
> congestion_wait(), so it's something we should clean up.)
>
> To give you an example of the intended use of ext4_std_error(): if
> the journal commit code runs into a disk I/O error while writing to
> the journal, the jbd2 code has to mark the journal as aborted. This
> could happen because the disk has gone off-line, or because the HDD
> has run out of spare sectors in its bad block replacement pool and
> has to return a write error to the OS. Once the journal has been
> marked as aborted, the next time the ext4 code tries to access the
> journal, by starting a new journal handle or marking a metadata block
> dirty, the jbd2 function will return an error, and this will cause
> ext4_std_error() to be called so the file system can be marked as
> requiring a file system check.
>
> Regards,
>
> - Ted
* Re: Memory allocation can cause ext4 filesystem to be remounted r/o
  2013-06-26 14:54 ` Theodore Ts'o
  2013-06-26 15:20 ` Nagachandra P
@ 2013-06-26 18:53 ` Joseph D. Wagner
  2013-06-26 22:14 ` Theodore Ts'o
  1 sibling, 1 reply; 12+ messages in thread
From: Joseph D. Wagner @ 2013-06-26 18:53 UTC (permalink / raw)
To: Theodore Ts'o; +Cc: Nagachandra P, Vikram MP, linux-ext4

On 06/26/2013 7:54 am, Theodore Ts'o wrote:
> On Wed, Jun 26, 2013 at 10:02:05AM -0400, Theodore Ts'o wrote:
>
>> In this particular case, we could reflect the error all the way up
>> to the ftruncate(2) system call. Fixing this is going to be a bit
>> involved, unfortunately; we'll need to update a fairly large number
>> of function signatures, including ext4_truncate(),
>> ext4_ext_truncate(), ext4_free_blocks(), and a number of others.
>
> One thing that comes to mind: if we change things so that ftruncate
> reflects an ENOMEM error all the way up to userspace, one side effect
> is that the file may be partially truncated when ENOMEM is returned.
> Applications may not be prepared for this.

Hi Ted, it's the newbie again.

I'd like to throw out a possible band-aid, which I know is ugly, but
I'm not sure how it compares to the other ideas discussed. What if
there was a check at the start of the chain for free memory? For
example:

1. A user program calls function_x(parameter y).
2. We know function_x() calls function_a(), function_b(), and
   function_c().
3. Based upon our knowledge of those functions (and perhaps parameter
   y), we can _estimate_ that function_x() will require z bytes of
   memory.
4. Alter function_x() so that its first step is to check for z bytes
   of free memory.

Upside

1. Obvious memory shortages are reported immediately, instead of 30
   steps down the chain.
2. No risk of non-deterministic data changes (if caught; see
   downside).
3. No risk of an infinite loop due to retries.
4. Puts a spotlight on applications that do not correctly handle
   ENOMEM, which to me is the equivalent of not correctly calling
   fsync().

Downside

1. Does not guarantee that memory will be available when ext4 needs
   it. Memory might be available during the pre-check, but another
   process might scoop it up between the pre-check and ext4's
   allocation.
2. Does not catch all cases. The check is only an estimate.

Thank you for your patience and for answering my questions.

Joseph D. Wagner

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Memory allocation can cause ext4 filesystem to be remounted r/o
  2013-06-26 18:53 ` Joseph D. Wagner
@ 2013-06-26 22:14 ` Theodore Ts'o
  0 siblings, 0 replies; 12+ messages in thread
From: Theodore Ts'o @ 2013-06-26 22:14 UTC (permalink / raw)
To: Joseph D. Wagner; +Cc: Nagachandra P, Vikram MP, linux-ext4

On Wed, Jun 26, 2013 at 11:53:12AM -0700, Joseph D. Wagner wrote:
> 1. Does not guarantee that memory will be available when ext4 needs
> it. Memory might be available during the pre-check, but another
> process might scoop it up between the pre-check and ext4's
> allocation.

This is the huge one. In some cases we might need to read one or more
disk blocks in the course of servicing the request. By the time we've
read in the disk blocks, hundreds of milliseconds can have gone by. If
there is a high-speed transfer coming in over the network, the
networking stack can chew up a huge amount of memory surprisingly
quickly (to say nothing of a JVM that might be starting up in parallel
with reading in the allocation bitmaps, for example).

This also assumes we know in advance how much memory will be needed.
Depending on how fragmented a file might be, the amount of memory
required can vary significantly. And if the required metadata blocks
are not already in memory, we won't know how much memory will be
needed until we pull the necessary blocks into memory.

The other problem is that the point at which we would do the check for
the necessary memory is in the high-level portions of ext4, while the
places where we actually do the memory allocations are sometimes deep
in the guts of ext4. So figuring this out would require some truly
nasty abstraction violations. For the same reason, we can't simply
allocate all the memory before we start the file system operation.

There are places where we could do this without doing severe violence
to the surrounding code and without making things a total maintenance
nightmare. But it's one of those things where we'd have to look at
each place where we allocate memory and decide what's the best way to
handle things.

   - Ted

^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2013-06-28 13:52 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-06-25  9:25 (unknown), Nagachandra P
2013-06-26 14:02 ` Memory allocation can cause ext4 filesystem to be remounted r/o Theodore Ts'o
2013-06-26 14:54 ` Theodore Ts'o
2013-06-26 15:20 ` Nagachandra P
2013-06-26 16:34 ` Theodore Ts'o
2013-06-26 17:05 ` Nagachandra P
2013-06-26 18:03 ` Theodore Ts'o
2013-06-27 12:58 ` Nagachandra P
2013-06-27 17:36 ` Theodore Ts'o
2013-06-28 13:52 ` Nagachandra P
2013-06-26 18:53 ` Joseph D. Wagner
2013-06-26 22:14 ` Theodore Ts'o