From: Minchan Kim <minchan@kernel.org>
To: Luigi Semenzato <semenzato@google.com>
Cc: David Rientjes <rientjes@google.com>,
	linux-mm@kvack.org, Dan Magenheimer <dan.magenheimer@oracle.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Sonny Rao <sonnyrao@google.com>, Mandeep Baines <msb@google.com>
Subject: Re: zram OOM behavior
Date: Wed, 31 Oct 2012 10:27:20 +0900
Message-ID: <20121031012720.GO15767@bbox>
In-Reply-To: <CAA25o9QhkQfZi+UVOjj0JBkNo8Vmt22ATUP25LFqkS-cDoq85Q@mail.gmail.com>

On Tue, Oct 30, 2012 at 06:06:56PM -0700, Luigi Semenzato wrote:
> On Tue, Oct 30, 2012 at 5:57 PM, Minchan Kim <minchan@kernel.org> wrote:
> > Hi Luigi,
> >
> > On Tue, Oct 30, 2012 at 12:12:02PM -0700, Luigi Semenzato wrote:
> >> On Mon, Oct 29, 2012 at 10:41 PM, David Rientjes <rientjes@google.com> wrote:
> >> > On Mon, 29 Oct 2012, Luigi Semenzato wrote:
> >> >
> >> >> However, now there is something that worries me more.  The trace of
> >> >> the thread with TIF_MEMDIE set shows that it has executed most of
> >> >> do_exit() and appears to be waiting to be reaped.  From my reading of
> >> >> the code, this implies that task->exit_state should be non-zero, which
> >> >> means that select_bad_process should have skipped that thread, which
> >> >> means that we cannot be in the deadlock situation, and my experiments
> >> >> are not consistent.
> >> >>
> >> >
> >> > Yeah, this is what I was referring to earlier: select_bad_process() will
> >> > not consider the thread for which you posted a stack trace for oom kill,
> >> > so it's not deferring because of it.  Either other thread(s) have been
> >> > oom killed and have not yet released their memory, or the oom killer is
> >> > never being called.
> >>
> >> Thanks.  I now have better information on what's happening.
> >>
> >> The "culprit" is not the OOM-killed process (the one with TIF_MEMDIE
> >> set).  It's another process that's exiting for some other reason.
> >>
> >> select_bad_process() checks for thread->exit_state at the beginning,
> >> and skips processes that are exiting.  But later it checks for
> >> p->flags & PF_EXITING, and can return -1 in that case (and it does for
> >> me).
> >>
> >> It turns out that do_exit() does a lot of things between setting the
> >> thread->flags PF_EXITING bit (in exit_signals()) and setting
> >> thread->exit_state to non-zero (in exit_notify()).  Some of those
> >> things apparently need memory.  I caught one process responsible for
> >> the ERR_PTR(-1) while it was doing this:
> >>
> >> [  191.859358] VC manager      R running      0  2388   1108 0x00000104
> >> [  191.859377] err_ptr_count = 45623
> >> [  191.859384]  e0611b1c 00200086 f5608000 815ecd20 815ecd20 a0a9ebc3 0000002c f67cfd20
> >> [  191.859407]  f430a060 81191c34 e0611aec 81196d79 4168ef20 00000001 e1302400 e130264c
> >> [  191.859428]  e1302400 e0611af4 813b71d5 e0611b00 810b42f1 e1302400 e0611b0c 810b430e
> >> [  191.859450] Call Trace:
> >> [  191.859465]  [<81191c34>] ? __delay+0xe/0x10
> >> [  191.859478]  [<81196d79>] ? do_raw_spin_lock+0xa2/0xf3
> >> [  191.859491]  [<813b71d5>] ? _raw_spin_unlock+0xd/0xf
> >> [  191.859504]  [<810b42f1>] ? put_super+0x26/0x29
> >> [  191.859515]  [<810b430e>] ? drop_super+0x1a/0x1d
> >> [  191.859527]  [<8104512d>] __cond_resched+0x1b/0x2b
> >> [  191.859537]  [<813b67a7>] _cond_resched+0x18/0x21
> >> [  191.859549]  [<81093940>] shrink_slab+0x224/0x22f
> >> [  191.859562]  [<81095a96>] try_to_free_pages+0x1b7/0x2e6
> >> [  191.859574]  [<8108df2a>] __alloc_pages_nodemask+0x40a/0x61f
> >> [  191.859588]  [<810a9dbe>] read_swap_cache_async+0x4a/0xcf
> >> [  191.859600]  [<810a9ea4>] swapin_readahead+0x61/0x8d
> >> [  191.859612]  [<8109fff4>] handle_pte_fault+0x310/0x5fb
> >> [  191.859624]  [<810a0420>] handle_mm_fault+0xae/0xbd
> >> [  191.859637]  [<8101d0f9>] do_page_fault+0x265/0x284
> >> [  191.859648]  [<8104aa17>] ? dequeue_entity+0x236/0x252
> >> [  191.859660]  [<8101ce94>] ? vmalloc_sync_all+0xa/0xa
> >> [  191.859672]  [<813b7887>] error_code+0x67/0x6c
> >> [  191.859683]  [<81191d21>] ? __get_user_4+0x11/0x17
> >> [  191.859695]  [<81059f28>] ? exit_robust_list+0x30/0x105
> >> [  191.859707]  [<813b71b0>] ? _raw_spin_unlock_irq+0xd/0x10
> >> [  191.859718]  [<810446d5>] ? finish_task_switch+0x53/0x89
> >> [  191.859730]  [<8102351d>] mm_release+0x1d/0xc3
> >> [  191.859740]  [<81026ce9>] exit_mm+0x1d/0xe9
> >> [  191.859750]  [<81032b87>] ? exit_signals+0x57/0x10a
> >> [  191.859760]  [<81028082>] do_exit+0x19b/0x640
> >> [  191.859770]  [<81058598>] ? futex_wait_queue_me+0xaa/0xbe
> >> [  191.859781]  [<81030bbf>] ? recalc_sigpending_tsk+0x51/0x5c
> >> [  191.859793]  [<81030beb>] ? recalc_sigpending+0x17/0x3e
> >> [  191.859803]  [<81028752>] do_group_exit+0x63/0x86
> >> [  191.859813]  [<81032b19>] get_signal_to_deliver+0x434/0x44b
> >> [  191.859825]  [<81001e01>] do_signal+0x37/0x4fe
> >> [  191.859837]  [<81048eed>] ? set_next_entity+0x36/0x9d
> >> [  191.859850]  [<81050d8e>] ? timekeeping_get_ns+0x11/0x55
> >> [  191.859861]  [<8105a754>] ? sys_futex+0xcb/0xdb
> >> [  191.859871]  [<810024a7>] do_notify_resume+0x26/0x65
> >> [  191.859883]  [<813b73a5>] work_notifysig+0xa/0x11
> >> [  191.859893] Kernel panic - not syncing: too many ERR_PTR
> >>
> >> I don't know why mm_release() would page fault, but it looks like it does.
> >>
> >> So the OOM killer will not kill other processes because it thinks a
> >> process is exiting, which will free up memory.  But the exiting
> >> process needs memory to continue exiting --> deadlock.  Sounds
> >> plausible?
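The suspected deadlock can be reduced to a toy simulation. It is only a sketch under loud assumptions: `struct system_model`, `oom_kill_other()`, `exiter_step()` and `simulate()` are hypothetical names, a single page stands in for whatever exit_mm() needs, and killing a victim is modelled as freeing exactly one page.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical two-task model: one thread inside do_exit(), plus
 * whichever other task ends up invoking the OOM killer. */
struct system_model {
	long free_pages;          /* pages an ordinary allocation may take */
	bool exiter_pf_exiting;   /* set early, by exit_signals() */
	bool exiter_exit_state;   /* set late, by exit_notify() */
};

/* OOM killer run on behalf of another task: while a thread is
 * PF_EXITING but has no exit_state yet, it defers (kills nobody),
 * expecting that thread to free its memory soon. */
static bool oom_kill_other(const struct system_model *s)
{
	return !(s->exiter_pf_exiting && !s->exiter_exit_state);
}

/* The exiting thread needs one page to get through exit_mm(). */
static bool exiter_step(struct system_model *s)
{
	if (s->free_pages <= 0)
		return false;             /* stuck in the page allocator */
	s->free_pages--;
	s->exiter_exit_state = true;      /* reaches exit_notify() */
	return true;
}

/* Returns true if the exiting thread finishes within max_rounds. */
static bool simulate(struct system_model *s, int max_rounds)
{
	assert(max_rounds > 0);
	for (int i = 0; i < max_rounds && !s->exiter_exit_state; i++) {
		/* A real kill frees the victim's memory; model it as
		 * exactly one page becoming available. */
		if (!exiter_step(s) && oom_kill_other(s))
			s->free_pages++;
	}
	return s->exiter_exit_state;
}
```

With no free pages and the exiting flag set, `simulate()` returns false however many rounds it runs: the exiting thread waits for a page and the OOM killer waits for the exiting thread.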
> >
> > It sounds right for your kernel, but the principal problem is the
> > min_filelist_kbytes patch.  If a normally exiting process needs a page
> > in its exit path and no free pages are left, it ends up in the OOM path
> > after trying to reclaim memory several times.  Then, in
> > select_bad_process:
> >
> >         if (task->flags & PF_EXITING) {
> >                 if (task == current)             <== true
> >                         return OOM_SCAN_SELECT;
> >                 ...
> >         }
> >
> > In oom_kill_process:
> >
> >         if (p->flags & PF_EXITING)
> >                 set_tsk_thread_flag(p, TIF_MEMDIE);
> >
> > Finally, the exiting process, now marked TIF_MEMDIE, can dip into the
> > memory reserves and get a free page.
> >
> > But in your kernel that doesn't seem to happen, I guess because
> > did_some_progress in __alloc_pages_direct_reclaim is never 0.  It is
> > never 0 because do_try_to_free_pages's all_unreclaimable check can't
> > do its job with your min_filelist_kbytes patch, which makes
> > __alloc_pages_slowpath loop forever.
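Why the allocation never reaches the OOM killer can be sketched as a retry loop. The model is hypothetical (`slowpath_model()`, `reclaim_fn` and the two reclaim stubs are illustration-only names) and leaves out compaction, __GFP_NOFAIL and order checks. Its one real-kernel assumption, as far as I can tell from kernels of this era, is that do_try_to_free_pages() returns 1 when it reclaimed nothing but not every zone is marked all_unreclaimable, so did_some_progress can stay non-zero without any page being freed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outcomes of one slow-path allocation attempt. */
enum alloc_outcome { GOT_PAGE, WENT_TO_OOM, GAVE_UP_RETRYING };

/* Returns the did_some_progress value for a given attempt. */
typedef unsigned long (*reclaim_fn)(int attempt);

/* Very loose model of __alloc_pages_slowpath(): retry while direct
 * reclaim reports progress; consider the OOM killer only when it
 * reports none.  max_attempts exists only so the model terminates;
 * the kernel loop has no such bound. */
static enum alloc_outcome slowpath_model(long free_pages, reclaim_fn reclaim,
                                         int max_attempts)
{
	assert(reclaim != NULL);
	for (int i = 0; i < max_attempts; i++) {
		if (free_pages > 0)
			return GOT_PAGE;
		/* Progress may be reported even though no page became free. */
		unsigned long did_some_progress = reclaim(i);
		if (did_some_progress == 0)
			return WENT_TO_OOM;
	}
	return GAVE_UP_RETRYING;
}

/* Reclaim that never admits failure, as with all_unreclaimable defeated. */
static unsigned long always_reports_progress(int attempt)
{
	(void)attempt;
	return 1;
}

/* Reclaim that reports progress three times, then gives up. */
static unsigned long progress_then_nothing(int attempt)
{
	return attempt < 3 ? 1 : 0;
}
```

With `always_reports_progress` and no free pages, the model only stops because of its artificial bound; with `progress_then_nothing` it reaches the OOM path on the fourth attempt.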
> >
> > Sounds plausible?
> 
> Thank you Minchan, it does sound plausible, but I have little
> experience with this and it will take some work to confirm.

No problem :)

> 
> I looked at the patch pretty carefully once, and I had the impression
> its effect could be fully analyzed by logical reasoning. I will check
> this again tomorrow, perhaps I can run some experiments.  I am adding
> Mandeep who wrote the patch.
> 
> However, we have worse problems if we don't use that patch.  Without
> the patch, and either with or without compressed swap, the same load
> causes horrible thrashing, with the system appearing to hang for
> minutes.  If we don't use that patch, do you have any suggestions on
> how to improve the code-thrashing situation?

As I said, the motivation for the patch is good for embedded systems, but
the patch's implementation is kind of buggy.  I will have a look and post
something if I'm lucky enough to find the time.

BTW, a question.

How do you find the proper value for min_filelist_kbytes?
Just by experimenting with several trials?

Thanks.

> 
> Thanks again!
> 
> >>
> >> OK, now someone is going to fix this, right? :-)
> >>
> >> --
> >> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> >> the body to majordomo@kvack.org.  For more info on Linux MM,
> >> see: http://www.linux-mm.org/ .
> >> Don't email: email@kvack.org
> >
> > --
> > Kind regards,
> > Minchan Kim
> 

-- 
Kind regards,
Minchan Kim


Thread overview: 67+ messages
2012-09-28 17:32 zram OOM behavior Luigi Semenzato
2012-10-03 13:30 ` Konrad Rzeszutek Wilk
     [not found]   ` <CAA25o9SwO209DD6CUx-LzhMt9XU6niGJ-fBPmgwfcrUvf0BPWA@mail.gmail.com>
2012-10-12 23:30     ` Luigi Semenzato
2012-10-15 14:44 ` Minchan Kim
2012-10-15 18:54   ` Luigi Semenzato
2012-10-16  6:18     ` Minchan Kim
2012-10-16 17:36       ` Luigi Semenzato
2012-10-19 17:49         ` Luigi Semenzato
2012-10-22 23:53           ` Minchan Kim
2012-10-23  0:40             ` Luigi Semenzato
2012-10-23  6:03             ` David Rientjes
2012-10-29 18:26               ` Luigi Semenzato
2012-10-29 19:00                 ` David Rientjes
2012-10-29 22:36                   ` Luigi Semenzato
2012-10-29 22:52                     ` David Rientjes
2012-10-29 23:23                       ` Luigi Semenzato
2012-10-29 23:34                         ` Luigi Semenzato
2012-10-30  0:18                     ` Minchan Kim
2012-10-30  0:45                       ` Luigi Semenzato
2012-10-30  5:41                         ` David Rientjes
2012-10-30 19:12                           ` Luigi Semenzato
2012-10-30 20:30                             ` Luigi Semenzato
2012-10-30 22:32                               ` Luigi Semenzato
2012-10-31 18:42                                 ` David Rientjes
2012-10-30 22:37                               ` Sonny Rao
2012-10-31  4:46                               ` David Rientjes
2012-10-31  6:14                                 ` Luigi Semenzato
2012-10-31  6:28                                   ` Luigi Semenzato
2012-10-31 18:45                                     ` David Rientjes
2012-10-31  0:57                             ` Minchan Kim
2012-10-31  1:06                               ` Luigi Semenzato
2012-10-31  1:27                                 ` Minchan Kim [this message]
2012-10-31  3:49                                   ` Luigi Semenzato
2012-10-31  7:24                                     ` Minchan Kim
2012-10-31 16:07                                       ` Luigi Semenzato
2012-10-31 17:49                                         ` Mandeep Singh Baines
2012-10-31 18:54                               ` David Rientjes
2012-10-31 21:40                                 ` Luigi Semenzato
2012-11-01  2:11                                 ` Minchan Kim
2012-11-01  4:38                                   ` David Rientjes
2012-11-01  5:18                                     ` Minchan Kim
2012-11-01  2:43                                 ` Minchan Kim
2012-11-01  4:48                                   ` David Rientjes
2012-11-01  5:26                                     ` Minchan Kim
2012-11-01  8:28                                     ` Mel Gorman
2012-11-01 15:57                                       ` Luigi Semenzato
2012-11-01 15:58                                         ` Luigi Semenzato
2012-11-01 21:48                                           ` David Rientjes
2012-11-01 17:50                                     ` Luigi Semenzato
2012-11-01 21:50                                       ` David Rientjes
2012-11-01 21:58                                         ` [patch] mm, oom: allow exiting threads to have access to memory reserves David Rientjes
2012-11-01 22:43                                           ` Andrew Morton
2012-11-01 23:05                                             ` David Rientjes
2012-11-01 23:06                                             ` Luigi Semenzato
2012-11-01 22:04                                         ` zram OOM behavior Luigi Semenzato
2012-11-01 22:25                                           ` David Rientjes
2012-11-02  6:39 Minchan Kim
2012-11-02  8:30 ` Mel Gorman
2012-11-02 22:36   ` Minchan Kim
2012-11-05 14:46     ` Mel Gorman
2012-11-06  0:25       ` Minchan Kim
2012-11-06  8:58         ` Mel Gorman
2012-11-06 10:17           ` Minchan Kim
2012-11-09  9:50             ` Mel Gorman
2012-11-12 13:32               ` Minchan Kim
2012-11-12 14:06                 ` Mel Gorman
2012-11-13 13:31                   ` Minchan Kim
