LKML Archive on lore.kernel.org
From: Roman Gushchin <guro@fb.com>
To: "Arkadiusz Miśkiewicz" <a.miskiewicz@gmail.com>
Cc: Tejun Heo <tj@kernel.org>,
	"cgroups@vger.kernel.org" <cgroups@vger.kernel.org>,
	Aleksa Sarai <asarai@suse.de>, Jay Kamat <jgkamat@fb.com>,
	Michal Hocko <mhocko@suse.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: pids.current with invalid value for hours [5.0.0 rc3 git]
Date: Sat, 26 Jan 2019 01:41:04 +0000
Message-ID: <20190126014055.GA25864@castle.DHCP.thefacebook.com>
In-Reply-To: <a95d004a-4358-7efc-6d21-12aac4411b32@gmail.com>

On Fri, Jan 25, 2019 at 08:47:57PM +0100, Arkadiusz Miśkiewicz wrote:
> On 25/01/2019 17:37, Tejun Heo wrote:
> > On Fri, Jan 25, 2019 at 08:52:11AM +0100, Arkadiusz Miśkiewicz wrote:
> >> On 24/01/2019 12:21, Arkadiusz Miśkiewicz wrote:
> >>> On 17/01/2019 14:17, Arkadiusz Miśkiewicz wrote:
> >>>> On 17/01/2019 13:25, Aleksa Sarai wrote:
> >>>>> On 2019-01-17, Arkadiusz Miśkiewicz <a.miskiewicz@gmail.com> wrote:
> >>>>>> Using kernel 4.19.13.
> >>>>>>
> >>>>>> For one cgroup I noticed weird behaviour:
> >>>>>>
> >>>>>> # cat pids.current
> >>>>>> 60
> >>>>>> # cat cgroup.procs
> >>>>>> #
> >>>>>
> >>>>> Are there any zombies in the cgroup? pids.current is linked up directly
> >>>>> to __put_task_struct (so exit(2) won't decrease it, only the task_struct
> >>>>> actually being freed will decrease it).
> >>>>>
> >>>>
> >>>> There are no zombie processes.
> >>>>
> >>>> In mean time the problem shows on multiple servers and so far saw it
> >>>> only in cgroups that were OOMed.
> >>>>
> >>>> What has changed on these servers (yesterday) is turning on
> >>>> memory.oom.group=1 for all cgroups and changing memory.high from 1G to
> >>>> "max" (leaving memory.max=2G limit only).
> >>>>
> >>>> Previously there was no such problem.
> >>>>
> >>>
> >>> I'm attaching reproducer. This time tried on different distribution
> >>> kernel (arch linux).
> >>>
> >>> After 60s pids.current still shows 37 processes even if there are no
> >>> processes running (according to ps aux).
> >>
> >>
> >> The same test on 5.0.0-rc3-00104-gc04e2a780caf and it's easy to
> >> reproduce bug. No processes in cgroup but pids.current reports 91.
> > 
> > Can you please see whether the problem can be reproduced on the
> > current linux-next?
> > 
> >  git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
> 
> I can reproduce on next (5.0.0-rc3-next-20190125), too:

How reliably can you reproduce it? I've run your reproducer several
times with different parameters, but haven't managed to trigger it so far.
How many CPUs and how much total RAM does your machine have?

Can you please provide the corresponding dmesg output?

I've checked the code again, and my wild guess is that these missing
tasks are waiting (maybe hopelessly) for the OOM reaper. Dmesg output
might be very useful here.
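
For anyone else trying to confirm the symptom, the mismatch described above
(pids.current non-zero while cgroup.procs is empty) can be checked with a
short script. This is only a sketch under stated assumptions: the function
name pids_mismatch and the example path are made up for illustration, and it
assumes a cgroup v2 directory. Since pids.current counts tasks (threads), the
script prefers cgroup.threads when that file exists and falls back to
cgroup.procs otherwise.

```python
from pathlib import Path


def pids_mismatch(cgroup_dir):
    """Compare pids.current with the number of tasks listed in the cgroup.

    Returns a tuple (pids_current, listed_tasks, difference).
    The name and return shape are illustrative, not a kernel interface.
    """
    base = Path(cgroup_dir)

    # pids.current holds a single integer: the number of tasks charged
    # to this cgroup by the pids controller.
    pids_current = int((base / "pids.current").read_text().strip())

    # Prefer cgroup.threads (one TID per line); fall back to cgroup.procs
    # (one PID per line) if the threads file is not present.
    listing = base / "cgroup.threads"
    if not listing.exists():
        listing = base / "cgroup.procs"
    listed_tasks = len(listing.read_text().split())

    return pids_current, listed_tasks, pids_current - listed_tasks


# Example call (the path is hypothetical, adjust it to the affected cgroup):
#   pids_mismatch("/sys/fs/cgroup/some-group")
```

A non-zero difference that persists while no zombies are present would match
the report. A brief non-zero value on its own is not conclusive, because the
counter is only decreased once the task_struct is actually freed.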

Thanks!

