From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: ** X-Spam-Status: No, score=2.2 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,HTML_MESSAGE,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS,URIBL_BLOCKED autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 207CBC10DCE for ; Sat, 7 Mar 2020 01:57:20 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id C62E1206CC for ; Sat, 7 Mar 2020 01:57:19 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="MNdd7CY1" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C62E1206CC Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 5ACF86B0003; Fri, 6 Mar 2020 20:57:19 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 55CF36B0006; Fri, 6 Mar 2020 20:57:19 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 472746B0007; Fri, 6 Mar 2020 20:57:19 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0001.hostedemail.com [216.40.44.1]) by kanga.kvack.org (Postfix) with ESMTP id 300DC6B0003 for ; Fri, 6 Mar 2020 20:57:19 -0500 (EST) Received: from smtpin10.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id E251B180AD807 for ; Sat, 7 Mar 2020 01:57:18 +0000 (UTC) X-FDA: 76566903756.10.corn83_22b84b0869223 X-HE-Tag: corn83_22b84b0869223 
X-Filterd-Recvd-Size: 8831 Received: from mail-io1-f66.google.com (mail-io1-f66.google.com [209.85.166.66]) by imf32.hostedemail.com (Postfix) with ESMTP for ; Sat, 7 Mar 2020 01:57:18 +0000 (UTC) Received: by mail-io1-f66.google.com with SMTP id k4so3976730ior.4 for ; Fri, 06 Mar 2020 17:57:18 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:from:date:message-id:subject:to; bh=RaZaDYOWvAn7pbTY8+Acu+6HHetfn1V8OZjkuxSFjro=; b=MNdd7CY1NRdSabvQaZBYqD/l+ve+EcmJxiZ44KgWTxUmoyOx0+6eisWnJNozA5IBlf 3FLbU+kKp57hpvJBfempd5euFe33zocLbHyG4yjMgsBeakaQh5cHV2Qg92h/8YtEGBGA cE+Pef2yiKPTUHBxe7i2cp4S5S7roRusLKAFsSrVLWMtbhz2t66E9Qx2xGu30/wg1wTY V05zp4YyH8VYHvkNCIWssg4aV6oGaYmQyHSB0NZGlS4sCAoL27Tm2fVXz6YEYPfZHl/U 2j4eJyJWhIosKclAp5IL3+dkBjggRcmNZ6sw3CWloCAdkuVz4ymyX5GTkFz/qS0lI8lN VbLA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:from:date:message-id:subject:to; bh=RaZaDYOWvAn7pbTY8+Acu+6HHetfn1V8OZjkuxSFjro=; b=MVeEX0Dr6h8YqS5cbAt/zAa5IoOxmHaVtSHivA299JaZqgwTgOfvdOUj2HivI2s7iR AiFB/0oUKlyOD2JTvP5QOpni4SINZmF6UXMiCZaYZP88rvMQpWEuZ+hn7YAim5X21yrL LtWRV8QJLb/TrRJLCuvD1Hl/4OaLCU//AZ3O+U/K//rva4rRsnuDhS34v9LUOu4tFL1s R7jDk+KJAY3f5W8iqEXwBMHy7gsjReHXdJ8Wd4+DnoBaBmV7q3d/0aJzE6fkrbzMoTPQ 9qRtjtyMz62vx87vVCe4SpRwwXYoPZSZjxEfOSHdQtutOiFD8RWtM74aidnnPLWAQ4vD FsTQ== X-Gm-Message-State: ANhLgQ1uALA8aje4jd54H8q80z2lNDzJGtn7VXIvuZbkRmWFZ8vOjonB rKqHGV/1qn+rKXMWFbu3JTJ4y4cZURfUPbbmtAoj8V6dhF8= X-Google-Smtp-Source: ADFU+vseaBEEcA3Rkk3u2TF/ya7Qd7EZUKyYOD3FAVIoHkJEWldS1P1WXlfOY16AFmL5E4j+xT66GBq3KqHv0LWM7C8= X-Received: by 2002:a5d:8143:: with SMTP id f3mr5307582ioo.12.1583546237419; Fri, 06 Mar 2020 17:57:17 -0800 (PST) MIME-Version: 1.0 From: Whoza Ran Date: Sat, 7 Mar 2020 02:57:06 +0100 Message-ID: Subject: Processes being killed without obvious reason or log entries To: linux-mm@kvack.org Content-Type: multipart/alternative; 
boundary="000000000000f7842105a03a1625" X-Bogosity: Ham, tests=bogofilter, spamicity=0.260632, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID:

--000000000000f7842105a03a1625
Content-Type: text/plain; charset="UTF-8"

hi folks,

I have an issue on multiple hosts whose cause I have not been able to identify.

I first noticed it when trying to access a PostgreSQL database after running `su postgres`. Please note that I do not think this is related to PostgreSQL, but to the kernel.

When I run `su postgres`, the su process sometimes, but not always, responds immediately with `Killed` and does not switch to the postgres user. Other times I can become the postgres user, but then executing anything else in that shell, e.g. psql, results in `Killed` every time. I may be hitting some system limit or something else I am not aware of.

But if a limit were hit, such as something the OOM killer would handle, it would be logged to syslog in any case afaik, yet there are absolutely no entries in syslog, dmesg or the journal. Since psql is a Perl wrapper script, it forks lots of processes and never makes it to a forked PostgreSQL client. So I tried this script: https://pastebin.com/6igR3pmW

As you can observe, even `date` gets killed sometimes, in this short example at line #22.

My best guess now is that something is wrong with the logging. The kernel is vanilla 5.5.2, the OS is Debian stretch. The ulimits are not hit afaik, at least the usual suspects were set to unlimited, and in any case a log entry should clarify what is going wrong.

Terminating all processes running as user postgres did not help either. afaik ulimits cannot be the cause then, as there are no processes running as that user. One weird thing: the database server itself, postgresql, can start perfectly fine.

# ps -elf | grep postgres
0 S postgres   680     1  0  80   0 - 42460 -      Feb07 ?        00:00:41 /usr/lib/postgresql/9.6/bin/postgres [... some args]
1 S postgres   681   680  0  80   0 - 34755 -      Feb07 ?        00:00:02 postgres: logger process
1 S postgres   684   680  0  80   0 - 42518 -      Feb07 ?        00:00:17 postgres: checkpointer process
1 S postgres   685   680  0  80   0 - 42460 -      Feb07 ?        00:00:14 postgres: writer process
1 S postgres   686   680  0  80   0 - 42460 -      Feb07 ?        00:00:15 postgres: wal writer process
1 S postgres   687   680  0  80   0 - 38389 -      Feb07 ?        00:00:12 postgres: stats collector process

What am I missing? Could this be some issue with the memory management killing processes but not logging anything?

Also, in case it helps: strace psql - https://pastebin.com/7w2muHDc which is being killed as `+++ killed by SIGKILL +++`

--000000000000f7842105a03a1625
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
--000000000000f7842105a03a1625--
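
A minimal sketch of how the intermittent kills described in the message could be counted, assuming Python 3 is available on an affected host. It is not the pastebin script referenced in the message; `run_and_classify` is a hypothetical helper name chosen for illustration. It relies only on the documented `subprocess` behaviour that, on POSIX, a negative return code `-N` means the child was terminated by signal `N`.

```python
import signal
import subprocess


def run_and_classify(cmd, runs=20):
    """Run cmd repeatedly and list the runs that were ended by a signal.

    Returns a list of (run_index, signal_name) tuples. The helper name and
    the default run count are illustrative assumptions, not from the report.
    """
    killed = []
    for i in range(runs):
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        # On POSIX, returncode == -N means the child died from signal N.
        if proc.returncode < 0:
            killed.append((i, signal.Signals(-proc.returncode).name))
    return killed


if __name__ == "__main__":
    # `date` is the command the report mentions as being killed sometimes.
    for index, name in run_and_classify(["date"]):
        print(f"run {index}: terminated by {name}")
```

Run as the affected user, this would only show how often a trivial command is killed and by which signal; it does not identify who sent the signal.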