From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751322AbdFASlq (ORCPT ); Thu, 1 Jun 2017 14:41:46 -0400
Received: from mx0a-00082601.pphosted.com ([67.231.145.42]:57585
	"EHLO mx0a-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1751078AbdFASlo (ORCPT );
	Thu, 1 Jun 2017 14:41:44 -0400
Authentication-Results: kernel.org; dkim=none (message not signed)
	header.d=none;kernel.org; dmarc=none action=none header.from=fb.com;
Date: Thu, 1 Jun 2017 19:41:13 +0100
From: Roman Gushchin
To: Michal Hocko
CC: Tetsuo Handa, Johannes Weiner, Vladimir Davydov, kernel-team@fb.com,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v2] mm,oom: add tracepoints for oom reaper-related events
Message-ID: <20170601184113.GA31689@castle>
References: <1496145932-18636-1-git-send-email-guro@fb.com>
	<20170530123415.GF7969@dhcp22.suse.cz>
	<20170530133335.GB28148@castle>
	<20170530134552.GI7969@dhcp22.suse.cz>
	<20170530185231.GA13412@castle>
	<20170531163928.GZ27783@dhcp22.suse.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
In-Reply-To: <20170531163928.GZ27783@dhcp22.suse.cz>
User-Agent: Mutt/1.5.24 (2015-08-30)
X-Originating-IP: [2620:10d:c092:200::1:ebb3]
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, May 31, 2017 at 06:39:29PM +0200, Michal Hocko wrote:
> On Tue 30-05-17 19:52:31, Roman Gushchin wrote:
> > >From c57e3674efc609f8364f5e228a2c1309cfe99901 Mon Sep 17 00:00:00 2001
> > From: Roman Gushchin
> > Date: Tue, 23 May 2017 17:37:55 +0100
> > Subject: [PATCH v2] mm,oom: add tracepoints for oom reaper-related events
> >
> > During the debugging of the problem described in
> > https://lkml.org/lkml/2017/5/17/542 and fixed by Tetsuo Handa
> > in https://lkml.org/lkml/2017/5/19/383 , I've found that
> > the existing debug output is not really useful to understand
> > issues related to the oom reaper.
> >
> > So I assume that adding some tracepoints might help with
> > debugging of similar issues.
> >
> > Trace the following events:
> > 1) a process is marked as an oom victim,
> > 2) a process is added to the oom reaper list,
> > 3) the oom reaper starts reaping process's mm,
> > 4) the oom reaper finishes reaping,
> > 5) the oom reaper skips reaping.
> >
> > How does it work in practice? Below is an example which shows
> > how the problem mentioned above can be found: one process is added
> > twice to the oom_reaper list:
> >
> > $ cd /sys/kernel/debug/tracing
> > $ echo "oom:mark_victim" > set_event
> > $ echo "oom:wake_reaper" >> set_event
> > $ echo "oom:skip_task_reaping" >> set_event
> > $ echo "oom:start_task_reaping" >> set_event
> > $ echo "oom:finish_task_reaping" >> set_event
> > $ cat trace_pipe
> >   allocate-502   [001] ....    91.836405: mark_victim: pid=502
> >   allocate-502   [001] .N..    91.837356: wake_reaper: pid=502
> >   allocate-502   [000] .N..    91.871149: wake_reaper: pid=502
> > oom_reaper-23    [000] ....    91.871177: start_task_reaping: pid=502
> > oom_reaper-23    [000] .N..    91.879511: finish_task_reaping: pid=502
> > oom_reaper-23    [000] ....    91.879580: skip_task_reaping: pid=502
>
> OK, this is much better! The clue here would be that we got 2
> wakeups for the same task, right?
> Do you think it would make sense to put more context to those
> tracepoints? E.g. skip_task_reaping can be due to lock contention or the
> mm gone. wake_reaper is similar.

I agree that some context might be useful under some circumstances,
but I don't think we should add any additional fields until we have
some examples of where this data is actually useful.

If we need it, we can easily add it later.

Thanks!

Roman
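
The clue discussed above (two wake_reaper events for the same pid) can
also be checked mechanically. The following is a minimal sketch and not
part of the patch: the helper name find_double_wakeups, the regular
expression, and the assumption that every event line ends in
"<event>: pid=<N>" are illustrative, based only on the trace output
quoted in this thread.

```python
import re
from collections import Counter

# Matches the "<event>: pid=<N>" tail of a trace_pipe line. The line
# format is an assumption taken from the quoted output above.
EVENT_RE = re.compile(r"(\w+): pid=(\d+)\s*$")


def find_double_wakeups(lines):
    """Return the sorted pids that have more than one wake_reaper event."""
    wakeups = Counter()
    for line in lines:
        match = EVENT_RE.search(line)
        if match and match.group(1) == "wake_reaper":
            wakeups[int(match.group(2))] += 1
    return sorted(pid for pid, count in wakeups.items() if count > 1)


# The six trace lines quoted in the patch description.
SAMPLE = """\
allocate-502 [001] .... 91.836405: mark_victim: pid=502
allocate-502 [001] .N.. 91.837356: wake_reaper: pid=502
allocate-502 [000] .N.. 91.871149: wake_reaper: pid=502
oom_reaper-23 [000] .... 91.871177: start_task_reaping: pid=502
oom_reaper-23 [000] .N.. 91.879511: finish_task_reaping: pid=502
oom_reaper-23 [000] .... 91.879580: skip_task_reaping: pid=502
"""

if __name__ == "__main__":
    print(find_double_wakeups(SAMPLE.splitlines()))
```

Lines saved from trace_pipe could be passed to the same function in
place of SAMPLE; only the event name and pid are inspected, so the
other columns do not need to match the sample exactly.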