From: Indu Bhagat
To: linux-toolchains@vger.kernel.org
Cc: daandemeyer@meta.com, andrii@kernel.org, rostedt@goodmis.org, kris.van.hees@oracle.com, elena.zannoni@oracle.com, nick.alcock@oracle.com, Indu Bhagat
Subject: [POC 0/5] SFrame based stack tracer for user space in the kernel
Date: Mon, 1 May 2023 13:04:05 -0700
Message-Id: <20230501200410.3973453-1-indu.bhagat@oracle.com>

Hello,

This patch set is a Proof of Concept implementation of an SFrame-based stack
tracer for user space in the kernel. Some of you had expressed interest in
exploring this earlier; hopefully, this POC helps discuss the design and take
it forward.

Motivation
==========

Generating stack traces is vital for all profiling, tracing and debugging
tools. In the context of generating stack traces for user space, frame-pointer
based unwinding works, but has its issues ([1],[2]). EH_Frame based unwinding
seems undesirable for the kernel's unwinding needs ([3],[4]).

In general, EH_Frame based unwinding is undesirable in applications that need
fast, real-time stack tracers (e.g., profilers), because of the overhead of
interpreting and executing DWARF opcodes to calculate the relevant stack
offsets. The SFrame (Simple Frame) stack trace format is designed to address
these concerns. With this POC, we would like to see how SFrame can be used as
a viable alternative for user space stack tracing needs in the kernel.

[1] https://lwn.net/Articles/919940/
[2] https://pagure.io/fesco/issue/2817
[3] https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/OOJDAKTJB5WGMOZRXTUX7FTPFBF3H7WE/#NXRMNKD4B23HX7U5ICMKFRZO6Z3VXQXL
[4] https://lkml.org/lkml/2012/2/10/356

What is the SFrame format
=========================

SFrame is the "Simple Frame" stack trace format. The format is documented as
part of the binutils documentation at https://sourceware.org/binutils/docs.

Starting with binutils 2.40, the GNU assembler (as) can generate SFrame stack
trace data based on the CFI directives found in the source assembly. This is
achieved by using the --gsframe command line option when invoking the
assembler. This option plays the same role as the existing --gdwarf-[2345]
options, only this time referring to SFrame.
The resulting stack tracing information is stored in a new segment of its own
with type PT_GNU_SFRAME, containing a section named '.sframe'.

Also starting with binutils 2.40, the GNU linker (ld) knows how to merge
sections containing SFrame stack trace info.

SFrame based user space stack tracer POC
========================================

These patches implement a POC for an SFrame based user space stack tracer (for
x86) in the kernel. The purpose of this code is to serve as a reference,
initiate discussions, and perhaps serve as a starting point for a viable
implementation of an SFrame based stack tracer. Please keep in mind that my
familiarity with kernel code/processes/conventions is still limited ;-).

High-level Design in this POC
=============================

Kconfig adds two config options for userspace unwinding:
  - config USER_UNWINDER_SFRAME to enable the SFrame userspace unwinder
  - config USER_UNWINDER_FRAME_POINTER to enable the Frame Pointer userspace
    unwinder

If CONFIG_USER_UNWINDER_SFRAME is set, the task_struct keeps a reference to the
sframe_state object for the task. For long-running user programs, it makes
sense to cache the sframe_state in the task and simply do a quick
do_sframe_unwind() at every unwind request. Caching the sframe_state also means
keeping the .sframe pages (for the program and its DSOs) pinned.

The task's sframe_state is kmalloc'ed and initialized in load_elf_binary(),
when the task is about to begin execution.

The (open) issue with this design, however, is that we need to detect when
additional DSOs are brought in at run time by the application. The detection
(and resolution) of a stale sframe_state is not implemented in this POC. As
such, the POC at this time is fit only for applications that are statically
linked.

The following pseudocode roughly describes the relevant stubs around how the
SFrame-based unwinder is currently hooked:

load_elf_binary()
{
  ...
  // check if any phdr with p_type PT_GNU_SFRAME is seen
  if phdr.p_type == PT_GNU_SFRAME is seen
    sframe_avail = true
  ...
  if sframe_avail
    sframe_state_setup() // does all kmallocs and get_user_pages_XX
  ...
  finalize_exec (bprm)
}

perf_callchain_user()
{
  ...
  // check if task.sframe_state is valid
  sframe_avail = check_sframe_state_p (current);

  pagefault_disable()
  // check if task.sframe_state is ready and not stale
  if sframe_avail && task.sframe_state is ready
    ret = sframe_callchain_user() // uses __get_user to access stack
    if ret is success
      pagefault_enable()
      return
  ...
  Frame pointer based unwinding
  pagefault_enable()
  ...
}

task_struct.sframe_state is cleaned up in release_task().

What do you think about the above workflow? What about caching the
sframe_state in task_struct? As you see, there are some open issues around
this, and discussion is needed to help resolve some of those.

Apart from the above design points, other reasons why this remains a POC and
is not ready for submission are:
  - The code deals with only Elf64_Phdr (no Elf32_Phdr) at this time; some
    specific cases, like when the ELF header's e_phnum is equal to PN_XNUM,
    are not handled yet (iterate_phdr.c).
  - Missing detection of when there is a change in the memory mappings of a
    task. For example, dlopen/dlclose are two ways in which a user program's
    mappings may change over time.
  - Stub code around user space memory access by the kernel. For the sake of
    clarity, let me outline here the three locations where user space memory
    is accessed in the context of SFrame based unwinding:
      1. Access the ELF header in iterate_phdr(), followed by accessing the
         ELF PHDRs in add_sframe_unwind_info(). This is currently done using
         get_user_pages_remote() in iterate_phdr().
      2. Access the .sframe section for decoding in
         sframe_unw_info_init_dctx(). This is currently done by using
         get_user_pages_unlocked().
      3. Access the program's execution stack in sframe_unwind_next_frame()
         to read, say, the caller's IP on x86_64.
         This is currently done by using __get_user().
  - Other stubs marked with FIXME TODO.
  - The patches may not be bisectable. I haven't particularly tried to
    compile them individually either.
  - More testing is needed, including checking out some regression tests.

Each commit log has further details.

Testing Notes
=============

I have tested these patches minimally using:
  1. perf on kernel master
  2. BPF uprobe on kernel master
  3. dtrace with dtrace-linux-kernel v2/6.1.8
     (https://github.com/oracle/dtrace-linux-kernel/tree/v2/6.1.8). The diff
     between the v2/6.1.8 branch and Linux 6.1.8 consists of a few patches
     for CTF/DTrace.

dtrace is a tracing tool that can be used to diagnose problems and probe a
running Linux system. For the following experiment, I used unchanged dtrace
packages. The dtrace command line used is:

  dtrace -c prog -n 'pid$target::func:entry { ustack (); exit(0); }'

This triggers a ustack() action when the function 'func' in program 'prog' is
entered. It prints the user stack and then exits. The dtrace ustack() action
internally invokes perf_callchain_user(), which is updated in this POC patch
set to perform SFrame based stack tracing for user space.

DTrace uses BPF under the hood, but testing both DTrace and BPF individually
has been valuable overall.

All binaries below were compiled with -Wa,--gsframe. A few tests to showcase
the POC are given below.

TEST 1: Toy hello world program with the call chain as follows:
  main() -> foo() -> bar() -> baz()

$ cat deep_hello_sframe.c
#include <stdio.h>
#include <stdlib.h>

int baz (int a)
{
  return a * rand () + 100;
}

int bar (int a)
{
  int c = baz (a);
  return c * a * rand ();
}

int foo (int a)
{
  int b = bar (a);
  return b * a * rand ();
}

int main (void)
{
  int a = 100;
  int b = foo (a);
  printf ("Hello world %d \n", b);
  return 0;
}

$ dtrace -c ./deep_hello_sframe -n \
    'pid$target::baz:entry { ustack (); exit(0); }'
DTrace 2.0.0 [Pre-Release with limited functionality]
dtrace: description 'pid$target::baz:entry ' matched 1 probe
...
CPU     ID                    FUNCTION:NAME
  1 114215                       baz:entry
              deep_hello_sframe`baz
              deep_hello_sframe`bar+0x16
              deep_hello_sframe`foo+0x16
              deep_hello_sframe`main+0x19

$ perf probe -x ./deep_hello_sframe --add baz
$ perf record -g -e probe_deep_hello_sframe:baz ./deep_hello_sframe
$ perf script
deep_hello_sfra 25887 [000] 125196.580149: probe_deep_hello_sframe:baz: (401136)
            1136 baz+0x0 (//deep_hello_sframe)
            1165 bar+0x16 (//deep_hello_sframe)
            1195 foo+0x16 (//deep_hello_sframe)
            11c8 main+0x19 (//deep_hello_sframe)

$ perf report --call-graph --stdio
   100.00%   100.00%  (401146)
            |
            ---main
               foo
               bar
               baz

TEST 2: Get the stack trace of the program target.c using the BPF
bpf_get_stack helper in a BPF program, bpf-uprobe.c. I am skipping the BPF
program for brevity.

$ cat target.c
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h> /* open */

int fd;

int foo9(int x) { write(fd, &x, sizeof(x)); return x ^ 1; }
int foo8(int x) { return foo9(x) ^ 1; }
int foo7(int x) { return foo8(x) ^ 1; }
int foo6(int x) { return foo7(x) ^ 1; }
int foo5(int x) { return foo6(x) ^ 1; }
int foo4(int x) { return foo5(x) ^ 1; }
int foo3(int x) { return foo4(x) ^ 1; }
int foo2(int x) { return foo3(x) ^ 1; }
int foo1(int x) { return foo2(x) ^ 1; }
int foo0(int x) { return foo1(x) ^ 1; }

int main(int c, char **v)
{
  int x = 0;

  fd = open("/dev/null", O_WRONLY);
  if (fd == -1) {
    printf("open failed\n");
    return 1;
  }
  while ((x = foo0(x)) < 10)
    ;
  close(fd);
  return 0;
}

$ gcc -Wa,--gsframe -o target.sframe target.c
$ #offset=getoffset_of_foo9_in_target - baseloadaddress_in_target
$ echo "p:ibhagat/myuprobe $path_to_target:$offset" >> /sys/kernel/debug/tracing/uprobe_events
$ ./target.sframe &
$ #target_pid=`pgrep target.sframe`
$ #event_id=`sudo cat /sys/kernel/debug/tracing/events/username/myuprobe/id`
$ gcc -DTARGET_PID=$target_pid -DEVENT_ID=$event_id -o bpf-ustack bpf-ustack.c
$ sudo ./bpf-ustack # dumps IPs of callchain
401156
401197
4011b1
4011cb
4011e5
4011ff
401219
401233
40124d
401267
4012c3

$ grep -A 2 'call' target.sframe.s | grep -A 1 'foo'
  401192: e8 bf ff ff ff    callq  401156
  401197: 83 f0 01          xor    $0x1,%eax
--
  4011ac: e8 d1 ff ff ff    callq  401182
  4011b1: 83 f0 01          xor    $0x1,%eax
--
  4011c6: e8 d1 ff ff ff    callq  40119c
  4011cb: 83 f0 01          xor    $0x1,%eax
--
  4011e0: e8 d1 ff ff ff    callq  4011b6
  4011e5: 83 f0 01          xor    $0x1,%eax
--
  4011fa: e8 d1 ff ff ff    callq  4011d0
  4011ff: 83 f0 01          xor    $0x1,%eax
--
  401214: e8 d1 ff ff ff    callq  4011ea
  401219: 83 f0 01          xor    $0x1,%eax
--
  40122e: e8 d1 ff ff ff    callq  401204
  401233: 83 f0 01          xor    $0x1,%eax
--
  401248: e8 d1 ff ff ff    callq  40121e
  40124d: 83 f0 01          xor    $0x1,%eax
--
  401262: e8 d1 ff ff ff    callq  401238
  401267: 83 f0 01          xor    $0x1,%eax
--
  4012be: e8 8f ff ff ff    callq  401252
  4012c3: 89 45 fc          mov    %eax,-0x4(%rbp)

$ perf probe -x ./target.sframe --add foo9
$ perf record -g -e probe_target:foo9 ./target.sframe
^C
$ perf script
...
target.sframe 20395 [000] 69987.711764: probe_target:foo9: (401156)
            1156 foo9+0x0 (/target.sframe)
            1197 foo8+0x15 (/target.sframe)
            11b1 foo7+0x15 (/target.sframe)
            11cb foo6+0x15 (/target.sframe)
            11e5 foo5+0x15 (/target.sframe)
            11ff foo4+0x15 (/target.sframe)
            1219 foo3+0x15 (/target.sframe)
            1233 foo2+0x15 (/target.sframe)
            124d foo1+0x15 (/target.sframe)
            1267 foo0+0x15 (/target.sframe)
            12c3 main+0x57 (/target.sframe)
...

Please take a look. Any feedback is appreciated.
Thanks,

Indu Bhagat (5):
  Kconfig: x86: Add new config options for userspace unwinder
  task_struct : add additional member for sframe state
  sframe: add new SFrame library
  sframe: add an SFrame format stack tracer
  x86_64: invoke SFrame based stack tracer for user space

 arch/arm64/include/asm/sframe_regs.h |  37 ++
 arch/x86/Kconfig.debug               |  31 ++
 arch/x86/events/core.c               |  51 +++
 arch/x86/include/asm/sframe_regs.h   |  34 ++
 fs/binfmt_elf.c                      |  39 +++
 include/linux/sched.h                |   5 +
 include/sframe/sframe_regs.h         |  11 +
 include/sframe/sframe_unwind.h       |  62 ++++
 kernel/exit.c                        |   9 +
 lib/Makefile                         |   1 +
 lib/sframe/Makefile                  |  11 +
 lib/sframe/iterate_phdr.c            | 113 ++++++
 lib/sframe/iterate_phdr.h            |  34 ++
 lib/sframe/sframe.h                  | 263 ++++++++++++++
 lib/sframe/sframe_read.c             | 498 +++++++++++++++++++++++++++
 lib/sframe/sframe_read.h             |  75 ++++
 lib/sframe/sframe_state.c            | 424 +++++++++++++++++++++++
 lib/sframe/sframe_state.h            |  80 +++++
 lib/sframe/sframe_unwind.c           | 208 +++++++++++
 19 files changed, 1986 insertions(+)
 create mode 100644 arch/arm64/include/asm/sframe_regs.h
 create mode 100644 arch/x86/include/asm/sframe_regs.h
 create mode 100644 include/sframe/sframe_regs.h
 create mode 100644 include/sframe/sframe_unwind.h
 create mode 100644 lib/sframe/Makefile
 create mode 100644 lib/sframe/iterate_phdr.c
 create mode 100644 lib/sframe/iterate_phdr.h
 create mode 100644 lib/sframe/sframe.h
 create mode 100644 lib/sframe/sframe_read.c
 create mode 100644 lib/sframe/sframe_read.h
 create mode 100644 lib/sframe/sframe_state.c
 create mode 100644 lib/sframe/sframe_state.h
 create mode 100644 lib/sframe/sframe_unwind.c

-- 
2.39.2