From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EC7C8C433F5 for ; Thu, 12 May 2022 16:11:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1356299AbiELQLh (ORCPT ); Thu, 12 May 2022 12:11:37 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46352 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1356301AbiELQLH (ORCPT ); Thu, 12 May 2022 12:11:07 -0400 Received: from mx0b-00069f02.pphosted.com (mx0b-00069f02.pphosted.com [205.220.177.32]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 870FC3B2A3 for ; Thu, 12 May 2022 09:11:05 -0700 (PDT) Received: from pps.filterd (m0246630.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 24CEFahX010429; Thu, 12 May 2022 16:10:43 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=message-id : date : subject : to : cc : references : from : in-reply-to : content-type : content-transfer-encoding : mime-version; s=corp-2021-07-09; bh=u1USWRC9SJ/0LB9vjVSg+s+gpe+Y7QM/iYIilWPXFzA=; b=UWl9/XxMZFdp6aYPCPDvOf8wjh9Cl2ouulVIWEp9GVzGbMpH5HsWJ035DBjk8NC6aa9x 1zunooWz7d1m4GYzp1Q/L0CllHQGtWOy8ADz+na/avNubGa0YHW9S3AGHxsiSZW8iOUK PZHMc3n7spTYTog6vbiU/XdSOK1IK3KuXU3LHJYOR3iC3zEUUJ81jSWEFLNoqgVSBwqk WUCLmLfEP64BPaQIilVuzsnfEo0DUfSgjcGsrFAj8SnyI7zlMiZGKfAeHVU0I6vrkovt plocY/Lcgq2Bu+wGbJqrw/d5Z64i4ouASGbRxm3dWJGiKkaMY5BGr9FjXX55mxOERZfE fQ== Received: from phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta01.appoci.oracle.com [138.1.114.2]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 3fwf6ccw2p-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 12 May 2022 16:10:41 +0000 Received: from pps.filterd (phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com (8.16.1.2/8.16.1.2) with SMTP id 24CG5k6F008935; Thu, 12 May 2022 16:10:39 GMT Received: from nam10-bn7-obe.outbound.protection.outlook.com (mail-bn7nam10lp2105.outbound.protection.outlook.com [104.47.70.105]) by phxpaimrmta01.imrmtpd1.prodappphxaev1.oraclevcn.com with ESMTP id 3fwf750vgq-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 12 May 2022 16:10:38 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=CiTedDZ+Sj9zk4Zbk8bA69/e3F9dTIkl3+LmGxt+i5ufEOX5RD/Wh03ltFFgoyu2scPSdsa++WquxvKls0kagub0XOS8tsoSRcfjOaufQIc2yBkoMs6pTct17pX0N0EJoGsgb0Uwk0sUYyPvPl0KxPfExxQJpx25peZeYt5xhq2yBPxw6S0U/1PvWB5eoRsBeC8Y9oFNKw2mrIYkRkS7ofgLOsyDT2lPzevnLQ8PD3nMXZVXtEdTVF3B+XCko3afUJN726lEWB6AiWayJjm6DsRTKH5elVKxnAMp190/5EiHkWi3//7FrwSopA9nkRyg0XL8quvK1HxbxmkAjoa1kQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=u1USWRC9SJ/0LB9vjVSg+s+gpe+Y7QM/iYIilWPXFzA=; b=c87StSGnv+Cgzb86eBHYLIManYghR/k73qlhsJgvvx100Kj0+J9n/rVhBHhIHis7FblgTRg5gjjJQyM9+Z67pt2Gzq6+hCjrT7tqmo1CBhaBS480IjyjL3HlkMX5XIEU4EK9LIbeGEZia86aiRXMbO5Ay48pqOOr3OebhVLkS091+NBpNt/8m80BH/gABegcHASrnPKEGygxU+TWzaSePJ1oWPe3LV//td3ukbIOlJJzG1j5omRCzjJTrcMYCTNaLS9/7egL8ndUW9R2eWGqX61E8RUui1TPbVsSLz/ab+SsMyGtgfFDaVcGQ4zPKbtNsUWRYIR56kJdHzaAm1zuyA== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=oracle.com; dmarc=pass action=none header.from=oracle.com; dkim=pass header.d=oracle.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.onmicrosoft.com; s=selector2-oracle-onmicrosoft-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=u1USWRC9SJ/0LB9vjVSg+s+gpe+Y7QM/iYIilWPXFzA=; b=a9W8816PbMbTZKuEbH/Oxv008sKzrnKpKGGRQ/6cj5bh50B4nJHdG3ZdiX11mBRDyrQJ5Q/0rtCdD3wwedMoHV5cc6CqMPaiNtxF6bKuvVFOKCQBhOJ4nTTzFA/TVSYvW7LveOHY/vv2F9qzSPlLOyPDjZvArn0+BFLBaNR9TXY= Received: from CO1PR10MB4531.namprd10.prod.outlook.com (2603:10b6:303:6c::22) by BY5PR10MB3906.namprd10.prod.outlook.com (2603:10b6:a03:1f7::30) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5227.22; Thu, 12 May 2022 16:10:33 +0000 Received: from CO1PR10MB4531.namprd10.prod.outlook.com ([fe80::bd49:f0ab:b476:860c]) by CO1PR10MB4531.namprd10.prod.outlook.com ([fe80::bd49:f0ab:b476:860c%8]) with mapi id 15.20.5250.014; Thu, 12 May 2022 16:10:33 +0000 Message-ID: Date: Thu, 12 May 2022 11:10:28 -0500 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.3.1 Subject: Re: [PATCH v8 3/7] crash: add generic infrastructure for crash hotplug support Content-Language: en-US To: David Hildenbrand , linux-kernel@vger.kernel.org, x86@kernel.org, kexec@lists.infradead.org, ebiederm@xmission.com, dyoung@redhat.com, bhe@redhat.com, vgoyal@redhat.com Cc: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com, nramas@linux.microsoft.com, thomas.lendacky@amd.com, robh@kernel.org, efault@gmx.de, rppt@kernel.org, konrad.wilk@oracle.com, boris.ostrovsky@oracle.com References: <20220505184603.1548-1-eric.devolder@oracle.com> <20220505184603.1548-4-eric.devolder@oracle.com> <62089f7b-4a3e-7dc8-1cda-84583e19d6fd@redhat.com> From: Eric DeVolder In-Reply-To: <62089f7b-4a3e-7dc8-1cda-84583e19d6fd@redhat.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-ClientProxiedBy: BYAPR07CA0041.namprd07.prod.outlook.com (2603:10b6:a03:60::18) To CO1PR10MB4531.namprd10.prod.outlook.com (2603:10b6:303:6c::22) MIME-Version: 1.0 X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: 697d9113-9be7-4829-853d-08da3431f131 X-MS-TrafficTypeDiagnostic: BY5PR10MB3906:EE_ X-Microsoft-Antispam-PRVS: X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: JYAAaTai2xHRwNXzU70V//97+LEJwwg0AgxRp8rkDt3ZuknWPnl9W3hLhssiwj9zVSwLNSugbfVGCyi40Ut7k2AcYF70FXzZPpuhRUDULpqLiil6l9xEWGiXUfUYIBqVHeFUHECjk8W7go9PgMGzEhbcypZnDvRmTwGV8UDBb81B/SZWoZ7ZGmzxN0IpVBJdk1se2VCkjTM7otAyxwTGoOSPNLu1YP81PjHhPfj3FPAae/KTInH98XOBIa7DIJotsWf+thugzGWC5dlXbs05c4MCenk1BnBHQ4cinDXMKbwfMjs+zFnddaV37SNvn7uSTpZKsneaeRnoKBWuZuidhB1tkmEn4Ac45Tv2XvKB8OnlOC7DTtbrfC2geD1BO6Q3MfdFxrkbvYt1jYY5EvHaK4kZ3lg74Tck1Hm9/l+oad6P+vV7wVNzgJ1G9xwqXWttpSpF86Gvjs7S5AnELgNjDhnTna6kAB2JkbSGEWp6hnifD+05VKJzZGHwyaUAwtZl5LKnPVhZ1A8Os7Uykn6++uuoeYWGm2jNC2m3eiM+bPjXPcr5vCp6OIEFsY4IfODtNI3QAqPS+p9oArBskVuZGBEg9giGNk1GZjkXaD9f8vAkUB4QzTscijF2a3UFxlQ1vkqj9cP2Tr22BpT7oMESFT5BIkZhZvWZNdeod6rtArfExYshqZ0fhPlXKTsa6YrN57/4Y+ZYkRSi1Tu3w4nGVur7bMmcB+UOWQK8ymNpb30/krR61VuIxQSS+xY6+bGNPauyp5c6DhCIydA//FvA3w== X-Forefront-Antispam-Report: CIP:255.255.255.255;CTRY:;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:CO1PR10MB4531.namprd10.prod.outlook.com;PTR:;CAT:NONE;SFS:(13230001)(366004)(6512007)(8676002)(2616005)(5660300002)(31696002)(4326008)(86362001)(53546011)(2906002)(6506007)(6666004)(6486002)(83380400001)(7416002)(508600001)(8936002)(316002)(38100700002)(31686004)(36756003)(186003)(66946007)(66556008)(66476007)(107886003)(41533002)(45980500001)(43740500002);DIR:OUT;SFP:1101; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 2 X-MS-Exchange-AntiSpam-MessageData-0: =?utf-8?B?UVBnSG9VUjl3L1BmNk1qN2thaEJpbEFBOXBGSWEvMTN1dW1HQ1FpdW5yVXpR?= =?utf-8?B?UGhvV09yWkJ6blhJdUhycnJKTzREVmcrdkFLWm5ySThwOHQ2L055R3ZLeEVy?= =?utf-8?B?VkdhVk9UTkoyeSs5akY2N0VrTm16RXRtOHBOZ3R2OVdkL3NIUmR3NE5pQXZP?= =?utf-8?B?SWl5ZTJPUDVwR1lKbURYUDVKNk9tMHYwb0ZDa0RKR3phcGwvcUR0ZHFIaWNR?= =?utf-8?B?bS9aR1k1UHc2V1p0VnhScEpLMEJYMFR6L0toUGZkdXdZaDQ5OHFTMVJZaUZm?= =?utf-8?B?azRXbDdIWXBrbTRma2k4U3Q1UDdpVU1kT2x6NkNzajV5VWJKSTlLSDJ1T0VS?= =?utf-8?B?NHFsbGtZQ3NMalgwMjYrd3BjNFVnV0NuUmtMcFZWemNrYTBCWjBJVDc5UzBn?= =?utf-8?B?SGtwWG51SlVDUVpyVnRaMkRqTTdVZFJYOVVsK3BTREdodzY5Njh5ckF2ZlZs?= =?utf-8?B?ZG1qZGYvU3NTOEdLbjB5SHJ1TkNjanZUWHJFdTFaYk1LVVR3YkltVkhlaFk0?= =?utf-8?B?c3EwLzgwUVNpUk9FbGJzY0R2UnNJUTRseDBsR2UxSWRaTjZPQW9GUXBCT0E0?= =?utf-8?B?MkJzY1FFRFpVYVo5RnpmTVJHWDY5MzR2MUhFVkxqQlZwektXTS90MWFwQnU5?= =?utf-8?B?cGkvaVlpYUoxUThKVmlERUs1bjBxTE5XakNPLzlsTjhma0lmV1ZVZ2pXdCtj?= =?utf-8?B?QTRvUFE3bkloMGl0OGRDbnFQcmg1aHZiSnZzcTI5OXVnektnK1ZRdHZ2c0VX?= =?utf-8?B?V1VPblM1VlhBNHBWT29wdExlcnBRUlRLTDRsWFdnbTZiblBINTZyR0VITXJp?= =?utf-8?B?ZndCVlo5eTZVYTVsZU5CK0JPRS9IRzRrM2IxYjFUU1RGdGlYWG5pc1V3eE5B?= =?utf-8?B?Y3lQR3dsRHZ0SjI1Ryt4cjRxcDZ1VFdGdnEvZEk5MFAxd3g2UmhuME44Smda?= =?utf-8?B?ZWFGWFRoMURaY3BzSWhFWVVkdXJaZTVVMVNyZ2EwbFE0YytJMGZqNkxyRDN1?= =?utf-8?B?ZlVGUkNUZk9NdkZRQmRNcmdtOHBSVHpXUzVIRzF0RXZsUC9aWWhNdTc4YXhm?= =?utf-8?B?TmpnWVNmTHhXT1FEMTZ3LytqOC9pekZYTmN1YlZWeVJ3d1BxNWJxTWlpdE4r?= =?utf-8?B?eTM2aEZ3WlhHU3VVUXNHSkQ4VXFyUGsza0RraDRkdVZkSzVac09ob0xPOTBH?= =?utf-8?B?cUg4UVM1RDFIcExMSGZUbEczWVY5TUVPQzYzTklIcUxNUkpsTjVnbTUwZmpD?= =?utf-8?B?VWF5MzNDajhNTkI3ZHZzVHNxTXNxZEh3b2loZTRaQ0F0NHp1MlltQkJocEVn?= =?utf-8?B?UnQ1L29vQlJHdVVFS0JjTHo4cXUwVHBDWnVTVWlXa3VIVnZuZVNLM1ZmelRM?= =?utf-8?B?bDZVMmlCcldIMC9DODFva0JrN1VTK2IvZHF0SUlJYWhVbjVJMDJjQ2ROUWcy?= =?utf-8?B?UWd0dFUvMWdNY1gxUkRIZ0YyMnBadXduYXdzOG5NM2o3L3BKbjNsMFZPcnR5?= =?utf-8?B?c1ZpSjV6Y2RjVXQ5NzVTL3VFNmU4RndZcFlFRHZaZUZzbENBeFhuMTRZN0ZT?= =?utf-8?B?SG5YcmFucUxTOC9GVGdUazBQSDJrcVV3YjNjN0g4cTljcDNBVHlnd2MyZnRZ?= =?utf-8?B?NXloZEtNYzllMi81eDJrOGFVZTVUY3JKVTdaMVdQbFZ5OWN5NWV4U1RCY1JB?= =?utf-8?B?VFo0ejVyUklmSHFtOERKc2hpYTNaMU5nY2NWc3Q5R040NkxQK2hRdHlYclNm?= =?utf-8?B?cGJuUGEwV05rMVRpQUtXVjZrZzBoaG5DU3lUUXFpMWR6bHdzNzUxUlZHVkZs?= =?utf-8?B?WTZ2VDZ2dFpwcFdtR3JBa21NTTVIMDVidzU5ZDZJUVhxcyt3OWwzdElMYS9y?= =?utf-8?B?SVdwM3gvZEx2NzJZWnAwcmtYbmE2ZjdUK24xZDUvclJlNmhPQ1hsdTFVWjhy?= =?utf-8?B?aFRCNE9KK2tOYlNZbmlucTF2QTJsbENmSCtkQUtobjgrc2xLa2h5WVhxQTVp?= =?utf-8?B?OHQxaTYrWDVIakJwU1JMMnJwbVFWS2poQ202YzdlSUZzM0dRMlQrb2o2dHhI?= =?utf-8?B?ejY2VFl3MGMwMG03ckJGTWo0ZzB6LzNqS2hDWXNaWHNTYnB3a2xWNHdLQ3pU?= =?utf-8?B?cCtJQnpMUXRZMDE0cHk3bEp2ODV6b3k1aGJyVWI4NGdxRVBDWmNHVkJrREJR?= =?utf-8?B?d1o4UDU4bHd5SkJMY3B6MlZKdklMczYvejJIeXd6dytqaWxOMG9iVmhUb0s1?= =?utf-8?B?d3Yxc3EwYTUrN204YkZaUkFxNHhnSHlMUFYrMk5qdGQ5R1ZkUTZMeWhIN3A5?= =?utf-8?B?SXhIVzR4eHZvWHIyeXZBUUZ0eGFVajVKZmV4eUZ5UTN0ejQ4VkdRMVdMUkZT?= =?utf-8?Q?CZD3LYq2zfNojlZfnd9H9eJsLXj+sjKjvsKpI8PHyqfy6?= X-MS-Exchange-AntiSpam-MessageData-1: MQ5uhFGWVTxvQQ== X-OriginatorOrg: oracle.com X-MS-Exchange-CrossTenant-Network-Message-Id: 697d9113-9be7-4829-853d-08da3431f131 X-MS-Exchange-CrossTenant-AuthSource: CO1PR10MB4531.namprd10.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-OriginalArrivalTime: 12 May 2022 16:10:33.2348 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-CrossTenant-Id: 4e2c6054-71cb-48f1-bd6c-3a9705aca71b X-MS-Exchange-CrossTenant-MailboxType: HOSTED X-MS-Exchange-CrossTenant-UserPrincipalName: wMKHh7b24k8YExD5y4QKpgztKUWfBJXEHSDl5IWgX2puzdo3BAOYGJrodOlg+UjM6I7WO6aXRmQvw1SSnpDUiuMUCKH1eDRCJANxRNwXejU= X-MS-Exchange-Transport-CrossTenantHeadersStamped: BY5PR10MB3906 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.486,18.0.858 definitions=2022-05-12_09:2022-05-12,2022-05-12 signatures=0 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=0 phishscore=0 adultscore=0 bulkscore=0 malwarescore=0 mlxscore=0 mlxlogscore=999 spamscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2202240000 definitions=main-2205120076 X-Proofpoint-ORIG-GUID: kB_W0E4vd5gb8SdluDO4tC2GOnz5QnLc X-Proofpoint-GUID: kB_W0E4vd5gb8SdluDO4tC2GOnz5QnLc Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org David, Great questions! See inline responses below. eric On 5/12/22 03:52, David Hildenbrand wrote: > On 05.05.22 20:45, Eric DeVolder wrote: >> CPU and memory change notifications are received in order to >> regenerate the elfcorehdr. >> >> To support cpu hotplug, a callback is registered to capture the >> CPUHP_AP_ONLINE_DYN online and offline events via >> cpuhp_setup_state_nocalls(). >> >> To support memory hotplug, a notifier is registered to capture the >> MEM_ONLINE and MEM_OFFLINE events via register_memory_notifier(). >> >> The cpu callback and memory notifiers call handle_hotplug_event() >> to handle the hot plug/unplug event. Then handle_hotplug_event() >> dispatches the event to the architecture specific >> arch_crash_handle_hotplug_event(). During the process, the >> kexec_mutex is held. >> >> Signed-off-by: Eric DeVolder > > [...] > >> + >> +#if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_MEMORY_HOTPLUG) >> +void __weak arch_crash_handle_hotplug_event(struct kimage *image, >> + unsigned int hp_action, unsigned int cpu) >> +{ >> + WARN(1, "crash hotplug handler not implemented"); > > > Won't that trigger on any arch that has CONFIG_HOTPLUG_CPU and CONFIG_MEMORY_HOTPLUG? > I mean, you only implement it for x86 later in this series. Or what else stops this WARN from > triggering? > You're correct. What about: printk_once(KERN_DEBUG "...") ? >> +} >> + >> +static void handle_hotplug_event(unsigned int hp_action, unsigned int cpu) >> +{ >> + /* Obtain lock while changing crash information */ >> + if (!mutex_trylock(&kexec_mutex)) >> + return; > > This looks wrong. What if you offline memory but for some reason the mutex > is currently locked? You'd miss updating the vmcore, which would be broken. > > Why is this trylock in place? Some workaround for potential locking issues, > or what is the rationale? I took this from kernel/kexec.c:do_my_load(), but you are right, this is not right. * * KISS: always take the mutex. */ if (!mutex_trylock(&kexec_mutex)) return -EBUSY; This should simply be mutex_lock(&kexec_mutex). > >> + >> + /* Check kdump is loaded */ >> + if (kexec_crash_image) { >> + pr_debug("crash hp: hp_action %u, cpu %u", hp_action, cpu); >> + >> + /* Needed in order for the segments to be updated */ >> + arch_kexec_unprotect_crashkres(); >> + >> + /* Flag to differentiate between normal load and hotplug */ >> + kexec_crash_image->hotplug_event = true; > > 1. Why is that required? Why can't arch_crash_handle_hotplug_event() forward that > information? I mean, *hotplug* in the anme implies that the function should be > aware what's happening. Member .hotplug_event is needed in crash_prepare_elf64_headers(). To date, it has made sense (to me) to pass this parameter to crash_prepare_elf64_headers() via the struct kimage, rather than directly. The patch "crash: prototype change for crash_prepare_elf64_headers" changes crash_prepare_elf64_headers() to accept the struct kimage. If it is so desired, it could just as well be changed to pass just this one parameter; though struct kimage seems a more useful way to go. > > 2. Why can't the unprotect+reprotect not be done inside > arch_crash_handle_hotplug_event() ? It's all arch specific either way. > > IMHO, this code here should be as simple as > > if (kexec_crash_image) > arch_crash_handle_hotplug_event(kexec_crash_image, hp_action, cpu); > The intent of this code was to be generic infrastructure. Just invoking the arch_crash_handle_hotplug_event() would certainly be as generic as it gets. But there were a series of steps that seemed to be common, so those I hoisted into this bit of code. > 3. Why do we have to forward the CPU for CPU onlining/offlining but not the > memory block id (or similar) when onlining/offlining a memory block? From patch "kexec: exclude hot remove cpu from elfcorehdr notes" commit message: Due to use of CPUHP_AP_ONLINE_DYN, upon CPU unplug, the CPU is still in the for_each_present_cpu() list when within the handle_hotplug_event(). Thus the CPU must be explicitly excluded when building the new list of CPUs. This change identifies in handle_hotplug_event() the CPU to be excluded, and the check for excluding the CPU in crash_prepare_elf64_headers(). If there is a better CPUHP_ to use than _DYN, I'd be all for that! > >> + >> + /* Now invoke arch-specific update handler */ >> + arch_crash_handle_hotplug_event(kexec_crash_image, hp_action, cpu); >> + >> + /* No longer handling a hotplug event */ >> + kexec_crash_image->hotplug_event = false; >> + >> + /* Change back to read-only */ >> + arch_kexec_protect_crashkres(); >> + } >> + >> + /* Release lock now that update complete */ >> + mutex_unlock(&kexec_mutex); >> +} >> + >> +static int crash_memhp_notifier(struct notifier_block *nb, unsigned long val, void *v) >> +{ >> + switch (val) { >> + case MEM_ONLINE: >> + handle_hotplug_event(KEXEC_CRASH_HP_ADD_MEMORY, 0); >> + break; >> + >> + case MEM_OFFLINE: >> + handle_hotplug_event(KEXEC_CRASH_HP_REMOVE_MEMORY, 0); >> + break; >> + } >> + return NOTIFY_OK; >> +} >> + >> +static struct notifier_block crash_memhp_nb = { >> + .notifier_call = crash_memhp_notifier, >> + .priority = 0 >> +}; >> + >> +static int crash_cpuhp_online(unsigned int cpu) >> +{ >> + handle_hotplug_event(KEXEC_CRASH_HP_ADD_CPU, cpu); >> + return 0; >> +} >> + >> +static int crash_cpuhp_offline(unsigned int cpu) >> +{ >> + handle_hotplug_event(KEXEC_CRASH_HP_REMOVE_CPU, cpu); >> + return 0; >> +} >> + >> +static int __init crash_hotplug_init(void) >> +{ >> + int result = 0; >> + >> + if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG)) >> + register_memory_notifier(&crash_memhp_nb); >> + >> + if (IS_ENABLED(CONFIG_HOTPLUG_CPU)) >> + result = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, >> + "crash/cpuhp", >> + crash_cpuhp_online, >> + crash_cpuhp_offline); > > Ehm, this indentation looks very weird. > >> + >> + return result; >> +} >> + >> +subsys_initcall(crash_hotplug_init); >> +#endif > > From mboxrd@z Thu Jan 1 00:00:00 1970 From: Eric DeVolder Date: Thu, 12 May 2022 11:10:28 -0500 Subject: [PATCH v8 3/7] crash: add generic infrastructure for crash hotplug support In-Reply-To: <62089f7b-4a3e-7dc8-1cda-84583e19d6fd@redhat.com> References: <20220505184603.1548-1-eric.devolder@oracle.com> <20220505184603.1548-4-eric.devolder@oracle.com> <62089f7b-4a3e-7dc8-1cda-84583e19d6fd@redhat.com> Message-ID: List-Id: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: kexec@lists.infradead.org David, Great questions! See inline responses below. eric On 5/12/22 03:52, David Hildenbrand wrote: > On 05.05.22 20:45, Eric DeVolder wrote: >> CPU and memory change notifications are received in order to >> regenerate the elfcorehdr. >> >> To support cpu hotplug, a callback is registered to capture the >> CPUHP_AP_ONLINE_DYN online and offline events via >> cpuhp_setup_state_nocalls(). >> >> To support memory hotplug, a notifier is registered to capture the >> MEM_ONLINE and MEM_OFFLINE events via register_memory_notifier(). >> >> The cpu callback and memory notifiers call handle_hotplug_event() >> to handle the hot plug/unplug event. Then handle_hotplug_event() >> dispatches the event to the architecture specific >> arch_crash_handle_hotplug_event(). During the process, the >> kexec_mutex is held. >> >> Signed-off-by: Eric DeVolder > > [...] > >> + >> +#if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_MEMORY_HOTPLUG) >> +void __weak arch_crash_handle_hotplug_event(struct kimage *image, >> + unsigned int hp_action, unsigned int cpu) >> +{ >> + WARN(1, "crash hotplug handler not implemented"); > > > Won't that trigger on any arch that has CONFIG_HOTPLUG_CPU and CONFIG_MEMORY_HOTPLUG? > I mean, you only implement it for x86 later in this series. Or what else stops this WARN from > triggering? > You're correct. What about: printk_once(KERN_DEBUG "...") ? >> +} >> + >> +static void handle_hotplug_event(unsigned int hp_action, unsigned int cpu) >> +{ >> + /* Obtain lock while changing crash information */ >> + if (!mutex_trylock(&kexec_mutex)) >> + return; > > This looks wrong. What if you offline memory but for some reason the mutex > is currently locked? You'd miss updating the vmcore, which would be broken. > > Why is this trylock in place? Some workaround for potential locking issues, > or what is the rationale? I took this from kernel/kexec.c:do_my_load(), but you are right, this is not right. * * KISS: always take the mutex. */ if (!mutex_trylock(&kexec_mutex)) return -EBUSY; This should simply be mutex_lock(&kexec_mutex). > >> + >> + /* Check kdump is loaded */ >> + if (kexec_crash_image) { >> + pr_debug("crash hp: hp_action %u, cpu %u", hp_action, cpu); >> + >> + /* Needed in order for the segments to be updated */ >> + arch_kexec_unprotect_crashkres(); >> + >> + /* Flag to differentiate between normal load and hotplug */ >> + kexec_crash_image->hotplug_event = true; > > 1. Why is that required? Why can't arch_crash_handle_hotplug_event() forward that > information? I mean, *hotplug* in the anme implies that the function should be > aware what's happening. Member .hotplug_event is needed in crash_prepare_elf64_headers(). To date, it has made sense (to me) to pass this parameter to crash_prepare_elf64_headers() via the struct kimage, rather than directly. The patch "crash: prototype change for crash_prepare_elf64_headers" changes crash_prepare_elf64_headers() to accept the struct kimage. If it is so desired, it could just as well be changed to pass just this one parameter; though struct kimage seems a more useful way to go. > > 2. Why can't the unprotect+reprotect not be done inside > arch_crash_handle_hotplug_event() ? It's all arch specific either way. > > IMHO, this code here should be as simple as > > if (kexec_crash_image) > arch_crash_handle_hotplug_event(kexec_crash_image, hp_action, cpu); > The intent of this code was to be generic infrastructure. Just invoking the arch_crash_handle_hotplug_event() would certainly be as generic as it gets. But there were a series of steps that seemed to be common, so those I hoisted into this bit of code. > 3. Why do we have to forward the CPU for CPU onlining/offlining but not the > memory block id (or similar) when onlining/offlining a memory block? From patch "kexec: exclude hot remove cpu from elfcorehdr notes" commit message: Due to use of CPUHP_AP_ONLINE_DYN, upon CPU unplug, the CPU is still in the for_each_present_cpu() list when within the handle_hotplug_event(). Thus the CPU must be explicitly excluded when building the new list of CPUs. This change identifies in handle_hotplug_event() the CPU to be excluded, and the check for excluding the CPU in crash_prepare_elf64_headers(). If there is a better CPUHP_ to use than _DYN, I'd be all for that! > >> + >> + /* Now invoke arch-specific update handler */ >> + arch_crash_handle_hotplug_event(kexec_crash_image, hp_action, cpu); >> + >> + /* No longer handling a hotplug event */ >> + kexec_crash_image->hotplug_event = false; >> + >> + /* Change back to read-only */ >> + arch_kexec_protect_crashkres(); >> + } >> + >> + /* Release lock now that update complete */ >> + mutex_unlock(&kexec_mutex); >> +} >> + >> +static int crash_memhp_notifier(struct notifier_block *nb, unsigned long val, void *v) >> +{ >> + switch (val) { >> + case MEM_ONLINE: >> + handle_hotplug_event(KEXEC_CRASH_HP_ADD_MEMORY, 0); >> + break; >> + >> + case MEM_OFFLINE: >> + handle_hotplug_event(KEXEC_CRASH_HP_REMOVE_MEMORY, 0); >> + break; >> + } >> + return NOTIFY_OK; >> +} >> + >> +static struct notifier_block crash_memhp_nb = { >> + .notifier_call = crash_memhp_notifier, >> + .priority = 0 >> +}; >> + >> +static int crash_cpuhp_online(unsigned int cpu) >> +{ >> + handle_hotplug_event(KEXEC_CRASH_HP_ADD_CPU, cpu); >> + return 0; >> +} >> + >> +static int crash_cpuhp_offline(unsigned int cpu) >> +{ >> + handle_hotplug_event(KEXEC_CRASH_HP_REMOVE_CPU, cpu); >> + return 0; >> +} >> + >> +static int __init crash_hotplug_init(void) >> +{ >> + int result = 0; >> + >> + if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG)) >> + register_memory_notifier(&crash_memhp_nb); >> + >> + if (IS_ENABLED(CONFIG_HOTPLUG_CPU)) >> + result = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, >> + "crash/cpuhp", >> + crash_cpuhp_online, >> + crash_cpuhp_offline); > > Ehm, this indentation looks very weird. > >> + >> + return result; >> +} >> + >> +subsys_initcall(crash_hotplug_init); >> +#endif > >