From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-11.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,MENTIONS_GIT_HOSTING, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8B58CC43457 for ; Fri, 9 Oct 2020 19:50:45 +0000 (UTC) Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 28A2722282 for ; Fri, 9 Oct 2020 19:50:44 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 28A2722282 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-nvdimm-bounces@lists.01.org Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id A4C4B15923CD0; Fri, 9 Oct 2020 12:50:44 -0700 (PDT) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=134.134.136.65; helo=mga03.intel.com; envelope-from=ira.weiny@intel.com; receiver= Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 866551503868D for ; Fri, 9 Oct 2020 12:50:42 -0700 (PDT) IronPort-SDR: X6dJIZT+brVJ88QbnkuWt1Ob/hxJUwNGs2qYnszn2/tQmqR30VIPnN1ksEs1IKLZCsiOXQDDfv QNO/WIUGJ0Vw== X-IronPort-AV: E=McAfee;i="6000,8403,9769"; a="165591917" X-IronPort-AV: E=Sophos;i="5.77,355,1596524400"; d="scan'208";a="165591917" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Oct 2020 12:50:41 -0700 IronPort-SDR: wgvSBhlinBwJf2eRaqYm1d4mOPDeheaaBRmvXZpaWhx0BsPjq5MOqCRmfglsuVIrge+HvLIvQ5 IT741lyNdN2Q== X-IronPort-AV: E=Sophos;i="5.77,355,1596524400"; d="scan'208";a="419536654" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Oct 2020 12:50:41 -0700 From: ira.weiny@intel.com To: Andrew Morton , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Andy Lutomirski , Peter Zijlstra Subject: [PATCH RFC PKS/PMEM 00/58] PMEM: Introduce stray write protection for PMEM Date: Fri, 9 Oct 2020 12:49:35 -0700 Message-Id: <20201009195033.3208459-1-ira.weiny@intel.com> X-Mailer: git-send-email 2.28.0.rc0.12.gb6a658bd00c9 MIME-Version: 1.0 Message-ID-Hash: EHQOQYI54PLFZ3TA3IDDVBBSK7RAAW62 X-Message-ID-Hash: EHQOQYI54PLFZ3TA3IDDVBBSK7RAAW62 X-MailFrom: ira.weiny@intel.com X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; suspicious-header CC: x86@kernel.org, Dave Hansen , Fenghua Yu , linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-nvdimm@lists.01.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, kvm@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org, kexec@lists.infradead.org, linux-bcache@vger.kernel.org, linux-mtd@lists.infradead.org, devel@driverdev.osuosl.org, linux-efi@vger.kernel.org, linux-mmc@vger.kernel.org, linux-scsi@vger.kernel.org, target-devel@vger.kernel.org, linux-nfs@vger.kernel.org, ceph-devel@vger.kernel.org, linux-ext4@vger.kernel.org, linux-aio@kvack.org, io-uring@vger.kernel.org, linux-erofs@lists.ozlabs.org, linux-um@lists.infradead.org, linux-ntfs-dev@lists.sourceforge.net, reiserfs-devel@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net, linux-nilfs@vger.kernel.org, cluster-devel@redhat.com, ecryptfs@vger.kernel.org, linux-cifs@vger.kernel.org , linux-btrfs@vger.kernel.org, linux-afs@lists.infradead.org, linux-rdma@vger.kernel.org, amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org, intel-gfx@lists.freedesktop.org, drbd-dev@lists.linbit.com, linux-block@vger.kernel.org, xen-devel@lists.xenproject.org, linux-cachefs@redhat.com, samba-technical@lists.samba.org, intel-wired-lan@lists.osuosl.org X-Mailman-Version: 3.1.1 Precedence: list List-Id: "Linux-nvdimm developer list." Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit From: Ira Weiny Should a stray write in the kernel occur persistent memory is affected more than regular memory. A write to the wrong area of memory could result in latent data corruption which will will persist after a reboot. PKS provides a nice way to restrict access to persistent memory kernel mappings, while providing fast access when needed. Since the last RFC[1] this patch set has grown quite a bit. It now depends on the core patches submitted separately. https://lore.kernel.org/lkml/20201009194258.3207172-1-ira.weiny@intel.com/ And contained in the git tree here: https://github.com/weiny2/linux-kernel/tree/pks-rfc-v3 However, functionally there is only 1 major change from the last RFC. Specifically, kmap() is most often used within a single thread in a 'map/do something/unmap' pattern. In fact this is the pattern used in ~90% of the callers of kmap(). This pattern works very well for the pmem use case and the testing which was done. However, there were another ~20-30 kmap users which do not follow this pattern. Some of them seem to expect the mapping to be 'global' while others require a detailed audit to be sure.[2][3] While we don't anticipate global mappings to pmem there is a danger in changing the semantics of kmap(). Effectively, this would cause an unresolved page fault with little to no information about why. There were a number of options considered. 1) Attempt to change all the thread local kmap() calls to kmap_atomic() 2) Introduce a flags parameter to kmap() to indicate if the mapping should be global or not 3) Change ~20-30 call sites to 'kmap_global()' to indicate that they require a global mapping of the pages 4) Change ~209 call sites to 'kmap_thread()' to indicate that the mapping is to be used within that thread of execution only Option 1 is simply not feasible kmap_atomic() is not the same semantic as kmap() within a single tread. Option 2 would require all of the call sites of kmap() to change. Option 3 seems like a good minimal change but there is a danger that new code may miss the semantic change of kmap() and not get the behavior intended for future users. Therefore, option #4 was chosen. To handle the global PKRS state in the most efficient manner possible. We lazily override the thread specific PKRS key value only when needed because we anticipate PKS to not be needed will not be needed most of the time. And even when it is used 90% of the time it is a thread local call. [1] https://lore.kernel.org/lkml/20200717072056.73134-1-ira.weiny@intel.com/ [2] The following list of callers continue calling kmap() (utilizing the global PKRS). It would be nice if more of them could be converted to kmap_thread() drivers/firewire/net.c: ptr = kmap(dev->broadcast_rcv_buffer.pages[u]); drivers/gpu/drm/i915/gem/i915_gem_pages.c: return kmap(sg_page(sgt->sgl)); drivers/gpu/drm/ttm/ttm_bo_util.c: map->virtual = kmap(map->page); drivers/infiniband/hw/qib/qib_user_sdma.c: mpage = kmap(page); drivers/misc/vmw_vmci/vmci_host.c: context->notify = kmap(context->notify_page) + (uva & (PAGE_SIZE - 1)); drivers/misc/xilinx_sdfec.c: addr = kmap(pages[i]); drivers/mmc/host/usdhi6rol0.c: host->pg.mapped = kmap(host->pg.page); drivers/mmc/host/usdhi6rol0.c: host->pg.mapped = kmap(host->pg.page); drivers/mmc/host/usdhi6rol0.c: host->pg.mapped = kmap(host->pg.page); drivers/nvme/target/tcp.c: iov->iov_base = kmap(sg_page(sg)) + sg->offset + sg_offset; drivers/scsi/libiscsi_tcp.c: segment->sg_mapped = kmap(sg_page(sg)); drivers/target/iscsi/iscsi_target.c: iov[i].iov_base = kmap(sg_page(sg)) + sg->offset + page_off; drivers/target/target_core_transport.c: return kmap(sg_page(sg)) + sg->offset; fs/btrfs/check-integrity.c: block_ctx->datav[i] = kmap(block_ctx->pagev[i]); fs/ceph/dir.c: cache_ctl->dentries = kmap(cache_ctl->page); fs/ceph/inode.c: ctl->dentries = kmap(ctl->page); fs/erofs/zpvec.h: kmap_atomic(ctor->curr) : kmap(ctor->curr); lib/scatterlist.c: miter->addr = kmap(miter->page) + miter->__offset; net/ceph/pagelist.c: pl->mapped_tail = kmap(page); net/ceph/pagelist.c: pl->mapped_tail = kmap(page); virt/kvm/kvm_main.c: hva = kmap(page); [3] The following appear to follow the same pattern as ext2 which was converted after some code audit. So I _think_ they too could be converted to k[un]map_thread(). fs/freevxfs/vxfs_subr.c|75| kmap(pp); fs/jfs/jfs_metapage.c|102| kmap(page); fs/jfs/jfs_metapage.c|156| kmap(page); fs/minix/dir.c|72| kmap(page); fs/nilfs2/dir.c|195| kmap(page); fs/nilfs2/ifile.h|24| void *kaddr = kmap(ibh->b_page); fs/ntfs/aops.h|78| kmap(page); fs/ntfs/compress.c|574| kmap(page); fs/qnx6/dir.c|32| kmap(page); fs/qnx6/dir.c|58| kmap(*p = page); fs/qnx6/inode.c|190| kmap(page); fs/qnx6/inode.c|557| kmap(page); fs/reiserfs/inode.c|2397| kmap(bh_result->b_page); fs/reiserfs/xattr.c|444| kmap(page); fs/sysv/dir.c|60| kmap(page); fs/sysv/dir.c|262| kmap(page); fs/ufs/dir.c|194| kmap(page); fs/ufs/dir.c|562| kmap(page); Ira Weiny (58): x86/pks: Add a global pkrs option x86/pks/test: Add testing for global option memremap: Add zone device access protection kmap: Add stray access protection for device pages kmap: Introduce k[un]map_thread kmap: Introduce k[un]map_thread debugging drivers/drbd: Utilize new kmap_thread() drivers/firmware_loader: Utilize new kmap_thread() drivers/gpu: Utilize new kmap_thread() drivers/rdma: Utilize new kmap_thread() drivers/net: Utilize new kmap_thread() fs/afs: Utilize new kmap_thread() fs/btrfs: Utilize new kmap_thread() fs/cifs: Utilize new kmap_thread() fs/ecryptfs: Utilize new kmap_thread() fs/gfs2: Utilize new kmap_thread() fs/nilfs2: Utilize new kmap_thread() fs/hfs: Utilize new kmap_thread() fs/hfsplus: Utilize new kmap_thread() fs/jffs2: Utilize new kmap_thread() fs/nfs: Utilize new kmap_thread() fs/f2fs: Utilize new kmap_thread() fs/fuse: Utilize new kmap_thread() fs/freevxfs: Utilize new kmap_thread() fs/reiserfs: Utilize new kmap_thread() fs/zonefs: Utilize new kmap_thread() fs/ubifs: Utilize new kmap_thread() fs/cachefiles: Utilize new kmap_thread() fs/ntfs: Utilize new kmap_thread() fs/romfs: Utilize new kmap_thread() fs/vboxsf: Utilize new kmap_thread() fs/hostfs: Utilize new kmap_thread() fs/cramfs: Utilize new kmap_thread() fs/erofs: Utilize new kmap_thread() fs: Utilize new kmap_thread() fs/ext2: Use ext2_put_page fs/ext2: Utilize new kmap_thread() fs/isofs: Utilize new kmap_thread() fs/jffs2: Utilize new kmap_thread() net: Utilize new kmap_thread() drivers/target: Utilize new kmap_thread() drivers/scsi: Utilize new kmap_thread() drivers/mmc: Utilize new kmap_thread() drivers/xen: Utilize new kmap_thread() drivers/firmware: Utilize new kmap_thread() drives/staging: Utilize new kmap_thread() drivers/mtd: Utilize new kmap_thread() drivers/md: Utilize new kmap_thread() drivers/misc: Utilize new kmap_thread() drivers/android: Utilize new kmap_thread() kernel: Utilize new kmap_thread() mm: Utilize new kmap_thread() lib: Utilize new kmap_thread() powerpc: Utilize new kmap_thread() samples: Utilize new kmap_thread() dax: Stray access protection for dax_direct_access() nvdimm/pmem: Stray access protection for pmem->virt_addr [dax|pmem]: Enable stray access protection Documentation/core-api/protection-keys.rst | 11 +- arch/powerpc/mm/mem.c | 4 +- arch/x86/entry/common.c | 28 +++ arch/x86/include/asm/pkeys.h | 6 +- arch/x86/include/asm/pkeys_common.h | 8 +- arch/x86/kernel/process.c | 74 ++++++- arch/x86/mm/fault.c | 193 ++++++++++++++---- arch/x86/mm/pkeys.c | 88 ++++++-- drivers/android/binder_alloc.c | 4 +- drivers/base/firmware_loader/fallback.c | 4 +- drivers/base/firmware_loader/main.c | 4 +- drivers/block/drbd/drbd_main.c | 4 +- drivers/block/drbd/drbd_receiver.c | 12 +- drivers/dax/device.c | 2 + drivers/dax/super.c | 2 + drivers/firmware/efi/capsule-loader.c | 6 +- drivers/firmware/efi/capsule.c | 4 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 12 +- drivers/gpu/drm/gma500/gma_display.c | 4 +- drivers/gpu/drm/gma500/mmu.c | 10 +- drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 4 +- .../drm/i915/gem/selftests/i915_gem_context.c | 4 +- .../drm/i915/gem/selftests/i915_gem_mman.c | 8 +- drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c | 4 +- drivers/gpu/drm/i915/gt/intel_gtt.c | 4 +- drivers/gpu/drm/i915/gt/shmem_utils.c | 4 +- drivers/gpu/drm/i915/i915_gem.c | 8 +- drivers/gpu/drm/i915/i915_gpu_error.c | 4 +- drivers/gpu/drm/i915/selftests/i915_perf.c | 4 +- drivers/gpu/drm/radeon/radeon_ttm.c | 4 +- drivers/infiniband/hw/hfi1/sdma.c | 4 +- drivers/infiniband/hw/i40iw/i40iw_cm.c | 10 +- drivers/infiniband/sw/siw/siw_qp_tx.c | 14 +- drivers/md/bcache/request.c | 4 +- drivers/misc/vmw_vmci/vmci_queue_pair.c | 12 +- drivers/mmc/host/mmc_spi.c | 4 +- drivers/mmc/host/sdricoh_cs.c | 4 +- drivers/mtd/mtd_blkdevs.c | 12 +- drivers/net/ethernet/intel/igb/igb_ethtool.c | 4 +- .../net/ethernet/intel/ixgbe/ixgbe_ethtool.c | 4 +- drivers/nvdimm/pmem.c | 6 + drivers/scsi/ipr.c | 8 +- drivers/scsi/pmcraid.c | 8 +- drivers/staging/rts5208/rtsx_transport.c | 4 +- drivers/target/target_core_iblock.c | 4 +- drivers/target/target_core_rd.c | 4 +- drivers/target/target_core_transport.c | 4 +- drivers/xen/gntalloc.c | 4 +- fs/afs/dir.c | 16 +- fs/afs/dir_edit.c | 16 +- fs/afs/mntpt.c | 4 +- fs/afs/write.c | 4 +- fs/aio.c | 4 +- fs/binfmt_elf.c | 4 +- fs/binfmt_elf_fdpic.c | 4 +- fs/btrfs/check-integrity.c | 4 +- fs/btrfs/compression.c | 4 +- fs/btrfs/inode.c | 16 +- fs/btrfs/lzo.c | 24 +-- fs/btrfs/raid56.c | 34 +-- fs/btrfs/reflink.c | 8 +- fs/btrfs/send.c | 4 +- fs/btrfs/zlib.c | 32 +-- fs/btrfs/zstd.c | 20 +- fs/cachefiles/rdwr.c | 4 +- fs/cifs/cifsencrypt.c | 6 +- fs/cifs/file.c | 16 +- fs/cifs/smb2ops.c | 8 +- fs/cramfs/inode.c | 10 +- fs/ecryptfs/crypto.c | 8 +- fs/ecryptfs/read_write.c | 8 +- fs/erofs/super.c | 4 +- fs/erofs/xattr.c | 4 +- fs/exec.c | 10 +- fs/ext2/dir.c | 8 +- fs/ext2/ext2.h | 8 + fs/ext2/namei.c | 15 +- fs/f2fs/f2fs.h | 8 +- fs/freevxfs/vxfs_immed.c | 4 +- fs/fuse/readdir.c | 4 +- fs/gfs2/bmap.c | 4 +- fs/gfs2/ops_fstype.c | 4 +- fs/hfs/bnode.c | 14 +- fs/hfs/btree.c | 20 +- fs/hfsplus/bitmap.c | 20 +- fs/hfsplus/bnode.c | 102 ++++----- fs/hfsplus/btree.c | 18 +- fs/hostfs/hostfs_kern.c | 12 +- fs/io_uring.c | 4 +- fs/isofs/compress.c | 4 +- fs/jffs2/file.c | 8 +- fs/jffs2/gc.c | 4 +- fs/nfs/dir.c | 20 +- fs/nilfs2/alloc.c | 34 +-- fs/nilfs2/cpfile.c | 4 +- fs/ntfs/aops.c | 4 +- fs/reiserfs/journal.c | 4 +- fs/romfs/super.c | 4 +- fs/splice.c | 4 +- fs/ubifs/file.c | 16 +- fs/vboxsf/file.c | 12 +- fs/zonefs/super.c | 4 +- include/linux/entry-common.h | 3 + include/linux/highmem.h | 63 +++++- include/linux/memremap.h | 1 + include/linux/mm.h | 43 ++++ include/linux/pkeys.h | 6 +- include/linux/sched.h | 8 + include/trace/events/kmap_thread.h | 56 +++++ init/init_task.c | 6 + kernel/fork.c | 18 ++ kernel/kexec_core.c | 8 +- lib/Kconfig.debug | 8 + lib/iov_iter.c | 12 +- lib/pks/pks_test.c | 138 +++++++++++-- lib/test_bpf.c | 4 +- lib/test_hmm.c | 8 +- mm/Kconfig | 13 ++ mm/debug.c | 23 +++ mm/memory.c | 8 +- mm/memremap.c | 90 ++++++++ mm/swapfile.c | 4 +- mm/userfaultfd.c | 4 +- net/ceph/messenger.c | 4 +- net/core/datagram.c | 4 +- net/core/sock.c | 8 +- net/ipv4/ip_output.c | 4 +- net/sunrpc/cache.c | 4 +- net/sunrpc/xdr.c | 8 +- net/tls/tls_device.c | 4 +- samples/vfio-mdev/mbochs.c | 4 +- 131 files changed, 1284 insertions(+), 565 deletions(-) create mode 100644 include/trace/events/kmap_thread.h -- 2.28.0.rc0.12.gb6a658bd00c9 _______________________________________________ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to linux-nvdimm-leave@lists.01.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-11.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,MENTIONS_GIT_HOSTING, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 306D4C35269 for ; Fri, 9 Oct 2020 19:50:59 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id EF81B2231B for ; Fri, 9 Oct 2020 19:50:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2389015AbgJITuq (ORCPT ); Fri, 9 Oct 2020 15:50:46 -0400 Received: from mga02.intel.com ([134.134.136.20]:57507 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726357AbgJITup (ORCPT ); Fri, 9 Oct 2020 15:50:45 -0400 IronPort-SDR: EgmAoyvHqLfUgUFSlazRhx1Cjp9MmVlenKbZYXMVRVRfgMzpJRftl57MJq2pMTa+nNfk+FbMHQ 4ZNHJky2ihhA== X-IronPort-AV: E=McAfee;i="6000,8403,9769"; a="152450718" X-IronPort-AV: E=Sophos;i="5.77,355,1596524400"; d="scan'208";a="152450718" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Oct 2020 12:50:41 -0700 IronPort-SDR: wgvSBhlinBwJf2eRaqYm1d4mOPDeheaaBRmvXZpaWhx0BsPjq5MOqCRmfglsuVIrge+HvLIvQ5 IT741lyNdN2Q== X-IronPort-AV: E=Sophos;i="5.77,355,1596524400"; d="scan'208";a="419536654" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Oct 2020 12:50:41 -0700 From: ira.weiny@intel.com To: Andrew Morton , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Andy Lutomirski , Peter Zijlstra Cc: Ira Weiny , x86@kernel.org, Dave Hansen , Dan Williams , Fenghua Yu , linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-nvdimm@lists.01.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, kvm@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org, kexec@lists.infradead.org, linux-bcache@vger.kernel.org, linux-mtd@lists.infradead.org, devel@driverdev.osuosl.org, linux-efi@vger.kernel.org, linux-mmc@vger.kernel.org, linux-scsi@vger.kernel.org, target-devel@vger.kernel.org, linux-nfs@vger.kernel.org, ceph-devel@vger.kernel.org, linux-ext4@vger.kernel.org, linux-aio@kvack.org, io-uring@vger.kernel.org, linux-erofs@lists.ozlabs.org, linux-um@lists.infradead.org, linux-ntfs-dev@lists.sourceforge.net, reiserfs-devel@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net, linux-nilfs@vger.kernel.org, cluster-devel@redhat.com, ecryptfs@vger.kernel.org, linux-cifs@vger.kernel.org, linux-btrfs@vger.kernel.org, linux-afs@lists.infradead.org, linux-rdma@vger.kernel.org, amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org, intel-gfx@lists.freedesktop.org, drbd-dev@lists.linbit.com, linux-block@vger.kernel.org, xen-devel@lists.xenproject.org, linux-cachefs@redhat.com, samba-technical@lists.samba.org, intel-wired-lan@lists.osuosl.org Subject: [PATCH RFC PKS/PMEM 00/58] PMEM: Introduce stray write protection for PMEM Date: Fri, 9 Oct 2020 12:49:35 -0700 Message-Id: <20201009195033.3208459-1-ira.weiny@intel.com> X-Mailer: git-send-email 2.28.0.rc0.12.gb6a658bd00c9 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org From: Ira Weiny Should a stray write in the kernel occur persistent memory is affected more than regular memory. A write to the wrong area of memory could result in latent data corruption which will will persist after a reboot. PKS provides a nice way to restrict access to persistent memory kernel mappings, while providing fast access when needed. Since the last RFC[1] this patch set has grown quite a bit. It now depends on the core patches submitted separately. https://lore.kernel.org/lkml/20201009194258.3207172-1-ira.weiny@intel.com/ And contained in the git tree here: https://github.com/weiny2/linux-kernel/tree/pks-rfc-v3 However, functionally there is only 1 major change from the last RFC. Specifically, kmap() is most often used within a single thread in a 'map/do something/unmap' pattern. In fact this is the pattern used in ~90% of the callers of kmap(). This pattern works very well for the pmem use case and the testing which was done. However, there were another ~20-30 kmap users which do not follow this pattern. Some of them seem to expect the mapping to be 'global' while others require a detailed audit to be sure.[2][3] While we don't anticipate global mappings to pmem there is a danger in changing the semantics of kmap(). Effectively, this would cause an unresolved page fault with little to no information about why. There were a number of options considered. 1) Attempt to change all the thread local kmap() calls to kmap_atomic() 2) Introduce a flags parameter to kmap() to indicate if the mapping should be global or not 3) Change ~20-30 call sites to 'kmap_global()' to indicate that they require a global mapping of the pages 4) Change ~209 call sites to 'kmap_thread()' to indicate that the mapping is to be used within that thread of execution only Option 1 is simply not feasible kmap_atomic() is not the same semantic as kmap() within a single tread. Option 2 would require all of the call sites of kmap() to change. Option 3 seems like a good minimal change but there is a danger that new code may miss the semantic change of kmap() and not get the behavior intended for future users. Therefore, option #4 was chosen. To handle the global PKRS state in the most efficient manner possible. We lazily override the thread specific PKRS key value only when needed because we anticipate PKS to not be needed will not be needed most of the time. And even when it is used 90% of the time it is a thread local call. [1] https://lore.kernel.org/lkml/20200717072056.73134-1-ira.weiny@intel.com/ [2] The following list of callers continue calling kmap() (utilizing the global PKRS). It would be nice if more of them could be converted to kmap_thread() drivers/firewire/net.c: ptr = kmap(dev->broadcast_rcv_buffer.pages[u]); drivers/gpu/drm/i915/gem/i915_gem_pages.c: return kmap(sg_page(sgt->sgl)); drivers/gpu/drm/ttm/ttm_bo_util.c: map->virtual = kmap(map->page); drivers/infiniband/hw/qib/qib_user_sdma.c: mpage = kmap(page); drivers/misc/vmw_vmci/vmci_host.c: context->notify = kmap(context->notify_page) + (uva & (PAGE_SIZE - 1)); drivers/misc/xilinx_sdfec.c: addr = kmap(pages[i]); drivers/mmc/host/usdhi6rol0.c: host->pg.mapped = kmap(host->pg.page); drivers/mmc/host/usdhi6rol0.c: host->pg.mapped = kmap(host->pg.page); drivers/mmc/host/usdhi6rol0.c: host->pg.mapped = kmap(host->pg.page); drivers/nvme/target/tcp.c: iov->iov_base = kmap(sg_page(sg)) + sg->offset + sg_offset; drivers/scsi/libiscsi_tcp.c: segment->sg_mapped = kmap(sg_page(sg)); drivers/target/iscsi/iscsi_target.c: iov[i].iov_base = kmap(sg_page(sg)) + sg->offset + page_off; drivers/target/target_core_transport.c: return kmap(sg_page(sg)) + sg->offset; fs/btrfs/check-integrity.c: block_ctx->datav[i] = kmap(block_ctx->pagev[i]); fs/ceph/dir.c: cache_ctl->dentries = kmap(cache_ctl->page); fs/ceph/inode.c: ctl->dentries = kmap(ctl->page); fs/erofs/zpvec.h: kmap_atomic(ctor->curr) : kmap(ctor->curr); lib/scatterlist.c: miter->addr = kmap(miter->page) + miter->__offset; net/ceph/pagelist.c: pl->mapped_tail = kmap(page); net/ceph/pagelist.c: pl->mapped_tail = kmap(page); virt/kvm/kvm_main.c: hva = kmap(page); [3] The following appear to follow the same pattern as ext2 which was converted after some code audit. So I _think_ they too could be converted to k[un]map_thread(). fs/freevxfs/vxfs_subr.c|75| kmap(pp); fs/jfs/jfs_metapage.c|102| kmap(page); fs/jfs/jfs_metapage.c|156| kmap(page); fs/minix/dir.c|72| kmap(page); fs/nilfs2/dir.c|195| kmap(page); fs/nilfs2/ifile.h|24| void *kaddr = kmap(ibh->b_page); fs/ntfs/aops.h|78| kmap(page); fs/ntfs/compress.c|574| kmap(page); fs/qnx6/dir.c|32| kmap(page); fs/qnx6/dir.c|58| kmap(*p = page); fs/qnx6/inode.c|190| kmap(page); fs/qnx6/inode.c|557| kmap(page); fs/reiserfs/inode.c|2397| kmap(bh_result->b_page); fs/reiserfs/xattr.c|444| kmap(page); fs/sysv/dir.c|60| kmap(page); fs/sysv/dir.c|262| kmap(page); fs/ufs/dir.c|194| kmap(page); fs/ufs/dir.c|562| kmap(page); Ira Weiny (58): x86/pks: Add a global pkrs option x86/pks/test: Add testing for global option memremap: Add zone device access protection kmap: Add stray access protection for device pages kmap: Introduce k[un]map_thread kmap: Introduce k[un]map_thread debugging drivers/drbd: Utilize new kmap_thread() drivers/firmware_loader: Utilize new kmap_thread() drivers/gpu: Utilize new kmap_thread() drivers/rdma: Utilize new kmap_thread() drivers/net: Utilize new kmap_thread() fs/afs: Utilize new kmap_thread() fs/btrfs: Utilize new kmap_thread() fs/cifs: Utilize new kmap_thread() fs/ecryptfs: Utilize new kmap_thread() fs/gfs2: Utilize new kmap_thread() fs/nilfs2: Utilize new kmap_thread() fs/hfs: Utilize new kmap_thread() fs/hfsplus: Utilize new kmap_thread() fs/jffs2: Utilize new kmap_thread() fs/nfs: Utilize new kmap_thread() fs/f2fs: Utilize new kmap_thread() fs/fuse: Utilize new kmap_thread() fs/freevxfs: Utilize new kmap_thread() fs/reiserfs: Utilize new kmap_thread() fs/zonefs: Utilize new kmap_thread() fs/ubifs: Utilize new kmap_thread() fs/cachefiles: Utilize new kmap_thread() fs/ntfs: Utilize new kmap_thread() fs/romfs: Utilize new kmap_thread() fs/vboxsf: Utilize new kmap_thread() fs/hostfs: Utilize new kmap_thread() fs/cramfs: Utilize new kmap_thread() fs/erofs: Utilize new kmap_thread() fs: Utilize new kmap_thread() fs/ext2: Use ext2_put_page fs/ext2: Utilize new kmap_thread() fs/isofs: Utilize new kmap_thread() fs/jffs2: Utilize new kmap_thread() net: Utilize new kmap_thread() drivers/target: Utilize new kmap_thread() drivers/scsi: Utilize new kmap_thread() drivers/mmc: Utilize new kmap_thread() drivers/xen: Utilize new kmap_thread() drivers/firmware: Utilize new kmap_thread() drives/staging: Utilize new kmap_thread() drivers/mtd: Utilize new kmap_thread() drivers/md: Utilize new kmap_thread() drivers/misc: Utilize new kmap_thread() drivers/android: Utilize new kmap_thread() kernel: Utilize new kmap_thread() mm: Utilize new kmap_thread() lib: Utilize new kmap_thread() powerpc: Utilize new kmap_thread() samples: Utilize new kmap_thread() dax: Stray access protection for dax_direct_access() nvdimm/pmem: Stray access protection for pmem->virt_addr [dax|pmem]: Enable stray access protection Documentation/core-api/protection-keys.rst | 11 +- arch/powerpc/mm/mem.c | 4 +- arch/x86/entry/common.c | 28 +++ arch/x86/include/asm/pkeys.h | 6 +- arch/x86/include/asm/pkeys_common.h | 8 +- arch/x86/kernel/process.c | 74 ++++++- arch/x86/mm/fault.c | 193 ++++++++++++++---- arch/x86/mm/pkeys.c | 88 ++++++-- drivers/android/binder_alloc.c | 4 +- drivers/base/firmware_loader/fallback.c | 4 +- drivers/base/firmware_loader/main.c | 4 +- drivers/block/drbd/drbd_main.c | 4 +- drivers/block/drbd/drbd_receiver.c | 12 +- drivers/dax/device.c | 2 + drivers/dax/super.c | 2 + drivers/firmware/efi/capsule-loader.c | 6 +- drivers/firmware/efi/capsule.c | 4 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 12 +- drivers/gpu/drm/gma500/gma_display.c | 4 +- drivers/gpu/drm/gma500/mmu.c | 10 +- drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 4 +- .../drm/i915/gem/selftests/i915_gem_context.c | 4 +- .../drm/i915/gem/selftests/i915_gem_mman.c | 8 +- drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c | 4 +- drivers/gpu/drm/i915/gt/intel_gtt.c | 4 +- drivers/gpu/drm/i915/gt/shmem_utils.c | 4 +- drivers/gpu/drm/i915/i915_gem.c | 8 +- drivers/gpu/drm/i915/i915_gpu_error.c | 4 +- drivers/gpu/drm/i915/selftests/i915_perf.c | 4 +- drivers/gpu/drm/radeon/radeon_ttm.c | 4 +- drivers/infiniband/hw/hfi1/sdma.c | 4 +- drivers/infiniband/hw/i40iw/i40iw_cm.c | 10 +- drivers/infiniband/sw/siw/siw_qp_tx.c | 14 +- drivers/md/bcache/request.c | 4 +- drivers/misc/vmw_vmci/vmci_queue_pair.c | 12 +- drivers/mmc/host/mmc_spi.c | 4 +- drivers/mmc/host/sdricoh_cs.c | 4 +- drivers/mtd/mtd_blkdevs.c | 12 +- drivers/net/ethernet/intel/igb/igb_ethtool.c | 4 +- .../net/ethernet/intel/ixgbe/ixgbe_ethtool.c | 4 +- drivers/nvdimm/pmem.c | 6 + drivers/scsi/ipr.c | 8 +- drivers/scsi/pmcraid.c | 8 +- drivers/staging/rts5208/rtsx_transport.c | 4 +- drivers/target/target_core_iblock.c | 4 +- drivers/target/target_core_rd.c | 4 +- drivers/target/target_core_transport.c | 4 +- drivers/xen/gntalloc.c | 4 +- fs/afs/dir.c | 16 +- fs/afs/dir_edit.c | 16 +- fs/afs/mntpt.c | 4 +- fs/afs/write.c | 4 +- fs/aio.c | 4 +- fs/binfmt_elf.c | 4 +- fs/binfmt_elf_fdpic.c | 4 +- fs/btrfs/check-integrity.c | 4 +- fs/btrfs/compression.c | 4 +- fs/btrfs/inode.c | 16 +- fs/btrfs/lzo.c | 24 +-- fs/btrfs/raid56.c | 34 +-- fs/btrfs/reflink.c | 8 +- fs/btrfs/send.c | 4 +- fs/btrfs/zlib.c | 32 +-- fs/btrfs/zstd.c | 20 +- fs/cachefiles/rdwr.c | 4 +- fs/cifs/cifsencrypt.c | 6 +- fs/cifs/file.c | 16 +- fs/cifs/smb2ops.c | 8 +- fs/cramfs/inode.c | 10 +- fs/ecryptfs/crypto.c | 8 +- fs/ecryptfs/read_write.c | 8 +- fs/erofs/super.c | 4 +- fs/erofs/xattr.c | 4 +- fs/exec.c | 10 +- fs/ext2/dir.c | 8 +- fs/ext2/ext2.h | 8 + fs/ext2/namei.c | 15 +- fs/f2fs/f2fs.h | 8 +- fs/freevxfs/vxfs_immed.c | 4 +- fs/fuse/readdir.c | 4 +- fs/gfs2/bmap.c | 4 +- fs/gfs2/ops_fstype.c | 4 +- fs/hfs/bnode.c | 14 +- fs/hfs/btree.c | 20 +- fs/hfsplus/bitmap.c | 20 +- fs/hfsplus/bnode.c | 102 ++++----- fs/hfsplus/btree.c | 18 +- fs/hostfs/hostfs_kern.c | 12 +- fs/io_uring.c | 4 +- fs/isofs/compress.c | 4 +- fs/jffs2/file.c | 8 +- fs/jffs2/gc.c | 4 +- fs/nfs/dir.c | 20 +- fs/nilfs2/alloc.c | 34 +-- fs/nilfs2/cpfile.c | 4 +- fs/ntfs/aops.c | 4 +- fs/reiserfs/journal.c | 4 +- fs/romfs/super.c | 4 +- fs/splice.c | 4 +- fs/ubifs/file.c | 16 +- fs/vboxsf/file.c | 12 +- fs/zonefs/super.c | 4 +- include/linux/entry-common.h | 3 + include/linux/highmem.h | 63 +++++- include/linux/memremap.h | 1 + include/linux/mm.h | 43 ++++ include/linux/pkeys.h | 6 +- include/linux/sched.h | 8 + include/trace/events/kmap_thread.h | 56 +++++ init/init_task.c | 6 + kernel/fork.c | 18 ++ kernel/kexec_core.c | 8 +- lib/Kconfig.debug | 8 + lib/iov_iter.c | 12 +- lib/pks/pks_test.c | 138 +++++++++++-- lib/test_bpf.c | 4 +- lib/test_hmm.c | 8 +- mm/Kconfig | 13 ++ mm/debug.c | 23 +++ mm/memory.c | 8 +- mm/memremap.c | 90 ++++++++ mm/swapfile.c | 4 +- mm/userfaultfd.c | 4 +- net/ceph/messenger.c | 4 +- net/core/datagram.c | 4 +- net/core/sock.c | 8 +- net/ipv4/ip_output.c | 4 +- net/sunrpc/cache.c | 4 +- net/sunrpc/xdr.c | 8 +- net/tls/tls_device.c | 4 +- samples/vfio-mdev/mbochs.c | 4 +- 131 files changed, 1284 insertions(+), 565 deletions(-) create mode 100644 include/trace/events/kmap_thread.h -- 2.28.0.rc0.12.gb6a658bd00c9 From mboxrd@z Thu Jan 1 00:00:00 1970 From: ira.weiny@intel.com Date: Fri, 09 Oct 2020 19:49:35 +0000 Subject: [PATCH RFC PKS/PMEM 00/58] PMEM: Introduce stray write protection for PMEM Message-Id: <20201009195033.3208459-1-ira.weiny@intel.com> List-Id: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: Andrew Morton , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Andy Lutomirski , Peter Zijlstra Cc: linux-aio@kvack.org, linux-efi@vger.kernel.org, kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-mmc@vger.kernel.org, Dave Hansen , dri-devel@lists.freedesktop.org, linux-mm@kvack.org, target-devel@vger.kernel.org, linux-mtd@lists.infradead.org, linux-kselftest@vger.kernel.org, samba-technical@lists.samba.org, Ira Weiny , ceph-devel@vger.kernel.org, drbd-dev@lists.linbit.com, devel@driverdev.osuosl.org, linux-cifs@vger.kernel.org, linux-nilfs@vger.kernel.org, linux-scsi@vger.kernel.org, linux-nvdimm@lists.01.org, linux-rdma@vger.kernel.org, x86@kernel.org, amd-gfx@lists.freedesktop.org, linux-afs@lists.infradead.org, cluster-devel@redhat.com, linux-cachefs@redhat.com, intel-wired-lan@lists.osuosl.org, xen-devel@lists.xenproject.org, linux-ext4@vger.kernel.org, Fenghua Yu , linux-um@lists.infradead.org, intel-gfx@lists.freedesktop.org, ecryptfs@vger.kernel.org, linux-erofs@lists.ozlabs.org, reiserfs-devel@vger.kernel.org, linux-block@vger.kernel.org, linux-bcache@vger.kernel.org, Dan Williams , io-uring@vger.kernel.org, linux-nfs@vger.kernel.org, linux-ntfs-dev@lists.sourceforge.net, netdev@vger.kernel.org, kexec@lists.infradead.org, linux-kernel@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, bpf@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-btrfs@vger.kernel.org From: Ira Weiny Should a stray write in the kernel occur persistent memory is affected more than regular memory. A write to the wrong area of memory could result in latent data corruption which will will persist after a reboot. PKS provides a nice way to restrict access to persistent memory kernel mappings, while providing fast access when needed. Since the last RFC[1] this patch set has grown quite a bit. It now depends on the core patches submitted separately. https://lore.kernel.org/lkml/20201009194258.3207172-1-ira.weiny@intel.com/ And contained in the git tree here: https://github.com/weiny2/linux-kernel/tree/pks-rfc-v3 However, functionally there is only 1 major change from the last RFC. Specifically, kmap() is most often used within a single thread in a 'map/do something/unmap' pattern. In fact this is the pattern used in ~90% of the callers of kmap(). This pattern works very well for the pmem use case and the testing which was done. However, there were another ~20-30 kmap users which do not follow this pattern. Some of them seem to expect the mapping to be 'global' while others require a detailed audit to be sure.[2][3] While we don't anticipate global mappings to pmem there is a danger in changing the semantics of kmap(). Effectively, this would cause an unresolved page fault with little to no information about why. There were a number of options considered. 1) Attempt to change all the thread local kmap() calls to kmap_atomic() 2) Introduce a flags parameter to kmap() to indicate if the mapping should be global or not 3) Change ~20-30 call sites to 'kmap_global()' to indicate that they require a global mapping of the pages 4) Change ~209 call sites to 'kmap_thread()' to indicate that the mapping is to be used within that thread of execution only Option 1 is simply not feasible kmap_atomic() is not the same semantic as kmap() within a single tread. Option 2 would require all of the call sites of kmap() to change. Option 3 seems like a good minimal change but there is a danger that new code may miss the semantic change of kmap() and not get the behavior intended for future users. Therefore, option #4 was chosen. To handle the global PKRS state in the most efficient manner possible. We lazily override the thread specific PKRS key value only when needed because we anticipate PKS to not be needed will not be needed most of the time. And even when it is used 90% of the time it is a thread local call. [1] https://lore.kernel.org/lkml/20200717072056.73134-1-ira.weiny@intel.com/ [2] The following list of callers continue calling kmap() (utilizing the global PKRS). It would be nice if more of them could be converted to kmap_thread() drivers/firewire/net.c: ptr = kmap(dev->broadcast_rcv_buffer.pages[u]); drivers/gpu/drm/i915/gem/i915_gem_pages.c: return kmap(sg_page(sgt->sgl)); drivers/gpu/drm/ttm/ttm_bo_util.c: map->virtual = kmap(map->page); drivers/infiniband/hw/qib/qib_user_sdma.c: mpage = kmap(page); drivers/misc/vmw_vmci/vmci_host.c: context->notify = kmap(context->notify_page) + (uva & (PAGE_SIZE - 1)); drivers/misc/xilinx_sdfec.c: addr = kmap(pages[i]); drivers/mmc/host/usdhi6rol0.c: host->pg.mapped = kmap(host->pg.page); drivers/mmc/host/usdhi6rol0.c: host->pg.mapped = kmap(host->pg.page); drivers/mmc/host/usdhi6rol0.c: host->pg.mapped = kmap(host->pg.page); drivers/nvme/target/tcp.c: iov->iov_base = kmap(sg_page(sg)) + sg->offset + sg_offset; drivers/scsi/libiscsi_tcp.c: segment->sg_mapped = kmap(sg_page(sg)); drivers/target/iscsi/iscsi_target.c: iov[i].iov_base = kmap(sg_page(sg)) + sg->offset + page_off; drivers/target/target_core_transport.c: return kmap(sg_page(sg)) + sg->offset; fs/btrfs/check-integrity.c: block_ctx->datav[i] = kmap(block_ctx->pagev[i]); fs/ceph/dir.c: cache_ctl->dentries = kmap(cache_ctl->page); fs/ceph/inode.c: ctl->dentries = kmap(ctl->page); fs/erofs/zpvec.h: kmap_atomic(ctor->curr) : kmap(ctor->curr); lib/scatterlist.c: miter->addr = kmap(miter->page) + miter->__offset; net/ceph/pagelist.c: pl->mapped_tail = kmap(page); net/ceph/pagelist.c: pl->mapped_tail = kmap(page); virt/kvm/kvm_main.c: hva = kmap(page); [3] The following appear to follow the same pattern as ext2 which was converted after some code audit. So I _think_ they too could be converted to k[un]map_thread(). fs/freevxfs/vxfs_subr.c|75| kmap(pp); fs/jfs/jfs_metapage.c|102| kmap(page); fs/jfs/jfs_metapage.c|156| kmap(page); fs/minix/dir.c|72| kmap(page); fs/nilfs2/dir.c|195| kmap(page); fs/nilfs2/ifile.h|24| void *kaddr = kmap(ibh->b_page); fs/ntfs/aops.h|78| kmap(page); fs/ntfs/compress.c|574| kmap(page); fs/qnx6/dir.c|32| kmap(page); fs/qnx6/dir.c|58| kmap(*p = page); fs/qnx6/inode.c|190| kmap(page); fs/qnx6/inode.c|557| kmap(page); fs/reiserfs/inode.c|2397| kmap(bh_result->b_page); fs/reiserfs/xattr.c|444| kmap(page); fs/sysv/dir.c|60| kmap(page); fs/sysv/dir.c|262| kmap(page); fs/ufs/dir.c|194| kmap(page); fs/ufs/dir.c|562| kmap(page); Ira Weiny (58): x86/pks: Add a global pkrs option x86/pks/test: Add testing for global option memremap: Add zone device access protection kmap: Add stray access protection for device pages kmap: Introduce k[un]map_thread kmap: Introduce k[un]map_thread debugging drivers/drbd: Utilize new kmap_thread() drivers/firmware_loader: Utilize new kmap_thread() drivers/gpu: Utilize new kmap_thread() drivers/rdma: Utilize new kmap_thread() drivers/net: Utilize new kmap_thread() fs/afs: Utilize new kmap_thread() fs/btrfs: Utilize new kmap_thread() fs/cifs: Utilize new kmap_thread() fs/ecryptfs: Utilize new kmap_thread() fs/gfs2: Utilize new kmap_thread() fs/nilfs2: Utilize new kmap_thread() fs/hfs: Utilize new kmap_thread() fs/hfsplus: Utilize new kmap_thread() fs/jffs2: Utilize new kmap_thread() fs/nfs: Utilize new kmap_thread() fs/f2fs: Utilize new kmap_thread() fs/fuse: Utilize new kmap_thread() fs/freevxfs: Utilize new kmap_thread() fs/reiserfs: Utilize new kmap_thread() fs/zonefs: Utilize new kmap_thread() fs/ubifs: Utilize new kmap_thread() fs/cachefiles: Utilize new kmap_thread() fs/ntfs: Utilize new kmap_thread() fs/romfs: Utilize new kmap_thread() fs/vboxsf: Utilize new kmap_thread() fs/hostfs: Utilize new kmap_thread() fs/cramfs: Utilize new kmap_thread() fs/erofs: Utilize new kmap_thread() fs: Utilize new kmap_thread() fs/ext2: Use ext2_put_page fs/ext2: Utilize new kmap_thread() fs/isofs: Utilize new kmap_thread() fs/jffs2: Utilize new kmap_thread() net: Utilize new kmap_thread() drivers/target: Utilize new kmap_thread() drivers/scsi: Utilize new kmap_thread() drivers/mmc: Utilize new kmap_thread() drivers/xen: Utilize new kmap_thread() drivers/firmware: Utilize new kmap_thread() drives/staging: Utilize new kmap_thread() drivers/mtd: Utilize new kmap_thread() drivers/md: Utilize new kmap_thread() drivers/misc: Utilize new kmap_thread() drivers/android: Utilize new kmap_thread() kernel: Utilize new kmap_thread() mm: Utilize new kmap_thread() lib: Utilize new kmap_thread() powerpc: Utilize new kmap_thread() samples: Utilize new kmap_thread() dax: Stray access protection for dax_direct_access() nvdimm/pmem: Stray access protection for pmem->virt_addr [dax|pmem]: Enable stray access protection Documentation/core-api/protection-keys.rst | 11 +- arch/powerpc/mm/mem.c | 4 +- arch/x86/entry/common.c | 28 +++ arch/x86/include/asm/pkeys.h | 6 +- arch/x86/include/asm/pkeys_common.h | 8 +- arch/x86/kernel/process.c | 74 ++++++- arch/x86/mm/fault.c | 193 ++++++++++++++---- arch/x86/mm/pkeys.c | 88 ++++++-- drivers/android/binder_alloc.c | 4 +- drivers/base/firmware_loader/fallback.c | 4 +- drivers/base/firmware_loader/main.c | 4 +- drivers/block/drbd/drbd_main.c | 4 +- drivers/block/drbd/drbd_receiver.c | 12 +- drivers/dax/device.c | 2 + drivers/dax/super.c | 2 + drivers/firmware/efi/capsule-loader.c | 6 +- drivers/firmware/efi/capsule.c | 4 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 12 +- drivers/gpu/drm/gma500/gma_display.c | 4 +- drivers/gpu/drm/gma500/mmu.c | 10 +- drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 4 +- .../drm/i915/gem/selftests/i915_gem_context.c | 4 +- .../drm/i915/gem/selftests/i915_gem_mman.c | 8 +- drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c | 4 +- drivers/gpu/drm/i915/gt/intel_gtt.c | 4 +- drivers/gpu/drm/i915/gt/shmem_utils.c | 4 +- drivers/gpu/drm/i915/i915_gem.c | 8 +- drivers/gpu/drm/i915/i915_gpu_error.c | 4 +- drivers/gpu/drm/i915/selftests/i915_perf.c | 4 +- drivers/gpu/drm/radeon/radeon_ttm.c | 4 +- drivers/infiniband/hw/hfi1/sdma.c | 4 +- drivers/infiniband/hw/i40iw/i40iw_cm.c | 10 +- drivers/infiniband/sw/siw/siw_qp_tx.c | 14 +- drivers/md/bcache/request.c | 4 +- drivers/misc/vmw_vmci/vmci_queue_pair.c | 12 +- drivers/mmc/host/mmc_spi.c | 4 +- drivers/mmc/host/sdricoh_cs.c | 4 +- drivers/mtd/mtd_blkdevs.c | 12 +- drivers/net/ethernet/intel/igb/igb_ethtool.c | 4 +- .../net/ethernet/intel/ixgbe/ixgbe_ethtool.c | 4 +- drivers/nvdimm/pmem.c | 6 + drivers/scsi/ipr.c | 8 +- drivers/scsi/pmcraid.c | 8 +- drivers/staging/rts5208/rtsx_transport.c | 4 +- drivers/target/target_core_iblock.c | 4 +- drivers/target/target_core_rd.c | 4 +- drivers/target/target_core_transport.c | 4 +- drivers/xen/gntalloc.c | 4 +- fs/afs/dir.c | 16 +- fs/afs/dir_edit.c | 16 +- fs/afs/mntpt.c | 4 +- fs/afs/write.c | 4 +- fs/aio.c | 4 +- fs/binfmt_elf.c | 4 +- fs/binfmt_elf_fdpic.c | 4 +- fs/btrfs/check-integrity.c | 4 +- fs/btrfs/compression.c | 4 +- fs/btrfs/inode.c | 16 +- fs/btrfs/lzo.c | 24 +-- fs/btrfs/raid56.c | 34 +-- fs/btrfs/reflink.c | 8 +- fs/btrfs/send.c | 4 +- fs/btrfs/zlib.c | 32 +-- fs/btrfs/zstd.c | 20 +- fs/cachefiles/rdwr.c | 4 +- fs/cifs/cifsencrypt.c | 6 +- fs/cifs/file.c | 16 +- fs/cifs/smb2ops.c | 8 +- fs/cramfs/inode.c | 10 +- fs/ecryptfs/crypto.c | 8 +- fs/ecryptfs/read_write.c | 8 +- fs/erofs/super.c | 4 +- fs/erofs/xattr.c | 4 +- fs/exec.c | 10 +- fs/ext2/dir.c | 8 +- fs/ext2/ext2.h | 8 + fs/ext2/namei.c | 15 +- fs/f2fs/f2fs.h | 8 +- fs/freevxfs/vxfs_immed.c | 4 +- fs/fuse/readdir.c | 4 +- fs/gfs2/bmap.c | 4 +- fs/gfs2/ops_fstype.c | 4 +- fs/hfs/bnode.c | 14 +- fs/hfs/btree.c | 20 +- fs/hfsplus/bitmap.c | 20 +- fs/hfsplus/bnode.c | 102 ++++----- fs/hfsplus/btree.c | 18 +- fs/hostfs/hostfs_kern.c | 12 +- fs/io_uring.c | 4 +- fs/isofs/compress.c | 4 +- fs/jffs2/file.c | 8 +- fs/jffs2/gc.c | 4 +- fs/nfs/dir.c | 20 +- fs/nilfs2/alloc.c | 34 +-- fs/nilfs2/cpfile.c | 4 +- fs/ntfs/aops.c | 4 +- fs/reiserfs/journal.c | 4 +- fs/romfs/super.c | 4 +- fs/splice.c | 4 +- fs/ubifs/file.c | 16 +- fs/vboxsf/file.c | 12 +- fs/zonefs/super.c | 4 +- include/linux/entry-common.h | 3 + include/linux/highmem.h | 63 +++++- include/linux/memremap.h | 1 + include/linux/mm.h | 43 ++++ include/linux/pkeys.h | 6 +- include/linux/sched.h | 8 + include/trace/events/kmap_thread.h | 56 +++++ init/init_task.c | 6 + kernel/fork.c | 18 ++ kernel/kexec_core.c | 8 +- lib/Kconfig.debug | 8 + lib/iov_iter.c | 12 +- lib/pks/pks_test.c | 138 +++++++++++-- lib/test_bpf.c | 4 +- lib/test_hmm.c | 8 +- mm/Kconfig | 13 ++ mm/debug.c | 23 +++ mm/memory.c | 8 +- mm/memremap.c | 90 ++++++++ mm/swapfile.c | 4 +- mm/userfaultfd.c | 4 +- net/ceph/messenger.c | 4 +- net/core/datagram.c | 4 +- net/core/sock.c | 8 +- net/ipv4/ip_output.c | 4 +- net/sunrpc/cache.c | 4 +- net/sunrpc/xdr.c | 8 +- net/tls/tls_device.c | 4 +- samples/vfio-mdev/mbochs.c | 4 +- 131 files changed, 1284 insertions(+), 565 deletions(-) create mode 100644 include/trace/events/kmap_thread.h -- 2.28.0.rc0.12.gb6a658bd00c9 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-11.5 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, MENTIONS_GIT_HOSTING,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B86FFC352B9 for ; Fri, 9 Oct 2020 19:50:59 +0000 (UTC) Received: from lists.sourceforge.net (lists.sourceforge.net [216.105.38.7]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 4DE2C223AB; Fri, 9 Oct 2020 19:50:59 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (1024-bit key) header.d=sourceforge.net header.i=@sourceforge.net header.b="Hg59QvUY"; dkim=fail reason="signature verification failed" (1024-bit key) header.d=sf.net header.i=@sf.net header.b="ZM0yfiMe" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 4DE2C223AB Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=linux-f2fs-devel-bounces@lists.sourceforge.net Received: from [127.0.0.1] (helo=sfs-ml-2.v29.lw.sourceforge.com) by sfs-ml-2.v29.lw.sourceforge.com with esmtp (Exim 4.90_1) (envelope-from ) id 1kQyPm-00015F-Se; Fri, 09 Oct 2020 19:50:54 +0000 Received: from [172.30.20.202] (helo=mx.sourceforge.net) by sfs-ml-2.v29.lw.sourceforge.com with esmtps (TLSv1.2:ECDHE-RSA-AES256-GCM-SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kQyPl-000154-LG; Fri, 09 Oct 2020 19:50:53 +0000 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=sourceforge.net; s=x; h=Content-Transfer-Encoding:MIME-Version:Message-Id: Date:Subject:Cc:To:From:Sender:Reply-To:Content-Type:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:In-Reply-To:References:List-Id:List-Help:List-Unsubscribe: List-Subscribe:List-Post:List-Owner:List-Archive; bh=JgAfa6S4YXBC9vUgHdLQF32bIXGV6hekKowij/31VR8=; b=Hg59QvUYRpPQRI0MFDxn6v4ODv 2hPhnenW5B9B5BX0QjrIwNojQU5PuVZZFr87N6nxEnUQ9mTjGAh1n0k4KxUAc/yE84rHCl2H1DkXI F3B1ezMl+ovZ0xUS632SzU+YwE6ITf0vIGXmnBakGtEbH165c+LnkH9cxN6E5HiGPca8=; DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=sf.net; s=x ; h=Content-Transfer-Encoding:MIME-Version:Message-Id:Date:Subject:Cc:To:From :Sender:Reply-To:Content-Type:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:In-Reply-To: References:List-Id:List-Help:List-Unsubscribe:List-Subscribe:List-Post: List-Owner:List-Archive; bh=JgAfa6S4YXBC9vUgHdLQF32bIXGV6hekKowij/31VR8=; b=Z M0yfiMel5r32JmvykyuJbQMv8E7wvyADhi9HDVrkVHarTvN/K2BHep13JwPw4/QOLRuIZtHuzkpO2 i2CjDbxdtyTIj+bxSUGWtb6K/UYK33YonPMuhIyW9AXAj/3JMlMNE1qotwLY7+2KUcQ54TetsB43S KlEOlVhb6TUSDE0s=; Received: from mga05.intel.com ([192.55.52.43]) by sfi-mx-3.v28.lw.sourceforge.com with esmtps (TLSv1.2:ECDHE-RSA-AES256-GCM-SHA384:256) (Exim 4.92.2) id 1kQyPf-004aBY-Fa; Fri, 09 Oct 2020 19:50:53 +0000 IronPort-SDR: qLImhbyl966SiSIfJmoC3/BdGrbyA8EYIiK4nmXCWBTgX9/xJ5+/s0NtDXfwqbUjQv4kt8rv9x rHSyE8foxrPg== X-IronPort-AV: E=McAfee;i="6000,8403,9769"; a="250225857" X-IronPort-AV: E=Sophos;i="5.77,355,1596524400"; d="scan'208";a="250225857" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Oct 2020 12:50:41 -0700 IronPort-SDR: wgvSBhlinBwJf2eRaqYm1d4mOPDeheaaBRmvXZpaWhx0BsPjq5MOqCRmfglsuVIrge+HvLIvQ5 IT741lyNdN2Q== X-IronPort-AV: E=Sophos;i="5.77,355,1596524400"; d="scan'208";a="419536654" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Oct 2020 12:50:41 -0700 From: ira.weiny@intel.com To: Andrew Morton , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Andy Lutomirski , Peter Zijlstra Date: Fri, 9 Oct 2020 12:49:35 -0700 Message-Id: <20201009195033.3208459-1-ira.weiny@intel.com> X-Mailer: git-send-email 2.28.0.rc0.12.gb6a658bd00c9 MIME-Version: 1.0 X-Headers-End: 1kQyPf-004aBY-Fa Subject: [f2fs-dev] [PATCH RFC PKS/PMEM 00/58] PMEM: Introduce stray write protection for PMEM X-BeenThere: linux-f2fs-devel@lists.sourceforge.net X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: linux-aio@kvack.org, linux-efi@vger.kernel.org, kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-mmc@vger.kernel.org, Dave Hansen , dri-devel@lists.freedesktop.org, linux-mm@kvack.org, target-devel@vger.kernel.org, linux-mtd@lists.infradead.org, linux-kselftest@vger.kernel.org, samba-technical@lists.samba.org, Ira Weiny , ceph-devel@vger.kernel.org, drbd-dev@lists.linbit.com, devel@driverdev.osuosl.org, linux-cifs@vger.kernel.org, linux-nilfs@vger.kernel.org, linux-scsi@vger.kernel.org, linux-nvdimm@lists.01.org, linux-rdma@vger.kernel.org, x86@kernel.org, amd-gfx@lists.freedesktop.org, linux-afs@lists.infradead.org, cluster-devel@redhat.com, linux-cachefs@redhat.com, intel-wired-lan@lists.osuosl.org, xen-devel@lists.xenproject.org, linux-ext4@vger.kernel.org, Fenghua Yu , linux-um@lists.infradead.org, intel-gfx@lists.freedesktop.org, ecryptfs@vger.kernel.org, linux-erofs@lists.ozlabs.org, reiserfs-devel@vger.kernel.org, linux-block@vger.kernel.org, linux-bcache@vger.kernel.org, Dan Williams , io-uring@vger.kernel.org, linux-nfs@vger.kernel.org, linux-ntfs-dev@lists.sourceforge.net, netdev@vger.kernel.org, kexec@lists.infradead.org, linux-kernel@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, bpf@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-btrfs@vger.kernel.org Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Errors-To: linux-f2fs-devel-bounces@lists.sourceforge.net From: Ira Weiny Should a stray write in the kernel occur persistent memory is affected more than regular memory. A write to the wrong area of memory could result in latent data corruption which will will persist after a reboot. PKS provides a nice way to restrict access to persistent memory kernel mappings, while providing fast access when needed. Since the last RFC[1] this patch set has grown quite a bit. It now depends on the core patches submitted separately. https://lore.kernel.org/lkml/20201009194258.3207172-1-ira.weiny@intel.com/ And contained in the git tree here: https://github.com/weiny2/linux-kernel/tree/pks-rfc-v3 However, functionally there is only 1 major change from the last RFC. Specifically, kmap() is most often used within a single thread in a 'map/do something/unmap' pattern. In fact this is the pattern used in ~90% of the callers of kmap(). This pattern works very well for the pmem use case and the testing which was done. However, there were another ~20-30 kmap users which do not follow this pattern. Some of them seem to expect the mapping to be 'global' while others require a detailed audit to be sure.[2][3] While we don't anticipate global mappings to pmem there is a danger in changing the semantics of kmap(). Effectively, this would cause an unresolved page fault with little to no information about why. There were a number of options considered. 1) Attempt to change all the thread local kmap() calls to kmap_atomic() 2) Introduce a flags parameter to kmap() to indicate if the mapping should be global or not 3) Change ~20-30 call sites to 'kmap_global()' to indicate that they require a global mapping of the pages 4) Change ~209 call sites to 'kmap_thread()' to indicate that the mapping is to be used within that thread of execution only Option 1 is simply not feasible kmap_atomic() is not the same semantic as kmap() within a single tread. Option 2 would require all of the call sites of kmap() to change. Option 3 seems like a good minimal change but there is a danger that new code may miss the semantic change of kmap() and not get the behavior intended for future users. Therefore, option #4 was chosen. To handle the global PKRS state in the most efficient manner possible. We lazily override the thread specific PKRS key value only when needed because we anticipate PKS to not be needed will not be needed most of the time. And even when it is used 90% of the time it is a thread local call. [1] https://lore.kernel.org/lkml/20200717072056.73134-1-ira.weiny@intel.com/ [2] The following list of callers continue calling kmap() (utilizing the global PKRS). It would be nice if more of them could be converted to kmap_thread() drivers/firewire/net.c: ptr = kmap(dev->broadcast_rcv_buffer.pages[u]); drivers/gpu/drm/i915/gem/i915_gem_pages.c: return kmap(sg_page(sgt->sgl)); drivers/gpu/drm/ttm/ttm_bo_util.c: map->virtual = kmap(map->page); drivers/infiniband/hw/qib/qib_user_sdma.c: mpage = kmap(page); drivers/misc/vmw_vmci/vmci_host.c: context->notify = kmap(context->notify_page) + (uva & (PAGE_SIZE - 1)); drivers/misc/xilinx_sdfec.c: addr = kmap(pages[i]); drivers/mmc/host/usdhi6rol0.c: host->pg.mapped = kmap(host->pg.page); drivers/mmc/host/usdhi6rol0.c: host->pg.mapped = kmap(host->pg.page); drivers/mmc/host/usdhi6rol0.c: host->pg.mapped = kmap(host->pg.page); drivers/nvme/target/tcp.c: iov->iov_base = kmap(sg_page(sg)) + sg->offset + sg_offset; drivers/scsi/libiscsi_tcp.c: segment->sg_mapped = kmap(sg_page(sg)); drivers/target/iscsi/iscsi_target.c: iov[i].iov_base = kmap(sg_page(sg)) + sg->offset + page_off; drivers/target/target_core_transport.c: return kmap(sg_page(sg)) + sg->offset; fs/btrfs/check-integrity.c: block_ctx->datav[i] = kmap(block_ctx->pagev[i]); fs/ceph/dir.c: cache_ctl->dentries = kmap(cache_ctl->page); fs/ceph/inode.c: ctl->dentries = kmap(ctl->page); fs/erofs/zpvec.h: kmap_atomic(ctor->curr) : kmap(ctor->curr); lib/scatterlist.c: miter->addr = kmap(miter->page) + miter->__offset; net/ceph/pagelist.c: pl->mapped_tail = kmap(page); net/ceph/pagelist.c: pl->mapped_tail = kmap(page); virt/kvm/kvm_main.c: hva = kmap(page); [3] The following appear to follow the same pattern as ext2 which was converted after some code audit. So I _think_ they too could be converted to k[un]map_thread(). fs/freevxfs/vxfs_subr.c|75| kmap(pp); fs/jfs/jfs_metapage.c|102| kmap(page); fs/jfs/jfs_metapage.c|156| kmap(page); fs/minix/dir.c|72| kmap(page); fs/nilfs2/dir.c|195| kmap(page); fs/nilfs2/ifile.h|24| void *kaddr = kmap(ibh->b_page); fs/ntfs/aops.h|78| kmap(page); fs/ntfs/compress.c|574| kmap(page); fs/qnx6/dir.c|32| kmap(page); fs/qnx6/dir.c|58| kmap(*p = page); fs/qnx6/inode.c|190| kmap(page); fs/qnx6/inode.c|557| kmap(page); fs/reiserfs/inode.c|2397| kmap(bh_result->b_page); fs/reiserfs/xattr.c|444| kmap(page); fs/sysv/dir.c|60| kmap(page); fs/sysv/dir.c|262| kmap(page); fs/ufs/dir.c|194| kmap(page); fs/ufs/dir.c|562| kmap(page); Ira Weiny (58): x86/pks: Add a global pkrs option x86/pks/test: Add testing for global option memremap: Add zone device access protection kmap: Add stray access protection for device pages kmap: Introduce k[un]map_thread kmap: Introduce k[un]map_thread debugging drivers/drbd: Utilize new kmap_thread() drivers/firmware_loader: Utilize new kmap_thread() drivers/gpu: Utilize new kmap_thread() drivers/rdma: Utilize new kmap_thread() drivers/net: Utilize new kmap_thread() fs/afs: Utilize new kmap_thread() fs/btrfs: Utilize new kmap_thread() fs/cifs: Utilize new kmap_thread() fs/ecryptfs: Utilize new kmap_thread() fs/gfs2: Utilize new kmap_thread() fs/nilfs2: Utilize new kmap_thread() fs/hfs: Utilize new kmap_thread() fs/hfsplus: Utilize new kmap_thread() fs/jffs2: Utilize new kmap_thread() fs/nfs: Utilize new kmap_thread() fs/f2fs: Utilize new kmap_thread() fs/fuse: Utilize new kmap_thread() fs/freevxfs: Utilize new kmap_thread() fs/reiserfs: Utilize new kmap_thread() fs/zonefs: Utilize new kmap_thread() fs/ubifs: Utilize new kmap_thread() fs/cachefiles: Utilize new kmap_thread() fs/ntfs: Utilize new kmap_thread() fs/romfs: Utilize new kmap_thread() fs/vboxsf: Utilize new kmap_thread() fs/hostfs: Utilize new kmap_thread() fs/cramfs: Utilize new kmap_thread() fs/erofs: Utilize new kmap_thread() fs: Utilize new kmap_thread() fs/ext2: Use ext2_put_page fs/ext2: Utilize new kmap_thread() fs/isofs: Utilize new kmap_thread() fs/jffs2: Utilize new kmap_thread() net: Utilize new kmap_thread() drivers/target: Utilize new kmap_thread() drivers/scsi: Utilize new kmap_thread() drivers/mmc: Utilize new kmap_thread() drivers/xen: Utilize new kmap_thread() drivers/firmware: Utilize new kmap_thread() drives/staging: Utilize new kmap_thread() drivers/mtd: Utilize new kmap_thread() drivers/md: Utilize new kmap_thread() drivers/misc: Utilize new kmap_thread() drivers/android: Utilize new kmap_thread() kernel: Utilize new kmap_thread() mm: Utilize new kmap_thread() lib: Utilize new kmap_thread() powerpc: Utilize new kmap_thread() samples: Utilize new kmap_thread() dax: Stray access protection for dax_direct_access() nvdimm/pmem: Stray access protection for pmem->virt_addr [dax|pmem]: Enable stray access protection Documentation/core-api/protection-keys.rst | 11 +- arch/powerpc/mm/mem.c | 4 +- arch/x86/entry/common.c | 28 +++ arch/x86/include/asm/pkeys.h | 6 +- arch/x86/include/asm/pkeys_common.h | 8 +- arch/x86/kernel/process.c | 74 ++++++- arch/x86/mm/fault.c | 193 ++++++++++++++---- arch/x86/mm/pkeys.c | 88 ++++++-- drivers/android/binder_alloc.c | 4 +- drivers/base/firmware_loader/fallback.c | 4 +- drivers/base/firmware_loader/main.c | 4 +- drivers/block/drbd/drbd_main.c | 4 +- drivers/block/drbd/drbd_receiver.c | 12 +- drivers/dax/device.c | 2 + drivers/dax/super.c | 2 + drivers/firmware/efi/capsule-loader.c | 6 +- drivers/firmware/efi/capsule.c | 4 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 12 +- drivers/gpu/drm/gma500/gma_display.c | 4 +- drivers/gpu/drm/gma500/mmu.c | 10 +- drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 4 +- .../drm/i915/gem/selftests/i915_gem_context.c | 4 +- .../drm/i915/gem/selftests/i915_gem_mman.c | 8 +- drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c | 4 +- drivers/gpu/drm/i915/gt/intel_gtt.c | 4 +- drivers/gpu/drm/i915/gt/shmem_utils.c | 4 +- drivers/gpu/drm/i915/i915_gem.c | 8 +- drivers/gpu/drm/i915/i915_gpu_error.c | 4 +- drivers/gpu/drm/i915/selftests/i915_perf.c | 4 +- drivers/gpu/drm/radeon/radeon_ttm.c | 4 +- drivers/infiniband/hw/hfi1/sdma.c | 4 +- drivers/infiniband/hw/i40iw/i40iw_cm.c | 10 +- drivers/infiniband/sw/siw/siw_qp_tx.c | 14 +- drivers/md/bcache/request.c | 4 +- drivers/misc/vmw_vmci/vmci_queue_pair.c | 12 +- drivers/mmc/host/mmc_spi.c | 4 +- drivers/mmc/host/sdricoh_cs.c | 4 +- drivers/mtd/mtd_blkdevs.c | 12 +- drivers/net/ethernet/intel/igb/igb_ethtool.c | 4 +- .../net/ethernet/intel/ixgbe/ixgbe_ethtool.c | 4 +- drivers/nvdimm/pmem.c | 6 + drivers/scsi/ipr.c | 8 +- drivers/scsi/pmcraid.c | 8 +- drivers/staging/rts5208/rtsx_transport.c | 4 +- drivers/target/target_core_iblock.c | 4 +- drivers/target/target_core_rd.c | 4 +- drivers/target/target_core_transport.c | 4 +- drivers/xen/gntalloc.c | 4 +- fs/afs/dir.c | 16 +- fs/afs/dir_edit.c | 16 +- fs/afs/mntpt.c | 4 +- fs/afs/write.c | 4 +- fs/aio.c | 4 +- fs/binfmt_elf.c | 4 +- fs/binfmt_elf_fdpic.c | 4 +- fs/btrfs/check-integrity.c | 4 +- fs/btrfs/compression.c | 4 +- fs/btrfs/inode.c | 16 +- fs/btrfs/lzo.c | 24 +-- fs/btrfs/raid56.c | 34 +-- fs/btrfs/reflink.c | 8 +- fs/btrfs/send.c | 4 +- fs/btrfs/zlib.c | 32 +-- fs/btrfs/zstd.c | 20 +- fs/cachefiles/rdwr.c | 4 +- fs/cifs/cifsencrypt.c | 6 +- fs/cifs/file.c | 16 +- fs/cifs/smb2ops.c | 8 +- fs/cramfs/inode.c | 10 +- fs/ecryptfs/crypto.c | 8 +- fs/ecryptfs/read_write.c | 8 +- fs/erofs/super.c | 4 +- fs/erofs/xattr.c | 4 +- fs/exec.c | 10 +- fs/ext2/dir.c | 8 +- fs/ext2/ext2.h | 8 + fs/ext2/namei.c | 15 +- fs/f2fs/f2fs.h | 8 +- fs/freevxfs/vxfs_immed.c | 4 +- fs/fuse/readdir.c | 4 +- fs/gfs2/bmap.c | 4 +- fs/gfs2/ops_fstype.c | 4 +- fs/hfs/bnode.c | 14 +- fs/hfs/btree.c | 20 +- fs/hfsplus/bitmap.c | 20 +- fs/hfsplus/bnode.c | 102 ++++----- fs/hfsplus/btree.c | 18 +- fs/hostfs/hostfs_kern.c | 12 +- fs/io_uring.c | 4 +- fs/isofs/compress.c | 4 +- fs/jffs2/file.c | 8 +- fs/jffs2/gc.c | 4 +- fs/nfs/dir.c | 20 +- fs/nilfs2/alloc.c | 34 +-- fs/nilfs2/cpfile.c | 4 +- fs/ntfs/aops.c | 4 +- fs/reiserfs/journal.c | 4 +- fs/romfs/super.c | 4 +- fs/splice.c | 4 +- fs/ubifs/file.c | 16 +- fs/vboxsf/file.c | 12 +- fs/zonefs/super.c | 4 +- include/linux/entry-common.h | 3 + include/linux/highmem.h | 63 +++++- include/linux/memremap.h | 1 + include/linux/mm.h | 43 ++++ include/linux/pkeys.h | 6 +- include/linux/sched.h | 8 + include/trace/events/kmap_thread.h | 56 +++++ init/init_task.c | 6 + kernel/fork.c | 18 ++ kernel/kexec_core.c | 8 +- lib/Kconfig.debug | 8 + lib/iov_iter.c | 12 +- lib/pks/pks_test.c | 138 +++++++++++-- lib/test_bpf.c | 4 +- lib/test_hmm.c | 8 +- mm/Kconfig | 13 ++ mm/debug.c | 23 +++ mm/memory.c | 8 +- mm/memremap.c | 90 ++++++++ mm/swapfile.c | 4 +- mm/userfaultfd.c | 4 +- net/ceph/messenger.c | 4 +- net/core/datagram.c | 4 +- net/core/sock.c | 8 +- net/ipv4/ip_output.c | 4 +- net/sunrpc/cache.c | 4 +- net/sunrpc/xdr.c | 8 +- net/tls/tls_device.c | 4 +- samples/vfio-mdev/mbochs.c | 4 +- 131 files changed, 1284 insertions(+), 565 deletions(-) create mode 100644 include/trace/events/kmap_thread.h -- 2.28.0.rc0.12.gb6a658bd00c9 _______________________________________________ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-11.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,MENTIONS_GIT_HOSTING, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1FDF8C2D0A3 for ; Fri, 9 Oct 2020 19:50:49 +0000 (UTC) Received: from whitealder.osuosl.org (smtp1.osuosl.org [140.211.166.138]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id B487F222BA for ; Fri, 9 Oct 2020 19:50:48 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B487F222BA Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=driverdev-devel-bounces@linuxdriverproject.org Received: from localhost (localhost [127.0.0.1]) by whitealder.osuosl.org (Postfix) with ESMTP id 6FC38878A5; Fri, 9 Oct 2020 19:50:48 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from whitealder.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id x+xcgXJVQxPV; Fri, 9 Oct 2020 19:50:45 +0000 (UTC) Received: from ash.osuosl.org (ash.osuosl.org [140.211.166.34]) by whitealder.osuosl.org (Postfix) with ESMTP id E195387750; Fri, 9 Oct 2020 19:50:45 +0000 (UTC) Received: from fraxinus.osuosl.org (smtp4.osuosl.org [140.211.166.137]) by ash.osuosl.org (Postfix) with ESMTP id A0FCC1BF2F3 for ; Fri, 9 Oct 2020 19:50:44 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by fraxinus.osuosl.org (Postfix) with ESMTP id 9CBB586E25 for ; Fri, 9 Oct 2020 19:50:44 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from fraxinus.osuosl.org ([127.0.0.1]) by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id EWdNtUxJ5Sbw for ; Fri, 9 Oct 2020 19:50:43 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6 Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by fraxinus.osuosl.org (Postfix) with ESMTPS id 0E64586C29 for ; Fri, 9 Oct 2020 19:50:42 +0000 (UTC) IronPort-SDR: pJ+EqYVdTpWKpE0IoHGOizlv1wyb8BRM/ZnBSIJuqbewvHN9nAZEuYnO3Dir0o4OBoXbP3GQ2R 0rT0oGWHzVhQ== X-IronPort-AV: E=McAfee;i="6000,8403,9769"; a="229714870" X-IronPort-AV: E=Sophos;i="5.77,355,1596524400"; d="scan'208";a="229714870" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Oct 2020 12:50:41 -0700 IronPort-SDR: wgvSBhlinBwJf2eRaqYm1d4mOPDeheaaBRmvXZpaWhx0BsPjq5MOqCRmfglsuVIrge+HvLIvQ5 IT741lyNdN2Q== X-IronPort-AV: E=Sophos;i="5.77,355,1596524400"; d="scan'208";a="419536654" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Oct 2020 12:50:41 -0700 From: ira.weiny@intel.com To: Andrew Morton , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Andy Lutomirski , Peter Zijlstra Subject: [PATCH RFC PKS/PMEM 00/58] PMEM: Introduce stray write protection for PMEM Date: Fri, 9 Oct 2020 12:49:35 -0700 Message-Id: <20201009195033.3208459-1-ira.weiny@intel.com> X-Mailer: git-send-email 2.28.0.rc0.12.gb6a658bd00c9 MIME-Version: 1.0 X-BeenThere: driverdev-devel@linuxdriverproject.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux Driver Project Developer List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: linux-aio@kvack.org, linux-efi@vger.kernel.org, kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-mmc@vger.kernel.org, Dave Hansen , dri-devel@lists.freedesktop.org, linux-mm@kvack.org, target-devel@vger.kernel.org, linux-mtd@lists.infradead.org, linux-kselftest@vger.kernel.org, samba-technical@lists.samba.org, ceph-devel@vger.kernel.org, drbd-dev@lists.linbit.com, devel@driverdev.osuosl.org, linux-cifs@vger.kernel.org, linux-nilfs@vger.kernel.org, linux-scsi@vger.kernel.org, linux-nvdimm@lists.01.org, linux-rdma@vger.kernel.org, x86@kernel.org, amd-gfx@lists.freedesktop.org, linux-afs@lists.infradead.org, cluster-devel@redhat.com, linux-cachefs@redhat.com, intel-wired-lan@lists.osuosl.org, xen-devel@lists.xenproject.org, linux-ext4@vger.kernel.org, Fenghua Yu , linux-um@lists.infradead.org, intel-gfx@lists.freedesktop.org, ecryptfs@vger.kernel.org, linux-erofs@lists.ozlabs.org, reiserfs-devel@vger.kernel.org, linux-block@vger.kernel.org, linux-bcache@vger.kernel.org, Dan Williams , io-uring@vger.kernel.org, linux-nfs@vger.kernel.org, linux-ntfs-dev@lists.sourceforge.net, netdev@vger.kernel.org, kexec@lists.infradead.org, linux-kernel@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, bpf@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-btrfs@vger.kernel.org Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Errors-To: driverdev-devel-bounces@linuxdriverproject.org Sender: "devel" From: Ira Weiny Should a stray write in the kernel occur persistent memory is affected more than regular memory. A write to the wrong area of memory could result in latent data corruption which will will persist after a reboot. PKS provides a nice way to restrict access to persistent memory kernel mappings, while providing fast access when needed. Since the last RFC[1] this patch set has grown quite a bit. It now depends on the core patches submitted separately. https://lore.kernel.org/lkml/20201009194258.3207172-1-ira.weiny@intel.com/ And contained in the git tree here: https://github.com/weiny2/linux-kernel/tree/pks-rfc-v3 However, functionally there is only 1 major change from the last RFC. Specifically, kmap() is most often used within a single thread in a 'map/do something/unmap' pattern. In fact this is the pattern used in ~90% of the callers of kmap(). This pattern works very well for the pmem use case and the testing which was done. However, there were another ~20-30 kmap users which do not follow this pattern. Some of them seem to expect the mapping to be 'global' while others require a detailed audit to be sure.[2][3] While we don't anticipate global mappings to pmem there is a danger in changing the semantics of kmap(). Effectively, this would cause an unresolved page fault with little to no information about why. There were a number of options considered. 1) Attempt to change all the thread local kmap() calls to kmap_atomic() 2) Introduce a flags parameter to kmap() to indicate if the mapping should be global or not 3) Change ~20-30 call sites to 'kmap_global()' to indicate that they require a global mapping of the pages 4) Change ~209 call sites to 'kmap_thread()' to indicate that the mapping is to be used within that thread of execution only Option 1 is simply not feasible kmap_atomic() is not the same semantic as kmap() within a single tread. Option 2 would require all of the call sites of kmap() to change. Option 3 seems like a good minimal change but there is a danger that new code may miss the semantic change of kmap() and not get the behavior intended for future users. Therefore, option #4 was chosen. To handle the global PKRS state in the most efficient manner possible. We lazily override the thread specific PKRS key value only when needed because we anticipate PKS to not be needed will not be needed most of the time. And even when it is used 90% of the time it is a thread local call. [1] https://lore.kernel.org/lkml/20200717072056.73134-1-ira.weiny@intel.com/ [2] The following list of callers continue calling kmap() (utilizing the global PKRS). It would be nice if more of them could be converted to kmap_thread() drivers/firewire/net.c: ptr = kmap(dev->broadcast_rcv_buffer.pages[u]); drivers/gpu/drm/i915/gem/i915_gem_pages.c: return kmap(sg_page(sgt->sgl)); drivers/gpu/drm/ttm/ttm_bo_util.c: map->virtual = kmap(map->page); drivers/infiniband/hw/qib/qib_user_sdma.c: mpage = kmap(page); drivers/misc/vmw_vmci/vmci_host.c: context->notify = kmap(context->notify_page) + (uva & (PAGE_SIZE - 1)); drivers/misc/xilinx_sdfec.c: addr = kmap(pages[i]); drivers/mmc/host/usdhi6rol0.c: host->pg.mapped = kmap(host->pg.page); drivers/mmc/host/usdhi6rol0.c: host->pg.mapped = kmap(host->pg.page); drivers/mmc/host/usdhi6rol0.c: host->pg.mapped = kmap(host->pg.page); drivers/nvme/target/tcp.c: iov->iov_base = kmap(sg_page(sg)) + sg->offset + sg_offset; drivers/scsi/libiscsi_tcp.c: segment->sg_mapped = kmap(sg_page(sg)); drivers/target/iscsi/iscsi_target.c: iov[i].iov_base = kmap(sg_page(sg)) + sg->offset + page_off; drivers/target/target_core_transport.c: return kmap(sg_page(sg)) + sg->offset; fs/btrfs/check-integrity.c: block_ctx->datav[i] = kmap(block_ctx->pagev[i]); fs/ceph/dir.c: cache_ctl->dentries = kmap(cache_ctl->page); fs/ceph/inode.c: ctl->dentries = kmap(ctl->page); fs/erofs/zpvec.h: kmap_atomic(ctor->curr) : kmap(ctor->curr); lib/scatterlist.c: miter->addr = kmap(miter->page) + miter->__offset; net/ceph/pagelist.c: pl->mapped_tail = kmap(page); net/ceph/pagelist.c: pl->mapped_tail = kmap(page); virt/kvm/kvm_main.c: hva = kmap(page); [3] The following appear to follow the same pattern as ext2 which was converted after some code audit. So I _think_ they too could be converted to k[un]map_thread(). fs/freevxfs/vxfs_subr.c|75| kmap(pp); fs/jfs/jfs_metapage.c|102| kmap(page); fs/jfs/jfs_metapage.c|156| kmap(page); fs/minix/dir.c|72| kmap(page); fs/nilfs2/dir.c|195| kmap(page); fs/nilfs2/ifile.h|24| void *kaddr = kmap(ibh->b_page); fs/ntfs/aops.h|78| kmap(page); fs/ntfs/compress.c|574| kmap(page); fs/qnx6/dir.c|32| kmap(page); fs/qnx6/dir.c|58| kmap(*p = page); fs/qnx6/inode.c|190| kmap(page); fs/qnx6/inode.c|557| kmap(page); fs/reiserfs/inode.c|2397| kmap(bh_result->b_page); fs/reiserfs/xattr.c|444| kmap(page); fs/sysv/dir.c|60| kmap(page); fs/sysv/dir.c|262| kmap(page); fs/ufs/dir.c|194| kmap(page); fs/ufs/dir.c|562| kmap(page); Ira Weiny (58): x86/pks: Add a global pkrs option x86/pks/test: Add testing for global option memremap: Add zone device access protection kmap: Add stray access protection for device pages kmap: Introduce k[un]map_thread kmap: Introduce k[un]map_thread debugging drivers/drbd: Utilize new kmap_thread() drivers/firmware_loader: Utilize new kmap_thread() drivers/gpu: Utilize new kmap_thread() drivers/rdma: Utilize new kmap_thread() drivers/net: Utilize new kmap_thread() fs/afs: Utilize new kmap_thread() fs/btrfs: Utilize new kmap_thread() fs/cifs: Utilize new kmap_thread() fs/ecryptfs: Utilize new kmap_thread() fs/gfs2: Utilize new kmap_thread() fs/nilfs2: Utilize new kmap_thread() fs/hfs: Utilize new kmap_thread() fs/hfsplus: Utilize new kmap_thread() fs/jffs2: Utilize new kmap_thread() fs/nfs: Utilize new kmap_thread() fs/f2fs: Utilize new kmap_thread() fs/fuse: Utilize new kmap_thread() fs/freevxfs: Utilize new kmap_thread() fs/reiserfs: Utilize new kmap_thread() fs/zonefs: Utilize new kmap_thread() fs/ubifs: Utilize new kmap_thread() fs/cachefiles: Utilize new kmap_thread() fs/ntfs: Utilize new kmap_thread() fs/romfs: Utilize new kmap_thread() fs/vboxsf: Utilize new kmap_thread() fs/hostfs: Utilize new kmap_thread() fs/cramfs: Utilize new kmap_thread() fs/erofs: Utilize new kmap_thread() fs: Utilize new kmap_thread() fs/ext2: Use ext2_put_page fs/ext2: Utilize new kmap_thread() fs/isofs: Utilize new kmap_thread() fs/jffs2: Utilize new kmap_thread() net: Utilize new kmap_thread() drivers/target: Utilize new kmap_thread() drivers/scsi: Utilize new kmap_thread() drivers/mmc: Utilize new kmap_thread() drivers/xen: Utilize new kmap_thread() drivers/firmware: Utilize new kmap_thread() drives/staging: Utilize new kmap_thread() drivers/mtd: Utilize new kmap_thread() drivers/md: Utilize new kmap_thread() drivers/misc: Utilize new kmap_thread() drivers/android: Utilize new kmap_thread() kernel: Utilize new kmap_thread() mm: Utilize new kmap_thread() lib: Utilize new kmap_thread() powerpc: Utilize new kmap_thread() samples: Utilize new kmap_thread() dax: Stray access protection for dax_direct_access() nvdimm/pmem: Stray access protection for pmem->virt_addr [dax|pmem]: Enable stray access protection Documentation/core-api/protection-keys.rst | 11 +- arch/powerpc/mm/mem.c | 4 +- arch/x86/entry/common.c | 28 +++ arch/x86/include/asm/pkeys.h | 6 +- arch/x86/include/asm/pkeys_common.h | 8 +- arch/x86/kernel/process.c | 74 ++++++- arch/x86/mm/fault.c | 193 ++++++++++++++---- arch/x86/mm/pkeys.c | 88 ++++++-- drivers/android/binder_alloc.c | 4 +- drivers/base/firmware_loader/fallback.c | 4 +- drivers/base/firmware_loader/main.c | 4 +- drivers/block/drbd/drbd_main.c | 4 +- drivers/block/drbd/drbd_receiver.c | 12 +- drivers/dax/device.c | 2 + drivers/dax/super.c | 2 + drivers/firmware/efi/capsule-loader.c | 6 +- drivers/firmware/efi/capsule.c | 4 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 12 +- drivers/gpu/drm/gma500/gma_display.c | 4 +- drivers/gpu/drm/gma500/mmu.c | 10 +- drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 4 +- .../drm/i915/gem/selftests/i915_gem_context.c | 4 +- .../drm/i915/gem/selftests/i915_gem_mman.c | 8 +- drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c | 4 +- drivers/gpu/drm/i915/gt/intel_gtt.c | 4 +- drivers/gpu/drm/i915/gt/shmem_utils.c | 4 +- drivers/gpu/drm/i915/i915_gem.c | 8 +- drivers/gpu/drm/i915/i915_gpu_error.c | 4 +- drivers/gpu/drm/i915/selftests/i915_perf.c | 4 +- drivers/gpu/drm/radeon/radeon_ttm.c | 4 +- drivers/infiniband/hw/hfi1/sdma.c | 4 +- drivers/infiniband/hw/i40iw/i40iw_cm.c | 10 +- drivers/infiniband/sw/siw/siw_qp_tx.c | 14 +- drivers/md/bcache/request.c | 4 +- drivers/misc/vmw_vmci/vmci_queue_pair.c | 12 +- drivers/mmc/host/mmc_spi.c | 4 +- drivers/mmc/host/sdricoh_cs.c | 4 +- drivers/mtd/mtd_blkdevs.c | 12 +- drivers/net/ethernet/intel/igb/igb_ethtool.c | 4 +- .../net/ethernet/intel/ixgbe/ixgbe_ethtool.c | 4 +- drivers/nvdimm/pmem.c | 6 + drivers/scsi/ipr.c | 8 +- drivers/scsi/pmcraid.c | 8 +- drivers/staging/rts5208/rtsx_transport.c | 4 +- drivers/target/target_core_iblock.c | 4 +- drivers/target/target_core_rd.c | 4 +- drivers/target/target_core_transport.c | 4 +- drivers/xen/gntalloc.c | 4 +- fs/afs/dir.c | 16 +- fs/afs/dir_edit.c | 16 +- fs/afs/mntpt.c | 4 +- fs/afs/write.c | 4 +- fs/aio.c | 4 +- fs/binfmt_elf.c | 4 +- fs/binfmt_elf_fdpic.c | 4 +- fs/btrfs/check-integrity.c | 4 +- fs/btrfs/compression.c | 4 +- fs/btrfs/inode.c | 16 +- fs/btrfs/lzo.c | 24 +-- fs/btrfs/raid56.c | 34 +-- fs/btrfs/reflink.c | 8 +- fs/btrfs/send.c | 4 +- fs/btrfs/zlib.c | 32 +-- fs/btrfs/zstd.c | 20 +- fs/cachefiles/rdwr.c | 4 +- fs/cifs/cifsencrypt.c | 6 +- fs/cifs/file.c | 16 +- fs/cifs/smb2ops.c | 8 +- fs/cramfs/inode.c | 10 +- fs/ecryptfs/crypto.c | 8 +- fs/ecryptfs/read_write.c | 8 +- fs/erofs/super.c | 4 +- fs/erofs/xattr.c | 4 +- fs/exec.c | 10 +- fs/ext2/dir.c | 8 +- fs/ext2/ext2.h | 8 + fs/ext2/namei.c | 15 +- fs/f2fs/f2fs.h | 8 +- fs/freevxfs/vxfs_immed.c | 4 +- fs/fuse/readdir.c | 4 +- fs/gfs2/bmap.c | 4 +- fs/gfs2/ops_fstype.c | 4 +- fs/hfs/bnode.c | 14 +- fs/hfs/btree.c | 20 +- fs/hfsplus/bitmap.c | 20 +- fs/hfsplus/bnode.c | 102 ++++----- fs/hfsplus/btree.c | 18 +- fs/hostfs/hostfs_kern.c | 12 +- fs/io_uring.c | 4 +- fs/isofs/compress.c | 4 +- fs/jffs2/file.c | 8 +- fs/jffs2/gc.c | 4 +- fs/nfs/dir.c | 20 +- fs/nilfs2/alloc.c | 34 +-- fs/nilfs2/cpfile.c | 4 +- fs/ntfs/aops.c | 4 +- fs/reiserfs/journal.c | 4 +- fs/romfs/super.c | 4 +- fs/splice.c | 4 +- fs/ubifs/file.c | 16 +- fs/vboxsf/file.c | 12 +- fs/zonefs/super.c | 4 +- include/linux/entry-common.h | 3 + include/linux/highmem.h | 63 +++++- include/linux/memremap.h | 1 + include/linux/mm.h | 43 ++++ include/linux/pkeys.h | 6 +- include/linux/sched.h | 8 + include/trace/events/kmap_thread.h | 56 +++++ init/init_task.c | 6 + kernel/fork.c | 18 ++ kernel/kexec_core.c | 8 +- lib/Kconfig.debug | 8 + lib/iov_iter.c | 12 +- lib/pks/pks_test.c | 138 +++++++++++-- lib/test_bpf.c | 4 +- lib/test_hmm.c | 8 +- mm/Kconfig | 13 ++ mm/debug.c | 23 +++ mm/memory.c | 8 +- mm/memremap.c | 90 ++++++++ mm/swapfile.c | 4 +- mm/userfaultfd.c | 4 +- net/ceph/messenger.c | 4 +- net/core/datagram.c | 4 +- net/core/sock.c | 8 +- net/ipv4/ip_output.c | 4 +- net/sunrpc/cache.c | 4 +- net/sunrpc/xdr.c | 8 +- net/tls/tls_device.c | 4 +- samples/vfio-mdev/mbochs.c | 4 +- 131 files changed, 1284 insertions(+), 565 deletions(-) create mode 100644 include/trace/events/kmap_thread.h -- 2.28.0.rc0.12.gb6a658bd00c9 _______________________________________________ devel mailing list devel@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-11.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,MENTIONS_GIT_HOSTING, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1D5EFC56204 for ; Fri, 9 Oct 2020 19:59:43 +0000 (UTC) Received: from lists.ozlabs.org (lists.ozlabs.org [203.11.71.2]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 458CB22267 for ; Fri, 9 Oct 2020 19:59:42 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 458CB22267 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=linuxppc-dev-bounces+linuxppc-dev=archiver.kernel.org@lists.ozlabs.org Received: from bilbo.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) by lists.ozlabs.org (Postfix) with ESMTP id 4C7JnQ4SkfzDqDy for ; Sat, 10 Oct 2020 06:59:38 +1100 (AEDT) Authentication-Results: lists.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=intel.com (client-ip=192.55.52.43; helo=mga05.intel.com; envelope-from=ira.weiny@intel.com; receiver=) Authentication-Results: lists.ozlabs.org; dmarc=pass (p=none dis=none) header.from=intel.com Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 4C7JbB6MXwzDqXt; Sat, 10 Oct 2020 06:50:45 +1100 (AEDT) IronPort-SDR: pUremmhSpP/u1Pj1r5HCLTFoaVFVRd+SBbZbQaN8aflvHsHjFqQEAT9IE7ZGIigZzNeStLhjX1 tFzLLzGkoRaQ== X-IronPort-AV: E=McAfee;i="6000,8403,9769"; a="250225858" X-IronPort-AV: E=Sophos;i="5.77,355,1596524400"; d="scan'208";a="250225858" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Oct 2020 12:50:41 -0700 IronPort-SDR: wgvSBhlinBwJf2eRaqYm1d4mOPDeheaaBRmvXZpaWhx0BsPjq5MOqCRmfglsuVIrge+HvLIvQ5 IT741lyNdN2Q== X-IronPort-AV: E=Sophos;i="5.77,355,1596524400"; d="scan'208";a="419536654" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Oct 2020 12:50:41 -0700 From: ira.weiny@intel.com To: Andrew Morton , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Andy Lutomirski , Peter Zijlstra Subject: [PATCH RFC PKS/PMEM 00/58] PMEM: Introduce stray write protection for PMEM Date: Fri, 9 Oct 2020 12:49:35 -0700 Message-Id: <20201009195033.3208459-1-ira.weiny@intel.com> X-Mailer: git-send-email 2.28.0.rc0.12.gb6a658bd00c9 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: linux-aio@kvack.org, linux-efi@vger.kernel.org, kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-mmc@vger.kernel.org, Dave Hansen , dri-devel@lists.freedesktop.org, linux-mm@kvack.org, target-devel@vger.kernel.org, linux-mtd@lists.infradead.org, linux-kselftest@vger.kernel.org, samba-technical@lists.samba.org, Ira Weiny , ceph-devel@vger.kernel.org, drbd-dev@lists.linbit.com, devel@driverdev.osuosl.org, linux-cifs@vger.kernel.org, linux-nilfs@vger.kernel.org, linux-scsi@vger.kernel.org, linux-nvdimm@lists.01.org, linux-rdma@vger.kernel.org, x86@kernel.org, amd-gfx@lists.freedesktop.org, linux-afs@lists.infradead.org, cluster-devel@redhat.com, linux-cachefs@redhat.com, intel-wired-lan@lists.osuosl.org, xen-devel@lists.xenproject.org, linux-ext4@vger.kernel.org, Fenghua Yu , linux-um@lists.infradead.org, intel-gfx@lists.freedesktop.org, ecryptfs@vger.kernel.org, linux-erofs@lists.ozlabs.org, reiserfs-devel@vger.kernel.org, linux-block@vger.kernel.org, linux-bcache@vger.kernel.org, Dan Williams , io-uring@vger.kernel.org, linux-nfs@vger.kernel.org, linux-ntfs-dev@lists.sourceforge.net, netdev@vger.kernel.org, kexec@lists.infradead.org, linux-kernel@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, bpf@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-btrfs@vger.kernel.org Errors-To: linuxppc-dev-bounces+linuxppc-dev=archiver.kernel.org@lists.ozlabs.org Sender: "Linuxppc-dev" From: Ira Weiny Should a stray write in the kernel occur persistent memory is affected more than regular memory. A write to the wrong area of memory could result in latent data corruption which will will persist after a reboot. PKS provides a nice way to restrict access to persistent memory kernel mappings, while providing fast access when needed. Since the last RFC[1] this patch set has grown quite a bit. It now depends on the core patches submitted separately. https://lore.kernel.org/lkml/20201009194258.3207172-1-ira.weiny@intel.com/ And contained in the git tree here: https://github.com/weiny2/linux-kernel/tree/pks-rfc-v3 However, functionally there is only 1 major change from the last RFC. Specifically, kmap() is most often used within a single thread in a 'map/do something/unmap' pattern. In fact this is the pattern used in ~90% of the callers of kmap(). This pattern works very well for the pmem use case and the testing which was done. However, there were another ~20-30 kmap users which do not follow this pattern. Some of them seem to expect the mapping to be 'global' while others require a detailed audit to be sure.[2][3] While we don't anticipate global mappings to pmem there is a danger in changing the semantics of kmap(). Effectively, this would cause an unresolved page fault with little to no information about why. There were a number of options considered. 1) Attempt to change all the thread local kmap() calls to kmap_atomic() 2) Introduce a flags parameter to kmap() to indicate if the mapping should be global or not 3) Change ~20-30 call sites to 'kmap_global()' to indicate that they require a global mapping of the pages 4) Change ~209 call sites to 'kmap_thread()' to indicate that the mapping is to be used within that thread of execution only Option 1 is simply not feasible kmap_atomic() is not the same semantic as kmap() within a single tread. Option 2 would require all of the call sites of kmap() to change. Option 3 seems like a good minimal change but there is a danger that new code may miss the semantic change of kmap() and not get the behavior intended for future users. Therefore, option #4 was chosen. To handle the global PKRS state in the most efficient manner possible. We lazily override the thread specific PKRS key value only when needed because we anticipate PKS to not be needed will not be needed most of the time. And even when it is used 90% of the time it is a thread local call. [1] https://lore.kernel.org/lkml/20200717072056.73134-1-ira.weiny@intel.com/ [2] The following list of callers continue calling kmap() (utilizing the global PKRS). It would be nice if more of them could be converted to kmap_thread() drivers/firewire/net.c: ptr = kmap(dev->broadcast_rcv_buffer.pages[u]); drivers/gpu/drm/i915/gem/i915_gem_pages.c: return kmap(sg_page(sgt->sgl)); drivers/gpu/drm/ttm/ttm_bo_util.c: map->virtual = kmap(map->page); drivers/infiniband/hw/qib/qib_user_sdma.c: mpage = kmap(page); drivers/misc/vmw_vmci/vmci_host.c: context->notify = kmap(context->notify_page) + (uva & (PAGE_SIZE - 1)); drivers/misc/xilinx_sdfec.c: addr = kmap(pages[i]); drivers/mmc/host/usdhi6rol0.c: host->pg.mapped = kmap(host->pg.page); drivers/mmc/host/usdhi6rol0.c: host->pg.mapped = kmap(host->pg.page); drivers/mmc/host/usdhi6rol0.c: host->pg.mapped = kmap(host->pg.page); drivers/nvme/target/tcp.c: iov->iov_base = kmap(sg_page(sg)) + sg->offset + sg_offset; drivers/scsi/libiscsi_tcp.c: segment->sg_mapped = kmap(sg_page(sg)); drivers/target/iscsi/iscsi_target.c: iov[i].iov_base = kmap(sg_page(sg)) + sg->offset + page_off; drivers/target/target_core_transport.c: return kmap(sg_page(sg)) + sg->offset; fs/btrfs/check-integrity.c: block_ctx->datav[i] = kmap(block_ctx->pagev[i]); fs/ceph/dir.c: cache_ctl->dentries = kmap(cache_ctl->page); fs/ceph/inode.c: ctl->dentries = kmap(ctl->page); fs/erofs/zpvec.h: kmap_atomic(ctor->curr) : kmap(ctor->curr); lib/scatterlist.c: miter->addr = kmap(miter->page) + miter->__offset; net/ceph/pagelist.c: pl->mapped_tail = kmap(page); net/ceph/pagelist.c: pl->mapped_tail = kmap(page); virt/kvm/kvm_main.c: hva = kmap(page); [3] The following appear to follow the same pattern as ext2 which was converted after some code audit. So I _think_ they too could be converted to k[un]map_thread(). fs/freevxfs/vxfs_subr.c|75| kmap(pp); fs/jfs/jfs_metapage.c|102| kmap(page); fs/jfs/jfs_metapage.c|156| kmap(page); fs/minix/dir.c|72| kmap(page); fs/nilfs2/dir.c|195| kmap(page); fs/nilfs2/ifile.h|24| void *kaddr = kmap(ibh->b_page); fs/ntfs/aops.h|78| kmap(page); fs/ntfs/compress.c|574| kmap(page); fs/qnx6/dir.c|32| kmap(page); fs/qnx6/dir.c|58| kmap(*p = page); fs/qnx6/inode.c|190| kmap(page); fs/qnx6/inode.c|557| kmap(page); fs/reiserfs/inode.c|2397| kmap(bh_result->b_page); fs/reiserfs/xattr.c|444| kmap(page); fs/sysv/dir.c|60| kmap(page); fs/sysv/dir.c|262| kmap(page); fs/ufs/dir.c|194| kmap(page); fs/ufs/dir.c|562| kmap(page); Ira Weiny (58): x86/pks: Add a global pkrs option x86/pks/test: Add testing for global option memremap: Add zone device access protection kmap: Add stray access protection for device pages kmap: Introduce k[un]map_thread kmap: Introduce k[un]map_thread debugging drivers/drbd: Utilize new kmap_thread() drivers/firmware_loader: Utilize new kmap_thread() drivers/gpu: Utilize new kmap_thread() drivers/rdma: Utilize new kmap_thread() drivers/net: Utilize new kmap_thread() fs/afs: Utilize new kmap_thread() fs/btrfs: Utilize new kmap_thread() fs/cifs: Utilize new kmap_thread() fs/ecryptfs: Utilize new kmap_thread() fs/gfs2: Utilize new kmap_thread() fs/nilfs2: Utilize new kmap_thread() fs/hfs: Utilize new kmap_thread() fs/hfsplus: Utilize new kmap_thread() fs/jffs2: Utilize new kmap_thread() fs/nfs: Utilize new kmap_thread() fs/f2fs: Utilize new kmap_thread() fs/fuse: Utilize new kmap_thread() fs/freevxfs: Utilize new kmap_thread() fs/reiserfs: Utilize new kmap_thread() fs/zonefs: Utilize new kmap_thread() fs/ubifs: Utilize new kmap_thread() fs/cachefiles: Utilize new kmap_thread() fs/ntfs: Utilize new kmap_thread() fs/romfs: Utilize new kmap_thread() fs/vboxsf: Utilize new kmap_thread() fs/hostfs: Utilize new kmap_thread() fs/cramfs: Utilize new kmap_thread() fs/erofs: Utilize new kmap_thread() fs: Utilize new kmap_thread() fs/ext2: Use ext2_put_page fs/ext2: Utilize new kmap_thread() fs/isofs: Utilize new kmap_thread() fs/jffs2: Utilize new kmap_thread() net: Utilize new kmap_thread() drivers/target: Utilize new kmap_thread() drivers/scsi: Utilize new kmap_thread() drivers/mmc: Utilize new kmap_thread() drivers/xen: Utilize new kmap_thread() drivers/firmware: Utilize new kmap_thread() drives/staging: Utilize new kmap_thread() drivers/mtd: Utilize new kmap_thread() drivers/md: Utilize new kmap_thread() drivers/misc: Utilize new kmap_thread() drivers/android: Utilize new kmap_thread() kernel: Utilize new kmap_thread() mm: Utilize new kmap_thread() lib: Utilize new kmap_thread() powerpc: Utilize new kmap_thread() samples: Utilize new kmap_thread() dax: Stray access protection for dax_direct_access() nvdimm/pmem: Stray access protection for pmem->virt_addr [dax|pmem]: Enable stray access protection Documentation/core-api/protection-keys.rst | 11 +- arch/powerpc/mm/mem.c | 4 +- arch/x86/entry/common.c | 28 +++ arch/x86/include/asm/pkeys.h | 6 +- arch/x86/include/asm/pkeys_common.h | 8 +- arch/x86/kernel/process.c | 74 ++++++- arch/x86/mm/fault.c | 193 ++++++++++++++---- arch/x86/mm/pkeys.c | 88 ++++++-- drivers/android/binder_alloc.c | 4 +- drivers/base/firmware_loader/fallback.c | 4 +- drivers/base/firmware_loader/main.c | 4 +- drivers/block/drbd/drbd_main.c | 4 +- drivers/block/drbd/drbd_receiver.c | 12 +- drivers/dax/device.c | 2 + drivers/dax/super.c | 2 + drivers/firmware/efi/capsule-loader.c | 6 +- drivers/firmware/efi/capsule.c | 4 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 12 +- drivers/gpu/drm/gma500/gma_display.c | 4 +- drivers/gpu/drm/gma500/mmu.c | 10 +- drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 4 +- .../drm/i915/gem/selftests/i915_gem_context.c | 4 +- .../drm/i915/gem/selftests/i915_gem_mman.c | 8 +- drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c | 4 +- drivers/gpu/drm/i915/gt/intel_gtt.c | 4 +- drivers/gpu/drm/i915/gt/shmem_utils.c | 4 +- drivers/gpu/drm/i915/i915_gem.c | 8 +- drivers/gpu/drm/i915/i915_gpu_error.c | 4 +- drivers/gpu/drm/i915/selftests/i915_perf.c | 4 +- drivers/gpu/drm/radeon/radeon_ttm.c | 4 +- drivers/infiniband/hw/hfi1/sdma.c | 4 +- drivers/infiniband/hw/i40iw/i40iw_cm.c | 10 +- drivers/infiniband/sw/siw/siw_qp_tx.c | 14 +- drivers/md/bcache/request.c | 4 +- drivers/misc/vmw_vmci/vmci_queue_pair.c | 12 +- drivers/mmc/host/mmc_spi.c | 4 +- drivers/mmc/host/sdricoh_cs.c | 4 +- drivers/mtd/mtd_blkdevs.c | 12 +- drivers/net/ethernet/intel/igb/igb_ethtool.c | 4 +- .../net/ethernet/intel/ixgbe/ixgbe_ethtool.c | 4 +- drivers/nvdimm/pmem.c | 6 + drivers/scsi/ipr.c | 8 +- drivers/scsi/pmcraid.c | 8 +- drivers/staging/rts5208/rtsx_transport.c | 4 +- drivers/target/target_core_iblock.c | 4 +- drivers/target/target_core_rd.c | 4 +- drivers/target/target_core_transport.c | 4 +- drivers/xen/gntalloc.c | 4 +- fs/afs/dir.c | 16 +- fs/afs/dir_edit.c | 16 +- fs/afs/mntpt.c | 4 +- fs/afs/write.c | 4 +- fs/aio.c | 4 +- fs/binfmt_elf.c | 4 +- fs/binfmt_elf_fdpic.c | 4 +- fs/btrfs/check-integrity.c | 4 +- fs/btrfs/compression.c | 4 +- fs/btrfs/inode.c | 16 +- fs/btrfs/lzo.c | 24 +-- fs/btrfs/raid56.c | 34 +-- fs/btrfs/reflink.c | 8 +- fs/btrfs/send.c | 4 +- fs/btrfs/zlib.c | 32 +-- fs/btrfs/zstd.c | 20 +- fs/cachefiles/rdwr.c | 4 +- fs/cifs/cifsencrypt.c | 6 +- fs/cifs/file.c | 16 +- fs/cifs/smb2ops.c | 8 +- fs/cramfs/inode.c | 10 +- fs/ecryptfs/crypto.c | 8 +- fs/ecryptfs/read_write.c | 8 +- fs/erofs/super.c | 4 +- fs/erofs/xattr.c | 4 +- fs/exec.c | 10 +- fs/ext2/dir.c | 8 +- fs/ext2/ext2.h | 8 + fs/ext2/namei.c | 15 +- fs/f2fs/f2fs.h | 8 +- fs/freevxfs/vxfs_immed.c | 4 +- fs/fuse/readdir.c | 4 +- fs/gfs2/bmap.c | 4 +- fs/gfs2/ops_fstype.c | 4 +- fs/hfs/bnode.c | 14 +- fs/hfs/btree.c | 20 +- fs/hfsplus/bitmap.c | 20 +- fs/hfsplus/bnode.c | 102 ++++----- fs/hfsplus/btree.c | 18 +- fs/hostfs/hostfs_kern.c | 12 +- fs/io_uring.c | 4 +- fs/isofs/compress.c | 4 +- fs/jffs2/file.c | 8 +- fs/jffs2/gc.c | 4 +- fs/nfs/dir.c | 20 +- fs/nilfs2/alloc.c | 34 +-- fs/nilfs2/cpfile.c | 4 +- fs/ntfs/aops.c | 4 +- fs/reiserfs/journal.c | 4 +- fs/romfs/super.c | 4 +- fs/splice.c | 4 +- fs/ubifs/file.c | 16 +- fs/vboxsf/file.c | 12 +- fs/zonefs/super.c | 4 +- include/linux/entry-common.h | 3 + include/linux/highmem.h | 63 +++++- include/linux/memremap.h | 1 + include/linux/mm.h | 43 ++++ include/linux/pkeys.h | 6 +- include/linux/sched.h | 8 + include/trace/events/kmap_thread.h | 56 +++++ init/init_task.c | 6 + kernel/fork.c | 18 ++ kernel/kexec_core.c | 8 +- lib/Kconfig.debug | 8 + lib/iov_iter.c | 12 +- lib/pks/pks_test.c | 138 +++++++++++-- lib/test_bpf.c | 4 +- lib/test_hmm.c | 8 +- mm/Kconfig | 13 ++ mm/debug.c | 23 +++ mm/memory.c | 8 +- mm/memremap.c | 90 ++++++++ mm/swapfile.c | 4 +- mm/userfaultfd.c | 4 +- net/ceph/messenger.c | 4 +- net/core/datagram.c | 4 +- net/core/sock.c | 8 +- net/ipv4/ip_output.c | 4 +- net/sunrpc/cache.c | 4 +- net/sunrpc/xdr.c | 8 +- net/tls/tls_device.c | 4 +- samples/vfio-mdev/mbochs.c | 4 +- 131 files changed, 1284 insertions(+), 565 deletions(-) create mode 100644 include/trace/events/kmap_thread.h -- 2.28.0.rc0.12.gb6a658bd00c9 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-11.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, MENTIONS_GIT_HOSTING,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2644DC5517A for ; Thu, 22 Oct 2020 22:34:53 +0000 (UTC) Received: from merlin.infradead.org (merlin.infradead.org [205.233.59.134]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 7B16C24658 for ; Thu, 22 Oct 2020 22:34:52 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=lists.infradead.org header.i=@lists.infradead.org header.b="3Cp8NYx3" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7B16C24658 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-mtd-bounces+linux-mtd=archiver.kernel.org@lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=merlin.20170209; h=Sender:Content-Transfer-Encoding: Content-Type:Cc:List-Subscribe:List-Help:List-Post:List-Archive: List-Unsubscribe:List-Id:MIME-Version:Message-Id:Date:Subject:To:From: Reply-To:Content-ID:Content-Description:Resent-Date:Resent-From:Resent-Sender :Resent-To:Resent-Cc:Resent-Message-ID:In-Reply-To:References:List-Owner; bh=JOg3xdVAAn1n8jdPVm3mdHymdMzeCrkcEjK5Fgp6u1w=; b=3Cp8NYx3n2jNRLDTqesAyrHuUe KnnIM/qKpxE2zKVOVsoxqpOBSQapsqecG81wJiFfQhT//qROp9b3Z5jB8/dZ8jb0fskHsDByGzHpC f4xxJ+UyPULMO5T914leIwNJchcAhFHfdti64un3g0HFoN0ydTMyT3X3uQ6y705HFBmvdDSQH8SBG glWt2sl4ilwWXvFTrzNB60d07G72Ql2UgehvussznK66VZpF8+EGcqRVX6HNSx44U2E3R8g4JqqMH yTFN21CBw/SV1X/6RYf0chpk8qWE6NmufPLhAPsxZUmZt83iyMOrjfdvtjaNy9EdXDUUEIiHm4CIb tlMc5z0w==; Received: from localhost ([::1] helo=merlin.infradead.org) by merlin.infradead.org with esmtp (Exim 4.92.3 #3 (Red Hat Linux)) id 1kVj9f-0000xb-4A; Thu, 22 Oct 2020 22:33:55 +0000 Received: from mga14.intel.com ([192.55.52.115]) by merlin.infradead.org with esmtps (Exim 4.92.3 #3 (Red Hat Linux)) id 1kQyPe-0000CS-O7; Fri, 09 Oct 2020 19:50:51 +0000 IronPort-SDR: ZdlanByPE1CUTi29YpuNmhloh3AHEKxQUyyy0oBuut2sDV16SLZ2xmasbBrIEdUByMv3yH9PWX 2dUWUa/09l+Q== X-IronPort-AV: E=McAfee;i="6000,8403,9769"; a="164743564" X-IronPort-AV: E=Sophos;i="5.77,355,1596524400"; d="scan'208";a="164743564" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Oct 2020 12:50:41 -0700 IronPort-SDR: wgvSBhlinBwJf2eRaqYm1d4mOPDeheaaBRmvXZpaWhx0BsPjq5MOqCRmfglsuVIrge+HvLIvQ5 IT741lyNdN2Q== X-IronPort-AV: E=Sophos;i="5.77,355,1596524400"; d="scan'208";a="419536654" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Oct 2020 12:50:41 -0700 From: ira.weiny@intel.com To: Andrew Morton , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Andy Lutomirski , Peter Zijlstra Subject: [PATCH RFC PKS/PMEM 00/58] PMEM: Introduce stray write protection for PMEM Date: Fri, 9 Oct 2020 12:49:35 -0700 Message-Id: <20201009195033.3208459-1-ira.weiny@intel.com> X-Mailer: git-send-email 2.28.0.rc0.12.gb6a658bd00c9 MIME-Version: 1.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20201009_155047_009191_B4159EA8 X-CRM114-Status: GOOD ( 22.39 ) X-BeenThere: linux-mtd@lists.infradead.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux MTD discussion mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: linux-aio@kvack.org, linux-efi@vger.kernel.org, kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-mmc@vger.kernel.org, Dave Hansen , dri-devel@lists.freedesktop.org, linux-mm@kvack.org, target-devel@vger.kernel.org, linux-mtd@lists.infradead.org, linux-kselftest@vger.kernel.org, samba-technical@lists.samba.org, Ira Weiny , ceph-devel@vger.kernel.org, drbd-dev@lists.linbit.com, devel@driverdev.osuosl.org, linux-cifs@vger.kernel.org, linux-nilfs@vger.kernel.org, linux-scsi@vger.kernel.org, linux-nvdimm@lists.01.org, linux-rdma@vger.kernel.org, x86@kernel.org, amd-gfx@lists.freedesktop.org, linux-afs@lists.infradead.org, cluster-devel@redhat.com, linux-cachefs@redhat.com, intel-wired-lan@lists.osuosl.org, xen-devel@lists.xenproject.org, linux-ext4@vger.kernel.org, Fenghua Yu , linux-um@lists.infradead.org, intel-gfx@lists.freedesktop.org, ecryptfs@vger.kernel.org, linux-erofs@lists.ozlabs.org, reiserfs-devel@vger.kernel.org, linux-block@vger.kernel.org, linux-bcache@vger.kernel.org, Dan Williams , io-uring@vger.kernel.org, linux-nfs@vger.kernel.org, linux-ntfs-dev@lists.sourceforge.net, netdev@vger.kernel.org, kexec@lists.infradead.org, linux-kernel@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, bpf@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-btrfs@vger.kernel.org Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-mtd" Errors-To: linux-mtd-bounces+linux-mtd=archiver.kernel.org@lists.infradead.org From: Ira Weiny Should a stray write in the kernel occur persistent memory is affected more than regular memory. A write to the wrong area of memory could result in latent data corruption which will will persist after a reboot. PKS provides a nice way to restrict access to persistent memory kernel mappings, while providing fast access when needed. Since the last RFC[1] this patch set has grown quite a bit. It now depends on the core patches submitted separately. https://lore.kernel.org/lkml/20201009194258.3207172-1-ira.weiny@intel.com/ And contained in the git tree here: https://github.com/weiny2/linux-kernel/tree/pks-rfc-v3 However, functionally there is only 1 major change from the last RFC. Specifically, kmap() is most often used within a single thread in a 'map/do something/unmap' pattern. In fact this is the pattern used in ~90% of the callers of kmap(). This pattern works very well for the pmem use case and the testing which was done. However, there were another ~20-30 kmap users which do not follow this pattern. Some of them seem to expect the mapping to be 'global' while others require a detailed audit to be sure.[2][3] While we don't anticipate global mappings to pmem there is a danger in changing the semantics of kmap(). Effectively, this would cause an unresolved page fault with little to no information about why. There were a number of options considered. 1) Attempt to change all the thread local kmap() calls to kmap_atomic() 2) Introduce a flags parameter to kmap() to indicate if the mapping should be global or not 3) Change ~20-30 call sites to 'kmap_global()' to indicate that they require a global mapping of the pages 4) Change ~209 call sites to 'kmap_thread()' to indicate that the mapping is to be used within that thread of execution only Option 1 is simply not feasible kmap_atomic() is not the same semantic as kmap() within a single tread. Option 2 would require all of the call sites of kmap() to change. Option 3 seems like a good minimal change but there is a danger that new code may miss the semantic change of kmap() and not get the behavior intended for future users. Therefore, option #4 was chosen. To handle the global PKRS state in the most efficient manner possible. We lazily override the thread specific PKRS key value only when needed because we anticipate PKS to not be needed will not be needed most of the time. And even when it is used 90% of the time it is a thread local call. [1] https://lore.kernel.org/lkml/20200717072056.73134-1-ira.weiny@intel.com/ [2] The following list of callers continue calling kmap() (utilizing the global PKRS). It would be nice if more of them could be converted to kmap_thread() drivers/firewire/net.c: ptr = kmap(dev->broadcast_rcv_buffer.pages[u]); drivers/gpu/drm/i915/gem/i915_gem_pages.c: return kmap(sg_page(sgt->sgl)); drivers/gpu/drm/ttm/ttm_bo_util.c: map->virtual = kmap(map->page); drivers/infiniband/hw/qib/qib_user_sdma.c: mpage = kmap(page); drivers/misc/vmw_vmci/vmci_host.c: context->notify = kmap(context->notify_page) + (uva & (PAGE_SIZE - 1)); drivers/misc/xilinx_sdfec.c: addr = kmap(pages[i]); drivers/mmc/host/usdhi6rol0.c: host->pg.mapped = kmap(host->pg.page); drivers/mmc/host/usdhi6rol0.c: host->pg.mapped = kmap(host->pg.page); drivers/mmc/host/usdhi6rol0.c: host->pg.mapped = kmap(host->pg.page); drivers/nvme/target/tcp.c: iov->iov_base = kmap(sg_page(sg)) + sg->offset + sg_offset; drivers/scsi/libiscsi_tcp.c: segment->sg_mapped = kmap(sg_page(sg)); drivers/target/iscsi/iscsi_target.c: iov[i].iov_base = kmap(sg_page(sg)) + sg->offset + page_off; drivers/target/target_core_transport.c: return kmap(sg_page(sg)) + sg->offset; fs/btrfs/check-integrity.c: block_ctx->datav[i] = kmap(block_ctx->pagev[i]); fs/ceph/dir.c: cache_ctl->dentries = kmap(cache_ctl->page); fs/ceph/inode.c: ctl->dentries = kmap(ctl->page); fs/erofs/zpvec.h: kmap_atomic(ctor->curr) : kmap(ctor->curr); lib/scatterlist.c: miter->addr = kmap(miter->page) + miter->__offset; net/ceph/pagelist.c: pl->mapped_tail = kmap(page); net/ceph/pagelist.c: pl->mapped_tail = kmap(page); virt/kvm/kvm_main.c: hva = kmap(page); [3] The following appear to follow the same pattern as ext2 which was converted after some code audit. So I _think_ they too could be converted to k[un]map_thread(). fs/freevxfs/vxfs_subr.c|75| kmap(pp); fs/jfs/jfs_metapage.c|102| kmap(page); fs/jfs/jfs_metapage.c|156| kmap(page); fs/minix/dir.c|72| kmap(page); fs/nilfs2/dir.c|195| kmap(page); fs/nilfs2/ifile.h|24| void *kaddr = kmap(ibh->b_page); fs/ntfs/aops.h|78| kmap(page); fs/ntfs/compress.c|574| kmap(page); fs/qnx6/dir.c|32| kmap(page); fs/qnx6/dir.c|58| kmap(*p = page); fs/qnx6/inode.c|190| kmap(page); fs/qnx6/inode.c|557| kmap(page); fs/reiserfs/inode.c|2397| kmap(bh_result->b_page); fs/reiserfs/xattr.c|444| kmap(page); fs/sysv/dir.c|60| kmap(page); fs/sysv/dir.c|262| kmap(page); fs/ufs/dir.c|194| kmap(page); fs/ufs/dir.c|562| kmap(page); Ira Weiny (58): x86/pks: Add a global pkrs option x86/pks/test: Add testing for global option memremap: Add zone device access protection kmap: Add stray access protection for device pages kmap: Introduce k[un]map_thread kmap: Introduce k[un]map_thread debugging drivers/drbd: Utilize new kmap_thread() drivers/firmware_loader: Utilize new kmap_thread() drivers/gpu: Utilize new kmap_thread() drivers/rdma: Utilize new kmap_thread() drivers/net: Utilize new kmap_thread() fs/afs: Utilize new kmap_thread() fs/btrfs: Utilize new kmap_thread() fs/cifs: Utilize new kmap_thread() fs/ecryptfs: Utilize new kmap_thread() fs/gfs2: Utilize new kmap_thread() fs/nilfs2: Utilize new kmap_thread() fs/hfs: Utilize new kmap_thread() fs/hfsplus: Utilize new kmap_thread() fs/jffs2: Utilize new kmap_thread() fs/nfs: Utilize new kmap_thread() fs/f2fs: Utilize new kmap_thread() fs/fuse: Utilize new kmap_thread() fs/freevxfs: Utilize new kmap_thread() fs/reiserfs: Utilize new kmap_thread() fs/zonefs: Utilize new kmap_thread() fs/ubifs: Utilize new kmap_thread() fs/cachefiles: Utilize new kmap_thread() fs/ntfs: Utilize new kmap_thread() fs/romfs: Utilize new kmap_thread() fs/vboxsf: Utilize new kmap_thread() fs/hostfs: Utilize new kmap_thread() fs/cramfs: Utilize new kmap_thread() fs/erofs: Utilize new kmap_thread() fs: Utilize new kmap_thread() fs/ext2: Use ext2_put_page fs/ext2: Utilize new kmap_thread() fs/isofs: Utilize new kmap_thread() fs/jffs2: Utilize new kmap_thread() net: Utilize new kmap_thread() drivers/target: Utilize new kmap_thread() drivers/scsi: Utilize new kmap_thread() drivers/mmc: Utilize new kmap_thread() drivers/xen: Utilize new kmap_thread() drivers/firmware: Utilize new kmap_thread() drives/staging: Utilize new kmap_thread() drivers/mtd: Utilize new kmap_thread() drivers/md: Utilize new kmap_thread() drivers/misc: Utilize new kmap_thread() drivers/android: Utilize new kmap_thread() kernel: Utilize new kmap_thread() mm: Utilize new kmap_thread() lib: Utilize new kmap_thread() powerpc: Utilize new kmap_thread() samples: Utilize new kmap_thread() dax: Stray access protection for dax_direct_access() nvdimm/pmem: Stray access protection for pmem->virt_addr [dax|pmem]: Enable stray access protection Documentation/core-api/protection-keys.rst | 11 +- arch/powerpc/mm/mem.c | 4 +- arch/x86/entry/common.c | 28 +++ arch/x86/include/asm/pkeys.h | 6 +- arch/x86/include/asm/pkeys_common.h | 8 +- arch/x86/kernel/process.c | 74 ++++++- arch/x86/mm/fault.c | 193 ++++++++++++++---- arch/x86/mm/pkeys.c | 88 ++++++-- drivers/android/binder_alloc.c | 4 +- drivers/base/firmware_loader/fallback.c | 4 +- drivers/base/firmware_loader/main.c | 4 +- drivers/block/drbd/drbd_main.c | 4 +- drivers/block/drbd/drbd_receiver.c | 12 +- drivers/dax/device.c | 2 + drivers/dax/super.c | 2 + drivers/firmware/efi/capsule-loader.c | 6 +- drivers/firmware/efi/capsule.c | 4 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 12 +- drivers/gpu/drm/gma500/gma_display.c | 4 +- drivers/gpu/drm/gma500/mmu.c | 10 +- drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 4 +- .../drm/i915/gem/selftests/i915_gem_context.c | 4 +- .../drm/i915/gem/selftests/i915_gem_mman.c | 8 +- drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c | 4 +- drivers/gpu/drm/i915/gt/intel_gtt.c | 4 +- drivers/gpu/drm/i915/gt/shmem_utils.c | 4 +- drivers/gpu/drm/i915/i915_gem.c | 8 +- drivers/gpu/drm/i915/i915_gpu_error.c | 4 +- drivers/gpu/drm/i915/selftests/i915_perf.c | 4 +- drivers/gpu/drm/radeon/radeon_ttm.c | 4 +- drivers/infiniband/hw/hfi1/sdma.c | 4 +- drivers/infiniband/hw/i40iw/i40iw_cm.c | 10 +- drivers/infiniband/sw/siw/siw_qp_tx.c | 14 +- drivers/md/bcache/request.c | 4 +- drivers/misc/vmw_vmci/vmci_queue_pair.c | 12 +- drivers/mmc/host/mmc_spi.c | 4 +- drivers/mmc/host/sdricoh_cs.c | 4 +- drivers/mtd/mtd_blkdevs.c | 12 +- drivers/net/ethernet/intel/igb/igb_ethtool.c | 4 +- .../net/ethernet/intel/ixgbe/ixgbe_ethtool.c | 4 +- drivers/nvdimm/pmem.c | 6 + drivers/scsi/ipr.c | 8 +- drivers/scsi/pmcraid.c | 8 +- drivers/staging/rts5208/rtsx_transport.c | 4 +- drivers/target/target_core_iblock.c | 4 +- drivers/target/target_core_rd.c | 4 +- drivers/target/target_core_transport.c | 4 +- drivers/xen/gntalloc.c | 4 +- fs/afs/dir.c | 16 +- fs/afs/dir_edit.c | 16 +- fs/afs/mntpt.c | 4 +- fs/afs/write.c | 4 +- fs/aio.c | 4 +- fs/binfmt_elf.c | 4 +- fs/binfmt_elf_fdpic.c | 4 +- fs/btrfs/check-integrity.c | 4 +- fs/btrfs/compression.c | 4 +- fs/btrfs/inode.c | 16 +- fs/btrfs/lzo.c | 24 +-- fs/btrfs/raid56.c | 34 +-- fs/btrfs/reflink.c | 8 +- fs/btrfs/send.c | 4 +- fs/btrfs/zlib.c | 32 +-- fs/btrfs/zstd.c | 20 +- fs/cachefiles/rdwr.c | 4 +- fs/cifs/cifsencrypt.c | 6 +- fs/cifs/file.c | 16 +- fs/cifs/smb2ops.c | 8 +- fs/cramfs/inode.c | 10 +- fs/ecryptfs/crypto.c | 8 +- fs/ecryptfs/read_write.c | 8 +- fs/erofs/super.c | 4 +- fs/erofs/xattr.c | 4 +- fs/exec.c | 10 +- fs/ext2/dir.c | 8 +- fs/ext2/ext2.h | 8 + fs/ext2/namei.c | 15 +- fs/f2fs/f2fs.h | 8 +- fs/freevxfs/vxfs_immed.c | 4 +- fs/fuse/readdir.c | 4 +- fs/gfs2/bmap.c | 4 +- fs/gfs2/ops_fstype.c | 4 +- fs/hfs/bnode.c | 14 +- fs/hfs/btree.c | 20 +- fs/hfsplus/bitmap.c | 20 +- fs/hfsplus/bnode.c | 102 ++++----- fs/hfsplus/btree.c | 18 +- fs/hostfs/hostfs_kern.c | 12 +- fs/io_uring.c | 4 +- fs/isofs/compress.c | 4 +- fs/jffs2/file.c | 8 +- fs/jffs2/gc.c | 4 +- fs/nfs/dir.c | 20 +- fs/nilfs2/alloc.c | 34 +-- fs/nilfs2/cpfile.c | 4 +- fs/ntfs/aops.c | 4 +- fs/reiserfs/journal.c | 4 +- fs/romfs/super.c | 4 +- fs/splice.c | 4 +- fs/ubifs/file.c | 16 +- fs/vboxsf/file.c | 12 +- fs/zonefs/super.c | 4 +- include/linux/entry-common.h | 3 + include/linux/highmem.h | 63 +++++- include/linux/memremap.h | 1 + include/linux/mm.h | 43 ++++ include/linux/pkeys.h | 6 +- include/linux/sched.h | 8 + include/trace/events/kmap_thread.h | 56 +++++ init/init_task.c | 6 + kernel/fork.c | 18 ++ kernel/kexec_core.c | 8 +- lib/Kconfig.debug | 8 + lib/iov_iter.c | 12 +- lib/pks/pks_test.c | 138 +++++++++++-- lib/test_bpf.c | 4 +- lib/test_hmm.c | 8 +- mm/Kconfig | 13 ++ mm/debug.c | 23 +++ mm/memory.c | 8 +- mm/memremap.c | 90 ++++++++ mm/swapfile.c | 4 +- mm/userfaultfd.c | 4 +- net/ceph/messenger.c | 4 +- net/core/datagram.c | 4 +- net/core/sock.c | 8 +- net/ipv4/ip_output.c | 4 +- net/sunrpc/cache.c | 4 +- net/sunrpc/xdr.c | 8 +- net/tls/tls_device.c | 4 +- samples/vfio-mdev/mbochs.c | 4 +- 131 files changed, 1284 insertions(+), 565 deletions(-) create mode 100644 include/trace/events/kmap_thread.h -- 2.28.0.rc0.12.gb6a658bd00c9 ______________________________________________________ Linux MTD discussion mailing list http://lists.infradead.org/mailman/listinfo/linux-mtd/ From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-11.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,MENTIONS_GIT_HOSTING, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9BFAEC433DF for ; Fri, 9 Oct 2020 19:50:44 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 5713922282 for ; Fri, 9 Oct 2020 19:50:44 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 5713922282 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=dri-devel-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 9006A6ED91; Fri, 9 Oct 2020 19:50:43 +0000 (UTC) Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by gabe.freedesktop.org (Postfix) with ESMTPS id 11CC86ED91; Fri, 9 Oct 2020 19:50:43 +0000 (UTC) IronPort-SDR: +eCjQoTJ4s+bFNrF4DvuRgPSb0Rm+BHAcRtYHgpcL60PBnHjOvU8mPN0V8JuLrq+tTZfF4ZRim evubcVlcW/Cw== X-IronPort-AV: E=McAfee;i="6000,8403,9769"; a="152450719" X-IronPort-AV: E=Sophos;i="5.77,355,1596524400"; d="scan'208";a="152450719" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Oct 2020 12:50:41 -0700 IronPort-SDR: wgvSBhlinBwJf2eRaqYm1d4mOPDeheaaBRmvXZpaWhx0BsPjq5MOqCRmfglsuVIrge+HvLIvQ5 IT741lyNdN2Q== X-IronPort-AV: E=Sophos;i="5.77,355,1596524400"; d="scan'208";a="419536654" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Oct 2020 12:50:41 -0700 From: ira.weiny@intel.com To: Andrew Morton , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Andy Lutomirski , Peter Zijlstra Subject: [PATCH RFC PKS/PMEM 00/58] PMEM: Introduce stray write protection for PMEM Date: Fri, 9 Oct 2020 12:49:35 -0700 Message-Id: <20201009195033.3208459-1-ira.weiny@intel.com> X-Mailer: git-send-email 2.28.0.rc0.12.gb6a658bd00c9 MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: linux-aio@kvack.org, linux-efi@vger.kernel.org, kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-mmc@vger.kernel.org, Dave Hansen , dri-devel@lists.freedesktop.org, linux-mm@kvack.org, target-devel@vger.kernel.org, linux-mtd@lists.infradead.org, linux-kselftest@vger.kernel.org, samba-technical@lists.samba.org, Ira Weiny , ceph-devel@vger.kernel.org, drbd-dev@lists.linbit.com, devel@driverdev.osuosl.org, linux-cifs@vger.kernel.org, linux-nilfs@vger.kernel.org, linux-scsi@vger.kernel.org, linux-nvdimm@lists.01.org, linux-rdma@vger.kernel.org, x86@kernel.org, amd-gfx@lists.freedesktop.org, linux-afs@lists.infradead.org, cluster-devel@redhat.com, linux-cachefs@redhat.com, intel-wired-lan@lists.osuosl.org, xen-devel@lists.xenproject.org, linux-ext4@vger.kernel.org, Fenghua Yu , linux-um@lists.infradead.org, intel-gfx@lists.freedesktop.org, ecryptfs@vger.kernel.org, linux-erofs@lists.ozlabs.org, reiserfs-devel@vger.kernel.org, linux-block@vger.kernel.org, linux-bcache@vger.kernel.org, Dan Williams , io-uring@vger.kernel.org, linux-nfs@vger.kernel.org, linux-ntfs-dev@lists.sourceforge.net, netdev@vger.kernel.org, kexec@lists.infradead.org, linux-kernel@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, bpf@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-btrfs@vger.kernel.org Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" From: Ira Weiny Should a stray write in the kernel occur persistent memory is affected more than regular memory. A write to the wrong area of memory could result in latent data corruption which will will persist after a reboot. PKS provides a nice way to restrict access to persistent memory kernel mappings, while providing fast access when needed. Since the last RFC[1] this patch set has grown quite a bit. It now depends on the core patches submitted separately. https://lore.kernel.org/lkml/20201009194258.3207172-1-ira.weiny@intel.com/ And contained in the git tree here: https://github.com/weiny2/linux-kernel/tree/pks-rfc-v3 However, functionally there is only 1 major change from the last RFC. Specifically, kmap() is most often used within a single thread in a 'map/do something/unmap' pattern. In fact this is the pattern used in ~90% of the callers of kmap(). This pattern works very well for the pmem use case and the testing which was done. However, there were another ~20-30 kmap users which do not follow this pattern. Some of them seem to expect the mapping to be 'global' while others require a detailed audit to be sure.[2][3] While we don't anticipate global mappings to pmem there is a danger in changing the semantics of kmap(). Effectively, this would cause an unresolved page fault with little to no information about why. There were a number of options considered. 1) Attempt to change all the thread local kmap() calls to kmap_atomic() 2) Introduce a flags parameter to kmap() to indicate if the mapping should be global or not 3) Change ~20-30 call sites to 'kmap_global()' to indicate that they require a global mapping of the pages 4) Change ~209 call sites to 'kmap_thread()' to indicate that the mapping is to be used within that thread of execution only Option 1 is simply not feasible kmap_atomic() is not the same semantic as kmap() within a single tread. Option 2 would require all of the call sites of kmap() to change. Option 3 seems like a good minimal change but there is a danger that new code may miss the semantic change of kmap() and not get the behavior intended for future users. Therefore, option #4 was chosen. To handle the global PKRS state in the most efficient manner possible. We lazily override the thread specific PKRS key value only when needed because we anticipate PKS to not be needed will not be needed most of the time. And even when it is used 90% of the time it is a thread local call. [1] https://lore.kernel.org/lkml/20200717072056.73134-1-ira.weiny@intel.com/ [2] The following list of callers continue calling kmap() (utilizing the global PKRS). It would be nice if more of them could be converted to kmap_thread() drivers/firewire/net.c: ptr = kmap(dev->broadcast_rcv_buffer.pages[u]); drivers/gpu/drm/i915/gem/i915_gem_pages.c: return kmap(sg_page(sgt->sgl)); drivers/gpu/drm/ttm/ttm_bo_util.c: map->virtual = kmap(map->page); drivers/infiniband/hw/qib/qib_user_sdma.c: mpage = kmap(page); drivers/misc/vmw_vmci/vmci_host.c: context->notify = kmap(context->notify_page) + (uva & (PAGE_SIZE - 1)); drivers/misc/xilinx_sdfec.c: addr = kmap(pages[i]); drivers/mmc/host/usdhi6rol0.c: host->pg.mapped = kmap(host->pg.page); drivers/mmc/host/usdhi6rol0.c: host->pg.mapped = kmap(host->pg.page); drivers/mmc/host/usdhi6rol0.c: host->pg.mapped = kmap(host->pg.page); drivers/nvme/target/tcp.c: iov->iov_base = kmap(sg_page(sg)) + sg->offset + sg_offset; drivers/scsi/libiscsi_tcp.c: segment->sg_mapped = kmap(sg_page(sg)); drivers/target/iscsi/iscsi_target.c: iov[i].iov_base = kmap(sg_page(sg)) + sg->offset + page_off; drivers/target/target_core_transport.c: return kmap(sg_page(sg)) + sg->offset; fs/btrfs/check-integrity.c: block_ctx->datav[i] = kmap(block_ctx->pagev[i]); fs/ceph/dir.c: cache_ctl->dentries = kmap(cache_ctl->page); fs/ceph/inode.c: ctl->dentries = kmap(ctl->page); fs/erofs/zpvec.h: kmap_atomic(ctor->curr) : kmap(ctor->curr); lib/scatterlist.c: miter->addr = kmap(miter->page) + miter->__offset; net/ceph/pagelist.c: pl->mapped_tail = kmap(page); net/ceph/pagelist.c: pl->mapped_tail = kmap(page); virt/kvm/kvm_main.c: hva = kmap(page); [3] The following appear to follow the same pattern as ext2 which was converted after some code audit. So I _think_ they too could be converted to k[un]map_thread(). fs/freevxfs/vxfs_subr.c|75| kmap(pp); fs/jfs/jfs_metapage.c|102| kmap(page); fs/jfs/jfs_metapage.c|156| kmap(page); fs/minix/dir.c|72| kmap(page); fs/nilfs2/dir.c|195| kmap(page); fs/nilfs2/ifile.h|24| void *kaddr = kmap(ibh->b_page); fs/ntfs/aops.h|78| kmap(page); fs/ntfs/compress.c|574| kmap(page); fs/qnx6/dir.c|32| kmap(page); fs/qnx6/dir.c|58| kmap(*p = page); fs/qnx6/inode.c|190| kmap(page); fs/qnx6/inode.c|557| kmap(page); fs/reiserfs/inode.c|2397| kmap(bh_result->b_page); fs/reiserfs/xattr.c|444| kmap(page); fs/sysv/dir.c|60| kmap(page); fs/sysv/dir.c|262| kmap(page); fs/ufs/dir.c|194| kmap(page); fs/ufs/dir.c|562| kmap(page); Ira Weiny (58): x86/pks: Add a global pkrs option x86/pks/test: Add testing for global option memremap: Add zone device access protection kmap: Add stray access protection for device pages kmap: Introduce k[un]map_thread kmap: Introduce k[un]map_thread debugging drivers/drbd: Utilize new kmap_thread() drivers/firmware_loader: Utilize new kmap_thread() drivers/gpu: Utilize new kmap_thread() drivers/rdma: Utilize new kmap_thread() drivers/net: Utilize new kmap_thread() fs/afs: Utilize new kmap_thread() fs/btrfs: Utilize new kmap_thread() fs/cifs: Utilize new kmap_thread() fs/ecryptfs: Utilize new kmap_thread() fs/gfs2: Utilize new kmap_thread() fs/nilfs2: Utilize new kmap_thread() fs/hfs: Utilize new kmap_thread() fs/hfsplus: Utilize new kmap_thread() fs/jffs2: Utilize new kmap_thread() fs/nfs: Utilize new kmap_thread() fs/f2fs: Utilize new kmap_thread() fs/fuse: Utilize new kmap_thread() fs/freevxfs: Utilize new kmap_thread() fs/reiserfs: Utilize new kmap_thread() fs/zonefs: Utilize new kmap_thread() fs/ubifs: Utilize new kmap_thread() fs/cachefiles: Utilize new kmap_thread() fs/ntfs: Utilize new kmap_thread() fs/romfs: Utilize new kmap_thread() fs/vboxsf: Utilize new kmap_thread() fs/hostfs: Utilize new kmap_thread() fs/cramfs: Utilize new kmap_thread() fs/erofs: Utilize new kmap_thread() fs: Utilize new kmap_thread() fs/ext2: Use ext2_put_page fs/ext2: Utilize new kmap_thread() fs/isofs: Utilize new kmap_thread() fs/jffs2: Utilize new kmap_thread() net: Utilize new kmap_thread() drivers/target: Utilize new kmap_thread() drivers/scsi: Utilize new kmap_thread() drivers/mmc: Utilize new kmap_thread() drivers/xen: Utilize new kmap_thread() drivers/firmware: Utilize new kmap_thread() drives/staging: Utilize new kmap_thread() drivers/mtd: Utilize new kmap_thread() drivers/md: Utilize new kmap_thread() drivers/misc: Utilize new kmap_thread() drivers/android: Utilize new kmap_thread() kernel: Utilize new kmap_thread() mm: Utilize new kmap_thread() lib: Utilize new kmap_thread() powerpc: Utilize new kmap_thread() samples: Utilize new kmap_thread() dax: Stray access protection for dax_direct_access() nvdimm/pmem: Stray access protection for pmem->virt_addr [dax|pmem]: Enable stray access protection Documentation/core-api/protection-keys.rst | 11 +- arch/powerpc/mm/mem.c | 4 +- arch/x86/entry/common.c | 28 +++ arch/x86/include/asm/pkeys.h | 6 +- arch/x86/include/asm/pkeys_common.h | 8 +- arch/x86/kernel/process.c | 74 ++++++- arch/x86/mm/fault.c | 193 ++++++++++++++---- arch/x86/mm/pkeys.c | 88 ++++++-- drivers/android/binder_alloc.c | 4 +- drivers/base/firmware_loader/fallback.c | 4 +- drivers/base/firmware_loader/main.c | 4 +- drivers/block/drbd/drbd_main.c | 4 +- drivers/block/drbd/drbd_receiver.c | 12 +- drivers/dax/device.c | 2 + drivers/dax/super.c | 2 + drivers/firmware/efi/capsule-loader.c | 6 +- drivers/firmware/efi/capsule.c | 4 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 12 +- drivers/gpu/drm/gma500/gma_display.c | 4 +- drivers/gpu/drm/gma500/mmu.c | 10 +- drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 4 +- .../drm/i915/gem/selftests/i915_gem_context.c | 4 +- .../drm/i915/gem/selftests/i915_gem_mman.c | 8 +- drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c | 4 +- drivers/gpu/drm/i915/gt/intel_gtt.c | 4 +- drivers/gpu/drm/i915/gt/shmem_utils.c | 4 +- drivers/gpu/drm/i915/i915_gem.c | 8 +- drivers/gpu/drm/i915/i915_gpu_error.c | 4 +- drivers/gpu/drm/i915/selftests/i915_perf.c | 4 +- drivers/gpu/drm/radeon/radeon_ttm.c | 4 +- drivers/infiniband/hw/hfi1/sdma.c | 4 +- drivers/infiniband/hw/i40iw/i40iw_cm.c | 10 +- drivers/infiniband/sw/siw/siw_qp_tx.c | 14 +- drivers/md/bcache/request.c | 4 +- drivers/misc/vmw_vmci/vmci_queue_pair.c | 12 +- drivers/mmc/host/mmc_spi.c | 4 +- drivers/mmc/host/sdricoh_cs.c | 4 +- drivers/mtd/mtd_blkdevs.c | 12 +- drivers/net/ethernet/intel/igb/igb_ethtool.c | 4 +- .../net/ethernet/intel/ixgbe/ixgbe_ethtool.c | 4 +- drivers/nvdimm/pmem.c | 6 + drivers/scsi/ipr.c | 8 +- drivers/scsi/pmcraid.c | 8 +- drivers/staging/rts5208/rtsx_transport.c | 4 +- drivers/target/target_core_iblock.c | 4 +- drivers/target/target_core_rd.c | 4 +- drivers/target/target_core_transport.c | 4 +- drivers/xen/gntalloc.c | 4 +- fs/afs/dir.c | 16 +- fs/afs/dir_edit.c | 16 +- fs/afs/mntpt.c | 4 +- fs/afs/write.c | 4 +- fs/aio.c | 4 +- fs/binfmt_elf.c | 4 +- fs/binfmt_elf_fdpic.c | 4 +- fs/btrfs/check-integrity.c | 4 +- fs/btrfs/compression.c | 4 +- fs/btrfs/inode.c | 16 +- fs/btrfs/lzo.c | 24 +-- fs/btrfs/raid56.c | 34 +-- fs/btrfs/reflink.c | 8 +- fs/btrfs/send.c | 4 +- fs/btrfs/zlib.c | 32 +-- fs/btrfs/zstd.c | 20 +- fs/cachefiles/rdwr.c | 4 +- fs/cifs/cifsencrypt.c | 6 +- fs/cifs/file.c | 16 +- fs/cifs/smb2ops.c | 8 +- fs/cramfs/inode.c | 10 +- fs/ecryptfs/crypto.c | 8 +- fs/ecryptfs/read_write.c | 8 +- fs/erofs/super.c | 4 +- fs/erofs/xattr.c | 4 +- fs/exec.c | 10 +- fs/ext2/dir.c | 8 +- fs/ext2/ext2.h | 8 + fs/ext2/namei.c | 15 +- fs/f2fs/f2fs.h | 8 +- fs/freevxfs/vxfs_immed.c | 4 +- fs/fuse/readdir.c | 4 +- fs/gfs2/bmap.c | 4 +- fs/gfs2/ops_fstype.c | 4 +- fs/hfs/bnode.c | 14 +- fs/hfs/btree.c | 20 +- fs/hfsplus/bitmap.c | 20 +- fs/hfsplus/bnode.c | 102 ++++----- fs/hfsplus/btree.c | 18 +- fs/hostfs/hostfs_kern.c | 12 +- fs/io_uring.c | 4 +- fs/isofs/compress.c | 4 +- fs/jffs2/file.c | 8 +- fs/jffs2/gc.c | 4 +- fs/nfs/dir.c | 20 +- fs/nilfs2/alloc.c | 34 +-- fs/nilfs2/cpfile.c | 4 +- fs/ntfs/aops.c | 4 +- fs/reiserfs/journal.c | 4 +- fs/romfs/super.c | 4 +- fs/splice.c | 4 +- fs/ubifs/file.c | 16 +- fs/vboxsf/file.c | 12 +- fs/zonefs/super.c | 4 +- include/linux/entry-common.h | 3 + include/linux/highmem.h | 63 +++++- include/linux/memremap.h | 1 + include/linux/mm.h | 43 ++++ include/linux/pkeys.h | 6 +- include/linux/sched.h | 8 + include/trace/events/kmap_thread.h | 56 +++++ init/init_task.c | 6 + kernel/fork.c | 18 ++ kernel/kexec_core.c | 8 +- lib/Kconfig.debug | 8 + lib/iov_iter.c | 12 +- lib/pks/pks_test.c | 138 +++++++++++-- lib/test_bpf.c | 4 +- lib/test_hmm.c | 8 +- mm/Kconfig | 13 ++ mm/debug.c | 23 +++ mm/memory.c | 8 +- mm/memremap.c | 90 ++++++++ mm/swapfile.c | 4 +- mm/userfaultfd.c | 4 +- net/ceph/messenger.c | 4 +- net/core/datagram.c | 4 +- net/core/sock.c | 8 +- net/ipv4/ip_output.c | 4 +- net/sunrpc/cache.c | 4 +- net/sunrpc/xdr.c | 8 +- net/tls/tls_device.c | 4 +- samples/vfio-mdev/mbochs.c | 4 +- 131 files changed, 1284 insertions(+), 565 deletions(-) create mode 100644 include/trace/events/kmap_thread.h -- 2.28.0.rc0.12.gb6a658bd00c9 _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-11.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,MENTIONS_GIT_HOSTING, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id DED71C4363A for ; Fri, 9 Oct 2020 19:50:47 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id AA180222BA for ; Fri, 9 Oct 2020 19:50:47 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org AA180222BA Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 1F1B26ED93; Fri, 9 Oct 2020 19:50:45 +0000 (UTC) Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by gabe.freedesktop.org (Postfix) with ESMTPS id 11CC86ED91; Fri, 9 Oct 2020 19:50:43 +0000 (UTC) IronPort-SDR: +eCjQoTJ4s+bFNrF4DvuRgPSb0Rm+BHAcRtYHgpcL60PBnHjOvU8mPN0V8JuLrq+tTZfF4ZRim evubcVlcW/Cw== X-IronPort-AV: E=McAfee;i="6000,8403,9769"; a="152450719" X-IronPort-AV: E=Sophos;i="5.77,355,1596524400"; d="scan'208";a="152450719" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Oct 2020 12:50:41 -0700 IronPort-SDR: wgvSBhlinBwJf2eRaqYm1d4mOPDeheaaBRmvXZpaWhx0BsPjq5MOqCRmfglsuVIrge+HvLIvQ5 IT741lyNdN2Q== X-IronPort-AV: E=Sophos;i="5.77,355,1596524400"; d="scan'208";a="419536654" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Oct 2020 12:50:41 -0700 From: ira.weiny@intel.com To: Andrew Morton , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Andy Lutomirski , Peter Zijlstra Date: Fri, 9 Oct 2020 12:49:35 -0700 Message-Id: <20201009195033.3208459-1-ira.weiny@intel.com> X-Mailer: git-send-email 2.28.0.rc0.12.gb6a658bd00c9 MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH RFC PKS/PMEM 00/58] PMEM: Introduce stray write protection for PMEM X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: linux-aio@kvack.org, linux-efi@vger.kernel.org, kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-mmc@vger.kernel.org, Dave Hansen , dri-devel@lists.freedesktop.org, linux-mm@kvack.org, target-devel@vger.kernel.org, linux-mtd@lists.infradead.org, linux-kselftest@vger.kernel.org, samba-technical@lists.samba.org, Ira Weiny , ceph-devel@vger.kernel.org, drbd-dev@lists.linbit.com, devel@driverdev.osuosl.org, linux-cifs@vger.kernel.org, linux-nilfs@vger.kernel.org, linux-scsi@vger.kernel.org, linux-nvdimm@lists.01.org, linux-rdma@vger.kernel.org, x86@kernel.org, amd-gfx@lists.freedesktop.org, linux-afs@lists.infradead.org, cluster-devel@redhat.com, linux-cachefs@redhat.com, intel-wired-lan@lists.osuosl.org, xen-devel@lists.xenproject.org, linux-ext4@vger.kernel.org, Fenghua Yu , linux-um@lists.infradead.org, intel-gfx@lists.freedesktop.org, ecryptfs@vger.kernel.org, linux-erofs@lists.ozlabs.org, reiserfs-devel@vger.kernel.org, linux-block@vger.kernel.org, linux-bcache@vger.kernel.org, Dan Williams , io-uring@vger.kernel.org, linux-nfs@vger.kernel.org, linux-ntfs-dev@lists.sourceforge.net, netdev@vger.kernel.org, kexec@lists.infradead.org, linux-kernel@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, bpf@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-btrfs@vger.kernel.org Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" From: Ira Weiny Should a stray write in the kernel occur persistent memory is affected more than regular memory. A write to the wrong area of memory could result in latent data corruption which will will persist after a reboot. PKS provides a nice way to restrict access to persistent memory kernel mappings, while providing fast access when needed. Since the last RFC[1] this patch set has grown quite a bit. It now depends on the core patches submitted separately. https://lore.kernel.org/lkml/20201009194258.3207172-1-ira.weiny@intel.com/ And contained in the git tree here: https://github.com/weiny2/linux-kernel/tree/pks-rfc-v3 However, functionally there is only 1 major change from the last RFC. Specifically, kmap() is most often used within a single thread in a 'map/do something/unmap' pattern. In fact this is the pattern used in ~90% of the callers of kmap(). This pattern works very well for the pmem use case and the testing which was done. However, there were another ~20-30 kmap users which do not follow this pattern. Some of them seem to expect the mapping to be 'global' while others require a detailed audit to be sure.[2][3] While we don't anticipate global mappings to pmem there is a danger in changing the semantics of kmap(). Effectively, this would cause an unresolved page fault with little to no information about why. There were a number of options considered. 1) Attempt to change all the thread local kmap() calls to kmap_atomic() 2) Introduce a flags parameter to kmap() to indicate if the mapping should be global or not 3) Change ~20-30 call sites to 'kmap_global()' to indicate that they require a global mapping of the pages 4) Change ~209 call sites to 'kmap_thread()' to indicate that the mapping is to be used within that thread of execution only Option 1 is simply not feasible kmap_atomic() is not the same semantic as kmap() within a single tread. Option 2 would require all of the call sites of kmap() to change. Option 3 seems like a good minimal change but there is a danger that new code may miss the semantic change of kmap() and not get the behavior intended for future users. Therefore, option #4 was chosen. To handle the global PKRS state in the most efficient manner possible. We lazily override the thread specific PKRS key value only when needed because we anticipate PKS to not be needed will not be needed most of the time. And even when it is used 90% of the time it is a thread local call. [1] https://lore.kernel.org/lkml/20200717072056.73134-1-ira.weiny@intel.com/ [2] The following list of callers continue calling kmap() (utilizing the global PKRS). It would be nice if more of them could be converted to kmap_thread() drivers/firewire/net.c: ptr = kmap(dev->broadcast_rcv_buffer.pages[u]); drivers/gpu/drm/i915/gem/i915_gem_pages.c: return kmap(sg_page(sgt->sgl)); drivers/gpu/drm/ttm/ttm_bo_util.c: map->virtual = kmap(map->page); drivers/infiniband/hw/qib/qib_user_sdma.c: mpage = kmap(page); drivers/misc/vmw_vmci/vmci_host.c: context->notify = kmap(context->notify_page) + (uva & (PAGE_SIZE - 1)); drivers/misc/xilinx_sdfec.c: addr = kmap(pages[i]); drivers/mmc/host/usdhi6rol0.c: host->pg.mapped = kmap(host->pg.page); drivers/mmc/host/usdhi6rol0.c: host->pg.mapped = kmap(host->pg.page); drivers/mmc/host/usdhi6rol0.c: host->pg.mapped = kmap(host->pg.page); drivers/nvme/target/tcp.c: iov->iov_base = kmap(sg_page(sg)) + sg->offset + sg_offset; drivers/scsi/libiscsi_tcp.c: segment->sg_mapped = kmap(sg_page(sg)); drivers/target/iscsi/iscsi_target.c: iov[i].iov_base = kmap(sg_page(sg)) + sg->offset + page_off; drivers/target/target_core_transport.c: return kmap(sg_page(sg)) + sg->offset; fs/btrfs/check-integrity.c: block_ctx->datav[i] = kmap(block_ctx->pagev[i]); fs/ceph/dir.c: cache_ctl->dentries = kmap(cache_ctl->page); fs/ceph/inode.c: ctl->dentries = kmap(ctl->page); fs/erofs/zpvec.h: kmap_atomic(ctor->curr) : kmap(ctor->curr); lib/scatterlist.c: miter->addr = kmap(miter->page) + miter->__offset; net/ceph/pagelist.c: pl->mapped_tail = kmap(page); net/ceph/pagelist.c: pl->mapped_tail = kmap(page); virt/kvm/kvm_main.c: hva = kmap(page); [3] The following appear to follow the same pattern as ext2 which was converted after some code audit. So I _think_ they too could be converted to k[un]map_thread(). fs/freevxfs/vxfs_subr.c|75| kmap(pp); fs/jfs/jfs_metapage.c|102| kmap(page); fs/jfs/jfs_metapage.c|156| kmap(page); fs/minix/dir.c|72| kmap(page); fs/nilfs2/dir.c|195| kmap(page); fs/nilfs2/ifile.h|24| void *kaddr = kmap(ibh->b_page); fs/ntfs/aops.h|78| kmap(page); fs/ntfs/compress.c|574| kmap(page); fs/qnx6/dir.c|32| kmap(page); fs/qnx6/dir.c|58| kmap(*p = page); fs/qnx6/inode.c|190| kmap(page); fs/qnx6/inode.c|557| kmap(page); fs/reiserfs/inode.c|2397| kmap(bh_result->b_page); fs/reiserfs/xattr.c|444| kmap(page); fs/sysv/dir.c|60| kmap(page); fs/sysv/dir.c|262| kmap(page); fs/ufs/dir.c|194| kmap(page); fs/ufs/dir.c|562| kmap(page); Ira Weiny (58): x86/pks: Add a global pkrs option x86/pks/test: Add testing for global option memremap: Add zone device access protection kmap: Add stray access protection for device pages kmap: Introduce k[un]map_thread kmap: Introduce k[un]map_thread debugging drivers/drbd: Utilize new kmap_thread() drivers/firmware_loader: Utilize new kmap_thread() drivers/gpu: Utilize new kmap_thread() drivers/rdma: Utilize new kmap_thread() drivers/net: Utilize new kmap_thread() fs/afs: Utilize new kmap_thread() fs/btrfs: Utilize new kmap_thread() fs/cifs: Utilize new kmap_thread() fs/ecryptfs: Utilize new kmap_thread() fs/gfs2: Utilize new kmap_thread() fs/nilfs2: Utilize new kmap_thread() fs/hfs: Utilize new kmap_thread() fs/hfsplus: Utilize new kmap_thread() fs/jffs2: Utilize new kmap_thread() fs/nfs: Utilize new kmap_thread() fs/f2fs: Utilize new kmap_thread() fs/fuse: Utilize new kmap_thread() fs/freevxfs: Utilize new kmap_thread() fs/reiserfs: Utilize new kmap_thread() fs/zonefs: Utilize new kmap_thread() fs/ubifs: Utilize new kmap_thread() fs/cachefiles: Utilize new kmap_thread() fs/ntfs: Utilize new kmap_thread() fs/romfs: Utilize new kmap_thread() fs/vboxsf: Utilize new kmap_thread() fs/hostfs: Utilize new kmap_thread() fs/cramfs: Utilize new kmap_thread() fs/erofs: Utilize new kmap_thread() fs: Utilize new kmap_thread() fs/ext2: Use ext2_put_page fs/ext2: Utilize new kmap_thread() fs/isofs: Utilize new kmap_thread() fs/jffs2: Utilize new kmap_thread() net: Utilize new kmap_thread() drivers/target: Utilize new kmap_thread() drivers/scsi: Utilize new kmap_thread() drivers/mmc: Utilize new kmap_thread() drivers/xen: Utilize new kmap_thread() drivers/firmware: Utilize new kmap_thread() drives/staging: Utilize new kmap_thread() drivers/mtd: Utilize new kmap_thread() drivers/md: Utilize new kmap_thread() drivers/misc: Utilize new kmap_thread() drivers/android: Utilize new kmap_thread() kernel: Utilize new kmap_thread() mm: Utilize new kmap_thread() lib: Utilize new kmap_thread() powerpc: Utilize new kmap_thread() samples: Utilize new kmap_thread() dax: Stray access protection for dax_direct_access() nvdimm/pmem: Stray access protection for pmem->virt_addr [dax|pmem]: Enable stray access protection Documentation/core-api/protection-keys.rst | 11 +- arch/powerpc/mm/mem.c | 4 +- arch/x86/entry/common.c | 28 +++ arch/x86/include/asm/pkeys.h | 6 +- arch/x86/include/asm/pkeys_common.h | 8 +- arch/x86/kernel/process.c | 74 ++++++- arch/x86/mm/fault.c | 193 ++++++++++++++---- arch/x86/mm/pkeys.c | 88 ++++++-- drivers/android/binder_alloc.c | 4 +- drivers/base/firmware_loader/fallback.c | 4 +- drivers/base/firmware_loader/main.c | 4 +- drivers/block/drbd/drbd_main.c | 4 +- drivers/block/drbd/drbd_receiver.c | 12 +- drivers/dax/device.c | 2 + drivers/dax/super.c | 2 + drivers/firmware/efi/capsule-loader.c | 6 +- drivers/firmware/efi/capsule.c | 4 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 12 +- drivers/gpu/drm/gma500/gma_display.c | 4 +- drivers/gpu/drm/gma500/mmu.c | 10 +- drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 4 +- .../drm/i915/gem/selftests/i915_gem_context.c | 4 +- .../drm/i915/gem/selftests/i915_gem_mman.c | 8 +- drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c | 4 +- drivers/gpu/drm/i915/gt/intel_gtt.c | 4 +- drivers/gpu/drm/i915/gt/shmem_utils.c | 4 +- drivers/gpu/drm/i915/i915_gem.c | 8 +- drivers/gpu/drm/i915/i915_gpu_error.c | 4 +- drivers/gpu/drm/i915/selftests/i915_perf.c | 4 +- drivers/gpu/drm/radeon/radeon_ttm.c | 4 +- drivers/infiniband/hw/hfi1/sdma.c | 4 +- drivers/infiniband/hw/i40iw/i40iw_cm.c | 10 +- drivers/infiniband/sw/siw/siw_qp_tx.c | 14 +- drivers/md/bcache/request.c | 4 +- drivers/misc/vmw_vmci/vmci_queue_pair.c | 12 +- drivers/mmc/host/mmc_spi.c | 4 +- drivers/mmc/host/sdricoh_cs.c | 4 +- drivers/mtd/mtd_blkdevs.c | 12 +- drivers/net/ethernet/intel/igb/igb_ethtool.c | 4 +- .../net/ethernet/intel/ixgbe/ixgbe_ethtool.c | 4 +- drivers/nvdimm/pmem.c | 6 + drivers/scsi/ipr.c | 8 +- drivers/scsi/pmcraid.c | 8 +- drivers/staging/rts5208/rtsx_transport.c | 4 +- drivers/target/target_core_iblock.c | 4 +- drivers/target/target_core_rd.c | 4 +- drivers/target/target_core_transport.c | 4 +- drivers/xen/gntalloc.c | 4 +- fs/afs/dir.c | 16 +- fs/afs/dir_edit.c | 16 +- fs/afs/mntpt.c | 4 +- fs/afs/write.c | 4 +- fs/aio.c | 4 +- fs/binfmt_elf.c | 4 +- fs/binfmt_elf_fdpic.c | 4 +- fs/btrfs/check-integrity.c | 4 +- fs/btrfs/compression.c | 4 +- fs/btrfs/inode.c | 16 +- fs/btrfs/lzo.c | 24 +-- fs/btrfs/raid56.c | 34 +-- fs/btrfs/reflink.c | 8 +- fs/btrfs/send.c | 4 +- fs/btrfs/zlib.c | 32 +-- fs/btrfs/zstd.c | 20 +- fs/cachefiles/rdwr.c | 4 +- fs/cifs/cifsencrypt.c | 6 +- fs/cifs/file.c | 16 +- fs/cifs/smb2ops.c | 8 +- fs/cramfs/inode.c | 10 +- fs/ecryptfs/crypto.c | 8 +- fs/ecryptfs/read_write.c | 8 +- fs/erofs/super.c | 4 +- fs/erofs/xattr.c | 4 +- fs/exec.c | 10 +- fs/ext2/dir.c | 8 +- fs/ext2/ext2.h | 8 + fs/ext2/namei.c | 15 +- fs/f2fs/f2fs.h | 8 +- fs/freevxfs/vxfs_immed.c | 4 +- fs/fuse/readdir.c | 4 +- fs/gfs2/bmap.c | 4 +- fs/gfs2/ops_fstype.c | 4 +- fs/hfs/bnode.c | 14 +- fs/hfs/btree.c | 20 +- fs/hfsplus/bitmap.c | 20 +- fs/hfsplus/bnode.c | 102 ++++----- fs/hfsplus/btree.c | 18 +- fs/hostfs/hostfs_kern.c | 12 +- fs/io_uring.c | 4 +- fs/isofs/compress.c | 4 +- fs/jffs2/file.c | 8 +- fs/jffs2/gc.c | 4 +- fs/nfs/dir.c | 20 +- fs/nilfs2/alloc.c | 34 +-- fs/nilfs2/cpfile.c | 4 +- fs/ntfs/aops.c | 4 +- fs/reiserfs/journal.c | 4 +- fs/romfs/super.c | 4 +- fs/splice.c | 4 +- fs/ubifs/file.c | 16 +- fs/vboxsf/file.c | 12 +- fs/zonefs/super.c | 4 +- include/linux/entry-common.h | 3 + include/linux/highmem.h | 63 +++++- include/linux/memremap.h | 1 + include/linux/mm.h | 43 ++++ include/linux/pkeys.h | 6 +- include/linux/sched.h | 8 + include/trace/events/kmap_thread.h | 56 +++++ init/init_task.c | 6 + kernel/fork.c | 18 ++ kernel/kexec_core.c | 8 +- lib/Kconfig.debug | 8 + lib/iov_iter.c | 12 +- lib/pks/pks_test.c | 138 +++++++++++-- lib/test_bpf.c | 4 +- lib/test_hmm.c | 8 +- mm/Kconfig | 13 ++ mm/debug.c | 23 +++ mm/memory.c | 8 +- mm/memremap.c | 90 ++++++++ mm/swapfile.c | 4 +- mm/userfaultfd.c | 4 +- net/ceph/messenger.c | 4 +- net/core/datagram.c | 4 +- net/core/sock.c | 8 +- net/ipv4/ip_output.c | 4 +- net/sunrpc/cache.c | 4 +- net/sunrpc/xdr.c | 8 +- net/tls/tls_device.c | 4 +- samples/vfio-mdev/mbochs.c | 4 +- 131 files changed, 1284 insertions(+), 565 deletions(-) create mode 100644 include/trace/events/kmap_thread.h -- 2.28.0.rc0.12.gb6a658bd00c9 _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-11.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,MENTIONS_GIT_HOSTING, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E8EC2C43467 for ; Fri, 9 Oct 2020 21:11:07 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id A1154222C3 for ; Fri, 9 Oct 2020 21:11:07 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A1154222C3 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=amd-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 3C4746EE39; Fri, 9 Oct 2020 21:10:53 +0000 (UTC) Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by gabe.freedesktop.org (Postfix) with ESMTPS id 11CC86ED91; Fri, 9 Oct 2020 19:50:43 +0000 (UTC) IronPort-SDR: +eCjQoTJ4s+bFNrF4DvuRgPSb0Rm+BHAcRtYHgpcL60PBnHjOvU8mPN0V8JuLrq+tTZfF4ZRim evubcVlcW/Cw== X-IronPort-AV: E=McAfee;i="6000,8403,9769"; a="152450719" X-IronPort-AV: E=Sophos;i="5.77,355,1596524400"; d="scan'208";a="152450719" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Oct 2020 12:50:41 -0700 IronPort-SDR: wgvSBhlinBwJf2eRaqYm1d4mOPDeheaaBRmvXZpaWhx0BsPjq5MOqCRmfglsuVIrge+HvLIvQ5 IT741lyNdN2Q== X-IronPort-AV: E=Sophos;i="5.77,355,1596524400"; d="scan'208";a="419536654" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Oct 2020 12:50:41 -0700 From: ira.weiny@intel.com To: Andrew Morton , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Andy Lutomirski , Peter Zijlstra Subject: [PATCH RFC PKS/PMEM 00/58] PMEM: Introduce stray write protection for PMEM Date: Fri, 9 Oct 2020 12:49:35 -0700 Message-Id: <20201009195033.3208459-1-ira.weiny@intel.com> X-Mailer: git-send-email 2.28.0.rc0.12.gb6a658bd00c9 MIME-Version: 1.0 X-Mailman-Approved-At: Fri, 09 Oct 2020 21:10:40 +0000 X-BeenThere: amd-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Discussion list for AMD gfx List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: linux-aio@kvack.org, linux-efi@vger.kernel.org, kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-mmc@vger.kernel.org, Dave Hansen , dri-devel@lists.freedesktop.org, linux-mm@kvack.org, target-devel@vger.kernel.org, linux-mtd@lists.infradead.org, linux-kselftest@vger.kernel.org, samba-technical@lists.samba.org, Ira Weiny , ceph-devel@vger.kernel.org, drbd-dev@lists.linbit.com, devel@driverdev.osuosl.org, linux-cifs@vger.kernel.org, linux-nilfs@vger.kernel.org, linux-scsi@vger.kernel.org, linux-nvdimm@lists.01.org, linux-rdma@vger.kernel.org, x86@kernel.org, amd-gfx@lists.freedesktop.org, linux-afs@lists.infradead.org, cluster-devel@redhat.com, linux-cachefs@redhat.com, intel-wired-lan@lists.osuosl.org, xen-devel@lists.xenproject.org, linux-ext4@vger.kernel.org, Fenghua Yu , linux-um@lists.infradead.org, intel-gfx@lists.freedesktop.org, ecryptfs@vger.kernel.org, linux-erofs@lists.ozlabs.org, reiserfs-devel@vger.kernel.org, linux-block@vger.kernel.org, linux-bcache@vger.kernel.org, Dan Williams , io-uring@vger.kernel.org, linux-nfs@vger.kernel.org, linux-ntfs-dev@lists.sourceforge.net, netdev@vger.kernel.org, kexec@lists.infradead.org, linux-kernel@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, bpf@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-btrfs@vger.kernel.org Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Errors-To: amd-gfx-bounces@lists.freedesktop.org Sender: "amd-gfx" From: Ira Weiny Should a stray write in the kernel occur persistent memory is affected more than regular memory. A write to the wrong area of memory could result in latent data corruption which will will persist after a reboot. PKS provides a nice way to restrict access to persistent memory kernel mappings, while providing fast access when needed. Since the last RFC[1] this patch set has grown quite a bit. It now depends on the core patches submitted separately. https://lore.kernel.org/lkml/20201009194258.3207172-1-ira.weiny@intel.com/ And contained in the git tree here: https://github.com/weiny2/linux-kernel/tree/pks-rfc-v3 However, functionally there is only 1 major change from the last RFC. Specifically, kmap() is most often used within a single thread in a 'map/do something/unmap' pattern. In fact this is the pattern used in ~90% of the callers of kmap(). This pattern works very well for the pmem use case and the testing which was done. However, there were another ~20-30 kmap users which do not follow this pattern. Some of them seem to expect the mapping to be 'global' while others require a detailed audit to be sure.[2][3] While we don't anticipate global mappings to pmem there is a danger in changing the semantics of kmap(). Effectively, this would cause an unresolved page fault with little to no information about why. There were a number of options considered. 1) Attempt to change all the thread local kmap() calls to kmap_atomic() 2) Introduce a flags parameter to kmap() to indicate if the mapping should be global or not 3) Change ~20-30 call sites to 'kmap_global()' to indicate that they require a global mapping of the pages 4) Change ~209 call sites to 'kmap_thread()' to indicate that the mapping is to be used within that thread of execution only Option 1 is simply not feasible kmap_atomic() is not the same semantic as kmap() within a single tread. Option 2 would require all of the call sites of kmap() to change. Option 3 seems like a good minimal change but there is a danger that new code may miss the semantic change of kmap() and not get the behavior intended for future users. Therefore, option #4 was chosen. To handle the global PKRS state in the most efficient manner possible. We lazily override the thread specific PKRS key value only when needed because we anticipate PKS to not be needed will not be needed most of the time. And even when it is used 90% of the time it is a thread local call. [1] https://lore.kernel.org/lkml/20200717072056.73134-1-ira.weiny@intel.com/ [2] The following list of callers continue calling kmap() (utilizing the global PKRS). It would be nice if more of them could be converted to kmap_thread() drivers/firewire/net.c: ptr = kmap(dev->broadcast_rcv_buffer.pages[u]); drivers/gpu/drm/i915/gem/i915_gem_pages.c: return kmap(sg_page(sgt->sgl)); drivers/gpu/drm/ttm/ttm_bo_util.c: map->virtual = kmap(map->page); drivers/infiniband/hw/qib/qib_user_sdma.c: mpage = kmap(page); drivers/misc/vmw_vmci/vmci_host.c: context->notify = kmap(context->notify_page) + (uva & (PAGE_SIZE - 1)); drivers/misc/xilinx_sdfec.c: addr = kmap(pages[i]); drivers/mmc/host/usdhi6rol0.c: host->pg.mapped = kmap(host->pg.page); drivers/mmc/host/usdhi6rol0.c: host->pg.mapped = kmap(host->pg.page); drivers/mmc/host/usdhi6rol0.c: host->pg.mapped = kmap(host->pg.page); drivers/nvme/target/tcp.c: iov->iov_base = kmap(sg_page(sg)) + sg->offset + sg_offset; drivers/scsi/libiscsi_tcp.c: segment->sg_mapped = kmap(sg_page(sg)); drivers/target/iscsi/iscsi_target.c: iov[i].iov_base = kmap(sg_page(sg)) + sg->offset + page_off; drivers/target/target_core_transport.c: return kmap(sg_page(sg)) + sg->offset; fs/btrfs/check-integrity.c: block_ctx->datav[i] = kmap(block_ctx->pagev[i]); fs/ceph/dir.c: cache_ctl->dentries = kmap(cache_ctl->page); fs/ceph/inode.c: ctl->dentries = kmap(ctl->page); fs/erofs/zpvec.h: kmap_atomic(ctor->curr) : kmap(ctor->curr); lib/scatterlist.c: miter->addr = kmap(miter->page) + miter->__offset; net/ceph/pagelist.c: pl->mapped_tail = kmap(page); net/ceph/pagelist.c: pl->mapped_tail = kmap(page); virt/kvm/kvm_main.c: hva = kmap(page); [3] The following appear to follow the same pattern as ext2 which was converted after some code audit. So I _think_ they too could be converted to k[un]map_thread(). fs/freevxfs/vxfs_subr.c|75| kmap(pp); fs/jfs/jfs_metapage.c|102| kmap(page); fs/jfs/jfs_metapage.c|156| kmap(page); fs/minix/dir.c|72| kmap(page); fs/nilfs2/dir.c|195| kmap(page); fs/nilfs2/ifile.h|24| void *kaddr = kmap(ibh->b_page); fs/ntfs/aops.h|78| kmap(page); fs/ntfs/compress.c|574| kmap(page); fs/qnx6/dir.c|32| kmap(page); fs/qnx6/dir.c|58| kmap(*p = page); fs/qnx6/inode.c|190| kmap(page); fs/qnx6/inode.c|557| kmap(page); fs/reiserfs/inode.c|2397| kmap(bh_result->b_page); fs/reiserfs/xattr.c|444| kmap(page); fs/sysv/dir.c|60| kmap(page); fs/sysv/dir.c|262| kmap(page); fs/ufs/dir.c|194| kmap(page); fs/ufs/dir.c|562| kmap(page); Ira Weiny (58): x86/pks: Add a global pkrs option x86/pks/test: Add testing for global option memremap: Add zone device access protection kmap: Add stray access protection for device pages kmap: Introduce k[un]map_thread kmap: Introduce k[un]map_thread debugging drivers/drbd: Utilize new kmap_thread() drivers/firmware_loader: Utilize new kmap_thread() drivers/gpu: Utilize new kmap_thread() drivers/rdma: Utilize new kmap_thread() drivers/net: Utilize new kmap_thread() fs/afs: Utilize new kmap_thread() fs/btrfs: Utilize new kmap_thread() fs/cifs: Utilize new kmap_thread() fs/ecryptfs: Utilize new kmap_thread() fs/gfs2: Utilize new kmap_thread() fs/nilfs2: Utilize new kmap_thread() fs/hfs: Utilize new kmap_thread() fs/hfsplus: Utilize new kmap_thread() fs/jffs2: Utilize new kmap_thread() fs/nfs: Utilize new kmap_thread() fs/f2fs: Utilize new kmap_thread() fs/fuse: Utilize new kmap_thread() fs/freevxfs: Utilize new kmap_thread() fs/reiserfs: Utilize new kmap_thread() fs/zonefs: Utilize new kmap_thread() fs/ubifs: Utilize new kmap_thread() fs/cachefiles: Utilize new kmap_thread() fs/ntfs: Utilize new kmap_thread() fs/romfs: Utilize new kmap_thread() fs/vboxsf: Utilize new kmap_thread() fs/hostfs: Utilize new kmap_thread() fs/cramfs: Utilize new kmap_thread() fs/erofs: Utilize new kmap_thread() fs: Utilize new kmap_thread() fs/ext2: Use ext2_put_page fs/ext2: Utilize new kmap_thread() fs/isofs: Utilize new kmap_thread() fs/jffs2: Utilize new kmap_thread() net: Utilize new kmap_thread() drivers/target: Utilize new kmap_thread() drivers/scsi: Utilize new kmap_thread() drivers/mmc: Utilize new kmap_thread() drivers/xen: Utilize new kmap_thread() drivers/firmware: Utilize new kmap_thread() drives/staging: Utilize new kmap_thread() drivers/mtd: Utilize new kmap_thread() drivers/md: Utilize new kmap_thread() drivers/misc: Utilize new kmap_thread() drivers/android: Utilize new kmap_thread() kernel: Utilize new kmap_thread() mm: Utilize new kmap_thread() lib: Utilize new kmap_thread() powerpc: Utilize new kmap_thread() samples: Utilize new kmap_thread() dax: Stray access protection for dax_direct_access() nvdimm/pmem: Stray access protection for pmem->virt_addr [dax|pmem]: Enable stray access protection Documentation/core-api/protection-keys.rst | 11 +- arch/powerpc/mm/mem.c | 4 +- arch/x86/entry/common.c | 28 +++ arch/x86/include/asm/pkeys.h | 6 +- arch/x86/include/asm/pkeys_common.h | 8 +- arch/x86/kernel/process.c | 74 ++++++- arch/x86/mm/fault.c | 193 ++++++++++++++---- arch/x86/mm/pkeys.c | 88 ++++++-- drivers/android/binder_alloc.c | 4 +- drivers/base/firmware_loader/fallback.c | 4 +- drivers/base/firmware_loader/main.c | 4 +- drivers/block/drbd/drbd_main.c | 4 +- drivers/block/drbd/drbd_receiver.c | 12 +- drivers/dax/device.c | 2 + drivers/dax/super.c | 2 + drivers/firmware/efi/capsule-loader.c | 6 +- drivers/firmware/efi/capsule.c | 4 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 12 +- drivers/gpu/drm/gma500/gma_display.c | 4 +- drivers/gpu/drm/gma500/mmu.c | 10 +- drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 4 +- .../drm/i915/gem/selftests/i915_gem_context.c | 4 +- .../drm/i915/gem/selftests/i915_gem_mman.c | 8 +- drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c | 4 +- drivers/gpu/drm/i915/gt/intel_gtt.c | 4 +- drivers/gpu/drm/i915/gt/shmem_utils.c | 4 +- drivers/gpu/drm/i915/i915_gem.c | 8 +- drivers/gpu/drm/i915/i915_gpu_error.c | 4 +- drivers/gpu/drm/i915/selftests/i915_perf.c | 4 +- drivers/gpu/drm/radeon/radeon_ttm.c | 4 +- drivers/infiniband/hw/hfi1/sdma.c | 4 +- drivers/infiniband/hw/i40iw/i40iw_cm.c | 10 +- drivers/infiniband/sw/siw/siw_qp_tx.c | 14 +- drivers/md/bcache/request.c | 4 +- drivers/misc/vmw_vmci/vmci_queue_pair.c | 12 +- drivers/mmc/host/mmc_spi.c | 4 +- drivers/mmc/host/sdricoh_cs.c | 4 +- drivers/mtd/mtd_blkdevs.c | 12 +- drivers/net/ethernet/intel/igb/igb_ethtool.c | 4 +- .../net/ethernet/intel/ixgbe/ixgbe_ethtool.c | 4 +- drivers/nvdimm/pmem.c | 6 + drivers/scsi/ipr.c | 8 +- drivers/scsi/pmcraid.c | 8 +- drivers/staging/rts5208/rtsx_transport.c | 4 +- drivers/target/target_core_iblock.c | 4 +- drivers/target/target_core_rd.c | 4 +- drivers/target/target_core_transport.c | 4 +- drivers/xen/gntalloc.c | 4 +- fs/afs/dir.c | 16 +- fs/afs/dir_edit.c | 16 +- fs/afs/mntpt.c | 4 +- fs/afs/write.c | 4 +- fs/aio.c | 4 +- fs/binfmt_elf.c | 4 +- fs/binfmt_elf_fdpic.c | 4 +- fs/btrfs/check-integrity.c | 4 +- fs/btrfs/compression.c | 4 +- fs/btrfs/inode.c | 16 +- fs/btrfs/lzo.c | 24 +-- fs/btrfs/raid56.c | 34 +-- fs/btrfs/reflink.c | 8 +- fs/btrfs/send.c | 4 +- fs/btrfs/zlib.c | 32 +-- fs/btrfs/zstd.c | 20 +- fs/cachefiles/rdwr.c | 4 +- fs/cifs/cifsencrypt.c | 6 +- fs/cifs/file.c | 16 +- fs/cifs/smb2ops.c | 8 +- fs/cramfs/inode.c | 10 +- fs/ecryptfs/crypto.c | 8 +- fs/ecryptfs/read_write.c | 8 +- fs/erofs/super.c | 4 +- fs/erofs/xattr.c | 4 +- fs/exec.c | 10 +- fs/ext2/dir.c | 8 +- fs/ext2/ext2.h | 8 + fs/ext2/namei.c | 15 +- fs/f2fs/f2fs.h | 8 +- fs/freevxfs/vxfs_immed.c | 4 +- fs/fuse/readdir.c | 4 +- fs/gfs2/bmap.c | 4 +- fs/gfs2/ops_fstype.c | 4 +- fs/hfs/bnode.c | 14 +- fs/hfs/btree.c | 20 +- fs/hfsplus/bitmap.c | 20 +- fs/hfsplus/bnode.c | 102 ++++----- fs/hfsplus/btree.c | 18 +- fs/hostfs/hostfs_kern.c | 12 +- fs/io_uring.c | 4 +- fs/isofs/compress.c | 4 +- fs/jffs2/file.c | 8 +- fs/jffs2/gc.c | 4 +- fs/nfs/dir.c | 20 +- fs/nilfs2/alloc.c | 34 +-- fs/nilfs2/cpfile.c | 4 +- fs/ntfs/aops.c | 4 +- fs/reiserfs/journal.c | 4 +- fs/romfs/super.c | 4 +- fs/splice.c | 4 +- fs/ubifs/file.c | 16 +- fs/vboxsf/file.c | 12 +- fs/zonefs/super.c | 4 +- include/linux/entry-common.h | 3 + include/linux/highmem.h | 63 +++++- include/linux/memremap.h | 1 + include/linux/mm.h | 43 ++++ include/linux/pkeys.h | 6 +- include/linux/sched.h | 8 + include/trace/events/kmap_thread.h | 56 +++++ init/init_task.c | 6 + kernel/fork.c | 18 ++ kernel/kexec_core.c | 8 +- lib/Kconfig.debug | 8 + lib/iov_iter.c | 12 +- lib/pks/pks_test.c | 138 +++++++++++-- lib/test_bpf.c | 4 +- lib/test_hmm.c | 8 +- mm/Kconfig | 13 ++ mm/debug.c | 23 +++ mm/memory.c | 8 +- mm/memremap.c | 90 ++++++++ mm/swapfile.c | 4 +- mm/userfaultfd.c | 4 +- net/ceph/messenger.c | 4 +- net/core/datagram.c | 4 +- net/core/sock.c | 8 +- net/ipv4/ip_output.c | 4 +- net/sunrpc/cache.c | 4 +- net/sunrpc/xdr.c | 8 +- net/tls/tls_device.c | 4 +- samples/vfio-mdev/mbochs.c | 4 +- 131 files changed, 1284 insertions(+), 565 deletions(-) create mode 100644 include/trace/events/kmap_thread.h -- 2.28.0.rc0.12.gb6a658bd00c9 _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx From mboxrd@z Thu Jan 1 00:00:00 1970 From: ira.weiny@intel.com Date: Fri, 9 Oct 2020 12:49:35 -0700 Subject: [Intel-wired-lan] [PATCH RFC PKS/PMEM 00/58] PMEM: Introduce stray write protection for PMEM Message-ID: <20201009195033.3208459-1-ira.weiny@intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: intel-wired-lan@osuosl.org List-ID: From: Ira Weiny Should a stray write in the kernel occur persistent memory is affected more than regular memory. A write to the wrong area of memory could result in latent data corruption which will will persist after a reboot. PKS provides a nice way to restrict access to persistent memory kernel mappings, while providing fast access when needed. Since the last RFC[1] this patch set has grown quite a bit. It now depends on the core patches submitted separately. https://lore.kernel.org/lkml/20201009194258.3207172-1-ira.weiny at intel.com/ And contained in the git tree here: https://github.com/weiny2/linux-kernel/tree/pks-rfc-v3 However, functionally there is only 1 major change from the last RFC. Specifically, kmap() is most often used within a single thread in a 'map/do something/unmap' pattern. In fact this is the pattern used in ~90% of the callers of kmap(). This pattern works very well for the pmem use case and the testing which was done. However, there were another ~20-30 kmap users which do not follow this pattern. Some of them seem to expect the mapping to be 'global' while others require a detailed audit to be sure.[2][3] While we don't anticipate global mappings to pmem there is a danger in changing the semantics of kmap(). Effectively, this would cause an unresolved page fault with little to no information about why. There were a number of options considered. 1) Attempt to change all the thread local kmap() calls to kmap_atomic() 2) Introduce a flags parameter to kmap() to indicate if the mapping should be global or not 3) Change ~20-30 call sites to 'kmap_global()' to indicate that they require a global mapping of the pages 4) Change ~209 call sites to 'kmap_thread()' to indicate that the mapping is to be used within that thread of execution only Option 1 is simply not feasible kmap_atomic() is not the same semantic as kmap() within a single tread. Option 2 would require all of the call sites of kmap() to change. Option 3 seems like a good minimal change but there is a danger that new code may miss the semantic change of kmap() and not get the behavior intended for future users. Therefore, option #4 was chosen. To handle the global PKRS state in the most efficient manner possible. We lazily override the thread specific PKRS key value only when needed because we anticipate PKS to not be needed will not be needed most of the time. And even when it is used 90% of the time it is a thread local call. [1] https://lore.kernel.org/lkml/20200717072056.73134-1-ira.weiny at intel.com/ [2] The following list of callers continue calling kmap() (utilizing the global PKRS). It would be nice if more of them could be converted to kmap_thread() drivers/firewire/net.c: ptr = kmap(dev->broadcast_rcv_buffer.pages[u]); drivers/gpu/drm/i915/gem/i915_gem_pages.c: return kmap(sg_page(sgt->sgl)); drivers/gpu/drm/ttm/ttm_bo_util.c: map->virtual = kmap(map->page); drivers/infiniband/hw/qib/qib_user_sdma.c: mpage = kmap(page); drivers/misc/vmw_vmci/vmci_host.c: context->notify = kmap(context->notify_page) + (uva & (PAGE_SIZE - 1)); drivers/misc/xilinx_sdfec.c: addr = kmap(pages[i]); drivers/mmc/host/usdhi6rol0.c: host->pg.mapped = kmap(host->pg.page); drivers/mmc/host/usdhi6rol0.c: host->pg.mapped = kmap(host->pg.page); drivers/mmc/host/usdhi6rol0.c: host->pg.mapped = kmap(host->pg.page); drivers/nvme/target/tcp.c: iov->iov_base = kmap(sg_page(sg)) + sg->offset + sg_offset; drivers/scsi/libiscsi_tcp.c: segment->sg_mapped = kmap(sg_page(sg)); drivers/target/iscsi/iscsi_target.c: iov[i].iov_base = kmap(sg_page(sg)) + sg->offset + page_off; drivers/target/target_core_transport.c: return kmap(sg_page(sg)) + sg->offset; fs/btrfs/check-integrity.c: block_ctx->datav[i] = kmap(block_ctx->pagev[i]); fs/ceph/dir.c: cache_ctl->dentries = kmap(cache_ctl->page); fs/ceph/inode.c: ctl->dentries = kmap(ctl->page); fs/erofs/zpvec.h: kmap_atomic(ctor->curr) : kmap(ctor->curr); lib/scatterlist.c: miter->addr = kmap(miter->page) + miter->__offset; net/ceph/pagelist.c: pl->mapped_tail = kmap(page); net/ceph/pagelist.c: pl->mapped_tail = kmap(page); virt/kvm/kvm_main.c: hva = kmap(page); [3] The following appear to follow the same pattern as ext2 which was converted after some code audit. So I _think_ they too could be converted to k[un]map_thread(). fs/freevxfs/vxfs_subr.c|75| kmap(pp); fs/jfs/jfs_metapage.c|102| kmap(page); fs/jfs/jfs_metapage.c|156| kmap(page); fs/minix/dir.c|72| kmap(page); fs/nilfs2/dir.c|195| kmap(page); fs/nilfs2/ifile.h|24| void *kaddr = kmap(ibh->b_page); fs/ntfs/aops.h|78| kmap(page); fs/ntfs/compress.c|574| kmap(page); fs/qnx6/dir.c|32| kmap(page); fs/qnx6/dir.c|58| kmap(*p = page); fs/qnx6/inode.c|190| kmap(page); fs/qnx6/inode.c|557| kmap(page); fs/reiserfs/inode.c|2397| kmap(bh_result->b_page); fs/reiserfs/xattr.c|444| kmap(page); fs/sysv/dir.c|60| kmap(page); fs/sysv/dir.c|262| kmap(page); fs/ufs/dir.c|194| kmap(page); fs/ufs/dir.c|562| kmap(page); Ira Weiny (58): x86/pks: Add a global pkrs option x86/pks/test: Add testing for global option memremap: Add zone device access protection kmap: Add stray access protection for device pages kmap: Introduce k[un]map_thread kmap: Introduce k[un]map_thread debugging drivers/drbd: Utilize new kmap_thread() drivers/firmware_loader: Utilize new kmap_thread() drivers/gpu: Utilize new kmap_thread() drivers/rdma: Utilize new kmap_thread() drivers/net: Utilize new kmap_thread() fs/afs: Utilize new kmap_thread() fs/btrfs: Utilize new kmap_thread() fs/cifs: Utilize new kmap_thread() fs/ecryptfs: Utilize new kmap_thread() fs/gfs2: Utilize new kmap_thread() fs/nilfs2: Utilize new kmap_thread() fs/hfs: Utilize new kmap_thread() fs/hfsplus: Utilize new kmap_thread() fs/jffs2: Utilize new kmap_thread() fs/nfs: Utilize new kmap_thread() fs/f2fs: Utilize new kmap_thread() fs/fuse: Utilize new kmap_thread() fs/freevxfs: Utilize new kmap_thread() fs/reiserfs: Utilize new kmap_thread() fs/zonefs: Utilize new kmap_thread() fs/ubifs: Utilize new kmap_thread() fs/cachefiles: Utilize new kmap_thread() fs/ntfs: Utilize new kmap_thread() fs/romfs: Utilize new kmap_thread() fs/vboxsf: Utilize new kmap_thread() fs/hostfs: Utilize new kmap_thread() fs/cramfs: Utilize new kmap_thread() fs/erofs: Utilize new kmap_thread() fs: Utilize new kmap_thread() fs/ext2: Use ext2_put_page fs/ext2: Utilize new kmap_thread() fs/isofs: Utilize new kmap_thread() fs/jffs2: Utilize new kmap_thread() net: Utilize new kmap_thread() drivers/target: Utilize new kmap_thread() drivers/scsi: Utilize new kmap_thread() drivers/mmc: Utilize new kmap_thread() drivers/xen: Utilize new kmap_thread() drivers/firmware: Utilize new kmap_thread() drives/staging: Utilize new kmap_thread() drivers/mtd: Utilize new kmap_thread() drivers/md: Utilize new kmap_thread() drivers/misc: Utilize new kmap_thread() drivers/android: Utilize new kmap_thread() kernel: Utilize new kmap_thread() mm: Utilize new kmap_thread() lib: Utilize new kmap_thread() powerpc: Utilize new kmap_thread() samples: Utilize new kmap_thread() dax: Stray access protection for dax_direct_access() nvdimm/pmem: Stray access protection for pmem->virt_addr [dax|pmem]: Enable stray access protection Documentation/core-api/protection-keys.rst | 11 +- arch/powerpc/mm/mem.c | 4 +- arch/x86/entry/common.c | 28 +++ arch/x86/include/asm/pkeys.h | 6 +- arch/x86/include/asm/pkeys_common.h | 8 +- arch/x86/kernel/process.c | 74 ++++++- arch/x86/mm/fault.c | 193 ++++++++++++++---- arch/x86/mm/pkeys.c | 88 ++++++-- drivers/android/binder_alloc.c | 4 +- drivers/base/firmware_loader/fallback.c | 4 +- drivers/base/firmware_loader/main.c | 4 +- drivers/block/drbd/drbd_main.c | 4 +- drivers/block/drbd/drbd_receiver.c | 12 +- drivers/dax/device.c | 2 + drivers/dax/super.c | 2 + drivers/firmware/efi/capsule-loader.c | 6 +- drivers/firmware/efi/capsule.c | 4 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 12 +- drivers/gpu/drm/gma500/gma_display.c | 4 +- drivers/gpu/drm/gma500/mmu.c | 10 +- drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 4 +- .../drm/i915/gem/selftests/i915_gem_context.c | 4 +- .../drm/i915/gem/selftests/i915_gem_mman.c | 8 +- drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c | 4 +- drivers/gpu/drm/i915/gt/intel_gtt.c | 4 +- drivers/gpu/drm/i915/gt/shmem_utils.c | 4 +- drivers/gpu/drm/i915/i915_gem.c | 8 +- drivers/gpu/drm/i915/i915_gpu_error.c | 4 +- drivers/gpu/drm/i915/selftests/i915_perf.c | 4 +- drivers/gpu/drm/radeon/radeon_ttm.c | 4 +- drivers/infiniband/hw/hfi1/sdma.c | 4 +- drivers/infiniband/hw/i40iw/i40iw_cm.c | 10 +- drivers/infiniband/sw/siw/siw_qp_tx.c | 14 +- drivers/md/bcache/request.c | 4 +- drivers/misc/vmw_vmci/vmci_queue_pair.c | 12 +- drivers/mmc/host/mmc_spi.c | 4 +- drivers/mmc/host/sdricoh_cs.c | 4 +- drivers/mtd/mtd_blkdevs.c | 12 +- drivers/net/ethernet/intel/igb/igb_ethtool.c | 4 +- .../net/ethernet/intel/ixgbe/ixgbe_ethtool.c | 4 +- drivers/nvdimm/pmem.c | 6 + drivers/scsi/ipr.c | 8 +- drivers/scsi/pmcraid.c | 8 +- drivers/staging/rts5208/rtsx_transport.c | 4 +- drivers/target/target_core_iblock.c | 4 +- drivers/target/target_core_rd.c | 4 +- drivers/target/target_core_transport.c | 4 +- drivers/xen/gntalloc.c | 4 +- fs/afs/dir.c | 16 +- fs/afs/dir_edit.c | 16 +- fs/afs/mntpt.c | 4 +- fs/afs/write.c | 4 +- fs/aio.c | 4 +- fs/binfmt_elf.c | 4 +- fs/binfmt_elf_fdpic.c | 4 +- fs/btrfs/check-integrity.c | 4 +- fs/btrfs/compression.c | 4 +- fs/btrfs/inode.c | 16 +- fs/btrfs/lzo.c | 24 +-- fs/btrfs/raid56.c | 34 +-- fs/btrfs/reflink.c | 8 +- fs/btrfs/send.c | 4 +- fs/btrfs/zlib.c | 32 +-- fs/btrfs/zstd.c | 20 +- fs/cachefiles/rdwr.c | 4 +- fs/cifs/cifsencrypt.c | 6 +- fs/cifs/file.c | 16 +- fs/cifs/smb2ops.c | 8 +- fs/cramfs/inode.c | 10 +- fs/ecryptfs/crypto.c | 8 +- fs/ecryptfs/read_write.c | 8 +- fs/erofs/super.c | 4 +- fs/erofs/xattr.c | 4 +- fs/exec.c | 10 +- fs/ext2/dir.c | 8 +- fs/ext2/ext2.h | 8 + fs/ext2/namei.c | 15 +- fs/f2fs/f2fs.h | 8 +- fs/freevxfs/vxfs_immed.c | 4 +- fs/fuse/readdir.c | 4 +- fs/gfs2/bmap.c | 4 +- fs/gfs2/ops_fstype.c | 4 +- fs/hfs/bnode.c | 14 +- fs/hfs/btree.c | 20 +- fs/hfsplus/bitmap.c | 20 +- fs/hfsplus/bnode.c | 102 ++++----- fs/hfsplus/btree.c | 18 +- fs/hostfs/hostfs_kern.c | 12 +- fs/io_uring.c | 4 +- fs/isofs/compress.c | 4 +- fs/jffs2/file.c | 8 +- fs/jffs2/gc.c | 4 +- fs/nfs/dir.c | 20 +- fs/nilfs2/alloc.c | 34 +-- fs/nilfs2/cpfile.c | 4 +- fs/ntfs/aops.c | 4 +- fs/reiserfs/journal.c | 4 +- fs/romfs/super.c | 4 +- fs/splice.c | 4 +- fs/ubifs/file.c | 16 +- fs/vboxsf/file.c | 12 +- fs/zonefs/super.c | 4 +- include/linux/entry-common.h | 3 + include/linux/highmem.h | 63 +++++- include/linux/memremap.h | 1 + include/linux/mm.h | 43 ++++ include/linux/pkeys.h | 6 +- include/linux/sched.h | 8 + include/trace/events/kmap_thread.h | 56 +++++ init/init_task.c | 6 + kernel/fork.c | 18 ++ kernel/kexec_core.c | 8 +- lib/Kconfig.debug | 8 + lib/iov_iter.c | 12 +- lib/pks/pks_test.c | 138 +++++++++++-- lib/test_bpf.c | 4 +- lib/test_hmm.c | 8 +- mm/Kconfig | 13 ++ mm/debug.c | 23 +++ mm/memory.c | 8 +- mm/memremap.c | 90 ++++++++ mm/swapfile.c | 4 +- mm/userfaultfd.c | 4 +- net/ceph/messenger.c | 4 +- net/core/datagram.c | 4 +- net/core/sock.c | 8 +- net/ipv4/ip_output.c | 4 +- net/sunrpc/cache.c | 4 +- net/sunrpc/xdr.c | 8 +- net/tls/tls_device.c | 4 +- samples/vfio-mdev/mbochs.c | 4 +- 131 files changed, 1284 insertions(+), 565 deletions(-) create mode 100644 include/trace/events/kmap_thread.h -- 2.28.0.rc0.12.gb6a658bd00c9 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-path: From: ira.weiny@intel.com Subject: [PATCH RFC PKS/PMEM 00/58] PMEM: Introduce stray write protection for PMEM Date: Fri, 9 Oct 2020 12:49:35 -0700 Message-Id: <20201009195033.3208459-1-ira.weiny@intel.com> MIME-Version: 1.0 List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "kexec" Errors-To: kexec-bounces+dwmw2=infradead.org@lists.infradead.org To: Andrew Morton , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Andy Lutomirski , Peter Zijlstra Cc: linux-aio@kvack.org, linux-efi@vger.kernel.org, kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-mmc@vger.kernel.org, Dave Hansen , dri-devel@lists.freedesktop.org, linux-mm@kvack.org, target-devel@vger.kernel.org, linux-mtd@lists.infradead.org, linux-kselftest@vger.kernel.org, samba-technical@lists.samba.org, Ira Weiny , ceph-devel@vger.kernel.org, drbd-dev@lists.linbit.com, devel@driverdev.osuosl.org, linux-cifs@vger.kernel.org, linux-nilfs@vger.kernel.org, linux-scsi@vger.kernel.org, linux-nvdimm@lists.01.org, linux-rdma@vger.kernel.org, x86@kernel.org, amd-gfx@lists.freedesktop.org, linux-afs@lists.infradead.org, cluster-devel@redhat.com, linux-cachefs@redhat.com, intel-wired-lan@lists.osuosl.org, xen-devel@lists.xenproject.org, linux-ext4@vger.kernel.org, Fenghua Yu , linux-um@lists.infradead.org, intel-gfx@lists.freedesktop.org, ecryptfs@vger.kernel.org, linux-erofs@lists.ozlabs.org, reiserfs-devel@vger.kernel.org, linux-block@vger.kernel.org, linux-bcache@vger.kernel.org, Dan Williams , io-uring@vger.kernel.org, linux-nfs@vger.kernel.org, linux-ntfs-dev@lists.sourceforge.net, netdev@vger.kernel.org, kexec@lists.infradead.org, linux-kernel@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, bpf@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-btrfs@vger.kernel.org From: Ira Weiny Should a stray write in the kernel occur persistent memory is affected more than regular memory. A write to the wrong area of memory could result in latent data corruption which will will persist after a reboot. PKS provides a nice way to restrict access to persistent memory kernel mappings, while providing fast access when needed. Since the last RFC[1] this patch set has grown quite a bit. It now depends on the core patches submitted separately. https://lore.kernel.org/lkml/20201009194258.3207172-1-ira.weiny@intel.com/ And contained in the git tree here: https://github.com/weiny2/linux-kernel/tree/pks-rfc-v3 However, functionally there is only 1 major change from the last RFC. Specifically, kmap() is most often used within a single thread in a 'map/do something/unmap' pattern. In fact this is the pattern used in ~90% of the callers of kmap(). This pattern works very well for the pmem use case and the testing which was done. However, there were another ~20-30 kmap users which do not follow this pattern. Some of them seem to expect the mapping to be 'global' while others require a detailed audit to be sure.[2][3] While we don't anticipate global mappings to pmem there is a danger in changing the semantics of kmap(). Effectively, this would cause an unresolved page fault with little to no information about why. There were a number of options considered. 1) Attempt to change all the thread local kmap() calls to kmap_atomic() 2) Introduce a flags parameter to kmap() to indicate if the mapping should be global or not 3) Change ~20-30 call sites to 'kmap_global()' to indicate that they require a global mapping of the pages 4) Change ~209 call sites to 'kmap_thread()' to indicate that the mapping is to be used within that thread of execution only Option 1 is simply not feasible kmap_atomic() is not the same semantic as kmap() within a single tread. Option 2 would require all of the call sites of kmap() to change. Option 3 seems like a good minimal change but there is a danger that new code may miss the semantic change of kmap() and not get the behavior intended for future users. Therefore, option #4 was chosen. To handle the global PKRS state in the most efficient manner possible. We lazily override the thread specific PKRS key value only when needed because we anticipate PKS to not be needed will not be needed most of the time. And even when it is used 90% of the time it is a thread local call. [1] https://lore.kernel.org/lkml/20200717072056.73134-1-ira.weiny@intel.com/ [2] The following list of callers continue calling kmap() (utilizing the global PKRS). It would be nice if more of them could be converted to kmap_thread() drivers/firewire/net.c: ptr = kmap(dev->broadcast_rcv_buffer.pages[u]); drivers/gpu/drm/i915/gem/i915_gem_pages.c: return kmap(sg_page(sgt->sgl)); drivers/gpu/drm/ttm/ttm_bo_util.c: map->virtual = kmap(map->page); drivers/infiniband/hw/qib/qib_user_sdma.c: mpage = kmap(page); drivers/misc/vmw_vmci/vmci_host.c: context->notify = kmap(context->notify_page) + (uva & (PAGE_SIZE - 1)); drivers/misc/xilinx_sdfec.c: addr = kmap(pages[i]); drivers/mmc/host/usdhi6rol0.c: host->pg.mapped = kmap(host->pg.page); drivers/mmc/host/usdhi6rol0.c: host->pg.mapped = kmap(host->pg.page); drivers/mmc/host/usdhi6rol0.c: host->pg.mapped = kmap(host->pg.page); drivers/nvme/target/tcp.c: iov->iov_base = kmap(sg_page(sg)) + sg->offset + sg_offset; drivers/scsi/libiscsi_tcp.c: segment->sg_mapped = kmap(sg_page(sg)); drivers/target/iscsi/iscsi_target.c: iov[i].iov_base = kmap(sg_page(sg)) + sg->offset + page_off; drivers/target/target_core_transport.c: return kmap(sg_page(sg)) + sg->offset; fs/btrfs/check-integrity.c: block_ctx->datav[i] = kmap(block_ctx->pagev[i]); fs/ceph/dir.c: cache_ctl->dentries = kmap(cache_ctl->page); fs/ceph/inode.c: ctl->dentries = kmap(ctl->page); fs/erofs/zpvec.h: kmap_atomic(ctor->curr) : kmap(ctor->curr); lib/scatterlist.c: miter->addr = kmap(miter->page) + miter->__offset; net/ceph/pagelist.c: pl->mapped_tail = kmap(page); net/ceph/pagelist.c: pl->mapped_tail = kmap(page); virt/kvm/kvm_main.c: hva = kmap(page); [3] The following appear to follow the same pattern as ext2 which was converted after some code audit. So I _think_ they too could be converted to k[un]map_thread(). fs/freevxfs/vxfs_subr.c|75| kmap(pp); fs/jfs/jfs_metapage.c|102| kmap(page); fs/jfs/jfs_metapage.c|156| kmap(page); fs/minix/dir.c|72| kmap(page); fs/nilfs2/dir.c|195| kmap(page); fs/nilfs2/ifile.h|24| void *kaddr = kmap(ibh->b_page); fs/ntfs/aops.h|78| kmap(page); fs/ntfs/compress.c|574| kmap(page); fs/qnx6/dir.c|32| kmap(page); fs/qnx6/dir.c|58| kmap(*p = page); fs/qnx6/inode.c|190| kmap(page); fs/qnx6/inode.c|557| kmap(page); fs/reiserfs/inode.c|2397| kmap(bh_result->b_page); fs/reiserfs/xattr.c|444| kmap(page); fs/sysv/dir.c|60| kmap(page); fs/sysv/dir.c|262| kmap(page); fs/ufs/dir.c|194| kmap(page); fs/ufs/dir.c|562| kmap(page); Ira Weiny (58): x86/pks: Add a global pkrs option x86/pks/test: Add testing for global option memremap: Add zone device access protection kmap: Add stray access protection for device pages kmap: Introduce k[un]map_thread kmap: Introduce k[un]map_thread debugging drivers/drbd: Utilize new kmap_thread() drivers/firmware_loader: Utilize new kmap_thread() drivers/gpu: Utilize new kmap_thread() drivers/rdma: Utilize new kmap_thread() drivers/net: Utilize new kmap_thread() fs/afs: Utilize new kmap_thread() fs/btrfs: Utilize new kmap_thread() fs/cifs: Utilize new kmap_thread() fs/ecryptfs: Utilize new kmap_thread() fs/gfs2: Utilize new kmap_thread() fs/nilfs2: Utilize new kmap_thread() fs/hfs: Utilize new kmap_thread() fs/hfsplus: Utilize new kmap_thread() fs/jffs2: Utilize new kmap_thread() fs/nfs: Utilize new kmap_thread() fs/f2fs: Utilize new kmap_thread() fs/fuse: Utilize new kmap_thread() fs/freevxfs: Utilize new kmap_thread() fs/reiserfs: Utilize new kmap_thread() fs/zonefs: Utilize new kmap_thread() fs/ubifs: Utilize new kmap_thread() fs/cachefiles: Utilize new kmap_thread() fs/ntfs: Utilize new kmap_thread() fs/romfs: Utilize new kmap_thread() fs/vboxsf: Utilize new kmap_thread() fs/hostfs: Utilize new kmap_thread() fs/cramfs: Utilize new kmap_thread() fs/erofs: Utilize new kmap_thread() fs: Utilize new kmap_thread() fs/ext2: Use ext2_put_page fs/ext2: Utilize new kmap_thread() fs/isofs: Utilize new kmap_thread() fs/jffs2: Utilize new kmap_thread() net: Utilize new kmap_thread() drivers/target: Utilize new kmap_thread() drivers/scsi: Utilize new kmap_thread() drivers/mmc: Utilize new kmap_thread() drivers/xen: Utilize new kmap_thread() drivers/firmware: Utilize new kmap_thread() drives/staging: Utilize new kmap_thread() drivers/mtd: Utilize new kmap_thread() drivers/md: Utilize new kmap_thread() drivers/misc: Utilize new kmap_thread() drivers/android: Utilize new kmap_thread() kernel: Utilize new kmap_thread() mm: Utilize new kmap_thread() lib: Utilize new kmap_thread() powerpc: Utilize new kmap_thread() samples: Utilize new kmap_thread() dax: Stray access protection for dax_direct_access() nvdimm/pmem: Stray access protection for pmem->virt_addr [dax|pmem]: Enable stray access protection Documentation/core-api/protection-keys.rst | 11 +- arch/powerpc/mm/mem.c | 4 +- arch/x86/entry/common.c | 28 +++ arch/x86/include/asm/pkeys.h | 6 +- arch/x86/include/asm/pkeys_common.h | 8 +- arch/x86/kernel/process.c | 74 ++++++- arch/x86/mm/fault.c | 193 ++++++++++++++---- arch/x86/mm/pkeys.c | 88 ++++++-- drivers/android/binder_alloc.c | 4 +- drivers/base/firmware_loader/fallback.c | 4 +- drivers/base/firmware_loader/main.c | 4 +- drivers/block/drbd/drbd_main.c | 4 +- drivers/block/drbd/drbd_receiver.c | 12 +- drivers/dax/device.c | 2 + drivers/dax/super.c | 2 + drivers/firmware/efi/capsule-loader.c | 6 +- drivers/firmware/efi/capsule.c | 4 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 12 +- drivers/gpu/drm/gma500/gma_display.c | 4 +- drivers/gpu/drm/gma500/mmu.c | 10 +- drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 4 +- .../drm/i915/gem/selftests/i915_gem_context.c | 4 +- .../drm/i915/gem/selftests/i915_gem_mman.c | 8 +- drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c | 4 +- drivers/gpu/drm/i915/gt/intel_gtt.c | 4 +- drivers/gpu/drm/i915/gt/shmem_utils.c | 4 +- drivers/gpu/drm/i915/i915_gem.c | 8 +- drivers/gpu/drm/i915/i915_gpu_error.c | 4 +- drivers/gpu/drm/i915/selftests/i915_perf.c | 4 +- drivers/gpu/drm/radeon/radeon_ttm.c | 4 +- drivers/infiniband/hw/hfi1/sdma.c | 4 +- drivers/infiniband/hw/i40iw/i40iw_cm.c | 10 +- drivers/infiniband/sw/siw/siw_qp_tx.c | 14 +- drivers/md/bcache/request.c | 4 +- drivers/misc/vmw_vmci/vmci_queue_pair.c | 12 +- drivers/mmc/host/mmc_spi.c | 4 +- drivers/mmc/host/sdricoh_cs.c | 4 +- drivers/mtd/mtd_blkdevs.c | 12 +- drivers/net/ethernet/intel/igb/igb_ethtool.c | 4 +- .../net/ethernet/intel/ixgbe/ixgbe_ethtool.c | 4 +- drivers/nvdimm/pmem.c | 6 + drivers/scsi/ipr.c | 8 +- drivers/scsi/pmcraid.c | 8 +- drivers/staging/rts5208/rtsx_transport.c | 4 +- drivers/target/target_core_iblock.c | 4 +- drivers/target/target_core_rd.c | 4 +- drivers/target/target_core_transport.c | 4 +- drivers/xen/gntalloc.c | 4 +- fs/afs/dir.c | 16 +- fs/afs/dir_edit.c | 16 +- fs/afs/mntpt.c | 4 +- fs/afs/write.c | 4 +- fs/aio.c | 4 +- fs/binfmt_elf.c | 4 +- fs/binfmt_elf_fdpic.c | 4 +- fs/btrfs/check-integrity.c | 4 +- fs/btrfs/compression.c | 4 +- fs/btrfs/inode.c | 16 +- fs/btrfs/lzo.c | 24 +-- fs/btrfs/raid56.c | 34 +-- fs/btrfs/reflink.c | 8 +- fs/btrfs/send.c | 4 +- fs/btrfs/zlib.c | 32 +-- fs/btrfs/zstd.c | 20 +- fs/cachefiles/rdwr.c | 4 +- fs/cifs/cifsencrypt.c | 6 +- fs/cifs/file.c | 16 +- fs/cifs/smb2ops.c | 8 +- fs/cramfs/inode.c | 10 +- fs/ecryptfs/crypto.c | 8 +- fs/ecryptfs/read_write.c | 8 +- fs/erofs/super.c | 4 +- fs/erofs/xattr.c | 4 +- fs/exec.c | 10 +- fs/ext2/dir.c | 8 +- fs/ext2/ext2.h | 8 + fs/ext2/namei.c | 15 +- fs/f2fs/f2fs.h | 8 +- fs/freevxfs/vxfs_immed.c | 4 +- fs/fuse/readdir.c | 4 +- fs/gfs2/bmap.c | 4 +- fs/gfs2/ops_fstype.c | 4 +- fs/hfs/bnode.c | 14 +- fs/hfs/btree.c | 20 +- fs/hfsplus/bitmap.c | 20 +- fs/hfsplus/bnode.c | 102 ++++----- fs/hfsplus/btree.c | 18 +- fs/hostfs/hostfs_kern.c | 12 +- fs/io_uring.c | 4 +- fs/isofs/compress.c | 4 +- fs/jffs2/file.c | 8 +- fs/jffs2/gc.c | 4 +- fs/nfs/dir.c | 20 +- fs/nilfs2/alloc.c | 34 +-- fs/nilfs2/cpfile.c | 4 +- fs/ntfs/aops.c | 4 +- fs/reiserfs/journal.c | 4 +- fs/romfs/super.c | 4 +- fs/splice.c | 4 +- fs/ubifs/file.c | 16 +- fs/vboxsf/file.c | 12 +- fs/zonefs/super.c | 4 +- include/linux/entry-common.h | 3 + include/linux/highmem.h | 63 +++++- include/linux/memremap.h | 1 + include/linux/mm.h | 43 ++++ include/linux/pkeys.h | 6 +- include/linux/sched.h | 8 + include/trace/events/kmap_thread.h | 56 +++++ init/init_task.c | 6 + kernel/fork.c | 18 ++ kernel/kexec_core.c | 8 +- lib/Kconfig.debug | 8 + lib/iov_iter.c | 12 +- lib/pks/pks_test.c | 138 +++++++++++-- lib/test_bpf.c | 4 +- lib/test_hmm.c | 8 +- mm/Kconfig | 13 ++ mm/debug.c | 23 +++ mm/memory.c | 8 +- mm/memremap.c | 90 ++++++++ mm/swapfile.c | 4 +- mm/userfaultfd.c | 4 +- net/ceph/messenger.c | 4 +- net/core/datagram.c | 4 +- net/core/sock.c | 8 +- net/ipv4/ip_output.c | 4 +- net/sunrpc/cache.c | 4 +- net/sunrpc/xdr.c | 8 +- net/tls/tls_device.c | 4 +- samples/vfio-mdev/mbochs.c | 4 +- 131 files changed, 1284 insertions(+), 565 deletions(-) create mode 100644 include/trace/events/kmap_thread.h -- 2.28.0.rc0.12.gb6a658bd00c9 _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec From mboxrd@z Thu Jan 1 00:00:00 1970 From: ira.weiny@intel.com Date: Fri, 9 Oct 2020 12:49:35 -0700 Subject: [Cluster-devel] [PATCH RFC PKS/PMEM 00/58] PMEM: Introduce stray write protection for PMEM Message-ID: <20201009195033.3208459-1-ira.weiny@intel.com> List-Id: To: cluster-devel.redhat.com MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit From: Ira Weiny Should a stray write in the kernel occur persistent memory is affected more than regular memory. A write to the wrong area of memory could result in latent data corruption which will will persist after a reboot. PKS provides a nice way to restrict access to persistent memory kernel mappings, while providing fast access when needed. Since the last RFC[1] this patch set has grown quite a bit. It now depends on the core patches submitted separately. https://lore.kernel.org/lkml/20201009194258.3207172-1-ira.weiny at intel.com/ And contained in the git tree here: https://github.com/weiny2/linux-kernel/tree/pks-rfc-v3 However, functionally there is only 1 major change from the last RFC. Specifically, kmap() is most often used within a single thread in a 'map/do something/unmap' pattern. In fact this is the pattern used in ~90% of the callers of kmap(). This pattern works very well for the pmem use case and the testing which was done. However, there were another ~20-30 kmap users which do not follow this pattern. Some of them seem to expect the mapping to be 'global' while others require a detailed audit to be sure.[2][3] While we don't anticipate global mappings to pmem there is a danger in changing the semantics of kmap(). Effectively, this would cause an unresolved page fault with little to no information about why. There were a number of options considered. 1) Attempt to change all the thread local kmap() calls to kmap_atomic() 2) Introduce a flags parameter to kmap() to indicate if the mapping should be global or not 3) Change ~20-30 call sites to 'kmap_global()' to indicate that they require a global mapping of the pages 4) Change ~209 call sites to 'kmap_thread()' to indicate that the mapping is to be used within that thread of execution only Option 1 is simply not feasible kmap_atomic() is not the same semantic as kmap() within a single tread. Option 2 would require all of the call sites of kmap() to change. Option 3 seems like a good minimal change but there is a danger that new code may miss the semantic change of kmap() and not get the behavior intended for future users. Therefore, option #4 was chosen. To handle the global PKRS state in the most efficient manner possible. We lazily override the thread specific PKRS key value only when needed because we anticipate PKS to not be needed will not be needed most of the time. And even when it is used 90% of the time it is a thread local call. [1] https://lore.kernel.org/lkml/20200717072056.73134-1-ira.weiny at intel.com/ [2] The following list of callers continue calling kmap() (utilizing the global PKRS). It would be nice if more of them could be converted to kmap_thread() drivers/firewire/net.c: ptr = kmap(dev->broadcast_rcv_buffer.pages[u]); drivers/gpu/drm/i915/gem/i915_gem_pages.c: return kmap(sg_page(sgt->sgl)); drivers/gpu/drm/ttm/ttm_bo_util.c: map->virtual = kmap(map->page); drivers/infiniband/hw/qib/qib_user_sdma.c: mpage = kmap(page); drivers/misc/vmw_vmci/vmci_host.c: context->notify = kmap(context->notify_page) + (uva & (PAGE_SIZE - 1)); drivers/misc/xilinx_sdfec.c: addr = kmap(pages[i]); drivers/mmc/host/usdhi6rol0.c: host->pg.mapped = kmap(host->pg.page); drivers/mmc/host/usdhi6rol0.c: host->pg.mapped = kmap(host->pg.page); drivers/mmc/host/usdhi6rol0.c: host->pg.mapped = kmap(host->pg.page); drivers/nvme/target/tcp.c: iov->iov_base = kmap(sg_page(sg)) + sg->offset + sg_offset; drivers/scsi/libiscsi_tcp.c: segment->sg_mapped = kmap(sg_page(sg)); drivers/target/iscsi/iscsi_target.c: iov[i].iov_base = kmap(sg_page(sg)) + sg->offset + page_off; drivers/target/target_core_transport.c: return kmap(sg_page(sg)) + sg->offset; fs/btrfs/check-integrity.c: block_ctx->datav[i] = kmap(block_ctx->pagev[i]); fs/ceph/dir.c: cache_ctl->dentries = kmap(cache_ctl->page); fs/ceph/inode.c: ctl->dentries = kmap(ctl->page); fs/erofs/zpvec.h: kmap_atomic(ctor->curr) : kmap(ctor->curr); lib/scatterlist.c: miter->addr = kmap(miter->page) + miter->__offset; net/ceph/pagelist.c: pl->mapped_tail = kmap(page); net/ceph/pagelist.c: pl->mapped_tail = kmap(page); virt/kvm/kvm_main.c: hva = kmap(page); [3] The following appear to follow the same pattern as ext2 which was converted after some code audit. So I _think_ they too could be converted to k[un]map_thread(). fs/freevxfs/vxfs_subr.c|75| kmap(pp); fs/jfs/jfs_metapage.c|102| kmap(page); fs/jfs/jfs_metapage.c|156| kmap(page); fs/minix/dir.c|72| kmap(page); fs/nilfs2/dir.c|195| kmap(page); fs/nilfs2/ifile.h|24| void *kaddr = kmap(ibh->b_page); fs/ntfs/aops.h|78| kmap(page); fs/ntfs/compress.c|574| kmap(page); fs/qnx6/dir.c|32| kmap(page); fs/qnx6/dir.c|58| kmap(*p = page); fs/qnx6/inode.c|190| kmap(page); fs/qnx6/inode.c|557| kmap(page); fs/reiserfs/inode.c|2397| kmap(bh_result->b_page); fs/reiserfs/xattr.c|444| kmap(page); fs/sysv/dir.c|60| kmap(page); fs/sysv/dir.c|262| kmap(page); fs/ufs/dir.c|194| kmap(page); fs/ufs/dir.c|562| kmap(page); Ira Weiny (58): x86/pks: Add a global pkrs option x86/pks/test: Add testing for global option memremap: Add zone device access protection kmap: Add stray access protection for device pages kmap: Introduce k[un]map_thread kmap: Introduce k[un]map_thread debugging drivers/drbd: Utilize new kmap_thread() drivers/firmware_loader: Utilize new kmap_thread() drivers/gpu: Utilize new kmap_thread() drivers/rdma: Utilize new kmap_thread() drivers/net: Utilize new kmap_thread() fs/afs: Utilize new kmap_thread() fs/btrfs: Utilize new kmap_thread() fs/cifs: Utilize new kmap_thread() fs/ecryptfs: Utilize new kmap_thread() fs/gfs2: Utilize new kmap_thread() fs/nilfs2: Utilize new kmap_thread() fs/hfs: Utilize new kmap_thread() fs/hfsplus: Utilize new kmap_thread() fs/jffs2: Utilize new kmap_thread() fs/nfs: Utilize new kmap_thread() fs/f2fs: Utilize new kmap_thread() fs/fuse: Utilize new kmap_thread() fs/freevxfs: Utilize new kmap_thread() fs/reiserfs: Utilize new kmap_thread() fs/zonefs: Utilize new kmap_thread() fs/ubifs: Utilize new kmap_thread() fs/cachefiles: Utilize new kmap_thread() fs/ntfs: Utilize new kmap_thread() fs/romfs: Utilize new kmap_thread() fs/vboxsf: Utilize new kmap_thread() fs/hostfs: Utilize new kmap_thread() fs/cramfs: Utilize new kmap_thread() fs/erofs: Utilize new kmap_thread() fs: Utilize new kmap_thread() fs/ext2: Use ext2_put_page fs/ext2: Utilize new kmap_thread() fs/isofs: Utilize new kmap_thread() fs/jffs2: Utilize new kmap_thread() net: Utilize new kmap_thread() drivers/target: Utilize new kmap_thread() drivers/scsi: Utilize new kmap_thread() drivers/mmc: Utilize new kmap_thread() drivers/xen: Utilize new kmap_thread() drivers/firmware: Utilize new kmap_thread() drives/staging: Utilize new kmap_thread() drivers/mtd: Utilize new kmap_thread() drivers/md: Utilize new kmap_thread() drivers/misc: Utilize new kmap_thread() drivers/android: Utilize new kmap_thread() kernel: Utilize new kmap_thread() mm: Utilize new kmap_thread() lib: Utilize new kmap_thread() powerpc: Utilize new kmap_thread() samples: Utilize new kmap_thread() dax: Stray access protection for dax_direct_access() nvdimm/pmem: Stray access protection for pmem->virt_addr [dax|pmem]: Enable stray access protection Documentation/core-api/protection-keys.rst | 11 +- arch/powerpc/mm/mem.c | 4 +- arch/x86/entry/common.c | 28 +++ arch/x86/include/asm/pkeys.h | 6 +- arch/x86/include/asm/pkeys_common.h | 8 +- arch/x86/kernel/process.c | 74 ++++++- arch/x86/mm/fault.c | 193 ++++++++++++++---- arch/x86/mm/pkeys.c | 88 ++++++-- drivers/android/binder_alloc.c | 4 +- drivers/base/firmware_loader/fallback.c | 4 +- drivers/base/firmware_loader/main.c | 4 +- drivers/block/drbd/drbd_main.c | 4 +- drivers/block/drbd/drbd_receiver.c | 12 +- drivers/dax/device.c | 2 + drivers/dax/super.c | 2 + drivers/firmware/efi/capsule-loader.c | 6 +- drivers/firmware/efi/capsule.c | 4 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 12 +- drivers/gpu/drm/gma500/gma_display.c | 4 +- drivers/gpu/drm/gma500/mmu.c | 10 +- drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 4 +- .../drm/i915/gem/selftests/i915_gem_context.c | 4 +- .../drm/i915/gem/selftests/i915_gem_mman.c | 8 +- drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c | 4 +- drivers/gpu/drm/i915/gt/intel_gtt.c | 4 +- drivers/gpu/drm/i915/gt/shmem_utils.c | 4 +- drivers/gpu/drm/i915/i915_gem.c | 8 +- drivers/gpu/drm/i915/i915_gpu_error.c | 4 +- drivers/gpu/drm/i915/selftests/i915_perf.c | 4 +- drivers/gpu/drm/radeon/radeon_ttm.c | 4 +- drivers/infiniband/hw/hfi1/sdma.c | 4 +- drivers/infiniband/hw/i40iw/i40iw_cm.c | 10 +- drivers/infiniband/sw/siw/siw_qp_tx.c | 14 +- drivers/md/bcache/request.c | 4 +- drivers/misc/vmw_vmci/vmci_queue_pair.c | 12 +- drivers/mmc/host/mmc_spi.c | 4 +- drivers/mmc/host/sdricoh_cs.c | 4 +- drivers/mtd/mtd_blkdevs.c | 12 +- drivers/net/ethernet/intel/igb/igb_ethtool.c | 4 +- .../net/ethernet/intel/ixgbe/ixgbe_ethtool.c | 4 +- drivers/nvdimm/pmem.c | 6 + drivers/scsi/ipr.c | 8 +- drivers/scsi/pmcraid.c | 8 +- drivers/staging/rts5208/rtsx_transport.c | 4 +- drivers/target/target_core_iblock.c | 4 +- drivers/target/target_core_rd.c | 4 +- drivers/target/target_core_transport.c | 4 +- drivers/xen/gntalloc.c | 4 +- fs/afs/dir.c | 16 +- fs/afs/dir_edit.c | 16 +- fs/afs/mntpt.c | 4 +- fs/afs/write.c | 4 +- fs/aio.c | 4 +- fs/binfmt_elf.c | 4 +- fs/binfmt_elf_fdpic.c | 4 +- fs/btrfs/check-integrity.c | 4 +- fs/btrfs/compression.c | 4 +- fs/btrfs/inode.c | 16 +- fs/btrfs/lzo.c | 24 +-- fs/btrfs/raid56.c | 34 +-- fs/btrfs/reflink.c | 8 +- fs/btrfs/send.c | 4 +- fs/btrfs/zlib.c | 32 +-- fs/btrfs/zstd.c | 20 +- fs/cachefiles/rdwr.c | 4 +- fs/cifs/cifsencrypt.c | 6 +- fs/cifs/file.c | 16 +- fs/cifs/smb2ops.c | 8 +- fs/cramfs/inode.c | 10 +- fs/ecryptfs/crypto.c | 8 +- fs/ecryptfs/read_write.c | 8 +- fs/erofs/super.c | 4 +- fs/erofs/xattr.c | 4 +- fs/exec.c | 10 +- fs/ext2/dir.c | 8 +- fs/ext2/ext2.h | 8 + fs/ext2/namei.c | 15 +- fs/f2fs/f2fs.h | 8 +- fs/freevxfs/vxfs_immed.c | 4 +- fs/fuse/readdir.c | 4 +- fs/gfs2/bmap.c | 4 +- fs/gfs2/ops_fstype.c | 4 +- fs/hfs/bnode.c | 14 +- fs/hfs/btree.c | 20 +- fs/hfsplus/bitmap.c | 20 +- fs/hfsplus/bnode.c | 102 ++++----- fs/hfsplus/btree.c | 18 +- fs/hostfs/hostfs_kern.c | 12 +- fs/io_uring.c | 4 +- fs/isofs/compress.c | 4 +- fs/jffs2/file.c | 8 +- fs/jffs2/gc.c | 4 +- fs/nfs/dir.c | 20 +- fs/nilfs2/alloc.c | 34 +-- fs/nilfs2/cpfile.c | 4 +- fs/ntfs/aops.c | 4 +- fs/reiserfs/journal.c | 4 +- fs/romfs/super.c | 4 +- fs/splice.c | 4 +- fs/ubifs/file.c | 16 +- fs/vboxsf/file.c | 12 +- fs/zonefs/super.c | 4 +- include/linux/entry-common.h | 3 + include/linux/highmem.h | 63 +++++- include/linux/memremap.h | 1 + include/linux/mm.h | 43 ++++ include/linux/pkeys.h | 6 +- include/linux/sched.h | 8 + include/trace/events/kmap_thread.h | 56 +++++ init/init_task.c | 6 + kernel/fork.c | 18 ++ kernel/kexec_core.c | 8 +- lib/Kconfig.debug | 8 + lib/iov_iter.c | 12 +- lib/pks/pks_test.c | 138 +++++++++++-- lib/test_bpf.c | 4 +- lib/test_hmm.c | 8 +- mm/Kconfig | 13 ++ mm/debug.c | 23 +++ mm/memory.c | 8 +- mm/memremap.c | 90 ++++++++ mm/swapfile.c | 4 +- mm/userfaultfd.c | 4 +- net/ceph/messenger.c | 4 +- net/core/datagram.c | 4 +- net/core/sock.c | 8 +- net/ipv4/ip_output.c | 4 +- net/sunrpc/cache.c | 4 +- net/sunrpc/xdr.c | 8 +- net/tls/tls_device.c | 4 +- samples/vfio-mdev/mbochs.c | 4 +- 131 files changed, 1284 insertions(+), 565 deletions(-) create mode 100644 include/trace/events/kmap_thread.h -- 2.28.0.rc0.12.gb6a658bd00c9