* [PATCH v2 00/30] Hardened usercopy whitelisting
From: Kees Cook @ 2017-08-28 21:34 UTC
  To: linux-kernel; +Cc: Kees Cook, linux-mm, kernel-hardening, David Windsor

This series is modified from Brad Spengler/PaX Team's PAX_USERCOPY code
in the last public patch of grsecurity/PaX based on our understanding
of the code. Changes or omissions from the original code are ours and
don't reflect the original grsecurity/PaX code.

David Windsor did the bulk of the porting, refactoring, splitting,
testing, etc; I just did some extra tweaks, hunk moving, traces,
and extra patches.

Description from patch 1:

Currently, hardened usercopy performs dynamic bounds checking on slab
cache objects. This is good, but still leaves a lot of kernel memory
available to be copied to/from userspace in the face of bugs. To further
restrict what memory is available for copying, this creates a way to
whitelist specific areas of a given slab cache object for copying to/from
userspace, allowing much finer granularity of access control. Slab caches
that are never exposed to userspace can declare no whitelist for their
objects, thereby keeping them unavailable to userspace via dynamic copy
operations. (Note, an implicit form of whitelisting is the use of constant
sizes in usercopy operations and get_user()/put_user(); these bypass
hardened usercopy checks since these sizes cannot change at runtime.)
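
To illustrate the note about constant sizes, here is a rough userspace
sketch, assuming a GCC/Clang toolchain (for __builtin_constant_p()). The
names model_check_object_size() and MODEL_CHECK_COPY() are hypothetical
and are not taken from this series; the point is only that a size the
compiler can prove constant never reaches the runtime check:

```c
#include <assert.h>
#include <stddef.h>

static int dynamic_checks_run;	/* counts how often the runtime check ran */

/* Hypothetical placeholder for the runtime object-size check. */
static void model_check_object_size(const void *ptr, size_t n)
{
	(void)ptr;
	(void)n;
	dynamic_checks_run++;
}

/*
 * Only sizes the compiler cannot prove constant reach the runtime check.
 * __builtin_constant_p() is a GCC/Clang builtin.
 */
#define MODEL_CHECK_COPY(ptr, n)				\
	do {							\
		if (!__builtin_constant_p(n))			\
			model_check_object_size((ptr), (n));	\
	} while (0)
```

With this model, MODEL_CHECK_COPY(buf, sizeof(buf)) skips the runtime
check, while a size read from a variable at runtime goes through it.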

To support this whitelist annotation, usercopy region offset and size
members are added to struct kmem_cache. The slab allocator receives a
new function, kmem_cache_create_usercopy(), that creates a new cache
with a usercopy region defined, suitable for declaring spans of fields
within the objects that get copied to/from userspace.
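
As a rough illustration only (a userspace model, not the code in this
series; enforcement itself is the subject of patch 02), the offset/size
pair can be thought of as a window inside each object. The struct
cache_model and usercopy_allowed() names below are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace stand-in for the two fields the series adds to
 * struct kmem_cache; the real structure has many more members.
 */
struct cache_model {
	size_t object_size;	/* size of each object in the cache */
	size_t useroffset;	/* start of the whitelisted region */
	size_t usersize;	/* length of the whitelisted region; 0 = none */
};

/*
 * Return true if copying n bytes starting at byte `offset` within an
 * object stays entirely inside the whitelisted region. The comparisons
 * are ordered so that offset + n is never computed and cannot overflow.
 */
static bool usercopy_allowed(const struct cache_model *c,
			     size_t offset, size_t n)
{
	if (offset < c->useroffset)
		return false;
	offset -= c->useroffset;	/* now relative to the region start */
	if (offset > c->usersize)
		return false;
	return n <= c->usersize - offset;
}
```

For a 64-byte object with useroffset 16 and usersize 8, a copy of 8 bytes
at offset 16 is allowed, while 9 bytes at offset 16 or any copy starting
at offset 8 is rejected.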

In this patch, the default kmem_cache_create() marks the entire allocation
as whitelisted, leaving it semantically unchanged. Once all fine-grained
whitelists have been added (in subsequent patches), this will be changed
to a usersize of 0, making caches created with kmem_cache_create() not
copyable to/from userspace.

After the entire usercopy whitelist series is applied, less than 15%
of the slab cache memory remains exposed to potential usercopy bugs
after a fresh boot:

Total Slab Memory:           48074720
Usercopyable Memory:          6367532  13.2%
         task_struct                    0.2%         4480/1630720
         RAW                            0.3%            300/96000
         RAWv6                          2.1%           1408/64768
         ext4_inode_cache               3.0%       269760/8740224
         dentry                        11.1%       585984/5273856
         mm_struct                     29.1%         54912/188448
         kmalloc-8                    100.0%          24576/24576
         kmalloc-16                   100.0%          28672/28672
         kmalloc-32                   100.0%          81920/81920
         kmalloc-192                  100.0%          96768/96768
         kmalloc-128                  100.0%        143360/143360
         names_cache                  100.0%        163840/163840
         kmalloc-64                   100.0%        167936/167936
         kmalloc-256                  100.0%        339968/339968
         kmalloc-512                  100.0%        350720/350720
         kmalloc-96                   100.0%        455616/455616
         kmalloc-8192                 100.0%        655360/655360
         kmalloc-1024                 100.0%        812032/812032
         kmalloc-4096                 100.0%        819200/819200
         kmalloc-2048                 100.0%      1310720/1310720

After some kernel build workloads, the percentage (mainly driven by
dentry and inode caches expanding) drops under 10%:

Total Slab Memory:           95516184
Usercopyable Memory:          8497452   8.8%
         task_struct                    0.2%         4000/1456000
         RAW                            0.3%            300/96000
         RAWv6                          2.1%           1408/64768
         ext4_inode_cache               3.0%     1217280/39439872
         dentry                        11.1%     1623200/14608800
         mm_struct                     29.1%         73216/251264
         kmalloc-8                    100.0%          24576/24576
         kmalloc-16                   100.0%          28672/28672
         kmalloc-32                   100.0%          94208/94208
         kmalloc-192                  100.0%          96768/96768
         kmalloc-128                  100.0%        143360/143360
         names_cache                  100.0%        163840/163840
         kmalloc-64                   100.0%        245760/245760
         kmalloc-256                  100.0%        339968/339968
         kmalloc-512                  100.0%        350720/350720
         kmalloc-96                   100.0%        563520/563520
         kmalloc-8192                 100.0%        655360/655360
         kmalloc-1024                 100.0%        794624/794624
         kmalloc-4096                 100.0%        819200/819200
         kmalloc-2048                 100.0%      1257472/1257472
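
The percentages in both tables appear to be truncated to one decimal
place rather than rounded (e.g. RAWv6: 1408/64768 is about 2.17%, shown
as 2.1%). A minimal sketch of that arithmetic, assuming integer
truncation; the helper name is hypothetical and this is not the script
that produced the tables:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Usercopyable share in tenths of a percent, truncated (not rounded).
 * Example: 1408 of 64768 bytes is 2.17...%, returned as 21 (i.e. 2.1%).
 */
static unsigned int usercopy_tenths_pct(uint64_t usercopyable, uint64_t total)
{
	if (total == 0)
		return 0;
	return (unsigned int)(usercopyable * 1000 / total);
}
```

Applied to the totals, 6367532 of 48074720 gives 132 (13.2%) and 8497452
of 95516184 gives 88 (8.8%), matching the summary lines above.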

------
The patches are broken into several stages of changes:

Prepare and whitelist kmalloc:
    [PATCH 01/30] usercopy: Prepare for usercopy whitelisting
    [PATCH 02/30] usercopy: Enforce slab cache usercopy region boundaries
    [PATCH 03/30] usercopy: Mark kmalloc caches as usercopy caches

Update VFS layer for symlinks and other inline storage:
    [PATCH 04/30] dcache: Define usercopy region in dentry_cache slab
    [PATCH 05/30] vfs: Define usercopy region in names_cache slab caches
    [PATCH 06/30] vfs: Copy struct mount.mnt_id to userspace using
    [PATCH 07/30] ext4: Define usercopy region in ext4_inode_cache slab
    [PATCH 08/30] ext2: Define usercopy region in ext2_inode_cache slab
    [PATCH 09/30] jfs: Define usercopy region in jfs_ip slab cache
    [PATCH 10/30] befs: Define usercopy region in befs_inode_cache slab
    [PATCH 11/30] exofs: Define usercopy region in exofs_inode_cache slab
    [PATCH 12/30] orangefs: Define usercopy region in
    [PATCH 13/30] ufs: Define usercopy region in ufs_inode_cache slab
    [PATCH 14/30] vxfs: Define usercopy region in vxfs_inode slab cache
    [PATCH 15/30] xfs: Define usercopy region in xfs_inode slab cache
    [PATCH 16/30] cifs: Define usercopy region in cifs_request slab cache

Update scsi layer for inline storage:
    [PATCH 17/30] scsi: Define usercopy region in scsi_sense_cache slab

Whitelist a few network protocol-specific areas of memory:
    [PATCH 18/30] net: Define usercopy region in struct proto slab cache
    [PATCH 19/30] ip: Define usercopy region in IP proto slab cache
    [PATCH 20/30] caif: Define usercopy region in caif proto slab cache
    [PATCH 21/30] sctp: Define usercopy region in SCTP proto slab cache
    [PATCH 22/30] sctp: Copy struct sctp_sock.autoclose to userspace
    [PATCH 23/30] net: Restrict unwhitelisted proto caches to size 0

Whitelist areas of process memory:
    [PATCH 24/30] fork: Define usercopy region in mm_struct slab caches
    [PATCH 25/30] fork: Define usercopy region in thread_stack slab

Deal with per-architecture thread_struct whitelisting:
    [PATCH 26/30] fork: Provide usercopy whitelisting for task_struct
    [PATCH 27/30] x86: Implement thread_struct whitelist for hardened
    [PATCH 28/30] arm64: Implement thread_struct whitelist for hardened
    [PATCH 29/30] arm: Implement thread_struct whitelist for hardened

Make blacklisting the default:
    [PATCH 30/30] usercopy: Restrict non-usercopy caches to size 0

v2:
- added tracing of allocation and usage
- refactored solutions for task_struct
- split up network patches for readability

I intend for this to land via my usercopy hardening tree, so Acks,
Reviewed-bys, and Tested-bys would be greatly appreciated. I have some
questions in a few patches (e.g. CIFS and thread_stack) that would be
nice to get answered for completeness. FWIW, this series has survived
0-day testing over the weekend.

Thanks!

-Kees (and David)

* [PATCH v2 01/30] usercopy: Prepare for usercopy whitelisting
From: Kees Cook @ 2017-08-28 21:34 UTC
  To: linux-kernel
  Cc: Kees Cook, David Windsor, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Andrew Morton, linux-mm,
	kernel-hardening

From: David Windsor <dave@nullcore.net>

This patch prepares the slab allocator to handle caches having annotations
(useroffset and usersize) defining usercopy regions.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on
my understanding of the code. Changes or omissions from the original
code are mine and don't reflect the original grsecurity/PaX code.

Currently, hardened usercopy performs dynamic bounds checking on slab
cache objects. This is good, but still leaves a lot of kernel memory
available to be copied to/from userspace in the face of bugs. To further
restrict what memory is available for copying, this creates a way to
whitelist specific areas of a given slab cache object for copying to/from
userspace, allowing much finer granularity of access control. Slab caches
that are never exposed to userspace can declare no whitelist for their
objects, thereby keeping them unavailable to userspace via dynamic copy
operations. (Note, an implicit form of whitelisting is the use of constant
sizes in usercopy operations and get_user()/put_user(); these bypass
hardened usercopy checks since these sizes cannot change at runtime.)

To support this whitelist annotation, usercopy region offset and size
members are added to struct kmem_cache. The slab allocator receives a
new function, kmem_cache_create_usercopy(), that creates a new cache
with a usercopy region defined, suitable for declaring spans of fields
within the objects that get copied to/from userspace.
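
For the common case of whitelisting a single field, the region is just
that field's offset and size. A userspace sketch under stated
assumptions: struct example_obj, struct usercopy_region and
USERCOPY_REGION() are hypothetical, and sizeof_field() is defined
locally here in an assumed form rather than taken from the kernel
headers:

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for a sizeof_field() helper (assumed form). */
#define sizeof_field(type, member) sizeof(((type *)0)->member)

/* Hypothetical object layout: only `name` is copied to/from userspace. */
struct example_obj {
	void *private_ptr;
	char name[32];
	unsigned long flags;
};

/* Hypothetical pair mirroring the useroffset/usersize annotation. */
struct usercopy_region {
	size_t useroffset;
	size_t usersize;
};

/* Build the (useroffset, usersize) pair for one whitelisted field. */
#define USERCOPY_REGION(type, member)				\
	((struct usercopy_region){				\
		.useroffset = offsetof(type, member),		\
		.usersize = sizeof_field(type, member),		\
	})
```

USERCOPY_REGION(struct example_obj, name) yields a 32-byte region that
starts right after private_ptr, leaving private_ptr and flags outside
the whitelist.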

In this patch, the default kmem_cache_create() marks the entire allocation
as whitelisted, leaving it semantically unchanged. Once all fine-grained
whitelists have been added (in subsequent patches), this will be changed
to a usersize of 0, making caches created with kmem_cache_create() not
copyable to/from userspace.
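
The transition described above can be modelled in a few lines. This is a
sketch only; struct cache_region and the region_create_*() helpers are
hypothetical userspace stand-ins, not the slab allocator functions:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the usercopy annotation on a cache. */
struct cache_region {
	size_t size;		/* object size */
	size_t useroffset;	/* start of the whitelisted region */
	size_t usersize;	/* length of the whitelisted region */
};

/* Explicit region, analogous in spirit to kmem_cache_create_usercopy(). */
static struct cache_region region_create_usercopy(size_t size,
						  size_t useroffset,
						  size_t usersize)
{
	struct cache_region r = { size, useroffset, usersize };
	return r;
}

/* Default as of this patch: the whole object is whitelisted. */
static struct cache_region region_create_default(size_t size)
{
	return region_create_usercopy(size, 0, size);
}

/* Default once the series is fully applied: nothing is whitelisted. */
static struct cache_region region_create_final(size_t size)
{
	return region_create_usercopy(size, 0, 0);
}
```

Callers that never pass an explicit region therefore move from "entire
object copyable" to "nothing copyable" without changing their own code.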

After the entire usercopy whitelist series is applied, less than 15%
of the slab cache memory remains exposed to potential usercopy bugs
after a fresh boot:

Total Slab Memory:           48074720
Usercopyable Memory:          6367532  13.2%
         task_struct                    0.2%         4480/1630720
         RAW                            0.3%            300/96000
         RAWv6                          2.1%           1408/64768
         ext4_inode_cache               3.0%       269760/8740224
         dentry                        11.1%       585984/5273856
         mm_struct                     29.1%         54912/188448
         kmalloc-8                    100.0%          24576/24576
         kmalloc-16                   100.0%          28672/28672
         kmalloc-32                   100.0%          81920/81920
         kmalloc-192                  100.0%          96768/96768
         kmalloc-128                  100.0%        143360/143360
         names_cache                  100.0%        163840/163840
         kmalloc-64                   100.0%        167936/167936
         kmalloc-256                  100.0%        339968/339968
         kmalloc-512                  100.0%        350720/350720
         kmalloc-96                   100.0%        455616/455616
         kmalloc-8192                 100.0%        655360/655360
         kmalloc-1024                 100.0%        812032/812032
         kmalloc-4096                 100.0%        819200/819200
         kmalloc-2048                 100.0%      1310720/1310720

After some kernel build workloads, the percentage (mainly driven by
dentry and inode caches expanding) drops under 10%:

Total Slab Memory:           95516184
Usercopyable Memory:          8497452   8.8%
         task_struct                    0.2%         4000/1456000
         RAW                            0.3%            300/96000
         RAWv6                          2.1%           1408/64768
         ext4_inode_cache               3.0%     1217280/39439872
         dentry                        11.1%     1623200/14608800
         mm_struct                     29.1%         73216/251264
         kmalloc-8                    100.0%          24576/24576
         kmalloc-16                   100.0%          28672/28672
         kmalloc-32                   100.0%          94208/94208
         kmalloc-192                  100.0%          96768/96768
         kmalloc-128                  100.0%        143360/143360
         names_cache                  100.0%        163840/163840
         kmalloc-64                   100.0%        245760/245760
         kmalloc-256                  100.0%        339968/339968
         kmalloc-512                  100.0%        350720/350720
         kmalloc-96                   100.0%        563520/563520
         kmalloc-8192                 100.0%        655360/655360
         kmalloc-1024                 100.0%        794624/794624
         kmalloc-4096                 100.0%        819200/819200
         kmalloc-2048                 100.0%      1257472/1257472

Signed-off-by: David Windsor <dave@nullcore.net>
[kees: adjust commit log, split out a few extra kmalloc hunks]
[kees: add field names to function declarations]
[kees: add attack surface reduction analysis to commit log]
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 include/linux/slab.h     | 27 +++++++++++++++++++++------
 include/linux/slab_def.h |  3 +++
 include/linux/slub_def.h |  3 +++
 include/linux/stddef.h   |  2 ++
 mm/slab.c                |  2 +-
 mm/slab.h                |  5 ++++-
 mm/slab_common.c         | 42 ++++++++++++++++++++++++++++++++++--------
 mm/slub.c                | 11 +++++++++--
 8 files changed, 77 insertions(+), 18 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 41473df6dfb0..8b6cb384f8b6 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -126,9 +126,13 @@ struct mem_cgroup;
 void __init kmem_cache_init(void);
 bool slab_is_available(void);
 
-struct kmem_cache *kmem_cache_create(const char *, size_t, size_t,
-			unsigned long,
-			void (*)(void *));
+struct kmem_cache *kmem_cache_create(const char *name, size_t size,
+			size_t align, unsigned long flags,
+			void (*ctor)(void *));
+struct kmem_cache *kmem_cache_create_usercopy(const char *name,
+			size_t size, size_t align, unsigned long flags,
+			size_t useroffset, size_t usersize,
+			void (*ctor)(void *));
 void kmem_cache_destroy(struct kmem_cache *);
 int kmem_cache_shrink(struct kmem_cache *);
 
@@ -144,9 +148,20 @@ void memcg_destroy_kmem_caches(struct mem_cgroup *);
  * f.e. add ____cacheline_aligned_in_smp to the struct declaration
  * then the objects will be properly aligned in SMP configurations.
  */
-#define KMEM_CACHE(__struct, __flags) kmem_cache_create(#__struct,\
-		sizeof(struct __struct), __alignof__(struct __struct),\
-		(__flags), NULL)
+#define KMEM_CACHE(__struct, __flags)					\
+		kmem_cache_create(#__struct, sizeof(struct __struct),	\
+			__alignof__(struct __struct), (__flags), NULL)
+
+/*
+ * To whitelist a single field for copying to/from userspace, use this
+ * macro instead of KMEM_CACHE() above.
+ */
+#define KMEM_CACHE_USERCOPY(__struct, __flags, __field)			\
+		kmem_cache_create_usercopy(#__struct,			\
+			sizeof(struct __struct),			\
+			__alignof__(struct __struct), (__flags),	\
+			offsetof(struct __struct, __field),		\
+			sizeof_field(struct __struct, __field), NULL)
 
 /*
  * Common kmalloc functions provided by all allocators
diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
index 4ad2c5a26399..03eef0df8648 100644
--- a/include/linux/slab_def.h
+++ b/include/linux/slab_def.h
@@ -84,6 +84,9 @@ struct kmem_cache {
 	unsigned int *random_seq;
 #endif
 
+	size_t useroffset;		/* Usercopy region offset */
+	size_t usersize;		/* Usercopy region size */
+
 	struct kmem_cache_node *node[MAX_NUMNODES];
 };
 
diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index cc0faf3a90be..7f373a8ee155 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -130,6 +130,9 @@ struct kmem_cache {
 	struct kasan_cache kasan_info;
 #endif
 
+	size_t useroffset;		/* Usercopy region offset */
+	size_t usersize;		/* Usercopy region size */
+
 	struct kmem_cache_node *node[MAX_NUMNODES];
 };
 
diff --git a/include/linux/stddef.h b/include/linux/stddef.h
index 9c61c7cda936..f00355086fb2 100644
--- a/include/linux/stddef.h
+++ b/include/linux/stddef.h
@@ -18,6 +18,8 @@ enum {
 #define offsetof(TYPE, MEMBER)	((size_t)&((TYPE *)0)->MEMBER)
 #endif
 
+#define sizeof_field(structure, field) sizeof((((structure *)0)->field))
+
 /**
  * offsetofend(TYPE, MEMBER)
  *
diff --git a/mm/slab.c b/mm/slab.c
index 04dec48c3ed7..87b6e5e0cdaf 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1281,7 +1281,7 @@ void __init kmem_cache_init(void)
 	create_boot_cache(kmem_cache, "kmem_cache",
 		offsetof(struct kmem_cache, node) +
 				  nr_node_ids * sizeof(struct kmem_cache_node *),
-				  SLAB_HWCACHE_ALIGN);
+				  SLAB_HWCACHE_ALIGN, 0, 0);
 	list_add(&kmem_cache->list, &slab_caches);
 	slab_state = PARTIAL;
 
diff --git a/mm/slab.h b/mm/slab.h
index 6885e1192ec5..5d4b0fb6b7de 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -21,6 +21,8 @@ struct kmem_cache {
 	unsigned int size;	/* The aligned/padded/added on size  */
 	unsigned int align;	/* Alignment as calculated */
 	unsigned long flags;	/* Active flags on the slab */
+	size_t useroffset;	/* Usercopy region offset */
+	size_t usersize;	/* Usercopy region size */
 	const char *name;	/* Slab name for sysfs */
 	int refcount;		/* Use counter */
 	void (*ctor)(void *);	/* Called on object slot creation */
@@ -96,7 +98,8 @@ extern int __kmem_cache_create(struct kmem_cache *, unsigned long flags);
 extern struct kmem_cache *create_kmalloc_cache(const char *name, size_t size,
 			unsigned long flags);
 extern void create_boot_cache(struct kmem_cache *, const char *name,
-			size_t size, unsigned long flags);
+			size_t size, unsigned long flags, size_t useroffset,
+			size_t usersize);
 
 int slab_unmergeable(struct kmem_cache *s);
 struct kmem_cache *find_mergeable(size_t size, size_t align,
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 904a83be82de..4b1bca7c1a42 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -272,6 +272,9 @@ int slab_unmergeable(struct kmem_cache *s)
 	if (s->ctor)
 		return 1;
 
+	if (s->usersize)
+		return 1;
+
 	/*
 	 * We may have set a slab to be unmergeable during bootstrap.
 	 */
@@ -357,12 +360,15 @@ unsigned long calculate_alignment(unsigned long flags,
 
 static struct kmem_cache *create_cache(const char *name,
 		size_t object_size, size_t size, size_t align,
-		unsigned long flags, void (*ctor)(void *),
+		unsigned long flags, size_t useroffset,
+		size_t usersize, void (*ctor)(void *),
 		struct mem_cgroup *memcg, struct kmem_cache *root_cache)
 {
 	struct kmem_cache *s;
 	int err;
 
+	BUG_ON(useroffset + usersize > object_size);
+
 	err = -ENOMEM;
 	s = kmem_cache_zalloc(kmem_cache, GFP_KERNEL);
 	if (!s)
@@ -373,6 +379,8 @@ static struct kmem_cache *create_cache(const char *name,
 	s->size = size;
 	s->align = align;
 	s->ctor = ctor;
+	s->useroffset = useroffset;
+	s->usersize = usersize;
 
 	err = init_memcg_params(s, memcg, root_cache);
 	if (err)
@@ -397,11 +405,13 @@ static struct kmem_cache *create_cache(const char *name,
 }
 
 /*
- * kmem_cache_create - Create a cache.
+ * kmem_cache_create_usercopy - Create a cache with a usercopy region.
  * @name: A string which is used in /proc/slabinfo to identify this cache.
  * @size: The size of objects to be created in this cache.
  * @align: The required alignment for the objects.
  * @flags: SLAB flags
+ * @useroffset: Usercopy region offset
+ * @usersize: Usercopy region size
  * @ctor: A constructor for the objects.
  *
  * Returns a ptr to the cache on success, NULL on failure.
@@ -421,8 +431,9 @@ static struct kmem_cache *create_cache(const char *name,
  * as davem.
  */
 struct kmem_cache *
-kmem_cache_create(const char *name, size_t size, size_t align,
-		  unsigned long flags, void (*ctor)(void *))
+kmem_cache_create_usercopy(const char *name, size_t size, size_t align,
+		  unsigned long flags, size_t useroffset, size_t usersize,
+		  void (*ctor)(void *))
 {
 	struct kmem_cache *s = NULL;
 	const char *cache_name;
@@ -453,7 +464,10 @@ kmem_cache_create(const char *name, size_t size, size_t align,
 	 */
 	flags &= CACHE_CREATE_MASK;
 
-	s = __kmem_cache_alias(name, size, align, flags, ctor);
+	BUG_ON(!usersize && useroffset);
+	BUG_ON(size < usersize || size - usersize < useroffset);
+	if (!usersize)
+		s = __kmem_cache_alias(name, size, align, flags, ctor);
 	if (s)
 		goto out_unlock;
 
@@ -465,7 +479,7 @@ kmem_cache_create(const char *name, size_t size, size_t align,
 
 	s = create_cache(cache_name, size, size,
 			 calculate_alignment(flags, align, size),
-			 flags, ctor, NULL, NULL);
+			 flags, useroffset, usersize, ctor, NULL, NULL);
 	if (IS_ERR(s)) {
 		err = PTR_ERR(s);
 		kfree_const(cache_name);
@@ -491,6 +505,15 @@ kmem_cache_create(const char *name, size_t size, size_t align,
 	}
 	return s;
 }
+EXPORT_SYMBOL(kmem_cache_create_usercopy);
+
+struct kmem_cache *
+kmem_cache_create(const char *name, size_t size, size_t align,
+		unsigned long flags, void (*ctor)(void *))
+{
+	return kmem_cache_create_usercopy(name, size, align, flags, 0, size,
+					  ctor);
+}
 EXPORT_SYMBOL(kmem_cache_create);
 
 static void slab_caches_to_rcu_destroy_workfn(struct work_struct *work)
@@ -603,6 +626,7 @@ void memcg_create_kmem_cache(struct mem_cgroup *memcg,
 	s = create_cache(cache_name, root_cache->object_size,
 			 root_cache->size, root_cache->align,
 			 root_cache->flags & CACHE_CREATE_MASK,
+			 root_cache->useroffset, root_cache->usersize,
 			 root_cache->ctor, memcg, root_cache);
 	/*
 	 * If we could not create a memcg cache, do not complain, because
@@ -870,13 +894,15 @@ bool slab_is_available(void)
 #ifndef CONFIG_SLOB
 /* Create a cache during boot when no slab services are available yet */
 void __init create_boot_cache(struct kmem_cache *s, const char *name, size_t size,
-		unsigned long flags)
+		unsigned long flags, size_t useroffset, size_t usersize)
 {
 	int err;
 
 	s->name = name;
 	s->size = s->object_size = size;
 	s->align = calculate_alignment(flags, ARCH_KMALLOC_MINALIGN, size);
+	s->useroffset = useroffset;
+	s->usersize = usersize;
 
 	slab_init_memcg_params(s);
 
@@ -897,7 +923,7 @@ struct kmem_cache *__init create_kmalloc_cache(const char *name, size_t size,
 	if (!s)
 		panic("Out of memory when creating slab %s\n", name);
 
-	create_boot_cache(s, name, size, flags);
+	create_boot_cache(s, name, size, flags, 0, size);
 	list_add(&s->list, &slab_caches);
 	memcg_link_cache(s);
 	s->refcount = 1;
diff --git a/mm/slub.c b/mm/slub.c
index 1d3f9835f4ea..dfed8ef99b68 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4165,7 +4165,7 @@ void __init kmem_cache_init(void)
 	kmem_cache = &boot_kmem_cache;
 
 	create_boot_cache(kmem_cache_node, "kmem_cache_node",
-		sizeof(struct kmem_cache_node), SLAB_HWCACHE_ALIGN);
+		sizeof(struct kmem_cache_node), SLAB_HWCACHE_ALIGN, 0, 0);
 
 	register_hotmemory_notifier(&slab_memory_callback_nb);
 
@@ -4175,7 +4175,7 @@ void __init kmem_cache_init(void)
 	create_boot_cache(kmem_cache, "kmem_cache",
 			offsetof(struct kmem_cache, node) +
 				nr_node_ids * sizeof(struct kmem_cache_node *),
-		       SLAB_HWCACHE_ALIGN);
+		       SLAB_HWCACHE_ALIGN, 0, 0);
 
 	kmem_cache = bootstrap(&boot_kmem_cache);
 
@@ -5045,6 +5045,12 @@ static ssize_t cache_dma_show(struct kmem_cache *s, char *buf)
 SLAB_ATTR_RO(cache_dma);
 #endif
 
+static ssize_t usersize_show(struct kmem_cache *s, char *buf)
+{
+	return sprintf(buf, "%zu\n", s->usersize);
+}
+SLAB_ATTR_RO(usersize);
+
 static ssize_t destroy_by_rcu_show(struct kmem_cache *s, char *buf)
 {
 	return sprintf(buf, "%d\n", !!(s->flags & SLAB_TYPESAFE_BY_RCU));
@@ -5419,6 +5425,7 @@ static struct attribute *slab_attrs[] = {
 #ifdef CONFIG_FAILSLAB
 	&failslab_attr.attr,
 #endif
+	&usersize_attr.attr,
 
 	NULL
 };
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 172+ messages in thread

* [kernel-hardening] [PATCH v2 01/30] usercopy: Prepare for usercopy whitelisting
@ 2017-08-28 21:34   ` Kees Cook
  0 siblings, 0 replies; 172+ messages in thread
From: Kees Cook @ 2017-08-28 21:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, David Windsor, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Andrew Morton, linux-mm,
	kernel-hardening

From: David Windsor <dave@nullcore.net>

This patch prepares the slab allocator to handle caches having annotations
(useroffset and usersize) defining usercopy regions.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on
my understanding of the code. Changes or omissions from the original
code are mine and don't reflect the original grsecurity/PaX code.

Currently, hardened usercopy performs dynamic bounds checking on slab
cache objects. This is good, but still leaves a lot of kernel memory
available to be copied to/from userspace in the face of bugs. To further
restrict what memory is available for copying, this creates a way to
whitelist specific areas of a given slab cache object for copying to/from
userspace, allowing much finer granularity of access control. Slab caches
that are never exposed to userspace can declare no whitelist for their
objects, thereby keeping them unavailable to userspace via dynamic copy
operations. (Note, an implicit form of whitelisting is the use of constant
sizes in usercopy operations and get_user()/put_user(); these bypass
hardened usercopy checks since these sizes cannot change at runtime.)
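
As a rough illustration of what a whitelist means for a dynamic copy, the
sketch below tests whether a copy of n bytes at a given offset inside an
object stays within the whitelisted span. It is a userspace model only:
struct usercopy_region and copy_within_whitelist() are hypothetical names,
and the enforcement code itself is not part of this patch, which only
stores the region.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two fields this patch adds to struct kmem_cache. */
struct usercopy_region {
	size_t useroffset;	/* start of the whitelisted span within the object */
	size_t usersize;	/* length of the whitelisted span; 0 means no whitelist */
};

/*
 * Return true if a copy of n bytes starting at byte `offset` within an
 * object stays inside the whitelisted span. The comparison uses
 * subtraction so that no intermediate sum can wrap around.
 */
bool copy_within_whitelist(const struct usercopy_region *r,
			   size_t offset, size_t n)
{
	assert(r != NULL);

	if (offset < r->useroffset)
		return false;
	offset -= r->useroffset;	/* now relative to the span start */
	return offset <= r->usersize && n <= r->usersize - offset;
}
```

With a usersize of 0, every copy of one or more bytes is rejected, which
matches the "declare no whitelist" case described above.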

To support this whitelist annotation, usercopy region offset and size
members are added to struct kmem_cache. The slab allocator receives a
new function, kmem_cache_create_usercopy(), that creates a new cache
with a usercopy region defined, suitable for declaring spans of fields
within the objects that get copied to/from userspace.
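
To make the offset/size pair concrete, here is a small userspace sketch of
the values a single-field whitelist would carry. sizeof_field() is copied
from this patch; struct demo_obj and the two helper functions are
hypothetical and exist only for illustration.

```c
#include <stddef.h>

/* Same definition the patch adds to include/linux/stddef.h. */
#define sizeof_field(structure, field) sizeof((((structure *)0)->field))

/* Hypothetical object layout; not taken from the kernel. */
struct demo_obj {
	void *private_ptr;	/* must never reach userspace */
	char  name[32];		/* the only field meant to be copied */
	long  refcount;		/* must never reach userspace */
};

/* Values that KMEM_CACHE_USERCOPY(demo_obj, flags, name) would pass on. */
size_t demo_useroffset(void)
{
	return offsetof(struct demo_obj, name);
}

size_t demo_usersize(void)
{
	return sizeof_field(struct demo_obj, name);
}
```

Here the whitelisted span covers name[] only, so private_ptr and refcount
fall outside the usercopy region.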

In this patch, the default kmem_cache_create() marks the entire allocation
as whitelisted, leaving it semantically unchanged. Once all fine-grained
whitelists have been added (in subsequent patches), this will be changed
to a usersize of 0, making caches created with kmem_cache_create() not
copyable to/from userspace.
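
The argument checks this patch adds to kmem_cache_create_usercopy() can be
modelled in userspace as below. usercopy_region_valid() returns false where
the kernel code would hit BUG_ON(); the function name, struct region and
default_region() are hypothetical, while the two conditions are taken from
the diff.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace model of the two BUG_ON() conditions in
 * kmem_cache_create_usercopy(); false means the kernel would BUG.
 */
bool usercopy_region_valid(size_t size, size_t useroffset, size_t usersize)
{
	if (!usersize && useroffset)
		return false;	/* an offset without a size is meaningless */
	if (size < usersize || size - usersize < useroffset)
		return false;	/* the region must end inside the object */
	return true;
}

/* Region that kmem_cache_create() selects in this patch: the whole object. */
struct region {
	size_t useroffset;
	size_t usersize;
};

struct region default_region(size_t size)
{
	return (struct region){ 0, size };
}
```

Comparing size - usersize against useroffset, rather than summing
useroffset + usersize, keeps the check correct even when a caller passes an
offset large enough to wrap the sum.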

After the entire usercopy whitelist series is applied, less than 15%
of the slab cache memory remains exposed to potential usercopy bugs
after a fresh boot:

Total Slab Memory:           48074720
Usercopyable Memory:          6367532  13.2%
         task_struct                    0.2%         4480/1630720
         RAW                            0.3%            300/96000
         RAWv6                          2.1%           1408/64768
         ext4_inode_cache               3.0%       269760/8740224
         dentry                        11.1%       585984/5273856
         mm_struct                     29.1%         54912/188448
         kmalloc-8                    100.0%          24576/24576
         kmalloc-16                   100.0%          28672/28672
         kmalloc-32                   100.0%          81920/81920
         kmalloc-192                  100.0%          96768/96768
         kmalloc-128                  100.0%        143360/143360
         names_cache                  100.0%        163840/163840
         kmalloc-64                   100.0%        167936/167936
         kmalloc-256                  100.0%        339968/339968
         kmalloc-512                  100.0%        350720/350720
         kmalloc-96                   100.0%        455616/455616
         kmalloc-8192                 100.0%        655360/655360
         kmalloc-1024                 100.0%        812032/812032
         kmalloc-4096                 100.0%        819200/819200
         kmalloc-2048                 100.0%      1310720/1310720

After some kernel build workloads, the percentage (mainly driven by
dentry and inode caches expanding) drops under 10%:

Total Slab Memory:           95516184
Usercopyable Memory:          8497452   8.8%
         task_struct                    0.2%         4000/1456000
         RAW                            0.3%            300/96000
         RAWv6                          2.1%           1408/64768
         ext4_inode_cache               3.0%     1217280/39439872
         dentry                        11.1%     1623200/14608800
         mm_struct                     29.1%         73216/251264
         kmalloc-8                    100.0%          24576/24576
         kmalloc-16                   100.0%          28672/28672
         kmalloc-32                   100.0%          94208/94208
         kmalloc-192                  100.0%          96768/96768
         kmalloc-128                  100.0%        143360/143360
         names_cache                  100.0%        163840/163840
         kmalloc-64                   100.0%        245760/245760
         kmalloc-256                  100.0%        339968/339968
         kmalloc-512                  100.0%        350720/350720
         kmalloc-96                   100.0%        563520/563520
         kmalloc-8192                 100.0%        655360/655360
         kmalloc-1024                 100.0%        794624/794624
         kmalloc-4096                 100.0%        819200/819200
         kmalloc-2048                 100.0%      1257472/1257472
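
The percentages in the tables above can be reproduced with integer
arithmetic. They appear to be truncated rather than rounded: for example,
8497452/95516184 is roughly 8.896% but is printed as 8.8%. That is an
inference from the numbers, not something stated in the commit log, and
tenths_of_percent() is a hypothetical helper.

```c
#include <assert.h>

/* Share of `part` in `total`, in tenths of a percent, rounded down. */
unsigned long long tenths_of_percent(unsigned long long part,
				     unsigned long long total)
{
	assert(total != 0);
	return part * 1000ULL / total;
}
```

A result of 111 for dentry (1623200/14608800) corresponds to the 11.1% in
the table.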

Signed-off-by: David Windsor <dave@nullcore.net>
[kees: adjust commit log, split out a few extra kmalloc hunks]
[kees: add field names to function declarations]
[kees: add attack surface reduction analysis to commit log]
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 include/linux/slab.h     | 27 +++++++++++++++++++++------
 include/linux/slab_def.h |  3 +++
 include/linux/slub_def.h |  3 +++
 include/linux/stddef.h   |  2 ++
 mm/slab.c                |  2 +-
 mm/slab.h                |  5 ++++-
 mm/slab_common.c         | 42 ++++++++++++++++++++++++++++++++++--------
 mm/slub.c                | 11 +++++++++--
 8 files changed, 77 insertions(+), 18 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 41473df6dfb0..8b6cb384f8b6 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -126,9 +126,13 @@ struct mem_cgroup;
 void __init kmem_cache_init(void);
 bool slab_is_available(void);
 
-struct kmem_cache *kmem_cache_create(const char *, size_t, size_t,
-			unsigned long,
-			void (*)(void *));
+struct kmem_cache *kmem_cache_create(const char *name, size_t size,
+			size_t align, unsigned long flags,
+			void (*ctor)(void *));
+struct kmem_cache *kmem_cache_create_usercopy(const char *name,
+			size_t size, size_t align, unsigned long flags,
+			size_t useroffset, size_t usersize,
+			void (*ctor)(void *));
 void kmem_cache_destroy(struct kmem_cache *);
 int kmem_cache_shrink(struct kmem_cache *);
 
@@ -144,9 +148,20 @@ void memcg_destroy_kmem_caches(struct mem_cgroup *);
  * f.e. add ____cacheline_aligned_in_smp to the struct declaration
  * then the objects will be properly aligned in SMP configurations.
  */
-#define KMEM_CACHE(__struct, __flags) kmem_cache_create(#__struct,\
-		sizeof(struct __struct), __alignof__(struct __struct),\
-		(__flags), NULL)
+#define KMEM_CACHE(__struct, __flags)					\
+		kmem_cache_create(#__struct, sizeof(struct __struct),	\
+			__alignof__(struct __struct), (__flags), NULL)
+
+/*
+ * To whitelist a single field for copying to/from userspace, use this
+ * macro instead of KMEM_CACHE() above.
+ */
+#define KMEM_CACHE_USERCOPY(__struct, __flags, __field)			\
+		kmem_cache_create_usercopy(#__struct,			\
+			sizeof(struct __struct),			\
+			__alignof__(struct __struct), (__flags),	\
+			offsetof(struct __struct, __field),		\
+			sizeof_field(struct __struct, __field), NULL)
 
 /*
  * Common kmalloc functions provided by all allocators
diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
index 4ad2c5a26399..03eef0df8648 100644
--- a/include/linux/slab_def.h
+++ b/include/linux/slab_def.h
@@ -84,6 +84,9 @@ struct kmem_cache {
 	unsigned int *random_seq;
 #endif
 
+	size_t useroffset;		/* Usercopy region offset */
+	size_t usersize;		/* Usercopy region size */
+
 	struct kmem_cache_node *node[MAX_NUMNODES];
 };
 
diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index cc0faf3a90be..7f373a8ee155 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -130,6 +130,9 @@ struct kmem_cache {
 	struct kasan_cache kasan_info;
 #endif
 
+	size_t useroffset;		/* Usercopy region offset */
+	size_t usersize;		/* Usercopy region size */
+
 	struct kmem_cache_node *node[MAX_NUMNODES];
 };
 
diff --git a/include/linux/stddef.h b/include/linux/stddef.h
index 9c61c7cda936..f00355086fb2 100644
--- a/include/linux/stddef.h
+++ b/include/linux/stddef.h
@@ -18,6 +18,8 @@ enum {
 #define offsetof(TYPE, MEMBER)	((size_t)&((TYPE *)0)->MEMBER)
 #endif
 
+#define sizeof_field(structure, field) sizeof((((structure *)0)->field))
+
 /**
  * offsetofend(TYPE, MEMBER)
  *
diff --git a/mm/slab.c b/mm/slab.c
index 04dec48c3ed7..87b6e5e0cdaf 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1281,7 +1281,7 @@ void __init kmem_cache_init(void)
 	create_boot_cache(kmem_cache, "kmem_cache",
 		offsetof(struct kmem_cache, node) +
 				  nr_node_ids * sizeof(struct kmem_cache_node *),
-				  SLAB_HWCACHE_ALIGN);
+				  SLAB_HWCACHE_ALIGN, 0, 0);
 	list_add(&kmem_cache->list, &slab_caches);
 	slab_state = PARTIAL;
 
diff --git a/mm/slab.h b/mm/slab.h
index 6885e1192ec5..5d4b0fb6b7de 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -21,6 +21,8 @@ struct kmem_cache {
 	unsigned int size;	/* The aligned/padded/added on size  */
 	unsigned int align;	/* Alignment as calculated */
 	unsigned long flags;	/* Active flags on the slab */
+	size_t useroffset;	/* Usercopy region offset */
+	size_t usersize;	/* Usercopy region size */
 	const char *name;	/* Slab name for sysfs */
 	int refcount;		/* Use counter */
 	void (*ctor)(void *);	/* Called on object slot creation */
@@ -96,7 +98,8 @@ extern int __kmem_cache_create(struct kmem_cache *, unsigned long flags);
 extern struct kmem_cache *create_kmalloc_cache(const char *name, size_t size,
 			unsigned long flags);
 extern void create_boot_cache(struct kmem_cache *, const char *name,
-			size_t size, unsigned long flags);
+			size_t size, unsigned long flags, size_t useroffset,
+			size_t usersize);
 
 int slab_unmergeable(struct kmem_cache *s);
 struct kmem_cache *find_mergeable(size_t size, size_t align,
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 904a83be82de..4b1bca7c1a42 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -272,6 +272,9 @@ int slab_unmergeable(struct kmem_cache *s)
 	if (s->ctor)
 		return 1;
 
+	if (s->usersize)
+		return 1;
+
 	/*
 	 * We may have set a slab to be unmergeable during bootstrap.
 	 */
@@ -357,12 +360,15 @@ unsigned long calculate_alignment(unsigned long flags,
 
 static struct kmem_cache *create_cache(const char *name,
 		size_t object_size, size_t size, size_t align,
-		unsigned long flags, void (*ctor)(void *),
+		unsigned long flags, size_t useroffset,
+		size_t usersize, void (*ctor)(void *),
 		struct mem_cgroup *memcg, struct kmem_cache *root_cache)
 {
 	struct kmem_cache *s;
 	int err;
 
+	BUG_ON(useroffset + usersize > object_size);
+
 	err = -ENOMEM;
 	s = kmem_cache_zalloc(kmem_cache, GFP_KERNEL);
 	if (!s)
@@ -373,6 +379,8 @@ static struct kmem_cache *create_cache(const char *name,
 	s->size = size;
 	s->align = align;
 	s->ctor = ctor;
+	s->useroffset = useroffset;
+	s->usersize = usersize;
 
 	err = init_memcg_params(s, memcg, root_cache);
 	if (err)
@@ -397,11 +405,13 @@ static struct kmem_cache *create_cache(const char *name,
 }
 
 /*
- * kmem_cache_create - Create a cache.
+ * kmem_cache_create_usercopy - Create a cache with a usercopy region.
  * @name: A string which is used in /proc/slabinfo to identify this cache.
  * @size: The size of objects to be created in this cache.
  * @align: The required alignment for the objects.
  * @flags: SLAB flags
+ * @useroffset: Usercopy region offset
+ * @usersize: Usercopy region size
  * @ctor: A constructor for the objects.
  *
  * Returns a ptr to the cache on success, NULL on failure.
@@ -421,8 +431,9 @@ static struct kmem_cache *create_cache(const char *name,
  * as davem.
  */
 struct kmem_cache *
-kmem_cache_create(const char *name, size_t size, size_t align,
-		  unsigned long flags, void (*ctor)(void *))
+kmem_cache_create_usercopy(const char *name, size_t size, size_t align,
+		  unsigned long flags, size_t useroffset, size_t usersize,
+		  void (*ctor)(void *))
 {
 	struct kmem_cache *s = NULL;
 	const char *cache_name;
@@ -453,7 +464,10 @@ kmem_cache_create(const char *name, size_t size, size_t align,
 	 */
 	flags &= CACHE_CREATE_MASK;
 
-	s = __kmem_cache_alias(name, size, align, flags, ctor);
+	BUG_ON(!usersize && useroffset);
+	BUG_ON(size < usersize || size - usersize < useroffset);
+	if (!usersize)
+		s = __kmem_cache_alias(name, size, align, flags, ctor);
 	if (s)
 		goto out_unlock;
 
@@ -465,7 +479,7 @@ kmem_cache_create(const char *name, size_t size, size_t align,
 
 	s = create_cache(cache_name, size, size,
 			 calculate_alignment(flags, align, size),
-			 flags, ctor, NULL, NULL);
+			 flags, useroffset, usersize, ctor, NULL, NULL);
 	if (IS_ERR(s)) {
 		err = PTR_ERR(s);
 		kfree_const(cache_name);
@@ -491,6 +505,15 @@ kmem_cache_create(const char *name, size_t size, size_t align,
 	}
 	return s;
 }
+EXPORT_SYMBOL(kmem_cache_create_usercopy);
+
+struct kmem_cache *
+kmem_cache_create(const char *name, size_t size, size_t align,
+		unsigned long flags, void (*ctor)(void *))
+{
+	return kmem_cache_create_usercopy(name, size, align, flags, 0, size,
+					  ctor);
+}
 EXPORT_SYMBOL(kmem_cache_create);
 
 static void slab_caches_to_rcu_destroy_workfn(struct work_struct *work)
@@ -603,6 +626,7 @@ void memcg_create_kmem_cache(struct mem_cgroup *memcg,
 	s = create_cache(cache_name, root_cache->object_size,
 			 root_cache->size, root_cache->align,
 			 root_cache->flags & CACHE_CREATE_MASK,
+			 root_cache->useroffset, root_cache->usersize,
 			 root_cache->ctor, memcg, root_cache);
 	/*
 	 * If we could not create a memcg cache, do not complain, because
@@ -870,13 +894,15 @@ bool slab_is_available(void)
 #ifndef CONFIG_SLOB
 /* Create a cache during boot when no slab services are available yet */
 void __init create_boot_cache(struct kmem_cache *s, const char *name, size_t size,
-		unsigned long flags)
+		unsigned long flags, size_t useroffset, size_t usersize)
 {
 	int err;
 
 	s->name = name;
 	s->size = s->object_size = size;
 	s->align = calculate_alignment(flags, ARCH_KMALLOC_MINALIGN, size);
+	s->useroffset = useroffset;
+	s->usersize = usersize;
 
 	slab_init_memcg_params(s);
 
@@ -897,7 +923,7 @@ struct kmem_cache *__init create_kmalloc_cache(const char *name, size_t size,
 	if (!s)
 		panic("Out of memory when creating slab %s\n", name);
 
-	create_boot_cache(s, name, size, flags);
+	create_boot_cache(s, name, size, flags, 0, size);
 	list_add(&s->list, &slab_caches);
 	memcg_link_cache(s);
 	s->refcount = 1;
diff --git a/mm/slub.c b/mm/slub.c
index 1d3f9835f4ea..dfed8ef99b68 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4165,7 +4165,7 @@ void __init kmem_cache_init(void)
 	kmem_cache = &boot_kmem_cache;
 
 	create_boot_cache(kmem_cache_node, "kmem_cache_node",
-		sizeof(struct kmem_cache_node), SLAB_HWCACHE_ALIGN);
+		sizeof(struct kmem_cache_node), SLAB_HWCACHE_ALIGN, 0, 0);
 
 	register_hotmemory_notifier(&slab_memory_callback_nb);
 
@@ -4175,7 +4175,7 @@ void __init kmem_cache_init(void)
 	create_boot_cache(kmem_cache, "kmem_cache",
 			offsetof(struct kmem_cache, node) +
 				nr_node_ids * sizeof(struct kmem_cache_node *),
-		       SLAB_HWCACHE_ALIGN);
+		       SLAB_HWCACHE_ALIGN, 0, 0);
 
 	kmem_cache = bootstrap(&boot_kmem_cache);
 
@@ -5045,6 +5045,12 @@ static ssize_t cache_dma_show(struct kmem_cache *s, char *buf)
 SLAB_ATTR_RO(cache_dma);
 #endif
 
+static ssize_t usersize_show(struct kmem_cache *s, char *buf)
+{
+	return sprintf(buf, "%zu\n", s->usersize);
+}
+SLAB_ATTR_RO(usersize);
+
 static ssize_t destroy_by_rcu_show(struct kmem_cache *s, char *buf)
 {
 	return sprintf(buf, "%d\n", !!(s->flags & SLAB_TYPESAFE_BY_RCU));
@@ -5419,6 +5425,7 @@ static struct attribute *slab_attrs[] = {
 #ifdef CONFIG_FAILSLAB
 	&failslab_attr.attr,
 #endif
+	&usersize_attr.attr,
 
 	NULL
 };
-- 
2.7.4
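
To illustrate what KMEM_CACHE_USERCOPY() hands to kmem_cache_create_usercopy() in the patch above, here is a minimal userspace sketch. struct demo_obj, struct usercopy_region, demo_region() and region_is_valid() are hypothetical names invented for this example. Only the sizeof_field() definition and the two sanity conditions are taken from the diff, and region_is_valid() returns false where the kernel code would hit BUG_ON().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same definition the patch adds to include/linux/stddef.h. */
#define sizeof_field(structure, field) sizeof((((structure *)0)->field))

/* Hypothetical object type: only "payload" is meant to reach userspace. */
struct demo_obj {
	void *owner;		/* kernel-internal pointer */
	unsigned long state;	/* kernel-internal state */
	char payload[64];	/* the single whitelisted field */
	void *next;		/* kernel-internal pointer */
};

/* Userspace stand-in for the two fields added to struct kmem_cache. */
struct usercopy_region {
	size_t useroffset;
	size_t usersize;
};

/* The offset/size pair KMEM_CACHE_USERCOPY(demo_obj, ..., payload) computes. */
struct usercopy_region demo_region(void)
{
	struct usercopy_region r = {
		.useroffset = offsetof(struct demo_obj, payload),
		.usersize = sizeof_field(struct demo_obj, payload),
	};
	return r;
}

/*
 * Mirrors the two BUG_ON() conditions in kmem_cache_create_usercopy().
 * The subtraction form cannot wrap, unlike useroffset + usersize > size.
 */
bool region_is_valid(size_t size, size_t useroffset, size_t usersize)
{
	if (!usersize && useroffset)
		return false;
	if (size < usersize || size - usersize < useroffset)
		return false;
	return true;
}

/* Usage: the single-field region fits inside the object. */
void usercopy_region_demo(void)
{
	struct usercopy_region r = demo_region();

	assert(r.usersize == 64);
	assert(region_is_valid(sizeof(struct demo_obj), r.useroffset, r.usersize));
}
```

A region of (0, size) whitelists the whole object, which is what the plain kmem_cache_create() wrapper passes, while (0, 0) whitelists nothing.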

* [PATCH v2 02/30] usercopy: Enforce slab cache usercopy region boundaries
  2017-08-28 21:34 ` Kees Cook
  (?)
@ 2017-08-28 21:34   ` Kees Cook
  -1 siblings, 0 replies; 172+ messages in thread
From: Kees Cook @ 2017-08-28 21:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, David Windsor, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Andrew Morton, Laura Abbott,
	Ingo Molnar, Mark Rutland, linux-mm, kernel-hardening

From: David Windsor <dave@nullcore.net>

This patch adds the enforcement component of usercopy cache whitelisting,
and is modified from Brad Spengler/PaX Team's PAX_USERCOPY whitelisting
code in the last public patch of grsecurity/PaX based on my understanding
of the code. Changes or omissions from the original code are mine and
don't reflect the original grsecurity/PaX code.

The SLAB and SLUB allocators are modified to deny all copy operations
in which the kernel heap memory being copied to or from userspace falls
outside of the cache's defined usercopy region.

Signed-off-by: David Windsor <dave@nullcore.net>
[kees: adjust commit log and comments]
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-mm@kvack.org
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 mm/slab.c     | 16 +++++++++++-----
 mm/slub.c     | 18 +++++++++++-------
 mm/usercopy.c | 12 ++++++++++++
 3 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 87b6e5e0cdaf..df268999cf02 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -4408,7 +4408,9 @@ module_init(slab_proc_init);
 
 #ifdef CONFIG_HARDENED_USERCOPY
 /*
- * Rejects objects that are incorrectly sized.
+ * Rejects incorrectly sized objects and objects that are to be copied
+ * to/from userspace but do not fall entirely within the containing slab
+ * cache's usercopy region.
  *
  * Returns NULL if check passes, otherwise const char * to name of cache
  * to indicate an error.
@@ -4428,11 +4430,15 @@ const char *__check_heap_object(const void *ptr, unsigned long n,
 	/* Find offset within object. */
 	offset = ptr - index_to_obj(cachep, page, objnr) - obj_offset(cachep);
 
-	/* Allow address range falling entirely within object size. */
-	if (offset <= cachep->object_size && n <= cachep->object_size - offset)
-		return NULL;
+	/* Make sure object falls entirely within cache's usercopy region. */
+	if (offset < cachep->useroffset)
+		return cachep->name;
+	if (offset - cachep->useroffset > cachep->usersize)
+		return cachep->name;
+	if (n > cachep->useroffset - offset + cachep->usersize)
+		return cachep->name;
 
-	return cachep->name;
+	return NULL;
 }
 #endif /* CONFIG_HARDENED_USERCOPY */
 
diff --git a/mm/slub.c b/mm/slub.c
index dfed8ef99b68..bd4e2b9d4524 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3797,7 +3797,9 @@ EXPORT_SYMBOL(__kmalloc_node);
 
 #ifdef CONFIG_HARDENED_USERCOPY
 /*
- * Rejects objects that are incorrectly sized.
+ * Rejects incorrectly sized objects and objects that are to be copied
+ * to/from userspace but do not fall entirely within the containing slab
+ * cache's usercopy region.
  *
  * Returns NULL if check passes, otherwise const char * to name of cache
  * to indicate an error.
@@ -3807,11 +3809,9 @@ const char *__check_heap_object(const void *ptr, unsigned long n,
 {
 	struct kmem_cache *s;
 	unsigned long offset;
-	size_t object_size;
 
 	/* Find object and usable object size. */
 	s = page->slab_cache;
-	object_size = slab_ksize(s);
 
 	/* Reject impossible pointers. */
 	if (ptr < page_address(page))
@@ -3827,11 +3827,15 @@ const char *__check_heap_object(const void *ptr, unsigned long n,
 		offset -= s->red_left_pad;
 	}
 
-	/* Allow address range falling entirely within object size. */
-	if (offset <= object_size && n <= object_size - offset)
-		return NULL;
+	/* Make sure object falls entirely within cache's usercopy region. */
+	if (offset < s->useroffset)
+		return s->name;
+	if (offset - s->useroffset > s->usersize)
+		return s->name;
+	if (n > s->useroffset - offset + s->usersize)
+		return s->name;
 
-	return s->name;
+	return NULL;
 }
 #endif /* CONFIG_HARDENED_USERCOPY */
 
diff --git a/mm/usercopy.c b/mm/usercopy.c
index a9852b24715d..cbffde670c49 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -58,6 +58,18 @@ static noinline int check_stack_object(const void *obj, unsigned long len)
 	return GOOD_STACK;
 }
 
+/*
+ * If this function is reached, then CONFIG_HARDENED_USERCOPY has found an
+ * unexpected state during a copy_from_user() or copy_to_user() call.
+ * There are several checks being performed on the buffer by the
+ * __check_object_size() function. Normal stack buffer usage should never
+ * trip the checks, and kernel text addressing will always trip the check.
+ * For cache objects, it is checking that only the whitelisted range of
+ * bytes for a given cache is being accessed (via the cache's usersize and
+ * useroffset fields). To adjust a cache whitelist, use the usercopy-aware
+ * kmem_cache_create_usercopy() function to create the cache (and
+ * carefully audit the whitelist range).
+ */
 static void report_usercopy(const void *ptr, unsigned long len,
 			    bool to_user, const char *type)
 {
-- 
2.7.4
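
The three comparisons added to __check_heap_object() above can be exercised outside the kernel. The sketch below is a userspace approximation: struct cache_sketch, demo_cache and check_region() are made-up names, and only the comparisons themselves come from the diff. It follows the patch's convention of returning NULL on success and the cache name on rejection.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the struct kmem_cache fields the check reads. */
struct cache_sketch {
	const char *name;
	size_t useroffset;
	size_t usersize;
};

/* Hypothetical cache whose bytes 16..79 of each object are whitelisted. */
const struct cache_sketch demo_cache = { "demo_cache", 16, 64 };

/*
 * Returns NULL when [offset, offset + n) lies inside the usercopy region,
 * otherwise the cache name, as the patched __check_heap_object() does.
 */
const char *check_region(const struct cache_sketch *s, size_t offset, size_t n)
{
	/* Copy starts before the region. */
	if (offset < s->useroffset)
		return s->name;
	/* Copy starts past the end of the region. */
	if (offset - s->useroffset > s->usersize)
		return s->name;
	/*
	 * Bytes left from offset to the end of the region. The unsigned
	 * subtraction wraps, but adding usersize brings the result back
	 * into range because the second check has already passed.
	 */
	if (n > s->useroffset - offset + s->usersize)
		return s->name;
	return NULL;
}

/* Usage: a copy of the whole region passes, one extra byte does not. */
void check_region_demo(void)
{
	assert(check_region(&demo_cache, 16, 64) == NULL);
	assert(check_region(&demo_cache, 16, 65) == demo_cache.name);
}
```

A zero-length copy that starts exactly at the end of the region is accepted, since neither the second nor the third comparison is strict at that boundary.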

* [PATCH v2 03/30] usercopy: Mark kmalloc caches as usercopy caches
  2017-08-28 21:34 ` Kees Cook
  (?)
@ 2017-08-28 21:34   ` Kees Cook
  -1 siblings, 0 replies; 172+ messages in thread
From: Kees Cook @ 2017-08-28 21:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, David Windsor, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Andrew Morton, linux-mm,
	kernel-hardening

From: David Windsor <dave@nullcore.net>

Mark the kmalloc slab caches as entirely whitelisted. These caches
are frequently used to fulfill kernel allocations that contain data
to be copied to/from userspace. Internal-only uses are also common,
but are scattered in the kernel. For now, mark all the kmalloc caches
as whitelisted.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <dave@nullcore.net>
[kees: merged in moved kmalloc hunks, adjust commit log]
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 mm/slab.c        |  3 ++-
 mm/slab.h        |  3 ++-
 mm/slab_common.c | 10 ++++++----
 3 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index df268999cf02..9af16f675927 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1291,7 +1291,8 @@ void __init kmem_cache_init(void)
 	 */
 	kmalloc_caches[INDEX_NODE] = create_kmalloc_cache(
 				kmalloc_info[INDEX_NODE].name,
-				kmalloc_size(INDEX_NODE), ARCH_KMALLOC_FLAGS);
+				kmalloc_size(INDEX_NODE), ARCH_KMALLOC_FLAGS,
+				0, kmalloc_size(INDEX_NODE));
 	slab_state = PARTIAL_NODE;
 	setup_kmalloc_cache_index_table();
 
diff --git a/mm/slab.h b/mm/slab.h
index 5d4b0fb6b7de..ec7d64debebd 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -96,7 +96,8 @@ struct kmem_cache *kmalloc_slab(size_t, gfp_t);
 extern int __kmem_cache_create(struct kmem_cache *, unsigned long flags);
 
 extern struct kmem_cache *create_kmalloc_cache(const char *name, size_t size,
-			unsigned long flags);
+			unsigned long flags, size_t useroffset,
+			size_t usersize);
 extern void create_boot_cache(struct kmem_cache *, const char *name,
 			size_t size, unsigned long flags, size_t useroffset,
 			size_t usersize);
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 4b1bca7c1a42..f662f4e2fa29 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -916,14 +916,15 @@ void __init create_boot_cache(struct kmem_cache *s, const char *name, size_t siz
 }
 
 struct kmem_cache *__init create_kmalloc_cache(const char *name, size_t size,
-				unsigned long flags)
+				unsigned long flags, size_t useroffset,
+				size_t usersize)
 {
 	struct kmem_cache *s = kmem_cache_zalloc(kmem_cache, GFP_NOWAIT);
 
 	if (!s)
 		panic("Out of memory when creating slab %s\n", name);
 
-	create_boot_cache(s, name, size, flags, 0, size);
+	create_boot_cache(s, name, size, flags, useroffset, usersize);
 	list_add(&s->list, &slab_caches);
 	memcg_link_cache(s);
 	s->refcount = 1;
@@ -1077,7 +1078,8 @@ void __init setup_kmalloc_cache_index_table(void)
 static void __init new_kmalloc_cache(int idx, unsigned long flags)
 {
 	kmalloc_caches[idx] = create_kmalloc_cache(kmalloc_info[idx].name,
-					kmalloc_info[idx].size, flags);
+					kmalloc_info[idx].size, flags, 0,
+					kmalloc_info[idx].size);
 }
 
 /*
@@ -1118,7 +1120,7 @@ void __init create_kmalloc_caches(unsigned long flags)
 
 			BUG_ON(!n);
 			kmalloc_dma_caches[i] = create_kmalloc_cache(n,
-				size, SLAB_CACHE_DMA | flags);
+				size, SLAB_CACHE_DMA | flags, 0, 0);
 		}
 	}
 #endif
-- 
2.7.4
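
The kmalloc caches above are created with useroffset 0 and usersize equal to the object size, while the DMA kmalloc caches are created with 0, 0. As a rough check of what that means for the region test from patch 02, the sketch below compares it against the object_size comparison that patch removed from mm/slab.c. All function names are illustrative, and the comparison is brute-forced over small sizes only, so it is a sanity check and not a proof.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Comparison removed from mm/slab.c by patch 02: whole object allowed. */
bool old_check_ok(size_t object_size, size_t offset, size_t n)
{
	return offset <= object_size && n <= object_size - offset;
}

/* Region check added by patch 02, written as a predicate. */
bool region_check_ok(size_t useroffset, size_t usersize,
		     size_t offset, size_t n)
{
	if (offset < useroffset)
		return false;
	if (offset - useroffset > usersize)
		return false;
	if (n > useroffset - offset + usersize)
		return false;
	return true;
}

/*
 * Counts the (offset, n) pairs on which the two checks disagree for a
 * cache whitelisted as (0, size), including pairs just past the object.
 */
unsigned long count_disagreements(size_t size)
{
	unsigned long bad = 0;
	size_t offset, n;

	for (offset = 0; offset <= size + 2; offset++)
		for (n = 0; n <= size + 2; n++)
			if (old_check_ok(size, offset, n) !=
			    region_check_ok(0, size, offset, n))
				bad++;
	return bad;
}

/* Usage: a fully whitelisted 96-byte cache behaves as before. */
void kmalloc_whitelist_demo(void)
{
	assert(count_disagreements(96) == 0);
	/* An empty region, as given to the DMA caches, rejects any byte. */
	assert(!region_check_ok(0, 0, 0, 1));
}
```

The SLUB variant previously compared against slab_ksize() and not object_size, so the equivalence shown here is only claimed for the mm/slab.c form of the check.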

* [PATCH v2 04/30] dcache: Define usercopy region in dentry_cache slab cache
  2017-08-28 21:34 ` Kees Cook
  (?)
@ 2017-08-28 21:34   ` Kees Cook
  -1 siblings, 0 replies; 172+ messages in thread
From: Kees Cook @ 2017-08-28 21:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, David Windsor, Alexander Viro, linux-fsdevel,
	linux-mm, kernel-hardening

From: David Windsor <dave@nullcore.net>

When a dentry name is short enough, it can be stored directly in the
dentry itself (instead of in a separate kmalloc allocation). These dentry
short names, stored in struct dentry.d_iname and therefore contained in
the dentry_cache slab cache, need to be copied to userspace.
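
The inline-versus-external storage described above can be sketched in
userspace C. This is only an illustration, not the dcache code: the
toy_dentry type, TOY_INLINE_LEN, and both helpers are hypothetical names
introduced here, and the inline length is an arbitrary choice.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_INLINE_LEN 32	/* hypothetical inline capacity, chosen for the sketch */

struct toy_dentry {
	const char *name;		/* always points at the active name */
	char iname[TOY_INLINE_LEN];	/* inline storage for short names */
};

/* Store a name: inline when it fits, otherwise in a separate allocation. */
static int toy_set_name(struct toy_dentry *d, const char *s)
{
	size_t len;
	char *ext;

	assert(d && s);
	len = strlen(s);

	if (len < sizeof(d->iname)) {
		/* Short name: the bytes live inside the object itself. */
		memcpy(d->iname, s, len + 1);
		d->name = d->iname;
		return 0;
	}

	/* Long name: the bytes live in their own heap allocation. */
	ext = malloc(len + 1);
	if (!ext)
		return -1;
	memcpy(ext, s, len + 1);
	d->name = ext;
	return 0;
}

/* True when the active name is stored inside the object. */
static int toy_name_is_inline(const struct toy_dentry *d)
{
	return d->name == d->iname;
}
```

Only the inline case makes the name pointer land inside the cache object,
which is why the short-name field is the part of the object that a copy
to userspace can legitimately touch.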

cache object allocation:
    fs/dcache.c:
        __d_alloc(...):
            ...
            dentry = kmem_cache_alloc(dentry_cache, ...);
            ...
            dentry->d_name.name = dentry->d_iname;

example usage trace:
    filldir+0xb0/0x140
    dcache_readdir+0x82/0x170
    iterate_dir+0x142/0x1b0
    SyS_getdents+0xb5/0x160

    fs/readdir.c:
        (called via ctx.actor by dir_emit)
        filldir(..., const char *name, ...):
            ...
            copy_to_user(..., name, namlen)

    fs/libfs.c:
        dcache_readdir(...):
            ...
            next = next_positive(dentry, p, 1)
            ...
            dir_emit(..., next->d_name.name, ...)

In support of usercopy hardening, this patch defines a region in the
dentry_cache slab cache in which userspace copy operations are allowed.

This region is known as the slab cache's usercopy region. Slab caches can
now check that each copy operation involving cache-managed memory falls
entirely within the slab's usercopy region.
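
As a rough userspace sketch of what "falls entirely within the usercopy
region" means, the check reduces to offset/size arithmetic against the
useroffset and usersize values recorded for the cache. The toy_cache and
toy_obj types, the TOY_CACHE_USERCOPY() initializer, and
toy_copy_allowed() are hypothetical names used for illustration only;
they are not the slab allocator's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cache that records a usercopy region. */
struct toy_cache {
	size_t object_size;
	size_t useroffset;	/* start of the whitelisted span */
	size_t usersize;	/* length of the whitelisted span */
};

/* Hypothetical object layout: only iname may be copied to/from userspace. */
struct toy_obj {
	long refcount;
	void *parent;
	char iname[32];
	long private_state;
};

/* Derive the region from a single field, in the spirit of a by-field macro. */
#define TOY_CACHE_USERCOPY(type, field) \
	{ sizeof(type), offsetof(type, field), sizeof(((type *)0)->field) }

/* Allow a copy of n bytes at 'offset' only if it stays inside the region. */
static bool toy_copy_allowed(const struct toy_cache *c, size_t offset, size_t n)
{
	assert(c != NULL);

	if (offset < c->useroffset)
		return false;			/* starts before the region */
	if (offset - c->useroffset > c->usersize)
		return false;			/* starts after the region */
	/* Remaining room from 'offset' to the end of the region. */
	return n <= c->usersize - (offset - c->useroffset);
}
```

With a region derived from iname, a copy that starts at the field and
spans at most its size is accepted, while a copy that starts at offset 0
or runs past the end of the field is rejected.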

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <dave@nullcore.net>
[kees: adjust hunks for kmalloc-specific things moved later]
[kees: adjust commit log, provide usage trace]
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 fs/dcache.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index f90141387f01..5f5e7c1fcf4b 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -3603,8 +3603,9 @@ static void __init dcache_init(void)
 	 * but it is probably not worth it because of the cache nature
 	 * of the dcache.
 	 */
-	dentry_cache = KMEM_CACHE(dentry,
-		SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_MEM_SPREAD|SLAB_ACCOUNT);
+	dentry_cache = KMEM_CACHE_USERCOPY(dentry,
+		SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_MEM_SPREAD|SLAB_ACCOUNT,
+		d_iname);
 
 	/* Hash may have been set up in dcache_init_early */
 	if (!hashdist)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 172+ messages in thread

* [PATCH v2 05/30] vfs: Define usercopy region in names_cache slab caches
  2017-08-28 21:34 ` Kees Cook
  (?)
@ 2017-08-28 21:34   ` Kees Cook
  -1 siblings, 0 replies; 172+ messages in thread
From: Kees Cook @ 2017-08-28 21:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, David Windsor, Alexander Viro, linux-fsdevel,
	linux-mm, kernel-hardening

From: David Windsor <dave@nullcore.net>

VFS pathnames are stored in the names_cache slab cache, either inline
or across an entire allocation entry (when approaching PATH_MAX). These
are copied to/from userspace, so they must be entirely whitelisted.

cache object allocation:
    include/linux/fs.h:
        #define __getname()    kmem_cache_alloc(names_cachep, GFP_KERNEL)

example usage trace:
    strncpy_from_user+0x4d/0x170
    getname_flags+0x6f/0x1f0
    user_path_at_empty+0x23/0x40
    do_mount+0x69/0xda0
    SyS_mount+0x83/0xd0

    fs/namei.c:
        getname_flags(...):
            ...
            result = __getname();
            ...
            kname = (char *)result->iname;
            result->name = kname;
            len = strncpy_from_user(kname, filename, EMBEDDED_NAME_MAX);
            ...
            if (unlikely(len == EMBEDDED_NAME_MAX)) {
                const size_t size = offsetof(struct filename, iname[1]);
                kname = (char *)result;

                result = kzalloc(size, GFP_KERNEL);
                ...
                result->name = kname;
                len = strncpy_from_user(kname, filename, PATH_MAX);

In support of usercopy hardening, this patch defines the entire cache
object in the names_cache slab cache as whitelisted, since it may entirely
hold name strings to be copied to/from userspace.
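
The reason the whole object has to be whitelisted can be seen in a
simplified userspace model of the two-stage copy in the trace above.
Everything here is an assumption made for the sketch: the TOY_* sizes,
toy_strncpy_from(), and toy_getname() are hypothetical, and the model
reuses one buffer for both stages, just as the trace reuses the
names_cache object for the long name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PATH_MAX	 64	/* hypothetical; stands in for PATH_MAX */
#define TOY_HDR		 16	/* hypothetical header size before the inline name */
#define TOY_EMBEDDED_MAX (TOY_PATH_MAX - TOY_HDR)

/* Bounded copy: returns the string length, or max if no NUL fit in max bytes. */
static size_t toy_strncpy_from(char *dst, const char *src, size_t max)
{
	size_t i;

	for (i = 0; i < max; i++) {
		dst[i] = src[i];
		if (!src[i])
			return i;
	}
	return max;
}

/*
 * Copy a name into a TOY_PATH_MAX-byte object. Returns the offset at which
 * the name starts inside the object, or -1 if the name is too long.
 */
static long toy_getname(char *obj, const char *src)
{
	size_t len;

	assert(obj && src);

	/* Stage 1: try the space after the header. */
	len = toy_strncpy_from(obj + TOY_HDR, src, TOY_EMBEDDED_MAX);
	if (len < TOY_EMBEDDED_MAX)
		return TOY_HDR;

	/* Stage 2: reuse the whole object, starting at offset 0. */
	len = toy_strncpy_from(obj, src, TOY_PATH_MAX);
	if (len == TOY_PATH_MAX)
		return -1;
	return 0;
}
```

Because the second stage writes from offset 0 for up to the full object
size, a region narrower than offset 0 with size PATH_MAX would reject
legitimate long-path copies.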

This patch is verbatim from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <dave@nullcore.net>
[kees: adjust commit log, add usage trace]
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 fs/dcache.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 5f5e7c1fcf4b..34ef9a9169be 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -3642,8 +3642,8 @@ void __init vfs_caches_init_early(void)
 
 void __init vfs_caches_init(void)
 {
-	names_cachep = kmem_cache_create("names_cache", PATH_MAX, 0,
-			SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL);
+	names_cachep = kmem_cache_create_usercopy("names_cache", PATH_MAX, 0,
+			SLAB_HWCACHE_ALIGN|SLAB_PANIC, 0, PATH_MAX, NULL);
 
 	dcache_init();
 	inode_init();
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 172+ messages in thread

* [PATCH v2 06/30] vfs: Copy struct mount.mnt_id to userspace using put_user()
  2017-08-28 21:34 ` Kees Cook
  (?)
@ 2017-08-28 21:34   ` Kees Cook
  -1 siblings, 0 replies; 172+ messages in thread
From: Kees Cook @ 2017-08-28 21:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, David Windsor, Alexander Viro, linux-fsdevel,
	linux-mm, kernel-hardening

From: David Windsor <dave@nullcore.net>

The mnt_id field can be copied with put_user(), so there is no need to
use copy_to_user(). In both cases, hardened usercopy is being bypassed
since the size is constant, and not open to runtime manipulation.
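
A small userspace sketch of the "constant size bypasses the runtime
check" idea follows. It assumes a GCC- or Clang-compatible compiler for
__builtin_constant_p(); the toy_* helpers and the counter are
hypothetical and only model the behaviour described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static int toy_runtime_checks;	/* counts how often the runtime check ran */

/* Stand-in for a dynamic bounds check on the kernel-side object. */
static void toy_check_object_size(const void *p, size_t n)
{
	assert(p != NULL);
	(void)n;
	toy_runtime_checks++;
}

/* Bulk copy: only sizes unknown at compile time take the runtime check. */
#define toy_copy_to_user(dst, src, n)				\
	do {							\
		if (!__builtin_constant_p(n))			\
			toy_check_object_size((src), (n));	\
		memcpy((dst), (src), (n));			\
	} while (0)

/* Single-value store: the size is sizeof(*ptr), fixed by the type. */
#define toy_put_user(val, ptr)	(*(ptr) = (val))
```

In this model a copy sized by sizeof() never reaches the runtime check,
and neither does the single-value store, so switching a fixed-size field
from the bulk helper to the single-value helper does not change what is
checked; it only states the intent more directly.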

This patch is verbatim from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <dave@nullcore.net>
[kees: adjust commit log]
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 fs/fhandle.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/fhandle.c b/fs/fhandle.c
index 58a61f55e0d0..46e00ccca8f0 100644
--- a/fs/fhandle.c
+++ b/fs/fhandle.c
@@ -68,8 +68,7 @@ static long do_sys_name_to_handle(struct path *path,
 	} else
 		retval = 0;
 	/* copy the mount id */
-	if (copy_to_user(mnt_id, &real_mount(path->mnt)->mnt_id,
-			 sizeof(*mnt_id)) ||
+	if (put_user(real_mount(path->mnt)->mnt_id, mnt_id) ||
 	    copy_to_user(ufh, handle,
 			 sizeof(struct file_handle) + handle_bytes))
 		retval = -EFAULT;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 172+ messages in thread

* [PATCH v2 07/30] ext4: Define usercopy region in ext4_inode_cache slab cache
  2017-08-28 21:34 ` Kees Cook
  (?)
@ 2017-08-28 21:34   ` Kees Cook
  -1 siblings, 0 replies; 172+ messages in thread
From: Kees Cook @ 2017-08-28 21:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, David Windsor, Theodore Ts'o, Andreas Dilger,
	linux-ext4, linux-mm, kernel-hardening

From: David Windsor <dave@nullcore.net>

The ext4 symlink pathnames, stored in struct ext4_inode_info.i_data
and therefore contained in the ext4_inode_cache slab cache, need
to be copied to/from userspace.

cache object allocation:
    fs/ext4/super.c:
        ext4_alloc_inode(...):
            struct ext4_inode_info *ei;
            ...
            ei = kmem_cache_alloc(ext4_inode_cachep, GFP_NOFS);
            ...
            return &ei->vfs_inode;

    include/trace/events/ext4.h:
            #define EXT4_I(inode) \
                (container_of(inode, struct ext4_inode_info, vfs_inode))

    fs/ext4/namei.c:
        ext4_symlink(...):
            ...
            inode->i_link = (char *)&EXT4_I(inode)->i_data;

example usage trace:
    readlink_copy+0x43/0x70
    vfs_readlink+0x62/0x110
    SyS_readlinkat+0x100/0x130

    fs/namei.c:
        readlink_copy(..., link):
            ...
            copy_to_user(..., link, len);

        (inlined into vfs_readlink)
        generic_readlink(dentry, ...):
            struct inode *inode = d_inode(dentry);
            const char *link = inode->i_link;
            ...
            readlink_copy(..., link);

In support of usercopy hardening, this patch defines a region in the
ext4_inode_cache slab cache in which userspace copy operations are
allowed.

This region is known as the slab cache's usercopy region. Slab caches can
now check that each copy operation involving cache-managed memory falls
entirely within the slab's usercopy region.
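
To make the containment rule concrete, here is a minimal userspace sketch
of the kind of check described above. It is an illustration under stated
assumptions, not the code from this series: struct usercopy_region and
usercopy_allowed() are hypothetical names, standing in for the offset and
size members that the cover letter says are added to struct kmem_cache.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the usercopy members of struct kmem_cache. */
struct usercopy_region {
	size_t useroffset;	/* start of the whitelisted span in an object */
	size_t usersize;	/* length of the span; 0 means nothing is whitelisted */
};

/*
 * Return true if a copy of n bytes, starting at byte `offset` of a cache
 * object, stays entirely inside the whitelisted span.
 */
static bool usercopy_allowed(const struct usercopy_region *r,
			     size_t offset, size_t n)
{
	if (offset < r->useroffset)
		return false;
	offset -= r->useroffset;	/* now relative to the region start */
	if (offset > r->usersize)
		return false;
	/* Written as a subtraction so that offset + n cannot wrap around. */
	return n <= r->usersize - offset;
}
```

With a region of { .useroffset = 40, .usersize = 60 }, a 60-byte copy
starting at offset 40 passes, while a copy that starts one byte earlier
or runs one byte longer is rejected.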

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <dave@nullcore.net>
[kees: adjust commit log, provide usage trace]
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: linux-ext4@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 fs/ext4/super.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 0886fe82e9c4..79c3b1b11364 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1038,11 +1038,13 @@ static void init_once(void *foo)
 
 static int __init init_inodecache(void)
 {
-	ext4_inode_cachep = kmem_cache_create("ext4_inode_cache",
-					     sizeof(struct ext4_inode_info),
-					     0, (SLAB_RECLAIM_ACCOUNT|
-						SLAB_MEM_SPREAD|SLAB_ACCOUNT),
-					     init_once);
+	ext4_inode_cachep = kmem_cache_create_usercopy("ext4_inode_cache",
+				sizeof(struct ext4_inode_info), 0,
+				(SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|
+					SLAB_ACCOUNT),
+				offsetof(struct ext4_inode_info, i_data),
+				sizeof_field(struct ext4_inode_info, i_data),
+				init_once);
 	if (ext4_inode_cachep == NULL)
 		return -ENOMEM;
 	return 0;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 172+ messages in thread

* [PATCH v2 08/30] ext2: Define usercopy region in ext2_inode_cache slab cache
  2017-08-28 21:34 ` Kees Cook
  (?)
@ 2017-08-28 21:34   ` Kees Cook
  -1 siblings, 0 replies; 172+ messages in thread
From: Kees Cook @ 2017-08-28 21:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, David Windsor, Jan Kara, linux-ext4, linux-mm,
	kernel-hardening

From: David Windsor <dave@nullcore.net>

The ext2 symlink pathnames, stored in struct ext2_inode_info.i_data and
therefore contained in the ext2_inode_cache slab cache, need to be copied
to/from userspace.

cache object allocation:
    fs/ext2/super.c:
        ext2_alloc_inode(...):
            struct ext2_inode_info *ei;
            ...
            ei = kmem_cache_alloc(ext2_inode_cachep, GFP_NOFS);
            ...
            return &ei->vfs_inode;

    fs/ext2/ext2.h:
        EXT2_I(struct inode *inode):
            return container_of(inode, struct ext2_inode_info, vfs_inode);

    fs/ext2/namei.c:
        ext2_symlink(...):
            ...
            inode->i_link = (char *)&EXT2_I(inode)->i_data;

example usage trace:
    readlink_copy+0x43/0x70
    vfs_readlink+0x62/0x110
    SyS_readlinkat+0x100/0x130

    fs/namei.c:
        readlink_copy(..., link):
            ...
            copy_to_user(..., link, len);

        (inlined into vfs_readlink)
        generic_readlink(dentry, ...):
            struct inode *inode = d_inode(dentry);
            const char *link = inode->i_link;
            ...
            readlink_copy(..., link);

In support of usercopy hardening, this patch defines a region in the
ext2_inode_cache slab cache in which userspace copy operations are
allowed.

This region is known as the slab cache's usercopy region. Slab caches can
now check that each copy operation involving cache-managed memory falls
entirely within the slab's usercopy region.
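
The region passed to kmem_cache_create_usercopy() in the diff below is
just the byte offset and byte length of one member. As a rough userspace
sketch of what those two expressions evaluate to: struct mock_inode_info
is a hypothetical, heavily trimmed layout (not the real struct
ext2_inode_info), and sizeof_field() is redefined locally on the
assumption that it yields the size of a single member.

```c
#include <stddef.h>
#include <stdint.h>

/* Local definition so the sketch builds outside the kernel (assumption). */
#define sizeof_field(TYPE, MEMBER) sizeof(((TYPE *)0)->MEMBER)

/* Hypothetical layout for illustration only. */
struct mock_inode_info {
	uint32_t i_flags;		/* private state, not whitelisted */
	uint32_t i_data[15];		/* inline symlink storage in this mock */
	long vfs_inode_placeholder;	/* stands in for the embedded VFS inode */
};

/* The two values a caller would hand over as the usercopy region. */
static const size_t region_offset = offsetof(struct mock_inode_info, i_data);
static const size_t region_size = sizeof_field(struct mock_inode_info, i_data);
```

For this mock layout the region starts at byte 4 and is 60 bytes long, so
i_flags and the placeholder fall outside the whitelist.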

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <dave@nullcore.net>
[kees: adjust commit log, provide usage trace]
Cc: Jan Kara <jack@suse.com>
Cc: linux-ext4@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 fs/ext2/super.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/fs/ext2/super.c b/fs/ext2/super.c
index 7b1bc9059863..670142cde59d 100644
--- a/fs/ext2/super.c
+++ b/fs/ext2/super.c
@@ -219,11 +219,13 @@ static void init_once(void *foo)
 
 static int __init init_inodecache(void)
 {
-	ext2_inode_cachep = kmem_cache_create("ext2_inode_cache",
-					     sizeof(struct ext2_inode_info),
-					     0, (SLAB_RECLAIM_ACCOUNT|
-						SLAB_MEM_SPREAD|SLAB_ACCOUNT),
-					     init_once);
+	ext2_inode_cachep = kmem_cache_create_usercopy("ext2_inode_cache",
+				sizeof(struct ext2_inode_info), 0,
+				(SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|
+					SLAB_ACCOUNT),
+				offsetof(struct ext2_inode_info, i_data),
+				sizeof_field(struct ext2_inode_info, i_data),
+				init_once);
 	if (ext2_inode_cachep == NULL)
 		return -ENOMEM;
 	return 0;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 172+ messages in thread

* [PATCH v2 09/30] jfs: Define usercopy region in jfs_ip slab cache
  2017-08-28 21:34 ` Kees Cook
  (?)
@ 2017-08-28 21:34   ` Kees Cook
  -1 siblings, 0 replies; 172+ messages in thread
From: Kees Cook @ 2017-08-28 21:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, David Windsor, Dave Kleikamp, jfs-discussion,
	linux-mm, kernel-hardening

From: David Windsor <dave@nullcore.net>

The jfs symlink pathnames, stored in struct jfs_inode_info.i_inline and
therefore contained in the jfs_ip slab cache, need to be copied to/from
userspace.

cache object allocation:
    fs/jfs/super.c:
        jfs_alloc_inode(...):
            ...
            jfs_inode = kmem_cache_alloc(jfs_inode_cachep, GFP_NOFS);
            ...
            return &jfs_inode->vfs_inode;

    fs/jfs/jfs_incore.h:
        JFS_IP(struct inode *inode):
            return container_of(inode, struct jfs_inode_info, vfs_inode);

    fs/jfs/inode.c:
        jfs_iget(...):
            ...
            inode->i_link = JFS_IP(inode)->i_inline;

example usage trace:
    readlink_copy+0x43/0x70
    vfs_readlink+0x62/0x110
    SyS_readlinkat+0x100/0x130

    fs/namei.c:
        readlink_copy(..., link):
            ...
            copy_to_user(..., link, len);

        (inlined in vfs_readlink)
        generic_readlink(dentry, ...):
            struct inode *inode = d_inode(dentry);
            const char *link = inode->i_link;
            ...
            readlink_copy(..., link);

In support of usercopy hardening, this patch defines a region in the
jfs_ip slab cache in which userspace copy operations are allowed.

This region is known as the slab cache's usercopy region. Slab caches can
now check that each copy operation involving cache-managed memory falls
entirely within the slab's usercopy region.
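
The allocation trace above hinges on container_of(): the VFS only sees
the embedded inode, and the filesystem recovers its own object from it.
The following userspace sketch shows why i_link then lands at a fixed
offset inside the slab object. All mock_* names and the field layout are
hypothetical, and container_of() is a simplified local version without
the kernel's type checking.

```c
#include <stddef.h>

/* Simplified local container_of(); the kernel's version adds type checks. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for the VFS inode. */
struct mock_inode {
	const char *i_link;
};

/* Hypothetical stand-in for a filesystem's per-inode slab object. */
struct mock_fs_inode_info {
	int private_state;
	char i_inline[128];		/* inline symlink target */
	struct mock_inode vfs_inode;	/* what the VFS is handed */
};

/* Recover the containing object from the embedded inode. */
static struct mock_fs_inode_info *MOCK_I(struct mock_inode *inode)
{
	return container_of(inode, struct mock_fs_inode_info, vfs_inode);
}

/* Point i_link at the inline buffer, mirroring the iget excerpt above. */
static void mock_set_link(struct mock_inode *inode)
{
	inode->i_link = MOCK_I(inode)->i_inline;
}

/* Byte offset of the link target within the containing object. */
static size_t mock_link_offset(struct mock_inode *inode)
{
	return (size_t)(inode->i_link - (const char *)MOCK_I(inode));
}
```

Because the link target always sits at offsetof(..., i_inline) within the
object, a single offset/size pair per cache is enough to describe it.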

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <dave@nullcore.net>
[kees: adjust commit log, provide usage trace]
Cc: Dave Kleikamp <shaggy@kernel.org>
Cc: jfs-discussion@lists.sourceforge.net
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 fs/jfs/super.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/jfs/super.c b/fs/jfs/super.c
index e8aad7d87b8c..10b958f49f57 100644
--- a/fs/jfs/super.c
+++ b/fs/jfs/super.c
@@ -972,9 +972,11 @@ static int __init init_jfs_fs(void)
 	int rc;
 
 	jfs_inode_cachep =
-	    kmem_cache_create("jfs_ip", sizeof(struct jfs_inode_info), 0,
-			    SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|SLAB_ACCOUNT,
-			    init_once);
+	    kmem_cache_create_usercopy("jfs_ip", sizeof(struct jfs_inode_info),
+			0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|SLAB_ACCOUNT,
+			offsetof(struct jfs_inode_info, i_inline),
+			sizeof_field(struct jfs_inode_info, i_inline),
+			init_once);
 	if (jfs_inode_cachep == NULL)
 		return -ENOMEM;
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 172+ messages in thread

* [PATCH v2 10/30] befs: Define usercopy region in befs_inode_cache slab cache
  2017-08-28 21:34 ` Kees Cook
  (?)
@ 2017-08-28 21:34   ` Kees Cook
  -1 siblings, 0 replies; 172+ messages in thread
From: Kees Cook @ 2017-08-28 21:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, David Windsor, Luis de Bethencourt, Salah Triki,
	linux-mm, kernel-hardening

From: David Windsor <dave@nullcore.net>

befs symlink pathnames, stored in struct befs_inode_info.i_data.symlink
and therefore contained in the befs_inode_cache slab cache, need to be
copied to/from userspace.

cache object allocation:
    fs/befs/linuxvfs.c:
        befs_alloc_inode(...):
            ...
            bi = kmem_cache_alloc(befs_inode_cachep, GFP_KERNEL);
            ...
            return &bi->vfs_inode;

        befs_iget(...):
            ...
            strlcpy(befs_ino->i_data.symlink, raw_inode->data.symlink,
                    BEFS_SYMLINK_LEN);
            ...
            inode->i_link = befs_ino->i_data.symlink;

example usage trace:
    readlink_copy+0x43/0x70
    vfs_readlink+0x62/0x110
    SyS_readlinkat+0x100/0x130

    fs/namei.c:
        readlink_copy(..., link):
            ...
            copy_to_user(..., link, len);

        (inlined in vfs_readlink)
        generic_readlink(dentry, ...):
            struct inode *inode = d_inode(dentry);
            const char *link = inode->i_link;
            ...
            readlink_copy(..., link);

In support of usercopy hardening, this patch defines a region in the
befs_inode_cache slab cache in which userspace copy operations are
allowed.

This region is known as the slab cache's usercopy region. Slab caches can
now check that each copy operation involving cache-managed memory falls
entirely within the slab's usercopy region.
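
Unlike the other filesystems in this series, the befs region names a
nested member (i_data.symlink). The sketch below, built on a hypothetical
mock layout rather than the real struct befs_inode_info, illustrates that
offsetof() and a locally defined sizeof_field() both accept such a nested
designator, and that sizing the region from the symlink member can give a
smaller span than sizing it from the enclosing union. MOCK_SYMLINK_LEN
and the datastream member are assumptions made for the example.

```c
#include <stddef.h>

/* Local definition so the sketch builds outside the kernel (assumption). */
#define sizeof_field(TYPE, MEMBER) sizeof(((TYPE *)0)->MEMBER)

#define MOCK_SYMLINK_LEN 144	/* hypothetical length for this mock */

/* Hypothetical layout for illustration only. */
struct mock_befs_inode_info {
	unsigned int i_flags;
	union {
		unsigned long long datastream[32];	/* larger union member */
		char symlink[MOCK_SYMLINK_LEN];		/* inline symlink target */
	} i_data;
	long vfs_inode_placeholder;
};

/* A union member starts where the union starts. */
_Static_assert(offsetof(struct mock_befs_inode_info, i_data.symlink) ==
	       offsetof(struct mock_befs_inode_info, i_data),
	       "symlink shares the union's offset");

static size_t mock_region_offset(void)
{
	return offsetof(struct mock_befs_inode_info, i_data.symlink);
}

static size_t mock_region_size(void)
{
	/* Only the symlink bytes, not the whole union. */
	return sizeof_field(struct mock_befs_inode_info, i_data.symlink);
}
```

In this mock the union is 256 bytes but the whitelisted span is only the
144 bytes of symlink, so the tail of the larger member stays outside the
region.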

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <dave@nullcore.net>
[kees: adjust commit log, provide usage trace]
Cc: Luis de Bethencourt <luisbg@kernel.org>
Cc: Salah Triki <salah.triki@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 fs/befs/linuxvfs.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/fs/befs/linuxvfs.c b/fs/befs/linuxvfs.c
index 4a4a5a366158..1c2dcbee79dd 100644
--- a/fs/befs/linuxvfs.c
+++ b/fs/befs/linuxvfs.c
@@ -444,11 +444,15 @@ static struct inode *befs_iget(struct super_block *sb, unsigned long ino)
 static int __init
 befs_init_inodecache(void)
 {
-	befs_inode_cachep = kmem_cache_create("befs_inode_cache",
-					      sizeof (struct befs_inode_info),
-					      0, (SLAB_RECLAIM_ACCOUNT|
-						SLAB_MEM_SPREAD|SLAB_ACCOUNT),
-					      init_once);
+	befs_inode_cachep = kmem_cache_create_usercopy("befs_inode_cache",
+				sizeof(struct befs_inode_info), 0,
+				(SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|
+					SLAB_ACCOUNT),
+				offsetof(struct befs_inode_info,
+					i_data.symlink),
+				sizeof_field(struct befs_inode_info,
+					i_data.symlink),
+				init_once);
 	if (befs_inode_cachep == NULL)
 		return -ENOMEM;
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 172+ messages in thread

* [PATCH v2 11/30] exofs: Define usercopy region in exofs_inode_cache slab cache
  2017-08-28 21:34 ` Kees Cook
  (?)
@ 2017-08-28 21:34   ` Kees Cook
  -1 siblings, 0 replies; 172+ messages in thread
From: Kees Cook @ 2017-08-28 21:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, David Windsor, Boaz Harrosh, linux-mm, kernel-hardening

From: David Windsor <dave@nullcore.net>

The exofs short symlink names, stored in struct exofs_i_info.i_data and
therefore contained in the exofs_inode_cache slab cache, need to be copied
to/from userspace.

cache object allocation:
    fs/exofs/super.c:
        exofs_alloc_inode(...):
            ...
            oi = kmem_cache_alloc(exofs_inode_cachep, GFP_KERNEL);
            ...
            return &oi->vfs_inode;

    fs/exofs/namei.c:
        exofs_symlink(...):
            ...
            inode->i_link = (char *)oi->i_data;

example usage trace:
    readlink_copy+0x43/0x70
    vfs_readlink+0x62/0x110
    SyS_readlinkat+0x100/0x130

    fs/namei.c:
        readlink_copy(..., link):
            ...
            copy_to_user(..., link, len);

        (inlined in vfs_readlink)
        generic_readlink(dentry, ...):
            struct inode *inode = d_inode(dentry);
            const char *link = inode->i_link;
            ...
            readlink_copy(..., link);
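
The copy length in this path is computed at runtime from the link string,
which is why it is subject to the hardened usercopy check. A minimal
userspace model of that length logic is sketched below, assuming memcpy()
as a stand-in for copy_to_user() and leaving out error-pointer handling;
mock_readlink_copy is a hypothetical name, not the kernel function.

```c
#include <string.h>

/*
 * Userspace model of the length handling in readlink_copy().
 * memcpy() stands in for copy_to_user().
 */
static int mock_readlink_copy(char *buffer, int buflen, const char *link)
{
	int len;

	if (buflen <= 0)
		return -1;	/* the syscall entry rejects this case earlier */

	len = (int)strlen(link);
	if (len > buflen)
		len = buflen;	/* truncate to the caller's buffer */

	memcpy(buffer, link, (size_t)len);
	return len;		/* bytes copied; no NUL terminator is added */
}
```

Because len depends on the contents of the link buffer rather than on a
compile-time constant, the source range has to be validated against the
cache's usercopy region at the time of the copy.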

In support of usercopy hardening, this patch defines a region in the
exofs_inode_cache slab cache in which userspace copy operations are
allowed.

This region is known as the slab cache's usercopy region. Slab caches can
now check that each copy operation involving cache-managed memory falls
entirely within the slab's usercopy region.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <dave@nullcore.net>
[kees: adjust commit log, provide usage trace]
Cc: Boaz Harrosh <ooo@electrozaur.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 fs/exofs/super.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/exofs/super.c b/fs/exofs/super.c
index 819624cfc8da..e5c532875bb7 100644
--- a/fs/exofs/super.c
+++ b/fs/exofs/super.c
@@ -192,10 +192,13 @@ static void exofs_init_once(void *foo)
  */
 static int init_inodecache(void)
 {
-	exofs_inode_cachep = kmem_cache_create("exofs_inode_cache",
+	exofs_inode_cachep = kmem_cache_create_usercopy("exofs_inode_cache",
 				sizeof(struct exofs_i_info), 0,
 				SLAB_RECLAIM_ACCOUNT | SLAB_MEM_SPREAD |
-				SLAB_ACCOUNT, exofs_init_once);
+				SLAB_ACCOUNT,
+				offsetof(struct exofs_i_info, i_data),
+				sizeof_field(struct exofs_i_info, i_data),
+				exofs_init_once);
 	if (exofs_inode_cachep == NULL)
 		return -ENOMEM;
 	return 0;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 172+ messages in thread

* [PATCH v2 12/30] orangefs: Define usercopy region in orangefs_inode_cache slab cache
  2017-08-28 21:34 ` Kees Cook
  (?)
@ 2017-08-28 21:34   ` Kees Cook
  -1 siblings, 0 replies; 172+ messages in thread
From: Kees Cook @ 2017-08-28 21:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, David Windsor, Mike Marshall, linux-mm, kernel-hardening

From: David Windsor <dave@nullcore.net>

orangefs symlink pathnames, stored in struct orangefs_inode_s.link_target
and therefore contained in the orangefs_inode_cache, need to be copied
to/from userspace.

cache object allocation:
    fs/orangefs/super.c:
        orangefs_alloc_inode(...):
            ...
            orangefs_inode = kmem_cache_alloc(orangefs_inode_cache, ...);
            ...
            return &orangefs_inode->vfs_inode;

    fs/orangefs/orangefs-utils.c:
        orangefs_inode_getattr(...):
            ...
            inode->i_link = orangefs_inode->link_target;

example usage trace:
    readlink_copy+0x43/0x70
    vfs_readlink+0x62/0x110
    SyS_readlinkat+0x100/0x130

    fs/namei.c:
        readlink_copy(..., link):
            ...
            copy_to_user(..., link, len);

        (inlined in vfs_readlink)
        generic_readlink(dentry, ...):
            struct inode *inode = d_inode(dentry);
            const char *link = inode->i_link;
            ...
            readlink_copy(..., link);

In support of usercopy hardening, this patch defines a region in the
orangefs_inode_cache slab cache in which userspace copy operations are
allowed.
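
The region is expressed as an offset and a size taken from a single struct
member. The sketch below shows how those two values fall out of offsetof()
and sizeof_field() for a made-up layout. struct mock_orangefs_inode, its
member sizes, and mock_link_region are hypothetical; the real
struct orangefs_inode_s has different members.

```c
#include <stddef.h>

/* Userspace equivalent of the kernel's sizeof_field() helper. */
#define sizeof_field(TYPE, MEMBER) sizeof(((TYPE *)0)->MEMBER)

/* Hypothetical layout, for illustration only. */
struct mock_orangefs_inode {
	unsigned long flags;
	char link_target[256];	/* length chosen for illustration */
	long vfs_inode;		/* placeholder for the embedded struct inode */
};

struct usercopy_region {
	size_t offset;
	size_t size;
};

/* Compute the region that would be passed when creating the cache. */
static struct usercopy_region mock_link_region(void)
{
	struct usercopy_region r = {
		.offset = offsetof(struct mock_orangefs_inode, link_target),
		.size = sizeof_field(struct mock_orangefs_inode, link_target),
	};

	return r;
}
```

Only the bytes of link_target are whitelisted in this model; flags and the
placeholder inode stay outside the region.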

This region is known as the slab cache's usercopy region. Slab caches can
now check that each copy operation involving cache-managed memory falls
entirely within the slab's usercopy region.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <dave@nullcore.net>
[kees: adjust commit log, provide usage trace]
Cc: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 fs/orangefs/super.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/fs/orangefs/super.c b/fs/orangefs/super.c
index 5a1bed6c8c6a..c67b91239730 100644
--- a/fs/orangefs/super.c
+++ b/fs/orangefs/super.c
@@ -626,11 +626,16 @@ void orangefs_kill_sb(struct super_block *sb)
 
 int orangefs_inode_cache_initialize(void)
 {
-	orangefs_inode_cache = kmem_cache_create("orangefs_inode_cache",
-					      sizeof(struct orangefs_inode_s),
-					      0,
-					      ORANGEFS_CACHE_CREATE_FLAGS,
-					      orangefs_inode_cache_ctor);
+	orangefs_inode_cache = kmem_cache_create_usercopy(
+					"orangefs_inode_cache",
+					sizeof(struct orangefs_inode_s),
+					0,
+					ORANGEFS_CACHE_CREATE_FLAGS,
+					offsetof(struct orangefs_inode_s,
+						link_target),
+					sizeof_field(struct orangefs_inode_s,
+						link_target),
+					orangefs_inode_cache_ctor);
 
 	if (!orangefs_inode_cache) {
 		gossip_err("Cannot create orangefs_inode_cache\n");
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 172+ messages in thread

* [PATCH v2 13/30] ufs: Define usercopy region in ufs_inode_cache slab cache
  2017-08-28 21:34 ` Kees Cook
  (?)
@ 2017-08-28 21:34   ` Kees Cook
  -1 siblings, 0 replies; 172+ messages in thread
From: Kees Cook @ 2017-08-28 21:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, David Windsor, Evgeniy Dushistov, linux-mm, kernel-hardening

From: David Windsor <dave@nullcore.net>

The ufs symlink pathnames, stored in struct ufs_inode_info.i_u1.i_symlink
and therefore contained in the ufs_inode_cache slab cache, need to be
copied to/from userspace.

cache object allocation:
    fs/ufs/super.c:
        ufs_alloc_inode(...):
            ...
            ei = kmem_cache_alloc(ufs_inode_cachep, GFP_NOFS);
            ...
            return &ei->vfs_inode;

    fs/ufs/ufs.h:
        UFS_I(struct inode *inode):
            return container_of(inode, struct ufs_inode_info, vfs_inode);

    fs/ufs/namei.c:
        ufs_symlink(...):
            ...
            inode->i_link = (char *)UFS_I(inode)->i_u1.i_symlink;
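
The relationship traced above can be reproduced in userspace: the VFS inode
is embedded in the filesystem's per-inode object, container_of() recovers
the enclosing object, and i_link ends up pointing into that same
allocation. The mock_ types, the array lengths, and the helper names below
are hypothetical stand-ins, not the kernel definitions.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct mock_inode {
	const char *i_link;
};

/* Hypothetical layout loosely shaped like struct ufs_inode_info. */
struct mock_ufs_inode_info {
	union {
		unsigned char i_symlink[60];	/* length for illustration */
		unsigned int i_data[15];
	} i_u1;
	struct mock_inode vfs_inode;
};

/* Recover the enclosing object from the embedded inode. */
static struct mock_ufs_inode_info *MOCK_UFS_I(struct mock_inode *inode)
{
	return container_of(inode, struct mock_ufs_inode_info, vfs_inode);
}

/* Point i_link at the symlink buffer inside the same object. */
static void mock_set_link(struct mock_inode *inode)
{
	inode->i_link = (const char *)MOCK_UFS_I(inode)->i_u1.i_symlink;
}
```

Since i_link points inside the slab object, a later copy_to_user() from
i_link reads cache-managed memory, which is what makes the region
declaration necessary.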

example usage trace:
    readlink_copy+0x43/0x70
    vfs_readlink+0x62/0x110
    SyS_readlinkat+0x100/0x130

    fs/namei.c:
        readlink_copy(..., link):
            ...
            copy_to_user(..., link, len);

        (inlined in vfs_readlink)
        generic_readlink(dentry, ...):
            struct inode *inode = d_inode(dentry);
            const char *link = inode->i_link;
            ...
            readlink_copy(..., link);

In support of usercopy hardening, this patch defines a region in the
ufs_inode_cache slab cache in which userspace copy operations are allowed.

This region is known as the slab cache's usercopy region. Slab caches can
now check that each copy operation involving cache-managed memory falls
entirely within the slab's usercopy region.
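
A declared region only makes sense if it fits inside the object. The sketch
below shows one way such a sanity check could look when a cache is created.
It is not taken from this series: sanitize_region is a hypothetical helper,
and failing closed (whitelisting nothing) on a bad region is an assumption
made for illustration.

```c
#include <stddef.h>

struct region {
	size_t offset;
	size_t size;
};

/*
 * Accept the requested region only if it lies inside an object of
 * object_size bytes; otherwise return an empty region.
 */
static struct region sanitize_region(size_t object_size, size_t useroffset,
				     size_t usersize)
{
	struct region r = { useroffset, usersize };

	if ((!usersize && useroffset) ||	/* offset without a size */
	    object_size < usersize ||		/* region larger than object */
	    object_size - usersize < useroffset)	/* region runs off the end */
		r.offset = r.size = 0;

	return r;
}
```

For a 100-byte object, a 20-byte region at offset 10 is kept as given,
while a 20-byte region at offset 90 collapses to an empty region.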

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <dave@nullcore.net>
[kees: adjust commit log, provide usage trace]
Cc: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 fs/ufs/super.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/fs/ufs/super.c b/fs/ufs/super.c
index 0a4f58a5073c..646f971067bc 100644
--- a/fs/ufs/super.c
+++ b/fs/ufs/super.c
@@ -1466,11 +1466,14 @@ static void init_once(void *foo)
 
 static int __init init_inodecache(void)
 {
-	ufs_inode_cachep = kmem_cache_create("ufs_inode_cache",
-					     sizeof(struct ufs_inode_info),
-					     0, (SLAB_RECLAIM_ACCOUNT|
-						SLAB_MEM_SPREAD|SLAB_ACCOUNT),
-					     init_once);
+	ufs_inode_cachep = kmem_cache_create_usercopy("ufs_inode_cache",
+				sizeof(struct ufs_inode_info), 0,
+				(SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|
+					SLAB_ACCOUNT),
+				offsetof(struct ufs_inode_info, i_u1.i_symlink),
+				sizeof_field(struct ufs_inode_info,
+					i_u1.i_symlink),
+				init_once);
 	if (ufs_inode_cachep == NULL)
 		return -ENOMEM;
 	return 0;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 172+ messages in thread

* [PATCH v2 13/30] ufs: Define usercopy region in ufs_inode_cache slab cache
@ 2017-08-28 21:34   ` Kees Cook
  0 siblings, 0 replies; 172+ messages in thread
From: Kees Cook @ 2017-08-28 21:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, David Windsor, Evgeniy Dushistov, linux-mm, kernel-hardening

From: David Windsor <dave@nullcore.net>

The ufs symlink pathnames, stored in struct ufs_inode_info.i_u1.i_symlink
and therefore contained in the ufs_inode_cache slab cache, need to be
copied to/from userspace.

cache object allocation:
    fs/ufs/super.c:
        ufs_alloc_inode(...):
            ...
            ei = kmem_cache_alloc(ufs_inode_cachep, GFP_NOFS);
            ...
            return &ei->vfs_inode;

    fs/ufs/ufs.h:
        UFS_I(struct inode *inode):
            return container_of(inode, struct ufs_inode_info, vfs_inode);

    fs/ufs/namei.c:
        ufs_symlink(...):
            ...
            inode->i_link = (char *)UFS_I(inode)->i_u1.i_symlink;

example usage trace:
    readlink_copy+0x43/0x70
    vfs_readlink+0x62/0x110
    SyS_readlinkat+0x100/0x130

    fs/namei.c:
        readlink_copy(..., link):
            ...
            copy_to_user(..., link, len);

        (inlined in vfs_readlink)
        generic_readlink(dentry, ...):
            struct inode *inode = d_inode(dentry);
            const char *link = inode->i_link;
            ...
            readlink_copy(..., link);

In support of usercopy hardening, this patch defines a region in the
ufs_inode_cache slab cache in which userspace copy operations are allowed.

This region is known as the slab cache's usercopy region. Slab caches can
now check that each copy operation involving cache-managed memory falls
entirely within the slab's usercopy region.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <dave@nullcore.net>
[kees: adjust commit log, provide usage trace]
Cc: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 fs/ufs/super.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/fs/ufs/super.c b/fs/ufs/super.c
index 0a4f58a5073c..646f971067bc 100644
--- a/fs/ufs/super.c
+++ b/fs/ufs/super.c
@@ -1466,11 +1466,14 @@ static void init_once(void *foo)
 
 static int __init init_inodecache(void)
 {
-	ufs_inode_cachep = kmem_cache_create("ufs_inode_cache",
-					     sizeof(struct ufs_inode_info),
-					     0, (SLAB_RECLAIM_ACCOUNT|
-						SLAB_MEM_SPREAD|SLAB_ACCOUNT),
-					     init_once);
+	ufs_inode_cachep = kmem_cache_create_usercopy("ufs_inode_cache",
+				sizeof(struct ufs_inode_info), 0,
+				(SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|
+					SLAB_ACCOUNT),
+				offsetof(struct ufs_inode_info, i_u1.i_symlink),
+				sizeof_field(struct ufs_inode_info,
+					i_u1.i_symlink),
+				init_once);
 	if (ufs_inode_cachep == NULL)
 		return -ENOMEM;
 	return 0;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 172+ messages in thread

* [PATCH v2 14/30] vxfs: Define usercopy region in vxfs_inode slab cache
  2017-08-28 21:34 ` Kees Cook
  (?)
@ 2017-08-28 21:34   ` Kees Cook
  -1 siblings, 0 replies; 172+ messages in thread
From: Kees Cook @ 2017-08-28 21:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, David Windsor, Christoph Hellwig, linux-mm, kernel-hardening

From: David Windsor <dave@nullcore.net>

vxfs symlink pathnames, stored in struct vxfs_inode_info field
vii_immed.vi_immed and therefore contained in the vxfs_inode slab cache,
need to be copied to/from userspace.

cache object allocation:
    fs/freevxfs/vxfs_super.c:
        vxfs_alloc_inode(...):
            ...
            vi = kmem_cache_alloc(vxfs_inode_cachep, GFP_KERNEL);
            ...
            return &vi->vfs_inode;

    fs/freevxfs/vxfs_inode.c:
        vxfs_iget(...):
            ...
            inode->i_link = vip->vii_immed.vi_immed;

example usage trace:
    readlink_copy+0x43/0x70
    vfs_readlink+0x62/0x110
    SyS_readlinkat+0x100/0x130

    fs/namei.c:
        readlink_copy(..., link):
            ...
            copy_to_user(..., link, len);

        (inlined in vfs_readlink)
        generic_readlink(dentry, ...):
            struct inode *inode = d_inode(dentry);
            const char *link = inode->i_link;
            ...
            readlink_copy(..., link);

In support of usercopy hardening, this patch defines a region in the
vxfs_inode slab cache in which userspace copy operations are allowed.

This region is known as the slab cache's usercopy region. Slab caches can
now check that each copy operation involving cache-managed memory falls
entirely within the slab's usercopy region.
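
The containment test described above can be sketched in userspace C as
follows. This is a minimal sketch under stated assumptions, not the
kernel's implementation: struct demo_cache and demo_usercopy_allowed()
are hypothetical names, and the useroffset/usersize members only stand
in for the offset and size members this series adds to struct
kmem_cache.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-cache usercopy bookkeeping. */
struct demo_cache {
	size_t object_size;	/* size of each object in the cache */
	size_t useroffset;	/* start of the whitelisted span */
	size_t usersize;	/* length of the whitelisted span */
};

/*
 * Return true only when the copy [offset, offset + n) lies entirely
 * inside [useroffset, useroffset + usersize). "offset" is relative to
 * the start of the object.
 */
static bool demo_usercopy_allowed(const struct demo_cache *c,
				  size_t offset, size_t n)
{
	/* The copy must not start before the region. */
	if (offset < c->useroffset)
		return false;
	/* The copy must not start past the end of the region. */
	if (offset - c->useroffset > c->usersize)
		return false;
	/* Compare by subtraction so a huge n cannot wrap around. */
	return n <= c->usersize - (offset - c->useroffset);
}
```

With a 32-byte region at offset 64, a 32-byte copy starting at offset 64
passes, while a 33-byte copy from the same offset, or any copy starting
at offset 63, is rejected. A cache whose usersize is 0 rejects every
non-empty copy.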

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <dave@nullcore.net>
[kees: adjust commit log, provide usage trace]
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 fs/freevxfs/vxfs_super.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/freevxfs/vxfs_super.c b/fs/freevxfs/vxfs_super.c
index 455ce5b77e9b..c143e18d5a65 100644
--- a/fs/freevxfs/vxfs_super.c
+++ b/fs/freevxfs/vxfs_super.c
@@ -332,9 +332,13 @@ vxfs_init(void)
 {
 	int rv;
 
-	vxfs_inode_cachep = kmem_cache_create("vxfs_inode",
+	vxfs_inode_cachep = kmem_cache_create_usercopy("vxfs_inode",
 			sizeof(struct vxfs_inode_info), 0,
-			SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD, NULL);
+			SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD,
+			offsetof(struct vxfs_inode_info, vii_immed.vi_immed),
+			sizeof_field(struct vxfs_inode_info,
+				vii_immed.vi_immed),
+			NULL);
 	if (!vxfs_inode_cachep)
 		return -ENOMEM;
 	rv = register_filesystem(&vxfs_fs_type);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 172+ messages in thread

* [PATCH v2 15/30] xfs: Define usercopy region in xfs_inode slab cache
  2017-08-28 21:34 ` Kees Cook
  (?)
@ 2017-08-28 21:34   ` Kees Cook
  -1 siblings, 0 replies; 172+ messages in thread
From: Kees Cook @ 2017-08-28 21:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, David Windsor, Darrick J. Wong, linux-xfs, linux-mm,
	kernel-hardening

From: David Windsor <dave@nullcore.net>

The XFS inline inode data, stored in the xfs_inode_t field
i_df.if_u2.if_inline_data and therefore contained in the xfs_inode slab
cache, needs to be copied to/from userspace.

cache object allocation:
    fs/xfs/xfs_icache.c:
        xfs_inode_alloc(...):
            ...
            ip = kmem_zone_alloc(xfs_inode_zone, KM_SLEEP);

    fs/xfs/libxfs/xfs_inode_fork.c:
        xfs_init_local_fork(...):
            ...
            if (mem_size <= sizeof(ifp->if_u2.if_inline_data))
                    ifp->if_u1.if_data = ifp->if_u2.if_inline_data;
            ...

    fs/xfs/xfs_symlink.c:
        xfs_symlink(...):
            ...
            xfs_init_local_fork(ip, XFS_DATA_FORK, target_path, pathlen);

example usage trace:
    readlink_copy+0x43/0x70
    vfs_readlink+0x62/0x110
    SyS_readlinkat+0x100/0x130

    fs/xfs/xfs_iops.c:
        (via inode->i_op->get_link)
        xfs_vn_get_link_inline(...):
            ...
            return XFS_I(inode)->i_df.if_u1.if_data;

    fs/namei.c:
        readlink_copy(..., link):
            ...
            copy_to_user(..., link, len);

        generic_readlink(dentry, ...):
            struct inode *inode = d_inode(dentry);
            const char *link = inode->i_link;
            ...
            if (!link) {
                    link = inode->i_op->get_link(dentry, inode, &done);
            ...
            readlink_copy(..., link);

In support of usercopy hardening, this patch defines a region in the
xfs_inode slab cache in which userspace copy operations are allowed.

This region is known as the slab cache's usercopy region. Slab caches can
now check that each copy operation involving cache-managed memory falls
entirely within the slab's usercopy region.
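
The reason the region covers if_inline_data can be seen from the
xfs_init_local_fork() excerpt above: for small data, if_data points back
into the inode object itself. The userspace sketch below mimics that
choice. It is an illustration only; struct demo_fork, the demo_fork_*()
helpers and the 32-byte inline size are hypothetical, and the malloc()
fallback for larger data is an assumption, since that branch is elided
in the excerpt.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_INLINE_BYTES 32	/* hypothetical inline buffer size */

/* Hypothetical stand-in for the fork embedded in the inode object. */
struct demo_fork {
	char *data;				/* plays the role of if_u1.if_data */
	char inline_data[DEMO_INLINE_BYTES];	/* plays if_u2.if_inline_data */
};

/* Store len bytes of src, inline when they fit, else in a separate buffer. */
static bool demo_fork_init(struct demo_fork *f, const char *src, size_t len)
{
	size_t mem_size = len + 1;	/* room for a terminating NUL */

	if (mem_size <= sizeof(f->inline_data)) {
		f->data = f->inline_data;	/* points into the object */
	} else {
		f->data = malloc(mem_size);	/* assumed fallback path */
		if (!f->data)
			return false;
	}
	memcpy(f->data, src, len);
	f->data[len] = '\0';
	return true;
}

/* True when the data lives inside the object, i.e. inside the region. */
static bool demo_fork_is_inline(const struct demo_fork *f)
{
	return f->data == f->inline_data;
}

/* Release the separate buffer, if one was allocated. */
static void demo_fork_destroy(struct demo_fork *f)
{
	if (!demo_fork_is_inline(f))
		free(f->data);
	f->data = NULL;
}
```

Only in the inline case does a later copy to userspace read from memory
that belongs to the inode object, which is the case the usercopy region
has to permit.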

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <dave@nullcore.net>
[kees: adjust commit log, provide usage trace]
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: linux-xfs@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 fs/xfs/kmem.h      | 10 ++++++++++
 fs/xfs/xfs_super.c |  7 +++++--
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/kmem.h b/fs/xfs/kmem.h
index 4d85992d75b2..08358f38dee6 100644
--- a/fs/xfs/kmem.h
+++ b/fs/xfs/kmem.h
@@ -110,6 +110,16 @@ kmem_zone_init_flags(int size, char *zone_name, unsigned long flags,
 	return kmem_cache_create(zone_name, size, 0, flags, construct);
 }
 
+static inline kmem_zone_t *
+kmem_zone_init_flags_usercopy(int size, char *zone_name, unsigned long flags,
+				size_t useroffset, size_t usersize,
+				void (*construct)(void *))
+{
+	return kmem_cache_create_usercopy(zone_name, size, 0, flags,
+				useroffset, usersize, construct);
+}
+
+
 static inline void
 kmem_zone_free(kmem_zone_t *zone, void *ptr)
 {
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 38aaacdbb8b3..6ca428c6f943 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1829,9 +1829,12 @@ xfs_init_zones(void)
 		goto out_destroy_efd_zone;
 
 	xfs_inode_zone =
-		kmem_zone_init_flags(sizeof(xfs_inode_t), "xfs_inode",
+		kmem_zone_init_flags_usercopy(sizeof(xfs_inode_t), "xfs_inode",
 			KM_ZONE_HWALIGN | KM_ZONE_RECLAIM | KM_ZONE_SPREAD |
-			KM_ZONE_ACCOUNT, xfs_fs_inode_init_once);
+				KM_ZONE_ACCOUNT,
+			offsetof(xfs_inode_t, i_df.if_u2.if_inline_data),
+			sizeof_field(xfs_inode_t, i_df.if_u2.if_inline_data),
+			xfs_fs_inode_init_once);
 	if (!xfs_inode_zone)
 		goto out_destroy_efi_zone;
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 172+ messages in thread

* [PATCH v2 16/30] cifs: Define usercopy region in cifs_request slab cache
  2017-08-28 21:34 ` Kees Cook
  (?)
@ 2017-08-28 21:34   ` Kees Cook
  -1 siblings, 0 replies; 172+ messages in thread
From: Kees Cook @ 2017-08-28 21:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, David Windsor, Steve French, linux-cifs,
	samba-technical, linux-mm, kernel-hardening

From: David Windsor <dave@nullcore.net>

CIFS request buffers, stored in the cifs_request slab cache, need to be
copied to/from userspace.

cache object allocation:
    fs/cifs/cifsfs.c:
        cifs_init_request_bufs():
            ...
            cifs_req_poolp = mempool_create_slab_pool(cifs_min_rcv,
                                                      cifs_req_cachep);

    fs/cifs/misc.c:
        cifs_buf_get():
            ...
            ret_buf = mempool_alloc(cifs_req_poolp, GFP_NOFS);
            ...
            return ret_buf;

In support of usercopy hardening, this patch defines a region in the
cifs_request slab cache in which userspace copy operations are allowed.

This region is known as the slab cache's usercopy region.  Slab
caches can now check that each copy operation involving cache-managed
memory falls entirely within the slab's usercopy region.

This patch is verbatim from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <dave@nullcore.net>
[kees: adjust commit log, provide usage trace]
Cc: Steve French <sfrench@samba.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 fs/cifs/cifsfs.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
index 180b3356ff86..09dfdf76c738 100644
--- a/fs/cifs/cifsfs.c
+++ b/fs/cifs/cifsfs.c
@@ -1229,9 +1229,11 @@ cifs_init_request_bufs(void)
 	cifs_dbg(VFS, "CIFSMaxBufSize %d 0x%x\n",
 		 CIFSMaxBufSize, CIFSMaxBufSize);
 */
-	cifs_req_cachep = kmem_cache_create("cifs_request",
+	cifs_req_cachep = kmem_cache_create_usercopy("cifs_request",
 					    CIFSMaxBufSize + max_hdr_size, 0,
-					    SLAB_HWCACHE_ALIGN, NULL);
+					    SLAB_HWCACHE_ALIGN, 0,
+					    CIFSMaxBufSize + max_hdr_size,
+					    NULL);
 	if (cifs_req_cachep == NULL)
 		return -ENOMEM;
 
@@ -1257,9 +1259,9 @@ cifs_init_request_bufs(void)
 	more SMBs to use small buffer alloc and is still much more
 	efficient to alloc 1 per page off the slab compared to 17K (5page)
 	alloc of large cifs buffers even when page debugging is on */
-	cifs_sm_req_cachep = kmem_cache_create("cifs_small_rq",
+	cifs_sm_req_cachep = kmem_cache_create_usercopy("cifs_small_rq",
 			MAX_CIFS_SMALL_BUFFER_SIZE, 0, SLAB_HWCACHE_ALIGN,
-			NULL);
+			0, MAX_CIFS_SMALL_BUFFER_SIZE, NULL);
 	if (cifs_sm_req_cachep == NULL) {
 		mempool_destroy(cifs_req_poolp);
 		kmem_cache_destroy(cifs_req_cachep);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 172+ messages in thread

* [PATCH v2 16/30] cifs: Define usercopy region in cifs_request slab cache
@ 2017-08-28 21:34   ` Kees Cook
  0 siblings, 0 replies; 172+ messages in thread
From: Kees Cook @ 2017-08-28 21:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, David Windsor, Steve French, linux-cifs,
	samba-technical, linux-mm, kernel-hardening

From: David Windsor <dave@nullcore.net>

CIFS request buffers, stored in the cifs_request slab cache, need to be
copied to/from userspace.

cache object allocation:
    fs/cifs/cifsfs.c:
        cifs_init_request_bufs():
            ...
            cifs_req_poolp = mempool_create_slab_pool(cifs_min_rcv,
                                                      cifs_req_cachep);

    fs/cifs/misc.c:
        cifs_buf_get():
            ...
            ret_buf = mempool_alloc(cifs_req_poolp, GFP_NOFS);
            ...
            return ret_buf;

In support of usercopy hardening, this patch defines a region in the
cifs_request slab cache in which userspace copy operations are allowed.

This region is known as the slab cache's usercopy region.  Slab
caches can now check that each copy operation involving cache-managed
memory falls entirely within the slab's usercopy region.

This patch is verbatim from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <dave@nullcore.net>
[kees: adjust commit log, provide usage trace]
Cc: Steve French <sfrench@samba.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 fs/cifs/cifsfs.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
index 180b3356ff86..09dfdf76c738 100644
--- a/fs/cifs/cifsfs.c
+++ b/fs/cifs/cifsfs.c
@@ -1229,9 +1229,11 @@ cifs_init_request_bufs(void)
 	cifs_dbg(VFS, "CIFSMaxBufSize %d 0x%x\n",
 		 CIFSMaxBufSize, CIFSMaxBufSize);
 */
-	cifs_req_cachep = kmem_cache_create("cifs_request",
+	cifs_req_cachep = kmem_cache_create_usercopy("cifs_request",
 					    CIFSMaxBufSize + max_hdr_size, 0,
-					    SLAB_HWCACHE_ALIGN, NULL);
+					    SLAB_HWCACHE_ALIGN, 0,
+					    CIFSMaxBufSize + max_hdr_size,
+					    NULL);
 	if (cifs_req_cachep == NULL)
 		return -ENOMEM;
 
@@ -1257,9 +1259,9 @@ cifs_init_request_bufs(void)
 	more SMBs to use small buffer alloc and is still much more
 	efficient to alloc 1 per page off the slab compared to 17K (5page)
 	alloc of large cifs buffers even when page debugging is on */
-	cifs_sm_req_cachep = kmem_cache_create("cifs_small_rq",
+	cifs_sm_req_cachep = kmem_cache_create_usercopy("cifs_small_rq",
 			MAX_CIFS_SMALL_BUFFER_SIZE, 0, SLAB_HWCACHE_ALIGN,
-			NULL);
+			0, MAX_CIFS_SMALL_BUFFER_SIZE, NULL);
 	if (cifs_sm_req_cachep == NULL) {
 		mempool_destroy(cifs_req_poolp);
 		kmem_cache_destroy(cifs_req_cachep);
-- 
2.7.4
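
The region check described in the commit message can be sketched in
user space as follows. This is a minimal illustration under stated
assumptions, not the kernel's implementation: struct cache_region and
usercopy_allowed() are hypothetical names introduced here. A copy of n
bytes starting at a given offset into an object is accepted only when it
lies entirely inside [useroffset, useroffset + usersize). Both cifs caches
in this patch pass offset 0 and the full object size, so every in-bounds
copy is accepted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the usercopy fields kept per slab cache. */
struct cache_region {
	size_t useroffset;	/* start of the whitelisted span */
	size_t usersize;	/* length of the whitelisted span, 0 = none */
};

/*
 * Return true if copying n bytes at 'offset' within an object stays
 * entirely inside the whitelisted span. Subtractions are used so that
 * offset + n can never overflow.
 */
static bool usercopy_allowed(const struct cache_region *c, size_t offset,
			     size_t n)
{
	if (offset < c->useroffset)
		return false;
	offset -= c->useroffset;	/* now relative to the region start */
	if (offset > c->usersize)
		return false;
	return n <= c->usersize - offset;
}

void usercopy_demo(void)
{
	/* Whole 128-byte object whitelisted, like the cifs caches. */
	struct cache_region whole = { .useroffset = 0, .usersize = 128 };
	/* Only bytes 16..47 whitelisted. */
	struct cache_region part = { .useroffset = 16, .usersize = 32 };

	assert(usercopy_allowed(&whole, 0, 128));
	assert(!usercopy_allowed(&whole, 1, 128));	/* runs past the end */
	assert(usercopy_allowed(&part, 16, 32));
	assert(!usercopy_allowed(&part, 8, 16));	/* starts too early */
	assert(!usercopy_allowed(&part, 40, 16));	/* ends too late */
}
```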

^ permalink raw reply related	[flat|nested] 172+ messages in thread

* [PATCH v2 17/30] scsi: Define usercopy region in scsi_sense_cache slab cache
  2017-08-28 21:34 ` Kees Cook
  (?)
@ 2017-08-28 21:34   ` Kees Cook
  -1 siblings, 0 replies; 172+ messages in thread
From: Kees Cook @ 2017-08-28 21:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, David Windsor, James E.J. Bottomley,
	Martin K. Petersen, linux-scsi, linux-mm, kernel-hardening

From: David Windsor <dave@nullcore.net>

SCSI sense buffers, stored in struct scsi_cmnd.sense and therefore
contained in the scsi_sense_cache slab cache, need to be copied to/from
userspace.

cache object allocation:
    drivers/scsi/scsi_lib.c:
        scsi_select_sense_cache(...):
            return ... ? scsi_sense_isadma_cache : scsi_sense_cache

        scsi_alloc_sense_buffer(...):
            return kmem_cache_alloc_node(scsi_select_sense_cache(), ...);

        scsi_init_request(...):
            ...
            cmd->sense_buffer = scsi_alloc_sense_buffer(...);
            ...
            cmd->req.sense = cmd->sense_buffer

example usage trace:

    block/scsi_ioctl.c:
        (inline from sg_io)
        blk_complete_sghdr_rq(...):
            struct scsi_request *req = scsi_req(rq);
            ...
            copy_to_user(..., req->sense, len)

        scsi_cmd_ioctl(...):
            sg_io(...);

In support of usercopy hardening, this patch defines a region in
the scsi_sense_cache slab cache in which userspace copy operations
are allowed.

This region is known as the slab cache's usercopy region.  Slab
caches can now check that each copy operation involving cache-managed
memory falls entirely within the slab's usercopy region.

Signed-off-by: David Windsor <dave@nullcore.net>
[kees: adjust commit log, provide usage trace]
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 drivers/scsi/scsi_lib.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index f6097b89d5d3..f1c6bd56dd5b 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -77,14 +77,15 @@ int scsi_init_sense_cache(struct Scsi_Host *shost)
 	if (shost->unchecked_isa_dma) {
 		scsi_sense_isadma_cache =
 			kmem_cache_create("scsi_sense_cache(DMA)",
-			SCSI_SENSE_BUFFERSIZE, 0,
-			SLAB_HWCACHE_ALIGN | SLAB_CACHE_DMA, NULL);
+				SCSI_SENSE_BUFFERSIZE, 0,
+				SLAB_HWCACHE_ALIGN | SLAB_CACHE_DMA, NULL);
 		if (!scsi_sense_isadma_cache)
 			ret = -ENOMEM;
 	} else {
 		scsi_sense_cache =
-			kmem_cache_create("scsi_sense_cache",
-			SCSI_SENSE_BUFFERSIZE, 0, SLAB_HWCACHE_ALIGN, NULL);
+			kmem_cache_create_usercopy("scsi_sense_cache",
+				SCSI_SENSE_BUFFERSIZE, 0, SLAB_HWCACHE_ALIGN,
+				0, SCSI_SENSE_BUFFERSIZE, NULL);
 		if (!scsi_sense_cache)
 			ret = -ENOMEM;
 	}
-- 
2.7.4
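
Because the whole sense buffer is whitelisted here, the hardened check
can only reject a copy that is longer than the object, so the caller
still has to bound the length it passes. A rough user-space sketch of
that clamping follows. copy_sense_clamped() and SENSE_BUFFERSIZE are
hypothetical names for illustration, and memcpy() stands in for
copy_to_user().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SENSE_BUFFERSIZE 96	/* illustrative stand-in for SCSI_SENSE_BUFFERSIZE */

/*
 * Copy at most 'want' bytes of sense data into 'dst', never reading
 * past the whitelisted span [0, SENSE_BUFFERSIZE) of the source object.
 * Returns the number of bytes actually copied.
 */
size_t copy_sense_clamped(void *dst, size_t want,
			  const unsigned char *sense, size_t sense_len)
{
	size_t len = want < sense_len ? want : sense_len;

	if (len > SENSE_BUFFERSIZE)
		len = SENSE_BUFFERSIZE;
	memcpy(dst, sense, len);
	return len;
}

void sense_demo(void)
{
	unsigned char sense[SENSE_BUFFERSIZE];
	unsigned char out[SENSE_BUFFERSIZE];

	memset(sense, 0x70, sizeof(sense));
	assert(copy_sense_clamped(out, 18, sense, sizeof(sense)) == 18);
	/* An oversized request is cut down to the whitelisted span. */
	assert(copy_sense_clamped(out, 4096, sense, 4096) == SENSE_BUFFERSIZE);
	assert(out[SENSE_BUFFERSIZE - 1] == 0x70);
}
```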

^ permalink raw reply related	[flat|nested] 172+ messages in thread

* [PATCH v2 18/30] net: Define usercopy region in struct proto slab cache
  2017-08-28 21:34 ` Kees Cook
  (?)
@ 2017-08-28 21:34   ` Kees Cook
  -1 siblings, 0 replies; 172+ messages in thread
From: Kees Cook @ 2017-08-28 21:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, David Windsor, David S. Miller, Eric Dumazet,
	Paolo Abeni, David Howells, netdev, linux-mm, kernel-hardening

From: David Windsor <dave@nullcore.net>

In support of usercopy hardening, this patch defines a region in the
struct proto slab cache in which userspace copy operations are allowed.
Some protocols need to copy objects to/from userspace, and they can
declare the region via their proto structure with the new usersize and
useroffset fields. Initially, if no region is specified (usersize ==
0), the entire object is marked as whitelisted. This allows protocols
to be whitelisted in subsequent patches. Once all protocols have been
annotated, the full-whitelist default can be removed.

This region is known as the slab cache's usercopy region. Slab caches can
now check that each copy operation involving cache-managed memory falls
entirely within the slab's usercopy region.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <dave@nullcore.net>
[kees: adjust commit log, split off per-proto patches]
[kees: add logic for by-default full-whitelist]
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 include/net/sock.h | 2 ++
 net/core/sock.c    | 6 +++++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 7c0632c7e870..170d5b2dbcb6 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1106,6 +1106,8 @@ struct proto {
 	struct kmem_cache	*slab;
 	unsigned int		obj_size;
 	int			slab_flags;
+	size_t			useroffset;	/* Usercopy region offset */
+	size_t			usersize;	/* Usercopy region size */
 
 	struct percpu_counter	*orphan_count;
 
diff --git a/net/core/sock.c b/net/core/sock.c
index ac2a404c73eb..02dab98ca3e3 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -3109,8 +3109,12 @@ static int req_prot_init(const struct proto *prot)
 int proto_register(struct proto *prot, int alloc_slab)
 {
 	if (alloc_slab) {
-		prot->slab = kmem_cache_create(prot->name, prot->obj_size, 0,
+		prot->slab = kmem_cache_create_usercopy(prot->name,
+					prot->obj_size, 0,
 					SLAB_HWCACHE_ALIGN | prot->slab_flags,
+					prot->usersize ? prot->useroffset : 0,
+					prot->usersize ? prot->usersize
+						       : prot->obj_size,
 					NULL);
 
 		if (prot->slab == NULL) {
-- 
2.7.4
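
The fallback in proto_register() above can be restated as a small
stand-alone function. This is only a sketch: struct proto_sizes, struct
region and effective_region() are hypothetical names, and just the three
size fields used by the hunk are modelled. A usersize of 0 is read as
"not annotated yet", in which case useroffset is ignored and the whole
object is whitelisted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of struct proto: only the fields the hunk reads. */
struct proto_sizes {
	size_t obj_size;
	size_t useroffset;
	size_t usersize;
};

struct region {
	size_t offset;
	size_t size;
};

/* Same two ternaries as the patch, returned as one value. */
struct region effective_region(const struct proto_sizes *p)
{
	struct region r;

	r.offset = p->usersize ? p->useroffset : 0;
	r.size = p->usersize ? p->usersize : p->obj_size;
	return r;
}

void proto_demo(void)
{
	struct proto_sizes plain = { .obj_size = 512 };
	struct proto_sizes annotated = {
		.obj_size = 512, .useroffset = 64, .usersize = 4,
	};
	struct region r;

	r = effective_region(&plain);		/* whole object */
	assert(r.offset == 0 && r.size == 512);
	r = effective_region(&annotated);	/* declared span only */
	assert(r.offset == 64 && r.size == 4);
}
```

One consequence of keying on usersize alone is that a protocol cannot
yet declare an empty region; that becomes possible only once the
full-whitelist default is removed.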

^ permalink raw reply related	[flat|nested] 172+ messages in thread

* [PATCH v2 19/30] ip: Define usercopy region in IP proto slab cache
  2017-08-28 21:34 ` Kees Cook
  (?)
@ 2017-08-28 21:35   ` Kees Cook
  -1 siblings, 0 replies; 172+ messages in thread
From: Kees Cook @ 2017-08-28 21:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, David Windsor, David S. Miller, Alexey Kuznetsov,
	Hideaki YOSHIFUJI, netdev, linux-mm, kernel-hardening

From: David Windsor <dave@nullcore.net>

The ICMP filters for IPv4 and IPv6 raw sockets need to be copied to/from
userspace. In support of usercopy hardening, this patch defines a region
in the struct proto slab cache in which userspace copy operations are
allowed.

example usage trace:

    net/ipv4/raw.c:
        raw_seticmpfilter(...):
            ...
            copy_from_user(&raw_sk(sk)->filter, ..., optlen)

        raw_geticmpfilter(...):
            ...
            copy_to_user(..., &raw_sk(sk)->filter, len)

    net/ipv6/raw.c:
        rawv6_seticmpfilter(...):
            ...
            copy_from_user(&raw6_sk(sk)->filter, ..., optlen)

        rawv6_geticmpfilter(...):
            ...
            copy_to_user(..., &raw6_sk(sk)->filter, len)

This region is known as the slab cache's usercopy region. Slab caches can
now check that each copy operation involving cache-managed memory falls
entirely within the slab's usercopy region.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <dave@nullcore.net>
[kees: split from network patch, provide usage trace]
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 net/ipv4/raw.c | 2 ++
 net/ipv6/raw.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index b0bb5d0a30bd..6c7f8d2eb3af 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -964,6 +964,8 @@ struct proto raw_prot = {
 	.hash		   = raw_hash_sk,
 	.unhash		   = raw_unhash_sk,
 	.obj_size	   = sizeof(struct raw_sock),
+	.useroffset	   = offsetof(struct raw_sock, filter),
+	.usersize	   = sizeof_field(struct raw_sock, filter),
 	.h.raw_hash	   = &raw_v4_hashinfo,
 #ifdef CONFIG_COMPAT
 	.compat_setsockopt = compat_raw_setsockopt,
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 60be012fe708..27dd9a5f71c6 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -1265,6 +1265,8 @@ struct proto rawv6_prot = {
 	.hash		   = raw_hash_sk,
 	.unhash		   = raw_unhash_sk,
 	.obj_size	   = sizeof(struct raw6_sock),
+	.useroffset	   = offsetof(struct raw6_sock, filter),
+	.usersize	   = sizeof_field(struct raw6_sock, filter),
 	.h.raw_hash	   = &raw_v6_hashinfo,
 #ifdef CONFIG_COMPAT
 	.compat_setsockopt = compat_rawv6_setsockopt,
-- 
2.7.4
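
How offsetof() and sizeof_field() turn a single member into a usercopy
region can be seen on a mock structure. The layout below is an
assumption made for illustration (struct mock_raw_sock is not the real
struct raw_sock, which embeds struct inet_sock), and sizeof_field() is
redefined locally so the sketch builds outside the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space equivalent of the kernel helper of the same name. */
#define sizeof_field(TYPE, MEMBER) sizeof(((TYPE *)0)->MEMBER)

struct mock_icmp_filter {
	uint32_t data;
};

/* Hypothetical layout standing in for struct raw_sock. */
struct mock_raw_sock {
	char inet[64];			/* placeholder for the embedded socket */
	struct mock_icmp_filter filter;	/* the only span userspace may touch */
	uint32_t ipmr_table;		/* private, outside the region */
};

size_t raw_useroffset(void)
{
	return offsetof(struct mock_raw_sock, filter);
}

size_t raw_usersize(void)
{
	return sizeof_field(struct mock_raw_sock, filter);
}

void raw_region_demo(void)
{
	assert(raw_useroffset() == 64);
	assert(raw_usersize() == sizeof(uint32_t));
	/* The region ends before the next private member begins. */
	assert(raw_useroffset() + raw_usersize() <=
	       offsetof(struct mock_raw_sock, ipmr_table));
}
```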

^ permalink raw reply related	[flat|nested] 172+ messages in thread

* [PATCH v2 20/30] caif: Define usercopy region in caif proto slab cache
  2017-08-28 21:34 ` Kees Cook
  (?)
@ 2017-08-28 21:35   ` Kees Cook
  -1 siblings, 0 replies; 172+ messages in thread
From: Kees Cook @ 2017-08-28 21:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, David Windsor, Dmitry Tarnyagin, David S. Miller,
	netdev, linux-mm, kernel-hardening

From: David Windsor <dave@nullcore.net>

The CAIF channel connection request parameters need to be copied to/from
userspace. In support of usercopy hardening, this patch defines a region
in the struct proto slab cache in which userspace copy operations are
allowed.

example usage trace:

    net/caif/caif_socket.c:
        setsockopt(...):
            ...
            copy_from_user(&cf_sk->conn_req.param.data, ..., ol)

This region is known as the slab cache's usercopy region. Slab caches can
now check that each copy operation involving cache-managed memory falls
entirely within the slab's usercopy region.
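
For illustration only (not part of the patch): a minimal userspace sketch
of the containment test described above, assuming a per-cache offset and
size as in the cover letter. struct cache_region and copy_within_region()
are hypothetical names; the kernel's actual check lives in the slab
allocators and may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the usercopy members of struct kmem_cache. */
struct cache_region {
	size_t obj_size;   /* size of one object in the cache */
	size_t useroffset; /* start of the whitelisted region in an object */
	size_t usersize;   /* length of the whitelisted region; 0 = none */
};

/*
 * Return true when the copy [offset, offset + n) lies entirely inside
 * [useroffset, useroffset + usersize). Only subtractions are used, so
 * no intermediate sum can wrap around.
 */
static bool copy_within_region(const struct cache_region *c,
			       size_t offset, size_t n)
{
	/* A region that overruns the object would make the check useless. */
	assert(c->useroffset <= c->obj_size &&
	       c->usersize <= c->obj_size - c->useroffset);

	if (offset < c->useroffset)
		return false;
	offset -= c->useroffset;	/* now relative to the region start */
	if (offset > c->usersize)
		return false;
	return n <= c->usersize - offset;
}
```

With a region of offset 16 and size 8, a copy of 4 bytes at offset 20 is
accepted, while 5 bytes at offset 20 or any copy starting at offset 8 is
rejected.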

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <dave@nullcore.net>
[kees: split from network patch, provide usage trace]
Cc: Dmitry Tarnyagin <dmitry.tarnyagin@lockless.no>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 net/caif/caif_socket.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/caif/caif_socket.c b/net/caif/caif_socket.c
index 632d5a416d97..c76d513b9a7a 100644
--- a/net/caif/caif_socket.c
+++ b/net/caif/caif_socket.c
@@ -1032,6 +1032,8 @@ static int caif_create(struct net *net, struct socket *sock, int protocol,
 	static struct proto prot = {.name = "PF_CAIF",
 		.owner = THIS_MODULE,
 		.obj_size = sizeof(struct caifsock),
+		.useroffset = offsetof(struct caifsock, conn_req.param),
+		.usersize = sizeof_field(struct caifsock, conn_req.param)
 	};
 
 	if (!capable(CAP_SYS_ADMIN) && !capable(CAP_NET_ADMIN))
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 172+ messages in thread

* [PATCH v2 21/30] sctp: Define usercopy region in SCTP proto slab cache
  2017-08-28 21:34 ` Kees Cook
  (?)
  (?)
@ 2017-08-28 21:35   ` Kees Cook
  -1 siblings, 0 replies; 172+ messages in thread
From: Kees Cook @ 2017-08-28 21:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, David Windsor, Vlad Yasevich, Neil Horman,
	David S. Miller, linux-sctp, netdev, linux-mm, kernel-hardening

From: David Windsor <dave@nullcore.net>

The SCTP socket event notification subscription information needs to be
copied to/from userspace. In support of usercopy hardening, this patch
defines a region in the struct proto slab cache in which userspace copy
operations are allowed. It additionally moves the two usercopy fields
(subscribe and initmsg) so that they are adjacent and a single region
can cover both.
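
For illustration only (not part of the patch): a small sketch of how one
region can span two adjacent members, using the same offsetof() arithmetic
as the hunk for sctp_prot. struct mock_sock, its member types, FIELD_SIZE,
MOCK_USEROFFSET and MOCK_USERSIZE are hypothetical stand-ins, not the real
SCTP definitions.

```c
#include <stddef.h>

/* Hypothetical miniature of struct sctp_sock; member types are placeholders. */
struct mock_subscribe { unsigned char events[13]; };
struct mock_initmsg   { unsigned short v[4]; };

struct mock_sock {
	unsigned int param_flags;
	struct mock_subscribe subscribe;	/* first whitelisted member */
	struct mock_initmsg initmsg;		/* last whitelisted member */
	int user_frag;
};

/* Same idea as the kernel's sizeof_field(): size of one struct member. */
#define FIELD_SIZE(type, member) sizeof(((type *)0)->member)

/* The region starts at the first member ... */
#define MOCK_USEROFFSET offsetof(struct mock_sock, subscribe)

/* ... and runs to the end of the last one, including any padding between. */
#define MOCK_USERSIZE \
	(offsetof(struct mock_sock, initmsg) - \
	 offsetof(struct mock_sock, subscribe) + \
	 FIELD_SIZE(struct mock_sock, initmsg))
```

Computing the size from the two offsets, rather than adding the two member
sizes, keeps the region correct even when the compiler inserts padding
between the members.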

example usage trace:

    net/sctp/socket.c:
        sctp_getsockopt_events(...):
            ...
            copy_to_user(..., &sctp_sk(sk)->subscribe, len)

        sctp_setsockopt_events(...):
            ...
            copy_from_user(&sctp_sk(sk)->subscribe, ..., optlen)

        sctp_getsockopt_initmsg(...):
            ...
            copy_to_user(..., &sctp_sk(sk)->initmsg, len)

This region is known as the slab cache's usercopy region. Slab caches can
now check that each copy operation involving cache-managed memory falls
entirely within the slab's usercopy region.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <dave@nullcore.net>
[kees: split from network patch, move struct member adjacent, provide usage]
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-sctp@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 include/net/sctp/structs.h | 9 +++++++--
 net/sctp/socket.c          | 4 ++++
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 5ab29af8ca8a..f1d7810e200e 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -202,12 +202,17 @@ struct sctp_sock {
 	/* Flags controlling Heartbeat, SACK delay, and Path MTU Discovery. */
 	__u32 param_flags;
 
-	struct sctp_initmsg initmsg;
 	struct sctp_rtoinfo rtoinfo;
 	struct sctp_paddrparams paddrparam;
-	struct sctp_event_subscribe subscribe;
 	struct sctp_assocparams assocparams;
 
+	/*
+	 * These two structures must be grouped together for the usercopy
+	 * whitelist region.
+	 */
+	struct sctp_event_subscribe subscribe;
+	struct sctp_initmsg initmsg;
+
 	int user_frag;
 
 	__u32 autoclose;
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 1db478e34520..c8784cb216e4 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -8235,6 +8235,10 @@ struct proto sctp_prot = {
 	.unhash      =	sctp_unhash,
 	.get_port    =	sctp_get_port,
 	.obj_size    =  sizeof(struct sctp_sock),
+	.useroffset  =  offsetof(struct sctp_sock, subscribe),
+	.usersize    =  offsetof(struct sctp_sock, initmsg) -
+				offsetof(struct sctp_sock, subscribe) +
+				sizeof_field(struct sctp_sock, initmsg),
 	.sysctl_mem  =  sysctl_sctp_mem,
 	.sysctl_rmem =  sysctl_sctp_rmem,
 	.sysctl_wmem =  sysctl_sctp_wmem,
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 172+ messages in thread

* [PATCH v2 22/30] sctp: Copy struct sctp_sock.autoclose to userspace using put_user()
  2017-08-28 21:34 ` Kees Cook
  (?)
  (?)
@ 2017-08-28 21:35   ` Kees Cook
  -1 siblings, 0 replies; 172+ messages in thread
From: Kees Cook @ 2017-08-28 21:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, David Windsor, Vlad Yasevich, Neil Horman,
	David S. Miller, linux-sctp, netdev, linux-mm, kernel-hardening

From: David Windsor <dave@nullcore.net>

The autoclose field can be copied with put_user(), so there is no need to
use copy_to_user(). In both cases, hardened usercopy is being bypassed
since the size is constant, and not open to runtime manipulation.
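
For illustration only (not part of the patch): a userspace sketch of why a
compile-time constant size skips the runtime check. mock_copy(),
mock_check_object() and runtime_checks are hypothetical names, and the
sketch assumes GCC or Clang, since it relies on __builtin_constant_p().

```c
#include <stddef.h>
#include <string.h>

/* Counts how often the mock runtime check ran; for illustration only. */
static int runtime_checks;

/* Hypothetical stand-in for the runtime object bounds check. */
static void mock_check_object(const void *ptr, size_t n)
{
	(void)ptr;
	(void)n;
	runtime_checks++;
}

/*
 * Mock copy helper. It is a macro so that __builtin_constant_p() sees
 * the caller's size expression: sizeof(int) is a compile-time constant
 * and skips the check, a size held in a variable does not.
 */
#define mock_copy(dst, src, n)					\
	do {							\
		if (!__builtin_constant_p(n))			\
			mock_check_object((src), (n));		\
		memcpy((dst), (src), (n));			\
	} while (0)
```

Calling mock_copy(&d, &s, sizeof(int)) copies without touching
runtime_checks, whereas passing a length that is only known at run time
goes through mock_check_object() first.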

This patch is verbatim from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <dave@nullcore.net>
[kees: adjust commit log]
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-sctp@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 net/sctp/socket.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index c8784cb216e4..a29e41e19d64 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -4882,7 +4882,7 @@ static int sctp_getsockopt_autoclose(struct sock *sk, int len, char __user *optv
 	len = sizeof(int);
 	if (put_user(len, optlen))
 		return -EFAULT;
-	if (copy_to_user(optval, &sctp_sk(sk)->autoclose, sizeof(int)))
+	if (put_user(sctp_sk(sk)->autoclose, (int __user *)optval))
 		return -EFAULT;
 	return 0;
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 172+ messages in thread

* [PATCH v2 23/30] net: Restrict unwhitelisted proto caches to size 0
  2017-08-28 21:34 ` Kees Cook
  (?)
@ 2017-08-28 21:35   ` Kees Cook
  -1 siblings, 0 replies; 172+ messages in thread
From: Kees Cook @ 2017-08-28 21:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, David S. Miller, Eric Dumazet, Paolo Abeni,
	David Howells, netdev, linux-mm, kernel-hardening, David Windsor

Now that protocols have been annotated (the copies of icsk_ca_ops->name
and icsk_ulp_ops->name are of ops fields from outside the slab cache):

$ git grep 'copy_.*_user.*sk.*->'
caif/caif_socket.c: copy_from_user(&cf_sk->conn_req.param.data, ov, ol)) {
ipv4/raw.c:   if (copy_from_user(&raw_sk(sk)->filter, optval, optlen))
ipv4/raw.c:       copy_to_user(optval, &raw_sk(sk)->filter, len))
ipv4/tcp.c:       if (copy_to_user(optval, icsk->icsk_ca_ops->name, len))
ipv4/tcp.c:       if (copy_to_user(optval, icsk->icsk_ulp_ops->name, len))
ipv6/raw.c:       if (copy_from_user(&raw6_sk(sk)->filter, optval, optlen))
ipv6/raw.c:           if (copy_to_user(optval, &raw6_sk(sk)->filter, len))
sctp/socket.c: if (copy_from_user(&sctp_sk(sk)->subscribe, optval, optlen))
sctp/socket.c: if (copy_to_user(optval, &sctp_sk(sk)->subscribe, len))
sctp/socket.c: if (copy_to_user(optval, &sctp_sk(sk)->initmsg, len))

we can switch the default proto usercopy region to size 0. Any protocols
needing to add whitelisted regions must annotate the fields with the
useroffset and usersize fields of struct proto.
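
For illustration only (not part of the patch): a userspace sketch of the
behavioural change in proto_register(), mirroring the removed and added
arguments in the hunk below. struct mock_proto, struct region,
region_before() and region_after() are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical subset of struct proto: only the members relevant here. */
struct mock_proto {
	size_t obj_size;
	size_t useroffset;
	size_t usersize;
};

struct region {
	size_t offset;
	size_t size;
};

/* Before this patch: an unannotated proto whitelisted its whole object. */
static struct region region_before(const struct mock_proto *p)
{
	struct region r = {
		.offset = p->usersize ? p->useroffset : 0,
		.size   = p->usersize ? p->usersize : p->obj_size,
	};
	return r;
}

/* After this patch: the annotation is passed through, so 0 means "none". */
static struct region region_after(const struct mock_proto *p)
{
	struct region r = { .offset = p->useroffset, .size = p->usersize };
	return r;
}
```

An annotated proto yields the same region from both functions; only
protos that leave usersize at 0 change, from "entire object" to "no
bytes".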

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 net/core/sock.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index 02dab98ca3e3..c7d0afa1d0b1 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -3112,9 +3112,7 @@ int proto_register(struct proto *prot, int alloc_slab)
 		prot->slab = kmem_cache_create_usercopy(prot->name,
 					prot->obj_size, 0,
 					SLAB_HWCACHE_ALIGN | prot->slab_flags,
-					prot->usersize ? prot->useroffset : 0,
-					prot->usersize ? prot->usersize
-						       : prot->obj_size,
+					prot->useroffset, prot->usersize,
 					NULL);
 
 		if (prot->slab == NULL) {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 172+ messages in thread

* [PATCH v2 23/30] net: Restrict unwhitelisted proto caches to size 0
@ 2017-08-28 21:35   ` Kees Cook
  0 siblings, 0 replies; 172+ messages in thread
From: Kees Cook @ 2017-08-28 21:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, David S. Miller, Eric Dumazet, Paolo Abeni,
	David Howells, netdev, linux-mm, kernel-hardening, David Windsor

Now that protocols have been annotated (the copy of icsk_ca_ops->name
is of an ops field from outside the slab cache):

$ git grep 'copy_.*_user.*sk.*->'
caif/caif_socket.c: copy_from_user(&cf_sk->conn_req.param.data, ov, ol)) {
ipv4/raw.c:   if (copy_from_user(&raw_sk(sk)->filter, optval, optlen))
ipv4/raw.c:       copy_to_user(optval, &raw_sk(sk)->filter, len))
ipv4/tcp.c:       if (copy_to_user(optval, icsk->icsk_ca_ops->name, len))
ipv4/tcp.c:       if (copy_to_user(optval, icsk->icsk_ulp_ops->name, len))
ipv6/raw.c:       if (copy_from_user(&raw6_sk(sk)->filter, optval, optlen))
ipv6/raw.c:           if (copy_to_user(optval, &raw6_sk(sk)->filter, len))
sctp/socket.c: if (copy_from_user(&sctp_sk(sk)->subscribe, optval, optlen))
sctp/socket.c: if (copy_to_user(optval, &sctp_sk(sk)->subscribe, len))
sctp/socket.c: if (copy_to_user(optval, &sctp_sk(sk)->initmsg, len))

we can switch the default proto usercopy region to size 0. Any protocols
needing to add whitelisted regions must annotate the fields with the
useroffset and usersize fields of struct proto.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 net/core/sock.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index 02dab98ca3e3..c7d0afa1d0b1 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -3112,9 +3112,7 @@ int proto_register(struct proto *prot, int alloc_slab)
 		prot->slab = kmem_cache_create_usercopy(prot->name,
 					prot->obj_size, 0,
 					SLAB_HWCACHE_ALIGN | prot->slab_flags,
-					prot->usersize ? prot->useroffset : 0,
-					prot->usersize ? prot->usersize
-						       : prot->obj_size,
+					prot->useroffset, prot->usersize,
 					NULL);
 
 		if (prot->slab == NULL) {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 172+ messages in thread

* [PATCH v2 24/30] fork: Define usercopy region in mm_struct slab caches
  2017-08-28 21:34 ` Kees Cook
  (?)
@ 2017-08-28 21:35   ` Kees Cook
  -1 siblings, 0 replies; 172+ messages in thread
From: Kees Cook @ 2017-08-28 21:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, David Windsor, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Andy Lutomirski, linux-mm, kernel-hardening

From: David Windsor <dave@nullcore.net>

In support of usercopy hardening, this patch defines a region in the
mm_struct slab caches in which userspace copy operations are allowed.
Only the saved_auxv field is copied to userspace.

cache object allocation:
    kernel/fork.c:
        #define allocate_mm()     (kmem_cache_alloc(mm_cachep, GFP_KERNEL))

        dup_mm():
            ...
            mm = allocate_mm();

        copy_mm(...):
            ...
            dup_mm();

        copy_process(...):
            ...
            copy_mm(...)

        _do_fork(...):
            ...
            copy_process(...)

example usage trace:

    fs/binfmt_elf.c:
        create_elf_tables(...):
            ...
            elf_info = (elf_addr_t *)current->mm->saved_auxv;
            ...
            copy_to_user(..., elf_info, ei_index * sizeof(elf_addr_t))

        load_elf_binary(...):
            ...
            create_elf_tables(...);

This region is known as the slab cache's usercopy region. Slab caches can
now check that each copy operation involving cache-managed memory falls
entirely within the slab's usercopy region.
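
As a rough illustration of what "falls entirely within" means, the
following is a userspace sketch of such a containment test. It is not the
kernel's checking code: struct mm_model, struct cache_model, AUXV_WORDS and
both function names are assumptions made up for this example. Only the use
of offsetof() and the size of the saved_auxv field to describe the region
mirrors the diff below.

```c
#include <stdbool.h>
#include <stddef.h>

#define AUXV_WORDS 46	/* hypothetical array length, for illustration only */

/* Hypothetical, heavily reduced stand-in for struct mm_struct. */
struct mm_model {
	unsigned long flags;
	void *pgd;
	unsigned long saved_auxv[AUXV_WORDS];
	unsigned long start_code;
};

/* Hypothetical stand-in for the usercopy members of a slab cache. */
struct cache_model {
	size_t object_size;
	size_t useroffset;
	size_t usersize;
};

/* Describe the whitelisted region the same way the patch does. */
static struct cache_model mm_cache_model(void)
{
	struct cache_model c = {
		sizeof(struct mm_model),
		offsetof(struct mm_model, saved_auxv),
		sizeof(((struct mm_model *)0)->saved_auxv),
	};

	return c;
}

/*
 * Return true only if [offset, offset + len) lies entirely inside the
 * whitelisted region. The subtraction form avoids overflow in offset + len.
 */
static bool usercopy_span_allowed(const struct cache_model *c,
				  size_t offset, size_t len)
{
	if (offset < c->useroffset)
		return false;
	offset -= c->useroffset;	/* now relative to the region start */
	return offset <= c->usersize && len <= c->usersize - offset;
}
```

With this model, a copy of part or all of saved_auxv is accepted, while a
copy that starts before the field, runs past its end, or targets a
neighbouring field is rejected.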

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <dave@nullcore.net>
[kees: adjust commit log, split patch, provide usage trace]
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 kernel/fork.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index 17921b0390b4..d8ebf755a47b 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2206,9 +2206,11 @@ void __init proc_caches_init(void)
 	 * maximum number of CPU's we can ever have.  The cpumask_allocation
 	 * is at the end of the structure, exactly for that reason.
 	 */
-	mm_cachep = kmem_cache_create("mm_struct",
+	mm_cachep = kmem_cache_create_usercopy("mm_struct",
 			sizeof(struct mm_struct), ARCH_MIN_MMSTRUCT_ALIGN,
 			SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_NOTRACK|SLAB_ACCOUNT,
+			offsetof(struct mm_struct, saved_auxv),
+			sizeof_field(struct mm_struct, saved_auxv),
 			NULL);
 	vm_area_cachep = KMEM_CACHE(vm_area_struct, SLAB_PANIC|SLAB_ACCOUNT);
 	mmap_init();
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 172+ messages in thread

* [PATCH v2 25/30] fork: Define usercopy region in thread_stack slab caches
  2017-08-28 21:34 ` Kees Cook
  (?)
@ 2017-08-28 21:35   ` Kees Cook
  -1 siblings, 0 replies; 172+ messages in thread
From: Kees Cook @ 2017-08-28 21:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, David Windsor, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Andy Lutomirski, linux-mm, kernel-hardening

From: David Windsor <dave@nullcore.net>

In support of usercopy hardening, this patch defines a region in the
thread_stack slab caches in which userspace copy operations are allowed.
Since the entire thread_stack needs to be available to userspace, the
entire slab contents are whitelisted. Note that the slab-based thread
stack is only present on systems with THREAD_SIZE < PAGE_SIZE and
!CONFIG_VMAP_STACK.

cache object allocation:
    kernel/fork.c:
        alloc_thread_stack_node(...):
            return kmem_cache_alloc_node(thread_stack_cache, ...)

        dup_task_struct(...):
            ...
            stack = alloc_thread_stack_node(...)
            ...
            tsk->stack = stack;

        copy_process(...):
            ...
            dup_task_struct(...)

        _do_fork(...):
            ...
            copy_process(...)

This region is known as the slab cache's usercopy region. Slab caches
can now check that each copy operation involving cache-managed memory
falls entirely within the slab's usercopy region.

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Signed-off-by: David Windsor <dave@nullcore.net>
[kees: adjust commit log, split patch, provide usage trace]
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 kernel/fork.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index d8ebf755a47b..0f33fb1aabbf 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -276,8 +276,9 @@ static void free_thread_stack(struct task_struct *tsk)
 
 void thread_stack_cache_init(void)
 {
-	thread_stack_cache = kmem_cache_create("thread_stack", THREAD_SIZE,
-					      THREAD_SIZE, 0, NULL);
+	thread_stack_cache = kmem_cache_create_usercopy("thread_stack",
+					THREAD_SIZE, THREAD_SIZE, 0, 0,
+					THREAD_SIZE, NULL);
 	BUG_ON(thread_stack_cache == NULL);
 }
 # endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 172+ messages in thread

* [PATCH v2 26/30] fork: Provide usercopy whitelisting for task_struct
  2017-08-28 21:34 ` Kees Cook
  (?)
@ 2017-08-28 21:35   ` Kees Cook
  -1 siblings, 0 replies; 172+ messages in thread
From: Kees Cook @ 2017-08-28 21:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, Andrew Morton, Nicholas Piggin, Laura Abbott,
	Mickaël Salaün, Ingo Molnar, Thomas Gleixner,
	Andy Lutomirski, linux-mm, kernel-hardening, David Windsor

While the blocked and saved_sigmask fields of task_struct are copied to
userspace (via sigmask_to_save() and setup_rt_frame()), they are always
copied with a static length (i.e. sizeof(sigset_t)).

The only portion of task_struct that is potentially dynamically sized and
may be copied to userspace is in the architecture-specific thread_struct
at the end of task_struct.

cache object allocation:
    kernel/fork.c:
        alloc_task_struct_node(...):
            return kmem_cache_alloc_node(task_struct_cachep, ...);

        dup_task_struct(...):
            ...
            tsk = alloc_task_struct_node(node);

        copy_process(...):
            ...
            dup_task_struct(...)

        _do_fork(...):
            ...
            copy_process(...)

example usage trace:

    arch/x86/kernel/fpu/signal.c:
        __fpu__restore_sig(...):
            ...
            struct task_struct *tsk = current;
            struct fpu *fpu = &tsk->thread.fpu;
            ...
            __copy_from_user(&fpu->state.xsave, ..., state_size);

        fpu__restore_sig(...):
            ...
            return __fpu__restore_sig(...);

    arch/x86/kernel/signal.c:
        restore_sigcontext(...):
            ...
            fpu__restore_sig(...)

This introduces arch_thread_struct_whitelist() to let an architecture
declare specifically where the whitelist should be within thread_struct.
If undefined, the entire thread_struct field is left whitelisted.
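
The offset translation can be sketched in userspace as follows. This is an
illustrative model under stated assumptions: struct thread_model, struct
task_model, the three arch_whitelist_*() hooks and the function-pointer
parameter are all hypothetical. In the patch itself the hook is a single
function, arch_thread_struct_whitelist(), selected at build time.

```c
#include <stddef.h>

/* Hypothetical stand-in for an architecture's thread_struct. */
struct thread_model {
	unsigned long sp;
	unsigned long flags;
	unsigned char fpu_state[512];
};

/* Hypothetical stand-in for task_struct; thread is the last member. */
struct task_model {
	long state;
	int prio;
	struct thread_model thread;
};

typedef void (*whitelist_fn)(unsigned long *offset, unsigned long *size);

/* Fallback hook: whitelist everything from the start of thread onward. */
static void arch_whitelist_default(unsigned long *offset, unsigned long *size)
{
	*offset = 0;
	*size = sizeof(struct task_model) - offsetof(struct task_model, thread);
}

/* Arch-provided hook: only the FPU state, relative to thread_model. */
static void arch_whitelist_fpu_only(unsigned long *offset, unsigned long *size)
{
	*offset = offsetof(struct thread_model, fpu_state);
	*size = sizeof(((struct thread_model *)0)->fpu_state);
}

/* Arch-provided hook: nothing in thread_model is copied to userspace. */
static void arch_whitelist_none(unsigned long *offset, unsigned long *size)
{
	*offset = 0;
	*size = 0;
}

/* Convert the thread-relative region into a task-relative one. */
static void task_whitelist(whitelist_fn arch, unsigned long *offset,
			   unsigned long *size)
{
	arch(offset, size);

	if (*size == 0)
		*offset = 0;	/* empty whitelist: normalize the offset */
	else
		*offset += offsetof(struct task_model, thread);
}
```

The fallback hook yields a region running from the start of the thread
member to the end of the object, a narrower hook yields only its chosen
field, and an empty hook yields a region of offset 0 and size 0.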

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: "Mickaël Salaün" <mic@digikod.net>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 arch/Kconfig               | 11 +++++++++++
 include/linux/sched/task.h | 14 ++++++++++++++
 kernel/fork.c              | 22 ++++++++++++++++++++--
 3 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 21d0089117fe..380d2bc2001b 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -241,6 +241,17 @@ config ARCH_INIT_TASK
 config ARCH_TASK_STRUCT_ALLOCATOR
 	bool
 
+config HAVE_ARCH_THREAD_STRUCT_WHITELIST
+	bool
+	depends on !ARCH_TASK_STRUCT_ALLOCATOR
+	help
+	  An architecture should select this to provide hardened usercopy
+	  knowledge about what region of the thread_struct should be
+	  whitelisted for copying to userspace. Normally this is only the
+	  FPU registers. Specifically, arch_thread_struct_whitelist()
+	  should be implemented. Without this, the entire thread_struct
+	  field in task_struct will be left whitelisted.
+
 # Select if arch has its private alloc_thread_stack() function
 config ARCH_THREAD_STACK_ALLOCATOR
 	bool
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index c97e5f096927..60f36adaa504 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -104,6 +104,20 @@ extern int arch_task_struct_size __read_mostly;
 # define arch_task_struct_size (sizeof(struct task_struct))
 #endif
 
+#ifndef CONFIG_HAVE_ARCH_THREAD_STRUCT_WHITELIST
+/*
+ * If an architecture has not declared a thread_struct whitelist we
+ * must assume something there may need to be copied to userspace.
+ */
+static inline void arch_thread_struct_whitelist(unsigned long *offset,
+						unsigned long *size)
+{
+	*offset = 0;
+	/* Handle dynamically sized thread_struct. */
+	*size = arch_task_struct_size - offsetof(struct task_struct, thread);
+}
+#endif
+
 #ifdef CONFIG_VMAP_STACK
 static inline struct vm_struct *task_stack_vm_area(const struct task_struct *t)
 {
diff --git a/kernel/fork.c b/kernel/fork.c
index 0f33fb1aabbf..4fcc9cc8e108 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -452,6 +452,21 @@ static void set_max_threads(unsigned int max_threads_suggested)
 int arch_task_struct_size __read_mostly;
 #endif
 
+static void task_struct_whitelist(unsigned long *offset, unsigned long *size)
+{
+	/* Fetch thread_struct whitelist for the architecture. */
+	arch_thread_struct_whitelist(offset, size);
+
+	/*
+	 * Handle zero-sized whitelist or empty thread_struct, otherwise
+	 * adjust offset to position of thread_struct in task_struct.
+	 */
+	if (unlikely(*size == 0))
+		*offset = 0;
+	else
+		*offset += offsetof(struct task_struct, thread);
+}
+
 void __init fork_init(void)
 {
 	int i;
@@ -460,11 +475,14 @@ void __init fork_init(void)
 #define ARCH_MIN_TASKALIGN	0
 #endif
 	int align = max_t(int, L1_CACHE_BYTES, ARCH_MIN_TASKALIGN);
+	unsigned long useroffset, usersize;
 
 	/* create a slab on which task_structs can be allocated */
-	task_struct_cachep = kmem_cache_create("task_struct",
+	task_struct_whitelist(&useroffset, &usersize);
+	task_struct_cachep = kmem_cache_create_usercopy("task_struct",
 			arch_task_struct_size, align,
-			SLAB_PANIC|SLAB_NOTRACK|SLAB_ACCOUNT, NULL);
+			SLAB_PANIC|SLAB_NOTRACK|SLAB_ACCOUNT,
+			useroffset, usersize, NULL);
 #endif
 
 	/* do the arch specific task caches init */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 172+ messages in thread

* [PATCH v2 26/30] fork: Provide usercopy whitelisting for task_struct
@ 2017-08-28 21:35   ` Kees Cook
  0 siblings, 0 replies; 172+ messages in thread
From: Kees Cook @ 2017-08-28 21:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, Andrew Morton, Nicholas Piggin, Laura Abbott,
	Mickaël Salaün, Ingo Molnar, Thomas Gleixner,
	Andy Lutomirski, linux-mm, kernel-hardening, David Windsor

While the blocked and saved_sigmask fields of task_struct are copied to
userspace (via sigmask_to_save() and setup_rt_frame()), it is always
copied with a static length (i.e. sizeof(sigset_t)).

The only portion of task_struct that is potentially dynamically sized and
may be copied to userspace is in the architecture-specific thread_struct
at the end of task_struct.

cache object allocation:
    kernel/fork.c:
        alloc_task_struct_node(...):
            return kmem_cache_alloc_node(task_struct_cachep, ...);

        dup_task_struct(...):
            ...
            tsk = alloc_task_struct_node(node);

        copy_process(...):
            ...
            dup_task_struct(...)

        _do_fork(...):
            ...
            copy_process(...)

example usage trace:

    arch/x86/kernel/fpu/signal.c:
        __fpu__restore_sig(...):
            ...
            struct task_struct *tsk = current;
            struct fpu *fpu = &tsk->thread.fpu;
            ...
            __copy_from_user(&fpu->state.xsave, ..., state_size);

        fpu__restore_sig(...):
            ...
            return __fpu__restore_sig(...);

    arch/x86/kernel/signal.c:
        restore_sigcontext(...):
            ...
            fpu__restore_sig(...)

This introduces arch_thread_struct_whitelist() to let an architecture
declare specifically where the whitelist should be within thread_struct.
If undefined, the entire thread_struct field is left whitelisted.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: "MickaA<<l SalaA 1/4 n" <mic@digikod.net>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 arch/Kconfig               | 11 +++++++++++
 include/linux/sched/task.h | 14 ++++++++++++++
 kernel/fork.c              | 22 ++++++++++++++++++++--
 3 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 21d0089117fe..380d2bc2001b 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -241,6 +241,17 @@ config ARCH_INIT_TASK
 config ARCH_TASK_STRUCT_ALLOCATOR
 	bool
 
+config HAVE_ARCH_THREAD_STRUCT_WHITELIST
+	bool
+	depends on !ARCH_TASK_STRUCT_ALLOCATOR
+	help
+	  An architecture should select this to provide hardened usercopy
+	  knowledge about what region of the thread_struct should be
+	  whitelisted for copying to userspace. Normally this is only the
+	  FPU registers. Specifically, arch_thread_struct_whitelist()
+	  should be implemented. Without this, the entire thread_struct
+	  field in task_struct will be left whitelisted.
+
 # Select if arch has its private alloc_thread_stack() function
 config ARCH_THREAD_STACK_ALLOCATOR
 	bool
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index c97e5f096927..60f36adaa504 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -104,6 +104,20 @@ extern int arch_task_struct_size __read_mostly;
 # define arch_task_struct_size (sizeof(struct task_struct))
 #endif
 
+#ifndef CONFIG_HAVE_ARCH_THREAD_STRUCT_WHITELIST
+/*
+ * If an architecture has not declared a thread_struct whitelist we
+ * must assume something there may need to be copied to userspace.
+ */
+static inline void arch_thread_struct_whitelist(unsigned long *offset,
+						unsigned long *size)
+{
+	*offset = 0;
+	/* Handle dynamically sized thread_struct. */
+	*size = arch_task_struct_size - offsetof(struct task_struct, thread);
+}
+#endif
+
 #ifdef CONFIG_VMAP_STACK
 static inline struct vm_struct *task_stack_vm_area(const struct task_struct *t)
 {
diff --git a/kernel/fork.c b/kernel/fork.c
index 0f33fb1aabbf..4fcc9cc8e108 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -452,6 +452,21 @@ static void set_max_threads(unsigned int max_threads_suggested)
 int arch_task_struct_size __read_mostly;
 #endif
 
+static void task_struct_whitelist(unsigned long *offset, unsigned long *size)
+{
+	/* Fetch thread_struct whitelist for the architecture. */
+	arch_thread_struct_whitelist(offset, size);
+
+	/*
+	 * Handle zero-sized whitelist or empty thread_struct, otherwise
+	 * adjust offset to position of thread_struct in task_struct.
+	 */
+	if (unlikely(*size == 0))
+		*offset = 0;
+	else
+		*offset += offsetof(struct task_struct, thread);
+}
+
 void __init fork_init(void)
 {
 	int i;
@@ -460,11 +475,14 @@ void __init fork_init(void)
 #define ARCH_MIN_TASKALIGN	0
 #endif
 	int align = max_t(int, L1_CACHE_BYTES, ARCH_MIN_TASKALIGN);
+	unsigned long useroffset, usersize;
 
 	/* create a slab on which task_structs can be allocated */
-	task_struct_cachep = kmem_cache_create("task_struct",
+	task_struct_whitelist(&useroffset, &usersize);
+	task_struct_cachep = kmem_cache_create_usercopy("task_struct",
 			arch_task_struct_size, align,
-			SLAB_PANIC|SLAB_NOTRACK|SLAB_ACCOUNT, NULL);
+			SLAB_PANIC|SLAB_NOTRACK|SLAB_ACCOUNT,
+			useroffset, usersize, NULL);
 #endif
 
 	/* do the arch specific task caches init */
-- 
2.7.4


* [kernel-hardening] [PATCH v2 26/30] fork: Provide usercopy whitelisting for task_struct
@ 2017-08-28 21:35   ` Kees Cook
  0 siblings, 0 replies; 172+ messages in thread
From: Kees Cook @ 2017-08-28 21:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, Andrew Morton, Nicholas Piggin, Laura Abbott,
	Mickaël Salaün, Ingo Molnar, Thomas Gleixner,
	Andy Lutomirski, linux-mm, kernel-hardening, David Windsor

While the blocked and saved_sigmask fields of task_struct are copied to
userspace (via sigmask_to_save() and setup_rt_frame()), they are always
copied with a static length (i.e. sizeof(sigset_t)).

The only portion of task_struct that is potentially dynamically sized and
may be copied to userspace is in the architecture-specific thread_struct
at the end of task_struct.

cache object allocation:
    kernel/fork.c:
        alloc_task_struct_node(...):
            return kmem_cache_alloc_node(task_struct_cachep, ...);

        dup_task_struct(...):
            ...
            tsk = alloc_task_struct_node(node);

        copy_process(...):
            ...
            dup_task_struct(...)

        _do_fork(...):
            ...
            copy_process(...)

example usage trace:

    arch/x86/kernel/fpu/signal.c:
        __fpu__restore_sig(...):
            ...
            struct task_struct *tsk = current;
            struct fpu *fpu = &tsk->thread.fpu;
            ...
            __copy_from_user(&fpu->state.xsave, ..., state_size);

        fpu__restore_sig(...):
            ...
            return __fpu__restore_sig(...);

    arch/x86/kernel/signal.c:
        restore_sigcontext(...):
            ...
            fpu__restore_sig(...)

This introduces arch_thread_struct_whitelist() to let an architecture
declare specifically where the whitelist should be within thread_struct.
If undefined, the entire thread_struct field is left whitelisted.
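
The offset handling described above can be sketched in userspace C. This
is only an illustration under stated assumptions: the mock_* structures
and functions are hypothetical stand-ins, not kernel definitions. The
arch hook reports a region relative to the start of thread_struct, and
the generic helper rebases it onto task_struct unless the region is
empty.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real structures are far larger. */
struct mock_thread_struct {
	unsigned long sp;
	unsigned long flags;
	unsigned char fpu_state[64];
};

struct mock_task_struct {
	int pid;
	unsigned long blocked;
	struct mock_thread_struct thread;	/* last member in this sketch */
};

/* Arch hook: the region is relative to the start of thread_struct. */
static void mock_arch_thread_struct_whitelist(unsigned long *offset,
					      unsigned long *size)
{
	assert(offset && size);
	*offset = offsetof(struct mock_thread_struct, fpu_state);
	*size = sizeof(((struct mock_thread_struct *)0)->fpu_state);
}

/* Same shape as task_struct_whitelist(): rebase unless the region is empty. */
static void mock_task_struct_whitelist(unsigned long *offset,
				       unsigned long *size)
{
	mock_arch_thread_struct_whitelist(offset, size);

	if (*size == 0)
		*offset = 0;
	else
		*offset += offsetof(struct mock_task_struct, thread);
}
```

Calling mock_task_struct_whitelist() yields an offset measured from the
start of the mock task_struct, which is the form the slab cache would
need for its usercopy region.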

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: "Mickaël Salaün" <mic@digikod.net>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 arch/Kconfig               | 11 +++++++++++
 include/linux/sched/task.h | 14 ++++++++++++++
 kernel/fork.c              | 22 ++++++++++++++++++++--
 3 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 21d0089117fe..380d2bc2001b 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -241,6 +241,17 @@ config ARCH_INIT_TASK
 config ARCH_TASK_STRUCT_ALLOCATOR
 	bool
 
+config HAVE_ARCH_THREAD_STRUCT_WHITELIST
+	bool
+	depends on !ARCH_TASK_STRUCT_ALLOCATOR
+	help
+	  An architecture should select this to provide hardened usercopy
+	  knowledge about what region of the thread_struct should be
+	  whitelisted for copying to userspace. Normally this is only the
+	  FPU registers. Specifically, arch_thread_struct_whitelist()
+	  should be implemented. Without this, the entire thread_struct
+	  field in task_struct will be left whitelisted.
+
 # Select if arch has its private alloc_thread_stack() function
 config ARCH_THREAD_STACK_ALLOCATOR
 	bool
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index c97e5f096927..60f36adaa504 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -104,6 +104,20 @@ extern int arch_task_struct_size __read_mostly;
 # define arch_task_struct_size (sizeof(struct task_struct))
 #endif
 
+#ifndef CONFIG_HAVE_ARCH_THREAD_STRUCT_WHITELIST
+/*
+ * If an architecture has not declared a thread_struct whitelist we
+ * must assume something there may need to be copied to userspace.
+ */
+static inline void arch_thread_struct_whitelist(unsigned long *offset,
+						unsigned long *size)
+{
+	*offset = 0;
+	/* Handle dynamically sized thread_struct. */
+	*size = arch_task_struct_size - offsetof(struct task_struct, thread);
+}
+#endif
+
 #ifdef CONFIG_VMAP_STACK
 static inline struct vm_struct *task_stack_vm_area(const struct task_struct *t)
 {
diff --git a/kernel/fork.c b/kernel/fork.c
index 0f33fb1aabbf..4fcc9cc8e108 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -452,6 +452,21 @@ static void set_max_threads(unsigned int max_threads_suggested)
 int arch_task_struct_size __read_mostly;
 #endif
 
+static void task_struct_whitelist(unsigned long *offset, unsigned long *size)
+{
+	/* Fetch thread_struct whitelist for the architecture. */
+	arch_thread_struct_whitelist(offset, size);
+
+	/*
+	 * Handle zero-sized whitelist or empty thread_struct, otherwise
+	 * adjust offset to position of thread_struct in task_struct.
+	 */
+	if (unlikely(*size == 0))
+		*offset = 0;
+	else
+		*offset += offsetof(struct task_struct, thread);
+}
+
 void __init fork_init(void)
 {
 	int i;
@@ -460,11 +475,14 @@ void __init fork_init(void)
 #define ARCH_MIN_TASKALIGN	0
 #endif
 	int align = max_t(int, L1_CACHE_BYTES, ARCH_MIN_TASKALIGN);
+	unsigned long useroffset, usersize;
 
 	/* create a slab on which task_structs can be allocated */
-	task_struct_cachep = kmem_cache_create("task_struct",
+	task_struct_whitelist(&useroffset, &usersize);
+	task_struct_cachep = kmem_cache_create_usercopy("task_struct",
 			arch_task_struct_size, align,
-			SLAB_PANIC|SLAB_NOTRACK|SLAB_ACCOUNT, NULL);
+			SLAB_PANIC|SLAB_NOTRACK|SLAB_ACCOUNT,
+			useroffset, usersize, NULL);
 #endif
 
 	/* do the arch specific task caches init */
-- 
2.7.4


* [PATCH v2 27/30] x86: Implement thread_struct whitelist for hardened usercopy
  2017-08-28 21:34 ` Kees Cook
  (?)
@ 2017-08-28 21:35   ` Kees Cook
  -1 siblings, 0 replies; 172+ messages in thread
From: Kees Cook @ 2017-08-28 21:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Borislav Petkov, Andy Lutomirski, Mathias Krause, linux-mm,
	kernel-hardening, David Windsor

This whitelists the FPU register state portion of the thread_struct for
copying to userspace, instead of the default entire struct.
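
One detail worth illustrating is that the x86 whitelist size is taken
from fpu_kernel_xstate_size, a runtime value, rather than from a sizeof.
The userspace sketch below models that with hypothetical mock_* names
and a made-up maximum of 4096 bytes; it assumes the FPU state sits at
the end of the structure so the unused tail can be trimmed, and it is
not the kernel's actual layout.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_XSTATE_MAX 4096	/* hypothetical compile-time maximum */

struct mock_fpu {
	unsigned int last_cpu;
	unsigned char state[MOCK_XSTATE_MAX];
};

struct mock_thread_struct {
	unsigned long sp;
	struct mock_fpu fpu;	/* kept last so its tail can be trimmed */
};

/* Stand-in for fpu_kernel_xstate_size; a runtime value in this sketch. */
static unsigned long mock_xstate_size = 512;

/* Bytes of thread_struct actually needed for the current state size. */
static unsigned long mock_thread_struct_size(void)
{
	return sizeof(struct mock_thread_struct) - MOCK_XSTATE_MAX +
	       mock_xstate_size;
}

/* Whitelist only the in-use part of the FPU state. */
static void mock_arch_thread_struct_whitelist(unsigned long *offset,
					      unsigned long *size)
{
	assert(mock_xstate_size <= MOCK_XSTATE_MAX);
	*offset = offsetof(struct mock_thread_struct, fpu.state);
	*size = mock_xstate_size;
}
```

With this shape the whitelisted span ends no later than the trimmed
object does, whatever state size is chosen, so the region never reaches
past the allocation.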

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 arch/x86/Kconfig                 | 1 +
 arch/x86/include/asm/processor.h | 8 ++++++++
 2 files changed, 9 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 781521b7cf9e..a8793721483c 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -113,6 +113,7 @@ config X86
 	select HAVE_ARCH_MMAP_RND_COMPAT_BITS	if MMU && COMPAT
 	select HAVE_ARCH_COMPAT_MMAP_BASES	if MMU && COMPAT
 	select HAVE_ARCH_SECCOMP_FILTER
+	select HAVE_ARCH_THREAD_STRUCT_WHITELIST
 	select HAVE_ARCH_TRACEHOOK
 	select HAVE_ARCH_TRANSPARENT_HUGEPAGE
 	select HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD if X86_64
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 028245e1c42b..dc52ba90f090 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -481,6 +481,14 @@ struct thread_struct {
 	 */
 };
 
+/* Whitelist the FPU state from the task_struct for hardened usercopy. */
+static inline void arch_thread_struct_whitelist(unsigned long *offset,
+						unsigned long *size)
+{
+	*offset = offsetof(struct thread_struct, fpu.state);
+	*size = fpu_kernel_xstate_size;
+}
+
 /*
  * Thread-synchronous status.
  *
-- 
2.7.4


* [PATCH v2 28/30] arm64: Implement thread_struct whitelist for hardened usercopy
  2017-08-28 21:34 ` Kees Cook
  (?)
  (?)
@ 2017-08-28 21:35   ` Kees Cook
  -1 siblings, 0 replies; 172+ messages in thread
From: Kees Cook @ 2017-08-28 21:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, Catalin Marinas, Will Deacon, Christian Borntraeger,
	Ingo Molnar, James Morse, Peter Zijlstra (Intel),
	Dave Martin, zijun_hu, linux-arm-kernel, linux-mm,
	kernel-hardening, David Windsor

This whitelists the FPU register state portion of the thread_struct for
copying to userspace, instead of the default entire structure.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Dave Martin <Dave.Martin@arm.com>
Cc: zijun_hu <zijun_hu@htc.com>
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 arch/arm64/Kconfig                 | 1 +
 arch/arm64/include/asm/processor.h | 8 ++++++++
 2 files changed, 9 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index dfd908630631..b773299bc4e3 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -73,6 +73,7 @@ config ARM64
 	select HAVE_ARCH_MMAP_RND_BITS
 	select HAVE_ARCH_MMAP_RND_COMPAT_BITS if COMPAT
 	select HAVE_ARCH_SECCOMP_FILTER
+	select HAVE_ARCH_THREAD_STRUCT_WHITELIST
 	select HAVE_ARCH_TRACEHOOK
 	select HAVE_ARCH_TRANSPARENT_HUGEPAGE
 	select HAVE_ARM_SMCCC
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 64c9e78f9882..799f112e5ff7 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -90,6 +90,14 @@ struct thread_struct {
 	struct debug_info	debug;		/* debugging */
 };
 
+/* Whitelist the fpsimd_state for copying to userspace. */
+static inline void arch_thread_struct_whitelist(unsigned long *offset,
+						unsigned long *size)
+{
+	*offset = offsetof(struct thread_struct, fpsimd_state);
+	*size = sizeof(struct fpsimd_state);
+}
+
 #ifdef CONFIG_COMPAT
 #define task_user_tls(t)						\
 ({									\
-- 
2.7.4


* [PATCH v2 29/30] arm: Implement thread_struct whitelist for hardened usercopy
  2017-08-28 21:34 ` Kees Cook
  (?)
  (?)
@ 2017-08-28 21:35   ` Kees Cook
  -1 siblings, 0 replies; 172+ messages in thread
From: Kees Cook @ 2017-08-28 21:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, Russell King, Ingo Molnar, Christian Borntraeger,
	Peter Zijlstra (Intel),
	linux-arm-kernel, linux-mm, kernel-hardening, David Windsor

ARM does not carry FPU state in the thread structure, so it can declare
no usercopy whitelist at all.

Cc: Russell King <linux@armlinux.org.uk>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 arch/arm/Kconfig                 | 1 +
 arch/arm/include/asm/processor.h | 7 +++++++
 2 files changed, 8 insertions(+)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index a208bfe367b5..3781f08d00fa 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -48,6 +48,7 @@ config ARM
 	select HAVE_ARCH_KGDB if !CPU_ENDIAN_BE32 && MMU
 	select HAVE_ARCH_MMAP_RND_BITS if MMU
 	select HAVE_ARCH_SECCOMP_FILTER if (AEABI && !OABI_COMPAT)
+	select HAVE_ARCH_THREAD_STRUCT_WHITELIST
 	select HAVE_ARCH_TRACEHOOK
 	select HAVE_ARM_SMCCC if CPU_V7
 	select HAVE_CBPF_JIT
diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h
index c3d5fc124a05..d6dc45c92ee5 100644
--- a/arch/arm/include/asm/processor.h
+++ b/arch/arm/include/asm/processor.h
@@ -45,6 +45,13 @@ struct thread_struct {
 	struct debug_info	debug;
 };
 
+/* Nothing needs to be usercopy-whitelisted from thread_struct. */
+static inline void arch_thread_struct_whitelist(unsigned long *offset,
+						unsigned long *size)
+{
+	*offset = *size = 0;
+}
+
 #define INIT_THREAD  {	}
 
 #ifdef CONFIG_MMU
-- 
2.7.4


* [PATCH v2 30/30] usercopy: Restrict non-usercopy caches to size 0
  2017-08-28 21:34 ` Kees Cook
  (?)
@ 2017-08-28 21:35   ` Kees Cook
  -1 siblings, 0 replies; 172+ messages in thread
From: Kees Cook @ 2017-08-28 21:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, David Windsor, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Andrew Morton, linux-mm,
	kernel-hardening

With all known usercopied cache whitelists now defined in the
kernel, switch the default usercopy region of kmem_cache_create()
to size 0. Any new caches with usercopy regions will now need to use
kmem_cache_create_usercopy() instead of kmem_cache_create().
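
The effect of the new default can be modeled in a few lines of userspace
C. Everything named mock_* is hypothetical, and the bounds test is a
simplified stand-in for the hardened usercopy check rather than a copy
of it; in particular, rejecting a bad region with assert() is a
simplification made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the usercopy fields kept per cache. */
struct mock_cache {
	size_t object_size;
	size_t useroffset;
	size_t usersize;
};

static struct mock_cache mock_cache_create_usercopy(size_t size,
						    size_t useroffset,
						    size_t usersize)
{
	struct mock_cache c = { size, useroffset, usersize };

	/* The region must lie inside the object. */
	assert(useroffset <= size && usersize <= size - useroffset);
	return c;
}

/* After this patch the plain constructor whitelists nothing. */
static struct mock_cache mock_cache_create(size_t size)
{
	return mock_cache_create_usercopy(size, 0, 0);
}

/* True if copying n bytes at object offset off stays in the whitelist. */
static bool mock_usercopy_allowed(const struct mock_cache *c,
				  size_t off, size_t n)
{
	if (off < c->useroffset)
		return false;
	off -= c->useroffset;
	return off <= c->usersize && n <= c->usersize - off;
}
```

A cache made with mock_cache_create() then refuses any non-empty copy,
while one made with mock_cache_create_usercopy(128, 32, 64) accepts
copies that stay within bytes 32..95 of the object.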

This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
whitelisting code in the last public patch of grsecurity/PaX based on my
understanding of the code. Changes or omissions from the original code are
mine and don't reflect the original grsecurity/PaX code.

Cc: David Windsor <dave@nullcore.net>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 mm/slab_common.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/slab_common.c b/mm/slab_common.c
index f662f4e2fa29..d51c0a36d58b 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -511,7 +511,7 @@ struct kmem_cache *
 kmem_cache_create(const char *name, size_t size, size_t align,
 		unsigned long flags, void (*ctor)(void *))
 {
-	return kmem_cache_create_usercopy(name, size, align, flags, 0, size,
+	return kmem_cache_create_usercopy(name, size, align, flags, 0, 0,
 					  ctor);
 }
 EXPORT_SYMBOL(kmem_cache_create);
-- 
2.7.4


* Re: [PATCH v2 17/30] scsi: Define usercopy region in scsi_sense_cache slab cache
  2017-08-28 21:34   ` Kees Cook
  (?)
@ 2017-08-28 21:42     ` Bart Van Assche
  -1 siblings, 0 replies; 172+ messages in thread
From: Bart Van Assche @ 2017-08-28 21:42 UTC (permalink / raw)
  To: keescook, linux-kernel
  Cc: jejb, linux-scsi, linux-mm, kernel-hardening, martin.petersen, dave

On Mon, 2017-08-28 at 14:34 -0700, Kees Cook wrote:
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index f6097b89d5d3..f1c6bd56dd5b 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -77,14 +77,15 @@ int scsi_init_sense_cache(struct Scsi_Host *shost)
>  	if (shost->unchecked_isa_dma) {
>  		scsi_sense_isadma_cache =
>  			kmem_cache_create("scsi_sense_cache(DMA)",
> -			SCSI_SENSE_BUFFERSIZE, 0,
> -			SLAB_HWCACHE_ALIGN | SLAB_CACHE_DMA, NULL);
> +				SCSI_SENSE_BUFFERSIZE, 0,
> +				SLAB_HWCACHE_ALIGN | SLAB_CACHE_DMA, NULL);
>  		if (!scsi_sense_isadma_cache)
>  			ret = -ENOMEM;

All this part of this patch does is to change source code indentation. Should
these changes really be included in this patch?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 172+ messages in thread

* Re: [PATCH v2 15/30] xfs: Define usercopy region in xfs_inode slab cache
  2017-08-28 21:34   ` Kees Cook
  (?)
@ 2017-08-28 21:49     ` Darrick J. Wong
  -1 siblings, 0 replies; 172+ messages in thread
From: Darrick J. Wong @ 2017-08-28 21:49 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, David Windsor, linux-xfs, linux-mm, kernel-hardening

On Mon, Aug 28, 2017 at 02:34:56PM -0700, Kees Cook wrote:
> From: David Windsor <dave@nullcore.net>
> 
> The XFS inline inode data, stored in struct xfs_inode_t field
> i_df.if_u2.if_inline_data and therefore contained in the xfs_inode slab
> cache, needs to be copied to/from userspace.
> 
> cache object allocation:
>     fs/xfs/xfs_icache.c:
>         xfs_inode_alloc(...):
>             ...
>             ip = kmem_zone_alloc(xfs_inode_zone, KM_SLEEP);
> 
>     fs/xfs/libxfs/xfs_inode_fork.c:
>         xfs_init_local_fork(...):
>             ...
>             if (mem_size <= sizeof(ifp->if_u2.if_inline_data))
>                     ifp->if_u1.if_data = ifp->if_u2.if_inline_data;

Hmm, what happens when mem_size > sizeof(if_inline_data)?  A slab object
will be allocated for ifp->if_u1.if_data which can then be used for
readlink in the same manner as the example usage trace below.  Does
that allocated object have a need for a usercopy annotation like
the one we're adding for if_inline_data?  Or is that already covered
elsewhere?

--D

>             ...
> 
>     fs/xfs/xfs_symlink.c:
>         xfs_symlink(...):
>             ...
>             xfs_init_local_fork(ip, XFS_DATA_FORK, target_path, pathlen);
> 
> example usage trace:
>     readlink_copy+0x43/0x70
>     vfs_readlink+0x62/0x110
>     SyS_readlinkat+0x100/0x130
> 
>     fs/xfs/xfs_iops.c:
>         (via inode->i_op->get_link)
>         xfs_vn_get_link_inline(...):
>             ...
>             return XFS_I(inode)->i_df.if_u1.if_data;
> 
>     fs/namei.c:
>         readlink_copy(..., link):
>             ...
>             copy_to_user(..., link, len);
> 
>         generic_readlink(dentry, ...):
>             struct inode *inode = d_inode(dentry);
>             const char *link = inode->i_link;
>             ...
>             if (!link) {
>                     link = inode->i_op->get_link(dentry, inode, &done);
>             ...
>             readlink_copy(..., link);
> 
> In support of usercopy hardening, this patch defines a region in the
> xfs_inode slab cache in which userspace copy operations are allowed.
> 
> This region is known as the slab cache's usercopy region. Slab caches can
> now check that each copy operation involving cache-managed memory falls
> entirely within the slab's usercopy region.
> 
> This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
> whitelisting code in the last public patch of grsecurity/PaX based on my
> understanding of the code. Changes or omissions from the original code are
> mine and don't reflect the original grsecurity/PaX code.
> 
> Signed-off-by: David Windsor <dave@nullcore.net>
> [kees: adjust commit log, provide usage trace]
> Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
> Cc: linux-xfs@vger.kernel.org
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
>  fs/xfs/kmem.h      | 10 ++++++++++
>  fs/xfs/xfs_super.c |  7 +++++--
>  2 files changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/xfs/kmem.h b/fs/xfs/kmem.h
> index 4d85992d75b2..08358f38dee6 100644
> --- a/fs/xfs/kmem.h
> +++ b/fs/xfs/kmem.h
> @@ -110,6 +110,16 @@ kmem_zone_init_flags(int size, char *zone_name, unsigned long flags,
>  	return kmem_cache_create(zone_name, size, 0, flags, construct);
>  }
>  
> +static inline kmem_zone_t *
> +kmem_zone_init_flags_usercopy(int size, char *zone_name, unsigned long flags,
> +				size_t useroffset, size_t usersize,
> +				void (*construct)(void *))
> +{
> +	return kmem_cache_create_usercopy(zone_name, size, 0, flags,
> +				useroffset, usersize, construct);
> +}
> +
> +
>  static inline void
>  kmem_zone_free(kmem_zone_t *zone, void *ptr)
>  {
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 38aaacdbb8b3..6ca428c6f943 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -1829,9 +1829,12 @@ xfs_init_zones(void)
>  		goto out_destroy_efd_zone;
>  
>  	xfs_inode_zone =
> -		kmem_zone_init_flags(sizeof(xfs_inode_t), "xfs_inode",
> +		kmem_zone_init_flags_usercopy(sizeof(xfs_inode_t), "xfs_inode",
>  			KM_ZONE_HWALIGN | KM_ZONE_RECLAIM | KM_ZONE_SPREAD |
> -			KM_ZONE_ACCOUNT, xfs_fs_inode_init_once);
> +				KM_ZONE_ACCOUNT,
> +			offsetof(xfs_inode_t, i_df.if_u2.if_inline_data),
> +			sizeof_field(xfs_inode_t, i_df.if_u2.if_inline_data),
> +			xfs_fs_inode_init_once);
>  	if (!xfs_inode_zone)
>  		goto out_destroy_efi_zone;
>  
> -- 
> 2.7.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
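
Editorial sketch, not part of the thread: the quoted patch derives the
whitelisted region from offsetof() and sizeof_field() on the inline data
member. The userspace model below shows that calculation on a mock structure.
struct mock_inode, struct mock_ifork, MOCK_INLINE_DATA and the helper names
are assumptions made for illustration. The real xfs_inode_t layout and inline
buffer size are not reproduced here, and sizeof_field() is redefined locally
to mirror the kernel macro of the same name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local definition mirroring the kernel's sizeof_field() macro. */
#define sizeof_field(TYPE, MEMBER) sizeof(((TYPE *)0)->MEMBER)

#define MOCK_INLINE_DATA 32	/* hypothetical inline buffer size */

/* Mock fork: only the member names follow the ones quoted in the patch. */
struct mock_ifork {
	union { char *if_data; } if_u1;
	union { char if_inline_data[MOCK_INLINE_DATA]; } if_u2;
};

struct mock_inode {
	long i_ino;		/* not whitelisted */
	struct mock_ifork i_df;	/* only if_inline_data is whitelisted */
	long i_flags;		/* not whitelisted */
};

struct usercopy_region {
	size_t offset;
	size_t size;
};

/* The same offsetof()/sizeof_field() pair the patch passes at cache creation. */
static struct usercopy_region mock_inode_region(void)
{
	struct usercopy_region r = {
		offsetof(struct mock_inode, i_df.if_u2.if_inline_data),
		sizeof_field(struct mock_inode, i_df.if_u2.if_inline_data),
	};

	return r;
}

/* True when [ptr, ptr + n) lies inside the region r of the object at ip. */
static bool span_in_region(const struct mock_inode *ip, const void *ptr,
			   size_t n, struct usercopy_region r)
{
	uintptr_t base = (uintptr_t)ip + r.offset;
	uintptr_t p = (uintptr_t)ptr;
	size_t off;

	if (p < base)
		return false;
	off = p - base;
	return off <= r.size && n <= r.size - off;
}
```

Under these assumptions a copy sourced from if_inline_data passes as long as
it stays within the buffer, while copies touching i_ino or i_flags fall
outside the region and would be rejected.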

^ permalink raw reply	[flat|nested] 172+ messages in thread

* Re: [PATCH v2 17/30] scsi: Define usercopy region in scsi_sense_cache slab cache
  2017-08-28 21:42     ` Bart Van Assche
  (?)
  (?)
@ 2017-08-28 21:52       ` Kees Cook
  -1 siblings, 0 replies; 172+ messages in thread
From: Kees Cook @ 2017-08-28 21:52 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: linux-kernel, jejb, linux-scsi, linux-mm, kernel-hardening,
	martin.petersen, dave

On Mon, Aug 28, 2017 at 2:42 PM, Bart Van Assche <Bart.VanAssche@wdc.com> wrote:
> On Mon, 2017-08-28 at 14:34 -0700, Kees Cook wrote:
>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>> index f6097b89d5d3..f1c6bd56dd5b 100644
>> --- a/drivers/scsi/scsi_lib.c
>> +++ b/drivers/scsi/scsi_lib.c
>> @@ -77,14 +77,15 @@ int scsi_init_sense_cache(struct Scsi_Host *shost)
>>       if (shost->unchecked_isa_dma) {
>>               scsi_sense_isadma_cache =
>>                       kmem_cache_create("scsi_sense_cache(DMA)",
>> -                     SCSI_SENSE_BUFFERSIZE, 0,
>> -                     SLAB_HWCACHE_ALIGN | SLAB_CACHE_DMA, NULL);
>> +                             SCSI_SENSE_BUFFERSIZE, 0,
>> +                             SLAB_HWCACHE_ALIGN | SLAB_CACHE_DMA, NULL);
>>               if (!scsi_sense_isadma_cache)
>>                       ret = -ENOMEM;
>
> All this part of this patch does is to change source code indentation. Should
> these changes really be included in this patch?

I can certainly drop that hunk, but the existing alignment is really
ugly. :) Happy to do whatever.

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 172+ messages in thread

* Re: [PATCH v2 15/30] xfs: Define usercopy region in xfs_inode slab cache
  2017-08-28 21:49     ` Darrick J. Wong
  (?)
  (?)
@ 2017-08-28 21:57       ` Kees Cook
  -1 siblings, 0 replies; 172+ messages in thread
From: Kees Cook @ 2017-08-28 21:57 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: LKML, David Windsor, linux-xfs, Linux-MM, kernel-hardening

On Mon, Aug 28, 2017 at 2:49 PM, Darrick J. Wong
<darrick.wong@oracle.com> wrote:
> On Mon, Aug 28, 2017 at 02:34:56PM -0700, Kees Cook wrote:
>> From: David Windsor <dave@nullcore.net>
>>
>> The XFS inline inode data, stored in struct xfs_inode_t field
>> i_df.if_u2.if_inline_data and therefore contained in the xfs_inode slab
>> cache, needs to be copied to/from userspace.
>>
>> cache object allocation:
>>     fs/xfs/xfs_icache.c:
>>         xfs_inode_alloc(...):
>>             ...
>>             ip = kmem_zone_alloc(xfs_inode_zone, KM_SLEEP);
>>
>>     fs/xfs/libxfs/xfs_inode_fork.c:
>>         xfs_init_local_fork(...):
>>             ...
>>             if (mem_size <= sizeof(ifp->if_u2.if_inline_data))
>>                     ifp->if_u1.if_data = ifp->if_u2.if_inline_data;
>
> Hmm, what happens when mem_size > sizeof(if_inline_data)?  A slab object
> will be allocated for ifp->if_u1.if_data which can then be used for
> readlink in the same manner as the example usage trace below.  Does
> that allocated object have a need for a usercopy annotation like
> the one we're adding for if_inline_data?  Or is that already covered
> elsewhere?

Yeah, the xfs helper kmem_alloc() is used in the other case, which
ultimately boils down to a call to kmalloc(), which is entirely
whitelisted by an earlier patch in the series:

https://lkml.org/lkml/2017/8/28/1026

(It's possible that at some future time we can start segregating
kernel-only kmallocs from usercopy-able kmallocs, but for now, there
are no plans for this.)

-Kees

-- 
Kees Cook
Pixel Security
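
Editorial sketch, not part of the thread: the two cases discussed above
(data that fits the inline buffer versus data that needs a separate
allocation) can be modelled in userspace as follows. malloc() stands in for
the xfs kmem_alloc()/kmalloc() path, and struct mock_fork, MOCK_INLINE_DATA
and the mock_* helpers are hypothetical names. The point is only that the
out-of-line buffer lives outside the inode object, so it is governed by the
allocator's own whitelist rather than by the xfs_inode usercopy region.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define MOCK_INLINE_DATA 32	/* hypothetical inline buffer size */

struct mock_fork {
	char *if_data;				/* points inline or to the heap */
	char if_inline_data[MOCK_INLINE_DATA];	/* embedded in the object */
};

/* Small payloads use the embedded buffer, larger ones a separate allocation. */
static int mock_init_local_fork(struct mock_fork *ifp, const char *src,
				size_t size)
{
	if (size <= sizeof(ifp->if_inline_data)) {
		ifp->if_data = ifp->if_inline_data;
	} else {
		ifp->if_data = malloc(size);	/* stand-in for kmem_alloc() */
		if (!ifp->if_data)
			return -1;
	}
	memcpy(ifp->if_data, src, size);
	return 0;
}

/* True when the data lives inside the object itself. */
static bool mock_fork_is_inline(const struct mock_fork *ifp)
{
	return ifp->if_data == ifp->if_inline_data;
}

/* Release the out-of-line buffer, if one was allocated. */
static void mock_destroy_fork(struct mock_fork *ifp)
{
	if (ifp->if_data != ifp->if_inline_data)
		free(ifp->if_data);
	ifp->if_data = NULL;
}
```

A short symlink target ends up inline, where the xfs_inode region applies. A
long one ends up in a separate allocation, which in the kernel would be a
kmalloc object covered by the earlier patch referenced above.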

^ permalink raw reply	[flat|nested] 172+ messages in thread

* Re: [PATCH v2 15/30] xfs: Define usercopy region in xfs_inode slab cache
  2017-08-28 21:57       ` Kees Cook
  (?)
  (?)
@ 2017-08-29  4:47         ` Darrick J. Wong
  -1 siblings, 0 replies; 172+ messages in thread
From: Darrick J. Wong @ 2017-08-29  4:47 UTC (permalink / raw)
  To: Kees Cook; +Cc: LKML, David Windsor, linux-xfs, Linux-MM, kernel-hardening

On Mon, Aug 28, 2017 at 02:57:14PM -0700, Kees Cook wrote:
> On Mon, Aug 28, 2017 at 2:49 PM, Darrick J. Wong
> <darrick.wong@oracle.com> wrote:
> > On Mon, Aug 28, 2017 at 02:34:56PM -0700, Kees Cook wrote:
> >> From: David Windsor <dave@nullcore.net>
> >>
> >> The XFS inline inode data, stored in struct xfs_inode_t field
> >> i_df.if_u2.if_inline_data and therefore contained in the xfs_inode slab
> >> cache, needs to be copied to/from userspace.
> >>
> >> cache object allocation:
> >>     fs/xfs/xfs_icache.c:
> >>         xfs_inode_alloc(...):
> >>             ...
> >>             ip = kmem_zone_alloc(xfs_inode_zone, KM_SLEEP);
> >>
> >>     fs/xfs/libxfs/xfs_inode_fork.c:
> >>         xfs_init_local_fork(...):
> >>             ...
> >>             if (mem_size <= sizeof(ifp->if_u2.if_inline_data))
> >>                     ifp->if_u1.if_data = ifp->if_u2.if_inline_data;
> >
> > Hmm, what happens when mem_size > sizeof(if_inline_data)?  A slab object
> > will be allocated for ifp->if_u1.if_data which can then be used for
> > readlink in the same manner as the example usage trace below.  Does
> > that allocated object have a need for a usercopy annotation like
> > the one we're adding for if_inline_data?  Or is that already covered
> > elsewhere?
> 
> Yeah, the xfs helper kmem_alloc() is used in the other case, which
> ultimately boils down to a call to kmalloc(), which is entirely
> whitelisted by an earlier patch in the series:
> 
> https://lkml.org/lkml/2017/8/28/1026

Ah.  It would've been helpful to have the first three patches cc'd to
the xfs list.  So basically this series establishes the ability to set
regions within a slab object that copy_to_user can copy memory contents
out of, and that copy_from_user can copy into.  Have you seen any
runtime performance impact?  The overhead looks like it ought to be
minimal.
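
For a rough idea of why the overhead should be small, the per-copy check
amounts to a few comparisons against two per-cache values. The sketch
below is a userspace model under stated assumptions: struct model_cache
and model_usercopy_allowed() are hypothetical names, and only the idea of
a region offset and size stored per cache comes from the series
description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the region fields a cache would carry. */
struct model_cache {
	size_t object_size; /* size of each object in the cache */
	size_t useroffset;  /* start of the usercopy region within an object */
	size_t usersize;    /* region length; 0 means nothing is whitelisted */
};

/*
 * Return true when a copy of n bytes starting at byte `offset` of an
 * object lies entirely inside the cache's usercopy region.
 */
static bool model_usercopy_allowed(const struct model_cache *c,
				   size_t offset, size_t n)
{
	/* The region itself must fit inside the object. */
	assert(c->useroffset <= c->object_size);
	assert(c->usersize <= c->object_size - c->useroffset);

	if (offset < c->useroffset)
		return false;
	offset -= c->useroffset;          /* now relative to the region start */
	if (offset > c->usersize)
		return false;
	return n <= c->usersize - offset; /* subtraction form avoids overflow */
}
```

A fully whitelisted cache is the special case useroffset = 0 and
usersize = object_size, and a cache that is never exposed to userspace is
the case usersize = 0.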

> (It's possible that at some future time we can start segregating
> kernel-only kmallocs from usercopy-able kmallocs, but for now, there
> are no plans for this.)

A pity.  It would be interesting to create no-usercopy versions of the
kmalloc-* slabs and see how much of XFS' memory consumption never
touches userspace buffers. :)

--D

> 
> -Kees
> 
> -- 
> Kees Cook
> Pixel Security
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 172+ messages in thread

* Re: [PATCH v2 15/30] xfs: Define usercopy region in xfs_inode slab cache
  2017-08-28 21:34   ` Kees Cook
  (?)
@ 2017-08-29  8:14     ` Christoph Hellwig
  -1 siblings, 0 replies; 172+ messages in thread
From: Christoph Hellwig @ 2017-08-29  8:14 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, David Windsor, Darrick J. Wong, linux-xfs,
	linux-mm, kernel-hardening

One thing I've been wondering is whether we should actually just
get rid of the inline area.  Compared to reading an inode from
disk a single additional kmalloc is negligible, and not having the
inline data / extent list would allow us to reduce the inode size
significantly.

Kees/David:  how many of these patches are for file systems with some
sort of inline data?  Given that it's only about 30 patches, declaring
allocations either entirely valid for user copy or not might end up
being nicer in many ways than these offsets.

^ permalink raw reply	[flat|nested] 172+ messages in thread

* Re: [PATCH v2 10/30] befs: Define usercopy region in befs_inode_cache slab cache
  2017-08-28 21:34   ` Kees Cook
  (?)
@ 2017-08-29 10:12     ` Luis de Bethencourt
  -1 siblings, 0 replies; 172+ messages in thread
From: Luis de Bethencourt @ 2017-08-29 10:12 UTC (permalink / raw)
  To: Kees Cook, linux-kernel
  Cc: David Windsor, Salah Triki, linux-mm, kernel-hardening

Hello Kees,

This is great. Thanks :)

Will merge into my befs tree.

Luis

On 08/28/2017 10:34 PM, Kees Cook wrote:
> From: David Windsor <dave@nullcore.net>
> 
> befs symlink pathnames, stored in struct befs_inode_info.i_data.symlink
> and therefore contained in the befs_inode_cache slab cache, need to be
> copied to/from userspace.
> 
> cache object allocation:
>      fs/befs/linuxvfs.c:
>          befs_alloc_inode(...):
>              ...
>              bi = kmem_cache_alloc(befs_inode_cachep, GFP_KERNEL);
>              ...
>              return &bi->vfs_inode;
> 
>          befs_iget(...):
>              ...
>              strlcpy(befs_ino->i_data.symlink, raw_inode->data.symlink,
>                      BEFS_SYMLINK_LEN);
>              ...
>              inode->i_link = befs_ino->i_data.symlink;
> 
> example usage trace:
>      readlink_copy+0x43/0x70
>      vfs_readlink+0x62/0x110
>      SyS_readlinkat+0x100/0x130
> 
>      fs/namei.c:
>          readlink_copy(..., link):
>              ...
>              copy_to_user(..., link, len);
> 
>          (inlined in vfs_readlink)
>          generic_readlink(dentry, ...):
>              struct inode *inode = d_inode(dentry);
>              const char *link = inode->i_link;
>              ...
>              readlink_copy(..., link);
> 
> In support of usercopy hardening, this patch defines a region in the
> befs_inode_cache slab cache in which userspace copy operations are
> allowed.
> 
> This region is known as the slab cache's usercopy region. Slab caches can
> now check that each copy operation involving cache-managed memory falls
> entirely within the slab's usercopy region.
> 
> This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
> whitelisting code in the last public patch of grsecurity/PaX based on my
> understanding of the code. Changes or omissions from the original code are
> mine and don't reflect the original grsecurity/PaX code.
> 
> Signed-off-by: David Windsor <dave@nullcore.net>
> [kees: adjust commit log, provide usage trace]
> Cc: Luis de Bethencourt <luisbg@kernel.org>
> Cc: Salah Triki <salah.triki@gmail.com>
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
>   fs/befs/linuxvfs.c | 14 +++++++++-----
>   1 file changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/befs/linuxvfs.c b/fs/befs/linuxvfs.c
> index 4a4a5a366158..1c2dcbee79dd 100644
> --- a/fs/befs/linuxvfs.c
> +++ b/fs/befs/linuxvfs.c
> @@ -444,11 +444,15 @@ static struct inode *befs_iget(struct super_block *sb, unsigned long ino)
>   static int __init
>   befs_init_inodecache(void)
>   {
> -	befs_inode_cachep = kmem_cache_create("befs_inode_cache",
> -					      sizeof (struct befs_inode_info),
> -					      0, (SLAB_RECLAIM_ACCOUNT|
> -						SLAB_MEM_SPREAD|SLAB_ACCOUNT),
> -					      init_once);
> +	befs_inode_cachep = kmem_cache_create_usercopy("befs_inode_cache",
> +				sizeof(struct befs_inode_info), 0,
> +				(SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|
> +					SLAB_ACCOUNT),
> +				offsetof(struct befs_inode_info,
> +					i_data.symlink),
> +				sizeof_field(struct befs_inode_info,
> +					i_data.symlink),
> +				init_once);
>   	if (befs_inode_cachep == NULL)
>   		return -ENOMEM;
>   
> 
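
As a small illustration of how the two new arguments in the hunk above
are derived, the sketch below computes a region from a struct layout
with offsetof() and a userspace equivalent of sizeof_field(). The struct,
its member sizes, and all model_* names are hypothetical; only the
offsetof()/sizeof_field() pattern on a nested union member is taken from
the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace equivalent of the kernel's sizeof_field() helper. */
#define model_sizeof_field(type, member) (sizeof(((type *)0)->member))

#define MODEL_SYMLINK_LEN 144 /* made-up length, not BEFS_SYMLINK_LEN */

/* Hypothetical inode-info layout; only its shape mirrors the befs case. */
struct model_inode_info {
	unsigned int flags;
	union {
		unsigned long blocks[4];
		char symlink[MODEL_SYMLINK_LEN];
	} i_data;
	long vfs_inode_placeholder;
};

struct model_region {
	size_t useroffset; /* where the copyable bytes start in the object */
	size_t usersize;   /* how many bytes may be copied */
};

/* Describe the symlink buffer as the only copyable span of the object. */
static struct model_region model_symlink_region(void)
{
	struct model_region r = {
		.useroffset = offsetof(struct model_inode_info, i_data.symlink),
		.usersize = model_sizeof_field(struct model_inode_info,
					       i_data.symlink),
	};

	/* The region must lie inside the object it describes. */
	assert(r.useroffset + r.usersize <= sizeof(struct model_inode_info));
	return r;
}
```

Deriving both values from the type keeps the region in step with the
struct if members are added or reordered later.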

^ permalink raw reply	[flat|nested] 172+ messages in thread

* Re: [PATCH v2 15/30] xfs: Define usercopy region in xfs_inode slab cache
  2017-08-29  8:14     ` Christoph Hellwig
  (?)
@ 2017-08-29 12:31       ` Dave Chinner
  -1 siblings, 0 replies; 172+ messages in thread
From: Dave Chinner @ 2017-08-29 12:31 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Kees Cook, linux-kernel, David Windsor, Darrick J. Wong,
	linux-xfs, linux-mm, kernel-hardening

On Tue, Aug 29, 2017 at 01:14:53AM -0700, Christoph Hellwig wrote:
> One thing I've been wondering is whether we should actually just
> get rid of the inline area.  Compared to reading an inode from
> disk a single additional kmalloc is negligible, and not having the
> inline data / extent list would allow us to reduce the inode size
> significantly.

Probably should.  I've already been looking at killing the inline
extents array to simplify the management of the extent list (much
simpler to index by rbtree when we don't have direct/indirect
structures), so killing the inline data would get rid of the other
part of the union the inline data sits in.

OTOH, if we're going to have to dynamically allocate the memory for
the extent/inline data for the data fork, it may just be easier to
make the entire data fork a dynamic allocation (like the attr fork).

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 172+ messages in thread

* [kernel-hardening] Re: [PATCH v2 15/30] xfs: Define usercopy region in xfs_inode slab cache
@ 2017-08-29 12:31       ` Dave Chinner
  0 siblings, 0 replies; 172+ messages in thread
From: Dave Chinner @ 2017-08-29 12:31 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Kees Cook, linux-kernel, David Windsor, Darrick J. Wong,
	linux-xfs, linux-mm, kernel-hardening

On Tue, Aug 29, 2017 at 01:14:53AM -0700, Christoph Hellwig wrote:
> One thing I've been wondering is wether we should actually just
> get rid of the online area.  Compared to reading an inode from
> disk a single additional kmalloc is negligible, and not having the
> inline data / extent list would allow us to reduce the inode size
> significantly.

Probably should.  I've already been looking at killing the inline
extents array to simplify the management of the extent list (much
simpler to index by rbtree when we don't have direct/indirect
structures), so killing the inline data would get rid of the other
part of the union the inline data sits in.

OTOH, if we're going to have to dynamically allocate the memory for
the extent/inline data for the data fork, it may just be easier to
make the entire data fork a dynamic allocation (like the attr fork).

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 172+ messages in thread

* Re: [PATCH v2 15/30] xfs: Define usercopy region in xfs_inode slab cache
  2017-08-29 12:31       ` Dave Chinner
  (?)
@ 2017-08-29 12:45         ` Christoph Hellwig
  -1 siblings, 0 replies; 172+ messages in thread
From: Christoph Hellwig @ 2017-08-29 12:45 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christoph Hellwig, Kees Cook, linux-kernel, David Windsor,
	Darrick J. Wong, linux-xfs, linux-mm, kernel-hardening

On Tue, Aug 29, 2017 at 10:31:26PM +1000, Dave Chinner wrote:
> Probably should.  I've already been looking at killing the inline
> extents array to simplify the management of the extent list (much
> simpler to index by rbtree when we don't have direct/indirect
> structures), so killing the inline data would get rid of the other
> part of the union the inline data sits in.

That's exactly where I came from with my extent list work.  Although
the rbtree performance was horrible due to the memory overhead and
I've switched to a modified b+tree at the moment.

> OTOH, if we're going to have to dynamically allocate the memory for
> the extent/inline data for the data fork, it may just be easier to
> make the entire data fork a dynamic allocation (like the attr fork).

I thought about this a bit, but it turned out that we basically
always need the data anyway, so I don't think it's going to buy
us much unless we shrink the inode enough so that they better fit
into a page.
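
The "fit into a page" point can be checked with simple arithmetic. The
sketch below is a deliberately simplified model: it assumes a single
4096-byte page per slab and ignores slab metadata, alignment padding,
and higher-order slabs. The helper names and the object sizes used with
them are made up for illustration; they are not measured xfs_inode
sizes.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096u /* assumed page size for this sketch */

/* How many objects of obj_size bytes fit in one page. */
static size_t objs_per_page(size_t obj_size)
{
	assert(obj_size > 0 && obj_size <= TOY_PAGE_SIZE);
	return TOY_PAGE_SIZE / obj_size;
}

/* Bytes left unused at the end of the page for that object size. */
static size_t wasted_per_page(size_t obj_size)
{
	return TOY_PAGE_SIZE - objs_per_page(obj_size) * obj_size;
}
```

Under these assumptions, shrinking an object from 1000 to 800 bytes
raises the count from 4 to 5 per page, while shrinking from 1024 to 1000
bytes changes nothing except the amount of slack. A size reduction only
helps once it crosses such a boundary.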

^ permalink raw reply	[flat|nested] 172+ messages in thread

* Re: [PATCH v2 10/30] befs: Define usercopy region in befs_inode_cache slab cache
  2017-08-29 10:12     ` Luis de Bethencourt
  (?)
@ 2017-08-29 15:36       ` Kees Cook
  -1 siblings, 0 replies; 172+ messages in thread
From: Kees Cook @ 2017-08-29 15:36 UTC (permalink / raw)
  To: Luis de Bethencourt
  Cc: LKML, David Windsor, Salah Triki, Linux-MM, kernel-hardening

On Tue, Aug 29, 2017 at 3:12 AM, Luis de Bethencourt <luisbg@kernel.org> wrote:
> Hello Kees,
>
> This is great. Thanks :)
>
> Will merge into my befs tree.

Hi! Actually, this depends on the rest of the series, which should be
merged together. If you can Ack this, I'll include it in my usercopy
tree.

Thanks!

-Kees

>
> Luis
>
>
> On 08/28/2017 10:34 PM, Kees Cook wrote:
>>
>> From: David Windsor <dave@nullcore.net>
>>
>> befs symlink pathnames, stored in struct befs_inode_info.i_data.symlink
>> and therefore contained in the befs_inode_cache slab cache, need to be
>> copied to/from userspace.
>>
>> cache object allocation:
>>      fs/befs/linuxvfs.c:
>>          befs_alloc_inode(...):
>>              ...
>>              bi = kmem_cache_alloc(befs_inode_cachep, GFP_KERNEL);
>>              ...
>>              return &bi->vfs_inode;
>>
>>          befs_iget(...):
>>              ...
>>              strlcpy(befs_ino->i_data.symlink, raw_inode->data.symlink,
>>                      BEFS_SYMLINK_LEN);
>>              ...
>>              inode->i_link = befs_ino->i_data.symlink;
>>
>> example usage trace:
>>      readlink_copy+0x43/0x70
>>      vfs_readlink+0x62/0x110
>>      SyS_readlinkat+0x100/0x130
>>
>>      fs/namei.c:
>>          readlink_copy(..., link):
>>              ...
>>              copy_to_user(..., link, len);
>>
>>          (inlined in vfs_readlink)
>>          generic_readlink(dentry, ...):
>>              struct inode *inode = d_inode(dentry);
>>              const char *link = inode->i_link;
>>              ...
>>              readlink_copy(..., link);
>>
>> In support of usercopy hardening, this patch defines a region in the
>> befs_inode_cache slab cache in which userspace copy operations are
>> allowed.
>>
>> This region is known as the slab cache's usercopy region. Slab caches can
>> now check that each copy operation involving cache-managed memory falls
>> entirely within the slab's usercopy region.
>>
>> This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
>> whitelisting code in the last public patch of grsecurity/PaX based on my
>> understanding of the code. Changes or omissions from the original code are
>> mine and don't reflect the original grsecurity/PaX code.
>>
>> Signed-off-by: David Windsor <dave@nullcore.net>
>> [kees: adjust commit log, provide usage trace]
>> Cc: Luis de Bethencourt <luisbg@kernel.org>
>> Cc: Salah Triki <salah.triki@gmail.com>
>> Signed-off-by: Kees Cook <keescook@chromium.org>
>> ---
>>   fs/befs/linuxvfs.c | 14 +++++++++-----
>>   1 file changed, 9 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/befs/linuxvfs.c b/fs/befs/linuxvfs.c
>> index 4a4a5a366158..1c2dcbee79dd 100644
>> --- a/fs/befs/linuxvfs.c
>> +++ b/fs/befs/linuxvfs.c
>> @@ -444,11 +444,15 @@ static struct inode *befs_iget(struct super_block *sb, unsigned long ino)
>>   static int __init
>>   befs_init_inodecache(void)
>>   {
>> -       befs_inode_cachep = kmem_cache_create("befs_inode_cache",
>> -                                             sizeof (struct befs_inode_info),
>> -                                             0, (SLAB_RECLAIM_ACCOUNT|
>> -                                                     SLAB_MEM_SPREAD|SLAB_ACCOUNT),
>> -                                             init_once);
>> +       befs_inode_cachep = kmem_cache_create_usercopy("befs_inode_cache",
>> +                               sizeof(struct befs_inode_info), 0,
>> +                               (SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|
>> +                                       SLAB_ACCOUNT),
>> +                               offsetof(struct befs_inode_info,
>> +                                       i_data.symlink),
>> +                               sizeof_field(struct befs_inode_info,
>> +                                       i_data.symlink),
>> +                               init_once);
>>         if (befs_inode_cachep == NULL)
>>                 return -ENOMEM;
>>
>
>
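
The quoted patch passes an offset and a size to
kmem_cache_create_usercopy(); the check those two values enable can be
sketched in plain userspace C. This is a minimal sketch under stated
assumptions, not the kernel's implementation: struct toy_inode_info,
struct usercopy_region, usercopy_allowed(), and TOY_SYMLINK_LEN are
hypothetical names, and sizeof_field() is redefined locally as a
stand-in for the kernel helper of the same name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's sizeof_field() helper. */
#define sizeof_field(TYPE, MEMBER) sizeof(((TYPE *)0)->MEMBER)

#define TOY_SYMLINK_LEN 144 /* hypothetical; not the real BEFS_SYMLINK_LEN */

/* Hypothetical object layout: only i_data.symlink is meant for userspace. */
struct toy_inode_info {
	unsigned long i_flags;
	union {
		unsigned long blocks[8];
		char symlink[TOY_SYMLINK_LEN];
	} i_data;
	void *private_state;
};

/* The whitelisted window inside every object of the cache. */
struct usercopy_region {
	size_t offset;
	size_t size;
};

/* Same offsetof()/sizeof_field() pairing as in the quoted patch. */
static const struct usercopy_region toy_region = {
	.offset = offsetof(struct toy_inode_info, i_data.symlink),
	.size = sizeof_field(struct toy_inode_info, i_data.symlink),
};

/*
 * Return true when the n bytes starting at byte offset off within the
 * object lie entirely inside the region. Written to avoid overflow in
 * off + n.
 */
static bool usercopy_allowed(const struct usercopy_region *r,
			     size_t off, size_t n)
{
	if (off < r->offset)
		return false;
	off -= r->offset; /* offset relative to the region start */
	return off <= r->size && n <= r->size - off;
}
```

A copy that covers exactly the symlink buffer passes, while one that
starts at the beginning of the object or runs one byte past the buffer
is rejected.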



-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 172+ messages in thread

* Re: [PATCH v2 10/30] befs: Define usercopy region in befs_inode_cache slab cache
  2017-08-29 15:36       ` Kees Cook
  (?)
@ 2017-08-29 17:10         ` Luis de Bethencourt
  -1 siblings, 0 replies; 172+ messages in thread
From: Luis de Bethencourt @ 2017-08-29 17:10 UTC (permalink / raw)
  To: Kees Cook; +Cc: LKML, David Windsor, Salah Triki, Linux-MM, kernel-hardening

On 08/29/2017 04:36 PM, Kees Cook wrote:
> On Tue, Aug 29, 2017 at 3:12 AM, Luis de Bethencourt <luisbg@kernel.org> wrote:
>> Hello Kees,
>>
>> This is great. Thanks :)
>>
>> Will merge into my befs tree.
> 
> Hi! Actually, this depends on the rest of the series, which should be
> merged together. If you can Ack this, I'll include it in my usercopy
> tree.
> 
> Thanks!
> 
> -Kees
>

Sure!

Acked-by: Luis de Bethencourt <luisbg@kernel.org>

>>
>> Luis
>>
>>
>> On 08/28/2017 10:34 PM, Kees Cook wrote:
>>>
>>> From: David Windsor <dave@nullcore.net>
>>>
>>> befs symlink pathnames, stored in struct befs_inode_info.i_data.symlink
>>> and therefore contained in the befs_inode_cache slab cache, need to be
>>> copied to/from userspace.
>>>
>>> cache object allocation:
>>>       fs/befs/linuxvfs.c:
>>>           befs_alloc_inode(...):
>>>               ...
>>>               bi = kmem_cache_alloc(befs_inode_cachep, GFP_KERNEL);
>>>               ...
>>>               return &bi->vfs_inode;
>>>
>>>           befs_iget(...):
>>>               ...
>>>               strlcpy(befs_ino->i_data.symlink, raw_inode->data.symlink,
>>>                       BEFS_SYMLINK_LEN);
>>>               ...
>>>               inode->i_link = befs_ino->i_data.symlink;
>>>
>>> example usage trace:
>>>       readlink_copy+0x43/0x70
>>>       vfs_readlink+0x62/0x110
>>>       SyS_readlinkat+0x100/0x130
>>>
>>>       fs/namei.c:
>>>           readlink_copy(..., link):
>>>               ...
>>>               copy_to_user(..., link, len);
>>>
>>>           (inlined in vfs_readlink)
>>>           generic_readlink(dentry, ...):
>>>               struct inode *inode = d_inode(dentry);
>>>               const char *link = inode->i_link;
>>>               ...
>>>               readlink_copy(..., link);
>>>
>>> In support of usercopy hardening, this patch defines a region in the
>>> befs_inode_cache slab cache in which userspace copy operations are
>>> allowed.
>>>
>>> This region is known as the slab cache's usercopy region. Slab caches can
>>> now check that each copy operation involving cache-managed memory falls
>>> entirely within the slab's usercopy region.
>>>
>>> This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
>>> whitelisting code in the last public patch of grsecurity/PaX based on my
>>> understanding of the code. Changes or omissions from the original code are
>>> mine and don't reflect the original grsecurity/PaX code.
>>>
>>> Signed-off-by: David Windsor <dave@nullcore.net>
>>> [kees: adjust commit log, provide usage trace]
>>> Cc: Luis de Bethencourt <luisbg@kernel.org>
>>> Cc: Salah Triki <salah.triki@gmail.com>
>>> Signed-off-by: Kees Cook <keescook@chromium.org>
>>> ---
>>>    fs/befs/linuxvfs.c | 14 +++++++++-----
>>>    1 file changed, 9 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/fs/befs/linuxvfs.c b/fs/befs/linuxvfs.c
>>> index 4a4a5a366158..1c2dcbee79dd 100644
>>> --- a/fs/befs/linuxvfs.c
>>> +++ b/fs/befs/linuxvfs.c
>>> @@ -444,11 +444,15 @@ static struct inode *befs_iget(struct super_block *sb, unsigned long ino)
>>>    static int __init
>>>    befs_init_inodecache(void)
>>>    {
>>> -       befs_inode_cachep = kmem_cache_create("befs_inode_cache",
>>> -                                             sizeof (struct befs_inode_info),
>>> -                                             0, (SLAB_RECLAIM_ACCOUNT|
>>> -                                                     SLAB_MEM_SPREAD|SLAB_ACCOUNT),
>>> -                                             init_once);
>>> +       befs_inode_cachep = kmem_cache_create_usercopy("befs_inode_cache",
>>> +                               sizeof(struct befs_inode_info), 0,
>>> +                               (SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|
>>> +                                       SLAB_ACCOUNT),
>>> +                               offsetof(struct befs_inode_info,
>>> +                                       i_data.symlink),
>>> +                               sizeof_field(struct befs_inode_info,
>>> +                                       i_data.symlink),
>>> +                               init_once);
>>>          if (befs_inode_cachep == NULL)
>>>                  return -ENOMEM;
>>>
>>
>>
> 
> 
> 

^ permalink raw reply	[flat|nested] 172+ messages in thread

* [kernel-hardening] Re: [PATCH v2 10/30] befs: Define usercopy region in befs_inode_cache slab cache
@ 2017-08-29 17:10         ` Luis de Bethencourt
  0 siblings, 0 replies; 172+ messages in thread
From: Luis de Bethencourt @ 2017-08-29 17:10 UTC (permalink / raw)
  To: Kees Cook; +Cc: LKML, David Windsor, Salah Triki, Linux-MM, kernel-hardening

On 08/29/2017 04:36 PM, Kees Cook wrote:
> On Tue, Aug 29, 2017 at 3:12 AM, Luis de Bethencourt <luisbg@kernel.org> wrote:
>> Hello Kees,
>>
>> This is great. Thanks :)
>>
>> Will merge into my befs tree.
> 
> Hi! Actually, this depends on the rest of the series, which should be
> merged together. If you can Ack this, I'll include it in my usercopy
> tree.
> 
> Thanks!
> 
> -Kees
>

Sure!

Acked-by: Luis de Bethencourt <luisbg@kernel.org>

>>
>> Luis
>>
>>
>> On 08/28/2017 10:34 PM, Kees Cook wrote:
>>>
>>> From: David Windsor <dave@nullcore.net>
>>>
>>> befs symlink pathnames, stored in struct befs_inode_info.i_data.symlink
>>> and therefore contained in the befs_inode_cache slab cache, need to be
>>> copied to/from userspace.
>>>
>>> cache object allocation:
>>>       fs/befs/linuxvfs.c:
>>>           befs_alloc_inode(...):
>>>               ...
>>>               bi = kmem_cache_alloc(befs_inode_cachep, GFP_KERNEL);
>>>               ...
>>>               return &bi->vfs_inode;
>>>
>>>           befs_iget(...):
>>>               ...
>>>               strlcpy(befs_ino->i_data.symlink, raw_inode->data.symlink,
>>>                       BEFS_SYMLINK_LEN);
>>>               ...
>>>               inode->i_link = befs_ino->i_data.symlink;
>>>
>>> example usage trace:
>>>       readlink_copy+0x43/0x70
>>>       vfs_readlink+0x62/0x110
>>>       SyS_readlinkat+0x100/0x130
>>>
>>>       fs/namei.c:
>>>           readlink_copy(..., link):
>>>               ...
>>>               copy_to_user(..., link, len);
>>>
>>>           (inlined in vfs_readlink)
>>>           generic_readlink(dentry, ...):
>>>               struct inode *inode = d_inode(dentry);
>>>               const char *link = inode->i_link;
>>>               ...
>>>               readlink_copy(..., link);
>>>
>>> In support of usercopy hardening, this patch defines a region in the
>>> befs_inode_cache slab cache in which userspace copy operations are
>>> allowed.
>>>
>>> This region is known as the slab cache's usercopy region. Slab caches can
>>> now check that each copy operation involving cache-managed memory falls
>>> entirely within the slab's usercopy region.
>>>
>>> This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
>>> whitelisting code in the last public patch of grsecurity/PaX based on my
>>> understanding of the code. Changes or omissions from the original code are
>>> mine and don't reflect the original grsecurity/PaX code.
>>>
>>> Signed-off-by: David Windsor <dave@nullcore.net>
>>> [kees: adjust commit log, provide usage trace]
>>> Cc: Luis de Bethencourt <luisbg@kernel.org>
>>> Cc: Salah Triki <salah.triki@gmail.com>
>>> Signed-off-by: Kees Cook <keescook@chromium.org>
>>> ---
>>>    fs/befs/linuxvfs.c | 14 +++++++++-----
>>>    1 file changed, 9 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/fs/befs/linuxvfs.c b/fs/befs/linuxvfs.c
>>> index 4a4a5a366158..1c2dcbee79dd 100644
>>> --- a/fs/befs/linuxvfs.c
>>> +++ b/fs/befs/linuxvfs.c
>>> @@ -444,11 +444,15 @@ static struct inode *befs_iget(struct super_block *sb, unsigned long ino)
>>>    static int __init
>>>    befs_init_inodecache(void)
>>>    {
>>> -       befs_inode_cachep = kmem_cache_create("befs_inode_cache",
>>> -                                             sizeof (struct befs_inode_info),
>>> -                                             0, (SLAB_RECLAIM_ACCOUNT|
>>> -                                               SLAB_MEM_SPREAD|SLAB_ACCOUNT),
>>> -                                             init_once);
>>> +       befs_inode_cachep = kmem_cache_create_usercopy("befs_inode_cache",
>>> +                               sizeof(struct befs_inode_info), 0,
>>> +                               (SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|
>>> +                                       SLAB_ACCOUNT),
>>> +                               offsetof(struct befs_inode_info,
>>> +                                       i_data.symlink),
>>> +                               sizeof_field(struct befs_inode_info,
>>> +                                       i_data.symlink),
>>> +                               init_once);
>>>          if (befs_inode_cachep == NULL)
>>>                  return -ENOMEM;
>>>
>>
>>
> 
> 
> 

^ permalink raw reply	[flat|nested] 172+ messages in thread

* Re: [PATCH v2 15/30] xfs: Define usercopy region in xfs_inode slab cache
  2017-08-29  4:47         ` Darrick J. Wong
  (?)
  (?)
@ 2017-08-29 18:48           ` Kees Cook
  -1 siblings, 0 replies; 172+ messages in thread
From: Kees Cook @ 2017-08-29 18:48 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: LKML, David Windsor, linux-xfs, Linux-MM, kernel-hardening

On Mon, Aug 28, 2017 at 9:47 PM, Darrick J. Wong
<darrick.wong@oracle.com> wrote:
> On Mon, Aug 28, 2017 at 02:57:14PM -0700, Kees Cook wrote:
>> On Mon, Aug 28, 2017 at 2:49 PM, Darrick J. Wong
>> <darrick.wong@oracle.com> wrote:
>> > On Mon, Aug 28, 2017 at 02:34:56PM -0700, Kees Cook wrote:
>> >> From: David Windsor <dave@nullcore.net>
>> >>
>> >> The XFS inline inode data, stored in struct xfs_inode_t field
>> >> i_df.if_u2.if_inline_data and therefore contained in the xfs_inode slab
>> >> cache, needs to be copied to/from userspace.
>> >>
>> >> cache object allocation:
>> >>     fs/xfs/xfs_icache.c:
>> >>         xfs_inode_alloc(...):
>> >>             ...
>> >>             ip = kmem_zone_alloc(xfs_inode_zone, KM_SLEEP);
>> >>
>> >>     fs/xfs/libxfs/xfs_inode_fork.c:
>> >>         xfs_init_local_fork(...):
>> >>             ...
>> >>             if (mem_size <= sizeof(ifp->if_u2.if_inline_data))
>> >>                     ifp->if_u1.if_data = ifp->if_u2.if_inline_data;
>> >
>> > Hmm, what happens when mem_size > sizeof(if_inline_data)?  A slab object
>> > will be allocated for ifp->if_u1.if_data which can then be used for
>> > readlink in the same manner as the example usage trace below.  Does
>> > that allocated object have a need for a usercopy annotation like
>> > the one we're adding for if_inline_data?  Or is that already covered
>> > elsewhere?
>>
>> Yeah, the xfs helper kmem_alloc() is used in the other case, which
>> ultimately boils down to a call to kmalloc(), which is entirely
>> whitelisted by an earlier patch in the series:
>>
>> https://lkml.org/lkml/2017/8/28/1026
>
> Ah.  It would've been helpful to have the first three patches cc'd to
> the xfs list.  So basically this series establishes the ability to set

I went back and forth on that, and given all the things it touched, it
seemed like too large a CC list. :) I can explicitly add the xfs list
to the first three for any future versions.

> regions within a slab object into which copy_to_user can copy memory
> contents, and vice versa.  Have you seen any runtime performance impact?
> The overhead looks like it ought to be minimal.

Under CONFIG_HARDENED_USERCOPY, there's no difference in performance
between the earlier bounds checking (of the whole slab object) vs the
new bounds checking (of the useroffset/usersize portion of the slab
object). The performance cost of CONFIG_HARDENED_USERCOPY itself has
proven hard to measure, which likely means it is minimal.

>> (It's possible that at some future time we can start segregating
>> kernel-only kmallocs from usercopy-able kmallocs, but for now, there
>> are no plans for this.)
>
> A pity.  It would be interesting to create no-usercopy versions of the
> kmalloc-* slabs and see how much of XFS' memory consumption never
> touches userspace buffers. :)

There are plans for building either a new helper (kmalloc_usercopy())
or adding a new flag (GFP_USERCOPY), but I haven't had time yet to
come back around to it. I wanted to land this step first, and we could
then move forward on the rest in the future.

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 172+ messages in thread

* Re: [PATCH v2 15/30] xfs: Define usercopy region in xfs_inode slab cache
  2017-08-29  8:14     ` Christoph Hellwig
  (?)
  (?)
@ 2017-08-29 18:55       ` Kees Cook
  -1 siblings, 0 replies; 172+ messages in thread
From: Kees Cook @ 2017-08-29 18:55 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: LKML, David Windsor, Darrick J. Wong, linux-xfs, Linux-MM,
	kernel-hardening

On Tue, Aug 29, 2017 at 1:14 AM, Christoph Hellwig <hch@infradead.org> wrote:
> One thing I've been wondering is whether we should actually just
> get rid of the inline area.  Compared to reading an inode from
> disk a single additional kmalloc is negligible, and not having the
> inline data / extent list would allow us to reduce the inode size
> significantly.
>
> Kees/David:  how many of these patches are file systems with some
> sort of inline data?  Given that it's only about 30 patches declaring
> allocations either entirely valid for user copy or not might end up
> being nicer in many ways than these offsets.

9 filesystems use some form of inline data: xfs, vxfs, ufs, orangefs,
exofs, befs, jfs, ext2, and ext4. How much of each slab is whitelisted
varies by filesystem (e.g. ext2/4 use i_data for other things, but
ufs and orangefs have a dedicated field for symlink names).

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 172+ messages in thread

* Re: [PATCH v2 15/30] xfs: Define usercopy region in xfs_inode slab cache
  2017-08-29 18:48           ` Kees Cook
  (?)
  (?)
@ 2017-08-29 19:00             ` Darrick J. Wong
  -1 siblings, 0 replies; 172+ messages in thread
From: Darrick J. Wong @ 2017-08-29 19:00 UTC (permalink / raw)
  To: Kees Cook; +Cc: LKML, David Windsor, linux-xfs, Linux-MM, kernel-hardening

On Tue, Aug 29, 2017 at 11:48:49AM -0700, Kees Cook wrote:
> On Mon, Aug 28, 2017 at 9:47 PM, Darrick J. Wong
> <darrick.wong@oracle.com> wrote:
> > On Mon, Aug 28, 2017 at 02:57:14PM -0700, Kees Cook wrote:
> >> On Mon, Aug 28, 2017 at 2:49 PM, Darrick J. Wong
> >> <darrick.wong@oracle.com> wrote:
> >> > On Mon, Aug 28, 2017 at 02:34:56PM -0700, Kees Cook wrote:
> >> >> From: David Windsor <dave@nullcore.net>
> >> >>
> >> >> The XFS inline inode data, stored in struct xfs_inode_t field
> >> >> i_df.if_u2.if_inline_data and therefore contained in the xfs_inode slab
> >> >> cache, needs to be copied to/from userspace.
> >> >>
> >> >> cache object allocation:
> >> >>     fs/xfs/xfs_icache.c:
> >> >>         xfs_inode_alloc(...):
> >> >>             ...
> >> >>             ip = kmem_zone_alloc(xfs_inode_zone, KM_SLEEP);
> >> >>
> >> >>     fs/xfs/libxfs/xfs_inode_fork.c:
> >> >>         xfs_init_local_fork(...):
> >> >>             ...
> >> >>             if (mem_size <= sizeof(ifp->if_u2.if_inline_data))
> >> >>                     ifp->if_u1.if_data = ifp->if_u2.if_inline_data;
> >> >
> >> > Hmm, what happens when mem_size > sizeof(if_inline_data)?  A slab object
> >> > will be allocated for ifp->if_u1.if_data which can then be used for
> >> > readlink in the same manner as the example usage trace below.  Does
> >> > that allocated object have a need for a usercopy annotation like
> >> > the one we're adding for if_inline_data?  Or is that already covered
> >> > elsewhere?
> >>
> >> Yeah, the xfs helper kmem_alloc() is used in the other case, which
> >> ultimately boils down to a call to kmalloc(), which is entirely
> >> whitelisted by an earlier patch in the series:
> >>
> >> https://lkml.org/lkml/2017/8/28/1026
> >
> > Ah.  It would've been helpful to have the first three patches cc'd to
> > the xfs list.  So basically this series establishes the ability to set
> 
> I went back and forth on that, and given all the things it touched, it
> seemed like too large a CC list. :) I can explicitly add the xfs list
> to the first three for any future versions.
> 
> > regions within a slab object into which copy_to_user can copy memory
> > contents, and vice versa.  Have you seen any runtime performance impact?
> > The overhead looks like it ought to be minimal.
> 
> Under CONFIG_HARDENED_USERCOPY, there's no difference in performance
> between the earlier bounds checking (of the whole slab object) vs the
> new bounds checking (of the useroffset/usersize portion of the slab
> object). Perf difference of CONFIG_HARDENED_USERCOPY itself has proven
> hard to measure, which likely means it's very minimal.
> 
> >> (It's possible that at some future time we can start segregating
> >> kernel-only kmallocs from usercopy-able kmallocs, but for now, there
> >> are no plans for this.)
> >
> > A pity.  It would be interesting to create no-usercopy versions of the
> > kmalloc-* slabs and see how much of XFS' memory consumption never
> > touches userspace buffers. :)
> 
> There are plans for building either a new helper (kmalloc_usercopy())
> or adding a new flag (GFP_USERCOPY), but I haven't had time yet to
> come back around to it. I wanted to land this step first, and we could
> then move forward on the rest in future.

Heh, fair enough.

For the XFS bits,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> 
> -Kees
> 
> -- 
> Kees Cook
> Pixel Security

^ permalink raw reply	[flat|nested] 172+ messages in thread

* Re: [PATCH v2 15/30] xfs: Define usercopy region in xfs_inode slab cache
@ 2017-08-29 19:00             ` Darrick J. Wong
  0 siblings, 0 replies; 172+ messages in thread
From: Darrick J. Wong @ 2017-08-29 19:00 UTC (permalink / raw)
  To: Kees Cook; +Cc: LKML, David Windsor, linux-xfs, Linux-MM, kernel-hardening

On Tue, Aug 29, 2017 at 11:48:49AM -0700, Kees Cook wrote:
> On Mon, Aug 28, 2017 at 9:47 PM, Darrick J. Wong
> <darrick.wong@oracle.com> wrote:
> > On Mon, Aug 28, 2017 at 02:57:14PM -0700, Kees Cook wrote:
> >> On Mon, Aug 28, 2017 at 2:49 PM, Darrick J. Wong
> >> <darrick.wong@oracle.com> wrote:
> >> > On Mon, Aug 28, 2017 at 02:34:56PM -0700, Kees Cook wrote:
> >> >> From: David Windsor <dave@nullcore.net>
> >> >>
> >> >> The XFS inline inode data, stored in struct xfs_inode_t field
> >> >> i_df.if_u2.if_inline_data and therefore contained in the xfs_inode slab
> >> >> cache, needs to be copied to/from userspace.
> >> >>
> >> >> cache object allocation:
> >> >>     fs/xfs/xfs_icache.c:
> >> >>         xfs_inode_alloc(...):
> >> >>             ...
> >> >>             ip = kmem_zone_alloc(xfs_inode_zone, KM_SLEEP);
> >> >>
> >> >>     fs/xfs/libxfs/xfs_inode_fork.c:
> >> >>         xfs_init_local_fork(...):
> >> >>             ...
> >> >>             if (mem_size <= sizeof(ifp->if_u2.if_inline_data))
> >> >>                     ifp->if_u1.if_data = ifp->if_u2.if_inline_data;
> >> >
> >> > Hmm, what happens when mem_size > sizeof(if_inline_data)?  A slab object
> >> > will be allocated for ifp->if_u1.if_data which can then be used for
> >> > readlink in the same manner as the example usage trace below.  Does
> >> > that allocated object have a need for a usercopy annotation like
> >> > the one we're adding for if_inline_data?  Or is that already covered
> >> > elsewhere?
> >>
> >> Yeah, the xfs helper kmem_alloc() is used in the other case, which
> >> ultimately boils down to a call to kmalloc(), which is entirely
> >> whitelisted by an earlier patch in the series:
> >>
> >> https://lkml.org/lkml/2017/8/28/1026
> >
> > Ah.  It would've been helpful to have the first three patches cc'd to
> > the xfs list.  So basically this series establishes the ability to set
> 
> I went back and forth on that, and given all the things it touched, it
> seemed like too large a CC list. :) I can explicitly add the xfs list
> to the first three for any future versions.
> 
> > regions within a slab object into which copy_to_user can copy memory
> > contents, and vice versa.  Have you seen any runtime performance impact?
> > The overhead looks like it ought to be minimal.
> 
> Under CONFIG_HARDENED_USERCOPY, there's no difference in performance
> between the earlier bounds checking (of the whole slab object) vs the
> new bounds checking (of the useroffset/usersize portion of the slab
> object). Perf difference of CONFIG_HARDENED_USERCOPY itself has proven
> hard to measure, which likely means it's very minimal.
> 
> >> (It's possible that at some future time we can start segregating
> >> kernel-only kmallocs from usercopy-able kmallocs, but for now, there
> >> are no plans for this.)
> >
> > A pity.  It would be interesting to create no-usercopy versions of the
> > kmalloc-* slabs and see how much of XFS' memory consumption never
> > touches userspace buffers. :)
> 
> There are plans for building either a new helper (kmalloc_usercopy())
> or adding a new flag (GFP_USERCOPY), but I haven't had time yet to
> come back around to it. I wanted to land this step first, and we could
> then move forward on the rest in future.

Heh, fair enough.

For the XFS bits,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> 
> -Kees
> 
> -- 
> Kees Cook
> Pixel Security
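
The region check discussed above can be modelled in userspace as a
minimal sketch. The struct and function names here are hypothetical and
are not the kernel's; only the useroffset/usersize idea comes from the
thread. A copy is accepted only if it lies entirely inside the
whitelisted span of the object, which costs about the same as checking
against the whole object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a slab cache's usercopy whitelist. */
struct cache_model {
	size_t object_size;	/* size of each object in the cache */
	size_t useroffset;	/* start of the whitelisted region */
	size_t usersize;	/* length of the region; 0 means no whitelist */
};

/*
 * Return true if a copy of n bytes starting at byte `offset` within one
 * object stays inside the whitelisted region.
 */
static bool usercopy_allowed(const struct cache_model *c, size_t offset,
			     size_t n)
{
	size_t rel;

	/* The whitelisted region must itself fit inside the object. */
	assert(c->useroffset <= c->object_size);
	assert(c->usersize <= c->object_size - c->useroffset);

	if (offset < c->useroffset)
		return false;

	/* Subtract first so the comparisons below cannot overflow. */
	rel = offset - c->useroffset;
	if (rel > c->usersize)
		return false;
	return n <= c->usersize - rel;
}
```

With useroffset = 64 and usersize = 32, a 16-byte copy at offset 80 is
accepted and a 17-byte copy at the same offset is rejected. A cache with
usersize = 0 rejects every non-empty copy.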


^ permalink raw reply	[flat|nested] 172+ messages in thread

* Re: [PATCH v2 15/30] xfs: Define usercopy region in xfs_inode slab cache
  2017-08-29 12:45         ` Christoph Hellwig
  (?)
@ 2017-08-29 21:51           ` Dave Chinner
  -1 siblings, 0 replies; 172+ messages in thread
From: Dave Chinner @ 2017-08-29 21:51 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Kees Cook, linux-kernel, David Windsor, Darrick J. Wong,
	linux-xfs, linux-mm, kernel-hardening

On Tue, Aug 29, 2017 at 05:45:36AM -0700, Christoph Hellwig wrote:
> On Tue, Aug 29, 2017 at 10:31:26PM +1000, Dave Chinner wrote:
> > Probably should.  I've already been looking at killing the inline
> > extents array to simplify the management of the extent list (much
> > simpler to index by rbtree when we don't have direct/indirect
> > structures), so killing the inline data would get rid of the other
> > part of the union the inline data sits in.
> 
> That's exactly where I came from with my extent list work.  Although
> the rbtree performance was horrible due to the memory overhead and
> I've switched to a modified b+tree at the moment..

Right, I've looked at btrees, too, but it's more complex than just
using an rbtree. I originally looked at using Peter Z's old
RCU-aware btree code, but it doesn't hold data in the tree leaves.
So that needed significant modification to make work without a
memory alloc per extent and that didn't work with original aim of
RCU-safe extent lookups.  I also looked at that "generic" btree
stuff that came from logfs, and after a little while ran away
screaming. So if we are going to use a b+tree, it sounds like you
are probably going the right way.

As it is, I've been looking at using an interval tree - I have kinda
working code - which basically leaves the page based extent arrays
intact but adds an rbnode/interval state header to the start of each
page to track the offsets within the node and propagate them back up
to the root for fast offset based extent lookups. With a lookaside
cache on the root, it should behave and perform almost identically
to the current indirect array and should have very little extra
overhead....

The sticking point, IMO, is the extent array index based lookups in
all the bmbt code.  I've been looking at converting all that to use
offset based lookups and a cursor w/ lookup/inc/dec/insert/delete
operations wrapping xfs_iext_lookup_ext() and friends. This means
the modifications are pretty much identical to the on-disk extent
btree, so they can be abstracted out into a single extent update
interface for both trees.  Have you planned/done any cleanup/changes
with this code?
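
A minimal sketch of what an offset-based cursor could look like,
assuming a sorted, non-overlapping extent array as the backing store.
All names are hypothetical and not taken from the XFS code, and the
insert/delete operations are left out. Callers only ever pass a file
offset or step the cursor, so the backing store could later become a
tree without changing them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory extent: covers [startoff, startoff + len). */
struct ext {
	uint64_t startoff;
	uint64_t len;
};

/* Extents sorted by startoff, non-overlapping. */
struct ext_list {
	const struct ext *recs;
	size_t count;
};

/* pos == list->count means the cursor is past the last extent. */
struct ext_cursor {
	const struct ext_list *list;
	size_t pos;
};

/*
 * Point the cursor at the extent containing `off`, or at the first
 * extent after it. Returns false if no such extent exists.
 */
static bool ext_lookup(const struct ext_list *l, uint64_t off,
		       struct ext_cursor *cur)
{
	size_t lo = 0, hi;

	assert(l != NULL && cur != NULL);
	hi = l->count;

	/* Binary search for the first extent that ends after `off`. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (l->recs[mid].startoff + l->recs[mid].len <= off)
			lo = mid + 1;
		else
			hi = mid;
	}
	cur->list = l;
	cur->pos = lo;
	return lo < l->count;
}

/* Step forward; returns false once the cursor runs off the end. */
static bool ext_next(struct ext_cursor *cur)
{
	if (cur->pos < cur->list->count)
		cur->pos++;
	return cur->pos < cur->list->count;
}

/* Step backward; returns false if already at the first extent. */
static bool ext_prev(struct ext_cursor *cur)
{
	if (cur->pos == 0)
		return false;
	cur->pos--;
	return true;
}

/* Current extent, or NULL if the cursor is past the end. */
static const struct ext *ext_get(const struct ext_cursor *cur)
{
	if (cur->pos >= cur->list->count)
		return NULL;
	return &cur->list->recs[cur->pos];
}
```

A lookup that lands in a hole returns the next extent, so the caller
compares the returned startoff against the requested offset to tell a
hit from a hole.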

> > OTOH, if we're going to have to dynamically allocate the memory for
> > the extent/inline data for the data fork, it may just be easier to
> > make the entire data fork a dynamic allocation (like the attr fork).
> 
> I thought about this a bit, but it turned out that we basically
> always need the data anyway, so I don't think it's going to buy
> us much unless we shrink the inode enough so that they better fit
> into a page.

True. Keep it in mind for when we've shrunk the inode by another
100 bytes...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 172+ messages in thread

* Re: [PATCH v2 15/30] xfs: Define usercopy region in xfs_inode slab cache
  2017-08-29 18:48           ` Kees Cook
  (?)
  (?)
@ 2017-08-29 22:15             ` Dave Chinner
  -1 siblings, 0 replies; 172+ messages in thread
From: Dave Chinner @ 2017-08-29 22:15 UTC (permalink / raw)
  To: Kees Cook
  Cc: Darrick J. Wong, LKML, David Windsor, linux-xfs, Linux-MM,
	kernel-hardening

On Tue, Aug 29, 2017 at 11:48:49AM -0700, Kees Cook wrote:
> On Mon, Aug 28, 2017 at 9:47 PM, Darrick J. Wong
> <darrick.wong@oracle.com> wrote:
> > On Mon, Aug 28, 2017 at 02:57:14PM -0700, Kees Cook wrote:
> >> On Mon, Aug 28, 2017 at 2:49 PM, Darrick J. Wong
> >> <darrick.wong@oracle.com> wrote:
> >> > On Mon, Aug 28, 2017 at 02:34:56PM -0700, Kees Cook wrote:
> >> >> From: David Windsor <dave@nullcore.net>
> >> >>
> >> >> The XFS inline inode data, stored in struct xfs_inode_t field
> >> >> i_df.if_u2.if_inline_data and therefore contained in the xfs_inode slab
> >> >> cache, needs to be copied to/from userspace.
> >> >>
> >> >> cache object allocation:
> >> >>     fs/xfs/xfs_icache.c:
> >> >>         xfs_inode_alloc(...):
> >> >>             ...
> >> >>             ip = kmem_zone_alloc(xfs_inode_zone, KM_SLEEP);
> >> >>
> >> >>     fs/xfs/libxfs/xfs_inode_fork.c:
> >> >>         xfs_init_local_fork(...):
> >> >>             ...
> >> >>             if (mem_size <= sizeof(ifp->if_u2.if_inline_data))
> >> >>                     ifp->if_u1.if_data = ifp->if_u2.if_inline_data;
> >> >
> >> > Hmm, what happens when mem_size > sizeof(if_inline_data)?  A slab object
> >> > will be allocated for ifp->if_u1.if_data which can then be used for
> >> > readlink in the same manner as the example usage trace below.  Does
> >> > that allocated object have a need for a usercopy annotation like
> >> > the one we're adding for if_inline_data?  Or is that already covered
> >> > elsewhere?
> >>
> >> Yeah, the xfs helper kmem_alloc() is used in the other case, which
> >> ultimately boils down to a call to kmalloc(), which is entirely
> >> whitelisted by an earlier patch in the series:
> >>
> >> https://lkml.org/lkml/2017/8/28/1026
> >
> > Ah.  It would've been helpful to have the first three patches cc'd to
> > the xfs list.  So basically this series establishes the ability to set
> 
> I went back and forth on that, and given all the things it touched, it
> seemed like too large a CC list. :) I can explicitly add the xfs list
> to the first three for any future versions.

If you are touching multiple filesystems, you really should cc the
entire patchset to linux-fsdevel, similar to how you sent the entire
patchset to lkml. That way the entire series will end up on a list
that almost all fs developers read. LKML is not a list you can rely
on all filesystem developers reading (or developers in any other
subsystem, for that matter)...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 172+ messages in thread

* Re: [PATCH v2 15/30] xfs: Define usercopy region in xfs_inode slab cache
  2017-08-29 22:15             ` Dave Chinner
  (?)
  (?)
@ 2017-08-29 22:25               ` Kees Cook
  -1 siblings, 0 replies; 172+ messages in thread
From: Kees Cook @ 2017-08-29 22:25 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Darrick J. Wong, LKML, David Windsor, linux-xfs, Linux-MM,
	kernel-hardening

On Tue, Aug 29, 2017 at 3:15 PM, Dave Chinner <david@fromorbit.com> wrote:
> If you are touching multiple filesystems, you really should cc the
> entire patchset to linux-fsdevel, similar to how you sent the entire
> patchset to lkml. That way the entire series will end up on a list
> that almost all fs developers read. LKML is not a list you can rely
> on all filesystem developers reading (or developers in any other
> subsystem, for that matter)...

Okay, sounds good. Thanks!

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 172+ messages in thread

* Re: [PATCH v2 15/30] xfs: Define usercopy region in xfs_inode slab cache
  2017-08-29 21:51           ` Dave Chinner
  (?)
@ 2017-08-30  7:14             ` Christoph Hellwig
  -1 siblings, 0 replies; 172+ messages in thread
From: Christoph Hellwig @ 2017-08-30  7:14 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christoph Hellwig, Kees Cook, linux-kernel, David Windsor,
	Darrick J. Wong, linux-xfs, linux-mm, kernel-hardening

On Wed, Aug 30, 2017 at 07:51:57AM +1000, Dave Chinner wrote:
> Right, I've looked at btrees, too, but it's more complex than just
> using an rbtree. I originally looked at using Peter Z's old
> RCU-aware btree code, but it doesn't hold data in the tree leaves.
> So that needed significant modification to make work without a
> memory alloc per extent and that didn't work with original aim of
> RCU-safe extent lookups.  I also looked at that "generic" btree
> stuff that came from logfs, and after a little while ran away
> screaming.

I started with the latter, but it's not really looking like it any more:
the nodes are formatted as a series of u64s instead of all the
long magic, and the data is stored inline. In fact I use a cute
trick to keep the size down, derived from our "compressed" on-disk
extent format:

Key:

 +-------+----------------------------+
 | 00:51 | all 52 bits of startoff    |
 | 52:63 | low 12 bits of startblock  |
 +-------+----------------------------+

Value:

 +-------+----------------------------+
 | 00:20 | all 21 bits of length      |
 |    21 | unwritten extent bit       |
 | 22:63 | high 42 bits of startblock |
 +-------+----------------------------+

So we only need a 64-bit key and a 64-bit value by abusing parts
of the key to store bits of the startblock.
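A minimal sketch of the packing described above, in standalone C. It
assumes that bit 0 in the tables is the most significant bit (so that
comparing keys orders records by startoff first); the struct and
function names are hypothetical and not taken from any posted patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical unpacked extent record; field names are illustrative. */
struct ext_irec {
	uint64_t startoff;	/* file offset, 52 bits used */
	uint64_t startblock;	/* disk block, 54 bits used (12 low + 42 high) */
	uint32_t len;		/* extent length, 21 bits used */
	bool     unwritten;	/* unwritten extent bit */
};

struct ext_packed {
	uint64_t key;
	uint64_t val;
};

#define SB_LOW_BITS	12	/* low startblock bits kept in the key */
#define SB_HIGH_BITS	42	/* high startblock bits kept in the value */
#define LEN_SHIFT	43	/* 64 - 21: length sits in the top 21 bits */
#define UNWRITTEN_SHIFT	42
#define MASK(n)		((UINT64_C(1) << (n)) - 1)

static struct ext_packed ext_pack(const struct ext_irec *r)
{
	struct ext_packed p;

	/* Assumed limits, derived from the bit widths in the tables. */
	assert(r->startoff <= MASK(52));
	assert(r->startblock <= MASK(SB_LOW_BITS + SB_HIGH_BITS));
	assert(r->len <= MASK(21));

	p.key = (r->startoff << SB_LOW_BITS) |
		(r->startblock & MASK(SB_LOW_BITS));
	p.val = ((uint64_t)r->len << LEN_SHIFT) |
		((uint64_t)r->unwritten << UNWRITTEN_SHIFT) |
		(r->startblock >> SB_LOW_BITS);
	return p;
}

static struct ext_irec ext_unpack(struct ext_packed p)
{
	struct ext_irec r;

	r.startoff = p.key >> SB_LOW_BITS;
	r.startblock = ((p.val & MASK(SB_HIGH_BITS)) << SB_LOW_BITS) |
		       (p.key & MASK(SB_LOW_BITS));
	r.len = (uint32_t)(p.val >> LEN_SHIFT);
	r.unwritten = (p.val >> UNWRITTEN_SHIFT) & 1;
	return r;
}
```

With this layout the low 12 bits of the key are noise for ordering
purposes, but since startoff occupies the high bits, two records with
different startoff values still compare correctly as plain u64s.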

For non-leaf nodes we iterate through the keys only, never touching
the cache lines for the value.  For the leaf nodes we have to touch
the value anyway because we have to do a range lookup to find the
exact record.
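To illustrate the cache-line point, here is a rough sketch of a node
with keys and values in separate arrays. The fan-out, the names, and
the linear scan are assumptions made for illustration; the bit layout
assumes startoff sits in the top 52 bits of the key and the length in
the top 21 bits of the value.

```c
#include <assert.h>
#include <stdint.h>

#define NODE_RECS	15	/* hypothetical fan-out */
#define KEY_OFF_SHIFT	12	/* startoff lives above the low 12 key bits */
#define VAL_LEN_SHIFT	43	/* length lives in the top 21 value bits */

struct node {
	int      nrecs;
	uint64_t keys[NODE_RECS];	/* contiguous: scanned at every level */
	uint64_t vals[NODE_RECS];	/* only read in leaf nodes */
};

/*
 * Return the index of the last record whose startoff is <= off, or -1
 * if every record starts after off.  Only the keys array is touched.
 */
static int node_find_le(const struct node *n, uint64_t off)
{
	int i, found = -1;

	for (i = 0; i < n->nrecs; i++) {
		if ((n->keys[i] >> KEY_OFF_SHIFT) > off)
			break;
		found = i;
	}
	return found;
}

/*
 * Leaf-only range check: does record idx actually cover off?  This is
 * where the value has to be read, because the length is stored there.
 */
static int leaf_rec_contains(const struct node *n, int idx, uint64_t off)
{
	uint64_t start = n->keys[idx] >> KEY_OFF_SHIFT;
	uint64_t len = n->vals[idx] >> VAL_LEN_SHIFT;

	return off >= start && off - start < len;
}
```

In an inner node only node_find_le() would run, so the vals array
(child pointers in a real tree, not modelled here) is read once for the
chosen slot rather than for every comparison.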

This works fine so far in an isolated simulator, and now I'm amending
it to be a b+tree with pointers to the previous and next node so
that we can nicely implement our extent iterators instead of doing
full lookups.

> The sticking point, IMO, is the extent array index based lookups in
> all the bmbt code.  I've been looking at converting all that to use
> offset based lookups and a cursor w/ lookup/inc/dec/insert/delete
> operations wrapping xfs_iext_lookup_ext() and friends. This means
> the modifications are pretty much identical to the on-disk extent
> btree, so they can be abstracted out into a single extent update
> interface for both trees.  Have you planned/done any cleanup/changes
> with this code?

I've done various cleanups, but I've not yet consolidated the two.
Basically step one at the moment is to move everyone to
xfs_iext_lookup_extent + xfs_iext_get_extent, which removes all the
bad intrusion.

Once we move to the actual b+trees the extnum_t cursor will be replaced
with a real cursor structure that contains a pointer to the current
b+tree leaf node, and an index inside that, which will allow us very
efficient iteration.  The xfs_iext_get_extent calls will be replaced
with more specific xfs_iext_prev_extent, xfs_iext_next_extent calls
that include the now slightly more complex cursor decrement, increment
as well as a new xfs_iext_last_extent helper for the last extent
that we need in a few places.
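A rough sketch of what such a cursor could look like, assuming leaves
are doubly linked and never empty. The names below (ext_cursor,
cursor_next, and so on) are hypothetical stand-ins, not the
xfs_iext_* helpers themselves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LEAF_RECS	4	/* hypothetical, small for illustration */

struct leaf {
	struct leaf *prev, *next;	/* sibling links between leaves */
	int          nrecs;
	uint64_t     keys[LEAF_RECS];
	uint64_t     vals[LEAF_RECS];
};

struct ext_cursor {
	struct leaf *leaf;	/* NULL once we walk off either end */
	int          pos;	/* index inside leaf */
};

static int cursor_valid(const struct ext_cursor *c)
{
	return c->leaf && c->pos >= 0 && c->pos < c->leaf->nrecs;
}

/* Step right; crosses into the next leaf without a tree lookup. */
static void cursor_next(struct ext_cursor *c)
{
	if (!c->leaf)
		return;
	if (++c->pos >= c->leaf->nrecs) {
		c->leaf = c->leaf->next;
		c->pos = 0;
	}
}

/* Step left; crosses into the previous leaf without a tree lookup. */
static void cursor_prev(struct ext_cursor *c)
{
	if (!c->leaf)
		return;
	if (--c->pos < 0) {
		c->leaf = c->leaf->prev;
		c->pos = c->leaf ? c->leaf->nrecs - 1 : 0;
	}
}

/* Position on the last record, starting from any leaf in the chain. */
static void cursor_last(struct ext_cursor *c, struct leaf *any)
{
	while (any && any->next)
		any = any->next;
	c->leaf = any;
	c->pos = any ? any->nrecs - 1 : 0;
}
```

The increment and decrement are "slightly more complex" than bumping
an array index only in the branch that hops to a sibling leaf; the
common case stays a single integer update.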

insert/delete remain very similar to what they do right now; they'll
get a different cursor type, and the manual xfs_iext_add calls will
go away.  The new xfs_iext_update_extent helper I posted to the list
yesterday will become a bit more complex, as changing the startoff
will have to be propagated up the tree.
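Why the update gets more complex can be sketched as follows, under the
assumption that each inner node stores, for every child, a copy of that
child's first key. The node layout with a parent pointer and slot index
is hypothetical and chosen only to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NODE_RECS	4	/* hypothetical fan-out */

struct node {
	struct node *parent;	/* NULL for the root */
	int          pidx;	/* this node's slot in parent->keys */
	int          nrecs;
	uint64_t     keys[NODE_RECS];
};

/*
 * Change the key of record pos in a leaf.  If that record is the first
 * one in its node, the parent's copy of the key is stale and must be
 * rewritten too, and so on upwards for as long as the touched slot is
 * slot 0 of its node.
 */
static void update_key(struct node *leaf, int pos, uint64_t new_key)
{
	struct node *n = leaf;

	n->keys[pos] = new_key;
	while (pos == 0 && n->parent) {
		pos = n->pidx;
		n = n->parent;
		n->keys[pos] = new_key;
	}
}
```

Updates to any record other than the first in a leaf stay local, so
the extra walk is only paid when the lowest startoff of a node changes.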

^ permalink raw reply	[flat|nested] 172+ messages in thread

* Re: [PATCH v2 15/30] xfs: Define usercopy region in xfs_inode slab cache
  2017-08-30  7:14             ` Christoph Hellwig
  (?)
@ 2017-08-30  8:05               ` Dave Chinner
  -1 siblings, 0 replies; 172+ messages in thread
From: Dave Chinner @ 2017-08-30  8:05 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Kees Cook, linux-kernel, David Windsor, Darrick J. Wong,
	linux-xfs, linux-mm, kernel-hardening

On Wed, Aug 30, 2017 at 12:14:03AM -0700, Christoph Hellwig wrote:
> On Wed, Aug 30, 2017 at 07:51:57AM +1000, Dave Chinner wrote:
> > Right, I've looked at btrees, too, but it's more complex than just
> > using an rbtree. I originally looked at using Peter Z's old
> > RCU-aware btree code, but it doesn't hold data in the tree leaves.
> > So that needed significant modification to make work without a
> > memory alloc per extent and that didn't work with original aim of
> > RCU-safe extent lookups.  I also looked at that "generic" btree
> > stuff that came from logfs, and after a little while ran away
> > screaming.
> 
> I started with the latter, but it's not really looking like it any more:
> the nodes are formatted as a series of u64s instead of all the
> long magic,

Yeah, that was about where I started to run away and look for
something nicer....

> and the data is stored inline - in fact I use a cute
> trick to keep the size down, derived from our "compressed" on disk
> extent format:
> 
> Key:
> 
>  +-------+----------------------------+
>  | 00:51 | all 52 bits of startoff    |
>  | 52:63 | low 12 bits of startblock  |
>  +-------+----------------------------+
> 
> Value
> 
>  +-------+----------------------------+
>  | 00:20 | all 21 bits of length      |
>  |    21 | unwritten extent bit       |
>  | 22:63 | high 42 bits of startblock |
>  +-------+----------------------------+
> 
> So we only need a 64-bit key and a 64-bit value by abusing parts
> of the key to store bits of the startblock.

Neat! :)

> For non-leaf nodes we iterate through the keys only, never touching
> the cache lines for the value.  For the leaf nodes we have to touch
> the value anyway because we have to do a range lookup to find the
> exact record.
> 
> This works fine so far in an isolated simulator, and now I'm amending
> it to be a b+tree with pointers to the previous and next node so
> that we can nicely implement our extent iterators instead of doing
> full lookups.

OK, that sounds exactly like what I have been looking towards....

> > The sticking point, IMO, is the extent array index based lookups in
> > all the bmbt code.  I've been looking at converting all that to use
> > offset based lookups and a cursor w/ lookup/inc/dec/insert/delete
> > operations wrapping xfs_iext_lookup_ext() and friends. This means
> > the modifications are pretty much identical to the on-disk extent
> > btree, so they can be abstracted out into a single extent update
> > interface for both trees.  Have you planned/done any cleanup/changes
> > with this code?
> 
> I've done various cleanups, but I've not yet consolidated the two.
> Basically step one at the moment is to move everyone to
> xfs_iext_lookup_extent + xfs_iext_get_extent that removes all the
> bad intrusion.

Yup.

> Once we move to the actual b+trees the extnum_t cursor will be replaced
> with a real cursor structure that contains a pointer to the current
> b+tree leaf node, and an index inside that, which will allow us very
> efficient iteration.  The xfs_iext_get_extent calls will be replaced
> with more specific xfs_iext_prev_extent, xfs_iext_next_extent calls
> that include the now slightly more complex cursor decrement, increment
> as well as a new xfs_iext_last_extent helper for the last extent
> that we need in a few places.

OK, that sounds like it'll fit right in with what I've been
prototyping for the extent code in xfs_bmap.c. I can make that work
with a cursor-based lookup/inc/dec/ins/del API similar to the bmbt
API. I've been looking to abstract the extent manipulations out into
functions that modify both trees like this:

[note: just put template code in to get my thoughts straight, it's
not working code]

+static int
+xfs_bmex_delete(
+       struct xfs_iext_cursor          *icur,
+       struct xfs_btree_cursor         *cur,
+       int                             *nextents)
+{
+       int                             error;
+       int                             i;
+
+       xfs_iext_remove(bma->ip, bma->idx + 1, 2, state);
+       if (nextents)
+               (*nextents)--;
+       if (!cur)
+               return 0;
+       error = xfs_btree_delete(cur, &i);
+       if (error)
+               return error;
+       XFS_WANT_CORRUPTED_RETURN(cur->bc_mp, i == 1);
+       return 0;
+}
+
+static int
+xfs_bmex_increment(
+       struct xfs_iext_cursor          *icur,
+       struct xfs_btree_cursor         *cur)
+{
+       int                             error;
+       int                             i;
+
+       icur->ep = xfs_iext_get_right_ext(icur->ep);
+       if (!cur)
+               return 0;
+       error = xfs_btree_increment(cur, 0, &i);
+       if (error)
+               return error;
+       XFS_WANT_CORRUPTED_RETURN(cur->bc_mp, i == 1);
+       return 0;
+}
+
+static int
+xfs_bmex_decrement(
+       struct xfs_iext_cursor          *icur,
+       struct xfs_btree_cursor         *cur)
+{
+       int                             error;
+       int                             i;
+
+       icur->ep = xfs_iext_get_left_ext(icur->ep);
+       if (!cur)
+               return 0;
+       error = xfs_btree_decrement(cur, 0, &i);
+       if (error)
+               return error;
+       XFS_WANT_CORRUPTED_RETURN(cur->bc_mp, i == 1);
+       return 0;
+}

And so what you're doing would fit straight into that. What I'm
ending up with is extent operations that look like this:

xfs_bmap_add_extent_delay_real()
.....
	case BMAP_LEFT_FILLING | BMAP_LEFT_CONTIG |
             BMAP_RIGHT_FILLING | BMAP_RIGHT_CONTIG:
                /*
                 * Filling in all of a previously delayed allocation extent.
                 * The left and right neighbors are both contiguous with new.
                 */
+               rval |= XFS_ILOG_CORE;
+
+               /* remove the incore delalloc extent first */
+               error = xfs_bmex_delete(&icur, NULL, nextents);
+               if (error)
+                       goto done;
+
+               /*
+                * update incore and bmap extent trees
+                *      1. set cursors to the right extent
+                *      2. remove the right extent
+                *      3. update the left extent to span all 3 extent ranges
+                */
+               error = xfs_bmex_lookup_eq(&icur, bma->cur, RIGHT.br_startoff,
+                               RIGHT.br_startblock, RIGHT.br_blockcount, 1);
+               if (error)
+                       goto done;
+               error = xfs_bmex_delete(&icur, bma->cur, NULL);
+               if (error)
+                       goto done;
+               error = xfs_bmex_decrement(&icur, bma->cur);
+               if (error)
+                       goto done;
+               error = xfs_bmex_update(&icur, bma->cur, LEFT.br_startoff,
+                               LEFT.br_startblock,
+                               LEFT.br_blockcount + PREV.br_blockcount +
+                                       RIGHT.br_blockcount,
+                               LEFT.br_state);
+               if (error)
+                       goto done;
 		break;
....

And I'm starting to see where there are common extent manipulations
being done so there's probably a fair amount of further factoring
that can be done on top of this....

> insert/delete remain very similar to what they do right now, they'll
> get a different cursor type, and the manual xfs_iext_add calls will
> go away.  The new xfs_iext_update_extent helper I posted to the list
> yesterday will become a bit more complex, as changing the startoff
> will have to be propagated up the tree.

I've had a quick look at them and pulled them down into my tree for
testing (which had a cpu burning hang on xfs/020 a few minutes ago),
but I'll spend more time grokking them tomorrow.

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 172+ messages in thread

* [kernel-hardening] Re: [PATCH v2 15/30] xfs: Define usercopy region in xfs_inode slab cache
@ 2017-08-30  8:05               ` Dave Chinner
  0 siblings, 0 replies; 172+ messages in thread
From: Dave Chinner @ 2017-08-30  8:05 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Kees Cook, linux-kernel, David Windsor, Darrick J. Wong,
	linux-xfs, linux-mm, kernel-hardening

On Wed, Aug 30, 2017 at 12:14:03AM -0700, Christoph Hellwig wrote:
> On Wed, Aug 30, 2017 at 07:51:57AM +1000, Dave Chinner wrote:
> > Right, I've looked at btrees, too, but it's more complex than just
> > using an rbtree. I originally looked at using Peter Z's old
> > RCU-aware btree code, but it doesn't hold data in the tree leaves.
> > So that needed significant modification to make work without a
> > memory alloc per extent and that didn't work with original aim of
> > RCU-safe extent lookups.  I also looked at that "generic" btree
> > stuff that came from logfs, and after a little while ran away
> > screaming.
> 
> I started with the latter, but it's not really looking like it any more:
> there nodes are formatted as a series of u64s instead of all the
> long magic,

Yeah, that was about where I started to run away and look for
something nicer....

> and the data is stored inline - in fact I use a cute
> trick to keep the size down, derived from our "compressed" on disk
> extent format:
> 
> Key:
> 
>  +-------+----------------------------+
>  | 00:51 | all 52 bits of startoff    |
>  | 52:63 | low 12 bits of startblock  |
>  +-------+----------------------------+
> 
> Value
> 
>  +-------+----------------------------+
>  | 00:20 | all 21 bits of length      |
>  |    21 | unwritten extent bit       |
>  | 22:63 | high 42 bits of startblock |
>  +-------+----------------------------+
> 
> So we only need a 64-bit key and a 64-bit value by abusing parts
> of the key to store bits of the startblock.

Neat! :)

> For non-leaf nodes we iterate through the keys only, never touching
> the cache lines for the value.  For the leaf nodes we have to touch
> the value anyway because we have to do a range lookup to find the
> exact record.
> 
> This works fine so far in an isolated simulator, and now I'm ammending
> it to be a b+tree with pointers to the previous and next node so
> that we can nicely implement our extent iterators instead of doing
> full lookups.

Ok, that sounds exactly what I have been looking towards....

> > The sticking point, IMO, is the extent array index based lookups in
> > all the bmbt code.  I've been looking at converting all that to use
> > offset based lookups and a cursor w/ lookup/inc/dec/insert/delete
> > ioperations wrapping xfs_iext_lookup_ext() and friends. This means
> > the modifications are pretty much identical to the on-disk extent
> > btree, so they can be abstracted out into a single extent update
> > interface for both trees.  Have you planned/done any cleanup/changes
> > with this code?
> 
> I've done various cleanups, but I've not yet consolidated the two.
> Basically step one at the moment is to move everyone to
> xfs_iext_lookup_extent + xfs_iext_get_extent that removes all the
> bad intrusion.

Yup.

> Once we move to the actual b+trees the extnum_t cursor will be replaced
> with a real cursor structure that contains a pointer to the current
> b+tree leaf node, and an index inside that, which will allows us very
> efficient iteration.  The xfs_iext_get_extent calls will be replaced
> with more specific xfs_iext_prev_extent, xfs_iext_next_extent calls
> that include the now slightly more complex cursor decrement, increment
> as well as a new xfs_iext_last_extent helper for the last extent
> that we need in a few places.

Ok, that's sounds like it'll fit right in with what I've been
prototyping for the extent code in xfs_bmap.c. I can make that work
with a cursor-based lookup/inc/dec/ins/del API similar to the bmbt
API. I've been looking to abstract the extent manipulations out into
functions that modify both trees like this:

[note: just put template code in to get my thoughts straight, it's
not working code]

+static int
+xfs_bmex_delete(
+       struct xfs_iext_cursor          *icur,
+       struct xfs_btree_cursor         *cur,
+       int                             *nextents)
+{
+       int                             i;
+       int                             error;
+
+       xfs_iext_remove(bma->ip, bma->idx + 1, 2, state);
+       if (nextents)
+               (*nextents)--;
+       if (!cur)
+               return 0;
+       error = xfs_btree_delete(cur, &i);
+       if (error)
+               return error;
+       XFS_WANT_CORRUPTED_RETURN(cur->bc_mp, i == 1);
+       return 0;
+}
+
+static int
+xfs_bmex_increment(
+       struct xfs_iext_cursor          *icur,
+       struct xfs_btree_cursor         *cur)
+{
+       int                             i;
+       int                             error;
+
+       icur->ep = xfs_iext_get_right_ext(icur->ep);
+       if (!cur)
+               return 0;
+       error = xfs_btree_increment(cur, 0, &i);
+       if (error)
+               return error;
+       XFS_WANT_CORRUPTED_RETURN(cur->bc_mp, i == 1);
+       return 0;
+}
+
+static int
+xfs_bmex_decrement(
+       struct xfs_iext_cursor          *icur,
+       struct xfs_btree_cursor         *cur)
+{
+       int                             i;
+       int                             error;
+
+       icur->ep = xfs_iext_get_left_ext(icur->ep);
+       if (!cur)
+               return 0;
+       error = xfs_btree_decrement(cur, 0, &i);
+       if (error)
+               return error;
+       XFS_WANT_CORRUPTED_RETURN(cur->bc_mp, i == 1);
+       return 0;
+}

And so what you're doing would fit straight into that. What I'm
ending up with is extent operations that look like this:

xfs_bmap_add_extent_delay_real()
.....
	case BMAP_LEFT_FILLING | BMAP_LEFT_CONTIG |
             BMAP_RIGHT_FILLING | BMAP_RIGHT_CONTIG:
                /*
                 * Filling in all of a previously delayed allocation extent.
                 * The left and right neighbors are both contiguous with new.
                 */
+               rval |= XFS_ILOG_CORE;
+
+               /* remove the incore delalloc extent first */
+               error = xfs_bmex_delete(&icur, NULL, nextents);
+               if (error)
+                       goto done;
+
+               /*
+                * update incore and bmap extent trees
+                *      1. set cursors to the right extent
+                *      2. remove the right extent
+                *      3. update the left extent to span all 3 extent ranges
+                */
+               error = xfs_bmex_lookup_eq(&icur, bma->cur, RIGHT.br_startoff,
+                               RIGHT.br_startblock, RIGHT.br_blockcount, 1);
+               if (error)
+                       goto done;
+               error = xfs_bmex_delete(&icur, bma->cur, NULL);
+               if (error)
+                       goto done;
+               error = xfs_bmex_decrement(&icur, bma->cur);
+               if (error)
+                       goto done;
+               error = xfs_bmex_update(&icur, bma->cur, LEFT.br_startoff,
+                               LEFT.br_startblock,
+                               LEFT.br_blockcount + PREV.br_blockcount +
+                                       RIGHT.br_blockcount,
+                               LEFT.br_state);
+               if (error)
+                       goto done;
 		break;
....

And I'm starting to see where there are common extent manipulations
being done so there's probably a fair amount of further factoring
that can be done on top of this....
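
As an illustrative aside, the cursor discussed above (a pointer to the
current b+tree leaf plus an index inside it, with leaves linked to their
previous and next siblings) can be sketched in plain userspace C. This is
only a sketch under stated assumptions: every demo_* name, the record
layout and the tiny fan-out are hypothetical and are not taken from any
XFS tree. It also assumes that no leaf is ever empty.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_LEAF_RECS 3	/* hypothetical fan-out, kept tiny for illustration */

/* Illustrative in-core extent record; the field names are assumptions. */
struct demo_extent {
	unsigned long startoff;
	unsigned long blockcount;
};

/* Leaf node with sibling pointers so iteration never walks back up the tree. */
struct demo_leaf {
	struct demo_leaf *prev;
	struct demo_leaf *next;
	int nr;			/* records in use, 1..DEMO_LEAF_RECS */
	struct demo_extent recs[DEMO_LEAF_RECS];
};

/* Cursor: current leaf plus index inside it; leaf == NULL means "off the end". */
struct demo_cursor {
	struct demo_leaf *leaf;
	int pos;
};

/* Return the record under the cursor, or NULL once we ran off either end. */
static struct demo_extent *demo_cur_get(const struct demo_cursor *cur)
{
	if (!cur->leaf)
		return NULL;
	assert(cur->pos >= 0 && cur->pos < cur->leaf->nr);
	return &cur->leaf->recs[cur->pos];
}

/* Step right: bump the index, hop to the next leaf when this one is done. */
static void demo_cur_next(struct demo_cursor *cur)
{
	if (!cur->leaf)
		return;
	if (++cur->pos < cur->leaf->nr)
		return;
	cur->leaf = cur->leaf->next;
	cur->pos = 0;
}

/* Step left: drop the index, hop to the last record of the previous leaf. */
static void demo_cur_prev(struct demo_cursor *cur)
{
	if (!cur->leaf)
		return;
	if (--cur->pos >= 0)
		return;
	cur->leaf = cur->leaf->prev;
	cur->pos = cur->leaf ? cur->leaf->nr - 1 : 0;
}

/* Example iterator user: sum block counts from the cursor to the end. */
static unsigned long demo_count_blocks(struct demo_cursor cur)
{
	unsigned long total = 0;
	struct demo_extent *ex;

	for (; (ex = demo_cur_get(&cur)) != NULL; demo_cur_next(&cur))
		total += ex->blockcount;
	return total;
}
```

The point of the design is that stepping to a neighbouring extent costs
an index bump or a single sibling-pointer hop, rather than a full lookup
from the root.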

> insert/delete remain very similar to what they do right now, they'll
> get a different cursor type, and the manual xfs_iext_add calls will
> go away.  The new xfs_iext_update_extent helper I posted to the list
> yesterday will become a bit more complex, as changing the startoff
> will have to be propagated up the tree.

I've had a quick look at them and pulled them down into my tree for
testing (which had a cpu burning hang on xfs/020 a few minutes ago),
but I'll spend more time grokking them tomorrow.

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 172+ messages in thread

* Re: [PATCH v2 15/30] xfs: Define usercopy region in xfs_inode slab cache
  2017-08-30  8:05               ` Dave Chinner
  (?)
@ 2017-08-30  8:33                 ` Christoph Hellwig
  -1 siblings, 0 replies; 172+ messages in thread
From: Christoph Hellwig @ 2017-08-30  8:33 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Christoph Hellwig, Kees Cook, linux-kernel, David Windsor,
	Darrick J. Wong, linux-xfs, linux-mm, kernel-hardening

On Wed, Aug 30, 2017 at 06:05:58PM +1000, Dave Chinner wrote:
> Ok, that sounds like it'll fit right in with what I've been
> prototyping for the extent code in xfs_bmap.c. I can make that work
> with a cursor-based lookup/inc/dec/ins/del API similar to the bmbt
> API. I've been looking to abstract the extent manipulations out into
> functions that modify both trees like this:
> 
> [note: just put template code in to get my thoughts straight, it's
> not working code]

FYI, I've got somewhat working changes in that area (they still have
bugs, but a few tests pass :)).  What I'm doing is to make sure all of
the xfs_bmap_{add,del}_extent_* routines fully operate on xfs_bmbt_irec
structures that they acquire through the xfs_bmalloca structure or
from xfs_iext_get_extent and update using xfs_iext_update_extent.
A nice fallout from that is that we can change the prototypes for
xfs_bmbt_lookup_* and xfs_bmbt_update to take an xfs_bmbt_irec
as well instead of taking the individual arguments.  That should
help with your next step cleanups a bit.

^ permalink raw reply	[flat|nested] 172+ messages in thread

* Re: [PATCH v2 08/30] ext2: Define usercopy region in ext2_inode_cache slab cache
  2017-08-28 21:34   ` Kees Cook
  (?)
@ 2017-08-30 11:22     ` Jan Kara
  -1 siblings, 0 replies; 172+ messages in thread
From: Jan Kara @ 2017-08-30 11:22 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, David Windsor, Jan Kara, linux-ext4, linux-mm,
	kernel-hardening

On Mon 28-08-17 14:34:49, Kees Cook wrote:
> From: David Windsor <dave@nullcore.net>
> 
> The ext2 symlink pathnames, stored in struct ext2_inode_info.i_data and
> therefore contained in the ext2_inode_cache slab cache, need to be copied
> to/from userspace.
> 
> cache object allocation:
>     fs/ext2/super.c:
>         ext2_alloc_inode(...):
>             struct ext2_inode_info *ei;
>             ...
>             ei = kmem_cache_alloc(ext2_inode_cachep, GFP_NOFS);
>             ...
>             return &ei->vfs_inode;
> 
>     fs/ext2/ext2.h:
>         EXT2_I(struct inode *inode):
>             return container_of(inode, struct ext2_inode_info, vfs_inode);
> 
>     fs/ext2/namei.c:
>         ext2_symlink(...):
>             ...
>             inode->i_link = (char *)&EXT2_I(inode)->i_data;
> 
> example usage trace:
>     readlink_copy+0x43/0x70
>     vfs_readlink+0x62/0x110
>     SyS_readlinkat+0x100/0x130
> 
>     fs/namei.c:
>         readlink_copy(..., link):
>             ...
>             copy_to_user(..., link, len);
> 
>         (inlined into vfs_readlink)
>         generic_readlink(dentry, ...):
>             struct inode *inode = d_inode(dentry);
>             const char *link = inode->i_link;
>             ...
>             readlink_copy(..., link);
> 
> In support of usercopy hardening, this patch defines a region in the
> ext2_inode_cache slab cache in which userspace copy operations are
> allowed.
> 
> This region is known as the slab cache's usercopy region. Slab caches can
> now check that each copy operation involving cache-managed memory falls
> entirely within the slab's usercopy region.
> 
> This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY
> whitelisting code in the last public patch of grsecurity/PaX based on my
> understanding of the code. Changes or omissions from the original code are
> mine and don't reflect the original grsecurity/PaX code.
> 
> Signed-off-by: David Windsor <dave@nullcore.net>
> [kees: adjust commit log, provide usage trace]
> Cc: Jan Kara <jack@suse.com>
> Cc: linux-ext4@vger.kernel.org
> Signed-off-by: Kees Cook <keescook@chromium.org>

Looks good. You can add:

Acked-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext2/super.c | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/ext2/super.c b/fs/ext2/super.c
> index 7b1bc9059863..670142cde59d 100644
> --- a/fs/ext2/super.c
> +++ b/fs/ext2/super.c
> @@ -219,11 +219,13 @@ static void init_once(void *foo)
>  
>  static int __init init_inodecache(void)
>  {
> -	ext2_inode_cachep = kmem_cache_create("ext2_inode_cache",
> -					     sizeof(struct ext2_inode_info),
> -					     0, (SLAB_RECLAIM_ACCOUNT|
> -						SLAB_MEM_SPREAD|SLAB_ACCOUNT),
> -					     init_once);
> +	ext2_inode_cachep = kmem_cache_create_usercopy("ext2_inode_cache",
> +				sizeof(struct ext2_inode_info), 0,
> +				(SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|
> +					SLAB_ACCOUNT),
> +				offsetof(struct ext2_inode_info, i_data),
> +				sizeof_field(struct ext2_inode_info, i_data),
> +				init_once);
>  	if (ext2_inode_cachep == NULL)
>  		return -ENOMEM;
>  	return 0;
> -- 
> 2.7.4
> 
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
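
For readers following along, the effect of the offsetof()/sizeof_field()
pair in the patch above can be sketched in userspace C. This is a minimal
model, not the kernel implementation: the demo_* names, the struct layout
and the shape of the check are assumptions made for illustration. The
idea it shows is the one from the commit log: a copy is allowed only if
it falls entirely within the cache's usercopy region.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the region fields the series adds to a cache. */
struct demo_cache {
	size_t object_size;
	size_t useroffset;	/* start of the whitelisted region */
	size_t usersize;	/* length of the whitelisted region */
};

/* Hypothetical object layout; only i_data is meant to reach userspace. */
struct demo_inode_info {
	uint32_t i_flags;
	uint32_t i_data[15];
	uint64_t i_private;
};

/* Local equivalent of sizeof_field(), so the sketch is self-contained. */
#define demo_sizeof_field(type, member) (sizeof(((type *)0)->member))

/* Record the region; it must lie inside the object. */
static struct demo_cache demo_cache_create_usercopy(size_t size,
						    size_t useroffset,
						    size_t usersize)
{
	struct demo_cache c = { size, useroffset, usersize };

	assert(useroffset <= size && usersize <= size - useroffset);
	return c;
}

/* True only if [ptr, ptr + n) lies entirely inside the usercopy region. */
static bool demo_usercopy_allowed(const struct demo_cache *c, const void *obj,
				  const void *ptr, size_t n)
{
	uintptr_t base = (uintptr_t)obj;
	uintptr_t p = (uintptr_t)ptr;
	size_t off;

	if (p < base || p - base >= c->object_size)
		return false;		/* not inside this object at all */
	off = p - base;
	if (off < c->useroffset)
		return false;		/* starts before the region */
	off -= c->useroffset;
	return off <= c->usersize && n <= c->usersize - off;
}
```

With the region declared as i_data only, a copy that starts in i_flags
or runs past the end of i_data is rejected, while any span inside i_data
is accepted.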

^ permalink raw reply	[flat|nested] 172+ messages in thread

* Re: [kernel-hardening] [PATCH v2 26/30] fork: Provide usercopy whitelisting for task_struct
  2017-08-28 21:35   ` Kees Cook
  (?)
  (?)
@ 2017-08-30 18:55   ` Rik van Riel
  -1 siblings, 0 replies; 172+ messages in thread
From: Rik van Riel @ 2017-08-30 18:55 UTC (permalink / raw)
  To: Kees Cook, linux-kernel
  Cc: Andrew Morton, Nicholas Piggin, Laura Abbott,
	Mickaël Salaün, Ingo Molnar, Thomas Gleixner,
	Andy Lutomirski, linux-mm, kernel-hardening, David Windsor

On Mon, 2017-08-28 at 14:35 -0700, Kees Cook wrote:
> While the blocked and saved_sigmask fields of task_struct are copied to
> userspace (via sigmask_to_save() and setup_rt_frame()), they are always
> copied with a static length (i.e. sizeof(sigset_t)).
> 
> The only portion of task_struct that is potentially dynamically sized and
> may be copied to userspace is in the architecture-specific thread_struct
> at the end of task_struct.
> 
Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 172+ messages in thread

* Re: [kernel-hardening] [PATCH v2 27/30] x86: Implement thread_struct whitelist for hardened usercopy
  2017-08-28 21:35   ` Kees Cook
  (?)
  (?)
@ 2017-08-30 18:55   ` Rik van Riel
  -1 siblings, 0 replies; 172+ messages in thread
From: Rik van Riel @ 2017-08-30 18:55 UTC (permalink / raw)
  To: Kees Cook, linux-kernel
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	Borislav Petkov, Andy Lutomirski, Mathias Krause, linux-mm,
	kernel-hardening, David Windsor

On Mon, 2017-08-28 at 14:35 -0700, Kees Cook wrote:
> This whitelists the FPU register state portion of the thread_struct for
> copying to userspace, instead of the default entire struct.
> 
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: x86@kernel.org
> Cc: Borislav Petkov <bp@suse.de>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Mathias Krause <minipli@googlemail.com>
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
>  arch/x86/Kconfig                 | 1 +
>  arch/x86/include/asm/processor.h | 8 ++++++++
>  2 files changed, 9 insertions(+)
> 
Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 172+ messages in thread

* Re: [kernel-hardening] [PATCH v2 25/30] fork: Define usercopy region in thread_stack slab caches
  2017-08-28 21:35   ` Kees Cook
  (?)
  (?)
@ 2017-08-30 18:55   ` Rik van Riel
  -1 siblings, 0 replies; 172+ messages in thread
From: Rik van Riel @ 2017-08-30 18:55 UTC (permalink / raw)
  To: Kees Cook, linux-kernel
  Cc: David Windsor, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Andy Lutomirski, linux-mm, kernel-hardening

On Mon, 2017-08-28 at 14:35 -0700, Kees Cook wrote:
> From: David Windsor <dave@nullcore.net>
> 
> In support of usercopy hardening, this patch defines a region in the
> thread_stack slab caches in which userspace copy operations are allowed.
> Since the entire thread_stack needs to be available to userspace, the
> entire slab contents are whitelisted. Note that the slab-based thread
> stack is only present on systems with THREAD_SIZE < PAGE_SIZE and
> !CONFIG_VMAP_STACK.
> 

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 172+ messages in thread

* Re: [kernel-hardening] [PATCH v2 24/30] fork: Define usercopy region in mm_struct slab caches
  2017-08-28 21:35   ` Kees Cook
  (?)
  (?)
@ 2017-08-30 19:29   ` Rik van Riel
  -1 siblings, 0 replies; 172+ messages in thread
From: Rik van Riel @ 2017-08-30 19:29 UTC (permalink / raw)
  To: Kees Cook, linux-kernel
  Cc: David Windsor, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Andy Lutomirski, linux-mm, kernel-hardening

On Mon, 2017-08-28 at 14:35 -0700, Kees Cook wrote:
> From: David Windsor <dave@nullcore.net>
> 
> In support of usercopy hardening, this patch defines a region in the
> mm_struct slab caches in which userspace copy operations are allowed.
> Only the auxv field is copied to userspace.
> 
Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 172+ messages in thread

end of thread, other threads:[~2017-08-30 19:29 UTC | newest]

Thread overview: 172+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-28 21:34 [PATCH v2 00/30] Hardened usercopy whitelisting Kees Cook
2017-08-28 21:34 ` [kernel-hardening] " Kees Cook
2017-08-28 21:34 ` Kees Cook
2017-08-28 21:34 ` [PATCH v2 01/30] usercopy: Prepare for " Kees Cook
2017-08-28 21:34   ` [kernel-hardening] " Kees Cook
2017-08-28 21:34   ` Kees Cook
2017-08-28 21:34 ` [PATCH v2 02/30] usercopy: Enforce slab cache usercopy region boundaries Kees Cook
2017-08-28 21:34   ` [kernel-hardening] " Kees Cook
2017-08-28 21:34   ` Kees Cook
2017-08-28 21:34 ` [PATCH v2 03/30] usercopy: Mark kmalloc caches as usercopy caches Kees Cook
2017-08-28 21:34   ` [kernel-hardening] " Kees Cook
2017-08-28 21:34   ` Kees Cook
2017-08-28 21:34 ` [PATCH v2 04/30] dcache: Define usercopy region in dentry_cache slab cache Kees Cook
2017-08-28 21:34   ` [kernel-hardening] " Kees Cook
2017-08-28 21:34   ` Kees Cook
2017-08-28 21:34 ` [PATCH v2 05/30] vfs: Define usercopy region in names_cache slab caches Kees Cook
2017-08-28 21:34   ` [kernel-hardening] " Kees Cook
2017-08-28 21:34   ` Kees Cook
2017-08-28 21:34 ` [PATCH v2 06/30] vfs: Copy struct mount.mnt_id to userspace using put_user() Kees Cook
2017-08-28 21:34   ` [kernel-hardening] " Kees Cook
2017-08-28 21:34   ` Kees Cook
2017-08-28 21:34 ` [PATCH v2 07/30] ext4: Define usercopy region in ext4_inode_cache slab cache Kees Cook
2017-08-28 21:34   ` [kernel-hardening] " Kees Cook
2017-08-28 21:34   ` Kees Cook
2017-08-28 21:34 ` [PATCH v2 08/30] ext2: Define usercopy region in ext2_inode_cache " Kees Cook
2017-08-28 21:34   ` [kernel-hardening] " Kees Cook
2017-08-28 21:34   ` Kees Cook
2017-08-30 11:22   ` Jan Kara
2017-08-30 11:22     ` [kernel-hardening] " Jan Kara
2017-08-30 11:22     ` Jan Kara
2017-08-28 21:34 ` [PATCH v2 09/30] jfs: Define usercopy region in jfs_ip " Kees Cook
2017-08-28 21:34   ` [kernel-hardening] " Kees Cook
2017-08-28 21:34   ` Kees Cook
2017-08-28 21:34 ` [PATCH v2 10/30] befs: Define usercopy region in befs_inode_cache " Kees Cook
2017-08-28 21:34   ` [kernel-hardening] " Kees Cook
2017-08-28 21:34   ` Kees Cook
2017-08-29 10:12   ` Luis de Bethencourt
2017-08-29 10:12     ` [kernel-hardening] " Luis de Bethencourt
2017-08-29 10:12     ` Luis de Bethencourt
2017-08-29 15:36     ` Kees Cook
2017-08-29 15:36       ` [kernel-hardening] " Kees Cook
2017-08-29 15:36       ` Kees Cook
2017-08-29 17:10       ` Luis de Bethencourt
2017-08-29 17:10         ` [kernel-hardening] " Luis de Bethencourt
2017-08-29 17:10         ` Luis de Bethencourt
2017-08-28 21:34 ` [PATCH v2 11/30] exofs: Define usercopy region in exofs_inode_cache " Kees Cook
2017-08-28 21:34   ` [kernel-hardening] " Kees Cook
2017-08-28 21:34   ` Kees Cook
2017-08-28 21:34 ` [PATCH v2 12/30] orangefs: Define usercopy region in orangefs_inode_cache " Kees Cook
2017-08-28 21:34   ` [kernel-hardening] " Kees Cook
2017-08-28 21:34   ` Kees Cook
2017-08-28 21:34 ` [PATCH v2 13/30] ufs: Define usercopy region in ufs_inode_cache " Kees Cook
2017-08-28 21:34   ` [kernel-hardening] " Kees Cook
2017-08-28 21:34   ` Kees Cook
2017-08-28 21:34 ` [PATCH v2 14/30] vxfs: Define usercopy region in vxfs_inode " Kees Cook
2017-08-28 21:34   ` [kernel-hardening] " Kees Cook
2017-08-28 21:34   ` Kees Cook
2017-08-28 21:34 ` [PATCH v2 15/30] xfs: Define usercopy region in xfs_inode " Kees Cook
2017-08-28 21:34   ` [kernel-hardening] " Kees Cook
2017-08-28 21:34   ` Kees Cook
2017-08-28 21:49   ` Darrick J. Wong
2017-08-28 21:49     ` [kernel-hardening] " Darrick J. Wong
2017-08-28 21:49     ` Darrick J. Wong
2017-08-28 21:57     ` Kees Cook
2017-08-28 21:57       ` [kernel-hardening] " Kees Cook
2017-08-28 21:57       ` Kees Cook
2017-08-28 21:57       ` Kees Cook
2017-08-29  4:47       ` Darrick J. Wong
2017-08-29  4:47         ` [kernel-hardening] " Darrick J. Wong
2017-08-29  4:47         ` Darrick J. Wong
2017-08-29  4:47         ` Darrick J. Wong
2017-08-29 18:48         ` Kees Cook
2017-08-29 18:48           ` [kernel-hardening] " Kees Cook
2017-08-29 18:48           ` Kees Cook
2017-08-29 18:48           ` Kees Cook
2017-08-29 19:00           ` Darrick J. Wong
2017-08-29 19:00             ` [kernel-hardening] " Darrick J. Wong
2017-08-29 19:00             ` Darrick J. Wong
2017-08-29 19:00             ` Darrick J. Wong
2017-08-29 22:15           ` Dave Chinner
2017-08-29 22:15             ` [kernel-hardening] " Dave Chinner
2017-08-29 22:15             ` Dave Chinner
2017-08-29 22:15             ` Dave Chinner
2017-08-29 22:25             ` Kees Cook
2017-08-29 22:25               ` [kernel-hardening] " Kees Cook
2017-08-29 22:25               ` Kees Cook
2017-08-29 22:25               ` Kees Cook
2017-08-29  8:14   ` Christoph Hellwig
2017-08-29  8:14     ` [kernel-hardening] " Christoph Hellwig
2017-08-29  8:14     ` Christoph Hellwig
2017-08-29 12:31     ` Dave Chinner
2017-08-29 12:31       ` [kernel-hardening] " Dave Chinner
2017-08-29 12:31       ` Dave Chinner
2017-08-29 12:45       ` Christoph Hellwig
2017-08-29 12:45         ` [kernel-hardening] " Christoph Hellwig
2017-08-29 12:45         ` Christoph Hellwig
2017-08-29 21:51         ` Dave Chinner
2017-08-29 21:51           ` [kernel-hardening] " Dave Chinner
2017-08-29 21:51           ` Dave Chinner
2017-08-30  7:14           ` Christoph Hellwig
2017-08-30  7:14             ` [kernel-hardening] " Christoph Hellwig
2017-08-30  7:14             ` Christoph Hellwig
2017-08-30  8:05             ` Dave Chinner
2017-08-30  8:05               ` [kernel-hardening] " Dave Chinner
2017-08-30  8:05               ` Dave Chinner
2017-08-30  8:33               ` Christoph Hellwig
2017-08-30  8:33                 ` [kernel-hardening] " Christoph Hellwig
2017-08-30  8:33                 ` Christoph Hellwig
2017-08-29 18:55     ` Kees Cook
2017-08-29 18:55       ` [kernel-hardening] " Kees Cook
2017-08-29 18:55       ` Kees Cook
2017-08-29 18:55       ` Kees Cook
2017-08-28 21:34 ` [PATCH v2 16/30] cifs: Define usercopy region in cifs_request " Kees Cook
2017-08-28 21:34   ` [kernel-hardening] " Kees Cook
2017-08-28 21:34   ` Kees Cook
2017-08-28 21:34 ` [PATCH v2 17/30] scsi: Define usercopy region in scsi_sense_cache " Kees Cook
2017-08-28 21:34   ` [kernel-hardening] " Kees Cook
2017-08-28 21:34   ` Kees Cook
2017-08-28 21:42   ` Bart Van Assche
2017-08-28 21:42     ` [kernel-hardening] " Bart Van Assche
2017-08-28 21:42     ` Bart Van Assche
2017-08-28 21:52     ` Kees Cook
2017-08-28 21:52       ` [kernel-hardening] " Kees Cook
2017-08-28 21:52       ` Kees Cook
2017-08-28 21:52       ` Kees Cook
2017-08-28 21:34 ` [PATCH v2 18/30] net: Define usercopy region in struct proto " Kees Cook
2017-08-28 21:34   ` [kernel-hardening] " Kees Cook
2017-08-28 21:34   ` Kees Cook
2017-08-28 21:35 ` [PATCH v2 19/30] ip: Define usercopy region in IP " Kees Cook
2017-08-28 21:35   ` [kernel-hardening] " Kees Cook
2017-08-28 21:35   ` Kees Cook
2017-08-28 21:35 ` [PATCH v2 20/30] caif: Define usercopy region in caif " Kees Cook
2017-08-28 21:35   ` [kernel-hardening] " Kees Cook
2017-08-28 21:35   ` Kees Cook
2017-08-28 21:35 ` [PATCH v2 21/30] sctp: Define usercopy region in SCTP " Kees Cook
2017-08-28 21:35   ` [kernel-hardening] " Kees Cook
2017-08-28 21:35   ` Kees Cook
2017-08-28 21:35   ` Kees Cook
2017-08-28 21:35 ` [PATCH v2 22/30] sctp: Copy struct sctp_sock.autoclose to userspace using put_user() Kees Cook
2017-08-28 21:35   ` [kernel-hardening] " Kees Cook
2017-08-28 21:35   ` Kees Cook
2017-08-28 21:35   ` Kees Cook
2017-08-28 21:35 ` [PATCH v2 23/30] net: Restrict unwhitelisted proto caches to size 0 Kees Cook
2017-08-28 21:35   ` [kernel-hardening] " Kees Cook
2017-08-28 21:35   ` Kees Cook
2017-08-28 21:35 ` [PATCH v2 24/30] fork: Define usercopy region in mm_struct slab caches Kees Cook
2017-08-28 21:35   ` [kernel-hardening] " Kees Cook
2017-08-28 21:35   ` Kees Cook
2017-08-30 19:29   ` [kernel-hardening] " Rik van Riel
2017-08-28 21:35 ` [PATCH v2 25/30] fork: Define usercopy region in thread_stack " Kees Cook
2017-08-28 21:35   ` [kernel-hardening] " Kees Cook
2017-08-28 21:35   ` Kees Cook
2017-08-30 18:55   ` [kernel-hardening] " Rik van Riel
2017-08-28 21:35 ` [PATCH v2 26/30] fork: Provide usercopy whitelisting for task_struct Kees Cook
2017-08-28 21:35   ` [kernel-hardening] " Kees Cook
2017-08-28 21:35   ` Kees Cook
2017-08-30 18:55   ` [kernel-hardening] " Rik van Riel
2017-08-28 21:35 ` [PATCH v2 27/30] x86: Implement thread_struct whitelist for hardened usercopy Kees Cook
2017-08-28 21:35   ` [kernel-hardening] " Kees Cook
2017-08-28 21:35   ` Kees Cook
2017-08-30 18:55   ` [kernel-hardening] " Rik van Riel
2017-08-28 21:35 ` [PATCH v2 28/30] arm64: " Kees Cook
2017-08-28 21:35   ` [kernel-hardening] " Kees Cook
2017-08-28 21:35   ` Kees Cook
2017-08-28 21:35   ` Kees Cook
2017-08-28 21:35 ` [PATCH v2 29/30] arm: " Kees Cook
2017-08-28 21:35   ` [kernel-hardening] " Kees Cook
2017-08-28 21:35   ` Kees Cook
2017-08-28 21:35   ` Kees Cook
2017-08-28 21:35 ` [PATCH v2 30/30] usercopy: Restrict non-usercopy caches to size 0 Kees Cook
2017-08-28 21:35   ` [kernel-hardening] " Kees Cook
2017-08-28 21:35   ` Kees Cook
