* [PATCH 00/24] block, scsi: final compat_ioctl cleanup
@ 2019-12-11 20:42 Arnd Bergmann
  2019-12-11 20:42 ` [PATCH 24/24] Documentation: document ioctl interfaces better Arnd Bergmann
  0 siblings, 1 reply; 6+ messages in thread
From: Arnd Bergmann @ 2019-12-11 20:42 UTC
  To: Jens Axboe, James E.J. Bottomley, Martin K. Petersen, Alexander Viro
  Cc: linux-kernel, y2038, Arnd Bergmann, corbet, catalin.marinas, will,
	jdike, richard, anton.ivanov, fujita.tomonori, justin, efremov,
	tim, mst, jasowang, pbonzini, stefanha, boris.ostrovsky, jgross,
	sstabellini, konrad.wilk, roger.pau, bp, davem, john.garry,
	brking, intel-linux-scu, artur.paszkiewicz, jinpu.wang, dgilbert,
	Kai.Makisara, damien.lemoal, hare, linux-doc, linux-block,
	linux-arm-kernel, linux-um, linux-scsi, linux-ide, virtualization,
	xen-devel, linux-fsdevel

Hi Jens, James and Martin,

This series concludes the work I did for linux-5.5 on the compat_ioctl()
cleanup, killing off fs/compat_ioctl.c and block/compat_ioctl.c by moving
everything into drivers.

Overall this would be a reduction both in complexity and line count, but
as I'm also adding documentation the overall number of lines increases
in the end.

My plan was originally to keep the SCSI and block parts separate.
This did not work easily because of interdependencies: I cannot
do the final SCSI cleanup in a good way without first addressing the
CDROM ioctls, so this is one series that I hope could be merged through
either the block or the scsi git trees, or possibly both if you can
pull in the same branch.

The series comes in these steps:

1. clean up the sg v3 interface as suggested by Linus. I have
   talked about this with Doug Gilbert as well, and he would
   rebase his sg v4 patches on top of "compat: scsi: sg: fix v3
   compat read/write interface"

2. Four patches for missing block compat_ioctl handlers, to be
   backported into stable kernels. Separate patches because they
   are needed in different stable versions.

3. Actually moving handlers out of block/compat_ioctl.c and
   block/scsi_ioctl.c into drivers, mixed in with cleanup
   patches

4. Document how to do this right. I keep getting asked about this,
   and it helps to point to some documentation file.

The series is available for testing at [1].

       Arnd

[1] https://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground.git/log/?h=compat-ioctl-endgame

Arnd Bergmann (24):
  compat: ARM64: always include asm-generic/compat.h
  compat: scsi: sg: fix v3 compat read/write interface
  compat_ioctl: block: handle BLKREPORTZONE/BLKRESETZONE
  compat_ioctl: block: handle BLKGETZONESZ/BLKGETNRZONES
  compat_ioctl: block: handle add zone open, close and finish ioctl
  compat_ioctl: block: handle Persistent Reservations
  compaT_ioctl: ubd, aoe: use blkdev_compat_ptr_ioctl
  compat_ioctl: move CDROM_SEND_PACKET handling into scsi
  compat_ioctl: move CDROMREADADIO to cdrom.c
  compat_ioctl: cdrom: handle CDROM_LAST_WRITTEN
  compat_ioctl: block: handle cdrom compat ioctl in non-cdrom drivers
  compat_ioctl: add scsi_compat_ioctl
  compat_ioctl: bsg: add handler
  compat_ioctl: ide: floppy: add handler
  compat_ioctl: scsi: move ioctl handling into drivers
  compat_ioctl: move sys_compat_ioctl() to ioctl.c
  compat_ioctl: simplify the implementation
  compat_ioctl: move cdrom commands into cdrom.c
  compat_ioctl: scsi: handle HDIO commands from drivers
  compat_ioctl: move HDIO ioctl handling into drivers/ide
  compat_ioctl: block: move blkdev_compat_ioctl() into ioctl.c
  compat_ioctl: block: simplify compat_blkpg_ioctl()
  compat_ioctl: simplify up block/ioctl.c
  Documentation: document ioctl interfaces better

 Documentation/core-api/index.rst       |   1 +
 Documentation/core-api/ioctl.rst       | 250 +++++++++++++++
 arch/arm64/include/asm/compat.h        |   5 +-
 arch/um/drivers/ubd_kern.c             |   1 +
 block/Makefile                         |   1 -
 block/bsg.c                            |   1 +
 block/compat_ioctl.c                   | 411 -------------------------
 block/ioctl.c                          | 319 +++++++++++++++----
 block/scsi_ioctl.c                     | 214 ++++++++-----
 drivers/ata/libata-scsi.c              |   9 +
 drivers/block/aoe/aoeblk.c             |   1 +
 drivers/block/floppy.c                 |   3 +
 drivers/block/paride/pcd.c             |   3 +
 drivers/block/paride/pd.c              |   1 +
 drivers/block/paride/pf.c              |   1 +
 drivers/block/pktcdvd.c                |  26 +-
 drivers/block/sunvdc.c                 |   1 +
 drivers/block/virtio_blk.c             |   3 +
 drivers/block/xen-blkfront.c           |   1 +
 drivers/cdrom/cdrom.c                  |  35 ++-
 drivers/cdrom/gdrom.c                  |   3 +
 drivers/ide/ide-cd.c                   |  40 +++
 drivers/ide/ide-disk.c                 |   3 +
 drivers/ide/ide-floppy.c               |   4 +
 drivers/ide/ide-floppy.h               |   2 +
 drivers/ide/ide-floppy_ioctl.c         |  35 +++
 drivers/ide/ide-gd.c                   |  14 +
 drivers/ide/ide-ioctls.c               |  47 ++-
 drivers/ide/ide-tape.c                 |  14 +
 drivers/scsi/aic94xx/aic94xx_init.c    |   3 +
 drivers/scsi/ch.c                      |   9 +-
 drivers/scsi/hisi_sas/hisi_sas_v1_hw.c |   3 +
 drivers/scsi/hisi_sas/hisi_sas_v2_hw.c |   3 +
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c |   3 +
 drivers/scsi/ipr.c                     |   3 +
 drivers/scsi/isci/init.c               |   3 +
 drivers/scsi/mvsas/mv_init.c           |   3 +
 drivers/scsi/pm8001/pm8001_init.c      |   3 +
 drivers/scsi/scsi_ioctl.c              |  54 +++-
 drivers/scsi/sd.c                      |  50 ++-
 drivers/scsi/sg.c                      | 169 +++++-----
 drivers/scsi/sr.c                      |  53 +++-
 drivers/scsi/st.c                      |  51 +--
 fs/Makefile                            |   2 +-
 fs/compat_ioctl.c                      | 261 ----------------
 fs/internal.h                          |   6 -
 fs/ioctl.c                             | 131 +++++---
 include/linux/blkdev.h                 |   7 +
 include/linux/falloc.h                 |   2 -
 include/linux/fs.h                     |   4 -
 include/linux/ide.h                    |   2 +
 include/linux/libata.h                 |   6 +
 include/scsi/scsi_ioctl.h              |   1 +
 include/scsi/sg.h                      |  30 ++
 54 files changed, 1249 insertions(+), 1062 deletions(-)
 create mode 100644 Documentation/core-api/ioctl.rst
 delete mode 100644 block/compat_ioctl.c
 delete mode 100644 fs/compat_ioctl.c

-- 
2.20.0

Cc: corbet@lwn.net
Cc: catalin.marinas@arm.com
Cc: will@kernel.org
Cc: jdike@addtoit.com
Cc: richard@nod.at
Cc: anton.ivanov@cambridgegreys.com
Cc: fujita.tomonori@lab.ntt.co.jp
Cc: justin@coraid.com
Cc: efremov@linux.com
Cc: tim@cyberelk.net
Cc: mst@redhat.com
Cc: jasowang@redhat.com
Cc: pbonzini@redhat.com
Cc: stefanha@redhat.com
Cc: boris.ostrovsky@oracle.com
Cc: jgross@suse.com
Cc: sstabellini@kernel.org
Cc: konrad.wilk@oracle.com
Cc: roger.pau@citrix.com
Cc: bp@alien8.de
Cc: davem@davemloft.net
Cc: john.garry@huawei.com
Cc: brking@us.ibm.com
Cc: intel-linux-scu@intel.com
Cc: artur.paszkiewicz@intel.com
Cc: jinpu.wang@cloud.ionos.com
Cc: dgilbert@interlog.com
Cc: Kai.Makisara@kolumbus.fi
Cc: arnd@arndb.de
Cc: damien.lemoal@hgst.com
Cc: hare@suse.com
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-block@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-um@lists.infradead.org
Cc: linux-scsi@vger.kernel.org
Cc: linux-ide@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Cc: linux-fsdevel@vger.kernel.org

* [PATCH 24/24] Documentation: document ioctl interfaces better
  2019-12-11 20:42 [PATCH 00/24] block, scsi: final compat_ioctl cleanup Arnd Bergmann
@ 2019-12-11 20:42 ` Arnd Bergmann
  2019-12-11 21:05   ` Jonathan Corbet
  2019-12-12  8:16   ` Geert Uytterhoeven
  0 siblings, 2 replies; 6+ messages in thread
From: Arnd Bergmann @ 2019-12-11 20:42 UTC
  To: Jens Axboe, James E.J. Bottomley, Martin K. Petersen,
	Alexander Viro, Jonathan Corbet
  Cc: linux-kernel, y2038, Arnd Bergmann, Mauro Carvalho Chehab,
	Jonathan Neuschäfer, Masahiro Yamada, Vladimir Oltean,
	Kent Overstreet, linux-doc

Documentation/process/botching-up-ioctls.rst was originally
written as a blog post for DRM driver writers, so it misses
some points while going into a lot of detail on others.

Try to provide a replacement that addresses typical issues
across a wider range of subsystems, and follows the style of
the core-api documentation better.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 Documentation/core-api/index.rst |   1 +
 Documentation/core-api/ioctl.rst | 250 +++++++++++++++++++++++++++++++
 2 files changed, 251 insertions(+)
 create mode 100644 Documentation/core-api/ioctl.rst

diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst
index ab0eae1c153a..3f28b2f668be 100644
--- a/Documentation/core-api/index.rst
+++ b/Documentation/core-api/index.rst
@@ -39,6 +39,7 @@ Core utilities
    ../RCU/index
    gcc-plugins
    symbol-namespaces
+   ioctl
 
 
 Interfaces for kernel debugging
diff --git a/Documentation/core-api/ioctl.rst b/Documentation/core-api/ioctl.rst
new file mode 100644
index 000000000000..cb2c86ae63e7
--- /dev/null
+++ b/Documentation/core-api/ioctl.rst
@@ -0,0 +1,250 @@
+======================
+ioctl based interfaces
+======================
+
+:c:func:`ioctl` is the most common way for applications to interface
+with device drivers. It is flexible and easily extended by adding new
+commands and can be passed through character devices, block devices as
+well as sockets and other special file descriptors.
+
+However, it is also very easy to get ioctl command definitions wrong,
+and hard to fix them later without breaking existing applications,
+so this documentation tries to help developers get it right.
+
+Command number definitions
+==========================
+
+The command number, or request number, is the second argument passed to
+the ioctl system call. While this can be any 32-bit number that uniquely
+identifies an action for a particular driver, there are a number of
+conventions around defining them.
+
+``include/uapi/asm-generic/ioctl.h`` provides four macros for defining
+ioctl commands that follow modern conventions: ``_IOC``, ``_IOR``,
+``_IOW``, and ``_IORW``. These should be used for all new commands,
+with the correct parameters:
+
+_IO/_IOR/_IOW/_IOWR
+    The macro name determines whether the argument is used for passing
+    data into kernel (_IOW), from the kernel (_IOR), both (_IOWR) or is
+    not a pointer (_IOC). It is possible but not recommended to pass an
+    integer value instead of a pointer with _IOC.
+
+type
+    An 8-bit number, often a character literal, specific to a subsystem
+    or driver, and listed in :doc:`../ioctl/ioctl-number`
+
+nr
+    An 8-bit number identifying the specific command, unique for a give
+    value of 'type'
+
+size
+    The name of the data type pointed to by the argument, the command
+    number encodes the ``sizeof(size)`` value in a 13-bit or 14-bit integer,
+    leading to a limit of 8191 bytes for the maximum size of the argument.
+    Note: do not pass sizeof(type) type into _IOR/IOW, as that will lead
+    to encoding sizeof(sizeof(type)), i.e. sizeof(size_t).
+
+
+Interface versions
+==================
+
+Some subsystems use version numbers in data structures to overload
+commands with different interpretations of the argument.
+
+This is generally a bad idea, since changes to existing commands tend
+to break existing applications.
+
+A better approach is to add a new ioctl command with a new number. The
+old command still needs to be implemented in the kernel for compatibility,
+but this can be a wrapper around the new implementation.
+
+Return code
+===========
+
+ioctl commands can return negative error codes as documented in errno(3),
+these get turned into errno values in user space. On success, the return
+code should be zero. It is also possible but not recommended to return
+a positive 'long' value.
+
+When the ioctl callback is called with an unknown command number, the
+handler returns either -ENOTTY or -ENOIOCTLCMD, which also results in
+-ENOTTY being returned from the system call. Some subsystems return
+-ENOSYS or -EINVAL here for historic reasons, but this is wrong.
+
+Prior to Linux-5.5, compat_ioctl handlers were required to return
+-ENOIOCTLCMD in order to use the fallback conversion into native
+commands. As all subsystems are now responsible for handling compat
+mode themselves, this is no longer needed, but it may be important to
+consider when backporting bug fixes to older kernels.
+
+Timestamps
+==========
+
+Traditionally, timestamps and timeout values are passed as ``struct
+timespec`` or ``struct timeval``, but these are problematic because of
+incompatible definitions of these structures in user space after the
+move to 64-bit time_t.
+
+The __kernel_timespec type can be used instead to be embedded in other
+data structures when separate second/nanosecond values are desired,
+or passed to user space directly. This is still not ideal though,
+as the structure matches neither the kernel's timespec64 nor the user
+space timespec exactly. The get_timespec64() and put_timespec64() helper
+functions canbe used to ensure that the layout remains compatible with
+user space and the padding is treated correctly.
+
+As it is cheap to convert seconds to nanoseconds, but the opposite
+requires an expensive 64-bit division, a simple __u64 nanosecond value
+can be simpler and more efficient.
+
+Timeout values and timestamps should ideally use CLOCK_MONOTONIC time,
+as returned by ``ktime_get_ns()`` or ``ktime_get_ts64()``. Unlike
+CLOCK_REALTIME, this makes the timestamps immune from jumping backwards
+or forwards due to leap second adjustments and clock_settime() calls.
+
+``ktime_get_real_ns()`` can be used for CLOCK_REALTIME timestamps that
+may be required for timestamps that need to be persistent across a reboot
+or between multiple machines.
+
+32-bit compat mode
+==================
+
+In order to support 32-bit user space running on a 64-bit machine, each
+subsystem or driver that implements an ioctl callback handler must also
+implement the corresponding compat_ioctl handler.
+
+As long as all the rules for data structures are followed, this is as
+easy as setting the .compat_ioctl pointer to a helper function such as
+``compat_ptr_ioctl()`` or ``blkdev_compat_ptr_ioctl``.
+
+compat_ptr()
+------------
+
+On the s/390 architecture, 31-bit user space has ambiguous representations
+for data pointers, with the upper bit being ignored. When running such
+a process in compat mode, the ``compat_ptr()`` helper must be used to
+clear the upper bit of a compat_uptr_t and turn it into a valid 64-bit
+pointer. On other architectures, this macro only performs a cast to a
+``void __user *`` pointer.
+
+In an compat_ioctl() callback, the last argument is an unsigned long,
+which can be interpreted as either a pointer or a scalar depending on
+the command. If it is a scalar, then compat_ptr() must not be used, to
+ensure that the 64-bit kernel behaves the same way as a 32-bit kernel
+for arguments with the upper bit set.
+
+The compat_ptr_ioctl() helper can be used in place of a custom
+compat_ioctl file operation for drivers that only take arguments that
+are pointers to compatible data structures.
+
+Structure layout
+----------------
+
+Compatible data structures have the same layout on all architectures,
+avoiding all problematic members:
+
+* ``long`` and ``unsigned long`` are the size of a register, so
+  they can be either 32 bit or 64 bit wide and cannot be used in portable
+  data structures. Fixed-length replacements are ``__s32``, ``__u32``,
+  ``__s64`` and ``__u64``.
+
+* Pointers have the same problem, in addition to requiring the
+  use of ``compat_ptr()``. The best workaround is to use ``__u64``
+  in place of pointers, which requires a cast to ``uintptr_t`` in user
+  space, and the use of ``u64_to_user_ptr()`` in the kernel to convert
+  it back into a user pointer.
+
+* On the x86-32 (i386) architecture, the alignment of 64-bit variables
+  is only 32 bit, but they are naturally aligned on most other
+  architectures including x86-64. This means a structure like
+
+  ::
+
+    struct foo {
+        __u32 a;
+        __u64 b;
+        __u32 c;
+    };
+
+  has four bytes of padding between a and b on x86-64, plus another four
+  bytes of padding at the end, but no padding on i386, and it needs a
+  compat_ioctl conversion handler to translate between the two formats.
+
+  To avoid this problem, all structures should have their members
+  naturally aligned, or explicit reserved fields added in place of the
+  implicit padding.
+
+* On ARM OABI user space, 16-bit member variables have 32-bit
+  alignment, making them incompatible with modern EABI kernels.
+  Conversely, on the m68k architecture, all struct members have at most
+  16-bit alignment. These rarely cause problems as neither ARM-OABI nor
+  m68k are supported by any compat mode, but for consistency, it is best
+  to completely avoid 16-bit member variables.
+
+
+* Bitfields and enums generally work as one would expect them to,
+  but some properties of them are implementation-defined, so it is better
+  to avoid them completely in ioctl interfaces.
+
+* ``char`` members can be either signed or unsigned, depending on
+  the architecture, so the __u8 and __s8 types should be used for 8-bit
+  integer values, though char arrays are clearer for fixed-length strings.
+
+Information leaks
+=================
+
+Uninitialized data must not be copied back to user space, as this can
+cause an information leak, which can be used to defeat kernel address
+space layout randomization (KASLR), helping in an attack.
+
+As explained for the compat mode, it is best to not avoid any padding in
+data structures, but if there is already padding in existing structures,
+the kernel driver must be careful to zero out the padding using
+``memset()`` or similar before copying it to user space.
+
+Subsystem abstractions
+======================
+
+While some device drivers implement their own ioctl function, most
+subsystems implement the same command for multiple drivers. Ideally the
+subsystem has an .ioctl() handler that copies the arguments from and
+to user space, passing them into subsystem specific callback functions
+through normal kernel pointers.
+
+This helps in various ways:
+
+* Applications written for one driver are more likely to work for
+  another one in the same subsystem if there are no subtle differences
+  in the user space ABI.
+
+* The complexity of user space access and data structure layout at done
+  in one place, reducing the potential for implementation bugs.
+
+* It is more likely to be reviewed by experienced developers
+  that can spot problems in the interface when the ioctl is shared
+  between multiple drivers than when it is only used in a single driver.
+
+Alternatives to ioctl
+=====================
+
+There are many cases in which ioctl is not the best solution for a
+problem. Alternatives include
+
+* System calls are a better choice for a system-wide feature that
+  is not tied to a physical device or constrained by the file system
+  permissions of a character device node
+
+* netlink is the preferred way of configuring any network related
+  objects through sockets.
+
+* debugfs is used for ad-hoc interfaces for debugging functionality
+  that does not need to be exposed as a stable interface to applications.
+
+* sysfs is a good way to expose the state of an in-kernel object
+  that is not tied to a file descriptor.
+
+* configfs can be used for more complex configuration than sysfs
+
+* A custom file system can provide extra flexibility with a simple
+  user interface but add a lot of complexity in the implementation.
-- 
2.20.0

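As a rough user-space illustration of the "Command number definitions"
and "Structure layout" rules in the patch above, here is a minimal
sketch. The structure foo_xfer, the type byte FOO_IOC_MAGIC (0xF0) and
the command number 1 are made up for this example and are not taken
from any driver or from the series. Note that the uapi header spells
the read/write macro _IOWR. The static assertions check that the
explicit reserved field leaves no implicit padding, so the layout
should be the same for i386 and x86-64 user space.

```c
#include <assert.h>      /* static_assert (C11) and assert */
#include <stddef.h>      /* offsetof */
#include <stdint.h>      /* uintptr_t */
#include <linux/ioctl.h> /* _IOWR, _IOC_SIZE, _IOC_DIR, ... */
#include <linux/types.h> /* __u32, __u64 */

/*
 * Hypothetical argument structure, invented for this sketch.
 * Every member is naturally aligned, and the hole that would
 * otherwise follow 'flags' is an explicit reserved field.
 */
struct foo_xfer {
	__u32 flags;
	__u32 reserved;	/* explicit padding, expected to be zero */
	__u64 length;
	__u64 buf;	/* user pointer carried as __u64, not 'void *' */
};

/* Same offsets and size regardless of the 64-bit alignment rules. */
static_assert(offsetof(struct foo_xfer, length) == 8, "hole after flags");
static_assert(offsetof(struct foo_xfer, buf) == 16, "hole after length");
static_assert(sizeof(struct foo_xfer) == 24, "implicit tail padding");

/* Hypothetical 'type' byte and command number (assumptions). */
#define FOO_IOC_MAGIC	0xF0
/* Pass the type name itself, not sizeof(type). */
#define FOO_IOC_XFER	_IOWR(FOO_IOC_MAGIC, 1, struct foo_xfer)

/* User-space half of the __u64 pointer convention: cast via uintptr_t. */
static inline __u64 foo_ptr_to_u64(const void *p)
{
	return (__u64)(uintptr_t)p;
}
```

On the kernel side the 'buf' member would be turned back into a user
pointer with u64_to_user_ptr(), as the patch text describes; that half
is not shown because it cannot be built outside the kernel tree.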
* Re: [PATCH 24/24] Documentation: document ioctl interfaces better
  2019-12-11 20:42 ` [PATCH 24/24] Documentation: document ioctl interfaces better Arnd Bergmann
@ 2019-12-11 21:05   ` Jonathan Corbet
  2019-12-12 10:56     ` Arnd Bergmann
  2019-12-12  8:16   ` Geert Uytterhoeven
  1 sibling, 1 reply; 6+ messages in thread
From: Jonathan Corbet @ 2019-12-11 21:05 UTC
  To: Arnd Bergmann
  Cc: Jens Axboe, James E.J. Bottomley, Martin K. Petersen,
	Alexander Viro, linux-kernel, y2038, Mauro Carvalho Chehab,
	Jonathan Neuschäfer, Masahiro Yamada, Vladimir Oltean,
	Kent Overstreet, linux-doc

On Wed, 11 Dec 2019 21:42:58 +0100
Arnd Bergmann <arnd@arndb.de> wrote:

> Documentation/process/botching-up-ioctls.rst was originally
> written as a blog post for DRM driver writers, so it misses
> some points while going into a lot of detail on others.
>
> Try to provide a replacement that addresses typical issues
> across a wider range of subsystems, and follows the style of
> the core-api documentation better.
>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Thanks for improving the docs!  I have a few nits outside of the content
itself.

>  Documentation/core-api/index.rst |   1 +
>  Documentation/core-api/ioctl.rst | 250 +++++++++++++++++++++++++++++++
>  2 files changed, 251 insertions(+)
>  create mode 100644 Documentation/core-api/ioctl.rst

So you left the old document in place; was that intentional?

> diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst
> index ab0eae1c153a..3f28b2f668be 100644
> --- a/Documentation/core-api/index.rst
> +++ b/Documentation/core-api/index.rst
> @@ -39,6 +39,7 @@ Core utilities
>     ../RCU/index
>     gcc-plugins
>     symbol-namespaces
> +   ioctl
>
>
>  Interfaces for kernel debugging
> diff --git a/Documentation/core-api/ioctl.rst b/Documentation/core-api/ioctl.rst
> new file mode 100644
> index 000000000000..cb2c86ae63e7
> --- /dev/null
> +++ b/Documentation/core-api/ioctl.rst
> @@ -0,0 +1,250 @@
> +======================
> +ioctl based interfaces
> +======================
> +
> +:c:func:`ioctl` is the most common way for applications to interface

Please don't use :c:func: anymore.  If you just say "ioctl()" the right
thing will happen (which is nothing here, since there isn't anything that
makes sense to link to in the internal kernel context).

We need a checkpatch rule for :c:func: I guess.

Similarly, later on you have:

> +Timeout values and timestamps should ideally use CLOCK_MONOTONIC time,
> +as returned by ``ktime_get_ns()`` or ``ktime_get_ts64()``.  Unlike
> +CLOCK_REALTIME, this makes the timestamps immune from jumping backwards
> +or forwards due to leap second adjustments and clock_settime() calls.

Making those functions ``literal`` will defeat the automatic
cross-referencing.  Better to just say ktime_get_ns() without quotes.

[...]

> +* On the x86-32 (i386) architecture, the alignment of 64-bit variables
> +  is only 32 bit, but they are naturally aligned on most other
> +  architectures including x86-64. This means a structure like
> +
> +  ::

You don't need the extra lines here; just say "...a structure like::"

> +    struct foo {
> +        __u32 a;
> +        __u64 b;
> +        __u32 c;
> +    };
> +

Thanks,

jon

* Re: [PATCH 24/24] Documentation: document ioctl interfaces better
  2019-12-11 21:05   ` Jonathan Corbet
@ 2019-12-12 10:56     ` Arnd Bergmann
  0 siblings, 0 replies; 6+ messages in thread
From: Arnd Bergmann @ 2019-12-12 10:56 UTC
  To: Jonathan Corbet
  Cc: Jens Axboe, James E.J. Bottomley, Martin K. Petersen,
	Alexander Viro, linux-kernel, y2038 Mailman List,
	Mauro Carvalho Chehab, Jonathan Neuschäfer, Masahiro Yamada,
	Vladimir Oltean, Kent Overstreet, open list:DOCUMENTATION,
	Geert Uytterhoeven, Daniel Vetter

On Wed, Dec 11, 2019 at 10:05 PM Jonathan Corbet <corbet@lwn.net> wrote:
>
> On Wed, 11 Dec 2019 21:42:58 +0100
> Arnd Bergmann <arnd@arndb.de> wrote:
>
> > Documentation/process/botching-up-ioctls.rst was originally
> > written as a blog post for DRM driver writers, so it misses
> > some points while going into a lot of detail on others.
> >
> > Try to provide a replacement that addresses typical issues
> > across a wider range of subsystems, and follows the style of
> > the core-api documentation better.
> >
> > Signed-off-by: Arnd Bergmann <arnd@arndb.de>
>
> Thanks for improving the docs!  I have a few nits outside of the content
> itself.
>
> >  Documentation/core-api/index.rst |   1 +
> >  Documentation/core-api/ioctl.rst | 250 +++++++++++++++++++++++++++++++
> >  2 files changed, 251 insertions(+)
> >  create mode 100644 Documentation/core-api/ioctl.rst
>
> So you left the old document in place; was that intentional?

I wasn't quite sure what to do with it. It does have some points
that are relevant for drivers/gpu/drm that I did not cover in the new
document. Maybe Daniel has an idea for how the two documents can
be combined now, or the overlap reduced.

> > diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst
> > index ab0eae1c153a..3f28b2f668be 100644
> > --- a/Documentation/core-api/index.rst
> > +++ b/Documentation/core-api/index.rst
> > @@ -39,6 +39,7 @@ Core utilities
> >     ../RCU/index
> >     gcc-plugins
> >     symbol-namespaces
> > +   ioctl
> >
> >
> >  Interfaces for kernel debugging
> > diff --git a/Documentation/core-api/ioctl.rst b/Documentation/core-api/ioctl.rst
> > new file mode 100644
> > index 000000000000..cb2c86ae63e7
> > --- /dev/null
> > +++ b/Documentation/core-api/ioctl.rst
> > @@ -0,0 +1,250 @@
> > +======================
> > +ioctl based interfaces
> > +======================
> > +
> > +:c:func:`ioctl` is the most common way for applications to interface
>
> Please don't use :c:func: anymore.  If you just say "ioctl()" the right
> thing will happen (which is nothing here, since there isn't anything that
> makes sense to link to in the internal kernel context).
>
> We need a checkpatch rule for :c:func: I guess.

Ok, fixed.

> Similarly, later on you have:
>
> > +Timeout values and timestamps should ideally use CLOCK_MONOTONIC time,
> > +as returned by ``ktime_get_ns()`` or ``ktime_get_ts64()``.  Unlike
> > +CLOCK_REALTIME, this makes the timestamps immune from jumping backwards
> > +or forwards due to leap second adjustments and clock_settime() calls.
>
> Making those functions ``literal`` will defeat the automatic
> cross-referencing.  Better to just say ktime_get_ns() without quotes.

Ok, done. Does this only work for function names, or should I also
use a different way to write ``include/uapi/asm-generic/ioctl.h``
or ``sizeof(size)`` and ``unsigned long``?

> [...]
>
> > +* On the x86-32 (i386) architecture, the alignment of 64-bit variables
> > +  is only 32 bit, but they are naturally aligned on most other
> > +  architectures including x86-64. This means a structure like
> > +
> > +  ::
>
> You don't need the extra lines here; just say "...a structure like::"

Done.

See below for changes I did relative to the feedback so far, now squashed
into the latest version.

Thanks for the review,

      Arnd

8<-----------
diff --git a/Documentation/core-api/ioctl.rst b/Documentation/core-api/ioctl.rst
index cb2c86ae63e7..2e70d3633883 100644
--- a/Documentation/core-api/ioctl.rst
+++ b/Documentation/core-api/ioctl.rst
@@ -2,7 +2,7 @@
 ioctl based interfaces
 ======================
 
-:c:func:`ioctl` is the most common way for applications to interface
+ioctl() is the most common way for applications to interface
 with device drivers. It is flexible and easily extended by adding new
 commands and can be passed through character devices, block devices as
 well as sockets and other special file descriptors.
@@ -20,19 +20,19 @@ identifies an action for a particular driver, there are a number of
 conventions around defining them.
 
 ``include/uapi/asm-generic/ioctl.h`` provides four macros for defining
-ioctl commands that follow modern conventions: ``_IOC``, ``_IOR``,
+ioctl commands that follow modern conventions: ``_IO``, ``_IOR``,
 ``_IOW``, and ``_IORW``. These should be used for all new commands,
 with the correct parameters:
 
 _IO/_IOR/_IOW/_IOWR
     The macro name determines whether the argument is used for passing
     data into kernel (_IOW), from the kernel (_IOR), both (_IOWR) or is
-    not a pointer (_IOC). It is possible but not recommended to pass an
-    integer value instead of a pointer with _IOC.
+    not a pointer (_IO). It is possible but not recommended to pass an
+    integer value instead of a pointer with _IO.
 
 type
     An 8-bit number, often a character literal, specific to a subsystem
-    or driver, and listed in :doc:`../ioctl/ioctl-number`
+    or driver, and listed in :doc:`../userspace-api/ioctl/ioctl-number`
 
 nr
     An 8-bit number identifying the specific command, unique for a give
@@ -91,7 +91,7 @@ data structures when separate second/nanosecond values are desired,
 or passed to user space directly. This is still not ideal though,
 as the structure matches neither the kernel's timespec64 nor the user
 space timespec exactly. The get_timespec64() and put_timespec64() helper
-functions canbe used to ensure that the layout remains compatible with
+functions can be used to ensure that the layout remains compatible with
 user space and the padding is treated correctly.
 
 As it is cheap to convert seconds to nanoseconds, but the opposite
@@ -99,13 +99,12 @@ requires an expensive 64-bit division, a simple __u64 nanosecond value
 can be simpler and more efficient.
 
 Timeout values and timestamps should ideally use CLOCK_MONOTONIC time,
-as returned by ``ktime_get_ns()`` or ``ktime_get_ts64()``. Unlike
+as returned by ktime_get_ns() or ktime_get_ts64(). Unlike
 CLOCK_REALTIME, this makes the timestamps immune from jumping backwards
 or forwards due to leap second adjustments and clock_settime() calls.
 
-``ktime_get_real_ns()`` can be used for CLOCK_REALTIME timestamps that
-may be required for timestamps that need to be persistent across a reboot
-or between multiple machines.
+ktime_get_real_ns() can be used for CLOCK_REALTIME timestamps that
+need to be persistent across a reboot or between multiple machines.
 
 32-bit compat mode
 ==================
@@ -116,14 +115,14 @@ implement the corresponding compat_ioctl handler.
 
 As long as all the rules for data structures are followed, this is as
 easy as setting the .compat_ioctl pointer to a helper function such as
-``compat_ptr_ioctl()`` or ``blkdev_compat_ptr_ioctl``.
+compat_ptr_ioctl() or blkdev_compat_ptr_ioctl().
 
 compat_ptr()
 ------------
 
 On the s/390 architecture, 31-bit user space has ambiguous representations
 for data pointers, with the upper bit being ignored. When running such
-a process in compat mode, the ``compat_ptr()`` helper must be used to
+a process in compat mode, the compat_ptr() helper must be used to
 clear the upper bit of a compat_uptr_t and turn it into a valid 64-bit
 pointer. On other architectures, this macro only performs a cast to a
 ``void __user *`` pointer.
@@ -150,16 +149,14 @@ avoiding all problematic members:
   ``__s64`` and ``__u64``.
 
 * Pointers have the same problem, in addition to requiring the
-  use of ``compat_ptr()``. The best workaround is to use ``__u64``
+  use of compat_ptr(). The best workaround is to use ``__u64``
   in place of pointers, which requires a cast to ``uintptr_t`` in user
-  space, and the use of ``u64_to_user_ptr()`` in the kernel to convert
+  space, and the use of u64_to_user_ptr() in the kernel to convert
   it back into a user pointer.
 
 * On the x86-32 (i386) architecture, the alignment of 64-bit variables
-  is only 32 bit, but they are naturally aligned on most other
-  architectures including x86-64. This means a structure like
-
-  ::
+  is only 32-bit, but they are naturally aligned on most other
+  architectures including x86-64. This means a structure like::
 
     struct foo {
         __u32 a;
@@ -177,9 +174,10 @@ avoiding all problematic members:
 
 * On ARM OABI user space, 16-bit member variables have 32-bit
   alignment, making them incompatible with modern EABI kernels.
-  Conversely, on the m68k architecture, all struct members have at most
-  16-bit alignment. These rarely cause problems as neither ARM-OABI nor
-  m68k are supported by any compat mode, but for consistency, it is best
+  Conversely, on the m68k architecture, struct members are not
+  guaranteed to have an alignment greater than 16-bit.
+  These rarely cause problems as neither ARM-OABI nor m68k are
+  supported by any compat mode, but for consistency, it is best
   to completely avoid 16-bit member variables.
 
 
@@ -198,10 +196,10 @@ Uninitialized data must not be copied back to user space, as this can
 cause an information leak, which can be used to defeat kernel address
 space layout randomization (KASLR), helping in an attack.
 
-As explained for the compat mode, it is best to not avoid any padding in
-data structures, but if there is already padding in existing structures,
-the kernel driver must be careful to zero out the padding using
-``memset()`` or similar before copying it to user space.
+As explained for the compat mode, it is best to not avoid any implicit
+padding in data structures, but if there is already padding in existing
+structures, the kernel driver must be careful to zero out the padding
+using memset() or similar before copying it to user space.
 
 Subsystem abstractions
 ======================
@@ -218,7 +216,7 @@ This helps in various ways:
   another one in the same subsystem if there are no subtle differences
   in the user space ABI.
 
-* The complexity of user space access and data structure layout at done
+* The complexity of user space access and data structure layout is done
   in one place, reducing the potential for implementation bugs.
 
 * It is more likely to be reviewed by experienced developers
@@ -247,4 +245,4 @@ problem. Alternatives include
 * configfs can be used for more complex configuration than sysfs
 
 * A custom file system can provide extra flexibility with a simple
-  user interface but add a lot of complexity in the implementation.
+  user interface but add a lot of complexity to the implementation.

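The "Information leaks" paragraph touched by the diff above can be
sketched in plain C. This is only a user-space approximation under
stated assumptions: struct legacy_reply and both helpers are invented
for the example, and a memcpy() into a byte array stands in for the
copy to user space. It also relies on the common compiler behaviour
that storing to a member leaves already-zeroed padding alone, which
the C standard does not strictly promise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical pre-existing reply structure (not from the patch).
 * On ABIs that align uint64_t to 8 bytes there are four bytes of
 * implicit padding between 'status' and 'value'; on i386 there
 * are none.
 */
struct legacy_reply {
	uint32_t status;
	uint64_t value;
};

/* Clear the whole object first so the padding holds no stale data. */
static inline void legacy_reply_fill(struct legacy_reply *r,
				     uint32_t status, uint64_t value)
{
	memset(r, 0, sizeof(*r));
	r->status = status;
	r->value = value;
}

/* Return 1 if every byte between 'status' and 'value' in buf is zero. */
static inline int legacy_reply_hole_is_zero(const unsigned char *buf)
{
	size_t i = offsetof(struct legacy_reply, status) + sizeof(uint32_t);

	for (; i < offsetof(struct legacy_reply, value); i++)
		if (buf[i] != 0)
			return 0;
	return 1;
}
```

A caller would poison the object (to mimic stale stack contents), call
legacy_reply_fill(), copy the bytes out, and then check the hole with
legacy_reply_hole_is_zero(). New interfaces are better served by the
explicit reserved fields the patch recommends, which make this kind of
check unnecessary.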
* Re: [PATCH 24/24] Documentation: document ioctl interfaces better
  2019-12-11 20:42 ` [PATCH 24/24] Documentation: document ioctl interfaces better Arnd Bergmann
  2019-12-11 21:05   ` Jonathan Corbet
@ 2019-12-12  8:16   ` Geert Uytterhoeven
  2019-12-12 11:04     ` Arnd Bergmann
  1 sibling, 1 reply; 6+ messages in thread
From: Geert Uytterhoeven @ 2019-12-12  8:16 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Jens Axboe, James E.J. Bottomley, Martin K. Petersen,
	Alexander Viro, Jonathan Corbet, Linux Kernel Mailing List,
	y2038 Mailman List, Mauro Carvalho Chehab, Jonathan Neuschäfer,
	Masahiro Yamada, Vladimir Oltean, Kent Overstreet,
	open list:DOCUMENTATION

Hi Arnd,

On Wed, Dec 11, 2019 at 9:53 PM Arnd Bergmann <arnd@arndb.de> wrote:
> Documentation/process/botching-up-ioctls.rst was orignally
> written as a blog post for DRM driver writers, so it it misses
> some points while going into a lot of detail on others.
>
> Try to provide a replacement that addresses typical issues
> across a wider range of subsystems, and follows the style of
> the core-api documentation better.
>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Thanks for your patch!

> --- /dev/null
> +++ b/Documentation/core-api/ioctl.rst
> @@ -0,0 +1,250 @@
> +======================
> +ioctl based interfaces
> +======================
> +
> +:c:func:`ioctl` is the most common way for applications to interface
> +with device drivers. It is flexible and easily extended by adding new
> +commands and can be passed through character devices, block devices as
> +well as sockets and other special file descriptors.
> +
> +However, it is also very easy to get ioctl command definitions wrong,
> +and hard to fix them later without breaking existing applications,
> +so this documentation tries to help developers get it right.
> +
> +Command number definitions
> +==========================
> +
> +The command number, or request number, is the second argument passed to
> +the ioctl system call. While this can be any 32-bit number that uniquely
While this can be any 32-bit number that uniquely > +identifies an action for a particular driver, there are a number of > +conventions around defining them. Interesting. I never realized the action is 32-bit in the kernel, but unsigned long in userspace... > + > +``include/uapi/asm-generic/ioctl.h`` provides four macros for defining > +ioctl commands that follow modern conventions: ``_IOC``, ``_IOR``, > +``_IOW``, and ``_IORW``. These should be used for all new commands, > +with the correct parameters: > + > +_IO/_IOR/_IOW/_IOWR This says _IO.... > + The macro name determines whether the argument is used for passing > + data into kernel (_IOW), from the kernel (_IOR), both (_IOWR) or is into the kernel , or is > + not a pointer (_IOC). It is possible but not recommended to pass an > + integer value instead of a pointer with _IOC. ...which is not explained here, but _IOC is? > + > +type > + An 8-bit number, often a character literal, specific to a subsystem > + or driver, and listed in :doc:`../ioctl/ioctl-number` > + > +nr > + An 8-bit number identifying the specific command, unique for a give > + value of 'type' > + > +size > + The name of the data type pointed to by the argument, the command > + number encodes the ``sizeof(size)`` value in a 13-bit or 14-bit integer, > + leading to a limit of 8191 bytes for the maximum size of the argument. > + Note: do not pass sizeof(type) type into _IOR/IOW, as that will lead > + to encoding sizeof(sizeof(type)), i.e. sizeof(size_t). Looks like "size" could be renamed, to make this more obvious? > +Timestamps > +========== > + > +Traditionally, timestamps and timeout values are passed as ``struct > +timespec`` or ``struct timeval``, but these are problematic because of > +incompatible definitions of these structures in user space after the > +move to 64-bit time_t. 
> +
> +The __kernel_timespec type can be used instead to be embedded in other
> +data structures when separate second/nanosecond values are desired,
> +or passed to user space directly. This is still not ideal though,
> +as the structure matches neither the kernel's timespec64 nor the user
> +space timespec exactly. The get_timespec64() and put_timespec64() helper
> +functions canbe used to ensure that the layout remains compatible with

             can be

> +user space and the padding is treated correctly.
> +
> +As it is cheap to convert seconds to nanoseconds, but the opposite
> +requires an expensive 64-bit division, a simple __u64 nanosecond value
> +can be simpler and more efficient.
> +
> +Timeout values and timestamps should ideally use CLOCK_MONOTONIC time,
> +as returned by ``ktime_get_ns()`` or ``ktime_get_ts64()``. Unlike
> +CLOCK_REALTIME, this makes the timestamps immune from jumping backwards
> +or forwards due to leap second adjustments and clock_settime() calls.
> +
> +``ktime_get_real_ns()`` can be used for CLOCK_REALTIME timestamps that
> +may be required for timestamps that need to be persistent across a reboot

Drop "may be required for timestamps that"?

> +or between multiple machines.

> +Structure layout
> +----------------
> +
> +Compatible data structures have the same layout on all architectures,
> +avoiding all problematic members:
> +
> +* ``long`` and ``unsigned long`` are the size of a register, so
> +  they can be either 32 bit or 64 bit wide and cannot be used in portable

                       32-bit or 64-bit (for consistency with the rest of the document)

> +  data structures. Fixed-length replacements are ``__s32``, ``__u32``,
> +  ``__s64`` and ``__u64``.
> +
> +* Pointers have the same problem, in addition to requiring the
> +  use of ``compat_ptr()``. The best workaround is to use ``__u64``
> +  in place of pointers, which requires a cast to ``uintptr_t`` in user
> +  space, and the use of ``u64_to_user_ptr()`` in the kernel to convert
> +  it back into a user pointer.
> +
> +* On the x86-32 (i386) architecture, the alignment of 64-bit variables
> +  is only 32 bit, but they are naturally aligned on most other

             32-bit

> +  architectures including x86-64. This means a structure like
> +
> +  ::
> +
> +    struct foo {
> +        __u32 a;
> +        __u64 b;
> +        __u32 c;
> +    };
> +
> +  has four bytes of padding between a and b on x86-64, plus another four
> +  bytes of padding at the end, but no padding on i386, and it needs a
> +  compat_ioctl conversion handler to translate between the two formats.
> +
> +  To avoid this problem, all structures should have their members
> +  naturally aligned, or explicit reserved fields added in place of the
> +  implicit padding.
> +
> +* On ARM OABI user space, 16-bit member variables have 32-bit
> +  alignment, making them incompatible with modern EABI kernels.
> +  Conversely, on the m68k architecture, all struct members have at most
> +  16-bit alignment. These rarely cause problems as neither ARM-OABI nor

"have at most 16-bit alignment" sounds a bit weird to me, as a member
may have a greater alignment.
"struct members are not guaranteed to have an alignment greater than
16-bit"?

> +  m68k are supported by any compat mode, but for consistency, it is best
> +  to completely avoid 16-bit member variables.
> +
> +
> +* Bitfields and enums generally work as one would expect them to,
> +  but some properties of them are implementation-defined, so it is better
> +  to avoid them completely in ioctl interfaces.
> +
> +* ``char`` members can be either signed or unsigned, depending on
> +  the architecture, so the __u8 and __s8 types should be used for 8-bit
> +  integer values, though char arrays are clearer for fixed-length strings.
> +
> +Information leaks
> +=================
> +
> +Uninitialized data must not be copied back to user space, as this can
> +cause an information leak, which can be used to defeat kernel address
> +space layout randomization (KASLR), helping in an attack.
> +
> +As explained for the compat mode, it is best to not avoid any padding in

best to avoid any implicit padding?

> +data structures, but if there is already padding in existing structures,
> +the kernel driver must be careful to zero out the padding using
> +``memset()`` or similar before copying it to user space.
> +
> +Subsystem abstractions
> +======================
> +
> +While some device drivers implement their own ioctl function, most
> +subsystems implement the same command for multiple drivers. Ideally the
> +subsystem has an .ioctl() handler that copies the arguments from and
> +to user space, passing them into subsystem specific callback functions
> +through normal kernel pointers.
> +
> +This helps in various ways:
> +
> +* Applications written for one driver are more likely to work for
> +  another one in the same subsystem if there are no subtle differences
> +  in the user space ABI.
> +
> +* The complexity of user space access and data structure layout at done

is done

> +  in one place, reducing the potential for implementation bugs.
> +
> +* It is more likely to be reviewed by experienced developers
> +  that can spot problems in the interface when the ioctl is shared
> +  between multiple drivers than when it is only used in a single driver.
> +
> +Alternatives to ioctl
> +=====================
> +
> +There are many cases in which ioctl is not the best solution for a
> +problem. Alternatives include

:

> +
> +* System calls are a better choice for a system-wide feature that
> +  is not tied to a physical device or constrained by the file system
> +  permissions of a character device node
> +
> +* netlink is the preferred way of configuring any network related
> +  objects through sockets.
> +
> +* debugfs is used for ad-hoc interfaces for debugging functionality
> +  that does not need to be exposed as a stable interface to applications.
> +
> +* sysfs is a good way to expose the state of an in-kernel object
> +  that is not tied to a file descriptor.
> +
> +* configfs can be used for more complex configuration than sysfs
> +
> +* A custom file system can provide extra flexibility with a simple
> +  user interface but add a lot of complexity in the implementation.

adds ... to

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply [flat|nested] 6+ messages in thread
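
The command-number conventions reviewed in this message can be tried out from user space with the exported copy of the header, <linux/ioctl.h>. The following is only a sketch: struct demo_args, the DEMO_IOC_* names, demo_selfcheck() and the type letter 'x' are invented for illustration and have not been checked against the ioctl-number list. It shows what each macro encodes, including the sizeof() mistake the patch warns about.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/ioctl.h>

/* Hypothetical argument structure: fixed-width members, no implicit padding. */
struct demo_args {
	uint64_t value;
	uint32_t flags;
	uint32_t reserved;
};

/* Hypothetical command numbers; the macro name encodes the data direction. */
#define DEMO_IOC_RESET	_IO('x', 0)			/* no argument */
#define DEMO_IOC_GET	_IOR('x', 1, struct demo_args)	/* kernel to user space */
#define DEMO_IOC_SET	_IOW('x', 2, struct demo_args)	/* user space to kernel */
#define DEMO_IOC_XCHG	_IOWR('x', 3, struct demo_args)	/* both directions */

/*
 * The mistake described in the patch: the third argument must be a type
 * name. Passing sizeof(...) makes the macro encode sizeof(size_t) instead.
 */
#define DEMO_IOC_BAD	_IOW('x', 4, sizeof(struct demo_args))

/* Decode the commands again and check what was encoded in each field. */
static void demo_selfcheck(void)
{
	assert(_IOC_DIR(DEMO_IOC_RESET) == _IOC_NONE);
	assert(_IOC_DIR(DEMO_IOC_GET) == _IOC_READ);
	assert(_IOC_DIR(DEMO_IOC_SET) == _IOC_WRITE);
	assert(_IOC_DIR(DEMO_IOC_XCHG) == (_IOC_READ | _IOC_WRITE));

	assert(_IOC_TYPE(DEMO_IOC_GET) == 'x');
	assert(_IOC_NR(DEMO_IOC_GET) == 1);
	assert(_IOC_SIZE(DEMO_IOC_GET) == sizeof(struct demo_args));

	/* The wrong definition silently encodes the size of a size_t. */
	assert(_IOC_SIZE(DEMO_IOC_BAD) == sizeof(size_t));
}
```

Because the direction and size are part of the number, two commands that differ only in the argument type get different values, which is what lets a handler tell layouts apart.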
* Re: [PATCH 24/24] Documentation: document ioctl interfaces better
  2019-12-12  8:16   ` Geert Uytterhoeven
@ 2019-12-12 11:04     ` Arnd Bergmann
  0 siblings, 0 replies; 6+ messages in thread
From: Arnd Bergmann @ 2019-12-12 11:04 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Jens Axboe, James E.J. Bottomley, Martin K. Petersen,
	Alexander Viro, Jonathan Corbet, Linux Kernel Mailing List,
	y2038 Mailman List, Mauro Carvalho Chehab, Jonathan Neuschäfer,
	Masahiro Yamada, Vladimir Oltean, Kent Overstreet,
	open list:DOCUMENTATION

On Thu, Dec 12, 2019 at 9:16 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> On Wed, Dec 11, 2019 at 9:53 PM Arnd Bergmann <arnd@arndb.de> wrote:

> > +``include/uapi/asm-generic/ioctl.h`` provides four macros for defining
> > +ioctl commands that follow modern conventions: ``_IOC``, ``_IOR``,
> > +``_IOW``, and ``_IORW``. These should be used for all new commands,
> > +with the correct parameters:
> > +
> > +_IO/_IOR/_IOW/_IOWR
>
> This says _IO....
>
> > +    The macro name determines whether the argument is used for passing
> > +    data into kernel (_IOW), from the kernel (_IOR), both (_IOWR) or is
>
>             into the kernel
>     , or is
>
> > +    not a pointer (_IOC). It is possible but not recommended to pass an
> > +    integer value instead of a pointer with _IOC.
>
> ...which is not explained here, but _IOC is?

I mean _IO() everywhere, I would consider _IOC an implementation detail and
not document that here. s/_IOC/_IO/ throughout the document now.

> > +size
> > +    The name of the data type pointed to by the argument, the command
> > +    number encodes the ``sizeof(size)`` value in a 13-bit or 14-bit integer,
> > +    leading to a limit of 8191 bytes for the maximum size of the argument.
> > +    Note: do not pass sizeof(type) type into _IOR/IOW, as that will lead
> > +    to encoding sizeof(sizeof(type)), i.e. sizeof(size_t).
>
> Looks like "size" could be renamed, to make this more obvious?

Changed to data_type now, which is what I found in some other documentation.

> > +space timespec exactly. The get_timespec64() and put_timespec64() helper
> > +functions canbe used to ensure that the layout remains compatible with
>
>              can be

done.

> > +
> > +``ktime_get_real_ns()`` can be used for CLOCK_REALTIME timestamps that
> > +may be required for timestamps that need to be persistent across a reboot
>
> Drop "may be required for timestamps that"?

done.

> > +* ``long`` and ``unsigned long`` are the size of a register, so
> > +  they can be either 32 bit or 64 bit wide and cannot be used in portable
>
>     32-bit or 64-bit (for consistency with the rest of the document)

The convention I was trying to follow is to write "32-bit userspace" or
"32-bit processor" but "this type is 32 bit wide"

I wasn't consistent though, changed it now as you suggested.

> > +
> > +* On ARM OABI user space, 16-bit member variables have 32-bit
> > +  alignment, making them incompatible with modern EABI kernels.
> > +  Conversely, on the m68k architecture, all struct members have at most
> > +  16-bit alignment. These rarely cause problems as neither ARM-OABI nor
>
> "have at most 16-bit alignment" sounds a bit weird to me, as a member
> may have a greater alignment.
> "struct members are not guaranteed to have an alignment greater than 16-bit"?

done.

> > +
> > +As explained for the compat mode, it is best to not avoid any padding in
>
> best to avoid any implicit padding?

done.

> > +
> > +* The complexity of user space access and data structure layout at done
>
> is done

changed

> > +There are many cases in which ioctl is not the best solution for a
> > +problem. Alternatives include
>
> :

done

> > +* System calls are a better choice for a system-wide feature that
> > +  is not tied to a physical device or constrained by the file system
> > +  permissions of a character device node
> > +
> > +* netlink is the preferred way of configuring any network related
> > +  objects through sockets.
> > +
> > +* debugfs is used for ad-hoc interfaces for debugging functionality
> > +  that does not need to be exposed as a stable interface to applications.
> > +
> > +* sysfs is a good way to expose the state of an in-kernel object
> > +  that is not tied to a file descriptor.
> > +
> > +* configfs can be used for more complex configuration than sysfs
> > +
> > +* A custom file system can provide extra flexibility with a simple
> > +  user interface but add a lot of complexity in the implementation.
>
> adds ... to

done.

Thanks for all the suggestions!

      Arnd

^ permalink raw reply [flat|nested] 6+ messages in thread
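
The timestamp advice quoted in this thread (a plain 64-bit nanosecond count is cheap to produce, while splitting it back into seconds and nanoseconds needs a 64-bit division) can be sketched in user space as follows. All demo_* names are hypothetical, and clock_gettime(CLOCK_MONOTONIC) is used here only as a rough user-space analogue of the kernel's ktime_get_ns(); it is not what a driver would call.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdint.h>
#include <time.h>

#define DEMO_NSEC_PER_SEC 1000000000ULL

/* Seconds/nanoseconds to one 64-bit count: a multiplication and an addition. */
static uint64_t demo_timespec_to_ns(const struct timespec *ts)
{
	return (uint64_t)ts->tv_sec * DEMO_NSEC_PER_SEC + (uint64_t)ts->tv_nsec;
}

/* The opposite direction needs a 64-bit division and remainder. */
static struct timespec demo_ns_to_timespec(uint64_t ns)
{
	struct timespec ts;

	ts.tv_sec = (time_t)(ns / DEMO_NSEC_PER_SEC);
	ts.tv_nsec = (long)(ns % DEMO_NSEC_PER_SEC);
	return ts;
}

/* Monotonic time as a single nanosecond value, suitable for a __u64 field. */
static uint64_t demo_monotonic_ns(void)
{
	struct timespec ts;
	int ret = clock_gettime(CLOCK_MONOTONIC, &ts);

	assert(ret == 0);
	(void)ret;	/* keeps the build quiet when NDEBUG disables assert() */
	return demo_timespec_to_ns(&ts);
}
```

Two successive demo_monotonic_ns() readings never go backwards, which is the property the document asks for when it recommends CLOCK_MONOTONIC over CLOCK_REALTIME.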
end of thread, other threads:[~2019-12-12 11:05 UTC | newest]

Thread overview: 6+ messages
2019-12-11 20:42 [PATCH 00/24] block, scsi: final compat_ioctl cleanup Arnd Bergmann
2019-12-11 20:42 ` [PATCH 24/24] Documentation: document ioctl interfaces better Arnd Bergmann
2019-12-11 21:05   ` Jonathan Corbet
2019-12-12 10:56     ` Arnd Bergmann
2019-12-12  8:16   ` Geert Uytterhoeven
2019-12-12 11:04     ` Arnd Bergmann