* [PATCH v18 00/83] sg: add v4 interface, request sharing
@ 2021-04-27 21:56 Douglas Gilbert
From: Douglas Gilbert @ 2021-04-27 21:56 UTC
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

This is the combined patchset, showing the additions made by the
second-half patchset, which will be presented after the first-half
patchset is accepted.

Patches 1 to 45 (inclusive) are a new version (v18) of the
first half patchset and have their own cover letter whose
subject line is:
    [PATCH v18 00/45] sg: add v4 interface

The combined patchset is released so that it can be run through the
kernel's code sanity checking mechanisms, so expect some noise.

The additions in the second-half patchset are more fully described at:
    https://sg.danny.cz/sg/sg_v40.html which is mirrored at:
    https://doug-gilbert.github.io/sg_v40.html

The following list is a summary of features in the second patchset:

    - add (sg) file descriptor sharing; this is used by:
      - request sharing
      - multiple requests (mrq) use of SGV4_FLAG_DO_ON_OTHER flag

    - add request sharing, mainly to expedite copying. The READ's bio
      is handed off to the paired WRITE, with no data passed to user
      space unless requested. VERIFY(BytChk=1) can also be used instead
      of the WRITE (this is what NVMe does in its Compare NVM command).

    - extend the request sharing logic so SGV4_FLAG_KEEP_SHARE
      keeps the bio after a WRITE. This allows a single-source,
      multiple-destination copy.

    - add an extensible SG_SET_GET_EXTENDED ioctl(2) that takes a
      fixed-size structure (96 bytes).

    - add multiple requests capability (mrq) in a single ioctl(SG_IO)
      or ioctl(SG_IOSUBMIT) invocation. Can be combined with request
      sharing.

    - add a SGV4_FLAG_IMMED flag for ioctl(SG_IORECEIVE) or
      ioctl(SG_IORECEIVE_V3) so that they do not wait

    - add logic for (block layer generated) tag handling and keep
      existing pack_id (packet id) logic which plays a similar role

    - add ioctl(SG_IOABORT) to abort an inflight command/request
      using its pack_id or tag.

    - add a shared variable blocking (svb) method to mrq. It assumes
      copy-like request sharing. By default WRITEs are unordered
      (with respect to each other). With the SGV4_FLAG_ORDERED_WR flag,
      WRITEs are ordered as required for ZBC disks.

    - add support to pass a fd generated by eventfd(2) to the driver
      via an ioctl(2).

    - use iopoll/hipri/blk_poll with mrq, especially svb.

    - bump the driver version number to 4.0.47

Most of the above are _only_ implemented for the sg version 4
(i.e. based on struct sg_io_v4) interface.


Douglas Gilbert (83):
  sg: move functions around
  sg: remove typedefs, type+formatting cleanup
  sg: sg_log and is_enabled
  sg: rework sg_poll(), minor changes
  sg: bitops in sg_device
  sg: make open count an atomic
  sg: move header to uapi section
  sg: speed sg_poll and sg_get_num_waiting
  sg: sg_allow_if_err_recovery and renames
  sg: improve naming
  sg: change rwlock to spinlock
  sg: ioctl handling
  sg: split sg_read
  sg: sg_common_write add structure for arguments
  sg: rework sg_vma_fault
  sg: rework sg_mmap
  sg: replace sg_allow_access
  sg: rework scatter gather handling
  sg: introduce request state machine
  sg: sg_find_srp_by_id
  sg: sg_fill_request_element
  sg: printk change %p to %pK
  sg: xarray for fds in device
  sg: xarray for reqs in fd
  sg: replace rq array with xarray
  sg: sense buffer rework
  sg: add sg v4 interface support
  sg: rework debug info
  sg: add 8 byte SCSI LUN to sg_scsi_id
  sg: expand sg_comm_wr_t
  sg: add sg_iosubmit_v3 and sg_ioreceive_v3 ioctls
  sg: add some __must_hold macros
  sg: move procfs objects to avoid forward decls
  sg: protect multiple receivers
  sg: first debugfs support
  sg: rework mmap support
  sg: defang allow_dio
  sg: warn v3 write system call users
  sg: add mmap_sz tracking
  sg: remove rcv_done request state
  sg: track lowest inactive and await indexes
  sg: remove unit attention check for device changed
  sg: no_dxfer: move to/from kernel buffers
  sg: add blk_poll support
  sg: bump version to 4.0.12
  sg: add sg_ioabort ioctl
  sg: add sg_set_get_extended ioctl
  sg: sgat_elem_sz and sum_fd_dlens
  sg: tag and more_async
  sg: add fd sharing , change, unshare
  sg: add shared requests
  sg: add multiple request support
  sg: rename some mrq variables
  sg: unlikely likely
  sg: mrq abort
  sg: reduce atomic operations
  sg: add excl_wait flag
  sg: tweak sg_find_sfp_by_fd()
  sg: add snap_dev flag and snapped in debugfs
  sg: compress usercontext to uc
  sg: optionally output sg_request.frq_bm flags
  sg: work on sg_mrq_sanity()
  sg: shared variable blocking
  sg: device timestamp
  sg: condition met is not an error
  sg: split sg_setup_req
  sg: finish after read-side request
  sg: keep share and dout offset flags
  sg: add dlen to sg_comm_wr_t
  sg: make use of struct sg_mrq_hold
  sg: add mmap IO option for mrq metadata
  sg: add eventfd support
  sg: table of error number explanations
  sg: add ordered write flag
  sg: expand source line length to 100 characters
  sg: add no_attach_msg parameter
  sg: add SGV4_FLAG_REC_ORDER
  sg: max to read for mrq sg_ioreceive
  sg: mrq: if uniform svb then re-use bio_s
  sg: expand bvec usage; re-use bio_s
  sg: blk_poll/hipri work for mrq
  sg: pollable and non-pollable requests
  sg: bump version to 4.0.47
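
Patch 01 documents the sg_version_num encoding as [x]xyyzz (two decimal digits each for the middle and last components) and SG_VERSION_STR as [x]x.[y]y.zz; for example 30536 corresponds to "3.5.36" and 30901 to "3.9.01". By that scheme 4.0.47 would be 40047. A small sketch of encoding and decoding; the function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Encode major.minor.sub as [x]xyyzz: minor and sub each take two
 * decimal digits, major takes whatever is left. */
static int sketch_sg_version_num(int major, int minor, int sub)
{
	assert(major >= 0 && minor >= 0 && minor < 100 &&
	       sub >= 0 && sub < 100);
	return major * 10000 + minor * 100 + sub;
}

/* Decode into the [x]x.[y]y.zz string form: minor without a leading
 * zero, sub always as two digits. Returns snprintf()'s result. */
static int sketch_sg_version_str(int num, char *buf, size_t len)
{
	return snprintf(buf, len, "%d.%d.%02d", num / 10000,
			(num / 100) % 100, num % 100);
}
```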

 drivers/scsi/sg.c      | 10003 +++++++++++++++++++++++++++++++--------
 include/scsi/sg.h      |   273 +-
 include/uapi/scsi/sg.h |   491 ++
 3 files changed, 8440 insertions(+), 2327 deletions(-)
 create mode 100644 include/uapi/scsi/sg.h

-- 
2.25.1



* [PATCH v18 00/45] sg: add v4 interface
@ 2021-04-27 21:56 ` Douglas Gilbert
From: Douglas Gilbert @ 2021-04-27 21:56 UTC
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

This patchset is the first stage of a two-stage rewrite of the SCSI
generic (sg) driver. The main goal of the first stage is to introduce
the sg v4 interface, which uses 'struct sg_io_v4', while keeping and
modernizing the sg v3 interface (based on 'struct sg_io_hdr'). The
async interface, which formerly required the use of the write() and
read() system calls, now has ioctl(SG_IOSUBMIT) and ioctl(SG_IORECEIVE)
replacements.
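
To make the v4 interface concrete, here is a sketch that only fills in a struct sg_io_v4 header for a TEST UNIT READY command (6-byte CDB, no data transfer). It assumes the structure and the BSG_PROTOCOL_SCSI / BSG_SUB_PROTOCOL_SCSI_CMD constants from <linux/bsg.h>; the helper name and the 20 second timeout are illustrative. It deliberately does not issue the ioctl, because SG_IOSUBMIT and SG_IORECEIVE are added by this patchset's new include/uapi/scsi/sg.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <linux/bsg.h>	/* struct sg_io_v4, BSG_PROTOCOL_SCSI */

/* Prepare a v4 header for a command that moves no data (e.g. TEST UNIT
 * READY). Only the header is built; submission is left to the caller. */
static void sketch_fill_tur_v4(struct sg_io_v4 *h, const uint8_t *cdb,
			       uint32_t cdb_len, uint8_t *sense,
			       uint32_t sense_len)
{
	assert(h != NULL && cdb != NULL);
	memset(h, 0, sizeof(*h));
	h->guard = 'Q';                        /* marks a v4 header */
	h->protocol = BSG_PROTOCOL_SCSI;
	h->subprotocol = BSG_SUB_PROTOCOL_SCSI_CMD;
	h->request_len = cdb_len;
	h->request = (uintptr_t)cdb;           /* pointers travel as __u64 */
	h->max_response_len = sense_len;       /* sense buffer size */
	h->response = (uintptr_t)sense;
	h->timeout = 20000;                    /* milliseconds */
	/* din/dout transfer lengths stay 0: no data phase */
}
```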

A recent patch to fio allows its sg engine to issue iopoll requests
when hipri=1 is given in the fio job file. When testing sg device nodes
with fio, the option direct=0 must be set. The sync option may be set
to 0 (use sg's async write/read interface) or 1 (use the ioctl(SG_IO)
interface).

For documentation see:
    https://sg.danny.cz/sg/sg_v40.html which is mirrored at:
    https://doug-gilbert.github.io/sg_v40.html

This patchset is against Martin Petersen's 5.13/scsi-queue branch.

Changes since v17 (sent to linux-scsi list on 20210407)
  - make a clearer distinction between user-pollable (i.e. async)
    requests and (user) non-pollable requests (e.g. those injected
    with ioctl(SG_IO); in other words, sync requests)
  - fix crash in sg_start_req() when blk_get_request() yields an
    error (e.g. -EAGAIN when low on resources)
  - sg_finish_scsi_blk_rq(): remove now_zero variable as suggested
    by Hannes R.
  - change deprecation warning URL reference from http to https

Changes since v16 (sent to linux-scsi list on 20210208)
  - sg_start_req() fix double free on error path [KASAN]
  - sg_rq_map_kern() fix uninitialized variable [coverity]
  - sg_add_sfp() fix use after free [coverity]
  - sg_remove_sfp_usercontext(): remove pointless NULL check [coverity]
  - fix misuse of WARN_ONCE in sg_rq_end_io_usercontext() [D. Carpenter]
  - remove unused error checks: tracking blk_put_request() calls and
    multiple SG_XA_RQ_FREE calls
  - hipri: as blk_poll() can return > 0 for requests other than the one
    being checked for, re-check that the request of interest is ready
  - rebased on MKP's 5.13/scsi-queue

Changes since v15 (sent to linux-scsi list on 20210125)
  - tweak the state machine so that it sets the INFLIGHT state _before_
    blk_execute_rq_nowait() is called, and add a bit flag that indicates
    the logic flow has returned from that call. This guards against
    blk_poll() being called before the block layer has really
    launched the request.
  - fix a bug when clearing the SG_FFD_HIPRI_SEEN bit:
    atomic_dec_and_test() returns true when the post-decrement value
    is zero, the opposite of how a C conditional treats an integer.

Changes since v14 (sent to linux-scsi list on 20210124)
  - two fixes based on reports from Dan Carpenter and the kernel test
    robot
  - fix Johannes Thumshirn's email address
  - a separate patch was issued for fio to add a 'hipri' option to its
    sg engine. This enables fio to test the new sg driver blk_poll() support

Changes since v13 (sent to linux-scsi list on 20210113)
  - fix obscure compile error reported by "kernel test robot
    <lkp@intel.com>"
  - harden code around blk_poll() invocation; needed based on
    fio testing
  - remove SG_FFD_MMAP_CALLED bit code after Hannes Reinecke pointed
    out it was redundant

Changes since v12 (sent to linux-scsi list on 20201115)
  - add blk_poll() support, prefix that patch's subject with 'RFC'

Changes since v11 (sent to linux-scsi list on 20201014)
  - no author originated changes since v11
  - the port from lk 5.9.0-rc1 to lk 5.10.0-rc1 picks up a change to
    import_iovec() which requires a change to patch 25/44
  - only publish this cover letter and v12 of patch 25/44 to
    the linux-scsi list. The other 43 patches remain as published
    on 20201014.

Changes since v10 (sent to linux-scsi list on 20200823)
  - unchanged: 0001 to 0009, 0010 to 0017
  - rename sg_add_req() to sg_setup_req() [0010]
  - patches 40,41,42 and 43 are new, see their commit messages
  - remove SG_RS_RCV_DONE request state leaving 3.5 states
    [the 0.5 state is SG_RS_BUSY]
  - rework sg_rq_chg_state() code that enforces request
    state changes and associated xarray marks
  - track lowest used and unused indexes in the request arrays so
    iterations over the request xarray are efficient. This is a
    significant saving when the iodepth queue length is large

Changes since v9 (sent to linux-scsi list on 20200421)
  - rebase on MKP's 5.10/scsi-queue branch
  - remove some master/slave terminology that had bled in from
    the part 2 patchset
  - change sg_request::start_ns type from ktime_t to u64
  - pick up several error path correction fixes applied to the
    sg driver by other authors

Changes from v1 through v8 (earlier patchsets)
  - see the v10 patchset sent to linux-scsi on 20200823

Douglas Gilbert (45):
  sg: move functions around
  sg: remove typedefs, type+formatting cleanup
  sg: sg_log and is_enabled
  sg: rework sg_poll(), minor changes
  sg: bitops in sg_device
  sg: make open count an atomic
  sg: move header to uapi section
  sg: speed sg_poll and sg_get_num_waiting
  sg: sg_allow_if_err_recovery and renames
  sg: improve naming
  sg: change rwlock to spinlock
  sg: ioctl handling
  sg: split sg_read
  sg: sg_common_write add structure for arguments
  sg: rework sg_vma_fault
  sg: rework sg_mmap
  sg: replace sg_allow_access
  sg: rework scatter gather handling
  sg: introduce request state machine
  sg: sg_find_srp_by_id
  sg: sg_fill_request_element
  sg: printk change %p to %pK
  sg: xarray for fds in device
  sg: xarray for reqs in fd
  sg: replace rq array with xarray
  sg: sense buffer rework
  sg: add sg v4 interface support
  sg: rework debug info
  sg: add 8 byte SCSI LUN to sg_scsi_id
  sg: expand sg_comm_wr_t
  sg: add sg_iosubmit_v3 and sg_ioreceive_v3 ioctls
  sg: add some __must_hold macros
  sg: move procfs objects to avoid forward decls
  sg: protect multiple receivers
  sg: first debugfs support
  sg: rework mmap support
  sg: defang allow_dio
  sg: warn v3 write system call users
  sg: add mmap_sz tracking
  sg: remove rcv_done request state
  sg: track lowest inactive and await indexes
  sg: remove unit attention check for device changed
  sg: no_dxfer: move to/from kernel buffers
  sg: add blk_poll support
  sg: bump version to 4.0.12

 drivers/scsi/sg.c      | 5409 +++++++++++++++++++++++++++-------------
 include/scsi/sg.h      |  273 +-
 include/uapi/scsi/sg.h |  375 +++
 3 files changed, 4116 insertions(+), 1941 deletions(-)
 create mode 100644 include/uapi/scsi/sg.h

-- 
2.25.1



* [PATCH v18 01/83] sg: move functions around
@ 2021-04-27 21:56 ` Douglas Gilbert
From: Douglas Gilbert @ 2021-04-27 21:56 UTC
  To: linux-scsi; +Cc: martin.petersen, jejb, hare, Hannes Reinecke

Move main entry point functions around so that submission code comes
before completion code. Prior to this, the driver used the
traditional open(), close(), read(), write(), ioctl() ordering;
however, that places completion code (i.e. sg_read()) before
submission code (i.e. sg_write()). The main driver entry points are
considered to be those named in the definition of sg_fops (a struct
file_operations).

Helper functions are placed above their caller to reduce the
number of forward function declarations needed.
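
As an illustration of the helper-above-caller ordering, the sketch below models the matching rule of sg_get_rq_mark() from this patch (a ready request, not owned by ioctl(SG_IO), whose pack_id matches or with pack_id -1 as a wildcard), with the predicate split out into a helper defined first so that no forward declaration is needed. The struct and function names are hypothetical, and the rq_list_lock locking done by the real function is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical model of a queued request. */
struct sketch_rq {
	int done;          /* 0: in flight, 1: ready, 2: claimed by a reader */
	bool sg_io_owned;  /* belongs to a blocking ioctl(SG_IO) */
	int pack_id;
};

/* Helper defined first, so its caller below needs no forward declaration. */
static bool sketch_rq_matches(const struct sketch_rq *rq, int pack_id)
{
	return rq->done == 1 && !rq->sg_io_owned &&
	       (pack_id == -1 || rq->pack_id == pack_id);
}

/* Caller: return the first ready request matching pack_id (-1 matches any)
 * and mark it claimed so that another reader cannot take it as well. */
static struct sketch_rq *sketch_get_rq_mark(struct sketch_rq *rqs, size_t n,
					    int pack_id)
{
	assert(rqs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (sketch_rq_matches(&rqs[i], pack_id)) {
			rqs[i].done = 2;
			return &rqs[i];
		}
	}
	return NULL;
}
```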

Reviewed-by: Hannes Reinecke <hare@suse.com>

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 496 ++++++++++++++++++++++++----------------------
 1 file changed, 261 insertions(+), 235 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 737cea9d908e..5750bbb073dd 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -8,11 +8,12 @@
  * Original driver (sg.c):
  *        Copyright (C) 1992 Lawrence Foard
  * Version 2 and 3 extensions to driver:
- *        Copyright (C) 1998 - 2014 Douglas Gilbert
+ *        Copyright (C) 1998 - 2019 Douglas Gilbert
  */
 
-static int sg_version_num = 30536;	/* 2 digits for each component */
-#define SG_VERSION_STR "3.5.36"
+static int sg_version_num = 30901;  /* [x]xyyzz where [x] empty when x=0 */
+#define SG_VERSION_STR "3.9.01"		/* [x]x.[y]y.zz */
+static char *sg_version_date = "20190606";
 
 /*
  *  D. P. Gilbert (dgilbert@interlog.com), notes:
@@ -47,6 +48,7 @@ static int sg_version_num = 30536;	/* 2 digits for each component */
 #include <linux/ratelimit.h>
 #include <linux/uio.h>
 #include <linux/cred.h> /* for sg_check_file_access() */
+#include <linux/proc_fs.h>
 
 #include "scsi.h"
 #include <scsi/scsi_dbg.h>
@@ -57,12 +59,6 @@ static int sg_version_num = 30536;	/* 2 digits for each component */
 
 #include "scsi_logging.h"
 
-#ifdef CONFIG_SCSI_PROC_FS
-#include <linux/proc_fs.h>
-static char *sg_version_date = "20140603";
-
-static int sg_proc_init(void);
-#endif
 
 #define SG_ALLOW_DIO_DEF 0
 
@@ -173,11 +169,11 @@ typedef struct sg_device { /* holds the state of each scsi generic device */
 
 /* tasklet or soft irq callback */
 static void sg_rq_end_io(struct request *rq, blk_status_t status);
+/* Declarations of other static functions used before they are defined */
+static int sg_proc_init(void);
 static int sg_start_req(Sg_request *srp, unsigned char *cmd);
 static int sg_finish_rem_req(Sg_request * srp);
 static int sg_build_indirect(Sg_scatter_hold * schp, Sg_fd * sfp, int buff_size);
-static ssize_t sg_new_read(Sg_fd * sfp, char __user *buf, size_t count,
-			   Sg_request * srp);
 static ssize_t sg_new_write(Sg_fd *sfp, struct file *file,
 			const char __user *buf, size_t count, int blocking,
 			int read_only, int sg_io_owned, Sg_request **o_srp);
@@ -190,7 +186,6 @@ static void sg_link_reserve(Sg_fd * sfp, Sg_request * srp, int size);
 static void sg_unlink_reserve(Sg_fd * sfp, Sg_request * srp);
 static Sg_fd *sg_add_sfp(Sg_device * sdp);
 static void sg_remove_sfp(struct kref *);
-static Sg_request *sg_get_rq_mark(Sg_fd * sfp, int pack_id);
 static Sg_request *sg_add_request(Sg_fd * sfp);
 static int sg_remove_request(Sg_fd * sfp, Sg_request * srp);
 static Sg_device *sg_get_dev(int dev);
@@ -232,16 +227,6 @@ static int sg_check_file_access(struct file *filp, const char *caller)
 	return 0;
 }
 
-static int sg_allow_access(struct file *filp, unsigned char *cmd)
-{
-	struct sg_fd *sfp = filp->private_data;
-
-	if (sfp->parentdp->device->type == TYPE_SCANNER)
-		return 0;
-
-	return blk_verify_command(cmd, filp->f_mode);
-}
-
 static int
 open_wait(Sg_device *sdp, int flags)
 {
@@ -405,196 +390,6 @@ sg_release(struct inode *inode, struct file *filp)
 	return 0;
 }
 
-static int get_sg_io_pack_id(int *pack_id, void __user *buf, size_t count)
-{
-	struct sg_header __user *old_hdr = buf;
-	int reply_len;
-
-	if (count >= SZ_SG_HEADER) {
-		/* negative reply_len means v3 format, otherwise v1/v2 */
-		if (get_user(reply_len, &old_hdr->reply_len))
-			return -EFAULT;
-
-		if (reply_len >= 0)
-			return get_user(*pack_id, &old_hdr->pack_id);
-
-		if (in_compat_syscall() &&
-		    count >= sizeof(struct compat_sg_io_hdr)) {
-			struct compat_sg_io_hdr __user *hp = buf;
-
-			return get_user(*pack_id, &hp->pack_id);
-		}
-
-		if (count >= sizeof(struct sg_io_hdr)) {
-			struct sg_io_hdr __user *hp = buf;
-
-			return get_user(*pack_id, &hp->pack_id);
-		}
-	}
-
-	/* no valid header was passed, so ignore the pack_id */
-	*pack_id = -1;
-	return 0;
-}
-
-static ssize_t
-sg_read(struct file *filp, char __user *buf, size_t count, loff_t * ppos)
-{
-	Sg_device *sdp;
-	Sg_fd *sfp;
-	Sg_request *srp;
-	int req_pack_id = -1;
-	sg_io_hdr_t *hp;
-	struct sg_header *old_hdr;
-	int retval;
-
-	/*
-	 * This could cause a response to be stranded. Close the associated
-	 * file descriptor to free up any resources being held.
-	 */
-	retval = sg_check_file_access(filp, __func__);
-	if (retval)
-		return retval;
-
-	if ((!(sfp = (Sg_fd *) filp->private_data)) || (!(sdp = sfp->parentdp)))
-		return -ENXIO;
-	SCSI_LOG_TIMEOUT(3, sg_printk(KERN_INFO, sdp,
-				      "sg_read: count=%d\n", (int) count));
-
-	if (sfp->force_packid)
-		retval = get_sg_io_pack_id(&req_pack_id, buf, count);
-	if (retval)
-		return retval;
-
-	srp = sg_get_rq_mark(sfp, req_pack_id);
-	if (!srp) {		/* now wait on packet to arrive */
-		if (atomic_read(&sdp->detaching))
-			return -ENODEV;
-		if (filp->f_flags & O_NONBLOCK)
-			return -EAGAIN;
-		retval = wait_event_interruptible(sfp->read_wait,
-			(atomic_read(&sdp->detaching) ||
-			(srp = sg_get_rq_mark(sfp, req_pack_id))));
-		if (atomic_read(&sdp->detaching))
-			return -ENODEV;
-		if (retval)
-			/* -ERESTARTSYS as signal hit process */
-			return retval;
-	}
-	if (srp->header.interface_id != '\0')
-		return sg_new_read(sfp, buf, count, srp);
-
-	hp = &srp->header;
-	old_hdr = kzalloc(SZ_SG_HEADER, GFP_KERNEL);
-	if (!old_hdr)
-		return -ENOMEM;
-
-	old_hdr->reply_len = (int) hp->timeout;
-	old_hdr->pack_len = old_hdr->reply_len; /* old, strange behaviour */
-	old_hdr->pack_id = hp->pack_id;
-	old_hdr->twelve_byte =
-	    ((srp->data.cmd_opcode >= 0xc0) && (12 == hp->cmd_len)) ? 1 : 0;
-	old_hdr->target_status = hp->masked_status;
-	old_hdr->host_status = hp->host_status;
-	old_hdr->driver_status = hp->driver_status;
-	if ((CHECK_CONDITION & hp->masked_status) ||
-	    (DRIVER_SENSE & hp->driver_status))
-		memcpy(old_hdr->sense_buffer, srp->sense_b,
-		       sizeof (old_hdr->sense_buffer));
-	switch (hp->host_status) {
-	/* This setup of 'result' is for backward compatibility and is best
-	   ignored by the user who should use target, host + driver status */
-	case DID_OK:
-	case DID_PASSTHROUGH:
-	case DID_SOFT_ERROR:
-		old_hdr->result = 0;
-		break;
-	case DID_NO_CONNECT:
-	case DID_BUS_BUSY:
-	case DID_TIME_OUT:
-		old_hdr->result = EBUSY;
-		break;
-	case DID_BAD_TARGET:
-	case DID_ABORT:
-	case DID_PARITY:
-	case DID_RESET:
-	case DID_BAD_INTR:
-		old_hdr->result = EIO;
-		break;
-	case DID_ERROR:
-		old_hdr->result = (srp->sense_b[0] == 0 && 
-				  hp->masked_status == GOOD) ? 0 : EIO;
-		break;
-	default:
-		old_hdr->result = EIO;
-		break;
-	}
-
-	/* Now copy the result back to the user buffer.  */
-	if (count >= SZ_SG_HEADER) {
-		if (copy_to_user(buf, old_hdr, SZ_SG_HEADER)) {
-			retval = -EFAULT;
-			goto free_old_hdr;
-		}
-		buf += SZ_SG_HEADER;
-		if (count > old_hdr->reply_len)
-			count = old_hdr->reply_len;
-		if (count > SZ_SG_HEADER) {
-			if (sg_read_oxfer(srp, buf, count - SZ_SG_HEADER)) {
-				retval = -EFAULT;
-				goto free_old_hdr;
-			}
-		}
-	} else
-		count = (old_hdr->result == 0) ? 0 : -EIO;
-	sg_finish_rem_req(srp);
-	sg_remove_request(sfp, srp);
-	retval = count;
-free_old_hdr:
-	kfree(old_hdr);
-	return retval;
-}
-
-static ssize_t
-sg_new_read(Sg_fd * sfp, char __user *buf, size_t count, Sg_request * srp)
-{
-	sg_io_hdr_t *hp = &srp->header;
-	int err = 0, err2;
-	int len;
-
-	if (in_compat_syscall()) {
-		if (count < sizeof(struct compat_sg_io_hdr)) {
-			err = -EINVAL;
-			goto err_out;
-		}
-	} else if (count < SZ_SG_IO_HDR) {
-		err = -EINVAL;
-		goto err_out;
-	}
-	hp->sb_len_wr = 0;
-	if ((hp->mx_sb_len > 0) && hp->sbp) {
-		if ((CHECK_CONDITION & hp->masked_status) ||
-		    (DRIVER_SENSE & hp->driver_status)) {
-			int sb_len = SCSI_SENSE_BUFFERSIZE;
-			sb_len = (hp->mx_sb_len > sb_len) ? sb_len : hp->mx_sb_len;
-			len = 8 + (int) srp->sense_b[7];	/* Additional sense length field */
-			len = (len > sb_len) ? sb_len : len;
-			if (copy_to_user(hp->sbp, srp->sense_b, len)) {
-				err = -EFAULT;
-				goto err_out;
-			}
-			hp->sb_len_wr = len;
-		}
-	}
-	if (hp->masked_status || hp->host_status || hp->driver_status)
-		hp->info |= SG_INFO_CHECK;
-	err = put_sg_io_hdr(hp, buf);
-err_out:
-	err2 = sg_finish_rem_req(srp);
-	sg_remove_request(sfp, srp);
-	return err ? : err2 ? : count;
-}
-
 static ssize_t
 sg_write(struct file *filp, const char __user *buf, size_t count, loff_t * ppos)
 {
@@ -708,6 +503,16 @@ sg_write(struct file *filp, const char __user *buf, size_t count, loff_t * ppos)
 	return (k < 0) ? k : count;
 }
 
+static int sg_allow_access(struct file *filp, unsigned char *cmd)
+{
+	struct sg_fd *sfp = filp->private_data;
+
+	if (sfp->parentdp->device->type == TYPE_SCANNER)
+		return 0;
+
+	return blk_verify_command(cmd, filp->f_mode);
+}
+
 static ssize_t
 sg_new_write(Sg_fd *sfp, struct file *file, const char __user *buf,
 		 size_t count, int blocking, int read_only, int sg_io_owned,
@@ -833,6 +638,75 @@ sg_common_write(Sg_fd * sfp, Sg_request * srp,
 	return 0;
 }
 
+/*
+ * read(2) related functions follow. They are shown after write(2) related
+ * functions. Apart from read(2) itself, ioctl(SG_IORECEIVE) and the second
+ * half of the ioctl(SG_IO) share code with read(2).
+ */
+
+static Sg_request *
+sg_get_rq_mark(Sg_fd *sfp, int pack_id)
+{
+	Sg_request *resp;
+	unsigned long iflags;
+
+	write_lock_irqsave(&sfp->rq_list_lock, iflags);
+	list_for_each_entry(resp, &sfp->rq_list, entry) {
+		/* look for requests that are ready + not SG_IO owned */
+		if (resp->done == 1 && !resp->sg_io_owned &&
+		    (-1 == pack_id || resp->header.pack_id == pack_id)) {
+			resp->done = 2;	/* guard against other readers */
+			write_unlock_irqrestore(&sfp->rq_list_lock, iflags);
+			return resp;
+		}
+	}
+	write_unlock_irqrestore(&sfp->rq_list_lock, iflags);
+	return NULL;
+}
+
+static ssize_t
+sg_new_read(Sg_fd *sfp, char __user *buf, size_t count, Sg_request *srp)
+{
+	sg_io_hdr_t *hp = &srp->header;
+	int err = 0, err2;
+	int len;
+
+	if (in_compat_syscall()) {
+		if (count < sizeof(struct compat_sg_io_hdr)) {
+			err = -EINVAL;
+			goto err_out;
+		}
+	} else if (count < SZ_SG_IO_HDR) {
+		err = -EINVAL;
+		goto err_out;
+	}
+	hp->sb_len_wr = 0;
+	if (hp->mx_sb_len > 0 && hp->sbp) {
+		if ((CHECK_CONDITION & hp->masked_status) ||
+		    (DRIVER_SENSE & hp->driver_status)) {
+			int sb_len = SCSI_SENSE_BUFFERSIZE;
+
+			sb_len = (hp->mx_sb_len > sb_len) ? sb_len :
+							    hp->mx_sb_len;
+			/* Additional sense length field */
+			len = 8 + (int)srp->sense_b[7];
+			len = (len > sb_len) ? sb_len : len;
+			if (copy_to_user(hp->sbp, srp->sense_b, len)) {
+				err = -EFAULT;
+				goto err_out;
+			}
+			hp->sb_len_wr = len;
+		}
+	}
+	if (hp->masked_status || hp->host_status || hp->driver_status)
+		hp->info |= SG_INFO_CHECK;
+	err = put_sg_io_hdr(hp, buf);
+err_out:
+	err2 = sg_finish_rem_req(srp);
+	sg_remove_request(sfp, srp);
+	return err ? : err2 ? : count;
+}
+
 static int srp_done(Sg_fd *sfp, Sg_request *srp)
 {
 	unsigned long flags;
@@ -844,6 +718,171 @@ static int srp_done(Sg_fd *sfp, Sg_request *srp)
 	return ret;
 }
 
+static ssize_t
+sg_read(struct file *filp, char __user *buf, size_t count, loff_t *ppos)
+{
+	Sg_device *sdp;
+	Sg_fd *sfp;
+	Sg_request *srp;
+	int req_pack_id = -1;
+	sg_io_hdr_t *hp;
+	struct sg_header *old_hdr = NULL;
+	int retval = 0;
+
+	/*
+	 * This could cause a response to be stranded. Close the associated
+	 * file descriptor to free up any resources being held.
+	 */
+	retval = sg_check_file_access(filp, __func__);
+	if (retval)
+		return retval;
+
+	sfp = filp->private_data;
+	sdp = sfp->parentdp;
+	SCSI_LOG_TIMEOUT(3, sg_printk(KERN_INFO, sdp,
+				      "%s: count=%d\n", __func__,
+				      (int)count));
+	if (!sdp)
+		return -ENXIO;
+
+	if (!access_ok(buf, count))
+		return -EFAULT;
+	if (sfp->force_packid && count >= SZ_SG_HEADER) {
+		old_hdr = kmalloc(SZ_SG_HEADER, GFP_KERNEL);
+		if (!old_hdr)
+			return -ENOMEM;
+		if (__copy_from_user(old_hdr, buf, SZ_SG_HEADER)) {
+			retval = -EFAULT;
+			goto free_old_hdr;
+		}
+		if (old_hdr->reply_len < 0) {
+			if (count >= SZ_SG_IO_HDR) {
+				sg_io_hdr_t *new_hdr;
+
+				new_hdr = kmalloc(SZ_SG_IO_HDR, GFP_KERNEL);
+				if (!new_hdr) {
+					retval = -ENOMEM;
+					goto free_old_hdr;
+				}
+				retval = __copy_from_user
+				    (new_hdr, buf, SZ_SG_IO_HDR);
+				req_pack_id = new_hdr->pack_id;
+				kfree(new_hdr);
+				if (retval) {
+					retval = -EFAULT;
+					goto free_old_hdr;
+				}
+			}
+		} else {
+			req_pack_id = old_hdr->pack_id;
+		}
+	}
+	srp = sg_get_rq_mark(sfp, req_pack_id);
+	if (!srp) {		/* now wait on packet to arrive */
+		if (atomic_read(&sdp->detaching)) {
+			retval = -ENODEV;
+			goto free_old_hdr;
+		}
+		if (filp->f_flags & O_NONBLOCK) {
+			retval = -EAGAIN;
+			goto free_old_hdr;
+		}
+		retval = wait_event_interruptible
+				(sfp->read_wait,
+				 (atomic_read(&sdp->detaching) ||
+				  (srp = sg_get_rq_mark(sfp, req_pack_id))));
+		if (atomic_read(&sdp->detaching)) {
+			retval = -ENODEV;
+			goto free_old_hdr;
+		}
+		if (retval) {
+			/* -ERESTARTSYS as signal hit process */
+			goto free_old_hdr;
+		}
+	}
+	if (srp->header.interface_id != '\0') {
+		retval = sg_new_read(sfp, buf, count, srp);
+		goto free_old_hdr;
+	}
+
+	hp = &srp->header;
+	if (!old_hdr) {
+		old_hdr = kmalloc(SZ_SG_HEADER, GFP_KERNEL);
+		if (!old_hdr) {
+			retval = -ENOMEM;
+			goto free_old_hdr;
+		}
+	}
+	memset(old_hdr, 0, SZ_SG_HEADER);
+	old_hdr->reply_len = (int)hp->timeout;
+	old_hdr->pack_len = old_hdr->reply_len; /* old, strange behaviour */
+	old_hdr->pack_id = hp->pack_id;
+	old_hdr->twelve_byte =
+	    ((srp->data.cmd_opcode >= 0xc0) && (hp->cmd_len == 12)) ? 1 : 0;
+	old_hdr->target_status = hp->masked_status;
+	old_hdr->host_status = hp->host_status;
+	old_hdr->driver_status = hp->driver_status;
+	if ((hp->masked_status & CHECK_CONDITION) ||
+	    (hp->driver_status & DRIVER_SENSE))
+		memcpy(old_hdr->sense_buffer, srp->sense_b,
+		       sizeof(old_hdr->sense_buffer));
+	switch (hp->host_status) {
+	/*
+	 * This setup of 'result' is for backward compatibility and is best
+	 * ignored by the user who should use target, host + driver status
+	 */
+	case DID_OK:
+	case DID_PASSTHROUGH:
+	case DID_SOFT_ERROR:
+		old_hdr->result = 0;
+		break;
+	case DID_NO_CONNECT:
+	case DID_BUS_BUSY:
+	case DID_TIME_OUT:
+		old_hdr->result = EBUSY;
+		break;
+	case DID_BAD_TARGET:
+	case DID_ABORT:
+	case DID_PARITY:
+	case DID_RESET:
+	case DID_BAD_INTR:
+		old_hdr->result = EIO;
+		break;
+	case DID_ERROR:
+		old_hdr->result = (srp->sense_b[0] == 0 &&
+				  hp->masked_status == GOOD) ? 0 : EIO;
+		break;
+	default:
+		old_hdr->result = EIO;
+		break;
+	}
+
+	/* Now copy the result back to the user buffer.  */
+	if (count >= SZ_SG_HEADER) {
+		if (__copy_to_user(buf, old_hdr, SZ_SG_HEADER)) {
+			retval = -EFAULT;
+			goto free_old_hdr;
+		}
+		buf += SZ_SG_HEADER;
+		if (count > old_hdr->reply_len)
+			count = old_hdr->reply_len;
+		if (count > SZ_SG_HEADER) {
+			if (sg_read_oxfer(srp, buf, count - SZ_SG_HEADER)) {
+				retval = -EFAULT;
+				goto free_old_hdr;
+			}
+		}
+	} else {
+		count = (old_hdr->result == 0) ? 0 : -EIO;
+	}
+	sg_finish_rem_req(srp);
+	sg_remove_request(sfp, srp);
+	retval = count;
+free_old_hdr:
+	kfree(old_hdr);
+	return retval;
+}
+
 static int max_sectors_bytes(struct request_queue *q)
 {
 	unsigned int max_sectors = queue_max_sectors(q);
@@ -1691,9 +1730,7 @@ init_sg(void)
 	sg_sysfs_valid = 1;
 	rc = scsi_register_interface(&sg_interface);
 	if (0 == rc) {
-#ifdef CONFIG_SCSI_PROC_FS
 		sg_proc_init();
-#endif				/* CONFIG_SCSI_PROC_FS */
 		return 0;
 	}
 	class_destroy(sg_sysfs_class);
@@ -1702,6 +1739,14 @@ init_sg(void)
 	return rc;
 }
 
+#ifndef CONFIG_SCSI_PROC_FS
+static int
+sg_proc_init(void)
+{
+	return 0;
+}
+#endif
+
 static void __exit
 exit_sg(void)
 {
@@ -2068,9 +2113,10 @@ sg_link_reserve(Sg_fd * sfp, Sg_request * srp, int size)
 			rem -= num;
 	}
 
-	if (k >= rsv_schp->k_use_sg)
+	if (k >= rsv_schp->k_use_sg) {
 		SCSI_LOG_TIMEOUT(1, sg_printk(KERN_INFO, sfp->parentdp,
 				 "sg_link_reserve: BAD size\n"));
+	}
 }
 
 static void
@@ -2091,26 +2137,6 @@ sg_unlink_reserve(Sg_fd * sfp, Sg_request * srp)
 	sfp->res_in_use = 0;
 }
 
-static Sg_request *
-sg_get_rq_mark(Sg_fd * sfp, int pack_id)
-{
-	Sg_request *resp;
-	unsigned long iflags;
-
-	write_lock_irqsave(&sfp->rq_list_lock, iflags);
-	list_for_each_entry(resp, &sfp->rq_list, entry) {
-		/* look for requests that are ready + not SG_IO owned */
-		if ((1 == resp->done) && (!resp->sg_io_owned) &&
-		    ((-1 == pack_id) || (resp->header.pack_id == pack_id))) {
-			resp->done = 2;	/* guard against other readers */
-			write_unlock_irqrestore(&sfp->rq_list_lock, iflags);
-			return resp;
-		}
-	}
-	write_unlock_irqrestore(&sfp->rq_list_lock, iflags);
-	return NULL;
-}
-
 /* always adds to end of list */
 static Sg_request *
 sg_add_request(Sg_fd * sfp)
-- 
2.25.1



* [PATCH v18 02/83] sg: remove typedefs, type+formatting cleanup
@ 2021-04-27 21:56 ` Douglas Gilbert
From: Douglas Gilbert @ 2021-04-27 21:56 UTC
  To: linux-scsi
  Cc: martin.petersen, jejb, hare, Johannes Thumshirn,
	Christoph Hellwig, Hannes Reinecke

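Typedefs for structure types are discouraged by the kernel coding
style, so the typedefs for the structures private to this driver
(Sg_scatter_hold, Sg_request, Sg_fd and Sg_device) have been removed.

This also removes most mixed-case ("camel case") identifiers. In
addition, unsigned char and unsigned short are replaced by u8 and
u16, and several static function definitions now place the return
type on its own line.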
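
As an illustration of the style adopted here, the following is a
minimal userspace sketch, not driver code. The structures are
trimmed, hypothetical stand-ins for the driver's private ones;
struct file_stub is a hypothetical substitute for struct file, the
helper sg_lookup_parent() is hypothetical, and uint8_t stands in for
the kernel's u8. It shows a structure declared without a typedef and
the one-assignment-per-statement lookup used after this patch in
sg_release(), sg_ioctl() and similar functions, where private_data is
assigned without a cast.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical, trimmed stand-ins for the driver's private structures. */
struct sg_device {
	int open_cnt;
};

struct sg_fd {			/* previously: typedef struct sg_fd {...} Sg_fd; */
	struct sg_device *parentdp;
	uint8_t next_cmd_len;	/* kernel code now uses u8 */
};

/* Hypothetical substitute for struct file; only private_data is modelled. */
struct file_stub {
	void *private_data;
};

/*
 * Post-patch lookup style: no cast from void *, and no assignments buried
 * in the if condition. Like the patched code, sfp itself is not checked
 * for NULL; only a missing parent device yields -ENXIO.
 */
static int
sg_lookup_parent(const struct file_stub *filp, struct sg_device **sdpp)
{
	struct sg_fd *sfp;
	struct sg_device *sdp;

	sfp = filp->private_data;
	sdp = sfp->parentdp;
	if (!sdp)
		return -ENXIO;
	*sdpp = sdp;
	return 0;
}
```

Note that, unlike the pre-patch form, a NULL private_data would be
dereferenced rather than reported as -ENXIO.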

Reviewed-by: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 400 +++++++++++++++++++++++++---------------------
 1 file changed, 222 insertions(+), 178 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 5750bbb073dd..443ea7f36f2b 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -3,7 +3,7 @@
  *  History:
  *  Started: Aug 9 by Lawrence Foard (entropy@world.std.com),
  *           to allow user process control of SCSI devices.
- *  Development Sponsored by Killy Corp. NY NY
+ *  Development Sponsored by Killy Corp. NY NY   [guess: 1992]
  *
  * Original driver (sg.c):
  *        Copyright (C) 1992 Lawrence Foard
@@ -15,13 +15,6 @@ static int sg_version_num = 30901;  /* [x]xyyzz where [x] empty when x=0 */
 #define SG_VERSION_STR "3.9.01"		/* [x]x.[y]y.zz */
 static char *sg_version_date = "20190606";
 
-/*
- *  D. P. Gilbert (dgilbert@interlog.com), notes:
- *      - scsi logging is available via SCSI_LOG_TIMEOUT macros. First
- *        the kernel/module needs to be built with CONFIG_SCSI_LOGGING
- *        (otherwise the macros compile to empty statements).
- *
- */
 #include <linux/module.h>
 
 #include <linux/fs.h>
@@ -91,33 +84,32 @@ static int sg_add_device(struct device *, struct class_interface *);
 static void sg_remove_device(struct device *, struct class_interface *);
 
 static DEFINE_IDR(sg_index_idr);
-static DEFINE_RWLOCK(sg_index_lock);	/* Also used to lock
-							   file descriptor list for device */
+static DEFINE_RWLOCK(sg_index_lock); /* Also used to lock fd list for device */
 
 static struct class_interface sg_interface = {
 	.add_dev        = sg_add_device,
 	.remove_dev     = sg_remove_device,
 };
 
-typedef struct sg_scatter_hold { /* holding area for scsi scatter gather info */
-	unsigned short k_use_sg; /* Count of kernel scatter-gather pieces */
-	unsigned sglist_len; /* size of malloc'd scatter-gather list ++ */
-	unsigned bufflen;	/* Size of (aggregate) data buffer */
+struct sg_scatter_hold { /* holding area for scsi scatter gather info */
+	u16 k_use_sg; /* Count of kernel scatter-gather pieces */
+	unsigned int sglist_len; /* size of malloc'd scatter-gather list ++ */
+	unsigned int bufflen;	/* Size of (aggregate) data buffer */
 	struct page **pages;
 	int page_order;
 	char dio_in_use;	/* 0->indirect IO (or mmap), 1->dio */
-	unsigned char cmd_opcode; /* first byte of command */
-} Sg_scatter_hold;
+	u8 cmd_opcode;		/* first byte of command */
+};
 
 struct sg_device;		/* forward declarations */
 struct sg_fd;
 
-typedef struct sg_request {	/* SG_MAX_QUEUE requests outstanding per file */
+struct sg_request {	/* SG_MAX_QUEUE requests outstanding per file */
 	struct list_head entry;	/* list entry */
 	struct sg_fd *parentfp;	/* NULL -> not in use */
-	Sg_scatter_hold data;	/* hold buffer, perhaps scatter list */
+	struct sg_scatter_hold data;	/* hold buffer, perhaps scatter list */
 	sg_io_hdr_t header;	/* scsi command+info, see <scsi/sg.h> */
-	unsigned char sense_b[SCSI_SENSE_BUFFERSIZE];
+	u8 sense_b[SCSI_SENSE_BUFFERSIZE];
 	char res_used;		/* 1 -> using reserve buffer, 0 -> not ... */
 	char orphan;		/* 1 -> drop on sight, 0 -> normal */
 	char sg_io_owned;	/* 1 -> packet belongs to SG_IO */
@@ -126,9 +118,9 @@ typedef struct sg_request {	/* SG_MAX_QUEUE requests outstanding per file */
 	struct request *rq;
 	struct bio *bio;
 	struct execute_work ew;
-} Sg_request;
+};
 
-typedef struct sg_fd {		/* holds the state of a file descriptor */
+struct sg_fd {		/* holds the state of a file descriptor */
 	struct list_head sfd_siblings;  /* protected by device's sfd_lock */
 	struct sg_device *parentdp;	/* owning device */
 	wait_queue_head_t read_wait;	/* queue read until command done */
@@ -136,21 +128,21 @@ typedef struct sg_fd {		/* holds the state of a file descriptor */
 	struct mutex f_mutex;	/* protect against changes in this fd */
 	int timeout;		/* defaults to SG_DEFAULT_TIMEOUT      */
 	int timeout_user;	/* defaults to SG_DEFAULT_TIMEOUT_USER */
-	Sg_scatter_hold reserve;	/* buffer held for this file descriptor */
+	struct sg_scatter_hold reserve;	/* buffer for this file descriptor */
 	struct list_head rq_list; /* head of request list */
 	struct fasync_struct *async_qp;	/* used by asynchronous notification */
-	Sg_request req_arr[SG_MAX_QUEUE];	/* used as singly-linked list */
+	struct sg_request req_arr[SG_MAX_QUEUE];/* use as singly-linked list */
 	char force_packid;	/* 1 -> pack_id input to read(), 0 -> ignored */
 	char cmd_q;		/* 1 -> allow command queuing, 0 -> don't */
-	unsigned char next_cmd_len; /* 0: automatic, >0: use on next write() */
+	u8 next_cmd_len;	/* 0: automatic, >0: use on next write() */
 	char keep_orphan;	/* 0 -> drop orphan (def), 1 -> keep for read() */
 	char mmap_called;	/* 0 -> mmap() never called on this fd */
 	char res_in_use;	/* 1 -> 'reserve' array in use */
 	struct kref f_ref;
 	struct execute_work ew;
-} Sg_fd;
+};
 
-typedef struct sg_device { /* holds the state of each scsi generic device */
+struct sg_device { /* holds the state of each scsi generic device */
 	struct scsi_device *device;
 	wait_queue_head_t open_wait;    /* queue open() when O_EXCL present */
 	struct mutex open_rel_lock;     /* held when in open() or release() */
@@ -163,32 +155,36 @@ typedef struct sg_device { /* holds the state of each scsi generic device */
 	int open_cnt;		/* count of opens (perhaps < num(sfds) ) */
 	char sgdebug;		/* 0->off, 1->sense, 9->dump dev, 10-> all devs */
 	struct gendisk *disk;
-	struct cdev * cdev;	/* char_dev [sysfs: /sys/cdev/major/sg<n>] */
+	struct cdev *cdev;
 	struct kref d_ref;
-} Sg_device;
+};
 
 /* tasklet or soft irq callback */
 static void sg_rq_end_io(struct request *rq, blk_status_t status);
 /* Declarations of other static functions used before they are defined */
 static int sg_proc_init(void);
-static int sg_start_req(Sg_request *srp, unsigned char *cmd);
-static int sg_finish_rem_req(Sg_request * srp);
-static int sg_build_indirect(Sg_scatter_hold * schp, Sg_fd * sfp, int buff_size);
-static ssize_t sg_new_write(Sg_fd *sfp, struct file *file,
-			const char __user *buf, size_t count, int blocking,
-			int read_only, int sg_io_owned, Sg_request **o_srp);
-static int sg_common_write(Sg_fd * sfp, Sg_request * srp,
-			   unsigned char *cmnd, int timeout, int blocking);
-static int sg_read_oxfer(Sg_request * srp, char __user *outp, int num_read_xfer);
-static void sg_remove_scat(Sg_fd * sfp, Sg_scatter_hold * schp);
-static void sg_build_reserve(Sg_fd * sfp, int req_size);
-static void sg_link_reserve(Sg_fd * sfp, Sg_request * srp, int size);
-static void sg_unlink_reserve(Sg_fd * sfp, Sg_request * srp);
-static Sg_fd *sg_add_sfp(Sg_device * sdp);
+static int sg_start_req(struct sg_request *srp, u8 *cmd);
+static int sg_finish_rem_req(struct sg_request *srp);
+static int sg_build_indirect(struct sg_scatter_hold *schp, struct sg_fd *sfp,
+			     int buff_size);
+static ssize_t sg_new_write(struct sg_fd *sfp, struct file *file,
+			    const char __user *buf, size_t count, int blocking,
+			    int read_only, int sg_io_owned,
+			    struct sg_request **o_srp);
+static int sg_common_write(struct sg_fd *sfp, struct sg_request *srp,
+			   u8 *cmnd, int timeout, int blocking);
+static int sg_read_oxfer(struct sg_request *srp, char __user *outp,
+			 int num_read_xfer);
+static void sg_remove_scat(struct sg_fd *sfp, struct sg_scatter_hold *schp);
+static void sg_build_reserve(struct sg_fd *sfp, int req_size);
+static void sg_link_reserve(struct sg_fd *sfp, struct sg_request *srp,
+			    int size);
+static void sg_unlink_reserve(struct sg_fd *sfp, struct sg_request *srp);
+static struct sg_fd *sg_add_sfp(struct sg_device *sdp);
 static void sg_remove_sfp(struct kref *);
-static Sg_request *sg_add_request(Sg_fd * sfp);
-static int sg_remove_request(Sg_fd * sfp, Sg_request * srp);
-static Sg_device *sg_get_dev(int dev);
+static struct sg_request *sg_add_request(struct sg_fd *sfp);
+static int sg_remove_request(struct sg_fd *sfp, struct sg_request *srp);
+static struct sg_device *sg_get_dev(int dev);
 static void sg_device_destroy(struct kref *kref);
 
 #define SZ_SG_HEADER sizeof(struct sg_header)
@@ -212,7 +208,8 @@ static void sg_device_destroy(struct kref *kref);
  * This function provides protection for the legacy API by restricting the
  * calling context.
  */
-static int sg_check_file_access(struct file *filp, const char *caller)
+static int
+sg_check_file_access(struct file *filp, const char *caller)
 {
 	if (filp->f_cred != current_real_cred()) {
 		pr_err_once("%s: process %d (%s) changed security contexts after opening file descriptor, this is not allowed.\n",
@@ -228,11 +225,11 @@ static int sg_check_file_access(struct file *filp, const char *caller)
 }
 
 static int
-open_wait(Sg_device *sdp, int flags)
+sg_wait_open_event(struct sg_device *sdp, bool o_excl)
 {
 	int retval = 0;
 
-	if (flags & O_EXCL) {
+	if (o_excl) {
 		while (sdp->open_cnt > 0) {
 			mutex_unlock(&sdp->open_rel_lock);
 			retval = wait_event_interruptible(sdp->open_wait,
@@ -263,26 +260,34 @@ open_wait(Sg_device *sdp, int flags)
 	return retval;
 }
 
-/* Returns 0 on success, else a negated errno value */
+/*
+ * Corresponds to the open() system call on sg devices. Implements O_EXCL on
+ * a per device basis using 'open_cnt'. If O_EXCL and O_NONBLOCK and there is
+ * already a sg handle open on this device then it fails with an errno of
+ * EBUSY. Without the O_NONBLOCK flag then this thread enters an interruptible
+ * wait until the other handle(s) are closed.
+ */
 static int
 sg_open(struct inode *inode, struct file *filp)
 {
-	int dev = iminor(inode);
-	int flags = filp->f_flags;
+	bool o_excl;
+	int min_dev = iminor(inode);
+	int op_flags = filp->f_flags;
 	struct request_queue *q;
-	Sg_device *sdp;
-	Sg_fd *sfp;
+	struct sg_device *sdp;
+	struct sg_fd *sfp;
 	int retval;
 
 	nonseekable_open(inode, filp);
-	if ((flags & O_EXCL) && (O_RDONLY == (flags & O_ACCMODE)))
+	o_excl = !!(op_flags & O_EXCL);
+	if (o_excl && ((op_flags & O_ACCMODE) == O_RDONLY))
 		return -EPERM; /* Can't lock it with read only access */
-	sdp = sg_get_dev(dev);
+	sdp = sg_get_dev(min_dev);
 	if (IS_ERR(sdp))
 		return PTR_ERR(sdp);
 
 	SCSI_LOG_TIMEOUT(3, sg_printk(KERN_INFO, sdp,
-				      "sg_open: flags=0x%x\n", flags));
+				      "sg_open: flags=0x%x\n", op_flags));
 
 	/* This driver's module count bumped by fops_get in <linux/fs.h> */
 	/* Prevent the device driver from vanishing while we sleep */
@@ -297,7 +302,7 @@ sg_open(struct inode *inode, struct file *filp)
 	/* scsi_block_when_processing_errors() may block so bypass
 	 * check if O_NONBLOCK. Permits SCSI commands to be issued
 	 * during error recovery. Tread carefully. */
-	if (!((flags & O_NONBLOCK) ||
+	if (!((op_flags & O_NONBLOCK) ||
 	      scsi_block_when_processing_errors(sdp->device))) {
 		retval = -ENXIO;
 		/* we are in error recovery for this device */
@@ -305,8 +310,8 @@ sg_open(struct inode *inode, struct file *filp)
 	}
 
 	mutex_lock(&sdp->open_rel_lock);
-	if (flags & O_NONBLOCK) {
-		if (flags & O_EXCL) {
+	if (op_flags & O_NONBLOCK) {
+		if (o_excl) {
 			if (sdp->open_cnt > 0) {
 				retval = -EBUSY;
 				goto error_mutex_locked;
@@ -318,13 +323,13 @@ sg_open(struct inode *inode, struct file *filp)
 			}
 		}
 	} else {
-		retval = open_wait(sdp, flags);
+		retval = sg_wait_open_event(sdp, o_excl);
 		if (retval) /* -ERESTARTSYS or -ENODEV */
 			goto error_mutex_locked;
 	}
 
 	/* N.B. at this point we are holding the open_rel_lock */
-	if (flags & O_EXCL)
+	if (o_excl)
 		sdp->exclude = true;
 
 	if (sdp->open_cnt < 1) {  /* no existing opens */
@@ -348,7 +353,7 @@ sg_open(struct inode *inode, struct file *filp)
 	return retval;
 
 out_undo:
-	if (flags & O_EXCL) {
+	if (o_excl) {
 		sdp->exclude = false;   /* undo if error */
 		wake_up_interruptible(&sdp->open_wait);
 	}
@@ -366,10 +371,12 @@ sg_open(struct inode *inode, struct file *filp)
 static int
 sg_release(struct inode *inode, struct file *filp)
 {
-	Sg_device *sdp;
-	Sg_fd *sfp;
+	struct sg_device *sdp;
+	struct sg_fd *sfp;
 
-	if ((!(sfp = (Sg_fd *) filp->private_data)) || (!(sdp = sfp->parentdp)))
+	sfp = filp->private_data;
+	sdp = sfp->parentdp;
+	if (!sdp)
 		return -ENXIO;
 	SCSI_LOG_TIMEOUT(3, sg_printk(KERN_INFO, sdp, "sg_release\n"));
 
@@ -378,7 +385,7 @@ sg_release(struct inode *inode, struct file *filp)
 	kref_put(&sfp->f_ref, sg_remove_sfp);
 	sdp->open_cnt--;
 
-	/* possibly many open()s waiting on exlude clearing, start many;
+	/* possibly many open()s waiting on exclude clearing, start many;
 	 * only open(O_EXCL)s wait on 0==open_cnt so only start one */
 	if (sdp->exclude) {
 		sdp->exclude = false;
@@ -395,20 +402,22 @@ sg_write(struct file *filp, const char __user *buf, size_t count, loff_t * ppos)
 {
 	int mxsize, cmd_size, k;
 	int input_size, blocking;
-	unsigned char opcode;
-	Sg_device *sdp;
-	Sg_fd *sfp;
-	Sg_request *srp;
+	u8 opcode;
+	struct sg_device *sdp;
+	struct sg_fd *sfp;
+	struct sg_request *srp;
 	struct sg_header old_hdr;
 	sg_io_hdr_t *hp;
-	unsigned char cmnd[SG_MAX_CDB_SIZE];
+	u8 cmnd[SG_MAX_CDB_SIZE];
 	int retval;
 
 	retval = sg_check_file_access(filp, __func__);
 	if (retval)
 		return retval;
 
-	if ((!(sfp = (Sg_fd *) filp->private_data)) || (!(sdp = sfp->parentdp)))
+	sfp = filp->private_data;
+	sdp = sfp->parentdp;
+	if (!sdp)
 		return -ENXIO;
 	SCSI_LOG_TIMEOUT(3, sg_printk(KERN_INFO, sdp,
 				      "sg_write: count=%d\n", (int) count));
@@ -461,7 +470,7 @@ sg_write(struct file *filp, const char __user *buf, size_t count, loff_t * ppos)
 	}
 	hp = &srp->header;
 	hp->interface_id = '\0';	/* indicator of old interface tunnelled */
-	hp->cmd_len = (unsigned char) cmd_size;
+	hp->cmd_len = (u8)cmd_size;
 	hp->iovec_count = 0;
 	hp->mx_sb_len = 0;
 	if (input_size > 0)
@@ -503,7 +512,8 @@ sg_write(struct file *filp, const char __user *buf, size_t count, loff_t * ppos)
 	return (k < 0) ? k : count;
 }
 
-static int sg_allow_access(struct file *filp, unsigned char *cmd)
+static int
+sg_allow_access(struct file *filp, u8 *cmd)
 {
 	struct sg_fd *sfp = filp->private_data;
 
@@ -514,14 +524,14 @@ static int sg_allow_access(struct file *filp, unsigned char *cmd)
 }
 
 static ssize_t
-sg_new_write(Sg_fd *sfp, struct file *file, const char __user *buf,
-		 size_t count, int blocking, int read_only, int sg_io_owned,
-		 Sg_request **o_srp)
+sg_new_write(struct sg_fd *sfp, struct file *file, const char __user *buf,
+	     size_t count, int blocking, int read_only, int sg_io_owned,
+	     struct sg_request **o_srp)
 {
 	int k;
-	Sg_request *srp;
+	struct sg_request *srp;
 	sg_io_hdr_t *hp;
-	unsigned char cmnd[SG_MAX_CDB_SIZE];
+	u8 cmnd[SG_MAX_CDB_SIZE];
 	int timeout;
 	unsigned long ul_timeout;
 
@@ -581,11 +591,11 @@ sg_new_write(Sg_fd *sfp, struct file *file, const char __user *buf,
 }
 
 static int
-sg_common_write(Sg_fd * sfp, Sg_request * srp,
-		unsigned char *cmnd, int timeout, int blocking)
+sg_common_write(struct sg_fd *sfp, struct sg_request *srp,
+		u8 *cmnd, int timeout, int blocking)
 {
 	int k, at_head;
-	Sg_device *sdp = sfp->parentdp;
+	struct sg_device *sdp = sfp->parentdp;
 	sg_io_hdr_t *hp = &srp->header;
 
 	srp->data.cmd_opcode = cmnd[0];	/* hold opcode of command */
@@ -644,10 +654,10 @@ sg_common_write(Sg_fd * sfp, Sg_request * srp,
  * half of the ioctl(SG_IO) share code with read(2).
  */
 
-static Sg_request *
-sg_get_rq_mark(Sg_fd *sfp, int pack_id)
+static struct sg_request *
+sg_get_rq_mark(struct sg_fd *sfp, int pack_id)
 {
-	Sg_request *resp;
+	struct sg_request *resp;
 	unsigned long iflags;
 
 	write_lock_irqsave(&sfp->rq_list_lock, iflags);
@@ -665,7 +675,8 @@ sg_get_rq_mark(Sg_fd *sfp, int pack_id)
 }
 
 static ssize_t
-sg_new_read(Sg_fd *sfp, char __user *buf, size_t count, Sg_request *srp)
+sg_new_read(struct sg_fd *sfp, char __user *buf, size_t count,
+	    struct sg_request *srp)
 {
 	sg_io_hdr_t *hp = &srp->header;
 	int err = 0, err2;
@@ -707,7 +718,8 @@ sg_new_read(Sg_fd *sfp, char __user *buf, size_t count, Sg_request *srp)
 	return err ? : err2 ? : count;
 }
 
-static int srp_done(Sg_fd *sfp, Sg_request *srp)
+static int
+srp_done(struct sg_fd *sfp, struct sg_request *srp)
 {
 	unsigned long flags;
 	int ret;
@@ -721,9 +733,9 @@ static int srp_done(Sg_fd *sfp, Sg_request *srp)
 static ssize_t
 sg_read(struct file *filp, char __user *buf, size_t count, loff_t *ppos)
 {
-	Sg_device *sdp;
-	Sg_fd *sfp;
-	Sg_request *srp;
+	struct sg_device *sdp;
+	struct sg_fd *sfp;
+	struct sg_request *srp;
 	int req_pack_id = -1;
 	sg_io_hdr_t *hp;
 	struct sg_header *old_hdr = NULL;
@@ -883,7 +895,8 @@ sg_read(struct file *filp, char __user *buf, size_t count, loff_t *ppos)
 	return retval;
 }
 
-static int max_sectors_bytes(struct request_queue *q)
+static int
+max_sectors_bytes(struct request_queue *q)
 {
 	unsigned int max_sectors = queue_max_sectors(q);
 
@@ -893,9 +906,9 @@ static int max_sectors_bytes(struct request_queue *q)
 }
 
 static void
-sg_fill_request_table(Sg_fd *sfp, sg_req_info_t *rinfo)
+sg_fill_request_table(struct sg_fd *sfp, struct sg_req_info *rinfo)
 {
-	Sg_request *srp;
+	struct sg_request *srp;
 	int val;
 	unsigned int ms;
 
@@ -953,12 +966,12 @@ static int put_compat_request_table(struct compat_sg_req_info __user *o,
 #endif
 
 static long
-sg_ioctl_common(struct file *filp, Sg_device *sdp, Sg_fd *sfp,
+sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 		unsigned int cmd_in, void __user *p)
 {
 	int __user *ip = p;
 	int result, val, read_only;
-	Sg_request *srp;
+	struct sg_request *srp;
 	unsigned long iflags;
 
 	SCSI_LOG_TIMEOUT(3, sg_printk(KERN_INFO, sdp,
@@ -1191,11 +1204,13 @@ static long
 sg_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg)
 {
 	void __user *p = (void __user *)arg;
-	Sg_device *sdp;
-	Sg_fd *sfp;
+	struct sg_device *sdp;
+	struct sg_fd *sfp;
 	int ret;
 
-	if ((!(sfp = (Sg_fd *) filp->private_data)) || (!(sdp = sfp->parentdp)))
+	sfp = filp->private_data;
+	sdp = sfp->parentdp;
+	if (!sdp)
 		return -ENXIO;
 
 	ret = sg_ioctl_common(filp, sdp, sfp, cmd_in, p);
@@ -1209,11 +1224,13 @@ sg_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg)
 static long sg_compat_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg)
 {
 	void __user *p = compat_ptr(arg);
-	Sg_device *sdp;
-	Sg_fd *sfp;
+	struct sg_device *sdp;
+	struct sg_fd *sfp;
 	int ret;
 
-	if ((!(sfp = (Sg_fd *) filp->private_data)) || (!(sdp = sfp->parentdp)))
+	sfp = filp->private_data;
+	sdp = sfp->parentdp;
+	if (!sdp)
 		return -ENXIO;
 
 	ret = sg_ioctl_common(filp, sdp, sfp, cmd_in, p);
@@ -1228,9 +1245,9 @@ static __poll_t
 sg_poll(struct file *filp, poll_table * wait)
 {
 	__poll_t res = 0;
-	Sg_device *sdp;
-	Sg_fd *sfp;
-	Sg_request *srp;
+	struct sg_device *sdp;
+	struct sg_fd *sfp;
+	struct sg_request *srp;
 	int count = 0;
 	unsigned long iflags;
 
@@ -1265,10 +1282,12 @@ sg_poll(struct file *filp, poll_table * wait)
 static int
 sg_fasync(int fd, struct file *filp, int mode)
 {
-	Sg_device *sdp;
-	Sg_fd *sfp;
+	struct sg_device *sdp;
+	struct sg_fd *sfp;
 
-	if ((!(sfp = (Sg_fd *) filp->private_data)) || (!(sdp = sfp->parentdp)))
+	sfp = filp->private_data;
+	sdp = sfp->parentdp;
+	if (!sdp)
 		return -ENXIO;
 	SCSI_LOG_TIMEOUT(3, sg_printk(KERN_INFO, sdp,
 				      "sg_fasync: mode=%d\n", mode));
@@ -1280,13 +1299,21 @@ static vm_fault_t
 sg_vma_fault(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
-	Sg_fd *sfp;
+	struct sg_fd *sfp;
 	unsigned long offset, len, sa;
-	Sg_scatter_hold *rsv_schp;
+	struct sg_scatter_hold *rsv_schp;
 	int k, length;
+	const char *nbp = "==NULL, bad";
 
-	if ((NULL == vma) || (!(sfp = (Sg_fd *) vma->vm_private_data)))
-		return VM_FAULT_SIGBUS;
+	if (!vma) {
+		pr_warn("%s: vma%s\n", __func__, nbp);
+		goto out_err;
+	}
+	sfp = vma->vm_private_data;
+	if (!sfp) {
+		pr_warn("%s: sfp%s\n", __func__, nbp);
+		goto out_err;
+	}
 	rsv_schp = &sfp->reserve;
 	offset = vmf->pgoff << PAGE_SHIFT;
 	if (offset >= rsv_schp->bufflen)
@@ -1309,7 +1336,7 @@ sg_vma_fault(struct vm_fault *vmf)
 		sa += len;
 		offset -= len;
 	}
-
+out_err:
 	return VM_FAULT_SIGBUS;
 }
 
@@ -1320,14 +1347,19 @@ static const struct vm_operations_struct sg_mmap_vm_ops = {
 static int
 sg_mmap(struct file *filp, struct vm_area_struct *vma)
 {
-	Sg_fd *sfp;
+	struct sg_fd *sfp;
 	unsigned long req_sz, len, sa;
-	Sg_scatter_hold *rsv_schp;
+	struct sg_scatter_hold *rsv_schp;
 	int k, length;
 	int ret = 0;
 
-	if ((!filp) || (!vma) || (!(sfp = (Sg_fd *) filp->private_data)))
+	if (!filp || !vma)
+		return -ENXIO;
+	sfp = filp->private_data;
+	if (!sfp) {
+		pr_warn("sg: %s: sfp is NULL\n", __func__);
 		return -ENXIO;
+	}
 	req_sz = vma->vm_end - vma->vm_start;
 	SCSI_LOG_TIMEOUT(3, sg_printk(KERN_INFO, sfp->parentdp,
 				      "sg_mmap starting, vm_start=%p, len=%d\n",
@@ -1378,8 +1410,8 @@ sg_rq_end_io(struct request *rq, blk_status_t status)
 {
 	struct sg_request *srp = rq->end_io_data;
 	struct scsi_request *req = scsi_req(rq);
-	Sg_device *sdp;
-	Sg_fd *sfp;
+	struct sg_device *sdp;
+	struct sg_fd *sfp;
 	unsigned long iflags;
 	unsigned int ms;
 	char *sense;
@@ -1491,21 +1523,18 @@ static struct class *sg_sysfs_class;
 
 static int sg_sysfs_valid = 0;
 
-static Sg_device *
+static struct sg_device *
 sg_alloc(struct gendisk *disk, struct scsi_device *scsidp)
 {
 	struct request_queue *q = scsidp->request_queue;
-	Sg_device *sdp;
+	struct sg_device *sdp;
 	unsigned long iflags;
 	int error;
 	u32 k;
 
-	sdp = kzalloc(sizeof(Sg_device), GFP_KERNEL);
-	if (!sdp) {
-		sdev_printk(KERN_WARNING, scsidp, "%s: kmalloc Sg_device "
-			    "failure\n", __func__);
+	sdp = kzalloc(sizeof(*sdp), GFP_KERNEL);
+	if (!sdp)
 		return ERR_PTR(-ENOMEM);
-	}
 
 	idr_preload(GFP_KERNEL);
 	write_lock_irqsave(&sg_index_lock, iflags);
@@ -1518,8 +1547,8 @@ sg_alloc(struct gendisk *disk, struct scsi_device *scsidp)
 				    scsidp->type, SG_MAX_DEVS - 1);
 			error = -ENODEV;
 		} else {
-			sdev_printk(KERN_WARNING, scsidp, "%s: idr "
-				    "allocation Sg_device failure: %d\n",
+			sdev_printk(KERN_WARNING, scsidp,
+				    "%s: idr alloc sg_device failure: %d\n",
 				    __func__, error);
 		}
 		goto out_unlock;
@@ -1558,7 +1587,7 @@ sg_add_device(struct device *cl_dev, struct class_interface *cl_intf)
 {
 	struct scsi_device *scsidp = to_scsi_device(cl_dev->parent);
 	struct gendisk *disk;
-	Sg_device *sdp = NULL;
+	struct sg_device *sdp = NULL;
 	struct cdev * cdev = NULL;
 	int error;
 	unsigned long iflags;
@@ -1657,9 +1686,9 @@ static void
 sg_remove_device(struct device *cl_dev, struct class_interface *cl_intf)
 {
 	struct scsi_device *scsidp = to_scsi_device(cl_dev->parent);
-	Sg_device *sdp = dev_get_drvdata(cl_dev);
+	struct sg_device *sdp = dev_get_drvdata(cl_dev);
 	unsigned long iflags;
-	Sg_fd *sfp;
+	struct sg_fd *sfp;
 	int val;
 
 	if (!sdp)
@@ -1762,22 +1791,22 @@ exit_sg(void)
 }
 
 static int
-sg_start_req(Sg_request *srp, unsigned char *cmd)
+sg_start_req(struct sg_request *srp, u8 *cmd)
 {
 	int res;
 	struct request *rq;
 	struct scsi_request *req;
-	Sg_fd *sfp = srp->parentfp;
+	struct sg_fd *sfp = srp->parentfp;
 	sg_io_hdr_t *hp = &srp->header;
 	int dxfer_len = (int) hp->dxfer_len;
 	int dxfer_dir = hp->dxfer_direction;
 	unsigned int iov_count = hp->iovec_count;
-	Sg_scatter_hold *req_schp = &srp->data;
-	Sg_scatter_hold *rsv_schp = &sfp->reserve;
+	struct sg_scatter_hold *req_schp = &srp->data;
+	struct sg_scatter_hold *rsv_schp = &sfp->reserve;
 	struct request_queue *q = sfp->parentdp->device->request_queue;
 	struct rq_map_data *md, map_data;
 	int rw = hp->dxfer_direction == SG_DXFER_TO_DEV ? WRITE : READ;
-	unsigned char *long_cmdp = NULL;
+	u8 *long_cmdp = NULL;
 
 	SCSI_LOG_TIMEOUT(4, sg_printk(KERN_INFO, sfp->parentdp,
 				      "sg_start_req: dxfer_len=%d\n",
@@ -1892,12 +1921,12 @@ sg_start_req(Sg_request *srp, unsigned char *cmd)
 }
 
 static int
-sg_finish_rem_req(Sg_request *srp)
+sg_finish_rem_req(struct sg_request *srp)
 {
 	int ret = 0;
 
-	Sg_fd *sfp = srp->parentfp;
-	Sg_scatter_hold *req_schp = &srp->data;
+	struct sg_fd *sfp = srp->parentfp;
+	struct sg_scatter_hold *req_schp = &srp->data;
 
 	SCSI_LOG_TIMEOUT(4, sg_printk(KERN_INFO, sfp->parentdp,
 				      "sg_finish_rem_req: res_used=%d\n",
@@ -1919,7 +1948,8 @@ sg_finish_rem_req(Sg_request *srp)
 }
 
 static int
-sg_build_sgat(Sg_scatter_hold * schp, const Sg_fd * sfp, int tablesize)
+sg_build_sgat(struct sg_scatter_hold *schp, const struct sg_fd *sfp,
+	      int tablesize)
 {
 	int sg_bufflen = tablesize * sizeof(struct page *);
 	gfp_t gfp_flags = GFP_ATOMIC | __GFP_NOWARN;
@@ -1932,7 +1962,8 @@ sg_build_sgat(Sg_scatter_hold * schp, const Sg_fd * sfp, int tablesize)
 }
 
 static int
-sg_build_indirect(Sg_scatter_hold * schp, Sg_fd * sfp, int buff_size)
+sg_build_indirect(struct sg_scatter_hold *schp, struct sg_fd *sfp,
+		  int buff_size)
 {
 	int ret_sz = 0, i, k, rem_sz, num, mx_sc_elems;
 	int sg_tablesize = sfp->parentdp->sg_tablesize;
@@ -2014,7 +2045,7 @@ sg_build_indirect(Sg_scatter_hold * schp, Sg_fd * sfp, int buff_size)
 }
 
 static void
-sg_remove_scat(Sg_fd * sfp, Sg_scatter_hold * schp)
+sg_remove_scat(struct sg_fd *sfp, struct sg_scatter_hold *schp)
 {
 	SCSI_LOG_TIMEOUT(4, sg_printk(KERN_INFO, sfp->parentdp,
 			 "sg_remove_scat: k_use_sg=%d\n", schp->k_use_sg));
@@ -2037,9 +2068,9 @@ sg_remove_scat(Sg_fd * sfp, Sg_scatter_hold * schp)
 }
 
 static int
-sg_read_oxfer(Sg_request * srp, char __user *outp, int num_read_xfer)
+sg_read_oxfer(struct sg_request *srp, char __user *outp, int num_read_xfer)
 {
-	Sg_scatter_hold *schp = &srp->data;
+	struct sg_scatter_hold *schp = &srp->data;
 	int k, num;
 
 	SCSI_LOG_TIMEOUT(4, sg_printk(KERN_INFO, srp->parentfp->parentdp,
@@ -2070,9 +2101,9 @@ sg_read_oxfer(Sg_request * srp, char __user *outp, int num_read_xfer)
 }
 
 static void
-sg_build_reserve(Sg_fd * sfp, int req_size)
+sg_build_reserve(struct sg_fd *sfp, int req_size)
 {
-	Sg_scatter_hold *schp = &sfp->reserve;
+	struct sg_scatter_hold *schp = &sfp->reserve;
 
 	SCSI_LOG_TIMEOUT(4, sg_printk(KERN_INFO, sfp->parentdp,
 			 "sg_build_reserve: req_size=%d\n", req_size));
@@ -2088,10 +2119,10 @@ sg_build_reserve(Sg_fd * sfp, int req_size)
 }
 
 static void
-sg_link_reserve(Sg_fd * sfp, Sg_request * srp, int size)
+sg_link_reserve(struct sg_fd *sfp, struct sg_request *srp, int size)
 {
-	Sg_scatter_hold *req_schp = &srp->data;
-	Sg_scatter_hold *rsv_schp = &sfp->reserve;
+	struct sg_scatter_hold *req_schp = &srp->data;
+	struct sg_scatter_hold *rsv_schp = &sfp->reserve;
 	int k, num, rem;
 
 	srp->res_used = 1;
@@ -2120,9 +2151,9 @@ sg_link_reserve(Sg_fd * sfp, Sg_request * srp, int size)
 }
 
 static void
-sg_unlink_reserve(Sg_fd * sfp, Sg_request * srp)
+sg_unlink_reserve(struct sg_fd *sfp, struct sg_request *srp)
 {
-	Sg_scatter_hold *req_schp = &srp->data;
+	struct sg_scatter_hold *req_schp = &srp->data;
 
 	SCSI_LOG_TIMEOUT(4, sg_printk(KERN_INFO, srp->parentfp->parentdp,
 				      "sg_unlink_reserve: req->k_use_sg=%d\n",
@@ -2138,12 +2169,12 @@ sg_unlink_reserve(Sg_fd * sfp, Sg_request * srp)
 }
 
 /* always adds to end of list */
-static Sg_request *
-sg_add_request(Sg_fd * sfp)
+static struct sg_request *
+sg_add_request(struct sg_fd *sfp)
 {
 	int k;
 	unsigned long iflags;
-	Sg_request *rp = sfp->req_arr;
+	struct sg_request *rp = sfp->req_arr;
 
 	write_lock_irqsave(&sfp->rq_list_lock, iflags);
 	if (!list_empty(&sfp->rq_list)) {
@@ -2157,7 +2188,7 @@ sg_add_request(Sg_fd * sfp)
 		if (k >= SG_MAX_QUEUE)
 			goto out_unlock;
 	}
-	memset(rp, 0, sizeof (Sg_request));
+	memset(rp, 0, sizeof(struct sg_request));
 	rp->parentfp = sfp;
 	rp->header.duration = jiffies_to_msecs(jiffies);
 	list_add_tail(&rp->entry, &sfp->rq_list);
@@ -2170,7 +2201,7 @@ sg_add_request(Sg_fd * sfp)
 
 /* Return of 1 for found; 0 for not found */
 static int
-sg_remove_request(Sg_fd * sfp, Sg_request * srp)
+sg_remove_request(struct sg_fd *sfp, struct sg_request *srp)
 {
 	unsigned long iflags;
 	int res = 0;
@@ -2187,10 +2218,10 @@ sg_remove_request(Sg_fd * sfp, Sg_request * srp)
 	return res;
 }
 
-static Sg_fd *
-sg_add_sfp(Sg_device * sdp)
+static struct sg_fd *
+sg_add_sfp(struct sg_device *sdp)
 {
-	Sg_fd *sfp;
+	struct sg_fd *sfp;
 	unsigned long iflags;
 	int bufflen;
 
@@ -2240,13 +2271,13 @@ sg_remove_sfp_usercontext(struct work_struct *work)
 {
 	struct sg_fd *sfp = container_of(work, struct sg_fd, ew.work);
 	struct sg_device *sdp = sfp->parentdp;
-	Sg_request *srp;
+	struct sg_request *srp;
 	unsigned long iflags;
 
 	/* Cleanup any responses which were never read(). */
 	write_lock_irqsave(&sfp->rq_list_lock, iflags);
 	while (!list_empty(&sfp->rq_list)) {
-		srp = list_first_entry(&sfp->rq_list, Sg_request, entry);
+		srp = list_first_entry(&sfp->rq_list, struct sg_request, entry);
 		sg_finish_rem_req(srp);
 		list_del(&srp->entry);
 		srp->parentfp = NULL;
@@ -2311,12 +2342,13 @@ sg_last_dev(void)
 #endif
 
 /* must be called with sg_index_lock held */
-static Sg_device *sg_lookup_dev(int dev)
+static struct sg_device *
+sg_lookup_dev(int dev)
 {
 	return idr_find(&sg_index_idr, dev);
 }
 
-static Sg_device *
+static struct sg_device *
 sg_get_dev(int dev)
 {
 	struct sg_device *sdp;
@@ -2412,13 +2444,15 @@ sg_proc_init(void)
 }
 
 
-static int sg_proc_seq_show_int(struct seq_file *s, void *v)
+static int
+sg_proc_seq_show_int(struct seq_file *s, void *v)
 {
 	seq_printf(s, "%d\n", *((int *)s->private));
 	return 0;
 }
 
-static int sg_proc_single_open_adio(struct inode *inode, struct file *file)
+static int
+sg_proc_single_open_adio(struct inode *inode, struct file *file)
 {
 	return single_open(file, sg_proc_seq_show_int, &sg_allow_dio);
 }
@@ -2439,7 +2473,8 @@ sg_proc_write_adio(struct file *filp, const char __user *buffer,
 	return count;
 }
 
-static int sg_proc_single_open_dressz(struct inode *inode, struct file *file)
+static int
+sg_proc_single_open_dressz(struct inode *inode, struct file *file)
 {
 	return single_open(file, sg_proc_seq_show_int, &sg_big_buff);
 }
@@ -2464,14 +2499,16 @@ sg_proc_write_dressz(struct file *filp, const char __user *buffer,
 	return -ERANGE;
 }
 
-static int sg_proc_seq_show_version(struct seq_file *s, void *v)
+static int
+sg_proc_seq_show_version(struct seq_file *s, void *v)
 {
 	seq_printf(s, "%d\t%s [%s]\n", sg_version_num, SG_VERSION_STR,
 		   sg_version_date);
 	return 0;
 }
 
-static int sg_proc_seq_show_devhdr(struct seq_file *s, void *v)
+static int
+sg_proc_seq_show_devhdr(struct seq_file *s, void *v)
 {
 	seq_puts(s, "host\tchan\tid\tlun\ttype\topens\tqdepth\tbusy\tonline\n");
 	return 0;
@@ -2482,7 +2519,8 @@ struct sg_proc_deviter {
 	size_t	max;
 };
 
-static void * dev_seq_start(struct seq_file *s, loff_t *pos)
+static void *
+dev_seq_start(struct seq_file *s, loff_t *pos)
 {
 	struct sg_proc_deviter * it = kmalloc(sizeof(*it), GFP_KERNEL);
 
@@ -2497,7 +2535,8 @@ static void * dev_seq_start(struct seq_file *s, loff_t *pos)
 	return it;
 }
 
-static void * dev_seq_next(struct seq_file *s, void *v, loff_t *pos)
+static void *
+dev_seq_next(struct seq_file *s, void *v, loff_t *pos)
 {
 	struct sg_proc_deviter * it = s->private;
 
@@ -2505,15 +2544,17 @@ static void * dev_seq_next(struct seq_file *s, void *v, loff_t *pos)
 	return (it->index < it->max) ? it : NULL;
 }
 
-static void dev_seq_stop(struct seq_file *s, void *v)
+static void
+dev_seq_stop(struct seq_file *s, void *v)
 {
 	kfree(s->private);
 }
 
-static int sg_proc_seq_show_dev(struct seq_file *s, void *v)
+static int
+sg_proc_seq_show_dev(struct seq_file *s, void *v)
 {
 	struct sg_proc_deviter * it = (struct sg_proc_deviter *) v;
-	Sg_device *sdp;
+	struct sg_device *sdp;
 	struct scsi_device *scsidp;
 	unsigned long iflags;
 
@@ -2536,10 +2577,11 @@ static int sg_proc_seq_show_dev(struct seq_file *s, void *v)
 	return 0;
 }
 
-static int sg_proc_seq_show_devstrs(struct seq_file *s, void *v)
+static int
+sg_proc_seq_show_devstrs(struct seq_file *s, void *v)
 {
 	struct sg_proc_deviter * it = (struct sg_proc_deviter *) v;
-	Sg_device *sdp;
+	struct sg_device *sdp;
 	struct scsi_device *scsidp;
 	unsigned long iflags;
 
@@ -2556,11 +2598,12 @@ static int sg_proc_seq_show_devstrs(struct seq_file *s, void *v)
 }
 
 /* must be called while holding sg_index_lock */
-static void sg_proc_debug_helper(struct seq_file *s, Sg_device * sdp)
+static void
+sg_proc_debug_helper(struct seq_file *s, struct sg_device *sdp)
 {
 	int k, new_interface, blen, usg;
-	Sg_request *srp;
-	Sg_fd *fp;
+	struct sg_request *srp;
+	struct sg_fd *fp;
 	const sg_io_hdr_t *hp;
 	const char * cp;
 	unsigned int ms;
@@ -2619,10 +2662,11 @@ static void sg_proc_debug_helper(struct seq_file *s, Sg_device * sdp)
 	}
 }
 
-static int sg_proc_seq_show_debug(struct seq_file *s, void *v)
+static int
+sg_proc_seq_show_debug(struct seq_file *s, void *v)
 {
 	struct sg_proc_deviter * it = (struct sg_proc_deviter *) v;
-	Sg_device *sdp;
+	struct sg_device *sdp;
 	unsigned long iflags;
 
 	if (it && (0 == it->index))
-- 
2.25.1



* [PATCH v18 03/83] sg: sg_log and is_enabled
From: Douglas Gilbert @ 2021-04-27 21:56 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare, Hannes Reinecke

Replace the SCSI_LOG_TIMEOUT macros across the driver with a new
SG_LOG macro. SG_LOG calls SCSI_LOG_TIMEOUT when the given sfp pointer
and the device pointer derived from it (including its gendisk) are
non-NULL; otherwise it calls pr_info. SG_LOG also prefixes each message
with the sg device name (sgN) and the thread id. The thread id is
useful even in single-threaded invocations because the driver not
only runs on the invoker's thread but also uses work queues, and the
main callback (i.e. sg_rq_end_io()) may run on any thread. Some
interesting cases arise when the callback runs on its invoker's
thread.
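As a rough illustration of that dispatch, here is a user-space sketch
(not driver code) of how the prefix is chosen. The trimmed struct
sg_device and struct sg_fd, the helper sg_log_prefix(), and passing the
tid as a parameter are hypothetical simplifications; the real macro
reads current->pid and hands the prefix to sdev_prefix_printk() under
SCSI_LOG_TIMEOUT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, trimmed-down stand-ins for the driver's structures. */
struct sg_device {
	unsigned int index;	/* N in /dev/sgN */
	void *disk;		/* stand-in for the gendisk pointer */
};

struct sg_fd {
	struct sg_device *parentdp;
};

/*
 * Mimic SG_LOG's choice of prefix: "sgN: tid=T" when sfp, its parent
 * device and that device's disk are all present, otherwise the text
 * used by the pr_info() fallback. Returns 1 for the device path and
 * 0 for the fallback path.
 */
static int sg_log_prefix(char *buf, size_t n, const struct sg_fd *sfp,
			 int tid)
{
	const struct sg_device *sdp = sfp ? sfp->parentdp : NULL;

	assert(buf != NULL && n > 0);
	if (sdp && sdp->disk) {
		snprintf(buf, n, "sg%u: tid=%d", sdp->index, tid);
		return 1;
	}
	snprintf(buf, n, "sg: sdp or sfp NULL, ");
	return 0;
}
```

A NULL sfp, or an sfp whose parentdp is NULL, never dereferences the
device, which is the safety property the macro relies on.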

SG_LOG uses a 48-byte stack buffer to build the message prefix from
the format string "sg%u: tid=%d"; the length of that prefix is clearly
bounded above by the maximum printed widths of those two integers.
Guarding against the 'current' pointer being NULL is for safety, and
for the case where the boot device is SCSI and the sg driver is built
into the kernel. Also, when debugging, getting a message from a
compromised kernel can be very useful in pinpointing the location of
the failure.
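The bound can be checked with a small user-space sketch, assuming
32-bit unsigned int and int (UINT_MAX prints as 10 digits, INT_MIN as
11 characters including its sign). SG_PREFIX_MAX_LEN and
sg_prefix_worst_len() are hypothetical names used only for this
illustration; SG_LOG_BUFF_SZ mirrors the value in the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

#define SG_LOG_BUFF_SZ 48	/* buffer size used by the patch */

static_assert(UINT_MAX == 4294967295U && INT_MIN == -2147483647 - 1,
	      "sketch assumes 32-bit unsigned int and int");

/*
 * Widest expansion of "sg%u: tid=%d":
 *   "sg" (2) + "4294967295" (10) + ": tid=" (6) + "-2147483648" (11)
 * = 29 characters, plus 1 for the terminating NUL.
 */
#define SG_PREFIX_MAX_LEN (2 + 10 + 6 + 11)

static_assert(SG_PREFIX_MAX_LEN + 1 <= SG_LOG_BUFF_SZ,
	      "SG_LOG prefix buffer too small");

/* Format the worst case and report how many characters it needs. */
static int sg_prefix_worst_len(void)
{
	char b[SG_LOG_BUFF_SZ];

	return snprintf(b, sizeof(b), "sg%u: tid=%d", UINT_MAX, INT_MIN);
}
```

So the worst case needs 30 of the 48 bytes, leaving headroom.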

The simple fact that the SG_LOG macro name is shorter than
SCSI_LOG_TIMEOUT allows more error message "payload" per line.

Also replace #ifdef and #ifndef conditional compilation with the
IS_ENABLED macro, used either in #if directives or in ordinary C
if statements.
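For readers unfamiliar with IS_ENABLED(), the sketch below re-creates,
in simplified form, the preprocessor trick from include/linux/kconfig.h
that lets one macro work both in #if and in ordinary C expressions. It
omits the CONFIG_FOO_MODULE half of the real macro, and all names
(SK_*, FEATURE_ON, FEATURE_OFF, feature_on(), feature_off()) are
hypothetical.

```c
#include <assert.h>

/* Simplified re-creation of the kconfig.h trick (no _MODULE case). */
#define SK_ARG_PLACEHOLDER_1 0,
#define SK_TAKE_SECOND_ARG(ignored, val, ...) val
#define SK_IS_DEFINED(x) SK_IS_DEFINED2(x)
#define SK_IS_DEFINED2(val) SK_IS_DEFINED3(SK_ARG_PLACEHOLDER_##val)
#define SK_IS_DEFINED3(arg1_or_junk) SK_TAKE_SECOND_ARG(arg1_or_junk 1, 0)
#define SK_IS_ENABLED(option) SK_IS_DEFINED(option)

#define FEATURE_ON 1	/* defined to 1, like a "y" Kconfig option */
/* FEATURE_OFF is deliberately left undefined */

/* Usable in #if ... */
#if SK_IS_ENABLED(FEATURE_ON)
#define FEATURE_ON_SEEN_BY_CPP 1
#else
#define FEATURE_ON_SEEN_BY_CPP 0
#endif

/* ... and in constant expressions ... */
static_assert(SK_IS_ENABLED(FEATURE_OFF) == 0, "FEATURE_OFF is undefined");

/* ... and in plain C, where both branches are parsed and type-checked. */
static int feature_on(void)
{
	if (SK_IS_ENABLED(FEATURE_ON))
		return 1;
	return 0;
}

static int feature_off(void)
{
	if (SK_IS_ENABLED(FEATURE_OFF))
		return 1;
	return 0;
}
```

Because the argument is pasted onto SK_ARG_PLACEHOLDER_, only a macro
defined as exactly 1 counts as enabled; a macro defined as empty, or
to any other value, evaluates to 0.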

Reviewed-by: Hannes Reinecke <hare@suse.com>

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 260 +++++++++++++++++++++++-----------------------
 1 file changed, 130 insertions(+), 130 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 443ea7f36f2b..71a1be1d9d7b 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -57,6 +57,15 @@ static char *sg_version_date = "20190606";
 
 #define SG_MAX_DEVS 32768
 
+/* Comment out the following line to compile out SCSI_LOGGING stuff */
+#define SG_DEBUG 1
+
+#if !IS_ENABLED(SG_DEBUG)
+#if IS_ENABLED(DEBUG)	/* If SG_DEBUG not defined, check for DEBUG */
+#define SG_DEBUG DEBUG
+#endif
+#endif
+
 /* SG_MAX_CDB_SIZE should be 260 (spc4r37 section 3.1.30) however the type
  * of sg_io_hdr::cmd_len can only represent 255. All SCSI commands greater
  * than 16 bytes are "variable length" whose length is a multiple of 4
@@ -174,7 +183,7 @@ static ssize_t sg_new_write(struct sg_fd *sfp, struct file *file,
 static int sg_common_write(struct sg_fd *sfp, struct sg_request *srp,
 			   u8 *cmnd, int timeout, int blocking);
 static int sg_read_oxfer(struct sg_request *srp, char __user *outp,
-			 int num_read_xfer);
+			 int num_xfer);
 static void sg_remove_scat(struct sg_fd *sfp, struct sg_scatter_hold *schp);
 static void sg_build_reserve(struct sg_fd *sfp, int req_size);
 static void sg_link_reserve(struct sg_fd *sfp, struct sg_request *srp,
@@ -187,14 +196,45 @@ static int sg_remove_request(struct sg_fd *sfp, struct sg_request *srp);
 static struct sg_device *sg_get_dev(int dev);
 static void sg_device_destroy(struct kref *kref);
 
-#define SZ_SG_HEADER sizeof(struct sg_header)
-#define SZ_SG_IO_HDR sizeof(sg_io_hdr_t)
-#define SZ_SG_IOVEC sizeof(sg_iovec_t)
-#define SZ_SG_REQ_INFO sizeof(sg_req_info_t)
+#define SZ_SG_HEADER ((int)sizeof(struct sg_header))	/* v1 and v2 header */
+#define SZ_SG_IO_HDR ((int)sizeof(struct sg_io_hdr))	/* v3 header */
+#define SZ_SG_REQ_INFO ((int)sizeof(struct sg_req_info))
+
+/*
+ * Kernel needs to be built with CONFIG_SCSI_LOGGING to see log messages.
+ * 'depth' is a number between 1 (most severe) and 7 (most noisy, most
+ * information). All messages are logged as informational (KERN_INFO). In
+ * the unexpected situation where sfp or sdp is NULL the macro reverts to
+ * a pr_info and ignores SCSI_LOG_TIMEOUT and always prints to the log.
+ * Example: this invocation: 'scsi_logging_level -s -T 3' will print
+ * depth (aka level) 1 and 2 SG_LOG() messages.
+ */
+
+#define SG_PROC_DEBUG_SZ 8192
+
+#if IS_ENABLED(CONFIG_SCSI_LOGGING) && IS_ENABLED(SG_DEBUG)
+#define SG_LOG_BUFF_SZ 48
+
+#define SG_LOG(depth, sfp, fmt, a...)					\
+	do {								\
+		char _b[SG_LOG_BUFF_SZ];				\
+		int _tid = (current ? current->pid : -1);		\
+		struct sg_fd *_fp = sfp;				\
+		struct sg_device *_sdp = _fp ? _fp->parentdp : NULL;	\
+									\
+		if (likely(_sdp && _sdp->disk)) {			\
+			snprintf(_b, sizeof(_b), "sg%u: tid=%d",	\
+				 _sdp->index, _tid);			\
+			SCSI_LOG_TIMEOUT(depth,				\
+					 sdev_prefix_printk(KERN_INFO,	\
+					 _sdp->device, _b, fmt, ##a));	\
+		} else							\
+			pr_info("sg: sdp or sfp NULL, " fmt, ##a);	\
+	} while (0)
+#else
+#define SG_LOG(depth, sfp, fmt, a...) do { } while (0)
+#endif	/* end of CONFIG_SCSI_LOGGING && SG_DEBUG conditional */
 
-#define sg_printk(prefix, sdp, fmt, a...) \
-	sdev_prefix_printk(prefix, (sdp)->device,		\
-			   (sdp)->disk->disk_name, fmt, ##a)
 
 /*
  * The SCSI interfaces that use read() and write() as an asynchronous variant of
@@ -286,9 +326,6 @@ sg_open(struct inode *inode, struct file *filp)
 	if (IS_ERR(sdp))
 		return PTR_ERR(sdp);
 
-	SCSI_LOG_TIMEOUT(3, sg_printk(KERN_INFO, sdp,
-				      "sg_open: flags=0x%x\n", op_flags));
-
 	/* This driver's module count bumped by fops_get in <linux/fs.h> */
 	/* Prevent the device driver from vanishing while we sleep */
 	retval = scsi_device_get(sdp->device);
@@ -346,6 +383,9 @@ sg_open(struct inode *inode, struct file *filp)
 	filp->private_data = sfp;
 	sdp->open_cnt++;
 	mutex_unlock(&sdp->open_rel_lock);
+	SG_LOG(3, sfp, "%s: minor=%d, op_flags=0x%x; %s count prior=%d%s\n",
+	       __func__, min_dev, op_flags, "device open", sdp->open_cnt,
+	       ((op_flags & O_NONBLOCK) ? " O_NONBLOCK" : ""));
 
 	retval = 0;
 sg_put:
@@ -376,9 +416,10 @@ sg_release(struct inode *inode, struct file *filp)
 
 	sfp = filp->private_data;
 	sdp = sfp->parentdp;
 	if (!sdp)
 		return -ENXIO;
-	SCSI_LOG_TIMEOUT(3, sg_printk(KERN_INFO, sdp, "sg_release\n"));
+	SG_LOG(3, sfp, "%s: device open count prior=%d\n", __func__,
+	       sdp->open_cnt);
 
 	mutex_lock(&sdp->open_rel_lock);
 	scsi_autopm_put_device(sdp->device);
@@ -417,10 +458,9 @@ sg_write(struct file *filp, const char __user *buf, size_t count, loff_t * ppos)
 
 	sfp = filp->private_data;
 	sdp = sfp->parentdp;
+	SG_LOG(3, sfp, "%s: write(3rd arg) count=%d\n", __func__, (int)count);
 	if (!sdp)
 		return -ENXIO;
-	SCSI_LOG_TIMEOUT(3, sg_printk(KERN_INFO, sdp,
-				      "sg_write: count=%d\n", (int) count));
 	if (atomic_read(&sdp->detaching))
 		return -ENODEV;
 	if (!((filp->f_flags & O_NONBLOCK) ||
@@ -443,8 +483,7 @@ sg_write(struct file *filp, const char __user *buf, size_t count, loff_t * ppos)
 		return -EFAULT;
 
 	if (!(srp = sg_add_request(sfp))) {
-		SCSI_LOG_TIMEOUT(1, sg_printk(KERN_INFO, sdp,
-					      "sg_write: queue full\n"));
+		SG_LOG(1, sfp, "%s: queue full\n", __func__);
 		return -EDOM;
 	}
 	mutex_lock(&sfp->f_mutex);
@@ -457,9 +496,8 @@ sg_write(struct file *filp, const char __user *buf, size_t count, loff_t * ppos)
 			cmd_size = 12;
 	}
 	mutex_unlock(&sfp->f_mutex);
-	SCSI_LOG_TIMEOUT(4, sg_printk(KERN_INFO, sdp,
-		"sg_write:   scsi opcode=0x%02x, cmd_size=%d\n", (int) opcode, cmd_size));
-/* Determine buffer size.  */
+	SG_LOG(4, sfp, "%s:   scsi opcode=0x%02x, cmd_size=%d\n", __func__,
+	       (unsigned int)opcode, cmd_size);
 	input_size = count - cmd_size;
 	mxsize = (input_size > old_hdr.reply_len) ? input_size : old_hdr.reply_len;
 	mxsize -= SZ_SG_HEADER;
@@ -540,8 +578,7 @@ sg_new_write(struct sg_fd *sfp, struct file *file, const char __user *buf,
 
 	sfp->cmd_q = 1;	/* when sg_io_hdr seen, set command queuing on */
 	if (!(srp = sg_add_request(sfp))) {
-		SCSI_LOG_TIMEOUT(1, sg_printk(KERN_INFO, sfp->parentdp,
-					      "sg_new_write: queue full\n"));
+		SG_LOG(1, sfp, "%s: queue full\n", __func__);
 		return -EDOM;
 	}
 	srp->sg_io_owned = sg_io_owned;
@@ -606,9 +643,8 @@ sg_common_write(struct sg_fd *sfp, struct sg_request *srp,
 	hp->host_status = 0;
 	hp->driver_status = 0;
 	hp->resid = 0;
-	SCSI_LOG_TIMEOUT(4, sg_printk(KERN_INFO, sfp->parentdp,
-			"sg_common_write:  scsi opcode=0x%02x, cmd_size=%d\n",
-			(int) cmnd[0], (int) hp->cmd_len));
+	SG_LOG(4, sfp, "%s:  opcode=0x%02x, cmd_sz=%d\n", __func__,
+	       (int)cmnd[0], hp->cmd_len);
 
 	if (hp->dxfer_len >= SZ_256M) {
 		sg_remove_request(sfp, srp);
@@ -617,8 +653,7 @@ sg_common_write(struct sg_fd *sfp, struct sg_request *srp,
 
 	k = sg_start_req(srp, cmnd);
 	if (k) {
-		SCSI_LOG_TIMEOUT(1, sg_printk(KERN_INFO, sfp->parentdp,
-			"sg_common_write: start_req err=%d\n", k));
+		SG_LOG(1, sfp, "%s: start_req err=%d\n", __func__, k);
 		sg_finish_rem_req(srp);
 		sg_remove_request(sfp, srp);
 		return k;	/* probably out of space --> ENOMEM */
@@ -751,9 +786,7 @@ sg_read(struct file *filp, char __user *buf, size_t count, loff_t *ppos)
 
 	sfp = filp->private_data;
 	sdp = sfp->parentdp;
-	SCSI_LOG_TIMEOUT(3, sg_printk(KERN_INFO, sdp,
-				      "%s: count=%d\n", __func__,
-				      (int)count));
+	SG_LOG(3, sfp, "%s: read() count=%d\n", __func__, (int)count);
 	if (!sdp)
 		return -ENXIO;
 
@@ -974,8 +1007,8 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 	struct sg_request *srp;
 	unsigned long iflags;
 
-	SCSI_LOG_TIMEOUT(3, sg_printk(KERN_INFO, sdp,
-				   "sg_ioctl: cmd=0x%x\n", (int) cmd_in));
+	SG_LOG(6, sfp, "%s: cmd=0x%x, O_NONBLOCK=%d\n", __func__, cmd_in,
+	       !!(filp->f_flags & O_NONBLOCK));
 	read_only = (O_RDWR != (filp->f_flags & O_ACCMODE));
 
 	switch (cmd_in) {
@@ -1220,8 +1253,9 @@ sg_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg)
 	return scsi_ioctl(sdp->device, cmd_in, p);
 }
 
-#ifdef CONFIG_COMPAT
-static long sg_compat_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg)
+#if IS_ENABLED(CONFIG_COMPAT)
+static long
+sg_compat_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg)
 {
 	void __user *p = compat_ptr(arg);
 	struct sg_device *sdp;
@@ -1274,24 +1308,16 @@ sg_poll(struct file *filp, poll_table * wait)
 			res |= EPOLLOUT | EPOLLWRNORM;
 	} else if (count < SG_MAX_QUEUE)
 		res |= EPOLLOUT | EPOLLWRNORM;
-	SCSI_LOG_TIMEOUT(3, sg_printk(KERN_INFO, sdp,
-				      "sg_poll: res=0x%x\n", (__force u32) res));
+	SG_LOG(3, sfp, "%s: res=0x%x\n", __func__, (__force u32)res);
 	return res;
 }
 
 static int
 sg_fasync(int fd, struct file *filp, int mode)
 {
-	struct sg_device *sdp;
-	struct sg_fd *sfp;
-
-	sfp = filp->private_data;
-	sdp = sfp->parentdp;
-	if (!sdp)
-		return -ENXIO;
-	SCSI_LOG_TIMEOUT(3, sg_printk(KERN_INFO, sdp,
-				      "sg_fasync: mode=%d\n", mode));
+	struct sg_fd *sfp = filp->private_data;
 
+	SG_LOG(3, sfp, "%s: mode(%s)\n", __func__, (mode ? "add" : "remove"));
 	return fasync_helper(fd, filp, mode, &sfp->async_qp);
 }
 
@@ -1318,10 +1344,8 @@ sg_vma_fault(struct vm_fault *vmf)
 	offset = vmf->pgoff << PAGE_SHIFT;
 	if (offset >= rsv_schp->bufflen)
 		return VM_FAULT_SIGBUS;
-	SCSI_LOG_TIMEOUT(3, sg_printk(KERN_INFO, sfp->parentdp,
-				      "sg_vma_fault: offset=%lu, scatg=%d\n",
-				      offset, rsv_schp->k_use_sg));
 	sa = vma->vm_start;
+	SG_LOG(3, sfp, "%s: vm_start=0x%lx, off=%lu\n", __func__, sa, offset);
 	length = 1 << (PAGE_SHIFT + rsv_schp->page_order);
 	for (k = 0; k < rsv_schp->k_use_sg && sa < vma->vm_end; k++) {
 		len = vma->vm_end - sa;
@@ -1361,9 +1385,8 @@ sg_mmap(struct file *filp, struct vm_area_struct *vma)
 		return -ENXIO;
 	}
 	req_sz = vma->vm_end - vma->vm_start;
-	SCSI_LOG_TIMEOUT(3, sg_printk(KERN_INFO, sfp->parentdp,
-				      "sg_mmap starting, vm_start=%p, len=%d\n",
-				      (void *) vma->vm_start, (int) req_sz));
+	SG_LOG(3, sfp, "%s: vm_start=%p, len=%d\n", __func__,
+	       (void *)vma->vm_start, (int)req_sz);
 	if (vma->vm_pgoff)
 		return -EINVAL;	/* want no offset */
 	rsv_schp = &sfp->reserve;
@@ -1432,10 +1455,9 @@ sg_rq_end_io(struct request *rq, blk_status_t status)
 	result = req->result;
 	resid = req->resid_len;
 
-	SCSI_LOG_TIMEOUT(4, sg_printk(KERN_INFO, sdp,
-				      "sg_cmd_done: pack_id=%d, res=0x%x\n",
-				      srp->header.pack_id, result));
 	srp->header.resid = resid;
+	SG_LOG(6, sfp, "%s: pack_id=%d, res=0x%x\n", __func__,
+	       srp->header.pack_id, result);
 	ms = jiffies_to_msecs(jiffies);
 	srp->header.duration = (ms > srp->header.duration) ?
 				(ms - srp->header.duration) : 0;
@@ -1509,7 +1531,7 @@ static const struct file_operations sg_fops = {
 	.write = sg_write,
 	.poll = sg_poll,
 	.unlocked_ioctl = sg_ioctl,
-#ifdef CONFIG_COMPAT
+#if IS_ENABLED(CONFIG_COMPAT)
 	.compat_ioctl = sg_compat_ioctl,
 #endif
 	.open = sg_open,
@@ -1556,7 +1578,7 @@ sg_alloc(struct gendisk *disk, struct scsi_device *scsidp)
 	k = error;
 
 	SCSI_LOG_TIMEOUT(3, sdev_printk(KERN_INFO, scsidp,
-					"sg_alloc: dev=%d \n", k));
+			 "%s: dev=%d, sdp=0x%p ++\n", __func__, k, sdp));
 	sprintf(disk->disk_name, "sg%d", k);
 	disk->first_minor = k;
 	sdp->disk = disk;
@@ -1666,7 +1688,11 @@ sg_device_destroy(struct kref *kref)
 	struct sg_device *sdp = container_of(kref, struct sg_device, d_ref);
 	unsigned long flags;
 
-	/* CAUTION!  Note that the device can still be found via idr_find()
+	SCSI_LOG_TIMEOUT(1, pr_info("[tid=%d] %s: sdp idx=%d, sdp=0x%p --\n",
+				    (current ? current->pid : -1), __func__,
+				    sdp->index, sdp));
+	/*
+	 * CAUTION!  Note that the device can still be found via idr_find()
 	 * even though the refcount is 0.  Therefore, do idr_remove() BEFORE
 	 * any other cleanup.
 	 */
@@ -1675,9 +1701,6 @@ sg_device_destroy(struct kref *kref)
 	idr_remove(&sg_index_idr, sdp->index);
 	write_unlock_irqrestore(&sg_index_lock, flags);
 
-	SCSI_LOG_TIMEOUT(3,
-		sg_printk(KERN_INFO, sdp, "sg_device_destroy\n"));
-
 	put_disk(sdp->disk);
 	kfree(sdp);
 }
@@ -1698,8 +1721,8 @@ sg_remove_device(struct device *cl_dev, struct class_interface *cl_intf)
 	if (val > 1)
 		return; /* only want to do following once per device */
 
-	SCSI_LOG_TIMEOUT(3, sg_printk(KERN_INFO, sdp,
-				      "%s\n", __func__));
+	SCSI_LOG_TIMEOUT(3, sdev_printk(KERN_INFO, sdp->device,
+					"%s: 0x%p\n", __func__, sdp));
 
 	read_lock_irqsave(&sdp->sfd_lock, iflags);
 	list_for_each_entry(sfp, &sdp->sfds, sfd_siblings) {
@@ -1768,7 +1791,7 @@ init_sg(void)
 	return rc;
 }
 
-#ifndef CONFIG_SCSI_PROC_FS
+#if !IS_ENABLED(CONFIG_SCSI_PROC_FS)
 static int
 sg_proc_init(void)
 {
@@ -1779,9 +1802,8 @@ sg_proc_init(void)
 static void __exit
 exit_sg(void)
 {
-#ifdef CONFIG_SCSI_PROC_FS
-	remove_proc_subtree("scsi/sg", NULL);
-#endif				/* CONFIG_SCSI_PROC_FS */
+	if (IS_ENABLED(CONFIG_SCSI_PROC_FS))
+		remove_proc_subtree("scsi/sg", NULL);
 	scsi_unregister_interface(&sg_interface);
 	class_destroy(sg_sysfs_class);
 	sg_sysfs_valid = 0;
@@ -1808,15 +1830,14 @@ sg_start_req(struct sg_request *srp, u8 *cmd)
 	int rw = hp->dxfer_direction == SG_DXFER_TO_DEV ? WRITE : READ;
 	u8 *long_cmdp = NULL;
 
-	SCSI_LOG_TIMEOUT(4, sg_printk(KERN_INFO, sfp->parentdp,
-				      "sg_start_req: dxfer_len=%d\n",
-				      dxfer_len));
-
 	if (hp->cmd_len > BLK_MAX_CDB) {
 		long_cmdp = kzalloc(hp->cmd_len, GFP_KERNEL);
 		if (!long_cmdp)
 			return -ENOMEM;
+		SG_LOG(5, sfp, "%s: long_cmdp=0x%p ++\n", __func__, long_cmdp);
 	}
+	SG_LOG(4, sfp, "%s: dxfer_len=%d, data-%s\n", __func__, dxfer_len,
+	       (rw ? "OUT" : "IN"));
 
 	/*
 	 * NOTE
@@ -1928,9 +1949,8 @@ sg_finish_rem_req(struct sg_request *srp)
 	struct sg_fd *sfp = srp->parentfp;
 	struct sg_scatter_hold *req_schp = &srp->data;
 
-	SCSI_LOG_TIMEOUT(4, sg_printk(KERN_INFO, sfp->parentdp,
-				      "sg_finish_rem_req: res_used=%d\n",
-				      (int) srp->res_used));
+	SG_LOG(4, sfp, "%s: srp=0x%p%s\n", __func__, srp,
+	       (srp->res_used) ? " rsv" : "");
 	if (srp->bio)
 		ret = blk_rq_unmap_user(srp->bio);
 
@@ -1977,9 +1997,8 @@ sg_build_indirect(struct sg_scatter_hold *schp, struct sg_fd *sfp,
 		++blk_size;	/* don't know why */
 	/* round request up to next highest SG_SECTOR_SZ byte boundary */
 	blk_size = ALIGN(blk_size, SG_SECTOR_SZ);
-	SCSI_LOG_TIMEOUT(4, sg_printk(KERN_INFO, sfp->parentdp,
-		"sg_build_indirect: buff_size=%d, blk_size=%d\n",
-		buff_size, blk_size));
+	SG_LOG(4, sfp, "%s: buff_size=%d, blk_size=%d\n", __func__, buff_size,
+	       blk_size);
 
 	/* N.B. ret_sz carried into this block ... */
 	mx_sc_elems = sg_build_sgat(schp, sfp, sg_tablesize);
@@ -2018,18 +2037,13 @@ sg_build_indirect(struct sg_scatter_hold *schp, struct sg_fd *sfp,
 				scatter_elem_sz_prev = ret_sz;
 			}
 		}
-
-		SCSI_LOG_TIMEOUT(5, sg_printk(KERN_INFO, sfp->parentdp,
-				 "sg_build_indirect: k=%d, num=%d, ret_sz=%d\n",
-				 k, num, ret_sz));
+		SG_LOG(5, sfp, "%s: k=%d, num=%d, ret_sz=%d\n", __func__, k,
+		       num, ret_sz);
 	}		/* end of for loop */
 
 	schp->page_order = order;
 	schp->k_use_sg = k;
-	SCSI_LOG_TIMEOUT(5, sg_printk(KERN_INFO, sfp->parentdp,
-			 "sg_build_indirect: k_use_sg=%d, rem_sz=%d\n",
-			 k, rem_sz));
-
+	SG_LOG(5, sfp, "%s: k_use_sg=%d, order=%d\n", __func__, k, order);
 	schp->bufflen = blk_size;
 	if (rem_sz > 0)	/* must have failed */
 		return -ENOMEM;
@@ -2047,35 +2061,34 @@ sg_build_indirect(struct sg_scatter_hold *schp, struct sg_fd *sfp,
 static void
 sg_remove_scat(struct sg_fd *sfp, struct sg_scatter_hold *schp)
 {
-	SCSI_LOG_TIMEOUT(4, sg_printk(KERN_INFO, sfp->parentdp,
-			 "sg_remove_scat: k_use_sg=%d\n", schp->k_use_sg));
+	SG_LOG(4, sfp, "%s: num_sgat=%d\n", __func__, schp->k_use_sg);
 	if (schp->pages && schp->sglist_len > 0) {
 		if (!schp->dio_in_use) {
 			int k;
 
 			for (k = 0; k < schp->k_use_sg && schp->pages[k]; k++) {
-				SCSI_LOG_TIMEOUT(5,
-					sg_printk(KERN_INFO, sfp->parentdp,
-					"sg_remove_scat: k=%d, pg=0x%p\n",
-					k, schp->pages[k]));
+				SG_LOG(5, sfp, "%s: pg[%d]=0x%p --\n",
+				       __func__, k, schp->pages[k]);
 				__free_pages(schp->pages[k], schp->page_order);
 			}
-
 			kfree(schp->pages);
 		}
 	}
 	memset(schp, 0, sizeof (*schp));
 }
 
+/*
+ * For sg v1 and v2 interface: with a command yielding a data-in buffer, after
+ * it has arrived in kernel memory, this function copies it to the user space,
+ * appended to given struct sg_header object.
+ */
 static int
 sg_read_oxfer(struct sg_request *srp, char __user *outp, int num_read_xfer)
 {
 	struct sg_scatter_hold *schp = &srp->data;
 	int k, num;
 
-	SCSI_LOG_TIMEOUT(4, sg_printk(KERN_INFO, srp->parentfp->parentdp,
-			 "sg_read_oxfer: num_read_xfer=%d\n",
-			 num_read_xfer));
+	SG_LOG(4, srp->parentfp, "%s: num_xfer=%d\n", __func__, num_read_xfer);
 	if ((!outp) || (num_read_xfer <= 0))
 		return 0;
 
@@ -2105,8 +2118,7 @@ sg_build_reserve(struct sg_fd *sfp, int req_size)
 {
 	struct sg_scatter_hold *schp = &sfp->reserve;
 
-	SCSI_LOG_TIMEOUT(4, sg_printk(KERN_INFO, sfp->parentdp,
-			 "sg_build_reserve: req_size=%d\n", req_size));
+	SG_LOG(3, sfp, "%s: buflen=%d\n", __func__, req_size);
 	do {
 		if (req_size < PAGE_SIZE)
 			req_size = PAGE_SIZE;
@@ -2126,8 +2138,7 @@ sg_link_reserve(struct sg_fd *sfp, struct sg_request *srp, int size)
 	int k, num, rem;
 
 	srp->res_used = 1;
-	SCSI_LOG_TIMEOUT(4, sg_printk(KERN_INFO, sfp->parentdp,
-			 "sg_link_reserve: size=%d\n", size));
+	SG_LOG(4, sfp, "%s: size=%d\n", __func__, size);
 	rem = size;
 
 	num = 1 << (PAGE_SHIFT + rsv_schp->page_order);
@@ -2145,8 +2156,7 @@ sg_link_reserve(struct sg_fd *sfp, struct sg_request *srp, int size)
 	}
 
 	if (k >= rsv_schp->k_use_sg) {
-		SCSI_LOG_TIMEOUT(1, sg_printk(KERN_INFO, sfp->parentdp,
-				 "sg_link_reserve: BAD size\n"));
+		SG_LOG(1, sfp, "%s: BAD size\n", __func__);
 	}
 }
 
@@ -2155,9 +2165,8 @@ sg_unlink_reserve(struct sg_fd *sfp, struct sg_request *srp)
 {
 	struct sg_scatter_hold *req_schp = &srp->data;
 
-	SCSI_LOG_TIMEOUT(4, sg_printk(KERN_INFO, srp->parentfp->parentdp,
-				      "sg_unlink_reserve: req->k_use_sg=%d\n",
-				      (int) req_schp->k_use_sg));
+	SG_LOG(4, srp->parentfp, "%s: req->k_use_sg=%d\n", __func__,
+	       (int)req_schp->k_use_sg);
 	req_schp->k_use_sg = 0;
 	req_schp->bufflen = 0;
 	req_schp->pages = NULL;
@@ -2248,18 +2257,15 @@ sg_add_sfp(struct sg_device *sdp)
 	}
 	list_add_tail(&sfp->sfd_siblings, &sdp->sfds);
 	write_unlock_irqrestore(&sdp->sfd_lock, iflags);
-	SCSI_LOG_TIMEOUT(3, sg_printk(KERN_INFO, sdp,
-				      "sg_add_sfp: sfp=0x%p\n", sfp));
+	SG_LOG(3, sfp, "%s: sfp=0x%p\n", __func__, sfp);
 	if (unlikely(sg_big_buff != def_reserved_size))
 		sg_big_buff = def_reserved_size;
 
 	bufflen = min_t(int, sg_big_buff,
 			max_sectors_bytes(sdp->device->request_queue));
 	sg_build_reserve(sfp, bufflen);
-	SCSI_LOG_TIMEOUT(3, sg_printk(KERN_INFO, sdp,
-				      "sg_add_sfp: bufflen=%d, k_use_sg=%d\n",
-				      sfp->reserve.bufflen,
-				      sfp->reserve.k_use_sg));
+	SG_LOG(3, sfp, "%s: bufflen=%d, k_use_sg=%d\n", __func__,
+	       sfp->reserve.bufflen, sfp->reserve.k_use_sg);
 
 	kref_get(&sdp->d_ref);
 	__module_get(THIS_MODULE);
@@ -2285,15 +2291,12 @@ sg_remove_sfp_usercontext(struct work_struct *work)
 	write_unlock_irqrestore(&sfp->rq_list_lock, iflags);
 
 	if (sfp->reserve.bufflen > 0) {
-		SCSI_LOG_TIMEOUT(6, sg_printk(KERN_INFO, sdp,
-				"sg_remove_sfp:    bufflen=%d, k_use_sg=%d\n",
-				(int) sfp->reserve.bufflen,
-				(int) sfp->reserve.k_use_sg));
+		SG_LOG(6, sfp, "%s:    bufflen=%d, k_use_sg=%d\n", __func__,
+		       (int)sfp->reserve.bufflen, (int)sfp->reserve.k_use_sg);
 		sg_remove_scat(sfp, &sfp->reserve);
 	}
 
-	SCSI_LOG_TIMEOUT(6, sg_printk(KERN_INFO, sdp,
-			"sg_remove_sfp: sfp=0x%p\n", sfp));
+	SG_LOG(6, sfp, "%s: sfp=0x%p\n", __func__, sfp);
 	kfree(sfp);
 
 	scsi_device_put(sdp->device);
@@ -2316,7 +2319,6 @@ sg_remove_sfp(struct kref *kref)
 	schedule_work(&sfp->ew.work);
 }
 
-#ifdef CONFIG_SCSI_PROC_FS
 static int
 sg_idr_max_id(int id, void *p, void *data)
 {
@@ -2328,19 +2330,6 @@ sg_idr_max_id(int id, void *p, void *data)
 	return 0;
 }
 
-static int
-sg_last_dev(void)
-{
-	int k = -1;
-	unsigned long iflags;
-
-	read_lock_irqsave(&sg_index_lock, iflags);
-	idr_for_each(&sg_index_idr, sg_idr_max_id, &k);
-	read_unlock_irqrestore(&sg_index_lock, iflags);
-	return k + 1;		/* origin 1 */
-}
-#endif
-
 /* must be called with sg_index_lock held */
 static struct sg_device *
 sg_lookup_dev(int dev)
@@ -2370,7 +2359,7 @@ sg_get_dev(int dev)
 	return sdp;
 }
 
-#ifdef CONFIG_SCSI_PROC_FS
+#if IS_ENABLED(CONFIG_SCSI_PROC_FS)     /* long, almost to end of file */
 static int sg_proc_seq_show_int(struct seq_file *s, void *v);
 
 static int sg_proc_single_open_adio(struct inode *inode, struct file *file);
@@ -2443,6 +2432,17 @@ sg_proc_init(void)
 	return 0;
 }
 
+static int
+sg_last_dev(void)
+{
+	int k = -1;
+	unsigned long iflags;
+
+	read_lock_irqsave(&sg_index_lock, iflags);
+	idr_for_each(&sg_index_idr, sg_idr_max_id, &k);
+	read_unlock_irqrestore(&sg_index_lock, iflags);
+	return k + 1;		/* origin 1 */
+}
 
 static int
 sg_proc_seq_show_int(struct seq_file *s, void *v)
@@ -2701,7 +2701,7 @@ sg_proc_seq_show_debug(struct seq_file *s, void *v)
 	return 0;
 }
 
-#endif				/* CONFIG_SCSI_PROC_FS */
+#endif				/* CONFIG_SCSI_PROC_FS (~300 lines back) */
 
 module_init(init_sg);
 module_exit(exit_sg);
-- 
2.25.1



* [PATCH v18 04/83] sg: rework sg_poll(), minor changes
From: Douglas Gilbert @ 2021-04-27 21:56 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare, Hannes Reinecke

Re-arrange code in sg_poll(). Rename sg_read_oxfer() to
sg_rd_append(). In sg_start_req(), rename rw to r0w.
Also make the associated changes requested by checkpatch.pl.
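The readiness rules of the reworked sg_poll() can be restated as a pure
function. This is a hypothetical user-space sketch: the boolean inputs
summarise what the driver gathers while walking sfp->rq_list under
rq_list_lock, max_queue stands in for SG_MAX_QUEUE, and
sg_poll_mask() is not a driver function.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/epoll.h>

/*
 * have_sfp:   filp->private_data was non-NULL
 * read_ready: some request is done and not owned by ioctl(SG_IO)
 * count:      number of requests on the fd's request list
 * detaching:  parent device exists and is being removed
 * cmd_q:      command queuing enabled on this fd
 */
static unsigned int sg_poll_mask(bool have_sfp, bool read_ready, int count,
				 bool detaching, bool cmd_q, int max_queue)
{
	unsigned int p_res = 0;

	assert(count >= 0 && max_queue > 0);
	if (!have_sfp)
		return EPOLLERR;
	if (read_ready)
		p_res = EPOLLIN | EPOLLRDNORM;
	if (detaching) {
		p_res |= EPOLLHUP;
	} else if (!cmd_q) {
		/* without queuing, writable only when nothing is pending */
		if (count == 0)
			p_res |= EPOLLOUT | EPOLLWRNORM;
	} else if (count < max_queue) {
		p_res |= EPOLLOUT | EPOLLWRNORM;
	}
	return p_res;
}
```

With command queuing off the fd reports writable only when no request
is outstanding; with it on, it stays writable until max_queue requests
are queued. A detaching device reports EPOLLHUP instead of writable.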

Reviewed-by: Hannes Reinecke <hare@suse.com>

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 65 ++++++++++++++++++++++-------------------------
 1 file changed, 30 insertions(+), 35 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 71a1be1d9d7b..0827193fe290 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -182,8 +182,8 @@ static ssize_t sg_new_write(struct sg_fd *sfp, struct file *file,
 			    struct sg_request **o_srp);
 static int sg_common_write(struct sg_fd *sfp, struct sg_request *srp,
 			   u8 *cmnd, int timeout, int blocking);
-static int sg_read_oxfer(struct sg_request *srp, char __user *outp,
-			 int num_xfer);
+static int sg_rd_append(struct sg_request *srp, char __user *outp,
+			int num_xfer);
 static void sg_remove_scat(struct sg_fd *sfp, struct sg_scatter_hold *schp);
 static void sg_build_reserve(struct sg_fd *sfp, int req_size);
 static void sg_link_reserve(struct sg_fd *sfp, struct sg_request *srp,
@@ -796,7 +796,7 @@ sg_read(struct file *filp, char __user *buf, size_t count, loff_t *ppos)
 		old_hdr = kmalloc(SZ_SG_HEADER, GFP_KERNEL);
 		if (!old_hdr)
 			return -ENOMEM;
-		if (__copy_from_user(old_hdr, buf, SZ_SG_HEADER)) {
+		if (copy_from_user(old_hdr, buf, SZ_SG_HEADER)) {
 			retval = -EFAULT;
 			goto free_old_hdr;
 		}
@@ -809,7 +809,7 @@ sg_read(struct file *filp, char __user *buf, size_t count, loff_t *ppos)
 					retval = -ENOMEM;
 					goto free_old_hdr;
 				}
-				retval = __copy_from_user
+				retval = copy_from_user
 				    (new_hdr, buf, SZ_SG_IO_HDR);
 				req_pack_id = new_hdr->pack_id;
 				kfree(new_hdr);
@@ -904,7 +904,7 @@ sg_read(struct file *filp, char __user *buf, size_t count, loff_t *ppos)
 
 	/* Now copy the result back to the user buffer.  */
 	if (count >= SZ_SG_HEADER) {
-		if (__copy_to_user(buf, old_hdr, SZ_SG_HEADER)) {
+		if (copy_to_user(buf, old_hdr, SZ_SG_HEADER)) {
 			retval = -EFAULT;
 			goto free_old_hdr;
 		}
@@ -912,7 +912,7 @@ sg_read(struct file *filp, char __user *buf, size_t count, loff_t *ppos)
 		if (count > old_hdr->reply_len)
 			count = old_hdr->reply_len;
 		if (count > SZ_SG_HEADER) {
-			if (sg_read_oxfer(srp, buf, count - SZ_SG_HEADER)) {
+			if (sg_rd_append(srp, buf, count - SZ_SG_HEADER)) {
 				retval = -EFAULT;
 				goto free_old_hdr;
 			}
@@ -1278,38 +1278,34 @@ sg_compat_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg)
 static __poll_t
 sg_poll(struct file *filp, poll_table * wait)
 {
-	__poll_t res = 0;
-	struct sg_device *sdp;
-	struct sg_fd *sfp;
+	__poll_t p_res = 0;
+	struct sg_fd *sfp = filp->private_data;
 	struct sg_request *srp;
 	int count = 0;
 	unsigned long iflags;
 
-	sfp = filp->private_data;
 	if (!sfp)
 		return EPOLLERR;
-	sdp = sfp->parentdp;
-	if (!sdp)
-		return EPOLLERR;
 	poll_wait(filp, &sfp->read_wait, wait);
 	read_lock_irqsave(&sfp->rq_list_lock, iflags);
 	list_for_each_entry(srp, &sfp->rq_list, entry) {
 		/* if any read waiting, flag it */
-		if ((0 == res) && (1 == srp->done) && (!srp->sg_io_owned))
-			res = EPOLLIN | EPOLLRDNORM;
+		if (p_res == 0 && srp->done == 1 && !srp->sg_io_owned)
+			p_res = EPOLLIN | EPOLLRDNORM;
 		++count;
 	}
 	read_unlock_irqrestore(&sfp->rq_list_lock, iflags);
 
-	if (atomic_read(&sdp->detaching))
-		res |= EPOLLHUP;
-	else if (!sfp->cmd_q) {
-		if (0 == count)
-			res |= EPOLLOUT | EPOLLWRNORM;
-	} else if (count < SG_MAX_QUEUE)
-		res |= EPOLLOUT | EPOLLWRNORM;
-	SG_LOG(3, sfp, "%s: res=0x%x\n", __func__, (__force u32)res);
-	return res;
+	if (sfp->parentdp && atomic_read(&sfp->parentdp->detaching)) {
+		p_res |= EPOLLHUP;
+	} else if (!sfp->cmd_q) {
+		if (count == 0)
+			p_res |= EPOLLOUT | EPOLLWRNORM;
+	} else if (count < SG_MAX_QUEUE) {
+		p_res |= EPOLLOUT | EPOLLWRNORM;
+	}
+	SG_LOG(3, sfp, "%s: p_res=0x%x\n", __func__, (__force u32)p_res);
+	return p_res;
 }
 
 static int
@@ -1827,7 +1823,7 @@ sg_start_req(struct sg_request *srp, u8 *cmd)
 	struct sg_scatter_hold *rsv_schp = &sfp->reserve;
 	struct request_queue *q = sfp->parentdp->device->request_queue;
 	struct rq_map_data *md, map_data;
-	int rw = hp->dxfer_direction == SG_DXFER_TO_DEV ? WRITE : READ;
+	int r0w = hp->dxfer_direction == SG_DXFER_TO_DEV ? WRITE : READ;
 	u8 *long_cmdp = NULL;
 
 	if (hp->cmd_len > BLK_MAX_CDB) {
@@ -1837,7 +1833,7 @@ sg_start_req(struct sg_request *srp, u8 *cmd)
 		SG_LOG(5, sfp, "%s: long_cmdp=0x%p ++\n", __func__, long_cmdp);
 	}
 	SG_LOG(4, sfp, "%s: dxfer_len=%d, data-%s\n", __func__, dxfer_len,
-	       (rw ? "OUT" : "IN"));
+	       (r0w ? "OUT" : "IN"));
 
 	/*
 	 * NOTE
@@ -1914,7 +1910,7 @@ sg_start_req(struct sg_request *srp, u8 *cmd)
 		struct iovec *iov = NULL;
 		struct iov_iter i;
 
-		res = import_iovec(rw, hp->dxferp, iov_count, 0, &iov, &i);
+		res = import_iovec(r0w, hp->dxferp, iov_count, 0, &iov, &i);
 		if (res < 0)
 			return res;
 
@@ -2083,33 +2079,32 @@ sg_remove_scat(struct sg_fd *sfp, struct sg_scatter_hold *schp)
  * appended to given struct sg_header object.
  */
 static int
-sg_read_oxfer(struct sg_request *srp, char __user *outp, int num_read_xfer)
+sg_rd_append(struct sg_request *srp, char __user *outp, int num_xfer)
 {
 	struct sg_scatter_hold *schp = &srp->data;
 	int k, num;
 
-	SG_LOG(4, srp->parentfp, "%s: num_xfer=%d\n", __func__, num_read_xfer);
-	if ((!outp) || (num_read_xfer <= 0))
+	SG_LOG(4, srp->parentfp, "%s: num_xfer=%d\n", __func__, num_xfer);
+	if (!outp || num_xfer <= 0)
 		return 0;
 
 	num = 1 << (PAGE_SHIFT + schp->page_order);
 	for (k = 0; k < schp->k_use_sg && schp->pages[k]; k++) {
-		if (num > num_read_xfer) {
+		if (num > num_xfer) {
 			if (copy_to_user(outp, page_address(schp->pages[k]),
-					   num_read_xfer))
+					   num_xfer))
 				return -EFAULT;
 			break;
 		} else {
 			if (copy_to_user(outp, page_address(schp->pages[k]),
 					   num))
 				return -EFAULT;
-			num_read_xfer -= num;
-			if (num_read_xfer <= 0)
+			num_xfer -= num;
+			if (num_xfer <= 0)
 				break;
 			outp += num;
 		}
 	}
-
 	return 0;
 }
 
-- 
2.25.1



* [PATCH v18 05/83] sg: bitops in sg_device
From: Douglas Gilbert @ 2021-04-27 21:56 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare, Hannes Reinecke

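Introduce a bitmask (fdev_bm) in sg_device, manipulated with bitops,
to replace an atomic (detaching), a bool (exclude) and a char
(sgdebug). In practice sgdebug had already been reduced to two
states, so a single bit suffices. Add the SG_IS_DETACHING() and
SG_HAVE_EXCLUDE() macros to make the code a little clearer.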

Reviewed-by: Hannes Reinecke <hare@suse.com>

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 104 +++++++++++++++++++++++-----------------------
 1 file changed, 53 insertions(+), 51 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 0827193fe290..6a54fd655797 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -74,6 +74,11 @@ static char *sg_version_date = "20190606";
 
 #define SG_DEFAULT_TIMEOUT mult_frac(SG_DEFAULT_TIMEOUT_USER, HZ, USER_HZ)
 
+/* Bit positions (flags) for sg_device::fdev_bm bitmask follow */
+#define SG_FDEV_EXCLUDE		0	/* have fd open with O_EXCL */
+#define SG_FDEV_DETACHING	1	/* may be unexpected device removal */
+#define SG_FDEV_LOG_SENSE	2	/* set by ioctl(SG_SET_DEBUG) */
+
 int sg_big_buff = SG_DEF_RESERVED_SIZE;
 /* N.B. This variable is readable and writeable via
    /proc/scsi/sg/def_reserved_size . Each time sg_open() is called a buffer
@@ -155,14 +160,12 @@ struct sg_device { /* holds the state of each scsi generic device */
 	struct scsi_device *device;
 	wait_queue_head_t open_wait;    /* queue open() when O_EXCL present */
 	struct mutex open_rel_lock;     /* held when in open() or release() */
-	int sg_tablesize;	/* adapter's max scatter-gather table size */
-	u32 index;		/* device index number */
 	struct list_head sfds;
 	rwlock_t sfd_lock;      /* protect access to sfd list */
-	atomic_t detaching;     /* 0->device usable, 1->device detaching */
-	bool exclude;		/* 1->open(O_EXCL) succeeded and is active */
+	int sg_tablesize;	/* adapter's max scatter-gather table size */
+	u32 index;		/* device index number */
 	int open_cnt;		/* count of opens (perhaps < num(sfds) ) */
-	char sgdebug;		/* 0->off, 1->sense, 9->dump dev, 10-> all devs */
+	unsigned long fdev_bm[1];	/* see SG_FDEV_* defines above */
 	struct gendisk *disk;
 	struct cdev *cdev;
 	struct kref d_ref;
@@ -200,6 +203,9 @@ static void sg_device_destroy(struct kref *kref);
 #define SZ_SG_IO_HDR ((int)sizeof(struct sg_io_hdr))	/* v3 header */
 #define SZ_SG_REQ_INFO ((int)sizeof(struct sg_req_info))
 
+#define SG_IS_DETACHING(sdp) test_bit(SG_FDEV_DETACHING, (sdp)->fdev_bm)
+#define SG_HAVE_EXCLUDE(sdp) test_bit(SG_FDEV_EXCLUDE, (sdp)->fdev_bm)
+
 /*
  * Kernel needs to be built with CONFIG_SCSI_LOGGING to see log messages.
  * 'depth' is a number between 1 (most severe) and 7 (most noisy, most
@@ -273,26 +279,26 @@ sg_wait_open_event(struct sg_device *sdp, bool o_excl)
 		while (sdp->open_cnt > 0) {
 			mutex_unlock(&sdp->open_rel_lock);
 			retval = wait_event_interruptible(sdp->open_wait,
-					(atomic_read(&sdp->detaching) ||
+					(SG_IS_DETACHING(sdp) ||
 					 !sdp->open_cnt));
 			mutex_lock(&sdp->open_rel_lock);
 
 			if (retval) /* -ERESTARTSYS */
 				return retval;
-			if (atomic_read(&sdp->detaching))
+			if (SG_IS_DETACHING(sdp))
 				return -ENODEV;
 		}
 	} else {
-		while (sdp->exclude) {
+		while (SG_HAVE_EXCLUDE(sdp)) {
 			mutex_unlock(&sdp->open_rel_lock);
 			retval = wait_event_interruptible(sdp->open_wait,
-					(atomic_read(&sdp->detaching) ||
-					 !sdp->exclude));
+					(SG_IS_DETACHING(sdp) ||
+					 !SG_HAVE_EXCLUDE(sdp)));
 			mutex_lock(&sdp->open_rel_lock);
 
 			if (retval) /* -ERESTARTSYS */
 				return retval;
-			if (atomic_read(&sdp->detaching))
+			if (SG_IS_DETACHING(sdp))
 				return -ENODEV;
 		}
 	}
@@ -354,7 +360,7 @@ sg_open(struct inode *inode, struct file *filp)
 				goto error_mutex_locked;
 			}
 		} else {
-			if (sdp->exclude) {
+			if (SG_HAVE_EXCLUDE(sdp)) {
 				retval = -EBUSY;
 				goto error_mutex_locked;
 			}
@@ -367,10 +373,10 @@ sg_open(struct inode *inode, struct file *filp)
 
 	/* N.B. at this point we are holding the open_rel_lock */
 	if (o_excl)
-		sdp->exclude = true;
+		set_bit(SG_FDEV_EXCLUDE, sdp->fdev_bm);
 
 	if (sdp->open_cnt < 1) {  /* no existing opens */
-		sdp->sgdebug = 0;
+		clear_bit(SG_FDEV_LOG_SENSE, sdp->fdev_bm);
 		q = sdp->device->request_queue;
 		sdp->sg_tablesize = queue_max_segments(q);
 	}
@@ -393,8 +399,8 @@ sg_open(struct inode *inode, struct file *filp)
 	return retval;
 
 out_undo:
-	if (o_excl) {
-		sdp->exclude = false;   /* undo if error */
+	if (o_excl) {		/* undo if error */
+		clear_bit(SG_FDEV_EXCLUDE, sdp->fdev_bm);
 		wake_up_interruptible(&sdp->open_wait);
 	}
 error_mutex_locked:
@@ -428,12 +434,10 @@ sg_release(struct inode *inode, struct file *filp)
 
 	/* possibly many open()s waiting on exclude clearing, start many;
 	 * only open(O_EXCL)s wait on 0==open_cnt so only start one */
-	if (sdp->exclude) {
-		sdp->exclude = false;
+	if (test_and_clear_bit(SG_FDEV_EXCLUDE, sdp->fdev_bm))
 		wake_up_interruptible_all(&sdp->open_wait);
-	} else if (0 == sdp->open_cnt) {
+	else if (sdp->open_cnt == 0)
 		wake_up_interruptible(&sdp->open_wait);
-	}
 	mutex_unlock(&sdp->open_rel_lock);
 	return 0;
 }
@@ -461,7 +465,7 @@ sg_write(struct file *filp, const char __user *buf, size_t count, loff_t * ppos)
 	SG_LOG(3, sfp, "%s: write(3rd arg) count=%d\n", __func__, (int)count);
 	if (!sdp)
 		return -ENXIO;
-	if (atomic_read(&sdp->detaching))
+	if (SG_IS_DETACHING(sdp))
 		return -ENODEV;
 	if (!((filp->f_flags & O_NONBLOCK) ||
 	      scsi_block_when_processing_errors(sdp->device)))
@@ -658,7 +662,7 @@ sg_common_write(struct sg_fd *sfp, struct sg_request *srp,
 		sg_remove_request(sfp, srp);
 		return k;	/* probably out of space --> ENOMEM */
 	}
-	if (atomic_read(&sdp->detaching)) {
+	if (SG_IS_DETACHING(sdp)) {
 		if (srp->bio) {
 			scsi_req_free_cmd(scsi_req(srp->rq));
 			blk_put_request(srp->rq);
@@ -824,7 +828,7 @@ sg_read(struct file *filp, char __user *buf, size_t count, loff_t *ppos)
 	}
 	srp = sg_get_rq_mark(sfp, req_pack_id);
 	if (!srp) {		/* now wait on packet to arrive */
-		if (atomic_read(&sdp->detaching)) {
+		if (SG_IS_DETACHING(sdp)) {
 			retval = -ENODEV;
 			goto free_old_hdr;
 		}
@@ -834,9 +838,9 @@ sg_read(struct file *filp, char __user *buf, size_t count, loff_t *ppos)
 		}
 		retval = wait_event_interruptible
 				(sfp->read_wait,
-				 (atomic_read(&sdp->detaching) ||
+				 (SG_IS_DETACHING(sdp) ||
 				  (srp = sg_get_rq_mark(sfp, req_pack_id))));
-		if (atomic_read(&sdp->detaching)) {
+		if (SG_IS_DETACHING(sdp)) {
 			retval = -ENODEV;
 			goto free_old_hdr;
 		}
@@ -1013,7 +1017,7 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 
 	switch (cmd_in) {
 	case SG_IO:
-		if (atomic_read(&sdp->detaching))
+		if (SG_IS_DETACHING(sdp))
 			return -ENODEV;
 		if (!scsi_block_when_processing_errors(sdp->device))
 			return -ENXIO;
@@ -1022,8 +1026,8 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 		if (result < 0)
 			return result;
 		result = wait_event_interruptible(sfp->read_wait,
-			(srp_done(sfp, srp) || atomic_read(&sdp->detaching)));
-		if (atomic_read(&sdp->detaching))
+			(srp_done(sfp, srp) || SG_IS_DETACHING(sdp)));
+		if (SG_IS_DETACHING(sdp))
 			return -ENODEV;
 		write_lock_irq(&sfp->rq_list_lock);
 		if (srp->done) {
@@ -1064,7 +1068,7 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 		{
 			sg_scsi_id_t v;
 
-			if (atomic_read(&sdp->detaching))
+			if (SG_IS_DETACHING(sdp))
 				return -ENODEV;
 			memset(&v, 0, sizeof(v));
 			v.host_no = sdp->device->host->host_no;
@@ -1184,18 +1188,18 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 			return result;
 		}
 	case SG_EMULATED_HOST:
-		if (atomic_read(&sdp->detaching))
+		if (SG_IS_DETACHING(sdp))
 			return -ENODEV;
 		return put_user(sdp->device->host->hostt->emulated, ip);
 	case SCSI_IOCTL_SEND_COMMAND:
-		if (atomic_read(&sdp->detaching))
+		if (SG_IS_DETACHING(sdp))
 			return -ENODEV;
 		return sg_scsi_ioctl(sdp->device->request_queue, NULL, filp->f_mode, p);
 	case SG_SET_DEBUG:
 		result = get_user(val, ip);
 		if (result)
 			return result;
-		sdp->sgdebug = (char) val;
+		assign_bit(SG_FDEV_LOG_SENSE, sdp->fdev_bm, val);
 		return 0;
 	case BLKSECTGET:
 		return put_user(max_sectors_bytes(sdp->device->request_queue),
@@ -1216,7 +1220,7 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 	case SCSI_IOCTL_PROBE_HOST:
 	case SG_GET_TRANSFORM:
 	case SG_SCSI_RESET:
-		if (atomic_read(&sdp->detaching))
+		if (SG_IS_DETACHING(sdp))
 			return -ENODEV;
 		break;
 	default:
@@ -1296,7 +1300,7 @@ sg_poll(struct file *filp, poll_table * wait)
 	}
 	read_unlock_irqrestore(&sfp->rq_list_lock, iflags);
 
-	if (sfp->parentdp && atomic_read(&sfp->parentdp->detaching)) {
+	if (sfp->parentdp && SG_IS_DETACHING(sfp->parentdp)) {
 		p_res |= EPOLLHUP;
 	} else if (!sfp->cmd_q) {
 		if (count == 0)
@@ -1444,7 +1448,7 @@ sg_rq_end_io(struct request *rq, blk_status_t status)
 		return;
 
 	sdp = sfp->parentdp;
-	if (unlikely(atomic_read(&sdp->detaching)))
+	if (unlikely(SG_IS_DETACHING(sdp)))
 		pr_info("%s: device detaching\n", __func__);
 
 	sense = req->sense;
@@ -1465,9 +1469,9 @@ sg_rq_end_io(struct request *rq, blk_status_t status)
 		srp->header.msg_status = msg_byte(result);
 		srp->header.host_status = host_byte(result);
 		srp->header.driver_status = driver_byte(result);
-		if ((sdp->sgdebug > 0) &&
-		    ((CHECK_CONDITION == srp->header.masked_status) ||
-		     (COMMAND_TERMINATED == srp->header.masked_status)))
+		if (test_bit(SG_FDEV_LOG_SENSE, sdp->fdev_bm) &&
+		    (srp->header.masked_status == CHECK_CONDITION ||
+		     srp->header.masked_status == COMMAND_TERMINATED))
 			__scsi_print_sense(sdp->device, __func__, sense,
 					   SCSI_SENSE_BUFFERSIZE);
 
@@ -1582,7 +1586,7 @@ sg_alloc(struct gendisk *disk, struct scsi_device *scsidp)
 	mutex_init(&sdp->open_rel_lock);
 	INIT_LIST_HEAD(&sdp->sfds);
 	init_waitqueue_head(&sdp->open_wait);
-	atomic_set(&sdp->detaching, 0);
+	clear_bit(SG_FDEV_DETACHING, sdp->fdev_bm);
 	rwlock_init(&sdp->sfd_lock);
 	sdp->sg_tablesize = queue_max_segments(q);
 	sdp->index = k;
@@ -1708,13 +1712,11 @@ sg_remove_device(struct device *cl_dev, struct class_interface *cl_intf)
 	struct sg_device *sdp = dev_get_drvdata(cl_dev);
 	unsigned long iflags;
 	struct sg_fd *sfp;
-	int val;
 
 	if (!sdp)
 		return;
-	/* want sdp->detaching non-zero as soon as possible */
-	val = atomic_inc_return(&sdp->detaching);
-	if (val > 1)
+	/* set this flag as soon as possible as it could be a surprise */
+	if (test_and_set_bit(SG_FDEV_DETACHING, sdp->fdev_bm))
 		return; /* only want to do following once per device */
 
 	SCSI_LOG_TIMEOUT(3, sdev_printk(KERN_INFO, sdp->device,
@@ -2245,7 +2247,7 @@ sg_add_sfp(struct sg_device *sdp)
 	sfp->keep_orphan = SG_DEF_KEEP_ORPHAN;
 	sfp->parentdp = sdp;
 	write_lock_irqsave(&sdp->sfd_lock, iflags);
-	if (atomic_read(&sdp->detaching)) {
+	if (SG_IS_DETACHING(sdp)) {
 		write_unlock_irqrestore(&sdp->sfd_lock, iflags);
 		kfree(sfp);
 		return ERR_PTR(-ENODEV);
@@ -2342,8 +2344,8 @@ sg_get_dev(int dev)
 	sdp = sg_lookup_dev(dev);
 	if (!sdp)
 		sdp = ERR_PTR(-ENXIO);
-	else if (atomic_read(&sdp->detaching)) {
-		/* If sdp->detaching, then the refcount may already be 0, in
+	else if (SG_IS_DETACHING(sdp)) {
+		/* If detaching, then the refcount may already be 0, in
 		 * which case it would be a bug to do kref_get().
 		 */
 		sdp = ERR_PTR(-ENODEV);
@@ -2555,8 +2557,7 @@ sg_proc_seq_show_dev(struct seq_file *s, void *v)
 
 	read_lock_irqsave(&sg_index_lock, iflags);
 	sdp = it ? sg_lookup_dev(it->index) : NULL;
-	if ((NULL == sdp) || (NULL == sdp->device) ||
-	    (atomic_read(&sdp->detaching)))
+	if (!sdp || !sdp->device || SG_IS_DETACHING(sdp))
 		seq_puts(s, "-1\t-1\t-1\t-1\t-1\t-1\t-1\t-1\t-1\n");
 	else {
 		scsidp = sdp->device;
@@ -2583,7 +2584,7 @@ sg_proc_seq_show_devstrs(struct seq_file *s, void *v)
 	read_lock_irqsave(&sg_index_lock, iflags);
 	sdp = it ? sg_lookup_dev(it->index) : NULL;
 	scsidp = sdp ? sdp->device : NULL;
-	if (sdp && scsidp && (!atomic_read(&sdp->detaching)))
+	if (sdp && scsidp && !SG_IS_DETACHING(sdp))
 		seq_printf(s, "%8.8s\t%16.16s\t%4.4s\n",
 			   scsidp->vendor, scsidp->model, scsidp->rev);
 	else
@@ -2675,7 +2676,7 @@ sg_proc_seq_show_debug(struct seq_file *s, void *v)
 	read_lock(&sdp->sfd_lock);
 	if (!list_empty(&sdp->sfds)) {
 		seq_printf(s, " >>> device=%s ", sdp->disk->disk_name);
-		if (atomic_read(&sdp->detaching))
+		if (SG_IS_DETACHING(sdp))
 			seq_puts(s, "detaching pending close ");
 		else if (sdp->device) {
 			struct scsi_device *scsidp = sdp->device;
@@ -2687,7 +2688,8 @@ sg_proc_seq_show_debug(struct seq_file *s, void *v)
 				   scsidp->host->hostt->emulated);
 		}
 		seq_printf(s, " sg_tablesize=%d excl=%d open_cnt=%d\n",
-			   sdp->sg_tablesize, sdp->exclude, sdp->open_cnt);
+			   sdp->sg_tablesize, SG_HAVE_EXCLUDE(sdp),
+			   sdp->open_cnt);
 		sg_proc_debug_helper(s, sdp);
 	}
 	read_unlock(&sdp->sfd_lock);
-- 
2.25.1


* [PATCH v18 06/83] sg: make open count an atomic
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
                   ` (5 preceding siblings ...)
  2021-04-27 21:56 ` [PATCH v18 05/83] sg: bitops in sg_device Douglas Gilbert
@ 2021-04-27 21:56 ` Douglas Gilbert
  2021-04-27 21:56 ` [PATCH v18 07/83] sg: move header to uapi section Douglas Gilbert
                   ` (76 subsequent siblings)
  83 siblings, 0 replies; 86+ messages in thread
From: Douglas Gilbert @ 2021-04-27 21:56 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare, Hannes Reinecke

Convert sg_device::open_cnt into an atomic_t. Also rename
sg_tablesize to the more descriptive max_sgat_elems (the adapter's
maximum number of scatter-gather table elements).

Reviewed-by: Hannes Reinecke <hare@suse.com>

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 44 +++++++++++++++++++++++---------------------
 1 file changed, 23 insertions(+), 21 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 6a54fd655797..42c5ffedf09b 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -162,9 +162,9 @@ struct sg_device { /* holds the state of each scsi generic device */
 	struct mutex open_rel_lock;     /* held when in open() or release() */
 	struct list_head sfds;
 	rwlock_t sfd_lock;      /* protect access to sfd list */
-	int sg_tablesize;	/* adapter's max scatter-gather table size */
+	int max_sgat_elems;	/* adapter's max sgat number of elements */
 	u32 index;		/* device index number */
-	int open_cnt;		/* count of opens (perhaps < num(sfds) ) */
+	atomic_t open_cnt;	/* count of opens (perhaps < num(sfds) ) */
 	unsigned long fdev_bm[1];	/* see SG_FDEV_* defines above */
 	struct gendisk *disk;
 	struct cdev *cdev;
@@ -276,11 +276,11 @@ sg_wait_open_event(struct sg_device *sdp, bool o_excl)
 	int retval = 0;
 
 	if (o_excl) {
-		while (sdp->open_cnt > 0) {
+		while (atomic_read(&sdp->open_cnt) > 0) {
 			mutex_unlock(&sdp->open_rel_lock);
 			retval = wait_event_interruptible(sdp->open_wait,
 					(SG_IS_DETACHING(sdp) ||
-					 !sdp->open_cnt));
+					 atomic_read(&sdp->open_cnt) == 0));
 			mutex_lock(&sdp->open_rel_lock);
 
 			if (retval) /* -ERESTARTSYS */
@@ -328,7 +328,7 @@ sg_open(struct inode *inode, struct file *filp)
 	o_excl = !!(op_flags & O_EXCL);
 	if (o_excl && ((op_flags & O_ACCMODE) == O_RDONLY))
 		return -EPERM; /* Can't lock it with read only access */
-	sdp = sg_get_dev(min_dev);
+	sdp = sg_get_dev(min_dev);	/* increments sdp->d_ref */
 	if (IS_ERR(sdp))
 		return PTR_ERR(sdp);
 
@@ -355,7 +355,7 @@ sg_open(struct inode *inode, struct file *filp)
 	mutex_lock(&sdp->open_rel_lock);
 	if (op_flags & O_NONBLOCK) {
 		if (o_excl) {
-			if (sdp->open_cnt > 0) {
+			if (atomic_read(&sdp->open_cnt) > 0) {
 				retval = -EBUSY;
 				goto error_mutex_locked;
 			}
@@ -375,27 +375,29 @@ sg_open(struct inode *inode, struct file *filp)
 	if (o_excl)
 		set_bit(SG_FDEV_EXCLUDE, sdp->fdev_bm);
 
-	if (sdp->open_cnt < 1) {  /* no existing opens */
+	if (atomic_read(&sdp->open_cnt) < 1) {  /* no existing opens */
 		clear_bit(SG_FDEV_LOG_SENSE, sdp->fdev_bm);
 		q = sdp->device->request_queue;
-		sdp->sg_tablesize = queue_max_segments(q);
+		sdp->max_sgat_elems = queue_max_segments(q);
 	}
-	sfp = sg_add_sfp(sdp);
+	sfp = sg_add_sfp(sdp);		/* increments sdp->d_ref */
 	if (IS_ERR(sfp)) {
 		retval = PTR_ERR(sfp);
 		goto out_undo;
 	}
 
 	filp->private_data = sfp;
-	sdp->open_cnt++;
+	atomic_inc(&sdp->open_cnt);
 	mutex_unlock(&sdp->open_rel_lock);
 	SG_LOG(3, sfp, "%s: minor=%d, op_flags=0x%x; %s count prior=%d%s\n",
-	       __func__, min_dev, op_flags, "device open", sdp->open_cnt,
+	       __func__, min_dev, op_flags, "device open",
+	       atomic_read(&sdp->open_cnt),
 	       ((op_flags & O_NONBLOCK) ? " O_NONBLOCK" : ""));
 
 	retval = 0;
 sg_put:
 	kref_put(&sdp->d_ref, sg_device_destroy);
+	/* if success, sdp->d_ref is incremented twice, decremented once */
 	return retval;
 
 out_undo:
@@ -423,20 +425,20 @@ sg_release(struct inode *inode, struct file *filp)
 	sfp = filp->private_data;
 	sdp = sfp->parentdp;
 	SG_LOG(3, sfp, "%s: device open count prior=%d\n", __func__,
-	       sdp->open_cnt);
+	       atomic_read(&sdp->open_cnt));
 	if (!sdp)
 		return -ENXIO;
 
 	mutex_lock(&sdp->open_rel_lock);
 	scsi_autopm_put_device(sdp->device);
 	kref_put(&sfp->f_ref, sg_remove_sfp);
-	sdp->open_cnt--;
+	atomic_dec(&sdp->open_cnt);
 
 	/* possibly many open()s waiting on exclude clearing, start many;
 	 * only open(O_EXCL)s wait on 0==open_cnt so only start one */
 	if (test_and_clear_bit(SG_FDEV_EXCLUDE, sdp->fdev_bm))
 		wake_up_interruptible_all(&sdp->open_wait);
-	else if (sdp->open_cnt == 0)
+	else if (atomic_read(&sdp->open_cnt) == 0)
 		wake_up_interruptible(&sdp->open_wait);
 	mutex_unlock(&sdp->open_rel_lock);
 	return 0;
@@ -1109,7 +1111,7 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 		read_unlock_irqrestore(&sfp->rq_list_lock, iflags);
 		return put_user(val, ip);
 	case SG_GET_SG_TABLESIZE:
-		return put_user(sdp->sg_tablesize, ip);
+		return put_user(sdp->max_sgat_elems, ip);
 	case SG_SET_RESERVED_SIZE:
 		result = get_user(val, ip);
 		if (result)
@@ -1588,7 +1590,7 @@ sg_alloc(struct gendisk *disk, struct scsi_device *scsidp)
 	init_waitqueue_head(&sdp->open_wait);
 	clear_bit(SG_FDEV_DETACHING, sdp->fdev_bm);
 	rwlock_init(&sdp->sfd_lock);
-	sdp->sg_tablesize = queue_max_segments(q);
+	sdp->max_sgat_elems = queue_max_segments(q);
 	sdp->index = k;
 	kref_init(&sdp->d_ref);
 	error = 0;
@@ -1984,7 +1986,7 @@ sg_build_indirect(struct sg_scatter_hold *schp, struct sg_fd *sfp,
 		  int buff_size)
 {
 	int ret_sz = 0, i, k, rem_sz, num, mx_sc_elems;
-	int sg_tablesize = sfp->parentdp->sg_tablesize;
+	int max_sgat_elems = sfp->parentdp->max_sgat_elems;
 	int blk_size = buff_size, order;
 	gfp_t gfp_mask = GFP_ATOMIC | __GFP_COMP | __GFP_NOWARN | __GFP_ZERO;
 	struct sg_device *sdp = sfp->parentdp;
@@ -1999,7 +2001,7 @@ sg_build_indirect(struct sg_scatter_hold *schp, struct sg_fd *sfp,
 	       blk_size);
 
 	/* N.B. ret_sz carried into this block ... */
-	mx_sc_elems = sg_build_sgat(schp, sfp, sg_tablesize);
+	mx_sc_elems = sg_build_sgat(schp, sfp, max_sgat_elems);
 	if (mx_sc_elems < 0)
 		return mx_sc_elems;	/* most likely -ENOMEM */
 
@@ -2687,9 +2689,9 @@ sg_proc_seq_show_debug(struct seq_file *s, void *v)
 				   scsidp->lun,
 				   scsidp->host->hostt->emulated);
 		}
-		seq_printf(s, " sg_tablesize=%d excl=%d open_cnt=%d\n",
-			   sdp->sg_tablesize, SG_HAVE_EXCLUDE(sdp),
-			   sdp->open_cnt);
+		seq_printf(s, " max_sgat_elems=%d excl=%d open_cnt=%d\n",
+			   sdp->max_sgat_elems, SG_HAVE_EXCLUDE(sdp),
+			   atomic_read(&sdp->open_cnt));
 		sg_proc_debug_helper(s, sdp);
 	}
 	read_unlock(&sdp->sfd_lock);
-- 
2.25.1


* [PATCH v18 07/83] sg: move header to uapi section
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
                   ` (6 preceding siblings ...)
  2021-04-27 21:56 ` [PATCH v18 06/83] sg: make open count an atomic Douglas Gilbert
@ 2021-04-27 21:56 ` Douglas Gilbert
  2021-04-27 21:56 ` [PATCH v18 08/83] sg: speed sg_poll and sg_get_num_waiting Douglas Gilbert
                   ` (75 subsequent siblings)
  83 siblings, 0 replies; 86+ messages in thread
From: Douglas Gilbert @ 2021-04-27 21:56 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare, Hannes Reinecke

Move the user interface part of scsi/sg.h into the new header file
include/uapi/scsi/sg.h. Since scsi/sg.h includes the new header,
other code that includes scsi/sg.h should not be affected.

Add an include for <stddef.h>, as it defines size_t (amongst others)
and the Linux headers may not do so when included from user space.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 include/scsi/sg.h      | 273 ++--------------------------------
 include/uapi/scsi/sg.h | 330 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 340 insertions(+), 263 deletions(-)
 create mode 100644 include/uapi/scsi/sg.h

diff --git a/include/scsi/sg.h b/include/scsi/sg.h
index 7327e12f3373..f9fa142bf23a 100644
--- a/include/scsi/sg.h
+++ b/include/scsi/sg.h
@@ -4,71 +4,17 @@
 
 #include <linux/compiler.h>
 
-/*
- * History:
- *  Started: Aug 9 by Lawrence Foard (entropy@world.std.com), to allow user
- *   process control of SCSI devices.
- *  Development Sponsored by Killy Corp. NY NY
- *
- * Original driver (sg.h):
- *       Copyright (C) 1992 Lawrence Foard
- * Version 2 and 3 extensions to driver:
- *	Copyright (C) 1998 - 2014 Douglas Gilbert
- *
- *  Version: 3.5.36 (20140603)
- *  This version is for 2.6 and 3 series kernels.
- *
- * Documentation
- * =============
- * A web site for the SG device driver can be found at:
- *	http://sg.danny.cz/sg  [alternatively check the MAINTAINERS file]
- * The documentation for the sg version 3 driver can be found at:
- *	http://sg.danny.cz/sg/p/sg_v3_ho.html
- * Also see: <kernel_source>/Documentation/scsi/scsi-generic.rst
- *
- * For utility and test programs see: http://sg.danny.cz/sg/sg3_utils.html
- */
-
-#ifdef __KERNEL__
+#if defined(__KERNEL__)
 extern int sg_big_buff; /* for sysctl */
-#endif
-
-
-typedef struct sg_iovec /* same structure as used by readv() Linux system */
-{                       /* call. It defines one scatter-gather element. */
-    void __user *iov_base;      /* Starting address  */
-    size_t iov_len;             /* Length in bytes  */
-} sg_iovec_t;
 
+/*
+ * In version 3.9.01 of the sg driver, this file was split in two, with the
+ * bulk of the user space interface being placed in the file being included
+ * in the following line.
+ */
 
-typedef struct sg_io_hdr
-{
-    int interface_id;           /* [i] 'S' for SCSI generic (required) */
-    int dxfer_direction;        /* [i] data transfer direction  */
-    unsigned char cmd_len;      /* [i] SCSI command length */
-    unsigned char mx_sb_len;    /* [i] max length to write to sbp */
-    unsigned short iovec_count; /* [i] 0 implies no scatter gather */
-    unsigned int dxfer_len;     /* [i] byte count of data transfer */
-    void __user *dxferp;	/* [i], [*io] points to data transfer memory
-					      or scatter gather list */
-    unsigned char __user *cmdp; /* [i], [*i] points to command to perform */
-    void __user *sbp;		/* [i], [*o] points to sense_buffer memory */
-    unsigned int timeout;       /* [i] MAX_UINT->no timeout (unit: millisec) */
-    unsigned int flags;         /* [i] 0 -> default, see SG_FLAG... */
-    int pack_id;                /* [i->o] unused internally (normally) */
-    void __user * usr_ptr;      /* [i->o] unused internally */
-    unsigned char status;       /* [o] scsi status */
-    unsigned char masked_status;/* [o] shifted, masked scsi status */
-    unsigned char msg_status;   /* [o] messaging level data (optional) */
-    unsigned char sb_len_wr;    /* [o] byte count actually written to sbp */
-    unsigned short host_status; /* [o] errors from host adapter */
-    unsigned short driver_status;/* [o] errors from software driver */
-    int resid;                  /* [o] dxfer_len - actual_transferred */
-    unsigned int duration;      /* [o] time taken by cmd (unit: millisec) */
-    unsigned int info;          /* [o] auxiliary information */
-} sg_io_hdr_t;  /* 64 bytes long (on i386) */
+#include <uapi/scsi/sg.h>
 
-#if defined(__KERNEL__)
 #include <linux/compat.h>
 
 struct compat_sg_io_hdr {
@@ -96,209 +42,10 @@ struct compat_sg_io_hdr {
 	compat_uint_t duration;		/* [o] time taken by cmd (unit: millisec) */
 	compat_uint_t info;		/* [o] auxiliary information */
 };
-#endif
-
-#define SG_INTERFACE_ID_ORIG 'S'
-
-/* Use negative values to flag difference from original sg_header structure */
-#define SG_DXFER_NONE (-1)      /* e.g. a SCSI Test Unit Ready command */
-#define SG_DXFER_TO_DEV (-2)    /* e.g. a SCSI WRITE command */
-#define SG_DXFER_FROM_DEV (-3)  /* e.g. a SCSI READ command */
-#define SG_DXFER_TO_FROM_DEV (-4) /* treated like SG_DXFER_FROM_DEV with the
-				   additional property than during indirect
-				   IO the user buffer is copied into the
-				   kernel buffers before the transfer */
-#define SG_DXFER_UNKNOWN (-5)   /* Unknown data direction */
-
-/* following flag values can be "or"-ed together */
-#define SG_FLAG_DIRECT_IO 1     /* default is indirect IO */
-#define SG_FLAG_UNUSED_LUN_INHIBIT 2   /* default is overwrite lun in SCSI */
-				/* command block (when <= SCSI_2) */
-#define SG_FLAG_MMAP_IO 4       /* request memory mapped IO */
-#define SG_FLAG_NO_DXFER 0x10000 /* no transfer of kernel buffers to/from */
-				/* user space (debug indirect IO) */
-/* defaults:: for sg driver: Q_AT_HEAD; for block layer: Q_AT_TAIL */
-#define SG_FLAG_Q_AT_TAIL 0x10
-#define SG_FLAG_Q_AT_HEAD 0x20
-
-/* following 'info' values are "or"-ed together */
-#define SG_INFO_OK_MASK 0x1
-#define SG_INFO_OK 0x0          /* no sense, host nor driver "noise" */
-#define SG_INFO_CHECK 0x1       /* something abnormal happened */
-
-#define SG_INFO_DIRECT_IO_MASK 0x6
-#define SG_INFO_INDIRECT_IO 0x0 /* data xfer via kernel buffers (or no xfer) */
-#define SG_INFO_DIRECT_IO 0x2   /* direct IO requested and performed */
-#define SG_INFO_MIXED_IO 0x4    /* part direct, part indirect IO */
-
-
-typedef struct sg_scsi_id { /* used by SG_GET_SCSI_ID ioctl() */
-    int host_no;        /* as in "scsi<n>" where 'n' is one of 0, 1, 2 etc */
-    int channel;
-    int scsi_id;        /* scsi id of target device */
-    int lun;
-    int scsi_type;      /* TYPE_... defined in scsi/scsi.h */
-    short h_cmd_per_lun;/* host (adapter) maximum commands per lun */
-    short d_queue_depth;/* device (or adapter) maximum queue length */
-    int unused[2];      /* probably find a good use, set 0 for now */
-} sg_scsi_id_t; /* 32 bytes long on i386 */
-
-typedef struct sg_req_info { /* used by SG_GET_REQUEST_TABLE ioctl() */
-    char req_state;     /* 0 -> not used, 1 -> written, 2 -> ready to read */
-    char orphan;        /* 0 -> normal request, 1 -> from interruped SG_IO */
-    char sg_io_owned;   /* 0 -> complete with read(), 1 -> owned by SG_IO */
-    char problem;       /* 0 -> no problem detected, 1 -> error to report */
-    int pack_id;        /* pack_id associated with request */
-    void __user *usr_ptr;     /* user provided pointer (in new interface) */
-    unsigned int duration; /* millisecs elapsed since written (req_state==1)
-			      or request duration (req_state==2) */
-    int unused;
-} sg_req_info_t; /* 20 bytes long on i386 */
-
-
-/* IOCTLs: Those ioctls that are relevant to the SG 3.x drivers follow.
- [Those that only apply to the SG 2.x drivers are at the end of the file.]
- (_GET_s yield result via 'int *' 3rd argument unless otherwise indicated) */
-
-#define SG_EMULATED_HOST 0x2203 /* true for emulated host adapter (ATAPI) */
-
-/* Used to configure SCSI command transformation layer for ATAPI devices */
-/* Only supported by the ide-scsi driver */
-#define SG_SET_TRANSFORM 0x2204 /* N.B. 3rd arg is not pointer but value: */
-		      /* 3rd arg = 0 to disable transform, 1 to enable it */
-#define SG_GET_TRANSFORM 0x2205
-
-#define SG_SET_RESERVED_SIZE 0x2275  /* request a new reserved buffer size */
-#define SG_GET_RESERVED_SIZE 0x2272  /* actual size of reserved buffer */
-
-/* The following ioctl has a 'sg_scsi_id_t *' object as its 3rd argument. */
-#define SG_GET_SCSI_ID 0x2276   /* Yields fd's bus, chan, dev, lun + type */
-/* SCSI id information can also be obtained from SCSI_IOCTL_GET_IDLUN */
-
-/* Override host setting and always DMA using low memory ( <16MB on i386) */
-#define SG_SET_FORCE_LOW_DMA 0x2279  /* 0-> use adapter setting, 1-> force */
-#define SG_GET_LOW_DMA 0x227a   /* 0-> use all ram for dma; 1-> low dma ram */
-
-/* When SG_SET_FORCE_PACK_ID set to 1, pack_id is input to read() which
-   tries to fetch a packet with a matching pack_id, waits, or returns EAGAIN.
-   If pack_id is -1 then read oldest waiting. When ...FORCE_PACK_ID set to 0
-   then pack_id ignored by read() and oldest readable fetched. */
-#define SG_SET_FORCE_PACK_ID 0x227b
-#define SG_GET_PACK_ID 0x227c /* Yields oldest readable pack_id (or -1) */
 
-#define SG_GET_NUM_WAITING 0x227d /* Number of commands awaiting read() */
-
-/* Yields max scatter gather tablesize allowed by current host adapter */
-#define SG_GET_SG_TABLESIZE 0x227F  /* 0 implies can't do scatter gather */
-
-#define SG_GET_VERSION_NUM 0x2282 /* Example: version 2.1.34 yields 20134 */
-
-/* Returns -EBUSY if occupied. 3rd argument pointer to int (see next) */
-#define SG_SCSI_RESET 0x2284
-/* Associated values that can be given to SG_SCSI_RESET follow.
- * SG_SCSI_RESET_NO_ESCALATE may be OR-ed to the _DEVICE, _TARGET, _BUS
- * or _HOST reset value so only that action is attempted. */
-#define		SG_SCSI_RESET_NOTHING	0
-#define		SG_SCSI_RESET_DEVICE	1
-#define		SG_SCSI_RESET_BUS	2
-#define		SG_SCSI_RESET_HOST	3
-#define		SG_SCSI_RESET_TARGET	4
-#define		SG_SCSI_RESET_NO_ESCALATE	0x100
-
-/* synchronous SCSI command ioctl, (only in version 3 interface) */
-#define SG_IO 0x2285   /* similar effect as write() followed by read() */
-
-#define SG_GET_REQUEST_TABLE 0x2286   /* yields table of active requests */
-
-/* How to treat EINTR during SG_IO ioctl(), only in SG 3.x series */
-#define SG_SET_KEEP_ORPHAN 0x2287 /* 1 -> hold for read(), 0 -> drop (def) */
-#define SG_GET_KEEP_ORPHAN 0x2288
-
-/* yields scsi midlevel's access_count for this SCSI device */
-#define SG_GET_ACCESS_COUNT 0x2289  
-
-
-#define SG_SCATTER_SZ (8 * 4096)
-/* Largest size (in bytes) a single scatter-gather list element can have.
-   The value used by the driver is 'max(SG_SCATTER_SZ, PAGE_SIZE)'.
-   This value should be a power of 2 (and may be rounded up internally).
-   If scatter-gather is not supported by adapter then this value is the
-   largest data block that can be read/written by a single scsi command. */
-
-#define SG_DEFAULT_RETRIES 0
-
-/* Defaults, commented if they differ from original sg driver */
-#define SG_DEF_FORCE_PACK_ID 0
-#define SG_DEF_KEEP_ORPHAN 0
-#define SG_DEF_RESERVED_SIZE SG_SCATTER_SZ /* load time option */
-
-/* maximum outstanding requests, write() yields EDOM if exceeded */
-#define SG_MAX_QUEUE 16
-
-#define SG_BIG_BUFF SG_DEF_RESERVED_SIZE    /* for backward compatibility */
-
-/* Alternate style type names, "..._t" variants preferred */
-typedef struct sg_io_hdr Sg_io_hdr;
-typedef struct sg_io_vec Sg_io_vec;
-typedef struct sg_scsi_id Sg_scsi_id;
-typedef struct sg_req_info Sg_req_info;
-
-
-/* vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv */
-/*   The older SG interface based on the 'sg_header' structure follows.   */
-/* ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ */
-
-#define SG_MAX_SENSE 16   /* this only applies to the sg_header interface */
-
-struct sg_header
-{
-    int pack_len;    /* [o] reply_len (ie useless), ignored as input */
-    int reply_len;   /* [i] max length of expected reply (inc. sg_header) */
-    int pack_id;     /* [io] id number of packet (use ints >= 0) */
-    int result;      /* [o] 0==ok, else (+ve) Unix errno (best ignored) */
-    unsigned int twelve_byte:1;
-	/* [i] Force 12 byte command length for group 6 & 7 commands  */
-    unsigned int target_status:5;   /* [o] scsi status from target */
-    unsigned int host_status:8;     /* [o] host status (see "DID" codes) */
-    unsigned int driver_status:8;   /* [o] driver status+suggestion */
-    unsigned int other_flags:10;    /* unused */
-    unsigned char sense_buffer[SG_MAX_SENSE]; /* [o] Output in 3 cases:
-	   when target_status is CHECK_CONDITION or
-	   when target_status is COMMAND_TERMINATED or
-	   when (driver_status & DRIVER_SENSE) is true. */
-};      /* This structure is 36 bytes long on i386 */
-
-
-/* IOCTLs: The following are not required (or ignored) when the sg_io_hdr_t
-	   interface is used. They are kept for backward compatibility with
-	   the original and version 2 drivers. */
-
-#define SG_SET_TIMEOUT 0x2201  /* unit: jiffies (10ms on i386) */
-#define SG_GET_TIMEOUT 0x2202  /* yield timeout as _return_ value */
-
-/* Get/set command queuing state per fd (default is SG_DEF_COMMAND_Q.
-   Each time a sg_io_hdr_t object is seen on this file descriptor, this
-   command queuing flag is set on (overriding the previous setting). */
-#define SG_GET_COMMAND_Q 0x2270   /* Yields 0 (queuing off) or 1 (on) */
-#define SG_SET_COMMAND_Q 0x2271   /* Change queuing state with 0 or 1 */
-
-/* Turn on/off error sense trace (1 and 0 respectively, default is off).
-   Try using: "# cat /proc/scsi/sg/debug" instead in the v3 driver */
-#define SG_SET_DEBUG 0x227e    /* 0 -> turn off debug */
-
-#define SG_NEXT_CMD_LEN 0x2283  /* override SCSI command length with given
-		   number on the next write() on this file descriptor */
-
-
-/* Defaults, commented if they differ from original sg driver */
-#ifdef __KERNEL__
-#define SG_DEFAULT_TIMEOUT_USER	(60*USER_HZ) /* HZ == 'jiffies in 1 second' */
-#else
-#define SG_DEFAULT_TIMEOUT	(60*HZ)	     /* HZ == 'jiffies in 1 second' */
+#define SG_DEFAULT_TIMEOUT_USER (60 * USER_HZ) /* HZ: jiffies in 1 second */
 #endif
 
-#define SG_DEF_COMMAND_Q 0     /* command queuing is always on when
-				  the new interface is used */
-#define SG_DEF_UNDERRUN_FLAG 0
+#undef SG_DEFAULT_TIMEOUT	/* because of conflicting define in sg.c */
 
-#endif
+#endif	/* end of ifndef _SCSI_GENERIC_H guard */
diff --git a/include/uapi/scsi/sg.h b/include/uapi/scsi/sg.h
new file mode 100644
index 000000000000..c5a813462631
--- /dev/null
+++ b/include/uapi/scsi/sg.h
@@ -0,0 +1,330 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _UAPI_SCSI_SG_H
+#define _UAPI_SCSI_SG_H
+
+/*
+ * History:
+ *  Started: Aug 9 by Lawrence Foard (entropy@world.std.com), to allow user
+ *  process control of SCSI devices.
+ *  Development Sponsored by Killy Corp. NY NY
+ *
+ * Original driver (sg.h):
+ *   Copyright (C) 1992 Lawrence Foard
+ *
+ * Later extensions (versions 2, 3 and 4) to driver:
+ *   Copyright (C) 1998 - 2018 Douglas Gilbert
+ *
+ * Version 4.0.11 (20190502)
+ *  This version is for Linux 4 and 5 series kernels.
+ *
+ * Documentation
+ * =============
+ * A web site for the SG device driver can be found at:
+ *   https://sg.danny.cz/sg  [alternatively check the MAINTAINERS file]
+ * The documentation for the sg version 3 driver can be found at:
+ *   https://sg.danny.cz/sg/p/sg_v3_ho.html
+ * Also see: <kernel_source>/Documentation/scsi/scsi-generic.txt
+ *
+ * For utility and test programs see: https://sg.danny.cz/sg/sg3_utils.html
+ */
+
+#include <stddef.h>
+#include <linux/types.h>
+#include <linux/major.h>
+
+/* bsg.h contains the sg v4 user space interface structure (sg_io_v4). */
+#include <linux/bsg.h>
+
+/*
+ * Same structure as used by readv() call. It defines one scatter-gather
+ * element. "Scatter-gather" is abbreviated to "sgat" in this driver to
+ * avoid confusion with this driver's name.
+ */
+typedef struct sg_iovec	{
+	void __user *iov_base;	/* Starting address (of a byte) */
+	size_t iov_len;		/* Length in bytes */
+} sg_iovec_t;
+
+
+typedef struct sg_io_hdr {
+	int interface_id;	/* [i] 'S' for SCSI generic (required) */
+	int dxfer_direction;	/* [i] data transfer direction  */
+	unsigned char cmd_len;	/* [i] SCSI command length */
+	unsigned char mx_sb_len;/* [i] max length to write to sbp */
+	unsigned short iovec_count;	/* [i] 0 implies no sgat list */
+	unsigned int dxfer_len;	/* [i] byte count of data transfer */
+	/* dxferp points to data transfer memory or scatter gather list */
+	void __user *dxferp;	/* [i], [*io] */
+	unsigned char __user *cmdp;/* [i], [*i] points to command to perform */
+	void __user *sbp;	/* [i], [*o] points to sense_buffer memory */
+	unsigned int timeout;	/* [i] MAX_UINT->no timeout (unit: millisec) */
+	unsigned int flags;	/* [i] 0 -> default, see SG_FLAG... */
+	int pack_id;		/* [i->o] unused internally (normally) */
+	void __user *usr_ptr;	/* [i->o] unused internally */
+	unsigned char status;	/* [o] scsi status */
+	unsigned char masked_status;/* [o] shifted, masked scsi status */
+	unsigned char msg_status;/* [o] messaging level data (optional) */
+	unsigned char sb_len_wr; /* [o] byte count actually written to sbp */
+	unsigned short host_status; /* [o] errors from host adapter */
+	unsigned short driver_status;/* [o] errors from software driver */
+	int resid;		/* [o] dxfer_len - actual_transferred */
+	/* unit may be nanoseconds after SG_SET_GET_EXTENDED ioctl use */
+	unsigned int duration;	/* [o] time taken by cmd (unit: millisec) */
+	unsigned int info;	/* [o] auxiliary information */
+} sg_io_hdr_t;
+
+#define SG_INTERFACE_ID_ORIG 'S'
+
+/* Use negative values to flag difference from original sg_header structure */
+#define SG_DXFER_NONE (-1)	/* e.g. a SCSI Test Unit Ready command */
+#define SG_DXFER_TO_DEV (-2)	/* data-out buffer e.g. SCSI WRITE command */
+#define SG_DXFER_FROM_DEV (-3)	/* data-in buffer e.g. SCSI READ command */
+/*
+ * SG_DXFER_TO_FROM_DEV is treated like SG_DXFER_FROM_DEV with the additional
+ * property that during indirect IO the user buffer is copied into the kernel
+ * buffers _before_ the transfer from the device takes place. Useful if short
+ * DMA transfers (less than requested) are not reported (e.g. resid always 0).
+ */
+#define SG_DXFER_TO_FROM_DEV (-4)
+#define SG_DXFER_UNKNOWN (-5)	/* Unknown data direction, do not use */
+
+/* following flag values can be OR-ed together in v3::flags or v4::flags */
+#define SG_FLAG_DIRECT_IO 1	/* default is indirect IO */
+/* SG_FLAG_UNUSED_LUN_INHIBIT is ignored in sg v4 driver */
+#define SG_FLAG_UNUSED_LUN_INHIBIT 2  /* ignored, was LUN overwrite in cdb */
+#define SG_FLAG_MMAP_IO 4	/* request memory mapped IO */
+/* no transfers between kernel<-->user space; keep device<-->kernel xfers */
+#define SG_FLAG_NO_DXFER 0x10000 /* See comment on previous line! */
+/* defaults: for sg driver (v3_v4): Q_AT_HEAD; for block layer: Q_AT_TAIL */
+#define SG_FLAG_Q_AT_TAIL 0x10
+#define SG_FLAG_Q_AT_HEAD 0x20
+
+/* Output (potentially OR-ed together) in v3::info or v4::info field */
+#define SG_INFO_OK_MASK 0x1
+#define SG_INFO_OK 0x0		/* no sense, host nor driver "noise" */
+#define SG_INFO_CHECK 0x1	/* something abnormal happened */
+
+#define SG_INFO_DIRECT_IO_MASK 0x6
+#define SG_INFO_INDIRECT_IO 0x0	/* data xfer via kernel buffers (or no xfer) */
+#define SG_INFO_DIRECT_IO 0x2	/* direct IO requested and performed */
+#define SG_INFO_MIXED_IO 0x4	/* not used, always 0 */
+#define SG_INFO_DEVICE_DETACHING 0x8	/* completed successfully but ... */
+#define SG_INFO_ABORTED 0x10	/* this command has been aborted */
+#define SG_INFO_MRQ_FINI 0x20	/* marks multi-reqs that have finished */
+
+/*
+ * Pointer to object of this structure filled by ioctl(SG_GET_SCSI_ID). Last
+ * field changed in v4 driver, was 'int unused[2]' so remains the same size.
+ */
+typedef struct sg_scsi_id {
+	int host_no;	/* as in "scsi<n>" where 'n' is one of 0, 1, 2 etc */
+	int channel;
+	int scsi_id;	/* scsi id of target device */
+	int lun;	/* lower 32 bits of internal 64 bit integer */
+	int scsi_type;	/* TYPE_... defined in scsi/scsi.h */
+	short h_cmd_per_lun;/* host (adapter) maximum commands per lun */
+	short d_queue_depth;/* device (or adapter) maximum queue length */
+	int unused[2];
+} sg_scsi_id_t;
+
+/* For backward compatibility v4 driver yields at most SG_MAX_QUEUE of these */
+typedef struct sg_req_info {	/* used by SG_GET_REQUEST_TABLE ioctl() */
+	char req_state;	/* See 'enum sg_rq_state' definition in v4 driver */
+	char orphan;	/* 0 -> normal request, 1 -> from interrupted SG_IO */
+	/* sg_io_owned set implies synchronous, clear implies asynchronous */
+	char sg_io_owned;/* 0 -> complete with read(), 1 -> owned by SG_IO */
+	char problem;	/* 0 -> no problem detected, 1 -> error to report */
+	/* If SG_CTL_FLAGM_TAG_FOR_PACK_ID set on fd then next field is tag */
+	int pack_id;	/* pack_id, in v4 driver may be tag instead */
+	void __user *usr_ptr;	/* user provided pointer in v3+v4 interface */
+	unsigned int duration;
+	int unused;
+} sg_req_info_t;
+
+/*
+ * IOCTLs: Those ioctls that are relevant to the SG 3.x drivers follow.
+ * [Those that only apply to the SG 2.x drivers are at the end of the file.]
+ * (_GET_s yield result via 'int *' 3rd argument unless otherwise indicated)
+ */
+
+#define SG_EMULATED_HOST 0x2203	/* true for emulated host adapter (ATAPI) */
+
+/*
+ * Used to configure SCSI command transformation layer for ATAPI devices.
+ * Only supported by the ide-scsi driver. 20181014 No longer supported; this
+ * driver passes them to the mid-level, which returns an EINVAL (22) errno.
+ *
+ * Original note: N.B. 3rd arg is not pointer but value: 3rd arg = 0 to
+ * disable transform, 1 to enable it
+ */
+#define SG_SET_TRANSFORM 0x2204
+#define SG_GET_TRANSFORM 0x2205
+
+#define SG_SET_RESERVED_SIZE 0x2275  /* request new reserved buffer size */
+#define SG_GET_RESERVED_SIZE 0x2272  /* actual size of reserved buffer */
+
+/* The following ioctl has a 'sg_scsi_id_t *' object as its 3rd argument. */
+#define SG_GET_SCSI_ID 0x2276   /* Yields fd's bus, chan, dev, lun + type */
+/* SCSI id information can also be obtained from SCSI_IOCTL_GET_IDLUN */
+
+/* Override host setting and always DMA using low memory ( <16MB on i386) */
+#define SG_SET_FORCE_LOW_DMA 0x2279  /* 0-> use adapter setting, 1-> force */
+#define SG_GET_LOW_DMA 0x227a	/* 0-> use all ram for dma; 1-> low dma ram */
+
+/*
+ * When SG_SET_FORCE_PACK_ID set to 1, pack_id (or tag) is input to read() or
+ * ioctl(SG_IORECEIVE). These functions wait until matching packet (request/
+ * command) is finished but they will return with EAGAIN quickly if the file
+ * descriptor was opened O_NONBLOCK or (in v4) if SGV4_FLAG_IMMED is given.
+ * The tag is used when SG_CTL_FLAGM_TAG_FOR_PACK_ID is set on the parent
+ * file descriptor (default: use pack_id). If pack_id or tag is -1 then read
+ * oldest waiting and this is the same action as when FORCE_PACK_ID is
+ * clear on the parent file descriptor. In the v4 interface the pack_id is
+ * placed in the sg_io_v4::request_extra field.
+ */
+#define SG_SET_FORCE_PACK_ID 0x227b	/* pack_id or in v4 can be tag */
+#define SG_GET_PACK_ID 0x227c  /* Yields oldest readable pack_id/tag, or -1 */
+
+#define SG_GET_NUM_WAITING 0x227d /* Number of commands awaiting read() */
+
+/* Yields max scatter gather tablesize allowed by current host adapter */
+#define SG_GET_SG_TABLESIZE 0x227F  /* 0 implies can't do scatter gather */
+
+/*
+ * Integer form of version number: [x]xyyzz where [x] is empty when x=0.
+ * String form of version number: "[x]x.[y]y.zz"
+ */
+#define SG_GET_VERSION_NUM 0x2282 /* Example: version "2.1.34" yields 20134 */
+
+/* Returns -EBUSY if occupied. 3rd argument pointer to int (see next) */
+#define SG_SCSI_RESET 0x2284
+/*
+ * Associated values that can be given to SG_SCSI_RESET follow.
+ * SG_SCSI_RESET_NO_ESCALATE may be OR-ed to the _DEVICE, _TARGET, _BUS
+ * or _HOST reset value so only that action is attempted.
+ */
+#define		SG_SCSI_RESET_NOTHING	0
+#define		SG_SCSI_RESET_DEVICE	1
+#define		SG_SCSI_RESET_BUS	2
+#define		SG_SCSI_RESET_HOST	3
+#define		SG_SCSI_RESET_TARGET	4
+#define		SG_SCSI_RESET_NO_ESCALATE	0x100
+
+/* synchronous SCSI command ioctl, (for version 3 and 4 interface) */
+#define SG_IO 0x2285	/* similar effect as write() followed by read() */
+
+#define SG_GET_REQUEST_TABLE 0x2286	/* yields table of active requests */
+
+/* How to treat EINTR during SG_IO ioctl(), only in sg v3 and v4 driver */
+#define SG_SET_KEEP_ORPHAN 0x2287 /* 1 -> hold for read(), 0 -> drop (def) */
+#define SG_GET_KEEP_ORPHAN 0x2288
+
+/*
+ * Yields scsi midlevel's access_count for this SCSI device. 20181014 No
+ * longer available, always yields 1.
+ */
+#define SG_GET_ACCESS_COUNT 0x2289
+
+
+/*
+ * Default size (in bytes) a single scatter-gather list element can have.
+ * The value used by the driver is 'max(SG_SCATTER_SZ, PAGE_SIZE)'. This
+ * value should be a power of 2 (and may be rounded up internally). In the
+ * v4 driver this can be changed by ioctl(SG_SET_GET_EXTENDED{SGAT_ELEM_SZ}).
+ */
+#define SG_SCATTER_SZ (8 * 4096)
+
+/* sg driver users' code should handle retries (e.g. from Unit Attentions) */
+#define SG_DEFAULT_RETRIES 0
+
+/* Defaults, commented if they differ from original sg driver */
+#define SG_DEF_FORCE_PACK_ID 0
+#define SG_DEF_KEEP_ORPHAN 0
+#define SG_DEF_RESERVED_SIZE SG_SCATTER_SZ /* load time option */
+
+/*
+ * Maximum outstanding requests (i.e. write()s without corresponding read()s);
+ * write() yields EDOM if exceeded. This limit only applies prior to version
+ * 3.9. It is still used as the maximum number of sg_req_info objects
+ * that are returned by the SG_GET_REQUEST_TABLE ioctl.
+ */
+#define SG_MAX_QUEUE 16
+
+#define SG_BIG_BUFF SG_DEF_RESERVED_SIZE    /* for backward compatibility */
+
+/*
+ * Alternate style type names, "..._t" variants (as found in the
+ * 'typedef struct * {};' definitions above) are preferred to these:
+ */
+typedef struct sg_io_hdr Sg_io_hdr;
+typedef struct sg_io_vec Sg_io_vec;
+typedef struct sg_scsi_id Sg_scsi_id;
+typedef struct sg_req_info Sg_req_info;
+
+
+/* vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv */
+/*   The v1+v2 SG interface based on the 'sg_header' structure follows.   */
+/* ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ */
+
+#define SG_MAX_SENSE 16	/* this only applies to the sg_header interface */
+
+struct sg_header {
+	int pack_len;	/* [o] reply_len (ie useless), ignored as input */
+	int reply_len;	/* [i] max length of expected reply (inc. sg_header) */
+	int pack_id;	/* [io] id number of packet (use ints >= 0) */
+	int result;	/* [o] 0==ok, else (+ve) Unix errno (best ignored) */
+	unsigned int twelve_byte:1;
+	    /* [i] Force 12 byte command length for group 6 & 7 commands  */
+	unsigned int target_status:5;	/* [o] scsi status from target */
+	unsigned int host_status:8;	/* [o] host status (see "DID" codes) */
+	unsigned int driver_status:8;	/* [o] driver status+suggestion */
+	unsigned int other_flags:10;	/* unused */
+	unsigned char sense_buffer[SG_MAX_SENSE];
+	/*
+	 * [o] Output in 3 cases:
+	 *	when target_status is CHECK_CONDITION or
+	 *	when target_status is COMMAND_TERMINATED or
+	 *	when (driver_status & DRIVER_SENSE) is true.
+	 */
+};
+
+/*
+ * IOCTLs: The following are not required (or ignored) when the v3 or v4
+ * interface is used as those structures contain a timeout field. These
+ * ioctls are kept for backward compatibility with v1+v2 interfaces.
+ */
+
+#define SG_SET_TIMEOUT 0x2201  /* unit: (user space) jiffies */
+#define SG_GET_TIMEOUT 0x2202  /* yield timeout as _return_ value */
+
+/*
+ * Get/set command queuing state per fd (default is SG_DEF_COMMAND_Q).
+ * Each time a sg_io_hdr_t object is seen on this file descriptor, this
+ * command queuing flag is set on (overriding the previous setting).
+ * This setting defaults to 0 (i.e. no queuing) but gets set the first
+ * time that fd sees a v3 or v4 interface request.
+ */
+#define SG_GET_COMMAND_Q 0x2270   /* Yields 0 (queuing off) or 1 (on) */
+#define SG_SET_COMMAND_Q 0x2271   /* Change queuing state with 0 or 1 */
+
+/*
+ * Turn on/off error sense trace (1 and 0 respectively, default is off).
+ * Try using: "# cat /proc/scsi/sg/debug" instead in the v3 driver
+ */
+#define SG_SET_DEBUG 0x227e    /* 0 -> turn off debug */
+
+/*
+ * override SCSI command length with given number on the next write() on
+ * this file descriptor (v1 and v2 interface only)
+ */
+#define SG_NEXT_CMD_LEN 0x2283
+
+/* command queuing is always on when the v3 or v4 interface is used */
+#define SG_DEF_COMMAND_Q 0
+
+#define SG_DEF_UNDERRUN_FLAG 0
+
+/* If the timeout value in the v3_v4 interfaces is 0, this value is used */
+#define SG_DEFAULT_TIMEOUT	(60*HZ)	/* HZ == 'jiffies in 1 second' */
+
+#endif		/* end of _UAPI_SCSI_SG_H guard */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v18 08/83] sg: speed sg_poll and sg_get_num_waiting
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
                   ` (7 preceding siblings ...)
  2021-04-27 21:56 ` [PATCH v18 07/83] sg: move header to uapi section Douglas Gilbert
@ 2021-04-27 21:56 ` Douglas Gilbert
  2021-04-27 21:56 ` [PATCH v18 09/83] sg: sg_allow_if_err_recovery and renames Douglas Gilbert
                   ` (74 subsequent siblings)
  83 siblings, 0 replies; 86+ messages in thread
From: Douglas Gilbert @ 2021-04-27 21:56 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare, Hannes Reinecke

Track the number of submitted and waiting (for read/receive)
requests on each file descriptor with two atomic integers.
This speeds sg_poll() and ioctl(SG_GET_NUM_WAITING) which
are often used with the asynchronous (non-blocking) interfaces.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 54 +++++++++++++++++++++++------------------------
 1 file changed, 27 insertions(+), 27 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 42c5ffedf09b..3b760eb0d7ba 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -142,6 +142,8 @@ struct sg_fd {		/* holds the state of a file descriptor */
 	struct mutex f_mutex;	/* protect against changes in this fd */
 	int timeout;		/* defaults to SG_DEFAULT_TIMEOUT      */
 	int timeout_user;	/* defaults to SG_DEFAULT_TIMEOUT_USER */
+	atomic_t submitted;	/* number inflight or awaiting read */
+	atomic_t waiting;	/* number of requests awaiting read */
 	struct sg_scatter_hold reserve;	/* buffer for this file descriptor */
 	struct list_head rq_list; /* head of request list */
 	struct fasync_struct *async_qp;	/* used by asynchronous notification */
@@ -683,6 +685,8 @@ sg_common_write(struct sg_fd *sfp, struct sg_request *srp,
 	else
 		at_head = 1;
 
+	if (!blocking)
+		atomic_inc(&sfp->submitted);
 	srp->rq->timeout = timeout;
 	kref_get(&sfp->f_ref); /* sg_rq_end_io() does kref_put(). */
 	blk_execute_rq_nowait(sdp->disk, srp->rq, at_head, sg_rq_end_io);
@@ -1102,14 +1106,7 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 		read_unlock_irqrestore(&sfp->rq_list_lock, iflags);
 		return put_user(-1, ip);
 	case SG_GET_NUM_WAITING:
-		read_lock_irqsave(&sfp->rq_list_lock, iflags);
-		val = 0;
-		list_for_each_entry(srp, &sfp->rq_list, entry) {
-			if ((1 == srp->done) && (!srp->sg_io_owned))
-				++val;
-		}
-		read_unlock_irqrestore(&sfp->rq_list_lock, iflags);
-		return put_user(val, ip);
+		return put_user(atomic_read(&sfp->waiting), ip);
 	case SG_GET_SG_TABLESIZE:
 		return put_user(sdp->max_sgat_elems, ip);
 	case SG_SET_RESERVED_SIZE:
@@ -1281,35 +1278,26 @@ sg_compat_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg)
 }
 #endif
 
+/*
+ * Implements the poll(2) system call for this driver. Returns various EPOLL*
+ * flags OR-ed together.
+ */
 static __poll_t
 sg_poll(struct file *filp, poll_table * wait)
 {
 	__poll_t p_res = 0;
 	struct sg_fd *sfp = filp->private_data;
-	struct sg_request *srp;
-	int count = 0;
-	unsigned long iflags;
 
-	if (!sfp)
-		return EPOLLERR;
 	poll_wait(filp, &sfp->read_wait, wait);
-	read_lock_irqsave(&sfp->rq_list_lock, iflags);
-	list_for_each_entry(srp, &sfp->rq_list, entry) {
-		/* if any read waiting, flag it */
-		if (p_res == 0 && srp->done == 1 && !srp->sg_io_owned)
-			p_res = EPOLLIN | EPOLLRDNORM;
-		++count;
-	}
-	read_unlock_irqrestore(&sfp->rq_list_lock, iflags);
+	if (atomic_read(&sfp->waiting) > 0)
+		p_res = EPOLLIN | EPOLLRDNORM;
 
-	if (sfp->parentdp && SG_IS_DETACHING(sfp->parentdp)) {
+	if (unlikely(SG_IS_DETACHING(sfp->parentdp)))
 		p_res |= EPOLLHUP;
-	} else if (!sfp->cmd_q) {
-		if (count == 0)
-			p_res |= EPOLLOUT | EPOLLWRNORM;
-	} else if (count < SG_MAX_QUEUE) {
+	else if (likely(sfp->cmd_q))
+		p_res |= EPOLLOUT | EPOLLWRNORM;
+	else if (atomic_read(&sfp->submitted) == 0)
 		p_res |= EPOLLOUT | EPOLLWRNORM;
-	}
 	SG_LOG(3, sfp, "%s: p_res=0x%x\n", __func__, (__force u32)p_res);
 	return p_res;
 }
@@ -1494,6 +1482,8 @@ sg_rq_end_io(struct request *rq, blk_status_t status)
 
 	/* Rely on write phase to clean out srp status values, so no "else" */
 
+	if (!srp->sg_io_owned)
+		atomic_inc(&sfp->waiting);
 	/*
 	 * Free the request as soon as it is complete so that its resources
 	 * can be reused without waiting for userspace to read() the
@@ -1951,6 +1941,10 @@ sg_finish_rem_req(struct sg_request *srp)
 
 	SG_LOG(4, sfp, "%s: srp=0x%p%s\n", __func__, srp,
 	       (srp->res_used) ? " rsv" : "");
+	if (!srp->sg_io_owned) {
+		atomic_dec(&sfp->submitted);
+		atomic_dec(&sfp->waiting);
+	}
 	if (srp->bio)
 		ret = blk_rq_unmap_user(srp->bio);
 
@@ -2248,6 +2242,9 @@ sg_add_sfp(struct sg_device *sdp)
 	sfp->cmd_q = SG_DEF_COMMAND_Q;
 	sfp->keep_orphan = SG_DEF_KEEP_ORPHAN;
 	sfp->parentdp = sdp;
+	atomic_set(&sfp->submitted, 0);
+	atomic_set(&sfp->waiting, 0);
+
 	write_lock_irqsave(&sdp->sfd_lock, iflags);
 	if (SG_IS_DETACHING(sdp)) {
 		write_unlock_irqrestore(&sdp->sfd_lock, iflags);
@@ -2619,6 +2616,9 @@ sg_proc_debug_helper(struct seq_file *s, struct sg_device *sdp)
 		seq_printf(s, "   cmd_q=%d f_packid=%d k_orphan=%d closed=0\n",
 			   (int) fp->cmd_q, (int) fp->force_packid,
 			   (int) fp->keep_orphan);
+		seq_printf(s, "   submitted=%d waiting=%d\n",
+			   atomic_read(&fp->submitted),
+			   atomic_read(&fp->waiting));
 		list_for_each_entry(srp, &fp->rq_list, entry) {
 			hp = &srp->header;
 			new_interface = (hp->interface_id == '\0') ? 0 : 1;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v18 09/83] sg: sg_allow_if_err_recovery and renames
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
                   ` (8 preceding siblings ...)
  2021-04-27 21:56 ` [PATCH v18 08/83] sg: speed sg_poll and sg_get_num_waiting Douglas Gilbert
@ 2021-04-27 21:56 ` Douglas Gilbert
  2021-04-27 21:56 ` [PATCH v18 10/83] sg: improve naming Douglas Gilbert
                   ` (73 subsequent siblings)
  83 siblings, 0 replies; 86+ messages in thread
From: Douglas Gilbert @ 2021-04-27 21:56 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare, Hannes Reinecke

Add sg_allow_if_err_recovery() to do checks common to several entry
points. Replace retval with either res or ret. Rename
sg_finish_rem_req() to sg_finish_scsi_blk_rq(). Rename
sg_new_write() to sg_submit(). Other cleanups triggered by
checkpatch.pl.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 240 +++++++++++++++++++++++++---------------------
 1 file changed, 130 insertions(+), 110 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 3b760eb0d7ba..588e4c05c6c9 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -68,7 +68,7 @@ static char *sg_version_date = "20190606";
 
 /* SG_MAX_CDB_SIZE should be 260 (spc4r37 section 3.1.30) however the type
  * of sg_io_hdr::cmd_len can only represent 255. All SCSI commands greater
- * than 16 bytes are "variable length" whose length is a multiple of 4
+ * than 16 bytes are "variable length" whose length is a multiple of 4, so:
  */
 #define SG_MAX_CDB_SIZE 252
 
@@ -178,16 +178,16 @@ static void sg_rq_end_io(struct request *rq, blk_status_t status);
 /* Declarations of other static functions used before they are defined */
 static int sg_proc_init(void);
 static int sg_start_req(struct sg_request *srp, u8 *cmd);
-static int sg_finish_rem_req(struct sg_request *srp);
+static int sg_finish_scsi_blk_rq(struct sg_request *srp);
 static int sg_build_indirect(struct sg_scatter_hold *schp, struct sg_fd *sfp,
 			     int buff_size);
-static ssize_t sg_new_write(struct sg_fd *sfp, struct file *file,
-			    const char __user *buf, size_t count, int blocking,
-			    int read_only, int sg_io_owned,
-			    struct sg_request **o_srp);
+static ssize_t sg_submit(struct sg_fd *sfp, struct file *filp,
+			 const char __user *buf, size_t count, bool blocking,
+			 bool read_only, bool sg_io_owned,
+			 struct sg_request **o_srp);
 static int sg_common_write(struct sg_fd *sfp, struct sg_request *srp,
 			   u8 *cmnd, int timeout, int blocking);
-static int sg_rd_append(struct sg_request *srp, char __user *outp,
+static int sg_rd_append(struct sg_request *srp, void __user *outp,
 			int num_xfer);
 static void sg_remove_scat(struct sg_fd *sfp, struct sg_scatter_hold *schp);
 static void sg_build_reserve(struct sg_fd *sfp, int req_size);
@@ -275,37 +275,60 @@ sg_check_file_access(struct file *filp, const char *caller)
 static int
 sg_wait_open_event(struct sg_device *sdp, bool o_excl)
 {
-	int retval = 0;
+	int res = 0;
 
 	if (o_excl) {
 		while (atomic_read(&sdp->open_cnt) > 0) {
 			mutex_unlock(&sdp->open_rel_lock);
-			retval = wait_event_interruptible(sdp->open_wait,
-					(SG_IS_DETACHING(sdp) ||
-					 atomic_read(&sdp->open_cnt) == 0));
+			res = wait_event_interruptible
+					(sdp->open_wait,
+					 (SG_IS_DETACHING(sdp) ||
+					  atomic_read(&sdp->open_cnt) == 0));
 			mutex_lock(&sdp->open_rel_lock);
 
-			if (retval) /* -ERESTARTSYS */
-				return retval;
+			if (res) /* -ERESTARTSYS */
+				return res;
 			if (SG_IS_DETACHING(sdp))
 				return -ENODEV;
 		}
 	} else {
 		while (SG_HAVE_EXCLUDE(sdp)) {
 			mutex_unlock(&sdp->open_rel_lock);
-			retval = wait_event_interruptible(sdp->open_wait,
-					(SG_IS_DETACHING(sdp) ||
-					 !SG_HAVE_EXCLUDE(sdp)));
+			res = wait_event_interruptible
+					(sdp->open_wait,
+					 (SG_IS_DETACHING(sdp) ||
+					  !SG_HAVE_EXCLUDE(sdp)));
 			mutex_lock(&sdp->open_rel_lock);
 
-			if (retval) /* -ERESTARTSYS */
-				return retval;
+			if (res) /* -ERESTARTSYS */
+				return res;
 			if (SG_IS_DETACHING(sdp))
 				return -ENODEV;
 		}
 	}
 
-	return retval;
+	return res;
+}
+
+/*
+ * scsi_block_when_processing_errors() returns 0 when dev was taken offline by
+ * error recovery, 1 otherwise (i.e. okay). Even if in error recovery, let
+ * user continue if O_NONBLOCK set. Permits SCSI commands to be issued during
+ * error recovery. Tread carefully.
+ * Returns 0 to allow, -EPROTO if !sdp, -ENODEV if detaching, else -ENXIO.
+ */
+static inline int
+sg_allow_if_err_recovery(struct sg_device *sdp, bool non_block)
+{
+	if (!sdp)
+		return -EPROTO;
+	if (SG_IS_DETACHING(sdp))
+		return -ENODEV;
+	if (non_block)
+		return 0;
+	if (likely(scsi_block_when_processing_errors(sdp->device)))
+		return 0;
+	return -ENXIO;
 }
 
 /*
@@ -318,16 +341,17 @@ sg_wait_open_event(struct sg_device *sdp, bool o_excl)
 static int
 sg_open(struct inode *inode, struct file *filp)
 {
-	bool o_excl;
+	bool o_excl, non_block;
 	int min_dev = iminor(inode);
 	int op_flags = filp->f_flags;
+	int res;
 	struct request_queue *q;
 	struct sg_device *sdp;
 	struct sg_fd *sfp;
-	int retval;
 
 	nonseekable_open(inode, filp);
 	o_excl = !!(op_flags & O_EXCL);
+	non_block = !!(op_flags & O_NONBLOCK);
 	if (o_excl && ((op_flags & O_ACCMODE) == O_RDONLY))
 		return -EPERM; /* Can't lock it with read only access */
 	sdp = sg_get_dev(min_dev);	/* increments sdp->d_ref */
@@ -336,20 +360,23 @@ sg_open(struct inode *inode, struct file *filp)
 
 	/* This driver's module count bumped by fops_get in <linux/fs.h> */
 	/* Prevent the device driver from vanishing while we sleep */
-	retval = scsi_device_get(sdp->device);
-	if (retval)
+	res = scsi_device_get(sdp->device);
+	if (res)
 		goto sg_put;
 
-	retval = scsi_autopm_get_device(sdp->device);
-	if (retval)
+	res = scsi_autopm_get_device(sdp->device);
+	if (res)
 		goto sdp_put;
 
+	res = sg_allow_if_err_recovery(sdp, non_block);
+	if (res)
+		goto error_out;
 	/* scsi_block_when_processing_errors() may block so bypass
 	 * check if O_NONBLOCK. Permits SCSI commands to be issued
 	 * during error recovery. Tread carefully. */
 	if (!((op_flags & O_NONBLOCK) ||
 	      scsi_block_when_processing_errors(sdp->device))) {
-		retval = -ENXIO;
+		res = -ENXIO;
 		/* we are in error recovery for this device */
 		goto error_out;
 	}
@@ -358,18 +385,18 @@ sg_open(struct inode *inode, struct file *filp)
 	if (op_flags & O_NONBLOCK) {
 		if (o_excl) {
 			if (atomic_read(&sdp->open_cnt) > 0) {
-				retval = -EBUSY;
+				res = -EBUSY;
 				goto error_mutex_locked;
 			}
 		} else {
 			if (SG_HAVE_EXCLUDE(sdp)) {
-				retval = -EBUSY;
+				res = -EBUSY;
 				goto error_mutex_locked;
 			}
 		}
 	} else {
-		retval = sg_wait_open_event(sdp, o_excl);
-		if (retval) /* -ERESTARTSYS or -ENODEV */
+		res = sg_wait_open_event(sdp, o_excl);
+		if (res) /* -ERESTARTSYS or -ENODEV */
 			goto error_mutex_locked;
 	}
 
@@ -384,7 +411,7 @@ sg_open(struct inode *inode, struct file *filp)
 	}
 	sfp = sg_add_sfp(sdp);		/* increments sdp->d_ref */
 	if (IS_ERR(sfp)) {
-		retval = PTR_ERR(sfp);
+		res = PTR_ERR(sfp);
 		goto out_undo;
 	}
 
@@ -396,11 +423,11 @@ sg_open(struct inode *inode, struct file *filp)
 	       atomic_read(&sdp->open_cnt),
 	       ((op_flags & O_NONBLOCK) ? " O_NONBLOCK" : ""));
 
-	retval = 0;
+	res = 0;
 sg_put:
 	kref_put(&sdp->d_ref, sg_device_destroy);
 	/* if success, sdp->d_ref is incremented twice, decremented once */
-	return retval;
+	return res;
 
 out_undo:
 	if (o_excl) {		/* undo if error */
@@ -449,40 +476,34 @@ sg_release(struct inode *inode, struct file *filp)
 static ssize_t
 sg_write(struct file *filp, const char __user *buf, size_t count, loff_t * ppos)
 {
-	int mxsize, cmd_size, k;
-	int input_size, blocking;
+	bool blocking = !(filp->f_flags & O_NONBLOCK);
 	u8 opcode;
+	int mxsize, cmd_size, input_size, res;
 	struct sg_device *sdp;
 	struct sg_fd *sfp;
 	struct sg_request *srp;
 	struct sg_header old_hdr;
 	sg_io_hdr_t *hp;
 	u8 cmnd[SG_MAX_CDB_SIZE];
-	int retval;
 
-	retval = sg_check_file_access(filp, __func__);
-	if (retval)
-		return retval;
+	res = sg_check_file_access(filp, __func__);
+	if (res)
+		return res;
 
 	sfp = filp->private_data;
 	sdp = sfp->parentdp;
 	SG_LOG(3, sfp, "%s: write(3rd arg) count=%d\n", __func__, (int)count);
-	if (!sdp)
-		return -ENXIO;
-	if (SG_IS_DETACHING(sdp))
-		return -ENODEV;
-	if (!((filp->f_flags & O_NONBLOCK) ||
-	      scsi_block_when_processing_errors(sdp->device)))
-		return -ENXIO;
+	res = sg_allow_if_err_recovery(sdp, !blocking);
+	if (res)
+		return res;
 
 	if (count < SZ_SG_HEADER)
 		return -EIO;
 	if (copy_from_user(&old_hdr, buf, SZ_SG_HEADER))
 		return -EFAULT;
-	blocking = !(filp->f_flags & O_NONBLOCK);
 	if (old_hdr.reply_len < 0)
-		return sg_new_write(sfp, filp, buf, count,
-				    blocking, 0, 0, NULL);
+		return sg_submit(sfp, filp, buf, count, blocking, false, false,
+				 NULL);
 	if (count < (SZ_SG_HEADER + 6))
 		return -EIO;	/* The minimum scsi command length is 6 bytes. */
 
@@ -554,8 +575,8 @@ sg_write(struct file *filp, const char __user *buf, size_t count, loff_t * ppos)
 				   input_size, (unsigned int) cmnd[0],
 				   current->comm);
 	}
-	k = sg_common_write(sfp, srp, cmnd, sfp->timeout, blocking);
-	return (k < 0) ? k : count;
+	res = sg_common_write(sfp, srp, cmnd, sfp->timeout, blocking);
+	return (res < 0) ? res : count;
 }
 
 static int
@@ -570,9 +591,9 @@ sg_allow_access(struct file *filp, u8 *cmd)
 }
 
 static ssize_t
-sg_new_write(struct sg_fd *sfp, struct file *file, const char __user *buf,
-	     size_t count, int blocking, int read_only, int sg_io_owned,
-	     struct sg_request **o_srp)
+sg_submit(struct sg_fd *sfp, struct file *filp, const char __user *buf,
+	  size_t count, bool blocking, bool read_only, bool sg_io_owned,
+	  struct sg_request **o_srp)
 {
 	int k;
 	struct sg_request *srp;
@@ -623,7 +644,7 @@ sg_new_write(struct sg_fd *sfp, struct file *file, const char __user *buf,
 		sg_remove_request(sfp, srp);
 		return -EFAULT;
 	}
-	if (read_only && sg_allow_access(file, cmnd)) {
+	if (read_only && sg_allow_access(filp, cmnd)) {
 		sg_remove_request(sfp, srp);
 		return -EPERM;
 	}
@@ -662,7 +683,7 @@ sg_common_write(struct sg_fd *sfp, struct sg_request *srp,
 	k = sg_start_req(srp, cmnd);
 	if (k) {
 		SG_LOG(1, sfp, "%s: start_req err=%d\n", __func__, k);
-		sg_finish_rem_req(srp);
+		sg_finish_scsi_blk_rq(srp);
 		sg_remove_request(sfp, srp);
 		return k;	/* probably out of space --> ENOMEM */
 	}
@@ -673,7 +694,7 @@ sg_common_write(struct sg_fd *sfp, struct sg_request *srp,
 			srp->rq = NULL;
 		}
 
-		sg_finish_rem_req(srp);
+		sg_finish_scsi_blk_rq(srp);
 		sg_remove_request(sfp, srp);
 		return -ENODEV;
 	}
@@ -758,7 +779,7 @@ sg_new_read(struct sg_fd *sfp, char __user *buf, size_t count,
 		hp->info |= SG_INFO_CHECK;
 	err = put_sg_io_hdr(hp, buf);
 err_out:
-	err2 = sg_finish_rem_req(srp);
+	err2 = sg_finish_scsi_blk_rq(srp);
 	sg_remove_request(sfp, srp);
 	return err ? : err2 ? : count;
 }
@@ -782,23 +803,24 @@ sg_read(struct file *filp, char __user *buf, size_t count, loff_t *ppos)
 	struct sg_fd *sfp;
 	struct sg_request *srp;
 	int req_pack_id = -1;
+	int ret = 0;
 	sg_io_hdr_t *hp;
 	struct sg_header *old_hdr = NULL;
-	int retval = 0;
 
 	/*
 	 * This could cause a response to be stranded. Close the associated
 	 * file descriptor to free up any resources being held.
 	 */
-	retval = sg_check_file_access(filp, __func__);
-	if (retval)
-		return retval;
+	ret = sg_check_file_access(filp, __func__);
+	if (ret)
+		return ret;
 
 	sfp = filp->private_data;
 	sdp = sfp->parentdp;
 	SG_LOG(3, sfp, "%s: read() count=%d\n", __func__, (int)count);
-	if (!sdp)
-		return -ENXIO;
+	ret = sg_allow_if_err_recovery(sdp, false);
+	if (ret)
+		return ret;
 
 	if (!access_ok(buf, count))
 		return -EFAULT;
@@ -807,7 +829,7 @@ sg_read(struct file *filp, char __user *buf, size_t count, loff_t *ppos)
 		if (!old_hdr)
 			return -ENOMEM;
 		if (copy_from_user(old_hdr, buf, SZ_SG_HEADER)) {
-			retval = -EFAULT;
+			ret = -EFAULT;
 			goto free_old_hdr;
 		}
 		if (old_hdr->reply_len < 0) {
@@ -816,15 +838,15 @@ sg_read(struct file *filp, char __user *buf, size_t count, loff_t *ppos)
 
 				new_hdr = kmalloc(SZ_SG_IO_HDR, GFP_KERNEL);
 				if (!new_hdr) {
-					retval = -ENOMEM;
+					ret = -ENOMEM;
 					goto free_old_hdr;
 				}
-				retval = copy_from_user
+				ret = copy_from_user
 				    (new_hdr, buf, SZ_SG_IO_HDR);
 				req_pack_id = new_hdr->pack_id;
 				kfree(new_hdr);
-				if (retval) {
-					retval = -EFAULT;
+				if (ret) {
+					ret = -EFAULT;
 					goto free_old_hdr;
 				}
 			}
@@ -835,28 +857,28 @@ sg_read(struct file *filp, char __user *buf, size_t count, loff_t *ppos)
 	srp = sg_get_rq_mark(sfp, req_pack_id);
 	if (!srp) {		/* now wait on packet to arrive */
 		if (SG_IS_DETACHING(sdp)) {
-			retval = -ENODEV;
+			ret = -ENODEV;
 			goto free_old_hdr;
 		}
 		if (filp->f_flags & O_NONBLOCK) {
-			retval = -EAGAIN;
+			ret = -EAGAIN;
 			goto free_old_hdr;
 		}
-		retval = wait_event_interruptible
+		ret = wait_event_interruptible
 				(sfp->read_wait,
 				 (SG_IS_DETACHING(sdp) ||
 				  (srp = sg_get_rq_mark(sfp, req_pack_id))));
 		if (SG_IS_DETACHING(sdp)) {
-			retval = -ENODEV;
+			ret = -ENODEV;
 			goto free_old_hdr;
 		}
-		if (retval) {
+		if (ret) {
 			/* -ERESTARTSYS as signal hit process */
 			goto free_old_hdr;
 		}
 	}
 	if (srp->header.interface_id != '\0') {
-		retval = sg_new_read(sfp, buf, count, srp);
+		ret = sg_new_read(sfp, buf, count, srp);
 		goto free_old_hdr;
 	}
 
@@ -864,7 +886,7 @@ sg_read(struct file *filp, char __user *buf, size_t count, loff_t *ppos)
 	if (!old_hdr) {
 		old_hdr = kmalloc(SZ_SG_HEADER, GFP_KERNEL);
 		if (!old_hdr) {
-			retval = -ENOMEM;
+			ret = -ENOMEM;
 			goto free_old_hdr;
 		}
 	}
@@ -915,7 +937,7 @@ sg_read(struct file *filp, char __user *buf, size_t count, loff_t *ppos)
 	/* Now copy the result back to the user buffer.  */
 	if (count >= SZ_SG_HEADER) {
 		if (copy_to_user(buf, old_hdr, SZ_SG_HEADER)) {
-			retval = -EFAULT;
+			ret = -EFAULT;
 			goto free_old_hdr;
 		}
 		buf += SZ_SG_HEADER;
@@ -923,19 +945,19 @@ sg_read(struct file *filp, char __user *buf, size_t count, loff_t *ppos)
 			count = old_hdr->reply_len;
 		if (count > SZ_SG_HEADER) {
 			if (sg_rd_append(srp, buf, count - SZ_SG_HEADER)) {
-				retval = -EFAULT;
+				ret = -EFAULT;
 				goto free_old_hdr;
 			}
 		}
 	} else {
 		count = (old_hdr->result == 0) ? 0 : -EIO;
 	}
-	sg_finish_rem_req(srp);
+	sg_finish_scsi_blk_rq(srp);
 	sg_remove_request(sfp, srp);
-	retval = count;
+	ret = count;
 free_old_hdr:
 	kfree(old_hdr);
-	return retval;
+	return ret;
 }
 
 static int
@@ -1023,12 +1045,11 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 
 	switch (cmd_in) {
 	case SG_IO:
-		if (SG_IS_DETACHING(sdp))
-			return -ENODEV;
-		if (!scsi_block_when_processing_errors(sdp->device))
-			return -ENXIO;
-		result = sg_new_write(sfp, filp, p, SZ_SG_IO_HDR,
-				 1, read_only, 1, &srp);
+		result = sg_allow_if_err_recovery(sdp, false);
+		if (result)
+			return result;
+		result = sg_submit(sfp, filp, p, SZ_SG_IO_HDR, true, read_only,
+				   true, &srp);
 		if (result < 0)
 			return result;
 		result = wait_event_interruptible(sfp->read_wait,
@@ -1228,8 +1249,7 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 		break;
 	}
 
-	result = scsi_ioctl_block_when_processing_errors(sdp->device,
-			cmd_in, filp->f_flags & O_NDELAY);
+	result = sg_allow_if_err_recovery(sdp, filp->f_flags & O_NDELAY);
 	if (result)
 		return result;
 
@@ -1409,7 +1429,7 @@ sg_rq_end_io_usercontext(struct work_struct *work)
 	struct sg_request *srp = container_of(work, struct sg_request, ew.work);
 	struct sg_fd *sfp = srp->parentfp;
 
-	sg_finish_rem_req(srp);
+	sg_finish_scsi_blk_rq(srp);
 	sg_remove_request(sfp, srp);
 	kref_put(&sfp->f_ref, sg_remove_sfp);
 }
@@ -1932,7 +1952,7 @@ sg_start_req(struct sg_request *srp, u8 *cmd)
 }
 
 static int
-sg_finish_rem_req(struct sg_request *srp)
+sg_finish_scsi_blk_rq(struct sg_request *srp)
 {
 	int ret = 0;
 
@@ -2077,7 +2097,7 @@ sg_remove_scat(struct sg_fd *sfp, struct sg_scatter_hold *schp)
  * appended to given struct sg_header object.
  */
 static int
-sg_rd_append(struct sg_request *srp, char __user *outp, int num_xfer)
+sg_rd_append(struct sg_request *srp, void __user *outp, int num_xfer)
 {
 	struct sg_scatter_hold *schp = &srp->data;
 	int k, num;
@@ -2280,7 +2300,7 @@ sg_remove_sfp_usercontext(struct work_struct *work)
 	write_lock_irqsave(&sfp->rq_list_lock, iflags);
 	while (!list_empty(&sfp->rq_list)) {
 		srp = list_first_entry(&sfp->rq_list, struct sg_request, entry);
-		sg_finish_rem_req(srp);
+		sg_finish_scsi_blk_rq(srp);
 		list_del(&srp->entry);
 		srp->parentfp = NULL;
 	}
@@ -2358,7 +2378,7 @@ sg_get_dev(int dev)
 #if IS_ENABLED(CONFIG_SCSI_PROC_FS)     /* long, almost to end of file */
 static int sg_proc_seq_show_int(struct seq_file *s, void *v);
 
-static int sg_proc_single_open_adio(struct inode *inode, struct file *file);
+static int sg_proc_single_open_adio(struct inode *inode, struct file *filp);
 static ssize_t sg_proc_write_adio(struct file *filp, const char __user *buffer,
 			          size_t count, loff_t *off);
 static const struct proc_ops adio_proc_ops = {
@@ -2369,7 +2389,7 @@ static const struct proc_ops adio_proc_ops = {
 	.proc_release	= single_release,
 };
 
-static int sg_proc_single_open_dressz(struct inode *inode, struct file *file);
+static int sg_proc_single_open_dressz(struct inode *inode, struct file *filp);
 static ssize_t sg_proc_write_dressz(struct file *filp, 
 		const char __user *buffer, size_t count, loff_t *off);
 static const struct proc_ops dressz_proc_ops = {
@@ -2418,13 +2438,13 @@ sg_proc_init(void)
 	if (!p)
 		return 1;
 
-	proc_create("allow_dio", S_IRUGO | S_IWUSR, p, &adio_proc_ops);
-	proc_create_seq("debug", S_IRUGO, p, &debug_seq_ops);
-	proc_create("def_reserved_size", S_IRUGO | S_IWUSR, p, &dressz_proc_ops);
-	proc_create_single("device_hdr", S_IRUGO, p, sg_proc_seq_show_devhdr);
-	proc_create_seq("devices", S_IRUGO, p, &dev_seq_ops);
-	proc_create_seq("device_strs", S_IRUGO, p, &devstrs_seq_ops);
-	proc_create_single("version", S_IRUGO, p, sg_proc_seq_show_version);
+	proc_create("allow_dio", 0644, p, &adio_proc_ops);
+	proc_create_seq("debug", 0444, p, &debug_seq_ops);
+	proc_create("def_reserved_size", 0644, p, &dressz_proc_ops);
+	proc_create_single("device_hdr", 0444, p, sg_proc_seq_show_devhdr);
+	proc_create_seq("devices", 0444, p, &dev_seq_ops);
+	proc_create_seq("device_strs", 0444, p, &devstrs_seq_ops);
+	proc_create_single("version", 0444, p, sg_proc_seq_show_version);
 	return 0;
 }
 
@@ -2448,9 +2468,9 @@ sg_proc_seq_show_int(struct seq_file *s, void *v)
 }
 
 static int
-sg_proc_single_open_adio(struct inode *inode, struct file *file)
+sg_proc_single_open_adio(struct inode *inode, struct file *filp)
 {
-	return single_open(file, sg_proc_seq_show_int, &sg_allow_dio);
+	return single_open(filp, sg_proc_seq_show_int, &sg_allow_dio);
 }
 
 static ssize_t 
@@ -2470,9 +2490,9 @@ sg_proc_write_adio(struct file *filp, const char __user *buffer,
 }
 
 static int
-sg_proc_single_open_dressz(struct inode *inode, struct file *file)
+sg_proc_single_open_dressz(struct inode *inode, struct file *filp)
 {
-	return single_open(file, sg_proc_seq_show_int, &sg_big_buff);
+	return single_open(filp, sg_proc_seq_show_int, &sg_big_buff);
 }
 
 static ssize_t 
@@ -2534,7 +2554,7 @@ dev_seq_start(struct seq_file *s, loff_t *pos)
 static void *
 dev_seq_next(struct seq_file *s, void *v, loff_t *pos)
 {
-	struct sg_proc_deviter * it = s->private;
+	struct sg_proc_deviter *it = s->private;
 
 	*pos = ++it->index;
 	return (it->index < it->max) ? it : NULL;
@@ -2549,7 +2569,7 @@ dev_seq_stop(struct seq_file *s, void *v)
 static int
 sg_proc_seq_show_dev(struct seq_file *s, void *v)
 {
-	struct sg_proc_deviter * it = (struct sg_proc_deviter *) v;
+	struct sg_proc_deviter *it = (struct sg_proc_deviter *)v;
 	struct sg_device *sdp;
 	struct scsi_device *scsidp;
 	unsigned long iflags;
@@ -2575,7 +2595,7 @@ sg_proc_seq_show_dev(struct seq_file *s, void *v)
 static int
 sg_proc_seq_show_devstrs(struct seq_file *s, void *v)
 {
-	struct sg_proc_deviter * it = (struct sg_proc_deviter *) v;
+	struct sg_proc_deviter *it = (struct sg_proc_deviter *)v;
 	struct sg_device *sdp;
 	struct scsi_device *scsidp;
 	unsigned long iflags;
@@ -2663,7 +2683,7 @@ sg_proc_debug_helper(struct seq_file *s, struct sg_device *sdp)
 static int
 sg_proc_seq_show_debug(struct seq_file *s, void *v)
 {
-	struct sg_proc_deviter * it = (struct sg_proc_deviter *) v;
+	struct sg_proc_deviter *it = (struct sg_proc_deviter *)v;
 	struct sg_device *sdp;
 	unsigned long iflags;
 
-- 
2.25.1

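The patch above folds three checks that sg_write() used to perform inline into sg_allow_if_err_recovery(sdp, non_block). Those checks were: a missing device gives -ENXIO, a detaching device gives -ENODEV, and a blocking caller whose device does not come back from error recovery gives -ENXIO. The helper's body is not shown in this excerpt, so the following is only a user-space sketch of the removed sg_write() logic. struct model_sg_device and model_allow_if_err_recovery() are hypothetical stand-ins, and the device_usable flag models the return value of scsi_block_when_processing_errors().

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sg_device: only the state the check reads. */
struct model_sg_device {
	bool detaching;		/* models SG_IS_DETACHING(sdp) */
	bool device_usable;	/* models scsi_block_when_processing_errors() */
};

/*
 * Mirrors the checks removed from sg_write(). It is an assumption that the
 * real sg_allow_if_err_recovery() behaves the same way for this path.
 */
int model_allow_if_err_recovery(const struct model_sg_device *sdp,
				bool non_block)
{
	if (!sdp)
		return -ENXIO;		/* no parent device */
	if (sdp->detaching)
		return -ENODEV;		/* device is being removed */
	if (non_block)
		return 0;		/* O_NONBLOCK skips the error-recovery wait */
	return sdp->device_usable ? 0 : -ENXIO;
}
```

Because non_block short-circuits before the error-recovery check, an O_NONBLOCK writer is not refused merely because recovery is in progress. This matches the removed `(filp->f_flags & O_NONBLOCK) || ...` expression. sg_read() and the SG_IO path pass false, so they always take the blocking branch.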


* [PATCH v18 10/83] sg: improve naming
From: Douglas Gilbert @ 2021-04-27 21:56 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

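Remove uses of the typedef sg_io_hdr_t and replace them with
struct sg_io_hdr. Rename some fields of driver-wide structures
(e.g. bufflen -> buflen, k_use_sg -> num_sgat, sfd_siblings ->
sfd_entry, ew -> ew_orph and ew_fd) and comment them. Rename
SG_SECTOR_SZ to SG_DEF_SECTOR_SZ. Rename sg_alloc() to
sg_add_device_helper() to reflect its current role. Rename
sg_add_request() to sg_setup_req() to indicate that it precedes
sg_start_req().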

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 248 ++++++++++++++++++++++++----------------------
 1 file changed, 131 insertions(+), 117 deletions(-)

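In sg_write(), this patch distinguishes the two header layouts before doing anything else. The first SZ_SG_HEADER bytes are read as a v2 struct sg_header, and a negative reply_len is taken to mean a v3 struct sg_io_hdr, whose interface_id must then be 'S' (otherwise the write fails with -EPERM). This works because reply_len sits at the same offset as the v3 dxfer_direction field, and every SG_DXFER_* direction value is negative. The sketch below shows that reasoning in user space and omits the count checks. The model_* structures mirror only the leading fields of the <scsi/sg.h> definitions and, like model_classify_write(), are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Leading fields of struct sg_header (v2); the rest are omitted. */
struct model_sg_header {
	int pack_len;
	int reply_len;
	int pack_id;
	int result;
};

/* Leading fields of struct sg_io_hdr (v3); the rest are omitted. */
struct model_sg_io_hdr {
	int interface_id;	/* 'S' for the v3 interface */
	int dxfer_direction;	/* SG_DXFER_* values are all negative */
};

#define MODEL_SG_DXFER_FROM_DEV (-3)	/* same value as SG_DXFER_FROM_DEV */

/* The v2 reply_len overlays the v3 dxfer_direction. */
_Static_assert(offsetof(struct model_sg_header, reply_len) ==
	       offsetof(struct model_sg_io_hdr, dxfer_direction),
	       "reply_len must overlay dxfer_direction");

/*
 * buf must hold at least sizeof(struct model_sg_header) bytes.
 * Returns 0 for v2, 1 for v3, -1 for a negative reply_len without 'S'
 * (the case the patch rejects with -EPERM).
 */
int model_classify_write(const void *buf)
{
	struct model_sg_header v2;
	struct model_sg_io_hdr v3;

	memcpy(&v2, buf, sizeof(v2));
	if (v2.reply_len >= 0)
		return 0;		/* v2: reply_len is a real length */
	memcpy(&v3, buf, sizeof(v3));	/* reinterpret the same bytes */
	return v3.interface_id == 'S' ? 1 : -1;
}
```

Since a v2 caller always supplies a non-negative reply_len, the driver can classify the buffer from its leading fields without knowing in advance which structure user space passed.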
diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 588e4c05c6c9..592048f7e430 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -92,7 +92,7 @@ static int sg_allow_dio = SG_ALLOW_DIO_DEF;
 static int scatter_elem_sz = SG_SCATTER_SZ;
 static int scatter_elem_sz_prev = SG_SCATTER_SZ;
 
-#define SG_SECTOR_SZ 512
+#define SG_DEF_SECTOR_SZ 512
 
 static int sg_add_device(struct device *, struct class_interface *);
 static void sg_remove_device(struct device *, struct class_interface *);
@@ -105,12 +105,13 @@ static struct class_interface sg_interface = {
 	.remove_dev     = sg_remove_device,
 };
 
-struct sg_scatter_hold { /* holding area for scsi scatter gather info */
-	u16 k_use_sg; /* Count of kernel scatter-gather pieces */
+struct sg_scatter_hold {     /* holding area for scsi scatter gather info */
+	struct page **pages;	/* num_sgat element array of struct page* */
+	int buflen;		/* capacity in bytes (dlen<=buflen) */
+	int dlen;		/* current valid data length of this req */
+	u16 page_order;		/* byte_len = (page_size*(2**page_order)) */
+	u16 num_sgat;		/* actual number of scatter-gather segments */
 	unsigned int sglist_len; /* size of malloc'd scatter-gather list ++ */
-	unsigned int bufflen;	/* Size of (aggregate) data buffer */
-	struct page **pages;
-	int page_order;
 	char dio_in_use;	/* 0->indirect IO (or mmap), 1->dio */
 	u8 cmd_opcode;		/* first byte of command */
 };
@@ -122,20 +123,20 @@ struct sg_request {	/* SG_MAX_QUEUE requests outstanding per file */
 	struct list_head entry;	/* list entry */
 	struct sg_fd *parentfp;	/* NULL -> not in use */
 	struct sg_scatter_hold data;	/* hold buffer, perhaps scatter list */
-	sg_io_hdr_t header;	/* scsi command+info, see <scsi/sg.h> */
+	struct sg_io_hdr header;  /* scsi command+info, see <scsi/sg.h> */
 	u8 sense_b[SCSI_SENSE_BUFFERSIZE];
 	char res_used;		/* 1 -> using reserve buffer, 0 -> not ... */
 	char orphan;		/* 1 -> drop on sight, 0 -> normal */
 	char sg_io_owned;	/* 1 -> packet belongs to SG_IO */
 	/* done protected by rq_list_lock */
 	char done;		/* 0->before bh, 1->before read, 2->read */
-	struct request *rq;
-	struct bio *bio;
-	struct execute_work ew;
+	struct request *rq;	/* released in sg_rq_end_io(), bio kept */
+	struct bio *bio;	/* kept until this req -->SG_RS_INACTIVE */
+	struct execute_work ew_orph;	/* harvest orphan request */
 };
 
 struct sg_fd {		/* holds the state of a file descriptor */
-	struct list_head sfd_siblings;  /* protected by device's sfd_lock */
+	struct list_head sfd_entry;	/* member sg_device::sfds list */
 	struct sg_device *parentdp;	/* owning device */
 	wait_queue_head_t read_wait;	/* queue read until command done */
 	rwlock_t rq_list_lock;	/* protect access to list in req_arr */
@@ -155,7 +156,7 @@ struct sg_fd {		/* holds the state of a file descriptor */
 	char mmap_called;	/* 0 -> mmap() never called on this fd */
 	char res_in_use;	/* 1 -> 'reserve' array in use */
 	struct kref f_ref;
-	struct execute_work ew;
+	struct execute_work ew_fd;  /* harvest all fd resources and lists */
 };
 
 struct sg_device { /* holds the state of each scsi generic device */
@@ -164,7 +165,7 @@ struct sg_device { /* holds the state of each scsi generic device */
 	struct mutex open_rel_lock;     /* held when in open() or release() */
 	struct list_head sfds;
 	rwlock_t sfd_lock;      /* protect access to sfd list */
-	int max_sgat_elems;	/* adapter's max sgat number of elements */
+	int max_sgat_sz;	/* max number of bytes in sgat list */
 	u32 index;		/* device index number */
 	atomic_t open_cnt;	/* count of opens (perhaps < num(sfds) ) */
 	unsigned long fdev_bm[1];	/* see SG_FDEV_* defines above */
@@ -196,7 +197,7 @@ static void sg_link_reserve(struct sg_fd *sfp, struct sg_request *srp,
 static void sg_unlink_reserve(struct sg_fd *sfp, struct sg_request *srp);
 static struct sg_fd *sg_add_sfp(struct sg_device *sdp);
 static void sg_remove_sfp(struct kref *);
-static struct sg_request *sg_add_request(struct sg_fd *sfp);
+static struct sg_request *sg_setup_req(struct sg_fd *sfp);
 static int sg_remove_request(struct sg_fd *sfp, struct sg_request *srp);
 static struct sg_device *sg_get_dev(int dev);
 static void sg_device_destroy(struct kref *kref);
@@ -407,7 +408,7 @@ sg_open(struct inode *inode, struct file *filp)
 	if (atomic_read(&sdp->open_cnt) < 1) {  /* no existing opens */
 		clear_bit(SG_FDEV_LOG_SENSE, sdp->fdev_bm);
 		q = sdp->device->request_queue;
-		sdp->max_sgat_elems = queue_max_segments(q);
+		sdp->max_sgat_sz = queue_max_segments(q);
 	}
 	sfp = sg_add_sfp(sdp);		/* increments sdp->d_ref */
 	if (IS_ERR(sfp)) {
@@ -474,16 +475,18 @@ sg_release(struct inode *inode, struct file *filp)
 }
 
 static ssize_t
-sg_write(struct file *filp, const char __user *buf, size_t count, loff_t * ppos)
+sg_write(struct file *filp, const char __user *p, size_t count, loff_t *ppos)
 {
 	bool blocking = !(filp->f_flags & O_NONBLOCK);
-	u8 opcode;
 	int mxsize, cmd_size, input_size, res;
+	u8 opcode;
 	struct sg_device *sdp;
 	struct sg_fd *sfp;
 	struct sg_request *srp;
-	struct sg_header old_hdr;
-	sg_io_hdr_t *hp;
+	struct sg_header ov2hdr;
+	struct sg_io_hdr v3hdr;
+	struct sg_header *ohp = &ov2hdr;
+	struct sg_io_hdr *h3p = &v3hdr;
 	u8 cmnd[SG_MAX_CDB_SIZE];
 
 	res = sg_check_file_access(filp, __func__);
@@ -493,25 +496,36 @@ sg_write(struct file *filp, const char __user *buf, size_t count, loff_t * ppos)
 	sfp = filp->private_data;
 	sdp = sfp->parentdp;
 	SG_LOG(3, sfp, "%s: write(3rd arg) count=%d\n", __func__, (int)count);
-	res = sg_allow_if_err_recovery(sdp, !blocking);
+	res = sg_allow_if_err_recovery(sdp, !!(filp->f_flags & O_NONBLOCK));
 	if (res)
 		return res;
 
 	if (count < SZ_SG_HEADER)
 		return -EIO;
-	if (copy_from_user(&old_hdr, buf, SZ_SG_HEADER))
+	if (copy_from_user(ohp, p, SZ_SG_HEADER))
 		return -EFAULT;
-	if (old_hdr.reply_len < 0)
-		return sg_submit(sfp, filp, buf, count, blocking, false, false,
+	if (ohp->reply_len < 0) {	/* assume this is v3 */
+		struct sg_io_hdr *reinter_2p = (struct sg_io_hdr *)ohp;
+
+		if (count < SZ_SG_IO_HDR)
+			return -EIO;
+		if (reinter_2p->interface_id != 'S') {
+			pr_info_once("sg: %s: v3 interface only here\n",
+				     __func__);
+			return -EPERM;
+		}
+		return sg_submit(sfp, filp, p, count,
+				 !(filp->f_flags & O_NONBLOCK), false, false,
 				 NULL);
+	}
 	if (count < (SZ_SG_HEADER + 6))
 		return -EIO;	/* The minimum scsi command length is 6 bytes. */
 
-	buf += SZ_SG_HEADER;
-	if (get_user(opcode, buf))
+	p += SZ_SG_HEADER;
+	if (get_user(opcode, p))
 		return -EFAULT;
 
-	if (!(srp = sg_add_request(sfp))) {
+	if (!(srp = sg_setup_req(sfp))) {
 		SG_LOG(1, sfp, "%s: queue full\n", __func__);
 		return -EDOM;
 	}
@@ -520,43 +534,44 @@ sg_write(struct file *filp, const char __user *buf, size_t count, loff_t * ppos)
 		cmd_size = sfp->next_cmd_len;
 		sfp->next_cmd_len = 0;	/* reset so only this write() effected */
 	} else {
-		cmd_size = COMMAND_SIZE(opcode);	/* based on SCSI command group */
-		if ((opcode >= 0xc0) && old_hdr.twelve_byte)
+		cmd_size = COMMAND_SIZE(opcode);  /* old: SCSI command group */
+		if (opcode >= 0xc0 && ohp->twelve_byte)
 			cmd_size = 12;
 	}
 	mutex_unlock(&sfp->f_mutex);
 	SG_LOG(4, sfp, "%s:   scsi opcode=0x%02x, cmd_size=%d\n", __func__,
 	       (unsigned int)opcode, cmd_size);
 	input_size = count - cmd_size;
-	mxsize = (input_size > old_hdr.reply_len) ? input_size : old_hdr.reply_len;
+	mxsize = max_t(int, input_size, ohp->reply_len);
 	mxsize -= SZ_SG_HEADER;
 	input_size -= SZ_SG_HEADER;
 	if (input_size < 0) {
 		sg_remove_request(sfp, srp);
 		return -EIO;	/* User did not pass enough bytes for this command. */
 	}
-	hp = &srp->header;
-	hp->interface_id = '\0';	/* indicator of old interface tunnelled */
-	hp->cmd_len = (u8)cmd_size;
-	hp->iovec_count = 0;
-	hp->mx_sb_len = 0;
+	h3p = &srp->header;
+	h3p->interface_id = '\0';  /* indicator of old interface tunnelled */
+	h3p->cmd_len = (u8)cmd_size;
+	h3p->iovec_count = 0;
+	h3p->mx_sb_len = 0;
 	if (input_size > 0)
-		hp->dxfer_direction = (old_hdr.reply_len > SZ_SG_HEADER) ?
+		h3p->dxfer_direction = (ohp->reply_len > SZ_SG_HEADER) ?
 		    SG_DXFER_TO_FROM_DEV : SG_DXFER_TO_DEV;
 	else
-		hp->dxfer_direction = (mxsize > 0) ? SG_DXFER_FROM_DEV : SG_DXFER_NONE;
-	hp->dxfer_len = mxsize;
-	if ((hp->dxfer_direction == SG_DXFER_TO_DEV) ||
-	    (hp->dxfer_direction == SG_DXFER_TO_FROM_DEV))
-		hp->dxferp = (char __user *)buf + cmd_size;
+		h3p->dxfer_direction = (mxsize > 0) ? SG_DXFER_FROM_DEV :
+						      SG_DXFER_NONE;
+	h3p->dxfer_len = mxsize;
+	if (h3p->dxfer_direction == SG_DXFER_TO_DEV ||
+	    h3p->dxfer_direction == SG_DXFER_TO_FROM_DEV)
+		h3p->dxferp = (char __user *)p + cmd_size;
 	else
-		hp->dxferp = NULL;
-	hp->sbp = NULL;
-	hp->timeout = old_hdr.reply_len;	/* structure abuse ... */
-	hp->flags = input_size;	/* structure abuse ... */
-	hp->pack_id = old_hdr.pack_id;
-	hp->usr_ptr = NULL;
-	if (copy_from_user(cmnd, buf, cmd_size)) {
+		h3p->dxferp = NULL;
+	h3p->sbp = NULL;
+	h3p->timeout = ohp->reply_len;	/* structure abuse ... */
+	h3p->flags = input_size;	/* structure abuse ... */
+	h3p->pack_id = ohp->pack_id;
+	h3p->usr_ptr = NULL;
+	if (copy_from_user(cmnd, p, cmd_size)) {
 		sg_remove_request(sfp, srp);
 		return -EFAULT;
 	}
@@ -565,13 +580,13 @@ sg_write(struct file *filp, const char __user *buf, size_t count, loff_t * ppos)
 	 * but is is possible that the app intended SG_DXFER_TO_DEV, because there
 	 * is a non-zero input_size, so emit a warning.
 	 */
-	if (hp->dxfer_direction == SG_DXFER_TO_FROM_DEV) {
+	if (h3p->dxfer_direction == SG_DXFER_TO_FROM_DEV) {
 		printk_ratelimited(KERN_WARNING
 				   "sg_write: data in/out %d/%d bytes "
 				   "for SCSI command 0x%x-- guessing "
 				   "data in;\n   program %s not setting "
 				   "count and/or reply_len properly\n",
-				   old_hdr.reply_len - (int)SZ_SG_HEADER,
+				   ohp->reply_len - (int)SZ_SG_HEADER,
 				   input_size, (unsigned int) cmnd[0],
 				   current->comm);
 	}
@@ -597,7 +612,7 @@ sg_submit(struct sg_fd *sfp, struct file *filp, const char __user *buf,
 {
 	int k;
 	struct sg_request *srp;
-	sg_io_hdr_t *hp;
+	struct sg_io_hdr *hp;
 	u8 cmnd[SG_MAX_CDB_SIZE];
 	int timeout;
 	unsigned long ul_timeout;
@@ -606,7 +621,7 @@ sg_submit(struct sg_fd *sfp, struct file *filp, const char __user *buf,
 		return -EINVAL;
 
 	sfp->cmd_q = 1;	/* when sg_io_hdr seen, set command queuing on */
-	if (!(srp = sg_add_request(sfp))) {
+	if (!(srp = sg_setup_req(sfp))) {
 		SG_LOG(1, sfp, "%s: queue full\n", __func__);
 		return -EDOM;
 	}
@@ -621,7 +636,7 @@ sg_submit(struct sg_fd *sfp, struct file *filp, const char __user *buf,
 		return -ENOSYS;
 	}
 	if (hp->flags & SG_FLAG_MMAP_IO) {
-		if (hp->dxfer_len > sfp->reserve.bufflen) {
+		if (hp->dxfer_len > sfp->reserve.buflen) {
 			sg_remove_request(sfp, srp);
 			return -ENOMEM;	/* MMAP_IO size must fit in reserve buffer */
 		}
@@ -662,7 +677,7 @@ sg_common_write(struct sg_fd *sfp, struct sg_request *srp,
 {
 	int k, at_head;
 	struct sg_device *sdp = sfp->parentdp;
-	sg_io_hdr_t *hp = &srp->header;
+	struct sg_io_hdr *hp = &srp->header;
 
 	srp->data.cmd_opcode = cmnd[0];	/* hold opcode of command */
 	hp->status = 0;
@@ -744,7 +759,7 @@ static ssize_t
 sg_new_read(struct sg_fd *sfp, char __user *buf, size_t count,
 	    struct sg_request *srp)
 {
-	sg_io_hdr_t *hp = &srp->header;
+	struct sg_io_hdr *hp = &srp->header;
 	int err = 0, err2;
 	int len;
 
@@ -804,7 +819,7 @@ sg_read(struct file *filp, char __user *buf, size_t count, loff_t *ppos)
 	struct sg_request *srp;
 	int req_pack_id = -1;
 	int ret = 0;
-	sg_io_hdr_t *hp;
+	struct sg_io_hdr *hp;
 	struct sg_header *old_hdr = NULL;
 
 	/*
@@ -834,7 +849,7 @@ sg_read(struct file *filp, char __user *buf, size_t count, loff_t *ppos)
 		}
 		if (old_hdr->reply_len < 0) {
 			if (count >= SZ_SG_IO_HDR) {
-				sg_io_hdr_t *new_hdr;
+				struct sg_io_hdr *new_hdr;
 
 				new_hdr = kmalloc(SZ_SG_IO_HDR, GFP_KERNEL);
 				if (!new_hdr) {
@@ -1129,7 +1144,7 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 	case SG_GET_NUM_WAITING:
 		return put_user(atomic_read(&sfp->waiting), ip);
 	case SG_GET_SG_TABLESIZE:
-		return put_user(sdp->max_sgat_elems, ip);
+		return put_user(sdp->max_sgat_sz, ip);
 	case SG_SET_RESERVED_SIZE:
 		result = get_user(val, ip);
 		if (result)
@@ -1139,7 +1154,7 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 		val = min_t(int, val,
 			    max_sectors_bytes(sdp->device->request_queue));
 		mutex_lock(&sfp->f_mutex);
-		if (val != sfp->reserve.bufflen) {
+		if (val != sfp->reserve.buflen) {
 			if (sfp->mmap_called ||
 			    sfp->res_in_use) {
 				mutex_unlock(&sfp->f_mutex);
@@ -1152,7 +1167,7 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 		mutex_unlock(&sfp->f_mutex);
 		return 0;
 	case SG_GET_RESERVED_SIZE:
-		val = min_t(int, sfp->reserve.bufflen,
+		val = min_t(int, sfp->reserve.buflen,
 			    max_sectors_bytes(sdp->device->request_queue));
 		return put_user(val, ip);
 	case SG_SET_COMMAND_Q:
@@ -1352,12 +1367,12 @@ sg_vma_fault(struct vm_fault *vmf)
 	}
 	rsv_schp = &sfp->reserve;
 	offset = vmf->pgoff << PAGE_SHIFT;
-	if (offset >= rsv_schp->bufflen)
+	if (offset >= rsv_schp->buflen)
 		return VM_FAULT_SIGBUS;
 	sa = vma->vm_start;
 	SG_LOG(3, sfp, "%s: vm_start=0x%lx, off=%lu\n", __func__, sa, offset);
 	length = 1 << (PAGE_SHIFT + rsv_schp->page_order);
-	for (k = 0; k < rsv_schp->k_use_sg && sa < vma->vm_end; k++) {
+	for (k = 0; k < rsv_schp->num_sgat && sa < vma->vm_end; k++) {
 		len = vma->vm_end - sa;
 		len = (len < length) ? len : length;
 		if (offset < len) {
@@ -1401,14 +1416,14 @@ sg_mmap(struct file *filp, struct vm_area_struct *vma)
 		return -EINVAL;	/* want no offset */
 	rsv_schp = &sfp->reserve;
 	mutex_lock(&sfp->f_mutex);
-	if (req_sz > rsv_schp->bufflen) {
+	if (req_sz > rsv_schp->buflen) {
 		ret = -ENOMEM;	/* cannot map more than reserved buffer */
 		goto out;
 	}
 
 	sa = vma->vm_start;
 	length = 1 << (PAGE_SHIFT + rsv_schp->page_order);
-	for (k = 0; k < rsv_schp->k_use_sg && sa < vma->vm_end; k++) {
+	for (k = 0; k < rsv_schp->num_sgat && sa < vma->vm_end; k++) {
 		len = vma->vm_end - sa;
 		len = (len < length) ? len : length;
 		sa += len;
@@ -1426,7 +1441,8 @@ sg_mmap(struct file *filp, struct vm_area_struct *vma)
 static void
 sg_rq_end_io_usercontext(struct work_struct *work)
 {
-	struct sg_request *srp = container_of(work, struct sg_request, ew.work);
+	struct sg_request *srp = container_of(work, struct sg_request,
+					      ew_orph.work);
 	struct sg_fd *sfp = srp->parentfp;
 
 	sg_finish_scsi_blk_rq(srp);
@@ -1532,8 +1548,8 @@ sg_rq_end_io(struct request *rq, blk_status_t status)
 		kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN);
 		kref_put(&sfp->f_ref, sg_remove_sfp);
 	} else {
-		INIT_WORK(&srp->ew.work, sg_rq_end_io_usercontext);
-		schedule_work(&srp->ew.work);
+		INIT_WORK(&srp->ew_orph.work, sg_rq_end_io_usercontext);
+		schedule_work(&srp->ew_orph.work);
 	}
 }
 
@@ -1558,7 +1574,7 @@ static struct class *sg_sysfs_class;
 static int sg_sysfs_valid = 0;
 
 static struct sg_device *
-sg_alloc(struct gendisk *disk, struct scsi_device *scsidp)
+sg_add_device_helper(struct gendisk *disk, struct scsi_device *scsidp)
 {
 	struct request_queue *q = scsidp->request_queue;
 	struct sg_device *sdp;
@@ -1600,7 +1616,7 @@ sg_alloc(struct gendisk *disk, struct scsi_device *scsidp)
 	init_waitqueue_head(&sdp->open_wait);
 	clear_bit(SG_FDEV_DETACHING, sdp->fdev_bm);
 	rwlock_init(&sdp->sfd_lock);
-	sdp->max_sgat_elems = queue_max_segments(q);
+	sdp->max_sgat_sz = queue_max_segments(q);
 	sdp->index = k;
 	kref_init(&sdp->d_ref);
 	error = 0;
@@ -1642,9 +1658,8 @@ sg_add_device(struct device *cl_dev, struct class_interface *cl_intf)
 	cdev->owner = THIS_MODULE;
 	cdev->ops = &sg_fops;
 
-	sdp = sg_alloc(disk, scsidp);
+	sdp = sg_add_device_helper(disk, scsidp);
 	if (IS_ERR(sdp)) {
-		pr_warn("%s: sg_alloc failed\n", __func__);
 		error = PTR_ERR(sdp);
 		goto out;
 	}
@@ -1735,7 +1750,7 @@ sg_remove_device(struct device *cl_dev, struct class_interface *cl_intf)
 					"%s: 0x%p\n", __func__, sdp));
 
 	read_lock_irqsave(&sdp->sfd_lock, iflags);
-	list_for_each_entry(sfp, &sdp->sfds, sfd_siblings) {
+	list_for_each_entry(sfp, &sdp->sfds, sfd_entry) {
 		wake_up_interruptible_all(&sfp->read_wait);
 		kill_fasync(&sfp->async_qp, SIGPOLL, POLL_HUP);
 	}
@@ -1829,7 +1844,7 @@ sg_start_req(struct sg_request *srp, u8 *cmd)
 	struct request *rq;
 	struct scsi_request *req;
 	struct sg_fd *sfp = srp->parentfp;
-	sg_io_hdr_t *hp = &srp->header;
+	struct sg_io_hdr *hp = &srp->header;
 	int dxfer_len = (int) hp->dxfer_len;
 	int dxfer_dir = hp->dxfer_direction;
 	unsigned int iov_count = hp->iovec_count;
@@ -1890,13 +1905,13 @@ sg_start_req(struct sg_request *srp, u8 *cmd)
 
 	if (md) {
 		mutex_lock(&sfp->f_mutex);
-		if (dxfer_len <= rsv_schp->bufflen &&
+		if (dxfer_len <= rsv_schp->buflen &&
 		    !sfp->res_in_use) {
 			sfp->res_in_use = 1;
 			sg_link_reserve(sfp, srp, dxfer_len);
 		} else if (hp->flags & SG_FLAG_MMAP_IO) {
 			res = -EBUSY; /* sfp->res_in_use == 1 */
-			if (dxfer_len > rsv_schp->bufflen)
+			if (dxfer_len > rsv_schp->buflen)
 				res = -ENOMEM;
 			mutex_unlock(&sfp->f_mutex);
 			return res;
@@ -1911,7 +1926,7 @@ sg_start_req(struct sg_request *srp, u8 *cmd)
 
 		md->pages = req_schp->pages;
 		md->page_order = req_schp->page_order;
-		md->nr_entries = req_schp->k_use_sg;
+		md->nr_entries = req_schp->num_sgat;
 		md->offset = 0;
 		md->null_mapped = hp->dxferp ? 0 : 1;
 		if (dxfer_dir == SG_DXFER_TO_FROM_DEV)
@@ -1985,13 +2000,13 @@ static int
 sg_build_sgat(struct sg_scatter_hold *schp, const struct sg_fd *sfp,
 	      int tablesize)
 {
-	int sg_bufflen = tablesize * sizeof(struct page *);
+	int sg_buflen = tablesize * sizeof(struct page *);
 	gfp_t gfp_flags = GFP_ATOMIC | __GFP_NOWARN;
 
-	schp->pages = kzalloc(sg_bufflen, gfp_flags);
+	schp->pages = kzalloc(sg_buflen, gfp_flags);
 	if (!schp->pages)
 		return -ENOMEM;
-	schp->sglist_len = sg_bufflen;
+	schp->sglist_len = sg_buflen;
 	return tablesize;	/* number of scat_gath elements allocated */
 }
 
@@ -2000,7 +2015,7 @@ sg_build_indirect(struct sg_scatter_hold *schp, struct sg_fd *sfp,
 		  int buff_size)
 {
 	int ret_sz = 0, i, k, rem_sz, num, mx_sc_elems;
-	int max_sgat_elems = sfp->parentdp->max_sgat_elems;
+	int max_sgat_sz = sfp->parentdp->max_sgat_sz;
 	int blk_size = buff_size, order;
 	gfp_t gfp_mask = GFP_ATOMIC | __GFP_COMP | __GFP_NOWARN | __GFP_ZERO;
 	struct sg_device *sdp = sfp->parentdp;
@@ -2009,13 +2024,13 @@ sg_build_indirect(struct sg_scatter_hold *schp, struct sg_fd *sfp,
 		return -EFAULT;
 	if (0 == blk_size)
 		++blk_size;	/* don't know why */
-	/* round request up to next highest SG_SECTOR_SZ byte boundary */
-	blk_size = ALIGN(blk_size, SG_SECTOR_SZ);
+	/* round request up to next highest SG_DEF_SECTOR_SZ byte boundary */
+	blk_size = ALIGN(blk_size, SG_DEF_SECTOR_SZ);
 	SG_LOG(4, sfp, "%s: buff_size=%d, blk_size=%d\n", __func__, buff_size,
 	       blk_size);
 
 	/* N.B. ret_sz carried into this block ... */
-	mx_sc_elems = sg_build_sgat(schp, sfp, max_sgat_elems);
+	mx_sc_elems = sg_build_sgat(schp, sfp, max_sgat_sz);
 	if (mx_sc_elems < 0)
 		return mx_sc_elems;	/* most likely -ENOMEM */
 
@@ -2056,9 +2071,9 @@ sg_build_indirect(struct sg_scatter_hold *schp, struct sg_fd *sfp,
 	}		/* end of for loop */
 
 	schp->page_order = order;
-	schp->k_use_sg = k;
-	SG_LOG(5, sfp, "%s: k_use_sg=%d, order=%d\n", __func__, k, order);
-	schp->bufflen = blk_size;
+	schp->num_sgat = k;
+	SG_LOG(5, sfp, "%s: num_sgat=%d, order=%d\n", __func__, k, order);
+	schp->buflen = blk_size;
 	if (rem_sz > 0)	/* must have failed */
 		return -ENOMEM;
 	return 0;
@@ -2075,12 +2090,12 @@ sg_build_indirect(struct sg_scatter_hold *schp, struct sg_fd *sfp,
 static void
 sg_remove_scat(struct sg_fd *sfp, struct sg_scatter_hold *schp)
 {
-	SG_LOG(4, sfp, "%s: num_sgat=%d\n", __func__, schp->k_use_sg);
+	SG_LOG(4, sfp, "%s: num_sgat=%d\n", __func__, schp->num_sgat);
 	if (schp->pages && schp->sglist_len > 0) {
 		if (!schp->dio_in_use) {
 			int k;
 
-			for (k = 0; k < schp->k_use_sg && schp->pages[k]; k++) {
+			for (k = 0; k < schp->num_sgat && schp->pages[k]; k++) {
 				SG_LOG(5, sfp, "%s: pg[%d]=0x%p --\n",
 				       __func__, k, schp->pages[k]);
 				__free_pages(schp->pages[k], schp->page_order);
@@ -2107,7 +2122,7 @@ sg_rd_append(struct sg_request *srp, void __user *outp, int num_xfer)
 		return 0;
 
 	num = 1 << (PAGE_SHIFT + schp->page_order);
-	for (k = 0; k < schp->k_use_sg && schp->pages[k]; k++) {
+	for (k = 0; k < schp->num_sgat && schp->pages[k]; k++) {
 		if (num > num_xfer) {
 			if (copy_to_user(outp, page_address(schp->pages[k]),
 					   num_xfer))
@@ -2155,22 +2170,21 @@ sg_link_reserve(struct sg_fd *sfp, struct sg_request *srp, int size)
 	rem = size;
 
 	num = 1 << (PAGE_SHIFT + rsv_schp->page_order);
-	for (k = 0; k < rsv_schp->k_use_sg; k++) {
+	for (k = 0; k < rsv_schp->num_sgat; k++) {
 		if (rem <= num) {
-			req_schp->k_use_sg = k + 1;
+			req_schp->num_sgat = k + 1;
 			req_schp->sglist_len = rsv_schp->sglist_len;
 			req_schp->pages = rsv_schp->pages;
 
-			req_schp->bufflen = size;
+			req_schp->buflen = size;
 			req_schp->page_order = rsv_schp->page_order;
 			break;
 		} else
 			rem -= num;
 	}
 
-	if (k >= rsv_schp->k_use_sg) {
+	if (k >= rsv_schp->num_sgat)
 		SG_LOG(1, sfp, "%s: BAD size\n", __func__);
-	}
 }
 
 static void
@@ -2178,10 +2192,10 @@ sg_unlink_reserve(struct sg_fd *sfp, struct sg_request *srp)
 {
 	struct sg_scatter_hold *req_schp = &srp->data;
 
-	SG_LOG(4, srp->parentfp, "%s: req->k_use_sg=%d\n", __func__,
-	       (int)req_schp->k_use_sg);
-	req_schp->k_use_sg = 0;
-	req_schp->bufflen = 0;
+	SG_LOG(4, srp->parentfp, "%s: req->num_sgat=%d\n", __func__,
+	       (int)req_schp->num_sgat);
+	req_schp->num_sgat = 0;
+	req_schp->buflen = 0;
 	req_schp->pages = NULL;
 	req_schp->page_order = 0;
 	req_schp->sglist_len = 0;
@@ -2192,7 +2206,7 @@ sg_unlink_reserve(struct sg_fd *sfp, struct sg_request *srp)
 
 /* always adds to end of list */
 static struct sg_request *
-sg_add_request(struct sg_fd *sfp)
+sg_setup_req(struct sg_fd *sfp)
 {
 	int k;
 	unsigned long iflags;
@@ -2271,7 +2285,7 @@ sg_add_sfp(struct sg_device *sdp)
 		kfree(sfp);
 		return ERR_PTR(-ENODEV);
 	}
-	list_add_tail(&sfp->sfd_siblings, &sdp->sfds);
+	list_add_tail(&sfp->sfd_entry, &sdp->sfds);
 	write_unlock_irqrestore(&sdp->sfd_lock, iflags);
 	SG_LOG(3, sfp, "%s: sfp=0x%p\n", __func__, sfp);
 	if (unlikely(sg_big_buff != def_reserved_size))
@@ -2280,8 +2294,8 @@ sg_add_sfp(struct sg_device *sdp)
 	bufflen = min_t(int, sg_big_buff,
 			max_sectors_bytes(sdp->device->request_queue));
 	sg_build_reserve(sfp, bufflen);
-	SG_LOG(3, sfp, "%s: bufflen=%d, k_use_sg=%d\n", __func__,
-	       sfp->reserve.bufflen, sfp->reserve.k_use_sg);
+	SG_LOG(3, sfp, "%s: bufflen=%d, num_sgat=%d\n", __func__,
+	       sfp->reserve.buflen, sfp->reserve.num_sgat);
 
 	kref_get(&sdp->d_ref);
 	__module_get(THIS_MODULE);
@@ -2291,7 +2305,7 @@ sg_add_sfp(struct sg_device *sdp)
 static void
 sg_remove_sfp_usercontext(struct work_struct *work)
 {
-	struct sg_fd *sfp = container_of(work, struct sg_fd, ew.work);
+	struct sg_fd *sfp = container_of(work, struct sg_fd, ew_fd.work);
 	struct sg_device *sdp = sfp->parentdp;
 	struct sg_request *srp;
 	unsigned long iflags;
@@ -2306,9 +2320,9 @@ sg_remove_sfp_usercontext(struct work_struct *work)
 	}
 	write_unlock_irqrestore(&sfp->rq_list_lock, iflags);
 
-	if (sfp->reserve.bufflen > 0) {
-		SG_LOG(6, sfp, "%s:    bufflen=%d, k_use_sg=%d\n", __func__,
-		       (int)sfp->reserve.bufflen, (int)sfp->reserve.k_use_sg);
+	if (sfp->reserve.buflen > 0) {
+		SG_LOG(6, sfp, "%s:    buflen=%d, num_sgat=%d\n", __func__,
+		       (int)sfp->reserve.buflen, (int)sfp->reserve.num_sgat);
 		sg_remove_scat(sfp, &sfp->reserve);
 	}
 
@@ -2328,11 +2342,11 @@ sg_remove_sfp(struct kref *kref)
 	unsigned long iflags;
 
 	write_lock_irqsave(&sdp->sfd_lock, iflags);
-	list_del(&sfp->sfd_siblings);
+	list_del(&sfp->sfd_entry);
 	write_unlock_irqrestore(&sdp->sfd_lock, iflags);
 
-	INIT_WORK(&sfp->ew.work, sg_remove_sfp_usercontext);
-	schedule_work(&sfp->ew.work);
+	INIT_WORK(&sfp->ew_fd.work, sg_remove_sfp_usercontext);
+	schedule_work(&sfp->ew_fd.work);
 }
 
 static int
@@ -2619,19 +2633,19 @@ sg_proc_debug_helper(struct seq_file *s, struct sg_device *sdp)
 	int k, new_interface, blen, usg;
 	struct sg_request *srp;
 	struct sg_fd *fp;
-	const sg_io_hdr_t *hp;
+	const struct sg_io_hdr *hp;
 	const char * cp;
 	unsigned int ms;
 
 	k = 0;
-	list_for_each_entry(fp, &sdp->sfds, sfd_siblings) {
+	list_for_each_entry(fp, &sdp->sfds, sfd_entry) {
 		k++;
 		read_lock(&fp->rq_list_lock); /* irqs already disabled */
-		seq_printf(s, "   FD(%d): timeout=%dms bufflen=%d "
+		seq_printf(s, "   FD(%d): timeout=%dms buflen=%d "
 			   "(res)sgat=%d low_dma=%d\n", k,
 			   jiffies_to_msecs(fp->timeout),
-			   fp->reserve.bufflen,
-			   (int) fp->reserve.k_use_sg,
+			   fp->reserve.buflen,
+			   (int)fp->reserve.num_sgat,
 			   (int) sdp->device->host->unchecked_isa_dma);
 		seq_printf(s, "   cmd_q=%d f_packid=%d k_orphan=%d closed=0\n",
 			   (int) fp->cmd_q, (int) fp->force_packid,
@@ -2655,8 +2669,8 @@ sg_proc_debug_helper(struct seq_file *s, struct sg_device *sdp)
 					cp = "     ";
 			}
 			seq_puts(s, cp);
-			blen = srp->data.bufflen;
-			usg = srp->data.k_use_sg;
+			blen = srp->data.buflen;
+			usg = srp->data.num_sgat;
 			seq_puts(s, srp->done ?
 				 ((1 == srp->done) ?  "rcv:" : "fin:")
 				  : "act:");
@@ -2709,8 +2723,8 @@ sg_proc_seq_show_debug(struct seq_file *s, void *v)
 				   scsidp->lun,
 				   scsidp->host->hostt->emulated);
 		}
-		seq_printf(s, " max_sgat_elems=%d excl=%d open_cnt=%d\n",
-			   sdp->max_sgat_elems, SG_HAVE_EXCLUDE(sdp),
+		seq_printf(s, " max_sgat_sz=%d excl=%d open_cnt=%d\n",
+			   sdp->max_sgat_sz, SG_HAVE_EXCLUDE(sdp),
 			   atomic_read(&sdp->open_cnt));
 		sg_proc_debug_helper(s, sdp);
 	}
-- 
2.25.1
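The fields renamed in this patch (num_sgat, buflen) are driven by two small pieces of sizing arithmetic, in sg_build_indirect() and sg_link_reserve(). The sketch below models them in userspace. It is not driver code. The demo_* names are hypothetical, and the 4 KiB page (PAGE_SHIFT 12) and 512-byte SG_DEF_SECTOR_SZ are assumed values.

```c
#include <assert.h>

/* Assumed values for illustration only: 4 KiB pages and a 512-byte
 * SG_DEF_SECTOR_SZ. */
#define DEMO_PAGE_SHIFT 12
#define DEMO_SECTOR_SZ  512

/* Round x up to a multiple of a, a power of two (like the kernel's ALIGN()). */
static int demo_align(int x, int a)
{
	assert(a > 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* Start of sg_build_indirect(): zero becomes one byte, then the size is
 * rounded up to the next sector boundary. */
static int demo_blk_size(int buff_size)
{
	int blk_size = buff_size;

	if (blk_size == 0)
		++blk_size;
	return demo_align(blk_size, DEMO_SECTOR_SZ);
}

/* Loop in sg_link_reserve(): number of reserve elements, each
 * (1 << (PAGE_SHIFT + page_order)) bytes, needed to hold size bytes.
 * Returns -1 where the driver would log "BAD size". */
static int demo_link_count(int size, int num_sgat, int page_order)
{
	int num = 1 << (DEMO_PAGE_SHIFT + page_order);
	int rem = size;
	int k;

	for (k = 0; k < num_sgat; k++) {
		if (rem <= num)
			return k + 1;
		rem -= num;
	}
	return -1;
}
```

For example, with page_order 0 a 4097-byte request needs two reserve elements. A request larger than the whole reserve returns -1, which corresponds to the point where the driver logs "BAD size".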



* [PATCH v18 11/83] sg: change rwlock to spinlock
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
                   ` (10 preceding siblings ...)
  2021-04-27 21:56 ` [PATCH v18 10/83] sg: improve naming Douglas Gilbert
@ 2021-04-27 21:56 ` Douglas Gilbert
  2021-04-27 21:56 ` [PATCH v18 12/83] sg: ioctl handling Douglas Gilbert
                   ` (71 subsequent siblings)
  83 siblings, 0 replies; 86+ messages in thread
From: Douglas Gilbert @ 2021-04-27 21:56 UTC
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

A reviewer suggested that the extra overhead of an rwlock,
compared to a spinlock, is not worth it for short, frequently
used critical sections.

So the rwlock protecting the request list/array is changed to a
spinlock. The head of that list is in the owning sg file
descriptor object (struct sg_fd).
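The kind of critical section involved can be sketched in userspace. The function below is modelled on sg_get_rq_mark(). It finds a ready request and marks it claimed (done = 2) inside one short critical section. Because even this lookup writes, it already took the exclusive (write) side of the old rwlock. This is a minimal sketch under stated assumptions: the demo_* types are hypothetical, and a C11 atomic_flag test-and-set loop stands in for the kernel's spinlock_t. It does not model the interrupt disabling that spin_lock_irqsave() also provides.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; not the driver's structures. */
struct demo_req {
	int done;		/* 0: active, 1: ready, 2: claimed by a reader */
	bool sg_io_owned;	/* owned by a blocking SG_IO caller */
	int pack_id;
};

struct demo_fd {
	atomic_flag lock;	/* simple test-and-set spinlock */
	struct demo_req reqs[4];
	int nreqs;
};

static void demo_lock(struct demo_fd *fd)
{
	while (atomic_flag_test_and_set_explicit(&fd->lock,
						 memory_order_acquire))
		;	/* spin: the critical section is only a few instructions */
}

static void demo_unlock(struct demo_fd *fd)
{
	atomic_flag_clear_explicit(&fd->lock, memory_order_release);
}

/* Modelled on sg_get_rq_mark(): find a ready request not owned by SG_IO
 * (pack_id == -1 matches any) and mark it claimed, all under the lock.
 * Returns its index, or -1 if none matches. */
static int demo_get_rq_mark(struct demo_fd *fd, int pack_id)
{
	int k, found = -1;

	demo_lock(fd);
	for (k = 0; k < fd->nreqs; k++) {
		struct demo_req *r = &fd->reqs[k];

		if (r->done == 1 && !r->sg_io_owned &&
		    (pack_id == -1 || r->pack_id == pack_id)) {
			r->done = 2;	/* guard against other readers */
			found = k;
			break;
		}
	}
	demo_unlock(fd);
	return found;
}
```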

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 52 +++++++++++++++++++++++------------------------
 1 file changed, 26 insertions(+), 26 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 592048f7e430..105d88f9d8e2 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -139,7 +139,7 @@ struct sg_fd {		/* holds the state of a file descriptor */
 	struct list_head sfd_entry;	/* member sg_device::sfds list */
 	struct sg_device *parentdp;	/* owning device */
 	wait_queue_head_t read_wait;	/* queue read until command done */
-	rwlock_t rq_list_lock;	/* protect access to list in req_arr */
+	spinlock_t rq_list_lock;	/* protect access to list in req_arr */
 	struct mutex f_mutex;	/* protect against changes in this fd */
 	int timeout;		/* defaults to SG_DEFAULT_TIMEOUT      */
 	int timeout_user;	/* defaults to SG_DEFAULT_TIMEOUT_USER */
@@ -741,17 +741,17 @@ sg_get_rq_mark(struct sg_fd *sfp, int pack_id)
 	struct sg_request *resp;
 	unsigned long iflags;
 
-	write_lock_irqsave(&sfp->rq_list_lock, iflags);
+	spin_lock_irqsave(&sfp->rq_list_lock, iflags);
 	list_for_each_entry(resp, &sfp->rq_list, entry) {
 		/* look for requests that are ready + not SG_IO owned */
 		if (resp->done == 1 && !resp->sg_io_owned &&
 		    (-1 == pack_id || resp->header.pack_id == pack_id)) {
 			resp->done = 2;	/* guard against other readers */
-			write_unlock_irqrestore(&sfp->rq_list_lock, iflags);
+			spin_unlock_irqrestore(&sfp->rq_list_lock, iflags);
 			return resp;
 		}
 	}
-	write_unlock_irqrestore(&sfp->rq_list_lock, iflags);
+	spin_unlock_irqrestore(&sfp->rq_list_lock, iflags);
 	return NULL;
 }
 
@@ -805,9 +805,9 @@ srp_done(struct sg_fd *sfp, struct sg_request *srp)
 	unsigned long flags;
 	int ret;
 
-	read_lock_irqsave(&sfp->rq_list_lock, flags);
+	spin_lock_irqsave(&sfp->rq_list_lock, flags);
 	ret = srp->done;
-	read_unlock_irqrestore(&sfp->rq_list_lock, flags);
+	spin_unlock_irqrestore(&sfp->rq_list_lock, flags);
 	return ret;
 }
 
@@ -1071,15 +1071,15 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 			(srp_done(sfp, srp) || SG_IS_DETACHING(sdp)));
 		if (SG_IS_DETACHING(sdp))
 			return -ENODEV;
-		write_lock_irq(&sfp->rq_list_lock);
+		spin_lock_irq(&sfp->rq_list_lock);
 		if (srp->done) {
 			srp->done = 2;
-			write_unlock_irq(&sfp->rq_list_lock);
+			spin_unlock_irq(&sfp->rq_list_lock);
 			result = sg_new_read(sfp, p, SZ_SG_IO_HDR, srp);
 			return (result < 0) ? result : 0;
 		}
 		srp->orphan = 1;
-		write_unlock_irq(&sfp->rq_list_lock);
+		spin_unlock_irq(&sfp->rq_list_lock);
 		return result;	/* -ERESTARTSYS because signal hit process */
 	case SG_SET_TIMEOUT:
 		result = get_user(val, ip);
@@ -1131,15 +1131,15 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 		sfp->force_packid = val ? 1 : 0;
 		return 0;
 	case SG_GET_PACK_ID:
-		read_lock_irqsave(&sfp->rq_list_lock, iflags);
+		spin_lock_irqsave(&sfp->rq_list_lock, iflags);
 		list_for_each_entry(srp, &sfp->rq_list, entry) {
 			if ((1 == srp->done) && (!srp->sg_io_owned)) {
-				read_unlock_irqrestore(&sfp->rq_list_lock,
+				spin_unlock_irqrestore(&sfp->rq_list_lock,
 						       iflags);
 				return put_user(srp->header.pack_id, ip);
 			}
 		}
-		read_unlock_irqrestore(&sfp->rq_list_lock, iflags);
+		spin_unlock_irqrestore(&sfp->rq_list_lock, iflags);
 		return put_user(-1, ip);
 	case SG_GET_NUM_WAITING:
 		return put_user(atomic_read(&sfp->waiting), ip);
@@ -1208,9 +1208,9 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 					GFP_KERNEL);
 			if (!rinfo)
 				return -ENOMEM;
-			read_lock_irqsave(&sfp->rq_list_lock, iflags);
+			spin_lock_irqsave(&sfp->rq_list_lock, iflags);
 			sg_fill_request_table(sfp, rinfo);
-			read_unlock_irqrestore(&sfp->rq_list_lock, iflags);
+			spin_unlock_irqrestore(&sfp->rq_list_lock, iflags);
 	#ifdef CONFIG_COMPAT
 			if (in_compat_syscall())
 				result = put_compat_request_table(p, rinfo);
@@ -1530,7 +1530,7 @@ sg_rq_end_io(struct request *rq, blk_status_t status)
 	scsi_req_free_cmd(scsi_req(rq));
 	blk_put_request(rq);
 
-	write_lock_irqsave(&sfp->rq_list_lock, iflags);
+	spin_lock_irqsave(&sfp->rq_list_lock, iflags);
 	if (unlikely(srp->orphan)) {
 		if (sfp->keep_orphan)
 			srp->sg_io_owned = 0;
@@ -1538,7 +1538,7 @@ sg_rq_end_io(struct request *rq, blk_status_t status)
 			done = 0;
 	}
 	srp->done = done;
-	write_unlock_irqrestore(&sfp->rq_list_lock, iflags);
+	spin_unlock_irqrestore(&sfp->rq_list_lock, iflags);
 
 	if (likely(done)) {
 		/* Now wake up any sg_read() that is waiting for this
@@ -2212,7 +2212,7 @@ sg_setup_req(struct sg_fd *sfp)
 	unsigned long iflags;
 	struct sg_request *rp = sfp->req_arr;
 
-	write_lock_irqsave(&sfp->rq_list_lock, iflags);
+	spin_lock_irqsave(&sfp->rq_list_lock, iflags);
 	if (!list_empty(&sfp->rq_list)) {
 		if (!sfp->cmd_q)
 			goto out_unlock;
@@ -2228,10 +2228,10 @@ sg_setup_req(struct sg_fd *sfp)
 	rp->parentfp = sfp;
 	rp->header.duration = jiffies_to_msecs(jiffies);
 	list_add_tail(&rp->entry, &sfp->rq_list);
-	write_unlock_irqrestore(&sfp->rq_list_lock, iflags);
+	spin_unlock_irqrestore(&sfp->rq_list_lock, iflags);
 	return rp;
 out_unlock:
-	write_unlock_irqrestore(&sfp->rq_list_lock, iflags);
+	spin_unlock_irqrestore(&sfp->rq_list_lock, iflags);
 	return NULL;
 }
 
@@ -2244,13 +2244,13 @@ sg_remove_request(struct sg_fd *sfp, struct sg_request *srp)
 
 	if (!sfp || !srp || list_empty(&sfp->rq_list))
 		return res;
-	write_lock_irqsave(&sfp->rq_list_lock, iflags);
+	spin_lock_irqsave(&sfp->rq_list_lock, iflags);
 	if (!list_empty(&srp->entry)) {
 		list_del(&srp->entry);
 		srp->parentfp = NULL;
 		res = 1;
 	}
-	write_unlock_irqrestore(&sfp->rq_list_lock, iflags);
+	spin_unlock_irqrestore(&sfp->rq_list_lock, iflags);
 	return res;
 }
 
@@ -2266,7 +2266,7 @@ sg_add_sfp(struct sg_device *sdp)
 		return ERR_PTR(-ENOMEM);
 
 	init_waitqueue_head(&sfp->read_wait);
-	rwlock_init(&sfp->rq_list_lock);
+	spin_lock_init(&sfp->rq_list_lock);
 	INIT_LIST_HEAD(&sfp->rq_list);
 	kref_init(&sfp->f_ref);
 	mutex_init(&sfp->f_mutex);
@@ -2311,14 +2311,14 @@ sg_remove_sfp_usercontext(struct work_struct *work)
 	unsigned long iflags;
 
 	/* Cleanup any responses which were never read(). */
-	write_lock_irqsave(&sfp->rq_list_lock, iflags);
+	spin_lock_irqsave(&sfp->rq_list_lock, iflags);
 	while (!list_empty(&sfp->rq_list)) {
 		srp = list_first_entry(&sfp->rq_list, struct sg_request, entry);
 		sg_finish_scsi_blk_rq(srp);
 		list_del(&srp->entry);
 		srp->parentfp = NULL;
 	}
-	write_unlock_irqrestore(&sfp->rq_list_lock, iflags);
+	spin_unlock_irqrestore(&sfp->rq_list_lock, iflags);
 
 	if (sfp->reserve.buflen > 0) {
 		SG_LOG(6, sfp, "%s:    buflen=%d, num_sgat=%d\n", __func__,
@@ -2640,7 +2640,7 @@ sg_proc_debug_helper(struct seq_file *s, struct sg_device *sdp)
 	k = 0;
 	list_for_each_entry(fp, &sdp->sfds, sfd_entry) {
 		k++;
-		read_lock(&fp->rq_list_lock); /* irqs already disabled */
+		spin_lock(&fp->rq_list_lock); /* irqs already disabled */
 		seq_printf(s, "   FD(%d): timeout=%dms buflen=%d "
 			   "(res)sgat=%d low_dma=%d\n", k,
 			   jiffies_to_msecs(fp->timeout),
@@ -2690,7 +2690,7 @@ sg_proc_debug_helper(struct seq_file *s, struct sg_device *sdp)
 		}
 		if (list_empty(&fp->rq_list))
 			seq_puts(s, "     No requests active\n");
-		read_unlock(&fp->rq_list_lock);
+		spin_unlock(&fp->rq_list_lock);
 	}
 }
 
-- 
2.25.1
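The SG_IO case and sg_rq_end_io() touched above cooperate through the done and orphan fields under rq_list_lock. The sketch below models that hand-off in userspace with hypothetical demo_* names and no locking. It follows the visible code in sg_ioctl_common() and sg_rq_end_io(). It also assumes something not shown in this excerpt: that done starts at 1 in sg_rq_end_io(), as we understand the existing driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one request's completion state. */
struct demo_srp {
	int done;		/* 0: not ready, 1: ready for read(), 2: being read */
	bool orphan;		/* SG_IO caller gave up (signal) before completion */
	bool sg_io_owned;	/* submitted by blocking SG_IO */
};

/* SG_IO caller after its wait returns: if the request completed, claim it
 * (done = 2) and return true so the response is copied back; otherwise
 * (interrupted by a signal) mark it orphaned and return false. */
static bool demo_sg_io_after_wait(struct demo_srp *srp)
{
	if (srp->done) {
		srp->done = 2;
		return true;
	}
	srp->orphan = true;
	return false;
}

/* Completion side, following sg_rq_end_io(): an orphaned request either
 * becomes readable by a later read() (keep_orphan set, no longer owned by
 * SG_IO) or stays not-done so no reader is handed it. */
static void demo_end_io(struct demo_srp *srp, bool keep_orphan)
{
	int done = 1;	/* assumed initial value, see lead-in */

	if (srp->orphan) {
		if (keep_orphan)
			srp->sg_io_owned = false;
		else
			done = 0;
	}
	srp->done = done;
}
```

If a signal interrupts SG_IO and keep_orphan is off, the completed request never becomes readable. With keep_orphan set, a later read() can collect it.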



* [PATCH v18 12/83] sg: ioctl handling
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
                   ` (11 preceding siblings ...)
  2021-04-27 21:56 ` [PATCH v18 11/83] sg: change rwlock to spinlock Douglas Gilbert
@ 2021-04-27 21:56 ` Douglas Gilbert
  2021-04-27 21:56 ` [PATCH v18 13/83] sg: split sg_read Douglas Gilbert
                   ` (70 subsequent siblings)
  83 siblings, 0 replies; 86+ messages in thread
From: Douglas Gilbert @ 2021-04-27 21:56 UTC
  To: linux-scsi; +Cc: martin.petersen, jejb, hare, kernel test robot, Dan Carpenter

Shorten sg_ioctl() by moving several cases of sg_ioctl_common()
into helper functions: sg_ctl_sg_io(), sg_set_reserved_sz(),
sg_ctl_req_tbl() and sg_ctl_scsi_id(). sg_ioctl() is the main
entry point for ioctls on this driver's devices.

Treat a short copy to user space in sg_ctl_req_tbl() as -EFAULT,
following a report from the kernel test robot. This makes it
consistent with the handling of all other copy_to_user() and
copy_from_user() calls in the driver.
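The return-value rule for sg_ctl_req_tbl() can be shown in isolation. copy_to_user() returns the number of bytes it could not copy, so a positive result means a short copy. The sketch below uses a hypothetical helper name. It maps a positive result to -EFAULT and passes zero and negative errnos through unchanged, as the last line of sg_ctl_req_tbl() does.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical helper: convert the result of copying a table to user
 * space into an ioctl return value. A positive value is the number of
 * bytes copy_to_user() could not copy (a short copy) and becomes
 * -EFAULT; 0 (success) and negative errnos pass through unchanged. */
static int demo_copy_result(long res)
{
	return res > 0 ? -EFAULT : (int)res;
}
```

By contrast, the code being replaced used result ? -EFAULT : 0. That expression also turned any negative errno from the compat path into -EFAULT.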

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 328 ++++++++++++++++++++++++++++------------------
 1 file changed, 199 insertions(+), 129 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 105d88f9d8e2..c40d9f24cc4d 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -1018,6 +1018,56 @@ sg_fill_request_table(struct sg_fd *sfp, struct sg_req_info *rinfo)
 	}
 }
 
+/*
+ * Handles ioctl(SG_IO) for blocking (sync) usage of v3 or v4 interface.
+ * Returns 0 on success else a negated errno.
+ */
+static int
+sg_ctl_sg_io(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
+	     void __user *p)
+{
+	bool read_only = O_RDWR != (filp->f_flags & O_ACCMODE);
+	int res;
+	struct sg_request *srp;
+
+	res = sg_allow_if_err_recovery(sdp, false);
+	if (res)
+		return res;
+	res = sg_submit(sfp, filp, p, SZ_SG_IO_HDR, true, read_only,
+			true, &srp);
+	if (res < 0)
+		return res;
+	res = wait_event_interruptible
+		(sfp->read_wait, (srp_done(sfp, srp) || SG_IS_DETACHING(sdp)));
+	if (SG_IS_DETACHING(sdp))
+		return -ENODEV;
+	spin_lock_irq(&sfp->rq_list_lock);
+	if (srp->done) {
+		srp->done = 2;
+		spin_unlock_irq(&sfp->rq_list_lock);
+		res = sg_new_read(sfp, p, SZ_SG_IO_HDR, srp);
+		return (res < 0) ? res : 0;
+	}
+	srp->orphan = 1;
+	spin_unlock_irq(&sfp->rq_list_lock);
+	return res;	/* -ERESTARTSYS because signal hit process */
+}
+
+static int
+sg_set_reserved_sz(struct sg_fd *sfp, int want_rsv_sz)
+{
+	if (want_rsv_sz != sfp->reserve.buflen) {
+		if (sfp->mmap_called ||
+		    sfp->res_in_use) {
+			return -EBUSY;
+		}
+
+		sg_remove_scat(sfp, &sfp->reserve);
+		sg_build_reserve(sfp, want_rsv_sz);
+	}
+	return 0;
+}
+
 #ifdef CONFIG_COMPAT
 struct compat_sg_req_info { /* used by SG_GET_REQUEST_TABLE ioctl() */
 	char req_state;
@@ -1045,148 +1095,180 @@ static int put_compat_request_table(struct compat_sg_req_info __user *o,
 }
 #endif
 
+static int
+sg_ctl_req_tbl(struct sg_fd *sfp, void __user *p)
+{
+	int result;
+	unsigned long iflags;
+	sg_req_info_t *rinfo;
+
+	rinfo = kcalloc(SG_MAX_QUEUE, SZ_SG_REQ_INFO,
+			GFP_KERNEL);
+	if (!rinfo)
+		return -ENOMEM;
+	spin_lock_irqsave(&sfp->rq_list_lock, iflags);
+	sg_fill_request_table(sfp, rinfo);
+	spin_unlock_irqrestore(&sfp->rq_list_lock, iflags);
+#ifdef CONFIG_COMPAT
+	if (in_compat_syscall())
+		result = put_compat_request_table(p, rinfo);
+	else
+		result = copy_to_user(p, rinfo,
+				      SZ_SG_REQ_INFO * SG_MAX_QUEUE);
+#else
+	result = copy_to_user(p, rinfo,
+			      SZ_SG_REQ_INFO * SG_MAX_QUEUE);
+#endif
+	kfree(rinfo);
+	return result > 0 ? -EFAULT : result;	/* treat short copy as error */
+}
+
+static int
+sg_ctl_scsi_id(struct scsi_device *sdev, struct sg_fd *sfp, void __user *p)
+{
+	struct sg_scsi_id ss_id;
+
+	SG_LOG(3, sfp, "%s:    SG_GET_SCSI_ID\n", __func__);
+	ss_id.host_no = sdev->host->host_no;
+	ss_id.channel = sdev->channel;
+	ss_id.scsi_id = sdev->id;
+	ss_id.lun = sdev->lun;
+	ss_id.scsi_type = sdev->type;
+	ss_id.h_cmd_per_lun = sdev->host->cmd_per_lun;
+	ss_id.d_queue_depth = sdev->queue_depth;
+	ss_id.unused[0] = 0;
+	ss_id.unused[1] = 0;
+	if (copy_to_user(p, &ss_id, sizeof(struct sg_scsi_id)))
+		return -EFAULT;
+	return 0;
+}
+
 static long
 sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 		unsigned int cmd_in, void __user *p)
 {
+	bool read_only = O_RDWR != (filp->f_flags & O_ACCMODE);
+	int val;
+	int result = 0;
 	int __user *ip = p;
-	int result, val, read_only;
 	struct sg_request *srp;
+	struct scsi_device *sdev;
 	unsigned long iflags;
+	__maybe_unused const char *pmlp = ", pass to mid-level";
 
 	SG_LOG(6, sfp, "%s: cmd=0x%x, O_NONBLOCK=%d\n", __func__, cmd_in,
 	       !!(filp->f_flags & O_NONBLOCK));
-	read_only = (O_RDWR != (filp->f_flags & O_ACCMODE));
+	if (unlikely(SG_IS_DETACHING(sdp)))
+		return -ENODEV;
+	sdev = sdp->device;
 
 	switch (cmd_in) {
 	case SG_IO:
-		result = sg_allow_if_err_recovery(sdp, false);
-		if (result)
-			return result;
-		result = sg_submit(sfp, filp, p, SZ_SG_IO_HDR, true, read_only,
-				   true, &srp);
-		if (result < 0)
-			return result;
-		result = wait_event_interruptible(sfp->read_wait,
-			(srp_done(sfp, srp) || SG_IS_DETACHING(sdp)));
-		if (SG_IS_DETACHING(sdp))
-			return -ENODEV;
-		spin_lock_irq(&sfp->rq_list_lock);
-		if (srp->done) {
-			srp->done = 2;
-			spin_unlock_irq(&sfp->rq_list_lock);
-			result = sg_new_read(sfp, p, SZ_SG_IO_HDR, srp);
-			return (result < 0) ? result : 0;
-		}
-		srp->orphan = 1;
-		spin_unlock_irq(&sfp->rq_list_lock);
-		return result;	/* -ERESTARTSYS because signal hit process */
-	case SG_SET_TIMEOUT:
-		result = get_user(val, ip);
-		if (result)
-			return result;
-		if (val < 0)
-			return -EIO;
-		if (val >= mult_frac((s64)INT_MAX, USER_HZ, HZ))
-			val = min_t(s64, mult_frac((s64)INT_MAX, USER_HZ, HZ),
-				    INT_MAX);
-		sfp->timeout_user = val;
-		sfp->timeout = mult_frac(val, HZ, USER_HZ);
-
-		return 0;
-	case SG_GET_TIMEOUT:	/* N.B. User receives timeout as return value */
-				/* strange ..., for backward compatibility */
-		return sfp->timeout_user;
-	case SG_SET_FORCE_LOW_DMA:
-		/*
-		 * N.B. This ioctl never worked properly, but failed to
-		 * return an error value. So returning '0' to keep compability
-		 * with legacy applications.
-		 */
-		return 0;
-	case SG_GET_LOW_DMA:
-		return put_user((int) sdp->device->host->unchecked_isa_dma, ip);
+		return sg_ctl_sg_io(filp, sdp, sfp, p);
 	case SG_GET_SCSI_ID:
-		{
-			sg_scsi_id_t v;
-
-			if (SG_IS_DETACHING(sdp))
-				return -ENODEV;
-			memset(&v, 0, sizeof(v));
-			v.host_no = sdp->device->host->host_no;
-			v.channel = sdp->device->channel;
-			v.scsi_id = sdp->device->id;
-			v.lun = sdp->device->lun;
-			v.scsi_type = sdp->device->type;
-			v.h_cmd_per_lun = sdp->device->host->cmd_per_lun;
-			v.d_queue_depth = sdp->device->queue_depth;
-			if (copy_to_user(p, &v, sizeof(sg_scsi_id_t)))
-				return -EFAULT;
-			return 0;
-		}
+		return sg_ctl_scsi_id(sdev, sfp, p);
 	case SG_SET_FORCE_PACK_ID:
+		SG_LOG(3, sfp, "%s:    SG_SET_FORCE_PACK_ID\n", __func__);
 		result = get_user(val, ip);
 		if (result)
 			return result;
 		sfp->force_packid = val ? 1 : 0;
 		return 0;
 	case SG_GET_PACK_ID:
+		val = -1;
 		spin_lock_irqsave(&sfp->rq_list_lock, iflags);
 		list_for_each_entry(srp, &sfp->rq_list, entry) {
 			if ((1 == srp->done) && (!srp->sg_io_owned)) {
-				spin_unlock_irqrestore(&sfp->rq_list_lock,
-						       iflags);
-				return put_user(srp->header.pack_id, ip);
+				val = srp->header.pack_id;
+				break;
 			}
 		}
 		spin_unlock_irqrestore(&sfp->rq_list_lock, iflags);
-		return put_user(-1, ip);
+		SG_LOG(3, sfp, "%s:    SG_GET_PACK_ID=%d\n", __func__, val);
+		return put_user(val, ip);
 	case SG_GET_NUM_WAITING:
 		return put_user(atomic_read(&sfp->waiting), ip);
 	case SG_GET_SG_TABLESIZE:
+		SG_LOG(3, sfp, "%s:    SG_GET_SG_TABLESIZE=%d\n", __func__,
+		       sdp->max_sgat_sz);
 		return put_user(sdp->max_sgat_sz, ip);
 	case SG_SET_RESERVED_SIZE:
-		result = get_user(val, ip);
-		if (result)
-			return result;
-                if (val < 0)
-                        return -EINVAL;
-		val = min_t(int, val,
-			    max_sectors_bytes(sdp->device->request_queue));
 		mutex_lock(&sfp->f_mutex);
-		if (val != sfp->reserve.buflen) {
-			if (sfp->mmap_called ||
-			    sfp->res_in_use) {
-				mutex_unlock(&sfp->f_mutex);
-				return -EBUSY;
+		result = get_user(val, ip);
+		if (!result) {
+			if (val >= 0 && val <= (1024 * 1024 * 1024)) {
+				result = sg_set_reserved_sz(sfp, val);
+			} else {
+				SG_LOG(3, sfp, "%s: invalid size\n", __func__);
+				result = -EINVAL;
 			}
-
-			sg_remove_scat(sfp, &sfp->reserve);
-			sg_build_reserve(sfp, val);
 		}
 		mutex_unlock(&sfp->f_mutex);
-		return 0;
+		return result;
 	case SG_GET_RESERVED_SIZE:
 		val = min_t(int, sfp->reserve.buflen,
-			    max_sectors_bytes(sdp->device->request_queue));
+			    max_sectors_bytes(sdev->request_queue));
+		SG_LOG(3, sfp, "%s:    SG_GET_RESERVED_SIZE=%d\n",
+		       __func__, val);
 		return put_user(val, ip);
 	case SG_SET_COMMAND_Q:
+		SG_LOG(3, sfp, "%s:    SG_SET_COMMAND_Q\n", __func__);
 		result = get_user(val, ip);
 		if (result)
 			return result;
 		sfp->cmd_q = val ? 1 : 0;
 		return 0;
 	case SG_GET_COMMAND_Q:
+		SG_LOG(3, sfp, "%s:    SG_GET_COMMAND_Q\n", __func__);
 		return put_user((int) sfp->cmd_q, ip);
 	case SG_SET_KEEP_ORPHAN:
+		SG_LOG(3, sfp, "%s:    SG_SET_KEEP_ORPHAN\n", __func__);
 		result = get_user(val, ip);
 		if (result)
 			return result;
 		sfp->keep_orphan = val;
 		return 0;
 	case SG_GET_KEEP_ORPHAN:
+		SG_LOG(3, sfp, "%s:    SG_GET_KEEP_ORPHAN\n", __func__);
 		return put_user((int) sfp->keep_orphan, ip);
+	case SG_GET_VERSION_NUM:
+		SG_LOG(3, sfp, "%s:    SG_GET_VERSION_NUM\n", __func__);
+		return put_user(sg_version_num, ip);
+	case SG_GET_REQUEST_TABLE:
+		return sg_ctl_req_tbl(sfp, p);
+	case SG_SCSI_RESET:
+		SG_LOG(3, sfp, "%s:    SG_SCSI_RESET\n", __func__);
+		break;
+	case SG_SET_TIMEOUT:
+		SG_LOG(3, sfp, "%s:    SG_SET_TIMEOUT\n", __func__);
+		result = get_user(val, ip);
+		if (result)
+			return result;
+		if (val < 0)
+			return -EIO;
+		if (val >= mult_frac((s64)INT_MAX, USER_HZ, HZ))
+			val = min_t(s64, mult_frac((s64)INT_MAX, USER_HZ, HZ),
+				    INT_MAX);
+		sfp->timeout_user = val;
+		sfp->timeout = mult_frac(val, HZ, USER_HZ);
+		return 0;
+	case SG_GET_TIMEOUT:    /* N.B. User receives timeout as return value */
+				/* strange ..., for backward compatibility */
+		SG_LOG(3, sfp, "%s:    SG_GET_TIMEOUT\n", __func__);
+		return sfp->timeout_user;
+	case SG_SET_FORCE_LOW_DMA:
+		/*
+		 * N.B. This ioctl never worked properly, but failed to
+		 * return an error value. So returning '0' to keep
+		 * compatibility with legacy applications.
+		 */
+		SG_LOG(3, sfp, "%s:    SG_SET_FORCE_LOW_DMA\n", __func__);
+		return 0;
+	case SG_GET_LOW_DMA:
+		SG_LOG(3, sfp, "%s:    SG_GET_LOW_DMA\n", __func__);
+		return put_user((int)sdev->host->unchecked_isa_dma, ip);
 	case SG_NEXT_CMD_LEN:
+		SG_LOG(3, sfp, "%s:    SG_NEXT_CMD_LEN\n", __func__);
 		result = get_user(val, ip);
 		if (result)
 			return result;
@@ -1194,80 +1276,68 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 			return -ENOMEM;
 		sfp->next_cmd_len = (val > 0) ? val : 0;
 		return 0;
-	case SG_GET_VERSION_NUM:
-		return put_user(sg_version_num, ip);
 	case SG_GET_ACCESS_COUNT:
+		SG_LOG(3, sfp, "%s:    SG_GET_ACCESS_COUNT\n", __func__);
 		/* faked - we don't have a real access count anymore */
-		val = (sdp->device ? 1 : 0);
+		val = (sdev ? 1 : 0);
 		return put_user(val, ip);
-	case SG_GET_REQUEST_TABLE:
-		{
-			sg_req_info_t *rinfo;
-
-			rinfo = kcalloc(SG_MAX_QUEUE, SZ_SG_REQ_INFO,
-					GFP_KERNEL);
-			if (!rinfo)
-				return -ENOMEM;
-			spin_lock_irqsave(&sfp->rq_list_lock, iflags);
-			sg_fill_request_table(sfp, rinfo);
-			spin_unlock_irqrestore(&sfp->rq_list_lock, iflags);
-	#ifdef CONFIG_COMPAT
-			if (in_compat_syscall())
-				result = put_compat_request_table(p, rinfo);
-			else
-	#endif
-				result = copy_to_user(p, rinfo,
-						      SZ_SG_REQ_INFO * SG_MAX_QUEUE);
-			result = result ? -EFAULT : 0;
-			kfree(rinfo);
-			return result;
-		}
 	case SG_EMULATED_HOST:
-		if (SG_IS_DETACHING(sdp))
-			return -ENODEV;
-		return put_user(sdp->device->host->hostt->emulated, ip);
+		SG_LOG(3, sfp, "%s:    SG_EMULATED_HOST\n", __func__);
+		return put_user(sdev->host->hostt->emulated, ip);
 	case SCSI_IOCTL_SEND_COMMAND:
-		if (SG_IS_DETACHING(sdp))
-			return -ENODEV;
-		return sg_scsi_ioctl(sdp->device->request_queue, NULL, filp->f_mode, p);
+		SG_LOG(3, sfp, "%s:    SCSI_IOCTL_SEND_COMMAND\n", __func__);
+		return sg_scsi_ioctl(sdev->request_queue, NULL, filp->f_mode,
+				     p);
 	case SG_SET_DEBUG:
+		SG_LOG(3, sfp, "%s:    SG_SET_DEBUG\n", __func__);
 		result = get_user(val, ip);
 		if (result)
 			return result;
 		assign_bit(SG_FDEV_LOG_SENSE, sdp->fdev_bm, val);
 		return 0;
 	case BLKSECTGET:
-		return put_user(max_sectors_bytes(sdp->device->request_queue),
-				ip);
+		SG_LOG(3, sfp, "%s:    BLKSECTGET\n", __func__);
+		return put_user(max_sectors_bytes(sdev->request_queue), ip);
 	case BLKTRACESETUP:
-		return blk_trace_setup(sdp->device->request_queue,
+		SG_LOG(3, sfp, "%s:    BLKTRACESETUP\n", __func__);
+		return blk_trace_setup(sdev->request_queue,
 				       sdp->disk->disk_name,
 				       MKDEV(SCSI_GENERIC_MAJOR, sdp->index),
 				       NULL, p);
 	case BLKTRACESTART:
-		return blk_trace_startstop(sdp->device->request_queue, 1);
+		SG_LOG(3, sfp, "%s:    BLKTRACESTART\n", __func__);
+		return blk_trace_startstop(sdev->request_queue, 1);
 	case BLKTRACESTOP:
-		return blk_trace_startstop(sdp->device->request_queue, 0);
+		SG_LOG(3, sfp, "%s:    BLKTRACESTOP\n", __func__);
+		return blk_trace_startstop(sdev->request_queue, 0);
 	case BLKTRACETEARDOWN:
-		return blk_trace_remove(sdp->device->request_queue);
+		SG_LOG(3, sfp, "%s:    BLKTRACETEARDOWN\n", __func__);
+		return blk_trace_remove(sdev->request_queue);
 	case SCSI_IOCTL_GET_IDLUN:
+		SG_LOG(3, sfp, "%s:    SCSI_IOCTL_GET_IDLUN%s\n", __func__,
+		       pmlp);
+		break;
 	case SCSI_IOCTL_GET_BUS_NUMBER:
+		SG_LOG(3, sfp, "%s:    SCSI_IOCTL_GET_BUS_NUMBER%s\n",
+		       __func__, pmlp);
+		break;
 	case SCSI_IOCTL_PROBE_HOST:
+		SG_LOG(3, sfp, "%s:    SCSI_IOCTL_PROBE_HOST%s\n",
+		       __func__, pmlp);
+		break;
 	case SG_GET_TRANSFORM:
-	case SG_SCSI_RESET:
-		if (SG_IS_DETACHING(sdp))
-			return -ENODEV;
+		SG_LOG(3, sfp, "%s:    SG_GET_TRANSFORM%s\n", __func__, pmlp);
 		break;
 	default:
+		SG_LOG(3, sfp, "%s:    unrecognized ioctl [0x%x]%s\n",
+		       __func__, cmd_in, pmlp);
 		if (read_only)
-			return -EPERM;	/* don't know so take safe approach */
+			return -EPERM;	/* don't know, so take safer approach */
 		break;
 	}
-
 	result = sg_allow_if_err_recovery(sdp, filp->f_flags & O_NDELAY);
 	if (result)
 		return result;
-
 	return -ENOIOCTLCMD;
 }
 
-- 
2.25.1
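Two of the ioctl cases reworked above follow small rules that are worth spelling out. SG_SET_RESERVED_SIZE applies bounds and busy checks (via sg_set_reserved_sz()), and SG_SET_TIMEOUT converts USER_HZ ticks to jiffies. The sketch below is a userspace model, not driver code. The demo_* names are hypothetical, and HZ=250 and USER_HZ=100 are assumed values.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>

/* Assumed tick rates for illustration; the real values are build options. */
#define DEMO_USER_HZ 100
#define DEMO_HZ      250

/* Hypothetical model of the per-fd state used by these two ioctls. */
struct demo_fd {
	int rsv_buflen;		/* current reserve buffer size in bytes */
	bool mmap_called;	/* reserve buffer has been mmap()-ed */
	bool res_in_use;	/* reserve buffer is linked to a request */
	int timeout_user;	/* timeout in USER_HZ ticks */
	int timeout;		/* timeout in jiffies (HZ ticks) */
};

/* SG_SET_RESERVED_SIZE: sizes outside 0..1 GiB are rejected, and a busy
 * reserve cannot be resized. Rebuilding is reduced to storing the new
 * length; the driver calls sg_remove_scat() and sg_build_reserve(). */
static int demo_set_reserved_size(struct demo_fd *fd, int val)
{
	if (val < 0 || val > 1024 * 1024 * 1024)
		return -EINVAL;
	if (val != fd->rsv_buflen) {
		if (fd->mmap_called || fd->res_in_use)
			return -EBUSY;
		fd->rsv_buflen = val;
	}
	return 0;
}

/* Same arithmetic as the kernel's mult_frac(): x * numer / denom, split
 * into quotient and remainder so the intermediate product stays small. */
static long long demo_mult_frac(long long x, long long numer, long long denom)
{
	long long quot = x / denom;
	long long rem = x % denom;

	return quot * numer + (rem * numer) / denom;
}

/* SG_SET_TIMEOUT: negative values give -EIO; large values are capped so
 * the jiffies value still fits in an int. */
static int demo_set_timeout(struct demo_fd *fd, int val)
{
	long long cap = demo_mult_frac(INT_MAX, DEMO_USER_HZ, DEMO_HZ);

	if (val < 0)
		return -EIO;
	if (val >= cap)
		val = (int)(cap < INT_MAX ? cap : INT_MAX);
	fd->timeout_user = val;
	fd->timeout = (int)demo_mult_frac(val, DEMO_HZ, DEMO_USER_HZ);
	return 0;
}
```

With the assumed rates, a timeout of 6000 (60 seconds) becomes 15000 jiffies. Note also that the SET path in this patch no longer clamps val to max_sectors_bytes(). SG_GET_RESERVED_SIZE still reports the smaller of the two values.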



* [PATCH v18 13/83] sg: split sg_read
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
                   ` (12 preceding siblings ...)
  2021-04-27 21:56 ` [PATCH v18 12/83] sg: ioctl handling Douglas Gilbert
@ 2021-04-27 21:56 ` Douglas Gilbert
  2021-04-27 21:56 ` [PATCH v18 14/83] sg: sg_common_write add structure for arguments Douglas Gilbert
                   ` (69 subsequent siblings)
  83 siblings, 0 replies; 86+ messages in thread
From: Douglas Gilbert @ 2021-04-27 21:56 UTC
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

As sg_read() is getting quite long, split the v1 and v2 processing
out into sg_read_v1v2(). Rename sg_new_read() to sg_receive_v3(),
since the v3 interface will no longer be the newest once the v4
interface is added in a later patch. Also rename sg_rd_append() to
sg_read_append().
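
A minimal user-space sketch of the interface check that the new
sg_read() makes when force_packid is set, assuming 32-bit ints. Only
the first three ints of the caller's buffer are inspected: for v1/v2
they are pack_len, reply_len and pack_id, while for v3 the first is
interface_id ('S') and the second is dxfer_direction, whose SG_DXFER_*
values are all negative. The enum and function names are hypothetical
and are not part of the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; the driver works on struct sg_header/sg_io_hdr. */
enum demo_iface { DEMO_IFACE_V1V2, DEMO_IFACE_V3, DEMO_IFACE_REJECT };

/*
 * w[] holds the first three 32-bit ints copied from the read() buffer.
 * could_be_v3 corresponds to (count >= SZ_SG_IO_HDR) in sg_read().
 * For v1/v2 the pack_id to wait for is stored in *want_id; for v3 the
 * driver instead fetches pack_id with get_user(), which is not modelled.
 */
static enum demo_iface demo_classify(const int32_t w[3], bool could_be_v3,
				     int *want_id)
{
	if (w[1] < 0 && could_be_v3) {	/* reply_len < 0: not v1/v2 */
		if (w[0] == 'S')
			return DEMO_IFACE_V3;
		return DEMO_IFACE_REJECT;	/* 'Q' (v4) or other: -EPERM */
	}
	*want_id = w[2];		/* v1/v2: third int is pack_id */
	return DEMO_IFACE_V1V2;
}
```

Note that a buffer shorter than SZ_SG_IO_HDR is always treated as
v1/v2 here, even if its second int is negative, matching the else
branch in sg_read().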

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 267 +++++++++++++++++++++++-----------------------
 1 file changed, 131 insertions(+), 136 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index c40d9f24cc4d..246ebed1cee4 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -188,8 +188,8 @@ static ssize_t sg_submit(struct sg_fd *sfp, struct file *filp,
 			 struct sg_request **o_srp);
 static int sg_common_write(struct sg_fd *sfp, struct sg_request *srp,
 			   u8 *cmnd, int timeout, int blocking);
-static int sg_rd_append(struct sg_request *srp, void __user *outp,
-			int num_xfer);
+static int sg_read_append(struct sg_request *srp, void __user *outp,
+			  int num_xfer);
 static void sg_remove_scat(struct sg_fd *sfp, struct sg_scatter_hold *schp);
 static void sg_build_reserve(struct sg_fd *sfp, int req_size);
 static void sg_link_reserve(struct sg_fd *sfp, struct sg_request *srp,
@@ -756,8 +756,8 @@ sg_get_rq_mark(struct sg_fd *sfp, int pack_id)
 }
 
 static ssize_t
-sg_new_read(struct sg_fd *sfp, char __user *buf, size_t count,
-	    struct sg_request *srp)
+sg_receive_v3(struct sg_fd *sfp, char __user *buf, size_t count,
+	      struct sg_request *srp)
 {
 	struct sg_io_hdr *hp = &srp->header;
 	int err = 0, err2;
@@ -811,168 +811,163 @@ srp_done(struct sg_fd *sfp, struct sg_request *srp)
 	return ret;
 }
 
-static ssize_t
-sg_read(struct file *filp, char __user *buf, size_t count, loff_t *ppos)
+static int
+sg_read_v1v2(void __user *buf, int count, struct sg_fd *sfp,
+	     struct sg_request *srp)
 {
-	struct sg_device *sdp;
-	struct sg_fd *sfp;
-	struct sg_request *srp;
-	int req_pack_id = -1;
-	int ret = 0;
-	struct sg_io_hdr *hp;
-	struct sg_header *old_hdr = NULL;
-
-	/*
-	 * This could cause a response to be stranded. Close the associated
-	 * file descriptor to free up any resources being held.
-	 */
-	ret = sg_check_file_access(filp, __func__);
-	if (ret)
-		return ret;
-
-	sfp = filp->private_data;
-	sdp = sfp->parentdp;
-	SG_LOG(3, sfp, "%s: read() count=%d\n", __func__, (int)count);
-	ret = sg_allow_if_err_recovery(sdp, false);
-	if (ret)
-		return ret;
-
-	if (!access_ok(buf, count))
-		return -EFAULT;
-	if (sfp->force_packid && count >= SZ_SG_HEADER) {
-		old_hdr = kmalloc(SZ_SG_HEADER, GFP_KERNEL);
-		if (!old_hdr)
-			return -ENOMEM;
-		if (copy_from_user(old_hdr, buf, SZ_SG_HEADER)) {
-			ret = -EFAULT;
-			goto free_old_hdr;
-		}
-		if (old_hdr->reply_len < 0) {
-			if (count >= SZ_SG_IO_HDR) {
-				struct sg_io_hdr *new_hdr;
-
-				new_hdr = kmalloc(SZ_SG_IO_HDR, GFP_KERNEL);
-				if (!new_hdr) {
-					ret = -ENOMEM;
-					goto free_old_hdr;
-				}
-				ret = copy_from_user
-				    (new_hdr, buf, SZ_SG_IO_HDR);
-				req_pack_id = new_hdr->pack_id;
-				kfree(new_hdr);
-				if (ret) {
-					ret = -EFAULT;
-					goto free_old_hdr;
-				}
-			}
-		} else {
-			req_pack_id = old_hdr->pack_id;
-		}
-	}
-	srp = sg_get_rq_mark(sfp, req_pack_id);
-	if (!srp) {		/* now wait on packet to arrive */
-		if (SG_IS_DETACHING(sdp)) {
-			ret = -ENODEV;
-			goto free_old_hdr;
-		}
-		if (filp->f_flags & O_NONBLOCK) {
-			ret = -EAGAIN;
-			goto free_old_hdr;
-		}
-		ret = wait_event_interruptible
-				(sfp->read_wait,
-				 (SG_IS_DETACHING(sdp) ||
-				  (srp = sg_get_rq_mark(sfp, req_pack_id))));
-		if (SG_IS_DETACHING(sdp)) {
-			ret = -ENODEV;
-			goto free_old_hdr;
-		}
-		if (ret) {
-			/* -ERESTARTSYS as signal hit process */
-			goto free_old_hdr;
-		}
-	}
-	if (srp->header.interface_id != '\0') {
-		ret = sg_new_read(sfp, buf, count, srp);
-		goto free_old_hdr;
-	}
-
-	hp = &srp->header;
-	if (!old_hdr) {
-		old_hdr = kmalloc(SZ_SG_HEADER, GFP_KERNEL);
-		if (!old_hdr) {
-			ret = -ENOMEM;
-			goto free_old_hdr;
-		}
-	}
-	memset(old_hdr, 0, SZ_SG_HEADER);
-	old_hdr->reply_len = (int)hp->timeout;
-	old_hdr->pack_len = old_hdr->reply_len; /* old, strange behaviour */
-	old_hdr->pack_id = hp->pack_id;
-	old_hdr->twelve_byte =
-	    ((srp->data.cmd_opcode >= 0xc0) && (hp->cmd_len == 12)) ? 1 : 0;
-	old_hdr->target_status = hp->masked_status;
-	old_hdr->host_status = hp->host_status;
-	old_hdr->driver_status = hp->driver_status;
-	if ((hp->masked_status & CHECK_CONDITION) ||
-	    (hp->driver_status & DRIVER_SENSE))
-		memcpy(old_hdr->sense_buffer, srp->sense_b,
-		       sizeof(old_hdr->sense_buffer));
-	switch (hp->host_status) {
+	int res = 0;
+	struct sg_io_hdr *sh3p = &srp->header;
+	struct sg_header *h2p;
+	struct sg_header a_v2hdr;
+
+	h2p = &a_v2hdr;
+	memset(h2p, 0, SZ_SG_HEADER);
+	h2p->reply_len = (int)sh3p->timeout;
+	h2p->pack_len = h2p->reply_len; /* old, strange behaviour */
+	h2p->pack_id = sh3p->pack_id;
+	h2p->twelve_byte = (srp->data.cmd_opcode >= 0xc0 &&
+			    sh3p->cmd_len == 12);
+	h2p->target_status = sh3p->masked_status;
+	h2p->host_status = sh3p->host_status;
+	h2p->driver_status = sh3p->driver_status;
+	if ((CHECK_CONDITION & h2p->target_status) ||
+	    (DRIVER_SENSE & sh3p->driver_status)) {
+		memcpy(h2p->sense_buffer, srp->sense_b,
+		       sizeof(h2p->sense_buffer));
+	}
+	switch (h2p->host_status) {
 	/*
-	 * This setup of 'result' is for backward compatibility and is best
-	 * ignored by the user who should use target, host + driver status
+	 * This following setting of 'result' is for backward compatibility
+	 * and is best ignored by the user who should use target, host and
+	 * driver status.
 	 */
 	case DID_OK:
 	case DID_PASSTHROUGH:
 	case DID_SOFT_ERROR:
-		old_hdr->result = 0;
+		h2p->result = 0;
 		break;
 	case DID_NO_CONNECT:
 	case DID_BUS_BUSY:
 	case DID_TIME_OUT:
-		old_hdr->result = EBUSY;
+		h2p->result = EBUSY;
 		break;
 	case DID_BAD_TARGET:
 	case DID_ABORT:
 	case DID_PARITY:
 	case DID_RESET:
 	case DID_BAD_INTR:
-		old_hdr->result = EIO;
+		h2p->result = EIO;
 		break;
 	case DID_ERROR:
-		old_hdr->result = (srp->sense_b[0] == 0 &&
-				  hp->masked_status == GOOD) ? 0 : EIO;
+		h2p->result = (h2p->target_status == GOOD) ? 0 : EIO;
 		break;
 	default:
-		old_hdr->result = EIO;
+		h2p->result = EIO;
 		break;
 	}
 
 	/* Now copy the result back to the user buffer.  */
 	if (count >= SZ_SG_HEADER) {
-		if (copy_to_user(buf, old_hdr, SZ_SG_HEADER)) {
-			ret = -EFAULT;
-			goto free_old_hdr;
-		}
+		if (copy_to_user(buf, h2p, SZ_SG_HEADER))
+			return -EFAULT;
 		buf += SZ_SG_HEADER;
-		if (count > old_hdr->reply_len)
-			count = old_hdr->reply_len;
+		if (count > h2p->reply_len)
+			count = h2p->reply_len;
 		if (count > SZ_SG_HEADER) {
-			if (sg_rd_append(srp, buf, count - SZ_SG_HEADER)) {
-				ret = -EFAULT;
-				goto free_old_hdr;
-			}
+			if (sg_read_append(srp, buf, count - SZ_SG_HEADER))
+				return -EFAULT;
 		}
 	} else {
-		count = (old_hdr->result == 0) ? 0 : -EIO;
+		res = (h2p->result == 0) ? 0 : -EIO;
 	}
 	sg_finish_scsi_blk_rq(srp);
 	sg_remove_request(sfp, srp);
-	ret = count;
-free_old_hdr:
-	kfree(old_hdr);
-	return ret;
+	return res;
+}
+
+static ssize_t
+sg_read(struct file *filp, char __user *p, size_t count, loff_t *ppos)
+{
+	bool could_be_v3;
+	bool non_block = !!(filp->f_flags & O_NONBLOCK);
+	int want_id = -1;
+	int hlen, ret;
+	struct sg_device *sdp;
+	struct sg_fd *sfp;
+	struct sg_request *srp;
+	struct sg_header *h2p = NULL;
+	struct sg_io_hdr a_sg_io_hdr;
+
+	/*
+	 * This could cause a response to be stranded. Close the associated
+	 * file descriptor to free up any resources being held.
+	 */
+	ret = sg_check_file_access(filp, __func__);
+	if (ret)
+		return ret;
+
+	sfp = filp->private_data;
+	sdp = sfp->parentdp;
+	SG_LOG(3, sfp, "%s: read() count=%d\n", __func__, (int)count);
+	ret = sg_allow_if_err_recovery(sdp, false);
+	if (ret)
+		return ret;
+
+	could_be_v3 = (count >= SZ_SG_IO_HDR);
+	hlen = could_be_v3 ? SZ_SG_IO_HDR : SZ_SG_HEADER;
+	h2p = (struct sg_header *)&a_sg_io_hdr;
+
+	if (sfp->force_packid && count >= hlen) {
+		/*
+		 * Even though this is a user space read() system call, this
+		 * code is cheating to fetch the pack_id.
+		 * Only need first three 32 bit ints to determine interface.
+		 */
+		if (unlikely(copy_from_user(h2p, p, 3 * sizeof(int))))
+			return -EFAULT;
+		if (h2p->reply_len < 0 && could_be_v3) {
+			struct sg_io_hdr *v3_hdr = (struct sg_io_hdr *)h2p;
+
+			if (likely(v3_hdr->interface_id == 'S')) {
+				struct sg_io_hdr __user *h3_up;
+
+				h3_up = (struct sg_io_hdr __user *)p;
+				ret = get_user(want_id, &h3_up->pack_id);
+				if (unlikely(ret))
+					return ret;
+			} else if (v3_hdr->interface_id == 'Q') {
+				pr_info_once("sg: %s: v4 interface%s here\n",
+					     __func__, " disallowed");
+				return -EPERM;
+			} else {
+				return -EPERM;
+			}
+		} else { /* for v1+v2 interfaces, this is the 3rd integer */
+			want_id = h2p->pack_id;
+		}
+	}
+	srp = sg_get_rq_mark(sfp, want_id);
+	if (!srp) {		/* now wait on packet to arrive */
+		if (SG_IS_DETACHING(sdp))
+			return -ENODEV;
+		if (non_block) /* O_NONBLOCK or v3::flags & SGV4_FLAG_IMMED */
+			return -EAGAIN;
+		ret = wait_event_interruptible
+				(sfp->read_wait,
+				 (SG_IS_DETACHING(sdp) ||
+				  (srp = sg_get_rq_mark(sfp, want_id))));
+		if (SG_IS_DETACHING(sdp))
+			return -ENODEV;
+		if (ret)	/* -ERESTARTSYS as signal hit process */
+			return ret;
+	}
+	if (srp->header.interface_id == '\0')
+		ret = sg_read_v1v2(p, (int)count, sfp, srp);
+	else
+		ret = sg_receive_v3(sfp, p, count, srp);
+	if (ret < 0)
+		SG_LOG(1, sfp, "%s: negated errno: %d\n", __func__, ret);
+	return ret < 0 ? ret : (int)count;
 }
 
 static int
@@ -1045,7 +1040,7 @@ sg_ctl_sg_io(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 	if (srp->done) {
 		srp->done = 2;
 		spin_unlock_irq(&sfp->rq_list_lock);
-		res = sg_new_read(sfp, p, SZ_SG_IO_HDR, srp);
+		res = sg_receive_v3(sfp, p, SZ_SG_IO_HDR, srp);
 		return (res < 0) ? res : 0;
 	}
 	srp->orphan = 1;
@@ -2182,7 +2177,7 @@ sg_remove_scat(struct sg_fd *sfp, struct sg_scatter_hold *schp)
  * appended to given struct sg_header object.
  */
 static int
-sg_rd_append(struct sg_request *srp, void __user *outp, int num_xfer)
+sg_read_append(struct sg_request *srp, void __user *outp, int num_xfer)
 {
 	struct sg_scatter_hold *schp = &srp->data;
 	int k, num;
-- 
2.25.1



* [PATCH v18 14/83] sg: sg_common_write add structure for arguments
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
                   ` (13 preceding siblings ...)
  2021-04-27 21:56 ` [PATCH v18 13/83] sg: split sg_read Douglas Gilbert
@ 2021-04-27 21:56 ` Douglas Gilbert
  2021-04-27 21:56 ` [PATCH v18 15/83] sg: rework sg_vma_fault Douglas Gilbert
                   ` (68 subsequent siblings)
  83 siblings, 0 replies; 86+ messages in thread
From: Douglas Gilbert @ 2021-04-27 21:56 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

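As the number of arguments to sg_common_write() starts to grow
(more are added in later patches), add a structure, struct
sg_comm_wr_t, to hold most of them. While here, make at_head a bool
and base the increment of sfp->submitted on srp->sg_io_owned rather
than on the blocking argument.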
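
The parameter-object pattern behind struct sg_comm_wr_t can be
sketched in plain C as below. struct demo_comm_wr and
demo_common_write() are hypothetical stand-ins, and the flags member
only illustrates a field that a later patch might add.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical analogue of struct sg_comm_wr_t. */
struct demo_comm_wr {
	int timeout;		/* command timeout, in jiffies in the driver */
	bool blocking;
	const uint8_t *cmnd;	/* CDB; cmnd[0] is the opcode */
	unsigned int flags;	/* stands in for a member added later */
};

/* Hypothetical analogue of sg_common_write(): callers pass one pointer,
 * so adding a member does not change this prototype. */
static int demo_common_write(const struct demo_comm_wr *cwp)
{
	if (!cwp->cmnd)
		return -1;	/* no CDB supplied */
	return cwp->cmnd[0];	/* opcode that would be submitted */
}
```

One consideration: the patch assigns the members of cwr one at a
time, so a member added later stays uninitialised at any call site
that is not updated, whereas a designated initialiser such as
{ .timeout = t, .cmnd = cdb } zero-fills every member it does not name.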

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 47 ++++++++++++++++++++++++++++++++---------------
 1 file changed, 32 insertions(+), 15 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 246ebed1cee4..fde02484c54a 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -174,6 +174,13 @@ struct sg_device { /* holds the state of each scsi generic device */
 	struct kref d_ref;
 };
 
+struct sg_comm_wr_t {  /* arguments to sg_common_write() */
+	int timeout;
+	int blocking;
+	struct sg_request *srp;
+	u8 *cmnd;
+};
+
 /* tasklet or soft irq callback */
 static void sg_rq_end_io(struct request *rq, blk_status_t status);
 /* Declarations of other static functions used before they are defined */
@@ -186,8 +193,7 @@ static ssize_t sg_submit(struct sg_fd *sfp, struct file *filp,
 			 const char __user *buf, size_t count, bool blocking,
 			 bool read_only, bool sg_io_owned,
 			 struct sg_request **o_srp);
-static int sg_common_write(struct sg_fd *sfp, struct sg_request *srp,
-			   u8 *cmnd, int timeout, int blocking);
+static int sg_common_write(struct sg_fd *sfp, struct sg_comm_wr_t *cwp);
 static int sg_read_append(struct sg_request *srp, void __user *outp,
 			  int num_xfer);
 static void sg_remove_scat(struct sg_fd *sfp, struct sg_scatter_hold *schp);
@@ -487,6 +493,7 @@ sg_write(struct file *filp, const char __user *p, size_t count, loff_t *ppos)
 	struct sg_io_hdr v3hdr;
 	struct sg_header *ohp = &ov2hdr;
 	struct sg_io_hdr *h3p = &v3hdr;
+	struct sg_comm_wr_t cwr;
 	u8 cmnd[SG_MAX_CDB_SIZE];
 
 	res = sg_check_file_access(filp, __func__);
@@ -590,7 +597,11 @@ sg_write(struct file *filp, const char __user *p, size_t count, loff_t *ppos)
 				   input_size, (unsigned int) cmnd[0],
 				   current->comm);
 	}
-	res = sg_common_write(sfp, srp, cmnd, sfp->timeout, blocking);
+	cwr.timeout = sfp->timeout;
+	cwr.blocking = blocking;
+	cwr.srp = srp;
+	cwr.cmnd = cmnd;
+	res = sg_common_write(sfp, &cwr);
 	return (res < 0) ? res : count;
 }
 
@@ -613,6 +624,7 @@ sg_submit(struct sg_fd *sfp, struct file *filp, const char __user *buf,
 	int k;
 	struct sg_request *srp;
 	struct sg_io_hdr *hp;
+	struct sg_comm_wr_t cwr;
 	u8 cmnd[SG_MAX_CDB_SIZE];
 	int timeout;
 	unsigned long ul_timeout;
@@ -663,23 +675,28 @@ sg_submit(struct sg_fd *sfp, struct file *filp, const char __user *buf,
 		sg_remove_request(sfp, srp);
 		return -EPERM;
 	}
-	k = sg_common_write(sfp, srp, cmnd, timeout, blocking);
+	cwr.timeout = timeout;
+	cwr.blocking = blocking;
+	cwr.srp = srp;
+	cwr.cmnd = cmnd;
+	k = sg_common_write(sfp, &cwr);
 	if (k < 0)
 		return k;
 	if (o_srp)
-		*o_srp = srp;
+		*o_srp = cwr.srp;
 	return count;
 }
 
 static int
-sg_common_write(struct sg_fd *sfp, struct sg_request *srp,
-		u8 *cmnd, int timeout, int blocking)
+sg_common_write(struct sg_fd *sfp, struct sg_comm_wr_t *cwrp)
 {
-	int k, at_head;
+	bool at_head;
+	int k;
 	struct sg_device *sdp = sfp->parentdp;
+	struct sg_request *srp = cwrp->srp;
 	struct sg_io_hdr *hp = &srp->header;
 
-	srp->data.cmd_opcode = cmnd[0];	/* hold opcode of command */
+	srp->data.cmd_opcode = cwrp->cmnd[0];	/* hold opcode of command */
 	hp->status = 0;
 	hp->masked_status = 0;
 	hp->msg_status = 0;
@@ -688,14 +705,14 @@ sg_common_write(struct sg_fd *sfp, struct sg_request *srp,
 	hp->driver_status = 0;
 	hp->resid = 0;
 	SG_LOG(4, sfp, "%s:  opcode=0x%02x, cmd_sz=%d\n", __func__,
-	       (int)cmnd[0], hp->cmd_len);
+	       (int)cwrp->cmnd[0], hp->cmd_len);
 
 	if (hp->dxfer_len >= SZ_256M) {
 		sg_remove_request(sfp, srp);
 		return -EINVAL;
 	}
 
-	k = sg_start_req(srp, cmnd);
+	k = sg_start_req(srp, cwrp->cmnd);
 	if (k) {
 		SG_LOG(1, sfp, "%s: start_req err=%d\n", __func__, k);
 		sg_finish_scsi_blk_rq(srp);
@@ -717,13 +734,13 @@ sg_common_write(struct sg_fd *sfp, struct sg_request *srp,
 	hp->duration = jiffies_to_msecs(jiffies);
 	if (hp->interface_id != '\0' &&	/* v3 (or later) interface */
 	    (SG_FLAG_Q_AT_TAIL & hp->flags))
-		at_head = 0;
+		at_head = false;
 	else
-		at_head = 1;
+		at_head = true;
 
-	if (!blocking)
+	if (!srp->sg_io_owned)
 		atomic_inc(&sfp->submitted);
-	srp->rq->timeout = timeout;
+	srp->rq->timeout = cwrp->timeout;
 	kref_get(&sfp->f_ref); /* sg_rq_end_io() does kref_put(). */
 	blk_execute_rq_nowait(sdp->disk, srp->rq, at_head, sg_rq_end_io);
 	return 0;
-- 
2.25.1



* [PATCH v18 15/83] sg: rework sg_vma_fault
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
                   ` (14 preceding siblings ...)
  2021-04-27 21:56 ` [PATCH v18 14/83] sg: sg_common_write add structure for arguments Douglas Gilbert
@ 2021-04-27 21:56 ` Douglas Gilbert
  2021-04-27 21:56 ` [PATCH v18 16/83] sg: rework sg_mmap Douglas Gilbert
                   ` (67 subsequent siblings)
  83 siblings, 0 replies; 86+ messages in thread
From: Douglas Gilbert @ 2021-04-27 21:56 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

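Refactor sg_vma_fault(): reorder the local declarations, fail with
VM_FAULT_SIGBUS (and log why) if the device is detaching or if the
fault offset lies beyond the reserve buffer, and use min_t() when
clamping the per-chunk length.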
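
The page lookup at the heart of sg_vma_fault() can be modelled in user
space as follows. This is a sketch under two assumptions: 4 KiB pages
(DEMO_PAGE_SHIFT of 12), and that the part of the loop not shown in
this hunk advances sa and reduces offset by len on each iteration, as
the upstream driver does. The function name and parameters are
hypothetical; returning false corresponds to the VM_FAULT_SIGBUS path.

```c
#include <stdbool.h>

#define DEMO_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Find the chunk (index into the reserve's pages[]) and the page within
 * that chunk backing a fault 'offset' bytes into the mapping. Each of
 * the num_sgat chunks is 1 << (DEMO_PAGE_SHIFT + page_order) bytes. */
static bool demo_fault_lookup(unsigned long offset, unsigned long buflen,
			      int num_sgat, int page_order,
			      unsigned long vma_len,
			      int *chunk, unsigned long *page_in_chunk)
{
	unsigned long length = 1UL << (DEMO_PAGE_SHIFT + page_order);
	unsigned long sa = 0, len;
	int k;

	if (offset >= buflen)
		return false;		/* beyond the reserve buffer */
	for (k = 0; k < num_sgat && sa < vma_len; ++k) {
		len = vma_len - sa;	/* bytes of the VMA still unmapped */
		if (len > length)
			len = length;	/* clamp to one chunk */
		if (offset < len) {
			*chunk = k;
			*page_in_chunk = offset >> DEMO_PAGE_SHIFT;
			return true;
		}
		sa += len;		/* move on to the next chunk */
		offset -= len;
	}
	return false;
}
```

With page_order 2 each chunk covers four pages, so a fault 20480 bytes
into the mapping lands on the second page of the second chunk.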

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 33 +++++++++++++++++++++++----------
 1 file changed, 23 insertions(+), 10 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index fde02484c54a..72fdb76f409d 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -1428,14 +1428,16 @@ sg_fasync(int fd, struct file *filp, int mode)
 	return fasync_helper(fd, filp, mode, &sfp->async_qp);
 }
 
+/* Note: the error return, VM_FAULT_SIGBUS, causes a "bus error" */
 static vm_fault_t
 sg_vma_fault(struct vm_fault *vmf)
 {
-	struct vm_area_struct *vma = vmf->vma;
-	struct sg_fd *sfp;
+	int k, length;
 	unsigned long offset, len, sa;
+	struct vm_area_struct *vma = vmf->vma;
 	struct sg_scatter_hold *rsv_schp;
-	int k, length;
+	struct sg_device *sdp;
+	struct sg_fd *sfp;
 	const char *nbp = "==NULL, bad";
 
 	if (!vma) {
@@ -1447,20 +1449,31 @@ sg_vma_fault(struct vm_fault *vmf)
 		pr_warn("%s: sfp%s\n", __func__, nbp);
 		goto out_err;
 	}
+	sdp = sfp->parentdp;
+	if (sdp && unlikely(SG_IS_DETACHING(sdp))) {
+		SG_LOG(1, sfp, "%s: device detaching\n", __func__);
+		goto out_err;
+	}
 	rsv_schp = &sfp->reserve;
 	offset = vmf->pgoff << PAGE_SHIFT;
-	if (offset >= rsv_schp->buflen)
-		return VM_FAULT_SIGBUS;
+	if (offset >= (unsigned int)rsv_schp->buflen) {
+		SG_LOG(1, sfp, "%s: offset[%lu] >= rsv.buflen\n", __func__,
+		       offset);
+		goto out_err;
+	}
 	sa = vma->vm_start;
 	SG_LOG(3, sfp, "%s: vm_start=0x%lx, off=%lu\n", __func__, sa, offset);
 	length = 1 << (PAGE_SHIFT + rsv_schp->page_order);
-	for (k = 0; k < rsv_schp->num_sgat && sa < vma->vm_end; k++) {
+	for (k = 0; k < rsv_schp->num_sgat && sa < vma->vm_end; ++k) {
 		len = vma->vm_end - sa;
-		len = (len < length) ? len : length;
+		len = min_t(int, len, (int)length);
 		if (offset < len) {
-			struct page *page = nth_page(rsv_schp->pages[k],
-						     offset >> PAGE_SHIFT);
-			get_page(page);	/* increment page count */
+			struct page *page;
+			struct page *pp;
+
+			pp = rsv_schp->pages[k];
+			page = nth_page(pp, offset >> PAGE_SHIFT);
+			get_page(page); /* increment page count */
 			vmf->page = page;
 			return 0; /* success */
 		}
-- 
2.25.1



* [PATCH v18 16/83] sg: rework sg_mmap
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
                   ` (15 preceding siblings ...)
  2021-04-27 21:56 ` [PATCH v18 15/83] sg: rework sg_vma_fault Douglas Gilbert
@ 2021-04-27 21:56 ` Douglas Gilbert
  2021-04-27 21:56 ` [PATCH v18 17/83] sg: replace sg_allow_access Douglas Gilbert
                   ` (66 subsequent siblings)
  83 siblings, 0 replies; 86+ messages in thread
From: Douglas Gilbert @ 2021-04-27 21:56 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

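Rework sg_mmap(): reorder the local declarations, return -EBUSY if
the reserve request is already in use, and keep returning -ENOMEM if
the requested length exceeds the reserve buffer. Both checks are made
while holding f_mutex.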
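
The order of the admission checks that sg_mmap() now makes can be
summarised with a small model. demo_mmap_check() is hypothetical; in
the driver the offset test happens before f_mutex is taken, and the
other two tests are made while holding it.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the sg_mmap() checks, in the order applied. */
static int demo_mmap_check(unsigned long vm_pgoff, bool res_in_use,
			   unsigned long req_sz, unsigned long rsv_buflen)
{
	if (vm_pgoff)
		return -EINVAL;	/* only an offset of 0 accepted */
	if (res_in_use)
		return -EBUSY;	/* reserve request currently active */
	if (req_sz > rsv_buflen)
		return -ENOMEM;	/* cannot map more than the reserve buffer */
	return 0;
}
```

Because res_in_use is tested before the size, a caller whose reserve
request is busy sees EBUSY even when its mapping would also be too
large.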

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 72fdb76f409d..c3c458bf36da 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -1488,14 +1488,15 @@ static const struct vm_operations_struct sg_mmap_vm_ops = {
 	.fault = sg_vma_fault,
 };
 
+/* Entry point for mmap(2) system call */
 static int
 sg_mmap(struct file *filp, struct vm_area_struct *vma)
 {
-	struct sg_fd *sfp;
-	unsigned long req_sz, len, sa;
-	struct sg_scatter_hold *rsv_schp;
 	int k, length;
 	int ret = 0;
+	unsigned long req_sz, len, sa;
+	struct sg_scatter_hold *rsv_schp;
+	struct sg_fd *sfp;
 
 	if (!filp || !vma)
 		return -ENXIO;
@@ -1508,19 +1509,23 @@ sg_mmap(struct file *filp, struct vm_area_struct *vma)
 	SG_LOG(3, sfp, "%s: vm_start=%p, len=%d\n", __func__,
 	       (void *)vma->vm_start, (int)req_sz);
 	if (vma->vm_pgoff)
-		return -EINVAL;	/* want no offset */
-	rsv_schp = &sfp->reserve;
+		return -EINVAL; /* only an offset of 0 accepted */
+	/* Check reserve request is inactive and has large enough buffer */
 	mutex_lock(&sfp->f_mutex);
-	if (req_sz > rsv_schp->buflen) {
-		ret = -ENOMEM;	/* cannot map more than reserved buffer */
+	if (sfp->res_in_use) {
+		ret = -EBUSY;
+		goto out;
+	}
+	rsv_schp = &sfp->reserve;
+	if (req_sz > (unsigned long)rsv_schp->buflen) {
+		ret = -ENOMEM;
 		goto out;
 	}
-
 	sa = vma->vm_start;
 	length = 1 << (PAGE_SHIFT + rsv_schp->page_order);
-	for (k = 0; k < rsv_schp->num_sgat && sa < vma->vm_end; k++) {
+	for (k = 0; k < rsv_schp->num_sgat && sa < vma->vm_end; ++k) {
 		len = vma->vm_end - sa;
-		len = (len < length) ? len : length;
+		len = min_t(unsigned long, len, (unsigned long)length);
 		sa += len;
 	}
 
-- 
2.25.1



* [PATCH v18 17/83] sg: replace sg_allow_access
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
                   ` (16 preceding siblings ...)
  2021-04-27 21:56 ` [PATCH v18 16/83] sg: rework sg_mmap Douglas Gilbert
@ 2021-04-27 21:56 ` Douglas Gilbert
  2021-04-27 21:56 ` [PATCH v18 18/83] sg: rework scatter gather handling Douglas Gilbert
                   ` (65 subsequent siblings)
  83 siblings, 0 replies; 86+ messages in thread
From: Douglas Gilbert @ 2021-04-27 21:56 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

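Replace the sg_allow_access() function with sg_fetch_cmnd(), which
also checks the CDB length (-EMSGSIZE) and copies the CDB from user
space. For file descriptors not opened O_RDWR, blk_verify_command()
is now applied only to disk-like devices (TYPE_DISK, TYPE_RBC and
TYPE_ZBC) rather than to every device type except scanners. Change
sg_finish_scsi_blk_rq() from an int-returning to a void function.
Rename sg_remove_request() to sg_deact_request(). Other changes are
mainly cosmetic.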
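
The policy in sg_fetch_cmnd() can be modelled in user space as below.
This is a sketch, not the driver code: memcpy() stands in for
copy_from_user(), a read_only flag stands in for the O_ACCMODE test,
DEMO_MAX_CDB_SIZE assumes SG_MAX_CDB_SIZE is 252, and demo_verify() is
a hypothetical, much smaller stand-in for blk_verify_command().

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define DEMO_MAX_CDB_SIZE 252	/* assumed value of SG_MAX_CDB_SIZE */
#define DEMO_TYPE_DISK	0x00	/* SBC direct-access device */
#define DEMO_TYPE_RBC	0x0e	/* reduced block commands device */
#define DEMO_TYPE_ZBC	0x14	/* host-managed zoned block device */

/* Hypothetical filter: allow only a few read-only opcodes. */
static int demo_verify(const unsigned char *cdb)
{
	switch (cdb[0]) {
	case 0x00:	/* TEST UNIT READY */
	case 0x12:	/* INQUIRY */
	case 0x28:	/* READ(10) */
		return 0;
	default:
		return -EPERM;
	}
}

static int demo_fetch_cmnd(bool read_only, int dev_type,
			   const unsigned char *u_cdbp, int len,
			   unsigned char *cdbp)
{
	if (!u_cdbp || len < 6 || len > DEMO_MAX_CDB_SIZE)
		return -EMSGSIZE;
	memcpy(cdbp, u_cdbp, (size_t)len);
	if (read_only) {
		switch (dev_type) {
		case DEMO_TYPE_DISK:
		case DEMO_TYPE_RBC:
		case DEMO_TYPE_ZBC:
			return demo_verify(cdbp);
		default:	/* other device types are not filtered */
			break;
		}
	}
	return 0;
}
```

For example, a WRITE(10) CDB on a read-only descriptor is refused for
a disk but passed through for a tape drive (peripheral type 0x01).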

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 150 +++++++++++++++++++++++++---------------------
 1 file changed, 82 insertions(+), 68 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index c3c458bf36da..610cd69e5201 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -186,7 +186,7 @@ static void sg_rq_end_io(struct request *rq, blk_status_t status);
 /* Declarations of other static functions used before they are defined */
 static int sg_proc_init(void);
 static int sg_start_req(struct sg_request *srp, u8 *cmd);
-static int sg_finish_scsi_blk_rq(struct sg_request *srp);
+static void sg_finish_scsi_blk_rq(struct sg_request *srp);
 static int sg_build_indirect(struct sg_scatter_hold *schp, struct sg_fd *sfp,
 			     int buff_size);
 static ssize_t sg_submit(struct sg_fd *sfp, struct file *filp,
@@ -204,7 +204,7 @@ static void sg_unlink_reserve(struct sg_fd *sfp, struct sg_request *srp);
 static struct sg_fd *sg_add_sfp(struct sg_device *sdp);
 static void sg_remove_sfp(struct kref *);
 static struct sg_request *sg_setup_req(struct sg_fd *sfp);
-static int sg_remove_request(struct sg_fd *sfp, struct sg_request *srp);
+static int sg_deact_request(struct sg_fd *sfp, struct sg_request *srp);
 static struct sg_device *sg_get_dev(int dev);
 static void sg_device_destroy(struct kref *kref);
 
@@ -539,7 +539,7 @@ sg_write(struct file *filp, const char __user *p, size_t count, loff_t *ppos)
 	mutex_lock(&sfp->f_mutex);
 	if (sfp->next_cmd_len > 0) {
 		cmd_size = sfp->next_cmd_len;
-		sfp->next_cmd_len = 0;	/* reset so only this write() effected */
+		sfp->next_cmd_len = 0;	/* reset, only this write() affected */
 	} else {
 		cmd_size = COMMAND_SIZE(opcode);  /* old: SCSI command group */
 		if (opcode >= 0xc0 && ohp->twelve_byte)
@@ -553,7 +553,7 @@ sg_write(struct file *filp, const char __user *p, size_t count, loff_t *ppos)
 	mxsize -= SZ_SG_HEADER;
 	input_size -= SZ_SG_HEADER;
 	if (input_size < 0) {
-		sg_remove_request(sfp, srp);
+		sg_deact_request(sfp, srp);
 		return -EIO;	/* User did not pass enough bytes for this command. */
 	}
 	h3p = &srp->header;
@@ -570,7 +570,7 @@ sg_write(struct file *filp, const char __user *p, size_t count, loff_t *ppos)
 	h3p->dxfer_len = mxsize;
 	if (h3p->dxfer_direction == SG_DXFER_TO_DEV ||
 	    h3p->dxfer_direction == SG_DXFER_TO_FROM_DEV)
-		h3p->dxferp = (char __user *)p + cmd_size;
+		h3p->dxferp = (u8 __user *)p + cmd_size;
 	else
 		h3p->dxferp = NULL;
 	h3p->sbp = NULL;
@@ -579,7 +579,7 @@ sg_write(struct file *filp, const char __user *p, size_t count, loff_t *ppos)
 	h3p->pack_id = ohp->pack_id;
 	h3p->usr_ptr = NULL;
 	if (copy_from_user(cmnd, p, cmd_size)) {
-		sg_remove_request(sfp, srp);
+		sg_deact_request(sfp, srp);
 		return -EFAULT;
 	}
 	/*
@@ -606,14 +606,24 @@ sg_write(struct file *filp, const char __user *p, size_t count, loff_t *ppos)
 }
 
 static int
-sg_allow_access(struct file *filp, u8 *cmd)
+sg_fetch_cmnd(struct file *filp, struct sg_fd *sfp, const u8 __user *u_cdbp,
+	      int len, u8 *cdbp)
 {
-	struct sg_fd *sfp = filp->private_data;
-
-	if (sfp->parentdp->device->type == TYPE_SCANNER)
-		return 0;
-
-	return blk_verify_command(cmd, filp->f_mode);
+	if (!u_cdbp || len < 6 || len > SG_MAX_CDB_SIZE)
+		return -EMSGSIZE;
+	if (copy_from_user(cdbp, u_cdbp, len))
+		return -EFAULT;
+	if (O_RDWR != (filp->f_flags & O_ACCMODE)) {	/* read-only */
+		switch (sfp->parentdp->device->type) {
+		case TYPE_DISK:
+		case TYPE_RBC:
+		case TYPE_ZBC:
+			return blk_verify_command(cdbp, filp->f_mode);
+		default:	/* SSC, SES, etc.: CDBs may differ from SBC */
+			break;
+		}
+	}
+	return 0;
 }
 
 static ssize_t
@@ -621,12 +631,11 @@ sg_submit(struct sg_fd *sfp, struct file *filp, const char __user *buf,
 	  size_t count, bool blocking, bool read_only, bool sg_io_owned,
 	  struct sg_request **o_srp)
 {
-	int k;
+	int k, res, timeout;
 	struct sg_request *srp;
 	struct sg_io_hdr *hp;
 	struct sg_comm_wr_t cwr;
 	u8 cmnd[SG_MAX_CDB_SIZE];
-	int timeout;
 	unsigned long ul_timeout;
 
 	if (count < SZ_SG_IO_HDR)
@@ -639,41 +648,35 @@ sg_submit(struct sg_fd *sfp, struct file *filp, const char __user *buf,
 	}
 	srp->sg_io_owned = sg_io_owned;
 	hp = &srp->header;
+	/* get_sg_io_hdr() is defined in block/scsi_ioctl.c */
 	if (get_sg_io_hdr(hp, buf)) {
-		sg_remove_request(sfp, srp);
+		sg_deact_request(sfp, srp);
 		return -EFAULT;
 	}
 	if (hp->interface_id != 'S') {
-		sg_remove_request(sfp, srp);
+		sg_deact_request(sfp, srp);
 		return -ENOSYS;
 	}
 	if (hp->flags & SG_FLAG_MMAP_IO) {
 		if (hp->dxfer_len > sfp->reserve.buflen) {
-			sg_remove_request(sfp, srp);
+			sg_deact_request(sfp, srp);
 			return -ENOMEM;	/* MMAP_IO size must fit in reserve buffer */
 		}
 		if (hp->flags & SG_FLAG_DIRECT_IO) {
-			sg_remove_request(sfp, srp);
+			sg_deact_request(sfp, srp);
 			return -EINVAL;	/* either MMAP_IO or DIRECT_IO (not both) */
 		}
 		if (sfp->res_in_use) {
-			sg_remove_request(sfp, srp);
+			sg_deact_request(sfp, srp);
 			return -EBUSY;	/* reserve buffer already being used */
 		}
 	}
 	ul_timeout = msecs_to_jiffies(srp->header.timeout);
 	timeout = (ul_timeout < INT_MAX) ? ul_timeout : INT_MAX;
-	if ((!hp->cmdp) || (hp->cmd_len < 6) || (hp->cmd_len > sizeof (cmnd))) {
-		sg_remove_request(sfp, srp);
-		return -EMSGSIZE;
-	}
-	if (copy_from_user(cmnd, hp->cmdp, hp->cmd_len)) {
-		sg_remove_request(sfp, srp);
-		return -EFAULT;
-	}
-	if (read_only && sg_allow_access(filp, cmnd)) {
-		sg_remove_request(sfp, srp);
-		return -EPERM;
+	res = sg_fetch_cmnd(filp, sfp, hp->cmdp, hp->cmd_len, cmnd);
+	if (res) {
+		sg_deact_request(sfp, srp);
+		return res;
 	}
 	cwr.timeout = timeout;
 	cwr.blocking = blocking;
@@ -708,7 +711,7 @@ sg_common_write(struct sg_fd *sfp, struct sg_comm_wr_t *cwrp)
 	       (int)cwrp->cmnd[0], hp->cmd_len);
 
 	if (hp->dxfer_len >= SZ_256M) {
-		sg_remove_request(sfp, srp);
+		sg_deact_request(sfp, srp);
 		return -EINVAL;
 	}
 
@@ -716,7 +719,7 @@ sg_common_write(struct sg_fd *sfp, struct sg_comm_wr_t *cwrp)
 	if (k) {
 		SG_LOG(1, sfp, "%s: start_req err=%d\n", __func__, k);
 		sg_finish_scsi_blk_rq(srp);
-		sg_remove_request(sfp, srp);
+		sg_deact_request(sfp, srp);
 		return k;	/* probably out of space --> ENOMEM */
 	}
 	if (SG_IS_DETACHING(sdp)) {
@@ -727,7 +730,7 @@ sg_common_write(struct sg_fd *sfp, struct sg_comm_wr_t *cwrp)
 		}
 
 		sg_finish_scsi_blk_rq(srp);
-		sg_remove_request(sfp, srp);
+		sg_deact_request(sfp, srp);
 		return -ENODEV;
 	}
 
@@ -772,12 +775,24 @@ sg_get_rq_mark(struct sg_fd *sfp, int pack_id)
 	return NULL;
 }
 
+static int
+srp_done(struct sg_fd *sfp, struct sg_request *srp)
+{
+	unsigned long flags;
+	int ret;
+
+	spin_lock_irqsave(&sfp->rq_list_lock, flags);
+	ret = srp->done;
+	spin_unlock_irqrestore(&sfp->rq_list_lock, flags);
+	return ret;
+}
+
 static ssize_t
 sg_receive_v3(struct sg_fd *sfp, char __user *buf, size_t count,
 	      struct sg_request *srp)
 {
 	struct sg_io_hdr *hp = &srp->header;
-	int err = 0, err2;
+	int err = 0;
 	int len;
 
 	if (in_compat_syscall()) {
@@ -811,21 +826,9 @@ sg_receive_v3(struct sg_fd *sfp, char __user *buf, size_t count,
 		hp->info |= SG_INFO_CHECK;
 	err = put_sg_io_hdr(hp, buf);
 err_out:
-	err2 = sg_finish_scsi_blk_rq(srp);
-	sg_remove_request(sfp, srp);
-	return err ? : err2 ? : count;
-}
-
-static int
-srp_done(struct sg_fd *sfp, struct sg_request *srp)
-{
-	unsigned long flags;
-	int ret;
-
-	spin_lock_irqsave(&sfp->rq_list_lock, flags);
-	ret = srp->done;
-	spin_unlock_irqrestore(&sfp->rq_list_lock, flags);
-	return ret;
+	sg_finish_scsi_blk_rq(srp);
+	sg_deact_request(sfp, srp);
+	return err;
 }
 
 static int
@@ -898,7 +901,7 @@ sg_read_v1v2(void __user *buf, int count, struct sg_fd *sfp,
 		res = (h2p->result == 0) ? 0 : -EIO;
 	}
 	sg_finish_scsi_blk_rq(srp);
-	sg_remove_request(sfp, srp);
+	sg_deact_request(sfp, srp);
 	return res;
 }
 
@@ -1546,7 +1549,7 @@ sg_rq_end_io_usercontext(struct work_struct *work)
 	struct sg_fd *sfp = srp->parentfp;
 
 	sg_finish_scsi_blk_rq(srp);
-	sg_remove_request(sfp, srp);
+	sg_deact_request(sfp, srp);
 	kref_put(&sfp->f_ref, sg_remove_sfp);
 }
 
@@ -1671,7 +1674,7 @@ static const struct file_operations sg_fops = {
 
 static struct class *sg_sysfs_class;
 
-static int sg_sysfs_valid = 0;
+static bool sg_sysfs_valid;
 
 static struct sg_device *
 sg_add_device_helper(struct gendisk *disk, struct scsi_device *scsidp)
@@ -1904,7 +1907,7 @@ init_sg(void)
 		rc = PTR_ERR(sg_sysfs_class);
 		goto err_out;
         }
-	sg_sysfs_valid = 1;
+	sg_sysfs_valid = true;
 	rc = scsi_register_interface(&sg_interface);
 	if (0 == rc) {
 		sg_proc_init();
@@ -1931,7 +1934,7 @@ exit_sg(void)
 		remove_proc_subtree("scsi/sg", NULL);
 	scsi_unregister_interface(&sg_interface);
 	class_destroy(sg_sysfs_class);
-	sg_sysfs_valid = 0;
+	sg_sysfs_valid = false;
 	unregister_chrdev_region(MKDEV(SCSI_GENERIC_MAJOR, 0),
 				 SG_MAX_DEVS);
 	idr_destroy(&sg_index_idr);
@@ -2066,10 +2069,10 @@ sg_start_req(struct sg_request *srp, u8 *cmd)
 	return res;
 }
 
-static int
+static void
 sg_finish_scsi_blk_rq(struct sg_request *srp)
 {
-	int ret = 0;
+	int ret;
 
 	struct sg_fd *sfp = srp->parentfp;
 	struct sg_scatter_hold *req_schp = &srp->data;
@@ -2080,8 +2083,13 @@ sg_finish_scsi_blk_rq(struct sg_request *srp)
 		atomic_dec(&sfp->submitted);
 		atomic_dec(&sfp->waiting);
 	}
-	if (srp->bio)
+	if (srp->bio) {
 		ret = blk_rq_unmap_user(srp->bio);
+		if (ret)	/* -EINTR (-4) can be ignored */
+			SG_LOG(6, sfp, "%s: blk_rq_unmap_user() --> %d\n",
+			       __func__, ret);
+		srp->bio = NULL;
+	}
 
 	if (srp->rq) {
 		scsi_req_free_cmd(scsi_req(srp->rq));
@@ -2092,8 +2100,6 @@ sg_finish_scsi_blk_rq(struct sg_request *srp)
 		sg_unlink_reserve(sfp, srp);
 	else
 		sg_remove_scat(sfp, req_schp);
-
-	return ret;
 }
 
 static int
@@ -2337,7 +2343,7 @@ sg_setup_req(struct sg_fd *sfp)
 
 /* Return of 1 for found; 0 for not found */
 static int
-sg_remove_request(struct sg_fd *sfp, struct sg_request *srp)
+sg_deact_request(struct sg_fd *sfp, struct sg_request *srp)
 {
 	unsigned long iflags;
 	int res = 0;
@@ -2357,9 +2363,9 @@ sg_remove_request(struct sg_fd *sfp, struct sg_request *srp)
 static struct sg_fd *
 sg_add_sfp(struct sg_device *sdp)
 {
-	struct sg_fd *sfp;
 	unsigned long iflags;
 	int bufflen;
+	struct sg_fd *sfp;
 
 	sfp = kzalloc(sizeof(*sfp), GFP_ATOMIC | __GFP_NOWARN);
 	if (!sfp)
@@ -2405,10 +2411,16 @@ sg_add_sfp(struct sg_device *sdp)
 static void
 sg_remove_sfp_usercontext(struct work_struct *work)
 {
+	unsigned long iflags;
 	struct sg_fd *sfp = container_of(work, struct sg_fd, ew_fd.work);
-	struct sg_device *sdp = sfp->parentdp;
+	struct sg_device *sdp;
 	struct sg_request *srp;
-	unsigned long iflags;
+
+	if (!sfp) {
+		pr_warn("sg: %s: sfp is NULL\n", __func__);
+		return;
+	}
+	sdp = sfp->parentdp;
 
 	/* Cleanup any responses which were never read(). */
 	spin_lock_irqsave(&sfp->rq_list_lock, iflags);
@@ -2429,17 +2441,19 @@ sg_remove_sfp_usercontext(struct work_struct *work)
 	SG_LOG(6, sfp, "%s: sfp=0x%p\n", __func__, sfp);
 	kfree(sfp);
 
-	scsi_device_put(sdp->device);
-	kref_put(&sdp->d_ref, sg_device_destroy);
+	if (sdp) {
+		scsi_device_put(sdp->device);
+		kref_put(&sdp->d_ref, sg_device_destroy);
+	}
 	module_put(THIS_MODULE);
 }
 
 static void
 sg_remove_sfp(struct kref *kref)
 {
+	unsigned long iflags;
 	struct sg_fd *sfp = container_of(kref, struct sg_fd, f_ref);
 	struct sg_device *sdp = sfp->parentdp;
-	unsigned long iflags;
 
 	write_lock_irqsave(&sdp->sfd_lock, iflags);
 	list_del(&sfp->sfd_entry);
@@ -2652,7 +2666,7 @@ struct sg_proc_deviter {
 static void *
 dev_seq_start(struct seq_file *s, loff_t *pos)
 {
-	struct sg_proc_deviter * it = kmalloc(sizeof(*it), GFP_KERNEL);
+	struct sg_proc_deviter *it = kzalloc(sizeof(*it), GFP_KERNEL);
 
 	s->private = it;
 	if (! it)
-- 
2.25.1



* [PATCH v18 18/83] sg: rework scatter gather handling
From: Douglas Gilbert @ 2021-04-27 21:56 UTC
  To: linux-scsi; +Cc: martin.petersen, jejb, hare, kernel test robot, Dan Carpenter

Rename sg_build_indirect() to sg_mk_sgat() and sg_remove_scat()
to sg_remove_sgat(). Re-implement those functions. Add
sg_calc_sgat_param() to calculate various scatter gather
list parameters. Some other minor clean-ups.

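Earlier versions of this patch made the order and o_order
variables in sg_mk_sgat() unsigned int, but that breaks
'if (--order >= 0)', as pointed out by the kernel test robot:
an unsigned value is always >= 0, so the test can never fail
and the retry loop does not stop after order 0 as intended.
Make those variables signed again.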

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 294 +++++++++++++++++++++++++---------------------
 1 file changed, 162 insertions(+), 132 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 610cd69e5201..0aeb47018b92 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -90,7 +90,6 @@ static int def_reserved_size = -1;	/* picks up init parameter */
 static int sg_allow_dio = SG_ALLOW_DIO_DEF;
 
 static int scatter_elem_sz = SG_SCATTER_SZ;
-static int scatter_elem_sz_prev = SG_SCATTER_SZ;
 
 #define SG_DEF_SECTOR_SZ 512
 
@@ -145,6 +144,7 @@ struct sg_fd {		/* holds the state of a file descriptor */
 	int timeout_user;	/* defaults to SG_DEFAULT_TIMEOUT_USER */
 	atomic_t submitted;	/* number inflight or awaiting read */
 	atomic_t waiting;	/* number of requests awaiting read */
+	int sgat_elem_sz;	/* initialized to scatter_elem_sz */
 	struct sg_scatter_hold reserve;	/* buffer for this file descriptor */
 	struct list_head rq_list; /* head of request list */
 	struct fasync_struct *async_qp;	/* used by asynchronous notification */
@@ -165,6 +165,7 @@ struct sg_device { /* holds the state of each scsi generic device */
 	struct mutex open_rel_lock;     /* held when in open() or release() */
 	struct list_head sfds;
 	rwlock_t sfd_lock;      /* protect access to sfd list */
+	int max_sgat_elems;     /* adapter's max number of elements in sgat */
 	int max_sgat_sz;	/* max number of bytes in sgat list */
 	u32 index;		/* device index number */
 	atomic_t open_cnt;	/* count of opens (perhaps < num(sfds) ) */
@@ -187,8 +188,8 @@ static void sg_rq_end_io(struct request *rq, blk_status_t status);
 static int sg_proc_init(void);
 static int sg_start_req(struct sg_request *srp, u8 *cmd);
 static void sg_finish_scsi_blk_rq(struct sg_request *srp);
-static int sg_build_indirect(struct sg_scatter_hold *schp, struct sg_fd *sfp,
-			     int buff_size);
+static int sg_mk_sgat(struct sg_scatter_hold *schp, struct sg_fd *sfp,
+		      int minlen);
 static ssize_t sg_submit(struct sg_fd *sfp, struct file *filp,
 			 const char __user *buf, size_t count, bool blocking,
 			 bool read_only, bool sg_io_owned,
@@ -196,7 +197,7 @@ static ssize_t sg_submit(struct sg_fd *sfp, struct file *filp,
 static int sg_common_write(struct sg_fd *sfp, struct sg_comm_wr_t *cwp);
 static int sg_read_append(struct sg_request *srp, void __user *outp,
 			  int num_xfer);
-static void sg_remove_scat(struct sg_fd *sfp, struct sg_scatter_hold *schp);
+static void sg_remove_sgat(struct sg_fd *sfp, struct sg_scatter_hold *schp);
 static void sg_build_reserve(struct sg_fd *sfp, int req_size);
 static void sg_link_reserve(struct sg_fd *sfp, struct sg_request *srp,
 			    int size);
@@ -207,6 +208,7 @@ static struct sg_request *sg_setup_req(struct sg_fd *sfp);
 static int sg_deact_request(struct sg_fd *sfp, struct sg_request *srp);
 static struct sg_device *sg_get_dev(int dev);
 static void sg_device_destroy(struct kref *kref);
+static void sg_calc_sgat_param(struct sg_device *sdp);
 
 #define SZ_SG_HEADER ((int)sizeof(struct sg_header))	/* v1 and v2 header */
 #define SZ_SG_IO_HDR ((int)sizeof(struct sg_io_hdr))	/* v3 header */
@@ -352,7 +354,6 @@ sg_open(struct inode *inode, struct file *filp)
 	int min_dev = iminor(inode);
 	int op_flags = filp->f_flags;
 	int res;
-	struct request_queue *q;
 	struct sg_device *sdp;
 	struct sg_fd *sfp;
 
@@ -411,16 +412,12 @@ sg_open(struct inode *inode, struct file *filp)
 	if (o_excl)
 		set_bit(SG_FDEV_EXCLUDE, sdp->fdev_bm);
 
-	if (atomic_read(&sdp->open_cnt) < 1) {  /* no existing opens */
-		clear_bit(SG_FDEV_LOG_SENSE, sdp->fdev_bm);
-		q = sdp->device->request_queue;
-		sdp->max_sgat_sz = queue_max_segments(q);
-	}
+	if (atomic_read(&sdp->open_cnt) < 1)	/* no existing opens */
+		sg_calc_sgat_param(sdp);
 	sfp = sg_add_sfp(sdp);		/* increments sdp->d_ref */
 	if (IS_ERR(sfp)) {
 		res = PTR_ERR(sfp);
-		goto out_undo;
-	}
+		goto out_undo; }
 
 	filp->private_data = sfp;
 	atomic_inc(&sdp->open_cnt);
@@ -996,10 +993,43 @@ max_sectors_bytes(struct request_queue *q)
 	unsigned int max_sectors = queue_max_sectors(q);
 
 	max_sectors = min_t(unsigned int, max_sectors, INT_MAX >> 9);
-
 	return max_sectors << 9;
 }
 
+/*
+ * Calculates sg_device::max_sgat_elems and sg_device::max_sgat_sz. It uses
+ * the device's request queue. If q is not available, it sets max_sgat_elems
+ * to 1 and max_sgat_sz to PAGE_SIZE. If the potential max_sgat_sz exceeds
+ * 2^30, it scales down the implied max_segment_size so that the product of
+ * max_segment_size and max_sgat_elems is less than or equal to 2^30.
+ */
+static void
+sg_calc_sgat_param(struct sg_device *sdp)
+{
+	int sz;
+	u64 m;
+	struct scsi_device *sdev = sdp->device;
+	struct request_queue *q = sdev ? sdev->request_queue : NULL;
+
+	clear_bit(SG_FDEV_LOG_SENSE, sdp->fdev_bm);
+	if (!q) {
+		sdp->max_sgat_elems = 1;
+		sdp->max_sgat_sz = PAGE_SIZE;
+		return;
+	}
+	sdp->max_sgat_elems = queue_max_segments(q);
+	m = (u64)queue_max_segment_size(q) * queue_max_segments(q);
+	if (m < PAGE_SIZE) {
+		sdp->max_sgat_elems = 1;
+		sdp->max_sgat_sz = PAGE_SIZE;
+		return;
+	}
+	sz = (int)min_t(u64, m, 1 << 30);
+	if (sz == (1 << 30))	/* round down so: sz = elems * elem_sz */
+		sz = ((1 << 30) / sdp->max_sgat_elems) * sdp->max_sgat_elems;
+	sdp->max_sgat_sz = sz;
+}
+
 static void
 sg_fill_request_table(struct sg_fd *sfp, struct sg_req_info *rinfo)
 {
@@ -1065,7 +1095,7 @@ sg_ctl_sg_io(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 	}
 	srp->orphan = 1;
 	spin_unlock_irq(&sfp->rq_list_lock);
-	return res;	/* -ERESTARTSYS because signal hit process */
+	return res;
 }
 
 static int
@@ -1076,8 +1106,7 @@ sg_set_reserved_sz(struct sg_fd *sfp, int want_rsv_sz)
 		    sfp->res_in_use) {
 			return -EBUSY;
 		}
-
-		sg_remove_scat(sfp, &sfp->reserve);
+		sg_remove_sgat(sfp, &sfp->reserve);
 		sg_build_reserve(sfp, want_rsv_sz);
 	}
 	return 0;
@@ -1546,8 +1575,18 @@ sg_rq_end_io_usercontext(struct work_struct *work)
 {
 	struct sg_request *srp = container_of(work, struct sg_request,
 					      ew_orph.work);
-	struct sg_fd *sfp = srp->parentfp;
+	struct sg_fd *sfp;
 
+	if (!srp) {
+		WARN_ONCE(1, "%s: srp unexpectedly NULL\n", __func__);
+		return;
+	}
+	sfp = srp->parentfp;
+	if (!sfp) {
+		WARN_ONCE(1, "%s: sfp unexpectedly NULL\n", __func__);
+		return;
+	}
+	SG_LOG(3, sfp, "%s: srp=0x%p\n", __func__, srp);
 	sg_finish_scsi_blk_rq(srp);
 	sg_deact_request(sfp, srp);
 	kref_put(&sfp->f_ref, sg_remove_sfp);
@@ -1679,7 +1718,6 @@ static bool sg_sysfs_valid;
 static struct sg_device *
 sg_add_device_helper(struct gendisk *disk, struct scsi_device *scsidp)
 {
-	struct request_queue *q = scsidp->request_queue;
 	struct sg_device *sdp;
 	unsigned long iflags;
 	int error;
@@ -1719,7 +1757,7 @@ sg_add_device_helper(struct gendisk *disk, struct scsi_device *scsidp)
 	init_waitqueue_head(&sdp->open_wait);
 	clear_bit(SG_FDEV_DETACHING, sdp->fdev_bm);
 	rwlock_init(&sdp->sfd_lock);
-	sdp->max_sgat_sz = queue_max_segments(q);
+	sg_calc_sgat_param(sdp);
 	sdp->index = k;
 	kref_init(&sdp->d_ref);
 	error = 0;
@@ -1889,24 +1927,24 @@ init_sg(void)
 {
 	int rc;
 
-	if (scatter_elem_sz < PAGE_SIZE) {
+	if (scatter_elem_sz < (int)PAGE_SIZE)
 		scatter_elem_sz = PAGE_SIZE;
-		scatter_elem_sz_prev = scatter_elem_sz;
-	}
+	else if (!is_power_of_2(scatter_elem_sz))
+		scatter_elem_sz = roundup_pow_of_two(scatter_elem_sz);
 	if (def_reserved_size >= 0)
 		sg_big_buff = def_reserved_size;
 	else
 		def_reserved_size = sg_big_buff;
 
-	rc = register_chrdev_region(MKDEV(SCSI_GENERIC_MAJOR, 0), 
+	rc = register_chrdev_region(MKDEV(SCSI_GENERIC_MAJOR, 0),
 				    SG_MAX_DEVS, "sg");
 	if (rc)
 		return rc;
         sg_sysfs_class = class_create(THIS_MODULE, "scsi_generic");
         if ( IS_ERR(sg_sysfs_class) ) {
 		rc = PTR_ERR(sg_sysfs_class);
-		goto err_out;
-        }
+		goto err_out_unreg;
+	}
 	sg_sysfs_valid = true;
 	rc = scsi_register_interface(&sg_interface);
 	if (0 == rc) {
@@ -1914,7 +1952,7 @@ init_sg(void)
 		return 0;
 	}
 	class_destroy(sg_sysfs_class);
-err_out:
+err_out_unreg:
 	unregister_chrdev_region(MKDEV(SCSI_GENERIC_MAJOR, 0), SG_MAX_DEVS);
 	return rc;
 }
@@ -2019,7 +2057,7 @@ sg_start_req(struct sg_request *srp, u8 *cmd)
 			mutex_unlock(&sfp->f_mutex);
 			return res;
 		} else {
-			res = sg_build_indirect(req_schp, sfp, dxfer_len);
+			res = sg_mk_sgat(req_schp, sfp, dxfer_len);
 			if (res) {
 				mutex_unlock(&sfp->f_mutex);
 				return res;
@@ -2099,117 +2137,104 @@ sg_finish_scsi_blk_rq(struct sg_request *srp)
 	if (srp->res_used)
 		sg_unlink_reserve(sfp, srp);
 	else
-		sg_remove_scat(sfp, req_schp);
+		sg_remove_sgat(sfp, req_schp);
 }
 
 static int
-sg_build_sgat(struct sg_scatter_hold *schp, const struct sg_fd *sfp,
-	      int tablesize)
-{
-	int sg_buflen = tablesize * sizeof(struct page *);
-	gfp_t gfp_flags = GFP_ATOMIC | __GFP_NOWARN;
-
-	schp->pages = kzalloc(sg_buflen, gfp_flags);
-	if (!schp->pages)
-		return -ENOMEM;
-	schp->sglist_len = sg_buflen;
-	return tablesize;	/* number of scat_gath elements allocated */
-}
-
-static int
-sg_build_indirect(struct sg_scatter_hold *schp, struct sg_fd *sfp,
-		  int buff_size)
-{
-	int ret_sz = 0, i, k, rem_sz, num, mx_sc_elems;
-	int max_sgat_sz = sfp->parentdp->max_sgat_sz;
-	int blk_size = buff_size, order;
-	gfp_t gfp_mask = GFP_ATOMIC | __GFP_COMP | __GFP_NOWARN | __GFP_ZERO;
+sg_mk_sgat(struct sg_scatter_hold *schp, struct sg_fd *sfp, int minlen)
+{
+	int j, k, rem_sz, align_sz, order, o_order;
+	int mx_sgat_elems = sfp->parentdp->max_sgat_elems;
+	unsigned int elem_sz;
+	const size_t ptr_sz = sizeof(struct page *);
+	gfp_t mask_ap = GFP_ATOMIC | __GFP_COMP | __GFP_NOWARN | __GFP_ZERO;
+	gfp_t mask_kz = GFP_ATOMIC | __GFP_NOWARN;
 	struct sg_device *sdp = sfp->parentdp;
 
-	if (blk_size < 0)
-		return -EFAULT;
-	if (0 == blk_size)
-		++blk_size;	/* don't know why */
-	/* round request up to next highest SG_DEF_SECTOR_SZ byte boundary */
-	blk_size = ALIGN(blk_size, SG_DEF_SECTOR_SZ);
-	SG_LOG(4, sfp, "%s: buff_size=%d, blk_size=%d\n", __func__, buff_size,
-	       blk_size);
-
-	/* N.B. ret_sz carried into this block ... */
-	mx_sc_elems = sg_build_sgat(schp, sfp, max_sgat_sz);
-	if (mx_sc_elems < 0)
-		return mx_sc_elems;	/* most likely -ENOMEM */
-
-	num = scatter_elem_sz;
-	if (unlikely(num != scatter_elem_sz_prev)) {
-		if (num < PAGE_SIZE) {
-			scatter_elem_sz = PAGE_SIZE;
-			scatter_elem_sz_prev = PAGE_SIZE;
-		} else
-			scatter_elem_sz_prev = num;
+	if (unlikely(minlen <= 0)) {
+		if (minlen < 0)
+			return -EFAULT;
+		++minlen;	/* don't remember why */
 	}
+	/* round request up to next highest SG_DEF_SECTOR_SZ byte boundary */
+	align_sz = ALIGN(minlen, SG_DEF_SECTOR_SZ);
 
-	if (sdp->device->host->unchecked_isa_dma)
-		gfp_mask |= GFP_DMA;
-
-	order = get_order(num);
-retry:
-	ret_sz = 1 << (PAGE_SHIFT + order);
-
-	for (k = 0, rem_sz = blk_size; rem_sz > 0 && k < mx_sc_elems;
-	     k++, rem_sz -= ret_sz) {
+	schp->pages = kcalloc(mx_sgat_elems, ptr_sz, mask_kz);
+	SG_LOG(4, sfp, "%s: minlen=%d, align_sz=%d [sz=%zu, 0x%p ++]\n",
+	       __func__, minlen, align_sz, mx_sgat_elems * ptr_sz,
+	       schp->pages);
+	if (unlikely(!schp->pages))
+		return -ENOMEM;
 
-		num = (rem_sz > scatter_elem_sz_prev) ?
-			scatter_elem_sz_prev : rem_sz;
+	elem_sz = sfp->sgat_elem_sz;    /* power of 2 and >= PAGE_SIZE */
+	if (sdp && unlikely(sdp->device->host->unchecked_isa_dma))
+		mask_ap |= GFP_DMA;
+	o_order = get_order(elem_sz);
+	order = o_order;
 
-		schp->pages[k] = alloc_pages(gfp_mask, order);
+again:
+	for (k = 0, rem_sz = align_sz; rem_sz > 0 && k < mx_sgat_elems;
+	     ++k, rem_sz -= elem_sz) {
+		schp->pages[k] = alloc_pages(mask_ap, order);
 		if (!schp->pages[k])
-			goto out;
-
-		if (num == scatter_elem_sz_prev) {
-			if (unlikely(ret_sz > scatter_elem_sz_prev)) {
-				scatter_elem_sz = ret_sz;
-				scatter_elem_sz_prev = ret_sz;
-			}
-		}
-		SG_LOG(5, sfp, "%s: k=%d, num=%d, ret_sz=%d\n", __func__, k,
-		       num, ret_sz);
-	}		/* end of for loop */
-
+			goto err_out;
+		SG_LOG(5, sfp, "%s: k=%d, order=%d [0x%p ++]\n", __func__, k,
+		       order, schp->pages[k]);
+	}
 	schp->page_order = order;
 	schp->num_sgat = k;
-	SG_LOG(5, sfp, "%s: num_sgat=%d, order=%d\n", __func__, k, order);
-	schp->buflen = blk_size;
-	if (rem_sz > 0)	/* must have failed */
-		return -ENOMEM;
+	SG_LOG(((order != o_order || rem_sz > 0) ? 2 : 5), sfp,
+	       "%s: num_sgat=%d, order=%d,%d\n", __func__, k, o_order, order);
+	if (unlikely(rem_sz > 0)) {	/* hit mx_sgat_elems */
+		order = 0;		/* force exit */
+		goto err_out;
+	}
+	schp->buflen = align_sz;
 	return 0;
-out:
-	for (i = 0; i < k; i++)
-		__free_pages(schp->pages[i], order);
-
-	if (--order >= 0)
-		goto retry;
+err_out:
+	for (j = 0; j < k; ++j)
+		__free_pages(schp->pages[j], order);
 
+	if (--order >= 0) {
+		elem_sz >>= 1;
+		goto again;
+	}
+	kfree(schp->pages);
+	schp->pages = NULL;
 	return -ENOMEM;
 }
 
 static void
-sg_remove_scat(struct sg_fd *sfp, struct sg_scatter_hold *schp)
+sg_remove_sgat_helper(struct sg_fd *sfp, struct sg_scatter_hold *schp)
 {
-	SG_LOG(4, sfp, "%s: num_sgat=%d\n", __func__, schp->num_sgat);
-	if (schp->pages && schp->sglist_len > 0) {
-		if (!schp->dio_in_use) {
-			int k;
+	int k;
+	void *p;
 
-			for (k = 0; k < schp->num_sgat && schp->pages[k]; k++) {
-				SG_LOG(5, sfp, "%s: pg[%d]=0x%p --\n",
-				       __func__, k, schp->pages[k]);
-				__free_pages(schp->pages[k], schp->page_order);
-			}
-			kfree(schp->pages);
-		}
+	if (!schp->pages)
+		return;
+	for (k = 0; k < schp->num_sgat; ++k) {
+		p = schp->pages[k];
+		SG_LOG(5, sfp, "%s: pg[%d]=0x%p --\n", __func__, k, p);
+		if (unlikely(!p))
+			continue;
+		__free_pages(p, schp->page_order);
 	}
-	memset(schp, 0, sizeof (*schp));
+	SG_LOG(5, sfp, "%s: pg_order=%u, free pgs=0x%p --\n", __func__,
+	       schp->page_order, schp->pages);
+	kfree(schp->pages);
+}
+
+/* Remove the data (possibly a sgat list) held by srp, not srp itself */
+static void
+sg_remove_sgat(struct sg_fd *sfp, struct sg_scatter_hold *schp)
+{
+	SG_LOG(4, sfp, "%s: num_sgat=%d%s\n", __func__, schp->num_sgat,
+	       ((sfp ? (&sfp->reserve == schp) : false) ?
+		" [rsv]" : ""));
+	if (!schp->dio_in_use)
+		sg_remove_sgat_helper(sfp, schp);
+
+	memset(schp, 0, sizeof(*schp));         /* zeros buflen and dlen */
 }
 
 /*
@@ -2231,12 +2256,12 @@ sg_read_append(struct sg_request *srp, void __user *outp, int num_xfer)
 	for (k = 0; k < schp->num_sgat && schp->pages[k]; k++) {
 		if (num > num_xfer) {
 			if (copy_to_user(outp, page_address(schp->pages[k]),
-					   num_xfer))
+					 num_xfer))
 				return -EFAULT;
 			break;
 		} else {
 			if (copy_to_user(outp, page_address(schp->pages[k]),
-					   num))
+					 num))
 				return -EFAULT;
 			num_xfer -= num;
 			if (num_xfer <= 0)
@@ -2256,10 +2281,10 @@ sg_build_reserve(struct sg_fd *sfp, int req_size)
 	do {
 		if (req_size < PAGE_SIZE)
 			req_size = PAGE_SIZE;
-		if (0 == sg_build_indirect(schp, sfp, req_size))
+		if (sg_mk_sgat(schp, sfp, req_size) == 0)
 			return;
 		else
-			sg_remove_scat(sfp, schp);
+			sg_remove_sgat(sfp, schp);
 		req_size >>= 1;	/* divide by 2 */
 	} while (req_size > (PAGE_SIZE / 2));
 }
@@ -2363,8 +2388,8 @@ sg_deact_request(struct sg_fd *sfp, struct sg_request *srp)
 static struct sg_fd *
 sg_add_sfp(struct sg_device *sdp)
 {
+	int rbuf_len;
 	unsigned long iflags;
-	int bufflen;
 	struct sg_fd *sfp;
 
 	sfp = kzalloc(sizeof(*sfp), GFP_ATOMIC | __GFP_NOWARN);
@@ -2381,6 +2406,14 @@ sg_add_sfp(struct sg_device *sdp)
 	sfp->force_packid = SG_DEF_FORCE_PACK_ID;
 	sfp->cmd_q = SG_DEF_COMMAND_Q;
 	sfp->keep_orphan = SG_DEF_KEEP_ORPHAN;
+	/*
+	 * SG_SCATTER_SZ initializes scatter_elem_sz but different value may
+	 * be given as driver/module parameter (e.g. 'scatter_elem_sz=8192').
+	 * Any user provided number will be changed to be PAGE_SIZE as a
+	 * minimum, otherwise it will be rounded up (if required) to a
+	 * power of 2. So it will always be a power of 2.
+	 */
+	sfp->sgat_elem_sz = scatter_elem_sz;
 	sfp->parentdp = sdp;
 	atomic_set(&sfp->submitted, 0);
 	atomic_set(&sfp->waiting, 0);
@@ -2397,14 +2430,13 @@ sg_add_sfp(struct sg_device *sdp)
 	if (unlikely(sg_big_buff != def_reserved_size))
 		sg_big_buff = def_reserved_size;
 
-	bufflen = min_t(int, sg_big_buff,
-			max_sectors_bytes(sdp->device->request_queue));
-	sg_build_reserve(sfp, bufflen);
-	SG_LOG(3, sfp, "%s: bufflen=%d, num_sgat=%d\n", __func__,
-	       sfp->reserve.buflen, sfp->reserve.num_sgat);
+	rbuf_len = min_t(int, sg_big_buff, sdp->max_sgat_sz);
+	if (rbuf_len > 0)
+		sg_build_reserve(sfp, rbuf_len);
 
 	kref_get(&sdp->d_ref);
 	__module_get(THIS_MODULE);
+	SG_LOG(3, sfp, "%s: success, sfp=0x%p ++\n", __func__, sfp);
 	return sfp;
 }
 
@@ -2435,16 +2467,14 @@ sg_remove_sfp_usercontext(struct work_struct *work)
 	if (sfp->reserve.buflen > 0) {
 		SG_LOG(6, sfp, "%s:    buflen=%d, num_sgat=%d\n", __func__,
 		       (int)sfp->reserve.buflen, (int)sfp->reserve.num_sgat);
-		sg_remove_scat(sfp, &sfp->reserve);
+		sg_remove_sgat(sfp, &sfp->reserve);
 	}
 
 	SG_LOG(6, sfp, "%s: sfp=0x%p\n", __func__, sfp);
 	kfree(sfp);
 
-	if (sdp) {
-		scsi_device_put(sdp->device);
-		kref_put(&sdp->d_ref, sg_device_destroy);
-	}
+	scsi_device_put(sdp->device);
+	kref_put(&sdp->d_ref, sg_device_destroy);
 	module_put(THIS_MODULE);
 }
 
-- 
2.25.1



* [PATCH v18 19/83] sg: introduce request state machine
From: Douglas Gilbert @ 2021-04-27 21:56 UTC
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

The request state machine introduced here is not wired in
yet; this reduces the size of one of the following patches.
Bit position defines for the request and file descriptor
levels are also introduced, along with a minor rework of the
sg_read_append() function.

Reviewed-by: Hannes Reinecke <hare@suse.de>

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 229 ++++++++++++++++++++++++++++++++++------------
 1 file changed, 171 insertions(+), 58 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 0aeb47018b92..81de8cf5ef4b 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -72,7 +72,41 @@ static char *sg_version_date = "20190606";
  */
 #define SG_MAX_CDB_SIZE 252
 
+/* Following enum contains the states of sg_request::rq_st */
+enum sg_rq_state {
+	SG_RS_INACTIVE = 0,	/* request not in use (e.g. on fl) */
+	SG_RS_INFLIGHT,		/* active: cmd/req issued, no response yet */
+	SG_RS_AWAIT_RCV,	/* have response from LLD, awaiting receive */
+	SG_RS_RCV_DONE,		/* receive is ongoing or done */
+	SG_RS_BUSY,		/* temporary state should rarely be seen */
+};
+
+#define SG_TIME_UNIT_MS 0	/* milliseconds */
+#define SG_DEF_TIME_UNIT SG_TIME_UNIT_MS
 #define SG_DEFAULT_TIMEOUT mult_frac(SG_DEFAULT_TIMEOUT_USER, HZ, USER_HZ)
+#define SG_FD_Q_AT_HEAD 0
+#define SG_DEFAULT_Q_AT SG_FD_Q_AT_HEAD /* for backward compatibility */
+#define SG_FL_MMAP_DIRECT (SG_FLAG_MMAP_IO | SG_FLAG_DIRECT_IO)
+
+/* Only take lower 4 bits of driver byte, all host byte and sense byte */
+#define SG_ML_RESULT_MSK 0x0fff00ff	/* mid-level's 32 bit result value */
+
+#define SG_PACK_ID_WILDCARD (-1)
+
+#define SG_ADD_RQ_MAX_RETRIES 40	/* to stop infinite _trylock(s) */
+
+/* Bit positions (flags) for sg_request::frq_bm bitmask follow */
+#define SG_FRQ_IS_ORPHAN	1	/* owner of request gone */
+#define SG_FRQ_SYNC_INVOC	2	/* synchronous (blocking) invocation */
+#define SG_FRQ_NO_US_XFER	3	/* no user space transfer of data */
+#define SG_FRQ_DEACT_ORPHAN	6	/* not keeping orphan so de-activate */
+
+/* Bit positions (flags) for sg_fd::ffd_bm bitmask follow */
+#define SG_FFD_FORCE_PACKID	0	/* receive only given pack_id/tag */
+#define SG_FFD_CMD_Q		1	/* clear: only 1 active req per fd */
+#define SG_FFD_KEEP_ORPHAN	2	/* policy for this fd */
+#define SG_FFD_MMAP_CALLED	3	/* mmap(2) system call made on fd */
+#define SG_FFD_Q_AT_TAIL	5	/* set: queue reqs at tail of blk q */
 
 /* Bit positions (flags) for sg_device::fdev_bm bitmask follow */
 #define SG_FDEV_EXCLUDE		0	/* have fd open with O_EXCL */
@@ -80,12 +114,11 @@ static char *sg_version_date = "20190606";
 #define SG_FDEV_LOG_SENSE	2	/* set by ioctl(SG_SET_DEBUG) */
 
 int sg_big_buff = SG_DEF_RESERVED_SIZE;
-/* N.B. This variable is readable and writeable via
-   /proc/scsi/sg/def_reserved_size . Each time sg_open() is called a buffer
-   of this size (or less if there is not enough memory) will be reserved
-   for use by this file descriptor. [Deprecated usage: this variable is also
-   readable via /proc/sys/kernel/sg-big-buff if the sg driver is built into
-   the kernel (i.e. it is not a module).] */
+/*
+ * This variable is accessible via /proc/scsi/sg/def_reserved_size . Each
+ * time sg_open() is called a sg_request of this size (or less if there is
+ * not enough memory) will be reserved for use by this file descriptor.
+ */
 static int def_reserved_size = -1;	/* picks up init parameter */
 static int sg_allow_dio = SG_ALLOW_DIO_DEF;
 
@@ -129,6 +162,7 @@ struct sg_request {	/* SG_MAX_QUEUE requests outstanding per file */
 	char sg_io_owned;	/* 1 -> packet belongs to SG_IO */
 	/* done protected by rq_list_lock */
 	char done;		/* 0->before bh, 1->before read, 2->read */
+	atomic_t rq_st;		/* request state, holds an enum sg_rq_state */
 	struct request *rq;	/* released in sg_rq_end_io(), bio kept */
 	struct bio *bio;	/* kept until this req -->SG_RS_INACTIVE */
 	struct execute_work ew_orph;	/* harvest orphan request */
@@ -205,10 +239,15 @@ static void sg_unlink_reserve(struct sg_fd *sfp, struct sg_request *srp);
 static struct sg_fd *sg_add_sfp(struct sg_device *sdp);
 static void sg_remove_sfp(struct kref *);
 static struct sg_request *sg_setup_req(struct sg_fd *sfp);
-static int sg_deact_request(struct sg_fd *sfp, struct sg_request *srp);
+static void sg_deact_request(struct sg_fd *sfp, struct sg_request *srp);
 static struct sg_device *sg_get_dev(int dev);
 static void sg_device_destroy(struct kref *kref);
 static void sg_calc_sgat_param(struct sg_device *sdp);
+static const char *sg_rq_st_str(enum sg_rq_state rq_st, bool long_str);
+static void sg_rep_rq_state_fail(struct sg_fd *sfp,
+				 enum sg_rq_state exp_old_st,
+				 enum sg_rq_state want_st,
+				 enum sg_rq_state act_old_st);
 
 #define SZ_SG_HEADER ((int)sizeof(struct sg_header))	/* v1 and v2 header */
 #define SZ_SG_IO_HDR ((int)sizeof(struct sg_io_hdr))	/* v3 header */
@@ -216,6 +255,8 @@ static void sg_calc_sgat_param(struct sg_device *sdp);
 
 #define SG_IS_DETACHING(sdp) test_bit(SG_FDEV_DETACHING, (sdp)->fdev_bm)
 #define SG_HAVE_EXCLUDE(sdp) test_bit(SG_FDEV_EXCLUDE, (sdp)->fdev_bm)
+#define SG_RS_ACTIVE(srp) (atomic_read(&(srp)->rq_st) != SG_RS_INACTIVE)
+#define SG_RS_AWAIT_READ(srp) (atomic_read(&(srp)->rq_st) == SG_RS_AWAIT_RCV)
 
 /*
  * Kernel needs to be built with CONFIG_SCSI_LOGGING to see log messages.
@@ -379,15 +420,6 @@ sg_open(struct inode *inode, struct file *filp)
 	res = sg_allow_if_err_recovery(sdp, non_block);
 	if (res)
 		goto error_out;
-	/* scsi_block_when_processing_errors() may block so bypass
-	 * check if O_NONBLOCK. Permits SCSI commands to be issued
-	 * during error recovery. Tread carefully. */
-	if (!((op_flags & O_NONBLOCK) ||
-	      scsi_block_when_processing_errors(sdp->device))) {
-		res = -ENXIO;
-		/* we are in error recovery for this device */
-		goto error_out;
-	}
 
 	mutex_lock(&sdp->open_rel_lock);
 	if (op_flags & O_NONBLOCK) {
@@ -486,12 +518,12 @@ sg_write(struct file *filp, const char __user *p, size_t count, loff_t *ppos)
 	struct sg_device *sdp;
 	struct sg_fd *sfp;
 	struct sg_request *srp;
+	u8 cmnd[SG_MAX_CDB_SIZE];
 	struct sg_header ov2hdr;
 	struct sg_io_hdr v3hdr;
 	struct sg_header *ohp = &ov2hdr;
 	struct sg_io_hdr *h3p = &v3hdr;
 	struct sg_comm_wr_t cwr;
-	u8 cmnd[SG_MAX_CDB_SIZE];
 
 	res = sg_check_file_access(filp, __func__);
 	if (res)
@@ -746,10 +778,25 @@ sg_common_write(struct sg_fd *sfp, struct sg_comm_wr_t *cwrp)
 	return 0;
 }
 
+static inline int
+sg_rstate_chg(struct sg_request *srp, enum sg_rq_state old_st,
+	      enum sg_rq_state new_st)
+{
+	enum sg_rq_state act_old_st = (enum sg_rq_state)
+				atomic_cmpxchg(&srp->rq_st, old_st, new_st);
+
+	if (act_old_st == old_st)
+		return 0;	/* implies new_st --> srp->rq_st */
+	else if (IS_ENABLED(CONFIG_SCSI_LOGGING))
+		sg_rep_rq_state_fail(srp->parentfp, old_st, new_st,
+				     act_old_st);
+	return -EPROTOTYPE;
+}
+
 /*
- * read(2) related functions follow. They are shown after write(2) related
- * functions. Apart from read(2) itself, ioctl(SG_IORECEIVE) and the second
- * half of the ioctl(SG_IO) share code with read(2).
+ * This function is called by wait_event_interruptible in sg_read() and
+ * sg_ctl_ioreceive(). wait_event_interruptible will return if this one
+ * returns true (or an event like a signal (e.g. control-C) occurs).
  */
 
 static struct sg_request *
@@ -784,6 +831,32 @@ srp_done(struct sg_fd *sfp, struct sg_request *srp)
 	return ret;
 }
 
+#if IS_ENABLED(CONFIG_SCSI_LOGGING)
+static void
+sg_rep_rq_state_fail(struct sg_fd *sfp, enum sg_rq_state exp_old_st,
+		     enum sg_rq_state want_st, enum sg_rq_state act_old_st)
+{
+	const char *eors = "expected old rq_st: ";
+	const char *aors = "actual old rq_st: ";
+
+	if (IS_ENABLED(CONFIG_SCSI_PROC_FS))
+		SG_LOG(1, sfp, "%s: %s%s, %s%s, wanted rq_st: %s\n", __func__,
+		       eors, sg_rq_st_str(exp_old_st, false),
+		       aors, sg_rq_st_str(act_old_st, false),
+		       sg_rq_st_str(want_st, false));
+	else
+		pr_info("sg: %s: %s%d, %s%d, wanted rq_st: %d\n", __func__,
+			eors, (int)exp_old_st, aors, (int)act_old_st,
+			(int)want_st);
+}
+#else
+static void
+sg_rep_rq_state_fail(struct sg_fd *sfp, enum sg_rq_state exp_old_st,
+		     enum sg_rq_state want_st, enum sg_rq_state act_old_st)
+{
+}
+#endif
+
 static ssize_t
 sg_receive_v3(struct sg_fd *sfp, char __user *buf, size_t count,
 	      struct sg_request *srp)
@@ -1311,7 +1384,7 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 	case SG_GET_LOW_DMA:
 		SG_LOG(3, sfp, "%s:    SG_GET_LOW_DMA\n", __func__);
 		return put_user((int)sdev->host->unchecked_isa_dma, ip);
-	case SG_NEXT_CMD_LEN:
+	case SG_NEXT_CMD_LEN:	/* active only in v2 interface */
 		SG_LOG(3, sfp, "%s:    SG_NEXT_CMD_LEN\n", __func__);
 		result = get_user(val, ip);
 		if (result)
@@ -2245,48 +2318,37 @@ sg_remove_sgat(struct sg_fd *sfp, struct sg_scatter_hold *schp)
 static int
 sg_read_append(struct sg_request *srp, void __user *outp, int num_xfer)
 {
+	int k, num, res;
+	struct page *pgp;
 	struct sg_scatter_hold *schp = &srp->data;
-	int k, num;
 
 	SG_LOG(4, srp->parentfp, "%s: num_xfer=%d\n", __func__, num_xfer);
-	if (!outp || num_xfer <= 0)
-		return 0;
+	if (unlikely(!outp || num_xfer <= 0))
+		return (num_xfer == 0 && outp) ? 0 : -EINVAL;
 
 	num = 1 << (PAGE_SHIFT + schp->page_order);
-	for (k = 0; k < schp->num_sgat && schp->pages[k]; k++) {
+	for (k = 0, res = 0; k < schp->num_sgat; ++k) {
+		pgp = schp->pages[k];
+		if (unlikely(!pgp)) {
+			res = -ENXIO;
+			break;
+		}
 		if (num > num_xfer) {
-			if (copy_to_user(outp, page_address(schp->pages[k]),
-					 num_xfer))
-				return -EFAULT;
+			if (__copy_to_user(outp, page_address(pgp), num_xfer))
+				res = -EFAULT;
 			break;
 		} else {
-			if (copy_to_user(outp, page_address(schp->pages[k]),
-					 num))
-				return -EFAULT;
+			if (__copy_to_user(outp, page_address(pgp), num)) {
+				res = -EFAULT;
+				break;
+			}
 			num_xfer -= num;
 			if (num_xfer <= 0)
 				break;
 			outp += num;
 		}
 	}
-	return 0;
-}
-
-static void
-sg_build_reserve(struct sg_fd *sfp, int req_size)
-{
-	struct sg_scatter_hold *schp = &sfp->reserve;
-
-	SG_LOG(3, sfp, "%s: buflen=%d\n", __func__, req_size);
-	do {
-		if (req_size < PAGE_SIZE)
-			req_size = PAGE_SIZE;
-		if (sg_mk_sgat(schp, sfp, req_size) == 0)
-			return;
-		else
-			sg_remove_sgat(sfp, schp);
-		req_size >>= 1;	/* divide by 2 */
-	} while (req_size > (PAGE_SIZE / 2));
+	return res;
 }
 
 static void
@@ -2335,6 +2397,22 @@ sg_unlink_reserve(struct sg_fd *sfp, struct sg_request *srp)
 	sfp->res_in_use = 0;
 }
 
+static void
+sg_build_reserve(struct sg_fd *sfp, int req_size)
+{
+	struct sg_scatter_hold *schp = &sfp->reserve;
+
+	SG_LOG(3, sfp, "%s: buflen=%d\n", __func__, req_size);
+	do {
+		if (req_size < PAGE_SIZE)
+			req_size = PAGE_SIZE;
+		if (sg_mk_sgat(schp, sfp, req_size) == 0)
+			return;
+		sg_remove_sgat(sfp, schp);
+		req_size >>= 1;	/* divide by 2 */
+	} while (req_size > (PAGE_SIZE / 2));
+}
+
 /* always adds to end of list */
 static struct sg_request *
 sg_setup_req(struct sg_fd *sfp)
@@ -2366,23 +2444,21 @@ sg_setup_req(struct sg_fd *sfp)
 	return NULL;
 }
 
-/* Return of 1 for found; 0 for not found */
-static int
+static void
 sg_deact_request(struct sg_fd *sfp, struct sg_request *srp)
 {
 	unsigned long iflags;
-	int res = 0;
 
-	if (!sfp || !srp || list_empty(&sfp->rq_list))
-		return res;
+	if (WARN_ON(!sfp || !srp))
+		return;
+	if (list_empty(&sfp->rq_list))
+		return;
 	spin_lock_irqsave(&sfp->rq_list_lock, iflags);
 	if (!list_empty(&srp->entry)) {
 		list_del(&srp->entry);
 		srp->parentfp = NULL;
-		res = 1;
 	}
 	spin_unlock_irqrestore(&sfp->rq_list_lock, iflags);
-	return res;
 }
 
 static struct sg_fd *
@@ -2440,6 +2516,15 @@ sg_add_sfp(struct sg_device *sdp)
 	return sfp;
 }
 
+/*
+ * A successful call to sg_release() will result, at some later time, in this
+ * function being invoked. All requests associated with this file descriptor
+ * should be completed or cancelled when this function is called (due to
+ * sfp->f_ref). Also the file descriptor itself has not been accessible since
+ * it was list_del()-ed by the preceding sg_remove_sfp() call. So no locking
+ * is required. sdp should never be NULL but to make debugging more robust,
+ * this function will not blow up in that case.
+ */
 static void
 sg_remove_sfp_usercontext(struct work_struct *work)
 {
@@ -2533,6 +2618,33 @@ sg_get_dev(int dev)
 	return sdp;
 }
 
+#if IS_ENABLED(CONFIG_SCSI_PROC_FS)
+static const char *
+sg_rq_st_str(enum sg_rq_state rq_st, bool long_str)
+{
+	switch (rq_st) {	/* request state */
+	case SG_RS_INACTIVE:
+		return long_str ? "inactive" :  "ina";
+	case SG_RS_INFLIGHT:
+		return long_str ? "inflight" : "act";
+	case SG_RS_AWAIT_RCV:
+		return long_str ? "await_receive" : "rcv";
+	case SG_RS_RCV_DONE:
+		return long_str ? "receive_done" : "fin";
+	case SG_RS_BUSY:
+		return long_str ? "busy" : "bsy";
+	default:
+		return long_str ? "unknown" : "unk";
+	}
+}
+#else
+static const char *
+sg_rq_st_str(enum sg_rq_state rq_st, bool long_str)
+{
+	return "";
+}
+#endif
+
 #if IS_ENABLED(CONFIG_SCSI_PROC_FS)     /* long, almost to end of file */
 static int sg_proc_seq_show_int(struct seq_file *s, void *v);
 
@@ -2829,8 +2941,9 @@ sg_proc_debug_helper(struct seq_file *s, struct sg_device *sdp)
 						  jiffies_to_msecs(fp->timeout)),
 					(ms > hp->duration ? ms - hp->duration : 0));
 			}
-			seq_printf(s, "ms sgat=%d op=0x%02x\n", usg,
-				   (int) srp->data.cmd_opcode);
+			seq_printf(s, "ms sgat=%d op=0x%02x dummy: %s\n", usg,
+				   (int)srp->data.cmd_opcode,
+				   sg_rq_st_str(SG_RS_INACTIVE, false));
 		}
 		if (list_empty(&fp->rq_list))
 			seq_puts(s, "     No requests active\n");
-- 
2.25.1

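The following is a minimal userspace sketch of the page-by-page copy done by the reworked sg_read_append() above. It is not driver code. The data sits in an array of equally sized elements, with elem_sz standing in for 1 << (PAGE_SHIFT + page_order). memcpy() replaces __copy_to_user(), so the -EFAULT path is not modelled. demo_read_append() is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy num_xfer bytes out of an array of elem_sz-byte elements into a flat
 * buffer, mirroring the loop in sg_read_append(). Returns 0, -EINVAL for a
 * bad outp/num_xfer combination, or -ENXIO when a NULL element is reached
 * before num_xfer bytes have been copied.
 */
static int demo_read_append(char *const *pages, int num_pages, int elem_sz,
			    char *outp, int num_xfer)
{
	int k;

	assert(elem_sz > 0);
	if (!outp || num_xfer <= 0)
		return (num_xfer == 0 && outp) ? 0 : -EINVAL;
	for (k = 0; k < num_pages; ++k) {
		if (!pages[k])
			return -ENXIO;	/* hole in the element array */
		if (elem_sz > num_xfer) {
			/* final, partial element */
			memcpy(outp, pages[k], (size_t)num_xfer);
			return 0;
		}
		memcpy(outp, pages[k], (size_t)elem_sz);	/* whole element */
		num_xfer -= elem_sz;
		if (num_xfer <= 0)
			return 0;
		outp += elem_sz;
	}
	return 0;	/* ran out of elements; like the driver, not an error */
}
```

For example, two 4-byte elements holding "abcd" and "efgh" with num_xfer=6 produce "abcdef". A NULL element reached before num_xfer is exhausted yields -ENXIO.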

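The relocated sg_build_reserve() keeps its halving retry loop. The sketch below models only the sizing policy and makes two assumptions: a 4 KiB page, and a hypothetical allocator, demo_try_alloc(), that fails for requests above a fixed limit. The driver also calls sg_remove_sgat() after each failed sg_mk_sgat() to free any partial allocation. That cleanup is omitted here.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_PAGE_SIZE 4096	/* assumption: 4 KiB pages */

/* Hypothetical allocator: succeeds only for requests up to 'limit' bytes. */
static bool demo_try_alloc(int req_size, int limit)
{
	return req_size <= limit;
}

/*
 * Mirror of the retry loop in sg_build_reserve(): round small requests up
 * to one page, then halve after each failure. Returns the size obtained,
 * or 0 once even a single page could not be allocated.
 */
static int demo_build_reserve(int req_size, int limit)
{
	assert(req_size > 0);
	do {
		if (req_size < DEMO_PAGE_SIZE)
			req_size = DEMO_PAGE_SIZE;
		if (demo_try_alloc(req_size, limit))
			return req_size;
		req_size >>= 1;	/* divide by 2 */
	} while (req_size > (DEMO_PAGE_SIZE / 2));
	return 0;
}
```

With a request of 32768 bytes and a limit of 10000, the sizes tried are 32768, 16384 and then 8192, which succeeds. Because of the loop condition, a single page is the smallest size attempted.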

* [PATCH v18 20/83] sg: sg_find_srp_by_id
From: Douglas Gilbert @ 2021-04-27 21:56 UTC
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

Replace sg_get_rq_mark() with sg_find_srp_by_id() and
sg_get_ready_srp(). Add sg_chk_mmap() to check flags and
reserve buffer available for mmap() based requests. Add
sg_copy_sense() and sg_rec_state_v3(), which are just
refactoring. Add sg_calc_rq_dur() and sg_get_dur() in
preparation for optional nanosecond duration timing.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 286 +++++++++++++++++++++++++++++++---------------
 1 file changed, 197 insertions(+), 89 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 81de8cf5ef4b..74df15255a18 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -153,16 +153,19 @@ struct sg_fd;
 
 struct sg_request {	/* SG_MAX_QUEUE requests outstanding per file */
 	struct list_head entry;	/* list entry */
-	struct sg_fd *parentfp;	/* NULL -> not in use */
 	struct sg_scatter_hold data;	/* hold buffer, perhaps scatter list */
 	struct sg_io_hdr header;  /* scsi command+info, see <scsi/sg.h> */
 	u8 sense_b[SCSI_SENSE_BUFFERSIZE];
+	u32 duration;		/* cmd duration in milliseconds */
 	char res_used;		/* 1 -> using reserve buffer, 0 -> not ... */
 	char orphan;		/* 1 -> drop on sight, 0 -> normal */
 	char sg_io_owned;	/* 1 -> packet belongs to SG_IO */
 	/* done protected by rq_list_lock */
 	char done;		/* 0->before bh, 1->before read, 2->read */
 	atomic_t rq_st;		/* request state, holds a enum sg_rq_state */
+	u64 start_ns;		/* starting point of command duration calc */
+	unsigned long frq_bm[1];        /* see SG_FRQ_* defines above */
+	struct sg_fd *parentfp; /* pointer to owning fd, even when on fl */
 	struct request *rq;	/* released in sg_rq_end_io(), bio kept */
 	struct bio *bio;	/* kept until this req -->SG_RS_INACTIVE */
 	struct execute_work ew_orph;	/* harvest orphan request */
@@ -228,7 +231,7 @@ static ssize_t sg_submit(struct sg_fd *sfp, struct file *filp,
 			 const char __user *buf, size_t count, bool blocking,
 			 bool read_only, bool sg_io_owned,
 			 struct sg_request **o_srp);
-static int sg_common_write(struct sg_fd *sfp, struct sg_comm_wr_t *cwp);
+static int sg_common_write(struct sg_fd *sfp, struct sg_comm_wr_t *cwrp);
 static int sg_read_append(struct sg_request *srp, void __user *outp,
 			  int num_xfer);
 static void sg_remove_sgat(struct sg_fd *sfp, struct sg_scatter_hold *schp);
@@ -238,6 +241,7 @@ static void sg_link_reserve(struct sg_fd *sfp, struct sg_request *srp,
 static void sg_unlink_reserve(struct sg_fd *sfp, struct sg_request *srp);
 static struct sg_fd *sg_add_sfp(struct sg_device *sdp);
 static void sg_remove_sfp(struct kref *);
+static struct sg_request *sg_find_srp_by_id(struct sg_fd *sfp, int pack_id);
 static struct sg_request *sg_setup_req(struct sg_fd *sfp);
 static void sg_deact_request(struct sg_fd *sfp, struct sg_request *srp);
 static struct sg_device *sg_get_dev(int dev);
@@ -449,7 +453,8 @@ sg_open(struct inode *inode, struct file *filp)
 	sfp = sg_add_sfp(sdp);		/* increments sdp->d_ref */
 	if (IS_ERR(sfp)) {
 		res = PTR_ERR(sfp);
-		goto out_undo; }
+		goto out_undo;
+	}
 
 	filp->private_data = sfp;
 	atomic_inc(&sdp->open_cnt);
@@ -512,7 +517,6 @@ sg_release(struct inode *inode, struct file *filp)
 static ssize_t
 sg_write(struct file *filp, const char __user *p, size_t count, loff_t *ppos)
 {
-	bool blocking = !(filp->f_flags & O_NONBLOCK);
 	int mxsize, cmd_size, input_size, res;
 	u8 opcode;
 	struct sg_device *sdp;
@@ -613,21 +617,19 @@ sg_write(struct file *filp, const char __user *p, size_t count, loff_t *ppos)
 	}
 	/*
 	 * SG_DXFER_TO_FROM_DEV is functionally equivalent to SG_DXFER_FROM_DEV,
-	 * but is is possible that the app intended SG_DXFER_TO_DEV, because there
-	 * is a non-zero input_size, so emit a warning.
+	 * but it is possible that the app intended SG_DXFER_TO_DEV, because
+	 * there is a non-zero input_size, so emit a warning.
 	 */
 	if (h3p->dxfer_direction == SG_DXFER_TO_FROM_DEV) {
-		printk_ratelimited(KERN_WARNING
-				   "sg_write: data in/out %d/%d bytes "
-				   "for SCSI command 0x%x-- guessing "
-				   "data in;\n   program %s not setting "
-				   "count and/or reply_len properly\n",
-				   ohp->reply_len - (int)SZ_SG_HEADER,
-				   input_size, (unsigned int) cmnd[0],
-				   current->comm);
+		printk_ratelimited
+			(KERN_WARNING
+			 "%s: data in/out %d/%d bytes for SCSI command 0x%x-- guessing data in;\n"
+			 "   program %s not setting count and/or reply_len properly\n",
+			 __func__, ohp->reply_len - (int)SZ_SG_HEADER,
+			 input_size, (unsigned int)cmnd[0], current->comm);
 	}
 	cwr.timeout = sfp->timeout;
-	cwr.blocking = blocking;
+	cwr.blocking = !(filp->f_flags & O_NONBLOCK);
 	cwr.srp = srp;
 	cwr.cmnd = cmnd;
 	res = sg_common_write(sfp, &cwr);
@@ -655,6 +657,18 @@ sg_fetch_cmnd(struct file *filp, struct sg_fd *sfp, const u8 __user *u_cdbp,
 	return 0;
 }
 
+static inline int
+sg_chk_mmap(struct sg_fd *sfp, int rq_flags, int len)
+{
+	if (len > sfp->reserve.buflen)
+		return -ENOMEM;	/* MMAP_IO size must fit in reserve buffer */
+	if (rq_flags & SG_FLAG_DIRECT_IO)
+		return -EINVAL;	/* either MMAP_IO or DIRECT_IO (not both) */
+	if (sfp->res_in_use)
+		return -EBUSY;	/* reserve buffer already being used */
+	return 0;
+}
+
 static ssize_t
 sg_submit(struct sg_fd *sfp, struct file *filp, const char __user *buf,
 	  size_t count, bool blocking, bool read_only, bool sg_io_owned,
@@ -687,17 +701,10 @@ sg_submit(struct sg_fd *sfp, struct file *filp, const char __user *buf,
 		return -ENOSYS;
 	}
 	if (hp->flags & SG_FLAG_MMAP_IO) {
-		if (hp->dxfer_len > sfp->reserve.buflen) {
-			sg_deact_request(sfp, srp);
-			return -ENOMEM;	/* MMAP_IO size must fit in reserve buffer */
-		}
-		if (hp->flags & SG_FLAG_DIRECT_IO) {
+		res = sg_chk_mmap(sfp, hp->flags, hp->dxfer_len);
+		if (res) {
 			sg_deact_request(sfp, srp);
-			return -EINVAL;	/* either MMAP_IO or DIRECT_IO (not both) */
-		}
-		if (sfp->res_in_use) {
-			sg_deact_request(sfp, srp);
-			return -EBUSY;	/* reserve buffer already being used */
+			return res;
 		}
 	}
 	ul_timeout = msecs_to_jiffies(srp->header.timeout);
@@ -719,6 +726,12 @@ sg_submit(struct sg_fd *sfp, struct file *filp, const char __user *buf,
 	return count;
 }
 
+/*
+ * All writes and submits converge on this function to launch the SCSI
+ * command/request (via blk_execute_rq_nowait). Returns 0 when the
+ * request has been issued to the block layer, otherwise a negated
+ * errno value.
+ */
 static int
 sg_common_write(struct sg_fd *sfp, struct sg_comm_wr_t *cwrp)
 {
@@ -799,36 +812,58 @@ sg_rstate_chg(struct sg_request *srp, enum sg_rq_state old_st,
  * returns true (or an event like a signal (e.g. control-C) occurs).
  */
 
-static struct sg_request *
-sg_get_rq_mark(struct sg_fd *sfp, int pack_id)
+static inline bool
+sg_get_ready_srp(struct sg_fd *sfp, struct sg_request **srpp, int pack_id)
 {
-	struct sg_request *resp;
-	unsigned long iflags;
+	struct sg_request *srp;
 
-	spin_lock_irqsave(&sfp->rq_list_lock, iflags);
-	list_for_each_entry(resp, &sfp->rq_list, entry) {
-		/* look for requests that are ready + not SG_IO owned */
-		if (resp->done == 1 && !resp->sg_io_owned &&
-		    (-1 == pack_id || resp->header.pack_id == pack_id)) {
-			resp->done = 2;	/* guard against other readers */
-			spin_unlock_irqrestore(&sfp->rq_list_lock, iflags);
-			return resp;
-		}
+	if (unlikely(SG_IS_DETACHING(sfp->parentdp))) {
+		*srpp = NULL;
+		return true;
 	}
-	spin_unlock_irqrestore(&sfp->rq_list_lock, iflags);
-	return NULL;
+	srp = sg_find_srp_by_id(sfp, pack_id);
+	*srpp = srp;
+	return !!srp;
+}
+
+/*
+ * Returns number of bytes copied to user space provided sense buffer or
+ * negated errno value.
+ */
+static int
+sg_copy_sense(struct sg_request *srp)
+{
+	int sb_len_ret = 0;
+	struct sg_io_hdr *hp = &srp->header;
+
+	/* If need be, copy the sense buffer to the user space */
+	if ((CHECK_CONDITION & hp->masked_status) ||
+	    (DRIVER_SENSE & hp->driver_status)) {
+		int sb_len = SCSI_SENSE_BUFFERSIZE;
+		void __user *up = hp->sbp;
+
+		sb_len = min_t(int, hp->mx_sb_len, sb_len);
+		/* Additional sense length field */
+		sb_len_ret = 8 + (int)srp->sense_b[7];
+		sb_len_ret = min_t(int, sb_len_ret, sb_len);
+		if (copy_to_user(up, srp->sense_b, sb_len_ret))
+			return -EFAULT;
+		hp->sb_len_wr = sb_len_ret;
+	}
+	return sb_len_ret;
 }
 
 static int
-srp_done(struct sg_fd *sfp, struct sg_request *srp)
+sg_rec_state_v3(struct sg_fd *sfp, struct sg_request *srp)
 {
-	unsigned long flags;
-	int ret;
+	int sb_len_wr;
 
-	spin_lock_irqsave(&sfp->rq_list_lock, flags);
-	ret = srp->done;
-	spin_unlock_irqrestore(&sfp->rq_list_lock, flags);
-	return ret;
+	sb_len_wr = sg_copy_sense(srp);
+	if (sb_len_wr < 0)
+		return sb_len_wr;
+	if (unlikely(SG_IS_DETACHING(sfp->parentdp)))
+		return -ENODEV;
+	return 0;
 }
 
 #if IS_ENABLED(CONFIG_SCSI_LOGGING)
@@ -858,12 +893,11 @@ sg_rep_rq_state_fail(struct sg_fd *sfp, enum sg_rq_state exp_old_st,
 #endif
 
 static ssize_t
-sg_receive_v3(struct sg_fd *sfp, char __user *buf, size_t count,
-	      struct sg_request *srp)
+sg_receive_v3(struct sg_fd *sfp, struct sg_request *srp, size_t count,
+	      void __user *p)
 {
-	struct sg_io_hdr *hp = &srp->header;
 	int err = 0;
-	int len;
+	struct sg_io_hdr *hp = &srp->header;
 
 	if (in_compat_syscall()) {
 		if (count < sizeof(struct compat_sg_io_hdr)) {
@@ -874,27 +908,11 @@ sg_receive_v3(struct sg_fd *sfp, char __user *buf, size_t count,
 		err = -EINVAL;
 		goto err_out;
 	}
-	hp->sb_len_wr = 0;
-	if (hp->mx_sb_len > 0 && hp->sbp) {
-		if ((CHECK_CONDITION & hp->masked_status) ||
-		    (DRIVER_SENSE & hp->driver_status)) {
-			int sb_len = SCSI_SENSE_BUFFERSIZE;
-
-			sb_len = (hp->mx_sb_len > sb_len) ? sb_len :
-							    hp->mx_sb_len;
-			/* Additional sense length field */
-			len = 8 + (int)srp->sense_b[7];
-			len = (len > sb_len) ? sb_len : len;
-			if (copy_to_user(hp->sbp, srp->sense_b, len)) {
-				err = -EFAULT;
-				goto err_out;
-			}
-			hp->sb_len_wr = len;
-		}
-	}
+	SG_LOG(3, sfp, "%s: srp=0x%p\n", __func__, srp);
+	err = sg_rec_state_v3(sfp, srp);
 	if (hp->masked_status || hp->host_status || hp->driver_status)
 		hp->info |= SG_INFO_CHECK;
-	err = put_sg_io_hdr(hp, buf);
+	err = put_sg_io_hdr(hp, p);
 err_out:
 	sg_finish_scsi_blk_rq(srp);
 	sg_deact_request(sfp, srp);
@@ -975,16 +993,22 @@ sg_read_v1v2(void __user *buf, int count, struct sg_fd *sfp,
 	return res;
 }
 
+/*
+ * This is the read(2) system call entry point (see sg_fops) for this driver.
+ * Accepts v1, v2 or v3 type headers (not v4). Returns count or negated
+ * errno; if count is 0 then v3: returns -EINVAL; v1+v2: 0 when no other
+ * error detected or -EIO.
+ */
 static ssize_t
 sg_read(struct file *filp, char __user *p, size_t count, loff_t *ppos)
 {
 	bool could_be_v3;
 	bool non_block = !!(filp->f_flags & O_NONBLOCK);
-	int want_id = -1;
+	int want_id = SG_PACK_ID_WILDCARD;
 	int hlen, ret;
-	struct sg_device *sdp;
+	struct sg_device *sdp = NULL;
 	struct sg_fd *sfp;
-	struct sg_request *srp;
+	struct sg_request *srp = NULL;
 	struct sg_header *h2p = NULL;
 	struct sg_io_hdr a_sg_io_hdr;
 
@@ -999,7 +1023,7 @@ sg_read(struct file *filp, char __user *p, size_t count, loff_t *ppos)
 	sfp = filp->private_data;
 	sdp = sfp->parentdp;
 	SG_LOG(3, sfp, "%s: read() count=%d\n", __func__, (int)count);
-	ret = sg_allow_if_err_recovery(sdp, false);
+	ret = sg_allow_if_err_recovery(sdp, non_block);
 	if (ret)
 		return ret;
 
@@ -1018,17 +1042,13 @@ sg_read(struct file *filp, char __user *p, size_t count, loff_t *ppos)
 		if (h2p->reply_len < 0 && could_be_v3) {
 			struct sg_io_hdr *v3_hdr = (struct sg_io_hdr *)h2p;
 
-			if (likely(v3_hdr->interface_id == 'S')) {
+			if (v3_hdr->interface_id == 'S') {
 				struct sg_io_hdr __user *h3_up;
 
 				h3_up = (struct sg_io_hdr __user *)p;
 				ret = get_user(want_id, &h3_up->pack_id);
-				if (unlikely(ret))
+				if (ret)
 					return ret;
-			} else if (v3_hdr->interface_id == 'Q') {
-				pr_info_once("sg: %s: v4 interface%s here\n",
-					     __func__, " disallowed");
-				return -EPERM;
 			} else {
 				return -EPERM;
 			}
@@ -1036,25 +1056,25 @@ sg_read(struct file *filp, char __user *p, size_t count, loff_t *ppos)
 			want_id = h2p->pack_id;
 		}
 	}
-	srp = sg_get_rq_mark(sfp, want_id);
-	if (!srp) {		/* now wait on packet to arrive */
+	srp = sg_find_srp_by_id(sfp, want_id);
+	if (!srp) {	/* nothing available, so wait for a packet to arrive */
 		if (SG_IS_DETACHING(sdp))
 			return -ENODEV;
 		if (non_block) /* O_NONBLOCK or v3::flags & SGV4_FLAG_IMMED */
 			return -EAGAIN;
-		ret = wait_event_interruptible
-				(sfp->read_wait,
-				 (SG_IS_DETACHING(sdp) ||
-				  (srp = sg_get_rq_mark(sfp, want_id))));
+		ret = wait_event_interruptible(sfp->read_wait,
+					       sg_get_ready_srp(sfp, &srp,
+								want_id));
 		if (SG_IS_DETACHING(sdp))
 			return -ENODEV;
 		if (ret)	/* -ERESTARTSYS as signal hit process */
 			return ret;
+		/* otherwise srp should be valid */
 	}
 	if (srp->header.interface_id == '\0')
 		ret = sg_read_v1v2(p, (int)count, sfp, srp);
 	else
-		ret = sg_receive_v3(sfp, p, count, srp);
+		ret = sg_receive_v3(sfp, srp, count, p);
 	if (ret < 0)
 		SG_LOG(1, sfp, "%s: negated errno: %d\n", __func__, ret);
 	return ret < 0 ? ret : (int)count;
@@ -1103,6 +1123,52 @@ sg_calc_sgat_param(struct sg_device *sdp)
 	sdp->max_sgat_sz = sz;
 }
 
+static u32
+sg_calc_rq_dur(const struct sg_request *srp)
+{
+	ktime_t ts0 = ns_to_ktime(srp->start_ns);
+	ktime_t now_ts;
+	s64 diff;
+
+	if (ts0 == 0)
+		return 0;
+	if (unlikely(ts0 == S64_MAX))	/* _prior_ to issuing req */
+		return 999999999;	/* eye catching */
+	now_ts = ktime_get_boottime();
+	if (unlikely(ts0 > now_ts))
+		return 999999998;
+	/* unlikely req duration will exceed 2**32 milliseconds */
+	diff = ktime_ms_delta(now_ts, ts0);
+	return (diff > (s64)U32_MAX) ? 3999999999U : (u32)diff;
+}
+
+/* Return of U32_MAX means srp is inactive */
+static u32
+sg_get_dur(struct sg_request *srp, const enum sg_rq_state *sr_stp,
+	   bool *is_durp)
+{
+	bool is_dur = false;
+	u32 res = U32_MAX;
+
+	switch (sr_stp ? *sr_stp : atomic_read(&srp->rq_st)) {
+	case SG_RS_INFLIGHT:
+	case SG_RS_BUSY:
+		res = sg_calc_rq_dur(srp);
+		break;
+	case SG_RS_AWAIT_RCV:
+	case SG_RS_RCV_DONE:
+	case SG_RS_INACTIVE:
+		res = srp->duration;
+		is_dur = true;	/* completion has occurred, timing finished */
+		break;
+	default:
+		break;
+	}
+	if (is_durp)
+		*is_durp = is_dur;
+	return res;
+}
+
 static void
 sg_fill_request_table(struct sg_fd *sfp, struct sg_req_info *rinfo)
 {
@@ -1119,6 +1185,7 @@ sg_fill_request_table(struct sg_fd *sfp, struct sg_req_info *rinfo)
 			srp->header.masked_status &
 			srp->header.host_status &
 			srp->header.driver_status;
+		rinfo[val].duration = sg_get_dur(srp, NULL, NULL); /* dummy */
 		if (srp->done)
 			rinfo[val].duration =
 				srp->header.duration;
@@ -1136,6 +1203,18 @@ sg_fill_request_table(struct sg_fd *sfp, struct sg_req_info *rinfo)
 	}
 }
 
+static int
+srp_done(struct sg_fd *sfp, struct sg_request *srp)
+{
+	unsigned long flags;
+	int ret;
+
+	spin_lock_irqsave(&sfp->rq_list_lock, flags);
+	ret = srp->done;
+	spin_unlock_irqrestore(&sfp->rq_list_lock, flags);
+	return ret;
+}
+
 /*
  * Handles ioctl(SG_IO) for blocking (sync) usage of v3 or v4 interface.
  * Returns 0 on success else a negated errno.
@@ -1163,7 +1242,7 @@ sg_ctl_sg_io(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 	if (srp->done) {
 		srp->done = 2;
 		spin_unlock_irq(&sfp->rq_list_lock);
-		res = sg_receive_v3(sfp, p, SZ_SG_IO_HDR, srp);
+		res = sg_receive_v3(sfp, srp, SZ_SG_IO_HDR, p);
 		return (res < 0) ? res : 0;
 	}
 	srp->orphan = 1;
@@ -1391,7 +1470,9 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 			return result;
 		if (val > SG_MAX_CDB_SIZE)
 			return -ENOMEM;
-		sfp->next_cmd_len = (val > 0) ? val : 0;
+		mutex_lock(&sfp->f_mutex);
+		sfp->next_cmd_len = max_t(int, val, 0);
+		mutex_unlock(&sfp->f_mutex);
 		return 0;
 	case SG_GET_ACCESS_COUNT:
 		SG_LOG(3, sfp, "%s:    SG_GET_ACCESS_COUNT\n", __func__);
@@ -2351,6 +2432,33 @@ sg_read_append(struct sg_request *srp, void __user *outp, int num_xfer)
 	return res;
 }
 
+/*
+ * If there are multiple requests outstanding, the speed of this function is
+ * important. SG_PACK_ID_WILDCARD is -1 and that case is typically
+ * the fast path. This function is only used in the non-blocking cases.
+ * Returns pointer to (first) matching sg_request or NULL. If found,
+ * sg_request state is moved from SG_RS_AWAIT_RCV to SG_RS_BUSY.
+ */
+static struct sg_request *
+sg_find_srp_by_id(struct sg_fd *sfp, int pack_id)
+{
+	unsigned long iflags;
+	struct sg_request *resp;
+
+	spin_lock_irqsave(&sfp->rq_list_lock, iflags);
+	list_for_each_entry(resp, &sfp->rq_list, entry) {
+		/* look for requests that are ready + not SG_IO owned */
+		if (resp->done == 1 && !resp->sg_io_owned &&
+		    (-1 == pack_id || resp->header.pack_id == pack_id)) {
+			resp->done = 2;	/* guard against other readers */
+			spin_unlock_irqrestore(&sfp->rq_list_lock, iflags);
+			return resp;
+		}
+	}
+	spin_unlock_irqrestore(&sfp->rq_list_lock, iflags);
+	return NULL;
+}
+
 static void
 sg_link_reserve(struct sg_fd *sfp, struct sg_request *srp, int size)
 {
-- 
2.25.1
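
Patch 20 moves the list scan from sg_get_rq_mark() into sg_find_srp_by_id(). Below is a lock-free userspace sketch of that function's matching rule. A request qualifies when done == 1, it is not owned by SG_IO, and its pack_id matches. The wildcard -1 matches any pack_id. The first match is marked done = 2 so that a second reader skips it. struct demo_req and demo_find_by_id() are hypothetical. The driver instead walks sfp->rq_list while holding rq_list_lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_PACK_ID_WILDCARD (-1)	/* SG_PACK_ID_WILDCARD is -1 */

struct demo_req {
	int done;		/* 0: in flight, 1: ready for read(), 2: claimed */
	bool sg_io_owned;	/* true: belongs to a blocking ioctl(SG_IO) */
	int pack_id;
};

/*
 * Return the first request that is ready, not SG_IO owned, and matches
 * pack_id (any id for the wildcard), claiming it; NULL if none matches.
 * No locking: the driver performs this scan under sfp->rq_list_lock.
 */
static struct demo_req *demo_find_by_id(struct demo_req *reqs, size_t n,
					int pack_id)
{
	assert(reqs != NULL || n == 0);
	for (size_t k = 0; k < n; ++k) {
		struct demo_req *r = &reqs[k];

		if (r->done == 1 && !r->sg_io_owned &&
		    (pack_id == DEMO_PACK_ID_WILDCARD || r->pack_id == pack_id)) {
			r->done = 2;	/* guard against other readers */
			return r;
		}
	}
	return NULL;
}
```

Because the scan stops at the first hit, the wildcard case returns the oldest ready request when the list is kept in submission order.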


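sg_calc_rq_dur() reports the milliseconds elapsed since srp->start_ns and returns eye-catching sentinel values for unusual states. The sketch below mirrors those branches in userspace. The clock reading is passed in as now_ns rather than obtained from ktime_get_boottime(). demo_calc_rq_dur() is hypothetical. The truncating division is assumed to match ktime_ms_delta() for non-negative intervals.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Elapsed milliseconds from start_ns to now_ns, with the same sentinel
 * results as sg_calc_rq_dur().
 */
static uint32_t demo_calc_rq_dur(int64_t start_ns, int64_t now_ns)
{
	int64_t diff_ms;

	assert(now_ns >= 0);	/* boot-time clock readings are non-negative */
	if (start_ns == 0)
		return 0;		/* timing never started */
	if (start_ns == INT64_MAX)
		return 999999999;	/* marked _prior_ to issuing the request */
	if (start_ns > now_ns)
		return 999999998;	/* start lies in the future */
	diff_ms = (now_ns - start_ns) / 1000000;	/* ns -> ms, truncating */
	return diff_ms > (int64_t)UINT32_MAX ? 3999999999U : (uint32_t)diff_ms;
}
```

A start of 1,000,000 ns and a reading of 3,500,000 ns give 2 ms. The final clamp keeps the u32 result from wrapping for intervals longer than about 49.7 days.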

* [PATCH v18 21/83] sg: sg_fill_request_element
From: Douglas Gilbert @ 2021-04-27 21:56 UTC
  To: linux-scsi; +Cc: martin.petersen, jejb, hare, Hannes Reinecke

Replace sg_fill_request_table() with sg_fill_request_element().
Reduce the size of the sg_rq_end_io() function by breaking out
some sense buffer checks into sg_check_sense(). Reduce the
size of the sg_start_req() function with sg_set_map_data()
helper. All code refactoring, no logical change.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 215 ++++++++++++++++++++++++++--------------------
 1 file changed, 120 insertions(+), 95 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 74df15255a18..c30e98b958c4 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -159,6 +159,7 @@ struct sg_request {	/* SG_MAX_QUEUE requests outstanding per file */
 	u32 duration;		/* cmd duration in milliseconds */
 	char res_used;		/* 1 -> using reserve buffer, 0 -> not ... */
 	char orphan;		/* 1 -> drop on sight, 0 -> normal */
+	u32 rq_result;		/* packed scsi request result from LLD */
 	char sg_io_owned;	/* 1 -> packet belongs to SG_IO */
 	/* done protected by rq_list_lock */
 	char done;		/* 0->before bh, 1->before read, 2->read */
@@ -636,6 +637,18 @@ sg_write(struct file *filp, const char __user *p, size_t count, loff_t *ppos)
 	return (res < 0) ? res : count;
 }
 
+static inline int
+sg_chk_mmap(struct sg_fd *sfp, int rq_flags, int len)
+{
+	if (len > sfp->reserve.buflen)
+		return -ENOMEM;	/* MMAP_IO size must fit in reserve buffer */
+	if (rq_flags & SG_FLAG_DIRECT_IO)
+		return -EINVAL;	/* either MMAP_IO or DIRECT_IO (not both) */
+	if (sfp->res_in_use)
+		return -EBUSY;	/* reserve buffer already being used */
+	return 0;
+}
+
 static int
 sg_fetch_cmnd(struct file *filp, struct sg_fd *sfp, const u8 __user *u_cdbp,
 	      int len, u8 *cdbp)
@@ -657,18 +670,6 @@ sg_fetch_cmnd(struct file *filp, struct sg_fd *sfp, const u8 __user *u_cdbp,
 	return 0;
 }
 
-static inline int
-sg_chk_mmap(struct sg_fd *sfp, int rq_flags, int len)
-{
-	if (len > sfp->reserve.buflen)
-		return -ENOMEM;	/* MMAP_IO size must fit in reserve buffer */
-	if (rq_flags & SG_FLAG_DIRECT_IO)
-		return -EINVAL;	/* either MMAP_IO or DIRECT_IO (not both) */
-	if (sfp->res_in_use)
-		return -EBUSY;	/* reserve buffer already being used */
-	return 0;
-}
-
 static ssize_t
 sg_submit(struct sg_fd *sfp, struct file *filp, const char __user *buf,
 	  size_t count, bool blocking, bool read_only, bool sg_io_owned,
@@ -919,6 +920,11 @@ sg_receive_v3(struct sg_fd *sfp, struct sg_request *srp, size_t count,
 	return err;
 }
 
+/*
+ * Completes a v3 request/command. Called from sg_read {v2 or v3},
+ * ioctl(SG_IO) {for v3}, or from ioctl(SG_IORECEIVE) when it is
+ * completing a v3 request/command.
+ */
 static int
 sg_read_v1v2(void __user *buf, int count, struct sg_fd *sfp,
 	     struct sg_request *srp)
@@ -1170,37 +1176,28 @@ sg_get_dur(struct sg_request *srp, const enum sg_rq_state *sr_stp,
 }
 
 static void
-sg_fill_request_table(struct sg_fd *sfp, struct sg_req_info *rinfo)
+sg_fill_request_element(struct sg_fd *sfp, struct sg_request *srp,
+			struct sg_req_info *rip)
 {
-	struct sg_request *srp;
-	int val;
 	unsigned int ms;
 
-	val = 0;
-	list_for_each_entry(srp, &sfp->rq_list, entry) {
-		if (val >= SG_MAX_QUEUE)
-			break;
-		rinfo[val].req_state = srp->done + 1;
-		rinfo[val].problem =
-			srp->header.masked_status &
-			srp->header.host_status &
-			srp->header.driver_status;
-		rinfo[val].duration = sg_get_dur(srp, NULL, NULL); /* dummy */
-		if (srp->done)
-			rinfo[val].duration =
-				srp->header.duration;
-		else {
-			ms = jiffies_to_msecs(jiffies);
-			rinfo[val].duration =
-				(ms > srp->header.duration) ?
+	rip->req_state = srp->done + 1;
+	rip->problem = srp->header.masked_status &
+		       srp->header.host_status &
+		       srp->header.driver_status;
+	rip->duration = sg_get_dur(srp, NULL, NULL); /* dummy */
+	if (srp->done) {
+		rip->duration = srp->header.duration;
+	} else {
+		ms = jiffies_to_msecs(jiffies);
+		rip->duration = (ms > srp->header.duration) ?
 				(ms - srp->header.duration) : 0;
-		}
-		rinfo[val].orphan = srp->orphan;
-		rinfo[val].sg_io_owned = srp->sg_io_owned;
-		rinfo[val].pack_id = srp->header.pack_id;
-		rinfo[val].usr_ptr = srp->header.usr_ptr;
-		val++;
 	}
+	rip->orphan = srp->orphan;
+	rip->sg_io_owned = srp->sg_io_owned;
+	rip->pack_id = srp->header.pack_id;
+	rip->usr_ptr = srp->header.usr_ptr;
+
 }
 
 static int
@@ -1294,28 +1291,35 @@ static int put_compat_request_table(struct compat_sg_req_info __user *o,
 static int
 sg_ctl_req_tbl(struct sg_fd *sfp, void __user *p)
 {
-	int result;
+	int result, val;
 	unsigned long iflags;
-	sg_req_info_t *rinfo;
+	struct sg_request *srp;
+	sg_req_info_t *rinfop;
 
-	rinfo = kcalloc(SG_MAX_QUEUE, SZ_SG_REQ_INFO,
-			GFP_KERNEL);
-	if (!rinfo)
+	rinfop = kcalloc(SG_MAX_QUEUE, SZ_SG_REQ_INFO,
+			 GFP_KERNEL);
+	if (!rinfop)
 		return -ENOMEM;
 	spin_lock_irqsave(&sfp->rq_list_lock, iflags);
-	sg_fill_request_table(sfp, rinfo);
+	val = 0;
+	list_for_each_entry(srp, &sfp->rq_list, entry) {
+		if (val >= SG_MAX_QUEUE)
+			break;
+		sg_fill_request_element(sfp, srp, rinfop + val);
+		val++;
+	}
 	spin_unlock_irqrestore(&sfp->rq_list_lock, iflags);
 #ifdef CONFIG_COMPAT
 	if (in_compat_syscall())
-		result = put_compat_request_table(p, rinfo);
+		result = put_compat_request_table(p, rinfop);
 	else
-		result = copy_to_user(p, rinfo,
+		result = copy_to_user(p, rinfop,
 				      SZ_SG_REQ_INFO * SG_MAX_QUEUE);
 #else
-	result = copy_to_user(p, rinfo,
+	result = copy_to_user(p, rinfop,
 			      SZ_SG_REQ_INFO * SG_MAX_QUEUE);
 #endif
-	kfree(rinfo);
+	kfree(rinfop);
 	return result > 0 ? -EFAULT : result;	/* treat short copy as error */
 }
 
@@ -1370,7 +1374,7 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 			return result;
 		sfp->force_packid = val ? 1 : 0;
 		return 0;
-	case SG_GET_PACK_ID:
+	case SG_GET_PACK_ID:    /* or tag of oldest "read"-able, -1 if none */
 		val = -1;
 		spin_lock_irqsave(&sfp->rq_list_lock, iflags);
 		list_for_each_entry(srp, &sfp->rq_list, entry) {
@@ -1746,6 +1750,39 @@ sg_rq_end_io_usercontext(struct work_struct *work)
 	kref_put(&sfp->f_ref, sg_remove_sfp);
 }
 
+static void
+sg_check_sense(struct sg_device *sdp, struct sg_request *srp, int sense_len)
+{
+	int driver_stat;
+	u32 rq_res = srp->rq_result;
+	struct scsi_request *scsi_rp = scsi_req(srp->rq);
+	u8 *sbp = scsi_rp ? scsi_rp->sense : NULL;
+
+	if (!sbp)
+		return;
+	driver_stat = driver_byte(rq_res);
+	if (driver_stat & DRIVER_SENSE) {
+		struct scsi_sense_hdr ssh;
+
+		if (scsi_normalize_sense(sbp, sense_len, &ssh)) {
+			if (!scsi_sense_is_deferred(&ssh)) {
+				if (ssh.sense_key == UNIT_ATTENTION) {
+					if (sdp->device->removable)
+						sdp->device->changed = 1;
+				}
+			}
+		}
+	}
+	if (test_bit(SG_FDEV_LOG_SENSE, sdp->fdev_bm) > 0) {
+		int scsi_stat = rq_res & 0xff;
+
+		if (scsi_stat == SAM_STAT_CHECK_CONDITION ||
+		    scsi_stat == SAM_STAT_COMMAND_TERMINATED)
+			__scsi_print_sense(sdp->device, __func__, sbp,
+					   sense_len);
+	}
+}
+
 /*
  * This function is a "bottom half" handler that is called by the mid
  * level when a command is completed (or has failed).
@@ -1754,13 +1791,13 @@ static void
 sg_rq_end_io(struct request *rq, blk_status_t status)
 {
 	struct sg_request *srp = rq->end_io_data;
-	struct scsi_request *req = scsi_req(rq);
+	struct scsi_request *scsi_rp = scsi_req(rq);
 	struct sg_device *sdp;
 	struct sg_fd *sfp;
 	unsigned long iflags;
 	unsigned int ms;
-	char *sense;
-	int result, resid, done = 1;
+	int resid, slen;
+	int done = 1;
 
 	if (WARN_ON(srp->done != 0))
 		return;
@@ -1773,44 +1810,22 @@ sg_rq_end_io(struct request *rq, blk_status_t status)
 	if (unlikely(SG_IS_DETACHING(sdp)))
 		pr_info("%s: device detaching\n", __func__);
 
-	sense = req->sense;
-	result = req->result;
-	resid = req->resid_len;
+	srp->rq_result = scsi_rp->result;
+	resid = scsi_rp->resid_len;
 
 	srp->header.resid = resid;
+
+	slen = min_t(int, scsi_rp->sense_len, SCSI_SENSE_BUFFERSIZE);
+
 	SG_LOG(6, sfp, "%s: pack_id=%d, res=0x%x\n", __func__,
-	       srp->header.pack_id, result);
+	       srp->header.pack_id, srp->rq_result);
 	ms = jiffies_to_msecs(jiffies);
 	srp->header.duration = (ms > srp->header.duration) ?
 				(ms - srp->header.duration) : 0;
-	if (0 != result) {
-		struct scsi_sense_hdr sshdr;
-
-		srp->header.status = 0xff & result;
-		srp->header.masked_status = status_byte(result);
-		srp->header.msg_status = msg_byte(result);
-		srp->header.host_status = host_byte(result);
-		srp->header.driver_status = driver_byte(result);
-		if (test_bit(SG_FDEV_LOG_SENSE, sdp->fdev_bm) &&
-		    (srp->header.masked_status == CHECK_CONDITION ||
-		     srp->header.masked_status == COMMAND_TERMINATED))
-			__scsi_print_sense(sdp->device, __func__, sense,
-					   SCSI_SENSE_BUFFERSIZE);
-
-		/* Following if statement is a patch supplied by Eric Youngdale */
-		if (driver_byte(result) != 0
-		    && scsi_normalize_sense(sense, SCSI_SENSE_BUFFERSIZE, &sshdr)
-		    && !scsi_sense_is_deferred(&sshdr)
-		    && sshdr.sense_key == UNIT_ATTENTION
-		    && sdp->device->removable) {
-			/* Detected possible disc change. Set the bit - this */
-			/* may be used if there are filesystems using this device */
-			sdp->device->changed = 1;
-		}
-	}
-
-	if (req->sense_len)
-		memcpy(srp->sense_b, req->sense, SCSI_SENSE_BUFFERSIZE);
+	if (srp->rq_result != 0 && slen > 0)
+		sg_check_sense(sdp, srp, slen);
+	if (slen > 0)
+		memcpy(srp->sense_b, scsi_rp->sense, slen);
 
 	/* Rely on write phase to clean out srp status values, so no "else" */
 
@@ -1869,6 +1884,7 @@ static struct class *sg_sysfs_class;
 
 static bool sg_sysfs_valid;
 
+/* Returns valid pointer to sg_device or negated errno twisted by ERR_PTR */
 static struct sg_device *
 sg_add_device_helper(struct gendisk *disk, struct scsi_device *scsidp)
 {
@@ -2081,6 +2097,7 @@ init_sg(void)
 {
 	int rc;
 
+	/* check scatter_elem_sz module parameter, change if inappropriate */
 	if (scatter_elem_sz < (int)PAGE_SIZE)
 		scatter_elem_sz = PAGE_SIZE;
 	else if (!is_power_of_2(scatter_elem_sz))
@@ -2094,8 +2111,11 @@ init_sg(void)
 				    SG_MAX_DEVS, "sg");
 	if (rc)
 		return rc;
-        sg_sysfs_class = class_create(THIS_MODULE, "scsi_generic");
-        if ( IS_ERR(sg_sysfs_class) ) {
+	pr_info("Registered %s[char major=0x%x], version: %s, date: %s\n",
+		"sg device ", SCSI_GENERIC_MAJOR, SG_VERSION_STR,
+		sg_version_date);
+	sg_sysfs_class = class_create(THIS_MODULE, "scsi_generic");
+	if (IS_ERR(sg_sysfs_class)) {
 		rc = PTR_ERR(sg_sysfs_class);
 		goto err_out_unreg;
 	}
@@ -2132,6 +2152,18 @@ exit_sg(void)
 	idr_destroy(&sg_index_idr);
 }
 
+static void
+sg_set_map_data(const struct sg_scatter_hold *schp, bool up_valid,
+		struct rq_map_data *mdp)
+{
+	memset(mdp, 0, sizeof(*mdp));
+	mdp->pages = schp->pages;
+	mdp->page_order = schp->page_order;
+	mdp->nr_entries = schp->num_sgat;
+	mdp->offset = 0;
+	mdp->null_mapped = !up_valid;
+}
+
 static int
 sg_start_req(struct sg_request *srp, u8 *cmd)
 {
@@ -2219,15 +2251,8 @@ sg_start_req(struct sg_request *srp, u8 *cmd)
 		}
 		mutex_unlock(&sfp->f_mutex);
 
-		md->pages = req_schp->pages;
-		md->page_order = req_schp->page_order;
-		md->nr_entries = req_schp->num_sgat;
-		md->offset = 0;
-		md->null_mapped = hp->dxferp ? 0 : 1;
-		if (dxfer_dir == SG_DXFER_TO_FROM_DEV)
-			md->from_user = 1;
-		else
-			md->from_user = 0;
+		sg_set_map_data(req_schp, !!hp->dxferp, md);
+		md->from_user = (dxfer_dir == SG_DXFER_TO_FROM_DEV);
 	}
 
 	if (iov_count) {
-- 
2.25.1



* [PATCH v18 22/83] sg: printk change %p to %pK
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
                   ` (21 preceding siblings ...)
  2021-04-27 21:56 ` [PATCH v18 21/83] sg: sg_fill_request_element Douglas Gilbert
@ 2021-04-27 21:56 ` Douglas Gilbert
  2021-04-27 21:56 ` [PATCH v18 23/83] sg: xarray for fds in device Douglas Gilbert
                   ` (60 subsequent siblings)
  83 siblings, 0 replies; 86+ messages in thread
From: Douglas Gilbert @ 2021-04-27 21:56 UTC
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

This driver does a lot of buffer juggling in an attempt to
take some of that chore away from its users. When debugging
problems associated with that buffer juggling, getting
sensible pointer values is a major aid. So change %p
to %pK. The system administrator can use the
kernel.kptr_restrict sysctl to choose whether %pK pointers
are shown as real addresses or obfuscated. The "pK" is also
easier to search for in the code if further changes are
required.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 30 +++++++++++++++---------------
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index c30e98b958c4..6fa3faf792a6 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -909,7 +909,7 @@ sg_receive_v3(struct sg_fd *sfp, struct sg_request *srp, size_t count,
 		err = -EINVAL;
 		goto err_out;
 	}
-	SG_LOG(3, sfp, "%s: srp=0x%p\n", __func__, srp);
+	SG_LOG(3, sfp, "%s: srp=0x%pK\n", __func__, srp);
 	err = sg_rec_state_v3(sfp, srp);
 	if (hp->masked_status || hp->host_status || hp->driver_status)
 		hp->info |= SG_INFO_CHECK;
@@ -1696,7 +1696,7 @@ sg_mmap(struct file *filp, struct vm_area_struct *vma)
 		return -ENXIO;
 	}
 	req_sz = vma->vm_end - vma->vm_start;
-	SG_LOG(3, sfp, "%s: vm_start=%p, len=%d\n", __func__,
+	SG_LOG(3, sfp, "%s: vm_start=%pK, len=%d\n", __func__,
 	       (void *)vma->vm_start, (int)req_sz);
 	if (vma->vm_pgoff)
 		return -EINVAL; /* only an offset of 0 accepted */
@@ -1744,7 +1744,7 @@ sg_rq_end_io_usercontext(struct work_struct *work)
 		WARN_ONCE(1, "%s: sfp unexpectedly NULL\n", __func__);
 		return;
 	}
-	SG_LOG(3, sfp, "%s: srp=0x%p\n", __func__, srp);
+	SG_LOG(3, sfp, "%s: srp=0x%pK\n", __func__, srp);
 	sg_finish_scsi_blk_rq(srp);
 	sg_deact_request(sfp, srp);
 	kref_put(&sfp->f_ref, sg_remove_sfp);
@@ -1917,7 +1917,7 @@ sg_add_device_helper(struct gendisk *disk, struct scsi_device *scsidp)
 	k = error;
 
 	SCSI_LOG_TIMEOUT(3, sdev_printk(KERN_INFO, scsidp,
-			 "%s: dev=%d, sdp=0x%p ++\n", __func__, k, sdp));
+			 "%s: dev=%d, sdp=0x%pK ++\n", __func__, k, sdp));
 	sprintf(disk->disk_name, "sg%d", k);
 	disk->first_minor = k;
 	sdp->disk = disk;
@@ -2026,7 +2026,7 @@ sg_device_destroy(struct kref *kref)
 	struct sg_device *sdp = container_of(kref, struct sg_device, d_ref);
 	unsigned long flags;
 
-	SCSI_LOG_TIMEOUT(1, pr_info("[tid=%d] %s: sdp idx=%d, sdp=0x%p --\n",
+	SCSI_LOG_TIMEOUT(1, pr_info("[tid=%d] %s: sdp idx=%d, sdp=0x%pK --\n",
 				    (current ? current->pid : -1), __func__,
 				    sdp->index, sdp));
 	/*
@@ -2058,7 +2058,7 @@ sg_remove_device(struct device *cl_dev, struct class_interface *cl_intf)
 		return; /* only want to do following once per device */
 
 	SCSI_LOG_TIMEOUT(3, sdev_printk(KERN_INFO, sdp->device,
-					"%s: 0x%p\n", __func__, sdp));
+					"%s: 0x%pK\n", __func__, sdp));
 
 	read_lock_irqsave(&sdp->sfd_lock, iflags);
 	list_for_each_entry(sfp, &sdp->sfds, sfd_entry) {
@@ -2186,7 +2186,7 @@ sg_start_req(struct sg_request *srp, u8 *cmd)
 		long_cmdp = kzalloc(hp->cmd_len, GFP_KERNEL);
 		if (!long_cmdp)
 			return -ENOMEM;
-		SG_LOG(5, sfp, "%s: long_cmdp=0x%p ++\n", __func__, long_cmdp);
+		SG_LOG(5, sfp, "%s: long_cmdp=0x%pK ++\n", __func__, long_cmdp);
 	}
 	SG_LOG(4, sfp, "%s: dxfer_len=%d, data-%s\n", __func__, dxfer_len,
 	       (r0w ? "OUT" : "IN"));
@@ -2294,7 +2294,7 @@ sg_finish_scsi_blk_rq(struct sg_request *srp)
 	struct sg_fd *sfp = srp->parentfp;
 	struct sg_scatter_hold *req_schp = &srp->data;
 
-	SG_LOG(4, sfp, "%s: srp=0x%p%s\n", __func__, srp,
+	SG_LOG(4, sfp, "%s: srp=0x%pK%s\n", __func__, srp,
 	       (srp->res_used) ? " rsv" : "");
 	if (!srp->sg_io_owned) {
 		atomic_dec(&sfp->submitted);
@@ -2339,7 +2339,7 @@ sg_mk_sgat(struct sg_scatter_hold *schp, struct sg_fd *sfp, int minlen)
 	align_sz = ALIGN(minlen, SG_DEF_SECTOR_SZ);
 
 	schp->pages = kcalloc(mx_sgat_elems, ptr_sz, mask_kz);
-	SG_LOG(4, sfp, "%s: minlen=%d, align_sz=%d [sz=%zu, 0x%p ++]\n",
+	SG_LOG(4, sfp, "%s: minlen=%d, align_sz=%d [sz=%zu, 0x%pK ++]\n",
 	       __func__, minlen, align_sz, mx_sgat_elems * ptr_sz,
 	       schp->pages);
 	if (unlikely(!schp->pages))
@@ -2357,7 +2357,7 @@ sg_mk_sgat(struct sg_scatter_hold *schp, struct sg_fd *sfp, int minlen)
 		schp->pages[k] = alloc_pages(mask_ap, order);
 		if (!schp->pages[k])
 			goto err_out;
-		SG_LOG(5, sfp, "%s: k=%d, order=%d [0x%p ++]\n", __func__, k,
+		SG_LOG(5, sfp, "%s: k=%d, order=%d [0x%pK ++]\n", __func__, k,
 		       order, schp->pages[k]);
 	}
 	schp->page_order = order;
@@ -2393,12 +2393,12 @@ sg_remove_sgat_helper(struct sg_fd *sfp, struct sg_scatter_hold *schp)
 		return;
 	for (k = 0; k < schp->num_sgat; ++k) {
 		p = schp->pages[k];
-		SG_LOG(5, sfp, "%s: pg[%d]=0x%p --\n", __func__, k, p);
+		SG_LOG(5, sfp, "%s: pg[%d]=0x%pK --\n", __func__, k, p);
 		if (unlikely(!p))
 			continue;
 		__free_pages(p, schp->page_order);
 	}
-	SG_LOG(5, sfp, "%s: pg_order=%u, free pgs=0x%p --\n", __func__,
+	SG_LOG(5, sfp, "%s: pg_order=%u, free pgs=0x%pK --\n", __func__,
 	       schp->page_order, schp->pages);
 	kfree(schp->pages);
 }
@@ -2635,7 +2635,7 @@ sg_add_sfp(struct sg_device *sdp)
 	}
 	list_add_tail(&sfp->sfd_entry, &sdp->sfds);
 	write_unlock_irqrestore(&sdp->sfd_lock, iflags);
-	SG_LOG(3, sfp, "%s: sfp=0x%p\n", __func__, sfp);
+	SG_LOG(3, sfp, "%s: sfp=0x%pK\n", __func__, sfp);
 	if (unlikely(sg_big_buff != def_reserved_size))
 		sg_big_buff = def_reserved_size;
 
@@ -2645,7 +2645,7 @@ sg_add_sfp(struct sg_device *sdp)
 
 	kref_get(&sdp->d_ref);
 	__module_get(THIS_MODULE);
-	SG_LOG(3, sfp, "%s: success, sfp=0x%p ++\n", __func__, sfp);
+	SG_LOG(3, sfp, "%s: success, sfp=0x%pK ++\n", __func__, sfp);
 	return sfp;
 }
 
@@ -2688,7 +2688,7 @@ sg_remove_sfp_usercontext(struct work_struct *work)
 		sg_remove_sgat(sfp, &sfp->reserve);
 	}
 
-	SG_LOG(6, sfp, "%s: sfp=0x%p\n", __func__, sfp);
+	SG_LOG(6, sfp, "%s: sfp=0x%pK\n", __func__, sfp);
 	kfree(sfp);
 
 	scsi_device_put(sdp->device);
-- 
2.25.1



* [PATCH v18 23/83] sg: xarray for fds in device
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
                   ` (22 preceding siblings ...)
  2021-04-27 21:56 ` [PATCH v18 22/83] sg: printk change %p to %pK Douglas Gilbert
@ 2021-04-27 21:56 ` Douglas Gilbert
  2021-04-27 21:56 ` [PATCH v18 24/83] sg: xarray for reqs in fd Douglas Gilbert
                   ` (59 subsequent siblings)
  83 siblings, 0 replies; 86+ messages in thread
From: Douglas Gilbert @ 2021-04-27 21:56 UTC
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

Add an xarray to each sg_device object to hold pointers to its
children. The children are sg_fd objects, each associated
with an open file descriptor. The xarray replaces a doubly
linked list and the rwlock that protected it.
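The index bookkeeping in sg_add_sfp() relies on __xa_alloc() storing the new sg_fd at a free index inside the limit xal (here [0, open_cnt]) and reporting that index, which is saved in sfp->idx so that sg_remove_sfp_usercontext() can later erase exactly that slot. The userspace sketch below mimics that contract with a fixed-size array. struct toy_xa, toy_xa_alloc() and toy_xa_erase() are hypothetical stand-ins, not the xarray API; the 64-slot capacity is an arbitrary assumption, and this sketch simply picks the lowest free index.

```c
#include <errno.h>
#include <stddef.h>

#define TOY_SLOTS 64	/* hypothetical fixed capacity for this sketch */

/* Toy stand-in for an allocating xarray: slot[i] != NULL means index i is used. */
struct toy_xa {
	void *slot[TOY_SLOTS];
};

/*
 * Store 'entry' at the lowest free index in [min, max] and report that
 * index through *idp: 0 on success, -EBUSY if every index in the limit
 * is occupied (the same return convention as xa_alloc()).
 */
static int toy_xa_alloc(struct toy_xa *xa, unsigned int *idp, void *entry,
			unsigned int min, unsigned int max)
{
	unsigned int i;

	if (max >= TOY_SLOTS)
		max = TOY_SLOTS - 1;
	for (i = min; i <= max; i++) {
		if (!xa->slot[i]) {
			xa->slot[i] = entry;
			*idp = i;
			return 0;
		}
	}
	return -EBUSY;
}

/* Remove and return the entry stored at idx (NULL if none), like xa_erase(). */
static void *toy_xa_erase(struct toy_xa *xa, unsigned int idx)
{
	void *old;

	if (idx >= TOY_SLOTS)
		return NULL;
	old = xa->slot[idx];
	xa->slot[idx] = NULL;
	return old;
}
```

Because sg_open() increments open_cnt before calling sg_add_sfp(), a limit of [0, open_cnt] spans more indices than there are descriptors already stored, so in the expected case a free index remains; if it does not, the allocation fails with -EBUSY, as in the sketch.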

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 159 +++++++++++++++++++---------------------------
 1 file changed, 65 insertions(+), 94 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 6fa3faf792a6..96f0e28701cf 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -42,6 +42,7 @@ static char *sg_version_date = "20190606";
 #include <linux/uio.h>
 #include <linux/cred.h> /* for sg_check_file_access() */
 #include <linux/proc_fs.h>
+#include <linux/xarray.h>
 
 #include "scsi.h"
 #include <scsi/scsi_dbg.h>
@@ -52,7 +53,6 @@ static char *sg_version_date = "20190606";
 
 #include "scsi_logging.h"
 
-
 #define SG_ALLOW_DIO_DEF 0
 
 #define SG_MAX_DEVS 32768
@@ -173,13 +173,13 @@ struct sg_request {	/* SG_MAX_QUEUE requests outstanding per file */
 };
 
 struct sg_fd {		/* holds the state of a file descriptor */
-	struct list_head sfd_entry;	/* member sg_device::sfds list */
 	struct sg_device *parentdp;	/* owning device */
 	wait_queue_head_t read_wait;	/* queue read until command done */
 	spinlock_t rq_list_lock;	/* protect access to list in req_arr */
 	struct mutex f_mutex;	/* protect against changes in this fd */
 	int timeout;		/* defaults to SG_DEFAULT_TIMEOUT      */
 	int timeout_user;	/* defaults to SG_DEFAULT_TIMEOUT_USER */
+	u32 idx;		/* my index within parent's sfp_arr */
 	atomic_t submitted;	/* number inflight or awaiting read */
 	atomic_t waiting;	/* number of requests awaiting read */
 	int sgat_elem_sz;	/* initialized to scatter_elem_sz */
@@ -202,7 +202,6 @@ struct sg_device { /* holds the state of each scsi generic device */
 	wait_queue_head_t open_wait;    /* queue open() when O_EXCL present */
 	struct mutex open_rel_lock;     /* held when in open() or release() */
 	struct list_head sfds;
-	rwlock_t sfd_lock;      /* protect access to sfd list */
 	int max_sgat_elems;     /* adapter's max number of elements in sgat */
 	int max_sgat_sz;	/* max number of bytes in sgat list */
 	u32 index;		/* device index number */
@@ -210,6 +209,7 @@ struct sg_device { /* holds the state of each scsi generic device */
 	unsigned long fdev_bm[1];	/* see SG_FDEV_* defines above */
 	struct gendisk *disk;
 	struct cdev *cdev;
+	struct xarray sfp_arr;
 	struct kref d_ref;
 };
 
@@ -247,12 +247,7 @@ static struct sg_request *sg_setup_req(struct sg_fd *sfp);
 static void sg_deact_request(struct sg_fd *sfp, struct sg_request *srp);
 static struct sg_device *sg_get_dev(int dev);
 static void sg_device_destroy(struct kref *kref);
-static void sg_calc_sgat_param(struct sg_device *sdp);
 static const char *sg_rq_st_str(enum sg_rq_state rq_st, bool long_str);
-static void sg_rep_rq_state_fail(struct sg_fd *sfp,
-				 enum sg_rq_state exp_old_st,
-				 enum sg_rq_state want_st,
-				 enum sg_rq_state act_old_st);
 
 #define SZ_SG_HEADER ((int)sizeof(struct sg_header))	/* v1 and v2 header */
 #define SZ_SG_IO_HDR ((int)sizeof(struct sg_io_hdr))	/* v3 header */
@@ -261,7 +256,6 @@ static void sg_rep_rq_state_fail(struct sg_fd *sfp,
 #define SG_IS_DETACHING(sdp) test_bit(SG_FDEV_DETACHING, (sdp)->fdev_bm)
 #define SG_HAVE_EXCLUDE(sdp) test_bit(SG_FDEV_EXCLUDE, (sdp)->fdev_bm)
 #define SG_RS_ACTIVE(srp) (atomic_read(&(srp)->rq_st) != SG_RS_INACTIVE)
-#define SG_RS_AWAIT_READ(srp) (atomic_read(&(srp)->rq_st) == SG_RS_AWAIT_RCV)
 
 /*
  * Kernel needs to be built with CONFIG_SCSI_LOGGING to see log messages.
@@ -400,6 +394,7 @@ sg_open(struct inode *inode, struct file *filp)
 	int min_dev = iminor(inode);
 	int op_flags = filp->f_flags;
 	int res;
+	__maybe_unused int o_count;
 	struct sg_device *sdp;
 	struct sg_fd *sfp;
 
@@ -449,20 +444,18 @@ sg_open(struct inode *inode, struct file *filp)
 	if (o_excl)
 		set_bit(SG_FDEV_EXCLUDE, sdp->fdev_bm);
 
-	if (atomic_read(&sdp->open_cnt) < 1)	/* no existing opens */
-		sg_calc_sgat_param(sdp);
+	o_count = atomic_inc_return(&sdp->open_cnt);
 	sfp = sg_add_sfp(sdp);		/* increments sdp->d_ref */
 	if (IS_ERR(sfp)) {
+		atomic_dec(&sdp->open_cnt);
 		res = PTR_ERR(sfp);
 		goto out_undo;
 	}
 
 	filp->private_data = sfp;
-	atomic_inc(&sdp->open_cnt);
 	mutex_unlock(&sdp->open_rel_lock);
-	SG_LOG(3, sfp, "%s: minor=%d, op_flags=0x%x; %s count prior=%d%s\n",
-	       __func__, min_dev, op_flags, "device open",
-	       atomic_read(&sdp->open_cnt),
+	SG_LOG(3, sfp, "%s: minor=%d, op_flags=0x%x; %s count after=%d%s\n",
+	       __func__, min_dev, op_flags, "device open", o_count,
 	       ((op_flags & O_NONBLOCK) ? " O_NONBLOCK" : ""));
 
 	res = 0;
@@ -490,26 +483,28 @@ sg_open(struct inode *inode, struct file *filp)
 static int
 sg_release(struct inode *inode, struct file *filp)
 {
+	int o_count;
 	struct sg_device *sdp;
 	struct sg_fd *sfp;
 
 	sfp = filp->private_data;
-	sdp = sfp->parentdp;
-	SG_LOG(3, sfp, "%s: device open count prior=%d\n", __func__,
-	       atomic_read(&sdp->open_cnt));
-	if (!sdp)
+	sdp = sfp ? sfp->parentdp : NULL;
+	if (unlikely(!sdp))
 		return -ENXIO;
 
 	mutex_lock(&sdp->open_rel_lock);
+	o_count = atomic_read(&sdp->open_cnt);
+	SG_LOG(3, sfp, "%s: open count before=%d\n", __func__, o_count);
 	scsi_autopm_put_device(sdp->device);
 	kref_put(&sfp->f_ref, sg_remove_sfp);
-	atomic_dec(&sdp->open_cnt);
 
-	/* possibly many open()s waiting on exclude clearing, start many;
-	 * only open(O_EXCL)s wait on 0==open_cnt so only start one */
+	/*
+	 * Possibly many open()s waiting on exclude clearing, start many;
+	 * only open(O_EXCL)'s wait when open_cnt<2 and only start one.
+	 */
 	if (test_and_clear_bit(SG_FDEV_EXCLUDE, sdp->fdev_bm))
 		wake_up_interruptible_all(&sdp->open_wait);
-	else if (atomic_read(&sdp->open_cnt) == 0)
+	else if (o_count < 2)
 		wake_up_interruptible(&sdp->open_wait);
 	mutex_unlock(&sdp->open_rel_lock);
 	return 0;
@@ -792,21 +787,6 @@ sg_common_write(struct sg_fd *sfp, struct sg_comm_wr_t *cwrp)
 	return 0;
 }
 
-static inline int
-sg_rstate_chg(struct sg_request *srp, enum sg_rq_state old_st,
-	      enum sg_rq_state new_st)
-{
-	enum sg_rq_state act_old_st = (enum sg_rq_state)
-				atomic_cmpxchg(&srp->rq_st, old_st, new_st);
-
-	if (act_old_st == old_st)
-		return 0;	/* implies new_st --> srp->rq_st */
-	else if (IS_ENABLED(CONFIG_SCSI_LOGGING))
-		sg_rep_rq_state_fail(srp->parentfp, old_st, new_st,
-				     act_old_st);
-	return -EPROTOTYPE;
-}
-
 /*
  * This function is called by wait_event_interruptible in sg_read() and
  * sg_ctl_ioreceive(). wait_event_interruptible will return if this one
@@ -867,32 +847,6 @@ sg_rec_state_v3(struct sg_fd *sfp, struct sg_request *srp)
 	return 0;
 }
 
-#if IS_ENABLED(CONFIG_SCSI_LOGGING)
-static void
-sg_rep_rq_state_fail(struct sg_fd *sfp, enum sg_rq_state exp_old_st,
-		     enum sg_rq_state want_st, enum sg_rq_state act_old_st)
-{
-	const char *eors = "expected old rq_st: ";
-	const char *aors = "actual old rq_st: ";
-
-	if (IS_ENABLED(CONFIG_SCSI_PROC_FS))
-		SG_LOG(1, sfp, "%s: %s%s, %s%s, wanted rq_st: %s\n", __func__,
-		       eors, sg_rq_st_str(exp_old_st, false),
-		       aors, sg_rq_st_str(act_old_st, false),
-		       sg_rq_st_str(want_st, false));
-	else
-		pr_info("sg: %s: %s%d, %s%d, wanted rq_st: %d\n", __func__,
-			eors, (int)exp_old_st, aors, (int)act_old_st,
-			(int)want_st);
-}
-#else
-static void
-sg_rep_rq_state_fail(struct sg_fd *sfp, enum sg_rq_state exp_old_st,
-		     enum sg_rq_state want_st, enum sg_rq_state act_old_st)
-{
-}
-#endif
-
 static ssize_t
 sg_receive_v3(struct sg_fd *sfp, struct sg_request *srp, size_t count,
 	      void __user *p)
@@ -1496,6 +1450,8 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 		if (result)
 			return result;
 		assign_bit(SG_FDEV_LOG_SENSE, sdp->fdev_bm, val);
+		if (val == 0)	/* user can force recalculation */
+			sg_calc_sgat_param(sdp);
 		return 0;
 	case BLKSECTGET:
 		SG_LOG(3, sfp, "%s:    BLKSECTGET\n", __func__);
@@ -1923,11 +1879,9 @@ sg_add_device_helper(struct gendisk *disk, struct scsi_device *scsidp)
 	sdp->disk = disk;
 	sdp->device = scsidp;
 	mutex_init(&sdp->open_rel_lock);
-	INIT_LIST_HEAD(&sdp->sfds);
+	xa_init_flags(&sdp->sfp_arr, XA_FLAGS_ALLOC | XA_FLAGS_LOCK_IRQ);
 	init_waitqueue_head(&sdp->open_wait);
 	clear_bit(SG_FDEV_DETACHING, sdp->fdev_bm);
-	rwlock_init(&sdp->sfd_lock);
-	sg_calc_sgat_param(sdp);
 	sdp->index = k;
 	kref_init(&sdp->d_ref);
 	error = 0;
@@ -2000,6 +1954,7 @@ sg_add_device(struct device *cl_dev, struct class_interface *cl_intf)
 	} else
 		pr_warn("%s: sg_sys Invalid\n", __func__);
 
+	sg_calc_sgat_param(sdp);
 	sdev_printk(KERN_NOTICE, scsidp, "Attached scsi generic sg%d "
 		    "type %d\n", sdp->index, scsidp->type);
 
@@ -2035,6 +1990,7 @@ sg_device_destroy(struct kref *kref)
 	 * any other cleanup.
 	 */
 
+	xa_destroy(&sdp->sfp_arr);
 	write_lock_irqsave(&sg_index_lock, flags);
 	idr_remove(&sg_index_idr, sdp->index);
 	write_unlock_irqrestore(&sg_index_lock, flags);
@@ -2048,7 +2004,7 @@ sg_remove_device(struct device *cl_dev, struct class_interface *cl_intf)
 {
 	struct scsi_device *scsidp = to_scsi_device(cl_dev->parent);
 	struct sg_device *sdp = dev_get_drvdata(cl_dev);
-	unsigned long iflags;
+	unsigned long idx;
 	struct sg_fd *sfp;
 
 	if (!sdp)
@@ -2060,13 +2016,13 @@ sg_remove_device(struct device *cl_dev, struct class_interface *cl_intf)
 	SCSI_LOG_TIMEOUT(3, sdev_printk(KERN_INFO, sdp->device,
 					"%s: 0x%pK\n", __func__, sdp));
 
-	read_lock_irqsave(&sdp->sfd_lock, iflags);
-	list_for_each_entry(sfp, &sdp->sfds, sfd_entry) {
+	xa_for_each(&sdp->sfp_arr, idx, sfp) {
+		if (!sfp)
+			continue;
 		wake_up_interruptible_all(&sfp->read_wait);
 		kill_fasync(&sfp->async_qp, SIGPOLL, POLL_HUP);
 	}
 	wake_up_interruptible_all(&sdp->open_wait);
-	read_unlock_irqrestore(&sdp->sfd_lock, iflags);
 
 	sysfs_remove_link(&scsidp->sdev_gendev.kobj, "generic");
 	device_destroy(sg_sysfs_class, MKDEV(SCSI_GENERIC_MAJOR, sdp->index));
@@ -2597,9 +2553,11 @@ sg_deact_request(struct sg_fd *sfp, struct sg_request *srp)
 static struct sg_fd *
 sg_add_sfp(struct sg_device *sdp)
 {
-	int rbuf_len;
+	int rbuf_len, res;
+	u32 idx;
 	unsigned long iflags;
 	struct sg_fd *sfp;
+	struct xa_limit xal;
 
 	sfp = kzalloc(sizeof(*sfp), GFP_ATOMIC | __GFP_NOWARN);
 	if (!sfp)
@@ -2627,14 +2585,10 @@ sg_add_sfp(struct sg_device *sdp)
 	atomic_set(&sfp->submitted, 0);
 	atomic_set(&sfp->waiting, 0);
 
-	write_lock_irqsave(&sdp->sfd_lock, iflags);
 	if (SG_IS_DETACHING(sdp)) {
-		write_unlock_irqrestore(&sdp->sfd_lock, iflags);
 		kfree(sfp);
 		return ERR_PTR(-ENODEV);
 	}
-	list_add_tail(&sfp->sfd_entry, &sdp->sfds);
-	write_unlock_irqrestore(&sdp->sfd_lock, iflags);
 	SG_LOG(3, sfp, "%s: sfp=0x%pK\n", __func__, sfp);
 	if (unlikely(sg_big_buff != def_reserved_size))
 		sg_big_buff = def_reserved_size;
@@ -2643,6 +2597,20 @@ sg_add_sfp(struct sg_device *sdp)
 	if (rbuf_len > 0)
 		sg_build_reserve(sfp, rbuf_len);
 
+	xa_lock_irqsave(&sdp->sfp_arr, iflags);
+	xal.min = 0;
+	xal.max = atomic_read(&sdp->open_cnt);
+	res = __xa_alloc(&sdp->sfp_arr, &idx, sfp, xal, GFP_KERNEL);
+	xa_unlock_irqrestore(&sdp->sfp_arr, iflags);
+	if (res < 0) {
+		pr_warn("%s: xa_alloc(sdp) bad, o_count=%d, errno=%d\n",
+			__func__, xal.max, -res);
+		if (rbuf_len > 0)
+			sg_remove_sgat(sfp, &sfp->reserve);
+		kfree(sfp);
+		return ERR_PTR(res);
+	}
+	sfp->idx = idx;
 	kref_get(&sdp->d_ref);
 	__module_get(THIS_MODULE);
 	SG_LOG(3, sfp, "%s: success, sfp=0x%pK ++\n", __func__, sfp);
@@ -2661,9 +2629,11 @@ sg_add_sfp(struct sg_device *sdp)
 static void
 sg_remove_sfp_usercontext(struct work_struct *work)
 {
+	__maybe_unused int o_count;
 	unsigned long iflags;
-	struct sg_fd *sfp = container_of(work, struct sg_fd, ew_fd.work);
 	struct sg_device *sdp;
+	struct sg_fd *sfp = container_of(work, struct sg_fd, ew_fd.work);
+	struct sg_fd *e_sfp;
 	struct sg_request *srp;
 
 	if (!sfp) {
@@ -2688,7 +2658,15 @@ sg_remove_sfp_usercontext(struct work_struct *work)
 		sg_remove_sgat(sfp, &sfp->reserve);
 	}
 
-	SG_LOG(6, sfp, "%s: sfp=0x%pK\n", __func__, sfp);
+	xa_lock_irqsave(&sdp->sfp_arr, iflags);
+	e_sfp = __xa_erase(&sdp->sfp_arr, sfp->idx);
+	xa_unlock_irqrestore(&sdp->sfp_arr, iflags);
+	if (unlikely(sfp != e_sfp))
+		SG_LOG(1, sfp, "%s: xa_erase() return unexpected\n",
+		       __func__);
+	o_count = atomic_dec_return(&sdp->open_cnt);
+	SG_LOG(3, sfp, "%s: dev o_count after=%d: sfp=0x%pK --\n", __func__,
+	       o_count, sfp);
 	kfree(sfp);
 
 	scsi_device_put(sdp->device);
@@ -2699,13 +2677,7 @@ sg_remove_sfp_usercontext(struct work_struct *work)
 static void
 sg_remove_sfp(struct kref *kref)
 {
-	unsigned long iflags;
 	struct sg_fd *sfp = container_of(kref, struct sg_fd, f_ref);
-	struct sg_device *sdp = sfp->parentdp;
-
-	write_lock_irqsave(&sdp->sfd_lock, iflags);
-	list_del(&sfp->sfd_entry);
-	write_unlock_irqrestore(&sdp->sfd_lock, iflags);
 
 	INIT_WORK(&sfp->ew_fd.work, sg_remove_sfp_usercontext);
 	schedule_work(&sfp->ew_fd.work);
@@ -3020,6 +2992,7 @@ static void
 sg_proc_debug_helper(struct seq_file *s, struct sg_device *sdp)
 {
 	int k, new_interface, blen, usg;
+	unsigned long idx;
 	struct sg_request *srp;
 	struct sg_fd *fp;
 	const struct sg_io_hdr *hp;
@@ -3027,15 +3000,15 @@ sg_proc_debug_helper(struct seq_file *s, struct sg_device *sdp)
 	unsigned int ms;
 
 	k = 0;
-	list_for_each_entry(fp, &sdp->sfds, sfd_entry) {
+	xa_for_each(&sdp->sfp_arr, idx, fp) {
+		if (!fp)
+			continue;
 		k++;
 		spin_lock(&fp->rq_list_lock); /* irqs already disabled */
-		seq_printf(s, "   FD(%d): timeout=%dms buflen=%d "
-			   "(res)sgat=%d low_dma=%d\n", k,
-			   jiffies_to_msecs(fp->timeout),
-			   fp->reserve.buflen,
-			   (int)fp->reserve.num_sgat,
-			   (int) sdp->device->host->unchecked_isa_dma);
+		seq_printf(s, "   FD(%d): timeout=%dms buflen=%d (res)sgat=%d low_dma=%d idx=%lu\n",
+			   k, jiffies_to_msecs(fp->timeout),
+			   fp->reserve.buflen, (int)fp->reserve.num_sgat,
+			   (int)sdp->device->host->unchecked_isa_dma, idx);
 		seq_printf(s, "   cmd_q=%d f_packid=%d k_orphan=%d closed=0\n",
 			   (int) fp->cmd_q, (int) fp->force_packid,
 			   (int) fp->keep_orphan);
@@ -3099,8 +3072,7 @@ sg_proc_seq_show_debug(struct seq_file *s, void *v)
 	sdp = it ? sg_lookup_dev(it->index) : NULL;
 	if (NULL == sdp)
 		goto skip;
-	read_lock(&sdp->sfd_lock);
-	if (!list_empty(&sdp->sfds)) {
+	if (!xa_empty(&sdp->sfp_arr)) {
 		seq_printf(s, " >>> device=%s ", sdp->disk->disk_name);
 		if (SG_IS_DETACHING(sdp))
 			seq_puts(s, "detaching pending close ");
@@ -3118,7 +3090,6 @@ sg_proc_seq_show_debug(struct seq_file *s, void *v)
 			   atomic_read(&sdp->open_cnt));
 		sg_proc_debug_helper(s, sdp);
 	}
-	read_unlock(&sdp->sfd_lock);
 skip:
 	read_unlock_irqrestore(&sg_index_lock, iflags);
 	return 0;
-- 
2.25.1



* [PATCH v18 24/83] sg: xarray for reqs in fd
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
                   ` (23 preceding siblings ...)
  2021-04-27 21:56 ` [PATCH v18 23/83] sg: xarray for fds in device Douglas Gilbert
@ 2021-04-27 21:56 ` Douglas Gilbert
  2021-04-27 21:56 ` [PATCH v18 25/83] sg: replace rq array with xarray Douglas Gilbert
                   ` (58 subsequent siblings)
  83 siblings, 0 replies; 86+ messages in thread
From: Douglas Gilbert @ 2021-04-27 21:56 UTC
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

Replace the linked list and the fixed array of requests (max 16)
with an xarray. The xarray (srp_arr) has two driver-managed marks:
one for requests in the INACTIVE state (i.e. available for re-use);
the other for the AWAIT state, which lies between the internal
completion point of a request and the user space fetching the
response.
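To make the mark bookkeeping concrete, here is a userspace C model of how sg_rq_state_chg() turns a state change into mark updates: it adds a "leaving" value indexed by the old state to an "entering" value indexed by the new state, using the same two tables as the patch, then clears or sets the INACTIVE and AWAIT marks from the resulting bits. The toy_* names and the one-bit-per-mark representation are assumptions of this sketch. Only three of the five state values are named in this excerpt, so table indices 3 and 4 stand for the remaining states, which carry no mark.

```c
/* Userspace model of the mark bookkeeping in sg_rq_state_chg(). */
#define TOY_RS_INACTIVE  0
#define TOY_RS_INFLIGHT  1
#define TOY_RS_AWAIT_RCV 2	/* indices 3 and 4: remaining states, no mark */

#define TOY_MARK_INACTIVE 0x1u	/* stands in for SG_XA_RQ_INACTIVE */
#define TOY_MARK_AWAIT    0x2u	/* stands in for SG_XA_RQ_AWAIT */

/* Same values as sg_rq_state_arr[] ("leaving") and sg_rq_state_mul2arr[] ("entering"). */
static const int toy_from_arr[] = {1, 0, 4, 0, 0};
static const int toy_to_arr[]   = {2, 0, 8, 0, 0};

/* Apply one state change to an element's mark set; returns the new marks. */
static unsigned int toy_apply_transition(unsigned int marks, int old_st,
					 int new_st)
{
	int indic;

	if (old_st == new_st)	/* sketch's choice: a self-transition changes nothing */
		return marks;
	indic = toy_from_arr[old_st] + toy_to_arr[new_st];
	if (indic & 1)		/* leaving INACTIVE */
		marks &= ~TOY_MARK_INACTIVE;
	else if (indic & 2)	/* entering INACTIVE */
		marks |= TOY_MARK_INACTIVE;
	if (indic & 4)		/* leaving AWAIT_RCV */
		marks &= ~TOY_MARK_AWAIT;
	else if (indic & 8)	/* entering AWAIT_RCV */
		marks |= TOY_MARK_AWAIT;
	return marks;
}
```

The early return is the one design difference from the patch: applied as written, the else-if tests would clear the INACTIVE mark on an INACTIVE to INACTIVE change (indicator 3), so the sketch treats such a call as a no-op.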

Of the five states in sg_request::rq_st, two are marked: SG_RS_INACTIVE
and SG_RS_AWAIT_RCV. This allows the request xarray (sg_fd::srp_arr)
to be searched with xa_for_each_mark() on two embedded sub-lists. The
SG_RS_INACTIVE sub-list replaces the free list. The SG_RS_AWAIT_RCV
sub-list contains requests that have reached their internal completion
point but have not yet been read or received by user space. Add support
functions for this and partially wire them up.
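A userspace sketch of searching one embedded sub-list: each index carries a small set of mark bits, and the search skips every entry whose mark is not set, which is the effect xa_for_each_mark() gives on sg_fd::srp_arr. The fixed capacity of 8, the toy_* names and the bit values are assumptions of this sketch; the real xarray keeps per-node mark bitmaps so such searches can skip unmarked ranges quickly.

```c
#include <stdbool.h>

#define TOY_NREQ 8			/* hypothetical capacity */
#define TOY_MARK_INACTIVE (1u << 1)	/* plays the role of SG_XA_RQ_INACTIVE */
#define TOY_MARK_AWAIT    (1u << 2)	/* plays the role of SG_XA_RQ_AWAIT */

struct toy_req_arr {
	bool present[TOY_NREQ];		/* is a request stored at this index? */
	unsigned int marks[TOY_NREQ];	/* mark bits per index */
};

/* Return the lowest present index >= start carrying 'mark', or -1 if none. */
static int toy_find_marked(const struct toy_req_arr *a, int start,
			   unsigned int mark)
{
	int i;

	for (i = start < 0 ? 0 : start; i < TOY_NREQ; i++)
		if (a->present[i] && (a->marks[i] & mark))
			return i;
	return -1;
}

/* Walk one sub-list, in the spirit of xa_for_each_mark(), and count it. */
static int toy_count_marked(const struct toy_req_arr *a, unsigned int mark)
{
	int n = 0;

	for (int i = toy_find_marked(a, 0, mark); i >= 0;
	     i = toy_find_marked(a, i + 1, mark))
		n++;
	return n;
}
```

Finding a reusable request then amounts to taking the first index on the INACTIVE sub-list, which is the role the old free list played.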

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 317 +++++++++++++++++++++++++++++++++-------------
 1 file changed, 230 insertions(+), 87 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 96f0e28701cf..273861374de7 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -73,7 +73,7 @@ static char *sg_version_date = "20190606";
 #define SG_MAX_CDB_SIZE 252
 
 /* Following enum contains the states of sg_request::rq_st */
-enum sg_rq_state {
+enum sg_rq_state {	/* N.B. sg_rq_state_arr assumes SG_RS_AWAIT_RCV==2 */
 	SG_RS_INACTIVE = 0,	/* request not in use (e.g. on fl) */
 	SG_RS_INFLIGHT,		/* active: cmd/req issued, no response yet */
 	SG_RS_AWAIT_RCV,	/* have response from LLD, awaiting receive */
@@ -113,6 +113,11 @@ enum sg_rq_state {
 #define SG_FDEV_DETACHING	1	/* may be unexpected device removal */
 #define SG_FDEV_LOG_SENSE	2	/* set by ioctl(SG_SET_DEBUG) */
 
+/* xarray 'mark's allow sub-lists within main array/list. */
+#define SG_XA_RQ_FREE XA_MARK_0	/* xarray sets+clears */
+#define SG_XA_RQ_INACTIVE XA_MARK_1
+#define SG_XA_RQ_AWAIT XA_MARK_2
+
 int sg_big_buff = SG_DEF_RESERVED_SIZE;
 /*
  * This variable is accessible via /proc/scsi/sg/def_reserved_size . Each
@@ -152,11 +157,11 @@ struct sg_device;		/* forward declarations */
 struct sg_fd;
 
 struct sg_request {	/* SG_MAX_QUEUE requests outstanding per file */
-	struct list_head entry;	/* list entry */
 	struct sg_scatter_hold data;	/* hold buffer, perhaps scatter list */
 	struct sg_io_hdr header;  /* scsi command+info, see <scsi/sg.h> */
 	u8 sense_b[SCSI_SENSE_BUFFERSIZE];
 	u32 duration;		/* cmd duration in milliseconds */
+	u32 rq_idx;		/* my index within parent's srp_arr */
 	char res_used;		/* 1 -> using reserve buffer, 0 -> not ... */
 	char orphan;		/* 1 -> drop on sight, 0 -> normal */
 	u32 rq_result;		/* packed scsi request result from LLD */
@@ -175,24 +180,23 @@ struct sg_request {	/* SG_MAX_QUEUE requests outstanding per file */
 struct sg_fd {		/* holds the state of a file descriptor */
 	struct sg_device *parentdp;	/* owning device */
 	wait_queue_head_t read_wait;	/* queue read until command done */
-	spinlock_t rq_list_lock;	/* protect access to list in req_arr */
 	struct mutex f_mutex;	/* protect against changes in this fd */
 	int timeout;		/* defaults to SG_DEFAULT_TIMEOUT      */
 	int timeout_user;	/* defaults to SG_DEFAULT_TIMEOUT_USER */
 	u32 idx;		/* my index within parent's sfp_arr */
-	atomic_t submitted;	/* number inflight or awaiting read */
-	atomic_t waiting;	/* number of requests awaiting read */
+	atomic_t submitted;	/* number inflight or awaiting receive */
+	atomic_t waiting;	/* number of requests awaiting receive */
+	atomic_t req_cnt;	/* number of requests */
 	int sgat_elem_sz;	/* initialized to scatter_elem_sz */
 	struct sg_scatter_hold reserve;	/* buffer for this file descriptor */
-	struct list_head rq_list; /* head of request list */
-	struct fasync_struct *async_qp;	/* used by asynchronous notification */
-	struct sg_request req_arr[SG_MAX_QUEUE];/* use as singly-linked list */
 	char force_packid;	/* 1 -> pack_id input to read(), 0 -> ignored */
 	char cmd_q;		/* 1 -> allow command queuing, 0 -> don't */
 	u8 next_cmd_len;	/* 0: automatic, >0: use on next write() */
 	char keep_orphan;	/* 0 -> drop orphan (def), 1 -> keep for read() */
 	char mmap_called;	/* 0 -> mmap() never called on this fd */
 	char res_in_use;	/* 1 -> 'reserve' array in use */
+	struct fasync_struct *async_qp;	/* used by asynchronous notification */
+	struct xarray srp_arr;
 	struct kref f_ref;
 	struct execute_work ew_fd;  /* harvest all fd resources and lists */
 };
@@ -271,6 +275,7 @@ static const char *sg_rq_st_str(enum sg_rq_state rq_st, bool long_str);
 
 #if IS_ENABLED(CONFIG_SCSI_LOGGING) && IS_ENABLED(SG_DEBUG)
 #define SG_LOG_BUFF_SZ 48
+#define SG_LOG_ACTIVE 1
 
 #define SG_LOG(depth, sfp, fmt, a...)					\
 	do {								\
@@ -722,6 +727,115 @@ sg_submit(struct sg_fd *sfp, struct file *filp, const char __user *buf,
 	return count;
 }
 
+#if IS_ENABLED(SG_LOG_ACTIVE)
+static void
+sg_rq_state_fail_msg(struct sg_fd *sfp, enum sg_rq_state exp_old_st,
+		     enum sg_rq_state want_st, enum sg_rq_state act_old_st,
+		     const char *fromp)
+{
+	const char *eaw_rs = "expected_old,actual_old,wanted rq_st";
+
+	if (IS_ENABLED(CONFIG_SCSI_PROC_FS))
+		SG_LOG(1, sfp, "%s: %s: %s: %s,%s,%s\n",
+		       __func__, fromp, eaw_rs,
+		       sg_rq_st_str(exp_old_st, false),
+		       sg_rq_st_str(act_old_st, false),
+		       sg_rq_st_str(want_st, false));
+	else
+		pr_info("sg: %s: %s: %s: %d,%d,%d\n", __func__, fromp, eaw_rs,
+			(int)exp_old_st, (int)act_old_st, (int)want_st);
+}
+#endif
+
+static void
+sg_rq_state_force(struct sg_request *srp, enum sg_rq_state new_st)
+{
+	bool prev, want;
+	struct xarray *xafp = &srp->parentfp->srp_arr;
+
+	atomic_set(&srp->rq_st, new_st);
+	want = (new_st == SG_RS_AWAIT_RCV);
+	prev = xa_get_mark(xafp, srp->rq_idx, SG_XA_RQ_AWAIT);
+	if (prev != want) {
+		if (want)
+			__xa_set_mark(xafp, srp->rq_idx, SG_XA_RQ_AWAIT);
+		else
+			__xa_clear_mark(xafp, srp->rq_idx, SG_XA_RQ_AWAIT);
+	}
+	want = (new_st == SG_RS_INACTIVE);
+	prev = xa_get_mark(xafp, srp->rq_idx, SG_XA_RQ_INACTIVE);
+	if (prev != want) {
+		if (want)
+			__xa_set_mark(xafp, srp->rq_idx, SG_XA_RQ_INACTIVE);
+		else
+			__xa_clear_mark(xafp, srp->rq_idx, SG_XA_RQ_INACTIVE);
+	}
+}
+
+static void
+sg_rq_state_helper(struct xarray *xafp, struct sg_request *srp, int indic)
+{
+	if (indic & 1)		/* from inactive state */
+		__xa_clear_mark(xafp, srp->rq_idx, SG_XA_RQ_INACTIVE);
+	else if (indic & 2)	/* to inactive state */
+		__xa_set_mark(xafp, srp->rq_idx, SG_XA_RQ_INACTIVE);
+
+	if (indic & 4)		/* from await state */
+		__xa_clear_mark(xafp, srp->rq_idx, SG_XA_RQ_AWAIT);
+	else if (indic & 8)	/* to await state */
+		__xa_set_mark(xafp, srp->rq_idx, SG_XA_RQ_AWAIT);
+}
+
+/* Following array indexed by enum sg_rq_state, 0 means no xa mark change */
+static const int sg_rq_state_arr[] = {1, 0, 4, 0, 0};
+static const int sg_rq_state_mul2arr[] = {2, 0, 8, 0, 0};
+
+/*
+ * This function keeps the srp->rq_st state and associated marks on the
+ * owning xarray's element in sync. If force is true then new_st is stored
+ * in srp->rq_st and xarray marks are set accordingly (and old_st is
+ * ignored); and 0 is returned.
+ * If force is false, then atomic_cmpxchg() is called. If the actual
+ * srp->rq_st is not old_st, then -EPROTOTYPE is returned. If the actual
+ * srp->rq_st is old_st then it is replaced by new_st and the xarray marks
+ * are set up accordingly and 0 is returned. The srp_arr xarray spinlock
+ * is taken here around mark changes, so the caller must not hold it.
+ */
+static int
+sg_rq_state_chg(struct sg_request *srp, enum sg_rq_state old_st,
+		enum sg_rq_state new_st, bool force, const char *fromp)
+{
+	enum sg_rq_state act_old_st;
+	int indic;
+	unsigned long iflags;
+	struct xarray *xafp = &srp->parentfp->srp_arr;
+
+	if (force) {
+		xa_lock_irqsave(xafp, iflags);
+		sg_rq_state_force(srp, new_st);
+		xa_unlock_irqrestore(xafp, iflags);
+		return 0;
+	}
+	indic = sg_rq_state_arr[(int)old_st] +
+		sg_rq_state_mul2arr[(int)new_st];
+	act_old_st = (enum sg_rq_state)atomic_cmpxchg(&srp->rq_st, old_st,
+						      new_st);
+	if (act_old_st != old_st) {
+#if IS_ENABLED(SG_LOG_ACTIVE)
+		if (fromp)
+			sg_rq_state_fail_msg(srp->parentfp, old_st, new_st,
+					     act_old_st, fromp);
+#endif
+		return -EPROTOTYPE;	/* only used for this error type */
+	}
+	if (indic) {
+		xa_lock_irqsave(xafp, iflags);
+		sg_rq_state_helper(xafp, srp, indic);
+		xa_unlock_irqrestore(xafp, iflags);
+	}
+	return 0;
+}
+
 /*
  * All writes and submits converge on this function to launch the SCSI
  * command/request (via blk_execute_rq_nowait). Returns a pointer to a
@@ -760,17 +874,8 @@ sg_common_write(struct sg_fd *sfp, struct sg_comm_wr_t *cwrp)
 		sg_deact_request(sfp, srp);
 		return k;	/* probably out of space --> ENOMEM */
 	}
-	if (SG_IS_DETACHING(sdp)) {
-		if (srp->bio) {
-			scsi_req_free_cmd(scsi_req(srp->rq));
-			blk_put_request(srp->rq);
-			srp->rq = NULL;
-		}
-
-		sg_finish_scsi_blk_rq(srp);
-		sg_deact_request(sfp, srp);
-		return -ENODEV;
-	}
+	if (SG_IS_DETACHING(sdp))
+		goto err_out;
 
 	hp->duration = jiffies_to_msecs(jiffies);
 	if (hp->interface_id != '\0' &&	/* v3 (or later) interface */
@@ -785,6 +890,22 @@ sg_common_write(struct sg_fd *sfp, struct sg_comm_wr_t *cwrp)
 	kref_get(&sfp->f_ref); /* sg_rq_end_io() does kref_put(). */
 	blk_execute_rq_nowait(sdp->disk, srp->rq, at_head, sg_rq_end_io);
 	return 0;
+err_out:
+	if (srp->bio) {
+		scsi_req_free_cmd(scsi_req(srp->rq));
+		blk_put_request(srp->rq);
+		srp->rq = NULL;
+	}
+	sg_finish_scsi_blk_rq(srp);
+	sg_deact_request(sfp, srp);
+	return -ENODEV;
+}
+
+static inline int
+sg_rstate_chg(struct sg_request *srp, enum sg_rq_state old_st,
+	      enum sg_rq_state new_st)
+{
+	return sg_rq_state_chg(srp, old_st, new_st, false, __func__);
 }
 
 /*
@@ -1157,12 +1278,9 @@ sg_fill_request_element(struct sg_fd *sfp, struct sg_request *srp,
 static int
 srp_done(struct sg_fd *sfp, struct sg_request *srp)
 {
-	unsigned long flags;
 	int ret;
 
-	spin_lock_irqsave(&sfp->rq_list_lock, flags);
 	ret = srp->done;
-	spin_unlock_irqrestore(&sfp->rq_list_lock, flags);
 	return ret;
 }
 
@@ -1189,15 +1307,12 @@ sg_ctl_sg_io(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 		(sfp->read_wait, (srp_done(sfp, srp) || SG_IS_DETACHING(sdp)));
 	if (SG_IS_DETACHING(sdp))
 		return -ENODEV;
-	spin_lock_irq(&sfp->rq_list_lock);
 	if (srp->done) {
 		srp->done = 2;
-		spin_unlock_irq(&sfp->rq_list_lock);
 		res = sg_receive_v3(sfp, srp, SZ_SG_IO_HDR, p);
 		return (res < 0) ? res : 0;
 	}
 	srp->orphan = 1;
-	spin_unlock_irq(&sfp->rq_list_lock);
 	return res;
 }
 
@@ -1246,7 +1361,7 @@ static int
 sg_ctl_req_tbl(struct sg_fd *sfp, void __user *p)
 {
 	int result, val;
-	unsigned long iflags;
+	unsigned long idx;
 	struct sg_request *srp;
 	sg_req_info_t *rinfop;
 
@@ -1254,15 +1369,17 @@ sg_ctl_req_tbl(struct sg_fd *sfp, void __user *p)
 			 GFP_KERNEL);
 	if (!rinfop)
 		return -ENOMEM;
-	spin_lock_irqsave(&sfp->rq_list_lock, iflags);
 	val = 0;
-	list_for_each_entry(srp, &sfp->rq_list, entry) {
+	xa_for_each(&sfp->srp_arr, idx, srp) {
+		if (!srp)
+			continue;
+		if (xa_get_mark(&sfp->srp_arr, idx, SG_XA_RQ_AWAIT))
+			continue;
 		if (val >= SG_MAX_QUEUE)
 			break;
 		sg_fill_request_element(sfp, srp, rinfop + val);
 		val++;
 	}
-	spin_unlock_irqrestore(&sfp->rq_list_lock, iflags);
 #ifdef CONFIG_COMPAT
 	if (in_compat_syscall())
 		result = put_compat_request_table(p, rinfop);
@@ -1307,7 +1424,7 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 	int __user *ip = p;
 	struct sg_request *srp;
 	struct scsi_device *sdev;
-	unsigned long iflags;
+	unsigned long idx;
 	__maybe_unused const char *pmlp = ", pass to mid-level";
 
 	SG_LOG(6, sfp, "%s: cmd=0x%x, O_NONBLOCK=%d\n", __func__, cmd_in,
@@ -1330,14 +1447,15 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 		return 0;
 	case SG_GET_PACK_ID:    /* or tag of oldest "read"-able, -1 if none */
 		val = -1;
-		spin_lock_irqsave(&sfp->rq_list_lock, iflags);
-		list_for_each_entry(srp, &sfp->rq_list, entry) {
+		srp = NULL;
+		xa_for_each_marked(&sfp->srp_arr, idx, srp, SG_XA_RQ_AWAIT) {
+			if (!srp)
+				continue;
 			if ((1 == srp->done) && (!srp->sg_io_owned)) {
 				val = srp->header.pack_id;
 				break;
 			}
 		}
-		spin_unlock_irqrestore(&sfp->rq_list_lock, iflags);
 		SG_LOG(3, sfp, "%s:    SG_GET_PACK_ID=%d\n", __func__, val);
 		return put_user(val, ip);
 	case SG_GET_NUM_WAITING:
@@ -1750,10 +1868,10 @@ sg_rq_end_io(struct request *rq, blk_status_t status)
 	struct scsi_request *scsi_rp = scsi_req(rq);
 	struct sg_device *sdp;
 	struct sg_fd *sfp;
-	unsigned long iflags;
 	unsigned int ms;
 	int resid, slen;
 	int done = 1;
+	unsigned long iflags;
 
 	if (WARN_ON(srp->done != 0))
 		return;
@@ -1797,7 +1915,6 @@ sg_rq_end_io(struct request *rq, blk_status_t status)
 	scsi_req_free_cmd(scsi_req(rq));
 	blk_put_request(rq);
 
-	spin_lock_irqsave(&sfp->rq_list_lock, iflags);
 	if (unlikely(srp->orphan)) {
 		if (sfp->keep_orphan)
 			srp->sg_io_owned = 0;
@@ -1805,12 +1922,14 @@ sg_rq_end_io(struct request *rq, blk_status_t status)
 			done = 0;
 	}
 	srp->done = done;
-	spin_unlock_irqrestore(&sfp->rq_list_lock, iflags);
 
 	if (likely(done)) {
 		/* Now wake up any sg_read() that is waiting for this
 		 * packet.
 		 */
+		xa_lock_irqsave(&sfp->srp_arr, iflags);
+		__xa_set_mark(&sfp->srp_arr, srp->rq_idx, SG_XA_RQ_AWAIT);
+		xa_unlock_irqrestore(&sfp->srp_arr, iflags);
 		wake_up_interruptible(&sfp->read_wait);
 		kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN);
 		kref_put(&sfp->f_ref, sg_remove_sfp);
@@ -2423,20 +2542,19 @@ sg_read_append(struct sg_request *srp, void __user *outp, int num_xfer)
 static struct sg_request *
 sg_find_srp_by_id(struct sg_fd *sfp, int pack_id)
 {
-	unsigned long iflags;
+	unsigned long idx;
 	struct sg_request *resp;
 
-	spin_lock_irqsave(&sfp->rq_list_lock, iflags);
-	list_for_each_entry(resp, &sfp->rq_list, entry) {
+	xa_for_each_marked(&sfp->srp_arr, idx, resp, SG_XA_RQ_AWAIT) {
+		if (!resp)
+			continue;
 		/* look for requests that are ready + not SG_IO owned */
 		if (resp->done == 1 && !resp->sg_io_owned &&
 		    (-1 == pack_id || resp->header.pack_id == pack_id)) {
 			resp->done = 2;	/* guard against other readers */
-			spin_unlock_irqrestore(&sfp->rq_list_lock, iflags);
 			return resp;
 		}
 	}
-	spin_unlock_irqrestore(&sfp->rq_list_lock, iflags);
 	return NULL;
 }
 
@@ -2506,31 +2624,51 @@ sg_build_reserve(struct sg_fd *sfp, int req_size)
 static struct sg_request *
 sg_setup_req(struct sg_fd *sfp)
 {
-	int k;
-	unsigned long iflags;
-	struct sg_request *rp = sfp->req_arr;
-
-	spin_lock_irqsave(&sfp->rq_list_lock, iflags);
-	if (!list_empty(&sfp->rq_list)) {
-		if (!sfp->cmd_q)
-			goto out_unlock;
-
-		for (k = 0; k < SG_MAX_QUEUE; ++k, ++rp) {
-			if (!rp->parentfp)
-				break;
+	bool found = false;
+	int res;
+	unsigned long idx, iflags;
+	struct sg_request *rp;
+	struct xarray *xafp = &sfp->srp_arr;
+
+	if (!xa_empty(xafp)) {
+		xa_for_each_marked(xafp, idx, rp, SG_XA_RQ_INACTIVE) {
+			if (!rp)
+				continue;
+			if (sg_rstate_chg(rp, SG_RS_INACTIVE, SG_RS_BUSY))
+				continue;
+			memset(rp, 0, sizeof(*rp));
+			rp->rq_idx = idx;
+			xa_lock_irqsave(xafp, iflags);
+			__xa_clear_mark(xafp, idx, SG_XA_RQ_INACTIVE);
+			xa_unlock_irqrestore(xafp, iflags);
+			found = true;
+			break;
 		}
-		if (k >= SG_MAX_QUEUE)
-			goto out_unlock;
 	}
-	memset(rp, 0, sizeof(struct sg_request));
+	if (!found) {
+		rp = kzalloc(sizeof(*rp), GFP_KERNEL);
+		if (!rp)
+			return NULL;
+	}
 	rp->parentfp = sfp;
 	rp->header.duration = jiffies_to_msecs(jiffies);
-	list_add_tail(&rp->entry, &sfp->rq_list);
-	spin_unlock_irqrestore(&sfp->rq_list_lock, iflags);
+	if (!found) {
+		u32 n_idx;
+		struct xa_limit xal = { .max = 0, .min = 0 };
+
+		atomic_set(&rp->rq_st, SG_RS_BUSY);
+		xa_lock_irqsave(xafp, iflags);
+		xal.max = atomic_inc_return(&sfp->req_cnt);
+		res = __xa_alloc(xafp, &n_idx, rp, xal, GFP_KERNEL);
+		xa_unlock_irqrestore(xafp, iflags);
+		if (res < 0) {
+			pr_warn("%s: don't expect xa_alloc() to fail, errno=%d\n",
+				__func__,  -res);
+			return NULL;
+		}
+		rp->rq_idx = n_idx;
+	}
 	return rp;
-out_unlock:
-	spin_unlock_irqrestore(&sfp->rq_list_lock, iflags);
-	return NULL;
 }
 
 static void
@@ -2540,14 +2678,10 @@ sg_deact_request(struct sg_fd *sfp, struct sg_request *srp)
 
 	if (WARN_ON(!sfp || !srp))
 		return;
-	if (list_empty(&sfp->rq_list))
-		return;
-	spin_lock_irqsave(&sfp->rq_list_lock, iflags);
-	if (!list_empty(&srp->entry)) {
-		list_del(&srp->entry);
-		srp->parentfp = NULL;
-	}
-	spin_unlock_irqrestore(&sfp->rq_list_lock, iflags);
+	xa_lock_irqsave(&sfp->srp_arr, iflags);
+	__xa_set_mark(&sfp->srp_arr, srp->rq_idx, SG_XA_RQ_INACTIVE);
+	xa_unlock_irqrestore(&sfp->srp_arr, iflags);
+	atomic_set(&srp->rq_st, SG_RS_INACTIVE);
 }
 
 static struct sg_fd *
@@ -2564,8 +2698,7 @@ sg_add_sfp(struct sg_device *sdp)
 		return ERR_PTR(-ENOMEM);
 
 	init_waitqueue_head(&sfp->read_wait);
-	spin_lock_init(&sfp->rq_list_lock);
-	INIT_LIST_HEAD(&sfp->rq_list);
+	xa_init_flags(&sfp->srp_arr, XA_FLAGS_ALLOC | XA_FLAGS_LOCK_IRQ);
 	kref_init(&sfp->f_ref);
 	mutex_init(&sfp->f_mutex);
 	sfp->timeout = SG_DEFAULT_TIMEOUT;
@@ -2584,6 +2717,7 @@ sg_add_sfp(struct sg_device *sdp)
 	sfp->parentdp = sdp;
 	atomic_set(&sfp->submitted, 0);
 	atomic_set(&sfp->waiting, 0);
+	atomic_set(&sfp->req_cnt, 0);
 
 	if (SG_IS_DETACHING(sdp)) {
 		kfree(sfp);
@@ -2630,11 +2764,13 @@ static void
 sg_remove_sfp_usercontext(struct work_struct *work)
 {
 	__maybe_unused int o_count;
-	unsigned long iflags;
+	unsigned long idx, iflags;
 	struct sg_device *sdp;
 	struct sg_fd *sfp = container_of(work, struct sg_fd, ew_fd.work);
 	struct sg_fd *e_sfp;
 	struct sg_request *srp;
+	struct sg_request *e_srp;
+	struct xarray *xafp = &sfp->srp_arr;
 
 	if (!sfp) {
 		pr_warn("sg: %s: sfp is NULL\n", __func__);
@@ -2643,15 +2779,20 @@ sg_remove_sfp_usercontext(struct work_struct *work)
 	sdp = sfp->parentdp;
 
 	/* Cleanup any responses which were never read(). */
-	spin_lock_irqsave(&sfp->rq_list_lock, iflags);
-	while (!list_empty(&sfp->rq_list)) {
-		srp = list_first_entry(&sfp->rq_list, struct sg_request, entry);
-		sg_finish_scsi_blk_rq(srp);
-		list_del(&srp->entry);
-		srp->parentfp = NULL;
-	}
-	spin_unlock_irqrestore(&sfp->rq_list_lock, iflags);
-
+	xa_for_each(xafp, idx, srp) {
+		if (!srp)
+			continue;
+		if (!xa_get_mark(xafp, srp->rq_idx, SG_XA_RQ_INACTIVE))
+			sg_finish_scsi_blk_rq(srp);
+		xa_lock_irqsave(xafp, iflags);
+		e_srp = __xa_erase(xafp, srp->rq_idx);
+		xa_unlock_irqrestore(xafp, iflags);
+		if (srp != e_srp)
+			SG_LOG(1, sfp, "%s: xa_erase() return unexpected\n",
+			       __func__);
+		kfree(srp);
+	}
+	xa_destroy(xafp);
 	if (sfp->reserve.buflen > 0) {
 		SG_LOG(6, sfp, "%s:    buflen=%d, num_sgat=%d\n", __func__,
 		       (int)sfp->reserve.buflen, (int)sfp->reserve.num_sgat);
@@ -2742,7 +2883,9 @@ sg_rq_st_str(enum sg_rq_state rq_st, bool long_str)
 		return long_str ? "unknown" : "unk";
 	}
 }
-#else
+
+#elif IS_ENABLED(SG_LOG_ACTIVE)
+
 static const char *
 sg_rq_st_str(enum sg_rq_state rq_st, bool long_str)
 {
@@ -2992,7 +3135,7 @@ static void
 sg_proc_debug_helper(struct seq_file *s, struct sg_device *sdp)
 {
 	int k, new_interface, blen, usg;
-	unsigned long idx;
+	unsigned long idx, idx2;
 	struct sg_request *srp;
 	struct sg_fd *fp;
 	const struct sg_io_hdr *hp;
@@ -3004,7 +3147,6 @@ sg_proc_debug_helper(struct seq_file *s, struct sg_device *sdp)
 		if (!fp)
 			continue;
 		k++;
-		spin_lock(&fp->rq_list_lock); /* irqs already disabled */
 		seq_printf(s, "   FD(%d): timeout=%dms buflen=%d (res)sgat=%d low_dma=%d idx=%lu\n",
 			   k, jiffies_to_msecs(fp->timeout),
 			   fp->reserve.buflen, (int)fp->reserve.num_sgat,
@@ -3015,7 +3157,9 @@ sg_proc_debug_helper(struct seq_file *s, struct sg_device *sdp)
 		seq_printf(s, "   submitted=%d waiting=%d\n",
 			   atomic_read(&fp->submitted),
 			   atomic_read(&fp->waiting));
-		list_for_each_entry(srp, &fp->rq_list, entry) {
+		xa_for_each(&fp->srp_arr, idx2, srp) {
+			if (!srp)
+				continue;
 			hp = &srp->header;
 			new_interface = (hp->interface_id == '\0') ? 0 : 1;
 			if (srp->res_used) {
@@ -3051,9 +3195,8 @@ sg_proc_debug_helper(struct seq_file *s, struct sg_device *sdp)
 				   (int)srp->data.cmd_opcode,
 				   sg_rq_st_str(SG_RS_INACTIVE, false));
 		}
-		if (list_empty(&fp->rq_list))
+		if (xa_empty(&fp->srp_arr))
 			seq_puts(s, "     No requests active\n");
-		spin_unlock(&fp->rq_list_lock);
 	}
 }
 
-- 
2.25.1

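The bookkeeping in sg_rq_state_chg() above can be made easier to follow with a small user-space sketch of the same pattern. An atomic compare-and-exchange is done on the request state, then the "leaving"/"entering" lookup tables drive the mark updates. All names here are hypothetical:

- The enum values and helper names are invented for the sketch.
- The `marks` field stands in for the SG_XA_RQ_INACTIVE and SG_XA_RQ_AWAIT xarray marks.

The only assumption made about the real enum sg_rq_state is what the tables imply: index 0 is the inactive state and index 2 the await state. Unlike the driver, the sketch does not hold the xarray spinlock while updating the marks.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical stand-in for enum sg_rq_state: index 0 must be the
 * inactive state and index 2 the await state, as the tables imply. */
enum rq_state { RS_INACTIVE, RS_INFLIGHT, RS_AWAIT, RS_BUSY, RS_SPARE };

#define MARK_INACTIVE 0x1u	/* plays the role of SG_XA_RQ_INACTIVE */
#define MARK_AWAIT    0x2u	/* plays the role of SG_XA_RQ_AWAIT */

struct req {
	atomic_int st;		/* holds an enum rq_state */
	unsigned int marks;	/* stands in for the xarray marks */
};

/* Same encoding as sg_rq_state_arr[] and sg_rq_state_mul2arr[] */
static const int from_tbl[] = {1, 0, 4, 0, 0};	/* leaving a marked state */
static const int to_tbl[]   = {2, 0, 8, 0, 0};	/* entering a marked state */

static void apply_marks(struct req *r, int indic)
{
	if (indic & 1)			/* from inactive */
		r->marks &= ~MARK_INACTIVE;
	else if (indic & 2)		/* to inactive */
		r->marks |= MARK_INACTIVE;
	if (indic & 4)			/* from await */
		r->marks &= ~MARK_AWAIT;
	else if (indic & 8)		/* to await */
		r->marks |= MARK_AWAIT;
}

/* Returns 0 on success, -EPROTOTYPE if the current state is not old_st */
static int state_chg(struct req *r, enum rq_state old_st,
		     enum rq_state new_st)
{
	int expected = old_st;
	int indic;

	assert((unsigned int)old_st < sizeof(from_tbl) / sizeof(from_tbl[0]));
	assert((unsigned int)new_st < sizeof(to_tbl) / sizeof(to_tbl[0]));
	indic = from_tbl[old_st] + to_tbl[new_st];
	if (!atomic_compare_exchange_strong(&r->st, &expected, (int)new_st))
		return -EPROTOTYPE;	/* wrong old state or lost a race */
	if (indic)
		apply_marks(r, indic);	/* the driver does this under xa_lock */
	return 0;
}
```

For example, moving a request from RS_INACTIVE to RS_BUSY clears the inactive mark. An identical second call then fails with -EPROTOTYPE, because the state is no longer RS_INACTIVE.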


* [PATCH v18 25/83] sg: replace rq array with xarray
From: Douglas Gilbert @ 2021-04-27 21:56 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare, kbuild test robot

Remove the fixed-size array of 16 request elements per file
descriptor and replace it with the xarray added in the previous
patch. All sg_request objects are now kept, available for
re-use, until their owning file descriptor is closed; they are
freed in sg_remove_sfp_usercontext().
Each active sg_request object has an associated block
request and scsi_request object, but these have different
lifetimes from the sg_request. The block request and the
scsi_request object are released much earlier; their lifetime
is the same as it was in the v3 sg driver. The lifetime of the
bio is also unchanged (it is extended in a later patch).
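The re-use policy described above can be sketched in user-space C: look for an inactive request first, allocate only when none can be claimed, and free everything at close. This is an illustration under assumptions, not the driver's code. A growable pointer array stands in for the per-fd srp_arr xarray, and all names are hypothetical. Claiming uses a compare-and-exchange from inactive to busy, in the spirit of sg_setup_req(). The append path here is not thread-safe, whereas the driver serializes it with the xarray lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical subset of request states */
enum { PR_INACTIVE = 0, PR_BUSY = 1 };

struct pool_req {
	atomic_int st;		/* PR_INACTIVE or PR_BUSY */
	unsigned int idx;	/* slot index, like rq_idx */
};

struct req_pool {		/* stands in for the per-fd srp_arr xarray */
	struct pool_req **slots;
	unsigned int n;		/* slots in use */
	unsigned int cap;	/* slots allocated */
};

/* Claim an inactive request if possible, otherwise allocate a new one. */
static struct pool_req *pool_get(struct req_pool *p)
{
	struct pool_req *r;

	for (unsigned int i = 0; i < p->n; i++) {
		int expected = PR_INACTIVE;

		if (atomic_compare_exchange_strong(&p->slots[i]->st,
						   &expected, PR_BUSY))
			return p->slots[i];	/* re-used, no allocation */
	}
	if (p->n == p->cap) {
		unsigned int ncap = p->cap ? 2 * p->cap : 4;
		struct pool_req **ns = realloc(p->slots, ncap * sizeof(*ns));

		if (!ns)
			return NULL;
		p->slots = ns;
		p->cap = ncap;
	}
	r = calloc(1, sizeof(*r));
	if (!r)
		return NULL;
	atomic_init(&r->st, PR_BUSY);
	r->idx = p->n;
	p->slots[p->n++] = r;
	return r;
}

/* Deactivate: the object stays in the pool for later re-use. */
static void pool_put(struct pool_req *r)
{
	assert(atomic_load(&r->st) == PR_BUSY);	/* catch double release */
	atomic_store(&r->st, PR_INACTIVE);
}

/* Only at "close" time are the request objects actually freed. */
static void pool_destroy(struct req_pool *p)
{
	for (unsigned int i = 0; i < p->n; i++)
		free(p->slots[i]);
	free(p->slots);
	p->slots = NULL;
	p->n = 0;
	p->cap = 0;
}
```

A request released with pool_put() is handed back by the next pool_get() without a new allocation. Only pool_destroy() frees the objects.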

Collect various flags into bit maps: one for requests
(SG_FRQ_*) and the other for file descriptors (SG_FFD_*).
They join a per sg_device bit map (SG_FDEV_*) added in an
earlier patch.
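Collecting per-fd char fields such as cmd_q and keep_orphan into a single `unsigned long` word works roughly as sketched below. The bit numbers and helper names are hypothetical. The helpers are simple non-atomic analogues of kernel bitops. The driver itself applies the kernel's bitops, for example set_bit(), test_bit() and __assign_bit(), directly to ffd_bm and frq_bm.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define BITS_PER_UL (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical bit numbers for per-fd flags (formerly separate chars) */
enum { FFD_CMD_Q, FFD_KEEP_ORPHAN, FFD_MMAP_CALLED, FFD_Q_AT_TAIL, FFD_NR };

/* All fd flags must fit in ffd_bm[0] */
static_assert(FFD_NR <= BITS_PER_UL, "too many fd flags for one word");

static inline void bm_set(unsigned long *bm, unsigned int nr)
{
	bm[nr / BITS_PER_UL] |= 1UL << (nr % BITS_PER_UL);
}

static inline void bm_clear(unsigned long *bm, unsigned int nr)
{
	bm[nr / BITS_PER_UL] &= ~(1UL << (nr % BITS_PER_UL));
}

static inline bool bm_test(const unsigned long *bm, unsigned int nr)
{
	return (bm[nr / BITS_PER_UL] >> (nr % BITS_PER_UL)) & 1UL;
}

/* Set or clear bit nr according to val, like __assign_bit() */
static inline void bm_assign(unsigned long *bm, unsigned int nr, bool val)
{
	if (val)
		bm_set(bm, nr);
	else
		bm_clear(bm, nr);
}
```

With this layout, `char cmd_q` becomes bit FFD_CMD_Q of ffd_bm[0], and turning command queuing on is bm_set(ffd_bm, FFD_CMD_Q).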

Before a new sg_request object is built (or an inactive one is
re-built), the information that will be placed in it is carried
in a struct sg_comm_wr_t object.

Since the above changes touch almost every function and the
low-level structures, this patch is large.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 1547 ++++++++++++++++++++++++++++-----------------
 1 file changed, 983 insertions(+), 564 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 273861374de7..6df7aa81349b 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -142,36 +142,51 @@ static struct class_interface sg_interface = {
 	.remove_dev     = sg_remove_device,
 };
 
+/* Subset of sg_io_hdr found in <scsi/sg.h>, has only [i] and [i->o] fields */
+struct sg_slice_hdr3 {
+	int interface_id;
+	int dxfer_direction;
+	u8 cmd_len;
+	u8 mx_sb_len;
+	u16 iovec_count;
+	unsigned int dxfer_len;
+	void __user *dxferp;
+	u8 __user *cmdp;
+	void __user *sbp;
+	unsigned int timeout;
+	unsigned int flags;
+	int pack_id;
+	void __user *usr_ptr;
+};
+
 struct sg_scatter_hold {     /* holding area for scsi scatter gather info */
 	struct page **pages;	/* num_sgat element array of struct page* */
 	int buflen;		/* capacity in bytes (dlen<=buflen) */
 	int dlen;		/* current valid data length of this req */
 	u16 page_order;		/* byte_len = (page_size*(2**page_order)) */
 	u16 num_sgat;		/* actual number of scatter-gather segments */
-	unsigned int sglist_len; /* size of malloc'd scatter-gather list ++ */
-	char dio_in_use;	/* 0->indirect IO (or mmap), 1->dio */
-	u8 cmd_opcode;		/* first byte of command */
 };
 
 struct sg_device;		/* forward declarations */
 struct sg_fd;
 
-struct sg_request {	/* SG_MAX_QUEUE requests outstanding per file */
-	struct sg_scatter_hold data;	/* hold buffer, perhaps scatter list */
-	struct sg_io_hdr header;  /* scsi command+info, see <scsi/sg.h> */
+struct sg_request {	/* active SCSI command or inactive request */
+	struct sg_scatter_hold sgat_h;	/* hold buffer, perhaps scatter list */
+	struct sg_slice_hdr3 s_hdr3;  /* subset of sg_io_hdr */
 	u8 sense_b[SCSI_SENSE_BUFFERSIZE];
 	u32 duration;		/* cmd duration in milliseconds */
+	u32 rq_flags;		/* hold user supplied flags */
 	u32 rq_idx;		/* my index within parent's srp_arr */
-	char res_used;		/* 1 -> using reserve buffer, 0 -> not ... */
-	char orphan;		/* 1 -> drop on sight, 0 -> normal */
+	u32 rq_info;		/* info supplied by v3 and v4 interfaces */
 	u32 rq_result;		/* packed scsi request result from LLD */
-	char sg_io_owned;	/* 1 -> packet belongs to SG_IO */
-	/* done protected by rq_list_lock */
-	char done;		/* 0->before bh, 1->before read, 2->read */
+	int in_resid;		/* requested-actual byte count on data-in */
+	int pack_id;		/* user provided packet identifier field */
+	int sense_len;		/* actual sense buffer length (data-in) */
 	atomic_t rq_st;		/* request state, holds a enum sg_rq_state */
+	u8 cmd_opcode;		/* first byte of SCSI cdb */
 	u64 start_ns;		/* starting point of command duration calc */
-	unsigned long frq_bm[1];        /* see SG_FRQ_* defines above */
-	struct sg_fd *parentfp; /* pointer to owning fd, even when on fl */
+	unsigned long frq_bm[1];	/* see SG_FRQ_* defines above */
+	struct sg_fd *parentfp;	/* pointer to owning fd, even when on fl */
 	struct request *rq;	/* released in sg_rq_end_io(), bio kept */
 	struct bio *bio;	/* kept until this req -->SG_RS_INACTIVE */
 	struct execute_work ew_orph;	/* harvest orphan request */
@@ -180,7 +195,7 @@ struct sg_request {	/* SG_MAX_QUEUE requests outstanding per file */
 struct sg_fd {		/* holds the state of a file descriptor */
 	struct sg_device *parentdp;	/* owning device */
 	wait_queue_head_t read_wait;	/* queue read until command done */
-	struct mutex f_mutex;	/* protect against changes in this fd */
+	struct mutex f_mutex;	/* serialize ioctls on this fd */
 	int timeout;		/* defaults to SG_DEFAULT_TIMEOUT      */
 	int timeout_user;	/* defaults to SG_DEFAULT_TIMEOUT_USER */
 	u32 idx;		/* my index within parent's sfp_arr */
@@ -188,15 +203,12 @@ struct sg_fd {		/* holds the state of a file descriptor */
 	atomic_t waiting;	/* number of requests awaiting receive */
 	atomic_t req_cnt;	/* number of requests */
 	int sgat_elem_sz;	/* initialized to scatter_elem_sz */
-	struct sg_scatter_hold reserve;	/* buffer for this file descriptor */
-	char force_packid;	/* 1 -> pack_id input to read(), 0 -> ignored */
-	char cmd_q;		/* 1 -> allow command queuing, 0 -> don't */
+	unsigned long ffd_bm[1];	/* see SG_FFD_* defines above */
+	pid_t tid;		/* thread id when opened */
 	u8 next_cmd_len;	/* 0: automatic, >0: use on next write() */
-	char keep_orphan;	/* 0 -> drop orphan (def), 1 -> keep for read() */
-	char mmap_called;	/* 0 -> mmap() never called on this fd */
-	char res_in_use;	/* 1 -> 'reserve' array in use */
-	struct fasync_struct *async_qp;	/* used by asynchronous notification */
-	struct xarray srp_arr;
+	struct sg_request *rsv_srp;/* one reserve request per fd */
+	struct fasync_struct *async_qp; /* used by asynchronous notification */
+	struct xarray srp_arr;	/* xarray of sg_request object pointers */
 	struct kref f_ref;
 	struct execute_work ew_fd;  /* harvest all fd resources and lists */
 };
@@ -219,8 +231,8 @@ struct sg_device { /* holds the state of each scsi generic device */
 
 struct sg_comm_wr_t {  /* arguments to sg_common_write() */
 	int timeout;
-	int blocking;
-	struct sg_request *srp;
+	unsigned long frq_bm[1];	/* see SG_FRQ_* defines above */
+	struct sg_io_hdr *h3p;
 	u8 *cmnd;
 };
 
@@ -228,31 +240,32 @@ struct sg_comm_wr_t {  /* arguments to sg_common_write() */
 static void sg_rq_end_io(struct request *rq, blk_status_t status);
 /* Declarations of other static functions used before they are defined */
 static int sg_proc_init(void);
-static int sg_start_req(struct sg_request *srp, u8 *cmd);
+static int sg_start_req(struct sg_request *srp, u8 *cmd, int cmd_len,
+			int dxfer_dir);
 static void sg_finish_scsi_blk_rq(struct sg_request *srp);
-static int sg_mk_sgat(struct sg_scatter_hold *schp, struct sg_fd *sfp,
-		      int minlen);
-static ssize_t sg_submit(struct sg_fd *sfp, struct file *filp,
-			 const char __user *buf, size_t count, bool blocking,
-			 bool read_only, bool sg_io_owned,
-			 struct sg_request **o_srp);
-static int sg_common_write(struct sg_fd *sfp, struct sg_comm_wr_t *cwrp);
+static int sg_mk_sgat(struct sg_request *srp, struct sg_fd *sfp, int minlen);
+static int sg_submit(struct file *filp, struct sg_fd *sfp,
+		     struct sg_io_hdr *hp, bool sync,
+		     struct sg_request **o_srp);
+static struct sg_request *sg_common_write(struct sg_fd *sfp,
+					  struct sg_comm_wr_t *cwrp);
 static int sg_read_append(struct sg_request *srp, void __user *outp,
 			  int num_xfer);
-static void sg_remove_sgat(struct sg_fd *sfp, struct sg_scatter_hold *schp);
-static void sg_build_reserve(struct sg_fd *sfp, int req_size);
-static void sg_link_reserve(struct sg_fd *sfp, struct sg_request *srp,
-			    int size);
-static void sg_unlink_reserve(struct sg_fd *sfp, struct sg_request *srp);
+static void sg_remove_sgat(struct sg_request *srp);
 static struct sg_fd *sg_add_sfp(struct sg_device *sdp);
 static void sg_remove_sfp(struct kref *);
 static struct sg_request *sg_find_srp_by_id(struct sg_fd *sfp, int pack_id);
-static struct sg_request *sg_setup_req(struct sg_fd *sfp);
+static struct sg_request *sg_setup_req(struct sg_fd *sfp, int dxfr_len,
+				       struct sg_comm_wr_t *cwrp);
 static void sg_deact_request(struct sg_fd *sfp, struct sg_request *srp);
 static struct sg_device *sg_get_dev(int dev);
 static void sg_device_destroy(struct kref *kref);
+static struct sg_request *sg_mk_srp_sgat(struct sg_fd *sfp, bool first,
+					 int db_len);
 static const char *sg_rq_st_str(enum sg_rq_state rq_st, bool long_str);
 
+#define SG_WRITE_COUNT_LIMIT (32 * 1024 * 1024)
+
 #define SZ_SG_HEADER ((int)sizeof(struct sg_header))	/* v1 and v2 header */
 #define SZ_SG_IO_HDR ((int)sizeof(struct sg_io_hdr))	/* v3 header */
 #define SZ_SG_REQ_INFO ((int)sizeof(struct sg_req_info))
@@ -518,6 +531,7 @@ sg_release(struct inode *inode, struct file *filp)
 static ssize_t
 sg_write(struct file *filp, const char __user *p, size_t count, loff_t *ppos)
 {
+	bool get_v3_hdr;
 	int mxsize, cmd_size, input_size, res;
 	u8 opcode;
 	struct sg_device *sdp;
@@ -540,36 +554,61 @@ sg_write(struct file *filp, const char __user *p, size_t count, loff_t *ppos)
 	res = sg_allow_if_err_recovery(sdp, !!(filp->f_flags & O_NONBLOCK));
 	if (res)
 		return res;
-
-	if (count < SZ_SG_HEADER)
+	if (count < SZ_SG_HEADER || count > SG_WRITE_COUNT_LIMIT)
 		return -EIO;
-	if (copy_from_user(ohp, p, SZ_SG_HEADER))
-		return -EFAULT;
-	if (ohp->reply_len < 0) {	/* assume this is v3 */
-		struct sg_io_hdr *reinter_2p = (struct sg_io_hdr *)ohp;
+#ifdef CONFIG_COMPAT
+	if (in_compat_syscall())
+		get_v3_hdr = (count == sizeof(struct compat_sg_io_hdr));
+	else
+		get_v3_hdr = (count == sizeof(struct sg_io_hdr));
+#else
+	get_v3_hdr = (count == sizeof(struct sg_io_hdr));
+#endif
+	if (get_v3_hdr) {
+		if (get_sg_io_hdr(h3p, p))
+			return -EFAULT;
+	} else {
+		if (copy_from_user(ohp, p, SZ_SG_HEADER))
+			return -EFAULT;
+		if (ohp->reply_len < 0) {	/* not v2, may be v3 */
+			bool lt = false;
 
-		if (count < SZ_SG_IO_HDR)
-			return -EIO;
-		if (reinter_2p->interface_id != 'S') {
+#ifdef CONFIG_COMPAT
+			if (in_compat_syscall())
+				lt = (count < sizeof(struct compat_sg_io_hdr));
+			else
+				lt = (count < sizeof(struct sg_io_hdr));
+#else
+			lt = (count < sizeof(struct sg_io_hdr));
+#endif
+			if (lt)
+				return -EIO;
+			get_v3_hdr = true;
+			if (get_sg_io_hdr(h3p, p))
+				return -EFAULT;
+		}
+	}
+	if (get_v3_hdr) {
+		/* v3 dxfer_direction_s are all negative values by design */
+		if (h3p->dxfer_direction >= 0) {	/* so it is not v3 */
+			memcpy(ohp, h3p, count);
+			goto to_v2;
+		}
+		if (h3p->interface_id != 'S') {
 			pr_info_once("sg: %s: v3 interface only here\n",
 				     __func__);
 			return -EPERM;
 		}
-		return sg_submit(sfp, filp, p, count,
-				 !(filp->f_flags & O_NONBLOCK), false, false,
-				 NULL);
+		res = sg_submit(filp, sfp, h3p, false, NULL);
+		return res < 0 ? res : (int)count;
 	}
+to_v2:
+	/* v1 and v2 interfaces processed below this point */
 	if (count < (SZ_SG_HEADER + 6))
-		return -EIO;	/* The minimum scsi command length is 6 bytes. */
-
+		return -EIO;    /* minimum scsi command length is 6 bytes */
 	p += SZ_SG_HEADER;
 	if (get_user(opcode, p))
 		return -EFAULT;
-
-	if (!(srp = sg_setup_req(sfp))) {
-		SG_LOG(1, sfp, "%s: queue full\n", __func__);
-		return -EDOM;
-	}
 	mutex_lock(&sfp->f_mutex);
 	if (sfp->next_cmd_len > 0) {
 		cmd_size = sfp->next_cmd_len;
@@ -586,12 +625,10 @@ sg_write(struct file *filp, const char __user *p, size_t count, loff_t *ppos)
 	mxsize = max_t(int, input_size, ohp->reply_len);
 	mxsize -= SZ_SG_HEADER;
 	input_size -= SZ_SG_HEADER;
-	if (input_size < 0) {
-		sg_deact_request(sfp, srp);
-		return -EIO;	/* User did not pass enough bytes for this command. */
-	}
-	h3p = &srp->header;
-	h3p->interface_id = '\0';  /* indicator of old interface tunnelled */
+	if (input_size < 0)
+		return -EIO; /* Insufficient bytes passed for this command. */
+	memset(h3p, 0, sizeof(*h3p));
+	h3p->interface_id = '\0';/* indicate v1 or v2 interface (tunnelled) */
 	h3p->cmd_len = (u8)cmd_size;
 	h3p->iovec_count = 0;
 	h3p->mx_sb_len = 0;
@@ -612,10 +649,9 @@ sg_write(struct file *filp, const char __user *p, size_t count, loff_t *ppos)
 	h3p->flags = input_size;	/* structure abuse ... */
 	h3p->pack_id = ohp->pack_id;
 	h3p->usr_ptr = NULL;
-	if (copy_from_user(cmnd, p, cmd_size)) {
-		sg_deact_request(sfp, srp);
+	cmnd[0] = opcode;
+	if (copy_from_user(cmnd + 1, p + 1, cmd_size - 1))
 		return -EFAULT;
-	}
 	/*
 	 * SG_DXFER_TO_FROM_DEV is functionally equivalent to SG_DXFER_FROM_DEV,
 	 * but it is possible that the app intended SG_DXFER_TO_DEV, because
@@ -629,23 +665,23 @@ sg_write(struct file *filp, const char __user *p, size_t count, loff_t *ppos)
 			 __func__, ohp->reply_len - (int)SZ_SG_HEADER,
 			 input_size, (unsigned int)cmnd[0], current->comm);
 	}
+	cwr.frq_bm[0] = 0;	/* initial state clear for all req flags */
+	cwr.h3p = h3p;
 	cwr.timeout = sfp->timeout;
-	cwr.blocking = !(filp->f_flags & O_NONBLOCK);
-	cwr.srp = srp;
 	cwr.cmnd = cmnd;
-	res = sg_common_write(sfp, &cwr);
-	return (res < 0) ? res : count;
+	srp = sg_common_write(sfp, &cwr);
+	return (IS_ERR(srp)) ? PTR_ERR(srp) : (int)count;
 }
 
 static inline int
 sg_chk_mmap(struct sg_fd *sfp, int rq_flags, int len)
 {
-	if (len > sfp->reserve.buflen)
-		return -ENOMEM;	/* MMAP_IO size must fit in reserve buffer */
+	if (!xa_empty(&sfp->srp_arr))
+		return -EBUSY;  /* already active requests on fd */
+	if (len > sfp->rsv_srp->sgat_h.buflen)
+		return -ENOMEM; /* MMAP_IO size must fit in reserve */
 	if (rq_flags & SG_FLAG_DIRECT_IO)
-		return -EINVAL;	/* either MMAP_IO or DIRECT_IO (not both) */
-	if (sfp->res_in_use)
-		return -EBUSY;	/* reserve buffer already being used */
+		return -EINVAL; /* not both MMAP_IO and DIRECT_IO */
 	return 0;
 }
 
@@ -670,61 +706,40 @@ sg_fetch_cmnd(struct file *filp, struct sg_fd *sfp, const u8 __user *u_cdbp,
 	return 0;
 }
 
-static ssize_t
-sg_submit(struct sg_fd *sfp, struct file *filp, const char __user *buf,
-	  size_t count, bool blocking, bool read_only, bool sg_io_owned,
-	  struct sg_request **o_srp)
+static int
+sg_submit(struct file *filp, struct sg_fd *sfp, struct sg_io_hdr *hp,
+	  bool sync, struct sg_request **o_srp)
 {
-	int k, res, timeout;
+	int res, timeout;
+	unsigned long ul_timeout;
 	struct sg_request *srp;
-	struct sg_io_hdr *hp;
 	struct sg_comm_wr_t cwr;
 	u8 cmnd[SG_MAX_CDB_SIZE];
-	unsigned long ul_timeout;
 
-	if (count < SZ_SG_IO_HDR)
-		return -EINVAL;
-
-	sfp->cmd_q = 1;	/* when sg_io_hdr seen, set command queuing on */
-	if (!(srp = sg_setup_req(sfp))) {
-		SG_LOG(1, sfp, "%s: queue full\n", __func__);
-		return -EDOM;
-	}
-	srp->sg_io_owned = sg_io_owned;
-	hp = &srp->header;
-	/* get_sg_io_hdr() is defined in block/scsi_ioctl.c */
-	if (get_sg_io_hdr(hp, buf)) {
-		sg_deact_request(sfp, srp);
-		return -EFAULT;
-	}
-	if (hp->interface_id != 'S') {
-		sg_deact_request(sfp, srp);
-		return -ENOSYS;
-	}
+	/* now doing v3 blocking (sync) or non-blocking submission */
 	if (hp->flags & SG_FLAG_MMAP_IO) {
 		res = sg_chk_mmap(sfp, hp->flags, hp->dxfer_len);
-		if (res) {
-			sg_deact_request(sfp, srp);
+		if (res)
 			return res;
-		}
 	}
-	ul_timeout = msecs_to_jiffies(srp->header.timeout);
-	timeout = (ul_timeout < INT_MAX) ? ul_timeout : INT_MAX;
+	/* when v3 seen, allow cmd_q on this fd (def: no cmd_q) */
+	set_bit(SG_FFD_CMD_Q, sfp->ffd_bm);
+	ul_timeout = msecs_to_jiffies(hp->timeout);
+	timeout = min_t(unsigned long, ul_timeout, INT_MAX);
 	res = sg_fetch_cmnd(filp, sfp, hp->cmdp, hp->cmd_len, cmnd);
-	if (res) {
-		sg_deact_request(sfp, srp);
+	if (res)
 		return res;
-	}
+	cwr.frq_bm[0] = 0;
+	__assign_bit(SG_FRQ_SYNC_INVOC, cwr.frq_bm, (int)sync);
+	cwr.h3p = hp;
 	cwr.timeout = timeout;
-	cwr.blocking = blocking;
-	cwr.srp = srp;
 	cwr.cmnd = cmnd;
-	k = sg_common_write(sfp, &cwr);
-	if (k < 0)
-		return k;
+	srp = sg_common_write(sfp, &cwr);
+	if (IS_ERR(srp))
+		return PTR_ERR(srp);
 	if (o_srp)
-		*o_srp = cwr.srp;
-	return count;
+		*o_srp = srp;
+	return 0;
 }
 
 #if IS_ENABLED(SG_LOG_ACTIVE)
@@ -842,70 +857,68 @@ sg_rq_state_chg(struct sg_request *srp, enum sg_rq_state old_st,
  * sg_request object holding the request just issued or a negated errno
  * value twisted by ERR_PTR.
  */
-static int
+static struct sg_request *
 sg_common_write(struct sg_fd *sfp, struct sg_comm_wr_t *cwrp)
 {
 	bool at_head;
-	int k;
+	int res = 0;
+	int dxfr_len, dir, cmd_len;
+	int pack_id = SG_PACK_ID_WILDCARD;
+	u32 rq_flags;
 	struct sg_device *sdp = sfp->parentdp;
-	struct sg_request *srp = cwrp->srp;
-	struct sg_io_hdr *hp = &srp->header;
-
-	srp->data.cmd_opcode = cwrp->cmnd[0];	/* hold opcode of command */
-	hp->status = 0;
-	hp->masked_status = 0;
-	hp->msg_status = 0;
-	hp->info = 0;
-	hp->host_status = 0;
-	hp->driver_status = 0;
-	hp->resid = 0;
-	SG_LOG(4, sfp, "%s:  opcode=0x%02x, cmd_sz=%d\n", __func__,
-	       (int)cwrp->cmnd[0], hp->cmd_len);
-
-	if (hp->dxfer_len >= SZ_256M) {
-		sg_deact_request(sfp, srp);
-		return -EINVAL;
-	}
-
-	k = sg_start_req(srp, cwrp->cmnd);
-	if (k) {
-		SG_LOG(1, sfp, "%s: start_req err=%d\n", __func__, k);
-		sg_finish_scsi_blk_rq(srp);
-		sg_deact_request(sfp, srp);
-		return k;	/* probably out of space --> ENOMEM */
-	}
-	if (SG_IS_DETACHING(sdp))
+	struct sg_request *srp;
+	struct sg_io_hdr *hi_p;
+
+	hi_p = cwrp->h3p;
+	dir = hi_p->dxfer_direction;
+	dxfr_len = hi_p->dxfer_len;
+	rq_flags = hi_p->flags;
+	pack_id = hi_p->pack_id;
+	if (dxfr_len >= SZ_256M)
+		return ERR_PTR(-EINVAL);
+
+	srp = sg_setup_req(sfp, dxfr_len, cwrp);
+	if (IS_ERR(srp))
+		return srp;
+	srp->rq_flags = rq_flags;
+	srp->pack_id = pack_id;
+
+	cmd_len = hi_p->cmd_len;
+	memcpy(&srp->s_hdr3, hi_p, sizeof(srp->s_hdr3));
+	srp->cmd_opcode = cwrp->cmnd[0];/* hold opcode of command for debug */
+	SG_LOG(4, sfp, "%s: opcode=0x%02x, cdb_sz=%d, pack_id=%d\n", __func__,
+	       (int)cwrp->cmnd[0], cmd_len, pack_id);
+
+	res = sg_start_req(srp, cwrp->cmnd, cmd_len, dir);
+	if (res < 0)		/* probably out of space --> -ENOMEM */
 		goto err_out;
-
-	hp->duration = jiffies_to_msecs(jiffies);
-	if (hp->interface_id != '\0' &&	/* v3 (or later) interface */
-	    (SG_FLAG_Q_AT_TAIL & hp->flags))
-		at_head = false;
-	else
-		at_head = true;
-
-	if (!srp->sg_io_owned)
-		atomic_inc(&sfp->submitted);
+	if (unlikely(SG_IS_DETACHING(sdp))) {
+		res = -ENODEV;
+		goto err_out;
+	}
 	srp->rq->timeout = cwrp->timeout;
 	kref_get(&sfp->f_ref); /* sg_rq_end_io() does kref_put(). */
+	res = sg_rq_state_chg(srp, SG_RS_BUSY, SG_RS_INFLIGHT, false,
+			      __func__);
+	if (res)
+		goto err_out;
+	srp->start_ns = ktime_get_boottime_ns();
+	srp->duration = 0;
+
+	if (srp->s_hdr3.interface_id == '\0')
+		at_head = true; /* backward compatibility: v1+v2 interfaces */
+	else if (test_bit(SG_FFD_Q_AT_TAIL, sfp->ffd_bm))
+		/* cmd flags can override sfd setting */
+		at_head = !!(srp->rq_flags & SG_FLAG_Q_AT_HEAD);
+	else            /* this sfd is defaulting to head */
+		at_head = !(srp->rq_flags & SG_FLAG_Q_AT_TAIL);
+	if (!test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm))
+		atomic_inc(&sfp->submitted);
 	blk_execute_rq_nowait(sdp->disk, srp->rq, at_head, sg_rq_end_io);
-	return 0;
+	return srp;
 err_out:
-	if (srp->bio) {
-		scsi_req_free_cmd(scsi_req(srp->rq));
-		blk_put_request(srp->rq);
-		srp->rq = NULL;
-	}
-	sg_finish_scsi_blk_rq(srp);
 	sg_deact_request(sfp, srp);
-	return -ENODEV;
-}
-
-static inline int
-sg_rstate_chg(struct sg_request *srp, enum sg_rq_state old_st,
-	      enum sg_rq_state new_st)
-{
-	return sg_rq_state_chg(srp, old_st, new_st, false, __func__);
+	return ERR_PTR(res);
 }
 
 /*
@@ -936,21 +949,26 @@ static int
 sg_copy_sense(struct sg_request *srp)
 {
 	int sb_len_ret = 0;
-	struct sg_io_hdr *hp = &srp->header;
+	int scsi_stat;
 
 	/* If need be, copy the sense buffer to the user space */
-	if ((CHECK_CONDITION & hp->masked_status) ||
-	    (DRIVER_SENSE & hp->driver_status)) {
-		int sb_len = SCSI_SENSE_BUFFERSIZE;
-		void __user *up = hp->sbp;
-
-		sb_len = min_t(int, hp->mx_sb_len, sb_len);
-		/* Additional sense length field */
-		sb_len_ret = 8 + (int)srp->sense_b[7];
-		sb_len_ret = min_t(int, sb_len_ret, sb_len);
-		if (copy_to_user(up, srp->sense_b, sb_len_ret))
-			return -EFAULT;
-		hp->sb_len_wr = sb_len_ret;
+	scsi_stat = srp->rq_result & 0xff;
+	if ((scsi_stat & SAM_STAT_CHECK_CONDITION) ||
+	    (driver_byte(srp->rq_result) & DRIVER_SENSE)) {
+		int sb_len = min_t(int, SCSI_SENSE_BUFFERSIZE, srp->sense_len);
+		int mx_sb_len = srp->s_hdr3.mx_sb_len;
+		void __user *up = srp->s_hdr3.sbp;
+
+		if (up && mx_sb_len > 0) {
+			sb_len = min_t(int, mx_sb_len, sb_len);
+			/* Additional sense length field */
+			sb_len_ret = 8 + (int)srp->sense_b[7];
+			sb_len_ret = min_t(int, sb_len_ret, sb_len);
+			if (copy_to_user(up, srp->sense_b, sb_len_ret))
+				sb_len_ret = -EFAULT;
+		} else {
+			sb_len_ret = 0;
+		}
 	}
 	return sb_len_ret;
 }
@@ -959,12 +977,15 @@ static int
 sg_rec_state_v3(struct sg_fd *sfp, struct sg_request *srp)
 {
 	int sb_len_wr;
+	u32 rq_res = srp->rq_result;
 
 	sb_len_wr = sg_copy_sense(srp);
 	if (sb_len_wr < 0)
 		return sb_len_wr;
+	if (rq_res & SG_ML_RESULT_MSK)
+		srp->rq_info |= SG_INFO_CHECK;
 	if (unlikely(SG_IS_DETACHING(sfp->parentdp)))
-		return -ENODEV;
+		srp->rq_info |= SG_INFO_DEVICE_DETACHING;
 	return 0;
 }
 
@@ -972,8 +993,10 @@ static ssize_t
 sg_receive_v3(struct sg_fd *sfp, struct sg_request *srp, size_t count,
 	      void __user *p)
 {
-	int err = 0;
-	struct sg_io_hdr *hp = &srp->header;
+	int err, err2;
+	int rq_result = srp->rq_result;
+	struct sg_io_hdr hdr3;
+	struct sg_io_hdr *hp = &hdr3;
 
 	if (in_compat_syscall()) {
 		if (count < sizeof(struct compat_sg_io_hdr)) {
@@ -986,9 +1009,23 @@ sg_receive_v3(struct sg_fd *sfp, struct sg_request *srp, size_t count,
 	}
 	SG_LOG(3, sfp, "%s: srp=0x%pK\n", __func__, srp);
 	err = sg_rec_state_v3(sfp, srp);
-	if (hp->masked_status || hp->host_status || hp->driver_status)
-		hp->info |= SG_INFO_CHECK;
-	err = put_sg_io_hdr(hp, p);
+	memset(hp, 0, sizeof(*hp));
+	memcpy(hp, &srp->s_hdr3, sizeof(srp->s_hdr3));
+	hp->sb_len_wr = srp->sense_len;
+	hp->info = srp->rq_info;
+	hp->resid = srp->in_resid;
+	hp->duration = srp->duration;
+	hp->status = rq_result & 0xff;
+	hp->masked_status = status_byte(rq_result);
+	hp->msg_status = msg_byte(rq_result);
+	hp->host_status = host_byte(rq_result);
+	hp->driver_status = driver_byte(rq_result);
+	err2 = put_sg_io_hdr(hp, p);
+	err = err ? err : err2;
+	err2 = sg_rq_state_chg(srp, atomic_read(&srp->rq_st), SG_RS_RCV_DONE,
+			       false, __func__);
+	if (err2)
+		err = err ? err : err2;
 err_out:
 	sg_finish_scsi_blk_rq(srp);
 	sg_deact_request(sfp, srp);
@@ -1005,26 +1042,27 @@ sg_read_v1v2(void __user *buf, int count, struct sg_fd *sfp,
 	     struct sg_request *srp)
 {
 	int res = 0;
-	struct sg_io_hdr *sh3p = &srp->header;
+	u32 rq_result = srp->rq_result;
 	struct sg_header *h2p;
+	struct sg_slice_hdr3 *sh3p;
 	struct sg_header a_v2hdr;
 
 	h2p = &a_v2hdr;
 	memset(h2p, 0, SZ_SG_HEADER);
+	sh3p = &srp->s_hdr3;
 	h2p->reply_len = (int)sh3p->timeout;
 	h2p->pack_len = h2p->reply_len; /* old, strange behaviour */
 	h2p->pack_id = sh3p->pack_id;
-	h2p->twelve_byte = (srp->data.cmd_opcode >= 0xc0 &&
-			    sh3p->cmd_len == 12);
-	h2p->target_status = sh3p->masked_status;
-	h2p->host_status = sh3p->host_status;
-	h2p->driver_status = sh3p->driver_status;
-	if ((CHECK_CONDITION & h2p->target_status) ||
-	    (DRIVER_SENSE & sh3p->driver_status)) {
+	h2p->twelve_byte = (srp->cmd_opcode >= 0xc0 && sh3p->cmd_len == 12);
+	h2p->target_status = status_byte(rq_result);
+	h2p->host_status = host_byte(rq_result);
+	h2p->driver_status = driver_byte(rq_result);
+	if ((CHECK_CONDITION & status_byte(rq_result)) ||
+	    (DRIVER_SENSE & driver_byte(rq_result))) {
 		memcpy(h2p->sense_buffer, srp->sense_b,
 		       sizeof(h2p->sense_buffer));
 	}
-	switch (h2p->host_status) {
+	switch (host_byte(rq_result)) {
 	/*
 	 * This following setting of 'result' is for backward compatibility
 	 * and is best ignored by the user who should use target, host and
@@ -1048,7 +1086,7 @@ sg_read_v1v2(void __user *buf, int count, struct sg_fd *sfp,
 		h2p->result = EIO;
 		break;
 	case DID_ERROR:
-		h2p->result = (h2p->target_status == GOOD) ? 0 : EIO;
+		h2p->result = (status_byte(rq_result) == GOOD) ? 0 : EIO;
 		break;
 	default:
 		h2p->result = EIO;
@@ -1069,6 +1107,7 @@ sg_read_v1v2(void __user *buf, int count, struct sg_fd *sfp,
 	} else {
 		res = (h2p->result == 0) ? 0 : -EIO;
 	}
+	atomic_set(&srp->rq_st, SG_RS_RCV_DONE);
 	sg_finish_scsi_blk_rq(srp);
 	sg_deact_request(sfp, srp);
 	return res;
@@ -1112,13 +1151,13 @@ sg_read(struct file *filp, char __user *p, size_t count, loff_t *ppos)
 	hlen = could_be_v3 ? SZ_SG_IO_HDR : SZ_SG_HEADER;
 	h2p = (struct sg_header *)&a_sg_io_hdr;
 
-	if (sfp->force_packid && count >= hlen) {
+	if (test_bit(SG_FFD_FORCE_PACKID, sfp->ffd_bm) && (int)count >= hlen) {
 		/*
 		 * Even though this is a user space read() system call, this
 		 * code is cheating to fetch the pack_id.
 		 * Only need first three 32 bit ints to determine interface.
 		 */
-		if (unlikely(copy_from_user(h2p, p, 3 * sizeof(int))))
+		if (copy_from_user(h2p, p, 3 * sizeof(int)))
 			return -EFAULT;
 		if (h2p->reply_len < 0 && could_be_v3) {
 			struct sg_io_hdr *v3_hdr = (struct sg_io_hdr *)h2p;
@@ -1139,20 +1178,20 @@ sg_read(struct file *filp, char __user *p, size_t count, loff_t *ppos)
 	}
 	srp = sg_find_srp_by_id(sfp, want_id);
 	if (!srp) {	/* nothing available so wait on packet to arrive or */
-		if (SG_IS_DETACHING(sdp))
+		if (unlikely(SG_IS_DETACHING(sdp)))
 			return -ENODEV;
 		if (non_block) /* O_NONBLOCK or v3::flags & SGV4_FLAG_IMMED */
 			return -EAGAIN;
 		ret = wait_event_interruptible(sfp->read_wait,
 					       sg_get_ready_srp(sfp, &srp,
 								want_id));
-		if (SG_IS_DETACHING(sdp))
+		if (unlikely(SG_IS_DETACHING(sdp)))
 			return -ENODEV;
 		if (ret)	/* -ERESTARTSYS as signal hit process */
 			return ret;
 		/* otherwise srp should be valid */
 	}
-	if (srp->header.interface_id == '\0')
+	if (srp->s_hdr3.interface_id == '\0')
 		ret = sg_read_v1v2(p, (int)count, sfp, srp);
 	else
 		ret = sg_receive_v3(sfp, srp, count, p);
@@ -1223,7 +1262,7 @@ sg_calc_rq_dur(const struct sg_request *srp)
 	return (diff > (s64)U32_MAX) ? 3999999999U : (u32)diff;
 }
 
-/* Return of U32_MAX means srp is inactive */
+/* Return of U32_MAX means srp is in the inactive state */
 static u32
 sg_get_dur(struct sg_request *srp, const enum sg_rq_state *sr_stp,
 	   bool *is_durp)
@@ -1254,34 +1293,63 @@ static void
 sg_fill_request_element(struct sg_fd *sfp, struct sg_request *srp,
 			struct sg_req_info *rip)
 {
-	unsigned int ms;
+	unsigned long iflags;
 
-	rip->req_state = srp->done + 1;
-	rip->problem = srp->header.masked_status &
-		       srp->header.host_status &
-		       srp->header.driver_status;
-	rip->duration = sg_get_dur(srp, NULL, NULL); /* dummy */
-	if (srp->done) {
-		rip->duration = srp->header.duration;
-	} else {
-		ms = jiffies_to_msecs(jiffies);
-		rip->duration = (ms > srp->header.duration) ?
-				(ms - srp->header.duration) : 0;
-	}
-	rip->orphan = srp->orphan;
-	rip->sg_io_owned = srp->sg_io_owned;
-	rip->pack_id = srp->header.pack_id;
-	rip->usr_ptr = srp->header.usr_ptr;
+	xa_lock_irqsave(&sfp->srp_arr, iflags);
+	rip->duration = sg_get_dur(srp, NULL, NULL);
+	if (rip->duration == U32_MAX)
+		rip->duration = 0;
+	rip->orphan = test_bit(SG_FRQ_IS_ORPHAN, srp->frq_bm);
+	rip->sg_io_owned = test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm);
+	rip->problem = !!(srp->rq_result & SG_ML_RESULT_MSK);
+	rip->pack_id = srp->pack_id;
+	rip->usr_ptr = srp->s_hdr3.usr_ptr;
+	xa_unlock_irqrestore(&sfp->srp_arr, iflags);
+}
 
+static inline bool
+sg_rq_landed(struct sg_device *sdp, struct sg_request *srp)
+{
+	return atomic_read(&srp->rq_st) != SG_RS_INFLIGHT ||
+	       unlikely(SG_IS_DETACHING(sdp));
 }
 
+/*
+ * This is a blocking wait for a specific srp. On success the response is
+ * written back to the user's v3 header at p via sg_receive_v3().
+ */
 static int
-srp_done(struct sg_fd *sfp, struct sg_request *srp)
+sg_wait_event_srp(struct file *filp, struct sg_fd *sfp, void __user *p,
+		  struct sg_request *srp)
 {
-	int ret;
+	int res;
+	enum sg_rq_state sr_st;
+	struct sg_device *sdp = sfp->parentdp;
 
-	ret = srp->done;
-	return ret;
+	SG_LOG(3, sfp, "%s: about to wait_event...()\n", __func__);
+	/* usually will be woken up by sg_rq_end_io() callback */
+	res = wait_event_interruptible(sfp->read_wait,
+				       sg_rq_landed(sdp, srp));
+	if (unlikely(res)) { /* -ERESTARTSYS because signal hit thread */
+		set_bit(SG_FRQ_IS_ORPHAN, srp->frq_bm);
+		/* orphans are harvested when SG_FFD_KEEP_ORPHAN is clear */
+		atomic_set(&srp->rq_st, SG_RS_INFLIGHT);
+		SG_LOG(1, sfp, "%s:  wait_event_interruptible gave %d\n",
+		       __func__, res);
+		return res;
+	}
+	if (unlikely(SG_IS_DETACHING(sdp))) {
+		atomic_set(&srp->rq_st, SG_RS_INACTIVE);
+		return -ENODEV;
+	}
+	sr_st = atomic_read(&srp->rq_st);
+	if (unlikely(sr_st != SG_RS_AWAIT_RCV))
+		return -EPROTO;         /* Logic error */
+	res = sg_rq_state_chg(srp, sr_st, SG_RS_BUSY, false, __func__);
+	if (unlikely(res))
+		return res;
+	res = sg_receive_v3(sfp, srp, SZ_SG_IO_HDR, p);
+	return (res < 0) ? res : 0;
 }
 
 /*
@@ -1292,42 +1360,119 @@ static int
 sg_ctl_sg_io(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 	     void __user *p)
 {
-	bool read_only = O_RDWR != (filp->f_flags & O_ACCMODE);
 	int res;
-	struct sg_request *srp;
+	struct sg_request *srp = NULL;
+	u8 hu8arr[SZ_SG_IO_HDR];
+	struct sg_io_hdr *h3p = (struct sg_io_hdr *)hu8arr;
 
+	SG_LOG(3, sfp, "%s:  SG_IO%s\n", __func__,
+	       ((filp->f_flags & O_NONBLOCK) ? " O_NONBLOCK ignored" : ""));
 	res = sg_allow_if_err_recovery(sdp, false);
 	if (res)
 		return res;
-	res = sg_submit(sfp, filp, p, SZ_SG_IO_HDR, true, read_only,
-			true, &srp);
-	if (res < 0)
+	if (get_sg_io_hdr(h3p, p))
+		return -EFAULT;
+	if (h3p->interface_id == 'S')
+		res = sg_submit(filp, sfp, h3p, true, &srp);
+	else
+		return -EPERM;
+	if (unlikely(res < 0))
 		return res;
-	res = wait_event_interruptible
-		(sfp->read_wait, (srp_done(sfp, srp) || SG_IS_DETACHING(sdp)));
-	if (SG_IS_DETACHING(sdp))
-		return -ENODEV;
-	if (srp->done) {
-		srp->done = 2;
-		res = sg_receive_v3(sfp, srp, SZ_SG_IO_HDR, p);
-		return (res < 0) ? res : 0;
-	}
-	srp->orphan = 1;
+	if (!srp)	/* mrq case: already processed all responses */
+		return res;
+	res = sg_wait_event_srp(filp, sfp, p, srp);
+	if (res)
+		SG_LOG(1, sfp, "%s: %s=0x%pK  state: %s\n", __func__,
+		       "unexpected srp", srp,
+		       sg_rq_st_str(atomic_read(&srp->rq_st), false));
 	return res;
 }
 
+/*
+ * First normalize want_rsv_sz to be >= sfp->sgat_elem_sz and
+ * <= sdp->max_sgat_sz. Exit if that is the same as old size; otherwise
+ * create a new candidate request of the new size. Then either re-use the
+ * first inactive request on the free list whose buflen >= required size,
+ * or use the new candidate, which replaces the old reserve request (the
+ * old one is then freed). Returns 0 on success, else a negated errno value.
+ */
 static int
 sg_set_reserved_sz(struct sg_fd *sfp, int want_rsv_sz)
 {
-	if (want_rsv_sz != sfp->reserve.buflen) {
-		if (sfp->mmap_called ||
-		    sfp->res_in_use) {
-			return -EBUSY;
+	bool use_new_srp = false;
+	int res = 0;
+	int new_sz, blen;
+	unsigned long idx, iflags;
+	struct sg_request *o_srp;       /* prior reserve sg_request */
+	struct sg_request *n_srp;       /* new sg_request, may be used */
+	struct sg_request *t_srp;       /* other fl entries */
+	struct sg_device *sdp = sfp->parentdp;
+	struct xarray *xafp = &sfp->srp_arr;
+
+	o_srp = sfp->rsv_srp;
+	if (!o_srp)
+		return -EPROTO;
+	new_sz = min_t(int, want_rsv_sz, sdp->max_sgat_sz);
+	new_sz = max_t(int, new_sz, sfp->sgat_elem_sz);
+	blen = o_srp->sgat_h.buflen;
+	SG_LOG(3, sfp, "%s: was=%d, ask=%d, new=%d (sgat_elem_sz=%d)\n",
+	       __func__, blen, want_rsv_sz, new_sz, sfp->sgat_elem_sz);
+	if (blen == new_sz)
+		return 0;
+	n_srp = sg_mk_srp_sgat(sfp, true /* can take time */, new_sz);
+	if (IS_ERR(n_srp))
+		return PTR_ERR(n_srp);
+	sg_rq_state_force(n_srp, SG_RS_INACTIVE);
+	/* new sg_request object, sized correctly is now available */
+try_again:
+	o_srp = sfp->rsv_srp;
+	if (!o_srp) {
+		res = -EPROTO;
+		goto fini;
+	}
+	if (SG_RS_ACTIVE(o_srp) ||
+	    test_bit(SG_FFD_MMAP_CALLED, sfp->ffd_bm)) {
+		res = -EBUSY;
+		goto fini;
+	}
+	use_new_srp = true;
+	xa_for_each(xafp, idx, t_srp) {
+		if (t_srp != o_srp && new_sz <= t_srp->sgat_h.buflen &&
+		    !SG_RS_ACTIVE(t_srp)) {
+			/* good candidate on free list, use */
+			use_new_srp = false;
+			sfp->rsv_srp = t_srp;
+			break;
 		}
-		sg_remove_sgat(sfp, &sfp->reserve);
-		sg_build_reserve(sfp, want_rsv_sz);
 	}
-	return 0;
+	if (use_new_srp) {
+		struct sg_request *cxc_srp;
+
+		xa_lock_irqsave(xafp, iflags);
+		n_srp->rq_idx = o_srp->rq_idx;
+		idx = o_srp->rq_idx;
+		cxc_srp = __xa_cmpxchg(xafp, idx, o_srp, n_srp, GFP_ATOMIC);
+		if (o_srp == cxc_srp) {
+			sfp->rsv_srp = n_srp;
+			sg_rq_state_force(n_srp, SG_RS_INACTIVE);
+			xa_unlock_irqrestore(xafp, iflags);
+			SG_LOG(6, sfp, "%s: new rsv srp=0x%pK ++\n", __func__,
+			       n_srp);
+			sg_remove_sgat(o_srp);
+			kfree(o_srp);
+		} else {
+			xa_unlock_irqrestore(xafp, iflags);
+			SG_LOG(1, sfp, "%s: xa_cmpxchg() failed, again\n",
+			       __func__);
+			goto try_again;
+		}
+	}
+fini:
+	if (!use_new_srp) {
+		sg_remove_sgat(n_srp);
+		kfree(n_srp);   /* no-one else has seen n_srp, so safe */
+	}
+	return res;
 }
 
 #ifdef CONFIG_COMPAT
@@ -1357,26 +1502,43 @@ static int put_compat_request_table(struct compat_sg_req_info __user *o,
 }
 #endif
 
+/*
+ * For backward compatibility, output SG_MAX_QUEUE sg_req_info objects. First
+ * fetch from the active list then, if there is still room, from the free
+ * list. Some of the trailing elements may be empty, which is indicated by all
+ * fields being zero. Any requests beyond SG_MAX_QUEUE are ignored.
+ */
 static int
 sg_ctl_req_tbl(struct sg_fd *sfp, void __user *p)
 {
-	int result, val;
+	int k, result, val;
 	unsigned long idx;
 	struct sg_request *srp;
-	sg_req_info_t *rinfop;
+	struct sg_req_info *rinfop;
 
-	rinfop = kcalloc(SG_MAX_QUEUE, SZ_SG_REQ_INFO,
-			 GFP_KERNEL);
+	SG_LOG(3, sfp, "%s:    SG_GET_REQUEST_TABLE\n", __func__);
+	k = SG_MAX_QUEUE;
+	rinfop = kcalloc(k, SZ_SG_REQ_INFO, GFP_KERNEL);
 	if (!rinfop)
 		return -ENOMEM;
 	val = 0;
 	xa_for_each(&sfp->srp_arr, idx, srp) {
 		if (!srp)
 			continue;
-		if (xa_get_mark(&sfp->srp_arr, idx, SG_XA_RQ_AWAIT))
+		if (val >= SG_MAX_QUEUE)
+			break;
+		if (xa_get_mark(&sfp->srp_arr, idx, SG_XA_RQ_INACTIVE))
+			continue;
+		sg_fill_request_element(sfp, srp, rinfop + val);
+		val++;
+	}
+	xa_for_each(&sfp->srp_arr, idx, srp) {
+		if (!srp)
 			continue;
 		if (val >= SG_MAX_QUEUE)
 			break;
+		if (!xa_get_mark(&sfp->srp_arr, idx, SG_XA_RQ_INACTIVE))
+			continue;
 		sg_fill_request_element(sfp, srp, rinfop + val);
 		val++;
 	}
@@ -1443,16 +1605,13 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 		result = get_user(val, ip);
 		if (result)
 			return result;
-		sfp->force_packid = val ? 1 : 0;
+		assign_bit(SG_FFD_FORCE_PACKID, sfp->ffd_bm, !!val);
 		return 0;
 	case SG_GET_PACK_ID:    /* or tag of oldest "read"-able, -1 if none */
 		val = -1;
-		srp = NULL;
 		xa_for_each_marked(&sfp->srp_arr, idx, srp, SG_XA_RQ_AWAIT) {
-			if (!srp)
-				continue;
-			if ((1 == srp->done) && (!srp->sg_io_owned)) {
-				val = srp->header.pack_id;
+			if (!test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm)) {
+				val = srp->pack_id;
 				break;
 			}
 		}
@@ -1465,44 +1624,48 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 		       sdp->max_sgat_sz);
 		return put_user(sdp->max_sgat_sz, ip);
 	case SG_SET_RESERVED_SIZE:
-		mutex_lock(&sfp->f_mutex);
 		result = get_user(val, ip);
 		if (!result) {
 			if (val >= 0 && val <= (1024 * 1024 * 1024)) {
+				mutex_lock(&sfp->f_mutex);
 				result = sg_set_reserved_sz(sfp, val);
+				mutex_unlock(&sfp->f_mutex);
 			} else {
 				SG_LOG(3, sfp, "%s: invalid size\n", __func__);
 				result = -EINVAL;
 			}
 		}
-		mutex_unlock(&sfp->f_mutex);
 		return result;
 	case SG_GET_RESERVED_SIZE:
-		val = min_t(int, sfp->reserve.buflen,
-			    max_sectors_bytes(sdev->request_queue));
+		mutex_lock(&sfp->f_mutex);
+		val = min_t(int, sfp->rsv_srp->sgat_h.buflen,
+			    sdp->max_sgat_sz);
+		mutex_unlock(&sfp->f_mutex);
 		SG_LOG(3, sfp, "%s:    SG_GET_RESERVED_SIZE=%d\n",
 		       __func__, val);
-		return put_user(val, ip);
+		result = put_user(val, ip);
+		return result;
 	case SG_SET_COMMAND_Q:
 		SG_LOG(3, sfp, "%s:    SG_SET_COMMAND_Q\n", __func__);
 		result = get_user(val, ip);
 		if (result)
 			return result;
-		sfp->cmd_q = val ? 1 : 0;
+		assign_bit(SG_FFD_CMD_Q, sfp->ffd_bm, !!val);
 		return 0;
 	case SG_GET_COMMAND_Q:
 		SG_LOG(3, sfp, "%s:    SG_GET_COMMAND_Q\n", __func__);
-		return put_user((int) sfp->cmd_q, ip);
+		return put_user(test_bit(SG_FFD_CMD_Q, sfp->ffd_bm), ip);
 	case SG_SET_KEEP_ORPHAN:
 		SG_LOG(3, sfp, "%s:    SG_SET_KEEP_ORPHAN\n", __func__);
 		result = get_user(val, ip);
 		if (result)
 			return result;
-		sfp->keep_orphan = val;
+		assign_bit(SG_FFD_KEEP_ORPHAN, sfp->ffd_bm, !!val);
 		return 0;
 	case SG_GET_KEEP_ORPHAN:
 		SG_LOG(3, sfp, "%s:    SG_GET_KEEP_ORPHAN\n", __func__);
-		return put_user((int) sfp->keep_orphan, ip);
+		return put_user(test_bit(SG_FFD_KEEP_ORPHAN, sfp->ffd_bm),
+				ip);
 	case SG_GET_VERSION_NUM:
 		SG_LOG(3, sfp, "%s:    SG_GET_VERSION_NUM\n", __func__);
 		return put_user(sg_version_num, ip);
@@ -1557,6 +1720,8 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 		return put_user(val, ip);
 	case SG_EMULATED_HOST:
 		SG_LOG(3, sfp, "%s:    SG_EMULATED_HOST\n", __func__);
+		if (unlikely(SG_IS_DETACHING(sdp)))
+			return -ENODEV;
 		return put_user(sdev->host->hostt->emulated, ip);
 	case SCSI_IOCTL_SEND_COMMAND:
 		SG_LOG(3, sfp, "%s:    SCSI_IOCTL_SEND_COMMAND\n", __func__);
@@ -1567,7 +1732,7 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 		result = get_user(val, ip);
 		if (result)
 			return result;
-		assign_bit(SG_FDEV_LOG_SENSE, sdp->fdev_bm, val);
+		assign_bit(SG_FDEV_LOG_SENSE, sdp->fdev_bm, !!val);
 		if (val == 0)	/* user can force recalculation */
 			sg_calc_sgat_param(sdp);
 		return 0;
@@ -1612,7 +1777,7 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 		break;
 	}
 	result = sg_allow_if_err_recovery(sdp, filp->f_flags & O_NDELAY);
-	if (result)
+	if (unlikely(result))
 		return result;
 	return -ENOIOCTLCMD;
 }
@@ -1675,7 +1840,7 @@ sg_poll(struct file *filp, poll_table * wait)
 
 	if (unlikely(SG_IS_DETACHING(sfp->parentdp)))
 		p_res |= EPOLLHUP;
-	else if (likely(sfp->cmd_q))
+	else if (likely(test_bit(SG_FFD_CMD_Q, sfp->ffd_bm)))
 		p_res |= EPOLLOUT | EPOLLWRNORM;
 	else if (atomic_read(&sfp->submitted) == 0)
 		p_res |= EPOLLOUT | EPOLLWRNORM;
@@ -1697,9 +1862,10 @@ static vm_fault_t
 sg_vma_fault(struct vm_fault *vmf)
 {
 	int k, length;
-	unsigned long offset, len, sa;
+	unsigned long offset, len, sa, iflags;
 	struct vm_area_struct *vma = vmf->vma;
 	struct sg_scatter_hold *rsv_schp;
+	struct sg_request *srp;
 	struct sg_device *sdp;
 	struct sg_fd *sfp;
 	const char *nbp = "==NULL, bad";
@@ -1718,12 +1884,18 @@ sg_vma_fault(struct vm_fault *vmf)
 		SG_LOG(1, sfp, "%s: device detaching\n", __func__);
 		goto out_err;
 	}
-	rsv_schp = &sfp->reserve;
+	srp = sfp->rsv_srp;
+	if (!srp) {
+		SG_LOG(1, sfp, "%s: srp%s\n", __func__, nbp);
+		goto out_err;
+	}
+	xa_lock_irqsave(&sfp->srp_arr, iflags);
+	rsv_schp = &srp->sgat_h;
 	offset = vmf->pgoff << PAGE_SHIFT;
 	if (offset >= (unsigned int)rsv_schp->buflen) {
 		SG_LOG(1, sfp, "%s: offset[%lu] >= rsv.buflen\n", __func__,
 		       offset);
-		goto out_err;
+		goto out_err_unlock;
 	}
 	sa = vma->vm_start;
 	SG_LOG(3, sfp, "%s: vm_start=0x%lx, off=%lu\n", __func__, sa, offset);
@@ -1736,6 +1908,7 @@ sg_vma_fault(struct vm_fault *vmf)
 			struct page *pp;
 
 			pp = rsv_schp->pages[k];
+			xa_unlock_irqrestore(&sfp->srp_arr, iflags);
 			page = nth_page(pp, offset >> PAGE_SHIFT);
 			get_page(page); /* increment page count */
 			vmf->page = page;
@@ -1744,6 +1917,8 @@ sg_vma_fault(struct vm_fault *vmf)
 		sa += len;
 		offset -= len;
 	}
+out_err_unlock:
+	xa_unlock_irqrestore(&sfp->srp_arr, iflags);
 out_err:
 	return VM_FAULT_SIGBUS;
 }
@@ -1758,9 +1933,10 @@ sg_mmap(struct file *filp, struct vm_area_struct *vma)
 {
 	int k, length;
 	int ret = 0;
-	unsigned long req_sz, len, sa;
+	unsigned long req_sz, len, sa, iflags;
 	struct sg_scatter_hold *rsv_schp;
 	struct sg_fd *sfp;
+	struct sg_request *srp;
 
 	if (!filp || !vma)
 		return -ENXIO;
@@ -1776,11 +1952,13 @@ sg_mmap(struct file *filp, struct vm_area_struct *vma)
 		return -EINVAL; /* only an offset of 0 accepted */
 	/* Check reserve request is inactive and has large enough buffer */
 	mutex_lock(&sfp->f_mutex);
-	if (sfp->res_in_use) {
+	srp = sfp->rsv_srp;
+	xa_lock_irqsave(&sfp->srp_arr, iflags);
+	if (SG_RS_ACTIVE(srp)) {
 		ret = -EBUSY;
 		goto out;
 	}
-	rsv_schp = &sfp->reserve;
+	rsv_schp = &srp->sgat_h;
 	if (req_sz > (unsigned long)rsv_schp->buflen) {
 		ret = -ENOMEM;
 		goto out;
@@ -1793,11 +1971,12 @@ sg_mmap(struct file *filp, struct vm_area_struct *vma)
 		sa += len;
 	}
 
-	sfp->mmap_called = 1;
+	set_bit(SG_FFD_MMAP_CALLED, sfp->ffd_bm);
 	vma->vm_flags |= VM_IO | VM_DONTEXPAND | VM_DONTDUMP;
 	vma->vm_private_data = sfp;
 	vma->vm_ops = &sg_mmap_vm_ops;
 out:
+	xa_unlock_irqrestore(&sfp->srp_arr, iflags);
 	mutex_unlock(&sfp->f_mutex);
 	return ret;
 }
@@ -1819,8 +1998,10 @@ sg_rq_end_io_usercontext(struct work_struct *work)
 		return;
 	}
 	SG_LOG(3, sfp, "%s: srp=0x%pK\n", __func__, srp);
-	sg_finish_scsi_blk_rq(srp);
-	sg_deact_request(sfp, srp);
+	if (test_bit(SG_FRQ_DEACT_ORPHAN, srp->frq_bm)) {
+		sg_finish_scsi_blk_rq(srp);	/* clean up orphan case */
+		sg_deact_request(sfp, srp);
+	}
 	kref_put(&sfp->f_ref, sg_remove_sfp);
 }
 
@@ -1858,85 +2039,95 @@ sg_check_sense(struct sg_device *sdp, struct sg_request *srp, int sense_len)
 }
 
 /*
- * This function is a "bottom half" handler that is called by the mid
- * level when a command is completed (or has failed).
+ * This "bottom half" (soft interrupt) handler is called by the mid-level
+ * when a request has completed or failed. This callback is registered in a
+ * blk_execute_rq_nowait() call in sg_common_write(). For ioctl(SG_IO)
+ * (sync) usage, sg_ctl_sg_io() waits to be woken up by this callback.
  */
 static void
 sg_rq_end_io(struct request *rq, blk_status_t status)
 {
+	enum sg_rq_state rqq_state = SG_RS_AWAIT_RCV;
+	int a_resid, slen;
 	struct sg_request *srp = rq->end_io_data;
 	struct scsi_request *scsi_rp = scsi_req(rq);
 	struct sg_device *sdp;
 	struct sg_fd *sfp;
-	unsigned int ms;
-	int resid, slen;
-	int done = 1;
-	unsigned long iflags;
 
-	if (WARN_ON(srp->done != 0))
+	if (!scsi_rp) {
+		WARN_ONCE(1, "%s: scsi_req(rq) unexpectedly NULL\n", __func__);
 		return;
-
-	sfp = srp->parentfp;
-	if (WARN_ON(sfp == NULL))
+	}
+	if (!srp) {
+		WARN_ONCE(1, "%s: srp unexpectedly NULL\n", __func__);
 		return;
-
+	}
+	if (WARN_ON(atomic_read(&srp->rq_st) != SG_RS_INFLIGHT)) {
+		pr_warn("%s: bad rq_st=%d\n", __func__,
+			atomic_read(&srp->rq_st));
+		goto early_err;
+	}
+	sfp = srp->parentfp;
+	if (unlikely(!sfp)) {
+		WARN_ONCE(1, "%s: sfp unexpectedly NULL", __func__);
+		goto early_err;
+	}
 	sdp = sfp->parentdp;
 	if (unlikely(SG_IS_DETACHING(sdp)))
 		pr_info("%s: device detaching\n", __func__);
 
 	srp->rq_result = scsi_rp->result;
-	resid = scsi_rp->resid_len;
-
-	srp->header.resid = resid;
-
 	slen = min_t(int, scsi_rp->sense_len, SCSI_SENSE_BUFFERSIZE);
+	a_resid = scsi_rp->resid_len;
+
+	if (a_resid)
+		srp->in_resid = a_resid;
 
-	SG_LOG(6, sfp, "%s: pack_id=%d, res=0x%x\n", __func__,
-	       srp->header.pack_id, srp->rq_result);
-	ms = jiffies_to_msecs(jiffies);
-	srp->header.duration = (ms > srp->header.duration) ?
-				(ms - srp->header.duration) : 0;
-	if (srp->rq_result != 0 && slen > 0)
+	SG_LOG(6, sfp, "%s: pack_id=%d, res=0x%x\n", __func__, srp->pack_id,
+	       srp->rq_result);
+	srp->duration = sg_calc_rq_dur(srp);
+	if (unlikely((srp->rq_result & SG_ML_RESULT_MSK) && slen > 0))
 		sg_check_sense(sdp, srp, slen);
 	if (slen > 0)
 		memcpy(srp->sense_b, scsi_rp->sense, slen);
-
-	/* Rely on write phase to clean out srp status values, so no "else" */
-
-	if (!srp->sg_io_owned)
+	srp->sense_len = slen;
+	if (unlikely(test_bit(SG_FRQ_IS_ORPHAN, srp->frq_bm))) {
+		if (test_bit(SG_FFD_KEEP_ORPHAN, sfp->ffd_bm)) {
+			clear_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm);
+		} else {
+			rqq_state = SG_RS_BUSY;
+			set_bit(SG_FRQ_DEACT_ORPHAN, srp->frq_bm);
+		}
+	}
+	if (!test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm))
 		atomic_inc(&sfp->waiting);
+	if (unlikely(sg_rq_state_chg(srp, SG_RS_INFLIGHT, rqq_state,
+				     false, __func__)))
+		pr_warn("%s: can't set rq_st\n", __func__);
 	/*
-	 * Free the request as soon as it is complete so that its resources
-	 * can be reused without waiting for userspace to read() the
-	 * result.  But keep the associated bio (if any) around until
-	 * blk_rq_unmap_user() can be called from user context.
+	 * Free the mid-level resources apart from the bio (if any). The bio's
+	 * blk_rq_unmap_user() can be called later from user context.
 	 */
 	srp->rq = NULL;
-	scsi_req_free_cmd(scsi_req(rq));
+	scsi_req_free_cmd(scsi_rp);
 	blk_put_request(rq);
 
-	if (unlikely(srp->orphan)) {
-		if (sfp->keep_orphan)
-			srp->sg_io_owned = 0;
-		else
-			done = 0;
-	}
-	srp->done = done;
-
-	if (likely(done)) {
-		/* Now wake up any sg_read() that is waiting for this
-		 * packet.
-		 */
-		xa_lock_irqsave(&sfp->srp_arr, iflags);
-		__xa_set_mark(&sfp->srp_arr, srp->rq_idx, SG_XA_RQ_AWAIT);
-		xa_unlock_irqrestore(&sfp->srp_arr, iflags);
+	if (likely(rqq_state == SG_RS_AWAIT_RCV)) {
+		/* Wake any sg_read()/ioctl(SG_IORECEIVE) awaiting this req */
 		wake_up_interruptible(&sfp->read_wait);
 		kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN);
 		kref_put(&sfp->f_ref, sg_remove_sfp);
-	} else {
+	} else {	/* clean up orphaned requests that aren't being kept */
 		INIT_WORK(&srp->ew_orph.work, sg_rq_end_io_usercontext);
 		schedule_work(&srp->ew_orph.work);
 	}
+	return;
+
+early_err:
+	srp->rq = NULL;
+	if (scsi_rp)
+		scsi_req_free_cmd(scsi_rp);
+	blk_put_request(rq);
 }
 
 static const struct file_operations sg_fops = {
@@ -1964,9 +2155,9 @@ static struct sg_device *
 sg_add_device_helper(struct gendisk *disk, struct scsi_device *scsidp)
 {
 	struct sg_device *sdp;
-	unsigned long iflags;
 	int error;
 	u32 k;
+	unsigned long iflags;
 
 	sdp = kzalloc(sizeof(*sdp), GFP_KERNEL);
 	if (!sdp)
@@ -1984,7 +2175,7 @@ sg_add_device_helper(struct gendisk *disk, struct scsi_device *scsidp)
 			error = -ENODEV;
 		} else {
 			sdev_printk(KERN_WARNING, scsidp,
-				    "%s: idr alloc sg_device failure: %d\n",
+				"%s: idr allocation sg_device failure: %d\n",
 				    __func__, error);
 		}
 		goto out_unlock;
@@ -2186,6 +2377,7 @@ init_sg(void)
 				    SG_MAX_DEVS, "sg");
 	if (rc)
 		return rc;
+
 	pr_info("Registered %s[char major=0x%x], version: %s, date: %s\n",
 		"sg device ", SCSI_GENERIC_MAJOR, SG_VERSION_STR,
 		sg_version_date);
@@ -2201,6 +2393,7 @@ init_sg(void)
 		return 0;
 	}
 	class_destroy(sg_sysfs_class);
+
 err_out_unreg:
 	unregister_chrdev_region(MKDEV(SCSI_GENERIC_MAJOR, 0), SG_MAX_DEVS);
 	return rc;
@@ -2227,6 +2420,19 @@ exit_sg(void)
 	idr_destroy(&sg_index_idr);
 }
 
+static inline bool
+sg_chk_dio_allowed(struct sg_device *sdp, struct sg_request *srp,
+		   int iov_count, int dir)
+{
+	if (sg_allow_dio && (srp->rq_flags & SG_FLAG_DIRECT_IO)) {
+		if (dir != SG_DXFER_UNKNOWN && !iov_count) {
+			if (!sdp->device->host->unchecked_isa_dma)
+				return true;
+		}
+	}
+	return false;
+}
+
 static void
 sg_set_map_data(const struct sg_scatter_hold *schp, bool up_valid,
 		struct rq_map_data *mdp)
@@ -2240,31 +2446,40 @@ sg_set_map_data(const struct sg_scatter_hold *schp, bool up_valid,
 }
 
 static int
-sg_start_req(struct sg_request *srp, u8 *cmd)
+sg_start_req(struct sg_request *srp, u8 *cmd, int cmd_len, int dxfer_dir)
 {
-	int res;
+	bool reserved, us_xfer;
+	int res = 0;
+	int dxfer_len = 0;
+	int r0w = READ;
+	unsigned int iov_count = 0;
+	void __user *up;
 	struct request *rq;
-	struct scsi_request *req;
+	struct scsi_request *scsi_rp;
 	struct sg_fd *sfp = srp->parentfp;
-	struct sg_io_hdr *hp = &srp->header;
-	int dxfer_len = (int) hp->dxfer_len;
-	int dxfer_dir = hp->dxfer_direction;
-	unsigned int iov_count = hp->iovec_count;
-	struct sg_scatter_hold *req_schp = &srp->data;
-	struct sg_scatter_hold *rsv_schp = &sfp->reserve;
-	struct request_queue *q = sfp->parentdp->device->request_queue;
-	struct rq_map_data *md, map_data;
-	int r0w = hp->dxfer_direction == SG_DXFER_TO_DEV ? WRITE : READ;
+	struct sg_device *sdp;
+	struct sg_scatter_hold *req_schp;
+	struct request_queue *q;
+	struct rq_map_data *md = (void *)srp; /* want any non-NULL value */
 	u8 *long_cmdp = NULL;
+	__maybe_unused const char *cp = "";
+	struct sg_slice_hdr3 *sh3p = &srp->s_hdr3;
+	struct rq_map_data map_data;
 
-	if (hp->cmd_len > BLK_MAX_CDB) {
-		long_cmdp = kzalloc(hp->cmd_len, GFP_KERNEL);
+	sdp = sfp->parentdp;
+	if (cmd_len > BLK_MAX_CDB) {	/* for longer SCSI cdb_s */
+		long_cmdp = kzalloc(cmd_len, GFP_KERNEL);
 		if (!long_cmdp)
 			return -ENOMEM;
 		SG_LOG(5, sfp, "%s: long_cmdp=0x%pK ++\n", __func__, long_cmdp);
 	}
+	up = sh3p->dxferp;
+	dxfer_len = (int)sh3p->dxfer_len;
+	iov_count = sh3p->iovec_count;
+	r0w = dxfer_dir == SG_DXFER_TO_DEV ? WRITE : READ;
 	SG_LOG(4, sfp, "%s: dxfer_len=%d, data-%s\n", __func__, dxfer_len,
 	       (r0w ? "OUT" : "IN"));
+	q = sdp->device->request_queue;
 
 	/*
 	 * NOTE
@@ -2277,125 +2492,144 @@ sg_start_req(struct sg_request *srp, u8 *cmd)
 	 * do not want to use BLK_MQ_REQ_NOWAIT here because userspace might
 	 * not expect an EWOULDBLOCK from this condition.
 	 */
-	rq = blk_get_request(q, hp->dxfer_direction == SG_DXFER_TO_DEV ?
-			REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN, 0);
+	rq = blk_get_request(q, (r0w ? REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN), 0);
 	if (IS_ERR(rq)) {
 		kfree(long_cmdp);
 		return PTR_ERR(rq);
 	}
-	req = scsi_req(rq);
-
-	if (hp->cmd_len > BLK_MAX_CDB)
-		req->cmd = long_cmdp;
-	memcpy(req->cmd, cmd, hp->cmd_len);
-	req->cmd_len = hp->cmd_len;
-
+	/* current sg_request protected by SG_RS_BUSY state */
+	scsi_rp = scsi_req(rq);
 	srp->rq = rq;
-	rq->end_io_data = srp;
-	req->retries = SG_DEFAULT_RETRIES;
 
-	if ((dxfer_len <= 0) || (dxfer_dir == SG_DXFER_NONE))
-		return 0;
-
-	if (sg_allow_dio && hp->flags & SG_FLAG_DIRECT_IO &&
-	    dxfer_dir != SG_DXFER_UNKNOWN && !iov_count &&
-	    !sfp->parentdp->device->host->unchecked_isa_dma &&
-	    blk_rq_aligned(q, (unsigned long)hp->dxferp, dxfer_len))
+	if (cmd_len > BLK_MAX_CDB)
+		scsi_rp->cmd = long_cmdp;
+	memcpy(scsi_rp->cmd, cmd, cmd_len);
+	scsi_rp->cmd_len = cmd_len;
+	us_xfer = !(srp->rq_flags & SG_FLAG_NO_DXFER);
+	assign_bit(SG_FRQ_NO_US_XFER, srp->frq_bm, !us_xfer);
+	reserved = (sfp->rsv_srp == srp);
+	rq->end_io_data = srp;
+	scsi_rp->retries = SG_DEFAULT_RETRIES;
+	req_schp = &srp->sgat_h;
+
+	if (dxfer_len <= 0 || dxfer_dir == SG_DXFER_NONE) {
+		SG_LOG(4, sfp, "%s: no data xfer [0x%pK]\n", __func__, srp);
+		set_bit(SG_FRQ_NO_US_XFER, srp->frq_bm);
+		goto fini;	/* path of reqs with no din nor dout */
+	} else if (sg_chk_dio_allowed(sdp, srp, iov_count, dxfer_dir) &&
+		   blk_rq_aligned(q, (unsigned long)up, dxfer_len)) {
+		srp->rq_info |= SG_INFO_DIRECT_IO;
 		md = NULL;
-	else
+		if (IS_ENABLED(CONFIG_SCSI_PROC_FS))
+			cp = "direct_io, ";
+	} else {	/* normal IO and failed conditions for dio path */
 		md = &map_data;
+	}
 
-	if (md) {
-		mutex_lock(&sfp->f_mutex);
-		if (dxfer_len <= rsv_schp->buflen &&
-		    !sfp->res_in_use) {
-			sfp->res_in_use = 1;
-			sg_link_reserve(sfp, srp, dxfer_len);
-		} else if (hp->flags & SG_FLAG_MMAP_IO) {
-			res = -EBUSY; /* sfp->res_in_use == 1 */
-			if (dxfer_len > rsv_schp->buflen)
-				res = -ENOMEM;
-			mutex_unlock(&sfp->f_mutex);
-			return res;
-		} else {
-			res = sg_mk_sgat(req_schp, sfp, dxfer_len);
-			if (res) {
-				mutex_unlock(&sfp->f_mutex);
-				return res;
-			}
+	if (likely(md)) {	/* normal, "indirect" IO */
+		if (unlikely((srp->rq_flags & SG_FLAG_MMAP_IO))) {
+			/* mmap IO must use and fit in reserve request */
+			if (!reserved || dxfer_len > req_schp->buflen)
+				res = reserved ? -ENOMEM : -EBUSY;
+		} else if (req_schp->buflen == 0) {
+			int up_sz = max_t(int, dxfer_len, sfp->sgat_elem_sz);
+
+			res = sg_mk_sgat(srp, sfp, up_sz);
 		}
-		mutex_unlock(&sfp->f_mutex);
+		if (res)
+			goto fini;
 
-		sg_set_map_data(req_schp, !!hp->dxferp, md);
+		sg_set_map_data(req_schp, !!up, md);
 		md->from_user = (dxfer_dir == SG_DXFER_TO_FROM_DEV);
 	}
 
-	if (iov_count) {
+	if (unlikely(iov_count)) {
 		struct iovec *iov = NULL;
 		struct iov_iter i;
 
-		res = import_iovec(r0w, hp->dxferp, iov_count, 0, &iov, &i);
+		res = import_iovec(r0w, up, iov_count, 0, &iov, &i);
 		if (res < 0)
-			return res;
+			goto fini;
 
-		iov_iter_truncate(&i, hp->dxfer_len);
+		iov_iter_truncate(&i, dxfer_len);
 		if (!iov_iter_count(&i)) {
 			kfree(iov);
-			return -EINVAL;
+			res = -EINVAL;
+			goto fini;
 		}
 
-		res = blk_rq_map_user_iov(q, rq, md, &i, GFP_ATOMIC);
+		if (us_xfer)
+			res = blk_rq_map_user_iov(q, rq, md, &i, GFP_ATOMIC);
 		kfree(iov);
-	} else
-		res = blk_rq_map_user(q, rq, md, hp->dxferp,
-				      hp->dxfer_len, GFP_ATOMIC);
-
-	if (!res) {
+		if (IS_ENABLED(CONFIG_SCSI_PROC_FS))
+			cp = "iov_count > 0";
+	} else if (us_xfer) { /* setup for transfer data to/from user space */
+		res = blk_rq_map_user(q, rq, md, up, dxfer_len, GFP_ATOMIC);
+		if (IS_ENABLED(CONFIG_SCSI_PROC_FS) && res)
+			SG_LOG(1, sfp, "%s: blk_rq_map_user() res=%d\n",
+			       __func__, res);
+	}
+fini:
+	if (unlikely(res)) {		/* failure, free up resources */
+		scsi_req_free_cmd(scsi_rp);
+		srp->rq = NULL;
+		if (us_xfer && rq->bio)
+			blk_rq_unmap_user(rq->bio);
+		blk_put_request(rq);
+	} else {
 		srp->bio = rq->bio;
-
-		if (!md) {
-			req_schp->dio_in_use = 1;
-			hp->info |= SG_INFO_DIRECT_IO;
-		}
 	}
+	SG_LOG((res ? 1 : 4), sfp, "%s: %s res=%d [0x%pK]\n", __func__, cp,
+	       res, srp);
 	return res;
 }
 
+/*
+ * Clean up mid-level and block layer resources of finished request. Sometimes
+ * blk_rq_unmap_user() returns -4 (-EINTR) and this is why: "If we're in a
+ * workqueue, the request is orphaned, so don't copy into a random user
+ * address space, just free and return -EINTR so user space doesn't expect
+ * any data." [block/bio.c]
+ */
 static void
 sg_finish_scsi_blk_rq(struct sg_request *srp)
 {
 	int ret;
-
 	struct sg_fd *sfp = srp->parentfp;
-	struct sg_scatter_hold *req_schp = &srp->data;
+	struct request *rq = srp->rq;
 
 	SG_LOG(4, sfp, "%s: srp=0x%pK%s\n", __func__, srp,
-	       (srp->res_used) ? " rsv" : "");
-	if (!srp->sg_io_owned) {
+	       (srp->parentfp->rsv_srp == srp) ? " rsv" : "");
+	if (!test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm)) {
 		atomic_dec(&sfp->submitted);
 		atomic_dec(&sfp->waiting);
 	}
 	if (srp->bio) {
-		ret = blk_rq_unmap_user(srp->bio);
-		if (ret)	/* -EINTR (-4) can be ignored */
-			SG_LOG(6, sfp, "%s: blk_rq_unmap_user() --> %d\n",
-			       __func__, ret);
+		bool us_xfer = !test_bit(SG_FRQ_NO_US_XFER, srp->frq_bm);
+
+		if (us_xfer) {
+			ret = blk_rq_unmap_user(srp->bio);
+			if (ret) {	/* -EINTR (-4) can be ignored */
+				SG_LOG(6, sfp,
+				       "%s: blk_rq_unmap_user() --> %d\n",
+				       __func__, ret);
+			}
+		}
 		srp->bio = NULL;
 	}
+	/* In worst case READ data returned to user space by this point */
 
-	if (srp->rq) {
-		scsi_req_free_cmd(scsi_req(srp->rq));
-		blk_put_request(srp->rq);
+	/* Expect blk_put_request(rq) already called in sg_rq_end_io() */
+	if (rq) {       /* blk_get_request() may have failed */
+		if (scsi_req(rq))
+			scsi_req_free_cmd(scsi_req(rq));
+		srp->rq = NULL;
+		blk_put_request(rq);
 	}
-
-	if (srp->res_used)
-		sg_unlink_reserve(sfp, srp);
-	else
-		sg_remove_sgat(sfp, req_schp);
 }
 
 static int
-sg_mk_sgat(struct sg_scatter_hold *schp, struct sg_fd *sfp, int minlen)
+sg_mk_sgat(struct sg_request *srp, struct sg_fd *sfp, int minlen)
 {
 	int j, k, rem_sz, align_sz, order, o_order;
 	int mx_sgat_elems = sfp->parentdp->max_sgat_elems;
@@ -2404,6 +2638,7 @@ sg_mk_sgat(struct sg_scatter_hold *schp, struct sg_fd *sfp, int minlen)
 	gfp_t mask_ap = GFP_ATOMIC | __GFP_COMP | __GFP_NOWARN | __GFP_ZERO;
 	gfp_t mask_kz = GFP_ATOMIC | __GFP_NOWARN;
 	struct sg_device *sdp = sfp->parentdp;
+	struct sg_scatter_hold *schp = &srp->sgat_h;
 
 	if (unlikely(minlen <= 0)) {
 		if (minlen < 0)
@@ -2414,9 +2649,8 @@ sg_mk_sgat(struct sg_scatter_hold *schp, struct sg_fd *sfp, int minlen)
 	align_sz = ALIGN(minlen, SG_DEF_SECTOR_SZ);
 
 	schp->pages = kcalloc(mx_sgat_elems, ptr_sz, mask_kz);
-	SG_LOG(4, sfp, "%s: minlen=%d, align_sz=%d [sz=%zu, 0x%pK ++]\n",
-	       __func__, minlen, align_sz, mx_sgat_elems * ptr_sz,
-	       schp->pages);
+	SG_LOG(4, sfp, "%s: minlen=%d [sz=%zu, 0x%pK ++]\n", __func__, minlen,
+	       mx_sgat_elems * ptr_sz, schp->pages);
 	if (unlikely(!schp->pages))
 		return -ENOMEM;
 
@@ -2480,13 +2714,15 @@ sg_remove_sgat_helper(struct sg_fd *sfp, struct sg_scatter_hold *schp)
 
 /* Remove the data (possibly a sgat list) held by srp, not srp itself */
 static void
-sg_remove_sgat(struct sg_fd *sfp, struct sg_scatter_hold *schp)
+sg_remove_sgat(struct sg_request *srp)
 {
+	struct sg_scatter_hold *schp = &srp->sgat_h; /* care: remove own data */
+	struct sg_fd *sfp = srp->parentfp;
+
 	SG_LOG(4, sfp, "%s: num_sgat=%d%s\n", __func__, schp->num_sgat,
-	       ((sfp ? (&sfp->reserve == schp) : false) ?
+	       ((srp->parentfp ? (sfp->rsv_srp == srp) : false) ?
 		" [rsv]" : ""));
-	if (!schp->dio_in_use)
-		sg_remove_sgat_helper(sfp, schp);
+	sg_remove_sgat_helper(sfp, schp);
 
 	memset(schp, 0, sizeof(*schp));         /* zeros buflen and dlen */
 }
@@ -2501,7 +2737,7 @@ sg_read_append(struct sg_request *srp, void __user *outp, int num_xfer)
 {
 	int k, num, res;
 	struct page *pgp;
-	struct sg_scatter_hold *schp = &srp->data;
+	struct sg_scatter_hold *schp = &srp->sgat_h;
 
 	SG_LOG(4, srp->parentfp, "%s: num_xfer=%d\n", __func__, num_xfer);
 	if (unlikely(!outp || num_xfer <= 0))
@@ -2515,11 +2751,11 @@ sg_read_append(struct sg_request *srp, void __user *outp, int num_xfer)
 			break;
 		}
 		if (num > num_xfer) {
-			if (__copy_to_user(outp, page_address(pgp), num_xfer))
+			if (copy_to_user(outp, page_address(pgp), num_xfer))
 				res = -EFAULT;
 			break;
 		} else {
-			if (__copy_to_user(outp, page_address(pgp), num)) {
+			if (copy_to_user(outp, page_address(pgp), num)) {
 				res = -EFAULT;
 				break;
 			}
@@ -2542,135 +2778,273 @@ sg_read_append(struct sg_request *srp, void __user *outp, int num_xfer)
 static struct sg_request *
 sg_find_srp_by_id(struct sg_fd *sfp, int pack_id)
 {
+	__maybe_unused bool is_bad_st = false;
+	__maybe_unused enum sg_rq_state bad_sr_st = SG_RS_INACTIVE;
+	bool search_for_1 = (pack_id != SG_PACK_ID_WILDCARD);
+	int res;
+	int num_waiting = atomic_read(&sfp->waiting);
 	unsigned long idx;
-	struct sg_request *resp;
+	struct sg_request *srp = NULL;
+	struct xarray *xafp = &sfp->srp_arr;
 
-	xa_for_each_marked(&sfp->srp_arr, idx, resp, SG_XA_RQ_AWAIT) {
-		if (!resp)
-			continue;
-		/* look for requests that are ready + not SG_IO owned */
-		if (resp->done == 1 && !resp->sg_io_owned &&
-		    (-1 == pack_id || resp->header.pack_id == pack_id)) {
-			resp->done = 2;	/* guard against other readers */
-			return resp;
+	if (num_waiting < 1)
+		return NULL;
+	if (unlikely(search_for_1)) {
+		xa_for_each_marked(xafp, idx, srp, SG_XA_RQ_AWAIT) {
+			if (test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm))
+				continue;
+			if (srp->pack_id != pack_id)
+				continue;
+			res = sg_rq_state_chg(srp, SG_RS_AWAIT_RCV, SG_RS_BUSY,
+					      false, __func__);
+			if (likely(res == 0))
+				goto good;
+			/* else another caller got it, move on */
+			if (IS_ENABLED(CONFIG_SCSI_PROC_FS)) {
+				is_bad_st = true;
+				bad_sr_st = atomic_read(&srp->rq_st);
+			}
+			break;
+		}
+	} else {        /* search for any request is more likely */
+		xa_for_each_marked(xafp, idx, srp, SG_XA_RQ_AWAIT) {
+			if (test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm))
+				continue;
+			res = sg_rq_state_chg(srp, SG_RS_AWAIT_RCV, SG_RS_BUSY,
+					      false, __func__);
+			if (likely(res == 0))
+				goto good;
+		}
+	}
+	/* here if one of above loops does _not_ find a match */
+	if (IS_ENABLED(CONFIG_SCSI_PROC_FS)) {
+		if (search_for_1) {
+			const char *cptp = "pack_id=";
+
+			if (is_bad_st)
+				SG_LOG(1, sfp, "%s: %s%d wrong state: %s\n",
+				       __func__, cptp, pack_id,
+				       sg_rq_st_str(bad_sr_st, true));
+			else
+				SG_LOG(6, sfp, "%s: %s%d not awaiting read\n",
+				       __func__, cptp, pack_id);
 		}
 	}
 	return NULL;
+good:
+	SG_LOG(6, sfp, "%s: %s%d found [srp=0x%pK]\n", __func__, "pack_id=",
+	       pack_id, srp);
+	return srp;
 }
 
-static void
-sg_link_reserve(struct sg_fd *sfp, struct sg_request *srp, int size)
+/*
+ * Makes a new sg_request object. If 'first' is set then use GFP_KERNEL which
+ * may take time but has improved chance of success, otherwise use GFP_ATOMIC.
+ * Note that basic initialization is done but srp is not added to either sfp
+ * list. On error returns a negated errno value via ERR_PTR(), not NULL.
+ */
+static struct sg_request *
+sg_mk_srp(struct sg_fd *sfp, bool first)
 {
-	struct sg_scatter_hold *req_schp = &srp->data;
-	struct sg_scatter_hold *rsv_schp = &sfp->reserve;
-	int k, num, rem;
-
-	srp->res_used = 1;
-	SG_LOG(4, sfp, "%s: size=%d\n", __func__, size);
-	rem = size;
-
-	num = 1 << (PAGE_SHIFT + rsv_schp->page_order);
-	for (k = 0; k < rsv_schp->num_sgat; k++) {
-		if (rem <= num) {
-			req_schp->num_sgat = k + 1;
-			req_schp->sglist_len = rsv_schp->sglist_len;
-			req_schp->pages = rsv_schp->pages;
+	struct sg_request *srp;
+	gfp_t gfp =  __GFP_NOWARN;
 
-			req_schp->buflen = size;
-			req_schp->page_order = rsv_schp->page_order;
-			break;
-		} else
-			rem -= num;
+	if (first)      /* prepared to wait if none already outstanding */
+		srp = kzalloc(sizeof(*srp), gfp | GFP_KERNEL);
+	else
+		srp = kzalloc(sizeof(*srp), gfp | GFP_ATOMIC);
+	if (srp) {
+		atomic_set(&srp->rq_st, SG_RS_BUSY);
+		srp->parentfp = sfp;
+		return srp;
+	} else {
+		return ERR_PTR(-ENOMEM);
 	}
-
-	if (k >= rsv_schp->num_sgat)
-		SG_LOG(1, sfp, "%s: BAD size\n", __func__);
 }
 
-static void
-sg_unlink_reserve(struct sg_fd *sfp, struct sg_request *srp)
+static struct sg_request *
+sg_mk_srp_sgat(struct sg_fd *sfp, bool first, int db_len)
 {
-	struct sg_scatter_hold *req_schp = &srp->data;
+	int res;
+	struct sg_request *n_srp = sg_mk_srp(sfp, first);
 
-	SG_LOG(4, srp->parentfp, "%s: req->num_sgat=%d\n", __func__,
-	       (int)req_schp->num_sgat);
-	req_schp->num_sgat = 0;
-	req_schp->buflen = 0;
-	req_schp->pages = NULL;
-	req_schp->page_order = 0;
-	req_schp->sglist_len = 0;
-	srp->res_used = 0;
-	/* Called without mutex lock to avoid deadlock */
-	sfp->res_in_use = 0;
+	if (IS_ERR(n_srp))
+		return n_srp;
+	if (db_len > 0) {
+		res = sg_mk_sgat(n_srp, sfp, db_len);
+		if (res) {
+			kfree(n_srp);
+			return ERR_PTR(res);
+		}
+	}
+	return n_srp;
 }
 
-static void
-sg_build_reserve(struct sg_fd *sfp, int req_size)
+/*
+ * Irrespective of the given reserve request size, the minimum size requested
+ * will be PAGE_SIZE (often 4096 bytes). Returns a pointer to reserve object or
+ * a negated errno value twisted by ERR_PTR() macro. The actual number of bytes
+ * allocated (maybe less than buflen) is in srp->sgat_h.buflen . Note that this
+ * function is only called in contexts where locking is not required.
+ */
+static struct sg_request *
+sg_build_reserve(struct sg_fd *sfp, int buflen)
 {
-	struct sg_scatter_hold *schp = &sfp->reserve;
+	bool go_out = false;
+	int res;
+	struct sg_request *srp;
 
-	SG_LOG(3, sfp, "%s: buflen=%d\n", __func__, req_size);
+	SG_LOG(3, sfp, "%s: buflen=%d\n", __func__, buflen);
+	srp = sg_mk_srp(sfp, xa_empty(&sfp->srp_arr));
+	if (IS_ERR(srp))
+		return srp;
+	sfp->rsv_srp = srp;
 	do {
-		if (req_size < PAGE_SIZE)
-			req_size = PAGE_SIZE;
-		if (sg_mk_sgat(schp, sfp, req_size) == 0)
-			return;
-		sg_remove_sgat(sfp, schp);
-		req_size >>= 1;	/* divide by 2 */
-	} while (req_size > (PAGE_SIZE / 2));
+		if (buflen < (int)PAGE_SIZE) {
+			buflen = PAGE_SIZE;
+			go_out = true;
+		}
+		res = sg_mk_sgat(srp, sfp, buflen);
+		if (res == 0) {
+			SG_LOG(4, sfp, "%s: final buflen=%d, srp=0x%pK ++\n",
+			       __func__, buflen, srp);
+			return srp;
+		}
+		if (go_out)
+			return ERR_PTR(res);
+		/* failed so remove, halve buflen, try again */
+		sg_remove_sgat(srp);
+		buflen >>= 1;   /* divide by 2 */
+	} while (true);
 }
 
-/* always adds to end of list */
+/*
+ * Setup an active request (soon to carry a SCSI command) to the current file
+ * descriptor by creating a new one or re-using a request from the free
+ * list (fl). If successful returns a valid pointer in SG_RS_BUSY state. On
+ * failure returns a negated errno value twisted by ERR_PTR() macro.
+ */
 static struct sg_request *
-sg_setup_req(struct sg_fd *sfp)
+sg_setup_req(struct sg_fd *sfp, int dxfr_len, struct sg_comm_wr_t *cwrp)
 {
+	bool act_empty = false;
 	bool found = false;
+	bool mk_new_srp = false;
+	bool try_harder = false;
 	int res;
-	unsigned long idx, iflags;
-	struct sg_request *rp;
+	int num_inactive = 0;
+	unsigned long idx, last_idx, iflags;
+	struct sg_request *r_srp = NULL;	/* request to return */
 	struct xarray *xafp = &sfp->srp_arr;
-
-	if (!xa_empty(xafp)) {
-		xa_for_each_marked(xafp, idx, rp, SG_XA_RQ_INACTIVE) {
-			if (!rp)
+	__maybe_unused const char *cp;
+
+start_again:
+	cp = "";
+	if (xa_empty(xafp)) {
+		act_empty = true;
+		mk_new_srp = true;
+	} else if (!try_harder && dxfr_len < SG_DEF_SECTOR_SZ) {
+		last_idx = ~0UL;
+		xa_for_each_marked(xafp, idx, r_srp, SG_XA_RQ_INACTIVE) {
+			if (!r_srp)
 				continue;
-			if (sg_rstate_chg(rp, SG_RS_INACTIVE, SG_RS_BUSY))
+			++num_inactive;
+			if (dxfr_len < SG_DEF_SECTOR_SZ) {
+				last_idx = idx;
 				continue;
-			memset(rp, 0, sizeof(*rp));
-			rp->rq_idx = idx;
-			xa_lock_irqsave(xafp, iflags);
-			__xa_clear_mark(xafp, idx, SG_XA_RQ_INACTIVE);
-			xa_unlock_irqrestore(xafp, iflags);
+			}
+		}
+		/* If dxfr_len is small, use last inactive request */
+		if (last_idx != ~0UL) {
+			idx = last_idx;
+			r_srp = xa_load(xafp, idx);
+			if (!r_srp)
+				goto start_again;
+			if (sg_rq_state_chg(r_srp, SG_RS_INACTIVE, SG_RS_BUSY,
+					    false, __func__))
+				goto start_again; /* gone to another thread */
+			cp = "toward back of srp_arr";
 			found = true;
-			break;
+		}
+	} else {
+		xa_for_each_marked(xafp, idx, r_srp, SG_XA_RQ_INACTIVE) {
+			if (!r_srp)
+				continue;
+			if (r_srp->sgat_h.buflen >= dxfr_len) {
+				if (sg_rq_state_chg
+					(r_srp, SG_RS_INACTIVE, SG_RS_BUSY,
+					 false, __func__))
+					continue;
+				cp = "from front of srp_arr";
+				found = true;
+				break;
+			}
 		}
 	}
-	if (!found) {
-		rp = kzalloc(sizeof(*rp), GFP_KERNEL);
-		if (!rp)
-			return NULL;
+	if (found) {
+		r_srp->in_resid = 0;
+		r_srp->rq_info = 0;
+		r_srp->sense_len = 0;
+		mk_new_srp = false;
+	} else {
+		mk_new_srp = true;
 	}
-	rp->parentfp = sfp;
-	rp->header.duration = jiffies_to_msecs(jiffies);
-	if (!found) {
+	if (mk_new_srp) {
+		bool allow_cmd_q = test_bit(SG_FFD_CMD_Q, sfp->ffd_bm);
 		u32 n_idx;
 		struct xa_limit xal = { .max = 0, .min = 0 };
 
-		atomic_set(&rp->rq_st, SG_RS_BUSY);
+		cp = "new";
+		if (!allow_cmd_q && atomic_read(&sfp->submitted) > 0) {
+			r_srp = ERR_PTR(-EDOM);
+			SG_LOG(6, sfp, "%s: trying 2nd req but cmd_q=false\n",
+			       __func__);
+			goto fini;
+		}
+		r_srp = sg_mk_srp_sgat(sfp, act_empty, dxfr_len);
+		if (IS_ERR(r_srp)) {
+			if (!try_harder && dxfr_len < SG_DEF_SECTOR_SZ &&
+			    num_inactive > 0) {
+				try_harder = true;
+				goto start_again;
+			}
+			goto fini;
+		}
+		atomic_set(&r_srp->rq_st, SG_RS_BUSY);
 		xa_lock_irqsave(xafp, iflags);
 		xal.max = atomic_inc_return(&sfp->req_cnt);
-		res = __xa_alloc(xafp, &n_idx, rp, xal, GFP_KERNEL);
+		res = __xa_alloc(xafp, &n_idx, r_srp, xal, GFP_KERNEL);
 		xa_unlock_irqrestore(xafp, iflags);
 		if (res < 0) {
-			pr_warn("%s: don't expect xa_alloc() to fail, errno=%d\n",
-				__func__,  -res);
-			return NULL;
+			SG_LOG(1, sfp, "%s: xa_alloc() failed, errno=%d\n",
+			       __func__,  -res);
+			sg_remove_sgat(r_srp);
+			kfree(r_srp);
+			r_srp = ERR_PTR(-EPROTOTYPE);
+			goto fini;
 		}
-		rp->rq_idx = n_idx;
-	}
-	return rp;
+		idx = n_idx;
+		r_srp->rq_idx = idx;
+		r_srp->parentfp = sfp;
+		SG_LOG(4, sfp, "%s: mk_new_srp=0x%pK ++\n", __func__, r_srp);
+	}
+	r_srp->frq_bm[0] = cwrp->frq_bm[0];	/* assumes <= 32 req flags */
+	r_srp->sgat_h.dlen = dxfr_len;/* must be <= r_srp->sgat_h.buflen */
+	r_srp->cmd_opcode = 0xff;  /* set invalid opcode (VS), 0x0 is TUR */
+fini:
+	if (IS_ERR(r_srp))
+		SG_LOG(1, sfp, "%s: err=%ld\n", __func__, PTR_ERR(r_srp));
+	if (!IS_ERR(r_srp))
+		SG_LOG(4, sfp, "%s: %s r_srp=0x%pK\n", __func__, cp, r_srp);
+	return r_srp;
 }
 
+/*
+ * Moves a completed sg_request object to the free list and sets it to
+ * SG_RS_INACTIVE which makes it available for re-use. Here the free list
+ * is the set of srp_arr entries marked SG_XA_RQ_INACTIVE; the entry's
+ * SG_XA_RQ_AWAIT mark is cleared at the same time.
+ */
 static void
 sg_deact_request(struct sg_fd *sfp, struct sg_request *srp)
 {
@@ -2678,34 +3052,43 @@ sg_deact_request(struct sg_fd *sfp, struct sg_request *srp)
 
 	if (WARN_ON(!sfp || !srp))
 		return;
+	atomic_set(&srp->rq_st, SG_RS_INACTIVE);
 	xa_lock_irqsave(&sfp->srp_arr, iflags);
 	__xa_set_mark(&sfp->srp_arr, srp->rq_idx, SG_XA_RQ_INACTIVE);
+	__xa_clear_mark(&sfp->srp_arr, srp->rq_idx, SG_XA_RQ_AWAIT);
 	xa_unlock_irqrestore(&sfp->srp_arr, iflags);
-	atomic_set(&srp->rq_st, SG_RS_INACTIVE);
 }
 
+/* Returns pointer to sg_fd object or negated errno twisted by ERR_PTR */
 static struct sg_fd *
 sg_add_sfp(struct sg_device *sdp)
 {
+	bool reduced = false;
 	int rbuf_len, res;
 	u32 idx;
+	long err;
 	unsigned long iflags;
 	struct sg_fd *sfp;
+	struct sg_request *srp = NULL;
+	struct xarray *xadp = &sdp->sfp_arr;
+	struct xarray *xafp;
 	struct xa_limit xal;
 
 	sfp = kzalloc(sizeof(*sfp), GFP_ATOMIC | __GFP_NOWARN);
 	if (!sfp)
 		return ERR_PTR(-ENOMEM);
-
 	init_waitqueue_head(&sfp->read_wait);
 	xa_init_flags(&sfp->srp_arr, XA_FLAGS_ALLOC | XA_FLAGS_LOCK_IRQ);
+	xafp = &sfp->srp_arr;
 	kref_init(&sfp->f_ref);
 	mutex_init(&sfp->f_mutex);
 	sfp->timeout = SG_DEFAULT_TIMEOUT;
 	sfp->timeout_user = SG_DEFAULT_TIMEOUT_USER;
-	sfp->force_packid = SG_DEF_FORCE_PACK_ID;
-	sfp->cmd_q = SG_DEF_COMMAND_Q;
-	sfp->keep_orphan = SG_DEF_KEEP_ORPHAN;
+	/* other bits in sfp->ffd_bm[1] cleared by kzalloc() above */
+	__assign_bit(SG_FFD_FORCE_PACKID, sfp->ffd_bm, SG_DEF_FORCE_PACK_ID);
+	__assign_bit(SG_FFD_CMD_Q, sfp->ffd_bm, SG_DEF_COMMAND_Q);
+	__assign_bit(SG_FFD_KEEP_ORPHAN, sfp->ffd_bm, SG_DEF_KEEP_ORPHAN);
+	__assign_bit(SG_FFD_Q_AT_TAIL, sfp->ffd_bm, SG_DEFAULT_Q_AT);
 	/*
 	 * SG_SCATTER_SZ initializes scatter_elem_sz but different value may
 	 * be given as driver/module parameter (e.g. 'scatter_elem_sz=8192').
@@ -2719,28 +3102,64 @@ sg_add_sfp(struct sg_device *sdp)
 	atomic_set(&sfp->waiting, 0);
 	atomic_set(&sfp->req_cnt, 0);
 
-	if (SG_IS_DETACHING(sdp)) {
+	if (unlikely(SG_IS_DETACHING(sdp))) {
+		SG_LOG(1, sfp, "%s: detaching\n", __func__);
 		kfree(sfp);
 		return ERR_PTR(-ENODEV);
 	}
-	SG_LOG(3, sfp, "%s: sfp=0x%pK\n", __func__, sfp);
 	if (unlikely(sg_big_buff != def_reserved_size))
 		sg_big_buff = def_reserved_size;
 
 	rbuf_len = min_t(int, sg_big_buff, sdp->max_sgat_sz);
-	if (rbuf_len > 0)
-		sg_build_reserve(sfp, rbuf_len);
-
-	xa_lock_irqsave(&sdp->sfp_arr, iflags);
+	if (rbuf_len > 0) {
+		struct xa_limit xalrq = { .max = 0, .min = 0 };
+
+		srp = sg_build_reserve(sfp, rbuf_len);
+		if (IS_ERR(srp)) {
+			err = PTR_ERR(srp);
+			SG_LOG(1, sfp, "%s: build reserve err=%ld\n", __func__,
+			       -err);
+			kfree(sfp);
+			return ERR_PTR(err);
+		}
+		if (srp->sgat_h.buflen < rbuf_len) {
+			reduced = true;
+			SG_LOG(2, sfp,
+			       "%s: reserve reduced from %d to buflen=%d\n",
+			       __func__, rbuf_len, srp->sgat_h.buflen);
+		}
+		xa_lock_irqsave(xafp, iflags);
+		xalrq.max = atomic_inc_return(&sfp->req_cnt);
+		res = __xa_alloc(xafp, &idx, srp, xalrq, GFP_ATOMIC);
+		xa_unlock_irqrestore(xafp, iflags);
+		if (res < 0) {
+			SG_LOG(1, sfp, "%s: xa_alloc(srp) bad, errno=%d\n",
+			       __func__,  -res);
+			sg_remove_sgat(srp);
+			kfree(srp);
+			kfree(sfp);
+			return ERR_PTR(-EPROTOTYPE);
+		}
+		srp->rq_idx = idx;
+		srp->parentfp = sfp;
+		sg_rq_state_chg(srp, 0, SG_RS_INACTIVE, true, __func__);
+	}
+	if (!reduced) {
+		SG_LOG(4, sfp, "%s: built reserve buflen=%d\n", __func__,
+		       rbuf_len);
+	}
+	xa_lock_irqsave(xadp, iflags);
 	xal.min = 0;
 	xal.max = atomic_read(&sdp->open_cnt);
-	res = __xa_alloc(&sdp->sfp_arr, &idx, sfp, xal, GFP_KERNEL);
-	xa_unlock_irqrestore(&sdp->sfp_arr, iflags);
+	res = __xa_alloc(xadp, &idx, sfp, xal, GFP_KERNEL);
+	xa_unlock_irqrestore(xadp, iflags);
 	if (res < 0) {
 		pr_warn("%s: xa_alloc(sdp) bad, o_count=%d, errno=%d\n",
 			__func__, xal.max, -res);
-		if (rbuf_len > 0)
-			sg_remove_sgat(sfp, &sfp->reserve);
+		if (srp) {
+			sg_remove_sgat(srp);
+			kfree(srp);
+		}
 		kfree(sfp);
 		return ERR_PTR(res);
 	}
@@ -2771,12 +3190,14 @@ sg_remove_sfp_usercontext(struct work_struct *work)
 	struct sg_request *srp;
 	struct sg_request *e_srp;
 	struct xarray *xafp = &sfp->srp_arr;
+	struct xarray *xadp;
 
 	if (!sfp) {
 		pr_warn("sg: %s: sfp is NULL\n", __func__);
 		return;
 	}
 	sdp = sfp->parentdp;
+	xadp = &sdp->sfp_arr;
 
 	/* Cleanup any responses which were never read(). */
 	xa_for_each(xafp, idx, srp) {
@@ -2784,24 +3205,20 @@ sg_remove_sfp_usercontext(struct work_struct *work)
 			continue;
 		if (!xa_get_mark(xafp, srp->rq_idx, SG_XA_RQ_INACTIVE))
 			sg_finish_scsi_blk_rq(srp);
+		sg_remove_sgat(srp);
 		xa_lock_irqsave(xafp, iflags);
 		e_srp = __xa_erase(xafp, srp->rq_idx);
 		xa_unlock_irqrestore(xafp, iflags);
 		if (srp != e_srp)
 			SG_LOG(1, sfp, "%s: xa_erase() return unexpected\n",
 			       __func__);
+		SG_LOG(6, sfp, "%s: kfree: srp=%pK --\n", __func__, srp);
 		kfree(srp);
 	}
 	xa_destroy(xafp);
-	if (sfp->reserve.buflen > 0) {
-		SG_LOG(6, sfp, "%s:    buflen=%d, num_sgat=%d\n", __func__,
-		       (int)sfp->reserve.buflen, (int)sfp->reserve.num_sgat);
-		sg_remove_sgat(sfp, &sfp->reserve);
-	}
-
-	xa_lock_irqsave(&sdp->sfp_arr, iflags);
-	e_sfp = __xa_erase(&sdp->sfp_arr, sfp->idx);
-	xa_unlock_irqrestore(&sdp->sfp_arr, iflags);
+	xa_lock_irqsave(xadp, iflags);
+	e_sfp = __xa_erase(xadp, sfp->idx);
+	xa_unlock_irqrestore(xadp, iflags);
 	if (unlikely(sfp != e_sfp))
 		SG_LOG(1, sfp, "%s: xa_erase() return unexpected\n",
 		       __func__);
@@ -3051,6 +3468,7 @@ sg_proc_seq_show_devhdr(struct seq_file *s, void *v)
 struct sg_proc_deviter {
 	loff_t	index;
 	size_t	max;
+	int fd_index;
 };
 
 static void *
@@ -3134,11 +3552,10 @@ sg_proc_seq_show_devstrs(struct seq_file *s, void *v)
 static void
 sg_proc_debug_helper(struct seq_file *s, struct sg_device *sdp)
 {
-	int k, new_interface, blen, usg;
+	int k;
 	unsigned long idx, idx2;
 	struct sg_request *srp;
 	struct sg_fd *fp;
-	const struct sg_io_hdr *hp;
 	const char * cp;
 	unsigned int ms;
 
@@ -3149,51 +3566,53 @@ sg_proc_debug_helper(struct seq_file *s, struct sg_device *sdp)
 		k++;
 		seq_printf(s, "   FD(%d): timeout=%dms buflen=%d (res)sgat=%d low_dma=%d idx=%lu\n",
 			   k, jiffies_to_msecs(fp->timeout),
-			   fp->reserve.buflen, (int)fp->reserve.num_sgat,
+			   fp->rsv_srp->sgat_h.buflen,
+			   (int)fp->rsv_srp->sgat_h.num_sgat,
 			   (int)sdp->device->host->unchecked_isa_dma, idx);
 		seq_printf(s, "   cmd_q=%d f_packid=%d k_orphan=%d closed=0\n",
-			   (int) fp->cmd_q, (int) fp->force_packid,
-			   (int) fp->keep_orphan);
+			   (int)test_bit(SG_FFD_CMD_Q, fp->ffd_bm),
+			   (int)test_bit(SG_FFD_FORCE_PACKID, fp->ffd_bm),
+			   (int)test_bit(SG_FFD_KEEP_ORPHAN, fp->ffd_bm));
 		seq_printf(s, "   submitted=%d waiting=%d\n",
 			   atomic_read(&fp->submitted),
 			   atomic_read(&fp->waiting));
 		xa_for_each(&fp->srp_arr, idx2, srp) {
+			const struct sg_slice_hdr3 *sh3p = &srp->s_hdr3;
+			bool is_v3 = (sh3p->interface_id != '\0');
+			enum sg_rq_state rq_st = atomic_read(&srp->rq_st);
+
 			if (!srp)
 				continue;
-			hp = &srp->header;
-			new_interface = (hp->interface_id == '\0') ? 0 : 1;
-			if (srp->res_used) {
-				if (new_interface &&
-				    (SG_FLAG_MMAP_IO & hp->flags))
+			if (srp->parentfp->rsv_srp == srp) {
+				if (is_v3 && (SG_FLAG_MMAP_IO & sh3p->flags))
 					cp = "     mmap>> ";
 				else
 					cp = "     rb>> ";
 			} else {
-				if (SG_INFO_DIRECT_IO_MASK & hp->info)
+				if (SG_INFO_DIRECT_IO_MASK & srp->rq_info)
 					cp = "     dio>> ";
 				else
 					cp = "     ";
 			}
 			seq_puts(s, cp);
-			blen = srp->data.buflen;
-			usg = srp->data.num_sgat;
-			seq_puts(s, srp->done ?
-				 ((1 == srp->done) ?  "rcv:" : "fin:")
-				  : "act:");
-			seq_printf(s, " id=%d blen=%d",
-				   srp->header.pack_id, blen);
-			if (srp->done)
-				seq_printf(s, " dur=%d", hp->duration);
-			else {
-				ms = jiffies_to_msecs(jiffies);
-				seq_printf(s, " t_o/elap=%d/%d",
-					(new_interface ? hp->timeout :
-						  jiffies_to_msecs(fp->timeout)),
-					(ms > hp->duration ? ms - hp->duration : 0));
+			seq_puts(s, sg_rq_st_str(rq_st, false));
+			seq_printf(s, ": id=%d len/blen=%d/%d",
+				   srp->pack_id, srp->sgat_h.dlen,
+				   srp->sgat_h.buflen);
+			if (rq_st == SG_RS_AWAIT_RCV ||
+			    rq_st == SG_RS_RCV_DONE) {
+				seq_printf(s, " dur=%d", srp->duration);
+				goto fin_line;
 			}
-			seq_printf(s, "ms sgat=%d op=0x%02x dummy: %s\n", usg,
-				   (int)srp->data.cmd_opcode,
-				   sg_rq_st_str(SG_RS_INACTIVE, false));
+			ms = jiffies_to_msecs(jiffies);
+			seq_printf(s, " t_o/elap=%d/%d",
+				   (is_v3 ? sh3p->timeout :
+					    jiffies_to_msecs(fp->timeout)),
+				   (ms > srp->duration ?  ms - srp->duration :
+							  0));
+fin_line:
+			seq_printf(s, "ms sgat=%d op=0x%02x\n",
+				   srp->sgat_h.num_sgat, (int)srp->cmd_opcode);
 		}
 		if (xa_empty(&fp->srp_arr))
 			seq_puts(s, "     No requests active\n");
-- 
2.25.1



* [PATCH v18 26/83] sg: sense buffer rework
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
                   ` (25 preceding siblings ...)
  2021-04-27 21:56 ` [PATCH v18 25/83] sg: replace rq array with xarray Douglas Gilbert
@ 2021-04-27 21:56 ` Douglas Gilbert
  2021-04-27 21:56 ` [PATCH v18 27/83] sg: add sg v4 interface support Douglas Gilbert
                   ` (56 subsequent siblings)
  83 siblings, 0 replies; 86+ messages in thread
From: Douglas Gilbert @ 2021-04-27 21:56 UTC
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

The largest single item in the sg_request object is the sense
buffer array, which is SCSI_SENSE_BUFFERSIZE bytes long. That
constant started out at 18 bytes 20 years ago, is 96 bytes now,
and might grow in the future. On the other hand, the sense
buffer is only used by a small number of SCSI commands: those
that fail and those that return more information than a SCSI
status of GOOD.

Set up a small mempool called "sg_sense" from which a sense
buffer is allocated only when required; the buffer is released
back to the mempool as soon as practical.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 114 +++++++++++++++++++++++++++++++++++-----------
 1 file changed, 88 insertions(+), 26 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 6df7aa81349b..2baebe33b05d 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -72,6 +72,10 @@ static char *sg_version_date = "20190606";
  */
 #define SG_MAX_CDB_SIZE 252
 
+static struct kmem_cache *sg_sense_cache;
+#define SG_MEMPOOL_MIN_NR 4
+static mempool_t *sg_sense_pool;
+
 /* Following enum contains the states of sg_request::rq_st */
 enum sg_rq_state {	/* N.B. sg_rq_state_arr assumes SG_RS_AWAIT_RCV==2 */
 	SG_RS_INACTIVE = 0,	/* request not in use (e.g. on fl) */
@@ -173,7 +177,6 @@ struct sg_fd;
 struct sg_request {	/* active SCSI command or inactive request */
 	struct sg_scatter_hold sgat_h;	/* hold buffer, perhaps scatter list */
 	struct sg_slice_hdr3 s_hdr3;  /* subset of sg_io_hdr */
-	u8 sense_b[SCSI_SENSE_BUFFERSIZE];
 	u32 duration;		/* cmd duration in milliseconds */
 	u32 rq_flags;		/* hold user supplied flags */
 	u32 rq_idx;		/* my index within parent's srp_arr */
@@ -186,6 +189,7 @@ struct sg_request {	/* active SCSI command or inactive request */
 	u8 cmd_opcode;		/* first byte of SCSI cdb */
 	u64 start_ns;		/* starting point of command duration calc */
 	unsigned long frq_bm[1];	/* see SG_FRQ_* defines above */
+	u8 *sense_bp;		/* mempool alloc-ed sense buffer, as needed */
 	struct sg_fd *parentfp;	/* pointer to owning fd, even when on fl */
 	struct request *rq;	/* released in sg_rq_end_io(), bio kept */
 	struct bio *bio;	/* kept until this req -->SG_RS_INACTIVE */
@@ -957,18 +961,21 @@ sg_copy_sense(struct sg_request *srp)
 	    (driver_byte(srp->rq_result) & DRIVER_SENSE)) {
 		int sb_len = min_t(int, SCSI_SENSE_BUFFERSIZE, srp->sense_len);
 		int mx_sb_len = srp->s_hdr3.mx_sb_len;
+		u8 *sbp = srp->sense_bp;
 		void __user *up = srp->s_hdr3.sbp;
 
-		if (up && mx_sb_len > 0) {
+		srp->sense_bp = NULL;
+		if (up && mx_sb_len > 0 && sbp) {
 			sb_len = min_t(int, mx_sb_len, sb_len);
 			/* Additional sense length field */
-			sb_len_ret = 8 + (int)srp->sense_b[7];
+			sb_len_ret = 8 + (int)sbp[7];
 			sb_len_ret = min_t(int, sb_len_ret, sb_len);
-			if (copy_to_user(up, srp->sense_b, sb_len_ret))
+			if (copy_to_user(up, sbp, sb_len_ret))
 				sb_len_ret = -EFAULT;
 		} else {
 			sb_len_ret = 0;
 		}
+		mempool_free(sbp, sg_sense_pool);
 	}
 	return sb_len_ret;
 }
@@ -1059,8 +1066,14 @@ sg_read_v1v2(void __user *buf, int count, struct sg_fd *sfp,
 	h2p->driver_status = driver_byte(rq_result);
 	if ((CHECK_CONDITION & status_byte(rq_result)) ||
 	    (DRIVER_SENSE & driver_byte(rq_result))) {
-		memcpy(h2p->sense_buffer, srp->sense_b,
-		       sizeof(h2p->sense_buffer));
+		if (srp->sense_bp) {
+			u8 *sbp = srp->sense_bp;
+
+			srp->sense_bp = NULL;
+			memcpy(h2p->sense_buffer, sbp,
+			       sizeof(h2p->sense_buffer));
+			mempool_free(sbp, sg_sense_pool);
+		}
 	}
 	switch (host_byte(rq_result)) {
 	/*
@@ -1095,18 +1108,22 @@ sg_read_v1v2(void __user *buf, int count, struct sg_fd *sfp,
 
 	/* Now copy the result back to the user buffer.  */
 	if (count >= SZ_SG_HEADER) {
-		if (copy_to_user(buf, h2p, SZ_SG_HEADER))
-			return -EFAULT;
+		if (copy_to_user(buf, h2p, SZ_SG_HEADER)) {
+			res = -EFAULT;
+			goto fini;
+		}
 		buf += SZ_SG_HEADER;
 		if (count > h2p->reply_len)
 			count = h2p->reply_len;
 		if (count > SZ_SG_HEADER) {
-			if (sg_read_append(srp, buf, count - SZ_SG_HEADER))
-				return -EFAULT;
+			res = sg_read_append(srp, buf, count - SZ_SG_HEADER);
+			if (res)
+				goto fini;
 		}
 	} else {
 		res = (h2p->result == 0) ? 0 : -EIO;
 	}
+fini:
 	atomic_set(&srp->rq_st, SG_RS_RCV_DONE);
 	sg_finish_scsi_blk_rq(srp);
 	sg_deact_request(sfp, srp);
@@ -2088,8 +2105,25 @@ sg_rq_end_io(struct request *rq, blk_status_t status)
 	srp->duration = sg_calc_rq_dur(srp);
 	if (unlikely((srp->rq_result & SG_ML_RESULT_MSK) && slen > 0))
 		sg_check_sense(sdp, srp, slen);
-	if (slen > 0)
-		memcpy(srp->sense_b, scsi_rp->sense, slen);
+	if (slen > 0) {
+		if (scsi_rp->sense) {
+			srp->sense_bp = mempool_alloc(sg_sense_pool,
+						      GFP_ATOMIC);
+			if (srp->sense_bp) {
+				memcpy(srp->sense_bp, scsi_rp->sense, slen);
+				if (slen < SCSI_SENSE_BUFFERSIZE)
+					memset(srp->sense_bp + slen, 0,
+					       SCSI_SENSE_BUFFERSIZE - slen);
+			} else {
+				slen = 0;
+				pr_warn("%s: sense but can't alloc buffer\n",
+					__func__);
+			}
+		} else {
+			slen = 0;
+			pr_warn("%s: sense_len>0 but sense==NULL\n", __func__);
+		}
+	}
 	srp->sense_len = slen;
 	if (unlikely(test_bit(SG_FRQ_IS_ORPHAN, srp->frq_bm))) {
 		if (test_bit(SG_FFD_KEEP_ORPHAN, sfp->ffd_bm)) {
@@ -2378,13 +2412,30 @@ init_sg(void)
 	if (rc)
 		return rc;
 
+	sg_sense_cache = kmem_cache_create_usercopy
+				("sg_sense", SCSI_SENSE_BUFFERSIZE, 0,
+				 SLAB_HWCACHE_ALIGN, 0,
+				 SCSI_SENSE_BUFFERSIZE, NULL);
+	if (!sg_sense_cache) {
+		pr_err("sg: can't init sense cache\n");
+		rc = -ENOMEM;
+		goto err_out_unreg;
+	}
+	sg_sense_pool = mempool_create_slab_pool(SG_MEMPOOL_MIN_NR,
+						 sg_sense_cache);
+	if (!sg_sense_pool) {
+		pr_err("sg: can't init sense pool\n");
+		rc = -ENOMEM;
+		goto err_out_cache;
+	}
+
 	pr_info("Registered %s[char major=0x%x], version: %s, date: %s\n",
 		"sg device ", SCSI_GENERIC_MAJOR, SG_VERSION_STR,
 		sg_version_date);
 	sg_sysfs_class = class_create(THIS_MODULE, "scsi_generic");
 	if (IS_ERR(sg_sysfs_class)) {
 		rc = PTR_ERR(sg_sysfs_class);
-		goto err_out_unreg;
+		goto err_out_pool;
 	}
 	sg_sysfs_valid = true;
 	rc = scsi_register_interface(&sg_interface);
@@ -2394,6 +2445,10 @@ init_sg(void)
 	}
 	class_destroy(sg_sysfs_class);
 
+err_out_pool:
+	mempool_destroy(sg_sense_pool);
+err_out_cache:
+	kmem_cache_destroy(sg_sense_cache);
 err_out_unreg:
 	unregister_chrdev_region(MKDEV(SCSI_GENERIC_MAJOR, 0), SG_MAX_DEVS);
 	return rc;
@@ -2413,6 +2468,8 @@ exit_sg(void)
 	if (IS_ENABLED(CONFIG_SCSI_PROC_FS))
 		remove_proc_subtree("scsi/sg", NULL);
 	scsi_unregister_interface(&sg_interface);
+	mempool_destroy(sg_sense_pool);
+	kmem_cache_destroy(sg_sense_cache);
 	class_destroy(sg_sysfs_class);
 	sg_sysfs_valid = false;
 	unregister_chrdev_region(MKDEV(SCSI_GENERIC_MAJOR, 0),
@@ -2935,6 +2992,7 @@ sg_setup_req(struct sg_fd *sfp, int dxfr_len, struct sg_comm_wr_t *cwrp)
 	int num_inactive = 0;
 	unsigned long idx, last_idx, iflags;
 	struct sg_request *r_srp = NULL;	/* request to return */
+	struct sg_request *last_srp = NULL;
 	struct xarray *xafp = &sfp->srp_arr;
 	__maybe_unused const char *cp;
 
@@ -2951,19 +3009,17 @@ sg_setup_req(struct sg_fd *sfp, int dxfr_len, struct sg_comm_wr_t *cwrp)
 			++num_inactive;
 			if (dxfr_len < SG_DEF_SECTOR_SZ) {
 				last_idx = idx;
+				last_srp = r_srp;
 				continue;
 			}
 		}
 		/* If dxfr_len is small, use last inactive request */
-		if (last_idx != ~0UL) {
-			idx = last_idx;
-			r_srp = xa_load(xafp, idx);
-			if (!r_srp)
-				goto start_again;
+		if (last_idx != ~0UL && last_srp) {
+			r_srp = last_srp;
 			if (sg_rq_state_chg(r_srp, SG_RS_INACTIVE, SG_RS_BUSY,
 					    false, __func__))
 				goto start_again; /* gone to another thread */
-			cp = "toward back of srp_arr";
+			cp = "toward end of srp_arr";
 			found = true;
 		}
 	} else {
@@ -3048,15 +3104,16 @@ sg_setup_req(struct sg_fd *sfp, int dxfr_len, struct sg_comm_wr_t *cwrp)
 static void
 sg_deact_request(struct sg_fd *sfp, struct sg_request *srp)
 {
-	unsigned long iflags;
+	u8 *sbp;
 
 	if (WARN_ON(!sfp || !srp))
 		return;
-	atomic_set(&srp->rq_st, SG_RS_INACTIVE);
-	xa_lock_irqsave(&sfp->srp_arr, iflags);
-	__xa_set_mark(&sfp->srp_arr, srp->rq_idx, SG_XA_RQ_INACTIVE);
-	__xa_clear_mark(&sfp->srp_arr, srp->rq_idx, SG_XA_RQ_AWAIT);
-	xa_unlock_irqrestore(&sfp->srp_arr, iflags);
+	sbp = srp->sense_bp;
+	srp->sense_bp = NULL;
+	sg_rq_state_chg(srp, 0, SG_RS_INACTIVE, true /* force */, __func__);
+	/* maybe orphaned req, thus never read */
+	if (sbp)
+		mempool_free(sbp, sg_sense_pool);
 }
 
 /* Returns pointer to sg_fd object or negated errno twisted by ERR_PTR */
@@ -3205,7 +3262,12 @@ sg_remove_sfp_usercontext(struct work_struct *work)
 			continue;
 		if (!xa_get_mark(xafp, srp->rq_idx, SG_XA_RQ_INACTIVE))
 			sg_finish_scsi_blk_rq(srp);
-		sg_remove_sgat(srp);
+		if (srp->sgat_h.buflen > 0)
+			sg_remove_sgat(srp);
+		if (srp->sense_bp) {
+			mempool_free(srp->sense_bp, sg_sense_pool);
+			srp->sense_bp = NULL;
+		}
 		xa_lock_irqsave(xafp, iflags);
 		e_srp = __xa_erase(xafp, srp->rq_idx);
 		xa_unlock_irqrestore(xafp, iflags);
-- 
2.25.1

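The sense-length clamping that sg_copy_sense() performs in the patch above can be modelled in user space as below. This is a sketch, not the driver code. The function name sense_bytes_to_copy() and its parameters are hypothetical, and copy_to_user() and the mempool_free() call are left out. It shows the three limits applied in turn: the 96-byte SCSI_SENSE_BUFFERSIZE, the caller's mx_sb_len, and 8 plus the additional sense length held in byte 7 of the sense data. A NULL saved buffer (for example after a failed GFP_ATOMIC pool allocation) yields zero bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same value as the kernel's SCSI_SENSE_BUFFERSIZE */
#define SENSE_BUF_SZ 96

static int min_int(int a, int b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical model of the length logic in sg_copy_sense().
 * sbp:       sense bytes saved at completion (NULL if allocation failed)
 * sense_len: number of sense bytes reported by the lower level
 * mx_sb_len: size of the caller's sense buffer
 * up:        stand-in for the caller's buffer pointer (NULL if none given)
 */
static int sense_bytes_to_copy(const uint8_t *sbp, int sense_len,
			       int mx_sb_len, const void *up)
{
	int sb_len = min_int(SENSE_BUF_SZ, sense_len);

	if (!up || mx_sb_len <= 0 || !sbp)
		return 0;
	assert(sense_len >= 0);
	sb_len = min_int(mx_sb_len, sb_len);
	/* byte 7 holds the additional sense length */
	return min_int(8 + (int)sbp[7], sb_len);
}
```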


* [PATCH v18 27/83] sg: add sg v4 interface support
From: Douglas Gilbert @ 2021-04-27 21:56 UTC
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

Add support for the sg v4 interface, based on struct sg_io_v4 in
include/uapi/linux/bsg.h, which was previously supported only by the
bsg driver. Add ioctl(SG_IOSUBMIT) and ioctl(SG_IORECEIVE) for
async (non-blocking) usage of the sg v4 interface. Do not accept
the v3 interface with these ioctls. Do not accept the v4
interface with this driver's existing write() and read()
system calls.
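The blocking rules for ioctl(SG_IORECEIVE) described in this patch (O_NONBLOCK on the file or SGV4_FLAG_IMMED in sg_io_v4::flags makes the call non-blocking; with no completed request available it then yields EAGAIN, otherwise it waits) can be summarised as a small decision function. This is a user-space sketch with hypothetical names (ioreceive_action, enum rcv_action). The driver additionally checks error-recovery state and the guard, protocol and subprotocol fields before reaching this point.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical outcomes used only in this sketch */
enum rcv_action { RCV_DELIVER, RCV_WAIT };

/*
 * Returns RCV_DELIVER or RCV_WAIT, or a negated errno value.
 * ready:      a completed request matching the id is available
 * detaching:  the device is being removed
 * o_nonblock: file was opened with O_NONBLOCK
 * immed:      SGV4_FLAG_IMMED is set in sg_io_v4::flags
 */
static int ioreceive_action(bool ready, bool detaching, bool o_nonblock,
			    bool immed)
{
	bool non_block = o_nonblock || immed;

	if (ready)
		return RCV_DELIVER;
	if (detaching)
		return -ENODEV;
	if (non_block)
		return -EAGAIN;
	return RCV_WAIT;	/* driver sleeps in wait_event_interruptible() */
}
```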

For sync (blocking) usage expand the existing ioctl(SG_IO)
to additionally accept the sg v4 interface object.
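Because ioctl(SG_IO) now accepts either object, sg_ctl_sg_io() first reads the smaller v3 header, inspects the first 32-bit field (interface_id 'S' for v3, guard 'Q' for v4), and only for 'Q' copies in the remaining bytes of the larger sg_io_v4. The sketch below mimics that two-step copy in user space with memcpy() standing in for the user-copy helpers. The function name fetch_sg_io_obj() is hypothetical, and the sizes used in the usage checks are illustrative rather than the real structure sizes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy an SG_IO header object from 'src' (standing in for user memory)
 * into 'dst', which must hold at least v4_sz bytes.
 * Returns 3 or 4 for the interface version, or -EPERM.
 */
static int fetch_sg_io_obj(unsigned char *dst, const unsigned char *src,
			   size_t v3_sz, size_t v4_sz)
{
	int first;

	assert(v3_sz >= sizeof(int) && v3_sz <= v4_sz);
	memcpy(dst, src, v3_sz);		/* step 1: v3-sized read */
	memcpy(&first, dst, sizeof(first));	/* interface_id or guard */
	if (first == 'Q') {
		/* step 2: copy in the rest of the larger v4 object */
		memcpy(dst + v3_sz, src + v3_sz, v4_sz - v3_sz);
		return 4;
	}
	if (first == 'S')
		return 3;
	return -EPERM;
}
```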

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
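The queue-placement rule that this patch moves from sg_common_write() into the new sg_execute_cmd() is a three-way decision: v1/v2 requests (tunnelled with interface_id '\0') always go to the head; otherwise the per-fd default (head, or tail when SG_FFD_Q_AT_TAIL is set) applies unless the request's flags override it. A user-space sketch follows. The function name is hypothetical and the two flag values are assumed to mirror SG_FLAG_Q_AT_TAIL and SG_FLAG_Q_AT_HEAD.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to mirror SG_FLAG_Q_AT_TAIL and SG_FLAG_Q_AT_HEAD */
#define Q_AT_TAIL 0x10
#define Q_AT_HEAD 0x20

static bool queue_at_head(bool is_v4, char v3_interface_id,
			  bool fd_defaults_to_tail, unsigned int rq_flags)
{
	if (!is_v4 && v3_interface_id == '\0')
		return true;	/* backward compatibility: v1+v2 interfaces */
	if (fd_defaults_to_tail)
		return (rq_flags & Q_AT_HEAD) != 0;	/* request opts in */
	return (rq_flags & Q_AT_TAIL) == 0;		/* request opts out */
}
```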
 drivers/scsi/sg.c      | 438 +++++++++++++++++++++++++++++++++--------
 include/uapi/scsi/sg.h |  37 +++-
 2 files changed, 396 insertions(+), 79 deletions(-)
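For the v4 interface, sg_common_write() derives the data direction and transfer length from din_xfer_len and dout_xfer_len, and rejects requests that set both (bidirectional transfers) with EOPNOTSUPP. The logic can be sketched in user space as follows. v4_dir() is a hypothetical name; the DXFER_* values are redefined locally so the sketch stands alone and are intended to match SG_DXFER_NONE, SG_DXFER_TO_DEV and SG_DXFER_FROM_DEV.

```c
#include <assert.h>
#include <errno.h>

/* Intended to match SG_DXFER_NONE, SG_DXFER_TO_DEV, SG_DXFER_FROM_DEV */
#define DXFER_NONE	(-1)
#define DXFER_TO_DEV	(-2)
#define DXFER_FROM_DEV	(-3)

/*
 * Derive data direction and length from the v4 din/dout lengths.
 * Returns a DXFER_* value, or -EOPNOTSUPP for bidirectional requests.
 */
static int v4_dir(unsigned int din_len, unsigned int dout_len,
		  unsigned int *dxfr_len)
{
	*dxfr_len = 0;
	if (din_len && dout_len)
		return -EOPNOTSUPP;	/* bidi not accepted by this patch */
	if (din_len) {
		*dxfr_len = din_len;
		return DXFER_FROM_DEV;
	}
	if (dout_len) {
		*dxfr_len = dout_len;
		return DXFER_TO_DEV;
	}
	return DXFER_NONE;
}
```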

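struct sg_io_v4 carries its user pointers (request, response, dout_xferp, din_xferp, usr_ptr) in 64-bit fields, which is why this patch adds the uptr64() and cuptr64() macros. The sketch below mimics that round trip in user space, dropping the __user annotation (a sparse-only marker). ptr_to_u64() is a hypothetical helper. Casting through uintptr_t in both directions keeps the conversion width-correct on 32-bit builds, where a direct pointer-to-u64 cast draws a size-mismatch warning.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space stand-in for the kernel macro, without __user */
#define uptr64(v)	((void *)(uintptr_t)(v))

/* Store a pointer in a 64-bit ABI field, as sg_io_v4 does */
static uint64_t ptr_to_u64(const void *p)
{
	return (uint64_t)(uintptr_t)p;
}
```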
diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 2baebe33b05d..e0dd62001a1e 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -40,11 +40,12 @@ static char *sg_version_date = "20190606";
 #include <linux/atomic.h>
 #include <linux/ratelimit.h>
 #include <linux/uio.h>
-#include <linux/cred.h> /* for sg_check_file_access() */
+#include <linux/cred.h>			/* for sg_check_file_access() */
 #include <linux/proc_fs.h>
 #include <linux/xarray.h>
 
-#include "scsi.h"
+#include <scsi/scsi.h>
+#include <scsi/scsi_eh.h>
 #include <scsi/scsi_dbg.h>
 #include <scsi/scsi_host.h>
 #include <scsi/scsi_driver.h>
@@ -76,6 +77,9 @@ static struct kmem_cache *sg_sense_cache;
 #define SG_MEMPOOL_MIN_NR 4
 static mempool_t *sg_sense_pool;
 
+#define uptr64(usp_val) ((void __user *)(uintptr_t)(usp_val))
+#define cuptr64(usp_val) ((const void __user *)(uintptr_t)(usp_val))
+
 /* Following enum contains the states of sg_request::rq_st */
 enum sg_rq_state {	/* N.B. sg_rq_state_arr assumes SG_RS_AWAIT_RCV==2 */
 	SG_RS_INACTIVE = 0,	/* request not in use (e.g. on fl) */
@@ -100,6 +104,7 @@ enum sg_rq_state {	/* N.B. sg_rq_state_arr assumes SG_RS_AWAIT_RCV==2 */
 #define SG_ADD_RQ_MAX_RETRIES 40	/* to stop infinite _trylock(s) */
 
 /* Bit positions (flags) for sg_request::frq_bm bitmask follow */
+#define SG_FRQ_IS_V4I		0	/* true (set) when is v4 interface */
 #define SG_FRQ_IS_ORPHAN	1	/* owner of request gone */
 #define SG_FRQ_SYNC_INVOC	2	/* synchronous (blocking) invocation */
 #define SG_FRQ_NO_US_XFER	3	/* no user space transfer of data */
@@ -163,6 +168,15 @@ struct sg_slice_hdr3 {
 	void __user *usr_ptr;
 };
 
+struct sg_slice_hdr4 {	/* parts of sg_io_v4 object needed in async usage */
+	void __user *sbp;	/* derived from sg_io_v4::response */
+	u64 usr_ptr;		/* hold sg_io_v4::usr_ptr as given (u64) */
+	int out_resid;
+	s16 dir;		/* data xfer direction; SG_DXFER_*  */
+	u16 cmd_len;		/* truncated copy of sg_io_v4::request_len */
+	u16 max_sb_len;		/* truncated copy of sg_io_v4::max_response_len */
+};
+
 struct sg_scatter_hold {     /* holding area for scsi scatter gather info */
 	struct page **pages;	/* num_sgat element array of struct page* */
 	int buflen;		/* capacity in bytes (dlen<=buflen) */
@@ -176,7 +190,10 @@ struct sg_fd;
 
 struct sg_request {	/* active SCSI command or inactive request */
 	struct sg_scatter_hold sgat_h;	/* hold buffer, perhaps scatter list */
-	struct sg_slice_hdr3 s_hdr3;  /* subset of sg_io_hdr */
+	union {
+		struct sg_slice_hdr3 s_hdr3;  /* subset of sg_io_hdr */
+		struct sg_slice_hdr4 s_hdr4; /* reduced size struct sg_io_v4 */
+	};
 	u32 duration;		/* cmd duration in milliseconds */
 	u32 rq_flags;		/* hold user supplied flags */
 	u32 rq_idx;		/* my index within parent's srp_arr */
@@ -236,7 +253,10 @@ struct sg_device { /* holds the state of each scsi generic device */
 struct sg_comm_wr_t {  /* arguments to sg_common_write() */
 	int timeout;
 	unsigned long frq_bm[1];	/* see SG_FRQ_* defines above */
-	struct sg_io_hdr *h3p;
+	union {		/* selector is frq_bm.SG_FRQ_IS_V4I */
+		struct sg_io_hdr *h3p;
+		struct sg_io_v4 *h4p;
+	};
 	u8 *cmnd;
 };
 
@@ -245,12 +265,12 @@ static void sg_rq_end_io(struct request *rq, blk_status_t status);
 /* Declarations of other static functions used before they are defined */
 static int sg_proc_init(void);
 static int sg_start_req(struct sg_request *srp, u8 *cmd, int cmd_len,
-			int dxfer_dir);
+			struct sg_io_v4 *h4p, int dxfer_dir);
 static void sg_finish_scsi_blk_rq(struct sg_request *srp);
 static int sg_mk_sgat(struct sg_request *srp, struct sg_fd *sfp, int minlen);
-static int sg_submit(struct file *filp, struct sg_fd *sfp,
-		     struct sg_io_hdr *hp, bool sync,
-		     struct sg_request **o_srp);
+static int sg_v3_submit(struct file *filp, struct sg_fd *sfp,
+			struct sg_io_hdr *hp, bool sync,
+			struct sg_request **o_srp);
 static struct sg_request *sg_common_write(struct sg_fd *sfp,
 					  struct sg_comm_wr_t *cwrp);
 static int sg_read_append(struct sg_request *srp, void __user *outp,
@@ -258,11 +278,11 @@ static int sg_read_append(struct sg_request *srp, void __user *outp,
 static void sg_remove_sgat(struct sg_request *srp);
 static struct sg_fd *sg_add_sfp(struct sg_device *sdp);
 static void sg_remove_sfp(struct kref *);
-static struct sg_request *sg_find_srp_by_id(struct sg_fd *sfp, int pack_id);
+static struct sg_request *sg_find_srp_by_id(struct sg_fd *sfp, int id);
 static struct sg_request *sg_setup_req(struct sg_fd *sfp, int dxfr_len,
 				       struct sg_comm_wr_t *cwrp);
 static void sg_deact_request(struct sg_fd *sfp, struct sg_request *srp);
-static struct sg_device *sg_get_dev(int dev);
+static struct sg_device *sg_get_dev(int min_dev);
 static void sg_device_destroy(struct kref *kref);
 static struct sg_request *sg_mk_srp_sgat(struct sg_fd *sfp, bool first,
 					 int db_len);
@@ -272,8 +292,11 @@ static const char *sg_rq_st_str(enum sg_rq_state rq_st, bool long_str);
 
 #define SZ_SG_HEADER ((int)sizeof(struct sg_header))	/* v1 and v2 header */
 #define SZ_SG_IO_HDR ((int)sizeof(struct sg_io_hdr))	/* v3 header */
+#define SZ_SG_IO_V4 ((int)sizeof(struct sg_io_v4))  /* v4 header (in bsg.h) */
 #define SZ_SG_REQ_INFO ((int)sizeof(struct sg_req_info))
 
+/* There is an assert that SZ_SG_IO_V4 >= SZ_SG_IO_HDR in sg_check_file_access() */
+
 #define SG_IS_DETACHING(sdp) test_bit(SG_FDEV_DETACHING, (sdp)->fdev_bm)
 #define SG_HAVE_EXCLUDE(sdp) test_bit(SG_FDEV_EXCLUDE, (sdp)->fdev_bm)
 #define SG_RS_ACTIVE(srp) (atomic_read(&(srp)->rq_st) != SG_RS_INACTIVE)
@@ -330,6 +353,10 @@ static const char *sg_rq_st_str(enum sg_rq_state rq_st, bool long_str);
 static int
 sg_check_file_access(struct file *filp, const char *caller)
 {
+	/* can't put following in declarations where it belongs */
+	compiletime_assert(SZ_SG_IO_V4 >= SZ_SG_IO_HDR,
+			   "struct sg_io_v4 should be at least as large as sg_io_hdr");
+
 	if (filp->f_cred != current_real_cred()) {
 		pr_err_once("%s: process %d (%s) changed security contexts after opening file descriptor, this is not allowed.\n",
 			caller, task_tgid_vnr(current), current->comm);
@@ -424,21 +451,18 @@ sg_open(struct inode *inode, struct file *filp)
 	o_excl = !!(op_flags & O_EXCL);
 	non_block = !!(op_flags & O_NONBLOCK);
 	if (o_excl && ((op_flags & O_ACCMODE) == O_RDONLY))
-		return -EPERM; /* Can't lock it with read only access */
+		return -EPERM;	/* not permitted: O_EXCL needs write access */
 	sdp = sg_get_dev(min_dev);	/* increments sdp->d_ref */
 	if (IS_ERR(sdp))
 		return PTR_ERR(sdp);
 
-	/* This driver's module count bumped by fops_get in <linux/fs.h> */
 	/* Prevent the device driver from vanishing while we sleep */
 	res = scsi_device_get(sdp->device);
 	if (res)
 		goto sg_put;
-
 	res = scsi_autopm_get_device(sdp->device);
 	if (res)
 		goto sdp_put;
-
 	res = sg_allow_if_err_recovery(sdp, non_block);
 	if (res)
 		goto error_out;
@@ -475,9 +499,10 @@ sg_open(struct inode *inode, struct file *filp)
 	}
 
 	filp->private_data = sfp;
+	sfp->tid = (current ? current->pid : -1);
 	mutex_unlock(&sdp->open_rel_lock);
-	SG_LOG(3, sfp, "%s: minor=%d, op_flags=0x%x; %s count after=%d%s\n",
-	       __func__, min_dev, op_flags, "device open", o_count,
+	SG_LOG(3, sfp, "%s: o_count after=%d on minor=%d, op_flags=0x%x%s\n",
+	       __func__, o_count, min_dev, op_flags,
 	       ((op_flags & O_NONBLOCK) ? " O_NONBLOCK" : ""));
 
 	res = 0;
@@ -500,8 +525,13 @@ sg_open(struct inode *inode, struct file *filp)
 	goto sg_put;
 }
 
-/* Release resources associated with a successful sg_open()
- * Returns 0 on success, else a negated errno value */
+/*
+ * Release resources associated with a prior, successful sg_open(). It can be
+ * seen as the (final) close() call on a sg device file descriptor in the user
+ * space. The real work releasing all resources associated with this file
+ * descriptor is done by sg_remove_sfp_usercontext() which is scheduled by
+ * sg_remove_sfp().
+ */
 static int
 sg_release(struct inode *inode, struct file *filp)
 {
@@ -603,7 +633,7 @@ sg_write(struct file *filp, const char __user *p, size_t count, loff_t *ppos)
 				     __func__);
 			return -EPERM;
 		}
-		res = sg_submit(filp, sfp, h3p, false, NULL);
+		res = sg_v3_submit(filp, sfp, h3p, false, NULL);
 		return res < 0 ? res : (int)count;
 	}
 to_v2:
@@ -680,7 +710,7 @@ sg_write(struct file *filp, const char __user *p, size_t count, loff_t *ppos)
 static inline int
 sg_chk_mmap(struct sg_fd *sfp, int rq_flags, int len)
 {
-	if (!xa_empty(&sfp->srp_arr))
+	if (atomic_read(&sfp->submitted) > 0)
 		return -EBUSY;  /* already active requests on fd */
 	if (len > sfp->rsv_srp->sgat_h.buflen)
 		return -ENOMEM; /* MMAP_IO size must fit in reserve */
@@ -711,8 +741,8 @@ sg_fetch_cmnd(struct file *filp, struct sg_fd *sfp, const u8 __user *u_cdbp,
 }
 
 static int
-sg_submit(struct file *filp, struct sg_fd *sfp, struct sg_io_hdr *hp,
-	  bool sync, struct sg_request **o_srp)
+sg_v3_submit(struct file *filp, struct sg_fd *sfp, struct sg_io_hdr *hp,
+	     bool sync, struct sg_request **o_srp)
 {
 	int res, timeout;
 	unsigned long ul_timeout;
@@ -746,6 +776,67 @@ sg_submit(struct file *filp, struct sg_fd *sfp, struct sg_io_hdr *hp,
 	return 0;
 }
 
+static int
+sg_submit_v4(struct file *filp, struct sg_fd *sfp, void __user *p,
+	     struct sg_io_v4 *h4p, bool sync, struct sg_request **o_srp)
+{
+	int timeout, res;
+	unsigned long ul_timeout;
+	struct sg_request *srp;
+	struct sg_comm_wr_t cwr;
+	u8 cmnd[SG_MAX_CDB_SIZE];
+
+	if (h4p->flags & SG_FLAG_MMAP_IO) {
+		int len = 0;
+
+		if (h4p->din_xferp)
+			len = h4p->din_xfer_len;
+		else if (h4p->dout_xferp)
+			len = h4p->dout_xfer_len;
+		res = sg_chk_mmap(sfp, h4p->flags, len);
+		if (res)
+			return res;
+	}
+	/* once v4 (or v3) seen, allow cmd_q on this fd (def: no cmd_q) */
+	set_bit(SG_FFD_CMD_Q, sfp->ffd_bm);
+	ul_timeout = msecs_to_jiffies(h4p->timeout);
+	timeout = min_t(unsigned long, ul_timeout, INT_MAX);
+	res = sg_fetch_cmnd(filp, sfp, cuptr64(h4p->request), h4p->request_len,
+			    cmnd);
+	if (res)
+		return res;
+	cwr.frq_bm[0] = 0;
+	assign_bit(SG_FRQ_SYNC_INVOC, cwr.frq_bm, (int)sync);
+	set_bit(SG_FRQ_IS_V4I, cwr.frq_bm);
+	cwr.h4p = h4p;
+	cwr.timeout = timeout;
+	cwr.cmnd = cmnd;
+	srp = sg_common_write(sfp, &cwr);
+	if (IS_ERR(srp))
+		return PTR_ERR(srp);
+	if (o_srp)
+		*o_srp = srp;
+	return res;
+}
+
+static int
+sg_ctl_iosubmit(struct file *filp, struct sg_fd *sfp, void __user *p)
+{
+	int res;
+	u8 hdr_store[SZ_SG_IO_V4];
+	struct sg_io_v4 *h4p = (struct sg_io_v4 *)hdr_store;
+	struct sg_device *sdp = sfp->parentdp;
+
+	res = sg_allow_if_err_recovery(sdp, (filp->f_flags & O_NONBLOCK));
+	if (res)
+		return res;
+	if (copy_from_user(hdr_store, p, SZ_SG_IO_V4))
+		return -EFAULT;
+	if (h4p->guard == 'Q')
+		return sg_submit_v4(filp, sfp, p, h4p, false, NULL);
+	return -EPERM;
+}
+
 #if IS_ENABLED(SG_LOG_ACTIVE)
 static void
 sg_rq_state_fail_msg(struct sg_fd *sfp, enum sg_rq_state exp_old_st,
@@ -855,16 +946,46 @@ sg_rq_state_chg(struct sg_request *srp, enum sg_rq_state old_st,
 	return 0;
 }
 
+static void
+sg_execute_cmd(struct sg_fd *sfp, struct sg_request *srp)
+{
+	bool at_head, is_v4h, sync;
+	struct sg_device *sdp = sfp->parentdp;
+
+	is_v4h = test_bit(SG_FRQ_IS_V4I, srp->frq_bm);
+	sync = test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm);
+	SG_LOG(3, sfp, "%s: is_v4h=%d\n", __func__, (int)is_v4h);
+	srp->start_ns = ktime_get_boottime_ns();
+	srp->duration = 0;
+
+	if (!is_v4h && srp->s_hdr3.interface_id == '\0')
+		at_head = true;	/* backward compatibility: v1+v2 interfaces */
+	else if (test_bit(SG_FFD_Q_AT_TAIL, sfp->ffd_bm))
+		/* cmd flags can override sfd setting */
+		at_head = !!(srp->rq_flags & SG_FLAG_Q_AT_HEAD);
+	else            /* this sfd is defaulting to head */
+		at_head = !(srp->rq_flags & SG_FLAG_Q_AT_TAIL);
+
+	kref_get(&sfp->f_ref); /* sg_rq_end_io() does kref_put(). */
+	sg_rq_state_chg(srp, SG_RS_BUSY /* ignored */, SG_RS_INFLIGHT,
+			true, __func__);
+
+	/* >>>>>>> send cmd/req off to other levels <<<<<<<< */
+	if (!sync)
+		atomic_inc(&sfp->submitted);
+	blk_execute_rq_nowait(sdp->disk, srp->rq, (int)at_head, sg_rq_end_io);
+}
+
 /*
  * All writes and submits converge on this function to launch the SCSI
  * command/request (via blk_execute_rq_nowait). Returns a pointer to a
  * sg_request object holding the request just issued or a negated errno
  * value twisted by ERR_PTR.
+ * N.B. with the v4 interface, pack_id is carried in sg_io_v4::request_extra.
  */
 static struct sg_request *
 sg_common_write(struct sg_fd *sfp, struct sg_comm_wr_t *cwrp)
 {
-	bool at_head;
 	int res = 0;
 	int dxfr_len, dir, cmd_len;
 	int pack_id = SG_PACK_ID_WILDCARD;
@@ -872,12 +993,32 @@ sg_common_write(struct sg_fd *sfp, struct sg_comm_wr_t *cwrp)
 	struct sg_device *sdp = sfp->parentdp;
 	struct sg_request *srp;
 	struct sg_io_hdr *hi_p;
-
-	hi_p = cwrp->h3p;
-	dir = hi_p->dxfer_direction;
-	dxfr_len = hi_p->dxfer_len;
-	rq_flags = hi_p->flags;
-	pack_id = hi_p->pack_id;
+	struct sg_io_v4 *h4p;
+
+	if (test_bit(SG_FRQ_IS_V4I, cwrp->frq_bm)) {
+		h4p = cwrp->h4p;
+		hi_p = NULL;
+		dxfr_len = 0;
+		dir = SG_DXFER_NONE;
+		rq_flags = h4p->flags;
+		pack_id = h4p->request_extra;
+		if (h4p->din_xfer_len && h4p->dout_xfer_len) {
+			return ERR_PTR(-EOPNOTSUPP);
+		} else if (h4p->din_xfer_len) {
+			dxfr_len = h4p->din_xfer_len;
+			dir = SG_DXFER_FROM_DEV;
+		} else if (h4p->dout_xfer_len) {
+			dxfr_len = h4p->dout_xfer_len;
+			dir = SG_DXFER_TO_DEV;
+		}
+	} else {                /* sg v3 interface so hi_p valid */
+		h4p = NULL;
+		hi_p = cwrp->h3p;
+		dir = hi_p->dxfer_direction;
+		dxfr_len = hi_p->dxfer_len;
+		rq_flags = hi_p->flags;
+		pack_id = hi_p->pack_id;
+	}
 	if (dxfr_len >= SZ_256M)
 		return ERR_PTR(-EINVAL);
 
@@ -887,13 +1028,23 @@ sg_common_write(struct sg_fd *sfp, struct sg_comm_wr_t *cwrp)
 	srp->rq_flags = rq_flags;
 	srp->pack_id = pack_id;
 
-	cmd_len = hi_p->cmd_len;
-	memcpy(&srp->s_hdr3, hi_p, sizeof(srp->s_hdr3));
+	if (h4p) {
+		memset(&srp->s_hdr4, 0, sizeof(srp->s_hdr4));
+		srp->s_hdr4.usr_ptr = h4p->usr_ptr;
+		srp->s_hdr4.sbp = uptr64(h4p->response);
+		srp->s_hdr4.max_sb_len = h4p->max_response_len;
+		srp->s_hdr4.cmd_len = h4p->request_len;
+		srp->s_hdr4.dir = dir;
+		cmd_len = h4p->request_len;
+	} else {	/* v3 interface active */
+		cmd_len = hi_p->cmd_len;
+		memcpy(&srp->s_hdr3, hi_p, sizeof(srp->s_hdr3));
+	}
 	srp->cmd_opcode = cwrp->cmnd[0];/* hold opcode of command for debug */
 	SG_LOG(4, sfp, "%s: opcode=0x%02x, cdb_sz=%d, pack_id=%d\n", __func__,
 	       (int)cwrp->cmnd[0], cmd_len, pack_id);
 
-	res = sg_start_req(srp, cwrp->cmnd, cmd_len, dir);
+	res = sg_start_req(srp, cwrp->cmnd, cmd_len, h4p, dir);
 	if (res < 0)		/* probably out of space --> -ENOMEM */
 		goto err_out;
 	if (unlikely(SG_IS_DETACHING(sdp))) {
@@ -901,24 +1052,7 @@ sg_common_write(struct sg_fd *sfp, struct sg_comm_wr_t *cwrp)
 		goto err_out;
 	}
 	srp->rq->timeout = cwrp->timeout;
-	kref_get(&sfp->f_ref); /* sg_rq_end_io() does kref_put(). */
-	res = sg_rq_state_chg(srp, SG_RS_BUSY, SG_RS_INFLIGHT, false,
-			      __func__);
-	if (res)
-		goto err_out;
-	srp->start_ns = ktime_get_boottime_ns();
-	srp->duration = 0;
-
-	if (srp->s_hdr3.interface_id == '\0')
-		at_head = true; /* backward compatibility: v1+v2 interfaces */
-	else if (test_bit(SG_FFD_Q_AT_TAIL, sfp->ffd_bm))
-	/* cmd flags can override sfd setting */
-		at_head = !!(srp->rq_flags & SG_FLAG_Q_AT_HEAD);
-	else            /* this sfd is defaulting to head */
-		at_head = !(srp->rq_flags & SG_FLAG_Q_AT_TAIL);
-	if (!test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm))
-		atomic_inc(&sfp->submitted);
-	blk_execute_rq_nowait(sdp->disk, srp->rq, at_head, sg_rq_end_io);
+	sg_execute_cmd(sfp, srp);
 	return srp;
 err_out:
 	sg_deact_request(sfp, srp);
@@ -930,7 +1064,6 @@ sg_common_write(struct sg_fd *sfp, struct sg_comm_wr_t *cwrp)
  * sg_ctl_ioreceive(). wait_event_interruptible will return if this one
  * returns true (or an event like a signal (e.g. control-C) occurs).
  */
-
 static inline bool
 sg_get_ready_srp(struct sg_fd *sfp, struct sg_request **srpp, int pack_id)
 {
@@ -950,7 +1083,7 @@ sg_get_ready_srp(struct sg_fd *sfp, struct sg_request **srpp, int pack_id)
  * negated errno value.
  */
 static int
-sg_copy_sense(struct sg_request *srp)
+sg_copy_sense(struct sg_request *srp, bool v4_active)
 {
 	int sb_len_ret = 0;
 	int scsi_stat;
@@ -960,11 +1093,18 @@ sg_copy_sense(struct sg_request *srp)
 	if ((scsi_stat & SAM_STAT_CHECK_CONDITION) ||
 	    (driver_byte(srp->rq_result) & DRIVER_SENSE)) {
 		int sb_len = min_t(int, SCSI_SENSE_BUFFERSIZE, srp->sense_len);
-		int mx_sb_len = srp->s_hdr3.mx_sb_len;
+		int mx_sb_len;
 		u8 *sbp = srp->sense_bp;
-		void __user *up = srp->s_hdr3.sbp;
+		void __user *up;
 
 		srp->sense_bp = NULL;
+		if (v4_active) {
+			up = srp->s_hdr4.sbp;
+			mx_sb_len = srp->s_hdr4.max_sb_len;
+		} else {
+			up = (void __user *)srp->s_hdr3.sbp;
+			mx_sb_len = srp->s_hdr3.mx_sb_len;
+		}
 		if (up && mx_sb_len > 0 && sbp) {
 			sb_len = min_t(int, mx_sb_len, sb_len);
 			/* Additional sense length field */
@@ -981,14 +1121,16 @@ sg_copy_sense(struct sg_request *srp)
 }
 
 static int
-sg_rec_state_v3(struct sg_fd *sfp, struct sg_request *srp)
+sg_rec_state_v3v4(struct sg_fd *sfp, struct sg_request *srp, bool v4_active)
 {
-	int sb_len_wr;
 	u32 rq_res = srp->rq_result;
 
-	sb_len_wr = sg_copy_sense(srp);
-	if (sb_len_wr < 0)
-		return sb_len_wr;
+	if (unlikely(srp->rq_result & 0xff)) {
+		int sb_len_wr = sg_copy_sense(srp, v4_active);
+
+		if (sb_len_wr < 0)
+			return sb_len_wr;
+	}
 	if (rq_res & SG_ML_RESULT_MSK)
 		srp->rq_info |= SG_INFO_CHECK;
 	if (unlikely(SG_IS_DETACHING(sfp->parentdp)))
@@ -1015,7 +1157,7 @@ sg_receive_v3(struct sg_fd *sfp, struct sg_request *srp, size_t count,
 		goto err_out;
 	}
 	SG_LOG(3, sfp, "%s: srp=0x%pK\n", __func__, srp);
-	err = sg_rec_state_v3(sfp, srp);
+	err = sg_rec_state_v3v4(sfp, srp, false);
 	memset(hp, 0, sizeof(*hp));
 	memcpy(hp, &srp->s_hdr3, sizeof(srp->s_hdr3));
 	hp->sb_len_wr = srp->sense_len;
@@ -1039,11 +1181,103 @@ sg_receive_v3(struct sg_fd *sfp, struct sg_request *srp, size_t count,
 	return err;
 }
 
+static int
+sg_receive_v4(struct sg_fd *sfp, struct sg_request *srp, void __user *p,
+	      struct sg_io_v4 *h4p)
+{
+	int err, err2;
+	u32 rq_result = srp->rq_result;
+
+	SG_LOG(3, sfp, "%s: p=%s, h4p=%s\n", __func__,
+	       (p ? "given" : "NULL"), (h4p ? "given" : "NULL"));
+	err = sg_rec_state_v3v4(sfp, srp, true);
+	h4p->guard = 'Q';
+	h4p->protocol = 0;
+	h4p->subprotocol = 0;
+	h4p->device_status = rq_result & 0xff;
+	h4p->driver_status = driver_byte(rq_result);
+	h4p->transport_status = host_byte(rq_result);
+	h4p->response_len = srp->sense_len;
+	h4p->info = srp->rq_info;
+	h4p->flags = srp->rq_flags;
+	h4p->duration = srp->duration;
+	switch (srp->s_hdr4.dir) {
+	case SG_DXFER_FROM_DEV:
+		h4p->din_xfer_len = srp->sgat_h.dlen;
+		break;
+	case SG_DXFER_TO_DEV:
+		h4p->dout_xfer_len = srp->sgat_h.dlen;
+		break;
+	default:
+		break;
+	}
+	h4p->din_resid = srp->in_resid;
+	h4p->dout_resid = srp->s_hdr4.out_resid;
+	h4p->usr_ptr = srp->s_hdr4.usr_ptr;
+	h4p->response = (u64)(uintptr_t)srp->s_hdr4.sbp;
+	h4p->request_extra = srp->pack_id;
+	if (p) {
+		if (copy_to_user(p, h4p, SZ_SG_IO_V4))
+			err = err ? err : -EFAULT;
+	}
+	err2 = sg_rq_state_chg(srp, atomic_read(&srp->rq_st), SG_RS_RCV_DONE,
+			       false, __func__);
+	if (err2)
+		err = err ? err : err2;
+	sg_finish_scsi_blk_rq(srp);
+	sg_deact_request(sfp, srp);
+	return err < 0 ? err : 0;
+}
+
 /*
- * Completes a v3 request/command. Called from sg_read {v2 or v3},
- * ioctl(SG_IO) {for v3}, or from ioctl(SG_IORECEIVE) when its
- * completing a v3 request/command.
+ * Called when ioctl(SG_IORECEIVE) received. Expects a v4 interface object.
+ * Checks if O_NONBLOCK file flag given, if not checks given 'flags' field
+ * to see if SGV4_FLAG_IMMED is set. Either of these implies non blocking.
+ * When non-blocking and there is no request waiting, yields EAGAIN;
+ * otherwise it waits (i.e. it "blocks").
  */
+static int
+sg_ctl_ioreceive(struct file *filp, struct sg_fd *sfp, void __user *p)
+{
+	bool non_block = !!(filp->f_flags & O_NONBLOCK);
+	int res, id;
+	int pack_id = SG_PACK_ID_WILDCARD;
+	u8 v4_holder[SZ_SG_IO_V4];
+	struct sg_io_v4 *h4p = (struct sg_io_v4 *)v4_holder;
+	struct sg_device *sdp = sfp->parentdp;
+	struct sg_request *srp;
+
+	res = sg_allow_if_err_recovery(sdp, non_block);
+	if (res)
+		return res;
+	/* Copy in the whole sg_io_v4 object, then check guard and protocols */
+	if (copy_from_user(h4p, p, SZ_SG_IO_V4))
+		return -EFAULT;
+	/* for v4: protocol=0 --> SCSI;  subprotocol=0 --> SPC++ */
+	if (h4p->guard != 'Q' || h4p->protocol != 0 || h4p->subprotocol != 0)
+		return -EPERM;
+	if (h4p->flags & SGV4_FLAG_IMMED)
+		non_block = true;	/* set by either this or O_NONBLOCK */
+	SG_LOG(3, sfp, "%s: non_block(+IMMED)=%d\n", __func__, non_block);
+	/* id is currently always SG_PACK_ID_WILDCARD (match any request) */
+	id = pack_id;
+	srp = sg_find_srp_by_id(sfp, id);
+	if (!srp) {     /* nothing available so wait on packet or */
+		if (unlikely(SG_IS_DETACHING(sdp)))
+			return -ENODEV;
+		if (non_block)
+			return -EAGAIN;
+		res = wait_event_interruptible(sfp->read_wait,
+					       sg_get_ready_srp(sfp, &srp,
+								id));
+		if (unlikely(SG_IS_DETACHING(sdp)))
+			return -ENODEV;
+		if (res)	/* -ERESTARTSYS as signal hit process */
+			return res;
+	}	/* now srp should be valid */
+	return sg_receive_v4(sfp, srp, p, h4p);
+}
+
 static int
 sg_read_v1v2(void __user *buf, int count, struct sg_fd *sfp,
 	     struct sg_request *srp)
@@ -1320,6 +1554,8 @@ sg_fill_request_element(struct sg_fd *sfp, struct sg_request *srp,
 	rip->sg_io_owned = test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm);
 	rip->problem = !!(srp->rq_result & SG_ML_RESULT_MSK);
 	rip->pack_id = srp->pack_id;
+	rip->usr_ptr = test_bit(SG_FRQ_IS_V4I, srp->frq_bm) ?
+			uptr64(srp->s_hdr4.usr_ptr) : srp->s_hdr3.usr_ptr;
 	rip->usr_ptr = srp->s_hdr3.usr_ptr;
 	xa_unlock_irqrestore(&sfp->srp_arr, iflags);
 }
@@ -1337,7 +1573,7 @@ sg_rq_landed(struct sg_device *sdp, struct sg_request *srp)
  */
 static int
 sg_wait_event_srp(struct file *filp, struct sg_fd *sfp, void __user *p,
-		  struct sg_request *srp)
+		  struct sg_io_v4 *h4p, struct sg_request *srp)
 {
 	int res;
 	enum sg_rq_state sr_st;
@@ -1365,7 +1601,10 @@ sg_wait_event_srp(struct file *filp, struct sg_fd *sfp, void __user *p,
 	res = sg_rq_state_chg(srp, sr_st, SG_RS_BUSY, false, __func__);
 	if (unlikely(res))
 		return res;
-	res = sg_receive_v3(sfp, srp, SZ_SG_IO_HDR, p);
+	if (test_bit(SG_FRQ_IS_V4I, srp->frq_bm))
+		res = sg_receive_v4(sfp, srp, p, h4p);
+	else
+		res = sg_receive_v3(sfp, srp, SZ_SG_IO_HDR, p);
 	return (res < 0) ? res : 0;
 }
 
@@ -1379,8 +1618,9 @@ sg_ctl_sg_io(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 {
 	int res;
 	struct sg_request *srp = NULL;
-	u8 hu8arr[SZ_SG_IO_HDR];
+	u8 hu8arr[SZ_SG_IO_V4];
 	struct sg_io_hdr *h3p = (struct sg_io_hdr *)hu8arr;
+	struct sg_io_v4 *h4p = (struct sg_io_v4 *)hu8arr;
 
 	SG_LOG(3, sfp, "%s:  SG_IO%s\n", __func__,
 	       ((filp->f_flags & O_NONBLOCK) ? " O_NONBLOCK ignored" : ""));
@@ -1389,15 +1629,25 @@ sg_ctl_sg_io(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 		return res;
 	if (get_sg_io_hdr(h3p, p))
 		return -EFAULT;
-	if (h3p->interface_id == 'S')
-		res = sg_submit(filp, sfp, h3p, true, &srp);
-	else
+	if (h3p->interface_id == 'Q') {
+		/* copy in rest of sg_io_v4 object */
+		if (copy_from_user(hu8arr + SZ_SG_IO_HDR,
+				   ((u8 __user *)p) + SZ_SG_IO_HDR,
+				   SZ_SG_IO_V4 - SZ_SG_IO_HDR))
+			return -EFAULT;
+		res = sg_submit_v4(filp, sfp, p, h4p, true, &srp);
+	} else if (h3p->interface_id == 'S') {
+		res = sg_v3_submit(filp, sfp, h3p, true, &srp);
+	} else {
+		pr_info_once("sg: %s: v3 or v4 interface only here\n",
+			     __func__);
 		return -EPERM;
+	}
 	if (unlikely(res < 0))
 		return res;
 	if (!srp)	/* mrq case: already processed all responses */
 		return res;
-	res = sg_wait_event_srp(filp, sfp, p, srp);
+	res = sg_wait_event_srp(filp, sfp, p, h4p, srp);
 	if (res)
 		SG_LOG(1, sfp, "%s: %s=0x%pK  state: %s\n", __func__,
 		       "unexpected srp", srp,
@@ -1615,6 +1865,12 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 	switch (cmd_in) {
 	case SG_IO:
 		return sg_ctl_sg_io(filp, sdp, sfp, p);
+	case SG_IOSUBMIT:
+		SG_LOG(3, sfp, "%s:    SG_IOSUBMIT\n", __func__);
+		return sg_ctl_iosubmit(filp, sfp, p);
+	case SG_IORECEIVE:
+		SG_LOG(3, sfp, "%s:    SG_IORECEIVE\n", __func__);
+		return sg_ctl_ioreceive(filp, sfp, p);
 	case SG_GET_SCSI_ID:
 		return sg_ctl_scsi_id(sdev, sfp, p);
 	case SG_SET_FORCE_PACK_ID:
@@ -2097,8 +2353,16 @@ sg_rq_end_io(struct request *rq, blk_status_t status)
 	slen = min_t(int, scsi_rp->sense_len, SCSI_SENSE_BUFFERSIZE);
 	a_resid = scsi_rp->resid_len;
 
-	if (a_resid)
-		srp->in_resid = a_resid;
+	if (a_resid) {
+		if (test_bit(SG_FRQ_IS_V4I, srp->frq_bm)) {
+			if (rq_data_dir(rq) == READ)
+				srp->in_resid = a_resid;
+			else
+				srp->s_hdr4.out_resid = a_resid;
+		} else {
+			srp->in_resid = a_resid;
+		}
+	}
 
 	SG_LOG(6, sfp, "%s: pack_id=%d, res=0x%x\n", __func__, srp->pack_id,
 	       srp->rq_result);
@@ -2503,7 +2767,8 @@ sg_set_map_data(const struct sg_scatter_hold *schp, bool up_valid,
 }
 
 static int
-sg_start_req(struct sg_request *srp, u8 *cmd, int cmd_len, int dxfer_dir)
+sg_start_req(struct sg_request *srp, u8 *cmd, int cmd_len,
+	     struct sg_io_v4 *h4p, int dxfer_dir)
 {
 	bool reserved, us_xfer;
 	int res = 0;
@@ -2520,7 +2785,6 @@ sg_start_req(struct sg_request *srp, u8 *cmd, int cmd_len, int dxfer_dir)
 	struct rq_map_data *md = (void *)srp; /* want any non-NULL value */
 	u8 *long_cmdp = NULL;
 	__maybe_unused const char *cp = "";
-	struct sg_slice_hdr3 *sh3p = &srp->s_hdr3;
 	struct rq_map_data map_data;
 
 	sdp = sfp->parentdp;
@@ -2530,10 +2794,28 @@ sg_start_req(struct sg_request *srp, u8 *cmd, int cmd_len, int dxfer_dir)
 			return -ENOMEM;
 		SG_LOG(5, sfp, "%s: long_cmdp=0x%pK ++\n", __func__, long_cmdp);
 	}
-	up = sh3p->dxferp;
-	dxfer_len = (int)sh3p->dxfer_len;
-	iov_count = sh3p->iovec_count;
-	r0w = dxfer_dir == SG_DXFER_TO_DEV ? WRITE : READ;
+	if (h4p) {
+		if (dxfer_dir == SG_DXFER_TO_DEV) {
+			r0w = WRITE;
+			up = uptr64(h4p->dout_xferp);
+			dxfer_len = (int)h4p->dout_xfer_len;
+			iov_count = h4p->dout_iovec_count;
+		} else if (dxfer_dir == SG_DXFER_FROM_DEV) {
+			r0w = READ;
+			up = uptr64(h4p->din_xferp);
+			dxfer_len = (int)h4p->din_xfer_len;
+			iov_count = h4p->din_iovec_count;
+		} else {
+			up = NULL;
+		}
+	} else {
+		struct sg_slice_hdr3 *sh3p = &srp->s_hdr3;
+
+		up = sh3p->dxferp;
+		dxfer_len = (int)sh3p->dxfer_len;
+		iov_count = sh3p->iovec_count;
+		r0w = dxfer_dir == SG_DXFER_TO_DEV ? WRITE : READ;
+	}
 	SG_LOG(4, sfp, "%s: dxfer_len=%d, data-%s\n", __func__, dxfer_len,
 	       (r0w ? "OUT" : "IN"));
 	q = sdp->device->request_queue;
diff --git a/include/uapi/scsi/sg.h b/include/uapi/scsi/sg.h
index c5a813462631..7b733e826b62 100644
--- a/include/uapi/scsi/sg.h
+++ b/include/uapi/scsi/sg.h
@@ -99,6 +99,18 @@ typedef struct sg_io_hdr {
 #define SG_FLAG_Q_AT_TAIL 0x10
 #define SG_FLAG_Q_AT_HEAD 0x20
 
+/*
+ * Flags used by ioctl(SG_IOSUBMIT) [abbrev: SG_IOS] and ioctl(SG_IORECEIVE)
+ * [abbrev: SG_IOR] OR-ed into sg_io_v4::flags. The sync v4 interface uses
+ * ioctl(SG_IO) and can take these new flags, as can the v3 interface.
+ * These flags apply for SG_IOS unless otherwise noted. May be OR-ed together.
+ */
+#define SGV4_FLAG_DIRECT_IO SG_FLAG_DIRECT_IO
+#define SGV4_FLAG_MMAP_IO SG_FLAG_MMAP_IO
+#define SGV4_FLAG_Q_AT_TAIL SG_FLAG_Q_AT_TAIL
+#define SGV4_FLAG_Q_AT_HEAD SG_FLAG_Q_AT_HEAD
+#define SGV4_FLAG_IMMED 0x400 /* for polling with SG_IOR, ignored in SG_IOS */
+
 /* Output (potentially OR-ed together) in v3::info or v4::info field */
 #define SG_INFO_OK_MASK 0x1
 #define SG_INFO_OK 0x0		/* no sense, host nor driver "noise" */
@@ -134,7 +146,6 @@ typedef struct sg_req_info {	/* used by SG_GET_REQUEST_TABLE ioctl() */
 	/* sg_io_owned set imples synchronous, clear implies asynchronous */
 	char sg_io_owned;/* 0 -> complete with read(), 1 -> owned by SG_IO */
 	char problem;	/* 0 -> no problem detected, 1 -> error to report */
-	/* If SG_CTL_FLAGM_TAG_FOR_PACK_ID set on fd then next field is tag */
 	int pack_id;	/* pack_id, in v4 driver may be tag instead */
 	void __user *usr_ptr;	/* user provided pointer in v3+v4 interface */
 	unsigned int duration;
@@ -163,6 +174,13 @@ typedef struct sg_req_info {	/* used by SG_GET_REQUEST_TABLE ioctl() */
 #define SG_SET_RESERVED_SIZE 0x2275  /* request new reserved buffer size */
 #define SG_GET_RESERVED_SIZE 0x2272  /* actual size of reserved buffer */
 
+/*
+ * Historically the scsi/sg driver has used 0x22 as its ioctl base number.
+ * Add a define for that value and use it for several new ioctls added in
+ * version 4.0.01 and later of the sg driver.
+ */
+#define SG_IOCTL_MAGIC_NUM 0x22
+
 /* The following ioctl has a 'sg_scsi_id_t *' object as its 3rd argument. */
 #define SG_GET_SCSI_ID 0x2276   /* Yields fd's bus, chan, dev, lun + type */
 /* SCSI id information can also be obtained from SCSI_IOCTL_GET_IDLUN */
@@ -319,6 +337,23 @@ struct sg_header {
  */
 #define SG_NEXT_CMD_LEN 0x2283
 
+/*
+ * New ioctls to replace the async (non-blocking) write()/read() interface.
+ * Present in version 4 and later of the sg driver [>20190427]. The
+ * SG_IOSUBMIT and SG_IORECEIVE ioctls accept the sg_v4 interface based on
+ * struct sg_io_v4 found in <include/uapi/linux/bsg.h>. These objects are
+ * passed by a pointer in the third argument of the ioctl.
+ *
+ * These ioctls transfer data both from user space to the driver and from
+ * the driver back to user space. Hence the _IOWR macro is used here to
+ * generate the ioctl numbers rather than _IOW or _IOR.
+ */
+/* Submits a v4 interface object to driver, optionally receive tag back */
+#define SG_IOSUBMIT _IOWR(SG_IOCTL_MAGIC_NUM, 0x41, struct sg_io_v4)
+
+/* Gives some v4 identifying info to driver, receives associated response */
+#define SG_IORECEIVE _IOWR(SG_IOCTL_MAGIC_NUM, 0x42, struct sg_io_v4)
+
 /* command queuing is always on when the v3 or v4 interface is used */
 #define SG_DEF_COMMAND_Q 0
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v18 28/83] sg: rework debug info
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
                   ` (27 preceding siblings ...)
  2021-04-27 21:56 ` [PATCH v18 27/83] sg: add sg v4 interface support Douglas Gilbert
@ 2021-04-27 21:56 ` Douglas Gilbert
  2021-04-27 21:56 ` [PATCH v18 29/83] sg: add 8 byte SCSI LUN to sg_scsi_id Douglas Gilbert
                   ` (54 subsequent siblings)
  83 siblings, 0 replies; 86+ messages in thread
From: Douglas Gilbert @ 2021-04-27 21:56 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

Since the version 2 driver, the state of the driver can be viewed
with 'cat /proc/scsi/sg/debug'. As the driver becomes more
multi-threaded and IO becomes faster (e.g. scsi_debug with a
command timer of 5 microseconds), the existing state dump can be
misleading because the state may change while the "snapshot" is
being taken. The new approach in this patch is to allocate a
buffer of SG_PROC_DEBUG_SZ bytes and use scnprintf() to populate
it. Only when the whole state has been captured (or the buffer
fills) is it output to the reader. The previous approach was line
based: assemble a line of information and then output it.

Locks are taken as required, for short periods, and should not
interfere with a disk-IO-intensive program. Operations such as
closing an sg file descriptor or removing an sg device may be
held up briefly (for microseconds).

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 256 ++++++++++++++++++++++++++++++++--------------
 1 file changed, 177 insertions(+), 79 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index e0dd62001a1e..5569da92f2fe 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -2370,7 +2370,7 @@ sg_rq_end_io(struct request *rq, blk_status_t status)
 	if (unlikely((srp->rq_result & SG_ML_RESULT_MSK) && slen > 0))
 		sg_check_sense(sdp, srp, slen);
 	if (slen > 0) {
-		if (scsi_rp->sense) {
+		if (scsi_rp->sense && !srp->sense_bp) {
 			srp->sense_bp = mempool_alloc(sg_sense_pool,
 						      GFP_ATOMIC);
 			if (srp->sense_bp) {
@@ -2383,6 +2383,9 @@ sg_rq_end_io(struct request *rq, blk_status_t status)
 				pr_warn("%s: sense but can't alloc buffer\n",
 					__func__);
 			}
+		} else if (srp->sense_bp) {
+			slen = 0;
+			pr_warn("%s: non-NULL srp->sense_bp ? ?\n", __func__);
 		} else {
 			slen = 0;
 			pr_warn("%s: sense_len>0 but sense==NULL\n", __func__);
@@ -3892,116 +3895,211 @@ sg_proc_seq_show_devstrs(struct seq_file *s, void *v)
 	return 0;
 }
 
-/* must be called while holding sg_index_lock */
-static void
-sg_proc_debug_helper(struct seq_file *s, struct sg_device *sdp)
+/* Writes debug info for one sg_request in obp buffer */
+static int
+sg_proc_debug_sreq(struct sg_request *srp, int to, char *obp, int len)
 {
-	int k;
-	unsigned long idx, idx2;
+	bool is_v3v4, v4, is_dur;
+	int n = 0;
+	u32 dur;
+	enum sg_rq_state rq_st;
+	const char *cp;
+
+	if (len < 1)
+		return 0;
+	v4 = test_bit(SG_FRQ_IS_V4I, srp->frq_bm);
+	is_v3v4 = v4 ? true : (srp->s_hdr3.interface_id != '\0');
+	if (srp->parentfp->rsv_srp == srp)
+		cp = (is_v3v4 && (srp->rq_flags & SG_FLAG_MMAP_IO)) ?
+				"     mmap>> " : "     rsv>> ";
+	else
+		cp = (srp->rq_info & SG_INFO_DIRECT_IO_MASK) ?
+				"     dio>> " : "     ";
+	rq_st = atomic_read(&srp->rq_st);
+	dur = sg_get_dur(srp, &rq_st, &is_dur);
+	n += scnprintf(obp + n, len - n, "%s%s: dlen=%d/%d id=%d", cp,
+		       sg_rq_st_str(rq_st, false), srp->sgat_h.dlen,
+		       srp->sgat_h.buflen, (int)srp->pack_id);
+	if (is_dur)	/* cmd/req has completed, waiting for ... */
+		n += scnprintf(obp + n, len - n, " dur=%ums", dur);
+	else if (dur < U32_MAX)	/* in-flight or busy (so ongoing) */
+		n += scnprintf(obp + n, len - n, " t_o/elap=%us/%ums",
+			       to / 1000, dur);
+	n += scnprintf(obp + n, len - n, " sgat=%d op=0x%02x\n",
+		       srp->sgat_h.num_sgat, srp->cmd_opcode);
+	return n;
+}
+
+/* Writes debug info for one sg fd (including its sg requests) in obp buffer */
+static int
+sg_proc_debug_fd(struct sg_fd *fp, char *obp, int len, unsigned long idx)
+{
+	int n = 0;
+	int to, k;
+	unsigned long iflags;
 	struct sg_request *srp;
-	struct sg_fd *fp;
-	const char * cp;
-	unsigned int ms;
 
+	/* sgat=-1 means unavailable */
+	to = (fp->timeout >= 0) ? jiffies_to_msecs(fp->timeout) : -999;
+	if (to < 0)
+		n += scnprintf(obp + n, len - n, "BAD timeout=%d",
+			       fp->timeout);
+	else if (to % 1000)
+		n += scnprintf(obp + n, len - n, "timeout=%dms rs", to);
+	else
+		n += scnprintf(obp + n, len - n, "timeout=%ds rs", to / 1000);
+	n += scnprintf(obp + n, len - n, "v_buflen=%d idx=%lu\n   cmd_q=%d ",
+		       fp->rsv_srp->sgat_h.buflen, idx,
+		       (int)test_bit(SG_FFD_CMD_Q, fp->ffd_bm));
+	n += scnprintf(obp + n, len - n,
+		       "f_packid=%d k_orphan=%d ffd_bm=0x%lx\n",
+		       (int)test_bit(SG_FFD_FORCE_PACKID, fp->ffd_bm),
+		       (int)test_bit(SG_FFD_KEEP_ORPHAN, fp->ffd_bm),
+		       fp->ffd_bm[0]);
+	n += scnprintf(obp + n, len - n, "   mmap_called=%d\n",
+		       test_bit(SG_FFD_MMAP_CALLED, fp->ffd_bm));
+	n += scnprintf(obp + n, len - n,
+		       "   submitted=%d waiting=%d   open thr_id=%d\n",
+		       atomic_read(&fp->submitted),
+		       atomic_read(&fp->waiting), fp->tid);
+	k = 0;
+	xa_lock_irqsave(&fp->srp_arr, iflags);
+	xa_for_each(&fp->srp_arr, idx, srp) {
+		if (!srp)
+			continue;
+		if (xa_get_mark(&fp->srp_arr, idx, SG_XA_RQ_INACTIVE))
+			continue;
+		n += sg_proc_debug_sreq(srp, fp->timeout, obp + n, len - n);
+		++k;
+		if ((k % 8) == 0) {     /* don't hold up isr_s too long */
+			xa_unlock_irqrestore(&fp->srp_arr, iflags);
+			cpu_relax();
+			xa_lock_irqsave(&fp->srp_arr, iflags);
+		}
+	}
+	if (k == 0)
+		n += scnprintf(obp + n, len - n, "     No requests active\n");
 	k = 0;
+	xa_for_each_marked(&fp->srp_arr, idx, srp, SG_XA_RQ_INACTIVE) {
+		if (!srp)
+			continue;
+		if (k == 0)
+			n += scnprintf(obp + n, len - n, "   Inactives:\n");
+		n += sg_proc_debug_sreq(srp, fp->timeout, obp + n, len - n);
+		++k;
+		if ((k % 8) == 0) {     /* don't hold up isr_s too long */
+			xa_unlock_irqrestore(&fp->srp_arr, iflags);
+			cpu_relax();
+			xa_lock_irqsave(&fp->srp_arr, iflags);
+		}
+	}
+	xa_unlock_irqrestore(&fp->srp_arr, iflags);
+	return n;
+}
+
+/* Writes debug info for one sg device (including its sg fds) in obp buffer */
+static int
+sg_proc_debug_sdev(struct sg_device *sdp, char *obp, int len, int *fd_counterp)
+{
+	int n = 0;
+	int my_count = 0;
+	unsigned long idx;
+	struct scsi_device *ssdp = sdp->device;
+	struct sg_fd *fp;
+	char *disk_name;
+	int *countp;
+
+	countp = fd_counterp ? fd_counterp : &my_count;
+	disk_name = (sdp->disk ? sdp->disk->disk_name : "?_?");
+	n += scnprintf(obp + n, len - n, " >>> device=%s ", disk_name);
+	n += scnprintf(obp + n, len - n, "%d:%d:%d:%llu ", ssdp->host->host_no,
+		       ssdp->channel, ssdp->id, ssdp->lun);
+	n += scnprintf(obp + n, len - n,
+		       "  max_sgat_sz,elems=2^%d,%d excl=%d open_cnt=%d\n",
+		       ilog2(sdp->max_sgat_sz), sdp->max_sgat_elems,
+		       SG_HAVE_EXCLUDE(sdp), atomic_read(&sdp->open_cnt));
 	xa_for_each(&sdp->sfp_arr, idx, fp) {
 		if (!fp)
 			continue;
-		k++;
-		seq_printf(s, "   FD(%d): timeout=%dms buflen=%d (res)sgat=%d low_dma=%d idx=%lu\n",
-			   k, jiffies_to_msecs(fp->timeout),
-			   fp->rsv_srp->sgat_h.buflen,
-			   (int)fp->rsv_srp->sgat_h.num_sgat,
-			   (int)sdp->device->host->unchecked_isa_dma, idx);
-		seq_printf(s, "   cmd_q=%d f_packid=%d k_orphan=%d closed=0\n",
-			   (int)test_bit(SG_FFD_CMD_Q, fp->ffd_bm),
-			   (int)test_bit(SG_FFD_FORCE_PACKID, fp->ffd_bm),
-			   (int)test_bit(SG_FFD_KEEP_ORPHAN, fp->ffd_bm));
-		seq_printf(s, "   submitted=%d waiting=%d\n",
-			   atomic_read(&fp->submitted),
-			   atomic_read(&fp->waiting));
-		xa_for_each(&fp->srp_arr, idx2, srp) {
-			const struct sg_slice_hdr3 *sh3p = &srp->s_hdr3;
-			bool is_v3 = (sh3p->interface_id != '\0');
-			enum sg_rq_state rq_st = atomic_read(&srp->rq_st);
-
-			if (!srp)
-				continue;
-			if (srp->parentfp->rsv_srp == srp) {
-				if (is_v3 && (SG_FLAG_MMAP_IO & sh3p->flags))
-					cp = "     mmap>> ";
-				else
-					cp = "     rb>> ";
-			} else {
-				if (SG_INFO_DIRECT_IO_MASK & srp->rq_info)
-					cp = "     dio>> ";
-				else
-					cp = "     ";
-			}
-			seq_puts(s, cp);
-			seq_puts(s, sg_rq_st_str(rq_st, false));
-			seq_printf(s, ": id=%d len/blen=%d/%d",
-				   srp->pack_id, srp->sgat_h.dlen,
-				   srp->sgat_h.buflen);
-			if (rq_st == SG_RS_AWAIT_RCV ||
-			    rq_st == SG_RS_RCV_DONE) {
-				seq_printf(s, " dur=%d", srp->duration);
-				goto fin_line;
-			}
-			ms = jiffies_to_msecs(jiffies);
-			seq_printf(s, " t_o/elap=%d/%d",
-				   (is_v3 ? sh3p->timeout :
-					    jiffies_to_msecs(fp->timeout)),
-				   (ms > srp->duration ?  ms - srp->duration :
-							  0));
-fin_line:
-			seq_printf(s, "ms sgat=%d op=0x%02x\n",
-				   srp->sgat_h.num_sgat, (int)srp->cmd_opcode);
-		}
-		if (xa_empty(&fp->srp_arr))
-			seq_puts(s, "     No requests active\n");
+		++*countp;
+		n += scnprintf(obp + n, len - n, "  FD(%d): ", *countp);
+		n += sg_proc_debug_fd(fp, obp + n, len - n, idx);
 	}
+	return n;
 }
 
+/* Called via dbg_seq_ops once for each sg device */
 static int
 sg_proc_seq_show_debug(struct seq_file *s, void *v)
 {
+	bool found = false;
+	bool trunc = false;
+	const int bp_len = SG_PROC_DEBUG_SZ;
+	int n = 0;
+	int k = 0;
+	unsigned long iflags;
 	struct sg_proc_deviter *it = (struct sg_proc_deviter *)v;
 	struct sg_device *sdp;
-	unsigned long iflags;
+	int *fdi_p;
+	char *bp;
+	char *disk_name;
+	char b1[128];
 
+	b1[0] = '\0';
 	if (it && (0 == it->index))
 		seq_printf(s, "max_active_device=%d  def_reserved_size=%d\n",
-			   (int)it->max, sg_big_buff);
-
+			   (int)it->max, def_reserved_size);
+	fdi_p = it ? &it->fd_index : &k;
+	bp = kzalloc(bp_len, __GFP_NOWARN | GFP_KERNEL);
+	if (!bp) {
+		seq_printf(s, "%s: Unable to allocate %d on heap, finish\n",
+			   __func__, bp_len);
+		return -ENOMEM;
+	}
 	read_lock_irqsave(&sg_index_lock, iflags);
 	sdp = it ? sg_lookup_dev(it->index) : NULL;
 	if (NULL == sdp)
 		goto skip;
 	if (!xa_empty(&sdp->sfp_arr)) {
-		seq_printf(s, " >>> device=%s ", sdp->disk->disk_name);
+		found = true;
+		disk_name = (sdp->disk ? sdp->disk->disk_name : "?_?");
 		if (SG_IS_DETACHING(sdp))
-			seq_puts(s, "detaching pending close ");
+			snprintf(b1, sizeof(b1), " >>> device=%s  %s\n",
+				 disk_name, "detaching pending close\n");
 		else if (sdp->device) {
-			struct scsi_device *scsidp = sdp->device;
-
-			seq_printf(s, "%d:%d:%d:%llu   em=%d",
-				   scsidp->host->host_no,
-				   scsidp->channel, scsidp->id,
-				   scsidp->lun,
-				   scsidp->host->hostt->emulated);
+			n = sg_proc_debug_sdev(sdp, bp, bp_len, fdi_p);
+			if (n >= bp_len - 1) {
+				trunc = true;
+				if (bp[bp_len - 2] != '\n')
+					bp[bp_len - 2] = '\n';
+			}
+		} else {
+			snprintf(b1, sizeof(b1), " >>> device=%s  %s\n",
+				 disk_name, "sdp->device==NULL, skip");
 		}
-		seq_printf(s, " max_sgat_sz=%d excl=%d open_cnt=%d\n",
-			   sdp->max_sgat_sz, SG_HAVE_EXCLUDE(sdp),
-			   atomic_read(&sdp->open_cnt));
-		sg_proc_debug_helper(s, sdp);
 	}
 skip:
 	read_unlock_irqrestore(&sg_index_lock, iflags);
+	if (found) {
+		if (n > 0) {
+			seq_puts(s, bp);
+			if (seq_has_overflowed(s))
+				goto s_ovfl;
+			if (trunc)
+				seq_printf(s, "   >> Output truncated %s\n",
+					   "due to buffer size");
+		} else if (b1[0]) {
+			seq_puts(s, b1);
+			if (seq_has_overflowed(s))
+				goto s_ovfl;
+		}
+	}
+s_ovfl:
+	kfree(bp);
 	return 0;
 }
 
-#endif				/* CONFIG_SCSI_PROC_FS (~300 lines back) */
+#endif				/* CONFIG_SCSI_PROC_FS (~400 lines back) */
 
 module_init(init_sg);
 module_exit(exit_sg);
-- 
2.25.1


* [PATCH v18 29/83] sg: add 8 byte SCSI LUN to sg_scsi_id
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
                   ` (28 preceding siblings ...)
  2021-04-27 21:56 ` [PATCH v18 28/83] sg: rework debug info Douglas Gilbert
@ 2021-04-27 21:56 ` Douglas Gilbert
  2021-04-27 21:56 ` [PATCH v18 30/83] sg: expand sg_comm_wr_t Douglas Gilbert
                   ` (53 subsequent siblings)
  83 siblings, 0 replies; 86+ messages in thread
From: Douglas Gilbert @ 2021-04-27 21:56 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare, Hannes Reinecke

The existing ioctl(SG_GET_SCSI_ID) fills an object of type
struct sg_scsi_id whose last field is int unused[2]. Add
an anonymous union with u8 scsi_lun[8] sharing those last
8 bytes. When ioctl(SG_GET_SCSI_ID) is called, this patch
places the current device's full LUN in the scsi_lun array
using T10's preferred LUN format (i.e. an array of 8 bytes).

Note that the structure already contains a 'lun' field, but
that is a 32-bit integer. New code should use the scsi_lun
array field from now on, but existing code can remain as it
is and will get the same 'lun' value from either the version
3 or the version 4 driver.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c      | 8 +++++---
 include/uapi/scsi/sg.h | 5 ++++-
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 5569da92f2fe..973fc910a60a 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -1214,7 +1214,7 @@ sg_receive_v4(struct sg_fd *sfp, struct sg_request *srp, void __user *p,
 	h4p->din_resid = srp->in_resid;
 	h4p->dout_resid = srp->s_hdr4.out_resid;
 	h4p->usr_ptr = srp->s_hdr4.usr_ptr;
-	h4p->response = (u64)srp->s_hdr4.sbp;
+	h4p->response = (uintptr_t)srp->s_hdr4.sbp;
 	h4p->request_extra = srp->pack_id;
 	if (p) {
 		if (copy_to_user(p, h4p, SZ_SG_IO_V4))
@@ -1827,6 +1827,7 @@ static int
 sg_ctl_scsi_id(struct scsi_device *sdev, struct sg_fd *sfp, void __user *p)
 {
 	struct sg_scsi_id ss_id;
+	struct scsi_lun lun8b;
 
 	SG_LOG(3, sfp, "%s:    SG_GET_SCSI_ID\n", __func__);
 	ss_id.host_no = sdev->host->host_no;
@@ -1836,8 +1837,9 @@ sg_ctl_scsi_id(struct scsi_device *sdev, struct sg_fd *sfp, void __user *p)
 	ss_id.scsi_type = sdev->type;
 	ss_id.h_cmd_per_lun = sdev->host->cmd_per_lun;
 	ss_id.d_queue_depth = sdev->queue_depth;
-	ss_id.unused[0] = 0;
-	ss_id.unused[1] = 0;
+	int_to_scsilun(sdev->lun, &lun8b);
+	/* ss_id.scsi_lun is in an anonymous union with 'int unused[2]' */
+	memcpy(ss_id.scsi_lun, lun8b.scsi_lun, 8);
 	if (copy_to_user(p, &ss_id, sizeof(struct sg_scsi_id)))
 		return -EFAULT;
 	return 0;
diff --git a/include/uapi/scsi/sg.h b/include/uapi/scsi/sg.h
index 7b733e826b62..4a073708aca7 100644
--- a/include/uapi/scsi/sg.h
+++ b/include/uapi/scsi/sg.h
@@ -136,7 +136,10 @@ typedef struct sg_scsi_id {
 	int scsi_type;	/* TYPE_... defined in scsi/scsi.h */
 	short h_cmd_per_lun;/* host (adapter) maximum commands per lun */
 	short d_queue_depth;/* device (or adapter) maximum queue length */
-	int unused[2];
+	union {
+		int unused[2];  /* as per version 3 driver */
+		__u8 scsi_lun[8];  /* full 8 byte SCSI LUN [in v4 driver] */
+	};
 } sg_scsi_id_t;
 
 /* For backward compatibility v4 driver yields at most SG_MAX_QUEUE of these */
-- 
2.25.1


* [PATCH v18 30/83] sg: expand sg_comm_wr_t
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
                   ` (29 preceding siblings ...)
  2021-04-27 21:56 ` [PATCH v18 29/83] sg: add 8 byte SCSI LUN to sg_scsi_id Douglas Gilbert
@ 2021-04-27 21:56 ` Douglas Gilbert
  2021-04-27 21:56 ` [PATCH v18 31/83] sg: add sg_iosubmit_v3 and sg_ioreceive_v3 ioctls Douglas Gilbert
                   ` (52 subsequent siblings)
  83 siblings, 0 replies; 86+ messages in thread
From: Douglas Gilbert @ 2021-04-27 21:56 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare, Hannes Reinecke

The internal struct sg_comm_wr_t was added when the number of
arguments to sg_common_write() became excessive. Extend this idea
so that the sg_fetch_cmnd() calls in the various submit paths are
deferred until a scsi_request object is ready to receive the
command. This removes a 252-byte on-stack command array from every
submit path. Prior to this and a few other changes, the kernel
infrastructure warned about excessive stack usage.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 171 +++++++++++++++++++++++-----------------------
 1 file changed, 87 insertions(+), 84 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 973fc910a60a..3fd2feec462f 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -252,35 +252,37 @@ struct sg_device { /* holds the state of each scsi generic device */
 
 struct sg_comm_wr_t {  /* arguments to sg_common_write() */
 	int timeout;
+	int cmd_len;
 	unsigned long frq_bm[1];	/* see SG_FRQ_* defines above */
 	union {		/* selector is frq_bm.SG_FRQ_IS_V4I */
 		struct sg_io_hdr *h3p;
 		struct sg_io_v4 *h4p;
 	};
-	u8 *cmnd;
+	struct sg_fd *sfp;
+	struct file *filp;
+	const u8 __user *u_cmdp;
 };
 
 /* tasklet or soft irq callback */
 static void sg_rq_end_io(struct request *rq, blk_status_t status);
 /* Declarations of other static functions used before they are defined */
 static int sg_proc_init(void);
-static int sg_start_req(struct sg_request *srp, u8 *cmd, int cmd_len,
-			struct sg_io_v4 *h4p, int dxfer_dir);
+static int sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp,
+			int dxfer_dir);
 static void sg_finish_scsi_blk_rq(struct sg_request *srp);
 static int sg_mk_sgat(struct sg_request *srp, struct sg_fd *sfp, int minlen);
 static int sg_v3_submit(struct file *filp, struct sg_fd *sfp,
 			struct sg_io_hdr *hp, bool sync,
 			struct sg_request **o_srp);
-static struct sg_request *sg_common_write(struct sg_fd *sfp,
-					  struct sg_comm_wr_t *cwrp);
+static struct sg_request *sg_common_write(struct sg_comm_wr_t *cwrp);
 static int sg_read_append(struct sg_request *srp, void __user *outp,
 			  int num_xfer);
 static void sg_remove_sgat(struct sg_request *srp);
 static struct sg_fd *sg_add_sfp(struct sg_device *sdp);
 static void sg_remove_sfp(struct kref *);
 static struct sg_request *sg_find_srp_by_id(struct sg_fd *sfp, int id);
-static struct sg_request *sg_setup_req(struct sg_fd *sfp, int dxfr_len,
-				       struct sg_comm_wr_t *cwrp);
+static struct sg_request *sg_setup_req(struct sg_comm_wr_t *cwrp,
+				       int dxfr_len);
 static void sg_deact_request(struct sg_fd *sfp, struct sg_request *srp);
 static struct sg_device *sg_get_dev(int min_dev);
 static void sg_device_destroy(struct kref *kref);
@@ -571,7 +573,6 @@ sg_write(struct file *filp, const char __user *p, size_t count, loff_t *ppos)
 	struct sg_device *sdp;
 	struct sg_fd *sfp;
 	struct sg_request *srp;
-	u8 cmnd[SG_MAX_CDB_SIZE];
 	struct sg_header ov2hdr;
 	struct sg_io_hdr v3hdr;
 	struct sg_header *ohp = &ov2hdr;
@@ -683,9 +684,6 @@ sg_write(struct file *filp, const char __user *p, size_t count, loff_t *ppos)
 	h3p->flags = input_size;	/* structure abuse ... */
 	h3p->pack_id = ohp->pack_id;
 	h3p->usr_ptr = NULL;
-	cmnd[0] = opcode;
-	if (copy_from_user(cmnd + 1, p + 1, cmd_size - 1))
-		return -EFAULT;
 	/*
 	 * SG_DXFER_TO_FROM_DEV is functionally equivalent to SG_DXFER_FROM_DEV,
 	 * but it is possible that the app intended SG_DXFER_TO_DEV, because
@@ -697,13 +695,16 @@ sg_write(struct file *filp, const char __user *p, size_t count, loff_t *ppos)
 			 "%s: data in/out %d/%d bytes for SCSI command 0x%x-- guessing data in;\n"
 			 "   program %s not setting count and/or reply_len properly\n",
 			 __func__, ohp->reply_len - (int)SZ_SG_HEADER,
-			 input_size, (unsigned int)cmnd[0], current->comm);
+			 input_size, (unsigned int)opcode, current->comm);
 	}
-	cwr.frq_bm[0] = 0;	/* initial state clear for all req flags */
 	cwr.h3p = h3p;
+	cwr.frq_bm[0] = 0;
 	cwr.timeout = sfp->timeout;
-	cwr.cmnd = cmnd;
-	srp = sg_common_write(sfp, &cwr);
+	cwr.cmd_len = cmd_size;
+	cwr.filp = filp;
+	cwr.sfp = sfp;
+	cwr.u_cmdp = p;
+	srp = sg_common_write(&cwr);
 	return (IS_ERR(srp)) ? PTR_ERR(srp) : (int)count;
 }
 
@@ -744,31 +745,29 @@ static int
 sg_v3_submit(struct file *filp, struct sg_fd *sfp, struct sg_io_hdr *hp,
 	     bool sync, struct sg_request **o_srp)
 {
-	int res, timeout;
 	unsigned long ul_timeout;
 	struct sg_request *srp;
 	struct sg_comm_wr_t cwr;
-	u8 cmnd[SG_MAX_CDB_SIZE];
 
 	/* now doing v3 blocking (sync) or non-blocking submission */
 	if (hp->flags & SG_FLAG_MMAP_IO) {
-		res = sg_chk_mmap(sfp, hp->flags, hp->dxfer_len);
+		int res = sg_chk_mmap(sfp, hp->flags, hp->dxfer_len);
+
 		if (res)
 			return res;
 	}
 	/* when v3 seen, allow cmd_q on this fd (def: no cmd_q) */
 	set_bit(SG_FFD_CMD_Q, sfp->ffd_bm);
 	ul_timeout = msecs_to_jiffies(hp->timeout);
-	timeout = min_t(unsigned long, ul_timeout, INT_MAX);
-	res = sg_fetch_cmnd(filp, sfp, hp->cmdp, hp->cmd_len, cmnd);
-	if (res)
-		return res;
 	cwr.frq_bm[0] = 0;
 	__assign_bit(SG_FRQ_SYNC_INVOC, cwr.frq_bm, (int)sync);
 	cwr.h3p = hp;
-	cwr.timeout = timeout;
-	cwr.cmnd = cmnd;
-	srp = sg_common_write(sfp, &cwr);
+	cwr.timeout = min_t(unsigned long, ul_timeout, INT_MAX);
+	cwr.cmd_len = hp->cmd_len;
+	cwr.filp = filp;
+	cwr.sfp = sfp;
+	cwr.u_cmdp = hp->cmdp;
+	srp = sg_common_write(&cwr);
 	if (IS_ERR(srp))
 		return PTR_ERR(srp);
 	if (o_srp)
@@ -780,11 +779,10 @@ static int
 sg_submit_v4(struct file *filp, struct sg_fd *sfp, void __user *p,
 	     struct sg_io_v4 *h4p, bool sync, struct sg_request **o_srp)
 {
-	int timeout, res;
+	int res = 0;
 	unsigned long ul_timeout;
 	struct sg_request *srp;
 	struct sg_comm_wr_t cwr;
-	u8 cmnd[SG_MAX_CDB_SIZE];
 
 	if (h4p->flags & SG_FLAG_MMAP_IO) {
 		int len = 0;
@@ -800,18 +798,16 @@ sg_submit_v4(struct file *filp, struct sg_fd *sfp, void __user *p,
 	/* once v4 (or v3) seen, allow cmd_q on this fd (def: no cmd_q) */
 	set_bit(SG_FFD_CMD_Q, sfp->ffd_bm);
 	ul_timeout = msecs_to_jiffies(h4p->timeout);
-	timeout = min_t(unsigned long, ul_timeout, INT_MAX);
-	res = sg_fetch_cmnd(filp, sfp, cuptr64(h4p->request), h4p->request_len,
-			    cmnd);
-	if (res)
-		return res;
+	cwr.filp = filp;
+	cwr.sfp = sfp;
 	cwr.frq_bm[0] = 0;
-	assign_bit(SG_FRQ_SYNC_INVOC, cwr.frq_bm, (int)sync);
-	set_bit(SG_FRQ_IS_V4I, cwr.frq_bm);
+	__assign_bit(SG_FRQ_SYNC_INVOC, cwr.frq_bm, (int)sync);
+	__set_bit(SG_FRQ_IS_V4I, cwr.frq_bm);
 	cwr.h4p = h4p;
-	cwr.timeout = timeout;
-	cwr.cmnd = cmnd;
-	srp = sg_common_write(sfp, &cwr);
+	cwr.timeout = min_t(unsigned long, ul_timeout, INT_MAX);
+	cwr.cmd_len = h4p->request_len;
+	cwr.u_cmdp = cuptr64(h4p->request);
+	srp = sg_common_write(&cwr);
 	if (IS_ERR(srp))
 		return PTR_ERR(srp);
 	if (o_srp)
@@ -984,13 +980,14 @@ sg_execute_cmd(struct sg_fd *sfp, struct sg_request *srp)
  * N.B. pack_id placed in sg_io_v4::request_extra field.
  */
 static struct sg_request *
-sg_common_write(struct sg_fd *sfp, struct sg_comm_wr_t *cwrp)
+sg_common_write(struct sg_comm_wr_t *cwrp)
 {
 	int res = 0;
-	int dxfr_len, dir, cmd_len;
+	int dxfr_len, dir;
 	int pack_id = SG_PACK_ID_WILDCARD;
 	u32 rq_flags;
-	struct sg_device *sdp = sfp->parentdp;
+	struct sg_fd *fp = cwrp->sfp;
+	struct sg_device *sdp = fp->parentdp;
 	struct sg_request *srp;
 	struct sg_io_hdr *hi_p;
 	struct sg_io_v4 *h4p;
@@ -1022,40 +1019,36 @@ sg_common_write(struct sg_fd *sfp, struct sg_comm_wr_t *cwrp)
 	if (dxfr_len >= SZ_256M)
 		return ERR_PTR(-EINVAL);
 
-	srp = sg_setup_req(sfp, dxfr_len, cwrp);
+	srp = sg_setup_req(cwrp, dxfr_len);
 	if (IS_ERR(srp))
 		return srp;
 	srp->rq_flags = rq_flags;
 	srp->pack_id = pack_id;
 
 	if (h4p) {
-		memset(&srp->s_hdr4, 0, sizeof(srp->s_hdr4));
 		srp->s_hdr4.usr_ptr = h4p->usr_ptr;
 		srp->s_hdr4.sbp = uptr64(h4p->response);
 		srp->s_hdr4.max_sb_len = h4p->max_response_len;
 		srp->s_hdr4.cmd_len = h4p->request_len;
 		srp->s_hdr4.dir = dir;
-		cmd_len = h4p->request_len;
+		srp->s_hdr4.out_resid = 0;
 	} else {	/* v3 interface active */
-		cmd_len = hi_p->cmd_len;
 		memcpy(&srp->s_hdr3, hi_p, sizeof(srp->s_hdr3));
 	}
-	srp->cmd_opcode = cwrp->cmnd[0];/* hold opcode of command for debug */
-	SG_LOG(4, sfp, "%s: opcode=0x%02x, cdb_sz=%d, pack_id=%d\n", __func__,
-	       (int)cwrp->cmnd[0], cmd_len, pack_id);
-
-	res = sg_start_req(srp, cwrp->cmnd, cmd_len, h4p, dir);
+	res = sg_start_req(srp, cwrp, dir);
 	if (res < 0)		/* probably out of space --> -ENOMEM */
 		goto err_out;
+	SG_LOG(4, fp, "%s: opcode=0x%02x, cdb_sz=%d, pack_id=%d\n", __func__,
+	       srp->cmd_opcode, cwrp->cmd_len, pack_id);
 	if (unlikely(SG_IS_DETACHING(sdp))) {
 		res = -ENODEV;
 		goto err_out;
 	}
 	srp->rq->timeout = cwrp->timeout;
-	sg_execute_cmd(sfp, srp);
+	sg_execute_cmd(fp, srp);
 	return srp;
 err_out:
-	sg_deact_request(sfp, srp);
+	sg_deact_request(fp, srp);
 	return ERR_PTR(res);
 }
 
@@ -1272,8 +1265,8 @@ sg_ctl_ioreceive(struct file *filp, struct sg_fd *sfp, void __user *p)
 								id));
 		if (unlikely(SG_IS_DETACHING(sdp)))
 			return -ENODEV;
-		if (res)	/* -ERESTARTSYS as signal hit process */
-			return res;
+		if (res)
+			return res;	/* signal --> -ERESTARTSYS */
 	}	/* now srp should be valid */
 	return sg_receive_v4(sfp, srp, p, h4p);
 }
@@ -2708,7 +2701,7 @@ init_sg(void)
 	}
 	sg_sysfs_valid = true;
 	rc = scsi_register_interface(&sg_interface);
-	if (0 == rc) {
+	if (rc == 0) {
 		sg_proc_init();
 		return 0;
 	}
@@ -2759,11 +2752,10 @@ sg_chk_dio_allowed(struct sg_device *sdp, struct sg_request *srp,
 	return false;
 }
 
-static void
+static inline void
 sg_set_map_data(const struct sg_scatter_hold *schp, bool up_valid,
 		struct rq_map_data *mdp)
 {
-	memset(mdp, 0, sizeof(*mdp));
 	mdp->pages = schp->pages;
 	mdp->page_order = schp->page_order;
 	mdp->nr_entries = schp->num_sgat;
@@ -2772,8 +2764,7 @@ sg_set_map_data(const struct sg_scatter_hold *schp, bool up_valid,
 }
 
 static int
-sg_start_req(struct sg_request *srp, u8 *cmd, int cmd_len,
-	     struct sg_io_v4 *h4p, int dxfer_dir)
+sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 {
 	bool reserved, us_xfer;
 	int res = 0;
@@ -2783,7 +2774,7 @@ sg_start_req(struct sg_request *srp, u8 *cmd, int cmd_len,
 	void __user *up;
 	struct request *rq;
 	struct scsi_request *scsi_rp;
-	struct sg_fd *sfp = srp->parentfp;
+	struct sg_fd *sfp = cwrp->sfp;
 	struct sg_device *sdp;
 	struct sg_scatter_hold *req_schp;
 	struct request_queue *q;
@@ -2793,20 +2784,21 @@ sg_start_req(struct sg_request *srp, u8 *cmd, int cmd_len,
 	struct rq_map_data map_data;
 
 	sdp = sfp->parentdp;
-	if (cmd_len > BLK_MAX_CDB) {	/* for longer SCSI cdb_s */
-		long_cmdp = kzalloc(cmd_len, GFP_KERNEL);
+	if (cwrp->cmd_len > BLK_MAX_CDB) {	/* for longer SCSI cdb_s */
+		long_cmdp = kzalloc(cwrp->cmd_len, GFP_KERNEL);
 		if (!long_cmdp)
 			return -ENOMEM;
 		SG_LOG(5, sfp, "%s: long_cmdp=0x%pK ++\n", __func__, long_cmdp);
 	}
-	if (h4p) {
+	if (test_bit(SG_FRQ_IS_V4I, srp->frq_bm)) {
+		struct sg_io_v4 *h4p = cwrp->h4p;
+
 		if (dxfer_dir == SG_DXFER_TO_DEV) {
 			r0w = WRITE;
 			up = uptr64(h4p->dout_xferp);
 			dxfer_len = (int)h4p->dout_xfer_len;
 			iov_count = h4p->dout_iovec_count;
 		} else if (dxfer_dir == SG_DXFER_FROM_DEV) {
-			r0w = READ;
 			up = uptr64(h4p->din_xferp);
 			dxfer_len = (int)h4p->din_xfer_len;
 			iov_count = h4p->din_iovec_count;
@@ -2845,10 +2837,17 @@ sg_start_req(struct sg_request *srp, u8 *cmd, int cmd_len,
 	scsi_rp = scsi_req(rq);
 	srp->rq = rq;
 
-	if (cmd_len > BLK_MAX_CDB)
-		scsi_rp->cmd = long_cmdp;
-	memcpy(scsi_rp->cmd, cmd, cmd_len);
-	scsi_rp->cmd_len = cmd_len;
+	if (cwrp->cmd_len > BLK_MAX_CDB)
+		scsi_rp->cmd = long_cmdp;	/* transfer ownership */
+	if (cwrp->u_cmdp)
+		res = sg_fetch_cmnd(cwrp->filp, sfp, cwrp->u_cmdp,
+				    cwrp->cmd_len, scsi_rp->cmd);
+	else
+		res = -EPROTO;
+	if (res)
+		goto fini;
+	scsi_rp->cmd_len = cwrp->cmd_len;
+	srp->cmd_opcode = scsi_rp->cmd[0];
 	us_xfer = !(srp->rq_flags & SG_FLAG_NO_DXFER);
 	assign_bit(SG_FRQ_NO_US_XFER, srp->frq_bm, !us_xfer);
 	reserved = (sfp->rsv_srp == srp);
@@ -2871,7 +2870,7 @@ sg_start_req(struct sg_request *srp, u8 *cmd, int cmd_len,
 	}
 
 	if (likely(md)) {	/* normal, "indirect" IO */
-		if (unlikely((srp->rq_flags & SG_FLAG_MMAP_IO))) {
+		if (unlikely(srp->rq_flags & SG_FLAG_MMAP_IO)) {
 			/* mmap IO must use and fit in reserve request */
 			if (!reserved || dxfer_len > req_schp->buflen)
 				res = reserved ? -ENOMEM : -EBUSY;
@@ -2880,7 +2879,7 @@ sg_start_req(struct sg_request *srp, u8 *cmd, int cmd_len,
 
 			res = sg_mk_sgat(srp, sfp, up_sz);
 		}
-		if (res)
+		if (unlikely(res))
 			goto fini;
 
 		sg_set_map_data(req_schp, !!up, md);
@@ -3216,7 +3215,7 @@ sg_mk_srp_sgat(struct sg_fd *sfp, bool first, int db_len)
 		return n_srp;
 	if (db_len > 0) {
 		res = sg_mk_sgat(n_srp, sfp, db_len);
-		if (res) {
+		if (unlikely(res)) {
 			kfree(n_srp);
 			return ERR_PTR(res);
 		}
@@ -3269,18 +3268,18 @@ sg_build_reserve(struct sg_fd *sfp, int buflen)
  * failure returns a negated errno value twisted by ERR_PTR() macro.
  */
 static struct sg_request *
-sg_setup_req(struct sg_fd *sfp, int dxfr_len, struct sg_comm_wr_t *cwrp)
+sg_setup_req(struct sg_comm_wr_t *cwrp, int dxfr_len)
 {
 	bool act_empty = false;
 	bool found = false;
-	bool mk_new_srp = false;
+	bool mk_new_srp = true;
 	bool try_harder = false;
-	int res;
 	int num_inactive = 0;
 	unsigned long idx, last_idx, iflags;
+	struct sg_fd *fp = cwrp->sfp;
 	struct sg_request *r_srp = NULL;	/* request to return */
 	struct sg_request *last_srp = NULL;
-	struct xarray *xafp = &sfp->srp_arr;
+	struct xarray *xafp = &fp->srp_arr;
 	__maybe_unused const char *cp;
 
 start_again:
@@ -3333,18 +3332,19 @@ sg_setup_req(struct sg_fd *sfp, int dxfr_len, struct sg_comm_wr_t *cwrp)
 		mk_new_srp = true;
 	}
 	if (mk_new_srp) {
-		bool allow_cmd_q = test_bit(SG_FFD_CMD_Q, sfp->ffd_bm);
+		bool allow_cmd_q = test_bit(SG_FFD_CMD_Q, fp->ffd_bm);
+		int res;
 		u32 n_idx;
 		struct xa_limit xal = { .max = 0, .min = 0 };
 
 		cp = "new";
-		if (!allow_cmd_q && atomic_read(&sfp->submitted) > 0) {
+		if (!allow_cmd_q && atomic_read(&fp->submitted) > 0) {
 			r_srp = ERR_PTR(-EDOM);
-			SG_LOG(6, sfp, "%s: trying 2nd req but cmd_q=false\n",
+			SG_LOG(6, fp, "%s: trying 2nd req but cmd_q=false\n",
 			       __func__);
 			goto fini;
 		}
-		r_srp = sg_mk_srp_sgat(sfp, act_empty, dxfr_len);
+		r_srp = sg_mk_srp_sgat(fp, act_empty, dxfr_len);
 		if (IS_ERR(r_srp)) {
 			if (!try_harder && dxfr_len < SG_DEF_SECTOR_SZ &&
 			    num_inactive > 0) {
@@ -3355,11 +3355,11 @@ sg_setup_req(struct sg_fd *sfp, int dxfr_len, struct sg_comm_wr_t *cwrp)
 		}
 		atomic_set(&r_srp->rq_st, SG_RS_BUSY);
 		xa_lock_irqsave(xafp, iflags);
-		xal.max = atomic_inc_return(&sfp->req_cnt);
+		xal.max = atomic_inc_return(&fp->req_cnt);
 		res = __xa_alloc(xafp, &n_idx, r_srp, xal, GFP_KERNEL);
 		xa_unlock_irqrestore(xafp, iflags);
 		if (res < 0) {
-			SG_LOG(1, sfp, "%s: xa_alloc() failed, errno=%d\n",
+			SG_LOG(1, fp, "%s: xa_alloc() failed, errno=%d\n",
 			       __func__,  -res);
 			sg_remove_sgat(r_srp);
 			kfree(r_srp);
@@ -3368,17 +3368,18 @@ sg_setup_req(struct sg_fd *sfp, int dxfr_len, struct sg_comm_wr_t *cwrp)
 		}
 		idx = n_idx;
 		r_srp->rq_idx = idx;
-		r_srp->parentfp = sfp;
-		SG_LOG(4, sfp, "%s: mk_new_srp=0x%pK ++\n", __func__, r_srp);
+		r_srp->parentfp = fp;
+		SG_LOG(4, fp, "%s: mk_new_srp=0x%pK ++\n", __func__, r_srp);
 	}
 	r_srp->frq_bm[0] = cwrp->frq_bm[0];	/* assumes <= 32 req flags */
 	r_srp->sgat_h.dlen = dxfr_len;/* must be <= r_srp->sgat_h.buflen */
 	r_srp->cmd_opcode = 0xff;  /* set invalid opcode (VS), 0x0 is TUR */
 fini:
 	if (IS_ERR(r_srp))
-		SG_LOG(1, sfp, "%s: err=%ld\n", __func__, PTR_ERR(r_srp));
+		SG_LOG(1, fp, "%s: err=%ld\n", __func__, PTR_ERR(r_srp));
 	if (!IS_ERR(r_srp))
-		SG_LOG(4, sfp, "%s: %s r_srp=0x%pK\n", __func__, cp, r_srp);
+		SG_LOG(4, fp, "%s: %s %sr_srp=0x%pK\n", __func__, cp,
+		       ((r_srp == fp->rsv_srp) ? "[rsv] " : ""), r_srp);
 	return r_srp;
 }
 
@@ -3433,6 +3434,8 @@ sg_add_sfp(struct sg_device *sdp)
 	__assign_bit(SG_FFD_CMD_Q, sfp->ffd_bm, SG_DEF_COMMAND_Q);
 	__assign_bit(SG_FFD_KEEP_ORPHAN, sfp->ffd_bm, SG_DEF_KEEP_ORPHAN);
 	__assign_bit(SG_FFD_Q_AT_TAIL, sfp->ffd_bm, SG_DEFAULT_Q_AT);
+	atomic_set(&sfp->submitted, 0);
+	atomic_set(&sfp->waiting, 0);
 	/*
 	 * SG_SCATTER_SZ initializes scatter_elem_sz but different value may
 	 * be given as driver/module parameter (e.g. 'scatter_elem_sz=8192').
-- 
2.25.1



* [PATCH v18 31/83] sg: add sg_iosubmit_v3 and sg_ioreceive_v3 ioctls
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
                   ` (30 preceding siblings ...)
  2021-04-27 21:56 ` [PATCH v18 30/83] sg: expand sg_comm_wr_t Douglas Gilbert
@ 2021-04-27 21:56 ` Douglas Gilbert
  2021-04-27 21:56 ` [PATCH v18 32/83] sg: add some __must_hold macros Douglas Gilbert
                   ` (51 subsequent siblings)
  83 siblings, 0 replies; 86+ messages in thread
From: Douglas Gilbert @ 2021-04-27 21:56 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

Add ioctl(SG_IOSUBMIT_V3) and ioctl(SG_IORECEIVE_V3). These ioctls
are meant to be (almost) drop-in replacements for the asynchronous
write()/read() calls of the version 3 interface. They accept only
the version 3 interface.

See the table in the section titled "13 SG interface support
changes" on the webpage at: https://sg.danny.cz/sg/sg_v40.html

If sgv3 is a struct sg_io_hdr object, suitably configured, then
    res = write(sg_fd, &sgv3, sizeof(sgv3));
and
    res = ioctl(sg_fd, SG_IOSUBMIT_V3, &sgv3);
are equivalent. Ditto for read() and ioctl(SG_IORECEIVE_V3).
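
A minimal user-space sketch of that equivalence follows. It assumes
that <scsi/sg.h> defines SG_IOSUBMIT_V3 and SG_IORECEIVE_V3 only when
this patch's uapi header is installed, and falls back to write()/read()
otherwise. The helpers prep_tur_v3(), submit_v3() and receive_v3() are
hypothetical; TEST UNIT READY is used only because it moves no data.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <scsi/sg.h>

/* Fill in a v3 header for TEST UNIT READY (6 byte cdb, no data phase). */
static void prep_tur_v3(struct sg_io_hdr *h, unsigned char *cdb,
			unsigned char *sense, unsigned char sense_len,
			int pack_id)
{
	memset(cdb, 0, 6);		/* TEST UNIT READY: opcode 0x00 */
	memset(h, 0, sizeof(*h));
	h->interface_id = 'S';		/* marks a v3 (sg_io_hdr) object */
	h->dxfer_direction = SG_DXFER_NONE;
	h->cmd_len = 6;
	h->cmdp = cdb;
	h->mx_sb_len = sense_len;
	h->sbp = sense;
	h->timeout = 20000;		/* milliseconds */
	h->pack_id = pack_id;
}

/* Queue the request; returns 0 on success, -errno on failure. */
static int submit_v3(int sg_fd, struct sg_io_hdr *h)
{
	assert(h->interface_id == 'S');
#ifdef SG_IOSUBMIT_V3
	if (ioctl(sg_fd, SG_IOSUBMIT_V3, h) < 0)
		return -errno;
#else
	if (write(sg_fd, h, sizeof(*h)) < 0)
		return -errno;
#endif
	return 0;
}

/* Fetch a completed response into *h; returns 0 or -errno. */
static int receive_v3(int sg_fd, struct sg_io_hdr *h)
{
#ifdef SG_IORECEIVE_V3
	if (ioctl(sg_fd, SG_IORECEIVE_V3, h) < 0)
		return -errno;
#else
	if (read(sg_fd, h, sizeof(*h)) < 0)
		return -errno;
#endif
	return 0;
}
```

Normalizing both paths to "0 or -errno" hides the one visible
difference: write() and read() return a byte count, while the ioctls
return 0 on success.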

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c      | 76 ++++++++++++++++++++++++++++++++++++++++++
 include/uapi/scsi/sg.h |  6 ++++
 2 files changed, 82 insertions(+)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 3fd2feec462f..5c21f3ac7d9d 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -833,6 +833,24 @@ sg_ctl_iosubmit(struct file *filp, struct sg_fd *sfp, void __user *p)
 	return -EPERM;
 }
 
+static int
+sg_ctl_iosubmit_v3(struct file *filp, struct sg_fd *sfp, void __user *p)
+{
+	int res;
+	u8 hdr_store[SZ_SG_IO_V4];      /* max(v3interface, v4interface) */
+	struct sg_io_hdr *h3p = (struct sg_io_hdr *)hdr_store;
+	struct sg_device *sdp = sfp->parentdp;
+
+	res = sg_allow_if_err_recovery(sdp, (filp->f_flags & O_NONBLOCK));
+	if (unlikely(res))
+		return res;
+	if (copy_from_user(h3p, p, SZ_SG_IO_HDR))
+		return -EFAULT;
+	if (h3p->interface_id == 'S')
+		return sg_v3_submit(filp, sfp, h3p, false, NULL);
+	return -EPERM;
+}
+
 #if IS_ENABLED(SG_LOG_ACTIVE)
 static void
 sg_rq_state_fail_msg(struct sg_fd *sfp, enum sg_rq_state exp_old_st,
@@ -1156,6 +1174,7 @@ sg_receive_v3(struct sg_fd *sfp, struct sg_request *srp, size_t count,
 	hp->sb_len_wr = srp->sense_len;
 	hp->info = srp->rq_info;
 	hp->resid = srp->in_resid;
+	hp->pack_id = srp->pack_id;
 	hp->duration = srp->duration;
 	hp->status = rq_result & 0xff;
 	hp->masked_status = status_byte(rq_result);
@@ -1271,6 +1290,57 @@ sg_ctl_ioreceive(struct file *filp, struct sg_fd *sfp, void __user *p)
 	return sg_receive_v4(sfp, srp, p, h4p);
 }
 
+/*
+ * Called when ioctl(SG_IORECEIVE_V3) received. Expects a v3 interface.
+ * Checks if O_NONBLOCK file flag given, if not checks given flags field
+ * to see if SGV4_FLAG_IMMED is set. Either of these implies non blocking.
+ * When non-blocking and there is no request waiting, yields EAGAIN;
+ * otherwise it waits.
+ */
+static int
+sg_ctl_ioreceive_v3(struct file *filp, struct sg_fd *sfp, void __user *p)
+{
+	bool non_block = !!(filp->f_flags & O_NONBLOCK);
+	int res;
+	int pack_id = SG_PACK_ID_WILDCARD;
+	u8 v3_holder[SZ_SG_IO_HDR];
+	struct sg_io_hdr *h3p = (struct sg_io_hdr *)v3_holder;
+	struct sg_device *sdp = sfp->parentdp;
+	struct sg_request *srp;
+
+	res = sg_allow_if_err_recovery(sdp, non_block);
+	if (unlikely(res))
+		return res;
+	/* Copy the whole v3 header: interface_id and pack_id used below */
+	if (copy_from_user(h3p, p, SZ_SG_IO_HDR))
+		return -EFAULT;
+	/* for v3: interface_id=='S' (in a 32 bit int) */
+	if (h3p->interface_id != 'S')
+		return -EPERM;
+	if (h3p->flags & SGV4_FLAG_IMMED)
+		non_block = true;	/* set by either this or O_NONBLOCK */
+	SG_LOG(3, sfp, "%s: non_block(+IMMED)=%d\n", __func__, non_block);
+
+	if (test_bit(SG_FFD_FORCE_PACKID, sfp->ffd_bm))
+		pack_id = h3p->pack_id;
+
+	srp = sg_find_srp_by_id(sfp, pack_id);
+	if (!srp) {     /* nothing available so wait on packet or */
+		if (unlikely(SG_IS_DETACHING(sdp)))
+			return -ENODEV;
+		if (non_block)
+			return -EAGAIN;
+		res = wait_event_interruptible
+				(sfp->read_wait,
+				 sg_get_ready_srp(sfp, &srp, pack_id));
+		if (unlikely(SG_IS_DETACHING(sdp)))
+			return -ENODEV;
+		if (unlikely(res))
+			return res;	/* signal --> -ERESTARTSYS */
+	}	/* now srp should be valid */
+	return sg_receive_v3(sfp, srp, SZ_SG_IO_HDR, p);
+}
+
 static int
 sg_read_v1v2(void __user *buf, int count, struct sg_fd *sfp,
 	     struct sg_request *srp)
@@ -1863,9 +1933,15 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 	case SG_IOSUBMIT:
 		SG_LOG(3, sfp, "%s:    SG_IOSUBMIT\n", __func__);
 		return sg_ctl_iosubmit(filp, sfp, p);
+	case SG_IOSUBMIT_V3:
+		SG_LOG(3, sfp, "%s:    SG_IOSUBMIT_V3\n", __func__);
+		return sg_ctl_iosubmit_v3(filp, sfp, p);
 	case SG_IORECEIVE:
 		SG_LOG(3, sfp, "%s:    SG_IORECEIVE\n", __func__);
 		return sg_ctl_ioreceive(filp, sfp, p);
+	case SG_IORECEIVE_V3:
+		SG_LOG(3, sfp, "%s:    SG_IORECEIVE_V3\n", __func__);
+		return sg_ctl_ioreceive_v3(filp, sfp, p);
 	case SG_GET_SCSI_ID:
 		return sg_ctl_scsi_id(sdev, sfp, p);
 	case SG_SET_FORCE_PACK_ID:
diff --git a/include/uapi/scsi/sg.h b/include/uapi/scsi/sg.h
index 4a073708aca7..6373bc83c3b3 100644
--- a/include/uapi/scsi/sg.h
+++ b/include/uapi/scsi/sg.h
@@ -357,6 +357,12 @@ struct sg_header {
 /* Gives some v4 identifying info to driver, receives associated response */
 #define SG_IORECEIVE _IOWR(SG_IOCTL_MAGIC_NUM, 0x42, struct sg_io_v4)
 
+/* Submits a v3 interface object to driver */
+#define SG_IOSUBMIT_V3 _IOWR(SG_IOCTL_MAGIC_NUM, 0x45, struct sg_io_hdr)
+
+/* Gives some v3 identifying info to driver, receives associated response */
+#define SG_IORECEIVE_V3 _IOWR(SG_IOCTL_MAGIC_NUM, 0x46, struct sg_io_hdr)
+
 /* command queuing is always on when the v3 or v4 interface is used */
 #define SG_DEF_COMMAND_Q 0
 
-- 
2.25.1



* [PATCH v18 32/83] sg: add some __must_hold macros
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
                   ` (31 preceding siblings ...)
  2021-04-27 21:56 ` [PATCH v18 31/83] sg: add sg_iosubmit_v3 and sg_ioreceive_v3 ioctls Douglas Gilbert
@ 2021-04-27 21:56 ` Douglas Gilbert
  2021-04-27 21:56 ` [PATCH v18 33/83] sg: move procfs objects to avoid forward decls Douglas Gilbert
                   ` (50 subsequent siblings)
  83 siblings, 0 replies; 86+ messages in thread
From: Douglas Gilbert @ 2021-04-27 21:56 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

In the case of sg_wait_open_event(), which calls mutex_unlock() on
sdp->open_rel_lock and later calls mutex_lock() on the same lock,
this macro is needed to stop sparse from complaining. In the other
cases it serves as a reminder to the coder of a precondition.
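
The unlock-then-relock pattern that trips sparse can be sketched in
user space with pthreads. The __must_hold() fallback below is modelled
on the kernel's definition (a sparse context attribute under
__CHECKER__, empty otherwise); struct dev, wait_open_event() and its
max_tries bound are hypothetical stand-ins for sg_device,
sg_wait_open_event() and sdp->open_rel_lock.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

/* Modelled on the kernel: a sparse context annotation, empty for gcc. */
#ifdef __CHECKER__
#define __must_hold(x) __attribute__((context(x, 1, 1)))
#else
#define __must_hold(x)
#endif

struct dev {			/* hypothetical stand-in for sg_device */
	pthread_mutex_t open_rel_lock;
	bool exclusive_busy;	/* true while another opener is exclusive */
};

/*
 * Precondition: caller holds d->open_rel_lock. The lock is dropped while
 * waiting and re-taken before each test, so it is held again on return.
 * Returns 0 once the device is free, -EBUSY after max_tries waits.
 */
static int wait_open_event(struct dev *d, int max_tries)
	__must_hold(&d->open_rel_lock)
{
	for (int k = 0; d->exclusive_busy; k++) {
		if (k >= max_tries)
			return -EBUSY;
		pthread_mutex_unlock(&d->open_rel_lock);
		sched_yield();		/* let the other opener progress */
		pthread_mutex_lock(&d->open_rel_lock);
	}
	return 0;
}
```

Because the function leaves the lock count unchanged on every return
path, the annotation states a precondition rather than an acquire or
release, which is what silences the imbalance warning.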

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 5c21f3ac7d9d..b985b11cfbd1 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -374,6 +374,7 @@ sg_check_file_access(struct file *filp, const char *caller)
 
 static int
 sg_wait_open_event(struct sg_device *sdp, bool o_excl)
+		__must_hold(sdp->open_rel_lock)
 {
 	int res = 0;
 
@@ -1728,6 +1729,7 @@ sg_ctl_sg_io(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
  */
 static int
 sg_set_reserved_sz(struct sg_fd *sfp, int want_rsv_sz)
+		__must_hold(sfp->f_mutex)
 {
 	bool use_new_srp = false;
 	int res = 0;
@@ -3671,12 +3673,12 @@ sg_remove_sfp(struct kref *kref)
 
 static int
 sg_idr_max_id(int id, void *p, void *data)
+		__must_hold(sg_index_lock)
 {
 	int *k = data;
 
 	if (*k < id)
 		*k = id;
-
 	return 0;
 }
 
@@ -3979,6 +3981,7 @@ sg_proc_seq_show_devstrs(struct seq_file *s, void *v)
 /* Writes debug info for one sg_request in obp buffer */
 static int
 sg_proc_debug_sreq(struct sg_request *srp, int to, char *obp, int len)
+		__must_hold(sfp->srp_arr.xa_lock)
 {
 	bool is_v3v4, v4, is_dur;
 	int n = 0;
@@ -4081,6 +4084,7 @@ sg_proc_debug_fd(struct sg_fd *fp, char *obp, int len, unsigned long idx)
 /* Writes debug info for one sg device (including its sg fds) in obp buffer */
 static int
 sg_proc_debug_sdev(struct sg_device *sdp, char *obp, int len, int *fd_counterp)
+		__must_hold(sg_index_lock)
 {
 	int n = 0;
 	int my_count = 0;
-- 
2.25.1



* [PATCH v18 33/83] sg: move procfs objects to avoid forward decls
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
                   ` (32 preceding siblings ...)
  2021-04-27 21:56 ` [PATCH v18 32/83] sg: add some __must_hold macros Douglas Gilbert
@ 2021-04-27 21:56 ` Douglas Gilbert
  2021-04-27 21:56 ` [PATCH v18 34/83] sg: protect multiple receivers Douglas Gilbert
                   ` (49 subsequent siblings)
  83 siblings, 0 replies; 86+ messages in thread
From: Douglas Gilbert @ 2021-04-27 21:56 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

Move the procfs related file_operations and seq_operations
definitions toward the end of the source file to minimize the
need for forward declarations of the functions they name.
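
The ordering rule behind this move can be shown in miniature: an
initializer may name a static function without a prior prototype only
if that function's definition appears earlier in the file. The struct
and function names below are hypothetical and only loosely mirror
proc_ops and seq_operations.

```c
#include <assert.h>

struct show_ops {		/* hypothetical, cf. seq_operations */
	int (*start)(int pos);
	int (*show)(int pos);
};

static int demo_start(int pos)
{
	return pos;
}

static int demo_show(int pos)
{
	return pos * 2;
}

/*
 * Placed after the functions it names, so no forward declarations of
 * demo_start() or demo_show() are needed earlier in the file.
 */
static const struct show_ops demo_ops = {
	.start = demo_start,
	.show  = demo_show,
};
```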

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 129 +++++++++++++++++++++-------------------------
 1 file changed, 58 insertions(+), 71 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index b985b11cfbd1..fa464a9e0ea5 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -3741,77 +3741,6 @@ sg_rq_st_str(enum sg_rq_state rq_st, bool long_str)
 #endif
 
 #if IS_ENABLED(CONFIG_SCSI_PROC_FS)     /* long, almost to end of file */
-static int sg_proc_seq_show_int(struct seq_file *s, void *v);
-
-static int sg_proc_single_open_adio(struct inode *inode, struct file *filp);
-static ssize_t sg_proc_write_adio(struct file *filp, const char __user *buffer,
-			          size_t count, loff_t *off);
-static const struct proc_ops adio_proc_ops = {
-	.proc_open	= sg_proc_single_open_adio,
-	.proc_read	= seq_read,
-	.proc_lseek	= seq_lseek,
-	.proc_write	= sg_proc_write_adio,
-	.proc_release	= single_release,
-};
-
-static int sg_proc_single_open_dressz(struct inode *inode, struct file *filp);
-static ssize_t sg_proc_write_dressz(struct file *filp, 
-		const char __user *buffer, size_t count, loff_t *off);
-static const struct proc_ops dressz_proc_ops = {
-	.proc_open	= sg_proc_single_open_dressz,
-	.proc_read	= seq_read,
-	.proc_lseek	= seq_lseek,
-	.proc_write	= sg_proc_write_dressz,
-	.proc_release	= single_release,
-};
-
-static int sg_proc_seq_show_version(struct seq_file *s, void *v);
-static int sg_proc_seq_show_devhdr(struct seq_file *s, void *v);
-static int sg_proc_seq_show_dev(struct seq_file *s, void *v);
-static void * dev_seq_start(struct seq_file *s, loff_t *pos);
-static void * dev_seq_next(struct seq_file *s, void *v, loff_t *pos);
-static void dev_seq_stop(struct seq_file *s, void *v);
-static const struct seq_operations dev_seq_ops = {
-	.start = dev_seq_start,
-	.next  = dev_seq_next,
-	.stop  = dev_seq_stop,
-	.show  = sg_proc_seq_show_dev,
-};
-
-static int sg_proc_seq_show_devstrs(struct seq_file *s, void *v);
-static const struct seq_operations devstrs_seq_ops = {
-	.start = dev_seq_start,
-	.next  = dev_seq_next,
-	.stop  = dev_seq_stop,
-	.show  = sg_proc_seq_show_devstrs,
-};
-
-static int sg_proc_seq_show_debug(struct seq_file *s, void *v);
-static const struct seq_operations debug_seq_ops = {
-	.start = dev_seq_start,
-	.next  = dev_seq_next,
-	.stop  = dev_seq_stop,
-	.show  = sg_proc_seq_show_debug,
-};
-
-static int
-sg_proc_init(void)
-{
-	struct proc_dir_entry *p;
-
-	p = proc_mkdir("scsi/sg", NULL);
-	if (!p)
-		return 1;
-
-	proc_create("allow_dio", 0644, p, &adio_proc_ops);
-	proc_create_seq("debug", 0444, p, &debug_seq_ops);
-	proc_create("def_reserved_size", 0644, p, &dressz_proc_ops);
-	proc_create_single("device_hdr", 0444, p, sg_proc_seq_show_devhdr);
-	proc_create_seq("devices", 0444, p, &dev_seq_ops);
-	proc_create_seq("device_strs", 0444, p, &devstrs_seq_ops);
-	proc_create_single("version", 0444, p, sg_proc_seq_show_version);
-	return 0;
-}
 
 static int
 sg_last_dev(void)
@@ -4184,6 +4113,64 @@ sg_proc_seq_show_debug(struct seq_file *s, void *v)
 	return 0;
 }
 
+static const struct proc_ops adio_proc_ops = {
+	.proc_open      = sg_proc_single_open_adio,
+	.proc_read      = seq_read,
+	.proc_lseek     = seq_lseek,
+	.proc_write     = sg_proc_write_adio,
+	.proc_release   = single_release,
+};
+
+static const struct proc_ops dressz_proc_ops = {
+	.proc_open      = sg_proc_single_open_dressz,
+	.proc_read      = seq_read,
+	.proc_lseek     = seq_lseek,
+	.proc_write     = sg_proc_write_dressz,
+	.proc_release   = single_release,
+};
+
+static const struct seq_operations dev_seq_ops = {
+	.start = dev_seq_start,
+	.next  = dev_seq_next,
+	.stop  = dev_seq_stop,
+	.show  = sg_proc_seq_show_dev,
+};
+
+static const struct seq_operations devstrs_seq_ops = {
+	.start = dev_seq_start,
+	.next  = dev_seq_next,
+	.stop  = dev_seq_stop,
+	.show  = sg_proc_seq_show_devstrs,
+};
+
+static const struct seq_operations debug_seq_ops = {
+	.start = dev_seq_start,
+	.next  = dev_seq_next,
+	.stop  = dev_seq_stop,
+	.show  = sg_proc_seq_show_debug,
+};
+
+static int
+sg_proc_init(void)
+{
+	struct proc_dir_entry *p;
+
+	p = proc_mkdir("scsi/sg", NULL);
+	if (!p)
+		return 1;
+
+	proc_create("allow_dio", 0644, p, &adio_proc_ops);
+	proc_create_seq("debug", 0444, p, &debug_seq_ops);
+	proc_create("def_reserved_size", 0644, p, &dressz_proc_ops);
+	proc_create_single("device_hdr", 0444, p, sg_proc_seq_show_devhdr);
+	proc_create_seq("devices", 0444, p, &dev_seq_ops);
+	proc_create_seq("device_strs", 0444, p, &devstrs_seq_ops);
+	proc_create_single("version", 0444, p, sg_proc_seq_show_version);
+	return 0;
+}
+
+/* remove_proc_subtree("scsi/sg", NULL) in exit_sg() does cleanup */
+
 #endif				/* CONFIG_SCSI_PROC_FS (~400 lines back) */
 
 module_init(init_sg);
-- 
2.25.1



* [PATCH v18 34/83] sg: protect multiple receivers
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
                   ` (33 preceding siblings ...)
  2021-04-27 21:56 ` [PATCH v18 33/83] sg: move procfs objects to avoid forward decls Douglas Gilbert
@ 2021-04-27 21:56 ` Douglas Gilbert
  2021-04-27 21:56 ` [PATCH v18 35/83] sg: first debugfs support Douglas Gilbert
                   ` (48 subsequent siblings)
  83 siblings, 0 replies; 86+ messages in thread
From: Douglas Gilbert @ 2021-04-27 21:56 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

If two threads call ioctl(SG_IORECEIVE) [or read()] on the same
file descriptor, there is a potential race on the same request
response. Use atomic bit operations to make sure only one thread
gets each request response. [The other thread will either get
another request response or nothing.]

Also make sfp cleanup a bit more robust and report if the count
of submitted requests (which is decremented as each request
completes) is other than the expected value of zero.
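
The claim step can be illustrated in user space with C11 atomics:
atomic_fetch_or() plays the role of the kernel's test_and_set_bit(),
so of several threads racing on one request exactly one sees the bit
previously clear. This is a sketch; struct req, try_claim() and the
thread harness are hypothetical, and in the patch a losing thread
instead calls cpu_relax() and goes back to sg_find_srp_by_id().

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

#define FRQ_RECEIVING 7		/* same bit position as SG_FRQ_RECEIVING */

struct req {			/* hypothetical stand-in for sg_request */
	atomic_ulong frq_bm;
};

/* User-space analogue of test_and_set_bit(): returns the old bit value. */
static bool test_and_set_bit_ul(int nr, atomic_ulong *addr)
{
	unsigned long mask = 1UL << nr;

	return (atomic_fetch_or(addr, mask) & mask) != 0;
}

/* True only for the single caller that flips the bit from 0 to 1. */
static bool try_claim(struct req *rq)
{
	return !test_and_set_bit_ul(FRQ_RECEIVING, &rq->frq_bm);
}

static atomic_int winners;	/* how many receivers claimed the request */

static void *receiver(void *arg)
{
	if (try_claim(arg))
		atomic_fetch_add(&winners, 1);
	return NULL;
}
```

Clearing the whole bitmask when the request is deactivated (as
sg_deact_request() now does) is what makes the bit claimable again
when the request object is reused.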

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 48 ++++++++++++++++++++++++++++++++++-------------
 1 file changed, 35 insertions(+), 13 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index fa464a9e0ea5..ae6a77e0a148 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -109,6 +109,7 @@ enum sg_rq_state {	/* N.B. sg_rq_state_arr assumes SG_RS_AWAIT_RCV==2 */
 #define SG_FRQ_SYNC_INVOC	2	/* synchronous (blocking) invocation */
 #define SG_FRQ_NO_US_XFER	3	/* no user space transfer of data */
 #define SG_FRQ_DEACT_ORPHAN	6	/* not keeping orphan so de-activate */
+#define SG_FRQ_RECEIVING	7	/* guard against multiple receivers */
 
 /* Bit positions (flags) for sg_fd::ffd_bm bitmask follow */
 #define SG_FFD_FORCE_PACKID	0	/* receive only given pack_id/tag */
@@ -1274,6 +1275,7 @@ sg_ctl_ioreceive(struct file *filp, struct sg_fd *sfp, void __user *p)
 	SG_LOG(3, sfp, "%s: non_block(+IMMED)=%d\n", __func__, non_block);
 	/* read in part of v3 or v4 header for pack_id or tag based find */
 	id = pack_id;
+try_again:
 	srp = sg_find_srp_by_id(sfp, id);
 	if (!srp) {     /* nothing available so wait on packet or */
 		if (unlikely(SG_IS_DETACHING(sdp)))
@@ -1288,6 +1290,10 @@ sg_ctl_ioreceive(struct file *filp, struct sg_fd *sfp, void __user *p)
 		if (res)
 			return res;	/* signal --> -ERESTARTSYS */
 	}	/* now srp should be valid */
+	if (test_and_set_bit(SG_FRQ_RECEIVING, srp->frq_bm)) {
+		cpu_relax();
+		goto try_again;
+	}
 	return sg_receive_v4(sfp, srp, p, h4p);
 }
 
@@ -1324,7 +1330,7 @@ sg_ctl_ioreceive_v3(struct file *filp, struct sg_fd *sfp, void __user *p)
 
 	if (test_bit(SG_FFD_FORCE_PACKID, sfp->ffd_bm))
 		pack_id = h3p->pack_id;
-
+try_again:
 	srp = sg_find_srp_by_id(sfp, pack_id);
 	if (!srp) {     /* nothing available so wait on packet or */
 		if (unlikely(SG_IS_DETACHING(sdp)))
@@ -1339,6 +1345,10 @@ sg_ctl_ioreceive_v3(struct file *filp, struct sg_fd *sfp, void __user *p)
 		if (unlikely(res))
 			return res;	/* signal --> -ERESTARTSYS */
 	}	/* now srp should be valid */
+	if (test_and_set_bit(SG_FRQ_RECEIVING, srp->frq_bm)) {
+		cpu_relax();
+		goto try_again;
+	}
 	return sg_receive_v3(sfp, srp, SZ_SG_IO_HDR, p);
 }
 
@@ -1491,6 +1501,7 @@ sg_read(struct file *filp, char __user *p, size_t count, loff_t *ppos)
 			want_id = h2p->pack_id;
 		}
 	}
+try_again:
 	srp = sg_find_srp_by_id(sfp, want_id);
 	if (!srp) {	/* nothing available so wait on packet to arrive or */
 		if (unlikely(SG_IS_DETACHING(sdp)))
@@ -1506,6 +1517,10 @@ sg_read(struct file *filp, char __user *p, size_t count, loff_t *ppos)
 			return ret;
 		/* otherwise srp should be valid */
 	}
+	if (test_and_set_bit(SG_FRQ_RECEIVING, srp->frq_bm)) {
+		cpu_relax();
+		goto try_again;
+	}
 	if (srp->s_hdr3.interface_id == '\0')
 		ret = sg_read_v1v2(p, (int)count, sfp, srp);
 	else
@@ -3025,28 +3040,29 @@ sg_finish_scsi_blk_rq(struct sg_request *srp)
 		atomic_dec(&sfp->submitted);
 		atomic_dec(&sfp->waiting);
 	}
+
+	/* Expect blk_put_request(rq) already called in sg_rq_end_io() */
+	if (rq) {       /* blk_get_request() may have failed */
+		srp->rq = NULL;
+		if (scsi_req(rq))
+			scsi_req_free_cmd(scsi_req(rq));
+		blk_put_request(rq);
+	}
 	if (srp->bio) {
 		bool us_xfer = !test_bit(SG_FRQ_NO_US_XFER, srp->frq_bm);
+		struct bio *bio = srp->bio;
 
-		if (us_xfer) {
-			ret = blk_rq_unmap_user(srp->bio);
+		srp->bio = NULL;
+		if (us_xfer && bio) {
+			ret = blk_rq_unmap_user(bio);
 			if (ret) {	/* -EINTR (-4) can be ignored */
 				SG_LOG(6, sfp,
 				       "%s: blk_rq_unmap_user() --> %d\n",
 				       __func__, ret);
 			}
 		}
-		srp->bio = NULL;
-	}
-	/* In worst case READ data returned to user space by this point */
-
-	/* Expect blk_put_request(rq) already called in sg_rq_end_io() */
-	if (rq) {       /* blk_get_request() may have failed */
-		if (scsi_req(rq))
-			scsi_req_free_cmd(scsi_req(rq));
-		srp->rq = NULL;
-		blk_put_request(rq);
 	}
+	/* In worst case, READ data returned to user space by this point */
 }
 
 static int
@@ -3476,6 +3492,7 @@ sg_deact_request(struct sg_fd *sfp, struct sg_request *srp)
 		return;
 	sbp = srp->sense_bp;
 	srp->sense_bp = NULL;
+	srp->frq_bm[0] = 0;
 	sg_rq_state_chg(srp, 0, SG_RS_INACTIVE, true /* force */, __func__);
 	/* maybe orphaned req, thus never read */
 	if (sbp)
@@ -3608,6 +3625,7 @@ static void
 sg_remove_sfp_usercontext(struct work_struct *work)
 {
 	__maybe_unused int o_count;
+	int subm;
 	unsigned long idx, iflags;
 	struct sg_device *sdp;
 	struct sg_fd *sfp = container_of(work, struct sg_fd, ew_fd.work);
@@ -3645,6 +3663,10 @@ sg_remove_sfp_usercontext(struct work_struct *work)
 		SG_LOG(6, sfp, "%s: kfree: srp=%pK --\n", __func__, srp);
 		kfree(srp);
 	}
+	subm = atomic_read(&sfp->submitted);
+	if (subm != 0)
+		SG_LOG(1, sfp, "%s: expected submitted=0 got %d\n",
+		       __func__, subm);
 	xa_destroy(xafp);
 	xa_lock_irqsave(xadp, iflags);
 	e_sfp = __xa_erase(xadp, sfp->idx);
-- 
2.25.1



* [PATCH v18 35/83] sg: first debugfs support
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
                   ` (34 preceding siblings ...)
  2021-04-27 21:56 ` [PATCH v18 34/83] sg: protect multiple receivers Douglas Gilbert
@ 2021-04-27 21:56 ` Douglas Gilbert
  2021-04-27 21:56 ` [PATCH v18 36/83] sg: rework mmap support Douglas Gilbert
                   ` (47 subsequent siblings)
  83 siblings, 0 replies; 86+ messages in thread
From: Douglas Gilbert @ 2021-04-27 21:56 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

Duplicate the semantics of 'cat /proc/scsi/sg/debug' in
'cat /sys/kernel/debug/scsi_generic/snapshot'. Make the code
that generates the snapshot conditional on either
CONFIG_SCSI_PROC_FS or CONFIG_DEBUG_FS being defined.

Also add snapshot_devs, which can be written with a
comma-separated list of integers corresponding to sg (minor)
device numbers. Reading that file shows the current list. Minus
one (or any negative number) means "accept all" when in the
first position (the default) and marks the end of the list in
any later position. When a subsequent
    cat /sys/kernel/debug/scsi_generic/snapshot
is performed, only sg devices whose numbers match an element of
that list are output.
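
The list semantics described above can be sketched as follows. This
models only the rules stated in this message (a negative first entry
means all, a later negative entry ends the list); the parser, the
function names and the local copy of SG_SNAPSHOT_DEV_MAX are
assumptions for illustration, not the debugfs write handler in the
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define SNAPSHOT_DEV_MAX 4	/* mirrors SG_SNAPSHOT_DEV_MAX in the patch */

/*
 * Parse e.g. "1,3,-1" into devs[]. Unused slots are set to -1. Returns
 * the number of entries stored (extras are ignored) or -1 if malformed.
 */
static int parse_snapshot_devs(const char *s, int devs[SNAPSHOT_DEV_MAX])
{
	int n = 0;
	char *end;

	for (int k = 0; k < SNAPSHOT_DEV_MAX; k++)
		devs[k] = -1;
	while (*s && n < SNAPSHOT_DEV_MAX) {
		long v = strtol(s, &end, 10);

		if (end == s)
			return -1;	/* no digits where a number belongs */
		devs[n++] = (int)v;
		if (*end == ',')
			s = end + 1;
		else if (*end == '\0' || *end == '\n')
			break;
		else
			return -1;	/* junk after a number */
	}
	return n;
}

/* Would sg device number 'dev' be shown by a subsequent snapshot read? */
static bool snapshot_wants(const int devs[SNAPSHOT_DEV_MAX], int dev)
{
	if (devs[0] < 0)
		return true;		/* negative first entry: accept all */
	for (int k = 0; k < SNAPSHOT_DEV_MAX; k++) {
		if (devs[k] < 0)
			break;		/* negative later entry: end of list */
		if (devs[k] == dev)
			return true;
	}
	return false;
}
```

For example, writing "1,-1,4" selects only device 1, because the -1
in the second position terminates the list before 4 is reached.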

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 412 +++++++++++++++++++++++++++++++++++++---------
 1 file changed, 332 insertions(+), 80 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index ae6a77e0a148..fdbff29669d3 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -43,6 +43,7 @@ static char *sg_version_date = "20190606";
 #include <linux/cred.h>			/* for sg_check_file_access() */
 #include <linux/proc_fs.h>
 #include <linux/xarray.h>
+#include <linux/debugfs.h>
 
 #include <scsi/scsi.h>
 #include <scsi/scsi_eh.h>
@@ -67,6 +68,10 @@ static char *sg_version_date = "20190606";
 #endif
 #endif
 
+#if IS_ENABLED(CONFIG_SCSI_PROC_FS) || IS_ENABLED(CONFIG_DEBUG_FS)
+#define SG_PROC_OR_DEBUG_FS 1
+#endif
+
 /* SG_MAX_CDB_SIZE should be 260 (spc4r37 section 3.1.30) however the type
  * of sg_io_hdr::cmd_len can only represent 255. All SCSI commands greater
  * than 16 bytes are "variable length" whose length is a multiple of 4, so:
@@ -268,6 +273,8 @@ struct sg_comm_wr_t {  /* arguments to sg_common_write() */
 static void sg_rq_end_io(struct request *rq, blk_status_t status);
 /* Declarations of other static functions used before they are defined */
 static int sg_proc_init(void);
+static void sg_dfs_init(void);
+static void sg_dfs_exit(void);
 static int sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp,
 			int dxfer_dir);
 static void sg_finish_scsi_blk_rq(struct sg_request *srp);
@@ -2731,22 +2738,6 @@ sg_remove_device(struct device *cl_dev, struct class_interface *cl_intf)
 	kref_put(&sdp->d_ref, sg_device_destroy);
 }
 
-module_param_named(scatter_elem_sz, scatter_elem_sz, int, S_IRUGO | S_IWUSR);
-module_param_named(def_reserved_size, def_reserved_size, int,
-		   S_IRUGO | S_IWUSR);
-module_param_named(allow_dio, sg_allow_dio, int, S_IRUGO | S_IWUSR);
-
-MODULE_AUTHOR("Douglas Gilbert");
-MODULE_DESCRIPTION("SCSI generic (sg) driver");
-MODULE_LICENSE("GPL");
-MODULE_VERSION(SG_VERSION_STR);
-MODULE_ALIAS_CHARDEV_MAJOR(SCSI_GENERIC_MAJOR);
-
-MODULE_PARM_DESC(scatter_elem_sz, "scatter gather element "
-                "size (default: max(SG_SCATTER_SZ, PAGE_SIZE))");
-MODULE_PARM_DESC(def_reserved_size, "size of buffer reserved for each fd");
-MODULE_PARM_DESC(allow_dio, "allow direct I/O (default: 0 (disallow))");
-
 static int __init
 init_sg(void)
 {
@@ -2796,6 +2787,7 @@ init_sg(void)
 	rc = scsi_register_interface(&sg_interface);
 	if (rc == 0) {
 		sg_proc_init();
+		sg_dfs_init();
 		return 0;
 	}
 	class_destroy(sg_sysfs_class);
@@ -2809,17 +2801,10 @@ init_sg(void)
 	return rc;
 }
 
-#if !IS_ENABLED(CONFIG_SCSI_PROC_FS)
-static int
-sg_proc_init(void)
-{
-	return 0;
-}
-#endif
-
 static void __exit
 exit_sg(void)
 {
+	sg_dfs_exit();
 	if (IS_ENABLED(CONFIG_SCSI_PROC_FS))
 		remove_proc_subtree("scsi/sg", NULL);
 	scsi_unregister_interface(&sg_interface);
@@ -3256,7 +3241,7 @@ sg_find_srp_by_id(struct sg_fd *sfp, int pack_id)
 	/* here if one of above loops does _not_ find a match */
 	if (IS_ENABLED(CONFIG_SCSI_PROC_FS)) {
 		if (search_for_1) {
-			const char *cptp = "pack_id=";
+			__maybe_unused const char *cptp = "pack_id=";
 
 			if (is_bad_st)
 				SG_LOG(1, sfp, "%s: %s%d wrong state: %s\n",
@@ -3693,17 +3678,6 @@ sg_remove_sfp(struct kref *kref)
 	schedule_work(&sfp->ew_fd.work);
 }
 
-static int
-sg_idr_max_id(int id, void *p, void *data)
-		__must_hold(sg_index_lock)
-{
-	int *k = data;
-
-	if (*k < id)
-		*k = id;
-	return 0;
-}
-
 /* must be called with sg_index_lock held */
 static struct sg_device *
 sg_lookup_dev(int dev)
@@ -3733,7 +3707,7 @@ sg_get_dev(int dev)
 	return sdp;
 }
 
-#if IS_ENABLED(CONFIG_SCSI_PROC_FS)
+#if IS_ENABLED(SG_PROC_OR_DEBUG_FS)
 static const char *
 sg_rq_st_str(enum sg_rq_state rq_st, bool long_str)
 {
@@ -3762,7 +3736,35 @@ sg_rq_st_str(enum sg_rq_state rq_st, bool long_str)
 }
 #endif
 
-#if IS_ENABLED(CONFIG_SCSI_PROC_FS)     /* long, almost to end of file */
+#if IS_ENABLED(SG_PROC_OR_DEBUG_FS)
+
+#define SG_SNAPSHOT_DEV_MAX 4
+
+/*
+ * In snapshot_devs, -1 or two adjacent equal values act as a terminator.
+ * -1 in the first element, or the first two elements equal, implies all.
+ */
+static struct sg_dfs_context_t {
+	struct dentry *dfs_rootdir;
+	int snapshot_devs[SG_SNAPSHOT_DEV_MAX];
+} sg_dfs_cxt;
+
+struct sg_proc_deviter {
+	loff_t	index;
+	size_t	max;
+	int fd_index;
+};
+
+static int
+sg_idr_max_id(int id, void *p, void *data)
+		__must_hold(sg_index_lock)
+{
+	int *k = data;
+
+	if (*k < id)
+		*k = id;
+	return 0;
+}
 
 static int
 sg_last_dev(void)
@@ -3776,6 +3778,41 @@ sg_last_dev(void)
 	return k + 1;		/* origin 1 */
 }
 
+static void *
+dev_seq_start(struct seq_file *s, loff_t *pos)
+{
+	struct sg_proc_deviter *it = kzalloc(sizeof(*it), GFP_KERNEL);
+
+	s->private = it;
+	if (!it)
+		return NULL;
+
+	it->index = *pos;
+	it->max = sg_last_dev();
+	if (it->index >= it->max)
+		return NULL;
+	return it;
+}
+
+static void *
+dev_seq_next(struct seq_file *s, void *v, loff_t *pos)
+{
+	struct sg_proc_deviter *it = s->private;
+
+	*pos = ++it->index;
+	return (it->index < it->max) ? it : NULL;
+}
+
+static void
+dev_seq_stop(struct seq_file *s, void *v)
+{
+	kfree(s->private);
+}
+
+#endif			/* SG_PROC_OR_DEBUG_FS */
+
+#if IS_ENABLED(CONFIG_SCSI_PROC_FS)     /* around 100 lines */
+
 static int
 sg_proc_seq_show_int(struct seq_file *s, void *v)
 {
@@ -3789,7 +3826,7 @@ sg_proc_single_open_adio(struct inode *inode, struct file *filp)
 	return single_open(filp, sg_proc_seq_show_int, &sg_allow_dio);
 }
 
-static ssize_t 
+static ssize_t
 sg_proc_write_adio(struct file *filp, const char __user *buffer,
 		   size_t count, loff_t *off)
 {
@@ -3811,7 +3848,7 @@ sg_proc_single_open_dressz(struct inode *inode, struct file *filp)
 	return single_open(filp, sg_proc_seq_show_int, &sg_big_buff);
 }
 
-static ssize_t 
+static ssize_t
 sg_proc_write_dressz(struct file *filp, const char __user *buffer,
 		     size_t count, loff_t *off)
 {
@@ -3846,43 +3883,6 @@ sg_proc_seq_show_devhdr(struct seq_file *s, void *v)
 	return 0;
 }
 
-struct sg_proc_deviter {
-	loff_t	index;
-	size_t	max;
-	int fd_index;
-};
-
-static void *
-dev_seq_start(struct seq_file *s, loff_t *pos)
-{
-	struct sg_proc_deviter *it = kzalloc(sizeof(*it), GFP_KERNEL);
-
-	s->private = it;
-	if (! it)
-		return NULL;
-
-	it->index = *pos;
-	it->max = sg_last_dev();
-	if (it->index >= it->max)
-		return NULL;
-	return it;
-}
-
-static void *
-dev_seq_next(struct seq_file *s, void *v, loff_t *pos)
-{
-	struct sg_proc_deviter *it = s->private;
-
-	*pos = ++it->index;
-	return (it->index < it->max) ? it : NULL;
-}
-
-static void
-dev_seq_stop(struct seq_file *s, void *v)
-{
-	kfree(s->private);
-}
-
 static int
 sg_proc_seq_show_dev(struct seq_file *s, void *v)
 {
@@ -3929,6 +3929,10 @@ sg_proc_seq_show_devstrs(struct seq_file *s, void *v)
 	return 0;
 }
 
+#endif		/* CONFIG_SCSI_PROC_FS (~100 lines back) */
+
+#if IS_ENABLED(SG_PROC_OR_DEBUG_FS)
+
 /* Writes debug info for one sg_request in obp buffer */
 static int
 sg_proc_debug_sreq(struct sg_request *srp, int to, char *obp, int len)
@@ -4071,18 +4075,20 @@ sg_proc_seq_show_debug(struct seq_file *s, void *v)
 	bool found = false;
 	bool trunc = false;
 	const int bp_len = SG_PROC_DEBUG_SZ;
+	int j, sd_n;
 	int n = 0;
 	int k = 0;
 	unsigned long iflags;
 	struct sg_proc_deviter *it = (struct sg_proc_deviter *)v;
 	struct sg_device *sdp;
 	int *fdi_p;
+	const int *dev_arr = sg_dfs_cxt.snapshot_devs;
 	char *bp;
 	char *disk_name;
 	char b1[128];
 
 	b1[0] = '\0';
-	if (it && (0 == it->index))
+	if (it && it->index == 0)
 		seq_printf(s, "max_active_device=%d  def_reserved_size=%d\n",
 			   (int)it->max, def_reserved_size);
 	fdi_p = it ? &it->fd_index : &k;
@@ -4094,8 +4100,31 @@ sg_proc_seq_show_debug(struct seq_file *s, void *v)
 	}
 	read_lock_irqsave(&sg_index_lock, iflags);
 	sdp = it ? sg_lookup_dev(it->index) : NULL;
-	if (NULL == sdp)
+	if (!sdp)
 		goto skip;
+	sd_n = dev_arr[0];
+	if (sd_n != -1 && sd_n != sdp->index && sd_n != dev_arr[1]) {
+		for (j = 1; j < SG_SNAPSHOT_DEV_MAX; ) {
+			sd_n = dev_arr[j];
+			if (sd_n < 0)
+				goto skip;
+			++j;
+			if (j >= SG_SNAPSHOT_DEV_MAX) {
+				if (sd_n == sdp->index) {
+					found = true;
+					break;
+				}
+			} else if (sd_n == dev_arr[j]) {
+				goto skip;
+			} else if (sd_n == sdp->index) {
+				found = true;
+				break;
+			}
+		}
+		if (!found)
+			goto skip;
+		found = false;
+	}
 	if (!xa_empty(&sdp->sfp_arr)) {
 		found = true;
 		disk_name = (sdp->disk ? sdp->disk->disk_name : "?_?");
@@ -4135,6 +4164,10 @@ sg_proc_seq_show_debug(struct seq_file *s, void *v)
 	return 0;
 }
 
+#endif         /* SG_PROC_OR_DEBUG_FS */
+
+#if IS_ENABLED(CONFIG_SCSI_PROC_FS)
+
 static const struct proc_ops adio_proc_ops = {
 	.proc_open      = sg_proc_single_open_adio,
 	.proc_read      = seq_read,
@@ -4193,7 +4226,226 @@ sg_proc_init(void)
 
 /* remove_proc_subtree("scsi/sg", NULL) in exit_sg() does cleanup */
 
-#endif				/* CONFIG_SCSI_PROC_FS (~400 lines back) */
+#else
+
+static int
+sg_proc_init(void)
+{
+	return 0;
+}
+
+#endif			/* CONFIG_SCSI_PROC_FS */
+
+#if IS_ENABLED(CONFIG_DEBUG_FS)
+
+struct sg_dfs_attr {
+	const char *name;
+	umode_t mode;
+	int (*show)(void *d, struct seq_file *m);
+	ssize_t (*write)(void *d, const char __user *b, size_t s, loff_t *o);
+	/* Set either .show or .seq_ops. */
+	const struct seq_operations *seq_ops;
+};
 
+static int
+sg_dfs_snapshot_devs_show(void *data, struct seq_file *m)
+{
+	bool last;
+	int k, d;
+	struct sg_dfs_context_t *ctxp = data;
+
+	for (k = 0; k < SG_SNAPSHOT_DEV_MAX; ++k) {
+		d = ctxp->snapshot_devs[k];
+		last = (k + 1 == SG_SNAPSHOT_DEV_MAX);
+		if (d < 0) {
+			if (k == 0)
+				seq_puts(m, "-1");
+			break;
+		}
+		if (!last && d == ctxp->snapshot_devs[k + 1]) {
+			if (k == 0)
+				seq_puts(m, "-1");
+			break;
+		}
+		if (k != 0)
+			seq_puts(m, ",");
+		seq_printf(m, "%d", d);
+	}
+	seq_puts(m, "\n");
+	return 0;
+}
+
+static ssize_t
+sg_dfs_snapshot_devs_write(void *data, const char __user *buf, size_t count,
+			   loff_t *ppos)
+{
+	bool trailing_comma;
+	int k, n;
+	struct sg_dfs_context_t *cxtp = data;
+	char lbuf[64] = { }, *cp, *c2p;
+
+	if (count >= sizeof(lbuf)) {
+		pr_err("%s: operation too long\n", __func__);
+		return -EINVAL;
+	}
+	if (copy_from_user(lbuf, buf, count))
+		return -EFAULT;
+	for (k = 0, n = 0, cp = lbuf; k < SG_SNAPSHOT_DEV_MAX;
+	     ++k, cp = c2p + 1) {
+		c2p = strchr(cp, ',');
+		if (c2p)
+			*c2p = '\0';
+		trailing_comma = !!c2p;
+		/* sscanf is easier to use than this ... */
+		if (kstrtoint(cp, 10, cxtp->snapshot_devs + k))
+			break;
+		++n;
+		if (!trailing_comma)
+			break;
+	}
+	if (n == 0) {
+		return -EINVAL;
+	} else if (k >= SG_SNAPSHOT_DEV_MAX && trailing_comma) {
+		pr_err("%s: only %d elements in snapshot array\n", __func__,
+		       SG_SNAPSHOT_DEV_MAX);
+		return -EINVAL;
+	}
+	if (n < SG_SNAPSHOT_DEV_MAX)
+		cxtp->snapshot_devs[n] = -1;
+	return count;
+}
+
+static int
+sg_dfs_show(struct seq_file *m, void *v)
+{
+	const struct sg_dfs_attr *attr = m->private;
+	void *data = d_inode(m->file->f_path.dentry->d_parent)->i_private;
+
+	return attr->show(data, m);
+}
+
+static ssize_t
+sg_dfs_write(struct file *file, const char __user *buf, size_t count,
+	     loff_t *ppos)
+{
+	struct seq_file *m = file->private_data;
+	const struct sg_dfs_attr *attr = m->private;
+	void *data = d_inode(file->f_path.dentry->d_parent)->i_private;
+
+	/*
+	 * Attributes that only implement .seq_ops are read-only and 'attr' is
+	 * the same as 'data' in this case.
+	 */
+	if (attr == data || !attr->write)
+		return -EPERM;
+	return attr->write(data, buf, count, ppos);
+}
+
+static int
+sg_dfs_open(struct inode *inode, struct file *file)
+{
+	const struct sg_dfs_attr *attr = inode->i_private;
+	void *data = d_inode(file->f_path.dentry->d_parent)->i_private;
+	struct seq_file *m;
+	int ret;
+
+	if (attr->seq_ops) {
+		ret = seq_open(file, attr->seq_ops);
+		if (!ret) {
+			m = file->private_data;
+			m->private = data;
+		}
+		return ret;
+	}
+	if (WARN_ON_ONCE(!attr->show))
+		return -EPERM;
+	return single_open(file, sg_dfs_show, inode->i_private);
+}
+
+static int
+sg_dfs_release(struct inode *inode, struct file *file)
+{
+	const struct sg_dfs_attr *attr = inode->i_private;
+
+	if (attr->show)
+		return single_release(inode, file);
+	return seq_release(inode, file);
+}
+
+static const struct file_operations sg_dfs_fops = {
+	.owner		= THIS_MODULE,
+	.open		= sg_dfs_open,
+	.read		= seq_read,
+	.write		= sg_dfs_write,
+	.llseek		= seq_lseek,
+	.release	= sg_dfs_release,
+};
+
+static void sg_dfs_mk_files(struct dentry *parent, void *data,
+			    const struct sg_dfs_attr *attr)
+{
+	if (IS_ERR_OR_NULL(parent))
+		return;
+
+	d_inode(parent)->i_private = data;
+	for (; attr->name; ++attr)
+		debugfs_create_file(attr->name, attr->mode, parent,
+				    (void *)attr, &sg_dfs_fops);
+}
+
+static const struct seq_operations sg_snapshot_seq_ops = {
+	.start = dev_seq_start,
+	.next  = dev_seq_next,
+	.stop  = dev_seq_stop,
+	.show  = sg_proc_seq_show_debug,
+};
+
+static const struct sg_dfs_attr sg_dfs_attrs[] = {
+	{"snapshot", 0400, .seq_ops = &sg_snapshot_seq_ops},
+	{"snapshot_devs", 0600, sg_dfs_snapshot_devs_show,
+	 sg_dfs_snapshot_devs_write},
+	{ },
+};
+
+static void
+sg_dfs_init(void)
+{
+	/* create and populate /sys/kernel/debug/scsi_generic directory */
+	if (!sg_dfs_cxt.dfs_rootdir) {
+		sg_dfs_cxt.dfs_rootdir = debugfs_create_dir("scsi_generic",
+							    NULL);
+		sg_dfs_mk_files(sg_dfs_cxt.dfs_rootdir, &sg_dfs_cxt,
+				sg_dfs_attrs);
+	}
+	sg_dfs_cxt.snapshot_devs[0] = -1;	/* show all sg devices */
+}
+
+static void
+sg_dfs_exit(void)
+{
+	debugfs_remove_recursive(sg_dfs_cxt.dfs_rootdir);
+	sg_dfs_cxt.dfs_rootdir = NULL;
+}
+
+#else		/* not  defined: CONFIG_DEBUG_FS */
+
+static void sg_dfs_init(void) {}
+static void sg_dfs_exit(void) {}
+
+#endif		/* CONFIG_DEBUG_FS */
+
+module_param_named(scatter_elem_sz, scatter_elem_sz, int, 0644);
+module_param_named(def_reserved_size, def_reserved_size, int, 0644);
+module_param_named(allow_dio, sg_allow_dio, int, 0644);
+
+MODULE_AUTHOR("Douglas Gilbert");
+MODULE_DESCRIPTION("SCSI generic (sg) driver");
+MODULE_LICENSE("GPL");
+MODULE_VERSION(SG_VERSION_STR);
+MODULE_ALIAS_CHARDEV_MAJOR(SCSI_GENERIC_MAJOR);
+
+MODULE_PARM_DESC(scatter_elem_sz, "scatter gather element size (default: max(SG_SCATTER_SZ, PAGE_SIZE))");
+MODULE_PARM_DESC(def_reserved_size, "size of buffer reserved for each fd");
+MODULE_PARM_DESC(allow_dio, "allow direct I/O (default: 0 (disallow))");
 module_init(init_sg);
 module_exit(exit_sg);
-- 
2.25.1
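As a reading aid for the snapshot_devs convention described in the comment above sg_dfs_cxt, here is a minimal userspace model of the device filter that sg_proc_seq_show_debug() applies. It is a sketch, not driver code: the name snapshot_selects() and the SNAPSHOT_DEV_MAX constant are hypothetical, and the logic is transcribed from the hunk above.

```c
#include <stdbool.h>

#define SNAPSHOT_DEV_MAX 4   /* mirrors SG_SNAPSHOT_DEV_MAX */

/*
 * Return true if device index 'dev' should appear in the snapshot.
 * -1 in arr[0], or arr[0] == arr[1], means "all devices".
 * Later on, -1 or two adjacent equal values terminate the list.
 */
static bool snapshot_selects(const int arr[SNAPSHOT_DEV_MAX], int dev)
{
	if (arr[0] == -1 || arr[0] == arr[1])
		return true;                  /* show all devices */
	if (arr[0] == dev)
		return true;
	for (int j = 1; j < SNAPSHOT_DEV_MAX; ++j) {
		int d = arr[j];

		if (d < 0)
			return false;         /* -1 terminator reached */
		if (j + 1 < SNAPSHOT_DEV_MAX && d == arr[j + 1])
			return false;         /* adjacent-duplicate terminator */
		if (d == dev)
			return true;
	}
	return false;
}
```

For example, writing "2,5" to /sys/kernel/debug/scsi_generic/snapshot_devs stores {2, 5, -1, ...} (sg_dfs_snapshot_devs_write() appends the -1), so only the devices with index 2 and 5 are shown; writing "-1" restores the default of showing all.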



* [PATCH v18 36/83] sg: rework mmap support
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
                   ` (35 preceding siblings ...)
  2021-04-27 21:56 ` [PATCH v18 35/83] sg: first debugfs support Douglas Gilbert
@ 2021-04-27 21:56 ` Douglas Gilbert
  2021-04-27 21:56 ` [PATCH v18 37/83] sg: defang allow_dio Douglas Gilbert
                   ` (46 subsequent siblings)
  83 siblings, 0 replies; 86+ messages in thread
From: Douglas Gilbert @ 2021-04-27 21:56 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

Linux has an issue with mmap-ing multi-page blocks returned by
alloc_pages() with order > 0. So when mmap(2) is called and the
reserve request's scatter gather (sgat) list is either:
  - not big enough, or
  - made up of elements of order > 0 (i.e. size > PAGE_SIZE)
then throw away the reserve request's sgat list and rebuild it
with enough PAGE_SIZE (order 0) elements.
Clean up related code and stop doing mmap+indirect_io.
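The two rules above can be modelled in a few lines of userspace C. This is a sketch, not driver code: sgat_needs_rebuild() and sgat_locate() are hypothetical names, and MODEL_PAGE_SHIFT assumes 4 KiB pages. The first function mirrors the test added to sg_mmap(); the second mirrors the index arithmetic in the reworked sg_vma_fault().

```c
#include <stdbool.h>

/* Assumption: 4 KiB pages; the kernel uses PAGE_SHIFT. */
#define MODEL_PAGE_SHIFT 12

/*
 * Mirrors the test in the new sg_mmap(): rebuild unless the reserve
 * sgat list already uses order 0 elements and is large enough.
 */
static bool sgat_needs_rebuild(int page_order, unsigned long buflen,
			       unsigned long req_sz)
{
	return page_order > 0 || req_sz > buflen;
}

/*
 * Mirrors the lookup in the new sg_vma_fault(): which sgat element
 * holds byte 'offset', and which page within that element.
 */
static void sgat_locate(unsigned long offset, int page_order,
			int *elem, int *page_in_elem)
{
	unsigned long length = 1UL << (MODEL_PAGE_SHIFT + page_order);

	*elem = (int)(offset / length);
	*page_in_elem = (int)((offset % length) >> MODEL_PAGE_SHIFT);
}
```

Once sg_mmap() has rebuilt the list with SG_FRQ_FOR_MMAP set, page_order is 0, so page_in_elem is always 0 and elem is simply the page index into the mapping.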

This new mmap implementation is marginally more flexible than (but
still compatible with) the production driver. Previously, if the
user wanted a larger mmap(2) 'length' than the reserve request's
size, they needed to call ioctl(SG_SET_RESERVED_SIZE) to set the
new size first. Now mmap(2) called on an sg device node will
adjust to the size given by 'length' (mmap's second parameter).
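From user space, this means a caller can simply pass the size it wants as mmap()'s 'length'. A minimal sketch, assuming an sg fd that is already open (for example on a hypothetical /dev/sg0); sg_map_reserve() is an illustrative helper, not part of any library.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Map 'len' bytes of the reserve buffer of an open sg file descriptor.
 * The offset must be 0. With this patch the driver rebuilds the reserve
 * buffer to 'len' if needed, so no prior SG_SET_RESERVED_SIZE is required.
 * Returns NULL (with errno set by mmap) on failure.
 */
static void *sg_map_reserve(int sg_fd, size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
		       sg_fd, 0);

	return p == MAP_FAILED ? NULL : p;
}
```

Per the sg_mmap() hunk below, a non-zero offset yields EINVAL, an active reserve request EBUSY, and a second mmap() on the same fd EADDRINUSE.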

Tweak some SG_LOG() levels to control the amount of debug
output. Use WRITE_ONCE() when initialising the integers that
hold request flag bitmaps (later manipulated with bitops).
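For readers unfamiliar with the macro, WRITE_ONCE() makes the compiler emit exactly one store that it may not merge or elide. The sketch below is a simplified userspace analogue built on a volatile access; MODEL_WRITE_ONCE() and clear_req_flags() are hypothetical and this is not the kernel's actual definition.

```c
/*
 * Simplified analogue of WRITE_ONCE(): one store through a volatile
 * lvalue, which the compiler must perform exactly once.
 */
#define MODEL_WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Clear a request's flag bitmap, as the patch does for frq_bm[0]. */
static void clear_req_flags(unsigned long frq_bm[1])
{
	MODEL_WRITE_ONCE(frq_bm[0], 0UL);
}
```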

Reviewed-by: Hannes Reinecke <hare@suse.de>

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 177 +++++++++++++++++++++++++++-------------------
 1 file changed, 106 insertions(+), 71 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index fdbff29669d3..4d0a5a50a00b 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -115,6 +115,7 @@ enum sg_rq_state {	/* N.B. sg_rq_state_arr assumes SG_RS_AWAIT_RCV==2 */
 #define SG_FRQ_NO_US_XFER	3	/* no user space transfer of data */
 #define SG_FRQ_DEACT_ORPHAN	6	/* not keeping orphan so de-activate */
 #define SG_FRQ_RECEIVING	7	/* guard against multiple receivers */
+#define SG_FRQ_FOR_MMAP		8	/* request needs PAGE_SIZE elements */
 
 /* Bit positions (flags) for sg_fd::ffd_bm bitmask follow */
 #define SG_FFD_FORCE_PACKID	0	/* receive only given pack_id/tag */
@@ -707,7 +708,7 @@ sg_write(struct file *filp, const char __user *p, size_t count, loff_t *ppos)
 			 input_size, (unsigned int)opcode, current->comm);
 	}
 	cwr.h3p = h3p;
-	cwr.frq_bm[0] = 0;
+	WRITE_ONCE(cwr.frq_bm[0], 0);
 	cwr.timeout = sfp->timeout;
 	cwr.cmd_len = cmd_size;
 	cwr.filp = filp;
@@ -768,7 +769,7 @@ sg_v3_submit(struct file *filp, struct sg_fd *sfp, struct sg_io_hdr *hp,
 	/* when v3 seen, allow cmd_q on this fd (def: no cmd_q) */
 	set_bit(SG_FFD_CMD_Q, sfp->ffd_bm);
 	ul_timeout = msecs_to_jiffies(hp->timeout);
-	cwr.frq_bm[0] = 0;
+	WRITE_ONCE(cwr.frq_bm[0], 0);
 	__assign_bit(SG_FRQ_SYNC_INVOC, cwr.frq_bm, (int)sync);
 	cwr.h3p = hp;
 	cwr.timeout = min_t(unsigned long, ul_timeout, INT_MAX);
@@ -809,7 +810,7 @@ sg_submit_v4(struct file *filp, struct sg_fd *sfp, void __user *p,
 	ul_timeout = msecs_to_jiffies(h4p->timeout);
 	cwr.filp = filp;
 	cwr.sfp = sfp;
-	cwr.frq_bm[0] = 0;
+	WRITE_ONCE(cwr.frq_bm[0], 0);
 	__assign_bit(SG_FRQ_SYNC_INVOC, cwr.frq_bm, (int)sync);
 	__set_bit(SG_FRQ_IS_V4I, cwr.frq_bm);
 	cwr.h4p = h4p;
@@ -2225,13 +2226,39 @@ sg_fasync(int fd, struct file *filp, int mode)
 	return fasync_helper(fd, filp, mode, &sfp->async_qp);
 }
 
+static void
+sg_vma_open(struct vm_area_struct *vma)
+{
+	struct sg_fd *sfp = vma->vm_private_data;
+
+	if (unlikely(!sfp)) {
+		pr_warn("%s: sfp null\n", __func__);
+		return;
+	}
+	kref_get(&sfp->f_ref);
+}
+
+static void
+sg_vma_close(struct vm_area_struct *vma)
+{
+	struct sg_fd *sfp = vma->vm_private_data;
+
+	if (unlikely(!sfp)) {
+		pr_warn("%s: sfp null\n", __func__);
+		return;
+	}
+	kref_put(&sfp->f_ref, sg_remove_sfp); /* get in: sg_vma_open() */
+}
+
 /* Note: the error return: VM_FAULT_SIGBUS causes a "bus error" */
 static vm_fault_t
 sg_vma_fault(struct vm_fault *vmf)
 {
-	int k, length;
-	unsigned long offset, len, sa, iflags;
+	int k, n, length;
+	int res = VM_FAULT_SIGBUS;
+	unsigned long offset;
 	struct vm_area_struct *vma = vmf->vma;
+	struct page *page;
 	struct sg_scatter_hold *rsv_schp;
 	struct sg_request *srp;
 	struct sg_device *sdp;
@@ -2257,7 +2284,7 @@ sg_vma_fault(struct vm_fault *vmf)
 		SG_LOG(1, sfp, "%s: srp%s\n", __func__, nbp);
 		goto out_err;
 	}
-	xa_lock_irqsave(&sfp->srp_arr, iflags);
+	mutex_lock(&sfp->f_mutex);
 	rsv_schp = &srp->sgat_h;
 	offset = vmf->pgoff << PAGE_SHIFT;
 	if (offset >= (unsigned int)rsv_schp->buflen) {
@@ -2265,44 +2292,37 @@ sg_vma_fault(struct vm_fault *vmf)
 		       offset);
 		goto out_err_unlock;
 	}
-	sa = vma->vm_start;
-	SG_LOG(3, sfp, "%s: vm_start=0x%lx, off=%lu\n", __func__, sa, offset);
+	SG_LOG(5, sfp, "%s: vm_start=0x%lx, off=%lu\n", __func__,
+	       vma->vm_start, offset);
 	length = 1 << (PAGE_SHIFT + rsv_schp->page_order);
-	for (k = 0; k < rsv_schp->num_sgat && sa < vma->vm_end; ++k) {
-		len = vma->vm_end - sa;
-		len = min_t(int, len, (int)length);
-		if (offset < len) {
-			struct page *page;
-			struct page *pp;
-
-			pp = rsv_schp->pages[k];
-			xa_unlock_irqrestore(&sfp->srp_arr, iflags);
-			page = nth_page(pp, offset >> PAGE_SHIFT);
-			get_page(page); /* increment page count */
-			vmf->page = page;
-			return 0; /* success */
-		}
-		sa += len;
-		offset -= len;
-	}
+	k = (int)offset / length;
+	n = ((int)offset % length) >> PAGE_SHIFT;
+	page = nth_page(rsv_schp->pages[k], n);
+	get_page(page);
+	vmf->page = page;
+	res = 0;
 out_err_unlock:
-	xa_unlock_irqrestore(&sfp->srp_arr, iflags);
+	mutex_unlock(&sfp->f_mutex);
 out_err:
-	return VM_FAULT_SIGBUS;
+	return res;
 }
 
 static const struct vm_operations_struct sg_mmap_vm_ops = {
 	.fault = sg_vma_fault,
+	.open = sg_vma_open,
+	.close = sg_vma_close,
 };
 
-/* Entry point for mmap(2) system call */
+/*
+ * Entry point for mmap(2) system call. For mmap(2) to work, request's
+ * scatter gather list needs to be order 0 which it is unlikely to be
+ * by default. mmap(2) cannot be called more than once per fd.
+ */
 static int
 sg_mmap(struct file *filp, struct vm_area_struct *vma)
 {
-	int k, length;
-	int ret = 0;
-	unsigned long req_sz, len, sa, iflags;
-	struct sg_scatter_hold *rsv_schp;
+	int res = 0;
+	unsigned long req_sz;
 	struct sg_fd *sfp;
 	struct sg_request *srp;
 
@@ -2313,40 +2333,48 @@ sg_mmap(struct file *filp, struct vm_area_struct *vma)
 		pr_warn("sg: %s: sfp is NULL\n", __func__);
 		return -ENXIO;
 	}
+	mutex_lock(&sfp->f_mutex);
 	req_sz = vma->vm_end - vma->vm_start;
 	SG_LOG(3, sfp, "%s: vm_start=%pK, len=%d\n", __func__,
 	       (void *)vma->vm_start, (int)req_sz);
-	if (vma->vm_pgoff)
-		return -EINVAL; /* only an offset of 0 accepted */
+	if (vma->vm_pgoff) {
+		res = -EINVAL; /* only an offset of 0 accepted */
+		goto fini;
+	}
 	/* Check reserve request is inactive and has large enough buffer */
-	mutex_lock(&sfp->f_mutex);
 	srp = sfp->rsv_srp;
-	xa_lock_irqsave(&sfp->srp_arr, iflags);
 	if (SG_RS_ACTIVE(srp)) {
-		ret = -EBUSY;
-		goto out;
+		res = -EBUSY;
+		goto fini;
 	}
-	rsv_schp = &srp->sgat_h;
-	if (req_sz > (unsigned long)rsv_schp->buflen) {
-		ret = -ENOMEM;
-		goto out;
+	if (req_sz > SG_WRITE_COUNT_LIMIT) {	/* sanity check */
+		res = -ENOMEM;
+		goto fini;
 	}
-	sa = vma->vm_start;
-	length = 1 << (PAGE_SHIFT + rsv_schp->page_order);
-	for (k = 0; k < rsv_schp->num_sgat && sa < vma->vm_end; ++k) {
-		len = vma->vm_end - sa;
-		len = min_t(unsigned long, len, (unsigned long)length);
-		sa += len;
+	if (test_and_set_bit(SG_FFD_MMAP_CALLED, sfp->ffd_bm)) {
+		SG_LOG(1, sfp, "%s: multiple invocations on this fd\n",
+		       __func__);
+		res = -EADDRINUSE;
+		goto fini;
+	}
+	if (srp->sgat_h.page_order > 0 ||
+	    req_sz > (unsigned long)srp->sgat_h.buflen) {
+		sg_remove_sgat(srp);
+		set_bit(SG_FRQ_FOR_MMAP, srp->frq_bm);
+		res = sg_mk_sgat(srp, sfp, req_sz);
+		if (res) {
+			SG_LOG(1, sfp, "%s: sg_mk_sgat failed, wanted=%lu\n",
+			       __func__, req_sz);
+			goto fini;
+		}
 	}
-
-	set_bit(SG_FFD_MMAP_CALLED, sfp->ffd_bm);
 	vma->vm_flags |= VM_IO | VM_DONTEXPAND | VM_DONTDUMP;
 	vma->vm_private_data = sfp;
 	vma->vm_ops = &sg_mmap_vm_ops;
-out:
-	xa_unlock_irqrestore(&sfp->srp_arr, iflags);
+	sg_vma_open(vma);
+fini:
 	mutex_unlock(&sfp->f_mutex);
-	return ret;
+	return res;
 }
 
 static void
@@ -2926,7 +2954,7 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 		goto fini;
 	scsi_rp->cmd_len = cwrp->cmd_len;
 	srp->cmd_opcode = scsi_rp->cmd[0];
-	us_xfer = !(srp->rq_flags & SG_FLAG_NO_DXFER);
+	us_xfer = !(srp->rq_flags & (SG_FLAG_NO_DXFER | SG_FLAG_MMAP_IO));
 	assign_bit(SG_FRQ_NO_US_XFER, srp->frq_bm, !us_xfer);
 	reserved = (sfp->rsv_srp == srp);
 	rq->end_io_data = srp;
@@ -3061,6 +3089,7 @@ sg_mk_sgat(struct sg_request *srp, struct sg_fd *sfp, int minlen)
 	gfp_t mask_kz = GFP_ATOMIC | __GFP_NOWARN;
 	struct sg_device *sdp = sfp->parentdp;
 	struct sg_scatter_hold *schp = &srp->sgat_h;
+	struct page **pgp;
 
 	if (unlikely(minlen <= 0)) {
 		if (minlen < 0)
@@ -3076,32 +3105,37 @@ sg_mk_sgat(struct sg_request *srp, struct sg_fd *sfp, int minlen)
 	if (unlikely(!schp->pages))
 		return -ENOMEM;
 
-	elem_sz = sfp->sgat_elem_sz;    /* power of 2 and >= PAGE_SIZE */
+	/* elem_sz must be power of 2 and >= PAGE_SIZE */
+	elem_sz = test_bit(SG_FRQ_FOR_MMAP, srp->frq_bm) ? (int)PAGE_SIZE : sfp->sgat_elem_sz;
 	if (sdp && unlikely(sdp->device->host->unchecked_isa_dma))
 		mask_ap |= GFP_DMA;
 	o_order = get_order(elem_sz);
 	order = o_order;
 
 again:
-	for (k = 0, rem_sz = align_sz; rem_sz > 0 && k < mx_sgat_elems;
-	     ++k, rem_sz -= elem_sz) {
-		schp->pages[k] = alloc_pages(mask_ap, order);
-		if (!schp->pages[k])
+	if (elem_sz * mx_sgat_elems < align_sz) {	/* misfit ? */
+		SG_LOG(1, sfp, "%s: align_sz=%d too big\n", __func__,
+		       align_sz);
+		goto b4_alloc_pages;
+	}
+	rem_sz = align_sz;
+	for (pgp = schp->pages; rem_sz > 0; ++pgp, rem_sz -= elem_sz) {
+		*pgp = alloc_pages(mask_ap, order);
+		if (unlikely(!*pgp))
 			goto err_out;
-		SG_LOG(5, sfp, "%s: k=%d, order=%d [0x%pK ++]\n", __func__, k,
-		       order, schp->pages[k]);
+		SG_LOG(6, sfp, "%s: elem_sz=%d [0x%pK ++]\n", __func__,
+		       elem_sz, *pgp);
 	}
+	k = pgp - schp->pages;
+	SG_LOG(((order != o_order || rem_sz > 0) ? 2 : 5), sfp,
+	       "%s: num_sgat=%d, order=%d,%d  rem_sz=%d\n", __func__, k,
+	       o_order, order, rem_sz);
 	schp->page_order = order;
 	schp->num_sgat = k;
-	SG_LOG(((order != o_order || rem_sz > 0) ? 2 : 5), sfp,
-	       "%s: num_sgat=%d, order=%d,%d\n", __func__, k, o_order, order);
-	if (unlikely(rem_sz > 0)) {	/* hit mx_sgat_elems */
-		order = 0;		/* force exit */
-		goto err_out;
-	}
 	schp->buflen = align_sz;
 	return 0;
 err_out:
+	k = pgp - schp->pages;
 	for (j = 0; j < k; ++j)
 		__free_pages(schp->pages[j], order);
 
@@ -3109,6 +3143,7 @@ sg_mk_sgat(struct sg_request *srp, struct sg_fd *sfp, int minlen)
 		elem_sz >>= 1;
 		goto again;
 	}
+b4_alloc_pages:
 	kfree(schp->pages);
 	schp->pages = NULL;
 	return -ENOMEM;
@@ -3124,7 +3159,7 @@ sg_remove_sgat_helper(struct sg_fd *sfp, struct sg_scatter_hold *schp)
 		return;
 	for (k = 0; k < schp->num_sgat; ++k) {
 		p = schp->pages[k];
-		SG_LOG(5, sfp, "%s: pg[%d]=0x%pK --\n", __func__, k, p);
+		SG_LOG(6, sfp, "%s: pg[%d]=0x%pK --\n", __func__, k, p);
 		if (unlikely(!p))
 			continue;
 		__free_pages(p, schp->page_order);
@@ -3254,7 +3289,7 @@ sg_find_srp_by_id(struct sg_fd *sfp, int pack_id)
 	}
 	return NULL;
 good:
-	SG_LOG(6, sfp, "%s: %s%d found [srp=0x%pK]\n", __func__, "pack_id=",
+	SG_LOG(5, sfp, "%s: %s%d found [srp=0x%pK]\n", __func__, "pack_id=",
 	       pack_id, srp);
 	return srp;
 }
@@ -3450,7 +3485,7 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, int dxfr_len)
 		r_srp->parentfp = fp;
 		SG_LOG(4, fp, "%s: mk_new_srp=0x%pK ++\n", __func__, r_srp);
 	}
-	r_srp->frq_bm[0] = cwrp->frq_bm[0];	/* assumes <= 32 req flags */
+	WRITE_ONCE(r_srp->frq_bm[0], cwrp->frq_bm[0]);	/* assumes <= 32 req flags */
 	r_srp->sgat_h.dlen = dxfr_len;/* must be <= r_srp->sgat_h.buflen */
 	r_srp->cmd_opcode = 0xff;  /* set invalid opcode (VS), 0x0 is TUR */
 fini:
@@ -3477,7 +3512,7 @@ sg_deact_request(struct sg_fd *sfp, struct sg_request *srp)
 		return;
 	sbp = srp->sense_bp;
 	srp->sense_bp = NULL;
-	srp->frq_bm[0] = 0;
+	WRITE_ONCE(srp->frq_bm[0], 0);
 	sg_rq_state_chg(srp, 0, SG_RS_INACTIVE, true /* force */, __func__);
 	/* maybe orphaned req, thus never read */
 	if (sbp)
-- 
2.25.1



* [PATCH v18 37/83] sg: defang allow_dio
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
                   ` (36 preceding siblings ...)
  2021-04-27 21:56 ` [PATCH v18 36/83] sg: rework mmap support Douglas Gilbert
@ 2021-04-27 21:56 ` Douglas Gilbert
  2021-04-27 21:56 ` [PATCH v18 38/83] sg: warn v3 write system call users Douglas Gilbert
                   ` (45 subsequent siblings)
  83 siblings, 0 replies; 86+ messages in thread
From: Douglas Gilbert @ 2021-04-27 21:56 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

Previously, before direct IO was permitted by this driver, the user
had to either give 'allow_dio=1' as a module parameter or write to
procfs with 'echo 1 > /proc/scsi/sg/allow_dio'. The user also needed
to set the SG_FLAG_DIRECT_IO flag in the flags field of either the
sg v3 or v4 interface. The reason this "belt and braces" approach
was taken is lost in the mists of time (it was done over 20 years
ago). So this patch keeps the allow_dio attribute for backward
compatibility but ignores it, relying on the SG_FLAG_DIRECT_IO flag
alone. This brings the use of the SG_FLAG_DIRECT_IO flag into line
with the SG_FLAG_MMAP_IO flag; neither mechanism is more, or less,
safe than the other in recent Linux kernels.
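The change in the gating condition can be summarised as two predicates. This is a simplified model, not driver code: struct dio_req and both functions are hypothetical, buf_aligned stands in for blk_rq_aligned(), dir_known models the dir != SG_DXFER_UNKNOWN test in the removed sg_chk_dio_allowed(), and MODEL_FLAG_DIRECT_IO uses the value of SG_FLAG_DIRECT_IO (1).

```c
#include <stdbool.h>

#define MODEL_FLAG_DIRECT_IO 0x1   /* value of SG_FLAG_DIRECT_IO */

/* Hypothetical, simplified view of what sg_start_req() consults. */
struct dio_req {
	unsigned int rq_flags;
	unsigned int iov_count;
	bool host_unchecked_isa_dma;
	bool buf_aligned;          /* stands in for blk_rq_aligned() */
};

/* Old rule: the allow_dio knob had to be set as well. */
static bool dio_before(const struct dio_req *r, int allow_dio,
		       bool dir_known)
{
	return allow_dio && (r->rq_flags & MODEL_FLAG_DIRECT_IO) &&
	       dir_known && r->iov_count == 0 &&
	       !r->host_unchecked_isa_dma && r->buf_aligned;
}

/* New rule: the per-request flag alone opts in; allow_dio is ignored. */
static bool dio_after(const struct dio_req *r)
{
	return (r->rq_flags & MODEL_FLAG_DIRECT_IO) && r->iov_count == 0 &&
	       !r->host_unchecked_isa_dma && r->buf_aligned;
}
```

When any of the remaining conditions fails, the request silently falls back to indirect IO, as before.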

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 26 ++++++++------------------
 1 file changed, 8 insertions(+), 18 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 4d0a5a50a00b..e4d4afe7be34 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -141,7 +141,7 @@ int sg_big_buff = SG_DEF_RESERVED_SIZE;
  * not enough memory) will be reserved for use by this file descriptor.
  */
 static int def_reserved_size = -1;	/* picks up init parameter */
-static int sg_allow_dio = SG_ALLOW_DIO_DEF;
+static int sg_allow_dio = SG_ALLOW_DIO_DEF;	/* ignored by code */
 
 static int scatter_elem_sz = SG_SCATTER_SZ;
 
@@ -2845,19 +2845,6 @@ exit_sg(void)
 	idr_destroy(&sg_index_idr);
 }
 
-static inline bool
-sg_chk_dio_allowed(struct sg_device *sdp, struct sg_request *srp,
-		   int iov_count, int dir)
-{
-	if (sg_allow_dio && (srp->rq_flags & SG_FLAG_DIRECT_IO)) {
-		if (dir != SG_DXFER_UNKNOWN && !iov_count) {
-			if (!sdp->device->host->unchecked_isa_dma)
-				return true;
-		}
-	}
-	return false;
-}
-
 static inline void
 sg_set_map_data(const struct sg_scatter_hold *schp, bool up_valid,
 		struct rq_map_data *mdp)
@@ -2876,6 +2863,7 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 	int res = 0;
 	int dxfer_len = 0;
 	int r0w = READ;
+	u32 rq_flags = srp->rq_flags;
 	unsigned int iov_count = 0;
 	void __user *up;
 	struct request *rq;
@@ -2954,7 +2942,7 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 		goto fini;
 	scsi_rp->cmd_len = cwrp->cmd_len;
 	srp->cmd_opcode = scsi_rp->cmd[0];
-	us_xfer = !(srp->rq_flags & (SG_FLAG_NO_DXFER | SG_FLAG_MMAP_IO));
+	us_xfer = !(rq_flags & (SG_FLAG_NO_DXFER | SG_FLAG_MMAP_IO));
 	assign_bit(SG_FRQ_NO_US_XFER, srp->frq_bm, !us_xfer);
 	reserved = (sfp->rsv_srp == srp);
 	rq->end_io_data = srp;
@@ -2965,7 +2953,8 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 		SG_LOG(4, sfp, "%s: no data xfer [0x%pK]\n", __func__, srp);
 		set_bit(SG_FRQ_NO_US_XFER, srp->frq_bm);
 		goto fini;	/* path of reqs with no din nor dout */
-	} else if (sg_chk_dio_allowed(sdp, srp, iov_count, dxfer_dir) &&
+	} else if ((rq_flags & SG_FLAG_DIRECT_IO) && iov_count == 0 &&
+		   !sdp->device->host->unchecked_isa_dma &&
 		   blk_rq_aligned(q, (unsigned long)up, dxfer_len)) {
 		srp->rq_info |= SG_INFO_DIRECT_IO;
 		md = NULL;
@@ -2976,7 +2965,7 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 	}
 
 	if (likely(md)) {	/* normal, "indirect" IO */
-		if (unlikely(srp->rq_flags & SG_FLAG_MMAP_IO)) {
+		if (unlikely(rq_flags & SG_FLAG_MMAP_IO)) {
 			/* mmap IO must use and fit in reserve request */
 			if (!reserved || dxfer_len > req_schp->buflen)
 				res = reserved ? -ENOMEM : -EBUSY;
@@ -3861,6 +3850,7 @@ sg_proc_single_open_adio(struct inode *inode, struct file *filp)
 	return single_open(filp, sg_proc_seq_show_int, &sg_allow_dio);
 }
 
+/* Kept for backward compatibility. sg_allow_dio is now ignored. */
 static ssize_t
 sg_proc_write_adio(struct file *filp, const char __user *buffer,
 		   size_t count, loff_t *off)
@@ -4481,6 +4471,6 @@ MODULE_ALIAS_CHARDEV_MAJOR(SCSI_GENERIC_MAJOR);
 
 MODULE_PARM_DESC(scatter_elem_sz, "scatter gather element size (default: max(SG_SCATTER_SZ, PAGE_SIZE))");
 MODULE_PARM_DESC(def_reserved_size, "size of buffer reserved for each fd");
-MODULE_PARM_DESC(allow_dio, "allow direct I/O (default: 0 (disallow))");
+MODULE_PARM_DESC(allow_dio, "allow direct I/O (default: 0 (disallow)); now ignored");
 module_init(init_sg);
 module_exit(exit_sg);
-- 
2.25.1



* [PATCH v18 38/83] sg: warn v3 write system call users
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
                   ` (37 preceding siblings ...)
  2021-04-27 21:56 ` [PATCH v18 37/83] sg: defang allow_dio Douglas Gilbert
@ 2021-04-27 21:56 ` Douglas Gilbert
  2021-04-27 21:56 ` [PATCH v18 39/83] sg: add mmap_sz tracking Douglas Gilbert
                   ` (44 subsequent siblings)
  83 siblings, 0 replies; 86+ messages in thread
From: Douglas Gilbert @ 2021-04-27 21:56 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

This should generate one log message per kernel run when the
write() system call is used with sg interface version 3. Due to
security concerns, the message suggests using ioctl(SG_SUBMIT_V3)
instead.

Code based on sg interface version 1 or 2 may also be calling
write() in this context. There is no easy solution for those users
(short of upgrading their code to interface version 3 or 4), so
don't produce a warning suggesting the conversion will be
simple.
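The at-most-once behaviour comes from pr_warn_once(), which keeps a static per-call-site flag. A userspace model of the intended policy (warn once, and only for version 3 callers of write()) might look like this; warn_v3_write_once() is hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

static bool v3_write_warned;   /* plays the role of pr_warn_once()'s flag */

/* Returns true if a warning was emitted on this call. */
static bool warn_v3_write_once(int interface_version)
{
	if (interface_version != 3 || v3_write_warned)
		return false;  /* v1/v2 users and repeat calls stay silent */
	v3_write_warned = true;
	fprintf(stderr, "Please use ioctl(SG_SUBMIT_V3) instead of write()\n");
	return true;
}
```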

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index e4d4afe7be34..b27a77e17ad2 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -644,6 +644,9 @@ sg_write(struct file *filp, const char __user *p, size_t count, loff_t *ppos)
 				     __func__);
 			return -EPERM;
 		}
+		pr_warn_once("Please use %s instead of write(),\n%s\n",
+			     "ioctl(SG_SUBMIT_V3)",
+			     "  See: https://sg.danny.cz/sg/sg_v40.html");
 		res = sg_v3_submit(filp, sfp, h3p, false, NULL);
 		return res < 0 ? res : (int)count;
 	}
-- 
2.25.1



* [PATCH v18 39/83] sg: add mmap_sz tracking
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
                   ` (38 preceding siblings ...)
  2021-04-27 21:56 ` [PATCH v18 38/83] sg: warn v3 write system call users Douglas Gilbert
@ 2021-04-27 21:56 ` Douglas Gilbert
  2021-04-27 21:56 ` [PATCH v18 40/83] sg: remove rcv_done request state Douglas Gilbert
                   ` (43 subsequent siblings)
  83 siblings, 0 replies; 86+ messages in thread
From: Douglas Gilbert @ 2021-04-27 21:56 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

Track mmap_sz from the prior mmap(2) call, per sg file descriptor.
Also reset this value whenever munmap(2) is called. Fail
SG_FLAG_MMAP_IO requests if mmap(2) hasn't been called, or if the
memory associated with it is not large enough for the current
request.

Remove the SG_FFD_MMAP_CALLED bit as it can be deduced from
sfp->mmap_sz, where a value of 0 implies no active mmap() call.
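The resulting checks in sg_chk_mmap() can be modelled as follows. The struct and function names are hypothetical and only the fields the check reads are kept; the order of the tests follows the hunk below, and MODEL_FLAG_DIRECT_IO uses the value of SG_FLAG_DIRECT_IO (1).

```c
#include <errno.h>

#define MODEL_FLAG_DIRECT_IO 0x1   /* value of SG_FLAG_DIRECT_IO */

/* Hypothetical subset of struct sg_fd used by the check. */
struct mmap_state {
	int mmap_sz;       /* 0: no active mmap(); reset on munmap() */
	int submitted;     /* requests already active on this fd */
	int rsv_buflen;    /* reserve request's buffer length */
};

/* Returns 0 if an SG_FLAG_MMAP_IO request of 'len' bytes may proceed. */
static int chk_mmap(const struct mmap_state *s, int rq_flags, int len)
{
	if (s->mmap_sz == 0)
		return -EBADFD;    /* mmap() never called, or unmapped */
	if (s->submitted > 0)
		return -EBUSY;
	if (len > s->rsv_buflen)
		return -ENOMEM;    /* must fit in the reserve buffer */
	if (len > s->mmap_sz)
		return -ENOMEM;    /* must fit in the mmap()-ed region */
	if (rq_flags & MODEL_FLAG_DIRECT_IO)
		return -EINVAL;    /* not both MMAP_IO and DIRECT_IO */
	return 0;
}
```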

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index b27a77e17ad2..184696b94425 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -121,8 +121,7 @@ enum sg_rq_state {	/* N.B. sg_rq_state_arr assumes SG_RS_AWAIT_RCV==2 */
 #define SG_FFD_FORCE_PACKID	0	/* receive only given pack_id/tag */
 #define SG_FFD_CMD_Q		1	/* clear: only 1 active req per fd */
 #define SG_FFD_KEEP_ORPHAN	2	/* policy for this fd */
-#define SG_FFD_MMAP_CALLED	3	/* mmap(2) system call made on fd */
-#define SG_FFD_Q_AT_TAIL	5	/* set: queue reqs at tail of blk q */
+#define SG_FFD_Q_AT_TAIL	3	/* set: queue reqs at tail of blk q */
 
 /* Bit positions (flags) for sg_device::fdev_bm bitmask follow */
 #define SG_FDEV_EXCLUDE		0	/* have fd open with O_EXCL */
@@ -231,6 +230,7 @@ struct sg_fd {		/* holds the state of a file descriptor */
 	atomic_t waiting;	/* number of requests awaiting receive */
 	atomic_t req_cnt;	/* number of requests */
 	int sgat_elem_sz;	/* initialized to scatter_elem_sz */
+	int mmap_sz;		/* byte size of previous mmap() call */
 	unsigned long ffd_bm[1];	/* see SG_FFD_* defines above */
 	pid_t tid;		/* thread id when opened */
 	u8 next_cmd_len;	/* 0: automatic, >0: use on next write() */
@@ -724,10 +724,14 @@ sg_write(struct file *filp, const char __user *p, size_t count, loff_t *ppos)
 static inline int
 sg_chk_mmap(struct sg_fd *sfp, int rq_flags, int len)
 {
+	if (unlikely(sfp->mmap_sz == 0))
+		return -EBADFD;
 	if (atomic_read(&sfp->submitted) > 0)
 		return -EBUSY;  /* already active requests on fd */
 	if (len > sfp->rsv_srp->sgat_h.buflen)
 		return -ENOMEM; /* MMAP_IO size must fit in reserve */
+	if (unlikely(len > sfp->mmap_sz))
+		return -ENOMEM; /* MMAP_IO size can't exceed mmap() size */
 	if (rq_flags & SG_FLAG_DIRECT_IO)
 		return -EINVAL; /* not both MMAP_IO and DIRECT_IO */
 	return 0;
@@ -1788,8 +1792,7 @@ sg_set_reserved_sz(struct sg_fd *sfp, int want_rsv_sz)
 		res = -EPROTO;
 		goto fini;
 	}
-	if (SG_RS_ACTIVE(o_srp) ||
-	    test_bit(SG_FFD_MMAP_CALLED, sfp->ffd_bm)) {
+	if (SG_RS_ACTIVE(o_srp) || sfp->mmap_sz > 0) {
 		res = -EBUSY;
 		goto fini;
 	}
@@ -2250,6 +2253,7 @@ sg_vma_close(struct vm_area_struct *vma)
 		pr_warn("%s: sfp null\n", __func__);
 		return;
 	}
+	sfp->mmap_sz = 0;
 	kref_put(&sfp->f_ref, sg_remove_sfp); /* get in: sg_vma_open() */
 }
 
@@ -2340,7 +2344,7 @@ sg_mmap(struct file *filp, struct vm_area_struct *vma)
 	req_sz = vma->vm_end - vma->vm_start;
 	SG_LOG(3, sfp, "%s: vm_start=%pK, len=%d\n", __func__,
 	       (void *)vma->vm_start, (int)req_sz);
-	if (vma->vm_pgoff) {
+	if (unlikely(vma->vm_pgoff || req_sz < SG_DEF_SECTOR_SZ)) {
 		res = -EINVAL; /* only an offset of 0 accepted */
 		goto fini;
 	}
@@ -2354,7 +2358,7 @@ sg_mmap(struct file *filp, struct vm_area_struct *vma)
 		res = -ENOMEM;
 		goto fini;
 	}
-	if (test_and_set_bit(SG_FFD_MMAP_CALLED, sfp->ffd_bm)) {
+	if (sfp->mmap_sz > 0) {
 		SG_LOG(1, sfp, "%s: multiple invocations on this fd\n",
 		       __func__);
 		res = -EADDRINUSE;
@@ -2371,6 +2375,7 @@ sg_mmap(struct file *filp, struct vm_area_struct *vma)
 			goto fini;
 		}
 	}
+	sfp->mmap_sz = req_sz;
 	vma->vm_flags |= VM_IO | VM_DONTEXPAND | VM_DONTDUMP;
 	vma->vm_private_data = sfp;
 	vma->vm_ops = &sg_mmap_vm_ops;
@@ -4023,8 +4028,7 @@ sg_proc_debug_fd(struct sg_fd *fp, char *obp, int len, unsigned long idx)
 		       (int)test_bit(SG_FFD_FORCE_PACKID, fp->ffd_bm),
 		       (int)test_bit(SG_FFD_KEEP_ORPHAN, fp->ffd_bm),
 		       fp->ffd_bm[0]);
-	n += scnprintf(obp + n, len - n, "   mmap_called=%d\n",
-		       test_bit(SG_FFD_MMAP_CALLED, fp->ffd_bm));
+	n += scnprintf(obp + n, len - n, "   mmap_sz=%d\n", fp->mmap_sz);
 	n += scnprintf(obp + n, len - n,
 		       "   submitted=%d waiting=%d   open thr_id=%d\n",
 		       atomic_read(&fp->submitted),
-- 
2.25.1



* [PATCH v18 40/83] sg: remove rcv_done request state
From: Douglas Gilbert @ 2021-04-27 21:56 UTC
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

Remove the SG_RS_RCV_DONE request state. Also remember the position
of the last used request array element, and start subsequent searches
(both for completed requests and for inactive requests to reuse) from
that index, wrapping around to the start of the array if necessary.
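
The search order can be sketched in user space as follows, assuming
a plain array of request states in place of the xarray and its
SG_XA_RQ_AWAIT / SG_XA_RQ_INACTIVE marks. The names find_from() and
enum rq_state are illustrative. The sketch scans [start, n) first and
then wraps to [0, start), much as the patch does with two passes of
xa_find()/xa_find_after(); it omits the atomic state change that the
driver attempts on each candidate.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified request states; names echo enum sg_rq_state. */
enum rq_state { RS_INACTIVE = 0, RS_INFLIGHT, RS_AWAIT_RCV, RS_BUSY };

/*
 * Return the index of the first element of st[0..n) equal to want,
 * scanning [start, n) first and then wrapping around to [0, start).
 * Returns -1 if nothing matches. A stale start beyond the end of the
 * array is treated as 0.
 */
static long find_from(const enum rq_state *st, size_t n, size_t start,
		      enum rq_state want)
{
	size_t i;

	assert(st != NULL || n == 0);
	if (start >= n)
		start = 0;
	for (i = start; i < n; ++i)		/* first pass */
		if (st[i] == want)
			return (long)i;
	for (i = 0; i < start; ++i)		/* wrap-around pass */
		if (st[i] == want)
			return (long)i;
	return -1;
}
```

Starting at the last used index means that, at large queue depths,
the wanted request is often found within the first one or two probes,
as the comment added to sg_find_srp_by_id() explains.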

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 168 +++++++++++++++++++++++++++-------------------
 1 file changed, 98 insertions(+), 70 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 184696b94425..d342606c9632 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -90,7 +90,6 @@ enum sg_rq_state {	/* N.B. sg_rq_state_arr assumes SG_RS_AWAIT_RCV==2 */
 	SG_RS_INACTIVE = 0,	/* request not in use (e.g. on fl) */
 	SG_RS_INFLIGHT,		/* active: cmd/req issued, no response yet */
 	SG_RS_AWAIT_RCV,	/* have response from LLD, awaiting receive */
-	SG_RS_RCV_DONE,		/* receive is ongoing or done */
 	SG_RS_BUSY,		/* temporary state should rarely be seen */
 };
 
@@ -225,6 +224,7 @@ struct sg_fd {		/* holds the state of a file descriptor */
 	struct mutex f_mutex;	/* serialize ioctls on this fd */
 	int timeout;		/* defaults to SG_DEFAULT_TIMEOUT      */
 	int timeout_user;	/* defaults to SG_DEFAULT_TIMEOUT_USER */
+	int prev_used_idx;	/* previous used index */
 	u32 idx;		/* my index within parent's sfp_arr */
 	atomic_t submitted;	/* number inflight or awaiting receive */
 	atomic_t waiting;	/* number of requests awaiting receive */
@@ -297,7 +297,9 @@ static struct sg_device *sg_get_dev(int min_dev);
 static void sg_device_destroy(struct kref *kref);
 static struct sg_request *sg_mk_srp_sgat(struct sg_fd *sfp, bool first,
 					 int db_len);
+#if IS_ENABLED(CONFIG_SCSI_LOGGING) && IS_ENABLED(SG_DEBUG)
 static const char *sg_rq_st_str(enum sg_rq_state rq_st, bool long_str);
+#endif
 
 #define SG_WRITE_COUNT_LIMIT (32 * 1024 * 1024)
 
@@ -871,20 +873,18 @@ sg_ctl_iosubmit_v3(struct file *filp, struct sg_fd *sfp, void __user *p)
 #if IS_ENABLED(SG_LOG_ACTIVE)
 static void
 sg_rq_state_fail_msg(struct sg_fd *sfp, enum sg_rq_state exp_old_st,
-		     enum sg_rq_state want_st, enum sg_rq_state act_old_st,
-		     const char *fromp)
+		     enum sg_rq_state want_st, const char *fromp)
 {
-	const char *eaw_rs = "expected_old,actual_old,wanted rq_st";
+	const char *eaw_rs = "expected_old,wanted rq_st";
 
 	if (IS_ENABLED(CONFIG_SCSI_PROC_FS))
-		SG_LOG(1, sfp, "%s: %s: %s: %s,%s,%s\n",
+		SG_LOG(1, sfp, "%s: %s: %s,%s,%s\n",
 		       __func__, fromp, eaw_rs,
 		       sg_rq_st_str(exp_old_st, false),
-		       sg_rq_st_str(act_old_st, false),
 		       sg_rq_st_str(want_st, false));
 	else
-		pr_info("sg: %s: %s: %s: %d,%d,%d\n", __func__, fromp, eaw_rs,
-			(int)exp_old_st, (int)act_old_st, (int)want_st);
+		pr_info("sg: %s: %s: %s: %d,%d\n", __func__, fromp, eaw_rs,
+			(int)exp_old_st, (int)want_st);
 }
 #endif
 
@@ -928,8 +928,8 @@ sg_rq_state_helper(struct xarray *xafp, struct sg_request *srp, int indic)
 }
 
 /* Following array indexed by enum sg_rq_state, 0 means no xa mark change */
-static const int sg_rq_state_arr[] = {1, 0, 4, 0, 0};
-static const int sg_rq_state_mul2arr[] = {2, 0, 8, 0, 0};
+static const int sg_rq_state_arr[] = {1, 0, 4, 0};
+static const int sg_rq_state_mul2arr[] = {2, 0, 8, 0};
 
 /*
  * This function keeps the srp->rq_st state and associated marks on the
@@ -944,39 +944,47 @@ static const int sg_rq_state_mul2arr[] = {2, 0, 8, 0, 0};
  */
 static int
 sg_rq_state_chg(struct sg_request *srp, enum sg_rq_state old_st,
-		enum sg_rq_state new_st, bool force, const char *fromp)
+		enum sg_rq_state new_st)
 {
 	enum sg_rq_state act_old_st;
 	int indic;
 	unsigned long iflags;
-	struct xarray *xafp = &srp->parentfp->srp_arr;
+	struct sg_fd *sfp = srp->parentfp;
+	struct xarray *xafp = &sfp->srp_arr;
 
-	if (force) {
-		xa_lock_irqsave(xafp, iflags);
-		sg_rq_state_force(srp, new_st);
-		xa_unlock_irqrestore(xafp, iflags);
-		return 0;
-	}
 	indic = sg_rq_state_arr[(int)old_st] +
 		sg_rq_state_mul2arr[(int)new_st];
 	act_old_st = (enum sg_rq_state)atomic_cmpxchg(&srp->rq_st, old_st,
 						      new_st);
 	if (act_old_st != old_st) {
-#if IS_ENABLED(SG_LOG_ACTIVE)
-		if (fromp)
-			sg_rq_state_fail_msg(srp->parentfp, old_st, new_st,
-					     act_old_st, fromp);
-#endif
+		SG_LOG(1, sfp, "%s: unexpected old state: %s\n", __func__,
+		       sg_rq_st_str(act_old_st, false));
 		return -EPROTOTYPE;	/* only used for this error type */
 	}
 	if (indic) {
 		xa_lock_irqsave(xafp, iflags);
+		if (new_st == SG_RS_INACTIVE)
+			WRITE_ONCE(sfp->prev_used_idx, srp->rq_idx);
 		sg_rq_state_helper(xafp, srp, indic);
 		xa_unlock_irqrestore(xafp, iflags);
 	}
 	return 0;
 }
 
+static void
+sg_rq_state_chg_force(struct sg_request *srp, enum sg_rq_state new_st)
+{
+	unsigned long iflags;
+	struct sg_fd *sfp = srp->parentfp;
+	struct xarray *xafp = &sfp->srp_arr;
+
+	xa_lock_irqsave(xafp, iflags);
+	if (new_st == SG_RS_INACTIVE)
+		WRITE_ONCE(sfp->prev_used_idx, srp->rq_idx);
+	sg_rq_state_force(srp, new_st);
+	xa_unlock_irqrestore(xafp, iflags);
+}
+
 static void
 sg_execute_cmd(struct sg_fd *sfp, struct sg_request *srp)
 {
@@ -998,8 +1006,7 @@ sg_execute_cmd(struct sg_fd *sfp, struct sg_request *srp)
 		at_head = !(srp->rq_flags & SG_FLAG_Q_AT_TAIL);
 
 	kref_get(&sfp->f_ref); /* sg_rq_end_io() does kref_put(). */
-	sg_rq_state_chg(srp, SG_RS_BUSY /* ignored */, SG_RS_INFLIGHT,
-			true, __func__);
+	sg_rq_state_chg_force(srp, SG_RS_INFLIGHT);
 
 	/* >>>>>>> send cmd/req off to other levels <<<<<<<< */
 	if (!sync)
@@ -1200,10 +1207,6 @@ sg_receive_v3(struct sg_fd *sfp, struct sg_request *srp, size_t count,
 	hp->driver_status = driver_byte(rq_result);
 	err2 = put_sg_io_hdr(hp, p);
 	err = err ? err : err2;
-	err2 = sg_rq_state_chg(srp, atomic_read(&srp->rq_st), SG_RS_RCV_DONE,
-			       false, __func__);
-	if (err2)
-		err = err ? err : err2;
 err_out:
 	sg_finish_scsi_blk_rq(srp);
 	sg_deact_request(sfp, srp);
@@ -1214,7 +1217,7 @@ static int
 sg_receive_v4(struct sg_fd *sfp, struct sg_request *srp, void __user *p,
 	      struct sg_io_v4 *h4p)
 {
-	int err, err2;
+	int err;
 	u32 rq_result = srp->rq_result;
 
 	SG_LOG(3, sfp, "%s: p=%s, h4p=%s\n", __func__,
@@ -1249,10 +1252,6 @@ sg_receive_v4(struct sg_fd *sfp, struct sg_request *srp, void __user *p,
 		if (copy_to_user(p, h4p, SZ_SG_IO_V4))
 			err = err ? err : -EFAULT;
 	}
-	err2 = sg_rq_state_chg(srp, atomic_read(&srp->rq_st), SG_RS_RCV_DONE,
-			       false, __func__);
-	if (err2)
-		err = err ? err : err2;
 	sg_finish_scsi_blk_rq(srp);
 	sg_deact_request(sfp, srp);
 	return err < 0 ? err : 0;
@@ -1447,7 +1446,6 @@ sg_read_v1v2(void __user *buf, int count, struct sg_fd *sfp,
 		res = (h2p->result == 0) ? 0 : -EIO;
 	}
 fini:
-	atomic_set(&srp->rq_st, SG_RS_RCV_DONE);
 	sg_finish_scsi_blk_rq(srp);
 	sg_deact_request(sfp, srp);
 	return res;
@@ -1621,7 +1619,6 @@ sg_get_dur(struct sg_request *srp, const enum sg_rq_state *sr_stp,
 		res = sg_calc_rq_dur(srp);
 		break;
 	case SG_RS_AWAIT_RCV:
-	case SG_RS_RCV_DONE:
 	case SG_RS_INACTIVE:
 		res = srp->duration;
 		is_dur = true;	/* completion has occurred, timing finished */
@@ -1692,9 +1689,13 @@ sg_wait_event_srp(struct file *filp, struct sg_fd *sfp, void __user *p,
 	sr_st = atomic_read(&srp->rq_st);
 	if (unlikely(sr_st != SG_RS_AWAIT_RCV))
 		return -EPROTO;         /* Logic error */
-	res = sg_rq_state_chg(srp, sr_st, SG_RS_BUSY, false, __func__);
-	if (unlikely(res))
+	res = sg_rq_state_chg(srp, sr_st, SG_RS_BUSY);
+	if (unlikely(res)) {
+#if IS_ENABLED(SG_LOG_ACTIVE)
+		sg_rq_state_fail_msg(sfp, sr_st, SG_RS_BUSY, __func__);
+#endif
 		return res;
+	}
 	if (test_bit(SG_FRQ_IS_V4I, srp->frq_bm))
 		res = sg_receive_v4(sfp, srp, p, h4p);
 	else
@@ -2533,8 +2534,7 @@ sg_rq_end_io(struct request *rq, blk_status_t status)
 	}
 	if (!test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm))
 		atomic_inc(&sfp->waiting);
-	if (unlikely(sg_rq_state_chg(srp, SG_RS_INFLIGHT, rqq_state,
-				     false, __func__)))
+	if (unlikely(sg_rq_state_chg(srp, SG_RS_INFLIGHT, rqq_state)))
 		pr_warn("%s: can't set rq_st\n", __func__);
 	/*
 	 * Free the mid-level resources apart from the bio (if any). The bio's
@@ -3017,6 +3017,8 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 	}
 fini:
 	if (unlikely(res)) {		/* failure, free up resources */
+		if (us_xfer && rq->bio)
+			blk_rq_unmap_user(rq->bio);
 		scsi_req_free_cmd(scsi_rp);
 		srp->rq = NULL;
 		if (us_xfer && rq->bio)
@@ -3235,9 +3237,10 @@ sg_find_srp_by_id(struct sg_fd *sfp, int pack_id)
 	__maybe_unused bool is_bad_st = false;
 	__maybe_unused enum sg_rq_state bad_sr_st = SG_RS_INACTIVE;
 	bool search_for_1 = (pack_id != SG_PACK_ID_WILDCARD);
+	bool second = false;
 	int res;
 	int num_waiting = atomic_read(&sfp->waiting);
-	unsigned long idx;
+	unsigned long idx, start_idx, end_idx;
 	struct sg_request *srp = NULL;
 	struct xarray *xafp = &sfp->srp_arr;
 
@@ -3249,8 +3252,7 @@ sg_find_srp_by_id(struct sg_fd *sfp, int pack_id)
 				continue;
 			if (srp->pack_id != pack_id)
 				continue;
-			res = sg_rq_state_chg(srp, SG_RS_AWAIT_RCV, SG_RS_BUSY,
-					      false, __func__);
+			res = sg_rq_state_chg(srp, SG_RS_AWAIT_RCV, SG_RS_BUSY);
 			if (likely(res == 0))
 				goto good;
 			/* else another caller got it, move on */
@@ -3260,14 +3262,37 @@ sg_find_srp_by_id(struct sg_fd *sfp, int pack_id)
 			}
 			break;
 		}
-	} else {        /* search for any request is more likely */
-		xa_for_each_marked(xafp, idx, srp, SG_XA_RQ_AWAIT) {
+	} else {
+		/*
+		 * Searching for _any_ request is the more likely usage. Start searching with the
+		 * last xarray index that was used. In the case of a large-ish IO depth, it is
+		 * likely that the second (relative) position will be the request we want, if it
+		 * is ready. If there is no queuing and the "last used" has been re-used then the
+		 * first (relative) position will be the request we want.
+		 */
+		start_idx = READ_ONCE(sfp->prev_used_idx);
+		end_idx = ULONG_MAX;
+second_time:
+		idx = start_idx;
+		for (srp = xa_find(xafp, &idx, end_idx, SG_XA_RQ_AWAIT);
+		     srp;
+		     srp = xa_find_after(xafp, &idx, end_idx, SG_XA_RQ_AWAIT)) {
 			if (test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm))
 				continue;
-			res = sg_rq_state_chg(srp, SG_RS_AWAIT_RCV, SG_RS_BUSY,
-					      false, __func__);
+			res = sg_rq_state_chg(srp, SG_RS_AWAIT_RCV, SG_RS_BUSY);
 			if (likely(res == 0))
 				goto good;
+#if IS_ENABLED(SG_LOG_ACTIVE)
+			else
+				sg_rq_state_fail_msg(sfp, SG_RS_AWAIT_RCV, SG_RS_BUSY, __func__);
+#endif
+		}
+		/* If not found so far, need to wrap around and search [0 ... start_idx) */
+		if (!srp && !second && start_idx > 0) {
+			end_idx = start_idx - 1;
+			start_idx = 0;
+			second = true;
+			goto second_time;
 		}
 	}
 	/* here if one of above loops does _not_ find a match */
@@ -3385,8 +3410,9 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, int dxfr_len)
 	bool found = false;
 	bool mk_new_srp = true;
 	bool try_harder = false;
+	bool second = false;
 	int num_inactive = 0;
-	unsigned long idx, last_idx, iflags;
+	unsigned long idx, start_idx, end_idx, iflags;
 	struct sg_fd *fp = cwrp->sfp;
 	struct sg_request *r_srp = NULL;	/* request to return */
 	struct sg_request *last_srp = NULL;
@@ -3399,45 +3425,48 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, int dxfr_len)
 		act_empty = true;
 		mk_new_srp = true;
 	} else if (!try_harder && dxfr_len < SG_DEF_SECTOR_SZ) {
-		last_idx = ~0UL;
 		xa_for_each_marked(xafp, idx, r_srp, SG_XA_RQ_INACTIVE) {
-			if (!r_srp)
-				continue;
 			++num_inactive;
-			if (dxfr_len < SG_DEF_SECTOR_SZ) {
-				last_idx = idx;
+			if (dxfr_len < SG_DEF_SECTOR_SZ)
 				last_srp = r_srp;
-				continue;
-			}
 		}
 		/* If dxfr_len is small, use last inactive request */
-		if (last_idx != ~0UL && last_srp) {
+		if (last_srp) {
 			r_srp = last_srp;
-			if (sg_rq_state_chg(r_srp, SG_RS_INACTIVE, SG_RS_BUSY,
-					    false, __func__))
+			if (sg_rq_state_chg(r_srp, SG_RS_INACTIVE, SG_RS_BUSY))
 				goto start_again; /* gone to another thread */
 			cp = "toward end of srp_arr";
 			found = true;
 		}
 	} else {
-		xa_for_each_marked(xafp, idx, r_srp, SG_XA_RQ_INACTIVE) {
-			if (!r_srp)
-				continue;
+		start_idx = READ_ONCE(fp->prev_used_idx);
+		end_idx = ULONG_MAX;
+second_time:
+		idx = start_idx;
+		for (r_srp = xa_find(xafp, &idx, end_idx, SG_XA_RQ_INACTIVE);
+		     r_srp;
+		     r_srp = xa_find_after(xafp, &idx, end_idx, SG_XA_RQ_INACTIVE)) {
 			if (r_srp->sgat_h.buflen >= dxfr_len) {
-				if (sg_rq_state_chg
-					(r_srp, SG_RS_INACTIVE, SG_RS_BUSY,
-					 false, __func__))
+				if (sg_rq_state_chg(r_srp, SG_RS_INACTIVE, SG_RS_BUSY))
 					continue;
-				cp = "from front of srp_arr";
+				cp = "near front of srp_arr";
 				found = true;
 				break;
 			}
 		}
+		/* If not found so far, need to wrap around and search [0 ... start_idx) */
+		if (!r_srp && !second && start_idx > 0) {
+			end_idx = start_idx - 1;
+			start_idx = 0;
+			second = true;
+			goto second_time;
+		}
 	}
 	if (found) {
 		r_srp->in_resid = 0;
 		r_srp->rq_info = 0;
 		r_srp->sense_len = 0;
+		WRITE_ONCE(fp->prev_used_idx, r_srp->rq_idx);
 		mk_new_srp = false;
 	} else {
 		mk_new_srp = true;
@@ -3510,7 +3539,7 @@ sg_deact_request(struct sg_fd *sfp, struct sg_request *srp)
 	sbp = srp->sense_bp;
 	srp->sense_bp = NULL;
 	WRITE_ONCE(srp->frq_bm[0], 0);
-	sg_rq_state_chg(srp, 0, SG_RS_INACTIVE, true /* force */, __func__);
+	sg_rq_state_chg_force(srp, SG_RS_INACTIVE);
 	/* maybe orphaned req, thus never read */
 	if (sbp)
 		mempool_free(sbp, sg_sense_pool);
@@ -3601,7 +3630,7 @@ sg_add_sfp(struct sg_device *sdp)
 		}
 		srp->rq_idx = idx;
 		srp->parentfp = sfp;
-		sg_rq_state_chg(srp, 0, SG_RS_INACTIVE, true, __func__);
+		sg_rq_state_chg_force(srp, SG_RS_INACTIVE);
 	}
 	if (!reduced) {
 		SG_LOG(4, sfp, "%s: built reserve buflen=%d\n", __func__,
@@ -3750,8 +3779,6 @@ sg_rq_st_str(enum sg_rq_state rq_st, bool long_str)
 		return long_str ? "inflight" : "act";
 	case SG_RS_AWAIT_RCV:
 		return long_str ? "await_receive" : "rcv";
-	case SG_RS_RCV_DONE:
-		return long_str ? "receive_done" : "fin";
 	case SG_RS_BUSY:
 		return long_str ? "busy" : "bsy";
 	default:
@@ -4028,7 +4055,8 @@ sg_proc_debug_fd(struct sg_fd *fp, char *obp, int len, unsigned long idx)
 		       (int)test_bit(SG_FFD_FORCE_PACKID, fp->ffd_bm),
 		       (int)test_bit(SG_FFD_KEEP_ORPHAN, fp->ffd_bm),
 		       fp->ffd_bm[0]);
-	n += scnprintf(obp + n, len - n, "   mmap_sz=%d\n", fp->mmap_sz);
+	n += scnprintf(obp + n, len - n, "   mmap_sz=%d prev_used_idx=%d\n",
+		       fp->mmap_sz, fp->prev_used_idx);
 	n += scnprintf(obp + n, len - n,
 		       "   submitted=%d waiting=%d   open thr_id=%d\n",
 		       atomic_read(&fp->submitted),
-- 
2.25.1



* [PATCH v18 41/83] sg: track lowest inactive and await indexes
From: Douglas Gilbert @ 2021-04-27 21:56 UTC
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

Use two integers in the sg_fd structure to track recent and lowest
xarray indexes of requests that have become inactive or that await a
foreground receive. This reduces the number of xarray iterations
needed before a match is found when queue (IO) depths are large, say
128.
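
The update rule for low_used_idx, which the patch applies under the
xarray lock in sg_rq_chg_state() and sg_rq_chg_state_force_ulck(),
can be sketched as below. The fixed-size array and the names fd_hint,
mark_inactive() and mark_active() are hypothetical. The value is only
a hint: it is replaced when it is unset, when a lower index becomes
inactive, or when the slot it names is no longer inactive.

```c
#include <assert.h>
#include <stdbool.h>

#define NREQ 8

/*
 * Hypothetical per-fd state: one "inactive" flag per request slot plus
 * the low-index hint (-1 means no hint yet).
 */
struct fd_hint {
	bool inactive[NREQ];
	int low_used_idx;
};

/* Called when request idx becomes inactive. */
static void mark_inactive(struct fd_hint *h, int idx)
{
	int prev = h->low_used_idx;

	assert(idx >= 0 && idx < NREQ);
	/*
	 * Take the new index if there is no hint, if the new index is
	 * lower, or if the hinted slot has been reused (hint went stale).
	 */
	if (prev < 0 || idx < prev || !h->inactive[prev])
		h->low_used_idx = idx;
	h->inactive[idx] = true;
}

/* Called when request idx is reused; the hint may now be stale. */
static void mark_active(struct fd_hint *h, int idx)
{
	assert(idx >= 0 && idx < NREQ);
	h->inactive[idx] = false;
}
```

For example, after slots 5, 2 and 6 become inactive the hint is 2; if
slot 2 is then reused and slot 7 becomes inactive, the hint moves to 7
even though 5 and 6 are still inactive, so a search starting at the
hint still needs its wrap-around pass.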

Replace the req_cnt atomic in struct sg_fd with the inactives atomic,
which counts inactive requests. With large queues, cycles were wasted
searching the request xarray for inactive requests when there were
none to be found.

Rename the sg_rq_state_chg_*() functions to sg_rq_chg_state_*()
since too many things start with "sg_rq_state". Also the new
function names emphasize the change part a little more.

Rename the struct request pointer in struct sg_request from rq to
rqq, and use the READ_ONCE() and WRITE_ONCE() macros whenever
sg_request::rqq is read or written.
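
For readers outside the kernel: READ_ONCE() and WRITE_ONCE() make
the compiler perform exactly one access to a variable that another
context may change concurrently. The sketch below uses simplified
user-space look-alikes built on volatile accesses; SKETCH_READ_ONCE,
SKETCH_WRITE_ONCE and struct sketch_sg_request are hypothetical, the
kernel macros add further type handling, and __typeof__ is a
GCC/Clang extension. It mirrors sg_rq_end_io() clearing srp->rqq
while other code takes a one-time snapshot of it. These volatile
accesses only constrain the compiler; they do not order other memory
accesses.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel macros (scalars/pointers only). */
#define SKETCH_READ_ONCE(x)	(*(const volatile __typeof__(x) *)&(x))
#define SKETCH_WRITE_ONCE(x, val) \
	do { *(volatile __typeof__(x) *)&(x) = (val); } while (0)

struct request;			/* opaque here, as in the sketch's view */

struct sketch_sg_request {
	struct request *rqq;	/* cleared by the completion path */
};

/* Completion side: drop the block-layer request pointer once. */
static void sketch_end_io(struct sketch_sg_request *srp)
{
	assert(srp != NULL);
	SKETCH_WRITE_ONCE(srp->rqq, NULL);
}

/* Reader side: take a single snapshot and use only that snapshot. */
static int sketch_has_rqq(struct sketch_sg_request *srp)
{
	struct request *rqq;

	assert(srp != NULL);
	rqq = SKETCH_READ_ONCE(srp->rqq);
	return rqq != NULL;
}
```

Taking one snapshot into a local variable avoids the compiler
re-reading srp->rqq between a NULL test and a later dereference.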

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 369 ++++++++++++++++++++++++++--------------------
 1 file changed, 209 insertions(+), 160 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index d342606c9632..8521782adf70 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -115,6 +115,7 @@ enum sg_rq_state {	/* N.B. sg_rq_state_arr assumes SG_RS_AWAIT_RCV==2 */
 #define SG_FRQ_DEACT_ORPHAN	6	/* not keeping orphan so de-activate */
 #define SG_FRQ_RECEIVING	7	/* guard against multiple receivers */
 #define SG_FRQ_FOR_MMAP		8	/* request needs PAGE_SIZE elements */
+#define SG_FRQ_COUNT_ACTIVE	9	/* sfp->submitted + waiting active */
 
 /* Bit positions (flags) for sg_fd::ffd_bm bitmask follow */
 #define SG_FFD_FORCE_PACKID	0	/* receive only given pack_id/tag */
@@ -213,7 +214,7 @@ struct sg_request {	/* active SCSI command or inactive request */
 	unsigned long frq_bm[1];	/* see SG_FRQ_* defines above */
 	u8 *sense_bp;		/* mempool alloc-ed sense buffer, as needed */
 	struct sg_fd *parentfp;	/* pointer to owning fd, even when on fl */
-	struct request *rq;	/* released in sg_rq_end_io(), bio kept */
+	struct request *rqq;	/* released in sg_rq_end_io(), bio kept */
 	struct bio *bio;	/* kept until this req -->SG_RS_INACTIVE */
 	struct execute_work ew_orph;	/* harvest orphan request */
 };
@@ -224,11 +225,12 @@ struct sg_fd {		/* holds the state of a file descriptor */
 	struct mutex f_mutex;	/* serialize ioctls on this fd */
 	int timeout;		/* defaults to SG_DEFAULT_TIMEOUT      */
 	int timeout_user;	/* defaults to SG_DEFAULT_TIMEOUT_USER */
-	int prev_used_idx;	/* previous used index */
+	int low_used_idx;	/* previous or lower used index */
+	int low_await_idx;	/* previous or lower await index */
 	u32 idx;		/* my index within parent's sfp_arr */
 	atomic_t submitted;	/* number inflight or awaiting receive */
 	atomic_t waiting;	/* number of requests awaiting receive */
-	atomic_t req_cnt;	/* number of requests */
+	atomic_t inactives;	/* number of inactive requests */
 	int sgat_elem_sz;	/* initialized to scatter_elem_sz */
 	int mmap_sz;		/* byte size of previous mmap() call */
 	unsigned long ffd_bm[1];	/* see SG_FFD_* defines above */
@@ -888,11 +890,13 @@ sg_rq_state_fail_msg(struct sg_fd *sfp, enum sg_rq_state exp_old_st,
 }
 #endif
 
+/* Functions ending in '_ulck' assume sfp->xa_lock held by caller. */
 static void
-sg_rq_state_force(struct sg_request *srp, enum sg_rq_state new_st)
+sg_rq_chg_state_force_ulck(struct sg_request *srp, enum sg_rq_state new_st)
 {
 	bool prev, want;
-	struct xarray *xafp = &srp->parentfp->srp_arr;
+	struct sg_fd *sfp = srp->parentfp;
+	struct xarray *xafp = &sfp->srp_arr;
 
 	atomic_set(&srp->rq_st, new_st);
 	want = (new_st == SG_RS_AWAIT_RCV);
@@ -906,15 +910,21 @@ sg_rq_state_force(struct sg_request *srp, enum sg_rq_state new_st)
 	want = (new_st == SG_RS_INACTIVE);
 	prev = xa_get_mark(xafp, srp->rq_idx, SG_XA_RQ_INACTIVE);
 	if (prev != want) {
-		if (want)
+		if (want) {
+			int prev_idx = READ_ONCE(sfp->low_used_idx);
+
+			if (prev_idx < 0 || srp->rq_idx < prev_idx ||
+			    !xa_get_mark(xafp, prev_idx, SG_XA_RQ_INACTIVE))
+				WRITE_ONCE(sfp->low_used_idx, srp->rq_idx);
 			__xa_set_mark(xafp, srp->rq_idx, SG_XA_RQ_INACTIVE);
-		else
+		} else {
 			__xa_clear_mark(xafp, srp->rq_idx, SG_XA_RQ_INACTIVE);
+		}
 	}
 }
 
 static void
-sg_rq_state_helper(struct xarray *xafp, struct sg_request *srp, int indic)
+sg_rq_chg_state_help(struct xarray *xafp, struct sg_request *srp, int indic)
 {
 	if (indic & 1)		/* from inactive state */
 		__xa_clear_mark(xafp, srp->rq_idx, SG_XA_RQ_INACTIVE);
@@ -943,45 +953,53 @@ static const int sg_rq_state_mul2arr[] = {2, 0, 8, 0};
  * spinlock is held.
  */
 static int
-sg_rq_state_chg(struct sg_request *srp, enum sg_rq_state old_st,
+sg_rq_chg_state(struct sg_request *srp, enum sg_rq_state old_st,
 		enum sg_rq_state new_st)
 {
 	enum sg_rq_state act_old_st;
-	int indic;
-	unsigned long iflags;
+	int indic = sg_rq_state_arr[(int)old_st] + sg_rq_state_mul2arr[(int)new_st];
 	struct sg_fd *sfp = srp->parentfp;
-	struct xarray *xafp = &sfp->srp_arr;
 
-	indic = sg_rq_state_arr[(int)old_st] +
-		sg_rq_state_mul2arr[(int)new_st];
-	act_old_st = (enum sg_rq_state)atomic_cmpxchg(&srp->rq_st, old_st,
-						      new_st);
-	if (act_old_st != old_st) {
-		SG_LOG(1, sfp, "%s: unexpected old state: %s\n", __func__,
-		       sg_rq_st_str(act_old_st, false));
-		return -EPROTOTYPE;	/* only used for this error type */
-	}
 	if (indic) {
+		unsigned long iflags;
+		struct xarray *xafp = &sfp->srp_arr;
+
 		xa_lock_irqsave(xafp, iflags);
-		if (new_st == SG_RS_INACTIVE)
-			WRITE_ONCE(sfp->prev_used_idx, srp->rq_idx);
-		sg_rq_state_helper(xafp, srp, indic);
+		act_old_st = (enum sg_rq_state)atomic_cmpxchg_relaxed(&srp->rq_st, old_st, new_st);
+		if (unlikely(act_old_st != old_st)) {
+			xa_unlock_irqrestore(xafp, iflags);
+			SG_LOG(1, sfp, "%s: unexpected old state: %s\n", __func__,
+			       sg_rq_st_str(act_old_st, false));
+			return -EPROTOTYPE;     /* only used for this error type */
+		}
+		if (new_st == SG_RS_INACTIVE) {
+			int prev_idx = READ_ONCE(sfp->low_used_idx);
+
+			if (prev_idx < 0 || srp->rq_idx < prev_idx ||
+			    !xa_get_mark(xafp, prev_idx, SG_XA_RQ_INACTIVE))
+				WRITE_ONCE(sfp->low_used_idx, srp->rq_idx);
+		}
+		sg_rq_chg_state_help(xafp, srp, indic);
 		xa_unlock_irqrestore(xafp, iflags);
+	} else {
+		act_old_st = (enum sg_rq_state)atomic_cmpxchg(&srp->rq_st, old_st, new_st);
+		if (unlikely(act_old_st != old_st)) {
+			SG_LOG(1, sfp, "%s: unexpected old state: %s\n", __func__,
+			       sg_rq_st_str(act_old_st, false));
+			return -EPROTOTYPE;     /* only used for this error type */
+		}
 	}
 	return 0;
 }
 
 static void
-sg_rq_state_chg_force(struct sg_request *srp, enum sg_rq_state new_st)
+sg_rq_chg_state_force(struct sg_request *srp, enum sg_rq_state new_st)
 {
 	unsigned long iflags;
-	struct sg_fd *sfp = srp->parentfp;
-	struct xarray *xafp = &sfp->srp_arr;
+	struct xarray *xafp = &srp->parentfp->srp_arr;
 
 	xa_lock_irqsave(xafp, iflags);
-	if (new_st == SG_RS_INACTIVE)
-		WRITE_ONCE(sfp->prev_used_idx, srp->rq_idx);
-	sg_rq_state_force(srp, new_st);
+	sg_rq_chg_state_force_ulck(srp, new_st);
 	xa_unlock_irqrestore(xafp, iflags);
 }
 
@@ -1006,12 +1024,14 @@ sg_execute_cmd(struct sg_fd *sfp, struct sg_request *srp)
 		at_head = !(srp->rq_flags & SG_FLAG_Q_AT_TAIL);
 
 	kref_get(&sfp->f_ref); /* sg_rq_end_io() does kref_put(). */
-	sg_rq_state_chg_force(srp, SG_RS_INFLIGHT);
+	sg_rq_chg_state_force(srp, SG_RS_INFLIGHT);
 
 	/* >>>>>>> send cmd/req off to other levels <<<<<<<< */
-	if (!sync)
+	if (!sync) {
 		atomic_inc(&sfp->submitted);
-	blk_execute_rq_nowait(sdp->disk, srp->rq, (int)at_head, sg_rq_end_io);
+		set_bit(SG_FRQ_COUNT_ACTIVE, srp->frq_bm);
+	}
+	blk_execute_rq_nowait(sdp->disk, READ_ONCE(srp->rqq), (int)at_head, sg_rq_end_io);
 }
 
 /*
@@ -1086,7 +1106,7 @@ sg_common_write(struct sg_comm_wr_t *cwrp)
 		res = -ENODEV;
 		goto err_out;
 	}
-	srp->rq->timeout = cwrp->timeout;
+	READ_ONCE(srp->rqq)->timeout = cwrp->timeout;
 	sg_execute_cmd(fp, srp);
 	return srp;
 err_out:
@@ -1654,7 +1674,7 @@ sg_fill_request_element(struct sg_fd *sfp, struct sg_request *srp,
 static inline bool
 sg_rq_landed(struct sg_device *sdp, struct sg_request *srp)
 {
-	return atomic_read(&srp->rq_st) != SG_RS_INFLIGHT ||
+	return atomic_read_acquire(&srp->rq_st) != SG_RS_INFLIGHT ||
 	       unlikely(SG_IS_DETACHING(sdp));
 }
 
@@ -1670,6 +1690,8 @@ sg_wait_event_srp(struct file *filp, struct sg_fd *sfp, void __user *p,
 	enum sg_rq_state sr_st;
 	struct sg_device *sdp = sfp->parentdp;
 
+	if (atomic_read(&srp->rq_st) != SG_RS_INFLIGHT)
+		goto skip_wait;		/* and skip _acquire() */
 	SG_LOG(3, sfp, "%s: about to wait_event...()\n", __func__);
 	/* usually will be woken up by sg_rq_end_io() callback */
 	res = wait_event_interruptible(sfp->read_wait,
@@ -1682,14 +1704,16 @@ sg_wait_event_srp(struct file *filp, struct sg_fd *sfp, void __user *p,
 		       __func__, res);
 		return res;
 	}
+skip_wait:
 	if (unlikely(SG_IS_DETACHING(sdp))) {
-		atomic_set(&srp->rq_st, SG_RS_INACTIVE);
+		sg_rq_chg_state_force(srp, SG_RS_INACTIVE);
+		atomic_inc(&sfp->inactives);
 		return -ENODEV;
 	}
 	sr_st = atomic_read(&srp->rq_st);
 	if (unlikely(sr_st != SG_RS_AWAIT_RCV))
 		return -EPROTO;         /* Logic error */
-	res = sg_rq_state_chg(srp, sr_st, SG_RS_BUSY);
+	res = sg_rq_chg_state(srp, sr_st, SG_RS_BUSY);
 	if (unlikely(res)) {
 #if IS_ENABLED(SG_LOG_ACTIVE)
 		sg_rq_state_fail_msg(sfp, sr_st, SG_RS_BUSY, __func__);
@@ -1785,7 +1809,6 @@ sg_set_reserved_sz(struct sg_fd *sfp, int want_rsv_sz)
 	n_srp = sg_mk_srp_sgat(sfp, true /* can take time */, new_sz);
 	if (IS_ERR(n_srp))
 		return PTR_ERR(n_srp);
-	sg_rq_state_force(n_srp, SG_RS_INACTIVE);
 	/* new sg_request object, sized correctly is now available */
 try_again:
 	o_srp = sfp->rsv_srp;
@@ -1816,7 +1839,8 @@ sg_set_reserved_sz(struct sg_fd *sfp, int want_rsv_sz)
 		cxc_srp = __xa_cmpxchg(xafp, idx, o_srp, n_srp, GFP_ATOMIC);
 		if (o_srp == cxc_srp) {
 			sfp->rsv_srp = n_srp;
-			sg_rq_state_force(n_srp, SG_RS_INACTIVE);
+			sg_rq_chg_state_force_ulck(n_srp, SG_RS_INACTIVE);
+			/* don't bump inactives, since replaced an inactive */
 			xa_unlock_irqrestore(xafp, iflags);
 			SG_LOG(6, sfp, "%s: new rsv srp=0x%pK ++\n", __func__,
 			       n_srp);
@@ -1994,7 +2018,10 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 		SG_LOG(3, sfp, "%s:    SG_GET_PACK_ID=%d\n", __func__, val);
 		return put_user(val, ip);
 	case SG_GET_NUM_WAITING:
-		return put_user(atomic_read(&sfp->waiting), ip);
+		val = atomic_read(&sfp->waiting);
+		if (val)
+			return put_user(val, ip);
+		return put_user(atomic_read_acquire(&sfp->waiting), ip);
 	case SG_GET_SG_TABLESIZE:
 		SG_LOG(3, sfp, "%s:    SG_GET_SG_TABLESIZE=%d\n", __func__,
 		       sdp->max_sgat_sz);
@@ -2207,11 +2234,16 @@ sg_compat_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg)
 static __poll_t
 sg_poll(struct file *filp, poll_table * wait)
 {
+	int num;
 	__poll_t p_res = 0;
 	struct sg_fd *sfp = filp->private_data;
 
-	poll_wait(filp, &sfp->read_wait, wait);
-	if (atomic_read(&sfp->waiting) > 0)
+	num = atomic_read(&sfp->waiting);
+	if (num < 1) {
+		poll_wait(filp, &sfp->read_wait, wait);
+		num = atomic_read(&sfp->waiting);
+	}
+	if (num > 0)
 		p_res = EPOLLIN | EPOLLRDNORM;
 
 	if (unlikely(SG_IS_DETACHING(sfp->parentdp)))
@@ -2415,7 +2447,7 @@ sg_check_sense(struct sg_device *sdp, struct sg_request *srp, int sense_len)
 {
 	int driver_stat;
 	u32 rq_res = srp->rq_result;
-	struct scsi_request *scsi_rp = scsi_req(srp->rq);
+	struct scsi_request *scsi_rp = scsi_req(READ_ONCE(srp->rqq));
 	u8 *sbp = scsi_rp ? scsi_rp->sense : NULL;
 
 	if (!sbp)
@@ -2450,36 +2482,18 @@ sg_check_sense(struct sg_device *sdp, struct sg_request *srp, int sense_len)
  * (sync) usage, sg_ctl_sg_io() waits to be woken up by this callback.
  */
 static void
-sg_rq_end_io(struct request *rq, blk_status_t status)
+sg_rq_end_io(struct request *rqq, blk_status_t status)
 {
 	enum sg_rq_state rqq_state = SG_RS_AWAIT_RCV;
 	int a_resid, slen;
-	struct sg_request *srp = rq->end_io_data;
-	struct scsi_request *scsi_rp = scsi_req(rq);
+	unsigned long iflags;
+	struct sg_request *srp = rqq->end_io_data;
+	struct scsi_request *scsi_rp = scsi_req(rqq);
 	struct sg_device *sdp;
 	struct sg_fd *sfp;
 
-	if (!scsi_rp) {
-		WARN_ONCE("%s: scsi_req(rq) unexpectedly NULL\n", __func__);
-		return;
-	}
-	if (!srp) {
-		WARN_ONCE("%s: srp unexpectedly NULL\n", __func__);
-		return;
-	}
-	if (WARN_ON(atomic_read(&srp->rq_st) != SG_RS_INFLIGHT)) {
-		pr_warn("%s: bad rq_st=%d\n", __func__,
-			atomic_read(&srp->rq_st));
-		goto early_err;
-	}
 	sfp = srp->parentfp;
-	if (unlikely(!sfp)) {
-		WARN_ONCE(1, "%s: sfp unexpectedly NULL", __func__);
-		goto early_err;
-	}
 	sdp = sfp->parentdp;
-	if (unlikely(SG_IS_DETACHING(sdp)))
-		pr_info("%s: device detaching\n", __func__);
 
 	srp->rq_result = scsi_rp->result;
 	slen = min_t(int, scsi_rp->sense_len, SCSI_SENSE_BUFFERSIZE);
@@ -2487,7 +2501,7 @@ sg_rq_end_io(struct request *rq, blk_status_t status)
 
 	if (a_resid) {
 		if (test_bit(SG_FRQ_IS_V4I, srp->frq_bm)) {
-			if (rq_data_dir(rq) == READ)
+			if (rq_data_dir(rqq) == READ)
 				srp->in_resid = a_resid;
 			else
 				srp->s_hdr4.out_resid = a_resid;
@@ -2532,17 +2546,29 @@ sg_rq_end_io(struct request *rq, blk_status_t status)
 			set_bit(SG_FRQ_DEACT_ORPHAN, srp->frq_bm);
 		}
 	}
-	if (!test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm))
-		atomic_inc(&sfp->waiting);
-	if (unlikely(sg_rq_state_chg(srp, SG_RS_INFLIGHT, rqq_state)))
-		pr_warn("%s: can't set rq_st\n", __func__);
+	xa_lock_irqsave(&sfp->srp_arr, iflags);
+	sg_rq_chg_state_force_ulck(srp, rqq_state);
+	WRITE_ONCE(srp->rqq, NULL);
+	if (test_bit(SG_FRQ_COUNT_ACTIVE, srp->frq_bm)) {
+		int num = atomic_inc_return(&sfp->waiting);
+
+		if (num < 2) {
+			WRITE_ONCE(sfp->low_await_idx, srp->rq_idx);
+		} else {
+			int l_await_idx = READ_ONCE(sfp->low_await_idx);
+
+			if (l_await_idx < 0 || srp->rq_idx < l_await_idx ||
+			    !xa_get_mark(&sfp->srp_arr, l_await_idx, SG_XA_RQ_AWAIT))
+				WRITE_ONCE(sfp->low_await_idx, srp->rq_idx);
+		}
+	}
+	xa_unlock_irqrestore(&sfp->srp_arr, iflags);
 	/*
 	 * Free the mid-level resources apart from the bio (if any). The bio's
 	 * blk_rq_unmap_user() can be called later from user context.
 	 */
-	srp->rq = NULL;
 	scsi_req_free_cmd(scsi_rp);
-	blk_put_request(rq);
+	blk_put_request(rqq);
 
 	if (likely(rqq_state == SG_RS_AWAIT_RCV)) {
 		/* Wake any sg_read()/ioctl(SG_IORECEIVE) awaiting this req */
@@ -2554,12 +2580,6 @@ sg_rq_end_io(struct request *rq, blk_status_t status)
 		schedule_work(&srp->ew_orph.work);
 	}
 	return;
-
-early_err:
-	srp->rq = NULL;
-	if (scsi_rp)
-		scsi_req_free_cmd(scsi_rp);
-	blk_put_request(rq);
 }
 
 static const struct file_operations sg_fops = {
@@ -2874,7 +2894,7 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 	u32 rq_flags = srp->rq_flags;
 	unsigned int iov_count = 0;
 	void __user *up;
-	struct request *rq;
+	struct request *rqq;
 	struct scsi_request *scsi_rp;
 	struct sg_fd *sfp = cwrp->sfp;
 	struct sg_device *sdp;
@@ -2930,14 +2950,14 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 	 * do not want to use BLK_MQ_REQ_NOWAIT here because userspace might
 	 * not expect an EWOULDBLOCK from this condition.
 	 */
-	rq = blk_get_request(q, (r0w ? REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN), 0);
-	if (IS_ERR(rq)) {
+	rqq = blk_get_request(q, (r0w ? REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN), 0);
+	if (IS_ERR(rqq)) {
 		kfree(long_cmdp);
-		return PTR_ERR(rq);
+		return PTR_ERR(rqq);
 	}
 	/* current sg_request protected by SG_RS_BUSY state */
-	scsi_rp = scsi_req(rq);
-	srp->rq = rq;
+	scsi_rp = scsi_req(rqq);
+	WRITE_ONCE(srp->rqq, rqq);
 
 	if (cwrp->cmd_len > BLK_MAX_CDB)
 		scsi_rp->cmd = long_cmdp;	/* transfer ownership */
@@ -2953,7 +2973,7 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 	us_xfer = !(rq_flags & (SG_FLAG_NO_DXFER | SG_FLAG_MMAP_IO));
 	assign_bit(SG_FRQ_NO_US_XFER, srp->frq_bm, !us_xfer);
 	reserved = (sfp->rsv_srp == srp);
-	rq->end_io_data = srp;
+	rqq->end_io_data = srp;
 	scsi_rp->retries = SG_DEFAULT_RETRIES;
 	req_schp = &srp->sgat_h;
 
@@ -3005,27 +3025,25 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 		}
 
 		if (us_xfer)
-			res = blk_rq_map_user_iov(q, rq, md, &i, GFP_ATOMIC);
+			res = blk_rq_map_user_iov(q, rqq, md, &i, GFP_ATOMIC);
 		kfree(iov);
 		if (IS_ENABLED(CONFIG_SCSI_PROC_FS))
 			cp = "iov_count > 0";
 	} else if (us_xfer) { /* setup for transfer data to/from user space */
-		res = blk_rq_map_user(q, rq, md, up, dxfer_len, GFP_ATOMIC);
+		res = blk_rq_map_user(q, rqq, md, up, dxfer_len, GFP_ATOMIC);
 		if (IS_ENABLED(CONFIG_SCSI_PROC_FS) && res)
 			SG_LOG(1, sfp, "%s: blk_rq_map_user() res=%d\n",
 			       __func__, res);
 	}
 fini:
 	if (unlikely(res)) {		/* failure, free up resources */
-		if (us_xfer && rq->bio)
-			blk_rq_unmap_user(rq->bio);
+		if (us_xfer && rqq->bio)
+			blk_rq_unmap_user(rqq->bio);
 		scsi_req_free_cmd(scsi_rp);
-		srp->rq = NULL;
-		if (us_xfer && rq->bio)
-			blk_rq_unmap_user(rq->bio);
-		blk_put_request(rq);
+		WRITE_ONCE(srp->rqq, NULL);
+		blk_put_request(rqq);
 	} else {
-		srp->bio = rq->bio;
+		srp->bio = rqq->bio;
 	}
 	SG_LOG((res ? 1 : 4), sfp, "%s: %s res=%d [0x%pK]\n", __func__, cp,
 	       res, srp);
@@ -3044,21 +3062,21 @@ sg_finish_scsi_blk_rq(struct sg_request *srp)
 {
 	int ret;
 	struct sg_fd *sfp = srp->parentfp;
-	struct request *rq = srp->rq;
+	struct request *rqq = READ_ONCE(srp->rqq);
 
 	SG_LOG(4, sfp, "%s: srp=0x%pK%s\n", __func__, srp,
 	       (srp->parentfp->rsv_srp == srp) ? " rsv" : "");
-	if (!test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm)) {
+	if (test_and_clear_bit(SG_FRQ_COUNT_ACTIVE, srp->frq_bm)) {
 		atomic_dec(&sfp->submitted);
 		atomic_dec(&sfp->waiting);
 	}
 
-	/* Expect blk_put_request(rq) already called in sg_rq_end_io() */
-	if (rq) {       /* blk_get_request() may have failed */
-		srp->rq = NULL;
-		if (scsi_req(rq))
-			scsi_req_free_cmd(scsi_req(rq));
-		blk_put_request(rq);
+	/* Expect blk_put_request(rqq) already called in sg_rq_end_io() */
+	if (rqq) {       /* blk_get_request() may have failed */
+		WRITE_ONCE(srp->rqq, NULL);
+		if (scsi_req(rqq))
+			scsi_req_free_cmd(scsi_req(rqq));
+		blk_put_request(rqq);
 	}
 	if (srp->bio) {
 		bool us_xfer = !test_bit(SG_FRQ_NO_US_XFER, srp->frq_bm);
@@ -3240,19 +3258,30 @@ sg_find_srp_by_id(struct sg_fd *sfp, int pack_id)
 	bool second = false;
 	int res;
 	int num_waiting = atomic_read(&sfp->waiting);
-	unsigned long idx, start_idx, end_idx;
+	int l_await_idx = READ_ONCE(sfp->low_await_idx);
+	unsigned long idx, s_idx;
+	unsigned long end_idx = ULONG_MAX;
 	struct sg_request *srp = NULL;
 	struct xarray *xafp = &sfp->srp_arr;
 
-	if (num_waiting < 1)
-		return NULL;
+	if (num_waiting < 1) {
+		num_waiting = atomic_read_acquire(&sfp->waiting);
+		if (num_waiting < 1)
+			return NULL;
+	}
+
+	s_idx = (l_await_idx < 0) ? 0 : l_await_idx;
+	idx = s_idx;
 	if (unlikely(search_for_1)) {
-		xa_for_each_marked(xafp, idx, srp, SG_XA_RQ_AWAIT) {
+second_time:
+		for (srp = xa_find(xafp, &idx, end_idx, SG_XA_RQ_AWAIT);
+		     srp;
+		     srp = xa_find_after(xafp, &idx, end_idx, SG_XA_RQ_AWAIT)) {
 			if (test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm))
 				continue;
 			if (srp->pack_id != pack_id)
 				continue;
-			res = sg_rq_state_chg(srp, SG_RS_AWAIT_RCV, SG_RS_BUSY);
+			res = sg_rq_chg_state(srp, SG_RS_AWAIT_RCV, SG_RS_BUSY);
 			if (likely(res == 0))
 				goto good;
 			/* else another caller got it, move on */
@@ -3262,6 +3291,14 @@ sg_find_srp_by_id(struct sg_fd *sfp, int pack_id)
 			}
 			break;
 		}
+		/* If not found so far, need to wrap around and search [0 ... s_idx) */
+		if (!srp && !second && s_idx > 0) {
+			end_idx = s_idx - 1;
+			s_idx = 0;
+			idx = s_idx;
+			second = true;
+			goto second_time;
+		}
 	} else {
 		/*
 		 * Searching for _any_ request is the more likely usage. Start searching with the
@@ -3270,29 +3307,27 @@ sg_find_srp_by_id(struct sg_fd *sfp, int pack_id)
 		 * is ready. If there is no queuing and the "last used" has been re-used then the
 		 * first (relative) position will be the request we want.
 		 */
-		start_idx = READ_ONCE(sfp->prev_used_idx);
-		end_idx = ULONG_MAX;
-second_time:
-		idx = start_idx;
+second_time2:
 		for (srp = xa_find(xafp, &idx, end_idx, SG_XA_RQ_AWAIT);
 		     srp;
 		     srp = xa_find_after(xafp, &idx, end_idx, SG_XA_RQ_AWAIT)) {
 			if (test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm))
 				continue;
-			res = sg_rq_state_chg(srp, SG_RS_AWAIT_RCV, SG_RS_BUSY);
-			if (likely(res == 0))
+			res = sg_rq_chg_state(srp, SG_RS_AWAIT_RCV, SG_RS_BUSY);
+			if (likely(res == 0)) {
+				WRITE_ONCE(sfp->low_await_idx, idx + 1);
 				goto good;
+			}
 #if IS_ENABLED(SG_LOG_ACTIVE)
-			else
-				sg_rq_state_fail_msg(sfp, SG_RS_AWAIT_RCV, SG_RS_BUSY, __func__);
+			sg_rq_state_fail_msg(sfp, SG_RS_AWAIT_RCV, SG_RS_BUSY, __func__);
 #endif
 		}
-		/* If not found so far, need to wrap around and search [0 ... start_idx) */
-		if (!srp && !second && start_idx > 0) {
-			end_idx = start_idx - 1;
-			start_idx = 0;
+		if (!srp && !second && s_idx > 0) {
+			end_idx = s_idx - 1;
+			s_idx = 0;
+			idx = s_idx;
 			second = true;
-			goto second_time;
+			goto second_time2;
 		}
 	}
 	/* here if one of above loops does _not_ find a match */
@@ -3411,11 +3446,12 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, int dxfr_len)
 	bool mk_new_srp = true;
 	bool try_harder = false;
 	bool second = false;
-	int num_inactive = 0;
-	unsigned long idx, start_idx, end_idx, iflags;
+	bool has_inactive = false;
+	int l_used_idx;
+	unsigned long idx, s_idx, end_idx, iflags;
 	struct sg_fd *fp = cwrp->sfp;
 	struct sg_request *r_srp = NULL;	/* request to return */
-	struct sg_request *last_srp = NULL;
+	struct sg_request *low_srp = NULL;
 	struct xarray *xafp = &fp->srp_arr;
 	__maybe_unused const char *cp;
 
@@ -3424,49 +3460,70 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, int dxfr_len)
 	if (xa_empty(xafp)) {
 		act_empty = true;
 		mk_new_srp = true;
+	} else if (atomic_read(&fp->inactives) <= 0) {
+		mk_new_srp = true;
 	} else if (!try_harder && dxfr_len < SG_DEF_SECTOR_SZ) {
+		l_used_idx = READ_ONCE(fp->low_used_idx);
+		s_idx = (l_used_idx < 0) ? 0 : l_used_idx;
+		if (l_used_idx >= 0 && xa_get_mark(xafp, s_idx, SG_XA_RQ_INACTIVE)) {
+			r_srp = xa_load(xafp, s_idx);
+			if (r_srp && r_srp->sgat_h.buflen <= SG_DEF_SECTOR_SZ) {
+				if (sg_rq_chg_state(r_srp, SG_RS_INACTIVE, SG_RS_BUSY) == 0) {
+					found = true;
+					atomic_dec(&fp->inactives);
+					goto have_existing;
+				}
+			}
+		}
 		xa_for_each_marked(xafp, idx, r_srp, SG_XA_RQ_INACTIVE) {
-			++num_inactive;
-			if (dxfr_len < SG_DEF_SECTOR_SZ)
-				last_srp = r_srp;
+			has_inactive = true;
+			if (!low_srp && dxfr_len < SG_DEF_SECTOR_SZ) {
+				low_srp = r_srp;
+				break;
+			}
 		}
-		/* If dxfr_len is small, use last inactive request */
-		if (last_srp) {
-			r_srp = last_srp;
-			if (sg_rq_state_chg(r_srp, SG_RS_INACTIVE, SG_RS_BUSY))
+		/* If dxfr_len is small, use lowest inactive request */
+		if (low_srp) {
+			r_srp = low_srp;
+			if (sg_rq_chg_state(r_srp, SG_RS_INACTIVE, SG_RS_BUSY))
 				goto start_again; /* gone to another thread */
+			atomic_dec(&fp->inactives);
 			cp = "toward end of srp_arr";
 			found = true;
 		}
 	} else {
-		start_idx = READ_ONCE(fp->prev_used_idx);
+		l_used_idx = READ_ONCE(fp->low_used_idx);
+		s_idx = (l_used_idx < 0) ? 0 : l_used_idx;
+		idx = s_idx;
 		end_idx = ULONG_MAX;
 second_time:
-		idx = start_idx;
 		for (r_srp = xa_find(xafp, &idx, end_idx, SG_XA_RQ_INACTIVE);
 		     r_srp;
 		     r_srp = xa_find_after(xafp, &idx, end_idx, SG_XA_RQ_INACTIVE)) {
 			if (r_srp->sgat_h.buflen >= dxfr_len) {
-				if (sg_rq_state_chg(r_srp, SG_RS_INACTIVE, SG_RS_BUSY))
+				if (sg_rq_chg_state(r_srp, SG_RS_INACTIVE, SG_RS_BUSY))
 					continue;
+				atomic_dec(&fp->inactives);
+				WRITE_ONCE(fp->low_used_idx, idx + 1);
 				cp = "near front of srp_arr";
 				found = true;
 				break;
 			}
 		}
 		/* If not found so far, need to wrap around and search [0 ... start_idx) */
-		if (!r_srp && !second && start_idx > 0) {
-			end_idx = start_idx - 1;
-			start_idx = 0;
+		if (!r_srp && !second && s_idx > 0) {
+			end_idx = s_idx - 1;
+			s_idx = 0;
+			idx = s_idx;
 			second = true;
 			goto second_time;
 		}
 	}
+have_existing:
 	if (found) {
 		r_srp->in_resid = 0;
 		r_srp->rq_info = 0;
 		r_srp->sense_len = 0;
-		WRITE_ONCE(fp->prev_used_idx, r_srp->rq_idx);
 		mk_new_srp = false;
 	} else {
 		mk_new_srp = true;
@@ -3475,7 +3532,6 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, int dxfr_len)
 		bool allow_cmd_q = test_bit(SG_FFD_CMD_Q, fp->ffd_bm);
 		int res;
 		u32 n_idx;
-		struct xa_limit xal = { .max = 0, .min = 0 };
 
 		cp = "new";
 		if (!allow_cmd_q && atomic_read(&fp->submitted) > 0) {
@@ -3487,16 +3543,14 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, int dxfr_len)
 		r_srp = sg_mk_srp_sgat(fp, act_empty, dxfr_len);
 		if (IS_ERR(r_srp)) {
 			if (!try_harder && dxfr_len < SG_DEF_SECTOR_SZ &&
-			    num_inactive > 0) {
+			    has_inactive) {
 				try_harder = true;
 				goto start_again;
 			}
 			goto fini;
 		}
-		atomic_set(&r_srp->rq_st, SG_RS_BUSY);
 		xa_lock_irqsave(xafp, iflags);
-		xal.max = atomic_inc_return(&fp->req_cnt);
-		res = __xa_alloc(xafp, &n_idx, r_srp, xal, GFP_KERNEL);
+		res = __xa_alloc(xafp, &n_idx, r_srp, xa_limit_32b, GFP_KERNEL);
 		xa_unlock_irqrestore(xafp, iflags);
 		if (res < 0) {
 			SG_LOG(1, fp, "%s: xa_alloc() failed, errno=%d\n",
@@ -3539,7 +3593,8 @@ sg_deact_request(struct sg_fd *sfp, struct sg_request *srp)
 	sbp = srp->sense_bp;
 	srp->sense_bp = NULL;
 	WRITE_ONCE(srp->frq_bm[0], 0);
-	sg_rq_state_chg_force(srp, SG_RS_INACTIVE);
+	sg_rq_chg_state_force(srp, SG_RS_INACTIVE);
+	atomic_inc(&sfp->inactives);
 	/* maybe orphaned req, thus never read */
 	if (sbp)
 		mempool_free(sbp, sg_sense_pool);
@@ -3558,7 +3613,6 @@ sg_add_sfp(struct sg_device *sdp)
 	struct sg_request *srp = NULL;
 	struct xarray *xadp = &sdp->sfp_arr;
 	struct xarray *xafp;
-	struct xa_limit xal;
 
 	sfp = kzalloc(sizeof(*sfp), GFP_ATOMIC | __GFP_NOWARN);
 	if (!sfp)
@@ -3575,8 +3629,6 @@ sg_add_sfp(struct sg_device *sdp)
 	__assign_bit(SG_FFD_CMD_Q, sfp->ffd_bm, SG_DEF_COMMAND_Q);
 	__assign_bit(SG_FFD_KEEP_ORPHAN, sfp->ffd_bm, SG_DEF_KEEP_ORPHAN);
 	__assign_bit(SG_FFD_Q_AT_TAIL, sfp->ffd_bm, SG_DEFAULT_Q_AT);
-	atomic_set(&sfp->submitted, 0);
-	atomic_set(&sfp->waiting, 0);
 	/*
 	 * SG_SCATTER_SZ initializes scatter_elem_sz but different value may
 	 * be given as driver/module parameter (e.g. 'scatter_elem_sz=8192').
@@ -3588,7 +3640,7 @@ sg_add_sfp(struct sg_device *sdp)
 	sfp->parentdp = sdp;
 	atomic_set(&sfp->submitted, 0);
 	atomic_set(&sfp->waiting, 0);
-	atomic_set(&sfp->req_cnt, 0);
+	atomic_set(&sfp->inactives, 0);
 
 	if (unlikely(SG_IS_DETACHING(sdp))) {
 		SG_LOG(1, sfp, "%s: detaching\n", __func__);
@@ -3600,8 +3652,6 @@ sg_add_sfp(struct sg_device *sdp)
 
 	rbuf_len = min_t(int, sg_big_buff, sdp->max_sgat_sz);
 	if (rbuf_len > 0) {
-		struct xa_limit xalrq = { .max = 0, .min = 0 };
-
 		srp = sg_build_reserve(sfp, rbuf_len);
 		if (IS_ERR(srp)) {
 			err = PTR_ERR(srp);
@@ -3617,8 +3667,7 @@ sg_add_sfp(struct sg_device *sdp)
 			       __func__, rbuf_len, srp->sgat_h.buflen);
 		}
 		xa_lock_irqsave(xafp, iflags);
-		xalrq.max = atomic_inc_return(&sfp->req_cnt);
-		res = __xa_alloc(xafp, &idx, srp, xalrq, GFP_ATOMIC);
+		res = __xa_alloc(xafp, &idx, srp, xa_limit_32b, GFP_ATOMIC);
 		xa_unlock_irqrestore(xafp, iflags);
 		if (res < 0) {
 			SG_LOG(1, sfp, "%s: xa_alloc(srp) bad, errno=%d\n",
@@ -3630,20 +3679,19 @@ sg_add_sfp(struct sg_device *sdp)
 		}
 		srp->rq_idx = idx;
 		srp->parentfp = sfp;
-		sg_rq_state_chg_force(srp, SG_RS_INACTIVE);
+		sg_rq_chg_state_force(srp, SG_RS_INACTIVE);
+		atomic_inc(&sfp->inactives);
 	}
 	if (!reduced) {
 		SG_LOG(4, sfp, "%s: built reserve buflen=%d\n", __func__,
 		       rbuf_len);
 	}
 	xa_lock_irqsave(xadp, iflags);
-	xal.min = 0;
-	xal.max = atomic_read(&sdp->open_cnt);
-	res = __xa_alloc(xadp, &idx, sfp, xal, GFP_KERNEL);
+	res = __xa_alloc(xadp, &idx, sfp, xa_limit_32b, GFP_KERNEL);
 	xa_unlock_irqrestore(xadp, iflags);
 	if (res < 0) {
 		pr_warn("%s: xa_alloc(sdp) bad, o_count=%d, errno=%d\n",
-			__func__, xal.max, -res);
+			__func__, atomic_read(&sdp->open_cnt), -res);
 		if (srp) {
 			sg_remove_sgat(srp);
 			kfree(srp);
@@ -4055,12 +4103,13 @@ sg_proc_debug_fd(struct sg_fd *fp, char *obp, int len, unsigned long idx)
 		       (int)test_bit(SG_FFD_FORCE_PACKID, fp->ffd_bm),
 		       (int)test_bit(SG_FFD_KEEP_ORPHAN, fp->ffd_bm),
 		       fp->ffd_bm[0]);
-	n += scnprintf(obp + n, len - n, "   mmap_sz=%d prev_used_idx=%d\n",
-		       fp->mmap_sz, fp->prev_used_idx);
 	n += scnprintf(obp + n, len - n,
-		       "   submitted=%d waiting=%d   open thr_id=%d\n",
+		       "   mmap_sz=%d low_used_idx=%d low_await_idx=%d\n",
+		       fp->mmap_sz, READ_ONCE(fp->low_used_idx), READ_ONCE(fp->low_await_idx));
+	n += scnprintf(obp + n, len - n,
+		       "   submitted=%d waiting=%d inactives=%d   open thr_id=%d\n",
 		       atomic_read(&fp->submitted),
-		       atomic_read(&fp->waiting), fp->tid);
+		       atomic_read(&fp->waiting), atomic_read(&fp->inactives), fp->tid);
 	k = 0;
 	xa_lock_irqsave(&fp->srp_arr, iflags);
 	xa_for_each(&fp->srp_arr, idx, srp) {
-- 
2.25.1
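
The lowest-index hint that this patch adds to sg_find_srp_by_id() drives a two-pass search. It scans from sfp->low_await_idx to the end of srp_arr, then wraps around to cover [0, s_idx). The sketch below is a user-space model, not kernel code. A bool array stands in for the xarray and its SG_XA_RQ_AWAIT mark, and the names find_marked() and wrap_search() are hypothetical. It follows the "any request" path, which advances the hint to idx + 1 after a successful claim.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * User-space model (not kernel code) of the two-pass search in
 * sg_find_srp_by_id(). 'marked' stands in for the xarray entries carrying
 * SG_XA_RQ_AWAIT; '*low_idx' plays the role of sfp->low_await_idx
 * (negative means "no hint").
 */
static long find_marked(const bool *marked, size_t n, long start, long end)
{
	for (long i = start; i <= end && i < (long)n; ++i)
		if (marked[i])
			return i;
	return -1;
}

/* Returns the index claimed, or -1 if none; updates *low_idx on success. */
static long wrap_search(bool *marked, size_t n, long *low_idx)
{
	long s_idx = (*low_idx < 0) ? 0 : *low_idx;
	long idx;

	assert(n > 0);
	idx = find_marked(marked, n, s_idx, (long)n - 1);	/* [s_idx .. end] */
	if (idx < 0 && s_idx > 0)
		idx = find_marked(marked, n, 0, s_idx - 1);	/* wrap: [0 .. s_idx) */
	if (idx >= 0) {
		marked[idx] = false;	/* models the AWAIT_RCV -> BUSY claim */
		*low_idx = idx + 1;	/* next search starts just past this one */
	}
	return idx;
}
```

Starting at the hint presumably keeps the common case short when completions arrive roughly in submission order, while the wrap-around pass ensures no marked entry is missed.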



* [PATCH v18 42/83] sg: remove unit attention check for device changed
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
                   ` (41 preceding siblings ...)
  2021-04-27 21:56 ` [PATCH v18 41/83] sg: track lowest inactive and await indexes Douglas Gilbert
@ 2021-04-27 21:56 ` Douglas Gilbert
  2021-04-27 21:56 ` [PATCH v18 43/83] sg: no_dxfer: move to/from kernel buffers Douglas Gilbert
                   ` (40 subsequent siblings)
  83 siblings, 0 replies; 86+ messages in thread
From: Douglas Gilbert @ 2021-04-27 21:56 UTC
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

The SCSI mid-layer now checks for SCSI UNIT ATTENTIONs and takes
the appropriate actions. This means that the sg driver no longer
needs to do this check.
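
For reference, the check being removed works like this. It normalizes the sense data, skips deferred errors, and marks a removable device as changed when the sense key is UNIT ATTENTION. Below is a minimal user-space sketch of that classification step. It assumes the SPC sense-data layouts: response codes 0x70/0x71 use fixed format with the sense key in byte 2, and 0x72/0x73 use descriptor format with it in byte 1. is_current_unit_attention() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define UA_SENSE_KEY 0x6	/* UNIT ATTENTION sense key value (SPC) */

/*
 * Returns true only for current (not deferred) sense data whose sense key
 * is UNIT ATTENTION. Hypothetical helper modelled on what the removed
 * sg_check_sense() relied on scsi_normalize_sense() for.
 */
static bool is_current_unit_attention(const uint8_t *sb, size_t len)
{
	uint8_t sk;

	assert(sb != NULL);
	if (len < 2)
		return false;
	switch (sb[0] & 0x7f) {		/* response code (bit 7 is VALID in fixed format) */
	case 0x70:			/* fixed format, current error */
		if (len < 3)
			return false;
		sk = sb[2] & 0x0f;
		break;
	case 0x72:			/* descriptor format, current error */
		sk = sb[1] & 0x0f;
		break;
	default:			/* 0x71/0x73 are deferred errors; others invalid */
		return false;
	}
	return sk == UA_SENSE_KEY;
}
```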

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 49 ++++++++++++-----------------------------------
 1 file changed, 12 insertions(+), 37 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 8521782adf70..1e187ecbf0ae 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -2442,39 +2442,6 @@ sg_rq_end_io_usercontext(struct work_struct *work)
 	kref_put(&sfp->f_ref, sg_remove_sfp);
 }
 
-static void
-sg_check_sense(struct sg_device *sdp, struct sg_request *srp, int sense_len)
-{
-	int driver_stat;
-	u32 rq_res = srp->rq_result;
-	struct scsi_request *scsi_rp = scsi_req(READ_ONCE(srp->rqq));
-	u8 *sbp = scsi_rp ? scsi_rp->sense : NULL;
-
-	if (!sbp)
-		return;
-	driver_stat = driver_byte(rq_res);
-	if (driver_stat & DRIVER_SENSE) {
-		struct scsi_sense_hdr ssh;
-
-		if (scsi_normalize_sense(sbp, sense_len, &ssh)) {
-			if (!scsi_sense_is_deferred(&ssh)) {
-				if (ssh.sense_key == UNIT_ATTENTION) {
-					if (sdp->device->removable)
-						sdp->device->changed = 1;
-				}
-			}
-		}
-	}
-	if (test_bit(SG_FDEV_LOG_SENSE, sdp->fdev_bm) > 0) {
-		int scsi_stat = rq_res & 0xff;
-
-		if (scsi_stat == SAM_STAT_CHECK_CONDITION ||
-		    scsi_stat == SAM_STAT_COMMAND_TERMINATED)
-			__scsi_print_sense(sdp->device, __func__, sbp,
-					   sense_len);
-	}
-}
-
 /*
  * This "bottom half" (soft interrupt) handler is called by the mid-level
  * when a request has completed or failed. This callback is registered in a
@@ -2486,6 +2453,7 @@ sg_rq_end_io(struct request *rqq, blk_status_t status)
 {
 	enum sg_rq_state rqq_state = SG_RS_AWAIT_RCV;
 	int a_resid, slen;
+	u32 rq_result;
 	unsigned long iflags;
 	struct sg_request *srp = rqq->end_io_data;
 	struct scsi_request *scsi_rp = scsi_req(rqq);
@@ -2495,7 +2463,8 @@ sg_rq_end_io(struct request *rqq, blk_status_t status)
 	sfp = srp->parentfp;
 	sdp = sfp->parentdp;
 
-	srp->rq_result = scsi_rp->result;
+	rq_result = scsi_rp->result;
+	srp->rq_result = rq_result;
 	slen = min_t(int, scsi_rp->sense_len, SCSI_SENSE_BUFFERSIZE);
 	a_resid = scsi_rp->resid_len;
 
@@ -2511,10 +2480,16 @@ sg_rq_end_io(struct request *rqq, blk_status_t status)
 	}
 
 	SG_LOG(6, sfp, "%s: pack_id=%d, res=0x%x\n", __func__, srp->pack_id,
-	       srp->rq_result);
+	       rq_result);
 	srp->duration = sg_calc_rq_dur(srp);
-	if (unlikely((srp->rq_result & SG_ML_RESULT_MSK) && slen > 0))
-		sg_check_sense(sdp, srp, slen);
+	if (unlikely((rq_result & SG_ML_RESULT_MSK) && slen > 0 &&
+		     test_bit(SG_FDEV_LOG_SENSE, sdp->fdev_bm))) {
+		u32 scsi_stat = rq_result & 0xff;
+
+		if (scsi_stat == SAM_STAT_CHECK_CONDITION ||
+		    scsi_stat == SAM_STAT_COMMAND_TERMINATED)
+			__scsi_print_sense(sdp->device, __func__, scsi_rp->sense, slen);
+	}
 	if (slen > 0) {
 		if (scsi_rp->sense && !srp->sense_bp) {
 			srp->sense_bp = mempool_alloc(sg_sense_pool,
-- 
2.25.1



* [PATCH v18 43/83] sg: no_dxfer: move to/from kernel buffers
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
                   ` (42 preceding siblings ...)
  2021-04-27 21:56 ` [PATCH v18 42/83] sg: remove unit attention check for device changed Douglas Gilbert
@ 2021-04-27 21:56 ` Douglas Gilbert
  2021-04-28  7:07   ` Hannes Reinecke
  2021-04-27 21:56 ` [PATCH v18 44/83] sg: add blk_poll support Douglas Gilbert
                   ` (39 subsequent siblings)
  83 siblings, 1 reply; 86+ messages in thread
From: Douglas Gilbert @ 2021-04-27 21:56 UTC
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

When the NO_DXFER flag is used on a command/request, the data-in
and data-out buffers (if present) should still be moved to or from
the driver's kernel buffers rather than ignored. Add the
sg_rq_map_kern() function to handle this. It uses a single bio with
multiple bvecs, each usually holding multiple pages if necessary.
The driver's default element size is 32 KiB, so if PAGE_SIZE is 4096
then get_order() returns 3.
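
The get_order() figure quoted above, and the bvec loop in sg_rq_map_kern(), can be checked with a small user-space model. It assumes 4096-byte pages (MODEL_PAGE_SHIFT is 12). model_get_order() re-implements the rounding-up behaviour of the kernel's get_order(). model_bvec_count() mirrors the loop that adds one element per bvec until dlen is consumed. Both names are hypothetical.

```c
#include <assert.h>

#define MODEL_PAGE_SHIFT 12	/* assumption: 4096-byte pages */

/* Smallest order such that (PAGE_SIZE << order) >= size; size must be > 0. */
static int model_get_order(unsigned long size)
{
	int order = 0;

	assert(size > 0);
	size = (size - 1) >> MODEL_PAGE_SHIFT;
	while (size) {
		++order;
		size >>= 1;
	}
	return order;
}

/* Mirrors the sg_rq_map_kern() loop: one bvec per element until dlen is used. */
static int model_bvec_count(int dlen, int num_sgat, int page_order)
{
	int pg_sz = 1 << (MODEL_PAGE_SHIFT + page_order);
	int k, ln;

	for (k = 0; k < num_sgat && dlen > 0; ++k, dlen -= ln)
		ln = (dlen < pg_sz) ? dlen : pg_sz;	/* last element may be partial */
	return k;
}
```

With the default 32 KiB elements (order 3), a 100000-byte transfer spread over four elements needs four bvecs: three full 32768-byte ones and a final 1696-byte one.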

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 1e187ecbf0ae..9e93047bcb0f 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -2848,6 +2848,63 @@ exit_sg(void)
 	idr_destroy(&sg_index_idr);
 }
 
+static struct bio *
+sg_mk_kern_bio(int bvec_cnt)
+{
+	struct bio *biop;
+
+	if (bvec_cnt > BIO_MAX_PAGES)
+		return NULL;
+	biop = bio_alloc(GFP_ATOMIC, bvec_cnt);
+	if (!biop)
+		return NULL;
+	biop->bi_end_io = bio_put;
+	return biop;
+}
+
+/*
+ * Setup to move data between kernel buffers managed by this driver and a SCSI device. Note that
+ * there is no corresponding 'unmap' call as is required by blk_rq_map_user() . Uses a single
+ * bio with an expanded appended bvec if necessary.
+ */
+static int
+sg_rq_map_kern(struct sg_request *srp, struct request_queue *q, struct request *rqq, int rw_ind)
+{
+	struct sg_scatter_hold *schp = &srp->sgat_h;
+	struct bio *bio;
+	int k, ln;
+	int op_flags = 0;
+	int num_sgat = schp->num_sgat;
+	int dlen = schp->dlen;
+	int pg_sz = 1 << (PAGE_SHIFT + schp->page_order);
+	int num_segs = (1 << schp->page_order) * num_sgat;
+	int res = 0;
+
+	SG_LOG(4, srp->parentfp, "%s: dlen=%d, pg_sz=%d\n", __func__, dlen, pg_sz);
+	if (num_sgat <= 0)
+		return 0;
+	if (rw_ind == WRITE)
+		op_flags = REQ_SYNC | REQ_IDLE;
+	bio = sg_mk_kern_bio(num_sgat);
+	if (!bio)
+		return -ENOMEM;
+	bio->bi_opf = req_op(rqq) | op_flags;
+
+	for (k = 0; k < num_sgat && dlen > 0; ++k, dlen -= ln) {
+		ln = min_t(int, dlen, pg_sz);
+		if (bio_add_pc_page(q, bio, schp->pages[k], ln, 0) < ln) {
+			bio_put(bio);
+			return -EINVAL;
+		}
+	}
+	res = blk_rq_append_bio(rqq, &bio);
+	if (unlikely(res))
+		bio_put(bio);
+	else
+		rqq->nr_phys_segments = num_segs;
+	return res;
+}
+
 static inline void
 sg_set_map_data(const struct sg_scatter_hold *schp, bool up_valid,
 		struct rq_map_data *mdp)
@@ -3009,6 +3066,8 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 		if (IS_ENABLED(CONFIG_SCSI_PROC_FS) && res)
 			SG_LOG(1, sfp, "%s: blk_rq_map_user() res=%d\n",
 			       __func__, res);
+	} else {	/* transfer data to/from kernel buffers */
+		res = sg_rq_map_kern(srp, q, rqq, r0w);
 	}
 fini:
 	if (unlikely(res)) {		/* failure, free up resources */
-- 
2.25.1



* [PATCH v18 44/83] sg: add blk_poll support
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
                   ` (43 preceding siblings ...)
  2021-04-27 21:56 ` [PATCH v18 43/83] sg: no_dxfer: move to/from kernel buffers Douglas Gilbert
@ 2021-04-27 21:56 ` Douglas Gilbert
  2021-04-27 21:56 ` [PATCH v18 45/83] sg: bump version to 4.0.12 Douglas Gilbert
                   ` (38 subsequent siblings)
  83 siblings, 0 replies; 86+ messages in thread
From: Douglas Gilbert @ 2021-04-27 21:56 UTC
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

Support is added via the new SGV4_FLAG_HIPRI command flag, which
causes REQ_HIPRI to be set on the request. Before waiting on an
inflight request, the driver checks whether it has SGV4_FLAG_HIPRI
set and, if so, calls blk_poll() instead of waiting. In situations
where only the file descriptor is known (e.g. sg_poll() and
ioctl(SG_GET_NUM_WAITING)), blk_poll() is called on every inflight
request associated with that file descriptor that has
SGV4_FLAG_HIPRI set.
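
In the fd-wide path, sg_sfp_blk_poll() uses a four-part test to choose which requests to poll. Below is a simplified model of that filter. Plain bools stand in for the SGV4_FLAG_HIPRI flag, the SG_FRQ_SYNC_INVOC and SG_FRQ_ISSUED bits, and the SG_RS_INFLIGHT state. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified model of the per-request fields tested by sg_sfp_blk_poll(). */
struct pq_req {
	bool hipri;	/* SGV4_FLAG_HIPRI given at submission */
	bool sync;	/* SG_FRQ_SYNC_INVOC: synchronous submission */
	bool inflight;	/* rq_st == SG_RS_INFLIGHT */
	bool issued;	/* SG_FRQ_ISSUED: blk_execute_rq_nowait() has returned */
};

static bool pq_should_poll(const struct pq_req *r)
{
	assert(r != NULL);
	return r->hipri && !r->sync && r->inflight && r->issued;
}

/* Counts how many requests on a (modelled) fd would have blk_poll() called. */
static size_t pq_count_pollable(const struct pq_req *arr, size_t n)
{
	size_t i, cnt = 0;

	for (i = 0; i < n; ++i)
		if (pq_should_poll(&arr[i]))
			++cnt;
	return cnt;
}
```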

It is important to know that blk_execute_rq_nowait() has finished
before calling blk_poll() on that request. The SG_RS_INFLIGHT state
is set just before blk_execute_rq_nowait() is called, so a new bit,
SG_FRQ_ISSUED, has been added; it is set just after that call
returns.

Note that the implementation of blk_poll() calls mq_poll() in the
LLD associated with the request. Then, for any request found to be
ready, blk_poll() invokes the scsi_done() callback. When blk_poll()
returns > 0, sg_rq_end_io() may have been called on the given
request. If so, the given request will be in the await_rcv state.

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c      | 107 ++++++++++++++++++++++++++++++++++++++---
 include/uapi/scsi/sg.h |   1 +
 2 files changed, 102 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 9e93047bcb0f..bf83b07236f8 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -116,12 +116,14 @@ enum sg_rq_state {	/* N.B. sg_rq_state_arr assumes SG_RS_AWAIT_RCV==2 */
 #define SG_FRQ_RECEIVING	7	/* guard against multiple receivers */
 #define SG_FRQ_FOR_MMAP		8	/* request needs PAGE_SIZE elements */
 #define SG_FRQ_COUNT_ACTIVE	9	/* sfp->submitted + waiting active */
+#define SG_FRQ_ISSUED		10	/* blk_execute_rq_nowait() finished */
 
 /* Bit positions (flags) for sg_fd::ffd_bm bitmask follow */
 #define SG_FFD_FORCE_PACKID	0	/* receive only given pack_id/tag */
 #define SG_FFD_CMD_Q		1	/* clear: only 1 active req per fd */
 #define SG_FFD_KEEP_ORPHAN	2	/* policy for this fd */
-#define SG_FFD_Q_AT_TAIL	3	/* set: queue reqs at tail of blk q */
+#define SG_FFD_HIPRI_SEEN	3	/* could have HIPRI requests active */
+#define SG_FFD_Q_AT_TAIL	4	/* set: queue reqs at tail of blk q */
 
 /* Bit positions (flags) for sg_device::fdev_bm bitmask follow */
 #define SG_FDEV_EXCLUDE		0	/* have fd open with O_EXCL */
@@ -210,6 +212,7 @@ struct sg_request {	/* active SCSI command or inactive request */
 	int sense_len;		/* actual sense buffer length (data-in) */
 	atomic_t rq_st;		/* request state, holds a enum sg_rq_state */
 	u8 cmd_opcode;		/* first byte of SCSI cdb */
+	blk_qc_t cookie;	/* ids 1 or more queues for blk_poll() */
 	u64 start_ns;		/* starting point of command duration calc */
 	unsigned long frq_bm[1];	/* see SG_FRQ_* defines above */
 	u8 *sense_bp;		/* mempool alloc-ed sense buffer, as needed */
@@ -299,6 +302,9 @@ static struct sg_device *sg_get_dev(int min_dev);
 static void sg_device_destroy(struct kref *kref);
 static struct sg_request *sg_mk_srp_sgat(struct sg_fd *sfp, bool first,
 					 int db_len);
+static int sg_sfp_blk_poll(struct sg_fd *sfp, int loop_count);
+static int sg_srp_q_blk_poll(struct sg_request *srp, struct request_queue *q,
+			     int loop_count);
 #if IS_ENABLED(CONFIG_SCSI_LOGGING) && IS_ENABLED(SG_DEBUG)
 static const char *sg_rq_st_str(enum sg_rq_state rq_st, bool long_str);
 #endif
@@ -1008,6 +1014,7 @@ sg_execute_cmd(struct sg_fd *sfp, struct sg_request *srp)
 {
 	bool at_head, is_v4h, sync;
 	struct sg_device *sdp = sfp->parentdp;
+	struct request *rqq = READ_ONCE(srp->rqq);
 
 	is_v4h = test_bit(SG_FRQ_IS_V4I, srp->frq_bm);
 	sync = test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm);
@@ -1031,7 +1038,12 @@ sg_execute_cmd(struct sg_fd *sfp, struct sg_request *srp)
 		atomic_inc(&sfp->submitted);
 		set_bit(SG_FRQ_COUNT_ACTIVE, srp->frq_bm);
 	}
-	blk_execute_rq_nowait(sdp->disk, READ_ONCE(srp->rqq), (int)at_head, sg_rq_end_io);
+	if (srp->rq_flags & SGV4_FLAG_HIPRI) {
+		rqq->cmd_flags |= REQ_HIPRI;
+		srp->cookie = request_to_qc_t(rqq->mq_hctx, rqq);
+	}
+	blk_execute_rq_nowait(sdp->disk, rqq, (int)at_head, sg_rq_end_io);
+	set_bit(SG_FRQ_ISSUED, srp->frq_bm);
 }
 
 /*
@@ -1692,6 +1704,13 @@ sg_wait_event_srp(struct file *filp, struct sg_fd *sfp, void __user *p,
 
 	if (atomic_read(&srp->rq_st) != SG_RS_INFLIGHT)
 		goto skip_wait;		/* and skip _acquire() */
+	if (srp->rq_flags & SGV4_FLAG_HIPRI) {
+		/* call blk_poll(), spinning till found */
+		res = sg_srp_q_blk_poll(srp, sdp->device->request_queue, -1);
+		if (res != -ENODATA && unlikely(res < 0))
+			return res;
+		goto skip_wait;
+	}
 	SG_LOG(3, sfp, "%s: about to wait_event...()\n", __func__);
 	/* usually will be woken up by sg_rq_end_io() callback */
 	res = wait_event_interruptible(sfp->read_wait,
@@ -2018,6 +2037,8 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 		SG_LOG(3, sfp, "%s:    SG_GET_PACK_ID=%d\n", __func__, val);
 		return put_user(val, ip);
 	case SG_GET_NUM_WAITING:
+		if (test_bit(SG_FFD_HIPRI_SEEN, sfp->ffd_bm))
+			sg_sfp_blk_poll(sfp, 0);	/* LLD may have some ready */
 		val = atomic_read(&sfp->waiting);
 		if (val)
 			return put_user(val, ip);
@@ -2227,6 +2248,70 @@ sg_compat_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg)
 }
 #endif
 
+/*
+ * If the sg_request object is not inflight, return -ENODATA. This function
+ * returns 1 if the given object was in inflight state and is in await_rcv
+ * state after blk_poll() returns 1 or more. If blk_poll() fails, then that
+ * (negative) value is returned. Otherwise returns 0. Note that blk_poll()
+ * may complete unrelated requests that share the same q and cookie.
+ */
+static int
+sg_srp_q_blk_poll(struct sg_request *srp, struct request_queue *q, int loop_count)
+{
+	int k, n, num;
+
+	num = (loop_count < 1) ? 1 : loop_count;
+	for (k = 0; k < num; ++k) {
+		if (atomic_read(&srp->rq_st) != SG_RS_INFLIGHT)
+			return -ENODATA;
+		n = blk_poll(q, srp->cookie, loop_count < 0 /* spin if negative */);
+		if (n > 0)
+			return atomic_read(&srp->rq_st) == SG_RS_AWAIT_RCV;
+		if (n < 0)
+			return n;
+	}
+	return 0;
+}
+
+/*
+ * Check all requests on this sfp that are both inflight and HIPRI. That check involves calling
+ * blk_poll(spin<-false) loop_count times. If loop_count is 0 then call blk_poll once.
+ * If loop_count is negative then call blk_poll(spin <- true)) once for each request.
+ * Returns number found (could be 0) or a negated errno value.
+ */
+static int
+sg_sfp_blk_poll(struct sg_fd *sfp, int loop_count)
+{
+	int res = 0;
+	int n;
+	unsigned long idx, iflags;
+	struct sg_request *srp;
+	struct scsi_device *sdev = sfp->parentdp->device;
+	struct request_queue *q = sdev ? sdev->request_queue : NULL;
+	struct xarray *xafp = &sfp->srp_arr;
+
+	if (!q)
+		return -EINVAL;
+	xa_lock_irqsave(xafp, iflags);
+	xa_for_each(xafp, idx, srp) {
+		if ((srp->rq_flags & SGV4_FLAG_HIPRI) &&
+		    !test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm) &&
+		    atomic_read(&srp->rq_st) == SG_RS_INFLIGHT &&
+		    test_bit(SG_FRQ_ISSUED, srp->frq_bm)) {
+			xa_unlock_irqrestore(xafp, iflags);
+			n = sg_srp_q_blk_poll(srp, q, loop_count);
+			if (n == -ENODATA)
+				n = 0;
+			if (unlikely(n < 0))
+				return n;
+			xa_lock_irqsave(xafp, iflags);
+			res += n;
+		}
+	}
+	xa_unlock_irqrestore(xafp, iflags);
+	return res;
+}
+
 /*
  * Implements the poll(2) system call for this driver. Returns various EPOLL*
  * flags OR-ed together.
@@ -2238,6 +2323,8 @@ sg_poll(struct file *filp, poll_table * wait)
 	__poll_t p_res = 0;
 	struct sg_fd *sfp = filp->private_data;
 
+	if (test_bit(SG_FFD_HIPRI_SEEN, sfp->ffd_bm))
+		sg_sfp_blk_poll(sfp, 0);	/* LLD may have some ready to push up */
 	num = atomic_read(&sfp->waiting);
 	if (num < 1) {
 		poll_wait(filp, &sfp->read_wait, wait);
@@ -2522,6 +2609,7 @@ sg_rq_end_io(struct request *rqq, blk_status_t status)
 		}
 	}
 	xa_lock_irqsave(&sfp->srp_arr, iflags);
+	__set_bit(SG_FRQ_ISSUED, srp->frq_bm);
 	sg_rq_chg_state_force_ulck(srp, rqq_state);
 	WRITE_ONCE(srp->rqq, NULL);
 	if (test_bit(SG_FRQ_COUNT_ACTIVE, srp->frq_bm)) {
@@ -2547,7 +2635,8 @@ sg_rq_end_io(struct request *rqq, blk_status_t status)
 
 	if (likely(rqq_state == SG_RS_AWAIT_RCV)) {
 		/* Wake any sg_read()/ioctl(SG_IORECEIVE) awaiting this req */
-		wake_up_interruptible(&sfp->read_wait);
+		if (!(srp->rq_flags & SGV4_FLAG_HIPRI))
+			wake_up_interruptible(&sfp->read_wait);
 		kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN);
 		kref_put(&sfp->f_ref, sg_remove_sfp);
 	} else {        /* clean up orphaned request that aren't being kept */
@@ -2990,6 +3079,8 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 	/* current sg_request protected by SG_RS_BUSY state */
 	scsi_rp = scsi_req(rqq);
 	WRITE_ONCE(srp->rqq, rqq);
+	if (rq_flags & SGV4_FLAG_HIPRI)
+		set_bit(SG_FFD_HIPRI_SEEN, sfp->ffd_bm);
 
 	if (cwrp->cmd_len > BLK_MAX_CDB)
 		scsi_rp->cmd = long_cmdp;	/* transfer ownership */
@@ -3101,7 +3192,8 @@ sg_finish_scsi_blk_rq(struct sg_request *srp)
 	SG_LOG(4, sfp, "%s: srp=0x%pK%s\n", __func__, srp,
 	       (srp->parentfp->rsv_srp == srp) ? " rsv" : "");
 	if (test_and_clear_bit(SG_FRQ_COUNT_ACTIVE, srp->frq_bm)) {
-		atomic_dec(&sfp->submitted);
+		if (atomic_dec_and_test(&sfp->submitted))
+			clear_bit(SG_FFD_HIPRI_SEEN, sfp->ffd_bm);
 		atomic_dec(&sfp->waiting);
 	}
 
@@ -3298,6 +3390,8 @@ sg_find_srp_by_id(struct sg_fd *sfp, int pack_id)
 	struct sg_request *srp = NULL;
 	struct xarray *xafp = &sfp->srp_arr;
 
+	if (test_bit(SG_FFD_HIPRI_SEEN, sfp->ffd_bm))
+		sg_sfp_blk_poll(sfp, 0);	/* LLD may have some ready to push up */
 	if (num_waiting < 1) {
 		num_waiting = atomic_read_acquire(&sfp->waiting);
 		if (num_waiting < 1)
@@ -4106,8 +4200,9 @@ sg_proc_debug_sreq(struct sg_request *srp, int to, char *obp, int len)
 	else if (dur < U32_MAX)	/* in-flight or busy (so ongoing) */
 		n += scnprintf(obp + n, len - n, " t_o/elap=%us/%ums",
 			       to / 1000, dur);
-	n += scnprintf(obp + n, len - n, " sgat=%d op=0x%02x\n",
-		       srp->sgat_h.num_sgat, srp->cmd_opcode);
+	cp = (srp->rq_flags & SGV4_FLAG_HIPRI) ? "hipri " : "";
+	n += scnprintf(obp + n, len - n, " sgat=%d %sop=0x%02x\n",
+		       srp->sgat_h.num_sgat, cp, srp->cmd_opcode);
 	return n;
 }
 
diff --git a/include/uapi/scsi/sg.h b/include/uapi/scsi/sg.h
index 6373bc83c3b3..6fce44607613 100644
--- a/include/uapi/scsi/sg.h
+++ b/include/uapi/scsi/sg.h
@@ -110,6 +110,7 @@ typedef struct sg_io_hdr {
 #define SGV4_FLAG_Q_AT_TAIL SG_FLAG_Q_AT_TAIL
 #define SGV4_FLAG_Q_AT_HEAD SG_FLAG_Q_AT_HEAD
 #define SGV4_FLAG_IMMED 0x400 /* for polling with SG_IOR, ignored in SG_IOS */
+#define SGV4_FLAG_HIPRI 0x800 /* request will use blk_poll to complete */
 
 /* Output (potentially OR-ed together) in v3::info or v4::info field */
 #define SG_INFO_OK_MASK 0x1
-- 
2.25.1



* [PATCH v18 45/83] sg: bump version to 4.0.12
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
                   ` (44 preceding siblings ...)
  2021-04-27 21:56 ` [PATCH v18 44/83] sg: add blk_poll support Douglas Gilbert
@ 2021-04-27 21:56 ` Douglas Gilbert
  2021-04-27 21:56 ` [PATCH v18 46/83] sg: add sg_ioabort ioctl Douglas Gilbert
                   ` (37 subsequent siblings)
  83 siblings, 0 replies; 86+ messages in thread
From: Douglas Gilbert @ 2021-04-27 21:56 UTC
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

Now that the sg version 4 interface is supported:
  - with ioctl(SG_IO) for synchronous/blocking use
  - with ioctl(SG_IOSUBMIT) and ioctl(SG_IORECEIVE) for
    async/non-blocking use
and the new ioctl(SG_IOSUBMIT_V3) and ioctl(SG_IORECEIVE_V3)
can potentially replace write() and read() for the sg
version 3 interface, bump the major driver version number
from 3 to 4.
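The version number uses the [x]xyyzz encoding noted in sg.c, so 40012 corresponds to "4.0.12" and the old 30901 to "3.9.01". A minimal user-space sketch of decoding it, assuming the number was obtained with the existing ioctl(SG_GET_VERSION_NUM); the struct and helper names are hypothetical and not part of the sg API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical holder for the parts of an [x]xyyzz version number */
struct sg_vers {
	int major;	/* [x]x */
	int minor;	/* yy */
	int sub;	/* zz */
};

/* Split e.g. 40012 into 4, 0 and 12 */
struct sg_vers sg_decode_version(int num)
{
	struct sg_vers v;

	assert(num >= 0);
	v.major = num / 10000;
	v.minor = (num / 100) % 100;
	v.sub = num % 100;
	return v;
}

/* Formats like SG_VERSION_STR ("[x]x.[y]y.zz"); returns snprintf()'s result */
int sg_format_version(int num, char *buf, size_t len)
{
	struct sg_vers v = sg_decode_version(num);

	return snprintf(buf, len, "%d.%d.%02d", v.major, v.minor, v.sub);
}
```

With an open sg file descriptor, a caller might do ioctl(fd, SG_GET_VERSION_NUM, &num) and then sg_format_version(num, buf, sizeof(buf)); with this patch applied that should yield "4.0.12".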

The main new feature is the removal of the fixed 16 element
array of requests per file descriptor. It is replaced by
an xarray (eXtensible array) held in the parent sg_fd object
(i.e. the file descriptor). The sg_request objects are not
freed until the owning file descriptor is closed; instead
they are re-used when multiple commands are sent to the
same file descriptor.
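The allocate-or-reuse policy described above can be sketched in user space as follows. This is only an analogue under stated assumptions: a realloc()-grown pointer array stands in for the kernel xarray, a linear scan stands in for the xarray lookup, and the struct and function names are hypothetical, not the driver's:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an sg_request: only its state matters here */
struct req {
	bool active;
	int pack_id;
};

/* Hypothetical stand-in for the per-fd container (the driver uses an xarray) */
struct fd_reqs {
	struct req **arr;
	size_t n;	/* number of request objects ever allocated */
};

/* Reuse an inactive request if there is one; otherwise grow, with no fixed cap */
struct req *req_get(struct fd_reqs *f, int pack_id)
{
	struct req *r = NULL;
	size_t k;

	for (k = 0; k < f->n; ++k) {
		if (!f->arr[k]->active) {
			r = f->arr[k];
			break;
		}
	}
	if (!r) {
		struct req **na = realloc(f->arr, (f->n + 1) * sizeof(*na));

		if (!na)
			return NULL;
		f->arr = na;
		r = calloc(1, sizeof(*r));
		if (!r)
			return NULL;
		f->arr[f->n++] = r;
	}
	r->active = true;
	r->pack_id = pack_id;
	return r;
}

/* On completion: mark inactive but keep the object for the next command */
void req_put(struct req *r)
{
	assert(r->active);	/* catch a double release */
	r->active = false;
}

/* On close: only now are the request objects freed */
void fd_close(struct fd_reqs *f)
{
	size_t k;

	for (k = 0; k < f->n; ++k)
		free(f->arr[k]);
	free(f->arr);
	f->arr = NULL;
	f->n = 0;
}
```

In the driver itself, inactive requests are tracked with the SG_XA_RQ_INACTIVE xarray mark rather than found by a scan like the one above.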

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c      | 11 ++++++-----
 include/uapi/scsi/sg.h |  4 ++--
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index bf83b07236f8..b5dc274a57c0 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -7,13 +7,14 @@
  *
  * Original driver (sg.c):
  *        Copyright (C) 1992 Lawrence Foard
- * Version 2 and 3 extensions to driver:
- *        Copyright (C) 1998 - 2019 Douglas Gilbert
+ * Version 2, 3 and 4 extensions to driver:
+ *        Copyright (C) 1998 - 2021 Douglas Gilbert
+ *
  */
 
-static int sg_version_num = 30901;  /* [x]xyyzz where [x] empty when x=0 */
-#define SG_VERSION_STR "3.9.01"		/* [x]x.[y]y.zz */
-static char *sg_version_date = "20190606";
+static int sg_version_num = 40012;  /* [x]xyyzz where [x] empty when x=0 */
+#define SG_VERSION_STR "4.0.12"		/* [x]x.[y]y.zz */
+static char *sg_version_date = "20210421";
 
 #include <linux/module.h>
 
diff --git a/include/uapi/scsi/sg.h b/include/uapi/scsi/sg.h
index 6fce44607613..079ef6c57aea 100644
--- a/include/uapi/scsi/sg.h
+++ b/include/uapi/scsi/sg.h
@@ -12,9 +12,9 @@
  *   Copyright (C) 1992 Lawrence Foard
  *
  * Later extensions (versions 2, 3 and 4) to driver:
- *   Copyright (C) 1998 - 2018 Douglas Gilbert
+ *   Copyright (C) 1998 - 2021 Douglas Gilbert
  *
- * Version 4.0.11 (20190502)
+ * Version 4.0.12 (20210111)
  *  This version is for Linux 4 and 5 series kernels.
  *
  * Documentation
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v18 46/83] sg: add sg_ioabort ioctl
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
                   ` (45 preceding siblings ...)
  2021-04-27 21:56 ` [PATCH v18 45/83] sg: bump version to 4.0.12 Douglas Gilbert
@ 2021-04-27 21:56 ` Douglas Gilbert
  2021-04-27 21:56 ` [PATCH v18 47/83] sg: add sg_set_get_extended ioctl Douglas Gilbert
                   ` (36 subsequent siblings)
  83 siblings, 0 replies; 86+ messages in thread
From: Douglas Gilbert @ 2021-04-27 21:56 UTC
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

Add ioctl(SG_IOABORT), which acts as a front-end to
blk_abort_request(); that function is only called if the
request is "inflight". The request to abort is matched via
its pack_id, and the scope of the search is the current
device.

That scope will be fine-tuned in a later patch to be either
all file descriptors belonging to the current device, or
just the current file descriptor.
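The device-wide search order (the caller's own file descriptor first, then each sibling file descriptor open on the same device) can be modelled as below. This is a simplified, hypothetical model: file descriptors are reduced to arrays of pack_ids of requests awaiting completion, locking is omitted, and PACK_ID_WILDCARD is an assumed stand-in for SG_PACK_ID_WILDCARD:

```c
#include <assert.h>
#include <stddef.h>

#define PACK_ID_WILDCARD (-1)	/* assumption: stands in for SG_PACK_ID_WILDCARD */

/* Hypothetical model of one fd: pack_ids of its awaiting requests */
struct model_fd {
	const int *pack_ids;
	size_t n;
};

/* Like sg_match_request(): first exact match, or any request for the wildcard */
static const int *match_in_fd(const struct model_fd *fd, int id)
{
	size_t k;

	for (k = 0; k < fd->n; ++k)
		if (id == PACK_ID_WILDCARD || fd->pack_ids[k] == id)
			return &fd->pack_ids[k];
	return NULL;
}

/*
 * Search the caller's fd (index own) first, then every other fd on the
 * device. Returns the index of the fd holding the match, or -1 where the
 * patch returns -ENODATA.
 */
int find_abort_target(const struct model_fd *fds, size_t nfds, size_t own,
		      int id)
{
	size_t k;

	assert(own < nfds);
	if (match_in_fd(&fds[own], id))
		return (int)own;
	for (k = 0; k < nfds; ++k) {
		if (k == own)
			continue;	/* already checked */
		if (match_in_fd(&fds[k], id))
			return (int)k;
	}
	return -1;
}
```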

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c      | 170 ++++++++++++++++++++++++++++++-----------
 include/uapi/scsi/sg.h |   3 +
 2 files changed, 128 insertions(+), 45 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index b5dc274a57c0..d8628517fbe0 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -113,6 +113,7 @@ enum sg_rq_state {	/* N.B. sg_rq_state_arr assumes SG_RS_AWAIT_RCV==2 */
 #define SG_FRQ_IS_ORPHAN	1	/* owner of request gone */
 #define SG_FRQ_SYNC_INVOC	2	/* synchronous (blocking) invocation */
 #define SG_FRQ_NO_US_XFER	3	/* no user space transfer of data */
+#define SG_FRQ_ABORTING		4	/* in process of aborting this cmd */
 #define SG_FRQ_DEACT_ORPHAN	6	/* not keeping orphan so de-activate */
 #define SG_FRQ_RECEIVING	7	/* guard against multiple receivers */
 #define SG_FRQ_FOR_MMAP		8	/* request needs PAGE_SIZE elements */
@@ -1794,6 +1795,93 @@ sg_ctl_sg_io(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 	return res;
 }
 
+static struct sg_request *
+sg_match_request(struct sg_fd *sfp, int id)
+{
+	int num_waiting = atomic_read(&sfp->waiting);
+	unsigned long idx;
+	struct sg_request *srp;
+
+	if (num_waiting < 1)
+		return NULL;
+	if (id == SG_PACK_ID_WILDCARD) {
+		xa_for_each_marked(&sfp->srp_arr, idx, srp, SG_XA_RQ_AWAIT)
+			return srp;
+	} else {
+		xa_for_each_marked(&sfp->srp_arr, idx, srp, SG_XA_RQ_AWAIT) {
+			if (id == srp->pack_id)
+				return srp;
+		}
+	}
+	return NULL;
+}
+
+static int
+sg_ctl_abort(struct sg_device *sdp, struct sg_fd *sfp, void __user *p)
+{
+	int res, pack_id, id;
+	unsigned long iflags, idx;
+	struct sg_fd *o_sfp;
+	struct sg_request *srp;
+	struct sg_io_v4 io_v4;
+	struct sg_io_v4 *h4p = &io_v4;
+
+	if (copy_from_user(h4p, p, SZ_SG_IO_V4))
+		return -EFAULT;
+	if (h4p->guard != 'Q' || h4p->protocol != 0 || h4p->subprotocol != 0)
+		return -EPERM;
+	pack_id = h4p->request_extra;
+	id = pack_id;
+
+	xa_lock_irqsave(&sfp->srp_arr, iflags);
+	srp = sg_match_request(sfp, id);
+	if (!srp) {	/* assume device (not just fd) scope */
+		xa_unlock_irqrestore(&sfp->srp_arr, iflags);
+		xa_for_each(&sdp->sfp_arr, idx, o_sfp) {
+			if (o_sfp == sfp)
+				continue;	/* already checked */
+			srp = sg_match_request(o_sfp, id);
+			if (srp) {
+				sfp = o_sfp;
+				xa_lock_irqsave(&sfp->srp_arr, iflags);
+				break;
+			}
+		}
+		if (!srp)
+			return -ENODATA;
+	}
+
+	set_bit(SG_FRQ_ABORTING, srp->frq_bm);
+	res = 0;
+	switch (atomic_read(&srp->rq_st)) {
+	case SG_RS_BUSY:
+		clear_bit(SG_FRQ_ABORTING, srp->frq_bm);
+		res = -EBUSY;	/* shouldn't occur often */
+		break;
+	case SG_RS_INACTIVE:	/* inactive on rq_list not good */
+		clear_bit(SG_FRQ_ABORTING, srp->frq_bm);
+		res = -EPROTO;
+		break;
+	case SG_RS_AWAIT_RCV:	/* user should still do completion */
+		clear_bit(SG_FRQ_ABORTING, srp->frq_bm);
+		break;		/* nothing to do here, return 0 */
+	case SG_RS_INFLIGHT:	/* only attempt abort if inflight */
+		srp->rq_result |= (DRIVER_SOFT << 24);
+		{
+			struct request *rqq = READ_ONCE(srp->rqq);
+
+			if (rqq)
+				blk_abort_request(rqq);
+		}
+		break;
+	default:
+		clear_bit(SG_FRQ_ABORTING, srp->frq_bm);
+		break;
+	}
+	xa_unlock_irqrestore(&sfp->srp_arr, iflags);
+	return res;
+}
+
 /*
  * First normalize want_rsv_sz to be >= sfp->sgat_elem_sz and
  * <= max_segment_size. Exit if that is the same as old size; otherwise
@@ -1929,8 +2017,6 @@ sg_ctl_req_tbl(struct sg_fd *sfp, void __user *p)
 		return -ENOMEM;
 	val = 0;
 	xa_for_each(&sfp->srp_arr, idx, srp) {
-		if (!srp)
-			continue;
 		if (val >= SG_MAX_QUEUE)
 			break;
 		if (xa_get_mark(&sfp->srp_arr, idx, SG_XA_RQ_INACTIVE))
@@ -1939,8 +2025,6 @@ sg_ctl_req_tbl(struct sg_fd *sfp, void __user *p)
 		val++;
 	}
 	xa_for_each(&sfp->srp_arr, idx, srp) {
-		if (!srp)
-			continue;
 		if (val >= SG_MAX_QUEUE)
 			break;
 		if (!xa_get_mark(&sfp->srp_arr, idx, SG_XA_RQ_INACTIVE))
@@ -1990,7 +2074,7 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 {
 	bool read_only = O_RDWR != (filp->f_flags & O_ACCMODE);
 	int val;
-	int result = 0;
+	int res = 0;
 	int __user *ip = p;
 	struct sg_request *srp;
 	struct scsi_device *sdev;
@@ -2018,13 +2102,21 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 	case SG_IORECEIVE_V3:
 		SG_LOG(3, sfp, "%s:    SG_IORECEIVE_V3\n", __func__);
 		return sg_ctl_ioreceive_v3(filp, sfp, p);
+	case SG_IOABORT:
+		SG_LOG(3, sfp, "%s:    SG_IOABORT\n", __func__);
+		if (read_only)
+			return -EPERM;
+		mutex_lock(&sfp->f_mutex);
+		res = sg_ctl_abort(sdp, sfp, p);
+		mutex_unlock(&sfp->f_mutex);
+		return res;
 	case SG_GET_SCSI_ID:
 		return sg_ctl_scsi_id(sdev, sfp, p);
 	case SG_SET_FORCE_PACK_ID:
 		SG_LOG(3, sfp, "%s:    SG_SET_FORCE_PACK_ID\n", __func__);
-		result = get_user(val, ip);
-		if (result)
-			return result;
+		res = get_user(val, ip);
+		if (res)
+			return res;
 		assign_bit(SG_FFD_FORCE_PACKID, sfp->ffd_bm, !!val);
 		return 0;
 	case SG_GET_PACK_ID:    /* or tag of oldest "read"-able, -1 if none */
@@ -2049,18 +2141,18 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 		       sdp->max_sgat_sz);
 		return put_user(sdp->max_sgat_sz, ip);
 	case SG_SET_RESERVED_SIZE:
-		result = get_user(val, ip);
-		if (!result) {
+		res = get_user(val, ip);
+		if (!res) {
 			if (val >= 0 && val <= (1024 * 1024 * 1024)) {
 				mutex_lock(&sfp->f_mutex);
-				result = sg_set_reserved_sz(sfp, val);
+				res = sg_set_reserved_sz(sfp, val);
 				mutex_unlock(&sfp->f_mutex);
 			} else {
 				SG_LOG(3, sfp, "%s: invalid size\n", __func__);
-				result = -EINVAL;
+				res = -EINVAL;
 			}
 		}
-		return result;
+		return res;
 	case SG_GET_RESERVED_SIZE:
 		mutex_lock(&sfp->f_mutex);
 		val = min_t(int, sfp->rsv_srp->sgat_h.buflen,
@@ -2068,13 +2160,13 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 		mutex_unlock(&sfp->f_mutex);
 		SG_LOG(3, sfp, "%s:    SG_GET_RESERVED_SIZE=%d\n",
 		       __func__, val);
-		result = put_user(val, ip);
-		return result;
+		res = put_user(val, ip);
+		return res;
 	case SG_SET_COMMAND_Q:
 		SG_LOG(3, sfp, "%s:    SG_SET_COMMAND_Q\n", __func__);
-		result = get_user(val, ip);
-		if (result)
-			return result;
+		res = get_user(val, ip);
+		if (res)
+			return res;
 		assign_bit(SG_FFD_CMD_Q, sfp->ffd_bm, !!val);
 		return 0;
 	case SG_GET_COMMAND_Q:
@@ -2082,9 +2174,9 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 		return put_user(test_bit(SG_FFD_CMD_Q, sfp->ffd_bm), ip);
 	case SG_SET_KEEP_ORPHAN:
 		SG_LOG(3, sfp, "%s:    SG_SET_KEEP_ORPHAN\n", __func__);
-		result = get_user(val, ip);
-		if (result)
-			return result;
+		res = get_user(val, ip);
+		if (res)
+			return res;
 		assign_bit(SG_FFD_KEEP_ORPHAN, sfp->ffd_bm, !!val);
 		return 0;
 	case SG_GET_KEEP_ORPHAN:
@@ -2101,9 +2193,9 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 		break;
 	case SG_SET_TIMEOUT:
 		SG_LOG(3, sfp, "%s:    SG_SET_TIMEOUT\n", __func__);
-		result = get_user(val, ip);
-		if (result)
-			return result;
+		res = get_user(val, ip);
+		if (res)
+			return res;
 		if (val < 0)
 			return -EIO;
 		if (val >= mult_frac((s64)INT_MAX, USER_HZ, HZ))
@@ -2129,9 +2221,9 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 		return put_user((int)sdev->host->unchecked_isa_dma, ip);
 	case SG_NEXT_CMD_LEN:	/* active only in v2 interface */
 		SG_LOG(3, sfp, "%s:    SG_NEXT_CMD_LEN\n", __func__);
-		result = get_user(val, ip);
-		if (result)
-			return result;
+		res = get_user(val, ip);
+		if (res)
+			return res;
 		if (val > SG_MAX_CDB_SIZE)
 			return -ENOMEM;
 		mutex_lock(&sfp->f_mutex);
@@ -2154,9 +2246,9 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 				     p);
 	case SG_SET_DEBUG:
 		SG_LOG(3, sfp, "%s:    SG_SET_DEBUG\n", __func__);
-		result = get_user(val, ip);
-		if (result)
-			return result;
+		res = get_user(val, ip);
+		if (res)
+			return res;
 		assign_bit(SG_FDEV_LOG_SENSE, sdp->fdev_bm, !!val);
 		if (val == 0)	/* user can force recalculation */
 			sg_calc_sgat_param(sdp);
@@ -2201,9 +2293,9 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 			return -EPERM;	/* don't know, so take safer approach */
 		break;
 	}
-	result = sg_allow_if_err_recovery(sdp, filp->f_flags & O_NDELAY);
-	if (unlikely(result))
-		return result;
+	res = sg_allow_if_err_recovery(sdp, filp->f_flags & O_NDELAY);
+	if (unlikely(res))
+		return res;
 	return -ENOIOCTLCMD;
 }
 
@@ -2844,8 +2936,6 @@ sg_remove_device(struct device *cl_dev, struct class_interface *cl_intf)
 					"%s: 0x%pK\n", __func__, sdp));
 
 	xa_for_each(&sdp->sfp_arr, idx, sfp) {
-		if (!sfp)
-			continue;
 		wake_up_interruptible_all(&sfp->read_wait);
 		kill_fasync(&sfp->async_qp, SIGPOLL, POLL_HUP);
 	}
@@ -3867,8 +3957,6 @@ sg_remove_sfp_usercontext(struct work_struct *work)
 
 	/* Cleanup any responses which were never read(). */
 	xa_for_each(xafp, idx, srp) {
-		if (!srp)
-			continue;
 		if (!xa_get_mark(xafp, srp->rq_idx, SG_XA_RQ_INACTIVE))
 			sg_finish_scsi_blk_rq(srp);
 		if (srp->sgat_h.buflen > 0)
@@ -4173,7 +4261,6 @@ sg_proc_seq_show_devstrs(struct seq_file *s, void *v)
 /* Writes debug info for one sg_request in obp buffer */
 static int
 sg_proc_debug_sreq(struct sg_request *srp, int to, char *obp, int len)
-		__must_hold(sfp->srp_arr.xa_lock)
 {
 	bool is_v3v4, v4, is_dur;
 	int n = 0;
@@ -4243,8 +4330,6 @@ sg_proc_debug_fd(struct sg_fd *fp, char *obp, int len, unsigned long idx)
 	k = 0;
 	xa_lock_irqsave(&fp->srp_arr, iflags);
 	xa_for_each(&fp->srp_arr, idx, srp) {
-		if (!srp)
-			continue;
 		if (xa_get_mark(&fp->srp_arr, idx, SG_XA_RQ_INACTIVE))
 			continue;
 		n += sg_proc_debug_sreq(srp, fp->timeout, obp + n, len - n);
@@ -4259,8 +4344,6 @@ sg_proc_debug_fd(struct sg_fd *fp, char *obp, int len, unsigned long idx)
 		n += scnprintf(obp + n, len - n, "     No requests active\n");
 	k = 0;
 	xa_for_each_marked(&fp->srp_arr, idx, srp, SG_XA_RQ_INACTIVE) {
-		if (!srp)
-			continue;
 		if (k == 0)
 			n += scnprintf(obp + n, len - n, "   Inactives:\n");
 		n += sg_proc_debug_sreq(srp, fp->timeout, obp + n, len - n);
@@ -4278,7 +4361,6 @@ sg_proc_debug_fd(struct sg_fd *fp, char *obp, int len, unsigned long idx)
 /* Writes debug info for one sg device (including its sg fds) in obp buffer */
 static int
 sg_proc_debug_sdev(struct sg_device *sdp, char *obp, int len, int *fd_counterp)
-		__must_hold(sg_index_lock)
 {
 	int n = 0;
 	int my_count = 0;
@@ -4298,8 +4380,6 @@ sg_proc_debug_sdev(struct sg_device *sdp, char *obp, int len, int *fd_counterp)
 		       ilog2(sdp->max_sgat_sz), sdp->max_sgat_elems,
 		       SG_HAVE_EXCLUDE(sdp), atomic_read(&sdp->open_cnt));
 	xa_for_each(&sdp->sfp_arr, idx, fp) {
-		if (!fp)
-			continue;
 		++*countp;
 		n += scnprintf(obp + n, len - n, "  FD(%d): ", *countp);
 		n += sg_proc_debug_fd(fp, obp + n, len - n, idx);
diff --git a/include/uapi/scsi/sg.h b/include/uapi/scsi/sg.h
index 079ef6c57aea..2b1b9df6c114 100644
--- a/include/uapi/scsi/sg.h
+++ b/include/uapi/scsi/sg.h
@@ -364,6 +364,9 @@ struct sg_header {
 /* Gives some v3 identifying info to driver, receives associated response */
 #define SG_IORECEIVE_V3 _IOWR(SG_IOCTL_MAGIC_NUM, 0x46, struct sg_io_hdr)
 
+/* Provides identifying info about a prior submission (e.g. a tag) */
+#define SG_IOABORT _IOW(SG_IOCTL_MAGIC_NUM, 0x43, struct sg_io_v4)
+
 /* command queuing is always on when the v3 or v4 interface is used */
 #define SG_DEF_COMMAND_Q 0
 
-- 
2.25.1



* [PATCH v18 47/83] sg: add sg_set_get_extended ioctl
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
                   ` (46 preceding siblings ...)
  2021-04-27 21:56 ` [PATCH v18 46/83] sg: add sg_ioabort ioctl Douglas Gilbert
@ 2021-04-27 21:56 ` Douglas Gilbert
  2021-04-27 21:56 ` [PATCH v18 48/83] sg: sgat_elem_sz and sum_fd_dlens Douglas Gilbert
                   ` (35 subsequent siblings)
  83 siblings, 0 replies; 86+ messages in thread
From: Douglas Gilbert @ 2021-04-27 21:56 UTC
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

Add ioctl(SG_SET_GET_EXTENDED) together with its interface,
struct sg_extended_info, which is 96 bytes long, only half
of which is currently used. The "SET_GET" component of the
name stresses that data flows both to and back from the ioctl.

That ioctl has three sections: one for getting and setting 32-bit
quantities, a second for manipulating boolean (bit) flags, and a
final section for reading 32-bit quantities, where a well-known
key value is written to read_value and the corresponding value is
read back. Several settings can be made in one invocation.

See the web page at https://sg.danny.cz/sg/sg_v40.html,
specifically section 14, titled "IOCTLs".

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c      | 274 +++++++++++++++++++++++++++++++++++++----
 include/uapi/scsi/sg.h |  69 +++++++++++
 2 files changed, 320 insertions(+), 23 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index d8628517fbe0..17a733d621c7 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -42,7 +42,7 @@ static char *sg_version_date = "20210421";
 #include <linux/ratelimit.h>
 #include <linux/uio.h>
 #include <linux/cred.h>			/* for sg_check_file_access() */
-#include <linux/proc_fs.h>
+#include <linux/proc_fs.h>		/* used if CONFIG_SCSI_PROC_FS */
 #include <linux/xarray.h>
 #include <linux/debugfs.h>
 
@@ -125,7 +125,9 @@ enum sg_rq_state {	/* N.B. sg_rq_state_arr assumes SG_RS_AWAIT_RCV==2 */
 #define SG_FFD_CMD_Q		1	/* clear: only 1 active req per fd */
 #define SG_FFD_KEEP_ORPHAN	2	/* policy for this fd */
 #define SG_FFD_HIPRI_SEEN	3	/* could have HIPRI requests active */
-#define SG_FFD_Q_AT_TAIL	4	/* set: queue reqs at tail of blk q */
+#define SG_FFD_TIME_IN_NS	4	/* set: time in nanoseconds, else ms */
+#define SG_FFD_Q_AT_TAIL	5	/* set: queue reqs at tail of blk q */
+#define SG_FFD_NO_DURATION	6	/* don't do command duration calc */
 
 /* Bit positions (flags) for sg_device::fdev_bm bitmask follow */
 #define SG_FDEV_EXCLUDE		0	/* have fd open with O_EXCL */
@@ -317,6 +319,7 @@ static const char *sg_rq_st_str(enum sg_rq_state rq_st, bool long_str);
 #define SZ_SG_IO_HDR ((int)sizeof(struct sg_io_hdr))	/* v3 header */
 #define SZ_SG_IO_V4 ((int)sizeof(struct sg_io_v4))  /* v4 header (in bsg.h) */
 #define SZ_SG_REQ_INFO ((int)sizeof(struct sg_req_info))
+#define SZ_SG_EXTENDED_INFO ((int)sizeof(struct sg_extended_info))
 
 /* There is a assert that SZ_SG_IO_V4 >= SZ_SG_IO_HDR in first function */
 
@@ -464,10 +467,10 @@ static int
 sg_open(struct inode *inode, struct file *filp)
 {
 	bool o_excl, non_block;
-	int min_dev = iminor(inode);
-	int op_flags = filp->f_flags;
 	int res;
 	__maybe_unused int o_count;
+	int min_dev = iminor(inode);
+	int op_flags = filp->f_flags;
 	struct sg_device *sdp;
 	struct sg_fd *sfp;
 
@@ -1021,7 +1024,10 @@ sg_execute_cmd(struct sg_fd *sfp, struct sg_request *srp)
 	is_v4h = test_bit(SG_FRQ_IS_V4I, srp->frq_bm);
 	sync = test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm);
 	SG_LOG(3, sfp, "%s: is_v4h=%d\n", __func__, (int)is_v4h);
-	srp->start_ns = ktime_get_boottime_ns();
+	if (test_bit(SG_FFD_NO_DURATION, sfp->ffd_bm))
+		srp->start_ns = 0;
+	else
+		srp->start_ns = ktime_get_boottime_ns();/* assume always > 0 */
 	srp->duration = 0;
 
 	if (!is_v4h && srp->s_hdr3.interface_id == '\0')
@@ -1620,29 +1626,42 @@ sg_calc_sgat_param(struct sg_device *sdp)
 	sdp->max_sgat_sz = sz;
 }
 
+/*
+ * Returns duration since srp->start_ns (using boot time as an epoch). Unit
+ * is nanoseconds when time_in_ns==true; else it is in milliseconds.
+ * For backward compatibility the duration is placed in a 32 bit unsigned
+ * integer. This limits the maximum nanosecond duration that can be
+ * represented (without wrapping) to about 4.3 seconds. If that is exceeded,
+ * return 3999999999 (3.999.. secs in ns) as it is more eye-catching than the
+ * real number. Negative durations should not be possible; if one occurs,
+ * return 2 (999999998 in ms mode). Stalls in request setup leave ts0 ==
+ * S64_MAX and return 1 (999999999 in ms mode).
+ */
 static u32
-sg_calc_rq_dur(const struct sg_request *srp)
+sg_calc_rq_dur(const struct sg_request *srp, bool time_in_ns)
 {
 	ktime_t ts0 = ns_to_ktime(srp->start_ns);
 	ktime_t now_ts;
 	s64 diff;
 
-	if (ts0 == 0)
+	if (ts0 == 0)	/* only when SG_FFD_NO_DURATION is set */
 		return 0;
 	if (unlikely(ts0 == S64_MAX))	/* _prior_ to issuing req */
-		return 999999999;	/* eye catching */
+		return time_in_ns ? 1 : 999999999;
 	now_ts = ktime_get_boottime();
 	if (unlikely(ts0 > now_ts))
-		return 999999998;
-	/* unlikely req duration will exceed 2**32 milliseconds */
-	diff = ktime_ms_delta(now_ts, ts0);
+		return time_in_ns ? 2 : 999999998;
+	if (time_in_ns)
+		diff = ktime_to_ns(ktime_sub(now_ts, ts0));
+	else	/* unlikely req duration will exceed 2**32 milliseconds */
+		diff = ktime_ms_delta(now_ts, ts0);
 	return (diff > (s64)U32_MAX) ? 3999999999U : (u32)diff;
 }
 
 /* Return of U32_MAX means srp is inactive state */
 static u32
 sg_get_dur(struct sg_request *srp, const enum sg_rq_state *sr_stp,
-	   bool *is_durp)
+	   bool time_in_ns, bool *is_durp)
 {
 	bool is_dur = false;
 	u32 res = U32_MAX;
@@ -1650,7 +1669,7 @@ sg_get_dur(struct sg_request *srp, const enum sg_rq_state *sr_stp,
 	switch (sr_stp ? *sr_stp : atomic_read(&srp->rq_st)) {
 	case SG_RS_INFLIGHT:
 	case SG_RS_BUSY:
-		res = sg_calc_rq_dur(srp);
+		res = sg_calc_rq_dur(srp, time_in_ns);
 		break;
 	case SG_RS_AWAIT_RCV:
 	case SG_RS_INACTIVE:
@@ -1672,7 +1691,8 @@ sg_fill_request_element(struct sg_fd *sfp, struct sg_request *srp,
 	unsigned long iflags;
 
 	xa_lock_irqsave(&sfp->srp_arr, iflags);
-	rip->duration = sg_get_dur(srp, NULL, NULL);
+	rip->duration = sg_get_dur(srp, NULL, test_bit(SG_FFD_TIME_IN_NS,
+						       sfp->ffd_bm), NULL);
 	if (rip->duration == U32_MAX)
 		rip->duration = 0;
 	rip->orphan = test_bit(SG_FRQ_IS_ORPHAN, srp->frq_bm);
@@ -1996,6 +2016,200 @@ static int put_compat_request_table(struct compat_sg_req_info __user *o,
 }
 #endif
 
+/*
+ * Processing of ioctl(SG_SET_GET_EXTENDED(SG_SEIM_CTL_FLAGS)) which is a set
+ * of boolean flags. Access abbreviations: [rw], read-write; [ro], read-only;
+ * [wo], write-only; [raw], read after write; [rbw], read before write.
+ */
+static void
+sg_extended_bool_flags(struct sg_fd *sfp, struct sg_extended_info *seip)
+{
+	bool flg = false;
+	const u32 c_flgs_wm = seip->ctl_flags_wr_mask;
+	const u32 c_flgs_rm = seip->ctl_flags_rd_mask;
+	const u32 c_flgs_val_in = seip->ctl_flags;
+	u32 c_flgs_val_out = c_flgs_val_in;
+	struct sg_device *sdp = sfp->parentdp;
+
+	/* TIME_IN_NS boolean, [raw] time in nanoseconds (def: millisecs) */
+	if (c_flgs_wm & SG_CTL_FLAGM_TIME_IN_NS)
+		assign_bit(SG_FFD_TIME_IN_NS, sfp->ffd_bm,
+			   !!(c_flgs_val_in & SG_CTL_FLAGM_TIME_IN_NS));
+	if (c_flgs_rm & SG_CTL_FLAGM_TIME_IN_NS) {
+		if (test_bit(SG_FFD_TIME_IN_NS, sfp->ffd_bm))
+			c_flgs_val_out |= SG_CTL_FLAGM_TIME_IN_NS;
+		else
+			c_flgs_val_out &= ~SG_CTL_FLAGM_TIME_IN_NS;
+	}
+	/* OTHER_OPENS boolean, [ro] any other sg open fds on this dev? */
+	if (c_flgs_rm & SG_CTL_FLAGM_OTHER_OPENS) {
+		if (atomic_read(&sdp->open_cnt) > 1)
+			c_flgs_val_out |= SG_CTL_FLAGM_OTHER_OPENS;
+		else
+			c_flgs_val_out &= ~SG_CTL_FLAGM_OTHER_OPENS;
+	}
+	/* Q_TAIL boolean, [raw] 1: queue at tail; 0: head (def: depends) */
+	if (c_flgs_wm & SG_CTL_FLAGM_Q_TAIL)
+		assign_bit(SG_FFD_Q_AT_TAIL, sfp->ffd_bm,
+			   !!(c_flgs_val_in & SG_CTL_FLAGM_Q_TAIL));
+	if (c_flgs_rm & SG_CTL_FLAGM_Q_TAIL) {
+		if (test_bit(SG_FFD_Q_AT_TAIL, sfp->ffd_bm))
+			c_flgs_val_out |= SG_CTL_FLAGM_Q_TAIL;
+		else
+			c_flgs_val_out &= ~SG_CTL_FLAGM_Q_TAIL;
+	}
+	/* NO_DURATION boolean, [rbw] */
+	if (c_flgs_rm & SG_CTL_FLAGM_NO_DURATION)
+		flg = test_bit(SG_FFD_NO_DURATION, sfp->ffd_bm);
+	if (c_flgs_wm & SG_CTL_FLAGM_NO_DURATION)
+		assign_bit(SG_FFD_NO_DURATION, sfp->ffd_bm,
+			   !!(c_flgs_val_in & SG_CTL_FLAGM_NO_DURATION));
+	if (c_flgs_rm & SG_CTL_FLAGM_NO_DURATION) {
+		if (flg)
+			c_flgs_val_out |= SG_CTL_FLAGM_NO_DURATION;
+		else
+			c_flgs_val_out &= ~SG_CTL_FLAGM_NO_DURATION;
+	}
+
+	if (c_flgs_val_in != c_flgs_val_out)
+		seip->ctl_flags = c_flgs_val_out;
+}
+
+static void
+sg_extended_read_value(struct sg_fd *sfp, struct sg_extended_info *seip)
+{
+	u32 uv;
+	unsigned long idx, idx2;
+	struct sg_fd *a_sfp;
+	struct sg_device *sdp = sfp->parentdp;
+	struct sg_request *srp;
+
+	switch (seip->read_value) {
+	case SG_SEIRV_INT_MASK:
+		seip->read_value = SG_SEIM_ALL_BITS;
+		break;
+	case SG_SEIRV_BOOL_MASK:
+		seip->read_value = SG_CTL_FLAGM_ALL_BITS;
+		break;
+	case SG_SEIRV_VERS_NUM:
+		seip->read_value = sg_version_num;
+		break;
+	case SG_SEIRV_INACT_RQS:
+		uv = 0;
+		xa_for_each_marked(&sfp->srp_arr, idx, srp,
+				   SG_XA_RQ_INACTIVE) {
+			if (!srp)
+				continue;
+			++uv;
+		}
+		seip->read_value = uv;
+		break;
+	case SG_SEIRV_DEV_INACT_RQS:
+		uv = 0;
+		xa_for_each(&sdp->sfp_arr, idx2, a_sfp) {
+			if (!a_sfp)
+				continue;
+			xa_for_each_marked(&a_sfp->srp_arr, idx, srp,
+					   SG_XA_RQ_INACTIVE) {
+				if (!srp)
+					continue;
+				++uv;
+			}
+		}
+		seip->read_value = uv;
+		break;
+	case SG_SEIRV_SUBMITTED:  /* counts all non-blocking on active list */
+		seip->read_value = (u32)atomic_read(&sfp->submitted);
+		break;
+	case SG_SEIRV_DEV_SUBMITTED: /* sum(submitted) on all fd's siblings */
+		uv = 0;
+		xa_for_each(&sdp->sfp_arr, idx2, a_sfp) {
+			if (!a_sfp)
+				continue;
+			uv += (u32)atomic_read(&a_sfp->submitted);
+		}
+		seip->read_value = uv;
+		break;
+	default:
+		SG_LOG(6, sfp, "%s: can't decode %d --> read_value\n",
+		       __func__, seip->read_value);
+		seip->read_value = 0;
+		break;
+	}
+}
+
+/* Called when processing ioctl(SG_SET_GET_EXTENDED) */
+static int
+sg_ctl_extended(struct sg_fd *sfp, void __user *p)
+{
+	int result = 0;
+	int ret = 0;
+	int n, s_wr_mask, s_rd_mask;
+	u32 or_masks;
+	struct sg_device *sdp = sfp->parentdp;
+	struct sg_extended_info *seip;
+	struct sg_extended_info sei;
+
+	seip = &sei;
+	if (copy_from_user(seip, p, SZ_SG_EXTENDED_INFO))
+		return -EFAULT;
+	s_wr_mask = seip->sei_wr_mask;
+	s_rd_mask = seip->sei_rd_mask;
+	or_masks = s_wr_mask | s_rd_mask;
+	if (or_masks == 0) {
+		SG_LOG(2, sfp, "%s: both masks 0, do nothing\n", __func__);
+		return 0;
+	}
+	SG_LOG(3, sfp, "%s: wr_mask=0x%x rd_mask=0x%x\n", __func__, s_wr_mask,
+	       s_rd_mask);
+	/* check all boolean flags for either wr or rd mask set in or_mask */
+	if (or_masks & SG_SEIM_CTL_FLAGS)
+		sg_extended_bool_flags(sfp, seip);
+	/* yields minor_index (type: u32) [ro] */
+	if (or_masks & SG_SEIM_MINOR_INDEX) {
+		if (s_wr_mask & SG_SEIM_MINOR_INDEX) {
+			SG_LOG(2, sfp, "%s: writing to minor_index ignored\n",
+			       __func__);
+		}
+		if (s_rd_mask & SG_SEIM_MINOR_INDEX)
+			seip->minor_index = sdp->index;
+	}
+	if ((s_rd_mask & SG_SEIM_READ_VAL) && (s_wr_mask & SG_SEIM_READ_VAL))
+		sg_extended_read_value(sfp, seip);
+	if (or_masks & SG_SEIM_BLK_POLL) {
+		n = 0;
+		if (s_wr_mask & SG_SEIM_BLK_POLL) {
+			result = sg_sfp_blk_poll(sfp, seip->num);
+			if (result < 0) {
+				if (ret == 0)
+					ret = result;
+			} else {
+				n = result;
+			}
+		}
+		if (s_rd_mask & SG_SEIM_BLK_POLL)
+			seip->num = n;
+	}
+	/* reserved_sz [raw], since may be reduced by other limits */
+	if (s_wr_mask & SG_SEIM_RESERVED_SIZE) {
+		mutex_lock(&sfp->f_mutex);
+		result = sg_set_reserved_sz(sfp, (int)seip->reserved_sz);
+		if (ret == 0 && result)
+			ret = result;
+		mutex_unlock(&sfp->f_mutex);
+	}
+	if (s_rd_mask & SG_SEIM_RESERVED_SIZE)
+		seip->reserved_sz = (u32)min_t(int,
+					       sfp->rsv_srp->sgat_h.buflen,
+					       sdp->max_sgat_sz);
+	/* copy to user space if int or boolean read mask non-zero */
+	if (s_rd_mask || seip->ctl_flags_rd_mask) {
+		if (copy_to_user(p, seip, SZ_SG_EXTENDED_INFO))
+			ret = ret ? ret : -EFAULT;
+	}
+	return ret;
+}
+
 /*
  * For backward compatibility, output SG_MAX_QUEUE sg_req_info objects. First
  * fetch from the active list then, if there is still room, from the free
@@ -2110,6 +2324,9 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 		res = sg_ctl_abort(sdp, sfp, p);
 		mutex_unlock(&sfp->f_mutex);
 		return res;
+	case SG_SET_GET_EXTENDED:
+		SG_LOG(3, sfp, "%s:    SG_SET_GET_EXTENDED\n", __func__);
+		return sg_ctl_extended(sfp, p);
 	case SG_GET_SCSI_ID:
 		return sg_ctl_scsi_id(sdev, sfp, p);
 	case SG_SET_FORCE_PACK_ID:
@@ -2660,8 +2877,10 @@ sg_rq_end_io(struct request *rqq, blk_status_t status)
 	}
 
 	SG_LOG(6, sfp, "%s: pack_id=%d, res=0x%x\n", __func__, srp->pack_id,
-	       rq_result);
-	srp->duration = sg_calc_rq_dur(srp);
+	       srp->rq_result);
+	if (srp->start_ns > 0)	/* zero only when SG_FFD_NO_DURATION is set */
+		srp->duration = sg_calc_rq_dur(srp, test_bit(SG_FFD_TIME_IN_NS,
+							     sfp->ffd_bm));
 	if (unlikely((rq_result & SG_ML_RESULT_MSK) && slen > 0 &&
 		     test_bit(SG_FDEV_LOG_SENSE, sdp->fdev_bm))) {
 		u32 scsi_stat = rq_result & 0xff;
@@ -3788,6 +4007,9 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, int dxfr_len)
 	r_srp->sgat_h.dlen = dxfr_len;/* must be <= r_srp->sgat_h.buflen */
 	r_srp->cmd_opcode = 0xff;  /* set invalid opcode (VS), 0x0 is TUR */
 fini:
+	/* If setup stalls (e.g. blk_get_request()) debug shows 'elap=1 ns' */
+	if (!IS_ERR_OR_NULL(r_srp) && test_bit(SG_FFD_TIME_IN_NS, fp->ffd_bm))
+		r_srp->start_ns = S64_MAX;
 	if (IS_ERR(r_srp))
 		SG_LOG(1, fp, "%s: err=%ld\n", __func__, PTR_ERR(r_srp));
 	if (!IS_ERR(r_srp))
@@ -3847,6 +4069,7 @@ sg_add_sfp(struct sg_device *sdp)
 	__assign_bit(SG_FFD_FORCE_PACKID, sfp->ffd_bm, SG_DEF_FORCE_PACK_ID);
 	__assign_bit(SG_FFD_CMD_Q, sfp->ffd_bm, SG_DEF_COMMAND_Q);
 	__assign_bit(SG_FFD_KEEP_ORPHAN, sfp->ffd_bm, SG_DEF_KEEP_ORPHAN);
+	__assign_bit(SG_FFD_TIME_IN_NS, sfp->ffd_bm, SG_DEF_TIME_UNIT);
 	__assign_bit(SG_FFD_Q_AT_TAIL, sfp->ffd_bm, SG_DEFAULT_Q_AT);
 	/*
 	 * SG_SCATTER_SZ initializes scatter_elem_sz but different value may
@@ -4260,13 +4483,15 @@ sg_proc_seq_show_devstrs(struct seq_file *s, void *v)
 
 /* Writes debug info for one sg_request in obp buffer */
 static int
-sg_proc_debug_sreq(struct sg_request *srp, int to, char *obp, int len)
+sg_proc_debug_sreq(struct sg_request *srp, int to, bool t_in_ns, char *obp,
+		   int len)
 {
 	bool is_v3v4, v4, is_dur;
 	int n = 0;
 	u32 dur;
 	enum sg_rq_state rq_st;
 	const char *cp;
+	const char *tp = t_in_ns ? "ns" : "ms";
 
 	if (len < 1)
 		return 0;
@@ -4279,15 +4504,15 @@ sg_proc_debug_sreq(struct sg_request *srp, int to, char *obp, int len)
 		cp = (srp->rq_info & SG_INFO_DIRECT_IO_MASK) ?
 				"     dio>> " : "     ";
 	rq_st = atomic_read(&srp->rq_st);
-	dur = sg_get_dur(srp, &rq_st, &is_dur);
+	dur = sg_get_dur(srp, &rq_st, t_in_ns, &is_dur);
 	n += scnprintf(obp + n, len - n, "%s%s: dlen=%d/%d id=%d", cp,
 		       sg_rq_st_str(rq_st, false), srp->sgat_h.dlen,
 		       srp->sgat_h.buflen, (int)srp->pack_id);
 	if (is_dur)	/* cmd/req has completed, waiting for ... */
-		n += scnprintf(obp + n, len - n, " dur=%ums", dur);
+		n += scnprintf(obp + n, len - n, " dur=%u%s", dur, tp);
 	else if (dur < U32_MAX)	/* in-flight or busy (so ongoing) */
-		n += scnprintf(obp + n, len - n, " t_o/elap=%us/%ums",
-			       to / 1000, dur);
+		n += scnprintf(obp + n, len - n, " t_o/elap=%us/%u%s",
+			       to / 1000, dur, tp);
 	cp = (srp->rq_flags & SGV4_FLAG_HIPRI) ? "hipri " : "";
 	n += scnprintf(obp + n, len - n, " sgat=%d %sop=0x%02x\n",
 		       srp->sgat_h.num_sgat, cp, srp->cmd_opcode);
@@ -4298,6 +4523,7 @@ sg_proc_debug_sreq(struct sg_request *srp, int to, char *obp, int len)
 static int
 sg_proc_debug_fd(struct sg_fd *fp, char *obp, int len, unsigned long idx)
 {
+	bool t_in_ns = test_bit(SG_FFD_TIME_IN_NS, fp->ffd_bm);
 	int n = 0;
 	int to, k;
 	unsigned long iflags;
@@ -4332,7 +4558,8 @@ sg_proc_debug_fd(struct sg_fd *fp, char *obp, int len, unsigned long idx)
 	xa_for_each(&fp->srp_arr, idx, srp) {
 		if (xa_get_mark(&fp->srp_arr, idx, SG_XA_RQ_INACTIVE))
 			continue;
-		n += sg_proc_debug_sreq(srp, fp->timeout, obp + n, len - n);
+		n += sg_proc_debug_sreq(srp, fp->timeout, t_in_ns, obp + n,
+					len - n);
 		++k;
 		if ((k % 8) == 0) {     /* don't hold up isr_s too long */
 			xa_unlock_irqrestore(&fp->srp_arr, iflags);
@@ -4346,7 +4573,8 @@ sg_proc_debug_fd(struct sg_fd *fp, char *obp, int len, unsigned long idx)
 	xa_for_each_marked(&fp->srp_arr, idx, srp, SG_XA_RQ_INACTIVE) {
 		if (k == 0)
 			n += scnprintf(obp + n, len - n, "   Inactives:\n");
-		n += sg_proc_debug_sreq(srp, fp->timeout, obp + n, len - n);
+		n += sg_proc_debug_sreq(srp, fp->timeout, t_in_ns,
+					obp + n, len - n);
 		++k;
 		if ((k % 8) == 0) {     /* don't hold up isr_s too long */
 			xa_unlock_irqrestore(&fp->srp_arr, iflags);
diff --git a/include/uapi/scsi/sg.h b/include/uapi/scsi/sg.h
index 2b1b9df6c114..74f177583fce 100644
--- a/include/uapi/scsi/sg.h
+++ b/include/uapi/scsi/sg.h
@@ -156,6 +156,72 @@ typedef struct sg_req_info {	/* used by SG_GET_REQUEST_TABLE ioctl() */
 	int unused;
 } sg_req_info_t;
 
+/*
+ * The following defines are for manipulating struct sg_extended_info which
+ * is abbreviated to "SEI". A following "M" (i.e. "_SEIM_") indicates a
+ * mask. Most mask values correspond to an integer (usually a uint32_t) apart
+ * from SG_SEIM_CTL_FLAGS which is for boolean values packed into an integer.
+ * The mask values for those booleans start with "SG_CTL_FLAGM_". The scope
+ * of these settings, like most other ioctls, is usually that of the file
+ * descriptor the ioctl is executed on. The "rd:" indication means read-only;
+ * attempts to write to them are ignored. "rd>" means action when reading.
+ */
+#define SG_SEIM_CTL_FLAGS	0x1	/* ctl_flags_mask bits in ctl_flags */
+#define SG_SEIM_READ_VAL	0x2	/* write SG_SEIRV_*, read back value */
+#define SG_SEIM_RESERVED_SIZE	0x4	/* reserved_sz of reserve request */
+#define SG_SEIM_MINOR_INDEX	0x10	/* sg device minor index number */
+#define SG_SEIM_SGAT_ELEM_SZ	0x80	/* sgat element size (>= PAGE_SIZE) */
+#define SG_SEIM_BLK_POLL	0x100	/* call blk_poll, uses 'num' field */
+#define SG_SEIM_ALL_BITS	0x1ff	/* should be OR of previous items */
+
+/* flag and mask values for boolean fields follow */
+#define SG_CTL_FLAGM_TIME_IN_NS	0x1	/* time: nanosecs (def: millisecs) */
+#define SG_CTL_FLAGM_OTHER_OPENS 0x4	/* rd: other sg fd_s on this dev */
+#define SG_CTL_FLAGM_ORPHANS	0x8	/* rd: orphaned requests on this fd */
+#define SG_CTL_FLAGM_Q_TAIL	0x10	/* used for future cmds on this fd */
+#define SG_CTL_FLAGM_NO_DURATION 0x400	/* don't calc command duration */
+#define SG_CTL_FLAGM_ALL_BITS	0xfff	/* should be OR of previous items */
+
+/* Write one of the following values to sg_extended_info::read_value, get... */
+#define SG_SEIRV_INT_MASK	0x0	/* get SG_SEIM_ALL_BITS */
+#define SG_SEIRV_BOOL_MASK	0x1	/* get SG_CTL_FLAGM_ALL_BITS */
+#define SG_SEIRV_VERS_NUM	0x2	/* get driver version number as int */
+#define SG_SEIRV_INACT_RQS	0x3	/* number of inactive requests */
+#define SG_SEIRV_DEV_INACT_RQS	0x4	/* sum(inactive rqs) on owning dev */
+#define SG_SEIRV_SUBMITTED	0x5	/* number of mrqs submitted+unread */
+#define SG_SEIRV_DEV_SUBMITTED	0x6	/* sum(submitted) on all dev's fds */
+
+/*
+ * A pointer to the following structure is passed as the third argument to
+ * ioctl(SG_SET_GET_EXTENDED). Each bit in the *_wr_mask fields causes the
+ * corresponding integer (e.g. reserved_sz) or bit (e.g. the
+ * SG_CTL_FLAGM_TIME_IN_NS bit in ctl_flags) to be read from user space
+ * and applied to the driver. Each bit in the *_rd_mask fields causes the
+ * corresponding integer or bit to be fetched from the driver and written
+ * back to user space. If the same bit is set in both the *_wr_mask and
+ * corresponding *_rd_mask fields, then which one comes first depends on the
+ * setting but no other operation will split the two. This structure is
+ * padded to 96 bytes to allow for new values to be added in the future.
+ */
+
+/* If both sei_wr_mask and sei_rd_mask are 0, this ioctl does nothing */
+struct sg_extended_info {
+	__u32	sei_wr_mask;	/* OR-ed SG_SEIM_* user->driver values */
+	__u32	sei_rd_mask;	/* OR-ed SG_SEIM_* driver->user values */
+	__u32	ctl_flags_wr_mask;	/* OR-ed SG_CTL_FLAGM_* values */
+	__u32	ctl_flags_rd_mask;	/* OR-ed SG_CTL_FLAGM_* values */
+	__u32	ctl_flags;	/* bit values OR-ed, see SG_CTL_FLAGM_* */
+	__u32	read_value;	/* write SG_SEIRV_*, read back related */
+
+	__u32	reserved_sz;	/* data/sgl size of pre-allocated request */
+	__u32	tot_fd_thresh;	/* total data/sgat for this fd, 0: no limit */
+	__u32	minor_index;	/* rd: kernel's sg device minor number */
+	__u32	share_fd;	/* SHARE_FD and CHG_SHARE_FD use this */
+	__u32	sgat_elem_sz;	/* sgat element size (must be power of 2) */
+	__s32	num;		/* blk_poll: loop_count (-1 -> spin) */
+	__u8	pad_to_96[48];	/* pad so struct is 96 bytes long */
+};
+
 /*
  * IOCTLs: Those ioctls that are relevant to the SG 3.x drivers follow.
  * [Those that only apply to the SG 2.x drivers are at the end of the file.]
@@ -185,6 +251,9 @@ typedef struct sg_req_info {	/* used by SG_GET_REQUEST_TABLE ioctl() */
  */
 #define SG_IOCTL_MAGIC_NUM 0x22
 
+#define SG_SET_GET_EXTENDED _IOWR(SG_IOCTL_MAGIC_NUM, 0x51,	\
+				  struct sg_extended_info)
+
 /* The following ioctl has a 'sg_scsi_id_t *' object as its 3rd argument. */
 #define SG_GET_SCSI_ID 0x2276   /* Yields fd's bus, chan, dev, lun + type */
 /* SCSI id information can also be obtained from SCSI_IOCTL_GET_IDLUN */
-- 
2.25.1
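
The SG_SET_GET_EXTENDED interface added above could be driven from user space roughly as follows. This is a sketch, assuming a kernel with this patch applied: because an installed <scsi/sg.h> may predate it, the sketch carries local copies of struct sg_extended_info and of the few constants it needs (values copied from the uapi hunk above). The helper names sei_init_read_value(), sei_init_time_in_ns() and sg_read_extended_value() are hypothetical, not part of the driver's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>

/* Local copies of uapi values from this patch (assumption: not yet in libc). */
#define SG_IOCTL_MAGIC_NUM	0x22
#define SG_SEIM_CTL_FLAGS	0x1
#define SG_SEIM_READ_VAL	0x2
#define SG_CTL_FLAGM_TIME_IN_NS	0x1
#define SG_SEIRV_VERS_NUM	0x2

struct sg_extended_info {
	uint32_t sei_wr_mask;		/* OR-ed SG_SEIM_*: user -> driver */
	uint32_t sei_rd_mask;		/* OR-ed SG_SEIM_*: driver -> user */
	uint32_t ctl_flags_wr_mask;	/* OR-ed SG_CTL_FLAGM_* */
	uint32_t ctl_flags_rd_mask;	/* OR-ed SG_CTL_FLAGM_* */
	uint32_t ctl_flags;		/* boolean values packed as bits */
	uint32_t read_value;		/* write SG_SEIRV_*, read back value */
	uint32_t reserved_sz;
	uint32_t tot_fd_thresh;
	uint32_t minor_index;
	uint32_t share_fd;
	uint32_t sgat_elem_sz;
	int32_t num;
	uint8_t pad_to_96[48];
};
static_assert(offsetof(struct sg_extended_info, pad_to_96) == 48,
	      "twelve 32-bit fields precede the padding");
static_assert(sizeof(struct sg_extended_info) == 96, "uapi size is 96");

#define SG_SET_GET_EXTENDED \
	_IOWR(SG_IOCTL_MAGIC_NUM, 0x51, struct sg_extended_info)

/* READ_VAL must be set in both masks: write the selector, read the answer. */
static void sei_init_read_value(struct sg_extended_info *sei, uint32_t which)
{
	memset(sei, 0, sizeof(*sei));
	sei->sei_wr_mask = SG_SEIM_READ_VAL;
	sei->sei_rd_mask = SG_SEIM_READ_VAL;
	sei->read_value = which;
}

/* Select nanosecond (on != 0) or millisecond durations for this fd. */
static void sei_init_time_in_ns(struct sg_extended_info *sei, int on)
{
	memset(sei, 0, sizeof(*sei));
	sei->sei_wr_mask = SG_SEIM_CTL_FLAGS;
	sei->ctl_flags_wr_mask = SG_CTL_FLAGM_TIME_IN_NS;
	sei->ctl_flags = on ? SG_CTL_FLAGM_TIME_IN_NS : 0;
}

/* Returns 0 and stores the value in *out, or -1 with errno set by ioctl(). */
static int sg_read_extended_value(int sg_fd, uint32_t which, uint32_t *out)
{
	struct sg_extended_info sei;

	sei_init_read_value(&sei, which);
	if (ioctl(sg_fd, SG_SET_GET_EXTENDED, &sei) < 0)
		return -1;
	*out = sei.read_value;
	return 0;
}
```

On an open sg file descriptor, sg_read_extended_value(fd, SG_SEIRV_VERS_NUM, &vers) should yield the driver version number; since a read mask is non-zero, sg_ctl_extended() copies the structure back to user space.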



* [PATCH v18 48/83] sg: sgat_elem_sz and sum_fd_dlens
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
                   ` (47 preceding siblings ...)
  2021-04-27 21:56 ` [PATCH v18 47/83] sg: add sg_set_get_extended ioctl Douglas Gilbert
@ 2021-04-27 21:56 ` Douglas Gilbert
  2021-04-27 21:56 ` [PATCH v18 49/83] sg: tag and more_async Douglas Gilbert
                   ` (34 subsequent siblings)
  83 siblings, 0 replies; 86+ messages in thread
From: Douglas Gilbert @ 2021-04-27 21:56 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

Wire up some more capabilities of ioctl(SG_SET_GET_EXTENDED). One
is the size of each internal scatter gather list element. This
defaults to 2^15 and was fixed in previous versions of this
driver. If the user provides a value, it must be a power of
2 (bytes) and no less than PAGE_SIZE.
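
The acceptance rule for a user-supplied element size can be written as a small predicate. This user-space sketch mirrors the is_power_of_2() and PAGE_SIZE test in sg_ctl_extended(); the function names and passing the page size as a parameter are assumptions for illustration. When the kernel rejects a value it keeps the previous sgat_elem_sz and returns EINVAL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if x is a non-zero power of two (same test as is_power_of_2()). */
static bool sgat_is_power_of_2(uint32_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/*
 * Hypothetical helper: would the driver accept this sgat_elem_sz?
 * The value is treated as a signed int, as the driver does, so sizes
 * of 2^31 or more are rejected.
 */
static bool sgat_elem_sz_valid(int32_t elem_sz, int32_t page_size)
{
	assert(page_size > 0 && sgat_is_power_of_2((uint32_t)page_size));
	return elem_sz > 0 && sgat_is_power_of_2((uint32_t)elem_sz) &&
	       elem_sz >= page_size;
}
```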

sum_fd_dlens provides user control over a mechanism designed to
stop a program from starving the host machine of memory. Since
requests per file descriptor are no longer limited to 16,
thousands could be queued up by a badly designed program. If
each one requests a large buffer (say 128 KB for each READ),
then without this mechanism the OOM killer may be called on to
save the machine. The driver counts the cumulative size of the
outstanding data buffers held by each file descriptor. Once a
new submission would push that figure over a default threshold
of 32 MB (adjustable via SG_SEIM_TOT_FD_THRESH; 0 disables it),
further submissions on that file descriptor fail with E2BIG.
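
The threshold rule described above can be modelled in a few lines. This is a user-space sketch, not driver code: it assumes 4 KiB pages, treats each request's dlen as the accounted size (the driver actually adds the aligned buffer length in sg_mk_sgat()), and widens the sum to 64 bits to avoid wraparound. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE		4096u			/* assumption */
#define MODEL_TOT_FD_THRESHOLD	(32u * 1024 * 1024)	/* SG_TOT_FD_THRESHOLD */

/* Writing tot_fd_thresh: 0 disables the check, small values round up. */
static uint32_t normalize_tot_fd_thresh(uint32_t uv)
{
	if (uv > 0 && uv < MODEL_PAGE_SIZE)
		uv = MODEL_PAGE_SIZE;
	return uv;
}

/* Would a new request of dxfr_len bytes push the fd over its threshold? */
static bool submission_exceeds(uint32_t sum_fd_dlens, uint32_t dxfr_len,
			       uint32_t tot_fd_thresh)
{
	if (tot_fd_thresh == 0)		/* 0: accounting is switched off */
		return false;
	return (uint64_t)sum_fd_dlens + dxfr_len > tot_fd_thresh;
}

/* How many equal-sized buffers can be outstanding before E2BIG. */
static uint32_t max_concurrent(uint32_t each, uint32_t tot_fd_thresh)
{
	assert(each > 0 && tot_fd_thresh > 0);
	return tot_fd_thresh / each;
}
```

With the 32 MB default and 128 KB READs, 256 buffers bring the sum exactly to the threshold, so under this model the 257th outstanding submission is the first one refused.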

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c      | 68 ++++++++++++++++++++++++++++++++++++++----
 include/uapi/scsi/sg.h |  1 +
 2 files changed, 64 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 17a733d621c7..b141b0113f96 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -94,7 +94,11 @@ enum sg_rq_state {	/* N.B. sg_rq_state_arr assumes SG_RS_AWAIT_RCV==2 */
 	SG_RS_BUSY,		/* temporary state should rarely be seen */
 };
 
+/* If sum_of(dlen) of a fd exceeds this, write() will yield E2BIG */
+#define SG_TOT_FD_THRESHOLD (32 * 1024 * 1024)
+
 #define SG_TIME_UNIT_MS 0	/* milliseconds */
+/* #define SG_TIME_UNIT_NS 1	   nanoseconds */
 #define SG_DEF_TIME_UNIT SG_TIME_UNIT_MS
 #define SG_DEFAULT_TIMEOUT mult_frac(SG_DEFAULT_TIMEOUT_USER, HZ, USER_HZ)
 #define SG_FD_Q_AT_HEAD 0
@@ -238,6 +242,8 @@ struct sg_fd {		/* holds the state of a file descriptor */
 	atomic_t submitted;	/* number inflight or awaiting receive */
 	atomic_t waiting;	/* number of requests awaiting receive */
 	atomic_t inactives;	/* number of inactive requests */
+	atomic_t sum_fd_dlens;	/* when tot_fd_thresh>0 this is sum_of(dlen) */
+	int tot_fd_thresh;	/* E2BIG if sum_of(dlen) > this, 0: ignore */
 	int sgat_elem_sz;	/* initialized to scatter_elem_sz */
 	int mmap_sz;		/* byte size of previous mmap() call */
 	unsigned long ffd_bm[1];	/* see SG_FFD_* defines above */
@@ -2144,8 +2150,8 @@ sg_ctl_extended(struct sg_fd *sfp, void __user *p)
 {
 	int result = 0;
 	int ret = 0;
-	int n, s_wr_mask, s_rd_mask;
-	u32 or_masks;
+	int n, j, s_wr_mask, s_rd_mask;
+	u32 uv, or_masks;
 	struct sg_device *sdp = sfp->parentdp;
 	struct sg_extended_info *seip;
 	struct sg_extended_info sei;
@@ -2162,6 +2168,19 @@ sg_ctl_extended(struct sg_fd *sfp, void __user *p)
 	}
 	SG_LOG(3, sfp, "%s: wr_mask=0x%x rd_mask=0x%x\n", __func__, s_wr_mask,
 	       s_rd_mask);
+	/* tot_fd_thresh (u32), [rbw] [limit for sum of active cmd dlen_s] */
+	if (or_masks & SG_SEIM_TOT_FD_THRESH) {
+		u32 hold = sfp->tot_fd_thresh;
+
+		if (s_wr_mask & SG_SEIM_TOT_FD_THRESH) {
+			uv = seip->tot_fd_thresh;
+			if (uv > 0 && uv < PAGE_SIZE)
+				uv = PAGE_SIZE;
+			sfp->tot_fd_thresh = uv;
+		}
+		if (s_rd_mask & SG_SEIM_TOT_FD_THRESH)
+			seip->tot_fd_thresh = hold;
+	}
 	/* check all boolean flags for either wr or rd mask set in or_mask */
 	if (or_masks & SG_SEIM_CTL_FLAGS)
 		sg_extended_bool_flags(sfp, seip);
@@ -2176,6 +2195,7 @@ sg_ctl_extended(struct sg_fd *sfp, void __user *p)
 	}
 	if ((s_rd_mask & SG_SEIM_READ_VAL) && (s_wr_mask & SG_SEIM_READ_VAL))
 		sg_extended_read_value(sfp, seip);
+	/* call blk_poll() on this fd's HIPRI requests [raw] */
 	if (or_masks & SG_SEIM_BLK_POLL) {
 		n = 0;
 		if (s_wr_mask & SG_SEIM_BLK_POLL) {
@@ -2188,7 +2208,24 @@ sg_ctl_extended(struct sg_fd *sfp, void __user *p)
 			}
 		}
 		if (s_rd_mask & SG_SEIM_BLK_POLL)
-			seip->num = n;
+			seip->num = n;		/* number completed by LLD */
+	}
+	/* override scatter gather element size [rbw] (def: SG_SCATTER_SZ) */
+	if (or_masks & SG_SEIM_SGAT_ELEM_SZ) {
+		n = sfp->sgat_elem_sz;
+		if (s_wr_mask & SG_SEIM_SGAT_ELEM_SZ) {
+			j = (int)seip->sgat_elem_sz;
+			if (!is_power_of_2(j) || j < (int)PAGE_SIZE) {
+				SG_LOG(1, sfp, "%s: %s not power of 2, %s\n",
+				       __func__, "sgat element size",
+				       "or less than PAGE_SIZE");
+				ret = -EINVAL;
+			} else {
+				sfp->sgat_elem_sz = j;
+			}
+		}
+		if (s_rd_mask & SG_SEIM_SGAT_ELEM_SZ)
+			seip->sgat_elem_sz = n; /* prior value if rw */
 	}
 	/* reserved_sz [raw], since may be reduced by other limits */
 	if (s_wr_mask & SG_SEIM_RESERVED_SIZE) {
@@ -3586,6 +3623,8 @@ sg_mk_sgat(struct sg_request *srp, struct sg_fd *sfp, int minlen)
 	schp->page_order = order;
 	schp->num_sgat = k;
 	schp->buflen = align_sz;
+	if (sfp->tot_fd_thresh > 0)
+		atomic_add(align_sz, &sfp->sum_fd_dlens);
 	return 0;
 err_out:
 	k = pgp - schp->pages;
@@ -3634,6 +3673,14 @@ sg_remove_sgat(struct sg_request *srp)
 		" [rsv]" : ""));
 	sg_remove_sgat_helper(sfp, schp);
 
+	if (sfp->tot_fd_thresh > 0) {
+		/* this is a subtraction, error if it goes negative */
+		if (atomic_add_negative(-schp->buflen, &sfp->sum_fd_dlens)) {
+			SG_LOG(2, sfp, "%s: logic error: this dlen > %s\n",
+			       __func__, "sum_fd_dlens");
+			atomic_set(&sfp->sum_fd_dlens, 0);
+		}
+	}
 	memset(schp, 0, sizeof(*schp));         /* zeros buflen and dlen */
 }
 
@@ -3886,6 +3933,7 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, int dxfr_len)
 	bool second = false;
 	bool has_inactive = false;
 	int l_used_idx;
+	u32 sum_dlen;
 	unsigned long idx, s_idx, end_idx, iflags;
 	struct sg_fd *fp = cwrp->sfp;
 	struct sg_request *r_srp = NULL;	/* request to return */
@@ -3977,6 +4025,13 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, int dxfr_len)
 			SG_LOG(6, fp, "%s: trying 2nd req but cmd_q=false\n",
 			       __func__);
 			goto fini;
+		} else if (fp->tot_fd_thresh > 0) {
+			sum_dlen = atomic_read(&fp->sum_fd_dlens) + dxfr_len;
+			if (sum_dlen > (u32)fp->tot_fd_thresh) {
+				r_srp = ERR_PTR(-E2BIG);
+				SG_LOG(2, fp, "%s: sum_of_dlen(%u) > %s\n",
+				       __func__, sum_dlen, "tot_fd_thresh");
+			}
 		}
 		r_srp = sg_mk_srp_sgat(fp, act_empty, dxfr_len);
 		if (IS_ERR(r_srp)) {
@@ -4071,6 +4126,8 @@ sg_add_sfp(struct sg_device *sdp)
 	__assign_bit(SG_FFD_KEEP_ORPHAN, sfp->ffd_bm, SG_DEF_KEEP_ORPHAN);
 	__assign_bit(SG_FFD_TIME_IN_NS, sfp->ffd_bm, SG_DEF_TIME_UNIT);
 	__assign_bit(SG_FFD_Q_AT_TAIL, sfp->ffd_bm, SG_DEFAULT_Q_AT);
+	sfp->tot_fd_thresh = SG_TOT_FD_THRESHOLD;
+	atomic_set(&sfp->sum_fd_dlens, 0);
 	/*
 	 * SG_SCATTER_SZ initializes scatter_elem_sz but different value may
 	 * be given as driver/module parameter (e.g. 'scatter_elem_sz=8192').
@@ -4547,8 +4604,9 @@ sg_proc_debug_fd(struct sg_fd *fp, char *obp, int len, unsigned long idx)
 		       (int)test_bit(SG_FFD_KEEP_ORPHAN, fp->ffd_bm),
 		       fp->ffd_bm[0]);
 	n += scnprintf(obp + n, len - n,
-		       "   mmap_sz=%d low_used_idx=%d low_await_idx=%d\n",
-		       fp->mmap_sz, READ_ONCE(fp->low_used_idx), READ_ONCE(fp->low_await_idx));
+		       "   mmap_sz=%d low_used_idx=%d low_await_idx=%d sum_fd_dlens=%u\n",
+		       fp->mmap_sz, READ_ONCE(fp->low_used_idx), READ_ONCE(fp->low_await_idx),
+		       atomic_read(&fp->sum_fd_dlens));
 	n += scnprintf(obp + n, len - n,
 		       "   submitted=%d waiting=%d inactives=%d   open thr_id=%d\n",
 		       atomic_read(&fp->submitted),
diff --git a/include/uapi/scsi/sg.h b/include/uapi/scsi/sg.h
index 74f177583fce..532f0f0a56be 100644
--- a/include/uapi/scsi/sg.h
+++ b/include/uapi/scsi/sg.h
@@ -169,6 +169,7 @@ typedef struct sg_req_info {	/* used by SG_GET_REQUEST_TABLE ioctl() */
 #define SG_SEIM_CTL_FLAGS	0x1	/* ctl_flags_mask bits in ctl_flags */
 #define SG_SEIM_READ_VAL	0x2	/* write SG_SEIRV_*, read back value */
 #define SG_SEIM_RESERVED_SIZE	0x4	/* reserved_sz of reserve request */
+#define SG_SEIM_TOT_FD_THRESH	0x8	/* tot_fd_thresh of data buffers */
 #define SG_SEIM_MINOR_INDEX	0x10	/* sg device minor index number */
 #define SG_SEIM_SGAT_ELEM_SZ	0x80	/* sgat element size (>= PAGE_SIZE) */
 #define SG_SEIM_BLK_POLL	0x100	/* call blk_poll, uses 'num' field */
-- 
2.25.1



* [PATCH v18 49/83] sg: tag and more_async
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
                   ` (48 preceding siblings ...)
  2021-04-27 21:56 ` [PATCH v18 48/83] sg: sgat_elem_sz and sum_fd_dlens Douglas Gilbert
@ 2021-04-27 21:56 ` Douglas Gilbert
  2021-04-27 21:57 ` [PATCH v18 50/83] sg: add fd sharing , change, unshare Douglas Gilbert
                   ` (33 subsequent siblings)
  83 siblings, 0 replies; 86+ messages in thread
From: Douglas Gilbert @ 2021-04-27 21:56 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

Add tag tracking, a capability functionally similar to pack_id.
The difference is that the sg user provides the pack_id while
the block layer generates the tag.
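
Lookup by either identifier follows the same pattern as sg_match_request(): scan the requests awaiting receipt, return the first one if the id is the wildcard (-1, shared by SG_PACK_ID_WILDCARD and SG_TAG_WILDCARD), otherwise compare against the tag or the pack_id depending on the fd's setting. The sketch below models that with a plain array instead of the driver's xarray; struct model_req and model_match() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_ID_WILDCARD (-1)	/* SG_PACK_ID_WILDCARD == SG_TAG_WILDCARD */

struct model_req {
	int pack_id;	/* supplied by the sg user */
	int tag;	/* generated by the block layer */
	bool awaiting;	/* completed, waiting to be received */
};

/* Return the first awaiting request matching id, or NULL if none. */
static struct model_req *model_match(struct model_req *reqs, size_t n,
				     bool use_tag, int id)
{
	assert(reqs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!reqs[i].awaiting)
			continue;
		if (id == MODEL_ID_WILDCARD)
			return &reqs[i];
		if ((use_tag ? reqs[i].tag : reqs[i].pack_id) == id)
			return &reqs[i];
	}
	return NULL;
}
```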

When set, the more_async flag instructs blk_get_request() not
to block, so such a submission may instead fail with EBUSY.
Without it, blk_get_request() blocks in the current driver on
rare occasions for some obscure reason.

Add debug_summary to /proc/scsi/sg/ and snapshot_summary to
/sys/kernel/debug/scsi_generic/. Both give a summary of each
active sg file descriptor but don't go down to the request
level.

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c      | 301 +++++++++++++++++++++++++++++------------
 include/uapi/scsi/sg.h |   3 +
 2 files changed, 214 insertions(+), 90 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index b141b0113f96..c0a4fbcc4aa2 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -109,6 +109,7 @@ enum sg_rq_state {	/* N.B. sg_rq_state_arr assumes SG_RS_AWAIT_RCV==2 */
 #define SG_ML_RESULT_MSK 0x0fff00ff	/* mid-level's 32 bit result value */
 
 #define SG_PACK_ID_WILDCARD (-1)
+#define SG_TAG_WILDCARD (-1)
 
 #define SG_ADD_RQ_MAX_RETRIES 40	/* to stop infinite _trylock(s) */
 
@@ -131,7 +132,9 @@ enum sg_rq_state {	/* N.B. sg_rq_state_arr assumes SG_RS_AWAIT_RCV==2 */
 #define SG_FFD_HIPRI_SEEN	3	/* could have HIPRI requests active */
 #define SG_FFD_TIME_IN_NS	4	/* set: time in nanoseconds, else ms */
 #define SG_FFD_Q_AT_TAIL	5	/* set: queue reqs at tail of blk q */
-#define SG_FFD_NO_DURATION	6	/* don't do command duration calc */
+#define SG_FFD_PREFER_TAG	6	/* prefer tag over pack_id (def) */
+#define SG_FFD_NO_DURATION	7	/* don't do command duration calc */
+#define SG_FFD_MORE_ASYNC	8	/* yield EBUSY more often */
 
 /* Bit positions (flags) for sg_device::fdev_bm bitmask follow */
 #define SG_FDEV_EXCLUDE		0	/* have fd open with O_EXCL */
@@ -210,16 +213,17 @@ struct sg_request {	/* active SCSI command or inactive request */
 		struct sg_slice_hdr3 s_hdr3;  /* subset of sg_io_hdr */
 		struct sg_slice_hdr4 s_hdr4; /* reduced size struct sg_io_v4 */
 	};
-	u32 duration;		/* cmd duration in milliseconds */
-	u32 rq_flags;		/* hold user supplied flags */
+	u32 duration;		/* cmd duration in milli or nano seconds */
+	u32 rq_flags;		/* flags given in v3 and v4 */
 	u32 rq_idx;		/* my index within parent's srp_arr */
 	u32 rq_info;		/* info supplied by v3 and v4 interfaces */
 	u32 rq_result;		/* packed scsi request result from LLD */
 	int in_resid;		/* requested-actual byte count on data-in */
-	int pack_id;		/* user provided packet identifier field */
+	int pack_id;		/* v3 pack_id or in v4 request_extra field */
 	int sense_len;		/* actual sense buffer length (data-in) */
 	atomic_t rq_st;		/* request state, holds a enum sg_rq_state */
 	u8 cmd_opcode;		/* first byte of SCSI cdb */
+	int tag;		/* block layer identifier of request */
 	blk_qc_t cookie;	/* ids 1 or more queues for blk_poll() */
 	u64 start_ns;		/* starting point of command duration calc */
 	unsigned long frq_bm[1];	/* see SG_FRQ_* defines above */
@@ -304,7 +308,8 @@ static int sg_read_append(struct sg_request *srp, void __user *outp,
 static void sg_remove_sgat(struct sg_request *srp);
 static struct sg_fd *sg_add_sfp(struct sg_device *sdp);
 static void sg_remove_sfp(struct kref *);
-static struct sg_request *sg_find_srp_by_id(struct sg_fd *sfp, int id);
+static struct sg_request *sg_find_srp_by_id(struct sg_fd *sfp, int id,
+					    bool is_tag);
 static struct sg_request *sg_setup_req(struct sg_comm_wr_t *cwrp,
 				       int dxfr_len);
 static void sg_deact_request(struct sg_fd *sfp, struct sg_request *srp);
@@ -850,6 +855,14 @@ sg_submit_v4(struct file *filp, struct sg_fd *sfp, void __user *p,
 		return PTR_ERR(srp);
 	if (o_srp)
 		*o_srp = srp;
+	if (p && !sync && (srp->rq_flags & SGV4_FLAG_YIELD_TAG)) {
+		u64 gen_tag = srp->tag;
+		struct sg_io_v4 __user *h4_up = (struct sg_io_v4 __user *)p;
+
+		if (unlikely(copy_to_user(&h4_up->generated_tag, &gen_tag,
+					  sizeof(gen_tag))))
+			return -EFAULT;
+	}
 	return res;
 }
 
@@ -875,7 +888,7 @@ static int
 sg_ctl_iosubmit_v3(struct file *filp, struct sg_fd *sfp, void __user *p)
 {
 	int res;
-	u8 hdr_store[SZ_SG_IO_V4];      /* max(v3interface, v4interface) */
+	u8 hdr_store[SZ_SG_IO_V4];	/* max(v3interface, v4interface) */
 	struct sg_io_hdr *h3p = (struct sg_io_hdr *)hdr_store;
 	struct sg_device *sdp = sfp->parentdp;
 
@@ -1146,7 +1159,8 @@ sg_common_write(struct sg_comm_wr_t *cwrp)
  * returns true (or an event like a signal (e.g. control-C) occurs).
  */
 static inline bool
-sg_get_ready_srp(struct sg_fd *sfp, struct sg_request **srpp, int pack_id)
+sg_get_ready_srp(struct sg_fd *sfp, struct sg_request **srpp, int id,
+		 bool is_tag)
 {
 	struct sg_request *srp;
 
@@ -1154,7 +1168,7 @@ sg_get_ready_srp(struct sg_fd *sfp, struct sg_request **srpp, int pack_id)
 		*srpp = NULL;
 		return true;
 	}
-	srp = sg_find_srp_by_id(sfp, pack_id);
+	srp = sg_find_srp_by_id(sfp, id, is_tag);
 	*srpp = srp;
 	return !!srp;
 }
@@ -1294,6 +1308,8 @@ sg_receive_v4(struct sg_fd *sfp, struct sg_request *srp, void __user *p,
 	h4p->usr_ptr = srp->s_hdr4.usr_ptr;
 	h4p->response = (uintptr_t)srp->s_hdr4.sbp;
 	h4p->request_extra = srp->pack_id;
+	if (test_bit(SG_FFD_PREFER_TAG, sfp->ffd_bm))
+		h4p->generated_tag = srp->tag;
 	if (p) {
 		if (copy_to_user(p, h4p, SZ_SG_IO_V4))
 			err = err ? err : -EFAULT;
@@ -1314,8 +1330,10 @@ static int
 sg_ctl_ioreceive(struct file *filp, struct sg_fd *sfp, void __user *p)
 {
 	bool non_block = !!(filp->f_flags & O_NONBLOCK);
+	bool use_tag = false;
 	int res, id;
 	int pack_id = SG_PACK_ID_WILDCARD;
+	int tag = SG_TAG_WILDCARD;
 	u8 v4_holder[SZ_SG_IO_V4];
 	struct sg_io_v4 *h4p = (struct sg_io_v4 *)v4_holder;
 	struct sg_device *sdp = sfp->parentdp;
@@ -1334,9 +1352,16 @@ sg_ctl_ioreceive(struct file *filp, struct sg_fd *sfp, void __user *p)
 		non_block = true;	/* set by either this or O_NONBLOCK */
 	SG_LOG(3, sfp, "%s: non_block(+IMMED)=%d\n", __func__, non_block);
 	/* read in part of v3 or v4 header for pack_id or tag based find */
-	id = pack_id;
+	if (test_bit(SG_FFD_FORCE_PACKID, sfp->ffd_bm)) {
+		use_tag = test_bit(SG_FFD_PREFER_TAG, sfp->ffd_bm);
+		if (use_tag)
+			tag = h4p->request_tag;	/* top 32 bits ignored */
+		else
+			pack_id = h4p->request_extra;
+	}
+	id = use_tag ? tag : pack_id;
 try_again:
-	srp = sg_find_srp_by_id(sfp, id);
+	srp = sg_find_srp_by_id(sfp, id, use_tag);
 	if (!srp) {     /* nothing available so wait on packet or */
 		if (unlikely(SG_IS_DETACHING(sdp)))
 			return -ENODEV;
@@ -1344,7 +1369,7 @@ sg_ctl_ioreceive(struct file *filp, struct sg_fd *sfp, void __user *p)
 			return -EAGAIN;
 		res = wait_event_interruptible(sfp->read_wait,
 					       sg_get_ready_srp(sfp, &srp,
-								id));
+								id, use_tag));
 		if (unlikely(SG_IS_DETACHING(sdp)))
 			return -ENODEV;
 		if (res)
@@ -1391,7 +1416,7 @@ sg_ctl_ioreceive_v3(struct file *filp, struct sg_fd *sfp, void __user *p)
 	if (test_bit(SG_FFD_FORCE_PACKID, sfp->ffd_bm))
 		pack_id = h3p->pack_id;
 try_again:
-	srp = sg_find_srp_by_id(sfp, pack_id);
+	srp = sg_find_srp_by_id(sfp, pack_id, false);
 	if (!srp) {     /* nothing available so wait on packet or */
 		if (unlikely(SG_IS_DETACHING(sdp)))
 			return -ENODEV;
@@ -1399,7 +1424,7 @@ sg_ctl_ioreceive_v3(struct file *filp, struct sg_fd *sfp, void __user *p)
 			return -EAGAIN;
 		res = wait_event_interruptible
 				(sfp->read_wait,
-				 sg_get_ready_srp(sfp, &srp, pack_id));
+				 sg_get_ready_srp(sfp, &srp, pack_id, false));
 		if (unlikely(SG_IS_DETACHING(sdp)))
 			return -ENODEV;
 		if (unlikely(res))
@@ -1561,15 +1586,15 @@ sg_read(struct file *filp, char __user *p, size_t count, loff_t *ppos)
 		}
 	}
 try_again:
-	srp = sg_find_srp_by_id(sfp, want_id);
+	srp = sg_find_srp_by_id(sfp, want_id, false);
 	if (!srp) {	/* nothing available so wait on packet to arrive or */
 		if (unlikely(SG_IS_DETACHING(sdp)))
 			return -ENODEV;
 		if (non_block) /* O_NONBLOCK or v3::flags & SGV4_FLAG_IMMED */
 			return -EAGAIN;
-		ret = wait_event_interruptible(sfp->read_wait,
-					       sg_get_ready_srp(sfp, &srp,
-								want_id));
+		ret = wait_event_interruptible
+				(sfp->read_wait,
+				 sg_get_ready_srp(sfp, &srp, want_id, false));
 		if (unlikely(SG_IS_DETACHING(sdp)))
 			return -ENODEV;
 		if (ret)	/* -ERESTARTSYS as signal hit process */
@@ -1704,10 +1729,10 @@ sg_fill_request_element(struct sg_fd *sfp, struct sg_request *srp,
 	rip->orphan = test_bit(SG_FRQ_IS_ORPHAN, srp->frq_bm);
 	rip->sg_io_owned = test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm);
 	rip->problem = !!(srp->rq_result & SG_ML_RESULT_MSK);
-	rip->pack_id = srp->pack_id;
+	rip->pack_id = test_bit(SG_FFD_PREFER_TAG, sfp->ffd_bm) ?
+				srp->tag : srp->pack_id;
 	rip->usr_ptr = test_bit(SG_FRQ_IS_V4I, srp->frq_bm) ?
 			uptr64(srp->s_hdr4.usr_ptr) : srp->s_hdr3.usr_ptr;
-	rip->usr_ptr = srp->s_hdr3.usr_ptr;
 	xa_unlock_irqrestore(&sfp->srp_arr, iflags);
 }
 
@@ -1821,18 +1846,27 @@ sg_ctl_sg_io(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 	return res;
 }
 
+/* When use_tag is true then id is a tag, else it is a pack_id. */
 static struct sg_request *
-sg_match_request(struct sg_fd *sfp, int id)
+sg_match_request(struct sg_fd *sfp, bool use_tag, int id)
 {
 	int num_waiting = atomic_read(&sfp->waiting);
 	unsigned long idx;
 	struct sg_request *srp;
 
-	if (num_waiting < 1)
-		return NULL;
+	if (num_waiting < 1) {
+		num_waiting = atomic_read_acquire(&sfp->waiting);
+		if (num_waiting < 1)
+			return NULL;
+	}
 	if (id == SG_PACK_ID_WILDCARD) {
 		xa_for_each_marked(&sfp->srp_arr, idx, srp, SG_XA_RQ_AWAIT)
 			return srp;
+	} else if (use_tag) {
+		xa_for_each_marked(&sfp->srp_arr, idx, srp, SG_XA_RQ_AWAIT) {
+			if (id == srp->tag)
+				return srp;
+		}
 	} else {
 		xa_for_each_marked(&sfp->srp_arr, idx, srp, SG_XA_RQ_AWAIT) {
 			if (id == srp->pack_id)
@@ -1845,7 +1879,8 @@ sg_match_request(struct sg_fd *sfp, int id)
 static int
 sg_ctl_abort(struct sg_device *sdp, struct sg_fd *sfp, void __user *p)
 {
-	int res, pack_id, id;
+	bool use_tag;
+	int res, pack_id, tag, id;
 	unsigned long iflags, idx;
 	struct sg_fd *o_sfp;
 	struct sg_request *srp;
@@ -1857,16 +1892,18 @@ sg_ctl_abort(struct sg_device *sdp, struct sg_fd *sfp, void __user *p)
 	if (h4p->guard != 'Q' || h4p->protocol != 0 || h4p->subprotocol != 0)
 		return -EPERM;
 	pack_id = h4p->request_extra;
-	id = pack_id;
+	tag = h4p->request_tag;
+	use_tag = test_bit(SG_FFD_PREFER_TAG, sfp->ffd_bm);
+	id = use_tag ? tag : pack_id;
 
 	xa_lock_irqsave(&sfp->srp_arr, iflags);
-	srp = sg_match_request(sfp, id);
+	srp = sg_match_request(sfp, use_tag, id);
 	if (!srp) {	/* assume device (not just fd) scope */
 		xa_unlock_irqrestore(&sfp->srp_arr, iflags);
 		xa_for_each(&sdp->sfp_arr, idx, o_sfp) {
 			if (o_sfp == sfp)
 				continue;	/* already checked */
-			srp = sg_match_request(o_sfp, id);
+			srp = sg_match_request(o_sfp, use_tag, id);
 			if (srp) {
 				sfp = o_sfp;
 				xa_lock_irqsave(&sfp->srp_arr, iflags);
@@ -2047,6 +2084,16 @@ sg_extended_bool_flags(struct sg_fd *sfp, struct sg_extended_info *seip)
 		else
 			c_flgs_val_out &= ~SG_CTL_FLAGM_TIME_IN_NS;
 	}
+	/* TAG_FOR_PACK_ID boolean, [raw] search by tag or pack_id (def) */
+	if (c_flgs_wm & SG_CTL_FLAGM_TAG_FOR_PACK_ID)
+		assign_bit(SG_FFD_PREFER_TAG, sfp->ffd_bm,
+			   !!(c_flgs_val_in & SG_CTL_FLAGM_TAG_FOR_PACK_ID));
+	if (c_flgs_rm & SG_CTL_FLAGM_TAG_FOR_PACK_ID) {
+		if (test_bit(SG_FFD_PREFER_TAG, sfp->ffd_bm))
+			c_flgs_val_out |= SG_CTL_FLAGM_TAG_FOR_PACK_ID;
+		else
+			c_flgs_val_out &= ~SG_CTL_FLAGM_TAG_FOR_PACK_ID;
+	}
 	/* OTHER_OPENS boolean, [ro] any other sg open fds on this dev? */
 	if (c_flgs_rm & SG_CTL_FLAGM_OTHER_OPENS) {
 		if (atomic_read(&sdp->open_cnt) > 1)
@@ -2076,6 +2123,18 @@ sg_extended_bool_flags(struct sg_fd *sfp, struct sg_extended_info *seip)
 		else
 			c_flgs_val_out &= ~SG_CTL_FLAGM_NO_DURATION;
 	}
+	/* MORE_ASYNC boolean, [rbw] */
+	if (c_flgs_rm & SG_CTL_FLAGM_MORE_ASYNC)
+		flg = test_bit(SG_FFD_MORE_ASYNC, sfp->ffd_bm);
+	if (c_flgs_wm & SG_CTL_FLAGM_MORE_ASYNC)
+		assign_bit(SG_FFD_MORE_ASYNC, sfp->ffd_bm,
+			   !!(c_flgs_val_in & SG_CTL_FLAGM_MORE_ASYNC));
+	if (c_flgs_rm & SG_CTL_FLAGM_MORE_ASYNC) {
+		if (flg)
+			c_flgs_val_out |= SG_CTL_FLAGM_MORE_ASYNC;
+		else
+			c_flgs_val_out &= ~SG_CTL_FLAGM_MORE_ASYNC;
+	}
 
 	if (c_flgs_val_in != c_flgs_val_out)
 		seip->ctl_flags = c_flgs_val_out;
@@ -2103,24 +2162,16 @@ sg_extended_read_value(struct sg_fd *sfp, struct sg_extended_info *seip)
 	case SG_SEIRV_INACT_RQS:
 		uv = 0;
 		xa_for_each_marked(&sfp->srp_arr, idx, srp,
-				   SG_XA_RQ_INACTIVE) {
-			if (!srp)
-				continue;
+				   SG_XA_RQ_INACTIVE)
 			++uv;
-		}
 		seip->read_value = uv;
 		break;
 	case SG_SEIRV_DEV_INACT_RQS:
 		uv = 0;
 		xa_for_each(&sdp->sfp_arr, idx2, a_sfp) {
-			if (!a_sfp)
-				continue;
 			xa_for_each_marked(&a_sfp->srp_arr, idx, srp,
-					   SG_XA_RQ_INACTIVE) {
-				if (!srp)
-					continue;
+					   SG_XA_RQ_INACTIVE)
 				++uv;
-			}
 		}
 		seip->read_value = uv;
 		break;
@@ -2129,11 +2180,8 @@ sg_extended_read_value(struct sg_fd *sfp, struct sg_extended_info *seip)
 		break;
 	case SG_SEIRV_DEV_SUBMITTED: /* sum(submitted) on all fd's siblings */
 		uv = 0;
-		xa_for_each(&sdp->sfp_arr, idx2, a_sfp) {
-			if (!a_sfp)
-				continue;
+		xa_for_each(&sdp->sfp_arr, idx2, a_sfp)
 			uv += (u32)atomic_read(&a_sfp->submitted);
-		}
 		seip->read_value = uv;
 		break;
 	default:
@@ -2375,10 +2423,21 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 		return 0;
 	case SG_GET_PACK_ID:    /* or tag of oldest "read"-able, -1 if none */
 		val = -1;
-		xa_for_each_marked(&sfp->srp_arr, idx, srp, SG_XA_RQ_AWAIT) {
-			if (!test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm)) {
-				val = srp->pack_id;
-				break;
+		if (test_bit(SG_FFD_PREFER_TAG, sfp->ffd_bm)) {
+			xa_for_each_marked(&sfp->srp_arr, idx, srp,
+					   SG_XA_RQ_AWAIT) {
+				if (!test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm)) {
+					val = srp->tag;
+					break;
+				}
+			}
+		} else {
+			xa_for_each_marked(&sfp->srp_arr, idx, srp,
+					   SG_XA_RQ_AWAIT) {
+				if (!test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm)) {
+					val = srp->pack_id;
+					break;
+				}
 			}
 		}
 		SG_LOG(3, sfp, "%s:    SG_GET_PACK_ID=%d\n", __func__, val);
@@ -2664,7 +2723,7 @@ sg_sfp_blk_poll(struct sg_fd *sfp, int loop_count)
  * flags OR-ed together.
  */
 static __poll_t
-sg_poll(struct file *filp, poll_table * wait)
+sg_poll(struct file *filp, poll_table *wait)
 {
 	int num;
 	__poll_t p_res = 0;
@@ -3078,7 +3137,7 @@ sg_add_device(struct device *cl_dev, struct class_interface *cl_intf)
 	struct scsi_device *scsidp = to_scsi_device(cl_dev->parent);
 	struct gendisk *disk;
 	struct sg_device *sdp = NULL;
-	struct cdev * cdev = NULL;
+	struct cdev *cdev = NULL;
 	int error;
 	unsigned long iflags;
 
@@ -3360,6 +3419,7 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 	int dxfer_len = 0;
 	int r0w = READ;
 	u32 rq_flags = srp->rq_flags;
+	int blk_flgs;
 	unsigned int iov_count = 0;
 	void __user *up;
 	struct request *rqq;
@@ -3408,17 +3468,15 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 	q = sdp->device->request_queue;
 
 	/*
-	 * NOTE
-	 *
-	 * With scsi-mq enabled, there are a fixed number of preallocated
-	 * requests equal in number to shost->can_queue.  If all of the
-	 * preallocated requests are already in use, then blk_get_request()
-	 * will sleep until an active command completes, freeing up a request.
-	 * Although waiting in an asynchronous interface is less than ideal, we
-	 * do not want to use BLK_MQ_REQ_NOWAIT here because userspace might
-	 * not expect an EWOULDBLOCK from this condition.
+	 * For backward compatibility, default to the blocking variant even
+	 * when in non-blocking (async) mode. If the SG_CTL_FLAGM_MORE_ASYNC
+	 * boolean is set on this file descriptor, return -EAGAIN when
+	 * blk_get_request(BLK_MQ_REQ_NOWAIT) yields EAGAIN (aka EWOULDBLOCK).
 	 */
-	rqq = blk_get_request(q, (r0w ? REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN), 0);
+	blk_flgs = (test_bit(SG_FFD_MORE_ASYNC, sfp->ffd_bm)) ?
+						BLK_MQ_REQ_NOWAIT : 0;
+	rqq = blk_get_request(q, (r0w ? REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN),
+			      blk_flgs);
 	if (IS_ERR(rqq)) {
 		kfree(long_cmdp);
 		return PTR_ERR(rqq);
@@ -3426,9 +3484,10 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 	/* current sg_request protected by SG_RS_BUSY state */
 	scsi_rp = scsi_req(rqq);
 	WRITE_ONCE(srp->rqq, rqq);
+	if (rq_flags & SGV4_FLAG_YIELD_TAG)
+		srp->tag = rqq->tag;
 	if (rq_flags & SGV4_FLAG_HIPRI)
 		set_bit(SG_FFD_HIPRI_SEEN, sfp->ffd_bm);
-
 	if (cwrp->cmd_len > BLK_MAX_CDB)
 		scsi_rp->cmd = long_cmdp;	/* transfer ownership */
 	if (cwrp->u_cmdp)
@@ -3727,18 +3786,20 @@ sg_read_append(struct sg_request *srp, void __user *outp, int num_xfer)
 
 /*
  * If there are multiple requests outstanding, the speed of this function is
- * important. SG_PACK_ID_WILDCARD is -1 and that case is typically
+ * important. 'id' is pack_id when is_tag=false, otherwise it is a tag. Both
+ * SG_PACK_ID_WILDCARD and SG_TAG_WILDCARD are -1 and that case is typically
  * the fast path. This function is only used in the non-blocking cases.
  * Returns pointer to (first) matching sg_request or NULL. If found,
  * sg_request state is moved from SG_RS_AWAIT_RCV to SG_RS_BUSY.
  */
 static struct sg_request *
-sg_find_srp_by_id(struct sg_fd *sfp, int pack_id)
+sg_find_srp_by_id(struct sg_fd *sfp, int id, bool is_tag)
 {
 	__maybe_unused bool is_bad_st = false;
 	__maybe_unused enum sg_rq_state bad_sr_st = SG_RS_INACTIVE;
-	bool search_for_1 = (pack_id != SG_PACK_ID_WILDCARD);
+	bool search_for_1 = (id != SG_TAG_WILDCARD);
 	bool second = false;
+	enum sg_rq_state sr_st;
 	int res;
 	int num_waiting = atomic_read(&sfp->waiting);
 	int l_await_idx = READ_ONCE(sfp->low_await_idx);
@@ -3764,15 +3825,33 @@ sg_find_srp_by_id(struct sg_fd *sfp, int pack_id)
 		     srp = xa_find_after(xafp, &idx, end_idx, SG_XA_RQ_AWAIT)) {
 			if (test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm))
 				continue;
-			if (srp->pack_id != pack_id)
-				continue;
-			res = sg_rq_chg_state(srp, SG_RS_AWAIT_RCV, SG_RS_BUSY);
-			if (likely(res == 0))
-				goto good;
-			/* else another caller got it, move on */
-			if (IS_ENABLED(CONFIG_SCSI_PROC_FS)) {
-				is_bad_st = true;
-				bad_sr_st = atomic_read(&srp->rq_st);
+			if (is_tag) {
+				if (srp->tag != id)
+					continue;
+			} else {
+				if (srp->pack_id != id)
+					continue;
+			}
+			sr_st = atomic_read(&srp->rq_st);
+			switch (sr_st) {
+			case SG_RS_AWAIT_RCV:
+				res = sg_rq_chg_state(srp, sr_st, SG_RS_BUSY);
+				if (likely(res == 0))
+					goto good;
+				/* else another caller got it, move on */
+				if (IS_ENABLED(CONFIG_SCSI_PROC_FS)) {
+					is_bad_st = true;
+					bad_sr_st = atomic_read(&srp->rq_st);
+				}
+				break;
+			case SG_RS_INFLIGHT:
+				break;
+			default:
+				if (IS_ENABLED(CONFIG_SCSI_PROC_FS)) {
+					is_bad_st = true;
+					bad_sr_st = sr_st;
+				}
+				break;
 			}
 			break;
 		}
@@ -3818,21 +3897,22 @@ sg_find_srp_by_id(struct sg_fd *sfp, int pack_id)
 	/* here if one of above loops does _not_ find a match */
 	if (IS_ENABLED(CONFIG_SCSI_PROC_FS)) {
 		if (search_for_1) {
-			__maybe_unused const char *cptp = "pack_id=";
+			__maybe_unused const char *cptp = is_tag ? "tag=" :
+								   "pack_id=";
 
 			if (is_bad_st)
 				SG_LOG(1, sfp, "%s: %s%d wrong state: %s\n",
-				       __func__, cptp, pack_id,
+				       __func__, cptp, id,
 				       sg_rq_st_str(bad_sr_st, true));
 			else
 				SG_LOG(6, sfp, "%s: %s%d not awaiting read\n",
-				       __func__, cptp, pack_id);
+				       __func__, cptp, id);
 		}
 	}
 	return NULL;
 good:
-	SG_LOG(5, sfp, "%s: %s%d found [srp=0x%pK]\n", __func__, "pack_id=",
-	       pack_id, srp);
+	SG_LOG(5, sfp, "%s: %s%d found [srp=0x%pK]\n", __func__,
+	       (is_tag ? "tag=" : "pack_id="), id, srp);
 	return srp;
 }
 
@@ -3855,6 +3935,7 @@ sg_mk_srp(struct sg_fd *sfp, bool first)
 	if (srp) {
 		atomic_set(&srp->rq_st, SG_RS_BUSY);
 		srp->parentfp = sfp;
+		srp->tag = SG_TAG_WILDCARD;
 		return srp;
 	} else {
 		return ERR_PTR(-ENOMEM);
@@ -4291,14 +4372,18 @@ sg_lookup_dev(int dev)
 	return idr_find(&sg_index_idr, dev);
 }
 
+/*
+ * Returns valid pointer to a sg_device object on success or a negated
+ * errno value on failure. Does not return NULL.
+ */
 static struct sg_device *
-sg_get_dev(int dev)
+sg_get_dev(int min_dev)
 {
 	struct sg_device *sdp;
-	unsigned long flags;
+	unsigned long iflags;
 
-	read_lock_irqsave(&sg_index_lock, flags);
-	sdp = sg_lookup_dev(dev);
+	read_lock_irqsave(&sg_index_lock, iflags);
+	sdp = sg_lookup_dev(min_dev);
 	if (!sdp)
 		sdp = ERR_PTR(-ENXIO);
 	else if (SG_IS_DETACHING(sdp)) {
@@ -4308,8 +4393,7 @@ sg_get_dev(int dev)
 		sdp = ERR_PTR(-ENODEV);
 	} else
 		kref_get(&sdp->d_ref);
-	read_unlock_irqrestore(&sg_index_lock, flags);
-
+	read_unlock_irqrestore(&sg_index_lock, iflags);
 	return sdp;
 }
 
@@ -4404,7 +4488,7 @@ dev_seq_next(struct seq_file *s, void *v, loff_t *pos)
 	struct sg_proc_deviter *it = s->private;
 
 	*pos = ++it->index;
-	return (it->index < it->max) ? it : NULL;
+	return (it->index < (int)it->max) ? it : NULL;
 }
 
 static void
@@ -4567,9 +4651,14 @@ sg_proc_debug_sreq(struct sg_request *srp, int to, bool t_in_ns, char *obp,
 		       srp->sgat_h.buflen, (int)srp->pack_id);
 	if (is_dur)	/* cmd/req has completed, waiting for ... */
 		n += scnprintf(obp + n, len - n, " dur=%u%s", dur, tp);
-	else if (dur < U32_MAX)	/* in-flight or busy (so ongoing) */
+	else if (dur < U32_MAX) { /* in-flight or busy (so ongoing) */
+		if ((srp->rq_flags & SGV4_FLAG_YIELD_TAG) &&
+		    srp->tag != SG_TAG_WILDCARD)
+			n += scnprintf(obp + n, len - n, " tag=0x%x",
+				       srp->tag);
 		n += scnprintf(obp + n, len - n, " t_o/elap=%us/%u%s",
 			       to / 1000, dur, tp);
+	}
 	cp = (srp->rq_flags & SGV4_FLAG_HIPRI) ? "hipri " : "";
 	n += scnprintf(obp + n, len - n, " sgat=%d %sop=0x%02x\n",
 		       srp->sgat_h.num_sgat, cp, srp->cmd_opcode);
@@ -4578,7 +4667,8 @@ sg_proc_debug_sreq(struct sg_request *srp, int to, bool t_in_ns, char *obp,
 
 /* Writes debug info for one sg fd (including its sg requests) in obp buffer */
 static int
-sg_proc_debug_fd(struct sg_fd *fp, char *obp, int len, unsigned long idx)
+sg_proc_debug_fd(struct sg_fd *fp, char *obp, int len, unsigned long idx,
+		 bool reduced)
 {
 	bool t_in_ns = test_bit(SG_FFD_TIME_IN_NS, fp->ffd_bm);
 	int n = 0;
@@ -4611,6 +4701,8 @@ sg_proc_debug_fd(struct sg_fd *fp, char *obp, int len, unsigned long idx)
 		       "   submitted=%d waiting=%d inactives=%d   open thr_id=%d\n",
 		       atomic_read(&fp->submitted),
 		       atomic_read(&fp->waiting), atomic_read(&fp->inactives), fp->tid);
+	if (reduced)
+		return n;
 	k = 0;
 	xa_lock_irqsave(&fp->srp_arr, iflags);
 	xa_for_each(&fp->srp_arr, idx, srp) {
@@ -4646,7 +4738,8 @@ sg_proc_debug_fd(struct sg_fd *fp, char *obp, int len, unsigned long idx)
 
 /* Writes debug info for one sg device (including its sg fds) in obp buffer */
 static int
-sg_proc_debug_sdev(struct sg_device *sdp, char *obp, int len, int *fd_counterp)
+sg_proc_debug_sdev(struct sg_device *sdp, char *obp, int len,
+		   int *fd_counterp, bool reduced)
 {
 	int n = 0;
 	int my_count = 0;
@@ -4668,14 +4761,13 @@ sg_proc_debug_sdev(struct sg_device *sdp, char *obp, int len, int *fd_counterp)
 	xa_for_each(&sdp->sfp_arr, idx, fp) {
 		++*countp;
 		n += scnprintf(obp + n, len - n, "  FD(%d): ", *countp);
-		n += sg_proc_debug_fd(fp, obp + n, len - n, idx);
+		n += sg_proc_debug_fd(fp, obp + n, len - n, idx, reduced);
 	}
 	return n;
 }
 
-/* Called via dbg_seq_ops once for each sg device */
 static int
-sg_proc_seq_show_debug(struct seq_file *s, void *v)
+sg_proc_seq_show_debug(struct seq_file *s, void *v, bool reduced)
 {
 	bool found = false;
 	bool trunc = false;
@@ -4737,7 +4829,8 @@ sg_proc_seq_show_debug(struct seq_file *s, void *v)
 			snprintf(b1, sizeof(b1), " >>> device=%s  %s\n",
 				 disk_name, "detaching pending close\n");
 		else if (sdp->device) {
-			n = sg_proc_debug_sdev(sdp, bp, bp_len, fdi_p);
+			n = sg_proc_debug_sdev(sdp, bp, bp_len, fdi_p,
+					       reduced);
 			if (n >= bp_len - 1) {
 				trunc = true;
 				if (bp[bp_len - 2] != '\n')
@@ -4769,6 +4862,18 @@ sg_proc_seq_show_debug(struct seq_file *s, void *v)
 	return 0;
 }
 
+static int
+sg_proc_seq_show_debug_full(struct seq_file *s, void *v)
+{
+	return sg_proc_seq_show_debug(s, v, false);
+}
+
+static int
+sg_proc_seq_show_debug_summ(struct seq_file *s, void *v)
+{
+	return sg_proc_seq_show_debug(s, v, true);
+}
+
 #endif         /* SG_PROC_OR_DEBUG_FS */
 
 #if IS_ENABLED(CONFIG_SCSI_PROC_FS)
@@ -4807,7 +4912,14 @@ static const struct seq_operations debug_seq_ops = {
 	.start = dev_seq_start,
 	.next  = dev_seq_next,
 	.stop  = dev_seq_stop,
-	.show  = sg_proc_seq_show_debug,
+	.show  = sg_proc_seq_show_debug_full,
+};
+
+static const struct seq_operations debug_summ_seq_ops = {
+	.start = dev_seq_start,
+	.next  = dev_seq_next,
+	.stop  = dev_seq_stop,
+	.show  = sg_proc_seq_show_debug_summ,
 };
 
 static int
@@ -4821,6 +4933,7 @@ sg_proc_init(void)
 
 	proc_create("allow_dio", 0644, p, &adio_proc_ops);
 	proc_create_seq("debug", 0444, p, &debug_seq_ops);
+	proc_create_seq("debug_summary", 0444, p, &debug_summ_seq_ops);
 	proc_create("def_reserved_size", 0644, p, &dressz_proc_ops);
 	proc_create_single("device_hdr", 0444, p, sg_proc_seq_show_devhdr);
 	proc_create_seq("devices", 0444, p, &dev_seq_ops);
@@ -5002,13 +5115,21 @@ static const struct seq_operations sg_snapshot_seq_ops = {
 	.start = dev_seq_start,
 	.next  = dev_seq_next,
 	.stop  = dev_seq_stop,
-	.show  = sg_proc_seq_show_debug,
+	.show  = sg_proc_seq_show_debug_full,
+};
+
+static const struct seq_operations sg_snapshot_summ_seq_ops = {
+	.start = dev_seq_start,
+	.next  = dev_seq_next,
+	.stop  = dev_seq_stop,
+	.show  = sg_proc_seq_show_debug_summ,
 };
 
 static const struct sg_dfs_attr sg_dfs_attrs[] = {
 	{"snapshot", 0400, .seq_ops = &sg_snapshot_seq_ops},
 	{"snapshot_devs", 0600, sg_dfs_snapshot_devs_show,
 	 sg_dfs_snapshot_devs_write},
+	{"snapshot_summary", 0400, .seq_ops = &sg_snapshot_summ_seq_ops},
 	{ },
 };
 
diff --git a/include/uapi/scsi/sg.h b/include/uapi/scsi/sg.h
index 532f0f0a56be..7d11905dd787 100644
--- a/include/uapi/scsi/sg.h
+++ b/include/uapi/scsi/sg.h
@@ -107,6 +107,7 @@ typedef struct sg_io_hdr {
  */
 #define SGV4_FLAG_DIRECT_IO SG_FLAG_DIRECT_IO
 #define SGV4_FLAG_MMAP_IO SG_FLAG_MMAP_IO
+#define SGV4_FLAG_YIELD_TAG 0x8  /* sg_io_v4::generated_tag set after SG_IOS */
 #define SGV4_FLAG_Q_AT_TAIL SG_FLAG_Q_AT_TAIL
 #define SGV4_FLAG_Q_AT_HEAD SG_FLAG_Q_AT_HEAD
 #define SGV4_FLAG_IMMED 0x400 /* for polling with SG_IOR, ignored in SG_IOS */
@@ -177,10 +178,12 @@ typedef struct sg_req_info {	/* used by SG_GET_REQUEST_TABLE ioctl() */
 
 /* flag and mask values for boolean fields follow */
 #define SG_CTL_FLAGM_TIME_IN_NS	0x1	/* time: nanosecs (def: millisecs) */
+#define SG_CTL_FLAGM_TAG_FOR_PACK_ID 0x2 /* prefer tag over pack_id (def) */
 #define SG_CTL_FLAGM_OTHER_OPENS 0x4	/* rd: other sg fd_s on this dev */
 #define SG_CTL_FLAGM_ORPHANS	0x8	/* rd: orphaned requests on this fd */
 #define SG_CTL_FLAGM_Q_TAIL	0x10	/* used for future cmds on this fd */
 #define SG_CTL_FLAGM_NO_DURATION 0x400	/* don't calc command duration */
+#define SG_CTL_FLAGM_MORE_ASYNC	0x800	/* yield EAGAIN in more cases */
 #define SG_CTL_FLAGM_ALL_BITS	0xfff	/* should be OR of previous items */
 
 /* Write one of the following values to sg_extended_info::read_value, get... */
-- 
2.25.1



* [PATCH v18 50/83] sg: add fd sharing , change, unshare
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
                   ` (49 preceding siblings ...)
  2021-04-27 21:56 ` [PATCH v18 49/83] sg: tag and more_async Douglas Gilbert
@ 2021-04-27 21:57 ` Douglas Gilbert
  2021-04-27 21:57 ` [PATCH v18 51/83] sg: add shared requests Douglas Gilbert
                   ` (32 subsequent siblings)
  83 siblings, 0 replies; 86+ messages in thread
From: Douglas Gilbert @ 2021-04-27 21:57 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

Add the ability to establish a share between any two open file
descriptors in the sg driver. Neither file descriptor can
already be part of a share. This fd share is used by two
features added and described in later patches: request sharing
and the "do_on_other" flag used when multiple requests are
issued (with a single invocation from user space). See the
webpage at:
https://sg.danny.cz/sg/sg_v40.html
in the section titled: "6 Sharing file descriptors".

Usually two file descriptors are enough. To allow a single READ
to be followed by WRITEs to two or more file descriptors (hence
potentially writing the same data to different disks), the share
partner file descriptor can also be dropped and replaced with a
new fd.

Finally a share can explicitly be undone, or unshared, by either
side. In practice, close()-ing either side of a fd share has the
same effect (i.e. to unshare) so this route is the more common.

File shares may be set up within a single-threaded process, between
threads in the same process, or even between processes (on the
same machine) by passing an open file descriptor via Unix
sockets to the other process.

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c      | 694 +++++++++++++++++++++++++++++++++++------
 include/uapi/scsi/sg.h |   3 +
 2 files changed, 593 insertions(+), 104 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index c0a4fbcc4aa2..fb3782b1f9c7 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -33,6 +33,7 @@ static char *sg_version_date = "20210421";
 #include <linux/moduleparam.h>
 #include <linux/cdev.h>
 #include <linux/idr.h>
+#include <linux/file.h>		/* for fget() and fput() */
 #include <linux/seq_file.h>
 #include <linux/blkdev.h>
 #include <linux/delay.h>
@@ -42,6 +43,7 @@ static char *sg_version_date = "20210421";
 #include <linux/ratelimit.h>
 #include <linux/uio.h>
 #include <linux/cred.h>			/* for sg_check_file_access() */
+#include <linux/timekeeping.h>
 #include <linux/proc_fs.h>		/* used if CONFIG_SCSI_PROC_FS */
 #include <linux/xarray.h>
 #include <linux/debugfs.h>
@@ -133,8 +135,9 @@ enum sg_rq_state {	/* N.B. sg_rq_state_arr assumes SG_RS_AWAIT_RCV==2 */
 #define SG_FFD_TIME_IN_NS	4	/* set: time in nanoseconds, else ms */
 #define SG_FFD_Q_AT_TAIL	5	/* set: queue reqs at tail of blk q */
 #define SG_FFD_PREFER_TAG	6	/* prefer tag over pack_id (def) */
-#define SG_FFD_NO_DURATION	7	/* don't do command duration calc */
-#define SG_FFD_MORE_ASYNC	8	/* yield EBUSY more often */
+#define SG_FFD_RELEASE		7	/* release (close) underway */
+#define SG_FFD_NO_DURATION	8	/* don't do command duration calc */
+#define SG_FFD_MORE_ASYNC	9	/* yield EAGAIN more often */
 
 /* Bit positions (flags) for sg_device::fdev_bm bitmask follow */
 #define SG_FDEV_EXCLUDE		0	/* have fd open with O_EXCL */
@@ -142,7 +145,11 @@ enum sg_rq_state {	/* N.B. sg_rq_state_arr assumes SG_RS_AWAIT_RCV==2 */
 #define SG_FDEV_LOG_SENSE	2	/* set by ioctl(SG_SET_DEBUG) */
 
 /* xarray 'mark's allow sub-lists within main array/list. */
-#define SG_XA_RQ_FREE XA_MARK_0	/* xarray sets+clears */
+#define SG_XA_FD_FREE XA_MARK_0		/* xarray sets+clears */
+#define SG_XA_FD_UNSHARED XA_MARK_1
+#define SG_XA_FD_RS_SHARE XA_MARK_2
+
+#define SG_XA_RQ_FREE XA_MARK_0		/* xarray sets+clears */
 #define SG_XA_RQ_INACTIVE XA_MARK_1
 #define SG_XA_RQ_AWAIT XA_MARK_2
 
@@ -250,10 +257,12 @@ struct sg_fd {		/* holds the state of a file descriptor */
 	int tot_fd_thresh;	/* E2BIG if sum_of(dlen) > this, 0: ignore */
 	int sgat_elem_sz;	/* initialized to scatter_elem_sz */
 	int mmap_sz;		/* byte size of previous mmap() call */
-	unsigned long ffd_bm[1];	/* see SG_FFD_* defines above */
 	pid_t tid;		/* thread id when opened */
 	u8 next_cmd_len;	/* 0: automatic, >0: use on next write() */
+	unsigned long ffd_bm[1];	/* see SG_FFD_* defines above */
+	struct file *filp;	/* my identity when sharing */
 	struct sg_request *rsv_srp;/* one reserve request per fd */
+	struct sg_fd __rcu *share_sfp;/* fd share cross-references, else NULL */
 	struct fasync_struct *async_qp; /* used by asynchronous notification */
 	struct xarray srp_arr;	/* xarray of sg_request object pointers */
 	struct kref f_ref;
@@ -285,7 +294,6 @@ struct sg_comm_wr_t {  /* arguments to sg_common_write() */
 		struct sg_io_v4 *h4p;
 	};
 	struct sg_fd *sfp;
-	struct file *filp;
 	const u8 __user *u_cmdp;
 };
 
@@ -299,14 +307,15 @@ static int sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp,
 			int dxfer_dir);
 static void sg_finish_scsi_blk_rq(struct sg_request *srp);
 static int sg_mk_sgat(struct sg_request *srp, struct sg_fd *sfp, int minlen);
-static int sg_v3_submit(struct file *filp, struct sg_fd *sfp,
-			struct sg_io_hdr *hp, bool sync,
+static int sg_receive_v3(struct sg_fd *sfp, struct sg_request *srp,
+			 void __user *p);
+static int sg_submit_v3(struct sg_fd *sfp, struct sg_io_hdr *hp, bool sync,
 			struct sg_request **o_srp);
 static struct sg_request *sg_common_write(struct sg_comm_wr_t *cwrp);
 static int sg_read_append(struct sg_request *srp, void __user *outp,
 			  int num_xfer);
 static void sg_remove_sgat(struct sg_request *srp);
-static struct sg_fd *sg_add_sfp(struct sg_device *sdp);
+static struct sg_fd *sg_add_sfp(struct sg_device *sdp, struct file *filp);
 static void sg_remove_sfp(struct kref *);
 static struct sg_request *sg_find_srp_by_id(struct sg_fd *sfp, int id,
 					    bool is_tag);
@@ -529,7 +538,7 @@ sg_open(struct inode *inode, struct file *filp)
 		set_bit(SG_FDEV_EXCLUDE, sdp->fdev_bm);
 
 	o_count = atomic_inc_return(&sdp->open_cnt);
-	sfp = sg_add_sfp(sdp);		/* increments sdp->d_ref */
+	sfp = sg_add_sfp(sdp, filp);	/* increments sdp->d_ref */
 	if (IS_ERR(sfp)) {
 		atomic_dec(&sdp->open_cnt);
 		res = PTR_ERR(sfp);
@@ -563,6 +572,21 @@ sg_open(struct inode *inode, struct file *filp)
 	goto sg_put;
 }
 
+static inline struct sg_fd *
+sg_fd_shared_ptr(struct sg_fd *sfp)
+{
+	struct sg_fd *res_sfp;
+	struct sg_device *sdp = sfp->parentdp;
+
+	rcu_read_lock();
+	if (xa_get_mark(&sdp->sfp_arr, sfp->idx, SG_XA_FD_UNSHARED))
+		res_sfp = NULL;
+	else
+		res_sfp = sfp->share_sfp;
+	rcu_read_unlock();
+	return res_sfp;
+}
+
 /*
  * Release resources associated with a prior, successful sg_open(). It can be
  * seen as the (final) close() call on a sg device file descriptor in the user
@@ -582,9 +606,17 @@ sg_release(struct inode *inode, struct file *filp)
 	if (unlikely(!sdp))
 		return -ENXIO;
 
+	if (xa_get_mark(&sdp->sfp_arr, sfp->idx, SG_XA_FD_FREE)) {
+		SG_LOG(1, sfp, "%s: sfp erased!!!\n", __func__);
+		return 0;	/* get out but can't fail */
+	}
+
 	mutex_lock(&sdp->open_rel_lock);
 	o_count = atomic_read(&sdp->open_cnt);
 	SG_LOG(3, sfp, "%s: open count before=%d\n", __func__, o_count);
+	if (test_and_set_bit(SG_FFD_RELEASE, sfp->ffd_bm))
+		SG_LOG(1, sfp, "%s: second release on this fd ? ?\n",
+		       __func__);
 	scsi_autopm_put_device(sdp->device);
 	kref_put(&sfp->f_ref, sg_remove_sfp);
 
@@ -673,7 +705,7 @@ sg_write(struct file *filp, const char __user *p, size_t count, loff_t *ppos)
 		pr_warn_once("Please use %s instead of write(),\n%s\n",
 			     "ioctl(SG_SUBMIT_V3)",
 			     "  See: https://sg.danny.cz/sg/sg_v40.html");
-		res = sg_v3_submit(filp, sfp, h3p, false, NULL);
+		res = sg_submit_v3(sfp, h3p, false, NULL);
 		return res < 0 ? res : (int)count;
 	}
 to_v2:
@@ -740,7 +772,6 @@ sg_write(struct file *filp, const char __user *p, size_t count, loff_t *ppos)
 	WRITE_ONCE(cwr.frq_bm[0], 0);
 	cwr.timeout = sfp->timeout;
 	cwr.cmd_len = cmd_size;
-	cwr.filp = filp;
 	cwr.sfp = sfp;
 	cwr.u_cmdp = p;
 	srp = sg_common_write(&cwr);
@@ -764,19 +795,18 @@ sg_chk_mmap(struct sg_fd *sfp, int rq_flags, int len)
 }
 
 static int
-sg_fetch_cmnd(struct file *filp, struct sg_fd *sfp, const u8 __user *u_cdbp,
-	      int len, u8 *cdbp)
+sg_fetch_cmnd(struct sg_fd *sfp, const u8 __user *u_cdbp, int len, u8 *cdbp)
 {
 	if (!u_cdbp || len < 6 || len > SG_MAX_CDB_SIZE)
 		return -EMSGSIZE;
 	if (copy_from_user(cdbp, u_cdbp, len))
 		return -EFAULT;
-	if (O_RDWR != (filp->f_flags & O_ACCMODE)) {	/* read-only */
+	if (O_RDWR != (sfp->filp->f_flags & O_ACCMODE)) { /* read-only */
 		switch (sfp->parentdp->device->type) {
 		case TYPE_DISK:
 		case TYPE_RBC:
 		case TYPE_ZBC:
-			return blk_verify_command(cdbp, filp->f_mode);
+			return blk_verify_command(cdbp, sfp->filp->f_mode);
 		default:	/* SSC, SES, etc cbd_s may differ from SBC */
 			break;
 		}
@@ -785,8 +815,8 @@ sg_fetch_cmnd(struct file *filp, struct sg_fd *sfp, const u8 __user *u_cdbp,
 }
 
 static int
-sg_v3_submit(struct file *filp, struct sg_fd *sfp, struct sg_io_hdr *hp,
-	     bool sync, struct sg_request **o_srp)
+sg_submit_v3(struct sg_fd *sfp, struct sg_io_hdr *hp, bool sync,
+	     struct sg_request **o_srp)
 {
 	unsigned long ul_timeout;
 	struct sg_request *srp;
@@ -807,7 +837,6 @@ sg_v3_submit(struct file *filp, struct sg_fd *sfp, struct sg_io_hdr *hp,
 	cwr.h3p = hp;
 	cwr.timeout = min_t(unsigned long, ul_timeout, INT_MAX);
 	cwr.cmd_len = hp->cmd_len;
-	cwr.filp = filp;
 	cwr.sfp = sfp;
 	cwr.u_cmdp = hp->cmdp;
 	srp = sg_common_write(&cwr);
@@ -819,8 +848,8 @@ sg_v3_submit(struct file *filp, struct sg_fd *sfp, struct sg_io_hdr *hp,
 }
 
 static int
-sg_submit_v4(struct file *filp, struct sg_fd *sfp, void __user *p,
-	     struct sg_io_v4 *h4p, bool sync, struct sg_request **o_srp)
+sg_submit_v4(struct sg_fd *sfp, void __user *p, struct sg_io_v4 *h4p,
+	     bool sync, struct sg_request **o_srp)
 {
 	int res = 0;
 	unsigned long ul_timeout;
@@ -841,7 +870,6 @@ sg_submit_v4(struct file *filp, struct sg_fd *sfp, void __user *p,
 	/* once v4 (or v3) seen, allow cmd_q on this fd (def: no cmd_q) */
 	set_bit(SG_FFD_CMD_Q, sfp->ffd_bm);
 	ul_timeout = msecs_to_jiffies(h4p->timeout);
-	cwr.filp = filp;
 	cwr.sfp = sfp;
 	WRITE_ONCE(cwr.frq_bm[0], 0);
 	__assign_bit(SG_FRQ_SYNC_INVOC, cwr.frq_bm, (int)sync);
@@ -867,38 +895,38 @@ sg_submit_v4(struct file *filp, struct sg_fd *sfp, void __user *p,
 }
 
 static int
-sg_ctl_iosubmit(struct file *filp, struct sg_fd *sfp, void __user *p)
+sg_ctl_iosubmit(struct sg_fd *sfp, void __user *p)
 {
 	int res;
 	u8 hdr_store[SZ_SG_IO_V4];
 	struct sg_io_v4 *h4p = (struct sg_io_v4 *)hdr_store;
 	struct sg_device *sdp = sfp->parentdp;
 
-	res = sg_allow_if_err_recovery(sdp, (filp->f_flags & O_NONBLOCK));
+	res = sg_allow_if_err_recovery(sdp, (sfp->filp->f_flags & O_NONBLOCK));
 	if (res)
 		return res;
 	if (copy_from_user(hdr_store, p, SZ_SG_IO_V4))
 		return -EFAULT;
 	if (h4p->guard == 'Q')
-		return sg_submit_v4(filp, sfp, p, h4p, false, NULL);
+		return sg_submit_v4(sfp, p, h4p, false, NULL);
 	return -EPERM;
 }
 
 static int
-sg_ctl_iosubmit_v3(struct file *filp, struct sg_fd *sfp, void __user *p)
+sg_ctl_iosubmit_v3(struct sg_fd *sfp, void __user *p)
 {
 	int res;
 	u8 hdr_store[SZ_SG_IO_V4];	/* max(v3interface, v4interface) */
 	struct sg_io_hdr *h3p = (struct sg_io_hdr *)hdr_store;
 	struct sg_device *sdp = sfp->parentdp;
 
-	res = sg_allow_if_err_recovery(sdp, (filp->f_flags & O_NONBLOCK));
+	res = sg_allow_if_err_recovery(sdp, (sfp->filp->f_flags & O_NONBLOCK));
 	if (unlikely(res))
 		return res;
 	if (copy_from_user(h3p, p, SZ_SG_IO_HDR))
 		return -EFAULT;
 	if (h3p->interface_id == 'S')
-		return sg_v3_submit(filp, sfp, h3p, false, NULL);
+		return sg_submit_v3(sfp, h3p, false, NULL);
 	return -EPERM;
 }
 
@@ -1233,46 +1261,6 @@ sg_rec_state_v3v4(struct sg_fd *sfp, struct sg_request *srp, bool v4_active)
 	return 0;
 }
 
-static ssize_t
-sg_receive_v3(struct sg_fd *sfp, struct sg_request *srp, size_t count,
-	      void __user *p)
-{
-	int err, err2;
-	int rq_result = srp->rq_result;
-	struct sg_io_hdr hdr3;
-	struct sg_io_hdr *hp = &hdr3;
-
-	if (in_compat_syscall()) {
-		if (count < sizeof(struct compat_sg_io_hdr)) {
-			err = -EINVAL;
-			goto err_out;
-		}
-	} else if (count < SZ_SG_IO_HDR) {
-		err = -EINVAL;
-		goto err_out;
-	}
-	SG_LOG(3, sfp, "%s: srp=0x%pK\n", __func__, srp);
-	err = sg_rec_state_v3v4(sfp, srp, false);
-	memset(hp, 0, sizeof(*hp));
-	memcpy(hp, &srp->s_hdr3, sizeof(srp->s_hdr3));
-	hp->sb_len_wr = srp->sense_len;
-	hp->info = srp->rq_info;
-	hp->resid = srp->in_resid;
-	hp->pack_id = srp->pack_id;
-	hp->duration = srp->duration;
-	hp->status = rq_result & 0xff;
-	hp->masked_status = status_byte(rq_result);
-	hp->msg_status = msg_byte(rq_result);
-	hp->host_status = host_byte(rq_result);
-	hp->driver_status = driver_byte(rq_result);
-	err2 = put_sg_io_hdr(hp, p);
-	err = err ? err : err2;
-err_out:
-	sg_finish_scsi_blk_rq(srp);
-	sg_deact_request(sfp, srp);
-	return err;
-}
-
 static int
 sg_receive_v4(struct sg_fd *sfp, struct sg_request *srp, void __user *p,
 	      struct sg_io_v4 *h4p)
@@ -1327,9 +1315,9 @@ sg_receive_v4(struct sg_fd *sfp, struct sg_request *srp, void __user *p,
  * otherwise it waits (i.e. it "blocks").
  */
 static int
-sg_ctl_ioreceive(struct file *filp, struct sg_fd *sfp, void __user *p)
+sg_ctl_ioreceive(struct sg_fd *sfp, void __user *p)
 {
-	bool non_block = !!(filp->f_flags & O_NONBLOCK);
+	bool non_block = !!(sfp->filp->f_flags & O_NONBLOCK);
 	bool use_tag = false;
 	int res, id;
 	int pack_id = SG_PACK_ID_WILDCARD;
@@ -1390,9 +1378,9 @@ sg_ctl_ioreceive(struct file *filp, struct sg_fd *sfp, void __user *p)
  * otherwise it waits.
  */
 static int
-sg_ctl_ioreceive_v3(struct file *filp, struct sg_fd *sfp, void __user *p)
+sg_ctl_ioreceive_v3(struct sg_fd *sfp, void __user *p)
 {
-	bool non_block = !!(filp->f_flags & O_NONBLOCK);
+	bool non_block = !!(sfp->filp->f_flags & O_NONBLOCK);
 	int res;
 	int pack_id = SG_PACK_ID_WILDCARD;
 	u8 v3_holder[SZ_SG_IO_HDR];
@@ -1434,7 +1422,7 @@ sg_ctl_ioreceive_v3(struct file *filp, struct sg_fd *sfp, void __user *p)
 		cpu_relax();
 		goto try_again;
 	}
-	return sg_receive_v3(sfp, srp, SZ_SG_IO_HDR, p);
+	return sg_receive_v3(sfp, srp, p);
 }
 
 static int
@@ -1605,15 +1593,56 @@ sg_read(struct file *filp, char __user *p, size_t count, loff_t *ppos)
 		cpu_relax();
 		goto try_again;
 	}
-	if (srp->s_hdr3.interface_id == '\0')
+	if (srp->s_hdr3.interface_id == '\0') {
 		ret = sg_read_v1v2(p, (int)count, sfp, srp);
-	else
-		ret = sg_receive_v3(sfp, srp, count, p);
+	} else {
+		if (in_compat_syscall()) {
+			if (count < sizeof(struct compat_sg_io_hdr))
+				return -EINVAL;
+		} else if (count < SZ_SG_IO_HDR) {
+			return -EINVAL;
+		}
+		ret = sg_receive_v3(sfp, srp, p);
+	}
 	if (ret < 0)
 		SG_LOG(1, sfp, "%s: negated errno: %d\n", __func__, ret);
 	return ret < 0 ? ret : (int)count;
 }
 
+/*
+ * Completes a v3 request/command. Called from sg_read {v2 or v3},
+ * ioctl(SG_IO) {for v3}, or from ioctl(SG_IORECEIVE) when its
+ * completing a v3 request/command.
+ */
+static int
+sg_receive_v3(struct sg_fd *sfp, struct sg_request *srp, void __user *p)
+{
+	int err, err2;
+	int rq_result = srp->rq_result;
+	struct sg_io_hdr hdr3;
+	struct sg_io_hdr *hp = &hdr3;
+
+	SG_LOG(3, sfp, "%s: srp=0x%pK\n", __func__, srp);
+	err = sg_rec_state_v3v4(sfp, srp, false);
+	memset(hp, 0, sizeof(*hp));
+	memcpy(hp, &srp->s_hdr3, sizeof(srp->s_hdr3));
+	hp->sb_len_wr = srp->sense_len;
+	hp->info = srp->rq_info;
+	hp->resid = srp->in_resid;
+	hp->pack_id = srp->pack_id;
+	hp->duration = srp->duration;
+	hp->status = rq_result & 0xff;
+	hp->masked_status = status_byte(rq_result);
+	hp->msg_status = msg_byte(rq_result);
+	hp->host_status = host_byte(rq_result);
+	hp->driver_status = driver_byte(rq_result);
+	err2 = put_sg_io_hdr(hp, p);
+	err = err ? err : err2;
+	sg_finish_scsi_blk_rq(srp);
+	sg_deact_request(sfp, srp);
+	return err;
+}
+
 static int
 max_sectors_bytes(struct request_queue *q)
 {
@@ -1657,6 +1686,155 @@ sg_calc_sgat_param(struct sg_device *sdp)
 	sdp->max_sgat_sz = sz;
 }
 
+/*
+ * Depending on which side is calling for the unshare, it is best to unshare
+ * the other side first. For example: if the invocation is from the read-side
+ * fd then rs_first should be false so the write-side is unshared first.
+ */
+static void
+sg_unshare_fds(struct sg_fd *rs_sfp, bool rs_lck, struct sg_fd *ws_sfp,
+	       bool ws_lck, bool rs_first)
+{
+	bool diff_sdps = true;
+	unsigned long iflags = 0;
+	struct sg_device *sdp;
+	struct xarray *xap;
+
+	if (rs_lck && ws_lck &&  rs_sfp && ws_sfp &&
+	    rs_sfp->parentdp == ws_sfp->parentdp)
+		diff_sdps = false;
+	if (!rs_first && ws_sfp)
+		goto wr_first;
+rd_first:
+	if (rs_sfp) {
+		sdp = rs_sfp->parentdp;
+		xap = &sdp->sfp_arr;
+		rcu_assign_pointer(rs_sfp->share_sfp, NULL);
+		if (rs_lck && (rs_first || diff_sdps))
+			xa_lock_irqsave(xap, iflags);
+		__xa_set_mark(xap, rs_sfp->idx, SG_XA_FD_UNSHARED);
+		__xa_clear_mark(xap, rs_sfp->idx, SG_XA_FD_RS_SHARE);
+		if (rs_lck && (!rs_first || diff_sdps))
+			xa_unlock_irqrestore(xap, iflags);
+		kref_put(&sdp->d_ref, sg_device_destroy);
+	}
+	if (!rs_first || !ws_sfp)
+		return;
+wr_first:
+	if (ws_sfp) {
+		sdp = ws_sfp->parentdp;
+		xap = &sdp->sfp_arr;
+		rcu_assign_pointer(ws_sfp->share_sfp, NULL);
+		if (ws_lck && (!rs_first || diff_sdps))
+			xa_lock_irqsave(xap, iflags);
+		__xa_set_mark(xap, ws_sfp->idx, SG_XA_FD_UNSHARED);
+		/* SG_XA_FD_RS_SHARE mark should be already clear */
+		if (ws_lck && (rs_first || diff_sdps))
+			xa_unlock_irqrestore(xap, iflags);
+		kref_put(&sdp->d_ref, sg_device_destroy);
+	}
+	if (!rs_first && rs_sfp)
+		goto rd_first;
+}
+
+/*
+ * Clean up loose ends that occur when closing a file descriptor which is
+ * part of a file share. There may be request shares in various states using
+ * this file share so care is needed.
+ */
+static void
+sg_remove_sfp_share(struct sg_fd *sfp, bool is_rd_side)
+{
+	unsigned long iflags;
+	struct sg_fd *o_sfp = sg_fd_shared_ptr(sfp);
+	struct sg_device *sdp;
+	struct xarray *xap;
+
+	SG_LOG(3, sfp, "%s: sfp=0x%pK, o_sfp=0x%pK%s\n", __func__, sfp, o_sfp,
+	       (is_rd_side ? " read-side" : ""));
+	if (is_rd_side) {
+		sdp = sfp->parentdp;
+		xap = &sdp->sfp_arr;
+		xa_lock_irqsave(xap, iflags);
+		if (!xa_get_mark(xap, sfp->idx, SG_XA_FD_RS_SHARE)) {
+			xa_unlock_irqrestore(xap, iflags);
+			return;
+		}
+		sg_unshare_fds(sfp, false, NULL, false, true);
+		xa_unlock_irqrestore(&sdp->sfp_arr, iflags);
+	} else {
+		sdp = sfp->parentdp;
+		xap = &sdp->sfp_arr;
+		xa_lock_irqsave(xap, iflags);
+		if (xa_get_mark(xap, sfp->idx, SG_XA_FD_UNSHARED)) {
+			xa_unlock_irqrestore(xap, iflags);
+			return;
+		}
+		sg_unshare_fds(NULL, false, sfp, false, false);
+		xa_unlock_irqrestore(xap, iflags);
+	}
+}
+
+/*
+ * Acts when 1 is written to ioctl(SG_SET_GET_EXTENDED(CTL_FLAGS(UNSHARE)));
+ * writing 0 has no effect. Undoes the configuration that was done by
+ * ioctl(SG_SET_GET_EXTENDED(SHARE_FD)).
+ */
+static void
+sg_do_unshare(struct sg_fd *sfp, bool unshare_val)
+{
+	bool retry;
+	int retry_count = 0;
+	unsigned long iflags;
+	struct sg_fd *rs_sfp;
+	struct sg_fd *ws_sfp;
+	struct sg_fd *o_sfp = sg_fd_shared_ptr(sfp);
+	struct sg_device *sdp = sfp->parentdp;
+
+	if (xa_get_mark(&sdp->sfp_arr, sfp->idx, SG_XA_FD_UNSHARED)) {
+		SG_LOG(1, sfp, "%s: not shared ? ?\n", __func__);
+		return; /* no share to undo */
+	}
+	if (!unshare_val)
+		return;
+again:
+	retry = false;
+	xa_lock_irqsave(&sfp->srp_arr, iflags);
+	if (xa_get_mark(&sdp->sfp_arr, sfp->idx, SG_XA_FD_RS_SHARE)) {
+		rs_sfp = sfp;
+		ws_sfp = o_sfp;
+		if (!xa_trylock(&ws_sfp->srp_arr)) {
+			if (++retry_count > SG_ADD_RQ_MAX_RETRIES)
+				SG_LOG(1, sfp, "%s: cannot get write-side lock\n",
+				       __func__);
+			else
+				retry = true;
+			goto fini;
+		}
+		sg_unshare_fds(rs_sfp, false, ws_sfp, false, false);
+		xa_unlock(&ws_sfp->srp_arr);
+	} else {			/* called on write-side fd */
+		rs_sfp = o_sfp;
+		ws_sfp = sfp;
+		if (!xa_trylock(&rs_sfp->srp_arr)) {
+			if (++retry_count > SG_ADD_RQ_MAX_RETRIES)
+				SG_LOG(1, sfp, "%s: cannot get read side lock\n",
+				       __func__);
+			else
+				retry = true;
+			goto fini;
+		}
+		sg_unshare_fds(rs_sfp, false, ws_sfp, false, true);
+		xa_unlock(&rs_sfp->srp_arr);
+	}
+fini:
+	xa_unlock_irqrestore(&sfp->srp_arr, iflags);
+	if (retry) {
+		cpu_relax();
+		goto again;
+	}
+}
+
 /*
  * Returns duration since srp->start_ns (using boot time as an epoch). Unit
  * is nanoseconds when time_in_ns==true; else it is in milliseconds.
@@ -1748,8 +1926,8 @@ sg_rq_landed(struct sg_device *sdp, struct sg_request *srp)
  * the blocking multiple request case
  */
 static int
-sg_wait_event_srp(struct file *filp, struct sg_fd *sfp, void __user *p,
-		  struct sg_io_v4 *h4p, struct sg_request *srp)
+sg_wait_event_srp(struct sg_fd *sfp, void __user *p, struct sg_io_v4 *h4p,
+		  struct sg_request *srp)
 {
 	int res;
 	enum sg_rq_state sr_st;
@@ -1795,7 +1973,7 @@ sg_wait_event_srp(struct file *filp, struct sg_fd *sfp, void __user *p,
 	if (test_bit(SG_FRQ_IS_V4I, srp->frq_bm))
 		res = sg_receive_v4(sfp, srp, p, h4p);
 	else
-		res = sg_receive_v3(sfp, srp, SZ_SG_IO_HDR, p);
+		res = sg_receive_v3(sfp, srp, p);
 	return (res < 0) ? res : 0;
 }
 
@@ -1804,8 +1982,7 @@ sg_wait_event_srp(struct file *filp, struct sg_fd *sfp, void __user *p,
  * Returns 0 on success else a negated errno.
  */
 static int
-sg_ctl_sg_io(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
-	     void __user *p)
+sg_ctl_sg_io(struct sg_device *sdp, struct sg_fd *sfp, void __user *p)
 {
 	int res;
 	struct sg_request *srp = NULL;
@@ -1814,7 +1991,8 @@ sg_ctl_sg_io(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 	struct sg_io_v4 *h4p = (struct sg_io_v4 *)hu8arr;
 
 	SG_LOG(3, sfp, "%s:  SG_IO%s\n", __func__,
-	       ((filp->f_flags & O_NONBLOCK) ? " O_NONBLOCK ignored" : ""));
+	       ((sfp->filp->f_flags & O_NONBLOCK) ? " O_NONBLOCK ignored" :
+						    ""));
 	res = sg_allow_if_err_recovery(sdp, false);
 	if (res)
 		return res;
@@ -1826,9 +2004,9 @@ sg_ctl_sg_io(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 				   ((u8 __user *)p) + SZ_SG_IO_HDR,
 				   SZ_SG_IO_V4 - SZ_SG_IO_HDR))
 			return -EFAULT;
-		res = sg_submit_v4(filp, sfp, p, h4p, true, &srp);
+		res = sg_submit_v4(sfp, p, h4p, true, &srp);
 	} else if (h3p->interface_id == 'S') {
-		res = sg_v3_submit(filp, sfp, h3p, true, &srp);
+		res = sg_submit_v3(sfp, h3p, true, &srp);
 	} else {
 		pr_info_once("sg: %s: v3 or v4 interface only here\n",
 			     __func__);
@@ -1838,7 +2016,7 @@ sg_ctl_sg_io(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 		return res;
 	if (!srp)	/* mrq case: already processed all responses */
 		return res;
-	res = sg_wait_event_srp(filp, sfp, p, h4p, srp);
+	res = sg_wait_event_srp(sfp, p, h4p, srp);
 	if (res)
 		SG_LOG(1, sfp, "%s: %s=0x%pK  state: %s\n", __func__,
 		       "unexpected srp", srp,
@@ -1945,6 +2123,265 @@ sg_ctl_abort(struct sg_device *sdp, struct sg_fd *sfp, void __user *p)
 	return res;
 }
 
+static int
+sg_idr_max_id(int id, void *p, void *data)
+		__must_hold(&sg_index_lock)
+{
+	int *k = data;
+
+	if (*k < id)
+		*k = id;
+	return 0;
+}
+
+static int
+sg_find_sfp_helper(struct sg_fd *from_sfp, struct sg_fd *pair_sfp,
+		   bool from_rd_side, int search_fd)
+		__must_hold(&from_sfp->f_mutex)
+{
+	bool same_sdp;
+	int res = 0;
+	unsigned long iflags;
+	struct sg_device *from_sdp = from_sfp->parentdp;
+	struct sg_device *pair_sdp = pair_sfp->parentdp;
+
+	if (unlikely(!mutex_trylock(&pair_sfp->f_mutex)))
+		return -EPROBE_DEFER;	/* use to suggest re-invocation */
+	if (unlikely(!xa_get_mark(&pair_sdp->sfp_arr, pair_sfp->idx,
+				  SG_XA_FD_UNSHARED)))
+		res = -EADDRNOTAVAIL;
+	else if (unlikely(SG_HAVE_EXCLUDE(pair_sdp)))
+		res = -EPERM;
+	if (res) {
+		mutex_unlock(&pair_sfp->f_mutex);
+		return res;
+	}
+	same_sdp = (from_sdp == pair_sdp);
+	xa_lock_irqsave(&from_sdp->sfp_arr, iflags);
+	rcu_assign_pointer(from_sfp->share_sfp, pair_sfp);
+	__xa_clear_mark(&from_sdp->sfp_arr, from_sfp->idx, SG_XA_FD_UNSHARED);
+	kref_get(&from_sdp->d_ref);	/* treat share like pseudo open() */
+	if (from_rd_side)
+		__xa_set_mark(&from_sdp->sfp_arr, from_sfp->idx,
+			      SG_XA_FD_RS_SHARE);
+
+	if (!same_sdp) {
+		xa_unlock_irqrestore(&from_sdp->sfp_arr, iflags);
+		xa_lock_irqsave(&pair_sdp->sfp_arr, iflags);
+	}
+
+	mutex_unlock(&pair_sfp->f_mutex);
+	rcu_assign_pointer(pair_sfp->share_sfp, from_sfp);
+	__xa_clear_mark(&pair_sdp->sfp_arr, pair_sfp->idx, SG_XA_FD_UNSHARED);
+	if (!from_rd_side)
+		__xa_set_mark(&pair_sdp->sfp_arr, pair_sfp->idx,
+			      SG_XA_FD_RS_SHARE);
+	kref_get(&pair_sdp->d_ref);	/* keep symmetry */
+	xa_unlock_irqrestore(&pair_sdp->sfp_arr, iflags);
+	return 0;
+}
+
+/*
+ * Scans sg driver object tree looking for search_for. Returns valid pointer
+ * if found; returns a negated errno wrapped by ERR_PTR(), or returns NULL if
+ * not found (and no error).
+ */
+static struct sg_fd *
+sg_find_sfp_by_fd(const struct file *search_for, int search_fd,
+		  struct sg_fd *from_sfp, bool from_is_rd_side)
+		__must_hold(&from_sfp->f_mutex)
+{
+	bool found = false;
+	int k, num_d;
+	int res = 0;
+	unsigned long iflags, idx;
+	struct sg_fd *sfp;
+	struct sg_device *sdp;
+
+	num_d = -1;
+	SG_LOG(6, from_sfp, "%s: enter,  from_sfp=%pK search_for=%pK\n",
+	       __func__, from_sfp, search_for);
+	read_lock_irqsave(&sg_index_lock, iflags);
+	idr_for_each(&sg_index_idr, sg_idr_max_id, &num_d);
+	++num_d;
+	for (k = 0; k < num_d; ++k) {
+		sdp = idr_find(&sg_index_idr, k);
+		if (unlikely(!sdp || SG_IS_DETACHING(sdp)))
+			continue;
+		xa_for_each_marked(&sdp->sfp_arr, idx, sfp,
+				   SG_XA_FD_UNSHARED) {
+			if (sfp == from_sfp)
+				continue;
+			if (test_bit(SG_FFD_RELEASE, sfp->ffd_bm))
+				continue;
+			if (search_for != sfp->filp)
+				continue;       /* not this one */
+			res = sg_find_sfp_helper(from_sfp, sfp,
+						 from_is_rd_side, search_fd);
+			if (res == 0) {
+				found = true;
+				break;
+			}
+		}       /* end of loop of all fd_s in current device */
+		if (res || found)
+			break;
+	}       /* end of loop of all sg devices */
+	read_unlock_irqrestore(&sg_index_lock, iflags);
+	if (found) {	/* mark both fds as part of share */
+		struct sg_device *from_sdp = from_sfp->parentdp;
+
+		xa_lock_irqsave(&sdp->sfp_arr, iflags);
+		__xa_clear_mark(&sdp->sfp_arr, sfp->idx, SG_XA_FD_UNSHARED);
+		xa_unlock_irqrestore(&sdp->sfp_arr, iflags);
+		xa_lock_irqsave(&from_sdp->sfp_arr, iflags);
+		__xa_clear_mark(&from_sfp->parentdp->sfp_arr, from_sfp->idx,
+				SG_XA_FD_UNSHARED);
+		xa_unlock_irqrestore(&from_sdp->sfp_arr, iflags);
+	} else if (res == 0) {	/* fine tune error response */
+		num_d = -1;
+		read_lock_irqsave(&sg_index_lock, iflags);
+		idr_for_each(&sg_index_idr, sg_idr_max_id, &num_d);
+		++num_d;
+		for (k = 0; k < num_d; ++k) {
+			sdp = idr_find(&sg_index_idr, k);
+			if (unlikely(!sdp || SG_IS_DETACHING(sdp)))
+				continue;
+			xa_for_each(&sdp->sfp_arr, idx, sfp) {
+				if (xa_get_mark(&sdp->sfp_arr, idx,
+						SG_XA_FD_UNSHARED))
+					continue;
+				if (search_for == sfp->filp) {
+					res = -EADDRNOTAVAIL;  /* already */
+					break;
+				}
+			}
+			if (res)
+				break;
+		}
+		read_unlock_irqrestore(&sg_index_lock, iflags);
+	}
+	if (unlikely(res < 0))
+		return ERR_PTR(res);
+	return found ? sfp : NULL;
+}
+
+/*
+ * After checking the proposed read-side/write-side relationship is unique and valid,
+ * sets up pointers between read-side and write-side sg_fd objects. Returns 0 on
+ * success or negated errno value. From ioctl(EXTENDED(SG_SEIM_SHARE_FD)).
+ */
+static int
+sg_fd_share(struct sg_fd *ws_sfp, int m_fd)
+		__must_hold(&ws_sfp->f_mutex)
+{
+	bool found = false;
+	int res = 0;
+	int retry_count = 0;
+	struct file *filp;
+	struct sg_fd *rs_sfp;
+
+	SG_LOG(3, ws_sfp, "%s:  SHARE: read-side fd: %d\n", __func__, m_fd);
+	if (unlikely(!capable(CAP_SYS_ADMIN) || !capable(CAP_SYS_RAWIO)))
+		return -EACCES;
+	if (unlikely(m_fd < 0))
+		return -EBADF;
+
+	if (unlikely(!xa_get_mark(&ws_sfp->parentdp->sfp_arr, ws_sfp->idx,
+				  SG_XA_FD_UNSHARED)))
+		return -EADDRINUSE;  /* don't allow chain of shares */
+	/* Alternate approach: fcheck_files(current->files, m_fd) */
+	filp = fget(m_fd);
+	if (unlikely(!filp))
+		return -ENOENT;
+	if (unlikely(ws_sfp->filp == filp)) {/* share with self is confusing */
+		res = -ELOOP;
+		goto fini;
+	}
+	SG_LOG(6, ws_sfp, "%s: read-side fd okay, scan for filp=0x%pK\n",
+	       __func__, filp);
+again:
+	rs_sfp = sg_find_sfp_by_fd(filp, m_fd, ws_sfp, false);
+	if (IS_ERR(rs_sfp)) {
+		res = PTR_ERR(rs_sfp);
+		if (res == -EPROBE_DEFER) {
+			if (unlikely(++retry_count > SG_ADD_RQ_MAX_RETRIES)) {
+				res = -EBUSY;
+			} else {
+				res = 0;
+				cpu_relax();
+				goto again;
+			}
+		}
+	} else {
+		found = !!rs_sfp;
+	}
+fini:
+	/* paired with filp=fget(m_fd) above */
+	fput(filp);
+	if (unlikely(res))
+		return res;
+	return found ? 0 : -ENOTSOCK; /* ENOTSOCK for fd exists but not sg */
+}
+
+/*
+ * After checking the proposed file share relationship is unique and
+ * valid, sets up pointers between read-side and write-side sg_fd objects.
+ * Return 0 on success or negated errno value.
+ */
+static int
+sg_fd_reshare(struct sg_fd *rs_sfp, int new_ws_fd)
+		__must_hold(&rs_sfp->f_mutex)
+{
+	bool found = false;
+	int res = 0;
+	int retry_count = 0;
+	struct file *filp;
+	struct sg_fd *ws_sfp = sg_fd_shared_ptr(rs_sfp);
+
+	SG_LOG(3, ws_sfp, "%s:  new_write_side_fd: %d\n", __func__, new_ws_fd);
+	if (unlikely(!capable(CAP_SYS_ADMIN) || !capable(CAP_SYS_RAWIO)))
+		return -EACCES;
+	if (unlikely(new_ws_fd < 0))
+		return -EBADF;
+	if (unlikely(!xa_get_mark(&rs_sfp->parentdp->sfp_arr, rs_sfp->idx,
+				  SG_XA_FD_RS_SHARE)))
+		return -EINVAL;
+
+	/* Alternate approach: fcheck_files(current->files, new_ws_fd) */
+	filp = fget(new_ws_fd);
+	if (unlikely(!filp))
+		return -ENOENT;
+	if (unlikely(rs_sfp->filp == filp)) {/* share with self is confusing */
+		res = -ELOOP;
+		goto fini;
+	}
+	SG_LOG(6, ws_sfp, "%s: write-side fd ok, scan for filp=0x%pK\n", __func__,
+	       filp);
+	sg_unshare_fds(NULL, false, ws_sfp, false, false);
+again:
+	ws_sfp = sg_find_sfp_by_fd(filp, new_ws_fd, rs_sfp, true);
+	if (IS_ERR(ws_sfp)) {
+		res = PTR_ERR(ws_sfp);
+		if (res == -EPROBE_DEFER) {
+			if (unlikely(++retry_count > SG_ADD_RQ_MAX_RETRIES)) {
+				res = -EBUSY;
+			} else {
+				res = 0;
+				cpu_relax();
+				goto again;
+			}
+		}
+	} else {
+		found = !!ws_sfp;
+	}
+fini:
+	/* paired with filp=fget(new_ws_fd) above */
+	fput(filp);
+	if (unlikely(res))
+		return res;
+	return found ? 0 : -ENOTSOCK; /* ENOTSOCK for fd exists but not sg */
+}
+
 /*
  * First normalize want_rsv_sz to be >= sfp->sgat_elem_sz and
  * <= max_segment_size. Exit if that is the same as old size; otherwise
@@ -2111,6 +2548,16 @@ sg_extended_bool_flags(struct sg_fd *sfp, struct sg_extended_info *seip)
 		else
 			c_flgs_val_out &= ~SG_CTL_FLAGM_Q_TAIL;
 	}
+	/*
+	 * UNSHARE boolean: when read, yields zero. When set to true,
+	 * unshares this fd from a previously established fd share. If
+	 * a shared command is inflight, waits a little while for it
+	 * to finish.
+	 */
+	if (c_flgs_wm & SG_CTL_FLAGM_UNSHARE)
+		sg_do_unshare(sfp, !!(c_flgs_val_in & SG_CTL_FLAGM_UNSHARE));
+	if (c_flgs_rm & SG_CTL_FLAGM_UNSHARE)
+		c_flgs_val_out &= ~SG_CTL_FLAGM_UNSHARE;   /* clear bit */
 	/* NO_DURATION boolean, [rbw] */
 	if (c_flgs_rm & SG_CTL_FLAGM_NO_DURATION)
 		flg = test_bit(SG_FFD_NO_DURATION, sfp->ffd_bm);
@@ -2243,6 +2690,40 @@ sg_ctl_extended(struct sg_fd *sfp, void __user *p)
 	}
 	if ((s_rd_mask & SG_SEIM_READ_VAL) && (s_wr_mask & SG_SEIM_READ_VAL))
 		sg_extended_read_value(sfp, seip);
+	/* create share: write-side gives fd of read-side to share with [raw] */
+	if (or_masks & SG_SEIM_SHARE_FD) {
+		mutex_lock(&sfp->f_mutex);
+		if (s_wr_mask & SG_SEIM_SHARE_FD) {
+			result = sg_fd_share(sfp, (int)seip->share_fd);
+			if (ret == 0 && result)
+				ret = result;
+		}
+		/* if share then yield device number of (other) read-side */
+		if (s_rd_mask & SG_SEIM_SHARE_FD) {
+			struct sg_fd *sh_sfp = sg_fd_shared_ptr(sfp);
+
+			seip->share_fd = sh_sfp ? sh_sfp->parentdp->index :
+						   U32_MAX;
+		}
+		mutex_unlock(&sfp->f_mutex);
+	}
+	/* change_share: read-side is given shr_fd of new write-side [raw] */
+	if (or_masks & SG_SEIM_CHG_SHARE_FD) {
+		mutex_lock(&sfp->f_mutex);
+		if (s_wr_mask & SG_SEIM_CHG_SHARE_FD) {
+			result = sg_fd_reshare(sfp, (int)seip->share_fd);
+			if (ret == 0 && result)
+				ret = result;
+		}
+		/* if share then yield device number of (other) write-side */
+		if (s_rd_mask & SG_SEIM_CHG_SHARE_FD) {
+			struct sg_fd *sh_sfp = sg_fd_shared_ptr(sfp);
+
+			seip->share_fd = sh_sfp ? sh_sfp->parentdp->index :
+						  U32_MAX;
+		}
+		mutex_unlock(&sfp->f_mutex);
+	}
 	/* call blk_poll() on this fd's HIPRI requests [raw] */
 	if (or_masks & SG_SEIM_BLK_POLL) {
 		n = 0;
@@ -2388,19 +2869,19 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 
 	switch (cmd_in) {
 	case SG_IO:
-		return sg_ctl_sg_io(filp, sdp, sfp, p);
+		return sg_ctl_sg_io(sdp, sfp, p);
 	case SG_IOSUBMIT:
 		SG_LOG(3, sfp, "%s:    SG_IOSUBMIT\n", __func__);
-		return sg_ctl_iosubmit(filp, sfp, p);
+		return sg_ctl_iosubmit(sfp, p);
 	case SG_IOSUBMIT_V3:
 		SG_LOG(3, sfp, "%s:    SG_IOSUBMIT_V3\n", __func__);
-		return sg_ctl_iosubmit_v3(filp, sfp, p);
+		return sg_ctl_iosubmit_v3(sfp, p);
 	case SG_IORECEIVE:
 		SG_LOG(3, sfp, "%s:    SG_IORECEIVE\n", __func__);
-		return sg_ctl_ioreceive(filp, sfp, p);
+		return sg_ctl_ioreceive(sfp, p);
 	case SG_IORECEIVE_V3:
 		SG_LOG(3, sfp, "%s:    SG_IORECEIVE_V3\n", __func__);
-		return sg_ctl_ioreceive_v3(filp, sfp, p);
+		return sg_ctl_ioreceive_v3(sfp, p);
 	case SG_IOABORT:
 		SG_LOG(3, sfp, "%s:    SG_IOABORT\n", __func__);
 		if (read_only)
@@ -3491,8 +3972,8 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 	if (cwrp->cmd_len > BLK_MAX_CDB)
 		scsi_rp->cmd = long_cmdp;	/* transfer ownership */
 	if (cwrp->u_cmdp)
-		res = sg_fetch_cmnd(cwrp->filp, sfp, cwrp->u_cmdp,
-				    cwrp->cmd_len, scsi_rp->cmd);
+		res = sg_fetch_cmnd(sfp, cwrp->u_cmdp, cwrp->cmd_len,
+				    scsi_rp->cmd);
 	else
 		res = -EPROTO;
 	if (res)
@@ -4179,7 +4660,7 @@ sg_deact_request(struct sg_fd *sfp, struct sg_request *srp)
 
 /* Returns pointer to sg_fd object or negated errno twisted by ERR_PTR */
 static struct sg_fd *
-sg_add_sfp(struct sg_device *sdp)
+sg_add_sfp(struct sg_device *sdp, struct file *filp)
 {
 	bool reduced = false;
 	int rbuf_len, res;
@@ -4201,6 +4682,7 @@ sg_add_sfp(struct sg_device *sdp)
 	mutex_init(&sfp->f_mutex);
 	sfp->timeout = SG_DEFAULT_TIMEOUT;
 	sfp->timeout_user = SG_DEFAULT_TIMEOUT_USER;
+	sfp->filp = filp;
 	/* other bits in sfp->ffd_bm[1] cleared by kzalloc() above */
 	__assign_bit(SG_FFD_FORCE_PACKID, sfp->ffd_bm, SG_DEF_FORCE_PACK_ID);
 	__assign_bit(SG_FFD_CMD_Q, sfp->ffd_bm, SG_DEF_COMMAND_Q);
@@ -4360,7 +4842,22 @@ static void
 sg_remove_sfp(struct kref *kref)
 {
 	struct sg_fd *sfp = container_of(kref, struct sg_fd, f_ref);
-
+	struct sg_device *sdp = sfp->parentdp;
+	struct xarray *xap = &sdp->sfp_arr;
+
+	if (!xa_get_mark(xap, sfp->idx, SG_XA_FD_UNSHARED)) {
+		struct sg_fd *o_sfp;
+
+		o_sfp = sg_fd_shared_ptr(sfp);
+		if (o_sfp && !test_bit(SG_FFD_RELEASE, o_sfp->ffd_bm) &&
+		    !xa_get_mark(xap, sfp->idx, SG_XA_FD_UNSHARED)) {
+			mutex_lock(&o_sfp->f_mutex);
+			sg_remove_sfp_share
+				(sfp, xa_get_mark(xap, sfp->idx,
+						  SG_XA_FD_RS_SHARE));
+			mutex_unlock(&o_sfp->f_mutex);
+		}
+	}
 	INIT_WORK(&sfp->ew_fd.work, sg_remove_sfp_usercontext);
 	schedule_work(&sfp->ew_fd.work);
 }
@@ -4443,17 +4940,6 @@ struct sg_proc_deviter {
 	int fd_index;
 };
 
-static int
-sg_idr_max_id(int id, void *p, void *data)
-		__must_hold(sg_index_lock)
-{
-	int *k = data;
-
-	if (*k < id)
-		*k = id;
-	return 0;
-}
-
 static int
 sg_last_dev(void)
 {
diff --git a/include/uapi/scsi/sg.h b/include/uapi/scsi/sg.h
index 7d11905dd787..5c8a7c2c3191 100644
--- a/include/uapi/scsi/sg.h
+++ b/include/uapi/scsi/sg.h
@@ -172,6 +172,8 @@ typedef struct sg_req_info {	/* used by SG_GET_REQUEST_TABLE ioctl() */
 #define SG_SEIM_RESERVED_SIZE	0x4	/* reserved_sz of reserve request */
 #define SG_SEIM_TOT_FD_THRESH	0x8	/* tot_fd_thresh of data buffers */
 #define SG_SEIM_MINOR_INDEX	0x10	/* sg device minor index number */
+#define SG_SEIM_SHARE_FD	0x20	/* write-side gives fd of read-side */
+#define SG_SEIM_CHG_SHARE_FD	0x40	/* read-side given new write-side fd */
 #define SG_SEIM_SGAT_ELEM_SZ	0x80	/* sgat element size (>= PAGE_SIZE) */
 #define SG_SEIM_BLK_POLL	0x100	/* call blk_poll, uses 'num' field */
 #define SG_SEIM_ALL_BITS	0x1ff	/* should be OR of previous items */
@@ -182,6 +184,7 @@ typedef struct sg_req_info {	/* used by SG_GET_REQUEST_TABLE ioctl() */
 #define SG_CTL_FLAGM_OTHER_OPENS 0x4	/* rd: other sg fd_s on this dev */
 #define SG_CTL_FLAGM_ORPHANS	0x8	/* rd: orphaned requests on this fd */
 #define SG_CTL_FLAGM_Q_TAIL	0x10	/* used for future cmds on this fd */
+#define SG_CTL_FLAGM_UNSHARE	0x80	/* undo share after inflight cmd */
 #define SG_CTL_FLAGM_NO_DURATION 0x400	/* don't calc command duration */
 #define SG_CTL_FLAGM_MORE_ASYNC	0x800	/* yield EAGAIN in more cases */
 #define SG_CTL_FLAGM_ALL_BITS	0xfff	/* should be OR of previous items */
-- 
2.25.1
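The fd-level share rules added by this patch can be illustrated with a small user-space toy model. It is not driver code. `struct toy_fd`, `toy_fd_share()` and `toy_unshare()` are hypothetical names.

The model mirrors the error precedence visible in sg_fd_share() and sg_find_sfp_by_fd():

- -EADDRINUSE when the write-side is already in a share.
- -ELOOP when an fd tries to share with itself.
- -ENOTSOCK when the fd is not an sg fd.
- -EADDRNOTAVAIL when the read-side is already shared.

It also mirrors the symmetric undo performed by SG_CTL_FLAGM_UNSHARE. It deliberately omits the capability and bad-fd checks, the xarray marks, f_mutex and the retry logic.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sg_fd: only what the share rules need. */
struct toy_fd {
	bool is_sg;		/* does this fd refer to an sg device? */
	bool rs_share;		/* models the SG_XA_FD_RS_SHARE mark */
	struct toy_fd *peer;	/* models share_sfp; NULL means unshared */
};

static bool toy_is_shared(const struct toy_fd *fd)
{
	return fd->peer != NULL;
}

/* Models SG_SEIM_SHARE_FD: the write-side names the read-side fd. */
static int toy_fd_share(struct toy_fd *ws, struct toy_fd *rs)
{
	if (toy_is_shared(ws))
		return -EADDRINUSE;	/* no chains of shares */
	if (ws == rs)
		return -ELOOP;		/* sharing with self is refused */
	if (!rs->is_sg)
		return -ENOTSOCK;	/* fd exists but is not sg */
	if (toy_is_shared(rs))
		return -EADDRNOTAVAIL;	/* read-side already in a share */
	ws->peer = rs;
	rs->peer = ws;
	rs->rs_share = true;		/* only the read-side carries the mark */
	return 0;
}

/* Models writing 1 to SG_CTL_FLAGM_UNSHARE on either side of a share. */
static void toy_unshare(struct toy_fd *fd)
{
	struct toy_fd *other = fd->peer;

	if (!other)
		return;			/* not shared: nothing to undo */
	assert(other->peer == fd);	/* a share is always symmetric */
	fd->peer = NULL;
	other->peer = NULL;
	fd->rs_share = false;
	other->rs_share = false;
}
```

Because unsharing from either side clears both cross-references, the read-side fd becomes available for a new write-side afterwards.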


* [PATCH v18 51/83] sg: add shared requests
From: Douglas Gilbert @ 2021-04-27 21:57 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

Add request sharing, which is invoked on a shared file descriptor by
using SGV4_FLAG_SHARE. The file share is asymmetric: the read-side
is assumed to issue a data-in command (e.g. READ) first, followed by
the write-side issuing a data-out command (e.g. WRITE). The read-side
may also set SG_FLAG_NO_DXFER; the write-side must set that flag.
If both sides set that flag, then a single bio is used and user
space doesn't "see" the data. If the read-side does not set
SG_FLAG_NO_DXFER, then the read data is copied to user space.
That copy to user space can be replaced by using SG_FLAG_MMAP_IO
(but that adds some other overheads).

See the webpage at: https://sg.danny.cz/sg/sg_v40.html
in the section titled: "8 Request sharing".
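The flag rules in the first paragraph can be condensed into a small decision function. This is a toy model, not code from the patch. `enum toy_path`, its values and `toy_share_path()` are hypothetical names, and the booleans stand in for SG_FLAG_NO_DXFER and SG_FLAG_MMAP_IO on each side.

The model also makes one assumption that is not confirmed here: if the read-side sets both NO_DXFER and MMAP_IO, NO_DXFER takes precedence.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for where the READ data ends up in a shared request. */
enum toy_path {
	TOY_PATH_INVALID,	/* write-side did not set NO_DXFER: not allowed */
	TOY_PATH_BIO_ONLY,	/* one bio reused by the WRITE; user never sees data */
	TOY_PATH_COPY_TO_USER,	/* read data also copied to the user buffer */
	TOY_PATH_MMAP,		/* read data reached via mmap() instead of a copy */
};

/*
 * rs_no_dxfer, rs_mmap_io: flags on the read-side (data-in) request.
 * ws_no_dxfer: flag on the paired write-side (data-out) request.
 * Assumption: read-side NO_DXFER takes precedence over MMAP_IO.
 */
static enum toy_path toy_share_path(bool rs_no_dxfer, bool rs_mmap_io,
				    bool ws_no_dxfer)
{
	if (!ws_no_dxfer)
		return TOY_PATH_INVALID;	/* write-side must set it */
	if (rs_no_dxfer)
		return TOY_PATH_BIO_ONLY;	/* both sides set it */
	return rs_mmap_io ? TOY_PATH_MMAP : TOY_PATH_COPY_TO_USER;
}

/* The cases described in the commit message, as executable checks. */
static void toy_share_path_selftest(void)
{
	assert(toy_share_path(true, false, true) == TOY_PATH_BIO_ONLY);
	assert(toy_share_path(false, false, true) == TOY_PATH_COPY_TO_USER);
	assert(toy_share_path(false, true, true) == TOY_PATH_MMAP);
	assert(toy_share_path(true, false, false) == TOY_PATH_INVALID);
}
```

Only the first case avoids touching user space at all. This is why a copy with both sides setting NO_DXFER is the cheapest variant.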

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c      | 1203 +++++++++++++++++++++++++++++-----------
 include/uapi/scsi/sg.h |    8 +
 2 files changed, 892 insertions(+), 319 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index fb3782b1f9c7..f43cfd2ae739 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -89,11 +89,22 @@ static mempool_t *sg_sense_pool;
 #define cuptr64(usp_val) ((const void __user *)(uintptr_t)(usp_val))
 
 /* Following enum contains the states of sg_request::rq_st */
-enum sg_rq_state {	/* N.B. sg_rq_state_arr assumes SG_RS_AWAIT_RCV==2 */
-	SG_RS_INACTIVE = 0,	/* request not in use (e.g. on fl) */
-	SG_RS_INFLIGHT,		/* active: cmd/req issued, no response yet */
-	SG_RS_AWAIT_RCV,	/* have response from LLD, awaiting receive */
-	SG_RS_BUSY,		/* temporary state should rarely be seen */
+enum sg_rq_state {	/* N.B. sg_rq_state_arr assumes SG_RQ_AWAIT_RCV==2 */
+	SG_RQ_INACTIVE = 0,	/* request not in use (e.g. on fl) */
+	SG_RQ_INFLIGHT,		/* active: cmd/req issued, no response yet */
+	SG_RQ_AWAIT_RCV,	/* have response from LLD, awaiting receive */
+	SG_RQ_BUSY,		/* temporary state should rarely be seen */
+	SG_RQ_SHR_SWAP,		/* read-side: is finished, await swap to write-side */
+	SG_RQ_SHR_IN_WS,	/* read-side: waits while write-side inflight */
+};
+
+/* write-side sets up sharing: ioctl(ws_fd,SG_SET_GET_EXTENDED(SHARE_FD(rs_fd))) */
+enum sg_shr_var {
+	SG_SHR_NONE = 0,	/* no sharing on this fd, so _not_ shared request */
+	SG_SHR_RS_NOT_SRQ,	/* read-side fd but _not_ shared request */
+	SG_SHR_RS_RQ,		/* read-side sharing on this request */
+	SG_SHR_WS_NOT_SRQ,	/* write-side fd but _not_ shared request */
+	SG_SHR_WS_RQ,		/* write-side sharing on this request */
 };
 
 /* If sum_of(dlen) of a fd exceeds this, write() will yield E2BIG */
@@ -119,13 +130,13 @@ enum sg_rq_state {	/* N.B. sg_rq_state_arr assumes SG_RS_AWAIT_RCV==2 */
 #define SG_FRQ_IS_V4I		0	/* true (set) when is v4 interface */
 #define SG_FRQ_IS_ORPHAN	1	/* owner of request gone */
 #define SG_FRQ_SYNC_INVOC	2	/* synchronous (blocking) invocation */
-#define SG_FRQ_NO_US_XFER	3	/* no user space transfer of data */
+#define SG_FRQ_US_XFER		3	/* kernel<-->user_space data transfer */
 #define SG_FRQ_ABORTING		4	/* in process of aborting this cmd */
-#define SG_FRQ_DEACT_ORPHAN	6	/* not keeping orphan so de-activate */
-#define SG_FRQ_RECEIVING	7	/* guard against multiple receivers */
-#define SG_FRQ_FOR_MMAP		8	/* request needs PAGE_SIZE elements */
-#define SG_FRQ_COUNT_ACTIVE	9	/* sfp->submitted + waiting active */
-#define SG_FRQ_ISSUED		10	/* blk_execute_rq_nowait() finished */
+#define SG_FRQ_DEACT_ORPHAN	5	/* not keeping orphan so de-activate */
+#define SG_FRQ_RECEIVING	6	/* guard against multiple receivers */
+#define SG_FRQ_FOR_MMAP		7	/* request needs PAGE_SIZE elements */
+#define SG_FRQ_COUNT_ACTIVE	8	/* sfp->submitted + waiting active */
+#define SG_FRQ_ISSUED		9	/* blk_execute_rq_nowait() finished */
 
 /* Bit positions (flags) for sg_fd::ffd_bm bitmask follow */
 #define SG_FFD_FORCE_PACKID	0	/* receive only given pack_id/tag */
@@ -134,10 +145,11 @@ enum sg_rq_state {	/* N.B. sg_rq_state_arr assumes SG_RS_AWAIT_RCV==2 */
 #define SG_FFD_HIPRI_SEEN	3	/* could have HIPRI requests active */
 #define SG_FFD_TIME_IN_NS	4	/* set: time in nanoseconds, else ms */
 #define SG_FFD_Q_AT_TAIL	5	/* set: queue reqs at tail of blk q */
-#define SG_FFD_PREFER_TAG	6	/* prefer tag over pack_id (def) */
-#define SG_FFD_RELEASE		7	/* release (close) underway */
-#define SG_FFD_NO_DURATION	8	/* don't do command duration calc */
-#define SG_FFD_MORE_ASYNC	9	/* yield EBUSY more often */
+#define SG_FFD_READ_SIDE_ERR	6	/* prior read-side of share failed */
+#define SG_FFD_PREFER_TAG	7	/* prefer tag over pack_id (def) */
+#define SG_FFD_RELEASE		8	/* release (close) underway */
+#define SG_FFD_NO_DURATION	9	/* don't do command duration calc */
+#define SG_FFD_MORE_ASYNC	10	/* yield EBUSY more often */
 
 /* Bit positions (flags) for sg_device::fdev_bm bitmask follow */
 #define SG_FDEV_EXCLUDE		0	/* have fd open with O_EXCL */
@@ -216,6 +228,7 @@ struct sg_fd;
 
 struct sg_request {	/* active SCSI command or inactive request */
 	struct sg_scatter_hold sgat_h;	/* hold buffer, perhaps scatter list */
+	struct sg_scatter_hold *sgatp;	/* ptr to prev unless write-side shr req */
 	union {
 		struct sg_slice_hdr3 s_hdr3;  /* subset of sg_io_hdr */
 		struct sg_slice_hdr4 s_hdr4; /* reduced size struct sg_io_v4 */
@@ -229,6 +242,7 @@ struct sg_request {	/* active SCSI command or inactive request */
 	int pack_id;		/* v3 pack_id or in v4 request_extra field */
 	int sense_len;		/* actual sense buffer length (data-in) */
 	atomic_t rq_st;		/* request state, holds a enum sg_rq_state */
+	enum sg_shr_var sh_var;	/* sharing variety, SG_SHR_NONE=0 if none */
 	u8 cmd_opcode;		/* first byte of SCSI cdb */
 	int tag;		/* block layer identifier of request */
 	blk_qc_t cookie;	/* ids 1 or more queues for blk_poll() */
@@ -237,7 +251,7 @@ struct sg_request {	/* active SCSI command or inactive request */
 	u8 *sense_bp;		/* mempool alloc-ed sense buffer, as needed */
 	struct sg_fd *parentfp;	/* pointer to owning fd, even when on fl */
 	struct request *rqq;	/* released in sg_rq_end_io(), bio kept */
-	struct bio *bio;	/* kept until this req -->SG_RS_INACTIVE */
+	struct bio *bio;	/* kept until this req -->SG_RQ_INACTIVE */
 	struct execute_work ew_orph;	/* harvest orphan request */
 };
 
@@ -262,6 +276,7 @@ struct sg_fd {		/* holds the state of a file descriptor */
 	unsigned long ffd_bm[1];	/* see SG_FFD_* defines above */
 	struct file *filp;	/* my identity when sharing */
 	struct sg_request *rsv_srp;/* one reserve request per fd */
+	struct sg_request *ws_srp; /* when rsv SG_SHR_RS_RQ, ptr to write-side */
 	struct sg_fd __rcu *share_sfp;/* fd share cross-references, else NULL */
 	struct fasync_struct *async_qp; /* used by asynchronous notification */
 	struct xarray srp_arr;	/* xarray of sg_request object pointers */
@@ -317,10 +332,11 @@ static int sg_read_append(struct sg_request *srp, void __user *outp,
 static void sg_remove_sgat(struct sg_request *srp);
 static struct sg_fd *sg_add_sfp(struct sg_device *sdp, struct file *filp);
 static void sg_remove_sfp(struct kref *);
+static void sg_remove_sfp_share(struct sg_fd *sfp, bool is_rd_side);
 static struct sg_request *sg_find_srp_by_id(struct sg_fd *sfp, int id,
 					    bool is_tag);
 static struct sg_request *sg_setup_req(struct sg_comm_wr_t *cwrp,
-				       int dxfr_len);
+				       enum sg_shr_var sh_var, int dxfr_len);
 static void sg_deact_request(struct sg_fd *sfp, struct sg_request *srp);
 static struct sg_device *sg_get_dev(int min_dev);
 static void sg_device_destroy(struct kref *kref);
@@ -331,6 +347,7 @@ static int sg_srp_q_blk_poll(struct sg_request *srp, struct request_queue *q,
 			     int loop_count);
 #if IS_ENABLED(CONFIG_SCSI_LOGGING) && IS_ENABLED(SG_DEBUG)
 static const char *sg_rq_st_str(enum sg_rq_state rq_st, bool long_str);
+static const char *sg_shr_str(enum sg_shr_var sh_var, bool long_str);
 #endif
 
 #define SG_WRITE_COUNT_LIMIT (32 * 1024 * 1024)
@@ -345,7 +362,9 @@ static const char *sg_rq_st_str(enum sg_rq_state rq_st, bool long_str);
 
 #define SG_IS_DETACHING(sdp) test_bit(SG_FDEV_DETACHING, (sdp)->fdev_bm)
 #define SG_HAVE_EXCLUDE(sdp) test_bit(SG_FDEV_EXCLUDE, (sdp)->fdev_bm)
-#define SG_RS_ACTIVE(srp) (atomic_read(&(srp)->rq_st) != SG_RS_INACTIVE)
+#define SG_IS_O_NONBLOCK(sfp) (!!((sfp)->filp->f_flags & O_NONBLOCK))
+#define SG_RQ_ACTIVE(srp) (atomic_read(&(srp)->rq_st) != SG_RQ_INACTIVE)
+// #define SG_RQ_THIS_RQ(srp) ((srp)->sh_var == SG_SHR_RS_RQ)
 
 /*
  * Kernel needs to be built with CONFIG_SCSI_LOGGING to see log messages.
@@ -427,7 +446,7 @@ sg_wait_open_event(struct sg_device *sdp, bool o_excl)
 			mutex_unlock(&sdp->open_rel_lock);
 			res = wait_event_interruptible
 					(sdp->open_wait,
-					 (SG_IS_DETACHING(sdp) ||
+					 (unlikely(SG_IS_DETACHING(sdp)) ||
 					  atomic_read(&sdp->open_cnt) == 0));
 			mutex_lock(&sdp->open_rel_lock);
 
@@ -441,7 +460,7 @@ sg_wait_open_event(struct sg_device *sdp, bool o_excl)
 			mutex_unlock(&sdp->open_rel_lock);
 			res = wait_event_interruptible
 					(sdp->open_wait,
-					 (SG_IS_DETACHING(sdp) ||
+					 (unlikely(SG_IS_DETACHING(sdp)) ||
 					  !SG_HAVE_EXCLUDE(sdp)));
 			mutex_lock(&sdp->open_rel_lock);
 
@@ -497,7 +516,7 @@ sg_open(struct inode *inode, struct file *filp)
 	nonseekable_open(inode, filp);
 	o_excl = !!(op_flags & O_EXCL);
 	non_block = !!(op_flags & O_NONBLOCK);
-	if (o_excl && ((op_flags & O_ACCMODE) == O_RDONLY))
+	if (unlikely(o_excl) && ((op_flags & O_ACCMODE) == O_RDONLY))
 		return -EPERM;/* not permitted, need write access for O_EXCL */
 	sdp = sg_get_dev(min_dev);	/* increments sdp->d_ref */
 	if (IS_ERR(sdp))
@@ -572,8 +591,15 @@ sg_open(struct inode *inode, struct file *filp)
 	goto sg_put;
 }
 
+static inline bool
+sg_fd_is_shared(struct sg_fd *sfp)
+{
+	return !xa_get_mark(&sfp->parentdp->sfp_arr, sfp->idx,
+			    SG_XA_FD_UNSHARED);
+}
+
 static inline struct sg_fd *
-sg_fd_shared_ptr(struct sg_fd *sfp)
+sg_fd_share_ptr(struct sg_fd *sfp)
 {
 	struct sg_fd *res_sfp;
 	struct sg_device *sdp = sfp->parentdp;
@@ -618,6 +644,10 @@ sg_release(struct inode *inode, struct file *filp)
 		SG_LOG(1, sfp, "%s: second release on this fd ? ?\n",
 		       __func__);
 	scsi_autopm_put_device(sdp->device);
+	if (!xa_get_mark(&sdp->sfp_arr, sfp->idx, SG_XA_FD_FREE) &&
+	    sg_fd_is_shared(sfp))
+		sg_remove_sfp_share(sfp, xa_get_mark(&sdp->sfp_arr, sfp->idx,
+						     SG_XA_FD_RS_SHARE));
 	kref_put(&sfp->f_ref, sg_remove_sfp);
 
 	/*
@@ -826,7 +856,7 @@ sg_submit_v3(struct sg_fd *sfp, struct sg_io_hdr *hp, bool sync,
 	if (hp->flags & SG_FLAG_MMAP_IO) {
 		int res = sg_chk_mmap(sfp, hp->flags, hp->dxfer_len);
 
-		if (res)
+		if (unlikely(res))
 			return res;
 	}
 	/* when v3 seen, allow cmd_q on this fd (def: no cmd_q) */
@@ -864,7 +894,7 @@ sg_submit_v4(struct sg_fd *sfp, void __user *p, struct sg_io_v4 *h4p,
 		else if (h4p->dout_xferp)
 			len = h4p->dout_xfer_len;
 		res = sg_chk_mmap(sfp, h4p->flags, len);
-		if (res)
+		if (unlikely(res))
 			return res;
 	}
 	/* once v4 (or v3) seen, allow cmd_q on this fd (def: no cmd_q) */
@@ -902,7 +932,7 @@ sg_ctl_iosubmit(struct sg_fd *sfp, void __user *p)
 	struct sg_io_v4 *h4p = (struct sg_io_v4 *)hdr_store;
 	struct sg_device *sdp = sfp->parentdp;
 
-	res = sg_allow_if_err_recovery(sdp, (sfp->filp->f_flags & O_NONBLOCK));
+	res = sg_allow_if_err_recovery(sdp, SG_IS_O_NONBLOCK(sfp));
 	if (res)
 		return res;
 	if (copy_from_user(hdr_store, p, SZ_SG_IO_V4))
@@ -920,7 +950,7 @@ sg_ctl_iosubmit_v3(struct sg_fd *sfp, void __user *p)
 	struct sg_io_hdr *h3p = (struct sg_io_hdr *)hdr_store;
 	struct sg_device *sdp = sfp->parentdp;
 
-	res = sg_allow_if_err_recovery(sdp, (sfp->filp->f_flags & O_NONBLOCK));
+	res = sg_allow_if_err_recovery(sdp, SG_IS_O_NONBLOCK(sfp));
 	if (unlikely(res))
 		return res;
 	if (copy_from_user(h3p, p, SZ_SG_IO_HDR))
@@ -930,6 +960,54 @@ sg_ctl_iosubmit_v3(struct sg_fd *sfp, void __user *p)
 	return -EPERM;
 }
 
+/*
+ * Assumes sharing has been established at the file descriptor level and now we
+ * check the rq_flags of a new request/command. SGV4_FLAG_NO_DXFER is
+ * optional on the read-side but mandatory on the write-side. Also
+ * returns (via *sh_varp) the proposed sg_request::sh_var of the new request
+ * yet to be built/re-used.
+ */
+static int
+sg_share_chk_flags(struct sg_fd *sfp, u32 rq_flags, int dxfer_len, int dir,
+		   enum sg_shr_var *sh_varp)
+{
+	bool is_read_side = xa_get_mark(&sfp->parentdp->sfp_arr, sfp->idx,
+					SG_XA_FD_RS_SHARE);
+	int result = 0;
+	enum sg_shr_var sh_var = SG_SHR_NONE;
+
+	if (rq_flags & SGV4_FLAG_SHARE) {
+		if (rq_flags & SG_FLAG_DIRECT_IO)
+			result = -EINVAL; /* since no control of data buffer */
+		else if (dxfer_len < 1)
+			result = -ENODATA;
+		else if (is_read_side) {
+			sh_var = SG_SHR_RS_RQ;
+			if (dir != SG_DXFER_FROM_DEV)
+				result = -ENOMSG;
+			if (rq_flags & SGV4_FLAG_NO_DXFER) {
+				/* rule out some contradictions */
+				if (rq_flags & SG_FL_MMAP_DIRECT)
+					result = -ENODATA;
+			}
+		} else {			/* fd is write-side */
+			sh_var = SG_SHR_WS_RQ;
+			if (dir != SG_DXFER_TO_DEV)
+				result = -ENOMSG;
+			if (!(rq_flags & SGV4_FLAG_NO_DXFER))
+				result = -ENOMSG;
+			if (rq_flags & SG_FL_MMAP_DIRECT)
+				result = -ENODATA;
+		}
+	} else if (is_read_side) {
+		sh_var = SG_SHR_RS_NOT_SRQ;
+	} else {
+		sh_var = SG_SHR_WS_NOT_SRQ;
+	}
+	*sh_varp = sh_var;
+	return result;
+}
+
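The validation rules in sg_share_chk_flags() can be modelled outside the kernel as a pure function. The sketch below is a user-space approximation only: the flag bit values, the direction enum and all X_/SHR_/DIR_ names are hypothetical stand-ins for the real SGV4_FLAG_*, SG_DXFER_* and sg_shr_var definitions. The read-side/write-side decision is passed in as a bool rather than being read from the SG_XA_FD_RS_SHARE xarray mark.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits; the real values live in the sg uapi header. */
#define X_FLAG_DIRECT_IO   0x01u
#define X_FL_MMAP_DIRECT   0x02u
#define X_FLAG_SHARE       0x04u
#define X_FLAG_NO_DXFER    0x08u

enum xfer_dir { DIR_TO_DEV, DIR_FROM_DEV };	/* hypothetical */
enum shr_var { SHR_NONE, SHR_RS_RQ, SHR_RS_NOT_SRQ, SHR_WS_RQ,
	       SHR_WS_NOT_SRQ };		/* mirrors sg_shr_var names */

/*
 * Model of the checks. Within each side's checks a later failing test
 * overwrites an earlier error code, as in the kernel function.
 */
static int share_chk_flags(bool is_read_side, uint32_t rq_flags,
			   int dxfer_len, enum xfer_dir dir,
			   enum shr_var *sh_varp)
{
	int result = 0;
	enum shr_var sh_var = SHR_NONE;

	if (rq_flags & X_FLAG_SHARE) {
		if (rq_flags & X_FLAG_DIRECT_IO) {
			result = -EINVAL;	/* no control of data buffer */
		} else if (dxfer_len < 1) {
			result = -ENODATA;
		} else if (is_read_side) {
			sh_var = SHR_RS_RQ;
			if (dir != DIR_FROM_DEV)
				result = -ENOMSG;
			if ((rq_flags & X_FLAG_NO_DXFER) &&
			    (rq_flags & X_FL_MMAP_DIRECT))
				result = -ENODATA;
		} else {
			sh_var = SHR_WS_RQ;
			if (dir != DIR_TO_DEV)
				result = -ENOMSG;
			/* write-side data comes from the read-side buffer */
			if (!(rq_flags & X_FLAG_NO_DXFER))
				result = -ENOMSG;
			if (rq_flags & X_FL_MMAP_DIRECT)
				result = -ENODATA;
		}
	} else {
		sh_var = is_read_side ? SHR_RS_NOT_SRQ : SHR_WS_NOT_SRQ;
	}
	*sh_varp = sh_var;
	return result;
}
```

For example, a shared WRITE on the write-side is only accepted when it also carries the (hypothetical) X_FLAG_NO_DXFER bit, matching the "must be used on the write-side" rule in the kernel comment.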
 #if IS_ENABLED(SG_LOG_ACTIVE)
 static void
 sg_rq_state_fail_msg(struct sg_fd *sfp, enum sg_rq_state exp_old_st,
@@ -949,38 +1027,6 @@ sg_rq_state_fail_msg(struct sg_fd *sfp, enum sg_rq_state exp_old_st,
 #endif
 
 /* Functions ending in '_ulck' assume sfp->xa_lock held by caller. */
-static void
-sg_rq_chg_state_force_ulck(struct sg_request *srp, enum sg_rq_state new_st)
-{
-	bool prev, want;
-	struct sg_fd *sfp = srp->parentfp;
-	struct xarray *xafp = &sfp->srp_arr;
-
-	atomic_set(&srp->rq_st, new_st);
-	want = (new_st == SG_RS_AWAIT_RCV);
-	prev = xa_get_mark(xafp, srp->rq_idx, SG_XA_RQ_AWAIT);
-	if (prev != want) {
-		if (want)
-			__xa_set_mark(xafp, srp->rq_idx, SG_XA_RQ_AWAIT);
-		else
-			__xa_clear_mark(xafp, srp->rq_idx, SG_XA_RQ_AWAIT);
-	}
-	want = (new_st == SG_RS_INACTIVE);
-	prev = xa_get_mark(xafp, srp->rq_idx, SG_XA_RQ_INACTIVE);
-	if (prev != want) {
-		if (want) {
-			int prev_idx = READ_ONCE(sfp->low_used_idx);
-
-			if (prev_idx < 0 || srp->rq_idx < prev_idx ||
-			    !xa_get_mark(xafp, prev_idx, SG_XA_RQ_INACTIVE))
-				WRITE_ONCE(sfp->low_used_idx, srp->rq_idx);
-			__xa_set_mark(xafp, srp->rq_idx, SG_XA_RQ_INACTIVE);
-		} else {
-			__xa_clear_mark(xafp, srp->rq_idx, SG_XA_RQ_INACTIVE);
-		}
-	}
-}
-
 static void
 sg_rq_chg_state_help(struct xarray *xafp, struct sg_request *srp, int indic)
 {
@@ -996,21 +1042,42 @@ sg_rq_chg_state_help(struct xarray *xafp, struct sg_request *srp, int indic)
 }
 
 /* Following array indexed by enum sg_rq_state, 0 means no xa mark change */
-static const int sg_rq_state_arr[] = {1, 0, 4, 0};
-static const int sg_rq_state_mul2arr[] = {2, 0, 8, 0};
+static const int sg_rq_state_arr[] = {1, 0, 4, 0, 0, 0};
+static const int sg_rq_state_mul2arr[] = {2, 0, 8, 0, 0, 0};
 
 /*
  * This function keeps the srp->rq_st state and associated marks on the
- * owning xarray's element in sync. If force is true then new_st is stored
- * in srp->rq_st and xarray marks are set accordingly (and old_st is
- * ignored); and 0 is returned.
- * If force is false, then atomic_cmpxchg() is called. If the actual
- * srp->rq_st is not old_st, then -EPROTOTYPE is returned. If the actual
- * srp->rq_st is old_st then it is replaced by new_st and the xarray marks
- * are setup accordingly and 0 is returned. This assumes srp_arr xarray
- * spinlock is held.
+ * owning xarray's element in sync. An attempt is made to change state with
+ * a call to atomic_cmpxchg(). If the actual srp->rq_st is not old_st, then
+ * -EPROTOTYPE is returned. If the actual srp->rq_st is old_st then it is
+ * replaced by new_st and the xarray marks are setup accordingly and 0 is
+ * returned. This assumes srp_arr xarray spinlock is held.
  */
 static int
+sg_rq_chg_state_ulck(struct sg_request *srp, enum sg_rq_state old_st,
+		     enum sg_rq_state new_st)
+{
+	enum sg_rq_state act_old_st;
+	int indic;
+
+	indic = sg_rq_state_arr[(int)old_st] +
+		sg_rq_state_mul2arr[(int)new_st];
+	act_old_st = (enum sg_rq_state)atomic_cmpxchg(&srp->rq_st, old_st,
+						      new_st);
+	if (act_old_st != old_st) {
+#if IS_ENABLED(SG_LOG_ACTIVE)
+		SG_LOG(1, srp->parentfp, "%s: unexpected old state: %s\n",
+		       __func__, sg_rq_st_str(act_old_st, false));
+#endif
+		return -EPROTOTYPE;	/* only used for this error type */
+	}
+	if (indic)
+		sg_rq_chg_state_help(&srp->parentfp->srp_arr, srp, indic);
+	return 0;
+}
+
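sg_rq_chg_state_ulck() relies on a compare-and-exchange so that a state transition only happens if the request is still in the expected old state; otherwise -EPROTOTYPE is returned and the state is left untouched. A minimal user-space sketch of that pattern using C11 <stdatomic.h> follows. The struct and enum names are hypothetical, the enum order is assumed, and the xarray mark update done by sg_rq_chg_state_help() is deliberately left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical copy of the SG_RQ_* state names; order is an assumption. */
enum rq_state { RQ_INACTIVE, RQ_INFLIGHT, RQ_AWAIT_RCV, RQ_BUSY,
		RQ_SHR_SWAP, RQ_SHR_IN_WS };

struct model_req {
	atomic_int rq_st;
};

/* Move from old_st to new_st only if the request is still in old_st. */
static int rq_chg_state(struct model_req *rq, enum rq_state old_st,
			enum rq_state new_st)
{
	int expected = (int)old_st;

	/* On failure, 'expected' is overwritten with the actual state. */
	if (!atomic_compare_exchange_strong(&rq->rq_st, &expected,
					    (int)new_st))
		return -EPROTOTYPE;
	/* The driver would now sync the xarray marks; omitted here. */
	return 0;
}
```

A second caller racing on the same transition (for example AWAIT_RCV to BUSY) sees the compare fail and gets -EPROTOTYPE instead of silently clobbering the state.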
+/* Similar to sg_rq_chg_state_ulck() but uses the xarray spinlock */
+static int
 sg_rq_chg_state(struct sg_request *srp, enum sg_rq_state old_st,
 		enum sg_rq_state new_st)
 {
@@ -1030,7 +1097,7 @@ sg_rq_chg_state(struct sg_request *srp, enum sg_rq_state old_st,
 			       sg_rq_st_str(act_old_st, false));
 			return -EPROTOTYPE;     /* only used for this error type */
 		}
-		if (new_st == SG_RS_INACTIVE) {
+		if (new_st == SG_RQ_INACTIVE) {
 			int prev_idx = READ_ONCE(sfp->low_used_idx);
 
 			if (prev_idx < 0 || srp->rq_idx < prev_idx ||
@@ -1050,6 +1117,38 @@ sg_rq_chg_state(struct sg_request *srp, enum sg_rq_state old_st,
 	return 0;
 }
 
+static void
+sg_rq_chg_state_force_ulck(struct sg_request *srp, enum sg_rq_state new_st)
+{
+	bool prev, want;
+	struct sg_fd *sfp = srp->parentfp;
+	struct xarray *xafp = &sfp->srp_arr;
+
+	atomic_set(&srp->rq_st, new_st);
+	want = (new_st == SG_RQ_AWAIT_RCV);
+	prev = xa_get_mark(xafp, srp->rq_idx, SG_XA_RQ_AWAIT);
+	if (prev != want) {
+		if (want)
+			__xa_set_mark(xafp, srp->rq_idx, SG_XA_RQ_AWAIT);
+		else
+			__xa_clear_mark(xafp, srp->rq_idx, SG_XA_RQ_AWAIT);
+	}
+	want = (new_st == SG_RQ_INACTIVE);
+	prev = xa_get_mark(xafp, srp->rq_idx, SG_XA_RQ_INACTIVE);
+	if (prev != want) {
+		if (want) {
+			int prev_idx = READ_ONCE(sfp->low_used_idx);
+
+			if (prev_idx < 0 || srp->rq_idx < prev_idx ||
+			    !xa_get_mark(xafp, prev_idx, SG_XA_RQ_INACTIVE))
+				WRITE_ONCE(sfp->low_used_idx, srp->rq_idx);
+			__xa_set_mark(xafp, srp->rq_idx, SG_XA_RQ_INACTIVE);
+		} else {
+			__xa_clear_mark(xafp, srp->rq_idx, SG_XA_RQ_INACTIVE);
+		}
+	}
+}
+
 static void
 sg_rq_chg_state_force(struct sg_request *srp, enum sg_rq_state new_st)
 {
@@ -1086,7 +1185,7 @@ sg_execute_cmd(struct sg_fd *sfp, struct sg_request *srp)
 		at_head = !(srp->rq_flags & SG_FLAG_Q_AT_TAIL);
 
 	kref_get(&sfp->f_ref); /* sg_rq_end_io() does kref_put(). */
-	sg_rq_chg_state_force(srp, SG_RS_INFLIGHT);
+	sg_rq_chg_state_force(srp, SG_RQ_INFLIGHT);
 
 	/* >>>>>>> send cmd/req off to other levels <<<<<<<< */
 	if (!sync) {
@@ -1115,6 +1214,7 @@ sg_common_write(struct sg_comm_wr_t *cwrp)
 	int dxfr_len, dir;
 	int pack_id = SG_PACK_ID_WILDCARD;
 	u32 rq_flags;
+	enum sg_shr_var sh_var;
 	struct sg_fd *fp = cwrp->sfp;
 	struct sg_device *sdp = fp->parentdp;
 	struct sg_request *srp;
@@ -1145,10 +1245,19 @@ sg_common_write(struct sg_comm_wr_t *cwrp)
 		rq_flags = hi_p->flags;
 		pack_id = hi_p->pack_id;
 	}
+	if (sg_fd_is_shared(fp)) {
+		res = sg_share_chk_flags(fp, rq_flags, dxfr_len, dir, &sh_var);
+		if (unlikely(res < 0))
+			return ERR_PTR(res);
+	} else {
+		sh_var = SG_SHR_NONE;
+		if (rq_flags & SGV4_FLAG_SHARE)
+			return ERR_PTR(-ENOMSG);
+	}
 	if (dxfr_len >= SZ_256M)
 		return ERR_PTR(-EINVAL);
 
-	srp = sg_setup_req(cwrp, dxfr_len);
+	srp = sg_setup_req(cwrp, sh_var, dxfr_len);
 	if (IS_ERR(srp))
 		return srp;
 	srp->rq_flags = rq_flags;
@@ -1235,8 +1344,6 @@ sg_copy_sense(struct sg_request *srp, bool v4_active)
 			sb_len_ret = min_t(int, sb_len_ret, sb_len);
 			if (copy_to_user(up, sbp, sb_len_ret))
 				sb_len_ret = -EFAULT;
-		} else {
-			sb_len_ret = 0;
 		}
 		mempool_free(sbp, sg_sense_pool);
 	}
@@ -1246,7 +1353,10 @@ sg_copy_sense(struct sg_request *srp, bool v4_active)
 static int
 sg_rec_state_v3v4(struct sg_fd *sfp, struct sg_request *srp, bool v4_active)
 {
+	int err = 0;
 	u32 rq_res = srp->rq_result;
+	enum sg_shr_var sh_var = srp->sh_var;
+	struct sg_fd *sh_sfp;
 
 	if (unlikely(srp->rq_result & 0xff)) {
 		int sb_len_wr = sg_copy_sense(srp, v4_active);
@@ -1256,9 +1366,86 @@ sg_rec_state_v3v4(struct sg_fd *sfp, struct sg_request *srp, bool v4_active)
 	}
 	if (rq_res & SG_ML_RESULT_MSK)
 		srp->rq_info |= SG_INFO_CHECK;
+	if (test_bit(SG_FRQ_ABORTING, srp->frq_bm))
+		srp->rq_info |= SG_INFO_ABORTED;
+
+	sh_sfp = sg_fd_share_ptr(sfp);
+	if (sh_var == SG_SHR_WS_RQ && sg_fd_is_shared(sfp)) {
+		struct sg_request *rs_srp = sh_sfp->rsv_srp;
+		enum sg_rq_state mar_st = atomic_read(&rs_srp->rq_st);
+
+		switch (mar_st) {
+		case SG_RQ_SHR_SWAP:
+		case SG_RQ_SHR_IN_WS:
+			/* make read-side request available for re-use */
+			rs_srp->tag = SG_TAG_WILDCARD;
+			rs_srp->sh_var = SG_SHR_NONE;
+			sg_rq_chg_state_force(rs_srp, SG_RQ_INACTIVE);
+			atomic_inc(&sh_sfp->inactives);
+			break;
+		case SG_RQ_INACTIVE:
+		case SG_RQ_AWAIT_RCV:
+			sh_sfp->ws_srp = NULL;
+			break;  /* read-side state left as is */
+		default:
+			err = -EPROTO;  /* Logic error */
+			SG_LOG(1, sfp,
+			       "%s: SHR_WS_RQ, bad read-side state: %s\n",
+			       __func__, sg_rq_st_str(mar_st, true));
+			break;  /* nothing to do */
+		}
+	}
 	if (unlikely(SG_IS_DETACHING(sfp->parentdp)))
 		srp->rq_info |= SG_INFO_DEVICE_DETACHING;
-	return 0;
+	return err;
+}
+
+static void
+sg_complete_v3v4(struct sg_fd *sfp, struct sg_request *srp, bool other_err)
+{
+	enum sg_rq_state sr_st = atomic_read(&srp->rq_st);
+
+	/* advance state machine, send signal to write-side if appropriate */
+	switch (srp->sh_var) {
+	case SG_SHR_RS_RQ:
+		{
+			int poll_type = POLL_OUT;
+			struct sg_fd *sh_sfp = sg_fd_share_ptr(sfp);
+
+			if ((srp->rq_result & SG_ML_RESULT_MSK) || other_err) {
+				set_bit(SG_FFD_READ_SIDE_ERR, sfp->ffd_bm);
+				if (sr_st != SG_RQ_BUSY)
+					sg_rq_chg_state_force(srp, SG_RQ_BUSY);
+				poll_type = POLL_HUP;   /* "Hang-UP" flag */
+			} else if (sr_st != SG_RQ_SHR_SWAP) {
+				sg_rq_chg_state_force(srp, SG_RQ_SHR_SWAP);
+			}
+			if (sh_sfp)
+				kill_fasync(&sh_sfp->async_qp, SIGPOLL,
+					    poll_type);
+		}
+		break;
+	case SG_SHR_WS_RQ:      /* cleanup both on write-side completion */
+		{
+			struct sg_fd *rs_sfp = sg_fd_share_ptr(sfp);
+
+			if (rs_sfp) {
+				rs_sfp->ws_srp = NULL;
+				if (rs_sfp->rsv_srp)
+					rs_sfp->rsv_srp->sh_var =
+							SG_SHR_RS_NOT_SRQ;
+			}
+		}
+		srp->sh_var = SG_SHR_WS_NOT_SRQ;
+		srp->sgatp = &srp->sgat_h;
+		if (sr_st != SG_RQ_BUSY)
+			sg_rq_chg_state_force(srp, SG_RQ_BUSY);
+		break;
+	default:
+		if (sr_st != SG_RQ_BUSY)
+			sg_rq_chg_state_force(srp, SG_RQ_BUSY);
+		break;
+	}
 }
 
 static int
@@ -1283,10 +1470,10 @@ sg_receive_v4(struct sg_fd *sfp, struct sg_request *srp, void __user *p,
 	h4p->duration = srp->duration;
 	switch (srp->s_hdr4.dir) {
 	case SG_DXFER_FROM_DEV:
-		h4p->din_xfer_len = srp->sgat_h.dlen;
+		h4p->din_xfer_len = srp->sgatp->dlen;
 		break;
 	case SG_DXFER_TO_DEV:
-		h4p->dout_xfer_len = srp->sgat_h.dlen;
+		h4p->dout_xfer_len = srp->sgatp->dlen;
 		break;
 	default:
 		break;
@@ -1302,6 +1489,7 @@ sg_receive_v4(struct sg_fd *sfp, struct sg_request *srp, void __user *p,
 		if (copy_to_user(p, h4p, SZ_SG_IO_V4))
 			err = err ? err : -EFAULT;
 	}
+	sg_complete_v3v4(sfp, srp, err < 0);
 	sg_finish_scsi_blk_rq(srp);
 	sg_deact_request(sfp, srp);
 	return err < 0 ? err : 0;
@@ -1317,7 +1505,7 @@ sg_receive_v4(struct sg_fd *sfp, struct sg_request *srp, void __user *p,
 static int
 sg_ctl_ioreceive(struct sg_fd *sfp, void __user *p)
 {
-	bool non_block = !!(sfp->filp->f_flags & O_NONBLOCK);
+	bool non_block = SG_IS_O_NONBLOCK(sfp);
 	bool use_tag = false;
 	int res, id;
 	int pack_id = SG_PACK_ID_WILDCARD;
@@ -1355,9 +1543,9 @@ sg_ctl_ioreceive(struct sg_fd *sfp, void __user *p)
 			return -ENODEV;
 		if (non_block)
 			return -EAGAIN;
-		res = wait_event_interruptible(sfp->read_wait,
-					       sg_get_ready_srp(sfp, &srp,
-								id, use_tag));
+		res = wait_event_interruptible
+				(sfp->read_wait,
+				 sg_get_ready_srp(sfp, &srp, id, use_tag));
 		if (unlikely(SG_IS_DETACHING(sdp)))
 			return -ENODEV;
 		if (res)
@@ -1380,7 +1568,7 @@ sg_ctl_ioreceive(struct sg_fd *sfp, void __user *p)
 static int
 sg_ctl_ioreceive_v3(struct sg_fd *sfp, void __user *p)
 {
-	bool non_block = !!(sfp->filp->f_flags & O_NONBLOCK);
+	bool non_block = SG_IS_O_NONBLOCK(sfp);
 	int res;
 	int pack_id = SG_PACK_ID_WILDCARD;
 	u8 v3_holder[SZ_SG_IO_HDR];
@@ -1566,6 +1754,19 @@ sg_read(struct file *filp, char __user *p, size_t count, loff_t *ppos)
 				ret = get_user(want_id, &h3_up->pack_id);
 				if (ret)
 					return ret;
+				if (!non_block) {
+					int flgs;
+
+					ret = get_user(flgs, &h3_up->flags);
+					if (ret)
+						return ret;
+					if (flgs & SGV4_FLAG_IMMED)
+						non_block = true;
+				}
+			} else if (v3_hdr->interface_id == 'Q') {
+				pr_info_once("sg: %s: v4 interface%s here\n",
+					     __func__, " disallowed");
+				return -EPERM;
 			} else {
 				return -EPERM;
 			}
@@ -1622,7 +1823,8 @@ sg_receive_v3(struct sg_fd *sfp, struct sg_request *srp, void __user *p)
 	struct sg_io_hdr hdr3;
 	struct sg_io_hdr *hp = &hdr3;
 
-	SG_LOG(3, sfp, "%s: srp=0x%pK\n", __func__, srp);
+	SG_LOG(3, sfp, "%s: sh_var: %s srp=0x%pK\n", __func__,
+	       sg_shr_str(srp->sh_var, false), srp);
 	err = sg_rec_state_v3v4(sfp, srp, false);
 	memset(hp, 0, sizeof(*hp));
 	memcpy(hp, &srp->s_hdr3, sizeof(srp->s_hdr3));
@@ -1687,92 +1889,192 @@ sg_calc_sgat_param(struct sg_device *sdp)
 }
 
 /*
- * Depending on which side is calling for the unshare, it is best to unshare
- * the other side first. For example: if the invocation is from the read-side
- * fd then rd_first should be false so the write-side is unshared first.
+ * Only valid for shared file descriptors, else -EINVAL. Should only be
+ * called after a read-side request has successfully completed so that
+ * there is valid data in the reserve buffer. If fini1_again0 is true then
+ * read-side is taken out of the state waiting for a write-side request and the
+ * read-side is put in the inactive state. If fini1_again0 is false (0) then
+ * the read-side (assuming it is inactive) is put in a state waiting for
+ * a write-side request. This function is called when the write mask is set on
+ * ioctl(SG_SET_GET_EXTENDED(SG_CTL_FLAGM_READ_SIDE_FINI)).
  */
+static int
+sg_change_after_read_side_rq(struct sg_fd *sfp, bool fini1_again0)
+{
+	int res = 0;
+	enum sg_rq_state sr_st;
+	unsigned long iflags;
+	struct sg_fd *rs_sfp;
+	struct sg_request *rs_rsv_srp = NULL;
+	struct sg_device *sdp = sfp->parentdp;
+
+	rs_sfp = sg_fd_share_ptr(sfp);
+	if (unlikely(!rs_sfp)) {
+		res = -EINVAL;
+	} else if (xa_get_mark(&sdp->sfp_arr, sfp->idx, SG_XA_FD_RS_SHARE)) {
+		rs_rsv_srp = sfp->rsv_srp;
+		rs_sfp = sfp;
+	} else {	/* else called on write-side */
+		rs_rsv_srp = rs_sfp->rsv_srp;
+	}
+	if (res || !rs_rsv_srp)
+		goto fini;
+
+	xa_lock_irqsave(&rs_sfp->srp_arr, iflags);
+	sr_st = atomic_read(&rs_rsv_srp->rq_st);
+	if (fini1_again0) {
+		switch (sr_st) {
+		case SG_RQ_SHR_SWAP:
+			rs_rsv_srp->sh_var = SG_SHR_RS_NOT_SRQ;
+			res = sg_rq_chg_state_ulck(rs_rsv_srp, sr_st,
+						   SG_RQ_INACTIVE);
+			if (!res)
+				atomic_inc(&rs_sfp->inactives);
+			break;
+		case SG_RQ_SHR_IN_WS:	/* too late, write-side rq active */
+		case SG_RQ_BUSY:
+			res = -EAGAIN;
+			break;
+		default:	/* any other read-side state is invalid */
+			res = -EINVAL;
+			break;
+		}
+	} else {
+		switch (sr_st) {
+		case SG_RQ_INACTIVE:
+			rs_rsv_srp->sh_var = SG_SHR_RS_RQ;
+			res = sg_rq_chg_state_ulck(rs_rsv_srp, sr_st,
+						   SG_RQ_SHR_SWAP);
+			break;
+		case SG_RQ_SHR_SWAP:
+			break;	/* already done, redundant call? */
+		default:	/* all other states */
+			res = -EBUSY;	/* read-side busy doing ... */
+			break;
+		}
+	}
+	xa_unlock_irqrestore(&rs_sfp->srp_arr, iflags);
+fini:
+	if (unlikely(res)) {
+		SG_LOG(1, sfp, "%s: err=%d\n", __func__, -res);
+	} else {
+		SG_LOG(6, sfp, "%s: okay, fini1_again0=%d\n", __func__,
+		       fini1_again0);
+	}
+	return res;
+}
+
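The transitions that sg_change_after_read_side_rq() applies to the read-side reserve request can be summarised as a small pure function. This is only a model under stated assumptions: the enum is a hypothetical copy of the SG_RQ_* state names, and locking, the sh_var updates and the inactives counter are all omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical copy of the SG_RQ_* state names. */
enum rq_state { RQ_INACTIVE, RQ_INFLIGHT, RQ_AWAIT_RCV, RQ_BUSY,
		RQ_SHR_SWAP, RQ_SHR_IN_WS };

/*
 * fini1_again0 true: finish with the read-side data (SHR_SWAP -> INACTIVE).
 * fini1_again0 false: offer the read-side data again (INACTIVE -> SHR_SWAP).
 * Returns 0 and stores the resulting state, or a negated errno value.
 */
static int read_side_change(bool fini1_again0, enum rq_state cur,
			    enum rq_state *new_st)
{
	*new_st = cur;
	if (fini1_again0) {
		switch (cur) {
		case RQ_SHR_SWAP:
			*new_st = RQ_INACTIVE;
			return 0;
		case RQ_SHR_IN_WS:	/* write-side request already active */
		case RQ_BUSY:
			return -EAGAIN;
		default:
			return -EINVAL;
		}
	}
	switch (cur) {
	case RQ_INACTIVE:
		*new_st = RQ_SHR_SWAP;
		return 0;
	case RQ_SHR_SWAP:
		return 0;		/* redundant call, nothing changes */
	default:
		return -EBUSY;
	}
}
```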
 static void
-sg_unshare_fds(struct sg_fd *rs_sfp, bool rs_lck, struct sg_fd *ws_sfp,
-	       bool ws_lck, bool rs_first)
+sg_unshare_rs_fd(struct sg_fd *rs_sfp, bool lck)
 {
-	bool diff_sdps = true;
 	unsigned long iflags = 0;
-	struct sg_device *sdp;
-	struct xarray *xap;
-
-	if (rs_lck && ws_lck &&  rs_sfp && ws_sfp &&
-	    rs_sfp->parentdp == ws_sfp->parentdp)
-		diff_sdps = false;
-	if (!rs_first && ws_sfp)
-		goto wr_first;
-rd_first:
-	if (rs_sfp) {
-		sdp = rs_sfp->parentdp;
-		xap = &sdp->sfp_arr;
-		rcu_assign_pointer(rs_sfp->share_sfp, NULL);
-		if (rs_lck && (rs_first || diff_sdps))
-			xa_lock_irqsave(xap, iflags);
-		__xa_set_mark(xap, rs_sfp->idx, SG_XA_FD_UNSHARED);
-		__xa_clear_mark(xap, rs_sfp->idx, SG_XA_FD_RS_SHARE);
-		if (rs_lck && (!rs_first || diff_sdps))
-			xa_unlock_irqrestore(xap, iflags);
-		kref_put(&sdp->d_ref, sg_device_destroy);
-	}
-	if (!rs_first || !ws_sfp)
-		return;
-wr_first:
-	if (ws_sfp) {
-		sdp = ws_sfp->parentdp;
-		xap = &sdp->sfp_arr;
-		rcu_assign_pointer(ws_sfp->share_sfp, NULL);
-		if (ws_lck && (!rs_first || diff_sdps))
-			xa_lock_irqsave(xap, iflags);
-		__xa_set_mark(xap, ws_sfp->idx, SG_XA_FD_UNSHARED);
-		/* SG_XA_FD_RS_SHARE mark should be already clear */
-		if (ws_lck && (rs_first || diff_sdps))
-			xa_unlock_irqrestore(xap, iflags);
-		kref_put(&sdp->d_ref, sg_device_destroy);
-	}
-	if (!rs_first && rs_sfp)
-		goto rd_first;
+	struct sg_device *sdp = rs_sfp->parentdp;
+	struct xarray *xadp = &sdp->sfp_arr;
+
+	rcu_assign_pointer(rs_sfp->share_sfp, NULL);
+	if (lck)
+		xa_lock_irqsave(xadp, iflags);
+	rs_sfp->ws_srp = NULL;
+	__xa_set_mark(xadp, rs_sfp->idx, SG_XA_FD_UNSHARED);
+	__xa_clear_mark(xadp, rs_sfp->idx, SG_XA_FD_RS_SHARE);
+	if (lck)
+		xa_unlock_irqrestore(xadp, iflags);
+	kref_put(&rs_sfp->f_ref, sg_remove_sfp);/* get: sg_find_sfp_helper() */
+}
+
+static void
+sg_unshare_ws_fd(struct sg_fd *ws_sfp, bool lck)
+{
+	unsigned long iflags = 0;
+	struct sg_device *sdp = ws_sfp->parentdp;
+	struct xarray *xadp = &sdp->sfp_arr;
+
+	rcu_assign_pointer(ws_sfp->share_sfp, NULL);
+	if (lck)
+		xa_lock_irqsave(xadp, iflags);
+	__xa_set_mark(xadp, ws_sfp->idx, SG_XA_FD_UNSHARED);
+	/* SG_XA_FD_RS_SHARE mark should be already clear */
+	if (lck)
+		xa_unlock_irqrestore(xadp, iflags);
+	kref_put(&ws_sfp->f_ref, sg_remove_sfp);/* get: sg_find_sfp_helper() */
 }
 
 /*
- * Clean up loose ends that occur when clsong a file descriptor which is
+ * Clean up loose ends that occur when closing a file descriptor which is
  * part of a file share. There may be request shares in various states using
- * this file share so care is needed.
+ * this file share so care is needed. A potential race, when both sides of
+ * an fd share have their fds closed (i.e. sg_release()) at around the same
+ * time, is the reason for rechecking the FD_RS_SHARE or FD_UNSHARED marks.
  */
 static void
 sg_remove_sfp_share(struct sg_fd *sfp, bool is_rd_side)
 {
+	__maybe_unused int res = 0;
 	unsigned long iflags;
-	struct sg_fd *o_sfp = sg_fd_shared_ptr(sfp);
-	struct sg_device *sdp;
-	struct xarray *xap;
+	enum sg_rq_state sr_st;
+	struct sg_device *sdp = sfp->parentdp;
+	struct sg_device *sh_sdp;
+	struct sg_fd *sh_sfp;
+	struct sg_request *rsv_srp = NULL;
+	struct sg_request *ws_srp;
+	struct xarray *xadp = &sdp->sfp_arr;
 
-	SG_LOG(3, sfp, "%s: sfp=0x%pK, o_sfp=0x%pK%s\n", __func__, sfp, o_sfp,
-	       (is_rd_side ? " read-side" : ""));
+	SG_LOG(3, sfp, "%s: sfp=%pK %s\n", __func__, sfp,
+	       (is_rd_side ? "read-side" : "write-side"));
+	xa_lock_irqsave(xadp, iflags);
+	sh_sfp = sg_fd_share_ptr(sfp);
+	if (!sg_fd_is_shared(sfp))
+		goto err_out;
+	sh_sdp = sh_sfp->parentdp;
 	if (is_rd_side) {
-		sdp = sfp->parentdp;
-		xap = &sdp->sfp_arr;
-		xa_lock_irqsave(xap, iflags);
-		if (!xa_get_mark(xap, sfp->idx, SG_XA_FD_RS_SHARE)) {
-			xa_unlock_irqrestore(xap, iflags);
+		bool set_inactive = false;
+
+		if (!xa_get_mark(xadp, sfp->idx, SG_XA_FD_RS_SHARE)) {
+			xa_unlock_irqrestore(xadp, iflags);
 			return;
 		}
-		sg_unshare_fds(sfp, false, NULL, false, true);
-		xa_unlock_irqrestore(&sdp->sfp_arr, iflags);
+		rsv_srp = sfp->rsv_srp;
+		if (!rsv_srp)
+			goto fini;
+		if (rsv_srp->sh_var != SG_SHR_RS_RQ)
+			goto fini;
+		sr_st = atomic_read(&rsv_srp->rq_st);
+		switch (sr_st) {
+		case SG_RQ_SHR_SWAP:
+			set_inactive = true;
+			break;
+		case SG_RQ_SHR_IN_WS:
+			ws_srp = sfp->ws_srp;
+			if (ws_srp && !IS_ERR(ws_srp)) {
+				ws_srp->sh_var = SG_SHR_WS_NOT_SRQ;
+				sfp->ws_srp = NULL;
+			}
+			set_inactive = true;
+			break;
+		default:
+			break;
+		}
+		rsv_srp->sh_var = SG_SHR_NONE;
+		if (set_inactive) {
+			res = sg_rq_chg_state_ulck(rsv_srp, sr_st, SG_RQ_INACTIVE);
+			if (!res)
+				atomic_inc(&sfp->inactives);
+		}
+fini:
+		if (!xa_get_mark(&sh_sdp->sfp_arr, sh_sfp->idx,
+				 SG_XA_FD_FREE) && sg_fd_is_shared(sh_sfp))
+			sg_unshare_ws_fd(sh_sfp, sdp != sh_sdp);
+		sg_unshare_rs_fd(sfp, false);
 	} else {
-		sdp = sfp->parentdp;
-		xap = &sdp->sfp_arr;
-		xa_lock_irqsave(xap, iflags);
-		if (xa_get_mark(xap, sfp->idx, SG_XA_FD_UNSHARED)) {
-			xa_unlock_irqrestore(xap, iflags);
+		if (!sg_fd_is_shared(sfp)) {
+			xa_unlock_irqrestore(xadp, iflags);
 			return;
-		}
-		sg_unshare_fds(NULL, false, sfp, false, false);
-		xa_unlock_irqrestore(xap, iflags);
+		} else if (!xa_get_mark(&sh_sdp->sfp_arr, sh_sfp->idx,
+					SG_XA_FD_FREE)) {
+			sg_unshare_rs_fd(sh_sfp, sdp != sh_sdp);
+		}
+		sg_unshare_ws_fd(sfp, false);
 	}
+err_out:
+	xa_unlock_irqrestore(xadp, iflags);
 }
 
 /*
@@ -1782,41 +2084,45 @@ sg_remove_sfp_share(struct sg_fd *sfp, bool is_rd_side)
  */
 static void
 sg_do_unshare(struct sg_fd *sfp, bool unshare_val)
+		__must_hold(sfp->f_mutex)
 {
 	bool retry;
 	int retry_count = 0;
-	unsigned long iflags;
+	struct sg_request *rs_rsv_srp;
 	struct sg_fd *rs_sfp;
 	struct sg_fd *ws_sfp;
-	struct sg_fd *o_sfp = sg_fd_shared_ptr(sfp);
+	struct sg_fd *o_sfp = sg_fd_share_ptr(sfp);
 	struct sg_device *sdp = sfp->parentdp;
 
 	if (xa_get_mark(&sdp->sfp_arr, sfp->idx, SG_XA_FD_UNSHARED)) {
 		SG_LOG(1, sfp, "%s: not shared ? ?\n", __func__);
-		return; /* no share to undo */
+		return;	/* no share to undo */
 	}
 	if (!unshare_val)
-		return;
+		return;		/* when unshare value is zero, it's a NOP */
 again:
 	retry = false;
-	xa_lock_irqsave(&sfp->srp_arr, iflags);
 	if (xa_get_mark(&sdp->sfp_arr, sfp->idx, SG_XA_FD_RS_SHARE)) {
 		rs_sfp = sfp;
 		ws_sfp = o_sfp;
-		if (!xa_trylock(&ws_sfp->srp_arr)) {
-			if (++retry_count > SG_ADD_RQ_MAX_RETRIES)
-				SG_LOG(1, sfp, "%s: cannot get write-side lock\n",
-				       __func__);
-			else
-				retry = true;
-			goto fini;
+		rs_rsv_srp = rs_sfp->rsv_srp;
+		if (rs_rsv_srp && rs_rsv_srp->sh_var != SG_SHR_RS_RQ) {
+			if (unlikely(!mutex_trylock(&ws_sfp->f_mutex))) {
+				if (++retry_count > SG_ADD_RQ_MAX_RETRIES)
+					SG_LOG(1, sfp,
+					       "%s: cannot get write-side lock\n",
+					       __func__);
+				else
+					retry = true;
+				goto fini;
+			}
+			sg_unshare_rs_fd(rs_sfp, true);
+			mutex_unlock(&ws_sfp->f_mutex);
 		}
-		sg_unshare_fds(rs_sfp, false, ws_sfp, false, false);
-		xa_unlock(&ws_sfp->srp_arr);
 	} else {			/* called on write-side fd */
 		rs_sfp = o_sfp;
 		ws_sfp = sfp;
-		if (!xa_trylock(&rs_sfp->srp_arr)) {
+		if (unlikely(!mutex_trylock(&rs_sfp->f_mutex))) {
 			if (++retry_count > SG_ADD_RQ_MAX_RETRIES)
 				SG_LOG(1, sfp, "%s: cannot get read side lock\n",
 				       __func__);
@@ -1824,12 +2130,15 @@ sg_do_unshare(struct sg_fd *sfp, bool unshare_val)
 				retry = true;
 			goto fini;
 		}
-		sg_unshare_fds(rs_sfp, false, ws_sfp, false, true);
-		xa_unlock(&rs_sfp->srp_arr);
+		rs_rsv_srp = rs_sfp->rsv_srp;
+		if (rs_rsv_srp && rs_rsv_srp->sh_var != SG_SHR_RS_RQ) {
+			sg_unshare_rs_fd(rs_sfp, true);
+			sg_unshare_ws_fd(ws_sfp, true);
+		}
+		mutex_unlock(&rs_sfp->f_mutex);
 	}
 fini:
-	xa_unlock_irqrestore(&sfp->srp_arr, iflags);
-	if (retry) {
+	if (unlikely(retry)) {
 		cpu_relax();
 		goto again;
 	}
@@ -1876,12 +2185,14 @@ sg_get_dur(struct sg_request *srp, const enum sg_rq_state *sr_stp,
 	u32 res = U32_MAX;
 
 	switch (sr_stp ? *sr_stp : atomic_read(&srp->rq_st)) {
-	case SG_RS_INFLIGHT:
-	case SG_RS_BUSY:
+	case SG_RQ_INFLIGHT:
+	case SG_RQ_BUSY:
 		res = sg_calc_rq_dur(srp, time_in_ns);
 		break;
-	case SG_RS_AWAIT_RCV:
-	case SG_RS_INACTIVE:
+	case SG_RQ_AWAIT_RCV:
+	case SG_RQ_SHR_SWAP:
+	case SG_RQ_SHR_IN_WS:
+	case SG_RQ_INACTIVE:
 		res = srp->duration;
 		is_dur = true;	/* completion has occurred, timing finished */
 		break;
@@ -1917,7 +2228,7 @@ sg_fill_request_element(struct sg_fd *sfp, struct sg_request *srp,
 static inline bool
 sg_rq_landed(struct sg_device *sdp, struct sg_request *srp)
 {
-	return atomic_read_acquire(&srp->rq_st) != SG_RS_INFLIGHT ||
+	return atomic_read_acquire(&srp->rq_st) != SG_RQ_INFLIGHT ||
 	       unlikely(SG_IS_DETACHING(sdp));
 }
 
@@ -1933,7 +2244,7 @@ sg_wait_event_srp(struct sg_fd *sfp, void __user *p, struct sg_io_v4 *h4p,
 	enum sg_rq_state sr_st;
 	struct sg_device *sdp = sfp->parentdp;
 
-	if (atomic_read(&srp->rq_st) != SG_RS_INFLIGHT)
+	if (atomic_read(&srp->rq_st) != SG_RQ_INFLIGHT)
 		goto skip_wait;		/* and skip _acquire() */
 	if (srp->rq_flags & SGV4_FLAG_HIPRI) {
 		/* call blk_poll(), spinning till found */
@@ -1949,24 +2260,25 @@ sg_wait_event_srp(struct sg_fd *sfp, void __user *p, struct sg_io_v4 *h4p,
 	if (unlikely(res)) { /* -ERESTARTSYS because signal hit thread */
 		set_bit(SG_FRQ_IS_ORPHAN, srp->frq_bm);
 		/* orphans harvested when sfp->keep_orphan is false */
-		atomic_set(&srp->rq_st, SG_RS_INFLIGHT);
-		SG_LOG(1, sfp, "%s:  wait_event_interruptible gave %d\n",
-		       __func__, res);
+		sg_rq_chg_state_force(srp, SG_RQ_INFLIGHT);
+		SG_LOG(1, sfp, "%s:  wait_event_interruptible(): %s[%d]\n",
+		       __func__, (res == -ERESTARTSYS ? "ERESTARTSYS" : ""),
+		       res);
 		return res;
 	}
 skip_wait:
 	if (unlikely(SG_IS_DETACHING(sdp))) {
-		sg_rq_chg_state_force(srp, SG_RS_INACTIVE);
+		sg_rq_chg_state_force(srp, SG_RQ_INACTIVE);
 		atomic_inc(&sfp->inactives);
 		return -ENODEV;
 	}
 	sr_st = atomic_read(&srp->rq_st);
-	if (unlikely(sr_st != SG_RS_AWAIT_RCV))
+	if (unlikely(sr_st != SG_RQ_AWAIT_RCV))
 		return -EPROTO;         /* Logic error */
-	res = sg_rq_chg_state(srp, sr_st, SG_RS_BUSY);
+	res = sg_rq_chg_state(srp, sr_st, SG_RQ_BUSY);
 	if (unlikely(res)) {
 #if IS_ENABLED(SG_LOG_ACTIVE)
-		sg_rq_state_fail_msg(sfp, sr_st, SG_RS_BUSY, __func__);
+		sg_rq_state_fail_msg(sfp, sr_st, SG_RQ_BUSY, __func__);
 #endif
 		return res;
 	}
@@ -1991,8 +2303,7 @@ sg_ctl_sg_io(struct sg_device *sdp, struct sg_fd *sfp, void __user *p)
 	struct sg_io_v4 *h4p = (struct sg_io_v4 *)hu8arr;
 
 	SG_LOG(3, sfp, "%s:  SG_IO%s\n", __func__,
-	       ((sfp->filp->f_flags & O_NONBLOCK) ? " O_NONBLOCK ignored" :
-						    ""));
+	       (SG_IS_O_NONBLOCK(sfp) ? " O_NONBLOCK ignored" : ""));
 	res = sg_allow_if_err_recovery(sdp, false);
 	if (res)
 		return res;
@@ -2017,14 +2328,20 @@ sg_ctl_sg_io(struct sg_device *sdp, struct sg_fd *sfp, void __user *p)
 	if (!srp)	/* mrq case: already processed all responses */
 		return res;
 	res = sg_wait_event_srp(sfp, p, h4p, srp);
-	if (res)
-		SG_LOG(1, sfp, "%s: %s=0x%pK  state: %s\n", __func__,
-		       "unexpected srp", srp,
-		       sg_rq_st_str(atomic_read(&srp->rq_st), false));
+#if IS_ENABLED(SG_LOG_ACTIVE)
+	if (unlikely(res))
+		SG_LOG(1, sfp, "%s: %s=0x%pK  state: %s, share: %s\n",
+		       __func__, "unexpected srp", srp,
+		       sg_rq_st_str(atomic_read(&srp->rq_st), false),
+		       sg_shr_str(srp->sh_var, false));
+#endif
 	return res;
 }
 
-/* When use_tag is true then id is a tag, else it is a pack_id. */
+/*
+ * When use_tag is true then id is a tag, else it is a pack_id. Returns
+ * valid srp if match, else returns NULL.
+ */
 static struct sg_request *
 sg_match_request(struct sg_fd *sfp, bool use_tag, int id)
 {
@@ -2056,6 +2373,7 @@ sg_match_request(struct sg_fd *sfp, bool use_tag, int id)
 
 static int
 sg_ctl_abort(struct sg_device *sdp, struct sg_fd *sfp, void __user *p)
+		__must_hold(sfp->f_mutex)
 {
 	bool use_tag;
 	int res, pack_id, tag, id;
@@ -2078,6 +2396,8 @@ sg_ctl_abort(struct sg_device *sdp, struct sg_fd *sfp, void __user *p)
 	srp = sg_match_request(sfp, use_tag, id);
 	if (!srp) {	/* assume device (not just fd) scope */
 		xa_unlock_irqrestore(&sfp->srp_arr, iflags);
+		if (!(h4p->flags & SGV4_FLAG_DEV_SCOPE))
+			return -ENODATA;
 		xa_for_each(&sdp->sfp_arr, idx, o_sfp) {
 			if (o_sfp == sfp)
 				continue;	/* already checked */
@@ -2095,18 +2415,20 @@ sg_ctl_abort(struct sg_device *sdp, struct sg_fd *sfp, void __user *p)
 	set_bit(SG_FRQ_ABORTING, srp->frq_bm);
 	res = 0;
 	switch (atomic_read(&srp->rq_st)) {
-	case SG_RS_BUSY:
+	case SG_RQ_BUSY:
 		clear_bit(SG_FRQ_ABORTING, srp->frq_bm);
-		res = -EBUSY;	/* shouldn't occur often */
+		res = -EBUSY;	/* should not occur often */
 		break;
-	case SG_RS_INACTIVE:	/* inactive on rq_list not good */
+	case SG_RQ_INACTIVE:	/* inactive on rq_list not good */
 		clear_bit(SG_FRQ_ABORTING, srp->frq_bm);
 		res = -EPROTO;
 		break;
-	case SG_RS_AWAIT_RCV:	/* user should still do completion */
+	case SG_RQ_AWAIT_RCV:	/* user should still do completion */
+	case SG_RQ_SHR_SWAP:
+	case SG_RQ_SHR_IN_WS:
 		clear_bit(SG_FRQ_ABORTING, srp->frq_bm);
 		break;		/* nothing to do here, return 0 */
-	case SG_RS_INFLIGHT:	/* only attempt abort if inflight */
+	case SG_RQ_INFLIGHT:	/* only attempt abort if inflight */
 		srp->rq_result |= (DRIVER_SOFT << 24);
 		{
 			struct request *rqq = READ_ONCE(srp->rqq);
@@ -2160,7 +2482,7 @@ sg_find_sfp_helper(struct sg_fd *from_sfp, struct sg_fd *pair_sfp,
 	xa_lock_irqsave(&from_sdp->sfp_arr, iflags);
 	rcu_assign_pointer(from_sfp->share_sfp, pair_sfp);
 	__xa_clear_mark(&from_sdp->sfp_arr, from_sfp->idx, SG_XA_FD_UNSHARED);
-	kref_get(&from_sdp->d_ref);	/* treat share like pseudo open() */
+	kref_get(&from_sfp->f_ref);	/* so unshare done before release */
 	if (from_rd_side)
 		__xa_set_mark(&from_sdp->sfp_arr, from_sfp->idx,
 			      SG_XA_FD_RS_SHARE);
@@ -2176,7 +2498,7 @@ sg_find_sfp_helper(struct sg_fd *from_sfp, struct sg_fd *pair_sfp,
 	if (!from_rd_side)
 		__xa_set_mark(&pair_sdp->sfp_arr, pair_sfp->idx,
 			      SG_XA_FD_RS_SHARE);
-	kref_get(&pair_sdp->d_ref);	/* keep symmetry */
+	kref_get(&pair_sfp->f_ref);	/* keep symmetry */
 	xa_unlock_irqrestore(&pair_sdp->sfp_arr, iflags);
 	return 0;
 }
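Switching from `kref_get(&...->d_ref)` to `kref_get(&...->f_ref)` means each side of a share pins its own file descriptor object. Neither fd can reach its release path until the unshare drops that extra reference. The following user-space C sketch models that ordering with a plain integer counter. `struct fd_obj`, `fd_get()`, `fd_put()`, `share_pair()` and `unshare_pair()` are hypothetical stand-ins, not driver functions, and no locking is modelled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sg_fd: a refcount plus a share pointer. */
struct fd_obj {
	int ref;              /* models kref f_ref; 1 after open() */
	bool released;        /* set when ref reaches zero */
	struct fd_obj *share; /* models share_sfp */
};

static void fd_get(struct fd_obj *f)
{
	f->ref++;
}

static void fd_put(struct fd_obj *f)
{
	if (--f->ref == 0)
		f->released = true; /* models the release path (sg_remove_sfp) */
}

/* Pair two fds; each side takes an extra reference on itself. */
static void share_pair(struct fd_obj *rs, struct fd_obj *ws)
{
	rs->share = ws;
	ws->share = rs;
	fd_get(rs);
	fd_get(ws);
}

/* Break the pair and drop both share references. */
static void unshare_pair(struct fd_obj *f)
{
	struct fd_obj *other = f->share;

	if (!other)
		return;
	f->share = NULL;
	other->share = NULL;
	fd_put(f);
	fd_put(other);
}
```

For example, suppose two fds are each opened with one reference and then paired. Closing the read-side leaves it unreleased until `unshare_pair()` runs.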
@@ -2336,7 +2658,7 @@ sg_fd_reshare(struct sg_fd *rs_sfp, int new_ws_fd)
 	int res = 0;
 	int retry_count = 0;
 	struct file *filp;
-	struct sg_fd *ws_sfp = sg_fd_shared_ptr(rs_sfp);
+	struct sg_fd *ws_sfp = sg_fd_share_ptr(rs_sfp);
 
 	SG_LOG(3, ws_sfp, "%s:  new_write_side_fd: %d\n", __func__, new_ws_fd);
 	if (unlikely(!capable(CAP_SYS_ADMIN) || !capable(CAP_SYS_RAWIO)))
@@ -2357,7 +2679,7 @@ sg_fd_reshare(struct sg_fd *rs_sfp, int new_ws_fd)
 	}
 	SG_LOG(6, ws_sfp, "%s: write-side fd ok, scan for filp=0x%pK\n", __func__,
 	       filp);
-	sg_unshare_fds(NULL, false, ws_sfp, false, false);
+	sg_unshare_ws_fd(ws_sfp, false);
 again:
 	ws_sfp = sg_find_sfp_by_fd(filp, new_ws_fd, rs_sfp, true);
 	if (IS_ERR(ws_sfp)) {
@@ -2386,7 +2708,7 @@ sg_fd_reshare(struct sg_fd *rs_sfp, int new_ws_fd)
  * First normalize want_rsv_sz to be >= sfp->sgat_elem_sz and
  * <= max_segment_size. Exit if that is the same as old size; otherwise
  * create a new candidate request of the new size. Then decide whether to
- * re-use an existing free list request (least buflen >= required size) or
+ * re-use an existing inactive request (least buflen >= required size) or
  * use the new candidate. If new one used, leave old one but it is no longer
  * the reserved request. Returns 0 on success, else a negated errno value.
  */
@@ -2404,12 +2726,15 @@ sg_set_reserved_sz(struct sg_fd *sfp, int want_rsv_sz)
 	struct sg_device *sdp = sfp->parentdp;
 	struct xarray *xafp = &sfp->srp_arr;
 
+	if (unlikely(!xa_get_mark(&sfp->parentdp->sfp_arr, sfp->idx,
+				  SG_XA_FD_UNSHARED)))
+		return -EBUSY;	/* this fd can't be either side of share */
 	o_srp = sfp->rsv_srp;
 	if (!o_srp)
 		return -EPROTO;
 	new_sz = min_t(int, want_rsv_sz, sdp->max_sgat_sz);
 	new_sz = max_t(int, new_sz, sfp->sgat_elem_sz);
-	blen = o_srp->sgat_h.buflen;
+	blen = o_srp->sgatp->buflen;
 	SG_LOG(3, sfp, "%s: was=%d, ask=%d, new=%d (sgat_elem_sz=%d)\n",
 	       __func__, blen, want_rsv_sz, new_sz, sfp->sgat_elem_sz);
 	if (blen == new_sz)
@@ -2424,15 +2749,14 @@ sg_set_reserved_sz(struct sg_fd *sfp, int want_rsv_sz)
 		res = -EPROTO;
 		goto fini;
 	}
-	if (SG_RS_ACTIVE(o_srp) || sfp->mmap_sz > 0) {
+	if (SG_RQ_ACTIVE(o_srp) || sfp->mmap_sz > 0) {
 		res = -EBUSY;
 		goto fini;
 	}
 	use_new_srp = true;
 	xa_for_each(xafp, idx, t_srp) {
-		if (t_srp != o_srp && new_sz <= t_srp->sgat_h.buflen &&
-		    !SG_RS_ACTIVE(t_srp)) {
-			/* good candidate on free list, use */
+		if (t_srp != o_srp && new_sz <= t_srp->sgatp->buflen &&
+		    !SG_RQ_ACTIVE(t_srp)) {
 			use_new_srp = false;
 			sfp->rsv_srp = t_srp;
 			break;
@@ -2447,7 +2771,7 @@ sg_set_reserved_sz(struct sg_fd *sfp, int want_rsv_sz)
 		cxc_srp = __xa_cmpxchg(xafp, idx, o_srp, n_srp, GFP_ATOMIC);
 		if (o_srp == cxc_srp) {
 			sfp->rsv_srp = n_srp;
-			sg_rq_chg_state_force_ulck(n_srp, SG_RS_INACTIVE);
+			sg_rq_chg_state_force_ulck(n_srp, SG_RQ_INACTIVE);
 			/* don't bump inactives, since replaced an inactive */
 			xa_unlock_irqrestore(xafp, iflags);
 			SG_LOG(6, sfp, "%s: new rsv srp=0x%pK ++\n", __func__,
@@ -2496,6 +2820,27 @@ static int put_compat_request_table(struct compat_sg_req_info __user *o,
 }
 #endif
 
+static bool
+sg_any_persistent_orphans(struct sg_fd *sfp)
+{
+	if (test_bit(SG_FFD_KEEP_ORPHAN, sfp->ffd_bm)) {
+		int num_waiting = atomic_read(&sfp->waiting);
+		unsigned long idx;
+		struct sg_request *srp;
+		struct xarray *xafp = &sfp->srp_arr;
+
+		if (num_waiting < 1)
+			return false;
+		xa_for_each_marked(xafp, idx, srp, SG_XA_RQ_AWAIT) {
+			if (unlikely(!srp))
+				continue;
+			if (test_bit(SG_FRQ_IS_ORPHAN, srp->frq_bm))
+				return true;
+		}
+	}
+	return false;
+}
+
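`sg_any_persistent_orphans()` only scans when orphans can persist (`SG_FFD_KEEP_ORPHAN`) and the `waiting` counter is non-zero. Even then, it only visits requests that carry the await mark. Below is a minimal user-space sketch of the same check. It assumes a plain array stands in for the xarray and its `SG_XA_RQ_AWAIT` mark; `struct req` and `any_persistent_orphans()` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified request record. */
struct req {
	bool await_mark; /* models the SG_XA_RQ_AWAIT xarray mark */
	bool is_orphan;  /* models SG_FRQ_IS_ORPHAN */
};

static bool any_persistent_orphans(bool keep_orphan, int num_waiting,
				   const struct req *reqs, size_t n)
{
	/* cheap checks first: no scan unless orphans may persist and wait */
	if (!keep_orphan || num_waiting < 1)
		return false;
	for (size_t i = 0; i < n; i++) {
		/* only requests awaiting receive are candidates */
		if (reqs[i].await_mark && reqs[i].is_orphan)
			return true;
	}
	return false;
}
```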
 /*
  * Processing of ioctl(SG_SET_GET_EXTENDED(SG_SEIM_CTL_FLAGS)) which is a set
  * of boolean flags. Access abbreviations: [rw], read-write; [ro], read-only;
@@ -2509,6 +2854,7 @@ sg_extended_bool_flags(struct sg_fd *sfp, struct sg_extended_info *seip)
 	const u32 c_flgs_rm = seip->ctl_flags_rd_mask;
 	const u32 c_flgs_val_in = seip->ctl_flags;
 	u32 c_flgs_val_out = c_flgs_val_in;
+	struct sg_fd *rs_sfp;
 	struct sg_device *sdp = sfp->parentdp;
 
 	/* TIME_IN_NS boolean, [raw] time in nanoseconds (def: millisecs) */
@@ -2531,6 +2877,13 @@ sg_extended_bool_flags(struct sg_fd *sfp, struct sg_extended_info *seip)
 		else
 			c_flgs_val_out &= ~SG_CTL_FLAGM_TAG_FOR_PACK_ID;
 	}
+	/* ORPHANS boolean, [ro] does this fd have any orphan requests? */
+	if (c_flgs_rm & SG_CTL_FLAGM_ORPHANS) {
+		if (sg_any_persistent_orphans(sfp))
+			c_flgs_val_out |= SG_CTL_FLAGM_ORPHANS;
+		else
+			c_flgs_val_out &= ~SG_CTL_FLAGM_ORPHANS;
+	}
 	/* OTHER_OPENS boolean, [ro] any other sg open fds on this dev? */
 	if (c_flgs_rm & SG_CTL_FLAGM_OTHER_OPENS) {
 		if (atomic_read(&sdp->open_cnt) > 1)
@@ -2554,10 +2907,58 @@ sg_extended_bool_flags(struct sg_fd *sfp, struct sg_extended_info *seip)
 	 * a shared commands is inflight, waits a little while for it
 	 * to finish.
 	 */
-	if (c_flgs_wm & SG_CTL_FLAGM_UNSHARE)
+	if (c_flgs_wm & SG_CTL_FLAGM_UNSHARE) {
+		mutex_lock(&sfp->f_mutex);
 		sg_do_unshare(sfp, !!(c_flgs_val_in & SG_CTL_FLAGM_UNSHARE));
+		mutex_unlock(&sfp->f_mutex);
+	}
 	if (c_flgs_rm & SG_CTL_FLAGM_UNSHARE)
-		c_flgs_val_out &= ~SG_CTL_FLAGM_UNSHARE;   /* clear bit */
+		c_flgs_val_out &= ~SG_CTL_FLAGM_UNSHARE;	/* clear bit */
+	/* IS_SHARE boolean: [ro] true if fd may be read-side or write-side share */
+	if (c_flgs_rm & SG_CTL_FLAGM_IS_SHARE) {
+		if (xa_get_mark(&sdp->sfp_arr, sfp->idx, SG_XA_FD_UNSHARED))
+			c_flgs_val_out &= ~SG_CTL_FLAGM_IS_SHARE;
+		else
+			c_flgs_val_out |= SG_CTL_FLAGM_IS_SHARE;
+	}
+	/* IS_READ_SIDE boolean: [ro] true if this fd may be a read-side share */
+	if (c_flgs_rm & SG_CTL_FLAGM_IS_READ_SIDE) {
+		if (xa_get_mark(&sdp->sfp_arr, sfp->idx, SG_XA_FD_RS_SHARE))
+			c_flgs_val_out |= SG_CTL_FLAGM_IS_READ_SIDE;
+		else
+			c_flgs_val_out &= ~SG_CTL_FLAGM_IS_READ_SIDE;
+	}
+	/*
+	 * READ_SIDE_FINI boolean, [rbw] intended for the write-side. On read:
+	 * 1 --> read-side has finished and awaits write-side action. On
+	 * write: 1 --> write-side does not want to continue.
+	 */
+	if (c_flgs_rm & SG_CTL_FLAGM_READ_SIDE_FINI) {
+		rs_sfp = sg_fd_share_ptr(sfp);
+		if (rs_sfp && rs_sfp->rsv_srp) {
+			struct sg_request *res_srp = rs_sfp->rsv_srp;
+
+			if (atomic_read(&res_srp->rq_st) == SG_RQ_SHR_SWAP)
+				c_flgs_val_out |= SG_CTL_FLAGM_READ_SIDE_FINI;
+			else
+				c_flgs_val_out &= ~SG_CTL_FLAGM_READ_SIDE_FINI;
+		} else {
+			c_flgs_val_out &= ~SG_CTL_FLAGM_READ_SIDE_FINI;
+		}
+	}
+	if (c_flgs_wm & SG_CTL_FLAGM_READ_SIDE_FINI) {
+		bool rs_fini_wm = !!(c_flgs_val_in & SG_CTL_FLAGM_READ_SIDE_FINI);
+
+		sg_change_after_read_side_rq(sfp, rs_fini_wm);
+	}
+	/* READ_SIDE_ERR boolean, [ro] share: read-side finished with error */
+	if (c_flgs_rm & SG_CTL_FLAGM_READ_SIDE_ERR) {
+		rs_sfp = sg_fd_share_ptr(sfp);
+		if (rs_sfp && test_bit(SG_FFD_READ_SIDE_ERR, rs_sfp->ffd_bm))
+			c_flgs_val_out |= SG_CTL_FLAGM_READ_SIDE_ERR;
+		else
+			c_flgs_val_out &= ~SG_CTL_FLAGM_READ_SIDE_ERR;
+	}
 	/* NO_DURATION boolean, [rbw] */
 	if (c_flgs_rm & SG_CTL_FLAGM_NO_DURATION)
 		flg = test_bit(SG_FFD_NO_DURATION, sfp->ffd_bm);
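Each read-only boolean in `sg_extended_bool_flags()` follows the same pattern. The output bit is updated only if the caller set it in `ctl_flags_rd_mask`; otherwise the caller's value passes through unchanged. The sketch below factors that pattern into a helper and applies it to the two share-status flags, with IS_SHARE derived as the inverse of the "unshared" mark. The bit values and the names `report_ro_flag()` and `fill_share_flags()` are hypothetical. They are not the real `SG_CTL_FLAGM_*` definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values, not the uapi SG_CTL_FLAGM_* constants. */
#define FLAGM_IS_SHARE     0x1u
#define FLAGM_IS_READ_SIDE 0x2u

/* Update one read-only flag only if the caller asked for it. */
static uint32_t report_ro_flag(uint32_t rd_mask, uint32_t val_out,
			       uint32_t flag, bool state)
{
	if (rd_mask & flag) {
		if (state)
			val_out |= flag;
		else
			val_out &= ~flag;
	}
	return val_out;
}

static uint32_t fill_share_flags(uint32_t rd_mask, uint32_t val_in,
				 bool xa_unshared, bool xa_rs_share)
{
	uint32_t out = val_in;

	/* IS_SHARE is the inverse of the per-fd "unshared" mark */
	out = report_ro_flag(rd_mask, out, FLAGM_IS_SHARE, !xa_unshared);
	out = report_ro_flag(rd_mask, out, FLAGM_IS_READ_SIDE, xa_rs_share);
	return out;
}
```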
@@ -2700,7 +3101,7 @@ sg_ctl_extended(struct sg_fd *sfp, void __user *p)
 		}
 		/* if share then yield device number of (other) read-side */
 		if (s_rd_mask & SG_SEIM_SHARE_FD) {
-			struct sg_fd *sh_sfp = sg_fd_shared_ptr(sfp);
+			struct sg_fd *sh_sfp = sg_fd_share_ptr(sfp);
 
 			seip->share_fd = sh_sfp ? sh_sfp->parentdp->index :
 						   U32_MAX;
@@ -2717,7 +3118,7 @@ sg_ctl_extended(struct sg_fd *sfp, void __user *p)
 		}
 		/* if share then yield device number of (other) write-side */
 		if (s_rd_mask & SG_SEIM_CHG_SHARE_FD) {
-			struct sg_fd *sh_sfp = sg_fd_shared_ptr(sfp);
+			struct sg_fd *sh_sfp = sg_fd_share_ptr(sfp);
 
 			seip->share_fd = sh_sfp ? sh_sfp->parentdp->index :
 						  U32_MAX;
@@ -2766,7 +3167,7 @@ sg_ctl_extended(struct sg_fd *sfp, void __user *p)
 	}
 	if (s_rd_mask & SG_SEIM_RESERVED_SIZE)
 		seip->reserved_sz = (u32)min_t(int,
-					       sfp->rsv_srp->sgat_h.buflen,
+					       sfp->rsv_srp->sgatp->buflen,
 					       sdp->max_sgat_sz);
 	/* copy to user space if int or boolean read mask non-zero */
 	if (s_rd_mask || seip->ctl_flags_rd_mask) {
@@ -2863,27 +3264,37 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 
 	SG_LOG(6, sfp, "%s: cmd=0x%x, O_NONBLOCK=%d\n", __func__, cmd_in,
 	       !!(filp->f_flags & O_NONBLOCK));
-	if (unlikely(SG_IS_DETACHING(sdp)))
-		return -ENODEV;
 	sdev = sdp->device;
 
 	switch (cmd_in) {
 	case SG_IO:
+		if (unlikely(SG_IS_DETACHING(sdp)))
+			return -ENODEV;
 		return sg_ctl_sg_io(sdp, sfp, p);
 	case SG_IOSUBMIT:
 		SG_LOG(3, sfp, "%s:    SG_IOSUBMIT\n", __func__);
+		if (unlikely(SG_IS_DETACHING(sdp)))
+			return -ENODEV;
 		return sg_ctl_iosubmit(sfp, p);
 	case SG_IOSUBMIT_V3:
 		SG_LOG(3, sfp, "%s:    SG_IOSUBMIT_V3\n", __func__);
+		if (unlikely(SG_IS_DETACHING(sdp)))
+			return -ENODEV;
 		return sg_ctl_iosubmit_v3(sfp, p);
 	case SG_IORECEIVE:
 		SG_LOG(3, sfp, "%s:    SG_IORECEIVE\n", __func__);
+		if (unlikely(SG_IS_DETACHING(sdp)))
+			return -ENODEV;
 		return sg_ctl_ioreceive(sfp, p);
 	case SG_IORECEIVE_V3:
 		SG_LOG(3, sfp, "%s:    SG_IORECEIVE_V3\n", __func__);
+		if (unlikely(SG_IS_DETACHING(sdp)))
+			return -ENODEV;
 		return sg_ctl_ioreceive_v3(sfp, p);
 	case SG_IOABORT:
 		SG_LOG(3, sfp, "%s:    SG_IOABORT\n", __func__);
+		if (unlikely(SG_IS_DETACHING(sdp)))
+			return -ENODEV;
 		if (read_only)
 			return -EPERM;
 		mutex_lock(&sfp->f_mutex);
@@ -2949,7 +3360,7 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 		return res;
 	case SG_GET_RESERVED_SIZE:
 		mutex_lock(&sfp->f_mutex);
-		val = min_t(int, sfp->rsv_srp->sgat_h.buflen,
+		val = min_t(int, sfp->rsv_srp->sgatp->buflen,
 			    sdp->max_sgat_sz);
 		mutex_unlock(&sfp->f_mutex);
 		SG_LOG(3, sfp, "%s:    SG_GET_RESERVED_SIZE=%d\n",
@@ -3149,11 +3560,11 @@ sg_srp_q_blk_poll(struct sg_request *srp, struct request_queue *q, int loop_coun
 
 	num = (loop_count < 1) ? 1 : loop_count;
 	for (k = 0; k < num; ++k) {
-		if (atomic_read(&srp->rq_st) != SG_RS_INFLIGHT)
+		if (atomic_read(&srp->rq_st) != SG_RQ_INFLIGHT)
 			return -ENODATA;
 		n = blk_poll(q, srp->cookie, loop_count < 0 /* spin if negative */);
 		if (n > 0)
-			return atomic_read(&srp->rq_st) == SG_RS_AWAIT_RCV;
+			return atomic_read(&srp->rq_st) == SG_RQ_AWAIT_RCV;
 		if (n < 0)
 			return n;
 	}
@@ -3183,7 +3594,7 @@ sg_sfp_blk_poll(struct sg_fd *sfp, int loop_count)
 	xa_for_each(xafp, idx, srp) {
 		if ((srp->rq_flags & SGV4_FLAG_HIPRI) &&
 		    !test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm) &&
-		    atomic_read(&srp->rq_st) == SG_RS_INFLIGHT &&
+		    atomic_read(&srp->rq_st) == SG_RQ_INFLIGHT &&
 		    test_bit(SG_FRQ_ISSUED, srp->frq_bm)) {
 			xa_unlock_irqrestore(xafp, iflags);
 			n = sg_srp_q_blk_poll(srp, q, loop_count);
@@ -3299,7 +3710,7 @@ sg_vma_fault(struct vm_fault *vmf)
 		goto out_err;
 	}
 	mutex_lock(&sfp->f_mutex);
-	rsv_schp = &srp->sgat_h;
+	rsv_schp = srp->sgatp;
 	offset = vmf->pgoff << PAGE_SHIFT;
 	if (offset >= (unsigned int)rsv_schp->buflen) {
 		SG_LOG(1, sfp, "%s: offset[%lu] >= rsv.buflen\n", __func__,
@@ -3357,7 +3768,7 @@ sg_mmap(struct file *filp, struct vm_area_struct *vma)
 	}
 	/* Check reserve request is inactive and has large enough buffer */
 	srp = sfp->rsv_srp;
-	if (SG_RS_ACTIVE(srp)) {
+	if (SG_RQ_ACTIVE(srp)) {
 		res = -EBUSY;
 		goto fini;
 	}
@@ -3425,7 +3836,7 @@ sg_rq_end_io_usercontext(struct work_struct *work)
 static void
 sg_rq_end_io(struct request *rqq, blk_status_t status)
 {
-	enum sg_rq_state rqq_state = SG_RS_AWAIT_RCV;
+	enum sg_rq_state rqq_state = SG_RQ_AWAIT_RCV;
 	int a_resid, slen;
 	u32 rq_result;
 	unsigned long iflags;
@@ -3452,18 +3863,18 @@ sg_rq_end_io(struct request *rqq, blk_status_t status)
 			srp->in_resid = a_resid;
 		}
 	}
+	if (test_bit(SG_FRQ_ABORTING, srp->frq_bm) && rq_result == 0)
+		srp->rq_result |= (DRIVER_HARD << 24);
 
-	SG_LOG(6, sfp, "%s: pack_id=%d, res=0x%x\n", __func__, srp->pack_id,
-	       srp->rq_result);
+	SG_LOG(6, sfp, "%s: pack_id=%d, tag=%d, res=0x%x\n", __func__,
+	       srp->pack_id, srp->tag, srp->rq_result);
 	if (srp->start_ns > 0)	/* zero only when SG_FFD_NO_DURATION is set */
 		srp->duration = sg_calc_rq_dur(srp, test_bit(SG_FFD_TIME_IN_NS,
 							     sfp->ffd_bm));
 	if (unlikely((rq_result & SG_ML_RESULT_MSK) && slen > 0 &&
 		     test_bit(SG_FDEV_LOG_SENSE, sdp->fdev_bm))) {
-		u32 scsi_stat = rq_result & 0xff;
-
-		if (scsi_stat == SAM_STAT_CHECK_CONDITION ||
-		    scsi_stat == SAM_STAT_COMMAND_TERMINATED)
+		if ((rq_result & 0xff) == SAM_STAT_CHECK_CONDITION ||
+		    (rq_result & 0xff) == SAM_STAT_COMMAND_TERMINATED)
 			__scsi_print_sense(sdp->device, __func__, scsi_rp->sense, slen);
 	}
 	if (slen > 0) {
@@ -3491,10 +3902,10 @@ sg_rq_end_io(struct request *rqq, blk_status_t status)
 	srp->sense_len = slen;
 	if (unlikely(test_bit(SG_FRQ_IS_ORPHAN, srp->frq_bm))) {
 		if (test_bit(SG_FFD_KEEP_ORPHAN, sfp->ffd_bm)) {
-			clear_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm);
+			__clear_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm);
 		} else {
-			rqq_state = SG_RS_BUSY;
-			set_bit(SG_FRQ_DEACT_ORPHAN, srp->frq_bm);
+			rqq_state = SG_RQ_BUSY;
+			__set_bit(SG_FRQ_DEACT_ORPHAN, srp->frq_bm);
 		}
 	}
 	xa_lock_irqsave(&sfp->srp_arr, iflags);
@@ -3522,7 +3933,7 @@ sg_rq_end_io(struct request *rqq, blk_status_t status)
 	scsi_req_free_cmd(scsi_rp);
 	blk_put_request(rqq);
 
-	if (likely(rqq_state == SG_RS_AWAIT_RCV)) {
+	if (likely(rqq_state == SG_RQ_AWAIT_RCV)) {
 		/* Wake any sg_read()/ioctl(SG_IORECEIVE) awaiting this req */
 		if (!(srp->rq_flags & SGV4_FLAG_HIPRI))
 			wake_up_interruptible(&sfp->read_wait);
@@ -3649,7 +4060,7 @@ sg_add_device(struct device *cl_dev, struct class_interface *cl_intf)
 		goto cdev_add_err;
 
 	sdp->cdev = cdev;
-	if (sg_sysfs_valid) {
+	if (likely(sg_sysfs_valid)) {
 		struct device *sg_class_member;
 
 		sg_class_member = device_create(sg_sysfs_class, cl_dev->parent,
@@ -3663,7 +4074,7 @@ sg_add_device(struct device *cl_dev, struct class_interface *cl_intf)
 		}
 		error = sysfs_create_link(&scsidp->sdev_gendev.kobj,
 					  &sg_class_member->kobj, "generic");
-		if (error)
+		if (unlikely(error))
 			pr_err("%s: unable to make symlink 'generic' back "
 			       "to sg%d\n", __func__, sdp->index);
 	} else
@@ -3674,7 +4085,6 @@ sg_add_device(struct device *cl_dev, struct class_interface *cl_intf)
 		    "type %d\n", sdp->index, scsidp->type);
 
 	dev_set_drvdata(cl_dev, sdp);
-
 	return 0;
 
 cdev_add_err:
@@ -3694,7 +4104,7 @@ static void
 sg_device_destroy(struct kref *kref)
 {
 	struct sg_device *sdp = container_of(kref, struct sg_device, d_ref);
-	unsigned long flags;
+	unsigned long iflags;
 
 	SCSI_LOG_TIMEOUT(1, pr_info("[tid=%d] %s: sdp idx=%d, sdp=0x%pK --\n",
 				    (current ? current->pid : -1), __func__,
@@ -3706,9 +4116,9 @@ sg_device_destroy(struct kref *kref)
 	 */
 
 	xa_destroy(&sdp->sfp_arr);
-	write_lock_irqsave(&sg_index_lock, flags);
+	write_lock_irqsave(&sg_index_lock, iflags);
 	idr_remove(&sg_index_idr, sdp->index);
-	write_unlock_irqrestore(&sg_index_lock, flags);
+	write_unlock_irqrestore(&sg_index_lock, iflags);
 
 	put_disk(sdp->disk);
 	kfree(sdp);
@@ -3962,7 +4372,7 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 		kfree(long_cmdp);
 		return PTR_ERR(rqq);
 	}
-	/* current sg_request protected by SG_RS_BUSY state */
+	/* current sg_request protected by SG_RQ_BUSY state */
 	scsi_rp = scsi_req(rqq);
 	WRITE_ONCE(srp->rqq, rqq);
 	if (rq_flags & SGV4_FLAG_YIELD_TAG)
@@ -3981,15 +4391,15 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 	scsi_rp->cmd_len = cwrp->cmd_len;
 	srp->cmd_opcode = scsi_rp->cmd[0];
 	us_xfer = !(rq_flags & (SG_FLAG_NO_DXFER | SG_FLAG_MMAP_IO));
-	assign_bit(SG_FRQ_NO_US_XFER, srp->frq_bm, !us_xfer);
+	assign_bit(SG_FRQ_US_XFER, srp->frq_bm, us_xfer);
 	reserved = (sfp->rsv_srp == srp);
 	rqq->end_io_data = srp;
 	scsi_rp->retries = SG_DEFAULT_RETRIES;
-	req_schp = &srp->sgat_h;
+	req_schp = srp->sgatp;
 
 	if (dxfer_len <= 0 || dxfer_dir == SG_DXFER_NONE) {
 		SG_LOG(4, sfp, "%s: no data xfer [0x%pK]\n", __func__, srp);
-		set_bit(SG_FRQ_NO_US_XFER, srp->frq_bm);
+		clear_bit(SG_FRQ_US_XFER, srp->frq_bm);
 		goto fini;	/* path of reqs with no din nor dout */
 	} else if ((rq_flags & SG_FLAG_DIRECT_IO) && iov_count == 0 &&
 		   !sdp->device->host->unchecked_isa_dma &&
@@ -4057,8 +4467,8 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 	} else {
 		srp->bio = rqq->bio;
 	}
-	SG_LOG((res ? 1 : 4), sfp, "%s: %s res=%d [0x%pK]\n", __func__, cp,
-	       res, srp);
+	SG_LOG((res ? 1 : 4), sfp, "%s: %s %s res=%d [0x%pK]\n", __func__,
+	       sg_shr_str(srp->sh_var, false), cp, res, srp);
 	return res;
 }
 
@@ -4092,7 +4502,7 @@ sg_finish_scsi_blk_rq(struct sg_request *srp)
 		blk_put_request(rqq);
 	}
 	if (srp->bio) {
-		bool us_xfer = !test_bit(SG_FRQ_NO_US_XFER, srp->frq_bm);
+		bool us_xfer = test_bit(SG_FRQ_US_XFER, srp->frq_bm);
 		struct bio *bio = srp->bio;
 
 		srp->bio = NULL;
@@ -4118,7 +4528,7 @@ sg_mk_sgat(struct sg_request *srp, struct sg_fd *sfp, int minlen)
 	gfp_t mask_ap = GFP_ATOMIC | __GFP_COMP | __GFP_NOWARN | __GFP_ZERO;
 	gfp_t mask_kz = GFP_ATOMIC | __GFP_NOWARN;
 	struct sg_device *sdp = sfp->parentdp;
-	struct sg_scatter_hold *schp = &srp->sgat_h;
+	struct sg_scatter_hold *schp = srp->sgatp;
 	struct page **pgp;
 
 	if (unlikely(minlen <= 0)) {
@@ -4234,7 +4644,7 @@ sg_read_append(struct sg_request *srp, void __user *outp, int num_xfer)
 {
 	int k, num, res;
 	struct page *pgp;
-	struct sg_scatter_hold *schp = &srp->sgat_h;
+	struct sg_scatter_hold *schp = srp->sgatp;
 
 	SG_LOG(4, srp->parentfp, "%s: num_xfer=%d\n", __func__, num_xfer);
 	if (unlikely(!outp || num_xfer <= 0))
@@ -4271,13 +4681,13 @@ sg_read_append(struct sg_request *srp, void __user *outp, int num_xfer)
  * SG_PACK_ID_WILDCARD and SG_TAG_WILDCARD are -1 and that case is typically
  * the fast path. This function is only used in the non-blocking cases.
  * Returns pointer to (first) matching sg_request or NULL. If found,
- * sg_request state is moved from SG_RS_AWAIT_RCV to SG_RS_BUSY.
+ * sg_request state is moved from SG_RQ_AWAIT_RCV to SG_RQ_BUSY.
  */
 static struct sg_request *
 sg_find_srp_by_id(struct sg_fd *sfp, int id, bool is_tag)
 {
 	__maybe_unused bool is_bad_st = false;
-	__maybe_unused enum sg_rq_state bad_sr_st = SG_RS_INACTIVE;
+	__maybe_unused enum sg_rq_state bad_sr_st = SG_RQ_INACTIVE;
 	bool search_for_1 = (id != SG_TAG_WILDCARD);
 	bool second = false;
 	enum sg_rq_state sr_st;
@@ -4315,8 +4725,8 @@ sg_find_srp_by_id(struct sg_fd *sfp, int id, bool is_tag)
 			}
 			sr_st = atomic_read(&srp->rq_st);
 			switch (sr_st) {
-			case SG_RS_AWAIT_RCV:
-				res = sg_rq_chg_state(srp, sr_st, SG_RS_BUSY);
+			case SG_RQ_AWAIT_RCV:
+				res = sg_rq_chg_state(srp, sr_st, SG_RQ_BUSY);
 				if (likely(res == 0))
 					goto good;
 				/* else another caller got it, move on */
@@ -4325,7 +4735,9 @@ sg_find_srp_by_id(struct sg_fd *sfp, int id, bool is_tag)
 					bad_sr_st = atomic_read(&srp->rq_st);
 				}
 				break;
-			case SG_RS_INFLIGHT:
+			case SG_RQ_SHR_IN_WS:
+				goto good;
+			case SG_RQ_INFLIGHT:
 				break;
 			default:
 				if (IS_ENABLED(CONFIG_SCSI_PROC_FS)) {
@@ -4358,13 +4770,13 @@ sg_find_srp_by_id(struct sg_fd *sfp, int id, bool is_tag)
 		     srp = xa_find_after(xafp, &idx, end_idx, SG_XA_RQ_AWAIT)) {
 			if (test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm))
 				continue;
-			res = sg_rq_chg_state(srp, SG_RS_AWAIT_RCV, SG_RS_BUSY);
+			res = sg_rq_chg_state(srp, SG_RQ_AWAIT_RCV, SG_RQ_BUSY);
 			if (likely(res == 0)) {
 				WRITE_ONCE(sfp->low_await_idx, idx + 1);
 				goto good;
 			}
 #if IS_ENABLED(SG_LOG_ACTIVE)
-			sg_rq_state_fail_msg(sfp, SG_RS_AWAIT_RCV, SG_RS_BUSY, __func__);
+			sg_rq_state_fail_msg(sfp, SG_RQ_AWAIT_RCV, SG_RQ_BUSY, __func__);
 #endif
 		}
 		if (!srp && !second && s_idx > 0) {
@@ -4414,9 +4826,11 @@ sg_mk_srp(struct sg_fd *sfp, bool first)
 	else
 		srp = kzalloc(sizeof(*srp), gfp | GFP_ATOMIC);
 	if (srp) {
-		atomic_set(&srp->rq_st, SG_RS_BUSY);
+		atomic_set(&srp->rq_st, SG_RQ_BUSY);
+		srp->sh_var = SG_SHR_NONE;
 		srp->parentfp = sfp;
 		srp->tag = SG_TAG_WILDCARD;
+		srp->sgatp = &srp->sgat_h; /* only write-side share changes sgatp */
 		return srp;
 	} else {
 		return ERR_PTR(-ENOMEM);
@@ -4445,7 +4859,7 @@ sg_mk_srp_sgat(struct sg_fd *sfp, bool first, int db_len)
  * Irrespective of the given reserve request size, the minimum size requested
  * will be PAGE_SIZE (often 4096 bytes). Returns a pointer to reserve object or
  * a negated errno value twisted by ERR_PTR() macro. The actual number of bytes
- * allocated (maybe less than buflen) is in srp->sgat_h.buflen . Note that this
+ * allocated (maybe less than buflen) is in srp->sgatp->buflen . Note that this
  * function is only called in contexts where locking is not required.
  */
 static struct sg_request *
@@ -4482,26 +4896,125 @@ sg_build_reserve(struct sg_fd *sfp, int buflen)
 /*
  * Setup an active request (soon to carry a SCSI command) to the current file
  * descriptor by creating a new one or re-using a request from the free
- * list (fl). If successful returns a valid pointer in SG_RS_BUSY state. On
+ * list (fl). If successful returns a valid pointer in SG_RQ_BUSY state. On
  * failure returns a negated errno value twisted by ERR_PTR() macro.
  */
 static struct sg_request *
-sg_setup_req(struct sg_comm_wr_t *cwrp, int dxfr_len)
+sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var, int dxfr_len)
 {
 	bool act_empty = false;
-	bool found = false;
+	bool allow_rsv = true;
 	bool mk_new_srp = true;
+	bool ws_rq = false;
 	bool try_harder = false;
 	bool second = false;
 	bool has_inactive = false;
-	int l_used_idx;
+	int res, l_used_idx;
 	u32 sum_dlen;
 	unsigned long idx, s_idx, end_idx, iflags;
+	enum sg_rq_state sr_st;
+	enum sg_rq_state rs_sr_st = SG_RQ_INACTIVE;
 	struct sg_fd *fp = cwrp->sfp;
 	struct sg_request *r_srp = NULL;	/* request to return */
 	struct sg_request *low_srp = NULL;
+	__maybe_unused struct sg_request *rsv_srp;
+	struct sg_request *rs_rsv_srp = NULL;
+	struct sg_fd *rs_sfp = NULL;
 	struct xarray *xafp = &fp->srp_arr;
 	__maybe_unused const char *cp;
+	char b[48];
+
+	b[0] = '\0';
+	rsv_srp = fp->rsv_srp;
+
+	switch (sh_var) {
+	case SG_SHR_NONE:
+	case SG_SHR_WS_NOT_SRQ:
+		break;
+	case SG_SHR_RS_RQ:
+		sr_st = atomic_read(&rsv_srp->rq_st);
+		if (sr_st == SG_RQ_INACTIVE) {
+			res = sg_rq_chg_state(rsv_srp, sr_st, SG_RQ_BUSY);
+			if (likely(res == 0)) {
+				r_srp = rsv_srp;
+				mk_new_srp = false;
+				cp = "rs_rq";
+				goto good_fini;
+			}
+		}
+		r_srp = ERR_PTR(-EBUSY);
+		break;
+	case SG_SHR_RS_NOT_SRQ:
+		allow_rsv = false;
+		break;
+	case SG_SHR_WS_RQ:
+		rs_sfp = sg_fd_share_ptr(fp);
+		if (!sg_fd_is_shared(fp)) {
+			r_srp = ERR_PTR(-EPROTO);
+			break;
+		}
+		/*
+		 * Contention here may be with another potential write-side trying
+		 * to pair with this read-side. The loser will receive an
+		 * EADDRINUSE errno. The winner advances read-side's rq_state:
+		 *     SG_RQ_SHR_SWAP --> SG_RQ_SHR_IN_WS
+		 */
+		rs_rsv_srp = rs_sfp->rsv_srp;
+		rs_sr_st = atomic_read(&rs_rsv_srp->rq_st);
+		switch (rs_sr_st) {
+		case SG_RQ_AWAIT_RCV:
+			if (rs_rsv_srp->rq_result & SG_ML_RESULT_MSK) {
+				r_srp = ERR_PTR(-ENOSTR);
+				break;
+			}
+			fallthrough;
+		case SG_RQ_SHR_SWAP:
+			ws_rq = true;
+			if (rs_sr_st == SG_RQ_AWAIT_RCV)
+				break;
+			res = sg_rq_chg_state(rs_rsv_srp, rs_sr_st, SG_RQ_SHR_IN_WS);
+			if (unlikely(res))
+				r_srp = ERR_PTR(-EADDRINUSE);
+			break;
+		case SG_RQ_INFLIGHT:
+		case SG_RQ_BUSY:
+			r_srp = ERR_PTR(-EBUSY);
+			break;
+		case SG_RQ_INACTIVE:
+			r_srp = ERR_PTR(-EADDRNOTAVAIL);
+			break;
+		case SG_RQ_SHR_IN_WS:
+		default:
+			r_srp = ERR_PTR(-EADDRINUSE);
+			break;
+		}
+		break;
+	}
+	if (IS_ERR(r_srp)) {
+		if (PTR_ERR(r_srp) == -EBUSY)
+			goto err_out2;
+		if (sh_var == SG_SHR_RS_RQ)
+			snprintf(b, sizeof(b), "SG_SHR_RS_RQ --> sr_st=%s",
+				 sg_rq_st_str(sr_st, false));
+		else if (sh_var == SG_SHR_WS_RQ && rs_sfp)
+			snprintf(b, sizeof(b), "SG_SHR_WS_RQ --> rs_sr_st=%s",
+				 sg_rq_st_str(rs_sr_st, false));
+		else
+			snprintf(b, sizeof(b), "sh_var=%s",
+				 sg_shr_str(sh_var, false));
+		goto err_out;
+	}
+	cp = "";
+
+	if (ws_rq) {	/* write-side dlen may be smaller than read-side's dlen */
+		if (dxfr_len > rs_rsv_srp->sgatp->dlen) {
+			SG_LOG(4, fp, "%s: write-side dlen [%d] > read-side dlen\n",
+			       __func__, dxfr_len);
+			r_srp = ERR_PTR(-E2BIG);
+			goto err_out;
+		}
+		dxfr_len = 0;	/* any srp for write-side will do, pick smallest */
+	}
 
 start_again:
 	cp = "";
@@ -4516,8 +5029,8 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, int dxfr_len)
 		if (l_used_idx >= 0 && xa_get_mark(xafp, s_idx, SG_XA_RQ_INACTIVE)) {
 			r_srp = xa_load(xafp, s_idx);
 			if (r_srp && r_srp->sgat_h.buflen <= SG_DEF_SECTOR_SZ) {
-				if (sg_rq_chg_state(r_srp, SG_RS_INACTIVE, SG_RS_BUSY) == 0) {
-					found = true;
+				if (sg_rq_chg_state(r_srp, SG_RQ_INACTIVE, SG_RQ_BUSY) == 0) {
+					mk_new_srp = false;
 					atomic_dec(&fp->inactives);
 					goto have_existing;
 				}
@@ -4525,6 +5038,8 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, int dxfr_len)
 		}
 		xa_for_each_marked(xafp, idx, r_srp, SG_XA_RQ_INACTIVE) {
 			has_inactive = true;
+			if (!allow_rsv && rsv_srp == r_srp)
+				continue;
 			if (!low_srp && dxfr_len < SG_DEF_SECTOR_SZ) {
 				low_srp = r_srp;
 				break;
@@ -4533,11 +5048,11 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, int dxfr_len)
 		/* If dxfr_len is small, use lowest inactive request */
 		if (low_srp) {
 			r_srp = low_srp;
-			if (sg_rq_chg_state(r_srp, SG_RS_INACTIVE, SG_RS_BUSY))
+			if (sg_rq_chg_state(r_srp, SG_RQ_INACTIVE, SG_RQ_BUSY))
 				goto start_again; /* gone to another thread */
 			atomic_dec(&fp->inactives);
-			cp = "toward end of srp_arr";
-			found = true;
+			cp = "lowest inactive in srp_arr";
+			mk_new_srp = false;
 		}
 	} else {
 		l_used_idx = READ_ONCE(fp->low_used_idx);
@@ -4548,13 +5063,15 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, int dxfr_len)
 		for (r_srp = xa_find(xafp, &idx, end_idx, SG_XA_RQ_INACTIVE);
 		     r_srp;
 		     r_srp = xa_find_after(xafp, &idx, end_idx, SG_XA_RQ_INACTIVE)) {
+			if (!allow_rsv && rsv_srp == r_srp)
+				continue;
 			if (r_srp->sgat_h.buflen >= dxfr_len) {
-				if (sg_rq_chg_state(r_srp, SG_RS_INACTIVE, SG_RS_BUSY))
+				if (sg_rq_chg_state(r_srp, SG_RQ_INACTIVE, SG_RQ_BUSY))
 					continue;
 				atomic_dec(&fp->inactives);
 				WRITE_ONCE(fp->low_used_idx, idx + 1);
 				cp = "near front of srp_arr";
-				found = true;
+				mk_new_srp = false;
 				break;
 			}
 		}
@@ -4568,15 +5085,14 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, int dxfr_len)
 		}
 	}
 have_existing:
-	if (found) {
+	if (!mk_new_srp) {
 		r_srp->in_resid = 0;
 		r_srp->rq_info = 0;
 		r_srp->sense_len = 0;
-		mk_new_srp = false;
-	} else {
-		mk_new_srp = true;
 	}
-	if (mk_new_srp) {
+
+good_fini:
+	if (mk_new_srp) {	/* Need new sg_request object */
 		bool allow_cmd_q = test_bit(SG_FFD_CMD_Q, fp->ffd_bm);
 		int res;
 		u32 n_idx;
@@ -4608,51 +5124,74 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, int dxfr_len)
 		res = __xa_alloc(xafp, &n_idx, r_srp, xa_limit_32b, GFP_KERNEL);
 		xa_unlock_irqrestore(xafp, iflags);
 		if (res < 0) {
-			SG_LOG(1, fp, "%s: xa_alloc() failed, errno=%d\n",
-			       __func__,  -res);
 			sg_remove_sgat(r_srp);
 			kfree(r_srp);
 			r_srp = ERR_PTR(-EPROTOTYPE);
+			SG_LOG(1, fp, "%s: xa_alloc() failed, errno=%d\n",
+			       __func__, -res);
 			goto fini;
 		}
 		idx = n_idx;
 		r_srp->rq_idx = idx;
 		r_srp->parentfp = fp;
+		sg_rq_chg_state_force(r_srp, SG_RQ_BUSY);
 		SG_LOG(4, fp, "%s: mk_new_srp=0x%pK ++\n", __func__, r_srp);
 	}
+	/* only frq_bm[0] is copied; assumes all request flags fit in one ulong */
 	WRITE_ONCE(r_srp->frq_bm[0], cwrp->frq_bm[0]);	/* assumes <= 32 req flags */
-	r_srp->sgat_h.dlen = dxfr_len;/* must be <= r_srp->sgat_h.buflen */
+	r_srp->sgatp->dlen = dxfr_len;	/* must be <= r_srp->sgatp->buflen */
+	r_srp->sh_var = sh_var;
 	r_srp->cmd_opcode = 0xff;  /* set invalid opcode (VS), 0x0 is TUR */
 fini:
 	/* If setup stalls (e.g. blk_get_request()) debug shows 'elap=1 ns' */
 	if (test_bit(SG_FFD_TIME_IN_NS, fp->ffd_bm))
 		r_srp->start_ns = S64_MAX;
-	if (IS_ERR(r_srp))
-		SG_LOG(1, fp, "%s: err=%ld\n", __func__, PTR_ERR(r_srp));
+	if (ws_rq && rs_rsv_srp) {
+		rs_sfp->ws_srp = r_srp;
+		/* write-side "shares" the read-side reserve request's data buffer */
+		r_srp->sgatp = &rs_rsv_srp->sgat_h;
+	} else if (sh_var == SG_SHR_RS_RQ && test_bit(SG_FFD_READ_SIDE_ERR, fp->ffd_bm))
+		clear_bit(SG_FFD_READ_SIDE_ERR, fp->ffd_bm);
+err_out:
+	if (IS_ERR(r_srp) && b[0])
+		SG_LOG(1, fp, "%s: bad %s\n", __func__, b);
 	if (!IS_ERR(r_srp))
 		SG_LOG(4, fp, "%s: %s %sr_srp=0x%pK\n", __func__, cp,
 		       ((r_srp == fp->rsv_srp) ? "[rsv] " : ""), r_srp);
+err_out2:
 	return r_srp;
 }
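The `SG_SHR_WS_RQ` arm of `sg_setup_req()` relies on a compare-and-exchange. When two write-sides race for the same read-side request, only one moves it from `SG_RQ_SHR_SWAP` to `SG_RQ_SHR_IN_WS`, and the loser sees `EADDRINUSE`. Likewise, `sg_deact_request()` leaves a request in `SG_RQ_SHR_SWAP` alone. Below is a user-space model using C11 atomics. The enum, `rq_chg_state()`, `ws_claim()` and `rq_deact()` are hypothetical simplifications: the enum values do not match the driver's, and there are no xarray marks, no sense buffers and no data-length (`E2BIG`) check.

```c
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical mirror of the request states named in the patch. */
enum rq_state { RQ_INACTIVE, RQ_INFLIGHT, RQ_AWAIT_RCV, RQ_BUSY,
		RQ_SHR_SWAP, RQ_SHR_IN_WS };

/* Models sg_rq_chg_state(): succeeds only if the state is still 'old'. */
static int rq_chg_state(_Atomic int *st, int old, int new_st)
{
	return atomic_compare_exchange_strong(st, &old, new_st) ? 0 : -EBUSY;
}

/*
 * Models the SG_SHR_WS_RQ arm: returns 0 if the write-side may proceed,
 * else a negated errno. rs_has_err models (rq_result & SG_ML_RESULT_MSK).
 */
static int ws_claim(_Atomic int *rs_st, int rs_has_err)
{
	int st = atomic_load(rs_st);

	switch (st) {
	case RQ_AWAIT_RCV:	/* usable, but no state transition */
		return rs_has_err ? -ENOSTR : 0;
	case RQ_SHR_SWAP:	/* only one contending write-side wins */
		return rq_chg_state(rs_st, st, RQ_SHR_IN_WS) ? -EADDRINUSE : 0;
	case RQ_INFLIGHT:
	case RQ_BUSY:
		return -EBUSY;
	case RQ_INACTIVE:
		return -EADDRNOTAVAIL;
	case RQ_SHR_IN_WS:
	default:
		return -EADDRINUSE;
	}
}

/* Models sg_deact_request(): SHR_SWAP is kept for the write-side. */
static void rq_deact(_Atomic int *st)
{
	if (atomic_load(st) != RQ_SHR_SWAP)
		atomic_store(st, RQ_INACTIVE);
}
```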
 
 /*
- * Moves a completed sg_request object to the free list and sets it to
- * SG_RS_INACTIVE which makes it available for re-use. Requests with no data
- * associated are appended to the tail of the free list while other requests
- * are prepended to the head of the free list.
+ * Sets srp to SG_RQ_INACTIVE unless it is in the SG_RQ_SHR_SWAP state. Also
+ * changes the associated xarray entry flags to be consistent with
+ * SG_RQ_INACTIVE. Since this function can be called from many contexts,
+ * assume no xa locks are held.
+ * The state machine should ensure that two threads never race here.
  */
 static void
 sg_deact_request(struct sg_fd *sfp, struct sg_request *srp)
 {
+	enum sg_rq_state sr_st;
 	u8 *sbp;
 
 	if (WARN_ON(!sfp || !srp))
 		return;
 	sbp = srp->sense_bp;
 	srp->sense_bp = NULL;
-	WRITE_ONCE(srp->frq_bm[0], 0);
-	sg_rq_chg_state_force(srp, SG_RS_INACTIVE);
-	atomic_inc(&sfp->inactives);
+	sr_st = atomic_read(&srp->rq_st);
+	if (sr_st != SG_RQ_SHR_SWAP) { /* mark _BUSY then _INACTIVE at end */
+		/*
+		 * Can be called from many contexts and it is hard to know
+		 * whether xa locks held. So assume not.
+		 */
+		sg_rq_chg_state_force(srp, SG_RQ_INACTIVE);
+		atomic_inc(&sfp->inactives);
+		WRITE_ONCE(srp->frq_bm[0], 0);
+		srp->tag = SG_TAG_WILDCARD;
+		srp->in_resid = 0;
+		srp->rq_info = 0;
+	}
 	/* maybe orphaned req, thus never read */
 	if (sbp)
 		mempool_free(sbp, sg_sense_pool);
@@ -4722,14 +5261,20 @@ sg_add_sfp(struct sg_device *sdp, struct file *filp)
 			kfree(sfp);
 			return ERR_PTR(err);
 		}
-		if (srp->sgat_h.buflen < rbuf_len) {
+		if (srp->sgatp->buflen < rbuf_len) {
 			reduced = true;
 			SG_LOG(2, sfp,
 			       "%s: reserve reduced from %d to buflen=%d\n",
-			       __func__, rbuf_len, srp->sgat_h.buflen);
+			       __func__, rbuf_len, srp->sgatp->buflen);
 		}
 		xa_lock_irqsave(xafp, iflags);
 		res = __xa_alloc(xafp, &idx, srp, xa_limit_32b, GFP_ATOMIC);
+		if (!res) {
+			srp->rq_idx = idx;
+			srp->parentfp = sfp;
+			sg_rq_chg_state_force_ulck(srp, SG_RQ_INACTIVE);
+			atomic_inc(&sfp->inactives);
+		}
 		xa_unlock_irqrestore(xafp, iflags);
 		if (res < 0) {
 			SG_LOG(1, sfp, "%s: xa_alloc(srp) bad, errno=%d\n",
@@ -4739,10 +5284,6 @@ sg_add_sfp(struct sg_device *sdp, struct file *filp)
 			kfree(sfp);
 			return ERR_PTR(-EPROTOTYPE);
 		}
-		srp->rq_idx = idx;
-		srp->parentfp = sfp;
-		sg_rq_chg_state_force(srp, SG_RS_INACTIVE);
-		atomic_inc(&sfp->inactives);
 	}
 	if (!reduced) {
 		SG_LOG(4, sfp, "%s: built reserve buflen=%d\n", __func__,
@@ -4802,7 +5343,7 @@ sg_remove_sfp_usercontext(struct work_struct *work)
 	xa_for_each(xafp, idx, srp) {
 		if (!xa_get_mark(xafp, srp->rq_idx, SG_XA_RQ_INACTIVE))
 			sg_finish_scsi_blk_rq(srp);
-		if (srp->sgat_h.buflen > 0)
+		if (srp->sgatp->buflen > 0)
 			sg_remove_sgat(srp);
 		if (srp->sense_bp) {
 			mempool_free(srp->sense_bp, sg_sense_pool);
@@ -4842,29 +5383,14 @@ static void
 sg_remove_sfp(struct kref *kref)
 {
 	struct sg_fd *sfp = container_of(kref, struct sg_fd, f_ref);
-	struct sg_device *sdp = sfp->parentdp;
-	struct xarray *xap = &sdp->sfp_arr;
-
-	if (!xa_get_mark(xap, sfp->idx, SG_XA_FD_UNSHARED)) {
-		struct sg_fd *o_sfp;
-
-		o_sfp = sg_fd_shared_ptr(sfp);
-		if (o_sfp && !test_bit(SG_FFD_RELEASE, o_sfp->ffd_bm) &&
-		    !xa_get_mark(xap, sfp->idx, SG_XA_FD_UNSHARED)) {
-			mutex_lock(&o_sfp->f_mutex);
-			sg_remove_sfp_share
-				(sfp, xa_get_mark(xap, sfp->idx,
-						  SG_XA_FD_RS_SHARE));
-			mutex_unlock(&o_sfp->f_mutex);
-		}
-	}
+
 	INIT_WORK(&sfp->ew_fd.work, sg_remove_sfp_usercontext);
 	schedule_work(&sfp->ew_fd.work);
 }
 
-/* must be called with sg_index_lock held */
 static struct sg_device *
 sg_lookup_dev(int dev)
+	__must_hold(&sg_index_lock)
 {
 	return idr_find(&sg_index_idr, dev);
 }
@@ -4899,14 +5425,37 @@ static const char *
 sg_rq_st_str(enum sg_rq_state rq_st, bool long_str)
 {
 	switch (rq_st) {	/* request state */
-	case SG_RS_INACTIVE:
+	case SG_RQ_INACTIVE:
 		return long_str ? "inactive" :  "ina";
-	case SG_RS_INFLIGHT:
+	case SG_RQ_INFLIGHT:
 		return long_str ? "inflight" : "act";
-	case SG_RS_AWAIT_RCV:
+	case SG_RQ_AWAIT_RCV:
 		return long_str ? "await_receive" : "rcv";
-	case SG_RS_BUSY:
+	case SG_RQ_BUSY:
 		return long_str ? "busy" : "bsy";
+	case SG_RQ_SHR_SWAP:	/* only an active read-side has this */
+		return long_str ? "share swap" : "s_wp";
+	case SG_RQ_SHR_IN_WS:	/* only an active read-side has this */
+		return long_str ? "share write-side active" : "ws_a";
+	default:
+		return long_str ? "unknown" : "unk";
+	}
+}
+
+static const char *
+sg_shr_str(enum sg_shr_var sh_var, bool long_str)
+{
+	switch (sh_var) {	/* share variety of request */
+	case SG_SHR_NONE:
+		return long_str ? "none" :  "-";
+	case SG_SHR_RS_RQ:
+		return long_str ? "read-side request" :  "rs_rq";
+	case SG_SHR_RS_NOT_SRQ:
+		return long_str ? "read-side, not share request" :  "rs_nsh";
+	case SG_SHR_WS_RQ:
+		return long_str ? "write-side request" :  "ws_rq";
+	case SG_SHR_WS_NOT_SRQ:
+		return long_str ? "write-side, not share request" :  "ws_nsh";
 	default:
 		return long_str ? "unknown" : "unk";
 	}
@@ -4919,6 +5468,12 @@ sg_rq_st_str(enum sg_rq_state rq_st, bool long_str)
 {
 	return "";
 }
+
+static const char *
+sg_shr_str(enum sg_shr_var sh_var, bool long_str)
+{
+	return "";
+}
 #endif
 
 #if IS_ENABLED(SG_PROC_OR_DEBUG_FS)
@@ -4935,8 +5490,8 @@ static struct sg_dfs_context_t {
 } sg_dfs_cxt;
 
 struct sg_proc_deviter {
-	loff_t	index;
-	size_t	max;
+	loff_t index;
+	size_t max;
 	int fd_index;
 };
 
@@ -4963,7 +5518,7 @@ dev_seq_start(struct seq_file *s, loff_t *pos)
 
 	it->index = *pos;
 	it->max = sg_last_dev();
-	if (it->index >= it->max)
+	if (it->index >= (int)it->max)
 		return NULL;
 	return it;
 }
@@ -5040,7 +5595,7 @@ sg_proc_write_dressz(struct file *filp, const char __user *buffer,
 		sg_big_buff = k;
 		return count;
 	}
-	return -ERANGE;
+	return -EDOM;
 }
 
 static int
@@ -5074,7 +5629,7 @@ sg_proc_seq_show_dev(struct seq_file *s, void *v)
 		scsidp = sdp->device;
 		seq_printf(s, "%d\t%d\t%d\t%llu\t%d\t%d\t%d\t%d\t%d\n",
 			      scsidp->host->host_no, scsidp->channel,
-			      scsidp->id, scsidp->lun, (int) scsidp->type,
+			      scsidp->id, scsidp->lun, (int)scsidp->type,
 			      1,
 			      (int) scsidp->queue_depth,
 			      (int) scsi_device_busy(scsidp),
@@ -5133,8 +5688,8 @@ sg_proc_debug_sreq(struct sg_request *srp, int to, bool t_in_ns, char *obp,
 	rq_st = atomic_read(&srp->rq_st);
 	dur = sg_get_dur(srp, &rq_st, t_in_ns, &is_dur);
 	n += scnprintf(obp + n, len - n, "%s%s: dlen=%d/%d id=%d", cp,
-		       sg_rq_st_str(rq_st, false), srp->sgat_h.dlen,
-		       srp->sgat_h.buflen, (int)srp->pack_id);
+		       sg_rq_st_str(rq_st, false), srp->sgatp->dlen,
+		       srp->sgatp->buflen, (int)srp->pack_id);
 	if (is_dur)	/* cmd/req has completed, waiting for ... */
 		n += scnprintf(obp + n, len - n, " dur=%u%s", dur, tp);
 	else if (dur < U32_MAX) { /* in-flight or busy (so ongoing) */
@@ -5145,9 +5700,12 @@ sg_proc_debug_sreq(struct sg_request *srp, int to, bool t_in_ns, char *obp,
 		n += scnprintf(obp + n, len - n, " t_o/elap=%us/%u%s",
 			       to / 1000, dur, tp);
 	}
+	if (srp->sh_var != SG_SHR_NONE)
+		n += scnprintf(obp + n, len - n, " shr=%s",
+			       sg_shr_str(srp->sh_var, false));
 	cp = (srp->rq_flags & SGV4_FLAG_HIPRI) ? "hipri " : "";
 	n += scnprintf(obp + n, len - n, " sgat=%d %sop=0x%02x\n",
-		       srp->sgat_h.num_sgat, cp, srp->cmd_opcode);
+		       srp->sgatp->num_sgat, cp, srp->cmd_opcode);
 	return n;
 }
 
@@ -5160,8 +5718,15 @@ sg_proc_debug_fd(struct sg_fd *fp, char *obp, int len, unsigned long idx,
 	int n = 0;
 	int to, k;
 	unsigned long iflags;
+	const char *cp;
 	struct sg_request *srp;
+	struct sg_device *sdp = fp->parentdp;
 
+	if (xa_get_mark(&sdp->sfp_arr, fp->idx, SG_XA_FD_UNSHARED))
+		cp = "";
+	else
+		cp = xa_get_mark(&sdp->sfp_arr, fp->idx, SG_XA_FD_RS_SHARE) ?
+			" shr_rs" : " shr_ws";
 	/* sgat=-1 means unavailable */
 	to = (fp->timeout >= 0) ? jiffies_to_msecs(fp->timeout) : -999;
 	if (to < 0)
@@ -5171,8 +5736,8 @@ sg_proc_debug_fd(struct sg_fd *fp, char *obp, int len, unsigned long idx,
 		n += scnprintf(obp + n, len - n, "timeout=%dms rs", to);
 	else
 		n += scnprintf(obp + n, len - n, "timeout=%ds rs", to / 1000);
-	n += scnprintf(obp + n, len - n, "v_buflen=%d idx=%lu\n   cmd_q=%d ",
-		       fp->rsv_srp->sgat_h.buflen, idx,
+	n += scnprintf(obp + n, len - n, "v_buflen=%d%s idx=%lu\n   cmd_q=%d ",
+		       fp->rsv_srp->sgatp->buflen, cp, idx,
 		       (int)test_bit(SG_FFD_CMD_Q, fp->ffd_bm));
 	n += scnprintf(obp + n, len - n,
 		       "f_packid=%d k_orphan=%d ffd_bm=0x%lx\n",
@@ -5311,10 +5876,10 @@ sg_proc_seq_show_debug(struct seq_file *s, void *v, bool reduced)
 	if (!xa_empty(&sdp->sfp_arr)) {
 		found = true;
 		disk_name = (sdp->disk ? sdp->disk->disk_name : "?_?");
-		if (SG_IS_DETACHING(sdp))
+		if (SG_IS_DETACHING(sdp)) {
 			snprintf(b1, sizeof(b1), " >>> device=%s  %s\n",
 				 disk_name, "detaching pending close\n");
-		else if (sdp->device) {
+		} else if (sdp->device) {
 			n = sg_proc_debug_sdev(sdp, bp, bp_len, fdi_p,
 					       reduced);
 			if (n >= bp_len - 1) {
diff --git a/include/uapi/scsi/sg.h b/include/uapi/scsi/sg.h
index 5c8a7c2c3191..272001a69d01 100644
--- a/include/uapi/scsi/sg.h
+++ b/include/uapi/scsi/sg.h
@@ -112,6 +112,9 @@ typedef struct sg_io_hdr {
 #define SGV4_FLAG_Q_AT_HEAD SG_FLAG_Q_AT_HEAD
 #define SGV4_FLAG_IMMED 0x400 /* for polling with SG_IOR, ignored in SG_IOS */
 #define SGV4_FLAG_HIPRI 0x800 /* request will use blk_poll to complete */
+#define SGV4_FLAG_DEV_SCOPE 0x1000 /* permit SG_IOABORT to have wider scope */
+#define SGV4_FLAG_SHARE 0x2000	/* share IO buffer; needs SG_SEIM_SHARE_FD */
+#define SGV4_FLAG_NO_DXFER SG_FLAG_NO_DXFER /* but keep dev<-->kernel xfr */
 
 /* Output (potentially OR-ed together) in v3::info or v4::info field */
 #define SG_INFO_OK_MASK 0x1
@@ -184,7 +187,12 @@ typedef struct sg_req_info {	/* used by SG_GET_REQUEST_TABLE ioctl() */
 #define SG_CTL_FLAGM_OTHER_OPENS 0x4	/* rd: other sg fd_s on this dev */
 #define SG_CTL_FLAGM_ORPHANS	0x8	/* rd: orphaned requests on this fd */
 #define SG_CTL_FLAGM_Q_TAIL	0x10	/* used for future cmds on this fd */
+#define SG_CTL_FLAGM_IS_SHARE	0x20	/* rd: fd is read-side or write-side share */
+#define SG_CTL_FLAGM_IS_READ_SIDE 0x40	/* rd: this fd is read-side share */
 #define SG_CTL_FLAGM_UNSHARE	0x80	/* undo share after inflight cmd */
+/* rd> 1: read-side finished 0: not; wr> 1: finish share post read-side */
+#define SG_CTL_FLAGM_READ_SIDE_FINI 0x100 /* wr> 0: setup for repeat write-side req */
+#define SG_CTL_FLAGM_READ_SIDE_ERR 0x200 /* rd: sharing, read-side got error */
 #define SG_CTL_FLAGM_NO_DURATION 0x400	/* don't calc command duration */
 #define SG_CTL_FLAGM_MORE_ASYNC	0x800	/* yield EAGAIN in more cases */
 #define SG_CTL_FLAGM_ALL_BITS	0xfff	/* should be OR of previous items */
-- 
2.25.1



* [PATCH v18 52/83] sg: add multiple request support
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
  2021-04-27 21:57 ` [PATCH v18 51/83] sg: add shared requests Douglas Gilbert
@ 2021-04-27 21:57 ` Douglas Gilbert
  2021-04-27 21:57 ` [PATCH v18 53/83] sg: rename some mrq variables Douglas Gilbert
From: Douglas Gilbert @ 2021-04-27 21:57 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

Before the write() and read() system calls were removed from
the bsg driver (around lk 4.15), bsg supported submitting
multiple SCSI requests in a single invocation. It did this by
passing an array of struct sg_io_v4 objects to write(), whose
third argument (the size of the buffer that the second argument
points to) was then a multiple of sizeof(struct sg_io_v4).

Doing the same with ioctl(SG_IOSUBMIT) is not practical because
an ioctl() has no "length of passed object" argument. Further,
the _IOWR macro used to generate the ioctl number for
SG_IOSUBMIT encodes the expected length of the passed object,
and that is the size of a _single_ struct sg_io_v4 object.
So an indirect approach is taken: any object passed to
ioctl(SG_IO), ioctl(SG_IOSUBMIT) or ioctl(SG_IORECEIVE) with
SGV4_FLAG_MULTIPLE_REQS set is interpreted as a "controlling
object". It is parsed differently from other struct sg_io_v4
objects. Its data-out buffer contains an array of "normal"
struct sg_io_v4 objects, so its dout_xfer_len must be a
multiple of sizeof(struct sg_io_v4). Unless IMMED is in
effect, if a data-in buffer is given, that array is written
back to it with each completed request's status filled in.
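The size rules that a controlling object must satisfy can be restated as a small user-space helper. This is a sketch and not driver code. mrq_ctl_check() and MRQ_MAX_SZ are hypothetical names. The header size is passed in as hdr_sz, which would be sizeof(struct sg_io_v4) in real use. The checks and their order are taken from sg_submit_v4() and sg_do_multi_req() in this patch. The IMMED/STOP_IF, pointer and iovec checks are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MRQ_MAX_SZ (2u * 1024u * 1024u)	/* mirrors SG_MAX_MULTI_REQ_SZ */

/*
 * Hypothetical helper (not part of sg.c): validates a controlling object's
 * dout_xfer_len (array of headers) and request_len (optional cdb array).
 * On success returns 0 and sets *tot_reqs and *cdb_mxlen (0 if no cdb
 * array); otherwise returns the negated errno the driver would use.
 */
static int mrq_ctl_check(uint32_t dout_xfer_len, uint32_t request_len,
			 int have_request_ptr, uint32_t hdr_sz,
			 uint32_t *tot_reqs, uint32_t *cdb_mxlen)
{
	uint32_t n;

	assert(hdr_sz > 0);
	if (dout_xfer_len % hdr_sz)
		return -ERANGE;		/* not a whole number of headers */
	n = dout_xfer_len / hdr_sz;
	if (n > UINT16_MAX)
		return -ERANGE;		/* mrq_ind is only 16 bits wide */
	if (dout_xfer_len > MRQ_MAX_SZ || request_len > MRQ_MAX_SZ)
		return -E2BIG;
	*tot_reqs = n;
	*cdb_mxlen = 0;
	if (n == 0)
		return 0;		/* nothing to submit */
	if (!!request_len != !!have_request_ptr)
		return -ERANGE;		/* both zero or both non-zero */
	if (request_len) {
		if (request_len % n)
			return -ERANGE;	/* cdb slots must be equal sized */
		*cdb_mxlen = request_len / n;
		if (*cdb_mxlen < 6)
			return -ERANGE;	/* too short for a SCSI cdb */
	}
	return 0;
}
```

For example, with a 160-byte header, dout_xfer_len = 480 and request_len = 48 describe three requests with 16-byte cdb slots.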

Multiple requests can be combined with shared file
descriptors: SGV4_FLAG_DO_ON_OTHER in a request indicates
that the other file descriptor in the share is to be used
for that command. Multiple requests can also be combined
with shared requests (SGV4_FLAG_SHARE).
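The following is a sketch of the per-request flag rules that sg_mrq_sanity() applies when shared file descriptors are involved. mrq_req_flags_ok() is a hypothetical helper. It takes each SGV4_FLAG_* bit as a bool, so it does not depend on flag values that are not shown in this patch. It covers only these checks, not the guard, protocol or cdb length checks.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical restatement (not driver code) of the flag checks made per
 * request by sg_mrq_sanity(). ctl_immed: IMMED set on the controlling
 * object; fd_shared: this fd is part of a share; the remaining arguments
 * say whether MULTIPLE_REQS, SHARE, DO_ON_OTHER and COMPLETE_B4 are set
 * on the request. Returns true if the request passes these checks.
 */
static bool mrq_req_flags_ok(bool ctl_immed, bool fd_shared, bool multi_reqs,
			     bool share, bool do_on_other, bool complete_b4)
{
	if (multi_reqs)
		return false;	/* no nested multiple requests */
	if (ctl_immed && (do_on_other || share || complete_b4))
		return false;	/* IMMED: only async submits on this fd */
	if (!fd_shared && (share || do_on_other))
		return false;	/* needs an fd share to be set up first */
	return true;
}
```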

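As a further optimisation, an array of SCSI commands can be
passed from user space via the controlling object's request
"pointer". Without that, the multiple request logic would need
to access user space once per command to fetch each SCSI
command (cdb). In that array each cdb occupies a slot of
request_len / (number of requests) bytes. So request_len must
be an exact multiple of the number of requests, and each slot
must be at least 6 bytes long.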
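Below is a hypothetical user-space sketch that builds such a cdb array with a fixed stride. It also shows the matching lookup that the driver performs (cdb_ap + k * cdb_mxlen in sg_do_multi_req()). mrq_pack_cdbs() and mrq_cdb_at() are illustrative names and are not part of any sg API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: copies n cdbs into one flat buffer using a fixed
 * stride of cdb_mxlen bytes (unused tail bytes are zeroed). The buffer
 * would then be given as the controlling object's request pointer with
 * request_len = n * cdb_mxlen. Returns the bytes used, or 0 if the buffer
 * is too small or a cdb is longer than the stride.
 */
static size_t mrq_pack_cdbs(uint8_t *buf, size_t buf_len,
			    const uint8_t *const cdbs[], const uint32_t lens[],
			    uint32_t n, uint32_t cdb_mxlen)
{
	size_t need = (size_t)n * cdb_mxlen;
	uint32_t k;

	if (need > buf_len)
		return 0;
	memset(buf, 0, need);
	for (k = 0; k < n; ++k) {
		if (lens[k] > cdb_mxlen)
			return 0;	/* sg_mrq_sanity() gives -ERANGE */
		memcpy(buf + (size_t)k * cdb_mxlen, cdbs[k], lens[k]);
	}
	return need;
}

/* Mirrors the driver's lookup: cdb k starts at cdb_ap + k * cdb_mxlen. */
static const uint8_t *mrq_cdb_at(const uint8_t *cdb_ap, uint32_t k,
				 uint32_t cdb_mxlen)
{
	assert(cdb_mxlen > 0);
	return cdb_ap + (size_t)k * cdb_mxlen;
}
```

Each request's own request_len still gives its actual cdb length. The stride only fixes where each cdb starts.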

See the webpage at: https://sg.danny.cz/sg/sg_v40.html
in the section titled: "10 Multiple requests"

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c      | 784 +++++++++++++++++++++++++++++++++++++----
 include/uapi/scsi/sg.h |  15 +-
 2 files changed, 722 insertions(+), 77 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index f43cfd2ae739..635a3e2b10e5 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -61,6 +61,7 @@ static char *sg_version_date = "20210421";
 #define SG_ALLOW_DIO_DEF 0
 
 #define SG_MAX_DEVS 32768
+#define SG_MAX_MULTI_REQ_SZ (2 * 1024 * 1024)
 
 /* Comment out the following line to compile out SCSI_LOGGING stuff */
 #define SG_DEBUG 1
@@ -75,7 +76,8 @@ static char *sg_version_date = "20210421";
 #define SG_PROC_OR_DEBUG_FS 1
 #endif
 
-/* SG_MAX_CDB_SIZE should be 260 (spc4r37 section 3.1.30) however the type
+/*
+ * SG_MAX_CDB_SIZE should be 260 (spc4r37 section 3.1.30) however the type
  * of sg_io_hdr::cmd_len can only represent 255. All SCSI commands greater
  * than 16 bytes are "variable length" whose length is a multiple of 4, so:
  */
@@ -213,6 +215,7 @@ struct sg_slice_hdr4 {	/* parts of sg_io_v4 object needed in async usage */
 	s16 dir;		/* data xfer direction; SG_DXFER_*  */
 	u16 cmd_len;		/* truncated of sg_io_v4::request_len */
 	u16 max_sb_len;		/* truncated of sg_io_v4::max_response_len */
+	u16 mrq_ind;		/* position in parentfp->mrq_arr */
 };
 
 struct sg_scatter_hold {     /* holding area for scsi scatter gather info */
@@ -257,7 +260,7 @@ struct sg_request {	/* active SCSI command or inactive request */
 
 struct sg_fd {		/* holds the state of a file descriptor */
 	struct sg_device *parentdp;	/* owning device */
-	wait_queue_head_t read_wait;	/* queue read until command done */
+	wait_queue_head_t cmpl_wait;	/* queue awaiting req completion */
 	struct mutex f_mutex;	/* serialize ioctls on this fd */
 	int timeout;		/* defaults to SG_DEFAULT_TIMEOUT      */
 	int timeout_user;	/* defaults to SG_DEFAULT_TIMEOUT_USER */
@@ -310,6 +313,7 @@ struct sg_comm_wr_t {  /* arguments to sg_common_write() */
 	};
 	struct sg_fd *sfp;
 	const u8 __user *u_cmdp;
+	const u8 *cmdp;
 };
 
 /* tasklet or soft irq callback */
@@ -327,6 +331,10 @@ static int sg_receive_v3(struct sg_fd *sfp, struct sg_request *srp,
 static int sg_submit_v3(struct sg_fd *sfp, struct sg_io_hdr *hp, bool sync,
 			struct sg_request **o_srp);
 static struct sg_request *sg_common_write(struct sg_comm_wr_t *cwrp);
+static int sg_wait_event_srp(struct sg_fd *sfp, void __user *p,
+			     struct sg_io_v4 *h4p, struct sg_request *srp);
+static int sg_receive_v4(struct sg_fd *sfp, struct sg_request *srp,
+			 void __user *p, struct sg_io_v4 *h4p);
 static int sg_read_append(struct sg_request *srp, void __user *outp,
 			  int num_xfer);
 static void sg_remove_sgat(struct sg_request *srp);
@@ -335,6 +343,7 @@ static void sg_remove_sfp(struct kref *);
 static void sg_remove_sfp_share(struct sg_fd *sfp, bool is_rd_side);
 static struct sg_request *sg_find_srp_by_id(struct sg_fd *sfp, int id,
 					    bool is_tag);
+static bool sg_mrq_get_ready_srp(struct sg_fd *sfp, struct sg_request **srpp);
 static struct sg_request *sg_setup_req(struct sg_comm_wr_t *cwrp,
 				       enum sg_shr_var sh_var, int dxfr_len);
 static void sg_deact_request(struct sg_fd *sfp, struct sg_request *srp);
@@ -364,7 +373,6 @@ static const char *sg_shr_str(enum sg_shr_var sh_var, bool long_str);
 #define SG_HAVE_EXCLUDE(sdp) test_bit(SG_FDEV_EXCLUDE, (sdp)->fdev_bm)
 #define SG_IS_O_NONBLOCK(sfp) (!!((sfp)->filp->f_flags & O_NONBLOCK))
 #define SG_RQ_ACTIVE(srp) (atomic_read(&(srp)->rq_st) != SG_RQ_INACTIVE)
-// #define SG_RQ_THIS_RQ(srp) ((srp)->sh_var == SG_SHR_RS_RQ)
 
 /*
  * Kernel needs to be built with CONFIG_SCSI_LOGGING to see log messages.
@@ -662,6 +670,16 @@ sg_release(struct inode *inode, struct file *filp)
 	return 0;
 }
 
+/*
+ * ***********************************************************************
+ * write(2) related functions follow. They are shown before read(2) related
+ * functions. That is because SCSI commands/requests are first "written" to
+ * the SCSI device by using write(2), ioctl(SG_IOSUBMIT) or the first half
+ * of the synchronous ioctl(SG_IO) system call.
+ * ***********************************************************************
+ */
+
+/* This is the write(2) system call entry point. v4 interface disallowed. */
 static ssize_t
 sg_write(struct file *filp, const char __user *p, size_t count, loff_t *ppos)
 {
@@ -804,6 +822,7 @@ sg_write(struct file *filp, const char __user *p, size_t count, loff_t *ppos)
 	cwr.cmd_len = cmd_size;
 	cwr.sfp = sfp;
 	cwr.u_cmdp = p;
+	cwr.cmdp = NULL;
 	srp = sg_common_write(&cwr);
 	return (IS_ERR(srp)) ? PTR_ERR(srp) : (int)count;
 }
@@ -831,7 +850,7 @@ sg_fetch_cmnd(struct sg_fd *sfp, const u8 __user *u_cdbp, int len, u8 *cdbp)
 		return -EMSGSIZE;
 	if (copy_from_user(cdbp, u_cdbp, len))
 		return -EFAULT;
-	if (O_RDWR != (sfp->filp->f_flags & O_ACCMODE)) { /* read-only */
+	if (O_RDWR != (sfp->filp->f_flags & O_ACCMODE)) {	/* read-only */
 		switch (sfp->parentdp->device->type) {
 		case TYPE_DISK:
 		case TYPE_RBC:
@@ -853,6 +872,8 @@ sg_submit_v3(struct sg_fd *sfp, struct sg_io_hdr *hp, bool sync,
 	struct sg_comm_wr_t cwr;
 
 	/* now doing v3 blocking (sync) or non-blocking submission */
+	if (hp->flags & SGV4_FLAG_MULTIPLE_REQS)
+		return -ERANGE;		/* need to use v4 interface */
 	if (hp->flags & SG_FLAG_MMAP_IO) {
 		int res = sg_chk_mmap(sfp, hp->flags, hp->dxfer_len);
 
@@ -869,6 +890,7 @@ sg_submit_v3(struct sg_fd *sfp, struct sg_io_hdr *hp, bool sync,
 	cwr.cmd_len = hp->cmd_len;
 	cwr.sfp = sfp;
 	cwr.u_cmdp = hp->cmdp;
+	cwr.cmdp = NULL;
 	srp = sg_common_write(&cwr);
 	if (IS_ERR(srp))
 		return PTR_ERR(srp);
@@ -877,6 +899,423 @@ sg_submit_v3(struct sg_fd *sfp, struct sg_io_hdr *hp, bool sync,
 	return 0;
 }
 
+static void
+sg_sgv4_out_zero(struct sg_io_v4 *h4p)
+{
+	h4p->driver_status = 0;
+	h4p->transport_status = 0;
+	h4p->device_status = 0;
+	h4p->retry_delay = 0;
+	h4p->info = 0;
+	h4p->response_len = 0;
+	h4p->duration = 0;
+	h4p->din_resid = 0;
+	h4p->dout_resid = 0;
+	h4p->generated_tag = 0;
+	h4p->spare_out = 0;
+}
+
+/*
+ * Takes a pointer to the controlling multiple request (mrq) object and a
+ * pointer to the array of request headers (tot_reqs elements). That array,
+ * truncated to cop->din_xfer_len bytes, is flushed to user space pointer
+ * cop->din_xferp. A non-zero s_res is stored as -s_res in cop->spare_out.
+ */
+static int
+sg_mrq_arr_flush(struct sg_io_v4 *cop, struct sg_io_v4 *a_hds, u32 tot_reqs,
+		 int s_res)
+{
+	u32 sz = min(tot_reqs * SZ_SG_IO_V4, cop->din_xfer_len);
+	void __user *p = uptr64(cop->din_xferp);
+
+	if (s_res)
+		cop->spare_out = -s_res;
+	if (!p)
+		return 0;
+	if (sz > 0) {
+		if (copy_to_user(p, a_hds, sz))
+			return -EFAULT;
+	}
+	return 0;
+}
+
+static int
+sg_mrq_1complet(struct sg_io_v4 *cop, struct sg_io_v4 *a_hds,
+		struct sg_fd *w_sfp, int tot_reqs, struct sg_request *srp)
+{
+	int s_res, indx;
+	struct sg_io_v4 *siv4p;
+
+	SG_LOG(3, w_sfp, "%s: start, tot_reqs=%d\n", __func__, tot_reqs);
+	if (!srp)
+		return -EPROTO;
+	indx = srp->s_hdr4.mrq_ind;
+	if (indx < 0 || indx >= tot_reqs)
+		return -EPROTO;
+	siv4p = a_hds + indx;
+	s_res = sg_receive_v4(w_sfp, srp, NULL, siv4p);
+	if (s_res == -EFAULT)
+		return s_res;
+	siv4p->info |= SG_INFO_MRQ_FINI;
+	if (w_sfp->async_qp && (siv4p->flags & SGV4_FLAG_SIGNAL)) {
+		s_res = sg_mrq_arr_flush(cop, a_hds, tot_reqs, s_res);
+		if (unlikely(s_res))	/* can only be -EFAULT */
+			return s_res;
+		kill_fasync(&w_sfp->async_qp, SIGPOLL, POLL_IN);
+	}
+	return 0;
+}
+
+/*
+ * This is a fair-ish algorithm for an interruptible wait on two file
+ * descriptors. It favours the main fd over the secondary fd (sec_sfp).
+ */
+static int
+sg_mrq_complets(struct sg_io_v4 *cop, struct sg_io_v4 *a_hds,
+		struct sg_fd *sfp, struct sg_fd *sec_sfp, int tot_reqs,
+		int mreqs, int sec_reqs)
+{
+	int res;
+	int sum_inflight = mreqs + sec_reqs;	/* may be < tot_reqs */
+	struct sg_request *srp;
+
+	SG_LOG(3, sfp, "%s: mreqs=%d, sec_reqs=%d\n", __func__, mreqs,
+	       sec_reqs);
+	for ( ; sum_inflight > 0; --sum_inflight) {
+		srp = NULL;
+		if (mreqs > 0 && sg_mrq_get_ready_srp(sfp, &srp)) {
+			if (IS_ERR(srp)) {	/* -ENODATA: no mrqs here */
+				mreqs = 0;
+			} else {
+				--mreqs;
+				res = sg_mrq_1complet(cop, a_hds, sfp,
+						      tot_reqs, srp);
+				if (unlikely(res))
+					return res;
+			}
+		} else if (sec_reqs > 0 &&
+			   sg_mrq_get_ready_srp(sec_sfp, &srp)) {
+			if (IS_ERR(srp)) {
+				sec_reqs = 0;
+			} else {
+				--sec_reqs;
+				res = sg_mrq_1complet(cop, a_hds, sec_sfp,
+						      tot_reqs, srp);
+				if (unlikely(res))
+					return res;
+			}
+		} else if (mreqs > 0) {
+			res = wait_event_interruptible
+					(sfp->cmpl_wait,
+					 sg_mrq_get_ready_srp(sfp, &srp));
+			if (unlikely(res))
+				return res;	/* signal --> -ERESTARTSYS */
+			if (IS_ERR(srp)) {
+				mreqs = 0;
+			} else {
+				--mreqs;
+				res = sg_mrq_1complet(cop, a_hds, sfp,
+						      tot_reqs, srp);
+				if (unlikely(res))
+					return res;
+			}
+		} else if (sec_reqs > 0) {
+			res = wait_event_interruptible
+					(sec_sfp->cmpl_wait,
+					 sg_mrq_get_ready_srp(sec_sfp, &srp));
+			if (unlikely(res))
+				return res;	/* signal --> -ERESTARTSYS */
+			if (IS_ERR(srp)) {
+				sec_reqs = 0;
+			} else {
+				--sec_reqs;
+				res = sg_mrq_1complet(cop, a_hds, sec_sfp,
+						      tot_reqs, srp);
+				if (unlikely(res))
+					return res;
+			}
+		} else { /* expect one of the above conditions to be true */
+			return -EPROTO;
+		}
+		if (cop->din_xfer_len > 0)
+			--cop->din_resid;
+	}
+	return 0;
+}
+
+static int
+sg_mrq_sanity(struct sg_device *sdp, struct sg_io_v4 *cop,
+	      struct sg_io_v4 *a_hds, u8 *cdb_ap, struct sg_fd *sfp,
+	      u32 tot_reqs)
+{
+	bool immed = !!(cop->flags & SGV4_FLAG_IMMED);
+	bool have_mrq_sense = (cop->response && cop->max_response_len);
+	int k;
+	u32 cdb_alen = cop->request_len;
+	u32 cdb_mxlen = cdb_alen / tot_reqs;
+	u32 flags;
+	struct sg_io_v4 *siv4p;
+	__maybe_unused const char *rip = "request index";
+
+	/* Pre-check each request for anomalies */
+	for (k = 0, siv4p = a_hds; k < tot_reqs; ++k, ++siv4p) {
+		flags = siv4p->flags;
+		sg_sgv4_out_zero(siv4p);
+		if (siv4p->guard != 'Q' || siv4p->protocol != 0 ||
+		    siv4p->subprotocol != 0) {
+			SG_LOG(1, sfp, "%s: req index %u: %s or protocol\n",
+			       __func__, k, "bad guard");
+			return -ERANGE;
+		}
+		if (flags & SGV4_FLAG_MULTIPLE_REQS) {
+			SG_LOG(1, sfp, "%s: %s %u: no nested multi-reqs\n",
+			       __func__, rip, k);
+			return -ERANGE;
+		}
+		if (immed) {	/* only accept async submits on current fd */
+			if (flags & SGV4_FLAG_DO_ON_OTHER) {
+				SG_LOG(1, sfp, "%s: %s %u, %s\n", __func__,
+				       rip, k, "no IMMED with ON_OTHER");
+				return -ERANGE;
+			} else if (flags & SGV4_FLAG_SHARE) {
+				SG_LOG(1, sfp, "%s: %s %u, %s\n", __func__,
+				       rip, k, "no IMMED with FLAG_SHARE");
+				return -ERANGE;
+			} else if (flags & SGV4_FLAG_COMPLETE_B4) {
+				SG_LOG(1, sfp, "%s: %s %u, %s\n", __func__,
+				       rip, k, "no IMMED with COMPLETE_B4");
+				return -ERANGE;
+			}
+		}
+		if (!sg_fd_is_shared(sfp)) {
+			if (flags & SGV4_FLAG_SHARE) {
+				SG_LOG(1, sfp, "%s: %s %u, no share\n",
+				       __func__, rip, k);
+				return -ERANGE;
+			} else if (flags & SGV4_FLAG_DO_ON_OTHER) {
+				SG_LOG(1, sfp, "%s: %s %u, %s do on\n",
+				       __func__, rip, k, "no other fd to");
+				return -ERANGE;
+			}
+		}
+		if (cdb_ap) {
+			if (siv4p->request_len > cdb_mxlen) {
+				SG_LOG(1, sfp, "%s: %s %u, cdb too long\n",
+				       __func__, rip, k);
+				return -ERANGE;
+			}
+		}
+		if (have_mrq_sense && siv4p->response == 0 &&
+		    siv4p->max_response_len == 0) {
+			siv4p->response = cop->response;
+			siv4p->max_response_len = cop->max_response_len;
+		}
+	}
+	return 0;
+}
+
+/*
+ * Implements the multiple request functionality. When blocking is true,
+ * the invocation was via ioctl(SG_IO), otherwise via ioctl(SG_IOSUBMIT).
+ * It is only fully non-blocking if the IMMED flag is given or when
+ * ioctl(SG_IOSUBMIT) is used with O_NONBLOCK set on its file descriptor.
+ */
+static int
+sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool blocking)
+{
+	bool set_this, set_other, immed, stop_if, f_non_block;
+	int res = 0;
+	int s_res = 0;	/* for secondary error: some-good-then-error, case */
+	int other_fp_sent = 0;
+	int this_fp_sent = 0;
+	int num_cmpl = 0;
+	const int shr_complet_b4 = SGV4_FLAG_SHARE | SGV4_FLAG_COMPLETE_B4;
+	unsigned long ul_timeout;
+	struct sg_io_v4 *cop = cwrp->h4p;
+	u32 k, n, flags, cdb_mxlen;
+	u32 blen = cop->dout_xfer_len;
+	u32 cdb_alen = cop->request_len;
+	u32 tot_reqs = blen / SZ_SG_IO_V4;
+	struct sg_io_v4 *siv4p;
+	u8 *cdb_ap = NULL;
+	struct sg_io_v4 *a_hds;
+	struct sg_fd *fp = cwrp->sfp;
+	struct sg_fd *o_sfp = sg_fd_share_ptr(fp);
+	struct sg_fd *rq_sfp;
+	struct sg_request *srp;
+	struct sg_device *sdp = fp->parentdp;
+
+	f_non_block = !!(fp->filp->f_flags & O_NONBLOCK);
+	immed = !!(cop->flags & SGV4_FLAG_IMMED);
+	stop_if = !!(cop->flags & SGV4_FLAG_STOP_IF);
+	if (blocking) {		/* came from ioctl(SG_IO) */
+		if (unlikely(immed)) {
+			SG_LOG(1, fp, "%s: ioctl(SG_IO) %s contradicts\n",
+			       __func__, "with SGV4_FLAG_IMMED");
+			return -ERANGE;
+		}
+		if (unlikely(f_non_block)) {
+			SG_LOG(6, fp, "%s: ioctl(SG_IO) %s O_NONBLOCK\n",
+			       __func__, "ignoring");
+			f_non_block = false;
+		}
+	}
+	if (!immed && f_non_block)
+		immed = true;
+	SG_LOG(3, fp, "%s: %s, tot_reqs=%u, cdb_alen=%u\n", __func__,
+	       (immed ? "IMMED" : (blocking ?  "ordered blocking" :
+				   "variable blocking")), tot_reqs, cdb_alen);
+	sg_sgv4_out_zero(cop);
+
+	if (unlikely(tot_reqs > U16_MAX)) {
+		return -ERANGE;
+	} else if (unlikely(blen > SG_MAX_MULTI_REQ_SZ ||
+			    cdb_alen > SG_MAX_MULTI_REQ_SZ)) {
+		return -E2BIG;
+	} else if (unlikely(immed && stop_if)) {
+		return -ERANGE;
+	} else if (unlikely(tot_reqs == 0)) {
+		return 0;
+	} else if (unlikely(!!cdb_alen != !!cop->request)) {
+		return -ERANGE;	/* both must be zero or both non-zero */
+	} else if (cdb_alen) {
+		if (unlikely(cdb_alen % tot_reqs))
+			return -ERANGE;
+		cdb_mxlen = cdb_alen / tot_reqs;
+		if (unlikely(cdb_mxlen < 6))
+			return -ERANGE;	/* too short for SCSI cdbs */
+	} else {
+		cdb_mxlen = 0;
+	}
+
+	if (unlikely(SG_IS_DETACHING(sdp)))
+		return -ENODEV;
+	else if (unlikely(o_sfp && SG_IS_DETACHING((o_sfp->parentdp))))
+		return -ENODEV;
+
+	a_hds = kcalloc(tot_reqs, SZ_SG_IO_V4, GFP_KERNEL | __GFP_NOWARN);
+	if (!a_hds)
+		return -ENOMEM;
+	n = tot_reqs * SZ_SG_IO_V4;
+	if (copy_from_user(a_hds, cuptr64(cop->dout_xferp), n)) {
+		res = -EFAULT;
+		goto fini;
+	}
+	if (cdb_alen > 0) {
+		cdb_ap = kzalloc(cdb_alen, GFP_KERNEL | __GFP_NOWARN);
+		if (unlikely(!cdb_ap)) {
+			res = -ENOMEM;
+			goto fini;
+		}
+		if (copy_from_user(cdb_ap, cuptr64(cop->request), cdb_alen)) {
+			res = -EFAULT;
+			goto fini;
+		}
+	}
+	/* do sanity checks on all requests before starting */
+	res = sg_mrq_sanity(sdp, cop, a_hds, cdb_ap, fp, tot_reqs);
+	if (unlikely(res))
+		goto fini;
+	set_this = false;
+	set_other = false;
+	/* Dispatch requests and optionally wait for response */
+	for (k = 0, siv4p = a_hds; k < tot_reqs; ++k, ++siv4p) {
+		flags = siv4p->flags;
+		if (flags & SGV4_FLAG_DO_ON_OTHER) {
+			rq_sfp = o_sfp;
+			if (!set_other) {
+				set_other = true;
+				set_bit(SG_FFD_CMD_Q, rq_sfp->ffd_bm);
+			}
+		} else {
+			rq_sfp = fp;
+			if (!set_this) {
+				set_this = true;
+				set_bit(SG_FFD_CMD_Q, rq_sfp->ffd_bm);
+			}
+		}
+		if (cdb_ap) {	/* already have array of cdbs */
+			cwrp->cmdp = cdb_ap + (k * cdb_mxlen);
+			cwrp->u_cmdp = NULL;
+		} else {	/* fetch each cdb from user space */
+			cwrp->cmdp = NULL;
+			cwrp->u_cmdp = cuptr64(siv4p->request);
+		}
+		cwrp->cmd_len = siv4p->request_len;
+		ul_timeout = msecs_to_jiffies(siv4p->timeout);
+		cwrp->frq_bm[0] = 0;
+		assign_bit(SG_FRQ_SYNC_INVOC, cwrp->frq_bm, (int)blocking);
+		set_bit(SG_FRQ_IS_V4I, cwrp->frq_bm);
+		cwrp->h4p = siv4p;
+		cwrp->timeout = min_t(unsigned long, ul_timeout, INT_MAX);
+		cwrp->sfp = rq_sfp;
+		srp = sg_common_write(cwrp);
+		if (IS_ERR(srp)) {
+			s_res = PTR_ERR(srp);
+			break;
+		}
+		srp->s_hdr4.mrq_ind = k;
+		if (immed || (!(blocking || (flags & shr_complet_b4)))) {
+			if (fp == rq_sfp)
+				++this_fp_sent;
+			else
+				++other_fp_sent;
+			continue;  /* defer completion until all submitted */
+		}
+		s_res = sg_wait_event_srp(rq_sfp, NULL, siv4p, srp);
+		if (s_res) {
+			if (s_res == -ERESTARTSYS) {
+				res = s_res;
+				goto fini;
+			}
+			break;
+		}
+		if (!srp) {
+			s_res = -EPROTO;
+			break;
+		}
+		++num_cmpl;
+		siv4p->info |= SG_INFO_MRQ_FINI;
+		if (stop_if && (siv4p->driver_status ||
+				siv4p->transport_status ||
+				siv4p->device_status)) {
+			SG_LOG(2, fp, "%s: %s=0x%x/0x%x/0x%x] cause exit\n",
+			       __func__, "STOP_IF and status [drv/tran/scsi",
+			       siv4p->driver_status, siv4p->transport_status,
+			       siv4p->device_status);
+			break;	/* cop::driver_status <-- 0 in this case */
+		}
+		if (rq_sfp->async_qp && (siv4p->flags & SGV4_FLAG_SIGNAL)) {
+			res = sg_mrq_arr_flush(cop, a_hds, tot_reqs, s_res);
+			if (unlikely(res))
+				break;
+			kill_fasync(&rq_sfp->async_qp, SIGPOLL, POLL_IN);
+		}
+	}	/* end of dispatch request and optionally wait loop */
+	cop->dout_resid = tot_reqs - k;
+	cop->info = k;
+	if (cop->din_xfer_len > 0) {
+		cop->din_resid = tot_reqs - num_cmpl;
+		cop->spare_out = -s_res;
+	}
+
+	if (immed)
+		goto fini;
+
+	if (res == 0 && (this_fp_sent + other_fp_sent) > 0) {
+		s_res = sg_mrq_complets(cop, a_hds, fp, o_sfp, tot_reqs,
+					this_fp_sent, other_fp_sent);
+		if (s_res == -EFAULT || s_res == -ERESTARTSYS)
+			res = s_res;	/* this may leave orphans */
+	}
+fini:
+	if (res == 0 && !immed)
+		res = sg_mrq_arr_flush(cop, a_hds, tot_reqs, s_res);
+	kfree(cdb_ap);
+	kfree(a_hds);
+	return res;
+}
+
 static int
 sg_submit_v4(struct sg_fd *sfp, void __user *p, struct sg_io_v4 *h4p,
 	     bool sync, struct sg_request **o_srp)
@@ -886,6 +1325,27 @@ sg_submit_v4(struct sg_fd *sfp, void __user *p, struct sg_io_v4 *h4p,
 	struct sg_request *srp;
 	struct sg_comm_wr_t cwr;
 
+	if (h4p->flags & SGV4_FLAG_MULTIPLE_REQS) {
+		/* want v4 async or sync with guard, din and dout and flags */
+		if (!h4p->dout_xferp || h4p->din_iovec_count ||
+		    h4p->dout_iovec_count ||
+		    (h4p->dout_xfer_len % SZ_SG_IO_V4))
+			return -ERANGE;
+		if (o_srp)
+			*o_srp = NULL;
+		memset(&cwr, 0, sizeof(cwr));
+		cwr.sfp = sfp;
+		cwr.h4p = h4p;
+		res = sg_do_multi_req(&cwr, sync);
+		if (unlikely(res))
+			return res;
+		if (p) {
+			/* Write back sg_io_v4 object for error/warning info */
+			if (copy_to_user(p, h4p, SZ_SG_IO_V4))
+				return -EFAULT;
+		}
+		return 0;
+	}
 	if (h4p->flags & SG_FLAG_MMAP_IO) {
 		int len = 0;
 
@@ -908,6 +1368,7 @@ sg_submit_v4(struct sg_fd *sfp, void __user *p, struct sg_io_v4 *h4p,
 	cwr.timeout = min_t(unsigned long, ul_timeout, INT_MAX);
 	cwr.cmd_len = h4p->request_len;
 	cwr.u_cmdp = cuptr64(h4p->request);
+	cwr.cmdp = NULL;
 	srp = sg_common_write(&cwr);
 	if (IS_ERR(srp))
 		return PTR_ERR(srp);
@@ -928,14 +1389,14 @@ static int
 sg_ctl_iosubmit(struct sg_fd *sfp, void __user *p)
 {
 	int res;
-	u8 hdr_store[SZ_SG_IO_V4];
-	struct sg_io_v4 *h4p = (struct sg_io_v4 *)hdr_store;
+	struct sg_io_v4 h4;
+	struct sg_io_v4 *h4p = &h4;
 	struct sg_device *sdp = sfp->parentdp;
 
 	res = sg_allow_if_err_recovery(sdp, SG_IS_O_NONBLOCK(sfp));
 	if (res)
 		return res;
-	if (copy_from_user(hdr_store, p, SZ_SG_IO_V4))
+	if (copy_from_user(h4p, p, SZ_SG_IO_V4))
 		return -EFAULT;
 	if (h4p->guard == 'Q')
 		return sg_submit_v4(sfp, p, h4p, false, NULL);
@@ -946,8 +1407,8 @@ static int
 sg_ctl_iosubmit_v3(struct sg_fd *sfp, void __user *p)
 {
 	int res;
-	u8 hdr_store[SZ_SG_IO_V4];	/* max(v3interface, v4interface) */
-	struct sg_io_hdr *h3p = (struct sg_io_hdr *)hdr_store;
+	struct sg_io_hdr h3;
+	struct sg_io_hdr *h3p = &h3;
 	struct sg_device *sdp = sfp->parentdp;
 
 	res = sg_allow_if_err_recovery(sdp, SG_IS_O_NONBLOCK(sfp));
@@ -1237,7 +1698,7 @@ sg_common_write(struct sg_comm_wr_t *cwrp)
 			dxfr_len = h4p->dout_xfer_len;
 			dir = SG_DXFER_TO_DEV;
 		}
-	} else {                /* sg v3 interface so hi_p valid */
+	} else {			/* sg v3 interface so hi_p valid */
 		h4p = NULL;
 		hi_p = cwrp->h3p;
 		dir = hi_p->dxfer_direction;
@@ -1245,6 +1706,8 @@ sg_common_write(struct sg_comm_wr_t *cwrp)
 		rq_flags = hi_p->flags;
 		pack_id = hi_p->pack_id;
 	}
+	if (rq_flags & SGV4_FLAG_MULTIPLE_REQS)
+		return ERR_PTR(-ERANGE);
 	if (sg_fd_is_shared(fp)) {
 		res = sg_share_chk_flags(fp, rq_flags, dxfr_len, dir, &sh_var);
 		if (unlikely(res < 0))
@@ -1290,6 +1753,14 @@ sg_common_write(struct sg_comm_wr_t *cwrp)
 	return ERR_PTR(res);
 }
 
+/*
+ * ***********************************************************************
+ * read(2) related functions follow. They are shown after write(2) related
+ * functions. Apart from read(2) itself, ioctl(SG_IORECEIVE) and the second
+ * half of the ioctl(SG_IO) share code with read(2).
+ * ***********************************************************************
+ */
+
 /*
  * This function is called by wait_event_interruptible in sg_read() and
  * sg_ctl_ioreceive(). wait_event_interruptible will return if this one
@@ -1302,7 +1773,7 @@ sg_get_ready_srp(struct sg_fd *sfp, struct sg_request **srpp, int id,
 	struct sg_request *srp;
 
 	if (unlikely(SG_IS_DETACHING(sfp->parentdp))) {
-		*srpp = NULL;
+		*srpp = ERR_PTR(-ENODEV);
 		return true;
 	}
 	srp = sg_find_srp_by_id(sfp, id, is_tag);
@@ -1388,11 +1859,11 @@ sg_rec_state_v3v4(struct sg_fd *sfp, struct sg_request *srp, bool v4_active)
 			sh_sfp->ws_srp = NULL;
 			break;  /* nothing to do */
 		default:
-			err = -EPROTO;  /* Logic error */
+			err = -EPROTO;	/* Logic error */
 			SG_LOG(1, sfp,
 			       "%s: SHR_WS_RQ, bad read-side state: %s\n",
 			       __func__, sg_rq_st_str(mar_st, true));
-			break;  /* nothing to do */
+			break;	/* nothing to do */
 		}
 	}
 	if (unlikely(SG_IS_DETACHING(sfp->parentdp)))
@@ -1410,7 +1881,7 @@ sg_complete_v3v4(struct sg_fd *sfp, struct sg_request *srp, bool other_err)
 	case SG_SHR_RS_RQ:
 		{
 			int poll_type = POLL_OUT;
-			struct sg_fd *sh_sfp = sg_fd_share_ptr(sfp);
+			struct sg_fd *ws_sfp = sg_fd_share_ptr(sfp);
 
 			if ((srp->rq_result & SG_ML_RESULT_MSK) || other_err) {
 				set_bit(SG_FFD_READ_SIDE_ERR, sfp->ffd_bm);
@@ -1420,8 +1891,10 @@ sg_complete_v3v4(struct sg_fd *sfp, struct sg_request *srp, bool other_err)
 			} else if (sr_st != SG_RQ_SHR_SWAP) {
 				sg_rq_chg_state_force(srp, SG_RQ_SHR_SWAP);
 			}
-			if (sh_sfp)
-				kill_fasync(&sh_sfp->async_qp, SIGPOLL,
+			if (ws_sfp && ws_sfp->async_qp &&
+			    (!test_bit(SG_FRQ_IS_V4I, srp->frq_bm) ||
+			     (srp->rq_flags & SGV4_FLAG_SIGNAL)))
+				kill_fasync(&ws_sfp->async_qp, SIGPOLL,
 					    poll_type);
 		}
 		break;
@@ -1495,6 +1968,99 @@ sg_receive_v4(struct sg_fd *sfp, struct sg_request *srp, void __user *p,
 	return err < 0 ? err : 0;
 }
 
+/*
+ * Returns a negative errno on error, including -ENODATA if no mrqs are
+ * submitted or waiting. Otherwise returns the number of elements written
+ * to rsp_arr, which may be 0 if mrqs are submitted but none is waiting.
+ */
+static int
+sg_mrq_iorec_complets(struct sg_fd *sfp, bool non_block, int max_mrqs,
+		      struct sg_io_v4 *rsp_arr)
+{
+	int k;
+	int res = 0;
+	struct sg_request *srp;
+
+	SG_LOG(3, sfp, "%s: max_mrqs=%d\n", __func__, max_mrqs);
+	for (k = 0; k < max_mrqs; ++k) {
+		if (!sg_mrq_get_ready_srp(sfp, &srp))
+			break;
+		if (IS_ERR(srp))
+			return k ? k : PTR_ERR(srp);
+		res = sg_receive_v4(sfp, srp, NULL, rsp_arr + k);
+		if (unlikely(res))
+			return res;
+		rsp_arr[k].info |= SG_INFO_MRQ_FINI;
+	}
+	if (non_block)
+		return k;
+
+	for ( ; k < max_mrqs; ++k) {
+		res = wait_event_interruptible
+				(sfp->cmpl_wait,
+				 sg_mrq_get_ready_srp(sfp, &srp));
+		if (unlikely(res))
+			return res;	/* signal --> -ERESTARTSYS */
+		if (IS_ERR(srp))
+			return k ? k : PTR_ERR(srp);
+		res = sg_receive_v4(sfp, srp, NULL, rsp_arr + k);
+		if (unlikely(res))
+			return res;
+		rsp_arr[k].info |= SG_INFO_MRQ_FINI;
+	}
+	return k;
+}
+
+/*
+ * A race is expected because multiple concurrent calls with the same
+ * pack_id/tag can occur. Only one should succeed per request (others may
+ * succeed but will get different requests).
+ */
+static int
+sg_mrq_ioreceive(struct sg_fd *sfp, struct sg_io_v4 *cop, void __user *p,
+		 bool non_block)
+{
+	int res = 0;
+	u32 len, n;
+	struct sg_io_v4 *rsp_v4_arr;
+	void __user *pp;
+
+	SG_LOG(3, sfp, "%s: non_block=%d\n", __func__, !!non_block);
+	n = cop->din_xfer_len;
+	if (n > SG_MAX_MULTI_REQ_SZ)
+		return -E2BIG;
+	if (!cop->din_xferp || n < SZ_SG_IO_V4 || (n % SZ_SG_IO_V4))
+		return -ERANGE;
+	n /= SZ_SG_IO_V4;
+	len = n * SZ_SG_IO_V4;
+	SG_LOG(3, sfp, "%s: %s, num_reqs=%u\n", __func__,
+	       (non_block ? "IMMED" : "blocking"), n);
+	rsp_v4_arr = kcalloc(n, SZ_SG_IO_V4, GFP_KERNEL);
+	if (!rsp_v4_arr)
+		return -ENOMEM;
+
+	sg_sgv4_out_zero(cop);
+	cop->din_resid = n;
+	res = sg_mrq_iorec_complets(sfp, non_block, n, rsp_v4_arr);
+	if (unlikely(res < 0))
+		goto fini;
+	cop->din_resid -= res;
+	cop->info = res;
+	res = copy_to_user(p, cop, sizeof(*cop)) ? -EFAULT : 0;
+	if (unlikely(res))
+		goto fini;	/* don't leak rsp_v4_arr */
+	pp = uptr64(cop->din_xferp);
+	if (pp) {
+		if (copy_to_user(pp, rsp_v4_arr, len))
+			res = -EFAULT;
+	} else {
+		pr_info("%s: cop->din_xferp==NULL ?_?\n", __func__);
+	}
+fini:
+	kfree(rsp_v4_arr);
+	return res;
+}
+
 /*
  * Called when ioctl(SG_IORECEIVE) received. Expects a v4 interface object.
  * Checks if O_NONBLOCK file flag given, if not checks given 'flags' field
@@ -1527,6 +2093,8 @@ sg_ctl_ioreceive(struct sg_fd *sfp, void __user *p)
 	if (h4p->flags & SGV4_FLAG_IMMED)
 		non_block = true;	/* set by either this or O_NONBLOCK */
 	SG_LOG(3, sfp, "%s: non_block(+IMMED)=%d\n", __func__, non_block);
+	if (h4p->flags & SGV4_FLAG_MULTIPLE_REQS)
+		return sg_mrq_ioreceive(sfp, h4p, p, non_block);
 	/* read in part of v3 or v4 header for pack_id or tag based find */
 	if (test_bit(SG_FFD_FORCE_PACKID, sfp->ffd_bm)) {
 		use_tag = test_bit(SG_FFD_PREFER_TAG, sfp->ffd_bm);
@@ -1544,12 +2112,12 @@ sg_ctl_ioreceive(struct sg_fd *sfp, void __user *p)
 		if (non_block)
 			return -EAGAIN;
 		res = wait_event_interruptible
-				(sfp->read_wait,
+				(sfp->cmpl_wait,
 				 sg_get_ready_srp(sfp, &srp, id, use_tag));
-		if (unlikely(SG_IS_DETACHING(sdp)))
-			return -ENODEV;
 		if (res)
 			return res;	/* signal --> -ERESTARTSYS */
+		if (IS_ERR(srp))
+			return PTR_ERR(srp);
 	}	/* now srp should be valid */
 	if (test_and_set_bit(SG_FRQ_RECEIVING, srp->frq_bm)) {
 		cpu_relax();
@@ -1588,6 +2156,8 @@ sg_ctl_ioreceive_v3(struct sg_fd *sfp, void __user *p)
 	if (h3p->flags & SGV4_FLAG_IMMED)
 		non_block = true;	/* set by either this or O_NONBLOCK */
 	SG_LOG(3, sfp, "%s: non_block(+IMMED)=%d\n", __func__, non_block);
+	if (h3p->flags & SGV4_FLAG_MULTIPLE_REQS)
+		return -EINVAL;
 
 	if (test_bit(SG_FFD_FORCE_PACKID, sfp->ffd_bm))
 		pack_id = h3p->pack_id;
@@ -1599,12 +2169,12 @@ sg_ctl_ioreceive_v3(struct sg_fd *sfp, void __user *p)
 		if (non_block)
 			return -EAGAIN;
 		res = wait_event_interruptible
-				(sfp->read_wait,
+				(sfp->cmpl_wait,
 				 sg_get_ready_srp(sfp, &srp, pack_id, false));
-		if (unlikely(SG_IS_DETACHING(sdp)))
-			return -ENODEV;
 		if (unlikely(res))
 			return res;	/* signal --> -ERESTARTSYS */
+		if (IS_ERR(srp))
+			return PTR_ERR(srp);
 	}	/* now srp should be valid */
 	if (test_and_set_bit(SG_FRQ_RECEIVING, srp->frq_bm)) {
 		cpu_relax();
@@ -1782,12 +2352,12 @@ sg_read(struct file *filp, char __user *p, size_t count, loff_t *ppos)
 		if (non_block) /* O_NONBLOCK or v3::flags & SGV4_FLAG_IMMED */
 			return -EAGAIN;
 		ret = wait_event_interruptible
-				(sfp->read_wait,
+				(sfp->cmpl_wait,
 				 sg_get_ready_srp(sfp, &srp, want_id, false));
-		if (unlikely(SG_IS_DETACHING(sdp)))
-			return -ENODEV;
 		if (ret)	/* -ERESTARTSYS as signal hit process */
 			return ret;
+		if (IS_ERR(srp))
+			return PTR_ERR(srp);
 		/* otherwise srp should be valid */
 	}
 	if (test_and_set_bit(SG_FRQ_RECEIVING, srp->frq_bm)) {
@@ -1840,6 +2410,7 @@ sg_receive_v3(struct sg_fd *sfp, struct sg_request *srp, void __user *p)
 	hp->driver_status = driver_byte(rq_result);
 	err2 = put_sg_io_hdr(hp, p);
 	err = err ? err : err2;
+	sg_complete_v3v4(sfp, srp, err < 0);
 	sg_finish_scsi_blk_rq(srp);
 	sg_deact_request(sfp, srp);
 	return err;
@@ -2094,7 +2665,7 @@ sg_do_unshare(struct sg_fd *sfp, bool unshare_val)
 	struct sg_fd *o_sfp = sg_fd_share_ptr(sfp);
 	struct sg_device *sdp = sfp->parentdp;
 
-	if (xa_get_mark(&sdp->sfp_arr, sfp->idx, SG_XA_FD_UNSHARED)) {
+	if (!sg_fd_is_shared(sfp)) {
 		SG_LOG(1, sfp, "%s: not shared ? ?\n", __func__);
 		return;	/* no share to undo */
 	}
@@ -2176,7 +2747,6 @@ sg_calc_rq_dur(const struct sg_request *srp, bool time_in_ns)
 	return (diff > (s64)U32_MAX) ? 3999999999U : (u32)diff;
 }
 
-/* Return of U32_MAX means srp is inactive state */
 static u32
 sg_get_dur(struct sg_request *srp, const enum sg_rq_state *sr_stp,
 	   bool time_in_ns, bool *is_durp)
@@ -2255,7 +2825,7 @@ sg_wait_event_srp(struct sg_fd *sfp, void __user *p, struct sg_io_v4 *h4p,
 	}
 	SG_LOG(3, sfp, "%s: about to wait_event...()\n", __func__);
 	/* usually will be woken up by sg_rq_end_io() callback */
-	res = wait_event_interruptible(sfp->read_wait,
+	res = wait_event_interruptible(sfp->cmpl_wait,
 				       sg_rq_landed(sdp, srp));
 	if (unlikely(res)) { /* -ERESTARTSYS because signal hit thread */
 		set_bit(SG_FRQ_IS_ORPHAN, srp->frq_bm);
@@ -2344,6 +2914,7 @@ sg_ctl_sg_io(struct sg_device *sdp, struct sg_fd *sfp, void __user *p)
  */
 static struct sg_request *
 sg_match_request(struct sg_fd *sfp, bool use_tag, int id)
+		__must_hold(&sfp->rq_list_lock)
 {
 	int num_waiting = atomic_read(&sfp->waiting);
 	unsigned long idx;
@@ -2376,7 +2947,8 @@ sg_ctl_abort(struct sg_device *sdp, struct sg_fd *sfp, void __user *p)
 		__must_hold(sfp->f_mutex)
 {
 	bool use_tag;
-	int res, pack_id, tag, id;
+	int pack_id, tag, id;
+	int res = 0;
 	unsigned long iflags, idx;
 	struct sg_fd *o_sfp;
 	struct sg_request *srp;
@@ -2412,16 +2984,16 @@ sg_ctl_abort(struct sg_device *sdp, struct sg_fd *sfp, void __user *p)
 			return -ENODATA;
 	}
 
-	set_bit(SG_FRQ_ABORTING, srp->frq_bm);
-	res = 0;
+	if (test_and_set_bit(SG_FRQ_ABORTING, srp->frq_bm))
+		goto fini;
+
 	switch (atomic_read(&srp->rq_st)) {
 	case SG_RQ_BUSY:
 		clear_bit(SG_FRQ_ABORTING, srp->frq_bm);
 		res = -EBUSY;	/* should not occur often */
 		break;
-	case SG_RQ_INACTIVE:	/* inactive on rq_list not good */
+	case SG_RQ_INACTIVE:	/* perhaps done already */
 		clear_bit(SG_FRQ_ABORTING, srp->frq_bm);
-		res = -EPROTO;
 		break;
 	case SG_RQ_AWAIT_RCV:	/* user should still do completion */
 	case SG_RQ_SHR_SWAP:
@@ -2441,6 +3013,7 @@ sg_ctl_abort(struct sg_device *sdp, struct sg_fd *sfp, void __user *p)
 		clear_bit(SG_FRQ_ABORTING, srp->frq_bm);
 		break;
 	}
+fini:
 	xa_unlock_irqrestore(&sfp->srp_arr, iflags);
 	return res;
 }
@@ -2469,8 +3042,7 @@ sg_find_sfp_helper(struct sg_fd *from_sfp, struct sg_fd *pair_sfp,
 
 	if (unlikely(!mutex_trylock(&pair_sfp->f_mutex)))
 		return -EPROBE_DEFER;	/* use to suggest re-invocation */
-	if (unlikely(!xa_get_mark(&pair_sdp->sfp_arr, pair_sfp->idx,
-				  SG_XA_FD_UNSHARED)))
+	if (unlikely(sg_fd_is_shared(pair_sfp)))
 		res = -EADDRNOTAVAIL;
 	else if (unlikely(SG_HAVE_EXCLUDE(pair_sdp)))
 		res = -EPERM;
@@ -2569,8 +3141,7 @@ sg_find_sfp_by_fd(const struct file *search_for, int search_fd,
 			if (unlikely(!sdp || SG_IS_DETACHING(sdp)))
 				continue;
 			xa_for_each(&sdp->sfp_arr, idx, sfp) {
-				if (xa_get_mark(&sdp->sfp_arr, idx,
-						SG_XA_FD_UNSHARED))
+				if (!sg_fd_is_shared(sfp))
 					continue;
 				if (search_for == sfp->filp) {
 					res = -EADDRNOTAVAIL;  /* already */
@@ -2608,8 +3179,7 @@ sg_fd_share(struct sg_fd *ws_sfp, int m_fd)
 	if (unlikely(m_fd < 0))
 		return -EBADF;
 
-	if (unlikely(!xa_get_mark(&ws_sfp->parentdp->sfp_arr, ws_sfp->idx,
-				  SG_XA_FD_UNSHARED)))
+	if (unlikely(sg_fd_is_shared(ws_sfp)))
 		return -EADDRINUSE;  /* don't allow chain of shares */
 	/* Alternate approach: fcheck_files(current->files, m_fd) */
 	filp = fget(m_fd);
@@ -2726,8 +3296,7 @@ sg_set_reserved_sz(struct sg_fd *sfp, int want_rsv_sz)
 	struct sg_device *sdp = sfp->parentdp;
 	struct xarray *xafp = &sfp->srp_arr;
 
-	if (unlikely(!xa_get_mark(&sfp->parentdp->sfp_arr, sfp->idx,
-				  SG_XA_FD_UNSHARED)))
+	if (unlikely(sg_fd_is_shared(sfp)))
 		return -EBUSY;	/* this fd can't be either side of share */
 	o_srp = sfp->rsv_srp;
 	if (!o_srp)
@@ -2824,7 +3393,7 @@ static bool
 sg_any_persistent_orphans(struct sg_fd *sfp)
 {
 	if (test_bit(SG_FFD_KEEP_ORPHAN, sfp->ffd_bm)) {
-		int num_waiting = atomic_read(&sfp->waiting);
+		int num_waiting = atomic_read_acquire(&sfp->waiting);
 		unsigned long idx;
 		struct sg_request *srp;
 		struct xarray *xafp = &sfp->srp_arr;
@@ -2832,8 +3401,6 @@ sg_any_persistent_orphans(struct sg_fd *sfp)
 		if (num_waiting < 1)
 			return false;
 		xa_for_each_marked(xafp, idx, srp, SG_XA_RQ_AWAIT) {
-			if (unlikely(!srp))
-				continue;
 			if (test_bit(SG_FRQ_IS_ORPHAN, srp->frq_bm))
 				return true;
 		}
@@ -2916,10 +3483,10 @@ sg_extended_bool_flags(struct sg_fd *sfp, struct sg_extended_info *seip)
 		c_flgs_val_out &= ~SG_CTL_FLAGM_UNSHARE;	/* clear bit */
 	/* IS_SHARE boolean: [ro] true if fd may be read-side or write-side share */
 	if (c_flgs_rm & SG_CTL_FLAGM_IS_SHARE) {
-		if (xa_get_mark(&sdp->sfp_arr, sfp->idx, SG_XA_FD_UNSHARED))
-			c_flgs_val_out &= ~SG_CTL_FLAGM_IS_SHARE;
-		else
+		if (sg_fd_is_shared(sfp))
 			c_flgs_val_out |= SG_CTL_FLAGM_IS_SHARE;
+		else
+			c_flgs_val_out &= ~SG_CTL_FLAGM_IS_SHARE;
 	}
 	/* IS_READ_SIDE boolean: [ro] true if this fd may be a read-side share */
 	if (c_flgs_rm & SG_CTL_FLAGM_IS_READ_SIDE) {
@@ -3335,6 +3902,7 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 		SG_LOG(3, sfp, "%s:    SG_GET_PACK_ID=%d\n", __func__, val);
 		return put_user(val, ip);
 	case SG_GET_NUM_WAITING:
+		/* Want as fast as possible, with a useful result */
 		if (test_bit(SG_FFD_HIPRI_SEEN, sfp->ffd_bm))
 			sg_sfp_blk_poll(sfp, 0);	/* LLD may have some ready */
 		val = atomic_read(&sfp->waiting);
@@ -3625,7 +4193,7 @@ sg_poll(struct file *filp, poll_table *wait)
 		sg_sfp_blk_poll(sfp, 0);	/* LLD may have some ready to push up */
 	num = atomic_read(&sfp->waiting);
 	if (num < 1) {
-		poll_wait(filp, &sfp->read_wait, wait);
+		poll_wait(filp, &sfp->cmpl_wait, wait);
 		num = atomic_read(&sfp->waiting);
 	}
 	if (num > 0)
@@ -3866,8 +4434,8 @@ sg_rq_end_io(struct request *rqq, blk_status_t status)
 	if (test_bit(SG_FRQ_ABORTING, srp->frq_bm) && rq_result == 0)
 		srp->rq_result |= (DRIVER_HARD << 24);
 
-	SG_LOG(6, sfp, "%s: pack_id=%d, tag=%d, res=0x%x\n", __func__,
-	       srp->pack_id, srp->tag, srp->rq_result);
+	SG_LOG(6, sfp, "%s: pack/tag_id=%d/%d, cmd=0x%x, res=0x%x\n", __func__,
+	       srp->pack_id, srp->tag, srp->cmd_opcode, srp->rq_result);
 	if (srp->start_ns > 0)	/* zero only when SG_FFD_NO_DURATION is set */
 		srp->duration = sg_calc_rq_dur(srp, test_bit(SG_FFD_TIME_IN_NS,
 							     sfp->ffd_bm));
@@ -3936,8 +4504,10 @@ sg_rq_end_io(struct request *rqq, blk_status_t status)
 	if (likely(rqq_state == SG_RQ_AWAIT_RCV)) {
 		/* Wake any sg_read()/ioctl(SG_IORECEIVE) awaiting this req */
 		if (!(srp->rq_flags & SGV4_FLAG_HIPRI))
-			wake_up_interruptible(&sfp->read_wait);
-		kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN);
+			wake_up_interruptible(&sfp->cmpl_wait);
+		if (sfp->async_qp && (!test_bit(SG_FRQ_IS_V4I, srp->frq_bm) ||
+				      (srp->rq_flags & SGV4_FLAG_SIGNAL)))
+			kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN);
 		kref_put(&sfp->f_ref, sg_remove_sfp);
 	} else {        /* clean up orphaned request that aren't being kept */
 		INIT_WORK(&srp->ew_orph.work, sg_rq_end_io_usercontext);
@@ -4008,6 +4578,7 @@ sg_add_device_helper(struct gendisk *disk, struct scsi_device *scsidp)
 	xa_init_flags(&sdp->sfp_arr, XA_FLAGS_ALLOC | XA_FLAGS_LOCK_IRQ);
 	init_waitqueue_head(&sdp->open_wait);
 	clear_bit(SG_FDEV_DETACHING, sdp->fdev_bm);
+	atomic_set(&sdp->open_cnt, 0);
 	sdp->index = k;
 	kref_init(&sdp->d_ref);
 	error = 0;
@@ -4142,8 +4713,9 @@ sg_remove_device(struct device *cl_dev, struct class_interface *cl_intf)
 					"%s: 0x%pK\n", __func__, sdp));
 
 	xa_for_each(&sdp->sfp_arr, idx, sfp) {
-		wake_up_interruptible_all(&sfp->read_wait);
-		kill_fasync(&sfp->async_qp, SIGPOLL, POLL_HUP);
+		wake_up_interruptible_all(&sfp->cmpl_wait);
+		if (sfp->async_qp)
+			kill_fasync(&sfp->async_qp, SIGPOLL, POLL_HUP);
 	}
 	wake_up_interruptible_all(&sdp->open_wait);
 
@@ -4310,7 +4882,6 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 	int dxfer_len = 0;
 	int r0w = READ;
 	u32 rq_flags = srp->rq_flags;
-	int blk_flgs;
 	unsigned int iov_count = 0;
 	void __user *up;
 	struct request *rqq;
@@ -4327,8 +4898,10 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 	sdp = sfp->parentdp;
 	if (cwrp->cmd_len > BLK_MAX_CDB) {	/* for longer SCSI cdb_s */
 		long_cmdp = kzalloc(cwrp->cmd_len, GFP_KERNEL);
-		if (!long_cmdp)
-			return -ENOMEM;
+		if (!long_cmdp) {
+			res = -ENOMEM;
+			goto err_out;
+		}
 		SG_LOG(5, sfp, "%s: long_cmdp=0x%pK ++\n", __func__, long_cmdp);
 	}
 	if (test_bit(SG_FRQ_IS_V4I, srp->frq_bm)) {
@@ -4364,13 +4937,13 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 	 * boolean set on this file descriptor, returns -EAGAIN if
 	 * blk_get_request(BLK_MQ_REQ_NOWAIT) yields EAGAIN (aka EWOULDBLOCK).
 	 */
-	blk_flgs = (test_bit(SG_FFD_MORE_ASYNC, sfp->ffd_bm)) ?
-						BLK_MQ_REQ_NOWAIT : 0;
 	rqq = blk_get_request(q, (r0w ? REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN),
-			      blk_flgs);
+			      (test_bit(SG_FFD_MORE_ASYNC, sfp->ffd_bm) ?
+						BLK_MQ_REQ_NOWAIT : 0));
 	if (IS_ERR(rqq)) {
 		kfree(long_cmdp);
-		return PTR_ERR(rqq);
+		res = PTR_ERR(rqq);
+		goto err_out;
 	}
 	/* current sg_request protected by SG_RQ_BUSY state */
 	scsi_rp = scsi_req(rqq);
@@ -4384,10 +4957,12 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 	if (cwrp->u_cmdp)
 		res = sg_fetch_cmnd(sfp, cwrp->u_cmdp, cwrp->cmd_len,
 				    scsi_rp->cmd);
+	else if (cwrp->cmdp)
+		memcpy(scsi_rp->cmd, cwrp->cmdp, cwrp->cmd_len);
 	else
 		res = -EPROTO;
 	if (res)
-		goto fini;
+		goto err_out;
 	scsi_rp->cmd_len = cwrp->cmd_len;
 	srp->cmd_opcode = scsi_rp->cmd[0];
 	us_xfer = !(rq_flags & (SG_FLAG_NO_DXFER | SG_FLAG_MMAP_IO));
@@ -4467,6 +5042,7 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 	} else {
 		srp->bio = rqq->bio;
 	}
+err_out:
 	SG_LOG((res ? 1 : 4), sfp, "%s: %s %s res=%d [0x%pK]\n", __func__,
 	       sg_shr_str(srp->sh_var, false), cp, res, srp);
 	return res;
@@ -4620,7 +5196,7 @@ sg_remove_sgat(struct sg_request *srp)
 
 	SG_LOG(4, sfp, "%s: num_sgat=%d%s\n", __func__, schp->num_sgat,
 	       ((srp->parentfp ? (sfp->rsv_srp == srp) : false) ?
-		" [rsv]" : ""));
+							" [rsv]" : ""));
 	sg_remove_sgat_helper(sfp, schp);
 
 	if (sfp->tot_fd_thresh > 0) {
@@ -4809,6 +5385,65 @@ sg_find_srp_by_id(struct sg_fd *sfp, int id, bool is_tag)
 	return srp;
 }
 
+/*
+ * Returns true if a request is ready and its srp is written to *srpp. If
+ * nothing can be found (because nothing is currently submitted) then true
+ * is returned and ERR_PTR(-ENODATA) --> *srpp. If nothing is found but
+ * sfp has requests submitted, returns false and NULL --> *srpp.
+ */
+static bool
+sg_mrq_get_ready_srp(struct sg_fd *sfp, struct sg_request **srpp)
+{
+	bool second = false;
+	int num_waiting, res;
+	int l_await_idx = READ_ONCE(sfp->low_await_idx);
+	unsigned long idx, s_idx, end_idx;
+	struct sg_request *srp;
+	struct xarray *xafp = &sfp->srp_arr;
+
+	if (unlikely(SG_IS_DETACHING(sfp->parentdp))) {
+		*srpp = ERR_PTR(-ENODEV);
+		return true;
+	}
+	if (atomic_read(&sfp->submitted) < 1) {
+		*srpp = ERR_PTR(-ENODATA);
+		return true;
+	}
+	num_waiting = atomic_read_acquire(&sfp->waiting);
+	if (num_waiting < 1)
+		goto fini;
+
+	s_idx = (l_await_idx < 0) ? 0 : l_await_idx;
+	idx = s_idx;
+	end_idx = ULONG_MAX;
+
+second_time:
+	for (srp = xa_find(xafp, &idx, end_idx, SG_XA_RQ_AWAIT);
+	     srp;
+	     srp = xa_find_after(xafp, &idx, end_idx, SG_XA_RQ_AWAIT)) {
+		res = sg_rq_chg_state(srp, SG_RQ_AWAIT_RCV, SG_RQ_BUSY);
+		if (likely(res == 0)) {
+			*srpp = srp;
+			WRITE_ONCE(sfp->low_await_idx, idx + 1);
+			return true;
+		}
+#if IS_ENABLED(SG_LOG_ACTIVE)
+		sg_rq_state_fail_msg(sfp, SG_RQ_AWAIT_RCV, SG_RQ_BUSY, __func__);
+#endif
+	}
+	/* If not found so far, need to wrap around and search [0 ... end_idx) */
+	if (!srp && !second && s_idx > 0) {
+		end_idx = s_idx - 1;
+		s_idx = 0;
+		idx = s_idx;
+		second = true;
+		goto second_time;
+	}
+fini:
+	*srpp = NULL;
+	return false;
+}
+
 /*
  * Makes a new sg_request object. If 'first' is set then use GFP_KERNEL which
  * may take time but has improved chance of success, otherwise use GFP_ATOMIC.
@@ -4819,7 +5454,7 @@ static struct sg_request *
 sg_mk_srp(struct sg_fd *sfp, bool first)
 {
 	struct sg_request *srp;
-	gfp_t gfp =  __GFP_NOWARN;
+	gfp_t gfp = __GFP_NOWARN;
 
 	if (first)      /* prepared to wait if none already outstanding */
 		srp = kzalloc(sizeof(*srp), gfp | GFP_KERNEL);
@@ -4915,7 +5550,7 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var, int dxfr_len)
 	enum sg_rq_state sr_st;
 	enum sg_rq_state rs_sr_st = SG_RQ_INACTIVE;
 	struct sg_fd *fp = cwrp->sfp;
-	struct sg_request *r_srp = NULL;	/* request to return */
+	struct sg_request *r_srp = NULL; /* returned value won't be NULL */
 	struct sg_request *low_srp = NULL;
 	__maybe_unused struct sg_request *rsv_srp;
 	struct sg_request *rs_rsv_srp = NULL;
@@ -4942,6 +5577,7 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var, int dxfr_len)
 				goto good_fini;
 			}
 		}
+		/* Did not find the reserve request available */
 		r_srp = ERR_PTR(-EBUSY);
 		break;
 	case SG_SHR_RS_NOT_SRQ:
@@ -4954,7 +5590,7 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var, int dxfr_len)
 			break;
 		}
 		/*
-		 * Contention here may be with another potential write-side trying
+		 * There may be contention with another potential write-side trying
 		 * to pair with this read-side. The loser will receive an
 		 * EADDRINUSE errno. The winner advances read-side's rq_state:
 		 *     SG_RQ_SHR_SWAP --> SG_RQ_SHR_IN_WS
@@ -4964,6 +5600,7 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var, int dxfr_len)
 		switch (rs_sr_st) {
 		case SG_RQ_AWAIT_RCV:
 			if (rs_rsv_srp->rq_result & SG_ML_RESULT_MSK) {
+				/* read-side done but error occurred */
 				r_srp = ERR_PTR(-ENOSTR);
 				break;
 			}
@@ -4992,7 +5629,7 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var, int dxfr_len)
 	}
 	if (IS_ERR(r_srp)) {
 		if (PTR_ERR(r_srp) == -EBUSY)
-			goto err_out2;
+			goto err_out;
 		if (sh_var == SG_SHR_RS_RQ)
 			snprintf(b, sizeof(b), "SG_SHR_RS_RQ --> sr_st=%s",
 				 sg_rq_st_str(sr_st, false));
@@ -5153,12 +5790,11 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var, int dxfr_len)
 	} else if (sh_var == SG_SHR_RS_RQ && test_bit(SG_FFD_READ_SIDE_ERR, fp->ffd_bm))
 		clear_bit(SG_FFD_READ_SIDE_ERR, fp->ffd_bm);
 err_out:
-	if (IS_ERR(r_srp) && b[0])
+	if (IS_ERR(r_srp) && PTR_ERR(r_srp) != -EBUSY && b[0])
 		SG_LOG(1, fp, "%s: bad %s\n", __func__, b);
 	if (!IS_ERR(r_srp))
 		SG_LOG(4, fp, "%s: %s %sr_srp=0x%pK\n", __func__, cp,
 		       ((r_srp == fp->rsv_srp) ? "[rsv] " : ""), r_srp);
-err_out2:
 	return r_srp;
 }
 
@@ -5214,7 +5850,7 @@ sg_add_sfp(struct sg_device *sdp, struct file *filp)
 	sfp = kzalloc(sizeof(*sfp), GFP_ATOMIC | __GFP_NOWARN);
 	if (!sfp)
 		return ERR_PTR(-ENOMEM);
-	init_waitqueue_head(&sfp->read_wait);
+	init_waitqueue_head(&sfp->cmpl_wait);
 	xa_init_flags(&sfp->srp_arr, XA_FLAGS_ALLOC | XA_FLAGS_LOCK_IRQ);
 	xafp = &sfp->srp_arr;
 	kref_init(&sfp->f_ref);
@@ -5722,11 +6358,11 @@ sg_proc_debug_fd(struct sg_fd *fp, char *obp, int len, unsigned long idx,
 	struct sg_request *srp;
 	struct sg_device *sdp = fp->parentdp;
 
-	if (xa_get_mark(&sdp->sfp_arr, fp->idx, SG_XA_FD_UNSHARED))
-		cp = "";
-	else
+	if (sg_fd_is_shared(fp))
 		cp = xa_get_mark(&sdp->sfp_arr, fp->idx, SG_XA_FD_RS_SHARE) ?
-			" shr_rs" : " shr_ws";
+			" shr_rs" : " shr_ws";
+	else
+		cp = "";
 	/* sgat=-1 means unavailable */
 	to = (fp->timeout >= 0) ? jiffies_to_msecs(fp->timeout) : -999;
 	if (to < 0)
diff --git a/include/uapi/scsi/sg.h b/include/uapi/scsi/sg.h
index 272001a69d01..e1919eadf036 100644
--- a/include/uapi/scsi/sg.h
+++ b/include/uapi/scsi/sg.h
@@ -32,7 +32,11 @@
 #include <linux/types.h>
 #include <linux/major.h>
 
-/* bsg.h contains the sg v4 user space interface structure (sg_io_v4). */
+/*
+ * bsg.h contains the sg v4 user space interface structure (sg_io_v4).
+ * That structure is also used as the controlling object when multiple
+ * requests are issued with one ioctl() call.
+ */
 #include <linux/bsg.h>
 
 /*
@@ -110,11 +114,16 @@ typedef struct sg_io_hdr {
 #define SGV4_FLAG_YIELD_TAG 0x8  /* sg_io_v4::generated_tag set after SG_IOS */
 #define SGV4_FLAG_Q_AT_TAIL SG_FLAG_Q_AT_TAIL
 #define SGV4_FLAG_Q_AT_HEAD SG_FLAG_Q_AT_HEAD
+#define SGV4_FLAG_COMPLETE_B4  0x100
+#define SGV4_FLAG_SIGNAL  0x200	/* v3: ignored; v4 signal on completion */
 #define SGV4_FLAG_IMMED 0x400 /* for polling with SG_IOR, ignored in SG_IOS */
 #define SGV4_FLAG_HIPRI 0x800 /* request will use blk_poll to complete */
-#define SGV4_FLAG_DEV_SCOPE 0x1000 /* permit SG_IOABORT to have wider scope */
-#define SGV4_FLAG_SHARE 0x2000	/* share IO buffer; needs SG_SEIM_SHARE_FD */
+#define SGV4_FLAG_STOP_IF 0x1000	/* Stops sync mrq if error or warning */
+#define SGV4_FLAG_DEV_SCOPE 0x2000 /* permit SG_IOABORT to have wider scope */
+#define SGV4_FLAG_SHARE 0x4000	/* share IO buffer; needs SG_SEIM_SHARE_FD */
+#define SGV4_FLAG_DO_ON_OTHER 0x8000 /* available on either of shared pair */
 #define SGV4_FLAG_NO_DXFER SG_FLAG_NO_DXFER /* but keep dev<-->kernel xfr */
+#define SGV4_FLAG_MULTIPLE_REQS 0x20000	/* n sg_io_v4s in data-in */
 
 /* Output (potentially OR-ed together) in v3::info or v4::info field */
 #define SG_INFO_OK_MASK 0x1
-- 
2.25.1
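
Two rules added by the patch above are easy to get wrong from user space: how the data-in buffer for an mrq ioctl(SG_IORECEIVE) must be sized, and when SIGPOLL is raised on completion. The sketch below restates both as plain C, under stated assumptions. It mirrors the din_xfer_len checks at the top of sg_mrq_ioreceive() and the kill_fasync() gating added to sg_rq_end_io(). The helper names mrq_rcv_slots() and sg_should_signal() are hypothetical. elem_sz stands in for SZ_SG_IO_V4 and max_sz for SG_MAX_MULTI_REQ_SZ; the patch does not show the latter's value, so both are passed as parameters. Only the SGV4_FLAG_SIGNAL value is taken from the sg.h hunk.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Value copied from the include/uapi/scsi/sg.h hunk above */
#define SGV4_FLAG_SIGNAL 0x200

/*
 * Hypothetical helper mirroring the din_xfer_len checks in
 * sg_mrq_ioreceive(). elem_sz stands in for SZ_SG_IO_V4 and max_sz for
 * SG_MAX_MULTI_REQ_SZ. Returns the number of sg_io_v4 response slots,
 * or a negative errno value.
 */
static long mrq_rcv_slots(uint32_t din_xfer_len, bool have_din_xferp,
			  uint32_t elem_sz, uint32_t max_sz)
{
	assert(elem_sz > 0);	/* a zero element size is a caller bug */
	if (din_xfer_len > max_sz)
		return -E2BIG;
	if (!have_din_xferp || din_xfer_len < elem_sz ||
	    (din_xfer_len % elem_sz) != 0)
		return -ERANGE;
	return (long)(din_xfer_len / elem_sz);
}

/*
 * Hypothetical predicate mirroring the kill_fasync() gating added to
 * sg_rq_end_io(): nothing is sent unless fasync is armed; then v3
 * requests always signal, v4 requests only when SGV4_FLAG_SIGNAL is set.
 */
static bool sg_should_signal(bool async_armed, bool is_v4, uint32_t rq_flags)
{
	return async_armed && (!is_v4 || (rq_flags & SGV4_FLAG_SIGNAL) != 0);
}
```

So a caller expecting up to three completions would size din_xferp to exactly three sg_io_v4 objects, and a v4 user of O_ASYNC now has to set SGV4_FLAG_SIGNAL per request to keep receiving SIGPOLL.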



* [PATCH v18 53/83] sg: rename some mrq variables
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
                   ` (52 preceding siblings ...)
  2021-04-27 21:57 ` [PATCH v18 52/83] sg: add multiple request support Douglas Gilbert
@ 2021-04-27 21:57 ` Douglas Gilbert
  2021-04-27 21:57 ` [PATCH v18 54/83] sg: unlikely likely Douglas Gilbert
                   ` (29 subsequent siblings)
  83 siblings, 0 replies; 86+ messages in thread
From: Douglas Gilbert @ 2021-04-27 21:57 UTC
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

While preparing for a larger change/addition to multiple
request (mrq) handling, some variable names have been
changed for greater clarity and/or brevity.

In sg_mrq_get_rq(), -ENODATA was being returned via srpp when no
request was waiting. It should only do that when no request has
been submitted, where "submitted" counts the requests in flight
plus those waiting. Fold sg_mrq_get_rq() into sg_mrq_get_ready_srp().
Also fix the treatment of the read-side's SG_RQ_AWAIT_RCV state in
sg_rec_state_v3v4().

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 120 +++++++++++++++++++++-------------------------
 1 file changed, 55 insertions(+), 65 deletions(-)
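
The search done by sg_mrq_get_ready_srp() starts at a low-water hint (low_await_idx), wraps around to index 0, and claims the first request it can move from SG_RQ_AWAIT_RCV to SG_RQ_BUSY. The user-space model below is a hedged sketch of that search and of the three outcomes its header comment describes. It assumes a fixed-size table of C11 atomics in place of the xarray and its SG_XA_RQ_AWAIT mark. struct slot_tbl and get_ready_slot() are hypothetical names, and the detach (-ENODEV) check is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SLOTS 16

enum slot_state { SLOT_INACTIVE, SLOT_INFLIGHT, SLOT_AWAIT_RCV, SLOT_BUSY };

/* Hypothetical stand-in for sfp->srp_arr plus the counters it relies on */
struct slot_tbl {
	_Atomic int st[MAX_SLOTS];	/* per-request state */
	size_t n;			/* slots in use, at most MAX_SLOTS */
	_Atomic size_t low_await_idx;	/* hint: where the next scan starts */
	_Atomic int submitted;		/* in flight plus awaiting receipt */
};

/*
 * Returns true with *idxp set when a request was claimed; true with
 * *errp = -ENODATA when nothing is submitted; false when requests are
 * submitted but none is ready yet (a blocking caller would then sleep).
 */
static bool get_ready_slot(struct slot_tbl *t, size_t *idxp, int *errp)
{
	size_t hint, start;

	assert(t->n <= MAX_SLOTS);
	*errp = 0;
	if (atomic_load(&t->submitted) < 1) {
		*errp = -ENODATA;
		return true;
	}
	hint = atomic_load(&t->low_await_idx);
	start = (hint < t->n) ? hint : 0;
	for (size_t k = 0; k < t->n; ++k) {
		size_t i = (start + k) % t->n;	/* wraps to [0, start) */
		int expected = SLOT_AWAIT_RCV;

		/* Claim: AWAIT_RCV --> BUSY; fails if another caller won */
		if (atomic_compare_exchange_strong(&t->st[i], &expected,
						   SLOT_BUSY)) {
			*idxp = i;
			atomic_store(&t->low_await_idx, i + 1);
			return true;
		}
	}
	return false;
}
```

In this sketch the compare-and-exchange is what stops two concurrent callers from receiving the same request: the loser's exchange fails and it moves on to the next slot. The kernel code instead scans [s_idx, ULONG_MAX] and then [0, s_idx - 1] with xa_find()/xa_find_after(), rather than using modular arithmetic over a fixed table.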

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 635a3e2b10e5..d30c0034d767 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -916,8 +916,8 @@ sg_sgv4_out_zero(struct sg_io_v4 *h4p)
 }
 
 /*
- * Takes a pointer to the controlling multiple request (mrq) object and a
- * pointer to the command array. The command array (with tot_reqs elements)
+ * Takes a pointer (cop) to the multiple request (mrq) control object and
+ * a pointer to the command array. The command array (with tot_reqs elements)
  * is written out (flushed) to user space pointer cop->din_xferp. The
  * secondary error value (s_res) is placed in the cop->spare_out field.
  */
@@ -969,6 +969,7 @@ sg_mrq_1complet(struct sg_io_v4 *cop, struct sg_io_v4 *a_hds,
 /*
  * This is a fair-ish algorithm for an interruptible wait on two file
  * descriptors. It favours the main fd over the secondary fd (sec_sfp).
+ * Increments cop->info for each successful completion.
  */
 static int
 sg_mrq_complets(struct sg_io_v4 *cop, struct sg_io_v4 *a_hds,
@@ -1046,23 +1047,22 @@ sg_mrq_complets(struct sg_io_v4 *cop, struct sg_io_v4 *a_hds,
 static int
 sg_mrq_sanity(struct sg_device *sdp, struct sg_io_v4 *cop,
 	      struct sg_io_v4 *a_hds, u8 *cdb_ap, struct sg_fd *sfp,
-	      u32 tot_reqs)
+	      bool immed, u32 tot_reqs)
 {
-	bool immed = !!(cop->flags & SGV4_FLAG_IMMED);
 	bool have_mrq_sense = (cop->response && cop->max_response_len);
 	int k;
 	u32 cdb_alen = cop->request_len;
 	u32 cdb_mxlen = cdb_alen / tot_reqs;
 	u32 flags;
-	struct sg_io_v4 *siv4p;
+	struct sg_io_v4 *hp;
 	__maybe_unused const char *rip = "request index";
 
-	/* Pre-check each request for anomalies */
-	for (k = 0, siv4p = a_hds; k < tot_reqs; ++k, ++siv4p) {
-		flags = siv4p->flags;
-		sg_sgv4_out_zero(siv4p);
-		if (siv4p->guard != 'Q' || siv4p->protocol != 0 ||
-		    siv4p->subprotocol != 0) {
+	/* Pre-check each request for anomalies, plus some preparation */
+	for (k = 0, hp = a_hds; k < tot_reqs; ++k, ++hp) {
+		flags = hp->flags;
+		sg_sgv4_out_zero(hp);
+		if (unlikely(hp->guard != 'Q' || hp->protocol != 0 ||
+			     hp->subprotocol != 0)) {
 			SG_LOG(1, sfp, "%s: req index %u: %s or protocol\n",
 			       __func__, k, "bad guard");
 			return -ERANGE;
@@ -1099,23 +1099,23 @@ sg_mrq_sanity(struct sg_device *sdp, struct sg_io_v4 *cop,
 			}
 		}
 		if (cdb_ap) {
-			if (siv4p->request_len > cdb_mxlen) {
+			if (unlikely(hp->request_len > cdb_mxlen)) {
 				SG_LOG(1, sfp, "%s: %s %u, cdb too long\n",
 				       __func__, rip, k);
 				return -ERANGE;
 			}
 		}
-		if (have_mrq_sense && siv4p->response == 0 &&
-		    siv4p->max_response_len == 0) {
-			siv4p->response = cop->response;
-			siv4p->max_response_len = cop->max_response_len;
+		if (have_mrq_sense && hp->response == 0 &&
+		    hp->max_response_len == 0) {
+			hp->response = cop->response;
+			hp->max_response_len = cop->max_response_len;
 		}
 	}
 	return 0;
 }
 
 /*
- * Implements the multiple request functionality. When blocking is true
+ * Implements the multiple request functionality. When 'blocking' is true
  * invocation was via ioctl(SG_IO), otherwise it was via ioctl(SG_IOSUBMIT).
  * Only fully non-blocking if IMMED flag given or when ioctl(SG_IOSUBMIT)
  * is used with O_NONBLOCK set on its file descriptor.
@@ -1123,7 +1123,7 @@ sg_mrq_sanity(struct sg_device *sdp, struct sg_io_v4 *cop,
 static int
 sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool blocking)
 {
-	bool set_this, set_other, immed, stop_if, f_non_block;
+	bool immed, stop_if, f_non_block;
 	int res = 0;
 	int s_res = 0;	/* for secondary error: some-good-then-error, case */
 	int other_fp_sent = 0;
@@ -1136,9 +1136,9 @@ sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool blocking)
 	u32 blen = cop->dout_xfer_len;
 	u32 cdb_alen = cop->request_len;
 	u32 tot_reqs = blen / SZ_SG_IO_V4;
-	struct sg_io_v4 *siv4p;
 	u8 *cdb_ap = NULL;
-	struct sg_io_v4 *a_hds;
+	struct sg_io_v4 *hp;		/* ptr to request object in a_hds */
+	struct sg_io_v4 *a_hds;		/* array of request objects */
 	struct sg_fd *fp = cwrp->sfp;
 	struct sg_fd *o_sfp = sg_fd_share_ptr(fp);
 	struct sg_fd *rq_sfp;
@@ -1213,40 +1213,31 @@ sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool blocking)
 		}
 	}
 	/* do sanity checks on all requests before starting */
-	res = sg_mrq_sanity(sdp, cop, a_hds, cdb_ap, fp, tot_reqs);
+	res = sg_mrq_sanity(sdp, cop, a_hds, cdb_ap, fp, immed, tot_reqs);
 	if (unlikely(res))
 		goto fini;
-	set_this = false;
-	set_other = false;
-	/* Dispatch requests and optionally wait for response */
-	for (k = 0, siv4p = a_hds; k < tot_reqs; ++k, ++siv4p) {
-		flags = siv4p->flags;
-		if (flags & SGV4_FLAG_DO_ON_OTHER) {
-			rq_sfp = o_sfp;
-			if (!set_other) {
-				set_other = true;
-				set_bit(SG_FFD_CMD_Q, rq_sfp->ffd_bm);
-			}
-		} else {
-			rq_sfp = fp;
-			if (!set_this) {
-				set_this = true;
-				set_bit(SG_FFD_CMD_Q, rq_sfp->ffd_bm);
-			}
-		}
+	/* override command queuing setting: enable it on both fds */
+	set_bit(SG_FFD_CMD_Q, fp->ffd_bm);
+	if (o_sfp)
+		set_bit(SG_FFD_CMD_Q, o_sfp->ffd_bm);
+
+	/* Dispatch (submit) requests and optionally wait for response */
+	for (hp = a_hds, k = 0; num_cmpl < tot_reqs; ++hp, ++k) {
+		flags = hp->flags;
+		rq_sfp = (flags & SGV4_FLAG_DO_ON_OTHER) ? o_sfp : fp;
 		if (cdb_ap) {	/* already have array of cdbs */
 			cwrp->cmdp = cdb_ap + (k * cdb_mxlen);
 			cwrp->u_cmdp = NULL;
 		} else {	/* fetch each cdb from user space */
 			cwrp->cmdp = NULL;
-			cwrp->u_cmdp = cuptr64(siv4p->request);
+			cwrp->u_cmdp = cuptr64(hp->request);
 		}
-		cwrp->cmd_len = siv4p->request_len;
-		ul_timeout = msecs_to_jiffies(siv4p->timeout);
+		cwrp->cmd_len = hp->request_len;
+		ul_timeout = msecs_to_jiffies(hp->timeout);
 		cwrp->frq_bm[0] = 0;
-		assign_bit(SG_FRQ_SYNC_INVOC, cwrp->frq_bm, (int)blocking);
-		set_bit(SG_FRQ_IS_V4I, cwrp->frq_bm);
-		cwrp->h4p = siv4p;
+		__assign_bit(SG_FRQ_SYNC_INVOC, cwrp->frq_bm, (int)blocking);
+		__set_bit(SG_FRQ_IS_V4I, cwrp->frq_bm);
+		cwrp->h4p = hp;
 		cwrp->timeout = min_t(unsigned long, ul_timeout, INT_MAX);
 		cwrp->sfp = rq_sfp;
 		srp = sg_common_write(cwrp);
@@ -1262,8 +1253,8 @@ sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool blocking)
 				++other_fp_sent;
 			continue;  /* defer completion until all submitted */
 		}
-		s_res = sg_wait_event_srp(rq_sfp, NULL, siv4p, srp);
-		if (s_res) {
+		s_res = sg_wait_event_srp(rq_sfp, NULL, hp, srp);
+		if (unlikely(s_res)) {
 			if (s_res == -ERESTARTSYS) {
 				res = s_res;
 				goto fini;
@@ -1275,25 +1266,24 @@ sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool blocking)
 			break;
 		}
 		++num_cmpl;
-		siv4p->info |= SG_INFO_MRQ_FINI;
-		if (stop_if && (siv4p->driver_status ||
-				siv4p->transport_status ||
-				siv4p->device_status)) {
+		hp->info |= SG_INFO_MRQ_FINI;
+		if (stop_if && (hp->driver_status || hp->transport_status ||
+				hp->device_status)) {
 			SG_LOG(2, fp, "%s: %s=0x%x/0x%x/0x%x] cause exit\n",
 			       __func__, "STOP_IF and status [drv/tran/scsi",
-			       siv4p->driver_status, siv4p->transport_status,
-			       siv4p->device_status);
-			break;	/* cop::driver_status <-- 0 in this case */
+			       hp->driver_status, hp->transport_status,
+			       hp->device_status);
+			break;	/* cop->driver_status <-- 0 in this case */
 		}
-		if (rq_sfp->async_qp && (siv4p->flags & SGV4_FLAG_SIGNAL)) {
+		if (rq_sfp->async_qp && (hp->flags & SGV4_FLAG_SIGNAL)) {
 			res = sg_mrq_arr_flush(cop, a_hds, tot_reqs, s_res);
 			if (unlikely(res))
 				break;
 			kill_fasync(&rq_sfp->async_qp, SIGPOLL, POLL_IN);
 		}
-	}	/* end of dispatch request and optionally wait loop */
-	cop->dout_resid = tot_reqs - k;
-	cop->info = k;
+	}	/* end of dispatch request and optionally wait response loop */
+	cop->dout_resid = tot_reqs - num_cmpl;
+	cop->info = num_cmpl;
 	if (cop->din_xfer_len > 0) {
 		cop->din_resid = tot_reqs - num_cmpl;
 		cop->spare_out = -s_res;
@@ -1624,21 +1614,20 @@ sg_rq_chg_state_force(struct sg_request *srp, enum sg_rq_state new_st)
 static void
 sg_execute_cmd(struct sg_fd *sfp, struct sg_request *srp)
 {
-	bool at_head, is_v4h, sync;
+	bool at_head, sync;
 	struct sg_device *sdp = sfp->parentdp;
 	struct request *rqq = READ_ONCE(srp->rqq);
 
-	is_v4h = test_bit(SG_FRQ_IS_V4I, srp->frq_bm);
 	sync = test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm);
-	SG_LOG(3, sfp, "%s: is_v4h=%d\n", __func__, (int)is_v4h);
+	SG_LOG(3, sfp, "%s: pack_id=%d\n", __func__, srp->pack_id);
 	if (test_bit(SG_FFD_NO_DURATION, sfp->ffd_bm))
 		srp->start_ns = 0;
 	else
 		srp->start_ns = ktime_get_boottime_ns();/* assume always > 0 */
 	srp->duration = 0;
 
-	if (!is_v4h && srp->s_hdr3.interface_id == '\0')
-		at_head = true;	/* backward compatibility: v1+v2 interfaces */
+	if (!test_bit(SG_FRQ_IS_V4I, srp->frq_bm) && srp->s_hdr3.interface_id == '\0')
+		at_head = true;	/* backward compatibility for v1+v2 interfaces */
 	else if (test_bit(SG_FFD_Q_AT_TAIL, sfp->ffd_bm))
 		/* cmd flags can override sfd setting */
 		at_head = !!(srp->rq_flags & SG_FLAG_Q_AT_HEAD);
@@ -1854,10 +1843,11 @@ sg_rec_state_v3v4(struct sg_fd *sfp, struct sg_request *srp, bool v4_active)
 			sg_rq_chg_state_force(rs_srp, SG_RQ_INACTIVE);
 			atomic_inc(&sh_sfp->inactives);
 			break;
-		case SG_RQ_INACTIVE:
 		case SG_RQ_AWAIT_RCV:
+			break;
+		case SG_RQ_INACTIVE:
 			sh_sfp->ws_srp = NULL;
-			break;  /* nothing to do */
+			break;	/* nothing to do */
 		default:
 			err = -EPROTO;	/* Logic error */
 			SG_LOG(1, sfp,
-- 
2.25.1



* [PATCH v18 54/83] sg: unlikely likely
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
  2021-04-27 21:57 ` [PATCH v18 53/83] sg: rename some mrq variables Douglas Gilbert
@ 2021-04-27 21:57 ` Douglas Gilbert
  2021-04-27 21:57 ` [PATCH v18 55/83] sg: mrq abort Douglas Gilbert
From: Douglas Gilbert @ 2021-04-27 21:57 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

Apply the 'unlikely' qualifier (or the 'likely' qualifier) across
almost all functions in the driver wherever an error path departs
from the fast path.

Other small changes:
  - move the sg_rep_rq_state_fail() definition before its first use
  - remove some remnants from when SG_IOSUBMIT and SG_IORECEIVE
    accepted both the v3 and v4 interfaces. A u8 array able to
    hold either interface is no longer needed; simply use the
    correct interface type
  - refactor some abort request code

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 576 +++++++++++++++++++++++++---------------------
 1 file changed, 314 insertions(+), 262 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index d30c0034d767..27d9ac801f11 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -369,7 +369,7 @@ static const char *sg_shr_str(enum sg_shr_var sh_var, bool long_str);
 
 /* There is a assert that SZ_SG_IO_V4 >= SZ_SG_IO_HDR in first function */
 
-#define SG_IS_DETACHING(sdp) test_bit(SG_FDEV_DETACHING, (sdp)->fdev_bm)
+#define SG_IS_DETACHING(sdp) unlikely(test_bit(SG_FDEV_DETACHING, (sdp)->fdev_bm))
 #define SG_HAVE_EXCLUDE(sdp) test_bit(SG_FDEV_EXCLUDE, (sdp)->fdev_bm)
 #define SG_IS_O_NONBLOCK(sfp) (!!((sfp)->filp->f_flags & O_NONBLOCK))
 #define SG_RQ_ACTIVE(srp) (atomic_read(&(srp)->rq_st) != SG_RQ_INACTIVE)
@@ -430,12 +430,12 @@ sg_check_file_access(struct file *filp, const char *caller)
 	compiletime_assert(SZ_SG_IO_V4 >= SZ_SG_IO_HDR,
 			   "struct sg_io_v4 should be larger than sg_io_hdr");
 
-	if (filp->f_cred != current_real_cred()) {
+	if (unlikely(filp->f_cred != current_real_cred())) {
 		pr_err_once("%s: process %d (%s) changed security contexts after opening file descriptor, this is not allowed.\n",
 			caller, task_tgid_vnr(current), current->comm);
 		return -EPERM;
 	}
-	if (uaccess_kernel()) {
+	if (unlikely(uaccess_kernel())) {
 		pr_err_once("%s: process %d (%s) called from kernel context, this is not allowed.\n",
 			caller, task_tgid_vnr(current), current->comm);
 		return -EACCES;
@@ -454,7 +454,7 @@ sg_wait_open_event(struct sg_device *sdp, bool o_excl)
 			mutex_unlock(&sdp->open_rel_lock);
 			res = wait_event_interruptible
 					(sdp->open_wait,
-					 (unlikely(SG_IS_DETACHING(sdp)) ||
+					 (SG_IS_DETACHING(sdp) ||
 					  atomic_read(&sdp->open_cnt) == 0));
 			mutex_lock(&sdp->open_rel_lock);
 
@@ -468,7 +468,7 @@ sg_wait_open_event(struct sg_device *sdp, bool o_excl)
 			mutex_unlock(&sdp->open_rel_lock);
 			res = wait_event_interruptible
 					(sdp->open_wait,
-					 (unlikely(SG_IS_DETACHING(sdp)) ||
+					 (SG_IS_DETACHING(sdp) ||
 					  !SG_HAVE_EXCLUDE(sdp)));
 			mutex_lock(&sdp->open_rel_lock);
 
@@ -532,36 +532,36 @@ sg_open(struct inode *inode, struct file *filp)
 
 	/* Prevent the device driver from vanishing while we sleep */
 	res = scsi_device_get(sdp->device);
-	if (res)
+	if (unlikely(res))
 		goto sg_put;
 	res = scsi_autopm_get_device(sdp->device);
-	if (res)
+	if (unlikely(res))
 		goto sdp_put;
 	res = sg_allow_if_err_recovery(sdp, non_block);
-	if (res)
+	if (unlikely(res))
 		goto error_out;
 
 	mutex_lock(&sdp->open_rel_lock);
 	if (op_flags & O_NONBLOCK) {
-		if (o_excl) {
+		if (unlikely(o_excl)) {
 			if (atomic_read(&sdp->open_cnt) > 0) {
 				res = -EBUSY;
 				goto error_mutex_locked;
 			}
 		} else {
-			if (SG_HAVE_EXCLUDE(sdp)) {
+			if (unlikely(SG_HAVE_EXCLUDE(sdp))) {
 				res = -EBUSY;
 				goto error_mutex_locked;
 			}
 		}
 	} else {
 		res = sg_wait_open_event(sdp, o_excl);
-		if (res) /* -ERESTARTSYS or -ENODEV */
+		if (unlikely(res)) /* -ERESTARTSYS or -ENODEV */
 			goto error_mutex_locked;
 	}
 
 	/* N.B. at this point we are holding the open_rel_lock */
-	if (o_excl)
+	if (unlikely(o_excl))
 		set_bit(SG_FDEV_EXCLUDE, sdp->fdev_bm);
 
 	o_count = atomic_inc_return(&sdp->open_cnt);
@@ -586,7 +586,7 @@ sg_open(struct inode *inode, struct file *filp)
 	return res;
 
 out_undo:
-	if (o_excl) {		/* undo if error */
+	if (unlikely(o_excl)) {		/* undo if error */
 		clear_bit(SG_FDEV_EXCLUDE, sdp->fdev_bm);
 		wake_up_interruptible(&sdp->open_wait);
 	}
@@ -640,7 +640,7 @@ sg_release(struct inode *inode, struct file *filp)
 	if (unlikely(!sdp))
 		return -ENXIO;
 
-	if (xa_get_mark(&sdp->sfp_arr, sfp->idx, SG_XA_FD_FREE)) {
+	if (unlikely(xa_get_mark(&sdp->sfp_arr, sfp->idx, SG_XA_FD_FREE))) {
 		SG_LOG(1, sfp, "%s: sfp erased!!!\n", __func__);
 		return 0;	/* get out but can't fail */
 	}
@@ -648,11 +648,11 @@ sg_release(struct inode *inode, struct file *filp)
 	mutex_lock(&sdp->open_rel_lock);
 	o_count = atomic_read(&sdp->open_cnt);
 	SG_LOG(3, sfp, "%s: open count before=%d\n", __func__, o_count);
-	if (test_and_set_bit(SG_FFD_RELEASE, sfp->ffd_bm))
+	if (unlikely(test_and_set_bit(SG_FFD_RELEASE, sfp->ffd_bm)))
 		SG_LOG(1, sfp, "%s: second release on this fd ? ?\n",
 		       __func__);
 	scsi_autopm_put_device(sdp->device);
-	if (!xa_get_mark(&sdp->sfp_arr, sfp->idx, SG_XA_FD_FREE) &&
+	if (likely(!xa_get_mark(&sdp->sfp_arr, sfp->idx, SG_XA_FD_FREE)) &&
 	    sg_fd_is_shared(sfp))
 		sg_remove_sfp_share(sfp, xa_get_mark(&sdp->sfp_arr, sfp->idx,
 						     SG_XA_FD_RS_SHARE));
@@ -696,16 +696,16 @@ sg_write(struct file *filp, const char __user *p, size_t count, loff_t *ppos)
 	struct sg_comm_wr_t cwr;
 
 	res = sg_check_file_access(filp, __func__);
-	if (res)
+	if (unlikely(res))
 		return res;
 
 	sfp = filp->private_data;
 	sdp = sfp->parentdp;
 	SG_LOG(3, sfp, "%s: write(3rd arg) count=%d\n", __func__, (int)count);
 	res = sg_allow_if_err_recovery(sdp, !!(filp->f_flags & O_NONBLOCK));
-	if (res)
+	if (unlikely(res))
 		return res;
-	if (count < SZ_SG_HEADER || count > SG_WRITE_COUNT_LIMIT)
+	if (unlikely(count < SZ_SG_HEADER || count > SG_WRITE_COUNT_LIMIT))
 		return -EIO;
 #ifdef CONFIG_COMPAT
 	if (in_compat_syscall())
@@ -732,10 +732,10 @@ sg_write(struct file *filp, const char __user *p, size_t count, loff_t *ppos)
 #else
 			lt = (count < sizeof(struct sg_io_hdr));
 #endif
-			if (lt)
+			if (unlikely(lt))
 				return -EIO;
 			get_v3_hdr = true;
-			if (get_sg_io_hdr(h3p, p))
+			if (unlikely(get_sg_io_hdr(h3p, p)))
 				return -EFAULT;
 		}
 	}
@@ -758,13 +758,13 @@ sg_write(struct file *filp, const char __user *p, size_t count, loff_t *ppos)
 	}
 to_v2:
 	/* v1 and v2 interfaces processed below this point */
-	if (count < (SZ_SG_HEADER + 6))
+	if (unlikely(count < SZ_SG_HEADER + 6))
 		return -EIO;    /* minimum scsi command length is 6 bytes */
 	p += SZ_SG_HEADER;
-	if (get_user(opcode, p))
+	if (unlikely(get_user(opcode, p)))
 		return -EFAULT;
 	mutex_lock(&sfp->f_mutex);
-	if (sfp->next_cmd_len > 0) {
+	if (unlikely(sfp->next_cmd_len > 0)) {
 		cmd_size = sfp->next_cmd_len;
 		sfp->next_cmd_len = 0;	/* reset, only this write() effected */
 	} else {
@@ -779,7 +779,7 @@ sg_write(struct file *filp, const char __user *p, size_t count, loff_t *ppos)
 	mxsize = max_t(int, input_size, ohp->reply_len);
 	mxsize -= SZ_SG_HEADER;
 	input_size -= SZ_SG_HEADER;
-	if (input_size < 0)
+	if (unlikely(input_size < 0))
 		return -EIO; /* Insufficient bytes passed for this command. */
 	memset(h3p, 0, sizeof(*h3p));
 	h3p->interface_id = '\0';/* indicate v1 or v2 interface (tunnelled) */
@@ -808,7 +808,7 @@ sg_write(struct file *filp, const char __user *p, size_t count, loff_t *ppos)
 	 * but it is possible that the app intended SG_DXFER_TO_DEV, because
 	 * there is a non-zero input_size, so emit a warning.
 	 */
-	if (h3p->dxfer_direction == SG_DXFER_TO_FROM_DEV) {
+	if (unlikely(h3p->dxfer_direction == SG_DXFER_TO_FROM_DEV)) {
 		printk_ratelimited
 			(KERN_WARNING
 			 "%s: data in/out %d/%d bytes for SCSI command 0x%x-- guessing data in;\n"
@@ -832,13 +832,13 @@ sg_chk_mmap(struct sg_fd *sfp, int rq_flags, int len)
 {
 	if (unlikely(sfp->mmap_sz == 0))
 		return -EBADFD;
-	if (atomic_read(&sfp->submitted) > 0)
+	if (unlikely(atomic_read(&sfp->submitted) > 0))
 		return -EBUSY;  /* already active requests on fd */
-	if (len > sfp->rsv_srp->sgat_h.buflen)
+	if (unlikely(len > sfp->rsv_srp->sgat_h.buflen))
 		return -ENOMEM; /* MMAP_IO size must fit in reserve */
 	if (unlikely(len > sfp->mmap_sz))
 		return -ENOMEM; /* MMAP_IO size can't exceed mmap() size */
-	if (rq_flags & SG_FLAG_DIRECT_IO)
+	if (unlikely(rq_flags & SG_FLAG_DIRECT_IO))
 		return -EINVAL; /* not both MMAP_IO and DIRECT_IO */
 	return 0;
 }
@@ -846,7 +846,7 @@ sg_chk_mmap(struct sg_fd *sfp, int rq_flags, int len)
 static int
 sg_fetch_cmnd(struct sg_fd *sfp, const u8 __user *u_cdbp, int len, u8 *cdbp)
 {
-	if (!u_cdbp || len < 6 || len > SG_MAX_CDB_SIZE)
+	if (unlikely(!u_cdbp || len < 6 || len > SG_MAX_CDB_SIZE))
 		return -EMSGSIZE;
 	if (copy_from_user(cdbp, u_cdbp, len))
 		return -EFAULT;
@@ -872,9 +872,9 @@ sg_submit_v3(struct sg_fd *sfp, struct sg_io_hdr *hp, bool sync,
 	struct sg_comm_wr_t cwr;
 
 	/* now doing v3 blocking (sync) or non-blocking submission */
-	if (hp->flags & SGV4_FLAG_MULTIPLE_REQS)
+	if (unlikely(hp->flags & SGV4_FLAG_MULTIPLE_REQS))
 		return -ERANGE;		/* need to use v4 interface */
-	if (hp->flags & SG_FLAG_MMAP_IO) {
+	if (unlikely(hp->flags & SG_FLAG_MMAP_IO)) {
 		int res = sg_chk_mmap(sfp, hp->flags, hp->dxfer_len);
 
 		if (unlikely(res))
@@ -928,9 +928,9 @@ sg_mrq_arr_flush(struct sg_io_v4 *cop, struct sg_io_v4 *a_hds, u32 tot_reqs,
 	u32 sz = min(tot_reqs * SZ_SG_IO_V4, cop->din_xfer_len);
 	void __user *p = uptr64(cop->din_xferp);
 
-	if (s_res)
+	if (unlikely(s_res))
 		cop->spare_out = -s_res;
-	if (!p)
+	if (unlikely(!p))
 		return 0;
 	if (sz > 0) {
 		if (copy_to_user(p, a_hds, sz))
@@ -947,14 +947,14 @@ sg_mrq_1complet(struct sg_io_v4 *cop, struct sg_io_v4 *a_hds,
 	struct sg_io_v4 *siv4p;
 
 	SG_LOG(3, w_sfp, "%s: start, tot_reqs=%d\n", __func__, tot_reqs);
-	if (!srp)
+	if (unlikely(!srp))
 		return -EPROTO;
 	indx = srp->s_hdr4.mrq_ind;
-	if (indx < 0 || indx >= tot_reqs)
+	if (unlikely(indx < 0 || indx >= tot_reqs))
 		return -EPROTO;
 	siv4p = a_hds + indx;
 	s_res = sg_receive_v4(w_sfp, srp, NULL, siv4p);
-	if (s_res == -EFAULT)
+	if (unlikely(s_res == -EFAULT))
 		return s_res;
 	siv4p->info |= SG_INFO_MRQ_FINI;
 	if (w_sfp->async_qp && (siv4p->flags & SGV4_FLAG_SIGNAL)) {
@@ -1067,32 +1067,32 @@ sg_mrq_sanity(struct sg_device *sdp, struct sg_io_v4 *cop,
 			       __func__, k, "bad guard");
 			return -ERANGE;
 		}
-		if (flags & SGV4_FLAG_MULTIPLE_REQS) {
+		if (unlikely(flags & SGV4_FLAG_MULTIPLE_REQS)) {
 			SG_LOG(1, sfp, "%s: %s %u: no nested multi-reqs\n",
 			       __func__, rip, k);
 			return -ERANGE;
 		}
 		if (immed) {	/* only accept async submits on current fd */
-			if (flags & SGV4_FLAG_DO_ON_OTHER) {
+			if (unlikely(flags & SGV4_FLAG_DO_ON_OTHER)) {
 				SG_LOG(1, sfp, "%s: %s %u, %s\n", __func__,
 				       rip, k, "no IMMED with ON_OTHER");
 				return -ERANGE;
-			} else if (flags & SGV4_FLAG_SHARE) {
+			} else if (unlikely(flags & SGV4_FLAG_SHARE)) {
 				SG_LOG(1, sfp, "%s: %s %u, %s\n", __func__,
 				       rip, k, "no IMMED with FLAG_SHARE");
 				return -ERANGE;
-			} else if (flags & SGV4_FLAG_COMPLETE_B4) {
+			} else if (unlikely(flags & SGV4_FLAG_COMPLETE_B4)) {
 				SG_LOG(1, sfp, "%s: %s %u, %s\n", __func__,
 				       rip, k, "no IMMED with COMPLETE_B4");
 				return -ERANGE;
 			}
 		}
 		if (!sg_fd_is_shared(sfp)) {
-			if (flags & SGV4_FLAG_SHARE) {
+			if (unlikely(flags & SGV4_FLAG_SHARE)) {
 				SG_LOG(1, sfp, "%s: %s %u, no share\n",
 				       __func__, rip, k);
 				return -ERANGE;
-			} else if (flags & SGV4_FLAG_DO_ON_OTHER) {
+			} else if (unlikely(flags & SGV4_FLAG_DO_ON_OTHER)) {
 				SG_LOG(1, sfp, "%s: %s %u, %s do on\n",
 				       __func__, rip, k, "no other fd to");
 				return -ERANGE;
@@ -1188,13 +1188,13 @@ sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool blocking)
 		cdb_mxlen = 0;
 	}
 
-	if (unlikely(SG_IS_DETACHING(sdp)))
+	if (SG_IS_DETACHING(sdp))
 		return -ENODEV;
 	else if (unlikely(o_sfp && SG_IS_DETACHING((o_sfp->parentdp))))
 		return -ENODEV;
 
 	a_hds = kcalloc(tot_reqs, SZ_SG_IO_V4, GFP_KERNEL | __GFP_NOWARN);
-	if (!a_hds)
+	if (unlikely(!a_hds))
 		return -ENOMEM;
 	n = tot_reqs * SZ_SG_IO_V4;
 	if (copy_from_user(a_hds, cuptr64(cop->dout_xferp), n)) {
@@ -1261,10 +1261,6 @@ sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool blocking)
 			}
 			break;
 		}
-		if (!srp) {
-			s_res = -EPROTO;
-			break;
-		}
 		++num_cmpl;
 		hp->info |= SG_INFO_MRQ_FINI;
 		if (stop_if && (hp->driver_status || hp->transport_status ||
@@ -1292,14 +1288,14 @@ sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool blocking)
 	if (immed)
 		goto fini;
 
-	if (res == 0 && (this_fp_sent + other_fp_sent) > 0) {
+	if (likely(res == 0 && (this_fp_sent + other_fp_sent) > 0)) {
 		s_res = sg_mrq_complets(cop, a_hds, fp, o_sfp, tot_reqs,
 					this_fp_sent, other_fp_sent);
-		if (s_res == -EFAULT || s_res == -ERESTARTSYS)
+		if (unlikely(s_res == -EFAULT || s_res == -ERESTARTSYS))
 			res = s_res;	/* this may leave orphans */
 	}
 fini:
-	if (res == 0 && !immed)
+	if (likely(res == 0) && !immed)
 		res = sg_mrq_arr_flush(cop, a_hds, tot_reqs, s_res);
 	kfree(cdb_ap);
 	kfree(a_hds);
@@ -1329,7 +1325,7 @@ sg_submit_v4(struct sg_fd *sfp, void __user *p, struct sg_io_v4 *h4p,
 		res = sg_do_multi_req(&cwr, sync);
 		if (unlikely(res))
 			return res;
-		if (p) {
+		if (likely(p)) {
 			/* Write back sg_io_v4 object for error/warning info */
 			if (copy_to_user(p, h4p, SZ_SG_IO_V4))
 				return -EFAULT;
@@ -1368,8 +1364,8 @@ sg_submit_v4(struct sg_fd *sfp, void __user *p, struct sg_io_v4 *h4p,
 		u64 gen_tag = srp->tag;
 		struct sg_io_v4 __user *h4_up = (struct sg_io_v4 __user *)p;
 
-		if (unlikely(copy_to_user(&h4_up->generated_tag, &gen_tag,
-					  sizeof(gen_tag))))
+		if (copy_to_user(&h4_up->generated_tag, &gen_tag,
+				 sizeof(gen_tag)))
 			return -EFAULT;
 	}
 	return res;
@@ -1384,11 +1380,11 @@ sg_ctl_iosubmit(struct sg_fd *sfp, void __user *p)
 	struct sg_device *sdp = sfp->parentdp;
 
 	res = sg_allow_if_err_recovery(sdp, SG_IS_O_NONBLOCK(sfp));
-	if (res)
+	if (unlikely(res))
 		return res;
 	if (copy_from_user(h4p, p, SZ_SG_IO_V4))
 		return -EFAULT;
-	if (h4p->guard == 'Q')
+	if (likely(h4p->guard == 'Q'))
 		return sg_submit_v4(sfp, p, h4p, false, NULL);
 	return -EPERM;
 }
@@ -1406,7 +1402,7 @@ sg_ctl_iosubmit_v3(struct sg_fd *sfp, void __user *p)
 		return res;
 	if (copy_from_user(h3p, p, SZ_SG_IO_HDR))
 		return -EFAULT;
-	if (h3p->interface_id == 'S')
+	if (likely(h3p->interface_id == 'S'))
 		return sg_submit_v3(sfp, h3p, false, NULL);
 	return -EPERM;
 }
@@ -1428,26 +1424,26 @@ sg_share_chk_flags(struct sg_fd *sfp, u32 rq_flags, int dxfer_len, int dir,
 	enum sg_shr_var sh_var = SG_SHR_NONE;
 
 	if (rq_flags & SGV4_FLAG_SHARE) {
-		if (rq_flags & SG_FLAG_DIRECT_IO)
+		if (unlikely(rq_flags & SG_FLAG_DIRECT_IO))
 			result = -EINVAL; /* since no control of data buffer */
-		else if (dxfer_len < 1)
+		else if (unlikely(dxfer_len < 1))
 			result = -ENODATA;
 		else if (is_read_side) {
 			sh_var = SG_SHR_RS_RQ;
-			if (dir != SG_DXFER_FROM_DEV)
+			if (unlikely(dir != SG_DXFER_FROM_DEV))
 				result = -ENOMSG;
 			if (rq_flags & SGV4_FLAG_NO_DXFER) {
 				/* rule out some contradictions */
-				if (rq_flags & SG_FL_MMAP_DIRECT)
+				if (unlikely(rq_flags & SG_FL_MMAP_DIRECT))
 					result = -ENODATA;
 			}
 		} else {			/* fd is write-side */
 			sh_var = SG_SHR_WS_RQ;
-			if (dir != SG_DXFER_TO_DEV)
+			if (unlikely(dir != SG_DXFER_TO_DEV))
 				result = -ENOMSG;
-			if (!(rq_flags & SGV4_FLAG_NO_DXFER))
+			if (unlikely(!(rq_flags & SGV4_FLAG_NO_DXFER)))
 				result = -ENOMSG;
-			if (rq_flags & SG_FL_MMAP_DIRECT)
+			if (unlikely(rq_flags & SG_FL_MMAP_DIRECT))
 				result = -ENODATA;
 		}
 	} else if (is_read_side) {
@@ -1515,7 +1511,7 @@ sg_rq_chg_state_ulck(struct sg_request *srp, enum sg_rq_state old_st,
 		sg_rq_state_mul2arr[(int)new_st];
 	act_old_st = (enum sg_rq_state)atomic_cmpxchg(&srp->rq_st, old_st,
 						      new_st);
-	if (act_old_st != old_st) {
+	if (unlikely(act_old_st != old_st)) {
 #if IS_ENABLED(SG_LOG_ACTIVE)
 		SG_LOG(1, srp->parentfp, "%s: unexpected old state: %s\n",
 		       __func__, sg_rq_st_str(act_old_st, false));
@@ -1671,14 +1667,14 @@ sg_common_write(struct sg_comm_wr_t *cwrp)
 	struct sg_io_hdr *hi_p;
 	struct sg_io_v4 *h4p;
 
-	if (test_bit(SG_FRQ_IS_V4I, cwrp->frq_bm)) {
+	if (likely(test_bit(SG_FRQ_IS_V4I, cwrp->frq_bm))) {
 		h4p = cwrp->h4p;
 		hi_p = NULL;
 		dxfr_len = 0;
 		dir = SG_DXFER_NONE;
 		rq_flags = h4p->flags;
 		pack_id = h4p->request_extra;
-		if (h4p->din_xfer_len && h4p->dout_xfer_len) {
+		if (unlikely(h4p->din_xfer_len && h4p->dout_xfer_len)) {
 			return ERR_PTR(-EOPNOTSUPP);
 		} else if (h4p->din_xfer_len) {
 			dxfr_len = h4p->din_xfer_len;
@@ -1695,7 +1691,7 @@ sg_common_write(struct sg_comm_wr_t *cwrp)
 		rq_flags = hi_p->flags;
 		pack_id = hi_p->pack_id;
 	}
-	if (rq_flags & SGV4_FLAG_MULTIPLE_REQS)
+	if (unlikely(rq_flags & SGV4_FLAG_MULTIPLE_REQS))
 		return ERR_PTR(-ERANGE);
 	if (sg_fd_is_shared(fp)) {
 		res = sg_share_chk_flags(fp, rq_flags, dxfr_len, dir, &sh_var);
@@ -1706,7 +1702,7 @@ sg_common_write(struct sg_comm_wr_t *cwrp)
 		if (rq_flags & SGV4_FLAG_SHARE)
 			return ERR_PTR(-ENOMSG);
 	}
-	if (dxfr_len >= SZ_256M)
+	if (unlikely(dxfr_len >= SZ_256M))
 		return ERR_PTR(-EINVAL);
 
 	srp = sg_setup_req(cwrp, sh_var, dxfr_len);
@@ -1715,7 +1711,7 @@ sg_common_write(struct sg_comm_wr_t *cwrp)
 	srp->rq_flags = rq_flags;
 	srp->pack_id = pack_id;
 
-	if (h4p) {
+	if (likely(h4p)) {
 		srp->s_hdr4.usr_ptr = h4p->usr_ptr;
 		srp->s_hdr4.sbp = uptr64(h4p->response);
 		srp->s_hdr4.max_sb_len = h4p->max_response_len;
@@ -1726,11 +1722,11 @@ sg_common_write(struct sg_comm_wr_t *cwrp)
 		memcpy(&srp->s_hdr3, hi_p, sizeof(srp->s_hdr3));
 	}
 	res = sg_start_req(srp, cwrp, dir);
-	if (res < 0)		/* probably out of space --> -ENOMEM */
+	if (unlikely(res < 0))	/* probably out of space --> -ENOMEM */
 		goto err_out;
 	SG_LOG(4, fp, "%s: opcode=0x%02x, cdb_sz=%d, pack_id=%d\n", __func__,
 	       srp->cmd_opcode, cwrp->cmd_len, pack_id);
-	if (unlikely(SG_IS_DETACHING(sdp))) {
+	if (SG_IS_DETACHING(sdp)) {
 		res = -ENODEV;
 		goto err_out;
 	}
@@ -1761,7 +1757,7 @@ sg_get_ready_srp(struct sg_fd *sfp, struct sg_request **srpp, int id,
 {
 	struct sg_request *srp;
 
-	if (unlikely(SG_IS_DETACHING(sfp->parentdp))) {
+	if (SG_IS_DETACHING(sfp->parentdp)) {
 		*srpp = ERR_PTR(-ENODEV);
 		return true;
 	}
@@ -1782,8 +1778,8 @@ sg_copy_sense(struct sg_request *srp, bool v4_active)
 
 	/* If need be, copy the sense buffer to the user space */
 	scsi_stat = srp->rq_result & 0xff;
-	if ((scsi_stat & SAM_STAT_CHECK_CONDITION) ||
-	    (driver_byte(srp->rq_result) & DRIVER_SENSE)) {
+	if (unlikely((scsi_stat & SAM_STAT_CHECK_CONDITION) ||
+		     (driver_byte(srp->rq_result) & DRIVER_SENSE))) {
 		int sb_len = min_t(int, SCSI_SENSE_BUFFERSIZE, srp->sense_len);
 		int mx_sb_len;
 		u8 *sbp = srp->sense_bp;
@@ -1821,12 +1817,12 @@ sg_rec_state_v3v4(struct sg_fd *sfp, struct sg_request *srp, bool v4_active)
 	if (unlikely(srp->rq_result & 0xff)) {
 		int sb_len_wr = sg_copy_sense(srp, v4_active);
 
-		if (sb_len_wr < 0)
+		if (unlikely(sb_len_wr < 0))
 			return sb_len_wr;
 	}
 	if (rq_res & SG_ML_RESULT_MSK)
 		srp->rq_info |= SG_INFO_CHECK;
-	if (test_bit(SG_FRQ_ABORTING, srp->frq_bm))
+	if (unlikely(test_bit(SG_FRQ_ABORTING, srp->frq_bm)))
 		srp->rq_info |= SG_INFO_ABORTED;
 
 	sh_sfp = sg_fd_share_ptr(sfp);
@@ -1856,7 +1852,7 @@ sg_rec_state_v3v4(struct sg_fd *sfp, struct sg_request *srp, bool v4_active)
 			break;	/* nothing to do */
 		}
 	}
-	if (unlikely(SG_IS_DETACHING(sfp->parentdp)))
+	if (SG_IS_DETACHING(sfp->parentdp))
 		srp->rq_info |= SG_INFO_DEVICE_DETACHING;
 	return err;
 }
@@ -1873,7 +1869,8 @@ sg_complete_v3v4(struct sg_fd *sfp, struct sg_request *srp, bool other_err)
 			int poll_type = POLL_OUT;
 			struct sg_fd *ws_sfp = sg_fd_share_ptr(sfp);
 
-			if ((srp->rq_result & SG_ML_RESULT_MSK) || other_err) {
+			if (unlikely((srp->rq_result & SG_ML_RESULT_MSK) ||
+				     other_err)) {
 				set_bit(SG_FFD_READ_SIDE_ERR, sfp->ffd_bm);
 				if (sr_st != SG_RQ_BUSY)
 					sg_rq_chg_state_force(srp, SG_RQ_BUSY);
@@ -1892,7 +1889,7 @@ sg_complete_v3v4(struct sg_fd *sfp, struct sg_request *srp, bool other_err)
 		{
 			struct sg_fd *rs_sfp = sg_fd_share_ptr(sfp);
 
-			if (rs_sfp) {
+			if (likely(rs_sfp)) {
 				rs_sfp->ws_srp = NULL;
 				if (rs_sfp->rsv_srp)
 					rs_sfp->rsv_srp->sh_var =
@@ -1955,7 +1952,7 @@ sg_receive_v4(struct sg_fd *sfp, struct sg_request *srp, void __user *p,
 	sg_complete_v3v4(sfp, srp, err < 0);
 	sg_finish_scsi_blk_rq(srp);
 	sg_deact_request(sfp, srp);
-	return err < 0 ? err : 0;
+	return unlikely(err < 0) ? err : 0;
 }
 
 /*
@@ -2017,16 +2014,16 @@ sg_mrq_ioreceive(struct sg_fd *sfp, struct sg_io_v4 *cop, void __user *p,
 
 	SG_LOG(3, sfp, "%s: non_block=%d\n", __func__, !!non_block);
 	n = cop->din_xfer_len;
-	if (n > SG_MAX_MULTI_REQ_SZ)
+	if (unlikely(n > SG_MAX_MULTI_REQ_SZ))
 		return -E2BIG;
-	if (!cop->din_xferp || n < SZ_SG_IO_V4 || (n % SZ_SG_IO_V4))
+	if (unlikely(!cop->din_xferp || n < SZ_SG_IO_V4 || (n % SZ_SG_IO_V4)))
 		return -ERANGE;
 	n /= SZ_SG_IO_V4;
 	len = n * SZ_SG_IO_V4;
 	SG_LOG(3, sfp, "%s: %s, num_reqs=%u\n", __func__,
 	       (non_block ? "IMMED" : "blocking"), n);
 	rsp_v4_arr = kcalloc(n, SZ_SG_IO_V4, GFP_KERNEL);
-	if (!rsp_v4_arr)
+	if (unlikely(!rsp_v4_arr))
 		return -ENOMEM;
 
 	sg_sgv4_out_zero(cop);
@@ -2040,7 +2037,7 @@ sg_mrq_ioreceive(struct sg_fd *sfp, struct sg_io_v4 *cop, void __user *p,
 		return -EFAULT;
 	res = 0;
 	pp = uptr64(cop->din_xferp);
-	if (pp) {
+	if (likely(pp)) {
 		if (copy_to_user(pp, rsp_v4_arr, len))
 			res = -EFAULT;
 	} else {
@@ -2066,19 +2063,20 @@ sg_ctl_ioreceive(struct sg_fd *sfp, void __user *p)
 	int res, id;
 	int pack_id = SG_PACK_ID_WILDCARD;
 	int tag = SG_TAG_WILDCARD;
-	u8 v4_holder[SZ_SG_IO_V4];
-	struct sg_io_v4 *h4p = (struct sg_io_v4 *)v4_holder;
+	struct sg_io_v4 h4;
+	struct sg_io_v4 *h4p = &h4;
 	struct sg_device *sdp = sfp->parentdp;
 	struct sg_request *srp;
 
 	res = sg_allow_if_err_recovery(sdp, non_block);
-	if (res)
+	if (unlikely(res))
 		return res;
 	/* Get first three 32 bit integers: guard, proto+subproto */
 	if (copy_from_user(h4p, p, SZ_SG_IO_V4))
 		return -EFAULT;
 	/* for v4: protocol=0 --> SCSI;  subprotocol=0 --> SPC++ */
-	if (h4p->guard != 'Q' || h4p->protocol != 0 || h4p->subprotocol != 0)
+	if (unlikely(h4p->guard != 'Q' || h4p->protocol != 0 ||
+		     h4p->subprotocol != 0))
 		return -EPERM;
 	if (h4p->flags & SGV4_FLAG_IMMED)
 		non_block = true;	/* set by either this or O_NONBLOCK */
@@ -2097,14 +2095,14 @@ sg_ctl_ioreceive(struct sg_fd *sfp, void __user *p)
 try_again:
 	srp = sg_find_srp_by_id(sfp, id, use_tag);
 	if (!srp) {     /* nothing available so wait on packet or */
-		if (unlikely(SG_IS_DETACHING(sdp)))
+		if (SG_IS_DETACHING(sdp))
 			return -ENODEV;
 		if (non_block)
 			return -EAGAIN;
 		res = wait_event_interruptible
 				(sfp->cmpl_wait,
 				 sg_get_ready_srp(sfp, &srp, id, use_tag));
-		if (res)
+		if (unlikely(res))
 			return res;	/* signal --> -ERESTARTSYS */
 		if (IS_ERR(srp))
 			return PTR_ERR(srp);
@@ -2129,8 +2127,8 @@ sg_ctl_ioreceive_v3(struct sg_fd *sfp, void __user *p)
 	bool non_block = SG_IS_O_NONBLOCK(sfp);
 	int res;
 	int pack_id = SG_PACK_ID_WILDCARD;
-	u8 v3_holder[SZ_SG_IO_HDR];
-	struct sg_io_hdr *h3p = (struct sg_io_hdr *)v3_holder;
+	struct sg_io_hdr h3;
+	struct sg_io_hdr *h3p = &h3;
 	struct sg_device *sdp = sfp->parentdp;
 	struct sg_request *srp;
 
@@ -2141,12 +2139,12 @@ sg_ctl_ioreceive_v3(struct sg_fd *sfp, void __user *p)
 	if (copy_from_user(h3p, p, SZ_SG_IO_HDR))
 		return -EFAULT;
 	/* for v3: interface_id=='S' (in a 32 bit int) */
-	if (h3p->interface_id != 'S')
+	if (unlikely(h3p->interface_id != 'S'))
 		return -EPERM;
 	if (h3p->flags & SGV4_FLAG_IMMED)
 		non_block = true;	/* set by either this or O_NONBLOCK */
 	SG_LOG(3, sfp, "%s: non_block(+IMMED)=%d\n", __func__, non_block);
-	if (h3p->flags & SGV4_FLAG_MULTIPLE_REQS)
+	if (unlikely(h3p->flags & SGV4_FLAG_MULTIPLE_REQS))
 		return -EINVAL;
 
 	if (test_bit(SG_FFD_FORCE_PACKID, sfp->ffd_bm))
@@ -2154,7 +2152,7 @@ sg_ctl_ioreceive_v3(struct sg_fd *sfp, void __user *p)
 try_again:
 	srp = sg_find_srp_by_id(sfp, pack_id, false);
 	if (!srp) {     /* nothing available so wait on packet or */
-		if (unlikely(SG_IS_DETACHING(sdp)))
+		if (SG_IS_DETACHING(sdp))
 			return -ENODEV;
 		if (non_block)
 			return -EAGAIN;
@@ -2193,9 +2191,9 @@ sg_read_v1v2(void __user *buf, int count, struct sg_fd *sfp,
 	h2p->target_status = status_byte(rq_result);
 	h2p->host_status = host_byte(rq_result);
 	h2p->driver_status = driver_byte(rq_result);
-	if ((CHECK_CONDITION & status_byte(rq_result)) ||
-	    (DRIVER_SENSE & driver_byte(rq_result))) {
-		if (srp->sense_bp) {
+	if (unlikely((CHECK_CONDITION & status_byte(rq_result)) ||
+		     (DRIVER_SENSE & driver_byte(rq_result)))) {
+		if (likely(srp->sense_bp)) {
 			u8 *sbp = srp->sense_bp;
 
 			srp->sense_bp = NULL;
@@ -2204,7 +2202,7 @@ sg_read_v1v2(void __user *buf, int count, struct sg_fd *sfp,
 			mempool_free(sbp, sg_sense_pool);
 		}
 	}
-	switch (host_byte(rq_result)) {
+	switch (host_byte(rq_result)) {
 	/*
 	 * This following setting of 'result' is for backward compatibility
 	 * and is best ignored by the user who should use target, host and
@@ -2246,7 +2244,7 @@ sg_read_v1v2(void __user *buf, int count, struct sg_fd *sfp,
 			count = h2p->reply_len;
 		if (count > SZ_SG_HEADER) {
 			res = sg_read_append(srp, buf, count - SZ_SG_HEADER);
-			if (res)
+			if (unlikely(res))
 				goto fini;
 		}
 	} else {
@@ -2282,14 +2280,14 @@ sg_read(struct file *filp, char __user *p, size_t count, loff_t *ppos)
 	 * file descriptor to free up any resources being held.
 	 */
 	ret = sg_check_file_access(filp, __func__);
-	if (ret)
+	if (unlikely(ret))
 		return ret;
 
 	sfp = filp->private_data;
 	sdp = sfp->parentdp;
 	SG_LOG(3, sfp, "%s: read() count=%d\n", __func__, (int)count);
 	ret = sg_allow_if_err_recovery(sdp, non_block);
-	if (ret)
+	if (unlikely(ret))
 		return ret;
 
 	could_be_v3 = (count >= SZ_SG_IO_HDR);
@@ -2307,12 +2305,12 @@ sg_read(struct file *filp, char __user *p, size_t count, loff_t *ppos)
 		if (h2p->reply_len < 0 && could_be_v3) {
 			struct sg_io_hdr *v3_hdr = (struct sg_io_hdr *)h2p;
 
-			if (v3_hdr->interface_id == 'S') {
+			if (likely(v3_hdr->interface_id == 'S')) {
 				struct sg_io_hdr __user *h3_up;
 
 				h3_up = (struct sg_io_hdr __user *)p;
 				ret = get_user(want_id, &h3_up->pack_id);
-				if (ret)
+				if (unlikely(ret))
 					return ret;
 				if (!non_block) {
 					int flgs;
@@ -2337,14 +2335,14 @@ sg_read(struct file *filp, char __user *p, size_t count, loff_t *ppos)
 try_again:
 	srp = sg_find_srp_by_id(sfp, want_id, false);
 	if (!srp) {	/* nothing available so wait on packet to arrive or */
-		if (unlikely(SG_IS_DETACHING(sdp)))
+		if (SG_IS_DETACHING(sdp))
 			return -ENODEV;
 		if (non_block) /* O_NONBLOCK or v3::flags & SGV4_FLAG_IMMED */
 			return -EAGAIN;
 		ret = wait_event_interruptible
 				(sfp->cmpl_wait,
 				 sg_get_ready_srp(sfp, &srp, want_id, false));
-		if (ret)	/* -ERESTARTSYS as signal hit process */
+		if (unlikely(ret))  /* -ERESTARTSYS as signal hit process */
 			return ret;
 		if (IS_ERR(srp))
 			return PTR_ERR(srp);
@@ -2365,9 +2363,11 @@ sg_read(struct file *filp, char __user *p, size_t count, loff_t *ppos)
 		}
 		ret = sg_receive_v3(sfp, srp, p);
 	}
-	if (ret < 0)
+#if IS_ENABLED(SG_LOG_ACTIVE)
+	if (unlikely(ret < 0))
 		SG_LOG(1, sfp, "%s: negated errno: %d\n", __func__, ret);
-	return ret < 0 ? ret : (int)count;
+#endif
+	return unlikely(ret < 0) ? ret : (int)count;
 }
 
 /*
@@ -2496,7 +2496,7 @@ sg_change_after_read_side_rq(struct sg_fd *sfp, bool fini1_again0)
 		case SG_RQ_BUSY:
 			res = -EAGAIN;
 			break;
-		default:	/* read-side in SG_RQ_SHR_SWAIT is bad */
+		default:
 			res = -EINVAL;
 			break;
 		}
@@ -2589,14 +2589,15 @@ sg_remove_sfp_share(struct sg_fd *sfp, bool is_rd_side)
 	if (is_rd_side) {
 		bool set_inactive = false;
 
-		if (!xa_get_mark(xadp, sfp->idx, SG_XA_FD_RS_SHARE)) {
+		if (unlikely(!xa_get_mark(xadp, sfp->idx,
+					  SG_XA_FD_RS_SHARE))) {
 			xa_unlock_irqrestore(xadp, iflags);
 			return;
 		}
 		rsv_srp = sfp->rsv_srp;
-		if (!rsv_srp)
+		if (unlikely(!rsv_srp))
 			goto fini;
-		if (rsv_srp->sh_var != SG_SHR_RS_RQ)
+		if (unlikely(rsv_srp->sh_var != SG_SHR_RS_RQ))
 			goto fini;
 		sr_st = atomic_read(&rsv_srp->rq_st);
 		switch (sr_st) {
@@ -2626,7 +2627,7 @@ sg_remove_sfp_share(struct sg_fd *sfp, bool is_rd_side)
 			sg_unshare_ws_fd(sh_sfp, sdp != sh_sdp);
 		sg_unshare_rs_fd(sfp, false);
 	} else {
-		if (!sg_fd_is_shared(sfp)) {
+		if (unlikely(!sg_fd_is_shared(sfp))) {
 			xa_unlock_irqrestore(xadp, iflags);
 			return;
 		} else if (!xa_get_mark(&sh_sdp->sfp_arr, sh_sfp->idx,
@@ -2788,8 +2789,7 @@ sg_fill_request_element(struct sg_fd *sfp, struct sg_request *srp,
 static inline bool
 sg_rq_landed(struct sg_device *sdp, struct sg_request *srp)
 {
-	return atomic_read_acquire(&srp->rq_st) != SG_RQ_INFLIGHT ||
-	       unlikely(SG_IS_DETACHING(sdp));
+	return atomic_read_acquire(&srp->rq_st) != SG_RQ_INFLIGHT || SG_IS_DETACHING(sdp);
 }
 
 /*
@@ -2827,7 +2827,7 @@ sg_wait_event_srp(struct sg_fd *sfp, void __user *p, struct sg_io_v4 *h4p,
 		return res;
 	}
 skip_wait:
-	if (unlikely(SG_IS_DETACHING(sdp))) {
+	if (SG_IS_DETACHING(sdp)) {
 		sg_rq_chg_state_force(srp, SG_RQ_INACTIVE);
 		atomic_inc(&sfp->inactives);
 		return -ENODEV;
@@ -2865,15 +2865,25 @@ sg_ctl_sg_io(struct sg_device *sdp, struct sg_fd *sfp, void __user *p)
 	SG_LOG(3, sfp, "%s:  SG_IO%s\n", __func__,
 	       (SG_IS_O_NONBLOCK(sfp) ? " O_NONBLOCK ignored" : ""));
 	res = sg_allow_if_err_recovery(sdp, false);
-	if (res)
+	if (unlikely(res))
 		return res;
-	if (get_sg_io_hdr(h3p, p))
+	if (unlikely(get_sg_io_hdr(h3p, p)))
 		return -EFAULT;
 	if (h3p->interface_id == 'Q') {
 		/* copy in rest of sg_io_v4 object */
-		if (copy_from_user(hu8arr + SZ_SG_IO_HDR,
-				   ((u8 __user *)p) + SZ_SG_IO_HDR,
-				   SZ_SG_IO_V4 - SZ_SG_IO_HDR))
+		int v3_len;
+
+#ifdef CONFIG_COMPAT
+		if (in_compat_syscall())
+			v3_len = sizeof(struct compat_sg_io_hdr);
+		else
+			v3_len = SZ_SG_IO_HDR;
+#else
+		v3_len = SZ_SG_IO_HDR;
+#endif
+		if (copy_from_user(hu8arr + v3_len,
+				   ((u8 __user *)p) + v3_len,
+				   SZ_SG_IO_V4 - v3_len))
 			return -EFAULT;
 		res = sg_submit_v4(sfp, p, h4p, true, &srp);
 	} else if (h3p->interface_id == 'S') {
@@ -2932,6 +2942,61 @@ sg_match_request(struct sg_fd *sfp, bool use_tag, int id)
 	return NULL;
 }
 
+static int
+sg_abort_req(struct sg_fd *sfp, struct sg_request *srp)
+		__must_hold(&sfp->srp_arr.xa_lock)
+{
+	int res = 0;
+	enum sg_rq_state rq_st;
+
+	if (test_and_set_bit(SG_FRQ_ABORTING, srp->frq_bm)) {
+		SG_LOG(1, sfp, "%s: already aborting req pack_id/tag=%d/%d\n",
+		       __func__, srp->pack_id, srp->tag);
+		goto fini;	/* already being aborted: nothing more to do */
+	}
+	rq_st = atomic_read(&srp->rq_st);
+	SG_LOG(3, sfp, "%s: req pack_id/tag=%d/%d, status=%s\n", __func__,
+	       srp->pack_id, srp->tag, sg_rq_st_str(rq_st, false));
+	switch (rq_st) {
+	case SG_RQ_BUSY:
+		clear_bit(SG_FRQ_ABORTING, srp->frq_bm);
+		res = -EBUSY;	/* should not occur often */
+		break;
+	case SG_RQ_INACTIVE:	/* perhaps done already */
+		clear_bit(SG_FRQ_ABORTING, srp->frq_bm);
+		break;
+	case SG_RQ_AWAIT_RCV:	/* user should still do completion */
+	case SG_RQ_SHR_SWAP:
+	case SG_RQ_SHR_IN_WS:
+		clear_bit(SG_FRQ_ABORTING, srp->frq_bm);
+		break;		/* nothing to do here, return 0 */
+	case SG_RQ_INFLIGHT:	/* only attempt abort if inflight */
+		srp->rq_result |= (DRIVER_SOFT << 24);
+		{
+			struct request *rqq = READ_ONCE(srp->rqq);
+
+			if (likely(rqq)) {
+				SG_LOG(5, sfp, "%s: -->blk_abort_request srp=0x%pK\n",
+				       __func__, srp);
+				blk_abort_request(rqq);
+			}
+		}
+		break;
+	default:
+		clear_bit(SG_FRQ_ABORTING, srp->frq_bm);
+		break;
+	}
+fini:
+	return res;
+}
+
+/*
+ * Tries to abort an inflight request/command. First it checks the current fd
+ * for a match on pack_id or tag. If there is a match, aborts that match.
+ * Otherwise, if SGV4_FLAG_DEV_SCOPE is set, the rest of the file descriptors
+ * belonging to the current device are similarly checked. If there is no match
+ * then -ENODATA is returned.
+ */
 static int
 sg_ctl_abort(struct sg_device *sdp, struct sg_fd *sfp, void __user *p)
 		__must_hold(sfp->f_mutex)
@@ -2973,37 +3038,7 @@ sg_ctl_abort(struct sg_device *sdp, struct sg_fd *sfp, void __user *p)
 		if (!srp)
 			return -ENODATA;
 	}
-
-	if (test_and_set_bit(SG_FRQ_ABORTING, srp->frq_bm))
-		goto fini;
-
-	switch (atomic_read(&srp->rq_st)) {
-	case SG_RQ_BUSY:
-		clear_bit(SG_FRQ_ABORTING, srp->frq_bm);
-		res = -EBUSY;	/* should not occur often */
-		break;
-	case SG_RQ_INACTIVE:	/* perhaps done already */
-		clear_bit(SG_FRQ_ABORTING, srp->frq_bm);
-		break;
-	case SG_RQ_AWAIT_RCV:	/* user should still do completion */
-	case SG_RQ_SHR_SWAP:
-	case SG_RQ_SHR_IN_WS:
-		clear_bit(SG_FRQ_ABORTING, srp->frq_bm);
-		break;		/* nothing to do here, return 0 */
-	case SG_RQ_INFLIGHT:	/* only attempt abort if inflight */
-		srp->rq_result |= (DRIVER_SOFT << 24);
-		{
-			struct request *rqq = READ_ONCE(srp->rqq);
-
-			if (rqq)
-				blk_abort_request(rqq);
-		}
-		break;
-	default:
-		clear_bit(SG_FRQ_ABORTING, srp->frq_bm);
-		break;
-	}
-fini:
+	res = sg_abort_req(sfp, srp);
 	xa_unlock_irqrestore(&sfp->srp_arr, iflags);
 	return res;
 }
@@ -3102,7 +3137,7 @@ sg_find_sfp_by_fd(const struct file *search_for, int search_fd,
 				continue;       /* not this one */
 			res = sg_find_sfp_helper(from_sfp, sfp,
 						 from_is_rd_side, search_fd);
-			if (res == 0) {
+			if (likely(res == 0)) {
 				found = true;
 				break;
 			}
@@ -3217,6 +3252,7 @@ sg_fd_reshare(struct sg_fd *rs_sfp, int new_ws_fd)
 	bool found = false;
 	int res = 0;
 	int retry_count = 0;
+	enum sg_rq_state rq_st;
 	struct file *filp;
 	struct sg_fd *ws_sfp = sg_fd_share_ptr(rs_sfp);
 
@@ -3228,6 +3264,17 @@ sg_fd_reshare(struct sg_fd *rs_sfp, int new_ws_fd)
 	if (unlikely(!xa_get_mark(&rs_sfp->parentdp->sfp_arr, rs_sfp->idx,
 				  SG_XA_FD_RS_SHARE)))
 		return -EINVAL;
+	if (unlikely(!ws_sfp))
+		return -EINVAL;
+	if (unlikely(!rs_sfp->rsv_srp))
+		return -EPROTO;	/* internal error: avoid NULL deref below */
+	rq_st = atomic_read(&rs_sfp->rsv_srp->rq_st);
+	if (!(rq_st == SG_RQ_INACTIVE || rq_st == SG_RQ_SHR_SWAP))
+		res = -EBUSY;		/* read-side reserve buffer busy */
+	if (rs_sfp->ws_srp)
+		res = -EBUSY;	/* previous write-side request not finished */
+	if (unlikely(res))
+		return res;
 
 	/* Alternate approach: fcheck_files(current->files, m_fd) */
 	filp = fget(new_ws_fd);
@@ -3289,7 +3336,7 @@ sg_set_reserved_sz(struct sg_fd *sfp, int want_rsv_sz)
 	if (unlikely(sg_fd_is_shared(sfp)))
 		return -EBUSY;	/* this fd can't be either side of share */
 	o_srp = sfp->rsv_srp;
-	if (!o_srp)
+	if (unlikely(!o_srp))
 		return -EPROTO;
 	new_sz = min_t(int, want_rsv_sz, sdp->max_sgat_sz);
 	new_sz = max_t(int, new_sz, sfp->sgat_elem_sz);
@@ -3304,11 +3351,11 @@ sg_set_reserved_sz(struct sg_fd *sfp, int want_rsv_sz)
 	/* new sg_request object, sized correctly is now available */
 try_again:
 	o_srp = sfp->rsv_srp;
-	if (!o_srp) {
+	if (unlikely(!o_srp)) {
 		res = -EPROTO;
 		goto fini;
 	}
-	if (SG_RQ_ACTIVE(o_srp) || sfp->mmap_sz > 0) {
+	if (unlikely(SG_RQ_ACTIVE(o_srp) || sfp->mmap_sz > 0)) {
 		res = -EBUSY;
 		goto fini;
 	}
@@ -3615,7 +3662,7 @@ sg_ctl_extended(struct sg_fd *sfp, void __user *p)
 	s_wr_mask = seip->sei_wr_mask;
 	s_rd_mask = seip->sei_rd_mask;
 	or_masks = s_wr_mask | s_rd_mask;
-	if (or_masks == 0) {
+	if (unlikely(or_masks == 0)) {
 		SG_LOG(2, sfp, "%s: both masks 0, do nothing\n", __func__);
 		return 0;
 	}
@@ -3653,7 +3700,7 @@ sg_ctl_extended(struct sg_fd *sfp, void __user *p)
 		mutex_lock(&sfp->f_mutex);
 		if (s_wr_mask & SG_SEIM_SHARE_FD) {
 			result = sg_fd_share(sfp, (int)seip->share_fd);
-			if (ret == 0 && result)
+			if (ret == 0 && unlikely(result))
 				ret = result;
 		}
 		/* if share then yield device number of (other) read-side */
@@ -3670,7 +3717,7 @@ sg_ctl_extended(struct sg_fd *sfp, void __user *p)
 		mutex_lock(&sfp->f_mutex);
 		if (s_wr_mask & SG_SEIM_CHG_SHARE_FD) {
 			result = sg_fd_reshare(sfp, (int)seip->share_fd);
-			if (ret == 0 && result)
+			if (ret == 0 && unlikely(result))
 				ret = result;
 		}
 		/* if share then yield device number of (other) write-side */
@@ -3718,7 +3765,7 @@ sg_ctl_extended(struct sg_fd *sfp, void __user *p)
 	if (s_wr_mask & SG_SEIM_RESERVED_SIZE) {
 		mutex_lock(&sfp->f_mutex);
 		result = sg_set_reserved_sz(sfp, (int)seip->reserved_sz);
-		if (ret == 0 && result)
+		if (ret == 0 && unlikely(result))
 			ret = result;
 		mutex_unlock(&sfp->f_mutex);
 	}
@@ -3743,7 +3790,8 @@ sg_ctl_extended(struct sg_fd *sfp, void __user *p)
 static int
 sg_ctl_req_tbl(struct sg_fd *sfp, void __user *p)
 {
-	int k, result, val;
+	int k, val;
+	int result = 0;
 	unsigned long idx;
 	struct sg_request *srp;
 	struct sg_req_info *rinfop;
@@ -3751,11 +3799,11 @@ sg_ctl_req_tbl(struct sg_fd *sfp, void __user *p)
 	SG_LOG(3, sfp, "%s:    SG_GET_REQUEST_TABLE\n", __func__);
 	k = SG_MAX_QUEUE;
 	rinfop = kcalloc(k, SZ_SG_REQ_INFO, GFP_KERNEL);
-	if (!rinfop)
+	if (unlikely(!rinfop))
 		return -ENOMEM;
 	val = 0;
 	xa_for_each(&sfp->srp_arr, idx, srp) {
-		if (val >= SG_MAX_QUEUE)
+		if (unlikely(val >= SG_MAX_QUEUE))
 			break;
 		if (xa_get_mark(&sfp->srp_arr, idx, SG_XA_RQ_INACTIVE))
 			continue;
@@ -3763,7 +3811,7 @@ sg_ctl_req_tbl(struct sg_fd *sfp, void __user *p)
 		val++;
 	}
 	xa_for_each(&sfp->srp_arr, idx, srp) {
-		if (val >= SG_MAX_QUEUE)
+		if (unlikely(val >= SG_MAX_QUEUE))
 			break;
 		if (!xa_get_mark(&sfp->srp_arr, idx, SG_XA_RQ_INACTIVE))
 			continue;
@@ -3825,34 +3873,34 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 
 	switch (cmd_in) {
 	case SG_IO:
-		if (unlikely(SG_IS_DETACHING(sdp)))
+		if (SG_IS_DETACHING(sdp))
 			return -ENODEV;
 		return sg_ctl_sg_io(sdp, sfp, p);
 	case SG_IOSUBMIT:
 		SG_LOG(3, sfp, "%s:    SG_IOSUBMIT\n", __func__);
-		if (unlikely(SG_IS_DETACHING(sdp)))
+		if (SG_IS_DETACHING(sdp))
 			return -ENODEV;
 		return sg_ctl_iosubmit(sfp, p);
 	case SG_IOSUBMIT_V3:
 		SG_LOG(3, sfp, "%s:    SG_IOSUBMIT_V3\n", __func__);
-		if (unlikely(SG_IS_DETACHING(sdp)))
+		if (SG_IS_DETACHING(sdp))
 			return -ENODEV;
 		return sg_ctl_iosubmit_v3(sfp, p);
 	case SG_IORECEIVE:
 		SG_LOG(3, sfp, "%s:    SG_IORECEIVE\n", __func__);
-		if (unlikely(SG_IS_DETACHING(sdp)))
+		if (SG_IS_DETACHING(sdp))
 			return -ENODEV;
 		return sg_ctl_ioreceive(sfp, p);
 	case SG_IORECEIVE_V3:
 		SG_LOG(3, sfp, "%s:    SG_IORECEIVE_V3\n", __func__);
-		if (unlikely(SG_IS_DETACHING(sdp)))
+		if (SG_IS_DETACHING(sdp))
 			return -ENODEV;
 		return sg_ctl_ioreceive_v3(sfp, p);
 	case SG_IOABORT:
 		SG_LOG(3, sfp, "%s:    SG_IOABORT\n", __func__);
-		if (unlikely(SG_IS_DETACHING(sdp)))
+		if (SG_IS_DETACHING(sdp))
 			return -ENODEV;
-		if (read_only)
+		if (unlikely(read_only))
 			return -EPERM;
 		mutex_lock(&sfp->f_mutex);
 		res = sg_ctl_abort(sdp, sfp, p);
@@ -3866,7 +3914,7 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 	case SG_SET_FORCE_PACK_ID:
 		SG_LOG(3, sfp, "%s:    SG_SET_FORCE_PACK_ID\n", __func__);
 		res = get_user(val, ip);
-		if (res)
+		if (unlikely(res))
 			return res;
 		assign_bit(SG_FFD_FORCE_PACKID, sfp->ffd_bm, !!val);
 		return 0;
@@ -3905,8 +3953,8 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 		return put_user(sdp->max_sgat_sz, ip);
 	case SG_SET_RESERVED_SIZE:
 		res = get_user(val, ip);
-		if (!res) {
-			if (val >= 0 && val <= (1024 * 1024 * 1024)) {
+		if (likely(!res)) {
+			if (likely(val >= 0 && val <= (1024 * 1024 * 1024))) {
 				mutex_lock(&sfp->f_mutex);
 				res = sg_set_reserved_sz(sfp, val);
 				mutex_unlock(&sfp->f_mutex);
@@ -3921,14 +3969,14 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 		val = min_t(int, sfp->rsv_srp->sgatp->buflen,
 			    sdp->max_sgat_sz);
 		mutex_unlock(&sfp->f_mutex);
-		SG_LOG(3, sfp, "%s:    SG_GET_RESERVED_SIZE=%d\n",
-		       __func__, val);
+		SG_LOG(3, sfp, "%s:    SG_GET_RESERVED_SIZE=%d\n", __func__,
+		       val);
 		res = put_user(val, ip);
 		return res;
 	case SG_SET_COMMAND_Q:
 		SG_LOG(3, sfp, "%s:    SG_SET_COMMAND_Q\n", __func__);
 		res = get_user(val, ip);
-		if (res)
+		if (unlikely(res))
 			return res;
 		assign_bit(SG_FFD_CMD_Q, sfp->ffd_bm, !!val);
 		return 0;
@@ -3938,7 +3986,7 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 	case SG_SET_KEEP_ORPHAN:
 		SG_LOG(3, sfp, "%s:    SG_SET_KEEP_ORPHAN\n", __func__);
 		res = get_user(val, ip);
-		if (res)
+		if (unlikely(res))
 			return res;
 		assign_bit(SG_FFD_KEEP_ORPHAN, sfp->ffd_bm, !!val);
 		return 0;
@@ -3957,9 +4005,9 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 	case SG_SET_TIMEOUT:
 		SG_LOG(3, sfp, "%s:    SG_SET_TIMEOUT\n", __func__);
 		res = get_user(val, ip);
-		if (res)
+		if (unlikely(res))
 			return res;
-		if (val < 0)
+		if (unlikely(val < 0))
 			return -EIO;
 		if (val >= mult_frac((s64)INT_MAX, USER_HZ, HZ))
 			val = min_t(s64, mult_frac((s64)INT_MAX, USER_HZ, HZ),
@@ -3985,9 +4033,9 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 	case SG_NEXT_CMD_LEN:	/* active only in v2 interface */
 		SG_LOG(3, sfp, "%s:    SG_NEXT_CMD_LEN\n", __func__);
 		res = get_user(val, ip);
-		if (res)
+		if (unlikely(res))
 			return res;
-		if (val > SG_MAX_CDB_SIZE)
+		if (unlikely(val > SG_MAX_CDB_SIZE))
 			return -ENOMEM;
 		mutex_lock(&sfp->f_mutex);
 		sfp->next_cmd_len = max_t(int, val, 0);
@@ -4000,7 +4048,7 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 		return put_user(val, ip);
 	case SG_EMULATED_HOST:
 		SG_LOG(3, sfp, "%s:    SG_EMULATED_HOST\n", __func__);
-		if (unlikely(SG_IS_DETACHING(sdp)))
+		if (SG_IS_DETACHING(sdp))
 			return -ENODEV;
 		return put_user(sdev->host->hostt->emulated, ip);
 	case SCSI_IOCTL_SEND_COMMAND:
@@ -4010,7 +4058,7 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 	case SG_SET_DEBUG:
 		SG_LOG(3, sfp, "%s:    SG_SET_DEBUG\n", __func__);
 		res = get_user(val, ip);
-		if (res)
+		if (unlikely(res))
 			return res;
 		assign_bit(SG_FDEV_LOG_SENSE, sdp->fdev_bm, !!val);
 		if (val == 0)	/* user can force recalculation */
@@ -4072,7 +4120,7 @@ sg_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg)
 
 	sfp = filp->private_data;
 	sdp = sfp->parentdp;
-	if (!sdp)
+	if (unlikely(!sdp))
 		return -ENXIO;
 
 	ret = sg_ioctl_common(filp, sdp, sfp, cmd_in, p);
@@ -4189,7 +4237,7 @@ sg_poll(struct file *filp, poll_table *wait)
 	if (num > 0)
 		p_res = EPOLLIN | EPOLLRDNORM;
 
-	if (unlikely(SG_IS_DETACHING(sfp->parentdp)))
+	if (SG_IS_DETACHING(sfp->parentdp))
 		p_res |= EPOLLHUP;
 	else if (likely(test_bit(SG_FFD_CMD_Q, sfp->ffd_bm)))
 		p_res |= EPOLLOUT | EPOLLWRNORM;
@@ -4248,29 +4296,29 @@ sg_vma_fault(struct vm_fault *vmf)
 	struct sg_fd *sfp;
 	const char *nbp = "==NULL, bad";
 
-	if (!vma) {
+	if (unlikely(!vma)) {
 		pr_warn("%s: vma%s\n", __func__, nbp);
 		goto out_err;
 	}
 	sfp = vma->vm_private_data;
-	if (!sfp) {
+	if (unlikely(!sfp)) {
 		pr_warn("%s: sfp%s\n", __func__, nbp);
 		goto out_err;
 	}
 	sdp = sfp->parentdp;
-	if (sdp && unlikely(SG_IS_DETACHING(sdp))) {
+	if (sdp && SG_IS_DETACHING(sdp)) {
 		SG_LOG(1, sfp, "%s: device detaching\n", __func__);
 		goto out_err;
 	}
 	srp = sfp->rsv_srp;
-	if (!srp) {
+	if (unlikely(!srp)) {
 		SG_LOG(1, sfp, "%s: srp%s\n", __func__, nbp);
 		goto out_err;
 	}
 	mutex_lock(&sfp->f_mutex);
 	rsv_schp = srp->sgatp;
 	offset = vmf->pgoff << PAGE_SHIFT;
-	if (offset >= (unsigned int)rsv_schp->buflen) {
+	if (unlikely(offset >= (unsigned int)rsv_schp->buflen)) {
 		SG_LOG(1, sfp, "%s: offset[%lu] >= rsv.buflen\n", __func__,
 		       offset);
 		goto out_err_unlock;
@@ -4309,10 +4357,10 @@ sg_mmap(struct file *filp, struct vm_area_struct *vma)
 	struct sg_fd *sfp;
 	struct sg_request *srp;
 
-	if (!filp || !vma)
+	if (unlikely(!filp || !vma))
 		return -ENXIO;
 	sfp = filp->private_data;
-	if (!sfp) {
+	if (unlikely(!sfp)) {
 		pr_warn("sg: %s: sfp is NULL\n", __func__);
 		return -ENXIO;
 	}
@@ -4330,7 +4378,7 @@ sg_mmap(struct file *filp, struct vm_area_struct *vma)
 		res = -EBUSY;
 		goto fini;
 	}
-	if (req_sz > SG_WRITE_COUNT_LIMIT) {	/* sanity check */
+	if (unlikely(req_sz > SG_WRITE_COUNT_LIMIT)) {	/* sanity check */
 		res = -ENOMEM;
 		goto fini;
 	}
@@ -4368,17 +4416,17 @@ sg_rq_end_io_usercontext(struct work_struct *work)
 					      ew_orph.work);
 	struct sg_fd *sfp;
 
-	if (!srp) {
+	if (unlikely(!srp)) {
 		WARN_ONCE(1, "%s: srp unexpectedly NULL\n", __func__);
 		return;
 	}
 	sfp = srp->parentfp;
-	if (!sfp) {
+	if (unlikely(!sfp)) {
 		WARN_ONCE(1, "%s: sfp unexpectedly NULL\n", __func__);
 		return;
 	}
 	SG_LOG(3, sfp, "%s: srp=0x%pK\n", __func__, srp);
-	if (test_bit(SG_FRQ_DEACT_ORPHAN, srp->frq_bm)) {
+	if (unlikely(test_bit(SG_FRQ_DEACT_ORPHAN, srp->frq_bm))) {
 		sg_finish_scsi_blk_rq(srp);	/* clean up orphan case */
 		sg_deact_request(sfp, srp);
 	}
@@ -4411,7 +4459,7 @@ sg_rq_end_io(struct request *rqq, blk_status_t status)
 	slen = min_t(int, scsi_rp->sense_len, SCSI_SENSE_BUFFERSIZE);
 	a_resid = scsi_rp->resid_len;
 
-	if (a_resid) {
+	if (unlikely(a_resid)) {
 		if (test_bit(SG_FRQ_IS_V4I, srp->frq_bm)) {
 			if (rq_data_dir(rqq) == READ)
 				srp->in_resid = a_resid;
@@ -4421,7 +4469,7 @@ sg_rq_end_io(struct request *rqq, blk_status_t status)
 			srp->in_resid = a_resid;
 		}
 	}
-	if (test_bit(SG_FRQ_ABORTING, srp->frq_bm) && rq_result == 0)
+	if (unlikely(test_bit(SG_FRQ_ABORTING, srp->frq_bm)) && rq_result == 0)
 		srp->rq_result |= (DRIVER_HARD << 24);
 
 	SG_LOG(6, sfp, "%s: pack/tag_id=%d/%d, cmd=0x%x, res=0x%x\n", __func__,
@@ -4435,11 +4483,12 @@ sg_rq_end_io(struct request *rqq, blk_status_t status)
 		    (rq_result & 0xff) == SAM_STAT_COMMAND_TERMINATED)
 			__scsi_print_sense(sdp->device, __func__, scsi_rp->sense, slen);
 	}
-	if (slen > 0) {
-		if (scsi_rp->sense && !srp->sense_bp) {
-			srp->sense_bp = mempool_alloc(sg_sense_pool,
-						      GFP_ATOMIC);
-			if (srp->sense_bp) {
+	if (unlikely(slen > 0)) {
+		if (likely(scsi_rp->sense && !srp->sense_bp)) {
+			srp->sense_bp =
+				mempool_alloc(sg_sense_pool,
+					      GFP_ATOMIC /* keep: may be atomic context */);
+			if (likely(srp->sense_bp)) {
 				memcpy(srp->sense_bp, scsi_rp->sense, slen);
 				if (slen < SCSI_SENSE_BUFFERSIZE)
 					memset(srp->sense_bp + slen, 0,
@@ -4449,7 +4498,7 @@ sg_rq_end_io(struct request *rqq, blk_status_t status)
 				pr_warn("%s: sense but can't alloc buffer\n",
 					__func__);
 			}
-		} else if (srp->sense_bp) {
+		} else if (unlikely(srp->sense_bp)) {
 			slen = 0;
 			pr_warn("%s: non-NULL srp->sense_bp ? ?\n", __func__);
 		} else {
@@ -4536,14 +4585,14 @@ sg_add_device_helper(struct gendisk *disk, struct scsi_device *scsidp)
 	unsigned long iflags;
 
 	sdp = kzalloc(sizeof(*sdp), GFP_KERNEL);
-	if (!sdp)
+	if (unlikely(!sdp))
 		return ERR_PTR(-ENOMEM);
 
 	idr_preload(GFP_KERNEL);
 	write_lock_irqsave(&sg_index_lock, iflags);
 
 	error = idr_alloc(&sg_index_idr, sdp, 0, SG_MAX_DEVS, GFP_NOWAIT);
-	if (error < 0) {
+	if (unlikely(error < 0)) {
 		if (error == -ENOSPC) {
 			sdev_printk(KERN_WARNING, scsidp,
 				    "Unable to attach sg device type=%d, minor number exceeds %d\n",
@@ -4577,7 +4626,7 @@ sg_add_device_helper(struct gendisk *disk, struct scsi_device *scsidp)
 	write_unlock_irqrestore(&sg_index_lock, iflags);
 	idr_preload_end();
 
-	if (error) {
+	if (unlikely(error)) {
 		kfree(sdp);
 		return ERR_PTR(error);
 	}
@@ -4595,7 +4644,7 @@ sg_add_device(struct device *cl_dev, struct class_interface *cl_intf)
 	unsigned long iflags;
 
 	disk = alloc_disk(1);
-	if (!disk) {
+	if (unlikely(!disk)) {
 		pr_warn("%s: alloc_disk failed\n", __func__);
 		return -ENOMEM;
 	}
@@ -4603,7 +4652,7 @@ sg_add_device(struct device *cl_dev, struct class_interface *cl_intf)
 
 	error = -ENOMEM;
 	cdev = cdev_alloc();
-	if (!cdev) {
+	if (unlikely(!cdev)) {
 		pr_warn("%s: cdev_alloc failed\n", __func__);
 		goto out;
 	}
@@ -4617,7 +4666,7 @@ sg_add_device(struct device *cl_dev, struct class_interface *cl_intf)
 	}
 
 	error = cdev_add(cdev, MKDEV(SCSI_GENERIC_MAJOR, sdp->index), 1);
-	if (error)
+	if (unlikely(error))
 		goto cdev_add_err;
 
 	sdp->cdev = cdev;
@@ -4693,7 +4742,7 @@ sg_remove_device(struct device *cl_dev, struct class_interface *cl_intf)
 	unsigned long idx;
 	struct sg_fd *sfp;
 
-	if (!sdp)
+	if (unlikely(!sdp))
 		return;
 	/* set this flag as soon as possible as it could be a surprise */
 	if (test_and_set_bit(SG_FDEV_DETACHING, sdp->fdev_bm))
@@ -4734,21 +4783,21 @@ init_sg(void)
 
 	rc = register_chrdev_region(MKDEV(SCSI_GENERIC_MAJOR, 0),
 				    SG_MAX_DEVS, "sg");
-	if (rc)
+	if (unlikely(rc))
 		return rc;
 
 	sg_sense_cache = kmem_cache_create_usercopy
 				("sg_sense", SCSI_SENSE_BUFFERSIZE, 0,
 				 SLAB_HWCACHE_ALIGN, 0,
 				 SCSI_SENSE_BUFFERSIZE, NULL);
-	if (!sg_sense_cache) {
+	if (unlikely(!sg_sense_cache)) {
 		pr_err("sg: can't init sense cache\n");
 		rc = -ENOMEM;
 		goto err_out_unreg;
 	}
 	sg_sense_pool = mempool_create_slab_pool(SG_MEMPOOL_MIN_NR,
 						 sg_sense_cache);
-	if (!sg_sense_pool) {
+	if (unlikely(!sg_sense_pool)) {
 		pr_err("sg: can't init sense pool\n");
 		rc = -ENOMEM;
 		goto err_out_cache;
@@ -4764,7 +4813,7 @@ init_sg(void)
 	}
 	sg_sysfs_valid = true;
 	rc = scsi_register_interface(&sg_interface);
-	if (rc == 0) {
+	if (likely(rc == 0)) {
 		sg_proc_init();
 		sg_dfs_init();
 		return 0;
@@ -4888,13 +4937,13 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 	sdp = sfp->parentdp;
 	if (cwrp->cmd_len > BLK_MAX_CDB) {	/* for longer SCSI cdb_s */
 		long_cmdp = kzalloc(cwrp->cmd_len, GFP_KERNEL);
-		if (!long_cmdp) {
+		if (unlikely(!long_cmdp)) {
 			res = -ENOMEM;
 			goto err_out;
 		}
 		SG_LOG(5, sfp, "%s: long_cmdp=0x%pK ++\n", __func__, long_cmdp);
 	}
-	if (test_bit(SG_FRQ_IS_V4I, srp->frq_bm)) {
+	if (likely(test_bit(SG_FRQ_IS_V4I, srp->frq_bm))) {
 		struct sg_io_v4 *h4p = cwrp->h4p;
 
 		if (dxfer_dir == SG_DXFER_TO_DEV) {
@@ -4947,11 +4996,11 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 	if (cwrp->u_cmdp)
 		res = sg_fetch_cmnd(sfp, cwrp->u_cmdp, cwrp->cmd_len,
 				    scsi_rp->cmd);
-	else if (cwrp->cmdp)
+	else if (likely(cwrp->cmdp))
 		memcpy(scsi_rp->cmd, cwrp->cmdp, cwrp->cmd_len);
 	else
 		res = -EPROTO;
-	if (res)
+	if (unlikely(res))
 		goto err_out;
 	scsi_rp->cmd_len = cwrp->cmd_len;
 	srp->cmd_opcode = scsi_rp->cmd[0];
@@ -4966,7 +5015,7 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 		SG_LOG(4, sfp, "%s: no data xfer [0x%pK]\n", __func__, srp);
 		clear_bit(SG_FRQ_US_XFER, srp->frq_bm);
 		goto fini;	/* path of reqs with no din nor dout */
-	} else if ((rq_flags & SG_FLAG_DIRECT_IO) && iov_count == 0 &&
+	} else if (unlikely(rq_flags & SG_FLAG_DIRECT_IO) && iov_count == 0 &&
 		   !sdp->device->host->unchecked_isa_dma &&
 		   blk_rq_aligned(q, (unsigned long)up, dxfer_len)) {
 		srp->rq_info |= SG_INFO_DIRECT_IO;
@@ -4980,7 +5029,8 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 	if (likely(md)) {	/* normal, "indirect" IO */
 		if (unlikely(rq_flags & SG_FLAG_MMAP_IO)) {
 			/* mmap IO must use and fit in reserve request */
-			if (!reserved || dxfer_len > req_schp->buflen)
+			if (unlikely(!reserved ||
+				     dxfer_len > req_schp->buflen))
 				res = reserved ? -ENOMEM : -EBUSY;
 		} else if (req_schp->buflen == 0) {
 			int up_sz = max_t(int, dxfer_len, sfp->sgat_elem_sz);
@@ -5003,7 +5053,7 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 			goto fini;
 
 		iov_iter_truncate(&i, dxfer_len);
-		if (!iov_iter_count(&i)) {
+		if (unlikely(!iov_iter_count(&i))) {
 			kfree(iov);
 			res = -EINVAL;
 			goto fini;
@@ -5016,9 +5066,11 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 			cp = "iov_count > 0";
 	} else if (us_xfer) { /* setup for transfer data to/from user space */
 		res = blk_rq_map_user(q, rqq, md, up, dxfer_len, GFP_ATOMIC);
-		if (IS_ENABLED(CONFIG_SCSI_PROC_FS) && res)
+#if IS_ENABLED(SG_LOG_ACTIVE)
+		if (unlikely(res))
 			SG_LOG(1, sfp, "%s: blk_rq_map_user() res=%d\n",
 			       __func__, res);
+#endif
 	} else {	/* transfer data to/from kernel buffers */
 		res = sg_rq_map_kern(srp, q, rqq, r0w);
 	}
@@ -5067,14 +5119,14 @@ sg_finish_scsi_blk_rq(struct sg_request *srp)
 			scsi_req_free_cmd(scsi_req(rqq));
 		blk_put_request(rqq);
 	}
-	if (srp->bio) {
+	if (likely(srp->bio)) {
 		bool us_xfer = test_bit(SG_FRQ_US_XFER, srp->frq_bm);
 		struct bio *bio = srp->bio;
 
 		srp->bio = NULL;
 		if (us_xfer && bio) {
 			ret = blk_rq_unmap_user(bio);
-			if (ret) {	/* -EINTR (-4) can be ignored */
+			if (unlikely(ret)) {	/* -EINTR (-4) can be ignored */
 				SG_LOG(6, sfp,
 				       "%s: blk_rq_unmap_user() --> %d\n",
 				       __func__, ret);
@@ -5282,7 +5334,7 @@ sg_find_srp_by_id(struct sg_fd *sfp, int id, bool is_tag)
 		     srp = xa_find_after(xafp, &idx, end_idx, SG_XA_RQ_AWAIT)) {
 			if (test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm))
 				continue;
-			if (is_tag) {
+			if (unlikely(is_tag)) {
 				if (srp->tag != id)
 					continue;
 			} else {
@@ -5359,7 +5411,7 @@ sg_find_srp_by_id(struct sg_fd *sfp, int id, bool is_tag)
 			__maybe_unused const char *cptp = is_tag ? "tag=" :
 								   "pack_id=";
 
-			if (is_bad_st)
+			if (unlikely(is_bad_st))
 				SG_LOG(1, sfp, "%s: %s%d wrong state: %s\n",
 				       __func__, cptp, id,
 				       sg_rq_st_str(bad_sr_st, true));
@@ -5391,7 +5443,7 @@ sg_mrq_get_ready_srp(struct sg_fd *sfp, struct sg_request **srpp)
 	struct sg_request *srp;
 	struct xarray *xafp = &sfp->srp_arr;
 
-	if (unlikely(SG_IS_DETACHING(sfp->parentdp))) {
+	if (SG_IS_DETACHING(sfp->parentdp)) {
 		*srpp = ERR_PTR(-ENODEV);
 		return true;
 	}
@@ -5450,7 +5502,7 @@ sg_mk_srp(struct sg_fd *sfp, bool first)
 		srp = kzalloc(sizeof(*srp), gfp | GFP_KERNEL);
 	else
 		srp = kzalloc(sizeof(*srp), gfp | GFP_ATOMIC);
-	if (srp) {
+	if (likely(srp)) {
 		atomic_set(&srp->rq_st, SG_RQ_BUSY);
 		srp->sh_var = SG_SHR_NONE;
 		srp->parentfp = sfp;
@@ -5505,7 +5557,7 @@ sg_build_reserve(struct sg_fd *sfp, int buflen)
 			go_out = true;
 		}
 		res = sg_mk_sgat(srp, sfp, buflen);
-		if (res == 0) {
+		if (likely(res == 0)) {
 			SG_LOG(4, sfp, "%s: final buflen=%d, srp=0x%pK ++\n",
 			       __func__, buflen, srp);
 			return srp;
@@ -5575,7 +5627,7 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var, int dxfr_len)
 		break;
 	case SG_SHR_WS_RQ:
 		rs_sfp = sg_fd_share_ptr(fp);
-		if (!sg_fd_is_shared(fp)) {
+		if (unlikely(!rs_sfp)) {
 			r_srp = ERR_PTR(-EPROTO);
 			break;
 		}
@@ -5589,7 +5641,7 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var, int dxfr_len)
 		rs_sr_st = atomic_read(&rs_rsv_srp->rq_st);
 		switch (rs_sr_st) {
 		case SG_RQ_AWAIT_RCV:
-			if (rs_rsv_srp->rq_result & SG_ML_RESULT_MSK) {
+			if (unlikely(rs_rsv_srp->rq_result & SG_ML_RESULT_MSK)) {
 				/* read-side done but error occurred */
 				r_srp = ERR_PTR(-ENOSTR);
 				break;
@@ -5597,7 +5649,7 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var, int dxfr_len)
 			fallthrough;
 		case SG_RQ_SHR_SWAP:
 			ws_rq = true;
-			if (rs_sr_st == SG_RQ_AWAIT_RCV)
+			if (unlikely(rs_sr_st == SG_RQ_AWAIT_RCV))
 				break;
 			res = sg_rq_chg_state(rs_rsv_srp, rs_sr_st, SG_RQ_SHR_IN_WS);
 			if (unlikely(res))
@@ -5633,8 +5685,8 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var, int dxfr_len)
 	}
 	cp = "";
 
-	if (ws_rq) {	/* write-side dlen may be smaller than read-side's dlen */
-		if (dxfr_len > rs_rsv_srp->sgatp->dlen) {
+	if (ws_rq) {	/* write-side dlen may be <= read-side's dlen */
+		if (unlikely(dxfr_len > rs_rsv_srp->sgatp->dlen)) {
 			SG_LOG(4, fp, "%s: write-side dlen [%d] > read-side dlen\n",
 			       __func__, dxfr_len);
 			r_srp = ERR_PTR(-E2BIG);
@@ -5650,7 +5702,7 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var, int dxfr_len)
 		mk_new_srp = true;
 	} else if (atomic_read(&fp->inactives) <= 0) {
 		mk_new_srp = true;
-	} else if (!try_harder && dxfr_len < SG_DEF_SECTOR_SZ) {
+	} else if (likely(!try_harder) && dxfr_len < SG_DEF_SECTOR_SZ) {
 		l_used_idx = READ_ONCE(fp->low_used_idx);
 		s_idx = (l_used_idx < 0) ? 0 : l_used_idx;
 		if (l_used_idx >= 0 && xa_get_mark(xafp, s_idx, SG_XA_RQ_INACTIVE)) {
@@ -5675,7 +5727,7 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var, int dxfr_len)
 		/* If dxfr_len is small, use lowest inactive request */
 		if (low_srp) {
 			r_srp = low_srp;
-			if (sg_rq_chg_state(r_srp, SG_RQ_INACTIVE, SG_RQ_BUSY))
+			if (unlikely(sg_rq_chg_state(r_srp, SG_RQ_INACTIVE, SG_RQ_BUSY)))
 				goto start_again; /* gone to another thread */
 			atomic_dec(&fp->inactives);
 			cp = "lowest inactive in srp_arr";
@@ -5732,7 +5784,7 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var, int dxfr_len)
 			goto fini;
 		} else if (fp->tot_fd_thresh > 0) {
 			sum_dlen = atomic_read(&fp->sum_fd_dlens) + dxfr_len;
-			if (sum_dlen > (u32)fp->tot_fd_thresh) {
+			if (unlikely(sum_dlen > (u32)fp->tot_fd_thresh)) {
 				r_srp = ERR_PTR(-E2BIG);
 				SG_LOG(2, fp, "%s: sum_of_dlen(%u) > %s\n",
 				       __func__, sum_dlen, "tot_fd_thresh");
@@ -5750,7 +5802,7 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var, int dxfr_len)
 		xa_lock_irqsave(xafp, iflags);
 		res = __xa_alloc(xafp, &n_idx, r_srp, xa_limit_32b, GFP_KERNEL);
 		xa_unlock_irqrestore(xafp, iflags);
-		if (res < 0) {
+		if (unlikely(res < 0)) {
 			sg_remove_sgat(r_srp);
 			kfree(r_srp);
 			r_srp = ERR_PTR(-EPROTOTYPE);
@@ -5838,7 +5890,7 @@ sg_add_sfp(struct sg_device *sdp, struct file *filp)
 	struct xarray *xafp;
 
 	sfp = kzalloc(sizeof(*sfp), GFP_ATOMIC | __GFP_NOWARN);
-	if (!sfp)
+	if (unlikely(!sfp))
 		return ERR_PTR(-ENOMEM);
 	init_waitqueue_head(&sfp->cmpl_wait);
 	xa_init_flags(&sfp->srp_arr, XA_FLAGS_ALLOC | XA_FLAGS_LOCK_IRQ);
@@ -5869,7 +5921,7 @@ sg_add_sfp(struct sg_device *sdp, struct file *filp)
 	atomic_set(&sfp->waiting, 0);
 	atomic_set(&sfp->inactives, 0);
 
-	if (unlikely(SG_IS_DETACHING(sdp))) {
+	if (SG_IS_DETACHING(sdp)) {
 		SG_LOG(1, sfp, "%s: detaching\n", __func__);
 		kfree(sfp);
 		return ERR_PTR(-ENODEV);
@@ -5918,7 +5970,7 @@ sg_add_sfp(struct sg_device *sdp, struct file *filp)
 	xa_lock_irqsave(xadp, iflags);
 	res = __xa_alloc(xadp, &idx, sfp, xa_limit_32b, GFP_KERNEL);
 	xa_unlock_irqrestore(xadp, iflags);
-	if (res < 0) {
+	if (unlikely(res < 0)) {
 		pr_warn("%s: xa_alloc(sdp) bad, o_count=%d, errno=%d\n",
 			__func__, atomic_read(&sdp->open_cnt), -res);
 		if (srp) {
@@ -5958,7 +6010,7 @@ sg_remove_sfp_usercontext(struct work_struct *work)
 	struct xarray *xafp = &sfp->srp_arr;
 	struct xarray *xadp;
 
-	if (!sfp) {
+	if (unlikely(!sfp)) {
 		pr_warn("sg: %s: sfp is NULL\n", __func__);
 		return;
 	}
@@ -5971,14 +6023,14 @@ sg_remove_sfp_usercontext(struct work_struct *work)
 			sg_finish_scsi_blk_rq(srp);
 		if (srp->sgatp->buflen > 0)
 			sg_remove_sgat(srp);
-		if (srp->sense_bp) {
+		if (unlikely(srp->sense_bp)) {
 			mempool_free(srp->sense_bp, sg_sense_pool);
 			srp->sense_bp = NULL;
 		}
 		xa_lock_irqsave(xafp, iflags);
 		e_srp = __xa_erase(xafp, srp->rq_idx);
 		xa_unlock_irqrestore(xafp, iflags);
-		if (srp != e_srp)
+		if (unlikely(srp != e_srp))
 			SG_LOG(1, sfp, "%s: xa_erase() return unexpected\n",
 			       __func__);
 		SG_LOG(6, sfp, "%s: kfree: srp=%pK --\n", __func__, srp);
@@ -6033,7 +6085,7 @@ sg_get_dev(int min_dev)
 
 	read_lock_irqsave(&sg_index_lock, iflags);
 	sdp = sg_lookup_dev(min_dev);
-	if (!sdp)
+	if (unlikely(!sdp))
 		sdp = ERR_PTR(-ENXIO);
 	else if (SG_IS_DETACHING(sdp)) {
 		/* If detaching, then the refcount may already be 0, in
@@ -6139,7 +6191,7 @@ dev_seq_start(struct seq_file *s, loff_t *pos)
 	struct sg_proc_deviter *it = kzalloc(sizeof(*it), GFP_KERNEL);
 
 	s->private = it;
-	if (!it)
+	if (unlikely(!it))
 		return NULL;
 
 	it->index = *pos;
@@ -6189,10 +6241,10 @@ sg_proc_write_adio(struct file *filp, const char __user *buffer,
 	int err;
 	unsigned long num;
 
-	if (!capable(CAP_SYS_ADMIN) || !capable(CAP_SYS_RAWIO))
+	if (unlikely(!capable(CAP_SYS_ADMIN) || !capable(CAP_SYS_RAWIO)))
 		return -EACCES;
 	err = kstrtoul_from_user(buffer, count, 0, &num);
-	if (err)
+	if (unlikely(err))
 		return err;
 	sg_allow_dio = num ? 1 : 0;
 	return count;
@@ -6211,13 +6263,13 @@ sg_proc_write_dressz(struct file *filp, const char __user *buffer,
 	int err;
 	unsigned long k = ULONG_MAX;
 
-	if (!capable(CAP_SYS_ADMIN) || !capable(CAP_SYS_RAWIO))
+	if (unlikely(!capable(CAP_SYS_ADMIN) || !capable(CAP_SYS_RAWIO)))
 		return -EACCES;
 
 	err = kstrtoul_from_user(buffer, count, 0, &k);
-	if (err)
+	if (unlikely(err))
 		return err;
-	if (k <= 1048576) {	/* limit "big buff" to 1 MB */
+	if (likely(k <= 1048576)) {	/* limit "big buff" to 1 MB */
 		sg_big_buff = k;
 		return count;
 	}
@@ -6249,7 +6301,7 @@ sg_proc_seq_show_dev(struct seq_file *s, void *v)
 
 	read_lock_irqsave(&sg_index_lock, iflags);
 	sdp = it ? sg_lookup_dev(it->index) : NULL;
-	if (!sdp || !sdp->device || SG_IS_DETACHING(sdp))
+	if (unlikely(!sdp || !sdp->device || SG_IS_DETACHING(sdp)))
 		seq_puts(s, "-1\t-1\t-1\t-1\t-1\t-1\t-1\t-1\t-1\n");
 	else {
 		scsidp = sdp->device;
@@ -6301,7 +6353,7 @@ sg_proc_debug_sreq(struct sg_request *srp, int to, bool t_in_ns, char *obp,
 	const char *cp;
 	const char *tp = t_in_ns ? "ns" : "ms";
 
-	if (len < 1)
+	if (unlikely(len < 1))
 		return 0;
 	v4 = test_bit(SG_FRQ_IS_V4I, srp->frq_bm);
 	is_v3v4 = v4 ? true : (srp->s_hdr3.interface_id != '\0');
@@ -6467,14 +6519,14 @@ sg_proc_seq_show_debug(struct seq_file *s, void *v, bool reduced)
 			   (int)it->max, def_reserved_size);
 	fdi_p = it ? &it->fd_index : &k;
 	bp = kzalloc(bp_len, __GFP_NOWARN | GFP_KERNEL);
-	if (!bp) {
+	if (unlikely(!bp)) {
 		seq_printf(s, "%s: Unable to allocate %d on heap, finish\n",
 			   __func__, bp_len);
 		return -ENOMEM;
 	}
 	read_lock_irqsave(&sg_index_lock, iflags);
 	sdp = it ? sg_lookup_dev(it->index) : NULL;
-	if (!sdp)
+	if (unlikely(!sdp))
 		goto skip;
 	sd_n = dev_arr[0];
 	if (sd_n != -1 && sd_n != sdp->index && sd_n != dev_arr[1]) {
@@ -6530,7 +6582,7 @@ sg_proc_seq_show_debug(struct seq_file *s, void *v, bool reduced)
 					   "due to buffer size");
 		} else if (b1[0]) {
 			seq_puts(s, b1);
-			if (seq_has_overflowed(s))
+			if (unlikely(seq_has_overflowed(s)))
 				goto s_ovfl;
 		}
 	}
@@ -6605,7 +6657,7 @@ sg_proc_init(void)
 	struct proc_dir_entry *p;
 
 	p = proc_mkdir("scsi/sg", NULL);
-	if (!p)
+	if (unlikely(!p))
 		return 1;
 
 	proc_create("allow_dio", 0644, p, &adio_proc_ops);
@@ -6731,7 +6783,7 @@ sg_dfs_write(struct file *file, const char __user *buf, size_t count,
 	 * Attributes that only implement .seq_ops are read-only and 'attr' is
 	 * the same with 'data' in this case.
 	 */
-	if (attr == data || !attr->write)
+	if (unlikely(attr == data || !attr->write))
 		return -EPERM;
 	return attr->write(data, buf, count, ppos);
 }
-- 
2.25.1



* [PATCH v18 55/83] sg: mrq abort
From: Douglas Gilbert @ 2021-04-27 21:57 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

Add the ability to abort the current and remaining (unsubmitted)
parts of a multiple requests (mrq) invocation. Any mrq invocation
that the user may want to abort later must be given a non-zero
mrq pack_id (in the request_extra field of the control object).
Only one such non-zero mrq pack_id may be outstanding at a time
per file descriptor; submitting a second mrq with a different
non-zero pack_id while one is outstanding fails with EDOM.
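The one-outstanding-id rule can be sketched in user-space C11. This
is a minimal model, not driver code: struct fd_model,
register_mrq_id() and release_mrq_id() are hypothetical names that
stand in for the sg_fd::mrq_id_abort field and the atomic_cmpxchg()
check at the top of sg_do_multi_req() in the diff below.

```c
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical stand-in for sg_fd::mrq_id_abort: 0 means no abortable
 * mrq is currently outstanding on this file descriptor. */
struct fd_model {
	atomic_int mrq_id_abort;
};

/* Returns 0 if id_of_mrq is now registered (or was already registered
 * with the same id), -EDOM if a different non-zero id is outstanding. */
static int register_mrq_id(struct fd_model *fdp, int id_of_mrq)
{
	int expected = 0;

	if (id_of_mrq == 0)
		return 0;	/* zero: this mrq cannot be aborted later */
	if (atomic_compare_exchange_strong(&fdp->mrq_id_abort, &expected,
					   id_of_mrq))
		return 0;	/* slot was free, now holds id_of_mrq */
	/* on failure, 'expected' was overwritten with the existing id */
	return (expected == id_of_mrq) ? 0 : -EDOM;
}

/* Once the dispatch loop has finished the mrq can no longer be
 * aborted, so the slot is released for the next mrq. */
static void release_mrq_id(struct fd_model *fdp, int id_of_mrq)
{
	if (id_of_mrq)
		atomic_store(&fdp->mrq_id_abort, 0);
}
```

A compare-and-swap against 0 lets concurrent submitters on the same
file descriptor race safely: exactly one of them claims the slot.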

Any requests in an mrq invocation that have already reached their
internal completion point when the mrq abort is issued are
processed in the normal fashion. Any inflight requests have
blk_abort_request() called on them. The remaining requests that
have not yet been submitted are dropped. On completion:
  - ctl_obj.info is set to the number of received requests that
    have been processed;
  - ctl_obj.dout_resid is the number of requests given less the
    number actually submitted;
  - ctl_obj.din_resid (when din_xfer_len is non-zero) is the
    number of requests given less the number actually received
    and processed (i.e. given - ctl_obj.info).
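The counter arithmetic above can be written out as a small pure
function. It is a sketch assuming three counts are known when the
dispatch loop stops (requests given, submitted, and received);
struct mrq_result and mrq_account() are illustrative names, while
the field names follow struct sg_io_v4. In the patch, din_resid is
only filled in when the control object's din_xfer_len is non-zero.

```c
/* Illustrative summary of the control object fields after an mrq
 * abort; field names follow struct sg_io_v4. */
struct mrq_result {
	unsigned int info;		/* requests received and processed */
	unsigned int dout_resid;	/* given less submitted */
	unsigned int din_resid;		/* given less received */
};

/* tot_reqs: requests given; num_subm: submitted before the abort
 * stopped dispatch; num_cmpl: received and processed. */
static struct mrq_result mrq_account(unsigned int tot_reqs,
				     unsigned int num_subm,
				     unsigned int num_cmpl)
{
	struct mrq_result r;

	r.info = num_cmpl;
	r.dout_resid = tot_reqs - num_subm;
	r.din_resid = tot_reqs - num_cmpl;
	return r;
}
```

For example, if 10 requests are given, an abort stops dispatch after
6 have been submitted, and 4 of those have been received, then info
is 4, dout_resid is 4 and din_resid is 6.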

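ioctl(sg_fd, SG_IOABORT, &ctl_obj) is used to issue an mrq abort.
The flags field must have SGV4_FLAG_MULTIPLE_REQS set and the
request_extra field must be set to the non-zero mrq pack_id.
SG_PACK_ID_WILDCARD can be given as the mrq pack_id, in which case
it matches whichever mrq pack_id is outstanding on that file
descriptor. If SGV4_FLAG_DEV_SCOPE is also set and there is no
match on the given file descriptor, the other file descriptors
open on the same device are checked.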
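The checks that decide which mrq an SG_IOABORT targets on a single
file descriptor can be modelled as follows. This sketch follows the
order of checks in sg_ctl_abort() and sg_mrq_abort() in the diff
below, leaves out device scope, and assumes SG_PACK_ID_WILDCARD is
-1 (defined here as the hypothetical MODEL_PACK_ID_WILDCARD).

```c
#include <errno.h>

/* Assumption: the uapi SG_PACK_ID_WILDCARD value is -1. */
#define MODEL_PACK_ID_WILDCARD (-1)

/* request_extra: value from the abort's control object.
 * existing_id: mrq pack_id registered on this fd (0 if none).
 * Returns the mrq pack_id to abort (> 0) or a negated errno value.
 * SGV4_FLAG_DEV_SCOPE handling is not modelled. */
static int resolve_mrq_abort_id(int request_extra, int existing_id)
{
	if (request_extra == 0)
		return -ENOSTR;		/* mrq abort needs a non-zero id */
	if (existing_id == 0)
		return -EADDRNOTAVAIL;	/* no abortable mrq outstanding */
	if (request_extra == MODEL_PACK_ID_WILDCARD)
		return existing_id;	/* wildcard: take the current mrq */
	if (request_extra != existing_id)
		return -EADDRINUSE;	/* a different mrq id is registered */
	return existing_id;
}
```

A caller would build an sg_io_v4 object with guard 'Q',
SGV4_FLAG_MULTIPLE_REQS in flags and the id in request_extra; the
errno values above are the ones the patch returns for each case.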

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 274 +++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 237 insertions(+), 37 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 27d9ac801f11..3d659ff90788 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -152,6 +152,7 @@ enum sg_shr_var {
 #define SG_FFD_RELEASE		8	/* release (close) underway */
 #define SG_FFD_NO_DURATION	9	/* don't do command duration calc */
 #define SG_FFD_MORE_ASYNC	10	/* yield EBUSY more often */
+#define SG_FFD_MRQ_ABORT	11	/* SG_IOABORT + FLAG_MULTIPLE_REQS */
 
 /* Bit positions (flags) for sg_device::fdev_bm bitmask follow */
 #define SG_FDEV_EXCLUDE		0	/* have fd open with O_EXCL */
@@ -215,7 +216,8 @@ struct sg_slice_hdr4 {	/* parts of sg_io_v4 object needed in async usage */
 	s16 dir;		/* data xfer direction; SG_DXFER_*  */
 	u16 cmd_len;		/* truncated of sg_io_v4::request_len */
 	u16 max_sb_len;		/* truncated of sg_io_v4::max_response_len */
-	u16 mrq_ind;		/* position in parentfp->mrq_arr */
+	u16 mrq_ind;		/* mrq submit order, origin 0 */
+	atomic_t pack_id_of_mrq;	/* mrq pack_id, active when > 0 */
 };
 
 struct sg_scatter_hold {     /* holding area for scsi scatter gather info */
@@ -271,6 +273,7 @@ struct sg_fd {		/* holds the state of a file descriptor */
 	atomic_t waiting;	/* number of requests awaiting receive */
 	atomic_t inactives;	/* number of inactive requests */
 	atomic_t sum_fd_dlens;	/* when tot_fd_thresh>0 this is sum_of(dlen) */
+	atomic_t mrq_id_abort;	/* inactive when 0, else id if aborted */
 	int tot_fd_thresh;	/* E2BIG if sum_of(dlen) > this, 0: ignore */
 	int sgat_elem_sz;	/* initialized to scatter_elem_sz */
 	int mmap_sz;		/* byte size of previous mmap() call */
@@ -410,7 +413,6 @@ static const char *sg_shr_str(enum sg_shr_var sh_var, bool long_str);
 #define SG_LOG(depth, sfp, fmt, a...) do { } while (0)
 #endif	/* end of CONFIG_SCSI_LOGGING && SG_DEBUG conditional */
 
-
 /*
  * The SCSI interfaces that use read() and write() as an asynchronous variant of
  * ioctl(..., SG_IO, ...) are fundamentally unsafe, since there are lots of ways
@@ -944,7 +946,7 @@ sg_mrq_1complet(struct sg_io_v4 *cop, struct sg_io_v4 *a_hds,
 		struct sg_fd *w_sfp, int tot_reqs, struct sg_request *srp)
 {
 	int s_res, indx;
-	struct sg_io_v4 *siv4p;
+	struct sg_io_v4 *hp;
 
 	SG_LOG(3, w_sfp, "%s: start, tot_reqs=%d\n", __func__, tot_reqs);
 	if (unlikely(!srp))
@@ -952,12 +954,12 @@ sg_mrq_1complet(struct sg_io_v4 *cop, struct sg_io_v4 *a_hds,
 	indx = srp->s_hdr4.mrq_ind;
 	if (unlikely(indx < 0 || indx >= tot_reqs))
 		return -EPROTO;
-	siv4p = a_hds + indx;
-	s_res = sg_receive_v4(w_sfp, srp, NULL, siv4p);
+	hp = a_hds + indx;
+	s_res = sg_receive_v4(w_sfp, srp, NULL, hp);
 	if (unlikely(s_res == -EFAULT))
 		return s_res;
-	siv4p->info |= SG_INFO_MRQ_FINI;
-	if (w_sfp->async_qp && (siv4p->flags & SGV4_FLAG_SIGNAL)) {
+	hp->info |= SG_INFO_MRQ_FINI;
+	if (w_sfp->async_qp && (hp->flags & SGV4_FLAG_SIGNAL)) {
 		s_res = sg_mrq_arr_flush(cop, a_hds, tot_reqs, s_res);
 		if (unlikely(s_res))	/* can only be -EFAULT */
 			return s_res;
@@ -1123,16 +1125,19 @@ sg_mrq_sanity(struct sg_device *sdp, struct sg_io_v4 *cop,
 static int
 sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool blocking)
 {
-	bool immed, stop_if, f_non_block;
+	bool chk_abort = false;
+	bool set_this, set_other, immed, stop_if, f_non_block;
 	int res = 0;
 	int s_res = 0;	/* for secondary error: some-good-then-error, case */
 	int other_fp_sent = 0;
 	int this_fp_sent = 0;
+	int num_subm = 0;
 	int num_cmpl = 0;
 	const int shr_complet_b4 = SGV4_FLAG_SHARE | SGV4_FLAG_COMPLETE_B4;
+	int id_of_mrq, existing_id;
+	u32 n, flags, cdb_mxlen;
 	unsigned long ul_timeout;
-	struct sg_io_v4 *cop = cwrp->h4p;
-	u32 k, n, flags, cdb_mxlen;
+	struct sg_io_v4 *cop = cwrp->h4p;	/* controlling object */
 	u32 blen = cop->dout_xfer_len;
 	u32 cdb_alen = cop->request_len;
 	u32 tot_reqs = blen / SZ_SG_IO_V4;
@@ -1148,6 +1153,17 @@ sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool blocking)
 	f_non_block = !!(fp->filp->f_flags & O_NONBLOCK);
 	immed = !!(cop->flags & SGV4_FLAG_IMMED);
 	stop_if = !!(cop->flags & SGV4_FLAG_STOP_IF);
+	id_of_mrq = (int)cop->request_extra;
+	if (id_of_mrq) {
+		existing_id = atomic_cmpxchg(&fp->mrq_id_abort, 0, id_of_mrq);
+		if (existing_id && existing_id != id_of_mrq) {
+			SG_LOG(1, fp, "%s: existing id=%d id_of_mrq=%d\n",
+			       __func__, existing_id, id_of_mrq);
+			return -EDOM;
+		}
+		clear_bit(SG_FFD_MRQ_ABORT, fp->ffd_bm);
+		chk_abort = true;
+	}
 	if (blocking) {		/* came from ioctl(SG_IO) */
 		if (unlikely(immed)) {
 			SG_LOG(1, fp, "%s: ioctl(SG_IO) %s contradicts\n",
@@ -1162,9 +1178,10 @@ sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool blocking)
 	}
 	if (!immed && f_non_block)
 		immed = true;
-	SG_LOG(3, fp, "%s: %s, tot_reqs=%u, cdb_alen=%u\n", __func__,
+	SG_LOG(3, fp, "%s: %s, tot_reqs=%u, id_of_mrq=%d\n", __func__,
 	       (immed ? "IMMED" : (blocking ?  "ordered blocking" :
-				   "variable blocking")), tot_reqs, cdb_alen);
+				   "variable blocking")),
+	       tot_reqs, id_of_mrq);
 	sg_sgv4_out_zero(cop);
 
 	if (unlikely(tot_reqs > U16_MAX)) {
@@ -1216,17 +1233,32 @@ sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool blocking)
 	res = sg_mrq_sanity(sdp, cop, a_hds, cdb_ap, fp, immed, tot_reqs);
 	if (unlikely(res))
 		goto fini;
-	/* override cmd queuing setting to allow */
-	set_bit(SG_FFD_CMD_Q, fp->ffd_bm);
-	if (o_sfp)
-		set_bit(SG_FFD_CMD_Q, o_sfp->ffd_bm);
-
+	set_this = false;
+	set_other = false;
 	/* Dispatch (submit) requests and optionally wait for response */
-	for (hp = a_hds, k = 0; num_cmpl < tot_reqs; ++hp, ++k) {
+	for (hp = a_hds; num_subm < tot_reqs; ++hp) {
+		if (chk_abort && test_and_clear_bit(SG_FFD_MRQ_ABORT,
+						    fp->ffd_bm)) {
+			SG_LOG(1, fp, "%s: id_of_mrq=%d aborting at ind=%d\n",
+			       __func__, id_of_mrq, num_subm);
+			break;	/* N.B. rest not submitted */
+		}
 		flags = hp->flags;
-		rq_sfp = (flags & SGV4_FLAG_DO_ON_OTHER) ? o_sfp : fp;
+		if (flags & SGV4_FLAG_DO_ON_OTHER) {
+			rq_sfp = o_sfp;
+			if (!set_other) {
+				set_other = true;
+				set_bit(SG_FFD_CMD_Q, rq_sfp->ffd_bm);
+			}
+		} else {
+			rq_sfp = fp;
+			if (!set_this) {
+				set_this = true;
+				set_bit(SG_FFD_CMD_Q, rq_sfp->ffd_bm);
+			}
+		}
 		if (cdb_ap) {	/* already have array of cdbs */
-			cwrp->cmdp = cdb_ap + (k * cdb_mxlen);
+			cwrp->cmdp = cdb_ap + (num_subm * cdb_mxlen);
 			cwrp->u_cmdp = NULL;
 		} else {	/* fetch each cdb from user space */
 			cwrp->cmdp = NULL;
@@ -1245,7 +1277,9 @@ sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool blocking)
 			s_res = PTR_ERR(srp);
 			break;
 		}
-		srp->s_hdr4.mrq_ind = k;
+		srp->s_hdr4.mrq_ind = num_subm++;
+		if (chk_abort)
+			atomic_set(&srp->s_hdr4.pack_id_of_mrq, id_of_mrq);
 		if (immed || (!(blocking || (flags & shr_complet_b4)))) {
 			if (fp == rq_sfp)
 				++this_fp_sent;
@@ -1278,8 +1312,8 @@ sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool blocking)
 			kill_fasync(&rq_sfp->async_qp, SIGPOLL, POLL_IN);
 		}
 	}	/* end of dispatch request and optionally wait response loop */
-	cop->dout_resid = tot_reqs - num_cmpl;
-	cop->info = num_cmpl;
+	cop->dout_resid = tot_reqs - num_subm;
+	cop->info = num_cmpl;		/* number received */
 	if (cop->din_xfer_len > 0) {
 		cop->din_resid = tot_reqs - num_cmpl;
 		cop->spare_out = -s_res;
@@ -1294,6 +1328,8 @@ sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool blocking)
 		if (unlikely(s_res == -EFAULT || s_res == -ERESTARTSYS))
 			res = s_res;	/* this may leave orphans */
 	}
+	if (id_of_mrq)	/* can no longer do a mrq abort */
+		atomic_set(&fp->mrq_id_abort, 0);
 fini:
 	if (likely(res == 0) && !immed)
 		res = sg_mrq_arr_flush(cop, a_hds, tot_reqs, s_res);
@@ -2032,7 +2068,7 @@ sg_mrq_ioreceive(struct sg_fd *sfp, struct sg_io_v4 *cop, void __user *p,
 	if (unlikely(res < 0))
 		goto fini;
 	cop->din_resid -= res;
-	cop->info = res;
+	cop->info = res;	/* number received */
 	if (copy_to_user(p, cop, sizeof(*cop)))
 		return -EFAULT;
 	res = 0;
@@ -2914,7 +2950,6 @@ sg_ctl_sg_io(struct sg_device *sdp, struct sg_fd *sfp, void __user *p)
  */
 static struct sg_request *
 sg_match_request(struct sg_fd *sfp, bool use_tag, int id)
-		__must_hold(&sfp->rq_list_lock)
 {
 	int num_waiting = atomic_read(&sfp->waiting);
 	unsigned long idx;
@@ -2942,6 +2977,48 @@ sg_match_request(struct sg_fd *sfp, bool use_tag, int id)
 	return NULL;
 }
 
+/*
+ * Looks for first request following 'after_rp' (or the start if after_rp is
+ * NULL) whose pack_id_of_mrq matches the given pack_id. If after_rp is
+ * non-NULL and it is not found, then the search restarts from the beginning
+ * of the list. If no match is found then NULL is returned.
+ */
+static struct sg_request *
+sg_match_first_mrq_after(struct sg_fd *sfp, int pack_id,
+			 struct sg_request *after_rp)
+{
+	bool found = false;
+	bool look_for_after = after_rp ? true : false;
+	int id;
+	unsigned long idx;
+	struct sg_request *srp;
+
+	if (atomic_read(&sfp->waiting) < 1)
+		return NULL;
+once_more:
+	xa_for_each_marked(&sfp->srp_arr, idx, srp, SG_XA_RQ_AWAIT) {
+		if (unlikely(!srp))
+			continue;
+		if (look_for_after) {
+			if (after_rp == srp)
+				look_for_after = false;
+			continue;
+		}
+		id = atomic_read(&srp->s_hdr4.pack_id_of_mrq);
+		if (id == 0)	/* mrq_pack_ids cannot be zero */
+			continue;
+		if (pack_id == SG_PACK_ID_WILDCARD || pack_id == id) {
+			found = true;
+			break;
+		}
+	}
+	if (look_for_after) {	/* after_rp may now be on free list */
+		look_for_after = false;
+		goto once_more;
+	}
+	return found ? srp : NULL;
+}
+
 static int
 sg_abort_req(struct sg_fd *sfp, struct sg_request *srp)
 		__must_hold(&sfp->srp_arr->xa_lock)
@@ -2990,6 +3067,117 @@ sg_abort_req(struct sg_fd *sfp, struct sg_request *srp)
 	return res;
 }
 
+static int
+sg_mrq_abort_inflight(struct sg_fd *sfp, int pack_id)
+{
+	bool got_ebusy = false;
+	int res = 0;
+	unsigned long iflags;
+	struct sg_request *srp;
+	struct sg_request *prev_srp;
+
+	xa_lock_irqsave(&sfp->srp_arr, iflags);
+	for (prev_srp = NULL; true; prev_srp = srp) {
+		srp = sg_match_first_mrq_after(sfp, pack_id, prev_srp);
+		if (!srp)
+			break;
+		res = sg_abort_req(sfp, srp);
+		if (res == -EBUSY)	/* check rest of active list */
+			got_ebusy = true;
+		else if (res)
+			break;
+	}
+	xa_unlock_irqrestore(&sfp->srp_arr, iflags);
+	if (res)
+		return res;
+	return got_ebusy ? -EBUSY : 0;
+}
+
+/*
+ * Implements ioctl(SG_IOABORT) when SGV4_FLAG_MULTIPLE_REQS set. pack_id is
+ * non-zero and is from the request_extra field. dev_scope is set when
+ * SGV4_FLAG_DEV_SCOPE is given; in that case there is one level of recursion
+ * if there is no match or clash with given sfp. Will abort the first
+ * mrq that matches then exit. Can only do mrq abort if the mrq submission
+ * used a non-zero ctl_obj.request_extra (pack_id).
+ */
+static int
+sg_mrq_abort(struct sg_fd *sfp, int pack_id, bool dev_scope)
+		__must_hold(sfp->f_mutex)
+{
+	int existing_id;
+	int res = 0;
+	unsigned long idx;
+	struct sg_device *sdp;
+	struct sg_fd *o_sfp;
+	struct sg_fd *s_sfp;
+
+	if (pack_id != SG_PACK_ID_WILDCARD)
+		SG_LOG(3, sfp, "%s: pack_id=%d, dev_scope=%s\n", __func__,
+		       pack_id, (dev_scope ? "true" : "false"));
+	existing_id = atomic_read(&sfp->mrq_id_abort);
+	if (existing_id == 0) {
+		if (dev_scope)
+			goto check_whole_dev;
+		SG_LOG(1, sfp, "%s: sfp->mrq_id_abort is 0, nothing to do\n",
+		       __func__);
+		return -EADDRNOTAVAIL;
+	}
+	if (pack_id == SG_PACK_ID_WILDCARD) {
+		pack_id = existing_id;
+		SG_LOG(3, sfp, "%s: wildcard becomes pack_id=%d\n", __func__,
+		       pack_id);
+	} else if (pack_id != existing_id) {
+		if (dev_scope)
+			goto check_whole_dev;
+		SG_LOG(1, sfp, "%s: want id=%d, got sfp->mrq_id_abort=%d\n",
+		       __func__, pack_id, existing_id);
+		return -EADDRINUSE;
+	}
+	if (test_and_set_bit(SG_FFD_MRQ_ABORT, sfp->ffd_bm))
+		SG_LOG(2, sfp, "%s: repeated SG_IOABORT on mrq_id=%d\n",
+		       __func__, pack_id);
+
+	/* now look for inflight requests matching that mrq pack_id */
+	xa_lock(&sfp->srp_arr);
+	res = sg_mrq_abort_inflight(sfp, pack_id);
+	if (res == -EBUSY) {
+		res = sg_mrq_abort_inflight(sfp, pack_id);
+		if (res)
+			goto fini;
+	}
+	s_sfp = sg_fd_share_ptr(sfp);
+	if (s_sfp) {	/* SGV4_FLAG_DO_ON_OTHER may have been used */
+		xa_unlock(&sfp->srp_arr);
+		sfp = s_sfp;	/* if share, check other fd */
+		xa_lock(&sfp->srp_arr);
+		if (sg_fd_is_shared(sfp))
+			goto fini;
+		/* tough luck if other fd used same mrq pack_id */
+		res = sg_mrq_abort_inflight(sfp, pack_id);
+		if (res == -EBUSY)
+			res = sg_mrq_abort_inflight(sfp, pack_id);
+	}
+fini:
+	xa_unlock(&sfp->srp_arr);
+	return res;
+
+check_whole_dev:
+	res = -ENODATA;
+	sdp = sfp->parentdp;
+	xa_for_each(&sdp->sfp_arr, idx, o_sfp) {
+		if (o_sfp == sfp)
+			continue;       /* already checked */
+		xa_lock(&o_sfp->srp_arr);
+		/* recurse, dev_scope==false is stopping condition */
+		res = sg_mrq_abort(o_sfp, pack_id, false);
+		xa_unlock(&o_sfp->srp_arr);
+		if (res == 0)
+			break;
+	}
+	return res;
+}
+
 /*
  * Tries to abort an inflight request/command. First it checks the current fd
  * for a match on pack_id or tag. If there is a match, aborts that match.
@@ -3001,8 +3189,8 @@ static int
 sg_ctl_abort(struct sg_device *sdp, struct sg_fd *sfp, void __user *p)
 		__must_hold(sfp->f_mutex)
 {
-	bool use_tag;
-	int pack_id, tag, id;
+	bool use_tag, dev_scope;
+	int pack_id, id;
 	int res = 0;
 	unsigned long iflags, idx;
 	struct sg_fd *o_sfp;
@@ -3014,16 +3202,21 @@ sg_ctl_abort(struct sg_device *sdp, struct sg_fd *sfp, void __user *p)
 		return -EFAULT;
 	if (h4p->guard != 'Q' || h4p->protocol != 0 || h4p->subprotocol != 0)
 		return -EPERM;
+	dev_scope = !!(h4p->flags & SGV4_FLAG_DEV_SCOPE);
 	pack_id = h4p->request_extra;
-	tag = h4p->request_tag;
+	if (h4p->flags & SGV4_FLAG_MULTIPLE_REQS) {
+		if (pack_id == 0)
+			return -ENOSTR;
+		return sg_mrq_abort(sfp, pack_id, dev_scope);
+	}
 	use_tag = test_bit(SG_FFD_PREFER_TAG, sfp->ffd_bm);
-	id = use_tag ? tag : pack_id;
+	id = use_tag ? (int)h4p->request_tag : pack_id;
 
 	xa_lock_irqsave(&sfp->srp_arr, iflags);
 	srp = sg_match_request(sfp, use_tag, id);
 	if (!srp) {	/* assume device (not just fd) scope */
 		xa_unlock_irqrestore(&sfp->srp_arr, iflags);
-		if (!(h4p->flags & SGV4_FLAG_DEV_SCOPE))
+		if (!dev_scope)
 			return -ENODATA;
 		xa_for_each(&sdp->sfp_arr, idx, o_sfp) {
 			if (o_sfp == sfp)
@@ -3430,13 +3623,16 @@ static bool
 sg_any_persistent_orphans(struct sg_fd *sfp)
 {
 	if (test_bit(SG_FFD_KEEP_ORPHAN, sfp->ffd_bm)) {
-		int num_waiting = atomic_read_acquire(&sfp->waiting);
+		int num_waiting = atomic_read(&sfp->waiting);
 		unsigned long idx;
 		struct sg_request *srp;
 		struct xarray *xafp = &sfp->srp_arr;
 
-		if (num_waiting < 1)
-			return false;
+		if (num_waiting < 1) {
+			num_waiting = atomic_read_acquire(&sfp->waiting);
+			if (num_waiting < 1)
+				return false;
+		}
 		xa_for_each_marked(xafp, idx, srp, SG_XA_RQ_AWAIT) {
 			if (test_bit(SG_FRQ_IS_ORPHAN, srp->frq_bm))
 				return true;
@@ -3902,7 +4098,8 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 			return -ENODEV;
 		if (unlikely(read_only))
 			return -EPERM;
-		mutex_lock(&sfp->f_mutex);
+		if (!mutex_trylock(&sfp->f_mutex))
+			return -EAGAIN;
 		res = sg_ctl_abort(sdp, sfp, p);
 		mutex_unlock(&sfp->f_mutex);
 		return res;
@@ -5451,9 +5648,12 @@ sg_mrq_get_ready_srp(struct sg_fd *sfp, struct sg_request **srpp)
 		*srpp = ERR_PTR(-ENODATA);
 		return true;
 	}
-	num_waiting = atomic_read_acquire(&sfp->waiting);
-	if (num_waiting < 1)
-		goto fini;
+	num_waiting = atomic_read(&sfp->waiting);
+	if (num_waiting < 1) {
+		num_waiting = atomic_read_acquire(&sfp->waiting);
+		if (num_waiting < 1)
+			goto fini;
+	}
 
 	s_idx = (l_await_idx < 0) ? 0 : l_await_idx;
 	idx = s_idx;
-- 
2.25.1



* [PATCH v18 56/83] sg: reduce atomic operations
From: Douglas Gilbert @ 2021-04-27 21:57 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

Various renamings and reductions of atomic operations:
  - rename mrq control object pointer from cv4p to mrqcp
  - invert the sense of the internal file descriptor bit
    SG_FFD_CMD_Q, now SG_FFD_NO_CMD_Q
  - add early exit to sg_find_srp_by_id() if the 'waiting'
    atomic is < 1
  - change some atomic bitops into their non-atomic variants

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 100 ++++++++++++++++++++++++----------------------
 1 file changed, 53 insertions(+), 47 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 3d659ff90788..4ccb7ab469f1 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -142,7 +142,7 @@ enum sg_shr_var {
 
 /* Bit positions (flags) for sg_fd::ffd_bm bitmask follow */
 #define SG_FFD_FORCE_PACKID	0	/* receive only given pack_id/tag */
-#define SG_FFD_CMD_Q		1	/* clear: only 1 active req per fd */
+#define SG_FFD_NO_CMD_Q		1	/* set: only 1 active req per fd */
 #define SG_FFD_KEEP_ORPHAN	2	/* policy for this fd */
 #define SG_FFD_HIPRI_SEEN	3	/* could have HIPRI requests active */
 #define SG_FFD_TIME_IN_NS	4	/* set: time in nanoseconds, else ms */
@@ -882,8 +882,9 @@ sg_submit_v3(struct sg_fd *sfp, struct sg_io_hdr *hp, bool sync,
 		if (unlikely(res))
 			return res;
 	}
-	/* when v3 seen, allow cmd_q on this fd (def: no cmd_q) */
-	set_bit(SG_FFD_CMD_Q, sfp->ffd_bm);
+	/* when v3 seen, allow cmd_q on this fd (def: cmd_q) */
+	if (test_bit(SG_FFD_NO_CMD_Q, sfp->ffd_bm))
+		clear_bit(SG_FFD_NO_CMD_Q, sfp->ffd_bm);
 	ul_timeout = msecs_to_jiffies(hp->timeout);
 	WRITE_ONCE(cwr.frq_bm[0], 0);
 	__assign_bit(SG_FRQ_SYNC_INVOC, cwr.frq_bm, (int)sync);
@@ -904,17 +905,10 @@ sg_submit_v3(struct sg_fd *sfp, struct sg_io_hdr *hp, bool sync,
 static void
 sg_sgv4_out_zero(struct sg_io_v4 *h4p)
 {
-	h4p->driver_status = 0;
-	h4p->transport_status = 0;
-	h4p->device_status = 0;
-	h4p->retry_delay = 0;
-	h4p->info = 0;
-	h4p->response_len = 0;
-	h4p->duration = 0;
-	h4p->din_resid = 0;
-	h4p->dout_resid = 0;
-	h4p->generated_tag = 0;
-	h4p->spare_out = 0;
+	const int off = offsetof(struct sg_io_v4, driver_status);
+
+	/* clear from and including driver_status to end of object */
+	memset((u8 *)h4p + off, 0, SZ_SG_IO_V4 - off);
 }
 
 /*
@@ -984,7 +978,7 @@ sg_mrq_complets(struct sg_io_v4 *cop, struct sg_io_v4 *a_hds,
 
 	SG_LOG(3, sfp, "%s: mreqs=%d, sec_reqs=%d\n", __func__, mreqs,
 	       sec_reqs);
-	for ( ; sum_inflight > 0; --sum_inflight) {
+	for ( ; sum_inflight > 0; --sum_inflight, ++cop->info) {
 		srp = NULL;
 		if (mreqs > 0 && sg_mrq_get_ready_srp(sfp, &srp)) {
 			if (IS_ERR(srp)) {	/* -ENODATA: no mrqs here */
@@ -1248,13 +1242,17 @@ sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool blocking)
 			rq_sfp = o_sfp;
 			if (!set_other) {
 				set_other = true;
-				set_bit(SG_FFD_CMD_Q, rq_sfp->ffd_bm);
+				if (test_bit(SG_FFD_NO_CMD_Q, rq_sfp->ffd_bm))
+					clear_bit(SG_FFD_NO_CMD_Q,
+						  rq_sfp->ffd_bm);
 			}
 		} else {
 			rq_sfp = fp;
 			if (!set_this) {
 				set_this = true;
-				set_bit(SG_FFD_CMD_Q, rq_sfp->ffd_bm);
+				if (test_bit(SG_FFD_NO_CMD_Q, rq_sfp->ffd_bm))
+					clear_bit(SG_FFD_NO_CMD_Q,
+						  rq_sfp->ffd_bm);
 			}
 		}
 		if (cdb_ap) {	/* already have array of cdbs */
@@ -1380,7 +1378,8 @@ sg_submit_v4(struct sg_fd *sfp, void __user *p, struct sg_io_v4 *h4p,
 			return res;
 	}
 	/* once v4 (or v3) seen, allow cmd_q on this fd (def: no cmd_q) */
-	set_bit(SG_FFD_CMD_Q, sfp->ffd_bm);
+	if (test_bit(SG_FFD_NO_CMD_Q, sfp->ffd_bm))
+		clear_bit(SG_FFD_NO_CMD_Q, sfp->ffd_bm);
 	ul_timeout = msecs_to_jiffies(h4p->timeout);
 	cwr.sfp = sfp;
 	WRITE_ONCE(cwr.frq_bm[0], 0);
@@ -1728,14 +1727,14 @@ sg_common_write(struct sg_comm_wr_t *cwrp)
 		pack_id = hi_p->pack_id;
 	}
 	if (unlikely(rq_flags & SGV4_FLAG_MULTIPLE_REQS))
-		return ERR_PTR(-ERANGE);
+		return ERR_PTR(-ERANGE);  /* only control object sets this */
 	if (sg_fd_is_shared(fp)) {
 		res = sg_share_chk_flags(fp, rq_flags, dxfr_len, dir, &sh_var);
 		if (unlikely(res < 0))
 			return ERR_PTR(res);
 	} else {
 		sh_var = SG_SHR_NONE;
-		if (rq_flags & SGV4_FLAG_SHARE)
+		if (unlikely(rq_flags & SGV4_FLAG_SHARE))
 			return ERR_PTR(-ENOMSG);
 	}
 	if (unlikely(dxfr_len >= SZ_256M))
@@ -2828,17 +2827,14 @@ sg_rq_landed(struct sg_device *sdp, struct sg_request *srp)
 	return atomic_read_acquire(&srp->rq_st) != SG_RQ_INFLIGHT || SG_IS_DETACHING(sdp);
 }
 
-/*
- * This is a blocking wait for a specific srp. When h4p is non-NULL, it is
- * the blocking multiple request case
- */
+/* This is a blocking wait then complete for a specific srp. */
 static int
 sg_wait_event_srp(struct sg_fd *sfp, void __user *p, struct sg_io_v4 *h4p,
 		  struct sg_request *srp)
 {
 	int res;
-	enum sg_rq_state sr_st;
 	struct sg_device *sdp = sfp->parentdp;
+	enum sg_rq_state sr_st;
 
 	if (atomic_read(&srp->rq_st) != SG_RQ_INFLIGHT)
 		goto skip_wait;		/* and skip _acquire() */
@@ -2993,8 +2989,10 @@ sg_match_first_mrq_after(struct sg_fd *sfp, int pack_id,
 	unsigned long idx;
 	struct sg_request *srp;
 
-	if (atomic_read(&sfp->waiting) < 1)
-		return NULL;
+	if (atomic_read(&sfp->waiting) < 1) {
+		if (atomic_read_acquire(&sfp->waiting) < 1)
+			return NULL;
+	}
 once_more:
 	xa_for_each_marked(&sfp->srp_arr, idx, srp, SG_XA_RQ_AWAIT) {
 		if (unlikely(!srp))
@@ -4170,16 +4168,16 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 		       val);
 		res = put_user(val, ip);
 		return res;
-	case SG_SET_COMMAND_Q:
+	case SG_SET_COMMAND_Q:	/* set by driver whenever v3 or v4 req seen */
 		SG_LOG(3, sfp, "%s:    SG_SET_COMMAND_Q\n", __func__);
 		res = get_user(val, ip);
 		if (unlikely(res))
 			return res;
-		assign_bit(SG_FFD_CMD_Q, sfp->ffd_bm, !!val);
+		assign_bit(SG_FFD_NO_CMD_Q, sfp->ffd_bm, !val);
 		return 0;
 	case SG_GET_COMMAND_Q:
 		SG_LOG(3, sfp, "%s:    SG_GET_COMMAND_Q\n", __func__);
-		return put_user(test_bit(SG_FFD_CMD_Q, sfp->ffd_bm), ip);
+		return put_user(!test_bit(SG_FFD_NO_CMD_Q, sfp->ffd_bm), ip);
 	case SG_SET_KEEP_ORPHAN:
 		SG_LOG(3, sfp, "%s:    SG_SET_KEEP_ORPHAN\n", __func__);
 		res = get_user(val, ip);
@@ -4436,7 +4434,7 @@ sg_poll(struct file *filp, poll_table *wait)
 
 	if (SG_IS_DETACHING(sfp->parentdp))
 		p_res |= EPOLLHUP;
-	else if (likely(test_bit(SG_FFD_CMD_Q, sfp->ffd_bm)))
+	else if (likely(!test_bit(SG_FFD_NO_CMD_Q, sfp->ffd_bm)))
 		p_res |= EPOLLOUT | EPOLLWRNORM;
 	else if (atomic_read(&sfp->submitted) == 0)
 		p_res |= EPOLLOUT | EPOLLWRNORM;
@@ -5113,7 +5111,7 @@ sg_set_map_data(const struct sg_scatter_hold *schp, bool up_valid,
 static int
 sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 {
-	bool reserved, us_xfer;
+	bool reserved, no_xfer, us_xfer;
 	int res = 0;
 	int dxfer_len = 0;
 	int r0w = READ;
@@ -5201,16 +5199,16 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 		goto err_out;
 	scsi_rp->cmd_len = cwrp->cmd_len;
 	srp->cmd_opcode = scsi_rp->cmd[0];
+	no_xfer = dxfer_len <= 0 || dxfer_dir == SG_DXFER_NONE;
 	us_xfer = !(rq_flags & (SG_FLAG_NO_DXFER | SG_FLAG_MMAP_IO));
-	assign_bit(SG_FRQ_US_XFER, srp->frq_bm, us_xfer);
+	__assign_bit(SG_FRQ_US_XFER, srp->frq_bm, !no_xfer && us_xfer);
 	reserved = (sfp->rsv_srp == srp);
 	rqq->end_io_data = srp;
 	scsi_rp->retries = SG_DEFAULT_RETRIES;
 	req_schp = srp->sgatp;
 
-	if (dxfer_len <= 0 || dxfer_dir == SG_DXFER_NONE) {
+	if (no_xfer) {
 		SG_LOG(4, sfp, "%s: no data xfer [0x%pK]\n", __func__, srp);
-		clear_bit(SG_FRQ_US_XFER, srp->frq_bm);
 		goto fini;	/* path of reqs with no din nor dout */
 	} else if (unlikely(rq_flags & SG_FLAG_DIRECT_IO) && iov_count == 0 &&
 		   !sdp->device->host->unchecked_isa_dma &&
@@ -5773,14 +5771,16 @@ sg_build_reserve(struct sg_fd *sfp, int buflen)
 /*
  * Setup an active request (soon to carry a SCSI command) to the current file
  * descriptor by creating a new one or re-using a request from the free
- * list (fl). If successful returns a valid pointer in SG_RQ_BUSY state. On
- * failure returns a negated errno value twisted by ERR_PTR() macro.
+ * list (fl). If successful returns a valid pointer to a sg_request object
+ * which is in the SG_RQ_BUSY state. On failure returns a negated errno value
+ * twisted by ERR_PTR() macro. Note that once a file share is established,
+ * the read-side's reserve request can only be used in a request share.
  */
 static struct sg_request *
 sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var, int dxfr_len)
 {
 	bool act_empty = false;
-	bool allow_rsv = true;
+	bool allow_rsv = true;		/* see note above */
 	bool mk_new_srp = true;
 	bool ws_rq = false;
 	bool try_harder = false;
@@ -5872,6 +5872,7 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var, int dxfr_len)
 	if (IS_ERR(r_srp)) {
 		if (PTR_ERR(r_srp) == -EBUSY)
 			goto err_out;
+#if IS_ENABLED(SG_LOG_ACTIVE)
 		if (sh_var == SG_SHR_RS_RQ)
 			snprintf(b, sizeof(b), "SG_SHR_RS_RQ --> sr_st=%s",
 				 sg_rq_st_str(sr_st, false));
@@ -5881,6 +5882,7 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var, int dxfr_len)
 		else
 			snprintf(b, sizeof(b), "sh_var=%s",
 				 sg_shr_str(sh_var, false));
+#endif
 		goto err_out;
 	}
 	cp = "";
@@ -5907,11 +5909,14 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var, int dxfr_len)
 		s_idx = (l_used_idx < 0) ? 0 : l_used_idx;
 		if (l_used_idx >= 0 && xa_get_mark(xafp, s_idx, SG_XA_RQ_INACTIVE)) {
 			r_srp = xa_load(xafp, s_idx);
-			if (r_srp && r_srp->sgat_h.buflen <= SG_DEF_SECTOR_SZ) {
-				if (sg_rq_chg_state(r_srp, SG_RQ_INACTIVE, SG_RQ_BUSY) == 0) {
-					mk_new_srp = false;
-					atomic_dec(&fp->inactives);
-					goto have_existing;
+			if (r_srp && (allow_rsv || rsv_srp != r_srp)) {
+				if (r_srp->sgat_h.buflen <= SG_DEF_SECTOR_SZ) {
+					if (sg_rq_chg_state(r_srp, SG_RQ_INACTIVE,
+							    SG_RQ_BUSY) == 0) {
+						mk_new_srp = false;
+						atomic_dec(&fp->inactives);
+						goto have_existing;
+					}
 				}
 			}
 		}
@@ -5972,12 +5977,13 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var, int dxfr_len)
 
 good_fini:
 	if (mk_new_srp) {	/* Need new sg_request object */
-		bool allow_cmd_q = test_bit(SG_FFD_CMD_Q, fp->ffd_bm);
+		bool disallow_cmd_q = test_bit(SG_FFD_NO_CMD_Q, fp->ffd_bm);
 		int res;
 		u32 n_idx;
 
 		cp = "new";
-		if (!allow_cmd_q && atomic_read(&fp->submitted) > 0) {
+		r_srp = NULL;
+		if (disallow_cmd_q && atomic_read(&fp->submitted) > 0) {
 			r_srp = ERR_PTR(-EDOM);
 			SG_LOG(6, fp, "%s: trying 2nd req but cmd_q=false\n",
 			       __func__);
@@ -6102,7 +6108,7 @@ sg_add_sfp(struct sg_device *sdp, struct file *filp)
 	sfp->filp = filp;
 	/* other bits in sfp->ffd_bm[1] cleared by kzalloc() above */
 	__assign_bit(SG_FFD_FORCE_PACKID, sfp->ffd_bm, SG_DEF_FORCE_PACK_ID);
-	__assign_bit(SG_FFD_CMD_Q, sfp->ffd_bm, SG_DEF_COMMAND_Q);
+	__assign_bit(SG_FFD_NO_CMD_Q, sfp->ffd_bm, !SG_DEF_COMMAND_Q);
 	__assign_bit(SG_FFD_KEEP_ORPHAN, sfp->ffd_bm, SG_DEF_KEEP_ORPHAN);
 	__assign_bit(SG_FFD_TIME_IN_NS, sfp->ffd_bm, SG_DEF_TIME_UNIT);
 	__assign_bit(SG_FFD_Q_AT_TAIL, sfp->ffd_bm, SG_DEFAULT_Q_AT);
@@ -6616,7 +6622,7 @@ sg_proc_debug_fd(struct sg_fd *fp, char *obp, int len, unsigned long idx,
 		n += scnprintf(obp + n, len - n, "timeout=%ds rs", to / 1000);
 	n += scnprintf(obp + n, len - n, "v_buflen=%d%s idx=%lu\n   cmd_q=%d ",
 		       fp->rsv_srp->sgatp->buflen, cp, idx,
-		       (int)test_bit(SG_FFD_CMD_Q, fp->ffd_bm));
+		       (int)!test_bit(SG_FFD_NO_CMD_Q, fp->ffd_bm));
 	n += scnprintf(obp + n, len - n,
 		       "f_packid=%d k_orphan=%d ffd_bm=0x%lx\n",
 		       (int)test_bit(SG_FFD_FORCE_PACKID, fp->ffd_bm),
-- 
2.25.1



* [PATCH v18 57/83] sg: add excl_wait flag
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
                   ` (56 preceding siblings ...)
  2021-04-27 21:57 ` [PATCH v18 56/83] sg: reduce atomic operations Douglas Gilbert
@ 2021-04-27 21:57 ` Douglas Gilbert
  2021-04-27 21:57 ` [PATCH v18 58/83] sg: tweak sg_find_sfp_by_fd() Douglas Gilbert
                   ` (25 subsequent siblings)
  83 siblings, 0 replies; 86+ messages in thread
From: Douglas Gilbert @ 2021-04-27 21:57 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

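The new SG_CTL_FLAGM_EXCL_WAITQ boolean flag can be set on an sg file
descriptor so that subsequent wait_event_interruptible() calls on its
completion wait queue are changed to their "_exclusive()" variants.
This addresses the potential "thundering herd" problem with the wait
queue: without it, every thread waiting on the fd is woken when a
response arrives, even though only one of them can collect that
response. The wait in sg_wait_event_srp() ignores this flag.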
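
The flag is read and written through the same read-mask/write-mask
pattern as the other booleans in sg_extended_bool_flags(), marked [rbw]
(read before write). The following is a userspace model of just that
logic, not driver code: struct model_fd and model_excl_waitq() are
hypothetical names, and only the mask value 0x1000 is taken from this
patch. Because the old value is sampled before the write, setting the
bit in both masks acts as an exchange that returns the previous setting.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask value copied from the uapi hunk of this patch. */
#define SG_CTL_FLAGM_EXCL_WAITQ 0x1000

/* Hypothetical userspace stand-in for one sg_fd's EXCL_WAITQ bit. */
struct model_fd {
	bool excl_waitq;
};

/*
 * Models the [rbw] handling: rm is the read mask, wm the write mask,
 * val_in the caller's values. The old setting is sampled before any
 * write, then reported back in the returned value.
 */
static uint32_t model_excl_waitq(struct model_fd *fd, uint32_t rm,
				 uint32_t wm, uint32_t val_in)
{
	uint32_t val_out = val_in;
	bool flg = false;

	if (rm & SG_CTL_FLAGM_EXCL_WAITQ)
		flg = fd->excl_waitq;		/* read before write */
	if (wm & SG_CTL_FLAGM_EXCL_WAITQ)
		fd->excl_waitq = !!(val_in & SG_CTL_FLAGM_EXCL_WAITQ);
	if (rm & SG_CTL_FLAGM_EXCL_WAITQ) {
		if (flg)
			val_out |= SG_CTL_FLAGM_EXCL_WAITQ;
		else
			val_out &= ~(uint32_t)SG_CTL_FLAGM_EXCL_WAITQ;
	}
	return val_out;
}
```

For example, calling it with rm == wm == SG_CTL_FLAGM_EXCL_WAITQ and
val_in == 0 clears the flag and reports whether it had been set.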

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c      | 83 ++++++++++++++++++++++++++++--------------
 include/uapi/scsi/sg.h | 29 ++++++++++-----
 2 files changed, 74 insertions(+), 38 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 4ccb7ab469f1..02435d2ef555 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -139,6 +139,7 @@ enum sg_shr_var {
 #define SG_FRQ_FOR_MMAP		7	/* request needs PAGE_SIZE elements */
 #define SG_FRQ_COUNT_ACTIVE	8	/* sfp->submitted + waiting active */
 #define SG_FRQ_ISSUED		9	/* blk_execute_rq_nowait() finished */
+#define SG_FRQ_POLL_SLEPT	10	/* stop re-entry of hybrid_sleep() */
 
 /* Bit positions (flags) for sg_fd::ffd_bm bitmask follow */
 #define SG_FFD_FORCE_PACKID	0	/* receive only given pack_id/tag */
@@ -153,6 +154,7 @@ enum sg_shr_var {
 #define SG_FFD_NO_DURATION	9	/* don't do command duration calc */
 #define SG_FFD_MORE_ASYNC	10	/* yield EBUSY more often */
 #define SG_FFD_MRQ_ABORT	11	/* SG_IOABORT + FLAG_MULTIPLE_REQS */
+#define SG_FFD_EXCL_WAITQ	12	/* append _exclusive to wait_event */
 
 /* Bit positions (flags) for sg_device::fdev_bm bitmask follow */
 #define SG_FDEV_EXCLUDE		0	/* have fd open with O_EXCL */
@@ -962,6 +964,17 @@ sg_mrq_1complet(struct sg_io_v4 *cop, struct sg_io_v4 *a_hds,
 	return 0;
 }
 
+static int
+sg_wait_mrq_event(struct sg_fd *sfp, struct sg_request **srpp)
+{
+	if (test_bit(SG_FFD_EXCL_WAITQ, sfp->ffd_bm))
+		return __wait_event_interruptible_exclusive
+					(sfp->cmpl_wait,
+					 sg_mrq_get_ready_srp(sfp, srpp));
+	return __wait_event_interruptible(sfp->cmpl_wait,
+					  sg_mrq_get_ready_srp(sfp, srpp));
+}
+
 /*
  * This is a fair-ish algorithm for an interruptible wait on two file
  * descriptors. It favours the main fd over the secondary fd (sec_sfp).
@@ -1002,9 +1015,7 @@ sg_mrq_complets(struct sg_io_v4 *cop, struct sg_io_v4 *a_hds,
 					return res;
 			}
 		} else if (mreqs > 0) {
-			res = wait_event_interruptible
-					(sfp->cmpl_wait,
-					 sg_mrq_get_ready_srp(sfp, &srp));
+			res = sg_wait_mrq_event(sfp, &srp);
 			if (unlikely(res))
 				return res;	/* signal --> -ERESTARTSYS */
 			if (IS_ERR(srp)) {
@@ -1017,9 +1028,7 @@ sg_mrq_complets(struct sg_io_v4 *cop, struct sg_io_v4 *a_hds,
 					return res;
 			}
 		} else if (sec_reqs > 0) {
-			res = wait_event_interruptible
-					(sec_sfp->cmpl_wait,
-					 sg_mrq_get_ready_srp(sec_sfp, &srp));
+			res = sg_wait_mrq_event(sec_sfp, &srp);
 			if (unlikely(res))
 				return res;	/* signal --> -ERESTARTSYS */
 			if (IS_ERR(srp)) {
@@ -1082,6 +1091,7 @@ sg_mrq_sanity(struct sg_device *sdp, struct sg_io_v4 *cop,
 				       rip, k, "no IMMED with COMPLETE_B4");
 				return -ERANGE;
 			}
+			/* N.B. SGV4_FLAG_SIG_ON_OTHER is allowed */
 		}
 		if (!sg_fd_is_shared(sfp)) {
 			if (unlikely(flags & SGV4_FLAG_SHARE)) {
@@ -1113,8 +1123,9 @@ sg_mrq_sanity(struct sg_device *sdp, struct sg_io_v4 *cop,
 /*
  * Implements the multiple request functionality. When 'blocking' is true
  * invocation was via ioctl(SG_IO), otherwise it was via ioctl(SG_IOSUBMIT).
- * Only fully non-blocking if IMMED flag given or when ioctl(SG_IOSUBMIT)
- * is used with O_NONBLOCK set on its file descriptor.
+ * Submit non-blocking if IMMED flag given or when ioctl(SG_IOSUBMIT)
+ * is used with O_NONBLOCK set on its file descriptor. Hipri non-blocking
+ * is when the HIPRI flag is given.
  */
 static int
 sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool blocking)
@@ -1174,8 +1185,7 @@ sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool blocking)
 		immed = true;
 	SG_LOG(3, fp, "%s: %s, tot_reqs=%u, id_of_mrq=%d\n", __func__,
 	       (immed ? "IMMED" : (blocking ?  "ordered blocking" :
-				   "variable blocking")),
-	       tot_reqs, id_of_mrq);
+				   "variable blocking")), tot_reqs, id_of_mrq);
 	sg_sgv4_out_zero(cop);
 
 	if (unlikely(tot_reqs > U16_MAX)) {
@@ -2018,9 +2028,7 @@ sg_mrq_iorec_complets(struct sg_fd *sfp, bool non_block, int max_mrqs,
 		return k;
 
 	for ( ; k < max_mrqs; ++k) {
-		res = wait_event_interruptible
-				(sfp->cmpl_wait,
-				 sg_mrq_get_ready_srp(sfp, &srp));
+		res = sg_wait_mrq_event(sfp, &srp);
 		if (unlikely(res))
 			return res;	/* signal --> -ERESTARTSYS */
 		if (IS_ERR(srp))
@@ -2083,6 +2091,19 @@ sg_mrq_ioreceive(struct sg_fd *sfp, struct sg_io_v4 *cop, void __user *p,
 	return res;
 }
 
+static int
+sg_wait_id_event(struct sg_fd *sfp, struct sg_request **srpp, int id,
+		 bool is_tag)
+{
+	if (test_bit(SG_FFD_EXCL_WAITQ, sfp->ffd_bm))
+		return __wait_event_interruptible_exclusive
+				(sfp->cmpl_wait,
+				 sg_get_ready_srp(sfp, srpp, id, is_tag));
+	return __wait_event_interruptible
+			(sfp->cmpl_wait,
+			 sg_get_ready_srp(sfp, srpp, id, is_tag));
+}
+
 /*
  * Called when ioctl(SG_IORECEIVE) received. Expects a v4 interface object.
  * Checks if O_NONBLOCK file flag given, if not checks given 'flags' field
@@ -2134,9 +2155,7 @@ sg_ctl_ioreceive(struct sg_fd *sfp, void __user *p)
 			return -ENODEV;
 		if (non_block)
 			return -EAGAIN;
-		res = wait_event_interruptible
-				(sfp->cmpl_wait,
-				 sg_get_ready_srp(sfp, &srp, id, use_tag));
+		res = sg_wait_id_event(sfp, &srp, id, use_tag);
 		if (unlikely(res))
 			return res;	/* signal --> -ERESTARTSYS */
 		if (IS_ERR(srp))
@@ -2191,9 +2210,7 @@ sg_ctl_ioreceive_v3(struct sg_fd *sfp, void __user *p)
 			return -ENODEV;
 		if (non_block)
 			return -EAGAIN;
-		res = wait_event_interruptible
-				(sfp->cmpl_wait,
-				 sg_get_ready_srp(sfp, &srp, pack_id, false));
+		res = sg_wait_id_event(sfp, &srp, pack_id, false);
 		if (unlikely(res))
 			return res;	/* signal --> -ERESTARTSYS */
 		if (IS_ERR(srp))
@@ -2351,7 +2368,7 @@ sg_read(struct file *filp, char __user *p, size_t count, loff_t *ppos)
 					int flgs;
 
 					ret = get_user(flgs, &h3_up->flags);
-					if (ret)
+					if (unlikely(ret))
 						return ret;
 					if (flgs & SGV4_FLAG_IMMED)
 						non_block = true;
@@ -2374,9 +2391,7 @@ sg_read(struct file *filp, char __user *p, size_t count, loff_t *ppos)
 			return -ENODEV;
 		if (non_block) /* O_NONBLOCK or v3::flags & SGV4_FLAG_IMMED */
 			return -EAGAIN;
-		ret = wait_event_interruptible
-				(sfp->cmpl_wait,
-				 sg_get_ready_srp(sfp, &srp, want_id, false));
+		ret = sg_wait_id_event(sfp, &srp, want_id, false);
 		if (unlikely(ret))  /* -ERESTARTSYS as signal hit process */
 			return ret;
 		if (IS_ERR(srp))
@@ -2846,9 +2861,9 @@ sg_wait_event_srp(struct sg_fd *sfp, void __user *p, struct sg_io_v4 *h4p,
 		goto skip_wait;
 	}
 	SG_LOG(3, sfp, "%s: about to wait_event...()\n", __func__);
-	/* usually will be woken up by sg_rq_end_io() callback */
-	res = wait_event_interruptible(sfp->cmpl_wait,
-				       sg_rq_landed(sdp, srp));
+	/* N.B. The SG_FFD_EXCL_WAITQ flag is ignored here. */
+	res = __wait_event_interruptible(sfp->cmpl_wait,
+					 sg_rq_landed(sdp, srp));
 	if (unlikely(res)) { /* -ERESTARTSYS because signal hit thread */
 		set_bit(SG_FRQ_IS_ORPHAN, srp->frq_bm);
 		/* orphans harvested when sfp->keep_orphan is false */
@@ -3316,7 +3331,7 @@ sg_find_sfp_by_fd(const struct file *search_for, int search_fd,
 	++num_d;
 	for (k = 0; k < num_d; ++k) {
 		sdp = idr_find(&sg_index_idr, k);
-		if (unlikely(!sdp || SG_IS_DETACHING(sdp)))
+		if (unlikely(!sdp) || SG_IS_DETACHING(sdp))
 			continue;
 		xa_for_each_marked(&sdp->sfp_arr, idx, sfp,
 				   SG_XA_FD_UNSHARED) {
@@ -3354,7 +3369,7 @@ sg_find_sfp_by_fd(const struct file *search_for, int search_fd,
 		++num_d;
 		for (k = 0; k < num_d; ++k) {
 			sdp = idr_find(&sg_index_idr, k);
-			if (unlikely(!sdp || SG_IS_DETACHING(sdp)))
+			if (unlikely(!sdp) || SG_IS_DETACHING(sdp))
 				continue;
 			xa_for_each(&sdp->sfp_arr, idx, sfp) {
 				if (!sg_fd_is_shared(sfp))
@@ -3781,6 +3796,18 @@ sg_extended_bool_flags(struct sg_fd *sfp, struct sg_extended_info *seip)
 		else
 			c_flgs_val_out &= ~SG_CTL_FLAGM_MORE_ASYNC;
 	}
+	/* EXCL_WAITQ boolean, [rbw] */
+	if (c_flgs_rm & SG_CTL_FLAGM_EXCL_WAITQ)
+		flg = test_bit(SG_FFD_EXCL_WAITQ, sfp->ffd_bm);
+	if (c_flgs_wm & SG_CTL_FLAGM_EXCL_WAITQ)
+		assign_bit(SG_FFD_EXCL_WAITQ, sfp->ffd_bm,
+			   !!(c_flgs_val_in & SG_CTL_FLAGM_EXCL_WAITQ));
+	if (c_flgs_rm & SG_CTL_FLAGM_EXCL_WAITQ) {
+		if (flg)
+			c_flgs_val_out |= SG_CTL_FLAGM_EXCL_WAITQ;
+		else
+			c_flgs_val_out &= ~SG_CTL_FLAGM_EXCL_WAITQ;
+	}
 
 	if (c_flgs_val_in != c_flgs_val_out)
 		seip->ctl_flags = c_flgs_val_out;
diff --git a/include/uapi/scsi/sg.h b/include/uapi/scsi/sg.h
index e1919eadf036..8b3fe773dfd5 100644
--- a/include/uapi/scsi/sg.h
+++ b/include/uapi/scsi/sg.h
@@ -114,16 +114,16 @@ typedef struct sg_io_hdr {
 #define SGV4_FLAG_YIELD_TAG 0x8  /* sg_io_v4::generated_tag set after SG_IOS */
 #define SGV4_FLAG_Q_AT_TAIL SG_FLAG_Q_AT_TAIL
 #define SGV4_FLAG_Q_AT_HEAD SG_FLAG_Q_AT_HEAD
-#define SGV4_FLAG_COMPLETE_B4  0x100
-#define SGV4_FLAG_SIGNAL  0x200	/* v3: ignored; v4 signal on completion */
-#define SGV4_FLAG_IMMED 0x400 /* for polling with SG_IOR, ignored in SG_IOS */
+#define SGV4_FLAG_COMPLETE_B4  0x100	/* mrq: complete this rq before next */
+#define SGV4_FLAG_SIGNAL 0x200	/* v3: ignored; v4 signal on completion */
+#define SGV4_FLAG_IMMED 0x400   /* issue request and return immediately ... */
 #define SGV4_FLAG_HIPRI 0x800 /* request will use blk_poll to complete */
 #define SGV4_FLAG_STOP_IF 0x1000	/* Stops sync mrq if error or warning */
 #define SGV4_FLAG_DEV_SCOPE 0x2000 /* permit SG_IOABORT to have wider scope */
 #define SGV4_FLAG_SHARE 0x4000	/* share IO buffer; needs SG_SEIM_SHARE_FD */
 #define SGV4_FLAG_DO_ON_OTHER 0x8000 /* available on either of shared pair */
 #define SGV4_FLAG_NO_DXFER SG_FLAG_NO_DXFER /* but keep dev<-->kernel xfr */
-#define SGV4_FLAG_MULTIPLE_REQS 0x20000	/* n sg_io_v4s in data-in */
+#define SGV4_FLAG_MULTIPLE_REQS 0x20000	/* 1 or more sg_io_v4-s in data-in */
 
 /* Output (potentially OR-ed together) in v3::info or v4::info field */
 #define SG_INFO_OK_MASK 0x1
@@ -151,7 +151,7 @@ typedef struct sg_scsi_id {
 	short h_cmd_per_lun;/* host (adapter) maximum commands per lun */
 	short d_queue_depth;/* device (or adapter) maximum queue length */
 	union {
-		int unused[2];  /* as per version 3 driver */
+		int unused[2];	/* as per version 3 driver */
 		__u8 scsi_lun[8];  /* full 8 byte SCSI LUN [in v4 driver] */
 	};
 } sg_scsi_id_t;
@@ -163,8 +163,14 @@ typedef struct sg_req_info {	/* used by SG_GET_REQUEST_TABLE ioctl() */
 	/* sg_io_owned set imples synchronous, clear implies asynchronous */
 	char sg_io_owned;/* 0 -> complete with read(), 1 -> owned by SG_IO */
 	char problem;	/* 0 -> no problem detected, 1 -> error to report */
+	/* If SG_CTL_FLAGM_TAG_FOR_PACK_ID set on fd then next field is tag */
 	int pack_id;	/* pack_id, in v4 driver may be tag instead */
 	void __user *usr_ptr;	/* user provided pointer in v3+v4 interface */
+	/*
+	 * millisecs elapsed since the command started (req_state==1) or
+	 * command duration (req_state==2). Will be in nanoseconds after
+	 * the SG_SET_GET_EXTENDED{TIME_IN_NS} ioctl.
+	 */
 	unsigned int duration;
 	int unused;
 } sg_req_info_t;
@@ -199,12 +205,13 @@ typedef struct sg_req_info {	/* used by SG_GET_REQUEST_TABLE ioctl() */
 #define SG_CTL_FLAGM_IS_SHARE	0x20	/* rd: fd is read-side or write-side share */
 #define SG_CTL_FLAGM_IS_READ_SIDE 0x40	/* rd: this fd is read-side share */
 #define SG_CTL_FLAGM_UNSHARE	0x80	/* undo share after inflight cmd */
-/* rd> 1: read-side finished 0: not; wr> 1: finish share post read-side */
+/* rd> 1: read-side finished, 0: not; wr> 1: finish share post read-side */
 #define SG_CTL_FLAGM_READ_SIDE_FINI 0x100 /* wr> 0: setup for repeat write-side req */
 #define SG_CTL_FLAGM_READ_SIDE_ERR 0x200 /* rd: sharing, read-side got error */
 #define SG_CTL_FLAGM_NO_DURATION 0x400	/* don't calc command duration */
 #define SG_CTL_FLAGM_MORE_ASYNC	0x800	/* yield EAGAIN in more cases */
-#define SG_CTL_FLAGM_ALL_BITS	0xfff	/* should be OR of previous items */
+#define SG_CTL_FLAGM_EXCL_WAITQ 0x1000	/* only 1 wake up per response */
+#define SG_CTL_FLAGM_ALL_BITS	0x1fff	/* should be OR of previous items */
 
 /* Write one of the following values to sg_extended_info::read_value, get... */
 #define SG_SEIRV_INT_MASK	0x0	/* get SG_SEIM_ALL_BITS */
@@ -437,9 +444,11 @@ struct sg_header {
 /*
  * New ioctls to replace async (non-blocking) write()/read() interface.
  * Present in version 4 and later of the sg driver [>20190427]. The
- * SG_IOSUBMIT and SG_IORECEIVE ioctls accept the sg_v4 interface based on
- * struct sg_io_v4 found in <include/uapi/linux/bsg.h>. These objects are
- * passed by a pointer in the third argument of the ioctl.
+ * SG_IOSUBMIT_V3 and SG_IORECEIVE_V3 ioctls accept the sg_v3 interface
+ * based on struct sg_io_hdr shown above. The SG_IOSUBMIT and SG_IORECEIVE
+ * ioctls accept the sg_v4 interface based on struct sg_io_v4 found in
+ * <include/uapi/linux/bsg.h>. These objects are passed by a pointer in
+ * the third argument of the ioctl.
  *
  * Data may be transferred both from the user space to the driver by these
  * ioctls. Hence the _IOWR macro is used here to generate the ioctl number
-- 
2.25.1



* [PATCH v18 58/83] sg: tweak sg_find_sfp_by_fd()
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
                   ` (57 preceding siblings ...)
  2021-04-27 21:57 ` [PATCH v18 57/83] sg: add excl_wait flag Douglas Gilbert
@ 2021-04-27 21:57 ` Douglas Gilbert
  2021-04-27 21:57 ` [PATCH v18 59/83] sg: add snap_dev flag and snapped in debugfs Douglas Gilbert
                   ` (24 subsequent siblings)
  83 siblings, 0 replies; 86+ messages in thread
From: Douglas Gilbert @ 2021-04-27 21:57 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

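The sg_find_sfp_by_fd() function is called every time a file share
is established or re-established. If request sharing is being used to
copy to two or more destinations, there will be many calls to this
function to swap between those destinations, so its performance may
become important. Rather than scanning every sg device and every file
descriptor open on it, check that the given fd's 'struct file' refers
to an sg character device (major SCSI_GENERIC_MAJOR); if all is well,
the wanted sfp is then simply filp->private_data. The old helper,
sg_find_sfp_helper(), is folded into sg_find_sfp_by_fd().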
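
The first check in the new function can be mirrored from userspace with
fstat(2). This sketch is illustrative only: fd_is_sg_char_dev() is a
hypothetical helper, and it assumes SCSI_GENERIC_MAJOR is 21 as defined
in <linux/major.h>. The kernel version tests the same two properties on
search_for->f_inode before trusting filp->private_data.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>	/* major() on glibc */

/* Assumption: matches SCSI_GENERIC_MAJOR in <linux/major.h>. */
#define MODEL_SG_MAJOR 21

/*
 * Userspace analogue of the S_ISCHR() + MAJOR() test at the start of
 * the new sg_find_sfp_by_fd(): is this fd an sg character device?
 */
static bool fd_is_sg_char_dev(int fd)
{
	struct stat st;

	if (fstat(fd, &st) < 0)
		return false;	/* e.g. EBADF: not an open fd */
	return S_ISCHR(st.st_mode) && major(st.st_rdev) == MODEL_SG_MAJOR;
}
```

A positive result only says the fd is an sg node; the driver still
rejects an fd that is already shared (-EADDRNOTAVAIL) or whose device
is held open with O_EXCL (-EPERM).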

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 193 +++++++++++++++-------------------------------
 1 file changed, 62 insertions(+), 131 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 02435d2ef555..7f62cd9bffe0 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -457,9 +457,8 @@ sg_wait_open_event(struct sg_device *sdp, bool o_excl)
 		while (atomic_read(&sdp->open_cnt) > 0) {
 			mutex_unlock(&sdp->open_rel_lock);
 			res = wait_event_interruptible
-					(sdp->open_wait,
-					 (SG_IS_DETACHING(sdp) ||
-					  atomic_read(&sdp->open_cnt) == 0));
+				(sdp->open_wait,
+				 (SG_IS_DETACHING(sdp) || atomic_read(&sdp->open_cnt) == 0));
 			mutex_lock(&sdp->open_rel_lock);
 
 			if (res) /* -ERESTARTSYS */
@@ -471,9 +470,7 @@ sg_wait_open_event(struct sg_device *sdp, bool o_excl)
 		while (SG_HAVE_EXCLUDE(sdp)) {
 			mutex_unlock(&sdp->open_rel_lock);
 			res = wait_event_interruptible
-					(sdp->open_wait,
-					 (SG_IS_DETACHING(sdp) ||
-					  !SG_HAVE_EXCLUDE(sdp)));
+				(sdp->open_wait, (SG_IS_DETACHING(sdp) || !SG_HAVE_EXCLUDE(sdp)));
 			mutex_lock(&sdp->open_rel_lock);
 
 			if (res) /* -ERESTARTSYS */
@@ -2589,7 +2586,7 @@ sg_unshare_rs_fd(struct sg_fd *rs_sfp, bool lck)
 	__xa_clear_mark(xadp, rs_sfp->idx, SG_XA_FD_RS_SHARE);
 	if (lck)
 		xa_unlock_irqrestore(xadp, iflags);
-	kref_put(&rs_sfp->f_ref, sg_remove_sfp);/* get: sg_find_sfp_helper() */
+	kref_put(&rs_sfp->f_ref, sg_remove_sfp);/* get: sg_find_sfp_by_fd() */
 }
 
 static void
@@ -2606,7 +2603,7 @@ sg_unshare_ws_fd(struct sg_fd *ws_sfp, bool lck)
 	/* SG_XA_FD_RS_SHARE mark should be already clear */
 	if (lck)
 		xa_unlock_irqrestore(xadp, iflags);
-	kref_put(&ws_sfp->f_ref, sg_remove_sfp);/* get: sg_find_sfp_helper() */
+	kref_put(&ws_sfp->f_ref, sg_remove_sfp);/* get: sg_find_sfp_by_fd() */
 }
 
 /*
@@ -3249,144 +3246,67 @@ sg_ctl_abort(struct sg_device *sdp, struct sg_fd *sfp, void __user *p)
 	return res;
 }
 
-static int
-sg_idr_max_id(int id, void *p, void *data)
-		__must_hold(&sg_index_lock)
-{
-	int *k = data;
-
-	if (*k < id)
-		*k = id;
-	return 0;
-}
-
-static int
-sg_find_sfp_helper(struct sg_fd *from_sfp, struct sg_fd *pair_sfp,
-		   bool from_rd_side, int search_fd)
+/*
+ * Check if search_for is a "char" device fd whose MAJOR is this driver.
+ * If so filp->private_data must be the sfp we are looking for. Do further
+ * checks (e.g. not already in a file share). If all is well set up cross
+ * references and adjust xarray marks. Returns a sfp or negative errno
+ * twisted by ERR_PTR().
+ */
+static struct sg_fd *
+sg_find_sfp_by_fd(const struct file *search_for, struct sg_fd *from_sfp,
+		  bool is_reshare)
 		__must_hold(&from_sfp->f_mutex)
 {
-	bool same_sdp;
 	int res = 0;
 	unsigned long iflags;
+	struct sg_fd *sfp;
 	struct sg_device *from_sdp = from_sfp->parentdp;
-	struct sg_device *pair_sdp = pair_sfp->parentdp;
+	struct sg_device *sdp;
 
-	if (unlikely(!mutex_trylock(&pair_sfp->f_mutex)))
-		return -EPROBE_DEFER;	/* use to suggest re-invocation */
-	if (unlikely(sg_fd_is_shared(pair_sfp)))
+	SG_LOG(6, from_sfp, "%s: enter,  from_sfp=%pK search_for=%pK\n",
+	       __func__, from_sfp, search_for);
+	if (!(S_ISCHR(search_for->f_inode->i_mode) &&
+	      MAJOR(search_for->f_inode->i_rdev) == SCSI_GENERIC_MAJOR))
+		return ERR_PTR(-EBADF);
+	sfp = search_for->private_data;
+	if (!sfp)
+		return ERR_PTR(-ENXIO);
+	sdp = sfp->parentdp;
+	if (!sdp)
+		return ERR_PTR(-ENXIO);
+	if (unlikely(!mutex_trylock(&sfp->f_mutex)))
+		return ERR_PTR(-EPROBE_DEFER);	/* suggest re-invocation */
+	if (unlikely(sg_fd_is_shared(sfp)))
 		res = -EADDRNOTAVAIL;
-	else if (unlikely(SG_HAVE_EXCLUDE(pair_sdp)))
+	else if (unlikely(SG_HAVE_EXCLUDE(sdp)))
 		res = -EPERM;
 	if (res) {
-		mutex_unlock(&pair_sfp->f_mutex);
-		return res;
+		mutex_unlock(&sfp->f_mutex);
+		return ERR_PTR(res);
 	}
-	same_sdp = (from_sdp == pair_sdp);
+
 	xa_lock_irqsave(&from_sdp->sfp_arr, iflags);
-	rcu_assign_pointer(from_sfp->share_sfp, pair_sfp);
+	rcu_assign_pointer(from_sfp->share_sfp, sfp);
 	__xa_clear_mark(&from_sdp->sfp_arr, from_sfp->idx, SG_XA_FD_UNSHARED);
-	kref_get(&from_sfp->f_ref);	/* so unshare done before release */
-	if (from_rd_side)
+	if (is_reshare)	/* reshare case: no kref_get() on read-side */
 		__xa_set_mark(&from_sdp->sfp_arr, from_sfp->idx,
 			      SG_XA_FD_RS_SHARE);
-
-	if (!same_sdp) {
+	else
+		kref_get(&from_sfp->f_ref);/* so unshare done before release */
+	if (from_sdp != sdp) {
 		xa_unlock_irqrestore(&from_sdp->sfp_arr, iflags);
-		xa_lock_irqsave(&pair_sdp->sfp_arr, iflags);
-	}
-
-	mutex_unlock(&pair_sfp->f_mutex);
-	rcu_assign_pointer(pair_sfp->share_sfp, from_sfp);
-	__xa_clear_mark(&pair_sdp->sfp_arr, pair_sfp->idx, SG_XA_FD_UNSHARED);
-	if (!from_rd_side)
-		__xa_set_mark(&pair_sdp->sfp_arr, pair_sfp->idx,
-			      SG_XA_FD_RS_SHARE);
-	kref_get(&pair_sfp->f_ref);	/* keep symmetry */
-	xa_unlock_irqrestore(&pair_sdp->sfp_arr, iflags);
-	return 0;
-}
-
-/*
- * Scans sg driver object tree looking for search_for. Returns valid pointer
- * if found; returns negated errno twisted by ERR_PTR(); or return NULL if
- * not found (and no error).
- */
-static struct sg_fd *
-sg_find_sfp_by_fd(const struct file *search_for, int search_fd,
-		  struct sg_fd *from_sfp, bool from_is_rd_side)
-		__must_hold(&from_sfp->f_mutex)
-{
-	bool found = false;
-	int k, num_d;
-	int res = 0;
-	unsigned long iflags, idx;
-	struct sg_fd *sfp;
-	struct sg_device *sdp;
-
-	num_d = -1;
-	SG_LOG(6, from_sfp, "%s: enter,  from_sfp=%pK search_for=%pK\n",
-	       __func__, from_sfp, search_for);
-	read_lock_irqsave(&sg_index_lock, iflags);
-	idr_for_each(&sg_index_idr, sg_idr_max_id, &num_d);
-	++num_d;
-	for (k = 0; k < num_d; ++k) {
-		sdp = idr_find(&sg_index_idr, k);
-		if (unlikely(!sdp) || SG_IS_DETACHING(sdp))
-			continue;
-		xa_for_each_marked(&sdp->sfp_arr, idx, sfp,
-				   SG_XA_FD_UNSHARED) {
-			if (sfp == from_sfp)
-				continue;
-			if (test_bit(SG_FFD_RELEASE, sfp->ffd_bm))
-				continue;
-			if (search_for != sfp->filp)
-				continue;       /* not this one */
-			res = sg_find_sfp_helper(from_sfp, sfp,
-						 from_is_rd_side, search_fd);
-			if (likely(res == 0)) {
-				found = true;
-				break;
-			}
-		}       /* end of loop of all fd_s in current device */
-		if (res || found)
-			break;
-	}       /* end of loop of all sg devices */
-	read_unlock_irqrestore(&sg_index_lock, iflags);
-	if (found) {	/* mark both fds as part of share */
-		struct sg_device *from_sdp = from_sfp->parentdp;
-
 		xa_lock_irqsave(&sdp->sfp_arr, iflags);
-		__xa_clear_mark(&sdp->sfp_arr, sfp->idx, SG_XA_FD_UNSHARED);
-		xa_unlock_irqrestore(&sdp->sfp_arr, iflags);
-		xa_lock_irqsave(&from_sdp->sfp_arr, iflags);
-		__xa_clear_mark(&from_sfp->parentdp->sfp_arr, from_sfp->idx,
-				SG_XA_FD_UNSHARED);
-		xa_unlock_irqrestore(&from_sdp->sfp_arr, iflags);
-	} else if (res == 0) {	/* fine tune error response */
-		num_d = -1;
-		read_lock_irqsave(&sg_index_lock, iflags);
-		idr_for_each(&sg_index_idr, sg_idr_max_id, &num_d);
-		++num_d;
-		for (k = 0; k < num_d; ++k) {
-			sdp = idr_find(&sg_index_idr, k);
-			if (unlikely(!sdp) || SG_IS_DETACHING(sdp))
-				continue;
-			xa_for_each(&sdp->sfp_arr, idx, sfp) {
-				if (!sg_fd_is_shared(sfp))
-					continue;
-				if (search_for == sfp->filp) {
-					res = -EADDRNOTAVAIL;  /* already */
-					break;
-				}
-			}
-			if (res)
-				break;
-		}
-		read_unlock_irqrestore(&sg_index_lock, iflags);
 	}
-	if (unlikely(res < 0))
-		return ERR_PTR(res);
-	return found ? sfp : NULL;
+	mutex_unlock(&sfp->f_mutex);
+	rcu_assign_pointer(sfp->share_sfp, from_sfp);
+	__xa_clear_mark(&sdp->sfp_arr, sfp->idx, SG_XA_FD_UNSHARED);
+	if (!is_reshare)
+		__xa_set_mark(&sdp->sfp_arr, sfp->idx, SG_XA_FD_RS_SHARE);
+	kref_get(&sfp->f_ref);		/* undone: sg_unshare_*_fd() */
+	xa_unlock_irqrestore(&sdp->sfp_arr, iflags);
+
+	return sfp;
 }
 
 /*
@@ -3423,7 +3343,7 @@ sg_fd_share(struct sg_fd *ws_sfp, int m_fd)
 	SG_LOG(6, ws_sfp, "%s: read-side fd okay, scan for filp=0x%pK\n",
 	       __func__, filp);
 again:
-	rs_sfp = sg_find_sfp_by_fd(filp, m_fd, ws_sfp, false);
+	rs_sfp = sg_find_sfp_by_fd(filp, ws_sfp, false);
 	if (IS_ERR(rs_sfp)) {
 		res = PTR_ERR(rs_sfp);
 		if (res == -EPROBE_DEFER) {
@@ -3494,7 +3414,7 @@ sg_fd_reshare(struct sg_fd *rs_sfp, int new_ws_fd)
 	       filp);
 	sg_unshare_ws_fd(ws_sfp, false);
 again:
-	ws_sfp = sg_find_sfp_by_fd(filp, new_ws_fd, rs_sfp, true);
+	ws_sfp = sg_find_sfp_by_fd(filp, rs_sfp, true);
 	if (IS_ERR(ws_sfp)) {
 		res = PTR_ERR(ws_sfp);
 		if (res == -EPROBE_DEFER) {
@@ -6406,6 +6326,17 @@ struct sg_proc_deviter {
 	int fd_index;
 };
 
+static int
+sg_idr_max_id(int id, void *p, void *data)
+		__must_hold(&sg_index_lock)
+{
+	int *k = data;
+
+	if (*k < id)
+		*k = id;
+	return 0;
+}
+
 static int
 sg_last_dev(void)
 {
-- 
2.25.1



* [PATCH v18 59/83] sg: add snap_dev flag and snapped in debugfs
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
                   ` (58 preceding siblings ...)
  2021-04-27 21:57 ` [PATCH v18 58/83] sg: tweak sg_find_sfp_by_fd() Douglas Gilbert
@ 2021-04-27 21:57 ` Douglas Gilbert
  2021-04-27 21:57 ` [PATCH v18 60/83] sg: compress usercontext to uc Douglas Gilbert
                   ` (23 subsequent siblings)
  83 siblings, 0 replies; 86+ messages in thread
From: Douglas Gilbert @ 2021-04-27 21:57 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

Add the SG_CTL_FLAGM_SNAP_DEV flag to ioctl(SG_SET_GET_EXTENDED)
so that a snapshot of the current device's data structures can be
sent programmatically to /sys/kernel/debug/scsi_generic/snapped.
The output format is similar to that of
'cat /sys/kernel/debug/scsi_generic/snapshot'. Each snapshot is
prefixed by a "UTC time: <timestamp>" line, where the timestamp has
microsecond resolution. Snapshots are appended to a single internal
buffer; once that buffer is more than half full, the next snapshot
is written from position zero. Writing the flag as 1 clears the
buffer before the snapshot is taken, writing it as 0 appends, and
any write to the snapped debugfs file frees the buffer.
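
The timestamp arithmetic and the half-full reset rule in sg_take_snap()
can be sketched in userspace as below. This is a model under stated
assumptions: the function names and the 256 byte buffer are
hypothetical (the patch uses SG_SNAP_BUFF_SZ, which is 8 * 8192 bytes),
the kernel's do_div() calls become ordinary 64-bit division and modulo,
and the "[tid=...]" suffix that the driver appends is omitted.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define MODEL_SNAP_BUF_SZ 256	/* hypothetical size for the sketch */

/* Format "UTC time: hh:mm:ss:uuuuuu" (time of day within the UTC day). */
static void fmt_utc_tod(const struct timespec *ts, char *b, size_t blen)
{
	uint64_t n = (uint64_t)ts->tv_sec;
	unsigned int second = (unsigned int)(n % 60);
	unsigned int minute, hour;

	n /= 60;
	minute = (unsigned int)(n % 60);
	n /= 60;
	hour = (unsigned int)(n % 24);
	snprintf(b, blen, "UTC time: %.2u:%.2u:%.2u:%.6u", hour, minute,
		 second, (unsigned int)(ts->tv_nsec / 1000));
}

/*
 * Append one line. If the buffer is already more than half full, write
 * from position zero instead (older text is overwritten, not kept).
 */
static void snap_append(char *buf, size_t bufsz, const char *line)
{
	size_t prevlen = strlen(buf);

	if (prevlen > bufsz / 2)
		prevlen = 0;
	snprintf(buf + prevlen, bufsz - prevlen, "%s\n", line);
}
```

Because the reset test runs before each append, every snapshot starts
at an offset of at most half the buffer, so each one has at least half
of the buffer to write into; the cost is that earlier snapshots are
overwritten.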

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c      | 107 +++++++++++++++++++++++++++++++++++++++++
 include/uapi/scsi/sg.h |   3 +-
 2 files changed, 109 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 7f62cd9bffe0..045aa96addac 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -194,6 +194,9 @@ static struct class_interface sg_interface = {
 	.remove_dev     = sg_remove_device,
 };
 
+static DEFINE_MUTEX(snapped_mutex);
+static char *snapped_buf;
+
 /* Subset of sg_io_hdr found in <scsi/sg.h>, has only [i] and [i->o] fields */
 struct sg_slice_hdr3 {
 	int interface_id;
@@ -363,6 +366,11 @@ static int sg_srp_q_blk_poll(struct sg_request *srp, struct request_queue *q,
 static const char *sg_rq_st_str(enum sg_rq_state rq_st, bool long_str);
 static const char *sg_shr_str(enum sg_shr_var sh_var, bool long_str);
 #endif
+#if IS_ENABLED(SG_PROC_OR_DEBUG_FS)
+static int sg_proc_debug_sdev(struct sg_device *sdp, char *obp, int len,
+			      int *fd_counterp, bool reduced);
+static void sg_take_snap(struct sg_fd *sfp, bool clear_first);
+#endif
 
 #define SG_WRITE_COUNT_LIMIT (32 * 1024 * 1024)
 
@@ -390,6 +398,7 @@ static const char *sg_shr_str(enum sg_shr_var sh_var, bool long_str);
  */
 
 #define SG_PROC_DEBUG_SZ 8192
+#define SG_SNAP_BUFF_SZ (SG_PROC_DEBUG_SZ * 8)
 
 #if IS_ENABLED(CONFIG_SCSI_LOGGING) && IS_ENABLED(SG_DEBUG)
 #define SG_LOG_BUFF_SZ 48
@@ -3574,6 +3583,62 @@ sg_any_persistent_orphans(struct sg_fd *sfp)
 	return false;
 }
 
+/* Ignore append if size already over half of available buffer */
+static void
+sg_take_snap(struct sg_fd *sfp, bool dont_append)
+{
+	u32 hour, minute, second;
+	u64 n;
+	struct sg_device *sdp = sfp->parentdp;
+	struct timespec64 ts64;
+	char b[64];
+
+	if (!sdp)
+		return;
+	ktime_get_real_ts64(&ts64);
+	/* prefer local time but sys_tz.tz_minuteswest is always 0 */
+	n = ts64.tv_sec;
+	second = (u32)do_div(n, 60);
+	minute = (u32)do_div(n, 60);
+	hour = (u32)do_div(n, 24);	/* hour within a UTC day */
+	snprintf(b, sizeof(b), "UTC time: %.2u:%.2u:%.2u:%.6u [tid=%d]",
+		 hour, minute, second, (u32)ts64.tv_nsec / 1000,
+		 (current ? current->pid : -1));
+	mutex_lock(&snapped_mutex);
+	if (!snapped_buf) {
+		snapped_buf = kzalloc(SG_SNAP_BUFF_SZ,
+				      GFP_KERNEL | __GFP_NOWARN);
+		if (!snapped_buf)
+			goto fini;
+	} else if (dont_append) {
+		memset(snapped_buf, 0, SG_SNAP_BUFF_SZ);
+	}
+#if IS_ENABLED(SG_PROC_OR_DEBUG_FS)
+	if (true) {	/* for some declarations */
+		int n, prevlen, bp_len;
+		char *bp;
+
+		prevlen = strlen(snapped_buf);
+		if (prevlen > SG_SNAP_BUFF_SZ / 2)
+			prevlen = 0;
+		bp_len = SG_SNAP_BUFF_SZ - prevlen;
+		bp = snapped_buf + prevlen;
+		n = scnprintf(bp, bp_len, "%s\n", b);
+		bp += n;
+		bp_len -= n;
+		if (bp_len < 2)
+			goto fini;
+		n = sg_proc_debug_sdev(sdp, bp, bp_len, NULL, false);
+		if (n >= bp_len - 1) {
+			if (bp[bp_len - 2] != '\n')
+				bp[bp_len - 2] = '\n';
+		}
+	}
+#endif
+fini:
+	mutex_unlock(&snapped_mutex);
+}
+
 /*
  * Processing of ioctl(SG_SET_GET_EXTENDED(SG_SEIM_CTL_FLAGS)) which is a set
  * of boolean flags. Access abbreviations: [rw], read-write; [ro], read-only;
@@ -3728,6 +3793,20 @@ sg_extended_bool_flags(struct sg_fd *sfp, struct sg_extended_info *seip)
 		else
 			c_flgs_val_out &= ~SG_CTL_FLAGM_EXCL_WAITQ;
 	}
+	/* SNAP_DEV boolean, [rbw] */
+	if (c_flgs_rm & SG_CTL_FLAGM_SNAP_DEV) {
+		mutex_lock(&snapped_mutex);
+		flg = (snapped_buf && strlen(snapped_buf) > 0);
+		mutex_unlock(&snapped_mutex);
+	}
+	if (c_flgs_wm & SG_CTL_FLAGM_SNAP_DEV)
+		sg_take_snap(sfp, !!(c_flgs_val_in & SG_CTL_FLAGM_SNAP_DEV));
+	if (c_flgs_rm & SG_CTL_FLAGM_SNAP_DEV) {
+		if (flg)
+			c_flgs_val_out |= SG_CTL_FLAGM_SNAP_DEV;
+		else
+			c_flgs_val_out &= ~SG_CTL_FLAGM_SNAP_DEV;
+	}
 
 	if (c_flgs_val_in != c_flgs_val_out)
 		seip->ctl_flags = c_flgs_val_out;
@@ -4977,6 +5056,7 @@ exit_sg(void)
 	sg_dfs_exit();
 	if (IS_ENABLED(CONFIG_SCSI_PROC_FS))
 		remove_proc_subtree("scsi/sg", NULL);
+	kfree(snapped_buf);
 	scsi_unregister_interface(&sg_interface);
 	mempool_destroy(sg_sense_pool);
 	kmem_cache_destroy(sg_sense_cache);
@@ -6599,6 +6679,10 @@ sg_proc_debug_fd(struct sg_fd *fp, char *obp, int len, unsigned long idx,
 	k = 0;
 	xa_lock_irqsave(&fp->srp_arr, iflags);
 	xa_for_each(&fp->srp_arr, idx, srp) {
+		if (srp->rq_idx != (unsigned long)idx)
+			n += scnprintf(obp + n, len - n,
+				       ">>> xa_index=%lu, rq_idx=%d, bad\n",
+				       idx, srp->rq_idx);
 		if (xa_get_mark(&fp->srp_arr, idx, SG_XA_RQ_INACTIVE))
 			continue;
 		n += sg_proc_debug_sreq(srp, fp->timeout, t_in_ns, obp + n,
@@ -6858,6 +6942,28 @@ struct sg_dfs_attr {
 	const struct seq_operations *seq_ops;
 };
 
+static int
+sg_dfs_snapped_show(void *data, struct seq_file *m)
+{
+	mutex_lock(&snapped_mutex);
+	if (snapped_buf && snapped_buf[0])
+		seq_puts(m, snapped_buf);
+	mutex_unlock(&snapped_mutex);
+	return 0;
+}
+
+static ssize_t
+sg_dfs_snapped_write(void *data, const char __user *buf, size_t count,
+		     loff_t *ppos)
+{
+	/* Any write clears snapped buffer */
+	mutex_lock(&snapped_mutex);
+	kfree(snapped_buf);
+	snapped_buf = NULL;
+	mutex_unlock(&snapped_mutex);
+	return count;
+}
+
 static int
 sg_dfs_snapshot_devs_show(void *data, struct seq_file *m)
 {
@@ -7019,6 +7125,7 @@ static const struct seq_operations sg_snapshot_summ_seq_ops = {
 };
 
 static const struct sg_dfs_attr sg_dfs_attrs[] = {
+	{"snapped", 0600, sg_dfs_snapped_show, sg_dfs_snapped_write},
 	{"snapshot", 0400, .seq_ops = &sg_snapshot_seq_ops},
 	{"snapshot_devs", 0600, sg_dfs_snapshot_devs_show,
 	 sg_dfs_snapshot_devs_write},
diff --git a/include/uapi/scsi/sg.h b/include/uapi/scsi/sg.h
index 8b3fe773dfd5..bf947ebe06dd 100644
--- a/include/uapi/scsi/sg.h
+++ b/include/uapi/scsi/sg.h
@@ -211,7 +211,8 @@ typedef struct sg_req_info {	/* used by SG_GET_REQUEST_TABLE ioctl() */
 #define SG_CTL_FLAGM_NO_DURATION 0x400	/* don't calc command duration */
 #define SG_CTL_FLAGM_MORE_ASYNC	0x800	/* yield EAGAIN in more cases */
 #define SG_CTL_FLAGM_EXCL_WAITQ 0x1000	/* only 1 wake up per response */
-#define SG_CTL_FLAGM_ALL_BITS	0x1fff	/* should be OR of previous items */
+#define SG_CTL_FLAGM_SNAP_DEV	0x2000	/* output to debugfs::snapped */
+#define SG_CTL_FLAGM_ALL_BITS	0x3fff	/* should be OR of previous items */
 
 /* Write one of the following values to sg_extended_info::read_value, get... */
 #define SG_SEIRV_INT_MASK	0x0	/* get SG_SEIM_ALL_BITS */
-- 
2.25.1



* [PATCH v18 60/83] sg: compress usercontext to uc
From: Douglas Gilbert @ 2021-04-27 21:57 UTC
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

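Rename the *_usercontext() work functions so that they start with
sg_uc_ instead (e.g. sg_remove_sfp_usercontext() becomes
sg_uc_remove_sfp()). Restructure sg_rq_end_io(), which schedules the
orphan clean-up work, so that it returns early on that path.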

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 58 ++++++++++++++++++++++++-----------------------
 1 file changed, 30 insertions(+), 28 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 045aa96addac..8e0ae40cde87 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -635,7 +635,7 @@ sg_fd_share_ptr(struct sg_fd *sfp)
  * Release resources associated with a prior, successful sg_open(). It can be
  * seen as the (final) close() call on a sg device file descriptor in the user
  * space. The real work releasing all resources associated with this file
- * descriptor is done by sg_remove_sfp_usercontext() which is scheduled by
+ * descriptor is done by sg_uc_remove_sfp() which is scheduled by
  * sg_remove_sfp().
  */
 static int
@@ -4630,24 +4630,25 @@ sg_mmap(struct file *filp, struct vm_area_struct *vma)
 	return res;
 }
 
+/*
+ * This user context function is called from sg_rq_end_io() when an orphaned
+ * request needs to be cleaned up (e.g. when control C is typed while an
+ * ioctl(SG_IO) is active).
+ */
 static void
-sg_rq_end_io_usercontext(struct work_struct *work)
+sg_uc_rq_end_io_orphaned(struct work_struct *work)
 {
 	struct sg_request *srp = container_of(work, struct sg_request,
 					      ew_orph.work);
 	struct sg_fd *sfp;
 
-	if (unlikely(!srp)) {
-		WARN_ONCE(1, "%s: srp unexpectedly NULL\n", __func__);
-		return;
-	}
 	sfp = srp->parentfp;
 	if (unlikely(!sfp)) {
 		WARN_ONCE(1, "%s: sfp unexpectedly NULL\n", __func__);
 		return;
 	}
 	SG_LOG(3, sfp, "%s: srp=0x%pK\n", __func__, srp);
-	if (unlikely(test_bit(SG_FRQ_DEACT_ORPHAN, srp->frq_bm))) {
+	if (test_bit(SG_FRQ_DEACT_ORPHAN, srp->frq_bm)) {
 		sg_finish_scsi_blk_rq(srp);	/* clean up orphan case */
 		sg_deact_request(sfp, srp);
 	}
@@ -4761,18 +4762,19 @@ sg_rq_end_io(struct request *rqq, blk_status_t status)
 	scsi_req_free_cmd(scsi_rp);
 	blk_put_request(rqq);
 
-	if (likely(rqq_state == SG_RQ_AWAIT_RCV)) {
-		/* Wake any sg_read()/ioctl(SG_IORECEIVE) awaiting this req */
-		if (!(srp->rq_flags & SGV4_FLAG_HIPRI))
-			wake_up_interruptible(&sfp->cmpl_wait);
-		if (sfp->async_qp && (!test_bit(SG_FRQ_IS_V4I, srp->frq_bm) ||
-				      (srp->rq_flags & SGV4_FLAG_SIGNAL)))
-			kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN);
-		kref_put(&sfp->f_ref, sg_remove_sfp);
-	} else {        /* clean up orphaned request that aren't being kept */
-		INIT_WORK(&srp->ew_orph.work, sg_rq_end_io_usercontext);
+	if (unlikely(rqq_state != SG_RQ_AWAIT_RCV)) {
+		/* clean up orphaned requests that aren't being kept */
+		INIT_WORK(&srp->ew_orph.work, sg_uc_rq_end_io_orphaned);
 		schedule_work(&srp->ew_orph.work);
+		return;
 	}
+	/* Wake any sg_read()/ioctl(SG_IORECEIVE) awaiting this req */
+	if (!(srp->rq_flags & SGV4_FLAG_HIPRI))
+		wake_up_interruptible(&sfp->cmpl_wait);
+	if (sfp->async_qp && (!test_bit(SG_FRQ_IS_V4I, srp->frq_bm) ||
+			      (srp->rq_flags & SGV4_FLAG_SIGNAL)))
+		kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN);
+	kref_put(&sfp->f_ref, sg_remove_sfp);
 	return;
 }
 
@@ -6222,15 +6224,15 @@ sg_add_sfp(struct sg_device *sdp, struct file *filp)
 
 /*
  * A successful call to sg_release() will result, at some later time, to this
- * function being invoked. All requests associated with this file descriptor
- * should be completed or cancelled when this function is called (due to
- * sfp->f_ref). Also the file descriptor itself has not been accessible since
- * it was list_del()-ed by the preceding sg_remove_sfp() call. So no locking
- * is required. sdp should never be NULL but to make debugging more robust,
- * this function will not blow up in that case.
+ * "user context" function being invoked. All requests associated with this
+ * file descriptor should be completed or cancelled when this function is
+ * called (due to sfp->f_ref). Also the file descriptor itself has not been
+ * accessible since it was list_del()-ed by the preceding sg_remove_sfp()
+ * call. So no locking is required. sdp should never be NULL but to make
+ * debugging more robust, this function will not blow up in that case.
  */
 static void
-sg_remove_sfp_usercontext(struct work_struct *work)
+sg_uc_remove_sfp(struct work_struct *work)
 {
 	__maybe_unused int o_count;
 	int subm;
@@ -6295,7 +6297,7 @@ sg_remove_sfp(struct kref *kref)
 {
 	struct sg_fd *sfp = container_of(kref, struct sg_fd, f_ref);
 
-	INIT_WORK(&sfp->ew_fd.work, sg_remove_sfp_usercontext);
+	INIT_WORK(&sfp->ew_fd.work, sg_uc_remove_sfp);
 	schedule_work(&sfp->ew_fd.work);
 }
 
@@ -6342,11 +6344,11 @@ sg_rq_st_str(enum sg_rq_state rq_st, bool long_str)
 		return long_str ? "inflight" : "act";
 	case SG_RQ_AWAIT_RCV:
 		return long_str ? "await_receive" : "rcv";
-	case SG_RQ_BUSY:
+	case SG_RQ_BUSY:	/* state transitioning */
 		return long_str ? "busy" : "bsy";
-	case SG_RQ_SHR_SWAP:	/* only an active read-side has this */
+	case SG_RQ_SHR_SWAP:	/* read-side: awaiting write-side req start */
 		return long_str ? "share swap" : "s_wp";
-	case SG_RQ_SHR_IN_WS:	/* only an active read-side has this */
+	case SG_RQ_SHR_IN_WS:	/* read-side: waiting for inflight write-side */
 		return long_str ? "share write-side active" : "ws_a";
 	default:
 		return long_str ? "unknown" : "unk";
-- 
2.25.1



* [PATCH v18 61/83] sg: optionally output sg_request.frq_bm flags
From: Douglas Gilbert @ 2021-04-27 21:57 UTC
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

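Add this option to the little-used ioctl(SG_SET_DEBUG). Once it is
set, 'cat /proc/scsi/sg/debug' or
'cat /sys/kernel/debug/scsi_generic/snapshot' prefixes each request
output line with that request's flags (sg_request::frq_bm) in hex.
The value is a bitmask; decode it with the SG_FRQ_* bit position
defines in drivers/scsi/sg.c. When the option is not set, an active
request that is being aborted is prefixed with "abort>>" instead.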
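
As a rough aid for reading that output, here is a small C sketch that
turns a frq_bm hex value into flag names. It only knows the four bit
positions visible in this series (SG_FRQ_COUNT_ACTIVE=8,
SG_FRQ_ISSUED=9, SG_FRQ_POLL_SLEPT=10 and SG_FRQ_RESERVED=11, the last
added by patch 63/83); any other set bit, for example SG_FRQ_IS_V4I or
SG_FRQ_ABORTING whose positions are not shown here, is printed as
"bitN". decode_frq_bm() is a hypothetical helper, not part of sg.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Bit positions of the SG_FRQ_* defines visible in this patch series */
struct frq_name {
	int bit;
	const char *name;
};

static const struct frq_name known_frq_bits[] = {
	{ 8, "SG_FRQ_COUNT_ACTIVE" },
	{ 9, "SG_FRQ_ISSUED" },
	{ 10, "SG_FRQ_POLL_SLEPT" },
	{ 11, "SG_FRQ_RESERVED" },
};

/* Return the name for a bit position, or NULL if it is not in the table */
static const char *frq_bit_name(int bit)
{
	size_t k;

	for (k = 0; k < sizeof(known_frq_bits) / sizeof(known_frq_bits[0]); ++k)
		if (known_frq_bits[k].bit == bit)
			return known_frq_bits[k].name;
	return NULL;
}

/*
 * Hypothetical helper: write the set flags of frq_bm into out as a
 * space-separated list (unknown positions appear as "bitN") and return
 * how many flags were written. Stops early if out is too small.
 */
static int decode_frq_bm(unsigned long frq_bm, char *out, size_t out_sz)
{
	int count = 0;
	size_t used = 0;

	out[0] = '\0';
	for (int bit = 0; bit < (int)(8 * sizeof(frq_bm)); ++bit) {
		char unknown[16];
		const char *name;
		int w;

		if (!(frq_bm & (1UL << bit)))
			continue;
		name = frq_bit_name(bit);
		if (!name) {
			snprintf(unknown, sizeof(unknown), "bit%d", bit);
			name = unknown;
		}
		w = snprintf(out + used, out_sz - used, "%s%s",
			     count ? " " : "", name);
		if (w < 0 || (size_t)w >= out_sz - used) {
			out[used] = '\0';	/* drop the partial name */
			break;
		}
		used += (size_t)w;
		++count;
	}
	return count;
}
```

For example, decode_frq_bm(0x300, buf, sizeof(buf)) returns 2 and
leaves "SG_FRQ_COUNT_ACTIVE SG_FRQ_ISSUED" in buf.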

Use the existing non_block boolean in sg_open() rather than
re-testing O_NONBLOCK. Rework the sg_change_after_read_side_rq()
helper.

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 57 ++++++++++++++++++++++++++++-------------------
 1 file changed, 34 insertions(+), 23 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 8e0ae40cde87..5e6c67bac5cd 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -359,6 +359,7 @@ static struct sg_device *sg_get_dev(int min_dev);
 static void sg_device_destroy(struct kref *kref);
 static struct sg_request *sg_mk_srp_sgat(struct sg_fd *sfp, bool first,
 					 int db_len);
+static int sg_abort_req(struct sg_fd *sfp, struct sg_request *srp);
 static int sg_sfp_blk_poll(struct sg_fd *sfp, int loop_count);
 static int sg_srp_q_blk_poll(struct sg_request *srp, struct request_queue *q,
 			     int loop_count);
@@ -552,7 +553,7 @@ sg_open(struct inode *inode, struct file *filp)
 		goto error_out;
 
 	mutex_lock(&sdp->open_rel_lock);
-	if (op_flags & O_NONBLOCK) {
+	if (non_block) {
 		if (unlikely(o_excl)) {
 			if (atomic_read(&sdp->open_cnt) > 0) {
 				res = -EBUSY;
@@ -587,7 +588,7 @@ sg_open(struct inode *inode, struct file *filp)
 	mutex_unlock(&sdp->open_rel_lock);
 	SG_LOG(3, sfp, "%s: o_count after=%d on minor=%d, op_flags=0x%x%s\n",
 	       __func__, o_count, min_dev, op_flags,
-	       ((op_flags & O_NONBLOCK) ? " O_NONBLOCK" : ""));
+	       (non_block ? " O_NONBLOCK" : ""));
 
 	res = 0;
 sg_put:
@@ -651,8 +652,8 @@ sg_release(struct inode *inode, struct file *filp)
 		return -ENXIO;
 
 	if (unlikely(xa_get_mark(&sdp->sfp_arr, sfp->idx, SG_XA_FD_FREE))) {
-		SG_LOG(1, sfp, "%s: sfp erased!!!\n", __func__);
-		return 0;	/* get out but can't fail */
+		SG_LOG(1, sfp, "%s: sfp already erased!!!\n", __func__);
+		return 0;       /* yell out but can't fail */
 	}
 
 	mutex_lock(&sdp->open_rel_lock);
@@ -2048,6 +2049,7 @@ sg_mrq_iorec_complets(struct sg_fd *sfp, bool non_block, int max_mrqs,
 }
 
 /*
+ * Invoked when user calls ioctl(SG_IORECEIVE, SGV4_FLAG_MULTIPLE_REQS).
  * Expected race as multiple concurrent calls with the same pack_id/tag can
  * occur. Only one should succeed per request (more may succeed but will get
  * different requests).
@@ -2090,7 +2092,7 @@ sg_mrq_ioreceive(struct sg_fd *sfp, struct sg_io_v4 *cop, void __user *p,
 		if (copy_to_user(pp, rsp_v4_arr, len))
 			res = -EFAULT;
 	} else {
-		pr_info("%s: cop->din_xferp==NULL ?_?\n", __func__);
+		SG_LOG(1, sfp, "%s: cop->din_xferp==NULL ?_?\n", __func__);
 	}
 fini:
 	kfree(rsp_v4_arr);
@@ -2518,28 +2520,26 @@ sg_calc_sgat_param(struct sg_device *sdp)
 static int
 sg_change_after_read_side_rq(struct sg_fd *sfp, bool fini1_again0)
 {
-	int res = 0;
+	int res = -EINVAL;
 	enum sg_rq_state sr_st;
 	unsigned long iflags;
 	struct sg_fd *rs_sfp;
-	struct sg_request *rs_rsv_srp = NULL;
+	struct sg_request *rs_rsv_srp;
 	struct sg_device *sdp = sfp->parentdp;
 
 	rs_sfp = sg_fd_share_ptr(sfp);
-	if (unlikely(!rs_sfp)) {
-		res = -EINVAL;
-	} else if (xa_get_mark(&sdp->sfp_arr, sfp->idx, SG_XA_FD_RS_SHARE)) {
-		rs_rsv_srp = sfp->rsv_srp;
+	if (unlikely(!rs_sfp))
+		goto fini;
+	if (xa_get_mark(&sdp->sfp_arr, sfp->idx, SG_XA_FD_RS_SHARE))
 		rs_sfp = sfp;
-	} else {	/* else called on write-side */
-		rs_rsv_srp = rs_sfp->rsv_srp;
-	}
-	if (res || !rs_rsv_srp)
+	rs_rsv_srp = rs_sfp->rsv_srp;
+	if (IS_ERR_OR_NULL(rs_rsv_srp))
 		goto fini;
 
+	res = 0;
 	xa_lock_irqsave(&rs_sfp->srp_arr, iflags);
 	sr_st = atomic_read(&rs_rsv_srp->rq_st);
-	if (fini1_again0) {
+	if (fini1_again0) {	/* finish req share after read-side req */
 		switch (sr_st) {
 		case SG_RQ_SHR_SWAP:
 			rs_rsv_srp->sh_var = SG_SHR_RS_NOT_SRQ;
@@ -2556,7 +2556,7 @@ sg_change_after_read_side_rq(struct sg_fd *sfp, bool fini1_again0)
 			res = -EINVAL;
 			break;
 		}
-	} else {
+	} else {	/* again: tweak state to allow another write-side request */
 		switch (sr_st) {
 		case SG_RQ_INACTIVE:
 			rs_rsv_srp->sh_var = SG_SHR_RS_RQ;
@@ -5827,8 +5827,8 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var, int dxfr_len)
 	struct sg_request *rs_rsv_srp = NULL;
 	struct sg_fd *rs_sfp = NULL;
 	struct xarray *xafp = &fp->srp_arr;
-	__maybe_unused const char *cp;
-	char b[48];
+	__maybe_unused const char *cp = NULL;
+	__maybe_unused char b[64];
 
 	b[0] = '\0';
 	rsv_srp = fp->rsv_srp;
@@ -6638,6 +6638,7 @@ static int
 sg_proc_debug_fd(struct sg_fd *fp, char *obp, int len, unsigned long idx,
 		 bool reduced)
 {
+	bool set_debug;
 	bool t_in_ns = test_bit(SG_FFD_TIME_IN_NS, fp->ffd_bm);
 	int n = 0;
 	int to, k;
@@ -6651,6 +6652,7 @@ sg_proc_debug_fd(struct sg_fd *fp, char *obp, int len, unsigned long idx,
 			" shr_rs" : " shr_rs";
 	else
 		cp = "";
+	set_debug = test_bit(SG_FDEV_LOG_SENSE, sdp->fdev_bm);
 	/* sgat=-1 means unavailable */
 	to = (fp->timeout >= 0) ? jiffies_to_msecs(fp->timeout) : -999;
 	if (to < 0)
@@ -6687,10 +6689,16 @@ sg_proc_debug_fd(struct sg_fd *fp, char *obp, int len, unsigned long idx,
 				       idx, srp->rq_idx);
 		if (xa_get_mark(&fp->srp_arr, idx, SG_XA_RQ_INACTIVE))
 			continue;
+		if (set_debug)
+			n += scnprintf(obp + n, len - n, "     frq_bm=0x%lx  ",
+				       srp->frq_bm[0]);
+		else if (test_bit(SG_FRQ_ABORTING, srp->frq_bm))
+			n += scnprintf(obp + n, len - n,
+				       "     abort>> ");
 		n += sg_proc_debug_sreq(srp, fp->timeout, t_in_ns, obp + n,
 					len - n);
 		++k;
-		if ((k % 8) == 0) {     /* don't hold up isr_s too long */
+		if ((k % 8) == 0) {	/* don't hold up isr_s too long */
 			xa_unlock_irqrestore(&fp->srp_arr, iflags);
 			cpu_relax();
 			xa_lock_irqsave(&fp->srp_arr, iflags);
@@ -6702,10 +6710,13 @@ sg_proc_debug_fd(struct sg_fd *fp, char *obp, int len, unsigned long idx,
 	xa_for_each_marked(&fp->srp_arr, idx, srp, SG_XA_RQ_INACTIVE) {
 		if (k == 0)
 			n += scnprintf(obp + n, len - n, "   Inactives:\n");
+		if (set_debug)
+			n += scnprintf(obp + n, len - n, "     frq_bm=0x%lx  ",
+				       srp->frq_bm[0]);
 		n += sg_proc_debug_sreq(srp, fp->timeout, t_in_ns,
 					obp + n, len - n);
 		++k;
-		if ((k % 8) == 0) {     /* don't hold up isr_s too long */
+		if ((k % 8) == 0) {	/* don't hold up isr_s too long */
 			xa_unlock_irqrestore(&fp->srp_arr, iflags);
 			cpu_relax();
 			xa_lock_irqsave(&fp->srp_arr, iflags);
@@ -6805,8 +6816,8 @@ sg_proc_seq_show_debug(struct seq_file *s, void *v, bool reduced)
 		found = true;
 		disk_name = (sdp->disk ? sdp->disk->disk_name : "?_?");
 		if (SG_IS_DETACHING(sdp)) {
-			snprintf(b1, sizeof(b1), " >>> device=%s  %s\n",
-				 disk_name, "detaching pending close\n");
+			snprintf(b1, sizeof(b1), " >>> %s %s\n", disk_name,
+				 "detaching pending close\n");
 		} else if (sdp->device) {
 			n = sg_proc_debug_sdev(sdp, bp, bp_len, fdi_p,
 					       reduced);
-- 
2.25.1



* [PATCH v18 62/83] sg: work on sg_mrq_sanity()
From: Douglas Gilbert @ 2021-04-27 21:57 UTC
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

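Preparatory work for the following shared variable blocking patch:
sg_mrq_sanity() gains an optional output parameter that reports
whether any request sets SGV4_FLAG_SHARE on a shared fd.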

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 5e6c67bac5cd..e43bb1673adc 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -1059,9 +1059,11 @@ sg_mrq_complets(struct sg_io_v4 *cop, struct sg_io_v4 *a_hds,
 static int
 sg_mrq_sanity(struct sg_device *sdp, struct sg_io_v4 *cop,
 	      struct sg_io_v4 *a_hds, u8 *cdb_ap, struct sg_fd *sfp,
-	      bool immed, u32 tot_reqs)
+	      bool immed, u32 tot_reqs, bool *share_on_othp)
 {
 	bool have_mrq_sense = (cop->response && cop->max_response_len);
+	bool share_on_oth = false;
+	bool share;
 	int k;
 	u32 cdb_alen = cop->request_len;
 	u32 cdb_mxlen = cdb_alen / tot_reqs;
@@ -1084,12 +1086,13 @@ sg_mrq_sanity(struct sg_device *sdp, struct sg_io_v4 *cop,
 			       __func__, rip, k);
 			return -ERANGE;
 		}
+		share = !!(flags & SGV4_FLAG_SHARE);
 		if (immed) {	/* only accept async submits on current fd */
 			if (unlikely(flags & SGV4_FLAG_DO_ON_OTHER)) {
 				SG_LOG(1, sfp, "%s: %s %u, %s\n", __func__,
 				       rip, k, "no IMMED with ON_OTHER");
 				return -ERANGE;
-			} else if (unlikely(flags & SGV4_FLAG_SHARE)) {
+			} else if (unlikely(share)) {
 				SG_LOG(1, sfp, "%s: %s %u, %s\n", __func__,
 				       rip, k, "no IMMED with FLAG_SHARE");
 				return -ERANGE;
@@ -1100,8 +1103,11 @@ sg_mrq_sanity(struct sg_device *sdp, struct sg_io_v4 *cop,
 			}
 			/* N.B. SGV4_FLAG_SIG_ON_OTHER is allowed */
 		}
-		if (!sg_fd_is_shared(sfp)) {
-			if (unlikely(flags & SGV4_FLAG_SHARE)) {
+		if (sg_fd_is_shared(sfp)) {
+			if (!share_on_oth && share)
+				share_on_oth = true;
+		} else {
+			if (unlikely(share)) {
 				SG_LOG(1, sfp, "%s: %s %u, no share\n",
 				       __func__, rip, k);
 				return -ERANGE;
@@ -1124,6 +1130,8 @@ sg_mrq_sanity(struct sg_device *sdp, struct sg_io_v4 *cop,
 			hp->max_response_len = cop->max_response_len;
 		}
 	}
+	if (share_on_othp)
+		*share_on_othp = share_on_oth;
 	return 0;
 }
 
@@ -1241,7 +1249,8 @@ sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool blocking)
 		}
 	}
 	/* do sanity checks on all requests before starting */
-	res = sg_mrq_sanity(sdp, cop, a_hds, cdb_ap, fp, immed, tot_reqs);
+	res = sg_mrq_sanity(sdp, cop, a_hds, cdb_ap, fp, immed, tot_reqs,
+			    NULL);
 	if (unlikely(res))
 		goto fini;
 	set_this = false;
-- 
2.25.1



* [PATCH v18 63/83] sg: shared variable blocking
From: Douglas Gilbert @ 2021-04-27 21:57 UTC
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

Increase the number of reserve requests per file descriptor
from 1 to SG_MAX_RSV_REQS (8). These are used to implement a
new multiple requests (mrq) mode, "shared variable blocking",
which processes request shares in a partially asynchronous
fashion.

For example, up to SG_MAX_RSV_REQS read-side requests are
submitted. Then the responses to those read-side requests
are processed (this may include interruptible waits). After
that, the matching write-side requests are issued and their
responses are processed.
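
The ordering in the previous paragraph can be modelled with a small,
hypothetical C sketch. It assumes the pairs are handled in batches of
at most SG_MAX_RSV_REQS and that the submit/receive cycle repeats for
each batch; for simplicity it records responses in submission order,
whereas the driver harvests them as they complete. svb_schedule() and
the event types are illustrative names, not sg code.

```c
#include <assert.h>

#define MAX_RSV_REQS 8	/* mirrors SG_MAX_RSV_REQS in this patch */

enum svb_ev_kind { SUBMIT_READ, RECV_READ, SUBMIT_WRITE, RECV_WRITE };

struct svb_ev {
	enum svb_ev_kind kind;
	int pair;		/* index of the read-side/write-side pair */
};

/*
 * Hypothetical model: record the order in which n_pairs read/write pairs
 * are submitted and harvested. evs must hold 4 * n_pairs entries.
 * Returns the number of events recorded.
 */
static int svb_schedule(int n_pairs, struct svb_ev *evs)
{
	int n = 0;

	for (int base = 0; base < n_pairs; base += MAX_RSV_REQS) {
		int end = base + MAX_RSV_REQS;
		int p;

		if (end > n_pairs)
			end = n_pairs;
		for (p = base; p < end; ++p)	/* 1: submit read-sides */
			evs[n++] = (struct svb_ev){ SUBMIT_READ, p };
		for (p = base; p < end; ++p)	/* 2: harvest read-sides */
			evs[n++] = (struct svb_ev){ RECV_READ, p };
		for (p = base; p < end; ++p)	/* 3: issue write-sides */
			evs[n++] = (struct svb_ev){ SUBMIT_WRITE, p };
		for (p = base; p < end; ++p)	/* 4: harvest write-sides */
			evs[n++] = (struct svb_ev){ RECV_WRITE, p };
	}
	return n;
}
```

For 10 pairs this yields 40 events: reads 0 to 7 are submitted and
received before writes 0 to 7 are issued, then pairs 8 and 9 go
through the same cycle.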

The multiple request array presented for shared variable
blocking should be a sequence of read-side/write-side request
pairs. The only other commands accepted are those that move
no (user) data; TEST UNIT READY and SYNCHRONIZE CACHE are
examples of acceptable non-data commands.
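
This patch enforces that rule in sg_mrq_svb_chk(). The following C
sketch restates the check on a simplified, hypothetical request
struct. The F_* values are placeholders standing in for
SGV4_FLAG_SHARE, SGV4_FLAG_DO_ON_OTHER and SGV4_FLAG_NO_DXFER (the
real values live in the patched include/uapi/scsi/sg.h), and the
SGV4_FLAG_COMPLETE_B4 rejection is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder flag values, NOT the uapi SGV4_FLAG_* values */
#define F_SHARE    0x1	/* stands in for SGV4_FLAG_SHARE */
#define F_ON_OTHER 0x2	/* stands in for SGV4_FLAG_DO_ON_OTHER */
#define F_NO_DXFER 0x4	/* stands in for SGV4_FLAG_NO_DXFER */

/* Hypothetical subset of struct sg_io_v4 fields used by the check */
struct mini_req {
	unsigned int flags;
	unsigned int din_len;	/* data-in (read-side) length */
	unsigned int dout_len;	/* data-out (write-side) length */
};

static bool svb_array_ok(const struct mini_req *a, size_t n)
{
	const unsigned int ws_need = F_SHARE | F_ON_OTHER | F_NO_DXFER;
	bool expect_rd = true;

	for (size_t k = 0; k < n; ++k) {
		const struct mini_req *r = a + k;

		if (expect_rd) {
			if (r->dout_len > 0)
				return false;	/* write where read expected */
			if (r->din_len > 0) {
				/* read-side: shared, runs on this fd */
				if (!(r->flags & F_SHARE) ||
				    (r->flags & F_ON_OTHER))
					return false;
				expect_rd = false;
			}
		} else {
			if (r->din_len > 0)
				return false;	/* read where write expected */
			if (r->dout_len > 0) {
				if ((r->flags & ws_need) != ws_need)
					return false;
				expect_rd = true;
			}
		}
		/* commands with no data transfer are accepted anywhere */
	}
	return expect_rd;	/* an unmatched read-side is rejected */
}
```

A TEST UNIT READY (no data) followed by one correctly flagged
read-side/write-side pair is accepted; a lone read-side, or a
write-side missing the no-data-transfer flag, is rejected.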

Rename sg_remove_sgat() to the more accurate sg_remove_srp().

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c      | 1949 +++++++++++++++++++++++++++-------------
 include/uapi/scsi/sg.h |    1 +
 2 files changed, 1328 insertions(+), 622 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index e43bb1673adc..c401047cae70 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -120,6 +120,8 @@ enum sg_shr_var {
 #define SG_DEFAULT_Q_AT SG_FD_Q_AT_HEAD /* for backward compatibility */
 #define SG_FL_MMAP_DIRECT (SG_FLAG_MMAP_IO | SG_FLAG_DIRECT_IO)
 
+#define SG_MAX_RSV_REQS 8
+
 /* Only take lower 4 bits of driver byte, all host byte and sense byte */
 #define SG_ML_RESULT_MSK 0x0fff00ff	/* mid-level's 32 bit result value */
 
@@ -140,6 +142,7 @@ enum sg_shr_var {
 #define SG_FRQ_COUNT_ACTIVE	8	/* sfp->submitted + waiting active */
 #define SG_FRQ_ISSUED		9	/* blk_execute_rq_nowait() finished */
 #define SG_FRQ_POLL_SLEPT	10	/* stop re-entry of hybrid_sleep() */
+#define SG_FRQ_RESERVED		11	/* marks a reserved request */
 
 /* Bit positions (flags) for sg_fd::ffd_bm bitmask follow */
 #define SG_FFD_FORCE_PACKID	0	/* receive only given pack_id/tag */
@@ -155,6 +158,8 @@ enum sg_shr_var {
 #define SG_FFD_MORE_ASYNC	10	/* yield EBUSY more often */
 #define SG_FFD_MRQ_ABORT	11	/* SG_IOABORT + FLAG_MULTIPLE_REQS */
 #define SG_FFD_EXCL_WAITQ	12	/* append _exclusive to wait_event */
+#define SG_FFD_SVB_ACTIVE	13	/* shared variable blocking active */
+#define SG_FFD_RESHARE		14	/* reshare limits to single rsv req */
 
 /* Bit positions (flags) for sg_device::fdev_bm bitmask follow */
 #define SG_FDEV_EXCLUDE		0	/* have fd open with O_EXCL */
@@ -261,6 +266,7 @@ struct sg_request {	/* active SCSI command or inactive request */
 	u8 *sense_bp;		/* mempool alloc-ed sense buffer, as needed */
 	struct sg_fd *parentfp;	/* pointer to owning fd, even when on fl */
 	struct request *rqq;	/* released in sg_rq_end_io(), bio kept */
+	struct sg_request *sh_srp; /* read-side's write srp (or vice versa) */
 	struct bio *bio;	/* kept until this req -->SG_RQ_INACTIVE */
 	struct execute_work ew_orph;	/* harvest orphan request */
 };
@@ -286,11 +292,10 @@ struct sg_fd {		/* holds the state of a file descriptor */
 	u8 next_cmd_len;	/* 0: automatic, >0: use on next write() */
 	unsigned long ffd_bm[1];	/* see SG_FFD_* defines above */
 	struct file *filp;	/* my identity when sharing */
-	struct sg_request *rsv_srp;/* one reserve request per fd */
-	struct sg_request *ws_srp; /* when rsv SG_SHR_RS_RQ, ptr to write-side */
 	struct sg_fd __rcu *share_sfp;/* fd share cross-references, else NULL */
 	struct fasync_struct *async_qp; /* used by asynchronous notification */
 	struct xarray srp_arr;	/* xarray of sg_request object pointers */
+	struct sg_request *rsv_arr[SG_MAX_RSV_REQS];
 	struct kref f_ref;
 	struct execute_work ew_fd;  /* harvest all fd resources and lists */
 };
@@ -314,6 +319,7 @@ struct sg_device { /* holds the state of each scsi generic device */
 struct sg_comm_wr_t {  /* arguments to sg_common_write() */
 	int timeout;
 	int cmd_len;
+	int rsv_idx;		/* wanted rsv_arr index, def: -1 (anyone) */
 	unsigned long frq_bm[1];	/* see SG_FRQ_* defines above */
 	union {		/* selector is frq_bm.SG_FRQ_IS_V4I */
 		struct sg_io_hdr *h3p;
@@ -324,6 +330,20 @@ struct sg_comm_wr_t {  /* arguments to sg_common_write() */
 	const u8 *cmdp;
 };
 
+struct sg_mrq_hold {	/* for passing context between mrq functions */
+	bool blocking;
+	bool chk_abort;
+	bool immed;
+	bool stop_if;
+	int id_of_mrq;
+	int s_res;		/* secondary error: some-good-then-error */
+	u32 cdb_mxlen;		/* max cdb length in cdb_ap, actual may be less */
+	u32 tot_reqs;		/* total number of requests and cdb_s */
+	struct sg_comm_wr_t *cwrp;
+	u8 *cdb_ap;		/* array of commands */
+	struct sg_io_v4 *a_hds;	/* array of requests to execute */
+};
+
 /* tasklet or soft irq callback */
 static void sg_rq_end_io(struct request *rq, blk_status_t status);
 /* Declarations of other static functions used before they are defined */
@@ -345,7 +365,7 @@ static int sg_receive_v4(struct sg_fd *sfp, struct sg_request *srp,
 			 void __user *p, struct sg_io_v4 *h4p);
 static int sg_read_append(struct sg_request *srp, void __user *outp,
 			  int num_xfer);
-static void sg_remove_sgat(struct sg_request *srp);
+static void sg_remove_srp(struct sg_request *srp);
 static struct sg_fd *sg_add_sfp(struct sg_device *sdp, struct file *filp);
 static void sg_remove_sfp(struct kref *);
 static void sg_remove_sfp_share(struct sg_fd *sfp, bool is_rd_side);
@@ -592,8 +612,7 @@ sg_open(struct inode *inode, struct file *filp)
 
 	res = 0;
 sg_put:
-	kref_put(&sdp->d_ref, sg_device_destroy);
-	/* if success, sdp->d_ref is incremented twice, decremented once */
+	kref_put(&sdp->d_ref, sg_device_destroy);  /* get: sg_get_dev() */
 	return res;
 
 out_undo:
@@ -653,7 +672,7 @@ sg_release(struct inode *inode, struct file *filp)
 
 	if (unlikely(xa_get_mark(&sdp->sfp_arr, sfp->idx, SG_XA_FD_FREE))) {
 		SG_LOG(1, sfp, "%s: sfp already erased!!!\n", __func__);
-		return 0;       /* yell out but can't fail */
+		return 0;	/* yell out but can't fail */
 	}
 
 	mutex_lock(&sdp->open_rel_lock);
@@ -667,8 +686,7 @@ sg_release(struct inode *inode, struct file *filp)
 	    sg_fd_is_shared(sfp))
 		sg_remove_sfp_share(sfp, xa_get_mark(&sdp->sfp_arr, sfp->idx,
 						     SG_XA_FD_RS_SHARE));
-	kref_put(&sfp->f_ref, sg_remove_sfp);
-
+	kref_put(&sfp->f_ref, sg_remove_sfp);	/* init=1: sg_add_sfp() */
 	/*
 	 * Possibly many open()s waiting on exclude clearing, start many;
 	 * only open(O_EXCL)'s wait when open_cnt<2 and only start one.
@@ -831,6 +849,7 @@ sg_write(struct file *filp, const char __user *p, size_t count, loff_t *ppos)
 	WRITE_ONCE(cwr.frq_bm[0], 0);
 	cwr.timeout = sfp->timeout;
 	cwr.cmd_len = cmd_size;
+	cwr.rsv_idx = -1;
 	cwr.sfp = sfp;
 	cwr.u_cmdp = p;
 	cwr.cmdp = NULL;
@@ -841,11 +860,15 @@ sg_write(struct file *filp, const char __user *p, size_t count, loff_t *ppos)
 static inline int
 sg_chk_mmap(struct sg_fd *sfp, int rq_flags, int len)
 {
+	struct sg_request *rsv_srp = sfp->rsv_arr[0];
+
 	if (unlikely(sfp->mmap_sz == 0))
 		return -EBADFD;
 	if (unlikely(atomic_read(&sfp->submitted) > 0))
 		return -EBUSY;  /* already active requests on fd */
-	if (unlikely(len > sfp->rsv_srp->sgat_h.buflen))
+	if (IS_ERR_OR_NULL(rsv_srp))
+		return -EPROTO;	/* first element always a reserve req */
+	if (unlikely(len > rsv_srp->sgatp->buflen))
 		return -ENOMEM; /* MMAP_IO size must fit in reserve */
 	if (unlikely(len > sfp->mmap_sz))
 		return -ENOMEM; /* MMAP_IO size can't exceed mmap() size */
@@ -900,6 +923,7 @@ sg_submit_v3(struct sg_fd *sfp, struct sg_io_hdr *hp, bool sync,
 	cwr.h3p = hp;
 	cwr.timeout = min_t(unsigned long, ul_timeout, INT_MAX);
 	cwr.cmd_len = hp->cmd_len;
+	cwr.rsv_idx = -1;
 	cwr.sfp = sfp;
 	cwr.u_cmdp = hp->cmdp;
 	cwr.cmdp = NULL;
@@ -946,27 +970,33 @@ sg_mrq_arr_flush(struct sg_io_v4 *cop, struct sg_io_v4 *a_hds, u32 tot_reqs,
 
 static int
 sg_mrq_1complet(struct sg_io_v4 *cop, struct sg_io_v4 *a_hds,
-		struct sg_fd *w_sfp, int tot_reqs, struct sg_request *srp)
+		struct sg_fd *do_on_sfp, int tot_reqs, struct sg_request *srp)
 {
 	int s_res, indx;
 	struct sg_io_v4 *hp;
 
-	SG_LOG(3, w_sfp, "%s: start, tot_reqs=%d\n", __func__, tot_reqs);
 	if (unlikely(!srp))
 		return -EPROTO;
 	indx = srp->s_hdr4.mrq_ind;
+	if (unlikely(srp->parentfp != do_on_sfp)) {
+		SG_LOG(1, do_on_sfp, "%s: mrq_ind=%d, sfp out-of-sync\n",
+		       __func__, indx);
+		return -EPROTO;
+	}
+	SG_LOG(3, do_on_sfp, "%s: mrq_ind=%d, pack_id=%d\n", __func__, indx,
+	       srp->pack_id);
 	if (unlikely(indx < 0 || indx >= tot_reqs))
 		return -EPROTO;
 	hp = a_hds + indx;
-	s_res = sg_receive_v4(w_sfp, srp, NULL, hp);
+	s_res = sg_receive_v4(do_on_sfp, srp, NULL, hp);
 	if (unlikely(s_res == -EFAULT))
 		return s_res;
 	hp->info |= SG_INFO_MRQ_FINI;
-	if (w_sfp->async_qp && (hp->flags & SGV4_FLAG_SIGNAL)) {
+	if (do_on_sfp->async_qp && (hp->flags & SGV4_FLAG_SIGNAL)) {
 		s_res = sg_mrq_arr_flush(cop, a_hds, tot_reqs, s_res);
 		if (unlikely(s_res))	/* can only be -EFAULT */
 			return s_res;
-		kill_fasync(&w_sfp->async_qp, SIGPOLL, POLL_IN);
+		kill_fasync(&do_on_sfp->async_qp, SIGPOLL, POLL_IN);
 	}
 	return 0;
 }
@@ -992,36 +1022,47 @@ sg_mrq_complets(struct sg_io_v4 *cop, struct sg_io_v4 *a_hds,
 		struct sg_fd *sfp, struct sg_fd *sec_sfp, int tot_reqs,
 		int mreqs, int sec_reqs)
 {
-	int res;
-	int sum_inflight = mreqs + sec_reqs;	/* may be < tot_reqs */
+	int res = 0;
+	int rres;
 	struct sg_request *srp;
 
 	SG_LOG(3, sfp, "%s: mreqs=%d, sec_reqs=%d\n", __func__, mreqs,
 	       sec_reqs);
-	for ( ; sum_inflight > 0; --sum_inflight, ++cop->info) {
-		srp = NULL;
-		if (mreqs > 0 && sg_mrq_get_ready_srp(sfp, &srp)) {
+	while (mreqs + sec_reqs > 0) {
+		while (mreqs > 0 && sg_mrq_get_ready_srp(sfp, &srp)) {
 			if (IS_ERR(srp)) {	/* -ENODATA: no mrqs here */
-				mreqs = 0;
-			} else {
-				--mreqs;
-				res = sg_mrq_1complet(cop, a_hds, sfp,
-						      tot_reqs, srp);
-				if (unlikely(res))
-					return res;
+				if (PTR_ERR(srp) == -ENODATA)
+					break;
+				res = PTR_ERR(srp);
+				break;
 			}
-		} else if (sec_reqs > 0 &&
-			   sg_mrq_get_ready_srp(sec_sfp, &srp)) {
+			--mreqs;
+			res = sg_mrq_1complet(cop, a_hds, sfp, tot_reqs, srp);
+			if (unlikely(res))
+				return res;
+			++cop->info;
+			if (cop->din_xfer_len > 0)
+				--cop->din_resid;
+		}
+		while (sec_reqs > 0 && sg_mrq_get_ready_srp(sec_sfp, &srp)) {
 			if (IS_ERR(srp)) {
-				sec_reqs = 0;
-			} else {
-				--sec_reqs;
-				res = sg_mrq_1complet(cop, a_hds, sec_sfp,
-						      tot_reqs, srp);
-				if (unlikely(res))
-					return res;
+				if (PTR_ERR(srp) == -ENODATA)
+					break;
+				res = PTR_ERR(srp);
+				break;
 			}
-		} else if (mreqs > 0) {
+			--sec_reqs;
+			rres = sg_mrq_1complet(cop, a_hds, sec_sfp, tot_reqs,
+					       srp);
+			if (unlikely(rres))
+				return rres;
+			++cop->info;
+			if (cop->din_xfer_len > 0)
+				--cop->din_resid;
+		}
+		if (res)
+			break;
+		if (mreqs > 0) {
 			res = sg_wait_mrq_event(sfp, &srp);
 			if (unlikely(res))
 				return res;	/* signal --> -ERESTARTSYS */
@@ -1033,8 +1074,12 @@ sg_mrq_complets(struct sg_io_v4 *cop, struct sg_io_v4 *a_hds,
 						      tot_reqs, srp);
 				if (unlikely(res))
 					return res;
+				++cop->info;
+				if (cop->din_xfer_len > 0)
+					--cop->din_resid;
 			}
-		} else if (sec_reqs > 0) {
+		}
+		if (sec_reqs > 0) {
 			res = sg_wait_mrq_event(sec_sfp, &srp);
 			if (unlikely(res))
 				return res;	/* signal --> -ERESTARTSYS */
@@ -1046,14 +1091,13 @@ sg_mrq_complets(struct sg_io_v4 *cop, struct sg_io_v4 *a_hds,
 						      tot_reqs, srp);
 				if (unlikely(res))
 					return res;
+				++cop->info;
+				if (cop->din_xfer_len > 0)
+					--cop->din_resid;
 			}
-		} else { /* expect one of the above conditions to be true */
-			return -EPROTO;
 		}
-		if (cop->din_xfer_len > 0)
-			--cop->din_resid;
-	}
-	return 0;
+	}	/* end of outer while loop (while requests still inflight) */
+	return res;
 }
 
 static int
@@ -1101,7 +1145,6 @@ sg_mrq_sanity(struct sg_device *sdp, struct sg_io_v4 *cop,
 				       rip, k, "no IMMED with COMPLETE_B4");
 				return -ERANGE;
 			}
-			/* N.B. SGV4_FLAG_SIG_ON_OTHER is allowed */
 		}
 		if (sg_fd_is_shared(sfp)) {
 			if (!share_on_oth && share)
@@ -1135,6 +1178,422 @@ sg_mrq_sanity(struct sg_device *sdp, struct sg_io_v4 *cop,
 	return 0;
 }
 
+static bool
+sg_mrq_svb_chk(struct sg_io_v4 *a_hds, u32 tot_reqs)
+{
+	bool expect_rd;
+	int k;
+	u32 flags;
+	struct sg_io_v4 *hp;
+
+	/* read/write pairs; write-side needs SHARE, ON_OTHER and NO_DXFER */
+	for (k = 0, hp = a_hds, expect_rd = true; k < tot_reqs; ++k, ++hp) {
+		flags = hp->flags;
+		if (flags & (SGV4_FLAG_COMPLETE_B4))
+			return false;
+		if (expect_rd) {
+			if (hp->dout_xfer_len > 0)
+				return false;
+			if (hp->din_xfer_len > 0) {
+				if (!(flags & SGV4_FLAG_SHARE))
+					return false;
+				if (flags & SGV4_FLAG_DO_ON_OTHER)
+					return false;
+				expect_rd = false;
+			}
+			/* allowing commands with no dxfer */
+		} else {	/* checking write side */
+			if (hp->dout_xfer_len > 0) {
+				if (~flags &
+				    (SGV4_FLAG_NO_DXFER | SGV4_FLAG_SHARE |
+				     SGV4_FLAG_DO_ON_OTHER))
+					return false;
+				expect_rd = true;
+			}
+			if (hp->din_xfer_len > 0)
+				return false;
+		}
+	}
+	if (!expect_rd)
+		return false;
+	return true;
+}
+
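The pairing rules that sg_mrq_svb_chk() enforces can be modelled in user space. The sketch below is a simplified rendition under stated assumptions: `struct req` is a hypothetical stand-in for the relevant sg_io_v4 fields, and the `F_*` flag values are hypothetical placeholders, not the real SGV4_FLAG_* constants. It checks the same rule: every data-in (read-side) request must be followed, possibly after non-data requests, by a data-out (write-side) request before the next read-side one.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values; the real SGV4_FLAG_* constants live in the sg uapi header. */
#define F_SHARE        0x1u
#define F_NO_DXFER     0x2u
#define F_DO_ON_OTHER  0x4u
#define F_COMPLETE_B4  0x8u

struct req {                /* hypothetical stand-in for sg_io_v4 fields */
	uint32_t flags;
	uint32_t din_len;   /* like din_xfer_len */
	uint32_t dout_len;  /* like dout_xfer_len */
};

/* Returns true if reqs form read/write pairs in the style of sg_mrq_svb_chk(). */
static bool svb_pairs_ok(const struct req *r, size_t n)
{
	const uint32_t ws_need = F_NO_DXFER | F_SHARE | F_DO_ON_OTHER;
	bool expect_rd = true;

	for (size_t k = 0; k < n; ++k) {
		uint32_t f = r[k].flags;

		if (f & F_COMPLETE_B4)
			return false;
		if (expect_rd) {
			if (r[k].dout_len > 0)
				return false;   /* write-side with no pending read */
			if (r[k].din_len > 0) {
				/* read-side: must share, must run on this fd */
				if (!(f & F_SHARE) || (f & F_DO_ON_OTHER))
					return false;
				expect_rd = false;
			}
			/* non-data requests are allowed anywhere */
		} else {
			if (r[k].dout_len > 0) {
				/* write-side needs all three flags */
				if ((f & ws_need) != ws_need)
					return false;
				expect_rd = true;
			}
			if (r[k].din_len > 0)
				return false;   /* two reads in a row */
		}
	}
	return expect_rd;   /* a trailing unpaired read is rejected */
}
```

Note that `(f & ws_need) != ws_need` is equivalent to the kernel's `~flags & (...)` test: both fail when any one of the three flags is missing.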
+static struct sg_request *
+sg_mrq_submit(struct sg_fd *rq_sfp, struct sg_mrq_hold *mhp, int pos_hdr,
+	      int rsv_idx)
+{
+	unsigned long ul_timeout;
+	struct sg_comm_wr_t r_cwr;
+	struct sg_comm_wr_t *r_cwrp = &r_cwr;
+	struct sg_io_v4 *hp = mhp->a_hds + pos_hdr;
+
+	if (mhp->cdb_ap) {	/* already have array of cdbs */
+		r_cwrp->cmdp = mhp->cdb_ap + (pos_hdr * mhp->cdb_mxlen);
+		r_cwrp->u_cmdp = NULL;
+	} else {	/* fetch each cdb from user space */
+		r_cwrp->cmdp = NULL;
+		r_cwrp->u_cmdp = cuptr64(hp->request);
+	}
+	r_cwrp->cmd_len = hp->request_len;
+	r_cwrp->rsv_idx = rsv_idx;
+	ul_timeout = msecs_to_jiffies(hp->timeout);
+	r_cwrp->frq_bm[0] = 0;
+	__assign_bit(SG_FRQ_SYNC_INVOC, r_cwrp->frq_bm,
+		     (int)mhp->blocking);
+	__set_bit(SG_FRQ_IS_V4I, r_cwrp->frq_bm);
+	r_cwrp->h4p = hp;
+	r_cwrp->timeout = min_t(unsigned long, ul_timeout, INT_MAX);
+	r_cwrp->sfp = rq_sfp;
+	return sg_common_write(r_cwrp);
+}
+
+/*
+ * Processes all mrq requests except those using the "shared variable
+ * blocking" (svb) method, which are handled by sg_process_svb_mrq().
+ */
+static int
+sg_process_most_mrq(struct sg_fd *fp, struct sg_fd *o_sfp,
+		    struct sg_mrq_hold *mhp)
+{
+	int flags, j;
+	int num_subm = 0;
+	int num_cmpl = 0;
+	int res = 0;
+	int other_fp_sent = 0;
+	int this_fp_sent = 0;
+	const int shr_complet_b4 = SGV4_FLAG_SHARE | SGV4_FLAG_COMPLETE_B4;
+	struct sg_fd *rq_sfp;
+	struct sg_io_v4 *cop = mhp->cwrp->h4p;
+	struct sg_io_v4 *hp;		/* ptr to request object in a_hds */
+	struct sg_request *srp;
+
+	SG_LOG(3, fp, "%s: id_of_mrq=%d, tot_reqs=%d, enter\n", __func__,
+	       mhp->id_of_mrq, mhp->tot_reqs);
+	/* Dispatch (submit) requests and optionally wait for response */
+	for (hp = mhp->a_hds, j = 0; num_subm < mhp->tot_reqs; ++hp, ++j) {
+		if (mhp->chk_abort && test_and_clear_bit(SG_FFD_MRQ_ABORT,
+							 fp->ffd_bm)) {
+			SG_LOG(1, fp, "%s: id_of_mrq=%d aborting at ind=%d\n",
+			       __func__, mhp->id_of_mrq, num_subm);
+			break;	/* N.B. rest not submitted */
+		}
+		flags = hp->flags;
+		rq_sfp = (flags & SGV4_FLAG_DO_ON_OTHER) ? o_sfp : fp;
+		srp = sg_mrq_submit(rq_sfp, mhp, j, -1);
+		if (IS_ERR(srp)) {
+			mhp->s_res = PTR_ERR(srp);
+			break;
+		}
+		srp->s_hdr4.mrq_ind = num_subm++;
+		if (mhp->chk_abort)
+			atomic_set(&srp->s_hdr4.pack_id_of_mrq,
+				   mhp->id_of_mrq);
+		if (mhp->immed ||
+		    (!(mhp->blocking || (flags & shr_complet_b4)))) {
+			if (fp == rq_sfp)
+				++this_fp_sent;
+			else
+				++other_fp_sent;
+			continue;  /* defer completion until all submitted */
+		}
+		mhp->s_res = sg_wait_event_srp(rq_sfp, NULL, hp, srp);
+		if (unlikely(mhp->s_res)) {
+			if (mhp->s_res == -ERESTARTSYS)
+				return mhp->s_res;
+			break;
+		}
+		++num_cmpl;
+		hp->info |= SG_INFO_MRQ_FINI;
+		if (mhp->stop_if && (hp->driver_status ||
+				     hp->transport_status ||
+				     hp->device_status)) {
+			SG_LOG(2, fp, "%s: %s=0x%x/0x%x/0x%x] cause exit\n",
+			       __func__, "STOP_IF and status [drv/tran/scsi",
+			       hp->driver_status, hp->transport_status,
+			       hp->device_status);
+			break;	/* cop->driver_status <-- 0 in this case */
+		}
+		if (rq_sfp->async_qp && (hp->flags & SGV4_FLAG_SIGNAL)) {
+			res = sg_mrq_arr_flush(cop, mhp->a_hds, mhp->tot_reqs,
+					       mhp->s_res);
+			if (unlikely(res))
+				break;
+			kill_fasync(&rq_sfp->async_qp, SIGPOLL, POLL_IN);
+		}
+	}	/* end of dispatch request and optionally wait response loop */
+	cop->dout_resid = mhp->tot_reqs - num_subm;
+	cop->info = mhp->immed ? num_subm : num_cmpl;
+	if (cop->din_xfer_len > 0) {
+		cop->din_resid = mhp->tot_reqs - num_cmpl;
+		cop->spare_out = -mhp->s_res;
+	}
+
+	if (mhp->immed)
+		return res;
+	if (likely(res == 0 && (this_fp_sent + other_fp_sent) > 0)) {
+		mhp->s_res = sg_mrq_complets(cop, mhp->a_hds, fp, o_sfp,
+					     mhp->tot_reqs, this_fp_sent,
+					     other_fp_sent);
+		if (unlikely(mhp->s_res == -EFAULT ||
+			     mhp->s_res == -ERESTARTSYS))
+			res = mhp->s_res;	/* this may leave orphans */
+	}
+	if (mhp->id_of_mrq)	/* can no longer do a mrq abort */
+		atomic_set(&fp->mrq_id_abort, 0);
+	return res;
+}
+
+static int
+sg_find_srp_idx(struct sg_fd *sfp, const struct sg_request *srp)
+{
+	int k;
+	struct sg_request **rapp = sfp->rsv_arr;
+
+	for (k = 0; k < SG_MAX_RSV_REQS; ++k, ++rapp) {
+		if (*rapp == srp)
+			return k;
+	}
+	return -1;
+}
+
+/*
+ * Processes shared variable blocking (svb) mrq requests. The first inner
+ * loop submits a chunk of requests (read-side and non-data ones) but defers
+ * the write-side requests. The second inner loop processes completions from
+ * the first; for each completed read-side request it submits the paired
+ * write-side request, then waits for those write-side requests to complete.
+ * The outer loop moves on to the next chunk until all of the multiple
+ * requests are done. The user sees a blocking command, but the requests
+ * within a chunk run in parallel, subject only to read-before-write ordering.
+ * N.B. Only one svb mrq is permitted per file descriptor at a time.
+ */
+static int
+sg_process_svb_mrq(struct sg_fd *fp, struct sg_fd *o_sfp,
+		   struct sg_mrq_hold *mhp)
+{
+	bool aborted = false;
+	bool chk_oth_first;
+	int k, j, i, m, rcv_before, idx, ws_pos, sent;
+	int this_fp_sent, other_fp_sent;
+	int num_subm = 0;
+	int num_cmpl = 0;
+	int res = 0;
+	struct sg_fd *rq_sfp;
+	struct sg_io_v4 *cop = mhp->cwrp->h4p;
+	struct sg_io_v4 *hp;		/* ptr to request object in a_hds */
+	struct sg_request *srp;
+	struct sg_request *rs_srp;
+	struct sg_io_v4 *a_hds = mhp->a_hds;
+	int ws_pos_a[SG_MAX_RSV_REQS];	/* write-side hdr pos within a_hds */
+	struct sg_request *rs_srp_a[SG_MAX_RSV_REQS];
+
+	SG_LOG(3, fp, "%s: id_of_mrq=%d, tot_reqs=%d, enter\n", __func__,
+	       mhp->id_of_mrq, mhp->tot_reqs);
+
+	/* work through mrq array, SG_MAX_RSV_REQS read-side requests at a time */
+	for (hp = a_hds, j = 0; j < mhp->tot_reqs; ) {
+		this_fp_sent = 0;
+		other_fp_sent = 0;
+		chk_oth_first = false;
+		for (k = 0; k < SG_MAX_RSV_REQS && j < mhp->tot_reqs;
+		     ++hp, ++j) {
+			if (mhp->chk_abort &&
+			    test_and_clear_bit(SG_FFD_MRQ_ABORT, fp->ffd_bm)) {
+				SG_LOG(1, fp,
+				       "%s: id_of_mrq=%d aborting at pos=%d\n",
+				       __func__, mhp->id_of_mrq, num_subm);
+				aborted = true;
+				/*
+				 * after mrq abort detected, complete those
+				 * already submitted, but don't submit any more
+				 */
+			}
+			if (aborted)
+				break;
+			if (hp->flags & SGV4_FLAG_DO_ON_OTHER) {
+				if (hp->dout_xfer_len > 0) {
+					/* need to await read-side completion */
+					ws_pos_a[k] = j;
+					++k;
+					continue;  /* deferred to next loop */
+				}
+				chk_oth_first = true;
+				SG_LOG(6, o_sfp,
+				       "%s: subm-nodat p_id=%d on write-side\n",
+				       __func__, (int)hp->request_extra);
+				rq_sfp = o_sfp;
+			} else {
+				SG_LOG(6, fp, "%s: submit p_id=%d on read-side\n",
+				       __func__, (int)hp->request_extra);
+				rq_sfp = fp;
+			}
+			srp = sg_mrq_submit(rq_sfp, mhp, j, -1);
+			if (IS_ERR(srp)) {
+				mhp->s_res = PTR_ERR(srp);
+				res = mhp->s_res;	/* don't loop again */
+				SG_LOG(1, rq_sfp, "%s: mrq_submit()->%d\n",
+				       __func__, res);
+				break;
+			}
+			num_subm++;
+			if (hp->din_xfer_len > 0)
+				rs_srp_a[k] = srp;
+			srp->s_hdr4.mrq_ind = j;
+			if (mhp->chk_abort)
+				atomic_set(&srp->s_hdr4.pack_id_of_mrq,
+					   mhp->id_of_mrq);
+			if (fp == rq_sfp)
+				++this_fp_sent;
+			else
+				++other_fp_sent;
+		}
+		sent = this_fp_sent + other_fp_sent;
+		if (sent <= 0)
+			break;
+		/*
+		 * We have just submitted a fixed number of read-side requests
+		 * plus any others that don't move data. Now pick up their
+		 * responses. Each completed read-side request has its paired
+		 * write-side request submitted. Finally, wait for those
+		 * write-side requests to complete.
+		 */
+		rcv_before = cop->info;
+		for (i = 0; i < sent; ++i) {	/* now process responses */
+			if (other_fp_sent > 0 &&
+			    sg_mrq_get_ready_srp(o_sfp, &srp)) {
+other_found:
+				if (IS_ERR(srp)) {
+					res = PTR_ERR(srp);
+					break;
+				}
+				--other_fp_sent;
+				res = sg_mrq_1complet(cop, a_hds, o_sfp,
+						      mhp->tot_reqs, srp);
+				if (unlikely(res))
+					return res;
+				++cop->info;
+				if (cop->din_xfer_len > 0)
+					--cop->din_resid;
+				continue;  /* do available submits first */
+			}
+			if (this_fp_sent > 0 &&
+			    sg_mrq_get_ready_srp(fp, &srp)) {
+this_found:
+				if (IS_ERR(srp)) {
+					res = PTR_ERR(srp);
+					break;
+				}
+				--this_fp_sent;
+				res = sg_mrq_1complet(cop, a_hds, fp,
+						      mhp->tot_reqs, srp);
+				if (unlikely(res))
+					return res;
+				++cop->info;
+				if (cop->din_xfer_len > 0)
+					--cop->din_resid;
+				if (srp->s_hdr4.dir != SG_DXFER_FROM_DEV)
+					continue;
+				/* read-side req completed, submit its write-side */
+				rs_srp = srp;
+				for (m = 0; m < k; ++m) {
+					if (rs_srp == rs_srp_a[m])
+						break;
+				}
+				if (m >= k) {
+					SG_LOG(1, rs_srp->parentfp,
+					       "%s: m >= %d, pack_id=%d\n",
+					       __func__, k, rs_srp->pack_id);
+					res = -EPROTO;
+					break;
+				}
+				ws_pos = ws_pos_a[m];
+				idx = sg_find_srp_idx(fp, rs_srp);
+				if (idx < 0) {
+					SG_LOG(1, rs_srp->parentfp,
+					       "%s: idx < 0\n", __func__);
+					res = -EPROTO;
+					break;
+				}
+				SG_LOG(6, o_sfp,
+				       "%s: submit ws_pos=%d, rs_idx=%d\n",
+				       __func__, ws_pos, idx);
+				srp = sg_mrq_submit(o_sfp, mhp, ws_pos, idx);
+				if (IS_ERR(srp)) {
+					mhp->s_res = PTR_ERR(srp);
+					res = mhp->s_res;
+					SG_LOG(1, o_sfp,
+					       "%s: mrq_submit(oth)->%d\n",
+					       __func__, res);
+					break;
+				}
+				++num_subm;
+				++other_fp_sent;
+				++sent;
+				srp->s_hdr4.mrq_ind = ws_pos;
+				if (mhp->chk_abort)
+					atomic_set(&srp->s_hdr4.pack_id_of_mrq,
+						   mhp->id_of_mrq);
+				continue;  /* do available submits first */
+			}
+			/* waits may be interrupted by signals (-ERESTARTSYS) */
+			if (chk_oth_first)
+				goto oth_first;
+this_second:
+			if (this_fp_sent > 0) {
+				res = sg_wait_mrq_event(fp, &srp);
+				if (unlikely(res))
+					return res;
+				goto this_found;
+			}
+			if (chk_oth_first)
+				continue;
+oth_first:
+			if (other_fp_sent > 0) {
+				res = sg_wait_mrq_event(o_sfp, &srp);
+				if (unlikely(res))
+					return res;
+				goto other_found;
+			}
+			if (chk_oth_first)
+				goto this_second;
+		}	/* end of response/write_side_submit/write_side_response loop */
+		if (unlikely(mhp->s_res == -EFAULT ||
+			     mhp->s_res == -ERESTARTSYS))
+			res = mhp->s_res;	/* this may leave orphans */
+		num_cmpl += (cop->info - rcv_before);
+		if (res)
+			break;
+		if (aborted)
+			break;
+	}	/* end of outer for loop */
+
+	cop->dout_resid = mhp->tot_reqs - num_subm;
+	if (cop->din_xfer_len > 0) {
+		cop->din_resid = mhp->tot_reqs - num_cmpl;
+		cop->spare_out = -mhp->s_res;
+	}
+	if (mhp->id_of_mrq)	/* can no longer do a mrq abort */
+		atomic_set(&fp->mrq_id_abort, 0);
+	return res;
+}
+
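The chunking described in the comment above sg_process_svb_mrq() can be illustrated with a simplified, single-threaded model. This is a sketch, not the driver's behaviour: `MAX_PAIRS` is a hypothetical stand-in for SG_MAX_RSV_REQS (its value here is assumed), requests are encoded as characters ('R' read-side, 'W' write-side, 'N' no data), and every read in a chunk is assumed to complete before any write is released. In the driver each write-side request is submitted as soon as its own read-side completes, so the real order within a chunk can vary.

```c
#include <stddef.h>

#define MAX_PAIRS 2   /* hypothetical stand-in for SG_MAX_RSV_REQS */

/*
 * Records in out[] the order in which request indexes would be submitted.
 * Returns the number of submissions. Assumes ops[] already passed a
 * pairing check (each 'R' is followed by its 'W').
 */
static size_t svb_order(const char *ops, size_t n, int *out)
{
	size_t j = 0, nout = 0;

	while (j < n) {                         /* outer loop: one chunk */
		size_t ws_pos[MAX_PAIRS];       /* deferred write positions */
		int k = 0;

		/* first inner loop: submit reads and no-data, defer writes */
		for (; k < MAX_PAIRS && j < n; ++j) {
			if (ops[j] == 'W') {
				ws_pos[k++] = j;
				continue;
			}
			out[nout++] = (int)j;
		}
		/* second inner loop: each completed read releases its write */
		for (int m = 0; m < k; ++m)
			out[nout++] = (int)ws_pos[m];
	}
	return nout;
}
```

For "RWRWRW" with MAX_PAIRS of 2, the model submits indexes 0, 2, 1, 3 in the first chunk and 4, 5 in the second.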
+#if IS_ENABLED(SG_LOG_ACTIVE)
+static const char *
+sg_mrq_name(bool blocking, u32 flags)
+{
+	if (!(flags & SGV4_FLAG_MULTIPLE_REQS))
+		return "_not_ multiple requests control object";
+	if (blocking)
+		return "ordered blocking";
+	if (flags & SGV4_FLAG_IMMED)
+		return "submit or full non-blocking";
+	if (flags & SGV4_FLAG_SHARE)
+		return "shared variable blocking";
+	return "variable blocking";
+}
+#endif
+
 /*
  * Implements the multiple request functionality. When 'blocking' is true
  * invocation was via ioctl(SG_IO), otherwise it was via ioctl(SG_IOSUBMIT).
@@ -1145,47 +1604,51 @@ sg_mrq_sanity(struct sg_device *sdp, struct sg_io_v4 *cop,
 static int
 sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool blocking)
 {
-	bool chk_abort = false;
-	bool set_this, set_other, immed, stop_if, f_non_block;
+	bool f_non_block, share_on_oth;
 	int res = 0;
-	int s_res = 0;	/* for secondary error: some-good-then-error, case */
-	int other_fp_sent = 0;
-	int this_fp_sent = 0;
-	int num_subm = 0;
-	int num_cmpl = 0;
-	const int shr_complet_b4 = SGV4_FLAG_SHARE | SGV4_FLAG_COMPLETE_B4;
-	int id_of_mrq, existing_id;
-	u32 n, flags, cdb_mxlen;
-	unsigned long ul_timeout;
+	int existing_id;
+	u32 cdb_mxlen;
 	struct sg_io_v4 *cop = cwrp->h4p;	/* controlling object */
 	u32 blen = cop->dout_xfer_len;
 	u32 cdb_alen = cop->request_len;
 	u32 tot_reqs = blen / SZ_SG_IO_V4;
 	u8 *cdb_ap = NULL;
-	struct sg_io_v4 *hp;		/* ptr to request object in a_hds */
 	struct sg_io_v4 *a_hds;		/* array of request objects */
 	struct sg_fd *fp = cwrp->sfp;
 	struct sg_fd *o_sfp = sg_fd_share_ptr(fp);
-	struct sg_fd *rq_sfp;
-	struct sg_request *srp;
 	struct sg_device *sdp = fp->parentdp;
+	struct sg_mrq_hold mh;
+	struct sg_mrq_hold *mhp = &mh;
+#if IS_ENABLED(SG_LOG_ACTIVE)
+	const char *mrq_name;
+#endif
 
+	mhp->cwrp = cwrp;
+	mhp->blocking = blocking;
+#if IS_ENABLED(SG_LOG_ACTIVE)
+	mrq_name = sg_mrq_name(blocking, cop->flags);
+#endif
 	f_non_block = !!(fp->filp->f_flags & O_NONBLOCK);
-	immed = !!(cop->flags & SGV4_FLAG_IMMED);
-	stop_if = !!(cop->flags & SGV4_FLAG_STOP_IF);
-	id_of_mrq = (int)cop->request_extra;
-	if (id_of_mrq) {
-		existing_id = atomic_cmpxchg(&fp->mrq_id_abort, 0, id_of_mrq);
-		if (existing_id && existing_id != id_of_mrq) {
+	mhp->immed = !!(cop->flags & SGV4_FLAG_IMMED);
+	mhp->stop_if = !!(cop->flags & SGV4_FLAG_STOP_IF);
+	mhp->id_of_mrq = (int)cop->request_extra;
+	mhp->tot_reqs = tot_reqs;
+	mhp->s_res = 0;
+	if (mhp->id_of_mrq) {
+		existing_id = atomic_cmpxchg(&fp->mrq_id_abort, 0,
+					     mhp->id_of_mrq);
+		if (existing_id && existing_id != mhp->id_of_mrq) {
 			SG_LOG(1, fp, "%s: existing id=%d id_of_mrq=%d\n",
-			       __func__, existing_id, id_of_mrq);
+			       __func__, existing_id, mhp->id_of_mrq);
 			return -EDOM;
 		}
 		clear_bit(SG_FFD_MRQ_ABORT, fp->ffd_bm);
-		chk_abort = true;
+		mhp->chk_abort = true;
+	} else {
+		mhp->chk_abort = false;
 	}
 	if (blocking) {		/* came from ioctl(SG_IO) */
-		if (unlikely(immed)) {
+		if (unlikely(mhp->immed)) {
 			SG_LOG(1, fp, "%s: ioctl(SG_IO) %s contradicts\n",
 			       __func__, "with SGV4_FLAG_IMMED");
 			return -ERANGE;
@@ -1196,11 +1659,10 @@ sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool blocking)
 			f_non_block = false;
 		}
 	}
-	if (!immed && f_non_block)
-		immed = true;
+	if (!mhp->immed && f_non_block)
+		mhp->immed = true;
 	SG_LOG(3, fp, "%s: %s, tot_reqs=%u, id_of_mrq=%d\n", __func__,
-	       (immed ? "IMMED" : (blocking ?  "ordered blocking" :
-				   "variable blocking")), tot_reqs, id_of_mrq);
+	       mrq_name, tot_reqs, mhp->id_of_mrq);
 	sg_sgv4_out_zero(cop);
 
 	if (unlikely(tot_reqs > U16_MAX)) {
@@ -1208,7 +1670,7 @@ sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool blocking)
 	} else if (unlikely(blen > SG_MAX_MULTI_REQ_SZ ||
 			    cdb_alen > SG_MAX_MULTI_REQ_SZ)) {
 		return  -E2BIG;
-	} else if (unlikely(immed && stop_if)) {
+	} else if (unlikely(mhp->immed && mhp->stop_if)) {
 		return -ERANGE;
 	} else if (unlikely(tot_reqs == 0)) {
 		return 0;
@@ -1224,16 +1686,14 @@ sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool blocking)
 		cdb_mxlen = 0;
 	}
 
-	if (SG_IS_DETACHING(sdp))
-		return -ENODEV;
-	else if (unlikely(o_sfp && SG_IS_DETACHING((o_sfp->parentdp))))
+	if (SG_IS_DETACHING(sdp) || (o_sfp && SG_IS_DETACHING(o_sfp->parentdp)))
 		return -ENODEV;
 
 	a_hds = kcalloc(tot_reqs, SZ_SG_IO_V4, GFP_KERNEL | __GFP_NOWARN);
 	if (unlikely(!a_hds))
 		return -ENOMEM;
-	n = tot_reqs * SZ_SG_IO_V4;
-	if (copy_from_user(a_hds, cuptr64(cop->dout_xferp), n)) {
+	if (copy_from_user(a_hds, cuptr64(cop->dout_xferp),
+			   tot_reqs * SZ_SG_IO_V4)) {
 		res = -EFAULT;
 		goto fini;
 	}
@@ -1249,114 +1709,45 @@ sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool blocking)
 		}
 	}
 	/* do sanity checks on all requests before starting */
-	res = sg_mrq_sanity(sdp, cop, a_hds, cdb_ap, fp, immed, tot_reqs,
-			    NULL);
+	res = sg_mrq_sanity(sdp, cop, a_hds, cdb_ap, fp, mhp->immed,
+			    tot_reqs, &share_on_oth);
 	if (unlikely(res))
 		goto fini;
-	set_this = false;
-	set_other = false;
-	/* Dispatch (submit) requests and optionally wait for response */
-	for (hp = a_hds; num_subm < tot_reqs; ++hp) {
-		if (chk_abort && test_and_clear_bit(SG_FFD_MRQ_ABORT,
-						    fp->ffd_bm)) {
-			SG_LOG(1, fp, "%s: id_of_mrq=%d aborting at ind=%d\n",
-			       __func__, id_of_mrq, num_subm);
-			break;	/* N.B. rest not submitted */
-		}
-		flags = hp->flags;
-		if (flags & SGV4_FLAG_DO_ON_OTHER) {
-			rq_sfp = o_sfp;
-			if (!set_other) {
-				set_other = true;
-				if (test_bit(SG_FFD_NO_CMD_Q, rq_sfp->ffd_bm))
-					clear_bit(SG_FFD_NO_CMD_Q,
-						  rq_sfp->ffd_bm);
-			}
-		} else {
-			rq_sfp = fp;
-			if (!set_this) {
-				set_this = true;
-				if (test_bit(SG_FFD_NO_CMD_Q, rq_sfp->ffd_bm))
-					clear_bit(SG_FFD_NO_CMD_Q,
-						  rq_sfp->ffd_bm);
-			}
-		}
-		if (cdb_ap) {	/* already have array of cdbs */
-			cwrp->cmdp = cdb_ap + (num_subm * cdb_mxlen);
-			cwrp->u_cmdp = NULL;
-		} else {	/* fetch each cdb from user space */
-			cwrp->cmdp = NULL;
-			cwrp->u_cmdp = cuptr64(hp->request);
-		}
-		cwrp->cmd_len = hp->request_len;
-		ul_timeout = msecs_to_jiffies(hp->timeout);
-		cwrp->frq_bm[0] = 0;
-		__assign_bit(SG_FRQ_SYNC_INVOC, cwrp->frq_bm, (int)blocking);
-		__set_bit(SG_FRQ_IS_V4I, cwrp->frq_bm);
-		cwrp->h4p = hp;
-		cwrp->timeout = min_t(unsigned long, ul_timeout, INT_MAX);
-		cwrp->sfp = rq_sfp;
-		srp = sg_common_write(cwrp);
-		if (IS_ERR(srp)) {
-			s_res = PTR_ERR(srp);
-			break;
-		}
-		srp->s_hdr4.mrq_ind = num_subm++;
-		if (chk_abort)
-			atomic_set(&srp->s_hdr4.pack_id_of_mrq, id_of_mrq);
-		if (immed || (!(blocking || (flags & shr_complet_b4)))) {
-			if (fp == rq_sfp)
-				++this_fp_sent;
-			else
-				++other_fp_sent;
-			continue;  /* defer completion until all submitted */
-		}
-		s_res = sg_wait_event_srp(rq_sfp, NULL, hp, srp);
-		if (unlikely(s_res)) {
-			if (s_res == -ERESTARTSYS) {
-				res = s_res;
-				goto fini;
-			}
-			break;
-		}
-		++num_cmpl;
-		hp->info |= SG_INFO_MRQ_FINI;
-		if (stop_if && (hp->driver_status || hp->transport_status ||
-				hp->device_status)) {
-			SG_LOG(2, fp, "%s: %s=0x%x/0x%x/0x%x] cause exit\n",
-			       __func__, "STOP_IF and status [drv/tran/scsi",
-			       hp->driver_status, hp->transport_status,
-			       hp->device_status);
-			break;	/* cop->driver_status <-- 0 in this case */
-		}
-		if (rq_sfp->async_qp && (hp->flags & SGV4_FLAG_SIGNAL)) {
-			res = sg_mrq_arr_flush(cop, a_hds, tot_reqs, s_res);
-			if (unlikely(res))
-				break;
-			kill_fasync(&rq_sfp->async_qp, SIGPOLL, POLL_IN);
-		}
-	}	/* end of dispatch request and optionally wait response loop */
-	cop->dout_resid = tot_reqs - num_subm;
-	cop->info = num_cmpl;		/* number received */
-	if (cop->din_xfer_len > 0) {
-		cop->din_resid = tot_reqs - num_cmpl;
-		cop->spare_out = -s_res;
-	}
 
-	if (immed)
-		goto fini;
+	/* override any SG_FFD_NO_CMD_Q setting so commands can be queued */
+	clear_bit(SG_FFD_NO_CMD_Q, fp->ffd_bm);
+	if (o_sfp)
+		clear_bit(SG_FFD_NO_CMD_Q, o_sfp->ffd_bm);
 
-	if (likely(res == 0 && (this_fp_sent + other_fp_sent) > 0)) {
-		s_res = sg_mrq_complets(cop, a_hds, fp, o_sfp, tot_reqs,
-					this_fp_sent, other_fp_sent);
-		if (unlikely(s_res == -EFAULT || s_res == -ERESTARTSYS))
-			res = s_res;	/* this may leave orphans */
+	mhp->cdb_ap = cdb_ap;
+	mhp->a_hds = a_hds;
+	mhp->cdb_mxlen = cdb_mxlen;
+
+	if (!mhp->immed && !blocking && share_on_oth) {
+		bool ok;
+
+		/* check for 'shared' variable blocking (svb) */
+		ok = sg_mrq_svb_chk(a_hds, tot_reqs);
+		if (!ok) {
+			SG_LOG(1, fp, "%s: %s failed on req(s)\n", __func__,
+			       mrq_name);
+			res = -ERANGE;
+			goto fini;
+		}
+		if (test_and_set_bit(SG_FFD_SVB_ACTIVE, fp->ffd_bm)) {
+			SG_LOG(1, fp, "%s: %s already active\n", __func__,
+			       mrq_name);
+			res = -EBUSY;
+			goto fini;
+		}
+		res = sg_process_svb_mrq(fp, o_sfp, mhp);
+		clear_bit(SG_FFD_SVB_ACTIVE, fp->ffd_bm);
+	} else {
+		res = sg_process_most_mrq(fp, o_sfp, mhp);
 	}
-	if (id_of_mrq)	/* can no longer do a mrq abort */
-		atomic_set(&fp->mrq_id_abort, 0);
 fini:
-	if (likely(res == 0) && !immed)
-		res = sg_mrq_arr_flush(cop, a_hds, tot_reqs, s_res);
+	if (likely(res == 0) && !mhp->immed)
+		res = sg_mrq_arr_flush(cop, a_hds, tot_reqs, mhp->s_res);
 	kfree(cdb_ap);
 	kfree(a_hds);
 	return res;
@@ -1414,6 +1805,7 @@ sg_submit_v4(struct sg_fd *sfp, void __user *p, struct sg_io_v4 *h4p,
 	cwr.h4p = h4p;
 	cwr.timeout = min_t(unsigned long, ul_timeout, INT_MAX);
 	cwr.cmd_len = h4p->request_len;
+	cwr.rsv_idx = -1;
 	cwr.u_cmdp = cuptr64(h4p->request);
 	cwr.cmdp = NULL;
 	srp = sg_common_write(&cwr);
@@ -1485,11 +1877,12 @@ sg_share_chk_flags(struct sg_fd *sfp, u32 rq_flags, int dxfer_len, int dir,
 	enum sg_shr_var sh_var = SG_SHR_NONE;
 
 	if (rq_flags & SGV4_FLAG_SHARE) {
-		if (unlikely(rq_flags & SG_FLAG_DIRECT_IO))
+		if (unlikely(rq_flags & SG_FLAG_DIRECT_IO)) {
 			result = -EINVAL; /* since no control of data buffer */
-		else if (unlikely(dxfer_len < 1))
-			result = -ENODATA;
-		else if (is_read_side) {
+		} else if (unlikely(dxfer_len < 1)) {
+			sh_var = is_read_side ? SG_SHR_RS_NOT_SRQ :
+						SG_SHR_WS_NOT_SRQ;
+		} else if (is_read_side) {
 			sh_var = SG_SHR_RS_RQ;
 			if (unlikely(dir != SG_DXFER_FROM_DEV))
 				result = -ENOMSG;
@@ -1498,7 +1891,7 @@ sg_share_chk_flags(struct sg_fd *sfp, u32 rq_flags, int dxfer_len, int dir,
 				if (unlikely(rq_flags & SG_FL_MMAP_DIRECT))
 					result = -ENODATA;
 			}
-		} else {			/* fd is write-side */
+		} else {
 			sh_var = SG_SHR_WS_RQ;
 			if (unlikely(dir != SG_DXFER_TO_DEV))
 				result = -ENOMSG;
@@ -1536,6 +1929,49 @@ sg_rq_state_fail_msg(struct sg_fd *sfp, enum sg_rq_state exp_old_st,
 
 /* Functions ending in '_ulck' assume sfp->xa_lock held by caller. */
 static void
+sg_rq_chg_state_force_ulck(struct sg_request *srp, enum sg_rq_state new_st)
+{
+	bool prev, want;
+	struct sg_fd *sfp = srp->parentfp;
+	struct xarray *xafp = &sfp->srp_arr;
+
+	atomic_set(&srp->rq_st, new_st);
+	want = (new_st == SG_RQ_AWAIT_RCV);
+	prev = xa_get_mark(xafp, srp->rq_idx, SG_XA_RQ_AWAIT);
+	if (prev != want) {
+		if (want)
+			__xa_set_mark(xafp, srp->rq_idx, SG_XA_RQ_AWAIT);
+		else
+			__xa_clear_mark(xafp, srp->rq_idx, SG_XA_RQ_AWAIT);
+	}
+	want = (new_st == SG_RQ_INACTIVE);
+	prev = xa_get_mark(xafp, srp->rq_idx, SG_XA_RQ_INACTIVE);
+	if (prev != want) {
+		if (want) {
+			int prev_idx = READ_ONCE(sfp->low_used_idx);
+
+			if (prev_idx < 0 || srp->rq_idx < prev_idx ||
+			    !xa_get_mark(xafp, prev_idx, SG_XA_RQ_INACTIVE))
+				WRITE_ONCE(sfp->low_used_idx, srp->rq_idx);
+			__xa_set_mark(xafp, srp->rq_idx, SG_XA_RQ_INACTIVE);
+		} else {
+			__xa_clear_mark(xafp, srp->rq_idx, SG_XA_RQ_INACTIVE);
+		}
+	}
+}
+
+static void
+sg_rq_chg_state_force(struct sg_request *srp, enum sg_rq_state new_st)
+{
+	unsigned long iflags;
+	struct xarray *xafp = &srp->parentfp->srp_arr;
+
+	xa_lock_irqsave(xafp, iflags);
+	sg_rq_chg_state_force_ulck(srp, new_st);
+	xa_unlock_irqrestore(xafp, iflags);
+}
+
+static inline void
 sg_rq_chg_state_help(struct xarray *xafp, struct sg_request *srp, int indic)
 {
 	if (indic & 1)		/* from inactive state */
@@ -1565,13 +2001,10 @@ static int
 sg_rq_chg_state_ulck(struct sg_request *srp, enum sg_rq_state old_st,
 		     enum sg_rq_state new_st)
 {
-	enum sg_rq_state act_old_st;
-	int indic;
+	enum sg_rq_state act_old_st = (enum sg_rq_state)
+		atomic_cmpxchg_relaxed(&srp->rq_st, old_st, new_st);
+	int indic = sg_rq_state_arr[(int)old_st] +
+		    sg_rq_state_mul2arr[(int)new_st];
 
-	indic = sg_rq_state_arr[(int)old_st] +
-		sg_rq_state_mul2arr[(int)new_st];
-	act_old_st = (enum sg_rq_state)atomic_cmpxchg(&srp->rq_st, old_st,
-						      new_st);
 	if (unlikely(act_old_st != old_st)) {
 #if IS_ENABLED(SG_LOG_ACTIVE)
 		SG_LOG(1, srp->parentfp, "%s: unexpected old state: %s\n",
@@ -1579,8 +2012,19 @@ sg_rq_chg_state_ulck(struct sg_request *srp, enum sg_rq_state old_st,
 #endif
 		return -EPROTOTYPE;	/* only used for this error type */
 	}
-	if (indic)
-		sg_rq_chg_state_help(&srp->parentfp->srp_arr, srp, indic);
+	if (indic) {
+		struct sg_fd *sfp = srp->parentfp;
+
+		if (new_st == SG_RQ_INACTIVE) {
+			int prev_idx = READ_ONCE(sfp->low_used_idx);
+			struct xarray *xafp = &sfp->srp_arr;
+
+			if (prev_idx < 0 || srp->rq_idx < prev_idx ||
+			    !xa_get_mark(xafp, prev_idx, SG_XA_RQ_INACTIVE))
+				WRITE_ONCE(sfp->low_used_idx, srp->rq_idx);
+		}
+		sg_rq_chg_state_help(&sfp->srp_arr, srp, indic);
+	}
 	return 0;
 }
 
@@ -1625,47 +2069,139 @@ sg_rq_chg_state(struct sg_request *srp, enum sg_rq_state old_st,
 	return 0;
 }
 
-static void
-sg_rq_chg_state_force_ulck(struct sg_request *srp, enum sg_rq_state new_st)
+/*
+ * Returns index of an unused element in sfp's rsv_arr, or -1 if it is full.
+ * That element is set to ERR_PTR(-EBUSY) to reserve its index.
+ * sg_get_idx_new_lck() is the variant that takes sfp->srp_arr's lock.
+ */
+static int
+sg_get_idx_new(struct sg_fd *sfp)
 {
-	bool prev, want;
-	struct sg_fd *sfp = srp->parentfp;
-	struct xarray *xafp = &sfp->srp_arr;
+	int k;
+	struct sg_request **rapp = sfp->rsv_arr;
 
-	atomic_set(&srp->rq_st, new_st);
-	want = (new_st == SG_RQ_AWAIT_RCV);
-	prev = xa_get_mark(xafp, srp->rq_idx, SG_XA_RQ_AWAIT);
-	if (prev != want) {
-		if (want)
-			__xa_set_mark(xafp, srp->rq_idx, SG_XA_RQ_AWAIT);
-		else
-			__xa_clear_mark(xafp, srp->rq_idx, SG_XA_RQ_AWAIT);
+	for (k = 0; k < SG_MAX_RSV_REQS; ++k, ++rapp) {
+		if (!*rapp) {
+			*rapp = ERR_PTR(-EBUSY);
+			return k;
+		}
 	}
-	want = (new_st == SG_RQ_INACTIVE);
-	prev = xa_get_mark(xafp, srp->rq_idx, SG_XA_RQ_INACTIVE);
-	if (prev != want) {
-		if (want) {
-			int prev_idx = READ_ONCE(sfp->low_used_idx);
+	return -1;
+}
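The slot-claiming idea in sg_get_idx_new() (mark a free slot with a non-NULL placeholder so no other caller can take the same index before the real request is attached) can be sketched in user space. The kernel's ERR_PTR() is not available there, so this sketch assumes a static marker object as the placeholder; `MAX_RSV`, `claim_free_slot()` and `SLOT_CLAIMED` are hypothetical names, and the locking the driver relies on is left to the caller.

```c
#include <stddef.h>

#define MAX_RSV 4                 /* hypothetical stand-in for SG_MAX_RSV_REQS */

struct request;                   /* opaque for this sketch */

/* Unique non-NULL marker meaning "index claimed, request not yet attached",
 * playing the role of ERR_PTR(-EBUSY) in sg_get_idx_new(). */
static char claimed_marker;
#define SLOT_CLAIMED ((struct request *)&claimed_marker)

/* Claims and returns the first free slot index, or -1 if all are in use.
 * The caller must serialize access to arr[] (the driver uses xa_lock). */
static int claim_free_slot(struct request **arr)
{
	for (int k = 0; k < MAX_RSV; ++k) {
		if (!arr[k]) {
			arr[k] = SLOT_CLAIMED;
			return k;
		}
	}
	return -1;
}
```

Using a placeholder rather than leaving the slot NULL until the request exists closes the window in which two callers could both see the same slot as free.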
 
-			if (prev_idx < 0 || srp->rq_idx < prev_idx ||
-			    !xa_get_mark(xafp, prev_idx, SG_XA_RQ_INACTIVE))
-				WRITE_ONCE(sfp->low_used_idx, srp->rq_idx);
-			__xa_set_mark(xafp, srp->rq_idx, SG_XA_RQ_INACTIVE);
-		} else {
-			__xa_clear_mark(xafp, srp->rq_idx, SG_XA_RQ_INACTIVE);
+static int
+sg_get_idx_new_lck(struct sg_fd *sfp)
+{
+	int res;
+	unsigned long iflags;
+
+	xa_lock_irqsave(&sfp->srp_arr, iflags);
+	res = sg_get_idx_new(sfp);
+	xa_unlock_irqrestore(&sfp->srp_arr, iflags);
+	return res;
+}
+
+/*
+ * Looks for an index in sfp's rsv_arr whose request is not active and has
+ * no share partner (its sh_srp is NULL). That request's sh_srp is then set
+ * to ERR_PTR(-EBUSY) to claim it. Returns -1 if no element is available.
+ */
+static int
+sg_get_idx_available(struct sg_fd *sfp)
+{
+	int k;
+	struct sg_request **rapp = sfp->rsv_arr;
+	struct sg_request *srp;
+
+	for (k = 0; k < SG_MAX_RSV_REQS; ++k, ++rapp) {
+		srp = *rapp;
+		if (!IS_ERR_OR_NULL(srp)) {
+			if (!srp->sh_srp && !SG_RQ_ACTIVE(srp)) {
+				srp->sh_srp = ERR_PTR(-EBUSY);
+				return k;
+			}
 		}
 	}
+	return -1;
 }
 
-static void
-sg_rq_chg_state_force(struct sg_request *srp, enum sg_rq_state new_st)
+static struct sg_request *
+sg_get_probable_read_side(struct sg_fd *sfp)
+{
+	struct sg_request **rapp = sfp->rsv_arr;
+	struct sg_request **end_rapp = rapp + SG_MAX_RSV_REQS;
+	struct sg_request *rs_srp;
+
+	for ( ; rapp < end_rapp; ++rapp) {
+		rs_srp = *rapp;
+		if (IS_ERR_OR_NULL(rs_srp) || rs_srp->sh_srp)
+			continue;
+		switch (atomic_read(&rs_srp->rq_st)) {
+		case SG_RQ_INFLIGHT:
+		case SG_RQ_AWAIT_RCV:
+		case SG_RQ_BUSY:
+		case SG_RQ_SHR_SWAP:
+			return rs_srp;
+		default:
+			break;
+		}
+	}
+	return NULL;
+}
+
+/*
+ * Returns a string of the form <leadin>rsv<num><leadout> if srp is one of
+ * the reserve requests. Otherwise returns a string of spaces whose length is
+ * the length of <leadin> plus the length of <leadout>, truncated to fit in b.
+ */
+static const char *
+sg_get_rsv_str(struct sg_request *srp, const char *leadin,
+	       const char *leadout, int b_len, char *b)
+{
+	int k, i_len, o_len, len;
+	struct sg_fd *sfp;
+	struct sg_request **rapp;
+
+	if (!b || b_len < 1)
+		return b;
+	if (!leadin)
+		leadin = "";
+	if (!leadout)
+		leadout = "";
+	i_len = strlen(leadin);
+	o_len = strlen(leadout);
+	if (!srp)
+		goto blank;
+	sfp = srp->parentfp;
+	if (!sfp)
+		goto blank;
+	rapp = sfp->rsv_arr;
+	for (k = 0; k < SG_MAX_RSV_REQS; ++k, ++rapp) {
+		if (srp == *rapp)
+			break;
+	}
+	if (k >= SG_MAX_RSV_REQS)
+		goto blank;
+	scnprintf(b, b_len, "%srsv%d%s", leadin, k, leadout);
+	return b;
+blank:
+	len = min_t(int, i_len + o_len, b_len - 1);
+	for (k = 0; k < len; ++k)
+		b[k] = ' ';
+	b[len] = '\0';
+	return b;
+}
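The formatting contract of sg_get_rsv_str() can be shown with a simplified user-space version. In this sketch the lookup of srp in rsv_arr is replaced by a hypothetical `idx` argument (a reserve slot index, or negative when srp is not a reserve request), and snprintf() stands in for the kernel's scnprintf(). The blank case keeps log columns aligned whether or not a request is a reserve one.

```c
#include <stdio.h>
#include <string.h>

/* Simplified rendition of sg_get_rsv_str(): idx >= 0 means the request is
 * reserve slot idx (hypothetical parameter); idx < 0 means it is not. */
static const char *rsv_str(int idx, const char *leadin, const char *leadout,
			   int b_len, char *b)
{
	int len;

	if (!b || b_len < 1)
		return b;
	if (!leadin)
		leadin = "";
	if (!leadout)
		leadout = "";
	if (idx >= 0) {
		snprintf(b, (size_t)b_len, "%srsv%d%s", leadin, idx, leadout);
		return b;
	}
	/* blank padding as wide as leadin + leadout, clipped to the buffer */
	len = (int)(strlen(leadin) + strlen(leadout));
	if (len > b_len - 1)
		len = b_len - 1;
	memset(b, ' ', (size_t)len);
	b[len] = '\0';
	return b;
}
```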
+
+static inline const char *
+sg_get_rsv_str_lck(struct sg_request *srp, const char *leadin,
+		   const char *leadout, int b_len, char *b)
 {
 	unsigned long iflags;
-	struct xarray *xafp = &srp->parentfp->srp_arr;
+	const char *cp;
 
-	xa_lock_irqsave(xafp, iflags);
-	sg_rq_chg_state_force_ulck(srp, new_st);
-	xa_unlock_irqrestore(xafp, iflags);
+	xa_lock_irqsave(&srp->parentfp->srp_arr, iflags);
+	cp = sg_get_rsv_str(srp, leadin, leadout, b_len, b);
+	xa_unlock_irqrestore(&srp->parentfp->srp_arr, iflags);
+	return cp;
 }
 
 static void
@@ -1691,9 +2227,8 @@ sg_execute_cmd(struct sg_fd *sfp, struct sg_request *srp)
 	else            /* this sfd is defaulting to head */
 		at_head = !(srp->rq_flags & SG_FLAG_Q_AT_TAIL);
 
-	kref_get(&sfp->f_ref); /* sg_rq_end_io() does kref_put(). */
+	kref_get(&sfp->f_ref); /* usually put in sg_rq_end_io() */
 	sg_rq_chg_state_force(srp, SG_RQ_INFLIGHT);
-
 	/* >>>>>>> send cmd/req off to other levels <<<<<<<< */
 	if (!sync) {
 		atomic_inc(&sfp->submitted);
@@ -1761,7 +2296,7 @@ sg_common_write(struct sg_comm_wr_t *cwrp)
 	} else {
 		sh_var = SG_SHR_NONE;
 		if (unlikely(rq_flags & SGV4_FLAG_SHARE))
-			return ERR_PTR(-ENOMSG);
+			return ERR_PTR(-ENOMSG);    /* no file share found */
 	}
 	if (unlikely(dxfr_len >= SZ_256M))
 		return ERR_PTR(-EINVAL);
@@ -1779,6 +2314,7 @@ sg_common_write(struct sg_comm_wr_t *cwrp)
 		srp->s_hdr4.cmd_len = h4p->request_len;
 		srp->s_hdr4.dir = dir;
 		srp->s_hdr4.out_resid = 0;
+		srp->s_hdr4.mrq_ind = 0;
 	} else {	/* v3 interface active */
 		memcpy(&srp->s_hdr3, hi_p, sizeof(srp->s_hdr3));
 	}
@@ -1873,7 +2409,6 @@ sg_rec_state_v3v4(struct sg_fd *sfp, struct sg_request *srp, bool v4_active)
 	int err = 0;
 	u32 rq_res = srp->rq_result;
 	enum sg_shr_var sh_var = srp->sh_var;
-	struct sg_fd *sh_sfp;
 
 	if (unlikely(srp->rq_result & 0xff)) {
 		int sb_len_wr = sg_copy_sense(srp, v4_active);
@@ -1886,30 +2421,40 @@ sg_rec_state_v3v4(struct sg_fd *sfp, struct sg_request *srp, bool v4_active)
 	if (unlikely(test_bit(SG_FRQ_ABORTING, srp->frq_bm)))
 		srp->rq_info |= SG_INFO_ABORTED;
 
-	sh_sfp = sg_fd_share_ptr(sfp);
 	if (sh_var == SG_SHR_WS_RQ && sg_fd_is_shared(sfp)) {
-		struct sg_request *rs_srp = sh_sfp->rsv_srp;
-		enum sg_rq_state mar_st = atomic_read(&rs_srp->rq_st);
+		enum sg_rq_state rs_st;
+		struct sg_request *rs_srp = srp->sh_srp;
+
+		if (!rs_srp)
+			return -EPROTO;
+		rs_st = atomic_read(&rs_srp->rq_st);
 
-		switch (mar_st) {
+		switch (rs_st) {
 		case SG_RQ_SHR_SWAP:
 		case SG_RQ_SHR_IN_WS:
 			/* make read-side request available for re-use */
 			rs_srp->tag = SG_TAG_WILDCARD;
 			rs_srp->sh_var = SG_SHR_NONE;
 			sg_rq_chg_state_force(rs_srp, SG_RQ_INACTIVE);
-			atomic_inc(&sh_sfp->inactives);
+			atomic_inc(&rs_srp->parentfp->inactives);
+			rs_srp->frq_bm[0] = 0;
+			__set_bit(SG_FRQ_RESERVED, rs_srp->frq_bm);
+			rs_srp->in_resid = 0;
+			rs_srp->rq_info = 0;
+			rs_srp->sense_len = 0;
+			rs_srp->sh_srp = NULL;
 			break;
 		case SG_RQ_AWAIT_RCV:
 			break;
 		case SG_RQ_INACTIVE:
-			sh_sfp->ws_srp = NULL;
-			break;	/* nothing to do */
+			/* remove request share mapping */
+			rs_srp->sh_srp = NULL;
+			break;
 		default:
 			err = -EPROTO;	/* Logic error */
 			SG_LOG(1, sfp,
 			       "%s: SHR_WS_RQ, bad read-side state: %s\n",
-			       __func__, sg_rq_st_str(mar_st, true));
+			       __func__, sg_rq_st_str(rs_st, true));
 			break;	/* nothing to do */
 		}
 	}
@@ -1924,6 +2469,8 @@ sg_complete_v3v4(struct sg_fd *sfp, struct sg_request *srp, bool other_err)
 	enum sg_rq_state sr_st = atomic_read(&srp->rq_st);
 
 	/* advance state machine, send signal to write-side if appropriate */
+	SG_LOG(4, sfp, "%s: %pK: sh_var=%s\n", __func__, srp,
+	       sg_shr_str(srp->sh_var, true));
 	switch (srp->sh_var) {
 	case SG_SHR_RS_RQ:
 		{
@@ -1939,29 +2486,32 @@ sg_complete_v3v4(struct sg_fd *sfp, struct sg_request *srp, bool other_err)
 			} else if (sr_st != SG_RQ_SHR_SWAP) {
 				sg_rq_chg_state_force(srp, SG_RQ_SHR_SWAP);
 			}
-			if (ws_sfp && ws_sfp->async_qp &&
+			if (ws_sfp && ws_sfp->async_qp && !srp->sh_srp &&
 			    (!test_bit(SG_FRQ_IS_V4I, srp->frq_bm) ||
 			     (srp->rq_flags & SGV4_FLAG_SIGNAL)))
 				kill_fasync(&ws_sfp->async_qp, SIGPOLL,
 					    poll_type);
 		}
 		break;
-	case SG_SHR_WS_RQ:      /* cleanup both on write-side completion */
-		{
-			struct sg_fd *rs_sfp = sg_fd_share_ptr(sfp);
+	case SG_SHR_WS_RQ:	/* cleanup both on write-side completion */
+		if (likely(sg_fd_is_shared(sfp))) {
+			struct sg_request *rs_srp = srp->sh_srp;
 
-			if (likely(rs_sfp)) {
-				rs_sfp->ws_srp = NULL;
-				if (rs_sfp->rsv_srp)
-					rs_sfp->rsv_srp->sh_var =
-							SG_SHR_RS_NOT_SRQ;
+			if (rs_srp) {
+				rs_srp->sh_srp = NULL;
+				rs_srp->sh_var = SG_SHR_RS_NOT_SRQ;
+			} else {
+				SG_LOG(2, sfp, "%s: write-side's paired read is missing\n",
+				       __func__);
 			}
 		}
 		srp->sh_var = SG_SHR_WS_NOT_SRQ;
+		srp->sh_srp = NULL;
 		srp->sgatp = &srp->sgat_h;
 		if (sr_st != SG_RQ_BUSY)
 			sg_rq_chg_state_force(srp, SG_RQ_BUSY);
 		break;
+	case SG_SHR_WS_NOT_SRQ:
 	default:
 		if (sr_st != SG_RQ_BUSY)
 			sg_rq_chg_state_force(srp, SG_RQ_BUSY);
@@ -2017,6 +2567,7 @@ sg_receive_v4(struct sg_fd *sfp, struct sg_request *srp, void __user *p,
 }
 
 /*
+ * Invoked when user calls ioctl(SG_IORECEIVE, SGV4_FLAG_MULTIPLE_REQS).
  * Returns negative on error including -ENODATA if there are no mrqs submitted
  * nor waiting. Otherwise it returns the number of elements written to
  * rsp_arr, which may be 0 if mrqs submitted but none waiting
@@ -2059,7 +2610,7 @@ sg_mrq_iorec_complets(struct sg_fd *sfp, bool non_block, int max_mrqs,
 
 /*
  * Invoked when user calls ioctl(SG_IORECEIVE, SGV4_FLAG_MULTIPLE_REQS).
- * Expected race as multiple concurrent calls with the same pack_id/tag can
+ * Expected race as many concurrent calls with the same pack_id/tag can
  * occur. Only one should succeed per request (more may succeed but will get
  * different requests).
  */
@@ -2541,7 +3092,7 @@ sg_change_after_read_side_rq(struct sg_fd *sfp, bool fini1_again0)
 		goto fini;
 	if (xa_get_mark(&sdp->sfp_arr, sfp->idx, SG_XA_FD_RS_SHARE))
 		rs_sfp = sfp;
-	rs_rsv_srp = sfp->rsv_srp;
+	rs_rsv_srp = rs_sfp->rsv_arr[0];
 	if (IS_ERR_OR_NULL(rs_rsv_srp))
 		goto fini;
 
@@ -2592,18 +3143,27 @@ sg_change_after_read_side_rq(struct sg_fd *sfp, bool fini1_again0)
 static void
 sg_unshare_rs_fd(struct sg_fd *rs_sfp, bool lck)
 {
+	int k;
 	unsigned long iflags = 0;
 	struct sg_device *sdp = rs_sfp->parentdp;
+	struct sg_request **rapp = rs_sfp->rsv_arr;
 	struct xarray *xadp = &sdp->sfp_arr;
+	struct sg_request *r_srp;
 
-	rcu_assign_pointer(rs_sfp->share_sfp, NULL);
 	if (lck)
-		xa_lock_irqsave(xadp, iflags);
-	rs_sfp->ws_srp = NULL;
+		xa_lock_irqsave_nested(xadp, iflags, 1);
+	__clear_bit(SG_FFD_RESHARE, rs_sfp->ffd_bm);
+	for (k = 0; k < SG_MAX_RSV_REQS; ++k, ++rapp) {
+		r_srp = *rapp;
+		if (IS_ERR_OR_NULL(r_srp))
+			continue;
+		r_srp->sh_srp = NULL;
+	}
 	__xa_set_mark(xadp, rs_sfp->idx, SG_XA_FD_UNSHARED);
 	__xa_clear_mark(xadp, rs_sfp->idx, SG_XA_FD_RS_SHARE);
 	if (lck)
 		xa_unlock_irqrestore(xadp, iflags);
+	rcu_assign_pointer(rs_sfp->share_sfp, NULL);
 	kref_put(&rs_sfp->f_ref, sg_remove_sfp);/* get: sg_find_sfp_by_fd() */
 }
 
@@ -2614,13 +3174,13 @@ sg_unshare_ws_fd(struct sg_fd *ws_sfp, bool lck)
 	struct sg_device *sdp = ws_sfp->parentdp;
 	struct xarray *xadp = &sdp->sfp_arr;
 
-	rcu_assign_pointer(ws_sfp->share_sfp, NULL);
 	if (lck)
-		xa_lock_irqsave(xadp, iflags);
+		xa_lock_irqsave_nested(xadp, iflags, 1);
 	__xa_set_mark(xadp, ws_sfp->idx, SG_XA_FD_UNSHARED);
 	/* SG_XA_FD_RS_SHARE mark should be already clear */
 	if (lck)
 		xa_unlock_irqrestore(xadp, iflags);
+	rcu_assign_pointer(ws_sfp->share_sfp, NULL);
 	kref_put(&ws_sfp->f_ref, sg_remove_sfp);/* get: sg_find_sfp_by_fd() */
 }
 
@@ -2633,74 +3193,95 @@ sg_unshare_ws_fd(struct sg_fd *ws_sfp, bool lck)
  */
 static void
 sg_remove_sfp_share(struct sg_fd *sfp, bool is_rd_side)
+		__must_hold(sfp->parentdp->open_rel_lock)
 {
 	__maybe_unused int res = 0;
+	int k, retry_count;
 	unsigned long iflags;
 	enum sg_rq_state sr_st;
+	struct sg_request **rapp;
 	struct sg_device *sdp = sfp->parentdp;
 	struct sg_device *sh_sdp;
 	struct sg_fd *sh_sfp;
 	struct sg_request *rsv_srp = NULL;
 	struct sg_request *ws_srp;
 	struct xarray *xadp = &sdp->sfp_arr;
+	struct xarray *xafp = &sfp->srp_arr;
 
 	SG_LOG(3, sfp, "%s: sfp=%pK %s\n", __func__, sfp,
 	       (is_rd_side ? "read-side" : "write-side"));
 	xa_lock_irqsave(xadp, iflags);
+	retry_count = 0;
+try_again:
+	if (is_rd_side && !xa_get_mark(xadp, sfp->idx, SG_XA_FD_RS_SHARE))
+		goto fini;
 	sh_sfp = sg_fd_share_ptr(sfp);
-	if (!sg_fd_is_shared(sfp))
-		goto err_out;
+	if (unlikely(!sh_sfp))
+		goto fini;
 	sh_sdp = sh_sfp->parentdp;
-	if (is_rd_side) {
-		bool set_inactive = false;
-
-		if (unlikely(!xa_get_mark(xadp, sfp->idx,
-					  SG_XA_FD_RS_SHARE))) {
-			xa_unlock_irqrestore(xadp, iflags);
+	if (!xa_trylock(xafp)) {
+		/*
+		 * The other side of the share might also be closing; back off
+		 * to avoid deadlock. This should clear relatively quickly.
+		 */
+		xa_unlock_irqrestore(xadp, iflags);
+		if (++retry_count > SG_ADD_RQ_MAX_RETRIES) {
+			SG_LOG(1, sfp, "%s: retry_count>>\n", __func__);
 			return;
 		}
-		rsv_srp = sfp->rsv_srp;
-		if (unlikely(!rsv_srp))
-			goto fini;
-		if (unlikely(rsv_srp->sh_var != SG_SHR_RS_RQ))
-			goto fini;
-		sr_st = atomic_read(&rsv_srp->rq_st);
-		switch (sr_st) {
-		case SG_RQ_SHR_SWAP:
-			set_inactive = true;
-			break;
-		case SG_RQ_SHR_IN_WS:
-			ws_srp = sfp->ws_srp;
-			if (ws_srp && !IS_ERR(ws_srp)) {
-				ws_srp->sh_var = SG_SHR_WS_NOT_SRQ;
-				sfp->ws_srp = NULL;
+		mutex_unlock(&sdp->open_rel_lock);
+		cpu_relax();
+		mutex_lock(&sdp->open_rel_lock);
+		xa_lock_irqsave(xadp, iflags);
+		goto try_again;
+	}
+	/* have acquired xafp lock */
+	if (is_rd_side) {
+		rapp = sfp->rsv_arr;
+		for (k = 0; k < SG_MAX_RSV_REQS; ++k, ++rapp) {
+			bool set_inactive = false;
+
+			rsv_srp = *rapp;
+			if (IS_ERR_OR_NULL(rsv_srp) ||
+			    rsv_srp->sh_var != SG_SHR_RS_RQ)
+				continue;
+			sr_st = atomic_read(&rsv_srp->rq_st);
+			switch (sr_st) {
+			case SG_RQ_SHR_SWAP:
+				set_inactive = true;
+				break;
+			case SG_RQ_SHR_IN_WS:
+				ws_srp = rsv_srp->sh_srp;
+				if (!IS_ERR_OR_NULL(ws_srp) &&
+				    !test_bit(SG_FFD_RELEASE,
+					      sh_sfp->ffd_bm)) {
+					ws_srp->sh_var = SG_SHR_WS_NOT_SRQ;
+				}
+				rsv_srp->sh_srp = NULL;
+				set_inactive = true;
+				break;
+			default:
+				break;
+			}
+			rsv_srp->sh_var = SG_SHR_NONE;
+			if (set_inactive) {
+				res = sg_rq_chg_state_ulck(rsv_srp, sr_st, SG_RQ_INACTIVE);
+				if (!res)
+					atomic_inc(&sfp->inactives);
 			}
-			set_inactive = true;
-			break;
-		default:
-			break;
-		}
-		rsv_srp->sh_var = SG_SHR_NONE;
-		if (set_inactive) {
-			res = sg_rq_chg_state_ulck(rsv_srp, sr_st, SG_RQ_INACTIVE);
-			if (!res)
-				atomic_inc(&sfp->inactives);
 		}
-fini:
 		if (!xa_get_mark(&sh_sdp->sfp_arr, sh_sfp->idx,
 				 SG_XA_FD_FREE) && sg_fd_is_shared(sh_sfp))
 			sg_unshare_ws_fd(sh_sfp, sdp != sh_sdp);
 		sg_unshare_rs_fd(sfp, false);
-	} else {
-		if (unlikely(!sg_fd_is_shared(sfp))) {
-			xa_unlock_irqrestore(xadp, iflags);
-			return;
-		} else if (!xa_get_mark(&sh_sdp->sfp_arr, sh_sfp->idx,
-					SG_XA_FD_FREE))
+	} else {			/* is write-side of share */
+		if (!xa_get_mark(&sh_sdp->sfp_arr, sh_sfp->idx,
+				 SG_XA_FD_FREE) && sg_fd_is_shared(sh_sfp))
 			sg_unshare_rs_fd(sh_sfp, sdp != sh_sdp);
 		sg_unshare_ws_fd(sfp, false);
 	}
-err_out:
+	xa_unlock(xafp);
+fini:
 	xa_unlock_irqrestore(xadp, iflags);
 }
 
@@ -2713,27 +3294,31 @@ static void
 sg_do_unshare(struct sg_fd *sfp, bool unshare_val)
 		__must_hold(sfp->f_mutex)
 {
-	bool retry;
+	bool retry, same_sdp_s;
 	int retry_count = 0;
+	unsigned long iflags;
 	struct sg_request *rs_rsv_srp;
 	struct sg_fd *rs_sfp;
 	struct sg_fd *ws_sfp;
 	struct sg_fd *o_sfp = sg_fd_share_ptr(sfp);
 	struct sg_device *sdp = sfp->parentdp;
+	struct xarray *xadp = &sdp->sfp_arr;
 
-	if (!sg_fd_is_shared(sfp)) {
+	if (unlikely(!o_sfp)) {
 		SG_LOG(1, sfp, "%s: not shared ? ?\n", __func__);
 		return;	/* no share to undo */
 	}
 	if (!unshare_val)
 		return;		/* when unshare value is zero, it's a NOP */
+	same_sdp_s = (o_sfp && sfp->parentdp == o_sfp->parentdp);
 again:
 	retry = false;
 	if (xa_get_mark(&sdp->sfp_arr, sfp->idx, SG_XA_FD_RS_SHARE)) {
 		rs_sfp = sfp;
 		ws_sfp = o_sfp;
-		rs_rsv_srp = rs_sfp->rsv_srp;
-		if (rs_rsv_srp && rs_rsv_srp->sh_var != SG_SHR_RS_RQ) {
+		rs_rsv_srp = rs_sfp->rsv_arr[0];
+		if (!IS_ERR_OR_NULL(rs_rsv_srp) &&
+		    rs_rsv_srp->sh_var != SG_SHR_RS_RQ) {
 			if (unlikely(!mutex_trylock(&ws_sfp->f_mutex))) {
 				if (++retry_count > SG_ADD_RQ_MAX_RETRIES)
 					SG_LOG(1, sfp,
@@ -2743,7 +3328,16 @@ sg_do_unshare(struct sg_fd *sfp, bool unshare_val)
 					retry = true;
 				goto fini;
 			}
-			sg_unshare_rs_fd(rs_sfp, true);
+			if (same_sdp_s) {
+				xa_lock_irqsave(xadp, iflags);
+				/* write-side is 'other' so do first */
+				sg_unshare_ws_fd(ws_sfp, false);
+				sg_unshare_rs_fd(rs_sfp, false);
+				xa_unlock_irqrestore(xadp, iflags);
+			} else {
+				sg_unshare_ws_fd(ws_sfp, true);
+				sg_unshare_rs_fd(rs_sfp, true);
+			}
 			mutex_unlock(&ws_sfp->f_mutex);
 		}
 	} else {			/* called on write-side fd */
@@ -2757,10 +3351,19 @@ sg_do_unshare(struct sg_fd *sfp, bool unshare_val)
 				retry = true;
 			goto fini;
 		}
-		rs_rsv_srp = rs_sfp->rsv_srp;
-		if (rs_rsv_srp->sh_var != SG_SHR_RS_RQ) {
-			sg_unshare_rs_fd(rs_sfp, true);
-			sg_unshare_ws_fd(ws_sfp, true);
+		rs_rsv_srp = rs_sfp->rsv_arr[0];
+		if (!IS_ERR_OR_NULL(rs_rsv_srp) &&
+		    rs_rsv_srp->sh_var != SG_SHR_RS_RQ) {
+			if (same_sdp_s) {
+				xa_lock_irqsave(xadp, iflags);
+				/* read-side is 'other' so do first */
+				sg_unshare_rs_fd(rs_sfp, false);
+				sg_unshare_ws_fd(ws_sfp, false);
+				xa_unlock_irqrestore(xadp, iflags);
+			} else {
+				sg_unshare_rs_fd(rs_sfp, true);
+				sg_unshare_ws_fd(ws_sfp, true);
+			}
 		}
 		mutex_unlock(&rs_sfp->f_mutex);
 	}
@@ -2970,6 +3573,16 @@ sg_ctl_sg_io(struct sg_device *sdp, struct sg_fd *sfp, void __user *p)
 	return res;
 }
 
+static inline int
+sg_num_waiting_maybe_acquire(struct sg_fd *sfp)
+{
+	int num = atomic_read(&sfp->waiting);
+
+	if (num < 1)
+		num = atomic_read_acquire(&sfp->waiting);
+	return num;
+}
+
 /*
  * When use_tag is true then id is a tag, else it is a pack_id. Returns
  * valid srp if match, else returns NULL.
@@ -2977,15 +3590,11 @@ sg_ctl_sg_io(struct sg_device *sdp, struct sg_fd *sfp, void __user *p)
 static struct sg_request *
 sg_match_request(struct sg_fd *sfp, bool use_tag, int id)
 {
-	int num_waiting = atomic_read(&sfp->waiting);
 	unsigned long idx;
 	struct sg_request *srp;
 
-	if (num_waiting < 1) {
-		num_waiting = atomic_read_acquire(&sfp->waiting);
-		if (num_waiting < 1)
-			return NULL;
-	}
+	if (sg_num_waiting_maybe_acquire(sfp) < 1)
+		return NULL;
 	if (id == SG_PACK_ID_WILDCARD) {
 		xa_for_each_marked(&sfp->srp_arr, idx, srp, SG_XA_RQ_AWAIT)
 			return srp;
@@ -3019,14 +3628,10 @@ sg_match_first_mrq_after(struct sg_fd *sfp, int pack_id,
 	unsigned long idx;
 	struct sg_request *srp;
 
-	if (atomic_read(&sfp->waiting) < 1) {
-		if (atomic_read_acquire(&sfp->waiting) < 1)
-			return NULL;
-	}
+	if (sg_num_waiting_maybe_acquire(sfp) < 1)
+		return NULL;
 once_more:
 	xa_for_each_marked(&sfp->srp_arr, idx, srp, SG_XA_RQ_AWAIT) {
-		if (unlikely(!srp))
-			continue;
 		if (look_for_after) {
 			if (after_rp == srp)
 				look_for_after = false;
@@ -3095,16 +3700,15 @@ sg_abort_req(struct sg_fd *sfp, struct sg_request *srp)
 	return res;
 }
 
+/* Holding xa_lock_irq(&sfp->srp_arr) */
 static int
 sg_mrq_abort_inflight(struct sg_fd *sfp, int pack_id)
 {
 	bool got_ebusy = false;
 	int res = 0;
-	unsigned long iflags;
 	struct sg_request *srp;
 	struct sg_request *prev_srp;
 
-	xa_lock_irqsave(&sfp->srp_arr, iflags);
 	for (prev_srp = NULL; true; prev_srp = srp) {
 		srp = sg_match_first_mrq_after(sfp, pack_id, prev_srp);
 		if (!srp)
@@ -3115,7 +3719,6 @@ sg_mrq_abort_inflight(struct sg_fd *sfp, int pack_id)
 		else if (res)
 			break;
 	}
-	xa_unlock_irqrestore(&sfp->srp_arr, iflags);
 	if (res)
 		return res;
 	return got_ebusy ? -EBUSY : 0;
@@ -3135,7 +3738,7 @@ sg_mrq_abort(struct sg_fd *sfp, int pack_id, bool dev_scope)
 {
 	int existing_id;
 	int res = 0;
-	unsigned long idx;
+	unsigned long idx, iflags;
 	struct sg_device *sdp;
 	struct sg_fd *o_sfp;
 	struct sg_fd *s_sfp;
@@ -3167,7 +3770,7 @@ sg_mrq_abort(struct sg_fd *sfp, int pack_id, bool dev_scope)
 		       __func__, pack_id);
 
 	/* now look for inflight requests matching that mrq pack_id */
-	xa_lock(&sfp->srp_arr);
+	xa_lock_irqsave(&sfp->srp_arr, iflags);
 	res = sg_mrq_abort_inflight(sfp, pack_id);
 	if (res == -EBUSY) {
 		res = sg_mrq_abort_inflight(sfp, pack_id);
@@ -3175,11 +3778,11 @@ sg_mrq_abort(struct sg_fd *sfp, int pack_id, bool dev_scope)
 			goto fini;
 	}
 	s_sfp = sg_fd_share_ptr(sfp);
-	if (s_sfp) {	/* SGV4_FLAG_DO_ON_OTHER may have been used */
-		xa_unlock(&sfp->srp_arr);
-		sfp = s_sfp;	/* if share, check other fd */
-		xa_lock(&sfp->srp_arr);
-		if (sg_fd_is_shared(sfp))
+	if (s_sfp) {	/* SGV4_FLAG_DO_ON_OTHER possible */
+		xa_unlock_irqrestore(&sfp->srp_arr, iflags);
+		sfp = s_sfp;	/* if share, switch to other fd */
+		xa_lock_irqsave(&sfp->srp_arr, iflags);
+		if (!sg_fd_is_shared(sfp))
 			goto fini;
 		/* tough luck if other fd used same mrq pack_id */
 		res = sg_mrq_abort_inflight(sfp, pack_id);
@@ -3187,7 +3790,7 @@ sg_mrq_abort(struct sg_fd *sfp, int pack_id, bool dev_scope)
 			res = sg_mrq_abort_inflight(sfp, pack_id);
 	}
 fini:
-	xa_unlock(&sfp->srp_arr);
+	xa_unlock_irqrestore(&sfp->srp_arr, iflags);
 	return res;
 
 check_whole_dev:
@@ -3196,10 +3799,10 @@ sg_mrq_abort(struct sg_fd *sfp, int pack_id, bool dev_scope)
 	xa_for_each(&sdp->sfp_arr, idx, o_sfp) {
 		if (o_sfp == sfp)
 			continue;       /* already checked */
-		xa_lock(&o_sfp->srp_arr);
+		mutex_lock(&o_sfp->f_mutex);
 		/* recurse, dev_scope==false is stopping condition */
 		res = sg_mrq_abort(o_sfp, pack_id, false);
-		xa_unlock(&o_sfp->srp_arr);
+		mutex_unlock(&o_sfp->f_mutex);
 		if (res == 0)
 			break;
 	}
@@ -3235,12 +3838,13 @@ sg_ctl_abort(struct sg_device *sdp, struct sg_fd *sfp, void __user *p)
 	if (h4p->flags & SGV4_FLAG_MULTIPLE_REQS) {
 		if (pack_id == 0)
 			return -ENOSTR;
-		return sg_mrq_abort(sfp, pack_id, dev_scope);
+		res = sg_mrq_abort(sfp, pack_id, dev_scope);
+		return res;
 	}
+	xa_lock_irqsave(&sfp->srp_arr, iflags);
 	use_tag = test_bit(SG_FFD_PREFER_TAG, sfp->ffd_bm);
 	id = use_tag ? (int)h4p->request_tag : pack_id;
 
-	xa_lock_irqsave(&sfp->srp_arr, iflags);
 	srp = sg_match_request(sfp, use_tag, id);
 	if (!srp) {	/* assume device (not just fd) scope */
 		xa_unlock_irqrestore(&sfp->srp_arr, iflags);
@@ -3311,7 +3915,7 @@ sg_find_sfp_by_fd(const struct file *search_for, struct sg_fd *from_sfp,
 		__xa_set_mark(&from_sdp->sfp_arr, from_sfp->idx,
 			      SG_XA_FD_RS_SHARE);
 	else
-		kref_get(&from_sfp->f_ref);/* so unshare done before release */
+		kref_get(&from_sfp->f_ref);  /* undone: sg_unshare_*_fd() */
 	if (from_sdp != sdp) {
 		xa_unlock_irqrestore(&from_sdp->sfp_arr, iflags);
 		xa_lock_irqsave(&sdp->sfp_arr, iflags);
@@ -3338,7 +3942,6 @@ sg_fd_share(struct sg_fd *ws_sfp, int m_fd)
 {
 	bool found = false;
 	int res = 0;
-	int retry_count = 0;
 	struct file *filp;
 	struct sg_fd *rs_sfp;
 
@@ -3360,22 +3963,9 @@ sg_fd_share(struct sg_fd *ws_sfp, int m_fd)
 	}
 	SG_LOG(6, ws_sfp, "%s: read-side fd okay, scan for filp=0x%pK\n",
 	       __func__, filp);
-again:
 	rs_sfp = sg_find_sfp_by_fd(filp, ws_sfp, false);
-	if (IS_ERR(rs_sfp)) {
-		res = PTR_ERR(rs_sfp);
-		if (res == -EPROBE_DEFER) {
-			if (unlikely(++retry_count > SG_ADD_RQ_MAX_RETRIES)) {
-				res = -EBUSY;
-			} else {
-				res = 0;
-				cpu_relax();
-				goto again;
-			}
-		}
-	} else {
+	if (!IS_ERR(rs_sfp))
 		found = !!rs_sfp;
-	}
 fini:
 	/* paired with filp=fget(m_fd) above */
 	fput(filp);
@@ -3395,8 +3985,6 @@ sg_fd_reshare(struct sg_fd *rs_sfp, int new_ws_fd)
 {
 	bool found = false;
 	int res = 0;
-	int retry_count = 0;
-	enum sg_rq_state rq_st;
 	struct file *filp;
 	struct sg_fd *ws_sfp = sg_fd_share_ptr(rs_sfp);
 
@@ -3408,17 +3996,7 @@ sg_fd_reshare(struct sg_fd *rs_sfp, int new_ws_fd)
 	if (unlikely(!xa_get_mark(&rs_sfp->parentdp->sfp_arr, rs_sfp->idx,
 				  SG_XA_FD_RS_SHARE)))
 		return -EINVAL;
-	if (unlikely(!ws_sfp))
-		return -EINVAL;
-	if (unlikely(!rs_sfp->rsv_srp))
-		res = -EPROTO;	/* Internal error */
-	rq_st = atomic_read(&rs_sfp->rsv_srp->rq_st);
-	if (!(rq_st == SG_RQ_INACTIVE || rq_st == SG_RQ_SHR_SWAP))
-		res = -EBUSY;		/* read-side reserve buffer busy */
-	if (rs_sfp->ws_srp)
-		res = -EBUSY;	/* previous write-side request not finished */
-	if (unlikely(res))
-		return res;
+	/* SG_XA_FD_RS_SHARE set implies ws_sfp is valid */
 
 	/* Alternate approach: fcheck_files(current->files, m_fd) */
 	filp = fget(new_ws_fd);
@@ -3430,28 +4008,22 @@ sg_fd_reshare(struct sg_fd *rs_sfp, int new_ws_fd)
 	}
 	SG_LOG(6, ws_sfp, "%s: write-side fd ok, scan for filp=0x%pK\n", __func__,
 	       filp);
-	sg_unshare_ws_fd(ws_sfp, false);
-again:
+	sg_unshare_ws_fd(ws_sfp, true);
 	ws_sfp = sg_find_sfp_by_fd(filp, rs_sfp, true);
-	if (IS_ERR(ws_sfp)) {
-		res = PTR_ERR(ws_sfp);
-		if (res == -EPROBE_DEFER) {
-			if (unlikely(++retry_count > SG_ADD_RQ_MAX_RETRIES)) {
-				res = -EBUSY;
-			} else {
-				res = 0;
-				cpu_relax();
-				goto again;
-			}
-		}
-	} else {
+	if (!IS_ERR(ws_sfp))
 		found = !!ws_sfp;
-	}
 fini:
 	/* paired with filp=fget(new_ws_fd) above */
 	fput(filp);
 	if (unlikely(res))
 		return res;
+	if (found) {	/* can only reshare rsv_arr[0] */
+		struct sg_request *rs_srp = rs_sfp->rsv_arr[0];
+
+		if (!IS_ERR_OR_NULL(rs_srp))
+			rs_srp->sh_srp = NULL;
+		set_bit(SG_FFD_RESHARE, rs_sfp->ffd_bm);
+	}
 	return found ? 0 : -ENOTSOCK; /* ENOTSOCK for fd exists but not sg */
 }
 
@@ -3469,76 +4041,92 @@ sg_set_reserved_sz(struct sg_fd *sfp, int want_rsv_sz)
 {
 	bool use_new_srp = false;
 	int res = 0;
-	int new_sz, blen;
-	unsigned long idx, iflags;
+	int k, new_sz, blen;
+	unsigned long idx = 0;
+	unsigned long iflags;
 	struct sg_request *o_srp;       /* prior reserve sg_request */
 	struct sg_request *n_srp;       /* new sg_request, may be used */
 	struct sg_request *t_srp;       /* other fl entries */
 	struct sg_device *sdp = sfp->parentdp;
+	struct sg_request **rapp = &sfp->rsv_arr[SG_MAX_RSV_REQS - 1];
 	struct xarray *xafp = &sfp->srp_arr;
 
 	if (unlikely(sg_fd_is_shared(sfp)))
 		return -EBUSY;	/* this fd can't be either side of share */
-	o_srp = sfp->rsv_srp;
-	if (unlikely(!o_srp))
-		return -EPROTO;
 	new_sz = min_t(int, want_rsv_sz, sdp->max_sgat_sz);
 	new_sz = max_t(int, new_sz, sfp->sgat_elem_sz);
-	blen = o_srp->sgatp->buflen;
 	SG_LOG(3, sfp, "%s: was=%d, ask=%d, new=%d (sgat_elem_sz=%d)\n",
-	       __func__, blen, want_rsv_sz, new_sz, sfp->sgat_elem_sz);
-	if (blen == new_sz)
-		return 0;
-	n_srp = sg_mk_srp_sgat(sfp, true /* can take time */, new_sz);
-	if (IS_ERR(n_srp))
-		return PTR_ERR(n_srp);
-	/* new sg_request object, sized correctly is now available */
+	       __func__, *rapp ? (*rapp)->sgatp->buflen : -1,
+	       want_rsv_sz, new_sz, sfp->sgat_elem_sz);
+	if (unlikely(sfp->mmap_sz > 0))
+		return -EBUSY;	/* existing pages possibly pinned */
+
+	for (k = SG_MAX_RSV_REQS - 1; k >= 0; --k, --rapp) {
+		o_srp = *rapp;
+		if (IS_ERR_OR_NULL(o_srp))
+			continue;
+		blen = o_srp->sgatp->buflen;
+		if (blen == new_sz)
+			continue;
+		/* new sg_request object, sized correctly is now available */
+		n_srp = sg_mk_srp_sgat(sfp, true /* can take time */, new_sz);
+		if (IS_ERR(n_srp))
+			return PTR_ERR(n_srp);
 try_again:
-	o_srp = sfp->rsv_srp;
-	if (unlikely(!o_srp)) {
-		res = -EPROTO;
-		goto fini;
-	}
-	if (unlikely(SG_RQ_ACTIVE(o_srp) || sfp->mmap_sz > 0)) {
-		res = -EBUSY;
-		goto fini;
-	}
-	use_new_srp = true;
-	xa_for_each(xafp, idx, t_srp) {
-		if (t_srp != o_srp && new_sz <= t_srp->sgatp->buflen &&
-		    !SG_RQ_ACTIVE(t_srp)) {
-			use_new_srp = false;
-			sfp->rsv_srp = t_srp;
-			break;
+		o_srp = *rapp;
+		if (unlikely(SG_RQ_ACTIVE(o_srp))) {
+			res = -EBUSY;
+			goto fini;
 		}
-	}
-	if (use_new_srp) {
-		struct sg_request *cxc_srp;
+		use_new_srp = true;
+		xa_for_each_marked(xafp, idx, t_srp, SG_XA_RQ_INACTIVE) {
+			if (t_srp != o_srp && new_sz <= t_srp->sgatp->buflen) {
+				use_new_srp = false;
+				xa_lock_irqsave(xafp, iflags);
+				__clear_bit(SG_FRQ_RESERVED, o_srp->frq_bm);
+				__set_bit(SG_FRQ_RESERVED, t_srp->frq_bm);
+				*rapp = t_srp;
+				xa_unlock_irqrestore(xafp, iflags);
+				sg_remove_srp(n_srp);
+				kfree(n_srp);
+				n_srp = NULL;
+				break;
+			}
+		}
+		if (use_new_srp) {
+			struct sg_request *cxc_srp;
 
-		xa_lock_irqsave(xafp, iflags);
-		n_srp->rq_idx = o_srp->rq_idx;
-		idx = o_srp->rq_idx;
-		cxc_srp = __xa_cmpxchg(xafp, idx, o_srp, n_srp, GFP_ATOMIC);
-		if (o_srp == cxc_srp) {
-			sfp->rsv_srp = n_srp;
-			sg_rq_chg_state_force_ulck(n_srp, SG_RQ_INACTIVE);
-			/* don't bump inactives, since replaced an inactive */
-			xa_unlock_irqrestore(xafp, iflags);
-			SG_LOG(6, sfp, "%s: new rsv srp=0x%pK ++\n", __func__,
-			       n_srp);
-			sg_remove_sgat(o_srp);
-			kfree(o_srp);
-		} else {
-			xa_unlock_irqrestore(xafp, iflags);
-			SG_LOG(1, sfp, "%s: xa_cmpxchg() failed, again\n",
-			       __func__);
-			goto try_again;
+			xa_lock_irqsave(xafp, iflags);
+			n_srp->rq_idx = o_srp->rq_idx;
+			idx = o_srp->rq_idx;
+			cxc_srp = __xa_cmpxchg(xafp, idx, o_srp, n_srp,
+					       GFP_ATOMIC);
+			if (o_srp == cxc_srp) {
+				__assign_bit(SG_FRQ_RESERVED, n_srp->frq_bm,
+					     test_bit(SG_FRQ_RESERVED,
+						      o_srp->frq_bm));
+				*rapp = n_srp;
+				sg_rq_chg_state_force_ulck(n_srp, SG_RQ_INACTIVE);
+				xa_unlock_irqrestore(xafp, iflags);
+				SG_LOG(6, sfp, "%s: new rsv srp=0x%pK ++\n",
+				       __func__, n_srp);
+				n_srp = NULL;
+				sg_remove_srp(o_srp);
+				kfree(o_srp);
+				o_srp = NULL;
+			} else {
+				xa_unlock_irqrestore(xafp, iflags);
+				SG_LOG(1, sfp, "%s: xa_cmpxchg()-->retry\n",
+				       __func__);
+				goto try_again;
+			}
 		}
 	}
+	return res;
 fini:
-	if (!use_new_srp) {
-		sg_remove_sgat(n_srp);
-		kfree(n_srp);   /* no-one else has seen n_srp, so safe */
+	if (n_srp) {
+		sg_remove_srp(n_srp);
+		kfree(n_srp);	/* nothing has seen n_srp, so safe */
 	}
 	return res;
 }
@@ -3574,16 +4162,12 @@ static bool
 sg_any_persistent_orphans(struct sg_fd *sfp)
 {
 	if (test_bit(SG_FFD_KEEP_ORPHAN, sfp->ffd_bm)) {
-		int num_waiting = atomic_read(&sfp->waiting);
 		unsigned long idx;
 		struct sg_request *srp;
 		struct xarray *xafp = &sfp->srp_arr;
 
-		if (num_waiting < 1) {
-			num_waiting = atomic_read_acquire(&sfp->waiting);
-			if (num_waiting < 1)
-				return false;
-		}
+		if (sg_num_waiting_maybe_acquire(sfp) < 1)
+			return false;
 		xa_for_each_marked(xafp, idx, srp, SG_XA_RQ_AWAIT) {
 			if (test_bit(SG_FRQ_IS_ORPHAN, srp->frq_bm))
 				return true;
@@ -3592,9 +4176,17 @@ sg_any_persistent_orphans(struct sg_fd *sfp)
 	return false;
 }
 
-/* Ignore append if size already over half of available buffer */
+/*
+ * Will clear_first if size already over half of available buffer.
+ *
+ * N.B. This function is a useful debug aid to be called inline with its
+ * output going to /sys/kernel/debug/scsi_generic/snapped for later
+ * examination. Best to call it with no locks held and that implies that
+ * the driver state may change while it is processing. Interpret the
+ * result with this in mind.
+ */
 static void
-sg_take_snap(struct sg_fd *sfp, bool dont_append)
+sg_take_snap(struct sg_fd *sfp, bool clear_first)
 {
 	u32 hour, minute, second;
 	u64 n;
@@ -3619,7 +4211,7 @@ sg_take_snap(struct sg_fd *sfp, bool dont_append)
 				      GFP_KERNEL | __GFP_NOWARN);
 		if (!snapped_buf)
 			goto fini;
-	} else if (dont_append) {
+	} else if (clear_first) {
 		memset(snapped_buf, 0, SG_SNAP_BUFF_SZ);
 	}
 #if IS_ENABLED(SG_PROC_OR_DEBUG_FS)
@@ -3653,10 +4245,11 @@ sg_take_snap(struct sg_fd *sfp, bool dont_append)
  * of boolean flags. Access abbreviations: [rw], read-write; [ro], read-only;
  * [wo], write-only; [raw], read after write; [rbw], read before write.
  */
-static void
+static int
 sg_extended_bool_flags(struct sg_fd *sfp, struct sg_extended_info *seip)
 {
 	bool flg = false;
+	int res = 0;
 	const u32 c_flgs_wm = seip->ctl_flags_wr_mask;
 	const u32 c_flgs_rm = seip->ctl_flags_rd_mask;
 	const u32 c_flgs_val_in = seip->ctl_flags;
@@ -3740,10 +4333,10 @@ sg_extended_bool_flags(struct sg_fd *sfp, struct sg_extended_info *seip)
 	 * reading: read-side is finished, awaiting action by write-side;
 	 * when written: 1 --> write-side doesn't want to continue
 	 */
-	if (c_flgs_rm & SG_CTL_FLAGM_READ_SIDE_FINI) {
+	if ((c_flgs_rm & SG_CTL_FLAGM_READ_SIDE_FINI) && sg_fd_is_shared(sfp)) {
 		rs_sfp = sg_fd_share_ptr(sfp);
-		if (rs_sfp && rs_sfp->rsv_srp) {
-			struct sg_request *res_srp = rs_sfp->rsv_srp;
+		if (rs_sfp && !IS_ERR_OR_NULL(rs_sfp->rsv_arr[0])) {
+			struct sg_request *res_srp = rs_sfp->rsv_arr[0];
 
 			if (atomic_read(&res_srp->rq_st) == SG_RQ_SHR_SWAP)
 				c_flgs_val_out |= SG_CTL_FLAGM_READ_SIDE_FINI;
@@ -3756,7 +4349,7 @@ sg_extended_bool_flags(struct sg_fd *sfp, struct sg_extended_info *seip)
 	if (c_flgs_wm & SG_CTL_FLAGM_READ_SIDE_FINI) {
 		bool rs_fini_wm = !!(c_flgs_val_in & SG_CTL_FLAGM_READ_SIDE_FINI);
 
-		sg_change_after_read_side_rq(sfp, rs_fini_wm);
+		res = sg_change_after_read_side_rq(sfp, rs_fini_wm);
 	}
 	/* READ_SIDE_ERR boolean, [ro] share: read-side finished with error */
 	if (c_flgs_rm & SG_CTL_FLAGM_READ_SIDE_ERR) {
@@ -3819,6 +4412,7 @@ sg_extended_bool_flags(struct sg_fd *sfp, struct sg_extended_info *seip)
 
 	if (c_flgs_val_in != c_flgs_val_out)
 		seip->ctl_flags = c_flgs_val_out;
+	return res;
 }
 
 static void
@@ -3865,6 +4459,9 @@ sg_extended_read_value(struct sg_fd *sfp, struct sg_extended_info *seip)
 			uv += (u32)atomic_read(&a_sfp->submitted);
 		seip->read_value = uv;
 		break;
+	case SG_SEIRV_MAX_RSV_REQS:
+		seip->read_value = SG_MAX_RSV_REQS;
+		break;
 	default:
 		SG_LOG(6, sfp, "%s: can't decode %d --> read_value\n",
 		       __func__, seip->read_value);
@@ -3911,8 +4508,11 @@ sg_ctl_extended(struct sg_fd *sfp, void __user *p)
 			seip->tot_fd_thresh = hold;
 	}
 	/* check all boolean flags for either wr or rd mask set in or_mask */
-	if (or_masks & SG_SEIM_CTL_FLAGS)
-		sg_extended_bool_flags(sfp, seip);
+	if (or_masks & SG_SEIM_CTL_FLAGS) {
+		result = sg_extended_bool_flags(sfp, seip);
+		if (ret == 0 && unlikely(result))
+			ret = result;
+	}
 	/* yields minor_index (type: u32) [ro] */
 	if (or_masks & SG_SEIM_MINOR_INDEX) {
 		if (s_wr_mask & SG_SEIM_MINOR_INDEX) {
@@ -3937,7 +4537,7 @@ sg_ctl_extended(struct sg_fd *sfp, void __user *p)
 			struct sg_fd *sh_sfp = sg_fd_share_ptr(sfp);
 
 			seip->share_fd = sh_sfp ? sh_sfp->parentdp->index :
-						   U32_MAX;
+						  U32_MAX;
 		}
 		mutex_unlock(&sfp->f_mutex);
 	}
@@ -3998,10 +4598,12 @@ sg_ctl_extended(struct sg_fd *sfp, void __user *p)
 			ret = result;
 		mutex_unlock(&sfp->f_mutex);
 	}
-	if (s_rd_mask & SG_SEIM_RESERVED_SIZE)
-		seip->reserved_sz = (u32)min_t(int,
-					       sfp->rsv_srp->sgatp->buflen,
+	if (s_rd_mask & SG_SEIM_RESERVED_SIZE) {
+		struct sg_request *r_srp = sfp->rsv_arr[0];
+
+		seip->reserved_sz = (u32)min_t(int, r_srp->sgatp->buflen,
 					       sdp->max_sgat_sz);
+	}
 	/* copy to user space if int or boolean read mask non-zero */
 	if (s_rd_mask || seip->ctl_flags_rd_mask) {
 		if (copy_to_user(p, seip, SZ_SG_EXTENDED_INFO))
@@ -4096,11 +4698,20 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 	unsigned long idx;
 	__maybe_unused const char *pmlp = ", pass to mid-level";
 
-	SG_LOG(6, sfp, "%s: cmd=0x%x, O_NONBLOCK=%d\n", __func__, cmd_in,
-	       !!(filp->f_flags & O_NONBLOCK));
+	SG_LOG(6, sfp, "%s: cmd=0x%x, O_NONBLOCK=%d%s\n", __func__, cmd_in,
+	       !!(filp->f_flags & O_NONBLOCK),
+	       (cmd_in == SG_GET_NUM_WAITING ? ", SG_GET_NUM_WAITING" : ""));
 	sdev = sdp->device;
 
 	switch (cmd_in) {
+	case SG_GET_NUM_WAITING:
+		/* Want as fast as possible, with a useful result */
+		if (test_bit(SG_FFD_HIPRI_SEEN, sfp->ffd_bm))
+			sg_sfp_blk_poll(sfp, 0);	/* LLD may have some ready */
+		val = atomic_read(&sfp->waiting);
+		if (val)
+			return put_user(val, ip);
+		return put_user(atomic_read_acquire(&sfp->waiting), ip);
 	case SG_IO:
 		if (SG_IS_DETACHING(sdp))
 			return -ENODEV;
@@ -4169,18 +4780,10 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 		}
 		SG_LOG(3, sfp, "%s:    SG_GET_PACK_ID=%d\n", __func__, val);
 		return put_user(val, ip);
-	case SG_GET_NUM_WAITING:
-		/* Want as fast as possible, with a useful result */
-		if (test_bit(SG_FFD_HIPRI_SEEN, sfp->ffd_bm))
-			sg_sfp_blk_poll(sfp, 0);	/* LLD may have some ready */
-		val = atomic_read(&sfp->waiting);
-		if (val)
-			return put_user(val, ip);
-		return put_user(atomic_read_acquire(&sfp->waiting), ip);
 	case SG_GET_SG_TABLESIZE:
 		SG_LOG(3, sfp, "%s:    SG_GET_SG_TABLESIZE=%d\n", __func__,
-		       sdp->max_sgat_sz);
-		return put_user(sdp->max_sgat_sz, ip);
+		       sdp->max_sgat_elems);
+		return put_user(sdp->max_sgat_elems, ip);
 	case SG_SET_RESERVED_SIZE:
 		res = get_user(val, ip);
 		if (likely(!res)) {
@@ -4195,13 +4798,17 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 		}
 		return res;
 	case SG_GET_RESERVED_SIZE:
-		mutex_lock(&sfp->f_mutex);
-		val = min_t(int, sfp->rsv_srp->sgatp->buflen,
-			    sdp->max_sgat_sz);
-		mutex_unlock(&sfp->f_mutex);
+		{
+			struct sg_request *r_srp = sfp->rsv_arr[0];
+
+			mutex_lock(&sfp->f_mutex);
+			val = min_t(int, r_srp->sgatp->buflen,
+				    sdp->max_sgat_sz);
+			mutex_unlock(&sfp->f_mutex);
+			res = put_user(val, ip);
+		}
 		SG_LOG(3, sfp, "%s:    SG_GET_RESERVED_SIZE=%d\n", __func__,
 		       val);
-		res = put_user(val, ip);
 		return res;
 	case SG_SET_COMMAND_Q:	/* set by driver whenever v3 or v4 req seen */
 		SG_LOG(3, sfp, "%s:    SG_SET_COMMAND_Q\n", __func__);
@@ -4495,7 +5102,7 @@ sg_vma_open(struct vm_area_struct *vma)
 		pr_warn("%s: sfp null\n", __func__);
 		return;
 	}
-	kref_get(&sfp->f_ref);
+	kref_get(&sfp->f_ref);	/* put in: sg_vma_close() */
 }
 
 static void
@@ -4540,8 +5147,8 @@ sg_vma_fault(struct vm_fault *vmf)
 		SG_LOG(1, sfp, "%s: device detaching\n", __func__);
 		goto out_err;
 	}
-	srp = sfp->rsv_srp;
-	if (unlikely(!srp)) {
+	srp = sfp->rsv_arr[0];
+	if (IS_ERR_OR_NULL(srp)) {
 		SG_LOG(1, sfp, "%s: srp%s\n", __func__, nbp);
 		goto out_err;
 	}
@@ -4594,7 +5201,8 @@ sg_mmap(struct file *filp, struct vm_area_struct *vma)
 		pr_warn("sg: %s: sfp is NULL\n", __func__);
 		return -ENXIO;
 	}
-	mutex_lock(&sfp->f_mutex);
+	if (unlikely(!mutex_trylock(&sfp->f_mutex)))
+		return -EBUSY;
 	req_sz = vma->vm_end - vma->vm_start;
 	SG_LOG(3, sfp, "%s: vm_start=%pK, len=%d\n", __func__,
 	       (void *)vma->vm_start, (int)req_sz);
@@ -4603,7 +5211,7 @@ sg_mmap(struct file *filp, struct vm_area_struct *vma)
 		goto fini;
 	}
 	/* Check reserve request is inactive and has large enough buffer */
-	srp = sfp->rsv_srp;
+	srp = sfp->rsv_arr[0];
 	if (SG_RQ_ACTIVE(srp)) {
 		res = -EBUSY;
 		goto fini;
@@ -4620,7 +5228,7 @@ sg_mmap(struct file *filp, struct vm_area_struct *vma)
 	}
 	if (srp->sgat_h.page_order > 0 ||
 	    req_sz > (unsigned long)srp->sgat_h.buflen) {
-		sg_remove_sgat(srp);
+		sg_remove_srp(srp);
 		set_bit(SG_FRQ_FOR_MMAP, srp->frq_bm);
 		res = sg_mk_sgat(srp, sfp, req_sz);
 		if (res) {
@@ -4661,7 +5269,7 @@ sg_uc_rq_end_io_orphaned(struct work_struct *work)
 		sg_finish_scsi_blk_rq(srp);	/* clean up orphan case */
 		sg_deact_request(sfp, srp);
 	}
-	kref_put(&sfp->f_ref, sg_remove_sfp);
+	kref_put(&sfp->f_ref, sg_remove_sfp); /* get in: sg_execute_cmd() */
 }
 
 /*
@@ -4748,7 +5356,7 @@ sg_rq_end_io(struct request *rqq, blk_status_t status)
 	}
 	xa_lock_irqsave(&sfp->srp_arr, iflags);
 	__set_bit(SG_FRQ_ISSUED, srp->frq_bm);
-	sg_rq_chg_state_force_ulck(srp, rqq_state);
+	sg_rq_chg_state_force_ulck(srp, rqq_state);	/* normally --> SG_RQ_AWAIT_RCV */
 	WRITE_ONCE(srp->rqq, NULL);
 	if (test_bit(SG_FRQ_COUNT_ACTIVE, srp->frq_bm)) {
 		int num = atomic_inc_return(&sfp->waiting);
@@ -4775,16 +5383,15 @@ sg_rq_end_io(struct request *rqq, blk_status_t status)
 		/* clean up orphaned request that aren't being kept */
 		INIT_WORK(&srp->ew_orph.work, sg_uc_rq_end_io_orphaned);
 		schedule_work(&srp->ew_orph.work);
+		/* kref_put(f_ref) done in sg_uc_rq_end_io_orphaned() */
 		return;
 	}
-	/* Wake any sg_read()/ioctl(SG_IORECEIVE) awaiting this req */
 	if (!(srp->rq_flags & SGV4_FLAG_HIPRI))
 		wake_up_interruptible(&sfp->cmpl_wait);
 	if (sfp->async_qp && (!test_bit(SG_FRQ_IS_V4I, srp->frq_bm) ||
 			      (srp->rq_flags & SGV4_FLAG_SIGNAL)))
 		kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN);
-	kref_put(&sfp->f_ref, sg_remove_sfp);
-	return;
+	kref_put(&sfp->f_ref, sg_remove_sfp);	/* get in: sg_execute_cmd() */
 }
 
 static const struct file_operations sg_fops = {
@@ -4851,6 +5458,7 @@ sg_add_device_helper(struct gendisk *disk, struct scsi_device *scsidp)
 	clear_bit(SG_FDEV_DETACHING, sdp->fdev_bm);
 	atomic_set(&sdp->open_cnt, 0);
 	sdp->index = k;
+	/* set d_ref to 1; corresponding put in: sg_remove_device() */
 	kref_init(&sdp->d_ref);
 	error = 0;
 
@@ -4977,12 +5585,13 @@ sg_remove_device(struct device *cl_dev, struct class_interface *cl_intf)
 	if (unlikely(!sdp))
 		return;
 	/* set this flag as soon as possible as it could be a surprise */
-	if (test_and_set_bit(SG_FDEV_DETACHING, sdp->fdev_bm))
+	if (test_and_set_bit(SG_FDEV_DETACHING, sdp->fdev_bm)) {
+		pr_warn("%s: multiple entries: sg%u\n", __func__, sdp->index);
 		return; /* only want to do following once per device */
-
+	}
 	SCSI_LOG_TIMEOUT(3, sdev_printk(KERN_INFO, sdp->device,
-					"%s: 0x%pK\n", __func__, sdp));
-
+					"%s: sg%u 0x%pK\n", __func__,
+					sdp->index, sdp));
 	xa_for_each(&sdp->sfp_arr, idx, sfp) {
 		wake_up_interruptible_all(&sfp->cmpl_wait);
 		if (sfp->async_qp)
@@ -4995,6 +5604,7 @@ sg_remove_device(struct device *cl_dev, struct class_interface *cl_intf)
 	cdev_del(sdp->cdev);
 	sdp->cdev = NULL;
 
+	/* init to 1: kref_init() in sg_add_device_helper() */
 	kref_put(&sdp->d_ref, sg_device_destroy);
 }
 
@@ -5135,21 +5745,10 @@ sg_rq_map_kern(struct sg_request *srp, struct request_queue *q, struct request *
 	return res;
 }
 
-static inline void
-sg_set_map_data(const struct sg_scatter_hold *schp, bool up_valid,
-		struct rq_map_data *mdp)
-{
-	mdp->pages = schp->pages;
-	mdp->page_order = schp->page_order;
-	mdp->nr_entries = schp->num_sgat;
-	mdp->offset = 0;
-	mdp->null_mapped = !up_valid;
-}
-
 static int
 sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 {
-	bool reserved, no_xfer, us_xfer;
+	bool no_dxfer, us_xfer;
 	int res = 0;
 	int dxfer_len = 0;
 	int r0w = READ;
@@ -5172,7 +5771,7 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 		long_cmdp = kzalloc(cwrp->cmd_len, GFP_KERNEL);
 		if (unlikely(!long_cmdp)) {
 			res = -ENOMEM;
-			goto err_out;
+			goto err_pre_blk_get;
 		}
 		SG_LOG(5, sfp, "%s: long_cmdp=0x%pK ++\n", __func__, long_cmdp);
 	}
@@ -5199,8 +5798,8 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 		iov_count = sh3p->iovec_count;
 		r0w = dxfer_dir == SG_DXFER_TO_DEV ? WRITE : READ;
 	}
-	SG_LOG(4, sfp, "%s: dxfer_len=%d, data-%s\n", __func__, dxfer_len,
-	       (r0w ? "OUT" : "IN"));
+	SG_LOG(4, sfp, "%s: dxfer_len=%d%s\n", __func__, dxfer_len,
+	       (dxfer_len ? (r0w ? ", data-OUT" : ", data-IN") : ""));
 	q = sdp->device->request_queue;
 
 	/*
@@ -5213,9 +5812,8 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 			      (test_bit(SG_FFD_MORE_ASYNC, sfp->ffd_bm) ?
 						BLK_MQ_REQ_NOWAIT : 0));
 	if (IS_ERR(rqq)) {
-		kfree(long_cmdp);
 		res = PTR_ERR(rqq);
-		goto err_out;
+		goto err_pre_blk_get;
 	}
 	/* current sg_request protected by SG_RQ_BUSY state */
 	scsi_rp = scsi_req(rqq);
@@ -5224,8 +5822,11 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 		srp->tag = rqq->tag;
 	if (rq_flags & SGV4_FLAG_HIPRI)
 		set_bit(SG_FFD_HIPRI_SEEN, sfp->ffd_bm);
-	if (cwrp->cmd_len > BLK_MAX_CDB)
+	if (cwrp->cmd_len > BLK_MAX_CDB) {
 		scsi_rp->cmd = long_cmdp;	/* transfer ownership */
+		/* this heap memory is freed in scsi_req_free_cmd() */
+		long_cmdp = NULL;
+	}
 	if (cwrp->u_cmdp)
 		res = sg_fetch_cmnd(sfp, cwrp->u_cmdp, cwrp->cmd_len,
 				    scsi_rp->cmd);
@@ -5234,18 +5835,17 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 	else
 		res = -EPROTO;
 	if (unlikely(res))
-		goto err_out;
+		goto fini;
 	scsi_rp->cmd_len = cwrp->cmd_len;
 	srp->cmd_opcode = scsi_rp->cmd[0];
-	no_xfer = dxfer_len <= 0 || dxfer_dir == SG_DXFER_NONE;
+	no_dxfer = dxfer_len <= 0 || dxfer_dir == SG_DXFER_NONE;
 	us_xfer = !(rq_flags & (SG_FLAG_NO_DXFER | SG_FLAG_MMAP_IO));
-	__assign_bit(SG_FRQ_US_XFER, srp->frq_bm, !no_xfer && us_xfer);
-	reserved = (sfp->rsv_srp == srp);
+	__assign_bit(SG_FRQ_US_XFER, srp->frq_bm, !no_dxfer && us_xfer);
 	rqq->end_io_data = srp;
 	scsi_rp->retries = SG_DEFAULT_RETRIES;
 	req_schp = srp->sgatp;
 
-	if (no_xfer) {
+	if (no_dxfer) {
 		SG_LOG(4, sfp, "%s: no data xfer [0x%pK]\n", __func__, srp);
 		goto fini;	/* path of reqs with no din nor dout */
 	} else if (unlikely(rq_flags & SG_FLAG_DIRECT_IO) && iov_count == 0 &&
@@ -5262,9 +5862,13 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 	if (likely(md)) {	/* normal, "indirect" IO */
 		if (unlikely(rq_flags & SG_FLAG_MMAP_IO)) {
 			/* mmap IO must use and fit in reserve request */
-			if (unlikely(!reserved ||
+			bool reserve0;
+			struct sg_request *r_srp = sfp->rsv_arr[0];
+
+			reserve0 = (r_srp == srp);
+			if (unlikely(!reserve0 ||
 				     dxfer_len > req_schp->buflen))
-				res = reserved ? -ENOMEM : -EBUSY;
+				res = reserve0 ? -ENOMEM : -EBUSY;
 		} else if (req_schp->buflen == 0) {
 			int up_sz = max_t(int, dxfer_len, sfp->sgat_elem_sz);
 
@@ -5272,8 +5876,11 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 		}
 		if (unlikely(res))
 			goto fini;
-
-		sg_set_map_data(req_schp, !!up, md);
+		md->pages = req_schp->pages;
+		md->page_order = req_schp->page_order;
+		md->nr_entries = req_schp->num_sgat;
+		md->offset = 0;
+		md->null_mapped = !up;
 		md->from_user = (dxfer_dir == SG_DXFER_TO_FROM_DEV);
 	}
 
@@ -5282,7 +5889,7 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 		struct iov_iter i;
 
 		res = import_iovec(r0w, up, iov_count, 0, &iov, &i);
-		if (res < 0)
+		if (unlikely(res < 0))
 			goto fini;
 
 		iov_iter_truncate(&i, dxfer_len);
@@ -5317,9 +5924,10 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 	} else {
 		srp->bio = rqq->bio;
 	}
-err_out:
+err_pre_blk_get:
 	SG_LOG((res ? 1 : 4), sfp, "%s: %s %s res=%d [0x%pK]\n", __func__,
 	       sg_shr_str(srp->sh_var, false), cp, res, srp);
+	kfree(long_cmdp);
 	return res;
 }
 
@@ -5336,13 +5944,14 @@ sg_finish_scsi_blk_rq(struct sg_request *srp)
 	int ret;
 	struct sg_fd *sfp = srp->parentfp;
 	struct request *rqq = READ_ONCE(srp->rqq);
+	__maybe_unused char b[32];
 
 	SG_LOG(4, sfp, "%s: srp=0x%pK%s\n", __func__, srp,
-	       (srp->parentfp->rsv_srp == srp) ? " rsv" : "");
+	       sg_get_rsv_str_lck(srp, " ", "", sizeof(b), b));
 	if (test_and_clear_bit(SG_FRQ_COUNT_ACTIVE, srp->frq_bm)) {
 		if (atomic_dec_and_test(&sfp->submitted))
 			clear_bit(SG_FFD_HIPRI_SEEN, sfp->ffd_bm);
-		atomic_dec(&sfp->waiting);
+		atomic_dec_return_release(&sfp->waiting);
 	}
 
 	/* Expect blk_put_request(rqq) already called in sg_rq_end_io() */
@@ -5443,7 +6052,7 @@ sg_mk_sgat(struct sg_request *srp, struct sg_fd *sfp, int minlen)
 }
 
 static void
-sg_remove_sgat_helper(struct sg_fd *sfp, struct sg_scatter_hold *schp)
+sg_remove_sgat(struct sg_fd *sfp, struct sg_scatter_hold *schp)
 {
 	int k;
 	void *p;
@@ -5464,15 +6073,19 @@ sg_remove_sgat_helper(struct sg_fd *sfp, struct sg_scatter_hold *schp)
 
 /* Remove the data (possibly a sgat list) held by srp, not srp itself */
 static void
-sg_remove_sgat(struct sg_request *srp)
+sg_remove_srp(struct sg_request *srp)
 {
-	struct sg_scatter_hold *schp = &srp->sgat_h; /* care: remove own data */
-	struct sg_fd *sfp = srp->parentfp;
+	struct sg_scatter_hold *schp;
+	struct sg_fd *sfp;
+	__maybe_unused char b[48];
 
+	if (!srp)
+		return;
+	schp = &srp->sgat_h; /* care: remove own data */
+	sfp = srp->parentfp;
 	SG_LOG(4, sfp, "%s: num_sgat=%d%s\n", __func__, schp->num_sgat,
-	       ((srp->parentfp ? (sfp->rsv_srp == srp) : false) ?
-							" [rsv]" : ""));
-	sg_remove_sgat_helper(sfp, schp);
+	       sg_get_rsv_str_lck(srp, " [", "]", sizeof(b), b));
+	sg_remove_sgat(sfp, schp);
 
 	if (sfp->tot_fd_thresh > 0) {
 		/* this is a subtraction, error if it goes negative */
@@ -5527,7 +6140,7 @@ sg_read_append(struct sg_request *srp, void __user *outp, int num_xfer)
 }
 
 /*
- * If there are multiple requests outstanding, the speed of this function is
+ * If there are many requests outstanding, the speed of this function is
  * important. 'id' is pack_id when is_tag=false, otherwise it is a tag. Both
  * SG_PACK_ID_WILDCARD and SG_TAG_WILDCARD are -1 and that case is typically
  * the fast path. This function is only used in the non-blocking cases.
@@ -5543,7 +6156,6 @@ sg_find_srp_by_id(struct sg_fd *sfp, int id, bool is_tag)
 	bool second = false;
 	enum sg_rq_state sr_st;
 	int res;
-	int num_waiting = atomic_read(&sfp->waiting);
 	int l_await_idx = READ_ONCE(sfp->low_await_idx);
 	unsigned long idx, s_idx;
 	unsigned long end_idx = ULONG_MAX;
@@ -5552,11 +6164,8 @@ sg_find_srp_by_id(struct sg_fd *sfp, int id, bool is_tag)
 
 	if (test_bit(SG_FFD_HIPRI_SEEN, sfp->ffd_bm))
 		sg_sfp_blk_poll(sfp, 0);	/* LLD may have some ready to push up */
-	if (num_waiting < 1) {
-		num_waiting = atomic_read_acquire(&sfp->waiting);
-		if (num_waiting < 1)
-			return NULL;
-	}
+	if (sg_num_waiting_maybe_acquire(sfp) < 1)
+		return NULL;
 
 	s_idx = (l_await_idx < 0) ? 0 : l_await_idx;
 	idx = s_idx;
@@ -5670,7 +6279,7 @@ static bool
 sg_mrq_get_ready_srp(struct sg_fd *sfp, struct sg_request **srpp)
 {
 	bool second = false;
-	int num_waiting, res;
+	int res;
 	int l_await_idx = READ_ONCE(sfp->low_await_idx);
 	unsigned long idx, s_idx, end_idx;
 	struct sg_request *srp;
@@ -5684,12 +6293,8 @@ sg_mrq_get_ready_srp(struct sg_fd *sfp, struct sg_request **srpp)
 		*srpp = ERR_PTR(-ENODATA);
 		return true;
 	}
-	num_waiting = atomic_read(&sfp->waiting);
-	if (num_waiting < 1) {
-		num_waiting = atomic_read_acquire(&sfp->waiting);
-		if (num_waiting < 1)
-			goto fini;
-	}
+	if (sg_num_waiting_maybe_acquire(sfp) < 1)
+		goto fini;
 
 	s_idx = (l_await_idx < 0) ? 0 : l_await_idx;
 	idx = s_idx;
@@ -5727,9 +6332,10 @@ sg_mrq_get_ready_srp(struct sg_fd *sfp, struct sg_request **srpp)
  * may take time but has improved chance of success, otherwise use GFP_ATOMIC.
  * Note that basic initialization is done but srp is not added to either sfp
  * list. On error returns twisted negated errno value (not NULL).
+ * N.B. Initializes new srp state to SG_RQ_BUSY.
  */
 static struct sg_request *
-sg_mk_srp(struct sg_fd *sfp, bool first)
+sg_mk_only_srp(struct sg_fd *sfp, bool first)
 {
 	struct sg_request *srp;
 	gfp_t gfp = __GFP_NOWARN;
@@ -5754,7 +6360,7 @@ static struct sg_request *
 sg_mk_srp_sgat(struct sg_fd *sfp, bool first, int db_len)
 {
 	int res;
-	struct sg_request *n_srp = sg_mk_srp(sfp, first);
+	struct sg_request *n_srp = sg_mk_only_srp(sfp, first);
 
 	if (IS_ERR(n_srp))
 		return n_srp;
@@ -5779,14 +6385,22 @@ static struct sg_request *
 sg_build_reserve(struct sg_fd *sfp, int buflen)
 {
 	bool go_out = false;
-	int res;
+	int res, idx;
 	struct sg_request *srp;
+	struct sg_request **rapp;
 
 	SG_LOG(3, sfp, "%s: buflen=%d\n", __func__, buflen);
-	srp = sg_mk_srp(sfp, xa_empty(&sfp->srp_arr));
-	if (IS_ERR(srp))
+	idx = sg_get_idx_new_lck(sfp);
+	if (idx < 0) {
+		SG_LOG(1, sfp, "%s: sg_get_idx_new_lck() failed\n", __func__);
+		return ERR_PTR(-EFBIG);
+	}
+	rapp = sfp->rsv_arr + idx;
+	srp = sg_mk_only_srp(sfp, xa_empty(&sfp->srp_arr));
+	if (IS_ERR(srp)) {
+		*rapp = NULL;
 		return srp;
-	sfp->rsv_srp = srp;
+	}
 	do {
 		if (buflen < (int)PAGE_SIZE) {
 			buflen = PAGE_SIZE;
@@ -5794,14 +6408,18 @@ sg_build_reserve(struct sg_fd *sfp, int buflen)
 		}
 		res = sg_mk_sgat(srp, sfp, buflen);
 		if (likely(res == 0)) {
-			SG_LOG(4, sfp, "%s: final buflen=%d, srp=0x%pK ++\n",
-			       __func__, buflen, srp);
+			*rapp = srp;
+			SG_LOG(4, sfp,
+			       "%s: rsv%d: final buflen=%d, srp=0x%pK ++\n",
+			       __func__, idx, buflen, srp);
 			return srp;
 		}
-		if (go_out)
+		if (go_out) {
+			*rapp = NULL;
 			return ERR_PTR(res);
+		}
 		/* failed so remove, halve buflen, try again */
-		sg_remove_sgat(srp);
+		sg_remove_srp(srp);
 		buflen >>= 1;   /* divide by 2 */
 	} while (true);
 }
@@ -5820,19 +6438,21 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var, int dxfr_len)
 	bool act_empty = false;
 	bool allow_rsv = true;		/* see note above */
 	bool mk_new_srp = true;
+	bool new_rsv_srp = false;
 	bool ws_rq = false;
 	bool try_harder = false;
 	bool second = false;
 	bool has_inactive = false;
+	bool is_rsv;
+	int ra_idx = 0;
 	int res, l_used_idx;
 	u32 sum_dlen;
 	unsigned long idx, s_idx, end_idx, iflags;
 	enum sg_rq_state sr_st;
-	enum sg_rq_state rs_sr_st = SG_RQ_INACTIVE;
+	enum sg_rq_state rs_st = SG_RQ_INACTIVE;
 	struct sg_fd *fp = cwrp->sfp;
 	struct sg_request *r_srp = NULL; /* returned value won't be NULL */
 	struct sg_request *low_srp = NULL;
-	__maybe_unused struct sg_request *rsv_srp;
 	struct sg_request *rs_rsv_srp = NULL;
 	struct sg_fd *rs_sfp = NULL;
 	struct xarray *xafp = &fp->srp_arr;
@@ -5840,25 +6460,33 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var, int dxfr_len)
 	__maybe_unused char b[64];
 
 	b[0] = '\0';
-	rsv_srp = fp->rsv_srp;
-
 	switch (sh_var) {
 	case SG_SHR_NONE:
 	case SG_SHR_WS_NOT_SRQ:
 		break;
 	case SG_SHR_RS_RQ:
-		sr_st = atomic_read(&rsv_srp->rq_st);
+		if (test_bit(SG_FFD_RESHARE, fp->ffd_bm))
+			ra_idx = 0;
+		else
+			ra_idx = sg_get_idx_available(fp);
+		if (ra_idx < 0) {
+			new_rsv_srp = true;
+			cp = "m_rq";
+			goto good_fini;
+		}
+		r_srp = fp->rsv_arr[ra_idx];
+		sr_st = atomic_read(&r_srp->rq_st);
 		if (sr_st == SG_RQ_INACTIVE) {
-			res = sg_rq_chg_state(rsv_srp, sr_st, SG_RQ_BUSY);
+			res = sg_rq_chg_state(r_srp, sr_st, SG_RQ_BUSY);
 			if (likely(res == 0)) {
-				r_srp = rsv_srp;
+				r_srp->sh_srp = NULL;
 				mk_new_srp = false;
 				cp = "rs_rq";
 				goto good_fini;
 			}
 		}
 		/* Did not find the reserve request available */
-		r_srp = ERR_PTR(-EBUSY);
+		r_srp = ERR_PTR(-EFBIG);
 		break;
 	case SG_SHR_RS_NOT_SRQ:
 		allow_rsv = false;
@@ -5875,26 +6503,36 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var, int dxfr_len)
 		 * EADDRINUSE errno. The winner advances read-side's rq_state:
 		 *     SG_RQ_SHR_SWAP --> SG_RQ_SHR_IN_WS
 		 */
-		rs_rsv_srp = rs_sfp->rsv_srp;
-		rs_sr_st = atomic_read(&rs_rsv_srp->rq_st);
-		switch (rs_sr_st) {
+		if (cwrp->rsv_idx >= 0)
+			rs_rsv_srp = rs_sfp->rsv_arr[cwrp->rsv_idx];
+		else
+			rs_rsv_srp = sg_get_probable_read_side(rs_sfp);
+		if (!rs_rsv_srp) {
+			r_srp = ERR_PTR(-ENOSTR);
+			break;
+		}
+		rs_st = atomic_read(&rs_rsv_srp->rq_st);
+		switch (rs_st) {
 		case SG_RQ_AWAIT_RCV:
 			if (unlikely(rs_rsv_srp->rq_result & SG_ML_RESULT_MSK)) {
 				/* read-side done but error occurred */
 				r_srp = ERR_PTR(-ENOSTR);
 				break;
 			}
-			fallthrough;
+			ws_rq = true;
+			break;
 		case SG_RQ_SHR_SWAP:
 			ws_rq = true;
-			if (unlikely(rs_sr_st == SG_RQ_AWAIT_RCV))
+			if (unlikely(rs_st == SG_RQ_AWAIT_RCV))
 				break;
-			res = sg_rq_chg_state(rs_rsv_srp, rs_sr_st, SG_RQ_SHR_IN_WS);
+			res = sg_rq_chg_state(rs_rsv_srp, rs_st, SG_RQ_SHR_IN_WS);
 			if (unlikely(res))
 				r_srp = ERR_PTR(-EADDRINUSE);
 			break;
 		case SG_RQ_INFLIGHT:
 		case SG_RQ_BUSY:
+			SG_LOG(6, fp, "%s: write-side finds read-side: %s\n", __func__,
+			       sg_rq_st_str(rs_st, true));
 			r_srp = ERR_PTR(-EBUSY);
 			break;
 		case SG_RQ_INACTIVE:
@@ -5911,15 +6549,24 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var, int dxfr_len)
 		if (PTR_ERR(r_srp) == -EBUSY)
 			goto err_out;
 #if IS_ENABLED(SG_LOG_ACTIVE)
-		if (sh_var == SG_SHR_RS_RQ)
+		if (sh_var == SG_SHR_RS_RQ) {
 			snprintf(b, sizeof(b), "SG_SHR_RS_RQ --> sr_st=%s",
 				 sg_rq_st_str(sr_st, false));
-		else if (sh_var == SG_SHR_WS_RQ && rs_sfp)
-			snprintf(b, sizeof(b), "SG_SHR_WS_RQ-->rs_sr_st=%s",
-				 sg_rq_st_str(rs_sr_st, false));
-		else
+		} else if (sh_var == SG_SHR_WS_RQ && rs_sfp) {
+			char c[32];
+			const char *ccp;
+
+			if (rs_rsv_srp)
+				ccp = sg_get_rsv_str(rs_rsv_srp, "[", "]",
+						     sizeof(c), c);
+			else
+				ccp = "? ";
+			snprintf(b, sizeof(b), "SHR_WS_RQ --> rs_sr%s_st=%s",
+				 ccp, sg_rq_st_str(rs_st, false));
+		} else {
 			snprintf(b, sizeof(b), "sh_var=%s",
 				 sg_shr_str(sh_var, false));
+		}
 #endif
 		goto err_out;
 	}
@@ -5947,7 +6594,7 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var, int dxfr_len)
 		s_idx = (l_used_idx < 0) ? 0 : l_used_idx;
 		if (l_used_idx >= 0 && xa_get_mark(xafp, s_idx, SG_XA_RQ_INACTIVE)) {
 			r_srp = xa_load(xafp, s_idx);
-			if (r_srp && (allow_rsv || rsv_srp != r_srp)) {
+			if (r_srp && (allow_rsv || !test_bit(SG_FRQ_RESERVED, r_srp->frq_bm))) {
 				if (r_srp->sgat_h.buflen <= SG_DEF_SECTOR_SZ) {
 					if (sg_rq_chg_state(r_srp, SG_RQ_INACTIVE,
 							    SG_RQ_BUSY) == 0) {
@@ -5960,7 +6607,8 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var, int dxfr_len)
 		}
 		xa_for_each_marked(xafp, idx, r_srp, SG_XA_RQ_INACTIVE) {
 			has_inactive = true;
-			if (!allow_rsv && rsv_srp == r_srp)
+			if (!allow_rsv &&
+			    test_bit(SG_FRQ_RESERVED, r_srp->frq_bm))
 				continue;
 			if (!low_srp && dxfr_len < SG_DEF_SECTOR_SZ) {
 				low_srp = r_srp;
@@ -5985,7 +6633,8 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var, int dxfr_len)
 		for (r_srp = xa_find(xafp, &idx, end_idx, SG_XA_RQ_INACTIVE);
 		     r_srp;
 		     r_srp = xa_find_after(xafp, &idx, end_idx, SG_XA_RQ_INACTIVE)) {
-			if (!allow_rsv && rsv_srp == r_srp)
+			if (!allow_rsv &&
+			    test_bit(SG_FRQ_RESERVED, r_srp->frq_bm))
 				continue;
 			if (r_srp->sgat_h.buflen >= dxfr_len) {
 				if (sg_rq_chg_state(r_srp, SG_RQ_INACTIVE, SG_RQ_BUSY))
@@ -6025,7 +6674,7 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var, int dxfr_len)
 			r_srp = ERR_PTR(-EDOM);
 			SG_LOG(6, fp, "%s: trying 2nd req but cmd_q=false\n",
 			       __func__);
-			goto fini;
+			goto err_out;
 		} else if (fp->tot_fd_thresh > 0) {
 			sum_dlen = atomic_read(&fp->sum_fd_dlens) + dxfr_len;
 			if (unlikely(sum_dlen > (u32)fp->tot_fd_thresh)) {
@@ -6034,6 +6683,20 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var, int dxfr_len)
 				       __func__, sum_dlen, "tot_fd_thresh");
 			}
 		}
+		if (!IS_ERR(r_srp) && new_rsv_srp) {
+			ra_idx = sg_get_idx_new(fp);
+			if (ra_idx < 0) {
+				ra_idx = sg_get_idx_available(fp);
+				if (ra_idx < 0) {
+					SG_LOG(1, fp,
+					       "%s: no read-side reqs available\n",
+					       __func__);
+					r_srp = ERR_PTR(-EFBIG);
+				}
+			}
+		}
+		if (IS_ERR(r_srp))	/* NULL is _not_ an ERR here */
+			goto err_out;
 		r_srp = sg_mk_srp_sgat(fp, act_empty, dxfr_len);
 		if (IS_ERR(r_srp)) {
 			if (!try_harder && dxfr_len < SG_DEF_SECTOR_SZ &&
@@ -6041,46 +6704,70 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var, int dxfr_len)
 				try_harder = true;
 				goto start_again;
 			}
-			goto fini;
+			goto err_out;
+		}
+		SG_LOG(4, fp, "%s: %smk_new_srp=0x%pK ++\n", __func__,
+		       (new_rsv_srp ? "rsv " : ""), r_srp);
+		if (new_rsv_srp) {
+			fp->rsv_arr[ra_idx] = r_srp;
+			set_bit(SG_FRQ_RESERVED, r_srp->frq_bm);
+			r_srp->sh_srp = NULL;
 		}
 		xa_lock_irqsave(xafp, iflags);
-		res = __xa_alloc(xafp, &n_idx, r_srp, xa_limit_32b, GFP_KERNEL);
+		res = __xa_alloc(xafp, &n_idx, r_srp, xa_limit_32b, GFP_ATOMIC);
-		xa_unlock_irqrestore(xafp, iflags);
+		/* keep xa lock held until r_srp->rq_idx and parentfp are set */
 		if (unlikely(res < 0)) {
-			sg_remove_sgat(r_srp);
+			xa_unlock_irqrestore(xafp, iflags);
+			sg_remove_srp(r_srp);
 			kfree(r_srp);
 			r_srp = ERR_PTR(-EPROTOTYPE);
 			SG_LOG(1, fp, "%s: xa_alloc() failed, errno=%d\n",
 			       __func__,  -res);
-			goto fini;
+			goto err_out;
 		}
-		idx = n_idx;
-		r_srp->rq_idx = idx;
+		r_srp->rq_idx = n_idx;
 		r_srp->parentfp = fp;
-		sg_rq_chg_state_force(r_srp, SG_RQ_BUSY);
-		SG_LOG(4, fp, "%s: mk_new_srp=0x%pK ++\n", __func__, r_srp);
+		xa_unlock_irqrestore(xafp, iflags);
 	}
-	/* following copes with unlikely case where frq_bm > one ulong */
-	WRITE_ONCE(r_srp->frq_bm[0], cwrp->frq_bm[0]);	/* assumes <= 32 req flags */
+	/* keep SG_FRQ_RESERVED setting from prior/new r_srp; clear rest */
+	is_rsv = test_bit(SG_FRQ_RESERVED, r_srp->frq_bm);
+	WRITE_ONCE(r_srp->frq_bm[0], 0);
+	if (is_rsv)
+		set_bit(SG_FRQ_RESERVED, r_srp->frq_bm);
+	/* r_srp inherits these 2 flags from cwrp->frq_bm */
+	if (test_bit(SG_FRQ_IS_V4I, cwrp->frq_bm))
+		set_bit(SG_FRQ_IS_V4I, r_srp->frq_bm);
+	if (test_bit(SG_FRQ_SYNC_INVOC, cwrp->frq_bm))
+		set_bit(SG_FRQ_SYNC_INVOC, r_srp->frq_bm);
 	r_srp->sgatp->dlen = dxfr_len;/* must be <= r_srp->sgat_h.buflen */
 	r_srp->sh_var = sh_var;
 	r_srp->cmd_opcode = 0xff;  /* set invalid opcode (VS), 0x0 is TUR */
-fini:
 	/* If setup stalls (e.g. blk_get_request()) debug shows 'elap=1 ns' */
 	if (test_bit(SG_FFD_TIME_IN_NS, fp->ffd_bm))
 		r_srp->start_ns = S64_MAX;
 	if (ws_rq && rs_rsv_srp) {
-		rs_sfp->ws_srp = r_srp;
 		/* write-side "shares" the read-side reserve request's data buffer */
 		r_srp->sgatp = &rs_rsv_srp->sgat_h;
-	} else if (sh_var == SG_SHR_RS_RQ && test_bit(SG_FFD_READ_SIDE_ERR, fp->ffd_bm))
+		rs_rsv_srp->sh_srp = r_srp;
+		r_srp->sh_srp = rs_rsv_srp;
+	} else if (sh_var == SG_SHR_RS_RQ && test_bit(SG_FFD_READ_SIDE_ERR, fp->ffd_bm)) {
 		clear_bit(SG_FFD_READ_SIDE_ERR, fp->ffd_bm);
+	}
 err_out:
-	if (IS_ERR(r_srp) && PTR_ERR(r_srp) != -EBUSY && b[0])
-		SG_LOG(1, fp, "%s: bad %s\n", __func__, b);
-	if (!IS_ERR(r_srp))
+#if IS_ENABLED(SG_LOG_ACTIVE)
+	if (IS_ERR(r_srp)) {
+		int err = -PTR_ERR(r_srp);
+
+		if (err == EBUSY)
+			SG_LOG(4, fp, "%s: EBUSY (as ptr err)\n", __func__);
+		else
+			SG_LOG(1, fp, "%s: %s err=%d\n", __func__, b, err);
+	} else {
 		SG_LOG(4, fp, "%s: %s %sr_srp=0x%pK\n", __func__, cp,
-		       ((r_srp == fp->rsv_srp) ? "[rsv] " : ""), r_srp);
+		       sg_get_rsv_str_lck(r_srp, "[", "] ", sizeof(b), b),
+		       r_srp);
+	}
+#endif
 	return r_srp;
 }
 
@@ -6094,25 +6781,31 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var, int dxfr_len)
 static void
 sg_deact_request(struct sg_fd *sfp, struct sg_request *srp)
 {
+	bool is_rsv;
 	enum sg_rq_state sr_st;
 	u8 *sbp;
 
 	if (WARN_ON(!sfp || !srp))
 		return;
+	SG_LOG(3, sfp, "%s: srp=%pK\n", __func__, srp);
 	sbp = srp->sense_bp;
 	srp->sense_bp = NULL;
 	sr_st = atomic_read(&srp->rq_st);
-	if (sr_st != SG_RQ_SHR_SWAP) { /* mark _BUSY then _INACTIVE at end */
+	if (sr_st != SG_RQ_SHR_SWAP) {
 		/*
 		 * Can be called from many contexts and it is hard to know
 		 * whether xa locks held. So assume not.
 		 */
 		sg_rq_chg_state_force(srp, SG_RQ_INACTIVE);
 		atomic_inc(&sfp->inactives);
+		is_rsv = test_bit(SG_FRQ_RESERVED, srp->frq_bm);
 		WRITE_ONCE(srp->frq_bm[0], 0);
+		if (is_rsv)
+			__set_bit(SG_FRQ_RESERVED, srp->frq_bm);
 		srp->tag = SG_TAG_WILDCARD;
 		srp->in_resid = 0;
 		srp->rq_info = 0;
+		srp->sense_len = 0;
 	}
 	/* maybe orphaned req, thus never read */
 	if (sbp)
@@ -6130,16 +6823,15 @@ sg_add_sfp(struct sg_device *sdp, struct file *filp)
 	unsigned long iflags;
 	struct sg_fd *sfp;
 	struct sg_request *srp = NULL;
-	struct xarray *xadp = &sdp->sfp_arr;
 	struct xarray *xafp;
+	struct xarray *xadp;
 
 	sfp = kzalloc(sizeof(*sfp), GFP_ATOMIC | __GFP_NOWARN);
 	if (unlikely(!sfp))
 		return ERR_PTR(-ENOMEM);
 	init_waitqueue_head(&sfp->cmpl_wait);
 	xa_init_flags(&sfp->srp_arr, XA_FLAGS_ALLOC | XA_FLAGS_LOCK_IRQ);
-	xafp = &sfp->srp_arr;
-	kref_init(&sfp->f_ref);
+	kref_init(&sfp->f_ref);		/* init to 1; put: sg_release() */
 	mutex_init(&sfp->f_mutex);
 	sfp->timeout = SG_DEFAULT_TIMEOUT;
 	sfp->timeout_user = SG_DEFAULT_TIMEOUT_USER;
@@ -6152,6 +6844,9 @@ sg_add_sfp(struct sg_device *sdp, struct file *filp)
 	__assign_bit(SG_FFD_Q_AT_TAIL, sfp->ffd_bm, SG_DEFAULT_Q_AT);
 	sfp->tot_fd_thresh = SG_TOT_FD_THRESHOLD;
 	atomic_set(&sfp->sum_fd_dlens, 0);
+	atomic_set(&sfp->submitted, 0);
+	atomic_set(&sfp->waiting, 0);
+	atomic_set(&sfp->inactives, 0);
 	/*
 	 * SG_SCATTER_SZ initializes scatter_elem_sz but different value may
 	 * be given as driver/module parameter (e.g. 'scatter_elem_sz=8192').
@@ -6161,12 +6856,9 @@ sg_add_sfp(struct sg_device *sdp, struct file *filp)
 	 */
 	sfp->sgat_elem_sz = scatter_elem_sz;
 	sfp->parentdp = sdp;
-	atomic_set(&sfp->submitted, 0);
-	atomic_set(&sfp->waiting, 0);
-	atomic_set(&sfp->inactives, 0);
 
 	if (SG_IS_DETACHING(sdp)) {
-		SG_LOG(1, sfp, "%s: detaching\n", __func__);
+		SG_LOG(1, sfp, "%s: sg%u detaching\n", __func__, sdp->index);
 		kfree(sfp);
 		return ERR_PTR(-ENODEV);
 	}
@@ -6175,6 +6867,7 @@ sg_add_sfp(struct sg_device *sdp, struct file *filp)
 
 	rbuf_len = min_t(int, sg_big_buff, sdp->max_sgat_sz);
 	if (rbuf_len > 0) {
+		xafp = &sfp->srp_arr;
 		srp = sg_build_reserve(sfp, rbuf_len);
 		if (IS_ERR(srp)) {
 			err = PTR_ERR(srp);
@@ -6191,41 +6884,44 @@ sg_add_sfp(struct sg_device *sdp, struct file *filp)
 		}
 		xa_lock_irqsave(xafp, iflags);
 		res = __xa_alloc(xafp, &idx, srp, xa_limit_32b, GFP_ATOMIC);
-		if (!res) {
-			srp->rq_idx = idx;
-			srp->parentfp = sfp;
-			sg_rq_chg_state_force_ulck(srp, SG_RQ_INACTIVE);
-			atomic_inc(&sfp->inactives);
-		}
-		xa_unlock_irqrestore(xafp, iflags);
 		if (res < 0) {
 			SG_LOG(1, sfp, "%s: xa_alloc(srp) bad, errno=%d\n",
 			       __func__,  -res);
-			sg_remove_sgat(srp);
+			xa_unlock_irqrestore(xafp, iflags);
+			sg_remove_srp(srp);
 			kfree(srp);
 			kfree(sfp);
 			return ERR_PTR(-EPROTOTYPE);
 		}
+		srp->rq_idx = idx;
+		srp->parentfp = sfp;
+		sg_rq_chg_state_force_ulck(srp, SG_RQ_INACTIVE);
+		atomic_inc(&sfp->inactives);
+		__set_bit(SG_FRQ_RESERVED, srp->frq_bm);
+		xa_unlock_irqrestore(xafp, iflags);
 	}
 	if (!reduced) {
 		SG_LOG(4, sfp, "%s: built reserve buflen=%d\n", __func__,
 		       rbuf_len);
 	}
+	xadp = &sdp->sfp_arr;
 	xa_lock_irqsave(xadp, iflags);
-	res = __xa_alloc(xadp, &idx, sfp, xa_limit_32b, GFP_KERNEL);
-	xa_unlock_irqrestore(xadp, iflags);
+	res = __xa_alloc(xadp, &idx, sfp, xa_limit_32b, GFP_ATOMIC);
 	if (unlikely(res < 0)) {
+		xa_unlock_irqrestore(xadp, iflags);
 		pr_warn("%s: xa_alloc(sdp) bad, o_count=%d, errno=%d\n",
 			__func__, atomic_read(&sdp->open_cnt), -res);
 		if (srp) {
-			sg_remove_sgat(srp);
+			sg_remove_srp(srp);
 			kfree(srp);
 		}
 		kfree(sfp);
 		return ERR_PTR(res);
 	}
 	sfp->idx = idx;
-	kref_get(&sdp->d_ref);
+	__xa_set_mark(xadp, idx, SG_XA_FD_UNSHARED);
+	xa_unlock_irqrestore(xadp, iflags);
+	kref_get(&sdp->d_ref);	/* put in: sg_uc_remove_sfp() */
 	__module_get(THIS_MODULE);
 	SG_LOG(3, sfp, "%s: success, sfp=0x%pK ++\n", __func__, sfp);
 	return sfp;
@@ -6259,14 +6955,13 @@ sg_uc_remove_sfp(struct work_struct *work)
 		return;
 	}
 	sdp = sfp->parentdp;
-	xadp = &sdp->sfp_arr;
 
 	/* Cleanup any responses which were never read(). */
 	xa_for_each(xafp, idx, srp) {
 		if (!xa_get_mark(xafp, srp->rq_idx, SG_XA_RQ_INACTIVE))
 			sg_finish_scsi_blk_rq(srp);
 		if (srp->sgatp->buflen > 0)
-			sg_remove_sgat(srp);
+			sg_remove_srp(srp);
 		if (unlikely(srp->sense_bp)) {
 			mempool_free(srp->sense_bp, sg_sense_pool);
 			srp->sense_bp = NULL;
@@ -6285,6 +6980,7 @@ sg_uc_remove_sfp(struct work_struct *work)
 		SG_LOG(1, sfp, "%s: expected submitted=0 got %d\n",
 		       __func__, subm);
 	xa_destroy(xafp);
+	xadp = &sdp->sfp_arr;
 	xa_lock_irqsave(xadp, iflags);
 	e_sfp = __xa_erase(xadp, sfp->idx);
 	xa_unlock_irqrestore(xadp, iflags);
@@ -6297,7 +6993,7 @@ sg_uc_remove_sfp(struct work_struct *work)
 	kfree(sfp);
 
 	scsi_device_put(sdp->device);
-	kref_put(&sdp->d_ref, sg_device_destroy);
+	kref_put(&sdp->d_ref, sg_device_destroy);	/* get: sg_add_sfp() */
 	module_put(THIS_MODULE);
 }
 
@@ -6337,7 +7033,7 @@ sg_get_dev(int min_dev)
 		 */
 		sdp = ERR_PTR(-ENODEV);
 	} else
-		kref_get(&sdp->d_ref);
+		kref_get(&sdp->d_ref);	/* put: sg_open() */
 	read_unlock_irqrestore(&sg_index_lock, iflags);
 	return sdp;
 }
@@ -6607,23 +7303,26 @@ sg_proc_debug_sreq(struct sg_request *srp, int to, bool t_in_ns, char *obp,
 	enum sg_rq_state rq_st;
 	const char *cp;
 	const char *tp = t_in_ns ? "ns" : "ms";
+	char b[32];
 
 	if (unlikely(len < 1))
 		return 0;
 	v4 = test_bit(SG_FRQ_IS_V4I, srp->frq_bm);
 	is_v3v4 = v4 ? true : (srp->s_hdr3.interface_id != '\0');
-	if (srp->parentfp->rsv_srp == srp)
+	sg_get_rsv_str(srp, "     ", "", sizeof(b), b);
+	if (strlen(b) > 5)
 		cp = (is_v3v4 && (srp->rq_flags & SG_FLAG_MMAP_IO)) ?
-				"     mmap>> " : "     rsv>> ";
+					" mmap" : "";
 	else
-		cp = (srp->rq_info & SG_INFO_DIRECT_IO_MASK) ?
-				"     dio>> " : "     ";
+		cp = (srp->rq_info & SG_INFO_DIRECT_IO_MASK) ? " dio" : "";
 	rq_st = atomic_read(&srp->rq_st);
 	dur = sg_get_dur(srp, &rq_st, t_in_ns, &is_dur);
-	n += scnprintf(obp + n, len - n, "%s%s: dlen=%d/%d id=%d", cp,
-		       sg_rq_st_str(rq_st, false), srp->sgatp->dlen,
+	n += scnprintf(obp + n, len - n, "%s%s>> %s:%d dlen=%d/%d id=%d", b,
+		       cp, sg_rq_st_str(rq_st, false), srp->rq_idx, srp->sgatp->dlen,
 		       srp->sgatp->buflen, (int)srp->pack_id);
-	if (is_dur)	/* cmd/req has completed, waiting for ... */
+	if (test_bit(SG_FFD_NO_DURATION, srp->parentfp->ffd_bm))
+		;	/* duration output suppressed */
+	else if (is_dur)	/* cmd/req has completed, waiting for ... */
 		n += scnprintf(obp + n, len - n, " dur=%u%s", dur, tp);
 	else if (dur < U32_MAX) { /* in-flight or busy (so ongoing) */
 		if ((srp->rq_flags & SGV4_FLAG_YIELD_TAG) &&
@@ -6636,9 +7335,10 @@ sg_proc_debug_sreq(struct sg_request *srp, int to, bool t_in_ns, char *obp,
 	if (srp->sh_var != SG_SHR_NONE)
 		n += scnprintf(obp + n, len - n, " shr=%s",
 			       sg_shr_str(srp->sh_var, false));
+	if (srp->sgatp->num_sgat > 1)
+		n += scnprintf(obp + n, len - n, " sgat=%d", srp->sgatp->num_sgat);
 	cp = (srp->rq_flags & SGV4_FLAG_HIPRI) ? "hipri " : "";
-	n += scnprintf(obp + n, len - n, " sgat=%d %sop=0x%02x\n",
-		       srp->sgatp->num_sgat, cp, srp->cmd_opcode);
+	n += scnprintf(obp + n, len - n, " %sop=0x%02x\n", cp, srp->cmd_opcode);
 	return n;
 }
 
@@ -6653,7 +7353,7 @@ sg_proc_debug_fd(struct sg_fd *fp, char *obp, int len, unsigned long idx,
 	int to, k;
 	unsigned long iflags;
 	const char *cp;
-	struct sg_request *srp;
+	struct sg_request *srp = fp->rsv_arr[0];
 	struct sg_device *sdp = fp->parentdp;
 
 	if (sg_fd_is_shared(fp))
@@ -6671,14 +7371,19 @@ sg_proc_debug_fd(struct sg_fd *fp, char *obp, int len, unsigned long idx,
 		n += scnprintf(obp + n, len - n, "timeout=%dms rs", to);
 	else
 		n += scnprintf(obp + n, len - n, "timeout=%ds rs", to / 1000);
-	n += scnprintf(obp + n, len - n, "v_buflen=%d%s idx=%lu\n   cmd_q=%d ",
-		       fp->rsv_srp->sgatp->buflen, cp, idx,
-		       (int)!test_bit(SG_FFD_NO_CMD_Q, fp->ffd_bm));
-	n += scnprintf(obp + n, len - n,
-		       "f_packid=%d k_orphan=%d ffd_bm=0x%lx\n",
-		       (int)test_bit(SG_FFD_FORCE_PACKID, fp->ffd_bm),
-		       (int)test_bit(SG_FFD_KEEP_ORPHAN, fp->ffd_bm),
-		       fp->ffd_bm[0]);
+	n += scnprintf(obp + n, len - n, "v_buflen=%d%s fd_idx=%lu\n  ",
+		       (srp ? srp->sgatp->buflen : -1), cp, idx);
+	if (test_bit(SG_FFD_NO_CMD_Q, fp->ffd_bm))
+		n += scnprintf(obp + n, len - n, " no_cmd_q");
+	if (test_bit(SG_FFD_FORCE_PACKID, fp->ffd_bm))
+		n += scnprintf(obp + n, len - n, " force_packid");
+	if (test_bit(SG_FFD_KEEP_ORPHAN, fp->ffd_bm))
+		n += scnprintf(obp + n, len - n, " keep_orphan");
+	if (test_bit(SG_FFD_EXCL_WAITQ, fp->ffd_bm))
+		n += scnprintf(obp + n, len - n, " excl_waitq");
+	if (test_bit(SG_FFD_SVB_ACTIVE, fp->ffd_bm))
+		n += scnprintf(obp + n, len - n, " svb");
+	n += scnprintf(obp + n, len - n, " fd_bm=0x%lx\n", fp->ffd_bm[0]);
 	n += scnprintf(obp + n, len - n,
 		       "   mmap_sz=%d low_used_idx=%d low_await_idx=%d sum_fd_dlens=%u\n",
 		       fp->mmap_sz, READ_ONCE(fp->low_used_idx), READ_ONCE(fp->low_await_idx),
@@ -6699,7 +7404,7 @@ sg_proc_debug_fd(struct sg_fd *fp, char *obp, int len, unsigned long idx,
 		if (xa_get_mark(&fp->srp_arr, idx, SG_XA_RQ_INACTIVE))
 			continue;
 		if (set_debug)
-			n += scnprintf(obp + n, len - n, "     frq_bm=0x%lx  ",
+			n += scnprintf(obp + n, len - n, "     rq_bm=0x%lx",
 				       srp->frq_bm[0]);
 		else if (test_bit(SG_FRQ_ABORTING, srp->frq_bm))
 			n += scnprintf(obp + n, len - n,
@@ -6720,7 +7425,7 @@ sg_proc_debug_fd(struct sg_fd *fp, char *obp, int len, unsigned long idx,
 		if (k == 0)
 			n += scnprintf(obp + n, len - n, "   Inactives:\n");
 		if (set_debug)
-			n += scnprintf(obp + n, len - n, "     frq_bm=0x%lx  ",
+			n += scnprintf(obp + n, len - n, "     rq_bm=0x%lx",
 				       srp->frq_bm[0]);
 		n += sg_proc_debug_sreq(srp, fp->timeout, t_in_ns,
 					obp + n, len - n);
diff --git a/include/uapi/scsi/sg.h b/include/uapi/scsi/sg.h
index bf947ebe06dd..a1f35fd34816 100644
--- a/include/uapi/scsi/sg.h
+++ b/include/uapi/scsi/sg.h
@@ -222,6 +222,7 @@ typedef struct sg_req_info {	/* used by SG_GET_REQUEST_TABLE ioctl() */
 #define SG_SEIRV_DEV_INACT_RQS	0x4	/* sum(inactive rqs) on owning dev */
 #define SG_SEIRV_SUBMITTED	0x5	/* number of mrqs submitted+unread */
 #define SG_SEIRV_DEV_SUBMITTED	0x6	/* sum(submitted) on all dev's fds */
+#define SG_SEIRV_MAX_RSV_REQS	0x7	/* maximum reserve requests */
 
 /*
  * A pointer to the following structure is passed as the third argument to
-- 
2.25.1



* [PATCH v18 64/83] sg: device timestamp
From: Douglas Gilbert @ 2021-04-27 21:57 UTC
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

Add a timestamp to each sg_device object, written when the object
is created. The timestamp is the number of nanoseconds since the
machine booted (from ktime_get_boottime_ns()).

The purpose is to let the user detect, via the SG_SEIRV_DEV_TS_LOWER
and SG_SEIRV_DEV_TS_UPPER read values of the extended ioctl(2)
(SG_SET_GET_EXTENDED), whether the device behind a given sg device
node (e.g. /dev/sg3) may have changed. One worrisome scenario is
when a device disappears and a newly connected device takes the
same sg device number (e.g. /dev/sg3) as the recently disappeared
device. Linux gives no guarantee that this will _not_ happen.
Recording the device creation timestamp is one way an application
can detect when it does.
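
Because each read value is only 32 bits wide, an application has to
fetch both halves and rejoin them. The following is a minimal
user-space sketch of that, assuming the two halves have already been
read back with SG_SEIRV_DEV_TS_LOWER and SG_SEIRV_DEV_TS_UPPER; the
helper names are hypothetical and not part of the sg API. The "now"
value passed to sg_dev_age_ns() would come from
clock_gettime(CLOCK_BOOTTIME), which uses the same time base as
ktime_get_boottime_ns().

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: rebuild the 64-bit device creation time
 * (ns since boot) from the two 32-bit halves.
 */
static uint64_t sg_dev_ts_combine(uint32_t lower, uint32_t upper)
{
	return ((uint64_t)upper << 32) | lower;
}

/*
 * Hypothetical helper: true if the device behind a cached sg node
 * appears to have been re-created, since a new sg_device object gets
 * a new creation timestamp.
 */
static bool sg_dev_ts_changed(uint64_t saved_ns, uint32_t lower,
			      uint32_t upper)
{
	return sg_dev_ts_combine(lower, upper) != saved_ns;
}

/*
 * Hypothetical helper: age of the device object, given "now" in the
 * same time base; returns 0 if now_ns is not later than create_ns.
 */
static uint64_t sg_dev_age_ns(uint64_t create_ns, uint64_t now_ns)
{
	return now_ns > create_ns ? now_ns - create_ns : 0;
}
```

An application would save the combined value when it first opens
the node and call sg_dev_ts_changed() before trusting it again.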

The uptime command in Linux shows, in human-readable form, how
long a machine has been "up", that is, the time that has elapsed
since the machine was last booted or rebooted.

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c      | 8 ++++++++
 include/uapi/scsi/sg.h | 2 ++
 2 files changed, 10 insertions(+)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index c401047cae70..dc85592112e2 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -308,6 +308,7 @@ struct sg_device { /* holds the state of each scsi generic device */
 	int max_sgat_elems;     /* adapter's max number of elements in sgat */
 	int max_sgat_sz;	/* max number of bytes in sgat list */
 	u32 index;		/* device index number */
+	u64 create_ns;		/* nanoseconds since bootup device created */
 	atomic_t open_cnt;	/* count of opens (perhaps < num(sfds) ) */
 	unsigned long fdev_bm[1];	/* see SG_FDEV_* defines above */
 	struct gendisk *disk;
@@ -4462,6 +4463,12 @@ sg_extended_read_value(struct sg_fd *sfp, struct sg_extended_info *seip)
 	case SG_SEIRV_MAX_RSV_REQS:
 		seip->read_value = SG_MAX_RSV_REQS;
 		break;
+	case SG_SEIRV_DEV_TS_LOWER:	/* timestamp is 64 bits */
+		seip->read_value = sfp->parentdp->create_ns & U32_MAX;
+		break;
+	case SG_SEIRV_DEV_TS_UPPER:
+		seip->read_value = (sfp->parentdp->create_ns >> 32) & U32_MAX;
+		break;
 	default:
 		SG_LOG(6, sfp, "%s: can't decode %d --> read_value\n",
 		       __func__, seip->read_value);
@@ -5530,6 +5537,7 @@ sg_add_device(struct device *cl_dev, struct class_interface *cl_intf)
 	} else
 		pr_warn("%s: sg_sys Invalid\n", __func__);
 
+	sdp->create_ns = ktime_get_boottime_ns();
 	sg_calc_sgat_param(sdp);
 	sdev_printk(KERN_NOTICE, scsidp, "Attached scsi generic sg%d "
 		    "type %d\n", sdp->index, scsidp->type);
diff --git a/include/uapi/scsi/sg.h b/include/uapi/scsi/sg.h
index a1f35fd34816..a3f3d244d2af 100644
--- a/include/uapi/scsi/sg.h
+++ b/include/uapi/scsi/sg.h
@@ -223,6 +223,8 @@ typedef struct sg_req_info {	/* used by SG_GET_REQUEST_TABLE ioctl() */
 #define SG_SEIRV_SUBMITTED	0x5	/* number of mrqs submitted+unread */
 #define SG_SEIRV_DEV_SUBMITTED	0x6	/* sum(submitted) on all dev's fds */
 #define SG_SEIRV_MAX_RSV_REQS	0x7	/* maximum reserve requests */
+#define SG_SEIRV_DEV_TS_LOWER	0x8	/* device timestamp's lower 32 bits */
+#define SG_SEIRV_DEV_TS_UPPER	0x9	/* device timestamp's upper 32 bits */
 
 /*
  * A pointer to the following structure is passed as the third argument to
-- 
2.25.1



* [PATCH v18 65/83] sg: condition met is not an error
From: Douglas Gilbert @ 2021-04-27 21:57 UTC
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

Most, but not all, non-zero SCSI status values indicate a problem
with the current or an earlier command. The main exception is the
PRE-FETCH commands, which return CONDITION MET (0x4) when the
specified blocks fit (or will fit) in the device's cache and,
somewhat strangely, return GOOD (0x0) when they do not. Clean up
all SCSI status checks and the associated error processing so
that CONDITION MET is no longer treated as an error.
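
A user-space sketch of the new classification follows. It is
modeled on sg_result_is_good() in this patch and on the kernel's
scsi_status_is_good() as we understand it at the time of this
series (it ignores status bit 0 and accepts GOOD, CONDITION MET, the
two obsolete INTERMEDIATE statuses and COMMAND TERMINATED). The
function names are hypothetical. old_mask_flags_problem() models the
removed SG_ML_RESULT_MSK test, which flagged any non-zero status.

```c
#include <stdbool.h>

/* SAM status byte values (same values as the kernel's SAM_STAT_*). */
enum {
	STAT_GOOD                       = 0x00,
	STAT_CHECK_CONDITION            = 0x02,
	STAT_CONDITION_MET              = 0x04,
	STAT_INTERMEDIATE               = 0x10,
	STAT_INTERMEDIATE_CONDITION_MET = 0x14,
	STAT_COMMAND_TERMINATED         = 0x22,
};

/* Status byte only; bit 0 is ignored, as in scsi_status_is_good(). */
static bool status_is_good(int result)
{
	int status = result & 0xfe;

	return status == STAT_GOOD || status == STAT_CONDITION_MET ||
	       status == STAT_INTERMEDIATE ||
	       status == STAT_INTERMEDIATE_CONDITION_MET ||
	       status == STAT_COMMAND_TERMINATED;
}

/*
 * Like sg_result_is_good(): nothing in the host byte or in the low
 * nibble of the driver byte, and an acceptable SCSI status byte.
 */
static bool result_is_good(int rq_result)
{
	const int ml_result_msk = 0x0fff0000;

	return !(rq_result & ml_result_msk) && status_is_good(rq_result);
}

/* Models the removed SG_ML_RESULT_MSK (0x0fff00ff) check. */
static bool old_mask_flags_problem(int rq_result)
{
	return (rq_result & 0x0fff00ff) != 0;
}
```

For a PRE-FETCH that completes with CONDITION MET (rq_result 0x04),
old_mask_flags_problem() reports a problem while result_is_good()
reports success, which is the behaviour change this patch makes.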

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 64 ++++++++++++++++++++++++++---------------------
 1 file changed, 36 insertions(+), 28 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index dc85592112e2..ca6af752b23d 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -122,9 +122,6 @@ enum sg_shr_var {
 
 #define SG_MAX_RSV_REQS 8
 
-/* Only take lower 4 bits of driver byte, all host byte and sense byte */
-#define SG_ML_RESULT_MSK 0x0fff00ff	/* mid-level's 32 bit result value */
-
 #define SG_PACK_ID_WILDCARD (-1)
 #define SG_TAG_WILDCARD (-1)
 
@@ -652,6 +649,19 @@ sg_fd_share_ptr(struct sg_fd *sfp)
 	return res_sfp;
 }
 
+/*
+ * Picks up driver or host (transport) errors and actual SCSI status problems.
+ * Specifically SAM_STAT_CONDITION_MET is _not_ an error.
+ */
+static inline bool
+sg_result_is_good(int rq_result)
+{
+	/* Take lower 4 bits of driver byte and all host byte */
+	const int ml_result_msk = 0x0fff0000;
+
+	return !(rq_result & ml_result_msk) && scsi_status_is_good(rq_result);
+}
+
 /*
  * Release resources associated with a prior, successful sg_open(). It can be
  * seen as the (final) close() call on a sg device file descriptor in the user
@@ -1306,9 +1316,7 @@ sg_process_most_mrq(struct sg_fd *fp, struct sg_fd *o_sfp,
 		}
 		++num_cmpl;
 		hp->info |= SG_INFO_MRQ_FINI;
-		if (mhp->stop_if && (hp->driver_status ||
-				     hp->transport_status ||
-				     hp->device_status)) {
+		if (mhp->stop_if && !sg_result_is_good(srp->rq_result)) {
 			SG_LOG(2, fp, "%s: %s=0x%x/0x%x/0x%x] cause exit\n",
 			       __func__, "STOP_IF and status [drv/tran/scsi",
 			       hp->driver_status, hp->transport_status,
@@ -2375,7 +2383,7 @@ sg_copy_sense(struct sg_request *srp, bool v4_active)
 	int scsi_stat;
 
 	/* If need be, copy the sense buffer to the user space */
-	scsi_stat = srp->rq_result & 0xff;
+	scsi_stat = srp->rq_result & 0xfe;
 	if (unlikely((scsi_stat & SAM_STAT_CHECK_CONDITION) ||
 		     (driver_byte(srp->rq_result) & DRIVER_SENSE))) {
 		int sb_len = min_t(int, SCSI_SENSE_BUFFERSIZE, srp->sense_len);
@@ -2411,13 +2419,13 @@ sg_rec_state_v3v4(struct sg_fd *sfp, struct sg_request *srp, bool v4_active)
 	u32 rq_res = srp->rq_result;
 	enum sg_shr_var sh_var = srp->sh_var;
 
-	if (unlikely(srp->rq_result & 0xff)) {
+	if (unlikely(!scsi_status_is_good(rq_res))) {
 		int sb_len_wr = sg_copy_sense(srp, v4_active);
 
 		if (unlikely(sb_len_wr < 0))
 			return sb_len_wr;
 	}
-	if (rq_res & SG_ML_RESULT_MSK)
+	if (!sg_result_is_good(rq_res))
 		srp->rq_info |= SG_INFO_CHECK;
 	if (unlikely(test_bit(SG_FRQ_ABORTING, srp->frq_bm)))
 		srp->rq_info |= SG_INFO_ABORTED;
@@ -2478,7 +2486,7 @@ sg_complete_v3v4(struct sg_fd *sfp, struct sg_request *srp, bool other_err)
 			int poll_type = POLL_OUT;
 			struct sg_fd *ws_sfp = sg_fd_share_ptr(sfp);
 
-			if (unlikely((srp->rq_result & SG_ML_RESULT_MSK) ||
+			if (unlikely(!sg_result_is_good(srp->rq_result) ||
 				     other_err)) {
 				set_bit(SG_FFD_READ_SIDE_ERR, sfp->ffd_bm);
 				if (sr_st != SG_RQ_BUSY)
@@ -2797,7 +2805,7 @@ sg_read_v1v2(void __user *buf, int count, struct sg_fd *sfp,
 	     struct sg_request *srp)
 {
 	int res = 0;
-	u32 rq_result = srp->rq_result;
+	u32 rq_res = srp->rq_result;
 	struct sg_header *h2p;
 	struct sg_slice_hdr3 *sh3p;
 	struct sg_header a_v2hdr;
@@ -2809,11 +2817,11 @@ sg_read_v1v2(void __user *buf, int count, struct sg_fd *sfp,
 	h2p->pack_len = h2p->reply_len; /* old, strange behaviour */
 	h2p->pack_id = sh3p->pack_id;
 	h2p->twelve_byte = (srp->cmd_opcode >= 0xc0 && sh3p->cmd_len == 12);
-	h2p->target_status = status_byte(rq_result);
-	h2p->host_status = host_byte(rq_result);
-	h2p->driver_status = driver_byte(rq_result);
-	if (unlikely((CHECK_CONDITION & status_byte(rq_result)) ||
-		     (DRIVER_SENSE & driver_byte(rq_result)))) {
+	h2p->target_status = status_byte(rq_res);
+	h2p->host_status = host_byte(rq_res);
+	h2p->driver_status = driver_byte(rq_res);
+	if (unlikely(!scsi_status_is_good(rq_res) ||
+		     (driver_byte(rq_res) & DRIVER_SENSE))) {
 		if (likely(srp->sense_bp)) {
 			u8 *sbp = srp->sense_bp;
 
@@ -2823,7 +2831,7 @@ sg_read_v1v2(void __user *buf, int count, struct sg_fd *sfp,
 			mempool_free(sbp, sg_sense_pool);
 		}
 	}
-	switch (unlikely(host_byte(rq_result))) {
+	switch (unlikely(host_byte(rq_res))) {
 	/*
 	 * This following setting of 'result' is for backward compatibility
 	 * and is best ignored by the user who should use target, host and
@@ -2847,7 +2855,7 @@ sg_read_v1v2(void __user *buf, int count, struct sg_fd *sfp,
 		h2p->result = EIO;
 		break;
 	case DID_ERROR:
-		h2p->result = (status_byte(rq_result) == GOOD) ? 0 : EIO;
+		h2p->result = sg_result_is_good(rq_res) ? 0 : EIO;
 		break;
 	default:
 		h2p->result = EIO;
@@ -2998,7 +3006,7 @@ static int
 sg_receive_v3(struct sg_fd *sfp, struct sg_request *srp, void __user *p)
 {
 	int err, err2;
-	int rq_result = srp->rq_result;
+	int rq_res = srp->rq_result;
 	struct sg_io_hdr hdr3;
 	struct sg_io_hdr *hp = &hdr3;
 
@@ -3012,11 +3020,11 @@ sg_receive_v3(struct sg_fd *sfp, struct sg_request *srp, void __user *p)
 	hp->resid = srp->in_resid;
 	hp->pack_id = srp->pack_id;
 	hp->duration = srp->duration;
-	hp->status = rq_result & 0xff;
-	hp->masked_status = status_byte(rq_result);
-	hp->msg_status = msg_byte(rq_result);
-	hp->host_status = host_byte(rq_result);
-	hp->driver_status = driver_byte(rq_result);
+	hp->status = rq_res & 0xff;
+	hp->masked_status = status_byte(rq_res);
+	hp->msg_status = msg_byte(rq_res);
+	hp->host_status = host_byte(rq_res);
+	hp->driver_status = driver_byte(rq_res);
 	err2 = put_sg_io_hdr(hp, p);
 	err = err ? err : err2;
 	sg_complete_v3v4(sfp, srp, err < 0);
@@ -3447,7 +3455,7 @@ sg_fill_request_element(struct sg_fd *sfp, struct sg_request *srp,
 		rip->duration = 0;
 	rip->orphan = test_bit(SG_FRQ_IS_ORPHAN, srp->frq_bm);
 	rip->sg_io_owned = test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm);
-	rip->problem = !!(srp->rq_result & SG_ML_RESULT_MSK);
+	rip->problem = !sg_result_is_good(srp->rq_result);
 	rip->pack_id = test_bit(SG_FFD_PREFER_TAG, sfp->ffd_bm) ?
 				srp->tag : srp->pack_id;
 	rip->usr_ptr = test_bit(SG_FRQ_IS_V4I, srp->frq_bm) ?
@@ -5315,7 +5323,7 @@ sg_rq_end_io(struct request *rqq, blk_status_t status)
 			srp->in_resid = a_resid;
 		}
 	}
-	if (unlikely(test_bit(SG_FRQ_ABORTING, srp->frq_bm)) && rq_result == 0)
+	if (unlikely(test_bit(SG_FRQ_ABORTING, srp->frq_bm)) && sg_result_is_good(rq_result))
 		srp->rq_result |= (DRIVER_HARD << 24);
 
 	SG_LOG(6, sfp, "%s: pack/tag_id=%d/%d, cmd=0x%x, res=0x%x\n", __func__,
@@ -5323,7 +5331,7 @@ sg_rq_end_io(struct request *rqq, blk_status_t status)
 	if (srp->start_ns > 0)	/* zero only when SG_FFD_NO_DURATION is set */
 		srp->duration = sg_calc_rq_dur(srp, test_bit(SG_FFD_TIME_IN_NS,
 							     sfp->ffd_bm));
-	if (unlikely((rq_result & SG_ML_RESULT_MSK) && slen > 0 &&
+	if (unlikely(!sg_result_is_good(rq_result) && slen > 0 &&
 		     test_bit(SG_FDEV_LOG_SENSE, sdp->fdev_bm))) {
 		if ((rq_result & 0xff) == SAM_STAT_CHECK_CONDITION ||
 		    (rq_result & 0xff) == SAM_STAT_COMMAND_TERMINATED)
@@ -6522,7 +6530,7 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var, int dxfr_len)
 		rs_st = atomic_read(&rs_rsv_srp->rq_st);
 		switch (rs_st) {
 		case SG_RQ_AWAIT_RCV:
-			if (unlikely(rs_rsv_srp->rq_result & SG_ML_RESULT_MSK)) {
+			if (!sg_result_is_good(rs_rsv_srp->rq_result)) {
 				/* read-side done but error occurred */
 				r_srp = ERR_PTR(-ENOSTR);
 				break;
-- 
2.25.1



* [PATCH v18 66/83] sg: split sg_setup_req
From: Douglas Gilbert @ 2021-04-27 21:57 UTC
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

The sg_setup_req() function was getting too long. It has been
split into a helper, sg_setup_req_ws_helper(), and a function
that keeps the original name.

Rename all pointers to struct request from rq to rqq, to better
distinguish them from the other rq_* variables. Use the
READ_ONCE()/WRITE_ONCE() macros for all accesses to srp->rqq.
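
The read-side state check that moves into sg_setup_req_ws_helper()
can be modeled in user space as below. This is a sketch, not driver
code: the RQ_* values are hypothetical stand-ins for enum sg_rq_state,
and a C11 compare-and-swap stands in for sg_rq_chg_state(). It shows
why, when two write-sides race for the same read-side request, only
the one that moves it from SHR_SWAP to SHR_IN_WS proceeds, while the
loser gets EADDRINUSE.

```c
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for the driver's enum sg_rq_state values. */
enum rq_state {
	RQ_INACTIVE, RQ_INFLIGHT, RQ_AWAIT_RCV, RQ_BUSY,
	RQ_SHR_SWAP, RQ_SHR_IN_WS,
};

/* Returns 0 if this caller claimed the read-side, else a negated errno. */
static int ws_claim_read_side(_Atomic int *rs_state)
{
	int st = atomic_load(rs_state);

	switch (st) {
	case RQ_SHR_SWAP:
		break;			/* read-side ready to be shared */
	case RQ_AWAIT_RCV:
	case RQ_INFLIGHT:
	case RQ_BUSY:
		return -EBUSY;		/* too early for a write-side request */
	case RQ_INACTIVE:
		return -EADDRNOTAVAIL;	/* no read-side request to pair with */
	case RQ_SHR_IN_WS:
	default:
		return -EADDRINUSE;	/* another write-side already has it */
	}
	/* Only one competing write-side can move SHR_SWAP -> SHR_IN_WS. */
	if (!atomic_compare_exchange_strong(rs_state, &st, RQ_SHR_IN_WS))
		return -EADDRINUSE;
	return 0;
}
```

In the driver, the winning write-side then checks that its data
length does not exceed the read-side's (E2BIG otherwise), as the
rewritten sg_setup_req() does.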

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 331 +++++++++++++++++++++++-----------------------
 1 file changed, 162 insertions(+), 169 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index ca6af752b23d..dcb9afe722c2 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -343,7 +343,7 @@ struct sg_mrq_hold {	/* for passing context between mrq functions */
 };
 
 /* tasklet or soft irq callback */
-static void sg_rq_end_io(struct request *rq, blk_status_t status);
+static void sg_rq_end_io(struct request *rqq, blk_status_t status);
 /* Declarations of other static functions used before they are defined */
 static int sg_proc_init(void);
 static void sg_dfs_init(void);
@@ -3667,6 +3667,7 @@ sg_abort_req(struct sg_fd *sfp, struct sg_request *srp)
 {
 	int res = 0;
 	enum sg_rq_state rq_st;
+	struct request *rqq;
 
 	if (test_and_set_bit(SG_FRQ_ABORTING, srp->frq_bm)) {
 		SG_LOG(1, sfp, "%s: already aborting req pack_id/tag=%d/%d\n",
@@ -3691,14 +3692,11 @@ sg_abort_req(struct sg_fd *sfp, struct sg_request *srp)
 		break;		/* nothing to do here, return 0 */
 	case SG_RQ_INFLIGHT:	/* only attempt abort if inflight */
 		srp->rq_result |= (DRIVER_SOFT << 24);
-		{
-			struct request *rqq = READ_ONCE(srp->rqq);
-
-			if (likely(rqq)) {
-				SG_LOG(5, sfp, "%s: -->blk_abort_request srp=0x%pK\n",
-				       __func__, srp);
-				blk_abort_request(rqq);
-			}
+		rqq = READ_ONCE(srp->rqq);
+		if (likely(rqq)) {
+			SG_LOG(5, sfp, "%s: -->blk_abort_request srp=0x%pK\n",
+			       __func__, srp);
+			blk_abort_request(rqq);
 		}
 		break;
 	default:
@@ -4116,6 +4114,7 @@ sg_set_reserved_sz(struct sg_fd *sfp, int want_rsv_sz)
 						      o_srp->frq_bm));
 				*rapp = n_srp;
 				sg_rq_chg_state_force_ulck(n_srp, SG_RQ_INACTIVE);
+				/* no bump of sfp->inactives since replacement */
 				xa_unlock_irqrestore(xafp, iflags);
 				SG_LOG(6, sfp, "%s: new rsv srp=0x%pK ++\n",
 				       __func__, n_srp);
@@ -4225,7 +4224,7 @@ sg_take_snap(struct sg_fd *sfp, bool clear_first)
 	}
 #if IS_ENABLED(SG_PROC_OR_DEBUG_FS)
 	if (true) {	/* for some declarations */
-		int n, prevlen, bp_len;
+		int prevlen, bp_len;
 		char *bp;
 
 		prevlen = strlen(snapped_buf);
@@ -5333,8 +5332,8 @@ sg_rq_end_io(struct request *rqq, blk_status_t status)
 							     sfp->ffd_bm));
 	if (unlikely(!sg_result_is_good(rq_result) && slen > 0 &&
 		     test_bit(SG_FDEV_LOG_SENSE, sdp->fdev_bm))) {
-		if ((rq_result & 0xff) == SAM_STAT_CHECK_CONDITION ||
-		    (rq_result & 0xff) == SAM_STAT_COMMAND_TERMINATED)
+		if ((rq_result & 0xfe) == SAM_STAT_CHECK_CONDITION ||
+		    (rq_result & 0xfe) == SAM_STAT_COMMAND_TERMINATED)
 			__scsi_print_sense(sdp->device, __func__, scsi_rp->sense, slen);
 	}
 	if (unlikely(slen > 0)) {
@@ -5825,8 +5824,7 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 	 * blk_get_request(BLK_MQ_REQ_NOWAIT) yields EAGAIN (aka EWOULDBLOCK).
 	 */
 	rqq = blk_get_request(q, (r0w ? REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN),
-			      (test_bit(SG_FFD_MORE_ASYNC, sfp->ffd_bm) ?
-						BLK_MQ_REQ_NOWAIT : 0));
+			      (test_bit(SG_FFD_MORE_ASYNC, sfp->ffd_bm) ?  BLK_MQ_REQ_NOWAIT : 0));
 	if (IS_ERR(rqq)) {
 		res = PTR_ERR(rqq);
 		goto err_pre_blk_get;
@@ -5971,7 +5969,7 @@ sg_finish_scsi_blk_rq(struct sg_request *srp)
 	}
 
 	/* Expect blk_put_request(rqq) already called in sg_rq_end_io() */
-	if (rqq) {       /* blk_get_request() may have failed */
+	if (rqq) {	/* blk_get_request() may have failed */
 		WRITE_ONCE(srp->rqq, NULL);
 		if (scsi_req(rqq))
 			scsi_req_free_cmd(scsi_req(rqq));
@@ -6440,172 +6438,145 @@ sg_build_reserve(struct sg_fd *sfp, int buflen)
 	} while (true);
 }
 
+static struct sg_request *
+sg_setup_req_ws_helper(struct sg_fd *fp, int rsv_idx)
+{
+	int res;
+	struct sg_request *r_srp;
+	enum sg_rq_state rs_sr_st;
+	struct sg_fd *rs_sfp = sg_fd_share_ptr(fp);
+
+	if (unlikely(!rs_sfp))
+		return ERR_PTR(-EPROTO);
+	/*
+	 * There may be contention with another potential write-side trying
+	 * to pair with this read-side. The loser will receive an
+	 * EADDRINUSE errno. The winner advances read-side's rq_state:
+	 *     SG_RQ_SHR_SWAP --> SG_RQ_SHR_IN_WS
+	 */
+	if (rsv_idx >= 0)
+		r_srp = rs_sfp->rsv_arr[rsv_idx];
+	else
+		r_srp = sg_get_probable_read_side(rs_sfp);
+	if (unlikely(!r_srp))
+		return ERR_PTR(-ENOSTR);
+
+	rs_sr_st = atomic_read(&r_srp->rq_st);
+	switch (rs_sr_st) {
+	case SG_RQ_SHR_SWAP:
+		break;
+	case SG_RQ_AWAIT_RCV:
+	case SG_RQ_INFLIGHT:
+	case SG_RQ_BUSY:
+		return ERR_PTR(-EBUSY);	/* too early for write-side req */
+	case SG_RQ_INACTIVE:
+		SG_LOG(1, fp, "%s: write-side finds read-side inactive\n",
+		       __func__);
+		return ERR_PTR(-EADDRNOTAVAIL);
+	case SG_RQ_SHR_IN_WS:
+		SG_LOG(1, fp, "%s: write-side find read-side shr_in_ws\n",
+		       __func__);
+		return ERR_PTR(-EADDRINUSE);
+	}
+	res = sg_rq_chg_state(r_srp, rs_sr_st, SG_RQ_SHR_IN_WS);
+	if (unlikely(res))
+		return ERR_PTR(-EADDRINUSE);
+	return r_srp;
+}
+
 /*
  * Setup an active request (soon to carry a SCSI command) to the current file
- * descriptor by creating a new one or re-using a request from the free
- * list (fl). If successful returns a valid pointer to a sg_request object
- * which is in the SG_RQ_BUSY state. On failure returns a negated errno value
- * twisted by ERR_PTR() macro. Note that once a file share is established,
- * the read-side's reserve request can only be used in a request share.
+ * descriptor by creating a new one or re-using a request marked inactive.
+ * If successful returns a valid pointer to a sg_request object which is in
+ * the SG_RQ_BUSY state. On failure returns a negated errno value twisted by
+ * ERR_PTR() macro. Note that once a file share is established, the read-side
+ * side's reserve request can only be used in a request share.
  */
 static struct sg_request *
 sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var, int dxfr_len)
 {
-	bool act_empty = false;
 	bool allow_rsv = true;		/* see note above */
 	bool mk_new_srp = true;
 	bool new_rsv_srp = false;
+	bool no_reqs = false;
 	bool ws_rq = false;
+	bool some_inactive = false;
 	bool try_harder = false;
 	bool second = false;
-	bool has_inactive = false;
 	bool is_rsv;
 	int ra_idx = 0;
-	int res, l_used_idx;
+	int l_used_idx;
 	u32 sum_dlen;
 	unsigned long idx, s_idx, end_idx, iflags;
 	enum sg_rq_state sr_st;
-	enum sg_rq_state rs_st = SG_RQ_INACTIVE;
 	struct sg_fd *fp = cwrp->sfp;
-	struct sg_request *r_srp = NULL; /* returned value won't be NULL */
-	struct sg_request *low_srp = NULL;
+	struct sg_request *r_srp; /* returned value won't be NULL */
 	struct sg_request *rs_rsv_srp = NULL;
-	struct sg_fd *rs_sfp = NULL;
 	struct xarray *xafp = &fp->srp_arr;
-	__maybe_unused const char *cp = NULL;
+	__maybe_unused const char *cp = "";
 	__maybe_unused char b[64];
 
-	b[0] = '\0';
 	switch (sh_var) {
-	case SG_SHR_NONE:
-	case SG_SHR_WS_NOT_SRQ:
-		break;
 	case SG_SHR_RS_RQ:
-		if (test_bit(SG_FFD_RESHARE, fp->ffd_bm))
-			ra_idx = 0;
-		else
-			ra_idx = sg_get_idx_available(fp);
+		cp = "rs_rq";
+		ra_idx = (test_bit(SG_FFD_RESHARE, fp->ffd_bm)) ? 0 : sg_get_idx_available(fp);
 		if (ra_idx < 0) {
 			new_rsv_srp = true;
-			cp = "m_rq";
 			goto good_fini;
 		}
 		r_srp = fp->rsv_arr[ra_idx];
 		sr_st = atomic_read(&r_srp->rq_st);
 		if (sr_st == SG_RQ_INACTIVE) {
-			res = sg_rq_chg_state(r_srp, sr_st, SG_RQ_BUSY);
-			if (likely(res == 0)) {
+			int res = sg_rq_chg_state(r_srp, sr_st, SG_RQ_BUSY);
+
+			if (unlikely(res)) {
+				r_srp = NULL;
+			} else {
 				r_srp->sh_srp = NULL;
 				mk_new_srp = false;
-				cp = "rs_rq";
-				goto good_fini;
 			}
+		} else {
+			SG_LOG(1, fp, "%s: no reserve request available\n", __func__);
+			r_srp = ERR_PTR(-EFBIG);
 		}
-		/* Did not find the reserve request available */
-		r_srp = ERR_PTR(-EFBIG);
-		break;
-	case SG_SHR_RS_NOT_SRQ:
-		allow_rsv = false;
-		break;
+		if (IS_ERR(r_srp))
+			goto err_out;
+		if (mk_new_srp)
+			new_rsv_srp = true;
+		goto good_fini;
 	case SG_SHR_WS_RQ:
-		rs_sfp = sg_fd_share_ptr(fp);
-		if (unlikely(!rs_sfp)) {
-			r_srp = ERR_PTR(-EPROTO);
-			break;
-		}
-		/*
-		 * There may be contention with another potential write-side trying
-		 * to pair with this read-side. The loser will receive an
-		 * EADDRINUSE errno. The winner advances read-side's rq_state:
-		 *     SG_RQ_SHR_SWAP --> SG_RQ_SHR_IN_WS
-		 */
-		if (cwrp->rsv_idx >= 0)
-			rs_rsv_srp = rs_sfp->rsv_arr[cwrp->rsv_idx];
-		else
-			rs_rsv_srp = sg_get_probable_read_side(rs_sfp);
-		if (!rs_rsv_srp) {
-			r_srp = ERR_PTR(-ENOSTR);
-			break;
-		}
-		rs_st = atomic_read(&rs_rsv_srp->rq_st);
-		switch (rs_st) {
-		case SG_RQ_AWAIT_RCV:
-			if (!sg_result_is_good(rs_rsv_srp->rq_result)) {
-				/* read-side done but error occurred */
-				r_srp = ERR_PTR(-ENOSTR);
-				break;
-			}
-			ws_rq = true;
-			break;
-		case SG_RQ_SHR_SWAP:
-			ws_rq = true;
-			if (unlikely(rs_st == SG_RQ_AWAIT_RCV))
-				break;
-			res = sg_rq_chg_state(rs_rsv_srp, rs_st, SG_RQ_SHR_IN_WS);
-			if (unlikely(res))
-				r_srp = ERR_PTR(-EADDRINUSE);
-			break;
-		case SG_RQ_INFLIGHT:
-		case SG_RQ_BUSY:
-			SG_LOG(6, fp, "%s: write-side finds read-side: %s\n", __func__,
-			       sg_rq_st_str(rs_st, true));
-			r_srp = ERR_PTR(-EBUSY);
-			break;
-		case SG_RQ_INACTIVE:
-			r_srp = ERR_PTR(-EADDRNOTAVAIL);
-			break;
-		case SG_RQ_SHR_IN_WS:
-		default:
-			r_srp = ERR_PTR(-EADDRINUSE);
-			break;
-		}
-		break;
-	}
-	if (IS_ERR(r_srp)) {
-		if (PTR_ERR(r_srp) == -EBUSY)
+		cp = "rs_rq";
+		rs_rsv_srp = sg_setup_req_ws_helper(fp, cwrp->rsv_idx);
+		if (IS_ERR(rs_rsv_srp)) {
+			r_srp = rs_rsv_srp;
 			goto err_out;
-#if IS_ENABLED(SG_LOG_ACTIVE)
-		if (sh_var == SG_SHR_RS_RQ) {
-			snprintf(b, sizeof(b), "SG_SHR_RS_RQ --> sr_st=%s",
-				 sg_rq_st_str(sr_st, false));
-		} else if (sh_var == SG_SHR_WS_RQ && rs_sfp) {
-			char c[32];
-			const char *ccp;
-
-			if (rs_rsv_srp)
-				ccp = sg_get_rsv_str(rs_rsv_srp, "[", "]",
-						     sizeof(c), c);
-			else
-				ccp = "? ";
-			snprintf(b, sizeof(b), "SHR_WS_RQ --> rs_sr%s_st=%s",
-				 ccp, sg_rq_st_str(rs_st, false));
-		} else {
-			snprintf(b, sizeof(b), "sh_var=%s",
-				 sg_shr_str(sh_var, false));
 		}
-#endif
-		goto err_out;
-	}
-	cp = "";
-
-	if (ws_rq) {	/* write-side dlen may be <= read-side's dlen */
+		/* write-side dlen may be <= read-side's dlen */
 		if (unlikely(dxfr_len > rs_rsv_srp->sgatp->dlen)) {
-			SG_LOG(4, fp, "%s: write-side dlen [%d] > read-side dlen\n",
+			SG_LOG(1, fp, "%s: bad, write-side dlen [%d] > read-side's\n",
 			       __func__, dxfr_len);
 			r_srp = ERR_PTR(-E2BIG);
 			goto err_out;
 		}
+		ws_rq = true;
 		dxfr_len = 0;	/* any srp for write-side will do, pick smallest */
+		break;
+	case SG_SHR_RS_NOT_SRQ:
+		allow_rsv = false;
+		break;
+	default:
+		break;
 	}
 
 start_again:
-	cp = "";
 	if (xa_empty(xafp)) {
-		act_empty = true;
+		no_reqs = true;
 		mk_new_srp = true;
 	} else if (atomic_read(&fp->inactives) <= 0) {
 		mk_new_srp = true;
 	} else if (likely(!try_harder) && dxfr_len < SG_DEF_SECTOR_SZ) {
+		struct sg_request *low_srp = NULL;
+
 		l_used_idx = READ_ONCE(fp->low_used_idx);
 		s_idx = (l_used_idx < 0) ? 0 : l_used_idx;
 		if (l_used_idx >= 0 && xa_get_mark(xafp, s_idx, SG_XA_RQ_INACTIVE)) {
@@ -6622,53 +6593,75 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var, int dxfr_len)
 			}
 		}
 		xa_for_each_marked(xafp, idx, r_srp, SG_XA_RQ_INACTIVE) {
-			has_inactive = true;
-			if (!allow_rsv &&
-			    test_bit(SG_FRQ_RESERVED, r_srp->frq_bm))
-				continue;
-			if (!low_srp && dxfr_len < SG_DEF_SECTOR_SZ) {
-				low_srp = r_srp;
-				break;
+			if (allow_rsv || !test_bit(SG_FRQ_RESERVED, r_srp->frq_bm)) {
+				if (r_srp->sgat_h.buflen <= SG_DEF_SECTOR_SZ) {
+					if (sg_rq_chg_state(r_srp, SG_RQ_INACTIVE, SG_RQ_BUSY))
+						continue;
+					mk_new_srp = false;
+					break;
+				} else if (!low_srp) {
+					low_srp = r_srp;
+				}
 			}
 		}
-		/* If dxfr_len is small, use lowest inactive request */
-		if (low_srp) {
+		if (mk_new_srp && low_srp) {	/* no candidate yet */
+			/* take non-NULL low_srp, irrespective of r_srp->sgat_h.buflen size */
 			r_srp = low_srp;
-			if (unlikely(sg_rq_chg_state(r_srp, SG_RQ_INACTIVE, SG_RQ_BUSY)))
-				goto start_again; /* gone to another thread */
-			atomic_dec(&fp->inactives);
-			cp = "lowest inactive in srp_arr";
-			mk_new_srp = false;
+			if (sg_rq_chg_state(r_srp, SG_RQ_INACTIVE, SG_RQ_BUSY) == 0) {
+				mk_new_srp = false;
+				atomic_dec(&fp->inactives);
+			}
 		}
 	} else {
+		cp = "larger from srp_arr";
 		l_used_idx = READ_ONCE(fp->low_used_idx);
 		s_idx = (l_used_idx < 0) ? 0 : l_used_idx;
 		idx = s_idx;
 		end_idx = ULONG_MAX;
+
+		if (allow_rsv) {
 second_time:
-		for (r_srp = xa_find(xafp, &idx, end_idx, SG_XA_RQ_INACTIVE);
-		     r_srp;
-		     r_srp = xa_find_after(xafp, &idx, end_idx, SG_XA_RQ_INACTIVE)) {
-			if (!allow_rsv &&
-			    test_bit(SG_FRQ_RESERVED, r_srp->frq_bm))
-				continue;
-			if (r_srp->sgat_h.buflen >= dxfr_len) {
-				if (sg_rq_chg_state(r_srp, SG_RQ_INACTIVE, SG_RQ_BUSY))
-					continue;
-				atomic_dec(&fp->inactives);
-				WRITE_ONCE(fp->low_used_idx, idx + 1);
-				cp = "near front of srp_arr";
-				mk_new_srp = false;
-				break;
+			for (r_srp = xa_find(xafp, &idx, end_idx, SG_XA_RQ_INACTIVE);
+			     r_srp;
+			     r_srp = xa_find_after(xafp, &idx, end_idx, SG_XA_RQ_INACTIVE)) {
+				if (dxfr_len <= r_srp->sgat_h.buflen) {
+					if (sg_rq_chg_state(r_srp, SG_RQ_INACTIVE, SG_RQ_BUSY))
+						continue;
+					atomic_dec(&fp->inactives);
+					WRITE_ONCE(fp->low_used_idx, idx + 1);
+					mk_new_srp = false;
+					break;
+				}
+			}
+			if (!r_srp && !second && s_idx > 0) {
+				end_idx = s_idx - 1;
+				s_idx = 0;
+				idx = s_idx;
+				second = true;
+				goto second_time;
+			}
+		} else {
+second_time_2:
+			for (r_srp = xa_find(xafp, &idx, end_idx, SG_XA_RQ_INACTIVE);
+			     r_srp;
+			     r_srp = xa_find_after(xafp, &idx, end_idx, SG_XA_RQ_INACTIVE)) {
+				if (dxfr_len <= r_srp->sgat_h.buflen &&
+				    !test_bit(SG_FRQ_RESERVED, r_srp->frq_bm)) {
+					if (sg_rq_chg_state(r_srp, SG_RQ_INACTIVE, SG_RQ_BUSY))
+						continue;
+					atomic_dec(&fp->inactives);
+					WRITE_ONCE(fp->low_used_idx, idx + 1);
+					mk_new_srp = false;
+					break;
+				}
+			}
+			if (!r_srp && !second && s_idx > 0) {
+				end_idx = s_idx - 1;
+				s_idx = 0;
+				idx = s_idx;
+				second = true;
+				goto second_time_2;
 			}
-		}
-		/* If not found so far, need to wrap around and search [0 ... start_idx) */
-		if (!r_srp && !second && s_idx > 0) {
-			end_idx = s_idx - 1;
-			s_idx = 0;
-			idx = s_idx;
-			second = true;
-			goto second_time;
 		}
 	}
 have_existing:
@@ -6713,10 +6706,10 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var, int dxfr_len)
 		}
 		if (IS_ERR(r_srp))	/* NULL is _not_ an ERR here */
 			goto err_out;
-		r_srp = sg_mk_srp_sgat(fp, act_empty, dxfr_len);
+		r_srp = sg_mk_srp_sgat(fp, no_reqs, dxfr_len);
 		if (IS_ERR(r_srp)) {
 			if (!try_harder && dxfr_len < SG_DEF_SECTOR_SZ &&
-			    has_inactive) {
+			    some_inactive) {
 				try_harder = true;
 				goto start_again;
 			}
@@ -6777,7 +6770,7 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var, int dxfr_len)
 		if (err == EBUSY)
 			SG_LOG(4, fp, "%s: EBUSY (as ptr err)\n", __func__);
 		else
-			SG_LOG(1, fp, "%s: %s err=%d\n", __func__, b, err);
+			SG_LOG(1, fp, "%s: err=%d\n", __func__, err);
 	} else {
 		SG_LOG(4, fp, "%s: %s %sr_srp=0x%pK\n", __func__, cp,
 		       sg_get_rsv_str_lck(r_srp, "[", "] ", sizeof(b), b),
@@ -6911,9 +6904,9 @@ sg_add_sfp(struct sg_device *sdp, struct file *filp)
 		}
 		srp->rq_idx = idx;
 		srp->parentfp = sfp;
+		__set_bit(SG_FRQ_RESERVED, srp->frq_bm);
 		sg_rq_chg_state_force_ulck(srp, SG_RQ_INACTIVE);
 		atomic_inc(&sfp->inactives);
-		__set_bit(SG_FRQ_RESERVED, srp->frq_bm);
 		xa_unlock_irqrestore(xafp, iflags);
 	}
 	if (!reduced) {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 86+ messages in thread

* [PATCH v18 67/83] sg: finish after read-side request
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
                   ` (66 preceding siblings ...)
  2021-04-27 21:57 ` [PATCH v18 66/83] sg: split sg_setup_req Douglas Gilbert
@ 2021-04-27 21:57 ` Douglas Gilbert
  2021-04-27 21:57 ` [PATCH v18 68/83] sg: keep share and dout offset flags Douglas Gilbert
                   ` (15 subsequent siblings)
  83 siblings, 0 replies; 86+ messages in thread
From: Douglas Gilbert @ 2021-04-27 21:57 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

Replace the dual-role sg_change_after_read_side_rq() with the
single-role sg_finish_rs_rq(). Its purpose is to terminate a request
share after its first half (i.e. the read-side) has completed. The
termination makes the read-side's reserve request available for
re-use.
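
The loop in sg_finish_rs_rq() can be modelled in plain user-space C to
show its single role. This is only a sketch under simplifying
assumptions. The rq_state values, MAX_RSV (a stand-in for
SG_MAX_RSV_REQS, whose real value is not shown here), struct rsv_req
and finish_rs_model() are hypothetical. Locking, logging and the real
request teardown are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical subset of the driver's enum sg_rq_state */
enum rq_state { RQ_INACTIVE, RQ_BUSY, RQ_AWAIT_RCV, RQ_SHR_SWAP, RQ_SHR_IN_WS };

#define MAX_RSV 4	/* assumed stand-in for SG_MAX_RSV_REQS */

struct rsv_req {		/* hypothetical reserve request slot */
	int present;		/* 0 models an IS_ERR_OR_NULL() slot */
	enum rq_state st;
};

/*
 * Scan the reserve requests in order. The first one in RQ_SHR_SWAP
 * (read-side done, waiting for a write-side) is made inactive so it can
 * be re-used, and 0 is returned. Otherwise the outcome for the last slot
 * scanned is returned: -EBUSY if a write-side owns it, else -EINVAL
 * (which includes an empty last slot).
 */
static int finish_rs_model(struct rsv_req *arr, size_t n)
{
	int res = -EINVAL;

	assert(arr != NULL || n == 0);
	for (size_t k = 0; k < n; ++k) {
		res = -EINVAL;
		if (!arr[k].present)
			continue;
		switch (arr[k].st) {
		case RQ_SHR_SWAP:
			arr[k].st = RQ_INACTIVE;	/* available for re-use */
			return 0;
		case RQ_SHR_IN_WS:	/* too late, write-side rq active */
		case RQ_BUSY:
			res = -EBUSY;
			break;
		default:
			res = -EINVAL;
			break;
		}
	}
	return res;
}
```

For example, with slot 0 in RQ_BUSY and slot 1 in RQ_SHR_SWAP, the
model returns 0 and leaves slot 1 in RQ_INACTIVE. In the real function
the SG_RQ_SHR_SWAP case also clears the share fields and calls
sg_finish_scsi_blk_rq() and sg_deact_request().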

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 90 +++++++++++++++++++++--------------------------
 1 file changed, 41 insertions(+), 49 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index dcb9afe722c2..13a9c3f77715 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -100,13 +100,13 @@ enum sg_rq_state {	/* N.B. sg_rq_state_arr assumes SG_RQ_AWAIT_RCV==2 */
 	SG_RQ_SHR_IN_WS,	/* read-side: waits while write-side inflight */
 };
 
-/* write-side sets up sharing: ioctl(ws_fd,SG_SET_GET_EXTENDED(SHARE_FD(rs_fd))) */
+/* these varieties of share requests are known before a request is created */
 enum sg_shr_var {
-	SG_SHR_NONE = 0,	/* no sharing on this fd, so _not_ shared request */
-	SG_SHR_RS_NOT_SRQ,	/* read-side fd but _not_ shared request */
-	SG_SHR_RS_RQ,		/* read-side sharing on this request */
-	SG_SHR_WS_NOT_SRQ,	/* write-side fd but _not_ shared request */
-	SG_SHR_WS_RQ,		/* write-side sharing on this request */
+	SG_SHR_NONE = 0,	/* no sharing on owning fd */
+	SG_SHR_RS_NOT_SRQ,	/* read-side sharing on fd but not on this req */
+	SG_SHR_RS_RQ,		/* read-side sharing on this data carrying req */
+	SG_SHR_WS_NOT_SRQ,	/* write-side sharing on fd but not on this req */
+	SG_SHR_WS_RQ,		/* write-side sharing on this data carrying req */
 };
 
 /* If sum_of(dlen) of a fd exceeds this, write() will yield E2BIG */
@@ -1503,6 +1503,8 @@ sg_process_svb_mrq(struct sg_fd *fp, struct sg_fd *o_sfp,
 					--cop->din_resid;
 				if (srp->s_hdr4.dir != SG_DXFER_FROM_DEV)
 					continue;
+				if (test_bit(SG_FFD_READ_SIDE_ERR, fp->ffd_bm))
+					continue;
 				/* read-side req completed, submit its write-side */
 				rs_srp = srp;
 				for (m = 0; m < k; ++m) {
@@ -3077,19 +3079,18 @@ sg_calc_sgat_param(struct sg_device *sdp)
 }
 
 /*
- * Only valid for shared file descriptors, else -EINVAL. Should only be
- * called after a read-side request has successfully completed so that
- * there is valid data in reserve buffer. If fini1_again0 is true then
- * read-side is taken out of the state waiting for a write-side request and the
- * read-side is put in the inactive state. If fini1_again0 is false (0) then
- * the read-side (assuming it is inactive) is put in a state waiting for
- * a write-side request. This function is called when the write mask is set on
- * ioctl(SG_SET_GET_EXTENDED(SG_CTL_FLAGM_READ_SIDE_FINI)).
+ * Only valid for shared file descriptors. Designed to be called after a
+ * read-side request has successfully completed leaving valid data in a
+ * reserve request buffer. The read-side is moved from SG_RQ_SHR_SWAP
+ * to SG_RQ_INACTIVE state and 0 is returned. Acts on the first suitable
+ * reserve request. Otherwise -EINVAL is returned, unless the write-side is
+ * in progress, in which case -EBUSY is returned.
  */
 static int
-sg_change_after_read_side_rq(struct sg_fd *sfp, bool fini1_again0)
+sg_finish_rs_rq(struct sg_fd *sfp)
 {
 	int res = -EINVAL;
+	int k;
 	enum sg_rq_state sr_st;
 	unsigned long iflags;
 	struct sg_fd *rs_sfp;
@@ -3101,51 +3102,44 @@ sg_change_after_read_side_rq(struct sg_fd *sfp, bool fini1_again0)
 		goto fini;
 	if (xa_get_mark(&sdp->sfp_arr, sfp->idx, SG_XA_FD_RS_SHARE))
 		rs_sfp = sfp;
-	rs_rsv_srp = rs_sfp->rsv_arr[0];
-	if (IS_ERR_OR_NULL(rs_rsv_srp))
-		goto fini;
 
-	res = 0;
-	xa_lock_irqsave(&rs_sfp->srp_arr, iflags);
-	sr_st = atomic_read(&rs_rsv_srp->rq_st);
-	if (fini1_again0) {	/* finish req share after read-side req */
+	for (k = 0; k < SG_MAX_RSV_REQS; ++k) {
+		res = -EINVAL;
+		rs_rsv_srp = rs_sfp->rsv_arr[k];
+		if (IS_ERR_OR_NULL(rs_rsv_srp))
+			continue;
+		xa_lock_irqsave(&rs_sfp->srp_arr, iflags);
+		sr_st = atomic_read(&rs_rsv_srp->rq_st);
 		switch (sr_st) {
 		case SG_RQ_SHR_SWAP:
-			rs_rsv_srp->sh_var = SG_SHR_RS_NOT_SRQ;
-			rs_rsv_srp = NULL;
-			res = sg_rq_chg_state(rs_rsv_srp, sr_st, SG_RQ_INACTIVE);
+			res = sg_rq_chg_state_ulck(rs_rsv_srp, sr_st, SG_RQ_BUSY);
 			if (!res)
 				atomic_inc(&rs_sfp->inactives);
+			rs_rsv_srp->tag = SG_TAG_WILDCARD;
+			rs_rsv_srp->sh_var = SG_SHR_NONE;
+			set_bit(SG_FRQ_RESERVED, rs_rsv_srp->frq_bm);
+			rs_rsv_srp->in_resid = 0;
+			rs_rsv_srp->rq_info = 0;
+			rs_rsv_srp->sense_len = 0;
+			rs_rsv_srp->sh_srp = NULL;
+			sg_finish_scsi_blk_rq(rs_rsv_srp);
+			sg_deact_request(rs_rsv_srp->parentfp, rs_rsv_srp);
 			break;
 		case SG_RQ_SHR_IN_WS:	/* too late, write-side rq active */
 		case SG_RQ_BUSY:
-			res = -EAGAIN;
+			res = -EBUSY;
 			break;
 		default:
 			res = -EINVAL;
 			break;
 		}
-	} else {	/* again: tweak state to allow another write-side request */
-		switch (sr_st) {
-		case SG_RQ_INACTIVE:
-			rs_rsv_srp->sh_var = SG_SHR_RS_RQ;
-			res = sg_rq_chg_state(rs_rsv_srp, sr_st, SG_RQ_SHR_SWAP);
-			break;
-		case SG_RQ_SHR_SWAP:
-			break;	/* already done, redundant call? */
-		default:	/* all other states */
-			res = -EBUSY;	/* read-side busy doing ... */
-			break;
-		}
+		xa_unlock_irqrestore(&rs_sfp->srp_arr, iflags);
+		if (res == 0)
+			return res;
 	}
-	xa_unlock_irqrestore(&rs_sfp->srp_arr, iflags);
 fini:
-	if (unlikely(res)) {
+	if (unlikely(res))
 		SG_LOG(1, sfp, "%s: err=%d\n", __func__, -res);
-	} else {
-		SG_LOG(6, sfp, "%s: okay, fini1_again0=%d\n", __func__,
-		       fini1_again0);
-	}
 	return res;
 }
 
@@ -4354,11 +4348,9 @@ sg_extended_bool_flags(struct sg_fd *sfp, struct sg_extended_info *seip)
 			c_flgs_val_out &= ~SG_CTL_FLAGM_READ_SIDE_FINI;
 		}
 	}
-	if (c_flgs_wm & SG_CTL_FLAGM_READ_SIDE_FINI) {
-		bool rs_fini_wm = !!(c_flgs_val_in & SG_CTL_FLAGM_READ_SIDE_FINI);
-
-		res = sg_change_after_read_side_rq(sfp, rs_fini_wm);
-	}
+	if ((c_flgs_wm & SG_CTL_FLAGM_READ_SIDE_FINI) &&
+	    (c_flgs_val_in & SG_CTL_FLAGM_READ_SIDE_FINI))
+		res = sg_finish_rs_rq(sfp);
 	/* READ_SIDE_ERR boolean, [ro] share: read-side finished with error */
 	if (c_flgs_rm & SG_CTL_FLAGM_READ_SIDE_ERR) {
 		rs_sfp = sg_fd_share_ptr(sfp);
-- 
2.25.1


* [PATCH v18 68/83] sg: keep share and dout offset flags
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
                   ` (67 preceding siblings ...)
  2021-04-27 21:57 ` [PATCH v18 67/83] sg: finish after read-side request Douglas Gilbert
@ 2021-04-27 21:57 ` Douglas Gilbert
  2021-04-27 21:57 ` [PATCH v18 69/83] sg: add dlen to sg_comm_wr_t Douglas Gilbert
                   ` (14 subsequent siblings)
  83 siblings, 0 replies; 86+ messages in thread
From: Douglas Gilbert @ 2021-04-27 21:57 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

Add the ability to have a single READ followed by one or more
WRITEs. This is done by setting the SGV4_FLAG_KEEP_SHARE flag
on all but the last WRITE request.
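
The ordering that a multiple requests (mrq) invocation must follow
when using SGV4_FLAG_KEEP_SHARE can be sketched as a check function.
This hedged model merges two checks from this patch, sg_mrq_svb_chk()
and the new last-request test in sg_mrq_sanity(), and follows the rules
stated in the comment above sg_mrq_svb_chk(). struct mrq_req and
mrq_svb_ok() are hypothetical. The flag values are those in
include/uapi/scsi/sg.h after this patch, with SGV4_FLAG_NO_DXFER taking
the value of SG_FLAG_NO_DXFER (0x10000).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values from include/uapi/scsi/sg.h as changed by this patch */
#define SGV4_FLAG_COMPLETE_B4	0x100
#define SGV4_FLAG_SHARE		0x4000
#define SGV4_FLAG_DO_ON_OTHER	0x8000
#define SGV4_FLAG_NO_DXFER	0x10000	/* == SG_FLAG_NO_DXFER */
#define SGV4_FLAG_KEEP_SHARE	0x20000

struct mrq_req {		/* hypothetical subset of struct sg_io_v4 */
	uint32_t flags;
	uint32_t din_xfer_len;
	uint32_t dout_xfer_len;
};

/* True if the array follows the READ-then-WRITE(s) sharing rules */
static bool mrq_svb_ok(const struct mrq_req *r, size_t n)
{
	const uint32_t ws_need = SGV4_FLAG_NO_DXFER | SGV4_FLAG_SHARE |
				 SGV4_FLAG_DO_ON_OTHER;
	bool seen_rd = false;	/* has any din (READ) been seen? */
	bool last_rd = false;	/* was the latest data transfer a din? */

	assert(r != NULL || n == 0);
	for (size_t k = 0; k < n; ++k) {
		uint32_t f = r[k].flags;

		if (f & SGV4_FLAG_COMPLETE_B4)
			return false;
		if (r[k].dout_xfer_len > 0) {
			/* need a prior din and all three write-side flags */
			if (!seen_rd || (f & ws_need) != ws_need)
				return false;
			last_rd = false;
		}
		if (r[k].din_xfer_len > 0) {
			if (!(f & SGV4_FLAG_SHARE) || (f & SGV4_FLAG_DO_ON_OTHER))
				return false;
			seen_rd = true;
			last_rd = true;
		}
	}
	/* sg_mrq_sanity() rejects KEEP_SHARE on the last request (-ERANGE) */
	if (n > 0 && (r[n - 1].flags & SGV4_FLAG_KEEP_SHARE))
		return false;
	return !last_rd;	/* a din may not be the last data transfer */
}
```

A READ followed by two WRITEs, where only the first WRITE sets
SGV4_FLAG_KEEP_SHARE, passes. Setting SGV4_FLAG_KEEP_SHARE on the final
WRITE, starting with a WRITE, or ending the array with a READ fails.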

Further, each WRITE (a "dout" operation) may start at a byte
offset into the shared buffer by placing that offset in
sg_io_v4::spare_in and setting the SGV4_FLAG_DOUT_OFFSET flag.
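
The effect of SGV4_FLAG_DOUT_OFFSET on which parts of the read-side's
pages a WRITE sends can be modelled as splitting a byte range into
per-page pieces. This sketch assumes a buffer of equal-sized pages, as
the reserve buffer's schp->pages[] array is used in sg_rq_map_kern().
struct seg and dout_segments() are hypothetical, and the driver itself
builds a bio with bio_add_pc_page() rather than filling an array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct seg {		/* one (page, offset, length) piece of the dout */
	size_t pg_idx;
	uint32_t off;
	uint32_t len;
};

/*
 * Split the byte range [wr_off, wr_off + wr_len) of a buffer made of
 * num_pages pages of pg_sz bytes into per-page segments. Whole pages
 * before wr_off are skipped and the first piece may start part way into
 * a page. Returns the number of segments written (at most max_segs).
 */
static size_t dout_segments(uint32_t pg_sz, size_t num_pages,
			    uint32_t wr_off, uint32_t wr_len,
			    struct seg *segs, size_t max_segs)
{
	size_t n = 0;
	size_t k;
	uint32_t off;

	assert(pg_sz > 0);
	k = wr_off / pg_sz;	/* whole pages skipped */
	off = wr_off % pg_sz;	/* start offset within page k */
	while (wr_len > 0 && k < num_pages && n < max_segs) {
		uint32_t ln = pg_sz - off;

		if (ln > wr_len)
			ln = wr_len;
		segs[n].pg_idx = k;
		segs[n].off = off;
		segs[n].len = ln;
		++n;
		wr_len -= ln;
		off = 0;	/* later pages are used from their start */
		++k;
	}
	return n;
}
```

For instance, with 4096-byte pages, wr_off=1000 and wr_len=8192 the
model yields three pieces: 3096 bytes from offset 1000 of page 0, all
of page 1, and the first 1000 bytes of page 2.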

The length of any shared WRITE may be less than or equal to the
length of the prior READ, and this also holds when the "dout"
offset is added to the WRITE's length.
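
The length rule corresponds to the check added to sg_setup_req(),
which fails a shared WRITE with -E2BIG when its length plus its dout
offset exceeds the read-side's data length. In this hedged sketch
shared_write_fits(), struct dout_req and first_bad_write() are
hypothetical, and the sum is widened to 64 bits (the driver adds two
ints) so that it cannot wrap.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* 0 if a shared WRITE of wr_len bytes at wr_off fits, else -E2BIG */
static int shared_write_fits(uint32_t rs_dlen, uint32_t wr_off,
			     uint32_t wr_len)
{
	/* widened to 64 bits so that wr_off + wr_len cannot wrap */
	if ((uint64_t)wr_off + wr_len > rs_dlen)
		return -E2BIG;
	return 0;
}

struct dout_req {	/* hypothetical: one WRITE of a shared READ */
	uint32_t off;
	uint32_t len;
};

/* Index of the first WRITE that would be rejected, or -1 if all fit */
static int first_bad_write(uint32_t rs_dlen, const struct dout_req *w,
			   size_t n)
{
	assert(w != NULL || n == 0);
	for (size_t k = 0; k < n; ++k)
		if (shared_write_fits(rs_dlen, w[k].off, w[k].len))
			return (int)k;
	return -1;
}
```

For a 64 KiB READ, WRITEs of 32768 bytes at offsets 0 and 32768 both
fit, while a WRITE of 30000 bytes at offset 40000 would be rejected.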

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c      | 244 ++++++++++++++++++++++++++++++-----------
 include/uapi/scsi/sg.h |   4 +-
 2 files changed, 181 insertions(+), 67 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 13a9c3f77715..1f6aae3909c7 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -220,6 +220,8 @@ struct sg_slice_hdr4 {	/* parts of sg_io_v4 object needed in async usage */
 	void __user *sbp;	/* derived from sg_io_v4::response */
 	u64 usr_ptr;		/* hold sg_io_v4::usr_ptr as given (u64) */
 	int out_resid;
+	u32 wr_offset;		/* from v4::spare_in when flagged; in bytes */
+	u32 wr_len;		/* for shared reqs maybe < read-side */
 	s16 dir;		/* data xfer direction; SG_DXFER_*  */
 	u16 cmd_len;		/* truncated of sg_io_v4::request_len */
 	u16 max_sb_len;		/* truncated of sg_io_v4::max_response_len */
@@ -315,9 +317,11 @@ struct sg_device { /* holds the state of each scsi generic device */
 };
 
 struct sg_comm_wr_t {  /* arguments to sg_common_write() */
+	bool keep_share;
 	int timeout;
 	int cmd_len;
 	int rsv_idx;		/* wanted rsv_arr index, def: -1 (anyone) */
+	int wr_offset;		/* non-zero if v4 and DOUT_OFFSET set */
 	unsigned long frq_bm[1];	/* see SG_FRQ_* defines above */
 	union {		/* selector is frq_bm.SG_FRQ_IS_V4I */
 		struct sg_io_hdr *h3p;
@@ -710,6 +714,14 @@ sg_release(struct inode *inode, struct file *filp)
 	return 0;
 }
 
+static inline void
+sg_comm_wr_init(struct sg_comm_wr_t *cwrp)
+{
+	memset(cwrp, 0, sizeof(*cwrp));
+	WRITE_ONCE(cwrp->frq_bm[0], 0);
+	cwrp->rsv_idx = -1;
+}
+
 /*
  * ***********************************************************************
  * write(2) related functions follow. They are shown before read(2) related
@@ -856,14 +868,12 @@ sg_write(struct file *filp, const char __user *p, size_t count, loff_t *ppos)
 			 __func__, ohp->reply_len - (int)SZ_SG_HEADER,
 			 input_size, (unsigned int)opcode, current->comm);
 	}
+	sg_comm_wr_init(&cwr);
 	cwr.h3p = h3p;
-	WRITE_ONCE(cwr.frq_bm[0], 0);
 	cwr.timeout = sfp->timeout;
 	cwr.cmd_len = cmd_size;
-	cwr.rsv_idx = -1;
 	cwr.sfp = sfp;
 	cwr.u_cmdp = p;
-	cwr.cmdp = NULL;
 	srp = sg_common_write(&cwr);
 	return (IS_ERR(srp)) ? PTR_ERR(srp) : (int)count;
 }
@@ -929,15 +939,13 @@ sg_submit_v3(struct sg_fd *sfp, struct sg_io_hdr *hp, bool sync,
 	if (test_bit(SG_FFD_NO_CMD_Q, sfp->ffd_bm))
 		clear_bit(SG_FFD_NO_CMD_Q, sfp->ffd_bm);
 	ul_timeout = msecs_to_jiffies(hp->timeout);
-	WRITE_ONCE(cwr.frq_bm[0], 0);
+	sg_comm_wr_init(&cwr);
 	__assign_bit(SG_FRQ_SYNC_INVOC, cwr.frq_bm, (int)sync);
 	cwr.h3p = hp;
 	cwr.timeout = min_t(unsigned long, ul_timeout, INT_MAX);
 	cwr.cmd_len = hp->cmd_len;
-	cwr.rsv_idx = -1;
 	cwr.sfp = sfp;
 	cwr.u_cmdp = hp->cmdp;
-	cwr.cmdp = NULL;
 	srp = sg_common_write(&cwr);
 	if (IS_ERR(srp))
 		return PTR_ERR(srp);
@@ -1118,6 +1126,7 @@ sg_mrq_sanity(struct sg_device *sdp, struct sg_io_v4 *cop,
 {
 	bool have_mrq_sense = (cop->response && cop->max_response_len);
 	bool share_on_oth = false;
+	bool last_is_keep_share = false;
 	bool share;
 	int k;
 	u32 cdb_alen = cop->request_len;
@@ -1136,6 +1145,7 @@ sg_mrq_sanity(struct sg_device *sdp, struct sg_io_v4 *cop,
 			       __func__, k, "bad guard");
 			return -ERANGE;
 		}
+		last_is_keep_share = !!(flags & SGV4_FLAG_KEEP_SHARE);
 		if (unlikely(flags & SGV4_FLAG_MULTIPLE_REQS)) {
 			SG_LOG(1, sfp, "%s: %s %u: no nested multi-reqs\n",
 			       __func__, rip, k);
@@ -1184,25 +1194,40 @@ sg_mrq_sanity(struct sg_device *sdp, struct sg_io_v4 *cop,
 			hp->max_response_len = cop->max_response_len;
 		}
 	}
+	if (last_is_keep_share) {
+		SG_LOG(1, sfp,
+		       "%s: Can't set SGV4_FLAG_KEEP_SHARE on last mrq req\n",
+		       __func__);
+		return -ERANGE;
+	}
 	if (share_on_othp)
 		*share_on_othp = share_on_othp;
 	return 0;
 }
 
+/*
+ * A read operation (din) must precede any write (dout) operations, and a
+ * din operation can't be the last data transferring operation. Non data
+ * transferring operations can appear anywhere. Data transferring operations
+ * must have SGV4_FLAG_SHARE set. Dout operations must additionally have
+ * SGV4_FLAG_NO_DXFER and SGV4_FLAG_DO_ON_OTHER set. Din operations must
+ * not set SGV4_FLAG_DO_ON_OTHER.
+ */
 static bool
 sg_mrq_svb_chk(struct sg_io_v4 *a_hds, u32 tot_reqs)
 {
-	bool expect_rd;
+	bool last_rd = false;
+	bool seen_wr = false;
 	int k;
 	u32 flags;
 	struct sg_io_v4 *hp;
 
 	/* expect read-write pairs, all with SGV4_FLAG_NO_DXFER set */
-	for (k = 0, hp = a_hds, expect_rd = true; k < tot_reqs; ++k, ++hp) {
+	for (k = 0, hp = a_hds; k < tot_reqs; ++k, ++hp) {
 		flags = hp->flags;
 		if (flags & (SGV4_FLAG_COMPLETE_B4))
 			return false;
-		if (expect_rd) {
+		if (!seen_wr) {
 			if (hp->dout_xfer_len > 0)
 				return false;
 			if (hp->din_xfer_len > 0) {
@@ -1210,7 +1235,8 @@ sg_mrq_svb_chk(struct sg_io_v4 *a_hds, u32 tot_reqs)
 					return false;
 				if (flags & SGV4_FLAG_DO_ON_OTHER)
 					return false;
-				expect_rd = false;
+				seen_wr = true;
+				last_rd = true;
 			}
 			/* allowing commands with no dxfer */
 		} else {	/* checking write side */
@@ -1219,43 +1245,46 @@ sg_mrq_svb_chk(struct sg_io_v4 *a_hds, u32 tot_reqs)
 				    (SGV4_FLAG_NO_DXFER | SGV4_FLAG_SHARE |
 				     SGV4_FLAG_DO_ON_OTHER))
 					return false;
-				expect_rd = true;
+				last_rd = false;
+			}
+			if (hp->din_xfer_len > 0) {
+				if (!(flags & SGV4_FLAG_SHARE))
+					return false;
+				if (flags & SGV4_FLAG_DO_ON_OTHER)
+					return false;
+				last_rd = true;
 			}
-			if (hp->din_xfer_len > 0)
-				return false;
 		}
 	}
-	if (!expect_rd)
-		return false;
-	return true;
+	return !last_rd;
 }
 
 static struct sg_request *
 sg_mrq_submit(struct sg_fd *rq_sfp, struct sg_mrq_hold *mhp, int pos_hdr,
-	      int rsv_idx)
+	      int rsv_idx, bool keep_share)
 {
 	unsigned long ul_timeout;
 	struct sg_comm_wr_t r_cwr;
 	struct sg_comm_wr_t *r_cwrp = &r_cwr;
 	struct sg_io_v4 *hp = mhp->a_hds + pos_hdr;
 
-	if (mhp->cdb_ap) {	/* already have array of cdbs */
+	sg_comm_wr_init(r_cwrp);
+	if (mhp->cdb_ap)	/* already have array of cdbs */
 		r_cwrp->cmdp = mhp->cdb_ap + (pos_hdr * mhp->cdb_mxlen);
-		r_cwrp->u_cmdp = NULL;
-	} else {	/* fetch each cdb from user space */
-		r_cwrp->cmdp = NULL;
+	else			/* fetch each cdb from user space */
 		r_cwrp->u_cmdp = cuptr64(hp->request);
-	}
 	r_cwrp->cmd_len = hp->request_len;
 	r_cwrp->rsv_idx = rsv_idx;
 	ul_timeout = msecs_to_jiffies(hp->timeout);
-	r_cwrp->frq_bm[0] = 0;
 	__assign_bit(SG_FRQ_SYNC_INVOC, r_cwrp->frq_bm,
 		     (int)mhp->blocking);
 	__set_bit(SG_FRQ_IS_V4I, r_cwrp->frq_bm);
 	r_cwrp->h4p = hp;
 	r_cwrp->timeout = min_t(unsigned long, ul_timeout, INT_MAX);
+	if (hp->flags & SGV4_FLAG_DOUT_OFFSET)
+		r_cwrp->wr_offset = hp->spare_in;
 	r_cwrp->sfp = rq_sfp;
+	r_cwrp->keep_share = keep_share;
 	return sg_common_write(r_cwrp);
 }
 
@@ -1291,7 +1320,7 @@ sg_process_most_mrq(struct sg_fd *fp, struct sg_fd *o_sfp,
 		}
 		flags = hp->flags;
 		rq_sfp = (flags & SGV4_FLAG_DO_ON_OTHER) ? o_sfp : fp;
-		srp = sg_mrq_submit(rq_sfp, mhp, j, -1);
+		srp = sg_mrq_submit(rq_sfp, mhp, j, -1, false);
 		if (IS_ERR(srp)) {
 			mhp->s_res = PTR_ERR(srp);
 			break;
@@ -1382,7 +1411,7 @@ sg_process_svb_mrq(struct sg_fd *fp, struct sg_fd *o_sfp,
 		   struct sg_mrq_hold *mhp)
 {
 	bool aborted = false;
-	bool chk_oth_first;
+	bool chk_oth_first, keep_share;
 	int k, j, i, m, rcv_before, idx, ws_pos, sent;
 	int this_fp_sent, other_fp_sent;
 	int num_subm = 0;
@@ -1437,7 +1466,7 @@ sg_process_svb_mrq(struct sg_fd *fp, struct sg_fd *o_sfp,
 				       __func__, (int)hp->request_extra);
 				rq_sfp = fp;
 			}
-			srp = sg_mrq_submit(rq_sfp, mhp, j, -1);
+			srp = sg_mrq_submit(rq_sfp, mhp, j, -1, false);
 			if (IS_ERR(srp)) {
 				mhp->s_res = PTR_ERR(srp);
 				res = mhp->s_res;	/* don't loop again */
@@ -1526,10 +1555,13 @@ sg_process_svb_mrq(struct sg_fd *fp, struct sg_fd *o_sfp,
 					res = -EPROTO;
 					break;
 				}
+				keep_share = false;
+another_dout:
 				SG_LOG(6, o_sfp,
 				       "%s: submit ws_pos=%d, rs_idx=%d\n",
 				       __func__, ws_pos, idx);
-				srp = sg_mrq_submit(o_sfp, mhp, ws_pos, idx);
+				srp = sg_mrq_submit(o_sfp, mhp, ws_pos, idx,
+						    keep_share);
 				if (IS_ERR(srp)) {
 					mhp->s_res = PTR_ERR(srp);
 					res = mhp->s_res;
@@ -1542,6 +1574,11 @@ sg_process_svb_mrq(struct sg_fd *fp, struct sg_fd *o_sfp,
 				++other_fp_sent;
 				++sent;
 				srp->s_hdr4.mrq_ind = ws_pos;
+				if (srp->rq_flags & SGV4_FLAG_KEEP_SHARE) {
+					++ws_pos;  /* next for same read-side */
+					keep_share = true;
+					goto another_dout;
+				}
 				if (mhp->chk_abort)
 					atomic_set(&srp->s_hdr4.pack_id_of_mrq,
 						   mhp->id_of_mrq);
@@ -1773,6 +1810,7 @@ sg_submit_v4(struct sg_fd *sfp, void __user *p, struct sg_io_v4 *h4p,
 	struct sg_request *srp;
 	struct sg_comm_wr_t cwr;
 
+	sg_comm_wr_init(&cwr);
 	if (h4p->flags & SGV4_FLAG_MULTIPLE_REQS) {
 		/* want v4 async or sync with guard, din and dout and flags */
 		if (!h4p->dout_xferp || h4p->din_iovec_count ||
@@ -1781,7 +1819,6 @@ sg_submit_v4(struct sg_fd *sfp, void __user *p, struct sg_io_v4 *h4p,
 			return -ERANGE;
 		if (o_srp)
 			*o_srp = NULL;
-		memset(&cwr, 0, sizeof(cwr));
 		cwr.sfp = sfp;
 		cwr.h4p = h4p;
 		res = sg_do_multi_req(&cwr, sync);
@@ -1810,15 +1847,14 @@ sg_submit_v4(struct sg_fd *sfp, void __user *p, struct sg_io_v4 *h4p,
 		clear_bit(SG_FFD_NO_CMD_Q, sfp->ffd_bm);
 	ul_timeout = msecs_to_jiffies(h4p->timeout);
 	cwr.sfp = sfp;
-	WRITE_ONCE(cwr.frq_bm[0], 0);
 	__assign_bit(SG_FRQ_SYNC_INVOC, cwr.frq_bm, (int)sync);
 	__set_bit(SG_FRQ_IS_V4I, cwr.frq_bm);
 	cwr.h4p = h4p;
 	cwr.timeout = min_t(unsigned long, ul_timeout, INT_MAX);
 	cwr.cmd_len = h4p->request_len;
-	cwr.rsv_idx = -1;
+	if (h4p->flags & SGV4_FLAG_DOUT_OFFSET)
+		cwr.wr_offset = h4p->spare_in;
 	cwr.u_cmdp = cuptr64(h4p->request);
-	cwr.cmdp = NULL;
 	srp = sg_common_write(&cwr);
 	if (IS_ERR(srp))
 		return PTR_ERR(srp);
@@ -2156,6 +2192,18 @@ sg_get_probable_read_side(struct sg_fd *sfp)
 			break;
 		}
 	}
+	/* Subsequent dout data transfers (e.g. WRITE) on a request share */
+	for (rapp = sfp->rsv_arr; rapp < end_rapp; ++rapp) {
+		rs_srp = *rapp;
+		if (IS_ERR_OR_NULL(rs_srp) || rs_srp->sh_srp)
+			continue;
+		switch (atomic_read(&rs_srp->rq_st)) {
+		case SG_RQ_INACTIVE:
+			return rs_srp;
+		default:
+			break;
+		}
+	}
 	return NULL;
 }
 
@@ -2326,6 +2374,10 @@ sg_common_write(struct sg_comm_wr_t *cwrp)
 		srp->s_hdr4.dir = dir;
 		srp->s_hdr4.out_resid = 0;
 		srp->s_hdr4.mrq_ind = 0;
+		if (dir == SG_DXFER_TO_DEV) {
+			srp->s_hdr4.wr_offset = cwrp->wr_offset;
+			srp->s_hdr4.wr_len = dxfr_len;
+		}
 	} else {	/* v3 interface active */
 		memcpy(&srp->s_hdr3, hi_p, sizeof(srp->s_hdr3));
 	}
@@ -2420,6 +2472,8 @@ sg_rec_state_v3v4(struct sg_fd *sfp, struct sg_request *srp, bool v4_active)
 	int err = 0;
 	u32 rq_res = srp->rq_result;
 	enum sg_shr_var sh_var = srp->sh_var;
+	enum sg_rq_state rs_st = SG_RQ_INACTIVE;
+	struct sg_request *rs_srp;
 
 	if (unlikely(!scsi_status_is_good(rq_res))) {
 		int sb_len_wr = sg_copy_sense(srp, v4_active);
@@ -2433,27 +2487,28 @@ sg_rec_state_v3v4(struct sg_fd *sfp, struct sg_request *srp, bool v4_active)
 		srp->rq_info |= SG_INFO_ABORTED;
 
 	if (sh_var == SG_SHR_WS_RQ && sg_fd_is_shared(sfp)) {
-		enum sg_rq_state rs_st;
-		struct sg_request *rs_srp = srp->sh_srp;
+		__maybe_unused char b[32];
 
+		rs_srp = srp->sh_srp;
 		if (!rs_srp)
 			return -EPROTO;
 		rs_st = atomic_read(&rs_srp->rq_st);
 
 		switch (rs_st) {
 		case SG_RQ_SHR_SWAP:
+			if (!(srp->rq_flags & SGV4_FLAG_KEEP_SHARE))
+				goto set_inactive;
+			SG_LOG(6, sfp, "%s: hold onto %s share\n",
+			       __func__, sg_get_rsv_str(rs_srp, "", "",
+							sizeof(b), b));
+			break;
 		case SG_RQ_SHR_IN_WS:
-			/* make read-side request available for re-use */
-			rs_srp->tag = SG_TAG_WILDCARD;
-			rs_srp->sh_var = SG_SHR_NONE;
-			sg_rq_chg_state_force(rs_srp, SG_RQ_INACTIVE);
-			atomic_inc(&rs_srp->parentfp->inactives);
-			rs_srp->frq_bm[0] = 0;
-			__set_bit(SG_FRQ_RESERVED, rs_srp->frq_bm);
-			rs_srp->in_resid = 0;
-			rs_srp->rq_info = 0;
-			rs_srp->sense_len = 0;
-			rs_srp->sh_srp = NULL;
+			if (!(srp->rq_flags & SGV4_FLAG_KEEP_SHARE))
+				goto set_inactive;
+			err = sg_rq_chg_state(rs_srp, rs_st, SG_RQ_SHR_SWAP);
+			SG_LOG(6, sfp, "%s: hold onto %s share\n",
+			       __func__, sg_get_rsv_str(rs_srp, "", "",
+							sizeof(b), b));
 			break;
 		case SG_RQ_AWAIT_RCV:
 			break;
@@ -2472,6 +2527,18 @@ sg_rec_state_v3v4(struct sg_fd *sfp, struct sg_request *srp, bool v4_active)
 	if (SG_IS_DETACHING(sfp->parentdp))
 		srp->rq_info |= SG_INFO_DEVICE_DETACHING;
 	return err;
+set_inactive:
+	/* make read-side request available for re-use */
+	rs_srp->tag = SG_TAG_WILDCARD;
+	rs_srp->sh_var = SG_SHR_NONE;
+	sg_rq_chg_state_force(rs_srp, SG_RQ_INACTIVE);
+	atomic_inc(&rs_srp->parentfp->inactives);
+	rs_srp->frq_bm[0] &= (1 << SG_FRQ_RESERVED);
+	rs_srp->in_resid = 0;
+	rs_srp->rq_info = 0;
+	rs_srp->sense_len = 0;
+	rs_srp->sh_srp = NULL;
+	return err;
 }
 
 static void
@@ -3978,7 +4045,7 @@ sg_fd_share(struct sg_fd *ws_sfp, int m_fd)
 /*
  * After checking the proposed file share relationship is unique and
  * valid, sets up pointers between read-side and write-side sg_fd objects.
- * Return 0 on success or negated errno value.
+ * Allows the previous write-side to be the same as new_ws_fd.
  */
 static int
 sg_fd_reshare(struct sg_fd *rs_sfp, int new_ws_fd)
@@ -3996,15 +4063,21 @@ sg_fd_reshare(struct sg_fd *rs_sfp, int new_ws_fd)
 		return -EBADF;
 	if (unlikely(!xa_get_mark(&rs_sfp->parentdp->sfp_arr, rs_sfp->idx,
 				  SG_XA_FD_RS_SHARE)))
-		return -EINVAL;
-	/* SG_XA_FD_RS_SHARE set impiles ws_sfp is valid */
+		res = -EINVAL;	/* invalid unless prev_sl==new_sl */
 
 	/* Alternate approach: fcheck_files(current->files, m_fd) */
 	filp = fget(new_ws_fd);
 	if (unlikely(!filp))
 		return -ENOENT;
-	if (unlikely(rs_sfp->filp == filp)) {/* share with self is confusing */
-		res = -ELOOP;
+	if (unlikely(rs_sfp->filp == filp)) {
+		res = -ELOOP;	/* share with self is confusing */
+		goto fini;
+	}
+	if (res == -EINVAL) {
+		if (ws_sfp && ws_sfp->filp == filp) {
+			found = true;
+			res = 0;	/* prev_sl==new_sl is okay */
+		}	/* else it is invalid and res is still -EINVAL */
 		goto fini;
 	}
 	SG_LOG(6, ws_sfp, "%s: write-side fd ok, scan for filp=0x%pK\n", __func__,
@@ -4014,7 +4087,7 @@ sg_fd_reshare(struct sg_fd *rs_sfp, int new_ws_fd)
 	if (!IS_ERR(ws_sfp))
 		found = !!ws_sfp;
 fini:
-	/* paired with filp=fget(new_ws_fd) above */
+	/* fput() paired with filp=fget(new_ws_fd) above */
 	fput(filp);
 	if (unlikely(res))
 		return res;
@@ -5717,8 +5790,9 @@ sg_mk_kern_bio(int bvec_cnt)
 static int
 sg_rq_map_kern(struct sg_request *srp, struct request_queue *q, struct request *rqq, int rw_ind)
 {
-	struct sg_scatter_hold *schp = &srp->sgat_h;
+	struct sg_scatter_hold *schp = srp->sgatp;
 	struct bio *bio;
+	bool have_bio = false;
 	int k, ln;
 	int op_flags = 0;
 	int num_sgat = schp->num_sgat;
@@ -5732,12 +5806,48 @@ sg_rq_map_kern(struct sg_request *srp, struct request_queue *q, struct request *
 		return 0;
 	if (rw_ind == WRITE)
 		op_flags = REQ_SYNC | REQ_IDLE;
-	bio = sg_mk_kern_bio(num_sgat);
-	if (!bio)
-		return -ENOMEM;
-	bio->bi_opf = req_op(rqq) | op_flags;
-
-	for (k = 0; k < num_sgat && dlen > 0; ++k, dlen -= ln) {
+	k = 0;		/* N.B. following condition may increase k */
+	if (test_bit(SG_FRQ_IS_V4I, srp->frq_bm)) {
+		struct sg_slice_hdr4 *slh4p = &srp->s_hdr4;
+
+		if (slh4p->dir == SG_DXFER_TO_DEV) {
+			u32 wr_len = slh4p->wr_len;
+			u32 wr_off = slh4p->wr_offset;
+
+			if (wr_off > 0) {  /* skip over wr_off, conditionally add partial page */
+				for (ln = 0; k < num_sgat && wr_off > 0; ++k, wr_off -= ln)
+					ln = min_t(int, wr_off, pg_sz);
+				bio = sg_mk_kern_bio(num_sgat + 1 - k);
+				if (!bio)
+					return -ENOMEM;
+				bio->bi_opf = req_op(rqq) | op_flags;
+				have_bio = true;
+				if (ln < pg_sz) {	/* k > 0 since num_sgat > 0 */
+					int rlen = pg_sz - ln;
+					struct page *pg = schp->pages[k - 1];
+
+					if (bio_add_pc_page(q, bio, pg, rlen, ln) < rlen) {
+						bio_put(bio);
+						return -EINVAL;
+					}
+					wr_len -= pg_sz - ln;
+				}
+				dlen = wr_len;
+				SG_LOG(5, srp->parentfp, "%s:   wr_off=%u wr_len=%u\n", __func__,
+				       wr_off, wr_len);
+			} else {
+				if (wr_len < dlen)
+					dlen = wr_len;	/* short write, offset 0 */
+			}
+		}
+	}
+	if (!have_bio) {
+		bio = sg_mk_kern_bio(num_sgat - k);
+		if (!bio)
+			return -ENOMEM;
+		bio->bi_opf = req_op(rqq) | op_flags;
+	}
+	for ( ; k < num_sgat && dlen > 0; ++k, dlen -= ln) {
 		ln = min_t(int, dlen, pg_sz);
 		if (bio_add_pc_page(q, bio, schp->pages[k], ln, 0) < ln) {
 			bio_put(bio);
@@ -6431,23 +6541,23 @@ sg_build_reserve(struct sg_fd *sfp, int buflen)
 }
 
 static struct sg_request *
-sg_setup_req_ws_helper(struct sg_fd *fp, int rsv_idx)
+sg_setup_req_ws_helper(struct sg_comm_wr_t *cwrp)
 {
 	int res;
 	struct sg_request *r_srp;
 	enum sg_rq_state rs_sr_st;
+	struct sg_fd *fp = cwrp->sfp;
 	struct sg_fd *rs_sfp = sg_fd_share_ptr(fp);
 
 	if (unlikely(!rs_sfp))
 		return ERR_PTR(-EPROTO);
 	/*
-	 * There may be contention with another potential write-side trying
-	 * to pair with this read-side. The loser will receive an
-	 * EADDRINUSE errno. The winner advances read-side's rq_state:
-	 *     SG_RQ_SHR_SWAP --> SG_RQ_SHR_IN_WS
+	 * There may be contention with another potential write-side trying to pair with this
+	 * read-side. The loser will receive an EADDRINUSE errno. The winner advances read-side's
+	 * rq_state:	SG_RQ_SHR_SWAP --> SG_RQ_SHR_IN_WS
 	 */
-	if (rsv_idx >= 0)
-		r_srp = rs_sfp->rsv_arr[rsv_idx];
+	if (cwrp->rsv_idx >= 0)
+		r_srp = rs_sfp->rsv_arr[cwrp->rsv_idx];
 	else
 		r_srp = sg_get_probable_read_side(rs_sfp);
 	if (unlikely(!r_srp))
@@ -6538,13 +6648,14 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var, int dxfr_len)
 		goto good_fini;
 	case SG_SHR_WS_RQ:
 		cp = "rs_rq";
-		rs_rsv_srp = sg_setup_req_ws_helper(fp, cwrp->rsv_idx);
+		rs_rsv_srp = sg_setup_req_ws_helper(cwrp);
 		if (IS_ERR(rs_rsv_srp)) {
 			r_srp = rs_rsv_srp;
 			goto err_out;
 		}
 		/* write-side dlen may be <= read-side's dlen */
-		if (unlikely(dxfr_len > rs_rsv_srp->sgatp->dlen)) {
+		if (unlikely(dxfr_len + cwrp->wr_offset >
+			     rs_rsv_srp->sgatp->dlen)) {
 			SG_LOG(1, fp, "%s: bad, write-side dlen [%d] > read-side's\n",
 			       __func__, dxfr_len);
 			r_srp = ERR_PTR(-E2BIG);
@@ -6589,6 +6700,7 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var, int dxfr_len)
 				if (r_srp->sgat_h.buflen <= SG_DEF_SECTOR_SZ) {
 					if (sg_rq_chg_state(r_srp, SG_RQ_INACTIVE, SG_RQ_BUSY))
 						continue;
+					atomic_dec(&fp->inactives);
 					mk_new_srp = false;
 					break;
 				} else if (!low_srp) {
diff --git a/include/uapi/scsi/sg.h b/include/uapi/scsi/sg.h
index a3f3d244d2af..52eccedf2f33 100644
--- a/include/uapi/scsi/sg.h
+++ b/include/uapi/scsi/sg.h
@@ -114,6 +114,7 @@ typedef struct sg_io_hdr {
 #define SGV4_FLAG_YIELD_TAG 0x8  /* sg_io_v4::generated_tag set after SG_IOS */
 #define SGV4_FLAG_Q_AT_TAIL SG_FLAG_Q_AT_TAIL
 #define SGV4_FLAG_Q_AT_HEAD SG_FLAG_Q_AT_HEAD
+#define SGV4_FLAG_DOUT_OFFSET  0x40	/* dout byte offset in v4::spare_in */
 #define SGV4_FLAG_COMPLETE_B4  0x100	/* mrq: complete this rq before next */
 #define SGV4_FLAG_SIGNAL 0x200	/* v3: ignored; v4 signal on completion */
 #define SGV4_FLAG_IMMED 0x400   /* issue request and return immediately ... */
@@ -123,7 +124,8 @@ typedef struct sg_io_hdr {
 #define SGV4_FLAG_SHARE 0x4000	/* share IO buffer; needs SG_SEIM_SHARE_FD */
 #define SGV4_FLAG_DO_ON_OTHER 0x8000 /* available on either of shared pair */
 #define SGV4_FLAG_NO_DXFER SG_FLAG_NO_DXFER /* but keep dev<-->kernel xfr */
-#define SGV4_FLAG_MULTIPLE_REQS 0x20000	/* 1 or more sg_io_v4-s in data-in */
+#define SGV4_FLAG_KEEP_SHARE 0x20000  /* ... buffer for another dout command */
+#define SGV4_FLAG_MULTIPLE_REQS 0x40000	/* 1 or more sg_io_v4-s in data-in */
 
 /* Output (potentially OR-ed together) in v3::info or v4::info field */
 #define SG_INFO_OK_MASK 0x1
-- 
2.25.1


* [PATCH v18 69/83] sg: add dlen to sg_comm_wr_t
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
                   ` (68 preceding siblings ...)
  2021-04-27 21:57 ` [PATCH v18 68/83] sg: keep share and dout offset flags Douglas Gilbert
@ 2021-04-27 21:57 ` Douglas Gilbert
  2021-04-27 21:57 ` [PATCH v18 70/83] sg: make use of struct sg_mrq_hold Douglas Gilbert
                   ` (13 subsequent siblings)
  83 siblings, 0 replies; 86+ messages in thread
From: Douglas Gilbert @ 2021-04-27 21:57 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

The data transfer length was being recalculated and passed as
a function argument. It is tidier to place it in struct
sg_comm_wr_t with other similar parameters.
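
A minimal sketch of the refactoring pattern: the argument structure is
zeroed by one initializer (as sg_comm_wr_init() does) and dlen is
computed once, preferring the data-in length and otherwise using the
data-out length, as sg_mrq_submit() and sg_submit_v4() now do.
struct v4_lens, struct comm_wr and both helper functions are
hypothetical cut-down stand-ins.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct v4_lens {	/* hypothetical subset of struct sg_io_v4 */
	uint32_t din_xfer_len;
	uint32_t dout_xfer_len;
};

struct comm_wr {	/* hypothetical subset of struct sg_comm_wr_t */
	int rsv_idx;	/* wanted reserve index, -1 means any */
	int dlen;	/* dout or din length in bytes, computed once */
	int wr_offset;	/* non-zero only when a dout offset is given */
};

/* Zero everything, then set the one non-zero default */
static void comm_wr_init(struct comm_wr *cwrp)
{
	memset(cwrp, 0, sizeof(*cwrp));
	cwrp->rsv_idx = -1;
}

/* Prefer the din length; fall back to the dout length */
static void comm_wr_set_dlen(struct comm_wr *cwrp, const struct v4_lens *h)
{
	assert(cwrp != NULL && h != NULL);
	cwrp->dlen = h->din_xfer_len ? (int)h->din_xfer_len
				     : (int)h->dout_xfer_len;
}
```

A request with both lengths non-zero would get the din length here;
the driver rejects that case separately in sg_common_write() with
-EOPNOTSUPP.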

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 84 ++++++++++++++++++++++-------------------------
 1 file changed, 39 insertions(+), 45 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 1f6aae3909c7..ef3b42814b9a 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -321,6 +321,7 @@ struct sg_comm_wr_t {  /* arguments to sg_common_write() */
 	int timeout;
 	int cmd_len;
 	int rsv_idx;		/* wanted rsv_arr index, def: -1 (anyone) */
+	int dlen;		/* dout or din length in bytes */
 	int wr_offset;		/* non-zero if v4 and DOUT_OFFSET set */
 	unsigned long frq_bm[1];	/* see SG_FRQ_* defines above */
 	union {		/* selector is frq_bm.SG_FRQ_IS_V4I */
@@ -375,7 +376,7 @@ static struct sg_request *sg_find_srp_by_id(struct sg_fd *sfp, int id,
 					    bool is_tag);
 static bool sg_mrq_get_ready_srp(struct sg_fd *sfp, struct sg_request **srpp);
 static struct sg_request *sg_setup_req(struct sg_comm_wr_t *cwrp,
-				       enum sg_shr_var sh_var, int dxfr_len);
+				       enum sg_shr_var sh_var);
 static void sg_deact_request(struct sg_fd *sfp, struct sg_request *srp);
 static struct sg_device *sg_get_dev(int min_dev);
 static void sg_device_destroy(struct kref *kref);
@@ -870,6 +871,7 @@ sg_write(struct file *filp, const char __user *p, size_t count, loff_t *ppos)
 	}
 	sg_comm_wr_init(&cwr);
 	cwr.h3p = h3p;
+	cwr.dlen = h3p->dxfer_len;
 	cwr.timeout = sfp->timeout;
 	cwr.cmd_len = cmd_size;
 	cwr.sfp = sfp;
@@ -942,6 +944,7 @@ sg_submit_v3(struct sg_fd *sfp, struct sg_io_hdr *hp, bool sync,
 	sg_comm_wr_init(&cwr);
 	__assign_bit(SG_FRQ_SYNC_INVOC, cwr.frq_bm, (int)sync);
 	cwr.h3p = hp;
+	cwr.dlen = hp->dxfer_len;
 	cwr.timeout = min_t(unsigned long, ul_timeout, INT_MAX);
 	cwr.cmd_len = hp->cmd_len;
 	cwr.sfp = sfp;
@@ -1280,6 +1283,7 @@ sg_mrq_submit(struct sg_fd *rq_sfp, struct sg_mrq_hold *mhp, int pos_hdr,
 		     (int)mhp->blocking);
 	__set_bit(SG_FRQ_IS_V4I, r_cwrp->frq_bm);
 	r_cwrp->h4p = hp;
+	r_cwrp->dlen = hp->din_xfer_len ? hp->din_xfer_len : hp->dout_xfer_len;
 	r_cwrp->timeout = min_t(unsigned long, ul_timeout, INT_MAX);
 	if (hp->flags & SGV4_FLAG_DOUT_OFFSET)
 		r_cwrp->wr_offset = hp->spare_in;
@@ -1806,11 +1810,14 @@ sg_submit_v4(struct sg_fd *sfp, void __user *p, struct sg_io_v4 *h4p,
 	     bool sync, struct sg_request **o_srp)
 {
 	int res = 0;
+	int dlen;
 	unsigned long ul_timeout;
 	struct sg_request *srp;
 	struct sg_comm_wr_t cwr;
 
 	sg_comm_wr_init(&cwr);
+	dlen = h4p->din_xfer_len ? h4p->din_xfer_len : h4p->dout_xfer_len;
+	cwr.dlen = dlen;
 	if (h4p->flags & SGV4_FLAG_MULTIPLE_REQS) {
 		/* want v4 async or sync with guard, din and dout and flags */
 		if (!h4p->dout_xferp || h4p->din_iovec_count ||
@@ -1832,13 +1839,7 @@ sg_submit_v4(struct sg_fd *sfp, void __user *p, struct sg_io_v4 *h4p,
 		return 0;
 	}
 	if (h4p->flags & SG_FLAG_MMAP_IO) {
-		int len = 0;
-
-		if (h4p->din_xferp)
-			len = h4p->din_xfer_len;
-		else if (h4p->dout_xferp)
-			len = h4p->dout_xfer_len;
-		res = sg_chk_mmap(sfp, h4p->flags, len);
+		res = sg_chk_mmap(sfp, h4p->flags, dlen);
 		if (unlikely(res))
 			return res;
 	}
@@ -2312,7 +2313,8 @@ static struct sg_request *
 sg_common_write(struct sg_comm_wr_t *cwrp)
 {
 	int res = 0;
-	int dxfr_len, dir;
+	int dlen = cwrp->dlen;
+	int dir;
 	int pack_id = SG_PACK_ID_WILDCARD;
 	u32 rq_flags;
 	enum sg_shr_var sh_var;
@@ -2325,31 +2327,26 @@ sg_common_write(struct sg_comm_wr_t *cwrp)
 	if (likely(test_bit(SG_FRQ_IS_V4I, cwrp->frq_bm))) {
 		h4p = cwrp->h4p;
 		hi_p = NULL;
-		dxfr_len = 0;
 		dir = SG_DXFER_NONE;
 		rq_flags = h4p->flags;
 		pack_id = h4p->request_extra;
-		if (unlikely(h4p->din_xfer_len && h4p->dout_xfer_len)) {
+		if (unlikely(h4p->din_xfer_len && h4p->dout_xfer_len))
 			return ERR_PTR(-EOPNOTSUPP);
-		} else if (h4p->din_xfer_len) {
-			dxfr_len = h4p->din_xfer_len;
+		else if (h4p->din_xfer_len)
 			dir = SG_DXFER_FROM_DEV;
-		} else if (h4p->dout_xfer_len) {
-			dxfr_len = h4p->dout_xfer_len;
+		else if (h4p->dout_xfer_len)
 			dir = SG_DXFER_TO_DEV;
-		}
 	} else {			/* sg v3 interface so hi_p valid */
 		h4p = NULL;
 		hi_p = cwrp->h3p;
 		dir = hi_p->dxfer_direction;
-		dxfr_len = hi_p->dxfer_len;
 		rq_flags = hi_p->flags;
 		pack_id = hi_p->pack_id;
 	}
 	if (unlikely(rq_flags & SGV4_FLAG_MULTIPLE_REQS))
 		return ERR_PTR(-ERANGE);  /* only control object sets this */
 	if (sg_fd_is_shared(fp)) {
-		res = sg_share_chk_flags(fp, rq_flags, dxfr_len, dir, &sh_var);
+		res = sg_share_chk_flags(fp, rq_flags, dlen, dir, &sh_var);
 		if (unlikely(res < 0))
 			return ERR_PTR(res);
 	} else {
@@ -2357,10 +2354,10 @@ sg_common_write(struct sg_comm_wr_t *cwrp)
 		if (unlikely(rq_flags & SGV4_FLAG_SHARE))
 			return ERR_PTR(-ENOMSG);    /* no file share found */
 	}
-	if (unlikely(dxfr_len >= SZ_256M))
+	if (unlikely(dlen >= SZ_256M))
 		return ERR_PTR(-EINVAL);
 
-	srp = sg_setup_req(cwrp, sh_var, dxfr_len);
+	srp = sg_setup_req(cwrp, sh_var);
 	if (IS_ERR(srp))
 		return srp;
 	srp->rq_flags = rq_flags;
@@ -2376,7 +2373,7 @@ sg_common_write(struct sg_comm_wr_t *cwrp)
 		srp->s_hdr4.mrq_ind = 0;
 		if (dir == SG_DXFER_TO_DEV) {
 			srp->s_hdr4.wr_offset = cwrp->wr_offset;
-			srp->s_hdr4.wr_len = dxfr_len;
+			srp->s_hdr4.wr_len = dlen;
 		}
 	} else {	/* v3 interface active */
 		memcpy(&srp->s_hdr3, hi_p, sizeof(srp->s_hdr3));
@@ -5867,7 +5864,7 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 {
 	bool no_dxfer, us_xfer;
 	int res = 0;
-	int dxfer_len = 0;
+	int dlen = cwrp->dlen;
 	int r0w = READ;
 	u32 rq_flags = srp->rq_flags;
 	unsigned int iov_count = 0;
@@ -5898,11 +5895,9 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 		if (dxfer_dir == SG_DXFER_TO_DEV) {
 			r0w = WRITE;
 			up = uptr64(h4p->dout_xferp);
-			dxfer_len = (int)h4p->dout_xfer_len;
 			iov_count = h4p->dout_iovec_count;
 		} else if (dxfer_dir == SG_DXFER_FROM_DEV) {
 			up = uptr64(h4p->din_xferp);
-			dxfer_len = (int)h4p->din_xfer_len;
 			iov_count = h4p->din_iovec_count;
 		} else {
 			up = NULL;
@@ -5911,12 +5906,11 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 		struct sg_slice_hdr3 *sh3p = &srp->s_hdr3;
 
 		up = sh3p->dxferp;
-		dxfer_len = (int)sh3p->dxfer_len;
 		iov_count = sh3p->iovec_count;
 		r0w = dxfer_dir == SG_DXFER_TO_DEV ? WRITE : READ;
 	}
-	SG_LOG(4, sfp, "%s: dxfer_len=%d%s\n", __func__, dxfer_len,
-	       (dxfer_len ? (r0w ? ", data-OUT" : ", data-IN") : ""));
+	SG_LOG(4, sfp, "%s: dlen=%d%s\n", __func__, dlen,
+	       (dlen ? (r0w ? ", data-OUT" : ", data-IN") : ""));
 	q = sdp->device->request_queue;
 
 	/*
@@ -5954,7 +5948,7 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 		goto fini;
 	scsi_rp->cmd_len = cwrp->cmd_len;
 	srp->cmd_opcode = scsi_rp->cmd[0];
-	no_dxfer = dxfer_len <= 0 || dxfer_dir == SG_DXFER_NONE;
+	no_dxfer = dlen <= 0 || dxfer_dir == SG_DXFER_NONE;
 	us_xfer = !(rq_flags & (SG_FLAG_NO_DXFER | SG_FLAG_MMAP_IO));
 	__assign_bit(SG_FRQ_US_XFER, srp->frq_bm, !no_dxfer && us_xfer);
 	rqq->end_io_data = srp;
@@ -5966,7 +5960,7 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 		goto fini;	/* path of reqs with no din nor dout */
 	} else if (unlikely(rq_flags & SG_FLAG_DIRECT_IO) && iov_count == 0 &&
 		   !sdp->device->host->unchecked_isa_dma &&
-		   blk_rq_aligned(q, (unsigned long)up, dxfer_len)) {
+		   blk_rq_aligned(q, (unsigned long)up, dlen)) {
 		srp->rq_info |= SG_INFO_DIRECT_IO;
 		md = NULL;
 		if (IS_ENABLED(CONFIG_SCSI_PROC_FS))
@@ -5982,11 +5976,10 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 			struct sg_request *r_srp = sfp->rsv_arr[0];
 
 			reserve0 = (r_srp == srp);
-			if (unlikely(!reserve0 ||
-				     dxfer_len > req_schp->buflen))
+			if (unlikely(!reserve0 || dlen > req_schp->buflen))
 				res = reserve0 ? -ENOMEM : -EBUSY;
 		} else if (req_schp->buflen == 0) {
-			int up_sz = max_t(int, dxfer_len, sfp->sgat_elem_sz);
+			int up_sz = max_t(int, dlen, sfp->sgat_elem_sz);
 
 			res = sg_mk_sgat(srp, sfp, up_sz);
 		}
@@ -6008,7 +6001,7 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 		if (unlikely(res < 0))
 			goto fini;
 
-		iov_iter_truncate(&i, dxfer_len);
+		iov_iter_truncate(&i, dlen);
 		if (unlikely(!iov_iter_count(&i))) {
 			kfree(iov);
 			res = -EINVAL;
@@ -6021,7 +6014,7 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 		if (IS_ENABLED(CONFIG_SCSI_PROC_FS))
 			cp = "iov_count > 0";
 	} else if (us_xfer) { /* setup for transfer data to/from user space */
-		res = blk_rq_map_user(q, rqq, md, up, dxfer_len, GFP_ATOMIC);
+		res = blk_rq_map_user(q, rqq, md, up, dlen, GFP_ATOMIC);
 #if IS_ENABLED(SG_LOG_ACTIVE)
 		if (unlikely(res))
 			SG_LOG(1, sfp, "%s: blk_rq_map_user() res=%d\n",
@@ -6595,7 +6588,7 @@ sg_setup_req_ws_helper(struct sg_comm_wr_t *cwrp)
  * side's reserve request can only be used in a request share.
  */
 static struct sg_request *
-sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var, int dxfr_len)
+sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var)
 {
 	bool allow_rsv = true;		/* see note above */
 	bool mk_new_srp = true;
@@ -6608,6 +6601,7 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var, int dxfr_len)
 	bool is_rsv;
 	int ra_idx = 0;
 	int l_used_idx;
+	int dlen = cwrp->dlen;
 	u32 sum_dlen;
 	unsigned long idx, s_idx, end_idx, iflags;
 	enum sg_rq_state sr_st;
@@ -6654,15 +6648,15 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var, int dxfr_len)
 			goto err_out;
 		}
 		/* write-side dlen may be <= read-side's dlen */
-		if (unlikely(dxfr_len + cwrp->wr_offset >
+		if (unlikely(dlen + cwrp->wr_offset >
 			     rs_rsv_srp->sgatp->dlen)) {
 			SG_LOG(1, fp, "%s: bad, write-side dlen [%d] > read-side's\n",
-			       __func__, dxfr_len);
+			       __func__, dlen);
 			r_srp = ERR_PTR(-E2BIG);
 			goto err_out;
 		}
 		ws_rq = true;
-		dxfr_len = 0;	/* any srp for write-side will do, pick smallest */
+		dlen = 0;	/* any srp for write-side will do, pick smallest */
 		break;
 	case SG_SHR_RS_NOT_SRQ:
 		allow_rsv = false;
@@ -6677,7 +6671,7 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var, int dxfr_len)
 		mk_new_srp = true;
 	} else if (atomic_read(&fp->inactives) <= 0) {
 		mk_new_srp = true;
-	} else if (likely(!try_harder) && dxfr_len < SG_DEF_SECTOR_SZ) {
+	} else if (likely(!try_harder) && dlen < SG_DEF_SECTOR_SZ) {
 		struct sg_request *low_srp = NULL;
 
 		l_used_idx = READ_ONCE(fp->low_used_idx);
@@ -6728,7 +6722,7 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var, int dxfr_len)
 			for (r_srp = xa_find(xafp, &idx, end_idx, SG_XA_RQ_INACTIVE);
 			     r_srp;
 			     r_srp = xa_find_after(xafp, &idx, end_idx, SG_XA_RQ_INACTIVE)) {
-				if (dxfr_len <= r_srp->sgat_h.buflen) {
+				if (r_srp->sgat_h.buflen >= dlen) {
 					if (sg_rq_chg_state(r_srp, SG_RQ_INACTIVE, SG_RQ_BUSY))
 						continue;
 					atomic_dec(&fp->inactives);
@@ -6749,7 +6743,7 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var, int dxfr_len)
 			for (r_srp = xa_find(xafp, &idx, end_idx, SG_XA_RQ_INACTIVE);
 			     r_srp;
 			     r_srp = xa_find_after(xafp, &idx, end_idx, SG_XA_RQ_INACTIVE)) {
-				if (dxfr_len <= r_srp->sgat_h.buflen &&
+				if (r_srp->sgat_h.buflen >= dlen &&
 				    !test_bit(SG_FRQ_RESERVED, r_srp->frq_bm)) {
 					if (sg_rq_chg_state(r_srp, SG_RQ_INACTIVE, SG_RQ_BUSY))
 						continue;
@@ -6789,7 +6783,7 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var, int dxfr_len)
 			       __func__);
 			goto err_out;
 		} else if (fp->tot_fd_thresh > 0) {
-			sum_dlen = atomic_read(&fp->sum_fd_dlens) + dxfr_len;
+			sum_dlen = atomic_read(&fp->sum_fd_dlens) + dlen;
 			if (unlikely(sum_dlen > (u32)fp->tot_fd_thresh)) {
 				r_srp = ERR_PTR(-E2BIG);
 				SG_LOG(2, fp, "%s: sum_of_dlen(%u) > %s\n",
@@ -6810,9 +6804,9 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var, int dxfr_len)
 		}
 		if (IS_ERR(r_srp))	/* NULL is _not_ an ERR here */
 			goto err_out;
-		r_srp = sg_mk_srp_sgat(fp, no_reqs, dxfr_len);
+		r_srp = sg_mk_srp_sgat(fp, no_reqs, dlen);
 		if (IS_ERR(r_srp)) {
-			if (!try_harder && dxfr_len < SG_DEF_SECTOR_SZ &&
+			if (!try_harder && dlen < SG_DEF_SECTOR_SZ &&
 			    some_inactive) {
 				try_harder = true;
 				goto start_again;
@@ -6852,7 +6846,7 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var, int dxfr_len)
 		set_bit(SG_FRQ_IS_V4I, r_srp->frq_bm);
 	if (test_bit(SG_FRQ_SYNC_INVOC, cwrp->frq_bm))
 		set_bit(SG_FRQ_SYNC_INVOC, r_srp->frq_bm);
-	r_srp->sgatp->dlen = dxfr_len;/* must be <= r_srp->sgat_h.buflen */
+	r_srp->sgatp->dlen = dlen;	/* must be <= r_srp->sgat_h.buflen */
 	r_srp->sh_var = sh_var;
 	r_srp->cmd_opcode = 0xff;  /* set invalid opcode (VS), 0x0 is TUR */
 	/* If setup stalls (e.g. blk_get_request()) debug shows 'elap=1 ns' */
-- 
2.25.1



* [PATCH v18 70/83] sg: make use of struct sg_mrq_hold
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
                   ` (69 preceding siblings ...)
  2021-04-27 21:57 ` [PATCH v18 69/83] sg: add dlen to sg_comm_wr_t Douglas Gilbert
@ 2021-04-27 21:57 ` Douglas Gilbert
  2021-04-27 21:57 ` [PATCH v18 71/83] sg: add mmap IO option for mrq metadata Douglas Gilbert
                   ` (12 subsequent siblings)
  83 siblings, 0 replies; 86+ messages in thread
From: Douglas Gilbert @ 2021-04-27 21:57 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

The mrq parameter-holding structure (struct sg_mrq_hold) was underused.
Passing it to more of the functions that process mrq requests
significantly reduces the number of parameters those functions take.
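
As an illustration of the refactoring pattern only, here is a minimal
sketch that bundles the values a helper needs into one context object,
so that a call like flush(cop, a_hds, tot_reqs, s_res) becomes
flush(mhp). All names here (struct ctl_obj, struct mrq_ctx, flush_len(),
ELEM_SZ) are hypothetical stand-ins, not the driver's types; the size
calculation mirrors the min() seen in sg_mrq_arr_flush().

```c
#include <assert.h>
#include <stdint.h>

#define ELEM_SZ 64u	/* hypothetical per-request element size */

/* Hypothetical control object: only the two fields this sketch needs. */
struct ctl_obj {
	uint32_t din_xfer_len;	/* room user space gave for the response array */
	int32_t spare_out;	/* where a secondary error is reported */
};

/* Hypothetical context bundle, in the spirit of struct sg_mrq_hold. */
struct mrq_ctx {
	struct ctl_obj *cop;
	uint32_t tot_reqs;
	int s_res;		/* secondary error (negative errno), 0 if none */
};

/*
 * Old style would be flush_len(cop, tot_reqs, s_res); with the bundle the
 * helper takes one pointer and reads what it needs. Reports any secondary
 * error and returns how many bytes of the response array would be copied.
 */
static uint32_t flush_len(struct mrq_ctx *mhp)
{
	uint32_t want;

	assert(mhp && mhp->cop);
	want = mhp->tot_reqs * ELEM_SZ;
	if (mhp->s_res)
		mhp->cop->spare_out = -mhp->s_res;
	return want < mhp->cop->din_xfer_len ? want : mhp->cop->din_xfer_len;
}
```

The design trade-off is the usual one: fewer arguments and simpler call
sites, at the cost of each helper depending on a larger shared structure.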

Rename sg_sgv4_out_zero() to sg_v4h_partial_zero(), make it inline, and
place a comment above the function explaining what it does.
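
The "partial zero" idiom clears every field from a given member to the end
of a structure while leaving the leading (input) fields intact. A minimal
sketch, assuming a hypothetical struct hdr whose output fields start at
driver_status (it is not struct sg_io_v4):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header: input fields first, outputs from driver_status on. */
struct hdr {
	uint32_t guard;
	uint32_t flags;
	uint32_t driver_status;	/* first output field */
	uint32_t info;
	int32_t spare_out;
};

/* Clear from and including driver_status to the end of the object. */
static inline void hdr_partial_zero(struct hdr *h)
{
	const size_t off = offsetof(struct hdr, driver_status);

	assert(h != NULL);
	memset((uint8_t *)h + off, 0, sizeof(*h) - off);
}
```

Because offsetof() yields an integer constant expression, the offset is
fixed at compile time whether or not the local is declared static.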

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 61 ++++++++++++++++++++++-------------------------
 1 file changed, 29 insertions(+), 32 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index ef3b42814b9a..bdb9b3dbf970 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -957,12 +957,12 @@ sg_submit_v3(struct sg_fd *sfp, struct sg_io_hdr *hp, bool sync,
 	return 0;
 }
 
-static void
-sg_sgv4_out_zero(struct sg_io_v4 *h4p)
+/* Clear from and including driver_status to end of sg_io_v4 object */
+static inline void
+sg_v4h_partial_zero(struct sg_io_v4 *h4p)
 {
-	const int off = offsetof(struct sg_io_v4, driver_status);
+	static const int off = offsetof(struct sg_io_v4, driver_status);
 
-	/* clear from and including driver_status to end of object */
 	memset((u8 *)h4p + off, 0, SZ_SG_IO_V4 - off);
 }
 
@@ -973,11 +973,13 @@ sg_sgv4_out_zero(struct sg_io_v4 *h4p)
  * secondary error value (s_res) is placed in the cop->spare_out field.
  */
 static int
-sg_mrq_arr_flush(struct sg_io_v4 *cop, struct sg_io_v4 *a_hds, u32 tot_reqs,
-		 int s_res)
+sg_mrq_arr_flush(struct sg_mrq_hold *mhp)
 {
-	u32 sz = min(tot_reqs * SZ_SG_IO_V4, cop->din_xfer_len);
+	int s_res = mhp->s_res;
+	struct sg_io_v4 *cop = mhp->cwrp->h4p;
 	void __user *p = uptr64(cop->din_xferp);
+	struct sg_io_v4 *a_hds = mhp->a_hds;
+	u32 sz = min(mhp->tot_reqs * SZ_SG_IO_V4, cop->din_xfer_len);
 
 	if (unlikely(s_res))
 		cop->spare_out = -s_res;
@@ -991,11 +993,13 @@ sg_mrq_arr_flush(struct sg_io_v4 *cop, struct sg_io_v4 *a_hds, u32 tot_reqs,
 }
 
 static int
-sg_mrq_1complet(struct sg_io_v4 *cop, struct sg_io_v4 *a_hds,
-		struct sg_fd *do_on_sfp, int tot_reqs, struct sg_request *srp)
+sg_mrq_1complet(struct sg_mrq_hold *mhp, struct sg_fd *do_on_sfp,
+		struct sg_request *srp)
 {
 	int s_res, indx;
+	int tot_reqs = mhp->tot_reqs;
 	struct sg_io_v4 *hp;
+	struct sg_io_v4 *a_hds = mhp->a_hds;
 
 	if (unlikely(!srp))
 		return -EPROTO;
@@ -1015,7 +1019,7 @@ sg_mrq_1complet(struct sg_io_v4 *cop, struct sg_io_v4 *a_hds,
 		return s_res;
 	hp->info |= SG_INFO_MRQ_FINI;
 	if (do_on_sfp->async_qp && (hp->flags & SGV4_FLAG_SIGNAL)) {
-		s_res = sg_mrq_arr_flush(cop, a_hds, tot_reqs, s_res);
+		s_res = sg_mrq_arr_flush(mhp);
 		if (unlikely(s_res))	/* can only be -EFAULT */
 			return s_res;
 		kill_fasync(&do_on_sfp->async_qp, SIGPOLL, POLL_IN);
@@ -1040,13 +1044,13 @@ sg_wait_mrq_event(struct sg_fd *sfp, struct sg_request **srpp)
  * Increments cop->info for each successful completion.
  */
 static int
-sg_mrq_complets(struct sg_io_v4 *cop, struct sg_io_v4 *a_hds,
-		struct sg_fd *sfp, struct sg_fd *sec_sfp, int tot_reqs,
-		int mreqs, int sec_reqs)
+sg_mrq_complets(struct sg_mrq_hold *mhp, struct sg_fd *sfp,
+		struct sg_fd *sec_sfp, int mreqs, int sec_reqs)
 {
 	int res = 0;
 	int rres;
 	struct sg_request *srp;
+	struct sg_io_v4 *cop = mhp->cwrp->h4p;
 
 	SG_LOG(3, sfp, "%s: mreqs=%d, sec_reqs=%d\n", __func__, mreqs,
 	       sec_reqs);
@@ -1059,7 +1063,7 @@ sg_mrq_complets(struct sg_io_v4 *cop, struct sg_io_v4 *a_hds,
 				break;
 			}
 			--mreqs;
-			res = sg_mrq_1complet(cop, a_hds, sfp, tot_reqs, srp);
+			res = sg_mrq_1complet(mhp, sfp, srp);
 			if (unlikely(res))
 				return res;
 			++cop->info;
@@ -1074,8 +1078,7 @@ sg_mrq_complets(struct sg_io_v4 *cop, struct sg_io_v4 *a_hds,
 				break;
 			}
 			--sec_reqs;
-			rres = sg_mrq_1complet(cop, a_hds, sec_sfp, tot_reqs,
-					       srp);
+			rres = sg_mrq_1complet(mhp, sec_sfp, srp);
 			if (unlikely(rres))
 				return rres;
 			++cop->info;
@@ -1092,8 +1095,7 @@ sg_mrq_complets(struct sg_io_v4 *cop, struct sg_io_v4 *a_hds,
 				mreqs = 0;
 			} else {
 				--mreqs;
-				res = sg_mrq_1complet(cop, a_hds, sfp,
-						      tot_reqs, srp);
+				res = sg_mrq_1complet(mhp, sfp, srp);
 				if (unlikely(res))
 					return res;
 				++cop->info;
@@ -1109,8 +1111,7 @@ sg_mrq_complets(struct sg_io_v4 *cop, struct sg_io_v4 *a_hds,
 				sec_reqs = 0;
 			} else {
 				--sec_reqs;
-				res = sg_mrq_1complet(cop, a_hds, sec_sfp,
-						      tot_reqs, srp);
+				res = sg_mrq_1complet(mhp, sec_sfp, srp);
 				if (unlikely(res))
 					return res;
 				++cop->info;
@@ -1141,7 +1142,7 @@ sg_mrq_sanity(struct sg_device *sdp, struct sg_io_v4 *cop,
 	/* Pre-check each request for anomalies, plus some preparation */
 	for (k = 0, hp = a_hds; k < tot_reqs; ++k, ++hp) {
 		flags = hp->flags;
-		sg_sgv4_out_zero(hp);
+		sg_v4h_partial_zero(hp);
 		if (unlikely(hp->guard != 'Q' || hp->protocol != 0 ||
 			     hp->subprotocol != 0)) {
 			SG_LOG(1, sfp, "%s: req index %u: %s or protocol\n",
@@ -1357,8 +1358,7 @@ sg_process_most_mrq(struct sg_fd *fp, struct sg_fd *o_sfp,
 			break;	/* cop->driver_status <-- 0 in this case */
 		}
 		if (rq_sfp->async_qp && (hp->flags & SGV4_FLAG_SIGNAL)) {
-			res = sg_mrq_arr_flush(cop, mhp->a_hds, mhp->tot_reqs,
-					       mhp->s_res);
+			res = sg_mrq_arr_flush(mhp);
 			if (unlikely(res))
 				break;
 			kill_fasync(&rq_sfp->async_qp, SIGPOLL, POLL_IN);
@@ -1374,8 +1374,7 @@ sg_process_most_mrq(struct sg_fd *fp, struct sg_fd *o_sfp,
 	if (mhp->immed)
 		return res;
 	if (likely(res == 0 && (this_fp_sent + other_fp_sent) > 0)) {
-		mhp->s_res = sg_mrq_complets(cop, mhp->a_hds, fp, o_sfp,
-					     mhp->tot_reqs, this_fp_sent,
+		mhp->s_res = sg_mrq_complets(mhp, fp, o_sfp, this_fp_sent,
 					     other_fp_sent);
 		if (unlikely(mhp->s_res == -EFAULT ||
 			     mhp->s_res == -ERESTARTSYS))
@@ -1510,8 +1509,7 @@ sg_process_svb_mrq(struct sg_fd *fp, struct sg_fd *o_sfp,
 					break;
 				}
 				--other_fp_sent;
-				res = sg_mrq_1complet(cop, a_hds, o_sfp,
-						      mhp->tot_reqs, srp);
+				res = sg_mrq_1complet(mhp, o_sfp, srp);
 				if (unlikely(res))
 					return res;
 				++cop->info;
@@ -1527,8 +1525,7 @@ sg_process_svb_mrq(struct sg_fd *fp, struct sg_fd *o_sfp,
 					break;
 				}
 				--this_fp_sent;
-				res = sg_mrq_1complet(cop, a_hds, fp,
-						      mhp->tot_reqs, srp);
+				res = sg_mrq_1complet(mhp, fp, srp);
 				if (unlikely(res))
 					return res;
 				++cop->info;
@@ -1715,7 +1712,7 @@ sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool blocking)
 		mhp->immed = true;
 	SG_LOG(3, fp, "%s: %s, tot_reqs=%u, id_of_mrq=%d\n", __func__,
 	       mrq_name, tot_reqs, mhp->id_of_mrq);
-	sg_sgv4_out_zero(cop);
+	sg_v4h_partial_zero(cop);
 
 	if (unlikely(tot_reqs > U16_MAX)) {
 		return -ERANGE;
@@ -1799,7 +1796,7 @@ sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool blocking)
 	}
 fini:
 	if (likely(res == 0) && !mhp->immed)
-		res = sg_mrq_arr_flush(cop, a_hds, tot_reqs, mhp->s_res);
+		res = sg_mrq_arr_flush(mhp);
 	kfree(cdb_ap);
 	kfree(a_hds);
 	return res;
@@ -2712,7 +2709,7 @@ sg_mrq_ioreceive(struct sg_fd *sfp, struct sg_io_v4 *cop, void __user *p,
 	if (unlikely(!rsp_v4_arr))
 		return -ENOMEM;
 
-	sg_sgv4_out_zero(cop);
+	sg_v4h_partial_zero(cop);
 	cop->din_resid = n;
 	res = sg_mrq_iorec_complets(sfp, non_block, n, rsp_v4_arr);
 	if (unlikely(res < 0))
-- 
2.25.1



* [PATCH v18 71/83] sg: add mmap IO option for mrq metadata
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
                   ` (70 preceding siblings ...)
  2021-04-27 21:57 ` [PATCH v18 70/83] sg: make use of struct sg_mrq_hold Douglas Gilbert
@ 2021-04-27 21:57 ` Douglas Gilbert
  2021-04-27 21:57 ` [PATCH v18 72/83] sg: add eventfd support Douglas Gilbert
                   ` (11 subsequent siblings)
  83 siblings, 0 replies; 86+ messages in thread
From: Douglas Gilbert @ 2021-04-27 21:57 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

SG_FLAG_MMAP_IO is of little use on individual elements of a multiple
request (mrq) invocation. An mrq invocation involves one, or at most
two, file descriptors, and each mmap()-ed IO buffer is bound to a single
file descriptor. One or two data buffers are too few for mrq to be
practical.

This patch adds SG_FLAG_MMAP_IO functionality to the control object of
an mrq request. When the flag is set on an mrq control object, the
returned metadata is written to the mmap(2)-ed buffer, one element at a
time as each request completes, instead of being sent to din_xferp. The
user space program must therefore call mmap(2) before using
SG_FLAG_MMAP_IO. That requirement now applies to all uses of the flag;
prior to this patch the code was too lenient in that area.
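
Each completed response record lands at byte offset (index * record size)
in the mmap()-ed reserve buffer, which is itself a list of equal-sized
elements. The offset arithmetic behind such a copy can be sketched as
follows. This is a simplified user-space model under stated assumptions:
struct sgat_model and sgat_copy_into() are hypothetical and are not the
driver's sg_sgat_cp_into() or struct sg_scatter_hold.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: num_elems buffers of elem_sz bytes each, of which
 * the first dlen bytes (counted across all elements) are in use. */
struct sgat_model {
	unsigned char **elems;
	int num_elems;
	int elem_sz;
	int dlen;
};

/* Copy nbytes from 'from' into the logical byte stream at offset 'off',
 * trimming so nothing is written at or past dlen. Returns bytes copied. */
static int sgat_copy_into(struct sgat_model *s, int off,
			  const unsigned char *from, int nbytes)
{
	int copied = 0;

	assert(s->elem_sz > 0);
	if (off < 0 || nbytes <= 0 || off >= s->dlen)
		return 0;
	if (nbytes > s->dlen - off)
		nbytes = s->dlen - off;
	while (copied < nbytes) {
		int pos = off + copied;
		int idx = pos / s->elem_sz;	/* which element */
		int in = pos % s->elem_sz;	/* offset within that element */
		int chunk = s->elem_sz - in;	/* room left in that element */

		if (idx >= s->num_elems)
			break;
		if (chunk > nbytes - copied)
			chunk = nbytes - copied;
		memcpy(s->elems[idx] + in, from + copied, chunk);
		copied += chunk;
	}
	return copied;
}
```

Computing the element index and intra-element offset directly from the
logical position avoids a separate "skip over off bytes" loop.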

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 202 +++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 172 insertions(+), 30 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index bdb9b3dbf970..48bf5ccca5b5 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -338,13 +338,15 @@ struct sg_mrq_hold {	/* for passing context between mrq functions */
 	bool chk_abort;
 	bool immed;
 	bool stop_if;
+	bool co_mmap;
 	int id_of_mrq;
 	int s_res;		/* secondary error: some-good-then-error */
 	u32 cdb_mxlen;		/* cdb length in cdb_ap, actual be may less */
 	u32 tot_reqs;		/* total number of requests and cdb_s */
-	struct sg_comm_wr_t *cwrp;
+	struct sg_comm_wr_t *cwrp;	/* cwrp->h4p is mrq control object */
 	u8 *cdb_ap;		/* array of commands */
 	struct sg_io_v4 *a_hds;	/* array of request to execute */
+	struct sg_scatter_hold *co_mmap_sgatp;
 };
 
 /* tasklet or soft irq callback */
@@ -966,6 +968,109 @@ sg_v4h_partial_zero(struct sg_io_v4 *h4p)
 	memset((u8 *)h4p + off, 0, SZ_SG_IO_V4 - off);
 }
 
+static void
+sg_sgat_zero(struct sg_scatter_hold *sgatp, int off, int nbytes)
+{
+	int k, rem, off_pl_nbyt;
+	int ind = 0;
+	int pg_ind = 0;
+	int num_sgat = sgatp->num_sgat;
+	int elem_sz = PAGE_SIZE * (1 << sgatp->page_order);
+	struct page *pg_ep = sgatp->pages[pg_ind];
+
+	if (off >= sgatp->dlen)
+		return;
+	off_pl_nbyt = off + nbytes;
+	if (off_pl_nbyt >= sgatp->dlen) {
+		nbytes = sgatp->dlen - off;
+		off_pl_nbyt = off + nbytes;
+	}
+	/* first loop steps over off bytes, second loop zeros nbytes */
+	for (k = 0; k < off; k += rem) {
+		rem = off - k;
+		if (rem >= elem_sz) {
+			++pg_ind;
+			if (pg_ind >= num_sgat)
+				return;
+			rem = elem_sz;
+			ind = 0;
+		} else {
+			ind = elem_sz - rem;
+		}
+	}
+	pg_ep = sgatp->pages[pg_ind];
+	for ( ; k < off_pl_nbyt; k += rem) {
+		rem = off_pl_nbyt - k;
+		if (rem >= elem_sz) {
+			memset((u8 *)pg_ep + ind, 0, elem_sz - ind);
+			if (++pg_ind >= num_sgat)
+				return;
+			pg_ep = sgatp->pages[pg_ind];
+			rem = elem_sz;
+			ind = 0;
+		} else {
+			memset((u8 *)pg_ep + ind, 0, rem - ind);
+			ind = elem_sz - rem;
+		}
+	}
+}
+
+/*
+ * Copies nbytes from the start of 'fromp' into sgatp (this driver's scatter
+ * gather list representation) starting at byte offset 'off'. If nbytes is
+ * too long then it is trimmed.
+ */
+static void
+sg_sgat_cp_into(struct sg_scatter_hold *sgatp, int off, const u8 *fromp,
+		int nbytes)
+{
+	int k, rem, off_pl_nbyt;
+	int ind = 0;
+	int from_off = 0;
+	int pg_ind = 0;
+	int num_sgat = sgatp->num_sgat;
+	int elem_sz = PAGE_SIZE * (1 << sgatp->page_order);
+	struct page *pg_ep = sgatp->pages[pg_ind];
+
+	if (off >= sgatp->dlen)
+		return;
+	off_pl_nbyt = off + nbytes;
+	if (off_pl_nbyt >= sgatp->dlen) {
+		nbytes = sgatp->dlen - off;
+		off_pl_nbyt = off + nbytes;
+	}
+	/* first loop steps over off bytes, second loop zeros nbytes */
+	for (k = 0; k < off; k += rem) {
+		rem = off - k;
+		if (rem >= elem_sz) {
+			++pg_ind;
+			if (pg_ind >= num_sgat)
+				return;
+			rem = elem_sz;
+			ind = 0;
+		} else {
+			ind = elem_sz - rem;
+		}
+	}
+	pg_ep = sgatp->pages[pg_ind];
+	for ( ; k < off_pl_nbyt; k += rem) {
+		rem = off_pl_nbyt - k;
+		if (rem >= elem_sz) {
+			memcpy((u8 *)pg_ep + ind, fromp + from_off,
+			       elem_sz - ind);
+			if (++pg_ind >= num_sgat)
+				return;
+			pg_ep = sgatp->pages[pg_ind];
+			rem = elem_sz;
+			ind = 0;
+			from_off += elem_sz - ind;
+		} else {
+			memcpy((u8 *)pg_ep + ind, fromp + from_off, rem - ind);
+			/* last time around, no need to update indexes */
+		}
+	}
+}
+
 /*
  * Takes a pointer (cop) to the multiple request (mrq) control object and
  * a pointer to the command array. The command array (with tot_reqs elements)
@@ -1018,7 +1123,12 @@ sg_mrq_1complet(struct sg_mrq_hold *mhp, struct sg_fd *do_on_sfp,
 	if (unlikely(s_res == -EFAULT))
 		return s_res;
 	hp->info |= SG_INFO_MRQ_FINI;
-	if (do_on_sfp->async_qp && (hp->flags & SGV4_FLAG_SIGNAL)) {
+	if (mhp->co_mmap) {
+		sg_sgat_cp_into(mhp->co_mmap_sgatp, indx * SZ_SG_IO_V4,
+				(const u8 *)hp, SZ_SG_IO_V4);
+		if (do_on_sfp->async_qp && (hp->flags & SGV4_FLAG_SIGNAL))
+			kill_fasync(&do_on_sfp->async_qp, SIGPOLL, POLL_IN);
+	} else if (do_on_sfp->async_qp && (hp->flags & SGV4_FLAG_SIGNAL)) {
 		s_res = sg_mrq_arr_flush(mhp);
 		if (unlikely(s_res))	/* can only be -EFAULT */
 			return s_res;
@@ -1124,23 +1234,24 @@ sg_mrq_complets(struct sg_mrq_hold *mhp, struct sg_fd *sfp,
 }
 
 static int
-sg_mrq_sanity(struct sg_device *sdp, struct sg_io_v4 *cop,
-	      struct sg_io_v4 *a_hds, u8 *cdb_ap, struct sg_fd *sfp,
-	      bool immed, u32 tot_reqs, bool *share_on_othp)
+sg_mrq_sanity(struct sg_mrq_hold *mhp)
 {
-	bool have_mrq_sense = (cop->response && cop->max_response_len);
-	bool share_on_oth = false;
 	bool last_is_keep_share = false;
-	bool share;
+	bool share, have_mrq_sense;
 	int k;
+	struct sg_io_v4 *cop = mhp->cwrp->h4p;
 	u32 cdb_alen = cop->request_len;
-	u32 cdb_mxlen = cdb_alen / tot_reqs;
+	u32 cdb_mxlen = cdb_alen / mhp->tot_reqs;
 	u32 flags;
+	struct sg_fd *sfp = mhp->cwrp->sfp;
+	struct sg_io_v4 *a_hds = mhp->a_hds;
+	u8 *cdb_ap = mhp->cdb_ap;
 	struct sg_io_v4 *hp;
 	__maybe_unused const char *rip = "request index";
 
+	have_mrq_sense = (cop->response && cop->max_response_len);
 	/* Pre-check each request for anomalies, plus some preparation */
-	for (k = 0, hp = a_hds; k < tot_reqs; ++k, ++hp) {
+	for (k = 0, hp = a_hds; k < mhp->tot_reqs; ++k, ++hp) {
 		flags = hp->flags;
 		sg_v4h_partial_zero(hp);
 		if (unlikely(hp->guard != 'Q' || hp->protocol != 0 ||
@@ -1156,7 +1267,7 @@ sg_mrq_sanity(struct sg_device *sdp, struct sg_io_v4 *cop,
 			return -ERANGE;
 		}
 		share = !!(flags & SGV4_FLAG_SHARE);
-		if (immed) {	/* only accept async submits on current fd */
+		if (mhp->immed) {/* only accept async submits on current fd */
 			if (unlikely(flags & SGV4_FLAG_DO_ON_OTHER)) {
 				SG_LOG(1, sfp, "%s: %s %u, %s\n", __func__,
 				       rip, k, "no IMMED with ON_OTHER");
@@ -1171,10 +1282,12 @@ sg_mrq_sanity(struct sg_device *sdp, struct sg_io_v4 *cop,
 				return -ERANGE;
 			}
 		}
-		if (sg_fd_is_shared(sfp)) {
-			if (!share_on_oth && share)
-				share_on_oth = true;
-		} else {
+		if (mhp->co_mmap && (flags & SGV4_FLAG_MMAP_IO)) {
+			SG_LOG(1, sfp, "%s: %s %u, MMAP in co AND here\n",
+			       __func__, rip, k);
+			return -ERANGE;
+		}
+		if (!sg_fd_is_shared(sfp)) {
 			if (unlikely(share)) {
 				SG_LOG(1, sfp, "%s: %s %u, no share\n",
 				       __func__, rip, k);
@@ -1204,8 +1317,6 @@ sg_mrq_sanity(struct sg_device *sdp, struct sg_io_v4 *cop,
 		       __func__);
 		return -ERANGE;
 	}
-	if (share_on_othp)
-		*share_on_othp = share_on_othp;
 	return 0;
 }
 
@@ -1229,7 +1340,7 @@ sg_mrq_svb_chk(struct sg_io_v4 *a_hds, u32 tot_reqs)
 	/* expect read-write pairs, all with SGV4_FLAG_NO_DXFER set */
 	for (k = 0, hp = a_hds; k < tot_reqs; ++k, ++hp) {
 		flags = hp->flags;
-		if (flags & (SGV4_FLAG_COMPLETE_B4))
+		if (flags & SGV4_FLAG_COMPLETE_B4)
 			return false;
 		if (!seen_wr) {
 			if (hp->dout_xfer_len > 0)
@@ -1357,7 +1468,14 @@ sg_process_most_mrq(struct sg_fd *fp, struct sg_fd *o_sfp,
 			       hp->device_status);
 			break;	/* cop->driver_status <-- 0 in this case */
 		}
-		if (rq_sfp->async_qp && (hp->flags & SGV4_FLAG_SIGNAL)) {
+		if (mhp->co_mmap) {
+			sg_sgat_cp_into(mhp->co_mmap_sgatp, j * SZ_SG_IO_V4,
+					(const u8 *)hp, SZ_SG_IO_V4);
+			if (rq_sfp->async_qp && (hp->flags & SGV4_FLAG_SIGNAL))
+				kill_fasync(&rq_sfp->async_qp, SIGPOLL,
+					    POLL_IN);
+		} else if (rq_sfp->async_qp &&
+			   (hp->flags & SGV4_FLAG_SIGNAL)) {
 			res = sg_mrq_arr_flush(mhp);
 			if (unlikely(res))
 				break;
@@ -1653,14 +1771,15 @@ sg_mrq_name(bool blocking, u32 flags)
 static int
 sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool blocking)
 {
-	bool f_non_block, share_on_oth;
+	bool f_non_block, co_share;
 	int res = 0;
 	int existing_id;
 	u32 cdb_mxlen;
 	struct sg_io_v4 *cop = cwrp->h4p;	/* controlling object */
-	u32 blen = cop->dout_xfer_len;
+	u32 dout_len = cop->dout_xfer_len;
+	u32 din_len = cwrp->dlen;
 	u32 cdb_alen = cop->request_len;
-	u32 tot_reqs = blen / SZ_SG_IO_V4;
+	u32 tot_reqs = dout_len / SZ_SG_IO_V4;
 	u8 *cdb_ap = NULL;
 	struct sg_io_v4 *a_hds;		/* array of request objects */
 	struct sg_fd *fp = cwrp->sfp;
@@ -1678,8 +1797,12 @@ sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool blocking)
 	mrq_name = sg_mrq_name(blocking, cop->flags);
 #endif
 	f_non_block = !!(fp->filp->f_flags & O_NONBLOCK);
+	co_share = !!(cop->flags & SGV4_FLAG_SHARE);
 	mhp->immed = !!(cop->flags & SGV4_FLAG_IMMED);
 	mhp->stop_if = !!(cop->flags & SGV4_FLAG_STOP_IF);
+	mhp->co_mmap = !!(cop->flags & SGV4_FLAG_MMAP_IO);
+	if (mhp->co_mmap)
+		mhp->co_mmap_sgatp = fp->rsv_arr[0]->sgatp;
 	mhp->id_of_mrq = (int)cop->request_extra;
 	mhp->tot_reqs = tot_reqs;
 	mhp->s_res = 0;
@@ -1702,6 +1825,11 @@ sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool blocking)
 			       __func__, "with SGV4_FLAG_IMMED");
 			return -ERANGE;
 		}
+		if (unlikely(co_share)) {
+			SG_LOG(1, fp, "%s: ioctl(SG_IO) %s contradicts\n",
+			       __func__, "with SGV4_FLAG_SHARE");
+			return -ERANGE;
+		}
 		if (unlikely(f_non_block)) {
 			SG_LOG(6, fp, "%s: ioctl(SG_IO) %s O_NONBLOCK\n",
 			       __func__, "ignoring");
@@ -1714,9 +1842,25 @@ sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool blocking)
 	       mrq_name, tot_reqs, mhp->id_of_mrq);
 	sg_v4h_partial_zero(cop);
 
+	if (mhp->co_mmap) {
+		struct sg_request *srp = fp->rsv_arr[0];
+
+		if (unlikely(fp->mmap_sz == 0))
+			return -EBADFD;	/* want mmap() active on fd */
+		if ((int)din_len > fp->mmap_sz)
+			return  -E2BIG;
+		if (cop->din_xferp)
+			pr_info_once("%s: co::din_xferp ignored due to SGV4_FLAG_MMAP_IO\n",
+				     __func__);
+		if (srp)
+			sg_sgat_zero(srp->sgatp, 0 /* offset */, fp->mmap_sz);
+		else
+			return -EPROTO;
+	}
 	if (unlikely(tot_reqs > U16_MAX)) {
 		return -ERANGE;
-	} else if (unlikely(blen > SG_MAX_MULTI_REQ_SZ ||
+	} else if (unlikely(dout_len > SG_MAX_MULTI_REQ_SZ ||
+			    din_len > SG_MAX_MULTI_REQ_SZ ||
 			    cdb_alen > SG_MAX_MULTI_REQ_SZ)) {
 		return  -E2BIG;
 	} else if (unlikely(mhp->immed && mhp->stop_if)) {
@@ -1757,9 +1901,11 @@ sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool blocking)
 			goto fini;
 		}
 	}
+	mhp->cdb_ap = cdb_ap;
+	mhp->a_hds = a_hds;
+	mhp->cdb_mxlen = cdb_mxlen;
 	/* do sanity checks on all requests before starting */
-	res = sg_mrq_sanity(sdp, cop, a_hds, cdb_ap, fp, mhp->immed,
-			    tot_reqs, &share_on_oth);
+	res = sg_mrq_sanity(mhp);
 	if (unlikely(res))
 		goto fini;
 
@@ -1768,11 +1914,7 @@ sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool blocking)
 	if (o_sfp)
 		clear_bit(SG_FFD_NO_CMD_Q, o_sfp->ffd_bm);
 
-	mhp->cdb_ap = cdb_ap;
-	mhp->a_hds = a_hds;
-	mhp->cdb_mxlen = cdb_mxlen;
-
-	if (!mhp->immed && !blocking && share_on_oth) {
+	if (co_share) {
 		bool ok;
 
 		/* check for 'shared' variable blocking (svb) */
-- 
2.25.1



* [PATCH v18 72/83] sg: add eventfd support
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
                   ` (71 preceding siblings ...)
  2021-04-27 21:57 ` [PATCH v18 71/83] sg: add mmap IO option for mrq metadata Douglas Gilbert
@ 2021-04-27 21:57 ` Douglas Gilbert
  2021-04-27 21:57 ` [PATCH v18 73/83] sg: table of error number explanations Douglas Gilbert
                   ` (10 subsequent siblings)
  83 siblings, 0 replies; 86+ messages in thread
From: Douglas Gilbert @ 2021-04-27 21:57 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

Experimental version. Add support for user space to pass, via
ioctl(2), a file descriptor generated by the eventfd(2) system call to
this driver, thereby associating that eventfd with an sg file
descriptor. The eventfd is passed in the share_fd field of struct
sg_extended_info with the SG_SEIM_EVENTFD mask bit set. Add support to
remove the eventfd association (the SG_CTL_FLAGM_RM_EVENTFD boolean,
honoured only while no requests are submitted on that fd) so that
another eventfd can be added to the same sg file descriptor. If an
eventfd is active on an sg fd and a request has the SGV4_FLAG_EVENTFD
flag set, then on completion of that request the driver "signals" the
eventfd by adding 1 to its internal counter.
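The counting behaviour described above can be seen from user space
alone. This is a minimal sketch, assuming only the glibc eventfd(2)
wrapper: each write of 1 stands in for one completion that the driver
would signal with eventfd_signal(ctx, 1). It does not go through the sg
ioctl path, and signal_then_drain() is a hypothetical helper.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Signal a fresh eventfd 'n' times (one write of 1 per simulated
 * completion), then drain it with a single read().
 * Returns the drained count, or -1 on error.
 */
static long long signal_then_drain(int n)
{
	uint64_t one = 1, count = 0;
	int efd = eventfd(0, EFD_NONBLOCK);	/* counter starts at 0 */

	if (efd < 0)
		return -1;
	for (int i = 0; i < n; ++i) {
		if (write(efd, &one, sizeof(one)) != sizeof(one)) {
			close(efd);
			return -1;
		}
	}
	/* Without EFD_SEMAPHORE, read() returns the whole count and resets it */
	if (read(efd, &count, sizeof(count)) != sizeof(count))
		count = 0;	/* EAGAIN: nothing was signalled (n == 0) */
	close(efd);
	return (long long)count;
}
```

A poller therefore learns how many flagged requests completed since its
last read(), not which ones; it still has to collect each response.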

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c      | 157 +++++++++++++++++++++++++++++++----------
 include/uapi/scsi/sg.h |   9 ++-
 2 files changed, 124 insertions(+), 42 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 48bf5ccca5b5..d030f7c43bf0 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -46,6 +46,7 @@ static char *sg_version_date = "20210421";
 #include <linux/timekeeping.h>
 #include <linux/proc_fs.h>		/* used if CONFIG_SCSI_PROC_FS */
 #include <linux/xarray.h>
+#include <linux/eventfd.h>
 #include <linux/debugfs.h>
 
 #include <scsi/scsi.h>
@@ -293,6 +294,7 @@ struct sg_fd {		/* holds the state of a file descriptor */
 	struct file *filp;	/* my identity when sharing */
 	struct sg_fd __rcu *share_sfp;/* fd share cross-references, else NULL */
 	struct fasync_struct *async_qp; /* used by asynchronous notification */
+	struct eventfd_ctx *efd_ctxp;	/* eventfd context or NULL */
 	struct xarray srp_arr;	/* xarray of sg_request object pointers */
 	struct sg_request *rsv_arr[SG_MAX_RSV_REQS];
 	struct kref f_ref;
@@ -412,6 +414,7 @@ static void sg_take_snap(struct sg_fd *sfp, bool clear_first);
 #define SG_HAVE_EXCLUDE(sdp) test_bit(SG_FDEV_EXCLUDE, (sdp)->fdev_bm)
 #define SG_IS_O_NONBLOCK(sfp) (!!((sfp)->filp->f_flags & O_NONBLOCK))
 #define SG_RQ_ACTIVE(srp) (atomic_read(&(srp)->rq_st) != SG_RQ_INACTIVE)
+#define SG_IS_V4I(srp) test_bit(SG_FRQ_IS_V4I, (srp)->frq_bm)
 
 /*
  * Kernel needs to be built with CONFIG_SCSI_LOGGING to see log messages.
@@ -1098,7 +1101,7 @@ sg_mrq_arr_flush(struct sg_mrq_hold *mhp)
 }
 
 static int
-sg_mrq_1complet(struct sg_mrq_hold *mhp, struct sg_fd *do_on_sfp,
+sg_mrq_1complet(struct sg_mrq_hold *mhp, struct sg_fd *sfp,
 		struct sg_request *srp)
 {
 	int s_res, indx;
@@ -1109,30 +1112,37 @@ sg_mrq_1complet(struct sg_mrq_hold *mhp, struct sg_fd *do_on_sfp,
 	if (unlikely(!srp))
 		return -EPROTO;
 	indx = srp->s_hdr4.mrq_ind;
-	if (unlikely(srp->parentfp != do_on_sfp)) {
-		SG_LOG(1, do_on_sfp, "%s: mrq_ind=%d, sfp out-of-sync\n",
+	if (unlikely(srp->parentfp != sfp)) {
+		SG_LOG(1, sfp, "%s: mrq_ind=%d, sfp out-of-sync\n",
 		       __func__, indx);
 		return -EPROTO;
 	}
-	SG_LOG(3, do_on_sfp, "%s: mrq_ind=%d, pack_id=%d\n", __func__, indx,
+	SG_LOG(3, sfp, "%s: mrq_ind=%d, pack_id=%d\n", __func__, indx,
 	       srp->pack_id);
 	if (unlikely(indx < 0 || indx >= tot_reqs))
 		return -EPROTO;
 	hp = a_hds + indx;
-	s_res = sg_receive_v4(do_on_sfp, srp, NULL, hp);
+	s_res = sg_receive_v4(sfp, srp, NULL, hp);
 	if (unlikely(s_res == -EFAULT))
 		return s_res;
 	hp->info |= SG_INFO_MRQ_FINI;
 	if (mhp->co_mmap) {
 		sg_sgat_cp_into(mhp->co_mmap_sgatp, indx * SZ_SG_IO_V4,
 				(const u8 *)hp, SZ_SG_IO_V4);
-		if (do_on_sfp->async_qp && (hp->flags & SGV4_FLAG_SIGNAL))
-			kill_fasync(&do_on_sfp->async_qp, SIGPOLL, POLL_IN);
-	} else if (do_on_sfp->async_qp && (hp->flags & SGV4_FLAG_SIGNAL)) {
+		if (sfp->async_qp && (hp->flags & SGV4_FLAG_SIGNAL))
+			kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN);
+		if (sfp->efd_ctxp && (srp->rq_flags & SGV4_FLAG_EVENTFD)) {
+			u64 n = eventfd_signal(sfp->efd_ctxp, 1);
+
+			if (n != 1)
+				pr_info("%s: srp=%pK eventfd_signal problem\n",
+					__func__, srp);
+		}
+	} else if (sfp->async_qp && (hp->flags & SGV4_FLAG_SIGNAL)) {
 		s_res = sg_mrq_arr_flush(mhp);
 		if (unlikely(s_res))	/* can only be -EFAULT */
 			return s_res;
-		kill_fasync(&do_on_sfp->async_qp, SIGPOLL, POLL_IN);
+		kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN);
 	}
 	return 0;
 }
@@ -1474,6 +1484,14 @@ sg_process_most_mrq(struct sg_fd *fp, struct sg_fd *o_sfp,
 			if (rq_sfp->async_qp && (hp->flags & SGV4_FLAG_SIGNAL))
 				kill_fasync(&rq_sfp->async_qp, SIGPOLL,
 					    POLL_IN);
+			if (rq_sfp->efd_ctxp &&
+			    (srp->rq_flags & SGV4_FLAG_EVENTFD)) {
+				u64 n = eventfd_signal(rq_sfp->efd_ctxp, 1);
+
+				if (n != 1)
+					pr_info("%s: eventfd_signal prob\n",
+						__func__);
+			}
 		} else if (rq_sfp->async_qp &&
 			   (hp->flags & SGV4_FLAG_SIGNAL)) {
 			res = sg_mrq_arr_flush(mhp);
@@ -2677,6 +2695,34 @@ sg_rec_state_v3v4(struct sg_fd *sfp, struct sg_request *srp, bool v4_active)
 	return err;
 }
 
+static void
+sg_complete_shr_rs(struct sg_fd *sfp, struct sg_request *srp, bool other_err,
+		   enum sg_rq_state sr_st)
+{
+	int poll_type = POLL_OUT;
+	struct sg_fd *ws_sfp = sg_fd_share_ptr(sfp);
+
+	if (unlikely(!sg_result_is_good(srp->rq_result) || other_err)) {
+		set_bit(SG_FFD_READ_SIDE_ERR, sfp->ffd_bm);
+		sg_rq_chg_state_force(srp, SG_RQ_BUSY);
+		poll_type = POLL_HUP;   /* "Hang-UP flag */
+	} else if (sr_st != SG_RQ_SHR_SWAP) {
+		sg_rq_chg_state_force(srp, SG_RQ_SHR_SWAP);
+	}
+	if (ws_sfp && !srp->sh_srp) {
+		if (ws_sfp->async_qp &&
+		    (!SG_IS_V4I(srp) || (srp->rq_flags & SGV4_FLAG_SIGNAL)))
+			kill_fasync(&ws_sfp->async_qp, SIGPOLL, poll_type);
+		if (ws_sfp->efd_ctxp && (srp->rq_flags & SGV4_FLAG_EVENTFD)) {
+			u64 n = eventfd_signal(ws_sfp->efd_ctxp, 1);
+
+			if (n != 1)
+				pr_info("%s: srp=%pK eventfd prob\n",
+					__func__, srp);
+		}
+	}
+}
+
 static void
 sg_complete_v3v4(struct sg_fd *sfp, struct sg_request *srp, bool other_err)
 {
@@ -2687,25 +2733,7 @@ sg_complete_v3v4(struct sg_fd *sfp, struct sg_request *srp, bool other_err)
 	       sg_shr_str(srp->sh_var, true));
 	switch (srp->sh_var) {
 	case SG_SHR_RS_RQ:
-		{
-			int poll_type = POLL_OUT;
-			struct sg_fd *ws_sfp = sg_fd_share_ptr(sfp);
-
-			if (unlikely(!sg_result_is_good(srp->rq_result) ||
-				     other_err)) {
-				set_bit(SG_FFD_READ_SIDE_ERR, sfp->ffd_bm);
-				if (sr_st != SG_RQ_BUSY)
-					sg_rq_chg_state_force(srp, SG_RQ_BUSY);
-				poll_type = POLL_HUP;   /* "Hang-UP flag */
-			} else if (sr_st != SG_RQ_SHR_SWAP) {
-				sg_rq_chg_state_force(srp, SG_RQ_SHR_SWAP);
-			}
-			if (ws_sfp && ws_sfp->async_qp && !srp->sh_srp &&
-			    (!test_bit(SG_FRQ_IS_V4I, srp->frq_bm) ||
-			     (srp->rq_flags & SGV4_FLAG_SIGNAL)))
-				kill_fasync(&ws_sfp->async_qp, SIGPOLL,
-					    poll_type);
-		}
+		sg_complete_shr_rs(sfp, srp, other_err, sr_st);
 		break;
 	case SG_SHR_WS_RQ:	/* cleanup both on write-side completion */
 		if (likely(sg_fd_is_shared(sfp))) {
@@ -3655,8 +3683,8 @@ sg_fill_request_element(struct sg_fd *sfp, struct sg_request *srp,
 	rip->problem = !sg_result_is_good(srp->rq_result);
 	rip->pack_id = test_bit(SG_FFD_PREFER_TAG, sfp->ffd_bm) ?
 				srp->tag : srp->pack_id;
-	rip->usr_ptr = test_bit(SG_FRQ_IS_V4I, srp->frq_bm) ?
-			uptr64(srp->s_hdr4.usr_ptr) : srp->s_hdr3.usr_ptr;
+	rip->usr_ptr = SG_IS_V4I(srp) ? uptr64(srp->s_hdr4.usr_ptr)
+				      : srp->s_hdr3.usr_ptr;
 	xa_unlock_irqrestore(&sfp->srp_arr, iflags);
 }
 
@@ -3713,7 +3741,7 @@ sg_wait_event_srp(struct sg_fd *sfp, void __user *p, struct sg_io_v4 *h4p,
 #endif
 		return res;
 	}
-	if (test_bit(SG_FRQ_IS_V4I, srp->frq_bm))
+	if (SG_IS_V4I(srp))
 		res = sg_receive_v4(sfp, srp, p, h4p);
 	else
 		res = sg_receive_v3(sfp, srp, p);
@@ -4237,6 +4265,23 @@ sg_fd_reshare(struct sg_fd *rs_sfp, int new_ws_fd)
 	return found ? 0 : -ENOTSOCK; /* ENOTSOCK for fd exists but not sg */
 }
 
+static int
+sg_eventfd_new(struct sg_fd *rs_sfp, int eventfd)
+		__must_hold(&rs_sfp->f_mutex)
+{
+	int ret = 0;
+
+	if (rs_sfp->efd_ctxp)
+		return -EBUSY;
+	rs_sfp->efd_ctxp = eventfd_ctx_fdget(eventfd);
+	if (IS_ERR(rs_sfp->efd_ctxp)) {
+		ret = PTR_ERR(rs_sfp->efd_ctxp);
+		rs_sfp->efd_ctxp = NULL;
+		return ret;
+	}
+	return ret;
+}
+
 /*
  * First normalize want_rsv_sz to be >= sfp->sgat_elem_sz and
  * <= max_segment_size. Exit if that is the same as old size; otherwise
@@ -4465,7 +4510,6 @@ sg_extended_bool_flags(struct sg_fd *sfp, struct sg_extended_info *seip)
 	const u32 c_flgs_rm = seip->ctl_flags_rd_mask;
 	const u32 c_flgs_val_in = seip->ctl_flags;
 	u32 c_flgs_val_out = c_flgs_val_in;
-	struct sg_fd *rs_sfp;
 	struct sg_device *sdp = sfp->parentdp;
 
 	/* TIME_IN_NS boolean, [raw] time in nanoseconds (def: millisecs) */
@@ -4545,7 +4589,8 @@ sg_extended_bool_flags(struct sg_fd *sfp, struct sg_extended_info *seip)
 	 * when written: 1 --> write-side doesn't want to continue
 	 */
 	if ((c_flgs_rm & SG_CTL_FLAGM_READ_SIDE_FINI) && sg_fd_is_shared(sfp)) {
-		rs_sfp = sg_fd_share_ptr(sfp);
+		struct sg_fd *rs_sfp = sg_fd_share_ptr(sfp);
+
 		if (rs_sfp && !IS_ERR_OR_NULL(rs_sfp->rsv_arr[0])) {
 			struct sg_request *res_srp = rs_sfp->rsv_arr[0];
 
@@ -4562,7 +4607,8 @@ sg_extended_bool_flags(struct sg_fd *sfp, struct sg_extended_info *seip)
 		res = sg_finish_rs_rq(sfp);
 	/* READ_SIDE_ERR boolean, [ro] share: read-side finished with error */
 	if (c_flgs_rm & SG_CTL_FLAGM_READ_SIDE_ERR) {
-		rs_sfp = sg_fd_share_ptr(sfp);
+		struct sg_fd *rs_sfp = sg_fd_share_ptr(sfp);
+
 		if (rs_sfp && test_bit(SG_FFD_READ_SIDE_ERR, rs_sfp->ffd_bm))
 			c_flgs_val_out |= SG_CTL_FLAGM_READ_SIDE_ERR;
 		else
@@ -4618,6 +4664,21 @@ sg_extended_bool_flags(struct sg_fd *sfp, struct sg_extended_info *seip)
 		else
 			c_flgs_val_out &= ~SG_CTL_FLAGM_SNAP_DEV;
 	}
+	/* RM_EVENTFD boolean, [rbw] */
+	if (c_flgs_rm & SG_CTL_FLAGM_RM_EVENTFD)
+		flg = !!sfp->efd_ctxp;
+	if ((c_flgs_wm & SG_CTL_FLAGM_RM_EVENTFD) && (c_flgs_val_in & SG_CTL_FLAGM_RM_EVENTFD)) {
+		if (sfp->efd_ctxp && atomic_read(&sfp->submitted) < 1) {
+			eventfd_ctx_put(sfp->efd_ctxp);
+			sfp->efd_ctxp = NULL;
+		}
+	}
+	if (c_flgs_rm & SG_CTL_FLAGM_RM_EVENTFD) {
+		if (flg)
+			c_flgs_val_out |= SG_CTL_FLAGM_RM_EVENTFD;
+		else
+			c_flgs_val_out &= ~SG_CTL_FLAGM_RM_EVENTFD;
+	}
 
 	if (c_flgs_val_in != c_flgs_val_out)
 		seip->ctl_flags = c_flgs_val_out;
@@ -4773,6 +4834,15 @@ sg_ctl_extended(struct sg_fd *sfp, void __user *p)
 		}
 		mutex_unlock(&sfp->f_mutex);
 	}
+	if (or_masks & SG_SEIM_EVENTFD) {
+		mutex_lock(&sfp->f_mutex);
+		if (s_wr_mask & SG_SEIM_EVENTFD) {
+			result = sg_eventfd_new(sfp, (int)seip->share_fd);
+			if (ret == 0 && unlikely(result))
+				ret = result;
+		}
+		mutex_unlock(&sfp->f_mutex);
+	}
 	/* call blk_poll() on this fd's HIPRI requests [raw] */
 	if (or_masks & SG_SEIM_BLK_POLL) {
 		n = 0;
@@ -5514,7 +5584,7 @@ sg_rq_end_io(struct request *rqq, blk_status_t status)
 	a_resid = scsi_rp->resid_len;
 
 	if (unlikely(a_resid)) {
-		if (test_bit(SG_FRQ_IS_V4I, srp->frq_bm)) {
+		if (SG_IS_V4I(srp)) {
 			if (rq_data_dir(rqq) == READ)
 				srp->in_resid = a_resid;
 			else
@@ -5603,9 +5673,16 @@ sg_rq_end_io(struct request *rqq, blk_status_t status)
 	}
 	if (!(srp->rq_flags & SGV4_FLAG_HIPRI))
 		wake_up_interruptible(&sfp->cmpl_wait);
-	if (sfp->async_qp && (!test_bit(SG_FRQ_IS_V4I, srp->frq_bm) ||
+	if (sfp->async_qp && (!SG_IS_V4I(srp) ||
 			      (srp->rq_flags & SGV4_FLAG_SIGNAL)))
 		kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN);
+	if (sfp->efd_ctxp && (srp->rq_flags & SGV4_FLAG_EVENTFD)) {
+		u64 n = eventfd_signal(sfp->efd_ctxp, 1);
+
+		if (n != 1)
+			pr_info("%s: srp=%pK eventfd_signal problem\n",
+				__func__, srp);
+	}
 	kref_put(&sfp->f_ref, sg_remove_sfp);	/* get in: sg_execute_cmd() */
 }
 
@@ -5943,7 +6020,7 @@ sg_rq_map_kern(struct sg_request *srp, struct request_queue *q, struct request *
 	if (rw_ind == WRITE)
 		op_flags = REQ_SYNC | REQ_IDLE;
 	k = 0;		/* N.B. following condition may increase k */
-	if (test_bit(SG_FRQ_IS_V4I, srp->frq_bm)) {
+	if (SG_IS_V4I(srp)) {
 		struct sg_slice_hdr4 *slh4p = &srp->s_hdr4;
 
 		if (slh4p->dir == SG_DXFER_TO_DEV) {
@@ -6028,7 +6105,7 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 		}
 		SG_LOG(5, sfp, "%s: long_cmdp=0x%pK ++\n", __func__, long_cmdp);
 	}
-	if (likely(test_bit(SG_FRQ_IS_V4I, srp->frq_bm))) {
+	if (SG_IS_V4I(srp)) {
 		struct sg_io_v4 *h4p = cwrp->h4p;
 
 		if (dxfer_dir == SG_DXFER_TO_DEV) {
@@ -7225,6 +7302,8 @@ sg_uc_remove_sfp(struct work_struct *work)
 	if (subm != 0)
 		SG_LOG(1, sfp, "%s: expected submitted=0 got %d\n",
 		       __func__, subm);
+	if (sfp->efd_ctxp)
+		eventfd_ctx_put(sfp->efd_ctxp);
 	xa_destroy(xafp);
 	xadp = &sdp->sfp_arr;
 	xa_lock_irqsave(xadp, iflags);
@@ -7553,7 +7632,7 @@ sg_proc_debug_sreq(struct sg_request *srp, int to, bool t_in_ns, char *obp,
 
 	if (unlikely(len < 1))
 		return 0;
-	v4 = test_bit(SG_FRQ_IS_V4I, srp->frq_bm);
+	v4 = SG_IS_V4I(srp);
 	is_v3v4 = v4 ? true : (srp->s_hdr3.interface_id != '\0');
 	sg_get_rsv_str(srp, "     ", "", sizeof(b), b);
 	if (strlen(b) > 5)
diff --git a/include/uapi/scsi/sg.h b/include/uapi/scsi/sg.h
index 52eccedf2f33..148a5f2786ee 100644
--- a/include/uapi/scsi/sg.h
+++ b/include/uapi/scsi/sg.h
@@ -115,6 +115,7 @@ typedef struct sg_io_hdr {
 #define SGV4_FLAG_Q_AT_TAIL SG_FLAG_Q_AT_TAIL
 #define SGV4_FLAG_Q_AT_HEAD SG_FLAG_Q_AT_HEAD
 #define SGV4_FLAG_DOUT_OFFSET  0x40	/* dout byte offset in v4::spare_in */
+#define SGV4_FLAG_EVENTFD 0x80		/* signal completion on ... */
 #define SGV4_FLAG_COMPLETE_B4  0x100	/* mrq: complete this rq before next */
 #define SGV4_FLAG_SIGNAL 0x200	/* v3: ignored; v4 signal on completion */
 #define SGV4_FLAG_IMMED 0x400   /* issue request and return immediately ... */
@@ -196,7 +197,8 @@ typedef struct sg_req_info {	/* used by SG_GET_REQUEST_TABLE ioctl() */
 #define SG_SEIM_CHG_SHARE_FD	0x40	/* read-side given new write-side fd */
 #define SG_SEIM_SGAT_ELEM_SZ	0x80	/* sgat element size (>= PAGE_SIZE) */
 #define SG_SEIM_BLK_POLL	0x100	/* call blk_poll, uses 'num' field */
-#define SG_SEIM_ALL_BITS	0x1ff	/* should be OR of previous items */
+#define SG_SEIM_EVENTFD		0x200	/* pass eventfd to driver */
+#define SG_SEIM_ALL_BITS	0x3ff	/* should be OR of previous items */
 
 /* flag and mask values for boolean fields follow */
 #define SG_CTL_FLAGM_TIME_IN_NS	0x1	/* time: nanosecs (def: millisecs) */
@@ -214,7 +216,8 @@ typedef struct sg_req_info {	/* used by SG_GET_REQUEST_TABLE ioctl() */
 #define SG_CTL_FLAGM_MORE_ASYNC	0x800	/* yield EAGAIN in more cases */
 #define SG_CTL_FLAGM_EXCL_WAITQ 0x1000	/* only 1 wake up per response */
 #define SG_CTL_FLAGM_SNAP_DEV	0x2000	/* output to debugfs::snapped */
-#define SG_CTL_FLAGM_ALL_BITS	0x3fff	/* should be OR of previous items */
+#define SG_CTL_FLAGM_RM_EVENTFD	0x4000	/* only if new eventfd wanted */
+#define SG_CTL_FLAGM_ALL_BITS	0x7fff	/* should be OR of previous items */
 
 /* Write one of the following values to sg_extended_info::read_value, get... */
 #define SG_SEIRV_INT_MASK	0x0	/* get SG_SEIM_ALL_BITS */
@@ -253,7 +256,7 @@ struct sg_extended_info {
 	__u32	reserved_sz;	/* data/sgl size of pre-allocated request */
 	__u32	tot_fd_thresh;	/* total data/sgat for this fd, 0: no limit */
 	__u32	minor_index;	/* rd: kernel's sg device minor number */
-	__u32	share_fd;	/* SHARE_FD and CHG_SHARE_FD use this */
+	__u32	share_fd;	/* for SHARE_FD, CHG_SHARE_FD or EVENTFD */
 	__u32	sgat_elem_sz;	/* sgat element size (must be power of 2) */
 	__s32	num;		/* blk_poll: loop_count (-1 -> spin)) */
 	__u8	pad_to_96[48];	/* pad so struct is 96 bytes long */
-- 
2.25.1



* [PATCH v18 73/83] sg: table of error number explanations
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
                   ` (72 preceding siblings ...)
  2021-04-27 21:57 ` [PATCH v18 72/83] sg: add eventfd support Douglas Gilbert
@ 2021-04-27 21:57 ` Douglas Gilbert
  2021-04-27 21:57 ` [PATCH v18 74/83] sg: add ordered write flag Douglas Gilbert
                   ` (9 subsequent siblings)
  83 siblings, 0 replies; 86+ messages in thread
From: Douglas Gilbert @ 2021-04-27 21:57 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

Rather than keeping a piece of paper recording which errno
values have been used for what, the author thought: why not
place them in one table in the driver code?

As a guesstimate, over half the code in this driver is dedicated
to sanity checking and reporting errors. Those errors may come
from the host machine, the SCSI HBA or its associated hardware,
the transport, or the storage device. For near-end errors (those
detected by this driver itself) some creative license is taken
with errno values (e.g. ENOTSOCK when a file descriptor exists
but is not associated with the sg driver) to convey a better
sense of what this driver is objecting to.
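The convention the table documents (0 on success, otherwise a negated
errno value) can be illustrated with a small sketch. set_eventfd_slot()
and result_str() are hypothetical; the -EBUSY case loosely mirrors what
sg_eventfd_new() returns when an eventfd is already attached.

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical driver-style helper: returns 0 on success, otherwise a
 * negated errno value. A slot value < 0 means "empty".
 */
static int set_eventfd_slot(int *slot, int new_fd)
{
	if (new_fd < 0)
		return -EBADF;		/* bad file descriptor supplied */
	if (*slot >= 0)
		return -EBUSY;		/* one is already attached */
	*slot = new_fd;
	return 0;
}

/* Caller side: negate again before looking up the message text. */
static const char *result_str(int ret)
{
	return ret == 0 ? "success" : strerror(-ret);
}
```

User space sees the positive value in errno after ioctl(2) returns -1,
which is why the table lists the names without the minus sign.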

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index d030f7c43bf0..c4421a426045 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -453,6 +453,50 @@ static void sg_take_snap(struct sg_fd *sfp, bool clear_first);
 #define SG_LOG(depth, sfp, fmt, a...) do { } while (0)
 #endif	/* end of CONFIG_SCSI_LOGGING && SG_DEBUG conditional */
 
+/*
+ * Unless otherwise noted, functions that return int will return 0 for successful or a
+ * negated errno value. Here is list of errno_s generated by this driver:
+ *
+ * E2BIG	sum(dlen) > tot_fd_thresh ; write-side dxfer_len > read-side dxfer_len
+ * EACCES	user (process) does not have sufficient privilege or capabilities
+ * EADDRINUSE	sharing: write-side file descriptor already in share
+ * EADDRNOTAVAIL   sharing: read-side file descriptor already in share
+ *		   write-side request but no preceding read-side request
+ * EAGAIN	[aka EWOULDBLOCK]; occurs when O_NONBLOCK set on open() or
+ *		SGV4_FLAG_IMMED given, and SG_IORECEIVE (or read(2)) not ready
+ * EBADF	user provided fd to sg_fd_share() or sg_fd_reshare() is bad
+ * EBADFD	SG_FLAG_MMAP_IO given but no mmap() active
+ * EBUSY	'Device or resource busy'; this uses open(O_NONBLOCK) but another
+ *		has open(O_EXCL); reserve request in use (e.g. when mmap() called)
+ * EDOM		numerical error, command queueing false and second command
+ *		attempted when one already outstanding, mrq pack_id
+ * EFAULT	problem moving data to or from user space
+ * EFBIG	too many reserve requests on this file descriptor
+ * EINTR	interrupted system call (generated by kernel, not this driver)
+ * EINVAL	flags or other input information contradicts or disallowed
+ * EIO		only kept for backward compatibility, too generic to be useful
+ * ELOOP	sharing: file descriptor can't share with itself
+ * EMSGSIZE	cdb too long (> 252 bytes) or too short (less than 6 bytes)
+ * ENODATA	sharing: no data xfer requested; mmap or direct io problem
+ *		SG_IOABORT: no match on pack_id or tag; mrq: no active reqs
+ * ENODEV	target (SCSI) device associated with the fd has "disappeared"
+ * ENOMEM	obvious; could be some pre-allocated cache that is exhausted
+ * ENOMSG	data transfer setup needed or (direction) disallowed (sharing)
+ * ENOSTR	write-side request abandoned due to read-side error or state
+ * ENOTSOCK	sharing: file descriptor for sharing unassociated with sg driver
+ * ENXIO	'no such device or address' SCSI mid-level processing errors
+ *		(e.g. command timeouts); also sg info not in 'file' struct
+ * EPERM	not permitted (even if has ACCES); v1+2,v3,v4 interface usage
+ *		violation, opened read-only but SCSI command not listed read-only
+ * EPROTO	logic error (in driver); like "shouldn't get here"
+ * EPROTOTYPE	atomic state change failed unexpectedly
+ * ERANGE	multiple requests: usually bad flag values
+ * ERESTARTSYS	should not be seen in user space, associated with an
+ *		interruptable wait; will restart system call or give EINTR
+ * EWOULDBLOCK	[aka EAGAIN]; additionally if the 'more async' flag is set
+ *		SG_IOSUBMIT may yield this error
+ */
+
 /*
  * The SCSI interfaces that use read() and write() as an asynchronous variant of
  * ioctl(..., SG_IO, ...) are fundamentally unsafe, since there are lots of ways
-- 
2.25.1



* [PATCH v18 74/83] sg: add ordered write flag
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
                   ` (73 preceding siblings ...)
  2021-04-27 21:57 ` [PATCH v18 73/83] sg: table of error number explanations Douglas Gilbert
@ 2021-04-27 21:57 ` Douglas Gilbert
  2021-04-27 21:57 ` [PATCH v18 75/83] sg: expand source line length to 100 characters Douglas Gilbert
                   ` (8 subsequent siblings)
  83 siblings, 0 replies; 86+ messages in thread
From: Douglas Gilbert @ 2021-04-27 21:57 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

Add a new flag, SGV4_FLAG_ORDERED_WR, which is only used on the
control object in the "shared variable blocking" (svb) method of
multiple requests. Without this flag, write-side requests may be
issued in a different order from the order in which their
corresponding read-side requests were issued or, equivalently, the
order in which they appear in the request array. This occurs
because each write-side request is issued when its corresponding
read-side request completes, and those completions may arrive
out of order.

With this flag on the control object, read-side request completions
are processed strictly in the order they were issued. This leads
to the desired effect of having the write-side requests issued in
the same order that they appear in the command array (and after
their corresponding read-side completions).
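The ordering effect can be modelled with a small reorder-buffer
sketch. This is an assumption-laden illustration, not the driver's
mechanism: per the text, the driver waits for each read-side
completion in issue order, whereas this model records out-of-order
completions and releases them once all predecessors are done. The end
result (the write-side issue order) is the same. write_issue_order()
and MAX_PAIRS are hypothetical.

```c
#include <stdbool.h>

#define MAX_PAIRS 64	/* hypothetical bound for this sketch */

/*
 * done_order[] lists read-side indexes in the order their completions
 * arrive. The order in which write-side requests are issued is stored
 * in issue_order[].
 *   ordered == false: issue each write as soon as its read completes.
 *   ordered == true:  issue write k only after reads 0..k completed.
 * Returns the number of writes issued, or -1 on bad input.
 */
static int write_issue_order(const int *done_order, int n, bool ordered,
			     int *issue_order)
{
	bool done[MAX_PAIRS] = { false };
	int next = 0;	/* lowest index whose write is not yet issued */
	int k = 0;

	if (n < 0 || n > MAX_PAIRS)
		return -1;
	for (int i = 0; i < n; ++i) {
		int idx = done_order[i];

		if (idx < 0 || idx >= n || done[idx])
			return -1;
		done[idx] = true;
		if (!ordered) {
			issue_order[k++] = idx;
			continue;
		}
		/* release every completed read now at the head of the line */
		while (next < n && done[next])
			issue_order[k++] = next++;
	}
	return k;
}
```

With completions arriving as {2, 0, 3, 1}, the unordered case issues
writes 2, 0, 3, 1 while the ordered case issues 0, 1, 2, 3.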

In svb, the data length read and then written is likely to be the
same from one chunk to the next (perhaps smaller for the last
chunk). This tends to lead to the same write-side request object
being chosen as each read-write pair is processed, so provide the
previous write-side request object pointer as a candidate for the
current write-side object.
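A minimal sketch of the candidate check, assuming a simplified request
object: reuse the previous write-side object when it is idle and its
buffer can hold the current chunk, otherwise fall back to the normal
search. In the diff the candidate travels in
sg_comm_wr_t::possible_srp; struct req_obj and pick_ws_obj() below are
hypothetical stand-ins.

```c
#include <stddef.h>

/* Hypothetical stand-in for a request object with a data buffer. */
struct req_obj {
	size_t buf_len;		/* capacity of the attached data buffer */
	int busy;		/* non-zero while in use */
};

/*
 * Return the previous write-side object if it is idle and large enough
 * for this chunk; otherwise NULL, telling the caller to search for (or
 * allocate) another object.
 */
static struct req_obj *pick_ws_obj(struct req_obj *prev_ws, size_t dlen)
{
	if (prev_ws && !prev_ws->busy && prev_ws->buf_len >= dlen)
		return prev_ws;
	return NULL;
}
```

Because a shorter final chunk still fits, the same object is typically
reused for every pair in a copy.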

The important sg_setup_request() function is getting bloated again, so
factor out a sg_setup_req_new_srp() helper.

Clean up some variable names to lessen (the author's)
confusion. Also do some checkpatch work.

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c      | 1889 ++++++++++++++++++++++------------------
 include/uapi/scsi/sg.h |    1 +
 2 files changed, 1047 insertions(+), 843 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index c4421a426045..d6e18cb4df11 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -113,20 +113,21 @@ enum sg_shr_var {
 /* If sum_of(dlen) of a fd exceeds this, write() will yield E2BIG */
 #define SG_TOT_FD_THRESHOLD (32 * 1024 * 1024)
 
-#define SG_TIME_UNIT_MS 0	/* milliseconds */
-/* #define SG_TIME_UNIT_NS 1	   nanoseconds */
+#define SG_TIME_UNIT_MS 0	/* command duration unit: a millisecond */
+/* #define SG_TIME_UNIT_NS 1	   in nanoseconds, using high resolution timer (hrt) */
 #define SG_DEF_TIME_UNIT SG_TIME_UNIT_MS
 #define SG_DEFAULT_TIMEOUT mult_frac(SG_DEFAULT_TIMEOUT_USER, HZ, USER_HZ)
 #define SG_FD_Q_AT_HEAD 0
 #define SG_DEFAULT_Q_AT SG_FD_Q_AT_HEAD /* for backward compatibility */
 #define SG_FL_MMAP_DIRECT (SG_FLAG_MMAP_IO | SG_FLAG_DIRECT_IO)
 
-#define SG_MAX_RSV_REQS 8
+#define SG_MAX_RSV_REQS 8	/* number of svb requests done asynchronously; assume small-ish */
 
 #define SG_PACK_ID_WILDCARD (-1)
 #define SG_TAG_WILDCARD (-1)
 
 #define SG_ADD_RQ_MAX_RETRIES 40	/* to stop infinite _trylock(s) */
+#define SG_DEF_BLK_POLL_LOOP_COUNT 1000	/* may allow user to tweak this */
 
 /* Bit positions (flags) for sg_request::frq_bm bitmask follow */
 #define SG_FRQ_IS_V4I		0	/* true (set) when is v4 interface */
@@ -333,16 +334,19 @@ struct sg_comm_wr_t {  /* arguments to sg_common_write() */
 	struct sg_fd *sfp;
 	const u8 __user *u_cmdp;
 	const u8 *cmdp;
+	struct sg_request *possible_srp;	/* possible candidate for this request */
 };
 
-struct sg_mrq_hold {	/* for passing context between mrq functions */
-	bool blocking;
-	bool chk_abort;
-	bool immed;
-	bool stop_if;
-	bool co_mmap;
+struct sg_mrq_hold {	/* for passing context between multiple requests (mrq) functions */
+	unsigned from_sg_io:1;
+	unsigned chk_abort:1;
+	unsigned immed:1;
+	unsigned hipri:1;
+	unsigned stop_if:1;
+	unsigned co_mmap:1;
+	unsigned ordered_wr:1;
 	int id_of_mrq;
-	int s_res;		/* secondary error: some-good-then-error */
+	int s_res;		/* secondary error: some-good-then-error; in co.spare_out */
 	u32 cdb_mxlen;		/* cdb length in cdb_ap, actual be may less */
 	u32 tot_reqs;		/* total number of requests and cdb_s */
 	struct sg_comm_wr_t *cwrp;	/* cwrp->h4p is mrq control object */
@@ -351,6 +355,12 @@ struct sg_mrq_hold {	/* for passing context between mrq functions */
 	struct sg_scatter_hold *co_mmap_sgatp;
 };
 
+struct sg_svb_elem {	/* context of shared variable blocking (svb) per SG_MAX_RSV_REQS */
+	int ws_pos;			/* write-side position in user supplied sg_io_v4 array */
+	struct sg_request *rs_srp;	/* read-side object ptr, will be next */
+	struct sg_request *prev_ws_srp;	/* previous write-side object ptr, candidate for next */
+};
+
 /* tasklet or soft irq callback */
 static void sg_rq_end_io(struct request *rqq, blk_status_t status);
 /* Declarations of other static functions used before they are defined */
@@ -366,8 +376,6 @@ static int sg_receive_v3(struct sg_fd *sfp, struct sg_request *srp,
 static int sg_submit_v3(struct sg_fd *sfp, struct sg_io_hdr *hp, bool sync,
 			struct sg_request **o_srp);
 static struct sg_request *sg_common_write(struct sg_comm_wr_t *cwrp);
-static int sg_wait_event_srp(struct sg_fd *sfp, void __user *p,
-			     struct sg_io_v4 *h4p, struct sg_request *srp);
 static int sg_receive_v4(struct sg_fd *sfp, struct sg_request *srp,
 			 void __user *p, struct sg_io_v4 *h4p);
 static int sg_read_append(struct sg_request *srp, void __user *outp,
@@ -378,7 +386,6 @@ static void sg_remove_sfp(struct kref *);
 static void sg_remove_sfp_share(struct sg_fd *sfp, bool is_rd_side);
 static struct sg_request *sg_find_srp_by_id(struct sg_fd *sfp, int id,
 					    bool is_tag);
-static bool sg_mrq_get_ready_srp(struct sg_fd *sfp, struct sg_request **srpp);
 static struct sg_request *sg_setup_req(struct sg_comm_wr_t *cwrp,
 				       enum sg_shr_var sh_var);
 static void sg_deact_request(struct sg_fd *sfp, struct sg_request *srp);
@@ -387,9 +394,15 @@ static void sg_device_destroy(struct kref *kref);
 static struct sg_request *sg_mk_srp_sgat(struct sg_fd *sfp, bool first,
 					 int db_len);
 static int sg_abort_req(struct sg_fd *sfp, struct sg_request *srp);
+static int sg_rq_chg_state(struct sg_request *srp, enum sg_rq_state old_st,
+			   enum sg_rq_state new_st);
+static int sg_finish_rs_rq(struct sg_fd *sfp, struct sg_request *rs_srp,
+			   bool even_if_in_ws);
+static void sg_rq_chg_state_force(struct sg_request *srp, enum sg_rq_state new_st);
 static int sg_sfp_blk_poll(struct sg_fd *sfp, int loop_count);
 static int sg_srp_q_blk_poll(struct sg_request *srp, struct request_queue *q,
 			     int loop_count);
+
 #if IS_ENABLED(CONFIG_SCSI_LOGGING) && IS_ENABLED(SG_DEBUG)
 static const char *sg_rq_st_str(enum sg_rq_state rq_st, bool long_str);
 static const char *sg_shr_str(enum sg_shr_var sh_var, bool long_str);
@@ -492,7 +505,7 @@ static void sg_take_snap(struct sg_fd *sfp, bool clear_first);
  * EPROTOTYPE	atomic state change failed unexpectedly
  * ERANGE	multiple requests: usually bad flag values
  * ERESTARTSYS	should not be seen in user space, associated with an
- *		interruptable wait; will restart system call or give EINTR
+ *		interruptible wait; will restart system call or give EINTR
  * EWOULDBLOCK	[aka EAGAIN]; additionally if the 'more async' flag is set
  *		SG_IOSUBMIT may yield this error
  */
@@ -1144,6 +1157,71 @@ sg_mrq_arr_flush(struct sg_mrq_hold *mhp)
 	return 0;
 }
 
+static inline const char *
+sg_side_str(struct sg_request *srp)
+{
+	return (srp->sh_var == SG_SHR_WS_NOT_SRQ || srp->sh_var == SG_SHR_WS_RQ) ? "write_side" :
+										   "read-side";
+}
+
+static inline int
+sg_num_waiting_maybe_acquire(struct sg_fd *sfp)
+{
+	int num = atomic_read(&sfp->waiting);
+
+	if (num < 1)
+		num = atomic_read_acquire(&sfp->waiting);
+	return num;
+}
+
+/*
+ * Returns true if a request is ready and its srp is written to *srpp . If nothing can be found
+ * returns false and NULL --> *srpp . If device is detaching, returns true and NULL --> *srpp .
+ */
+static bool
+sg_mrq_get_ready_srp(struct sg_fd *sfp, struct sg_request **srpp)
+{
+	bool second = false;
+	int l_await_idx = READ_ONCE(sfp->low_await_idx);
+	unsigned long idx, s_idx, end_idx;
+	struct sg_request *srp;
+	struct xarray *xafp = &sfp->srp_arr;
+
+	if (SG_IS_DETACHING(sfp->parentdp)) {
+		*srpp = ERR_PTR(-ENODEV);
+		return true;
+	}
+	if (sg_num_waiting_maybe_acquire(sfp) < 1)
+		goto fini;
+
+	s_idx = (l_await_idx < 0) ? 0 : l_await_idx;
+	idx = s_idx;
+	end_idx = ULONG_MAX;
+
+second_time:
+	for (srp = xa_find(xafp, &idx, end_idx, SG_XA_RQ_AWAIT);
+	     srp;
+	     srp = xa_find_after(xafp, &idx, end_idx, SG_XA_RQ_AWAIT)) {
+		if (likely(sg_rq_chg_state(srp, SG_RQ_AWAIT_RCV, SG_RQ_BUSY) == 0)) {
+			*srpp = srp;
+			WRITE_ONCE(sfp->low_await_idx, idx + 1);
+			return true;
+		}
+	}
+	/* If not found so far, need to wrap around and search [0 ... s_idx) */
+	if (!srp && !second && s_idx > 0) {
+		end_idx = s_idx - 1;
+		s_idx = 0;
+		idx = s_idx;
+		second = true;
+		goto second_time;
+	}
+fini:
+	*srpp = NULL;
+	return false;
+}
+
+/* N.B. After this function is completed what srp points to should be considered invalid. */
 static int
 sg_mrq_1complet(struct sg_mrq_hold *mhp, struct sg_fd *sfp,
 		struct sg_request *srp)
@@ -1152,6 +1230,7 @@ sg_mrq_1complet(struct sg_mrq_hold *mhp, struct sg_fd *sfp,
 	int tot_reqs = mhp->tot_reqs;
 	struct sg_io_v4 *hp;
 	struct sg_io_v4 *a_hds = mhp->a_hds;
+	struct sg_io_v4 *cop = mhp->cwrp->h4p;
 
 	if (unlikely(!srp))
 		return -EPROTO;
@@ -1161,26 +1240,32 @@ sg_mrq_1complet(struct sg_mrq_hold *mhp, struct sg_fd *sfp,
 		       __func__, indx);
 		return -EPROTO;
 	}
-	SG_LOG(3, sfp, "%s: mrq_ind=%d, pack_id=%d\n", __func__, indx,
-	       srp->pack_id);
+	SG_LOG(3, sfp, "%s: %s, mrq_ind=%d, pack_id=%d\n", __func__,
+	       sg_side_str(srp), indx, srp->pack_id);
 	if (unlikely(indx < 0 || indx >= tot_reqs))
 		return -EPROTO;
 	hp = a_hds + indx;
 	s_res = sg_receive_v4(sfp, srp, NULL, hp);
+	if (unlikely(!sg_result_is_good(srp->rq_result)))
+		SG_LOG(2, sfp, "%s: %s, bad status: drv/tran/scsi=0x%x/0x%x/0x%x\n",
+		       __func__, sg_side_str(srp), hp->driver_status,
+		       hp->transport_status, hp->device_status);
 	if (unlikely(s_res == -EFAULT))
 		return s_res;
 	hp->info |= SG_INFO_MRQ_FINI;
+	++cop->info;
+	if (cop->din_xfer_len > 0)
+		--cop->din_resid;
 	if (mhp->co_mmap) {
 		sg_sgat_cp_into(mhp->co_mmap_sgatp, indx * SZ_SG_IO_V4,
 				(const u8 *)hp, SZ_SG_IO_V4);
 		if (sfp->async_qp && (hp->flags & SGV4_FLAG_SIGNAL))
 			kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN);
-		if (sfp->efd_ctxp && (srp->rq_flags & SGV4_FLAG_EVENTFD)) {
+		if (sfp->efd_ctxp && (hp->flags & SGV4_FLAG_EVENTFD)) {
 			u64 n = eventfd_signal(sfp->efd_ctxp, 1);
 
 			if (n != 1)
-				pr_info("%s: srp=%pK eventfd_signal problem\n",
-					__func__, srp);
+				pr_info("%s: eventfd_signal problem\n", __func__);
 		}
 	} else if (sfp->async_qp && (hp->flags & SGV4_FLAG_SIGNAL)) {
 		s_res = sg_mrq_arr_flush(mhp);
@@ -1192,7 +1277,7 @@ sg_mrq_1complet(struct sg_mrq_hold *mhp, struct sg_fd *sfp,
 }
 
 static int
-sg_wait_mrq_event(struct sg_fd *sfp, struct sg_request **srpp)
+sg_wait_any_mrq(struct sg_fd *sfp, struct sg_request **srpp)
 {
 	if (test_bit(SG_FFD_EXCL_WAITQ, sfp->ffd_bm))
 		return __wait_event_interruptible_exclusive
@@ -1202,6 +1287,159 @@ sg_wait_mrq_event(struct sg_fd *sfp, struct sg_request **srpp)
 					  sg_mrq_get_ready_srp(sfp, srpp));
 }
 
+static bool
+sg_srp_hybrid_sleep(struct sg_request *srp)
+{
+	struct hrtimer_sleeper hs;
+	enum hrtimer_mode mode;
+	ktime_t kt = ns_to_ktime(5000);
+
+	if (test_and_set_bit(SG_FRQ_POLL_SLEPT, srp->frq_bm))
+		return false;
+	if (kt == 0)
+		return false;
+
+	mode = HRTIMER_MODE_REL;
+	hrtimer_init_sleeper_on_stack(&hs, CLOCK_MONOTONIC, mode);
+	hrtimer_set_expires(&hs.timer, kt);
+
+	do {
+		if (atomic_read(&srp->rq_st) != SG_RQ_INFLIGHT)
+			break;
+		set_current_state(TASK_UNINTERRUPTIBLE);
+		hrtimer_sleeper_start_expires(&hs, mode);
+		if (hs.task)
+			io_schedule();
+		hrtimer_cancel(&hs.timer);
+		mode = HRTIMER_MODE_ABS;
+	} while (hs.task && !signal_pending(current));
+
+	__set_current_state(TASK_RUNNING);
+	destroy_hrtimer_on_stack(&hs.timer);
+	return true;
+}
+
+static inline bool
+sg_rq_landed(struct sg_device *sdp, struct sg_request *srp)
+{
+	return atomic_read_acquire(&srp->rq_st) != SG_RQ_INFLIGHT || SG_IS_DETACHING(sdp);
+}
+
+/* This is a blocking wait (or poll) for a given srp. */
+static int
+sg_wait_poll_for_given_srp(struct sg_fd *sfp, struct sg_request *srp, bool do_poll)
+{
+	int res;
+	struct sg_device *sdp = sfp->parentdp;
+
+	SG_LOG(3, sfp, "%s: do_poll=%d\n", __func__, (int)do_poll);
+	if (do_poll || (srp->rq_flags & SGV4_FLAG_HIPRI))
+		goto poll_loop;
+
+	if (atomic_read(&srp->rq_st) != SG_RQ_INFLIGHT)
+		goto skip_wait;		/* and skip _acquire() */
+	/* N.B. The SG_FFD_EXCL_WAITQ flag is ignored here. */
+	res = __wait_event_interruptible(sfp->cmpl_wait,
+					 sg_rq_landed(sdp, srp));
+	if (unlikely(res)) { /* -ERESTARTSYS because signal hit thread */
+		set_bit(SG_FRQ_IS_ORPHAN, srp->frq_bm);
+		/* orphans harvested when sfp->keep_orphan is false */
+		sg_rq_chg_state_force(srp, SG_RQ_INFLIGHT);
+		SG_LOG(1, sfp, "%s:  wait_event_interruptible(): %s[%d]\n",
+		       __func__, (res == -ERESTARTSYS ? "ERESTARTSYS" : ""),
+		       res);
+		return res;
+	}
+skip_wait:
+	if (SG_IS_DETACHING(sdp))
+		goto detaching;
+	return sg_rq_chg_state(srp, SG_RQ_AWAIT_RCV, SG_RQ_BUSY);
+poll_loop:
+	if (srp->rq_flags & SGV4_FLAG_HIPRI) {
+		long state = current->state;
+
+		do {
+			res = sg_srp_q_blk_poll(srp, sdp->device->request_queue,
+						SG_DEF_BLK_POLL_LOOP_COUNT);
+			if (res == -ENODATA || res > 0) {
+				__set_current_state(TASK_RUNNING);
+				break;
+			}
+			if (unlikely(res < 0)) {
+				__set_current_state(TASK_RUNNING);
+				return res;
+			}
+			if (signal_pending_state(state, current)) {
+				__set_current_state(TASK_RUNNING);
+				return -ERESTARTSYS;
+			}
+			if (SG_IS_DETACHING(sdp)) {
+				__set_current_state(TASK_RUNNING);
+				goto detaching;
+			}
+			cpu_relax();
+		} while (true);
+	} else {
+		enum sg_rq_state sr_st;
+
+		if (!sg_srp_hybrid_sleep(srp))
+			return -EINVAL;
+		if (signal_pending(current))
+			return -ERESTARTSYS;
+		if (SG_IS_DETACHING(sdp))
+			goto detaching;
+		sr_st = atomic_read(&srp->rq_st);
+		if (unlikely(sr_st != SG_RQ_AWAIT_RCV))
+			return -EPROTO;         /* Logic error */
+		return sg_rq_chg_state(srp, sr_st, SG_RQ_BUSY);
+	}
+	if (atomic_read_acquire(&srp->rq_st) != SG_RQ_AWAIT_RCV)
+		return (test_bit(SG_FRQ_COUNT_ACTIVE, srp->frq_bm) &&
+			atomic_read(&sfp->submitted) < 1) ? -ENODATA : 0;
+	return unlikely(sg_rq_chg_state(srp, SG_RQ_AWAIT_RCV, SG_RQ_BUSY)) ? -EPROTO : 0;
+
+detaching:
+	sg_rq_chg_state_force(srp, SG_RQ_INACTIVE);
+	atomic_inc(&sfp->inactives);
+	return -ENODEV;
+}
+
+static struct sg_request *
+sg_mrq_poll_either(struct sg_fd *sfp, struct sg_fd *sec_sfp, bool *on_sfp)
+{
+	bool sig_pending = false;
+	long state = current->state;
+	struct sg_request *srp;
+
+	do {		/* alternating polling loop */
+		if (sfp) {
+			if (sg_mrq_get_ready_srp(sfp, &srp)) {
+				if (!srp)
+					return ERR_PTR(-ENODEV);
+				*on_sfp = true;
+				__set_current_state(TASK_RUNNING);
+				return srp;
+			}
+		}
+		if (sec_sfp && sfp != sec_sfp) {
+			if (sg_mrq_get_ready_srp(sec_sfp, &srp)) {
+				if (!srp)
+					return ERR_PTR(-ENODEV);
+				*on_sfp = false;
+				__set_current_state(TASK_RUNNING);
+				return srp;
+			}
+		}
+		if (signal_pending_state(state, current)) {
+			sig_pending = true;
+			break;
+		}
+		cpu_relax();
+	} while (!need_resched());
+	__set_current_state(TASK_RUNNING);
+	return ERR_PTR(sig_pending ? -ERESTARTSYS : -EAGAIN);
+}
+
 /*
  * This is a fair-ish algorithm for an interruptible wait on two file
  * descriptors. It favours the main fd over the secondary fd (sec_sfp).
@@ -1211,48 +1449,31 @@ static int
 sg_mrq_complets(struct sg_mrq_hold *mhp, struct sg_fd *sfp,
 		struct sg_fd *sec_sfp, int mreqs, int sec_reqs)
 {
-	int res = 0;
-	int rres;
+	bool on_sfp;
+	int res;
 	struct sg_request *srp;
-	struct sg_io_v4 *cop = mhp->cwrp->h4p;
 
 	SG_LOG(3, sfp, "%s: mreqs=%d, sec_reqs=%d\n", __func__, mreqs,
 	       sec_reqs);
 	while (mreqs + sec_reqs > 0) {
 		while (mreqs > 0 && sg_mrq_get_ready_srp(sfp, &srp)) {
-			if (IS_ERR(srp)) {	/* -ENODATA: no mrqs here */
-				if (PTR_ERR(srp) == -ENODATA)
-					break;
-				res = PTR_ERR(srp);
-				break;
-			}
 			--mreqs;
 			res = sg_mrq_1complet(mhp, sfp, srp);
 			if (unlikely(res))
 				return res;
-			++cop->info;
-			if (cop->din_xfer_len > 0)
-				--cop->din_resid;
 		}
 		while (sec_reqs > 0 && sg_mrq_get_ready_srp(sec_sfp, &srp)) {
-			if (IS_ERR(srp)) {
-				if (PTR_ERR(srp) == -ENODATA)
-					break;
-				res = PTR_ERR(srp);
-				break;
-			}
 			--sec_reqs;
-			rres = sg_mrq_1complet(mhp, sec_sfp, srp);
-			if (unlikely(rres))
-				return rres;
-			++cop->info;
-			if (cop->din_xfer_len > 0)
-				--cop->din_resid;
+			res = sg_mrq_1complet(mhp, sec_sfp, srp);
+			if (unlikely(res))
+				return res;
 		}
+		if (mhp->hipri)
+			goto start_polling;
 		if (res)
 			break;
 		if (mreqs > 0) {
-			res = sg_wait_mrq_event(sfp, &srp);
+			res = sg_wait_any_mrq(sfp, &srp);
 			if (unlikely(res))
 				return res;	/* signal --> -ERESTARTSYS */
 			if (IS_ERR(srp)) {
@@ -1262,13 +1483,10 @@ sg_mrq_complets(struct sg_mrq_hold *mhp, struct sg_fd *sfp,
 				res = sg_mrq_1complet(mhp, sfp, srp);
 				if (unlikely(res))
 					return res;
-				++cop->info;
-				if (cop->din_xfer_len > 0)
-					--cop->din_resid;
 			}
 		}
 		if (sec_reqs > 0) {
-			res = sg_wait_mrq_event(sec_sfp, &srp);
+			res = sg_wait_any_mrq(sec_sfp, &srp);
 			if (unlikely(res))
 				return res;	/* signal --> -ERESTARTSYS */
 			if (IS_ERR(srp)) {
@@ -1278,20 +1496,43 @@ sg_mrq_complets(struct sg_mrq_hold *mhp, struct sg_fd *sfp,
 				res = sg_mrq_1complet(mhp, sec_sfp, srp);
 				if (unlikely(res))
 					return res;
-				++cop->info;
-				if (cop->din_xfer_len > 0)
-					--cop->din_resid;
 			}
 		}
 	}	/* end of outer while loop (while requests still inflight) */
-	return res;
+	return 0;
+start_polling:
+	while (mreqs + sec_reqs > 0) {
+		srp = sg_mrq_poll_either(sfp, sec_sfp, &on_sfp);
+		if (IS_ERR(srp))
+			return PTR_ERR(srp);
+		if (on_sfp) {
+			--mreqs;
+			res = sg_mrq_1complet(mhp, sfp, srp);
+			if (unlikely(res))
+				return res;
+		} else {
+			--sec_reqs;
+			res = sg_mrq_1complet(mhp, sec_sfp, srp);
+			if (unlikely(res))
+				return res;
+		}
+	}
+	return 0;
 }
 
-static int
-sg_mrq_sanity(struct sg_mrq_hold *mhp)
+/*
+ * Does one pass through the request array, looking mainly for bad flag settings and other
+ * contradictions such as setting the SGV4_FLAG_SHARE flag when no file share is set up. Code
+ * toward the end of the loop checks that share variable blocking (svb) uses a strict READ
+ * (like) then WRITE (like) sequence on all data carrying commands; also a dangling READ is
+ * not allowed at the end of an svb request array. Returns true if all checks pass.
+ */
+static bool
+sg_mrq_sanity(struct sg_mrq_hold *mhp, bool is_svb)
 {
 	bool last_is_keep_share = false;
-	bool share, have_mrq_sense;
+	bool expect_wr = false;
+	bool share, have_mrq_sense, have_file_share;
 	int k;
 	struct sg_io_v4 *cop = mhp->cwrp->h4p;
 	u32 cdb_alen = cop->request_len;
@@ -1304,149 +1545,116 @@ sg_mrq_sanity(struct sg_mrq_hold *mhp)
 	__maybe_unused const char *rip = "request index";
 
 	have_mrq_sense = (cop->response && cop->max_response_len);
+	have_file_share = sg_fd_is_shared(sfp);
+	if (is_svb && unlikely(!have_file_share)) {
+		SG_LOG(1, sfp, "%s: share variable blocking (svb) needs file share\n", __func__);
+		return false;
+	}
 	/* Pre-check each request for anomalies, plus some preparation */
 	for (k = 0, hp = a_hds; k < mhp->tot_reqs; ++k, ++hp) {
 		flags = hp->flags;
 		sg_v4h_partial_zero(hp);
-		if (unlikely(hp->guard != 'Q' || hp->protocol != 0 ||
-			     hp->subprotocol != 0)) {
-			SG_LOG(1, sfp, "%s: req index %u: %s or protocol\n",
-			       __func__, k, "bad guard");
-			return -ERANGE;
+		if (unlikely(hp->guard != 'Q' || hp->protocol != 0 || hp->subprotocol != 0)) {
+			SG_LOG(1, sfp, "%s: req index %u: bad guard or protocol\n", __func__, k);
+			return false;
 		}
-		last_is_keep_share = !!(flags & SGV4_FLAG_KEEP_SHARE);
 		if (unlikely(flags & SGV4_FLAG_MULTIPLE_REQS)) {
-			SG_LOG(1, sfp, "%s: %s %u: no nested multi-reqs\n",
-			       __func__, rip, k);
-			return -ERANGE;
+			SG_LOG(1, sfp, "%s: %s %u: no nested multi-reqs\n", __func__, rip, k);
+			return false;
 		}
 		share = !!(flags & SGV4_FLAG_SHARE);
-		if (mhp->immed) {/* only accept async submits on current fd */
-			if (unlikely(flags & SGV4_FLAG_DO_ON_OTHER)) {
-				SG_LOG(1, sfp, "%s: %s %u, %s\n", __func__,
-				       rip, k, "no IMMED with ON_OTHER");
-				return -ERANGE;
-			} else if (unlikely(share)) {
-				SG_LOG(1, sfp, "%s: %s %u, %s\n", __func__,
-				       rip, k, "no IMMED with FLAG_SHARE");
-				return -ERANGE;
-			} else if (unlikely(flags & SGV4_FLAG_COMPLETE_B4)) {
-				SG_LOG(1, sfp, "%s: %s %u, %s\n", __func__,
-				       rip, k, "no IMMED with COMPLETE_B4");
-				return -ERANGE;
-			}
+		last_is_keep_share = !!(flags & SGV4_FLAG_KEEP_SHARE);
+		if (mhp->immed &&
+		    unlikely(flags & (SGV4_FLAG_DO_ON_OTHER | SGV4_FLAG_COMPLETE_B4))) {
+			SG_LOG(1, sfp, "%s: %s %u, no IMMED with ON_OTHER or COMPLETE_B4\n",
+			       __func__, rip, k);
+			return false;
+		}
+		if (mhp->immed && unlikely(share)) {
+			SG_LOG(1, sfp, "%s: %s %u, no IMMED with FLAG_SHARE\n", __func__, rip, k);
+			return false;
 		}
 		if (mhp->co_mmap && (flags & SGV4_FLAG_MMAP_IO)) {
-			SG_LOG(1, sfp, "%s: %s %u, MMAP in co AND here\n",
-			       __func__, rip, k);
-			return -ERANGE;
+			SG_LOG(1, sfp, "%s: %s %u, MMAP in co AND here\n", __func__, rip, k);
+			return false;
 		}
-		if (!sg_fd_is_shared(sfp)) {
-			if (unlikely(share)) {
-				SG_LOG(1, sfp, "%s: %s %u, no share\n",
-				       __func__, rip, k);
-				return -ERANGE;
-			} else if (unlikely(flags & SGV4_FLAG_DO_ON_OTHER)) {
-				SG_LOG(1, sfp, "%s: %s %u, %s do on\n",
-				       __func__, rip, k, "no other fd to");
-				return -ERANGE;
-			}
+		if (unlikely(!have_file_share && share)) {
+			SG_LOG(1, sfp, "%s: %s %u, no file share\n", __func__, rip, k);
+			return false;
 		}
-		if (cdb_ap) {
-			if (unlikely(hp->request_len > cdb_mxlen)) {
-				SG_LOG(1, sfp, "%s: %s %u, cdb too long\n",
-				       __func__, rip, k);
-				return -ERANGE;
-			}
+		if (unlikely(!have_file_share && !!(flags & SGV4_FLAG_DO_ON_OTHER))) {
+			SG_LOG(1, sfp, "%s: %s %u, no other fd to do on\n", __func__, rip, k);
+			return false;
 		}
-		if (have_mrq_sense && hp->response == 0 &&
-		    hp->max_response_len == 0) {
+		if (cdb_ap && unlikely(hp->request_len > cdb_mxlen)) {
+			SG_LOG(1, sfp, "%s: %s %u, cdb too long\n", __func__, rip, k);
+			return false;
+		}
+		if (have_mrq_sense && hp->response == 0 && hp->max_response_len == 0) {
 			hp->response = cop->response;
 			hp->max_response_len = cop->max_response_len;
 		}
-	}
-	if (last_is_keep_share) {
-		SG_LOG(1, sfp,
-		       "%s: Can't set SGV4_FLAG_KEEP_SHARE on last mrq req\n",
-		       __func__);
-		return -ERANGE;
-	}
-	return 0;
-}
-
-/*
- * Read operation (din) must precede any write (dout) operations and a din
- * operation can't be last (data transferring) operations. Non data
- * transferring operations can appear anywhere. Data transferring operations
- * must have SGV4_FLAG_SHARE set. Dout operations must additionally have
- * SGV4_FLAG_NO_DXFER and SGV4_FLAG_DO_ON_OTHER set. Din operations must
- * not set SGV4_FLAG_DO_ON_OTHER.
- */
-static bool
-sg_mrq_svb_chk(struct sg_io_v4 *a_hds, u32 tot_reqs)
-{
-	bool last_rd = false;
-	bool seen_wr = false;
-	int k;
-	u32 flags;
-	struct sg_io_v4 *hp;
-
-	/* expect read-write pairs, all with SGV4_FLAG_NO_DXFER set */
-	for (k = 0, hp = a_hds; k < tot_reqs; ++k, ++hp) {
-		flags = hp->flags;
-		if (flags & SGV4_FLAG_COMPLETE_B4)
+		if (!is_svb)
+			continue;
+		/* mrq share variable blocking (svb) additional constraints checked here */
+		if (unlikely(flags & (SGV4_FLAG_COMPLETE_B4 | SGV4_FLAG_KEEP_SHARE))) {
+			SG_LOG(1, sfp, "%s: %s %u: no COMPLETE_B4/KEEP_SHARE with svb\n", __func__, rip, k);
 			return false;
-		if (!seen_wr) {
+		}
+		if (!expect_wr) {
 			if (hp->dout_xfer_len > 0)
-				return false;
+				goto bad_svb;
 			if (hp->din_xfer_len > 0) {
 				if (!(flags & SGV4_FLAG_SHARE))
-					return false;
+					goto bad_svb;
 				if (flags & SGV4_FLAG_DO_ON_OTHER)
-					return false;
-				seen_wr = true;
-				last_rd = true;
+					goto bad_svb;
+				expect_wr = true;
 			}
-			/* allowing commands with no dxfer */
+			/* allowing commands with no dxfer (in both cases) */
 		} else {	/* checking write side */
 			if (hp->dout_xfer_len > 0) {
-				if (~flags &
-				    (SGV4_FLAG_NO_DXFER | SGV4_FLAG_SHARE |
-				     SGV4_FLAG_DO_ON_OTHER))
-					return false;
-				last_rd = false;
-			}
-			if (hp->din_xfer_len > 0) {
-				if (!(flags & SGV4_FLAG_SHARE))
-					return false;
-				if (flags & SGV4_FLAG_DO_ON_OTHER)
-					return false;
-				last_rd = true;
+				if (unlikely(~flags & (SGV4_FLAG_NO_DXFER | SGV4_FLAG_SHARE |
+						       SGV4_FLAG_DO_ON_OTHER)))
+					goto bad_svb;
+				expect_wr = false;
+			} else if (unlikely(hp->din_xfer_len > 0)) {
+				goto bad_svb;
 			}
 		}
+	}		/* end of request array iterating loop */
+	if (last_is_keep_share) {
+		SG_LOG(1, sfp, "%s: Can't set SGV4_FLAG_KEEP_SHARE on last mrq req\n", __func__);
+		return false;
+	}
+	if (is_svb && expect_wr) {
+		SG_LOG(1, sfp, "%s: svb: unpaired READ at end of request array\n", __func__);
+		return false;
 	}
-	return !last_rd;
+	return true;
+bad_svb:
+	SG_LOG(1, sfp, "%s: %s %u: svb alternating read-then-write or flags bad\n", __func__,
+	       rip, k);
+	return false;
 }
-
 static struct sg_request *
-sg_mrq_submit(struct sg_fd *rq_sfp, struct sg_mrq_hold *mhp, int pos_hdr,
-	      int rsv_idx, bool keep_share)
+sg_mrq_submit(struct sg_fd *rq_sfp, struct sg_mrq_hold *mhp, int pos_in_rq_arr, int rsv_idx,
+	      struct sg_request *possible_srp)
 {
 	unsigned long ul_timeout;
 	struct sg_comm_wr_t r_cwr;
 	struct sg_comm_wr_t *r_cwrp = &r_cwr;
-	struct sg_io_v4 *hp = mhp->a_hds + pos_hdr;
+	struct sg_io_v4 *hp = mhp->a_hds + pos_in_rq_arr;
 
 	sg_comm_wr_init(r_cwrp);
 	if (mhp->cdb_ap)	/* already have array of cdbs */
-		r_cwrp->cmdp = mhp->cdb_ap + (pos_hdr * mhp->cdb_mxlen);
+		r_cwrp->cmdp = mhp->cdb_ap + (pos_in_rq_arr * mhp->cdb_mxlen);
 	else			/* fetch each cdb from user space */
 		r_cwrp->u_cmdp = cuptr64(hp->request);
 	r_cwrp->cmd_len = hp->request_len;
 	r_cwrp->rsv_idx = rsv_idx;
 	ul_timeout = msecs_to_jiffies(hp->timeout);
-	__assign_bit(SG_FRQ_SYNC_INVOC, r_cwrp->frq_bm,
-		     (int)mhp->blocking);
+	__assign_bit(SG_FRQ_SYNC_INVOC, r_cwrp->frq_bm, (int)mhp->from_sg_io);
 	__set_bit(SG_FRQ_IS_V4I, r_cwrp->frq_bm);
 	r_cwrp->h4p = hp;
 	r_cwrp->dlen = hp->din_xfer_len ? hp->din_xfer_len : hp->dout_xfer_len;
@@ -1454,7 +1662,7 @@ sg_mrq_submit(struct sg_fd *rq_sfp, struct sg_mrq_hold *mhp, int pos_hdr,
 	if (hp->flags & SGV4_FLAG_DOUT_OFFSET)
 		r_cwrp->wr_offset = hp->spare_in;
 	r_cwrp->sfp = rq_sfp;
-	r_cwrp->keep_share = keep_share;
+	r_cwrp->possible_srp = possible_srp;
 	return sg_common_write(r_cwrp);
 }
 
@@ -1490,7 +1698,7 @@ sg_process_most_mrq(struct sg_fd *fp, struct sg_fd *o_sfp,
 		}
 		flags = hp->flags;
 		rq_sfp = (flags & SGV4_FLAG_DO_ON_OTHER) ? o_sfp : fp;
-		srp = sg_mrq_submit(rq_sfp, mhp, j, -1, false);
+		srp = sg_mrq_submit(rq_sfp, mhp, j, -1, NULL);
 		if (IS_ERR(srp)) {
 			mhp->s_res = PTR_ERR(srp);
 			break;
@@ -1499,50 +1707,24 @@ sg_process_most_mrq(struct sg_fd *fp, struct sg_fd *o_sfp,
 		if (mhp->chk_abort)
 			atomic_set(&srp->s_hdr4.pack_id_of_mrq,
 				   mhp->id_of_mrq);
-		if (mhp->immed ||
-		    (!(mhp->blocking || (flags & shr_complet_b4)))) {
+		if (mhp->immed || (!(mhp->from_sg_io || (flags & shr_complet_b4)))) {
 			if (fp == rq_sfp)
 				++this_fp_sent;
 			else
 				++other_fp_sent;
 			continue;  /* defer completion until all submitted */
 		}
-		mhp->s_res = sg_wait_event_srp(rq_sfp, NULL, hp, srp);
+		mhp->s_res = sg_wait_poll_for_given_srp(rq_sfp, srp, mhp->hipri);
 		if (unlikely(mhp->s_res)) {
-			if (mhp->s_res == -ERESTARTSYS)
+			if (mhp->s_res == -ERESTARTSYS || mhp->s_res == -ENODEV)
 				return mhp->s_res;
 			break;
 		}
+		res = sg_mrq_1complet(mhp, rq_sfp, srp);
+		if (unlikely(res))
+			break;
 		++num_cmpl;
-		hp->info |= SG_INFO_MRQ_FINI;
-		if (mhp->stop_if && !sg_result_is_good(srp->rq_result)) {
-			SG_LOG(2, fp, "%s: %s=0x%x/0x%x/0x%x] cause exit\n",
-			       __func__, "STOP_IF and status [drv/tran/scsi",
-			       hp->driver_status, hp->transport_status,
-			       hp->device_status);
-			break;	/* cop->driver_status <-- 0 in this case */
-		}
-		if (mhp->co_mmap) {
-			sg_sgat_cp_into(mhp->co_mmap_sgatp, j * SZ_SG_IO_V4,
-					(const u8 *)hp, SZ_SG_IO_V4);
-			if (rq_sfp->async_qp && (hp->flags & SGV4_FLAG_SIGNAL))
-				kill_fasync(&rq_sfp->async_qp, SIGPOLL,
-					    POLL_IN);
-			if (rq_sfp->efd_ctxp &&
-			    (srp->rq_flags & SGV4_FLAG_EVENTFD)) {
-				u64 n = eventfd_signal(rq_sfp->efd_ctxp, 1);
-
-				if (n != 1)
-					pr_info("%s: eventfd_signal prob\n",
-						__func__);
-			}
-		} else if (rq_sfp->async_qp &&
-			   (hp->flags & SGV4_FLAG_SIGNAL)) {
-			res = sg_mrq_arr_flush(mhp);
-			if (unlikely(res))
-				break;
-			kill_fasync(&rq_sfp->async_qp, SIGPOLL, POLL_IN);
-		}
+
 	}	/* end of dispatch request and optionally wait response loop */
 	cop->dout_resid = mhp->tot_reqs - num_subm;
 	cop->info = mhp->immed ? num_subm : num_cmpl;
@@ -1565,238 +1747,342 @@ sg_process_most_mrq(struct sg_fd *fp, struct sg_fd *o_sfp,
 	return res;
 }
 
+/* For multiple requests (mrq) share variable blocking (svb) with no SGV4_FLAG_ORDERED_WR */
 static int
-sg_find_srp_idx(struct sg_fd *sfp, const struct sg_request *srp)
+sg_svb_mrq_first_come(struct sg_fd *fp, struct sg_fd *o_sfp, struct sg_mrq_hold *mhp, int ra_ind,
+		      int *num_submp)
 {
-	int k;
-	struct sg_request **rapp = sfp->rsv_arr;
+	bool chk_oth_first = false;
+	bool is_first = true;
+	enum sg_rq_state rq_st;
+	int this_fp_sent = 0;
+	int other_fp_sent = 0;
+	int res = 0;
+	int first_err = 0;
+	int k, m, idx, ws_pos, num_reads, sent, dir;
+	struct sg_io_v4 *hp = mhp->a_hds + ra_ind;
+	struct sg_request *srp;
+	struct sg_request *rs_srp;
+	struct sg_svb_elem svb_arr[SG_MAX_RSV_REQS];
+
+	memset(svb_arr, 0, sizeof(svb_arr));
+	for (k = 0; k < SG_MAX_RSV_REQS && ra_ind < mhp->tot_reqs;
+	     ++hp, ++ra_ind, is_first = false) {
+		if (hp->flags & SGV4_FLAG_DO_ON_OTHER) {
+			if (hp->dout_xfer_len > 0) {	/* need to await read-side completion */
+				svb_arr[k].ws_pos = ra_ind;
+				++k;
+				continue;  /* deferred to next loop */
+			}
+			if (is_first)
+				chk_oth_first = true;
+			SG_LOG(6, o_sfp, "%s: subm-nodat p_id=%d on write-side\n", __func__,
+			       (int)hp->request_extra);
+			srp = sg_mrq_submit(o_sfp, mhp, ra_ind, -1, NULL);
+			if (!IS_ERR(srp))
+				++other_fp_sent;
+		} else {
+			rs_srp = (hp->din_xfer_len > 0) ? svb_arr[k].rs_srp : NULL;
+			SG_LOG(6, fp, "%s: submit p_id=%d on read-side\n", __func__,
+			       (int)hp->request_extra);
+			srp = sg_mrq_submit(fp, mhp, ra_ind, -1, rs_srp);
+			if (!IS_ERR(srp))
+				++this_fp_sent;
+		}
+		if (IS_ERR(srp)) {
+			mhp->s_res = PTR_ERR(srp);
+			if (first_err == 0)
+				first_err = mhp->s_res;
+			SG_LOG(1, fp, "%s: sg_mrq_submit() err: %d\n", __func__, mhp->s_res);
+			break;	/* stop doing rs submits */
+		}
+		++*num_submp;
+		if (hp->din_xfer_len > 0)
+			svb_arr[k].rs_srp = srp;
+		srp->s_hdr4.mrq_ind = ra_ind;
+		if (mhp->chk_abort)
+			atomic_set(&srp->s_hdr4.pack_id_of_mrq, mhp->id_of_mrq);
+	}	/* end of read-side submission, write-side defer loop */
 
-	for (k = 0; k < SG_MAX_RSV_REQS; ++k, ++rapp) {
-		if (*rapp == srp)
-			return k;
+	num_reads = k;
+	sent = this_fp_sent + other_fp_sent;
+
+	for (k = 0; k < sent; ++k) {
+		if (other_fp_sent > 0 && sg_mrq_get_ready_srp(o_sfp, &srp)) {
+other_found:
+			--other_fp_sent;
+			res = sg_mrq_1complet(mhp, o_sfp, srp);
+			if (unlikely(res))
+				break;
+			continue;  /* do available submits first */
+		}
+		if (this_fp_sent > 0 && sg_mrq_get_ready_srp(fp, &srp)) {
+this_found:
+			--this_fp_sent;
+			dir = srp->s_hdr4.dir;
+			res = sg_mrq_1complet(mhp, fp, srp);
+			if (unlikely(res))
+				break;
+			if (dir != SG_DXFER_FROM_DEV)
+				continue;
+			if (test_bit(SG_FFD_READ_SIDE_ERR, fp->ffd_bm))
+				continue;
+			/* read-side req completed, submit its write-side(s) */
+			rs_srp = srp;
+			for (m = 0; m < num_reads; ++m) {
+				if (rs_srp == svb_arr[m].rs_srp)
+					break;
+			}
+			if (m >= num_reads) {
+				SG_LOG(1, fp, "%s: rs [pack_id=%d]: missing ws\n", __func__,
+				       srp->pack_id);
+				continue;
+			}
+			rq_st = atomic_read(&rs_srp->rq_st);
+			if (rq_st == SG_RQ_INACTIVE)
+				continue;       /* probably an error, bypass paired write-side rq */
+			else if (rq_st != SG_RQ_SHR_SWAP) {
+				SG_LOG(1, fp, "%s: expect rs_srp to be in shr_swap\n", __func__);
+				res = -EPROTO;
+				break;
+			}
+			ws_pos = svb_arr[m].ws_pos;
+			for (idx = 0; idx < SG_MAX_RSV_REQS; ++idx) {
+				if (fp->rsv_arr[idx] == srp)
+					break;
+			}
+			if (idx >= SG_MAX_RSV_REQS) {
+				SG_LOG(1, fp, "%s: srp not in rsv_arr\n", __func__);
+				res = -EPROTO;
+				break;
+			}
+			SG_LOG(6, o_sfp, "%s: ws_pos=%d, rs_idx=%d\n", __func__, ws_pos, idx);
+			srp = sg_mrq_submit(o_sfp, mhp, ws_pos, idx, svb_arr[m].prev_ws_srp);
+			if (IS_ERR(srp)) {
+				mhp->s_res = PTR_ERR(srp);
+				if (mhp->s_res == -EFBIG) {	/* out of reserve slots */
+					if (first_err)
+						break;
+					res = mhp->s_res;
+					break;
+				}
+				if (first_err == 0)
+					first_err = mhp->s_res;
+				svb_arr[m].prev_ws_srp = NULL;
+				SG_LOG(1, o_sfp, "%s: mrq_submit(oth)->%d\n", __func__, mhp->s_res);
+				continue;
+			}
+			svb_arr[m].prev_ws_srp = srp;
+			++*num_submp;
+			++other_fp_sent;
+			++sent;
+			srp->s_hdr4.mrq_ind = ws_pos;
+			if (mhp->chk_abort)
+				atomic_set(&srp->s_hdr4.pack_id_of_mrq, mhp->id_of_mrq);
+			continue;  /* do available submits first */
+		}
+		/* waits may be interrupted by signals (-ERESTARTSYS) */
+		if (chk_oth_first)
+			goto oth_first;
+this_second:
+		if (this_fp_sent > 0) {
+			res = sg_wait_any_mrq(fp, &srp);
+			if (unlikely(res))
+				break;
+			goto this_found;
+		}
+		if (chk_oth_first)
+			continue;
+oth_first:
+		if (other_fp_sent > 0) {
+			res = sg_wait_any_mrq(o_sfp, &srp);
+			if (unlikely(res))
+				break;
+			goto other_found;
+		}
+		if (chk_oth_first)
+			goto this_second;
+	}	/* end of loop for deferred ws submits and all responses */
+
+	if (res == 0 && first_err)
+		res = first_err;
+	return res;
+}
+
+static int
+sg_svb_mrq_ordered(struct sg_fd *fp, struct sg_fd *o_sfp, struct sg_mrq_hold *mhp, int ra_ind,
+		   int *num_submp)
+{
+	enum sg_rq_state rq_st;
+	int k, m, res, idx, ws_pos, num_reads;
+	int this_fp_sent = 0;
+	int other_fp_sent = 0;
+	struct sg_io_v4 *hp = mhp->a_hds + ra_ind;
+	struct sg_request *srp;
+	struct sg_request *rs_srp;
+	struct sg_svb_elem svb_arr[SG_MAX_RSV_REQS];
+
+	memset(svb_arr, 0, sizeof(svb_arr));
+	for (k = 0; k < SG_MAX_RSV_REQS && ra_ind < mhp->tot_reqs; ++hp, ++ra_ind) {
+		if (hp->flags & SGV4_FLAG_DO_ON_OTHER) {
+			if (hp->dout_xfer_len > 0) {
+				/* need to await read-side completion */
+				svb_arr[k].ws_pos = ra_ind;
+				++k;
+				continue;  /* deferred to next loop */
+			}
+			SG_LOG(6, o_sfp, "%s: subm-nodat p_id=%d on write-side\n", __func__,
+			       (int)hp->request_extra);
+			srp = sg_mrq_submit(o_sfp, mhp, ra_ind, -1, NULL);
+			if (!IS_ERR(srp))
+				++other_fp_sent;
+		} else {
+			rs_srp = (hp->din_xfer_len > 0) ? svb_arr[k].rs_srp : NULL;
+			SG_LOG(6, fp, "%s: submit p_id=%d on read-side\n", __func__,
+			       (int)hp->request_extra);
+			srp = sg_mrq_submit(fp, mhp, ra_ind, -1, rs_srp);
+			if (!IS_ERR(srp))
+				++this_fp_sent;
+		}
+		if (IS_ERR(srp)) {
+			mhp->s_res = PTR_ERR(srp);
+			res = mhp->s_res;	/* don't loop again */
+			SG_LOG(1, fp, "%s: sg_mrq_submit() err: %d\n", __func__, res);
+			break;
+		}
+		++*num_submp;
+		if (hp->din_xfer_len > 0)
+			svb_arr[k].rs_srp = srp;
+		srp->s_hdr4.mrq_ind = ra_ind;
+		if (mhp->chk_abort)
+			atomic_set(&srp->s_hdr4.pack_id_of_mrq, mhp->id_of_mrq);
+	}	/* end of first, inner for loop */
+
+	num_reads = k;
+
+	if (this_fp_sent + other_fp_sent <= 0)
+		return 0;
+	for (m = 0; m < num_reads; ++m) {
+		rs_srp = svb_arr[m].rs_srp;
+		if (!rs_srp)
+			continue;
+		res = sg_wait_poll_for_given_srp(fp, rs_srp, mhp->hipri);
+		if (unlikely(res))
+			return res;
+		--this_fp_sent;
+		res = sg_mrq_1complet(mhp, fp, rs_srp);
+		if (unlikely(res))
+			return res;
+		if (test_bit(SG_FFD_READ_SIDE_ERR, fp->ffd_bm))
+			continue;
+		rq_st = atomic_read(&rs_srp->rq_st);
+		if (rq_st == SG_RQ_INACTIVE)
+			continue;       /* probably an error, bypass paired write-side rq */
+		else if (rq_st != SG_RQ_SHR_SWAP) {
+			SG_LOG(1, fp, "%s: expect rs_srp to be in shr_swap\n", __func__);
+			res = -EPROTO;
+			break;
+		}
+		ws_pos = svb_arr[m].ws_pos;
+		for (idx = 0; idx < SG_MAX_RSV_REQS; ++idx) {
+			if (fp->rsv_arr[idx] == rs_srp)
+				break;
+		}
+		if (idx >= SG_MAX_RSV_REQS) {
+			SG_LOG(1, rs_srp->parentfp, "%s: srp not in rsv_arr\n", __func__);
+			res = -EPROTO;
+			return res;
+		}
+		SG_LOG(6, o_sfp, "%s: ws_pos=%d, rs_idx=%d\n", __func__, ws_pos, idx);
+		srp = sg_mrq_submit(o_sfp, mhp, ws_pos, idx, svb_arr[m].prev_ws_srp);
+		if (IS_ERR(srp)) {
+			mhp->s_res = PTR_ERR(srp);
+			res = mhp->s_res;
+			SG_LOG(1, o_sfp,
+			       "%s: mrq_submit(oth)->%d\n",
+			       __func__, res);
+			return res;
+		}
+		svb_arr[m].prev_ws_srp = srp;
+		++*num_submp;
+		++other_fp_sent;
+		srp->s_hdr4.mrq_ind = ws_pos;
+		if (mhp->chk_abort)
+			atomic_set(&srp->s_hdr4.pack_id_of_mrq,
+				   mhp->id_of_mrq);
 	}
-	return -1;
+	while (this_fp_sent > 0) {	/* non-data requests */
+		res = sg_wait_any_mrq(fp, &srp);
+		if (unlikely(res))
+			return res;
+		--this_fp_sent;
+		res = sg_mrq_1complet(mhp, fp, srp);
+		if (unlikely(res))
+			return res;
+	}
+	while (other_fp_sent > 0) {
+		res = sg_wait_any_mrq(o_sfp, &srp);
+		if (unlikely(res))
+			return res;
+		--other_fp_sent;
+		res = sg_mrq_1complet(mhp, o_sfp, srp);
+		if (unlikely(res))
+			return res;
+	}
+	return 0;
 }
 
 /*
- * Processes shared variable blocking. First inner loop submits a chunk of
- * requests (some read-side, some non-data) but defers any write-side requests. The
- * second inner loop processes the completions from the first inner loop, plus
- * for any completed read-side request it submits the paired write-side request. The
- * second inner loop also waits for the completions of those write-side requests.
- * The outer loop then moves onto the next chunk, working its way through
- * the multiple requests. The user sees a blocking command, but the chunks
- * are run in parallel apart from read-write ordering requirement.
- * N.B. Only one svb mrq permitted per file descriptor at a time.
+ * Processes the shared variable blocking (svb) method for multiple requests (mrq). There are two
+ * variants: unordered write-side requests; and ordered write-side requests. The read-side requests
+ * are always issued in the order specified in the request array. The unordered write-side requests
+ * are processed on a "first come, first served" basis, with the majority of the work done by
+ * sg_svb_mrq_first_come(). Likewise sg_svb_mrq_ordered() does the majority of the work for the
+ * ordered write-side variant. Each of those two functions processes one "chunk" of requests at a
+ * time. This function loops until the request array is exhausted and does some clean-up. N.B. the
+ * "only one svb mrq per fd" rule is enforced by the SG_FFD_SVB_ACTIVE file descriptor flag.
  */
 static int
 sg_process_svb_mrq(struct sg_fd *fp, struct sg_fd *o_sfp,
 		   struct sg_mrq_hold *mhp)
 {
 	bool aborted = false;
-	bool chk_oth_first, keep_share;
-	int k, j, i, m, rcv_before, idx, ws_pos, sent;
-	int this_fp_sent, other_fp_sent;
+	int j, delta_subm, subm_before, cmpl_before;
 	int num_subm = 0;
 	int num_cmpl = 0;
 	int res = 0;
-	struct sg_fd *rq_sfp;
 	struct sg_io_v4 *cop = mhp->cwrp->h4p;
-	struct sg_io_v4 *hp;		/* ptr to request object in a_hds */
-	struct sg_request *srp;
-	struct sg_request *rs_srp;
-	struct sg_io_v4 *a_hds = mhp->a_hds;
-	int ws_pos_a[SG_MAX_RSV_REQS];	/* write-side hdr pos within a_hds */
-	struct sg_request *rs_srp_a[SG_MAX_RSV_REQS];
 
 	SG_LOG(3, fp, "%s: id_of_mrq=%d, tot_reqs=%d, enter\n", __func__,
 	       mhp->id_of_mrq, mhp->tot_reqs);
 
-	/* work through mrq array, SG_MAX_RSV_REQS read-side requests at a time */
-	for (hp = a_hds, j = 0; j < mhp->tot_reqs; ) {
-		this_fp_sent = 0;
-		other_fp_sent = 0;
-		chk_oth_first = false;
-		for (k = 0; k < SG_MAX_RSV_REQS && j < mhp->tot_reqs;
-		     ++hp, ++j) {
-			if (mhp->chk_abort &&
-			    test_and_clear_bit(SG_FFD_MRQ_ABORT, fp->ffd_bm)) {
-				SG_LOG(1, fp,
-				       "%s: id_of_mrq=%d aborting at pos=%d\n",
-				       __func__, mhp->id_of_mrq, num_subm);
-				aborted = true;
-				/*
-				 * after mrq abort detected, complete those
-				 * already submitted, but don't submit any more
-				 */
-			}
-			if (aborted)
-				break;
-			if (hp->flags & SGV4_FLAG_DO_ON_OTHER) {
-				if (hp->dout_xfer_len > 0) {
-					/* need to await read-side completion */
-					ws_pos_a[k] = j;
-					++k;
-					continue;  /* deferred to next loop */
-				}
-				chk_oth_first = true;
-				SG_LOG(6, o_sfp,
-				       "%s: subm-nodat p_id=%d on write-side\n",
-				       __func__, (int)hp->request_extra);
-				rq_sfp = o_sfp;
-			} else {
-				SG_LOG(6, fp, "%s: submit p_id=%d on read-side\n",
-				       __func__, (int)hp->request_extra);
-				rq_sfp = fp;
-			}
-			srp = sg_mrq_submit(rq_sfp, mhp, j, -1, false);
-			if (IS_ERR(srp)) {
-				mhp->s_res = PTR_ERR(srp);
-				res = mhp->s_res;	/* don't loop again */
-				SG_LOG(1, rq_sfp, "%s: mrq_submit()->%d\n",
-				       __func__, res);
-				break;
-			}
-			num_subm++;
-			if (hp->din_xfer_len > 0)
-				rs_srp_a[k] = srp;
-			srp->s_hdr4.mrq_ind = j;
-			if (mhp->chk_abort)
-				atomic_set(&srp->s_hdr4.pack_id_of_mrq,
-					   mhp->id_of_mrq);
-			if (fp == rq_sfp)
-				++this_fp_sent;
-			else
-				++other_fp_sent;
+	/* outer loop: SG_MAX_RSV_REQS read-side requests (chunks) at a time */
+	for (j = 0; j < mhp->tot_reqs; j += delta_subm) {
+		if (mhp->chk_abort &&
+		    test_and_clear_bit(SG_FFD_MRQ_ABORT, fp->ffd_bm)) {
+			SG_LOG(1, fp, "%s: id_of_mrq=%d aborting at pos=%d\n", __func__,
+			       mhp->id_of_mrq, num_subm);
+			aborted = true;
 		}
-		sent = this_fp_sent + other_fp_sent;
-		if (sent <= 0)
+		if (aborted)
+			break;
+
+		subm_before = num_subm;
+		cmpl_before = cop->info;
+		if (mhp->ordered_wr)
+			res = sg_svb_mrq_ordered(fp, o_sfp, mhp, j, &num_subm);
+		else	/* write-side request done on first come, first served basis */
+			res = sg_svb_mrq_first_come(fp, o_sfp, mhp, j, &num_subm);
+		delta_subm = num_subm - subm_before;
+		num_cmpl += (cop->info - cmpl_before);
+		if (res || delta_subm == 0)	/* error or didn't make progress */
 			break;
-		/*
-		 * We have just submitted a fixed number read-side reqs and any
-		 * others (that don't move data). Now we pick up their
-		 * responses. Any responses that were read-side requests have
-		 * their paired write-side submitted. Finally we wait for those
-		 * paired write-side to complete.
-		 */
-		rcv_before = cop->info;
-		for (i = 0; i < sent; ++i) {	/* now process responses */
-			if (other_fp_sent > 0 &&
-			    sg_mrq_get_ready_srp(o_sfp, &srp)) {
-other_found:
-				if (IS_ERR(srp)) {
-					res = PTR_ERR(srp);
-					break;
-				}
-				--other_fp_sent;
-				res = sg_mrq_1complet(mhp, o_sfp, srp);
-				if (unlikely(res))
-					return res;
-				++cop->info;
-				if (cop->din_xfer_len > 0)
-					--cop->din_resid;
-				continue;  /* do available submits first */
-			}
-			if (this_fp_sent > 0 &&
-			    sg_mrq_get_ready_srp(fp, &srp)) {
-this_found:
-				if (IS_ERR(srp)) {
-					res = PTR_ERR(srp);
-					break;
-				}
-				--this_fp_sent;
-				res = sg_mrq_1complet(mhp, fp, srp);
-				if (unlikely(res))
-					return res;
-				++cop->info;
-				if (cop->din_xfer_len > 0)
-					--cop->din_resid;
-				if (srp->s_hdr4.dir != SG_DXFER_FROM_DEV)
-					continue;
-				if (test_bit(SG_FFD_READ_SIDE_ERR, fp->ffd_bm))
-					continue;
-				/* read-side req completed, submit its write-side */
-				rs_srp = srp;
-				for (m = 0; m < k; ++m) {
-					if (rs_srp == rs_srp_a[m])
-						break;
-				}
-				if (m >= k) {
-					SG_LOG(1, rs_srp->parentfp,
-					       "%s: m >= %d, pack_id=%d\n",
-					       __func__, k, rs_srp->pack_id);
-					res = -EPROTO;
-					break;
-				}
-				ws_pos = ws_pos_a[m];
-				idx = sg_find_srp_idx(fp, rs_srp);
-				if (idx < 0) {
-					SG_LOG(1, rs_srp->parentfp,
-					       "%s: idx < 0\n", __func__);
-					res = -EPROTO;
-					break;
-				}
-				keep_share = false;
-another_dout:
-				SG_LOG(6, o_sfp,
-				       "%s: submit ws_pos=%d, rs_idx=%d\n",
-				       __func__, ws_pos, idx);
-				srp = sg_mrq_submit(o_sfp, mhp, ws_pos, idx,
-						    keep_share);
-				if (IS_ERR(srp)) {
-					mhp->s_res = PTR_ERR(srp);
-					res = mhp->s_res;
-					SG_LOG(1, o_sfp,
-					       "%s: mrq_submit(oth)->%d\n",
-						__func__, res);
-					break;
-				}
-				++num_subm;
-				++other_fp_sent;
-				++sent;
-				srp->s_hdr4.mrq_ind = ws_pos;
-				if (srp->rq_flags & SGV4_FLAG_KEEP_SHARE) {
-					++ws_pos;  /* next for same read-side */
-					keep_share = true;
-					goto another_dout;
-				}
-				if (mhp->chk_abort)
-					atomic_set(&srp->s_hdr4.pack_id_of_mrq,
-						   mhp->id_of_mrq);
-				continue;  /* do available submits first */
-			}
-			/* waits maybe interrupted by signals (-ERESTARTSYS) */
-			if (chk_oth_first)
-				goto oth_first;
-this_second:
-			if (this_fp_sent > 0) {
-				res = sg_wait_mrq_event(fp, &srp);
-				if (unlikely(res))
-					return res;
-				goto this_found;
-			}
-			if (chk_oth_first)
-				continue;
-oth_first:
-			if (other_fp_sent > 0) {
-				res = sg_wait_mrq_event(o_sfp, &srp);
-				if (unlikely(res))
-					return res;
-				goto other_found;
-			}
-			if (chk_oth_first)
-				goto this_second;
-		}	/* end of response/write_side_submit/write_side_response loop */
 		if (unlikely(mhp->s_res == -EFAULT ||
 			     mhp->s_res == -ERESTARTSYS))
 			res = mhp->s_res;	/* this may leave orphans */
-		num_cmpl += (cop->info - rcv_before);
 		if (res)
 			break;
-		if (aborted)
-			break;
-	}	/* end of outer for loop */
-
+	}
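The reworked outer loop advances j by however many requests the chosen helper actually submitted (delta_subm). It gives up as soon as a pass makes no progress, and whatever was not submitted is left for cop->dout_resid. Below is a minimal user-space sketch of just that control flow. submit_up_to(), submit_stall() and run_chunks() are hypothetical names, and real submission, completion counting and abort handling are left out.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for sg_svb_mrq_ordered()/sg_svb_mrq_first_come():
 * submits up to 'chunk' requests starting at index 'start' and returns how
 * many were actually submitted (0 means no progress).
 */
typedef int (*submit_chunk_fn)(int start, int chunk, int tot_reqs);

static int submit_up_to(int start, int chunk, int tot_reqs)
{
	int left = tot_reqs - start;

	return left < chunk ? left : chunk;
}

static int submit_stall(int start, int chunk, int tot_reqs)
{
	(void)start;
	(void)chunk;
	(void)tot_reqs;
	return 0;	/* e.g. an error path: nothing submitted */
}

/*
 * Outer loop modelled on the patched sg_process_svb_mrq(): advance by the
 * number actually submitted and stop when an iteration makes no progress.
 * Returns the number of requests not submitted (cf. cop->dout_resid).
 */
static int run_chunks(int tot_reqs, int chunk, submit_chunk_fn submit)
{
	int j, delta_subm = 0, num_subm = 0;

	for (j = 0; j < tot_reqs; j += delta_subm) {
		delta_subm = submit(j, chunk, tot_reqs);
		if (delta_subm == 0)
			break;		/* error or didn't make progress */
		num_subm += delta_subm;
	}
	return tot_reqs - num_subm;
}
```

With 10 requests and a chunk size of 4, the sketch submits 4, 4 and then 2, so nothing is left over. A submitter that never makes progress leaves every request unsubmitted instead of spinning.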
 	cop->dout_resid = mhp->tot_reqs - num_subm;
 	if (cop->din_xfer_len > 0) {
 		cop->din_resid = mhp->tot_reqs - num_cmpl;
@@ -1809,11 +2095,11 @@ sg_process_svb_mrq(struct sg_fd *fp, struct sg_fd *o_sfp,
 
 #if IS_ENABLED(SG_LOG_ACTIVE)
 static const char *
-sg_mrq_name(bool blocking, u32 flags)
+sg_mrq_name(bool from_sg_io, u32 flags)
 {
 	if (!(flags & SGV4_FLAG_MULTIPLE_REQS))
 		return "_not_ multiple requests control object";
-	if (blocking)
+	if (from_sg_io)
 		return "ordered blocking";
 	if (flags & SGV4_FLAG_IMMED)
 		return "submit or full non-blocking";
@@ -1824,16 +2110,16 @@ sg_mrq_name(bool blocking, u32 flags)
 #endif
 
 /*
- * Implements the multiple request functionality. When 'blocking' is true
+ * Implements the multiple request functionality. When 'from_sg_io' is true
  * invocation was via ioctl(SG_IO), otherwise it was via ioctl(SG_IOSUBMIT).
  * Submit non-blocking if IMMED flag given or when ioctl(SG_IOSUBMIT)
  * is used with O_NONBLOCK set on its file descriptor. Hipri non-blocking
  * is when the HIPRI flag is given.
  */
 static int
-sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool blocking)
+sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool from_sg_io)
 {
-	bool f_non_block, co_share;
+	bool f_non_block, is_svb;
 	int res = 0;
 	int existing_id;
 	u32 cdb_mxlen;
@@ -1854,14 +2140,16 @@ sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool blocking)
 #endif
 
 	mhp->cwrp = cwrp;
-	mhp->blocking = blocking;
+	mhp->from_sg_io = from_sg_io; /* false if from SG_IOSUBMIT */
 #if IS_ENABLED(SG_LOG_ACTIVE)
-	mrq_name = sg_mrq_name(blocking, cop->flags);
+	mrq_name = sg_mrq_name(from_sg_io, cop->flags);
 #endif
 	f_non_block = !!(fp->filp->f_flags & O_NONBLOCK);
-	co_share = !!(cop->flags & SGV4_FLAG_SHARE);
+	is_svb = !!(cop->flags & SGV4_FLAG_SHARE);	/* via ioctl(SG_IOSUBMIT) only */
 	mhp->immed = !!(cop->flags & SGV4_FLAG_IMMED);
+	mhp->hipri = !!(cop->flags & SGV4_FLAG_HIPRI);
 	mhp->stop_if = !!(cop->flags & SGV4_FLAG_STOP_IF);
+	mhp->ordered_wr = !!(cop->flags & SGV4_FLAG_ORDERED_WR);
 	mhp->co_mmap = !!(cop->flags & SGV4_FLAG_MMAP_IO);
 	if (mhp->co_mmap)
 		mhp->co_mmap_sgatp = fp->rsv_arr[0]->sgatp;
@@ -1881,13 +2169,13 @@ sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool blocking)
 	} else {
 		mhp->chk_abort = false;
 	}
-	if (blocking) {		/* came from ioctl(SG_IO) */
+	if (from_sg_io) {
 		if (unlikely(mhp->immed)) {
 			SG_LOG(1, fp, "%s: ioctl(SG_IO) %s contradicts\n",
 			       __func__, "with SGV4_FLAG_IMMED");
 			return -ERANGE;
 		}
-		if (unlikely(co_share)) {
+		if (unlikely(is_svb)) {
 			SG_LOG(1, fp, "%s: ioctl(SG_IO) %s contradicts\n",
 			       __func__, "with SGV4_FLAG_SHARE");
 			return -ERANGE;
@@ -1899,7 +2187,7 @@ sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool blocking)
 		}
 	}
 	if (!mhp->immed && f_non_block)
-		mhp->immed = true;
+		mhp->immed = true;	/* O_NONBLOCK implies IMMED; may need revisiting */
 	SG_LOG(3, fp, "%s: %s, tot_reqs=%u, id_of_mrq=%d\n", __func__,
 	       mrq_name, tot_reqs, mhp->id_of_mrq);
 	sg_v4h_partial_zero(cop);
@@ -1943,10 +2231,16 @@ sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool blocking)
 
 	if (SG_IS_DETACHING(sdp) || (o_sfp && SG_IS_DETACHING(o_sfp->parentdp)))
 		return -ENODEV;
-
+	if (is_svb && unlikely(test_and_set_bit(SG_FFD_SVB_ACTIVE, fp->ffd_bm))) {
+		SG_LOG(1, fp, "%s: %s already active\n", __func__, mrq_name);
+		return -EBUSY;
+	}
 	a_hds = kcalloc(tot_reqs, SZ_SG_IO_V4, GFP_KERNEL | __GFP_NOWARN);
-	if (unlikely(!a_hds))
+	if (unlikely(!a_hds)) {
+		if (is_svb)
+			clear_bit(SG_FFD_SVB_ACTIVE, fp->ffd_bm);
 		return -ENOMEM;
+	}
 	if (copy_from_user(a_hds, cuptr64(cop->dout_xferp),
 			   tot_reqs * SZ_SG_IO_V4)) {
 		res = -EFAULT;
@@ -1967,40 +2261,29 @@ sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool blocking)
 	mhp->a_hds = a_hds;
 	mhp->cdb_mxlen = cdb_mxlen;
 	/* do sanity checks on all requests before starting */
-	res = sg_mrq_sanity(mhp);
-	if (unlikely(res))
+	if (unlikely(!sg_mrq_sanity(mhp, is_svb))) {
+		res = -ERANGE;
 		goto fini;
+	}
 
 	/* override cmd queuing setting to allow */
 	clear_bit(SG_FFD_NO_CMD_Q, fp->ffd_bm);
 	if (o_sfp)
 		clear_bit(SG_FFD_NO_CMD_Q, o_sfp->ffd_bm);
 
-	if (co_share) {
-		bool ok;
-
-		/* check for 'shared' variable blocking (svb) */
-		ok = sg_mrq_svb_chk(a_hds, tot_reqs);
-		if (!ok) {
-			SG_LOG(1, fp, "%s: %s failed on req(s)\n", __func__,
-			       mrq_name);
-			res = -ERANGE;
-			goto fini;
-		}
-		if (test_and_set_bit(SG_FFD_SVB_ACTIVE, fp->ffd_bm)) {
-			SG_LOG(1, fp, "%s: %s already active\n", __func__,
-			       mrq_name);
-			res = -EBUSY;
-			goto fini;
-		}
+	if (is_svb)
 		res = sg_process_svb_mrq(fp, o_sfp, mhp);
-		clear_bit(SG_FFD_SVB_ACTIVE, fp->ffd_bm);
-	} else {
+	else
 		res = sg_process_most_mrq(fp, o_sfp, mhp);
-	}
 fini:
-	if (likely(res == 0) && !mhp->immed)
-		res = sg_mrq_arr_flush(mhp);
+	if (!mhp->immed) {		/* for the blocking mrq invocations */
+		int rres = sg_mrq_arr_flush(mhp);
+
+		if (unlikely(rres > 0 && res == 0))
+			res = rres;
+	}
+	if (is_svb)
+		clear_bit(SG_FFD_SVB_ACTIVE, fp->ffd_bm);
 	kfree(cdb_ap);
 	kfree(a_hds);
 	return res;
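This patch makes sg_do_multi_req() claim the SG_FFD_SVB_ACTIVE bit before it allocates the request array. The bit is then released on every exit path. A positive result from sg_mrq_arr_flush() only replaces res when nothing failed earlier. The user-space sketch below models only that guard and the result merge, with a C11 atomic_flag standing in for the bit. do_multi_req() and its parameters are hypothetical, and the immed (non-blocking) case is not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the SG_FFD_SVB_ACTIVE bit in fp->ffd_bm. */
static atomic_flag svb_active = ATOMIC_FLAG_INIT;

/*
 * process_res models the return of sg_process_svb_mrq()/sg_process_most_mrq();
 * flush_res models the return of sg_mrq_arr_flush().
 */
static int do_multi_req(bool is_svb, int process_res, int flush_res)
{
	int res;

	if (is_svb && atomic_flag_test_and_set(&svb_active))
		return -EBUSY;		/* another svb mrq is already active */
	res = process_res;
	/* a positive flush result is only reported if nothing failed earlier */
	if (flush_res > 0 && res == 0)
		res = flush_res;
	if (is_svb)
		atomic_flag_clear(&svb_active);	/* released on every path */
	return res;
}
```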
@@ -2008,7 +2291,7 @@ sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool blocking)
 
 static int
 sg_submit_v4(struct sg_fd *sfp, void __user *p, struct sg_io_v4 *h4p,
-	     bool sync, struct sg_request **o_srp)
+	     bool from_sg_io, struct sg_request **o_srp)
 {
 	int res = 0;
 	int dlen;
@@ -2029,7 +2312,7 @@ sg_submit_v4(struct sg_fd *sfp, void __user *p, struct sg_io_v4 *h4p,
 			*o_srp = NULL;
 		cwr.sfp = sfp;
 		cwr.h4p = h4p;
-		res = sg_do_multi_req(&cwr, sync);
+		res = sg_do_multi_req(&cwr, from_sg_io);
 		if (unlikely(res))
 			return res;
 		if (likely(p)) {
@@ -2049,7 +2332,7 @@ sg_submit_v4(struct sg_fd *sfp, void __user *p, struct sg_io_v4 *h4p,
 		clear_bit(SG_FFD_NO_CMD_Q, sfp->ffd_bm);
 	ul_timeout = msecs_to_jiffies(h4p->timeout);
 	cwr.sfp = sfp;
-	__assign_bit(SG_FRQ_SYNC_INVOC, cwr.frq_bm, (int)sync);
+	__assign_bit(SG_FRQ_SYNC_INVOC, cwr.frq_bm, (int)from_sg_io);
 	__set_bit(SG_FRQ_IS_V4I, cwr.frq_bm);
 	cwr.h4p = h4p;
 	cwr.timeout = min_t(unsigned long, ul_timeout, INT_MAX);
@@ -2062,7 +2345,7 @@ sg_submit_v4(struct sg_fd *sfp, void __user *p, struct sg_io_v4 *h4p,
 		return PTR_ERR(srp);
 	if (o_srp)
 		*o_srp = srp;
-	if (p && !sync && (srp->rq_flags & SGV4_FLAG_YIELD_TAG)) {
+	if (p && !from_sg_io && (srp->rq_flags & SGV4_FLAG_YIELD_TAG)) {
 		u64 gen_tag = srp->tag;
 		struct sg_io_v4 __user *h4_up = (struct sg_io_v4 __user *)p;
 
@@ -2239,12 +2522,11 @@ static const int sg_rq_state_arr[] = {1, 0, 4, 0, 0, 0};
 static const int sg_rq_state_mul2arr[] = {2, 0, 8, 0, 0, 0};
 
 /*
- * This function keeps the srp->rq_st state and associated marks on the
- * owning xarray's element in sync. An attempt si made to change state with
- * a call to atomic_cmpxchg(). If the actual srp->rq_st is not old_st, then
- * -EPROTOTYPE is returned. If the actual srp->rq_st is old_st then it is
- * replaced by new_st and the xarray marks are setup accordingly and 0 is
- * returned. This assumes srp_arr xarray spinlock is held.
+ * This function keeps the srp->rq_st state and associated marks on the owning xarray's element in
+ * sync. An attempt is made to change state with a call to atomic_cmpxchg(). If the actual
+ * srp->rq_st is not old_st, then -EPROTOTYPE is returned. If the actual srp->rq_st is old_st then
+ * it is replaced by new_st, the xarray marks are set up accordingly and 0 is returned. This
+ * function (and others ending in '_ulck') assumes the srp_arr xarray spinlock is already held.
  */
 static int
 sg_rq_chg_state_ulck(struct sg_request *srp, enum sg_rq_state old_st,
@@ -2376,37 +2658,29 @@ sg_get_idx_available(struct sg_fd *sfp)
 static struct sg_request *
 sg_get_probable_read_side(struct sg_fd *sfp)
 {
-	struct sg_request **rapp = sfp->rsv_arr;
-	struct sg_request **end_rapp = rapp + SG_MAX_RSV_REQS;
+	struct sg_request **rapp;
 	struct sg_request *rs_srp;
+	struct sg_request *rs_inactive_srp = NULL;
 
-	for ( ; rapp < end_rapp; ++rapp) {
+	for (rapp = sfp->rsv_arr; rapp < sfp->rsv_arr + SG_MAX_RSV_REQS; ++rapp) {
 		rs_srp = *rapp;
 		if (IS_ERR_OR_NULL(rs_srp) || rs_srp->sh_srp)
 			continue;
-		switch (atomic_read(&rs_srp->rq_st)) {
+		switch (atomic_read_acquire(&rs_srp->rq_st)) {
 		case SG_RQ_INFLIGHT:
 		case SG_RQ_AWAIT_RCV:
 		case SG_RQ_BUSY:
 		case SG_RQ_SHR_SWAP:
 			return rs_srp;
-		default:
-			break;
-		}
-	}
-	/* Subsequent dout data transfers (e.g. WRITE) on a request share */
-	for (rapp = sfp->rsv_arr; rapp < end_rapp; ++rapp) {
-		rs_srp = *rapp;
-		if (IS_ERR_OR_NULL(rs_srp) || rs_srp->sh_srp)
-			continue;
-		switch (atomic_read(&rs_srp->rq_st)) {
 		case SG_RQ_INACTIVE:
-			return rs_srp;
+			if (!rs_inactive_srp)
+				rs_inactive_srp = rs_srp;
+			break;
 		default:
 			break;
 		}
 	}
-	return NULL;
+	return rs_inactive_srp;
 }
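sg_get_probable_read_side() now makes a single pass over the reserve requests. The first one found in an active state (in flight, awaiting receipt, busy or share swap) is returned at once. Otherwise the first inactive one seen is remembered and returned at the end. Here is a user-space sketch of that selection, assuming a hypothetical rq_state enum and a plain array of states in place of sfp->rsv_arr. The IS_ERR_OR_NULL() and sh_srp checks are omitted. The loop bound is computed from the array base and count, never from the moving cursor.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of the relevant sg_rq_state values. */
enum rq_state {
	RQ_INACTIVE, RQ_INFLIGHT, RQ_AWAIT_RCV, RQ_BUSY, RQ_SHR_SWAP, RQ_SHR_IN_WS
};

/* Index of the first active slot, else the first inactive slot, else -1. */
static int probable_read_side(const enum rq_state *st, size_t n)
{
	int inactive = -1;
	size_t k;

	for (k = 0; k < n; ++k) {
		switch (st[k]) {
		case RQ_INFLIGHT:
		case RQ_AWAIT_RCV:
		case RQ_BUSY:
		case RQ_SHR_SWAP:
			return (int)k;		/* active: take it immediately */
		case RQ_INACTIVE:
			if (inactive < 0)
				inactive = (int)k;	/* remember as fallback */
			break;
		default:
			break;
		}
	}
	return inactive;
}
```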
 
 /*
@@ -2468,11 +2742,10 @@ sg_get_rsv_str_lck(struct sg_request *srp, const char *leadin,
 static void
 sg_execute_cmd(struct sg_fd *sfp, struct sg_request *srp)
 {
-	bool at_head, sync;
+	bool at_head;
 	struct sg_device *sdp = sfp->parentdp;
 	struct request *rqq = READ_ONCE(srp->rqq);
 
-	sync = test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm);
 	SG_LOG(3, sfp, "%s: pack_id=%d\n", __func__, srp->pack_id);
 	if (test_bit(SG_FFD_NO_DURATION, sfp->ffd_bm))
 		srp->start_ns = 0;
@@ -2491,7 +2764,7 @@ sg_execute_cmd(struct sg_fd *sfp, struct sg_request *srp)
 	kref_get(&sfp->f_ref); /* put usually in: sg_rq_end_io() */
 	sg_rq_chg_state_force(srp, SG_RQ_INFLIGHT);
 	/* >>>>>>> send cmd/req off to other levels <<<<<<<< */
-	if (!sync) {
+	if (!test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm)) {
 		atomic_inc(&sfp->submitted);
 		set_bit(SG_FRQ_COUNT_ACTIVE, srp->frq_bm);
 	}
@@ -2550,6 +2823,7 @@ sg_common_write(struct sg_comm_wr_t *cwrp)
 		res = sg_share_chk_flags(fp, rq_flags, dlen, dir, &sh_var);
 		if (unlikely(res < 0))
 			return ERR_PTR(res);
+		cwrp->keep_share = !!(rq_flags & SGV4_FLAG_KEEP_SHARE);
 	} else {
 		sh_var = SG_SHR_NONE;
 		if (unlikely(rq_flags & SGV4_FLAG_SHARE))
@@ -2673,14 +2947,15 @@ sg_rec_state_v3v4(struct sg_fd *sfp, struct sg_request *srp, bool v4_active)
 	enum sg_rq_state rs_st = SG_RQ_INACTIVE;
 	struct sg_request *rs_srp;
 
-	if (unlikely(!scsi_status_is_good(rq_res))) {
-		int sb_len_wr = sg_copy_sense(srp, v4_active);
+	if (unlikely(!sg_result_is_good(rq_res))) {
+		srp->rq_info |= SG_INFO_CHECK;
+		if (!scsi_status_is_good(rq_res)) {
+			int sb_len_wr = sg_copy_sense(srp, v4_active);
 
-		if (unlikely(sb_len_wr < 0))
-			return sb_len_wr;
+			if (unlikely(sb_len_wr < 0))
+				return sb_len_wr;
+		}
 	}
-	if (!sg_result_is_good(rq_res))
-		srp->rq_info |= SG_INFO_CHECK;
 	if (unlikely(test_bit(SG_FRQ_ABORTING, srp->frq_bm)))
 		srp->rq_info |= SG_INFO_ABORTED;
 
@@ -2881,7 +3156,7 @@ sg_mrq_iorec_complets(struct sg_fd *sfp, bool non_block, int max_mrqs,
 		return k;
 
 	for ( ; k < max_mrqs; ++k) {
-		res = sg_wait_mrq_event(sfp, &srp);
+		res = sg_wait_any_mrq(sfp, &srp);
 		if (unlikely(res))
 			return res;	/* signal --> -ERESTARTSYS */
 		if (IS_ERR(srp))
@@ -2945,10 +3220,14 @@ sg_mrq_ioreceive(struct sg_fd *sfp, struct sg_io_v4 *cop, void __user *p,
 	return res;
 }
 
+/* Formerly sg_wait_id_event(); do_poll selects busy polling instead of sleeping */
 static int
-sg_wait_id_event(struct sg_fd *sfp, struct sg_request **srpp, int id,
-		 bool is_tag)
+sg_wait_poll_by_id(struct sg_fd *sfp, struct sg_request **srpp, int id,
+		   bool is_tag, int do_poll)
 {
+	if (do_poll)
+		goto poll_loop;
+
 	if (test_bit(SG_FFD_EXCL_WAITQ, sfp->ffd_bm))
 		return __wait_event_interruptible_exclusive
 				(sfp->cmpl_wait,
@@ -2956,6 +3235,28 @@ sg_wait_id_event(struct sg_fd *sfp, struct sg_request **srpp, int id,
 	return __wait_event_interruptible
 			(sfp->cmpl_wait,
 			 sg_get_ready_srp(sfp, srpp, id, is_tag));
+poll_loop:
+	{
+		bool sig_pending = false;
+		long state = current->state;
+		struct sg_request *srp;
+
+		do {
+			srp = sg_find_srp_by_id(sfp, id, is_tag);
+			if (srp) {
+				__set_current_state(TASK_RUNNING);
+				*srpp = srp;
+				return 0;
+			}
+			if (signal_pending_state(state, current)) {
+				sig_pending = true;
+				break;
+			}
+			cpu_relax();
+		} while (!need_resched());
+		__set_current_state(TASK_RUNNING);
+		return sig_pending ? -ERESTARTSYS : -EAGAIN;
+	}
 }
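When do_poll is set (the HIPRI case), sg_wait_poll_by_id() busy-polls instead of sleeping on cmpl_wait. It returns 0 as soon as a matching request is found and -ERESTARTSYS if a signal is pending. Once need_resched() indicates the CPU should be given up, it returns -EAGAIN. The sketch below reproduces that loop in user space. The hypothetical callbacks in struct poll_ops, driven by the equally hypothetical struct fake_dev, stand in for sg_find_srp_by_id(), signal_pending_state() and need_resched(). -EINTR replaces the kernel-internal -ERESTARTSYS.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical hooks standing in for the kernel helpers named above. */
struct poll_ops {
	bool (*ready)(void *ctx);
	bool (*signalled)(void *ctx);
	bool (*should_yield)(void *ctx);
};

static int poll_for_completion(const struct poll_ops *ops, void *ctx)
{
	do {
		if (ops->ready(ctx))
			return 0;		/* found: *srpp would be set here */
		if (ops->signalled(ctx))
			return -EINTR;		/* kernel returns -ERESTARTSYS */
		/* the kernel loop calls cpu_relax() at this point */
	} while (!ops->should_yield(ctx));
	return -EAGAIN;				/* need_resched(): caller may retry */
}

/* Fake device used to drive the loop deterministically. */
struct fake_dev {
	int polls;	/* number of ready() checks so far */
	int ready_at;	/* ready() becomes true on this check */
	int yield_at;	/* should_yield() becomes true after this many checks */
	bool signal;	/* pretend a signal is pending */
};

static bool fake_ready(void *ctx)
{
	struct fake_dev *d = ctx;

	return ++d->polls >= d->ready_at;
}

static bool fake_signalled(void *ctx)
{
	return ((struct fake_dev *)ctx)->signal;
}

static bool fake_yield(void *ctx)
{
	struct fake_dev *d = ctx;

	return d->polls >= d->yield_at;
}

static const struct poll_ops fake_ops = { fake_ready, fake_signalled, fake_yield };
```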
 
 /*
@@ -2988,9 +3289,10 @@ sg_ctl_ioreceive(struct sg_fd *sfp, void __user *p)
 	if (unlikely(h4p->guard != 'Q' || h4p->protocol != 0 ||
 		     h4p->subprotocol != 0))
 		return -EPERM;
+	SG_LOG(3, sfp, "%s: non_block=%d, immed=%d, hipri=%d\n", __func__, non_block,
+	       !!(h4p->flags & SGV4_FLAG_IMMED), !!(h4p->flags & SGV4_FLAG_HIPRI));
 	if (h4p->flags & SGV4_FLAG_IMMED)
 		non_block = true;	/* set by either this or O_NONBLOCK */
-	SG_LOG(3, sfp, "%s: non_block(+IMMED)=%d\n", __func__, non_block);
 	if (h4p->flags & SGV4_FLAG_MULTIPLE_REQS)
 		return sg_mrq_ioreceive(sfp, h4p, p, non_block);
 	/* read in part of v3 or v4 header for pack_id or tag based find */
@@ -3001,20 +3303,20 @@ sg_ctl_ioreceive(struct sg_fd *sfp, void __user *p)
 		else
 			pack_id = h4p->request_extra;
 	}
-	id = use_tag ? tag : pack_id;
-try_again:
-	srp = sg_find_srp_by_id(sfp, id, use_tag);
-	if (!srp) {     /* nothing available so wait on packet or */
-		if (SG_IS_DETACHING(sdp))
-			return -ENODEV;
-		if (non_block)
-			return -EAGAIN;
-		res = sg_wait_id_event(sfp, &srp, id, use_tag);
-		if (unlikely(res))
-			return res;	/* signal --> -ERESTARTSYS */
+	id = use_tag ? tag : pack_id;
+try_again:
+	if (non_block) {
+		srp = sg_find_srp_by_id(sfp, id, use_tag);
+		if (!srp)
+			return SG_IS_DETACHING(sdp) ? -ENODEV : -EAGAIN;
+	} else {
+		res = sg_wait_poll_by_id(sfp, &srp, id, use_tag,
+					 !!(h4p->flags & SGV4_FLAG_HIPRI));
 		if (IS_ERR(srp))
 			return PTR_ERR(srp);
-	}	/* now srp should be valid */
+		if (unlikely(res))
+			return res;	/* signal --> -ERESTARTSYS */
+	}
 	if (test_and_set_bit(SG_FRQ_RECEIVING, srp->frq_bm)) {
 		cpu_relax();
 		goto try_again;
@@ -3058,18 +3360,18 @@ sg_ctl_ioreceive_v3(struct sg_fd *sfp, void __user *p)
 	if (test_bit(SG_FFD_FORCE_PACKID, sfp->ffd_bm))
 		pack_id = h3p->pack_id;
 try_again:
-	srp = sg_find_srp_by_id(sfp, pack_id, false);
-	if (!srp) {     /* nothing available so wait on packet or */
-		if (SG_IS_DETACHING(sdp))
-			return -ENODEV;
-		if (non_block)
-			return -EAGAIN;
-		res = sg_wait_id_event(sfp, &srp, pack_id, false);
+	if (non_block) {
+		srp = sg_find_srp_by_id(sfp, pack_id, false);
+		if (!srp)
+			return SG_IS_DETACHING(sdp) ? -ENODEV : -EAGAIN;
+	} else {
+		res = sg_wait_poll_by_id(sfp, &srp, pack_id, false,
+					 !!(h3p->flags & SGV4_FLAG_HIPRI));
 		if (unlikely(res))
 			return res;	/* signal --> -ERESTARTSYS */
 		if (IS_ERR(srp))
 			return PTR_ERR(srp);
-	}	/* now srp should be valid */
+	}
 	if (test_and_set_bit(SG_FRQ_RECEIVING, srp->frq_bm)) {
 		cpu_relax();
 		goto try_again;
@@ -3239,18 +3541,16 @@ sg_read(struct file *filp, char __user *p, size_t count, loff_t *ppos)
 		}
 	}
 try_again:
-	srp = sg_find_srp_by_id(sfp, want_id, false);
-	if (!srp) {	/* nothing available so wait on packet to arrive or */
-		if (SG_IS_DETACHING(sdp))
-			return -ENODEV;
-		if (non_block) /* O_NONBLOCK or v3::flags & SGV4_FLAG_IMMED */
-			return -EAGAIN;
-		ret = sg_wait_id_event(sfp, &srp, want_id, false);
-		if (unlikely(ret))  /* -ERESTARTSYS as signal hit process */
-			return ret;
+	if (non_block) {
+		srp = sg_find_srp_by_id(sfp, want_id, false);
+		if (!srp)
+			return SG_IS_DETACHING(sdp) ? -ENODEV : -EAGAIN;
+	} else {
+		ret = sg_wait_poll_by_id(sfp, &srp, want_id, false, false);
+		if (unlikely(ret))
+			return ret;	/* signal --> -ERESTARTSYS */
 		if (IS_ERR(srp))
 			return PTR_ERR(srp);
-		/* otherwise srp should be valid */
 	}
 	if (test_and_set_bit(SG_FRQ_RECEIVING, srp->frq_bm)) {
 		cpu_relax();
@@ -3354,16 +3654,18 @@ sg_calc_sgat_param(struct sg_device *sdp)
 }
 
 /*
- * Only valid for shared file descriptors. Designed to be called after a
- * read-side request has successfully completed leaving valid data in a
- * reserve request buffer. The read-side is moved from SG_RQ_SHR_SWAP
- * to SG_RQ_INACTIVE state and returns 0. Acts on first reserve requests.
- * Otherwise -EINVAL is returned, unless write-side is in progress in
+ * Only valid for shared file descriptors. Designed to be called after a read-side request has
+ * successfully completed leaving valid data in a reserve request buffer. May also be called after
+ * a write-side request that has the SGV4_FLAG_KEEP_SHARE flag set. If rs_srp is NULL, acts on
+ * the first reserve request in SG_RQ_SHR_SWAP state; if rs_srp is non-NULL, acts only on it. The
+ * chosen request is made busy then inactive and 0 is returned; SG_RQ_SHR_IN_WS is also accepted
+ * when even_if_in_ws is true. Otherwise -EINVAL is returned, unless write-side is in progress in
  * which case -EBUSY is returned.
  */
 static int
-sg_finish_rs_rq(struct sg_fd *sfp)
+sg_finish_rs_rq(struct sg_fd *sfp, struct sg_request *rs_srp, bool even_if_in_ws)
 {
+	bool found_one = false;
 	int res = -EINVAL;
 	int k;
 	enum sg_rq_state sr_st;
@@ -3381,26 +3683,24 @@ sg_finish_rs_rq(struct sg_fd *sfp)
 	for (k = 0; k < SG_MAX_RSV_REQS; ++k) {
 		res = -EINVAL;
 		rs_rsv_srp = rs_sfp->rsv_arr[k];
+		if (rs_srp) {
+			if (rs_srp != rs_rsv_srp)
+				continue;
+		}
 		if (IS_ERR_OR_NULL(rs_rsv_srp))
 			continue;
 		xa_lock_irqsave(&rs_sfp->srp_arr, iflags);
 		sr_st = atomic_read(&rs_rsv_srp->rq_st);
 		switch (sr_st) {
 		case SG_RQ_SHR_SWAP:
-			res = sg_rq_chg_state_ulck(rs_rsv_srp, sr_st, SG_RQ_BUSY);
-			if (!res)
-				atomic_inc(&rs_sfp->inactives);
-			rs_rsv_srp->tag = SG_TAG_WILDCARD;
-			rs_rsv_srp->sh_var = SG_SHR_NONE;
-			set_bit(SG_FRQ_RESERVED, rs_rsv_srp->frq_bm);
-			rs_rsv_srp->in_resid = 0;
-			rs_rsv_srp->rq_info = 0;
-			rs_rsv_srp->sense_len = 0;
-			rs_rsv_srp->sh_srp = NULL;
-			sg_finish_scsi_blk_rq(rs_rsv_srp);
-			sg_deact_request(rs_rsv_srp->parentfp, rs_rsv_srp);
+			found_one = true;
+			break;
+		case SG_RQ_SHR_IN_WS:
+			if (even_if_in_ws)
+				found_one = true;
+			else
+				res = -EBUSY;
 			break;
-		case SG_RQ_SHR_IN_WS:	/* too late, write-side rq active */
 		case SG_RQ_BUSY:
 			res = -EBUSY;
 			break;
@@ -3408,14 +3708,31 @@ sg_finish_rs_rq(struct sg_fd *sfp)
 			res = -EINVAL;
 			break;
 		}
+		if (found_one)
+			goto found;
 		xa_unlock_irqrestore(&rs_sfp->srp_arr, iflags);
-		if (res == 0)
-			return res;
+		if (rs_srp)
+			return res;	/* found rs_srp but was in wrong state */
 	}
 fini:
 	if (unlikely(res))
 		SG_LOG(1, sfp, "%s: err=%d\n", __func__, -res);
 	return res;
+found:
+	res = sg_rq_chg_state_ulck(rs_rsv_srp, sr_st, SG_RQ_BUSY);
+	if (!res)
+		atomic_inc(&rs_sfp->inactives);
+	rs_rsv_srp->tag = SG_TAG_WILDCARD;
+	rs_rsv_srp->sh_var = SG_SHR_NONE;
+	set_bit(SG_FRQ_RESERVED, rs_rsv_srp->frq_bm);
+	rs_rsv_srp->in_resid = 0;
+	rs_rsv_srp->rq_info = 0;
+	rs_rsv_srp->sense_len = 0;
+	rs_rsv_srp->sh_srp = NULL;
+	xa_unlock_irqrestore(&rs_sfp->srp_arr, iflags);
+	sg_finish_scsi_blk_rq(rs_rsv_srp);
+	sg_deact_request(rs_rsv_srp->parentfp, rs_rsv_srp);
+	return 0;
 }
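The per-slot decision in the reworked sg_finish_rs_rq() is compact enough to model on its own. In the sketch below, a slot in share-swap state is always eligible. A slot in SG_RQ_SHR_IN_WS is eligible only when even_if_in_ws is true. A busy slot yields -EBUSY and anything else -EINVAL. Passing want >= 0 plays the role of a non-NULL rs_srp, whose verdict is then final. The rs_state enum, rs_slot_verdict() and rs_find() are hypothetical names, and locking and the actual state change are omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical subset of the request states that matter here. */
enum rs_state { RS_INACTIVE, RS_BUSY, RS_SHR_SWAP, RS_SHR_IN_WS };

/* 0 if this slot may be finished, else a negated errno. */
static int rs_slot_verdict(enum rs_state st, bool even_if_in_ws)
{
	switch (st) {
	case RS_SHR_SWAP:
		return 0;
	case RS_SHR_IN_WS:
		return even_if_in_ws ? 0 : -EBUSY;
	case RS_BUSY:
		return -EBUSY;
	default:
		return -EINVAL;
	}
}

/*
 * Returns the index of the slot to finish, or a negated errno. With
 * want >= 0 only that slot is examined; otherwise the first eligible wins.
 */
static int rs_find(const enum rs_state *st, int n, int want, bool even_if_in_ws)
{
	int k, res = -EINVAL;

	for (k = 0; k < n; ++k) {
		if (want >= 0 && k != want)
			continue;
		res = rs_slot_verdict(st[k], even_if_in_ws);
		if (res == 0)
			return k;
		if (want >= 0)
			return res;	/* found the wanted slot, wrong state */
	}
	return res;
}
```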
 
 static void
@@ -3523,7 +3840,7 @@ sg_remove_sfp_share(struct sg_fd *sfp, bool is_rd_side)
 			if (IS_ERR_OR_NULL(rsv_srp) ||
 			    rsv_srp->sh_var != SG_SHR_RS_RQ)
 				continue;
-			sr_st = atomic_read(&rsv_srp->rq_st);
+			sr_st = atomic_read_acquire(&rsv_srp->rq_st);
 			switch (sr_st) {
 			case SG_RQ_SHR_SWAP:
 				set_inactive = true;
@@ -3732,66 +4049,6 @@ sg_fill_request_element(struct sg_fd *sfp, struct sg_request *srp,
 	xa_unlock_irqrestore(&sfp->srp_arr, iflags);
 }
 
-static inline bool
-sg_rq_landed(struct sg_device *sdp, struct sg_request *srp)
-{
-	return atomic_read_acquire(&srp->rq_st) != SG_RQ_INFLIGHT || SG_IS_DETACHING(sdp);
-}
-
-/* This is a blocking wait then complete for a specific srp. */
-static int
-sg_wait_event_srp(struct sg_fd *sfp, void __user *p, struct sg_io_v4 *h4p,
-		  struct sg_request *srp)
-{
-	int res;
-	struct sg_device *sdp = sfp->parentdp;
-	enum sg_rq_state sr_st;
-
-	if (atomic_read(&srp->rq_st) != SG_RQ_INFLIGHT)
-		goto skip_wait;		/* and skip _acquire() */
-	if (srp->rq_flags & SGV4_FLAG_HIPRI) {
-		/* call blk_poll(), spinning till found */
-		res = sg_srp_q_blk_poll(srp, sdp->device->request_queue, -1);
-		if (res != -ENODATA && unlikely(res < 0))
-			return res;
-		goto skip_wait;
-	}
-	SG_LOG(3, sfp, "%s: about to wait_event...()\n", __func__);
-	/* N.B. The SG_FFD_EXCL_WAITQ flag is ignored here. */
-	res = __wait_event_interruptible(sfp->cmpl_wait,
-					 sg_rq_landed(sdp, srp));
-	if (unlikely(res)) { /* -ERESTARTSYS because signal hit thread */
-		set_bit(SG_FRQ_IS_ORPHAN, srp->frq_bm);
-		/* orphans harvested when sfp->keep_orphan is false */
-		sg_rq_chg_state_force(srp, SG_RQ_INFLIGHT);
-		SG_LOG(1, sfp, "%s:  wait_event_interruptible(): %s[%d]\n",
-		       __func__, (res == -ERESTARTSYS ? "ERESTARTSYS" : ""),
-		       res);
-		return res;
-	}
-skip_wait:
-	if (SG_IS_DETACHING(sdp)) {
-		sg_rq_chg_state_force(srp, SG_RQ_INACTIVE);
-		atomic_inc(&sfp->inactives);
-		return -ENODEV;
-	}
-	sr_st = atomic_read(&srp->rq_st);
-	if (unlikely(sr_st != SG_RQ_AWAIT_RCV))
-		return -EPROTO;         /* Logic error */
-	res = sg_rq_chg_state(srp, sr_st, SG_RQ_BUSY);
-	if (unlikely(res)) {
-#if IS_ENABLED(SG_LOG_ACTIVE)
-		sg_rq_state_fail_msg(sfp, sr_st, SG_RQ_BUSY, __func__);
-#endif
-		return res;
-	}
-	if (SG_IS_V4I(srp))
-		res = sg_receive_v4(sfp, srp, p, h4p);
-	else
-		res = sg_receive_v3(sfp, srp, p);
-	return (res < 0) ? res : 0;
-}
-
 /*
  * Handles ioctl(SG_IO) for blocking (sync) usage of v3 or v4 interface.
  * Returns 0 on success else a negated errno.
@@ -3799,6 +4056,7 @@ sg_wait_event_srp(struct sg_fd *sfp, void __user *p, struct sg_io_v4 *h4p,
 static int
 sg_ctl_sg_io(struct sg_device *sdp, struct sg_fd *sfp, void __user *p)
 {
+	bool is_v4, hipri;
 	int res;
 	struct sg_request *srp = NULL;
 	u8 hu8arr[SZ_SG_IO_V4];
@@ -3828,8 +4086,12 @@ sg_ctl_sg_io(struct sg_device *sdp, struct sg_fd *sfp, void __user *p)
 				   ((u8 __user *)p) + v3_len,
 				   SZ_SG_IO_V4 - v3_len))
 			return -EFAULT;
+		is_v4 = true;
+		hipri = !!(h4p->flags & SGV4_FLAG_HIPRI);
 		res = sg_submit_v4(sfp, p, h4p, true, &srp);
 	} else if (h3p->interface_id == 'S') {
+		is_v4 = false;
+		hipri = !!(h3p->flags & SGV4_FLAG_HIPRI);
 		res = sg_submit_v3(sfp, h3p, true, &srp);
 	} else {
 		pr_info_once("sg: %s: v3 or v4 interface only here\n",
@@ -3840,7 +4102,7 @@ sg_ctl_sg_io(struct sg_device *sdp, struct sg_fd *sfp, void __user *p)
 		return res;
 	if (!srp)	/* mrq case: already processed all responses */
 		return res;
-	res = sg_wait_event_srp(sfp, p, h4p, srp);
+	res = sg_wait_poll_for_given_srp(sfp, srp, hipri);
 #if IS_ENABLED(SG_LOG_ACTIVE)
 	if (unlikely(res))
 		SG_LOG(1, sfp, "%s: %s=0x%pK  state: %s, share: %s\n",
@@ -3848,19 +4110,15 @@ sg_ctl_sg_io(struct sg_device *sdp, struct sg_fd *sfp, void __user *p)
 		       sg_rq_st_str(atomic_read(&srp->rq_st), false),
 		       sg_shr_str(srp->sh_var, false));
 #endif
+	if (likely(res == 0)) {
+		if (is_v4)
+			res = sg_receive_v4(sfp, srp, p, h4p);
+		else
+			res = sg_receive_v3(sfp, srp, p);
+	}
 	return res;
 }
 
-static inline int
-sg_num_waiting_maybe_acquire(struct sg_fd *sfp)
-{
-	int num = atomic_read(&sfp->waiting);
-
-	if (num < 1)
-		num = atomic_read_acquire(&sfp->waiting);
-	return num;
-}
-
 /*
  * When use_tag is true then id is a tag, else it is a pack_id. Returns
  * valid srp if match, else returns NULL.
@@ -3943,7 +4201,7 @@ sg_abort_req(struct sg_fd *sfp, struct sg_request *srp)
 		       __func__, srp->pack_id, srp->tag);
 		goto fini;	/* skip quietly if already aborted */
 	}
-	rq_st = atomic_read(&srp->rq_st);
+	rq_st = atomic_read_acquire(&srp->rq_st);
 	SG_LOG(3, sfp, "%s: req pack_id/tag=%d/%d, status=%s\n", __func__,
 	       srp->pack_id, srp->tag, sg_rq_st_str(rq_st, false));
 	switch (rq_st) {
@@ -4252,8 +4510,9 @@ sg_fd_share(struct sg_fd *ws_sfp, int m_fd)
 
 /*
  * After checking the proposed file share relationship is unique and
- * valid, sets up pointers between read-side and write-side sg_fd objects.
- * Allows previous write-side to be the same as the new new_ws_fd .
+ * valid, sets up pointers between read-side and write-side sg_fd objects. Allows
+ * previous write-side to be the same as the new write-side (fd). Returns 0 on success
+ * or a negated errno value.
  */
 static int
 sg_fd_reshare(struct sg_fd *rs_sfp, int new_ws_fd)
@@ -4447,6 +4706,7 @@ static int put_compat_request_table(struct compat_sg_req_info __user *o,
 				    struct sg_req_info *rinfo)
 {
 	int i;
+
 	for (i = 0; i < SG_MAX_QUEUE; i++) {
 		if (copy_to_user(o + i, rinfo + i, offsetof(sg_req_info_t, usr_ptr)) ||
 		    put_user((uintptr_t)rinfo[i].usr_ptr, &o[i].usr_ptr) ||
@@ -4638,7 +4898,7 @@ sg_extended_bool_flags(struct sg_fd *sfp, struct sg_extended_info *seip)
 		if (rs_sfp && !IS_ERR_OR_NULL(rs_sfp->rsv_arr[0])) {
 			struct sg_request *res_srp = rs_sfp->rsv_arr[0];
 
-			if (atomic_read(&res_srp->rq_st) == SG_RQ_SHR_SWAP)
+			if (atomic_read_acquire(&res_srp->rq_st) == SG_RQ_SHR_SWAP)
 				c_flgs_val_out |= SG_CTL_FLAGM_READ_SIDE_FINI;
 			else
 				c_flgs_val_out &= ~SG_CTL_FLAGM_READ_SIDE_FINI;
@@ -4647,8 +4907,8 @@ sg_extended_bool_flags(struct sg_fd *sfp, struct sg_extended_info *seip)
 		}
 	}
 	if ((c_flgs_wm & SG_CTL_FLAGM_READ_SIDE_FINI) &&
-	    (c_flgs_val_in & SG_CTL_FLAGM_READ_SIDE_FINI))
-		res = sg_finish_rs_rq(sfp);
+	    (c_flgs_val_out & SG_CTL_FLAGM_READ_SIDE_FINI))
+		res = sg_finish_rs_rq(sfp, NULL, false);
 	/* READ_SIDE_ERR boolean, [ro] share: read-side finished with error */
 	if (c_flgs_rm & SG_CTL_FLAGM_READ_SIDE_ERR) {
 		struct sg_fd *rs_sfp = sg_fd_share_ptr(sfp);
@@ -4835,10 +5095,8 @@ sg_ctl_extended(struct sg_fd *sfp, void __user *p)
 	}
 	/* yields minor_index (type: u32) [ro] */
 	if (or_masks & SG_SEIM_MINOR_INDEX) {
-		if (s_wr_mask & SG_SEIM_MINOR_INDEX) {
-			SG_LOG(2, sfp, "%s: writing to minor_index ignored\n",
-			       __func__);
-		}
+		if (s_wr_mask & SG_SEIM_MINOR_INDEX)
+			SG_LOG(2, sfp, "%s: writing to minor_index ignored\n", __func__);
 		if (s_rd_mask & SG_SEIM_MINOR_INDEX)
 			seip->minor_index = sdp->index;
 	}
@@ -4892,7 +5150,7 @@ sg_ctl_extended(struct sg_fd *sfp, void __user *p)
 		n = 0;
 		if (s_wr_mask & SG_SEIM_BLK_POLL) {
 			result = sg_sfp_blk_poll(sfp, seip->num);
-			if (result < 0) {
+			if (unlikely(result < 0)) {
 				if (ret == 0)
 					ret = result;
 			} else {
@@ -5035,8 +5293,11 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 	switch (cmd_in) {
 	case SG_GET_NUM_WAITING:
 		/* Want as fast as possible, with a useful result */
-		if (test_bit(SG_FFD_HIPRI_SEEN, sfp->ffd_bm))
-			sg_sfp_blk_poll(sfp, 0);	/* LLD may have some ready */
+		if (test_bit(SG_FFD_HIPRI_SEEN, sfp->ffd_bm)) {
+			res = sg_sfp_blk_poll(sfp, 0);	/* LLD may have some ready */
+			if (unlikely(res < 0))
+				return res;
+		}
 		val = atomic_read(&sfp->waiting);
 		if (val)
 			return put_user(val, ip);
@@ -5360,7 +5621,7 @@ sg_sfp_blk_poll(struct sg_fd *sfp, int loop_count)
 	struct request_queue *q = sdev ? sdev->request_queue : NULL;
 	struct xarray *xafp = &sfp->srp_arr;
 
-	if (!q)
+	if (unlikely(!q))
 		return -EINVAL;
 	xa_lock_irqsave(xafp, iflags);
 	xa_for_each(xafp, idx, srp) {
@@ -5863,8 +6124,9 @@ sg_add_device(struct device *cl_dev, struct class_interface *cl_intf)
 		if (unlikely(error))
 			pr_err("%s: unable to make symlink 'generic' back "
 			       "to sg%d\n", __func__, sdp->index);
-	} else
+	} else {
 		pr_warn("%s: sg_sys Invalid\n", __func__);
+	}
 
 	sdp->create_ns = ktime_get_boottime_ns();
 	sg_calc_sgat_param(sdp);
@@ -6494,16 +6756,15 @@ sg_read_append(struct sg_request *srp, void __user *outp, int num_xfer)
 			if (copy_to_user(outp, page_address(pgp), num_xfer))
 				res = -EFAULT;
 			break;
-		} else {
-			if (copy_to_user(outp, page_address(pgp), num)) {
-				res = -EFAULT;
-				break;
-			}
-			num_xfer -= num;
-			if (num_xfer <= 0)
-				break;
-			outp += num;
 		}
+		if (copy_to_user(outp, page_address(pgp), num)) {
+			res = -EFAULT;
+			break;
+		}
+		num_xfer -= num;
+		if (num_xfer <= 0)
+			break;
+		outp += num;
 	}
 	return res;
 }
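With the else branch flattened, sg_read_append() copies whole pages while more than a page remains. It then finishes with a possibly partial copy of the last page. The same loop is shown in user space below. memcpy() stands in for copy_to_user(), and a hypothetical array of fixed-size pages stands in for the request's scatter-gather list. Unlike the kernel function, read_append() (also hypothetical) returns the number of bytes copied rather than an error code.

```c
#include <assert.h>
#include <string.h>

static int read_append(char *outp, const char *const *pages, int num_pages,
		       int page_len, int num_xfer)
{
	int k, copied = 0;

	for (k = 0; k < num_pages && num_xfer > 0; ++k) {
		if (num_xfer <= page_len) {
			/* last (possibly partial) page: copy only what is left */
			memcpy(outp, pages[k], (size_t)num_xfer);
			copied += num_xfer;
			break;
		}
		/* more than one page left: copy this one whole and advance */
		memcpy(outp, pages[k], (size_t)page_len);
		copied += page_len;
		num_xfer -= page_len;
		outp += page_len;
	}
	return copied;
}
```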
@@ -6520,10 +6781,8 @@ static struct sg_request *
 sg_find_srp_by_id(struct sg_fd *sfp, int id, bool is_tag)
 {
 	__maybe_unused bool is_bad_st = false;
-	__maybe_unused enum sg_rq_state bad_sr_st = SG_RQ_INACTIVE;
 	bool search_for_1 = (id != SG_TAG_WILDCARD);
 	bool second = false;
-	enum sg_rq_state sr_st;
 	int res;
 	int l_await_idx = READ_ONCE(sfp->low_await_idx);
 	unsigned long idx, s_idx;
@@ -6531,8 +6790,11 @@ sg_find_srp_by_id(struct sg_fd *sfp, int id, bool is_tag)
 	struct sg_request *srp = NULL;
 	struct xarray *xafp = &sfp->srp_arr;
 
-	if (test_bit(SG_FFD_HIPRI_SEEN, sfp->ffd_bm))
-		sg_sfp_blk_poll(sfp, 0);	/* LLD may have some ready to push up */
+	if (test_bit(SG_FFD_HIPRI_SEEN, sfp->ffd_bm)) {
+		res = sg_sfp_blk_poll(sfp, 0);	/* LLD may have some ready to push up */
+		if (unlikely(res < 0))
+			return ERR_PTR(res);
+	}
 	if (sg_num_waiting_maybe_acquire(sfp) < 1)
 		return NULL;
 
@@ -6552,30 +6814,9 @@ sg_find_srp_by_id(struct sg_fd *sfp, int id, bool is_tag)
 				if (srp->pack_id != id)
 					continue;
 			}
-			sr_st = atomic_read(&srp->rq_st);
-			switch (sr_st) {
-			case SG_RQ_AWAIT_RCV:
-				res = sg_rq_chg_state(srp, sr_st, SG_RQ_BUSY);
-				if (likely(res == 0))
-					goto good;
-				/* else another caller got it, move on */
-				if (IS_ENABLED(CONFIG_SCSI_PROC_FS)) {
-					is_bad_st = true;
-					bad_sr_st = atomic_read(&srp->rq_st);
-				}
-				break;
-			case SG_RQ_SHR_IN_WS:
+			res = sg_rq_chg_state(srp, SG_RQ_AWAIT_RCV, SG_RQ_BUSY);
+			if (likely(res == 0))
 				goto good;
-			case SG_RQ_INFLIGHT:
-				break;
-			default:
-				if (IS_ENABLED(CONFIG_SCSI_PROC_FS)) {
-					is_bad_st = true;
-					bad_sr_st = sr_st;
-				}
-				break;
-			}
-			break;
 		}
 		/* If not found so far, need to wrap around and search [0 ... s_idx) */
 		if (!srp && !second && s_idx > 0) {
@@ -6616,21 +6857,6 @@ sg_find_srp_by_id(struct sg_fd *sfp, int id, bool is_tag)
 			goto second_time2;
 		}
 	}
-	/* here if one of above loops does _not_ find a match */
-	if (IS_ENABLED(CONFIG_SCSI_PROC_FS)) {
-		if (search_for_1) {
-			__maybe_unused const char *cptp = is_tag ? "tag=" :
-								   "pack_id=";
-
-			if (unlikely(is_bad_st))
-				SG_LOG(1, sfp, "%s: %s%d wrong state: %s\n",
-				       __func__, cptp, id,
-				       sg_rq_st_str(bad_sr_st, true));
-			else
-				SG_LOG(6, sfp, "%s: %s%d not awaiting read\n",
-				       __func__, cptp, id);
-		}
-	}
 	return NULL;
 good:
 	SG_LOG(5, sfp, "%s: %s%d found [srp=0x%pK]\n", __func__,
@@ -6638,64 +6864,6 @@ sg_find_srp_by_id(struct sg_fd *sfp, int id, bool is_tag)
 	return srp;
 }
 
-/*
- * Returns true if a request is ready and its srp is written to *srpp . If
- * nothing can be found (because nothing is currently submitted) then true
- * is returned and ERR_PTR(-ENODATA) --> *srpp . If nothing is found but
- * sfp has requests submitted, returns false and NULL --> *srpp .
- */
-static bool
-sg_mrq_get_ready_srp(struct sg_fd *sfp, struct sg_request **srpp)
-{
-	bool second = false;
-	int res;
-	int l_await_idx = READ_ONCE(sfp->low_await_idx);
-	unsigned long idx, s_idx, end_idx;
-	struct sg_request *srp;
-	struct xarray *xafp = &sfp->srp_arr;
-
-	if (SG_IS_DETACHING(sfp->parentdp)) {
-		*srpp = ERR_PTR(-ENODEV);
-		return true;
-	}
-	if (atomic_read(&sfp->submitted) < 1) {
-		*srpp = ERR_PTR(-ENODATA);
-		return true;
-	}
-	if (sg_num_waiting_maybe_acquire(sfp) < 1)
-		goto fini;
-
-	s_idx = (l_await_idx < 0) ? 0 : l_await_idx;
-	idx = s_idx;
-	end_idx = ULONG_MAX;
-
-second_time:
-	for (srp = xa_find(xafp, &idx, end_idx, SG_XA_RQ_AWAIT);
-	     srp;
-	     srp = xa_find_after(xafp, &idx, end_idx, SG_XA_RQ_AWAIT)) {
-		res = sg_rq_chg_state(srp, SG_RQ_AWAIT_RCV, SG_RQ_BUSY);
-		if (likely(res == 0)) {
-			*srpp = srp;
-			WRITE_ONCE(sfp->low_await_idx, idx + 1);
-			return true;
-		}
-#if IS_ENABLED(SG_LOG_ACTIVE)
-		sg_rq_state_fail_msg(sfp, SG_RQ_AWAIT_RCV, SG_RQ_BUSY, __func__);
-#endif
-	}
-	/* If not found so far, need to wrap around and search [0 ... end_idx) */
-	if (!srp && !second && s_idx > 0) {
-		end_idx = s_idx - 1;
-		s_idx = 0;
-		idx = s_idx;
-		second = true;
-		goto second_time;
-	}
-fini:
-	*srpp = NULL;
-	return false;
-}
-
 /*
  * Makes a new sg_request object. If 'first' is set then use GFP_KERNEL which
  * may take time but has improved chance of success, otherwise use GFP_ATOMIC.
@@ -6797,7 +6965,7 @@ static struct sg_request *
 sg_setup_req_ws_helper(struct sg_comm_wr_t *cwrp)
 {
 	int res;
-	struct sg_request *r_srp;
+	struct sg_request *rs_srp;
 	enum sg_rq_state rs_sr_st;
 	struct sg_fd *fp = cwrp->sfp;
 	struct sg_fd *rs_sfp = sg_fd_share_ptr(fp);
@@ -6810,32 +6978,94 @@ sg_setup_req_ws_helper(struct sg_comm_wr_t *cwrp)
 	 * rq_state:	SG_RQ_SHR_SWAP --> SG_RQ_SHR_IN_WS
 	 */
 	if (cwrp->rsv_idx >= 0)
-		r_srp = rs_sfp->rsv_arr[cwrp->rsv_idx];
+		rs_srp = rs_sfp->rsv_arr[cwrp->rsv_idx];
 	else
-		r_srp = sg_get_probable_read_side(rs_sfp);
-	if (unlikely(!r_srp))
+		rs_srp = sg_get_probable_read_side(rs_sfp);
+	if (unlikely(!rs_srp))
 		return ERR_PTR(-ENOSTR);
 
-	rs_sr_st = atomic_read(&r_srp->rq_st);
+	rs_sr_st = atomic_read(&rs_srp->rq_st);
 	switch (rs_sr_st) {
 	case SG_RQ_SHR_SWAP:
 		break;
 	case SG_RQ_AWAIT_RCV:
 	case SG_RQ_INFLIGHT:
-	case SG_RQ_BUSY:
-		return ERR_PTR(-EBUSY);	/* too early for write-side req */
-	case SG_RQ_INACTIVE:
-		SG_LOG(1, fp, "%s: write-side finds read-side inactive\n",
-		       __func__);
+	case SG_RQ_BUSY:	/* too early for write-side req */
+		return ERR_PTR(-EBUSY);
+	case SG_RQ_INACTIVE:	/* read-side may have ended with an error */
+		SG_LOG(1, fp, "%s: write-side finds read-side inactive\n", __func__);
 		return ERR_PTR(-EADDRNOTAVAIL);
-	case SG_RQ_SHR_IN_WS:
-		SG_LOG(1, fp, "%s: write-side find read-side shr_in_ws\n",
-		       __func__);
+	case SG_RQ_SHR_IN_WS:	/* write-side already being processed, why another? */
+		SG_LOG(1, fp, "%s: write-side finds read-side shr_in_ws\n", __func__);
 		return ERR_PTR(-EADDRINUSE);
 	}
-	res = sg_rq_chg_state(r_srp, rs_sr_st, SG_RQ_SHR_IN_WS);
+	res = sg_rq_chg_state(rs_srp, rs_sr_st, SG_RQ_SHR_IN_WS);
 	if (unlikely(res))
 		return ERR_PTR(-EADDRINUSE);
+	return rs_srp;
+}
+
+static struct sg_request *
+sg_setup_req_new_srp(struct sg_comm_wr_t *cwrp, bool new_rsv_srp, bool no_reqs,
+		     bool *try_harderp)
+{
+	struct sg_fd *fp = cwrp->sfp;
+	int dlen = cwrp->dlen;
+	int res;
+	int ra_idx = 0;
+	u32 n_idx, sum_dlen;
+	unsigned long iflags;
+	struct sg_request *r_srp = NULL;
+	struct xarray *xafp = &fp->srp_arr;
+
+	if (test_bit(SG_FFD_NO_CMD_Q, fp->ffd_bm) && atomic_read(&fp->submitted) > 0) {
+		SG_LOG(6, fp, "%s: trying 2nd req but cmd_q=false\n", __func__);
+		return ERR_PTR(-EDOM);
+	} else if (fp->tot_fd_thresh > 0) {
+		sum_dlen = atomic_read(&fp->sum_fd_dlens) + dlen;
+		if (unlikely(sum_dlen > (u32)fp->tot_fd_thresh)) {
+			SG_LOG(2, fp, "%s: sum_of_dlen(%u) > tot_fd_thresh\n", __func__,
+			       sum_dlen);
+			return ERR_PTR(-E2BIG);
+		}
+	}
+	if (new_rsv_srp) {
+		ra_idx = sg_get_idx_new(fp);
+		if (ra_idx < 0) {
+			ra_idx = sg_get_idx_available(fp);
+			if (ra_idx < 0) {
+				SG_LOG(1, fp, "%s: run out of read-side reqs\n", __func__);
+				return ERR_PTR(-EFBIG);
+			}
+		}
+	}
+	r_srp = sg_mk_srp_sgat(fp, no_reqs, dlen);
+	if (IS_ERR(r_srp)) {
+		if (!*try_harderp && dlen < SG_DEF_SECTOR_SZ) {
+			*try_harderp = true;
+			return NULL;
+		}
+		return r_srp;
+	}
+	SG_LOG(4, fp, "%s: %smk_new_srp=0x%pK ++\n", __func__, (new_rsv_srp ? "rsv " : ""),
+	       r_srp);
+	if (new_rsv_srp) {
+		fp->rsv_arr[ra_idx] = r_srp;
+		set_bit(SG_FRQ_RESERVED, r_srp->frq_bm);
+		r_srp->sh_srp = NULL;
+	}
+	xa_lock_irqsave(xafp, iflags);
+	res = __xa_alloc(xafp, &n_idx, r_srp, xa_limit_32b, GFP_ATOMIC);
+	if (unlikely(res < 0)) {
+		xa_unlock_irqrestore(xafp, iflags);
+		sg_remove_srp(r_srp);
+		kfree(r_srp);
+		SG_LOG(1, fp, "%s: xa_alloc() failed, errno=%d\n", __func__,  -res);
+		return ERR_PTR(-EPROTOTYPE);
+	}
+	r_srp->rq_idx = n_idx;
+	r_srp->parentfp = fp;
+	xa_unlock_irqrestore(xafp, iflags);
 	return r_srp;
 }
 
@@ -6855,15 +7085,12 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var)
 	bool new_rsv_srp = false;
 	bool no_reqs = false;
 	bool ws_rq = false;
-	bool some_inactive = false;
 	bool try_harder = false;
+	bool keep_frq_bm = false;
 	bool second = false;
-	bool is_rsv;
-	int ra_idx = 0;
-	int l_used_idx;
+	int res, ra_idx, l_used_idx;
 	int dlen = cwrp->dlen;
-	u32 sum_dlen;
-	unsigned long idx, s_idx, end_idx, iflags;
+	unsigned long idx, s_idx, end_idx;
 	enum sg_rq_state sr_st;
 	struct sg_fd *fp = cwrp->sfp;
 	struct sg_request *r_srp; /* returned value won't be NULL */
@@ -6875,16 +7102,27 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var)
 	switch (sh_var) {
 	case SG_SHR_RS_RQ:
 		cp = "rs_rq";
+		if (cwrp->possible_srp) {
+			r_srp = cwrp->possible_srp;
+			res = sg_rq_chg_state(r_srp, SG_RQ_INACTIVE, SG_RQ_BUSY);
+			if (unlikely(res)) {
+				r_srp = NULL;
+			} else {
+				atomic_dec(&fp->inactives);
+				keep_frq_bm = true;
+				r_srp->sh_srp = NULL;
+				goto final_setup;
+			}
+		}
 		ra_idx = (test_bit(SG_FFD_RESHARE, fp->ffd_bm)) ? 0 : sg_get_idx_available(fp);
 		if (ra_idx < 0) {
 			new_rsv_srp = true;
-			goto good_fini;
+			goto maybe_new;
 		}
 		r_srp = fp->rsv_arr[ra_idx];
-		sr_st = atomic_read(&r_srp->rq_st);
+		sr_st = atomic_read_acquire(&r_srp->rq_st);
 		if (sr_st == SG_RQ_INACTIVE) {
-			int res = sg_rq_chg_state(r_srp, sr_st, SG_RQ_BUSY);
-
+			res = sg_rq_chg_state(r_srp, sr_st, SG_RQ_BUSY);
 			if (unlikely(res)) {
 				r_srp = NULL;
 			} else {
@@ -6897,9 +7135,12 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var)
 		}
 		if (IS_ERR(r_srp))
 			goto err_out;
-		if (mk_new_srp)
+		if (mk_new_srp) {
 			new_rsv_srp = true;
-		goto good_fini;
+			goto maybe_new;
+		} else {
+			goto final_setup;
+		}
 	case SG_SHR_WS_RQ:
 		cp = "rs_rq";
 		rs_rsv_srp = sg_setup_req_ws_helper(cwrp);
@@ -6916,6 +7157,20 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var)
 			goto err_out;
 		}
 		ws_rq = true;
+		r_srp = cwrp->possible_srp;
+		if (r_srp) {
+			sr_st = atomic_read_acquire(&r_srp->rq_st);
+			if (sr_st == SG_RQ_INACTIVE && dlen <= r_srp->sgat_h.buflen) {
+				res = sg_rq_chg_state(r_srp, sr_st, SG_RQ_BUSY);
+				if (likely(res == 0)) {
+					/* possible_srp bypasses loop to find candidate */
+					mk_new_srp = false;
+					keep_frq_bm = true;
+					goto final_setup;
+				}
+			}
+			r_srp = NULL;
+		}
 		dlen = 0;	/* any srp for write-side will do, pick smallest */
 		break;
 	case SG_SHR_RS_NOT_SRQ:
@@ -6931,9 +7186,10 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var)
 		mk_new_srp = true;
 	} else if (atomic_read(&fp->inactives) <= 0) {
 		mk_new_srp = true;
-	} else if (likely(!try_harder) && dlen < SG_DEF_SECTOR_SZ) {
+	} else if (dlen < SG_DEF_SECTOR_SZ && likely(!try_harder)) {
 		struct sg_request *low_srp = NULL;
 
+		cp = "small dlen from inactives";
 		l_used_idx = READ_ONCE(fp->low_used_idx);
 		s_idx = (l_used_idx < 0) ? 0 : l_used_idx;
 		if (l_used_idx >= 0 && xa_get_mark(xafp, s_idx, SG_XA_RQ_INACTIVE)) {
@@ -6965,13 +7221,13 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var)
 		if (mk_new_srp && low_srp) {	/* no candidate yet */
 			/* take non-NULL low_srp, irrespective of r_srp->sgat_h.buflen size */
 			r_srp = low_srp;
-			if (sg_rq_chg_state(r_srp, SG_RQ_INACTIVE, SG_RQ_BUSY) == 0) {
+			if (likely(sg_rq_chg_state(r_srp, SG_RQ_INACTIVE, SG_RQ_BUSY) == 0)) {
 				mk_new_srp = false;
 				atomic_dec(&fp->inactives);
 			}
 		}
 	} else {
-		cp = "larger from srp_arr";
+		cp = "larger dlen from inactives";
 		l_used_idx = READ_ONCE(fp->low_used_idx);
 		s_idx = (l_used_idx < 0) ? 0 : l_used_idx;
 		idx = s_idx;
@@ -6982,7 +7238,7 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var)
 			for (r_srp = xa_find(xafp, &idx, end_idx, SG_XA_RQ_INACTIVE);
 			     r_srp;
 			     r_srp = xa_find_after(xafp, &idx, end_idx, SG_XA_RQ_INACTIVE)) {
-				if (r_srp->sgat_h.buflen >= dlen) {
+				if (dlen <= r_srp->sgat_h.buflen) {
 					if (sg_rq_chg_state(r_srp, SG_RQ_INACTIVE, SG_RQ_BUSY))
 						continue;
 					atomic_dec(&fp->inactives);
@@ -7003,7 +7259,7 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var)
 			for (r_srp = xa_find(xafp, &idx, end_idx, SG_XA_RQ_INACTIVE);
 			     r_srp;
 			     r_srp = xa_find_after(xafp, &idx, end_idx, SG_XA_RQ_INACTIVE)) {
-				if (r_srp->sgat_h.buflen >= dlen &&
+				if (dlen <= r_srp->sgat_h.buflen &&
 				    !test_bit(SG_FRQ_RESERVED, r_srp->frq_bm)) {
 					if (sg_rq_chg_state(r_srp, SG_RQ_INACTIVE, SG_RQ_BUSY))
 						continue;
@@ -7023,89 +7279,34 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var)
 		}
 	}
 have_existing:
-	if (!mk_new_srp) {
+	if (!mk_new_srp) {		/* re-using an existing sg_request object */
 		r_srp->in_resid = 0;
 		r_srp->rq_info = 0;
 		r_srp->sense_len = 0;
 	}
-
-good_fini:
+maybe_new:
 	if (mk_new_srp) {	/* Need new sg_request object */
-		bool disallow_cmd_q = test_bit(SG_FFD_NO_CMD_Q, fp->ffd_bm);
-		int res;
-		u32 n_idx;
-
 		cp = "new";
-		r_srp = NULL;
-		if (disallow_cmd_q && atomic_read(&fp->submitted) > 0) {
-			r_srp = ERR_PTR(-EDOM);
-			SG_LOG(6, fp, "%s: trying 2nd req but cmd_q=false\n",
-			       __func__);
-			goto err_out;
-		} else if (fp->tot_fd_thresh > 0) {
-			sum_dlen = atomic_read(&fp->sum_fd_dlens) + dlen;
-			if (unlikely(sum_dlen > (u32)fp->tot_fd_thresh)) {
-				r_srp = ERR_PTR(-E2BIG);
-				SG_LOG(2, fp, "%s: sum_of_dlen(%u) > %s\n",
-				       __func__, sum_dlen, "tot_fd_thresh");
-			}
-		}
-		if (!IS_ERR(r_srp) && new_rsv_srp) {
-			ra_idx = sg_get_idx_new(fp);
-			if (ra_idx < 0) {
-				ra_idx = sg_get_idx_available(fp);
-				if (ra_idx < 0) {
-					SG_LOG(1, fp,
-					       "%s: no read-side reqs available\n",
-					       __func__);
-					r_srp = ERR_PTR(-EFBIG);
-				}
-			}
-		}
-		if (IS_ERR(r_srp))	/* NULL is _not_ an ERR here */
-			goto err_out;
-		r_srp = sg_mk_srp_sgat(fp, no_reqs, dlen);
-		if (IS_ERR(r_srp)) {
-			if (!try_harder && dlen < SG_DEF_SECTOR_SZ &&
-			    some_inactive) {
-				try_harder = true;
-				goto start_again;
-			}
+		r_srp = sg_setup_req_new_srp(cwrp, new_rsv_srp, no_reqs, &try_harder);
+		if (IS_ERR(r_srp))
 			goto err_out;
-		}
-		SG_LOG(4, fp, "%s: %smk_new_srp=0x%pK ++\n", __func__,
-		       (new_rsv_srp ? "rsv " : ""), r_srp);
-		if (new_rsv_srp) {
-			fp->rsv_arr[ra_idx] = r_srp;
+		if (!r_srp && try_harder)
+			goto start_again;
+	}
+final_setup:
+	if (!keep_frq_bm) {
+		/* keep SG_FRQ_RESERVED setting from prior/new r_srp; clear rest */
+		bool is_rsv = test_bit(SG_FRQ_RESERVED, r_srp->frq_bm);
+
+		r_srp->frq_bm[0] = 0;
+		if (is_rsv)
 			set_bit(SG_FRQ_RESERVED, r_srp->frq_bm);
-			r_srp->sh_srp = NULL;
-		}
-		xa_lock_irqsave(xafp, iflags);
-		res = __xa_alloc(xafp, &n_idx, r_srp, xa_limit_32b, GFP_ATOMIC);
-		xa_unlock_irqrestore(xafp, iflags);
-		if (unlikely(res < 0)) {
-			xa_unlock_irqrestore(xafp, iflags);
-			sg_remove_srp(r_srp);
-			kfree(r_srp);
-			r_srp = ERR_PTR(-EPROTOTYPE);
-			SG_LOG(1, fp, "%s: xa_alloc() failed, errno=%d\n",
-			       __func__,  -res);
-			goto err_out;
-		}
-		r_srp->rq_idx = n_idx;
-		r_srp->parentfp = fp;
-		xa_unlock_irqrestore(xafp, iflags);
+		/* r_srp inherits these flags from cwrp->frq_bm */
+		if (test_bit(SG_FRQ_IS_V4I, cwrp->frq_bm))
+			set_bit(SG_FRQ_IS_V4I, r_srp->frq_bm);
+		if (test_bit(SG_FRQ_SYNC_INVOC, cwrp->frq_bm))
+			set_bit(SG_FRQ_SYNC_INVOC, r_srp->frq_bm);
 	}
-	/* keep SG_FRQ_RESERVED setting from prior/new r_srp; clear rest */
-	is_rsv = test_bit(SG_FRQ_RESERVED, r_srp->frq_bm);
-	WRITE_ONCE(r_srp->frq_bm[0], 0);
-	if (is_rsv)
-		set_bit(SG_FRQ_RESERVED, r_srp->frq_bm);
-	/* r_srp inherits these 3 flags from cwrp->frq_bm */
-	if (test_bit(SG_FRQ_IS_V4I, cwrp->frq_bm))
-		set_bit(SG_FRQ_IS_V4I, r_srp->frq_bm);
-	if (test_bit(SG_FRQ_SYNC_INVOC, cwrp->frq_bm))
-		set_bit(SG_FRQ_SYNC_INVOC, r_srp->frq_bm);
 	r_srp->sgatp->dlen = dlen;	/* must be <= r_srp->sgat_h.buflen */
 	r_srp->sh_var = sh_var;
 	r_srp->cmd_opcode = 0xff;  /* set invalid opcode (VS), 0x0 is TUR */
@@ -7140,7 +7341,7 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var)
 
 /*
  * Sets srp to SG_RQ_INACTIVE unless it was in SG_RQ_SHR_SWAP state. Also
- * change the asociated xarray entry flags to be consistent with
+ * change the associated xarray entry flags to be consistent with
  * SG_RQ_INACTIVE. Since this function can be called from many contexts,
  * then assume no xa locks held.
  * The state machine should insure that two threads should never race here.
@@ -7157,7 +7358,7 @@ sg_deact_request(struct sg_fd *sfp, struct sg_request *srp)
 	SG_LOG(3, sfp, "%s: srp=%pK\n", __func__, srp);
 	sbp = srp->sense_bp;
 	srp->sense_bp = NULL;
-	sr_st = atomic_read(&srp->rq_st);
+	sr_st = atomic_read_acquire(&srp->rq_st);
 	if (sr_st != SG_RQ_SHR_SWAP) {
 		/*
 		 * Can be called from many contexts and it is hard to know
@@ -7621,17 +7822,16 @@ sg_proc_seq_show_dev(struct seq_file *s, void *v)
 
 	read_lock_irqsave(&sg_index_lock, iflags);
 	sdp = it ? sg_lookup_dev(it->index) : NULL;
-	if (unlikely(!sdp || !sdp->device || SG_IS_DETACHING(sdp)))
+	if (unlikely(!sdp || !sdp->device) || SG_IS_DETACHING(sdp)) {
 		seq_puts(s, "-1\t-1\t-1\t-1\t-1\t-1\t-1\t-1\t-1\n");
-	else {
+	} else {
 		scsidp = sdp->device;
 		seq_printf(s, "%d\t%d\t%d\t%llu\t%d\t%d\t%d\t%d\t%d\n",
-			      scsidp->host->host_no, scsidp->channel,
-			      scsidp->id, scsidp->lun, (int)scsidp->type,
-			      1,
-			      (int) scsidp->queue_depth,
-			      (int) scsi_device_busy(scsidp),
-			      (int) scsi_device_online(scsidp));
+			   scsidp->host->host_no, scsidp->channel,
+			   scsidp->id, scsidp->lun, (int)scsidp->type, 1,
+			   (int)scsidp->queue_depth,
+			   (int)scsi_device_busy(scsidp),
+			   (int)scsi_device_online(scsidp));
 	}
 	read_unlock_irqrestore(&sg_index_lock, iflags);
 	return 0;
@@ -7663,8 +7863,7 @@ sg_proc_seq_show_devstrs(struct seq_file *s, void *v)
 
 /* Writes debug info for one sg_request in obp buffer */
 static int
-sg_proc_debug_sreq(struct sg_request *srp, int to, bool t_in_ns, char *obp,
-		   int len)
+sg_proc_debug_sreq(struct sg_request *srp, int to, bool t_in_ns, bool inactive, char *obp, int len)
 {
 	bool is_v3v4, v4, is_dur;
 	int n = 0;
@@ -7708,6 +7907,13 @@ sg_proc_debug_sreq(struct sg_request *srp, int to, bool t_in_ns, char *obp,
 		n += scnprintf(obp + n, len - n, " sgat=%d", srp->sgatp->num_sgat);
 	cp = (srp->rq_flags & SGV4_FLAG_HIPRI) ? "hipri " : "";
 	n += scnprintf(obp + n, len - n, " %sop=0x%02x\n", cp, srp->cmd_opcode);
+	if (inactive && rq_st != SG_RQ_INACTIVE) {
+		if (xa_get_mark(&srp->parentfp->srp_arr, srp->rq_idx, SG_XA_RQ_INACTIVE))
+			cp = "still marked inactive, BAD";
+		else
+			cp = "no longer marked inactive";
+		n += scnprintf(obp + n, len - n, "       <<< xarray %s >>>\n", cp);
+	}
 	return n;
 }
 
@@ -7767,8 +7973,7 @@ sg_proc_debug_fd(struct sg_fd *fp, char *obp, int len, unsigned long idx,
 	xa_lock_irqsave(&fp->srp_arr, iflags);
 	xa_for_each(&fp->srp_arr, idx, srp) {
 		if (srp->rq_idx != (unsigned long)idx)
-			n += scnprintf(obp + n, len - n,
-				       ">>> xa_index=%lu, rq_idx=%d, bad\n",
+			n += scnprintf(obp + n, len - n, ">>> BAD: xa_index!=rq_idx [%lu,%u]\n",
 				       idx, srp->rq_idx);
 		if (xa_get_mark(&fp->srp_arr, idx, SG_XA_RQ_INACTIVE))
 			continue;
@@ -7778,8 +7983,7 @@ sg_proc_debug_fd(struct sg_fd *fp, char *obp, int len, unsigned long idx,
 		else if (test_bit(SG_FRQ_ABORTING, srp->frq_bm))
 			n += scnprintf(obp + n, len - n,
 				       "     abort>> ");
-		n += sg_proc_debug_sreq(srp, fp->timeout, t_in_ns, obp + n,
-					len - n);
+		n += sg_proc_debug_sreq(srp, fp->timeout, t_in_ns, false, obp + n, len - n);
 		++k;
 		if ((k % 8) == 0) {	/* don't hold up isr_s too long */
 			xa_unlock_irqrestore(&fp->srp_arr, iflags);
@@ -7796,8 +8000,7 @@ sg_proc_debug_fd(struct sg_fd *fp, char *obp, int len, unsigned long idx,
 		if (set_debug)
 			n += scnprintf(obp + n, len - n, "     rq_bm=0x%lx",
 				       srp->frq_bm[0]);
-		n += sg_proc_debug_sreq(srp, fp->timeout, t_in_ns,
-					obp + n, len - n);
+		n += sg_proc_debug_sreq(srp, fp->timeout, t_in_ns, true, obp + n, len - n);
 		++k;
 		if ((k % 8) == 0) {	/* don't hold up isr_s too long */
 			xa_unlock_irqrestore(&fp->srp_arr, iflags);
diff --git a/include/uapi/scsi/sg.h b/include/uapi/scsi/sg.h
index 148a5f2786ee..236ac4678f71 100644
--- a/include/uapi/scsi/sg.h
+++ b/include/uapi/scsi/sg.h
@@ -127,6 +127,7 @@ typedef struct sg_io_hdr {
 #define SGV4_FLAG_NO_DXFER SG_FLAG_NO_DXFER /* but keep dev<-->kernel xfr */
 #define SGV4_FLAG_KEEP_SHARE 0x20000  /* ... buffer for another dout command */
 #define SGV4_FLAG_MULTIPLE_REQS 0x40000	/* 1 or more sg_io_v4-s in data-in */
+#define SGV4_FLAG_ORDERED_WR 0x80000	/* svb: issue in-order writes */
 
 /* Output (potentially OR-ed together) in v3::info or v4::info field */
 #define SG_INFO_OK_MASK 0x1
-- 
2.25.1
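
The sg_find_srp_by_id() hunk in the patch above, like the removed sg_mrq_get_ready_srp(), follows one pattern. It scans the request xarray from the low-water index (low_await_idx) to the end, then wraps around to cover [0, s_idx). It claims the first candidate whose state atomically moves from SG_RQ_AWAIT_RCV to SG_RQ_BUSY, and a failed transition only means another caller got there first. The user-space C11 sketch below models that pattern under simplifying assumptions, so it is not the driver's code. A plain array stands in for the xarray, and one modulo-indexed loop replaces the two-pass "goto second_time" scan. atomic_compare_exchange_strong() stands in for sg_rq_chg_state(). The names struct req, init_req(), chg_state(), find_and_claim() and ID_WILDCARD are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical subset of the request states named in sg.c */
enum rq_state { RQ_INACTIVE, RQ_INFLIGHT, RQ_AWAIT_RCV, RQ_BUSY };

#define ID_WILDCARD (-1)	/* hypothetical stand-in for SG_TAG_WILDCARD */

struct req {
	atomic_int st;		/* one of enum rq_state */
	int pack_id;
};

static void init_req(struct req *r, enum rq_state st, int pack_id)
{
	atomic_init(&r->st, st);
	r->pack_id = pack_id;
}

/* Move r from old_st to new_st atomically; 0 on success, -1 if r was not in old_st. */
static int chg_state(struct req *r, enum rq_state old_st, enum rq_state new_st)
{
	int expected = old_st;

	return atomic_compare_exchange_strong(&r->st, &expected, new_st) ? 0 : -1;
}

/*
 * Visit arr[low_idx], ..., arr[n - 1], then wrap to arr[0], ..., arr[low_idx - 1]
 * (low_idx < n assumed), claiming the first request that matches id (any request
 * for ID_WILDCARD) whose AWAIT_RCV -> BUSY transition succeeds. Losing that race
 * simply moves the search on to the next index.
 */
static struct req *find_and_claim(struct req *arr, size_t n, size_t low_idx, int id)
{
	for (size_t k = 0; k < n; k++) {
		struct req *r = &arr[(low_idx + k) % n];

		if (id != ID_WILDCARD && r->pack_id != id)
			continue;
		if (chg_state(r, RQ_AWAIT_RCV, RQ_BUSY) == 0)
			return r;
	}
	return NULL;
}
```

For example, take four requests in the states {INFLIGHT, AWAIT_RCV, INACTIVE, AWAIT_RCV} and a starting index of 2. A wildcard search first claims index 3. A second search wraps around and claims index 1. A third search returns NULL.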

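When sg_setup_req() reaches its final_setup label with keep_frq_bm false, it resets the request's flag bitmap. In the patch above, keep_frq_bm is set only when cwrp->possible_srp is re-used directly. The reset keeps SG_FRQ_RESERVED, clears every other bit, and copies SG_FRQ_IS_V4I and SG_FRQ_SYNC_INVOC from cwrp->frq_bm. A minimal sketch of that reset on a single unsigned long follows. It assumes hypothetical bit numbers, and the FRQ_* enum, flag_set() and reset_req_flags() are illustrative names, not sg.c definitions.

```c
#include <stdbool.h>

/* Hypothetical bit numbers; sg.c defines its own SG_FRQ_* values. */
enum { FRQ_RESERVED = 0, FRQ_IS_V4I = 1, FRQ_SYNC_INVOC = 2, FRQ_ABORTING = 3 };

static bool flag_set(unsigned long bm, int bit)
{
	return (bm >> bit) & 1UL;
}

/*
 * Reset a request's flag word: keep its own RESERVED bit, clear everything else,
 * then inherit IS_V4I and SYNC_INVOC from the submitter's flag word.
 */
static unsigned long reset_req_flags(unsigned long req_bm, unsigned long submit_bm)
{
	unsigned long out = 0;

	if (flag_set(req_bm, FRQ_RESERVED))
		out |= 1UL << FRQ_RESERVED;
	if (flag_set(submit_bm, FRQ_IS_V4I))
		out |= 1UL << FRQ_IS_V4I;
	if (flag_set(submit_bm, FRQ_SYNC_INVOC))
		out |= 1UL << FRQ_SYNC_INVOC;
	return out;
}
```

Suppose a reserved request still carries a stale ABORTING bit. After the reset, only RESERVED survives, plus whichever of IS_V4I or SYNC_INVOC the new submission sets. The patch does the same thing with test_bit()/set_bit() on r_srp->frq_bm rather than on a returned value.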


* [PATCH v18 75/83] sg: expand source line length to 100 characters
From: Douglas Gilbert @ 2021-04-27 21:57 UTC
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

There are many function invocations, debug strings and function header
comments that were split across lines because of the previous limit of
80 characters per source line that applied in most cases. Over 350
lines are saved by squeezing more onto each line.

Inline the simple sg_find_srp_idx() function as it is only called
twice.

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 1395 ++++++++++++++++++---------------------------
 1 file changed, 563 insertions(+), 832 deletions(-)
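
The reflowed SG_MAX_CDB_SIZE comment in the sg.c hunk below compresses a small calculation. SPC-4 would allow a 260-byte CDB, but sg_io_hdr::cmd_len is an unsigned char with a maximum of 255. CDBs longer than 16 bytes are variable length, and their length is a multiple of 4. The largest usable length is therefore 255 rounded down to a multiple of 4, that is 4 * 63 = 252. The compile-time checks below restate that arithmetic. SPC4_MAX_CDB_LEN, CMD_LEN_MAX, MAX_CDB_USABLE and var_len_cdb_ok() are hypothetical names, and the helper only encodes the two constraints stated in the comment, not any validation sg.c performs.

```c
#include <limits.h>
#include <stdbool.h>

#define SPC4_MAX_CDB_LEN 260			/* spc4r37 section 3.1.30, as cited in sg.c */
#define CMD_LEN_MAX UCHAR_MAX			/* sg_io_hdr::cmd_len is an unsigned char: 255 */
#define MAX_CDB_USABLE ((CMD_LEN_MAX / 4) * 4)	/* round 255 down to a multiple of 4 */

_Static_assert(MAX_CDB_USABLE == 252, "largest multiple of 4 not above 255");
_Static_assert(MAX_CDB_USABLE < SPC4_MAX_CDB_LEN, "cmd_len's type is the binding limit");

/* A variable-length CDB (longer than 16 bytes) must be a multiple of 4 and fit in cmd_len. */
static bool var_len_cdb_ok(unsigned int len)
{
	return len > 16 && len <= MAX_CDB_USABLE && len % 4 == 0;
}
```

So 252 and 32 pass, while 256 (too long for cmd_len), 250 (not a multiple of 4) and 16 (a fixed-length size) do not.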

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index d6e18cb4df11..a159af1e3ee6 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -33,7 +33,7 @@ static char *sg_version_date = "20210421";
 #include <linux/moduleparam.h>
 #include <linux/cdev.h>
 #include <linux/idr.h>
-#include <linux/file.h>		/* for fget() and fput() */
+#include <linux/file.h>			/* for fget() and fput() */
 #include <linux/seq_file.h>
 #include <linux/blkdev.h>
 #include <linux/delay.h>
@@ -78,9 +78,9 @@ static char *sg_version_date = "20210421";
 #endif
 
 /*
- * SG_MAX_CDB_SIZE should be 260 (spc4r37 section 3.1.30) however the type
- * of sg_io_hdr::cmd_len can only represent 255. All SCSI commands greater
- * than 16 bytes are "variable length" whose length is a multiple of 4, so:
+ * SG_MAX_CDB_SIZE should be 260 (spc4r37 section 3.1.30); however, the type of sg_io_hdr::cmd_len
+ * can only represent 255. All SCSI commands greater than 16 bytes are "variable length" whose
+ * length is a multiple of 4, so:
  */
 #define SG_MAX_CDB_SIZE 252
 
@@ -176,9 +176,9 @@ enum sg_shr_var {
 
 int sg_big_buff = SG_DEF_RESERVED_SIZE;
 /*
- * This variable is accessible via /proc/scsi/sg/def_reserved_size . Each
- * time sg_open() is called a sg_request of this size (or less if there is
- * not enough memory) will be reserved for use by this file descriptor.
+ * This variable is accessible via /proc/scsi/sg/def_reserved_size . Each time sg_open() is called
+ * a sg_request of this size (or less if there is not enough memory) will be reserved for use by
+ * this file descriptor.
  */
 static int def_reserved_size = -1;	/* picks up init parameter */
 static int sg_allow_dio = SG_ALLOW_DIO_DEF;	/* ignored by code */
@@ -367,32 +367,26 @@ static void sg_rq_end_io(struct request *rqq, blk_status_t status);
 static int sg_proc_init(void);
 static void sg_dfs_init(void);
 static void sg_dfs_exit(void);
-static int sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp,
-			int dxfer_dir);
+static int sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir);
 static void sg_finish_scsi_blk_rq(struct sg_request *srp);
 static int sg_mk_sgat(struct sg_request *srp, struct sg_fd *sfp, int minlen);
-static int sg_receive_v3(struct sg_fd *sfp, struct sg_request *srp,
-			 void __user *p);
+static int sg_receive_v3(struct sg_fd *sfp, struct sg_request *srp, void __user *p);
 static int sg_submit_v3(struct sg_fd *sfp, struct sg_io_hdr *hp, bool sync,
 			struct sg_request **o_srp);
 static struct sg_request *sg_common_write(struct sg_comm_wr_t *cwrp);
-static int sg_receive_v4(struct sg_fd *sfp, struct sg_request *srp,
-			 void __user *p, struct sg_io_v4 *h4p);
-static int sg_read_append(struct sg_request *srp, void __user *outp,
-			  int num_xfer);
+static int sg_receive_v4(struct sg_fd *sfp, struct sg_request *srp, void __user *p,
+			 struct sg_io_v4 *h4p);
+static int sg_read_append(struct sg_request *srp, void __user *outp, int num_xfer);
 static void sg_remove_srp(struct sg_request *srp);
 static struct sg_fd *sg_add_sfp(struct sg_device *sdp, struct file *filp);
 static void sg_remove_sfp(struct kref *);
 static void sg_remove_sfp_share(struct sg_fd *sfp, bool is_rd_side);
-static struct sg_request *sg_find_srp_by_id(struct sg_fd *sfp, int id,
-					    bool is_tag);
-static struct sg_request *sg_setup_req(struct sg_comm_wr_t *cwrp,
-				       enum sg_shr_var sh_var);
+static struct sg_request *sg_find_srp_by_id(struct sg_fd *sfp, int id, bool is_tag);
+static struct sg_request *sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var);
 static void sg_deact_request(struct sg_fd *sfp, struct sg_request *srp);
 static struct sg_device *sg_get_dev(int min_dev);
 static void sg_device_destroy(struct kref *kref);
-static struct sg_request *sg_mk_srp_sgat(struct sg_fd *sfp, bool first,
-					 int db_len);
+static struct sg_request *sg_mk_srp_sgat(struct sg_fd *sfp, bool first, int db_len);
 static int sg_abort_req(struct sg_fd *sfp, struct sg_request *srp);
 static int sg_rq_chg_state(struct sg_request *srp, enum sg_rq_state old_st,
 			   enum sg_rq_state new_st);
@@ -408,8 +402,8 @@ static const char *sg_rq_st_str(enum sg_rq_state rq_st, bool long_str);
 static const char *sg_shr_str(enum sg_shr_var sh_var, bool long_str);
 #endif
 #if IS_ENABLED(SG_PROC_OR_DEBUG_FS)
-static int sg_proc_debug_sdev(struct sg_device *sdp, char *obp, int len,
-			      int *fd_counterp, bool reduced);
+static int sg_proc_debug_sdev(struct sg_device *sdp, char *obp, int len, int *fd_counterp,
+			      bool reduced);
 static void sg_take_snap(struct sg_fd *sfp, bool clear_first);
 #endif
 
@@ -430,11 +424,10 @@ static void sg_take_snap(struct sg_fd *sfp, bool clear_first);
 #define SG_IS_V4I(srp) test_bit(SG_FRQ_IS_V4I, (srp)->frq_bm)
 
 /*
- * Kernel needs to be built with CONFIG_SCSI_LOGGING to see log messages.
- * 'depth' is a number between 1 (most severe) and 7 (most noisy, most
- * information). All messages are logged as informational (KERN_INFO). In
- * the unexpected situation where sfp or sdp is NULL the macro reverts to
- * a pr_info and ignores SCSI_LOG_TIMEOUT and always prints to the log.
+ * Kernel needs to be built with CONFIG_SCSI_LOGGING to see log messages. 'depth' is a number
+ * between 1 (most severe) and 7 (most noisy, most information). All messages are logged as
+ * informational (KERN_INFO). In the unexpected situation where sfp or sdp is NULL the macro
+ * reverts to a pr_info and ignores SCSI_LOG_TIMEOUT and always prints to the log.
  * Example: this invocation: 'scsi_logging_level -s -T 3' will print
  * depth (aka level) 1 and 2 SG_LOG() messages.
  */
@@ -446,21 +439,19 @@ static void sg_take_snap(struct sg_fd *sfp, bool clear_first);
 #define SG_LOG_BUFF_SZ 48
 #define SG_LOG_ACTIVE 1
 
-#define SG_LOG(depth, sfp, fmt, a...)					\
-	do {								\
-		char _b[SG_LOG_BUFF_SZ];				\
-		int _tid = (current ? current->pid : -1);		\
-		struct sg_fd *_fp = sfp;				\
-		struct sg_device *_sdp = _fp ? _fp->parentdp : NULL;	\
-									\
-		if (likely(_sdp && _sdp->disk)) {			\
-			snprintf(_b, sizeof(_b), "sg%u: tid=%d",	\
-				 _sdp->index, _tid);			\
-			SCSI_LOG_TIMEOUT(depth,				\
-					 sdev_prefix_printk(KERN_INFO,	\
-					 _sdp->device, _b, fmt, ##a));	\
-		} else							\
-			pr_info("sg: sdp or sfp NULL, " fmt, ##a);	\
+#define SG_LOG(depth, sfp, fmt, a...)							\
+	do {										\
+		char _b[SG_LOG_BUFF_SZ];						\
+		int _tid = (current ? current->pid : -1);				\
+		struct sg_fd *_fp = sfp;						\
+		struct sg_device *_sdp = _fp ? _fp->parentdp : NULL;			\
+											\
+		if (likely(_sdp && _sdp->disk)) {					\
+			snprintf(_b, sizeof(_b), "sg%u: tid=%d", _sdp->index, _tid);	\
+			SCSI_LOG_TIMEOUT(depth,	sdev_prefix_printk(KERN_INFO,		\
+					 _sdp->device, _b, fmt, ##a));			\
+		} else									\
+			pr_info("sg: sdp or sfp NULL, " fmt, ##a);			\
 	} while (0)
 #else
 #define SG_LOG(depth, sfp, fmt, a...) do { } while (0)
@@ -503,7 +494,7 @@ static void sg_take_snap(struct sg_fd *sfp, bool clear_first);
  *		violation, opened read-only but SCSI command not listed read-only
  * EPROTO	logic error (in driver); like "shouldn't get here"
  * EPROTOTYPE	atomic state change failed unexpectedly
- * ERANGE	multiple requests: usually bad flag values
+ * ERANGE	multiple requests: usually a bad flag or combination of flag values
  * ERESTARTSYS	should not be seen in user space, associated with an
  *		interruptible wait; will restart system call or give EINTR
  * EWOULDBLOCK	[aka EAGAIN]; additionally if the 'more async' flag is set
@@ -512,15 +503,13 @@ static void sg_take_snap(struct sg_fd *sfp, bool clear_first);
 
 /*
  * The SCSI interfaces that use read() and write() as an asynchronous variant of
- * ioctl(..., SG_IO, ...) are fundamentally unsafe, since there are lots of ways
- * to trigger read() and write() calls from various contexts with elevated
- * privileges. This can lead to kernel memory corruption (e.g. if these
- * interfaces are called through splice()) and privilege escalation inside
- * userspace (e.g. if a process with access to such a device passes a file
+ * ioctl(..., SG_IO, ...) are fundamentally unsafe, since there are lots of ways to trigger read()
+ * and write() calls from various contexts with elevated privileges. This can lead to kernel
+ * memory corruption (e.g. if these interfaces are called through splice()) and privilege
+ * escalation inside userspace (e.g. if a process with access to such a device passes a file
  * descriptor to a SUID binary as stdin/stdout/stderr).
  *
- * This function provides protection for the legacy API by restricting the
- * calling context.
+ * This function provides protection for the legacy API by restricting the calling context.
  */
 static int
 sg_check_file_access(struct file *filp, const char *caller)
@@ -579,11 +568,9 @@ sg_wait_open_event(struct sg_device *sdp, bool o_excl)
 }
 
 /*
- * scsi_block_when_processing_errors() returns 0 when dev was taken offline by
- * error recovery, 1 otherwise (i.e. okay). Even if in error recovery, let
- * user continue if O_NONBLOCK set. Permits SCSI commands to be issued during
- * error recovery. Tread carefully.
- * Returns 0 for ok (i.e. allow), -EPROTO if sdp is NULL, otherwise -ENXIO .
+ * scsi_block_when_processing_errors() returns 0 when dev was taken offline by error recovery, 1
+ * otherwise (i.e. okay). Even if in error recovery, let user continue if O_NONBLOCK set. Permits
+ * SCSI commands to be issued during error recovery. Tread carefully.
  */
 static inline int
 sg_allow_if_err_recovery(struct sg_device *sdp, bool non_block)
@@ -600,11 +587,10 @@ sg_allow_if_err_recovery(struct sg_device *sdp, bool non_block)
 }
 
 /*
- * Corresponds to the open() system call on sg devices. Implements O_EXCL on
- * a per device basis using 'open_cnt'. If O_EXCL and O_NONBLOCK and there is
- * already a sg handle open on this device then it fails with an errno of
- * EBUSY. Without the O_NONBLOCK flag then this thread enters an interruptible
- * wait until the other handle(s) are closed.
+ * Corresponds to the open() system call on sg devices. Implements O_EXCL on a per-device basis
+ * using 'open_cnt'. If O_EXCL and O_NONBLOCK and there is already a sg handle open on this device
+ * then it fails with an errno of EBUSY. Without the O_NONBLOCK flag, this thread enters an
+ * interruptible wait until the other handle(s) are closed.
  */
 static int
 sg_open(struct inode *inode, struct file *filp)
@@ -671,9 +657,8 @@ sg_open(struct inode *inode, struct file *filp)
 	filp->private_data = sfp;
 	sfp->tid = (current ? current->pid : -1);
 	mutex_unlock(&sdp->open_rel_lock);
-	SG_LOG(3, sfp, "%s: o_count after=%d on minor=%d, op_flags=0x%x%s\n",
-	       __func__, o_count, min_dev, op_flags,
-	       (non_block ? " O_NONBLOCK" : ""));
+	SG_LOG(3, sfp, "%s: o_count after=%d on minor=%d, op_flags=0x%x%s\n", __func__, o_count,
+	       min_dev, op_flags, (non_block ? " O_NONBLOCK" : ""));
 
 	res = 0;
 sg_put:
@@ -697,8 +682,7 @@ sg_open(struct inode *inode, struct file *filp)
 static inline bool
 sg_fd_is_shared(struct sg_fd *sfp)
 {
-	return !xa_get_mark(&sfp->parentdp->sfp_arr, sfp->idx,
-			    SG_XA_FD_UNSHARED);
+	return !xa_get_mark(&sfp->parentdp->sfp_arr, sfp->idx, SG_XA_FD_UNSHARED);
 }
 
 static inline struct sg_fd *
@@ -718,7 +702,7 @@ sg_fd_share_ptr(struct sg_fd *sfp)
 
 /*
  * Picks up driver or host (transport) errors and actual SCSI status problems.
- * Specifically SAM_STAT_CONDITION_MET is _not_ an error.
+ * Specifically, the SCSI status SAM_STAT_CONDITION_MET is _not_ an error.
  */
 static inline bool
 sg_result_is_good(int rq_result)
@@ -730,11 +714,10 @@ sg_result_is_good(int rq_result)
 }
 
 /*
- * Release resources associated with a prior, successful sg_open(). It can be
- * seen as the (final) close() call on a sg device file descriptor in the user
- * space. The real work releasing all resources associated with this file
- * descriptor is done by sg_uc_remove_sfp() which is scheduled by
- * sg_remove_sfp().
+ * Release resources associated with a prior, successful sg_open(). It can be seen as the (final)
+ * close() call on a sg device file descriptor in user space. The real work of releasing all
+ * resources associated with this file descriptor is done by sg_uc_remove_sfp(), which is
+ * scheduled by sg_remove_sfp().
  */
 static int
 sg_release(struct inode *inode, struct file *filp)
@@ -757,17 +740,14 @@ sg_release(struct inode *inode, struct file *filp)
 	o_count = atomic_read(&sdp->open_cnt);
 	SG_LOG(3, sfp, "%s: open count before=%d\n", __func__, o_count);
 	if (unlikely(test_and_set_bit(SG_FFD_RELEASE, sfp->ffd_bm)))
-		SG_LOG(1, sfp, "%s: second release on this fd ? ?\n",
-		       __func__);
+		SG_LOG(1, sfp, "%s: second release on this fd ? ?\n", __func__);
 	scsi_autopm_put_device(sdp->device);
-	if (likely(!xa_get_mark(&sdp->sfp_arr, sfp->idx, SG_XA_FD_FREE)) &&
-	    sg_fd_is_shared(sfp))
-		sg_remove_sfp_share(sfp, xa_get_mark(&sdp->sfp_arr, sfp->idx,
-						     SG_XA_FD_RS_SHARE));
+	if (likely(!xa_get_mark(&sdp->sfp_arr, sfp->idx, SG_XA_FD_FREE)) && sg_fd_is_shared(sfp))
+		sg_remove_sfp_share(sfp, xa_get_mark(&sdp->sfp_arr, sfp->idx, SG_XA_FD_RS_SHARE));
 	kref_put(&sfp->f_ref, sg_remove_sfp);	/* init=1: sg_add_sfp() */
 	/*
-	 * Possibly many open()s waiting on exclude clearing, start many;
-	 * only open(O_EXCL)'s wait when open_cnt<2 and only start one.
+	 * Possibly many open()s waiting on exclude clearing, start many; only open(O_EXCL)s wait
+	 * when open_cnt<2 and only start one.
 	 */
 	if (test_and_clear_bit(SG_FDEV_EXCLUDE, sdp->fdev_bm))
 		wake_up_interruptible_all(&sdp->open_wait);
@@ -786,12 +766,11 @@ sg_comm_wr_init(struct sg_comm_wr_t *cwrp)
 }
 
 /*
- * ***********************************************************************
- * write(2) related functions follow. They are shown before read(2) related
- * functions. That is because SCSI commands/requests are first "written" to
- * the SCSI device by using write(2), ioctl(SG_IOSUBMIT) or the first half
- * of the synchronous ioctl(SG_IO) system call.
- * ***********************************************************************
+ * **********************************************************************************************
+ * write(2) related functions follow. They are shown before read(2) related functions. That is
+ * because SCSI commands/requests are first "written" to the SCSI device by using write(2),
+ * ioctl(SG_IOSUBMIT) or the first half of the synchronous ioctl(SG_IO) system call.
+ * **********************************************************************************************
  */
 
 /* This is the write(2) system call entry point. v4 interface disallowed. */
@@ -799,7 +778,7 @@ static ssize_t
 sg_write(struct file *filp, const char __user *p, size_t count, loff_t *ppos)
 {
 	bool get_v3_hdr;
-	int mxsize, cmd_size, input_size, res;
+	int mxsize, cmd_size, input_size, res, ddir;
 	u8 opcode;
 	struct sg_device *sdp;
 	struct sg_fd *sfp;
@@ -861,12 +840,10 @@ sg_write(struct file *filp, const char __user *p, size_t count, loff_t *ppos)
 			goto to_v2;
 		}
 		if (h3p->interface_id != 'S') {
-			pr_info_once("sg: %s: v3 interface only here\n",
-				     __func__);
+			pr_info_once("sg: %s: v3 interface only here\n", __func__);
 			return -EPERM;
 		}
-		pr_warn_once("Please use %s instead of write(),\n%s\n",
-			     "ioctl(SG_SUBMIT_V3)",
+		pr_warn_once("Please use ioctl(SG_IOSUBMIT_V3) instead of write(),\n%s\n",
 			     "  See: https://sg.danny.cz/sg/sg_v40.html");
 		res = sg_submit_v3(sfp, h3p, false, NULL);
 		return res < 0 ? res : (int)count;
@@ -902,14 +879,12 @@ sg_write(struct file *filp, const char __user *p, size_t count, loff_t *ppos)
 	h3p->iovec_count = 0;
 	h3p->mx_sb_len = 0;
 	if (input_size > 0)
-		h3p->dxfer_direction = (ohp->reply_len > SZ_SG_HEADER) ?
-		    SG_DXFER_TO_FROM_DEV : SG_DXFER_TO_DEV;
+		ddir = (ohp->reply_len > SZ_SG_HEADER) ? SG_DXFER_TO_FROM_DEV : SG_DXFER_TO_DEV;
 	else
-		h3p->dxfer_direction = (mxsize > 0) ? SG_DXFER_FROM_DEV :
-						      SG_DXFER_NONE;
+		ddir = (mxsize > 0) ? SG_DXFER_FROM_DEV : SG_DXFER_NONE;
+	h3p->dxfer_direction = ddir;
 	h3p->dxfer_len = mxsize;
-	if (h3p->dxfer_direction == SG_DXFER_TO_DEV ||
-	    h3p->dxfer_direction == SG_DXFER_TO_FROM_DEV)
+	if (ddir == SG_DXFER_TO_DEV || ddir == SG_DXFER_TO_FROM_DEV)
 		h3p->dxferp = (u8 __user *)p + cmd_size;
 	else
 		h3p->dxferp = NULL;
@@ -919,17 +894,17 @@ sg_write(struct file *filp, const char __user *p, size_t count, loff_t *ppos)
 	h3p->pack_id = ohp->pack_id;
 	h3p->usr_ptr = NULL;
 	/*
-	 * SG_DXFER_TO_FROM_DEV is functionally equivalent to SG_DXFER_FROM_DEV,
-	 * but it is possible that the app intended SG_DXFER_TO_DEV, because
-	 * there is a non-zero input_size, so emit a warning.
+	 * SG_DXFER_TO_FROM_DEV is functionally equivalent to SG_DXFER_FROM_DEV, but it is possible
+	 * that the app intended SG_DXFER_TO_DEV, because there is a non-zero input_size, so emit
+	 * a warning.
 	 */
 	if (unlikely(h3p->dxfer_direction == SG_DXFER_TO_FROM_DEV)) {
+		/* checkpatch warns on split quoted strings, so build the message from %s pieces */
 		printk_ratelimited
-			(KERN_WARNING
-			 "%s: data in/out %d/%d bytes for SCSI command 0x%x-- guessing data in;\n"
-			 "   program %s not setting count and/or reply_len properly\n",
-			 __func__, ohp->reply_len - (int)SZ_SG_HEADER,
-			 input_size, (unsigned int)opcode, current->comm);
+			(KERN_WARNING "%s: data in/out %d/%d %s 0x%x-- %s;\n   program %s %s\n",
+			 __func__, ohp->reply_len - (int)SZ_SG_HEADER, input_size,
+			 "bytes for SCSI command", (unsigned int)opcode, "guessing data in",
+			 current->comm, "not setting count and/or reply_len properly");
 	}
 	sg_comm_wr_init(&cwr);
 	cwr.h3p = h3p;
@@ -983,8 +958,7 @@ sg_fetch_cmnd(struct sg_fd *sfp, const u8 __user *u_cdbp, int len, u8 *cdbp)
 }
 
 static int
-sg_submit_v3(struct sg_fd *sfp, struct sg_io_hdr *hp, bool sync,
-	     struct sg_request **o_srp)
+sg_submit_v3(struct sg_fd *sfp, struct sg_io_hdr *hp, bool sync, struct sg_request **o_srp)
 {
 	unsigned long ul_timeout;
 	struct sg_request *srp;
@@ -1076,13 +1050,11 @@ sg_sgat_zero(struct sg_scatter_hold *sgatp, int off, int nbytes)
 }
 
 /*
- * Copies nbytes from the start of 'fromp' into sgatp (this driver's scatter
- * gather list representation) starting at byte offset 'off'. If nbytes is
- * too long then it is trimmed.
+ * Copies nbytes from the start of 'fromp' into sgatp (this driver's scatter gather list
+ * representation) starting at byte offset 'off'. If nbytes is too long then it is trimmed.
  */
 static void
-sg_sgat_cp_into(struct sg_scatter_hold *sgatp, int off, const u8 *fromp,
-		int nbytes)
+sg_sgat_cp_into(struct sg_scatter_hold *sgatp, int off, const u8 *fromp, int nbytes)
 {
 	int k, rem, off_pl_nbyt;
 	int ind = 0;
@@ -1116,8 +1088,7 @@ sg_sgat_cp_into(struct sg_scatter_hold *sgatp, int off, const u8 *fromp,
 	for ( ; k < off_pl_nbyt; k += rem) {
 		rem = off_pl_nbyt - k;
 		if (rem >= elem_sz) {
-			memcpy((u8 *)pg_ep + ind, fromp + from_off,
-			       elem_sz - ind);
+			memcpy((u8 *)pg_ep + ind, fromp + from_off, elem_sz - ind);
 			if (++pg_ind >= num_sgat)
 				return;
 			pg_ep = sgatp->pages[pg_ind];
@@ -1132,10 +1103,9 @@ sg_sgat_cp_into(struct sg_scatter_hold *sgatp, int off, const u8 *fromp,
 }
 
 /*
- * Takes a pointer (cop) to the multiple request (mrq) control object and
- * a pointer to the command array. The command array (with tot_reqs elements)
- * is written out (flushed) to user space pointer cop->din_xferp. The
- * secondary error value (s_res) is placed in the cop->spare_out field.
+ * Takes a pointer (cop) to the multiple request (mrq) control object and a pointer to the command
+ * array. The command array (with tot_reqs elements) is written out (flushed) to user space
+ * pointer cop->din_xferp. The secondary error value (s_res) is placed in the cop->spare_out field.
  */
 static int
 sg_mrq_arr_flush(struct sg_mrq_hold *mhp)
@@ -1143,15 +1113,12 @@ sg_mrq_arr_flush(struct sg_mrq_hold *mhp)
 	int s_res = mhp->s_res;
 	struct sg_io_v4 *cop = mhp->cwrp->h4p;
 	void __user *p = uptr64(cop->din_xferp);
-	struct sg_io_v4 *a_hds = mhp->a_hds;
 	u32 sz = min(mhp->tot_reqs * SZ_SG_IO_V4, cop->din_xfer_len);
 
 	if (unlikely(s_res))
 		cop->spare_out = -s_res;
-	if (unlikely(!p))
-		return 0;
-	if (sz > 0) {
-		if (copy_to_user(p, a_hds, sz))
+	if (likely(sz > 0 && p)) {
+		if (copy_to_user(p, mhp->a_hds, sz))
 			return -EFAULT;
 	}
 	return 0;
@@ -1223,8 +1190,7 @@ sg_mrq_get_ready_srp(struct sg_fd *sfp, struct sg_request **srpp)
 
 /* N.B. After this function is completed what srp points to should be considered invalid. */
 static int
-sg_mrq_1complet(struct sg_mrq_hold *mhp, struct sg_fd *sfp,
-		struct sg_request *srp)
+sg_mrq_1complet(struct sg_mrq_hold *mhp, struct sg_fd *sfp, struct sg_request *srp)
 {
 	int s_res, indx;
 	int tot_reqs = mhp->tot_reqs;
@@ -1236,12 +1202,11 @@ sg_mrq_1complet(struct sg_mrq_hold *mhp, struct sg_fd *sfp,
 		return -EPROTO;
 	indx = srp->s_hdr4.mrq_ind;
 	if (unlikely(srp->parentfp != sfp)) {
-		SG_LOG(1, sfp, "%s: mrq_ind=%d, sfp out-of-sync\n",
-		       __func__, indx);
+		SG_LOG(1, sfp, "%s: mrq_ind=%d, sfp out-of-sync\n", __func__, indx);
 		return -EPROTO;
 	}
-	SG_LOG(3, sfp, "%s: %s, mrq_ind=%d, pack_id=%d\n", __func__,
-	       sg_side_str(srp), indx, srp->pack_id);
+	SG_LOG(3, sfp, "%s: %s, mrq_ind=%d, pack_id=%d\n", __func__, sg_side_str(srp), indx,
+	       srp->pack_id);
 	if (unlikely(indx < 0 || indx >= tot_reqs))
 		return -EPROTO;
 	hp = a_hds + indx;
@@ -1257,14 +1222,12 @@ sg_mrq_1complet(struct sg_mrq_hold *mhp, struct sg_fd *sfp,
 	if (cop->din_xfer_len > 0)
 		--cop->din_resid;
 	if (mhp->co_mmap) {
-		sg_sgat_cp_into(mhp->co_mmap_sgatp, indx * SZ_SG_IO_V4,
-				(const u8 *)hp, SZ_SG_IO_V4);
+		sg_sgat_cp_into(mhp->co_mmap_sgatp, indx * SZ_SG_IO_V4, (const u8 *)hp,
+				SZ_SG_IO_V4);
 		if (sfp->async_qp && (hp->flags & SGV4_FLAG_SIGNAL))
 			kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN);
 		if (sfp->efd_ctxp && (hp->flags & SGV4_FLAG_EVENTFD)) {
-			u64 n = eventfd_signal(sfp->efd_ctxp, 1);
-
-			if (n != 1)
+			if (eventfd_signal(sfp->efd_ctxp, 1) != 1)
 				pr_info("%s: eventfd_signal problem\n", __func__);
 		}
 	} else if (sfp->async_qp && (hp->flags & SGV4_FLAG_SIGNAL)) {
@@ -1280,11 +1243,9 @@ static int
 sg_wait_any_mrq(struct sg_fd *sfp, struct sg_request **srpp)
 {
 	if (test_bit(SG_FFD_EXCL_WAITQ, sfp->ffd_bm))
-		return __wait_event_interruptible_exclusive
-					(sfp->cmpl_wait,
-					 sg_mrq_get_ready_srp(sfp, srpp));
-	return __wait_event_interruptible(sfp->cmpl_wait,
-					  sg_mrq_get_ready_srp(sfp, srpp));
+		return __wait_event_interruptible_exclusive(sfp->cmpl_wait,
+							    sg_mrq_get_ready_srp(sfp, srpp));
+	return __wait_event_interruptible(sfp->cmpl_wait, sg_mrq_get_ready_srp(sfp, srpp));
 }
 
 static bool
@@ -1339,15 +1300,13 @@ sg_wait_poll_for_given_srp(struct sg_fd *sfp, struct sg_request *srp, bool do_po
 	if (atomic_read(&srp->rq_st) != SG_RQ_INFLIGHT)
 		goto skip_wait;		/* and skip _acquire() */
 	/* N.B. The SG_FFD_EXCL_WAITQ flag is ignored here. */
-	res = __wait_event_interruptible(sfp->cmpl_wait,
-					 sg_rq_landed(sdp, srp));
+	res = __wait_event_interruptible(sfp->cmpl_wait, sg_rq_landed(sdp, srp));
 	if (unlikely(res)) { /* -ERESTARTSYS because signal hit thread */
 		set_bit(SG_FRQ_IS_ORPHAN, srp->frq_bm);
 		/* orphans harvested when sfp->keep_orphan is false */
 		sg_rq_chg_state_force(srp, SG_RQ_INFLIGHT);
-		SG_LOG(1, sfp, "%s:  wait_event_interruptible(): %s[%d]\n",
-		       __func__, (res == -ERESTARTSYS ? "ERESTARTSYS" : ""),
-		       res);
+		SG_LOG(1, sfp, "%s:  wait_event_interruptible(): %s[%d]\n", __func__,
+		       (res == -ERESTARTSYS ? "ERESTARTSYS" : ""), res);
 		return res;
 	}
 skip_wait:
@@ -1441,20 +1400,18 @@ sg_mrq_poll_either(struct sg_fd *sfp, struct sg_fd *sec_sfp, bool *on_sfp)
 }
 
 /*
- * This is a fair-ish algorithm for an interruptible wait on two file
- * descriptors. It favours the main fd over the secondary fd (sec_sfp).
- * Increments cop->info for each successful completion.
+ * This is a fair-ish algorithm for an interruptible wait on two file descriptors. It favours the
+ * main fd over the secondary fd (sec_sfp). Increments cop->info for each successful completion.
  */
 static int
-sg_mrq_complets(struct sg_mrq_hold *mhp, struct sg_fd *sfp,
-		struct sg_fd *sec_sfp, int mreqs, int sec_reqs)
+sg_mrq_complets(struct sg_mrq_hold *mhp, struct sg_fd *sfp, struct sg_fd *sec_sfp, int mreqs,
+		int sec_reqs)
 {
 	bool on_sfp;
 	int res;
 	struct sg_request *srp;
 
-	SG_LOG(3, sfp, "%s: mreqs=%d, sec_reqs=%d\n", __func__, mreqs,
-	       sec_reqs);
+	SG_LOG(3, sfp, "%s: mreqs=%d, sec_reqs=%d\n", __func__, mreqs, sec_reqs);
 	while (mreqs + sec_reqs > 0) {
 		while (mreqs > 0 && sg_mrq_get_ready_srp(sfp, &srp)) {
 			--mreqs;
@@ -1637,6 +1594,8 @@ sg_mrq_sanity(struct sg_mrq_hold *mhp, bool is_svb)
 	       rip, k);
 	return false;
 }
+
+/* rsv_idx>=0 only when this request is the write-side of a request share */
 static struct sg_request *
 sg_mrq_submit(struct sg_fd *rq_sfp, struct sg_mrq_hold *mhp, int pos_in_rq_arr, int rsv_idx,
 	      struct sg_request *possible_srp)
@@ -1667,12 +1626,11 @@ sg_mrq_submit(struct sg_fd *rq_sfp, struct sg_mrq_hold *mhp, int pos_in_rq_arr,
 }
 
 /*
- * Processes most mrq requests apart from those from "shared variable
- * blocking" (svb) method which is processed in sg_process_svb_mrq().
+ * Processes most mrq requests, apart from those using the "shared variable blocking" (svb)
+ * method, which are processed in sg_process_svb_mrq().
  */
 static int
-sg_process_most_mrq(struct sg_fd *fp, struct sg_fd *o_sfp,
-		    struct sg_mrq_hold *mhp)
+sg_process_most_mrq(struct sg_fd *fp, struct sg_fd *o_sfp, struct sg_mrq_hold *mhp)
 {
 	int flags, j;
 	int num_subm = 0;
@@ -1686,14 +1644,13 @@ sg_process_most_mrq(struct sg_fd *fp, struct sg_fd *o_sfp,
 	struct sg_io_v4 *hp;		/* ptr to request object in a_hds */
 	struct sg_request *srp;
 
-	SG_LOG(3, fp, "%s: id_of_mrq=%d, tot_reqs=%d, enter\n", __func__,
-	       mhp->id_of_mrq, mhp->tot_reqs);
+	SG_LOG(3, fp, "%s: id_of_mrq=%d, tot_reqs=%d, enter\n", __func__, mhp->id_of_mrq,
+	       mhp->tot_reqs);
 	/* Dispatch (submit) requests and optionally wait for response */
 	for (hp = mhp->a_hds, j = 0; num_subm < mhp->tot_reqs; ++hp, ++j) {
-		if (mhp->chk_abort && test_and_clear_bit(SG_FFD_MRQ_ABORT,
-							 fp->ffd_bm)) {
-			SG_LOG(1, fp, "%s: id_of_mrq=%d aborting at ind=%d\n",
-			       __func__, mhp->id_of_mrq, num_subm);
+		if (mhp->chk_abort && test_and_clear_bit(SG_FFD_MRQ_ABORT, fp->ffd_bm)) {
+			SG_LOG(1, fp, "%s: id_of_mrq=%d aborting at ind=%d\n", __func__,
+			       mhp->id_of_mrq, num_subm);
 			break;	/* N.B. rest not submitted */
 		}
 		flags = hp->flags;
@@ -1705,8 +1662,7 @@ sg_process_most_mrq(struct sg_fd *fp, struct sg_fd *o_sfp,
 		}
 		srp->s_hdr4.mrq_ind = num_subm++;
 		if (mhp->chk_abort)
-			atomic_set(&srp->s_hdr4.pack_id_of_mrq,
-				   mhp->id_of_mrq);
+			atomic_set(&srp->s_hdr4.pack_id_of_mrq, mhp->id_of_mrq);
 		if (mhp->immed || (!(mhp->from_sg_io || (flags & shr_complet_b4)))) {
 			if (fp == rq_sfp)
 				++this_fp_sent;
@@ -1724,7 +1680,6 @@ sg_process_most_mrq(struct sg_fd *fp, struct sg_fd *o_sfp,
 		if (unlikely(res))
 			break;
 		++num_cmpl;
-
 	}	/* end of dispatch request and optionally wait response loop */
 	cop->dout_resid = mhp->tot_reqs - num_subm;
 	cop->info = mhp->immed ? num_subm : num_cmpl;
@@ -1736,10 +1691,8 @@ sg_process_most_mrq(struct sg_fd *fp, struct sg_fd *o_sfp,
 	if (mhp->immed)
 		return res;
 	if (likely(res == 0 && (this_fp_sent + other_fp_sent) > 0)) {
-		mhp->s_res = sg_mrq_complets(mhp, fp, o_sfp, this_fp_sent,
-					     other_fp_sent);
-		if (unlikely(mhp->s_res == -EFAULT ||
-			     mhp->s_res == -ERESTARTSYS))
+		mhp->s_res = sg_mrq_complets(mhp, fp, o_sfp, this_fp_sent, other_fp_sent);
+		if (unlikely(mhp->s_res == -EFAULT || mhp->s_res == -ERESTARTSYS))
 			res = mhp->s_res;	/* this may leave orphans */
 	}
 	if (mhp->id_of_mrq)	/* can no longer do a mrq abort */
@@ -1869,7 +1822,8 @@ sg_svb_mrq_first_come(struct sg_fd *fp, struct sg_fd *o_sfp, struct sg_mrq_hold
 				if (first_err == 0)
 					first_err = mhp->s_res;
 				svb_arr[m].prev_ws_srp = NULL;
-				SG_LOG(1, o_sfp, "%s: mrq_submit(oth)->%d\n", __func__, mhp->s_res);
+				SG_LOG(1, o_sfp, "%s: sg_mrq_submit(oth)->%d\n", __func__,
+				       mhp->s_res);
 				continue;
 			}
 			svb_arr[m].prev_ws_srp = srp;
@@ -1902,7 +1856,7 @@ sg_svb_mrq_first_come(struct sg_fd *fp, struct sg_fd *o_sfp, struct sg_mrq_hold
 		}
 		if (chk_oth_first)
 			goto this_second;
-	}	/* end of loop for deferred ws submits and all responses */
+	}	/* end of loop for deferred ws submits and rs+ws responses */
 
 	if (res == 0 && first_err)
 		res = first_err;
@@ -1998,9 +1952,7 @@ sg_svb_mrq_ordered(struct sg_fd *fp, struct sg_fd *o_sfp, struct sg_mrq_hold *mh
 		if (IS_ERR(srp)) {
 			mhp->s_res = PTR_ERR(srp);
 			res = mhp->s_res;
-			SG_LOG(1, o_sfp,
-			       "%s: mrq_submit(oth)->%d\n",
-				__func__, res);
+			SG_LOG(1, o_sfp, "%s: sg_mrq_submit(oth)->%d\n", __func__, res);
 			return res;
 		}
 		svb_arr[m].prev_ws_srp = srp;
@@ -2008,8 +1960,7 @@ sg_svb_mrq_ordered(struct sg_fd *fp, struct sg_fd *o_sfp, struct sg_mrq_hold *mh
 		++other_fp_sent;
 		srp->s_hdr4.mrq_ind = ws_pos;
 		if (mhp->chk_abort)
-			atomic_set(&srp->s_hdr4.pack_id_of_mrq,
-				   mhp->id_of_mrq);
+			atomic_set(&srp->s_hdr4.pack_id_of_mrq, mhp->id_of_mrq);
 	}
 	while (this_fp_sent > 0) {	/* non-data requests */
 		res = sg_wait_any_mrq(fp, &srp);
@@ -2043,8 +1994,7 @@ sg_svb_mrq_ordered(struct sg_fd *fp, struct sg_fd *o_sfp, struct sg_mrq_hold *mh
  * per fd" rule is enforced by the SG_FFD_SVB_ACTIVE file descriptor flag.
  */
 static int
-sg_process_svb_mrq(struct sg_fd *fp, struct sg_fd *o_sfp,
-		   struct sg_mrq_hold *mhp)
+sg_process_svb_mrq(struct sg_fd *fp, struct sg_fd *o_sfp, struct sg_mrq_hold *mhp)
 {
 	bool aborted = false;
 	int j, delta_subm, subm_before, cmpl_before;
@@ -2053,13 +2003,12 @@ sg_process_svb_mrq(struct sg_fd *fp, struct sg_fd *o_sfp,
 	int res = 0;
 	struct sg_io_v4 *cop = mhp->cwrp->h4p;
 
-	SG_LOG(3, fp, "%s: id_of_mrq=%d, tot_reqs=%d, enter\n", __func__,
-	       mhp->id_of_mrq, mhp->tot_reqs);
+	SG_LOG(3, fp, "%s: id_of_mrq=%d, tot_reqs=%d, enter\n", __func__, mhp->id_of_mrq,
+	       mhp->tot_reqs);
 
 	/* outer loop: SG_MAX_RSV_REQS read-side requests (chunks) at a time */
 	for (j = 0; j < mhp->tot_reqs; j += delta_subm) {
-		if (mhp->chk_abort &&
-		    test_and_clear_bit(SG_FFD_MRQ_ABORT, fp->ffd_bm)) {
+		if (mhp->chk_abort && test_and_clear_bit(SG_FFD_MRQ_ABORT, fp->ffd_bm)) {
 			SG_LOG(1, fp, "%s: id_of_mrq=%d aborting at pos=%d\n", __func__,
 			       mhp->id_of_mrq, num_subm);
 			aborted = true;
@@ -2077,8 +2026,7 @@ sg_process_svb_mrq(struct sg_fd *fp, struct sg_fd *o_sfp,
 		num_cmpl += (cop->info - cmpl_before);
 		if (res || delta_subm == 0)	/* error or didn't make progress */
 			break;
-		if (unlikely(mhp->s_res == -EFAULT ||
-			     mhp->s_res == -ERESTARTSYS))
+		if (unlikely(mhp->s_res == -EFAULT || mhp->s_res == -ERESTARTSYS))
 			res = mhp->s_res;	/* this may leave orphans */
 		if (res)
 			break;
@@ -2110,11 +2058,10 @@ sg_mrq_name(bool from_sg_io, u32 flags)
 #endif
 
 /*
- * Implements the multiple request functionality. When 'from_sg_io' is true
- * invocation was via ioctl(SG_IO), otherwise it was via ioctl(SG_IOSUBMIT).
- * Submit non-blocking if IMMED flag given or when ioctl(SG_IOSUBMIT)
- * is used with O_NONBLOCK set on its file descriptor. Hipri non-blocking
- * is when the HIPRI flag is given.
+ * Implements the multiple request functionality. When from_sg_io is true, invocation was via
+ * ioctl(SG_IO); otherwise it was via ioctl(SG_IOSUBMIT). Submit non-blocking if the IMMED flag is
+ * given, or when ioctl(SG_IOSUBMIT) is used with O_NONBLOCK set on its file descriptor. Hipri
+ * non-blocking applies when the HIPRI flag is given.
  */
 static int
 sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool from_sg_io)
@@ -2129,7 +2076,7 @@ sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool from_sg_io)
 	u32 cdb_alen = cop->request_len;
 	u32 tot_reqs = dout_len / SZ_SG_IO_V4;
 	u8 *cdb_ap = NULL;
-	struct sg_io_v4 *a_hds;		/* array of request objects */
+	struct sg_io_v4 *a_hds;			/* array of request objects */
 	struct sg_fd *fp = cwrp->sfp;
 	struct sg_fd *o_sfp = sg_fd_share_ptr(fp);
 	struct sg_device *sdp = fp->parentdp;
@@ -2157,11 +2104,10 @@ sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool from_sg_io)
 	mhp->tot_reqs = tot_reqs;
 	mhp->s_res = 0;
 	if (mhp->id_of_mrq) {
-		existing_id = atomic_cmpxchg(&fp->mrq_id_abort, 0,
-					     mhp->id_of_mrq);
+		existing_id = atomic_cmpxchg(&fp->mrq_id_abort, 0, mhp->id_of_mrq);
 		if (existing_id && existing_id != mhp->id_of_mrq) {
-			SG_LOG(1, fp, "%s: existing id=%d id_of_mrq=%d\n",
-			       __func__, existing_id, mhp->id_of_mrq);
+			SG_LOG(1, fp, "%s: existing id=%d id_of_mrq=%d\n", __func__, existing_id,
+			       mhp->id_of_mrq);
 			return -EDOM;
 		}
 		clear_bit(SG_FFD_MRQ_ABORT, fp->ffd_bm);
@@ -2169,27 +2115,26 @@ sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool from_sg_io)
 	} else {
 		mhp->chk_abort = false;
 	}
-	if (from_sg_io) {
+	if (from_sg_io) {	/* only ordered blocking uses ioctl(SG_IO) */
 		if (unlikely(mhp->immed)) {
-			SG_LOG(1, fp, "%s: ioctl(SG_IO) %s contradicts\n",
-			       __func__, "with SGV4_FLAG_IMMED");
+			SG_LOG(1, fp, "%s: ioctl(SG_IO) with SGV4_FLAG_IMMED contradicts\n",
+			       __func__);
 			return -ERANGE;
 		}
 		if (unlikely(is_svb)) {
-			SG_LOG(1, fp, "%s: ioctl(SG_IO) %s contradicts\n",
-			       __func__, "with SGV4_FLAG_SHARE");
+			SG_LOG(1, fp, "%s: ioctl(SG_IO) with SGV4_FLAG_SHARE contradicts\n",
+			       __func__);
 			return -ERANGE;
 		}
 		if (unlikely(f_non_block)) {
-			SG_LOG(6, fp, "%s: ioctl(SG_IO) %s O_NONBLOCK\n",
-			       __func__, "ignoring");
+			SG_LOG(6, fp, "%s: ioctl(SG_IO) ignoring O_NONBLOCK\n", __func__);
 			f_non_block = false;
 		}
 	}
 	if (!mhp->immed && f_non_block)
 		mhp->immed = true;	/* hmm, think about this */
-	SG_LOG(3, fp, "%s: %s, tot_reqs=%u, id_of_mrq=%d\n", __func__,
-	       mrq_name, tot_reqs, mhp->id_of_mrq);
+	SG_LOG(3, fp, "%s: %s, tot_reqs=%u, id_of_mrq=%d\n", __func__, mrq_name, tot_reqs,
+	       mhp->id_of_mrq);
 	sg_v4h_partial_zero(cop);
 
 	if (mhp->co_mmap) {
@@ -2209,8 +2154,7 @@ sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool from_sg_io)
 	}
 	if (unlikely(tot_reqs > U16_MAX)) {
 		return -ERANGE;
-	} else if (unlikely(dout_len > SG_MAX_MULTI_REQ_SZ ||
-			    din_len > SG_MAX_MULTI_REQ_SZ ||
+	} else if (unlikely(dout_len > SG_MAX_MULTI_REQ_SZ || din_len > SG_MAX_MULTI_REQ_SZ ||
 			    cdb_alen > SG_MAX_MULTI_REQ_SZ)) {
 		return  -E2BIG;
 	} else if (unlikely(mhp->immed && mhp->stop_if)) {
@@ -2241,8 +2185,7 @@ sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool from_sg_io)
 			clear_bit(SG_FFD_SVB_ACTIVE, fp->ffd_bm);
 		return -ENOMEM;
 	}
-	if (copy_from_user(a_hds, cuptr64(cop->dout_xferp),
-			   tot_reqs * SZ_SG_IO_V4)) {
+	if (copy_from_user(a_hds, cuptr64(cop->dout_xferp), tot_reqs * SZ_SG_IO_V4)) {
 		res = -EFAULT;
 		goto fini;
 	}
@@ -2290,8 +2233,8 @@ sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool from_sg_io)
 }
 
 static int
-sg_submit_v4(struct sg_fd *sfp, void __user *p, struct sg_io_v4 *h4p,
-	     bool from_sg_io, struct sg_request **o_srp)
+sg_submit_v4(struct sg_fd *sfp, void __user *p, struct sg_io_v4 *h4p, bool from_sg_io,
+	     struct sg_request **o_srp)
 {
 	int res = 0;
 	int dlen;
@@ -2304,8 +2247,7 @@ sg_submit_v4(struct sg_fd *sfp, void __user *p, struct sg_io_v4 *h4p,
 	cwr.dlen = dlen;
 	if (h4p->flags & SGV4_FLAG_MULTIPLE_REQS) {
 		/* want v4 async or sync with guard, din and dout and flags */
-		if (!h4p->dout_xferp || h4p->din_iovec_count ||
-		    h4p->dout_iovec_count ||
+		if (!h4p->dout_xferp || h4p->din_iovec_count || h4p->dout_iovec_count ||
 		    (h4p->dout_xfer_len % SZ_SG_IO_V4))
 			return -ERANGE;
 		if (o_srp)
@@ -2349,8 +2291,7 @@ sg_submit_v4(struct sg_fd *sfp, void __user *p, struct sg_io_v4 *h4p,
 		u64 gen_tag = srp->tag;
 		struct sg_io_v4 __user *h4_up = (struct sg_io_v4 __user *)p;
 
-		if (copy_to_user(&h4_up->generated_tag, &gen_tag,
-				 sizeof(gen_tag)))
+		if (copy_to_user(&h4_up->generated_tag, &gen_tag, sizeof(gen_tag)))
 			return -EFAULT;
 	}
 	return res;
@@ -2393,18 +2334,16 @@ sg_ctl_iosubmit_v3(struct sg_fd *sfp, void __user *p)
 }
 
 /*
- * Assumes sharing has been established at the file descriptor level and now we
- * check the rq_flags of a new request/command. SGV4_FLAG_NO_DXFER may or may
- * not be used on the read-side, it must be used on the write-side. Also
- * returns (via *sh_varp) the proposed sg_request::sh_var of the new request
- * yet to be built/re-used.
+ * Assumes sharing has been established at the file descriptor level and now we check the rq_flags
+ * of a new request/command. SGV4_FLAG_NO_DXFER may or may not be used on the read-side; it must
+ * be used on the write-side. Also returns (via *sh_varp) the proposed sg_request::sh_var of the
+ * new request yet to be built/re-used.
  */
 static int
 sg_share_chk_flags(struct sg_fd *sfp, u32 rq_flags, int dxfer_len, int dir,
 		   enum sg_shr_var *sh_varp)
 {
-	bool is_read_side = xa_get_mark(&sfp->parentdp->sfp_arr, sfp->idx,
-					SG_XA_FD_RS_SHARE);
+	bool is_read_side = xa_get_mark(&sfp->parentdp->sfp_arr, sfp->idx, SG_XA_FD_RS_SHARE);
 	int result = 0;
 	enum sg_shr_var sh_var = SG_SHR_NONE;
 
@@ -2412,14 +2351,12 @@ sg_share_chk_flags(struct sg_fd *sfp, u32 rq_flags, int dxfer_len, int dir,
 		if (unlikely(rq_flags & SG_FLAG_DIRECT_IO)) {
 			result = -EINVAL; /* since no control of data buffer */
 		} else if (unlikely(dxfer_len < 1)) {
-			sh_var = is_read_side ? SG_SHR_RS_NOT_SRQ :
-						SG_SHR_WS_NOT_SRQ;
+			sh_var = is_read_side ? SG_SHR_RS_NOT_SRQ : SG_SHR_WS_NOT_SRQ;
 		} else if (is_read_side) {
 			sh_var = SG_SHR_RS_RQ;
 			if (unlikely(dir != SG_DXFER_FROM_DEV))
 				result = -ENOMSG;
-			if (rq_flags & SGV4_FLAG_NO_DXFER) {
-				/* rule out some contradictions */
+			if (rq_flags & SGV4_FLAG_NO_DXFER) {	/* rule out some contradictions */
 				if (unlikely(rq_flags & SG_FL_MMAP_DIRECT))
 					result = -ENODATA;
 			}
@@ -2443,19 +2380,17 @@ sg_share_chk_flags(struct sg_fd *sfp, u32 rq_flags, int dxfer_len, int dir,
 
 #if IS_ENABLED(SG_LOG_ACTIVE)
 static void
-sg_rq_state_fail_msg(struct sg_fd *sfp, enum sg_rq_state exp_old_st,
-		     enum sg_rq_state want_st, const char *fromp)
+sg_rq_state_fail_msg(struct sg_fd *sfp, enum sg_rq_state exp_old_st, enum sg_rq_state want_st,
+		     const char *fromp)
 {
 	const char *eaw_rs = "expected_old,wanted rq_st";
 
 	if (IS_ENABLED(CONFIG_SCSI_PROC_FS))
-		SG_LOG(1, sfp, "%s: %s: %s,%s,%s\n",
-		       __func__, fromp, eaw_rs,
-		       sg_rq_st_str(exp_old_st, false),
-		       sg_rq_st_str(want_st, false));
+		SG_LOG(1, sfp, "%s: %s: %s,%s,%s\n", __func__, fromp, eaw_rs,
+		       sg_rq_st_str(exp_old_st, false), sg_rq_st_str(want_st, false));
 	else
-		pr_info("sg: %s: %s: %s: %d,%d\n", __func__, fromp, eaw_rs,
-			(int)exp_old_st, (int)want_st);
+		pr_info("sg: %s: %s: %s: %d,%d\n", __func__, fromp, eaw_rs, (int)exp_old_st,
+			(int)want_st);
 }
 #endif
 
@@ -2529,8 +2464,7 @@ static const int sg_rq_state_mul2arr[] = {2, 0, 8, 0, 0, 0};
  * function (and others ending in '_ulck') assumes srp_arr xarray spinlock is already held.
  */
 static int
-sg_rq_chg_state_ulck(struct sg_request *srp, enum sg_rq_state old_st,
-		     enum sg_rq_state new_st)
+sg_rq_chg_state_ulck(struct sg_request *srp, enum sg_rq_state old_st, enum sg_rq_state new_st)
 {
 	enum sg_rq_state act_old_st =
 			(enum sg_rq_state)atomic_cmpxchg_relaxed(&srp->rq_st, old_st, new_st);
@@ -2538,8 +2472,8 @@ sg_rq_chg_state_ulck(struct sg_request *srp, enum sg_rq_state old_st,
 
 	if (unlikely(act_old_st != old_st)) {
 #if IS_ENABLED(SG_LOG_ACTIVE)
-		SG_LOG(1, srp->parentfp, "%s: unexpected old state: %s\n",
-		       __func__, sg_rq_st_str(act_old_st, false));
+		SG_LOG(1, srp->parentfp, "%s: unexpected old state: %s\n", __func__,
+		       sg_rq_st_str(act_old_st, false));
 #endif
 		return -EPROTOTYPE;	/* only used for this error type */
 	}
@@ -2561,8 +2495,7 @@ sg_rq_chg_state_ulck(struct sg_request *srp, enum sg_rq_state old_st,
 
 /* Similar to sg_rq_chg_state_ulck() but uses the xarray spinlock */
 static int
-sg_rq_chg_state(struct sg_request *srp, enum sg_rq_state old_st,
-		enum sg_rq_state new_st)
+sg_rq_chg_state(struct sg_request *srp, enum sg_rq_state old_st, enum sg_rq_state new_st)
 {
 	enum sg_rq_state act_old_st;
 	int indic = sg_rq_state_arr[(int)old_st] + sg_rq_state_mul2arr[(int)new_st];
@@ -2601,8 +2534,8 @@ sg_rq_chg_state(struct sg_request *srp, enum sg_rq_state old_st,
 }
 
 /*
- * Returns index of an unused element in sfp's rsv_arr, or -1 if it is full.
- * Marks that element's rsv_srp with ERR_PTR(-EBUSY) to reserve that index.
+ * Returns index of an unused element in sfp's rsv_arr, or -1 if it is full. Marks that element's
+ * rsv_srp with ERR_PTR(-EBUSY) to reserve that index.
  */
 static int
 sg_get_idx_new(struct sg_fd *sfp)
@@ -2632,9 +2565,8 @@ sg_get_idx_new_lck(struct sg_fd *sfp)
 }
 
 /*
- * Looks for an available element index in sfp's rsv_arr. That element's
- * sh_srp must be NULL and will be set to ERR_PTR(-EBUSY). If no element
- * is available then returns -1.
+ * Looks for an available element index in sfp's rsv_arr. That element's sh_srp must be NULL and
+ * will be set to ERR_PTR(-EBUSY). If no element is available then returns -1.
  */
 static int
 sg_get_idx_available(struct sg_fd *sfp)
@@ -2684,13 +2616,11 @@ sg_get_probable_read_side(struct sg_fd *sfp)
 }
 
 /*
- * Returns string of the form: <leadin>rsv<num><leadout> if srp is one of
- * the reserve requests. Otherwise a blank string of length <leadin> plus
- * length of <leadout> is returned.
+ * Returns string of the form: <leadin>rsv<num><leadout> if srp is one of the reserve requests.
+ * Otherwise a blank string of length <leadin> plus length of <leadout> is returned.
  */
 static const char *
-sg_get_rsv_str(struct sg_request *srp, const char *leadin,
-	       const char *leadout, int b_len, char *b)
+sg_get_rsv_str(struct sg_request *srp, const char *leadin, const char *leadout, int b_len, char *b)
 {
 	int k, i_len, o_len, len;
 	struct sg_fd *sfp;
@@ -2727,8 +2657,8 @@ sg_get_rsv_str(struct sg_request *srp, const char *leadin,
 }
 
 static inline const char *
-sg_get_rsv_str_lck(struct sg_request *srp, const char *leadin,
-		   const char *leadout, int b_len, char *b)
+sg_get_rsv_str_lck(struct sg_request *srp, const char *leadin, const char *leadout, int b_len,
+		   char *b)
 {
 	unsigned long iflags;
 	const char *cp;
@@ -2777,11 +2707,10 @@ sg_execute_cmd(struct sg_fd *sfp, struct sg_request *srp)
 }
 
 /*
- * All writes and submits converge on this function to launch the SCSI
- * command/request (via blk_execute_rq_nowait). Returns a pointer to a
- * sg_request object holding the request just issued or a negated errno
- * value twisted by ERR_PTR.
- * N.B. pack_id placed in sg_io_v4::request_extra field.
+ * All writes and submits converge on this function to launch the SCSI command/request (via
+ * blk_execute_rq_nowait). Returns a pointer to a sg_request object holding the request just issued
+ * or a negated errno value wrapped by ERR_PTR. N.B. pack_id is placed in the
+ * sg_io_v4::request_extra field.
  */
 static struct sg_request *
 sg_common_write(struct sg_comm_wr_t *cwrp)
@@ -2856,8 +2785,8 @@ sg_common_write(struct sg_comm_wr_t *cwrp)
 	res = sg_start_req(srp, cwrp, dir);
 	if (unlikely(res < 0))	/* probably out of space --> -ENOMEM */
 		goto err_out;
-	SG_LOG(4, fp, "%s: opcode=0x%02x, cdb_sz=%d, pack_id=%d\n", __func__,
-	       srp->cmd_opcode, cwrp->cmd_len, pack_id);
+	SG_LOG(4, fp, "%s: opcode=0x%02x, cdb_sz=%d, pack_id=%d\n", __func__, srp->cmd_opcode,
+	       cwrp->cmd_len, pack_id);
 	if (SG_IS_DETACHING(sdp)) {
 		res = -ENODEV;
 		goto err_out;
@@ -2871,21 +2800,20 @@ sg_common_write(struct sg_comm_wr_t *cwrp)
 }
 
 /*
- * ***********************************************************************
- * read(2) related functions follow. They are shown after write(2) related
- * functions. Apart from read(2) itself, ioctl(SG_IORECEIVE) and the second
- * half of the ioctl(SG_IO) share code with read(2).
- * ***********************************************************************
+ * *********************************************************************************************
+ * read(2) related functions follow. They are shown after write(2) related functions. Apart from
+ * read(2) itself, ioctl(SG_IORECEIVE) and the second half of the ioctl(SG_IO) share code with
+ * read(2).
+ * *********************************************************************************************
  */
 
 /*
- * This function is called by wait_event_interruptible in sg_read() and
- * sg_ctl_ioreceive(). wait_event_interruptible will return if this one
- * returns true (or an event like a signal (e.g. control-C) occurs).
+ * This function is called by wait_event_interruptible in sg_read() and sg_ctl_ioreceive().
+ * wait_event_interruptible will return if this function returns true, or if a signal (e.g.
+ * control-C) is received.
  */
 static inline bool
-sg_get_ready_srp(struct sg_fd *sfp, struct sg_request **srpp, int id,
-		 bool is_tag)
+sg_get_ready_srp(struct sg_fd *sfp, struct sg_request **srpp, int id, bool is_tag)
 {
 	struct sg_request *srp;
 
@@ -2898,10 +2826,7 @@ sg_get_ready_srp(struct sg_fd *sfp, struct sg_request **srpp, int id,
 	return !!srp;
 }
 
-/*
- * Returns number of bytes copied to user space provided sense buffer or
- * negated errno value.
- */
+/* Returns number of bytes copied to user space provided sense buffer or negated errno value. */
 static int
 sg_copy_sense(struct sg_request *srp, bool v4_active)
 {
@@ -2971,17 +2896,15 @@ sg_rec_state_v3v4(struct sg_fd *sfp, struct sg_request *srp, bool v4_active)
 		case SG_RQ_SHR_SWAP:
 			if (!(srp->rq_flags & SGV4_FLAG_KEEP_SHARE))
 				goto set_inactive;
-			SG_LOG(6, sfp, "%s: hold onto %s share\n",
-			       __func__, sg_get_rsv_str(rs_srp, "", "",
-							sizeof(b), b));
+			SG_LOG(6, sfp, "%s: hold onto %s share\n", __func__,
+			       sg_get_rsv_str(rs_srp, "", "", sizeof(b), b));
 			break;
 		case SG_RQ_SHR_IN_WS:
 			if (!(srp->rq_flags & SGV4_FLAG_KEEP_SHARE))
 				goto set_inactive;
 			err = sg_rq_chg_state(rs_srp, rs_st, SG_RQ_SHR_SWAP);
-			SG_LOG(6, sfp, "%s: hold onto %s share\n",
-			       __func__, sg_get_rsv_str(rs_srp, "", "",
-							sizeof(b), b));
+			SG_LOG(6, sfp, "%s: hold onto %s share\n", __func__,
+			       sg_get_rsv_str(rs_srp, "", "", sizeof(b), b));
 			break;
 		case SG_RQ_AWAIT_RCV:
 			break;
@@ -2991,9 +2914,8 @@ sg_rec_state_v3v4(struct sg_fd *sfp, struct sg_request *srp, bool v4_active)
 			break;
 		default:
 			err = -EPROTO;	/* Logic error */
-			SG_LOG(1, sfp,
-			       "%s: SHR_WS_RQ, bad read-side state: %s\n",
-			       __func__, sg_rq_st_str(rs_st, true));
+			SG_LOG(1, sfp, "%s: SHR_WS_RQ, bad read-side state: %s\n", __func__,
+			       sg_rq_st_str(rs_st, true));
 			break;	/* nothing to do */
 		}
 	}
@@ -3029,15 +2951,13 @@ sg_complete_shr_rs(struct sg_fd *sfp, struct sg_request *srp, bool other_err,
 		sg_rq_chg_state_force(srp, SG_RQ_SHR_SWAP);
 	}
 	if (ws_sfp && !srp->sh_srp) {
-		if (ws_sfp->async_qp &&
-		    (!SG_IS_V4I(srp) || (srp->rq_flags & SGV4_FLAG_SIGNAL)))
+		if (ws_sfp->async_qp && (!SG_IS_V4I(srp) || (srp->rq_flags & SGV4_FLAG_SIGNAL)))
 			kill_fasync(&ws_sfp->async_qp, SIGPOLL, poll_type);
 		if (ws_sfp->efd_ctxp && (srp->rq_flags & SGV4_FLAG_EVENTFD)) {
 			u64 n = eventfd_signal(ws_sfp->efd_ctxp, 1);
 
 			if (n != 1)
-				pr_info("%s: srp=%pK eventfd prob\n",
-					__func__, srp);
+				pr_info("%s: srp=%pK eventfd problem\n", __func__, srp);
 		}
 	}
 }
@@ -3048,8 +2968,7 @@ sg_complete_v3v4(struct sg_fd *sfp, struct sg_request *srp, bool other_err)
 	enum sg_rq_state sr_st = atomic_read(&srp->rq_st);
 
 	/* advance state machine, send signal to write-side if appropriate */
-	SG_LOG(4, sfp, "%s: %pK: sh_var=%s\n", __func__, srp,
-	       sg_shr_str(srp->sh_var, true));
+	SG_LOG(4, sfp, "%s: %pK: sh_var=%s\n", __func__, srp, sg_shr_str(srp->sh_var, true));
 	switch (srp->sh_var) {
 	case SG_SHR_RS_RQ:
 		sg_complete_shr_rs(sfp, srp, other_err, sr_st);
@@ -3062,8 +2981,7 @@ sg_complete_v3v4(struct sg_fd *sfp, struct sg_request *srp, bool other_err)
 				rs_srp->sh_srp = NULL;
 				rs_srp->sh_var = SG_SHR_RS_NOT_SRQ;
 			} else {
-				SG_LOG(2, sfp, "%s: write-side's paired read is missing\n",
-				       __func__);
+				SG_LOG(2, sfp, "%s: paired read is missing\n", __func__);
 			}
 		}
 		srp->sh_var = SG_SHR_WS_NOT_SRQ;
@@ -3081,14 +2999,13 @@ sg_complete_v3v4(struct sg_fd *sfp, struct sg_request *srp, bool other_err)
 }
 
 static int
-sg_receive_v4(struct sg_fd *sfp, struct sg_request *srp, void __user *p,
-	      struct sg_io_v4 *h4p)
+sg_receive_v4(struct sg_fd *sfp, struct sg_request *srp, void __user *p, struct sg_io_v4 *h4p)
 {
 	int err;
 	u32 rq_result = srp->rq_result;
 
-	SG_LOG(3, sfp, "%s: p=%s, h4p=%s\n", __func__,
-	       (p ? "given" : "NULL"), (h4p ? "given" : "NULL"));
+	SG_LOG(3, sfp, "%s: p=%s, h4p=%s\n", __func__, (p ? "given" : "NULL"),
+	       (h4p ? "given" : "NULL"));
 	err = sg_rec_state_v3v4(sfp, srp, true);
 	h4p->guard = 'Q';
 	h4p->protocol = 0;
@@ -3128,14 +3045,12 @@ sg_receive_v4(struct sg_fd *sfp, struct sg_request *srp, void __user *p,
 }
 
 /*
- * Invoked when user calls ioctl(SG_IORECEIVE, SGV4_FLAG_MULTIPLE_REQS).
- * Returns negative on error including -ENODATA if there are no mrqs submitted
- * nor waiting. Otherwise it returns the number of elements written to
- * rsp_arr, which may be 0 if mrqs submitted but none waiting
+ * Invoked when user calls ioctl(SG_IORECEIVE, SGV4_FLAG_MULTIPLE_REQS). Returns negative on error
+ * including -ENODATA if no mrqs are submitted or waiting. Otherwise returns the number of elements
+ * written to rsp_arr, which may be 0 if mrqs have been submitted but none are waiting.
  */
 static int
-sg_mrq_iorec_complets(struct sg_fd *sfp, bool non_block, int max_mrqs,
-		      struct sg_io_v4 *rsp_arr)
+sg_mrq_iorec_complets(struct sg_fd *sfp, bool non_block, int max_mrqs, struct sg_io_v4 *rsp_arr)
 {
 	int k;
 	int res = 0;
@@ -3170,14 +3085,12 @@ sg_mrq_iorec_complets(struct sg_fd *sfp, bool non_block, int max_mrqs,
 }
 
 /*
- * Invoked when user calls ioctl(SG_IORECEIVE, SGV4_FLAG_MULTIPLE_REQS).
- * Expected race as many concurrent calls with the same pack_id/tag can
- * occur. Only one should succeed per request (more may succeed but will get
- * different requests).
+ * Invoked when user calls ioctl(SG_IORECEIVE, SGV4_FLAG_MULTIPLE_REQS). A race is expected since
+ * many concurrent calls with the same pack_id/tag can occur. Only one should succeed per request
+ * (more may succeed but they will get different requests).
  */
 static int
-sg_mrq_ioreceive(struct sg_fd *sfp, struct sg_io_v4 *cop, void __user *p,
-		 bool non_block)
+sg_mrq_ioreceive(struct sg_fd *sfp, struct sg_io_v4 *cop, void __user *p, bool non_block)
 {
 	int res = 0;
 	u32 len, n;
@@ -3192,8 +3105,7 @@ sg_mrq_ioreceive(struct sg_fd *sfp, struct sg_io_v4 *cop, void __user *p,
 		return -ERANGE;
 	n /= SZ_SG_IO_V4;
 	len = n * SZ_SG_IO_V4;
-	SG_LOG(3, sfp, "%s: %s, num_reqs=%u\n", __func__,
-	       (non_block ? "IMMED" : "blocking"), n);
+	SG_LOG(3, sfp, "%s: %s, num_reqs=%u\n", __func__, (non_block ? "IMMED" : "blocking"), n);
 	rsp_v4_arr = kcalloc(n, SZ_SG_IO_V4, GFP_KERNEL);
 	if (unlikely(!rsp_v4_arr))
 		return -ENOMEM;
@@ -3230,11 +3142,8 @@ sg_wait_poll_by_id(struct sg_fd *sfp, struct sg_request **srpp, int id,
 
 	if (test_bit(SG_FFD_EXCL_WAITQ, sfp->ffd_bm))
 		return __wait_event_interruptible_exclusive
-				(sfp->cmpl_wait,
-				 sg_get_ready_srp(sfp, srpp, id, is_tag));
-	return __wait_event_interruptible
-			(sfp->cmpl_wait,
-			 sg_get_ready_srp(sfp, srpp, id, is_tag));
+					(sfp->cmpl_wait, sg_get_ready_srp(sfp, srpp, id, is_tag));
+	return __wait_event_interruptible(sfp->cmpl_wait, sg_get_ready_srp(sfp, srpp, id, is_tag));
 poll_loop:
 	{
 		bool sig_pending = false;
@@ -3260,10 +3169,9 @@ sg_wait_poll_by_id(struct sg_fd *sfp, struct sg_request **srpp, int id,
 }
 
 /*
- * Called when ioctl(SG_IORECEIVE) received. Expects a v4 interface object.
- * Checks if O_NONBLOCK file flag given, if not checks given 'flags' field
- * to see if SGV4_FLAG_IMMED is set. Either of these implies non blocking.
- * When non-blocking and there is no request waiting, yields EAGAIN;
+ * Called when ioctl(SG_IORECEIVE) is received. Expects a v4 interface object. Checks whether the
+ * O_NONBLOCK file flag is set; if not, checks the given 'flags' field for SGV4_FLAG_IMMED. Either
+ * of these implies non-blocking. When non-blocking and there is no request waiting, yields EAGAIN;
  * otherwise it waits (i.e. it "blocks").
  */
 static int
@@ -3286,8 +3194,7 @@ sg_ctl_ioreceive(struct sg_fd *sfp, void __user *p)
 	if (copy_from_user(h4p, p, SZ_SG_IO_V4))
 		return -EFAULT;
 	/* for v4: protocol=0 --> SCSI;  subprotocol=0 --> SPC++ */
-	if (unlikely(h4p->guard != 'Q' || h4p->protocol != 0 ||
-		     h4p->subprotocol != 0))
+	if (unlikely(h4p->guard != 'Q' || h4p->protocol != 0 || h4p->subprotocol != 0))
 		return -EPERM;
 	SG_LOG(3, sfp, "%s: non_block=%d, immed=%d, hipri=%d\n", __func__, non_block,
 	       !!(h4p->flags & SGV4_FLAG_IMMED), !!(h4p->flags & SGV4_FLAG_HIPRI));
@@ -3325,10 +3232,9 @@ sg_ctl_ioreceive(struct sg_fd *sfp, void __user *p)
 }
 
 /*
- * Called when ioctl(SG_IORECEIVE_V3) received. Expects a v3 interface.
- * Checks if O_NONBLOCK file flag given, if not checks given flags field
- * to see if SGV4_FLAG_IMMED is set. Either of these implies non blocking.
- * When non-blocking and there is no request waiting, yields EAGAIN;
+ * Called when ioctl(SG_IORECEIVE_V3) is received. Expects a v3 interface object. Checks whether
+ * the O_NONBLOCK file flag is set; if not, checks the flags field for SGV4_FLAG_IMMED. Either of
+ * these implies non-blocking. When non-blocking and no request is waiting, yields EAGAIN;
  * otherwise it waits.
  */
 static int
@@ -3380,8 +3286,7 @@ sg_ctl_ioreceive_v3(struct sg_fd *sfp, void __user *p)
 }
 
 static int
-sg_read_v1v2(void __user *buf, int count, struct sg_fd *sfp,
-	     struct sg_request *srp)
+sg_read_v1v2(void __user *buf, int count, struct sg_fd *sfp, struct sg_request *srp)
 {
 	int res = 0;
 	u32 rq_res = srp->rq_result;
@@ -3399,22 +3304,19 @@ sg_read_v1v2(void __user *buf, int count, struct sg_fd *sfp,
 	h2p->target_status = status_byte(rq_res);
 	h2p->host_status = host_byte(rq_res);
 	h2p->driver_status = driver_byte(rq_res);
-	if (unlikely(!scsi_status_is_good(rq_res) ||
-		     (driver_byte(rq_res) & DRIVER_SENSE))) {
+	if (unlikely(!scsi_status_is_good(rq_res) || (driver_byte(rq_res) & DRIVER_SENSE))) {
 		if (likely(srp->sense_bp)) {
 			u8 *sbp = srp->sense_bp;
 
 			srp->sense_bp = NULL;
-			memcpy(h2p->sense_buffer, sbp,
-			       sizeof(h2p->sense_buffer));
+			memcpy(h2p->sense_buffer, sbp, sizeof(h2p->sense_buffer));
 			mempool_free(sbp, sg_sense_pool);
 		}
 	}
 	switch (unlikely(host_byte(rq_res))) {
 	/*
-	 * This following setting of 'result' is for backward compatibility
-	 * and is best ignored by the user who should use target, host and
-	 * driver status.
+	 * The following setting of 'result' is for backward compatibility and is best ignored by
+	 * the user, who should use the target, host and driver status instead.
 	 */
 	case DID_OK:
 	case DID_PASSTHROUGH:
@@ -3465,10 +3367,9 @@ sg_read_v1v2(void __user *buf, int count, struct sg_fd *sfp,
 }
 
 /*
- * This is the read(2) system call entry point (see sg_fops) for this driver.
- * Accepts v1, v2 or v3 type headers (not v4). Returns count or negated
- * errno; if count is 0 then v3: returns -EINVAL; v1+v2: 0 when no other
- * error detected or -EIO.
+ * This is the read(2) system call entry point (see sg_fops) for this driver. Accepts v1, v2 or
+ * v3 type headers (not v4). Returns count or negated errno; if count is 0 then v3: returns
+ * -EINVAL; v1+v2: 0 when no other error detected or -EIO.
  */
 static ssize_t
 sg_read(struct file *filp, char __user *p, size_t count, loff_t *ppos)
@@ -3484,8 +3385,8 @@ sg_read(struct file *filp, char __user *p, size_t count, loff_t *ppos)
 	struct sg_io_hdr a_sg_io_hdr;
 
 	/*
-	 * This could cause a response to be stranded. Close the associated
-	 * file descriptor to free up any resources being held.
+	 * This could cause a response to be stranded. Close the associated file descriptor to
+	 * free up any resources being held.
 	 */
 	ret = sg_check_file_access(filp, __func__);
 	if (unlikely(ret))
@@ -3504,9 +3405,8 @@ sg_read(struct file *filp, char __user *p, size_t count, loff_t *ppos)
 
 	if (test_bit(SG_FFD_FORCE_PACKID, sfp->ffd_bm) && (int)count >= hlen) {
 		/*
-		 * Even though this is a user space read() system call, this
-		 * code is cheating to fetch the pack_id.
-		 * Only need first three 32 bit ints to determine interface.
+		 * Even though this is a user space read() system call, this code is cheating to
+		 * fetch the pack_id. Only need first three 32 bit ints to determine interface.
 		 */
 		if (copy_from_user(h2p, p, 3 * sizeof(int)))
 			return -EFAULT;
@@ -3530,8 +3430,7 @@ sg_read(struct file *filp, char __user *p, size_t count, loff_t *ppos)
 						non_block = true;
 				}
 			} else if (v3_hdr->interface_id == 'Q') {
-				pr_info_once("sg: %s: v4 interface%s here\n",
-					     __func__, " disallowed");
+				pr_info_once("sg: %s: v4 interface disallowed here\n", __func__);
 				return -EPERM;
 			} else {
 				return -EPERM;
@@ -3575,9 +3474,8 @@ sg_read(struct file *filp, char __user *p, size_t count, loff_t *ppos)
 }
 
 /*
- * Completes a v3 request/command. Called from sg_read {v2 or v3},
- * ioctl(SG_IO) {for v3}, or from ioctl(SG_IORECEIVE) when its
- * completing a v3 request/command.
+ * Completes a v3 request/command. Called from sg_read {v2 or v3}, ioctl(SG_IO) {for v3}, or from
+ * ioctl(SG_IORECEIVE) when it is completing a v3 request/command.
  */
 static int
 sg_receive_v3(struct sg_fd *sfp, struct sg_request *srp, void __user *p)
@@ -3587,8 +3485,8 @@ sg_receive_v3(struct sg_fd *sfp, struct sg_request *srp, void __user *p)
 	struct sg_io_hdr hdr3;
 	struct sg_io_hdr *hp = &hdr3;
 
-	SG_LOG(3, sfp, "%s: sh_var: %s srp=0x%pK\n", __func__,
-	       sg_shr_str(srp->sh_var, false), srp);
+	SG_LOG(3, sfp, "%s: sh_var: %s srp=0x%pK\n", __func__, sg_shr_str(srp->sh_var, false),
+	       srp);
 	err = sg_rec_state_v3v4(sfp, srp, false);
 	memset(hp, 0, sizeof(*hp));
 	memcpy(hp, &srp->s_hdr3, sizeof(srp->s_hdr3));
@@ -3620,11 +3518,10 @@ max_sectors_bytes(struct request_queue *q)
 }
 
 /*
- * Calculates sg_device::max_sgat_elems and sg_device::max_sgat_sz. It uses
- * the device's request queue. If q not available sets max_sgat_elems to 1
- * and max_sgat_sz to PAGE_SIZE. If potential max_sgat_sz is greater than
- * 2^30 scales down the implied max_segment_size so the product of the
- * max_segment_size and max_sgat_elems is less than or equal to 2^30 .
+ * Calculates sg_device::max_sgat_elems and sg_device::max_sgat_sz. It uses the device's request
+ * queue. If q is not available, sets max_sgat_elems to 1 and max_sgat_sz to PAGE_SIZE. If the
+ * potential max_sgat_sz is greater than 2^30, scales down the implied max_segment_size so the
+ * product of the max_segment_size and max_sgat_elems is less than or equal to 2^30.
  */
 static void
 sg_calc_sgat_param(struct sg_device *sdp)
@@ -3759,7 +3656,7 @@ sg_unshare_rs_fd(struct sg_fd *rs_sfp, bool lck)
 	if (lck)
 		xa_unlock_irqrestore(xadp, iflags);
 	rcu_assign_pointer(rs_sfp->share_sfp, NULL);
-	kref_put(&rs_sfp->f_ref, sg_remove_sfp);/* get: sg_find_sfp_by_fd() */
+	kref_put(&rs_sfp->f_ref, sg_remove_sfp);	/* get: sg_find_sfp_by_fd() */
 }
 
 static void
@@ -3780,11 +3677,10 @@ sg_unshare_ws_fd(struct sg_fd *ws_sfp, bool lck)
 }
 
 /*
- * Clean up loose ends that occur when closing a file descriptor which is
- * part of a file share. There may be request shares in various states using
- * this file share so care is needed. Potential race when both sides of fd
- * share have their fd_s closed (i.e. sg_release()) at around the same time
- * is the reason for rechecking the FD_RS_SHARE or FD_UNSHARED marks.
+ * Clean up loose ends that occur when closing a file descriptor which is part of a file share.
+ * There may be request shares in various states using this file share so care is needed. Potential
+ * race when both sides of fd share have their fd_s closed (i.e. sg_release()) at around the same
+ * time is the reason for rechecking the FD_RS_SHARE or FD_UNSHARED marks.
  */
 static void
 sg_remove_sfp_share(struct sg_fd *sfp, bool is_rd_side)
@@ -3803,8 +3699,7 @@ sg_remove_sfp_share(struct sg_fd *sfp, bool is_rd_side)
 	struct xarray *xadp = &sdp->sfp_arr;
 	struct xarray *xafp = &sfp->srp_arr;
 
-	SG_LOG(3, sfp, "%s: sfp=%pK %s\n", __func__, sfp,
-	       (is_rd_side ? "read-side" : "write-side"));
+	SG_LOG(3, sfp, "%s: sfp=%pK %s-side\n", __func__, sfp, (is_rd_side ? "read" : "write"));
 	xa_lock_irqsave(xadp, iflags);
 	retry_count = 0;
 try_again:
@@ -3816,8 +3711,8 @@ sg_remove_sfp_share(struct sg_fd *sfp, bool is_rd_side)
 	sh_sdp = sh_sfp->parentdp;
 	if (!xa_trylock(xafp)) {
 		/*
-		 * The other side of the share might be closing as well, avoid
-		 * deadlock. Should clear relatively quickly.
+		 * The other side of the share might be closing as well, avoid deadlock. Should
+		 * clear relatively quickly.
 		 */
 		xa_unlock_irqrestore(xadp, iflags);
 		if (++retry_count > SG_ADD_RQ_MAX_RETRIES) {
@@ -3837,8 +3732,7 @@ sg_remove_sfp_share(struct sg_fd *sfp, bool is_rd_side)
 			bool set_inactive = false;
 
 			rsv_srp = *rapp;
-			if (IS_ERR_OR_NULL(rsv_srp) ||
-			    rsv_srp->sh_var != SG_SHR_RS_RQ)
+			if (IS_ERR_OR_NULL(rsv_srp) || rsv_srp->sh_var != SG_SHR_RS_RQ)
 				continue;
 			sr_st = atomic_read_acquire(&rsv_srp->rq_st);
 			switch (sr_st) {
@@ -3847,11 +3741,9 @@ sg_remove_sfp_share(struct sg_fd *sfp, bool is_rd_side)
 				break;
 			case SG_RQ_SHR_IN_WS:
 				ws_srp = rsv_srp->sh_srp;
-				if (!IS_ERR_OR_NULL(ws_srp) &&
-				    !test_bit(SG_FFD_RELEASE,
-					      sh_sfp->ffd_bm)) {
+				if (!IS_ERR_OR_NULL(ws_srp) &&
+				    !test_bit(SG_FFD_RELEASE, sh_sfp->ffd_bm))
 					ws_srp->sh_var = SG_SHR_WS_NOT_SRQ;
-				}
 				rsv_srp->sh_srp = NULL;
 				set_inactive = true;
 				break;
@@ -3865,13 +3757,13 @@ sg_remove_sfp_share(struct sg_fd *sfp, bool is_rd_side)
 					atomic_inc(&sfp->inactives);
 			}
 		}
-		if (!xa_get_mark(&sh_sdp->sfp_arr, sh_sfp->idx,
-				 SG_XA_FD_FREE) && sg_fd_is_shared(sh_sfp))
+		if (!xa_get_mark(&sh_sdp->sfp_arr, sh_sfp->idx, SG_XA_FD_FREE) &&
+		    sg_fd_is_shared(sh_sfp))
 			sg_unshare_ws_fd(sh_sfp, sdp != sh_sdp);
 		sg_unshare_rs_fd(sfp, false);
 	} else {			/* is write-side of share */
-		if (!xa_get_mark(&sh_sdp->sfp_arr, sh_sfp->idx,
-				 SG_XA_FD_FREE) && sg_fd_is_shared(sh_sfp))
+		if (!xa_get_mark(&sh_sdp->sfp_arr, sh_sfp->idx, SG_XA_FD_FREE) &&
+		    sg_fd_is_shared(sh_sfp))
 			sg_unshare_rs_fd(sh_sfp, sdp != sh_sdp);
 		sg_unshare_ws_fd(sfp, false);
 	}
@@ -3881,9 +3773,8 @@ sg_remove_sfp_share(struct sg_fd *sfp, bool is_rd_side)
 }
 
 /*
- * Active when writing 1 to ioctl(SG_SET_GET_EXTENDED(CTL_FLAGS(UNSHARE))),
- * writing 0 has no effect. Undoes the configuration that has done by
- * ioctl(SG_SET_GET_EXTENDED(SHARE_FD)).
+ * Active when writing 1 to ioctl(SG_SET_GET_EXTENDED(CTL_FLAGS(UNSHARE))); writing 0 has no
+ * effect. Undoes the configuration that was done by ioctl(SG_SET_GET_EXTENDED(SHARE_FD)).
  */
 static void
 sg_do_unshare(struct sg_fd *sfp, bool unshare_val)
@@ -3912,13 +3803,10 @@ sg_do_unshare(struct sg_fd *sfp, bool unshare_val)
 		rs_sfp = sfp;
 		ws_sfp = o_sfp;
 		rs_rsv_srp = rs_sfp->rsv_arr[0];
-		if (!IS_ERR_OR_NULL(rs_rsv_srp) &&
-		    rs_rsv_srp->sh_var != SG_SHR_RS_RQ) {
+		if (!IS_ERR_OR_NULL(rs_rsv_srp) && rs_rsv_srp->sh_var != SG_SHR_RS_RQ) {
 			if (unlikely(!mutex_trylock(&ws_sfp->f_mutex))) {
 				if (++retry_count > SG_ADD_RQ_MAX_RETRIES)
-					SG_LOG(1, sfp,
-					       "%s: cannot get write-side lock\n",
-					       __func__);
+					SG_LOG(1, sfp, "%s: cannot get ws lock\n", __func__);
 				else
 					retry = true;
 				goto fini;
@@ -3940,15 +3828,13 @@ sg_do_unshare(struct sg_fd *sfp, bool unshare_val)
 		ws_sfp = sfp;
 		if (unlikely(!mutex_trylock(&rs_sfp->f_mutex))) {
 			if (++retry_count > SG_ADD_RQ_MAX_RETRIES)
-				SG_LOG(1, sfp, "%s: cannot get read side lock\n",
-				       __func__);
+				SG_LOG(1, sfp, "%s: cannot get read side lock\n", __func__);
 			else
 				retry = true;
 			goto fini;
 		}
 		rs_rsv_srp = rs_sfp->rsv_arr[0];
-		if (!IS_ERR_OR_NULL(rs_rsv_srp) &&
-		    rs_rsv_srp->sh_var != SG_SHR_RS_RQ) {
+		if (!IS_ERR_OR_NULL(rs_rsv_srp) && rs_rsv_srp->sh_var != SG_SHR_RS_RQ) {
 			if (same_sdp_s) {
 				xa_lock_irqsave(xadp, iflags);
 				/* read-side is 'other' so do first */
@@ -3970,15 +3856,13 @@ sg_do_unshare(struct sg_fd *sfp, bool unshare_val)
 }
 
 /*
- * Returns duration since srp->start_ns (using boot time as an epoch). Unit
- * is nanoseconds when time_in_ns==true; else it is in milliseconds.
- * For backward compatibility the duration is placed in a 32 bit unsigned
- * integer. This limits the maximum nanosecond duration that can be
- * represented (without wrapping) to about 4.3 seconds. If that is exceeded
- * return equivalent of 3.999.. secs as it is more eye catching than the real
- * number. Negative durations should not be possible but if they occur set
- * duration to an unlikely 2 nanosec. Stalls in a request setup will have
- * ts0==S64_MAX and will return 1 for an unlikely 1 nanosecond duration.
+ * Returns duration since srp->start_ns (using boot time as an epoch). Unit is nanoseconds when
+ * time_in_ns==true; else it is in milliseconds. For backward compatibility the duration is placed
+ * in a 32 bit unsigned integer. This limits the maximum nanosecond duration that can be
+ * represented (without wrapping) to about 4.3 seconds. If that is exceeded, returns the equivalent
+ * of 3.999... secs as it is more eye-catching than the real number. Negative durations should not
+ * be possible but if they occur the duration is set to an unlikely 2 nanoseconds. Stalls in a
+ * request setup will have ts0==S64_MAX and will return 1 for an unlikely 1 nanosecond duration.
  */
 static u32
 sg_calc_rq_dur(const struct sg_request *srp, bool time_in_ns)
@@ -4001,9 +3885,13 @@ sg_calc_rq_dur(const struct sg_request *srp, bool time_in_ns)
 	return (diff > (s64)U32_MAX) ? 3999999999U : (u32)diff;
 }
 
+/*
+ * Returns U32_MAX if srp is inactive. *is_durp is not read on input. If is_durp is non-NULL,
+ * *is_durp is set to true on output when the duration calculation has completed, or false if it
+ * is ongoing.
+ */
 static u32
-sg_get_dur(struct sg_request *srp, const enum sg_rq_state *sr_stp,
-	   bool time_in_ns, bool *is_durp)
+sg_get_dur(struct sg_request *srp, const enum sg_rq_state *sr_stp, bool time_in_ns, bool *is_durp)
 {
 	bool is_dur = false;
 	u32 res = U32_MAX;
@@ -4029,30 +3917,23 @@ sg_get_dur(struct sg_request *srp, const enum sg_rq_state *sr_stp,
 }
 
 static void
-sg_fill_request_element(struct sg_fd *sfp, struct sg_request *srp,
-			struct sg_req_info *rip)
+sg_fill_request_element(struct sg_fd *sfp, struct sg_request *srp, struct sg_req_info *rip)
 {
 	unsigned long iflags;
 
 	xa_lock_irqsave(&sfp->srp_arr, iflags);
-	rip->duration = sg_get_dur(srp, NULL, test_bit(SG_FFD_TIME_IN_NS,
-						       sfp->ffd_bm), NULL);
+	rip->duration = sg_get_dur(srp, NULL, test_bit(SG_FFD_TIME_IN_NS, sfp->ffd_bm), NULL);
 	if (rip->duration == U32_MAX)
 		rip->duration = 0;
 	rip->orphan = test_bit(SG_FRQ_IS_ORPHAN, srp->frq_bm);
 	rip->sg_io_owned = test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm);
 	rip->problem = !sg_result_is_good(srp->rq_result);
-	rip->pack_id = test_bit(SG_FFD_PREFER_TAG, sfp->ffd_bm) ?
-				srp->tag : srp->pack_id;
-	rip->usr_ptr = SG_IS_V4I(srp) ? uptr64(srp->s_hdr4.usr_ptr)
-				      : srp->s_hdr3.usr_ptr;
+	rip->pack_id = test_bit(SG_FFD_PREFER_TAG, sfp->ffd_bm) ? srp->tag : srp->pack_id;
+	rip->usr_ptr = SG_IS_V4I(srp) ? uptr64(srp->s_hdr4.usr_ptr) : srp->s_hdr3.usr_ptr;
 	xa_unlock_irqrestore(&sfp->srp_arr, iflags);
 }
 
-/*
- * Handles ioctl(SG_IO) for blocking (sync) usage of v3 or v4 interface.
- * Returns 0 on success else a negated errno.
- */
+/* Handles ioctl(SG_IO) for blocking (sync) usage of v3 or v4 interface. */
 static int
 sg_ctl_sg_io(struct sg_device *sdp, struct sg_fd *sfp, void __user *p)
 {
@@ -4070,8 +3951,7 @@ sg_ctl_sg_io(struct sg_device *sdp, struct sg_fd *sfp, void __user *p)
 		return res;
 	if (unlikely(get_sg_io_hdr(h3p, p)))
 		return -EFAULT;
-	if (h3p->interface_id == 'Q') {
-		/* copy in rest of sg_io_v4 object */
+	if (h3p->interface_id == 'Q') {	/* copy in rest of sg_io_v4 object */
 		int v3_len;
 
 #ifdef CONFIG_COMPAT
@@ -4082,8 +3962,7 @@ sg_ctl_sg_io(struct sg_device *sdp, struct sg_fd *sfp, void __user *p)
 #else
 		v3_len = SZ_SG_IO_HDR;
 #endif
-		if (copy_from_user(hu8arr + v3_len,
-				   ((u8 __user *)p) + v3_len,
+		if (copy_from_user(hu8arr + v3_len, ((u8 __user *)p) + v3_len,
 				   SZ_SG_IO_V4 - v3_len))
 			return -EFAULT;
 		is_v4 = true;
@@ -4094,8 +3973,7 @@ sg_ctl_sg_io(struct sg_device *sdp, struct sg_fd *sfp, void __user *p)
 		hipri = !!(h3p->flags & SGV4_FLAG_HIPRI);
 		res = sg_submit_v3(sfp, h3p, true, &srp);
 	} else {
-		pr_info_once("sg: %s: v3 or v4 interface only here\n",
-			     __func__);
+		pr_info_once("sg: %s: v3 or v4 interface only here\n", __func__);
 		return -EPERM;
 	}
 	if (unlikely(res < 0))
@@ -4105,9 +3983,8 @@ sg_ctl_sg_io(struct sg_device *sdp, struct sg_fd *sfp, void __user *p)
 	res = sg_wait_poll_for_given_srp(sfp, srp, hipri);
 #if IS_ENABLED(SG_LOG_ACTIVE)
 	if (unlikely(res))
-		SG_LOG(1, sfp, "%s: %s=0x%pK  state: %s, share: %s\n",
-		       __func__, "unexpected srp", srp,
-		       sg_rq_st_str(atomic_read(&srp->rq_st), false),
+		SG_LOG(1, sfp, "%s: unexpected srp=0x%pK  state: %s, share: %s\n", __func__,
+		       srp, sg_rq_st_str(atomic_read(&srp->rq_st), false),
 		       sg_shr_str(srp->sh_var, false));
 #endif
 	if (likely(res == 0)) {
@@ -4120,8 +3997,8 @@ sg_ctl_sg_io(struct sg_device *sdp, struct sg_fd *sfp, void __user *p)
 }
 
 /*
- * When use_tag is true then id is a tag, else it is a pack_id. Returns
- * valid srp if match, else returns NULL.
+ * When use_tag is true then id is a tag, else it is a pack_id. Returns valid srp if match, else
+ * returns NULL.
  */
 static struct sg_request *
 sg_match_request(struct sg_fd *sfp, bool use_tag, int id)
@@ -4149,14 +4026,12 @@ sg_match_request(struct sg_fd *sfp, bool use_tag, int id)
 }
 
 /*
- * Looks for first request following 'after_rp' (or the start if after_rp is
- * NULL) whose pack_id_of_mrq matches the given pack_id. If after_rp is
- * non-NULL and it is not found, then the search restarts from the beginning
- * of the list. If no match is found then NULL is returned.
+ * Looks for first request following 'after_rp' (or the start if after_rp is NULL) whose
+ * pack_id_of_mrq matches the given pack_id. If after_rp is non-NULL and it is not found, then the
+ * search restarts from the beginning of the list. If no match is found then NULL is returned.
  */
 static struct sg_request *
-sg_match_first_mrq_after(struct sg_fd *sfp, int pack_id,
-			 struct sg_request *after_rp)
+sg_match_first_mrq_after(struct sg_fd *sfp, int pack_id, struct sg_request *after_rp)
 {
 	bool found = false;
 	bool look_for_after = after_rp ? true : false;
@@ -4197,13 +4072,13 @@ sg_abort_req(struct sg_fd *sfp, struct sg_request *srp)
 	struct request *rqq;
 
 	if (test_and_set_bit(SG_FRQ_ABORTING, srp->frq_bm)) {
-		SG_LOG(1, sfp, "%s: already aborting req pack_id/tag=%d/%d\n",
-		       __func__, srp->pack_id, srp->tag);
+		SG_LOG(1, sfp, "%s: already aborting req pack_id/tag=%d/%d\n", __func__,
+		       srp->pack_id, srp->tag);
 		goto fini;	/* skip quietly if already aborted */
 	}
 	rq_st = atomic_read_acquire(&srp->rq_st);
-	SG_LOG(3, sfp, "%s: req pack_id/tag=%d/%d, status=%s\n", __func__,
-	       srp->pack_id, srp->tag, sg_rq_st_str(rq_st, false));
+	SG_LOG(3, sfp, "%s: req pack_id/tag=%d/%d, status=%s\n", __func__, srp->pack_id, srp->tag,
+	       sg_rq_st_str(rq_st, false));
 	switch (rq_st) {
 	case SG_RQ_BUSY:
 		clear_bit(SG_FRQ_ABORTING, srp->frq_bm);
@@ -4221,8 +4096,7 @@ sg_abort_req(struct sg_fd *sfp, struct sg_request *srp)
 		srp->rq_result |= (DRIVER_SOFT << 24);
 		rqq = READ_ONCE(srp->rqq);
 		if (likely(rqq)) {
-			SG_LOG(5, sfp, "%s: -->blk_abort_request srp=0x%pK\n",
-			       __func__, srp);
+			SG_LOG(5, sfp, "%s: -->blk_abort_request srp=0x%pK\n", __func__, srp);
 			blk_abort_request(rqq);
 		}
 		break;
@@ -4259,12 +4133,11 @@ sg_mrq_abort_inflight(struct sg_fd *sfp, int pack_id)
 }
 
 /*
- * Implements ioctl(SG_IOABORT) when SGV4_FLAG_MULTIPLE_REQS set. pack_id is
- * non-zero and is from the request_extra field. dev_scope is set when
- * SGV4_FLAG_DEV_SCOPE is given; in that case there is one level of recursion
- * if there is no match or clash with given sfp. Will abort the first
- * mrq that matches then exit. Can only do mrq abort if the mrq submission
- * used a non-zero ctl_obj.request_extra (pack_id).
+ * Implements ioctl(SG_IOABORT) when SGV4_FLAG_MULTIPLE_REQS set. pack_id is non-zero and is from
+ * the request_extra field. dev_scope is set when SGV4_FLAG_DEV_SCOPE is given; in that case there
+ * is one level of recursion if there is no match or clash with given sfp. Will abort the first
+ * mrq that matches then exit. Can only do mrq abort if the mrq submission used a non-zero
+ * ctl_obj.request_extra (pack_id).
  */
 static int
 sg_mrq_abort(struct sg_fd *sfp, int pack_id, bool dev_scope)
@@ -4278,30 +4151,27 @@ sg_mrq_abort(struct sg_fd *sfp, int pack_id, bool dev_scope)
 	struct sg_fd *s_sfp;
 
 	if (pack_id != SG_PACK_ID_WILDCARD)
-		SG_LOG(3, sfp, "%s: pack_id=%d, dev_scope=%s\n", __func__,
-		       pack_id, (dev_scope ? "true" : "false"));
+		SG_LOG(3, sfp, "%s: pack_id=%d, dev_scope=%s\n", __func__, pack_id,
+		       (dev_scope ? "true" : "false"));
 	existing_id = atomic_read(&sfp->mrq_id_abort);
 	if (existing_id == 0) {
 		if (dev_scope)
 			goto check_whole_dev;
-		SG_LOG(1, sfp, "%s: sfp->mrq_id_abort is 0, nothing to do\n",
-		       __func__);
+		SG_LOG(1, sfp, "%s: sfp->mrq_id_abort is 0, nothing to do\n", __func__);
 		return -EADDRNOTAVAIL;
 	}
 	if (pack_id == SG_PACK_ID_WILDCARD) {
 		pack_id = existing_id;
-		SG_LOG(3, sfp, "%s: wildcard becomes pack_id=%d\n", __func__,
-		       pack_id);
+		SG_LOG(3, sfp, "%s: wildcard becomes pack_id=%d\n", __func__, pack_id);
 	} else if (pack_id != existing_id) {
 		if (dev_scope)
 			goto check_whole_dev;
-		SG_LOG(1, sfp, "%s: want id=%d, got sfp->mrq_id_abort=%d\n",
-		       __func__, pack_id, existing_id);
+		SG_LOG(1, sfp, "%s: want id=%d, got sfp->mrq_id_abort=%d\n", __func__, pack_id,
+		       existing_id);
 		return -EADDRINUSE;
 	}
 	if (test_and_set_bit(SG_FFD_MRQ_ABORT, sfp->ffd_bm))
-		SG_LOG(2, sfp, "%s: repeated SG_IOABORT on mrq_id=%d\n",
-		       __func__, pack_id);
+		SG_LOG(2, sfp, "%s: repeated SG_IOABORT on mrq_id=%d\n", __func__, pack_id);
 
 	/* now look for inflight requests matching that mrq pack_id */
 	xa_lock_irqsave(&sfp->srp_arr, iflags);
@@ -4344,11 +4214,10 @@ sg_mrq_abort(struct sg_fd *sfp, int pack_id, bool dev_scope)
 }
 
 /*
- * Tries to abort an inflight request/command. First it checks the current fd
- * for a match on pack_id or tag. If there is a match, aborts that match.
- * Otherwise, if SGV4_FLAG_DEV_SCOPE is set, the rest of the file descriptors
- * belonging to the current device are similarly checked. If there is no match
- * then -ENODATA is returned.
+ * Tries to abort an inflight request/command. First it checks the current fd for a match on
+ * pack_id or tag. If there is a match, aborts that match. Otherwise, if SGV4_FLAG_DEV_SCOPE is
+ * set, the rest of the file descriptors belonging to the current device are similarly checked.
+ * If there is no match then -ENODATA is returned.
  */
 static int
 sg_ctl_abort(struct sg_device *sdp, struct sg_fd *sfp, void __user *p)
@@ -4403,15 +4272,13 @@ sg_ctl_abort(struct sg_device *sdp, struct sg_fd *sfp, void __user *p)
 }
 
 /*
- * Check if search_for is a "char" device fd whose MAJOR is this driver.
- * If so filp->private_data must be the sfp we are looking for. Do further
- * checks (e.g. not already in a file share). If all is well set up cross
- * references and adjust xarray marks. Returns a sfp or negative errno
- * twisted by ERR_PTR().
+ * Check if search_for is a "char" device fd whose MAJOR is this driver. If so filp->private_data
+ * must be the sfp we are looking for. Do further checks (e.g. not already in a file share). If all
+ * is well, set up cross references and adjust xarray marks. Returns a sfp or negative errno twisted
+ * by ERR_PTR().
  */
 static struct sg_fd *
-sg_find_sfp_by_fd(const struct file *search_for, struct sg_fd *from_sfp,
-		  bool is_reshare)
+sg_find_sfp_by_fd(const struct file *search_for, struct sg_fd *from_sfp, bool is_reshare)
 		__must_hold(&from_sfp->f_mutex)
 {
 	int res = 0;
@@ -4420,8 +4287,8 @@ sg_find_sfp_by_fd(const struct file *search_for, struct sg_fd *from_sfp,
 	struct sg_device *from_sdp = from_sfp->parentdp;
 	struct sg_device *sdp;
 
-	SG_LOG(6, from_sfp, "%s: enter,  from_sfp=%pK search_for=%pK\n",
-	       __func__, from_sfp, search_for);
+	SG_LOG(6, from_sfp, "%s: enter,  from_sfp=%pK search_for=%pK\n", __func__, from_sfp,
+	       search_for);
 	if (!(S_ISCHR(search_for->f_inode->i_mode) &&
 	      MAJOR(search_for->f_inode->i_rdev) == SCSI_GENERIC_MAJOR))
 		return ERR_PTR(-EBADF);
@@ -4446,8 +4313,7 @@ sg_find_sfp_by_fd(const struct file *search_for, struct sg_fd *from_sfp,
 	rcu_assign_pointer(from_sfp->share_sfp, sfp);
 	__xa_clear_mark(&from_sdp->sfp_arr, from_sfp->idx, SG_XA_FD_UNSHARED);
 	if (is_reshare)	/* reshare case: no kref_get() on read-side */
-		__xa_set_mark(&from_sdp->sfp_arr, from_sfp->idx,
-			      SG_XA_FD_RS_SHARE);
+		__xa_set_mark(&from_sdp->sfp_arr, from_sfp->idx, SG_XA_FD_RS_SHARE);
 	else
 		kref_get(&from_sfp->f_ref);  /* undone: sg_unshare_*_fd() */
 	if (from_sdp != sdp) {
@@ -4466,9 +4332,9 @@ sg_find_sfp_by_fd(const struct file *search_for, struct sg_fd *from_sfp,
 }
 
 /*
- * After checking the proposed read-side/write-side relationship is unique and valid,
- * sets up pointers between read-side and write-side sg_fd objects. Returns 0 on
- * success or negated errno value. From ioctl(EXTENDED(SG_SEIM_SHARE_FD)).
+ * After checking the proposed read-side/write-side relationship is unique and valid, sets up
+ * pointers between read-side and write-side sg_fd objects. Returns 0 on success or negated errno
+ * value. From ioctl(EXTENDED(SG_SEIM_SHARE_FD)).
  */
 static int
 sg_fd_share(struct sg_fd *ws_sfp, int m_fd)
@@ -4495,8 +4361,7 @@ sg_fd_share(struct sg_fd *ws_sfp, int m_fd)
 		res = -ELOOP;
 		goto fini;
 	}
-	SG_LOG(6, ws_sfp, "%s: read-side fd okay, scan for filp=0x%pK\n",
-	       __func__, filp);
+	SG_LOG(6, ws_sfp, "%s: read-side fd okay, scan for filp=0x%pK\n", __func__, filp);
 	rs_sfp = sg_find_sfp_by_fd(filp, ws_sfp, false);
 	if (!IS_ERR(rs_sfp))
 		found = !!rs_sfp;
@@ -4509,10 +4374,9 @@ sg_fd_share(struct sg_fd *ws_sfp, int m_fd)
 }
 
 /*
- * After checking the proposed file share relationship is unique and
- * valid, sets up pointers between read-side and write-side sg_fd objects. Allows
- * previous write-side to be the same as the new write-side (fd). Return 0 on success
- * or negated errno value.
+ * After checking the proposed file share relationship is unique and valid, sets up pointers
+ * between read-side and write-side sg_fd objects. Allows previous write-side to be the same as
+ * the new write-side (fd). Return 0 on success or negated errno value.
  */
 static int
 sg_fd_reshare(struct sg_fd *rs_sfp, int new_ws_fd)
@@ -4528,8 +4392,7 @@ sg_fd_reshare(struct sg_fd *rs_sfp, int new_ws_fd)
 		return -EACCES;
 	if (unlikely(new_ws_fd < 0))
 		return -EBADF;
-	if (unlikely(!xa_get_mark(&rs_sfp->parentdp->sfp_arr, rs_sfp->idx,
-				  SG_XA_FD_RS_SHARE)))
+	if (unlikely(!xa_get_mark(&rs_sfp->parentdp->sfp_arr, rs_sfp->idx, SG_XA_FD_RS_SHARE)))
 		res = -EINVAL;	/* invalid unless prev_sl==new_sl */
 
 	/* Alternate approach: fcheck_files(current->files, m_fd) */
@@ -4547,8 +4410,7 @@ sg_fd_reshare(struct sg_fd *rs_sfp, int new_ws_fd)
 		}	/* else it is invalid and res is still -EINVAL */
 		goto fini;
 	}
-	SG_LOG(6, ws_sfp, "%s: write-side fd ok, scan for filp=0x%pK\n", __func__,
-	       filp);
+	SG_LOG(6, ws_sfp, "%s: write-side fd ok, scan for filp=0x%pK\n", __func__, filp);
 	sg_unshare_ws_fd(ws_sfp, true);
 	ws_sfp = sg_find_sfp_by_fd(filp, rs_sfp, true);
 	if (!IS_ERR(ws_sfp))
@@ -4586,12 +4448,11 @@ sg_eventfd_new(struct sg_fd *rs_sfp, int eventfd)
 }
 
 /*
- * First normalize want_rsv_sz to be >= sfp->sgat_elem_sz and
- * <= max_segment_size. Exit if that is the same as old size; otherwise
- * create a new candidate request of the new size. Then decide whether to
- * re-use an existing inactive request (least buflen >= required size) or
- * use the new candidate. If new one used, leave old one but it is no longer
- * the reserved request. Returns 0 on success, else a negated errno value.
+ * First normalize want_rsv_sz to be >= sfp->sgat_elem_sz and <= max_segment_size. Exit if that is
+ * the same as old size; otherwise create a new candidate request of the new size. Then decide
+ * whether to re-use an existing inactive request (least buflen >= required size) or use the new
+ * candidate. If new one used, leave old one but it is no longer the reserved request. Returns 0
+ * on success, else a negated errno value.
  */
 static int
 sg_set_reserved_sz(struct sg_fd *sfp, int want_rsv_sz)
@@ -4613,9 +4474,8 @@ sg_set_reserved_sz(struct sg_fd *sfp, int want_rsv_sz)
 		return -EBUSY;	/* this fd can't be either side of share */
 	new_sz = min_t(int, want_rsv_sz, sdp->max_sgat_sz);
 	new_sz = max_t(int, new_sz, sfp->sgat_elem_sz);
-	SG_LOG(3, sfp, "%s: was=%d, ask=%d, new=%d (sgat_elem_sz=%d)\n",
-	       __func__, *rapp ? (*rapp)->sgatp->buflen : -1,
-	       want_rsv_sz, new_sz, sfp->sgat_elem_sz);
+	SG_LOG(3, sfp, "%s: was=%d, ask=%d, new=%d (sgat_elem_sz=%d)\n", __func__,
+	       *rapp ? (*rapp)->sgatp->buflen : -1, want_rsv_sz, new_sz, sfp->sgat_elem_sz);
 	if (unlikely(sfp->mmap_sz > 0))
 		return -EBUSY;	/* existing pages possibly pinned */
 
@@ -4657,26 +4517,22 @@ sg_set_reserved_sz(struct sg_fd *sfp, int want_rsv_sz)
 			xa_lock_irqsave(xafp, iflags);
 			n_srp->rq_idx = o_srp->rq_idx;
 			idx = o_srp->rq_idx;
-			cxc_srp = __xa_cmpxchg(xafp, idx, o_srp, n_srp,
-					       GFP_ATOMIC);
+			cxc_srp = __xa_cmpxchg(xafp, idx, o_srp, n_srp, GFP_ATOMIC);
 			if (o_srp == cxc_srp) {
 				__assign_bit(SG_FRQ_RESERVED, n_srp->frq_bm,
-					     test_bit(SG_FRQ_RESERVED,
-						      o_srp->frq_bm));
+					     test_bit(SG_FRQ_RESERVED, o_srp->frq_bm));
 				*rapp = n_srp;
 				sg_rq_chg_state_force_ulck(n_srp, SG_RQ_INACTIVE);
 				/* no bump of sfp->inactives since replacement */
 				xa_unlock_irqrestore(xafp, iflags);
-				SG_LOG(6, sfp, "%s: new rsv srp=0x%pK ++\n",
-				       __func__, n_srp);
+				SG_LOG(6, sfp, "%s: new rsv srp=0x%pK ++\n", __func__, n_srp);
 				n_srp = NULL;
 				sg_remove_srp(o_srp);
 				kfree(o_srp);
 				o_srp = NULL;
 			} else {
 				xa_unlock_irqrestore(xafp, iflags);
-				SG_LOG(1, sfp, "%s: xa_cmpxchg()-->retry\n",
-				       __func__);
+				SG_LOG(1, sfp, "%s: xa_cmpxchg()-->retry\n", __func__);
 				goto try_again;
 			}
 		}
@@ -4702,8 +4558,7 @@ struct compat_sg_req_info { /* used by SG_GET_REQUEST_TABLE ioctl() */
 	int unused;
 };
 
-static int put_compat_request_table(struct compat_sg_req_info __user *o,
-				    struct sg_req_info *rinfo)
+static int put_compat_request_table(struct compat_sg_req_info __user *o, struct sg_req_info *rinfo)
 {
 	int i;
 
@@ -4738,11 +4593,9 @@ sg_any_persistent_orphans(struct sg_fd *sfp)
 
 /*
  * Will clear_first if size already over half of available buffer.
- *
- * N.B. This function is a useful debug aid to be called inline with its
- * output going to /sys/kernel/debug/scsi_generic/snapped for later
- * examination. Best to call it with no locks held and that implies that
- * the driver state may change while it is processing. Interpret the
+ * N.B. This function is a useful debug aid to be called inline with its output going to
+ * /sys/kernel/debug/scsi_generic/snapped for later examination. Best to call it with no locks
+ * held and that implies that the driver state may change while it is processing. Interpret the
  * result with this in mind.
  */
 static void
@@ -4762,13 +4615,11 @@ sg_take_snap(struct sg_fd *sfp, bool clear_first)
 	second = (u32)do_div(n, 60);
 	minute = (u32)do_div(n, 60);
 	hour = (u32)do_div(n, 24);	/* hour within a UTC day */
-	snprintf(b, sizeof(b), "UTC time: %.2u:%.2u:%.2u:%.6u [tid=%d]",
-		 hour, minute, second, (u32)ts64.tv_nsec / 1000,
-		 (current ? current->pid : -1));
+	snprintf(b, sizeof(b), "UTC time: %.2u:%.2u:%.2u:%.6u [tid=%d]", hour, minute, second,
+		 (u32)ts64.tv_nsec / 1000, (current ? current->pid : -1));
 	mutex_lock(&snapped_mutex);
 	if (!snapped_buf) {
-		snapped_buf = kzalloc(SG_SNAP_BUFF_SZ,
-				      GFP_KERNEL | __GFP_NOWARN);
+		snapped_buf = kzalloc(SG_SNAP_BUFF_SZ, GFP_KERNEL | __GFP_NOWARN);
 		if (!snapped_buf)
 			goto fini;
 	} else if (clear_first) {
@@ -4801,9 +4652,9 @@ sg_take_snap(struct sg_fd *sfp, bool clear_first)
 }
 
 /*
- * Processing of ioctl(SG_SET_GET_EXTENDED(SG_SEIM_CTL_FLAGS)) which is a set
- * of boolean flags. Access abbreviations: [rw], read-write; [ro], read-only;
- * [wo], write-only; [raw], read after write; [rbw], read before write.
+ * Processing of ioctl(SG_SET_GET_EXTENDED(SG_SEIM_CTL_FLAGS)) which is a set of boolean flags.
+ * Access abbreviations: [rw], read-write; [ro], read-only; [wo], write-only; [raw], read after
+ * write; [rbw], read before write.
  */
 static int
 sg_extended_bool_flags(struct sg_fd *sfp, struct sg_extended_info *seip)
@@ -4852,8 +4703,7 @@ sg_extended_bool_flags(struct sg_fd *sfp, struct sg_extended_info *seip)
 	}
 	/* Q_TAIL boolean, [raw] 1: queue at tail; 0: head (def: depends) */
 	if (c_flgs_wm & SG_CTL_FLAGM_Q_TAIL)
-		assign_bit(SG_FFD_Q_AT_TAIL, sfp->ffd_bm,
-			   !!(c_flgs_val_in & SG_CTL_FLAGM_Q_TAIL));
+		assign_bit(SG_FFD_Q_AT_TAIL, sfp->ffd_bm, !!(c_flgs_val_in & SG_CTL_FLAGM_Q_TAIL));
 	if (c_flgs_rm & SG_CTL_FLAGM_Q_TAIL) {
 		if (test_bit(SG_FFD_Q_AT_TAIL, sfp->ffd_bm))
 			c_flgs_val_out |= SG_CTL_FLAGM_Q_TAIL;
@@ -4861,10 +4711,9 @@ sg_extended_bool_flags(struct sg_fd *sfp, struct sg_extended_info *seip)
 			c_flgs_val_out &= ~SG_CTL_FLAGM_Q_TAIL;
 	}
 	/*
-	 * UNSHARE boolean: when reading yields zero. When writing true,
-	 * unshares this fd from a previously established fd share. If
-	 * a shared commands is inflight, waits a little while for it
-	 * to finish.
+	 * UNSHARE boolean: when reading yields zero. When writing true, unshares this fd from a
+	 * previously established fd share. If a shared command is inflight, waits a little
+	 * while for it to finish.
 	 */
 	if (c_flgs_wm & SG_CTL_FLAGM_UNSHARE) {
 		mutex_lock(&sfp->f_mutex);
@@ -4888,9 +4737,9 @@ sg_extended_bool_flags(struct sg_fd *sfp, struct sg_extended_info *seip)
 			c_flgs_val_out &= ~SG_CTL_FLAGM_IS_READ_SIDE;
 	}
 	/*
-	 * READ_SIDE_FINI boolean, [rbw] should be called by write-side; when
-	 * reading: read-side is finished, awaiting action by write-side;
-	 * when written: 1 --> write-side doesn't want to continue
+	 * READ_SIDE_FINI boolean, [rbw] should be called by write-side; when reading: read-side
+	 * is finished, awaiting action by write-side; when written: 1 --> write-side doesn't
+	 * want to continue
 	 */
 	if ((c_flgs_rm & SG_CTL_FLAGM_READ_SIDE_FINI) && sg_fd_is_shared(sfp)) {
 		struct sg_fd *rs_sfp = sg_fd_share_ptr(sfp);
@@ -5010,16 +4859,14 @@ sg_extended_read_value(struct sg_fd *sfp, struct sg_extended_info *seip)
 		break;
 	case SG_SEIRV_INACT_RQS:
 		uv = 0;
-		xa_for_each_marked(&sfp->srp_arr, idx, srp,
-				   SG_XA_RQ_INACTIVE)
+		xa_for_each_marked(&sfp->srp_arr, idx, srp, SG_XA_RQ_INACTIVE)
 			++uv;
 		seip->read_value = uv;
 		break;
 	case SG_SEIRV_DEV_INACT_RQS:
 		uv = 0;
 		xa_for_each(&sdp->sfp_arr, idx2, a_sfp) {
-			xa_for_each_marked(&a_sfp->srp_arr, idx, srp,
-					   SG_XA_RQ_INACTIVE)
+			xa_for_each_marked(&a_sfp->srp_arr, idx, srp, SG_XA_RQ_INACTIVE)
 				++uv;
 		}
 		seip->read_value = uv;
@@ -5043,8 +4890,7 @@ sg_extended_read_value(struct sg_fd *sfp, struct sg_extended_info *seip)
 		seip->read_value = (sfp->parentdp->create_ns >> 32) & U32_MAX;
 		break;
 	default:
-		SG_LOG(6, sfp, "%s: can't decode %d --> read_value\n",
-		       __func__, seip->read_value);
+		SG_LOG(6, sfp, "%s: can't decode %d --> read_value\n", __func__, seip->read_value);
 		seip->read_value = 0;
 		break;
 	}
@@ -5072,8 +4918,7 @@ sg_ctl_extended(struct sg_fd *sfp, void __user *p)
 		SG_LOG(2, sfp, "%s: both masks 0, do nothing\n", __func__);
 		return 0;
 	}
-	SG_LOG(3, sfp, "%s: wr_mask=0x%x rd_mask=0x%x\n", __func__, s_wr_mask,
-	       s_rd_mask);
+	SG_LOG(3, sfp, "%s: wr_mask=0x%x rd_mask=0x%x\n", __func__, s_wr_mask, s_rd_mask);
 	/* tot_fd_thresh (u32), [rbw] [limit for sum of active cmd dlen_s] */
 	if (or_masks & SG_SEIM_TOT_FD_THRESH) {
 		u32 hold = sfp->tot_fd_thresh;
@@ -5114,8 +4959,7 @@ sg_ctl_extended(struct sg_fd *sfp, void __user *p)
 		if (s_rd_mask & SG_SEIM_SHARE_FD) {
 			struct sg_fd *sh_sfp = sg_fd_share_ptr(sfp);
 
-			seip->share_fd = sh_sfp ? sh_sfp->parentdp->index :
-						  U32_MAX;
+			seip->share_fd = sh_sfp ? sh_sfp->parentdp->index : U32_MAX;
 		}
 		mutex_unlock(&sfp->f_mutex);
 	}
@@ -5131,8 +4975,7 @@ sg_ctl_extended(struct sg_fd *sfp, void __user *p)
 		if (s_rd_mask & SG_SEIM_CHG_SHARE_FD) {
 			struct sg_fd *sh_sfp = sg_fd_share_ptr(sfp);
 
-			seip->share_fd = sh_sfp ? sh_sfp->parentdp->index :
-						  U32_MAX;
+			seip->share_fd = sh_sfp ? sh_sfp->parentdp->index : U32_MAX;
 		}
 		mutex_unlock(&sfp->f_mutex);
 	}
@@ -5166,9 +5009,8 @@ sg_ctl_extended(struct sg_fd *sfp, void __user *p)
 		if (s_wr_mask & SG_SEIM_SGAT_ELEM_SZ) {
 			j = (int)seip->sgat_elem_sz;
 			if (!is_power_of_2(j) || j < (int)PAGE_SIZE) {
-				SG_LOG(1, sfp, "%s: %s not power of 2, %s\n",
-				       __func__, "sgat element size",
-				       "or less than PAGE_SIZE");
+				SG_LOG(1, sfp, "%s: sgat element size not power of 2, %s\n",
+				       __func__, "or less than PAGE_SIZE");
 				ret = -EINVAL;
 			} else {
 				sfp->sgat_elem_sz = j;
@@ -5188,8 +5030,7 @@ sg_ctl_extended(struct sg_fd *sfp, void __user *p)
 	if (s_rd_mask & SG_SEIM_RESERVED_SIZE) {
 		struct sg_request *r_srp = sfp->rsv_arr[0];
 
-		seip->reserved_sz = (u32)min_t(int, r_srp->sgatp->buflen,
-					       sdp->max_sgat_sz);
+		seip->reserved_sz = (u32)min_t(int, r_srp->sgatp->buflen, sdp->max_sgat_sz);
 	}
 	/* copy to user space if int or boolean read mask non-zero */
 	if (s_rd_mask || seip->ctl_flags_rd_mask) {
@@ -5200,10 +5041,9 @@ sg_ctl_extended(struct sg_fd *sfp, void __user *p)
 }
 
 /*
- * For backward compatibility, output SG_MAX_QUEUE sg_req_info objects. First
- * fetch from the active list then, if there is still room, from the free
- * list. Some of the trailing elements may be empty which is indicated by all
- * fields being zero. Any requests beyond SG_MAX_QUEUE are ignored.
+ * For backward compatibility, output SG_MAX_QUEUE sg_req_info objects. First fetch from the active
+ * list then, if there is still room, from the free list. Some of the trailing elements may be
+ * empty which is indicated by all fields being zero. Any requests beyond SG_MAX_QUEUE are ignored.
  */
 static int
 sg_ctl_req_tbl(struct sg_fd *sfp, void __user *p)
@@ -5240,11 +5080,9 @@ sg_ctl_req_tbl(struct sg_fd *sfp, void __user *p)
 	if (in_compat_syscall())
 		result = put_compat_request_table(p, rinfop);
 	else
-		result = copy_to_user(p, rinfop,
-				      SZ_SG_REQ_INFO * SG_MAX_QUEUE);
+		result = copy_to_user(p, rinfop, SZ_SG_REQ_INFO * SG_MAX_QUEUE);
 #else
-	result = copy_to_user(p, rinfop,
-			      SZ_SG_REQ_INFO * SG_MAX_QUEUE);
+	result = copy_to_user(p, rinfop, SZ_SG_REQ_INFO * SG_MAX_QUEUE);
 #endif
 	kfree(rinfop);
 	return result > 0 ? -EFAULT : result;	/* treat short copy as error */
@@ -5273,8 +5111,8 @@ sg_ctl_scsi_id(struct scsi_device *sdev, struct sg_fd *sfp, void __user *p)
 }
 
 static long
-sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
-		unsigned int cmd_in, void __user *p)
+sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp, unsigned int cmd_in,
+		void __user *p)
 {
 	bool read_only = O_RDWR != (filp->f_flags & O_ACCMODE);
 	int val;
@@ -5352,16 +5190,14 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 	case SG_GET_PACK_ID:    /* or tag of oldest "read"-able, -1 if none */
 		val = -1;
 		if (test_bit(SG_FFD_PREFER_TAG, sfp->ffd_bm)) {
-			xa_for_each_marked(&sfp->srp_arr, idx, srp,
-					   SG_XA_RQ_AWAIT) {
+			xa_for_each_marked(&sfp->srp_arr, idx, srp, SG_XA_RQ_AWAIT) {
 				if (!test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm)) {
 					val = srp->tag;
 					break;
 				}
 			}
 		} else {
-			xa_for_each_marked(&sfp->srp_arr, idx, srp,
-					   SG_XA_RQ_AWAIT) {
+			xa_for_each_marked(&sfp->srp_arr, idx, srp, SG_XA_RQ_AWAIT) {
 				if (!test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm)) {
 					val = srp->pack_id;
 					break;
@@ -5371,8 +5207,7 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 		SG_LOG(3, sfp, "%s:    SG_GET_PACK_ID=%d\n", __func__, val);
 		return put_user(val, ip);
 	case SG_GET_SG_TABLESIZE:
-		SG_LOG(3, sfp, "%s:    SG_GET_SG_TABLESIZE=%d\n", __func__,
-		       sdp->max_sgat_elems);
+		SG_LOG(3, sfp, "%s:    SG_GET_SG_TABLESIZE=%d\n", __func__, sdp->max_sgat_elems);
 		return put_user(sdp->max_sgat_elems, ip);
 	case SG_SET_RESERVED_SIZE:
 		res = get_user(val, ip);
@@ -5389,16 +5224,12 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 		return res;
 	case SG_GET_RESERVED_SIZE:
 		{
-			struct sg_request *r_srp = sfp->rsv_arr[0];
-
 			mutex_lock(&sfp->f_mutex);
-			val = min_t(int, r_srp->sgatp->buflen,
-				    sdp->max_sgat_sz);
+			val = min_t(int, sfp->rsv_arr[0]->sgatp->buflen, sdp->max_sgat_sz);
 			mutex_unlock(&sfp->f_mutex);
 			res = put_user(val, ip);
 		}
-		SG_LOG(3, sfp, "%s:    SG_GET_RESERVED_SIZE=%d\n", __func__,
-		       val);
+		SG_LOG(3, sfp, "%s:    SG_GET_RESERVED_SIZE=%d\n", __func__, val);
 		return res;
 	case SG_SET_COMMAND_Q:	/* set by driver whenever v3 or v4 req seen */
 		SG_LOG(3, sfp, "%s:    SG_SET_COMMAND_Q\n", __func__);
@@ -5419,8 +5250,7 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 		return 0;
 	case SG_GET_KEEP_ORPHAN:
 		SG_LOG(3, sfp, "%s:    SG_GET_KEEP_ORPHAN\n", __func__);
-		return put_user(test_bit(SG_FFD_KEEP_ORPHAN, sfp->ffd_bm),
-				ip);
+		return put_user(test_bit(SG_FFD_KEEP_ORPHAN, sfp->ffd_bm), ip);
 	case SG_GET_VERSION_NUM:
 		SG_LOG(3, sfp, "%s:    SG_GET_VERSION_NUM\n", __func__);
 		return put_user(sg_version_num, ip);
@@ -5437,20 +5267,18 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 		if (unlikely(val < 0))
 			return -EIO;
 		if (val >= mult_frac((s64)INT_MAX, USER_HZ, HZ))
-			val = min_t(s64, mult_frac((s64)INT_MAX, USER_HZ, HZ),
-				    INT_MAX);
+			val = min_t(s64, mult_frac((s64)INT_MAX, USER_HZ, HZ), INT_MAX);
 		sfp->timeout_user = val;
 		sfp->timeout = mult_frac(val, HZ, USER_HZ);
 		return 0;
-	case SG_GET_TIMEOUT:    /* N.B. User receives timeout as return value */
-				/* strange ..., for backward compatibility */
+	case SG_GET_TIMEOUT:
+		/* N.B. User receives timeout as return value, keep for backward compatibility */
 		SG_LOG(3, sfp, "%s:    SG_GET_TIMEOUT\n", __func__);
 		return sfp->timeout_user;
 	case SG_SET_FORCE_LOW_DMA:
 		/*
-		 * N.B. This ioctl never worked properly, but failed to
-		 * return an error value. So returning '0' to keep
-		 * compatibility with legacy applications.
+		 * N.B. This ioctl never worked properly, but failed to return an error value. So
+		 * returning '0' to keep compatibility with legacy applications.
 		 */
 		SG_LOG(3, sfp, "%s:    SG_SET_FORCE_LOW_DMA\n", __func__);
 		return 0;
@@ -5480,8 +5308,7 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 		return put_user(sdev->host->hostt->emulated, ip);
 	case SCSI_IOCTL_SEND_COMMAND:
 		SG_LOG(3, sfp, "%s:    SCSI_IOCTL_SEND_COMMAND\n", __func__);
-		return sg_scsi_ioctl(sdev->request_queue, NULL, filp->f_mode,
-				     p);
+		return sg_scsi_ioctl(sdev->request_queue, NULL, filp->f_mode, p);
 	case SG_SET_DEBUG:
 		SG_LOG(3, sfp, "%s:    SG_SET_DEBUG\n", __func__);
 		res = get_user(val, ip);
@@ -5496,10 +5323,8 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 		return put_user(max_sectors_bytes(sdev->request_queue), ip);
 	case BLKTRACESETUP:
 		SG_LOG(3, sfp, "%s:    BLKTRACESETUP\n", __func__);
-		return blk_trace_setup(sdev->request_queue,
-				       sdp->disk->disk_name,
-				       MKDEV(SCSI_GENERIC_MAJOR, sdp->index),
-				       NULL, p);
+		return blk_trace_setup(sdev->request_queue, sdp->disk->disk_name,
+				       MKDEV(SCSI_GENERIC_MAJOR, sdp->index), NULL, p);
 	case BLKTRACESTART:
 		SG_LOG(3, sfp, "%s:    BLKTRACESTART\n", __func__);
 		return blk_trace_startstop(sdev->request_queue, 1);
@@ -5510,23 +5335,19 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp,
 		SG_LOG(3, sfp, "%s:    BLKTRACETEARDOWN\n", __func__);
 		return blk_trace_remove(sdev->request_queue);
 	case SCSI_IOCTL_GET_IDLUN:
-		SG_LOG(3, sfp, "%s:    SCSI_IOCTL_GET_IDLUN %s\n", __func__,
-		       pmlp);
+		SG_LOG(3, sfp, "%s:    SCSI_IOCTL_GET_IDLUN %s\n", __func__, pmlp);
 		break;
 	case SCSI_IOCTL_GET_BUS_NUMBER:
-		SG_LOG(3, sfp, "%s:    SCSI_IOCTL_GET_BUS_NUMBER%s\n",
-		       __func__, pmlp);
+		SG_LOG(3, sfp, "%s:    SCSI_IOCTL_GET_BUS_NUMBER%s\n", __func__, pmlp);
 		break;
 	case SCSI_IOCTL_PROBE_HOST:
-		SG_LOG(3, sfp, "%s:    SCSI_IOCTL_PROBE_HOST%s",
-		       __func__, pmlp);
+		SG_LOG(3, sfp, "%s:    SCSI_IOCTL_PROBE_HOST%s", __func__, pmlp);
 		break;
 	case SG_GET_TRANSFORM:
 		SG_LOG(3, sfp, "%s:    SG_GET_TRANSFORM%s\n", __func__, pmlp);
 		break;
 	default:
-		SG_LOG(3, sfp, "%s:    unrecognized ioctl [0x%x]%s\n",
-		       __func__, cmd_in, pmlp);
+		SG_LOG(3, sfp, "%s:    unrecognized ioctl [0x%x]%s\n", __func__, cmd_in, pmlp);
 		if (read_only)
 			return -EPERM;	/* don't know, so take safer approach */
 		break;
@@ -5643,10 +5464,7 @@ sg_sfp_blk_poll(struct sg_fd *sfp, int loop_count)
 	return res;
 }
 
-/*
- * Implements the poll(2) system call for this driver. Returns various EPOLL*
- * flags OR-ed together.
- */
+/* Implements the poll(2) system call. Returns various EPOLL* flags OR-ed together. */
 static __poll_t
 sg_poll(struct file *filp, poll_table *wait)
 {
@@ -5746,12 +5564,10 @@ sg_vma_fault(struct vm_fault *vmf)
 	rsv_schp = srp->sgatp;
 	offset = vmf->pgoff << PAGE_SHIFT;
 	if (unlikely(offset >= (unsigned int)rsv_schp->buflen)) {
-		SG_LOG(1, sfp, "%s: offset[%lu] >= rsv.buflen\n", __func__,
-		       offset);
+		SG_LOG(1, sfp, "%s: offset[%lu] >= rsv.buflen\n", __func__, offset);
 		goto out_err_unlock;
 	}
-	SG_LOG(5, sfp, "%s: vm_start=0x%lx, off=%lu\n", __func__,
-	       vma->vm_start, offset);
+	SG_LOG(5, sfp, "%s: vm_start=0x%lx, off=%lu\n", __func__, vma->vm_start, offset);
 	length = 1 << (PAGE_SHIFT + rsv_schp->page_order);
 	k = (int)offset / length;
 	n = ((int)offset % length) >> PAGE_SHIFT;
@@ -5772,9 +5588,8 @@ static const struct vm_operations_struct sg_mmap_vm_ops = {
 };
 
 /*
- * Entry point for mmap(2) system call. For mmap(2) to work, request's
- * scatter gather list needs to be order 0 which it is unlikely to be
- * by default. mmap(2) cannot be called more than once per fd.
+ * Entry point for mmap(2) system call. For mmap(2) to work, request's scatter gather list needs
+ * to be order 0 which it is unlikely to be by default. mmap(2) calls cannot overlap on this fd.
  */
 static int
 sg_mmap(struct file *filp, struct vm_area_struct *vma)
@@ -5794,8 +5609,7 @@ sg_mmap(struct file *filp, struct vm_area_struct *vma)
 	if (unlikely(!mutex_trylock(&sfp->f_mutex)))
 		return -EBUSY;
 	req_sz = vma->vm_end - vma->vm_start;
-	SG_LOG(3, sfp, "%s: vm_start=%pK, len=%d\n", __func__,
-	       (void *)vma->vm_start, (int)req_sz);
+	SG_LOG(3, sfp, "%s: vm_start=%pK, len=%d\n", __func__, (void *)vma->vm_start, (int)req_sz);
 	if (unlikely(vma->vm_pgoff || req_sz < SG_DEF_SECTOR_SZ)) {
 		res = -EINVAL; /* only an offset of 0 accepted */
 		goto fini;
@@ -5811,19 +5625,16 @@ sg_mmap(struct file *filp, struct vm_area_struct *vma)
 		goto fini;
 	}
 	if (sfp->mmap_sz > 0) {
-		SG_LOG(1, sfp, "%s: multiple invocations on this fd\n",
-		       __func__);
+		SG_LOG(1, sfp, "%s: multiple invocations on this fd\n", __func__);
 		res = -EADDRINUSE;
 		goto fini;
 	}
-	if (srp->sgat_h.page_order > 0 ||
-	    req_sz > (unsigned long)srp->sgat_h.buflen) {
+	if (srp->sgat_h.page_order > 0 || req_sz > (unsigned long)srp->sgat_h.buflen) {
 		sg_remove_srp(srp);
 		set_bit(SG_FRQ_FOR_MMAP, srp->frq_bm);
 		res = sg_mk_sgat(srp, sfp, req_sz);
 		if (res) {
-			SG_LOG(1, sfp, "%s: sg_mk_sgat failed, wanted=%lu\n",
-			       __func__, req_sz);
+			SG_LOG(1, sfp, "%s: sg_mk_sgat failed, wanted=%lu\n", __func__, req_sz);
 			goto fini;
 		}
 	}
@@ -5838,15 +5649,13 @@ sg_mmap(struct file *filp, struct vm_area_struct *vma)
 }
 
 /*
- * This user context function is called from sg_rq_end_io() when an orphaned
- * request needs to be cleaned up (e.g. when control C is typed while an
- * ioctl(SG_IO) is active).
+ * This user context function is called from sg_rq_end_io() when an orphaned request needs to be
+ * cleaned up (e.g. when control C is typed while an ioctl(SG_IO) is active).
  */
 static void
 sg_uc_rq_end_io_orphaned(struct work_struct *work)
 {
-	struct sg_request *srp = container_of(work, struct sg_request,
-					      ew_orph.work);
+	struct sg_request *srp = container_of(work, struct sg_request, ew_orph.work);
 	struct sg_fd *sfp;
 
 	sfp = srp->parentfp;
@@ -5863,10 +5672,9 @@ sg_uc_rq_end_io_orphaned(struct work_struct *work)
 }
 
 /*
- * This "bottom half" (soft interrupt) handler is called by the mid-level
- * when a request has completed or failed. This callback is registered in a
- * blk_execute_rq_nowait() call in the sg_common_write(). For ioctl(SG_IO)
- * (sync) usage, sg_ctl_sg_io() waits to be woken up by this callback.
+ * This "bottom half" (soft interrupt) handler is called by the mid-level when a request has
+ * completed or failed. This callback is registered in a blk_execute_rq_nowait() call in
+ * sg_common_write(). For ioctl(SG_IO) (sync) usage, sg_ctl_sg_io() awaits this callback.
  */
 static void
 sg_rq_end_io(struct request *rqq, blk_status_t status)
@@ -5901,11 +5709,10 @@ sg_rq_end_io(struct request *rqq, blk_status_t status)
 	if (unlikely(test_bit(SG_FRQ_ABORTING, srp->frq_bm)) && sg_result_is_good(rq_result))
 		srp->rq_result |= (DRIVER_HARD << 24);
 
-	SG_LOG(6, sfp, "%s: pack/tag_id=%d/%d, cmd=0x%x, res=0x%x\n", __func__,
-	       srp->pack_id, srp->tag, srp->cmd_opcode, srp->rq_result);
+	SG_LOG(6, sfp, "%s: pack/tag_id=%d/%d, cmd=0x%x, res=0x%x\n", __func__, srp->pack_id,
+	       srp->tag, srp->cmd_opcode, srp->rq_result);
 	if (srp->start_ns > 0)	/* zero only when SG_FFD_NO_DURATION is set */
-		srp->duration = sg_calc_rq_dur(srp, test_bit(SG_FFD_TIME_IN_NS,
-							     sfp->ffd_bm));
+		srp->duration = sg_calc_rq_dur(srp, test_bit(SG_FFD_TIME_IN_NS, sfp->ffd_bm));
 	if (unlikely(!sg_result_is_good(rq_result) && slen > 0 &&
 		     test_bit(SG_FDEV_LOG_SENSE, sdp->fdev_bm))) {
 		if ((rq_result & 0xfe) == SAM_STAT_CHECK_CONDITION ||
@@ -5914,9 +5721,7 @@ sg_rq_end_io(struct request *rqq, blk_status_t status)
 	}
 	if (unlikely(slen > 0)) {
 		if (likely(scsi_rp->sense && !srp->sense_bp)) {
-			srp->sense_bp =
-				mempool_alloc(sg_sense_pool,
-					      GFP_ATOMIC   /* <-- leave */);
+			srp->sense_bp = mempool_alloc(sg_sense_pool, GFP_ATOMIC /* <-- leave */);
 			if (likely(srp->sense_bp)) {
 				memcpy(srp->sense_bp, scsi_rp->sense, slen);
 				if (slen < SCSI_SENSE_BUFFERSIZE)
@@ -5924,8 +5729,7 @@ sg_rq_end_io(struct request *rqq, blk_status_t status)
 					       SCSI_SENSE_BUFFERSIZE - slen);
 			} else {
 				slen = 0;
-				pr_warn("%s: sense but can't alloc buffer\n",
-					__func__);
+				pr_warn("%s: sense but can't alloc buffer\n", __func__);
 			}
 		} else if (unlikely(srp->sense_bp)) {
 			slen = 0;
@@ -5963,8 +5767,8 @@ sg_rq_end_io(struct request *rqq, blk_status_t status)
 	}
 	xa_unlock_irqrestore(&sfp->srp_arr, iflags);
 	/*
-	 * Free the mid-level resources apart from the bio (if any). The bio's
-	 * blk_rq_unmap_user() can be called later from user context.
+	 * Free the mid-level resources apart from the bio (if any). The bio's blk_rq_unmap_user()
+	 * can be called later from user context.
 	 */
 	scsi_req_free_cmd(scsi_rp);
 	blk_put_request(rqq);
@@ -5978,15 +5782,13 @@ sg_rq_end_io(struct request *rqq, blk_status_t status)
 	}
 	if (!(srp->rq_flags & SGV4_FLAG_HIPRI))
 		wake_up_interruptible(&sfp->cmpl_wait);
-	if (sfp->async_qp && (!SG_IS_V4I(srp) ||
-			      (srp->rq_flags & SGV4_FLAG_SIGNAL)))
+	if (sfp->async_qp && (!SG_IS_V4I(srp) || (srp->rq_flags & SGV4_FLAG_SIGNAL)))
 		kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN);
 	if (sfp->efd_ctxp && (srp->rq_flags & SGV4_FLAG_EVENTFD)) {
 		u64 n = eventfd_signal(sfp->efd_ctxp, 1);
 
 		if (n != 1)
-			pr_info("%s: srp=%pK eventfd_signal problem\n",
-				__func__, srp);
+			pr_info("%s: srp=%pK eventfd_signal problem\n", __func__, srp);
 	}
 	kref_put(&sfp->f_ref, sg_remove_sfp);	/* get in: sg_execute_cmd() */
 }
@@ -6030,21 +5832,19 @@ sg_add_device_helper(struct gendisk *disk, struct scsi_device *scsidp)
 	error = idr_alloc(&sg_index_idr, sdp, 0, SG_MAX_DEVS, GFP_NOWAIT);
 	if (unlikely(error < 0)) {
 		if (error == -ENOSPC) {
-			sdev_printk(KERN_WARNING, scsidp,
-				    "Unable to attach sg device type=%d, minor number exceeds %d\n",
-				    scsidp->type, SG_MAX_DEVS - 1);
+			sdev_printk(KERN_WARNING, scsidp, "Can't attach sg device type=%d%s%d\n",
+				    scsidp->type, ", minor number exceeds ", SG_MAX_DEVS - 1);
 			error = -ENODEV;
 		} else {
-			sdev_printk(KERN_WARNING, scsidp,
-				"%s: idr allocation sg_device failure: %d\n",
+			sdev_printk(KERN_WARNING, scsidp, "%s: idr_alloc sg_device failure: %d\n",
 				    __func__, error);
 		}
 		goto out_unlock;
 	}
 	k = error;
 
-	SCSI_LOG_TIMEOUT(3, sdev_printk(KERN_INFO, scsidp,
-			 "%s: dev=%d, sdp=0x%pK ++\n", __func__, k, sdp));
+	SCSI_LOG_TIMEOUT(3, sdev_printk(KERN_INFO, scsidp, "%s: dev=%d, sdp=0x%pK ++\n", __func__,
+					k, sdp));
 	sprintf(disk->disk_name, "sg%d", k);
 	disk->first_minor = k;
 	sdp->disk = disk;
@@ -6111,27 +5911,26 @@ sg_add_device(struct device *cl_dev, struct class_interface *cl_intf)
 		struct device *sg_class_member;
 
 		sg_class_member = device_create(sg_sysfs_class, cl_dev->parent,
-						MKDEV(SCSI_GENERIC_MAJOR,
-						      sdp->index),
+						MKDEV(SCSI_GENERIC_MAJOR, sdp->index),
 						sdp, "%s", disk->disk_name);
 		if (IS_ERR(sg_class_member)) {
 			pr_err("%s: device_create failed\n", __func__);
 			error = PTR_ERR(sg_class_member);
 			goto cdev_add_err;
 		}
-		error = sysfs_create_link(&scsidp->sdev_gendev.kobj,
-					  &sg_class_member->kobj, "generic");
+		error = sysfs_create_link(&scsidp->sdev_gendev.kobj, &sg_class_member->kobj,
+					  "generic");
 		if (unlikely(error))
-			pr_err("%s: unable to make symlink 'generic' back "
-			       "to sg%d\n", __func__, sdp->index);
+			pr_err("%s: unable to make symlink 'generic' back to sg%d\n", __func__,
+			       sdp->index);
 	} else {
 		pr_warn("%s: sg_sys Invalid\n", __func__);
 	}
 
 	sdp->create_ns = ktime_get_boottime_ns();
 	sg_calc_sgat_param(sdp);
-	sdev_printk(KERN_NOTICE, scsidp, "Attached scsi generic sg%d "
-		    "type %d\n", sdp->index, scsidp->type);
+	sdev_printk(KERN_NOTICE, scsidp, "Attached scsi generic sg%d type %d\n", sdp->index,
+		    scsidp->type);
 
 	dev_set_drvdata(cl_dev, sdp);
 	return 0;
@@ -6156,12 +5955,10 @@ sg_device_destroy(struct kref *kref)
 	unsigned long iflags;
 
 	SCSI_LOG_TIMEOUT(1, pr_info("[tid=%d] %s: sdp idx=%d, sdp=0x%pK --\n",
-				    (current ? current->pid : -1), __func__,
-				    sdp->index, sdp));
+				    (current ? current->pid : -1), __func__, sdp->index, sdp));
 	/*
-	 * CAUTION!  Note that the device can still be found via idr_find()
-	 * even though the refcount is 0.  Therefore, do idr_remove() BEFORE
-	 * any other cleanup.
+	 * CAUTION!  Note that the device can still be found via idr_find() even though the
+	 * refcount is 0.  Therefore, do idr_remove() BEFORE any other cleanup.
 	 */
 
 	xa_destroy(&sdp->sfp_arr);
@@ -6188,8 +5985,7 @@ sg_remove_device(struct device *cl_dev, struct class_interface *cl_intf)
 		pr_warn("%s: multiple entries: sg%u\n", __func__, sdp->index);
 		return; /* only want to do following once per device */
 	}
-	SCSI_LOG_TIMEOUT(3, sdev_printk(KERN_INFO, sdp->device,
-					"%s: sg%u 0x%pK\n", __func__,
+	SCSI_LOG_TIMEOUT(3, sdev_printk(KERN_INFO, sdp->device, "%s: sg%u 0x%pK\n", __func__,
 					sdp->index, sdp));
 	xa_for_each(&sdp->sfp_arr, idx, sfp) {
 		wake_up_interruptible_all(&sfp->cmpl_wait);
@@ -6222,31 +6018,27 @@ init_sg(void)
 	else
 		def_reserved_size = sg_big_buff;
 
-	rc = register_chrdev_region(MKDEV(SCSI_GENERIC_MAJOR, 0),
-				    SG_MAX_DEVS, "sg");
+	rc = register_chrdev_region(MKDEV(SCSI_GENERIC_MAJOR, 0), SG_MAX_DEVS, "sg");
 	if (unlikely(rc))
 		return rc;
 
-	sg_sense_cache = kmem_cache_create_usercopy
-				("sg_sense", SCSI_SENSE_BUFFERSIZE, 0,
-				 SLAB_HWCACHE_ALIGN, 0,
-				 SCSI_SENSE_BUFFERSIZE, NULL);
+	sg_sense_cache = kmem_cache_create_usercopy("sg_sense", SCSI_SENSE_BUFFERSIZE, 0,
+						    SLAB_HWCACHE_ALIGN, 0, SCSI_SENSE_BUFFERSIZE,
+						    NULL);
 	if (unlikely(!sg_sense_cache)) {
 		pr_err("sg: can't init sense cache\n");
 		rc = -ENOMEM;
 		goto err_out_unreg;
 	}
-	sg_sense_pool = mempool_create_slab_pool(SG_MEMPOOL_MIN_NR,
-						 sg_sense_cache);
+	sg_sense_pool = mempool_create_slab_pool(SG_MEMPOOL_MIN_NR, sg_sense_cache);
 	if (unlikely(!sg_sense_pool)) {
 		pr_err("sg: can't init sense pool\n");
 		rc = -ENOMEM;
 		goto err_out_cache;
 	}
 
-	pr_info("Registered %s[char major=0x%x], version: %s, date: %s\n",
-		"sg device ", SCSI_GENERIC_MAJOR, SG_VERSION_STR,
-		sg_version_date);
+	pr_info("Registered sg device [char major=0x%x], version: %s, date: %s\n",
+		SCSI_GENERIC_MAJOR, SG_VERSION_STR, sg_version_date);
 	sg_sysfs_class = class_create(THIS_MODULE, "scsi_generic");
 	if (IS_ERR(sg_sysfs_class)) {
 		rc = PTR_ERR(sg_sysfs_class);
@@ -6282,8 +6074,7 @@ exit_sg(void)
 	kmem_cache_destroy(sg_sense_cache);
 	class_destroy(sg_sysfs_class);
 	sg_sysfs_valid = false;
-	unregister_chrdev_region(MKDEV(SCSI_GENERIC_MAJOR, 0),
-				 SG_MAX_DEVS);
+	unregister_chrdev_region(MKDEV(SCSI_GENERIC_MAJOR, 0), SG_MAX_DEVS);
 	idr_destroy(&sg_index_idr);
 }
 
@@ -6436,10 +6227,9 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 	q = sdp->device->request_queue;
 
 	/*
-	 * For backward compatibility default to using blocking variant even
-	 * when in non-blocking (async) mode. If the SG_CTL_FLAGM_MORE_ASYNC
-	 * boolean set on this file descriptor, returns -EAGAIN if
-	 * blk_get_request(BLK_MQ_REQ_NOWAIT) yields EAGAIN (aka EWOULDBLOCK).
+	 * For backward compatibility default to using blocking variant even when in non-blocking
+	 * (async) mode. If the SG_CTL_FLAGM_MORE_ASYNC boolean is set on this file descriptor,
+	 * returns -EAGAIN if blk_get_request(BLK_MQ_REQ_NOWAIT) yields EAGAIN (aka EWOULDBLOCK).
 	 */
 	rqq = blk_get_request(q, (r0w ? REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN),
 			      (test_bit(SG_FFD_MORE_ASYNC, sfp->ffd_bm) ?  BLK_MQ_REQ_NOWAIT : 0));
@@ -6460,8 +6250,7 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 		long_cmdp = NULL;
 	}
 	if (cwrp->u_cmdp)
-		res = sg_fetch_cmnd(sfp, cwrp->u_cmdp, cwrp->cmd_len,
-				    scsi_rp->cmd);
+		res = sg_fetch_cmnd(sfp, cwrp->u_cmdp, cwrp->cmd_len, scsi_rp->cmd);
 	else if (likely(cwrp->cmdp))
 		memcpy(scsi_rp->cmd, cwrp->cmdp, cwrp->cmd_len);
 	else
@@ -6501,9 +6290,7 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 			if (unlikely(!reserve0 || dlen > req_schp->buflen))
 				res = reserve0 ? -ENOMEM : -EBUSY;
 		} else if (req_schp->buflen == 0) {
-			int up_sz = max_t(int, dlen, sfp->sgat_elem_sz);
-
-			res = sg_mk_sgat(srp, sfp, up_sz);
+			res = sg_mk_sgat(srp, sfp, max_t(int, dlen, sfp->sgat_elem_sz));
 		}
 		if (unlikely(res))
 			goto fini;
@@ -6539,8 +6326,7 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 		res = blk_rq_map_user(q, rqq, md, up, dlen, GFP_ATOMIC);
 #if IS_ENABLED(SG_LOG_ACTIVE)
 		if (unlikely(res))
-			SG_LOG(1, sfp, "%s: blk_rq_map_user() res=%d\n",
-			       __func__, res);
+			SG_LOG(1, sfp, "%s: blk_rq_map_user() res=%d\n", __func__, res);
 #endif
 	} else {	/* transfer data to/from kernel buffers */
 		res = sg_rq_map_kern(srp, q, rqq, r0w);
@@ -6563,11 +6349,10 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 }
 
 /*
- * Clean up mid-level and block layer resources of finished request. Sometimes
- * blk_rq_unmap_user() returns -4 (-EINTR) and this is why: "If we're in a
- * workqueue, the request is orphaned, so don't copy into a random user
- * address space, just free and return -EINTR so user space doesn't expect
- * any data." [block/bio.c]
+ * Clean up mid-level and block layer resources of finished request. Sometimes blk_rq_unmap_user()
+ * returns -4 (-EINTR) and this is why: "If we're in a workqueue, the request is orphaned, so
+ * don't copy into a random user address space, just free and return -EINTR so user space doesn't
+ * expect any data." [block/bio.c]
  */
 static void
 sg_finish_scsi_blk_rq(struct sg_request *srp)
@@ -6577,8 +6362,8 @@ sg_finish_scsi_blk_rq(struct sg_request *srp)
 	struct request *rqq = READ_ONCE(srp->rqq);
 	__maybe_unused char b[32];
 
-	SG_LOG(4, sfp, "%s: srp=0x%pK%s\n", __func__, srp,
-	       sg_get_rsv_str_lck(srp, " ", "", sizeof(b), b));
+	SG_LOG(4, sfp, "%s: srp=0x%pK%s\n", __func__, srp,
+	       sg_get_rsv_str_lck(srp, " ", "", sizeof(b), b));
 	if (test_and_clear_bit(SG_FRQ_COUNT_ACTIVE, srp->frq_bm)) {
 		if (atomic_dec_and_test(&sfp->submitted))
 			clear_bit(SG_FFD_HIPRI_SEEN, sfp->ffd_bm);
@@ -6599,11 +6384,8 @@ sg_finish_scsi_blk_rq(struct sg_request *srp)
 		srp->bio = NULL;
 		if (us_xfer && bio) {
 			ret = blk_rq_unmap_user(bio);
-			if (unlikely(ret)) {	/* -EINTR (-4) can be ignored */
-				SG_LOG(6, sfp,
-				       "%s: blk_rq_unmap_user() --> %d\n",
-				       __func__, ret);
-			}
+			if (unlikely(ret))	/* -EINTR (-4) can be ignored */
+				SG_LOG(6, sfp, "%s: blk_rq_unmap_user() --> %d\n", __func__, ret);
 		}
 	}
 	/* In worst case, READ data returned to user space by this point */
@@ -6644,9 +6426,8 @@ sg_mk_sgat(struct sg_request *srp, struct sg_fd *sfp, int minlen)
 	order = o_order;
 
 again:
-	if (elem_sz * mx_sgat_elems < align_sz) {	/* misfit ? */
-		SG_LOG(1, sfp, "%s: align_sz=%d too big\n", __func__,
-		       align_sz);
+	if (elem_sz * mx_sgat_elems < align_sz) {
+		SG_LOG(1, sfp, "%s: align_sz=%d too big\n", __func__, align_sz);
 		goto b4_alloc_pages;
 	}
 	rem_sz = align_sz;
@@ -6654,13 +6435,11 @@ sg_mk_sgat(struct sg_request *srp, struct sg_fd *sfp, int minlen)
 		*pgp = alloc_pages(mask_ap, order);
 		if (unlikely(!*pgp))
 			goto err_out;
-		SG_LOG(6, sfp, "%s: elem_sz=%d [0x%pK ++]\n", __func__,
-		       elem_sz, *pgp);
+		SG_LOG(6, sfp, "%s: elem_sz=%d [0x%pK ++]\n", __func__, elem_sz, *pgp);
 	}
 	k = pgp - schp->pages;
 	SG_LOG(((order != o_order || rem_sz > 0) ? 2 : 5), sfp,
-	       "%s: num_sgat=%d, order=%d,%d  rem_sz=%d\n", __func__, k,
-	       o_order, order, rem_sz);
+	       "%s: num_sgat=%d, order=%d,%d  rem_sz=%d\n", __func__, k, o_order, order, rem_sz);
 	schp->page_order = order;
 	schp->num_sgat = k;
 	schp->buflen = align_sz;
@@ -6697,8 +6476,8 @@ sg_remove_sgat(struct sg_fd *sfp, struct sg_scatter_hold *schp)
 			continue;
 		__free_pages(p, schp->page_order);
 	}
-	SG_LOG(5, sfp, "%s: pg_order=%u, free pgs=0x%pK --\n", __func__,
-	       schp->page_order, schp->pages);
+	SG_LOG(5, sfp, "%s: pg_order=%u, free pgs=0x%pK --\n", __func__, schp->page_order,
+	       schp->pages);
 	kfree(schp->pages);
 }
 
@@ -6721,8 +6500,7 @@ sg_remove_srp(struct sg_request *srp)
 	if (sfp->tot_fd_thresh > 0) {
 		/* this is a subtraction, error if it goes negative */
 		if (atomic_add_negative(-schp->buflen, &sfp->sum_fd_dlens)) {
-			SG_LOG(2, sfp, "%s: logic error: this dlen > %s\n",
-			       __func__, "sum_fd_dlens");
+			SG_LOG(2, sfp, "%s: logic error: this dlen > sum_fd_dlens\n", __func__);
 			atomic_set(&sfp->sum_fd_dlens, 0);
 		}
 	}
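The guard in sg_remove_srp() subtracts a request's buffer length from the per-fd running total and resets the total to 0 if the result would go negative. The following is a user-space sketch of the same pattern using C11 atomics, not driver code. add_negative() and release_dlen() are hypothetical helpers, and add_negative() only approximates the kernel's atomic_add_negative(). As in the hunk above, the check followed by the reset is not a single atomic step.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Approximates the kernel's atomic_add_negative(): add i to *v atomically
 * and report whether the resulting value is negative. */
static bool add_negative(int i, atomic_int *v)
{
	return atomic_fetch_add(v, i) + i < 0;
}

/* Subtract buflen from the running total; on underflow log and clamp to 0. */
static void release_dlen(atomic_int *sum_fd_dlens, int buflen)
{
	assert(buflen >= 0);
	if (add_negative(-buflen, sum_fd_dlens)) {
		fprintf(stderr, "logic error: this dlen > sum_fd_dlens\n");
		atomic_store(sum_fd_dlens, 0);	/* separate step: not atomic with the add */
	}
}
```

For example, with a total of 100, releasing 40 leaves 60. Releasing 90 after that underflows, so the total is reset to 0.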
@@ -6730,9 +6508,8 @@ sg_remove_srp(struct sg_request *srp)
 }
 
 /*
- * For sg v1 and v2 interface: with a command yielding a data-in buffer, after
- * it has arrived in kernel memory, this function copies it to the user space,
- * appended to given struct sg_header object.
+ * For sg v1 and v2 interface: with a command yielding a data-in buffer, after it has arrived in
+ * kernel memory, this function copies it to user space, appended to the given sg_header object.
  */
 static int
 sg_read_append(struct sg_request *srp, void __user *outp, int num_xfer)
@@ -6770,12 +6547,11 @@ sg_read_append(struct sg_request *srp, void __user *outp, int num_xfer)
 }
 
 /*
- * If there are many requests outstanding, the speed of this function is
- * important. 'id' is pack_id when is_tag=false, otherwise it is a tag. Both
- * SG_PACK_ID_WILDCARD and SG_TAG_WILDCARD are -1 and that case is typically
- * the fast path. This function is only used in the non-blocking cases.
- * Returns pointer to (first) matching sg_request or NULL. If found,
- * sg_request state is moved from SG_RQ_AWAIT_RCV to SG_RQ_BUSY.
+ * If there are many requests outstanding, the speed of this function is important. 'id' is pack_id
+ * when is_tag=false, otherwise it is a tag. Both SG_PACK_ID_WILDCARD and SG_TAG_WILDCARD are -1
+ * and that case is typically the fast path. This function is only used in the non-blocking cases.
+ * Returns pointer to (first) matching sg_request or NULL. If found, sg_request state is moved
+ * from SG_RQ_AWAIT_RCV to SG_RQ_BUSY.
  */
 static struct sg_request *
 sg_find_srp_by_id(struct sg_fd *sfp, int id, bool is_tag)
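The id matching described in the comment above sg_find_srp_by_id() can be illustrated with a small user-space sketch. It is not the driver's implementation, and several parts are hypothetical stand-ins:

- struct req_sketch, find_by_id() and WILDCARD_ID replace the driver's own types and helpers.
- A plain array replaces the per-fd xarray.
- The SG_RQ_AWAIT_RCV to SG_RQ_BUSY transition is modelled as clearing a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Both SG_PACK_ID_WILDCARD and SG_TAG_WILDCARD are -1 per the comment above. */
#define WILDCARD_ID (-1)

/* Hypothetical stand-in for the few sg_request fields the match needs. */
struct req_sketch {
	int pack_id;
	int tag;
	bool await_rcv;		/* true models state SG_RQ_AWAIT_RCV */
};

/* Return the first request awaiting receipt whose tag (is_tag) or pack_id
 * equals id, or NULL. id == -1 matches any awaiting request. */
static struct req_sketch *find_by_id(struct req_sketch *arr, size_t n, int id, bool is_tag)
{
	assert(arr != NULL || n == 0);
	for (size_t k = 0; k < n; ++k) {
		struct req_sketch *r = &arr[k];

		if (!r->await_rcv)
			continue;
		if (id == WILDCARD_ID || id == (is_tag ? r->tag : r->pack_id)) {
			r->await_rcv = false;	/* models move to SG_RQ_BUSY */
			return r;
		}
	}
	return NULL;
}
```

Because a matched request leaves the "awaiting" state, a second lookup for the same id returns NULL. This mirrors the driver comment's point that each completed request is handed out once.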
@@ -6859,16 +6635,15 @@ sg_find_srp_by_id(struct sg_fd *sfp, int id, bool is_tag)
 	}
 	return NULL;
 good:
-	SG_LOG(5, sfp, "%s: %s%d found [srp=0x%pK]\n", __func__,
-	       (is_tag ? "tag=" : "pack_id="), id, srp);
+	SG_LOG(5, sfp, "%s: %s%d found [srp=0x%pK]\n", __func__, (is_tag ? "tag=" : "pack_id="),
+	       id, srp);
 	return srp;
 }
 
 /*
- * Makes a new sg_request object. If 'first' is set then use GFP_KERNEL which
- * may take time but has improved chance of success, otherwise use GFP_ATOMIC.
- * Note that basic initialization is done but srp is not added to either sfp
- * list. On error returns twisted negated errno value (not NULL).
+ * Makes a new sg_request object. If 'first' is set then use GFP_KERNEL which may take time but has
+ * improved chance of success, otherwise use GFP_ATOMIC. Note that basic initialization is done but
+ * srp is not added to either sfp list. On error returns twisted negated errno value (not NULL).
  * N.B. Initializes new srp state to SG_RQ_BUSY.
  */
 static struct sg_request *
@@ -6912,11 +6687,10 @@ sg_mk_srp_sgat(struct sg_fd *sfp, bool first, int db_len)
 }
 
 /*
- * Irrespective of the given reserve request size, the minimum size requested
- * will be PAGE_SIZE (often 4096 bytes). Returns a pointer to reserve object or
- * a negated errno value twisted by ERR_PTR() macro. The actual number of bytes
- * allocated (maybe less than buflen) is in srp->sgatp->buflen . Note that this
- * function is only called in contexts where locking is not required.
+ * Irrespective of the given reserve request size, the minimum size requested will be PAGE_SIZE
+ * (often 4096 bytes). Returns a pointer to reserve object or a negated errno value twisted by
+ * ERR_PTR() macro. The actual number of bytes allocated (maybe less than buflen) is in
+ * srp->sgatp->buflen. NB this function is only called in contexts where locking is not required.
  */
 static struct sg_request *
 sg_build_reserve(struct sg_fd *sfp, int buflen)
@@ -6946,9 +6720,8 @@ sg_build_reserve(struct sg_fd *sfp, int buflen)
 		res = sg_mk_sgat(srp, sfp, buflen);
 		if (likely(res == 0)) {
 			*rapp = srp;
-			SG_LOG(4, sfp,
-			       "%s: rsv%d: final buflen=%d, srp=0x%pK ++\n",
-			       __func__, idx, buflen, srp);
+			SG_LOG(4, sfp, "%s: rsv%d: final buflen=%d, srp=0x%pK ++\n", __func__,
+			       idx, buflen, srp);
 			return srp;
 		}
 		if (go_out) {
@@ -7070,12 +6843,11 @@ sg_setup_req_new_srp(struct sg_comm_wr_t *cwrp, bool new_rsv_srp, bool no_reqs,
 }
 
 /*
- * Setup an active request (soon to carry a SCSI command) to the current file
- * descriptor by creating a new one or re-using a request marked inactive.
- * If successful returns a valid pointer to a sg_request object which is in
- * the SG_RQ_BUSY state. On failure returns a negated errno value twisted by
- * ERR_PTR() macro. Note that once a file share is established, the read-side
- * side's reserve request can only be used in a request share.
+ * Set up an active request (soon to carry a SCSI command) to the current file descriptor by
+ * creating a new one or re-using a request marked inactive. If successful returns a valid pointer
+ * to a sg_request object which is in the SG_RQ_BUSY state. On failure returns a negated errno
+ * value twisted by ERR_PTR() macro. Note that once a file share is established, the read-side's
+ * reserve request can only be used in a request share.
  */
 static struct sg_request *
 sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var)
@@ -7126,6 +6898,7 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var)
 			if (unlikely(res)) {
 				r_srp = NULL;
 			} else {
+				atomic_dec(&fp->inactives);
 				r_srp->sh_srp = NULL;
 				mk_new_srp = false;
 			}
@@ -7148,9 +6921,7 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var)
 			r_srp = rs_rsv_srp;
 			goto err_out;
 		}
-		/* write-side dlen may be <= read-side's dlen */
-		if (unlikely(dlen + cwrp->wr_offset >
-			     rs_rsv_srp->sgatp->dlen)) {
+		if (unlikely(dlen + cwrp->wr_offset > rs_rsv_srp->sgatp->dlen)) {
 			SG_LOG(1, fp, "%s: bad, write-side dlen [%d] > read-side's\n",
 			       __func__, dlen);
 			r_srp = ERR_PTR(-E2BIG);
@@ -7332,19 +7103,17 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var)
 			SG_LOG(1, fp, "%s: err=%d\n", __func__, err);
 	} else {
 		SG_LOG(4, fp, "%s: %s %sr_srp=0x%pK\n", __func__, cp,
-		       sg_get_rsv_str_lck(r_srp, "[", "] ", sizeof(b), b),
-		       r_srp);
+		       sg_get_rsv_str_lck(r_srp, "[", "] ", sizeof(b), b), r_srp);
 	}
 #endif
 	return r_srp;
 }
 
 /*
- * Sets srp to SG_RQ_INACTIVE unless it was in SG_RQ_SHR_SWAP state. Also
- * change the associated xarray entry flags to be consistent with
- * SG_RQ_INACTIVE. Since this function can be called from many contexts,
- * then assume no xa locks held.
- * The state machine should insure that two threads should never race here.
+ * Sets srp to SG_RQ_INACTIVE unless it was in SG_RQ_SHR_SWAP state. Also change the associated
+ * xarray entry flags to be consistent with SG_RQ_INACTIVE. Since this function can be called from
+ * many contexts, then assume no xa locks held.
+ * The state machine should ensure that two threads never race here.
  */
 static void
 sg_deact_request(struct sg_fd *sfp, struct sg_request *srp)
@@ -7360,10 +7129,7 @@ sg_deact_request(struct sg_fd *sfp, struct sg_request *srp)
 	srp->sense_bp = NULL;
 	sr_st = atomic_read_acquire(&srp->rq_st);
 	if (sr_st != SG_RQ_SHR_SWAP) {
-		/*
-		 * Can be called from many contexts and it is hard to know
-		 * whether xa locks held. So assume not.
-		 */
+		/* Called from many contexts, don't know whether xa locks held. So assume not. */
 		sg_rq_chg_state_force(srp, SG_RQ_INACTIVE);
 		atomic_inc(&sfp->inactives);
 		is_rsv = test_bit(SG_FRQ_RESERVED, srp->frq_bm);
@@ -7416,11 +7182,10 @@ sg_add_sfp(struct sg_device *sdp, struct file *filp)
 	atomic_set(&sfp->waiting, 0);
 	atomic_set(&sfp->inactives, 0);
 	/*
-	 * SG_SCATTER_SZ initializes scatter_elem_sz but different value may
-	 * be given as driver/module parameter (e.g. 'scatter_elem_sz=8192').
-	 * Any user provided number will be changed to be PAGE_SIZE as a
-	 * minimum, otherwise it will be rounded down (if required) to a
-	 * power of 2. So it will always be a power of 2.
+	 * SG_SCATTER_SZ initializes scatter_elem_sz but a different value may be given as driver
+	 * or module parameter (e.g. 'scatter_elem_sz=8192'). Any user provided number will be
+	 * changed to be PAGE_SIZE as a minimum, otherwise it will be rounded down (if required)
+	 * to a power of 2. So it will always be a power of 2.
 	 */
 	sfp->sgat_elem_sz = scatter_elem_sz;
 	sfp->parentdp = sdp;
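The sg_add_sfp() comment states a rule for scatter_elem_sz: the value is raised to at least PAGE_SIZE, and otherwise rounded down to a power of 2. That rule can be sketched as below. SKETCH_PAGE_SIZE (assumed to be 4096) and both helpers are hypothetical and for illustration only. The driver code that actually applies this rule is not part of these hunks.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096	/* assumption: a common PAGE_SIZE */

/* Round n (> 0) down to a power of 2, similar in effect to the kernel's
 * rounddown_pow_of_two(). */
static unsigned int rounddown_pow2(unsigned int n)
{
	assert(n > 0);
	while (n & (n - 1))
		n &= n - 1;	/* clear the lowest set bit until only one bit remains */
	return n;
}

/* Apply the rule from the comment: PAGE_SIZE minimum, else power of 2. */
static unsigned int normalize_elem_sz(int requested)
{
	if (requested < SKETCH_PAGE_SIZE)
		return SKETCH_PAGE_SIZE;
	return rounddown_pow2((unsigned int)requested);
}
```

With these assumptions, 'scatter_elem_sz=8192' is kept as is, 10000 becomes 8192, and anything below 4096 becomes 4096. The result is always a power of 2 because PAGE_SIZE itself is one.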
@@ -7439,22 +7204,19 @@ sg_add_sfp(struct sg_device *sdp, struct file *filp)
 		srp = sg_build_reserve(sfp, rbuf_len);
 		if (IS_ERR(srp)) {
 			err = PTR_ERR(srp);
-			SG_LOG(1, sfp, "%s: build reserve err=%ld\n", __func__,
-			       -err);
+			SG_LOG(1, sfp, "%s: build reserve err=%ld\n", __func__, -err);
 			kfree(sfp);
 			return ERR_PTR(err);
 		}
 		if (srp->sgatp->buflen < rbuf_len) {
 			reduced = true;
-			SG_LOG(2, sfp,
-			       "%s: reserve reduced from %d to buflen=%d\n",
-			       __func__, rbuf_len, srp->sgatp->buflen);
+			SG_LOG(2, sfp, "%s: reserve reduced from %d to buflen=%d\n", __func__,
+			       rbuf_len, srp->sgatp->buflen);
 		}
 		xa_lock_irqsave(xafp, iflags);
 		res = __xa_alloc(xafp, &idx, srp, xa_limit_32b, GFP_ATOMIC);
 		if (res < 0) {
-			SG_LOG(1, sfp, "%s: xa_alloc(srp) bad, errno=%d\n",
-			       __func__,  -res);
+			SG_LOG(1, sfp, "%s: xa_alloc(srp) bad, errno=%d\n", __func__, -res);
 			xa_unlock_irqrestore(xafp, iflags);
 			sg_remove_srp(srp);
 			kfree(srp);
@@ -7468,17 +7230,15 @@ sg_add_sfp(struct sg_device *sdp, struct file *filp)
 		atomic_inc(&sfp->inactives);
 		xa_unlock_irqrestore(xafp, iflags);
 	}
-	if (!reduced) {
-		SG_LOG(4, sfp, "%s: built reserve buflen=%d\n", __func__,
-		       rbuf_len);
-	}
+	if (!reduced)
+		SG_LOG(4, sfp, "%s: built reserve buflen=%d\n", __func__, rbuf_len);
 	xadp = &sdp->sfp_arr;
 	xa_lock_irqsave(xadp, iflags);
 	res = __xa_alloc(xadp, &idx, sfp, xa_limit_32b, GFP_ATOMIC);
 	if (unlikely(res < 0)) {
 		xa_unlock_irqrestore(xadp, iflags);
-		pr_warn("%s: xa_alloc(sdp) bad, o_count=%d, errno=%d\n",
-			__func__, atomic_read(&sdp->open_cnt), -res);
+		pr_warn("%s: xa_alloc(sdp) bad, o_count=%d, errno=%d\n", __func__,
+			atomic_read(&sdp->open_cnt), -res);
 		if (srp) {
 			sg_remove_srp(srp);
 			kfree(srp);
@@ -7496,13 +7256,12 @@ sg_add_sfp(struct sg_device *sdp, struct file *filp)
 }
 
 /*
- * A successful call to sg_release() will result, at some later time, to this
- * "user context" function being invoked. All requests associated with this
- * file descriptor should be completed or cancelled when this function is
- * called (due to sfp->f_ref). Also the file descriptor itself has not been
- * accessible since it was list_del()-ed by the preceding sg_remove_sfp()
- * call. So no locking is required. sdp should never be NULL but to make
- * debugging more robust, this function will not blow up in that case.
+ * A successful call to sg_release() will result, at some later time, in this "user context"
+ * function being invoked. All requests associated with this file descriptor should be completed
+ * or cancelled when this function is called (due to sfp->f_ref). Also the file descriptor itself
+ * has not been accessible since it was list_del()-ed by the preceding sg_remove_sfp() call. So
+ * no locking is required. sdp should never be NULL but to make debugging more robust, this
+ * function will not blow up in that case.
  */
 static void
 sg_uc_remove_sfp(struct work_struct *work)
@@ -7538,15 +7297,13 @@ sg_uc_remove_sfp(struct work_struct *work)
 		e_srp = __xa_erase(xafp, srp->rq_idx);
 		xa_unlock_irqrestore(xafp, iflags);
 		if (unlikely(srp != e_srp))
-			SG_LOG(1, sfp, "%s: xa_erase() return unexpected\n",
-			       __func__);
+			SG_LOG(1, sfp, "%s: xa_erase() return unexpected\n", __func__);
 		SG_LOG(6, sfp, "%s: kfree: srp=%pK --\n", __func__, srp);
 		kfree(srp);
 	}
 	subm = atomic_read(&sfp->submitted);
 	if (subm != 0)
-		SG_LOG(1, sfp, "%s: expected submitted=0 got %d\n",
-		       __func__, subm);
+		SG_LOG(1, sfp, "%s: expected submitted=0 got %d\n", __func__, subm);
 	if (sfp->efd_ctxp)
 		eventfd_ctx_put(sfp->efd_ctxp);
 	xa_destroy(xafp);
@@ -7555,11 +7312,9 @@ sg_uc_remove_sfp(struct work_struct *work)
 	e_sfp = __xa_erase(xadp, sfp->idx);
 	xa_unlock_irqrestore(xadp, iflags);
 	if (unlikely(sfp != e_sfp))
-		SG_LOG(1, sfp, "%s: xa_erase() return unexpected\n",
-		       __func__);
+		SG_LOG(1, sfp, "%s: xa_erase() return unexpected\n", __func__);
 	o_count = atomic_dec_return(&sdp->open_cnt);
-	SG_LOG(3, sfp, "%s: dev o_count after=%d: sfp=0x%pK --\n", __func__,
-	       o_count, sfp);
+	SG_LOG(3, sfp, "%s: dev o_count after=%d: sfp=0x%pK --\n", __func__, o_count, sfp);
 	kfree(sfp);
 
 	scsi_device_put(sdp->device);
@@ -7598,8 +7353,9 @@ sg_get_dev(int min_dev)
 	if (unlikely(!sdp))
 		sdp = ERR_PTR(-ENXIO);
 	else if (SG_IS_DETACHING(sdp)) {
-		/* If detaching, then the refcount may already be 0, in
-		 * which case it would be a bug to do kref_get().
+		/*
+		 * If detaching, then the refcount may already be 0, in which case it would
+		 * be a bug to do kref_get().
 		 */
 		sdp = ERR_PTR(-ENODEV);
 	} else
@@ -7756,8 +7512,7 @@ sg_proc_single_open_adio(struct inode *inode, struct file *filp)
 
 /* Kept for backward compatibility. sg_allow_dio is now ignored. */
 static ssize_t
-sg_proc_write_adio(struct file *filp, const char __user *buffer,
-		   size_t count, loff_t *off)
+sg_proc_write_adio(struct file *filp, const char __user *buffer, size_t count, loff_t *off)
 {
 	int err;
 	unsigned long num;
@@ -7778,8 +7533,7 @@ sg_proc_single_open_dressz(struct inode *inode, struct file *filp)
 }
 
 static ssize_t
-sg_proc_write_dressz(struct file *filp, const char __user *buffer,
-		     size_t count, loff_t *off)
+sg_proc_write_dressz(struct file *filp, const char __user *buffer, size_t count, loff_t *off)
 {
 	int err;
 	unsigned long k = ULONG_MAX;
@@ -7849,8 +7603,8 @@ sg_proc_seq_show_devstrs(struct seq_file *s, void *v)
 	sdp = it ? sg_lookup_dev(it->index) : NULL;
 	scsidp = sdp ? sdp->device : NULL;
 	if (sdp && scsidp && !SG_IS_DETACHING(sdp))
-		seq_printf(s, "%8.8s\t%16.16s\t%4.4s\n",
-			   scsidp->vendor, scsidp->model, scsidp->rev);
+		seq_printf(s, "%8.8s\t%16.16s\t%4.4s\n", scsidp->vendor, scsidp->model,
+			   scsidp->rev);
 	else
 		seq_puts(s, "<no active device>\n");
 	read_unlock_irqrestore(&sg_index_lock, iflags);
@@ -7879,26 +7633,22 @@ sg_proc_debug_sreq(struct sg_request *srp, int to, bool t_in_ns, bool inactive,
 	is_v3v4 = v4 ? true : (srp->s_hdr3.interface_id != '\0');
 	sg_get_rsv_str(srp, "     ", "", sizeof(b), b);
 	if (strlen(b) > 5)
-		cp = (is_v3v4 && (srp->rq_flags & SG_FLAG_MMAP_IO)) ?
-					" mmap" : "";
+		cp = (is_v3v4 && (srp->rq_flags & SG_FLAG_MMAP_IO)) ? " mmap" : "";
 	else
 		cp = (srp->rq_info & SG_INFO_DIRECT_IO_MASK) ? " dio" : "";
 	rq_st = atomic_read(&srp->rq_st);
 	dur = sg_get_dur(srp, &rq_st, t_in_ns, &is_dur);
-	n += scnprintf(obp + n, len - n, "%s%s>> %s:%d dlen=%d/%d id=%d", b,
-		       cp, sg_rq_st_str(rq_st, false), srp->rq_idx, srp->sgatp->dlen,
+	n += scnprintf(obp + n, len - n, "%s%s>> %s:%d dlen=%d/%d id=%d", b, cp,
+		       sg_rq_st_str(rq_st, false), srp->rq_idx, srp->sgatp->dlen,
 		       srp->sgatp->buflen, (int)srp->pack_id);
 	if (test_bit(SG_FFD_NO_DURATION, srp->parentfp->ffd_bm))
 		;
 	else if (is_dur)	/* cmd/req has completed, waiting for ... */
 		n += scnprintf(obp + n, len - n, " dur=%u%s", dur, tp);
 	else if (dur < U32_MAX) { /* in-flight or busy (so ongoing) */
-		if ((srp->rq_flags & SGV4_FLAG_YIELD_TAG) &&
-		    srp->tag != SG_TAG_WILDCARD)
-			n += scnprintf(obp + n, len - n, " tag=0x%x",
-				       srp->tag);
-		n += scnprintf(obp + n, len - n, " t_o/elap=%us/%u%s",
-			       to / 1000, dur, tp);
+		if ((srp->rq_flags & SGV4_FLAG_YIELD_TAG) && srp->tag != SG_TAG_WILDCARD)
+			n += scnprintf(obp + n, len - n, " tag=0x%x", srp->tag);
+		n += scnprintf(obp + n, len - n, " t_o/elap=%us/%u%s", to / 1000, dur, tp);
 	}
 	if (srp->sh_var != SG_SHR_NONE)
 		n += scnprintf(obp + n, len - n, " shr=%s",
@@ -7919,8 +7669,7 @@ sg_proc_debug_sreq(struct sg_request *srp, int to, bool t_in_ns, bool inactive,
 
 /* Writes debug info for one sg fd (including its sg requests) in obp buffer */
 static int
-sg_proc_debug_fd(struct sg_fd *fp, char *obp, int len, unsigned long idx,
-		 bool reduced)
+sg_proc_debug_fd(struct sg_fd *fp, char *obp, int len, unsigned long idx, bool reduced)
 {
 	bool set_debug;
 	bool t_in_ns = test_bit(SG_FFD_TIME_IN_NS, fp->ffd_bm);
@@ -7932,16 +7681,15 @@ sg_proc_debug_fd(struct sg_fd *fp, char *obp, int len, unsigned long idx,
 	struct sg_device *sdp = fp->parentdp;
 
 	if (sg_fd_is_shared(fp))
-		cp = xa_get_mark(&sdp->sfp_arr, fp->idx, SG_XA_FD_RS_SHARE) ?
-			" shr_rs" : " shr_rs";
+		cp = xa_get_mark(&sdp->sfp_arr, fp->idx, SG_XA_FD_RS_SHARE) ? " shr_rs" :
+									      " shr_ws";
 	else
 		cp = "";
 	set_debug = test_bit(SG_FDEV_LOG_SENSE, sdp->fdev_bm);
 	/* sgat=-1 means unavailable */
 	to = (fp->timeout >= 0) ? jiffies_to_msecs(fp->timeout) : -999;
 	if (to < 0)
-		n += scnprintf(obp + n, len - n, "BAD timeout=%d",
-			       fp->timeout);
+		n += scnprintf(obp + n, len - n, "BAD timeout=%d", fp->timeout);
 	else if (to % 1000)
 		n += scnprintf(obp + n, len - n, "timeout=%dms rs", to);
 	else
@@ -7965,8 +7713,8 @@ sg_proc_debug_fd(struct sg_fd *fp, char *obp, int len, unsigned long idx,
 		       atomic_read(&fp->sum_fd_dlens));
 	n += scnprintf(obp + n, len - n,
 		       "   submitted=%d waiting=%d inactives=%d   open thr_id=%d\n",
-		       atomic_read(&fp->submitted),
-		       atomic_read(&fp->waiting), atomic_read(&fp->inactives), fp->tid);
+		       atomic_read(&fp->submitted), atomic_read(&fp->waiting),
+		       atomic_read(&fp->inactives), fp->tid);
 	if (reduced)
 		return n;
 	k = 0;
@@ -7978,11 +7726,9 @@ sg_proc_debug_fd(struct sg_fd *fp, char *obp, int len, unsigned long idx,
 		if (xa_get_mark(&fp->srp_arr, idx, SG_XA_RQ_INACTIVE))
 			continue;
 		if (set_debug)
-			n += scnprintf(obp + n, len - n, "     rq_bm=0x%lx",
-				       srp->frq_bm[0]);
+			n += scnprintf(obp + n, len - n, "     rq_bm=0x%lx", srp->frq_bm[0]);
 		else if (test_bit(SG_FRQ_ABORTING, srp->frq_bm))
-			n += scnprintf(obp + n, len - n,
-				       "     abort>> ");
+			n += scnprintf(obp + n, len - n, "     abort>> ");
 		n += sg_proc_debug_sreq(srp, fp->timeout, t_in_ns, false, obp + n, len - n);
 		++k;
 		if ((k % 8) == 0) {	/* don't hold up isr_s too long */
@@ -7998,8 +7744,7 @@ sg_proc_debug_fd(struct sg_fd *fp, char *obp, int len, unsigned long idx,
 		if (k == 0)
 			n += scnprintf(obp + n, len - n, "   Inactives:\n");
 		if (set_debug)
-			n += scnprintf(obp + n, len - n, "     rq_bm=0x%lx",
-				       srp->frq_bm[0]);
+			n += scnprintf(obp + n, len - n, "     rq_bm=0x%lx", srp->frq_bm[0]);
 		n += sg_proc_debug_sreq(srp, fp->timeout, t_in_ns, true, obp + n, len - n);
 		++k;
 		if ((k % 8) == 0) {	/* don't hold up isr_s too long */
@@ -8014,8 +7759,7 @@ sg_proc_debug_fd(struct sg_fd *fp, char *obp, int len, unsigned long idx,
 
 /* Writes debug info for one sg device (including its sg fds) in obp buffer */
 static int
-sg_proc_debug_sdev(struct sg_device *sdp, char *obp, int len,
-		   int *fd_counterp, bool reduced)
+sg_proc_debug_sdev(struct sg_device *sdp, char *obp, int len, int *fd_counterp, bool reduced)
 {
 	int n = 0;
 	int my_count = 0;
@@ -8028,10 +7772,9 @@ sg_proc_debug_sdev(struct sg_device *sdp, char *obp, int len,
 	countp = fd_counterp ? fd_counterp : &my_count;
 	disk_name = (sdp->disk ? sdp->disk->disk_name : "?_?");
 	n += scnprintf(obp + n, len - n, " >>> device=%s ", disk_name);
-	n += scnprintf(obp + n, len - n, "%d:%d:%d:%llu ", ssdp->host->host_no,
-		       ssdp->channel, ssdp->id, ssdp->lun);
-	n += scnprintf(obp + n, len - n,
-		       "  max_sgat_sz,elems=2^%d,%d excl=%d open_cnt=%d\n",
+	n += scnprintf(obp + n, len - n, "%d:%d:%d:%llu ", ssdp->host->host_no, ssdp->channel,
+		       ssdp->id, ssdp->lun);
+	n += scnprintf(obp + n, len - n, "  max_sgat_sz,elems=2^%d,%d excl=%d open_cnt=%d\n",
 		       ilog2(sdp->max_sgat_sz), sdp->max_sgat_elems,
 		       SG_HAVE_EXCLUDE(sdp), atomic_read(&sdp->open_cnt));
 	xa_for_each(&sdp->sfp_arr, idx, fp) {
@@ -8062,13 +7805,12 @@ sg_proc_seq_show_debug(struct seq_file *s, void *v, bool reduced)
 
 	b1[0] = '\0';
 	if (it && it->index == 0)
-		seq_printf(s, "max_active_device=%d  def_reserved_size=%d\n",
-			   (int)it->max, def_reserved_size);
+		seq_printf(s, "max_active_device=%d  def_reserved_size=%d\n", (int)it->max,
+			   def_reserved_size);
 	fdi_p = it ? &it->fd_index : &k;
 	bp = kzalloc(bp_len, __GFP_NOWARN | GFP_KERNEL);
 	if (unlikely(!bp)) {
-		seq_printf(s, "%s: Unable to allocate %d on heap, finish\n",
-			   __func__, bp_len);
+		seq_printf(s, "%s: Unable to allocate %d on heap, finish\n", __func__, bp_len);
 		return -ENOMEM;
 	}
 	read_lock_irqsave(&sg_index_lock, iflags);
@@ -8102,19 +7844,17 @@ sg_proc_seq_show_debug(struct seq_file *s, void *v, bool reduced)
 		found = true;
 		disk_name = (sdp->disk ? sdp->disk->disk_name : "?_?");
 		if (SG_IS_DETACHING(sdp)) {
-			snprintf(b1, sizeof(b1), " >>> %s %s\n", disk_name,
-				 "detaching pending close\n");
+			snprintf(b1, sizeof(b1), " >>> detaching pending close %s\n", disk_name);
 		} else if (sdp->device) {
-			n = sg_proc_debug_sdev(sdp, bp, bp_len, fdi_p,
-					       reduced);
+			n = sg_proc_debug_sdev(sdp, bp, bp_len, fdi_p, reduced);
 			if (n >= bp_len - 1) {
 				trunc = true;
 				if (bp[bp_len - 2] != '\n')
 					bp[bp_len - 2] = '\n';
 			}
 		} else {
-			snprintf(b1, sizeof(b1), " >>> device=%s  %s\n",
-				 disk_name, "sdp->device==NULL, skip");
+			snprintf(b1, sizeof(b1), " >>> device=%s sdp->device==NULL, skip\n",
+				 disk_name);
 		}
 	}
 skip:
@@ -8125,8 +7865,7 @@ sg_proc_seq_show_debug(struct seq_file *s, void *v, bool reduced)
 			if (seq_has_overflowed(s))
 				goto s_ovfl;
 			if (trunc)
-				seq_printf(s, "   >> Output truncated %s\n",
-					   "due to buffer size");
+				seq_puts(s, "   >> Output truncated due to buffer size\n");
 		} else if (b1[0]) {
 			seq_puts(s, b1);
 			if (unlikely(seq_has_overflowed(s)))
@@ -8252,8 +7991,7 @@ sg_dfs_snapped_show(void *data, struct seq_file *m)
 }
 
 static ssize_t
-sg_dfs_snapped_write(void *data, const char __user *buf, size_t count,
-		     loff_t *ppos)
+sg_dfs_snapped_write(void *data, const char __user *buf, size_t count, loff_t *ppos)
 {
 	/* Any write clears snapped buffer */
 	mutex_lock(&snapped_mutex);
@@ -8292,8 +8030,7 @@ sg_dfs_snapshot_devs_show(void *data, struct seq_file *m)
 }
 
 static ssize_t
-sg_dfs_snapshot_devs_write(void *data, const char __user *buf, size_t count,
-			   loff_t *ppos)
+sg_dfs_snapshot_devs_write(void *data, const char __user *buf, size_t count, loff_t *ppos)
 {
 	bool trailing_comma;
 	int k, n;
@@ -8322,8 +8059,7 @@ sg_dfs_snapshot_devs_write(void *data, const char __user *buf, size_t count,
 	if (n == 0) {
 		return -EINVAL;
 	} else if (k >= SG_SNAPSHOT_DEV_MAX && trailing_comma) {
-		pr_err("%s: only %d elements in snapshot array\n", __func__,
-		       SG_SNAPSHOT_DEV_MAX);
+		pr_err("%s: only %d elements in snapshot array\n", __func__, SG_SNAPSHOT_DEV_MAX);
 		return -EINVAL;
 	}
 	if (n < SG_SNAPSHOT_DEV_MAX)
@@ -8341,16 +8077,15 @@ sg_dfs_show(struct seq_file *m, void *v)
 }
 
 static ssize_t
-sg_dfs_write(struct file *file, const char __user *buf, size_t count,
-	     loff_t *ppos)
+sg_dfs_write(struct file *file, const char __user *buf, size_t count, loff_t *ppos)
 {
 	struct seq_file *m = file->private_data;
 	const struct sg_dfs_attr *attr = m->private;
 	void *data = d_inode(file->f_path.dentry->d_parent)->i_private;
 
 	/*
-	 * Attributes that only implement .seq_ops are read-only and 'attr' is
-	 * the same with 'data' in this case.
+	 * Attributes that only implement .seq_ops are read-only and 'attr' is the same with
+	 * 'data' in this case.
 	 */
 	if (unlikely(attr == data || !attr->write))
 		return -EPERM;
@@ -8397,16 +8132,14 @@ static const struct file_operations sg_dfs_fops = {
 	.release	= sg_dfs_release,
 };
 
-static void sg_dfs_mk_files(struct dentry *parent, void *data,
-			    const struct sg_dfs_attr *attr)
+static void sg_dfs_mk_files(struct dentry *parent, void *data, const struct sg_dfs_attr *attr)
 {
 	if (IS_ERR_OR_NULL(parent))
 		return;
 
 	d_inode(parent)->i_private = data;
 	for (; attr->name; ++attr)
-		debugfs_create_file(attr->name, attr->mode, parent,
-				    (void *)attr, &sg_dfs_fops);
+		debugfs_create_file(attr->name, attr->mode, parent, (void *)attr, &sg_dfs_fops);
 }
 
 static const struct seq_operations sg_snapshot_seq_ops = {
@@ -8437,10 +8170,8 @@ sg_dfs_init(void)
 {
 	/* create and populate /sys/kernel/debug/scsi_generic directory */
 	if (!sg_dfs_cxt.dfs_rootdir) {
-		sg_dfs_cxt.dfs_rootdir = debugfs_create_dir("scsi_generic",
-							    NULL);
-		sg_dfs_mk_files(sg_dfs_cxt.dfs_rootdir, &sg_dfs_cxt,
-				sg_dfs_attrs);
+		sg_dfs_cxt.dfs_rootdir = debugfs_create_dir("scsi_generic", NULL);
+		sg_dfs_mk_files(sg_dfs_cxt.dfs_rootdir, &sg_dfs_cxt, sg_dfs_attrs);
 	}
 	sg_dfs_cxt.snapshot_devs[0] = -1;	/* show all sg devices */
 }
-- 
2.25.1



* [PATCH v18 76/83] sg: add no_attach_msg parameter
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
                   ` (75 preceding siblings ...)
  2021-04-27 21:57 ` [PATCH v18 75/83] sg: expand source line length to 100 characters Douglas Gilbert
@ 2021-04-27 21:57 ` Douglas Gilbert
  2021-04-27 21:57 ` [PATCH v18 77/83] sg: add SGV4_FLAG_REC_ORDER Douglas Gilbert
                   ` (6 subsequent siblings)
  83 siblings, 0 replies; 86+ messages in thread
From: Douglas Gilbert @ 2021-04-27 21:57 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

When testing, and with big storage arrays, having a log message for
each attached sg device becomes a little annoying. Still keep that
as the default, but when no_attach_msg=1 is given at driver or
module load time, attached sg device nodes are not reported to the
log.
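A hedged user-space sketch in C for checking the new parameter at run time. It assumes the patched sg driver is loaded as a module (for example with "modprobe sg no_attach_msg=1"), so that the parameter, registered below with mode 0644, shows up under /sys/module/sg/parameters/; bool module parameters read back as "Y" or "N". The helper names are hypothetical. Since the flag is only consulted in sg_add_device(), changing it at run time would presumably affect only devices attached afterwards.

```c
#include <stdio.h>

/*
 * Sketch: report whether the (patched) sg driver suppresses its
 * "Attached scsi generic sgN type T" log message. Assumes the module
 * parameter is exposed at this sysfs path (mode 0644 in the patch).
 */
#define SG_NO_ATTACH_MSG_PATH "/sys/module/sg/parameters/no_attach_msg"

/* Bool module parameters read back as "Y" or "N"; also accept 1/0 */
static int parse_param_bool(const char *s)
{
	switch (s[0]) {
	case 'Y': case 'y': case '1':
		return 1;
	case 'N': case 'n': case '0':
		return 0;
	default:
		return -1;	/* empty or unexpected content */
	}
}

/* Returns 1 if suppressed, 0 if not, -1 if the file cannot be read */
static int sg_attach_msg_suppressed(const char *path)
{
	char buf[8] = "";
	FILE *fp = fopen(path, "r");

	if (!fp)
		return -1;	/* sg not loaded, or kernel lacks this patch */
	if (!fgets(buf, sizeof(buf), fp))
		buf[0] = '\0';	/* empty file or read error */
	fclose(fp);
	return parse_param_bool(buf);
}
```

Typical use is sg_attach_msg_suppressed(SG_NO_ATTACH_MSG_PATH); a -1 result means the parameter could not be read at all.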

Re-order the other three driver/module load time parameters so
they appear in alphabetical order when viewed with modinfo.

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index a159af1e3ee6..a76ab2c59553 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -184,6 +184,7 @@ static int def_reserved_size = -1;	/* picks up init parameter */
 static int sg_allow_dio = SG_ALLOW_DIO_DEF;	/* ignored by code */
 
 static int scatter_elem_sz = SG_SCATTER_SZ;
+static bool no_attach_msg;
 
 #define SG_DEF_SECTOR_SZ 512
 
@@ -5929,8 +5930,9 @@ sg_add_device(struct device *cl_dev, struct class_interface *cl_intf)
 
 	sdp->create_ns = ktime_get_boottime_ns();
 	sg_calc_sgat_param(sdp);
-	sdev_printk(KERN_NOTICE, scsidp, "Attached scsi generic sg%d type %d\n", sdp->index,
-		    scsidp->type);
+	if (!no_attach_msg)
+		sdev_printk(KERN_NOTICE, scsidp, "Attached scsi generic sg%d type %d\n",
+			    sdp->index, scsidp->type);
 
 	dev_set_drvdata(cl_dev, sdp);
 	return 0;
@@ -8190,9 +8192,10 @@ static void sg_dfs_exit(void) {}
 
 #endif		/* CONFIG_DEBUG_FS */
 
-module_param_named(scatter_elem_sz, scatter_elem_sz, int, 0644);
-module_param_named(def_reserved_size, def_reserved_size, int, 0644);
 module_param_named(allow_dio, sg_allow_dio, int, 0644);
+module_param_named(def_reserved_size, def_reserved_size, int, 0644);
+module_param_named(no_attach_msg, no_attach_msg, bool, 0644);
+module_param_named(scatter_elem_sz, scatter_elem_sz, int, 0644);
 
 MODULE_AUTHOR("Douglas Gilbert");
 MODULE_DESCRIPTION("SCSI generic (sg) driver");
@@ -8200,8 +8203,9 @@ MODULE_LICENSE("GPL");
 MODULE_VERSION(SG_VERSION_STR);
 MODULE_ALIAS_CHARDEV_MAJOR(SCSI_GENERIC_MAJOR);
 
-MODULE_PARM_DESC(scatter_elem_sz, "scatter gather element size (default: max(SG_SCATTER_SZ, PAGE_SIZE))");
-MODULE_PARM_DESC(def_reserved_size, "size of buffer reserved for each fd");
 MODULE_PARM_DESC(allow_dio, "allow direct I/O (default: 0 (disallow)); now ignored");
+MODULE_PARM_DESC(def_reserved_size, "size of buffer reserved for each fd");
+MODULE_PARM_DESC(no_attach_msg, "don't log sg device attach message when 1 (def:0)");
+MODULE_PARM_DESC(scatter_elem_sz, "scatter gather element size (only powers of 2 >= PAGE_SIZE)");
 module_init(init_sg);
 module_exit(exit_sg);
-- 
2.25.1



* [PATCH v18 77/83] sg: add SGV4_FLAG_REC_ORDER
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
                   ` (76 preceding siblings ...)
  2021-04-27 21:57 ` [PATCH v18 76/83] sg: add no_attach_msg parameter Douglas Gilbert
@ 2021-04-27 21:57 ` Douglas Gilbert
  2021-04-27 21:57 ` [PATCH v18 78/83] sg: max to read for mrq sg_ioreceive Douglas Gilbert
                   ` (5 subsequent siblings)
  83 siblings, 0 replies; 86+ messages in thread
From: Douglas Gilbert @ 2021-04-27 21:57 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

By default, when ioctl(SG_IORECEIVE) is used in multiple requests
mode (mrq), the response array is built in completion order, and
completion order isn't necessarily submission order, which can be a
nuisance. This new flag allows the user to specify where (via an
index in the v4::request_priority field) a given request's response
will be placed in the response array associated with an mrq
ioctl(SG_IORECEIVE) call.
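A minimal, self-contained C simulation of the slot selection that sg_mrq_iorec_complets() performs after this patch. The structures, SKETCH_INFO_MRQ_FINI and place_responses() are hypothetical stand-ins; only the SGV4_FLAG_REC_ORDER value comes from the uapi hunk below. The sketch marks the slot actually written (idx), which is what patch 78 later changes the kernel code to do; this patch still ORs SG_INFO_MRQ_FINI into rsp_arr[k]. Per the sg_mrq_sanity() hunk, setting the flag on a non-svb control object propagates it to every request.

```c
#include <stdint.h>

#define SGV4_FLAG_REC_ORDER 0x100000	/* value from the uapi hunk */
#define SKETCH_INFO_MRQ_FINI 0x1	/* hypothetical stand-in for SG_INFO_MRQ_FINI */

/* Hypothetical, trimmed-down view of one completed request */
struct sketch_cmpl {
	uint32_t flags;			/* v4::flags as submitted */
	uint32_t request_priority;	/* wanted slot when REC_ORDER is set */
	int tag;			/* identifies the request in this sketch */
};

/* Hypothetical, trimmed-down response slot */
struct sketch_rsp {
	int tag;
	uint32_t info;
};

/*
 * Place n completions (given in completion order) into rsp_arr, which has
 * num_rsp slots, using the same idx selection as sg_mrq_iorec_complets().
 */
static void place_responses(const struct sketch_cmpl *c, int n,
			    struct sketch_rsp *rsp_arr, int num_rsp)
{
	for (int k = 0; k < n && k < num_rsp; ++k) {
		uint32_t idx;

		if (c[k].flags & SGV4_FLAG_REC_ORDER) {
			idx = c[k].request_priority;
			if (idx >= (uint32_t)num_rsp)
				idx = 0;	/* out of range: overwrite slot 0 */
		} else {
			idx = (uint32_t)k;	/* completion order */
		}
		rsp_arr[idx].tag = c[k].tag;
		rsp_arr[idx].info |= SKETCH_INFO_MRQ_FINI;
	}
}
```

For example, three requests submitted with request_priority 0, 1 and 2 that complete in the order 2, 0, 1 end up in slots 0, 1 and 2. An out-of-range index silently lands in slot 0, so user space would want to keep each request_priority below the response array size.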

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c      | 43 +++++++++++++++++++++++++++---------------
 include/uapi/scsi/sg.h |  1 +
 2 files changed, 29 insertions(+), 15 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index a76ab2c59553..37a3361dec31 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -1552,8 +1552,11 @@ sg_mrq_sanity(struct sg_mrq_hold *mhp, bool is_svb)
 			hp->response = cop->response;
 			hp->max_response_len = cop->max_response_len;
 		}
-		if (!is_svb)
+		if (!is_svb) {
+			if (cop->flags & SGV4_FLAG_REC_ORDER)
+				hp->flags |= SGV4_FLAG_REC_ORDER;
 			continue;
+		}
 		/* mrq share variable blocking (svb) additional constraints checked here */
 		if (unlikely(flags & (SGV4_FLAG_COMPLETE_B4 | SGV4_FLAG_KEEP_SHARE))) {
 			SG_LOG(1, sfp, "%s: %s %u: no KEEP_SHARE with svb\n", __func__, rip, k);
@@ -2775,7 +2778,7 @@ sg_common_write(struct sg_comm_wr_t *cwrp)
 		srp->s_hdr4.cmd_len = h4p->request_len;
 		srp->s_hdr4.dir = dir;
 		srp->s_hdr4.out_resid = 0;
-		srp->s_hdr4.mrq_ind = 0;
+		srp->s_hdr4.mrq_ind = (rq_flags & SGV4_FLAG_REC_ORDER) ? h4p->request_priority : 0;
 		if (dir == SG_DXFER_TO_DEV) {
 			srp->s_hdr4.wr_offset = cwrp->wr_offset;
 			srp->s_hdr4.wr_len = dlen;
@@ -3053,7 +3056,7 @@ sg_receive_v4(struct sg_fd *sfp, struct sg_request *srp, void __user *p, struct
 static int
 sg_mrq_iorec_complets(struct sg_fd *sfp, bool non_block, int max_mrqs, struct sg_io_v4 *rsp_arr)
 {
-	int k;
+	int k, idx;
 	int res = 0;
 	struct sg_request *srp;
 
@@ -3062,8 +3065,15 @@ sg_mrq_iorec_complets(struct sg_fd *sfp, bool non_block, int max_mrqs, struct sg
 		if (!sg_mrq_get_ready_srp(sfp, &srp))
 			break;
 		if (IS_ERR(srp))
-			return k ? k : PTR_ERR(srp);
-		res = sg_receive_v4(sfp, srp, NULL, rsp_arr + k);
+			return k ? k /* some but not all */ : PTR_ERR(srp);
+		if (srp->rq_flags & SGV4_FLAG_REC_ORDER) {
+			idx = srp->s_hdr4.mrq_ind;
+			if (idx >= max_mrqs)
+				idx = 0;	/* overwrite index 0 when trouble */
+		} else {
+			idx = k;	/* completion order */
+		}
+		res = sg_receive_v4(sfp, srp, NULL, rsp_arr + idx);
 		if (unlikely(res))
 			return res;
 		rsp_arr[k].info |= SG_INFO_MRQ_FINI;
@@ -3077,7 +3087,14 @@ sg_mrq_iorec_complets(struct sg_fd *sfp, bool non_block, int max_mrqs, struct sg
 			return res;	/* signal --> -ERESTARTSYS */
 		if (IS_ERR(srp))
 			return k ? k : PTR_ERR(srp);
-		res = sg_receive_v4(sfp, srp, NULL, rsp_arr + k);
+		if (srp->rq_flags & SGV4_FLAG_REC_ORDER) {
+			idx = srp->s_hdr4.mrq_ind;
+			if (idx >= max_mrqs)
+				idx = 0;
+		} else {
+			idx = k;
+		}
+		res = sg_receive_v4(sfp, srp, NULL, rsp_arr + idx);
 		if (unlikely(res))
 			return res;
 		rsp_arr[k].info |= SG_INFO_MRQ_FINI;
@@ -7619,7 +7636,8 @@ sg_proc_seq_show_devstrs(struct seq_file *s, void *v)
 
 /* Writes debug info for one sg_request in obp buffer */
 static int
-sg_proc_debug_sreq(struct sg_request *srp, int to, bool t_in_ns, bool inactive, char *obp, int len)
+sg_proc_debug_sreq(struct sg_request *srp, int to, bool t_in_ns, bool inactive, char *obp,
+		   int len)
 {
 	bool is_v3v4, v4, is_dur;
 	int n = 0;
@@ -7659,13 +7677,8 @@ sg_proc_debug_sreq(struct sg_request *srp, int to, bool t_in_ns, bool inactive,
 		n += scnprintf(obp + n, len - n, " sgat=%d", srp->sgatp->num_sgat);
 	cp = (srp->rq_flags & SGV4_FLAG_HIPRI) ? "hipri " : "";
 	n += scnprintf(obp + n, len - n, " %sop=0x%02x\n", cp, srp->cmd_opcode);
-	if (inactive && rq_st != SG_RQ_INACTIVE) {
-		if (xa_get_mark(&srp->parentfp->srp_arr, srp->rq_idx, SG_XA_RQ_INACTIVE))
-			cp = "still marked inactive, BAD";
-		else
-			cp = "no longer marked inactive";
-		n += scnprintf(obp + n, len - n, "       <<< xarray %s >>>\n", cp);
-	}
+	if (inactive && rq_st != SG_RQ_INACTIVE)
+		n += scnprintf(obp + n, len - n, "       <<< inconsistent state >>>\n");
 	return n;
 }
 
@@ -7749,7 +7762,7 @@ sg_proc_debug_fd(struct sg_fd *fp, char *obp, int len, unsigned long idx, bool r
 			n += scnprintf(obp + n, len - n, "     rq_bm=0x%lx", srp->frq_bm[0]);
 		n += sg_proc_debug_sreq(srp, fp->timeout, t_in_ns, true, obp + n, len - n);
 		++k;
-		if ((k % 8) == 0) {	/* don't hold up isr_s too long */
+		if ((k % 8) == 0) {	/* don't hold up things too long */
 			xa_unlock_irqrestore(&fp->srp_arr, iflags);
 			cpu_relax();
 			xa_lock_irqsave(&fp->srp_arr, iflags);
diff --git a/include/uapi/scsi/sg.h b/include/uapi/scsi/sg.h
index 236ac4678f71..871073d1a8d3 100644
--- a/include/uapi/scsi/sg.h
+++ b/include/uapi/scsi/sg.h
@@ -128,6 +128,7 @@ typedef struct sg_io_hdr {
 #define SGV4_FLAG_KEEP_SHARE 0x20000  /* ... buffer for another dout command */
 #define SGV4_FLAG_MULTIPLE_REQS 0x40000	/* 1 or more sg_io_v4-s in data-in */
 #define SGV4_FLAG_ORDERED_WR 0x80000	/* svb: issue in-order writes */
+#define SGV4_FLAG_REC_ORDER 0x100000 /* receive order in v4:request_priority */
 
 /* Output (potentially OR-ed together) in v3::info or v4::info field */
 #define SG_INFO_OK_MASK 0x1
-- 
2.25.1



* [PATCH v18 78/83] sg: max to read for mrq sg_ioreceive
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
                   ` (77 preceding siblings ...)
  2021-04-27 21:57 ` [PATCH v18 77/83] sg: add SGV4_FLAG_REC_ORDER Douglas Gilbert
@ 2021-04-27 21:57 ` Douglas Gilbert
  2021-04-27 21:57 ` [PATCH v18 79/83] sg: mrq: if uniform svb then re-use bio_s Douglas Gilbert
                   ` (4 subsequent siblings)
  83 siblings, 0 replies; 86+ messages in thread
From: Douglas Gilbert @ 2021-04-27 21:57 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

When using a multiple request (mrq) ioctl(SG_IORECEIVE), the size of
the supplied response array dictates an implicit maximum number of
responses that can be read by an invocation. An explicit maximum
number to read can be given in the control object's request_priority
field. A value of 0 in this field uses the implicit maximum value.

The mrq ioctl(SG_IORECEIVE) control object can now take the
SGV4_FLAG_IMMED flag; if so, only those responses associated with
already completed requests are reported.
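A hedged C sketch of how many responses one mrq ioctl(SG_IORECEIVE) returns under these rules, modelled on sg_mrq_iorec_complets(). Note that in the diff below the explicit maximum is read from cop->din_iovec_count rather than from request_priority as the text above says. Apart from max_rcv and num_rsp_arr, the names are hypothetical, and the counting helper assumes every in-flight request eventually completes, ignoring signals, errors and device detach.

```c
#include <stdbool.h>

/* A max_rcv of 0, or one larger than the response array, means "array size" */
static int clamp_max_rcv(int max_rcv, int num_rsp_arr)
{
	if (max_rcv == 0 || max_rcv > num_rsp_arr)
		return num_rsp_arr;
	return max_rcv;
}

/*
 * ready: requests already completed when the ioctl starts
 * outstanding: requests still in flight (hypothetical model inputs)
 */
static int responses_returned(int max_rcv, int num_rsp_arr, bool immed,
			      int ready, int outstanding)
{
	int limit = clamp_max_rcv(max_rcv, num_rsp_arr);
	int k = ready < limit ? ready : limit;	/* first loop: harvest ready ones */

	if (immed || k >= limit)
		return k;	/* SGV4_FLAG_IMMED: never wait */
	/* second loop: wait for more, up to limit, assuming all complete */
	k += outstanding < limit - k ? outstanding : limit - k;
	return k;
}
```

With a 16-slot array, max_rcv of 4, two completed and ten in-flight requests, an IMMED call returns 2 responses while a blocking call waits until it has 4.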

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 56 ++++++++++++++++++++++++++---------------------
 1 file changed, 31 insertions(+), 25 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 37a3361dec31..ac7321ffbd05 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -1314,6 +1314,7 @@ sg_wait_poll_for_given_srp(struct sg_fd *sfp, struct sg_request *srp, bool do_po
 	if (SG_IS_DETACHING(sdp))
 		goto detaching;
 	return sg_rq_chg_state(srp, SG_RQ_AWAIT_RCV, SG_RQ_BUSY);
+
 poll_loop:
 	if (srp->rq_flags & SGV4_FLAG_HIPRI) {
 		long state = current->state;
@@ -1367,37 +1368,34 @@ sg_wait_poll_for_given_srp(struct sg_fd *sfp, struct sg_request *srp, bool do_po
 static struct sg_request *
 sg_mrq_poll_either(struct sg_fd *sfp, struct sg_fd *sec_sfp, bool *on_sfp)
 {
-	bool sig_pending = false;
 	long state = current->state;
 	struct sg_request *srp;
 
 	do {		/* alternating polling loop */
 		if (sfp) {
 			if (sg_mrq_get_ready_srp(sfp, &srp)) {
+				__set_current_state(TASK_RUNNING);
 				if (!srp)
 					return ERR_PTR(-ENODEV);
 				*on_sfp = true;
-				__set_current_state(TASK_RUNNING);
 				return srp;
 			}
 		}
 		if (sec_sfp && sfp != sec_sfp) {
 			if (sg_mrq_get_ready_srp(sec_sfp, &srp)) {
+				__set_current_state(TASK_RUNNING);
 				if (!srp)
 					return ERR_PTR(-ENODEV);
 				*on_sfp = false;
-				__set_current_state(TASK_RUNNING);
 				return srp;
 			}
 		}
 		if (signal_pending_state(state, current)) {
-			sig_pending = true;
-			break;
+			__set_current_state(TASK_RUNNING);
+			return ERR_PTR(-ERESTARTSYS);
 		}
 		cpu_relax();
-	} while (!need_resched());
-	__set_current_state(TASK_RUNNING);
-	return ERR_PTR(sig_pending ? -ERESTARTSYS : -EAGAIN);
+	} while (true);
 }
 
 /*
@@ -3054,21 +3052,25 @@ sg_receive_v4(struct sg_fd *sfp, struct sg_request *srp, void __user *p, struct
  * of elements written to rsp_arr, which may be 0 if mrqs submitted but none waiting
  */
 static int
-sg_mrq_iorec_complets(struct sg_fd *sfp, bool non_block, int max_mrqs, struct sg_io_v4 *rsp_arr)
+sg_mrq_iorec_complets(struct sg_fd *sfp, bool non_block, int max_rcv, int num_rsp_arr,
+		      struct sg_io_v4 *rsp_arr)
 {
 	int k, idx;
 	int res = 0;
 	struct sg_request *srp;
 
-	SG_LOG(3, sfp, "%s: max_mrqs=%d\n", __func__, max_mrqs);
-	for (k = 0; k < max_mrqs; ++k) {
+	SG_LOG(3, sfp, "%s: num_rsp_arr=%d, max_rcv=%d", __func__, num_rsp_arr, max_rcv);
+	if (max_rcv == 0 || max_rcv > num_rsp_arr)
+		max_rcv = num_rsp_arr;
+	k = 0;
+	for ( ; k < max_rcv; ++k) {
 		if (!sg_mrq_get_ready_srp(sfp, &srp))
 			break;
 		if (IS_ERR(srp))
 			return k ? k /* some but not all */ : PTR_ERR(srp);
 		if (srp->rq_flags & SGV4_FLAG_REC_ORDER) {
 			idx = srp->s_hdr4.mrq_ind;
-			if (idx >= max_mrqs)
+			if (idx >= num_rsp_arr)
 				idx = 0;	/* overwrite index 0 when trouble */
 		} else {
 			idx = k;	/* completion order */
@@ -3076,12 +3078,12 @@ sg_mrq_iorec_complets(struct sg_fd *sfp, bool non_block, int max_mrqs, struct sg
 		res = sg_receive_v4(sfp, srp, NULL, rsp_arr + idx);
 		if (unlikely(res))
 			return res;
-		rsp_arr[k].info |= SG_INFO_MRQ_FINI;
+		rsp_arr[idx].info |= SG_INFO_MRQ_FINI;
 	}
-	if (non_block)
+	if (non_block || k >= max_rcv)
 		return k;
-
-	for ( ; k < max_mrqs; ++k) {
+	SG_LOG(6, sfp, "%s: received=%d, max=%d\n", __func__, k, max_rcv);
+	for ( ; k < max_rcv; ++k) {
 		res = sg_wait_any_mrq(sfp, &srp);
 		if (unlikely(res))
 			return res;	/* signal --> -ERESTARTSYS */
@@ -3089,7 +3091,7 @@ sg_mrq_iorec_complets(struct sg_fd *sfp, bool non_block, int max_mrqs, struct sg
 			return k ? k : PTR_ERR(srp);
 		if (srp->rq_flags & SGV4_FLAG_REC_ORDER) {
 			idx = srp->s_hdr4.mrq_ind;
-			if (idx >= max_mrqs)
+			if (idx >= num_rsp_arr)
 				idx = 0;
 		} else {
 			idx = k;
@@ -3111,6 +3113,7 @@ static int
 sg_mrq_ioreceive(struct sg_fd *sfp, struct sg_io_v4 *cop, void __user *p, bool non_block)
 {
 	int res = 0;
+	int max_rcv;
 	u32 len, n;
 	struct sg_io_v4 *rsp_v4_arr;
 	void __user *pp;
@@ -3123,14 +3126,16 @@ sg_mrq_ioreceive(struct sg_fd *sfp, struct sg_io_v4 *cop, void __user *p, bool n
 		return -ERANGE;
 	n /= SZ_SG_IO_V4;
 	len = n * SZ_SG_IO_V4;
-	SG_LOG(3, sfp, "%s: %s, num_reqs=%u\n", __func__, (non_block ? "IMMED" : "blocking"), n);
+	max_rcv = cop->din_iovec_count;
+	SG_LOG(3, sfp, "%s: %s, num_reqs=%u, max_rcv=%d\n", __func__,
+	       (non_block ? "IMMED" : "blocking"), n, max_rcv);
 	rsp_v4_arr = kcalloc(n, SZ_SG_IO_V4, GFP_KERNEL);
 	if (unlikely(!rsp_v4_arr))
 		return -ENOMEM;
 
 	sg_v4h_partial_zero(cop);
 	cop->din_resid = n;
-	res = sg_mrq_iorec_complets(sfp, non_block, n, rsp_v4_arr);
+	res = sg_mrq_iorec_complets(sfp, non_block, max_rcv, n, rsp_v4_arr);
 	if (unlikely(res < 0))
 		goto fini;
 	cop->din_resid -= res;
@@ -3164,7 +3169,6 @@ sg_wait_poll_by_id(struct sg_fd *sfp, struct sg_request **srpp, int id,
 	return __wait_event_interruptible(sfp->cmpl_wait, sg_get_ready_srp(sfp, srpp, id, is_tag));
 poll_loop:
 	{
-		bool sig_pending = false;
 		long state = current->state;
 		struct sg_request *srp;
 
@@ -3175,14 +3179,16 @@ sg_wait_poll_by_id(struct sg_fd *sfp, struct sg_request **srpp, int id,
 				*srpp = srp;
 				return 0;
 			}
+			if (SG_IS_DETACHING(sfp->parentdp)) {
+				__set_current_state(TASK_RUNNING);
+				return -ENODEV;
+			}
 			if (signal_pending_state(state, current)) {
-				sig_pending = true;
-				break;
+				__set_current_state(TASK_RUNNING);
+				return -ERESTARTSYS;
 			}
 			cpu_relax();
-		} while (!need_resched());
-		__set_current_state(TASK_RUNNING);
-		return sig_pending ? -ERESTARTSYS : -EAGAIN;
+		} while (true);
 	}
 }
 
-- 
2.25.1



* [PATCH v18 79/83] sg: mrq: if uniform svb then re-use bio_s
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
                   ` (78 preceding siblings ...)
  2021-04-27 21:57 ` [PATCH v18 78/83] sg: max to read for mrq sg_ioreceive Douglas Gilbert
@ 2021-04-27 21:57 ` Douglas Gilbert
  2021-04-27 21:57 ` [PATCH v18 80/83] sg: expand bvec usage; " Douglas Gilbert
                   ` (3 subsequent siblings)
  83 siblings, 0 replies; 86+ messages in thread
From: Douglas Gilbert @ 2021-04-27 21:57 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

The array of sg_io_v4 objects given for share variable blocking
multiple requests (svb mrq) is pre-scanned to check for errors and
to apply certain fix-ups. Add further code to determine whether the
READs and WRITEs are of the same size and whether
SGV4_FLAG_DOUT_OFFSET is unused. Also require that SGV4_FLAG_NO_DXFER
is used on the READs. If those requirements are met, the svb mrq is
termed 'uniform svb' and the SG_FFD_CAN_REUSE_BIO bit flag is set. To
see the benefit from this, the number of commands given to the uniform
svb needs to be greater than SG_MAX_RSV_REQS (currently 8), and
preferably two or more times that number.
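A hedged C sketch of the uniformity test as described above. Our reading is: every data-carrying command moves the same byte count, every READ carries SGV4_FLAG_NO_DXFER, and nothing uses SGV4_FLAG_DOUT_OFFSET. The struct, function names and SKETCH_FLAG_* values are placeholders, not the uapi definitions, and the real sg_mrq_prepare() checks more (sharing flags and the strict READ then WRITE sequence).

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder flag values for this sketch only; NOT the uapi values */
#define SKETCH_FLAG_NO_DXFER	0x1	/* stands in for SGV4_FLAG_NO_DXFER */
#define SKETCH_FLAG_DOUT_OFFSET	0x2	/* stands in for SGV4_FLAG_DOUT_OFFSET */

#define SG_MAX_RSV_REQS 8	/* "currently 8" per the commit message */

struct sketch_req {		/* hypothetical subset of struct sg_io_v4 */
	uint32_t din_xfer_len;	/* READ side data length */
	uint32_t dout_xfer_len;	/* WRITE side data length */
	uint32_t flags;
};

/* True when all data-carrying commands move the same number of bytes,
 * READ data stays in the kernel (NO_DXFER) and DOUT_OFFSET is unused. */
static bool svb_is_uniform(const struct sketch_req *r, int n)
{
	uint32_t prev_dlen = 0;

	for (int k = 0; k < n; ++k) {
		uint32_t dlen = r[k].din_xfer_len ? r[k].din_xfer_len
						  : r[k].dout_xfer_len;

		if (r[k].flags & SKETCH_FLAG_DOUT_OFFSET)
			return false;
		if (r[k].din_xfer_len && !(r[k].flags & SKETCH_FLAG_NO_DXFER))
			return false;	/* READ data would go to user space */
		if (dlen == 0)
			continue;	/* commands without data are allowed */
		if (prev_dlen && dlen != prev_dlen)
			return false;	/* sizes differ */
		prev_dlen = dlen;
	}
	return true;
}

/* bio re-use only pays off with more commands than reserve requests */
static bool svb_reuse_worthwhile(const struct sketch_req *r, int n)
{
	return n > SG_MAX_RSV_REQS && svb_is_uniform(r, n);
}
```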

As part of the above, divide the per sg_request bitmap (formerly:
frq_bm) into two bitmaps: frq_lt_bm and frq_pc_bm. The "lt" group
are long term, potentially spanning many SCSI commands that use a
sg_request object. The "pc" group are per command and pertain to
the current SCSI command being processed by the sg_request object.
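A small C sketch of what the split buys: per-command state can be discarded with a single word write while long-term bits such as SG_FRQ_LT_RESERVED persist. The bit numbers are copied from the defines in the diff; the struct, the non-atomic bit helpers and sketch_start_cmd() are hypothetical, and whether the driver clears frq_pc_bm exactly this way is an assumption. Note that SG_FRQ_LT_RESERVED and SG_FRQ_PC_IS_V4I are both bit 0, but in different words, so they do not collide.

```c
#include <limits.h>
#include <stdbool.h>

/* Bit numbers copied from the new defines in the diff */
#define SG_FRQ_LT_RESERVED	0	/* marks a reserved request */
#define SG_FRQ_LT_REUSE_BIO	1	/* srp->bio primed for re-use */
#define SG_FRQ_PC_IS_V4I	0	/* v4 interface */
#define SG_FRQ_PC_SYNC_INVOC	2	/* synchronous (blocking) invocation */

#define SK_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical cut-down sg_request holding only the two bitmaps */
struct sketch_srp {
	unsigned long frq_lt_bm[1];	/* long term: spans many commands */
	unsigned long frq_pc_bm[1];	/* per command: current command only */
};

/* Non-atomic stand-ins for the kernel's set_bit() and test_bit() */
static void sk_set_bit(unsigned int nr, unsigned long *bm)
{
	bm[nr / SK_BITS_PER_LONG] |= 1UL << (nr % SK_BITS_PER_LONG);
}

static bool sk_test_bit(unsigned int nr, const unsigned long *bm)
{
	return (bm[nr / SK_BITS_PER_LONG] >> (nr % SK_BITS_PER_LONG)) & 1UL;
}

/* Hypothetical: prepare a (possibly re-used) request for its next command */
static void sketch_start_cmd(struct sketch_srp *srp, bool v4)
{
	srp->frq_pc_bm[0] = 0;		/* drop the previous command's flags */
	if (v4)
		sk_set_bit(SG_FRQ_PC_IS_V4I, srp->frq_pc_bm);
	/* frq_lt_bm is left alone, so RESERVED and REUSE_BIO survive */
}
```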

Rework the sg_rq_map_kern() function.

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 370 +++++++++++++++++++++++++++-------------------
 1 file changed, 222 insertions(+), 148 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index ac7321ffbd05..0a0b40a8ab65 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -129,19 +129,22 @@ enum sg_shr_var {
 #define SG_ADD_RQ_MAX_RETRIES 40	/* to stop infinite _trylock(s) */
 #define SG_DEF_BLK_POLL_LOOP_COUNT 1000	/* may allow user to tweak this */
 
-/* Bit positions (flags) for sg_request::frq_bm bitmask follow */
-#define SG_FRQ_IS_V4I		0	/* true (set) when is v4 interface */
-#define SG_FRQ_IS_ORPHAN	1	/* owner of request gone */
-#define SG_FRQ_SYNC_INVOC	2	/* synchronous (blocking) invocation */
-#define SG_FRQ_US_XFER		3	/* kernel<-->user_space data transfer */
-#define SG_FRQ_ABORTING		4	/* in process of aborting this cmd */
-#define SG_FRQ_DEACT_ORPHAN	5	/* not keeping orphan so de-activate */
-#define SG_FRQ_RECEIVING	6	/* guard against multiple receivers */
-#define SG_FRQ_FOR_MMAP		7	/* request needs PAGE_SIZE elements */
-#define SG_FRQ_COUNT_ACTIVE	8	/* sfp->submitted + waiting active */
-#define SG_FRQ_ISSUED		9	/* blk_execute_rq_nowait() finished */
+/* Bit positions (flags) for sg_request::frq_lt_bm bitmask, lt: long term */
+#define SG_FRQ_LT_RESERVED	0	/* marks a reserved request */
+#define SG_FRQ_LT_REUSE_BIO	1	/* srp->bio primed for re-use */
+
+/* Bit positions (flags) for sg_request::frq_pc_bm bitmask. pc: per command */
+#define SG_FRQ_PC_IS_V4I	0	/* true (set) when is v4 interface */
+#define SG_FRQ_PC_IS_ORPHAN	1	/* owner of request gone */
+#define SG_FRQ_PC_SYNC_INVOC	2	/* synchronous (blocking) invocation */
+#define SG_FRQ_PC_US_XFER	3	/* kernel<-->user_space data transfer */
+#define SG_FRQ_PC_ABORTING	4	/* in process of aborting this cmd */
+#define SG_FRQ_PC_DEACT_ORPHAN	5	/* not keeping orphan so de-activate */
+#define SG_FRQ_PC_RECEIVING	6	/* guard against multiple receivers */
+#define SG_FRQ_PC_FOR_MMAP	7	/* request needs PAGE_SIZE elements */
+#define SG_FRQ_PC_COUNT_ACTIVE	8	/* sfp->submitted + waiting active */
+#define SG_FRQ_PC_ISSUED	9	/* blk_execute_rq_nowait() finished */
 #define SG_FRQ_POLL_SLEPT	10	/* stop re-entry of hybrid_sleep() */
-#define SG_FRQ_RESERVED		11	/* marks a reserved request */
 
 /* Bit positions (flags) for sg_fd::ffd_bm bitmask follow */
 #define SG_FFD_FORCE_PACKID	0	/* receive only given pack_id/tag */
@@ -159,6 +162,7 @@ enum sg_shr_var {
 #define SG_FFD_EXCL_WAITQ	12	/* append _exclusive to wait_event */
 #define SG_FFD_SVB_ACTIVE	13	/* shared variable blocking active */
 #define SG_FFD_RESHARE		14	/* reshare limits to single rsv req */
+#define SG_FFD_CAN_REUSE_BIO	15	/* uniform svb --> can re-use bio_s */
 
 /* Bit positions (flags) for sg_device::fdev_bm bitmask follow */
 #define SG_FDEV_EXCLUDE		0	/* have fd open with O_EXCL */
@@ -185,6 +189,7 @@ static int sg_allow_dio = SG_ALLOW_DIO_DEF;	/* ignored by code */
 
 static int scatter_elem_sz = SG_SCATTER_SZ;
 static bool no_attach_msg;
+static atomic_t sg_tmp_count_reused_bios;
 
 #define SG_DEF_SECTOR_SZ 512
 
@@ -264,7 +269,8 @@ struct sg_request {	/* active SCSI command or inactive request */
 	int tag;		/* block layer identifier of request */
 	blk_qc_t cookie;	/* ids 1 or more queues for blk_poll() */
 	u64 start_ns;		/* starting point of command duration calc */
-	unsigned long frq_bm[1];	/* see SG_FRQ_* defines above */
+	unsigned long frq_lt_bm[1];	/* see SG_FRQ_LT_* defines above */
+	unsigned long frq_pc_bm[1];	/* see SG_FRQ_PC_* defines above */
 	u8 *sense_bp;		/* mempool alloc-ed sense buffer, as needed */
 	struct sg_fd *parentfp;	/* pointer to owning fd, even when on fl */
 	struct request *rqq;	/* released in sg_rq_end_io(), bio kept */
@@ -327,8 +333,8 @@ struct sg_comm_wr_t {  /* arguments to sg_common_write() */
 	int rsv_idx;		/* wanted rsv_arr index, def: -1 (anyone) */
 	int dlen;		/* dout or din length in bytes */
 	int wr_offset;		/* non-zero if v4 and DOUT_OFFSET set */
-	unsigned long frq_bm[1];	/* see SG_FRQ_* defines above */
-	union {		/* selector is frq_bm.SG_FRQ_IS_V4I */
+	unsigned long frq_pc_bm[1];	/* see SG_FRQ_PC_* defines above */
+	union {		/* selector is frq_pc_bm.SG_FRQ_IS_V4I */
 		struct sg_io_hdr *h3p;
 		struct sg_io_v4 *h4p;
 	};
@@ -422,7 +428,7 @@ static void sg_take_snap(struct sg_fd *sfp, bool clear_first);
 #define SG_HAVE_EXCLUDE(sdp) test_bit(SG_FDEV_EXCLUDE, (sdp)->fdev_bm)
 #define SG_IS_O_NONBLOCK(sfp) (!!((sfp)->filp->f_flags & O_NONBLOCK))
 #define SG_RQ_ACTIVE(srp) (atomic_read(&(srp)->rq_st) != SG_RQ_INACTIVE)
-#define SG_IS_V4I(srp) test_bit(SG_FRQ_IS_V4I, (srp)->frq_bm)
+#define SG_IS_V4I(srp) test_bit(SG_FRQ_PC_IS_V4I, (srp)->frq_pc_bm)
 
 /*
  * Kernel needs to be built with CONFIG_SCSI_LOGGING to see log messages. 'depth' is a number
@@ -486,7 +492,8 @@ static void sg_take_snap(struct sg_fd *sfp, bool clear_first);
  *		SG_IOABORT: no match on pack_id or tag; mrq: no active reqs
  * ENODEV	target (SCSI) device associated with the fd has "disappeared"
  * ENOMEM	obvious; could be some pre-allocated cache that is exhausted
- * ENOMSG	data transfer setup needed or (direction) disallowed (sharing)
+ * ENOMSG	data transfer setup needed or (direction) disallowed (sharing);
+ *		inconsistency in share settings (mrq)
  * ENOSTR	write-side request abandoned due to read-side error or state
  * ENOTSOCK	sharing: file descriptor for sharing unassociated with sg driver
  * ENXIO	'no such device or address' SCSI mid-level processing errors
@@ -762,7 +769,7 @@ static inline void
 sg_comm_wr_init(struct sg_comm_wr_t *cwrp)
 {
 	memset(cwrp, 0, sizeof(*cwrp));
-	WRITE_ONCE(cwrp->frq_bm[0], 0);
+	/* WRITE_ONCE(cwrp->frq_pc_bm[0], 0); */
 	cwrp->rsv_idx = -1;
 }
 
@@ -979,7 +986,7 @@ sg_submit_v3(struct sg_fd *sfp, struct sg_io_hdr *hp, bool sync, struct sg_reque
 		clear_bit(SG_FFD_NO_CMD_Q, sfp->ffd_bm);
 	ul_timeout = msecs_to_jiffies(hp->timeout);
 	sg_comm_wr_init(&cwr);
-	__assign_bit(SG_FRQ_SYNC_INVOC, cwr.frq_bm, (int)sync);
+	__assign_bit(SG_FRQ_PC_SYNC_INVOC, cwr.frq_pc_bm, (int)sync);
 	cwr.h3p = hp;
 	cwr.dlen = hp->dxfer_len;
 	cwr.timeout = min_t(unsigned long, ul_timeout, INT_MAX);
@@ -1256,7 +1263,7 @@ sg_srp_hybrid_sleep(struct sg_request *srp)
 	enum hrtimer_mode mode;
 	ktime_t kt = ns_to_ktime(5000);
 
-	if (test_and_set_bit(SG_FRQ_POLL_SLEPT, srp->frq_bm))
+	if (test_and_set_bit(SG_FRQ_POLL_SLEPT, srp->frq_pc_bm))
 		return false;
 	if (kt == 0)
 		return false;
@@ -1303,7 +1310,7 @@ sg_wait_poll_for_given_srp(struct sg_fd *sfp, struct sg_request *srp, bool do_po
 	/* N.B. The SG_FFD_EXCL_WAITQ flag is ignored here. */
 	res = __wait_event_interruptible(sfp->cmpl_wait, sg_rq_landed(sdp, srp));
 	if (unlikely(res)) { /* -ERESTARTSYS because signal hit thread */
-		set_bit(SG_FRQ_IS_ORPHAN, srp->frq_bm);
+		set_bit(SG_FRQ_PC_IS_ORPHAN, srp->frq_pc_bm);
 		/* orphans harvested when sfp->keep_orphan is false */
 		sg_rq_chg_state_force(srp, SG_RQ_INFLIGHT);
 		SG_LOG(1, sfp, "%s:  wait_event_interruptible(): %s[%d]\n", __func__,
@@ -1355,7 +1362,7 @@ sg_wait_poll_for_given_srp(struct sg_fd *sfp, struct sg_request *srp, bool do_po
 		return sg_rq_chg_state(srp, sr_st, SG_RQ_BUSY);
 	}
 	if (atomic_read_acquire(&srp->rq_st) != SG_RQ_AWAIT_RCV)
-		return (test_bit(SG_FRQ_COUNT_ACTIVE, srp->frq_bm) &&
+		return (test_bit(SG_FRQ_PC_COUNT_ACTIVE, srp->frq_pc_bm) &&
 			atomic_read(&sfp->submitted) < 1) ? -ENODATA : 0;
 	return unlikely(sg_rq_chg_state(srp, SG_RQ_AWAIT_RCV, SG_RQ_BUSY)) ? -EPROTO : 0;
 
@@ -1483,13 +1490,15 @@ sg_mrq_complets(struct sg_mrq_hold *mhp, struct sg_fd *sfp, struct sg_fd *sec_sf
  * a strict READ (like) thence WRITE (like) sequence on all data carrying commands; also
  * a dangling READ is not allowed at the end of a scb request array.
  */
-static bool
-sg_mrq_sanity(struct sg_mrq_hold *mhp, bool is_svb)
+static int
+sg_mrq_prepare(struct sg_mrq_hold *mhp, bool is_svb)
 {
 	bool last_is_keep_share = false;
 	bool expect_wr = false;
+	bool uniform_svb = true;	/* no dxfr to user space, all data moves same size */
 	bool share, have_mrq_sense, have_file_share;
-	int k;
+	int k, dlen;
+	int prev_dlen = 0;
 	struct sg_io_v4 *cop = mhp->cwrp->h4p;
 	u32 cdb_alen = cop->request_len;
 	u32 cdb_mxlen = cdb_alen / mhp->tot_reqs;
@@ -1504,7 +1513,7 @@ sg_mrq_sanity(struct sg_mrq_hold *mhp, bool is_svb)
 	have_file_share = sg_fd_is_shared(sfp);
 	if (is_svb && unlikely(!have_file_share)) {
 		SG_LOG(1, sfp, "%s: share variable blocking (svb) needs file share\n", __func__);
-		return false;
+		return -ENOMSG;
 	}
 	/* Pre-check each request for anomalies, plus some preparation */
 	for (k = 0, hp = a_hds; k < mhp->tot_reqs; ++k, ++hp) {
@@ -1512,11 +1521,11 @@ sg_mrq_sanity(struct sg_mrq_hold *mhp, bool is_svb)
 		sg_v4h_partial_zero(hp);
 		if (unlikely(hp->guard != 'Q' || hp->protocol != 0 || hp->subprotocol != 0)) {
 			SG_LOG(1, sfp, "%s: req index %u: bad guard or protocol\n", __func__, k);
-			return false;
+			return -EPERM;
 		}
 		if (unlikely(flags & SGV4_FLAG_MULTIPLE_REQS)) {
 			SG_LOG(1, sfp, "%s: %s %u: no nested multi-reqs\n", __func__, rip, k);
-			return false;
+			return -ERANGE;
 		}
 		share = !!(flags & SGV4_FLAG_SHARE);
 		last_is_keep_share = !!(flags & SGV4_FLAG_KEEP_SHARE);
@@ -1524,27 +1533,27 @@ sg_mrq_sanity(struct sg_mrq_hold *mhp, bool is_svb)
 		    unlikely(flags & (SGV4_FLAG_DO_ON_OTHER | SGV4_FLAG_COMPLETE_B4))) {
 			SG_LOG(1, sfp, "%s: %s %u, no IMMED with ON_OTHER or COMPLETE_B4\n",
 			       __func__, rip, k);
-			return false;
+			return -ERANGE;
 		}
 		if (mhp->immed && unlikely(share)) {
 			SG_LOG(1, sfp, "%s: %s %u, no IMMED with FLAG_SHARE\n", __func__, rip, k);
-			return false;
+			return -ENOMSG;
 		}
 		if (mhp->co_mmap && (flags & SGV4_FLAG_MMAP_IO)) {
 			SG_LOG(1, sfp, "%s: %s %u, MMAP in co AND here\n", __func__, rip, k);
-			return false;
+			return -ERANGE;
 		}
 		if (unlikely(!have_file_share && share)) {
 			SG_LOG(1, sfp, "%s: %s %u, no file share\n", __func__, rip, k);
-			return false;
+			return -ENOMSG;
 		}
 		if (unlikely(!have_file_share && !!(flags & SGV4_FLAG_DO_ON_OTHER))) {
 			SG_LOG(1, sfp, "%s: %s %u, no other fd to do on\n", __func__, rip, k);
-			return false;
+			return -ENOMSG;
 		}
 		if (cdb_ap && unlikely(hp->request_len > cdb_mxlen)) {
 			SG_LOG(1, sfp, "%s: %s %u, cdb too long\n", __func__, rip, k);
-			return false;
+			return -ERANGE;
 		}
 		if (have_mrq_sense && hp->response == 0 && hp->max_response_len == 0) {
 			hp->response = cop->response;
@@ -1558,43 +1567,66 @@ sg_mrq_sanity(struct sg_mrq_hold *mhp, bool is_svb)
 		/* mrq share variable blocking (svb) additional constraints checked here */
 		if (unlikely(flags & (SGV4_FLAG_COMPLETE_B4 | SGV4_FLAG_KEEP_SHARE))) {
 			SG_LOG(1, sfp, "%s: %s %u: no KEEP_SHARE with svb\n", __func__, rip, k);
-			return false;
+			return -ENOMSG;
 		}
+		dlen = 0;
 		if (!expect_wr) {
+			dlen = hp->din_xfer_len;
 			if (hp->dout_xfer_len > 0)
 				goto bad_svb;
-			if (hp->din_xfer_len > 0) {
+			if (dlen > 0) {
 				if (!(flags & SGV4_FLAG_SHARE))
 					goto bad_svb;
 				if (flags & SGV4_FLAG_DO_ON_OTHER)
 					goto bad_svb;
 				expect_wr = true;
+				if (!(flags & SGV4_FLAG_NO_DXFER))
+					uniform_svb = false;
 			}
 			/* allowing commands with no dxfer (in both cases) */
 		} else {	/* checking write side */
-			if (hp->dout_xfer_len > 0) {
+			dlen = hp->dout_xfer_len;
+			if (dlen > 0) {
 				if (unlikely(~flags & (SGV4_FLAG_NO_DXFER | SGV4_FLAG_SHARE |
 						       SGV4_FLAG_DO_ON_OTHER)))
 					goto bad_svb;
 				expect_wr = false;
+				if (unlikely(flags & SGV4_FLAG_DOUT_OFFSET))
+					uniform_svb = false;
 			} else if (unlikely(hp->din_xfer_len > 0)) {
 				goto bad_svb;
 			}
 		}
+		if (!uniform_svb)
+			continue;
+		if (prev_dlen == 0)
+			prev_dlen = dlen;
+		else if (dlen != prev_dlen)
+			uniform_svb = false;
 	}		/* end of request array iterating loop */
 	if (last_is_keep_share) {
 		SG_LOG(1, sfp, "%s: Can't set SGV4_FLAG_KEEP_SHARE on last mrq req\n", __func__);
-		return false;
+		return -ENOMSG;
 	}
 	if (is_svb && expect_wr) {
 		SG_LOG(1, sfp, "%s: svb: unpaired READ at end of request array\n", __func__);
-		return false;
+		return -ENOMSG;
 	}
-	return true;
+	if (is_svb) {
+		bool cur_uniform_svb = test_bit(SG_FFD_CAN_REUSE_BIO, sfp->ffd_bm);
+
+		if (uniform_svb != cur_uniform_svb) {
+			if (uniform_svb)
+				set_bit(SG_FFD_CAN_REUSE_BIO, sfp->ffd_bm);
+			else
+				clear_bit(SG_FFD_CAN_REUSE_BIO, sfp->ffd_bm);
+		}
+	}
+	return 0;
 bad_svb:
 	SG_LOG(1, sfp, "%s: %s %u: svb alternating read-then-write or flags bad\n", __func__,
 	       rip, k);
-	return false;
+	return -ENOMSG;
 }
 
 /* rsv_idx>=0 only when this request is the write-side of a request share */
@@ -1615,8 +1647,8 @@ sg_mrq_submit(struct sg_fd *rq_sfp, struct sg_mrq_hold *mhp, int pos_in_rq_arr,
 	r_cwrp->cmd_len = hp->request_len;
 	r_cwrp->rsv_idx = rsv_idx;
 	ul_timeout = msecs_to_jiffies(hp->timeout);
-	__assign_bit(SG_FRQ_SYNC_INVOC, r_cwrp->frq_bm, (int)mhp->from_sg_io);
-	__set_bit(SG_FRQ_IS_V4I, r_cwrp->frq_bm);
+	__assign_bit(SG_FRQ_PC_SYNC_INVOC, r_cwrp->frq_pc_bm, (int)mhp->from_sg_io);
+	__set_bit(SG_FRQ_PC_IS_V4I, r_cwrp->frq_pc_bm);
 	r_cwrp->h4p = hp;
 	r_cwrp->dlen = hp->din_xfer_len ? hp->din_xfer_len : hp->dout_xfer_len;
 	r_cwrp->timeout = min_t(unsigned long, ul_timeout, INT_MAX);
@@ -1985,6 +2017,21 @@ sg_svb_mrq_ordered(struct sg_fd *fp, struct sg_fd *o_sfp, struct sg_mrq_hold *mh
 	return 0;
 }
 
+static void
+sg_svb_cleanup(struct sg_fd *sfp)
+{
+	unsigned long idx;
+	struct xarray *xafp = &sfp->srp_arr;
+	struct sg_request *srp;
+
+	xa_for_each_marked(xafp, idx, srp, SG_XA_RQ_INACTIVE) {
+		if (test_and_clear_bit(SG_FRQ_LT_REUSE_BIO, srp->frq_lt_bm)) {
+			bio_put(srp->bio);	/* _get() near end of sg_start_req() */
+			srp->bio = NULL;
+		}
+	}
+}
+
 /*
  * Processes shared variable blocking (svb) method for multiple requests (mrq). There are two
  * variants: unordered write-side requests; and ordered write-side requests. The read-side requests
@@ -2040,12 +2087,15 @@ sg_process_svb_mrq(struct sg_fd *fp, struct sg_fd *o_sfp, struct sg_mrq_hold *mh
 	}
 	if (mhp->id_of_mrq)	/* can no longer do a mrq abort */
 		atomic_set(&fp->mrq_id_abort, 0);
+	if (test_and_clear_bit(SG_FFD_CAN_REUSE_BIO, fp->ffd_bm))
+		sg_svb_cleanup(fp);
 	return res;
 }
 
 #if IS_ENABLED(SG_LOG_ACTIVE)
+/* Returns a descriptive string for the different mrq varieties */
 static const char *
-sg_mrq_name(bool from_sg_io, u32 flags)
+sg_mrq_var_str(bool from_sg_io, u32 flags)
 {
 	if (!(flags & SGV4_FLAG_MULTIPLE_REQS))
 		return "_not_ multiple requests control object";
@@ -2063,7 +2113,8 @@ sg_mrq_name(bool from_sg_io, u32 flags)
  * Implements the multiple request functionality. When from_sg_io is true invocation was via
  * ioctl(SG_IO), otherwise it was via ioctl(SG_IOSUBMIT). Submit non-blocking if IMMED flag given
  * or when ioctl(SG_IOSUBMIT) is used with O_NONBLOCK set on its file descriptor. Hipri
- * non-blocking is when the HIPRI flag is given.
+ * non-blocking is when the HIPRI flag is given. Note that on this fd, an svb cannot be started
+ * while another svb is in progress, and no other mrq can be started while an svb is in progress.
  */
 static int
 sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool from_sg_io)
@@ -2085,13 +2136,13 @@ sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool from_sg_io)
 	struct sg_mrq_hold mh;
 	struct sg_mrq_hold *mhp = &mh;
 #if IS_ENABLED(SG_LOG_ACTIVE)
-	const char *mrq_name;
+	const char *mrq_vs;
 #endif
 
 	mhp->cwrp = cwrp;
 	mhp->from_sg_io = from_sg_io; /* false if from SG_IOSUBMIT */
 #if IS_ENABLED(SG_LOG_ACTIVE)
-	mrq_name = sg_mrq_name(from_sg_io, cop->flags);
+	mrq_vs = sg_mrq_var_str(from_sg_io, cop->flags);
 #endif
 	f_non_block = !!(fp->filp->f_flags & O_NONBLOCK);
 	is_svb = !!(cop->flags & SGV4_FLAG_SHARE);	/* via ioctl(SG_IOSUBMIT) only */
@@ -2135,7 +2186,7 @@ sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool from_sg_io)
 	}
 	if (!mhp->immed && f_non_block)
 		mhp->immed = true;	/* hmm, think about this */
-	SG_LOG(3, fp, "%s: %s, tot_reqs=%u, id_of_mrq=%d\n", __func__, mrq_name, tot_reqs,
+	SG_LOG(3, fp, "%s: %s, tot_reqs=%u, id_of_mrq=%d\n", __func__, mrq_vs, tot_reqs,
 	       mhp->id_of_mrq);
 	sg_v4h_partial_zero(cop);
 
@@ -2177,8 +2228,13 @@ sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool from_sg_io)
 
 	if (SG_IS_DETACHING(sdp) || (o_sfp && SG_IS_DETACHING(o_sfp->parentdp)))
 		return -ENODEV;
-	if (is_svb && unlikely(test_and_set_bit(SG_FFD_SVB_ACTIVE, fp->ffd_bm))) {
-		SG_LOG(1, fp, "%s: %s already active\n", __func__, mrq_name);
+	if (is_svb) {
+		if (unlikely(test_and_set_bit(SG_FFD_SVB_ACTIVE, fp->ffd_bm))) {
+			SG_LOG(1, fp, "%s: %s already active\n", __func__, mrq_vs);
+			return -EBUSY;
+		}
+	} else if (unlikely(test_bit(SG_FFD_SVB_ACTIVE, fp->ffd_bm))) {
+		SG_LOG(1, fp, "%s: %s disallowed with existing svb\n", __func__, mrq_vs);
 		return -EBUSY;
 	}
 	a_hds = kcalloc(tot_reqs, SZ_SG_IO_V4, GFP_KERNEL | __GFP_NOWARN);
@@ -2205,11 +2261,10 @@ sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool from_sg_io)
 	mhp->cdb_ap = cdb_ap;
 	mhp->a_hds = a_hds;
 	mhp->cdb_mxlen = cdb_mxlen;
-	/* do sanity checks on all requests before starting */
-	if (unlikely(!sg_mrq_sanity(mhp, is_svb))) {
-		res = -ERANGE;
+	/* do pre-scan on mrq array for sanity and fix-ups */
+	res = sg_mrq_prepare(mhp, is_svb);
+	if (unlikely(res))
 		goto fini;
-	}
 
 	/* override cmd queuing setting to allow */
 	clear_bit(SG_FFD_NO_CMD_Q, fp->ffd_bm);
@@ -2276,8 +2331,8 @@ sg_submit_v4(struct sg_fd *sfp, void __user *p, struct sg_io_v4 *h4p, bool from_
 		clear_bit(SG_FFD_NO_CMD_Q, sfp->ffd_bm);
 	ul_timeout = msecs_to_jiffies(h4p->timeout);
 	cwr.sfp = sfp;
-	__assign_bit(SG_FRQ_SYNC_INVOC, cwr.frq_bm, (int)from_sg_io);
-	__set_bit(SG_FRQ_IS_V4I, cwr.frq_bm);
+	__assign_bit(SG_FRQ_PC_SYNC_INVOC, cwr.frq_pc_bm, (int)from_sg_io);
+	__set_bit(SG_FRQ_PC_IS_V4I, cwr.frq_pc_bm);
 	cwr.h4p = h4p;
 	cwr.timeout = min_t(unsigned long, ul_timeout, INT_MAX);
 	cwr.cmd_len = h4p->request_len;
@@ -2596,11 +2651,11 @@ sg_get_probable_read_side(struct sg_fd *sfp)
 	struct sg_request *rs_srp;
 	struct sg_request *rs_inactive_srp = NULL;
 
-	for (rapp = sfp->rsv_arr; rapp < rapp + SG_MAX_RSV_REQS; ++rapp) {
+	for (rapp = sfp->rsv_arr; rapp < sfp->rsv_arr + SG_MAX_RSV_REQS; ++rapp) {
 		rs_srp = *rapp;
 		if (IS_ERR_OR_NULL(rs_srp) || rs_srp->sh_srp)
 			continue;
-		switch (atomic_read_acquire(&rs_srp->rq_st)) {
+		switch (atomic_read(&rs_srp->rq_st)) {
 		case SG_RQ_INFLIGHT:
 		case SG_RQ_AWAIT_RCV:
 		case SG_RQ_BUSY:
@@ -2685,7 +2740,7 @@ sg_execute_cmd(struct sg_fd *sfp, struct sg_request *srp)
 		srp->start_ns = ktime_get_boottime_ns();/* assume always > 0 */
 	srp->duration = 0;
 
-	if (!test_bit(SG_FRQ_IS_V4I, srp->frq_bm) && srp->s_hdr3.interface_id == '\0')
+	if (!test_bit(SG_FRQ_PC_IS_V4I, srp->frq_pc_bm) && srp->s_hdr3.interface_id == '\0')
 		at_head = true;	/* backward compatibility for v1+v2 interfaces */
 	else if (test_bit(SG_FFD_Q_AT_TAIL, sfp->ffd_bm))
 		/* cmd flags can override sfd setting */
@@ -2696,16 +2751,16 @@ sg_execute_cmd(struct sg_fd *sfp, struct sg_request *srp)
 	kref_get(&sfp->f_ref); /* put usually in: sg_rq_end_io() */
 	sg_rq_chg_state_force(srp, SG_RQ_INFLIGHT);
 	/* >>>>>>> send cmd/req off to other levels <<<<<<<< */
-	if (!test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm)) {
+	if (!test_bit(SG_FRQ_PC_SYNC_INVOC, srp->frq_pc_bm)) {
 		atomic_inc(&sfp->submitted);
-		set_bit(SG_FRQ_COUNT_ACTIVE, srp->frq_bm);
+		set_bit(SG_FRQ_PC_COUNT_ACTIVE, srp->frq_pc_bm);
 	}
 	if (srp->rq_flags & SGV4_FLAG_HIPRI) {
 		rqq->cmd_flags |= REQ_HIPRI;
 		srp->cookie = request_to_qc_t(rqq->mq_hctx, rqq);
 	}
 	blk_execute_rq_nowait(sdp->disk, rqq, (int)at_head, sg_rq_end_io);
-	set_bit(SG_FRQ_ISSUED, srp->frq_bm);
+	set_bit(SG_FRQ_PC_ISSUED, srp->frq_pc_bm);
 }
 
 /*
@@ -2729,7 +2784,7 @@ sg_common_write(struct sg_comm_wr_t *cwrp)
 	struct sg_io_hdr *hi_p;
 	struct sg_io_v4 *h4p;
 
-	if (likely(test_bit(SG_FRQ_IS_V4I, cwrp->frq_bm))) {
+	if (likely(test_bit(SG_FRQ_PC_IS_V4I, cwrp->frq_pc_bm))) {
 		h4p = cwrp->h4p;
 		hi_p = NULL;
 		dir = SG_DXFER_NONE;
@@ -2830,7 +2885,7 @@ sg_get_ready_srp(struct sg_fd *sfp, struct sg_request **srpp, int id, bool is_ta
 
 /* Returns number of bytes copied to user space provided sense buffer or negated errno value. */
 static int
-sg_copy_sense(struct sg_request *srp, bool v4_active)
+sg_copy_sense(struct sg_request *srp)
 {
 	int sb_len_ret = 0;
 	int scsi_stat;
@@ -2845,7 +2900,7 @@ sg_copy_sense(struct sg_request *srp, bool v4_active)
 		void __user *up;
 
 		srp->sense_bp = NULL;
-		if (v4_active) {
+		if (SG_IS_V4I(srp)) {
 			up = uptr64(srp->s_hdr4.sbp);
 			mx_sb_len = srp->s_hdr4.max_sb_len;
 		} else {
@@ -2866,7 +2921,7 @@ sg_copy_sense(struct sg_request *srp, bool v4_active)
 }
 
 static int
-sg_rec_state_v3v4(struct sg_fd *sfp, struct sg_request *srp, bool v4_active)
+sg_rec_state_v3v4(struct sg_fd *sfp, struct sg_request *srp)
 {
 	int err = 0;
 	u32 rq_res = srp->rq_result;
@@ -2877,13 +2932,13 @@ sg_rec_state_v3v4(struct sg_fd *sfp, struct sg_request *srp, bool v4_active)
 	if (unlikely(!sg_result_is_good(rq_res))) {
 		srp->rq_info |= SG_INFO_CHECK;
 		if (!scsi_status_is_good(rq_res)) {
-			int sb_len_wr = sg_copy_sense(srp, v4_active);
+			int sb_len_wr = sg_copy_sense(srp);
 
 			if (unlikely(sb_len_wr < 0))
 				return sb_len_wr;
 		}
 	}
-	if (unlikely(test_bit(SG_FRQ_ABORTING, srp->frq_bm)))
+	if (unlikely(test_bit(SG_FRQ_PC_ABORTING, srp->frq_pc_bm)))
 		srp->rq_info |= SG_INFO_ABORTED;
 
 	if (sh_var == SG_SHR_WS_RQ && sg_fd_is_shared(sfp)) {
@@ -2930,7 +2985,7 @@ sg_rec_state_v3v4(struct sg_fd *sfp, struct sg_request *srp, bool v4_active)
 	rs_srp->sh_var = SG_SHR_NONE;
 	sg_rq_chg_state_force(rs_srp, SG_RQ_INACTIVE);
 	atomic_inc(&rs_srp->parentfp->inactives);
-	rs_srp->frq_bm[0] &= (1 << SG_FRQ_RESERVED);
+	rs_srp->frq_pc_bm[0] = 0;
 	rs_srp->in_resid = 0;
 	rs_srp->rq_info = 0;
 	rs_srp->sense_len = 0;
@@ -3008,7 +3063,7 @@ sg_receive_v4(struct sg_fd *sfp, struct sg_request *srp, void __user *p, struct
 
 	SG_LOG(3, sfp, "%s: p=%s, h4p=%s\n", __func__, (p ? "given" : "NULL"),
 	       (h4p ? "given" : "NULL"));
-	err = sg_rec_state_v3v4(sfp, srp, true);
+	err = sg_rec_state_v3v4(sfp, srp);
 	h4p->guard = 'Q';
 	h4p->protocol = 0;
 	h4p->subprotocol = 0;
@@ -3155,7 +3210,7 @@ sg_mrq_ioreceive(struct sg_fd *sfp, struct sg_io_v4 *cop, void __user *p, bool n
 	return res;
 }
 
-// sg_wait_id_event
+/* Either wait for command completion matching id ('-1': any); or poll for it if do_poll==true */
 static int
 sg_wait_poll_by_id(struct sg_fd *sfp, struct sg_request **srpp, int id,
 		   bool is_tag, int do_poll)
@@ -3248,7 +3303,7 @@ sg_ctl_ioreceive(struct sg_fd *sfp, void __user *p)
 		if (unlikely(res))
 			return res;	/* signal --> -ERESTARTSYS */
 	}
-	if (test_and_set_bit(SG_FRQ_RECEIVING, srp->frq_bm)) {
+	if (test_and_set_bit(SG_FRQ_PC_RECEIVING, srp->frq_pc_bm)) {
 		cpu_relax();
 		goto try_again;
 	}
@@ -3302,7 +3357,7 @@ sg_ctl_ioreceive_v3(struct sg_fd *sfp, void __user *p)
 		if (IS_ERR(srp))
 			return PTR_ERR(srp);
 	}
-	if (test_and_set_bit(SG_FRQ_RECEIVING, srp->frq_bm)) {
+	if (test_and_set_bit(SG_FRQ_PC_RECEIVING, srp->frq_pc_bm)) {
 		cpu_relax();
 		goto try_again;
 	}
@@ -3475,7 +3530,7 @@ sg_read(struct file *filp, char __user *p, size_t count, loff_t *ppos)
 		if (IS_ERR(srp))
 			return PTR_ERR(srp);
 	}
-	if (test_and_set_bit(SG_FRQ_RECEIVING, srp->frq_bm)) {
+	if (test_and_set_bit(SG_FRQ_PC_RECEIVING, srp->frq_pc_bm)) {
 		cpu_relax();
 		goto try_again;
 	}
@@ -3511,7 +3566,7 @@ sg_receive_v3(struct sg_fd *sfp, struct sg_request *srp, void __user *p)
 
 	SG_LOG(3, sfp, "%s: sh_var: %s srp=0x%pK\n", __func__, sg_shr_str(srp->sh_var, false),
 	       srp);
-	err = sg_rec_state_v3v4(sfp, srp, false);
+	err = sg_rec_state_v3v4(sfp, srp);
 	memset(hp, 0, sizeof(*hp));
 	memcpy(hp, &srp->s_hdr3, sizeof(srp->s_hdr3));
 	hp->sb_len_wr = srp->sense_len;
@@ -3645,7 +3700,6 @@ sg_finish_rs_rq(struct sg_fd *sfp, struct sg_request *rs_srp, bool even_if_in_ws
 		atomic_inc(&rs_sfp->inactives);
 	rs_rsv_srp->tag = SG_TAG_WILDCARD;
 	rs_rsv_srp->sh_var = SG_SHR_NONE;
-	set_bit(SG_FRQ_RESERVED, rs_rsv_srp->frq_bm);
 	rs_rsv_srp->in_resid = 0;
 	rs_rsv_srp->rq_info = 0;
 	rs_rsv_srp->sense_len = 0;
@@ -3758,7 +3812,7 @@ sg_remove_sfp_share(struct sg_fd *sfp, bool is_rd_side)
 			rsv_srp = *rapp;
 			if (IS_ERR_OR_NULL(rsv_srp) || rsv_srp->sh_var != SG_SHR_RS_RQ)
 				continue;
-			sr_st = atomic_read_acquire(&rsv_srp->rq_st);
+			sr_st = atomic_read(&rsv_srp->rq_st);
 			switch (sr_st) {
 			case SG_RQ_SHR_SWAP:
 				set_inactive = true;
@@ -3949,8 +4003,8 @@ sg_fill_request_element(struct sg_fd *sfp, struct sg_request *srp, struct sg_req
 	rip->duration = sg_get_dur(srp, NULL, test_bit(SG_FFD_TIME_IN_NS, sfp->ffd_bm), NULL);
 	if (rip->duration == U32_MAX)
 		rip->duration = 0;
-	rip->orphan = test_bit(SG_FRQ_IS_ORPHAN, srp->frq_bm);
-	rip->sg_io_owned = test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm);
+	rip->orphan = test_bit(SG_FRQ_PC_IS_ORPHAN, srp->frq_pc_bm);
+	rip->sg_io_owned = test_bit(SG_FRQ_PC_SYNC_INVOC, srp->frq_pc_bm);
 	rip->problem = !sg_result_is_good(srp->rq_result);
 	rip->pack_id = test_bit(SG_FFD_PREFER_TAG, sfp->ffd_bm) ? srp->tag : srp->pack_id;
 	rip->usr_ptr = SG_IS_V4I(srp) ? uptr64(srp->s_hdr4.usr_ptr) : srp->s_hdr3.usr_ptr;
@@ -4095,7 +4149,7 @@ sg_abort_req(struct sg_fd *sfp, struct sg_request *srp)
 	enum sg_rq_state rq_st;
 	struct request *rqq;
 
-	if (test_and_set_bit(SG_FRQ_ABORTING, srp->frq_bm)) {
+	if (test_and_set_bit(SG_FRQ_PC_ABORTING, srp->frq_pc_bm)) {
 		SG_LOG(1, sfp, "%s: already aborting req pack_id/tag=%d/%d\n", __func__,
 		       srp->pack_id, srp->tag);
 		goto fini;	/* skip quietly if already aborted */
@@ -4105,16 +4159,16 @@ sg_abort_req(struct sg_fd *sfp, struct sg_request *srp)
 	       sg_rq_st_str(rq_st, false));
 	switch (rq_st) {
 	case SG_RQ_BUSY:
-		clear_bit(SG_FRQ_ABORTING, srp->frq_bm);
+		clear_bit(SG_FRQ_PC_ABORTING, srp->frq_pc_bm);
 		res = -EBUSY;	/* should not occur often */
 		break;
 	case SG_RQ_INACTIVE:	/* perhaps done already */
-		clear_bit(SG_FRQ_ABORTING, srp->frq_bm);
+		clear_bit(SG_FRQ_PC_ABORTING, srp->frq_pc_bm);
 		break;
 	case SG_RQ_AWAIT_RCV:	/* user should still do completion */
 	case SG_RQ_SHR_SWAP:
 	case SG_RQ_SHR_IN_WS:
-		clear_bit(SG_FRQ_ABORTING, srp->frq_bm);
+		clear_bit(SG_FRQ_PC_ABORTING, srp->frq_pc_bm);
 		break;		/* nothing to do here, return 0 */
 	case SG_RQ_INFLIGHT:	/* only attempt abort if inflight */
 		srp->rq_result |= (DRIVER_SOFT << 24);
@@ -4125,7 +4179,7 @@ sg_abort_req(struct sg_fd *sfp, struct sg_request *srp)
 		}
 		break;
 	default:
-		clear_bit(SG_FRQ_ABORTING, srp->frq_bm);
+		clear_bit(SG_FRQ_PC_ABORTING, srp->frq_pc_bm);
 		break;
 	}
 fini:
@@ -4523,10 +4577,13 @@ sg_set_reserved_sz(struct sg_fd *sfp, int want_rsv_sz)
 		use_new_srp = true;
 		xa_for_each_marked(xafp, idx, t_srp, SG_XA_RQ_INACTIVE) {
 			if (t_srp != o_srp && new_sz <= t_srp->sgatp->buflen) {
+				bool is_reuse_bio = test_bit(SG_FRQ_LT_REUSE_BIO,
+							     o_srp->frq_lt_bm);
 				use_new_srp = false;
 				xa_lock_irqsave(xafp, iflags);
-				__clear_bit(SG_FRQ_RESERVED, o_srp->frq_bm);
-				__set_bit(SG_FRQ_RESERVED, t_srp->frq_bm);
+				__clear_bit(SG_FRQ_LT_RESERVED, o_srp->frq_lt_bm);
+				__set_bit(SG_FRQ_LT_RESERVED, t_srp->frq_lt_bm);
+				__assign_bit(SG_FRQ_LT_REUSE_BIO, t_srp->frq_lt_bm, is_reuse_bio);
 				*rapp = t_srp;
 				xa_unlock_irqrestore(xafp, iflags);
 				sg_remove_srp(n_srp);
@@ -4543,8 +4600,10 @@ sg_set_reserved_sz(struct sg_fd *sfp, int want_rsv_sz)
 			idx = o_srp->rq_idx;
 			cxc_srp = __xa_cmpxchg(xafp, idx, o_srp, n_srp, GFP_ATOMIC);
 			if (o_srp == cxc_srp) {
-				__assign_bit(SG_FRQ_RESERVED, n_srp->frq_bm,
-					     test_bit(SG_FRQ_RESERVED, o_srp->frq_bm));
+				__assign_bit(SG_FRQ_LT_RESERVED, n_srp->frq_lt_bm,
+					     test_bit(SG_FRQ_LT_RESERVED, o_srp->frq_lt_bm));
+				__assign_bit(SG_FRQ_LT_REUSE_BIO, n_srp->frq_lt_bm,
+					     test_bit(SG_FRQ_LT_REUSE_BIO, o_srp->frq_lt_bm));
 				*rapp = n_srp;
 				sg_rq_chg_state_force_ulck(n_srp, SG_RQ_INACTIVE);
 				/* no bump of sfp->inactives since replacement */
@@ -4608,7 +4667,7 @@ sg_any_persistent_orphans(struct sg_fd *sfp)
 		if (sg_num_waiting_maybe_acquire(sfp) < 1)
 			return false;
 		xa_for_each_marked(xafp, idx, srp, SG_XA_RQ_AWAIT) {
-			if (test_bit(SG_FRQ_IS_ORPHAN, srp->frq_bm))
+			if (test_bit(SG_FRQ_PC_IS_ORPHAN, srp->frq_pc_bm))
 				return true;
 		}
 	}
@@ -4771,7 +4830,7 @@ sg_extended_bool_flags(struct sg_fd *sfp, struct sg_extended_info *seip)
 		if (rs_sfp && !IS_ERR_OR_NULL(rs_sfp->rsv_arr[0])) {
 			struct sg_request *res_srp = rs_sfp->rsv_arr[0];
 
-			if (atomic_read_acquire(&res_srp->rq_st) == SG_RQ_SHR_SWAP)
+			if (atomic_read(&res_srp->rq_st) == SG_RQ_SHR_SWAP)
 				c_flgs_val_out |= SG_CTL_FLAGM_READ_SIDE_FINI;
 			else
 				c_flgs_val_out &= ~SG_CTL_FLAGM_READ_SIDE_FINI;
@@ -5215,14 +5274,14 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp, uns
 		val = -1;
 		if (test_bit(SG_FFD_PREFER_TAG, sfp->ffd_bm)) {
 			xa_for_each_marked(&sfp->srp_arr, idx, srp, SG_XA_RQ_AWAIT) {
-				if (!test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm)) {
+				if (!test_bit(SG_FRQ_PC_SYNC_INVOC, srp->frq_pc_bm)) {
 					val = srp->tag;
 					break;
 				}
 			}
 		} else {
 			xa_for_each_marked(&sfp->srp_arr, idx, srp, SG_XA_RQ_AWAIT) {
-				if (!test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm)) {
+				if (!test_bit(SG_FRQ_PC_SYNC_INVOC, srp->frq_pc_bm)) {
 					val = srp->pack_id;
 					break;
 				}
@@ -5471,9 +5530,9 @@ sg_sfp_blk_poll(struct sg_fd *sfp, int loop_count)
 	xa_lock_irqsave(xafp, iflags);
 	xa_for_each(xafp, idx, srp) {
 		if ((srp->rq_flags & SGV4_FLAG_HIPRI) &&
-		    !test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm) &&
+		    !test_bit(SG_FRQ_PC_SYNC_INVOC, srp->frq_pc_bm) &&
 		    atomic_read(&srp->rq_st) == SG_RQ_INFLIGHT &&
-		    test_bit(SG_FRQ_ISSUED, srp->frq_bm)) {
+		    test_bit(SG_FRQ_PC_ISSUED, srp->frq_pc_bm)) {
 			xa_unlock_irqrestore(xafp, iflags);
 			n = sg_srp_q_blk_poll(srp, q, loop_count);
 			if (n == -ENODATA)
@@ -5655,7 +5714,7 @@ sg_mmap(struct file *filp, struct vm_area_struct *vma)
 	}
 	if (srp->sgat_h.page_order > 0 || req_sz > (unsigned long)srp->sgat_h.buflen) {
 		sg_remove_srp(srp);
-		set_bit(SG_FRQ_FOR_MMAP, srp->frq_bm);
+		set_bit(SG_FRQ_PC_FOR_MMAP, srp->frq_pc_bm);
 		res = sg_mk_sgat(srp, sfp, req_sz);
 		if (res) {
 			SG_LOG(1, sfp, "%s: sg_mk_sgat failed, wanted=%lu\n", __func__, req_sz);
@@ -5688,7 +5747,7 @@ sg_uc_rq_end_io_orphaned(struct work_struct *work)
 		return;
 	}
 	SG_LOG(3, sfp, "%s: srp=0x%pK\n", __func__, srp);
-	if (test_bit(SG_FRQ_DEACT_ORPHAN, srp->frq_bm)) {
+	if (test_bit(SG_FRQ_PC_DEACT_ORPHAN, srp->frq_pc_bm)) {
 		sg_finish_scsi_blk_rq(srp);	/* clean up orphan case */
 		sg_deact_request(sfp, srp);
 	}
@@ -5730,7 +5789,7 @@ sg_rq_end_io(struct request *rqq, blk_status_t status)
 			srp->in_resid = a_resid;
 		}
 	}
-	if (unlikely(test_bit(SG_FRQ_ABORTING, srp->frq_bm)) && sg_result_is_good(rq_result))
+	if (unlikely(test_bit(SG_FRQ_PC_ABORTING, srp->frq_pc_bm)) && sg_result_is_good(rq_result))
 		srp->rq_result |= (DRIVER_HARD << 24);
 
 	SG_LOG(6, sfp, "%s: pack/tag_id=%d/%d, cmd=0x%x, res=0x%x\n", __func__, srp->pack_id,
@@ -5764,19 +5823,19 @@ sg_rq_end_io(struct request *rqq, blk_status_t status)
 		}
 	}
 	srp->sense_len = slen;
-	if (unlikely(test_bit(SG_FRQ_IS_ORPHAN, srp->frq_bm))) {
+	if (unlikely(test_bit(SG_FRQ_PC_IS_ORPHAN, srp->frq_pc_bm))) {
 		if (test_bit(SG_FFD_KEEP_ORPHAN, sfp->ffd_bm)) {
-			__clear_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm);
+			__clear_bit(SG_FRQ_PC_SYNC_INVOC, srp->frq_pc_bm);
 		} else {
 			rqq_state = SG_RQ_BUSY;
-			__set_bit(SG_FRQ_DEACT_ORPHAN, srp->frq_bm);
+			__set_bit(SG_FRQ_PC_DEACT_ORPHAN, srp->frq_pc_bm);
 		}
 	}
 	xa_lock_irqsave(&sfp->srp_arr, iflags);
-	__set_bit(SG_FRQ_ISSUED, srp->frq_bm);
+	__set_bit(SG_FRQ_PC_ISSUED, srp->frq_pc_bm);
 	sg_rq_chg_state_force_ulck(srp, rqq_state);	/* normally --> SG_RQ_AWAIT_RCV */
 	WRITE_ONCE(srp->rqq, NULL);
-	if (test_bit(SG_FRQ_COUNT_ACTIVE, srp->frq_bm)) {
+	if (test_bit(SG_FRQ_PC_COUNT_ACTIVE, srp->frq_pc_bm)) {
 		int num = atomic_inc_return(&sfp->waiting);
 
 		if (num < 2) {
@@ -6200,6 +6259,7 @@ sg_rq_map_kern(struct sg_request *srp, struct request_queue *q, struct request *
 static int
 sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 {
+	bool bump_bio_get = false;
 	bool no_dxfer, us_xfer;
 	int res = 0;
 	int dlen = cwrp->dlen;
@@ -6286,7 +6346,7 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 	srp->cmd_opcode = scsi_rp->cmd[0];
 	no_dxfer = dlen <= 0 || dxfer_dir == SG_DXFER_NONE;
 	us_xfer = !(rq_flags & (SG_FLAG_NO_DXFER | SG_FLAG_MMAP_IO));
-	__assign_bit(SG_FRQ_US_XFER, srp->frq_bm, !no_dxfer && us_xfer);
+	__assign_bit(SG_FRQ_PC_US_XFER, srp->frq_pc_bm, !no_dxfer && us_xfer);
 	rqq->end_io_data = srp;
 	scsi_rp->retries = SG_DEFAULT_RETRIES;
 	req_schp = srp->sgatp;
@@ -6301,7 +6361,24 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 		md = NULL;
 		if (IS_ENABLED(CONFIG_SCSI_PROC_FS))
 			cp = "direct_io, ";
-	} else {	/* normal IO and failed conditions for dio path */
+	} else if (test_bit(SG_FFD_CAN_REUSE_BIO, sfp->ffd_bm)) {
+		if (test_bit(SG_FRQ_LT_REUSE_BIO, srp->frq_lt_bm)) {
+			if (srp->bio) {
+				res = blk_rq_append_bio(rqq, &srp->bio);
+				if (res)
+					SG_LOG(1, sfp, "%s: blk_rq_append_bio err=%d\n", __func__,
+					       res);
+				md = NULL;
+				atomic_inc(&sg_tmp_count_reused_bios);
+			} else {
+				res = -EPROTO;
+			}
+			goto fini;
+		} else {	/* first use of bio, almost normal setup */
+			md = &map_data;
+			bump_bio_get = true;
+		}
+	} else {	/* normal indirect IO */
 		md = &map_data;
 	}
 
@@ -6355,6 +6432,12 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 #endif
 	} else {	/* transfer data to/from kernel buffers */
 		res = sg_rq_map_kern(srp, q, rqq, r0w);
+		if (res)
+			goto fini;
+		if (bump_bio_get) {	/* keep bio alive to re-use next time */
+			set_bit(SG_FRQ_LT_REUSE_BIO, srp->frq_lt_bm);
+			bio_get(rqq->bio);	/* _put() in sg_svb_cleanup() */
+		}
 	}
 fini:
 	if (unlikely(res)) {		/* failure, free up resources */
@@ -6385,11 +6468,12 @@ sg_finish_scsi_blk_rq(struct sg_request *srp)
 	int ret;
 	struct sg_fd *sfp = srp->parentfp;
 	struct request *rqq = READ_ONCE(srp->rqq);
+	struct bio *bio;
 	__maybe_unused char b[32];
 
 	SG_LOG(4, sfp, "%s: srp=0x%pK%s\n", __func__, srp, sg_get_rsv_str_lck(srp, " ", "",
 									      sizeof(b), b));
-	if (test_and_clear_bit(SG_FRQ_COUNT_ACTIVE, srp->frq_bm)) {
+	if (test_and_clear_bit(SG_FRQ_PC_COUNT_ACTIVE, srp->frq_pc_bm)) {
 		if (atomic_dec_and_test(&sfp->submitted))
 			clear_bit(SG_FFD_HIPRI_SEEN, sfp->ffd_bm);
 		atomic_dec_return_release(&sfp->waiting);
@@ -6402,16 +6486,18 @@ sg_finish_scsi_blk_rq(struct sg_request *srp)
 			scsi_req_free_cmd(scsi_req(rqq));
 		blk_put_request(rqq);
 	}
-	if (likely(srp->bio)) {
-		bool us_xfer = test_bit(SG_FRQ_US_XFER, srp->frq_bm);
-		struct bio *bio = srp->bio;
+	bio = srp->bio;
+	if (likely(bio)) {
+		bool us_xfer = test_bit(SG_FRQ_PC_US_XFER, srp->frq_pc_bm);
 
-		srp->bio = NULL;
-		if (us_xfer && bio) {
+		if (us_xfer) {
 			ret = blk_rq_unmap_user(bio);
 			if (unlikely(ret))	/* -EINTR (-4) can be ignored */
 				SG_LOG(6, sfp, "%s: blk_rq_unmap_user() --> %d\n", __func__, ret);
-		}
+			srp->bio = NULL;
+		} else if (!test_bit(SG_FRQ_LT_REUSE_BIO, srp->frq_lt_bm)) {
+			srp->bio = NULL;
+		} /* else may be able to re-use this bio [mrq, uniform svb] */
 	}
 	/* In worst case, READ data returned to user space by this point */
 }
@@ -6444,7 +6530,8 @@ sg_mk_sgat(struct sg_request *srp, struct sg_fd *sfp, int minlen)
 		return -ENOMEM;
 
 	/* elem_sz must be power of 2 and >= PAGE_SIZE */
-	elem_sz = test_bit(SG_FRQ_FOR_MMAP, srp->frq_bm) ? (int)PAGE_SIZE : sfp->sgat_elem_sz;
+	elem_sz = test_bit(SG_FRQ_PC_FOR_MMAP, srp->frq_pc_bm) ? (int)PAGE_SIZE :
+								 sfp->sgat_elem_sz;
 	if (sdp && unlikely(sdp->device->host->unchecked_isa_dma))
 		mask_ap |= GFP_DMA;
 	o_order = get_order(elem_sz);
@@ -6606,7 +6693,7 @@ sg_find_srp_by_id(struct sg_fd *sfp, int id, bool is_tag)
 		for (srp = xa_find(xafp, &idx, end_idx, SG_XA_RQ_AWAIT);
 		     srp;
 		     srp = xa_find_after(xafp, &idx, end_idx, SG_XA_RQ_AWAIT)) {
-			if (test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm))
+			if (test_bit(SG_FRQ_PC_SYNC_INVOC, srp->frq_pc_bm))
 				continue;
 			if (unlikely(is_tag)) {
 				if (srp->tag != id)
@@ -6639,7 +6726,7 @@ sg_find_srp_by_id(struct sg_fd *sfp, int id, bool is_tag)
 		for (srp = xa_find(xafp, &idx, end_idx, SG_XA_RQ_AWAIT);
 		     srp;
 		     srp = xa_find_after(xafp, &idx, end_idx, SG_XA_RQ_AWAIT)) {
-			if (test_bit(SG_FRQ_SYNC_INVOC, srp->frq_bm))
+			if (test_bit(SG_FRQ_PC_SYNC_INVOC, srp->frq_pc_bm))
 				continue;
 			res = sg_rq_chg_state(srp, SG_RQ_AWAIT_RCV, SG_RQ_BUSY);
 			if (likely(res == 0)) {
@@ -6849,7 +6936,7 @@ sg_setup_req_new_srp(struct sg_comm_wr_t *cwrp, bool new_rsv_srp, bool no_reqs,
 	       r_srp);
 	if (new_rsv_srp) {
 		fp->rsv_arr[ra_idx] = r_srp;
-		set_bit(SG_FRQ_RESERVED, r_srp->frq_bm);
+		set_bit(SG_FRQ_LT_RESERVED, r_srp->frq_lt_bm);
 		r_srp->sh_srp = NULL;
 	}
 	xa_lock_irqsave(xafp, iflags);
@@ -6883,7 +6970,7 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var)
 	bool no_reqs = false;
 	bool ws_rq = false;
 	bool try_harder = false;
-	bool keep_frq_bm = false;
+	bool keep_frq_pc_bm = false;
 	bool second = false;
 	int res, ra_idx, l_used_idx;
 	int dlen = cwrp->dlen;
@@ -6906,7 +6993,7 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var)
 				r_srp = NULL;
 			} else {
 				atomic_dec(&fp->inactives);
-				keep_frq_bm = true;
+				keep_frq_pc_bm = true;
 				r_srp->sh_srp = NULL;
 				goto final_setup;
 			}
@@ -6917,7 +7004,7 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var)
 			goto maybe_new;
 		}
 		r_srp = fp->rsv_arr[ra_idx];
-		sr_st = atomic_read_acquire(&r_srp->rq_st);
+		sr_st = atomic_read(&r_srp->rq_st);
 		if (sr_st == SG_RQ_INACTIVE) {
 			res = sg_rq_chg_state(r_srp, sr_st, SG_RQ_BUSY);
 			if (unlikely(res)) {
@@ -6955,13 +7042,13 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var)
 		ws_rq = true;
 		r_srp = cwrp->possible_srp;
 		if (r_srp) {
-			sr_st = atomic_read_acquire(&r_srp->rq_st);
+			sr_st = atomic_read(&r_srp->rq_st);
 			if (sr_st == SG_RQ_INACTIVE && dlen <= r_srp->sgat_h.buflen) {
 				res = sg_rq_chg_state(r_srp, sr_st, SG_RQ_BUSY);
 				if (likely(res == 0)) {
 					/* possible_srp bypasses loop to find candidate */
 					mk_new_srp = false;
-					keep_frq_bm = true;
+					keep_frq_pc_bm = true;
 					goto final_setup;
 				}
 			}
@@ -6990,7 +7077,8 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var)
 		s_idx = (l_used_idx < 0) ? 0 : l_used_idx;
 		if (l_used_idx >= 0 && xa_get_mark(xafp, s_idx, SG_XA_RQ_INACTIVE)) {
 			r_srp = xa_load(xafp, s_idx);
-			if (r_srp && (allow_rsv || !test_bit(SG_FRQ_RESERVED, r_srp->frq_bm))) {
+			if (r_srp &&
+			    (allow_rsv || !test_bit(SG_FRQ_LT_RESERVED, r_srp->frq_lt_bm))) {
 				if (r_srp->sgat_h.buflen <= SG_DEF_SECTOR_SZ) {
 					if (sg_rq_chg_state(r_srp, SG_RQ_INACTIVE,
 							    SG_RQ_BUSY) == 0) {
@@ -7002,7 +7090,7 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var)
 			}
 		}
 		xa_for_each_marked(xafp, idx, r_srp, SG_XA_RQ_INACTIVE) {
-			if (allow_rsv || !test_bit(SG_FRQ_RESERVED, r_srp->frq_bm)) {
+			if (allow_rsv || !test_bit(SG_FRQ_LT_RESERVED, r_srp->frq_lt_bm)) {
 				if (r_srp->sgat_h.buflen <= SG_DEF_SECTOR_SZ) {
 					if (sg_rq_chg_state(r_srp, SG_RQ_INACTIVE, SG_RQ_BUSY))
 						continue;
@@ -7056,7 +7144,7 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var)
 			     r_srp;
 			     r_srp = xa_find_after(xafp, &idx, end_idx, SG_XA_RQ_INACTIVE)) {
 				if (dlen <= r_srp->sgat_h.buflen &&
-				    !test_bit(SG_FRQ_RESERVED, r_srp->frq_bm)) {
+				    !test_bit(SG_FRQ_LT_RESERVED, r_srp->frq_lt_bm)) {
 					if (sg_rq_chg_state(r_srp, SG_RQ_INACTIVE, SG_RQ_BUSY))
 						continue;
 					atomic_dec(&fp->inactives);
@@ -7090,19 +7178,8 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var)
 			goto start_again;
 	}
 final_setup:
-	if (!keep_frq_bm) {
-		/* keep SG_FRQ_RESERVED setting from prior/new r_srp; clear rest */
-		bool is_rsv = test_bit(SG_FRQ_RESERVED, r_srp->frq_bm);
-
-		r_srp->frq_bm[0] = 0;
-		if (is_rsv)
-			set_bit(SG_FRQ_RESERVED, r_srp->frq_bm);
-		/* r_srp inherits these flags from cwrp->frq_bm */
-		if (test_bit(SG_FRQ_IS_V4I, cwrp->frq_bm))
-			set_bit(SG_FRQ_IS_V4I, r_srp->frq_bm);
-		if (test_bit(SG_FRQ_SYNC_INVOC, cwrp->frq_bm))
-			set_bit(SG_FRQ_SYNC_INVOC, r_srp->frq_bm);
-	}
+	if (!keep_frq_pc_bm)
+		r_srp->frq_pc_bm[0] = cwrp->frq_pc_bm[0];
 	r_srp->sgatp->dlen = dlen;	/* must be <= r_srp->sgat_h.buflen */
 	r_srp->sh_var = sh_var;
 	r_srp->cmd_opcode = 0xff;  /* set invalid opcode (VS), 0x0 is TUR */
@@ -7143,7 +7220,6 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var)
 static void
 sg_deact_request(struct sg_fd *sfp, struct sg_request *srp)
 {
-	bool is_rsv;
 	enum sg_rq_state sr_st;
 	u8 *sbp;
 
@@ -7152,15 +7228,12 @@ sg_deact_request(struct sg_fd *sfp, struct sg_request *srp)
 	SG_LOG(3, sfp, "%s: srp=%pK\n", __func__, srp);
 	sbp = srp->sense_bp;
 	srp->sense_bp = NULL;
-	sr_st = atomic_read_acquire(&srp->rq_st);
+	sr_st = atomic_read(&srp->rq_st);
 	if (sr_st != SG_RQ_SHR_SWAP) {
 		/* Called from many contexts, don't know whether xa locks held. So assume not. */
 		sg_rq_chg_state_force(srp, SG_RQ_INACTIVE);
 		atomic_inc(&sfp->inactives);
-		is_rsv = test_bit(SG_FRQ_RESERVED, srp->frq_bm);
-		WRITE_ONCE(srp->frq_bm[0], 0);
-		if (is_rsv)
-			__set_bit(SG_FRQ_RESERVED, srp->frq_bm);
+		WRITE_ONCE(srp->frq_pc_bm[0], 0);
 		srp->tag = SG_TAG_WILDCARD;
 		srp->in_resid = 0;
 		srp->rq_info = 0;
@@ -7250,7 +7323,7 @@ sg_add_sfp(struct sg_device *sdp, struct file *filp)
 		}
 		srp->rq_idx = idx;
 		srp->parentfp = sfp;
-		__set_bit(SG_FRQ_RESERVED, srp->frq_bm);
+		__set_bit(SG_FRQ_LT_RESERVED, srp->frq_lt_bm);
 		sg_rq_chg_state_force_ulck(srp, SG_RQ_INACTIVE);
 		atomic_inc(&sfp->inactives);
 		xa_unlock_irqrestore(xafp, iflags);
@@ -7747,8 +7820,8 @@ sg_proc_debug_fd(struct sg_fd *fp, char *obp, int len, unsigned long idx, bool r
 		if (xa_get_mark(&fp->srp_arr, idx, SG_XA_RQ_INACTIVE))
 			continue;
 		if (set_debug)
-			n += scnprintf(obp + n, len - n, "     rq_bm=0x%lx", srp->frq_bm[0]);
-		else if (test_bit(SG_FRQ_ABORTING, srp->frq_bm))
+			n += scnprintf(obp + n, len - n, "     rq_pc_bm=0x%lx", srp->frq_pc_bm[0]);
+		else if (test_bit(SG_FRQ_PC_ABORTING, srp->frq_pc_bm))
 			n += scnprintf(obp + n, len - n, "     abort>> ");
 		n += sg_proc_debug_sreq(srp, fp->timeout, t_in_ns, false, obp + n, len - n);
 		++k;
@@ -7765,7 +7838,7 @@ sg_proc_debug_fd(struct sg_fd *fp, char *obp, int len, unsigned long idx, bool r
 		if (k == 0)
 			n += scnprintf(obp + n, len - n, "   Inactives:\n");
 		if (set_debug)
-			n += scnprintf(obp + n, len - n, "     rq_bm=0x%lx", srp->frq_bm[0]);
+			n += scnprintf(obp + n, len - n, "     rq_lt_bm=0x%lx", srp->frq_lt_bm[0]);
 		n += sg_proc_debug_sreq(srp, fp->timeout, t_in_ns, true, obp + n, len - n);
 		++k;
 		if ((k % 8) == 0) {	/* don't hold up things too long */
@@ -7826,8 +7899,9 @@ sg_proc_seq_show_debug(struct seq_file *s, void *v, bool reduced)
 
 	b1[0] = '\0';
 	if (it && it->index == 0)
-		seq_printf(s, "max_active_device=%d  def_reserved_size=%d\n", (int)it->max,
-			   def_reserved_size);
+		seq_printf(s, "max_active_device=%d  def_reserved_size=%d  num_reused_bios=%d\n",
+			   (int)it->max, def_reserved_size,
+			   atomic_read(&sg_tmp_count_reused_bios));
 	fdi_p = it ? &it->fd_index : &k;
 	bp = kzalloc(bp_len, __GFP_NOWARN | GFP_KERNEL);
 	if (unlikely(!bp)) {
-- 
2.25.1


* [PATCH v18 80/83] sg: expand bvec usage; re-use bio_s
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
                   ` (79 preceding siblings ...)
  2021-04-27 21:57 ` [PATCH v18 79/83] sg: mrq: if uniform svb then re-use bio_s Douglas Gilbert
@ 2021-04-27 21:57 ` Douglas Gilbert
  2021-04-27 21:57 ` [PATCH v18 81/83] sg: blk_poll/hipri work for mrq Douglas Gilbert
                   ` (2 subsequent siblings)
  83 siblings, 0 replies; 86+ messages in thread
From: Douglas Gilbert @ 2021-04-27 21:57 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

Rework sg_rq_map_kern() to use the newer bio_add_pc_page() and
blk_rq_append_bio() functions instead of blk_rq_map_kern(). Build a
single bio with multiple bvec elements, each holding one or more
pages. This requires directly manipulating the request object's
nr_phys_segments field.
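The segment bookkeeping mentioned above can be illustrated with a
small user-space model. This is not the driver's code: struct
toy_bvec, toy_build_bvecs(), toy_bvecs_len() and the 4096-byte page
size are hypothetical. The model assumes that each bvec element
covers a run of physically contiguous pages and counts as one
physical segment, so the value stored in nr_phys_segments would
equal the number of elements built.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SZ 4096u	/* assumed page size for this model */

/* Hypothetical stand-in for a bio_vec: a run of contiguous pages. */
struct toy_bvec {
	unsigned long first_pfn;	/* first page frame number in the run */
	unsigned int npages;		/* 1 or more contiguous pages */
};

/*
 * Pack the page frame numbers in pfns[0..n-1] into bvec elements,
 * growing the current element while the next page is physically
 * contiguous with it. Returns the number of elements written to out[],
 * which this model treats as the request's physical segment count.
 */
static size_t
toy_build_bvecs(const unsigned long *pfns, size_t n, struct toy_bvec *out,
		size_t max_out)
{
	size_t nvec = 0;

	for (size_t i = 0; i < n; ++i) {
		if (nvec > 0 &&
		    out[nvec - 1].first_pfn + out[nvec - 1].npages == pfns[i]) {
			++out[nvec - 1].npages;	/* contiguous: grow current bvec */
			continue;
		}
		assert(nvec < max_out);		/* caller sized out[] too small */
		out[nvec].first_pfn = pfns[i];
		out[nvec].npages = 1;
		++nvec;
	}
	return nvec;
}

/* Total data length covered by the bvec array, in bytes. */
static unsigned long
toy_bvecs_len(const struct toy_bvec *v, size_t nvec)
{
	unsigned long len = 0;

	for (size_t i = 0; i < nvec; ++i)
		len += (unsigned long)v[i].npages * TOY_PAGE_SZ;
	return len;
}
```

A real implementation would also have to respect the queue's
maximum segment size and segment count limits; this model ignores
both.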

Re-use bio_s. Having built a complex bio, why throw it away after
one use if the driver knows (e.g. in mrq svb mode) that it will
build exactly the same bio again and again? This requires careful
pairing of bio_get() and bio_put() calls, plus remembering the five
bio fields that bio_reset() clears.
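The re-use lifetime rules can be sketched in user space as well. All
names here (struct toy_bio, toy_bio_get(), toy_bio_put(),
toy_bio_reset() and the helpers) are hypothetical stand-ins for
struct bio, bio_get(), bio_put() and bio_reset(). The saved fields
mirror the bi_end_io, bi_size, bi_opf and bi_vcnt members this patch
adds to struct sg_slice_hdr4; the fifth field mentioned above is not
visible in this excerpt, so the model stops at four. The assumption
is that one extra reference keeps the bio alive across completions
and that the saved values are written back after each reset.

```c
#include <assert.h>
#include <stdlib.h>

typedef void (*toy_end_io_t)(void *priv);

/* Hypothetical stand-in for struct bio; only the fields this model needs. */
struct toy_bio {
	int refcnt;		/* not touched by toy_bio_reset() */
	toy_end_io_t end_io;	/* cleared by reset */
	unsigned int size;	/* cleared by reset */
	unsigned short opf;	/* cleared by reset */
	unsigned short vcnt;	/* cleared by reset */
};

/* Copy of the fields that must survive a reset (cf. sg_slice_hdr4). */
struct toy_saved {
	toy_end_io_t end_io;
	unsigned int size;
	unsigned short opf;
	unsigned short vcnt;
};

static int toy_frees;	/* counts frees so a test can observe the lifetime */

static void toy_bio_get(struct toy_bio *b)
{
	++b->refcnt;
}

static void toy_bio_put(struct toy_bio *b)
{
	assert(b->refcnt > 0);
	if (--b->refcnt == 0) {
		free(b);
		++toy_frees;
	}
}

static void toy_bio_reset(struct toy_bio *b)
{
	b->end_io = NULL;
	b->size = 0;
	b->opf = 0;
	b->vcnt = 0;
}

/* After the first build: remember the fields and take an extra reference. */
static void toy_keep_for_reuse(struct toy_bio *b, struct toy_saved *s)
{
	s->end_io = b->end_io;
	s->size = b->size;
	s->opf = b->opf;
	s->vcnt = b->vcnt;
	toy_bio_get(b);
}

/* Before each later submission: reset, then restore the saved fields. */
static void toy_prepare_reuse(struct toy_bio *b, const struct toy_saved *s)
{
	toy_bio_reset(b);
	b->end_io = s->end_io;
	b->size = s->size;
	b->opf = s->opf;
	b->vcnt = s->vcnt;
}
```

In the patch, the matching final bio_put() happens in
sg_svb_srp_cleanup().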

Use the new SG_FRQ_PC_PART_MRQ bit to mark more clearly that a
request belongs to a larger multiple requests (mrq) submission.
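A minimal model of how such a mark can be used when searching for
mrq requests (compare the new test_bit() check in
sg_match_first_mrq_after() in the diff). struct toy_req,
toy_test_bit() and toy_find_mrq_req() are hypothetical; only the bit
number 11 is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_FRQ_PART_MRQ 11	/* same bit number as SG_FRQ_PC_PART_MRQ here */

/* Hypothetical, simplified request: a flag word plus the owning mrq id. */
struct toy_req {
	unsigned long flags;
	int pack_id_of_mrq;	/* 0 means "not set" */
};

static int toy_test_bit(int nr, unsigned long word)
{
	return (int)((word >> nr) & 1ul);
}

/*
 * Return the index of the first request that is marked as part of an
 * mrq submission and carries mrq id 'id', or -1 if there is none.
 * Requests without the mark are skipped, whatever their id field holds.
 */
static int toy_find_mrq_req(const struct toy_req *rqs, size_t n, int id)
{
	assert(id != 0);	/* mrq ids cannot be zero */
	for (size_t i = 0; i < n; ++i) {
		if (!toy_test_bit(TOY_FRQ_PART_MRQ, rqs[i].flags))
			continue;
		if (rqs[i].pack_id_of_mrq == id)
			return (int)i;
	}
	return -1;
}
```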

Change the error strategy in sg_svb_mrq_first_come() and
sg_svb_mrq_ordered(): once started, they must keep moving forward,
even in the face of errors. Otherwise there will be memory leaks
that are very hard to detect, or worse. Read-side requests in
particular must not be left stranded.
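The forward-only error strategy can be modelled as a loop that reaps
every read-side completion and never breaks out early. struct
toy_mrq, struct toy_rs_done and the helpers are hypothetical
simplifications of struct sg_mrq_hold and sg_svb_err_process(): the
first errno is kept, bad command statuses are only counted, and a
failed read-side simply has its paired write-side skipped.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-read-side completion record used only by this model. */
struct toy_rs_done {
	int res;		/* 0, or a negative errno from the completion path */
	bool cmd_good;		/* device/transport/driver status all good */
};

/* Hypothetical counterpart of the relevant sg_mrq_hold members. */
struct toy_mrq {
	bool stop_if;		/* caller asked to stop after the first error */
	int s_res;		/* first non-zero res is kept, later ones dropped */
	int dtd_errs;		/* count of bad command statuses */
	int ws_submitted;	/* write-sides actually issued */
	int reaped;		/* read-sides whose completion was processed */
};

/* Record one failure; returns true if further write-sides should be skipped. */
static bool toy_err_process(struct toy_mrq *m, int res)
{
	if (res) {
		if (m->s_res == 0)
			m->s_res = res;
	} else {
		++m->dtd_errs;
	}
	return m->stop_if;
}

/*
 * Reap every read-side completion without breaking out of the loop, so
 * no in-flight request is left stranded. A write-side is only issued for
 * a read-side that succeeded while no stop has been triggered.
 */
static int toy_svb_drain(struct toy_mrq *m, const struct toy_rs_done *d, int n)
{
	bool stop = false;

	for (int k = 0; k < n; ++k) {
		bool rs_fail = false;

		++m->reaped;
		if (d[k].res || !d[k].cmd_good) {
			rs_fail = true;
			stop = toy_err_process(m, d[k].res);
		}
		if (stop || rs_fail)
			continue;	/* skip the paired write-side, keep draining */
		++m->ws_submitted;
	}
	if (m->s_res && m->stop_if)
		stop = true;
	return stop ? -ECANCELED : 0;
}
```

In the patch, sg_process_svb_mrq() converts -ECANCELED back to 0
before returning.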

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 543 ++++++++++++++++++++++++++++------------------
 1 file changed, 328 insertions(+), 215 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 0a0b40a8ab65..26047a8ff1e2 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -145,6 +145,7 @@ enum sg_shr_var {
 #define SG_FRQ_PC_COUNT_ACTIVE	8	/* sfp->submitted + waiting active */
 #define SG_FRQ_PC_ISSUED	9	/* blk_execute_rq_nowait() finished */
 #define SG_FRQ_POLL_SLEPT	10	/* stop re-entry of hybrid_sleep() */
+#define SG_FRQ_PC_PART_MRQ	11	/* this cmd part of mrq array */
 
 /* Bit positions (flags) for sg_fd::ffd_bm bitmask follow */
 #define SG_FFD_FORCE_PACKID	0	/* receive only given pack_id/tag */
@@ -189,7 +190,6 @@ static int sg_allow_dio = SG_ALLOW_DIO_DEF;	/* ignored by code */
 
 static int scatter_elem_sz = SG_SCATTER_SZ;
 static bool no_attach_msg;
-static atomic_t sg_tmp_count_reused_bios;
 
 #define SG_DEF_SECTOR_SZ 512
 
@@ -226,10 +226,14 @@ struct sg_slice_hdr3 {
 
 struct sg_slice_hdr4 {	/* parts of sg_io_v4 object needed in async usage */
 	void __user *sbp;	/* derived from sg_io_v4::response */
+	bio_end_io_t *bi_end_io;
 	u64 usr_ptr;		/* hold sg_io_v4::usr_ptr as given (u64) */
 	int out_resid;
 	u32 wr_offset;		/* from v4::spare_in when flagged; in bytes */
 	u32 wr_len;		/* for shared reqs maybe < read-side */
+	unsigned int bi_size;	/* reuse_bio: from original bio */
+	unsigned short bi_opf;	/* reuse_bio: from original bio */
+	unsigned short bi_vcnt;	/* reuse_bio: from original bio */
 	s16 dir;		/* data xfer direction; SG_DXFER_*  */
 	u16 cmd_len;		/* truncated of sg_io_v4::request_len */
 	u16 max_sb_len;		/* truncated of sg_io_v4::max_response_len */
@@ -260,6 +264,7 @@ struct sg_request {	/* active SCSI command or inactive request */
 	u32 rq_idx;		/* my index within parent's srp_arr */
 	u32 rq_info;		/* info supplied by v3 and v4 interfaces */
 	u32 rq_result;		/* packed scsi request result from LLD */
+	u32 rsv_arr_idx;	/* my index in parentfp->rsv_arr */
 	int in_resid;		/* requested-actual byte count on data-in */
 	int pack_id;		/* v3 pack_id or in v4 request_extra field */
 	int sense_len;		/* actual sense buffer length (data-in) */
@@ -304,9 +309,9 @@ struct sg_fd {		/* holds the state of a file descriptor */
 	struct fasync_struct *async_qp; /* used by asynchronous notification */
 	struct eventfd_ctx *efd_ctxp;	/* eventfd context or NULL */
 	struct xarray srp_arr;	/* xarray of sg_request object pointers */
-	struct sg_request *rsv_arr[SG_MAX_RSV_REQS];
 	struct kref f_ref;
 	struct execute_work ew_fd;  /* harvest all fd resources and lists */
+	struct sg_request *rsv_arr[SG_MAX_RSV_REQS];
 };
 
 struct sg_device { /* holds the state of each scsi generic device */
@@ -354,6 +359,7 @@ struct sg_mrq_hold {	/* for passing context between multiple requests (mrq) func
 	unsigned ordered_wr:1;
 	int id_of_mrq;
 	int s_res;		/* secondary error: some-good-then-error; in co.spare_out */
+	int dtd_errs;		/* incremented for each driver/transport/device error */
 	u32 cdb_mxlen;		/* cdb length in cdb_ap, actual be may less */
 	u32 tot_reqs;		/* total number of requests and cdb_s */
 	struct sg_comm_wr_t *cwrp;	/* cwrp->h4p is mrq control object */
@@ -397,8 +403,7 @@ static struct sg_request *sg_mk_srp_sgat(struct sg_fd *sfp, bool first, int db_l
 static int sg_abort_req(struct sg_fd *sfp, struct sg_request *srp);
 static int sg_rq_chg_state(struct sg_request *srp, enum sg_rq_state old_st,
 			   enum sg_rq_state new_st);
-static int sg_finish_rs_rq(struct sg_fd *sfp, struct sg_request *rs_srp,
-			   bool even_if_in_ws);
+static int sg_finish_rs_rq(struct sg_fd *sfp, struct sg_request *rs_srp, bool even_if_in_ws);
 static void sg_rq_chg_state_force(struct sg_request *srp, enum sg_rq_state new_st);
 static int sg_sfp_blk_poll(struct sg_fd *sfp, int loop_count);
 static int sg_srp_q_blk_poll(struct sg_request *srp, struct request_queue *q,
@@ -1010,6 +1015,14 @@ sg_v4h_partial_zero(struct sg_io_v4 *h4p)
 	memset((u8 *)h4p + off, 0, SZ_SG_IO_V4 - off);
 }
 
+static inline bool
+sg_v4_cmd_good(struct sg_io_v4 *h4p)
+{
+	return (scsi_status_is_good(h4p->device_status) &&
+		(h4p->driver_status & 0xf) == 0 &&
+		(h4p->transport_status & 0xff) == 0);
+}
+
 static void
 sg_sgat_zero(struct sg_scatter_hold *sgatp, int off, int nbytes)
 {
@@ -1198,33 +1211,28 @@ sg_mrq_get_ready_srp(struct sg_fd *sfp, struct sg_request **srpp)
 
 /* N.B. After this function is completed what srp points to should be considered invalid. */
 static int
-sg_mrq_1complet(struct sg_mrq_hold *mhp, struct sg_fd *sfp, struct sg_request *srp)
+sg_mrq_1complet(struct sg_mrq_hold *mhp, struct sg_request *srp)
 {
-	int s_res, indx;
+	int res, indx;
 	int tot_reqs = mhp->tot_reqs;
+	struct sg_fd *sfp = srp->parentfp;
 	struct sg_io_v4 *hp;
 	struct sg_io_v4 *a_hds = mhp->a_hds;
 	struct sg_io_v4 *cop = mhp->cwrp->h4p;
 
-	if (unlikely(!srp))
-		return -EPROTO;
 	indx = srp->s_hdr4.mrq_ind;
-	if (unlikely(srp->parentfp != sfp)) {
-		SG_LOG(1, sfp, "%s: mrq_ind=%d, sfp out-of-sync\n", __func__, indx);
-		return -EPROTO;
-	}
 	SG_LOG(3, sfp, "%s: %s, mrq_ind=%d, pack_id=%d\n", __func__, sg_side_str(srp), indx,
 	       srp->pack_id);
 	if (unlikely(indx < 0 || indx >= tot_reqs))
 		return -EPROTO;
 	hp = a_hds + indx;
-	s_res = sg_receive_v4(sfp, srp, NULL, hp);
-	if (unlikely(!sg_result_is_good(srp->rq_result)))
+	res = sg_receive_v4(sfp, srp, NULL, hp);
+	if (unlikely(res))
+		return res;
+	if (unlikely(!sg_v4_cmd_good(hp)))
 		SG_LOG(2, sfp, "%s: %s, bad status: drv/tran/scsi=0x%x/0x%x/0x%x\n",
 		       __func__, sg_side_str(srp), hp->driver_status,
 		       hp->transport_status, hp->device_status);
-	if (unlikely(s_res == -EFAULT))
-		return s_res;
 	hp->info |= SG_INFO_MRQ_FINI;
 	++cop->info;
 	if (cop->din_xfer_len > 0)
@@ -1239,9 +1247,9 @@ sg_mrq_1complet(struct sg_mrq_hold *mhp, struct sg_fd *sfp, struct sg_request *s
 				pr_info("%s: eventfd_signal problem\n", __func__);
 		}
 	} else if (sfp->async_qp && (hp->flags & SGV4_FLAG_SIGNAL)) {
-		s_res = sg_mrq_arr_flush(mhp);
-		if (unlikely(s_res))	/* can only be -EFAULT */
-			return s_res;
+		res = sg_mrq_arr_flush(mhp);
+		if (unlikely(res))
+			return res;
 		kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN);
 	}
 	return 0;
@@ -1382,8 +1390,6 @@ sg_mrq_poll_either(struct sg_fd *sfp, struct sg_fd *sec_sfp, bool *on_sfp)
 		if (sfp) {
 			if (sg_mrq_get_ready_srp(sfp, &srp)) {
 				__set_current_state(TASK_RUNNING);
-				if (!srp)
-					return ERR_PTR(-ENODEV);
 				*on_sfp = true;
 				return srp;
 			}
@@ -1391,8 +1397,6 @@ sg_mrq_poll_either(struct sg_fd *sfp, struct sg_fd *sec_sfp, bool *on_sfp)
 		if (sec_sfp && sfp != sec_sfp) {
 			if (sg_mrq_get_ready_srp(sec_sfp, &srp)) {
 				__set_current_state(TASK_RUNNING);
-				if (!srp)
-					return ERR_PTR(-ENODEV);
 				*on_sfp = false;
 				return srp;
 			}
@@ -1420,14 +1424,18 @@ sg_mrq_complets(struct sg_mrq_hold *mhp, struct sg_fd *sfp, struct sg_fd *sec_sf
 	SG_LOG(3, sfp, "%s: mreqs=%d, sec_reqs=%d\n", __func__, mreqs, sec_reqs);
 	while (mreqs + sec_reqs > 0) {
 		while (mreqs > 0 && sg_mrq_get_ready_srp(sfp, &srp)) {
+			if (IS_ERR(srp))
+				return PTR_ERR(srp);
 			--mreqs;
-			res = sg_mrq_1complet(mhp, sfp, srp);
+			res = sg_mrq_1complet(mhp, srp);
 			if (unlikely(res))
 				return res;
 		}
 		while (sec_reqs > 0 && sg_mrq_get_ready_srp(sec_sfp, &srp)) {
+			if (IS_ERR(srp))
+				return PTR_ERR(srp);
 			--sec_reqs;
-			res = sg_mrq_1complet(mhp, sec_sfp, srp);
+			res = sg_mrq_1complet(mhp, srp);
 			if (unlikely(res))
 				return res;
 		}
@@ -1443,7 +1451,7 @@ sg_mrq_complets(struct sg_mrq_hold *mhp, struct sg_fd *sfp, struct sg_fd *sec_sf
 				mreqs = 0;
 			} else {
 				--mreqs;
-				res = sg_mrq_1complet(mhp, sfp, srp);
+				res = sg_mrq_1complet(mhp, srp);
 				if (unlikely(res))
 					return res;
 			}
@@ -1456,7 +1464,7 @@ sg_mrq_complets(struct sg_mrq_hold *mhp, struct sg_fd *sfp, struct sg_fd *sec_sf
 				sec_reqs = 0;
 			} else {
 				--sec_reqs;
-				res = sg_mrq_1complet(mhp, sec_sfp, srp);
+				res = sg_mrq_1complet(mhp, srp);
 				if (unlikely(res))
 					return res;
 			}
@@ -1470,12 +1478,12 @@ sg_mrq_complets(struct sg_mrq_hold *mhp, struct sg_fd *sfp, struct sg_fd *sec_sf
 			return PTR_ERR(srp);
 		if (on_sfp) {
 			--mreqs;
-			res = sg_mrq_1complet(mhp, sfp, srp);
+			res = sg_mrq_1complet(mhp, srp);
 			if (unlikely(res))
 				return res;
 		} else {
 			--sec_reqs;
-			res = sg_mrq_1complet(mhp, sec_sfp, srp);
+			res = sg_mrq_1complet(mhp, srp);
 			if (unlikely(res))
 				return res;
 		}
@@ -1649,6 +1657,7 @@ sg_mrq_submit(struct sg_fd *rq_sfp, struct sg_mrq_hold *mhp, int pos_in_rq_arr,
 	ul_timeout = msecs_to_jiffies(hp->timeout);
 	__assign_bit(SG_FRQ_PC_SYNC_INVOC, r_cwrp->frq_pc_bm, (int)mhp->from_sg_io);
 	__set_bit(SG_FRQ_PC_IS_V4I, r_cwrp->frq_pc_bm);
+	__set_bit(SG_FRQ_PC_PART_MRQ, r_cwrp->frq_pc_bm);
 	r_cwrp->h4p = hp;
 	r_cwrp->dlen = hp->din_xfer_len ? hp->din_xfer_len : hp->dout_xfer_len;
 	r_cwrp->timeout = min_t(unsigned long, ul_timeout, INT_MAX);
@@ -1704,15 +1713,18 @@ sg_process_most_mrq(struct sg_fd *fp, struct sg_fd *o_sfp, struct sg_mrq_hold *m
 				++other_fp_sent;
 			continue;  /* defer completion until all submitted */
 		}
-		mhp->s_res = sg_wait_poll_for_given_srp(rq_sfp, srp, mhp->hipri);
-		if (unlikely(mhp->s_res)) {
-			if (mhp->s_res == -ERESTARTSYS || mhp->s_res == -ENODEV)
-				return mhp->s_res;
+		res = sg_wait_poll_for_given_srp(rq_sfp, srp, mhp->hipri);
+		if (unlikely(res)) {
+			mhp->s_res = res;
+			if (res == -ERESTARTSYS || res == -ENODEV)
+				return res;
 			break;
 		}
-		res = sg_mrq_1complet(mhp, rq_sfp, srp);
-		if (unlikely(res))
+		res = sg_mrq_1complet(mhp, srp);
+		if (unlikely(res)) {
+			mhp->s_res = res;
 			break;
+		}
 		++num_cmpl;
 	}	/* end of dispatch request and optionally wait response loop */
 	cop->dout_resid = mhp->tot_reqs - num_subm;
@@ -1725,28 +1737,54 @@ sg_process_most_mrq(struct sg_fd *fp, struct sg_fd *o_sfp, struct sg_mrq_hold *m
 	if (mhp->immed)
 		return res;
 	if (likely(res == 0 && (this_fp_sent + other_fp_sent) > 0)) {
-		mhp->s_res = sg_mrq_complets(mhp, fp, o_sfp, this_fp_sent, other_fp_sent);
-		if (unlikely(mhp->s_res == -EFAULT || mhp->s_res == -ERESTARTSYS))
-			res = mhp->s_res;	/* this may leave orphans */
+		res = sg_mrq_complets(mhp, fp, o_sfp, this_fp_sent, other_fp_sent);
+		if (res)
+			mhp->s_res = res;	/* this may leave orphans */
 	}
 	if (mhp->id_of_mrq)	/* can no longer do a mrq abort */
 		atomic_set(&fp->mrq_id_abort, 0);
 	return res;
 }
 
+static bool
+sg_svb_err_process(struct sg_mrq_hold *mhp, int m_ind, struct sg_fd *fp, int res, bool rs)
+{
+	__maybe_unused const char *ss = rs ? "read-side" : "write-side";
+
+	if (res) {
+		if (mhp->s_res == 0)
+			mhp->s_res = res;
+		SG_LOG(1, fp, "%s: %s failure, res=%d\n", __func__, ss, res);
+	} else {
+		struct sg_io_v4 *hp = mhp->a_hds + m_ind;
+
+		++mhp->dtd_errs;
+		SG_LOG(2, fp, "%s: %s, bad status: drv/trans_host/scsi=0x%x/0x%x/0x%x\n",
+		       __func__, ss, hp->driver_status, hp->transport_status, hp->device_status);
+	}
+	return mhp->stop_if;
+}
+
+static inline void
+sg_svb_zero_elem(struct sg_svb_elem *svb_ap, int m)
+{
+	svb_ap[m].rs_srp = NULL;
+	svb_ap[m].prev_ws_srp = NULL;
+}
+
 /* For multiple requests (mrq) share variable blocking (svb) with no SGV4_FLAG_ORDERED_WR */
 static int
 sg_svb_mrq_first_come(struct sg_fd *fp, struct sg_fd *o_sfp, struct sg_mrq_hold *mhp, int ra_ind,
 		      int *num_submp)
 {
 	bool chk_oth_first = false;
+	bool stop_triggered = false;
 	bool is_first = true;
+	bool rs_fail;
 	enum sg_rq_state rq_st;
 	int this_fp_sent = 0;
 	int other_fp_sent = 0;
-	int res = 0;
-	int first_err = 0;
-	int k, m, idx, ws_pos, num_reads, sent, dir;
+	int k, m, res, idx, ws_pos, num_reads, sent, dir, m_ind;
 	struct sg_io_v4 *hp = mhp->a_hds + ra_ind;
 	struct sg_request *srp;
 	struct sg_request *rs_srp;
@@ -1778,15 +1816,13 @@ sg_svb_mrq_first_come(struct sg_fd *fp, struct sg_fd *o_sfp, struct sg_mrq_hold
 		}
 		if (IS_ERR(srp)) {
 			mhp->s_res = PTR_ERR(srp);
-			if (first_err == 0)
-				first_err = mhp->s_res;
 			SG_LOG(1, fp, "%s: sg_mrq_submit() err: %d\n", __func__, mhp->s_res);
 			break;	/* stop doing rs submits */
 		}
 		++*num_submp;
+		srp->s_hdr4.mrq_ind = ra_ind;
 		if (hp->din_xfer_len > 0)
 			svb_arr[k].rs_srp = srp;
-		srp->s_hdr4.mrq_ind = ra_ind;
 		if (mhp->chk_abort)
 			atomic_set(&srp->s_hdr4.pack_id_of_mrq, mhp->id_of_mrq);
 	}	/* end of read-side submission, write-side defer loop */
@@ -1794,26 +1830,41 @@ sg_svb_mrq_first_come(struct sg_fd *fp, struct sg_fd *o_sfp, struct sg_mrq_hold
 	num_reads = k;
 	sent = this_fp_sent + other_fp_sent;
 
+	/*
+	 * It is important _not_ to break out of this loop as that will lead to hard to detect
+	 * memory leaks. We must wait for inflight requests to complete before final cleanup.
+	 */
 	for (k = 0; k < sent; ++k) {
 		if (other_fp_sent > 0 && sg_mrq_get_ready_srp(o_sfp, &srp)) {
+			if (IS_ERR(srp)) {
+				mhp->s_res = PTR_ERR(srp);
+				continue;
+			}
 other_found:
 			--other_fp_sent;
-			res = sg_mrq_1complet(mhp, o_sfp, srp);
-			if (unlikely(res))
-				break;
+			m_ind = srp->s_hdr4.mrq_ind;
+			res = sg_mrq_1complet(mhp, srp);
+			if (unlikely(res || !sg_v4_cmd_good(mhp->a_hds + m_ind)))
+				stop_triggered = sg_svb_err_process(mhp, m_ind, o_sfp, res, false);
 			continue;  /* do available submits first */
 		}
 		if (this_fp_sent > 0 && sg_mrq_get_ready_srp(fp, &srp)) {
+			if (IS_ERR(srp)) {
+				mhp->s_res = PTR_ERR(srp);
+				continue;
+			}
 this_found:
 			--this_fp_sent;
 			dir = srp->s_hdr4.dir;
-			res = sg_mrq_1complet(mhp, fp, srp);
-			if (unlikely(res))
-				break;
+			rs_fail = false;
+			m_ind = srp->s_hdr4.mrq_ind;
+			res = sg_mrq_1complet(mhp, srp);
+			if (unlikely(res || !sg_v4_cmd_good(mhp->a_hds + m_ind))) {
+				rs_fail = true;
+				stop_triggered = sg_svb_err_process(mhp, m_ind, fp, res, true);
+			}
 			if (dir != SG_DXFER_FROM_DEV)
 				continue;
-			if (test_bit(SG_FFD_READ_SIDE_ERR, fp->ffd_bm))
-				continue;
 			/* read-side req completed, submit its write-side(s) */
 			rs_srp = srp;
 			for (m = 0; m < num_reads; ++m) {
@@ -1825,37 +1876,27 @@ sg_svb_mrq_first_come(struct sg_fd *fp, struct sg_fd *o_sfp, struct sg_mrq_hold
 				       srp->pack_id);
 				continue;
 			}
+			if (stop_triggered || rs_fail) {
+				sg_svb_zero_elem(svb_arr, m);
+				continue;
+			}
 			rq_st = atomic_read(&rs_srp->rq_st);
-			if (rq_st == SG_RQ_INACTIVE)
-				continue;       /* probably an error, bypass paired write-side rq */
-			else if (rq_st != SG_RQ_SHR_SWAP) {
+			if (rq_st == SG_RQ_INACTIVE) {
+				sg_svb_zero_elem(svb_arr, m);
+				continue;  /* probably an error, bypass paired write-side rq */
+			} else if (rq_st != SG_RQ_SHR_SWAP) {
 				SG_LOG(1, fp, "%s: expect rs_srp to be in shr_swap\n", __func__);
-				res = -EPROTO;
-				break;
+				mhp->s_res = -EPROTO;
+				sg_svb_zero_elem(svb_arr, m);
+				continue;
 			}
 			ws_pos = svb_arr[m].ws_pos;
-			for (idx = 0; idx < SG_MAX_RSV_REQS; ++idx) {
-				if (fp->rsv_arr[idx] == srp)
-					break;
-			}
-			if (idx >= SG_MAX_RSV_REQS) {
-				SG_LOG(1, fp, "%s: srp not in rsv_arr\n", __func__);
-				res = -EPROTO;
-				break;
-			}
+			idx = srp->rsv_arr_idx;
 			SG_LOG(6, o_sfp, "%s: ws_pos=%d, rs_idx=%d\n", __func__, ws_pos, idx);
 			srp = sg_mrq_submit(o_sfp, mhp, ws_pos, idx, svb_arr[m].prev_ws_srp);
 			if (IS_ERR(srp)) {
+				sg_svb_zero_elem(svb_arr, m);
 				mhp->s_res = PTR_ERR(srp);
-				if (mhp->s_res == -EFBIG) {	/* out of reserve slots */
-					if (first_err)
-						break;
-					res = mhp->s_res;
-					break;
-				}
-				if (first_err == 0)
-					first_err = mhp->s_res;
-				svb_arr[m].prev_ws_srp = NULL;
 				SG_LOG(1, o_sfp, "%s: sg_mrq_submit(oth)->%d\n", __func__,
 				       mhp->s_res);
 				continue;
@@ -1876,8 +1917,11 @@ sg_svb_mrq_first_come(struct sg_fd *fp, struct sg_fd *o_sfp, struct sg_mrq_hold
 		if (this_fp_sent > 0) {
 			res = sg_wait_any_mrq(fp, &srp);
 			if (unlikely(res))
-				break;
-			goto this_found;
+				mhp->s_res = res;
+			else if (IS_ERR(srp))
+				mhp->s_res = PTR_ERR(srp);
+			else
+				goto this_found;
 		}
 		if (chk_oth_first)
 			continue;
@@ -1885,24 +1929,31 @@ sg_svb_mrq_first_come(struct sg_fd *fp, struct sg_fd *o_sfp, struct sg_mrq_hold
 		if (other_fp_sent > 0) {
 			res = sg_wait_any_mrq(o_sfp, &srp);
 			if (unlikely(res))
-				break;
-			goto other_found;
+				mhp->s_res = res;
+			else if (IS_ERR(srp))
+				mhp->s_res = PTR_ERR(srp);
+			else
+				goto other_found;
 		}
 		if (chk_oth_first)
 			goto this_second;
 	}	/* end of loop for deferred ws submits and rs+ws responses */
 
-	if (res == 0 && first_err)
-		res = first_err;
-	return res;
+	if (mhp->s_res) {
+		if (mhp->stop_if)
+			stop_triggered = true;
+	}
+	return stop_triggered ? -ECANCELED : 0;
 }
 
 static int
 sg_svb_mrq_ordered(struct sg_fd *fp, struct sg_fd *o_sfp, struct sg_mrq_hold *mhp, int ra_ind,
 		   int *num_submp)
 {
+	bool stop_triggered = false;
+	bool rs_fail;
 	enum sg_rq_state rq_st;
-	int k, m, res, idx, ws_pos, num_reads;
+	int k, m, res, idx, ws_pos, num_reads, m_ind;
 	int this_fp_sent = 0;
 	int other_fp_sent = 0;
 	struct sg_io_v4 *hp = mhp->a_hds + ra_ind;
@@ -1949,45 +2000,46 @@ sg_svb_mrq_ordered(struct sg_fd *fp, struct sg_fd *o_sfp, struct sg_mrq_hold *mh
 	num_reads = k;
 
 	if (this_fp_sent + other_fp_sent <= 0)
-		return 0;
+		goto fini;
+
 	for (m = 0; m < num_reads; ++m) {
 		rs_srp = svb_arr[m].rs_srp;
 		if (!rs_srp)
 			continue;
 		res = sg_wait_poll_for_given_srp(fp, rs_srp, mhp->hipri);
 		if (unlikely(res))
-			return res;
+			mhp->s_res = res;
 		--this_fp_sent;
-		res = sg_mrq_1complet(mhp, fp, rs_srp);
-		if (unlikely(res))
-			return res;
-		if (test_bit(SG_FFD_READ_SIDE_ERR, fp->ffd_bm))
+		rs_fail = false;
+		m_ind = rs_srp->s_hdr4.mrq_ind;
+		res = sg_mrq_1complet(mhp, rs_srp);
+		if (unlikely(res || !sg_v4_cmd_good(mhp->a_hds + m_ind))) {
+			rs_fail = true;
+			stop_triggered = sg_svb_err_process(mhp, m_ind, fp, res, true);
+		}
+		if (unlikely(stop_triggered || rs_fail)) {
+			sg_svb_zero_elem(svb_arr, m);
 			continue;
+		}
 		rq_st = atomic_read(&rs_srp->rq_st);
-		if (rq_st == SG_RQ_INACTIVE)
+		if (rq_st == SG_RQ_INACTIVE) {
+			sg_svb_zero_elem(svb_arr, m);
 			continue;       /* probably an error, bypass paired write-side rq */
-		else if (rq_st != SG_RQ_SHR_SWAP) {
+		} else if (rq_st != SG_RQ_SHR_SWAP) {
 			SG_LOG(1, fp, "%s: expect rs_srp to be in shr_swap\n", __func__);
-			res = -EPROTO;
-			break;
+			mhp->s_res = -EPROTO;
+			sg_svb_zero_elem(svb_arr, m);
+			continue;
 		}
 		ws_pos = svb_arr[m].ws_pos;
-		for (idx = 0; idx < SG_MAX_RSV_REQS; ++idx) {
-			if (fp->rsv_arr[idx] == rs_srp)
-				break;
-		}
-		if (idx >= SG_MAX_RSV_REQS) {
-			SG_LOG(1, rs_srp->parentfp, "%s: srp not in rsv_arr\n", __func__);
-			res = -EPROTO;
-			return res;
-		}
+		idx = rs_srp->rsv_arr_idx;
 		SG_LOG(6, o_sfp, "%s: ws_pos=%d, rs_idx=%d\n", __func__, ws_pos, idx);
 		srp = sg_mrq_submit(o_sfp, mhp, ws_pos, idx, svb_arr[m].prev_ws_srp);
 		if (IS_ERR(srp)) {
+			sg_svb_zero_elem(svb_arr, m);
 			mhp->s_res = PTR_ERR(srp);
-			res = mhp->s_res;
-			SG_LOG(1, o_sfp, "%s: mrq_submit(oth)->%d\n", __func__, res);
-			return res;
+			SG_LOG(1, o_sfp, "%s: mrq_submit(oth)->%d\n", __func__, mhp->s_res);
+			continue;
 		}
 		svb_arr[m].prev_ws_srp = srp;
 		++*num_submp;
@@ -1997,24 +2049,54 @@ sg_svb_mrq_ordered(struct sg_fd *fp, struct sg_fd *o_sfp, struct sg_mrq_hold *mh
 			atomic_set(&srp->s_hdr4.pack_id_of_mrq, mhp->id_of_mrq);
 	}
 	while (this_fp_sent > 0) {	/* non-data requests */
-		res = sg_wait_any_mrq(fp, &srp);
-		if (unlikely(res))
-			return res;
 		--this_fp_sent;
-		res = sg_mrq_1complet(mhp, fp, srp);
+		res = sg_wait_any_mrq(fp, &srp);
+		if (unlikely(res)) {
+			mhp->s_res = res;
+			continue;
+		}
+		if (IS_ERR(srp)) {
+			mhp->s_res = PTR_ERR(srp);
+			continue;
+		}
+		m_ind = srp->s_hdr4.mrq_ind;
+		res = sg_mrq_1complet(mhp, srp);
 		if (unlikely(res))
-			return res;
+			mhp->s_res = res;
 	}
 	while (other_fp_sent > 0) {
-		res = sg_wait_any_mrq(o_sfp, &srp);
-		if (unlikely(res))
-			return res;
 		--other_fp_sent;
-		res = sg_mrq_1complet(mhp, o_sfp, srp);
+		res = sg_wait_any_mrq(o_sfp, &srp);
+		if (unlikely(res)) {
+			mhp->s_res = res;
+			continue;
+		}
+		if (IS_ERR(srp)) {
+			mhp->s_res = PTR_ERR(srp);
+			continue;
+		}
+		m_ind = srp->s_hdr4.mrq_ind;
+		res = sg_mrq_1complet(mhp, srp);
 		if (unlikely(res))
-			return res;
+			mhp->s_res = res;
+	}
+fini:
+	if (mhp->s_res) {
+		if (mhp->stop_if)
+			stop_triggered = true;
+	}
+	return stop_triggered ? -ECANCELED : 0;
+}
+
+static inline void
+sg_svb_srp_cleanup(struct sg_request *srp)
+{
+	if (test_and_clear_bit(SG_FRQ_LT_REUSE_BIO, srp->frq_lt_bm)) {
+		if (srp->bio) {
+			bio_put(srp->bio);	/* _get() near end of sg_start_req() */
+			srp->bio = NULL;
+		}
 	}
-	return 0;
 }
 
 static void
@@ -2024,12 +2106,8 @@ sg_svb_cleanup(struct sg_fd *sfp)
 	struct xarray *xafp = &sfp->srp_arr;
 	struct sg_request *srp;
 
-	xa_for_each_marked(xafp, idx, srp, SG_XA_RQ_INACTIVE) {
-		if (test_and_clear_bit(SG_FRQ_LT_REUSE_BIO, srp->frq_lt_bm)) {
-			bio_put(srp->bio);	/* _get() near end of sg_start_req() */
-			srp->bio = NULL;
-		}
-	}
+	xa_for_each_marked(xafp, idx, srp, SG_XA_RQ_INACTIVE)
+		sg_svb_srp_cleanup(srp);
 }
 
 /*
@@ -2089,7 +2167,7 @@ sg_process_svb_mrq(struct sg_fd *fp, struct sg_fd *o_sfp, struct sg_mrq_hold *mh
 		atomic_set(&fp->mrq_id_abort, 0);
 	if (test_and_clear_bit(SG_FFD_CAN_REUSE_BIO, fp->ffd_bm))
 		sg_svb_cleanup(fp);
-	return res;
+	return res == -ECANCELED ? 0 : res;
 }
 
 #if IS_ENABLED(SG_LOG_ACTIVE)
@@ -2156,6 +2234,7 @@ sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool from_sg_io)
 	mhp->id_of_mrq = (int)cop->request_extra;
 	mhp->tot_reqs = tot_reqs;
 	mhp->s_res = 0;
+	mhp->dtd_errs = 0;
 	if (mhp->id_of_mrq) {
 		existing_id = atomic_cmpxchg(&fp->mrq_id_abort, 0, mhp->id_of_mrq);
 		if (existing_id && existing_id != mhp->id_of_mrq) {
@@ -2205,11 +2284,15 @@ sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool from_sg_io)
 		else
 			return -EPROTO;
 	}
+	if (din_len > 0) {
+		if (unlikely(din_len > SG_MAX_MULTI_REQ_SZ))
+			return  -E2BIG;
+	} else if (dout_len > 0) {
+		if (unlikely(dout_len > SG_MAX_MULTI_REQ_SZ))
+			return  -E2BIG;
+	}
 	if (unlikely(tot_reqs > U16_MAX)) {
 		return -ERANGE;
-	} else if (unlikely(dout_len > SG_MAX_MULTI_REQ_SZ || din_len > SG_MAX_MULTI_REQ_SZ ||
-			    cdb_alen > SG_MAX_MULTI_REQ_SZ)) {
-		return  -E2BIG;
 	} else if (unlikely(mhp->immed && mhp->stop_if)) {
 		return -ERANGE;
 	} else if (unlikely(tot_reqs == 0)) {
@@ -2217,6 +2300,8 @@ sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool from_sg_io)
 	} else if (unlikely(!!cdb_alen != !!cop->request)) {
 		return -ERANGE;	/* both must be zero or both non-zero */
 	} else if (cdb_alen) {
+		if (unlikely(cdb_alen > SG_MAX_MULTI_REQ_SZ))
+			return  -E2BIG;
 		if (unlikely(cdb_alen % tot_reqs))
 			return -ERANGE;
 		cdb_mxlen = cdb_alen / tot_reqs;
@@ -2681,7 +2766,6 @@ sg_get_rsv_str(struct sg_request *srp, const char *leadin, const char *leadout,
 {
 	int k, i_len, o_len, len;
 	struct sg_fd *sfp;
-	struct sg_request **rapp;
 
 	if (!b || b_len < 1)
 		return b;
@@ -2696,13 +2780,9 @@ sg_get_rsv_str(struct sg_request *srp, const char *leadin, const char *leadout,
 	sfp = srp->parentfp;
 	if (!sfp)
 		goto blank;
-	rapp = sfp->rsv_arr;
-	for (k = 0; k < SG_MAX_RSV_REQS; ++k, ++rapp) {
-		if (srp == *rapp)
-			break;
-	}
-	if (k >= SG_MAX_RSV_REQS)
+	if (!test_bit(SG_FRQ_LT_RESERVED, srp->frq_lt_bm))
 		goto blank;
+	k = srp->rsv_arr_idx;
 	scnprintf(b, b_len, "%srsv%d%s", leadin, k, leadout);
 	return b;
 blank:
@@ -2803,8 +2883,6 @@ sg_common_write(struct sg_comm_wr_t *cwrp)
 		rq_flags = hi_p->flags;
 		pack_id = hi_p->pack_id;
 	}
-	if (unlikely(rq_flags & SGV4_FLAG_MULTIPLE_REQS))
-		return ERR_PTR(-ERANGE);  /* only control object sets this */
 	if (sg_fd_is_shared(fp)) {
 		res = sg_share_chk_flags(fp, rq_flags, dlen, dir, &sh_var);
 		if (unlikely(res < 0))
@@ -2852,6 +2930,12 @@ sg_common_write(struct sg_comm_wr_t *cwrp)
 	sg_execute_cmd(fp, srp);
 	return srp;
 err_out:
+	if (srp->sh_var == SG_SHR_WS_RQ) {
+		struct sg_request *rs_srp = srp->sh_srp;
+
+		if (rs_srp)
+			sg_finish_rs_rq(NULL, rs_srp, true);
+	}
 	sg_deact_request(fp, srp);
 	return ERR_PTR(res);
 }
@@ -3644,70 +3728,81 @@ sg_finish_rs_rq(struct sg_fd *sfp, struct sg_request *rs_srp, bool even_if_in_ws
 	bool found_one = false;
 	int res = -EINVAL;
 	int k;
-	enum sg_rq_state sr_st;
+	enum sg_rq_state rq_st;
 	unsigned long iflags;
 	struct sg_fd *rs_sfp;
-	struct sg_request *rs_rsv_srp;
-	struct sg_device *sdp = sfp->parentdp;
-
-	rs_sfp = sg_fd_share_ptr(sfp);
-	if (unlikely(!rs_sfp))
-		goto fini;
-	if (xa_get_mark(&sdp->sfp_arr, sfp->idx, SG_XA_FD_RS_SHARE))
-		rs_sfp = sfp;
 
-	for (k = 0; k < SG_MAX_RSV_REQS; ++k) {
-		res = -EINVAL;
-		rs_rsv_srp = rs_sfp->rsv_arr[k];
-		if (rs_srp) {
-			if (rs_srp != rs_rsv_srp)
+	if (rs_srp) {
+		if (rs_srp->sh_var != SG_SHR_RS_RQ) {
+			res = -EPROTO;
+			goto err;
+		}
+		rs_sfp = rs_srp->parentfp;
+	} else {
+		if (!sfp)
+			goto err;
+		if (xa_get_mark(&sfp->parentdp->sfp_arr, sfp->idx, SG_XA_FD_RS_SHARE)) {
+			rs_sfp = sfp;
+		} else if (sg_fd_is_shared(sfp)) {	/* must be write-side */
+			rs_sfp = sg_fd_share_ptr(sfp);
+		} else {
+			pr_warn("%s: non sharing fd given\n", __func__);
+			res = -EINVAL;
+			goto err;
+		}
+		for (k = 0; k < SG_MAX_RSV_REQS; ++k) {
+			rs_srp = rs_sfp->rsv_arr[k];
+			if (IS_ERR_OR_NULL(rs_srp))
 				continue;
+			if (atomic_read(&rs_srp->rq_st) == SG_RQ_SHR_SWAP)
+				break;
 		}
-		if (IS_ERR_OR_NULL(rs_rsv_srp))
-			continue;
-		xa_lock_irqsave(&rs_sfp->srp_arr, iflags);
-		sr_st = atomic_read(&rs_rsv_srp->rq_st);
-		switch (sr_st) {
-		case SG_RQ_SHR_SWAP:
-			found_one = true;
-			break;
-		case SG_RQ_SHR_IN_WS:
-			if (even_if_in_ws)
-				found_one = true;
-			else
-				res = -EBUSY;
-			break;
-		case SG_RQ_BUSY:
-			res = -EBUSY;
-			break;
-		default:
+		if (k >= SG_MAX_RSV_REQS) {
 			res = -EINVAL;
-			break;
+			goto fini;
 		}
-		if (found_one)
-			goto found;
-		xa_unlock_irqrestore(&rs_sfp->srp_arr, iflags);
-		if (rs_srp)
-			return res;	/* found rs_srp but was in wrong state */
 	}
-fini:
+	xa_lock_irqsave(&rs_sfp->srp_arr, iflags);
+	rq_st = atomic_read(&rs_srp->rq_st);
+	switch (rq_st) {
+	case SG_RQ_SHR_SWAP:
+		found_one = true;
+		break;
+	case SG_RQ_SHR_IN_WS:
+	case SG_RQ_BUSY:
+		if (even_if_in_ws)
+			found_one = true;
+		else
+			res = -EBUSY;
+		break;
+	default:
+		res = -EINVAL;
+		break;
+	}
+	if (found_one)
+		goto found;
+	xa_unlock_irqrestore(&rs_sfp->srp_arr, iflags);
+err:
 	if (unlikely(res))
-		SG_LOG(1, sfp, "%s: err=%d\n", __func__, -res);
+		SG_LOG(1, rs_sfp, "%s: err=%d\n", __func__, -res);
+	if (rs_srp)
+		goto fini;
 	return res;
 found:
-	res = sg_rq_chg_state_ulck(rs_rsv_srp, sr_st, SG_RQ_BUSY);
+	res = sg_rq_chg_state_ulck(rs_srp, rq_st, SG_RQ_BUSY);
 	if (!res)
 		atomic_inc(&rs_sfp->inactives);
-	rs_rsv_srp->tag = SG_TAG_WILDCARD;
-	rs_rsv_srp->sh_var = SG_SHR_NONE;
-	rs_rsv_srp->in_resid = 0;
-	rs_rsv_srp->rq_info = 0;
-	rs_rsv_srp->sense_len = 0;
-	rs_rsv_srp->sh_srp = NULL;
+	rs_srp->tag = SG_TAG_WILDCARD;
+	rs_srp->sh_var = SG_SHR_NONE;
+	rs_srp->in_resid = 0;
+	rs_srp->rq_info = 0;
+	rs_srp->sense_len = 0;
+	rs_srp->sh_srp = NULL;
 	xa_unlock_irqrestore(&rs_sfp->srp_arr, iflags);
-	sg_finish_scsi_blk_rq(rs_rsv_srp);
-	sg_deact_request(rs_rsv_srp->parentfp, rs_rsv_srp);
-	return 0;
+fini:
+	sg_finish_scsi_blk_rq(rs_srp);
+	sg_deact_request(rs_sfp, rs_srp);
+	return res;
 }
 
 static void
@@ -4126,6 +4221,8 @@ sg_match_first_mrq_after(struct sg_fd *sfp, int pack_id, struct sg_request *afte
 				look_for_after = false;
 			continue;
 		}
+		if (!test_bit(SG_FRQ_PC_PART_MRQ, srp->frq_pc_bm))
+			continue;
 		id = atomic_read(&srp->s_hdr4.pack_id_of_mrq);
 		if (id == 0)	/* mrq_pack_ids cannot be zero */
 			continue;
@@ -4584,6 +4681,7 @@ sg_set_reserved_sz(struct sg_fd *sfp, int want_rsv_sz)
 				__clear_bit(SG_FRQ_LT_RESERVED, o_srp->frq_lt_bm);
 				__set_bit(SG_FRQ_LT_RESERVED, t_srp->frq_lt_bm);
 				__assign_bit(SG_FRQ_LT_REUSE_BIO, t_srp->frq_lt_bm, is_reuse_bio);
+				o_srp->rsv_arr_idx = 0;
 				*rapp = t_srp;
 				xa_unlock_irqrestore(xafp, iflags);
 				sg_remove_srp(n_srp);
@@ -5871,7 +5969,8 @@ sg_rq_end_io(struct request *rqq, blk_status_t status)
 		u64 n = eventfd_signal(sfp->efd_ctxp, 1);
 
 		if (n != 1)
-			pr_info("%s: srp=%pK eventfd_signal problem\n", __func__, srp);
+			pr_info("%s: srp->pack_id=%d eventfd_signal problem\n", __func__,
+				srp->pack_id);
 	}
 	kref_put(&sfp->f_ref, sg_remove_sfp);	/* get in: sg_execute_cmd() */
 }
@@ -6259,7 +6358,7 @@ sg_rq_map_kern(struct sg_request *srp, struct request_queue *q, struct request *
 static int
 sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 {
-	bool bump_bio_get = false;
+	bool first_reuse_bio = false;
 	bool no_dxfer, us_xfer;
 	int res = 0;
 	int dlen = cwrp->dlen;
@@ -6271,7 +6370,7 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 	struct scsi_request *scsi_rp;
 	struct sg_fd *sfp = cwrp->sfp;
 	struct sg_device *sdp;
-	struct sg_scatter_hold *req_schp;
+	struct sg_scatter_hold *schp;
 	struct request_queue *q;
 	struct rq_map_data *md = (void *)srp; /* want any non-NULL value */
 	u8 *long_cmdp = NULL;
@@ -6349,7 +6448,7 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 	__assign_bit(SG_FRQ_PC_US_XFER, srp->frq_pc_bm, !no_dxfer && us_xfer);
 	rqq->end_io_data = srp;
 	scsi_rp->retries = SG_DEFAULT_RETRIES;
-	req_schp = srp->sgatp;
+	schp = srp->sgatp;
 
 	if (no_dxfer) {
 		SG_LOG(4, sfp, "%s: no data xfer [0x%pK]\n", __func__, srp);
@@ -6361,22 +6460,33 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 		md = NULL;
 		if (IS_ENABLED(CONFIG_SCSI_PROC_FS))
 			cp = "direct_io, ";
-	} else if (test_bit(SG_FFD_CAN_REUSE_BIO, sfp->ffd_bm)) {
+	} else if (test_bit(SG_FFD_CAN_REUSE_BIO, sfp->ffd_bm) &&
+		   test_bit(SG_FRQ_PC_PART_MRQ, srp->frq_pc_bm)) {
 		if (test_bit(SG_FRQ_LT_REUSE_BIO, srp->frq_lt_bm)) {
-			if (srp->bio) {
+			if (srp->bio) {		/* 2,3,4 ... reuse bio handling */
+				bio_reset(srp->bio);
+				srp->bio->bi_iter.bi_size = srp->s_hdr4.bi_size;
+				srp->bio->bi_opf = srp->s_hdr4.bi_opf;
+				srp->bio->bi_vcnt = srp->s_hdr4.bi_vcnt;
+				srp->bio->bi_end_io = srp->s_hdr4.bi_end_io;
 				res = blk_rq_append_bio(rqq, &srp->bio);
+				rqq->nr_phys_segments = (1 << schp->page_order) * schp->num_sgat;
+				bio_get(rqq->bio);
+				/*
+				 * balancing bio_put() is either in:
+				 *     - normal case: in sg_mk_kern_bio(), or
+				 *     - error case: in sg_common_write() after err_out label
+				 */
 				if (res)
 					SG_LOG(1, sfp, "%s: blk_rq_append_bio err=%d\n", __func__,
 					       res);
-				md = NULL;
-				atomic_inc(&sg_tmp_count_reused_bios);
 			} else {
 				res = -EPROTO;
 			}
 			goto fini;
 		} else {	/* first use of bio, almost normal setup */
 			md = &map_data;
-			bump_bio_get = true;
+			first_reuse_bio = true;
 		}
 	} else {	/* normal indirect IO */
 		md = &map_data;
@@ -6389,16 +6499,16 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 			struct sg_request *r_srp = sfp->rsv_arr[0];
 
 			reserve0 = (r_srp == srp);
-			if (unlikely(!reserve0 || dlen > req_schp->buflen))
+			if (unlikely(!reserve0 || dlen > schp->buflen))
 				res = reserve0 ? -ENOMEM : -EBUSY;
-		} else if (req_schp->buflen == 0) {
+		} else if (schp->buflen == 0) {
 			res = sg_mk_sgat(srp, sfp, max_t(int, dlen, sfp->sgat_elem_sz));
 		}
 		if (unlikely(res))
 			goto fini;
-		md->pages = req_schp->pages;
-		md->page_order = req_schp->page_order;
-		md->nr_entries = req_schp->num_sgat;
+		md->pages = schp->pages;
+		md->page_order = schp->page_order;
+		md->nr_entries = schp->num_sgat;
 		md->offset = 0;
 		md->null_mapped = !up;
 		md->from_user = (dxfer_dir == SG_DXFER_TO_FROM_DEV);
@@ -6434,9 +6544,13 @@ sg_start_req(struct sg_request *srp, struct sg_comm_wr_t *cwrp, int dxfer_dir)
 		res = sg_rq_map_kern(srp, q, rqq, r0w);
 		if (res)
 			goto fini;
-		if (bump_bio_get) {	/* keep bio alive to re-use next time */
+		if (first_reuse_bio) {	/* keep bio alive to re-use, hold some bio fields */
+			srp->s_hdr4.bi_size = rqq->bio->bi_iter.bi_size;
+			srp->s_hdr4.bi_opf = rqq->bio->bi_opf;
+			srp->s_hdr4.bi_vcnt = rqq->bio->bi_vcnt;
+			srp->s_hdr4.bi_end_io = rqq->bio->bi_end_io;
 			set_bit(SG_FRQ_LT_REUSE_BIO, srp->frq_lt_bm);
-			bio_get(rqq->bio);	/* _put() in sg_svb_cleanup() */
+			bio_get(rqq->bio);	/* _put() in sg_svb_srp_cleanup() */
 		}
 	}
 fini:
@@ -6478,9 +6592,8 @@ sg_finish_scsi_blk_rq(struct sg_request *srp)
 			clear_bit(SG_FFD_HIPRI_SEEN, sfp->ffd_bm);
 		atomic_dec_return_release(&sfp->waiting);
 	}
-
 	/* Expect blk_put_request(rqq) already called in sg_rq_end_io() */
-	if (rqq) {	/* blk_get_request() may have failed */
+	if (unlikely(rqq)) {
 		WRITE_ONCE(srp->rqq, NULL);
 		if (scsi_req(rqq))
 			scsi_req_free_cmd(scsi_req(rqq));
@@ -6494,10 +6607,10 @@ sg_finish_scsi_blk_rq(struct sg_request *srp)
 			ret = blk_rq_unmap_user(bio);
 			if (unlikely(ret))	/* -EINTR (-4) can be ignored */
 				SG_LOG(6, sfp, "%s: blk_rq_unmap_user() --> %d\n", __func__, ret);
+		}
+		if (!test_bit(SG_FRQ_LT_REUSE_BIO, srp->frq_lt_bm))
 			srp->bio = NULL;
-		} else if (!test_bit(SG_FRQ_LT_REUSE_BIO, srp->frq_lt_bm)) {
-			srp->bio = NULL;
-		} /* else may be able to re-use this bio [mrq, uniform svb] */
+		/* else may be able to re-use this bio [mrq, uniform svb] */
 	}
 	/* In worst case, READ data returned to user space by this point */
 }
@@ -6936,6 +7049,7 @@ sg_setup_req_new_srp(struct sg_comm_wr_t *cwrp, bool new_rsv_srp, bool no_reqs,
 	       r_srp);
 	if (new_rsv_srp) {
 		fp->rsv_arr[ra_idx] = r_srp;
+		r_srp->rsv_arr_idx = ra_idx;
 		set_bit(SG_FRQ_LT_RESERVED, r_srp->frq_lt_bm);
 		r_srp->sh_srp = NULL;
 	}
@@ -6970,7 +7084,6 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var)
 	bool no_reqs = false;
 	bool ws_rq = false;
 	bool try_harder = false;
-	bool keep_frq_pc_bm = false;
 	bool second = false;
 	int res, ra_idx, l_used_idx;
 	int dlen = cwrp->dlen;
@@ -6993,7 +7106,6 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var)
 				r_srp = NULL;
 			} else {
 				atomic_dec(&fp->inactives);
-				keep_frq_pc_bm = true;
 				r_srp->sh_srp = NULL;
 				goto final_setup;
 			}
@@ -7048,7 +7160,6 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var)
 				if (likely(res == 0)) {
 					/* possible_srp bypasses loop to find candidate */
 					mk_new_srp = false;
-					keep_frq_pc_bm = true;
 					goto final_setup;
 				}
 			}
@@ -7178,11 +7289,10 @@ sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var)
 			goto start_again;
 	}
 final_setup:
-	if (!keep_frq_pc_bm)
-		r_srp->frq_pc_bm[0] = cwrp->frq_pc_bm[0];
+	r_srp->frq_pc_bm[0] = cwrp->frq_pc_bm[0];
 	r_srp->sgatp->dlen = dlen;	/* must be <= r_srp->sgat_h.buflen */
 	r_srp->sh_var = sh_var;
-	r_srp->cmd_opcode = 0xff;  /* set invalid opcode (VS), 0x0 is TUR */
+	r_srp->cmd_opcode = cwrp->cmdp ? cwrp->cmdp[0] : 0xff;	/* get later if in user space */
 	/* If setup stalls (e.g. blk_get_request()) debug shows 'elap=1 ns' */
 	if (test_bit(SG_FFD_TIME_IN_NS, fp->ffd_bm))
 		r_srp->start_ns = S64_MAX;
@@ -7754,6 +7864,8 @@ sg_proc_debug_sreq(struct sg_request *srp, int to, bool t_in_ns, bool inactive,
 			       sg_shr_str(srp->sh_var, false));
 	if (srp->sgatp->num_sgat > 1)
 		n += scnprintf(obp + n, len - n, " sgat=%d", srp->sgatp->num_sgat);
+	if (test_bit(SG_FRQ_LT_REUSE_BIO, srp->frq_lt_bm))
+		n += scnprintf(obp + n, len - n, " re-use_bio");
 	cp = (srp->rq_flags & SGV4_FLAG_HIPRI) ? "hipri " : "";
 	n += scnprintf(obp + n, len - n, " %sop=0x%02x\n", cp, srp->cmd_opcode);
 	if (inactive && rq_st != SG_RQ_INACTIVE)
@@ -7800,6 +7912,8 @@ sg_proc_debug_fd(struct sg_fd *fp, char *obp, int len, unsigned long idx, bool r
 		n += scnprintf(obp + n, len - n, " excl_waitq");
 	if (test_bit(SG_FFD_SVB_ACTIVE, fp->ffd_bm))
 		n += scnprintf(obp + n, len - n, " svb");
+	if (test_bit(SG_FFD_CAN_REUSE_BIO, fp->ffd_bm))
+		n += scnprintf(obp + n, len - n, " can_reuse_bio");
 	n += scnprintf(obp + n, len - n, " fd_bm=0x%lx\n", fp->ffd_bm[0]);
 	n += scnprintf(obp + n, len - n,
 		       "   mmap_sz=%d low_used_idx=%d low_await_idx=%d sum_fd_dlens=%u\n",
@@ -7899,9 +8013,8 @@ sg_proc_seq_show_debug(struct seq_file *s, void *v, bool reduced)
 
 	b1[0] = '\0';
 	if (it && it->index == 0)
-		seq_printf(s, "max_active_device=%d  def_reserved_size=%d  num_reused_bios=%d\n",
-			   (int)it->max, def_reserved_size,
-			   atomic_read(&sg_tmp_count_reused_bios));
+		seq_printf(s, "max_active_device=%d  def_reserved_size=%d\n",
+			   (int)it->max, def_reserved_size);
 	fdi_p = it ? &it->fd_index : &k;
 	bp = kzalloc(bp_len, __GFP_NOWARN | GFP_KERNEL);
 	if (unlikely(!bp)) {
-- 
2.25.1



* [PATCH v18 81/83] sg: blk_poll/hipri work for mrq
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
                   ` (80 preceding siblings ...)
  2021-04-27 21:57 ` [PATCH v18 80/83] sg: expand bvec usage; " Douglas Gilbert
@ 2021-04-27 21:57 ` Douglas Gilbert
  2021-04-27 21:57 ` [PATCH v18 82/83] sg: pollable and non-pollable requests Douglas Gilbert
  2021-04-27 21:57 ` [PATCH v18 83/83] sg: bump version to 4.0.47 Douglas Gilbert
  83 siblings, 0 replies; 86+ messages in thread
From: Douglas Gilbert @ 2021-04-27 21:57 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

There are several variants to cope with here. The SGV4_FLAG_HIPRI
flag may be set on the control object and/or on any number of the
requests in req_arr. To simplify things, if that flag is set on the
control object, then it is set internally on each request in
req_arr. That leaves the case where this flag is clear on the
control object but set on some or all of the requests in req_arr.
That case works, but it may not be the best approach; see the next
paragraph.

For the shared variable blocking (svb) mrq method, if the
SGV4_FLAG_HIPRI flag is set on the control object, then an
interleaved blk_poll() algorithm is applied to the inflight
requests. This should give better results than calling blk_poll()
on each request until it completes before moving on to the next
inflight request.
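To illustrate why interleaving should help, here is a toy model
(not sg driver code). It assumes each inflight request completes at
a fixed tick, that each poll advances the clock by one tick, and it
measures "harvest latency": how long a completed request sits
before the caller notices it. The function names and DEMO_MAX_REQS
are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_REQS 16	/* hypothetical cap for this toy model */

/* Poll the first request until it lands, then the next, and so on. */
static long demo_sequential_latency(const long *done_at, size_t n)
{
	long now = 0, total = 0;
	size_t k;

	assert(done_at != NULL || n == 0);
	for (k = 0; k < n; ++k) {
		while (now < done_at[k])
			++now;			/* one poll per tick */
		/* > 0 if it landed while we were stuck on an earlier request */
		total += now - done_at[k];
	}
	return total;
}

/* Each pass does one poll, then harvests every request that has landed. */
static long demo_interleaved_latency(const long *done_at, size_t n)
{
	bool harvested[DEMO_MAX_REQS] = { false };
	long now = 0, total = 0;
	size_t k, remaining = n;

	if (n > DEMO_MAX_REQS)
		return -1;
	while (remaining > 0) {
		for (k = 0; k < n; ++k) {
			if (!harvested[k] && done_at[k] <= now) {
				harvested[k] = true;
				total += now - done_at[k];
				--remaining;
			}
		}
		if (remaining > 0)
			++now;			/* one poll per tick */
	}
	return total;
}
```

With completion ticks {10, 2, 3} the sequential loop accrues 15
ticks of harvest latency because it is stuck on the slow first
request, while the interleaved loop accrues none. When requests
complete in submission order both approaches give 0.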

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 449 ++++++++++++++++++++++++++--------------------
 1 file changed, 253 insertions(+), 196 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 26047a8ff1e2..773843a14038 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -332,7 +332,6 @@ struct sg_device { /* holds the state of each scsi generic device */
 };
 
 struct sg_comm_wr_t {  /* arguments to sg_common_write() */
-	bool keep_share;
 	int timeout;
 	int cmd_len;
 	int rsv_idx;		/* wanted rsv_arr index, def: -1 (anyone) */
@@ -353,10 +352,10 @@ struct sg_mrq_hold {	/* for passing context between multiple requests (mrq) func
 	unsigned from_sg_io:1;
 	unsigned chk_abort:1;
 	unsigned immed:1;
-	unsigned hipri:1;
 	unsigned stop_if:1;
 	unsigned co_mmap:1;
 	unsigned ordered_wr:1;
+	unsigned hipri:1;
 	int id_of_mrq;
 	int s_res;		/* secondary error: some-good-then-error; in co.spare_out */
 	int dtd_errs;		/* incremented for each driver/transport/device error */
@@ -394,7 +393,7 @@ static void sg_remove_srp(struct sg_request *srp);
 static struct sg_fd *sg_add_sfp(struct sg_device *sdp, struct file *filp);
 static void sg_remove_sfp(struct kref *);
 static void sg_remove_sfp_share(struct sg_fd *sfp, bool is_rd_side);
-static struct sg_request *sg_find_srp_by_id(struct sg_fd *sfp, int id, bool is_tag);
+static struct sg_request *sg_get_srp_by_id(struct sg_fd *sfp, int id, bool is_tag, bool part_mrq);
 static struct sg_request *sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var);
 static void sg_deact_request(struct sg_fd *sfp, struct sg_request *srp);
 static struct sg_device *sg_get_dev(int min_dev);
@@ -1163,11 +1162,11 @@ sg_num_waiting_maybe_acquire(struct sg_fd *sfp)
 }
 
 /*
- * Returns true if a request is ready and its srp is written to *srpp . If nothing can be found
- * returns false and NULL --> *srpp . If device is detaching, returns true and NULL --> *srpp .
+ * Looks for request in SG_RQ_AWAIT_RCV state on given fd that matches part_mrq. The first one
+ * found is placed in SG_RQ_BUSY state and its address is returned. If none found returns NULL.
  */
-static bool
-sg_mrq_get_ready_srp(struct sg_fd *sfp, struct sg_request **srpp)
+static struct sg_request *
+sg_get_any_srp(struct sg_fd *sfp, bool part_mrq)
 {
 	bool second = false;
 	int l_await_idx = READ_ONCE(sfp->low_await_idx);
@@ -1175,25 +1174,18 @@ sg_mrq_get_ready_srp(struct sg_fd *sfp, struct sg_request **srpp)
 	struct sg_request *srp;
 	struct xarray *xafp = &sfp->srp_arr;
 
-	if (SG_IS_DETACHING(sfp->parentdp)) {
-		*srpp = ERR_PTR(-ENODEV);
-		return true;
-	}
-	if (sg_num_waiting_maybe_acquire(sfp) < 1)
-		goto fini;
-
 	s_idx = (l_await_idx < 0) ? 0 : l_await_idx;
 	idx = s_idx;
 	end_idx = ULONG_MAX;
-
 second_time:
 	for (srp = xa_find(xafp, &idx, end_idx, SG_XA_RQ_AWAIT);
 	     srp;
 	     srp = xa_find_after(xafp, &idx, end_idx, SG_XA_RQ_AWAIT)) {
+		if (part_mrq != test_bit(SG_FRQ_PC_PART_MRQ, srp->frq_pc_bm))
+			continue;
 		if (likely(sg_rq_chg_state(srp, SG_RQ_AWAIT_RCV, SG_RQ_BUSY) == 0)) {
-			*srpp = srp;
 			WRITE_ONCE(sfp->low_await_idx, idx + 1);
-			return true;
+			return srp;
 		}
 	}
 	/* If not found so far, need to wrap around and search [0 ... s_idx) */
@@ -1204,9 +1196,33 @@ sg_mrq_get_ready_srp(struct sg_fd *sfp, struct sg_request **srpp)
 		second = true;
 		goto second_time;
 	}
-fini:
-	*srpp = NULL;
-	return false;
+	return NULL;
+}
+
+/*
+ * Returns true if a request is ready and its srp is written to *srpp . If nothing can be found
+ * returns false and NULL --> *srpp . If an error is detected returns true with IS_ERR(*srpp)
+ * also being true.
+ */
+static bool
+sg_mrq_get_ready_srp(struct sg_fd *sfp, struct sg_request **srpp)
+{
+	if (SG_IS_DETACHING(sfp->parentdp)) {
+		*srpp = ERR_PTR(-ENODEV);
+		return true;
+	}
+	if (sg_num_waiting_maybe_acquire(sfp) < 1) {
+		if (test_bit(SG_FFD_HIPRI_SEEN, sfp->ffd_bm)) {
+			int res = sg_sfp_blk_poll(sfp, 1);
+
+			if (res < 0) {
+				*srpp = ERR_PTR(res);
+				return true;
+			}
+		}
+	}
+	*srpp = sg_get_any_srp(sfp, true);
+	return !!*srpp;
 }
 
 /* N.B. After this function is completed what srp points to should be considered invalid. */
@@ -1256,84 +1272,67 @@ sg_mrq_1complet(struct sg_mrq_hold *mhp, struct sg_request *srp)
 }
 
 static int
-sg_wait_any_mrq(struct sg_fd *sfp, struct sg_request **srpp)
+sg_wait_any_mrq(struct sg_fd *sfp, struct sg_mrq_hold *mhp, struct sg_request **srpp)
 {
+	bool hipri = mhp->hipri || test_bit(SG_FFD_HIPRI_SEEN, sfp->ffd_bm);
+
+	if (hipri) {
+		long state = current->state;
+		struct sg_request *srp;
+
+		do {
+			if (hipri) {
+				int res = sg_sfp_blk_poll(sfp, SG_DEF_BLK_POLL_LOOP_COUNT);
+
+				if (res < 0)
+					return res;
+			}
+			srp = sg_get_any_srp(sfp, true);
+			if (IS_ERR(srp))
+				return PTR_ERR(srp);
+			if (srp) {
+				__set_current_state(TASK_RUNNING);
+				break;
+			}
+			if (SG_IS_DETACHING(sfp->parentdp)) {
+				__set_current_state(TASK_RUNNING);
+				return -ENODEV;
+			}
+			if (signal_pending_state(state, current)) {
+				__set_current_state(TASK_RUNNING);
+				return -ERESTARTSYS;
+			}
+			cpu_relax();
+		} while (true);
+		*srpp = srp;
+		return 0;
+	}
 	if (test_bit(SG_FFD_EXCL_WAITQ, sfp->ffd_bm))
 		return __wait_event_interruptible_exclusive(sfp->cmpl_wait,
 							    sg_mrq_get_ready_srp(sfp, srpp));
 	return __wait_event_interruptible(sfp->cmpl_wait, sg_mrq_get_ready_srp(sfp, srpp));
 }
 
-static bool
-sg_srp_hybrid_sleep(struct sg_request *srp)
-{
-	struct hrtimer_sleeper hs;
-	enum hrtimer_mode mode;
-	ktime_t kt = ns_to_ktime(5000);
-
-	if (test_and_set_bit(SG_FRQ_POLL_SLEPT, srp->frq_pc_bm))
-		return false;
-	if (kt == 0)
-		return false;
-
-	mode = HRTIMER_MODE_REL;
-	hrtimer_init_sleeper_on_stack(&hs, CLOCK_MONOTONIC, mode);
-	hrtimer_set_expires(&hs.timer, kt);
-
-	do {
-		if (atomic_read(&srp->rq_st) != SG_RQ_INFLIGHT)
-			break;
-		set_current_state(TASK_UNINTERRUPTIBLE);
-		hrtimer_sleeper_start_expires(&hs, mode);
-		if (hs.task)
-			io_schedule();
-		hrtimer_cancel(&hs.timer);
-		mode = HRTIMER_MODE_ABS;
-	} while (hs.task && !signal_pending(current));
-
-	__set_current_state(TASK_RUNNING);
-	destroy_hrtimer_on_stack(&hs.timer);
-	return true;
-}
-
 static inline bool
 sg_rq_landed(struct sg_device *sdp, struct sg_request *srp)
 {
 	return atomic_read_acquire(&srp->rq_st) != SG_RQ_INFLIGHT || SG_IS_DETACHING(sdp);
 }
 
-/* This is a blocking wait (or poll) for a given srp. */
+/*
+ * This is a blocking wait (or poll) for a given srp to reach completion. If
+ * SGV4_FLAG_HIPRI is set, this function goes into a polling loop.
+ */
 static int
-sg_wait_poll_for_given_srp(struct sg_fd *sfp, struct sg_request *srp, bool do_poll)
+sg_poll_wait4_given_srp(struct sg_fd *sfp, struct sg_request *srp)
 {
 	int res;
 	struct sg_device *sdp = sfp->parentdp;
 
-	SG_LOG(3, sfp, "%s: do_poll=%d\n", __func__, (int)do_poll);
-	if (do_poll || (srp->rq_flags & SGV4_FLAG_HIPRI))
-		goto poll_loop;
-
-	if (atomic_read(&srp->rq_st) != SG_RQ_INFLIGHT)
-		goto skip_wait;		/* and skip _acquire() */
-	/* N.B. The SG_FFD_EXCL_WAITQ flag is ignored here. */
-	res = __wait_event_interruptible(sfp->cmpl_wait, sg_rq_landed(sdp, srp));
-	if (unlikely(res)) { /* -ERESTARTSYS because signal hit thread */
-		set_bit(SG_FRQ_PC_IS_ORPHAN, srp->frq_pc_bm);
-		/* orphans harvested when sfp->keep_orphan is false */
-		sg_rq_chg_state_force(srp, SG_RQ_INFLIGHT);
-		SG_LOG(1, sfp, "%s:  wait_event_interruptible(): %s[%d]\n", __func__,
-		       (res == -ERESTARTSYS ? "ERESTARTSYS" : ""), res);
-		return res;
-	}
-skip_wait:
-	if (SG_IS_DETACHING(sdp))
-		goto detaching;
-	return sg_rq_chg_state(srp, SG_RQ_AWAIT_RCV, SG_RQ_BUSY);
-
-poll_loop:
 	if (srp->rq_flags & SGV4_FLAG_HIPRI) {
 		long state = current->state;
 
+		SG_LOG(3, sfp, "%s: polling\n", __func__);
 		do {
 			res = sg_srp_q_blk_poll(srp, sdp->device->request_queue,
 						SG_DEF_BLK_POLL_LOOP_COUNT);
@@ -1356,18 +1355,17 @@ sg_wait_poll_for_given_srp(struct sg_fd *sfp, struct sg_request *srp, bool do_po
 			cpu_relax();
 		} while (true);
 	} else {
-		enum sg_rq_state sr_st;
-
-		if (!sg_srp_hybrid_sleep(srp))
-			return -EINVAL;
-		if (signal_pending(current))
-			return -ERESTARTSYS;
-		if (SG_IS_DETACHING(sdp))
-			goto detaching;
-		sr_st = atomic_read(&srp->rq_st);
-		if (unlikely(sr_st != SG_RQ_AWAIT_RCV))
-			return -EPROTO;         /* Logic error */
-		return sg_rq_chg_state(srp, sr_st, SG_RQ_BUSY);
+		SG_LOG(3, sfp, "%s: wait_event\n", __func__);
+		/* N.B. The SG_FFD_EXCL_WAITQ flag is ignored here. */
+		res = __wait_event_interruptible(sfp->cmpl_wait, sg_rq_landed(sdp, srp));
+		if (unlikely(res)) { /* -ERESTARTSYS because signal hit thread */
+			set_bit(SG_FRQ_PC_IS_ORPHAN, srp->frq_pc_bm);
+			/* orphans harvested when sfp->keep_orphan is false */
+			sg_rq_chg_state_force(srp, SG_RQ_INFLIGHT);
+			SG_LOG(1, sfp, "%s:  wait_event_interruptible(): %s[%d]\n", __func__,
+			       (res == -ERESTARTSYS ? "ERESTARTSYS" : ""), res);
+			return res;
+		}
 	}
 	if (atomic_read_acquire(&srp->rq_st) != SG_RQ_AWAIT_RCV)
 		return (test_bit(SG_FRQ_PC_COUNT_ACTIVE, srp->frq_pc_bm) &&
@@ -1444,7 +1442,7 @@ sg_mrq_complets(struct sg_mrq_hold *mhp, struct sg_fd *sfp, struct sg_fd *sec_sf
 		if (res)
 			break;
 		if (mreqs > 0) {
-			res = sg_wait_any_mrq(sfp, &srp);
+			res = sg_wait_any_mrq(sfp, mhp, &srp);
 			if (unlikely(res))
 				return res;	/* signal --> -ERESTARTSYS */
 			if (IS_ERR(srp)) {
@@ -1457,7 +1455,7 @@ sg_mrq_complets(struct sg_mrq_hold *mhp, struct sg_fd *sfp, struct sg_fd *sec_sf
 			}
 		}
 		if (sec_reqs > 0) {
-			res = sg_wait_any_mrq(sec_sfp, &srp);
+			res = sg_wait_any_mrq(sec_sfp, mhp, &srp);
 			if (unlikely(res))
 				return res;	/* signal --> -ERESTARTSYS */
 			if (IS_ERR(srp)) {
@@ -1551,13 +1549,16 @@ sg_mrq_prepare(struct sg_mrq_hold *mhp, bool is_svb)
 			SG_LOG(1, sfp, "%s: %s %u, MMAP in co AND here\n", __func__, rip, k);
 			return -ERANGE;
 		}
-		if (unlikely(!have_file_share && share)) {
-			SG_LOG(1, sfp, "%s: %s %u, no file share\n", __func__, rip, k);
-			return -ENOMSG;
-		}
-		if (unlikely(!have_file_share && !!(flags & SGV4_FLAG_DO_ON_OTHER))) {
-			SG_LOG(1, sfp, "%s: %s %u, no other fd to do on\n", __func__, rip, k);
-			return -ENOMSG;
+		if (!have_file_share) {
+			if (unlikely(share || (flags & SGV4_FLAG_DO_ON_OTHER))) {
+				if (share)
+					SG_LOG(1, sfp, "%s: %s %u, no file share\n", __func__,
+					       rip, k);
+				else
+					SG_LOG(1, sfp, "%s: %s %u, no other fd to do on\n",
+					       __func__, rip, k);
+				return -ENOMSG;
+			}
 		}
 		if (cdb_ap && unlikely(hp->request_len > cdb_mxlen)) {
 			SG_LOG(1, sfp, "%s: %s %u, cdb too long\n", __func__, rip, k);
@@ -1567,10 +1568,16 @@ sg_mrq_prepare(struct sg_mrq_hold *mhp, bool is_svb)
 			hp->response = cop->response;
 			hp->max_response_len = cop->max_response_len;
 		}
+		if (mhp->hipri) {
+			if (!(hp->flags & SGV4_FLAG_HIPRI))
+				hp->flags |= SGV4_FLAG_HIPRI;
+		}	/* HIPRI may be set on hp->flags and _not_ on the control object */
 		if (!is_svb) {
-			if (cop->flags & SGV4_FLAG_REC_ORDER)
-				hp->flags |= SGV4_FLAG_REC_ORDER;
-			continue;
+			if (cop->flags & SGV4_FLAG_REC_ORDER) {
+				if (!(hp->flags & SGV4_FLAG_REC_ORDER))
+					hp->flags |= SGV4_FLAG_REC_ORDER;
+			}
+			continue;	/* <<< non-svb skip rest of loop */
 		}
 		/* mrq share variable blocking (svb) additional constraints checked here */
 		if (unlikely(flags & (SGV4_FLAG_COMPLETE_B4 | SGV4_FLAG_KEEP_SHARE))) {
@@ -1713,7 +1720,7 @@ sg_process_most_mrq(struct sg_fd *fp, struct sg_fd *o_sfp, struct sg_mrq_hold *m
 				++other_fp_sent;
 			continue;  /* defer completion until all submitted */
 		}
-		res = sg_wait_poll_for_given_srp(rq_sfp, srp, mhp->hipri);
+		res = sg_poll_wait4_given_srp(rq_sfp, srp);
 		if (unlikely(res)) {
 			mhp->s_res = res;
 			if (res == -ERESTARTSYS || res == -ENODEV)
@@ -1915,7 +1922,7 @@ sg_svb_mrq_first_come(struct sg_fd *fp, struct sg_fd *o_sfp, struct sg_mrq_hold
 			goto oth_first;
 this_second:
 		if (this_fp_sent > 0) {
-			res = sg_wait_any_mrq(fp, &srp);
+			res = sg_wait_any_mrq(fp, mhp, &srp);
 			if (unlikely(res))
 				mhp->s_res = res;
 			else if (IS_ERR(srp))
@@ -1927,7 +1934,7 @@ sg_svb_mrq_first_come(struct sg_fd *fp, struct sg_fd *o_sfp, struct sg_mrq_hold
 			continue;
 oth_first:
 		if (other_fp_sent > 0) {
-			res = sg_wait_any_mrq(o_sfp, &srp);
+			res = sg_wait_any_mrq(o_sfp, mhp, &srp);
 			if (unlikely(res))
 				mhp->s_res = res;
 			else if (IS_ERR(srp))
@@ -2006,7 +2013,7 @@ sg_svb_mrq_ordered(struct sg_fd *fp, struct sg_fd *o_sfp, struct sg_mrq_hold *mh
 		rs_srp = svb_arr[m].rs_srp;
 		if (!rs_srp)
 			continue;
-		res = sg_wait_poll_for_given_srp(fp, rs_srp, mhp->hipri);
+		res = sg_poll_wait4_given_srp(fp, rs_srp);
 		if (unlikely(res))
 			mhp->s_res = res;
 		--this_fp_sent;
@@ -2050,7 +2057,7 @@ sg_svb_mrq_ordered(struct sg_fd *fp, struct sg_fd *o_sfp, struct sg_mrq_hold *mh
 	}
 	while (this_fp_sent > 0) {	/* non-data requests */
 		--this_fp_sent;
-		res = sg_wait_any_mrq(fp, &srp);
+		res = sg_wait_any_mrq(fp, mhp, &srp);
 		if (unlikely(res)) {
 			mhp->s_res = res;
 			continue;
@@ -2066,7 +2073,7 @@ sg_svb_mrq_ordered(struct sg_fd *fp, struct sg_fd *o_sfp, struct sg_mrq_hold *mh
 	}
 	while (other_fp_sent > 0) {
 		--other_fp_sent;
-		res = sg_wait_any_mrq(o_sfp, &srp);
+		res = sg_wait_any_mrq(o_sfp, mhp, &srp);
 		if (unlikely(res)) {
 			mhp->s_res = res;
 			continue;
@@ -2199,7 +2206,6 @@ sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool from_sg_io)
 {
 	bool f_non_block, is_svb;
 	int res = 0;
-	int existing_id;
 	u32 cdb_mxlen;
 	struct sg_io_v4 *cop = cwrp->h4p;	/* controlling object */
 	u32 dout_len = cop->dout_xfer_len;
@@ -2210,13 +2216,33 @@ sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool from_sg_io)
 	struct sg_io_v4 *a_hds;			/* array of request objects */
 	struct sg_fd *fp = cwrp->sfp;
 	struct sg_fd *o_sfp = sg_fd_share_ptr(fp);
-	struct sg_device *sdp = fp->parentdp;
 	struct sg_mrq_hold mh;
 	struct sg_mrq_hold *mhp = &mh;
 #if IS_ENABLED(SG_LOG_ACTIVE)
 	const char *mrq_vs;
 #endif
 
+	if (unlikely(SG_IS_DETACHING(fp->parentdp) || (o_sfp && SG_IS_DETACHING(o_sfp->parentdp))))
+		return -ENODEV;
+	if (unlikely(tot_reqs == 0))
+		return 0;
+	if (unlikely(tot_reqs > U16_MAX))
+		return -E2BIG;
+	if (unlikely(!!cdb_alen != !!cop->request))
+		return -ERANGE;	/* both must be zero or both non-zero */
+	if (cdb_alen) {
+		if (unlikely(cdb_alen > SG_MAX_MULTI_REQ_SZ))
+			return  -E2BIG;
+		if (unlikely(cdb_alen % tot_reqs))
+			return -ERANGE;
+		cdb_mxlen = cdb_alen / tot_reqs;
+		if (unlikely(cdb_mxlen < 6))
+			return -ERANGE;	/* too short for SCSI cdbs */
+	} else {
+		cdb_mxlen = 0;
+	}
+	if (unlikely(din_len > SG_MAX_MULTI_REQ_SZ || dout_len > SG_MAX_MULTI_REQ_SZ))
+		return  -E2BIG;
 	mhp->cwrp = cwrp;
 	mhp->from_sg_io = from_sg_io; /* false if from SG_IOSUBMIT */
 #if IS_ENABLED(SG_LOG_ACTIVE)
@@ -2227,7 +2253,10 @@ sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool from_sg_io)
 	mhp->immed = !!(cop->flags & SGV4_FLAG_IMMED);
 	mhp->hipri = !!(cop->flags & SGV4_FLAG_HIPRI);
 	mhp->stop_if = !!(cop->flags & SGV4_FLAG_STOP_IF);
+	if (unlikely(mhp->immed && mhp->stop_if))
+		return -ERANGE;
 	mhp->ordered_wr = !!(cop->flags & SGV4_FLAG_ORDERED_WR);
+	mhp->hipri = !!(cop->flags & SGV4_FLAG_HIPRI);
 	mhp->co_mmap = !!(cop->flags & SGV4_FLAG_MMAP_IO);
 	if (mhp->co_mmap)
 		mhp->co_mmap_sgatp = fp->rsv_arr[0]->sgatp;
@@ -2236,7 +2265,8 @@ sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool from_sg_io)
 	mhp->s_res = 0;
 	mhp->dtd_errs = 0;
 	if (mhp->id_of_mrq) {
-		existing_id = atomic_cmpxchg(&fp->mrq_id_abort, 0, mhp->id_of_mrq);
+		int existing_id = atomic_cmpxchg(&fp->mrq_id_abort, 0, mhp->id_of_mrq);
+
 		if (existing_id && existing_id != mhp->id_of_mrq) {
 			SG_LOG(1, fp, "%s: existing id=%d id_of_mrq=%d\n", __func__, existing_id,
 			       mhp->id_of_mrq);
@@ -2284,42 +2314,13 @@ sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool from_sg_io)
 		else
 			return -EPROTO;
 	}
-	if (din_len > 0) {
-		if (unlikely(din_len > SG_MAX_MULTI_REQ_SZ))
-			return  -E2BIG;
-	} else if (dout_len > 0) {
-		if (unlikely(dout_len > SG_MAX_MULTI_REQ_SZ))
-			return  -E2BIG;
-	}
-	if (unlikely(tot_reqs > U16_MAX)) {
-		return -ERANGE;
-	} else if (unlikely(mhp->immed && mhp->stop_if)) {
-		return -ERANGE;
-	} else if (unlikely(tot_reqs == 0)) {
-		return 0;
-	} else if (unlikely(!!cdb_alen != !!cop->request)) {
-		return -ERANGE;	/* both must be zero or both non-zero */
-	} else if (cdb_alen) {
-		if (unlikely(cdb_alen > SG_MAX_MULTI_REQ_SZ))
-			return  -E2BIG;
-		if (unlikely(cdb_alen % tot_reqs))
-			return -ERANGE;
-		cdb_mxlen = cdb_alen / tot_reqs;
-		if (unlikely(cdb_mxlen < 6))
-			return -ERANGE;	/* too short for SCSI cdbs */
-	} else {
-		cdb_mxlen = 0;
-	}
-
-	if (SG_IS_DETACHING(sdp) || (o_sfp && SG_IS_DETACHING(o_sfp->parentdp)))
-		return -ENODEV;
 	if (is_svb) {
 		if (unlikely(test_and_set_bit(SG_FFD_SVB_ACTIVE, fp->ffd_bm))) {
 			SG_LOG(1, fp, "%s: %s already active\n", __func__, mrq_vs);
 			return -EBUSY;
 		}
 	} else if (unlikely(test_bit(SG_FFD_SVB_ACTIVE, fp->ffd_bm))) {
-		SG_LOG(1, fp, "%s: %s disallowed with existing svb\n", __func__, mrq_vs);
+		SG_LOG(1, fp, "%s: svb active on this fd so %s disallowed\n", __func__, mrq_vs);
 		return -EBUSY;
 	}
 	a_hds = kcalloc(tot_reqs, SZ_SG_IO_V4, GFP_KERNEL | __GFP_NOWARN);
@@ -2887,7 +2888,6 @@ sg_common_write(struct sg_comm_wr_t *cwrp)
 		res = sg_share_chk_flags(fp, rq_flags, dlen, dir, &sh_var);
 		if (unlikely(res < 0))
 			return ERR_PTR(res);
-		cwrp->keep_share = !!(rq_flags & SGV4_FLAG_KEEP_SHARE);
 	} else {
 		sh_var = SG_SHR_NONE;
 		if (unlikely(rq_flags & SGV4_FLAG_SHARE))
@@ -2962,8 +2962,21 @@ sg_get_ready_srp(struct sg_fd *sfp, struct sg_request **srpp, int id, bool is_ta
 		*srpp = ERR_PTR(-ENODEV);
 		return true;
 	}
-	srp = sg_find_srp_by_id(sfp, id, is_tag);
-	*srpp = srp;
+	srp = sg_get_srp_by_id(sfp, id, is_tag, false);
+	*srpp = srp;	/* Warning: IS_ERR(srp) may be true */
+	return !!srp;
+}
+
+static inline bool
+sg_get_any_ready_srp(struct sg_fd *sfp, struct sg_request **srpp)
+{
+	struct sg_request *srp;
+
+	if (SG_IS_DETACHING(sfp->parentdp))
+		srp = ERR_PTR(-ENODEV);
+	else
+		srp = sg_get_any_srp(sfp, false);
+	*srpp = srp;	/* Warning: IS_ERR(srp) may be true */
 	return !!srp;
 }
 
@@ -3191,17 +3204,19 @@ sg_receive_v4(struct sg_fd *sfp, struct sg_request *srp, void __user *p, struct
  * of elements written to rsp_arr, which may be 0 if mrqs submitted but none waiting
  */
 static int
-sg_mrq_iorec_complets(struct sg_fd *sfp, bool non_block, int max_rcv, int num_rsp_arr,
-		      struct sg_io_v4 *rsp_arr)
+sg_mrq_iorec_complets(struct sg_fd *sfp, struct sg_mrq_hold *mhp, bool non_block, int max_rcv)
 {
 	int k, idx;
 	int res = 0;
 	struct sg_request *srp;
+	struct sg_io_v4 *rsp_arr = mhp->a_hds;
 
-	SG_LOG(3, sfp, "%s: num_rsp_arr=%d, max_rcv=%d", __func__, num_rsp_arr, max_rcv);
-	if (max_rcv == 0 || max_rcv > num_rsp_arr)
-		max_rcv = num_rsp_arr;
+	SG_LOG(3, sfp, "%s: num_responses=%d, max_rcv=%d, hipri=%u\n", __func__,
+	       mhp->tot_reqs, max_rcv, mhp->hipri);
+	if (max_rcv == 0 || max_rcv > mhp->tot_reqs)
+		max_rcv = mhp->tot_reqs;
 	k = 0;
+recheck:
 	for ( ; k < max_rcv; ++k) {
 		if (!sg_mrq_get_ready_srp(sfp, &srp))
 			break;
@@ -3209,7 +3224,7 @@ sg_mrq_iorec_complets(struct sg_fd *sfp, bool non_block, int max_rcv, int num_rs
 			return k ? k /* some but not all */ : PTR_ERR(srp);
 		if (srp->rq_flags & SGV4_FLAG_REC_ORDER) {
 			idx = srp->s_hdr4.mrq_ind;
-			if (idx >= num_rsp_arr)
+			if (idx >= mhp->tot_reqs)
 				idx = 0;	/* overwrite index 0 when trouble */
 		} else {
 			idx = k;	/* completion order */
@@ -3221,16 +3236,24 @@ sg_mrq_iorec_complets(struct sg_fd *sfp, bool non_block, int max_rcv, int num_rs
 	}
 	if (non_block || k >= max_rcv)
 		return k;
+	if (mhp->hipri) {
+		if (SG_IS_DETACHING(sfp->parentdp))
+			return -ENODEV;
+		if (signal_pending(current))
+			return -ERESTARTSYS;
+		cpu_relax();
+		goto recheck;
+	}
 	SG_LOG(6, sfp, "%s: received=%d, max=%d\n", __func__, k, max_rcv);
 	for ( ; k < max_rcv; ++k) {
-		res = sg_wait_any_mrq(sfp, &srp);
+		res = sg_wait_any_mrq(sfp, mhp, &srp);
 		if (unlikely(res))
 			return res;	/* signal --> -ERESTARTSYS */
 		if (IS_ERR(srp))
 			return k ? k : PTR_ERR(srp);
 		if (srp->rq_flags & SGV4_FLAG_REC_ORDER) {
 			idx = srp->s_hdr4.mrq_ind;
-			if (idx >= num_rsp_arr)
+			if (idx >= mhp->tot_reqs)
 				idx = 0;
 		} else {
 			idx = k;
@@ -3256,6 +3279,8 @@ sg_mrq_ioreceive(struct sg_fd *sfp, struct sg_io_v4 *cop, void __user *p, bool n
 	u32 len, n;
 	struct sg_io_v4 *rsp_v4_arr;
 	void __user *pp;
+	struct sg_mrq_hold mh;
+	struct sg_mrq_hold *mhp = &mh;
 
 	SG_LOG(3, sfp, "%s: non_block=%d\n", __func__, !!non_block);
 	n = cop->din_xfer_len;
@@ -3263,9 +3288,12 @@ sg_mrq_ioreceive(struct sg_fd *sfp, struct sg_io_v4 *cop, void __user *p, bool n
 		return -E2BIG;
 	if (unlikely(!cop->din_xferp || n < SZ_SG_IO_V4 || (n % SZ_SG_IO_V4)))
 		return -ERANGE;
+	memset(mhp, 0, sizeof(*mhp));
 	n /= SZ_SG_IO_V4;
+	mhp->tot_reqs = n;
 	len = n * SZ_SG_IO_V4;
 	max_rcv = cop->din_iovec_count;
+	mhp->hipri = !!(cop->flags & SGV4_FLAG_HIPRI);
 	SG_LOG(3, sfp, "%s: %s, num_reqs=%u, max_rcv=%d\n", __func__,
 	       (non_block ? "IMMED" : "blocking"), n, max_rcv);
 	rsp_v4_arr = kcalloc(n, SZ_SG_IO_V4, GFP_KERNEL);
@@ -3274,7 +3302,8 @@ sg_mrq_ioreceive(struct sg_fd *sfp, struct sg_io_v4 *cop, void __user *p, bool n
 
 	sg_v4h_partial_zero(cop);
 	cop->din_resid = n;
-	res = sg_mrq_iorec_complets(sfp, non_block, max_rcv, n, rsp_v4_arr);
+	mhp->a_hds = rsp_v4_arr;
+	res = sg_mrq_iorec_complets(sfp, mhp, non_block, max_rcv);
 	if (unlikely(res < 0))
 		goto fini;
 	cop->din_resid -= res;
@@ -3294,41 +3323,70 @@ sg_mrq_ioreceive(struct sg_fd *sfp, struct sg_io_v4 *cop, void __user *p, bool n
 	return res;
 }
 
-/* Either wait for command completion matching id ('-1': any); or poll for it if do_poll==true */
+static struct sg_request *
+sg_poll_wait4_srp(struct sg_fd *sfp, int id, bool is_tag, bool part_mrq)
+{
+	long state = current->state;
+	struct sg_request *srp;
+
+	do {
+		if (test_bit(SG_FFD_HIPRI_SEEN, sfp->ffd_bm)) {
+			int res = sg_sfp_blk_poll(sfp, SG_DEF_BLK_POLL_LOOP_COUNT);
+
+			if (res < 0)
+				return ERR_PTR(res);
+		}
+		if (id == -1)
+			srp = sg_get_any_srp(sfp, part_mrq);
+		else
+			srp = sg_get_srp_by_id(sfp, id, is_tag, part_mrq);
+		if (IS_ERR(srp))
+			return srp;
+		if (srp) {
+			__set_current_state(TASK_RUNNING);
+			return srp;
+		}
+		if (SG_IS_DETACHING(sfp->parentdp)) {
+			__set_current_state(TASK_RUNNING);
+			return ERR_PTR(-ENODEV);
+		}
+		if (signal_pending_state(state, current)) {
+			__set_current_state(TASK_RUNNING);
+			return ERR_PTR(-ERESTARTSYS);
+		}
+		cpu_relax();
+	} while (true);
+}
+
+/*
+ * Called from read(), ioctl(SG_IORECEIVE) or ioctl(SG_IORECEIVE_V3). Either wait event for
+ * command completion matching id ('-1': any); or poll for it if do_poll==true
+ */
 static int
 sg_wait_poll_by_id(struct sg_fd *sfp, struct sg_request **srpp, int id,
 		   bool is_tag, int do_poll)
 {
-	if (do_poll)
-		goto poll_loop;
+	if (do_poll) {
+		struct sg_request *srp = sg_poll_wait4_srp(sfp, id, is_tag, false);
 
-	if (test_bit(SG_FFD_EXCL_WAITQ, sfp->ffd_bm))
-		return __wait_event_interruptible_exclusive
+		if (IS_ERR(srp))
+			return PTR_ERR(srp);
+		*srpp = srp;
+		return 0;
+	}
+	if (test_bit(SG_FFD_EXCL_WAITQ, sfp->ffd_bm)) {
+		if (id == -1)
+			return __wait_event_interruptible_exclusive
+					(sfp->cmpl_wait, sg_get_any_ready_srp(sfp, srpp));
+		else
+			return __wait_event_interruptible_exclusive
 					(sfp->cmpl_wait, sg_get_ready_srp(sfp, srpp, id, is_tag));
-	return __wait_event_interruptible(sfp->cmpl_wait, sg_get_ready_srp(sfp, srpp, id, is_tag));
-poll_loop:
-	{
-		long state = current->state;
-		struct sg_request *srp;
-
-		do {
-			srp = sg_find_srp_by_id(sfp, id, is_tag);
-			if (srp) {
-				__set_current_state(TASK_RUNNING);
-				*srpp = srp;
-				return 0;
-			}
-			if (SG_IS_DETACHING(sfp->parentdp)) {
-				__set_current_state(TASK_RUNNING);
-				return -ENODEV;
-			}
-			if (signal_pending_state(state, current)) {
-				__set_current_state(TASK_RUNNING);
-				return -ERESTARTSYS;
-			}
-			cpu_relax();
-		} while (true);
 	}
+	if (id == -1)
+		return __wait_event_interruptible(sfp->cmpl_wait, sg_get_any_ready_srp(sfp, srpp));
+	else
+		return __wait_event_interruptible(sfp->cmpl_wait, sg_get_ready_srp(sfp, srpp, id,
+						  is_tag));
 }
 
 /*
@@ -3376,7 +3434,7 @@ sg_ctl_ioreceive(struct sg_fd *sfp, void __user *p)
 	id = use_tag ? tag : pack_id;
 try_again:
 	if (non_block) {
-		srp = sg_find_srp_by_id(sfp, id, use_tag);
+		srp = sg_get_srp_by_id(sfp, id, use_tag, false /* part_mrq */);
 		if (!srp)
 			return SG_IS_DETACHING(sdp) ? -ENODEV : -EAGAIN;
 	} else {
@@ -3430,7 +3488,7 @@ sg_ctl_ioreceive_v3(struct sg_fd *sfp, void __user *p)
 		pack_id = h3p->pack_id;
 try_again:
 	if (non_block) {
-		srp = sg_find_srp_by_id(sfp, pack_id, false);
+		srp = sg_get_srp_by_id(sfp, pack_id, false, false);
 		if (!srp)
 			return SG_IS_DETACHING(sdp) ? -ENODEV : -EAGAIN;
 	} else {
@@ -3604,11 +3662,11 @@ sg_read(struct file *filp, char __user *p, size_t count, loff_t *ppos)
 	}
 try_again:
 	if (non_block) {
-		srp = sg_find_srp_by_id(sfp, want_id, false);
+		srp = sg_get_srp_by_id(sfp, want_id, false, false /* part_mrq */);
 		if (!srp)
 			return SG_IS_DETACHING(sdp) ? -ENODEV : -EAGAIN;
 	} else {
-		ret = sg_wait_poll_by_id(sfp, &srp, want_id, false, false);
+		ret = sg_wait_poll_by_id(sfp, &srp, want_id, false, false /* do_poll */);
 		if (unlikely(ret))
 			return ret;	/* signal --> -ERESTARTSYS */
 		if (IS_ERR(srp))
@@ -3715,8 +3773,7 @@ sg_calc_sgat_param(struct sg_device *sdp)
 
 /*
  * Only valid for shared file descriptors. Designed to be called after a read-side request has
- * successfully completed leaving valid data in a reserve request buffer. May also be called after
- * a write-side request that has the SGV4_FLAG_KEEP_SHARE flag set. If rs_srp is NULL, acts
+ * successfully completed leaving valid data in a reserve request buffer. If rs_srp is NULL, acts
  * on first reserve request in SG_RQ_SHR_SWAP state, making it inactive and returning 0. If rs_srp
  * is non-NULL and is a reserve request and is in SG_RQ_SHR_SWAP state, makes it busy then
  * inactive and returns 0. Otherwise -EINVAL is returned, unless write-side is in progress in
@@ -4110,7 +4167,7 @@ sg_fill_request_element(struct sg_fd *sfp, struct sg_request *srp, struct sg_req
 static int
 sg_ctl_sg_io(struct sg_device *sdp, struct sg_fd *sfp, void __user *p)
 {
-	bool is_v4, hipri;
+	bool is_v4;
 	int res;
 	struct sg_request *srp = NULL;
 	u8 hu8arr[SZ_SG_IO_V4];
@@ -4139,11 +4196,9 @@ sg_ctl_sg_io(struct sg_device *sdp, struct sg_fd *sfp, void __user *p)
 				   SZ_SG_IO_V4 - v3_len))
 			return -EFAULT;
 		is_v4 = true;
-		hipri = !!(h4p->flags & SGV4_FLAG_HIPRI);
 		res = sg_submit_v4(sfp, p, h4p, true, &srp);
 	} else if (h3p->interface_id == 'S') {
 		is_v4 = false;
-		hipri = !!(h3p->flags & SGV4_FLAG_HIPRI);
 		res = sg_submit_v3(sfp, h3p, true, &srp);
 	} else {
 		pr_info_once("sg: %s: v3 or v4 interface only here\n", __func__);
@@ -4153,7 +4208,7 @@ sg_ctl_sg_io(struct sg_device *sdp, struct sg_fd *sfp, void __user *p)
 		return res;
 	if (!srp)	/* mrq case: already processed all responses */
 		return res;
-	res = sg_wait_poll_for_given_srp(sfp, srp, hipri);
+	res = sg_poll_wait4_given_srp(sfp, srp);
 #if IS_ENABLED(SG_LOG_ACTIVE)
 	if (unlikely(res))
 		SG_LOG(1, sfp, "%s: unexpected srp=0x%pK  state: %s, share: %s\n", __func__,
@@ -6779,7 +6834,7 @@ sg_read_append(struct sg_request *srp, void __user *outp, int num_xfer)
  * from SG_RQ_AWAIT_RCV to SG_RQ_BUSY.
  */
 static struct sg_request *
-sg_find_srp_by_id(struct sg_fd *sfp, int id, bool is_tag)
+sg_get_srp_by_id(struct sg_fd *sfp, int id, bool is_tag, bool part_mrq)
 {
 	__maybe_unused bool is_bad_st = false;
 	bool search_for_1 = (id != SG_TAG_WILDCARD);
@@ -6802,11 +6857,13 @@ sg_find_srp_by_id(struct sg_fd *sfp, int id, bool is_tag)
 	s_idx = (l_await_idx < 0) ? 0 : l_await_idx;
 	idx = s_idx;
 	if (unlikely(search_for_1)) {
-second_time:
+second_time_for_1:
 		for (srp = xa_find(xafp, &idx, end_idx, SG_XA_RQ_AWAIT);
 		     srp;
 		     srp = xa_find_after(xafp, &idx, end_idx, SG_XA_RQ_AWAIT)) {
-			if (test_bit(SG_FRQ_PC_SYNC_INVOC, srp->frq_pc_bm))
+			if (part_mrq != test_bit(SG_FRQ_PC_PART_MRQ, srp->frq_pc_bm))
+				continue;
+			if (!part_mrq && test_bit(SG_FRQ_PC_SYNC_INVOC, srp->frq_pc_bm))
 				continue;
 			if (unlikely(is_tag)) {
 				if (srp->tag != id)
@@ -6825,7 +6882,7 @@ sg_find_srp_by_id(struct sg_fd *sfp, int id, bool is_tag)
 			s_idx = 0;
 			idx = s_idx;
 			second = true;
-			goto second_time;
+			goto second_time_for_1;
 		}
 	} else {
 		/*
@@ -6835,7 +6892,7 @@ sg_find_srp_by_id(struct sg_fd *sfp, int id, bool is_tag)
 		 * is ready. If there is no queuing and the "last used" has been re-used then the
 		 * first (relative) position will be the request we want.
 		 */
-second_time2:
+second_time_for_any:
 		for (srp = xa_find(xafp, &idx, end_idx, SG_XA_RQ_AWAIT);
 		     srp;
 		     srp = xa_find_after(xafp, &idx, end_idx, SG_XA_RQ_AWAIT)) {
@@ -6855,7 +6912,7 @@ sg_find_srp_by_id(struct sg_fd *sfp, int id, bool is_tag)
 			s_idx = 0;
 			idx = s_idx;
 			second = true;
-			goto second_time2;
+			goto second_time_for_any;
 		}
 	}
 	return NULL;
-- 
2.25.1



* [PATCH v18 82/83] sg: pollable and non-pollable requests
From: Douglas Gilbert @ 2021-04-27 21:57 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

Pollable is a new categorization of requests: it means that
ioctl(SG_IORECEIVE) can be used to complete the request. This new
category replaces "sync invocation", which originally meant that
ioctl(SG_IO) had been called. However, multiple requests (mrq-s)
blur the "sync invocation" picture.
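The submission paths touched by this patch set the new
SG_FRQ_PC_POLLABLE bit as follows:

- write(2) (sg_write()) always marks the request pollable.
- sg_submit_v3() marks it pollable only when SGV4_FLAG_IMMED is set in the header flags.
- sg_mrq_submit() copies the mrq's pollable bit.

What follows is a simplified user-space model of that rule, not driver code. The enum, the function name and the flag value are hypothetical placeholders. The single-request v4 submit path is left out because this excerpt does not show it.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder bit; NOT the real SGV4_FLAG_IMMED value from the sg headers. */
#define DEMO_FLAG_IMMED 0x1u

/* Hypothetical labels for the submission paths changed by this patch. */
enum demo_submit_path {
	DEMO_SUBMIT_WRITE,	/* sg_write(): write(2) interface */
	DEMO_SUBMIT_V3,		/* sg_submit_v3(): e.g. ioctl(SG_IO) with a v3 header */
	DEMO_SUBMIT_MRQ,	/* sg_mrq_submit(): one element of an mrq array */
};

/*
 * Returns true when the request should carry the "pollable" mark, i.e. it is
 * expected to be completed by ioctl(SG_IORECEIVE), ioctl(SG_IORECEIVE_V3)
 * or read(2) rather than completing by itself.
 */
static bool demo_is_pollable(enum demo_submit_path path, unsigned int flags,
			     bool mrq_pollable)
{
	switch (path) {
	case DEMO_SUBMIT_WRITE:
		return true;				/* always: write(2)/read(2) pairing */
	case DEMO_SUBMIT_V3:
		return (flags & DEMO_FLAG_IMMED) != 0;	/* IMMED: caller receives later */
	case DEMO_SUBMIT_MRQ:
		return mrq_pollable;			/* inherited from the mrq hold */
	}
	return false;
}
```

Before this patch, the v3 path set the equivalent bit (SG_FRQ_PC_SYNC_INVOC) from its sync argument. In this model, the IMMED flag decides instead.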

A better distinction is whether a request will complete by itself
or requires ioctl(SG_IORECEIVE) [or ioctl(SG_IORECEIVE_V3) or
read(2)] and the associated helpers (e.g. ioctl(SG_GET_NUM_WAITING)
and poll(2)). On that basis all requests can be divided into two
groups, termed pollable and non-pollable. Each group now has its
own atomic counter for requests in the AWAIT_RCV state:
poll_waiting and nonp_waiting. When a user calls
ioctl(SG_GET_NUM_WAITING), they get the current value of
poll_waiting. The nonp_waiting value matters for the driver's
internal processing (e.g. of shared variable blocking (svb) mrq-s)
but is of no concern to a user doing a single async request on the
same file descriptor.
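A minimal user-space model of the two counters follows. It assumes, as the text above states, two things:

- A counter is incremented when a request of its class enters the AWAIT_RCV state.
- It is decremented when that request is received.

The struct and function names are hypothetical. demo_num_waiting() mirrors the pattern in sg_num_waiting_maybe_acquire(): a plain (relaxed) read first, and an acquire read only when the counter looks empty.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two new sg_fd counters. */
struct demo_fd {
	atomic_int poll_waiting;	/* pollable requests in AWAIT_RCV */
	atomic_int nonp_waiting;	/* non-pollable requests in AWAIT_RCV */
};

static void demo_fd_init(struct demo_fd *fp)
{
	atomic_init(&fp->poll_waiting, 0);
	atomic_init(&fp->nonp_waiting, 0);
}

/* Picks the counter for a request's class, like sfp_p->pollable does. */
static atomic_int *demo_counter(struct demo_fd *fp, bool pollable)
{
	return pollable ? &fp->poll_waiting : &fp->nonp_waiting;
}

/* Assumed hook: a request of the given class reaches AWAIT_RCV. */
static void demo_enter_await_rcv(struct demo_fd *fp, bool pollable)
{
	atomic_fetch_add_explicit(demo_counter(fp, pollable), 1, memory_order_release);
}

/* Assumed hook: a request moves from AWAIT_RCV to BUSY (being received). */
static void demo_leave_await_rcv(struct demo_fd *fp, bool pollable)
{
	atomic_fetch_sub_explicit(demo_counter(fp, pollable), 1, memory_order_release);
}

/* Same shape as sg_num_waiting_maybe_acquire(). */
static int demo_num_waiting(struct demo_fd *fp, bool pollable)
{
	atomic_int *ap = demo_counter(fp, pollable);
	int num = atomic_load_explicit(ap, memory_order_relaxed);

	if (num < 1)
		num = atomic_load_explicit(ap, memory_order_acquire);
	return num;
}

/* Models what ioctl(SG_GET_NUM_WAITING) reports: pollable requests only. */
static int demo_get_num_waiting(struct demo_fd *fp)
{
	return demo_num_waiting(fp, true);
}
```

In this model, two internal (non-pollable) svb requests waiting on the file descriptor do not appear in the value a single-request user sees. That is the separation the patch is after.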

Some function names were changed for clarity.

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c | 936 ++++++++++++++++++++++++----------------------
 1 file changed, 487 insertions(+), 449 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 773843a14038..5328befc0893 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -110,6 +110,12 @@ enum sg_shr_var {
 	SG_SHR_WS_RQ,		/* write-side sharing on this data carrying req */
 };
 
+enum sg_search_srp {
+	SG_SEARCH_ANY = 0,	/* searching unconstrained by pack_id or tag */
+	SG_SEARCH_BY_PACK_ID,
+	SG_SEARCH_BY_TAG,
+};
+
 /* If sum_of(dlen) of a fd exceeds this, write() will yield E2BIG */
 #define SG_TOT_FD_THRESHOLD (32 * 1024 * 1024)
 
@@ -124,10 +130,9 @@ enum sg_shr_var {
 #define SG_MAX_RSV_REQS 8	/* number of svb requests done asynchronously; assume small-ish */
 
 #define SG_PACK_ID_WILDCARD (-1)
-#define SG_TAG_WILDCARD (-1)
+#define SG_TAG_WILDCARD SG_PACK_ID_WILDCARD
 
 #define SG_ADD_RQ_MAX_RETRIES 40	/* to stop infinite _trylock(s) */
-#define SG_DEF_BLK_POLL_LOOP_COUNT 1000	/* may allow user to tweak this */
 
 /* Bit positions (flags) for sg_request::frq_lt_bm bitmask, lt: long term */
 #define SG_FRQ_LT_RESERVED	0	/* marks a reserved request */
@@ -136,16 +141,15 @@ enum sg_shr_var {
 /* Bit positions (flags) for sg_request::frq_pc_bm bitmask. pc: per command */
 #define SG_FRQ_PC_IS_V4I	0	/* true (set) when is v4 interface */
 #define SG_FRQ_PC_IS_ORPHAN	1	/* owner of request gone */
-#define SG_FRQ_PC_SYNC_INVOC	2	/* synchronous (blocking) invocation */
+#define SG_FRQ_PC_POLLABLE	2	/* sg_ioreceive may be called */
 #define SG_FRQ_PC_US_XFER	3	/* kernel<-->user_space data transfer */
 #define SG_FRQ_PC_ABORTING	4	/* in process of aborting this cmd */
 #define SG_FRQ_PC_DEACT_ORPHAN	5	/* not keeping orphan so de-activate */
 #define SG_FRQ_PC_RECEIVING	6	/* guard against multiple receivers */
 #define SG_FRQ_PC_FOR_MMAP	7	/* request needs PAGE_SIZE elements */
-#define SG_FRQ_PC_COUNT_ACTIVE	8	/* sfp->submitted + waiting active */
-#define SG_FRQ_PC_ISSUED	9	/* blk_execute_rq_nowait() finished */
-#define SG_FRQ_POLL_SLEPT	10	/* stop re-entry of hybrid_sleep() */
-#define SG_FRQ_PC_PART_MRQ	11	/* this cmd part of mrq array */
+#define SG_FRQ_PC_ISSUED	8	/* blk_execute_rq_nowait() finished */
+#define SG_FRQ_POLL_SLEPT	9	/* stop re-entry of hybrid_sleep() */
+#define SG_FRQ_PC_PART_MRQ	10	/* this cmd part of mrq array */
 
 /* Bit positions (flags) for sg_fd::ffd_bm bitmask follow */
 #define SG_FFD_FORCE_PACKID	0	/* receive only given pack_id/tag */
@@ -164,6 +168,7 @@ enum sg_shr_var {
 #define SG_FFD_SVB_ACTIVE	13	/* shared variable blocking active */
 #define SG_FFD_RESHARE		14	/* reshare limits to single rsv req */
 #define SG_FFD_CAN_REUSE_BIO	15	/* uniform svb --> can re-use bio_s */
+#define SG_FFD_SIG_PEND		16	/* (fatal) signal pending */
 
 /* Bit positions (flags) for sg_device::fdev_bm bitmask follow */
 #define SG_FDEV_EXCLUDE		0	/* have fd open with O_EXCL */
@@ -294,7 +299,8 @@ struct sg_fd {		/* holds the state of a file descriptor */
 	int low_await_idx;	/* previous or lower await index */
 	u32 idx;		/* my index within parent's sfp_arr */
 	atomic_t submitted;	/* number inflight or awaiting receive */
-	atomic_t waiting;	/* number of requests awaiting receive */
+	atomic_t poll_waiting;	/* # of pollable requests awaiting receive */
+	atomic_t nonp_waiting;	/* # of non-pollable requests awaiting rcv */
 	atomic_t inactives;	/* number of inactive requests */
 	atomic_t sum_fd_dlens;	/* when tot_fd_thresh>0 this is sum_of(dlen) */
 	atomic_t mrq_id_abort;	/* inactive when 0, else id if aborted */
@@ -314,6 +320,15 @@ struct sg_fd {		/* holds the state of a file descriptor */
 	struct sg_request *rsv_arr[SG_MAX_RSV_REQS];
 };
 
+struct sg_fd_pollable {		/* sfp plus context for finding completions */
+	struct sg_fd *fp;
+	bool pollable;		/* can async machinery find this req ? */
+	bool immed;		/* immed set on ioctl(sg_ioreceive) ? */
+	bool part_mrq;
+	enum sg_search_srp find_by;
+	int pack_id_tag;
+};
+
 struct sg_device { /* holds the state of each scsi generic device */
 	struct scsi_device *device;
 	wait_queue_head_t open_wait;    /* queue open() when O_EXCL present */
@@ -349,9 +364,9 @@ struct sg_comm_wr_t {  /* arguments to sg_common_write() */
 };
 
 struct sg_mrq_hold {	/* for passing context between multiple requests (mrq) functions */
-	unsigned from_sg_io:1;
 	unsigned chk_abort:1;
-	unsigned immed:1;
+	unsigned immed:1;	/* may differ between sg_iosubmit and sg_ioreceive */
+	unsigned pollable:1;	/* same as immed bit during submission */
 	unsigned stop_if:1;
 	unsigned co_mmap:1;
 	unsigned ordered_wr:1;
@@ -393,7 +408,7 @@ static void sg_remove_srp(struct sg_request *srp);
 static struct sg_fd *sg_add_sfp(struct sg_device *sdp, struct file *filp);
 static void sg_remove_sfp(struct kref *);
 static void sg_remove_sfp_share(struct sg_fd *sfp, bool is_rd_side);
-static struct sg_request *sg_get_srp_by_id(struct sg_fd *sfp, int id, bool is_tag, bool part_mrq);
+static struct sg_request *sg_find_srp_from(struct sg_fd_pollable *sfp_p);
 static struct sg_request *sg_setup_req(struct sg_comm_wr_t *cwrp, enum sg_shr_var sh_var);
 static void sg_deact_request(struct sg_fd *sfp, struct sg_request *srp);
 static struct sg_device *sg_get_dev(int min_dev);
@@ -404,9 +419,8 @@ static int sg_rq_chg_state(struct sg_request *srp, enum sg_rq_state old_st,
 			   enum sg_rq_state new_st);
 static int sg_finish_rs_rq(struct sg_fd *sfp, struct sg_request *rs_srp, bool even_if_in_ws);
 static void sg_rq_chg_state_force(struct sg_request *srp, enum sg_rq_state new_st);
-static int sg_sfp_blk_poll(struct sg_fd *sfp, int loop_count);
-static int sg_srp_q_blk_poll(struct sg_request *srp, struct request_queue *q,
-			     int loop_count);
+static int sg_sfp_blk_poll_first(struct sg_fd *sfp);
+static int sg_sfp_blk_poll_all(struct sg_fd *sfp, int loop_count);
 
 #if IS_ENABLED(CONFIG_SCSI_LOGGING) && IS_ENABLED(SG_DEBUG)
 static const char *sg_rq_st_str(enum sg_rq_state rq_st, bool long_str);
@@ -919,6 +933,7 @@ sg_write(struct file *filp, const char __user *p, size_t count, loff_t *ppos)
 			 current->comm, "not setting count and/or reply_len properly");
 	}
 	sg_comm_wr_init(&cwr);
+	__set_bit(SG_FRQ_PC_POLLABLE, cwr.frq_pc_bm);
 	cwr.h3p = h3p;
 	cwr.dlen = h3p->dxfer_len;
 	cwr.timeout = sfp->timeout;
@@ -990,7 +1005,8 @@ sg_submit_v3(struct sg_fd *sfp, struct sg_io_hdr *hp, bool sync, struct sg_reque
 		clear_bit(SG_FFD_NO_CMD_Q, sfp->ffd_bm);
 	ul_timeout = msecs_to_jiffies(hp->timeout);
 	sg_comm_wr_init(&cwr);
-	__assign_bit(SG_FRQ_PC_SYNC_INVOC, cwr.frq_pc_bm, (int)sync);
+	if (hp->flags & SGV4_FLAG_IMMED)
+		__set_bit(SG_FRQ_PC_POLLABLE, cwr.frq_pc_bm);
 	cwr.h3p = hp;
 	cwr.dlen = hp->dxfer_len;
 	cwr.timeout = min_t(unsigned long, ul_timeout, INT_MAX);
@@ -1152,79 +1168,17 @@ sg_side_str(struct sg_request *srp)
 }
 
 static inline int
-sg_num_waiting_maybe_acquire(struct sg_fd *sfp)
+sg_num_waiting_maybe_acquire(struct sg_fd_pollable *sfp_p)
 {
-	int num = atomic_read(&sfp->waiting);
+	struct sg_fd *fp = sfp_p->fp;
+	atomic_t *ap = sfp_p->pollable ? &fp->poll_waiting : &fp->nonp_waiting;
+	int num = atomic_read(ap);
 
 	if (num < 1)
-		num = atomic_read_acquire(&sfp->waiting);
+		num = atomic_read_acquire(ap);
 	return num;
 }
 
-/*
- * Looks for request in SG_RQ_AWAIT_RCV state on given fd that matches part_mrq. The first one
- * found is placed in SG_RQ_BUSY state and its address is returned. If none found returns NULL.
- */
-static struct sg_request *
-sg_get_any_srp(struct sg_fd *sfp, bool part_mrq)
-{
-	bool second = false;
-	int l_await_idx = READ_ONCE(sfp->low_await_idx);
-	unsigned long idx, s_idx, end_idx;
-	struct sg_request *srp;
-	struct xarray *xafp = &sfp->srp_arr;
-
-	s_idx = (l_await_idx < 0) ? 0 : l_await_idx;
-	idx = s_idx;
-	end_idx = ULONG_MAX;
-second_time:
-	for (srp = xa_find(xafp, &idx, end_idx, SG_XA_RQ_AWAIT);
-	     srp;
-	     srp = xa_find_after(xafp, &idx, end_idx, SG_XA_RQ_AWAIT)) {
-		if (part_mrq != test_bit(SG_FRQ_PC_PART_MRQ, srp->frq_pc_bm))
-			continue;
-		if (likely(sg_rq_chg_state(srp, SG_RQ_AWAIT_RCV, SG_RQ_BUSY) == 0)) {
-			WRITE_ONCE(sfp->low_await_idx, idx + 1);
-			return srp;
-		}
-	}
-	/* If not found so far, need to wrap around and search [0 ... s_idx) */
-	if (!srp && !second && s_idx > 0) {
-		end_idx = s_idx - 1;
-		s_idx = 0;
-		idx = s_idx;
-		second = true;
-		goto second_time;
-	}
-	return NULL;
-}
-
-/*
- * Returns true if a request is ready and its srp is written to *srpp . If nothing can be found
- * returns false and NULL --> *srpp . If an error is detected returns true with IS_ERR(*srpp)
- * also being true.
- */
-static bool
-sg_mrq_get_ready_srp(struct sg_fd *sfp, struct sg_request **srpp)
-{
-	if (SG_IS_DETACHING(sfp->parentdp)) {
-		*srpp = ERR_PTR(-ENODEV);
-		return true;
-	}
-	if (sg_num_waiting_maybe_acquire(sfp) < 1) {
-		if (test_bit(SG_FFD_HIPRI_SEEN, sfp->ffd_bm)) {
-			int res = sg_sfp_blk_poll(sfp, 1);
-
-			if (res < 0) {
-				*srpp = ERR_PTR(res);
-				return true;
-			}
-		}
-	}
-	*srpp = sg_get_any_srp(sfp, true);
-	return !!*srpp;
-}
-
 /* N.B. After this function is completed what srp points to should be considered invalid. */
 static int
 sg_mrq_1complet(struct sg_mrq_hold *mhp, struct sg_request *srp)
@@ -1271,30 +1225,45 @@ sg_mrq_1complet(struct sg_mrq_hold *mhp, struct sg_request *srp)
 	return 0;
 }
 
+/*
+ * This function wraps sg_find_srp_from() so it can be called as a predicate for
+ * wait_event_interruptible().
+ */
+static bool
+sg_find_srp_pred(struct sg_fd_pollable *sfp_p, struct sg_request **srpp)
+{
+	struct sg_request *srp;
+	struct sg_fd *fp = sfp_p->fp;
+
+	if (SG_IS_DETACHING(fp->parentdp))
+		srp = ERR_PTR(-ENODEV);
+	else
+		srp = sg_find_srp_from(sfp_p);
+	*srpp = srp;	/* Warning: IS_ERR(srp) may also be true */
+	return !!srp;
+}
+
 static int
-sg_wait_any_mrq(struct sg_fd *sfp, struct sg_mrq_hold *mhp, struct sg_request **srpp)
+sg_wait_any_mrq(struct sg_fd_pollable *sfp_p, struct sg_mrq_hold *mhp, struct sg_request **srpp)
 {
-	bool hipri = mhp->hipri || test_bit(SG_FFD_HIPRI_SEEN, sfp->ffd_bm);
+	struct sg_fd *fp = sfp_p->fp;
+	bool hipri = mhp->hipri || test_bit(SG_FFD_HIPRI_SEEN, fp->ffd_bm);
+	int res;
 
 	if (hipri) {
 		long state = current->state;
 		struct sg_request *srp;
 
+		set_bit(SG_FFD_HIPRI_SEEN, fp->ffd_bm);
 		do {
-			if (hipri) {
-				int res = sg_sfp_blk_poll(sfp, SG_DEF_BLK_POLL_LOOP_COUNT);
-
-				if (res < 0)
-					return res;
-			}
-			srp = sg_get_any_srp(sfp, true);
+			srp = sg_find_srp_from(sfp_p);
 			if (IS_ERR(srp))
 				return PTR_ERR(srp);
 			if (srp) {
 				__set_current_state(TASK_RUNNING);
 				break;
 			}
-			if (SG_IS_DETACHING(sfp->parentdp)) {
+			if (SG_IS_DETACHING(fp->parentdp)) {
 				__set_current_state(TASK_RUNNING);
 				return -ENODEV;
 			}
@@ -1307,10 +1276,20 @@ sg_wait_any_mrq(struct sg_fd *sfp, struct sg_mrq_hold *mhp, struct sg_request **
 		*srpp = srp;
 		return 0;
 	}
-	if (test_bit(SG_FFD_EXCL_WAITQ, sfp->ffd_bm))
-		return __wait_event_interruptible_exclusive(sfp->cmpl_wait,
-							    sg_mrq_get_ready_srp(sfp, srpp));
-	return __wait_event_interruptible(sfp->cmpl_wait, sg_mrq_get_ready_srp(sfp, srpp));
+	if (test_bit(SG_FFD_EXCL_WAITQ, fp->ffd_bm))
+		res = __wait_event_interruptible_exclusive
+				(fp->cmpl_wait, sg_find_srp_pred(sfp_p, srpp));
+	else
+		res = __wait_event_interruptible(fp->cmpl_wait, sg_find_srp_pred(sfp_p, srpp));
+	if (unlikely(res)) { /* -ERESTARTSYS because signal hit thread */
+		set_bit(SG_FFD_SIG_PEND, fp->ffd_bm);
+		SG_LOG(1, fp, "%s:  wait_event_interruptible(): %s[%d]\n", __func__,
+		       (res == -ERESTARTSYS ? "ERESTARTSYS" : ""), res);
+		return res;
+	}
+	if (SG_IS_DETACHING(fp->parentdp))
+		return -ENODEV;
+	return 0;
 }
 
 static inline bool
@@ -1324,19 +1303,24 @@ sg_rq_landed(struct sg_device *sdp, struct sg_request *srp)
  * SGV4_FLAG_HIPRI is set this functions goes into a polling loop.
  */
 static int
-sg_poll_wait4_given_srp(struct sg_fd *sfp, struct sg_request *srp)
+sg_poll_wait4_given_srp(struct sg_fd_pollable *sfp_p, struct sg_request *srp)
 {
 	int res;
-	struct sg_device *sdp = sfp->parentdp;
+	struct sg_fd *fp = sfp_p->fp;
+	struct sg_device *sdp = fp->parentdp;
 
 	if (srp->rq_flags & SGV4_FLAG_HIPRI) {
 		long state = current->state;
+		struct request_queue *q = sdp->device->request_queue;
 
-		SG_LOG(3, sfp, "%s: polling\n", __func__);
+		SG_LOG(3, fp, "%s: polling\n", __func__);
 		do {
-			res = sg_srp_q_blk_poll(srp, sdp->device->request_queue,
-						SG_DEF_BLK_POLL_LOOP_COUNT);
-			if (res == -ENODATA || res > 0) {
+			if (atomic_read(&srp->rq_st) != SG_RQ_INFLIGHT) {
+				__set_current_state(TASK_RUNNING);
+				break;
+			}
+			res = blk_poll(q, srp->cookie, false /* do not spin */);
+			if (res > 0 && atomic_read(&srp->rq_st) == SG_RQ_AWAIT_RCV) {
 				__set_current_state(TASK_RUNNING);
 				break;
 			}
@@ -1355,47 +1339,55 @@ sg_poll_wait4_given_srp(struct sg_fd *sfp, struct sg_request *srp)
 			cpu_relax();
 		} while (true);
 	} else {
-		SG_LOG(3, sfp, "%s: wait_event\n", __func__);
+		SG_LOG(3, fp, "%s: wait_event\n", __func__);
 		/* N.B. The SG_FFD_EXCL_WAITQ flag is ignored here. */
-		res = __wait_event_interruptible(sfp->cmpl_wait, sg_rq_landed(sdp, srp));
+		res = __wait_event_interruptible(fp->cmpl_wait, sg_rq_landed(sdp, srp));
 		if (unlikely(res)) { /* -ERESTARTSYS because signal hit thread */
-			set_bit(SG_FRQ_PC_IS_ORPHAN, srp->frq_pc_bm);
-			/* orphans harvested when sfp->keep_orphan is false */
-			sg_rq_chg_state_force(srp, SG_RQ_INFLIGHT);
-			SG_LOG(1, sfp, "%s:  wait_event_interruptible(): %s[%d]\n", __func__,
+			set_bit(SG_FFD_SIG_PEND, fp->ffd_bm);
+			/* pollable requests may be harvested */
+			SG_LOG(1, fp, "%s:  wait_event_interruptible(): %s[%d]\n", __func__,
 			       (res == -ERESTARTSYS ? "ERESTARTSYS" : ""), res);
 			return res;
 		}
 	}
 	if (atomic_read_acquire(&srp->rq_st) != SG_RQ_AWAIT_RCV)
-		return (test_bit(SG_FRQ_PC_COUNT_ACTIVE, srp->frq_pc_bm) &&
-			atomic_read(&sfp->submitted) < 1) ? -ENODATA : 0;
+		return (atomic_read(&fp->submitted) < 1) ? -ENODATA : 0;
 	return unlikely(sg_rq_chg_state(srp, SG_RQ_AWAIT_RCV, SG_RQ_BUSY)) ? -EPROTO : 0;
 
 detaching:
 	sg_rq_chg_state_force(srp, SG_RQ_INACTIVE);
-	atomic_inc(&sfp->inactives);
+	atomic_inc(&fp->inactives);
 	return -ENODEV;
 }
 
 static struct sg_request *
-sg_mrq_poll_either(struct sg_fd *sfp, struct sg_fd *sec_sfp, bool *on_sfp)
+sg_mrq_poll_either(struct sg_fd_pollable *sfp_p, struct sg_fd *sec_sfp, bool *on_first)
 {
 	long state = current->state;
+	struct sg_fd *fp = sfp_p->fp;
 	struct sg_request *srp;
+	struct sg_fd_pollable a_sfpoll = *sfp_p;
 
-	do {		/* alternating polling loop */
-		if (sfp) {
-			if (sg_mrq_get_ready_srp(sfp, &srp)) {
+	do {		/* first poll read-side, then poll write-side */
+		if (fp) {
+			a_sfpoll.fp = fp;
+			srp = sg_find_srp_from(&a_sfpoll);
+			if (IS_ERR(srp))
+				return srp;
+			if (srp) {
 				__set_current_state(TASK_RUNNING);
-				*on_sfp = true;
+				*on_first = true;
 				return srp;
 			}
 		}
-		if (sec_sfp && sfp != sec_sfp) {
-			if (sg_mrq_get_ready_srp(sec_sfp, &srp)) {
+		if (sec_sfp && fp != sec_sfp) {
+			a_sfpoll.fp = sec_sfp;
+			srp = sg_find_srp_from(&a_sfpoll);
+			if (IS_ERR(srp))
+				return srp;
+			if (srp) {
 				__set_current_state(TASK_RUNNING);
-				*on_sfp = false;
+				*on_first = false;
 				return srp;
 			}
 		}
@@ -1412,26 +1404,36 @@ sg_mrq_poll_either(struct sg_fd *sfp, struct sg_fd *sec_sfp, bool *on_sfp)
  * main fd over the secondary fd (sec_sfp). Increments cop->info for each successful completion.
  */
 static int
-sg_mrq_complets(struct sg_mrq_hold *mhp, struct sg_fd *sfp, struct sg_fd *sec_sfp, int mreqs,
-		int sec_reqs)
+sg_mrq_complets(struct sg_mrq_hold *mhp, struct sg_fd_pollable *sfp_p, struct sg_fd *sec_sfp,
+		int mreqs, int sec_reqs)
 {
-	bool on_sfp;
+	bool on_first;
 	int res;
 	struct sg_request *srp;
+	struct sg_fd *fp = sfp_p->fp;
+	struct sg_fd_pollable a_sfpoll = *sfp_p;
 
-	SG_LOG(3, sfp, "%s: mreqs=%d, sec_reqs=%d\n", __func__, mreqs, sec_reqs);
+	SG_LOG(3, fp, "%s: mreqs=%d, sec_reqs=%d\n", __func__, mreqs, sec_reqs);
 	while (mreqs + sec_reqs > 0) {
-		while (mreqs > 0 && sg_mrq_get_ready_srp(sfp, &srp)) {
+		while (mreqs > 0) {
+			a_sfpoll.fp = fp;
+			srp = sg_find_srp_from(&a_sfpoll);
 			if (IS_ERR(srp))
 				return PTR_ERR(srp);
+			if (!srp)
+				break;
 			--mreqs;
 			res = sg_mrq_1complet(mhp, srp);
 			if (unlikely(res))
 				return res;
 		}
-		while (sec_reqs > 0 && sg_mrq_get_ready_srp(sec_sfp, &srp)) {
+		while (sec_reqs > 0) {
+			a_sfpoll.fp = sec_sfp;
+			srp = sg_find_srp_from(&a_sfpoll);
 			if (IS_ERR(srp))
 				return PTR_ERR(srp);
+			if (!srp)
+				break;
 			--sec_reqs;
 			res = sg_mrq_1complet(mhp, srp);
 			if (unlikely(res))
@@ -1442,7 +1444,8 @@ sg_mrq_complets(struct sg_mrq_hold *mhp, struct sg_fd *sfp, struct sg_fd *sec_sf
 		if (res)
 			break;
 		if (mreqs > 0) {
-			res = sg_wait_any_mrq(sfp, mhp, &srp);
+			a_sfpoll.fp = fp;
+			res = sg_wait_any_mrq(&a_sfpoll, mhp, &srp);
 			if (unlikely(res))
 				return res;	/* signal --> -ERESTARTSYS */
 			if (IS_ERR(srp)) {
@@ -1455,7 +1458,8 @@ sg_mrq_complets(struct sg_mrq_hold *mhp, struct sg_fd *sfp, struct sg_fd *sec_sf
 			}
 		}
 		if (sec_reqs > 0) {
-			res = sg_wait_any_mrq(sec_sfp, mhp, &srp);
+			a_sfpoll.fp = sec_sfp;
+			res = sg_wait_any_mrq(&a_sfpoll, mhp, &srp);
 			if (unlikely(res))
 				return res;	/* signal --> -ERESTARTSYS */
 			if (IS_ERR(srp)) {
@@ -1471,10 +1475,11 @@ sg_mrq_complets(struct sg_mrq_hold *mhp, struct sg_fd *sfp, struct sg_fd *sec_sf
 	return 0;
 start_polling:
 	while (mreqs + sec_reqs > 0) {
-		srp = sg_mrq_poll_either(sfp, sec_sfp, &on_sfp);
+		a_sfpoll.fp = fp;
+		srp = sg_mrq_poll_either(&a_sfpoll, sec_sfp, &on_first);
 		if (IS_ERR(srp))
 			return PTR_ERR(srp);
-		if (on_sfp) {
+		if (on_first) {
 			--mreqs;
 			res = sg_mrq_1complet(mhp, srp);
 			if (unlikely(res))
@@ -1662,7 +1667,7 @@ sg_mrq_submit(struct sg_fd *rq_sfp, struct sg_mrq_hold *mhp, int pos_in_rq_arr,
 	r_cwrp->cmd_len = hp->request_len;
 	r_cwrp->rsv_idx = rsv_idx;
 	ul_timeout = msecs_to_jiffies(hp->timeout);
-	__assign_bit(SG_FRQ_PC_SYNC_INVOC, r_cwrp->frq_pc_bm, (int)mhp->from_sg_io);
+	__assign_bit(SG_FRQ_PC_POLLABLE, r_cwrp->frq_pc_bm, (int)mhp->pollable);
 	__set_bit(SG_FRQ_PC_IS_V4I, r_cwrp->frq_pc_bm);
 	__set_bit(SG_FRQ_PC_PART_MRQ, r_cwrp->frq_pc_bm);
 	r_cwrp->h4p = hp;
@@ -1680,7 +1685,7 @@ sg_mrq_submit(struct sg_fd *rq_sfp, struct sg_mrq_hold *mhp, int pos_in_rq_arr,
  * is processed in sg_process_svb_mrq().
  */
 static int
-sg_process_most_mrq(struct sg_fd *fp, struct sg_fd *o_sfp, struct sg_mrq_hold *mhp)
+sg_process_most_mrq(struct sg_fd_pollable *sfp_p, struct sg_fd *o_sfp, struct sg_mrq_hold *mhp)
 {
 	int flags, j;
 	int num_subm = 0;
@@ -1693,6 +1698,8 @@ sg_process_most_mrq(struct sg_fd *fp, struct sg_fd *o_sfp, struct sg_mrq_hold *m
 	struct sg_io_v4 *cop = mhp->cwrp->h4p;
 	struct sg_io_v4 *hp;		/* ptr to request object in a_hds */
 	struct sg_request *srp;
+	struct sg_fd *fp = sfp_p->fp;
+	struct sg_fd_pollable a_sfpoll = *sfp_p;
 
 	SG_LOG(3, fp, "%s: id_of_mrq=%d, tot_reqs=%d, enter\n", __func__, mhp->id_of_mrq,
 	       mhp->tot_reqs);
@@ -1713,14 +1720,15 @@ sg_process_most_mrq(struct sg_fd *fp, struct sg_fd *o_sfp, struct sg_mrq_hold *m
 		srp->s_hdr4.mrq_ind = num_subm++;
 		if (mhp->chk_abort)
 			atomic_set(&srp->s_hdr4.pack_id_of_mrq, mhp->id_of_mrq);
-		if (mhp->immed || (!(mhp->from_sg_io || (flags & shr_complet_b4)))) {
+		if (mhp->immed || !(flags & shr_complet_b4)) {
 			if (fp == rq_sfp)
 				++this_fp_sent;
 			else
 				++other_fp_sent;
 			continue;  /* defer completion until all submitted */
 		}
-		res = sg_poll_wait4_given_srp(rq_sfp, srp);
+		a_sfpoll.fp = rq_sfp;
+		res = sg_poll_wait4_given_srp(&a_sfpoll, srp);
 		if (unlikely(res)) {
 			mhp->s_res = res;
 			if (res == -ERESTARTSYS || res == -ENODEV)
@@ -1744,8 +1752,9 @@ sg_process_most_mrq(struct sg_fd *fp, struct sg_fd *o_sfp, struct sg_mrq_hold *m
 	if (mhp->immed)
 		return res;
 	if (likely(res == 0 && (this_fp_sent + other_fp_sent) > 0)) {
-		res = sg_mrq_complets(mhp, fp, o_sfp, this_fp_sent, other_fp_sent);
-		if (res)
+		a_sfpoll.fp = fp;
+		res = sg_mrq_complets(mhp, &a_sfpoll, o_sfp, this_fp_sent, other_fp_sent);
+		if (unlikely(res))
 			mhp->s_res = res;	/* this may leave orphans */
 	}
 	if (mhp->id_of_mrq)	/* can no longer do a mrq abort */
@@ -1781,8 +1790,8 @@ sg_svb_zero_elem(struct sg_svb_elem *svb_ap, int m)
 
 /* For multiple requests (mrq) share variable blocking (svb) with no SGV4_FLAG_ORDERED_WR */
 static int
-sg_svb_mrq_first_come(struct sg_fd *fp, struct sg_fd *o_sfp, struct sg_mrq_hold *mhp, int ra_ind,
-		      int *num_submp)
+sg_svb_mrq_first_come(struct sg_fd_pollable *sfp_p, struct sg_fd *o_sfp, struct sg_mrq_hold *mhp,
+		      int ra_ind, int *num_submp)
 {
 	bool chk_oth_first = false;
 	bool stop_triggered = false;
@@ -1795,6 +1804,8 @@ sg_svb_mrq_first_come(struct sg_fd *fp, struct sg_fd *o_sfp, struct sg_mrq_hold
 	struct sg_io_v4 *hp = mhp->a_hds + ra_ind;
 	struct sg_request *srp;
 	struct sg_request *rs_srp;
+	struct sg_fd *fp = sfp_p->fp;
+	struct sg_fd_pollable a_sfpoll = *sfp_p;
 	struct sg_svb_elem svb_arr[SG_MAX_RSV_REQS];
 
 	memset(svb_arr, 0, sizeof(svb_arr));
@@ -1842,24 +1853,35 @@ sg_svb_mrq_first_come(struct sg_fd *fp, struct sg_fd *o_sfp, struct sg_mrq_hold
 	 * memory leaks. We must wait for inflight requests to complete before final cleanup.
 	 */
 	for (k = 0; k < sent; ++k) {
-		if (other_fp_sent > 0 && sg_mrq_get_ready_srp(o_sfp, &srp)) {
+		a_sfpoll.fp = o_sfp;
+		if (other_fp_sent > 0) {
+			a_sfpoll.fp = o_sfp;
+			srp = sg_find_srp_from(&a_sfpoll);
 			if (IS_ERR(srp)) {
 				mhp->s_res = PTR_ERR(srp);
 				continue;
-			}
+			} else if (srp) {
 other_found:
-			--other_fp_sent;
-			m_ind = srp->s_hdr4.mrq_ind;
-			res = sg_mrq_1complet(mhp, srp);
-			if (unlikely(res || !sg_v4_cmd_good(mhp->a_hds + m_ind)))
-				stop_triggered = sg_svb_err_process(mhp, m_ind, o_sfp, res, false);
-			continue;  /* do available submits first */
+				--other_fp_sent;
+				m_ind = srp->s_hdr4.mrq_ind;
+				res = sg_mrq_1complet(mhp, srp);
+				if (unlikely(res || !sg_v4_cmd_good(mhp->a_hds + m_ind)))
+					stop_triggered = sg_svb_err_process(mhp, m_ind, o_sfp, res,
+									    false);
+				continue;  /* do available submits first */
+			}
 		}
-		if (this_fp_sent > 0 && sg_mrq_get_ready_srp(fp, &srp)) {
+		if (this_fp_sent > 0) {
+			a_sfpoll.fp = fp;
+			srp = sg_find_srp_from(&a_sfpoll);
 			if (IS_ERR(srp)) {
 				mhp->s_res = PTR_ERR(srp);
 				continue;
 			}
+		} else {
+			srp = NULL;
+		}
+		if (srp) {
 this_found:
 			--this_fp_sent;
 			dir = srp->s_hdr4.dir;
@@ -1922,7 +1944,8 @@ sg_svb_mrq_first_come(struct sg_fd *fp, struct sg_fd *o_sfp, struct sg_mrq_hold
 			goto oth_first;
 this_second:
 		if (this_fp_sent > 0) {
-			res = sg_wait_any_mrq(fp, mhp, &srp);
+			a_sfpoll.fp = fp;
+			res = sg_wait_any_mrq(&a_sfpoll, mhp, &srp);
 			if (unlikely(res))
 				mhp->s_res = res;
 			else if (IS_ERR(srp))
@@ -1934,7 +1957,8 @@ sg_svb_mrq_first_come(struct sg_fd *fp, struct sg_fd *o_sfp, struct sg_mrq_hold
 			continue;
 oth_first:
 		if (other_fp_sent > 0) {
-			res = sg_wait_any_mrq(o_sfp, mhp, &srp);
+			a_sfpoll.fp = o_sfp;
+			res = sg_wait_any_mrq(&a_sfpoll, mhp, &srp);
 			if (unlikely(res))
 				mhp->s_res = res;
 			else if (IS_ERR(srp))
@@ -1954,8 +1978,8 @@ sg_svb_mrq_first_come(struct sg_fd *fp, struct sg_fd *o_sfp, struct sg_mrq_hold
 }
 
 static int
-sg_svb_mrq_ordered(struct sg_fd *fp, struct sg_fd *o_sfp, struct sg_mrq_hold *mhp, int ra_ind,
-		   int *num_submp)
+sg_svb_mrq_ordered(struct sg_fd_pollable *sfp_p, struct sg_fd *o_sfp, struct sg_mrq_hold *mhp,
+		   int ra_ind, int *num_submp)
 {
 	bool stop_triggered = false;
 	bool rs_fail;
@@ -1966,6 +1990,8 @@ sg_svb_mrq_ordered(struct sg_fd *fp, struct sg_fd *o_sfp, struct sg_mrq_hold *mh
 	struct sg_io_v4 *hp = mhp->a_hds + ra_ind;
 	struct sg_request *srp;
 	struct sg_request *rs_srp;
+	struct sg_fd *fp = sfp_p->fp;
+	struct sg_fd_pollable a_sfpoll = *sfp_p;
 	struct sg_svb_elem svb_arr[SG_MAX_RSV_REQS];
 
 	memset(svb_arr, 0, sizeof(svb_arr));
@@ -2013,7 +2039,8 @@ sg_svb_mrq_ordered(struct sg_fd *fp, struct sg_fd *o_sfp, struct sg_mrq_hold *mh
 		rs_srp = svb_arr[m].rs_srp;
 		if (!rs_srp)
 			continue;
-		res = sg_poll_wait4_given_srp(fp, rs_srp);
+		a_sfpoll.fp = fp;
+		res = sg_poll_wait4_given_srp(&a_sfpoll, rs_srp);
 		if (unlikely(res))
 			mhp->s_res = res;
 		--this_fp_sent;
@@ -2057,7 +2084,8 @@ sg_svb_mrq_ordered(struct sg_fd *fp, struct sg_fd *o_sfp, struct sg_mrq_hold *mh
 	}
 	while (this_fp_sent > 0) {	/* non-data requests */
 		--this_fp_sent;
-		res = sg_wait_any_mrq(fp, mhp, &srp);
+		a_sfpoll.fp = fp;
+		res = sg_wait_any_mrq(&a_sfpoll, mhp, &srp);
 		if (unlikely(res)) {
 			mhp->s_res = res;
 			continue;
@@ -2073,7 +2101,8 @@ sg_svb_mrq_ordered(struct sg_fd *fp, struct sg_fd *o_sfp, struct sg_mrq_hold *mh
 	}
 	while (other_fp_sent > 0) {
 		--other_fp_sent;
-		res = sg_wait_any_mrq(o_sfp, mhp, &srp);
+		a_sfpoll.fp = o_sfp;
+		res = sg_wait_any_mrq(&a_sfpoll, mhp, &srp);
 		if (unlikely(res)) {
 			mhp->s_res = res;
 			continue;
@@ -2128,7 +2157,7 @@ sg_svb_cleanup(struct sg_fd *sfp)
  * per fd" rule is enforced by the SG_FFD_SVB_ACTIVE file descriptor flag.
  */
 static int
-sg_process_svb_mrq(struct sg_fd *fp, struct sg_fd *o_sfp, struct sg_mrq_hold *mhp)
+sg_process_svb_mrq(struct sg_fd_pollable *sfp_p, struct sg_fd *o_sfp, struct sg_mrq_hold *mhp)
 {
 	bool aborted = false;
 	int j, delta_subm, subm_before, cmpl_before;
@@ -2136,6 +2165,7 @@ sg_process_svb_mrq(struct sg_fd *fp, struct sg_fd *o_sfp, struct sg_mrq_hold *mh
 	int num_cmpl = 0;
 	int res = 0;
 	struct sg_io_v4 *cop = mhp->cwrp->h4p;
+	struct sg_fd *fp = sfp_p->fp;
 
 	SG_LOG(3, fp, "%s: id_of_mrq=%d, tot_reqs=%d, enter\n", __func__, mhp->id_of_mrq,
 	       mhp->tot_reqs);
@@ -2153,9 +2183,9 @@ sg_process_svb_mrq(struct sg_fd *fp, struct sg_fd *o_sfp, struct sg_mrq_hold *mh
 		subm_before = num_subm;
 		cmpl_before = cop->info;
 		if (mhp->ordered_wr)
-			res = sg_svb_mrq_ordered(fp, o_sfp, mhp, j, &num_subm);
+			res = sg_svb_mrq_ordered(sfp_p, o_sfp, mhp, j, &num_subm);
 		else	/* write-side request done on first come, first served basis */
-			res = sg_svb_mrq_first_come(fp, o_sfp, mhp, j, &num_subm);
+			res = sg_svb_mrq_first_come(sfp_p, o_sfp, mhp, j, &num_subm);
 		delta_subm = num_subm - subm_before;
 		num_cmpl += (cop->info - cmpl_before);
 		if (res || delta_subm == 0)	/* error or didn't make progress */
@@ -2221,8 +2251,9 @@ sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool from_sg_io)
 #if IS_ENABLED(SG_LOG_ACTIVE)
 	const char *mrq_vs;
 #endif
+	struct sg_fd_pollable a_sfpoll;
 
-	if (unlikely(SG_IS_DETACHING(fp->parentdp) || (o_sfp && SG_IS_DETACHING(o_sfp->parentdp))))
+	if (SG_IS_DETACHING(fp->parentdp) || (o_sfp && SG_IS_DETACHING(o_sfp->parentdp)))
 		return -ENODEV;
 	if (unlikely(tot_reqs == 0))
 		return 0;
@@ -2244,13 +2275,13 @@ sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool from_sg_io)
 	if (unlikely(din_len > SG_MAX_MULTI_REQ_SZ || dout_len > SG_MAX_MULTI_REQ_SZ))
 		return  -E2BIG;
 	mhp->cwrp = cwrp;
-	mhp->from_sg_io = from_sg_io; /* false if from SG_IOSUBMIT */
 #if IS_ENABLED(SG_LOG_ACTIVE)
 	mrq_vs = sg_mrq_var_str(from_sg_io, cop->flags);
 #endif
 	f_non_block = !!(fp->filp->f_flags & O_NONBLOCK);
 	is_svb = !!(cop->flags & SGV4_FLAG_SHARE);	/* via ioctl(SG_IOSUBMIT) only */
 	mhp->immed = !!(cop->flags & SGV4_FLAG_IMMED);
+	mhp->pollable = mhp->immed;
 	mhp->hipri = !!(cop->flags & SGV4_FLAG_HIPRI);
 	mhp->stop_if = !!(cop->flags & SGV4_FLAG_STOP_IF);
 	if (unlikely(mhp->immed && mhp->stop_if))
@@ -2264,6 +2295,12 @@ sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool from_sg_io)
 	mhp->tot_reqs = tot_reqs;
 	mhp->s_res = 0;
 	mhp->dtd_errs = 0;
+	a_sfpoll.fp = fp;
+	a_sfpoll.pollable = mhp->immed;
+	a_sfpoll.immed = mhp->immed;
+	a_sfpoll.part_mrq = true;
+	a_sfpoll.find_by = SG_SEARCH_ANY;
+	a_sfpoll.pack_id_tag = -1;
 	if (mhp->id_of_mrq) {
 		int existing_id = atomic_cmpxchg(&fp->mrq_id_abort, 0, mhp->id_of_mrq);
 
@@ -2358,9 +2395,9 @@ sg_do_multi_req(struct sg_comm_wr_t *cwrp, bool from_sg_io)
 		clear_bit(SG_FFD_NO_CMD_Q, o_sfp->ffd_bm);
 
 	if (is_svb)
-		res = sg_process_svb_mrq(fp, o_sfp, mhp);
+		res = sg_process_svb_mrq(&a_sfpoll, o_sfp, mhp);
 	else
-		res = sg_process_most_mrq(fp, o_sfp, mhp);
+		res = sg_process_most_mrq(&a_sfpoll, o_sfp, mhp);
 fini:
 	if (!mhp->immed) {		/* for the blocking mrq invocations */
 		int rres = sg_mrq_arr_flush(mhp);
@@ -2417,8 +2454,9 @@ sg_submit_v4(struct sg_fd *sfp, void __user *p, struct sg_io_v4 *h4p, bool from_
 		clear_bit(SG_FFD_NO_CMD_Q, sfp->ffd_bm);
 	ul_timeout = msecs_to_jiffies(h4p->timeout);
 	cwr.sfp = sfp;
-	__assign_bit(SG_FRQ_PC_SYNC_INVOC, cwr.frq_pc_bm, (int)from_sg_io);
 	__set_bit(SG_FRQ_PC_IS_V4I, cwr.frq_pc_bm);
+	if (h4p->flags & SGV4_FLAG_IMMED)
+		__set_bit(SG_FRQ_PC_POLLABLE, cwr.frq_pc_bm);
 	cwr.h4p = h4p;
 	cwr.timeout = min_t(unsigned long, ul_timeout, INT_MAX);
 	cwr.cmd_len = h4p->request_len;
@@ -2831,15 +2869,12 @@ sg_execute_cmd(struct sg_fd *sfp, struct sg_request *srp)
 
 	kref_get(&sfp->f_ref); /* put usually in: sg_rq_end_io() */
 	sg_rq_chg_state_force(srp, SG_RQ_INFLIGHT);
-	/* >>>>>>> send cmd/req off to other levels <<<<<<<< */
-	if (!test_bit(SG_FRQ_PC_SYNC_INVOC, srp->frq_pc_bm)) {
-		atomic_inc(&sfp->submitted);
-		set_bit(SG_FRQ_PC_COUNT_ACTIVE, srp->frq_pc_bm);
-	}
+	atomic_inc(&sfp->submitted);
 	if (srp->rq_flags & SGV4_FLAG_HIPRI) {
 		rqq->cmd_flags |= REQ_HIPRI;
 		srp->cookie = request_to_qc_t(rqq->mq_hctx, rqq);
 	}
+	/* >>>>>>> send cmd/req off to other levels <<<<<<<< */
 	blk_execute_rq_nowait(sdp->disk, rqq, (int)at_head, sg_rq_end_io);
 	set_bit(SG_FRQ_PC_ISSUED, srp->frq_pc_bm);
 }
@@ -2948,38 +2983,6 @@ sg_common_write(struct sg_comm_wr_t *cwrp)
  * *********************************************************************************************
  */
 
-/*
- * This function is called by wait_event_interruptible in sg_read() and sg_ctl_ioreceive().
- * wait_event_interruptible will return if this one returns true (or an event like a signal (e.g.
- * control-C) occurs).
- */
-static inline bool
-sg_get_ready_srp(struct sg_fd *sfp, struct sg_request **srpp, int id, bool is_tag)
-{
-	struct sg_request *srp;
-
-	if (SG_IS_DETACHING(sfp->parentdp)) {
-		*srpp = ERR_PTR(-ENODEV);
-		return true;
-	}
-	srp = sg_get_srp_by_id(sfp, id, is_tag, false);
-	*srpp = srp;	/* Warning: IS_ERR(srp) may be true */
-	return !!srp;
-}
-
-static inline bool
-sg_get_any_ready_srp(struct sg_fd *sfp, struct sg_request **srpp)
-{
-	struct sg_request *srp;
-
-	if (SG_IS_DETACHING(sfp->parentdp))
-		srp = ERR_PTR(-ENODEV);
-	else
-		srp = sg_get_any_srp(sfp, false);
-	*srpp = srp;	/* Warning: IS_ERR(srp) may be true */
-	return !!srp;
-}
-
 /* Returns number of bytes copied to user space provided sense buffer or negated errno value. */
 static int
 sg_copy_sense(struct sg_request *srp)
@@ -3204,24 +3207,26 @@ sg_receive_v4(struct sg_fd *sfp, struct sg_request *srp, void __user *p, struct
  * of elements written to rsp_arr, which may be 0 if mrqs submitted but none waiting
  */
 static int
-sg_mrq_iorec_complets(struct sg_fd *sfp, struct sg_mrq_hold *mhp, bool non_block, int max_rcv)
+sg_mrq_iorec_complets(struct sg_fd_pollable *sfp_p, struct sg_mrq_hold *mhp, int max_rcv)
 {
 	int k, idx;
 	int res = 0;
 	struct sg_request *srp;
 	struct sg_io_v4 *rsp_arr = mhp->a_hds;
+	struct sg_fd *fp = sfp_p->fp;
 
-	SG_LOG(3, sfp, "%s: num_responses=%d, max_rcv=%d, hipri=%u\n", __func__,
+	SG_LOG(3, fp, "%s: num_responses=%d, max_rcv=%d, hipri=%u\n", __func__,
 	       mhp->tot_reqs, max_rcv, mhp->hipri);
 	if (max_rcv == 0 || max_rcv > mhp->tot_reqs)
 		max_rcv = mhp->tot_reqs;
 	k = 0;
 recheck:
 	for ( ; k < max_rcv; ++k) {
-		if (!sg_mrq_get_ready_srp(sfp, &srp))
-			break;
+		srp = sg_find_srp_from(sfp_p);
 		if (IS_ERR(srp))
 			return k ? k /* some but not all */ : PTR_ERR(srp);
+		if (!srp)
+			break;
 		if (srp->rq_flags & SGV4_FLAG_REC_ORDER) {
 			idx = srp->s_hdr4.mrq_ind;
 			if (idx >= mhp->tot_reqs)
@@ -3229,24 +3234,24 @@ sg_mrq_iorec_complets(struct sg_fd *sfp, struct sg_mrq_hold *mhp, bool non_block
 		} else {
 			idx = k;	/* completion order */
 		}
-		res = sg_receive_v4(sfp, srp, NULL, rsp_arr + idx);
+		res = sg_receive_v4(fp, srp, NULL, rsp_arr + idx);
 		if (unlikely(res))
 			return res;
 		rsp_arr[idx].info |= SG_INFO_MRQ_FINI;
 	}
-	if (non_block || k >= max_rcv)
+	if (sfp_p->immed || k >= max_rcv)
 		return k;
 	if (mhp->hipri) {
-		if (SG_IS_DETACHING(sfp->parentdp))
+		if (SG_IS_DETACHING(fp->parentdp))
 			return -ENODEV;
 		if (signal_pending(current))
 			return -ERESTARTSYS;
 		cpu_relax();
 		goto recheck;
 	}
-	SG_LOG(6, sfp, "%s: received=%d, max=%d\n", __func__, k, max_rcv);
+	SG_LOG(6, fp, "%s: received=%d, max=%d\n", __func__, k, max_rcv);
 	for ( ; k < max_rcv; ++k) {
-		res = sg_wait_any_mrq(sfp, mhp, &srp);
+		res = sg_wait_any_mrq(sfp_p, mhp, &srp);
 		if (unlikely(res))
 			return res;	/* signal --> -ERESTARTSYS */
 		if (IS_ERR(srp))
@@ -3258,7 +3263,7 @@ sg_mrq_iorec_complets(struct sg_fd *sfp, struct sg_mrq_hold *mhp, bool non_block
 		} else {
 			idx = k;
 		}
-		res = sg_receive_v4(sfp, srp, NULL, rsp_arr + idx);
+		res = sg_receive_v4(fp, srp, NULL, rsp_arr + idx);
 		if (unlikely(res))
 			return res;
 		rsp_arr[k].info |= SG_INFO_MRQ_FINI;
@@ -3272,7 +3277,7 @@ sg_mrq_iorec_complets(struct sg_fd *sfp, struct sg_mrq_hold *mhp, bool non_block
  * may succeed but will get different requests).
  */
 static int
-sg_mrq_ioreceive(struct sg_fd *sfp, struct sg_io_v4 *cop, void __user *p, bool non_block)
+sg_mrq_ioreceive(struct sg_fd_pollable *sfp_p, struct sg_io_v4 *cop, void __user *p)
 {
 	int res = 0;
 	int max_rcv;
@@ -3281,8 +3286,10 @@ sg_mrq_ioreceive(struct sg_fd *sfp, struct sg_io_v4 *cop, void __user *p, bool n
 	void __user *pp;
 	struct sg_mrq_hold mh;
 	struct sg_mrq_hold *mhp = &mh;
+	struct sg_fd *fp = sfp_p->fp;
 
-	SG_LOG(3, sfp, "%s: non_block=%d\n", __func__, !!non_block);
+	SG_LOG(3, fp, "%s: immed=%d\n", __func__, sfp_p->immed);
+	sfp_p->part_mrq = true;
 	n = cop->din_xfer_len;
 	if (unlikely(n > SG_MAX_MULTI_REQ_SZ))
 		return -E2BIG;
@@ -3294,8 +3301,8 @@ sg_mrq_ioreceive(struct sg_fd *sfp, struct sg_io_v4 *cop, void __user *p, bool n
 	len = n * SZ_SG_IO_V4;
 	max_rcv = cop->din_iovec_count;
 	mhp->hipri = !!(cop->flags & SGV4_FLAG_HIPRI);
-	SG_LOG(3, sfp, "%s: %s, num_reqs=%u, max_rcv=%d\n", __func__,
-	       (non_block ? "IMMED" : "blocking"), n, max_rcv);
+	SG_LOG(3, fp, "%s: %s, num_reqs=%u, max_rcv=%d\n", __func__,
+	       (sfp_p->immed ? "IMMED" : "blocking"), n, max_rcv);
 	rsp_v4_arr = kcalloc(n, SZ_SG_IO_V4, GFP_KERNEL);
 	if (unlikely(!rsp_v4_arr))
 		return -ENOMEM;
@@ -3303,7 +3310,7 @@ sg_mrq_ioreceive(struct sg_fd *sfp, struct sg_io_v4 *cop, void __user *p, bool n
 	sg_v4h_partial_zero(cop);
 	cop->din_resid = n;
 	mhp->a_hds = rsp_v4_arr;
-	res = sg_mrq_iorec_complets(sfp, mhp, non_block, max_rcv);
+	res = sg_mrq_iorec_complets(sfp_p, mhp, max_rcv);
 	if (unlikely(res < 0))
 		goto fini;
 	cop->din_resid -= res;
@@ -3316,37 +3323,36 @@ sg_mrq_ioreceive(struct sg_fd *sfp, struct sg_io_v4 *cop, void __user *p, bool n
 		if (copy_to_user(pp, rsp_v4_arr, len))
 			res = -EFAULT;
 	} else {
-		SG_LOG(1, sfp, "%s: cop->din_xferp==NULL ?_?\n", __func__);
+		SG_LOG(1, fp, "%s: cop->din_xferp==NULL ?_?\n", __func__);
 	}
 fini:
 	kfree(rsp_v4_arr);
 	return res;
 }
 
+/* Returns first srp that meets the constraints in sfp_p */
 static struct sg_request *
-sg_poll_wait4_srp(struct sg_fd *sfp, int id, bool is_tag, bool part_mrq)
+sg_poll4_any_srp(struct sg_fd_pollable *sfp_p)
 {
 	long state = current->state;
 	struct sg_request *srp;
+	struct sg_fd *fp = sfp_p->fp;
 
 	do {
-		if (test_bit(SG_FFD_HIPRI_SEEN, sfp->ffd_bm)) {
-			int res = sg_sfp_blk_poll(sfp, SG_DEF_BLK_POLL_LOOP_COUNT);
+		if (test_bit(SG_FFD_HIPRI_SEEN, fp->ffd_bm)) {
+			int res = sg_sfp_blk_poll_first(fp);
 
 			if (res < 0)
 				return ERR_PTR(res);
 		}
-		if (id == -1)
-			srp = sg_get_any_srp(sfp, part_mrq);
-		else
-			srp = sg_get_srp_by_id(sfp, id, is_tag, part_mrq);
+		srp = sg_find_srp_from(sfp_p);
 		if (IS_ERR(srp))
 			return srp;
 		if (srp) {
 			__set_current_state(TASK_RUNNING);
 			return srp;
 		}
-		if (SG_IS_DETACHING(sfp->parentdp)) {
+		if (SG_IS_DETACHING(fp->parentdp)) {
 			__set_current_state(TASK_RUNNING);
 			return ERR_PTR(-ENODEV);
 		}
@@ -3359,34 +3365,37 @@ sg_poll_wait4_srp(struct sg_fd *sfp, int id, bool is_tag, bool part_mrq)
 }
 
 /*
- * Called from read(), ioctl(SG_IORECEIVE) or ioctl(SG_IORECEIVE_V3). Either wait event for
- * command completion matching id ('-1': any); or poll for it if do_poll==true
+ * Called from read(), ioctl(SG_IORECEIVE) or ioctl(SG_IORECEIVE_V3). Poll or wait_event
+ * depending on the hipri setting.
  */
 static int
-sg_wait_poll_by_id(struct sg_fd *sfp, struct sg_request **srpp, int id,
-		   bool is_tag, int do_poll)
+sg_poll_or_wait_ev_srp(struct sg_fd_pollable *sfp_p, struct sg_request **srpp, bool hipri)
 {
-	if (do_poll) {
-		struct sg_request *srp = sg_poll_wait4_srp(sfp, id, is_tag, false);
+	int res;
+	struct sg_fd *fp = sfp_p->fp;
+
+	if (hipri) {
+		struct sg_request *srp = sg_poll4_any_srp(sfp_p);
 
 		if (IS_ERR(srp))
 			return PTR_ERR(srp);
 		*srpp = srp;
 		return 0;
 	}
-	if (test_bit(SG_FFD_EXCL_WAITQ, sfp->ffd_bm)) {
-		if (id == -1)
-			return __wait_event_interruptible_exclusive
-					(sfp->cmpl_wait, sg_get_any_ready_srp(sfp, srpp));
-		else
-			return __wait_event_interruptible_exclusive
-					(sfp->cmpl_wait, sg_get_ready_srp(sfp, srpp, id, is_tag));
-	}
-	if (id == -1)
-		return __wait_event_interruptible(sfp->cmpl_wait, sg_get_any_ready_srp(sfp, srpp));
+	if (test_bit(SG_FFD_EXCL_WAITQ, fp->ffd_bm))
+		res = __wait_event_interruptible_exclusive(fp->cmpl_wait,
+							   sg_find_srp_pred(sfp_p, srpp));
 	else
-		return __wait_event_interruptible(sfp->cmpl_wait, sg_get_ready_srp(sfp, srpp, id,
-						  is_tag));
+		res = __wait_event_interruptible(fp->cmpl_wait, sg_find_srp_pred(sfp_p, srpp));
+	if (unlikely(res)) { /* -ERESTARTSYS because signal hit thread */
+		set_bit(SG_FFD_SIG_PEND, fp->ffd_bm);
+		SG_LOG(1, fp, "%s:  wait_event_interruptible(): %s[%d]\n", __func__,
+		       (res == -ERESTARTSYS ? "ERESTARTSYS" : ""), res);
+		return res;
+	}
+	if (SG_IS_DETACHING(fp->parentdp))
+		return -ENODEV;
+	return 0;
 }
 
 /*
@@ -3399,14 +3408,12 @@ static int
 sg_ctl_ioreceive(struct sg_fd *sfp, void __user *p)
 {
 	bool non_block = SG_IS_O_NONBLOCK(sfp);
-	bool use_tag = false;
-	int res, id;
-	int pack_id = SG_PACK_ID_WILDCARD;
-	int tag = SG_TAG_WILDCARD;
+	int res;
 	struct sg_io_v4 h4;
 	struct sg_io_v4 *h4p = &h4;
 	struct sg_device *sdp = sfp->parentdp;
 	struct sg_request *srp;
+	struct sg_fd_pollable a_sfpoll;
 
 	res = sg_allow_if_err_recovery(sdp, non_block);
 	if (unlikely(res))
@@ -3421,29 +3428,36 @@ sg_ctl_ioreceive(struct sg_fd *sfp, void __user *p)
 	       !!(h4p->flags & SGV4_FLAG_IMMED), !!(h4p->flags & SGV4_FLAG_HIPRI));
 	if (h4p->flags & SGV4_FLAG_IMMED)
 		non_block = true;	/* set by either this or O_NONBLOCK */
+	a_sfpoll.fp = sfp;
+	a_sfpoll.pollable = true;
+	a_sfpoll.immed = non_block;
 	if (h4p->flags & SGV4_FLAG_MULTIPLE_REQS)
-		return sg_mrq_ioreceive(sfp, h4p, p, non_block);
+		return sg_mrq_ioreceive(&a_sfpoll, h4p, p);
+	a_sfpoll.part_mrq = false;
 	/* read in part of v3 or v4 header for pack_id or tag based find */
 	if (test_bit(SG_FFD_FORCE_PACKID, sfp->ffd_bm)) {
-		use_tag = test_bit(SG_FFD_PREFER_TAG, sfp->ffd_bm);
-		if (use_tag)
-			tag = h4p->request_tag;	/* top 32 bits ignored */
-		else
-			pack_id = h4p->request_extra;
+		if (test_bit(SG_FFD_PREFER_TAG, sfp->ffd_bm)) {
+			a_sfpoll.find_by = SG_SEARCH_BY_TAG;
+			a_sfpoll.pack_id_tag = h4p->request_tag;	/* top 32 bits ignored */
+		} else {
+			a_sfpoll.find_by = SG_SEARCH_BY_PACK_ID;
+			a_sfpoll.pack_id_tag = h4p->request_extra;
+		}
+	} else {
+		a_sfpoll.find_by = SG_SEARCH_ANY;
+		a_sfpoll.pack_id_tag = SG_PACK_ID_WILDCARD;
 	}
-	id = use_tag ? tag : pack_id;
 try_again:
 	if (non_block) {
-		srp = sg_get_srp_by_id(sfp, id, use_tag, false /* part_mrq */);
+		srp = sg_find_srp_from(&a_sfpoll);
 		if (!srp)
 			return SG_IS_DETACHING(sdp) ? -ENODEV : -EAGAIN;
 	} else {
-		res = sg_wait_poll_by_id(sfp, &srp, pack_id, use_tag,
-					 !!(h4p->flags & SGV4_FLAG_HIPRI));
-		if (IS_ERR(srp))
-			return PTR_ERR(srp);
+		res = sg_poll_or_wait_ev_srp(&a_sfpoll, &srp, !!(h4p->flags & SGV4_FLAG_HIPRI));
 		if (unlikely(res))
 			return res;	/* signal --> -ERESTARTSYS */
+		if (IS_ERR(srp))
+			return PTR_ERR(srp);
 	}
 	if (test_and_set_bit(SG_FRQ_PC_RECEIVING, srp->frq_pc_bm)) {
 		cpu_relax();
@@ -3463,11 +3477,11 @@ sg_ctl_ioreceive_v3(struct sg_fd *sfp, void __user *p)
 {
 	bool non_block = SG_IS_O_NONBLOCK(sfp);
 	int res;
-	int pack_id = SG_PACK_ID_WILDCARD;
 	struct sg_io_hdr h3;
 	struct sg_io_hdr *h3p = &h3;
 	struct sg_device *sdp = sfp->parentdp;
 	struct sg_request *srp;
+	struct sg_fd_pollable a_sfpoll;
 
 	res = sg_allow_if_err_recovery(sdp, non_block);
 	if (unlikely(res))
@@ -3480,20 +3494,28 @@ sg_ctl_ioreceive_v3(struct sg_fd *sfp, void __user *p)
 		return -EPERM;
 	if (h3p->flags & SGV4_FLAG_IMMED)
 		non_block = true;	/* set by either this or O_NONBLOCK */
+	a_sfpoll.fp = sfp;
+	a_sfpoll.pollable = true;
+	a_sfpoll.immed = non_block;
+	a_sfpoll.part_mrq = false;
 	SG_LOG(3, sfp, "%s: non_block(+IMMED)=%d\n", __func__, non_block);
 	if (unlikely(h3p->flags & SGV4_FLAG_MULTIPLE_REQS))
 		return -EINVAL;
 
-	if (test_bit(SG_FFD_FORCE_PACKID, sfp->ffd_bm))
-		pack_id = h3p->pack_id;
+	if (test_bit(SG_FFD_FORCE_PACKID, sfp->ffd_bm)) {
+		a_sfpoll.find_by = SG_SEARCH_BY_PACK_ID;
+		a_sfpoll.pack_id_tag = h3p->pack_id;
+	} else {
+		a_sfpoll.find_by = SG_SEARCH_ANY;
+		a_sfpoll.pack_id_tag = SG_PACK_ID_WILDCARD;
+	}
 try_again:
 	if (non_block) {
-		srp = sg_get_srp_by_id(sfp, pack_id, false, false);
+		srp = sg_find_srp_from(&a_sfpoll);
 		if (!srp)
 			return SG_IS_DETACHING(sdp) ? -ENODEV : -EAGAIN;
 	} else {
-		res = sg_wait_poll_by_id(sfp, &srp, pack_id, false,
-					 !!(h3p->flags & SGV4_FLAG_HIPRI));
+		res = sg_poll_or_wait_ev_srp(&a_sfpoll, &srp, !!(h3p->flags & SGV4_FLAG_HIPRI));
 		if (unlikely(res))
 			return res;	/* signal --> -ERESTARTSYS */
 		if (IS_ERR(srp))
@@ -3597,13 +3619,13 @@ sg_read(struct file *filp, char __user *p, size_t count, loff_t *ppos)
 {
 	bool could_be_v3;
 	bool non_block = !!(filp->f_flags & O_NONBLOCK);
-	int want_id = SG_PACK_ID_WILDCARD;
 	int hlen, ret;
 	struct sg_device *sdp = NULL;
 	struct sg_fd *sfp;
 	struct sg_request *srp = NULL;
 	struct sg_header *h2p = NULL;
 	struct sg_io_hdr a_sg_io_hdr;
+	struct sg_fd_pollable a_sfpoll;
 
 	/*
 	 * This could cause a response to be stranded. Close the associated file descriptor to
@@ -3623,6 +3645,9 @@ sg_read(struct file *filp, char __user *p, size_t count, loff_t *ppos)
 	could_be_v3 = (count >= SZ_SG_IO_HDR);
 	hlen = could_be_v3 ? SZ_SG_IO_HDR : SZ_SG_HEADER;
 	h2p = (struct sg_header *)&a_sg_io_hdr;
+	a_sfpoll.fp = sfp;
+	a_sfpoll.pollable = true;
+	a_sfpoll.part_mrq = false;
 
 	if (test_bit(SG_FFD_FORCE_PACKID, sfp->ffd_bm) && (int)count >= hlen) {
 		/*
@@ -3635,6 +3660,7 @@ sg_read(struct file *filp, char __user *p, size_t count, loff_t *ppos)
 			struct sg_io_hdr *v3_hdr = (struct sg_io_hdr *)h2p;
 
 			if (likely(v3_hdr->interface_id == 'S')) {
+				int want_id;
 				struct sg_io_hdr __user *h3_up;
 
 				h3_up = (struct sg_io_hdr __user *)p;
@@ -3650,23 +3676,31 @@ sg_read(struct file *filp, char __user *p, size_t count, loff_t *ppos)
 					if (flgs & SGV4_FLAG_IMMED)
 						non_block = true;
 				}
+				a_sfpoll.find_by = SG_SEARCH_BY_PACK_ID;
+				a_sfpoll.pack_id_tag = want_id;
 			} else if (v3_hdr->interface_id == 'Q') {
 				pr_info_once("sg: %s: v4 interface disallowed here\n", __func__);
 				return -EPERM;
 			} else {
 				return -EPERM;
 			}
-		} else { /* for v1+v2 interfaces, this is the 3rd integer */
-			want_id = h2p->pack_id;
+		} else { /* for v1+v2 interfaces, pack_id is the 3rd integer */
+			a_sfpoll.find_by = SG_SEARCH_BY_PACK_ID;
+			a_sfpoll.pack_id_tag = h2p->pack_id;
 		}
+	} else {
+		a_sfpoll.find_by = SG_SEARCH_ANY;
+		a_sfpoll.pack_id_tag = SG_PACK_ID_WILDCARD;
 	}
+	a_sfpoll.immed = non_block;
+
 try_again:
 	if (non_block) {
-		srp = sg_get_srp_by_id(sfp, want_id, false, false /* part_mrq */);
+		srp = sg_find_srp_from(&a_sfpoll);
 		if (!srp)
 			return SG_IS_DETACHING(sdp) ? -ENODEV : -EAGAIN;
 	} else {
-		ret = sg_wait_poll_by_id(sfp, &srp, want_id, false, false /* do_poll */);
+		ret = sg_poll_or_wait_ev_srp(&a_sfpoll, &srp, false);
 		if (unlikely(ret))
 			return ret;	/* signal --> -ERESTARTSYS */
 		if (IS_ERR(srp))
@@ -4156,7 +4190,7 @@ sg_fill_request_element(struct sg_fd *sfp, struct sg_request *srp, struct sg_req
 	if (rip->duration == U32_MAX)
 		rip->duration = 0;
 	rip->orphan = test_bit(SG_FRQ_PC_IS_ORPHAN, srp->frq_pc_bm);
-	rip->sg_io_owned = test_bit(SG_FRQ_PC_SYNC_INVOC, srp->frq_pc_bm);
+	rip->sg_io_owned = !test_bit(SG_FRQ_PC_POLLABLE, srp->frq_pc_bm);
 	rip->problem = !sg_result_is_good(srp->rq_result);
 	rip->pack_id = test_bit(SG_FFD_PREFER_TAG, sfp->ffd_bm) ? srp->tag : srp->pack_id;
 	rip->usr_ptr = SG_IS_V4I(srp) ? uptr64(srp->s_hdr4.usr_ptr) : srp->s_hdr3.usr_ptr;
@@ -4173,6 +4207,7 @@ sg_ctl_sg_io(struct sg_device *sdp, struct sg_fd *sfp, void __user *p)
 	u8 hu8arr[SZ_SG_IO_V4];
 	struct sg_io_hdr *h3p = (struct sg_io_hdr *)hu8arr;
 	struct sg_io_v4 *h4p = (struct sg_io_v4 *)hu8arr;
+	struct sg_fd_pollable a_sfpoll;
 
 	SG_LOG(3, sfp, "%s:  SG_IO%s\n", __func__,
 	       (SG_IS_O_NONBLOCK(sfp) ? " O_NONBLOCK ignored" : ""));
@@ -4195,9 +4230,13 @@ sg_ctl_sg_io(struct sg_device *sdp, struct sg_fd *sfp, void __user *p)
 		if (copy_from_user(hu8arr + v3_len, ((u8 __user *)p) + v3_len,
 				   SZ_SG_IO_V4 - v3_len))
 			return -EFAULT;
+		if (h4p->flags & SGV4_FLAG_IMMED)
+			return -EINVAL;
 		is_v4 = true;
 		res = sg_submit_v4(sfp, p, h4p, true, &srp);
 	} else if (h3p->interface_id == 'S') {
+		if (h3p->flags & SGV4_FLAG_IMMED)
+			return -EINVAL;
 		is_v4 = false;
 		res = sg_submit_v3(sfp, h3p, true, &srp);
 	} else {
@@ -4208,7 +4247,11 @@ sg_ctl_sg_io(struct sg_device *sdp, struct sg_fd *sfp, void __user *p)
 		return res;
 	if (!srp)	/* mrq case: already processed all responses */
 		return res;
-	res = sg_poll_wait4_given_srp(sfp, srp);
+	a_sfpoll.fp = sfp;
+	a_sfpoll.pollable = false;
+	a_sfpoll.immed = false;
+	a_sfpoll.part_mrq = false;
+	res = sg_poll_wait4_given_srp(&a_sfpoll, srp);
 #if IS_ENABLED(SG_LOG_ACTIVE)
 	if (unlikely(res))
 		SG_LOG(1, sfp, "%s: unexpected srp=0x%pK  state: %s, share: %s\n", __func__,
@@ -4229,23 +4272,24 @@ sg_ctl_sg_io(struct sg_device *sdp, struct sg_fd *sfp, void __user *p)
  * returns NULL.
  */
 static struct sg_request *
-sg_match_request(struct sg_fd *sfp, bool use_tag, int id)
+sg_match_request(struct sg_fd_pollable *sfp_p, bool use_tag, int id)
 {
 	unsigned long idx;
 	struct sg_request *srp;
+	struct sg_fd *fp = sfp_p->fp;
 
-	if (sg_num_waiting_maybe_acquire(sfp) < 1)
+	if (sg_num_waiting_maybe_acquire(sfp_p) < 1)
 		return NULL;
 	if (id == SG_PACK_ID_WILDCARD) {
-		xa_for_each_marked(&sfp->srp_arr, idx, srp, SG_XA_RQ_AWAIT)
+		xa_for_each_marked(&fp->srp_arr, idx, srp, SG_XA_RQ_AWAIT)
 			return srp;
 	} else if (use_tag) {
-		xa_for_each_marked(&sfp->srp_arr, idx, srp, SG_XA_RQ_AWAIT) {
+		xa_for_each_marked(&fp->srp_arr, idx, srp, SG_XA_RQ_AWAIT) {
 			if (id == srp->tag)
 				return srp;
 		}
 	} else {
-		xa_for_each_marked(&sfp->srp_arr, idx, srp, SG_XA_RQ_AWAIT) {
+		xa_for_each_marked(&fp->srp_arr, idx, srp, SG_XA_RQ_AWAIT) {
 			if (id == srp->pack_id)
 				return srp;
 		}
@@ -4259,18 +4303,19 @@ sg_match_request(struct sg_fd *sfp, bool use_tag, int id)
  * search restarts from the beginning of the list. If no match is found then NULL is returned.
  */
 static struct sg_request *
-sg_match_first_mrq_after(struct sg_fd *sfp, int pack_id, struct sg_request *after_rp)
+sg_match_first_mrq_after(struct sg_fd_pollable *sfp_p, int pack_id, struct sg_request *after_rp)
 {
 	bool found = false;
 	bool look_for_after = after_rp ? true : false;
 	int id;
 	unsigned long idx;
 	struct sg_request *srp;
+	struct sg_fd *fp = sfp_p->fp;
 
-	if (sg_num_waiting_maybe_acquire(sfp) < 1)
+	if (sg_num_waiting_maybe_acquire(sfp_p) < 1)
 		return NULL;
 once_more:
-	xa_for_each_marked(&sfp->srp_arr, idx, srp, SG_XA_RQ_AWAIT) {
+	xa_for_each_marked(&fp->srp_arr, idx, srp, SG_XA_RQ_AWAIT) {
 		if (look_for_after) {
 			if (after_rp == srp)
 				look_for_after = false;
@@ -4340,18 +4385,19 @@ sg_abort_req(struct sg_fd *sfp, struct sg_request *srp)
 
 /* Holding xa_lock_irq(&sfp->srp_arr) */
 static int
-sg_mrq_abort_inflight(struct sg_fd *sfp, int pack_id)
+sg_mrq_abort_inflight(struct sg_fd_pollable *sfp_p, int pack_id)
 {
 	bool got_ebusy = false;
 	int res = 0;
 	struct sg_request *srp;
 	struct sg_request *prev_srp;
+	struct sg_fd *fp = sfp_p->fp;
 
 	for (prev_srp = NULL; true; prev_srp = srp) {
-		srp = sg_match_first_mrq_after(sfp, pack_id, prev_srp);
+		srp = sg_match_first_mrq_after(sfp_p, pack_id, prev_srp);
 		if (!srp)
 			break;
-		res = sg_abort_req(sfp, srp);
+		res = sg_abort_req(fp, srp);
 		if (res == -EBUSY)	/* check rest of active list */
 			got_ebusy = true;
 		else if (res)
@@ -4370,7 +4416,7 @@ sg_mrq_abort_inflight(struct sg_fd *sfp, int pack_id)
  * ctl_obj.request_extra (pack_id).
  */
 static int
-sg_mrq_abort(struct sg_fd *sfp, int pack_id, bool dev_scope)
+sg_mrq_abort(struct sg_fd_pollable *sfp_p, int pack_id, bool dev_scope)
 		__must_hold(sfp->f_mutex)
 {
 	int existing_id;
@@ -4379,63 +4425,67 @@ sg_mrq_abort(struct sg_fd *sfp, int pack_id, bool dev_scope)
 	struct sg_device *sdp;
 	struct sg_fd *o_sfp;
 	struct sg_fd *s_sfp;
+	struct sg_fd *fp = sfp_p->fp;
 
 	if (pack_id != SG_PACK_ID_WILDCARD)
-		SG_LOG(3, sfp, "%s: pack_id=%d, dev_scope=%s\n", __func__, pack_id,
+		SG_LOG(3, fp, "%s: pack_id=%d, dev_scope=%s\n", __func__, pack_id,
 		       (dev_scope ? "true" : "false"));
-	existing_id = atomic_read(&sfp->mrq_id_abort);
+	existing_id = atomic_read(&fp->mrq_id_abort);
 	if (existing_id == 0) {
 		if (dev_scope)
 			goto check_whole_dev;
-		SG_LOG(1, sfp, "%s: sfp->mrq_id_abort is 0, nothing to do\n", __func__);
+		SG_LOG(1, fp, "%s: sfp->mrq_id_abort is 0, nothing to do\n", __func__);
 		return -EADDRNOTAVAIL;
 	}
 	if (pack_id == SG_PACK_ID_WILDCARD) {
 		pack_id = existing_id;
-		SG_LOG(3, sfp, "%s: wildcard becomes pack_id=%d\n", __func__, pack_id);
+		SG_LOG(3, fp, "%s: wildcard becomes pack_id=%d\n", __func__, pack_id);
 	} else if (pack_id != existing_id) {
 		if (dev_scope)
 			goto check_whole_dev;
-		SG_LOG(1, sfp, "%s: want id=%d, got sfp->mrq_id_abort=%d\n", __func__, pack_id,
+		SG_LOG(1, fp, "%s: want id=%d, got sfp->mrq_id_abort=%d\n", __func__, pack_id,
 		       existing_id);
 		return -EADDRINUSE;
 	}
-	if (test_and_set_bit(SG_FFD_MRQ_ABORT, sfp->ffd_bm))
-		SG_LOG(2, sfp, "%s: repeated SG_IOABORT on mrq_id=%d\n", __func__, pack_id);
+	if (test_and_set_bit(SG_FFD_MRQ_ABORT, fp->ffd_bm))
+		SG_LOG(2, fp, "%s: repeated SG_IOABORT on mrq_id=%d\n", __func__, pack_id);
 
 	/* now look for inflight requests matching that mrq pack_id */
-	xa_lock_irqsave(&sfp->srp_arr, iflags);
-	res = sg_mrq_abort_inflight(sfp, pack_id);
+	xa_lock_irqsave(&fp->srp_arr, iflags);
+	res = sg_mrq_abort_inflight(sfp_p, pack_id);
 	if (res == -EBUSY) {
-		res = sg_mrq_abort_inflight(sfp, pack_id);
+		res = sg_mrq_abort_inflight(sfp_p, pack_id);
 		if (res)
 			goto fini;
 	}
-	s_sfp = sg_fd_share_ptr(sfp);
+	s_sfp = sg_fd_share_ptr(fp);
 	if (s_sfp) {	/* SGV4_FLAG_DO_ON_OTHER possible */
-		xa_unlock_irqrestore(&sfp->srp_arr, iflags);
-		sfp = s_sfp;	/* if share, switch to other fd */
-		xa_lock_irqsave(&sfp->srp_arr, iflags);
-		if (!sg_fd_is_shared(sfp))
+		xa_unlock_irqrestore(&fp->srp_arr, iflags);
+		fp = s_sfp;	/* if share, switch to other fd */
+		xa_lock_irqsave(&fp->srp_arr, iflags);
+		if (!sg_fd_is_shared(fp))
 			goto fini;
 		/* tough luck if other fd used same mrq pack_id */
-		res = sg_mrq_abort_inflight(sfp, pack_id);
+		res = sg_mrq_abort_inflight(sfp_p, pack_id);
 		if (res == -EBUSY)
-			res = sg_mrq_abort_inflight(sfp, pack_id);
+			res = sg_mrq_abort_inflight(sfp_p, pack_id);
 	}
 fini:
-	xa_unlock_irqrestore(&sfp->srp_arr, iflags);
+	xa_unlock_irqrestore(&fp->srp_arr, iflags);
 	return res;
 
 check_whole_dev:
 	res = -ENODATA;
-	sdp = sfp->parentdp;
+	sdp = fp->parentdp;
 	xa_for_each(&sdp->sfp_arr, idx, o_sfp) {
-		if (o_sfp == sfp)
+		struct sg_fd_pollable a_sfpoll = *sfp_p;
+
+		if (o_sfp == fp)
 			continue;       /* already checked */
 		mutex_lock(&o_sfp->f_mutex);
+		a_sfpoll.fp = o_sfp;
 		/* recurse, dev_scope==false is stopping condition */
-		res = sg_mrq_abort(o_sfp, pack_id, false);
+		res = sg_mrq_abort(&a_sfpoll, pack_id, false);
 		mutex_unlock(&o_sfp->f_mutex);
 		if (res == 0)
 			break;
@@ -4461,6 +4511,7 @@ sg_ctl_abort(struct sg_device *sdp, struct sg_fd *sfp, void __user *p)
 	struct sg_request *srp;
 	struct sg_io_v4 io_v4;
 	struct sg_io_v4 *h4p = &io_v4;
+	struct sg_fd_pollable a_sfpoll;
 
 	if (copy_from_user(h4p, p, SZ_SG_IO_V4))
 		return -EFAULT;
@@ -4468,17 +4519,22 @@ sg_ctl_abort(struct sg_device *sdp, struct sg_fd *sfp, void __user *p)
 		return -EPERM;
 	dev_scope = !!(h4p->flags & SGV4_FLAG_DEV_SCOPE);
 	pack_id = h4p->request_extra;
+	a_sfpoll.fp = sfp;
+	a_sfpoll.pollable = true;
+	a_sfpoll.immed = true;
 	if (h4p->flags & SGV4_FLAG_MULTIPLE_REQS) {
+		a_sfpoll.part_mrq = true;
 		if (pack_id == 0)
 			return -ENOSTR;
-		res = sg_mrq_abort(sfp, pack_id, dev_scope);
+		res = sg_mrq_abort(&a_sfpoll, pack_id, dev_scope);
 		return res;
 	}
+	a_sfpoll.part_mrq = false;
 	xa_lock_irqsave(&sfp->srp_arr, iflags);
 	use_tag = test_bit(SG_FFD_PREFER_TAG, sfp->ffd_bm);
 	id = use_tag ? (int)h4p->request_tag : pack_id;
 
-	srp = sg_match_request(sfp, use_tag, id);
+	srp = sg_match_request(&a_sfpoll, use_tag, id);
 	if (!srp) {	/* assume device (not just fd) scope */
 		xa_unlock_irqrestore(&sfp->srp_arr, iflags);
 		if (!dev_scope)
@@ -4486,7 +4542,8 @@ sg_ctl_abort(struct sg_device *sdp, struct sg_fd *sfp, void __user *p)
 		xa_for_each(&sdp->sfp_arr, idx, o_sfp) {
 			if (o_sfp == sfp)
 				continue;	/* already checked */
-			srp = sg_match_request(o_sfp, use_tag, id);
+			a_sfpoll.fp = o_sfp;
+			srp = sg_match_request(&a_sfpoll, use_tag, id);
 			if (srp) {
 				sfp = o_sfp;
 				xa_lock_irqsave(&sfp->srp_arr, iflags);
@@ -4817,8 +4874,6 @@ sg_any_persistent_orphans(struct sg_fd *sfp)
 		struct sg_request *srp;
 		struct xarray *xafp = &sfp->srp_arr;
 
-		if (sg_num_waiting_maybe_acquire(sfp) < 1)
-			return false;
 		xa_for_each_marked(xafp, idx, srp, SG_XA_RQ_AWAIT) {
 			if (test_bit(SG_FRQ_PC_IS_ORPHAN, srp->frq_pc_bm))
 				return true;
@@ -5228,7 +5283,7 @@ sg_ctl_extended(struct sg_fd *sfp, void __user *p)
 	if (or_masks & SG_SEIM_BLK_POLL) {
 		n = 0;
 		if (s_wr_mask & SG_SEIM_BLK_POLL) {
-			result = sg_sfp_blk_poll(sfp, seip->num);
+			result = sg_sfp_blk_poll_all(sfp, seip->num);
 			if (unlikely(result < 0)) {
 				if (ret == 0)
 					ret = result;
@@ -5368,14 +5423,14 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp, uns
 	case SG_GET_NUM_WAITING:
 		/* Want as fast as possible, with a useful result */
 		if (test_bit(SG_FFD_HIPRI_SEEN, sfp->ffd_bm)) {
-			res = sg_sfp_blk_poll(sfp, 0);	/* LLD may have some ready */
+			res = sg_sfp_blk_poll_all(sfp, 1);	/* LLD may have some ready */
 			if (unlikely(res < 0))
 				return res;
 		}
-		val = atomic_read(&sfp->waiting);
+		val = atomic_read(&sfp->poll_waiting);
 		if (val)
 			return put_user(val, ip);
-		return put_user(atomic_read_acquire(&sfp->waiting), ip);
+		return put_user(atomic_read_acquire(&sfp->poll_waiting), ip);
 	case SG_IO:
 		if (SG_IS_DETACHING(sdp))
 			return -ENODEV;
@@ -5423,18 +5478,18 @@ sg_ioctl_common(struct file *filp, struct sg_device *sdp, struct sg_fd *sfp, uns
 			return res;
 		assign_bit(SG_FFD_FORCE_PACKID, sfp->ffd_bm, !!val);
 		return 0;
-	case SG_GET_PACK_ID:    /* or tag of oldest "read"-able, -1 if none */
+	case SG_GET_PACK_ID:    /* or tag of oldest pollable, -1 if none */
 		val = -1;
 		if (test_bit(SG_FFD_PREFER_TAG, sfp->ffd_bm)) {
 			xa_for_each_marked(&sfp->srp_arr, idx, srp, SG_XA_RQ_AWAIT) {
-				if (!test_bit(SG_FRQ_PC_SYNC_INVOC, srp->frq_pc_bm)) {
+				if (test_bit(SG_FRQ_PC_POLLABLE, srp->frq_pc_bm)) {
 					val = srp->tag;
 					break;
 				}
 			}
 		} else {
 			xa_for_each_marked(&sfp->srp_arr, idx, srp, SG_XA_RQ_AWAIT) {
-				if (!test_bit(SG_FRQ_PC_SYNC_INVOC, srp->frq_pc_bm)) {
+				if (test_bit(SG_FRQ_PC_POLLABLE, srp->frq_pc_bm)) {
 					val = srp->pack_id;
 					break;
 				}
@@ -5637,67 +5692,56 @@ sg_compat_ioctl(struct file *filp, unsigned int cmd_in, unsigned long arg)
 #endif
 
 /*
- * If the sg_request object is not inflight, return -ENODATA. This function
- * returns 1 if the given object was in inflight state and is in await_rcv
- * state after blk_poll() returns 1 or more. If blk_poll() fails, then that
- * (negative) value is returned. Otherwise returns 0. Note that blk_poll()
- * may complete unrelated requests that share the same q and cookie.
+ * For each inflight HIPRI request, call blk_poll(spin <- false) loop_count times (once if
+ * loop_count is 0). If loop_count is negative, call blk_poll(spin <- true) once per request.
+ * Returns the sum of positive blk_poll() results, or the first negative value it returns.
  */
 static int
-sg_srp_q_blk_poll(struct sg_request *srp, struct request_queue *q, int loop_count)
+sg_sfp_blk_poll_all(struct sg_fd *sfp, int loop_count)
 {
+	int res = 0;
 	int k, n, num;
+	unsigned long idx;
+	struct sg_request *srp;
+	struct scsi_device *sdev = sfp->parentdp->device;
+	struct request_queue *q = sdev ? sdev->request_queue : NULL;
 
+	if (unlikely(!q))
+		return -EINVAL;
 	num = (loop_count < 1) ? 1 : loop_count;
-	for (k = 0; k < num; ++k) {
-		if (atomic_read(&srp->rq_st) != SG_RQ_INFLIGHT)
-			return -ENODATA;
-		n = blk_poll(q, srp->cookie, loop_count < 0 /* spin if negative */);
-		if (n > 0)
-			return atomic_read(&srp->rq_st) == SG_RQ_AWAIT_RCV;
-		if (n < 0)
-			return n;
+	xa_for_each(&sfp->srp_arr, idx, srp) {
+		if ((srp->rq_flags & SGV4_FLAG_HIPRI) &&
+		    atomic_read(&srp->rq_st) == SG_RQ_INFLIGHT &&
+		    test_bit(SG_FRQ_PC_ISSUED, srp->frq_pc_bm)) {
+			for (k = 0; k < num; ++k) {
+				n = blk_poll(q, srp->cookie, loop_count < 0);
+				if (n < 0)
+					return n;
+				if (n > 0)
+					res += n;
+			}
+		}
 	}
-	return 0;
+	return res;
 }
 
-/*
- * Check all requests on this sfp that are both inflight and HIPRI. That check involves calling
- * blk_poll(spin<-false) loop_count times. If loop_count is 0 then call blk_poll once.
- * If loop_count is negative then call blk_poll(spin <- true)) once for each request.
- * Returns number found (could be 0) or a negated errno value.
- */
 static int
-sg_sfp_blk_poll(struct sg_fd *sfp, int loop_count)
+sg_sfp_blk_poll_first(struct sg_fd *sfp)
 {
-	int res = 0;
-	int n;
-	unsigned long idx, iflags;
+	unsigned long idx;
 	struct sg_request *srp;
 	struct scsi_device *sdev = sfp->parentdp->device;
 	struct request_queue *q = sdev ? sdev->request_queue : NULL;
-	struct xarray *xafp = &sfp->srp_arr;
 
 	if (unlikely(!q))
 		return -EINVAL;
-	xa_lock_irqsave(xafp, iflags);
-	xa_for_each(xafp, idx, srp) {
+	xa_for_each(&sfp->srp_arr, idx, srp) {
 		if ((srp->rq_flags & SGV4_FLAG_HIPRI) &&
-		    !test_bit(SG_FRQ_PC_SYNC_INVOC, srp->frq_pc_bm) &&
 		    atomic_read(&srp->rq_st) == SG_RQ_INFLIGHT &&
-		    test_bit(SG_FRQ_PC_ISSUED, srp->frq_pc_bm)) {
-			xa_unlock_irqrestore(xafp, iflags);
-			n = sg_srp_q_blk_poll(srp, q, loop_count);
-			if (n == -ENODATA)
-				n = 0;
-			if (unlikely(n < 0))
-				return n;
-			xa_lock_irqsave(xafp, iflags);
-			res += n;
-		}
+		    test_bit(SG_FRQ_PC_ISSUED, srp->frq_pc_bm))
+			return blk_poll(q, srp->cookie, false);
 	}
-	xa_unlock_irqrestore(xafp, iflags);
-	return res;
+	return 0;
 }
 
 /* Implements the poll(2) system call. Returns various EPOLL* flags OR-ed together. */
@@ -5709,11 +5753,11 @@ sg_poll(struct file *filp, poll_table *wait)
 	struct sg_fd *sfp = filp->private_data;
 
 	if (test_bit(SG_FFD_HIPRI_SEEN, sfp->ffd_bm))
-		sg_sfp_blk_poll(sfp, 0);	/* LLD may have some ready to push up */
-	num = atomic_read(&sfp->waiting);
+		sg_sfp_blk_poll_all(sfp, 1);	/* LLD may have some ready to push up */
+	num = atomic_read(&sfp->poll_waiting);
 	if (num < 1) {
 		poll_wait(filp, &sfp->cmpl_wait, wait);
-		num = atomic_read(&sfp->waiting);
+		num = atomic_read(&sfp->poll_waiting);
 	}
 	if (num > 0)
 		p_res = EPOLLIN | EPOLLRDNORM;
@@ -5916,7 +5960,7 @@ static void
 sg_rq_end_io(struct request *rqq, blk_status_t status)
 {
 	enum sg_rq_state rqq_state = SG_RQ_AWAIT_RCV;
-	int a_resid, slen;
+	int a_resid, slen, num;
 	u32 rq_result;
 	unsigned long iflags;
 	struct sg_request *srp = rqq->end_io_data;
@@ -5976,30 +6020,31 @@ sg_rq_end_io(struct request *rqq, blk_status_t status)
 		}
 	}
 	srp->sense_len = slen;
-	if (unlikely(test_bit(SG_FRQ_PC_IS_ORPHAN, srp->frq_pc_bm))) {
-		if (test_bit(SG_FFD_KEEP_ORPHAN, sfp->ffd_bm)) {
-			__clear_bit(SG_FRQ_PC_SYNC_INVOC, srp->frq_pc_bm);
-		} else {
+	xa_lock_irqsave(&sfp->srp_arr, iflags);
+	if (unlikely(test_bit(SG_FFD_SIG_PEND, sfp->ffd_bm))) {
+		__set_bit(SG_FRQ_PC_IS_ORPHAN, srp->frq_pc_bm);
+		if (test_bit(SG_FRQ_PC_POLLABLE, srp->frq_pc_bm) &&
+		    (!test_bit(SG_FFD_KEEP_ORPHAN, sfp->ffd_bm))) {
 			rqq_state = SG_RQ_BUSY;
+			/* not KEEP_ORPHAN, so this orphan will be harvested */
 			__set_bit(SG_FRQ_PC_DEACT_ORPHAN, srp->frq_pc_bm);
 		}
 	}
-	xa_lock_irqsave(&sfp->srp_arr, iflags);
 	__set_bit(SG_FRQ_PC_ISSUED, srp->frq_pc_bm);
 	sg_rq_chg_state_force_ulck(srp, rqq_state);	/* normally --> SG_RQ_AWAIT_RCV */
 	WRITE_ONCE(srp->rqq, NULL);
-	if (test_bit(SG_FRQ_PC_COUNT_ACTIVE, srp->frq_pc_bm)) {
-		int num = atomic_inc_return(&sfp->waiting);
+	if (test_bit(SG_FRQ_PC_POLLABLE, srp->frq_pc_bm))
+		num = atomic_inc_return(&sfp->poll_waiting);
+	else
+		num = atomic_inc_return(&sfp->nonp_waiting);
+	if (num < 2) {
+		WRITE_ONCE(sfp->low_await_idx, srp->rq_idx);
+	} else {
+		int l_await_idx = READ_ONCE(sfp->low_await_idx);
 
-		if (num < 2) {
+		if (l_await_idx < 0 || srp->rq_idx < l_await_idx ||
+		    !xa_get_mark(&sfp->srp_arr, l_await_idx, SG_XA_RQ_AWAIT))
 			WRITE_ONCE(sfp->low_await_idx, srp->rq_idx);
-		} else {
-			int l_await_idx = READ_ONCE(sfp->low_await_idx);
-
-			if (l_await_idx < 0 || srp->rq_idx < l_await_idx ||
-			    !xa_get_mark(&sfp->srp_arr, l_await_idx, SG_XA_RQ_AWAIT))
-				WRITE_ONCE(sfp->low_await_idx, srp->rq_idx);
-		}
 	}
 	xa_unlock_irqrestore(&sfp->srp_arr, iflags);
 	/*
@@ -6007,7 +6052,7 @@ sg_rq_end_io(struct request *rqq, blk_status_t status)
 	 * can be called later from user context.
 	 */
 	scsi_req_free_cmd(scsi_rp);
-	blk_put_request(rqq);
+	blk_put_request(rqq);		/* may want to delay this in HIPRI case */
 
 	if (unlikely(rqq_state != SG_RQ_AWAIT_RCV)) {
 		/* clean up orphaned request that aren't being kept */
@@ -6640,13 +6685,18 @@ sg_finish_scsi_blk_rq(struct sg_request *srp)
 	struct bio *bio;
 	__maybe_unused char b[32];
 
+	if (xa_get_mark(&sfp->srp_arr, srp->rq_idx, SG_XA_RQ_INACTIVE)) {
+		SG_LOG(1, sfp, "%s: warning: already inactive srp=0x%pK\n", __func__, srp);
+		return;
+	}
 	SG_LOG(4, sfp, "%s: srp=0x%pK%s\n", __func__, srp, sg_get_rsv_str_lck(srp, " ", "",
 									      sizeof(b), b));
-	if (test_and_clear_bit(SG_FRQ_PC_COUNT_ACTIVE, srp->frq_pc_bm)) {
-		if (atomic_dec_and_test(&sfp->submitted))
-			clear_bit(SG_FFD_HIPRI_SEEN, sfp->ffd_bm);
-		atomic_dec_return_release(&sfp->waiting);
-	}
+	if (atomic_dec_and_test(&sfp->submitted))
+		clear_bit(SG_FFD_HIPRI_SEEN, sfp->ffd_bm);
+	if (test_bit(SG_FRQ_PC_POLLABLE, srp->frq_pc_bm))
+		atomic_dec_return_release(&sfp->poll_waiting);
+	else
+		atomic_dec_return_release(&sfp->nonp_waiting);
 	/* Expect blk_put_request(rqq) already called in sg_rq_end_io() */
 	if (unlikely(rqq)) {
 		WRITE_ONCE(srp->rqq, NULL);
@@ -6827,98 +6877,78 @@ sg_read_append(struct sg_request *srp, void __user *outp, int num_xfer)
 }
 
 /*
- * If there are many requests outstanding, the speed of this function is important. 'id' is pack_id
- * when is_tag=false, otherwise it is a tag. Both SG_PACK_ID_WILDCARD and SG_TAG_WILDCARD are -1
- * and that case is typically the fast path. This function is only used in the non-blocking cases.
- * Returns pointer to (first) matching sg_request or NULL. If found, sg_request state is moved
- * from SG_RQ_AWAIT_RCV to SG_RQ_BUSY.
+ * If there are many requests outstanding, the speed of this function is important. If the
+ * number waiting (split into two counts: poll_waiting and nonp_waiting) is 0, returns NULL.
+ * Otherwise it scans the awaiting requests that satisfy the constraints in sfp_p and tries to
+ * move each one found to busy state; the first that succeeds is returned. If the matching
+ * requests are exhausted before one can be moved to busy state, NULL is returned. If the
+ * initial blk_poll() fails, its negative result is returned via ERR_PTR().
  */
 static struct sg_request *
-sg_get_srp_by_id(struct sg_fd *sfp, int id, bool is_tag, bool part_mrq)
+sg_find_srp_from(struct sg_fd_pollable *sfp_p)
 {
 	__maybe_unused bool is_bad_st = false;
-	bool search_for_1 = (id != SG_TAG_WILDCARD);
+	bool search_for_any = (sfp_p->find_by == SG_SEARCH_ANY || sfp_p->pack_id_tag == -1);
 	bool second = false;
 	int res;
-	int l_await_idx = READ_ONCE(sfp->low_await_idx);
 	unsigned long idx, s_idx;
 	unsigned long end_idx = ULONG_MAX;
+	struct sg_fd *fp = sfp_p->fp;
+	int l_await_idx = READ_ONCE(fp->low_await_idx);
 	struct sg_request *srp = NULL;
-	struct xarray *xafp = &sfp->srp_arr;
+	struct xarray *xafp = &fp->srp_arr;
 
-	if (test_bit(SG_FFD_HIPRI_SEEN, sfp->ffd_bm)) {
-		res = sg_sfp_blk_poll(sfp, 0);	/* LLD may have some ready to push up */
+	if (test_bit(SG_FFD_HIPRI_SEEN, fp->ffd_bm)) {
+		res = sg_sfp_blk_poll_first(fp);	/* LLD may have some ready to push up */
 		if (unlikely(res < 0))
 			return ERR_PTR(res);
 	}
-	if (sg_num_waiting_maybe_acquire(sfp) < 1)
+	if (sg_num_waiting_maybe_acquire(sfp_p) < 1)
 		return NULL;
 
 	s_idx = (l_await_idx < 0) ? 0 : l_await_idx;
 	idx = s_idx;
-	if (unlikely(search_for_1)) {
-second_time_for_1:
-		for (srp = xa_find(xafp, &idx, end_idx, SG_XA_RQ_AWAIT);
-		     srp;
-		     srp = xa_find_after(xafp, &idx, end_idx, SG_XA_RQ_AWAIT)) {
-			if (part_mrq != test_bit(SG_FRQ_PC_PART_MRQ, srp->frq_pc_bm))
-				continue;
-			if (!part_mrq && test_bit(SG_FRQ_PC_SYNC_INVOC, srp->frq_pc_bm))
-				continue;
-			if (unlikely(is_tag)) {
-				if (srp->tag != id)
+	/* first search from [s_idx ... end_idx) */
+second_time:
+	for (srp = xa_find(xafp, &idx, end_idx, SG_XA_RQ_AWAIT);
+	     srp;
+	     srp = xa_find_after(xafp, &idx, end_idx, SG_XA_RQ_AWAIT)) {
+		if (sfp_p->part_mrq != test_bit(SG_FRQ_PC_PART_MRQ, srp->frq_pc_bm))
+			continue;
+		if (sfp_p->pollable != test_bit(SG_FRQ_PC_POLLABLE, srp->frq_pc_bm))
+			continue;
+		if (unlikely(!search_for_any)) {
+			if (sfp_p->find_by == SG_SEARCH_BY_TAG) {
+				if (srp->tag != sfp_p->pack_id_tag)
 					continue;
 			} else {
-				if (srp->pack_id != id)
+				if (srp->pack_id != sfp_p->pack_id_tag)
 					continue;
 			}
-			res = sg_rq_chg_state(srp, SG_RQ_AWAIT_RCV, SG_RQ_BUSY);
-			if (likely(res == 0))
-				goto good;
-		}
-		/* If not found so far, need to wrap around and search [0 ... s_idx) */
-		if (!srp && !second && s_idx > 0) {
-			end_idx = s_idx - 1;
-			s_idx = 0;
-			idx = s_idx;
-			second = true;
-			goto second_time_for_1;
 		}
-	} else {
-		/*
-		 * Searching for _any_ request is the more likely usage. Start searching with the
-		 * last xarray index that was used. In the case of a large-ish IO depth, it is
-		 * likely that the second (relative) position will be the request we want, if it
-		 * is ready. If there is no queuing and the "last used" has been re-used then the
-		 * first (relative) position will be the request we want.
-		 */
-second_time_for_any:
-		for (srp = xa_find(xafp, &idx, end_idx, SG_XA_RQ_AWAIT);
-		     srp;
-		     srp = xa_find_after(xafp, &idx, end_idx, SG_XA_RQ_AWAIT)) {
-			if (test_bit(SG_FRQ_PC_SYNC_INVOC, srp->frq_pc_bm))
-				continue;
-			res = sg_rq_chg_state(srp, SG_RQ_AWAIT_RCV, SG_RQ_BUSY);
-			if (likely(res == 0)) {
-				WRITE_ONCE(sfp->low_await_idx, idx + 1);
-				goto good;
-			}
+		res = sg_rq_chg_state(srp, SG_RQ_AWAIT_RCV, SG_RQ_BUSY);
+		if (likely(res == 0)) {
+			if (search_for_any)
+				WRITE_ONCE(fp->low_await_idx, idx + 1);
+			goto good;
+		}
 #if IS_ENABLED(SG_LOG_ACTIVE)
-			sg_rq_state_fail_msg(sfp, SG_RQ_AWAIT_RCV, SG_RQ_BUSY, __func__);
+		sg_rq_state_fail_msg(fp, SG_RQ_AWAIT_RCV, SG_RQ_BUSY, __func__);
 #endif
-		}
-		if (!srp && !second && s_idx > 0) {
-			end_idx = s_idx - 1;
-			s_idx = 0;
-			idx = s_idx;
-			second = true;
-			goto second_time_for_any;
-		}
+	}
+	/* If not found so far, need to wrap around and search [0 ... s_idx) */
+	if (!srp && !second && s_idx > 0) {
+		end_idx = s_idx - 1;
+		s_idx = 0;
+		idx = s_idx;
+		second = true;
+		goto second_time;
 	}
 	return NULL;
 good:
-	SG_LOG(5, sfp, "%s: %s%d found [srp=0x%pK]\n", __func__, (is_tag ? "tag=" : "pack_id="),
-	       id, srp);
+	SG_LOG(5, fp, "%s: %s%d found [srp=0x%pK]\n", __func__,
+	       ((sfp_p->find_by == SG_SEARCH_BY_TAG) ? "tag=" : "pack_id="), sfp_p->pack_id_tag,
+	       srp);
 	return srp;
 }
 
@@ -7392,6 +7422,10 @@ sg_deact_request(struct sg_fd *sfp, struct sg_request *srp)
 
 	if (WARN_ON(!sfp || !srp))
 		return;
+	if (xa_get_mark(&sfp->srp_arr, srp->rq_idx, SG_XA_RQ_INACTIVE)) {
+		SG_LOG(4, sfp, "%s: warning: already inactive srp=0x%pK\n", __func__, srp);
+		return;
+	}
 	SG_LOG(3, sfp, "%s: srp=%pK\n", __func__, srp);
 	sbp = srp->sense_bp;
 	srp->sense_bp = NULL;
@@ -7444,7 +7478,8 @@ sg_add_sfp(struct sg_device *sdp, struct file *filp)
 	sfp->tot_fd_thresh = SG_TOT_FD_THRESHOLD;
 	atomic_set(&sfp->sum_fd_dlens, 0);
 	atomic_set(&sfp->submitted, 0);
-	atomic_set(&sfp->waiting, 0);
+	atomic_set(&sfp->poll_waiting, 0);
+	atomic_set(&sfp->nonp_waiting, 0);
 	atomic_set(&sfp->inactives, 0);
 	/*
 	 * SG_SCATTER_SZ initializes scatter_elem_sz but different value may be given as driver
@@ -7921,6 +7956,8 @@ sg_proc_debug_sreq(struct sg_request *srp, int to, bool t_in_ns, bool inactive,
 			       sg_shr_str(srp->sh_var, false));
 	if (srp->sgatp->num_sgat > 1)
 		n += scnprintf(obp + n, len - n, " sgat=%d", srp->sgatp->num_sgat);
+	if (test_bit(SG_FRQ_PC_POLLABLE, srp->frq_pc_bm))
+		n += scnprintf(obp + n, len - n, " pollable");
 	if (test_bit(SG_FRQ_LT_REUSE_BIO, srp->frq_lt_bm))
 		n += scnprintf(obp + n, len - n, " re-use_bio");
 	cp = (srp->rq_flags & SGV4_FLAG_HIPRI) ? "hipri " : "";
@@ -7977,9 +8014,10 @@ sg_proc_debug_fd(struct sg_fd *fp, char *obp, int len, unsigned long idx, bool r
 		       fp->mmap_sz, READ_ONCE(fp->low_used_idx), READ_ONCE(fp->low_await_idx),
 		       atomic_read(&fp->sum_fd_dlens));
 	n += scnprintf(obp + n, len - n,
-		       "   submitted=%d waiting=%d inactives=%d   open thr_id=%d\n",
-		       atomic_read(&fp->submitted), atomic_read(&fp->waiting),
-		       atomic_read(&fp->inactives), fp->tid);
+		       "   submitted=%d poll_waiting=%d nonp_waiting=%d inactives=%d   %s=%d\n",
+		       atomic_read(&fp->submitted), atomic_read(&fp->poll_waiting),
+		       atomic_read(&fp->nonp_waiting), atomic_read(&fp->inactives),
+		       "open thr_id", fp->tid);
 	if (reduced)
 		return n;
 	k = 0;
-- 
2.25.1



* [PATCH v18 83/83] sg: bump version to 4.0.47
  2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
                   ` (82 preceding siblings ...)
  2021-04-27 21:57 ` [PATCH v18 82/83] sg: pollable and non-pollable requests Douglas Gilbert
@ 2021-04-27 21:57 ` Douglas Gilbert
  83 siblings, 0 replies; 86+ messages in thread
From: Douglas Gilbert @ 2021-04-27 21:57 UTC (permalink / raw)
  To: linux-scsi; +Cc: martin.petersen, jejb, hare

Version is bumped to > 4.0.30 so that test tools in sg3_utils (beta 1.45
revision 827 and later) can distinguish between this full-featured
driver and a reduced-featured version. The full-featured driver
supports request sharing and multiple requests in a single
invocation. The reduced-featured driver only adds sg v4 interface
support; its version numbers are >= 4.0.00 and < 4.0.30.

Both versions of the sg v4 driver are described on this webpage:
    https://sg.danny.cz/sg/sg_v40.html

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
---
 drivers/scsi/sg.c      | 4 ++--
 include/uapi/scsi/sg.h | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 5328befc0893..f10b252fc091 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -12,8 +12,8 @@
  *
  */
 
-static int sg_version_num = 40012;  /* [x]xyyzz where [x] empty when x=0 */
-#define SG_VERSION_STR "4.0.12"		/* [x]x.[y]y.zz */
+static int sg_version_num = 40047;  /* [x]xyyzz where [x] empty when x=0 */
+#define SG_VERSION_STR "4.0.47"		/* [x]x.[y]y.zz */
 static char *sg_version_date = "20210421";
 
 #include <linux/module.h>
diff --git a/include/uapi/scsi/sg.h b/include/uapi/scsi/sg.h
index 871073d1a8d3..22ed1e1ad91e 100644
--- a/include/uapi/scsi/sg.h
+++ b/include/uapi/scsi/sg.h
@@ -14,7 +14,7 @@
  * Later extensions (versions 2, 3 and 4) to driver:
  *   Copyright (C) 1998 - 2021 Douglas Gilbert
  *
- * Version 4.0.12 (20210111)
+ * Version 4.0.47 (20210111)
  *  This version is for Linux 4 and 5 series kernels.
  *
  * Documentation
-- 
2.25.1



* Re: [PATCH v18 43/83] sg: no_dxfer: move to/from kernel buffers
  2021-04-27 21:56 ` [PATCH v18 43/83] sg: no_dxfer: move to/from kernel buffers Douglas Gilbert
@ 2021-04-28  7:07   ` Hannes Reinecke
  0 siblings, 0 replies; 86+ messages in thread
From: Hannes Reinecke @ 2021-04-28  7:07 UTC (permalink / raw)
  To: Douglas Gilbert, linux-scsi; +Cc: martin.petersen, jejb

On 4/27/21 11:56 PM, Douglas Gilbert wrote:
> When the NO_DXFER flag is used on a command/request, the data-in
> and data-out buffers (if present) should not be ignored. Add an
> sg_rq_map_kern() function to handle this. Uses a single bio with
> multiple bvec_s, each usually holding multiple pages if necessary.
> The driver default element size is 32 KiB so if PAGE_SIZE is 4096
> then get_order()==3.
> 
> Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
> ---
>  drivers/scsi/sg.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 59 insertions(+)
> Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		        Kernel Storage Architect
hare@suse.de			               +49 911 74053 688
SUSE Software Solutions Germany GmbH, 90409 Nürnberg
GF: F. Imendörffer, HRB 36809 (AG Nürnberg)


Thread overview: 86+ messages
2021-04-27 21:56 [PATCH v18 00/83] sg: add v4 interface, request sharing Douglas Gilbert
2021-04-27 21:56 ` [PATCH v18 00/45] sg: add v4 interface Douglas Gilbert
2021-04-27 21:56 ` [PATCH v18 01/83] sg: move functions around Douglas Gilbert
2021-04-27 21:56 ` [PATCH v18 02/83] sg: remove typedefs, type+formatting cleanup Douglas Gilbert
2021-04-27 21:56 ` [PATCH v18 03/83] sg: sg_log and is_enabled Douglas Gilbert
2021-04-27 21:56 ` [PATCH v18 04/83] sg: rework sg_poll(), minor changes Douglas Gilbert
2021-04-27 21:56 ` [PATCH v18 05/83] sg: bitops in sg_device Douglas Gilbert
2021-04-27 21:56 ` [PATCH v18 06/83] sg: make open count an atomic Douglas Gilbert
2021-04-27 21:56 ` [PATCH v18 07/83] sg: move header to uapi section Douglas Gilbert
2021-04-27 21:56 ` [PATCH v18 08/83] sg: speed sg_poll and sg_get_num_waiting Douglas Gilbert
2021-04-27 21:56 ` [PATCH v18 09/83] sg: sg_allow_if_err_recovery and renames Douglas Gilbert
2021-04-27 21:56 ` [PATCH v18 10/83] sg: improve naming Douglas Gilbert
2021-04-27 21:56 ` [PATCH v18 11/83] sg: change rwlock to spinlock Douglas Gilbert
2021-04-27 21:56 ` [PATCH v18 12/83] sg: ioctl handling Douglas Gilbert
2021-04-27 21:56 ` [PATCH v18 13/83] sg: split sg_read Douglas Gilbert
2021-04-27 21:56 ` [PATCH v18 14/83] sg: sg_common_write add structure for arguments Douglas Gilbert
2021-04-27 21:56 ` [PATCH v18 15/83] sg: rework sg_vma_fault Douglas Gilbert
2021-04-27 21:56 ` [PATCH v18 16/83] sg: rework sg_mmap Douglas Gilbert
2021-04-27 21:56 ` [PATCH v18 17/83] sg: replace sg_allow_access Douglas Gilbert
2021-04-27 21:56 ` [PATCH v18 18/83] sg: rework scatter gather handling Douglas Gilbert
2021-04-27 21:56 ` [PATCH v18 19/83] sg: introduce request state machine Douglas Gilbert
2021-04-27 21:56 ` [PATCH v18 20/83] sg: sg_find_srp_by_id Douglas Gilbert
2021-04-27 21:56 ` [PATCH v18 21/83] sg: sg_fill_request_element Douglas Gilbert
2021-04-27 21:56 ` [PATCH v18 22/83] sg: printk change %p to %pK Douglas Gilbert
2021-04-27 21:56 ` [PATCH v18 23/83] sg: xarray for fds in device Douglas Gilbert
2021-04-27 21:56 ` [PATCH v18 24/83] sg: xarray for reqs in fd Douglas Gilbert
2021-04-27 21:56 ` [PATCH v18 25/83] sg: replace rq array with xarray Douglas Gilbert
2021-04-27 21:56 ` [PATCH v18 26/83] sg: sense buffer rework Douglas Gilbert
2021-04-27 21:56 ` [PATCH v18 27/83] sg: add sg v4 interface support Douglas Gilbert
2021-04-27 21:56 ` [PATCH v18 28/83] sg: rework debug info Douglas Gilbert
2021-04-27 21:56 ` [PATCH v18 29/83] sg: add 8 byte SCSI LUN to sg_scsi_id Douglas Gilbert
2021-04-27 21:56 ` [PATCH v18 30/83] sg: expand sg_comm_wr_t Douglas Gilbert
2021-04-27 21:56 ` [PATCH v18 31/83] sg: add sg_iosubmit_v3 and sg_ioreceive_v3 ioctls Douglas Gilbert
2021-04-27 21:56 ` [PATCH v18 32/83] sg: add some __must_hold macros Douglas Gilbert
2021-04-27 21:56 ` [PATCH v18 33/83] sg: move procfs objects to avoid forward decls Douglas Gilbert
2021-04-27 21:56 ` [PATCH v18 34/83] sg: protect multiple receivers Douglas Gilbert
2021-04-27 21:56 ` [PATCH v18 35/83] sg: first debugfs support Douglas Gilbert
2021-04-27 21:56 ` [PATCH v18 36/83] sg: rework mmap support Douglas Gilbert
2021-04-27 21:56 ` [PATCH v18 37/83] sg: defang allow_dio Douglas Gilbert
2021-04-27 21:56 ` [PATCH v18 38/83] sg: warn v3 write system call users Douglas Gilbert
2021-04-27 21:56 ` [PATCH v18 39/83] sg: add mmap_sz tracking Douglas Gilbert
2021-04-27 21:56 ` [PATCH v18 40/83] sg: remove rcv_done request state Douglas Gilbert
2021-04-27 21:56 ` [PATCH v18 41/83] sg: track lowest inactive and await indexes Douglas Gilbert
2021-04-27 21:56 ` [PATCH v18 42/83] sg: remove unit attention check for device changed Douglas Gilbert
2021-04-27 21:56 ` [PATCH v18 43/83] sg: no_dxfer: move to/from kernel buffers Douglas Gilbert
2021-04-28  7:07   ` Hannes Reinecke
2021-04-27 21:56 ` [PATCH v18 44/83] sg: add blk_poll support Douglas Gilbert
2021-04-27 21:56 ` [PATCH v18 45/83] sg: bump version to 4.0.12 Douglas Gilbert
2021-04-27 21:56 ` [PATCH v18 46/83] sg: add sg_ioabort ioctl Douglas Gilbert
2021-04-27 21:56 ` [PATCH v18 47/83] sg: add sg_set_get_extended ioctl Douglas Gilbert
2021-04-27 21:56 ` [PATCH v18 48/83] sg: sgat_elem_sz and sum_fd_dlens Douglas Gilbert
2021-04-27 21:56 ` [PATCH v18 49/83] sg: tag and more_async Douglas Gilbert
2021-04-27 21:57 ` [PATCH v18 50/83] sg: add fd sharing , change, unshare Douglas Gilbert
2021-04-27 21:57 ` [PATCH v18 51/83] sg: add shared requests Douglas Gilbert
2021-04-27 21:57 ` [PATCH v18 52/83] sg: add multiple request support Douglas Gilbert
2021-04-27 21:57 ` [PATCH v18 53/83] sg: rename some mrq variables Douglas Gilbert
2021-04-27 21:57 ` [PATCH v18 54/83] sg: unlikely likely Douglas Gilbert
2021-04-27 21:57 ` [PATCH v18 55/83] sg: mrq abort Douglas Gilbert
2021-04-27 21:57 ` [PATCH v18 56/83] sg: reduce atomic operations Douglas Gilbert
2021-04-27 21:57 ` [PATCH v18 57/83] sg: add excl_wait flag Douglas Gilbert
2021-04-27 21:57 ` [PATCH v18 58/83] sg: tweak sg_find_sfp_by_fd() Douglas Gilbert
2021-04-27 21:57 ` [PATCH v18 59/83] sg: add snap_dev flag and snapped in debugfs Douglas Gilbert
2021-04-27 21:57 ` [PATCH v18 60/83] sg: compress usercontext to uc Douglas Gilbert
2021-04-27 21:57 ` [PATCH v18 61/83] sg: optionally output sg_request.frq_bm flags Douglas Gilbert
2021-04-27 21:57 ` [PATCH v18 62/83] sg: work on sg_mrq_sanity() Douglas Gilbert
2021-04-27 21:57 ` [PATCH v18 63/83] sg: shared variable blocking Douglas Gilbert
2021-04-27 21:57 ` [PATCH v18 64/83] sg: device timestamp Douglas Gilbert
2021-04-27 21:57 ` [PATCH v18 65/83] sg: condition met is not an error Douglas Gilbert
2021-04-27 21:57 ` [PATCH v18 66/83] sg: split sg_setup_req Douglas Gilbert
2021-04-27 21:57 ` [PATCH v18 67/83] sg: finish after read-side request Douglas Gilbert
2021-04-27 21:57 ` [PATCH v18 68/83] sg: keep share and dout offset flags Douglas Gilbert
2021-04-27 21:57 ` [PATCH v18 69/83] sg: add dlen to sg_comm_wr_t Douglas Gilbert
2021-04-27 21:57 ` [PATCH v18 70/83] sg: make use of struct sg_mrq_hold Douglas Gilbert
2021-04-27 21:57 ` [PATCH v18 71/83] sg: add mmap IO option for mrq metadata Douglas Gilbert
2021-04-27 21:57 ` [PATCH v18 72/83] sg: add eventfd support Douglas Gilbert
2021-04-27 21:57 ` [PATCH v18 73/83] sg: table of error number explanations Douglas Gilbert
2021-04-27 21:57 ` [PATCH v18 74/83] sg: add ordered write flag Douglas Gilbert
2021-04-27 21:57 ` [PATCH v18 75/83] sg: expand source line length to 100 characters Douglas Gilbert
2021-04-27 21:57 ` [PATCH v18 76/83] sg: add no_attach_msg parameter Douglas Gilbert
2021-04-27 21:57 ` [PATCH v18 77/83] sg: add SGV4_FLAG_REC_ORDER Douglas Gilbert
2021-04-27 21:57 ` [PATCH v18 78/83] sg: max to read for mrq sg_ioreceive Douglas Gilbert
2021-04-27 21:57 ` [PATCH v18 79/83] sg: mrq: if uniform svb then re-use bio_s Douglas Gilbert
2021-04-27 21:57 ` [PATCH v18 80/83] sg: expand bvec usage; " Douglas Gilbert
2021-04-27 21:57 ` [PATCH v18 81/83] sg: blk_poll/hipri work for mrq Douglas Gilbert
2021-04-27 21:57 ` [PATCH v18 82/83] sg: pollable and non-pollable requests Douglas Gilbert
2021-04-27 21:57 ` [PATCH v18 83/83] sg: bump version to 4.0.47 Douglas Gilbert
