linux-scsi.vger.kernel.org archive mirror
* [RFC PATCH 00/13] scsi: ufs: Add HPB Support
@ 2020-05-15 10:30 Avri Altman
  2020-05-15 10:30 ` [RFC PATCH 01/13] scsi: ufs: Add HPB parameters Avri Altman
                   ` (13 more replies)
  0 siblings, 14 replies; 39+ messages in thread
From: Avri Altman @ 2020-05-15 10:30 UTC (permalink / raw)
  To: James E . J . Bottomley, Martin K . Petersen, linux-scsi, linux-kernel
  Cc: Bart Van Assche, alim.akhtar, asutoshd, Zang Leigang,
	Avi Shchislowski, Bean Huo, cang, stanley.chu,
	MOHAMMED RAFIQ KAMAL BASHA, Sang-yoon Oh, yongmyung lee,
	Jinyoung CHOI, Avri Altman

NAND flash-based storage devices need to translate the logical
addresses of IO requests to the corresponding physical addresses of
the flash storage.  As the internal SRAM of the storage device cannot
accommodate the entire L2P mapping table, the device can only partially
load the L2P data it needs, based on the incoming LBAs. As a result,
cache misses, which are more frequent in random read access, can result
in serious performance degradation.  To remedy the above, UFS3.1 offers
the Host Performance Booster (HPB) feature, which uses the host's system
memory as a cache for L2P map data. The basic concept of HPB is to
cache L2P mapping entries in host system memory so that both the
physical block address (PBA) and the logical block address (LBA) can be
delivered in the HPB read command.  Not to be confused with host-managed
FTL, HPB is merely about NAND flash cost reduction.
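
The idea can be sketched with a toy model in C (illustrative only: the
flat per-LBA table and the function names below are simplifications
assumed for the sketch; real HPB caches per-sub-region entry tables):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy host-side L2P cache: one slot per LBA for simplicity. */
#define CACHE_SLOTS 64

static uint64_t l2p_cache[CACHE_SLOTS];
static bool l2p_valid[CACHE_SLOTS];

/* The device pushes an L2P entry to the host cache (in real HPB, via an
 * HPB Read Buffer transfer of a sub-region's entries). */
static void l2p_cache_fill(uint32_t lba, uint64_t pba)
{
	l2p_cache[lba % CACHE_SLOTS] = pba;
	l2p_valid[lba % CACHE_SLOTS] = true;
}

/* Read path: on a hit the host can issue HPB_READ carrying both LBA and
 * PBA, sparing the device an internal L2P load; on a miss it falls back
 * to a regular READ. */
static bool l2p_cache_lookup(uint32_t lba, uint64_t *pba)
{
	if (!l2p_valid[lba % CACHE_SLOTS])
		return false;
	*pba = l2p_cache[lba % CACHE_SLOTS];
	return true;
}
```

On a miss the host simply issues a regular READ, so correctness never
depends on the cache contents, only performance does.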

HPB, by its nature, poses an interesting question regarding the proper
location of its components across the storage stack. On init, it needs
to detect the HPB capabilities and parameters, which can only be done
at the LLD level.  On the other hand, it needs to send scsi commands as
part of its cache management, which should preferably be done from the
scsi mid-layer, and to compose the "HPB_READ" command, which should be
done as part of the scsi command setup path.
Therefore, we came up with a two-module layout: ufshpb in the ufs
driver, and scsi_dh_ufshpb under the scsi device handler framework.

The ufshpb module bears two main duties. During initialization, it reads
the hpb configuration from the device. Later on, during the scan-host
phase, it attaches the scsi device and activates the scsi hpb device
handler.  The second duty mainly concerns supporting the HPB cache
management. The scsi hpb device handler will perform the cache
management and send the HPB_READ command. The two modules will
communicate via the standard device handler API: the handler_data opaque
pointer, and the set_params op.
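
That contract can be mimicked with toy types (not the kernel's actual
scsi_dh structures; all names here are made up for illustration):

```c
#include <assert.h>
#include <string.h>

/* Toy stand-ins for the two modules: the producer (the ufshpb side) owns
 * the state and hands it over through an opaque handler_data pointer; the
 * consumer (the device handler side) is driven through set_params. */
struct toy_handler {
	void *handler_data;	/* opaque pointer, as in scsi_dh */
	int (*set_params)(struct toy_handler *h, const char *params);
};

struct toy_hpb_state {
	int activated;
};

static int toy_set_params(struct toy_handler *h, const char *params)
{
	struct toy_hpb_state *s = h->handler_data;

	if (!s)
		return -1;
	/* a real handler would parse 'params'; one keyword is enough here */
	if (strcmp(params, "activate") == 0)
		s->activated = 1;
	return 0;
}
```

The point of the opaque pointer is that neither side needs the other's
internal types at compile time; only the module that allocates the state
ever dereferences it as its concrete type.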

This series has borrowed heavily from an HPB implementation that was
published as part of the Pixel3 code, authored by:
Yongmyung Lee <ymhungry.lee@samsung.com> and
Jinyoung Choi <j-young.choi@samsung.com>.

We kept some of its design and implementation details. We made some
minor modifications to adopt the latest spec changes (HPB1.0 was not
yet closed when the driver was initially published), and also divided
the implementation between the scsi handler and the ufs modules, instead
of the single module in the original driver, which simplified the
implementation a great deal and resulted in far less code. One more
big difference is that the Pixel3 driver supports device-managed mode,
while we are supporting host-managed mode, which reflects heavily on the
cache-management decision process.

Many thanks to Damien Le Moal for his insightful comments.


Avri Altman (13):
  scsi: ufs: Add HPB parameters
  scsi: ufshpb: Init part I - Read HPB config
  scsi: scsi_dh: Introduce scsi_dh_ufshpb
  scsi: ufs: ufshpb: Init part II - Attach scsi device
  scsi: ufs: ufshpb: Disable HPB if no HPB-enabled luns
  scsi: scsi_dh: ufshpb: Prepare for L2P cache management
  scsi: scsi_dh: ufshpb: Add ufshpb state machine
  scsi: dh: ufshpb: Activate pinned regions
  scsi: ufshpb: Add response API
  scsi: dh: ufshpb: Add ufshpb_set_params
  scsi: Allow device handler set their own CDB
  scsi: dh: ufshpb: Add prep_fn handler
  scsi: scsi_dh: ufshpb: Add "Cold" subregions timer

 drivers/scsi/device_handler/Makefile         |    1 +
 drivers/scsi/device_handler/scsi_dh_ufshpb.c | 1480 ++++++++++++++++++++++++++
 drivers/scsi/scsi_lib.c                      |    5 +-
 drivers/scsi/sd.c                            |    9 +
 drivers/scsi/ufs/Kconfig                     |   13 +
 drivers/scsi/ufs/Makefile                    |    1 +
 drivers/scsi/ufs/ufs.h                       |   16 +
 drivers/scsi/ufs/ufshcd.c                    |   12 +
 drivers/scsi/ufs/ufshcd.h                    |   11 +
 drivers/scsi/ufs/ufshpb.c                    |  610 +++++++++++
 drivers/scsi/ufs/ufshpb.h                    |   28 +
 include/scsi/scsi_dh_ufshpb.h                |   67 ++
 12 files changed, 2251 insertions(+), 2 deletions(-)
 create mode 100644 drivers/scsi/device_handler/scsi_dh_ufshpb.c
 create mode 100644 drivers/scsi/ufs/ufshpb.c
 create mode 100644 drivers/scsi/ufs/ufshpb.h
 create mode 100644 include/scsi/scsi_dh_ufshpb.h

-- 
2.7.4



* [RFC PATCH 01/13] scsi: ufs: Add HPB parameters
  2020-05-15 10:30 [RFC PATCH 00/13] scsi: ufs: Add HPB Support Avri Altman
@ 2020-05-15 10:30 ` Avri Altman
  2020-05-15 10:30 ` [RFC PATCH 02/13] scsi: ufshpb: Init part I - Read HPB config Avri Altman
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 39+ messages in thread
From: Avri Altman @ 2020-05-15 10:30 UTC (permalink / raw)
  To: James E . J . Bottomley, Martin K . Petersen, linux-scsi, linux-kernel
  Cc: Bart Van Assche, alim.akhtar, asutoshd, Zang Leigang,
	Avi Shchislowski, Bean Huo, cang, stanley.chu,
	MOHAMMED RAFIQ KAMAL BASHA, Sang-yoon Oh, yongmyung lee,
	Jinyoung CHOI, Avri Altman

A prep patch to introduce the HPB-related parameters that we will use
throughout.

Signed-off-by: Avri Altman <avri.altman@wdc.com>
---
 drivers/scsi/ufs/ufs.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/drivers/scsi/ufs/ufs.h b/drivers/scsi/ufs/ufs.h
index b313534..6428538 100644
--- a/drivers/scsi/ufs/ufs.h
+++ b/drivers/scsi/ufs/ufs.h
@@ -130,6 +130,11 @@ enum {
 	UPIU_QUERY_FUNC_STANDARD_WRITE_REQUEST          = 0x81,
 };
 
+/* Logical unit enable - to decode bLUEnable */
+enum {
+	UFS_LUN_HPB = 0x2,
+};
+
 /* Flag idn for Query Requests*/
 enum flag_idn {
 	QUERY_FLAG_IDN_FDEVICEINIT			= 0x01,
@@ -146,6 +151,7 @@ enum flag_idn {
 	QUERY_FLAG_IDN_WB_EN                            = 0x0E,
 	QUERY_FLAG_IDN_WB_BUFF_FLUSH_EN                 = 0x0F,
 	QUERY_FLAG_IDN_WB_BUFF_FLUSH_DURING_HIBERN8     = 0x10,
+	QUERY_FLAG_IDN_HPB_RESET                        = 0x11,
 };
 
 /* Attribute idn for Query requests */
@@ -229,6 +235,9 @@ enum unit_desc_param {
 	UNIT_DESC_PARAM_PHY_MEM_RSRC_CNT	= 0x18,
 	UNIT_DESC_PARAM_CTX_CAPABILITIES	= 0x20,
 	UNIT_DESC_PARAM_LARGE_UNIT_SIZE_M1	= 0x22,
+	UNIT_DESC_PARAM_MAX_ACTIVE_HPBREGIONS	= 0x23,
+	UNIT_DESC_PARAM_HPB_PINNED_OFFSET	= 0x25,
+	UNIT_DESC_PARAM_HPB_PINNED_COUNT	= 0x27,
 	UNIT_DESC_PARAM_WB_BUF_ALLOC_UNITS	= 0x29,
 };
 
@@ -269,6 +278,8 @@ enum device_desc_param {
 	DEVICE_DESC_PARAM_PSA_MAX_DATA		= 0x25,
 	DEVICE_DESC_PARAM_PSA_TMT		= 0x29,
 	DEVICE_DESC_PARAM_PRDCT_REV		= 0x2A,
+	DEVICE_DESC_PARAM_HPBVER		= 0x40,
+	DEVICE_DESC_PARAM_HPBCONTROL		= 0x42,
 	DEVICE_DESC_PARAM_EXT_UFS_FEATURE_SUP	= 0x4F,
 	DEVICE_DESC_PARAM_WB_PRESRV_USRSPC_EN	= 0x53,
 	DEVICE_DESC_PARAM_WB_TYPE		= 0x54,
@@ -317,6 +328,10 @@ enum geometry_desc_param {
 	GEOMETRY_DESC_PARAM_ENM4_MAX_NUM_UNITS	= 0x3E,
 	GEOMETRY_DESC_PARAM_ENM4_CAP_ADJ_FCTR	= 0x42,
 	GEOMETRY_DESC_PARAM_OPT_LOG_BLK_SIZE	= 0x44,
+	GEOMETRY_DESC_PARAM_HPB_REGION_SIZE	= 0x48,
+	GEOMETRY_DESC_PARAM_MAX_HPB_LUNS	= 0x49,
+	GEOMETRY_DESC_PARAM_HPB_SUBREGION_SIZE	= 0x4A,
+	GEOMETRY_DESC_PARAM_HPB_ACTIVE_REGIONS	= 0x4B,
 	GEOMETRY_DESC_PARAM_WB_MAX_ALLOC_UNITS	= 0x4F,
 	GEOMETRY_DESC_PARAM_WB_MAX_WB_LUNS	= 0x53,
 	GEOMETRY_DESC_PARAM_WB_BUFF_CAP_ADJ	= 0x54,
@@ -361,6 +376,7 @@ enum {
 
 /* Possible values for dExtendedUFSFeaturesSupport */
 enum {
+	UFS_FEATURE_HPB			= BIT(7),
 	UFS_DEV_WRITE_BOOSTER_SUP	= BIT(8),
 };
 
-- 
2.7.4



* [RFC PATCH 02/13] scsi: ufshpb: Init part I - Read HPB config
  2020-05-15 10:30 [RFC PATCH 00/13] scsi: ufs: Add HPB Support Avri Altman
  2020-05-15 10:30 ` [RFC PATCH 01/13] scsi: ufs: Add HPB parameters Avri Altman
@ 2020-05-15 10:30 ` Avri Altman
  2020-05-15 15:33   ` Randy Dunlap
                     ` (2 more replies)
  2020-05-15 10:30 ` [RFC PATCH 03/13] scsi: scsi_dh: Introduce scsi_dh_ufshpb Avri Altman
                   ` (11 subsequent siblings)
  13 siblings, 3 replies; 39+ messages in thread
From: Avri Altman @ 2020-05-15 10:30 UTC (permalink / raw)
  To: James E . J . Bottomley, Martin K . Petersen, linux-scsi, linux-kernel
  Cc: Bart Van Assche, alim.akhtar, asutoshd, Zang Leigang,
	Avi Shchislowski, Bean Huo, cang, stanley.chu,
	MOHAMMED RAFIQ KAMAL BASHA, Sang-yoon Oh, yongmyung lee,
	Jinyoung CHOI, Avri Altman

The ufshpb module bears two main duties.  During initialization, it
reads the hpb configuration from the device.  Later on, during the
scan-host phase, it attaches the scsi device and activates the scsi hpb
device handler.

The second duty mainly concerns the HPB cache management and will be
dealt with in future patches.

This patch introduces an API to carry out the first part of the
initialization: reading the HPB configuration.

At this point we introduce the key data structure of the ufshpb module -
ufshpb_lun.  It will bear some more duties in the future, so we will
elaborate on it as we go.

Signed-off-by: Avri Altman <avri.altman@wdc.com>
---
 drivers/scsi/ufs/Kconfig  |  12 ++
 drivers/scsi/ufs/Makefile |   1 +
 drivers/scsi/ufs/ufshcd.c |   5 +
 drivers/scsi/ufs/ufshcd.h |  10 ++
 drivers/scsi/ufs/ufshpb.c | 318 ++++++++++++++++++++++++++++++++++++++++++++++
 drivers/scsi/ufs/ufshpb.h |  22 ++++
 6 files changed, 368 insertions(+)
 create mode 100644 drivers/scsi/ufs/ufshpb.c
 create mode 100644 drivers/scsi/ufs/ufshpb.h

diff --git a/drivers/scsi/ufs/Kconfig b/drivers/scsi/ufs/Kconfig
index e2005ae..a540919 100644
--- a/drivers/scsi/ufs/Kconfig
+++ b/drivers/scsi/ufs/Kconfig
@@ -160,3 +160,15 @@ config SCSI_UFS_BSG
 
 	  Select this if you need a bsg device node for your UFS controller.
 	  If unsure, say N.
+
+config SCSI_UFS_HPB
+	bool "Support UFS Host Performance Booster (HPB)"
+	depends on SCSI_UFSHCD
+	help
+	  A UFS feature targeted to improve random read performance.  It uses
+	  the host's system memory as a cache for L2P map data, so that both
+	  physical block address (PBA) and logical block address (LBA) can be
+	  delivered in the HPB read command.
+
+	  Select this to enable this feature.
+	  If unsure, say N.
diff --git a/drivers/scsi/ufs/Makefile b/drivers/scsi/ufs/Makefile
index 94c6c5d..c38788a 100644
--- a/drivers/scsi/ufs/Makefile
+++ b/drivers/scsi/ufs/Makefile
@@ -12,3 +12,4 @@ obj-$(CONFIG_SCSI_UFSHCD_PLATFORM) += ufshcd-pltfrm.o
 obj-$(CONFIG_SCSI_UFS_HISI) += ufs-hisi.o
 obj-$(CONFIG_SCSI_UFS_MEDIATEK) += ufs-mediatek.o
 obj-$(CONFIG_SCSI_UFS_TI_J721E) += ti-j721e-ufs.o
+obj-$(CONFIG_SCSI_UFS_HPB) += ufshpb.o
diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 426073a..bffe699 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -50,6 +50,7 @@
 #include "ufs_bsg.h"
 #include <asm/unaligned.h>
 #include <linux/blkdev.h>
+#include "ufshpb.h"
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/ufs.h>
@@ -7341,6 +7342,9 @@ static int ufshcd_add_lus(struct ufs_hba *hba)
 		hba->clk_scaling.is_allowed = true;
 	}
 
+	if (ufshcd_is_hpb_supported(hba))
+		ufshpb_probe(hba);
+
 	ufs_bsg_probe(hba);
 	scsi_scan_host(hba->host);
 	pm_runtime_put_sync(hba->dev);
@@ -8608,6 +8612,7 @@ EXPORT_SYMBOL(ufshcd_shutdown);
 void ufshcd_remove(struct ufs_hba *hba)
 {
 	ufs_bsg_remove(hba);
+	ufshpb_remove(hba);
 	ufs_sysfs_remove_nodes(hba->dev);
 	blk_cleanup_queue(hba->tmf_queue);
 	blk_mq_free_tag_set(&hba->tmf_tag_set);
diff --git a/drivers/scsi/ufs/ufshcd.h b/drivers/scsi/ufs/ufshcd.h
index 23a434c..e7014a3 100644
--- a/drivers/scsi/ufs/ufshcd.h
+++ b/drivers/scsi/ufs/ufshcd.h
@@ -564,6 +564,11 @@ enum ufshcd_caps {
 	 * provisioned to be used. This would increase the write performance.
 	 */
 	UFSHCD_CAP_WB_EN				= 1 << 7,
+
+	/*
+	 * This capability indicates that the platform supports HPB
+	 */
+	UFSHCD_CAP_HPB				= 1 << 8,
 };
 
 /**
@@ -791,6 +796,11 @@ static inline bool ufshcd_is_wb_allowed(struct ufs_hba *hba)
 	return hba->caps & UFSHCD_CAP_WB_EN;
 }
 
+static inline bool ufshcd_is_hpb_supported(struct ufs_hba *hba)
+{
+	return hba->caps & UFSHCD_CAP_HPB;
+}
+
 #define ufshcd_writel(hba, val, reg)	\
 	writel((val), (hba)->mmio_base + (reg))
 #define ufshcd_readl(hba, reg)	\
diff --git a/drivers/scsi/ufs/ufshpb.c b/drivers/scsi/ufs/ufshpb.c
new file mode 100644
index 0000000..181917f
--- /dev/null
+++ b/drivers/scsi/ufs/ufshpb.c
@@ -0,0 +1,318 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Host Performance Booster(HPB)
+ * Copyright (C) 2017-2018 Samsung Electronics Co., Ltd.
+ * Copyright (C) 2018, Google, Inc.
+ * Copyright (C) 2020 Western Digital Corporation or its affiliates.
+ */
+
+#include <asm/unaligned.h>
+#include "ufshcd.h"
+#include "ufs.h"
+#include "ufshpb.h"
+
+#define UFSHPB_SPEC_VER (0x310)
+#define UFSHPB_MAX_LUNS (0x20)
+#define DEV_DESC_HPB_SIZE (0x53)
+#define GEO_DESC_HPB_SIZE (0x4D)
+#define UNIT_DESC_HPB_SIZE (0x29)
+
+enum ufshpb_control_modes {
+	UFSHPB_HOST_CONTROL,
+	UFSHPB_DEVICE_CONTROL
+};
+
+/**
+ * struct ufshpb_lun_conf - to hold hpb lun config from unit descriptor
+ * @lun - lun id
+ * @size - lun size = 2^bLogicalBlockSize x qLogicalBlockCount
+ * @max_active_regions - Maximum Number of Active HPB Regions for that LU
+ * @pinned_starting_idx - HPB Pinned Region Start Offset
+ * @pinned_count - Number of HPB Pinned Regions
+ */
+struct ufshpb_lun_config {
+	u8 lun;
+	u64 size;
+	u8 hpb_lun_idx;
+	u16 max_active_regions;
+	u16 pinned_starting_idx;
+	u16 pinned_count;
+};
+
+/**
+ * struct ufshpb_config - to hold hpb config supported by the device
+ * @version - HPB Specification Version
+ * @mode - hpb control mode
+ * @region_size - HPB Region size[B] = 512B x 2^bHPBRegionSize
+ * @max_luns - Maximum number of HPB LU supported by the device <= 0x20
+ * @subregion_size - HPB Sub-Region size[B] = 512B x 2^bHPBSubRegionSize
+ * @max_active_regions - Maximum number of Active HPB Regions
+ * @num_hpb_luns - hpb luns that are configured in the device
+ * @hpb_luns - to hold hpb lun config from unit descriptor
+ */
+struct ufshpb_config {
+	u16 version;
+	u8 mode;
+	u8 region_size;
+	u8 max_luns;
+	u8 subregion_size;
+	u16 max_active_regions;
+	u8 num_hpb_luns;
+};
+
+struct ufshpb_lun {
+	u8 lun;
+};
+
+
+struct ufshpb_config *ufshpb_conf;
+struct ufshpb_lun_config *ufshpb_luns_conf;
+struct ufshpb_lun *ufshpb_luns;
+static unsigned long ufshpb_lun_map[BITS_TO_LONGS(UFSHPB_MAX_LUNS)];
+static u8 ufshpb_lun_lookup[UFSHPB_MAX_LUNS];
+
+/**
+ * ufshpb_remove - ufshpb cleanup
+ *
+ * Should be called when unloading the driver.
+ */
+void ufshpb_remove(struct ufs_hba *hba)
+{
+	kfree(ufshpb_conf);
+	kfree(ufshpb_luns_conf);
+	kfree(ufshpb_luns);
+	ufshcd_query_flag(hba, UPIU_QUERY_OPCODE_SET_FLAG,
+			  QUERY_FLAG_IDN_HPB_RESET, 0, NULL);
+}
+
+static int ufshpb_hpb_init(void)
+{
+	u8 num_hpb_luns = ufshpb_conf->num_hpb_luns;
+	int i;
+
+	ufshpb_luns = kcalloc(num_hpb_luns, sizeof(*ufshpb_luns), GFP_KERNEL);
+	if (!ufshpb_luns)
+		return -ENOMEM;
+
+	for (i = 0; i < num_hpb_luns; i++) {
+		struct ufshpb_lun *hpb = ufshpb_luns + i;
+
+		hpb->lun = (ufshpb_luns_conf + i)->lun;
+	}
+
+	return 0;
+}
+
+static inline bool ufshpb_desc_proper_sizes(struct ufs_hba *hba)
+{
+	return hba->desc_size.dev_desc >= DEV_DESC_HPB_SIZE &&
+		hba->desc_size.geom_desc >= GEO_DESC_HPB_SIZE &&
+		hba->desc_size.unit_desc >= UNIT_DESC_HPB_SIZE;
+}
+
+static int ufshpb_get_geo_config(struct ufs_hba *hba)
+{
+	int ret;
+	u8 *desc_buf;
+
+	desc_buf = kzalloc(GEO_DESC_HPB_SIZE, GFP_KERNEL);
+	if (!desc_buf)
+		return -ENOMEM;
+
+	ret = ufshcd_read_desc_param(hba, QUERY_DESC_IDN_GEOMETRY, 0, 0,
+				     desc_buf, GEO_DESC_HPB_SIZE);
+	if (ret)
+		goto free;
+
+	ufshpb_conf->region_size =
+			desc_buf[GEOMETRY_DESC_PARAM_HPB_REGION_SIZE];
+	ufshpb_conf->subregion_size =
+			desc_buf[GEOMETRY_DESC_PARAM_HPB_SUBREGION_SIZE];
+	ufshpb_conf->max_luns = desc_buf[GEOMETRY_DESC_PARAM_MAX_HPB_LUNS];
+
+	/* None of HPB geometrical parameters should be 0 */
+	if (!ufshpb_conf->max_luns || !ufshpb_conf->region_size ||
+	    !ufshpb_conf->subregion_size) {
+		ret = -EINVAL;
+		goto free;
+	}
+
+	ret = 0;
+
+free:
+	kfree(desc_buf);
+	return ret;
+}
+
+static int ufshpb_get_unit_config(struct ufs_hba *hba, u8 dev_num_luns)
+{
+	int ret;
+	struct ufshpb_lun_config *luns_conf;
+	u8 *desc_buf;
+	u8 num_hpb_luns = 0;
+	u16 max_active_regions = 0;
+	int i;
+
+	luns_conf = kcalloc(ufshpb_conf->max_luns, sizeof(*luns_conf),
+			    GFP_KERNEL);
+	if (!luns_conf)
+		return -ENOMEM;
+
+	desc_buf = kzalloc(UNIT_DESC_HPB_SIZE, GFP_KERNEL);
+	if (!desc_buf) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	/* iterate over all unit descriptors, look for hpb luns */
+	for (i = 0; i < dev_num_luns; i++) {
+		struct ufshpb_lun_config *lun_conf;
+		u64 block_count;
+		u8 block_size;
+
+		memset(desc_buf, 0x0, UNIT_DESC_HPB_SIZE);
+
+		ret = ufshcd_read_desc_param(hba, QUERY_DESC_IDN_UNIT, i, 0,
+					     desc_buf, UNIT_DESC_HPB_SIZE);
+		if (ret)
+			goto out;
+
+		if (desc_buf[UNIT_DESC_PARAM_LU_ENABLE] != UFS_LUN_HPB)
+			continue;
+
+		lun_conf = luns_conf + num_hpb_luns;
+
+		/* mark the hpb lun for future lookup */
+		ufshpb_lun_lookup[i] = num_hpb_luns;
+		lun_conf->lun = i;
+		lun_conf->hpb_lun_idx = num_hpb_luns;
+		set_bit(i, ufshpb_lun_map);
+		num_hpb_luns++;
+
+		block_count = get_unaligned_be64(
+			&desc_buf[UNIT_DESC_PARAM_LOGICAL_BLK_COUNT]);
+		block_size = desc_buf[UNIT_DESC_PARAM_LOGICAL_BLK_SIZE];
+		lun_conf->size = BIT(block_size) * block_count;
+
+		/* get ufshpb params */
+		lun_conf->max_active_regions = get_unaligned_be16(
+			&desc_buf[UNIT_DESC_PARAM_MAX_ACTIVE_HPBREGIONS]);
+
+		/*
+		 * wDeviceMaxActiveHPBRegions is redundant. Suffices to keep
+		 * track of sum of wLUMaxActiveHPBRegions over the luns.
+		 */
+		max_active_regions += lun_conf->max_active_regions;
+
+		lun_conf->pinned_count = get_unaligned_be16(
+			&desc_buf[UNIT_DESC_PARAM_HPB_PINNED_COUNT]);
+		if (lun_conf->pinned_count)
+			lun_conf->pinned_starting_idx = get_unaligned_be16(
+				&desc_buf[UNIT_DESC_PARAM_HPB_PINNED_OFFSET]);
+
+		/* there can be up to bHPBNumberLU HPB luns */
+		if (num_hpb_luns == ufshpb_conf->max_luns)
+			break;
+	}
+
+	if (!num_hpb_luns || !max_active_regions) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	ufshpb_conf->num_hpb_luns = num_hpb_luns;
+	ufshpb_conf->max_active_regions = max_active_regions;
+	ufshpb_luns_conf = luns_conf;
+
+	ret = 0;
+
+out:
+	kfree(desc_buf);
+	if (ret)
+		kfree(luns_conf);
+	return ret;
+}
+
+/**
+ * ufshpb_probe - read hpb config from the device
+ * @hba: per adapter object
+ *
+ * Called during initial loading of the driver, and before scsi_scan_host.
+ */
+int ufshpb_probe(struct ufs_hba *hba)
+{
+	int ret = -EINVAL;
+	u16 spec_ver = 0;
+	u8 dev_num_luns = 0;
+	u8 *dev_desc = NULL;
+
+	/*
+	 * If the HPB database already exists, that's fine, leave it.
+	 * The device will signal should it need to be reset.
+	 */
+	if (ufshpb_luns)
+		return 0;
+
+	if (!ufshpb_desc_proper_sizes(hba))
+		goto out;
+
+	dev_desc = kzalloc(DEV_DESC_HPB_SIZE, GFP_KERNEL);
+	if (!dev_desc) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	if (ufshcd_read_desc_param(hba, QUERY_DESC_IDN_DEVICE, 0, 0, dev_desc,
+				   DEV_DESC_HPB_SIZE))
+		goto out;
+
+	spec_ver = get_unaligned_be16(&dev_desc[DEVICE_DESC_PARAM_SPEC_VER]);
+	if (spec_ver < UFSHPB_SPEC_VER)
+		goto out;
+
+	if (!(dev_desc[DEVICE_DESC_PARAM_UFS_FEAT] & UFS_FEATURE_HPB))
+		goto out;
+
+	ufshpb_conf = kzalloc(sizeof(*ufshpb_conf), GFP_KERNEL);
+	if (!ufshpb_conf) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	ufshpb_conf->version =
+		get_unaligned_be16(&dev_desc[DEVICE_DESC_PARAM_HPBVER]);
+	ufshpb_conf->mode = dev_desc[DEVICE_DESC_PARAM_HPBCONTROL];
+
+	/* this driver supports only host control mode */
+	if (ufshpb_conf->mode != UFSHPB_HOST_CONTROL)
+		goto out;
+
+	/* bNumberLU does not include wluns which can't be hpb luns */
+	dev_num_luns = dev_desc[DEVICE_DESC_PARAM_NUM_LU];
+
+	ret = ufshpb_get_geo_config(hba);
+	if (ret)
+		goto out;
+
+	ret = ufshpb_get_unit_config(hba, dev_num_luns);
+	if (ret)
+		goto out;
+
+	ret = ufshpb_hpb_init();
+	if (ret)
+		goto out;
+
+out:
+	kfree(dev_desc);
+	if (ret) {
+		dev_err(hba->dev, "err %d while probing HPB\n", ret);
+		hba->caps &= ~UFSHCD_CAP_HPB;
+		ufshpb_remove(hba);
+	}
+
+	return ret;
+}
+
+MODULE_AUTHOR("Avri Altman <avri.altman@wdc.com>");
+MODULE_DESCRIPTION("UFS Host Performance Booster Driver");
+MODULE_LICENSE("GPL v2");
diff --git a/drivers/scsi/ufs/ufshpb.h b/drivers/scsi/ufs/ufshpb.h
new file mode 100644
index 0000000..ee990f4
--- /dev/null
+++ b/drivers/scsi/ufs/ufshpb.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2017-2018 Samsung Electronics Co., Ltd.
+ * Copyright (C) 2018, Google, Inc.
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ */
+#ifndef _UFSHPB_H
+#define _UFSHPB_H
+
+#include <linux/types.h>
+
+struct ufs_hba;
+
+#ifdef CONFIG_SCSI_UFS_HPB
+void ufshpb_remove(struct ufs_hba *hba);
+int ufshpb_probe(struct ufs_hba *hba);
+#else
+static inline void ufshpb_remove(struct ufs_hba *hba) {}
+static inline int ufshpb_probe(struct ufs_hba *hba) { return 0; }
+#endif
+
+#endif /* _UFSHPB_H */
-- 
2.7.4



* [RFC PATCH 03/13] scsi: scsi_dh: Introduce scsi_dh_ufshpb
  2020-05-15 10:30 [RFC PATCH 00/13] scsi: ufs: Add HPB Support Avri Altman
  2020-05-15 10:30 ` [RFC PATCH 01/13] scsi: ufs: Add HPB parameters Avri Altman
  2020-05-15 10:30 ` [RFC PATCH 02/13] scsi: ufshpb: Init part I - Read HPB config Avri Altman
@ 2020-05-15 10:30 ` Avri Altman
  2020-05-16  1:48   ` Bart Van Assche
  2020-05-15 10:30 ` [RFC PATCH 04/13] scsi: ufs: ufshpb: Init part II - Attach scsi device Avri Altman
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 39+ messages in thread
From: Avri Altman @ 2020-05-15 10:30 UTC (permalink / raw)
  To: James E . J . Bottomley, Martin K . Petersen, linux-scsi, linux-kernel
  Cc: Bart Van Assche, alim.akhtar, asutoshd, Zang Leigang,
	Avi Shchislowski, Bean Huo, cang, stanley.chu,
	MOHAMMED RAFIQ KAMAL BASHA, Sang-yoon Oh, yongmyung lee,
	Jinyoung CHOI, Avri Altman

Add a scsi device handler companion to support ufs hpb.

The scsi device handler infrastructure was added to the kernel mainly to
facilitate multiple paths for storage devices.  The HPB framework,
although far from the original intention of the authors, might as well
fit in.  In that sense, using READs and HPB_READs intermittently can be
perceived as a form of multi-pathing.

Device handlers are meant to be called as part of the device mapper
multi-path code.  We can't do that: we need to attach and activate the
device handler manually, only after the ufs driver has verified that HPB
is supported by both the platform and the device.

At this point, just add the bare minimum to allow the ufs driver to
complete its second phase of initialization: attaching and activating
the device handler.  This module, however, will store the L2P cache,
perform the cache-management functionality, and set up the HPB_READ
command instead of READ. So it is about to grow extensively.

Signed-off-by: Avri Altman <avri.altman@wdc.com>
---
 drivers/scsi/device_handler/Makefile         |  1 +
 drivers/scsi/device_handler/scsi_dh_ufshpb.c | 73 ++++++++++++++++++++++++++++
 drivers/scsi/ufs/Kconfig                     |  1 +
 drivers/scsi/ufs/ufshpb.c                    | 39 +--------------
 include/scsi/scsi_dh_ufshpb.h                | 50 +++++++++++++++++++
 5 files changed, 126 insertions(+), 38 deletions(-)
 create mode 100644 drivers/scsi/device_handler/scsi_dh_ufshpb.c
 create mode 100644 include/scsi/scsi_dh_ufshpb.h

diff --git a/drivers/scsi/device_handler/Makefile b/drivers/scsi/device_handler/Makefile
index 0a603ae..6587622 100644
--- a/drivers/scsi/device_handler/Makefile
+++ b/drivers/scsi/device_handler/Makefile
@@ -6,3 +6,4 @@ obj-$(CONFIG_SCSI_DH_RDAC)	+= scsi_dh_rdac.o
 obj-$(CONFIG_SCSI_DH_HP_SW)	+= scsi_dh_hp_sw.o
 obj-$(CONFIG_SCSI_DH_EMC)	+= scsi_dh_emc.o
 obj-$(CONFIG_SCSI_DH_ALUA)	+= scsi_dh_alua.o
+obj-$(CONFIG_SCSI_UFS_HPB)	+= scsi_dh_ufshpb.o
diff --git a/drivers/scsi/device_handler/scsi_dh_ufshpb.c b/drivers/scsi/device_handler/scsi_dh_ufshpb.c
new file mode 100644
index 0000000..f26208c
--- /dev/null
+++ b/drivers/scsi/device_handler/scsi_dh_ufshpb.c
@@ -0,0 +1,73 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Host Performance Booster(HPB) for UFS devices
+ *
+ * Copyright (C) 2017-2018 Samsung Electronics Co., Ltd.
+ * Copyright (C) 2018, Google, Inc.
+ * Copyright (C) 2020 Western Digital Corporation or its affiliates.
+ */
+
+#include <linux/slab.h>
+#include <linux/module.h>
+#include <scsi/scsi.h>
+#include <scsi/scsi_dbg.h>
+#include <scsi/scsi_eh.h>
+#include <scsi/scsi_dh.h>
+#include <scsi/scsi_dh_ufshpb.h>
+
+#define UFSHPB_NAME	"ufshpb"
+
+struct ufshpb_dh_data {
+	const struct ufshpb_config *ufshpb_conf;
+	const struct ufshpb_lun_config *ufshpb_lun_conf;
+};
+
+static int ufshpb_attach(struct scsi_device *sdev)
+{
+	struct ufshpb_dh_data *h;
+
+	h = kzalloc(sizeof(*h), GFP_KERNEL);
+	if (!h)
+		return SCSI_DH_NOMEM;
+
+	sdev_printk(KERN_INFO, sdev, "%s: attached to sdev (lun) %llu\n",
+		    UFSHPB_NAME, sdev->lun);
+
+	sdev->handler_data = h;
+
+	return SCSI_DH_OK;
+}
+
+/*
+ * scsi_dh_detach will be called from the ufs driver upon failure and on
+ * remove.
+ */
+static void ufshpb_detach(struct scsi_device *sdev)
+{
+	kfree(sdev->handler_data);
+	sdev->handler_data = NULL;
+}
+
+static struct scsi_device_handler ufshpb_dh = {
+	.name		= UFSHPB_NAME,
+	.module		= THIS_MODULE,
+	.attach		= ufshpb_attach,
+	.detach		= ufshpb_detach,
+};
+
+static int __init ufshpb_init(void)
+{
+	return scsi_register_device_handler(&ufshpb_dh);
+}
+
+static void __exit ufshpb_exit(void)
+{
+	scsi_unregister_device_handler(&ufshpb_dh);
+}
+
+module_init(ufshpb_init);
+module_exit(ufshpb_exit);
+
+MODULE_DESCRIPTION("ufshpb scsi device handler driver");
+MODULE_AUTHOR("Avri Altman <avri.altman@wdc.com>");
+MODULE_LICENSE("GPL v2");
diff --git a/drivers/scsi/ufs/Kconfig b/drivers/scsi/ufs/Kconfig
index a540919..0868d53 100644
--- a/drivers/scsi/ufs/Kconfig
+++ b/drivers/scsi/ufs/Kconfig
@@ -164,6 +164,7 @@ config SCSI_UFS_BSG
 config SCSI_UFS_HPB
 	bool "Support UFS Host Performance Booster (HPB)"
 	depends on SCSI_UFSHCD
+	select SCSI_DH
 	help
 	  A UFS feature targeted to improve random read performance.  It uses
 	  the host's system memory as a cache for L2P map data, so that both
diff --git a/drivers/scsi/ufs/ufshpb.c b/drivers/scsi/ufs/ufshpb.c
index 181917f..e94e699 100644
--- a/drivers/scsi/ufs/ufshpb.c
+++ b/drivers/scsi/ufs/ufshpb.c
@@ -7,6 +7,7 @@
  */
 
 #include <asm/unaligned.h>
+#include <scsi/scsi_dh_ufshpb.h>
 #include "ufshcd.h"
 #include "ufs.h"
 #include "ufshpb.h"
@@ -22,44 +23,6 @@ enum ufshpb_control_modes {
 	UFSHPB_DEVICE_CONTROL
 };
 
-/**
- * struct ufshpb_lun_conf - to hold hpb lun config from unit descriptor
- * @lun - lun id
- * @size - lun size = 2^bLogicalBlockSize x qLogicalBlockCount
- * @max_active_regions - Maximum Number of Active HPB Regions for that LU
- * @pinned_starting_idx - HPB Pinned Region Start Offset
- * @pinned_count - Number of HPB Pinned Regions
- */
-struct ufshpb_lun_config {
-	u8 lun;
-	u64 size;
-	u8 hpb_lun_idx;
-	u16 max_active_regions;
-	u16 pinned_starting_idx;
-	u16 pinned_count;
-};
-
-/**
- * struct ufshpb_config - to hold hpb config supported by the device
- * @version - HPB Specification Version
- * @mode - hpb control mode
- * @region_size - HPB Region size[B] = 512B x 2^bHPBRegionSize
- * @max_luns - Maximum number of HPB LU supported by the device <= 0x20
- * @subregion_size - HPB Sub-Region size[B] = 512B x 2^bHPBSubRegionSize
- * @max_active_regions - Maximum number of Active HPB Regions
- * @num_hpb_luns - hpb luns that are configured in the device
- * @hpb_luns - to hold hpb lun config from unit descriptor
- */
-struct ufshpb_config {
-	u16 version;
-	u8 mode;
-	u8 region_size;
-	u8 max_luns;
-	u8 subregion_size;
-	u16 max_active_regions;
-	u8 num_hpb_luns;
-};
-
 struct ufshpb_lun {
 	u8 lun;
 };
diff --git a/include/scsi/scsi_dh_ufshpb.h b/include/scsi/scsi_dh_ufshpb.h
new file mode 100644
index 0000000..3dcb442
--- /dev/null
+++ b/include/scsi/scsi_dh_ufshpb.h
@@ -0,0 +1,50 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Header file for UFS HPB device handler
+ *
+ * Copyright (C) 2020 Western Digital Corporation or its affiliates.
+ */
+#ifndef _SCSI_DH_UFSHPB_H
+#define _SCSI_DH_UFSHPB_H
+
+#include <scsi/scsi_device.h>
+
+/**
+ * struct ufshpb_lun_conf - to hold hpb lun config from unit descriptor
+ * @lun - lun id
+ * @size - lun size = 2^bLogicalBlockSize x qLogicalBlockCount
+ * @max_active_regions - Maximum Number of Active HPB Regions for that LU
+ * @pinned_starting_idx - HPB Pinned Region Start Offset
+ * @pinned_count - Number of HPB Pinned Regions
+ */
+struct ufshpb_lun_config {
+	u8 lun;
+	u64 size;
+	u8 hpb_lun_idx;
+	u16 max_active_regions;
+	u16 pinned_starting_idx;
+	u16 pinned_count;
+};
+
+/**
+ * struct ufshpb_config - to hold hpb config supported by the device
+ * @version - HPB Specification Version
+ * @mode - hpb control mode
+ * @region_size - HPB Region size[B] = 512B x 2^bHPBRegionSize
+ * @max_luns - Maximum number of HPB LU supported by the device <= 0x20
+ * @subregion_size - HPB Sub-Region size[B] = 512B x 2^bHPBSubRegionSize
+ * @max_active_regions - Maximum number of Active HPB Regions
+ * @num_hpb_luns - hpb luns that are configured in the device
+ * @hpb_luns - to hold hpb lun config from unit descriptor
+ */
+struct ufshpb_config {
+	u16 version;
+	u8 mode;
+	u8 region_size;
+	u8 max_luns;
+	u8 subregion_size;
+	u16 max_active_regions;
+	u8 num_hpb_luns;
+};
+#endif /* _SCSI_DH_UFSHPB_H */
+
-- 
2.7.4



* [RFC PATCH 04/13] scsi: ufs: ufshpb: Init part II - Attach scsi device
  2020-05-15 10:30 [RFC PATCH 00/13] scsi: ufs: Add HPB Support Avri Altman
                   ` (2 preceding siblings ...)
  2020-05-15 10:30 ` [RFC PATCH 03/13] scsi: scsi_dh: Introduce scsi_dh_ufshpb Avri Altman
@ 2020-05-15 10:30 ` Avri Altman
  2020-05-16  1:52   ` Bart Van Assche
  2020-05-15 10:30 ` [RFC PATCH 05/13] scsi: ufs: ufshpb: Disable HPB if no HPB-enabled luns Avri Altman
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 39+ messages in thread
From: Avri Altman @ 2020-05-15 10:30 UTC (permalink / raw)
  To: James E . J . Bottomley, Martin K . Petersen, linux-scsi, linux-kernel
  Cc: Bart Van Assche, alim.akhtar, asutoshd, Zang Leigang,
	Avi Shchislowski, Bean Huo, cang, stanley.chu,
	MOHAMMED RAFIQ KAMAL BASHA, Sang-yoon Oh, yongmyung lee,
	Jinyoung CHOI, Avri Altman

The ufs boot process is essentially composed of two parts: first a
handshake with the device, and then scsi scans and assigns a scsi device
to each lun.  The latter, although running asynchronously, happens right
after reading the device configuration, lun by lun.

By now we've read the device HPB configuration, and we are ready to
attach a scsi device to our HPB luns.  A perfect time might be while
scsi is performing its .slave_alloc() or .slave_configure().
Signed-off-by: Avri Altman <avri.altman@wdc.com>
---
 drivers/scsi/ufs/ufshcd.c |   3 ++
 drivers/scsi/ufs/ufshpb.c | 103 ++++++++++++++++++++++++++++++++++++++++++++++
 drivers/scsi/ufs/ufshpb.h |   3 ++
 3 files changed, 109 insertions(+)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index bffe699..c2011bf 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -4628,6 +4628,9 @@ static int ufshcd_slave_alloc(struct scsi_device *sdev)
 
 	ufshcd_get_lu_power_on_wp_status(hba, sdev);
 
+	if (ufshcd_is_hpb_supported(hba))
+		ufshpb_attach_sdev(hba, sdev);
+
 	return 0;
 }
 
diff --git a/drivers/scsi/ufs/ufshpb.c b/drivers/scsi/ufs/ufshpb.c
index e94e699..4a10e7b 100644
--- a/drivers/scsi/ufs/ufshpb.c
+++ b/drivers/scsi/ufs/ufshpb.c
@@ -7,7 +7,9 @@
  */
 
 #include <asm/unaligned.h>
+#include <scsi/scsi_dh.h>
 #include <scsi/scsi_dh_ufshpb.h>
+#include "../scsi_priv.h"
 #include "ufshcd.h"
 #include "ufs.h"
 #include "ufshpb.h"
@@ -25,6 +27,7 @@ enum ufshpb_control_modes {
 
 struct ufshpb_lun {
 	u8 lun;
+	struct scsi_device *sdev;
 };
 
 
@@ -34,6 +37,91 @@ struct ufshpb_lun *ufshpb_luns;
 static unsigned long ufshpb_lun_map[BITS_TO_LONGS(UFSHPB_MAX_LUNS)];
 static u8 ufshpb_lun_lookup[UFSHPB_MAX_LUNS];
 
+void ufshpb_remove_lun(u8 lun)
+{
+	struct ufshpb_lun *hpb;
+
+	if (!ufshpb_luns)
+		return;
+
+	hpb = ufshpb_luns + lun;
+	if (hpb->sdev && hpb->sdev->handler_data) {
+		if (scsi_device_get(hpb->sdev))
+			return;
+
+		scsi_dh_release_device(hpb->sdev);
+		scsi_device_put(hpb->sdev);
+	}
+}
+
+/**
+ * ufshpb_attach_sdev - attach and activate a hpb device handler
+ * @hba: per adapter object
+ * @sdev: scsi device that owns that handler
+ *
+ * Called during .slave_alloc(), and after ufshpb_probe() read the device hpb
+ * configuration.
+ */
+void ufshpb_attach_sdev(struct ufs_hba *hba, struct scsi_device *sdev)
+{
+	struct ufshpb_lun *hpb_lun;
+	struct ufshpb_dh_data {
+		struct ufshpb_config *c;
+		struct ufshpb_lun_config *lc;
+	} h;
+	int ret = -EINVAL;
+	u8 lun = sdev->lun;
+	u8 i;
+
+	/* ignore w-luns as those can't be hpb luns */
+	if (lun >= UFSHPB_MAX_LUNS)
+		return;
+
+	/* ignore non-hpb luns */
+	if (!test_bit(lun, ufshpb_lun_map))
+		return;
+
+	i = ufshpb_lun_lookup[lun];
+
+	if (sdev->handler && sdev->handler_data) {
+		dev_err(hba->dev, "trying to re-attach lun %d ?\n", lun);
+		goto out;
+	}
+
+	if (!ufshpb_luns) {
+		dev_err(hba->dev, "HPB was already removed\n");
+		goto out;
+	}
+
+	if (scsi_device_get(sdev)) {
+		dev_err(hba->dev, "failed to get sdev for lun %d\n", lun);
+		goto out;
+	}
+
+	hpb_lun = ufshpb_luns + i;
+	hpb_lun->sdev = sdev;
+
+	ret = scsi_dh_attach(sdev->request_queue, "ufshpb");
+	if (ret || !sdev->handler || !sdev->handler_data)
+		goto put_scsi_dev;
+
+	h.c = ufshpb_conf;
+	h.lc = ufshpb_luns_conf + i;
+
+	memcpy(sdev->handler_data, &h, sizeof(h));
+
+	ret = scsi_dh_activate(sdev->request_queue, NULL, NULL);
+
+put_scsi_dev:
+	scsi_device_put(sdev);
+
+out:
+	if (ret) {
+		dev_err(hba->dev, "attach sdev to HPB lun %d failed\n", lun);
+		ufshpb_remove_lun(i);
+	}
+}
+
 /**
  * ufshpb_remove - ufshpb cleanup
  *
@@ -41,9 +129,24 @@ static u8 ufshpb_lun_lookup[UFSHPB_MAX_LUNS];
  */
 void ufshpb_remove(struct ufs_hba *hba)
 {
+	if (!ufshpb_conf)
+		goto remove_hpb;
+
+	if (ufshpb_luns) {
+		unsigned int num_hpb_luns = ufshpb_conf->num_hpb_luns;
+		int i;
+
+		spin_lock(hba->host->host_lock);
+		for (i = 0; i < num_hpb_luns; i++)
+			ufshpb_remove_lun(i);
+		spin_unlock(hba->host->host_lock);
+	}
+
 	kfree(ufshpb_conf);
 	kfree(ufshpb_luns_conf);
 	kfree(ufshpb_luns);
+
+remove_hpb:
 	ufshcd_query_flag(hba, UPIU_QUERY_OPCODE_SET_FLAG,
 			  QUERY_FLAG_IDN_HPB_RESET, 0, NULL);
 }
diff --git a/drivers/scsi/ufs/ufshpb.h b/drivers/scsi/ufs/ufshpb.h
index ee990f4..276a749 100644
--- a/drivers/scsi/ufs/ufshpb.h
+++ b/drivers/scsi/ufs/ufshpb.h
@@ -14,9 +14,12 @@ struct ufs_hba;
 #ifdef CONFIG_SCSI_UFS_HPB
 void ufshpb_remove(struct ufs_hba *hba);
 int ufshpb_probe(struct ufs_hba *hba);
+void ufshpb_attach_sdev(struct ufs_hba *hba, struct scsi_device *sdev);
 #else
 static inline void ufshpb_remove(struct ufs_hba *hba) {}
 static inline int ufshpb_probe(struct ufs_hba *hba) { return 0; }
+static inline void
+ufshpb_attach_sdev(struct ufs_hba *hba, struct scsi_device *sdev) {}
 #endif
 
 #endif /* _UFSHPB_H */
-- 
2.7.4



* [RFC PATCH 05/13] scsi: ufs: ufshpb: Disable HPB if no HPB-enabled luns
  2020-05-15 10:30 [RFC PATCH 00/13] scsi: ufs: Add HPB Support Avri Altman
                   ` (3 preceding siblings ...)
  2020-05-15 10:30 ` [RFC PATCH 04/13] scsi: ufs: ufshpb: Init part II - Attach scsi device Avri Altman
@ 2020-05-15 10:30 ` Avri Altman
  2020-05-16  2:02   ` Bart Van Assche
  2020-05-15 10:30 ` [RFC PATCH 06/13] scsi: scsi_dh: ufshpb: Prepare for L2P cache management Avri Altman
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 39+ messages in thread
From: Avri Altman @ 2020-05-15 10:30 UTC (permalink / raw)
  To: James E . J . Bottomley, Martin K . Petersen, linux-scsi, linux-kernel
  Cc: Bart Van Assche, alim.akhtar, asutoshd, Zang Leigang,
	Avi Shchislowski, Bean Huo, cang, stanley.chu,
	MOHAMMED RAFIQ KAMAL BASHA, Sang-yoon Oh, yongmyung lee,
	Jinyoung CHOI, Avri Altman

We disable the feature altogether if reading the configuration fails,
if no HPB luns are configured, or if no active regions are configured.
Still, even after reading the configuration successfully, there may end
up being no HPB-enabled luns, e.g. if their pinned-region allocation
failed.

So, to avoid checking the UPIU responses for nothing, we verify that
there actually are HPB-enabled luns one minute after the configuration
read.  By then all the luns have been scanned and have initialized
their device handler.  If there are no HPB-enabled luns, the feature is
disabled altogether.
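
The one-shot check the delayed worker performs can be sketched in user
space as follows. This is illustrative only: `hpb_lun_state` and
`hpb_should_disable` are our names, not the kernel's, and the real code
walks the driver's `ufshpb_luns` array under its own locking.

```c
#include <stddef.h>
#include <stdbool.h>

/* Minimal stand-in for a per-lun record; handler_data is non-NULL once
 * the device handler has successfully attached to that lun. */
struct hpb_lun_state {
	void *handler_data;
};

/* Return true if HPB should be disabled: no lun ended up HPB-enabled
 * by the time the delayed work fires. */
bool hpb_should_disable(const struct hpb_lun_state *luns, unsigned int n)
{
	for (unsigned int i = 0; i < n; i++)
		if (luns[i].handler_data)
			return false;	/* at least one HPB-enabled lun */
	return true;
}
```

In the patch, the equivalent decision is taken 60 seconds after probe,
by which time all lun scans are expected to have completed.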

Signed-off-by: Avri Altman <avri.altman@wdc.com>
---
 drivers/scsi/ufs/ufshcd.h |  1 +
 drivers/scsi/ufs/ufshpb.c | 24 ++++++++++++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/drivers/scsi/ufs/ufshcd.h b/drivers/scsi/ufs/ufshcd.h
index e7014a3..19a5613 100644
--- a/drivers/scsi/ufs/ufshcd.h
+++ b/drivers/scsi/ufs/ufshcd.h
@@ -743,6 +743,7 @@ struct ufs_hba {
 	struct request_queue	*bsg_queue;
 	bool wb_buf_flush_enabled;
 	bool wb_enabled;
+	struct delayed_work hpb_disable_work;
 };
 
 /* Returns true if clocks can be gated. Otherwise false */
diff --git a/drivers/scsi/ufs/ufshpb.c b/drivers/scsi/ufs/ufshpb.c
index 4a10e7b..be926cb 100644
--- a/drivers/scsi/ufs/ufshpb.c
+++ b/drivers/scsi/ufs/ufshpb.c
@@ -299,6 +299,28 @@ static int ufshpb_get_unit_config(struct ufs_hba *hba, u8 dev_num_luns)
 	return ret;
 }
 
+static void ufshpb_disable_work(struct work_struct *work)
+{
+	struct ufs_hba *hba = container_of(work, struct ufs_hba,
+					   hpb_disable_work.work);
+
+	if (ufshpb_luns) {
+		unsigned int num_hpb_luns = ufshpb_conf->num_hpb_luns;
+		int i;
+
+		for (i = 0; i < num_hpb_luns; i++) {
+			struct ufshpb_lun *hpb = ufshpb_luns + i;
+
+			if (hpb->sdev && hpb->sdev->handler_data)
+				return;
+		}
+	}
+
+	hba->caps &= ~UFSHCD_CAP_HPB;
+	ufshpb_remove(hba);
+	dev_info(hba->dev, "HPB was removed - no active HPB luns\n");
+}
+
 /**
  * ufshpb_probe - read hpb config from the device
  * @hba: per adapter object
@@ -368,6 +390,8 @@ int ufshpb_probe(struct ufs_hba *hba)
 	if (ret)
 		goto out;
 
+	INIT_DELAYED_WORK(&hba->hpb_disable_work, ufshpb_disable_work);
+	schedule_delayed_work(&hba->hpb_disable_work, 60 * HZ);
 out:
 	kfree(dev_desc);
 	if (ret) {
-- 
2.7.4



* [RFC PATCH 06/13] scsi: scsi_dh: ufshpb: Prepare for L2P cache management
  2020-05-15 10:30 [RFC PATCH 00/13] scsi: ufs: Add HPB Support Avri Altman
                   ` (4 preceding siblings ...)
  2020-05-15 10:30 ` [RFC PATCH 05/13] scsi: ufs: ufshpb: Disable HPB if no HPB-enabled luns Avri Altman
@ 2020-05-15 10:30 ` Avri Altman
  2020-05-16  2:13   ` Bart Van Assche
  2020-05-15 10:30 ` [RFC PATCH 07/13] scsi: scsi_dh: ufshpb: Add ufshpb state machine Avri Altman
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 39+ messages in thread
From: Avri Altman @ 2020-05-15 10:30 UTC (permalink / raw)
  To: James E . J . Bottomley, Martin K . Petersen, linux-scsi, linux-kernel
  Cc: Bart Van Assche, alim.akhtar, asutoshd, Zang Leigang,
	Avi Shchislowski, Bean Huo, cang, stanley.chu,
	MOHAMMED RAFIQ KAMAL BASHA, Sang-yoon Oh, yongmyung lee,
	Jinyoung CHOI, Avri Altman

Preparing for L2P cache management, this patch introduces the memory
pool's supporting structures and constants.

The logical unit address space is divided into equally-sized slots
called regions. Each region is further divided into equally-sized
subregions.  Subregions whose L2P mapping entries are cached in host
system memory are called active subregions. An active region is a
region that includes at least one active subregion. An HPB entry
provides the physical block address of a specific LBA; the size of an
entry is 8 bytes. The HPB entries of all LBAs included in a subregion
are transferred with a single HPB_READ_BUFFER command.

Each lun is limited in the number of its active regions.  This
determines, on the one hand, the size of the L2P cache, and on the
other, the nature of its per-lun management.  The second critical
factor in how the L2P cache is managed is the fact that the subregion
is the atomic unit that can be activated and updated.

We also want to manage the L2P mapping using pages, because we want
to utilize the kernel's bio submission constructs should we need to
update the L2P database.

Each subregion is designated by a mapping context, which is an array
of pages, and those contexts are arranged as a linked list. We also map
all subregions using a regions/subregions table. Given the sequential
nature of LBAs, such a structure allows us to retrieve the applicable
HPB entry in O(1).
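
The geometry behind that O(1) lookup can be sketched as below, assuming
4KB logical blocks and the UFS descriptor encoding (size = 512B x 2^b,
matching the BIT(x + 9) derivations in ufshpb_get_config() of this
series). The struct and function names here are ours, for illustration
only.

```c
#include <stdint.h>

enum { SECTOR_SHIFT = 9, BLOCK_SHIFT = 12 };	/* 512B sectors, 4KB blocks */

struct hpb_geometry {
	uint64_t region_size;			/* bytes */
	uint64_t subregion_size;		/* bytes */
	unsigned int entries_per_subregion;	/* one 8B entry per 4KB block */
	unsigned int subregions_per_region;
};

/* Derive the constants from the raw descriptor fields, as the driver does */
struct hpb_geometry hpb_geometry(uint8_t bRegionSize, uint8_t bSubRegionSize)
{
	struct hpb_geometry g;

	g.region_size = 1ULL << (bRegionSize + SECTOR_SHIFT);
	g.subregion_size = 1ULL << (bSubRegionSize + SECTOR_SHIFT);
	g.entries_per_subregion =
		1U << (bSubRegionSize + SECTOR_SHIFT - BLOCK_SHIFT);
	g.subregions_per_region = 1U << (bRegionSize - bSubRegionSize);
	return g;
}

/* O(1): map a 4KB-block LBA to its region/subregion table indices */
void hpb_lba_to_index(const struct hpb_geometry *g, uint64_t lba,
		      unsigned int *region, unsigned int *subregion)
{
	uint64_t byte_addr = lba << BLOCK_SHIFT;

	*region = byte_addr / g->region_size;
	*subregion = (byte_addr % g->region_size) / g->subregion_size;
}
```

With, e.g., bRegionSize = 15 (16MiB regions) and bSubRegionSize = 12
(2MiB subregions), each region holds 8 subregions of 512 entries each.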

The actual page allocation takes place only upon subregion
activation.

Upon failure to allocate any of the memory pool components, HPB is
disabled for that lun.  This is accomplished by the ->detach() handler.
Another case in which HPB is disabled for a lun is failure to allocate
its pinned regions.

All those allocations and preparations are done as part of the handler
activation phase.

Signed-off-by: Avri Altman <avri.altman@wdc.com>
---
 drivers/scsi/device_handler/scsi_dh_ufshpb.c | 314 +++++++++++++++++++++++++++
 1 file changed, 314 insertions(+)

diff --git a/drivers/scsi/device_handler/scsi_dh_ufshpb.c b/drivers/scsi/device_handler/scsi_dh_ufshpb.c
index f26208c..9b9b338 100644
--- a/drivers/scsi/device_handler/scsi_dh_ufshpb.c
+++ b/drivers/scsi/device_handler/scsi_dh_ufshpb.c
@@ -17,11 +17,279 @@
 
 #define UFSHPB_NAME	"ufshpb"
 
+#define HPB_ENTRY_SIZE (8)
+
+/* subregion activation threshold - % of subregion size */
+#define ACTIVATION_READS (10)
+/* write/discard requests in blocks to consider a subregion as dirty */
+#define SET_AS_DIRTY (1)
+/* subregion inactivation threshold - % of subregion size */
+#define INACTIVATION_WRITES (30)
+/* to control the amount of HPB_READ_BUFFER commands */
+#define READS_READ_BUFFER_SENT ACTIVATION_READS
+/* not to hold on to "cold" subregions */
+#define READ_TIMEOUT_MSEC (500)
+/* allowed rewinds of the read_timeout for “clean” subregion */
+#define READ_TIMEOUT_EXPIRIES (100)
+/* region eviction threshold: 3 x subregions_per_region x MIN_READS */
+#define EVICTION_READS (3 * ACTIVATION_READS)
+
+/**
+ * struct map_ctx - a mapping context
+ * @pages - array of pages
+ * @list - to hang on lh_map_ctx
+ */
+struct ufshpb_map_ctx {
+	char *pages;
+	struct list_head list;
+};
+
+/**
+ * struct ufshpb_subregion
+ * @mctx - mapping context
+ * @hpb - lun
+ * @r - region in which included in
+ * @reads - cumulative read request in blocks
+ * @writes - cumulative write/discard requests in blocks
+ * @reads_last_trial - read request blocks since last activation trial
+ * @read_timeout_expiries - to rewind the read_timer for "clean" subregion
+ * @region - region index
+ * @subregion - subregion index
+ */
+struct ufshpb_subregion {
+	struct ufshpb_map_ctx *mctx;
+	struct ufshpb_dh_lun *hpb;
+	struct ufshpb_region *r;
+	atomic64_t reads;
+	atomic64_t writes;
+	atomic64_t reads_last_trial;
+	atomic_t read_timeout_expiries;
+	unsigned int region;
+	unsigned int subregion;
+};
+
+/**
+ * struct ufshpb_region
+ * @subregion_tbl - subregions column
+ * @hpb - lun
+ * @reads - sum over subregions @reads
+ * @writes - sum over subregions @writes
+ * @region - region index
+ * @active_subregions - actual active subregions
+ */
+struct ufshpb_region {
+	struct ufshpb_subregion *subregion_tbl;
+	struct ufshpb_dh_lun *hpb;
+	atomic64_t reads;
+	atomic64_t writes;
+	unsigned int region;
+
+	atomic_t active_subregions;
+};
+
+/**
+ * struct ufshpb_dh_lun - hpb lun
+ * @list - to hang on to the hpb luns list
+ * @lh_map_ctx - list head of mapping context
+ * @map_list_lock - to protect mapping context list operations
+ * @region_tbl - regions/subregions table
+ * @sdev - scsi device for that lun
+ * @regions_per_lun
+ * @subregions_per_lun - lun size is not guaranteed to be region aligned
+ * @last_subregion_size - 0 (actual size) if lun is (not) subregion aligned
+ * @max_active_regions - Max allowable active regions at any given time
+ * @active_regions - current active regions
+ */
+struct ufshpb_dh_lun {
+	struct list_head list;
+	struct list_head lh_map_ctx;
+	spinlock_t map_list_lock;
+	struct ufshpb_region *region_tbl;
+	struct scsi_device *sdev;
+
+	unsigned int regions_per_lun;
+	unsigned int subregions_per_lun;
+	unsigned int last_subregion_size;
+	unsigned int max_active_regions;
+
+	atomic_t active_regions;
+};
+
 struct ufshpb_dh_data {
 	const struct ufshpb_config *ufshpb_conf;
 	const struct ufshpb_lun_config *ufshpb_lun_conf;
+	/* keep those first - ufshpb lld expects it */
+	struct ufshpb_dh_lun *hpb;
 };
 
+static struct list_head lh_hpb_lunp;
+/* to support device level configuration */
+static bool is_configured;
+
+/* L2P constants */
+static unsigned int entries_per_region;
+static unsigned int entries_per_subregion;
+static unsigned char order;
+static unsigned long region_size;
+static unsigned long subregion_size;
+static unsigned int subregions_per_region;
+
+static int ufshpb_mempool_init(struct ufshpb_dh_lun *hpb)
+{
+	unsigned int max_active_subregions = hpb->max_active_regions *
+		subregions_per_region;
+	int i;
+
+	INIT_LIST_HEAD(&hpb->lh_map_ctx);
+	spin_lock_init(&hpb->map_list_lock);
+
+	for (i = 0 ; i < max_active_subregions; i++) {
+		struct ufshpb_map_ctx *mctx =
+			kzalloc(sizeof(struct ufshpb_map_ctx), GFP_KERNEL);
+
+		if (!mctx) {
+			/*
+			 * mctxs already added in lh_map_ctx will be removed in
+			 * detach
+			 */
+			return -ENOMEM;
+		}
+
+		/* actual page allocation is done upon subregion activation */
+
+		INIT_LIST_HEAD(&mctx->list);
+		list_add(&mctx->list, &hpb->lh_map_ctx);
+	}
+
+	return 0;
+
+}
+
+static int ufshpb_region_tbl_init(struct ufshpb_dh_lun *hpb)
+{
+	struct ufshpb_region *regions;
+	int i, j;
+
+	regions = kcalloc(hpb->regions_per_lun, sizeof(*regions), GFP_KERNEL);
+	if (!regions)
+		return -ENOMEM;
+
+	atomic_set(&hpb->active_regions, 0);
+
+	for (i = 0 ; i < hpb->regions_per_lun; i++) {
+		struct ufshpb_region *r = regions + i;
+		struct ufshpb_subregion *subregions;
+
+		subregions = kcalloc(subregions_per_region, sizeof(*subregions),
+				     GFP_KERNEL);
+		if (!subregions)
+			goto release_mem;
+
+		for (j = 0; j < subregions_per_region; j++) {
+			struct ufshpb_subregion *s = subregions + j;
+
+			s->hpb = hpb;
+			s->r = r;
+			s->region = i;
+			s->subregion = j;
+		}
+
+		r->subregion_tbl = subregions;
+		r->hpb = hpb;
+		r->region = i;
+	}
+
+	hpb->region_tbl = regions;
+
+	return 0;
+
+release_mem:
+	for (i = 0 ; i < hpb->regions_per_lun; i++) {
+		struct ufshpb_region *r = regions + i;
+
+		if (!r->subregion_tbl)
+			break;
+		kfree(r->subregion_tbl);
+	}
+
+	kfree(regions);
+	/* hpb is released in detach */
+	return -ENOMEM;
+}
+
+static int ufshpb_lun_configuration(struct scsi_device *sdev,
+				    struct ufshpb_dh_data *h)
+{
+	const struct ufshpb_lun_config *lun_conf = h->ufshpb_lun_conf;
+	struct ufshpb_dh_lun *hpb = NULL;
+	int ret;
+
+	if (h->hpb)
+		return -EBUSY;
+
+	hpb = kzalloc(sizeof(*hpb), GFP_KERNEL);
+	if (!hpb)
+		return -ENOMEM;
+
+	hpb->max_active_regions = lun_conf->max_active_regions;
+	hpb->regions_per_lun = DIV_ROUND_UP(lun_conf->size, region_size);
+	hpb->subregions_per_lun = DIV_ROUND_UP(lun_conf->size, subregion_size);
+	hpb->last_subregion_size = lun_conf->size % subregion_size;
+
+	ret = ufshpb_mempool_init(hpb);
+	if (ret)
+		goto out_free;
+
+	ret = ufshpb_region_tbl_init(hpb);
+	if (ret)
+		goto out_free;
+
+	INIT_LIST_HEAD(&hpb->list);
+	list_add(&hpb->list, &lh_hpb_lunp);
+
+	hpb->sdev = sdev;
+	h->hpb = hpb;
+
+out_free:
+	if (ret)
+		kfree(hpb);
+	return ret;
+}
+
+static void ufshpb_get_config(const struct ufshpb_config *c)
+{
+	/* get hpb constants */
+	region_size = BIT(c->region_size + 9);
+	subregion_size = BIT(c->subregion_size + 9);
+	subregions_per_region = BIT(c->region_size - c->subregion_size);
+	entries_per_region = BIT(c->region_size - 3);
+	entries_per_subregion = BIT(c->subregion_size - 3);
+	order = get_order(entries_per_subregion * HPB_ENTRY_SIZE);
+
+	INIT_LIST_HEAD(&lh_hpb_lunp);
+	is_configured = true;
+}
+
+static int ufshpb_activate(struct scsi_device *sdev, activate_complete fn,
+			   void *data)
+{
+	int ret;
+	struct ufshpb_dh_data *h = sdev->handler_data;
+
+	/* device level configuration - only at the 1st time */
+	if (!is_configured)
+		ufshpb_get_config(h->ufshpb_conf);
+
+	ret = ufshpb_lun_configuration(sdev, h);
+	if (ret)
+		return ret;
+
+	if (fn)
+		fn(data, ret);
+
+	return 0;
+}
+
 static int ufshpb_attach(struct scsi_device *sdev)
 {
 	struct ufshpb_dh_data *h;
@@ -38,12 +306,57 @@ static int ufshpb_attach(struct scsi_device *sdev)
 	return SCSI_DH_OK;
 }
 
+static void ufshpb_mempool_remove(struct ufshpb_dh_lun *hpb)
+{
+	struct ufshpb_map_ctx *mctx, *next;
+
+	if (!hpb)
+		return;
+
+	spin_lock(&hpb->map_list_lock);
+
+	list_for_each_entry_safe(mctx, next, &hpb->lh_map_ctx, list) {
+		list_del(&mctx->list);
+		kfree(mctx);
+	}
+
+	spin_unlock(&hpb->map_list_lock);
+}
+
+static void ufshpb_region_tbl_remove(struct ufshpb_dh_lun *hpb)
+{
+	struct ufshpb_region *regions;
+
+	if (!hpb)
+		return;
+
+	regions = hpb->region_tbl;
+	if (regions) {
+		int i;
+
+		for (i = 0 ; i < hpb->regions_per_lun; i++) {
+			struct ufshpb_region *r = regions + i;
+
+			kfree(r->subregion_tbl);
+		}
+
+		kfree(regions);
+	}
+
+	list_del(&hpb->list);
+	kfree(hpb);
+}
+
 /*
  * scsi_dh_detach will be called from the ufs driver upon failure and on
  * remove.
  */
 static void ufshpb_detach(struct scsi_device *sdev)
 {
+	struct ufshpb_dh_data *h = sdev->handler_data;
+
+	ufshpb_mempool_remove(h->hpb);
+	ufshpb_region_tbl_remove(h->hpb);
 	kfree(sdev->handler_data);
 	sdev->handler_data = NULL;
 }
@@ -53,6 +366,7 @@ static struct scsi_device_handler ufshpb_dh = {
 	.module		= THIS_MODULE,
 	.attach		= ufshpb_attach,
 	.detach		= ufshpb_detach,
+	.activate	= ufshpb_activate,
 };
 
 static int __init ufshpb_init(void)
-- 
2.7.4



* [RFC PATCH 07/13] scsi: scsi_dh: ufshpb: Add ufshpb state machine
  2020-05-15 10:30 [RFC PATCH 00/13] scsi: ufs: Add HPB Support Avri Altman
                   ` (5 preceding siblings ...)
  2020-05-15 10:30 ` [RFC PATCH 06/13] scsi: scsi_dh: ufshpb: Prepare for L2P cache management Avri Altman
@ 2020-05-15 10:30 ` Avri Altman
  2020-05-16  2:44   ` Bart Van Assche
  2020-05-15 10:30 ` [RFC PATCH 08/13] scsi: dh: ufshpb: Activate pinned regions Avri Altman
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 39+ messages in thread
From: Avri Altman @ 2020-05-15 10:30 UTC (permalink / raw)
  To: James E . J . Bottomley, Martin K . Petersen, linux-scsi, linux-kernel
  Cc: Bart Van Assche, alim.akhtar, asutoshd, Zang Leigang,
	Avi Shchislowski, Bean Huo, cang, stanley.chu,
	MOHAMMED RAFIQ KAMAL BASHA, Sang-yoon Oh, yongmyung lee,
	Jinyoung CHOI, Avri Altman

The cache management is mainly driven by two types of events: the
device hints on one hand, and read & write operations on the other.  In
host management mode, the host may choose not to follow those hints,
but we will.

Both regions and subregions can be either active or inactive. We
monitor the lun's regions/subregions state using a designated worker,
which gets scheduled on each event.

The region state is derived from the state of its subregions: it becomes
active once its first subregion becomes active, and inactive with its
last.

The device may send the host an activation signal for one of its
inactive subregions, provided that the region is active.  For an
inactive region, such an activation trial can only be the result of a
read request. We then try to allocate the required pages for that
subregion and update its L2P mapping by sending an HPB_READ_BUFFER
command.  Upon successful completion the subregion becomes active.

A third case in which we update a subregion's L2P cache is when the
device sends an update signal to the host for an active subregion.
This happens because the host is not entirely aware of the device's
in-house activities (GC, etc.).  We distinguish such a notification to
an active subregion as an "update" event, as opposed to an "activate"
event to an inactive subregion, but both derive from the same device
activation recommendation.  In that case the subregion changes its
state to inactive, but does not release its pages, nor does the region
change its active count. Upon successful completion of HPB_READ_BUFFER
the subregion returns to being active; otherwise it becomes inactive.

A subregion can become inactive if its write (dirty) count exceeds a
given threshold. In order not to hang on to "cold" subregions, we also
inactivate a subregion that has seen no read access for a predefined
amount of time.
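
The resulting per-subregion state machine can be condensed into the
sketch below. This is a simplification of the patch's worker logic:
the enum and function names are ours, `read_buffer_ok` stands in for
the completion status of HPB_READ_BUFFER, and the read/dirty threshold
checks that gate each event are assumed to have already passed.

```c
#include <stdbool.h>

enum hpb_state { HPB_INACTIVE, HPB_ACTIVE };

enum hpb_event {
	EV_ACTIVATE,	/* device hint for an inactive subregion */
	EV_UPDATE,	/* device hint for an active subregion (GC etc.) */
	EV_READ,	/* reads crossed the activation threshold */
	EV_DIRTY,	/* writes crossed the inactivation threshold */
	EV_TIMER,	/* no reads for the read-timeout period */
};

/* Next state after handling one event; activation/update paths depend
 * on whether the HPB_READ_BUFFER refresh succeeded. */
enum hpb_state hpb_next_state(enum hpb_state s, enum hpb_event e,
			      bool read_buffer_ok)
{
	switch (e) {
	case EV_ACTIVATE:
	case EV_READ:
	case EV_UPDATE:
		return read_buffer_ok ? HPB_ACTIVE : HPB_INACTIVE;
	case EV_DIRTY:
	case EV_TIMER:
		return HPB_INACTIVE;
	}
	return s;
}
```

Note the asymmetry described above: on EV_UPDATE the subregion's pages
are kept even while it is transiently inactive, awaiting the refresh.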

HPB_READ_BUFFER is used to update the subregion's L2P cache.  Upon its
successful completion the subregion is considered active.

In the allocation length field of the HPB_READ_BUFFER command we fill
entries_per_subregion x 8 bytes, regardless of the actual subregion
size.  For luns that are not subregion-aligned, the actual size of the
last subregion might be smaller. However, since no valid LBA can fall
in the trailing part of that subregion, and that part shall never be
accessed, it suffices to maintain a constant allocation length.
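
The constant allocation length and its big-endian placement in CDB
bytes 6..8 (mirroring ufshpb_set_read_buf_cmd() in this patch) can be
sketched as below; the function names are illustrative.

```c
#include <stdint.h>

/* entries_per_subregion x 8B, independent of the last subregion's
 * actual (possibly smaller) size. */
uint32_t hpb_read_buffer_alloc_len(unsigned int entries_per_subregion)
{
	return entries_per_subregion * 8;	/* HPB_ENTRY_SIZE */
}

/* Pack the 24-bit allocation length big-endian into CDB bytes 6..8 */
void hpb_pack_alloc_len(uint8_t *cdb, uint32_t alloc_len)
{
	cdb[6] = (alloc_len >> 16) & 0xff;
	cdb[7] = (alloc_len >> 8) & 0xff;
	cdb[8] = alloc_len & 0xff;
}
```

For example, 512 entries per subregion yields a fixed 4096-byte
allocation length (0x001000) in every HPB_READ_BUFFER for that lun.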

One last implementation point: calling scsi_execute with a command
opcode unknown to SCSI still works on older kernels. This is because
that flow trusts its users to fill the CDB unconditionally, and the
SCSI command setup is bypassed altogether.
This no longer works, as all of that code was removed last year in
commit a1ce35fa4985 ("block: remove dead elevator code").
We will most likely need to change this in our next versions.

Region eviction refers to the act of activating a region at the
expense of inactivating another region.  This can only occur when the
active region count is at its max and the system might benefit from
activating another region.  This act is a stranger to our basic concept
that region activation is derived from subregion activation, but it
entails a possible overall gain.
On region eviction, the leaving region is deactivated, and the entering
region becomes active by activating the subregion that caused the
eviction trial.
The region eviction work is triggered under the same conditions as
subregion activation - when a subregion belongs to an inactive region
and all other subregion activation restrictions apply.
Evicting an active region is an extreme operation, so the thresholds
for both the entering and the exiting region are quite high.  The
entering region's reads should cross the EVICTION_READS threshold, and
the exiting region should have less than half of that amount of reads.
We are aware that there can be many contender subregions trying to
evict someone simultaneously.  Thus we risk cyclic eviction: evicting a
region that has just entered by evicting another region.
That's fine, and we don't want to interfere with this process - just
let it play out: as "stronger" regions enter, the eviction process is
bound to converge.
Last but not least, the entering subregion should be clean, so that we
benefit from HPB_READing it, at least in the short term.
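
The eviction thresholds described above reduce to a small predicate,
sketched here with plain counters (the patch uses atomic64 counters and
scales thresholds by subregion size; `hpb_may_evict` is our name):

```c
#include <stdbool.h>

#define ACTIVATION_READS	10
#define EVICTION_READS		(3 * ACTIVATION_READS)

/* May the entering region evict the exiting one?  The entering
 * subregion must be clean, the entering region read-hot, and the
 * exiting region colder than half the eviction threshold. */
bool hpb_may_evict(long entering_reads, long entering_writes,
		   long exiting_reads)
{
	if (entering_writes > 0)		/* must be clean */
		return false;
	if (entering_reads < EVICTION_READS)	/* must be hot enough */
		return false;
	return exiting_reads < EVICTION_READS / 2;
}
```

Because many contenders may pass this predicate at once, cyclic
eviction is possible, as the commit message notes; the thresholds only
bias the process toward convergence rather than serializing it.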

Signed-off-by: Avri Altman <avri.altman@wdc.com>
---
 drivers/scsi/device_handler/scsi_dh_ufshpb.c | 521 ++++++++++++++++++++++++++-
 1 file changed, 520 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/device_handler/scsi_dh_ufshpb.c b/drivers/scsi/device_handler/scsi_dh_ufshpb.c
index 9b9b338..c94a616 100644
--- a/drivers/scsi/device_handler/scsi_dh_ufshpb.c
+++ b/drivers/scsi/device_handler/scsi_dh_ufshpb.c
@@ -7,6 +7,7 @@
  * Copyright (C) 2020 Western Digital Corporation or its affiliates.
  */
 
+#include <asm/unaligned.h>
 #include <linux/slab.h>
 #include <linux/module.h>
 #include <scsi/scsi.h>
@@ -17,6 +18,13 @@
 
 #define UFSHPB_NAME	"ufshpb"
 
+#define UFSHPB_WRITE_BUFFER (0xfa)
+#define WRITE_BUFFER_TIMEOUT (3 * HZ)
+#define WRITE_BUFFER_RETRIES (3)
+#define UFSHPB_READ_BUFFER (0xf9)
+#define READ_BUFFER_TIMEOUT (3 * HZ)
+#define READ_BUFFER_RETRIES (3)
+
 #define HPB_ENTRY_SIZE (8)
 
 /* subregion activation threshold - % of subregion size */
@@ -34,6 +42,32 @@
 /* region eviction threshold: 3 x subregions_per_region x MIN_READS */
 #define EVICTION_READS (3 * ACTIVATION_READS)
 
+#define to_subregion() (container_of(work, struct ufshpb_subregion, hpb_work))
+
+enum ufshpb_state {
+	HPB_STATE_INACTIVE,
+	HPB_STATE_ACTIVE,
+};
+
+/**
+ * ufshpb_subregion_events
+ * @SUBREGION_EVENT_ACTIVATE - device signal to a non-active subregion
+ * @SUBREGION_EVENT_UPDATE - device signal to an active subregion
+ * @SUBREGION_EVENT_READ - a new read request to a non-active subregion
+ * @SUBREGION_EVENT_DIRTY_THRESHOLD - a dirty threshold crossed
+ * @SUBREGION_EVENT_TIMER - timer since last read expired
+ */
+enum ufshpb_subregion_events {
+	SUBREGION_EVENT_ACTIVATE,
+	SUBREGION_EVENT_UPDATE,
+	SUBREGION_EVENT_READ,
+	SUBREGION_EVENT_DIRTY_THRESHOLD,
+	SUBREGION_EVENT_TIMER,
+
+	/* keep last */
+	SUBREGION_EVENT_MAX
+};
+
 /**
  * struct map_ctx - a mapping context
  * @pages - array of pages
@@ -55,6 +89,9 @@ struct ufshpb_map_ctx {
  * @read_timeout_expiries - to rewind the read_timer for "clean" subregion
  * @region - region index
  * @subregion - subregion index
+ * @state - one of @ufshpb_state
+ * @event - one of @ufshpb_subregion_events
+ * @hpb_work - worker to implement state machine
  */
 struct ufshpb_subregion {
 	struct ufshpb_map_ctx *mctx;
@@ -66,6 +103,11 @@ struct ufshpb_subregion {
 	atomic_t read_timeout_expiries;
 	unsigned int region;
 	unsigned int subregion;
+
+	unsigned state:1;
+	unsigned long event;
+	struct mutex state_lock;
+	struct work_struct hpb_work;
 };
 
 /**
@@ -76,6 +118,7 @@ struct ufshpb_subregion {
  * @writes - sum over subregions @writes
  * @region - region index
  * @active_subregions - actual active subregions
+ * @evicted - to indicate whether this region is currently being evicted
  */
 struct ufshpb_region {
 	struct ufshpb_subregion *subregion_tbl;
@@ -85,6 +128,7 @@ struct ufshpb_region {
 	unsigned int region;
 
 	atomic_t active_subregions;
+	atomic_t evicted;
 };
 
 /**
@@ -93,6 +137,7 @@ struct ufshpb_region {
  * @lh_map_ctx - list head of mapping context
  * @map_list_lock - to protect mapping context list operations
  * @region_tbl - regions/subregions table
+ * @pinned_map - to mark pinned regions
  * @sdev - scsi device for that lun
  * @regions_per_lun
  * @subregions_per_lun - lun size is not guaranteed to be region aligned
@@ -105,6 +150,7 @@ struct ufshpb_dh_lun {
 	struct list_head lh_map_ctx;
 	spinlock_t map_list_lock;
 	struct ufshpb_region *region_tbl;
+	unsigned long *pinned_map;
 	struct scsi_device *sdev;
 
 	unsigned int regions_per_lun;
@@ -113,6 +159,10 @@ struct ufshpb_dh_lun {
 	unsigned int max_active_regions;
 
 	atomic_t active_regions;
+
+	struct mutex eviction_lock;
+
+	struct workqueue_struct *wq;
 };
 
 struct ufshpb_dh_data {
@@ -134,6 +184,440 @@ static unsigned long region_size;
 static unsigned long subregion_size;
 static unsigned int subregions_per_region;
 
+static inline void ufshpb_set_write_buf_cmd(unsigned char *cmd,
+					    unsigned int region)
+{
+	cmd[0] = UFSHPB_WRITE_BUFFER;
+	cmd[1] = 0x01;
+	put_unaligned_be16(region, &cmd[2]);
+}
+
+static int ufshpb_submit_write_buf_cmd(struct scsi_device *sdev,
+				       unsigned int region)
+{
+	unsigned char cmd[10] = {};
+	struct scsi_sense_hdr sshdr = {};
+	u64 flags = REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT |
+		    REQ_FAILFAST_DRIVER;
+	int timeout = WRITE_BUFFER_TIMEOUT;
+	int cmd_retries = WRITE_BUFFER_RETRIES;
+	int ret = 0;
+
+	ufshpb_set_write_buf_cmd(cmd, region);
+
+	ret = scsi_execute(sdev, cmd, DMA_NONE, NULL, 0, NULL, &sshdr,
+			   timeout, cmd_retries, flags, 0, NULL);
+
+	/* HPB spec does not define any error handling */
+	sdev_printk(KERN_INFO, sdev, "%s: WRITE_BUFFER %s result %d\n",
+		    UFSHPB_NAME, ret ? "failed" : "succeeded", ret);
+
+	return ret;
+}
+
+static void ufshpb_region_inactivate(struct ufshpb_dh_lun *hpb,
+				     struct ufshpb_region *r, bool reset)
+{
+	if (!reset)
+		ufshpb_submit_write_buf_cmd(hpb->sdev, r->region);
+
+	atomic_dec(&hpb->active_regions);
+}
+
+static void __subregion_inactivate(struct ufshpb_dh_lun *hpb,
+				   struct ufshpb_region *r,
+				   struct ufshpb_subregion *s)
+{
+	if (!s->mctx)
+		return;
+
+	if (s->mctx->pages) {
+		free_pages((unsigned long)s->mctx->pages, order);
+		s->mctx->pages = NULL;
+	}
+
+	spin_lock(&hpb->map_list_lock);
+	list_add(&s->mctx->list, &hpb->lh_map_ctx);
+	spin_unlock(&hpb->map_list_lock);
+	s->mctx = NULL;
+}
+
+static void ufshpb_subregion_inactivate(struct ufshpb_dh_lun *hpb,
+					struct ufshpb_region *r,
+					struct ufshpb_subregion *s, bool reset)
+{
+	lockdep_assert_held(&s->state_lock);
+
+	if (s->state == HPB_STATE_INACTIVE)
+		return;
+
+	if (test_bit(r->region, hpb->pinned_map))
+		return;
+
+	__subregion_inactivate(hpb, r, s);
+
+	s->state = HPB_STATE_INACTIVE;
+
+	if (atomic_dec_and_test(&r->active_subregions))
+		ufshpb_region_inactivate(hpb, r, reset);
+}
+
+static void ufshpb_set_read_buf_cmd(unsigned char *cmd, unsigned int region,
+				    unsigned int subregion,
+				    unsigned int alloc_len)
+{
+	cmd[0] = UFSHPB_READ_BUFFER;
+	cmd[1] = 0x01;
+	put_unaligned_be16(region, &cmd[2]);
+	put_unaligned_be16(subregion, &cmd[4]);
+
+	cmd[6] = alloc_len >> 16;
+	cmd[7] = (alloc_len >> 8) & 0xff;
+	cmd[8] = alloc_len & 0xff;
+	cmd[9] = 0x00;
+}
+
+static bool ufshpb_read_buf_cmd_need_retry(struct ufshpb_dh_lun *hpb,
+					   struct scsi_device *sdev,
+					   struct scsi_sense_hdr *sshdr,
+					   int result)
+{
+	if (atomic_read(&hpb->active_regions) >= hpb->max_active_regions)
+		return false;
+
+	/* UFS3.1 spec says: "...
+	 * If the device is performing internal maintenance work, for example
+	 * Garbage collection, the device may terminate the HPB READ BUFFER
+	 * command by sending RESPONSE UPIU with CHECK CONDITION status, with
+	 * the SENSE KEY set to ILLEGAL REQUEST, and the additional sense code
+	 * set OPERATION IN PROGRESS. In this case, the host can retry the HPB
+	 * READ BUFFER command..."
+	 */
+	if (scsi_sense_valid(sshdr) &&
+	    status_byte(result) == CHECK_CONDITION &&
+	    sshdr->sense_key == ILLEGAL_REQUEST && sshdr->asc == 0x00
+	    && sshdr->ascq == 0x16)
+		return true;
+
+	return false;
+}
+
+static int ufshpb_submit_read_buf_cmd(struct ufshpb_dh_lun *hpb,
+				      struct ufshpb_subregion *s)
+{
+	struct scsi_device *sdev = hpb->sdev;
+	unsigned char cmd[10] = {};
+	struct scsi_sense_hdr sshdr = {};
+	char *buf = s->mctx->pages;
+	unsigned int alloc_len = entries_per_subregion * HPB_ENTRY_SIZE;
+	int retries = READ_BUFFER_RETRIES;
+	int ret;
+
+	ufshpb_set_read_buf_cmd(cmd, s->region, s->subregion, alloc_len);
+
+	do {
+		u64 flags = REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT |
+			REQ_FAILFAST_DRIVER;
+		unsigned int buflen = alloc_len;
+
+		memset(&sshdr, 0, sizeof(sshdr));
+		memset(buf, 0, buflen);
+		ret = scsi_execute(sdev, cmd, DMA_FROM_DEVICE, buf, buflen,
+				NULL, &sshdr, READ_BUFFER_TIMEOUT,
+				READ_BUFFER_RETRIES, flags, 0, NULL);
+
+		if (!ret)
+			return ret;
+
+	} while (ufshpb_read_buf_cmd_need_retry(hpb, sdev, &sshdr, ret) &&
+			retries--);
+
+	return ret;
+}
+
+static int ufshpb_subregion_alloc_pages(struct ufshpb_dh_lun *hpb,
+					struct ufshpb_subregion *s)
+{
+	struct ufshpb_map_ctx *mctx;
+
+	spin_lock(&hpb->map_list_lock);
+	mctx = list_first_entry_or_null(&hpb->lh_map_ctx,
+					struct ufshpb_map_ctx, list);
+	if (!mctx) {
+		spin_unlock(&hpb->map_list_lock);
+		return -EINVAL;
+	}
+
+	list_del_init(&mctx->list);
+	spin_unlock(&hpb->map_list_lock);
+
+	s->mctx = mctx;
+	mctx->pages = (char *)__get_free_pages(GFP_KERNEL, order);
+	if (!mctx->pages)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static int __subregion_activate(struct ufshpb_dh_lun *hpb,
+				struct ufshpb_region *r,
+				struct ufshpb_subregion *s, bool alloc_pages)
+{
+	int ret;
+
+	if (alloc_pages) {
+		ret = ufshpb_subregion_alloc_pages(hpb, s);
+		if (ret)
+			goto out;
+	}
+
+	if (!s->mctx || !s->mctx->pages) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	ret = ufshpb_submit_read_buf_cmd(hpb, s);
+	if (ret)
+		goto out;
+
+	atomic64_sub(atomic64_read(&s->writes), &r->writes);
+	atomic64_set(&s->writes, 0);
+
+out:
+	if (ret)
+		__subregion_inactivate(hpb, r, s);
+
+	return ret;
+}
+
+static inline bool ufshpb_can_send_read_buffer(struct ufshpb_subregion *s)
+{
+	unsigned int activation_threshold = READS_READ_BUFFER_SENT *
+					    entries_per_subregion / 100;
+
+	return atomic64_read(&s->reads) - atomic64_read(&s->reads_last_trial)
+		>= activation_threshold;
+}
+
+static int ufshpb_subregion_activate(struct ufshpb_dh_lun *hpb,
+				      struct ufshpb_region *r,
+				      struct ufshpb_subregion *s, bool init)
+{
+	int ret;
+
+	lockdep_assert_held(&s->state_lock);
+
+	if (s->state == HPB_STATE_ACTIVE || atomic_read(&r->evicted))
+		return 0;
+
+	/* do not flood the driver with HPB_READ_BUFFER commands */
+	if (!init && !ufshpb_can_send_read_buffer(s))
+		return 0;
+
+	/* if this is the first subregion to become active */
+	if (atomic_inc_return(&r->active_subregions) == 1) {
+		/* check that max_active_regions is not exceeded */
+		if (atomic_inc_return(&hpb->active_regions) >
+		    hpb->max_active_regions) {
+			atomic_dec(&hpb->active_regions);
+			atomic_dec(&r->active_subregions);
+			return 0;
+		}
+	}
+
+	ret = __subregion_activate(hpb, r, s, true);
+	atomic64_set(&s->reads_last_trial, atomic64_read(&s->reads));
+	if (ret) {
+		if (atomic_dec_and_test(&r->active_subregions))
+			atomic_dec(&hpb->active_regions);
+		return ret;
+	}
+
+	s->state = HPB_STATE_ACTIVE;
+
+	return 0;
+}
+
+static struct ufshpb_region *__find_region_to_evict(struct ufshpb_dh_lun *hpb,
+						    struct ufshpb_region *r)
+{
+	struct ufshpb_region *regions = hpb->region_tbl;
+	struct ufshpb_region *exiting_region = NULL;
+	unsigned int eviction_threshold;
+	unsigned int exit_threshold;
+	unsigned int lowest_reads;
+	int i;
+
+	eviction_threshold = EVICTION_READS * entries_per_region / 100;
+	if (atomic64_read(&r->reads) < eviction_threshold)
+		goto out;
+
+	exit_threshold = eviction_threshold >> 1;
+	lowest_reads = exit_threshold;
+	for (i = 0; i < hpb->regions_per_lun; i++) {
+		struct ufshpb_region *rr = regions + i;
+		unsigned int tmp_reads;
+
+		if (r == rr)
+			continue;
+
+		if (test_bit(rr->region, hpb->pinned_map))
+			continue;
+
+		if (!atomic_read(&rr->active_subregions))
+			continue;
+
+		tmp_reads = atomic64_read(&rr->reads);
+		if (tmp_reads < lowest_reads) {
+			lowest_reads = tmp_reads;
+			exiting_region = rr;
+		}
+	}
+
+out:
+	return exiting_region;
+}
+
+static int __region_evict(struct ufshpb_dh_lun *hpb,
+			  struct ufshpb_region *exiting_region, bool reset)
+{
+	struct ufshpb_subregion *subregions = exiting_region->subregion_tbl;
+	int ret = 0;
+	int j;
+
+	/* mark that region as being evicted */
+	atomic_set(&exiting_region->evicted, 1);
+
+	for (j = 0; j < subregions_per_region; j++) {
+		struct ufshpb_subregion *s = subregions + j;
+
+		if (s->state != HPB_STATE_ACTIVE)
+			continue;
+
+		mutex_lock(&s->state_lock);
+		ufshpb_subregion_inactivate(hpb, exiting_region, s, reset);
+		mutex_unlock(&s->state_lock);
+	}
+
+	atomic_set(&exiting_region->evicted, 0);
+
+	return ret;
+}
+
+static int ufshpb_region_try_evict(struct ufshpb_dh_lun *hpb,
+				   struct ufshpb_region *r,
+				   struct ufshpb_subregion *s)
+{
+	struct ufshpb_region *exiting_region = NULL;
+
+	lockdep_assert_held(&s->state_lock);
+
+	if (s->state == HPB_STATE_ACTIVE ||
+	    atomic64_read(&s->writes) > SET_AS_DIRTY)
+		return -EINVAL;
+
+	exiting_region = __find_region_to_evict(hpb, r);
+	if (!exiting_region)
+		return -EINVAL;
+
+	/*
+	 * before inactivating the exiting region, check again - some regions
+	 * may have become free, in which case just activate the entering
+	 * subregion
+	 */
+	if (atomic_read(&hpb->active_regions) >= hpb->max_active_regions)
+		__region_evict(hpb, exiting_region, false);
+
+	/* activate the entering subregion */
+	ufshpb_subregion_activate(hpb, r, s, false);
+
+	return 0;
+}
+
+static void ufshpb_subregion_update(struct ufshpb_dh_lun *hpb,
+				    struct ufshpb_region *r,
+				    struct ufshpb_subregion *s, bool reset)
+{
+	lockdep_assert_held(&s->state_lock);
+
+	if (s->state != HPB_STATE_ACTIVE)
+		return;
+
+	/* bail out if the region is currently being evicted */
+	if (atomic_read(&r->evicted))
+		return;
+
+	/* do not flood the driver with HPB_READ_BUFFER commands */
+	if (!reset && !ufshpb_can_send_read_buffer(s))
+		return;
+
+	s->state = HPB_STATE_INACTIVE;
+
+	if (__subregion_activate(hpb, r, s, false)) {
+		ufshpb_subregion_inactivate(hpb, r, s, false);
+		return;
+	}
+
+	s->state = HPB_STATE_ACTIVE;
+}
+
+/*
+ * ufshpb_dispatch - ufshpb state machine
+ * make the various decisions based on region/subregion state & events
+ */
+static void ufshpb_dispatch(struct ufshpb_dh_lun *hpb, struct ufshpb_region *r,
+			    struct ufshpb_subregion *s)
+{
+	unsigned long e;
+	int i;
+
+	lockdep_assert_held(&s->state_lock);
+
+	e = find_first_bit(&s->event, SUBREGION_EVENT_MAX);
+
+	switch (e) {
+	case SUBREGION_EVENT_READ:
+		if (atomic_read(&r->active_subregions) < 1 &&
+		    atomic_read(&hpb->active_regions) >=
+		    hpb->max_active_regions) {
+			/* potential eviction may take place */
+			mutex_lock(&hpb->eviction_lock);
+			ufshpb_region_try_evict(hpb, r, s);
+			mutex_unlock(&hpb->eviction_lock);
+			break;
+		}
+		/* fall through */
+	case SUBREGION_EVENT_ACTIVATE:
+		ufshpb_subregion_activate(hpb, r, s, false);
+		break;
+
+	case SUBREGION_EVENT_UPDATE:
+		ufshpb_subregion_update(hpb, r, s, false);
+		break;
+
+	case SUBREGION_EVENT_DIRTY_THRESHOLD:
+	case SUBREGION_EVENT_TIMER:
+		ufshpb_subregion_inactivate(hpb, r, s, false);
+		break;
+
+	default:
+		break;
+	}
+
+	for (i = 0; i < SUBREGION_EVENT_MAX; i++)
+		clear_bit(i, &s->event);
+}
+
+static void ufshpb_work_handler(struct work_struct *work)
+{
+	struct ufshpb_subregion *s = to_subregion(work);
+
+	mutex_lock(&s->state_lock);
+
+	ufshpb_dispatch(s->hpb, s->r, s);
+
+	mutex_unlock(&s->state_lock);
+}
+
 static int ufshpb_mempool_init(struct ufshpb_dh_lun *hpb)
 {
 	unsigned int max_active_subregions = hpb->max_active_regions *
@@ -192,6 +676,8 @@ static int ufshpb_region_tbl_init(struct ufshpb_dh_lun *hpb)
 			s->r = r;
 			s->region = i;
 			s->subregion = j;
+			INIT_WORK(&s->hpb_work, ufshpb_work_handler);
+			mutex_init(&s->state_lock);
 		}
 
 		r->subregion_tbl = subregions;
@@ -200,6 +686,7 @@ static int ufshpb_region_tbl_init(struct ufshpb_dh_lun *hpb)
 	}
 
 	hpb->region_tbl = regions;
+	mutex_init(&hpb->eviction_lock);
 
 	return 0;
 
@@ -223,6 +710,8 @@ static int ufshpb_lun_configuration(struct scsi_device *sdev,
 	const struct ufshpb_lun_config *lun_conf = h->ufshpb_lun_conf;
 	struct ufshpb_dh_lun *hpb = NULL;
 	int ret;
+	struct workqueue_struct *wq;
+	char wq_name[sizeof("ufshpb_wq_00")] = {};
 
 	if (h->hpb)
 		return -EBUSY;
@@ -236,6 +725,13 @@ static int ufshpb_lun_configuration(struct scsi_device *sdev,
 	hpb->subregions_per_lun = DIV_ROUND_UP(lun_conf->size, subregion_size);
 	hpb->last_subregion_size = lun_conf->size % subregion_size;
 
+	hpb->pinned_map = kcalloc(BITS_TO_LONGS(hpb->regions_per_lun),
+				  sizeof(unsigned long), GFP_KERNEL);
+	if (!hpb->pinned_map) {
+		ret = -ENOMEM;
+		goto out_free;
+	}
+
 	ret = ufshpb_mempool_init(hpb);
 	if (ret)
 		goto out_free;
@@ -244,6 +740,15 @@ static int ufshpb_lun_configuration(struct scsi_device *sdev,
 	if (ret)
 		goto out_free;
 
+	snprintf(wq_name, ARRAY_SIZE(wq_name), "ufshpb_wq_%d", sdev->id);
+	wq = alloc_workqueue(wq_name, WQ_HIGHPRI, WQ_MAX_ACTIVE);
+	if (!wq) {
+		ret = -ENOMEM;
+		goto out_free;
+	}
+
+	hpb->wq = wq;
+
 	INIT_LIST_HEAD(&hpb->list);
 	list_add(&hpb->list, &lh_hpb_lunp);
 
@@ -251,8 +756,10 @@ static int ufshpb_lun_configuration(struct scsi_device *sdev,
 	h->hpb = hpb;
 
 out_free:
-	if (ret)
+	if (ret) {
+		kfree(hpb->pinned_map);
 		kfree(hpb);
+	}
 	return ret;
 }
 
@@ -336,6 +843,14 @@ static void ufshpb_region_tbl_remove(struct ufshpb_dh_lun *hpb)
 
 		for (i = 0 ; i < hpb->regions_per_lun; i++) {
 			struct ufshpb_region *r = regions + i;
+			int j;
+
+			for (j = 0; j < subregions_per_region; j++) {
+				struct ufshpb_subregion *s =
+							r->subregion_tbl + j;
+
+				cancel_work_sync(&s->hpb_work);
+			}
 
 			kfree(r->subregion_tbl);
 		}
@@ -355,6 +870,10 @@ static void ufshpb_detach(struct scsi_device *sdev)
 {
 	struct ufshpb_dh_data *h = sdev->handler_data;
 
+	if (h->hpb && h->hpb->wq) {
+		flush_workqueue(h->hpb->wq);
+		destroy_workqueue(h->hpb->wq);
+	}
 	ufshpb_mempool_remove(h->hpb);
 	ufshpb_region_tbl_remove(h->hpb);
 	kfree(sdev->handler_data);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [RFC PATCH 08/13] scsi: dh: ufshpb: Activate pinned regions
  2020-05-15 10:30 [RFC PATCH 00/13] scsi: ufs: Add HPB Support Avri Altman
                   ` (6 preceding siblings ...)
  2020-05-15 10:30 ` [RFC PATCH 07/13] scsi: scsi_dh: ufshpb: Add ufshpb state machine Avri Altman
@ 2020-05-15 10:30 ` Avri Altman
  2020-05-15 10:30 ` [RFC PATCH 09/13] scsi: ufshpb: Add response API Avri Altman
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 39+ messages in thread
From: Avri Altman @ 2020-05-15 10:30 UTC (permalink / raw)
  To: James E . J . Bottomley, Martin K . Petersen, linux-scsi, linux-kernel
  Cc: Bart Van Assche, alim.akhtar, asutoshd, Zang Leigang,
	Avi Shchislowski, Bean Huo, cang, stanley.chu,
	MOHAMMED RAFIQ KAMAL BASHA, Sang-yoon Oh, yongmyung lee,
	Jinyoung CHOI, Avri Altman

Pinned regions are regions that need to always be active.  They are
adjacent, starting from a pre-defined starting index.  On init and
after a LUN reset, those regions are promptly moved to the active
state.

On init, during the LUN activation sequence, we run through the full
sequence of activating all the subregions of the pinned regions.  Upon
failure, HPB is disabled for that LUN.

On LUN reset, as it occurs in the middle of the platform's life and
memory might not be available, it is OK if a subregion activation
fails.  The region, however, is counted as one of the LUN's active
regions anyway.

We also maintain a per-LUN bitmap indicating its pinned regions, so
that we will not accidentally inactivate any of their subregions.

As a general statement, it is undesirable to configure any pinned
regions at all.

Signed-off-by: Avri Altman <avri.altman@wdc.com>
---
 drivers/scsi/device_handler/scsi_dh_ufshpb.c | 53 ++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)

diff --git a/drivers/scsi/device_handler/scsi_dh_ufshpb.c b/drivers/scsi/device_handler/scsi_dh_ufshpb.c
index c94a616..8f85933 100644
--- a/drivers/scsi/device_handler/scsi_dh_ufshpb.c
+++ b/drivers/scsi/device_handler/scsi_dh_ufshpb.c
@@ -618,6 +618,55 @@ static void ufshpb_work_handler(struct work_struct *work)
 	mutex_unlock(&s->state_lock);
 }
 
+static int ufshpb_activate_pinned_regions(struct ufshpb_dh_data *h, bool init)
+{
+	const struct ufshpb_lun_config *lun_conf = h->ufshpb_lun_conf;
+	struct ufshpb_dh_lun *hpb = h->hpb;
+	u16 start_idx = lun_conf->pinned_starting_idx;
+	u16 pinned_count = lun_conf->pinned_count;
+	int i, j;
+	int ret;
+
+	if (!pinned_count)
+		return 0;
+
+	for (i = 0; i < pinned_count; i++) {
+		struct ufshpb_region *r = hpb->region_tbl + start_idx + i;
+		struct ufshpb_subregion *subregions = r->subregion_tbl;
+		unsigned int sub_count = subregions_per_region;
+
+		/*
+		 * lun might not be region/subregion aligned.  pinned region
+		 * activation is the only case in which we do care about the
+		 * exact number of the region's subregions, to avoid sending
+		 * HPB_READ_BUFFER to a nonexistent subregion
+		 */
+		if (start_idx + i == hpb->regions_per_lun - 1) {
+			/*
+			 * A small trick to avoid checking if the lun is
+			 * subregion aligned
+			 */
+			sub_count = ((hpb->subregions_per_lun - 1) %
+				     subregions_per_region) + 1;
+		}
+
+		for (j = 0; j < sub_count; j++) {
+			struct ufshpb_subregion *s = subregions + j;
+
+			mutex_lock(&s->state_lock);
+			ret = ufshpb_subregion_activate(hpb, r, s, true);
+			mutex_unlock(&s->state_lock);
+
+			if (ret && init)
+				return ret;
+		}
+
+		set_bit(start_idx + i, hpb->pinned_map);
+	}
+
+	return ret;
+}
+
 static int ufshpb_mempool_init(struct ufshpb_dh_lun *hpb)
 {
 	unsigned int max_active_subregions = hpb->max_active_regions *
@@ -791,6 +840,10 @@ static int ufshpb_activate(struct scsi_device *sdev, activate_complete fn,
 	if (ret)
 		return ret;
 
+	ret = ufshpb_activate_pinned_regions(h, true);
+	if (ret)
+		return ret;
+
 	if (fn)
 		fn(data, ret);
 
-- 
2.7.4



* [RFC PATCH 09/13] scsi: ufshpb: Add response API
  2020-05-15 10:30 [RFC PATCH 00/13] scsi: ufs: Add HPB Support Avri Altman
                   ` (7 preceding siblings ...)
  2020-05-15 10:30 ` [RFC PATCH 08/13] scsi: dh: ufshpb: Activate pinned regions Avri Altman
@ 2020-05-15 10:30 ` Avri Altman
  2020-05-16  3:06   ` Bart Van Assche
  2020-05-15 10:30 ` [RFC PATCH 10/13] scsi: dh: ufshpb: Add ufshpb_set_params Avri Altman
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 39+ messages in thread
From: Avri Altman @ 2020-05-15 10:30 UTC (permalink / raw)
  To: James E . J . Bottomley, Martin K . Petersen, linux-scsi, linux-kernel
  Cc: Bart Van Assche, alim.akhtar, asutoshd, Zang Leigang,
	Avi Shchislowski, Bean Huo, cang, stanley.chu,
	MOHAMMED RAFIQ KAMAL BASHA, Sang-yoon Oh, yongmyung lee,
	Jinyoung CHOI, Avri Altman

Here we arrive at the second duty of the ufshpb module, which mainly
concerns HPB cache management.  Once an HPB response is detected, the
information encapsulated within it needs to be transferred to the cache
management engine in the scsi device handler.

Those UPIU responses need to be decoded and handled as deferred work,
because they are received in the completion interrupt context.  Also,
as any command response may carry them, several responses may need to
be handled concurrently.

To support the above, we elaborate the ufshpb_lun structure, as
promised, to include whatever is needed to manage those HPB responses.
Basically we are using two linked lists, a spinlock and a tasklet.  We
use a tasklet here instead of a worker to get better responsiveness,
because our response element resources are limited.
We pre-allocate 32 response elements, because there can be up to 32
responses per completion interrupt.

Once the response is decoded, we notify the device handler via its
"set_params" handler.  Although the documentation specifies a
pre-defined protocol, it is not enforced and is wide open to
per-handler inner implementation.

Signed-off-by: Avri Altman <avri.altman@wdc.com>
---
 drivers/scsi/ufs/ufshcd.c     |   4 +
 drivers/scsi/ufs/ufshpb.c     | 202 ++++++++++++++++++++++++++++++++++++++++++
 drivers/scsi/ufs/ufshpb.h     |   3 +
 include/scsi/scsi_dh_ufshpb.h |  17 ++++
 4 files changed, 226 insertions(+)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index c2011bf..9adb743 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -4779,6 +4779,10 @@ ufshcd_transfer_rsp_status(struct ufs_hba *hba, struct ufshcd_lrb *lrbp)
 				 */
 				pm_runtime_get_noresume(hba->dev);
 			}
+
+			if (ufshcd_is_hpb_supported(hba) &&
+			    scsi_status == SAM_STAT_GOOD)
+				ufshpb_rsp_upiu(hba, lrbp);
 			break;
 		case UPIU_TRANSACTION_REJECT_UPIU:
 			/* TODO: handle Reject UPIU Response */
diff --git a/drivers/scsi/ufs/ufshpb.c b/drivers/scsi/ufs/ufshpb.c
index be926cb..c3541f2 100644
--- a/drivers/scsi/ufs/ufshpb.c
+++ b/drivers/scsi/ufs/ufshpb.c
@@ -20,14 +20,38 @@
 #define GEO_DESC_HPB_SIZE (0x4D)
 #define UNIT_DESC_HPB_SIZE (0x29)
 
+/* hpb response */
+#define RSP_UPDATE_ALERT (0x20000)
+#define RSP_SENSE_DATA_LEN (0x12)
+#define RSP_DES_TYPE	 (0x80)
+#define MASK_RSP_UPIU_LUN (0xff)
+#define MASK_RSP_UPIU_DESC (0xff000000)
+#define MASK_RSP_UPIU_OPER (0xff00)
+#define MASK_RSP_ACTIVE_CNT (0xff00)
+#define MASK_RSP_INACTIVE_CNT (0xff)
+
 enum ufshpb_control_modes {
 	UFSHPB_HOST_CONTROL,
 	UFSHPB_DEVICE_CONTROL
 };
 
+#define RSP_DATA_SEG_LEN (sizeof(struct ufshpb_sense_data))
+
+struct ufshpb_rsp_element {
+	struct ufshpb_sense_data sense_data;
+	struct list_head list;
+};
+
 struct ufshpb_lun {
 	u8 lun;
 	struct scsi_device *sdev;
+
+	/* to manage hpb responses */
+	struct ufshpb_rsp_element *rsp_elements;
+	struct list_head lh_rsp;
+	struct list_head lh_rsp_free;
+	spinlock_t rsp_list_lock;
+	struct tasklet_struct rsp_tasklet;
 };
 
 
@@ -36,6 +60,157 @@ struct ufshpb_lun_config *ufshpb_luns_conf;
 struct ufshpb_lun *ufshpb_luns;
 static unsigned long ufshpb_lun_map[BITS_TO_LONGS(UFSHPB_MAX_LUNS)];
 static u8 ufshpb_lun_lookup[UFSHPB_MAX_LUNS];
+static u8 hba_nutrs;
+
+static void ufshpb_dh_notify(struct ufshpb_lun *hpb,
+			     struct ufshpb_sense_data *sense)
+{
+	struct ufs_hba *hba = shost_priv(hpb->sdev->host);
+
+	spin_lock(hba->host->host_lock);
+
+	if (scsi_device_get(hpb->sdev)) {
+		spin_unlock(hba->host->host_lock);
+		return;
+	}
+
+	scsi_dh_set_params(hpb->sdev->request_queue, (const char *)sense);
+
+	scsi_device_put(hpb->sdev);
+
+	spin_unlock(hba->host->host_lock);
+}
+
+static struct ufshpb_rsp_element *
+ufshpb_get_rsp_elem(struct ufshpb_lun *hpb, struct list_head *head)
+{
+	struct ufshpb_rsp_element *rsp_elem;
+
+	lockdep_assert_held(&hpb->rsp_list_lock);
+
+	rsp_elem = list_first_entry_or_null(head, struct ufshpb_rsp_element,
+					    list);
+	if (rsp_elem)
+		list_del_init(&rsp_elem->list);
+
+	return rsp_elem;
+}
+
+static void ufshpb_tasklet_fn(unsigned long priv)
+{
+	struct ufshpb_lun *hpb = (struct ufshpb_lun *)priv;
+	struct ufshpb_rsp_element *rsp_elem = NULL;
+	unsigned long flags;
+
+	while (1) {
+		spin_lock_irqsave(&hpb->rsp_list_lock, flags);
+		rsp_elem = ufshpb_get_rsp_elem(hpb, &hpb->lh_rsp);
+		spin_unlock_irqrestore(&hpb->rsp_list_lock, flags);
+
+		if (!rsp_elem)
+			return;
+
+		ufshpb_dh_notify(hpb, &rsp_elem->sense_data);
+
+		spin_lock_irqsave(&hpb->rsp_list_lock, flags);
+		list_add_tail(&rsp_elem->list, &hpb->lh_rsp_free);
+		spin_unlock_irqrestore(&hpb->rsp_list_lock, flags);
+	}
+}
+
+/* in host managed mode - ignore region inactivation hints */
+static bool ufshpb_need_to_respond(struct ufshpb_sense_data *hpb_sense,
+				   u8 *sense)
+{
+	hpb_sense->active_reg_cnt =
+		(get_unaligned_be16(sense + 4) & MASK_RSP_ACTIVE_CNT) >> 8;
+
+	if (!hpb_sense->active_reg_cnt)
+		return false;
+
+	/* read 1st region/sub-region to activate */
+	hpb_sense->active_reg_0 = get_unaligned_be16(sense + 6);
+	hpb_sense->active_subreg_0 = get_unaligned_be16(sense + 8);
+
+	if (hpb_sense->active_reg_cnt == 1)
+		return true;
+
+	/* read 2nd region/sub-region to activate */
+	hpb_sense->active_reg_1 = get_unaligned_be16(sense + 10);
+	hpb_sense->active_subreg_1 = get_unaligned_be16(sense + 12);
+
+	return true;
+}
+
+void ufshpb_rsp_upiu(struct ufs_hba *hba, struct ufshcd_lrb *lrbp)
+{
+	struct ufshpb_lun *hpb;
+	struct ufshpb_rsp_element *rsp_elem;
+	u32 dword2 = get_unaligned_be32(&lrbp->ucd_rsp_ptr->header.dword_2);
+	u32 data_seg_len = dword2 & MASK_RSP_UPIU_DATA_SEG_LEN;
+	u16 sense_data_len;
+	u8 lun, desc_type, oper, i;
+
+	if (data_seg_len != RSP_DATA_SEG_LEN)
+		return;
+
+	if (!(dword2 & RSP_UPDATE_ALERT))
+		return;
+
+	sense_data_len =
+		get_unaligned_be16(&lrbp->ucd_rsp_ptr->sr.sense_data_len);
+	if (sense_data_len != RSP_SENSE_DATA_LEN)
+		return;
+
+	lun = get_unaligned_be32(lrbp->ucd_rsp_ptr->sr.sense_data) &
+				 MASK_RSP_UPIU_LUN;
+
+	if (lun >= UFSHPB_MAX_LUNS)
+		return;
+
+	if (!test_bit(lun, ufshpb_lun_map))
+		return;
+
+	i = READ_ONCE(ufshpb_lun_lookup[lun]);
+	if (i >= ufshpb_conf->num_hpb_luns)
+		return;
+
+	desc_type = (get_unaligned_be32(lrbp->ucd_rsp_ptr->sr.sense_data) &
+		     MASK_RSP_UPIU_DESC) >> 24;
+	if (desc_type != RSP_DES_TYPE)
+		return;
+
+	oper = (get_unaligned_be32(lrbp->ucd_rsp_ptr->sr.sense_data) &
+		MASK_RSP_UPIU_OPER) >> 8;
+	if (!oper)
+		return;
+
+	hpb = ufshpb_luns + i;
+
+	spin_lock_irq(&hpb->rsp_list_lock);
+	rsp_elem = ufshpb_get_rsp_elem(hpb, &hpb->lh_rsp_free);
+	spin_unlock_irq(&hpb->rsp_list_lock);
+	if (!rsp_elem) {
+		dev_err(hba->dev, "no rsp element available\n");
+		return;
+	}
+
+	memset(&rsp_elem->sense_data, 0x00, data_seg_len);
+	rsp_elem->sense_data.lun = lun;
+	if (!ufshpb_need_to_respond(&rsp_elem->sense_data,
+				    lrbp->ucd_rsp_ptr->sr.sense_data)) {
+		spin_lock_irq(&hpb->rsp_list_lock);
+		list_move_tail(&rsp_elem->list, &hpb->lh_rsp_free);
+		spin_unlock_irq(&hpb->rsp_list_lock);
+		return;
+	}
+
+	spin_lock_irq(&hpb->rsp_list_lock);
+	list_move_tail(&rsp_elem->list, &hpb->lh_rsp);
+	spin_unlock_irq(&hpb->rsp_list_lock);
+
+	tasklet_schedule(&hpb->rsp_tasklet);
+}
 
 void ufshpb_remove_lun(u8 lun)
 {
@@ -52,6 +227,8 @@ void ufshpb_remove_lun(u8 lun)
 		scsi_dh_release_device(hpb->sdev);
 		scsi_device_put(hpb->sdev);
 	}
+	kfree(hpb->rsp_elements);
+	tasklet_kill(&hpb->rsp_tasklet);
 }
 
 /**
@@ -162,8 +339,31 @@ static int ufshpb_hpb_init(void)
 
 	for (i = 0; i < num_hpb_luns; i++) {
 		struct ufshpb_lun *hpb = ufshpb_luns + i;
+		struct ufshpb_rsp_element *rsp_elements;
+		u8 num_rsp_elements = hba_nutrs;
+		int j;
 
 		hpb->lun = (ufshpb_luns_conf + i)->lun;
+
+		INIT_LIST_HEAD(&hpb->lh_rsp_free);
+		INIT_LIST_HEAD(&hpb->lh_rsp);
+		spin_lock_init(&hpb->rsp_list_lock);
+		tasklet_init(&hpb->rsp_tasklet, ufshpb_tasklet_fn,
+			     (unsigned long)hpb);
+
+		rsp_elements = kcalloc(num_rsp_elements, sizeof(*rsp_elements),
+				       GFP_KERNEL);
+		if (!rsp_elements)
+			return -ENOMEM;
+
+		for (j = 0; j < num_rsp_elements; j++) {
+			struct ufshpb_rsp_element *rsp_elem = rsp_elements + j;
+
+			INIT_LIST_HEAD(&rsp_elem->list);
+			list_add_tail(&rsp_elem->list, &hpb->lh_rsp_free);
+		}
+
+		hpb->rsp_elements = rsp_elements;
 	}
 
 	return 0;
@@ -386,6 +586,8 @@ int ufshpb_probe(struct ufs_hba *hba)
 	if (ret)
 		goto out;
 
+	hba_nutrs = hba->nutrs;
+
 	ret = ufshpb_hpb_init();
 	if (ret)
 		goto out;
diff --git a/drivers/scsi/ufs/ufshpb.h b/drivers/scsi/ufs/ufshpb.h
index 276a749..52e7994 100644
--- a/drivers/scsi/ufs/ufshpb.h
+++ b/drivers/scsi/ufs/ufshpb.h
@@ -15,11 +15,14 @@ struct ufs_hba;
 void ufshpb_remove(struct ufs_hba *hba);
 int ufshpb_probe(struct ufs_hba *hba);
 void ufshpb_attach_sdev(struct ufs_hba *hba, struct scsi_device *sdev);
+void ufshpb_rsp_upiu(struct ufs_hba *hba, struct ufshcd_lrb *lrbp);
 #else
 static inline void ufshpb_remove(struct ufs_hba *hba) {}
 static inline int ufshpb_probe(struct ufs_hba *hba) { return 0; }
 static inline void
 ufshpb_attach_sdev(struct ufs_hba *hba, struct scsi_device *sdev) {}
+static inline
+void ufshpb_rsp_upiu(struct ufs_hba *hba, struct ufshcd_lrb *lrbp) {}
 #endif
 
 #endif /* _UFSHPB_H */
diff --git a/include/scsi/scsi_dh_ufshpb.h b/include/scsi/scsi_dh_ufshpb.h
index 3dcb442..bb6ea44 100644
--- a/include/scsi/scsi_dh_ufshpb.h
+++ b/include/scsi/scsi_dh_ufshpb.h
@@ -46,5 +46,22 @@ struct ufshpb_config {
 	u16 max_active_regions;
 	u8 num_hpb_luns;
 };
+
+/* per JEDEC spec JESD220-3 Table 6.7 — HPB Sense Data */
+struct ufshpb_sense_data {
+	u16 sense_data_len;
+	u8 desc_type;
+	u8 additional_len;
+	u8 oper;
+	u8 lun;
+	u8 active_reg_cnt;
+	u8 inactive_reg_cnt;
+	u16 active_reg_0;
+	u16 active_subreg_0;
+	u16 active_reg_1;
+	u16 active_subreg_1;
+	u16 inactive_reg_0;
+	u16 inactive_reg_1;
+};
 #endif /* _SCSI_DH_UFSHPB_H */
 
-- 
2.7.4



* [RFC PATCH 10/13] scsi: dh: ufshpb: Add ufshpb_set_params
  2020-05-15 10:30 [RFC PATCH 00/13] scsi: ufs: Add HPB Support Avri Altman
                   ` (8 preceding siblings ...)
  2020-05-15 10:30 ` [RFC PATCH 09/13] scsi: ufshpb: Add response API Avri Altman
@ 2020-05-15 10:30 ` Avri Altman
  2020-05-15 10:30 ` [RFC PATCH 11/13] scsi: Allow device handler set their own CDB Avri Altman
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 39+ messages in thread
From: Avri Altman @ 2020-05-15 10:30 UTC (permalink / raw)
  To: James E . J . Bottomley, Martin K . Petersen, linux-scsi, linux-kernel
  Cc: Bart Van Assche, alim.akhtar, asutoshd, Zang Leigang,
	Avi Shchislowski, Bean Huo, cang, stanley.chu,
	MOHAMMED RAFIQ KAMAL BASHA, Sang-yoon Oh, yongmyung lee,
	Jinyoung CHOI, Avri Altman

The ufshpb LLD intercepted a response from the device indicating that
some subregion(s) should be activated, or some region(s) should be
inactivated.  Each such message may carry info for up to 2 subregions
to activate, and up to 2 regions to inactivate.

Set the appropriate triggering events on those regions/subregions and
schedule the state machine handler.

In case the device lost the LUN's HPB information for whatever reason,
it can signal that to the host through the HPB sense data.  On LUN
reset, all of its active subregions are updated.

None of the above applies to pinned regions, which are not affected by
any of this and stay active.

Inactivation of the entire LUN bears heavy implications, both with
respect to performance - sending READ_10 instead of HPB_READ - and
with respect to overhead - re-activating the entire LUN.  One might
expect proper protection against frequent resets and/or irrational
device behavior to be in place.  However, covering for this is out of
the driver's scope.  Instead, issue a proper info message.

Signed-off-by: Avri Altman <avri.altman@wdc.com>
---
 drivers/scsi/device_handler/scsi_dh_ufshpb.c | 131 +++++++++++++++++++++++++++
 1 file changed, 131 insertions(+)

diff --git a/drivers/scsi/device_handler/scsi_dh_ufshpb.c b/drivers/scsi/device_handler/scsi_dh_ufshpb.c
index 8f85933..affc143 100644
--- a/drivers/scsi/device_handler/scsi_dh_ufshpb.c
+++ b/drivers/scsi/device_handler/scsi_dh_ufshpb.c
@@ -68,6 +68,16 @@ enum ufshpb_subregion_events {
 	SUBREGION_EVENT_MAX
 };
 
+enum ufshpb_response_opers {
+	OPER_TYPE_ACTIVATE	= 1,
+	OPER_TYPE_RESET		= 2,
+};
+
+enum ufshpb_work_locks {
+	LUN_RESET_WORK = BIT(0),
+};
+
+
 /**
  * struct map_ctx - a mapping context
  * @pages - array of pages
@@ -158,11 +168,13 @@ struct ufshpb_dh_lun {
 	unsigned int last_subregion_size;
 	unsigned int max_active_regions;
 
+	struct work_struct lun_reset_work;
 	atomic_t active_regions;
 
 	struct mutex eviction_lock;
 
 	struct workqueue_struct *wq;
+	unsigned long work_lock;
 };
 
 struct ufshpb_dh_data {
@@ -285,6 +297,9 @@ static bool ufshpb_read_buf_cmd_need_retry(struct ufshpb_dh_lun *hpb,
 	if (atomic_read(&hpb->active_regions) >= hpb->max_active_regions)
 		return false;
 
+	if (test_bit(LUN_RESET_WORK, &hpb->work_lock))
+		return false;
+
 	/* UFS3.1 spec says: "...
 	 * If the device is performing internal maintenance work, for example
 	 * Garbage collection, the device may terminate the HPB READ BUFFER
@@ -667,6 +682,43 @@ static int ufshpb_activate_pinned_regions(struct ufshpb_dh_data *h, bool init)
 	return ret;
 }
 
+static void __region_reset(struct ufshpb_dh_lun *hpb,
+			   struct ufshpb_region *r)
+{
+	struct ufshpb_subregion *subregions = r->subregion_tbl;
+	int j;
+
+	for (j = 0; j < subregions_per_region; j++) {
+		struct ufshpb_subregion *s = subregions + j;
+		int k;
+
+		mutex_lock(&s->state_lock);
+		ufshpb_subregion_update(hpb, r, s, true);
+		for (k = 0; k < SUBREGION_EVENT_MAX; k++)
+			clear_bit(k, &s->event);
+		mutex_unlock(&s->state_lock);
+	}
+}
+
+static void ufshpb_lun_reset_work_handler(struct work_struct *work)
+{
+	struct ufshpb_dh_lun *hpb = container_of(work, struct ufshpb_dh_lun,
+			lun_reset_work);
+	struct ufshpb_region *regions = hpb->region_tbl;
+	int i;
+
+	for (i = 0; i < hpb->regions_per_lun; i++) {
+		struct ufshpb_region *r = regions + i;
+
+		__region_reset(hpb, r);
+	}
+
+	clear_bit(LUN_RESET_WORK, &hpb->work_lock);
+
+	sdev_printk(KERN_INFO, hpb->sdev, "%s: lun reset [%llu]\n",
+		    UFSHPB_NAME, hpb->sdev->lun);
+}
+
 static int ufshpb_mempool_init(struct ufshpb_dh_lun *hpb)
 {
 	unsigned int max_active_subregions = hpb->max_active_regions *
@@ -736,6 +788,7 @@ static int ufshpb_region_tbl_init(struct ufshpb_dh_lun *hpb)
 
 	hpb->region_tbl = regions;
 	mutex_init(&hpb->eviction_lock);
+	INIT_WORK(&hpb->lun_reset_work, ufshpb_lun_reset_work_handler);
 
 	return 0;
 
@@ -826,6 +879,83 @@ static void ufshpb_get_config(const struct ufshpb_config *c)
 	is_configured = true;
 }
 
+/*
+ * ufshpb_set_params - obtain response from the device
+ * @sdev: scsi device of the hpb lun
+ * params - a hpb sense data buffer copied from the device
+ *
+ * set the appropriate triggers for the various regions/subregions
+ * each hpb sense buffer carries the information of up to 2 subregions to
+ * activate.
+ */
+static int ufshpb_set_params(struct scsi_device *sdev, const char *params)
+{
+	struct ufshpb_dh_data *h = sdev->handler_data;
+	struct ufshpb_dh_lun *hpb = h->hpb;
+	struct ufshpb_region *r;
+	struct ufshpb_subregion *s;
+	struct ufshpb_sense_data sense = {};
+	enum ufshpb_state s_state;
+	u16 *pos;
+	u16 region, subregion;
+	int ret;
+	u8 active_reg_cnt = 0;
+
+	memcpy(&sense, params, sizeof(sense));
+
+	if (sense.oper == OPER_TYPE_RESET) {
+		struct ufshpb_dh_lun *rst_hpb;
+
+		list_for_each_entry(rst_hpb, &lh_hpb_lunp, list) {
+			set_bit(LUN_RESET_WORK, &rst_hpb->work_lock);
+			queue_work(rst_hpb->wq, &rst_hpb->lun_reset_work);
+		}
+		return 0;
+	}
+
+	/* is there any subregions to activate? */
+	if (!sense.active_reg_cnt)
+		return -EINVAL;
+
+	active_reg_cnt = sense.active_reg_cnt;
+	if (active_reg_cnt > 2) {
+		/* HPB1.0 allows the device to signal up to 2 subregions */
+		return -EINVAL;
+	}
+
+	ret = 0;
+
+	pos = &sense.active_reg_0;
+
+	do {
+		region = *pos++;
+		subregion = *pos;
+
+		if (region >= hpb->regions_per_lun ||
+				subregion >= subregions_per_region)
+			return -EINVAL;
+
+		r = hpb->region_tbl + region;
+		s = r->subregion_tbl + subregion;
+
+		rcu_read_lock();
+		s_state = s->state;
+		rcu_read_unlock();
+
+		if (s_state == HPB_STATE_ACTIVE)
+			set_bit(SUBREGION_EVENT_UPDATE, &s->event);
+		else
+			set_bit(SUBREGION_EVENT_ACTIVATE, &s->event);
+
+		queue_work(hpb->wq, &s->hpb_work);
+
+		pos++;
+
+	} while (--active_reg_cnt);
+
+	return ret;
+}
+
 static int ufshpb_activate(struct scsi_device *sdev, activate_complete fn,
 			   void *data)
 {
@@ -939,6 +1069,7 @@ static struct scsi_device_handler ufshpb_dh = {
 	.attach		= ufshpb_attach,
 	.detach		= ufshpb_detach,
 	.activate	= ufshpb_activate,
+	.set_params     = ufshpb_set_params,
 };
 
 static int __init ufshpb_init(void)
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [RFC PATCH 11/13] scsi: Allow device handler set their own CDB
  2020-05-15 10:30 [RFC PATCH 00/13] scsi: ufs: Add HPB Support Avri Altman
                   ` (9 preceding siblings ...)
  2020-05-15 10:30 ` [RFC PATCH 10/13] scsi: dh: ufshpb: Add ufshpb_set_params Avri Altman
@ 2020-05-15 10:30 ` Avri Altman
  2020-05-16  3:19   ` Bart Van Assche
  2020-05-15 10:30 ` [RFC PATCH 12/13] scsi: dh: ufshpb: Add prep_fn handler Avri Altman
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 39+ messages in thread
From: Avri Altman @ 2020-05-15 10:30 UTC (permalink / raw)
  To: James E . J . Bottomley, Martin K . Petersen, linux-scsi, linux-kernel
  Cc: Bart Van Assche, alim.akhtar, asutoshd, Zang Leigang,
	Avi Shchislowski, Bean Huo, cang, stanley.chu,
	MOHAMMED RAFIQ KAMAL BASHA, Sang-yoon Oh, yongmyung lee,
	Jinyoung CHOI, Avri Altman

Allow scsi device handlers to handle their own CDB and ship it down
the scsi passthrough command setup flow, without any further
intervention.

This is useful for setting up DRV-OPs that implement vendor commands
via the scsi device handler framework.

Signed-off-by: Avri Altman <avri.altman@wdc.com>
---
 drivers/scsi/scsi_lib.c | 5 +++--
 drivers/scsi/sd.c       | 9 +++++++++
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 6b6dd40..4e98714 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1148,14 +1148,15 @@ static blk_status_t scsi_setup_fs_cmnd(struct scsi_device *sdev,
 {
 	struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(req);
 
+	cmd->cmnd = scsi_req(req)->cmd = scsi_req(req)->__cmd;
+	memset(cmd->cmnd, 0, BLK_MAX_CDB);
+
 	if (unlikely(sdev->handler && sdev->handler->prep_fn)) {
 		blk_status_t ret = sdev->handler->prep_fn(sdev, req);
 		if (ret != BLK_STS_OK)
 			return ret;
 	}
 
-	cmd->cmnd = scsi_req(req)->cmd = scsi_req(req)->__cmd;
-	memset(cmd->cmnd, 0, BLK_MAX_CDB);
 	return scsi_cmd_to_driver(cmd)->init_command(cmd);
 }
 
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index a793cb0..246bef8 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1221,6 +1221,14 @@ static blk_status_t sd_setup_read_write_cmnd(struct scsi_cmnd *cmd)
 	} else if (sdp->use_16_for_rw || (nr_blocks > 0xffff)) {
 		ret = sd_setup_rw16_cmnd(cmd, write, lba, nr_blocks,
 					 protect | fua);
+	} else if (unlikely(sdp->handler && blk_rq_is_private(rq))) {
+		/*
+		 * scsi device handler that implements vendor commands -
+		 * the command was already constructed in the device handler's
+		 * prep_fn, so no need to do anything here
+		 */
+		rq->cmd_flags = REQ_OP_READ;
+		ret = BLK_STS_OK;
 	} else if ((nr_blocks > 0xff) || (lba > 0x1fffff) ||
 		   sdp->use_10_for_rw || protect) {
 		ret = sd_setup_rw10_cmnd(cmd, write, lba, nr_blocks,
@@ -1285,6 +1293,7 @@ static blk_status_t sd_init_command(struct scsi_cmnd *cmd)
 		return sd_setup_write_same_cmnd(cmd);
 	case REQ_OP_FLUSH:
 		return sd_setup_flush_cmnd(cmd);
+	case REQ_OP_DRV_IN:
 	case REQ_OP_READ:
 	case REQ_OP_WRITE:
 		return sd_setup_read_write_cmnd(cmd);
-- 
2.7.4



* [RFC PATCH 12/13] scsi: dh: ufshpb: Add prep_fn handler
  2020-05-15 10:30 [RFC PATCH 00/13] scsi: ufs: Add HPB Support Avri Altman
                   ` (10 preceding siblings ...)
  2020-05-15 10:30 ` [RFC PATCH 11/13] scsi: Allow device handler set their own CDB Avri Altman
@ 2020-05-15 10:30 ` Avri Altman
  2020-05-16  3:40   ` Bart Van Assche
  2020-05-15 10:30 ` [RFC PATCH 13/13] scsi: scsi_dh: ufshpb: Add "Cold" subregions timer Avri Altman
  2020-05-16  3:50 ` [RFC PATCH 00/13] scsi: ufs: Add HPB Support Bart Van Assche
  13 siblings, 1 reply; 39+ messages in thread
From: Avri Altman @ 2020-05-15 10:30 UTC (permalink / raw)
  To: James E . J . Bottomley, Martin K . Petersen, linux-scsi, linux-kernel
  Cc: Bart Van Assche, alim.akhtar, asutoshd, Zang Leigang,
	Avi Shchislowski, Bean Huo, cang, stanley.chu,
	MOHAMMED RAFIQ KAMAL BASHA, Sang-yoon Oh, yongmyung lee,
	Jinyoung CHOI, Avri Altman

The third and last functionality of the HPB driver is to construct an
HPB_READ command for each read request whenever possible.  Such an
HPB_READ command includes both the logical and the physical address
(ppa) of the first block to read.

Still, in order to effectively attach the “correct” ppa we need to
monitor the “dirty” blocks: those blocks to which WRITE/UNMAP commands
were sent after an HPB_READ_BUFFER was posted.  For that purpose, we
maintain a write counter for each subregion.

On WRITE/UNMAP, we just increment the write counter.
On READ, if the region/subregion is not active we will try to activate
it.  If the subregion is active, we check whether it is “clean”; if it
is, we fetch its ppa, construct an HPB_READ command and attach it to
the request structure.
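The bookkeeping above can be sketched as a small user-space model in
plain C (the struct fields, names and the dirty threshold here are
illustrative only; the driver uses per-subregion atomics and RCU-protected
state, not this layout):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative threshold: a single written block marks the subregion dirty. */
#define SET_AS_DIRTY 1

/* Minimal per-subregion model of the counters described above. */
struct subregion {
	uint64_t reads;   /* bumped on every READ */
	uint64_t writes;  /* bumped on every WRITE/UNMAP */
	bool active;      /* subregion has a cached L2P map */
};

/* On WRITE/UNMAP we only increment the write counter. */
static void mark_write(struct subregion *s, uint64_t nr_blocks)
{
	s->writes += nr_blocks;
}

/*
 * On READ: an HPB_READ may be constructed only for an active, "clean"
 * subregion; otherwise we fall back to a plain READ (and, in the driver,
 * possibly try to activate the subregion).
 */
static bool can_hpb_read(struct subregion *s, uint64_t nr_blocks)
{
	s->reads += nr_blocks;
	if (!s->active)
		return false;
	return s->writes < SET_AS_DIRTY;
}
```

The point of the sketch is the asymmetry: writes only dirty the counter,
while reads both account and decide whether HPB_READ is allowed.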

Some pre-arrangements were made to ensure that the scsi command setup
process will let the HPB_READ CDB pass unharmed.

The region and subregion state will be mostly-read accessed by both
READ and WRITE requests, checking whether an entry is "clean" and can
be HPB-READ, or marking entries as "dirty" when applicable.  So RCU
seems most suitable to allow those concurrent accesses.  Also, the
fundamental RCU requirement of a "grace period" applies in our case,
as the device firmware backs us up and will provide the correct
physical address on our occasional misses.

Upon a read request to an inactive subregion, or a write request to an
active subregion, we try to activate/inactivate it respectively.

The read/write workers call the state machine once the subregion
crosses the {activation/inactivation} threshold, but once it has, any
new request will trigger it.  A sequential read to that subregion, for
example, can yield a couple of dozen simultaneous read requests.

That's fine, because if the region is active, the activation trial
will surely succeed, and once the subregion becomes active we do
nothing.  Otherwise, we are in the unknown terrain of eviction, which
can fail - and that's fine too, because we will keep trying.

For writes it's even simpler, as the subregion becomes inactive
promptly, and we won't go there anymore.

We use the per-subregion read and write counters to support the L2P
cache's various management decisions.  We don't want to reset those
counters, so as not to lose their historical meaning: an area that
keeps being read or written over and over.

However, so as not to overflow the counters, and to maintain the
correct relative ratio among all regions, we need to normalize those
counters.

When one of the regions' read counters reaches a threshold, all
counters are divided by 2, for all regions - active and inactive.  The
same goes for the write counters.
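The normalization step can be sketched as follows (the threshold values
below are illustrative, not the driver's actual configuration; the driver
does this from a workqueue with atomic counters):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values standing in for READ_NORMALIZATION and the
 * per-region entry count read from the device configuration. */
#define NORMALIZATION 3
#define ENTRIES_PER_REGION 4096

/* Queue normalization once a region's counter crosses the threshold. */
static int needs_normalization(uint64_t region_reads)
{
	return region_reads >= (uint64_t)NORMALIZATION * ENTRIES_PER_REGION;
}

/* Halve all read (or write) counters of every region at once, bounding
 * the absolute values while preserving the ratio between regions. */
static void normalize_counters(uint64_t counters[], int n)
{
	for (int i = 0; i < n; i++)
		counters[i] >>= 1;
}
```

Because every counter is halved in the same pass, a region that is read
twice as often as another keeps that 2:1 relationship after
normalization.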

Signed-off-by: Avri Altman <avri.altman@wdc.com>
---
 drivers/scsi/device_handler/scsi_dh_ufshpb.c | 365 ++++++++++++++++++++++++++-
 1 file changed, 355 insertions(+), 10 deletions(-)

diff --git a/drivers/scsi/device_handler/scsi_dh_ufshpb.c b/drivers/scsi/device_handler/scsi_dh_ufshpb.c
index affc143..04e3d56 100644
--- a/drivers/scsi/device_handler/scsi_dh_ufshpb.c
+++ b/drivers/scsi/device_handler/scsi_dh_ufshpb.c
@@ -15,6 +15,7 @@
 #include <scsi/scsi_eh.h>
 #include <scsi/scsi_dh.h>
 #include <scsi/scsi_dh_ufshpb.h>
+#include "../sd.h"
 
 #define UFSHPB_NAME	"ufshpb"
 
@@ -22,10 +23,12 @@
 #define WRITE_BUFFER_TIMEOUT (3 * HZ)
 #define WRITE_BUFFER_RETRIES (3)
 #define UFSHPB_READ_BUFFER (0xf9)
+#define UFSHPB_READ (0xf8)
 #define READ_BUFFER_TIMEOUT (3 * HZ)
 #define READ_BUFFER_RETRIES (3)
 
 #define HPB_ENTRY_SIZE (8)
+#define HPB_ENTRIES_PER_PAGE (PAGE_SIZE / HPB_ENTRY_SIZE)
 
 /* subregion activation threshold - % of subregion size */
 #define ACTIVATION_READS (10)
@@ -39,6 +42,9 @@
 #define READ_TIMEOUT_MSEC (500)
 /* allowed rewinds of the read_timeout for “clean” subregion */
 #define READ_TIMEOUT_EXPIRIES (100)
+/* normalize the counters to keep the regions balanced */
+#define READ_NORMALIZATION (3)
+#define WRITE_NORMALIZATION (READ_NORMALIZATION)
 /* region eviction threshold: 3 x subregions_per_region x MIN_READS */
 #define EVICTION_READS (3 * ACTIVATION_READS)
 
@@ -77,7 +83,6 @@ enum ufshpb_work_locks {
 	LUN_RESET_WORK = BIT(0),
 };
 
-
 /**
  * struct map_ctx - a mapping context
  * @pages - array of pages
@@ -154,6 +159,8 @@ struct ufshpb_region {
  * @last_subregion_size - 0 (actual size) if lun is (not) subregion aligned
  * @max_active_regions - Max allowable active regions at any given time
  * @active_regions - current active regions
+ * @reads_normalization_work - worker to normalize read counters
+ * @writes_normalization_work - worker to normalize write counters
  */
 struct ufshpb_dh_lun {
 	struct list_head list;
@@ -174,6 +181,8 @@ struct ufshpb_dh_lun {
 	struct mutex eviction_lock;
 
 	struct workqueue_struct *wq;
+	struct work_struct reads_normalization_work;
+	struct work_struct writes_normalization_work;
 	unsigned long work_lock;
 };
 
@@ -195,6 +204,201 @@ static unsigned char order;
 static unsigned long region_size;
 static unsigned long subregion_size;
 static unsigned int subregions_per_region;
+/* Lookup constants */
+static unsigned int entries_per_region_shift;
+static unsigned int entries_per_region_mask;
+static unsigned int entries_per_subregion_shift;
+static unsigned int entries_per_subregion_mask;
+
+static inline unsigned int ufshpb_lba_to_region(u64 lba)
+{
+	return lba >> entries_per_region_shift;
+}
+
+static inline unsigned int ufshpb_lba_to_subregion(u64 lba)
+{
+	unsigned int offset = lba & entries_per_region_mask;
+
+	return offset >> entries_per_subregion_shift;
+}
+
+static inline unsigned int ufshpb_lba_to_entry(u64 lba)
+{
+	unsigned int offset = lba & entries_per_region_mask;
+
+	return offset & entries_per_subregion_mask;
+}
+
+static inline bool ufshpb_can_send_read_buffer(struct ufshpb_subregion *s)
+{
+	unsigned int activation_threshold = READS_READ_BUFFER_SENT *
+					    entries_per_subregion / 100;
+
+	return atomic64_read(&s->reads) - atomic64_read(&s->reads_last_trial)
+		>= activation_threshold;
+}
+
+static void __update_read_counters(struct ufshpb_dh_lun *hpb,
+				   struct ufshpb_region *r,
+				   struct ufshpb_subregion *s, u64 nr_blocks)
+{
+	enum ufshpb_state s_state;
+
+	atomic64_add(nr_blocks, &s->reads);
+	atomic64_add(nr_blocks, &r->reads);
+
+	/* normalize read counters if needed */
+	if (atomic64_read(&r->reads) >= READ_NORMALIZATION * entries_per_region)
+		queue_work(hpb->wq, &hpb->reads_normalization_work);
+
+	rcu_read_lock();
+	s_state = s->state;
+	rcu_read_unlock();
+
+	if (s_state != HPB_STATE_ACTIVE) {
+		unsigned int activation_threshold =
+			ACTIVATION_READS * entries_per_subregion / 100;
+
+		if (atomic64_read(&s->reads) > activation_threshold) {
+			set_bit(SUBREGION_EVENT_READ, &s->event);
+			schedule_work(&s->hpb_work);
+		}
+	} else if (atomic64_read(&s->writes) >= SET_AS_DIRTY &&
+		   ufshpb_can_send_read_buffer(s)) {
+		/*
+		 * subregion is active and dirty.  reads keep coming, but the
+		 * device hasn't sent update signal yet.  We better update its
+		 * L2P DB so we can benefit from HPB-READing it
+		 */
+		set_bit(SUBREGION_EVENT_UPDATE, &s->event);
+		schedule_work(&s->hpb_work);
+	}
+}
+
+static void __update_write_counters(struct ufshpb_dh_lun *hpb,
+				    struct ufshpb_region *r,
+				    struct ufshpb_subregion *s, u64 nr_blocks)
+{
+	enum ufshpb_state s_state;
+
+	atomic64_add(nr_blocks, &s->writes);
+	atomic64_add(nr_blocks, &r->writes);
+
+	/* normalize write counters if needed */
+	if (atomic64_read(&r->writes) >= WRITE_NORMALIZATION * entries_per_region)
+		queue_work(hpb->wq, &hpb->writes_normalization_work);
+
+	rcu_read_lock();
+	s_state = s->state;
+	rcu_read_unlock();
+
+	if (s_state == HPB_STATE_ACTIVE) {
+		unsigned int inactivation_threshold =
+			INACTIVATION_WRITES * entries_per_subregion / 100;
+
+		if (atomic64_read(&s->writes) > inactivation_threshold) {
+			if (test_bit(s->region, s->hpb->pinned_map))
+				set_bit(SUBREGION_EVENT_UPDATE, &s->event);
+			else
+				set_bit(SUBREGION_EVENT_DIRTY_THRESHOLD,
+					&s->event);
+			schedule_work(&s->hpb_work);
+		}
+	}
+}
+
+static void __update_rw_counters(struct ufshpb_dh_lun *hpb, u64 start_lba,
+				u64 end_lba, enum req_opf op)
+{
+	struct ufshpb_region *r;
+	struct ufshpb_subregion *s;
+	unsigned int end_region, end_subregion;
+	u64 first_lba, last_lba, nr_blocks;
+	int i, j;
+
+	i = ufshpb_lba_to_region(start_lba);
+	j = ufshpb_lba_to_subregion(start_lba);
+	r = hpb->region_tbl + i;
+	s = r->subregion_tbl + j;
+
+	end_region = ufshpb_lba_to_region(end_lba);
+	end_subregion = ufshpb_lba_to_subregion(end_lba);
+
+	if (i == end_region && j == end_subregion) {
+		first_lba = start_lba;
+		goto last_subregion;
+	}
+
+	first_lba = ((u64)i * subregions_per_region + j)
+		    * entries_per_subregion;
+	last_lba = first_lba + entries_per_subregion;
+	nr_blocks =  last_lba - start_lba;
+
+	if (op == REQ_OP_READ)
+		__update_read_counters(hpb, r, s, nr_blocks);
+	else
+		__update_write_counters(hpb, r, s, nr_blocks);
+
+	nr_blocks = entries_per_subregion;
+	while (last_lba < end_lba) {
+		first_lba += entries_per_subregion;
+		last_lba += entries_per_subregion;
+
+		i = ufshpb_lba_to_region(first_lba);
+		j = ufshpb_lba_to_subregion(first_lba);
+
+		r = hpb->region_tbl + i;
+		s = r->subregion_tbl + j;
+
+		if (op == REQ_OP_READ)
+			__update_read_counters(hpb, r, s, nr_blocks);
+		else
+			__update_write_counters(hpb, r, s, nr_blocks);
+	}
+
+last_subregion:
+	nr_blocks =  end_lba - first_lba;
+	if (op == REQ_OP_READ)
+		__update_read_counters(hpb, r, s, nr_blocks);
+	else
+		__update_write_counters(hpb, r, s, nr_blocks);
+}
+
+/* Call this on read from prep_fn */
+static bool ufshpb_test_block_dirty(struct ufshpb_dh_data *h,
+				    struct request *rq, u64 start_lba,
+				    u32 nr_blocks)
+{
+	struct ufshpb_dh_lun *hpb = h->hpb;
+	u64 end_lba = start_lba + nr_blocks;
+	unsigned int region = ufshpb_lba_to_region(start_lba);
+	unsigned int subregion = ufshpb_lba_to_subregion(start_lba);
+	struct ufshpb_region *r = hpb->region_tbl + region;
+	struct ufshpb_subregion *s = r->subregion_tbl + subregion;
+	enum ufshpb_state s_state;
+
+	__update_rw_counters(hpb, start_lba, end_lba, REQ_OP_READ);
+
+	rcu_read_lock();
+	s_state = s->state;
+	rcu_read_unlock();
+
+	if (s_state != HPB_STATE_ACTIVE)
+		return true;
+
+	return (atomic64_read(&s->writes) >= SET_AS_DIRTY);
+}
+
+/* Call this on write from prep_fn */
+static void ufshpb_set_block_dirty(struct ufshpb_dh_data *h,
+				   struct request *rq, u64 start_lba,
+				   u32 nr_blocks)
+{
+	struct ufshpb_dh_lun *hpb = h->hpb;
+
+	__update_rw_counters(hpb, start_lba, start_lba + nr_blocks,
+			     REQ_OP_WRITE);
+}
 
 static inline void ufshpb_set_write_buf_cmd(unsigned char *cmd,
 					    unsigned int region)
@@ -405,15 +609,6 @@ static int __subregion_activate(struct ufshpb_dh_lun *hpb,
 	return ret;
 }
 
-static inline bool ufshpb_can_send_read_buffer(struct ufshpb_subregion *s)
-{
-	unsigned int activation_threshold = READS_READ_BUFFER_SENT *
-					    entries_per_subregion / 100;
-
-	return atomic64_read(&s->reads) - atomic64_read(&s->reads_last_trial)
-		>= activation_threshold;
-}
-
 static int ufshpb_subregion_activate(struct ufshpb_dh_lun *hpb,
 				      struct ufshpb_region *r,
 				      struct ufshpb_subregion *s, bool init)
@@ -575,6 +770,55 @@ static void ufshpb_subregion_update(struct ufshpb_dh_lun *hpb,
 	s->state = HPB_STATE_ACTIVE;
 }
 
+static void ufshpb_reads_normalization_work_handler(struct work_struct *work)
+{
+	struct ufshpb_dh_lun *hpb = container_of(work, struct ufshpb_dh_lun,
+						 reads_normalization_work);
+	struct ufshpb_region *regions = hpb->region_tbl;
+	int i, j;
+
+	for (i = 0; i < hpb->regions_per_lun; i++) {
+		struct ufshpb_region *r = regions + i;
+		struct ufshpb_subregion *subregions = r->subregion_tbl;
+
+		for (j = 0; j < subregions_per_region; j++) {
+			struct ufshpb_subregion *s = subregions + j;
+			u64 last_trial = atomic64_read(&s->reads_last_trial);
+			u64 curr_reads = atomic64_read(&s->reads);
+
+			atomic64_set(&s->reads, curr_reads >> 1);
+			if (last_trial > curr_reads)
+				atomic64_sub(curr_reads, &s->reads_last_trial);
+			else
+				atomic64_set(&s->reads_last_trial, 0);
+		}
+
+		atomic64_set(&r->reads, atomic64_read(&r->reads) >> 1);
+	}
+}
+
+static void ufshpb_writes_normalization_work_handler(struct work_struct *work)
+{
+	struct ufshpb_dh_lun *hpb = container_of(work, struct ufshpb_dh_lun,
+						 writes_normalization_work);
+	struct ufshpb_region *regions = hpb->region_tbl;
+	int i, j;
+
+	for (i = 0; i < hpb->regions_per_lun; i++) {
+		struct ufshpb_region *r = regions + i;
+		struct ufshpb_subregion *subregions = r->subregion_tbl;
+
+		for (j = 0; j < subregions_per_region; j++) {
+			struct ufshpb_subregion *s = subregions + j;
+			u64 curr_writes = atomic64_read(&s->writes);
+
+			atomic64_set(&s->writes, curr_writes >> 1);
+		}
+
+		atomic64_set(&r->writes, atomic64_read(&r->writes) >> 1);
+	}
+}
+
 /*
  * ufshpb_dispatch - ufshpb state machine
  * make the various decisions based on region/subregion state & events
@@ -631,6 +875,9 @@ static void ufshpb_work_handler(struct work_struct *work)
 	ufshpb_dispatch(s->hpb, s->r, s);
 
 	mutex_unlock(&s->state_lock);
+
+	/* the subregion state might have changed */
+	synchronize_rcu();
 }
 
 static int ufshpb_activate_pinned_regions(struct ufshpb_dh_data *h, bool init)
@@ -679,6 +926,12 @@ static int ufshpb_activate_pinned_regions(struct ufshpb_dh_data *h, bool init)
 		set_bit(start_idx + i, hpb->pinned_map);
 	}
 
+	/*
+	 * those subregions of the pinned regions changed their state - they
+	 * are active now
+	 */
+	synchronize_rcu();
+
 	return ret;
 }
 
@@ -713,6 +966,9 @@ static void ufshpb_lun_reset_work_handler(struct work_struct *work)
 		__region_reset(hpb, r);
 	}
 
+	/* update rcu post lun reset */
+	synchronize_rcu();
+
 	clear_bit(LUN_RESET_WORK, &hpb->work_lock);
 
 	sdev_printk(KERN_INFO, hpb->sdev, "%s: lun reset [%llu]\n",
@@ -789,6 +1045,10 @@ static int ufshpb_region_tbl_init(struct ufshpb_dh_lun *hpb)
 	hpb->region_tbl = regions;
 	mutex_init(&hpb->eviction_lock);
 	INIT_WORK(&hpb->lun_reset_work, ufshpb_lun_reset_work_handler);
+	INIT_WORK(&hpb->reads_normalization_work,
+		  ufshpb_reads_normalization_work_handler);
+	INIT_WORK(&hpb->writes_normalization_work,
+		  ufshpb_writes_normalization_work_handler);
 
 	return 0;
 
@@ -875,10 +1135,94 @@ static void ufshpb_get_config(const struct ufshpb_config *c)
 	entries_per_subregion = BIT(c->subregion_size - 3);
 	order = get_order(entries_per_subregion * HPB_ENTRY_SIZE);
 
+	entries_per_region_shift = ilog2(entries_per_region);
+	entries_per_region_mask = entries_per_region - 1;
+	entries_per_subregion_shift = ilog2(entries_per_subregion);
+	entries_per_subregion_mask = entries_per_subregion - 1;
+
 	INIT_LIST_HEAD(&lh_hpb_lunp);
 	is_configured = true;
 }
 
+static void ufshpb_set_read_cmd(unsigned char *cmd, u32 lba, u64 ppa,
+				unsigned char len)
+{
+	cmd[0] = UFSHPB_READ;
+	cmd[1] = 0;
+	put_unaligned_be32(lba, &cmd[2]);
+	put_unaligned_be64(ppa, &cmd[6]);
+	cmd[14] = len;
+	cmd[15] = 0;
+}
+
+static void ufshpb_prepare_hpb_read_cmd(struct request *rq,
+					struct ufshpb_dh_lun *hpb, u64 lba,
+					u8 len)
+{
+	unsigned int region = ufshpb_lba_to_region(lba);
+	unsigned int subregion = ufshpb_lba_to_subregion(lba);
+	unsigned int entry = ufshpb_lba_to_entry(lba);
+	struct ufshpb_region *r = hpb->region_tbl + region;
+	struct ufshpb_subregion *s = r->subregion_tbl + subregion;
+	unsigned char *buf;
+	unsigned char cmnd[BLK_MAX_CDB] = {};
+	unsigned int page_index, page_offset;
+	u64 ppa;
+
+	page_index = entry / HPB_ENTRIES_PER_PAGE;
+	page_offset = entry % HPB_ENTRIES_PER_PAGE;
+
+	if (page_index > BIT(order)) {
+		/* TODO: is this even possible ??? */
+		sdev_printk(KERN_ERR, hpb->sdev,
+			    "%s: lba out of range [%llu/%u/%u/%u/%u/%u]\n",
+			    UFSHPB_NAME, lba, r->region, s->subregion, entry,
+			    page_index, page_offset);
+		return;
+	}
+
+	if (!s->mctx || !s->mctx->pages) {
+		/* subregion got inactivated */
+		return;
+	}
+
+	buf = s->mctx->pages + page_index * PAGE_SIZE;
+	ppa = (u64)buf[page_offset];
+
+	ufshpb_set_read_cmd(cmnd, (u32)lba, ppa, len);
+
+	memcpy(scsi_req(rq)->cmd, cmnd, sizeof(cmnd));
+
+	rq->cmd_flags = REQ_OP_DRV_IN;
+}
+
+/*
+ * ufshpb_prep_fn - Construct HPB_READ when possible
+ */
+static blk_status_t ufshpb_prep_fn(struct scsi_device *sdev, struct request *rq)
+{
+	struct ufshpb_dh_data *h = sdev->handler_data;
+	struct ufshpb_dh_lun *hpb = h->hpb;
+	u64 lba = sectors_to_logical(sdev, blk_rq_pos(rq));
+	u32 nr_blocks = sectors_to_logical(sdev, blk_rq_sectors(rq));
+
+	if (op_is_write(req_op(rq)) || op_is_discard(req_op(rq))) {
+		ufshpb_set_block_dirty(h, rq, lba, nr_blocks);
+		goto out;
+	}
+
+	if (req_op(rq) != REQ_OP_READ || nr_blocks > 255)
+		goto out;
+
+	if (ufshpb_test_block_dirty(h, rq, lba, nr_blocks))
+		goto out;
+
+	ufshpb_prepare_hpb_read_cmd(rq, hpb, lba, (u8)nr_blocks);
+
+out:
+	return BLK_STS_OK;
+}
+
 /*
  * ufshpb_set_params - obtain response from the device
  * @sdev: scsi device of the hpb lun
@@ -1069,6 +1413,7 @@ static struct scsi_device_handler ufshpb_dh = {
 	.attach		= ufshpb_attach,
 	.detach		= ufshpb_detach,
 	.activate	= ufshpb_activate,
+	.prep_fn	= ufshpb_prep_fn,
 	.set_params     = ufshpb_set_params,
 };
 
-- 
2.7.4



* [RFC PATCH 13/13] scsi: scsi_dh: ufshpb: Add "Cold" subregions timer
  2020-05-15 10:30 [RFC PATCH 00/13] scsi: ufs: Add HPB Support Avri Altman
                   ` (11 preceding siblings ...)
  2020-05-15 10:30 ` [RFC PATCH 12/13] scsi: dh: ufshpb: Add prep_fn handler Avri Altman
@ 2020-05-15 10:30 ` Avri Altman
  2020-05-16  3:50 ` [RFC PATCH 00/13] scsi: ufs: Add HPB Support Bart Van Assche
  13 siblings, 0 replies; 39+ messages in thread
From: Avri Altman @ 2020-05-15 10:30 UTC (permalink / raw)
  To: James E . J . Bottomley, Martin K . Petersen, linux-scsi, linux-kernel
  Cc: Bart Van Assche, alim.akhtar, asutoshd, Zang Leigang,
	Avi Shchislowski, Bean Huo, cang, stanley.chu,
	MOHAMMED RAFIQ KAMAL BASHA, Sang-yoon Oh, yongmyung lee,
	Jinyoung CHOI, Avri Altman

In order not to hang on to “cold” subregions, we shall inactivate any
subregion that has seen no READ access for a predefined amount of
time.  For that purpose we attach a timer to every active subregion,
re-arming it on every READ to expire after 500 msec.  On timer expiry
we make note of that event and call the state-machine worker to handle
it.

Timers will not be attached to a pinned region's subregions.
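The expiry decision can be sketched in user-space C (names and the
dirty threshold are illustrative; the driver uses atomics,
mod_timer() and the state-machine worker instead of return values):

```c
#include <assert.h>
#include <stdbool.h>

#define READ_TIMEOUT_EXPIRIES 100	/* allowed rewinds for a clean subregion */

/* Minimal model of the fields the timer callback looks at. */
struct subregion {
	unsigned long writes;		/* dirty when >= threshold */
	int read_timeout_expiries;	/* rewinds left before giving up */
};

/*
 * Called on each 500 msec timer expiry.  Returns true when the subregion
 * should be handed to the state machine for inactivation; false when the
 * timer should simply be re-armed ("rewound") for a still-clean subregion.
 */
static bool read_timer_expired(struct subregion *s,
			       unsigned long dirty_threshold)
{
	if (s->writes < dirty_threshold && --s->read_timeout_expiries > 0)
		return false;
	return true;
}
```

A clean but idle subregion thus survives a bounded number of expiries
before being declared "cold", while a dirty one is inactivated on the
first expiry.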

Signed-off-by: Avri Altman <avri.altman@wdc.com>
---
 drivers/scsi/device_handler/scsi_dh_ufshpb.c | 47 +++++++++++++++++++++++++++-
 1 file changed, 46 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/device_handler/scsi_dh_ufshpb.c b/drivers/scsi/device_handler/scsi_dh_ufshpb.c
index 04e3d56..e89dd30 100644
--- a/drivers/scsi/device_handler/scsi_dh_ufshpb.c
+++ b/drivers/scsi/device_handler/scsi_dh_ufshpb.c
@@ -123,6 +123,8 @@ struct ufshpb_subregion {
 	unsigned long event;
 	struct mutex state_lock;
 	struct work_struct hpb_work;
+
+	struct timer_list read_timer;
 };
 
 /**
@@ -210,6 +212,31 @@ static unsigned int entries_per_region_mask;
 static unsigned int entries_per_subregion_shift;
 static unsigned int entries_per_subregion_mask;
 
+static void ufshpb_read_timeout(struct timer_list *t)
+{
+	struct ufshpb_subregion *s = from_timer(s, t, read_timer);
+	enum ufshpb_state s_state;
+
+	rcu_read_lock();
+	s_state = s->state;
+	rcu_read_unlock();
+
+	if (WARN_ON_ONCE(s_state != HPB_STATE_ACTIVE))
+		return;
+
+	if (atomic64_read(&s->writes) < SET_AS_DIRTY &&
+	    !atomic_dec_and_test(&s->read_timeout_expiries)) {
+		/* rewind the timer for clean subregions */
+		mod_timer(&s->read_timer,
+			  jiffies + msecs_to_jiffies(READ_TIMEOUT_MSEC));
+		return;
+	}
+
+	set_bit(SUBREGION_EVENT_TIMER, &s->event);
+
+	queue_work(s->hpb->wq, &s->hpb_work);
+}
+
 static inline unsigned int ufshpb_lba_to_region(u64 lba)
 {
 	return lba >> entries_per_region_shift;
@@ -376,6 +403,7 @@ static bool ufshpb_test_block_dirty(struct ufshpb_dh_data *h,
 	struct ufshpb_region *r = hpb->region_tbl + region;
 	struct ufshpb_subregion *s = r->subregion_tbl + subregion;
 	enum ufshpb_state s_state;
+	bool is_dirty;
 
 	__update_rw_counters(hpb, start_lba, end_lba, REQ_OP_READ);
 
@@ -386,7 +414,14 @@ static bool ufshpb_test_block_dirty(struct ufshpb_dh_data *h,
 	if (s_state != HPB_STATE_ACTIVE)
 		return true;
 
-	return (atomic64_read(&s->writes) >= SET_AS_DIRTY);
+	is_dirty = (atomic64_read(&s->writes) >= SET_AS_DIRTY);
+	if (!is_dirty && !test_bit(region, hpb->pinned_map)) {
+		mod_timer(&s->read_timer,
+			  jiffies + msecs_to_jiffies(READ_TIMEOUT_MSEC));
+		atomic_set(&s->read_timeout_expiries, READ_TIMEOUT_EXPIRIES);
+	}
+
+	return is_dirty;
 }
 
 /* Call this on write from prep_fn */
@@ -456,6 +491,8 @@ static void __subregion_inactivate(struct ufshpb_dh_lun *hpb,
 	list_add(&s->mctx->list, &hpb->lh_map_ctx);
 	spin_unlock(&hpb->map_list_lock);
 	s->mctx = NULL;
+
+	del_timer(&s->read_timer);
 }
 
 static void ufshpb_subregion_inactivate(struct ufshpb_dh_lun *hpb,
@@ -602,6 +639,13 @@ static int __subregion_activate(struct ufshpb_dh_lun *hpb,
 	atomic64_sub(atomic64_read(&s->writes), &r->writes);
 	atomic64_set(&s->writes, 0);
 
+	if (!test_bit(r->region, hpb->pinned_map)) {
+		timer_setup(&s->read_timer, ufshpb_read_timeout, 0);
+		mod_timer(&s->read_timer,
+			  jiffies + msecs_to_jiffies(READ_TIMEOUT_MSEC));
+		atomic_set(&s->read_timeout_expiries, READ_TIMEOUT_EXPIRIES);
+	}
+
 out:
 	if (ret)
 		__subregion_inactivate(hpb, r, s);
@@ -1377,6 +1421,7 @@ static void ufshpb_region_tbl_remove(struct ufshpb_dh_lun *hpb)
 							r->subregion_tbl + j;
 
 				cancel_work_sync(&s->hpb_work);
+				del_timer(&s->read_timer);
 			}
 
 			kfree(r->subregion_tbl);
-- 
2.7.4



* Re: [RFC PATCH 02/13] scsi: ufshpb: Init part I - Read HPB config
  2020-05-15 10:30 ` [RFC PATCH 02/13] scsi: ufshpb: Init part I - Read HPB config Avri Altman
@ 2020-05-15 15:33   ` Randy Dunlap
  2020-05-16  1:46   ` Bart Van Assche
  2020-05-16  1:57   ` Bart Van Assche
  2 siblings, 0 replies; 39+ messages in thread
From: Randy Dunlap @ 2020-05-15 15:33 UTC (permalink / raw)
  To: Avri Altman, James E . J . Bottomley, Martin K . Petersen,
	linux-scsi, linux-kernel
  Cc: Bart Van Assche, alim.akhtar, asutoshd, Zang Leigang,
	Avi Shchislowski, Bean Huo, cang, stanley.chu,
	MOHAMMED RAFIQ KAMAL BASHA, Sang-yoon Oh, yongmyung lee,
	Jinyoung CHOI

Hi--

On 5/15/20 3:30 AM, Avri Altman wrote:
> diff --git a/drivers/scsi/ufs/Kconfig b/drivers/scsi/ufs/Kconfig
> index e2005ae..a540919 100644
> --- a/drivers/scsi/ufs/Kconfig
> +++ b/drivers/scsi/ufs/Kconfig
> @@ -160,3 +160,15 @@ config SCSI_UFS_BSG
>  
>  	  Select this if you need a bsg device node for your UFS controller.
>  	  If unsure, say N.
> +
> +config SCSI_UFS_HPB
> +	bool "Support UFS Host Performance Booster (HPB)"
> +        depends on SCSI_UFSHCD
> +        help
> +	  A UFS feature targeted to improve random read performance.  It uses
> +	  the host’s system memory as a cache for L2P map data, so that both
> +	  physical block address (PBA) and logical block address (LBA) can be
> +	  delivered in HPB read command.
> +
> +          Select this to enable this feature.
> +          If unsure, say N.

Please follow Documentation/process/coding-style.rst for Kconfig files:

Lines under a ``config`` definition are indented with
one tab, while help text is indented an additional two spaces.

I.e., not a mixture.

thanks.
-- 
~Randy



* Re: [RFC PATCH 02/13] scsi: ufshpb: Init part I - Read HPB config
  2020-05-15 10:30 ` [RFC PATCH 02/13] scsi: ufshpb: Init part I - Read HPB config Avri Altman
  2020-05-15 15:33   ` Randy Dunlap
@ 2020-05-16  1:46   ` Bart Van Assche
  2020-05-16  1:57   ` Bart Van Assche
  2 siblings, 0 replies; 39+ messages in thread
From: Bart Van Assche @ 2020-05-16  1:46 UTC (permalink / raw)
  To: Avri Altman, James E . J . Bottomley, Martin K . Petersen,
	linux-scsi, linux-kernel
  Cc: alim.akhtar, asutoshd, Zang Leigang, Avi Shchislowski, Bean Huo,
	cang, stanley.chu, MOHAMMED RAFIQ KAMAL BASHA, Sang-yoon Oh,
	yongmyung lee, Jinyoung CHOI

On 2020-05-15 03:30, Avri Altman wrote:
> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> index 426073a..bffe699 100644
> --- a/drivers/scsi/ufs/ufshcd.c
> +++ b/drivers/scsi/ufs/ufshcd.c
> @@ -50,6 +50,7 @@
>  #include "ufs_bsg.h"
>  #include <asm/unaligned.h>
>  #include <linux/blkdev.h>
> +#include "ufshpb.h"
>  
>  #define CREATE_TRACE_POINTS
>  #include <trace/events/ufs.h>
> @@ -7341,6 +7342,9 @@ static int ufshcd_add_lus(struct ufs_hba *hba)
>  		hba->clk_scaling.is_allowed = true;
>  	}
>  
> +	if (ufshcd_is_hpb_supported(hba))
> +		ufshpb_probe(hba);
> +
>  	ufs_bsg_probe(hba);
>  	scsi_scan_host(hba->host);
>  	pm_runtime_put_sync(hba->dev);

This looks weird to me because of the following reasons:
- If there are direct calls from the ufshcd module into the ufshpb
  module and if there are direct calls from the ufshpb module into the
  ufshcd module, why has ufshpb been implemented as a kernel module? Or
  in other words, will it ever be possible to load the ufshcd module
  without loading the ufshpb module?
- Patch 3/13 makes ufshpb a device handler. There are no direct calls
  from any upstream SCSI LLD to any upstream device handler. However,
  this patch adds a direct call from the ufshcd module to the ufshpb
  module which is a device handler.

Bart.


* Re: [RFC PATCH 03/13] scsi: scsi_dh: Introduce scsi_dh_ufshpb
  2020-05-15 10:30 ` [RFC PATCH 03/13] scsi: scsi_dh: Introduce scsi_dh_ufshpb Avri Altman
@ 2020-05-16  1:48   ` Bart Van Assche
  0 siblings, 0 replies; 39+ messages in thread
From: Bart Van Assche @ 2020-05-16  1:48 UTC (permalink / raw)
  To: Avri Altman, James E . J . Bottomley, Martin K . Petersen,
	linux-scsi, linux-kernel
  Cc: alim.akhtar, asutoshd, Zang Leigang, Avi Shchislowski, Bean Huo,
	cang, stanley.chu, MOHAMMED RAFIQ KAMAL BASHA, Sang-yoon Oh,
	yongmyung lee, Jinyoung CHOI

On 2020-05-15 03:30, Avri Altman wrote:
> +static int ufshpb_attach(struct scsi_device *sdev)
> +{
> +	struct ufshpb_dh_data *h;
> +
> +	h = kzalloc(sizeof(*h), GFP_KERNEL);
> +	if (!h)
> +		return SCSI_DH_NOMEM;
> +
> +	sdev_printk(KERN_INFO, sdev, "%s: attached to sdev (lun) %llu\n",
> +		    UFSHPB_NAME, sdev->lun);
> +
> +	sdev->handler_data = h;
> +
> +	return SCSI_DH_OK;
> +}

I think that all other SCSI device handlers check in their .attach
function whether the @sdev SCSI device is supported by the device
handler. I don't see any such check in the above function?

Bart.


* Re: [RFC PATCH 04/13] scsi: ufs: ufshpb: Init part II - Attach scsi device
  2020-05-15 10:30 ` [RFC PATCH 04/13] scsi: ufs: ufshpb: Init part II - Attach scsi device Avri Altman
@ 2020-05-16  1:52   ` Bart Van Assche
  0 siblings, 0 replies; 39+ messages in thread
From: Bart Van Assche @ 2020-05-16  1:52 UTC (permalink / raw)
  To: Avri Altman, James E . J . Bottomley, Martin K . Petersen,
	linux-scsi, linux-kernel
  Cc: alim.akhtar, asutoshd, Zang Leigang, Avi Shchislowski, Bean Huo,
	cang, stanley.chu, MOHAMMED RAFIQ KAMAL BASHA, Sang-yoon Oh,
	yongmyung lee, Jinyoung CHOI

On 2020-05-15 03:30, Avri Altman wrote:
> @@ -4628,6 +4628,9 @@ static int ufshcd_slave_alloc(struct scsi_device *sdev)
>  
>  	ufshcd_get_lu_power_on_wp_status(hba, sdev);
>  
> +	if (ufshcd_is_hpb_supported(hba))
> +		ufshpb_attach_sdev(hba, sdev);
> +
>  	return 0;
>  }

This approach is unusual because:
- Direct calls from the ufshcd module into the ufshpb module make it
  impossible to unload the latter kernel module.
- No other SCSI LLD calls directly into a device handler.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 02/13] scsi: ufshpb: Init part I - Read HPB config
  2020-05-15 10:30 ` [RFC PATCH 02/13] scsi: ufshpb: Init part I - Read HPB config Avri Altman
  2020-05-15 15:33   ` Randy Dunlap
  2020-05-16  1:46   ` Bart Van Assche
@ 2020-05-16  1:57   ` Bart Van Assche
  2 siblings, 0 replies; 39+ messages in thread
From: Bart Van Assche @ 2020-05-16  1:57 UTC (permalink / raw)
  To: Avri Altman, James E . J . Bottomley, Martin K . Petersen,
	linux-scsi, linux-kernel
  Cc: alim.akhtar, asutoshd, Zang Leigang, Avi Shchislowski, Bean Huo,
	cang, stanley.chu, MOHAMMED RAFIQ KAMAL BASHA, Sang-yoon Oh,
	yongmyung lee, Jinyoung CHOI

On 2020-05-15 03:30, Avri Altman wrote:
> +struct ufshpb_lun_config *ufshpb_luns_conf;
> +struct ufshpb_lun *ufshpb_luns;
> +static unsigned long ufshpb_lun_map[BITS_TO_LONGS(UFSHPB_MAX_LUNS)];
> +static u8 ufshpb_lun_lookup[UFSHPB_MAX_LUNS];
> +
> +/**
> + * ufshpb_remove - ufshpb cleanup
> + *
> + * Should be called when unloading the driver.
> + */
> +void ufshpb_remove(struct ufs_hba *hba)
> +{
> +	kfree(ufshpb_conf);
> +	kfree(ufshpb_luns_conf);
> +	kfree(ufshpb_luns);
> +	ufshcd_query_flag(hba, UPIU_QUERY_OPCODE_SET_FLAG,
> +			  QUERY_FLAG_IDN_HPB_RESET, 0, NULL);
> +}
> +
> +static int ufshpb_hpb_init(void)
> +{
> +	u8 num_hpb_luns = ufshpb_conf->num_hpb_luns;
> +	int i;
> +
> +	ufshpb_luns = kcalloc(num_hpb_luns, sizeof(*ufshpb_luns), GFP_KERNEL);
> +	if (!ufshpb_luns)
> +		return -ENOMEM;
> +
> +	for (i = 0; i < num_hpb_luns; i++) {
> +		struct ufshpb_lun *hpb = ufshpb_luns + i;
> +
> +		hpb->lun = (ufshpb_luns_conf + i)->lun;
> +	}
> +
> +	return 0;
> +}

Do the ufshpb_lun... data structures perhaps duplicate SCSI core data
structures? If so, please don't introduce duplicates of SCSI core data
structures.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 05/13] scsi: ufs: ufshpb: Disable HPB if no HPB-enabled luns
  2020-05-15 10:30 ` [RFC PATCH 05/13] scsi: ufs: ufshpb: Disable HPB if no HPB-enabled luns Avri Altman
@ 2020-05-16  2:02   ` Bart Van Assche
  0 siblings, 0 replies; 39+ messages in thread
From: Bart Van Assche @ 2020-05-16  2:02 UTC (permalink / raw)
  To: Avri Altman, James E . J . Bottomley, Martin K . Petersen,
	linux-scsi, linux-kernel
  Cc: alim.akhtar, asutoshd, Zang Leigang, Avi Shchislowski, Bean Huo,
	cang, stanley.chu, MOHAMMED RAFIQ KAMAL BASHA, Sang-yoon Oh,
	yongmyung lee, Jinyoung CHOI

On 2020-05-15 03:30, Avri Altman wrote:
> @@ -368,6 +390,8 @@ int ufshpb_probe(struct ufs_hba *hba)
>  	if (ret)
>  		goto out;
>  
> +	INIT_DELAYED_WORK(&hba->hpb_disable_work, ufshpb_disable_work);
> +	schedule_delayed_work(&hba->hpb_disable_work, 60 * HZ);
>  out:
>  	kfree(dev_desc);
>  	if (ret) {

Calling INIT_DELAYED_WORK() just before schedule_delayed_work() is a bad
practice. If cancel_delayed_work() gets called before
INIT_DELAYED_WORK() then it will encounter something that it does not
expect. If cancel_delayed_work() and INIT_DELAYED_WORK() get called
concurrently then that will trigger a race condition. It is better to
call INIT_DELAYED_WORK() from the context that allocates the data
structure in which the work structure is embedded.

Thanks,

Bart.



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 06/13] scsi: scsi_dh: ufshpb: Prepare for L2P cache management
  2020-05-15 10:30 ` [RFC PATCH 06/13] scsi: scsi_dh: ufshpb: Prepare for L2P cache management Avri Altman
@ 2020-05-16  2:13   ` Bart Van Assche
  0 siblings, 0 replies; 39+ messages in thread
From: Bart Van Assche @ 2020-05-16  2:13 UTC (permalink / raw)
  To: Avri Altman, James E . J . Bottomley, Martin K . Petersen,
	linux-scsi, linux-kernel
  Cc: alim.akhtar, asutoshd, Zang Leigang, Avi Shchislowski, Bean Huo,
	cang, stanley.chu, MOHAMMED RAFIQ KAMAL BASHA, Sang-yoon Oh,
	yongmyung lee, Jinyoung CHOI

On 2020-05-15 03:30, Avri Altman wrote:
> +static int ufshpb_mempool_init(struct ufshpb_dh_lun *hpb)
> +{
> +	unsigned int max_active_subregions = hpb->max_active_regions *
> +		subregions_per_region;
> +	int i;
> +
> +	INIT_LIST_HEAD(&hpb->lh_map_ctx);
> +	spin_lock_init(&hpb->map_list_lock);
> +
> +	for (i = 0 ; i < max_active_subregions; i++) {
> +		struct ufshpb_map_ctx *mctx =
> +			kzalloc(sizeof(struct ufshpb_map_ctx), GFP_KERNEL);
> +
> +		if (!mctx) {
> +			/*
> +			 * mctxs already added in lh_map_ctx will be removed in
> +			 * detach
> +			 */
> +			return -ENOMEM;
> +		}
> +
> +		/* actual page allocation is done upon subregion activation */
> +
> +		INIT_LIST_HEAD(&mctx->list);
> +		list_add(&mctx->list, &hpb->lh_map_ctx);
> +	}
> +
> +	return 0;
> +
> +}

Could kmem_cache_create() have been used instead of implementing yet
another memory pool?

> +static int ufshpb_region_tbl_init(struct ufshpb_dh_lun *hpb)
> +{
> +	struct ufshpb_region *regions;
> +	int i, j;
> +
> +	regions = kcalloc(hpb->regions_per_lun, sizeof(*regions), GFP_KERNEL);
> +	if (!regions)
> +		return -ENOMEM;
> +
> +	atomic_set(&hpb->active_regions, 0);
> +
> +	for (i = 0 ; i < hpb->regions_per_lun; i++) {
> +		struct ufshpb_region *r = regions + i;
> +		struct ufshpb_subregion *subregions;
> +
> +		subregions = kcalloc(subregions_per_region, sizeof(*subregions),
> +				     GFP_KERNEL);
> +		if (!subregions)
> +			goto release_mem;
> +
> +		for (j = 0; j < subregions_per_region; j++) {
> +			struct ufshpb_subregion *s = subregions + j;
> +
> +			s->hpb = hpb;
> +			s->r = r;
> +			s->region = i;
> +			s->subregion = j;
> +		}
> +
> +		r->subregion_tbl = subregions;
> +		r->hpb = hpb;
> +		r->region = i;
> +	}
> +
> +	hpb->region_tbl = regions;
> +
> +	return 0;

Could kvmalloc() have been used to allocate multiple subregion data
structures instead of calling kcalloc() multiple times?

> +	spin_lock(&hpb->map_list_lock);
> +
> +	list_for_each_entry_safe(mctx, next, &hpb->lh_map_ctx, list) {
> +		list_del(&mctx->list);
> +		kfree(mctx);
> +	}
> +
> +	spin_unlock(&hpb->map_list_lock);

Spinlocks should be held during a short time. I'm not sure that's the
case for the above loop.
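
A common way to keep that critical section short is to splice the whole
list onto a local head while holding the lock and free the entries
afterwards with the lock dropped. Below is a user-space sketch of the
idea; the `map_ctx` type, the singly-linked stand-in for the kernel's
list API, and the function names are all illustrative, not the driver's
actual code:

```c
#include <stdlib.h>

struct map_ctx {
	struct map_ctx *next;	/* stand-in for the kernel's list_head linkage */
};

/*
 * Detach the entire list while "holding the lock" (elided here), so the
 * lock is held for O(1) time instead of for the whole teardown loop.
 */
static struct map_ctx *splice_out(struct map_ctx **head)
{
	struct map_ctx *list = *head;

	*head = NULL;		/* the list is now private to this thread */
	return list;
}

/* Free the detached entries with no lock held; returns the count freed. */
static size_t free_detached(struct map_ctx *list)
{
	size_t n = 0;

	while (list) {
		struct map_ctx *next = list->next;

		free(list);
		list = next;
		n++;
	}
	return n;
}
```

In the kernel the same shape is usually expressed with list_splice_init()
under the spinlock followed by list_for_each_entry_safe() on the local
head.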

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 07/13] scsi: scsi_dh: ufshpb: Add ufshpb state machine
  2020-05-15 10:30 ` [RFC PATCH 07/13] scsi: scsi_dh: ufshpb: Add ufshpb state machine Avri Altman
@ 2020-05-16  2:44   ` Bart Van Assche
  0 siblings, 0 replies; 39+ messages in thread
From: Bart Van Assche @ 2020-05-16  2:44 UTC (permalink / raw)
  To: Avri Altman, James E . J . Bottomley, Martin K . Petersen,
	linux-scsi, linux-kernel
  Cc: alim.akhtar, asutoshd, Zang Leigang, Avi Shchislowski, Bean Huo,
	cang, stanley.chu, MOHAMMED RAFIQ KAMAL BASHA, Sang-yoon Oh,
	yongmyung lee, Jinyoung CHOI

On 2020-05-15 03:30, Avri Altman wrote:
> @@ -17,6 +18,13 @@
>  
>  #define UFSHPB_NAME	"ufshpb"
>  
> +#define UFSHPB_WRITE_BUFFER (0xfa)
> +#define WRITE_BUFFER_TIMEOUT (3 * HZ)
> +#define WRITE_BUFFER_RETRIES (3)
> +#define UFSHPB_READ_BUFFER (0xf9)
> +#define READ_BUFFER_TIMEOUT (3 * HZ)
> +#define READ_BUFFER_RETRIES (3)

Parentheses around expressions are normal but parentheses around
constants are unusual. I think the parentheses around constants can be
left out.

> +#define to_subregion() (container_of(work, struct ufshpb_subregion, hpb_work))

Could this have been defined as an inline function?

> @@ -76,6 +118,7 @@ struct ufshpb_subregion {
>   * @writes - sum over subregions @writes
>   * @region - region index
>   * @active_subregions - actual active subregions
> + * @evicted - to indicated if this region is currently being evicted
>   */
>  struct ufshpb_region {
>  	struct ufshpb_subregion *subregion_tbl;
> @@ -85,6 +128,7 @@ struct ufshpb_region {
>  	unsigned int region;
>  
>  	atomic_t active_subregions;
> +	atomic_t evicted;
>  };

Declaring a state variable as atomic_t is unusual. How are changes of
the @evicted member variable serialized?

>  /**
> @@ -93,6 +137,7 @@ struct ufshpb_region {
>   * @lh_map_ctx - list head of mapping context
>   * @map_list_lock - to protect mapping context list operations
>   * @region_tbl - regions/subregions table
> + * @pinned_map - to mark pinned regions
>   * @sdev - scsi device for that lun
>   * @regions_per_lun
>   * @subregions_per_lun - lun size is not guaranteed to be region aligned
> @@ -105,6 +150,7 @@ struct ufshpb_dh_lun {
>  	struct list_head lh_map_ctx;
>  	spinlock_t map_list_lock;
>  	struct ufshpb_region *region_tbl;
> +	unsigned long *pinned_map;
>  	struct scsi_device *sdev;
>  
>  	unsigned int regions_per_lun;
> @@ -113,6 +159,10 @@ struct ufshpb_dh_lun {
>  	unsigned int max_active_regions;
>  
>  	atomic_t active_regions;
> +
> +	struct mutex eviction_lock;
> +
> +	struct workqueue_struct *wq;
>  };

Please document what the eviction_lock protects.

> +static inline void ufshpb_set_write_buf_cmd(unsigned char *cmd,
> +					    unsigned int region)
> +{
> +	cmd[0] = UFSHPB_WRITE_BUFFER;
> +	cmd[1] = 0x01;
> +	put_unaligned_be16(region, &cmd[2]);
> +}

Please follow the example of the sd driver and use the verb "setup"
instead of "set" for functions that initialize a SCSI CDB.

> +static int ufshpb_submit_write_buf_cmd(struct scsi_device *sdev,
> +				       unsigned int region)
> +{
> +	unsigned char cmd[10] = {};
> +	struct scsi_sense_hdr sshdr = {};
> +	u64 flags = REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT |
> +		    REQ_FAILFAST_DRIVER;
> +	int timeout = WRITE_BUFFER_TIMEOUT;
> +	int cmd_retries = WRITE_BUFFER_RETRIES;
> +	int ret = 0;
> +
> +	ufshpb_set_write_buf_cmd(cmd, region);
> +
> +	ret = scsi_execute(sdev, cmd, DMA_NONE, NULL, 0, NULL, &sshdr,
> +			   timeout, cmd_retries, flags, 0, NULL);
> +
> +	/* HPB spec does not define any error handling */
> +	sdev_printk(KERN_INFO, sdev, "%s: WRITE_BUFFER %s result %d\n",
> +		    UFSHPB_NAME, ret ? "failed" : "succeeded", ret);
> +
> +	return ret;
> +}

I don't think that unconditionally printing the result of the WRITE
BUFFER command is acceptable. How about only reporting failures?

> +static void ufshpb_set_read_buf_cmd(unsigned char *cmd, unsigned int region,
> +				    unsigned int subregion,
> +				    unsigned int alloc_len)
> +{
> +	cmd[0] = UFSHPB_READ_BUFFER;
> +	cmd[1] = 0x01;
> +	put_unaligned_be16(region, &cmd[2]);
> +	put_unaligned_be16(subregion, &cmd[4]);
> +
> +	cmd[6] = alloc_len >> 16;
> +	cmd[7] = (alloc_len >> 8) & 0xff;
> +	cmd[8] = alloc_len & 0xff;
> +	cmd[9] = 0x00;
> +}

Please use put_unaligned_be24() instead of open-coding it.
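
put_unaligned_be24() stores the low 24 bits of a value in big-endian
byte order, which is exactly what the three open-coded shifts above do.
A user-space sketch of the equivalence; `put_be24` is a hand-rolled
stand-in for the kernel helper and `set_alloc_len` is an illustrative
wrapper, not the driver's function:

```c
#include <stdint.h>

/* Stand-in for the kernel's put_unaligned_be24(): low 24 bits, big-endian. */
static void put_be24(uint32_t val, uint8_t *p)
{
	p[0] = (val >> 16) & 0xff;
	p[1] = (val >> 8) & 0xff;
	p[2] = val & 0xff;
}

/* cmd[6..8] of the READ BUFFER CDB carries the 24-bit allocation length. */
static void set_alloc_len(uint8_t *cmd, uint32_t alloc_len)
{
	put_be24(alloc_len, &cmd[6]);
}
```

With the helper, the four assignments in ufshpb_set_read_buf_cmd()
collapse into a single, self-describing call.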

> +static int ufshpb_subregion_alloc_pages(struct ufshpb_dh_lun *hpb,
> +					struct ufshpb_subregion *s)
> +{
> +	struct ufshpb_map_ctx *mctx;
> +
> +	spin_lock(&hpb->map_list_lock);
> +	mctx = list_first_entry_or_null(&hpb->lh_map_ctx,
> +					struct ufshpb_map_ctx, list);
> +	if (!mctx) {
> +		spin_unlock(&hpb->map_list_lock);
> +		return -EINVAL;
> +	}
> +
> +	list_del_init(&mctx->list);
> +	spin_unlock(&hpb->map_list_lock);
> +
> +	s->mctx = mctx;
> +	mctx->pages = (char *)__get_free_pages(GFP_KERNEL, order);
> +	if (!mctx->pages)
> +		return -ENOMEM;
> +
> +	return 0;
> +}

Relying on higher order pages is not acceptable because memory gets
fragmented easily. See also
https://elinux.org/images/a/a8/Controlling_Linux_Memory_Fragmentation_and_Higher_Order_Allocation_Failure-_Analysis%2C_Observations_and_Results.pdf.

> +	hpb->pinned_map = kcalloc(BITS_TO_LONGS(hpb->regions_per_lun),
> +				  sizeof(unsigned long), GFP_KERNEL);

Is this perhaps an open-coded version of bitmap_alloc()? If so, please
use bitmap_alloc() instead.
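
For reference, bitmap_alloc()/bitmap_zalloc() perform exactly this
BITS_TO_LONGS() sizing internally. A user-space sketch of the arithmetic
involved; BITS_PER_LONG, BITS_TO_LONGS, and `bitmap_zalloc_sketch` are
re-implemented here purely for illustration:

```c
#include <stdlib.h>
#include <limits.h>

#define BITS_PER_LONG		(CHAR_BIT * sizeof(unsigned long))
#define BITS_TO_LONGS(nbits)	(((nbits) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/*
 * What bitmap_zalloc(nbits, GFP_KERNEL) boils down to: a zeroed array of
 * unsigned longs large enough to hold nbits bits.
 */
static unsigned long *bitmap_zalloc_sketch(unsigned long nbits)
{
	return calloc(BITS_TO_LONGS(nbits), sizeof(unsigned long));
}
```

Using the bitmap API also pairs the allocation with bitmap_free() and
makes the intent obvious at the call site.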

> +	snprintf(wq_name, ARRAY_SIZE(wq_name), "ufshpb_wq_%d", sdev->id);
> +	wq = alloc_workqueue(wq_name, WQ_HIGHPRI, WQ_MAX_ACTIVE);
> +	if (!wq) {
> +		ret = -ENOMEM;
> +		goto out_free;
> +	}

What is the purpose of the ufshpb_wq_%d workqueues? Why to allocate
dedicated workqueues instead of using one of the existing system
workqueues? If the scsi_execute() calls would be changed into
asynchronous SCSI command submission, would these workqueues still be
necessary?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 09/13] scsi: ufshpb: Add response API
  2020-05-15 10:30 ` [RFC PATCH 09/13] scsi: ufshpb: Add response API Avri Altman
@ 2020-05-16  3:06   ` Bart Van Assche
  0 siblings, 0 replies; 39+ messages in thread
From: Bart Van Assche @ 2020-05-16  3:06 UTC (permalink / raw)
  To: Avri Altman, James E . J . Bottomley, Martin K . Petersen,
	linux-scsi, linux-kernel
  Cc: alim.akhtar, asutoshd, Zang Leigang, Avi Shchislowski, Bean Huo,
	cang, stanley.chu, MOHAMMED RAFIQ KAMAL BASHA, Sang-yoon Oh,
	yongmyung lee, Jinyoung CHOI

On 2020-05-15 03:30, Avri Altman wrote:
> +#define RSP_DATA_SEG_LEN (sizeof(struct ufshpb_sense_data))

The name of this macro is almost as long as the expression it replaces.
It may make the code easier to read by dropping this macro and using the
sizeof() expression directly.

> +	struct tasklet_struct rsp_tasklet;

Why a tasklet instead of e.g. a work_struct? Tasklets can cause nasty
problems, e.g. CPU lockup complaints if too much work is done in tasklet
context.

> +static void ufshpb_dh_notify(struct ufshpb_lun *hpb,
> +			     struct ufshpb_sense_data *sense)
> +{
> +	struct ufs_hba *hba = shost_priv(hpb->sdev->host);
> +
> +	spin_lock(hba->host->host_lock);
> +
> +	if (scsi_device_get(hpb->sdev)) {
> +		spin_unlock(hba->host->host_lock);
> +		return;
> +	}
> +
> +	scsi_dh_set_params(hpb->sdev->request_queue, (const char *)sense);
> +
> +	scsi_device_put(hpb->sdev);
> +
> +	spin_unlock(hba->host->host_lock);
> +}

To me this looks like slight abuse of the scsi_dh_set_params() function.
The documentation of that function mentions clearly that the second
argument is an ASCII string and not e.g. sense data.

Has this driver been tested on a system with lockdep enabled? I don't
think that it is acceptable to use spin_lock() in tasklet context.

> +static void ufshpb_tasklet_fn(unsigned long priv)
> +{
> +	struct ufshpb_lun *hpb = (struct ufshpb_lun *)priv;
> +	struct ufshpb_rsp_element *rsp_elem = NULL;
> +	unsigned long flags;
> +
> +	while (1) {
> +		spin_lock_irqsave(&hpb->rsp_list_lock, flags);
> +		rsp_elem = ufshpb_get_rsp_elem(hpb, &hpb->lh_rsp);
> +		spin_unlock_irqrestore(&hpb->rsp_list_lock, flags);
> +
> +		if (!rsp_elem)
> +			return;
> +
> +		ufshpb_dh_notify(hpb, &rsp_elem->sense_data);
> +
> +		spin_lock_irqsave(&hpb->rsp_list_lock, flags);
> +		list_add_tail(&rsp_elem->list, &hpb->lh_rsp_free);
> +		spin_unlock_irqrestore(&hpb->rsp_list_lock, flags);
> +	}
> +}

Please schedule work instead of using tasklet context.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 11/13] scsi: Allow device handler set their own CDB
  2020-05-15 10:30 ` [RFC PATCH 11/13] scsi: Allow device handler set their own CDB Avri Altman
@ 2020-05-16  3:19   ` Bart Van Assche
  0 siblings, 0 replies; 39+ messages in thread
From: Bart Van Assche @ 2020-05-16  3:19 UTC (permalink / raw)
  To: Avri Altman, James E . J . Bottomley, Martin K . Petersen,
	linux-scsi, linux-kernel
  Cc: alim.akhtar, asutoshd, Zang Leigang, Avi Shchislowski, Bean Huo,
	cang, stanley.chu, MOHAMMED RAFIQ KAMAL BASHA, Sang-yoon Oh,
	yongmyung lee, Jinyoung CHOI

On 2020-05-15 03:30, Avri Altman wrote:
> Allow scsi device handler handle their own CDB and ship it down the
> stream of scsi passthrough command setup flow, without any further
> interventions.
> 
> This is useful for setting DRV-OP that implements vendor commands via
> the scsi device handlers framework.
> 
> Signed-off-by: Avri Altman <avri.altman@wdc.com>
> ---
>  drivers/scsi/scsi_lib.c | 5 +++--
>  drivers/scsi/sd.c       | 9 +++++++++
>  2 files changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 6b6dd40..4e98714 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1148,14 +1148,15 @@ static blk_status_t scsi_setup_fs_cmnd(struct scsi_device *sdev,
>  {
>  	struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(req);
>  
> +	cmd->cmnd = scsi_req(req)->cmd = scsi_req(req)->__cmd;
> +	memset(cmd->cmnd, 0, BLK_MAX_CDB);
> +
>  	if (unlikely(sdev->handler && sdev->handler->prep_fn)) {
>  		blk_status_t ret = sdev->handler->prep_fn(sdev, req);
>  		if (ret != BLK_STS_OK)
>  			return ret;
>  	}
>  
> -	cmd->cmnd = scsi_req(req)->cmd = scsi_req(req)->__cmd;
> -	memset(cmd->cmnd, 0, BLK_MAX_CDB);
>  	return scsi_cmd_to_driver(cmd)->init_command(cmd);
>  }
>  
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index a793cb0..246bef8 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -1221,6 +1221,14 @@ static blk_status_t sd_setup_read_write_cmnd(struct scsi_cmnd *cmd)
>  	} else if (sdp->use_16_for_rw || (nr_blocks > 0xffff)) {
>  		ret = sd_setup_rw16_cmnd(cmd, write, lba, nr_blocks,
>  					 protect | fua);
> +	} else if (unlikely(sdp->handler && blk_rq_is_private(rq))) {
> +		/*
> +		 * scsi device handler that implements vendor commands -
> +		 * the command was already constructed in the device handler's
> +		 * prep_fn, so no need to do anything here
> +		 */
> +		rq->cmd_flags = REQ_OP_READ;
> +		ret = BLK_STS_OK;
>  	} else if ((nr_blocks > 0xff) || (lba > 0x1fffff) ||
>  		   sdp->use_10_for_rw || protect) {
>  		ret = sd_setup_rw10_cmnd(cmd, write, lba, nr_blocks,
> @@ -1285,6 +1293,7 @@ static blk_status_t sd_init_command(struct scsi_cmnd *cmd)
>  		return sd_setup_write_same_cmnd(cmd);
>  	case REQ_OP_FLUSH:
>  		return sd_setup_flush_cmnd(cmd);
> +	case REQ_OP_DRV_IN:
>  	case REQ_OP_READ:
>  	case REQ_OP_WRITE:
>  		return sd_setup_read_write_cmnd(cmd);

The above looks weird to me. My understanding is that
scsi_setup_fs_cmnd() is intended for operations like REQ_OP_READ,
REQ_OP_WRITE and REQ_OP_DISCARD. I don't think that it is appropriate to
pass REQ_OP_DRV_IN requests to scsi_setup_fs_cmnd() and/or
sd_init_command().

Thanks,

Bart.



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 12/13] scsi: dh: ufshpb: Add prep_fn handler
  2020-05-15 10:30 ` [RFC PATCH 12/13] scsi: dh: ufshpb: Add prep_fn handler Avri Altman
@ 2020-05-16  3:40   ` Bart Van Assche
  0 siblings, 0 replies; 39+ messages in thread
From: Bart Van Assche @ 2020-05-16  3:40 UTC (permalink / raw)
  To: Avri Altman, James E . J . Bottomley, Martin K . Petersen,
	linux-scsi, linux-kernel
  Cc: alim.akhtar, asutoshd, Zang Leigang, Avi Shchislowski, Bean Huo,
	cang, stanley.chu, MOHAMMED RAFIQ KAMAL BASHA, Sang-yoon Oh,
	yongmyung lee, Jinyoung CHOI

On 2020-05-15 03:30, Avri Altman wrote:
> diff --git a/drivers/scsi/device_handler/scsi_dh_ufshpb.c b/drivers/scsi/device_handler/scsi_dh_ufshpb.c
> index affc143..04e3d56 100644
> --- a/drivers/scsi/device_handler/scsi_dh_ufshpb.c
> +++ b/drivers/scsi/device_handler/scsi_dh_ufshpb.c
> @@ -15,6 +15,7 @@
>  #include <scsi/scsi_eh.h>
>  #include <scsi/scsi_dh.h>
>  #include <scsi/scsi_dh_ufshpb.h>
> +#include "../sd.h"

Please add a comment that explains why this include directive is necessary.

> +static void __update_read_counters(struct ufshpb_dh_lun *hpb,
> +				   struct ufshpb_region *r,
> +				   struct ufshpb_subregion *s, u64 nr_blocks)
> +{
> +	enum ufshpb_state s_state;
> +
> +	atomic64_add(nr_blocks, &s->reads);
> +	atomic64_add(nr_blocks, &r->reads);
> +
> +	/* normalize read counters if needed */
> +	if (atomic64_read(&r->reads) >= READ_NORMALIZATION * entries_per_region)
> +		queue_work(hpb->wq, &hpb->reads_normalization_work);
> +
> +	rcu_read_lock();
> +	s_state = s->state;
> +	rcu_read_unlock();

We don't use locking in the Linux kernel to read a scalar that can be
read with a single load instruction, even if it can be modified while it
is being read.

> +/* Call this on read from prep_fn */
> +static bool ufshpb_test_block_dirty(struct ufshpb_dh_data *h,
> +				    struct request *rq, u64 start_lba,
> +				    u32 nr_blocks)
> +{
> +	struct ufshpb_dh_lun *hpb = h->hpb;
> +	u64 end_lba = start_lba + nr_blocks;
> +	unsigned int region = ufshpb_lba_to_region(start_lba);
> +	unsigned int subregion = ufshpb_lba_to_subregion(start_lba);
> +	struct ufshpb_region *r = hpb->region_tbl + region;
> +	struct ufshpb_subregion *s = r->subregion_tbl + subregion;
> +	enum ufshpb_state s_state;
> +
> +	__update_rw_counters(hpb, start_lba, end_lba, REQ_OP_READ);
> +
> +	rcu_read_lock();
> +	s_state = s->state;
> +	rcu_read_unlock();
> +
> +	if (s_state != HPB_STATE_ACTIVE)
> +		return true;
> +
> +	return (atomic64_read(&s->writes) >= SET_AS_DIRTY);
> +}

No parentheses around returned values please.

>  /*
>   * ufshpb_dispatch - ufshpb state machine
>   * make the various decisions based on region/subregion state & events
> @@ -631,6 +875,9 @@ static void ufshpb_work_handler(struct work_struct *work)
>  	ufshpb_dispatch(s->hpb, s->r, s);
>  
>  	mutex_unlock(&s->state_lock);
> +
> +	/* the subregion state might has changed */
> +	synchronize_rcu();
>  }

What is the purpose of this synchronize_rcu() call? This is the first
time that I see a synchronize_rcu() call at the end of a work handler.

>  static int ufshpb_activate_pinned_regions(struct ufshpb_dh_data *h, bool init)
> @@ -679,6 +926,12 @@ static int ufshpb_activate_pinned_regions(struct ufshpb_dh_data *h, bool init)
>  		set_bit(start_idx + i, hpb->pinned_map);
>  	}
>  
> +	/*
> +	 * those subregions of the pinned regions changed their state - they
> +	 * are active now
> +	 */
> +	synchronize_rcu();
> +
>  	return ret;
>  }

Same question here: what is the purpose of this synchronize_rcu() call?

> @@ -713,6 +966,9 @@ static void ufshpb_lun_reset_work_handler(struct work_struct *work)
>  		__region_reset(hpb, r);
>  	}
>  
> +	/* update rcu post lun reset */
> +	synchronize_rcu();
> +

Also here: what is the purpose of this synchronize_rcu() call?

> +/*
> + * ufshpb_prep_fn - Construct HPB_READ when possible
> + */
> +static blk_status_t ufshpb_prep_fn(struct scsi_device *sdev, struct request *rq)
> +{
> +	struct ufshpb_dh_data *h = sdev->handler_data;
> +	struct ufshpb_dh_lun *hpb = h->hpb;
> +	u64 lba = sectors_to_logical(sdev, blk_rq_pos(rq));
> +	u32 nr_blocks = sectors_to_logical(sdev, blk_rq_sectors(rq));
> +
> +	if (op_is_write(req_op(rq)) || op_is_discard(req_op(rq))) {
> +		ufshpb_set_block_dirty(h, rq, lba, nr_blocks);
> +		goto out;
> +	}
> +
> +	if (req_op(rq) != REQ_OP_READ || nr_blocks > 255)
> +		goto out;
> +
> +	if (ufshpb_test_block_dirty(h, rq, lba, nr_blocks))
> +		goto out;
> +
> +	ufshpb_prepare_hpb_read_cmd(rq, hpb, lba, (u8)nr_blocks);
> +
> +out:
> +	return BLK_STS_OK;
> +}

So this prep function calls ufshpb_prepare_hpb_read_cmd(), and that
function does the following:

	memcpy(scsi_req(rq)->cmd, cmnd, sizeof(cmnd));

I'm not sure that such a construct would be acceptable in any SCSI LLD
or device handler. The SCSI CDB is overwritten without updating the
other members of the request structure, e.g. the page pointers in the
bvecs of the bio attached to a request structure. What will e.g. happen
if rq_for_each_segment() would be called? Will it iterate over the data
buffer of the original REQ_OP_READ or will it iterate over the data
buffer of the UFSHPB_READ command?

Bart.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 00/13] scsi: ufs: Add HPB Support
  2020-05-15 10:30 [RFC PATCH 00/13] scsi: ufs: Add HPB Support Avri Altman
                   ` (12 preceding siblings ...)
  2020-05-15 10:30 ` [RFC PATCH 13/13] scsi: scsi_dh: ufshpb: Add "Cold" subregions timer Avri Altman
@ 2020-05-16  3:50 ` Bart Van Assche
  2020-05-16  9:14   ` Avri Altman
  13 siblings, 1 reply; 39+ messages in thread
From: Bart Van Assche @ 2020-05-16  3:50 UTC (permalink / raw)
  To: Avri Altman, James E . J . Bottomley, Martin K . Petersen,
	linux-scsi, linux-kernel
  Cc: alim.akhtar, asutoshd, Zang Leigang, Avi Shchislowski, Bean Huo,
	cang, stanley.chu, MOHAMMED RAFIQ KAMAL BASHA, Sang-yoon Oh,
	yongmyung lee, Jinyoung CHOI

On 2020-05-15 03:30, Avri Altman wrote:
> NAND flash-based storage devices, needs to translate the logical
> addresses of the IO requests to its corresponding physical addresses of
> the flash storage.  As the internal SRAM of the storage device cannot
> accommodate the entire L2P mapping table, the device can only partially
> load the L2P data it needs based on the incoming LBAs. As a result,
> cache misses - which are more frequent in random read access, can result
> in serious performance degradation.  To remedy the above, UFS3.1 offers
> the Host Performance Booster (HPB) feature, which uses the host’s system
> memory as a cache for L2P map data. The basic concept of HPB is to
> cache L2P mapping entries in host system memory so that both physical
> block address (PBA) and logical block address (LBA) can be delivered in
> HPB read command.  Not to be confused with host-managed-FTL, HPB is
> merely about NAND flash cost reduction.
> 
> HPB, by its nature, imposes an interesting question regarding the proper
> location of its components across the storage stack. On Init it requires
> to detect the HPB capabilities and parameters, which can be only done
> from the LLD level.  On the other hand, it requires to send scsi
> commands as part of its cache management, which preferably should be
> done from scsi mid-layer,  and compose the "HPB_READ" command, which
> should be done as part of scsi command setup path.
> Therefore, we came up with a 2 module layout: ufshpb in the ufs driver,
> and scsi_dh_ufshpb under scsi device handler.
> 
> The ufshpb module bear 2 main duties. During initialization, it reads
> the hpb configuration from the device. Later on, during the scan host
> phase, it attaches the scsi device and activates the scsi hpb device
> handler.  The second duty mainly concern supporting the HPB cache
> management. The scsi hpb device handler will perform the cache
> management and send the HPB_READ command. The modules will communicate
> via the standard device handler API: the handler_data opaque pointer,
> and the set_params op-mode.
> 
> This series has borrowed heavily from a HPB implementation that was
> published as part of the pixel3 code, authored by:
> Yongmyung Lee <ymhungry.lee@samsung.com> and
> Jinyoung Choi <j-young.choi@samsung.com>.
> 
> We kept some of its design and implementation details. We made some
> minor modifications to adopt the latest spec changes (HPB1.0 was not
> close when the driver initially published), and also divide the
> implementation between the scsi handler and the ufs modules, instead of
> a single module in the original driver, which simplified the
> implementation to a great deal and resulted in far less code. One more
> big difference is that the Pixel3 driver support device managed mode,
> while we are supporting host managed mode, which reflect heavily on the
> cache management decision process.

Hi Avri,

Thank you for having taken the time to publish your work. The way this
series has been split into individual patches makes reviewing easy.
Additionally, the cover letter and patch descriptions are very
informative, insightful and well written. However, I'm concerned about a
key aspect of the implementation, namely relying on a device handler to
alter the meaning of a block layer request. My concern about this
approach is that at most one device handler can be associated with a
SCSI LLD. If in the future more functionality would be added to the UFS
spec and if it would be desirable to implement that functionality as a
new kernel module, it won't be possible to implement that functionality
as a new device handler. So I think that not relying on the device
handler infrastructure is more future proof because that removes the
restrictions we have to deal with when using the device handler framework.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* RE: [RFC PATCH 00/13] scsi: ufs: Add HPB Support
  2020-05-16  3:50 ` [RFC PATCH 00/13] scsi: ufs: Add HPB Support Bart Van Assche
@ 2020-05-16  9:14   ` Avri Altman
  2020-05-16 17:14     ` Bart Van Assche
       [not found]     ` <CGME20200516171420epcas2p108c570904c5117c3654d71e0a2842faa@epcms2p7>
  0 siblings, 2 replies; 39+ messages in thread
From: Avri Altman @ 2020-05-16  9:14 UTC (permalink / raw)
  To: Bart Van Assche, James E . J . Bottomley, Martin K . Petersen,
	linux-scsi, linux-kernel
  Cc: alim.akhtar, asutoshd, Zang Leigang, Avi Shchislowski, Bean Huo,
	cang, stanley.chu, MOHAMMED RAFIQ KAMAL BASHA, Sang-yoon Oh,
	yongmyung lee, Jinyoung CHOI

Hi Bart,

> 
> Hi Avri,
> 
> Thank you for having taken the time to publish your work. The way this
> series has been split into individual patches makes reviewing easy.
> Additionally, the cover letter and patch descriptions are very
> informative, insightful and well written. However, I'm concerned about a
> key aspect of the implementation, namely relying on a device handler to
> alter the meaning of a block layer request. My concern about this
> approach is that at most one device handler can be associated with a
> SCSI LLD. If in the future more functionality would be added to the UFS
> spec and if it would be desirable to implement that functionality as a
> new kernel module, it won't be possible to implement that functionality
> as a new device handler. So I think that not relying on the device
> handler infrastructure is more future proof because that removes the
> restrictions we have to deal with when using the device handler framework.
>  Thanks,
So should we keep pursuing this direction, or drop it and concentrate on Bean's RFC?
Or maybe come up with a third way?

Thanks,
Avri

> 
> Bart.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC PATCH 00/13] scsi: ufs: Add HPB Support
  2020-05-16  9:14   ` Avri Altman
@ 2020-05-16 17:14     ` Bart Van Assche
       [not found]     ` <CGME20200516171420epcas2p108c570904c5117c3654d71e0a2842faa@epcms2p7>
  1 sibling, 0 replies; 39+ messages in thread
From: Bart Van Assche @ 2020-05-16 17:14 UTC (permalink / raw)
  To: Avri Altman, James E . J . Bottomley, Martin K . Petersen,
	linux-scsi, linux-kernel
  Cc: alim.akhtar, asutoshd, Zang Leigang, Avi Shchislowski, Bean Huo,
	cang, stanley.chu, MOHAMMED RAFIQ KAMAL BASHA, Sang-yoon Oh,
	yongmyung lee, Jinyoung CHOI

On 2020-05-16 02:14, Avri Altman wrote:
>> Thank you for having taken the time to publish your work. The way this
>> series has been split into individual patches makes reviewing easy.
>> Additionally, the cover letter and patch descriptions are very
>> informative, insightful and well written. However, I'm concerned about a
>> key aspect of the implementation, namely relying on a device handler to
>> alter the meaning of a block layer request. My concern about this
>> approach is that at most one device handler can be associated with a
>> SCSI LLD. If in the future more functionality would be added to the UFS
>> spec and if it would be desirable to implement that functionality as a
>> new kernel module, it won't be possible to implement that functionality
>> as a new device handler. So I think that not relying on the device
>> handler infrastructure is more future proof because that removes the
>> restrictions we have to deal with when using the device handler framework.
>
> So should we keep perusing this direction, or leave it, and concentrate in Bean's RFC?
> Or maybe come up with a 3rd way?

Hi Avri,

I prefer to proceed with reviewing Bean's patch series. If someone
prefers a different approach, I think this is a good time to bring that up.

Thanks,

Bart.


* Another approach of UFSHPB
       [not found]     ` <CGME20200516171420epcas2p108c570904c5117c3654d71e0a2842faa@epcms2p7>
@ 2020-05-19 22:31       ` yongmyung lee
  2020-05-20 17:55         ` Christoph Hellwig
                           ` (2 more replies)
  0 siblings, 3 replies; 39+ messages in thread
From: yongmyung lee @ 2020-05-19 22:31 UTC (permalink / raw)
  To: Bart Van Assche, Avri Altman, James E . J . Bottomley,
	Martin K . Petersen, linux-scsi, linux-kernel
  Cc: ALIM AKHTAR, asutoshd, Zang Leigang, Avi Shchislowski, Bean Huo,
	cang, stanley.chu, MOHAMMED RAFIQ KAMAL BASHA, Sang-yoon Oh,
	Jinyoung CHOI, Adel Choi, BoRam Shin, Sung-Jun Park, Daejun Park

Hello, all!

I am writing this email to share my opinion regarding HPB.
This mail is very long, so here is a summary:

[Summary]
1. HPB, WriteBooster and other new UFS features would
   be managed by a ufs-extended feature layer (an additional LLD layer)
2. The HPB map table will be managed in the LLD (UFS driver)
3. Each UFS feature driver can be implemented as a loadable kernel module
4. Samsung is also preparing a QEMU-based test environment
5. We will push our patches soon! Please review them.


Here is the body:

I am Yongmyung Lee, one of the original authors of the HPB driver,
which so far has been an out-of-tree implementation outside the Linux
kernel mainline.

I have seen the work done by Bean Huo and Avri Altman, which is very
impressive. I also have some opinions on the driver, would like to
discuss them with you, and am preparing a set of patches.

The basic idea of my approach is similar to Bean's work of managing the
HPB map data in the LLD (UFS driver). As the HPB driver is closely
related to the behavior of the UFS device, I think it would be nice if
the HPB driver ran in an additional feature layer on top of the LLD.

Of course, Avri's work makes sense.
I believe the HPB scheme can be a good alternative to NVMe's HMB (Host
Memory Buffer) for storage devices like eMMC or SATA SSDs (which may be
DRAM-less). Once an HPB-like scheme is applied, these storage devices
could benefit from improved read latency as well as throughput.

However, it would not be desirable to relocate the map table management
to an upper layer such as scsi-dh, as no such specification exists for
the moment.

When specifications similar to HPB are proposed in JEDEC or T13, that
would be a great chance to have a deep discussion on which layer is the
best fit for the map table management.

Before disclosing my opinion on the HPB driver development approach,
I would like to introduce the main characteristics of the Extended
UFS-Features in progress:

  * Providing extended functionality.
    That is, a UFS device shall work well without the Extended
    UFS-Features; by using them, however, it will be more powerful in
    certain application scenarios.

  * Designed to suit embedded systems (such as Android).
    Due to the characteristics of UFS devices, they are utilized mainly
    as mobile storage. We are mainly trying to improve performance on
    the Android system.

  * Closely connected to the UFS device.
    The Extended UFS-Features use UFS query requests, which are
    UFS-private commands. In addition, other features are closely
    related to the electrical and physical characteristics of UFS
    (such as voltage, gear scaling and runtime suspend/resume) and
    should be taken into consideration.

Therefore, it is desirable to have those features managed separately
from the original UFS driver.

The Extended UFS-Features currently approved by JEDEC are as below:

  * UFS HPB
  * UFS WriteBooster

In addition to those, we are developing and testing various UFS features
that would be beneficial in embedded systems (at the moment, mostly
Android systems). As we have already successfully provided enhanced
features in cooperation with major Android phone vendors, we now want
to contribute our work to the Linux kernel mainline.
 

Currently, the UFS driver (mostly ufshcd.c) has become bulky and
complex, so I would like to split this code into layers, like the work
of Bean Huo and Avri Altman. In particular, I suggest a UFS-Feature
driver model based on the Linux bus-driver model, which is suitable for
managing all Extended UFS-Feature drivers, as in the figure below:


UFS Driver data structure (struct ufs_hba)
   |
   |    -----------------------    -- ufshpb driver -- <- attach ufshpb device driver (it can be loadable)
   |---| ufs-ext feature layer |   -- ufs-wb driver -- <- attach ufswb device driver
   |   |                       |   -- ...           -- <- ...
   |    -----------------------    -- next ufs feature driver  -- <- attach ufs-next feature driver

* wb : write-booster
 

Each Extended UFS-Feature driver has a bus of the ufs-ext feature type
and is bound to the ufs-ext feature layer. The ufs-ext feature layer
manages common APIs used by each Extended UFS-Feature driver. The APIs
consist of UFS query requests and UFS vendor commands related to each
ufs-ext feature driver.


Furthermore, each ufs-ext feature driver will be written as a loadable
kernel module. Vendors (e.g., Android phone manufacturers) could
optionally load and remove each module. They can also customize the
parameters of the ufs-ext feature drivers while each module is being
loaded. (For example, a vendor could set the maximum memory size that
can be reclaimed in Host Control mode in HPB.)

We expect that this kind of standardized and abstracted model will make
it much easier to manage the ufs-ext feature drivers.

I noticed that the ufs-wb (WriteBooster) patch was written by Asutosh
Das and Stanley Chu. The basic idea of the ufs-wb code is to drive write
boosting by adjusting gear scaling, which is a good idea that I had not
thought of. I would also like to incorporate the ufs-wb driver into the
ufs-ext feature layer; in this way the footprint of the UFS driver could
likewise be slimmed down.

In brief, system vendors can optimize their systems by selectively
loading each ufs-ext feature driver.

As of now I am still working on the patch set and, together with my team
members, am testing it on an Android phone with a real UFS-HPB device on
a stable Android kernel, as well as on a QEMU-based UFS simulator that
tracks the latest kernel version.

In addition, we plan to provide a UFS simulator for QEMU as a test
environment for UFS driver development. I hope there will be a lot of
comments on my idea and on the patch set to be pushed in the near
future, and I will gladly address every single one.

I will push my patch set as soon as possible.

Thank you



* Re: Another approach of UFSHPB
  2020-05-19 22:31       ` Another approach of UFSHPB yongmyung lee
@ 2020-05-20 17:55         ` Christoph Hellwig
  2020-05-20 21:19           ` Bart Van Assche
  2020-05-22 16:49         ` Bart Van Assche
       [not found]         ` <CGME20200516171420epcas2p108c570904c5117c3654d71e0a2842faa@epcms2p4>
  2 siblings, 1 reply; 39+ messages in thread
From: Christoph Hellwig @ 2020-05-20 17:55 UTC (permalink / raw)
  To: yongmyung lee
  Cc: Bart Van Assche, Avri Altman, James E . J . Bottomley,
	Martin K . Petersen, linux-scsi, linux-kernel, ALIM AKHTAR,
	asutoshd, Zang Leigang, Avi Shchislowski, Bean Huo, cang,
	stanley.chu, MOHAMMED RAFIQ KAMAL BASHA, Sang-yoon Oh,
	Jinyoung CHOI, Adel Choi, BoRam Shin, Sung-Jun Park, Daejun Park

Seriously,

HPB is a completely fucked up concept and we should not merge it at all.
Especially not with a crazy bullshit vendor extension layer that makes
it even easier for vendors to implement even worse things than the
already horrible spec says.  Just stop this crap and implement sane
interfaces for the next generation hardware instead of wasting your
time on this idiotic idea.


* Re: Another approach of UFSHPB
  2020-05-20 17:55         ` Christoph Hellwig
@ 2020-05-20 21:19           ` Bart Van Assche
  2020-05-22 16:35             ` Bart Van Assche
  0 siblings, 1 reply; 39+ messages in thread
From: Bart Van Assche @ 2020-05-20 21:19 UTC (permalink / raw)
  To: Christoph Hellwig, yongmyung lee
  Cc: Avri Altman, James E . J . Bottomley, Martin K . Petersen,
	linux-scsi, linux-kernel, ALIM AKHTAR, asutoshd, Zang Leigang,
	Avi Shchislowski, Bean Huo, cang, stanley.chu,
	MOHAMMED RAFIQ KAMAL BASHA, Sang-yoon Oh, Jinyoung CHOI,
	Adel Choi, BoRam Shin, Sung-Jun Park, Daejun Park

On 2020-05-20 10:55, Christoph Hellwig wrote:
> HPB is a completely fucked up concept and we should not merge it at all.
> Especially not with a crazy bullshit vendor extension layer that makes
> it even easier for vendors to implement even worse things than the
> already horrible spec says.  Just stop this crap and implement sane
> interfaces for the next generation hardware instead of wasting your
> time on this idiotic idea.

Hi Christoph,

What exactly is it that you are not happy about? Is it the concept of
using host memory to store L2P translation information or how that
concept has been translated into SCSI commands (HPB READ BUFFER, HPB
READ and HPB WRITE BUFFER)?

In the former case: aren't Open-Channel SSDs another example of storage
devices for which the L2P translation tables are maintained in host
memory? Didn't the driver for Fusion-io SSDs also maintain the L2P
mapping in host memory?

Do you agree that HPB UFS storage devices are already being used widely
and hence that not accepting this functionality in the upstream kernel
will force users of HPB devices to maintain HPB code outside the kernel
tree? Isn't one of the goals of the Linux kernel project to increase its
user base?

Bart.




* Re: Another approach of UFSHPB
  2020-05-20 21:19           ` Bart Van Assche
@ 2020-05-22 16:35             ` Bart Van Assche
  0 siblings, 0 replies; 39+ messages in thread
From: Bart Van Assche @ 2020-05-22 16:35 UTC (permalink / raw)
  To: Christoph Hellwig, yongmyung lee
  Cc: Avri Altman, James E . J . Bottomley, Martin K . Petersen,
	linux-scsi, linux-kernel, ALIM AKHTAR, asutoshd, Zang Leigang,
	Avi Shchislowski, Bean Huo, cang, stanley.chu,
	MOHAMMED RAFIQ KAMAL BASHA, Sang-yoon Oh, Jinyoung CHOI,
	Adel Choi, BoRam Shin, Sung-Jun Park, Daejun Park

On 2020-05-20 14:19, Bart Van Assche wrote:
> On 2020-05-20 10:55, Christoph Hellwig wrote:
>> HPB is a completely fucked up concept and we should not merge it at all.
>> Especially not with a crazy bullshit vendor extension layer that makes
>> it even easier for vendors to implement even worse things than the
>> already horrible spec says.  Just stop this crap and implement sane
>> interfaces for the next generation hardware instead of wasting your
>> time on this idiotic idea.
> 
> What exactly is it that you are not happy about? Is it the concept of
> using host memory to store L2P translation information or how that
> concept has been translated into SCSI commands (HPB READ BUFFER, HPB
> READ and HPB WRITE BUFFER)?
> 
> In the former case: aren't Open-Channel SSDs another example of storage
> devices for which the L2P translation tables are maintained in host
> memory? Didn't the driver for Fusion-io SSDs also maintain the L2P
> mapping in host memory?
> 
> Do you agree that HPB UFS storage devices are already being used widely
> and hence that not accepting this functionality in the upstream kernel
> will force users of HPB devices to maintain HPB code outside the kernel
> tree? Isn't one of the goals of the Linux kernel project to increase its
> user base?

The following quote from
https://www.anandtech.com/show/13474/the-google-pixel-3-review/2 is
interesting: "Another big improvement for file I/O is the implementation
of “Host Performance Booster” in the kernel and UFS controller firmware
stack. HPB is essentially caching of the NAND chip’s FTL (flash
translation layer) L2P (logical to physical) mapping tables into the
hosts (SoCs) main memory. This allows the host driver to look up the
target L2P entry directly without betting on UFS’s limited SRAM to have
a cache-hit, reducing latency and greatly increasing random read
performance. The authors of the feature showcase an improvement of
59-67% in random I/O read performance due to the new feature. It’s worth
to mention that traditional Android I/O benchmarks won’t be able to show
this as as those tend to test read speeds with the files they’ve just
created."

Given the cost of SRAM in embedded controllers I think there is a strong
incentive for manufacturers of flash storage devices to reduce the
amount of SRAM on the storage controller. I think this means that
proposals to use host memory for caching L2P mappings will keep popping
up, no matter what we tell the storage controller vendors about what we
think about such a design.

Bart.


* Re: Another approach of UFSHPB
  2020-05-19 22:31       ` Another approach of UFSHPB yongmyung lee
  2020-05-20 17:55         ` Christoph Hellwig
@ 2020-05-22 16:49         ` Bart Van Assche
       [not found]         ` <CGME20200516171420epcas2p108c570904c5117c3654d71e0a2842faa@epcms2p4>
  2 siblings, 0 replies; 39+ messages in thread
From: Bart Van Assche @ 2020-05-22 16:49 UTC (permalink / raw)
  To: ymhungry.lee, Avri Altman, James E . J . Bottomley,
	Martin K . Petersen, linux-scsi, linux-kernel
  Cc: ALIM AKHTAR, asutoshd, Zang Leigang, Avi Shchislowski, Bean Huo,
	cang, stanley.chu, MOHAMMED RAFIQ KAMAL BASHA, Sang-yoon Oh,
	Jinyoung CHOI, Adel Choi, BoRam Shin, Sung-Jun Park, Daejun Park

On 2020-05-19 15:31, yongmyung lee wrote:
> Currently, UFS driver (usually ufshcd.c) has become bulky and complex.
> So, I would like to split these codes into layers 
> like the works of Bean Huo and Avri Altman.
> Especially, I suggest the UFS-Feature Driver model based on Linux Bus-Driver Model,
> which is suitable to manage all Extended UFS-Feature drivers like the Figure as below:
> 
> 
> UFS Driver data structure (struct ufs_hba)
>    |
>    |    -----------------------    -- ufshpb driver -- <- attach ufshpb device driver (it can be loadable)
>    |---| ufs-ext feature layer |   -- ufs-wb driver -- <- attach ufswb device driver
>    |   |                       |   -- ...           -- <- ...
>    |    -----------------------    -- next ufs feature driver  -- <- attach ufs-next feature driver
> 
> * wb : write-booster

Splitting the UFS driver into multiple modules would be great if the
interface between these modules can be kept small and elegant. However,
I'm not sure that this approach should be based on Linux device driver
bus concept. Devices can be unbound at any time from their driver by
writing into the "unbind" sysfs attribute. I don't think we want the UFS
core functionality ever to be unbound while any other UFS functionality
is still active. Has it been considered to implement each feature as a
loadable module without relying on the bus model? The existing kernel
module infrastructure already prevents unloading modules (e.g. the UFS
core) that are in use by a kernel module that depends on them (e.g. UFS HPB).

> Furthermore, each ufs-ext feature driver will be written as a loadable kernel module.
> Vendors (e.g., Android Phone manufacturer) could optionally load and remove each module.

What will happen if a feature module is unloaded (e.g. HPB) while I/O is
ongoing that relies on HPB?

> Also they can customize the parameters of ufs-ext feature drivers
> while each module is being loaded.
> (For example, vendor would set the maximum memory size
>  that can be reclaimed in the Host Control mode in HPB)

Should these parameters be per module or per UFS device?

> In addition, we plan to provide QEMU with UFS-simulator
> for a test environment for UFS driver development.

A UFS simulator for QEMU support would definitely be welcome.

Thanks,

Bart.


* Re: Another approach of UFSHPB
       [not found]         ` <CGME20200516171420epcas2p108c570904c5117c3654d71e0a2842faa@epcms2p4>
@ 2020-05-25  5:40           ` Daejun Park
  2020-05-25 14:56             ` Bart Van Assche
  0 siblings, 1 reply; 39+ messages in thread
From: Daejun Park @ 2020-05-25  5:40 UTC (permalink / raw)
  To: Bart Van Assche, yongmyung lee, Avri Altman,
	James E . J . Bottomley, Martin K . Petersen, linux-scsi,
	linux-kernel
  Cc: ALIM AKHTAR, asutoshd, Zang Leigang, Avi Shchislowski, Bean Huo,
	cang, stanley.chu, MOHAMMED RAFIQ KAMAL BASHA, Sang-yoon Oh,
	Jinyoung CHOI, Adel Choi, BoRam Shin, Sung-Jun Park, Daejun Park

I am Daejun Park, and I am working on the HPB driver patches.
Thank you for your comments; our answers follow.

> Splitting the UFS driver into multiple modules would be great if the
> interface between these modules can be kept small and elegant. However,
> I'm not sure that this approach should be based on Linux device driver
> bus concept. Devices can be unbound at any time from their driver by
> writing into the "unbind" sysfs attribute. I don't think we want the UFS
> core functionality ever to be unbound while any other UFS functionality
> is still active. Has it been considered to implement each feature as a
> loadable module without relying on the bus model? The existing kernel
> module infrastructure already prevents to unload modules (e.g. the UFS
> core) that are in use by a kernel module that depends on it (e.g. UFS HPB).

The HPB driver is close to the UFS core functionality, but it is not
essential for operating the UFS device. With reference to this article
(https://lwn.net/Articles/645810/), we implemented the extended
UFS-feature support as a bus model. Because the HPB driver consumes the
host's main memory, it should support bind/unbind functionality as
needed. We implemented the HPB driver so that it can be unbound and
unloaded at runtime.

> What will happen if a feature module is unloaded (e.g. HPB) while I/O is
> ongoing that relies on HPB?

In the HPB driver, a reference counter determines whether HPB can be
unloaded or unbound. Therefore, the driver cannot be unloaded or unbound
while there is outstanding HPB-related I/O.

> Should these parameters be per module or per UFS device?

I think parameters need to be taken per module, because each extended
UFS-feature module has different functions and may require different
parameters.

Thanks,

Daejun Park.


* Re: Another approach of UFSHPB
  2020-05-25  5:40           ` Daejun Park
@ 2020-05-25 14:56             ` Bart Van Assche
  2020-05-26  6:15               ` Avri Altman
  0 siblings, 1 reply; 39+ messages in thread
From: Bart Van Assche @ 2020-05-25 14:56 UTC (permalink / raw)
  To: daejun7.park, yongmyung lee, Avri Altman,
	James E . J . Bottomley, Martin K . Petersen, linux-scsi,
	linux-kernel
  Cc: ALIM AKHTAR, asutoshd, Zang Leigang, Avi Shchislowski, Bean Huo,
	cang, stanley.chu, MOHAMMED RAFIQ KAMAL BASHA, Sang-yoon Oh,
	Jinyoung CHOI, Adel Choi, BoRam Shin, Sung-Jun Park

On 2020-05-24 22:40, Daejun Park wrote:
> The HPB driver is close to the UFS core function, but it is not essential
> for operating UFS device. With reference to this article
> (https://lwn.net/Articles/645810/), we implemented extended UFS-feature
> as bus model. Because the HPB driver consumes the user's main memory, it should
> support bind / unbind functionality as needed. We implemented the HPB driver 
> can be unbind / unload on runtime.

I do not agree that the bus model is the best choice for freeing cache
memory if it is no longer needed. A shrinker is probably a much better
choice because the callback functions in a shrinker get invoked when a
system is under memory pressure. See also register_shrinker(),
unregister_shrinker() and struct shrinker in include/linux/shrinker.h.

>> Should these parameters be per module or per UFS device?
> 
> I think it is necessary to take parameters for each module. 
> This is because each extended UFS-feature module has different functions
> and may require different parameters.

My question was a rhetorical question. Please choose per device
parameters when appropriate instead of module parameters.

Thanks,

Bart.


* RE: Another approach of UFSHPB
  2020-05-25 14:56             ` Bart Van Assche
@ 2020-05-26  6:15               ` Avri Altman
  2020-05-26 17:03                 ` Bart Van Assche
       [not found]                 ` <CGME20200516171420epcas2p108c570904c5117c3654d71e0a2842faa@epcms2p3>
  0 siblings, 2 replies; 39+ messages in thread
From: Avri Altman @ 2020-05-26  6:15 UTC (permalink / raw)
  To: Bart Van Assche, daejun7.park, yongmyung lee,
	James E . J . Bottomley, Martin K . Petersen, linux-scsi,
	linux-kernel
  Cc: ALIM AKHTAR, asutoshd, Zang Leigang, Avi Shchislowski, Bean Huo,
	cang, stanley.chu, MOHAMMED RAFIQ KAMAL BASHA, Sang-yoon Oh,
	Jinyoung CHOI, Adel Choi, BoRam Shin, Sung-Jun Park

> On 2020-05-24 22:40, Daejun Park wrote:
> > The HPB driver is close to the UFS core function, but it is not essential
> > for operating UFS device. With reference to this article
> > (https://lwn.net/Articles/645810/), we implemented extended UFS-feature
> > as bus model. Because the HPB driver consumes the user's main memory, it
> > should support bind / unbind functionality as needed. We implemented the
> > HPB driver can be unbind / unload on runtime.
> 
> I do not agree that the bus model is the best choice for freeing cache
> memory if it is no longer needed. A shrinker is probably a much better
> choice because the callback functions in a shrinker get invoked when a
> system is under memory pressure. See also register_shrinker(),
> unregister_shrinker() and struct shrinker in include/linux/shrinker.h.
Since this discussion is closely related to cache allocation, what is
your opinion about allocating the pages dynamically as the regions are
being activated/deactivated, as opposed to how it is done today -
statically on init for the entire max-active-subregions?


* Re: Another approach of UFSHPB
  2020-05-26  6:15               ` Avri Altman
@ 2020-05-26 17:03                 ` Bart Van Assche
       [not found]                 ` <CGME20200516171420epcas2p108c570904c5117c3654d71e0a2842faa@epcms2p3>
  1 sibling, 0 replies; 39+ messages in thread
From: Bart Van Assche @ 2020-05-26 17:03 UTC (permalink / raw)
  To: Avri Altman, daejun7.park, yongmyung lee,
	James E . J . Bottomley, Martin K . Petersen, linux-scsi,
	linux-kernel
  Cc: ALIM AKHTAR, asutoshd, Zang Leigang, Avi Shchislowski, Bean Huo,
	cang, stanley.chu, MOHAMMED RAFIQ KAMAL BASHA, Sang-yoon Oh,
	Jinyoung CHOI, Adel Choi, BoRam Shin, Sung-Jun Park

On 2020-05-25 23:15, Avri Altman wrote:
>> On 2020-05-24 22:40, Daejun Park wrote:
>>> The HPB driver is close to the UFS core function, but it is not essential
>>> for operating UFS device. With reference to this article
>>> (https://lwn.net/Articles/645810/), we implemented extended UFS-feature
>>> as bus model. Because the HPB driver consumes the user's main memory, it
>>> should support bind / unbind functionality as needed. We implemented the
>>> HPB driver can be unbind / unload on runtime.
>>
>> I do not agree that the bus model is the best choice for freeing cache
>> memory if it is no longer needed. A shrinker is probably a much better
>> choice because the callback functions in a shrinker get invoked when a
>> system is under memory pressure. See also register_shrinker(),
>> unregister_shrinker() and struct shrinker in include/linux/shrinker.h.
>
> Since this discussion is closely related to cache allocation, what is
> your opinion about allocating the pages dynamically as the regions are
> being activated/deactivated, as opposed to how it is done today -
> statically on init for the entire max-active-subregions?

Memory that is statically allocated cannot be used for any other purpose
(e.g. page cache) without triggering the associated shrinker. As far as
I know shrinkers are only triggered when (close to) out of memory. So
dynamically allocating memory as needed is probably a better strategy
than statically allocating the entire region at initialization time.

Thanks,

Bart.





* Re: Another approach of UFSHPB
       [not found]                 ` <CGME20200516171420epcas2p108c570904c5117c3654d71e0a2842faa@epcms2p3>
@ 2020-05-27  9:11                   ` Daejun Park
  2020-05-27 11:46                     ` Bean Huo
  0 siblings, 1 reply; 39+ messages in thread
From: Daejun Park @ 2020-05-27  9:11 UTC (permalink / raw)
  To: Bart Van Assche, Avri Altman, Daejun Park, yongmyung lee,
	James E . J . Bottomley, Martin K . Petersen, linux-scsi,
	linux-kernel
  Cc: ALIM AKHTAR, asutoshd, Zang Leigang, Avi Shchislowski, Bean Huo,
	cang, stanley.chu, MOHAMMED RAFIQ KAMAL BASHA, Sang-yoon Oh,
	Jinyoung CHOI, Adel Choi, BoRam Shin, Sung-Jun Park

On 2020-05-26 Bart Van Assche wrote:
>On 2020-05-25 23:15, Avri Altman wrote:
>>> On 2020-05-24 22:40, Daejun Park wrote:
>>>> The HPB driver is close to the UFS core function, but it is not essential
>>>> for operating UFS device. With reference to this article
>>>> (https://lwn.net/Articles/645810/), we implemented extended UFS-feature
>>>> as bus model. Because the HPB driver consumes the user's main memory, it
>>>> should support bind / unbind functionality as needed. We implemented the
>>>> HPB driver can be unbind / unload on runtime.
>>>
>>> I do not agree that the bus model is the best choice for freeing cache
>>> memory if it is no longer needed. A shrinker is probably a much better
>>> choice because the callback functions in a shrinker get invoked when a
>>> system is under memory pressure. See also register_shrinker(),
>>> unregister_shrinker() and struct shrinker in include/linux/shrinker.h.
>>
>> Since this discussion is closely related to cache allocation, what is
>> your opinion about allocating the pages dynamically as the regions are
>> being activated/deactivated, as opposed to how it is done today -
>> statically on init for the entire max-active-subregions?

> Memory that is statically allocated cannot be used for any other purpose
> (e.g. page cache) without triggering the associated shrinker. As far as
> I know shrinkers are only triggered when (close to) out of memory. So
> dynamically allocating memory as needed is probably a better strategy
> than statically allocating the entire region at initialization time.

To improve UFS device performance using the HPB driver, keeping the
number of active subregions above a certain threshold is essential. If
the number of active subregions is lower than the threshold, the
performance improvement from HPB is reduced. Also, due to frequent
activate/inactivate protocol overhead, performance may then be worse
than when the HPB feature is not used at all.

Therefore, it is better to unbind/unload the HPB driver than to let the
number of active subregions drop below the threshold. We designed the
HPB driver so that the UFS device keeps working even when the module is
unloaded.

Thanks,

Daejun


* Re: Another approach of UFSHPB
  2020-05-27  9:11                   ` Daejun Park
@ 2020-05-27 11:46                     ` Bean Huo
  0 siblings, 0 replies; 39+ messages in thread
From: Bean Huo @ 2020-05-27 11:46 UTC (permalink / raw)
  To: daejun7.park, Bart Van Assche, Avri Altman, yongmyung lee,
	James E . J . Bottomley, Martin K . Petersen, linux-scsi,
	linux-kernel
  Cc: ALIM AKHTAR, asutoshd, Zang Leigang, Avi Shchislowski, Bean Huo,
	cang, stanley.chu, MOHAMMED RAFIQ KAMAL BASHA, Sang-yoon Oh,
	Jinyoung CHOI, Adel Choi, BoRam Shin, Sung-Jun Park

On Wed, 2020-05-27 at 18:11 +0900, Daejun Park wrote:
> > > > I do not agree that the bus model is the best choice for freeing
> > > > cache memory if it is no longer needed. A shrinker is probably a
> > > > much better choice because the callback functions in a shrinker
> > > > get invoked when a system is under memory pressure. See also
> > > > register_shrinker(), unregister_shrinker() and struct shrinker in
> > > > include/linux/shrinker.h.
> > > 
> > > Since this discussion is closely related to cache allocation,
> > > What is your opinion about allocating the pages dynamically as the
> > > regions are being activated/deactivated, as opposed to how it is
> > > done today - statically on init for the entire max-active-subregions?
> > Memory that is statically allocated cannot be used for any other
> > purpose (e.g. page cache) without triggering the associated shrinker.
> > As far as I know shrinkers are only triggered when (close to) out of
> > memory. So dynamically allocating memory as needed is probably a
> > better strategy than statically allocating the entire region at
> > initialization time.
> 
> To improve UFS device performance using the HPB driver, 
> the number of active-subregions above a certain threshold is
> essential.
> If the number of active-subregions is lower than the threshold, 
> the performance improvement by using HPB will be reduced.

Right, but an embedded system is different from the mobile use case;
there should be a maximum limit in the HPB driver - exactly how many MB
of HPB cache should be left for the user to choose.


> Also, due to frequent and active/inactive protocol overhead, 
> performance may be worse than when the HPB feature is not used.
> 
> Therefore, it is better to unbind/unload HPB driver than 
> to reduce the number of active subregions below the threshold. 
> We designed the HPB driver to make the UFS device work 
> even when the module is unloaded.
>

Actually, as HPB runs, there is no big difference between dynamic and
static HPB page allocation from the point of view of HPB cache
consumption. On the contrary, dynamic HPB page allocation will increase
the latency in the HPB entry loading path.

//Bean



end of thread, other threads:[~2020-05-27 11:47 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-15 10:30 [RFC PATCH 00/13] scsi: ufs: Add HPB Support Avri Altman
2020-05-15 10:30 ` [RFC PATCH 01/13] scsi: ufs: Add HPB parameters Avri Altman
2020-05-15 10:30 ` [RFC PATCH 02/13] scsi: ufshpb: Init part I - Read HPB config Avri Altman
2020-05-15 15:33   ` Randy Dunlap
2020-05-16  1:46   ` Bart Van Assche
2020-05-16  1:57   ` Bart Van Assche
2020-05-15 10:30 ` [RFC PATCH 03/13] scsi: scsi_dh: Introduce scsi_dh_ufshpb Avri Altman
2020-05-16  1:48   ` Bart Van Assche
2020-05-15 10:30 ` [RFC PATCH 04/13] scsi: ufs: ufshpb: Init part II - Attach scsi device Avri Altman
2020-05-16  1:52   ` Bart Van Assche
2020-05-15 10:30 ` [RFC PATCH 05/13] scsi: ufs: ufshpb: Disable HPB if no HPB-enabled luns Avri Altman
2020-05-16  2:02   ` Bart Van Assche
2020-05-15 10:30 ` [RFC PATCH 06/13] scsi: scsi_dh: ufshpb: Prepare for L2P cache management Avri Altman
2020-05-16  2:13   ` Bart Van Assche
2020-05-15 10:30 ` [RFC PATCH 07/13] scsi: scsi_dh: ufshpb: Add ufshpb state machine Avri Altman
2020-05-16  2:44   ` Bart Van Assche
2020-05-15 10:30 ` [RFC PATCH 08/13] scsi: dh: ufshpb: Activate pinned regions Avri Altman
2020-05-15 10:30 ` [RFC PATCH 09/13] scsi: ufshpb: Add response API Avri Altman
2020-05-16  3:06   ` Bart Van Assche
2020-05-15 10:30 ` [RFC PATCH 10/13] scsi: dh: ufshpb: Add ufshpb_set_params Avri Altman
2020-05-15 10:30 ` [RFC PATCH 11/13] scsi: Allow device handler set their own CDB Avri Altman
2020-05-16  3:19   ` Bart Van Assche
2020-05-15 10:30 ` [RFC PATCH 12/13] scsi: dh: ufshpb: Add prep_fn handler Avri Altman
2020-05-16  3:40   ` Bart Van Assche
2020-05-15 10:30 ` [RFC PATCH 13/13] scsi: scsi_dh: ufshpb: Add "Cold" subregions timer Avri Altman
2020-05-16  3:50 ` [RFC PATCH 00/13] scsi: ufs: Add HPB Support Bart Van Assche
2020-05-16  9:14   ` Avri Altman
2020-05-16 17:14     ` Bart Van Assche
     [not found]     ` <CGME20200516171420epcas2p108c570904c5117c3654d71e0a2842faa@epcms2p7>
2020-05-19 22:31       ` Another approach of UFSHPB yongmyung lee
2020-05-20 17:55         ` Christoph Hellwig
2020-05-20 21:19           ` Bart Van Assche
2020-05-22 16:35             ` Bart Van Assche
2020-05-22 16:49         ` Bart Van Assche
     [not found]         ` <CGME20200516171420epcas2p108c570904c5117c3654d71e0a2842faa@epcms2p4>
2020-05-25  5:40           ` Daejun Park
2020-05-25 14:56             ` Bart Van Assche
2020-05-26  6:15               ` Avri Altman
2020-05-26 17:03                 ` Bart Van Assche
     [not found]                 ` <CGME20200516171420epcas2p108c570904c5117c3654d71e0a2842faa@epcms2p3>
2020-05-27  9:11                   ` Daejun Park
2020-05-27 11:46                     ` Bean Huo
