* [PATCH v5 00/16] md/raid5: set STRIPE_SIZE as a configurable value
@ 2020-07-02 12:06 Yufen Yu
2020-07-02 12:06 ` [PATCH v5 01/16] md/raid456: convert macro define of STRIPE_* as members of struct r5conf Yufen Yu
` (16 more replies)
0 siblings, 17 replies; 37+ messages in thread
From: Yufen Yu @ 2020-07-02 12:06 UTC (permalink / raw)
To: song; +Cc: linux-raid, neilb, guoqing.jiang, houtao1, yuyufen
Hi, all
For now, STRIPE_SIZE is equal to the value of PAGE_SIZE. That means RAID5
will issue each bio to disk in units of at least 64KB when PAGE_SIZE is
64KB, as on arm64. However, filesystems usually issue bios in units of 4KB,
so RAID5 may waste disk bandwidth.
To solve this problem, this patchset makes stripe_size a configurable
value. The default value is 4096. We add a new sysfs entry, and the value
can be changed by writing to it, e.g.:
echo 16384 > /sys/block/md1/md/stripe_size
Normally, the default stripe_size gives better performance, so NeilBrown
suggested simply fixing it at 4096. However, our test results show that a
larger stripe_size can perform better when most issued IOs are bigger than
4096. Thus, this patchset still makes stripe_size configurable.
In the current implementation, grow_buffers() uses alloc_page() to allocate
the buffers for each stripe_head. With this change, that would mean
allocating 64KB buffers but using only 4KB of each. To save memory, we let
multiple stripe_head buffers share a single real page. Details are in the
following patches.
To evaluate the new feature, we created a raid5 device '/dev/md5' from 4
SSDs and tested it on an arm64 machine with 64KB PAGE_SIZE.
1) We formatted /dev/md5 with mkfs.ext4, mounted it with the default
options on /mnt, and tested it with dbench:
dbench -D /mnt -t 1000 10. Results:
'stripe_size = 64KB'
Operation Count AvgLat MaxLat
----------------------------------------
NTCreateX 9805011 0.021 64.728
Close 7202525 0.001 0.120
Rename 415213 0.051 44.681
Unlink 1980066 0.079 93.147
Deltree 240 1.793 6.516
Mkdir 120 0.004 0.007
Qpathinfo 8887512 0.007 37.114
Qfileinfo 1557262 0.001 0.030
Qfsinfo 1629582 0.012 0.152
Sfileinfo 798756 0.040 57.641
Find 3436004 0.019 57.782
WriteX 4887239 0.021 57.638
ReadX 15370483 0.005 37.818
LockX 31934 0.003 0.022
UnlockX 31933 0.001 0.021
Flush 687205 13.302 530.088
Throughput 307.799 MB/sec 10 clients 10 procs max_latency=530.091 ms
-------------------------------------------------------
'stripe_size = 4KB'
Operation Count AvgLat MaxLat
----------------------------------------
NTCreateX 11999166 0.021 36.380
Close 8814128 0.001 0.122
Rename 508113 0.051 29.169
Unlink 2423242 0.070 38.141
Deltree 300 1.885 7.155
Mkdir 150 0.004 0.006
Qpathinfo 10875921 0.007 35.485
Qfileinfo 1905837 0.001 0.032
Qfsinfo 1994304 0.012 0.125
Sfileinfo 977450 0.029 26.489
Find 4204952 0.019 9.361
WriteX 5981890 0.019 27.804
ReadX 18809742 0.004 33.491
LockX 39074 0.003 0.025
UnlockX 39074 0.001 0.014
Flush 841022 10.712 458.848
Throughput 376.777 MB/sec 10 clients 10 procs max_latency=458.852 ms
-------------------------------------------------------
It shows that a 4KB stripe_size yields higher throughput
(376.777 vs 307.799 MB/sec) and lower max latency (458.852 vs 530.091 ms)
than a 64KB stripe_size.
2) We evaluated IO throughput for /dev/md5 with fio, using these job files:
[4KB randwrite]
direct=1
numjob=2
iodepth=64
ioengine=libaio
filename=/dev/md5
bs=4KB
rw=randwrite
[1MB write]
direct=1
numjob=2
iodepth=64
ioengine=libaio
filename=/dev/md5
bs=1MB
rw=write
The fio test results are as follows:
              | STRIPE_SIZE(64KB) | STRIPE_SIZE(4KB)
--------------+-------------------+------------------
4KB randwrite |      15MB/s       |     100MB/s
1MB write     |     1000MB/s      |     700MB/s
The result shows that when issued IOs are large (the 1MB sequential
write), a 64KB stripe_size gives much higher throughput. But for 4KB
randwrite, where the IOs issued to the device are small, a 4KB stripe_size
performs better.
V5:
* Rebase code onto the latest md-next branch
* Move 'if (new == conf->stripe_size)' down in raid5_store_stripe_size()
* Return an error when grow_stripes() fails in raid5_store_stripe_size()
* Split the compute-syndrome patch into two patches
V4:
* Add sysfs entry for setting stripe_size.
* Fix wrong page index and offset computation in raid5_get_dev_page()
and raid5_get_page_offset().
* Fix wrong page offset in handle_stripe_expansion().
V3:
* RAID6 can support shared pages.
* Rename raid5_compress_stripe_pages() to raid5_stripe_pages_shared()
and update the commit message.
* Rename CONFIG_MD_RAID456_STRIPE_SIZE to CONFIG_MD_RAID456_STRIPE_SHIFT,
and make STRIPE_SIZE a multiple of 4KB.
V2:
https://www.spinics.net/lists/raid/msg64254.html
Introduce a shared-pages strategy to save memory; only RAID4 and RAID5 are supported.
V1:
https://www.spinics.net/lists/raid/msg63111.html
Just add CONFIG_MD_RAID456_STRIPE_SIZE to set STRIPE_SIZE
Yufen Yu (16):
md/raid456: convert macro define of STRIPE_* as members of struct
r5conf
md/raid5: add sysfs entry to set and show stripe_size
md/raid5: set default stripe_size as 4096
md/raid5: add a member of r5pages for struct stripe_head
md/raid5: allocate and free shared pages of r5pages
md/raid5: set correct page offset for bi_io_vec in ops_run_io()
md/raid5: set correct page offset for async_copy_data()
md/raid5: resize stripes and set correct offset when reshape array
md/raid5: add new xor function to support different page offset
md/raid5: add offset array in scribble buffer
md/raid5: compute xor with correct page offset
md/raid5: support config stripe_size by sysfs entry
md/raid6: let syndrome computor support different page offset
md/raid6: let async recovery function support different page offset
md/raid6: compute syndrome with correct page offset
raid6test: adaptation with syndrome function
crypto/async_tx/async_pq.c | 72 ++--
crypto/async_tx/async_raid6_recov.c | 163 ++++++--
crypto/async_tx/async_xor.c | 120 +++++-
crypto/async_tx/raid6test.c | 24 +-
drivers/md/raid5-cache.c | 8 +-
drivers/md/raid5-ppl.c | 12 +-
drivers/md/raid5.c | 627 +++++++++++++++++++++-------
drivers/md/raid5.h | 103 ++++-
include/linux/async_tx.h | 23 +-
9 files changed, 884 insertions(+), 268 deletions(-)
--
2.25.4
* [PATCH v5 01/16] md/raid456: convert macro define of STRIPE_* as members of struct r5conf
2020-07-02 12:06 [PATCH v5 00/16] md/raid5: set STRIPE_SIZE as a configurable value Yufen Yu
@ 2020-07-02 12:06 ` Yufen Yu
2020-07-02 14:51 ` kernel test robot
` (3 more replies)
2020-07-02 12:06 ` [PATCH v5 02/16] md/raid5: add sysfs entry to set and show stripe_size Yufen Yu
` (15 subsequent siblings)
16 siblings, 4 replies; 37+ messages in thread
From: Yufen Yu @ 2020-07-02 12:06 UTC (permalink / raw)
To: song; +Cc: linux-raid, neilb, guoqing.jiang, houtao1, yuyufen
We convert STRIPE_SIZE, STRIPE_SHIFT and STRIPE_SECTORS to stripe_size,
stripe_shift and stripe_sectors as members of struct r5conf, so that each
raid456 array can be configured with a different stripe_size. This patch
prepares for the configurable stripe_size that follows.
Simply replace the STRIPE_ prefix with conf->stripe_ and add a 'conf'
argument to stripe_hash_locks_hash() and r5_next_bio() so they can get
stripe_size. After that, stripe_size is initialized in setup_conf().
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
---
drivers/md/raid5-cache.c | 8 +-
drivers/md/raid5-ppl.c | 12 +-
drivers/md/raid5.c | 252 +++++++++++++++++++++++----------------
drivers/md/raid5.h | 41 +++----
4 files changed, 181 insertions(+), 132 deletions(-)
diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 9b6da759dca2..a095de43d4c7 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -298,8 +298,8 @@ r5c_return_dev_pending_writes(struct r5conf *conf, struct r5dev *dev)
wbi = dev->written;
dev->written = NULL;
while (wbi && wbi->bi_iter.bi_sector <
- dev->sector + STRIPE_SECTORS) {
- wbi2 = r5_next_bio(wbi, dev->sector);
+ dev->sector + conf->stripe_sectors) {
+ wbi2 = r5_next_bio(conf, wbi, dev->sector);
md_write_end(conf->mddev);
bio_endio(wbi);
wbi = wbi2;
@@ -316,7 +316,7 @@ void r5c_handle_cached_data_endio(struct r5conf *conf,
set_bit(R5_UPTODATE, &sh->dev[i].flags);
r5c_return_dev_pending_writes(conf, &sh->dev[i]);
md_bitmap_endwrite(conf->mddev->bitmap, sh->sector,
- STRIPE_SECTORS,
+ conf->stripe_sectors,
!test_bit(STRIPE_DEGRADED, &sh->state),
0);
}
@@ -364,7 +364,7 @@ void r5c_check_cached_full_stripe(struct r5conf *conf)
*/
if (atomic_read(&conf->r5c_cached_full_stripes) >=
min(R5C_FULL_STRIPE_FLUSH_BATCH(conf),
- conf->chunk_sectors >> STRIPE_SHIFT))
+ conf->chunk_sectors >> conf->stripe_shift))
r5l_wake_reclaim(conf->log, 0);
}
diff --git a/drivers/md/raid5-ppl.c b/drivers/md/raid5-ppl.c
index d50238d0a85d..16a44cb5751b 100644
--- a/drivers/md/raid5-ppl.c
+++ b/drivers/md/raid5-ppl.c
@@ -324,7 +324,7 @@ static int ppl_log_stripe(struct ppl_log *log, struct stripe_head *sh)
* be just after the last logged stripe and write to the same
* disks. Use bit shift and logarithm to avoid 64-bit division.
*/
- if ((sh->sector == sh_last->sector + STRIPE_SECTORS) &&
+ if ((sh->sector == sh_last->sector + conf->stripe_sectors) &&
(data_sector >> ilog2(conf->chunk_sectors) ==
data_sector_last >> ilog2(conf->chunk_sectors)) &&
((data_sector - data_sector_last) * data_disks ==
@@ -844,9 +844,9 @@ static int ppl_recover_entry(struct ppl_log *log, struct ppl_header_entry *e,
/* if start and end is 4k aligned, use a 4k block */
if (block_size == 512 &&
- (r_sector_first & (STRIPE_SECTORS - 1)) == 0 &&
- (r_sector_last & (STRIPE_SECTORS - 1)) == 0)
- block_size = STRIPE_SIZE;
+ (r_sector_first & (conf->stripe_sectors - 1)) == 0 &&
+ (r_sector_last & (conf->stripe_sectors - 1)) == 0)
+ block_size = conf->stripe_size;
/* iterate through blocks in strip */
for (i = 0; i < strip_sectors; i += (block_size >> 9)) {
@@ -1264,6 +1264,7 @@ static int ppl_validate_rdev(struct md_rdev *rdev)
char b[BDEVNAME_SIZE];
int ppl_data_sectors;
int ppl_size_new;
+ struct r5conf *conf = rdev->mddev->private;
/*
* The configured PPL size must be enough to store
@@ -1274,7 +1275,8 @@ static int ppl_validate_rdev(struct md_rdev *rdev)
ppl_data_sectors = rdev->ppl.size - (PPL_HEADER_SIZE >> 9);
if (ppl_data_sectors > 0)
- ppl_data_sectors = rounddown(ppl_data_sectors, STRIPE_SECTORS);
+ ppl_data_sectors =
+ rounddown(ppl_data_sectors, conf->stripe_sectors);
if (ppl_data_sectors <= 0) {
pr_warn("md/raid:%s: PPL space too small on %s\n",
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index ab8067f9ce8c..2981b853c388 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -69,13 +69,13 @@ static struct workqueue_struct *raid5_wq;
static inline struct hlist_head *stripe_hash(struct r5conf *conf, sector_t sect)
{
- int hash = (sect >> STRIPE_SHIFT) & HASH_MASK;
+ int hash = (sect >> conf->stripe_shift) & HASH_MASK;
return &conf->stripe_hashtbl[hash];
}
-static inline int stripe_hash_locks_hash(sector_t sect)
+static inline int stripe_hash_locks_hash(struct r5conf *conf, sector_t sect)
{
- return (sect >> STRIPE_SHIFT) & STRIPE_HASH_LOCKS_MASK;
+ return (sect >> conf->stripe_shift) & STRIPE_HASH_LOCKS_MASK;
}
static inline void lock_device_hash_lock(struct r5conf *conf, int hash)
@@ -627,7 +627,7 @@ raid5_get_active_stripe(struct r5conf *conf, sector_t sector,
int previous, int noblock, int noquiesce)
{
struct stripe_head *sh;
- int hash = stripe_hash_locks_hash(sector);
+ int hash = stripe_hash_locks_hash(conf, sector);
int inc_empty_inactive_list_flag;
pr_debug("get_stripe, sector %llu\n", (unsigned long long)sector);
@@ -748,9 +748,9 @@ static void stripe_add_to_batch_list(struct r5conf *conf, struct stripe_head *sh
tmp_sec = sh->sector;
if (!sector_div(tmp_sec, conf->chunk_sectors))
return;
- head_sector = sh->sector - STRIPE_SECTORS;
+ head_sector = sh->sector - conf->stripe_sectors;
- hash = stripe_hash_locks_hash(head_sector);
+ hash = stripe_hash_locks_hash(conf, head_sector);
spin_lock_irq(conf->hash_locks + hash);
head = __find_stripe(conf, head_sector, conf->generation);
if (head && !atomic_inc_not_zero(&head->count)) {
@@ -1057,8 +1057,9 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s)
test_bit(WriteErrorSeen, &rdev->flags)) {
sector_t first_bad;
int bad_sectors;
- int bad = is_badblock(rdev, sh->sector, STRIPE_SECTORS,
- &first_bad, &bad_sectors);
+ int bad = is_badblock(rdev, sh->sector,
+ conf->stripe_sectors,
+ &first_bad, &bad_sectors);
if (!bad)
break;
@@ -1089,7 +1090,7 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s)
if (rdev) {
if (s->syncing || s->expanding || s->expanded
|| s->replacing)
- md_sync_acct(rdev->bdev, STRIPE_SECTORS);
+ md_sync_acct(rdev->bdev, conf->stripe_sectors);
set_bit(STRIPE_IO_STARTED, &sh->state);
@@ -1129,9 +1130,9 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s)
else
sh->dev[i].vec.bv_page = sh->dev[i].page;
bi->bi_vcnt = 1;
- bi->bi_io_vec[0].bv_len = STRIPE_SIZE;
+ bi->bi_io_vec[0].bv_len = conf->stripe_size;
bi->bi_io_vec[0].bv_offset = 0;
- bi->bi_iter.bi_size = STRIPE_SIZE;
+ bi->bi_iter.bi_size = conf->stripe_size;
bi->bi_write_hint = sh->dev[i].write_hint;
if (!rrdev)
sh->dev[i].write_hint = RWH_WRITE_LIFE_NOT_SET;
@@ -1156,7 +1157,7 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s)
if (rrdev) {
if (s->syncing || s->expanding || s->expanded
|| s->replacing)
- md_sync_acct(rrdev->bdev, STRIPE_SECTORS);
+ md_sync_acct(rrdev->bdev, conf->stripe_sectors);
set_bit(STRIPE_IO_STARTED, &sh->state);
@@ -1183,9 +1184,9 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s)
WARN_ON(test_bit(R5_UPTODATE, &sh->dev[i].flags));
sh->dev[i].rvec.bv_page = sh->dev[i].page;
rbi->bi_vcnt = 1;
- rbi->bi_io_vec[0].bv_len = STRIPE_SIZE;
+ rbi->bi_io_vec[0].bv_len = conf->stripe_size;
rbi->bi_io_vec[0].bv_offset = 0;
- rbi->bi_iter.bi_size = STRIPE_SIZE;
+ rbi->bi_iter.bi_size = conf->stripe_size;
rbi->bi_write_hint = sh->dev[i].write_hint;
sh->dev[i].write_hint = RWH_WRITE_LIFE_NOT_SET;
/*
@@ -1235,6 +1236,7 @@ async_copy_data(int frombio, struct bio *bio, struct page **page,
int page_offset;
struct async_submit_ctl submit;
enum async_tx_flags flags = 0;
+ struct r5conf *conf = sh->raid_conf;
if (bio->bi_iter.bi_sector >= sector)
page_offset = (signed)(bio->bi_iter.bi_sector - sector) * 512;
@@ -1256,8 +1258,8 @@ async_copy_data(int frombio, struct bio *bio, struct page **page,
len -= b_offset;
}
- if (len > 0 && page_offset + len > STRIPE_SIZE)
- clen = STRIPE_SIZE - page_offset;
+ if (len > 0 && page_offset + len > conf->stripe_size)
+ clen = conf->stripe_size - page_offset;
else
clen = len;
@@ -1267,7 +1269,7 @@ async_copy_data(int frombio, struct bio *bio, struct page **page,
if (frombio) {
if (sh->raid_conf->skip_copy &&
b_offset == 0 && page_offset == 0 &&
- clen == STRIPE_SIZE &&
+ clen == conf->stripe_size &&
!no_skipcopy)
*page = bio_page;
else
@@ -1292,6 +1294,7 @@ static void ops_complete_biofill(void *stripe_head_ref)
{
struct stripe_head *sh = stripe_head_ref;
int i;
+ struct r5conf *conf = sh->raid_conf;
pr_debug("%s: stripe %llu\n", __func__,
(unsigned long long)sh->sector);
@@ -1312,8 +1315,8 @@ static void ops_complete_biofill(void *stripe_head_ref)
rbi = dev->read;
dev->read = NULL;
while (rbi && rbi->bi_iter.bi_sector <
- dev->sector + STRIPE_SECTORS) {
- rbi2 = r5_next_bio(rbi, dev->sector);
+ dev->sector + conf->stripe_sectors) {
+ rbi2 = r5_next_bio(conf, rbi, dev->sector);
bio_endio(rbi);
rbi = rbi2;
}
@@ -1344,10 +1347,11 @@ static void ops_run_biofill(struct stripe_head *sh)
dev->toread = NULL;
spin_unlock_irq(&sh->stripe_lock);
while (rbi && rbi->bi_iter.bi_sector <
- dev->sector + STRIPE_SECTORS) {
+ dev->sector + sh->raid_conf->stripe_sectors) {
tx = async_copy_data(0, rbi, &dev->page,
dev->sector, tx, sh, 0);
- rbi = r5_next_bio(rbi, dev->sector);
+ rbi = r5_next_bio(sh->raid_conf, rbi,
+ dev->sector);
}
}
}
@@ -1413,6 +1417,7 @@ ops_run_compute5(struct stripe_head *sh, struct raid5_percpu *percpu)
struct dma_async_tx_descriptor *tx;
struct async_submit_ctl submit;
int i;
+ struct r5conf *conf = sh->raid_conf;
BUG_ON(sh->batch_head);
@@ -1429,9 +1434,11 @@ ops_run_compute5(struct stripe_head *sh, struct raid5_percpu *percpu)
init_async_submit(&submit, ASYNC_TX_FENCE|ASYNC_TX_XOR_ZERO_DST, NULL,
ops_complete_compute, sh, to_addr_conv(sh, percpu, 0));
if (unlikely(count == 1))
- tx = async_memcpy(xor_dest, xor_srcs[0], 0, 0, STRIPE_SIZE, &submit);
+ tx = async_memcpy(xor_dest, xor_srcs[0], 0, 0,
+ conf->stripe_size, &submit);
else
- tx = async_xor(xor_dest, xor_srcs, 0, count, STRIPE_SIZE, &submit);
+ tx = async_xor(xor_dest, xor_srcs, 0, count,
+ conf->stripe_size, &submit);
return tx;
}
@@ -1496,6 +1503,7 @@ ops_run_compute6_1(struct stripe_head *sh, struct raid5_percpu *percpu)
struct page *dest;
int i;
int count;
+ struct r5conf *conf = sh->raid_conf;
BUG_ON(sh->batch_head);
if (sh->ops.target < 0)
@@ -1522,7 +1530,8 @@ ops_run_compute6_1(struct stripe_head *sh, struct raid5_percpu *percpu)
init_async_submit(&submit, ASYNC_TX_FENCE, NULL,
ops_complete_compute, sh,
to_addr_conv(sh, percpu, 0));
- tx = async_gen_syndrome(blocks, 0, count+2, STRIPE_SIZE, &submit);
+ tx = async_gen_syndrome(blocks, 0, count+2,
+ conf->stripe_size, &submit);
} else {
/* Compute any data- or p-drive using XOR */
count = 0;
@@ -1535,7 +1544,8 @@ ops_run_compute6_1(struct stripe_head *sh, struct raid5_percpu *percpu)
init_async_submit(&submit, ASYNC_TX_FENCE|ASYNC_TX_XOR_ZERO_DST,
NULL, ops_complete_compute, sh,
to_addr_conv(sh, percpu, 0));
- tx = async_xor(dest, blocks, 0, count, STRIPE_SIZE, &submit);
+ tx = async_xor(dest, blocks, 0, count,
+ conf->stripe_size, &submit);
}
return tx;
@@ -1555,6 +1565,7 @@ ops_run_compute6_2(struct stripe_head *sh, struct raid5_percpu *percpu)
struct dma_async_tx_descriptor *tx;
struct page **blocks = to_addr_page(percpu, 0);
struct async_submit_ctl submit;
+ struct r5conf *conf = sh->raid_conf;
BUG_ON(sh->batch_head);
pr_debug("%s: stripe %llu block1: %d block2: %d\n",
@@ -1598,7 +1609,7 @@ ops_run_compute6_2(struct stripe_head *sh, struct raid5_percpu *percpu)
ops_complete_compute, sh,
to_addr_conv(sh, percpu, 0));
return async_gen_syndrome(blocks, 0, syndrome_disks+2,
- STRIPE_SIZE, &submit);
+ conf->stripe_size, &submit);
} else {
struct page *dest;
int data_target;
@@ -1621,15 +1632,15 @@ ops_run_compute6_2(struct stripe_head *sh, struct raid5_percpu *percpu)
ASYNC_TX_FENCE|ASYNC_TX_XOR_ZERO_DST,
NULL, NULL, NULL,
to_addr_conv(sh, percpu, 0));
- tx = async_xor(dest, blocks, 0, count, STRIPE_SIZE,
- &submit);
+ tx = async_xor(dest, blocks, 0, count,
+ conf->stripe_size, &submit);
count = set_syndrome_sources(blocks, sh, SYNDROME_SRC_ALL);
init_async_submit(&submit, ASYNC_TX_FENCE, tx,
ops_complete_compute, sh,
to_addr_conv(sh, percpu, 0));
return async_gen_syndrome(blocks, 0, count+2,
- STRIPE_SIZE, &submit);
+ conf->stripe_size, &submit);
}
} else {
init_async_submit(&submit, ASYNC_TX_FENCE, NULL,
@@ -1638,12 +1649,13 @@ ops_run_compute6_2(struct stripe_head *sh, struct raid5_percpu *percpu)
if (failb == syndrome_disks) {
/* We're missing D+P. */
return async_raid6_datap_recov(syndrome_disks+2,
- STRIPE_SIZE, faila,
+ conf->stripe_size, faila,
blocks, &submit);
} else {
/* We're missing D+D. */
return async_raid6_2data_recov(syndrome_disks+2,
- STRIPE_SIZE, faila, failb,
+ conf->stripe_size,
+ faila, failb,
blocks, &submit);
}
}
@@ -1672,6 +1684,7 @@ ops_run_prexor5(struct stripe_head *sh, struct raid5_percpu *percpu,
struct page **xor_srcs = to_addr_page(percpu, 0);
int count = 0, pd_idx = sh->pd_idx, i;
struct async_submit_ctl submit;
+ struct r5conf *conf = sh->raid_conf;
/* existing parity data subtracted */
struct page *xor_dest = xor_srcs[count++] = sh->dev[pd_idx].page;
@@ -1691,7 +1704,8 @@ ops_run_prexor5(struct stripe_head *sh, struct raid5_percpu *percpu,
init_async_submit(&submit, ASYNC_TX_FENCE|ASYNC_TX_XOR_DROP_DST, tx,
ops_complete_prexor, sh, to_addr_conv(sh, percpu, 0));
- tx = async_xor(xor_dest, xor_srcs, 0, count, STRIPE_SIZE, &submit);
+ tx = async_xor(xor_dest, xor_srcs, 0, count,
+ conf->stripe_size, &submit);
return tx;
}
@@ -1703,6 +1717,7 @@ ops_run_prexor6(struct stripe_head *sh, struct raid5_percpu *percpu,
struct page **blocks = to_addr_page(percpu, 0);
int count;
struct async_submit_ctl submit;
+ struct r5conf *conf = sh->raid_conf;
pr_debug("%s: stripe %llu\n", __func__,
(unsigned long long)sh->sector);
@@ -1711,7 +1726,8 @@ ops_run_prexor6(struct stripe_head *sh, struct raid5_percpu *percpu,
init_async_submit(&submit, ASYNC_TX_FENCE|ASYNC_TX_PQ_XOR_DST, tx,
ops_complete_prexor, sh, to_addr_conv(sh, percpu, 0));
- tx = async_gen_syndrome(blocks, 0, count+2, STRIPE_SIZE, &submit);
+ tx = async_gen_syndrome(blocks, 0, count+2,
+ conf->stripe_size, &submit);
return tx;
}
@@ -1752,7 +1768,7 @@ ops_run_biodrain(struct stripe_head *sh, struct dma_async_tx_descriptor *tx)
WARN_ON(dev->page != dev->orig_page);
while (wbi && wbi->bi_iter.bi_sector <
- dev->sector + STRIPE_SECTORS) {
+ dev->sector + conf->stripe_sectors) {
if (wbi->bi_opf & REQ_FUA)
set_bit(R5_WantFUA, &dev->flags);
if (wbi->bi_opf & REQ_SYNC)
@@ -1770,7 +1786,7 @@ ops_run_biodrain(struct stripe_head *sh, struct dma_async_tx_descriptor *tx)
clear_bit(R5_OVERWRITE, &dev->flags);
}
}
- wbi = r5_next_bio(wbi, dev->sector);
+ wbi = r5_next_bio(conf, wbi, dev->sector);
}
if (head_sh->batch_head) {
@@ -1848,6 +1864,7 @@ ops_run_reconstruct5(struct stripe_head *sh, struct raid5_percpu *percpu,
int j = 0;
struct stripe_head *head_sh = sh;
int last_stripe;
+ struct r5conf *conf = sh->raid_conf;
pr_debug("%s: stripe %llu\n", __func__,
(unsigned long long)sh->sector);
@@ -1910,9 +1927,11 @@ ops_run_reconstruct5(struct stripe_head *sh, struct raid5_percpu *percpu,
}
if (unlikely(count == 1))
- tx = async_memcpy(xor_dest, xor_srcs[0], 0, 0, STRIPE_SIZE, &submit);
+ tx = async_memcpy(xor_dest, xor_srcs[0], 0, 0,
+ conf->stripe_size, &submit);
else
- tx = async_xor(xor_dest, xor_srcs, 0, count, STRIPE_SIZE, &submit);
+ tx = async_xor(xor_dest, xor_srcs, 0, count,
+ conf->stripe_size, &submit);
if (!last_stripe) {
j++;
sh = list_first_entry(&sh->batch_list, struct stripe_head,
@@ -1932,6 +1951,7 @@ ops_run_reconstruct6(struct stripe_head *sh, struct raid5_percpu *percpu,
int last_stripe;
int synflags;
unsigned long txflags;
+ struct r5conf *conf = sh->raid_conf;
pr_debug("%s: stripe %llu\n", __func__, (unsigned long long)sh->sector);
@@ -1972,7 +1992,8 @@ ops_run_reconstruct6(struct stripe_head *sh, struct raid5_percpu *percpu,
} else
init_async_submit(&submit, 0, tx, NULL, NULL,
to_addr_conv(sh, percpu, j));
- tx = async_gen_syndrome(blocks, 0, count+2, STRIPE_SIZE, &submit);
+ tx = async_gen_syndrome(blocks, 0, count+2,
+ conf->stripe_size, &submit);
if (!last_stripe) {
j++;
sh = list_first_entry(&sh->batch_list, struct stripe_head,
@@ -2004,6 +2025,7 @@ static void ops_run_check_p(struct stripe_head *sh, struct raid5_percpu *percpu)
struct async_submit_ctl submit;
int count;
int i;
+ struct r5conf *conf = sh->raid_conf;
pr_debug("%s: stripe %llu\n", __func__,
(unsigned long long)sh->sector);
@@ -2020,7 +2042,7 @@ static void ops_run_check_p(struct stripe_head *sh, struct raid5_percpu *percpu)
init_async_submit(&submit, 0, NULL, NULL, NULL,
to_addr_conv(sh, percpu, 0));
- tx = async_xor_val(xor_dest, xor_srcs, 0, count, STRIPE_SIZE,
+ tx = async_xor_val(xor_dest, xor_srcs, 0, count, conf->stripe_size,
&sh->ops.zero_sum_result, &submit);
atomic_inc(&sh->count);
@@ -2033,6 +2055,7 @@ static void ops_run_check_pq(struct stripe_head *sh, struct raid5_percpu *percpu
struct page **srcs = to_addr_page(percpu, 0);
struct async_submit_ctl submit;
int count;
+ struct r5conf *conf = sh->raid_conf;
pr_debug("%s: stripe %llu checkp: %d\n", __func__,
(unsigned long long)sh->sector, checkp);
@@ -2045,7 +2068,7 @@ static void ops_run_check_pq(struct stripe_head *sh, struct raid5_percpu *percpu
atomic_inc(&sh->count);
init_async_submit(&submit, ASYNC_TX_ACK, NULL, ops_complete_check,
sh, to_addr_conv(sh, percpu, 0));
- async_syndrome_val(srcs, 0, count+2, STRIPE_SIZE,
+ async_syndrome_val(srcs, 0, count+2, conf->stripe_size,
&sh->ops.zero_sum_result, percpu->spare_page, &submit);
}
@@ -2275,7 +2298,7 @@ static int resize_chunks(struct r5conf *conf, int new_disks, int new_sectors)
percpu = per_cpu_ptr(conf->percpu, cpu);
err = scribble_alloc(percpu, new_disks,
- new_sectors / STRIPE_SECTORS);
+ new_sectors / conf->stripe_sectors);
if (err)
break;
}
@@ -2509,10 +2532,11 @@ static void raid5_end_read_request(struct bio * bi)
*/
pr_info_ratelimited(
"md/raid:%s: read error corrected (%lu sectors at %llu on %s)\n",
- mdname(conf->mddev), STRIPE_SECTORS,
+ mdname(conf->mddev),
+ conf->stripe_sectors,
(unsigned long long)s,
bdevname(rdev->bdev, b));
- atomic_add(STRIPE_SECTORS, &rdev->corrected_errors);
+ atomic_add(conf->stripe_sectors, &rdev->corrected_errors);
clear_bit(R5_ReadError, &sh->dev[i].flags);
clear_bit(R5_ReWrite, &sh->dev[i].flags);
} else if (test_bit(R5_ReadNoMerge, &sh->dev[i].flags))
@@ -2585,7 +2609,8 @@ static void raid5_end_read_request(struct bio * bi)
if (!(set_bad
&& test_bit(In_sync, &rdev->flags)
&& rdev_set_badblocks(
- rdev, sh->sector, STRIPE_SECTORS, 0)))
+ rdev, sh->sector,
+ conf->stripe_sectors, 0)))
md_error(conf->mddev, rdev);
}
}
@@ -2637,7 +2662,7 @@ static void raid5_end_write_request(struct bio *bi)
if (bi->bi_status)
md_error(conf->mddev, rdev);
else if (is_badblock(rdev, sh->sector,
- STRIPE_SECTORS,
+ conf->stripe_sectors,
&first_bad, &bad_sectors))
set_bit(R5_MadeGoodRepl, &sh->dev[i].flags);
} else {
@@ -2649,7 +2674,7 @@ static void raid5_end_write_request(struct bio *bi)
set_bit(MD_RECOVERY_NEEDED,
&rdev->mddev->recovery);
} else if (is_badblock(rdev, sh->sector,
- STRIPE_SECTORS,
+ conf->stripe_sectors,
&first_bad, &bad_sectors)) {
set_bit(R5_MadeGood, &sh->dev[i].flags);
if (test_bit(R5_ReadError, &sh->dev[i].flags))
@@ -3283,13 +3308,13 @@ static int add_stripe_bio(struct stripe_head *sh, struct bio *bi, int dd_idx,
/* check if page is covered */
sector_t sector = sh->dev[dd_idx].sector;
for (bi=sh->dev[dd_idx].towrite;
- sector < sh->dev[dd_idx].sector + STRIPE_SECTORS &&
+ sector < sh->dev[dd_idx].sector + conf->stripe_sectors &&
bi && bi->bi_iter.bi_sector <= sector;
- bi = r5_next_bio(bi, sh->dev[dd_idx].sector)) {
+ bi = r5_next_bio(conf, bi, sh->dev[dd_idx].sector)) {
if (bio_end_sector(bi) >= sector)
sector = bio_end_sector(bi);
}
- if (sector >= sh->dev[dd_idx].sector + STRIPE_SECTORS)
+ if (sector >= sh->dev[dd_idx].sector + conf->stripe_sectors)
if (!test_and_set_bit(R5_OVERWRITE, &sh->dev[dd_idx].flags))
sh->overwrite_disks++;
}
@@ -3314,7 +3339,7 @@ static int add_stripe_bio(struct stripe_head *sh, struct bio *bi, int dd_idx,
set_bit(STRIPE_BITMAP_PENDING, &sh->state);
spin_unlock_irq(&sh->stripe_lock);
md_bitmap_startwrite(conf->mddev->bitmap, sh->sector,
- STRIPE_SECTORS, 0);
+ conf->stripe_sectors, 0);
spin_lock_irq(&sh->stripe_lock);
clear_bit(STRIPE_BITMAP_PENDING, &sh->state);
if (!sh->batch_head) {
@@ -3376,7 +3401,7 @@ handle_failed_stripe(struct r5conf *conf, struct stripe_head *sh,
if (!rdev_set_badblocks(
rdev,
sh->sector,
- STRIPE_SECTORS, 0))
+ conf->stripe_sectors, 0))
md_error(conf->mddev, rdev);
rdev_dec_pending(rdev, conf->mddev);
}
@@ -3396,8 +3421,9 @@ handle_failed_stripe(struct r5conf *conf, struct stripe_head *sh,
wake_up(&conf->wait_for_overlap);
while (bi && bi->bi_iter.bi_sector <
- sh->dev[i].sector + STRIPE_SECTORS) {
- struct bio *nextbi = r5_next_bio(bi, sh->dev[i].sector);
+ sh->dev[i].sector + conf->stripe_sectors) {
+ struct bio *nextbi =
+ r5_next_bio(conf, bi, sh->dev[i].sector);
md_write_end(conf->mddev);
bio_io_error(bi);
@@ -3405,7 +3431,7 @@ handle_failed_stripe(struct r5conf *conf, struct stripe_head *sh,
}
if (bitmap_end)
md_bitmap_endwrite(conf->mddev->bitmap, sh->sector,
- STRIPE_SECTORS, 0, 0);
+ conf->stripe_sectors, 0, 0);
bitmap_end = 0;
/* and fail all 'written' */
bi = sh->dev[i].written;
@@ -3417,8 +3443,9 @@ handle_failed_stripe(struct r5conf *conf, struct stripe_head *sh,
if (bi) bitmap_end = 1;
while (bi && bi->bi_iter.bi_sector <
- sh->dev[i].sector + STRIPE_SECTORS) {
- struct bio *bi2 = r5_next_bio(bi, sh->dev[i].sector);
+ sh->dev[i].sector + conf->stripe_sectors) {
+ struct bio *bi2 =
+ r5_next_bio(conf, bi, sh->dev[i].sector);
md_write_end(conf->mddev);
bio_io_error(bi);
@@ -3441,9 +3468,9 @@ handle_failed_stripe(struct r5conf *conf, struct stripe_head *sh,
if (bi)
s->to_read--;
while (bi && bi->bi_iter.bi_sector <
- sh->dev[i].sector + STRIPE_SECTORS) {
- struct bio *nextbi =
- r5_next_bio(bi, sh->dev[i].sector);
+ sh->dev[i].sector + conf->stripe_sectors) {
+ struct bio *nextbi = r5_next_bio(conf,
+ bi, sh->dev[i].sector);
bio_io_error(bi);
bi = nextbi;
@@ -3451,7 +3478,7 @@ handle_failed_stripe(struct r5conf *conf, struct stripe_head *sh,
}
if (bitmap_end)
md_bitmap_endwrite(conf->mddev->bitmap, sh->sector,
- STRIPE_SECTORS, 0, 0);
+ conf->stripe_sectors, 0, 0);
/* If we were in the middle of a write the parity block might
* still be locked - so just clear all R5_LOCKED flags
*/
@@ -3496,14 +3523,14 @@ handle_failed_sync(struct r5conf *conf, struct stripe_head *sh,
&& !test_bit(Faulty, &rdev->flags)
&& !test_bit(In_sync, &rdev->flags)
&& !rdev_set_badblocks(rdev, sh->sector,
- STRIPE_SECTORS, 0))
+ conf->stripe_sectors, 0))
abort = 1;
rdev = rcu_dereference(conf->disks[i].replacement);
if (rdev
&& !test_bit(Faulty, &rdev->flags)
&& !test_bit(In_sync, &rdev->flags)
&& !rdev_set_badblocks(rdev, sh->sector,
- STRIPE_SECTORS, 0))
+ conf->stripe_sectors, 0))
abort = 1;
}
rcu_read_unlock();
@@ -3511,7 +3538,7 @@ handle_failed_sync(struct r5conf *conf, struct stripe_head *sh,
conf->recovery_disabled =
conf->mddev->recovery_disabled;
}
- md_done_sync(conf->mddev, STRIPE_SECTORS, !abort);
+ md_done_sync(conf->mddev, conf->stripe_sectors, !abort);
}
static int want_replace(struct stripe_head *sh, int disk_idx)
@@ -3785,14 +3812,15 @@ static void handle_stripe_clean_event(struct r5conf *conf,
wbi = dev->written;
dev->written = NULL;
while (wbi && wbi->bi_iter.bi_sector <
- dev->sector + STRIPE_SECTORS) {
- wbi2 = r5_next_bio(wbi, dev->sector);
+ dev->sector + conf->stripe_sectors) {
+ wbi2 = r5_next_bio(conf,
+ wbi, dev->sector);
md_write_end(conf->mddev);
bio_endio(wbi);
wbi = wbi2;
}
md_bitmap_endwrite(conf->mddev->bitmap, sh->sector,
- STRIPE_SECTORS,
+ conf->stripe_sectors,
!test_bit(STRIPE_DEGRADED, &sh->state),
0);
if (head_sh->batch_head) {
@@ -4099,7 +4127,8 @@ static void handle_parity_checks5(struct r5conf *conf, struct stripe_head *sh,
*/
set_bit(STRIPE_INSYNC, &sh->state);
else {
- atomic64_add(STRIPE_SECTORS, &conf->mddev->resync_mismatches);
+ atomic64_add(conf->stripe_sectors,
+ &conf->mddev->resync_mismatches);
if (test_bit(MD_RECOVERY_CHECK, &conf->mddev->recovery)) {
/* don't try to repair!! */
set_bit(STRIPE_INSYNC, &sh->state);
@@ -4107,7 +4136,7 @@ static void handle_parity_checks5(struct r5conf *conf, struct stripe_head *sh,
"%llu-%llu\n", mdname(conf->mddev),
(unsigned long long) sh->sector,
(unsigned long long) sh->sector +
- STRIPE_SECTORS);
+ conf->stripe_sectors);
} else {
sh->check_state = check_state_compute_run;
set_bit(STRIPE_COMPUTE_RUN, &sh->state);
@@ -4264,7 +4293,8 @@ static void handle_parity_checks6(struct r5conf *conf, struct stripe_head *sh,
*/
}
} else {
- atomic64_add(STRIPE_SECTORS, &conf->mddev->resync_mismatches);
+ atomic64_add(conf->stripe_sectors,
+ &conf->mddev->resync_mismatches);
if (test_bit(MD_RECOVERY_CHECK, &conf->mddev->recovery)) {
/* don't try to repair!! */
set_bit(STRIPE_INSYNC, &sh->state);
@@ -4272,7 +4302,7 @@ static void handle_parity_checks6(struct r5conf *conf, struct stripe_head *sh,
"%llu-%llu\n", mdname(conf->mddev),
(unsigned long long) sh->sector,
(unsigned long long) sh->sector +
- STRIPE_SECTORS);
+ conf->stripe_sectors);
} else {
int *target = &sh->ops.target;
@@ -4343,7 +4373,8 @@ static void handle_stripe_expansion(struct r5conf *conf, struct stripe_head *sh)
/* place all the copies on one channel */
init_async_submit(&submit, 0, tx, NULL, NULL, NULL);
tx = async_memcpy(sh2->dev[dd_idx].page,
- sh->dev[i].page, 0, 0, STRIPE_SIZE,
+ sh->dev[i].page, 0, 0,
+ conf->stripe_size,
&submit);
set_bit(R5_Expanded, &sh2->dev[dd_idx].flags);
@@ -4442,8 +4473,9 @@ static void analyse_stripe(struct stripe_head *sh, struct stripe_head_state *s)
*/
rdev = rcu_dereference(conf->disks[i].replacement);
if (rdev && !test_bit(Faulty, &rdev->flags) &&
- rdev->recovery_offset >= sh->sector + STRIPE_SECTORS &&
- !is_badblock(rdev, sh->sector, STRIPE_SECTORS,
+ (rdev->recovery_offset >=
+ sh->sector + conf->stripe_sectors) &&
+ !is_badblock(rdev, sh->sector, conf->stripe_sectors,
&first_bad, &bad_sectors))
set_bit(R5_ReadRepl, &dev->flags);
else {
@@ -4457,8 +4489,9 @@ static void analyse_stripe(struct stripe_head *sh, struct stripe_head_state *s)
if (rdev && test_bit(Faulty, &rdev->flags))
rdev = NULL;
if (rdev) {
- is_bad = is_badblock(rdev, sh->sector, STRIPE_SECTORS,
- &first_bad, &bad_sectors);
+ is_bad = is_badblock(rdev, sh->sector,
+ conf->stripe_sectors,
+ &first_bad, &bad_sectors);
if (s->blocked_rdev == NULL
&& (test_bit(Blocked, &rdev->flags)
|| is_bad < 0)) {
@@ -4484,7 +4517,8 @@ static void analyse_stripe(struct stripe_head *sh, struct stripe_head_state *s)
}
} else if (test_bit(In_sync, &rdev->flags))
set_bit(R5_Insync, &dev->flags);
- else if (sh->sector + STRIPE_SECTORS <= rdev->recovery_offset)
+ else if (sh->sector + conf->stripe_sectors <=
+ rdev->recovery_offset)
/* in sync if before recovery_offset */
set_bit(R5_Insync, &dev->flags);
else if (test_bit(R5_UPTODATE, &dev->flags) &&
@@ -4927,7 +4961,7 @@ static void handle_stripe(struct stripe_head *sh)
if ((s.syncing || s.replacing) && s.locked == 0 &&
!test_bit(STRIPE_COMPUTE_RUN, &sh->state) &&
test_bit(STRIPE_INSYNC, &sh->state)) {
- md_done_sync(conf->mddev, STRIPE_SECTORS, 1);
+ md_done_sync(conf->mddev, conf->stripe_sectors, 1);
clear_bit(STRIPE_SYNCING, &sh->state);
if (test_and_clear_bit(R5_Overlap, &sh->dev[sh->pd_idx].flags))
wake_up(&conf->wait_for_overlap);
@@ -4995,7 +5029,7 @@ static void handle_stripe(struct stripe_head *sh)
clear_bit(STRIPE_EXPAND_READY, &sh->state);
atomic_dec(&conf->reshape_stripes);
wake_up(&conf->wait_for_overlap);
- md_done_sync(conf->mddev, STRIPE_SECTORS, 1);
+ md_done_sync(conf->mddev, conf->stripe_sectors, 1);
}
if (s.expanding && s.locked == 0 &&
@@ -5025,14 +5059,14 @@ static void handle_stripe(struct stripe_head *sh)
/* We own a safe reference to the rdev */
rdev = conf->disks[i].rdev;
if (!rdev_set_badblocks(rdev, sh->sector,
- STRIPE_SECTORS, 0))
+ conf->stripe_sectors, 0))
md_error(conf->mddev, rdev);
rdev_dec_pending(rdev, conf->mddev);
}
if (test_and_clear_bit(R5_MadeGood, &dev->flags)) {
rdev = conf->disks[i].rdev;
rdev_clear_badblocks(rdev, sh->sector,
- STRIPE_SECTORS, 0);
+ conf->stripe_sectors, 0);
rdev_dec_pending(rdev, conf->mddev);
}
if (test_and_clear_bit(R5_MadeGoodRepl, &dev->flags)) {
@@ -5041,7 +5075,7 @@ static void handle_stripe(struct stripe_head *sh)
/* rdev have been moved down */
rdev = conf->disks[i].rdev;
rdev_clear_badblocks(rdev, sh->sector,
- STRIPE_SECTORS, 0);
+ conf->stripe_sectors, 0);
rdev_dec_pending(rdev, conf->mddev);
}
}
@@ -5505,7 +5539,8 @@ static void make_discard_request(struct mddev *mddev, struct bio *bi)
/* Skip discard while reshape is happening */
return;
- logical_sector = bi->bi_iter.bi_sector & ~((sector_t)STRIPE_SECTORS-1);
+ logical_sector = bi->bi_iter.bi_sector &
+ ~((sector_t)conf->stripe_sectors-1);
last_sector = bio_end_sector(bi);
bi->bi_next = NULL;
@@ -5520,7 +5555,7 @@ static void make_discard_request(struct mddev *mddev, struct bio *bi)
last_sector *= conf->chunk_sectors;
for (; logical_sector < last_sector;
- logical_sector += STRIPE_SECTORS) {
+ logical_sector += conf->stripe_sectors) {
DEFINE_WAIT(w);
int d;
again:
@@ -5565,7 +5600,7 @@ static void make_discard_request(struct mddev *mddev, struct bio *bi)
d++)
md_bitmap_startwrite(mddev->bitmap,
sh->sector,
- STRIPE_SECTORS,
+ conf->stripe_sectors,
0);
sh->bm_seq = conf->seq_flush + 1;
set_bit(STRIPE_BIT_DELAY, &sh->state);
@@ -5630,12 +5665,14 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
return true;
}
- logical_sector = bi->bi_iter.bi_sector & ~((sector_t)STRIPE_SECTORS-1);
+ logical_sector = bi->bi_iter.bi_sector &
+ ~((sector_t)conf->stripe_sectors-1);
last_sector = bio_end_sector(bi);
bi->bi_next = NULL;
prepare_to_wait(&conf->wait_for_overlap, &w, TASK_UNINTERRUPTIBLE);
- for (;logical_sector < last_sector; logical_sector += STRIPE_SECTORS) {
+ for (; logical_sector < last_sector;
+ logical_sector += conf->stripe_sectors) {
int previous;
int seq;
@@ -5917,7 +5954,7 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr, int *sk
}
INIT_LIST_HEAD(&stripes);
- for (i = 0; i < reshape_sectors; i += STRIPE_SECTORS) {
+ for (i = 0; i < reshape_sectors; i += conf->stripe_sectors) {
int j;
int skipped_disk = 0;
sh = raid5_get_active_stripe(conf, stripe_addr+i, 0, 0, 1);
@@ -5938,7 +5975,8 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr, int *sk
skipped_disk = 1;
continue;
}
- memset(page_address(sh->dev[j].page), 0, STRIPE_SIZE);
+ memset(page_address(sh->dev[j].page), 0,
+ conf->stripe_size);
set_bit(R5_Expanded, &sh->dev[j].flags);
set_bit(R5_UPTODATE, &sh->dev[j].flags);
}
@@ -5973,7 +6011,7 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr, int *sk
set_bit(STRIPE_EXPAND_SOURCE, &sh->state);
set_bit(STRIPE_HANDLE, &sh->state);
raid5_release_stripe(sh);
- first_sector += STRIPE_SECTORS;
+ first_sector += conf->stripe_sectors;
}
/* Now that the sources are clearly marked, we can release
* the destination stripes
@@ -6079,11 +6117,12 @@ static inline sector_t raid5_sync_request(struct mddev *mddev, sector_t sector_n
if (!test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery) &&
!conf->fullsync &&
!md_bitmap_start_sync(mddev->bitmap, sector_nr, &sync_blocks, 1) &&
- sync_blocks >= STRIPE_SECTORS) {
+ sync_blocks >= conf->stripe_sectors) {
/* we can skip this block, and probably more */
- sync_blocks /= STRIPE_SECTORS;
+ sync_blocks /= conf->stripe_sectors;
*skipped = 1;
- return sync_blocks * STRIPE_SECTORS; /* keep things rounded to whole stripes */
+ /* keep things rounded to whole stripes */
+ return sync_blocks * conf->stripe_sectors;
}
md_bitmap_cond_end_sync(mddev->bitmap, sector_nr, false);
@@ -6116,7 +6155,7 @@ static inline sector_t raid5_sync_request(struct mddev *mddev, sector_t sector_n
raid5_release_stripe(sh);
- return STRIPE_SECTORS;
+ return conf->stripe_sectors;
}
static int retry_aligned_read(struct r5conf *conf, struct bio *raid_bio,
@@ -6139,14 +6178,14 @@ static int retry_aligned_read(struct r5conf *conf, struct bio *raid_bio,
int handled = 0;
logical_sector = raid_bio->bi_iter.bi_sector &
- ~((sector_t)STRIPE_SECTORS-1);
+ ~((sector_t)conf->stripe_sectors-1);
sector = raid5_compute_sector(conf, logical_sector,
0, &dd_idx, NULL);
last_sector = bio_end_sector(raid_bio);
for (; logical_sector < last_sector;
- logical_sector += STRIPE_SECTORS,
- sector += STRIPE_SECTORS,
+ logical_sector += conf->stripe_sectors,
+ sector += conf->stripe_sectors,
scnt++) {
if (scnt < offset)
@@ -6766,7 +6805,7 @@ static int alloc_scratch_buffer(struct r5conf *conf, struct raid5_percpu *percpu
conf->previous_raid_disks),
max(conf->chunk_sectors,
conf->prev_chunk_sectors)
- / STRIPE_SECTORS)) {
+ / conf->stripe_sectors)) {
free_scratch_buffer(conf, percpu);
return -ENOMEM;
}
@@ -6918,6 +6957,11 @@ static struct r5conf *setup_conf(struct mddev *mddev)
conf = kzalloc(sizeof(struct r5conf), GFP_KERNEL);
if (conf == NULL)
goto abort;
+
+ conf->stripe_size = PAGE_SIZE;
+ conf->stripe_shift = PAGE_SHIFT - 9;
+ conf->stripe_sectors = conf->stripe_size >> 9;
+
INIT_LIST_HEAD(&conf->free_list);
INIT_LIST_HEAD(&conf->pending_list);
conf->pending_data = kcalloc(PENDING_IO_MAX,
@@ -7069,8 +7113,9 @@ static struct r5conf *setup_conf(struct mddev *mddev)
conf->min_nr_stripes = NR_STRIPES;
if (mddev->reshape_position != MaxSector) {
int stripes = max_t(int,
- ((mddev->chunk_sectors << 9) / STRIPE_SIZE) * 4,
- ((mddev->new_chunk_sectors << 9) / STRIPE_SIZE) * 4);
+ ((mddev->chunk_sectors << 9) / conf->stripe_size) * 4,
+ ((mddev->new_chunk_sectors << 9) /
+ conf->stripe_size) * 4);
conf->min_nr_stripes = max(NR_STRIPES, stripes);
if (conf->min_nr_stripes != NR_STRIPES)
pr_info("md/raid:%s: force stripe size %d for reshape\n",
@@ -7801,14 +7846,14 @@ static int check_stripe_cache(struct mddev *mddev)
* stripe_heads first.
*/
struct r5conf *conf = mddev->private;
- if (((mddev->chunk_sectors << 9) / STRIPE_SIZE) * 4
+ if (((mddev->chunk_sectors << 9) / conf->stripe_size) * 4
> conf->min_nr_stripes ||
- ((mddev->new_chunk_sectors << 9) / STRIPE_SIZE) * 4
+ ((mddev->new_chunk_sectors << 9) / conf->stripe_size) * 4
> conf->min_nr_stripes) {
pr_warn("md/raid:%s: reshape: not enough stripes. Needed %lu\n",
mdname(mddev),
((max(mddev->chunk_sectors, mddev->new_chunk_sectors) << 9)
- / STRIPE_SIZE)*4);
+ / conf->stripe_size)*4);
return 0;
}
return 1;
@@ -8127,6 +8172,7 @@ static void *raid5_takeover_raid1(struct mddev *mddev)
{
int chunksect;
void *ret;
+ struct r5conf *conf = mddev->private;
if (mddev->raid_disks != 2 ||
mddev->degraded > 1)
@@ -8140,7 +8186,7 @@ static void *raid5_takeover_raid1(struct mddev *mddev)
while (chunksect && (mddev->array_sectors & (chunksect-1)))
chunksect >>= 1;
- if ((chunksect<<9) < STRIPE_SIZE)
+ if ((chunksect<<9) < conf->stripe_size)
/* array size does not allow a suitable chunk size */
return ERR_PTR(-EINVAL);
diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
index f90e0704bed9..e36cf71e8465 100644
--- a/drivers/md/raid5.h
+++ b/drivers/md/raid5.h
@@ -472,32 +472,12 @@ struct disk_info {
*/
#define NR_STRIPES 256
-#define STRIPE_SIZE PAGE_SIZE
-#define STRIPE_SHIFT (PAGE_SHIFT - 9)
-#define STRIPE_SECTORS (STRIPE_SIZE>>9)
#define IO_THRESHOLD 1
#define BYPASS_THRESHOLD 1
#define NR_HASH (PAGE_SIZE / sizeof(struct hlist_head))
#define HASH_MASK (NR_HASH - 1)
#define MAX_STRIPE_BATCH 8
-/* bio's attached to a stripe+device for I/O are linked together in bi_sector
- * order without overlap. There may be several bio's per stripe+device, and
- * a bio could span several devices.
- * When walking this list for a particular stripe+device, we must never proceed
- * beyond a bio that extends past this device, as the next bio might no longer
- * be valid.
- * This function is used to determine the 'next' bio in the list, given the
- * sector of the current stripe+device
- */
-static inline struct bio *r5_next_bio(struct bio *bio, sector_t sector)
-{
- if (bio_end_sector(bio) < sector + STRIPE_SECTORS)
- return bio->bi_next;
- else
- return NULL;
-}
-
/* NOTE NR_STRIPE_HASH_LOCKS must remain below 64.
* This is because we sometimes take all the spinlocks
* and creating that much locking depth can cause
@@ -574,6 +554,9 @@ struct r5conf {
int raid_disks;
int max_nr_stripes;
int min_nr_stripes;
+ unsigned int stripe_size;
+ unsigned int stripe_shift;
+ unsigned int stripe_sectors;
/* reshape_progress is the leading edge of a 'reshape'
* It has value MaxSector when no reshape is happening
@@ -752,6 +735,24 @@ static inline int algorithm_is_DDF(int layout)
return layout >= 8 && layout <= 10;
}
+/* bio's attached to a stripe+device for I/O are linked together in bi_sector
+ * order without overlap. There may be several bio's per stripe+device, and
+ * a bio could span several devices.
+ * When walking this list for a particular stripe+device, we must never proceed
+ * beyond a bio that extends past this device, as the next bio might no longer
+ * be valid.
+ * This function is used to determine the 'next' bio in the list, given the
+ * sector of the current stripe+device
+ */
+static inline struct bio *
+r5_next_bio(struct r5conf *conf, struct bio *bio, sector_t sector)
+{
+ if (bio_end_sector(bio) < sector + conf->stripe_sectors)
+ return bio->bi_next;
+ else
+ return NULL;
+}
+
extern void md_raid5_kick_device(struct r5conf *conf);
extern int raid5_set_cache_size(struct mddev *mddev, int size);
extern sector_t raid5_compute_blocknr(struct stripe_head *sh, int i, int previous);
--
2.25.4
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH v5 02/16] md/raid5: add sysfs entry to set and show stripe_size
2020-07-02 12:06 [PATCH v5 00/16] md/raid5: set STRIPE_SIZE as a configurable value Yufen Yu
2020-07-02 12:06 ` [PATCH v5 01/16] md/raid456: covert macro define of STRIPE_* as members of struct r5conf Yufen Yu
@ 2020-07-02 12:06 ` Yufen Yu
2020-07-02 22:14 ` Song Liu
2020-07-02 12:06 ` [PATCH v5 03/16] md/raid5: set default stripe_size as 4096 Yufen Yu
` (14 subsequent siblings)
16 siblings, 1 reply; 37+ messages in thread
From: Yufen Yu @ 2020-07-02 12:06 UTC (permalink / raw)
To: song; +Cc: linux-raid, neilb, guoqing.jiang, houtao1, yuyufen
Add a stripe_size sysfs entry. For now, we don't support setting stripe_size
via sysfs; the following patches will add that.
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
---
drivers/md/raid5.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 2981b853c388..51bc39dab57b 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -6518,6 +6518,27 @@ raid5_rmw_level = __ATTR(rmw_level, S_IRUGO | S_IWUSR,
raid5_show_rmw_level,
raid5_store_rmw_level);
+static ssize_t
+raid5_show_stripe_size(struct mddev *mddev, char *page)
+{
+ struct r5conf *conf = mddev->private;
+
+ if (conf)
+ return sprintf(page, "%d\n", conf->stripe_size);
+ else
+ return 0;
+}
+
+static ssize_t
+raid5_store_stripe_size(struct mddev *mddev, const char *page, size_t len)
+{
+ return -EINVAL;
+}
+
+static struct md_sysfs_entry
+raid5_stripe_size = __ATTR(stripe_size, 0644,
+ raid5_show_stripe_size,
+ raid5_store_stripe_size);
static ssize_t
raid5_show_preread_threshold(struct mddev *mddev, char *page)
@@ -6706,6 +6727,7 @@ static struct attribute *raid5_attrs[] = {
&raid5_group_thread_cnt.attr,
&raid5_skip_copy.attr,
&raid5_rmw_level.attr,
+ &raid5_stripe_size.attr,
&r5c_journal_mode.attr,
&ppl_write_hint.attr,
NULL,
--
2.25.4
* [PATCH v5 03/16] md/raid5: set default stripe_size as 4096
2020-07-02 12:06 [PATCH v5 00/16] md/raid5: set STRIPE_SIZE as a configurable value Yufen Yu
2020-07-02 12:06 ` [PATCH v5 01/16] md/raid456: covert macro define of STRIPE_* as members of struct r5conf Yufen Yu
2020-07-02 12:06 ` [PATCH v5 02/16] md/raid5: add sysfs entry to set and show stripe_size Yufen Yu
@ 2020-07-02 12:06 ` Yufen Yu
2020-07-02 12:06 ` [PATCH v5 04/16] md/raid5: add a member of r5pages for struct stripe_head Yufen Yu
` (13 subsequent siblings)
16 siblings, 0 replies; 37+ messages in thread
From: Yufen Yu @ 2020-07-02 12:06 UTC (permalink / raw)
To: song; +Cc: linux-raid, neilb, guoqing.jiang, houtao1, yuyufen
In RAID5, if an issued bio is bigger than stripe_size, it is split into
stripe_size units and processed one by one. Even for IOs smaller than
stripe_size, RAID5 still requests at least stripe_size of data from disk.

Currently, stripe_size is equal to PAGE_SIZE. Since filesystems usually
issue bios in units of 4KB, this is no problem when PAGE_SIZE is 4KB. But
with a 64KB PAGE_SIZE, a bio from the filesystem requests 4KB of data while
RAID5 issues IO of at least stripe_size (64KB) each time, wasting disk
bandwidth and xor computation.

To avoid this waste, we want to make stripe_size configurable. This patch
just sets the default stripe_size to 4096. Users can also set a bigger
value for special requirements, such as when the issued IO size is known
to be larger than 4KB.
To evaluate the new feature, we create raid5 device '/dev/md5' with
4 SSD disk and test it on arm64 machine with 64KB PAGE_SIZE.
1) We format /dev/md5 with mkfs.ext4 and mount ext4 with default
configure on /mnt directory. Then, trying to test it by dbench with
command: dbench -D /mnt -t 1000 10. The results are as follows:
'stripe_size = 64KB'
Operation Count AvgLat MaxLat
----------------------------------------
NTCreateX 9805011 0.021 64.728
Close 7202525 0.001 0.120
Rename 415213 0.051 44.681
Unlink 1980066 0.079 93.147
Deltree 240 1.793 6.516
Mkdir 120 0.004 0.007
Qpathinfo 8887512 0.007 37.114
Qfileinfo 1557262 0.001 0.030
Qfsinfo 1629582 0.012 0.152
Sfileinfo 798756 0.040 57.641
Find 3436004 0.019 57.782
WriteX 4887239 0.021 57.638
ReadX 15370483 0.005 37.818
LockX 31934 0.003 0.022
UnlockX 31933 0.001 0.021
Flush 687205 13.302 530.088
Throughput 307.799 MB/sec 10 clients 10 procs max_latency=530.091 ms
-------------------------------------------------------
'stripe_size = 4KB'
Operation Count AvgLat MaxLat
----------------------------------------
NTCreateX 11999166 0.021 36.380
Close 8814128 0.001 0.122
Rename 508113 0.051 29.169
Unlink 2423242 0.070 38.141
Deltree 300 1.885 7.155
Mkdir 150 0.004 0.006
Qpathinfo 10875921 0.007 35.485
Qfileinfo 1905837 0.001 0.032
Qfsinfo 1994304 0.012 0.125
Sfileinfo 977450 0.029 26.489
Find 4204952 0.019 9.361
WriteX 5981890 0.019 27.804
ReadX 18809742 0.004 33.491
LockX 39074 0.003 0.025
UnlockX 39074 0.001 0.014
Flush 841022 10.712 458.848
Throughput 376.777 MB/sec 10 clients 10 procs max_latency=458.852 ms
-------------------------------------------------------
It shows that setting stripe_size to 4KB yields higher throughput
(376.777 vs 307.799 MB/sec) and lower max latency (458.852 vs 530.091 ms)
than setting it to 64KB.
2) We try to evaluate IO throughput for /dev/md5 by fio with config:
[4KB randwrite]
direct=1
numjob=2
iodepth=64
ioengine=libaio
filename=/dev/md5
bs=4KB
rw=randwrite
[64KB write]
direct=1
numjob=2
iodepth=64
ioengine=libaio
filename=/dev/md5
bs=1MB
rw=write
The results are as follows:
              | stripe_size(64KB) | stripe_size(4KB)
--------------+-------------------+------------------
4KB randwrite |      15MB/s       |     100MB/s
--------------+-------------------+------------------
1MB write     |     1000MB/s      |     700MB/s
The results show that when the issued IOs are large (1MB writes), the
64KB stripe_size gives much higher throughput. But for 4KB random
writes, where the IOs issued to the device are smaller, the 4KB
stripe_size performs much better.
Normally, the default value (4096) gives relatively good performance.
But if every issued IO is bigger than 4096 bytes, setting a larger
value may perform better.

Here, we just set the default stripe_size to 4096; setting a different
stripe_size through a sysfs interface will be supported in a following patch.
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
---
drivers/md/raid5.c | 6 +++---
drivers/md/raid5.h | 1 +
2 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 51bc39dab57b..694f6713369d 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -6980,9 +6980,9 @@ static struct r5conf *setup_conf(struct mddev *mddev)
if (conf == NULL)
goto abort;
- conf->stripe_size = PAGE_SIZE;
- conf->stripe_shift = PAGE_SHIFT - 9;
- conf->stripe_sectors = conf->stripe_size >> 9;
+ conf->stripe_size = DEFAULT_STRIPE_SIZE;
+ conf->stripe_shift = ilog2(DEFAULT_STRIPE_SIZE) - 9;
+ conf->stripe_sectors = DEFAULT_STRIPE_SIZE >> 9;
INIT_LIST_HEAD(&conf->free_list);
INIT_LIST_HEAD(&conf->pending_list);
diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
index e36cf71e8465..98698569370c 100644
--- a/drivers/md/raid5.h
+++ b/drivers/md/raid5.h
@@ -477,6 +477,7 @@ struct disk_info {
#define NR_HASH (PAGE_SIZE / sizeof(struct hlist_head))
#define HASH_MASK (NR_HASH - 1)
#define MAX_STRIPE_BATCH 8
+#define DEFAULT_STRIPE_SIZE 4096
/* NOTE NR_STRIPE_HASH_LOCKS must remain below 64.
* This is because we sometimes take all the spinlocks
--
2.25.4
* [PATCH v5 04/16] md/raid5: add a member of r5pages for struct stripe_head
2020-07-02 12:06 [PATCH v5 00/16] md/raid5: set STRIPE_SIZE as a configurable value Yufen Yu
` (2 preceding siblings ...)
2020-07-02 12:06 ` [PATCH v5 03/16] md/raid5: set default stripe_size as 4096 Yufen Yu
@ 2020-07-02 12:06 ` Yufen Yu
2020-07-02 22:56 ` Song Liu
2020-07-02 12:06 ` [PATCH v5 05/16] md/raid5: allocate and free shared pages of r5pages Yufen Yu
` (12 subsequent siblings)
16 siblings, 1 reply; 37+ messages in thread
From: Yufen Yu @ 2020-07-02 12:06 UTC (permalink / raw)
To: song; +Cc: linux-raid, neilb, guoqing.jiang, houtao1, yuyufen
Since grow_buffers() uses alloc_page() to allocate the buffers for
each stripe_head, it allocates a 64KB buffer but uses only 4KB of it
once stripe_size is set to 4096.

To avoid wasting memory, we try to pack multiple sh->dev 'pages' into
one real page. That means multiple sh->dev[i].page pointers will point
to the same page, at different offsets. An example with 64KB PAGE_SIZE
and 4KB stripe_size:
64K PAGE_SIZE
+---+---+---+---+------------------------------+
| | | | |
| | | | |
+-+-+-+-+-+-+-+-+------------------------------+
^ ^ ^ ^
| | | +----------------------------+
| | | |
| | +-------------------+ |
| | | |
| +----------+ | |
| | | |
+-+ | | |
| | | |
+-----+-----+------+-----+------+-----+------+------+
sh | offset(0) | offset(4K) | offset(8K) | offset(12K) |
+ +-----------+------------+------------+-------------+
+----> dev[0].page dev[1].page dev[2].page dev[3].page
After sharing one page this way, users of sh->dev[i].page need to take
care of two things:
1) When issuing a bio for a stripe_head, bi_io_vec.bv_page will point to
the page directly, so we must make sure bv_offset is set to the
correct offset.
2) When computing xor, the page is passed to the compute function, so
we also need to pass along the offset of that page, letting it
compute the correct location within each sh->dev[i].page.
This patch adds a new member, r5pages, to stripe_head to manage all pages
needed by each sh->dev[i]. We also add an 'offset' to each r5dev so that
users can easily get the related page offset, and add helper functions to
get the page and its index in the r5pages array from a disk index.
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
---
drivers/md/raid5.h | 61 ++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 61 insertions(+)
diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
index 98698569370c..61fe26061c92 100644
--- a/drivers/md/raid5.h
+++ b/drivers/md/raid5.h
@@ -246,6 +246,13 @@ struct stripe_head {
int target, target2;
enum sum_check_flags zero_sum_result;
} ops;
+
+ /* These pages will be used by bios in dev[i] */
+ struct r5pages {
+ struct page **page;
+ int size; /* page array size */
+ } pages;
+
struct r5dev {
/* rreq and rvec are used for the replacement device when
* writing data to both devices.
@@ -253,6 +260,7 @@ struct stripe_head {
struct bio req, rreq;
struct bio_vec vec, rvec;
struct page *page, *orig_page;
+ unsigned int offset; /* offset of this page */
struct bio *toread, *read, *towrite, *written;
sector_t sector; /* sector of this page */
unsigned long flags;
@@ -754,6 +762,59 @@ r5_next_bio(struct r5conf *conf, struct bio *bio, sector_t sector)
return NULL;
}
+/*
+ * Return corresponding page index of r5pages array.
+ */
+static inline int raid5_get_page_index(struct stripe_head *sh, int disk_idx)
+{
+ struct r5conf *conf = sh->raid_conf;
+ int cnt;
+
+ WARN_ON(!sh->pages.page);
+ BUG_ON(conf->stripe_size > PAGE_SIZE);
+
+ cnt = PAGE_SIZE / conf->stripe_size;
+ return disk_idx / cnt;
+}
+
+/*
+ * Return offset of the corresponding page for r5dev.
+ */
+static inline int raid5_get_page_offset(struct stripe_head *sh, int disk_idx)
+{
+ struct r5conf *conf = sh->raid_conf;
+ int cnt;
+
+ WARN_ON(!sh->pages.page);
+ BUG_ON(conf->stripe_size > PAGE_SIZE);
+
+ cnt = PAGE_SIZE / conf->stripe_size;
+ return (disk_idx % cnt) * conf->stripe_size;
+}
+
+/*
+ * Return corresponding page address for r5dev.
+ */
+static inline struct page *
+raid5_get_dev_page(struct stripe_head *sh, int disk_idx)
+{
+ int idx;
+
+ WARN_ON(!sh->pages.page);
+ idx = raid5_get_page_index(sh, disk_idx);
+ return sh->pages.page[idx];
+}
+
+/*
+ * We want to let multiple buffers to share one real page for
+ * stripe_head when PAGE_SIZE is bigger than stripe_size. If
+ * they are equal, no need to use this strategy.
+ */
+static inline int raid5_stripe_pages_shared(struct r5conf *conf)
+{
+ return conf->stripe_size < PAGE_SIZE;
+}
+
extern void md_raid5_kick_device(struct r5conf *conf);
extern int raid5_set_cache_size(struct mddev *mddev, int size);
extern sector_t raid5_compute_blocknr(struct stripe_head *sh, int i, int previous);
--
2.25.4
* [PATCH v5 05/16] md/raid5: allocate and free shared pages of r5pages
2020-07-02 12:06 [PATCH v5 00/16] md/raid5: set STRIPE_SIZE as a configurable value Yufen Yu
` (3 preceding siblings ...)
2020-07-02 12:06 ` [PATCH v5 04/16] md/raid5: add a member of r5pages for struct stripe_head Yufen Yu
@ 2020-07-02 12:06 ` Yufen Yu
2020-07-02 12:06 ` [PATCH v5 06/16] md/raid5: set correct page offset for bi_io_vec in ops_run_io() Yufen Yu
` (11 subsequent siblings)
16 siblings, 0 replies; 37+ messages in thread
From: Yufen Yu @ 2020-07-02 12:06 UTC (permalink / raw)
To: song; +Cc: linux-raid, neilb, guoqing.jiang, houtao1, yuyufen
When PAGE_SIZE is bigger than stripe_size, try to allocate the pages of
r5pages in grow_buffers() and free them in shrink_buffers(). Then set
sh->dev[i].page and sh->dev[i].offset to the corresponding page and offset
in the array. Without shared pages enabled, the offset is simply set to 0.
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
---
drivers/md/raid5.c | 113 ++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 106 insertions(+), 7 deletions(-)
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 694f6713369d..920e1e147e8a 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -448,15 +448,72 @@ static struct stripe_head *get_free_stripe(struct r5conf *conf, int hash)
return sh;
}
+/*
+ * Try to free all pages in r5pages array.
+ */
+static void free_stripe_pages(struct stripe_head *sh)
+{
+ int i;
+ struct page *p;
+
+ /* The page pool has not been allocated */
+ if (!sh->pages.page)
+ return;
+
+ for (i = 0; i < sh->pages.size; i++) {
+ p = sh->pages.page[i];
+ if (p)
+ put_page(p);
+ sh->pages.page[i] = NULL;
+ }
+}
+
+/*
+ * Allocate pages for r5pages.
+ */
+static int alloc_stripe_pages(struct stripe_head *sh, gfp_t gfp)
+{
+ int i;
+ struct page *p;
+
+ for (i = 0; i < sh->pages.size; i++) {
+ /* The page has already been allocated */
+ if (sh->pages.page[i])
+ continue;
+
+ p = alloc_page(gfp);
+ if (!p) {
+ free_stripe_pages(sh);
+ return -ENOMEM;
+ }
+ sh->pages.page[i] = p;
+ }
+ return 0;
+}
+
static void shrink_buffers(struct stripe_head *sh)
{
struct page *p;
int i;
int num = sh->raid_conf->pool_size;
- for (i = 0; i < num ; i++) {
+ if (raid5_stripe_pages_shared(sh->raid_conf))
+ free_stripe_pages(sh); /* Free pages in r5pages */
+
+ for (i = 0; i < num; i++) {
WARN_ON(sh->dev[i].page != sh->dev[i].orig_page);
p = sh->dev[i].page;
+
+ /*
+ * If we use pages in r5pages, these pages have been
+ * freed in free_stripe_pages().
+ */
+ if (raid5_stripe_pages_shared(sh->raid_conf)) {
+ if (p)
+ sh->dev[i].page = NULL;
+ continue;
+ }
+
if (!p)
continue;
sh->dev[i].page = NULL;
@@ -469,14 +526,26 @@ static int grow_buffers(struct stripe_head *sh, gfp_t gfp)
int i;
int num = sh->raid_conf->pool_size;
+ if (raid5_stripe_pages_shared(sh->raid_conf) &&
+ alloc_stripe_pages(sh, gfp))
+ return -ENOMEM;
+
for (i = 0; i < num; i++) {
struct page *page;
+ unsigned int offset;
- if (!(page = alloc_page(gfp))) {
- return 1;
+ if (raid5_stripe_pages_shared(sh->raid_conf)) {
+ page = raid5_get_dev_page(sh, i);
+ offset = raid5_get_page_offset(sh, i);
+ } else {
+ page = alloc_page(gfp);
+ if (!page)
+ return -ENOMEM;
+ offset = 0;
}
sh->dev[i].page = page;
sh->dev[i].orig_page = page;
+ sh->dev[i].offset = offset;
}
return 0;
@@ -2146,11 +2215,35 @@ static void raid_run_ops(struct stripe_head *sh, unsigned long ops_request)
static void free_stripe(struct kmem_cache *sc, struct stripe_head *sh)
{
+ kfree(sh->pages.page);
+
if (sh->ppl_page)
__free_page(sh->ppl_page);
kmem_cache_free(sc, sh);
}
+static int
+init_stripe_shared_pages(struct stripe_head *sh, struct r5conf *conf)
+{
+ int nr_page;
+ int cnt;
+
+ BUG_ON(conf->stripe_size > PAGE_SIZE);
+ if (!raid5_stripe_pages_shared(conf) || sh->pages.page)
+ return 0;
+
+ /* Each of the sh->dev[i] needs one conf->stripe_size */
+ cnt = PAGE_SIZE / conf->stripe_size;
+ nr_page = (conf->raid_disks + cnt - 1) / cnt;
+
+ sh->pages.page = kcalloc(nr_page, sizeof(struct page *), GFP_KERNEL);
+ if (!sh->pages.page)
+ return -ENOMEM;
+ sh->pages.size = nr_page;
+
+ return 0;
+}
+
static struct stripe_head *alloc_stripe(struct kmem_cache *sc, gfp_t gfp,
int disks, struct r5conf *conf)
{
@@ -2177,14 +2270,20 @@ static struct stripe_head *alloc_stripe(struct kmem_cache *sc, gfp_t gfp,
if (raid5_has_ppl(conf)) {
sh->ppl_page = alloc_page(gfp);
- if (!sh->ppl_page) {
- free_stripe(sc, sh);
- sh = NULL;
- }
+ if (!sh->ppl_page)
+ goto fail;
}
+
+ if (init_stripe_shared_pages(sh, conf))
+ goto fail;
}
return sh;
+
+fail:
+ free_stripe(sc, sh);
+ return NULL;
}
+
static int grow_one_stripe(struct r5conf *conf, gfp_t gfp)
{
struct stripe_head *sh;
--
2.25.4
* [PATCH v5 06/16] md/raid5: set correct page offset for bi_io_vec in ops_run_io()
2020-07-02 12:06 [PATCH v5 00/16] md/raid5: set STRIPE_SIZE as a configurable value Yufen Yu
` (4 preceding siblings ...)
2020-07-02 12:06 ` [PATCH v5 05/16] md/raid5: allocate and free shared pages of r5pages Yufen Yu
@ 2020-07-02 12:06 ` Yufen Yu
2020-07-02 12:06 ` [PATCH v5 07/16] md/raid5: set correct page offset for async_copy_data() Yufen Yu
` (10 subsequent siblings)
16 siblings, 0 replies; 37+ messages in thread
From: Yufen Yu @ 2020-07-02 12:06 UTC (permalink / raw)
To: song; +Cc: linux-raid, neilb, guoqing.jiang, houtao1, yuyufen
After using r5pages for each sh->dev[i], we need to set the correct offset
of that page in bi_io_vec when issuing a bio. Without r5pages, the offset
value is zero.
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
---
drivers/md/raid5.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 920e1e147e8a..0f6ec27cf620 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -1200,7 +1200,7 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s)
sh->dev[i].vec.bv_page = sh->dev[i].page;
bi->bi_vcnt = 1;
bi->bi_io_vec[0].bv_len = conf->stripe_size;
- bi->bi_io_vec[0].bv_offset = 0;
+ bi->bi_io_vec[0].bv_offset = sh->dev[i].offset;
bi->bi_iter.bi_size = conf->stripe_size;
bi->bi_write_hint = sh->dev[i].write_hint;
if (!rrdev)
@@ -1254,7 +1254,7 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s)
sh->dev[i].rvec.bv_page = sh->dev[i].page;
rbi->bi_vcnt = 1;
rbi->bi_io_vec[0].bv_len = conf->stripe_size;
- rbi->bi_io_vec[0].bv_offset = 0;
+ rbi->bi_io_vec[0].bv_offset = sh->dev[i].offset;
rbi->bi_iter.bi_size = conf->stripe_size;
rbi->bi_write_hint = sh->dev[i].write_hint;
sh->dev[i].write_hint = RWH_WRITE_LIFE_NOT_SET;
--
2.25.4
* [PATCH v5 07/16] md/raid5: set correct page offset for async_copy_data()
2020-07-02 12:06 [PATCH v5 00/16] md/raid5: set STRIPE_SIZE as a configurable value Yufen Yu
` (5 preceding siblings ...)
2020-07-02 12:06 ` [PATCH v5 06/16] md/raid5: set correct page offset for bi_io_vec in ops_run_io() Yufen Yu
@ 2020-07-02 12:06 ` Yufen Yu
2020-07-02 12:06 ` [PATCH v5 08/16] md/raid5: resize stripes and set correct offset when reshape array Yufen Yu
` (9 subsequent siblings)
16 siblings, 0 replies; 37+ messages in thread
From: Yufen Yu @ 2020-07-02 12:06 UTC (permalink / raw)
To: song; +Cc: linux-raid, neilb, guoqing.jiang, houtao1, yuyufen
ops_run_biofill() and ops_run_biodrain() call async_copy_data() to copy
sh->dev[i].page from or to a bio. They also need to pass the correct
page offset for dev->page when r5pages is in use.
Without changing the original code logic, we simply replace 'page_offset'
with 'page_offset + poff'. When r5pages is not used, poff is zero.
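As an illustrative sketch (not the kernel code), the effect of the
'page_offset + poff' change can be modelled with a plain memcpy into a
shared page; poff is this dev's slice offset inside the shared page and
degenerates to 0 without r5pages:

```c
#include <assert.h>
#include <string.h>

/*
 * Illustrative stand-in for the async_memcpy() call in async_copy_data():
 * the destination inside the (possibly shared) stripe page is
 * page_offset + poff.  With poff == 0 this is the old behaviour.
 */
static void copy_to_stripe(unsigned char *page, unsigned int page_offset,
			   unsigned int poff, const unsigned char *bio_buf,
			   unsigned int b_offset, unsigned int clen)
{
	memcpy(page + page_offset + poff, bio_buf + b_offset, clen);
}
```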
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
---
drivers/md/raid5.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 0f6ec27cf620..e554d073113b 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -1296,7 +1296,7 @@ static void ops_run_io(struct stripe_head *sh, struct stripe_head_state *s)
static struct dma_async_tx_descriptor *
async_copy_data(int frombio, struct bio *bio, struct page **page,
- sector_t sector, struct dma_async_tx_descriptor *tx,
+ unsigned int poff, sector_t sector, struct dma_async_tx_descriptor *tx,
struct stripe_head *sh, int no_skipcopy)
{
struct bio_vec bvl;
@@ -1342,11 +1342,13 @@ async_copy_data(int frombio, struct bio *bio, struct page **page,
!no_skipcopy)
*page = bio_page;
else
- tx = async_memcpy(*page, bio_page, page_offset,
- b_offset, clen, &submit);
+ tx = async_memcpy(*page, bio_page,
+ page_offset + poff, b_offset,
+ clen, &submit);
} else
tx = async_memcpy(bio_page, *page, b_offset,
- page_offset, clen, &submit);
+ page_offset + poff,
+ clen, &submit);
}
/* chain the operations */
submit.depend_tx = tx;
@@ -1418,7 +1420,8 @@ static void ops_run_biofill(struct stripe_head *sh)
while (rbi && rbi->bi_iter.bi_sector <
dev->sector + sh->raid_conf->stripe_sectors) {
tx = async_copy_data(0, rbi, &dev->page,
- dev->sector, tx, sh, 0);
+ dev->offset,
+ dev->sector, tx, sh, 0);
rbi = r5_next_bio(sh->raid_conf, rbi,
dev->sector);
}
@@ -1846,6 +1849,7 @@ ops_run_biodrain(struct stripe_head *sh, struct dma_async_tx_descriptor *tx)
set_bit(R5_Discard, &dev->flags);
else {
tx = async_copy_data(1, wbi, &dev->page,
+ dev->offset,
dev->sector, tx, sh,
r5c_is_writeback(conf->log));
if (dev->page != dev->orig_page &&
--
2.25.4
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH v5 08/16] md/raid5: resize stripes and set correct offset when reshape array
2020-07-02 12:06 [PATCH v5 00/16] md/raid5: set STRIPE_SIZE as a configurable value Yufen Yu
` (6 preceding siblings ...)
2020-07-02 12:06 ` [PATCH v5 07/16] md/raid5: set correct page offset for async_copy_data() Yufen Yu
@ 2020-07-02 12:06 ` Yufen Yu
2020-07-02 12:06 ` [PATCH v5 09/16] md/raid5: add new xor function to support different page offset Yufen Yu
` (8 subsequent siblings)
16 siblings, 0 replies; 37+ messages in thread
From: Yufen Yu @ 2020-07-02 12:06 UTC (permalink / raw)
To: song; +Cc: linux-raid, neilb, guoqing.jiang, houtao1, yuyufen
When reshaping the array, try to reuse the shared pages of the old
stripe_head, and allocate more for the new one if needed. At the same
time, set the correct offset when calling handle_stripe_expansion().
Note that resize_stripes() is only called when growing the number of
array disks, so we need not worry about leaking the old r5pages.
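A minimal sketch of the shared-page addressing assumed here (the real
helpers are raid5_get_dev_page()/raid5_get_page_offset() introduced
earlier in the series; the arithmetic below is illustrative only):

```c
#include <assert.h>

/*
 * With shared pages, each PAGE_SIZE page holds PAGE_SIZE/stripe_size
 * buffers, so dev i maps to a page index and an offset inside that page.
 */
static unsigned int dev_page_index(unsigned int i, unsigned int stripe_size,
				   unsigned int page_size)
{
	return i / (page_size / stripe_size);
}

static unsigned int dev_page_offset(unsigned int i, unsigned int stripe_size,
				    unsigned int page_size)
{
	return (i % (page_size / stripe_size)) * stripe_size;
}
```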
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
---
drivers/md/raid5.c | 51 +++++++++++++++++++++++++++++++++++++++-------
1 file changed, 44 insertions(+), 7 deletions(-)
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index e554d073113b..689a36c8e723 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -2493,10 +2493,20 @@ static int resize_stripes(struct r5conf *conf, int newsize)
osh = get_free_stripe(conf, hash);
unlock_device_hash_lock(conf, hash);
- for(i=0; i<conf->pool_size; i++) {
+ if (raid5_stripe_pages_shared(conf)) {
+ /* We reuse pages in r5pages of old stripe head */
+ for (i = 0; i < osh->pages.size; i++) {
+ nsh->pages.page[i] = osh->pages.page[i];
+ osh->pages.page[i] = NULL;
+ }
+ }
+
+ for (i = 0; i < conf->pool_size; i++) {
nsh->dev[i].page = osh->dev[i].page;
nsh->dev[i].orig_page = osh->dev[i].page;
+ nsh->dev[i].offset = osh->dev[i].offset;
}
+
nsh->hash_lock_index = hash;
free_stripe(conf->slab_cache, osh);
cnt++;
@@ -2543,17 +2553,42 @@ static int resize_stripes(struct r5conf *conf, int newsize)
/* Step 4, return new stripes to service */
while(!list_empty(&newstripes)) {
+ struct page *p;
+ unsigned int offset;
nsh = list_entry(newstripes.next, struct stripe_head, lru);
list_del_init(&nsh->lru);
- for (i=conf->raid_disks; i < newsize; i++)
- if (nsh->dev[i].page == NULL) {
- struct page *p = alloc_page(GFP_NOIO);
- nsh->dev[i].page = p;
- nsh->dev[i].orig_page = p;
+ /*
+ * If we use r5pages (i.e. pages.size is not zero), allocate
+ * the pages needed by the new stripe_head.
+ */
+ for (i = 0; i < nsh->pages.size; i++) {
+ if (nsh->pages.page[i] == NULL) {
+ p = alloc_page(GFP_NOIO);
+ if (!p)
+ err = -ENOMEM;
+ nsh->pages.page[i] = p;
+ }
+ }
+
+ for (i = conf->raid_disks; i < newsize; i++) {
+ if (nsh->dev[i].page)
+ continue;
+
+ if (raid5_stripe_pages_shared(conf)) {
+ p = raid5_get_dev_page(nsh, i);
+ offset = raid5_get_page_offset(nsh, i);
+ } else {
+ p = alloc_page(GFP_NOIO);
if (!p)
err = -ENOMEM;
+ offset = 0;
}
+
+ nsh->dev[i].page = p;
+ nsh->dev[i].orig_page = p;
+ nsh->dev[i].offset = offset;
+ }
raid5_release_stripe(nsh);
}
/* critical section pass, GFP_NOIO no longer needed */
@@ -4476,7 +4511,9 @@ static void handle_stripe_expansion(struct r5conf *conf, struct stripe_head *sh)
/* place all the copies on one channel */
init_async_submit(&submit, 0, tx, NULL, NULL, NULL);
tx = async_memcpy(sh2->dev[dd_idx].page,
- sh->dev[i].page, 0, 0,
+ sh->dev[i].page,
+ sh2->dev[dd_idx].offset,
+ sh->dev[i].offset,
conf->stripe_size,
&submit);
--
2.25.4
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH v5 09/16] md/raid5: add new xor function to support different page offset
2020-07-02 12:06 [PATCH v5 00/16] md/raid5: set STRIPE_SIZE as a configurable value Yufen Yu
` (7 preceding siblings ...)
2020-07-02 12:06 ` [PATCH v5 08/16] md/raid5: resize stripes and set correct offset when reshape array Yufen Yu
@ 2020-07-02 12:06 ` Yufen Yu
2020-07-02 12:06 ` [PATCH v5 10/16] md/raid5: add offset array in scribble buffer Yufen Yu
` (7 subsequent siblings)
16 siblings, 0 replies; 37+ messages in thread
From: Yufen Yu @ 2020-07-02 12:06 UTC (permalink / raw)
To: song; +Cc: linux-raid, neilb, guoqing.jiang, houtao1, yuyufen
RAID5 calls async_xor() and async_xor_val() to compute xor. However,
both of them require a common src/dst page offset. After introducing
the shared pages of r5pages, we want these xor compute functions to
support different src/dst page offsets.
Here, we add two new functions, async_xor_offsets() and
async_xor_val_offsets(), corresponding to async_xor() and async_xor_val().
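The synchronous fallback path can be sketched as follows (illustrative
only; the real code goes through xor_blocks() and the scribble region):
each source may sit at its own offset, and a NULL src_offset array falls
back to the common offset, matching the old async_xor() behaviour:

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of do_sync_xor_offsets(): xor each source, at its own offset
 * when src_offset is given, into dest at the common dest offset. */
static void sync_xor_offsets(unsigned char *dest, unsigned int offset,
			     unsigned char **srcs, unsigned int *src_offset,
			     int src_cnt, size_t len)
{
	for (int i = 0; i < src_cnt; i++) {
		unsigned char *s = srcs[i] +
			(src_offset ? src_offset[i] : offset);
		for (size_t j = 0; j < len; j++)
			dest[offset + j] ^= s[j];
	}
}
```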
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
---
crypto/async_tx/async_xor.c | 120 +++++++++++++++++++++++++++++++-----
include/linux/async_tx.h | 11 ++++
2 files changed, 114 insertions(+), 17 deletions(-)
diff --git a/crypto/async_tx/async_xor.c b/crypto/async_tx/async_xor.c
index 4e5eebe52e6a..f5666ac84d34 100644
--- a/crypto/async_tx/async_xor.c
+++ b/crypto/async_tx/async_xor.c
@@ -97,7 +97,8 @@ do_async_xor(struct dma_chan *chan, struct dmaengine_unmap_data *unmap,
}
static void
-do_sync_xor(struct page *dest, struct page **src_list, unsigned int offset,
+do_sync_xor_offsets(struct page *dest, unsigned int offset,
+ struct page **src_list, unsigned int *src_offset,
int src_cnt, size_t len, struct async_submit_ctl *submit)
{
int i;
@@ -114,7 +115,8 @@ do_sync_xor(struct page *dest, struct page **src_list, unsigned int offset,
/* convert to buffer pointers */
for (i = 0; i < src_cnt; i++)
if (src_list[i])
- srcs[xor_src_cnt++] = page_address(src_list[i]) + offset;
+ srcs[xor_src_cnt++] = page_address(src_list[i]) +
+ (src_offset ? src_offset[i] : offset);
src_cnt = xor_src_cnt;
/* set destination address */
dest_buf = page_address(dest) + offset;
@@ -135,11 +137,31 @@ do_sync_xor(struct page *dest, struct page **src_list, unsigned int offset,
async_tx_sync_epilog(submit);
}
+static inline bool
+dma_xor_aligned_offsets(struct dma_device *device, unsigned int offset,
+ unsigned int *src_offset, int src_cnt, int len)
+{
+ int i;
+
+ if (!is_dma_xor_aligned(device, offset, 0, len))
+ return false;
+
+ if (!src_offset)
+ return true;
+
+ for (i = 0; i < src_cnt; i++) {
+ if (!is_dma_xor_aligned(device, src_offset[i], 0, len))
+ return false;
+ }
+ return true;
+}
+
/**
- * async_xor - attempt to xor a set of blocks with a dma engine.
+ * async_xor_offsets - attempt to xor a set of blocks with a dma engine.
* @dest: destination page
+ * @offset: dst offset to start transaction
* @src_list: array of source pages
- * @offset: common src/dst offset to start transaction
+ * @src_offset: array of source page offsets; NULL means common src/dst offset
* @src_cnt: number of source pages
* @len: length in bytes
* @submit: submission / completion modifiers
@@ -157,8 +179,9 @@ do_sync_xor(struct page *dest, struct page **src_list, unsigned int offset,
* is not specified.
*/
struct dma_async_tx_descriptor *
-async_xor(struct page *dest, struct page **src_list, unsigned int offset,
- int src_cnt, size_t len, struct async_submit_ctl *submit)
+async_xor_offsets(struct page *dest, unsigned int offset,
+ struct page **src_list, unsigned int *src_offset,
+ int src_cnt, size_t len, struct async_submit_ctl *submit)
{
struct dma_chan *chan = async_tx_find_channel(submit, DMA_XOR,
&dest, 1, src_list,
@@ -171,7 +194,8 @@ async_xor(struct page *dest, struct page **src_list, unsigned int offset,
if (device)
unmap = dmaengine_get_unmap_data(device->dev, src_cnt+1, GFP_NOWAIT);
- if (unmap && is_dma_xor_aligned(device, offset, 0, len)) {
+ if (unmap && dma_xor_aligned_offsets(device, offset,
+ src_offset, src_cnt, len)) {
struct dma_async_tx_descriptor *tx;
int i, j;
@@ -184,7 +208,8 @@ async_xor(struct page *dest, struct page **src_list, unsigned int offset,
continue;
unmap->to_cnt++;
unmap->addr[j++] = dma_map_page(device->dev, src_list[i],
- offset, len, DMA_TO_DEVICE);
+ src_offset ? src_offset[i] : offset,
+ len, DMA_TO_DEVICE);
}
/* map it bidirectional as it may be re-used as a source */
@@ -213,11 +238,42 @@ async_xor(struct page *dest, struct page **src_list, unsigned int offset,
/* wait for any prerequisite operations */
async_tx_quiesce(&submit->depend_tx);
- do_sync_xor(dest, src_list, offset, src_cnt, len, submit);
+ do_sync_xor_offsets(dest, offset, src_list, src_offset,
+ src_cnt, len, submit);
return NULL;
}
}
+EXPORT_SYMBOL_GPL(async_xor_offsets);
+
+/**
+ * async_xor - attempt to xor a set of blocks with a dma engine.
+ * @dest: destination page
+ * @src_list: array of source pages
+ * @offset: common src/dst offset to start transaction
+ * @src_cnt: number of source pages
+ * @len: length in bytes
+ * @submit: submission / completion modifiers
+ *
+ * honored flags: ASYNC_TX_ACK, ASYNC_TX_XOR_ZERO_DST, ASYNC_TX_XOR_DROP_DST
+ *
+ * xor_blocks always uses the dest as a source so the
+ * ASYNC_TX_XOR_ZERO_DST flag must be set to not include dest data in
+ * the calculation. The assumption with dma engines is that they only
+ * use the destination buffer as a source when it is explicitly specified
+ * in the source list.
+ *
+ * src_list note: if the dest is also a source it must be at index zero.
+ * The contents of this array will be overwritten if a scribble region
+ * is not specified.
+ */
+struct dma_async_tx_descriptor *
+async_xor(struct page *dest, struct page **src_list, unsigned int offset,
+ int src_cnt, size_t len, struct async_submit_ctl *submit)
+{
+ return async_xor_offsets(dest, offset, src_list, NULL,
+ src_cnt, len, submit);
+}
EXPORT_SYMBOL_GPL(async_xor);
static int page_is_zero(struct page *p, unsigned int offset, size_t len)
@@ -237,10 +293,11 @@ xor_val_chan(struct async_submit_ctl *submit, struct page *dest,
}
/**
- * async_xor_val - attempt a xor parity check with a dma engine.
+ * async_xor_val_offsets - attempt a xor parity check with a dma engine.
* @dest: destination page used if the xor is performed synchronously
+ * @offset: dest offset in pages to start transaction
* @src_list: array of source pages
- * @offset: offset in pages to start transaction
+ * @src_offset: array of source page offsets; NULL means common src/dst offset
* @src_cnt: number of source pages
* @len: length in bytes
* @result: 0 if sum == 0 else non-zero
@@ -253,9 +310,10 @@ xor_val_chan(struct async_submit_ctl *submit, struct page *dest,
* is not specified.
*/
struct dma_async_tx_descriptor *
-async_xor_val(struct page *dest, struct page **src_list, unsigned int offset,
- int src_cnt, size_t len, enum sum_check_flags *result,
- struct async_submit_ctl *submit)
+async_xor_val_offsets(struct page *dest, unsigned int offset,
+ struct page **src_list, unsigned int *src_offset,
+ int src_cnt, size_t len, enum sum_check_flags *result,
+ struct async_submit_ctl *submit)
{
struct dma_chan *chan = xor_val_chan(submit, dest, src_list, src_cnt, len);
struct dma_device *device = chan ? chan->device : NULL;
@@ -268,7 +326,7 @@ async_xor_val(struct page *dest, struct page **src_list, unsigned int offset,
unmap = dmaengine_get_unmap_data(device->dev, src_cnt, GFP_NOWAIT);
if (unmap && src_cnt <= device->max_xor &&
- is_dma_xor_aligned(device, offset, 0, len)) {
+ dma_xor_aligned_offsets(device, offset, src_offset, src_cnt, len)) {
unsigned long dma_prep_flags = 0;
int i;
@@ -281,7 +339,8 @@ async_xor_val(struct page *dest, struct page **src_list, unsigned int offset,
for (i = 0; i < src_cnt; i++) {
unmap->addr[i] = dma_map_page(device->dev, src_list[i],
- offset, len, DMA_TO_DEVICE);
+ src_offset ? src_offset[i] : offset,
+ len, DMA_TO_DEVICE);
unmap->to_cnt++;
}
unmap->len = len;
@@ -312,7 +371,8 @@ async_xor_val(struct page *dest, struct page **src_list, unsigned int offset,
submit->flags |= ASYNC_TX_XOR_DROP_DST;
submit->flags &= ~ASYNC_TX_ACK;
- tx = async_xor(dest, src_list, offset, src_cnt, len, submit);
+ tx = async_xor_offsets(dest, offset, src_list, src_offset,
+ src_cnt, len, submit);
async_tx_quiesce(&tx);
@@ -325,6 +385,32 @@ async_xor_val(struct page *dest, struct page **src_list, unsigned int offset,
return tx;
}
+EXPORT_SYMBOL_GPL(async_xor_val_offsets);
+
+/**
+ * async_xor_val - attempt a xor parity check with a dma engine.
+ * @dest: destination page used if the xor is performed synchronously
+ * @src_list: array of source pages
+ * @offset: offset in pages to start transaction
+ * @src_cnt: number of source pages
+ * @len: length in bytes
+ * @result: 0 if sum == 0 else non-zero
+ * @submit: submission / completion modifiers
+ *
+ * honored flags: ASYNC_TX_ACK
+ *
+ * src_list note: if the dest is also a source it must be at index zero.
+ * The contents of this array will be overwritten if a scribble region
+ * is not specified.
+ */
+struct dma_async_tx_descriptor *
+async_xor_val(struct page *dest, struct page **src_list, unsigned int offset,
+ int src_cnt, size_t len, enum sum_check_flags *result,
+ struct async_submit_ctl *submit)
+{
+ return async_xor_val_offsets(dest, offset, src_list, NULL, src_cnt,
+ len, result, submit);
+}
EXPORT_SYMBOL_GPL(async_xor_val);
MODULE_AUTHOR("Intel Corporation");
diff --git a/include/linux/async_tx.h b/include/linux/async_tx.h
index 75e582b8d2d9..8d79e2de06bd 100644
--- a/include/linux/async_tx.h
+++ b/include/linux/async_tx.h
@@ -162,11 +162,22 @@ struct dma_async_tx_descriptor *
async_xor(struct page *dest, struct page **src_list, unsigned int offset,
int src_cnt, size_t len, struct async_submit_ctl *submit);
+struct dma_async_tx_descriptor *
+async_xor_offsets(struct page *dest, unsigned int offset,
+ struct page **src_list, unsigned int *src_offset,
+ int src_cnt, size_t len, struct async_submit_ctl *submit);
+
struct dma_async_tx_descriptor *
async_xor_val(struct page *dest, struct page **src_list, unsigned int offset,
int src_cnt, size_t len, enum sum_check_flags *result,
struct async_submit_ctl *submit);
+struct dma_async_tx_descriptor *
+async_xor_val_offsets(struct page *dest, unsigned int offset,
+ struct page **src_list, unsigned int *src_offset,
+ int src_cnt, size_t len, enum sum_check_flags *result,
+ struct async_submit_ctl *submit);
+
struct dma_async_tx_descriptor *
async_memcpy(struct page *dest, struct page *src, unsigned int dest_offset,
unsigned int src_offset, size_t len,
--
2.25.4
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH v5 10/16] md/raid5: add offset array in scribble buffer
2020-07-02 12:06 [PATCH v5 00/16] md/raid5: set STRIPE_SIZE as a configurable value Yufen Yu
` (8 preceding siblings ...)
2020-07-02 12:06 ` [PATCH v5 09/16] md/raid5: add new xor function to support different page offset Yufen Yu
@ 2020-07-02 12:06 ` Yufen Yu
2020-07-02 12:06 ` [PATCH v5 11/16] md/raid5: compute xor with correct page offset Yufen Yu
` (6 subsequent siblings)
16 siblings, 0 replies; 37+ messages in thread
From: Yufen Yu @ 2020-07-02 12:06 UTC (permalink / raw)
To: song; +Cc: linux-raid, neilb, guoqing.jiang, houtao1, yuyufen
When shared buffers are enabled for a stripe_head, we need an offset
array to record each page's offset for computing xor. To avoid repeatedly
allocating a new array each time, add a memory region to the scribble
buffer to record the offsets.
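The resulting scribble layout can be sketched as below (addr_conv_t is
stood in by a pointer-sized type, and the function names are illustrative;
in the kernel the equivalent arithmetic lives in scribble_alloc() and
to_addr_offs()):

```c
#include <assert.h>
#include <stddef.h>

typedef void *addr_conv_t;	/* pointer-sized stand-in for the kernel type */

/* obj_size now reserves a third region of num+2 unsigned ints. */
static size_t scribble_obj_size(int num)
{
	return sizeof(void *) * (num + 2) +		/* page pointer list  */
	       sizeof(addr_conv_t) * (num + 2) +	/* addr_conv_t region */
	       sizeof(unsigned int) * (num + 2);	/* new: offset array  */
}

/* The offset array starts right after the addr_conv_t region, which is
 * what to_addr_offs() computes from to_addr_conv() + sh->disks + 2. */
static unsigned int *to_offs(void *scribble, int num)
{
	return (unsigned int *)((char *)scribble +
		(sizeof(void *) + sizeof(addr_conv_t)) * (num + 2));
}
```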
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
---
drivers/md/raid5.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 689a36c8e723..b14d5909f6a9 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -1477,6 +1477,15 @@ static addr_conv_t *to_addr_conv(struct stripe_head *sh,
return (void *) (to_addr_page(percpu, i) + sh->disks + 2);
}
+/*
+ * Return a pointer to the array that records page offsets.
+ */
+static unsigned int *
+to_addr_offs(struct stripe_head *sh, struct raid5_percpu *percpu)
+{
+ return (unsigned int *) (to_addr_conv(sh, percpu, 0) + sh->disks + 2);
+}
+
static struct dma_async_tx_descriptor *
ops_run_compute5(struct stripe_head *sh, struct raid5_percpu *percpu)
{
@@ -2360,8 +2369,9 @@ static int scribble_alloc(struct raid5_percpu *percpu,
int num, int cnt)
{
size_t obj_size =
- sizeof(struct page *) * (num+2) +
- sizeof(addr_conv_t) * (num+2);
+ sizeof(struct page *) * (num + 2) +
+ sizeof(addr_conv_t) * (num + 2) +
+ sizeof(unsigned int) * (num + 2);
void *scribble;
/*
--
2.25.4
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH v5 11/16] md/raid5: compute xor with correct page offset
2020-07-02 12:06 [PATCH v5 00/16] md/raid5: set STRIPE_SIZE as a configurable value Yufen Yu
` (9 preceding siblings ...)
2020-07-02 12:06 ` [PATCH v5 10/16] md/raid5: add offset array in scribble buffer Yufen Yu
@ 2020-07-02 12:06 ` Yufen Yu
2020-07-02 12:06 ` [PATCH v5 12/16] md/raid5: support config stripe_size by sysfs entry Yufen Yu
` (5 subsequent siblings)
16 siblings, 0 replies; 37+ messages in thread
From: Yufen Yu @ 2020-07-02 12:06 UTC (permalink / raw)
To: song; +Cc: linux-raid, neilb, guoqing.jiang, houtao1, yuyufen
When computing xor, the page addresses are passed to the compute
functions. Now that pages may come from r5pages, we also need to pass
the page offsets so the functions know the correct location within
each page.
For now, raid5-cache and raid5-ppl are only supported when PAGE_SIZE is
4096. In that case shared pages are not used and dev->offset is 0, so we
can use that value directly.
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
---
drivers/md/raid5.c | 67 ++++++++++++++++++++++++++++++++++------------
1 file changed, 50 insertions(+), 17 deletions(-)
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index b14d5909f6a9..f0fd01d9122e 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -1491,6 +1491,7 @@ ops_run_compute5(struct stripe_head *sh, struct raid5_percpu *percpu)
{
int disks = sh->disks;
struct page **xor_srcs = to_addr_page(percpu, 0);
+ unsigned int *offs = to_addr_offs(sh, percpu);
int target = sh->ops.target;
struct r5dev *tgt = &sh->dev[target];
struct page *xor_dest = tgt->page;
@@ -1499,6 +1500,7 @@ ops_run_compute5(struct stripe_head *sh, struct raid5_percpu *percpu)
struct async_submit_ctl submit;
int i;
struct r5conf *conf = sh->raid_conf;
+ unsigned int des_offset = tgt->offset;
BUG_ON(sh->batch_head);
@@ -1506,20 +1508,23 @@ ops_run_compute5(struct stripe_head *sh, struct raid5_percpu *percpu)
__func__, (unsigned long long)sh->sector, target);
BUG_ON(!test_bit(R5_Wantcompute, &tgt->flags));
- for (i = disks; i--; )
- if (i != target)
+ for (i = disks; i--; ) {
+ if (i != target) {
+ offs[count] = sh->dev[i].offset;
xor_srcs[count++] = sh->dev[i].page;
+ }
+ }
atomic_inc(&sh->count);
init_async_submit(&submit, ASYNC_TX_FENCE|ASYNC_TX_XOR_ZERO_DST, NULL,
ops_complete_compute, sh, to_addr_conv(sh, percpu, 0));
if (unlikely(count == 1))
- tx = async_memcpy(xor_dest, xor_srcs[0], 0, 0,
+ tx = async_memcpy(xor_dest, xor_srcs[0], des_offset, offs[0],
conf->stripe_size, &submit);
else
- tx = async_xor(xor_dest, xor_srcs, 0, count,
- conf->stripe_size, &submit);
+ tx = async_xor_offsets(xor_dest, des_offset, xor_srcs, offs,
+ count, conf->stripe_size, &submit);
return tx;
}
@@ -1763,11 +1768,13 @@ ops_run_prexor5(struct stripe_head *sh, struct raid5_percpu *percpu,
{
int disks = sh->disks;
struct page **xor_srcs = to_addr_page(percpu, 0);
+ unsigned int *offs = to_addr_offs(sh, percpu);
int count = 0, pd_idx = sh->pd_idx, i;
struct async_submit_ctl submit;
struct r5conf *conf = sh->raid_conf;
/* existing parity data subtracted */
+ unsigned int des_offset = offs[count] = sh->dev[pd_idx].offset;
struct page *xor_dest = xor_srcs[count++] = sh->dev[pd_idx].page;
BUG_ON(sh->batch_head);
@@ -1777,16 +1784,23 @@ ops_run_prexor5(struct stripe_head *sh, struct raid5_percpu *percpu,
for (i = disks; i--; ) {
struct r5dev *dev = &sh->dev[i];
/* Only process blocks that are known to be uptodate */
- if (test_bit(R5_InJournal, &dev->flags))
+ if (test_bit(R5_InJournal, &dev->flags)) {
+ /*
+ * In this case, PAGE_SIZE must be 4KB and r5pages
+ * is not used, so dev->offset is zero.
+ */
+ offs[count] = dev->offset;
xor_srcs[count++] = dev->orig_page;
- else if (test_bit(R5_Wantdrain, &dev->flags))
+ } else if (test_bit(R5_Wantdrain, &dev->flags)) {
+ offs[count] = dev->offset;
xor_srcs[count++] = dev->page;
+ }
}
init_async_submit(&submit, ASYNC_TX_FENCE|ASYNC_TX_XOR_DROP_DST, tx,
ops_complete_prexor, sh, to_addr_conv(sh, percpu, 0));
- tx = async_xor(xor_dest, xor_srcs, 0, count,
- conf->stripe_size, &submit);
+ tx = async_xor_offsets(xor_dest, des_offset, xor_srcs, offs,
+ count, conf->stripe_size, &submit);
return tx;
}
@@ -1938,6 +1952,7 @@ ops_run_reconstruct5(struct stripe_head *sh, struct raid5_percpu *percpu,
{
int disks = sh->disks;
struct page **xor_srcs;
+ unsigned int *offs;
struct async_submit_ctl submit;
int count, pd_idx = sh->pd_idx, i;
struct page *xor_dest;
@@ -1947,6 +1962,7 @@ ops_run_reconstruct5(struct stripe_head *sh, struct raid5_percpu *percpu,
struct stripe_head *head_sh = sh;
int last_stripe;
struct r5conf *conf = sh->raid_conf;
+ unsigned int des_offset;
pr_debug("%s: stripe %llu\n", __func__,
(unsigned long long)sh->sector);
@@ -1966,24 +1982,33 @@ ops_run_reconstruct5(struct stripe_head *sh, struct raid5_percpu *percpu,
again:
count = 0;
xor_srcs = to_addr_page(percpu, j);
+ offs = to_addr_offs(sh, percpu);
/* check if prexor is active which means only process blocks
* that are part of a read-modify-write (written)
*/
if (head_sh->reconstruct_state == reconstruct_state_prexor_drain_run) {
prexor = 1;
+ des_offset = offs[count] = sh->dev[pd_idx].offset;
xor_dest = xor_srcs[count++] = sh->dev[pd_idx].page;
+
for (i = disks; i--; ) {
struct r5dev *dev = &sh->dev[i];
if (head_sh->dev[i].written ||
- test_bit(R5_InJournal, &head_sh->dev[i].flags))
+ test_bit(R5_InJournal, &head_sh->dev[i].flags)) {
+ offs[count] = dev->offset;
xor_srcs[count++] = dev->page;
+ }
}
} else {
xor_dest = sh->dev[pd_idx].page;
+ des_offset = sh->dev[pd_idx].offset;
+
for (i = disks; i--; ) {
struct r5dev *dev = &sh->dev[i];
- if (i != pd_idx)
+ if (i != pd_idx) {
+ offs[count] = dev->offset;
xor_srcs[count++] = dev->page;
+ }
}
}
@@ -2009,11 +2034,12 @@ ops_run_reconstruct5(struct stripe_head *sh, struct raid5_percpu *percpu,
}
if (unlikely(count == 1))
- tx = async_memcpy(xor_dest, xor_srcs[0], 0, 0,
- conf->stripe_size, &submit);
+ tx = async_memcpy(xor_dest, xor_srcs[0], des_offset,
+ offs[0], conf->stripe_size, &submit);
else
- tx = async_xor(xor_dest, xor_srcs, 0, count,
- conf->stripe_size, &submit);
+ tx = async_xor_offsets(xor_dest, des_offset, xor_srcs,
+ offs, count, conf->stripe_size, &submit);
+
if (!last_stripe) {
j++;
sh = list_first_entry(&sh->batch_list, struct stripe_head,
@@ -2103,11 +2129,13 @@ static void ops_run_check_p(struct stripe_head *sh, struct raid5_percpu *percpu)
int qd_idx = sh->qd_idx;
struct page *xor_dest;
struct page **xor_srcs = to_addr_page(percpu, 0);
+ unsigned int *offs = to_addr_offs(sh, percpu);
struct dma_async_tx_descriptor *tx;
struct async_submit_ctl submit;
int count;
int i;
struct r5conf *conf = sh->raid_conf;
+ unsigned int dest_offset;
pr_debug("%s: stripe %llu\n", __func__,
(unsigned long long)sh->sector);
@@ -2115,17 +2143,22 @@ static void ops_run_check_p(struct stripe_head *sh, struct raid5_percpu *percpu)
BUG_ON(sh->batch_head);
count = 0;
xor_dest = sh->dev[pd_idx].page;
+ dest_offset = sh->dev[pd_idx].offset;
+ offs[count] = dest_offset;
xor_srcs[count++] = xor_dest;
+
for (i = disks; i--; ) {
if (i == pd_idx || i == qd_idx)
continue;
+ offs[count] = sh->dev[i].offset;
xor_srcs[count++] = sh->dev[i].page;
}
init_async_submit(&submit, 0, NULL, NULL, NULL,
to_addr_conv(sh, percpu, 0));
- tx = async_xor_val(xor_dest, xor_srcs, 0, count, conf->stripe_size,
- &sh->ops.zero_sum_result, &submit);
+ tx = async_xor_val_offsets(xor_dest, dest_offset, xor_srcs, offs,
+ count, conf->stripe_size,
+ &sh->ops.zero_sum_result, &submit);
atomic_inc(&sh->count);
init_async_submit(&submit, ASYNC_TX_ACK, tx, ops_complete_check, sh, NULL);
--
2.25.4
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH v5 12/16] md/raid5: support config stripe_size by sysfs entry
2020-07-02 12:06 [PATCH v5 00/16] md/raid5: set STRIPE_SIZE as a configurable value Yufen Yu
` (10 preceding siblings ...)
2020-07-02 12:06 ` [PATCH v5 11/16] md/raid5: compute xor with correct page offset Yufen Yu
@ 2020-07-02 12:06 ` Yufen Yu
2020-07-02 22:38 ` Song Liu
2020-07-02 12:06 ` [PATCH v5 13/16] md/raid6: let syndrome computer support different page offset Yufen Yu
` (4 subsequent siblings)
16 siblings, 1 reply; 37+ messages in thread
From: Yufen Yu @ 2020-07-02 12:06 UTC (permalink / raw)
To: song; +Cc: linux-raid, neilb, guoqing.jiang, houtao1, yuyufen
After this patch, we can adjust stripe_size by writing a value into the
sysfs entry. For example, to set stripe_size to 16KB:
echo 16384 > /sys/block/md1/md/stripe_size
To show the current stripe_size value:
cat /sys/block/md1/md/stripe_size
stripe_size must not be bigger than PAGE_SIZE, and it must be a
multiple of 4096.
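The validation described above can be sketched as a standalone predicate
(illustrative only; the actual check is inlined in
raid5_store_stripe_size() in the diff below, and `new` is renamed
`new_size` here because `new` clashes with C++):

```c
#include <assert.h>

/* A write is accepted only on non-4K-page kernels, for a non-zero
 * multiple of 4096 that does not exceed PAGE_SIZE. */
static int stripe_size_valid(unsigned int new_size, unsigned long page_size)
{
	if (page_size == 4096)
		return 0;	/* nothing to configure on 4K-page systems */
	return new_size != 0 && new_size % 4096 == 0 && new_size <= page_size;
}
```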
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
---
drivers/md/raid5.c | 69 +++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 68 insertions(+), 1 deletion(-)
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index f0fd01d9122e..a3376a4e4e5c 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -6715,7 +6715,74 @@ raid5_show_stripe_size(struct mddev *mddev, char *page)
static ssize_t
raid5_store_stripe_size(struct mddev *mddev, const char *page, size_t len)
{
- return -EINVAL;
+ struct r5conf *conf = mddev->private;
+ unsigned int new;
+ int err;
+ int nr;
+
+ if (len >= PAGE_SIZE)
+ return -EINVAL;
+ if (kstrtouint(page, 10, &new))
+ return -EINVAL;
+ if (!conf)
+ return -ENODEV;
+
+ /*
+ * When PAGE_SIZE is 4096, we don't need to modify stripe_size.
+ * The value must not be bigger than PAGE_SIZE, and it must be
+ * a multiple of 4096.
+ */
+ if (PAGE_SIZE == 4096 || new % 4096 != 0 ||
+ new > PAGE_SIZE || new == 0)
+ return -EINVAL;
+
+ if (new == conf->stripe_size)
+ return len;
+
+ pr_debug("md/raid: change stripe_size from %u to %u\n",
+ conf->stripe_size, new);
+
+ err = mddev_lock(mddev);
+ if (err)
+ return err;
+
+ if (mddev->sync_thread ||
+ test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) ||
+ mddev->reshape_position != MaxSector ||
+ mddev->sysfs_active) {
+ err = -EBUSY;
+ goto out_unlock;
+ }
+
+ nr = conf->max_nr_stripes;
+
+ /* 1. suspend raid array */
+ mddev_suspend(mddev);
+
+ /* 2. free all old stripe_head */
+ mutex_lock(&conf->cache_size_mutex);
+ shrink_stripes(conf);
+ BUG_ON(conf->max_nr_stripes != 0);
+
+ /* 3. set new stripe_size */
+ conf->stripe_size = new;
+ conf->stripe_shift = ilog2(new) - 9;
+ conf->stripe_sectors = new >> 9;
+
+ /* 4. allocate new stripe_head */
+ if (grow_stripes(conf, nr)) {
+ pr_warn("md/raid:%s: couldn't allocate buffers\n",
+ mdname(mddev));
+ err = -ENOMEM;
+ }
+ mutex_unlock(&conf->cache_size_mutex);
+
+ /* 5. resume raid array */
+ mddev_resume(mddev);
+
+out_unlock:
+ mddev_unlock(mddev);
+ return err ?: len;
}
static struct md_sysfs_entry
--
2.25.4
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH v5 13/16] md/raid6: let syndrome computer support different page offset
2020-07-02 12:06 [PATCH v5 00/16] md/raid5: set STRIPE_SIZE as a configurable value Yufen Yu
` (11 preceding siblings ...)
2020-07-02 12:06 ` [PATCH v5 12/16] md/raid5: support config stripe_size by sysfs entry Yufen Yu
@ 2020-07-02 12:06 ` Yufen Yu
2020-07-02 12:06 ` [PATCH v5 14/16] md/raid6: let async recovery function " Yufen Yu
` (3 subsequent siblings)
16 siblings, 0 replies; 37+ messages in thread
From: Yufen Yu @ 2020-07-02 12:06 UTC (permalink / raw)
To: song; +Cc: linux-raid, neilb, guoqing.jiang, houtao1, yuyufen
For now, the syndrome compute functions require a common offset into the
pages array. However, we want these functions to support a different
offset for each page when shared pages are used. Simply convert them by
adding the page offset wherever each page address is referenced.
Since the only callers of async_gen_syndrome() and async_syndrome_val()
are in md/raid6, we don't preserve the old interface but modify it
directly.
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
---
crypto/async_tx/async_pq.c | 72 +++++++++++++++++++++++++-------------
include/linux/async_tx.h | 6 ++--
2 files changed, 51 insertions(+), 27 deletions(-)
diff --git a/crypto/async_tx/async_pq.c b/crypto/async_tx/async_pq.c
index 341ece61cf9b..f3256e9ceafe 100644
--- a/crypto/async_tx/async_pq.c
+++ b/crypto/async_tx/async_pq.c
@@ -104,7 +104,7 @@ do_async_gen_syndrome(struct dma_chan *chan,
* do_sync_gen_syndrome - synchronously calculate a raid6 syndrome
*/
static void
-do_sync_gen_syndrome(struct page **blocks, unsigned int offset, int disks,
+do_sync_gen_syndrome(struct page **blocks, unsigned int *offsets, int disks,
size_t len, struct async_submit_ctl *submit)
{
void **srcs;
@@ -121,7 +121,8 @@ do_sync_gen_syndrome(struct page **blocks, unsigned int offset, int disks,
BUG_ON(i > disks - 3); /* P or Q can't be zero */
srcs[i] = (void*)raid6_empty_zero_page;
} else {
- srcs[i] = page_address(blocks[i]) + offset;
+ srcs[i] = page_address(blocks[i]) + offsets[i];
+
if (i < disks - 2) {
stop = i;
if (start == -1)
@@ -138,10 +139,23 @@ do_sync_gen_syndrome(struct page **blocks, unsigned int offset, int disks,
async_tx_sync_epilog(submit);
}
+static inline bool
+is_dma_pq_aligned_offs(struct dma_device *dev, unsigned int *offs,
+ int src_cnt, size_t len)
+{
+ int i;
+
+ for (i = 0; i < src_cnt; i++) {
+ if (!is_dma_pq_aligned(dev, offs[i], 0, len))
+ return false;
+ }
+ return true;
+}
+
/**
* async_gen_syndrome - asynchronously calculate a raid6 syndrome
* @blocks: source blocks from idx 0..disks-3, P @ disks-2 and Q @ disks-1
- * @offset: common offset into each block (src and dest) to start transaction
+ * @offsets: offset array into each block (src and dest) to start transaction
* @disks: number of blocks (including missing P or Q, see below)
* @len: length of operation in bytes
* @submit: submission/completion modifiers
@@ -160,7 +174,7 @@ do_sync_gen_syndrome(struct page **blocks, unsigned int offset, int disks,
* path.
*/
struct dma_async_tx_descriptor *
-async_gen_syndrome(struct page **blocks, unsigned int offset, int disks,
+async_gen_syndrome(struct page **blocks, unsigned int *offsets, int disks,
size_t len, struct async_submit_ctl *submit)
{
int src_cnt = disks - 2;
@@ -179,7 +193,7 @@ async_gen_syndrome(struct page **blocks, unsigned int offset, int disks,
if (unmap && !(submit->flags & ASYNC_TX_PQ_XOR_DST) &&
(src_cnt <= dma_maxpq(device, 0) ||
dma_maxpq(device, DMA_PREP_CONTINUE) > 0) &&
- is_dma_pq_aligned(device, offset, 0, len)) {
+ is_dma_pq_aligned_offs(device, offsets, disks, len)) {
struct dma_async_tx_descriptor *tx;
enum dma_ctrl_flags dma_flags = 0;
unsigned char coefs[MAX_DISKS];
@@ -196,8 +210,8 @@ async_gen_syndrome(struct page **blocks, unsigned int offset, int disks,
for (i = 0, j = 0; i < src_cnt; i++) {
if (blocks[i] == NULL)
continue;
- unmap->addr[j] = dma_map_page(device->dev, blocks[i], offset,
- len, DMA_TO_DEVICE);
+ unmap->addr[j] = dma_map_page(device->dev, blocks[i],
+ offsets[i], len, DMA_TO_DEVICE);
coefs[j] = raid6_gfexp[i];
unmap->to_cnt++;
j++;
@@ -210,7 +224,8 @@ async_gen_syndrome(struct page **blocks, unsigned int offset, int disks,
unmap->bidi_cnt++;
if (P(blocks, disks))
unmap->addr[j++] = dma_map_page(device->dev, P(blocks, disks),
- offset, len, DMA_BIDIRECTIONAL);
+ P(offsets, disks),
+ len, DMA_BIDIRECTIONAL);
else {
unmap->addr[j++] = 0;
dma_flags |= DMA_PREP_PQ_DISABLE_P;
@@ -219,7 +234,8 @@ async_gen_syndrome(struct page **blocks, unsigned int offset, int disks,
unmap->bidi_cnt++;
if (Q(blocks, disks))
unmap->addr[j++] = dma_map_page(device->dev, Q(blocks, disks),
- offset, len, DMA_BIDIRECTIONAL);
+ Q(offsets, disks),
+ len, DMA_BIDIRECTIONAL);
else {
unmap->addr[j++] = 0;
dma_flags |= DMA_PREP_PQ_DISABLE_Q;
@@ -240,13 +256,13 @@ async_gen_syndrome(struct page **blocks, unsigned int offset, int disks,
if (!P(blocks, disks)) {
P(blocks, disks) = pq_scribble_page;
- BUG_ON(len + offset > PAGE_SIZE);
+ P(offsets, disks) = 0;
}
if (!Q(blocks, disks)) {
Q(blocks, disks) = pq_scribble_page;
- BUG_ON(len + offset > PAGE_SIZE);
+ Q(offsets, disks) = 0;
}
- do_sync_gen_syndrome(blocks, offset, disks, len, submit);
+ do_sync_gen_syndrome(blocks, offsets, disks, len, submit);
return NULL;
}
@@ -270,6 +286,7 @@ pq_val_chan(struct async_submit_ctl *submit, struct page **blocks, int disks, si
* @len: length of operation in bytes
* @pqres: on val failure SUM_CHECK_P_RESULT and/or SUM_CHECK_Q_RESULT are set
* @spare: temporary result buffer for the synchronous case
+ * @s_off: spare buffer page offset
* @submit: submission / completion modifiers
*
* The same notes from async_gen_syndrome apply to the 'blocks',
@@ -278,9 +295,9 @@ pq_val_chan(struct async_submit_ctl *submit, struct page **blocks, int disks, si
* specified.
*/
struct dma_async_tx_descriptor *
-async_syndrome_val(struct page **blocks, unsigned int offset, int disks,
+async_syndrome_val(struct page **blocks, unsigned int *offsets, int disks,
size_t len, enum sum_check_flags *pqres, struct page *spare,
- struct async_submit_ctl *submit)
+ unsigned int s_off, struct async_submit_ctl *submit)
{
struct dma_chan *chan = pq_val_chan(submit, blocks, disks, len);
struct dma_device *device = chan ? chan->device : NULL;
@@ -295,7 +312,7 @@ async_syndrome_val(struct page **blocks, unsigned int offset, int disks,
unmap = dmaengine_get_unmap_data(device->dev, disks, GFP_NOWAIT);
if (unmap && disks <= dma_maxpq(device, 0) &&
- is_dma_pq_aligned(device, offset, 0, len)) {
+ is_dma_pq_aligned_offs(device, offsets, disks, len)) {
struct device *dev = device->dev;
dma_addr_t pq[2];
int i, j = 0, src_cnt = 0;
@@ -307,7 +324,7 @@ async_syndrome_val(struct page **blocks, unsigned int offset, int disks,
for (i = 0; i < disks-2; i++)
if (likely(blocks[i])) {
unmap->addr[j] = dma_map_page(dev, blocks[i],
- offset, len,
+ offsets[i], len,
DMA_TO_DEVICE);
coefs[j] = raid6_gfexp[i];
unmap->to_cnt++;
@@ -320,7 +337,7 @@ async_syndrome_val(struct page **blocks, unsigned int offset, int disks,
dma_flags |= DMA_PREP_PQ_DISABLE_P;
} else {
pq[0] = dma_map_page(dev, P(blocks, disks),
- offset, len,
+ P(offsets, disks), len,
DMA_TO_DEVICE);
unmap->addr[j++] = pq[0];
unmap->to_cnt++;
@@ -330,7 +347,7 @@ async_syndrome_val(struct page **blocks, unsigned int offset, int disks,
dma_flags |= DMA_PREP_PQ_DISABLE_Q;
} else {
pq[1] = dma_map_page(dev, Q(blocks, disks),
- offset, len,
+ Q(offsets, disks), len,
DMA_TO_DEVICE);
unmap->addr[j++] = pq[1];
unmap->to_cnt++;
@@ -355,7 +372,9 @@ async_syndrome_val(struct page **blocks, unsigned int offset, int disks,
async_tx_submit(chan, tx, submit);
} else {
struct page *p_src = P(blocks, disks);
+ unsigned int p_off = P(offsets, disks);
struct page *q_src = Q(blocks, disks);
+ unsigned int q_off = Q(offsets, disks);
enum async_tx_flags flags_orig = submit->flags;
dma_async_tx_callback cb_fn_orig = submit->cb_fn;
void *scribble = submit->scribble;
@@ -381,27 +400,32 @@ async_syndrome_val(struct page **blocks, unsigned int offset, int disks,
if (p_src) {
init_async_submit(submit, ASYNC_TX_XOR_ZERO_DST, NULL,
NULL, NULL, scribble);
- tx = async_xor(spare, blocks, offset, disks-2, len, submit);
+ tx = async_xor_offsets(spare, s_off,
+ blocks, offsets, disks-2, len, submit);
async_tx_quiesce(&tx);
- p = page_address(p_src) + offset;
- s = page_address(spare) + offset;
+ p = page_address(p_src) + p_off;
+ s = page_address(spare) + s_off;
*pqres |= !!memcmp(p, s, len) << SUM_CHECK_P;
}
if (q_src) {
P(blocks, disks) = NULL;
Q(blocks, disks) = spare;
+ Q(offsets, disks) = s_off;
init_async_submit(submit, 0, NULL, NULL, NULL, scribble);
- tx = async_gen_syndrome(blocks, offset, disks, len, submit);
+ tx = async_gen_syndrome(blocks, offsets, disks,
+ len, submit);
async_tx_quiesce(&tx);
- q = page_address(q_src) + offset;
- s = page_address(spare) + offset;
+ q = page_address(q_src) + q_off;
+ s = page_address(spare) + s_off;
*pqres |= !!memcmp(q, s, len) << SUM_CHECK_Q;
}
/* restore P, Q and submit */
P(blocks, disks) = p_src;
+ P(offsets, disks) = p_off;
Q(blocks, disks) = q_src;
+ Q(offsets, disks) = q_off;
submit->cb_fn = cb_fn_orig;
submit->cb_param = cb_param_orig;
diff --git a/include/linux/async_tx.h b/include/linux/async_tx.h
index 8d79e2de06bd..bbda58d48dbd 100644
--- a/include/linux/async_tx.h
+++ b/include/linux/async_tx.h
@@ -186,13 +186,13 @@ async_memcpy(struct page *dest, struct page *src, unsigned int dest_offset,
struct dma_async_tx_descriptor *async_trigger_callback(struct async_submit_ctl *submit);
struct dma_async_tx_descriptor *
-async_gen_syndrome(struct page **blocks, unsigned int offset, int src_cnt,
+async_gen_syndrome(struct page **blocks, unsigned int *offsets, int src_cnt,
size_t len, struct async_submit_ctl *submit);
struct dma_async_tx_descriptor *
-async_syndrome_val(struct page **blocks, unsigned int offset, int src_cnt,
+async_syndrome_val(struct page **blocks, unsigned int *offsets, int src_cnt,
size_t len, enum sum_check_flags *pqres, struct page *spare,
- struct async_submit_ctl *submit);
+ unsigned int s_off, struct async_submit_ctl *submit);
struct dma_async_tx_descriptor *
async_raid6_2data_recov(int src_num, size_t bytes, int faila, int failb,
--
2.25.4
* [PATCH v5 14/16] md/raid6: let async recovery function support different page offset
2020-07-02 12:06 [PATCH v5 00/16] md/raid5: set STRIPE_SIZE as a configurable value Yufen Yu
` (12 preceding siblings ...)
2020-07-02 12:06 ` [PATCH v5 13/16] md/raid6: let syndrome computor support different page offset Yufen Yu
@ 2020-07-02 12:06 ` Yufen Yu
2020-07-02 12:06 ` [PATCH v5 15/16] md/raid6: compute syndrome with correct " Yufen Yu
` (2 subsequent siblings)
16 siblings, 0 replies; 37+ messages in thread
From: Yufen Yu @ 2020-07-02 12:06 UTC (permalink / raw)
To: song; +Cc: linux-raid, neilb, guoqing.jiang, houtao1, yuyufen
For now, the asynchronous raid6 recovery functions require a common
offset for all pages. But we expect them to support a different offset
per page after introducing shared stripe pages. Do that by simply adding
the page offset wherever each page address is referenced.
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
---
crypto/async_tx/async_raid6_recov.c | 163 ++++++++++++++++++++--------
include/linux/async_tx.h | 6 +-
2 files changed, 124 insertions(+), 45 deletions(-)
diff --git a/crypto/async_tx/async_raid6_recov.c b/crypto/async_tx/async_raid6_recov.c
index f249142ceac4..0eb323a618b0 100644
--- a/crypto/async_tx/async_raid6_recov.c
+++ b/crypto/async_tx/async_raid6_recov.c
@@ -15,8 +15,9 @@
#include <linux/dmaengine.h>
static struct dma_async_tx_descriptor *
-async_sum_product(struct page *dest, struct page **srcs, unsigned char *coef,
- size_t len, struct async_submit_ctl *submit)
+async_sum_product(struct page *dest, unsigned int d_off,
+ struct page **srcs, unsigned int *src_offs, unsigned char *coef,
+ size_t len, struct async_submit_ctl *submit)
{
struct dma_chan *chan = async_tx_find_channel(submit, DMA_PQ,
&dest, 1, srcs, 2, len);
@@ -37,11 +38,14 @@ async_sum_product(struct page *dest, struct page **srcs, unsigned char *coef,
if (submit->flags & ASYNC_TX_FENCE)
dma_flags |= DMA_PREP_FENCE;
- unmap->addr[0] = dma_map_page(dev, srcs[0], 0, len, DMA_TO_DEVICE);
- unmap->addr[1] = dma_map_page(dev, srcs[1], 0, len, DMA_TO_DEVICE);
+ unmap->addr[0] = dma_map_page(dev, srcs[0], src_offs[0],
+ len, DMA_TO_DEVICE);
+ unmap->addr[1] = dma_map_page(dev, srcs[1], src_offs[1],
+ len, DMA_TO_DEVICE);
unmap->to_cnt = 2;
- unmap->addr[2] = dma_map_page(dev, dest, 0, len, DMA_BIDIRECTIONAL);
+ unmap->addr[2] = dma_map_page(dev, dest, d_off,
+ len, DMA_BIDIRECTIONAL);
unmap->bidi_cnt = 1;
/* engine only looks at Q, but expects it to follow P */
pq[1] = unmap->addr[2];
@@ -66,9 +70,9 @@ async_sum_product(struct page *dest, struct page **srcs, unsigned char *coef,
async_tx_quiesce(&submit->depend_tx);
amul = raid6_gfmul[coef[0]];
bmul = raid6_gfmul[coef[1]];
- a = page_address(srcs[0]);
- b = page_address(srcs[1]);
- c = page_address(dest);
+ a = page_address(srcs[0]) + src_offs[0];
+ b = page_address(srcs[1]) + src_offs[1];
+ c = page_address(dest) + d_off;
while (len--) {
ax = amul[*a++];
@@ -80,8 +84,9 @@ async_sum_product(struct page *dest, struct page **srcs, unsigned char *coef,
}
static struct dma_async_tx_descriptor *
-async_mult(struct page *dest, struct page *src, u8 coef, size_t len,
- struct async_submit_ctl *submit)
+async_mult(struct page *dest, unsigned int d_off, struct page *src,
+ unsigned int s_off, u8 coef, size_t len,
+ struct async_submit_ctl *submit)
{
struct dma_chan *chan = async_tx_find_channel(submit, DMA_PQ,
&dest, 1, &src, 1, len);
@@ -101,9 +106,11 @@ async_mult(struct page *dest, struct page *src, u8 coef, size_t len,
if (submit->flags & ASYNC_TX_FENCE)
dma_flags |= DMA_PREP_FENCE;
- unmap->addr[0] = dma_map_page(dev, src, 0, len, DMA_TO_DEVICE);
+ unmap->addr[0] = dma_map_page(dev, src, s_off,
+ len, DMA_TO_DEVICE);
unmap->to_cnt++;
- unmap->addr[1] = dma_map_page(dev, dest, 0, len, DMA_BIDIRECTIONAL);
+ unmap->addr[1] = dma_map_page(dev, dest, d_off,
+ len, DMA_BIDIRECTIONAL);
dma_dest[1] = unmap->addr[1];
unmap->bidi_cnt++;
unmap->len = len;
@@ -133,8 +140,8 @@ async_mult(struct page *dest, struct page *src, u8 coef, size_t len,
*/
async_tx_quiesce(&submit->depend_tx);
qmul = raid6_gfmul[coef];
- d = page_address(dest);
- s = page_address(src);
+ d = page_address(dest) + d_off;
+ s = page_address(src) + s_off;
while (len--)
*d++ = qmul[*s++];
@@ -144,11 +151,14 @@ async_mult(struct page *dest, struct page *src, u8 coef, size_t len,
static struct dma_async_tx_descriptor *
__2data_recov_4(int disks, size_t bytes, int faila, int failb,
- struct page **blocks, struct async_submit_ctl *submit)
+ struct page **blocks, unsigned int *offs,
+ struct async_submit_ctl *submit)
{
struct dma_async_tx_descriptor *tx = NULL;
struct page *p, *q, *a, *b;
+ unsigned int p_off, q_off, a_off, b_off;
struct page *srcs[2];
+ unsigned int src_offs[2];
unsigned char coef[2];
enum async_tx_flags flags = submit->flags;
dma_async_tx_callback cb_fn = submit->cb_fn;
@@ -156,26 +166,34 @@ __2data_recov_4(int disks, size_t bytes, int faila, int failb,
void *scribble = submit->scribble;
p = blocks[disks-2];
+ p_off = offs[disks-2];
q = blocks[disks-1];
+ q_off = offs[disks-1];
a = blocks[faila];
+ a_off = offs[faila];
b = blocks[failb];
+ b_off = offs[failb];
/* in the 4 disk case P + Pxy == P and Q + Qxy == Q */
/* Dx = A*(P+Pxy) + B*(Q+Qxy) */
srcs[0] = p;
+ src_offs[0] = p_off;
srcs[1] = q;
+ src_offs[1] = q_off;
coef[0] = raid6_gfexi[failb-faila];
coef[1] = raid6_gfinv[raid6_gfexp[faila]^raid6_gfexp[failb]];
init_async_submit(submit, ASYNC_TX_FENCE, tx, NULL, NULL, scribble);
- tx = async_sum_product(b, srcs, coef, bytes, submit);
+ tx = async_sum_product(b, b_off, srcs, src_offs, coef, bytes, submit);
/* Dy = P+Pxy+Dx */
srcs[0] = p;
+ src_offs[0] = p_off;
srcs[1] = b;
+ src_offs[1] = b_off;
init_async_submit(submit, flags | ASYNC_TX_XOR_ZERO_DST, tx, cb_fn,
cb_param, scribble);
- tx = async_xor(a, srcs, 0, 2, bytes, submit);
+ tx = async_xor_offsets(a, a_off, srcs, src_offs, 2, bytes, submit);
return tx;
@@ -183,11 +201,14 @@ __2data_recov_4(int disks, size_t bytes, int faila, int failb,
static struct dma_async_tx_descriptor *
__2data_recov_5(int disks, size_t bytes, int faila, int failb,
- struct page **blocks, struct async_submit_ctl *submit)
+ struct page **blocks, unsigned int *offs,
+ struct async_submit_ctl *submit)
{
struct dma_async_tx_descriptor *tx = NULL;
struct page *p, *q, *g, *dp, *dq;
+ unsigned int p_off, q_off, g_off, dp_off, dq_off;
struct page *srcs[2];
+ unsigned int src_offs[2];
unsigned char coef[2];
enum async_tx_flags flags = submit->flags;
dma_async_tx_callback cb_fn = submit->cb_fn;
@@ -208,60 +229,77 @@ __2data_recov_5(int disks, size_t bytes, int faila, int failb,
BUG_ON(good_srcs > 1);
p = blocks[disks-2];
+ p_off = offs[disks-2];
q = blocks[disks-1];
+ q_off = offs[disks-1];
g = blocks[good];
+ g_off = offs[good];
/* Compute syndrome with zero for the missing data pages
* Use the dead data pages as temporary storage for delta p and
* delta q
*/
dp = blocks[faila];
+ dp_off = offs[faila];
dq = blocks[failb];
+ dq_off = offs[failb];
init_async_submit(submit, ASYNC_TX_FENCE, tx, NULL, NULL, scribble);
- tx = async_memcpy(dp, g, 0, 0, bytes, submit);
+ tx = async_memcpy(dp, g, dp_off, g_off, bytes, submit);
init_async_submit(submit, ASYNC_TX_FENCE, tx, NULL, NULL, scribble);
- tx = async_mult(dq, g, raid6_gfexp[good], bytes, submit);
+ tx = async_mult(dq, dq_off, g, g_off,
+ raid6_gfexp[good], bytes, submit);
/* compute P + Pxy */
srcs[0] = dp;
+ src_offs[0] = dp_off;
srcs[1] = p;
+ src_offs[1] = p_off;
init_async_submit(submit, ASYNC_TX_FENCE|ASYNC_TX_XOR_DROP_DST, tx,
NULL, NULL, scribble);
- tx = async_xor(dp, srcs, 0, 2, bytes, submit);
+ tx = async_xor_offsets(dp, dp_off, srcs, src_offs, 2, bytes, submit);
/* compute Q + Qxy */
srcs[0] = dq;
+ src_offs[0] = dq_off;
srcs[1] = q;
+ src_offs[1] = q_off;
init_async_submit(submit, ASYNC_TX_FENCE|ASYNC_TX_XOR_DROP_DST, tx,
NULL, NULL, scribble);
- tx = async_xor(dq, srcs, 0, 2, bytes, submit);
+ tx = async_xor_offsets(dq, dq_off, srcs, src_offs, 2, bytes, submit);
/* Dx = A*(P+Pxy) + B*(Q+Qxy) */
srcs[0] = dp;
+ src_offs[0] = dp_off;
srcs[1] = dq;
+ src_offs[1] = dq_off;
coef[0] = raid6_gfexi[failb-faila];
coef[1] = raid6_gfinv[raid6_gfexp[faila]^raid6_gfexp[failb]];
init_async_submit(submit, ASYNC_TX_FENCE, tx, NULL, NULL, scribble);
- tx = async_sum_product(dq, srcs, coef, bytes, submit);
+ tx = async_sum_product(dq, dq_off, srcs, src_offs, coef, bytes, submit);
/* Dy = P+Pxy+Dx */
srcs[0] = dp;
+ src_offs[0] = dp_off;
srcs[1] = dq;
+ src_offs[1] = dq_off;
init_async_submit(submit, flags | ASYNC_TX_XOR_DROP_DST, tx, cb_fn,
cb_param, scribble);
- tx = async_xor(dp, srcs, 0, 2, bytes, submit);
+ tx = async_xor_offsets(dp, dp_off, srcs, src_offs, 2, bytes, submit);
return tx;
}
static struct dma_async_tx_descriptor *
__2data_recov_n(int disks, size_t bytes, int faila, int failb,
- struct page **blocks, struct async_submit_ctl *submit)
+ struct page **blocks, unsigned int *offs,
+ struct async_submit_ctl *submit)
{
struct dma_async_tx_descriptor *tx = NULL;
struct page *p, *q, *dp, *dq;
+ unsigned int p_off, q_off, dp_off, dq_off;
struct page *srcs[2];
+ unsigned int src_offs[2];
unsigned char coef[2];
enum async_tx_flags flags = submit->flags;
dma_async_tx_callback cb_fn = submit->cb_fn;
@@ -269,56 +307,74 @@ __2data_recov_n(int disks, size_t bytes, int faila, int failb,
void *scribble = submit->scribble;
p = blocks[disks-2];
+ p_off = offs[disks-2];
q = blocks[disks-1];
+ q_off = offs[disks-1];
/* Compute syndrome with zero for the missing data pages
* Use the dead data pages as temporary storage for
* delta p and delta q
*/
dp = blocks[faila];
+ dp_off = offs[faila];
blocks[faila] = NULL;
blocks[disks-2] = dp;
+ offs[disks-2] = dp_off;
dq = blocks[failb];
+ dq_off = offs[failb];
blocks[failb] = NULL;
blocks[disks-1] = dq;
+ offs[disks-1] = dq_off;
init_async_submit(submit, ASYNC_TX_FENCE, tx, NULL, NULL, scribble);
- tx = async_gen_syndrome(blocks, 0, disks, bytes, submit);
+ tx = async_gen_syndrome(blocks, offs, disks, bytes, submit);
/* Restore pointer table */
blocks[faila] = dp;
+ offs[faila] = dp_off;
blocks[failb] = dq;
+ offs[failb] = dq_off;
blocks[disks-2] = p;
+ offs[disks-2] = p_off;
blocks[disks-1] = q;
+ offs[disks-1] = q_off;
/* compute P + Pxy */
srcs[0] = dp;
+ src_offs[0] = dp_off;
srcs[1] = p;
+ src_offs[1] = p_off;
init_async_submit(submit, ASYNC_TX_FENCE|ASYNC_TX_XOR_DROP_DST, tx,
NULL, NULL, scribble);
- tx = async_xor(dp, srcs, 0, 2, bytes, submit);
+ tx = async_xor_offsets(dp, dp_off, srcs, src_offs, 2, bytes, submit);
/* compute Q + Qxy */
srcs[0] = dq;
+ src_offs[0] = dq_off;
srcs[1] = q;
+ src_offs[1] = q_off;
init_async_submit(submit, ASYNC_TX_FENCE|ASYNC_TX_XOR_DROP_DST, tx,
NULL, NULL, scribble);
- tx = async_xor(dq, srcs, 0, 2, bytes, submit);
+ tx = async_xor_offsets(dq, dq_off, srcs, src_offs, 2, bytes, submit);
/* Dx = A*(P+Pxy) + B*(Q+Qxy) */
srcs[0] = dp;
+ src_offs[0] = dp_off;
srcs[1] = dq;
+ src_offs[1] = dq_off;
coef[0] = raid6_gfexi[failb-faila];
coef[1] = raid6_gfinv[raid6_gfexp[faila]^raid6_gfexp[failb]];
init_async_submit(submit, ASYNC_TX_FENCE, tx, NULL, NULL, scribble);
- tx = async_sum_product(dq, srcs, coef, bytes, submit);
+ tx = async_sum_product(dq, dq_off, srcs, src_offs, coef, bytes, submit);
/* Dy = P+Pxy+Dx */
srcs[0] = dp;
+ src_offs[0] = dp_off;
srcs[1] = dq;
+ src_offs[1] = dq_off;
init_async_submit(submit, flags | ASYNC_TX_XOR_DROP_DST, tx, cb_fn,
cb_param, scribble);
- tx = async_xor(dp, srcs, 0, 2, bytes, submit);
+ tx = async_xor_offsets(dp, dp_off, srcs, src_offs, 2, bytes, submit);
return tx;
}
@@ -330,11 +386,13 @@ __2data_recov_n(int disks, size_t bytes, int faila, int failb,
* @faila: first failed drive index
* @failb: second failed drive index
* @blocks: array of source pointers where the last two entries are p and q
+ * @offs: array of offset for pages in blocks
* @submit: submission/completion modifiers
*/
struct dma_async_tx_descriptor *
async_raid6_2data_recov(int disks, size_t bytes, int faila, int failb,
- struct page **blocks, struct async_submit_ctl *submit)
+ struct page **blocks, unsigned int *offs,
+ struct async_submit_ctl *submit)
{
void *scribble = submit->scribble;
int non_zero_srcs, i;
@@ -358,7 +416,7 @@ async_raid6_2data_recov(int disks, size_t bytes, int faila, int failb,
if (blocks[i] == NULL)
ptrs[i] = (void *) raid6_empty_zero_page;
else
- ptrs[i] = page_address(blocks[i]);
+ ptrs[i] = page_address(blocks[i]) + offs[i];
raid6_2data_recov(disks, bytes, faila, failb, ptrs);
@@ -383,16 +441,19 @@ async_raid6_2data_recov(int disks, size_t bytes, int faila, int failb,
* explicitly handle the special case of a 4 disk array with
* both data disks missing.
*/
- return __2data_recov_4(disks, bytes, faila, failb, blocks, submit);
+ return __2data_recov_4(disks, bytes, faila, failb,
+ blocks, offs, submit);
case 3:
/* dma devices do not uniformly understand a single
* source pq operation (in contrast to the synchronous
* case), so explicitly handle the special case of a 5 disk
* array with 2 of 3 data disks missing.
*/
- return __2data_recov_5(disks, bytes, faila, failb, blocks, submit);
+ return __2data_recov_5(disks, bytes, faila, failb,
+ blocks, offs, submit);
default:
- return __2data_recov_n(disks, bytes, faila, failb, blocks, submit);
+ return __2data_recov_n(disks, bytes, faila, failb,
+ blocks, offs, submit);
}
}
EXPORT_SYMBOL_GPL(async_raid6_2data_recov);
@@ -403,14 +464,17 @@ EXPORT_SYMBOL_GPL(async_raid6_2data_recov);
* @bytes: block size
* @faila: failed drive index
* @blocks: array of source pointers where the last two entries are p and q
+ * @offs: array of offset for pages in blocks
* @submit: submission/completion modifiers
*/
struct dma_async_tx_descriptor *
async_raid6_datap_recov(int disks, size_t bytes, int faila,
- struct page **blocks, struct async_submit_ctl *submit)
+ struct page **blocks, unsigned int *offs,
+ struct async_submit_ctl *submit)
{
struct dma_async_tx_descriptor *tx = NULL;
struct page *p, *q, *dq;
+ unsigned int p_off, q_off, dq_off;
u8 coef;
enum async_tx_flags flags = submit->flags;
dma_async_tx_callback cb_fn = submit->cb_fn;
@@ -418,6 +482,7 @@ async_raid6_datap_recov(int disks, size_t bytes, int faila,
void *scribble = submit->scribble;
int good_srcs, good, i;
struct page *srcs[2];
+ unsigned int src_offs[2];
pr_debug("%s: disks: %d len: %zu\n", __func__, disks, bytes);
@@ -434,7 +499,7 @@ async_raid6_datap_recov(int disks, size_t bytes, int faila,
if (blocks[i] == NULL)
ptrs[i] = (void*)raid6_empty_zero_page;
else
- ptrs[i] = page_address(blocks[i]);
+ ptrs[i] = page_address(blocks[i]) + offs[i];
raid6_datap_recov(disks, bytes, faila, ptrs);
@@ -458,55 +523,67 @@ async_raid6_datap_recov(int disks, size_t bytes, int faila,
BUG_ON(good_srcs == 0);
p = blocks[disks-2];
+ p_off = offs[disks-2];
q = blocks[disks-1];
+ q_off = offs[disks-1];
/* Compute syndrome with zero for the missing data page
* Use the dead data page as temporary storage for delta q
*/
dq = blocks[faila];
+ dq_off = offs[faila];
blocks[faila] = NULL;
blocks[disks-1] = dq;
+ offs[disks-1] = dq_off;
/* in the 4-disk case we only need to perform a single source
* multiplication with the one good data block.
*/
if (good_srcs == 1) {
struct page *g = blocks[good];
+ unsigned int g_off = offs[good];
init_async_submit(submit, ASYNC_TX_FENCE, tx, NULL, NULL,
scribble);
- tx = async_memcpy(p, g, 0, 0, bytes, submit);
+ tx = async_memcpy(p, g, p_off, g_off, bytes, submit);
init_async_submit(submit, ASYNC_TX_FENCE, tx, NULL, NULL,
scribble);
- tx = async_mult(dq, g, raid6_gfexp[good], bytes, submit);
+ tx = async_mult(dq, dq_off, g, g_off,
+ raid6_gfexp[good], bytes, submit);
} else {
init_async_submit(submit, ASYNC_TX_FENCE, tx, NULL, NULL,
scribble);
- tx = async_gen_syndrome(blocks, 0, disks, bytes, submit);
+ tx = async_gen_syndrome(blocks, offs, disks, bytes, submit);
}
/* Restore pointer table */
blocks[faila] = dq;
+ offs[faila] = dq_off;
blocks[disks-1] = q;
+ offs[disks-1] = q_off;
/* calculate g^{-faila} */
coef = raid6_gfinv[raid6_gfexp[faila]];
srcs[0] = dq;
+ src_offs[0] = dq_off;
srcs[1] = q;
+ src_offs[1] = q_off;
init_async_submit(submit, ASYNC_TX_FENCE|ASYNC_TX_XOR_DROP_DST, tx,
NULL, NULL, scribble);
- tx = async_xor(dq, srcs, 0, 2, bytes, submit);
+ tx = async_xor_offsets(dq, dq_off, srcs, src_offs, 2, bytes, submit);
init_async_submit(submit, ASYNC_TX_FENCE, tx, NULL, NULL, scribble);
- tx = async_mult(dq, dq, coef, bytes, submit);
+ tx = async_mult(dq, dq_off, dq, dq_off, coef, bytes, submit);
srcs[0] = p;
+ src_offs[0] = p_off;
srcs[1] = dq;
+ src_offs[1] = dq_off;
init_async_submit(submit, flags | ASYNC_TX_XOR_DROP_DST, tx, cb_fn,
cb_param, scribble);
- tx = async_xor(p, srcs, 0, 2, bytes, submit);
+ tx = async_xor_offsets(p, p_off, srcs, src_offs, 2, bytes, submit);
return tx;
}
diff --git a/include/linux/async_tx.h b/include/linux/async_tx.h
index bbda58d48dbd..84d5cc5ff060 100644
--- a/include/linux/async_tx.h
+++ b/include/linux/async_tx.h
@@ -196,11 +196,13 @@ async_syndrome_val(struct page **blocks, unsigned int *offsets, int src_cnt,
struct dma_async_tx_descriptor *
async_raid6_2data_recov(int src_num, size_t bytes, int faila, int failb,
- struct page **ptrs, struct async_submit_ctl *submit);
+ struct page **ptrs, unsigned int *offs,
+ struct async_submit_ctl *submit);
struct dma_async_tx_descriptor *
async_raid6_datap_recov(int src_num, size_t bytes, int faila,
- struct page **ptrs, struct async_submit_ctl *submit);
+ struct page **ptrs, unsigned int *offs,
+ struct async_submit_ctl *submit);
void async_tx_quiesce(struct dma_async_tx_descriptor **tx);
#endif /* _ASYNC_TX_H_ */
--
2.25.4
* [PATCH v5 15/16] md/raid6: compute syndrome with correct page offset
2020-07-02 12:06 [PATCH v5 00/16] md/raid5: set STRIPE_SIZE as a configurable value Yufen Yu
` (13 preceding siblings ...)
2020-07-02 12:06 ` [PATCH v5 14/16] md/raid6: let async recovery function " Yufen Yu
@ 2020-07-02 12:06 ` Yufen Yu
2020-07-02 12:06 ` [PATCH v5 16/16] raid6test: adaptation with syndrome function Yufen Yu
2020-07-02 23:00 ` [PATCH v5 00/16] md/raid5: set STRIPE_SIZE as a configurable value Song Liu
16 siblings, 0 replies; 37+ messages in thread
From: Yufen Yu @ 2020-07-02 12:06 UTC (permalink / raw)
To: song; +Cc: linux-raid, neilb, guoqing.jiang, houtao1, yuyufen
When raid6 computes the syndrome, the page addresses are passed to the
compute functions. After adding support for a page shared between
multiple sh->dev, we also need to let the compute functions know the
correct location of the data within each page.
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
---
drivers/md/raid5.c | 69 +++++++++++++++++++++++++++++++---------------
1 file changed, 47 insertions(+), 22 deletions(-)
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index a3376a4e4e5c..533097da2ea6 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -1531,6 +1531,7 @@ ops_run_compute5(struct stripe_head *sh, struct raid5_percpu *percpu)
/* set_syndrome_sources - populate source buffers for gen_syndrome
* @srcs - (struct page *) array of size sh->disks
+ * @offs - (unsigned int) array of offset for each page
* @sh - stripe_head to parse
*
* Populates srcs in proper layout order for the stripe and returns the
@@ -1539,6 +1540,7 @@ ops_run_compute5(struct stripe_head *sh, struct raid5_percpu *percpu)
* is recorded in srcs[count+1]].
*/
static int set_syndrome_sources(struct page **srcs,
+ unsigned int *offs,
struct stripe_head *sh,
int srctype)
{
@@ -1569,6 +1571,12 @@ static int set_syndrome_sources(struct page **srcs,
srcs[slot] = sh->dev[i].orig_page;
else
srcs[slot] = sh->dev[i].page;
+ /*
+ * For R5_InJournal, PAGE_SIZE must be 4KB and will
+ * not use r5pages. In that case, dev[i].offset
+ * is 0. So we can also use the value directly.
+ */
+ offs[slot] = sh->dev[i].offset;
}
i = raid6_next_disk(i, disks);
} while (i != d0_idx);
@@ -1581,12 +1589,14 @@ ops_run_compute6_1(struct stripe_head *sh, struct raid5_percpu *percpu)
{
int disks = sh->disks;
struct page **blocks = to_addr_page(percpu, 0);
+ unsigned int *offs = to_addr_offs(sh, percpu);
int target;
int qd_idx = sh->qd_idx;
struct dma_async_tx_descriptor *tx;
struct async_submit_ctl submit;
struct r5dev *tgt;
struct page *dest;
+ unsigned int dest_off;
int i;
int count;
struct r5conf *conf = sh->raid_conf;
@@ -1606,32 +1616,35 @@ ops_run_compute6_1(struct stripe_head *sh, struct raid5_percpu *percpu)
tgt = &sh->dev[target];
BUG_ON(!test_bit(R5_Wantcompute, &tgt->flags));
dest = tgt->page;
+ dest_off = tgt->offset;
atomic_inc(&sh->count);
if (target == qd_idx) {
- count = set_syndrome_sources(blocks, sh, SYNDROME_SRC_ALL);
+ count = set_syndrome_sources(blocks, offs,
+ sh, SYNDROME_SRC_ALL);
blocks[count] = NULL; /* regenerating p is not necessary */
BUG_ON(blocks[count+1] != dest); /* q should already be set */
init_async_submit(&submit, ASYNC_TX_FENCE, NULL,
ops_complete_compute, sh,
to_addr_conv(sh, percpu, 0));
- tx = async_gen_syndrome(blocks, 0, count+2,
- conf->stripe_size, &submit);
+ tx = async_gen_syndrome(blocks, offs,
+ count+2, conf->stripe_size, &submit);
} else {
/* Compute any data- or p-drive using XOR */
count = 0;
for (i = disks; i-- ; ) {
if (i == target || i == qd_idx)
continue;
+ offs[count] = sh->dev[i].offset;
blocks[count++] = sh->dev[i].page;
}
init_async_submit(&submit, ASYNC_TX_FENCE|ASYNC_TX_XOR_ZERO_DST,
NULL, ops_complete_compute, sh,
to_addr_conv(sh, percpu, 0));
- tx = async_xor(dest, blocks, 0, count,
- conf->stripe_size, &submit);
+ tx = async_xor_offsets(dest, dest_off, blocks, offs,
+ count, conf->stripe_size, &submit);
}
return tx;
@@ -1650,6 +1663,7 @@ ops_run_compute6_2(struct stripe_head *sh, struct raid5_percpu *percpu)
struct r5dev *tgt2 = &sh->dev[target2];
struct dma_async_tx_descriptor *tx;
struct page **blocks = to_addr_page(percpu, 0);
+ unsigned int *offs = to_addr_offs(sh, percpu);
struct async_submit_ctl submit;
struct r5conf *conf = sh->raid_conf;
@@ -1670,6 +1684,7 @@ ops_run_compute6_2(struct stripe_head *sh, struct raid5_percpu *percpu)
do {
int slot = raid6_idx_to_slot(i, sh, &count, syndrome_disks);
+ offs[slot] = sh->dev[i].offset;
blocks[slot] = sh->dev[i].page;
if (i == target)
@@ -1694,10 +1709,12 @@ ops_run_compute6_2(struct stripe_head *sh, struct raid5_percpu *percpu)
init_async_submit(&submit, ASYNC_TX_FENCE, NULL,
ops_complete_compute, sh,
to_addr_conv(sh, percpu, 0));
- return async_gen_syndrome(blocks, 0, syndrome_disks+2,
- conf->stripe_size, &submit);
+ return async_gen_syndrome(blocks, offs,
+ syndrome_disks+2,
+ conf->stripe_size, &submit);
} else {
struct page *dest;
+ unsigned int dest_off;
int data_target;
int qd_idx = sh->qd_idx;
@@ -1711,21 +1728,24 @@ ops_run_compute6_2(struct stripe_head *sh, struct raid5_percpu *percpu)
for (i = disks; i-- ; ) {
if (i == data_target || i == qd_idx)
continue;
+ offs[count] = sh->dev[i].offset;
blocks[count++] = sh->dev[i].page;
}
dest = sh->dev[data_target].page;
+ dest_off = sh->dev[data_target].offset;
init_async_submit(&submit,
ASYNC_TX_FENCE|ASYNC_TX_XOR_ZERO_DST,
NULL, NULL, NULL,
to_addr_conv(sh, percpu, 0));
- tx = async_xor(dest, blocks, 0, count,
- conf->stripe_size, &submit);
+ tx = async_xor_offsets(dest, dest_off, blocks, offs,
+ count, conf->stripe_size, &submit);
- count = set_syndrome_sources(blocks, sh, SYNDROME_SRC_ALL);
+ count = set_syndrome_sources(blocks, offs,
+ sh, SYNDROME_SRC_ALL);
init_async_submit(&submit, ASYNC_TX_FENCE, tx,
ops_complete_compute, sh,
to_addr_conv(sh, percpu, 0));
- return async_gen_syndrome(blocks, 0, count+2,
+ return async_gen_syndrome(blocks, offs, count+2,
conf->stripe_size, &submit);
}
} else {
@@ -1736,13 +1756,13 @@ ops_run_compute6_2(struct stripe_head *sh, struct raid5_percpu *percpu)
/* We're missing D+P. */
return async_raid6_datap_recov(syndrome_disks+2,
conf->stripe_size, faila,
- blocks, &submit);
+ blocks, offs, &submit);
} else {
/* We're missing D+D. */
return async_raid6_2data_recov(syndrome_disks+2,
conf->stripe_size,
faila, failb,
- blocks, &submit);
+ blocks, offs, &submit);
}
}
}
@@ -1810,6 +1830,7 @@ ops_run_prexor6(struct stripe_head *sh, struct raid5_percpu *percpu,
struct dma_async_tx_descriptor *tx)
{
struct page **blocks = to_addr_page(percpu, 0);
+ unsigned int *offs = to_addr_offs(sh, percpu);
int count;
struct async_submit_ctl submit;
struct r5conf *conf = sh->raid_conf;
@@ -1817,12 +1838,12 @@ ops_run_prexor6(struct stripe_head *sh, struct raid5_percpu *percpu,
pr_debug("%s: stripe %llu\n", __func__,
(unsigned long long)sh->sector);
- count = set_syndrome_sources(blocks, sh, SYNDROME_SRC_WANT_DRAIN);
+ count = set_syndrome_sources(blocks, offs, sh, SYNDROME_SRC_WANT_DRAIN);
init_async_submit(&submit, ASYNC_TX_FENCE|ASYNC_TX_PQ_XOR_DST, tx,
ops_complete_prexor, sh, to_addr_conv(sh, percpu, 0));
- tx = async_gen_syndrome(blocks, 0, count+2,
- conf->stripe_size, &submit);
+ tx = async_gen_syndrome(blocks, offs, count+2,
+ conf->stripe_size, &submit);
return tx;
}
@@ -2054,6 +2075,7 @@ ops_run_reconstruct6(struct stripe_head *sh, struct raid5_percpu *percpu,
{
struct async_submit_ctl submit;
struct page **blocks;
+ unsigned int *offs;
int count, i, j = 0;
struct stripe_head *head_sh = sh;
int last_stripe;
@@ -2079,6 +2101,7 @@ ops_run_reconstruct6(struct stripe_head *sh, struct raid5_percpu *percpu,
again:
blocks = to_addr_page(percpu, j);
+ offs = to_addr_offs(sh, percpu);
if (sh->reconstruct_state == reconstruct_state_prexor_drain_run) {
synflags = SYNDROME_SRC_WRITTEN;
@@ -2088,7 +2111,7 @@ ops_run_reconstruct6(struct stripe_head *sh, struct raid5_percpu *percpu,
txflags = ASYNC_TX_ACK;
}
- count = set_syndrome_sources(blocks, sh, synflags);
+ count = set_syndrome_sources(blocks, offs, sh, synflags);
last_stripe = !head_sh->batch_head ||
list_first_entry(&sh->batch_list,
struct stripe_head, batch_list) == head_sh;
@@ -2100,8 +2123,8 @@ ops_run_reconstruct6(struct stripe_head *sh, struct raid5_percpu *percpu,
} else
init_async_submit(&submit, 0, tx, NULL, NULL,
to_addr_conv(sh, percpu, j));
- tx = async_gen_syndrome(blocks, 0, count+2,
- conf->stripe_size, &submit);
+ tx = async_gen_syndrome(blocks, offs, count+2,
+ conf->stripe_size, &submit);
if (!last_stripe) {
j++;
sh = list_first_entry(&sh->batch_list, struct stripe_head,
@@ -2168,6 +2191,7 @@ static void ops_run_check_p(struct stripe_head *sh, struct raid5_percpu *percpu)
static void ops_run_check_pq(struct stripe_head *sh, struct raid5_percpu *percpu, int checkp)
{
struct page **srcs = to_addr_page(percpu, 0);
+ unsigned int *offs = to_addr_offs(sh, percpu);
struct async_submit_ctl submit;
int count;
struct r5conf *conf = sh->raid_conf;
@@ -2176,15 +2200,16 @@ static void ops_run_check_pq(struct stripe_head *sh, struct raid5_percpu *percpu
(unsigned long long)sh->sector, checkp);
BUG_ON(sh->batch_head);
- count = set_syndrome_sources(srcs, sh, SYNDROME_SRC_ALL);
+ count = set_syndrome_sources(srcs, offs, sh, SYNDROME_SRC_ALL);
if (!checkp)
srcs[count] = NULL;
atomic_inc(&sh->count);
init_async_submit(&submit, ASYNC_TX_ACK, NULL, ops_complete_check,
sh, to_addr_conv(sh, percpu, 0));
- async_syndrome_val(srcs, 0, count+2, conf->stripe_size,
- &sh->ops.zero_sum_result, percpu->spare_page, &submit);
+ async_syndrome_val(srcs, offs, count+2, conf->stripe_size,
+ &sh->ops.zero_sum_result,
+ percpu->spare_page, 0, &submit);
}
static void raid_run_ops(struct stripe_head *sh, unsigned long ops_request)
--
2.25.4
^ permalink raw reply related [flat|nested] 37+ messages in thread
* [PATCH v5 16/16] raid6test: adaptation with syndrome function
2020-07-02 12:06 [PATCH v5 00/16] md/raid5: set STRIPE_SIZE as a configurable value Yufen Yu
` (14 preceding siblings ...)
2020-07-02 12:06 ` [PATCH v5 15/16] md/raid6: compute syndrome with correct " Yufen Yu
@ 2020-07-02 12:06 ` Yufen Yu
2020-07-02 23:00 ` [PATCH v5 00/16] md/raid5: set STRIPE_SIZE as a configurable value Song Liu
16 siblings, 0 replies; 37+ messages in thread
From: Yufen Yu @ 2020-07-02 12:06 UTC (permalink / raw)
To: song; +Cc: linux-raid, neilb, guoqing.jiang, houtao1, yuyufen
After changing some syndrome and recovery functions to support
different page offsets, we also need to adapt the raid6test module.
In this module, pages are allocated by the module itself and their
offsets are all '0'.
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
---
crypto/async_tx/raid6test.c | 24 ++++++++++++++++--------
1 file changed, 16 insertions(+), 8 deletions(-)
diff --git a/crypto/async_tx/raid6test.c b/crypto/async_tx/raid6test.c
index 14e73dcd7475..66db82e5a3b1 100644
--- a/crypto/async_tx/raid6test.c
+++ b/crypto/async_tx/raid6test.c
@@ -18,6 +18,7 @@
#define NDISKS 64 /* Including P and Q */
static struct page *dataptrs[NDISKS];
+unsigned int dataoffs[NDISKS];
static addr_conv_t addr_conv[NDISKS];
static struct page *data[NDISKS+3];
static struct page *spare;
@@ -38,6 +39,7 @@ static void makedata(int disks)
for (i = 0; i < disks; i++) {
prandom_bytes(page_address(data[i]), PAGE_SIZE);
dataptrs[i] = data[i];
+ dataoffs[i] = 0;
}
}
@@ -52,7 +54,8 @@ static char disk_type(int d, int disks)
}
/* Recover two failed blocks. */
-static void raid6_dual_recov(int disks, size_t bytes, int faila, int failb, struct page **ptrs)
+static void raid6_dual_recov(int disks, size_t bytes, int faila, int failb,
+ struct page **ptrs, unsigned int *offs)
{
struct async_submit_ctl submit;
struct completion cmp;
@@ -66,7 +69,8 @@ static void raid6_dual_recov(int disks, size_t bytes, int faila, int failb, stru
if (faila == disks-2) {
/* P+Q failure. Just rebuild the syndrome. */
init_async_submit(&submit, 0, NULL, NULL, NULL, addr_conv);
- tx = async_gen_syndrome(ptrs, 0, disks, bytes, &submit);
+ tx = async_gen_syndrome(ptrs, offs,
+ disks, bytes, &submit);
} else {
struct page *blocks[NDISKS];
struct page *dest;
@@ -89,22 +93,26 @@ static void raid6_dual_recov(int disks, size_t bytes, int faila, int failb, stru
tx = async_xor(dest, blocks, 0, count, bytes, &submit);
init_async_submit(&submit, 0, tx, NULL, NULL, addr_conv);
- tx = async_gen_syndrome(ptrs, 0, disks, bytes, &submit);
+ tx = async_gen_syndrome(ptrs, offs,
+ disks, bytes, &submit);
}
} else {
if (failb == disks-2) {
/* data+P failure. */
init_async_submit(&submit, 0, NULL, NULL, NULL, addr_conv);
- tx = async_raid6_datap_recov(disks, bytes, faila, ptrs, &submit);
+ tx = async_raid6_datap_recov(disks, bytes,
+ faila, ptrs, offs, &submit);
} else {
/* data+data failure. */
init_async_submit(&submit, 0, NULL, NULL, NULL, addr_conv);
- tx = async_raid6_2data_recov(disks, bytes, faila, failb, ptrs, &submit);
+ tx = async_raid6_2data_recov(disks, bytes,
+ faila, failb, ptrs, offs, &submit);
}
}
init_completion(&cmp);
init_async_submit(&submit, ASYNC_TX_ACK, tx, callback, &cmp, addr_conv);
- tx = async_syndrome_val(ptrs, 0, disks, bytes, &result, spare, &submit);
+ tx = async_syndrome_val(ptrs, offs,
+ disks, bytes, &result, spare, 0, &submit);
async_tx_issue_pending(tx);
if (wait_for_completion_timeout(&cmp, msecs_to_jiffies(3000)) == 0)
@@ -126,7 +134,7 @@ static int test_disks(int i, int j, int disks)
dataptrs[i] = recovi;
dataptrs[j] = recovj;
- raid6_dual_recov(disks, PAGE_SIZE, i, j, dataptrs);
+ raid6_dual_recov(disks, PAGE_SIZE, i, j, dataptrs, dataoffs);
erra = memcmp(page_address(data[i]), page_address(recovi), PAGE_SIZE);
errb = memcmp(page_address(data[j]), page_address(recovj), PAGE_SIZE);
@@ -162,7 +170,7 @@ static int test(int disks, int *tests)
/* Generate assumed good syndrome */
init_completion(&cmp);
init_async_submit(&submit, ASYNC_TX_ACK, NULL, callback, &cmp, addr_conv);
- tx = async_gen_syndrome(dataptrs, 0, disks, PAGE_SIZE, &submit);
+ tx = async_gen_syndrome(dataptrs, dataoffs, disks, PAGE_SIZE, &submit);
async_tx_issue_pending(tx);
if (wait_for_completion_timeout(&cmp, msecs_to_jiffies(3000)) == 0) {
--
2.25.4
* Re: [PATCH v5 01/16] md/raid456: covert macro define of STRIPE_* as members of struct r5conf
2020-07-02 12:06 ` [PATCH v5 01/16] md/raid456: covert macro define of STRIPE_* as members of struct r5conf Yufen Yu
@ 2020-07-02 14:51 ` kernel test robot
2020-07-02 15:44 ` kernel test robot
` (2 subsequent siblings)
3 siblings, 0 replies; 37+ messages in thread
From: kernel test robot @ 2020-07-02 14:51 UTC (permalink / raw)
To: song; +Cc: kbuild-all, linux-raid, neilb, guoqing.jiang, houtao1, yuyufen
[-- Attachment #1: Type: text/plain, Size: 16185 bytes --]
Hi Yufen,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on song-md/md-next]
[also build test WARNING on cryptodev/master v5.8-rc3 next-20200702]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Yufen-Yu/md-raid5-set-STRIPE_SIZE-as-a-configurable-value/20200702-200949
base: git://git.kernel.org/pub/scm/linux/kernel/git/song/md.git md-next
config: parisc-defconfig (attached as .config)
compiler: hppa-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=parisc
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All warnings (new ones prefixed by >>):
In file included from include/linux/kernel.h:15,
from arch/parisc/include/asm/bug.h:5,
from include/linux/bug.h:5,
from include/linux/thread_info.h:12,
from include/asm-generic/current.h:5,
from ./arch/parisc/include/generated/asm/current.h:1,
from include/linux/sched.h:12,
from include/linux/blkdev.h:5,
from drivers/md/raid5.c:38:
drivers/md/raid5.c: In function 'raid5_end_read_request':
include/linux/kern_levels.h:5:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'unsigned int' [-Wformat=]
5 | #define KERN_SOH "\001" /* ASCII Start Of Header */
| ^~~~~~
include/linux/printk.h:507:10: note: in definition of macro 'printk_ratelimited'
507 | printk(fmt, ##__VA_ARGS__); \
| ^~~
include/linux/kern_levels.h:14:19: note: in expansion of macro 'KERN_SOH'
14 | #define KERN_INFO KERN_SOH "6" /* informational */
| ^~~~~~~~
include/linux/printk.h:527:21: note: in expansion of macro 'KERN_INFO'
527 | printk_ratelimited(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
| ^~~~~~~~~
>> drivers/md/raid5.c:2533:4: note: in expansion of macro 'pr_info_ratelimited'
2533 | pr_info_ratelimited(
| ^~~~~~~~~~~~~~~~~~~
drivers/md/raid5.c:2534:42: note: format string is defined here
2534 | "md/raid:%s: read error corrected (%lu sectors at %llu on %s)\n",
| ~~^
| |
| long unsigned int
| %u
In file included from include/linux/printk.h:7,
from include/linux/kernel.h:15,
from arch/parisc/include/asm/bug.h:5,
from include/linux/bug.h:5,
from include/linux/thread_info.h:12,
from include/asm-generic/current.h:5,
from ./arch/parisc/include/generated/asm/current.h:1,
from include/linux/sched.h:12,
from include/linux/blkdev.h:5,
from drivers/md/raid5.c:38:
drivers/md/raid5.c: In function 'check_stripe_cache':
include/linux/kern_levels.h:5:18: warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'unsigned int' [-Wformat=]
5 | #define KERN_SOH "\001" /* ASCII Start Of Header */
| ^~~~~~
include/linux/kern_levels.h:12:22: note: in expansion of macro 'KERN_SOH'
12 | #define KERN_WARNING KERN_SOH "4" /* warning conditions */
| ^~~~~~~~
include/linux/printk.h:348:9: note: in expansion of macro 'KERN_WARNING'
348 | printk(KERN_WARNING pr_fmt(fmt), ##__VA_ARGS__)
| ^~~~~~~~~~~~
>> drivers/md/raid5.c:7853:3: note: in expansion of macro 'pr_warn'
7853 | pr_warn("md/raid:%s: reshape: not enough stripes. Needed %lu\n",
| ^~~~~~~
drivers/md/raid5.c:7853:63: note: format string is defined here
7853 | pr_warn("md/raid:%s: reshape: not enough stripes. Needed %lu\n",
| ~~^
| |
| long unsigned int
| %u
drivers/md/raid5.c: In function 'raid5_start_reshape':
drivers/md/raid5.c:7993:31: warning: suggest braces around empty body in an 'if' statement [-Wempty-body]
7993 | /* Failure here is OK */;
| ^
vim +/pr_info_ratelimited +2533 drivers/md/raid5.c
^1da177e4c3f41 Linus Torvalds 2005-04-16 2490
4246a0b63bd8f5 Christoph Hellwig 2015-07-20 2491 static void raid5_end_read_request(struct bio * bi)
^1da177e4c3f41 Linus Torvalds 2005-04-16 2492 {
^1da177e4c3f41 Linus Torvalds 2005-04-16 2493 struct stripe_head *sh = bi->bi_private;
d1688a6d5515f1 NeilBrown 2011-10-11 2494 struct r5conf *conf = sh->raid_conf;
7ecaa1e6a1ad69 NeilBrown 2006-03-27 2495 int disks = sh->disks, i;
d69504325978c4 NeilBrown 2006-07-10 2496 char b[BDEVNAME_SIZE];
dd054fce88d33d NeilBrown 2011-12-23 2497 struct md_rdev *rdev = NULL;
05616be5e11f66 NeilBrown 2012-05-21 2498 sector_t s;
^1da177e4c3f41 Linus Torvalds 2005-04-16 2499
^1da177e4c3f41 Linus Torvalds 2005-04-16 2500 for (i=0 ; i<disks; i++)
^1da177e4c3f41 Linus Torvalds 2005-04-16 2501 if (bi == &sh->dev[i].req)
^1da177e4c3f41 Linus Torvalds 2005-04-16 2502 break;
^1da177e4c3f41 Linus Torvalds 2005-04-16 2503
4246a0b63bd8f5 Christoph Hellwig 2015-07-20 2504 pr_debug("end_read_request %llu/%d, count: %d, error %d.\n",
^1da177e4c3f41 Linus Torvalds 2005-04-16 2505 (unsigned long long)sh->sector, i, atomic_read(&sh->count),
4e4cbee93d5613 Christoph Hellwig 2017-06-03 2506 bi->bi_status);
^1da177e4c3f41 Linus Torvalds 2005-04-16 2507 if (i == disks) {
5f9d1fde7d54a5 Shaohua Li 2016-08-22 2508 bio_reset(bi);
^1da177e4c3f41 Linus Torvalds 2005-04-16 2509 BUG();
6712ecf8f64811 NeilBrown 2007-09-27 2510 return;
^1da177e4c3f41 Linus Torvalds 2005-04-16 2511 }
14a75d3e07c784 NeilBrown 2011-12-23 2512 if (test_bit(R5_ReadRepl, &sh->dev[i].flags))
dd054fce88d33d NeilBrown 2011-12-23 2513 /* If replacement finished while this request was outstanding,
dd054fce88d33d NeilBrown 2011-12-23 2514 * 'replacement' might be NULL already.
dd054fce88d33d NeilBrown 2011-12-23 2515 * In that case it moved down to 'rdev'.
dd054fce88d33d NeilBrown 2011-12-23 2516 * rdev is not removed until all requests are finished.
dd054fce88d33d NeilBrown 2011-12-23 2517 */
14a75d3e07c784 NeilBrown 2011-12-23 2518 rdev = conf->disks[i].replacement;
dd054fce88d33d NeilBrown 2011-12-23 2519 if (!rdev)
14a75d3e07c784 NeilBrown 2011-12-23 2520 rdev = conf->disks[i].rdev;
^1da177e4c3f41 Linus Torvalds 2005-04-16 2521
05616be5e11f66 NeilBrown 2012-05-21 2522 if (use_new_offset(conf, sh))
05616be5e11f66 NeilBrown 2012-05-21 2523 s = sh->sector + rdev->new_data_offset;
05616be5e11f66 NeilBrown 2012-05-21 2524 else
05616be5e11f66 NeilBrown 2012-05-21 2525 s = sh->sector + rdev->data_offset;
4e4cbee93d5613 Christoph Hellwig 2017-06-03 2526 if (!bi->bi_status) {
^1da177e4c3f41 Linus Torvalds 2005-04-16 2527 set_bit(R5_UPTODATE, &sh->dev[i].flags);
4e5314b56a7ea1 NeilBrown 2005-11-08 2528 if (test_bit(R5_ReadError, &sh->dev[i].flags)) {
14a75d3e07c784 NeilBrown 2011-12-23 2529 /* Note that this cannot happen on a
14a75d3e07c784 NeilBrown 2011-12-23 2530 * replacement device. We just fail those on
14a75d3e07c784 NeilBrown 2011-12-23 2531 * any error
14a75d3e07c784 NeilBrown 2011-12-23 2532 */
cc6167b4f3b3ca NeilBrown 2016-11-02 @2533 pr_info_ratelimited(
cc6167b4f3b3ca NeilBrown 2016-11-02 2534 "md/raid:%s: read error corrected (%lu sectors at %llu on %s)\n",
98afa1940ade70 Yufen Yu 2020-07-02 2535 mdname(conf->mddev),
98afa1940ade70 Yufen Yu 2020-07-02 2536 conf->stripe_sectors,
05616be5e11f66 NeilBrown 2012-05-21 2537 (unsigned long long)s,
d69504325978c4 NeilBrown 2006-07-10 2538 bdevname(rdev->bdev, b));
98afa1940ade70 Yufen Yu 2020-07-02 2539 atomic_add(conf->stripe_sectors, &rdev->corrected_errors);
4e5314b56a7ea1 NeilBrown 2005-11-08 2540 clear_bit(R5_ReadError, &sh->dev[i].flags);
4e5314b56a7ea1 NeilBrown 2005-11-08 2541 clear_bit(R5_ReWrite, &sh->dev[i].flags);
3f9e7c140e4c4e majianpeng 2012-07-31 2542 } else if (test_bit(R5_ReadNoMerge, &sh->dev[i].flags))
3f9e7c140e4c4e majianpeng 2012-07-31 2543 clear_bit(R5_ReadNoMerge, &sh->dev[i].flags);
3f9e7c140e4c4e majianpeng 2012-07-31 2544
86aa1397ddfde5 Song Liu 2017-01-12 2545 if (test_bit(R5_InJournal, &sh->dev[i].flags))
86aa1397ddfde5 Song Liu 2017-01-12 2546 /*
86aa1397ddfde5 Song Liu 2017-01-12 2547 * end read for a page in journal, this
86aa1397ddfde5 Song Liu 2017-01-12 2548 * must be preparing for prexor in rmw
86aa1397ddfde5 Song Liu 2017-01-12 2549 */
86aa1397ddfde5 Song Liu 2017-01-12 2550 set_bit(R5_OrigPageUPTDODATE, &sh->dev[i].flags);
86aa1397ddfde5 Song Liu 2017-01-12 2551
14a75d3e07c784 NeilBrown 2011-12-23 2552 if (atomic_read(&rdev->read_errors))
14a75d3e07c784 NeilBrown 2011-12-23 2553 atomic_set(&rdev->read_errors, 0);
^1da177e4c3f41 Linus Torvalds 2005-04-16 2554 } else {
14a75d3e07c784 NeilBrown 2011-12-23 2555 const char *bdn = bdevname(rdev->bdev, b);
ba22dcbf106338 NeilBrown 2005-11-08 2556 int retry = 0;
2e8ac30312973d majianpeng 2012-07-03 2557 int set_bad = 0;
d69504325978c4 NeilBrown 2006-07-10 2558
^1da177e4c3f41 Linus Torvalds 2005-04-16 2559 clear_bit(R5_UPTODATE, &sh->dev[i].flags);
b76b4715eba0d0 Nigel Croxon 2019-09-06 2560 if (!(bi->bi_status == BLK_STS_PROTECTION))
d69504325978c4 NeilBrown 2006-07-10 2561 atomic_inc(&rdev->read_errors);
14a75d3e07c784 NeilBrown 2011-12-23 2562 if (test_bit(R5_ReadRepl, &sh->dev[i].flags))
cc6167b4f3b3ca NeilBrown 2016-11-02 2563 pr_warn_ratelimited(
cc6167b4f3b3ca NeilBrown 2016-11-02 2564 "md/raid:%s: read error on replacement device (sector %llu on %s).\n",
14a75d3e07c784 NeilBrown 2011-12-23 2565 mdname(conf->mddev),
05616be5e11f66 NeilBrown 2012-05-21 2566 (unsigned long long)s,
14a75d3e07c784 NeilBrown 2011-12-23 2567 bdn);
2e8ac30312973d majianpeng 2012-07-03 2568 else if (conf->mddev->degraded >= conf->max_degraded) {
2e8ac30312973d majianpeng 2012-07-03 2569 set_bad = 1;
cc6167b4f3b3ca NeilBrown 2016-11-02 2570 pr_warn_ratelimited(
cc6167b4f3b3ca NeilBrown 2016-11-02 2571 "md/raid:%s: read error not correctable (sector %llu on %s).\n",
d69504325978c4 NeilBrown 2006-07-10 2572 mdname(conf->mddev),
05616be5e11f66 NeilBrown 2012-05-21 2573 (unsigned long long)s,
d69504325978c4 NeilBrown 2006-07-10 2574 bdn);
2e8ac30312973d majianpeng 2012-07-03 2575 } else if (test_bit(R5_ReWrite, &sh->dev[i].flags)) {
4e5314b56a7ea1 NeilBrown 2005-11-08 2576 /* Oh, no!!! */
2e8ac30312973d majianpeng 2012-07-03 2577 set_bad = 1;
cc6167b4f3b3ca NeilBrown 2016-11-02 2578 pr_warn_ratelimited(
cc6167b4f3b3ca NeilBrown 2016-11-02 2579 "md/raid:%s: read error NOT corrected!! (sector %llu on %s).\n",
d69504325978c4 NeilBrown 2006-07-10 2580 mdname(conf->mddev),
05616be5e11f66 NeilBrown 2012-05-21 2581 (unsigned long long)s,
d69504325978c4 NeilBrown 2006-07-10 2582 bdn);
2e8ac30312973d majianpeng 2012-07-03 2583 } else if (atomic_read(&rdev->read_errors)
0009fad0333708 Nigel Croxon 2019-08-21 2584 > conf->max_nr_stripes) {
0009fad0333708 Nigel Croxon 2019-08-21 2585 if (!test_bit(Faulty, &rdev->flags)) {
0009fad0333708 Nigel Croxon 2019-08-21 2586 pr_warn("md/raid:%s: %d read_errors > %d stripes\n",
0009fad0333708 Nigel Croxon 2019-08-21 2587 mdname(conf->mddev),
0009fad0333708 Nigel Croxon 2019-08-21 2588 atomic_read(&rdev->read_errors),
0009fad0333708 Nigel Croxon 2019-08-21 2589 conf->max_nr_stripes);
cc6167b4f3b3ca NeilBrown 2016-11-02 2590 pr_warn("md/raid:%s: Too many read errors, failing device %s.\n",
d69504325978c4 NeilBrown 2006-07-10 2591 mdname(conf->mddev), bdn);
0009fad0333708 Nigel Croxon 2019-08-21 2592 }
0009fad0333708 Nigel Croxon 2019-08-21 2593 } else
ba22dcbf106338 NeilBrown 2005-11-08 2594 retry = 1;
edfa1f651e9326 Bian Yu 2013-11-14 2595 if (set_bad && test_bit(In_sync, &rdev->flags)
edfa1f651e9326 Bian Yu 2013-11-14 2596 && !test_bit(R5_ReadNoMerge, &sh->dev[i].flags))
edfa1f651e9326 Bian Yu 2013-11-14 2597 retry = 1;
ba22dcbf106338 NeilBrown 2005-11-08 2598 if (retry)
143f6e733b7305 Xiao Ni 2019-07-08 2599 if (sh->qd_idx >= 0 && sh->pd_idx == i)
143f6e733b7305 Xiao Ni 2019-07-08 2600 set_bit(R5_ReadError, &sh->dev[i].flags);
143f6e733b7305 Xiao Ni 2019-07-08 2601 else if (test_bit(R5_ReadNoMerge, &sh->dev[i].flags)) {
ba22dcbf106338 NeilBrown 2005-11-08 2602 set_bit(R5_ReadError, &sh->dev[i].flags);
3f9e7c140e4c4e majianpeng 2012-07-31 2603 clear_bit(R5_ReadNoMerge, &sh->dev[i].flags);
3f9e7c140e4c4e majianpeng 2012-07-31 2604 } else
3f9e7c140e4c4e majianpeng 2012-07-31 2605 set_bit(R5_ReadNoMerge, &sh->dev[i].flags);
ba22dcbf106338 NeilBrown 2005-11-08 2606 else {
4e5314b56a7ea1 NeilBrown 2005-11-08 2607 clear_bit(R5_ReadError, &sh->dev[i].flags);
4e5314b56a7ea1 NeilBrown 2005-11-08 2608 clear_bit(R5_ReWrite, &sh->dev[i].flags);
2e8ac30312973d majianpeng 2012-07-03 2609 if (!(set_bad
2e8ac30312973d majianpeng 2012-07-03 2610 && test_bit(In_sync, &rdev->flags)
2e8ac30312973d majianpeng 2012-07-03 2611 && rdev_set_badblocks(
98afa1940ade70 Yufen Yu 2020-07-02 2612 rdev, sh->sector,
98afa1940ade70 Yufen Yu 2020-07-02 2613 conf->stripe_sectors, 0)))
d69504325978c4 NeilBrown 2006-07-10 2614 md_error(conf->mddev, rdev);
ba22dcbf106338 NeilBrown 2005-11-08 2615 }
^1da177e4c3f41 Linus Torvalds 2005-04-16 2616 }
14a75d3e07c784 NeilBrown 2011-12-23 2617 rdev_dec_pending(rdev, conf->mddev);
c94455558337ee Shaohua Li 2016-09-08 2618 bio_reset(bi);
^1da177e4c3f41 Linus Torvalds 2005-04-16 2619 clear_bit(R5_LOCKED, &sh->dev[i].flags);
^1da177e4c3f41 Linus Torvalds 2005-04-16 2620 set_bit(STRIPE_HANDLE, &sh->state);
6d036f7d52e5a9 Shaohua Li 2015-08-13 2621 raid5_release_stripe(sh);
^1da177e4c3f41 Linus Torvalds 2005-04-16 2622 }
^1da177e4c3f41 Linus Torvalds 2005-04-16 2623
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 18576 bytes --]
* Re: [PATCH v5 01/16] md/raid456: covert macro define of STRIPE_* as members of struct r5conf
@ 2020-07-02 14:51 ` kernel test robot
0 siblings, 0 replies; 37+ messages in thread
From: kernel test robot @ 2020-07-02 14:51 UTC (permalink / raw)
To: kbuild-all
[Body elided: verbatim duplicate of the kernel test robot report above, as delivered to the kbuild-all list.]
4e5314b56a7ea1 NeilBrown 2005-11-08 2608 clear_bit(R5_ReWrite, &sh->dev[i].flags);
2e8ac30312973d majianpeng 2012-07-03 2609 if (!(set_bad
2e8ac30312973d majianpeng 2012-07-03 2610 && test_bit(In_sync, &rdev->flags)
2e8ac30312973d majianpeng 2012-07-03 2611 && rdev_set_badblocks(
98afa1940ade70 Yufen Yu 2020-07-02 2612 rdev, sh->sector,
98afa1940ade70 Yufen Yu 2020-07-02 2613 conf->stripe_sectors, 0)))
d69504325978c4 NeilBrown 2006-07-10 2614 md_error(conf->mddev, rdev);
ba22dcbf106338 NeilBrown 2005-11-08 2615 }
^1da177e4c3f41 Linus Torvalds 2005-04-16 2616 }
14a75d3e07c784 NeilBrown 2011-12-23 2617 rdev_dec_pending(rdev, conf->mddev);
c94455558337ee Shaohua Li 2016-09-08 2618 bio_reset(bi);
^1da177e4c3f41 Linus Torvalds 2005-04-16 2619 clear_bit(R5_LOCKED, &sh->dev[i].flags);
^1da177e4c3f41 Linus Torvalds 2005-04-16 2620 set_bit(STRIPE_HANDLE, &sh->state);
6d036f7d52e5a9 Shaohua Li 2015-08-13 2621 raid5_release_stripe(sh);
^1da177e4c3f41 Linus Torvalds 2005-04-16 2622 }
^1da177e4c3f41 Linus Torvalds 2005-04-16 2623
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 18576 bytes --]
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v5 01/16] md/raid456: covert macro define of STRIPE_* as members of struct r5conf
2020-07-02 12:06 ` [PATCH v5 01/16] md/raid456: covert macro define of STRIPE_* as members of struct r5conf Yufen Yu
@ 2020-07-02 15:44 ` kernel test robot
2020-07-02 15:44 ` kernel test robot
` (2 subsequent siblings)
3 siblings, 0 replies; 37+ messages in thread
From: kernel test robot @ 2020-07-02 15:44 UTC (permalink / raw)
To: song; +Cc: kbuild-all, linux-raid, neilb, guoqing.jiang, houtao1, yuyufen
[-- Attachment #1: Type: text/plain, Size: 1301 bytes --]
Hi Yufen,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on song-md/md-next]
[also build test ERROR on cryptodev/master v5.8-rc3 next-20200702]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Yufen-Yu/md-raid5-set-STRIPE_SIZE-as-a-configurable-value/20200702-200949
base: git://git.kernel.org/pub/scm/linux/kernel/git/song/md.git md-next
config: parisc-defconfig (attached as .config)
compiler: hppa-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=parisc
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>, old ones prefixed by <<):
>> ERROR: modpost: "__udivdi3" [drivers/md/raid456.ko] undefined!
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 18576 bytes --]
* Re: [PATCH v5 01/16] md/raid456: covert macro define of STRIPE_* as members of struct r5conf
2020-07-02 12:06 ` [PATCH v5 01/16] md/raid456: covert macro define of STRIPE_* as members of struct r5conf Yufen Yu
2020-07-02 14:51 ` kernel test robot
2020-07-02 15:44 ` kernel test robot
@ 2020-07-02 18:15 ` Song Liu
2020-07-02 18:23 ` Paul Menzel
2020-07-06 9:09 ` Guoqing Jiang
3 siblings, 1 reply; 37+ messages in thread
From: Song Liu @ 2020-07-02 18:15 UTC (permalink / raw)
To: Yufen Yu; +Cc: linux-raid, NeilBrown, Guoqing Jiang, Hou Tao
On Thu, Jul 2, 2020 at 5:05 AM Yufen Yu <yuyufen@huawei.com> wrote:
>
> We covert STRIPE_SIZE, STRIPE_SHIFT and STRIPE_SECTORS to stripe_size,
> stripe_shift and stripe_sectors as members of struct r5conf. Then each
> raid456 array can config different stripe_size. This patch is prepared
> for following configurable stripe_size.
>
> Simply replace word STRIPE_ with conf->stripe_ and add 'conf' argument
> for function stripe_hash_locks_hash() and r5_next_bio() to get stripe_size.
> After that, we initialize stripe_size into setup_conf().
>
> Signed-off-by: Yufen Yu <yuyufen@huawei.com>
This patch looks good. Please fix the warning found by the kernel test bot.
Also a nitpick below.
[...]
> -
> /* NOTE NR_STRIPE_HASH_LOCKS must remain below 64.
> * This is because we sometimes take all the spinlocks
> * and creating that much locking depth can cause
> @@ -574,6 +554,9 @@ struct r5conf {
> int raid_disks;
> int max_nr_stripes;
> int min_nr_stripes;
> + unsigned int stripe_size;
> + unsigned int stripe_shift;
> + unsigned int stripe_sectors;
Are you using a different tab size (other than 8)? These 3 new lines are not
aligned with the rest with tab size of 8.
>
> /* reshape_progress is the leading edge of a 'reshape'
> * It has value MaxSector when no reshape is happening
> @@ -752,6 +735,24 @@ static inline int algorithm_is_DDF(int layout)
> return layout >= 8 && layout <= 10;
> }
>
> +/* bio's attached to a stripe+device for I/O are linked together in bi_sector
> + * order without overlap. There may be several bio's per stripe+device, and
> + * a bio could span several devices.
> + * When walking this list for a particular stripe+device, we must never proceed
> + * beyond a bio that extends past this device, as the next bio might no longer
> + * be valid.
> + * This function is used to determine the 'next' bio in the list, given the
> + * sector of the current stripe+device
> + */
> +static inline struct bio *
> +r5_next_bio(struct r5conf *conf, struct bio *bio, sector_t sector)
> +{
> + if (bio_end_sector(bio) < sector + conf->stripe_sectors)
> + return bio->bi_next;
> + else
> + return NULL;
> +}
> +
> extern void md_raid5_kick_device(struct r5conf *conf);
> extern int raid5_set_cache_size(struct mddev *mddev, int size);
> extern sector_t raid5_compute_blocknr(struct stripe_head *sh, int i, int previous);
> --
> 2.25.4
>
* Re: [PATCH v5 01/16] md/raid456: covert macro define of STRIPE_* as members of struct r5conf
2020-07-02 18:15 ` Song Liu
@ 2020-07-02 18:23 ` Paul Menzel
2020-07-12 22:55 ` antlists
0 siblings, 1 reply; 37+ messages in thread
From: Paul Menzel @ 2020-07-02 18:23 UTC (permalink / raw)
To: Song Liu, Yufen Yu; +Cc: linux-raid, NeilBrown, Guoqing Jiang, Hou Tao
Dear Yufen,
Am 02.07.20 um 20:15 schrieb Song Liu:
> On Thu, Jul 2, 2020 at 5:05 AM Yufen Yu <yuyufen@huawei.com> wrote:
>>
>> We covert STRIPE_SIZE, STRIPE_SHIFT and STRIPE_SECTORS to stripe_size,
>> stripe_shift and stripe_sectors as members of struct r5conf. Then each
>> raid456 array can config different stripe_size. This patch is prepared
>> for following configurable stripe_size.
>>
>> Simply replace word STRIPE_ with conf->stripe_ and add 'conf' argument
>> for function stripe_hash_locks_hash() and r5_next_bio() to get stripe_size.
>> After that, we initialize stripe_size into setup_conf().
>>
>> Signed-off-by: Yufen Yu <yuyufen@huawei.com>
>
> This patch looks good. Please fix the warning found by the kernel test bot.
> Also a nitpick below.
Please also fix the typo *covert* to *Convert* in the commit message
summary, and maybe use *to* instead of *as*.
> Convert macro define of STRIPE_* to members of struct r5conf
[…]
Kind regards,
Paul
* Re: [PATCH v5 02/16] md/raid5: add sysfs entry to set and show stripe_size
2020-07-02 12:06 ` [PATCH v5 02/16] md/raid5: add sysfs entry to set and show stripe_size Yufen Yu
@ 2020-07-02 22:14 ` Song Liu
0 siblings, 0 replies; 37+ messages in thread
From: Song Liu @ 2020-07-02 22:14 UTC (permalink / raw)
To: Yufen Yu; +Cc: linux-raid, NeilBrown, Guoqing Jiang, Hou Tao
On Thu, Jul 2, 2020 at 5:05 AM Yufen Yu <yuyufen@huawei.com> wrote:
>
> Here, we don't support setting stripe_size by sysfs.
> Following patches will do that.
>
> Signed-off-by: Yufen Yu <yuyufen@huawei.com>
> ---
> drivers/md/raid5.c | 22 ++++++++++++++++++++++
> 1 file changed, 22 insertions(+)
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 2981b853c388..51bc39dab57b 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -6518,6 +6518,27 @@ raid5_rmw_level = __ATTR(rmw_level, S_IRUGO | S_IWUSR,
> raid5_show_rmw_level,
> raid5_store_rmw_level);
>
> +static ssize_t
> +raid5_show_stripe_size(struct mddev *mddev, char *page)
> +{
> + struct r5conf *conf = mddev->private;
> +
> + if (conf)
> + return sprintf(page, "%d\n", conf->stripe_size);
> + else
> + return 0;
> +}
> +
> +static ssize_t
> +raid5_store_stripe_size(struct mddev *mddev, const char *page, size_t len)
> +{
> + return -EINVAL;
> +}
How about we make the file read only for now?
Song
* Re: [PATCH v5 12/16] md/raid5: support config stripe_size by sysfs entry
2020-07-02 12:06 ` [PATCH v5 12/16] md/raid5: support config stripe_size by sysfs entry Yufen Yu
@ 2020-07-02 22:38 ` Song Liu
2020-07-04 12:25 ` Yufen Yu
0 siblings, 1 reply; 37+ messages in thread
From: Song Liu @ 2020-07-02 22:38 UTC (permalink / raw)
To: Yufen Yu; +Cc: linux-raid, NeilBrown, Guoqing Jiang, Hou Tao
On Thu, Jul 2, 2020 at 5:05 AM Yufen Yu <yuyufen@huawei.com> wrote:
>
> After this patch, we can adjust stripe_size by writing a value into the sysfs
> entry, e.g., setting stripe_size to 16KB:
>
> echo 16384 > /sys/block/md1/md/stripe_size
>
> Show current stripe_size value:
>
> cat /sys/block/md1/md/stripe_size
>
> stripe_size should not be bigger than PAGE_SIZE, and it must be a
> multiple of 4096.
I think we can just merge 02/16 into this one.
>
> Signed-off-by: Yufen Yu <yuyufen@huawei.com>
> ---
> drivers/md/raid5.c | 69 +++++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 68 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index f0fd01d9122e..a3376a4e4e5c 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -6715,7 +6715,74 @@ raid5_show_stripe_size(struct mddev *mddev, char *page)
> static ssize_t
> raid5_store_stripe_size(struct mddev *mddev, const char *page, size_t len)
> {
> - return -EINVAL;
> + struct r5conf *conf = mddev->private;
We need mddev_lock(mddev) before accessing mddev->private.
> + unsigned int new;
> + int err;
> + int nr;
> +
> + if (len >= PAGE_SIZE)
> + return -EINVAL;
> + if (kstrtouint(page, 10, &new))
> + return -EINVAL;
> + if (!conf)
> + return -ENODEV;
> +
> + /*
> + * When PAGE_SIZE is 4096, we don't need to modify stripe_size.
> + * The value should not be bigger than PAGE_SIZE, and it
> + * must be a multiple of 4096.
> + */
> + if (PAGE_SIZE == 4096 || new % 4096 != 0 ||
> + new > PAGE_SIZE || new == 0)
> + return -EINVAL;
> +
> + if (new == conf->stripe_size)
> + return len;
> +
> + pr_debug("md/raid: change stripe_size from %u to %u\n",
> + conf->stripe_size, new);
> +
> + err = mddev_lock(mddev);
> + if (err)
> + return err;
> +
> + if (mddev->sync_thread ||
> + test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) ||
> + mddev->reshape_position != MaxSector ||
> + mddev->sysfs_active) {
> + err = -EBUSY;
> + goto out_unlock;
> + }
> +
> + nr = conf->max_nr_stripes;
> +
> + /* 1. suspend raid array */
> + mddev_suspend(mddev);
> +
> + /* 2. free all old stripe_head */
> + mutex_lock(&conf->cache_size_mutex);
> + shrink_stripes(conf);
> + BUG_ON(conf->max_nr_stripes != 0);
> +
> + /* 3. set new stripe_size */
> + conf->stripe_size = new;
> + conf->stripe_shift = ilog2(new) - 9;
> + conf->stripe_sectors = new >> 9;
> +
> + /* 4. allocate new stripe_head */
> + if (grow_stripes(conf, nr)) {
> + pr_warn("md/raid:%s: couldn't allocate buffers\n",
> + mdname(mddev));
> + err = -ENOMEM;
> + }
> + mutex_unlock(&conf->cache_size_mutex);
> +
> + /* 5. resume raid array */
> + mddev_resume(mddev);
> +
> +out_unlock:
> + mddev_unlock(mddev);
> + return err ?: len;
> }
>
> static struct md_sysfs_entry
> --
> 2.25.4
>
* Re: [PATCH v5 04/16] md/raid5: add a member of r5pages for struct stripe_head
2020-07-02 12:06 ` [PATCH v5 04/16] md/raid5: add a member of r5pages for struct stripe_head Yufen Yu
@ 2020-07-02 22:56 ` Song Liu
2020-07-03 1:22 ` Jason Yan
0 siblings, 1 reply; 37+ messages in thread
From: Song Liu @ 2020-07-02 22:56 UTC (permalink / raw)
To: Yufen Yu; +Cc: linux-raid, NeilBrown, Guoqing Jiang, Hou Tao
On Thu, Jul 2, 2020 at 5:05 AM Yufen Yu <yuyufen@huawei.com> wrote:
>
> Since grow_buffers() uses alloc_page() to allocate the buffers for
> each stripe_head(), it will allocate 64K buffers but use just
> 4K of them after setting stripe_size to 4096.
>
> To avoid wasting memory, we try to pack multiple 'page's of sh->dev
> into one real page. That means, multiple sh->dev[i].page will point to
> the same page with different offsets. Example of 64K PAGE_SIZE and
> 4K stripe_size as follows:
>
> 64K PAGE_SIZE
> +---+---+---+---+------------------------------+
> | | | | |
> | | | | |
> +-+-+-+-+-+-+-+-+------------------------------+
> ^ ^ ^ ^
> | | | +----------------------------+
> | | | |
> | | +-------------------+ |
> | | | |
> | +----------+ | |
> | | | |
> +-+ | | |
> | | | |
> +-----+-----+------+-----+------+-----+------+------+
> sh | offset(0) | offset(4K) | offset(8K) | offset(12K) |
> + +-----------+------------+------------+-------------+
> +----> dev[0].page dev[1].page dev[2].page dev[3].page
>
> After trying to share one page, the users of sh->dev[i].page need to
> take care:
>
> 1) When issue bio into stripe_head, bi_io_vec.bv_page will point to
> the page directly. So, we should make sure bv_offset is set
> to the correct offset.
>
> 2) When computing xor, the page will be passed to the compute function.
> So, we also need to pass the offset of that page to it, letting it
> compute the correct location of each sh->dev[i].page.
>
> This patch will add a new member of r5pages into stripe_head to manage
> all pages needed by each sh->dev[i]. We also add 'offset' for each r5dev
> so that users can get the related page offset easily, and add helper functions
> to get a page and its index in the r5pages array by disk index.
>
> Signed-off-by: Yufen Yu <yuyufen@huawei.com>
> ---
> drivers/md/raid5.h | 61 ++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 61 insertions(+)
>
> diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
> index 98698569370c..61fe26061c92 100644
> --- a/drivers/md/raid5.h
> +++ b/drivers/md/raid5.h
> @@ -246,6 +246,13 @@ struct stripe_head {
> int target, target2;
> enum sum_check_flags zero_sum_result;
> } ops;
> +
> + /* These pages will be used by bios in dev[i] */
> + struct r5pages {
> + struct page **page;
> + int size; /* page array size */
> + } pages;
> +
struct r5pages seems unnecessary. How about we just use
struct page **pages;
int nr_pages;
> struct r5dev {
> /* rreq and rvec are used for the replacement device when
> * writing data to both devices.
> @@ -253,6 +260,7 @@ struct stripe_head {
> struct bio req, rreq;
> struct bio_vec vec, rvec;
> struct page *page, *orig_page;
> + unsigned int offset; /* offset of this page */
> struct bio *toread, *read, *towrite, *written;
> sector_t sector; /* sector of this page */
> unsigned long flags;
> @@ -754,6 +762,59 @@ r5_next_bio(struct r5conf *conf, struct bio *bio, sector_t sector)
> return NULL;
> }
>
> +/*
> + * Return corresponding page index of r5pages array.
> + */
> +static inline int raid5_get_page_index(struct stripe_head *sh, int disk_idx)
> +{
> + struct r5conf *conf = sh->raid_conf;
> + int cnt;
> +
> + WARN_ON(!sh->pages.page);
> + BUG_ON(conf->stripe_size > PAGE_SIZE);
We have too many of these WARN_ON() and BUG_ON().
> +
> + cnt = PAGE_SIZE / conf->stripe_size;
Maybe add cnt (with different name) to r5conf?
> + return disk_idx / cnt;
> +}
> +
> +/*
> + * Return offset of the corresponding page for r5dev.
> + */
> +static inline int raid5_get_page_offset(struct stripe_head *sh, int disk_idx)
> +{
> + struct r5conf *conf = sh->raid_conf;
> + int cnt;
> +
> + WARN_ON(!sh->pages.page);
> + BUG_ON(conf->stripe_size > PAGE_SIZE);
> +
> + cnt = PAGE_SIZE / conf->stripe_size;
> + return (disk_idx % cnt) * conf->stripe_size;
> +}
> +
> +/*
> + * Return corresponding page address for r5dev.
> + */
> +static inline struct page *
> +raid5_get_dev_page(struct stripe_head *sh, int disk_idx)
> +{
> + int idx;
> +
> + WARN_ON(!sh->pages.page);
> + idx = raid5_get_page_index(sh, disk_idx);
> + return sh->pages.page[idx];
> +}
> +
> +/*
> + * We want to let multiple buffers share one real page for a
> + * stripe_head when PAGE_SIZE is bigger than stripe_size. If
> + * they are equal, no need to use this strategy.
> + */
> +static inline int raid5_stripe_pages_shared(struct r5conf *conf)
> +{
> + return conf->stripe_size < PAGE_SIZE;
> +}
> +
> extern void md_raid5_kick_device(struct r5conf *conf);
> extern int raid5_set_cache_size(struct mddev *mddev, int size);
> extern sector_t raid5_compute_blocknr(struct stripe_head *sh, int i, int previous);
> --
> 2.25.4
>
* Re: [PATCH v5 00/16] md/raid5: set STRIPE_SIZE as a configurable value
2020-07-02 12:06 [PATCH v5 00/16] md/raid5: set STRIPE_SIZE as a configurable value Yufen Yu
` (15 preceding siblings ...)
2020-07-02 12:06 ` [PATCH v5 16/16] raid6test: adaptation with syndrome function Yufen Yu
@ 2020-07-02 23:00 ` Song Liu
2020-07-08 13:14 ` Yufen Yu
16 siblings, 1 reply; 37+ messages in thread
From: Song Liu @ 2020-07-02 23:00 UTC (permalink / raw)
To: Yufen Yu; +Cc: linux-raid, NeilBrown, Guoqing Jiang, Hou Tao
On Thu, Jul 2, 2020 at 5:05 AM Yufen Yu <yuyufen@huawei.com> wrote:
>
> Hi, all
>
> For now, STRIPE_SIZE is equal to the value of PAGE_SIZE. That means, RAID5
> will issue each bio to disk at least 64KB when PAGE_SIZE is 64KB in arm64.
> However, filesystem usually issue bio in the unit of 4KB. Then, RAID5 may
> waste resource of disk bandwidth.
>
> To solve the problem, this patchset try to set stripe_size as a configuare
> value. The default value is 4096. We will add a new sysfs entry and set it
> by writing a new value, likely:
>
> echo 16384 > /sys/block/md1/md/stripe_size
Higher level question: do we need to support stripe sizes that are NOT 4kB
times a power of two? Meaning, do we need to support 12kB, 20kB, 24kB, etc.?
If we only support 4kB, 8kB, 16kB, 32kB, etc., some of the logic can be
simpler.
Thanks,
Song
* Re: [PATCH v5 04/16] md/raid5: add a member of r5pages for struct stripe_head
2020-07-02 22:56 ` Song Liu
@ 2020-07-03 1:22 ` Jason Yan
0 siblings, 0 replies; 37+ messages in thread
From: Jason Yan @ 2020-07-03 1:22 UTC (permalink / raw)
To: Song Liu, Yufen Yu; +Cc: linux-raid, NeilBrown, Guoqing Jiang, Hou Tao
在 2020/7/3 6:56, Song Liu 写道:
> On Thu, Jul 2, 2020 at 5:05 AM Yufen Yu <yuyufen@huawei.com> wrote:
>>
>> Since grow_buffers() uses alloc_page() to allocate the buffers for
>> each stripe_head(), it will allocate 64K buffers but use just
>> 4K of them after setting stripe_size to 4096.
>>
>> To avoid wasting memory, we try to pack multiple 'page's of sh->dev
>> into one real page. That means, multiple sh->dev[i].page will point to
>> the same page with different offsets. Example of 64K PAGE_SIZE and
>> 4K stripe_size as follows:
>>
>> 64K PAGE_SIZE
>> +---+---+---+---+------------------------------+
>> | | | | |
>> | | | | |
>> +-+-+-+-+-+-+-+-+------------------------------+
>> ^ ^ ^ ^
>> | | | +----------------------------+
>> | | | |
>> | | +-------------------+ |
>> | | | |
>> | +----------+ | |
>> | | | |
>> +-+ | | |
>> | | | |
>> +-----+-----+------+-----+------+-----+------+------+
>> sh | offset(0) | offset(4K) | offset(8K) | offset(12K) |
>> + +-----------+------------+------------+-------------+
>> +----> dev[0].page dev[1].page dev[2].page dev[3].page
>>
>> After trying to share one page, the users of sh->dev[i].page need to
>> take care:
>>
>> 1) When issue bio into stripe_head, bi_io_vec.bv_page will point to
>> the page directly. So, we should make sure bv_offset is set
>> to the correct offset.
>>
>> 2) When computing xor, the page will be passed to the compute function.
>> So, we also need to pass the offset of that page to it, letting it
>> compute the correct location of each sh->dev[i].page.
>>
>> This patch will add a new member of r5pages into stripe_head to manage
>> all pages needed by each sh->dev[i]. We also add 'offset' for each r5dev
>> so that users can get the related page offset easily, and add helper functions
>> to get a page and its index in the r5pages array by disk index.
>>
>> Signed-off-by: Yufen Yu <yuyufen@huawei.com>
>> ---
>> drivers/md/raid5.h | 61 ++++++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 61 insertions(+)
>>
[...]
>>
>> +/*
>> + * Return corresponding page index of r5pages array.
>> + */
>> +static inline int raid5_get_page_index(struct stripe_head *sh, int disk_idx)
>> +{
>> + struct r5conf *conf = sh->raid_conf;
>> + int cnt;
>> +
>> + WARN_ON(!sh->pages.page);
>> + BUG_ON(conf->stripe_size > PAGE_SIZE);
>
> We have too many of these WARN_ON() and BUG_ON().
>
Yes. Yufen, please avoid using BUG_ON() because we are reducing the
usage of it and Linus hates it:
https://lkml.org/lkml/2013/5/17/254
https://linuxinsider.com/story/torvalds-blows-stack-over-buggy-new-kernel-83975.html
https://lore.kernel.org/patchwork/patch/568291/
Thanks,
Jason
* Re: [PATCH v5 12/16] md/raid5: support config stripe_size by sysfs entry
2020-07-02 22:38 ` Song Liu
@ 2020-07-04 12:25 ` Yufen Yu
0 siblings, 0 replies; 37+ messages in thread
From: Yufen Yu @ 2020-07-04 12:25 UTC (permalink / raw)
To: Song Liu; +Cc: linux-raid, NeilBrown, Guoqing Jiang, Hou Tao
On 2020/7/3 6:38, Song Liu wrote:
> On Thu, Jul 2, 2020 at 5:05 AM Yufen Yu <yuyufen@huawei.com> wrote:
>>
>> After this patch, we can adjust stripe_size by writing a value into the sysfs
>> entry, e.g., setting stripe_size to 16KB:
>>
>> echo 16384 > /sys/block/md1/md/stripe_size
>>
>> Show current stripe_size value:
>>
>> cat /sys/block/md1/md/stripe_size
>>
>> stripe_size should not be bigger than PAGE_SIZE, and it must be a
>> multiple of 4096.
>
> I think we can just merge 02/16 into this one.
>
>>
>> Signed-off-by: Yufen Yu <yuyufen@huawei.com>
>> ---
>> drivers/md/raid5.c | 69 +++++++++++++++++++++++++++++++++++++++++++++-
>> 1 file changed, 68 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
>> index f0fd01d9122e..a3376a4e4e5c 100644
>> --- a/drivers/md/raid5.c
>> +++ b/drivers/md/raid5.c
>> @@ -6715,7 +6715,74 @@ raid5_show_stripe_size(struct mddev *mddev, char *page)
>> static ssize_t
>> raid5_store_stripe_size(struct mddev *mddev, const char *page, size_t len)
>> {
>> - return -EINVAL;
>> + struct r5conf *conf = mddev->private;
>
> We need mddev_lock(mddev) before accessing mddev->private.
>
Thanks for pointing out this bug. I will fix it.
Thanks,
Yufen
* Re: [PATCH v5 01/16] md/raid456: covert macro define of STRIPE_* as members of struct r5conf
2020-07-02 12:06 ` [PATCH v5 01/16] md/raid456: covert macro define of STRIPE_* as members of struct r5conf Yufen Yu
` (2 preceding siblings ...)
2020-07-02 18:15 ` Song Liu
@ 2020-07-06 9:09 ` Guoqing Jiang
2020-07-06 11:34 ` Guoqing Jiang
2020-07-08 2:22 ` Yufen Yu
3 siblings, 2 replies; 37+ messages in thread
From: Guoqing Jiang @ 2020-07-06 9:09 UTC (permalink / raw)
To: Yufen Yu, song; +Cc: linux-raid, neilb, houtao1
On 7/2/20 2:06 PM, Yufen Yu wrote:
> @@ -2509,10 +2532,11 @@ static void raid5_end_read_request(struct bio * bi)
> */
> pr_info_ratelimited(
> "md/raid:%s: read error corrected (%lu sectors at %llu on %s)\n",
> - mdname(conf->mddev), STRIPE_SECTORS,
> + mdname(conf->mddev),
> + conf->stripe_sectors,
The conf->stripe_sectors is printed with %lu format.
> ot allow a suitable chunk size */
> return ERR_PTR(-EINVAL);
>
> diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
> index f90e0704bed9..e36cf71e8465 100644
> --- a/drivers/md/raid5.h
> +++ b/drivers/md/raid5.h
> @@ -472,32 +472,12 @@ struct disk_info {
> */
>
> #define NR_STRIPES 256
> -#define STRIPE_SIZE PAGE_SIZE
> -#define STRIPE_SHIFT (PAGE_SHIFT - 9)
> -#define STRIPE_SECTORS (STRIPE_SIZE>>9)
> #define IO_THRESHOLD 1
> #define BYPASS_THRESHOLD 1
> #define NR_HASH (PAGE_SIZE / sizeof(struct hlist_head))
> #define HASH_MASK (NR_HASH - 1)
> #define MAX_STRIPE_BATCH 8
>
[...]
> return NULL;
> -}
> -
> /* NOTE NR_STRIPE_HASH_LOCKS must remain below 64.
> * This is because we sometimes take all the spinlocks
> * and creating that much locking depth can cause
> @@ -574,6 +554,9 @@ struct r5conf {
> int raid_disks;
> int max_nr_stripes;
> int min_nr_stripes;
> + unsigned int stripe_size;
> + unsigned int stripe_shift;
> + unsigned int stripe_sectors;
So you need to define it with "unsigned long".
Also, I am wondering if it is cleaner with something like below.
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 8dea4398b191..984eea97e77c 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -6918,6 +6918,7 @@ static struct r5conf *setup_conf(struct mddev *mddev)
conf = kzalloc(sizeof(struct r5conf), GFP_KERNEL);
if (conf == NULL)
goto abort;
+ r5conf_set_size(conf, PAGE_SIZE);
INIT_LIST_HEAD(&conf->free_list);
INIT_LIST_HEAD(&conf->pending_list);
conf->pending_data = kcalloc(PENDING_IO_MAX,
diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
index f90e0704bed9..04fc4c514d54 100644
--- a/drivers/md/raid5.h
+++ b/drivers/md/raid5.h
@@ -471,10 +471,13 @@ struct disk_info {
* Stripe cache
*/
+static unsigned long stripe_size = PAGE_SIZE;
+static unsigned long stripe_shift = PAGE_SHIFT - 9;
+static unsigned long stripe_sectors = PAGE_SIZE>>9;
#define NR_STRIPES 256
-#define STRIPE_SIZE PAGE_SIZE
-#define STRIPE_SHIFT (PAGE_SHIFT - 9)
-#define STRIPE_SECTORS (STRIPE_SIZE>>9)
+#define STRIPE_SIZE stripe_size
+#define STRIPE_SHIFT stripe_shift
+#define STRIPE_SECTORS stripe_sectors
#define IO_THRESHOLD 1
#define BYPASS_THRESHOLD 1
#define NR_HASH (PAGE_SIZE / sizeof(struct
hlist_head))
@@ -574,6 +577,9 @@ struct r5conf {
int raid_disks;
int max_nr_stripes;
int min_nr_stripes;
+ unsigned long stripe_size;
+ unsigned long stripe_shift;
+ unsigned long stripe_sectors;
/* reshape_progress is the leading edge of a 'reshape'
* It has value MaxSector when no reshape is happening
@@ -690,6 +696,12 @@ struct r5conf {
struct r5pending_data *next_pending_data;
};
+static inline void r5conf_set_size(struct r5conf *conf, unsigned long size)
+{
+ stripe_size = conf->stripe_size = size;
+ stripe_shift = conf->stripe_shift = ilog2(size) - 9;
+ stripe_sectors = conf->stripe_sectors = size >> 9;
+}
/*
* Our supported algorithms
Thanks,
Guoqing
* Re: [PATCH v5 01/16] md/raid456: covert macro define of STRIPE_* as members of struct r5conf
2020-07-06 9:09 ` Guoqing Jiang
@ 2020-07-06 11:34 ` Guoqing Jiang
2020-07-08 2:22 ` Yufen Yu
1 sibling, 0 replies; 37+ messages in thread
From: Guoqing Jiang @ 2020-07-06 11:34 UTC (permalink / raw)
To: Yufen Yu, song; +Cc: linux-raid, neilb, houtao1
On 7/6/20 11:09 AM, Guoqing Jiang wrote:
> On 7/2/20 2:06 PM, Yufen Yu wrote:
>> @@ -2509,10 +2532,11 @@ static void raid5_end_read_request(struct bio
>> * bi)
>> */
>> pr_info_ratelimited(
>> "md/raid:%s: read error corrected (%lu sectors at
>> %llu on %s)\n",
>> - mdname(conf->mddev), STRIPE_SECTORS,
>> + mdname(conf->mddev),
>> + conf->stripe_sectors,
>
> The conf->stripe_sectors is printed with %lu format.
>
>> ot allow a suitable chunk size */
>> return ERR_PTR(-EINVAL);
>> diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
>> index f90e0704bed9..e36cf71e8465 100644
>> --- a/drivers/md/raid5.h
>> +++ b/drivers/md/raid5.h
>> @@ -472,32 +472,12 @@ struct disk_info {
>> */
>> #define NR_STRIPES 256
>> -#define STRIPE_SIZE PAGE_SIZE
>> -#define STRIPE_SHIFT (PAGE_SHIFT - 9)
>> -#define STRIPE_SECTORS (STRIPE_SIZE>>9)
>> #define IO_THRESHOLD 1
>> #define BYPASS_THRESHOLD 1
>> #define NR_HASH (PAGE_SIZE / sizeof(struct hlist_head))
>> #define HASH_MASK (NR_HASH - 1)
>> #define MAX_STRIPE_BATCH 8
>
> [...]
>
>> return NULL;
>> -}
>> -
>> /* NOTE NR_STRIPE_HASH_LOCKS must remain below 64.
>> * This is because we sometimes take all the spinlocks
>> * and creating that much locking depth can cause
>> @@ -574,6 +554,9 @@ struct r5conf {
>> int raid_disks;
>> int max_nr_stripes;
>> int min_nr_stripes;
>> + unsigned int stripe_size;
>> + unsigned int stripe_shift;
>> + unsigned int stripe_sectors;
>
> So you need to define it with "unsigned long".
>
> Also, I am wondering if it is cleaner with something like below.
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 8dea4398b191..984eea97e77c 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -6918,6 +6918,7 @@ static struct r5conf *setup_conf(struct mddev *mddev)
> conf = kzalloc(sizeof(struct r5conf), GFP_KERNEL);
> if (conf == NULL)
> goto abort;
> + r5conf_set_size(conf, PAGE_SIZE);
> INIT_LIST_HEAD(&conf->free_list);
> INIT_LIST_HEAD(&conf->pending_list);
> conf->pending_data = kcalloc(PENDING_IO_MAX,
> diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
> index f90e0704bed9..04fc4c514d54 100644
> --- a/drivers/md/raid5.h
> +++ b/drivers/md/raid5.h
> @@ -471,10 +471,13 @@ struct disk_info {
> * Stripe cache
> */
>
> +static unsigned long stripe_size = PAGE_SIZE;
> +static unsigned long stripe_shift = PAGE_SHIFT - 9;
> +static unsigned long stripe_sectors = PAGE_SIZE>>9;
> #define NR_STRIPES 256
> -#define STRIPE_SIZE PAGE_SIZE
> -#define STRIPE_SHIFT (PAGE_SHIFT - 9)
> -#define STRIPE_SECTORS (STRIPE_SIZE>>9)
> +#define STRIPE_SIZE stripe_size
> +#define STRIPE_SHIFT stripe_shift
> +#define STRIPE_SECTORS stripe_sectors
> #define IO_THRESHOLD 1
> #define BYPASS_THRESHOLD 1
> #define NR_HASH (PAGE_SIZE / sizeof(struct hlist_head))
> @@ -574,6 +577,9 @@ struct r5conf {
> int raid_disks;
> int max_nr_stripes;
> int min_nr_stripes;
> + unsigned long stripe_size;
> + unsigned long stripe_shift;
> + unsigned long stripe_sectors;
>
> /* reshape_progress is the leading edge of a 'reshape'
> * It has value MaxSector when no reshape is happening
> @@ -690,6 +696,12 @@ struct r5conf {
> struct r5pending_data *next_pending_data;
> };
>
> +static inline void r5conf_set_size(struct r5conf *conf, unsigned long size)
> +{
> + stripe_size = conf->stripe_size = size;
> + stripe_shift = conf->stripe_shift = ilog2(size) - 9;
> + stripe_sectors = conf->stripe_sectors = size >> 9;
> +}
>
> /*
> * Our supported algorithms
Hmm, I guess this approach can't support multiple raid5 arrays with different stripe sizes.
Thanks,
Guoqing
* Re: [PATCH v5 01/16] md/raid456: covert macro define of STRIPE_* as members of struct r5conf
2020-07-06 9:09 ` Guoqing Jiang
2020-07-06 11:34 ` Guoqing Jiang
@ 2020-07-08 2:22 ` Yufen Yu
1 sibling, 0 replies; 37+ messages in thread
From: Yufen Yu @ 2020-07-08 2:22 UTC (permalink / raw)
To: Guoqing Jiang, song; +Cc: linux-raid, neilb, houtao1
On 2020/7/6 17:09, Guoqing Jiang wrote:
> On 7/2/20 2:06 PM, Yufen Yu wrote:
>> @@ -2509,10 +2532,11 @@ static void raid5_end_read_request(struct bio * bi)
>> */
>> pr_info_ratelimited(
>> "md/raid:%s: read error corrected (%lu sectors at %llu on %s)\n",
>> - mdname(conf->mddev), STRIPE_SECTORS,
>> + mdname(conf->mddev),
>> + conf->stripe_sectors,
>
> The conf->stripe_sectors is printed with %lu format.
>
>> ot allow a suitable chunk size */
>> return ERR_PTR(-EINVAL);
>> diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
>> index f90e0704bed9..e36cf71e8465 100644
>> --- a/drivers/md/raid5.h
>> +++ b/drivers/md/raid5.h
>> @@ -472,32 +472,12 @@ struct disk_info {
>> */
>> #define NR_STRIPES 256
>> -#define STRIPE_SIZE PAGE_SIZE
>> -#define STRIPE_SHIFT (PAGE_SHIFT - 9)
>> -#define STRIPE_SECTORS (STRIPE_SIZE>>9)
>> #define IO_THRESHOLD 1
>> #define BYPASS_THRESHOLD 1
>> #define NR_HASH (PAGE_SIZE / sizeof(struct hlist_head))
>> #define HASH_MASK (NR_HASH - 1)
>> #define MAX_STRIPE_BATCH 8
>
> [...]
>
>> return NULL;
>> -}
>> -
>> /* NOTE NR_STRIPE_HASH_LOCKS must remain below 64.
>> * This is because we sometimes take all the spinlocks
>> * and creating that much locking depth can cause
>> @@ -574,6 +554,9 @@ struct r5conf {
>> int raid_disks;
>> int max_nr_stripes;
>> int min_nr_stripes;
>> + unsigned int stripe_size;
>> + unsigned int stripe_shift;
>> + unsigned int stripe_sectors;
>
> So you need to define it with "unsigned long".
Yes, I will revise it.
Thanks,
Yufen
* Re: [PATCH v5 00/16] md/raid5: set STRIPE_SIZE as a configurable value
2020-07-02 23:00 ` [PATCH v5 00/16] md/raid5: set STRIPE_SIZE as a configurable value Song Liu
@ 2020-07-08 13:14 ` Yufen Yu
2020-07-08 23:55 ` Song Liu
0 siblings, 1 reply; 37+ messages in thread
From: Yufen Yu @ 2020-07-08 13:14 UTC (permalink / raw)
To: Song Liu; +Cc: linux-raid, NeilBrown, Guoqing Jiang, Hou Tao
On 2020/7/3 7:00, Song Liu wrote:
> On Thu, Jul 2, 2020 at 5:05 AM Yufen Yu <yuyufen@huawei.com> wrote:
>>
>> Hi, all
>>
>> For now, STRIPE_SIZE is equal to the value of PAGE_SIZE. That means, RAID5
>> will issue each bio to disk at least 64KB when PAGE_SIZE is 64KB in arm64.
>> However, filesystem usually issue bio in the unit of 4KB. Then, RAID5 may
>> waste resource of disk bandwidth.
>>
>> To solve the problem, this patchset tries to set stripe_size as a configurable
>> value. The default value is 4096. We will add a new sysfs entry and set it
>> by writing a new value, likely:
>>
>> echo 16384 > /sys/block/md1/md/stripe_size
>
> Higher level question: do we need to support page size that is NOT 4kB
> times power
> of 2? Meaning, do we need to support 12kB, 20kB, 24kB, etc. If we only
> supports, 4kB,
> 8kB, 16kB, 32kB, etc. some of the logic can be simpler.
Yeah, I think supporting just 4kb, 8kb, 16kb, 32kb... is enough.
But sorry, I don't see which logic can be made simpler in the current implementation.
I mean it still needs to allocate pages and record the page offset.
Thanks,
Yufen
* Re: [PATCH v5 00/16] md/raid5: set STRIPE_SIZE as a configurable value
2020-07-08 13:14 ` Yufen Yu
@ 2020-07-08 23:55 ` Song Liu
2020-07-09 13:27 ` Yufen Yu
0 siblings, 1 reply; 37+ messages in thread
From: Song Liu @ 2020-07-08 23:55 UTC (permalink / raw)
To: Yufen Yu; +Cc: linux-raid, NeilBrown, Guoqing Jiang, Hou Tao
On Wed, Jul 8, 2020 at 6:15 AM Yufen Yu <yuyufen@huawei.com> wrote:
>
>
>
> On 2020/7/3 7:00, Song Liu wrote:
> > On Thu, Jul 2, 2020 at 5:05 AM Yufen Yu <yuyufen@huawei.com> wrote:
> >>
> >> Hi, all
> >>
> >> For now, STRIPE_SIZE is equal to the value of PAGE_SIZE. That means, RAID5
> >> will issue each bio to disk at least 64KB when PAGE_SIZE is 64KB in arm64.
> >> However, filesystem usually issue bio in the unit of 4KB. Then, RAID5 may
> >> waste resource of disk bandwidth.
> >>
> >> To solve the problem, this patchset tries to set stripe_size as a configurable
> >> value. The default value is 4096. We will add a new sysfs entry and set it
> >> by writing a new value, likely:
> >>
> >> echo 16384 > /sys/block/md1/md/stripe_size
> >
> > Higher level question: do we need to support page size that is NOT 4kB
> > times power
> > of 2? Meaning, do we need to support 12kB, 20kB, 24kB, etc. If we only
> > supports, 4kB,
> > 8kB, 16kB, 32kB, etc. some of the logic can be simpler.
>
> Yeah, I think supporting just 4kb, 8kb, 16kb, 32kb... is enough.
> But sorry, I don't see which logic can be made simpler in the current implementation.
> I mean it still needs to allocate pages and record the page offset.
I was thinking about replacing multiplication/division with bit
operations (shift left/right).
But I am not very sure how much that matters in modern ARM CPUs. Would you mind
running some benchmarks with this?
Thanks,
Song
* Re: [PATCH v5 00/16] md/raid5: set STRIPE_SIZE as a configurable value
2020-07-08 23:55 ` Song Liu
@ 2020-07-09 13:27 ` Yufen Yu
2020-07-10 16:09 ` Song Liu
0 siblings, 1 reply; 37+ messages in thread
From: Yufen Yu @ 2020-07-09 13:27 UTC (permalink / raw)
To: Song Liu; +Cc: linux-raid, NeilBrown, Guoqing Jiang, Hou Tao
On 2020/7/9 7:55, Song Liu wrote:
> On Wed, Jul 8, 2020 at 6:15 AM Yufen Yu <yuyufen@huawei.com> wrote:
>>
>>
>>
>> On 2020/7/3 7:00, Song Liu wrote:
>>> On Thu, Jul 2, 2020 at 5:05 AM Yufen Yu <yuyufen@huawei.com> wrote:
>>>>
>>>> Hi, all
>>>>
>>>> For now, STRIPE_SIZE is equal to the value of PAGE_SIZE. That means, RAID5
>>>> will issue each bio to disk at least 64KB when PAGE_SIZE is 64KB in arm64.
>>>> However, filesystem usually issue bio in the unit of 4KB. Then, RAID5 may
>>>> waste resource of disk bandwidth.
>>>>
> >>>> To solve the problem, this patchset tries to set stripe_size as a configurable
>>>> value. The default value is 4096. We will add a new sysfs entry and set it
>>>> by writing a new value, likely:
>>>>
>>>> echo 16384 > /sys/block/md1/md/stripe_size
>>>
>>> Higher level question: do we need to support page size that is NOT 4kB
>>> times power
>>> of 2? Meaning, do we need to support 12kB, 20kB, 24kB, etc. If we only
>>> supports, 4kB,
>>> 8kB, 16kB, 32kB, etc. some of the logic can be simpler.
>>
>> Yeah, I think supporting just 4kb, 8kb, 16kb, 32kb... is enough.
>> But sorry, I don't see which logic can be made simpler in the current implementation.
>> I mean it still needs to allocate pages and record the page offset.
>
> I was thinking about replacing multiplication/division with bit
> operations (shift left/right).
> But I am not very sure how much that matters in modern ARM CPUs. Would you mind
> running some benchmarks with this?
To compare multiplication/division with bit operations, I wrote a simple test case:

$ cat normal.c

#include <stdlib.h>

int page_size = 65536;   //64KB
int stripe_size = 32768; //32KB

int main(int argc, char *argv[])
{
	int i, j, count;
	int page, offset;

	if (argc != 2)
		return -1;

	count = atol(argv[1]);

	for (i = 0; i < count; i++) {
		for (j = 0; j < 4; j++) {
			page = page_size / stripe_size;
			offset = j * stripe_size;
		}
	}
	return 0;
}
$ cat shift.c

#include <stdlib.h>

int page_shift = 16;   //64KB
int stripe_shift = 15; //32KB

int main(int argc, char *argv[])
{
	int i, j, count;
	int page, offset;

	if (argc != 2)
		return -1;

	count = atol(argv[1]);

	for (i = 0; i < count; i++) {
		for (j = 0; j < 4; j++) {
			page = 1 << (page_shift - stripe_shift);
			offset = j << stripe_shift;
		}
	}
	return 0;
}
Running them on an arm64 server, the results show only a minor
performance gap between multiplication/division and shift operations:
[root@localhost shift]# time ./normal 104857600
real 0m1.199s
user 0m1.198s
sys 0m0.000s
[root@localhost shift]# time ./shift 104857600
real 0m1.166s
user 0m1.166s
sys 0m0.000s
In our implementation, the page address and page offset are computed only
once, when the stripe buffers are allocated. After that, we just use the
values recorded in sh->dev[i].page and sh->dev[i].offset. So I think the
current implementation does not add much overhead.
* Re: [PATCH v5 00/16] md/raid5: set STRIPE_SIZE as a configurable value
2020-07-09 13:27 ` Yufen Yu
@ 2020-07-10 16:09 ` Song Liu
0 siblings, 0 replies; 37+ messages in thread
From: Song Liu @ 2020-07-10 16:09 UTC (permalink / raw)
To: Yufen Yu; +Cc: linux-raid, NeilBrown, Guoqing Jiang, Hou Tao
On Thu, Jul 9, 2020 at 6:27 AM Yufen Yu <yuyufen@huawei.com> wrote:
>
>
>
> On 2020/7/9 7:55, Song Liu wrote:
> > On Wed, Jul 8, 2020 at 6:15 AM Yufen Yu <yuyufen@huawei.com> wrote:
> >>
> >>
> >>
> >> On 2020/7/3 7:00, Song Liu wrote:
> >>> On Thu, Jul 2, 2020 at 5:05 AM Yufen Yu <yuyufen@huawei.com> wrote:
> >>>>
> >>>> Hi, all
> >>>>
> >>>> For now, STRIPE_SIZE is equal to the value of PAGE_SIZE. That means, RAID5
> >>>> will issue each bio to disk at least 64KB when PAGE_SIZE is 64KB in arm64.
> >>>> However, filesystem usually issue bio in the unit of 4KB. Then, RAID5 may
> >>>> waste resource of disk bandwidth.
> >>>>
> >>>> To solve the problem, this patchset tries to set stripe_size as a configurable
> >>>> value. The default value is 4096. We will add a new sysfs entry and set it
> >>>> by writing a new value, likely:
> >>>>
> >>>> echo 16384 > /sys/block/md1/md/stripe_size
> >>>
> >>> Higher level question: do we need to support page size that is NOT 4kB
> >>> times power
> >>> of 2? Meaning, do we need to support 12kB, 20kB, 24kB, etc. If we only
> >>> supports, 4kB,
> >>> 8kB, 16kB, 32kB, etc. some of the logic can be simpler.
> >>
> >> Yeah, I think supporting just 4kb, 8kb, 16kb, 32kb... is enough.
> >> But sorry, I don't see which logic can be made simpler in the current implementation.
> >> I mean it still needs to allocate pages and record the page offset.
> >
> > I was thinking about replacing multiplication/division with bit
> > operations (shift left/right).
> > But I am not very sure how much that matters in modern ARM CPUs. Would you mind
> > running some benchmarks with this?
>
> To test multiplication/division and bit operation, I write a simple test case:
>
> $ cat normal.c
>
> int page_size = 65536;
> int stripe_size = 32768; //32KB
>
> int main(int argc, char *argv[])
> {
> int i, j, count;
> int page, offset;
>
> if (argc != 2)
> return -1;
>
> count = atol(argv[1]);
>
> for (i = 0; i < count; i++) {
> for (j = 0; j < 4; j++) {
> page = page_size / stripe_size;
> offset = j * stripe_size;
> }
> }
> }
>
> $ cat shift.c
>
> int page_shift = 16; //64KB
> int stripe_shift = 15; //32KB
>
> int main(int argc, char *argv[])
> {
> int i, j, count;
> int page, offset;
>
> if (argc != 2)
> return -1;
>
> count = atol(argv[1]);
>
> for (i = 0; i < count; i++) {
> for (j = 0; j < 4; j++) {
> page = 1 << (page_shift - stripe_shift);
> offset = j << stripe_shift;
> }
> }
> }
>
> Test them on a arm64 server, the result show there is a minor
> performance gap between multiplication/division and shift operation.
>
> [root@localhost shift]# time ./normal 104857600
>
> real 0m1.199s
> user 0m1.198s
> sys 0m0.000s
>
> [root@localhost shift]# time ./shift 104857600
>
> real 0m1.166s
> user 0m1.166s
> sys 0m0.000s
>
> For our implementation, page address and page offset are just computed
> when allocate stripe_size. After that, we just use the recorded value
> in sh->dev[i].page and sh->dev[i].offset. So, I think current implementation
> may not cause much overhead.
Sounds good. Let's keep this part as-is.
Thanks,
Song
* Re: [PATCH v5 01/16] md/raid456: covert macro define of STRIPE_* as members of struct r5conf
2020-07-02 18:23 ` Paul Menzel
@ 2020-07-12 22:55 ` antlists
0 siblings, 0 replies; 37+ messages in thread
From: antlists @ 2020-07-12 22:55 UTC (permalink / raw)
To: Paul Menzel, Song Liu, Yufen Yu
Cc: linux-raid, NeilBrown, Guoqing Jiang, Hou Tao
On 02/07/2020 19:23, Paul Menzel wrote:
> Dear Yufen,
>
>
> Am 02.07.20 um 20:15 schrieb Song Liu:
>> On Thu, Jul 2, 2020 at 5:05 AM Yufen Yu <yuyufen@huawei.com> wrote:
>>>
>>> We covert STRIPE_SIZE, STRIPE_SHIFT and STRIPE_SECTORS to stripe_size,
>>> stripe_shift and stripe_sectors as members of struct r5conf. Then each
>>> raid456 array can config different stripe_size. This patch is prepared
>>> for following configurable stripe_size.
>>>
>>> Simply replace word STRIPE_ with conf->stripe_ and add 'conf' argument
>>> for function stripe_hash_locks_hash() and r5_next_bio() to get
>>> stripe_size.
>>> After that, we initialize stripe_size into setup_conf().
>>>
>>> Signed-off-by: Yufen Yu <yuyufen@huawei.com>
>>
>> This patch looks good. Please fix the warning found by the kernel test
>> bot.
>> Also a nitpick below.
>
> Please also fix the typo *covert* to *Convert* in the commit message
> summary, and maybe use *to* instead of *as*.
Actually, if I understand the Pidgin correctly, "as" is correct grammar.
"to" just doesn't work for me ...
>
> > Convert macro define of STRIPE_* to members of struct r5conf
>
Cheers,
Wol
end of thread, other threads:[~2020-07-12 22:55 UTC | newest]
Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-02 12:06 [PATCH v5 00/16] md/raid5: set STRIPE_SIZE as a configurable value Yufen Yu
2020-07-02 12:06 ` [PATCH v5 01/16] md/raid456: covert macro define of STRIPE_* as members of struct r5conf Yufen Yu
2020-07-02 14:51 ` kernel test robot
2020-07-02 14:51 ` kernel test robot
2020-07-02 15:44 ` kernel test robot
2020-07-02 15:44 ` kernel test robot
2020-07-02 18:15 ` Song Liu
2020-07-02 18:23 ` Paul Menzel
2020-07-12 22:55 ` antlists
2020-07-06 9:09 ` Guoqing Jiang
2020-07-06 11:34 ` Guoqing Jiang
2020-07-08 2:22 ` Yufen Yu
2020-07-02 12:06 ` [PATCH v5 02/16] md/raid5: add sysfs entry to set and show stripe_size Yufen Yu
2020-07-02 22:14 ` Song Liu
2020-07-02 12:06 ` [PATCH v5 03/16] md/raid5: set default stripe_size as 4096 Yufen Yu
2020-07-02 12:06 ` [PATCH v5 04/16] md/raid5: add a member of r5pages for struct stripe_head Yufen Yu
2020-07-02 22:56 ` Song Liu
2020-07-03 1:22 ` Jason Yan
2020-07-02 12:06 ` [PATCH v5 05/16] md/raid5: allocate and free shared pages of r5pages Yufen Yu
2020-07-02 12:06 ` [PATCH v5 06/16] md/raid5: set correct page offset for bi_io_vec in ops_run_io() Yufen Yu
2020-07-02 12:06 ` [PATCH v5 07/16] md/raid5: set correct page offset for async_copy_data() Yufen Yu
2020-07-02 12:06 ` [PATCH v5 08/16] md/raid5: resize stripes and set correct offset when reshape array Yufen Yu
2020-07-02 12:06 ` [PATCH v5 09/16] md/raid5: add new xor function to support different page offset Yufen Yu
2020-07-02 12:06 ` [PATCH v5 10/16] md/raid5: add offset array in scribble buffer Yufen Yu
2020-07-02 12:06 ` [PATCH v5 11/16] md/raid5: compute xor with correct page offset Yufen Yu
2020-07-02 12:06 ` [PATCH v5 12/16] md/raid5: support config stripe_size by sysfs entry Yufen Yu
2020-07-02 22:38 ` Song Liu
2020-07-04 12:25 ` Yufen Yu
2020-07-02 12:06 ` [PATCH v5 13/16] md/raid6: let syndrome computor support different page offset Yufen Yu
2020-07-02 12:06 ` [PATCH v5 14/16] md/raid6: let async recovery function " Yufen Yu
2020-07-02 12:06 ` [PATCH v5 15/16] md/raid6: compute syndrome with correct " Yufen Yu
2020-07-02 12:06 ` [PATCH v5 16/16] raid6test: adaptation with syndrome function Yufen Yu
2020-07-02 23:00 ` [PATCH v5 00/16] md/raid5: set STRIPE_SIZE as a configurable value Song Liu
2020-07-08 13:14 ` Yufen Yu
2020-07-08 23:55 ` Song Liu
2020-07-09 13:27 ` Yufen Yu
2020-07-10 16:09 ` Song Liu