* RDMA (smbdirect) testing
@ 2022-05-19 20:41 Steve French
  2022-05-19 23:06 ` Namjae Jeon
  ` (3 more replies)
  0 siblings, 4 replies; 42+ messages in thread

From: Steve French @ 2022-05-19 20:41 UTC (permalink / raw)
  To: Namjae Jeon, Hyeoncheol Lee; +Cc: CIFS, David Howells, Long Li

Namjae and Hyeoncheol,

Have you had any luck setting up virtual RDMA devices for testing
ksmbd RDMA? (similar to "/usr/sbin/rdma link add siw0 type siw netdev
eth0" then mount with the "rdma" mount option)

-- 
Thanks,

Steve

^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: RDMA (smbdirect) testing
  2022-05-19 20:41 RDMA (smbdirect) testing Steve French
@ 2022-05-19 23:06 ` Namjae Jeon
  2022-05-20  6:01   ` Hyunchul Lee
  2022-05-20  6:20 ` David Howells
  ` (2 subsequent siblings)
  3 siblings, 1 reply; 42+ messages in thread

From: Namjae Jeon @ 2022-05-19 23:06 UTC (permalink / raw)
  To: Steve French; +Cc: Hyeoncheol Lee, CIFS, David Howells, Long Li

2022-05-20 5:41 GMT+09:00, Steve French <smfrench@gmail.com>:
Hi Steve,
> Namjae and Hyeoncheol,
> Have you had any luck setting up virtual RDMA devices for testing ksmbd
> RDMA?
You seem to be asking about soft-ROCE (or soft-iWARP). Hyunchul had been
testing RDMA of ksmbd with it before.
Hyunchul, please explain how to set it up.

Thanks!
>
> (similar to "/usr/sbin/rdma link add siw0 type siw netdev eth0" then
> mount with "rdma" mount option)
>
> --
> Thanks,
>
> Steve

^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: RDMA (smbdirect) testing
  2022-05-19 23:06 ` Namjae Jeon
@ 2022-05-20  6:01   ` Hyunchul Lee
  2022-05-20 18:03     ` Tom Talpey
  2022-05-20 18:12     ` David Howells
  0 siblings, 2 replies; 42+ messages in thread

From: Hyunchul Lee @ 2022-05-20 6:01 UTC (permalink / raw)
  To: Steve French; +Cc: CIFS, David Howells, Long Li, Namjae Jeon

Hello Steve,

Please refer to the page below to configure soft-ROCE:
https://support.mellanox.com/s/article/howto-configure-soft-roce

These kernel configs have to be turned on:
CONFIG_CIFS_SMB_DIRECT
CONFIG_SMB_SERVER
CONFIG_SMB_SERVER_SMBDIRECT

And you can mount cifs with SMB-direct:
mount -t cifs -o rdma,... //<IP address of ethernet interface coupled
with soft-ROCE>/<share> ...

On Fri, May 20, 2022 at 8:06 AM, Namjae Jeon <linkinjeon@kernel.org> wrote:
>
> 2022-05-20 5:41 GMT+09:00, Steve French <smfrench@gmail.com>:
> Hi Steve,
> > Namjae and Hyeoncheol,
> > Have you had any luck setting up virtual RDMA devices for testing ksmbd
> > RDMA?
> You seem to be asking about soft-ROCE(or soft-iWARP). Hyunchul had
> been testing RDMA
> of ksmbd with it before.
> Hyunchul, please explain how to set-up it.
>
> Thanks!
> >
> > (similar to "/usr/sbin/rdma link add siw0 type siw netdev eth0" then
> > mount with "rdma" mount option)
> >
> > --
> > Thanks,
> >
> > Steve
> >

-- 
Thanks,
Hyunchul

^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: RDMA (smbdirect) testing
  2022-05-20  6:01 ` Hyunchul Lee
@ 2022-05-20 18:03   ` Tom Talpey
  2022-05-20 18:12   ` David Howells
  1 sibling, 0 replies; 42+ messages in thread

From: Tom Talpey @ 2022-05-20 18:03 UTC (permalink / raw)
  To: Hyunchul Lee, Steve French; +Cc: CIFS, David Howells, Long Li, Namjae Jeon

SoftROCE is a bit of a hot mess in upstream right now. It's
getting a lot of attention, but it's still pretty shaky.
If you're testing, I'd STRONGLY recommend SoftiWARP.

Tom.

On 5/20/2022 2:01 AM, Hyunchul Lee wrote:
> Hello Steve,
>
> Please refer to the page below to configure soft-ROCE:
> https://support.mellanox.com/s/article/howto-configure-soft-roce
>
> These kernel configs have to be turned on:
> CONFIG_CIFS_SMB_DIRECT
> CONFIG_SMB_SERVER
> CONFIG_SMB_SERVER_SMBDIRECT
>
> And you can mount cifs with SMB-direct:
> mount -t cifs -o rdma,... //<IP address of ethernet interface coupled
> with soft-ROCE>/<share> ...
>
> On Fri, May 20, 2022 at 8:06 AM, Namjae Jeon <linkinjeon@kernel.org> wrote:
>>
>> 2022-05-20 5:41 GMT+09:00, Steve French <smfrench@gmail.com>:
>> Hi Steve,
>>> Namjae and Hyeoncheol,
>>> Have you had any luck setting up virtual RDMA devices for testing ksmbd
>>> RDMA?
>> You seem to be asking about soft-ROCE(or soft-iWARP). Hyunchul had
>> been testing RDMA
>> of ksmbd with it before.
>> Hyunchul, please explain how to set-up it.
>>
>> Thanks!
>>>
>>> (similar to "/usr/sbin/rdma link add siw0 type siw netdev eth0" then
>>> mount with "rdma" mount option)
>>>
>>>
>>> --
>>> Thanks,
>>>
>>> Steve
>>>

^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: RDMA (smbdirect) testing
  2022-05-20  6:01 ` Hyunchul Lee
  2022-05-20 18:03   ` Tom Talpey
@ 2022-05-20 18:12   ` David Howells
  2022-05-21 11:54     ` Tom Talpey
  1 sibling, 1 reply; 42+ messages in thread

From: David Howells @ 2022-05-20 18:12 UTC (permalink / raw)
  To: Tom Talpey
  Cc: dhowells, Hyunchul Lee, Steve French, CIFS, Long Li, Namjae Jeon

Tom Talpey <tom@talpey.com> wrote:

> SoftROCE is a bit of a hot mess in upstream right now. It's
> getting a lot of attention, but it's still pretty shaky.
> If you're testing, I'd STRONGLY recommend SoftiWARP.

I'm having problems getting that working. I'm setting the client up with:

rdma link add siw0 type siw netdev enp6s0
mount //192.168.6.1/scratch /xfstest.scratch -o rdma,user=shares,pass=...

and then see:

CIFS: Attempting to mount \\192.168.6.1\scratch
CIFS: VFS: _smbd_get_connection:1513 warning: device max_send_sge = 6 too small
CIFS: VFS: _smbd_get_connection:1516 Queue Pair creation may fail
CIFS: VFS: _smbd_get_connection:1519 warning: device max_recv_sge = 6 too small
CIFS: VFS: _smbd_get_connection:1522 Queue Pair creation may fail
CIFS: VFS: _smbd_get_connection:1559 rdma_create_qp failed -22
CIFS: VFS: _smbd_get_connection:1513 warning: device max_send_sge = 6 too small
CIFS: VFS: _smbd_get_connection:1516 Queue Pair creation may fail
CIFS: VFS: _smbd_get_connection:1519 warning: device max_recv_sge = 6 too small
CIFS: VFS: _smbd_get_connection:1522 Queue Pair creation may fail
CIFS: VFS: _smbd_get_connection:1559 rdma_create_qp failed -22
CIFS: VFS: cifs_mount failed w/return code = -2

in dmesg.

Problem is, I don't know what to do about it :-/

David

^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: RDMA (smbdirect) testing
  2022-05-20 18:12 ` David Howells
@ 2022-05-21 11:54   ` Tom Talpey
  2022-05-22 23:06     ` Namjae Jeon
  ` (3 more replies)
  0 siblings, 4 replies; 42+ messages in thread

From: Tom Talpey @ 2022-05-21 11:54 UTC (permalink / raw)
  To: David Howells, Long Li, Namjae Jeon; +Cc: Hyunchul Lee, Steve French, CIFS

On 5/20/2022 2:12 PM, David Howells wrote:
> Tom Talpey <tom@talpey.com> wrote:
>
>> SoftROCE is a bit of a hot mess in upstream right now. It's
>> getting a lot of attention, but it's still pretty shaky.
>> If you're testing, I'd STRONGLY recommend SoftiWARP.
>
> I'm having problems getting that working. I'm setting the client up with:
>
> rdma link add siw0 type siw netdev enp6s0
> mount //192.168.6.1/scratch /xfstest.scratch -o rdma,user=shares,pass=...
>
> and then see:
>
> CIFS: Attempting to mount \\192.168.6.1\scratch
> CIFS: VFS: _smbd_get_connection:1513 warning: device max_send_sge = 6 too small
> CIFS: VFS: _smbd_get_connection:1516 Queue Pair creation may fail
> CIFS: VFS: _smbd_get_connection:1519 warning: device max_recv_sge = 6 too small
> CIFS: VFS: _smbd_get_connection:1522 Queue Pair creation may fail
> CIFS: VFS: _smbd_get_connection:1559 rdma_create_qp failed -22
> CIFS: VFS: _smbd_get_connection:1513 warning: device max_send_sge = 6 too small
> CIFS: VFS: _smbd_get_connection:1516 Queue Pair creation may fail
> CIFS: VFS: _smbd_get_connection:1519 warning: device max_recv_sge = 6 too small
> CIFS: VFS: _smbd_get_connection:1522 Queue Pair creation may fail
> CIFS: VFS: _smbd_get_connection:1559 rdma_create_qp failed -22
> CIFS: VFS: cifs_mount failed w/return code = -2
>
> in dmesg.
>
> Problem is, I don't know what to do about it:-/

It looks like the client is hardcoding 16 sge's, and has no option to
configure a smaller value, or reduce its requested number. That's bad,
because providers all have their own limits - and SIW_MAX_SGE is 6.

I thought I'd seen this working (metze?), but either the code changed
or someone built a custom version.

Namjae/Long, have you used siw successfully? Why does the code require
16 sge's, regardless of other size limits? Normally, if the lower layer
supports fewer, the upper layer will simply reduce its operation sizes.

Tom.

^ permalink raw reply	[flat|nested] 42+ messages in thread
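Tom's last point — that an upper layer should derive its operation sizes from whatever SGE limit the provider actually reports, rather than insisting on a fixed 16 — can be sketched in a few lines of user-space C. This is illustrative only: the function name, the one-SGE reservation for the SMB Direct header, and the 4 KiB page size are assumptions made for the example, not actual cifs.ko logic.

```c
#include <assert.h>
#include <stddef.h>

#define EXAMPLE_PAGE_SIZE 4096   /* assumed page size for the sketch */
#define SMBD_HDR_SGES     1      /* assume one SGE reserved for the header */

/*
 * Illustrative: given the SGE count a device reports (e.g. 6 for siw,
 * far more for Mellanox hardware), return the largest send payload the
 * upper layer should post, using one page per remaining data SGE.
 */
static size_t smbd_max_send_size(unsigned int device_max_sge)
{
	if (device_max_sge <= SMBD_HDR_SGES)
		return 0;   /* cannot fit a header plus any data */
	return (size_t)(device_max_sge - SMBD_HDR_SGES) * EXAMPLE_PAGE_SIZE;
}
```

With this shape, a 6-SGE provider simply yields smaller sends instead of failing `rdma_create_qp` outright.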
* Re: RDMA (smbdirect) testing
  2022-05-21 11:54 ` Tom Talpey
@ 2022-05-22 23:06   ` Namjae Jeon
  2022-05-23 13:45     ` Tom Talpey
  [not found]     ` <747882.1653311226@warthog.procyon.org.uk>
  ` (2 subsequent siblings)
  3 siblings, 1 reply; 42+ messages in thread

From: Namjae Jeon @ 2022-05-22 23:06 UTC (permalink / raw)
  To: Tom Talpey; +Cc: David Howells, Long Li, Hyunchul Lee, Steve French, CIFS

2022-05-21 20:54 GMT+09:00, Tom Talpey <tom@talpey.com>:
>
> On 5/20/2022 2:12 PM, David Howells wrote:
>> Tom Talpey <tom@talpey.com> wrote:
>>
>>> SoftROCE is a bit of a hot mess in upstream right now. It's
>>> getting a lot of attention, but it's still pretty shaky.
>>> If you're testing, I'd STRONGLY recommend SoftiWARP.
>>
>> I'm having problems getting that working. I'm setting the client up
>> with:
>>
>> rdma link add siw0 type siw netdev enp6s0
>> mount //192.168.6.1/scratch /xfstest.scratch -o rdma,user=shares,pass=...
>>
>> and then see:
>>
>> CIFS: Attempting to mount \\192.168.6.1\scratch
>> CIFS: VFS: _smbd_get_connection:1513 warning: device max_send_sge = 6 too
>> small
>> CIFS: VFS: _smbd_get_connection:1516 Queue Pair creation may fail
>> CIFS: VFS: _smbd_get_connection:1519 warning: device max_recv_sge = 6 too
>> small
>> CIFS: VFS: _smbd_get_connection:1522 Queue Pair creation may fail
>> CIFS: VFS: _smbd_get_connection:1559 rdma_create_qp failed -22
>> CIFS: VFS: _smbd_get_connection:1513 warning: device max_send_sge = 6 too
>> small
>> CIFS: VFS: _smbd_get_connection:1516 Queue Pair creation may fail
>> CIFS: VFS: _smbd_get_connection:1519 warning: device max_recv_sge = 6 too
>> small
>> CIFS: VFS: _smbd_get_connection:1522 Queue Pair creation may fail
>> CIFS: VFS: _smbd_get_connection:1559 rdma_create_qp failed -22
>> CIFS: VFS: cifs_mount failed w/return code = -2
>>
>> in dmesg.
>>
>> Problem is, I don't know what to do about it:-/
>
> It looks like the client is hardcoding 16 sge's, and has no option to
> configure a smaller value, or reduce its requested number. That's bad,
> because providers all have their own limits - and SIW_MAX_SGE is 6. I
> thought I'd seen this working (metze?), but either the code changed or
> someone built a custom version.
I also fully agree that we should provide users with the path to
configure this value.
>
> Namjae/Long, have you used siw successfully?
No. I was able to reproduce the same problem that David reported.
Hyunchul and I will take a look. I also confirmed that RDMA works well
without any problems with soft-ROCE. Until this problem is fixed, I'd
suggest that David use soft-ROCE.

> Why does the code require
> 16 sge's, regardless of other size limits? Normally, if the lower layer
> supports fewer, the upper layer will simply reduce its operation sizes.
This should be answered by Long Li. It seems that he set the optimized
value for the NICs he used to implement RDMA in cifs.
>
> Tom.
>

^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: RDMA (smbdirect) testing
  2022-05-22 23:06 ` Namjae Jeon
@ 2022-05-23 13:45   ` Tom Talpey
  2022-05-23 15:05     ` Namjae Jeon
  ` (3 more replies)
  0 siblings, 4 replies; 42+ messages in thread

From: Tom Talpey @ 2022-05-23 13:45 UTC (permalink / raw)
  To: Namjae Jeon, David Howells; +Cc: Long Li, Hyunchul Lee, Steve French, CIFS

On 5/22/2022 7:06 PM, Namjae Jeon wrote:
> 2022-05-21 20:54 GMT+09:00, Tom Talpey <tom@talpey.com>:
>>
>> On 5/20/2022 2:12 PM, David Howells wrote:
>>> Tom Talpey <tom@talpey.com> wrote:
>>>
>>>> SoftROCE is a bit of a hot mess in upstream right now. It's
>>>> getting a lot of attention, but it's still pretty shaky.
>>>> If you're testing, I'd STRONGLY recommend SoftiWARP.
>>>
>>> I'm having problems getting that working. I'm setting the client up
>>> with:
>>>
>>> rdma link add siw0 type siw netdev enp6s0
>>> mount //192.168.6.1/scratch /xfstest.scratch -o rdma,user=shares,pass=...
>>>
>>> and then see:
>>>
>>> CIFS: Attempting to mount \\192.168.6.1\scratch
>>> CIFS: VFS: _smbd_get_connection:1513 warning: device max_send_sge = 6 too
>>> small
>>> CIFS: VFS: _smbd_get_connection:1516 Queue Pair creation may fail
>>> CIFS: VFS: _smbd_get_connection:1519 warning: device max_recv_sge = 6 too
>>> small
>>> CIFS: VFS: _smbd_get_connection:1522 Queue Pair creation may fail
>>> CIFS: VFS: _smbd_get_connection:1559 rdma_create_qp failed -22
>>> CIFS: VFS: _smbd_get_connection:1513 warning: device max_send_sge = 6 too
>>> small
>>> CIFS: VFS: _smbd_get_connection:1516 Queue Pair creation may fail
>>> CIFS: VFS: _smbd_get_connection:1519 warning: device max_recv_sge = 6 too
>>> small
>>> CIFS: VFS: _smbd_get_connection:1522 Queue Pair creation may fail
>>> CIFS: VFS: _smbd_get_connection:1559 rdma_create_qp failed -22
>>> CIFS: VFS: cifs_mount failed w/return code = -2
>>>
>>> in dmesg.
>>>
>>> Problem is, I don't know what to do about it:-/
>>
>> It looks like the client is hardcoding 16 sge's, and has no option to
>> configure a smaller value, or reduce its requested number. That's bad,
>> because providers all have their own limits - and SIW_MAX_SGE is 6. I
>> thought I'd seen this working (metze?), but either the code changed or
>> someone built a custom version.
> I also fully agree that we should provide users with the path to
> configure this value.
>>
>> Namjae/Long, have you used siw successfully?
> No. I was able to reproduce the same problem that David reported. I
> and Hyunchul will take a look. I also confirmed that RDMA work well
> without any problems with soft-ROCE. Until this problem is fixed, I'd
> like to say David to use soft-ROCE.
>
>> Why does the code require
>> 16 sge's, regardless of other size limits? Normally, if the lower layer
>> supports fewer, the upper layer will simply reduce its operation sizes.
> This should be answered by Long Li. It seems that he set the optimized
> value for the NICs he used to implement RDMA in cifs.

"Optimized" is a funny choice of words. If the provider doesn't support
the value, it's not much of an optimization to insist on 16. :)

Personally, I'd try building a kernel with smbdirect.h changed to have
SMBDIRECT_MAX_SGE set to 6, and see what happens. You might have to
reduce the r/w sizes in mount, depending on any other issues this may
reveal.

Tom.

^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: RDMA (smbdirect) testing
  2022-05-23 13:45 ` Tom Talpey
@ 2022-05-23 15:05   ` Namjae Jeon
  2022-05-23 16:05     ` Tom Talpey
  2022-05-24  9:16     ` David Howells
  2022-05-25  9:29 ` David Howells
  ` (2 subsequent siblings)
  3 siblings, 2 replies; 42+ messages in thread

From: Namjae Jeon @ 2022-05-23 15:05 UTC (permalink / raw)
  To: Tom Talpey; +Cc: David Howells, Long Li, Hyunchul Lee, Steve French, CIFS

2022-05-23 22:45 GMT+09:00, Tom Talpey <tom@talpey.com>:
> On 5/22/2022 7:06 PM, Namjae Jeon wrote:
>> 2022-05-21 20:54 GMT+09:00, Tom Talpey <tom@talpey.com>:
>>>
>>> On 5/20/2022 2:12 PM, David Howells wrote:
>>>> Tom Talpey <tom@talpey.com> wrote:
>>>>
>>>>> SoftROCE is a bit of a hot mess in upstream right now. It's
>>>>> getting a lot of attention, but it's still pretty shaky.
>>>>> If you're testing, I'd STRONGLY recommend SoftiWARP.
>>>>
>>>> I'm having problems getting that working. I'm setting the client up
>>>> with:
>>>>
>>>> rdma link add siw0 type siw netdev enp6s0
>>>> mount //192.168.6.1/scratch /xfstest.scratch -o
>>>> rdma,user=shares,pass=...
>>>>
>>>> and then see:
>>>>
>>>> CIFS: Attempting to mount \\192.168.6.1\scratch
>>>> CIFS: VFS: _smbd_get_connection:1513 warning: device max_send_sge = 6
>>>> too
>>>> small
>>>> CIFS: VFS: _smbd_get_connection:1516 Queue Pair creation may fail
>>>> CIFS: VFS: _smbd_get_connection:1519 warning: device max_recv_sge = 6
>>>> too
>>>> small
>>>> CIFS: VFS: _smbd_get_connection:1522 Queue Pair creation may fail
>>>> CIFS: VFS: _smbd_get_connection:1559 rdma_create_qp failed -22
>>>> CIFS: VFS: _smbd_get_connection:1513 warning: device max_send_sge = 6
>>>> too
>>>> small
>>>> CIFS: VFS: _smbd_get_connection:1516 Queue Pair creation may fail
>>>> CIFS: VFS: _smbd_get_connection:1519 warning: device max_recv_sge = 6
>>>> too
>>>> small
>>>> CIFS: VFS: _smbd_get_connection:1522 Queue Pair creation may fail
>>>> CIFS: VFS: _smbd_get_connection:1559 rdma_create_qp failed -22
>>>> CIFS: VFS: cifs_mount failed w/return code = -2
>>>>
>>>> in dmesg.
>>>>
>>>> Problem is, I don't know what to do about it:-/
>>>
>>> It looks like the client is hardcoding 16 sge's, and has no option to
>>> configure a smaller value, or reduce its requested number. That's bad,
>>> because providers all have their own limits - and SIW_MAX_SGE is 6. I
>>> thought I'd seen this working (metze?), but either the code changed or
>>> someone built a custom version.
>> I also fully agree that we should provide users with the path to
>> configure this value.
>>>
>>> Namjae/Long, have you used siw successfully?
>> No. I was able to reproduce the same problem that David reported. I
>> and Hyunchul will take a look. I also confirmed that RDMA work well
>> without any problems with soft-ROCE. Until this problem is fixed, I'd
>> like to say David to use soft-ROCE.
>>
>>> Why does the code require
>>> 16 sge's, regardless of other size limits? Normally, if the lower layer
>>> supports fewer, the upper layer will simply reduce its operation sizes.
>> This should be answered by Long Li. It seems that he set the optimized
>> value for the NICs he used to implement RDMA in cifs.
>
> "Optimized" is a funny choice of words. If the provider doesn't support
> the value, it's not much of an optimization to insist on 16. :)
Ah, it's obvious that cifs hasn't been tested with soft-iWARP. And
the same with ksmbd...
>
> Personally, I'd try building a kernel with smbdirect.h changed to have
> SMBDIRECT_MAX_SGE set to 6, and see what happens. You might have to
> reduce the r/w sizes in mount, depending on any other issues this may
> reveal.
Agreed, and ksmbd should also be changed as well as cifs for this test.
We are preparing patches to improve this in ksmbd, rather than
changing and rebuilding with this hardcoded value every time.

diff --git a/fs/cifs/smbdirect.h b/fs/cifs/smbdirect.h
index a87fca82a796..7003722ab004 100644
--- a/fs/cifs/smbdirect.h
+++ b/fs/cifs/smbdirect.h
@@ -226,7 +226,7 @@ struct smbd_buffer_descriptor_v1 {
 } __packed;

 /* Default maximum number of SGEs in a RDMA send/recv */
-#define SMBDIRECT_MAX_SGE	16
+#define SMBDIRECT_MAX_SGE	6
 /* The context for a SMBD request */
 struct smbd_request {
 	struct smbd_connection *info;
diff --git a/fs/ksmbd/transport_rdma.c b/fs/ksmbd/transport_rdma.c
index e646d79554b8..70662b3bd590 100644
--- a/fs/ksmbd/transport_rdma.c
+++ b/fs/ksmbd/transport_rdma.c
@@ -42,7 +42,7 @@
 /* SMB_DIRECT negotiation timeout in seconds */
 #define SMB_DIRECT_NEGOTIATE_TIMEOUT	120

-#define SMB_DIRECT_MAX_SEND_SGES	8
+#define SMB_DIRECT_MAX_SEND_SGES	6
 #define SMB_DIRECT_MAX_RECV_SGES	1

 /*

Thanks!
>
> Tom.
>

^ permalink raw reply related	[flat|nested] 42+ messages in thread
* Re: RDMA (smbdirect) testing
  2022-05-23 15:05 ` Namjae Jeon
@ 2022-05-23 16:05   ` Tom Talpey
  2022-05-23 19:17     ` Long Li
  2022-05-24  0:59     ` Namjae Jeon
  1 sibling, 2 replies; 42+ messages in thread

From: Tom Talpey @ 2022-05-23 16:05 UTC (permalink / raw)
  To: Namjae Jeon; +Cc: David Howells, Long Li, Hyunchul Lee, Steve French, CIFS

On 5/23/2022 11:05 AM, Namjae Jeon wrote:
> 2022-05-23 22:45 GMT+09:00, Tom Talpey <tom@talpey.com>:
>> On 5/22/2022 7:06 PM, Namjae Jeon wrote:
>>> 2022-05-21 20:54 GMT+09:00, Tom Talpey <tom@talpey.com>:
>>>> ...
>>>> Why does the code require
>>>> 16 sge's, regardless of other size limits? Normally, if the lower layer
>>>> supports fewer, the upper layer will simply reduce its operation sizes.
>>> This should be answered by Long Li. It seems that he set the optimized
>>> value for the NICs he used to implement RDMA in cifs.
>>
>> "Optimized" is a funny choice of words. If the provider doesn't support
>> the value, it's not much of an optimization to insist on 16. :)
> Ah, It's obvious that cifs haven't been tested with soft-iWARP. And
> the same with ksmbd...
>>
>> Personally, I'd try building a kernel with smbdirect.h changed to have
>> SMBDIRECT_MAX_SGE set to 6, and see what happens. You might have to
>> reduce the r/w sizes in mount, depending on any other issues this may
>> reveal.
> Agreed, and ksmbd should also be changed as well as cifs for test. We
> are preparing the patches to improve this in ksmbd, rather than
> changing/building this hardcoding every time.

So, the patch is just for this test, right? Because I don't think any
kernel-based storage upper layer should ever need more than 2 or 3.
How many memory regions are you doing per operation? I would expect
one for the SMB3 headers, and another, if needed, for data. These
would all be lmr-type and would not require actual new memreg's.

And for bulk data, I would hope you're using fast-register, which
takes a different path and doesn't use the same sge's.

Getting this right, and keeping things efficient both in SGE
bookkeeping as well as memory registration efficiency, is the rocket
science behind RDMA performance and correctness. Slapping "16" or "6"
or whatever isn't the long-term fix.

Tom.

> diff --git a/fs/cifs/smbdirect.h b/fs/cifs/smbdirect.h
> index a87fca82a796..7003722ab004 100644
> --- a/fs/cifs/smbdirect.h
> +++ b/fs/cifs/smbdirect.h
> @@ -226,7 +226,7 @@ struct smbd_buffer_descriptor_v1 {
>  } __packed;
>
>  /* Default maximum number of SGEs in a RDMA send/recv */
> -#define SMBDIRECT_MAX_SGE	16
> +#define SMBDIRECT_MAX_SGE	6
>  /* The context for a SMBD request */
>  struct smbd_request {
>  	struct smbd_connection *info;
> diff --git a/fs/ksmbd/transport_rdma.c b/fs/ksmbd/transport_rdma.c
> index e646d79554b8..70662b3bd590 100644
> --- a/fs/ksmbd/transport_rdma.c
> +++ b/fs/ksmbd/transport_rdma.c
> @@ -42,7 +42,7 @@
>  /* SMB_DIRECT negotiation timeout in seconds */
>  #define SMB_DIRECT_NEGOTIATE_TIMEOUT	120
>
> -#define SMB_DIRECT_MAX_SEND_SGES	8
> +#define SMB_DIRECT_MAX_SEND_SGES	6
>  #define SMB_DIRECT_MAX_RECV_SGES	1
>
>  /*
>
> Thanks!
>>
>> Tom.
>>

^ permalink raw reply	[flat|nested] 42+ messages in thread
* RE: RDMA (smbdirect) testing
  2022-05-23 16:05 ` Tom Talpey
@ 2022-05-23 19:17   ` Long Li
  2022-05-24  1:01     ` Namjae Jeon
  2022-05-24  0:59   ` Namjae Jeon
  1 sibling, 1 reply; 42+ messages in thread

From: Long Li @ 2022-05-23 19:17 UTC (permalink / raw)
  To: tom, linkinjeon; +Cc: David Howells, Hyunchul Lee, Steve French, CIFS

> Subject: Re: RDMA (smbdirect) testing
>
> On 5/23/2022 11:05 AM, Namjae Jeon wrote:
> > 2022-05-23 22:45 GMT+09:00, Tom Talpey <tom@talpey.com>:
> >> On 5/22/2022 7:06 PM, Namjae Jeon wrote:
> >>> 2022-05-21 20:54 GMT+09:00, Tom Talpey <tom@talpey.com>:
> >>>> ...
> >>>> Why does the code require
> >>>> 16 sge's, regardless of other size limits? Normally, if the lower
> >>>> layer supports fewer, the upper layer will simply reduce its operation sizes.
> >>> This should be answered by Long Li. It seems that he set the
> >>> optimized value for the NICs he used to implement RDMA in cifs.
> >>
> >> "Optimized" is a funny choice of words. If the provider doesn't
> >> support the value, it's not much of an optimization to insist on 16.
> >> :)
> > Ah, It's obvious that cifs haven't been tested with soft-iWARP. And
> > the same with ksmbd...
> >>
> >> Personally, I'd try building a kernel with smbdirect.h changed to
> >> have SMBDIRECT_MAX_SGE set to 6, and see what happens. You might have
> >> to reduce the r/w sizes in mount, depending on any other issues this
> >> may reveal.
> > Agreed, and ksmbd should also be changed as well as cifs for test. We
> > are preparing the patches to improve this in ksmbd, rather than
> > changing/building this hardcoding every time.
>
> So, the patch is just for this test, right? Because I don't think any kernel-based
> storage upper layer should ever need more than 2 or 3.
> How many memory regions are you doing per operation? I would expect one for
> the SMB3 headers, and another, if needed, for data.
> These would all be lmr-type and would not require actual new memreg's.
>
> And for bulk data, I would hope you're using fast-register, which takes a
> different path and doesn't use the same sge's.
>
> Getting this right, and keeping things efficient both in SGE bookkeeping as well
> as memory registration efficiency, is the rocket science behind RDMA
> performance and correctness. Slapping "16" or "6" or whatever isn't the long-
> term fix.

I found max_sge is extremely large on Mellanox hardware, but smaller
on other iWARP hardware.

Hardcoding it to 16 is certainly not a good choice. I think we should
set it to the smaller of 1) a predefined value (e.g. 8), and 2) the
max_sge the hardware reports.

If the CIFS upper layer ever sends data with a larger number of SGEs,
the send will fail.

Long

>
> Tom.
>
> > diff --git a/fs/cifs/smbdirect.h b/fs/cifs/smbdirect.h index
> > a87fca82a796..7003722ab004 100644
> > --- a/fs/cifs/smbdirect.h
> > +++ b/fs/cifs/smbdirect.h
> > @@ -226,7 +226,7 @@ struct smbd_buffer_descriptor_v1 {
> >  } __packed;
> >
> >  /* Default maximum number of SGEs in a RDMA send/recv */
> > -#define SMBDIRECT_MAX_SGE	16
> > +#define SMBDIRECT_MAX_SGE	6
> >  /* The context for a SMBD request */
> >  struct smbd_request {
> >  	struct smbd_connection *info; diff --git
> > a/fs/ksmbd/transport_rdma.c b/fs/ksmbd/transport_rdma.c index
> > e646d79554b8..70662b3bd590 100644
> > --- a/fs/ksmbd/transport_rdma.c
> > +++ b/fs/ksmbd/transport_rdma.c
> > @@ -42,7 +42,7 @@
> >  /* SMB_DIRECT negotiation timeout in seconds */
> >  #define SMB_DIRECT_NEGOTIATE_TIMEOUT	120
> >
> > -#define SMB_DIRECT_MAX_SEND_SGES	8
> > +#define SMB_DIRECT_MAX_SEND_SGES	6
> >  #define SMB_DIRECT_MAX_RECV_SGES	1
> >
> >  /*
> >
> > Thanks!
> >>
> >> Tom.
> >>
> >

^ permalink raw reply	[flat|nested] 42+ messages in thread
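The rule Long describes — take the smaller of a predefined cap and the value the hardware reports — might look like the following sketch. The names are hypothetical; in the kernel the reported value would come from the device attributes (`ib_device_attr` has separate `max_send_sge`/`max_recv_sge` fields) queried at connection setup, and the eventual cifs.ko patch may choose a different cap.

```c
#include <assert.h>

/* Assumed driver-side upper bound for the example, per Long's "e.g. 8". */
#define SMBD_MAX_SEND_SGE_CAP 8

/*
 * Illustrative: clamp the SGE count requested at QP creation to what
 * the device reports (e.g. 6 for siw, far more on Mellanox hardware),
 * so rdma_create_qp() is never asked for more than the provider allows.
 */
static unsigned int smbd_negotiate_send_sges(unsigned int device_max_send_sge)
{
	return device_max_send_sge < SMBD_MAX_SEND_SGE_CAP ?
	       device_max_send_sge : SMBD_MAX_SEND_SGE_CAP;
}
```

With this shape, siw's limit of 6 is honored automatically and the -22 (`-EINVAL`) from `rdma_create_qp` in David's log would not occur for that reason.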
* Re: RDMA (smbdirect) testing
  2022-05-23 19:17 ` Long Li
@ 2022-05-24  1:01   ` Namjae Jeon
  2022-05-24 21:08     ` Long Li
  0 siblings, 1 reply; 42+ messages in thread

From: Namjae Jeon @ 2022-05-24 1:01 UTC (permalink / raw)
  To: Long Li; +Cc: tom, David Howells, Hyunchul Lee, Steve French, CIFS

2022-05-24 4:17 GMT+09:00, Long Li <longli@microsoft.com>:
>> Subject: Re: RDMA (smbdirect) testing
>>
>> On 5/23/2022 11:05 AM, Namjae Jeon wrote:
>> > 2022-05-23 22:45 GMT+09:00, Tom Talpey <tom@talpey.com>:
>> >> On 5/22/2022 7:06 PM, Namjae Jeon wrote:
>> >>> 2022-05-21 20:54 GMT+09:00, Tom Talpey <tom@talpey.com>:
>> >>>> ...
>> >>>> Why does the code require
>> >>>> 16 sge's, regardless of other size limits? Normally, if the lower
>> >>>> layer supports fewer, the upper layer will simply reduce its
>> >>>> operation sizes.
>> >>> This should be answered by Long Li. It seems that he set the
>> >>> optimized value for the NICs he used to implement RDMA in cifs.
>> >>
>> >> "Optimized" is a funny choice of words. If the provider doesn't
>> >> support the value, it's not much of an optimization to insist on 16.
>> >> :)
>> > Ah, It's obvious that cifs haven't been tested with soft-iWARP. And
>> > the same with ksmbd...
>> >>
>> >> Personally, I'd try building a kernel with smbdirect.h changed to
>> >> have SMBDIRECT_MAX_SGE set to 6, and see what happens. You might have
>> >> to reduce the r/w sizes in mount, depending on any other issues this
>> >> may reveal.
>> > Agreed, and ksmbd should also be changed as well as cifs for test. We
>> > are preparing the patches to improve this in ksmbd, rather than
>> > changing/building this hardcoding every time.
>>
>> So, the patch is just for this test, right? Because I don't think any
>> kernel-based
>> storage upper layer should ever need more than 2 or 3.
>> How many memory regions are you doing per operation? I would expect one
>> for
>> the SMB3 headers, and another, if needed, for data.
>> These would all be lmr-type and would not require actual new memreg's.
>>
>> And for bulk data, I would hope you're using fast-register, which takes a
>> different path and doesn't use the same sge's.
>>
>> Getting this right, and keeping things efficient both in SGE bookkeeping
>> as well
>> as memory registration efficiency, is the rocket science behind RDMA
>> performance and correctness. Slapping "16" or "6" or whatever isn't the
>> long-
>> term fix.
>
Hi Long,

> I found max_sge is extremely large on Mellanox hardware, but smaller on
> other iWARP hardware.
>
> Hardcoding it to 16 is certainly not a good choice. I think we should set
> it to the smaller value of 1) a predefined value (e.g. 8), and the 2) the
> max_sge the hardware reports.
Okay, could you please send the patch for cifs.ko?

Thanks.
>
> If the CIFS upper layer ever sends data with larger number of SGEs, the
> send will fail.
>
> Long
>
>>
>> Tom.
>>
>> > diff --git a/fs/cifs/smbdirect.h b/fs/cifs/smbdirect.h index
>> > a87fca82a796..7003722ab004 100644
>> > --- a/fs/cifs/smbdirect.h
>> > +++ b/fs/cifs/smbdirect.h
>> > @@ -226,7 +226,7 @@ struct smbd_buffer_descriptor_v1 {
>> >  } __packed;
>> >
>> >  /* Default maximum number of SGEs in a RDMA send/recv */
>> > -#define SMBDIRECT_MAX_SGE	16
>> > +#define SMBDIRECT_MAX_SGE	6
>> >  /* The context for a SMBD request */
>> >  struct smbd_request {
>> >  	struct smbd_connection *info; diff --git
>> > a/fs/ksmbd/transport_rdma.c b/fs/ksmbd/transport_rdma.c index
>> > e646d79554b8..70662b3bd590 100644
>> > --- a/fs/ksmbd/transport_rdma.c
>> > +++ b/fs/ksmbd/transport_rdma.c
>> > @@ -42,7 +42,7 @@
>> >  /* SMB_DIRECT negotiation timeout in seconds */
>> >  #define SMB_DIRECT_NEGOTIATE_TIMEOUT	120
>> >
>> > -#define SMB_DIRECT_MAX_SEND_SGES	8
>> > +#define SMB_DIRECT_MAX_SEND_SGES	6
>> >  #define SMB_DIRECT_MAX_RECV_SGES	1
>> >
>> >  /*
>> >
>> > Thanks!
>> >>
>> >> Tom.
>> >>
>> >

^ permalink raw reply	[flat|nested] 42+ messages in thread
* RE: RDMA (smbdirect) testing
  2022-05-24  1:01 ` Namjae Jeon
@ 2022-05-24 21:08   ` Long Li
  2022-06-02 23:32     ` Namjae Jeon
  0 siblings, 1 reply; 42+ messages in thread

From: Long Li @ 2022-05-24 21:08 UTC (permalink / raw)
  To: linkinjeon; +Cc: tom, David Howells, Hyunchul Lee, Steve French, CIFS

> Subject: Re: RDMA (smbdirect) testing
>
> 2022-05-24 4:17 GMT+09:00, Long Li <longli@microsoft.com>:
> >> Subject: Re: RDMA (smbdirect) testing
> >>
> >> On 5/23/2022 11:05 AM, Namjae Jeon wrote:
> >> > 2022-05-23 22:45 GMT+09:00, Tom Talpey <tom@talpey.com>:
> >> >> On 5/22/2022 7:06 PM, Namjae Jeon wrote:
> >> >>> 2022-05-21 20:54 GMT+09:00, Tom Talpey <tom@talpey.com>:
> >> >>>> ...
> >> >>>> Why does the code require
> >> >>>> 16 sge's, regardless of other size limits? Normally, if the
> >> >>>> lower layer supports fewer, the upper layer will simply reduce
> >> >>>> its operation sizes.
> >> >>> This should be answered by Long Li. It seems that he set the
> >> >>> optimized value for the NICs he used to implement RDMA in cifs.
> >> >>
> >> >> "Optimized" is a funny choice of words. If the provider doesn't
> >> >> support the value, it's not much of an optimization to insist on 16.
> >> >> :)
> >> > Ah, It's obvious that cifs haven't been tested with soft-iWARP. And
> >> > the same with ksmbd...
> >> >>
> >> >> Personally, I'd try building a kernel with smbdirect.h changed to
> >> >> have SMBDIRECT_MAX_SGE set to 6, and see what happens. You might
> >> >> have to reduce the r/w sizes in mount, depending on any other
> >> >> issues this may reveal.
> >> > Agreed, and ksmbd should also be changed as well as cifs for test.
> >> > We are preparing the patches to improve this in ksmbd, rather than
> >> > changing/building this hardcoding every time.
> >>
> >> So, the patch is just for this test, right? Because I don't think any
> >> kernel-based storage upper layer should ever need more than 2 or 3.
> >> How many memory regions are you doing per operation? I would expect
> >> one for the SMB3 headers, and another, if needed, for data.
> >> These would all be lmr-type and would not require actual new memreg's.
> >>
> >> And for bulk data, I would hope you're using fast-register, which
> >> takes a different path and doesn't use the same sge's.
> >>
> >> Getting this right, and keeping things efficient both in SGE
> >> bookkeeping as well as memory registration efficiency, is the rocket
> >> science behind RDMA performance and correctness. Slapping "16" or "6"
> >> or whatever isn't the long-
> >> term fix.
> >
> Hi Long,
> > I found max_sge is extremely large on Mellanox hardware, but smaller
> > on other iWARP hardware.
> >
> > Hardcoding it to 16 is certainly not a good choice. I think we should
> > set it to the smaller value of 1) a predefined value (e.g. 8), and the
> > 2) the max_sge the hardware reports.
> Okay, Could you please send the patch for cifs.ko ?

Yes, will do.

Long

>
> Thanks.
>
> > If the CIFS upper layer ever sends data with larger number of SGEs,
> > the send will fail.
> >
> > Long
> >
> >>
> >> Tom.
> >>
> >> > diff --git a/fs/cifs/smbdirect.h b/fs/cifs/smbdirect.h index
> >> > a87fca82a796..7003722ab004 100644
> >> > --- a/fs/cifs/smbdirect.h
> >> > +++ b/fs/cifs/smbdirect.h
> >> > @@ -226,7 +226,7 @@ struct smbd_buffer_descriptor_v1 {
> >> >  } __packed;
> >> >
> >> >  /* Default maximum number of SGEs in a RDMA send/recv */
> >> > -#define SMBDIRECT_MAX_SGE	16
> >> > +#define SMBDIRECT_MAX_SGE	6
> >> >  /* The context for a SMBD request */
> >> >  struct smbd_request {
> >> >  	struct smbd_connection *info; diff --git
> >> > a/fs/ksmbd/transport_rdma.c b/fs/ksmbd/transport_rdma.c index
> >> > e646d79554b8..70662b3bd590 100644
> >> > --- a/fs/ksmbd/transport_rdma.c
> >> > +++ b/fs/ksmbd/transport_rdma.c
> >> > @@ -42,7 +42,7 @@
> >> >  /* SMB_DIRECT negotiation timeout in seconds */
> >> >  #define SMB_DIRECT_NEGOTIATE_TIMEOUT	120
> >> >
> >> > -#define SMB_DIRECT_MAX_SEND_SGES	8
> >> > +#define SMB_DIRECT_MAX_SEND_SGES	6
> >> >  #define SMB_DIRECT_MAX_RECV_SGES	1
> >> >
> >> >  /*
> >> >
> >> > Thanks!
> >> >>
> >> >> Tom.
> >> >>
> >> >

^ permalink raw reply	[flat|nested] 42+ messages in thread
* Re: RDMA (smbdirect) testing 2022-05-24 21:08 ` Long Li @ 2022-06-02 23:32 ` Namjae Jeon 2022-06-03 0:07 ` Long Li 0 siblings, 1 reply; 42+ messages in thread From: Namjae Jeon @ 2022-06-02 23:32 UTC (permalink / raw) To: Long Li; +Cc: tom, David Howells, Hyunchul Lee, Steve French, CIFS 2022-05-25 6:08 GMT+09:00, Long Li <longli@microsoft.com>: >> Subject: Re: RDMA (smbdirect) testing >> >> 2022-05-24 4:17 GMT+09:00, Long Li <longli@microsoft.com>: >> >> Subject: Re: RDMA (smbdirect) testing >> >> >> >> On 5/23/2022 11:05 AM, Namjae Jeon wrote: >> >> > 2022-05-23 22:45 GMT+09:00, Tom Talpey <tom@talpey.com>: >> >> >> On 5/22/2022 7:06 PM, Namjae Jeon wrote: >> >> >>> 2022-05-21 20:54 GMT+09:00, Tom Talpey <tom@talpey.com>: >> >> >>>> ... >> >> >>>> Why does the code require >> >> >>>> 16 sge's, regardless of other size limits? Normally, if the >> >> >>>> lower layer supports fewer, the upper layer will simply reduce >> >> >>>> its operation sizes. >> >> >>> This should be answered by Long Li. It seems that he set the >> >> >>> optimized value for the NICs he used to implement RDMA in cifs. >> >> >> >> >> >> "Optimized" is a funny choice of words. If the provider doesn't >> >> >> support the value, it's not much of an optimization to insist on >> >> >> 16. >> >> >> :) >> >> > Ah, It's obvious that cifs haven't been tested with soft-iWARP. And >> >> > the same with ksmbd... >> >> >> >> >> >> Personally, I'd try building a kernel with smbdirect.h changed to >> >> >> have SMBDIRECT_MAX_SGE set to 6, and see what happens. You >> might >> >> >> have to reduce the r/w sizes in mount, depending on any other >> >> >> issues this may reveal. >> >> > Agreed, and ksmbd should also be changed as well as cifs for test. >> >> > We are preparing the patches to improve this in ksmbd, rather than >> >> > changing/building this hardcoding every time. >> >> >> >> So, the patch is just for this test, right? 
Because I don't think any >> >> kernel-based storage upper layer should ever need more than 2 or 3. >> >> How many memory regions are you doing per operation? I would expect >> >> one for the SMB3 headers, and another, if needed, for data. >> >> These would all be lmr-type and would not require actual new memreg's. >> >> >> >> And for bulk data, I would hope you're using fast-register, which >> >> takes a different path and doesn't use the same sge's. >> >> >> >> Getting this right, and keeping things efficient both in SGE >> >> bookkeeping as well as memory registration efficiency, is the rocket >> >> science behind RDMA performance and correctness. Slapping "16" or "6" >> >> or whatever isn't the >> >> long- >> >> term fix. >> > >> Hi Long, >> > I found max_sge is extremely large on Mellanox hardware, but smaller >> > on other iWARP hardware. >> > >> > Hardcoding it to 16 is certainly not a good choice. I think we should >> > set it to the smaller value of 1) a predefined value (e.g. 8), and the >> > 2) the max_sge the hardware reports. >> Okay, Could you please send the patch for cifs.ko ? Long, my Chelsio (iWARP) NIC reports this value as 4. When I set it to the hardware-reported value in cifs.ko, there is a kernel oops in cifs.ko. Have you tested smb-direct in cifs.ko with Chelsio or other iWARP NICs before, or only with Mellanox NICs? Thanks! > > Yes, will do. > > Long > >> >> Thanks. >> > >> > If the CIFS upper layer ever sends data with larger number of SGEs, >> > the send will fail. >> > >> > Long >> > >> >> >> >> Tom. 
>> >> >> >> > diff --git a/fs/cifs/smbdirect.h b/fs/cifs/smbdirect.h index >> >> > a87fca82a796..7003722ab004 100644 >> >> > --- a/fs/cifs/smbdirect.h >> >> > +++ b/fs/cifs/smbdirect.h >> >> > @@ -226,7 +226,7 @@ struct smbd_buffer_descriptor_v1 { >> >> > } __packed; >> >> > >> >> > /* Default maximum number of SGEs in a RDMA send/recv */ >> >> > -#define SMBDIRECT_MAX_SGE 16 >> >> > +#define SMBDIRECT_MAX_SGE 6 >> >> > /* The context for a SMBD request */ >> >> > struct smbd_request { >> >> > struct smbd_connection *info; diff --git >> >> > a/fs/ksmbd/transport_rdma.c b/fs/ksmbd/transport_rdma.c index >> >> > e646d79554b8..70662b3bd590 100644 >> >> > --- a/fs/ksmbd/transport_rdma.c >> >> > +++ b/fs/ksmbd/transport_rdma.c >> >> > @@ -42,7 +42,7 @@ >> >> > /* SMB_DIRECT negotiation timeout in seconds */ >> >> > #define SMB_DIRECT_NEGOTIATE_TIMEOUT 120 >> >> > >> >> > -#define SMB_DIRECT_MAX_SEND_SGES 8 >> >> > +#define SMB_DIRECT_MAX_SEND_SGES 6 >> >> > #define SMB_DIRECT_MAX_RECV_SGES 1 >> >> > >> >> > /* >> >> > >> >> > Thanks! >> >> >> >> >> >> Tom. >> >> >> >> >> > >> > > ^ permalink raw reply [flat|nested] 42+ messages in thread
* RE: RDMA (smbdirect) testing 2022-06-02 23:32 ` Namjae Jeon @ 2022-06-03 0:07 ` Long Li 2022-06-07 17:26 ` Tom Talpey 0 siblings, 1 reply; 42+ messages in thread From: Long Li @ 2022-06-03 0:07 UTC (permalink / raw) To: linkinjeon; +Cc: tom, David Howells, Hyunchul Lee, Steve French, CIFS > Long, My Chelsio(iWARP) NIC reports this value as 4. When I set it with hw > report value in cifs.ko, There is kernel oops in cifs.ko. Have you checked smb- > direct of cifs.ko with Chelsio and any iWARP NICs before ? or only Mellanox > NICs ? > > Thanks! Yes, I have tested on Chelsio. I didn't see kernel panic. In fact, I can pass a larger value (8) and successfully create a QP on Chelsio. Can you paste your kernel panic trace? > >> > If the CIFS upper layer ever sends data with larger number of SGEs, > >> > the send will fail. > >> > > >> > Long > >> > > >> >> > >> >> Tom. > >> >> > >> >> > diff --git a/fs/cifs/smbdirect.h b/fs/cifs/smbdirect.h index > >> >> > a87fca82a796..7003722ab004 100644 > >> >> > --- a/fs/cifs/smbdirect.h > >> >> > +++ b/fs/cifs/smbdirect.h > >> >> > @@ -226,7 +226,7 @@ struct smbd_buffer_descriptor_v1 { > >> >> > } __packed; > >> >> > > >> >> > /* Default maximum number of SGEs in a RDMA send/recv */ > >> >> > -#define SMBDIRECT_MAX_SGE 16 > >> >> > +#define SMBDIRECT_MAX_SGE 6 > >> >> > /* The context for a SMBD request */ > >> >> > struct smbd_request { > >> >> > struct smbd_connection *info; diff --git > >> >> > a/fs/ksmbd/transport_rdma.c b/fs/ksmbd/transport_rdma.c index > >> >> > e646d79554b8..70662b3bd590 100644 > >> >> > --- a/fs/ksmbd/transport_rdma.c > >> >> > +++ b/fs/ksmbd/transport_rdma.c > >> >> > @@ -42,7 +42,7 @@ > >> >> > /* SMB_DIRECT negotiation timeout in seconds */ > >> >> > #define SMB_DIRECT_NEGOTIATE_TIMEOUT 120 > >> >> > > >> >> > -#define SMB_DIRECT_MAX_SEND_SGES 8 > >> >> > +#define SMB_DIRECT_MAX_SEND_SGES 6 > >> >> > #define SMB_DIRECT_MAX_RECV_SGES 1 > >> >> > > >> >> > /* > >> >> > > >> >> > Thanks! 
> >> >> >> > >> >> >> Tom. > >> >> >> > >> >> > > >> > > > ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: RDMA (smbdirect) testing 2022-06-03 0:07 ` Long Li @ 2022-06-07 17:26 ` Tom Talpey 2022-06-07 22:25 ` Namjae Jeon 0 siblings, 1 reply; 42+ messages in thread From: Tom Talpey @ 2022-06-07 17:26 UTC (permalink / raw) To: Long Li, linkinjeon; +Cc: David Howells, Hyunchul Lee, Steve French, CIFS On 6/2/2022 8:07 PM, Long Li wrote: >> Long, My Chelsio(iWARP) NIC reports this value as 4. When I set it with hw >> report value in cifs.ko, There is kernel oops in cifs.ko. Have you checked smb- >> direct of cifs.ko with Chelsio and any iWARP NICs before ? or only Mellanox >> NICs ? >> >> Thanks! > > Yes, I have tested on Chelsio. I didn't see kernel panic. In fact, I can pass a larger value (8) and successfully create a QP on Chelsio. There are many generations of Chelsio RDMA adapters, and this number is very likely to be different. You both should be sure you're testing with multiple configurations (and don't forget all the other in-kernel RDMA NICs). But a constant value of "8" is still arbitrary. The kernel should definitely not throw an oops when the NIC doesn't support some precompiled constant value. Namjae, what oops, exactly? Tom. > Can you paste your kernel panic trace? > >>>>> If the CIFS upper layer ever sends data with larger number of SGEs, >>>>> the send will fail. >>>>> >>>>> Long >>>>> >>>>>> >>>>>> Tom. 
>>>>>> >>>>>>> diff --git a/fs/cifs/smbdirect.h b/fs/cifs/smbdirect.h index >>>>>>> a87fca82a796..7003722ab004 100644 >>>>>>> --- a/fs/cifs/smbdirect.h >>>>>>> +++ b/fs/cifs/smbdirect.h >>>>>>> @@ -226,7 +226,7 @@ struct smbd_buffer_descriptor_v1 { >>>>>>> } __packed; >>>>>>> >>>>>>> /* Default maximum number of SGEs in a RDMA send/recv */ >>>>>>> -#define SMBDIRECT_MAX_SGE 16 >>>>>>> +#define SMBDIRECT_MAX_SGE 6 >>>>>>> /* The context for a SMBD request */ >>>>>>> struct smbd_request { >>>>>>> struct smbd_connection *info; diff --git >>>>>>> a/fs/ksmbd/transport_rdma.c b/fs/ksmbd/transport_rdma.c index >>>>>>> e646d79554b8..70662b3bd590 100644 >>>>>>> --- a/fs/ksmbd/transport_rdma.c >>>>>>> +++ b/fs/ksmbd/transport_rdma.c >>>>>>> @@ -42,7 +42,7 @@ >>>>>>> /* SMB_DIRECT negotiation timeout in seconds */ >>>>>>> #define SMB_DIRECT_NEGOTIATE_TIMEOUT 120 >>>>>>> >>>>>>> -#define SMB_DIRECT_MAX_SEND_SGES 8 >>>>>>> +#define SMB_DIRECT_MAX_SEND_SGES 6 >>>>>>> #define SMB_DIRECT_MAX_RECV_SGES 1 >>>>>>> >>>>>>> /* >>>>>>> >>>>>>> Thanks! >>>>>>>> >>>>>>>> Tom. >>>>>>>> >>>>>>> >>>>> >>> ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: RDMA (smbdirect) testing 2022-06-07 17:26 ` Tom Talpey @ 2022-06-07 22:25 ` Namjae Jeon 0 siblings, 0 replies; 42+ messages in thread From: Namjae Jeon @ 2022-06-07 22:25 UTC (permalink / raw) To: Tom Talpey, Long Li; +Cc: David Howells, Hyunchul Lee, Steve French, CIFS 2022-06-08 2:26 GMT+09:00, Tom Talpey <tom@talpey.com>: > On 6/2/2022 8:07 PM, Long Li wrote: >>> Long, My Chelsio(iWARP) NIC reports this value as 4. When I set it with >>> hw >>> report value in cifs.ko, There is kernel oops in cifs.ko. Have you >>> checked smb- >>> direct of cifs.ko with Chelsio and any iWARP NICs before ? or only >>> Mellanox >>> NICs ? >>> >>> Thanks! >> >> Yes, I have tested on Chelsio. I didn't see kernel panic. In fact, I can >> pass a larger value (8) and successfully create a QP on Chelsio. > > There are many generations of Chelsio RDMA adapters, and this number > is very likely to be different. You both should be sure you're testing > with multiple configurations (and don't forget all the other in-kernel > RDMA NICs). But a constant value of "8" is still arbitrary. > > The kernel should definitely not throw an oops when the NIC doesn't > support some precompiled constant value. Namjae, what oops, exactly? Sorry for noise, This is not cifs.ko issue(with max sge 8). There seems to be some issue in chelsio driver(cxgb4) on the latest kernel. I have reported this issue to chelsio maintainers. They was already aware of this problem and are trying to fix it. When I checked again on linux-5.15, there was no problem. I am going to send a reply about this mail after checking it with cxgb4 patch on the latest kernel, It has not been fixed yet, still waiting for the patch from them. Thanks! > > Tom. > >> Can you paste your kernel panic trace? >> >>>>>> If the CIFS upper layer ever sends data with larger number of SGEs, >>>>>> the send will fail. >>>>>> >>>>>> Long >>>>>> >>>>>>> >>>>>>> Tom. 
>>>>>>> >>>>>>>> diff --git a/fs/cifs/smbdirect.h b/fs/cifs/smbdirect.h index >>>>>>>> a87fca82a796..7003722ab004 100644 >>>>>>>> --- a/fs/cifs/smbdirect.h >>>>>>>> +++ b/fs/cifs/smbdirect.h >>>>>>>> @@ -226,7 +226,7 @@ struct smbd_buffer_descriptor_v1 { >>>>>>>> } __packed; >>>>>>>> >>>>>>>> /* Default maximum number of SGEs in a RDMA send/recv */ >>>>>>>> -#define SMBDIRECT_MAX_SGE 16 >>>>>>>> +#define SMBDIRECT_MAX_SGE 6 >>>>>>>> /* The context for a SMBD request */ >>>>>>>> struct smbd_request { >>>>>>>> struct smbd_connection *info; diff --git >>>>>>>> a/fs/ksmbd/transport_rdma.c b/fs/ksmbd/transport_rdma.c index >>>>>>>> e646d79554b8..70662b3bd590 100644 >>>>>>>> --- a/fs/ksmbd/transport_rdma.c >>>>>>>> +++ b/fs/ksmbd/transport_rdma.c >>>>>>>> @@ -42,7 +42,7 @@ >>>>>>>> /* SMB_DIRECT negotiation timeout in seconds */ >>>>>>>> #define SMB_DIRECT_NEGOTIATE_TIMEOUT 120 >>>>>>>> >>>>>>>> -#define SMB_DIRECT_MAX_SEND_SGES 8 >>>>>>>> +#define SMB_DIRECT_MAX_SEND_SGES 6 >>>>>>>> #define SMB_DIRECT_MAX_RECV_SGES 1 >>>>>>>> >>>>>>>> /* >>>>>>>> >>>>>>>> Thanks! >>>>>>>>> >>>>>>>>> Tom. >>>>>>>>> >>>>>>>> >>>>>> >>>> > ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: RDMA (smbdirect) testing 2022-05-23 16:05 ` Tom Talpey 2022-05-23 19:17 ` Long Li @ 2022-05-24 0:59 ` Namjae Jeon 1 sibling, 0 replies; 42+ messages in thread From: Namjae Jeon @ 2022-05-24 0:59 UTC (permalink / raw) To: Tom Talpey; +Cc: David Howells, Long Li, Hyunchul Lee, Steve French, CIFS 2022-05-24 1:05 GMT+09:00, Tom Talpey <tom@talpey.com>: > On 5/23/2022 11:05 AM, Namjae Jeon wrote: >> 2022-05-23 22:45 GMT+09:00, Tom Talpey <tom@talpey.com>: >>> On 5/22/2022 7:06 PM, Namjae Jeon wrote: >>>> 2022-05-21 20:54 GMT+09:00, Tom Talpey <tom@talpey.com>: >>>>> ... >>>>> Why does the code require >>>>> 16 sge's, regardless of other size limits? Normally, if the lower >>>>> layer >>>>> supports fewer, the upper layer will simply reduce its operation >>>>> sizes. >>>> This should be answered by Long Li. It seems that he set the optimized >>>> value for the NICs he used to implement RDMA in cifs. >>> >>> "Optimized" is a funny choice of words. If the provider doesn't support >>> the value, it's not much of an optimization to insist on 16. :) >> Ah, It's obvious that cifs haven't been tested with soft-iWARP. And >> the same with ksmbd... >>> >>> Personally, I'd try building a kernel with smbdirect.h changed to have >>> SMBDIRECT_MAX_SGE set to 6, and see what happens. You might have to >>> reduce the r/w sizes in mount, depending on any other issues this may >>> reveal. >> Agreed, and ksmbd should also be changed as well as cifs for test. We >> are preparing the patches to improve this in ksmbd, rather than >> changing/building this hardcoding every time. > > So, the patch is just for this test, right? Yes. > Because I don't think any > kernel-based storage upper layer should ever need more than 2 or 3. > How many memory regions are you doing per operation? I would > expect one for the SMB3 headers, and another, if needed, for data. > These would all be lmr-type and would not require actual new memreg's. Maximum 4. 
(SMB transform header, SMB3 header + response, data, SMB-Direct header) > > And for bulk data, I would hope you're using fast-register, which > takes a different path and doesn't use the same sge's. For bulk data, ksmbd is already using it. > > Getting this right, and keeping things efficient both in SGE bookkeeping > as well as memory registration efficiency, is the rocket science behind > RDMA performance and correctness. Slapping "16" or "6" or whatever isn't > the long-term fix. Okay. > > Tom. > >> diff --git a/fs/cifs/smbdirect.h b/fs/cifs/smbdirect.h >> index a87fca82a796..7003722ab004 100644 >> --- a/fs/cifs/smbdirect.h >> +++ b/fs/cifs/smbdirect.h >> @@ -226,7 +226,7 @@ struct smbd_buffer_descriptor_v1 { >> } __packed; >> >> /* Default maximum number of SGEs in a RDMA send/recv */ >> -#define SMBDIRECT_MAX_SGE 16 >> +#define SMBDIRECT_MAX_SGE 6 >> /* The context for a SMBD request */ >> struct smbd_request { >> struct smbd_connection *info; >> diff --git a/fs/ksmbd/transport_rdma.c b/fs/ksmbd/transport_rdma.c >> index e646d79554b8..70662b3bd590 100644 >> --- a/fs/ksmbd/transport_rdma.c >> +++ b/fs/ksmbd/transport_rdma.c >> @@ -42,7 +42,7 @@ >> /* SMB_DIRECT negotiation timeout in seconds */ >> #define SMB_DIRECT_NEGOTIATE_TIMEOUT 120 >> >> -#define SMB_DIRECT_MAX_SEND_SGES 8 >> +#define SMB_DIRECT_MAX_SEND_SGES 6 >> #define SMB_DIRECT_MAX_RECV_SGES 1 >> >> /* >> >> Thanks! >>> >>> Tom. >>> >> > ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: RDMA (smbdirect) testing 2022-05-23 15:05 ` Namjae Jeon 2022-05-23 16:05 ` Tom Talpey @ 2022-05-24 9:16 ` David Howells 2022-05-24 17:49 ` Steve French 1 sibling, 1 reply; 42+ messages in thread From: David Howells @ 2022-05-24 9:16 UTC (permalink / raw) To: Tom Talpey Cc: dhowells, Namjae Jeon, Long Li, Hyunchul Lee, Steve French, CIFS Is there some way for cifs to ask the RDMA layer what it supports? David ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: RDMA (smbdirect) testing 2022-05-24 9:16 ` David Howells @ 2022-05-24 17:49 ` Steve French 2022-05-24 18:12 ` Tom Talpey 0 siblings, 1 reply; 42+ messages in thread From: Steve French @ 2022-05-24 17:49 UTC (permalink / raw) To: David Howells; +Cc: Tom Talpey, Namjae Jeon, Long Li, Hyunchul Lee, CIFS Or alternatively - the "query network interfaces" ioctl - wonder if additional flags could be added (beyond what is defined in MS-SMB2 2.2.32.5) On Tue, May 24, 2022 at 4:16 AM David Howells <dhowells@redhat.com> wrote: > > Is there some way for cifs to ask the RDMA layer what it supports? > > David > -- Thanks, Steve ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: RDMA (smbdirect) testing 2022-05-24 17:49 ` Steve French @ 2022-05-24 18:12 ` Tom Talpey 0 siblings, 0 replies; 42+ messages in thread From: Tom Talpey @ 2022-05-24 18:12 UTC (permalink / raw) To: Steve French, David Howells; +Cc: Namjae Jeon, Long Li, Hyunchul Lee, CIFS On 5/24/2022 1:49 PM, Steve French wrote: > Or alternatively - the "query network interfaces" ioctl - wonder if > additional flags that could be added (beyond what is defined in > MS-SMB2 2.2.32.5 Huh? I see no reason to expose this over the wire, it's entirely a local device API question. Zero impact on the far side of the connection. > On Tue, May 24, 2022 at 4:16 AM David Howells <dhowells@redhat.com> wrote: >> >> Is there some way for cifs to ask the RDMA layer what it supports? Yes, and in fact the client and server already fetch it. It's a standard verbs device attibute. client fs/cifs/smbdirect.c: > if (info->id->device->attrs.max_send_sge < SMBDIRECT_MAX_SGE) { > log_rdma_event(ERR, > "warning: device max_send_sge = %d too small\n", > info->id->device->attrs.max_send_sge); > log_rdma_event(ERR, "Queue Pair creation may fail\n"); > } > if (info->id->device->attrs.max_recv_sge < SMBDIRECT_MAX_SGE) { > log_rdma_event(ERR, > "warning: device max_recv_sge = %d too small\n", > info->id->device->attrs.max_recv_sge); > log_rdma_event(ERR, "Queue Pair creation may fail\n"); > } server fs/ksmbd/transport_rdma.c: > > if (device->attrs.max_send_sge < SMB_DIRECT_MAX_SEND_SGES) { > pr_err("warning: device max_send_sge = %d too small\n", > device->attrs.max_send_sge); > return -EINVAL; > } > if (device->attrs.max_recv_sge < SMB_DIRECT_MAX_RECV_SGES) { > pr_err("warning: device max_recv_sge = %d too small\n", > device->attrs.max_recv_sge); > return -EINVAL; > } Tom. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: RDMA (smbdirect) testing 2022-05-23 13:45 ` Tom Talpey 2022-05-23 15:05 ` Namjae Jeon @ 2022-05-25 9:29 ` David Howells 2022-05-25 9:41 ` David Howells 2022-08-02 15:10 ` David Howells 3 siblings, 0 replies; 42+ messages in thread From: David Howells @ 2022-05-25 9:29 UTC (permalink / raw) To: Namjae Jeon Cc: dhowells, Tom Talpey, Long Li, Hyunchul Lee, Steve French, CIFS Namjae Jeon <linkinjeon@kernel.org> wrote: > diff --git a/fs/cifs/smbdirect.h b/fs/cifs/smbdirect.h > index a87fca82a796..7003722ab004 100644 > --- a/fs/cifs/smbdirect.h > +++ b/fs/cifs/smbdirect.h > @@ -226,7 +226,7 @@ struct smbd_buffer_descriptor_v1 { > } __packed; > > /* Default maximum number of SGEs in a RDMA send/recv */ > -#define SMBDIRECT_MAX_SGE 16 > +#define SMBDIRECT_MAX_SGE 6 > /* The context for a SMBD request */ > struct smbd_request { > struct smbd_connection *info; > diff --git a/fs/ksmbd/transport_rdma.c b/fs/ksmbd/transport_rdma.c > index e646d79554b8..70662b3bd590 100644 > --- a/fs/ksmbd/transport_rdma.c > +++ b/fs/ksmbd/transport_rdma.c > @@ -42,7 +42,7 @@ > /* SMB_DIRECT negotiation timeout in seconds */ > #define SMB_DIRECT_NEGOTIATE_TIMEOUT 120 > > -#define SMB_DIRECT_MAX_SEND_SGES 8 > +#define SMB_DIRECT_MAX_SEND_SGES 6 > #define SMB_DIRECT_MAX_RECV_SGES 1 > > /* With that, iWarp works for me. You can add: Tested-by: David Howells <dhowells@redhat.com> ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: RDMA (smbdirect) testing 2022-05-23 13:45 ` Tom Talpey 2022-05-23 15:05 ` Namjae Jeon 2022-05-25 9:29 ` David Howells @ 2022-05-25 9:41 ` David Howells 2022-05-25 10:00 ` Stefan Metzmacher 2022-05-25 10:20 ` David Howells 2022-08-02 15:10 ` David Howells 3 siblings, 2 replies; 42+ messages in thread From: David Howells @ 2022-05-25 9:41 UTC (permalink / raw) To: Steve French, Namjae Jeon Cc: dhowells, Tom Talpey, Long Li, Hyunchul Lee, CIFS > With that, iWarp works for me. You can add: Note also that whilst wireshark can decode iwarp traffic carrying NFS, it doesn't recognise iwarp traffic carrying cifs. David ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: RDMA (smbdirect) testing 2022-05-25 9:41 ` David Howells @ 2022-05-25 10:00 ` Stefan Metzmacher 2022-05-25 10:20 ` David Howells 1 sibling, 0 replies; 42+ messages in thread From: Stefan Metzmacher @ 2022-05-25 10:00 UTC (permalink / raw) To: David Howells, Steve French, Namjae Jeon Cc: Tom Talpey, Long Li, Hyunchul Lee, CIFS Hi David, > Note also that whilst wireshark can decode iwarp traffic carrying NFS, it > doesn't recognise iwarp traffic carrying cifs. It works fine for me. Can you share the capture file? What version of wireshark are you using? metze ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: RDMA (smbdirect) testing 2022-05-25 9:41 ` David Howells 2022-05-25 10:00 ` Stefan Metzmacher @ 2022-05-25 10:20 ` David Howells 2022-05-26 14:56 ` Stefan Metzmacher 1 sibling, 1 reply; 42+ messages in thread From: David Howells @ 2022-05-25 10:20 UTC (permalink / raw) To: Stefan Metzmacher Cc: dhowells, Steve French, Namjae Jeon, Tom Talpey, Long Li, Hyunchul Lee, CIFS [-- Attachment #1: Type: text/plain, Size: 223 bytes --] Stefan Metzmacher <metze@samba.org> wrote: > Can you share the capture file? See attached. > What version of wireshark are you using? wireshark-3.6.2-1.fc35.x86_64 Wireshark 3.7.0 (v3.7.0rc0-1686-g64dfed53330f) David [-- Attachment #2: pcap-rs.gz --] [-- Type: application/gzip, Size: 8560 bytes --] ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: RDMA (smbdirect) testing 2022-05-25 10:20 ` David Howells @ 2022-05-26 14:56 ` Stefan Metzmacher 2022-05-26 15:52 ` Tom Talpey 0 siblings, 1 reply; 42+ messages in thread From: Stefan Metzmacher @ 2022-05-26 14:56 UTC (permalink / raw) To: David Howells Cc: Steve French, Namjae Jeon, Tom Talpey, Long Li, Hyunchul Lee, CIFS Am 25.05.22 um 12:20 schrieb David Howells: > Stefan Metzmacher <metze@samba.org> wrote: > >> Can you share the capture file? > > See attached. > >> What version of wireshark are you using? > > wireshark-3.6.2-1.fc35.x86_64 > Wireshark 3.7.0 (v3.7.0rc0-1686-g64dfed53330f) Works fine for me with 3.6.2-2 on ubuntu and also a recent wireshark master version. I just fixed a minor problem with fragmented iwarp_ddp_rdma_send messages in frames 91-96, which seems to happen because ksmbd.ko negotiated a preferred send size of 8192, which cifs.ko accepted. Windows only uses 1364, which means each send operation fits into a single ethernet frame... See https://gitlab.com/wireshark/wireshark/-/merge_requests/7025 metze ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: RDMA (smbdirect) testing 2022-05-26 14:56 ` Stefan Metzmacher @ 2022-05-26 15:52 ` Tom Talpey 2022-05-27 8:27 ` Stefan Metzmacher 2022-05-27 11:46 ` David Howells 0 siblings, 2 replies; 42+ messages in thread From: Tom Talpey @ 2022-05-26 15:52 UTC (permalink / raw) To: Stefan Metzmacher, David Howells Cc: Steve French, Namjae Jeon, Long Li, Hyunchul Lee, CIFS On 5/26/2022 10:56 AM, Stefan Metzmacher wrote: > > Am 25.05.22 um 12:20 schrieb David Howells: >> Stefan Metzmacher <metze@samba.org> wrote: >> >>> Can you share the capture file? >> >> See attached. >> >>> What version of wireshark are you using? >> >> wireshark-3.6.2-1.fc35.x86_64 >> Wireshark 3.7.0 (v3.7.0rc0-1686-g64dfed53330f) > > Works fine for me with 3.6.2-2 on ubuntu and also a recent wireshark > master version. > > I just fixed a minor problem with fragmented iwrap_ddp_rdma_send > messages in > frames 91-96. Which seem to happen because ksmbd.ko negotiated a > preferred send size of 8192, > which is accepted by the cifs.ko. Windows only uses 1364, which means > each send operation fits into > a single ethernet frame... That was not an accident. :) There's a talk somewhere in which I mentioned how we tried to optimize the smbdirect fragment size and always landed back on 1364. However, this is only an implementation choice, the protocol supports a wide range. So it's appropriate for wireshark to be accommodating. > See https://gitlab.com/wireshark/wireshark/-/merge_requests/7025 I get a blank frame when I view both changes on this link. Do I need a gitlab account?? Tom. > > metze > ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: RDMA (smbdirect) testing 2022-05-26 15:52 ` Tom Talpey @ 2022-05-27 8:27 ` Stefan Metzmacher 2022-05-27 11:46 ` David Howells 1 sibling, 0 replies; 42+ messages in thread From: Stefan Metzmacher @ 2022-05-27 8:27 UTC (permalink / raw) To: Tom Talpey, David Howells Cc: Steve French, Namjae Jeon, Long Li, Hyunchul Lee, CIFS Hi Tom, >>>> Can you share the capture file? >>> >>> See attached. >>> >>>> What version of wireshark are you using? >>> >>> wireshark-3.6.2-1.fc35.x86_64 >>> Wireshark 3.7.0 (v3.7.0rc0-1686-g64dfed53330f) >> >> Works fine for me with 3.6.2-2 on ubuntu and also a recent wireshark master version. >> >> I just fixed a minor problem with fragmented iwrap_ddp_rdma_send messages in >> frames 91-96. Which seem to happen because ksmbd.ko negotiated a preferred send size of 8192, >> which is accepted by the cifs.ko. Windows only uses 1364, which means each send operation fits into >> a single ethernet frame... > > That was not an accident. :) > > There's a talk somewhere in which I mentioned how we tried to optimize > the smbdirect fragment size and always landed back on 1364. Yes, I remember 4096/3 :-) > However, this is only an implementation choice, the protocol supports a wide > range. So it's appropriate for wireshark to be accommodating. Yes, but the past 20 years showed more than once that, everything but matching Windows just leads to bugs, because of untested code paths. >> See https://gitlab.com/wireshark/wireshark/-/merge_requests/7025 > > I get a blank frame when I view both changes on this link. Do I need > a gitlab account?? I don't think so, I'm seeing it even without being logged in. Does https://gitlab.com/wireshark/wireshark/-/merge_requests/7025.patch or https://gitlab.com/wireshark/wireshark/-/merge_requests/7025/diffs work? metze ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: RDMA (smbdirect) testing 2022-05-26 15:52 ` Tom Talpey 2022-05-27 8:27 ` Stefan Metzmacher @ 2022-05-27 11:46 ` David Howells 2022-05-27 13:45 ` Stefan Metzmacher 2022-05-27 22:22 ` David Howells 1 sibling, 2 replies; 42+ messages in thread From: David Howells @ 2022-05-27 11:46 UTC (permalink / raw) To: Stefan Metzmacher Cc: dhowells, Tom Talpey, Steve French, Namjae Jeon, Long Li, Hyunchul Lee, CIFS Stefan Metzmacher <metze@samba.org> wrote: > Does https://gitlab.com/wireshark/wireshark/-/merge_requests/7025.patch I tried applying it (there appear to be two patches therein), but it doesn't seem to have any effect when viewing the pcap file I posted. It says it should decode the port as Artemis. David ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: RDMA (smbdirect) testing 2022-05-27 11:46 ` David Howells @ 2022-05-27 13:45 ` Stefan Metzmacher 2022-05-27 22:22 ` David Howells 0 siblings, 0 replies; 42+ messages in thread From: Stefan Metzmacher @ 2022-05-27 13:45 UTC (permalink / raw) To: David Howells Cc: Tom Talpey, Steve French, Namjae Jeon, Long Li, Hyunchul Lee, CIFS Am 27.05.22 um 13:46 schrieb David Howells: > Stefan Metzmacher <metze@samba.org> wrote: > >> Does https://gitlab.com/wireshark/wireshark/-/merge_requests/7025.patch > > I tried applying it (there appear to be two patches therein), but it doesn't > seem to have any effect when viewing the pcap file I posted. It says it > should decode the port as Artemis. I found that the tcp.try_heuristic_first preference needs to be set to TRUE. Check this: grep '^tcp' preferences tcp.try_heuristic_first: TRUE in ~/.config/wireshark/ (or ~/.wireshark/ as a fallback) metze ^ permalink raw reply [flat|nested] 42+ messages in thread
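Metze's preference tweak can be applied from the command line. This is a sketch against a scratch profile directory; for a real setup the preferences file lives in ~/.config/wireshark/ (or ~/.wireshark/ as a fallback), and the line format is exactly as shown in the mail above.

```shell
# Demo: write the heuristic-dissector preference into a scratch
# profile dir so wireshark tries heuristic dissectors (e.g. for
# smbdirect over iWARP on non-default ports) before port-based ones.
PROFILE=./wireshark-profile-demo
mkdir -p "$PROFILE"
# Overwrite for the demo; append (>>) when adding to an existing file.
printf 'tcp.try_heuristic_first: TRUE\n' > "$PROFILE/preferences"
grep '^tcp\.try_heuristic_first' "$PROFILE/preferences"
```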
* Re: RDMA (smbdirect) testing 2022-05-27 11:46 ` David Howells 2022-05-27 13:45 ` Stefan Metzmacher @ 2022-05-27 22:22 ` David Howells 1 sibling, 0 replies; 42+ messages in thread From: David Howells @ 2022-05-27 22:22 UTC (permalink / raw) To: Stefan Metzmacher Cc: dhowells, Tom Talpey, Steve French, Namjae Jeon, Long Li, Hyunchul Lee, CIFS Stefan Metzmacher <metze@samba.org> wrote: > I found that it's needed to set the > tcp.try_heuristic_first preference to TRUE That works, thanks. Tested-by: David Howells <dhowells@redhat.com> ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: RDMA (smbdirect) testing 2022-05-23 13:45 ` Tom Talpey ` (2 preceding siblings ...) 2022-05-25 9:41 ` David Howells @ 2022-08-02 15:10 ` David Howells 2022-08-03 0:55 ` Namjae Jeon 3 siblings, 1 reply; 42+ messages in thread From: David Howells @ 2022-08-02 15:10 UTC (permalink / raw) To: Namjae Jeon Cc: dhowells, Tom Talpey, Long Li, Hyunchul Lee, Steve French, CIFS David Howells <dhowells@redhat.com> wrote: > With that, iWarp works for me. You can add: > > Tested-by: David Howells <dhowells@redhat.com> Would it be possible to get this pushed upstream as a stopgap fix? iWarp still doesn't work in 5.19 with cifs. David ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: RDMA (smbdirect) testing 2022-08-02 15:10 ` David Howells @ 2022-08-03 0:55 ` Namjae Jeon 2022-08-03 2:36 ` Namjae Jeon 2022-08-03 6:16 ` David Howells 0 siblings, 2 replies; 42+ messages in thread From: Namjae Jeon @ 2022-08-03 0:55 UTC (permalink / raw) To: David Howells; +Cc: Tom Talpey, Long Li, Hyunchul Lee, Steve French, CIFS 2022-08-03 0:10 GMT+09:00, David Howells <dhowells@redhat.com>: > David Howells <dhowells@redhat.com> wrote: > >> With that, iWarp works for me. You can add: >> >> Tested-by: David Howells <dhowells@redhat.com> > > Would it be possible to get this pushed upstream as a stopgap fix? iWarp > still doesn't work in 5.19 with cifs. I will try this. Thanks. > > David > > ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: RDMA (smbdirect) testing 2022-08-03 0:55 ` Namjae Jeon @ 2022-08-03 2:36 ` Namjae Jeon 2022-08-03 6:16 ` David Howells 1 sibling, 0 replies; 42+ messages in thread From: Namjae Jeon @ 2022-08-03 2:36 UTC (permalink / raw) To: David Howells, Long Li; +Cc: Tom Talpey, Hyunchul Lee, Steve French, CIFS 2022-08-03 9:55 GMT+09:00, Namjae Jeon <linkinjeon@kernel.org>: > 2022-08-03 0:10 GMT+09:00, David Howells <dhowells@redhat.com>: >> David Howells <dhowells@redhat.com> wrote: >> >>> With that, iWarp works for me. You can add: >>> >>> Tested-by: David Howells <dhowells@redhat.com> >> >> Would it be possible to get this pushed upstream as a stopgap fix? iWarp >> still doesn't work in 5.19 with cifs. > I will try this. I have updated the patch, so I didn't add your tested-by tag for the previous test patch. Long said to change that value to 8. But max_sge in cifs needs to be set to 6 for sw-iWARP. I wonder if there is a problem with values lower than 8... Thanks. > > Thanks. >> >> David >> >> > ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: RDMA (smbdirect) testing 2022-08-03 0:55 ` Namjae Jeon 2022-08-03 2:36 ` Namjae Jeon @ 2022-08-03 6:16 ` David Howells 1 sibling, 0 replies; 42+ messages in thread From: David Howells @ 2022-08-03 6:16 UTC (permalink / raw) To: Namjae Jeon Cc: dhowells, Long Li, Tom Talpey, Hyunchul Lee, Steve French, CIFS Namjae Jeon <linkinjeon@kernel.org> wrote: > Long said to change that value to 8. But max_sge in cifs need to be > set to 6 for sw-iWARP . I wonder if there is a problem with values > lower than 8... Looking at the code, I think 5 might suffice. David ^ permalink raw reply [flat|nested] 42+ messages in thread
[parent not found: <747882.1653311226@warthog.procyon.org.uk>]
* Re: RDMA (smbdirect) testing [not found] ` <747882.1653311226@warthog.procyon.org.uk> @ 2022-05-23 13:37 ` Tom Talpey 0 siblings, 0 replies; 42+ messages in thread From: Tom Talpey @ 2022-05-23 13:37 UTC (permalink / raw) To: David Howells, Namjae Jeon; +Cc: Long Li, Hyunchul Lee, Steve French, CIFS On 5/23/2022 9:07 AM, David Howells wrote: > Namjae Jeon <linkinjeon@kernel.org> wrote: > >> No. I was able to reproduce the same problem that David reported. I >> and Hyunchul will take a look. I also confirmed that RDMA work well >> without any problems with soft-ROCE. Until this problem is fixed, I'd >> like to say David to use soft-ROCE. > > I managed to set up soft-RoCE and get cifs working over that. > > One thing I'm not sure about: on my server, I see the roce device with > ibv_devices, but on my test client, I don't. I'm wondering if there's > something I need to configure to get that. Probably not, but I'd recommend not wading into that if your testing is going OK. If so, send a report to linux-rdma and let them know. We are told that Red Hat is removing SoftRoCE from RHEL9; this too might affect your future testing plans... Tom. > I've attached the config for reference. Note that I build a monolithic kernel > so that I can PXE boot it directly out of the build dir rather than building > and installing a fresh rpm each time. > > David ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: RDMA (smbdirect) testing 2022-05-21 11:54 ` Tom Talpey 2022-05-22 23:06 ` Namjae Jeon [not found] ` <747882.1653311226@warthog.procyon.org.uk> @ 2022-05-23 14:03 ` Stefan Metzmacher 2022-05-25 9:35 ` David Howells 3 siblings, 0 replies; 42+ messages in thread From: Stefan Metzmacher @ 2022-05-23 14:03 UTC (permalink / raw) To: Tom Talpey, David Howells, Long Li, Namjae Jeon Cc: Hyunchul Lee, Steve French, CIFS On 21.05.22 13:54, Tom Talpey wrote: > > On 5/20/2022 2:12 PM, David Howells wrote: >> Tom Talpey <tom@talpey.com> wrote: >> >>> SoftROCE is a bit of a hot mess in upstream right now. It's >>> getting a lot of attention, but it's still pretty shaky. >>> If you're testing, I'd STRONGLY recommend SoftiWARP. >> >> I'm having problems getting that working. I'm setting the client up with: >> >> rdma link add siw0 type siw netdev enp6s0 >> mount //192.168.6.1/scratch /xfstest.scratch -o rdma,user=shares,pass=... >> >> and then see: >> >> CIFS: Attempting to mount \\192.168.6.1\scratch >> CIFS: VFS: _smbd_get_connection:1513 warning: device max_send_sge = 6 too small >> CIFS: VFS: _smbd_get_connection:1516 Queue Pair creation may fail >> CIFS: VFS: _smbd_get_connection:1519 warning: device max_recv_sge = 6 too small >> CIFS: VFS: _smbd_get_connection:1522 Queue Pair creation may fail >> CIFS: VFS: _smbd_get_connection:1559 rdma_create_qp failed -22 >> CIFS: VFS: _smbd_get_connection:1513 warning: device max_send_sge = 6 too small >> CIFS: VFS: _smbd_get_connection:1516 Queue Pair creation may fail >> CIFS: VFS: _smbd_get_connection:1519 warning: device max_recv_sge = 6 too small >> CIFS: VFS: _smbd_get_connection:1522 Queue Pair creation may fail >> CIFS: VFS: _smbd_get_connection:1559 rdma_create_qp failed -22 >> CIFS: VFS: cifs_mount failed w/return code = -2 >> >> in dmesg.
>> >> Problem is, I don't know what to do about it:-/ > > It looks like the client is hardcoding 16 sge's, and has no option to > configure a smaller value, or reduce its requested number. That's bad, > because providers all have their own limits - and SIW_MAX_SGE is 6. I > thought I'd seen this working (metze?), but either the code changed or > someone built a custom version. No, I only tested with my smbdirect driver and there I don't have such a problem. I never got cifs.ko to work with smbdirect. metze ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: RDMA (smbdirect) testing 2022-05-21 11:54 ` Tom Talpey ` (2 preceding siblings ...) 2022-05-23 14:03 ` Stefan Metzmacher @ 2022-05-25 9:35 ` David Howells 3 siblings, 0 replies; 42+ messages in thread From: David Howells @ 2022-05-25 9:35 UTC (permalink / raw) To: Stefan Metzmacher Cc: dhowells, Tom Talpey, Long Li, Namjae Jeon, Hyunchul Lee, Steve French, CIFS Stefan Metzmacher <metze@samba.org> wrote: > I never got cifs.ko to work with smbdirect. It works for me. I think you need the following modules: modprobe rdma_cm modprobe ib_umad modprobe siw # softiwarp modprobe rdma_rxe # softroce I do: rdma link add siw0 type siw netdev enp2s0 for softiwarp or: rdma link add rxe0 type rxe netdev enp6s0 for softroce on the client before doing: mount //192.168.6.1/test /xfstest.test -o rdma,user=shares,pass=... David ^ permalink raw reply [flat|nested] 42+ messages in thread
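Collected in one place, the client-side SoftiWARP bring-up David describes looks roughly like the script below. The interface name, server address, and share are taken from his examples; it needs root and the matching kernel config (CONFIG_RDMA_SIW, CONFIG_CIFS_SMB_DIRECT), so treat it as a configuration sketch rather than something to run blindly.

```shell
#!/bin/sh
# Client-side SoftiWARP bring-up, as described above. Run as root.
set -e
modprobe rdma_cm
modprobe ib_umad
modprobe siw                               # softiwarp
rdma link add siw0 type siw netdev enp2s0  # bind to the client NIC
rdma link show                             # sanity check: siw0 should appear
mount -t cifs //192.168.6.1/test /xfstest.test \
      -o rdma,user=shares,pass=...
```

For SoftRoCE, substitute `modprobe rdma_rxe` and `rdma link add rxe0 type rxe netdev <nic>` for the siw lines.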
* Re: RDMA (smbdirect) testing 2022-05-19 20:41 RDMA (smbdirect) testing Steve French 2022-05-19 23:06 ` Namjae Jeon @ 2022-05-20 6:20 ` David Howells 2022-05-20 8:37 ` Namjae Jeon 2022-05-24 20:12 ` David Howells 2022-05-27 10:33 ` UAF in smbd_reconnect() when using softIWarp David Howells 3 siblings, 1 reply; 42+ messages in thread From: David Howells @ 2022-05-20 6:20 UTC (permalink / raw) To: Namjae Jeon; +Cc: dhowells, Steve French, Hyeoncheol Lee, CIFS, Long Li Namjae Jeon <linkinjeon@kernel.org> wrote: > You seem to be asking about soft-ROCE(or soft-iWARP). Hyunchul had been > testing RDMA of ksmbd with it before. Yep. I don't have any RDMA-capable cards. I have managed to use soft-IWarp with NFS. Also, if you know how to make Samba do RDMA, that would be useful. Thanks, David ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: RDMA (smbdirect) testing 2022-05-20 6:20 ` David Howells @ 2022-05-20 8:37 ` Namjae Jeon 0 siblings, 0 replies; 42+ messages in thread From: Namjae Jeon @ 2022-05-20 8:37 UTC (permalink / raw) To: David Howells; +Cc: Steve French, Hyeoncheol Lee, CIFS, Long Li 2022-05-20 15:20 GMT+09:00, David Howells <dhowells@redhat.com>: > Namjae Jeon <linkinjeon@kernel.org> wrote: > >> You seem to be asking about soft-ROCE(or soft-iWARP). Hyunchul had been >> testing RDMA of ksmbd with it before. > > Yep. I don't have any RDMA-capable cards. I have managed to use > soft-IWarp > with NFS. Ah, I have the real hw, so I haven't tried it. I'll try it this time. > > Also, if you know how to make Samba do RDMA, that would be useful. As far as I know, Samba doesn't support it yet; it's said to be in progress. Thanks! > > Thanks, > David > > ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: RDMA (smbdirect) testing 2022-05-19 20:41 RDMA (smbdirect) testing Steve French 2022-05-19 23:06 ` Namjae Jeon 2022-05-20 6:20 ` David Howells @ 2022-05-24 20:12 ` David Howells 2022-05-27 10:33 ` UAF in smbd_reconnect() when using softIWarp David Howells 3 siblings, 0 replies; 42+ messages in thread From: David Howells @ 2022-05-24 20:12 UTC (permalink / raw) To: Steve French; +Cc: dhowells, Namjae Jeon, Hyeoncheol Lee, CIFS, Long Li Okay - I got it working; somewhat at least. Now to take out the mass of print statements. David ^ permalink raw reply [flat|nested] 42+ messages in thread
* UAF in smbd_reconnect() when using softIWarp 2022-05-19 20:41 RDMA (smbdirect) testing Steve French ` (2 preceding siblings ...) 2022-05-24 20:12 ` David Howells @ 2022-05-27 10:33 ` David Howells 3 siblings, 0 replies; 42+ messages in thread From: David Howells @ 2022-05-27 10:33 UTC (permalink / raw) To: Steve French Cc: dhowells, Namjae Jeon, Hyeoncheol Lee, CIFS, Long Li, Tom Talpey Hi Steve, I switch to using the softIWarp driver as there's a deadlock in the softRoCE driver. However, this comes up with a repeatable UAF detected by KASAN. The RDMA link was brought up with: rdma link add siw0 type siw netdev enp6s0 and then I started running xfstests with -g quick. MOUNT_OPTIONS -- -ordma,username=shares,password=...,mfsymlinks -o context=system_u:object_r:root_t:s0 //carina/scratch /xfstest.scratch The kernel was v5.18 + iwarp SGE patch. The KASAN splat is attached. Some decoded bits: smbd_reconnect+0xba/0x1a6 smbd_reconnect (fs/cifs/smbdirect.c:1427): if (server->smbd_conn->transport_status == SMBD_CONNECTED) { _smbd_get_connection+0xce/0x1367 _smbd_get_connection (fs/cifs/smbdirect.c:1530): info = kzalloc(sizeof(struct smbd_connection), GFP_KERNEL); smbd_destroy+0x852/0x899 smbd_destroy (fs/cifs/smbdirect.c:1323): (probably the kfree at the end on line 1407) __cifs_reconnect+0x315/0x4b3 __cifs_reconnect (fs/cifs/connect.c:311 fs/cifs/connect.c:358) smbd_destroy(server); David --- run fstests generic/005 at 2022-05-27 11:18:41 run fstests generic/006 at 2022-05-27 11:18:51 CIFS: VFS: smbd_recv_buf:1889 disconnected ================================================================== BUG: KASAN: use-after-free in smbd_reconnect+0xba/0x1a6 Read of size 4 at addr ffff88813029e000 by task cifsd/4509 CPU: 2 PID: 4509 Comm: cifsd Not tainted 5.18.0-build2+ #467 Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014 Call Trace: <TASK> dump_stack_lvl+0x45/0x59 print_address_description.constprop.0+0x1f/0x2ce ? smbd_reconnect+0xba/0x1a6 print_report+0xf0/0x1d6 ? 
smbd_reconnect+0xba/0x1a6 ? do_raw_spin_lock+0x13a/0x17b ? smbd_reconnect+0xba/0x1a6 kasan_report+0x81/0xa1 ? smbd_reconnect+0xba/0x1a6 smbd_reconnect+0xba/0x1a6 __cifs_reconnect+0x351/0x4b3 ? cifs_mark_tcp_ses_conns_for_reconnect+0x1b3/0x1b3 ? __raw_spin_lock_init+0x85/0x85 cifs_readv_from_socket+0x29a/0x2f4 cifs_read_from_socket+0x95/0xc5 ? cifs_readv_from_socket+0x2f4/0x2f4 ? cifs_small_buf_get+0x50/0x5d ? allocate_buffers+0xfb/0x186 cifs_demultiplex_thread+0x19b/0xb64 ? cifs_handle_standard+0x27e/0x27e ? lock_downgrade+0xad/0xad ? rcu_read_lock_bh_held+0xab/0xab ? pci_mmcfg_check_reserved+0xbd/0xbd ? preempt_count_sub+0x18/0xba ? _raw_spin_unlock_irqrestore+0x39/0x4c ? cifs_handle_standard+0x27e/0x27e kthread+0x164/0x173 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 </TASK> Allocated by task 4505: stack_trace_save+0x8f/0xbe kasan_save_stack+0x1e/0x39 kasan_set_track+0x21/0x26 ____kasan_kmalloc+0x68/0x72 kmem_cache_alloc_trace+0x121/0x162 _smbd_get_connection+0xce/0x1367 smbd_get_connection+0x21/0x3e cifs_get_tcp_session.part.0+0x853/0xbda mount_get_conns+0x51/0x594 cifs_mount+0x8d/0x279 cifs_smb3_do_mount+0x186/0x471 smb3_get_tree+0x58/0x91 vfs_get_tree+0x46/0x150 do_new_mount+0x19f/0x2c9 path_mount+0x6a5/0x6e3 do_mount+0x9e/0xe1 __do_sys_mount+0x150/0x17c do_syscall_64+0x39/0x46 entry_SYSCALL_64_after_hwframe+0x44/0xae Freed by task 4509: stack_trace_save+0x8f/0xbe kasan_save_stack+0x1e/0x39 kasan_set_track+0x21/0x26 kasan_set_free_info+0x20/0x2f ____kasan_slab_free+0xad/0xc9 kfree+0x125/0x14b smbd_destroy+0x852/0x899 __cifs_reconnect+0x315/0x4b3 cifs_readv_from_socket+0x29a/0x2f4 cifs_read_from_socket+0x95/0xc5 cifs_demultiplex_thread+0x19b/0xb64 kthread+0x164/0x173 ret_from_fork+0x1f/0x30 Last potentially related work creation: stack_trace_save+0x8f/0xbe kasan_save_stack+0x1e/0x39 __kasan_record_aux_stack+0x62/0x68 insert_work+0x30/0xaf __queue_work+0x4b9/0x4dc queue_work_on+0x4d/0x67 __ib_process_cq+0x219/0x268 ib_poll_handler+0x3f/0x14c 
irq_poll_softirq+0xd8/0x1ab __do_softirq+0x202/0x489 Second to last potentially related work creation: stack_trace_save+0x8f/0xbe kasan_save_stack+0x1e/0x39 __kasan_record_aux_stack+0x62/0x68 insert_work+0x30/0xaf __queue_work+0x4b9/0x4dc queue_work_on+0x4d/0x67 recv_done+0x16f/0x727 __ib_process_cq+0x219/0x268 ib_poll_handler+0x3f/0x14c irq_poll_softirq+0xd8/0x1ab __do_softirq+0x202/0x489 The buggy address belongs to the object at ffff88813029e000 which belongs to the cache kmalloc-4k of size 4096 The buggy address is located 0 bytes inside of 4096-byte region [ffff88813029e000, ffff88813029f000) The buggy address belongs to the physical page: page:0000000001f91160 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x13029e head:0000000001f91160 order:1 compound_mapcount:0 compound_pincount:0 flags: 0x200000000010200(slab|head|node=0|zone=2) raw: 0200000000010200 ffffea0004c06d08 ffffea0004c0a288 ffff888100040900 raw: 0000000000000000 ffff88813029e000 0000000100000001 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88813029df00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88813029df80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff88813029e000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff88813029e080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88813029e100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== Disabling lock debugging due to kernel taint CIFS: VFS: RDMA transport re-established CIFS: VFS: smbd_recv_buf:1889 disconnected ^ permalink raw reply [flat|nested] 42+ messages in thread
end of thread, other threads:[~2022-08-03 6:16 UTC | newest] Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-05-19 20:41 RDMA (smbdirect) testing Steve French 2022-05-19 23:06 ` Namjae Jeon 2022-05-20 6:01 ` Hyunchul Lee 2022-05-20 18:03 ` Tom Talpey 2022-05-20 18:12 ` David Howells 2022-05-21 11:54 ` Tom Talpey 2022-05-22 23:06 ` Namjae Jeon 2022-05-23 13:45 ` Tom Talpey 2022-05-23 15:05 ` Namjae Jeon 2022-05-23 16:05 ` Tom Talpey 2022-05-23 19:17 ` Long Li 2022-05-24 1:01 ` Namjae Jeon 2022-05-24 21:08 ` Long Li 2022-06-02 23:32 ` Namjae Jeon 2022-06-03 0:07 ` Long Li 2022-06-07 17:26 ` Tom Talpey 2022-06-07 22:25 ` Namjae Jeon 2022-05-24 0:59 ` Namjae Jeon 2022-05-24 9:16 ` David Howells 2022-05-24 17:49 ` Steve French 2022-05-24 18:12 ` Tom Talpey 2022-05-25 9:29 ` David Howells 2022-05-25 9:41 ` David Howells 2022-05-25 10:00 ` Stefan Metzmacher 2022-05-25 10:20 ` David Howells 2022-05-26 14:56 ` Stefan Metzmacher 2022-05-26 15:52 ` Tom Talpey 2022-05-27 8:27 ` Stefan Metzmacher 2022-05-27 11:46 ` David Howells 2022-05-27 13:45 ` Stefan Metzmacher 2022-05-27 22:22 ` David Howells 2022-08-02 15:10 ` David Howells 2022-08-03 0:55 ` Namjae Jeon 2022-08-03 2:36 ` Namjae Jeon 2022-08-03 6:16 ` David Howells [not found] ` <747882.1653311226@warthog.procyon.org.uk> 2022-05-23 13:37 ` Tom Talpey 2022-05-23 14:03 ` Stefan Metzmacher 2022-05-25 9:35 ` David Howells 2022-05-20 6:20 ` David Howells 2022-05-20 8:37 ` Namjae Jeon 2022-05-24 20:12 ` David Howells 2022-05-27 10:33 ` UAF in smbd_reconnect() when using softIWarp David Howells