* [PATCH RFC] Frequent connection loss using krb5[ip] NFS mounts
From: Chuck Lever @ 2020-05-01 19:04 UTC (permalink / raw)
  To: linux-nfs, linux-crypto

Hi-

Over the past year or so I've observed that using sec=krb5i or
sec=krb5p with NFS/RDMA while testing my Linux NFS client and server
results in frequent connection loss. I've been able to pin this down
a bit.

The problem is that SUNRPC's calls into the kernel crypto functions
can sometimes reschedule. If that happens between the time the
server receives an RPC Call and the time it has unwrapped the Call's
GSS sequence number, the number is very likely to have fallen
outside the GSS context's sequence number window. When that happens,
the server is required to drop the Call, and therefore also the
connection.
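
For reference, the server's window check works roughly like the
sketch below. It is modeled on gss_check_seq_num() in
net/sunrpc/auth_gss/svcauth_gss.c, but the names and details here
are illustrative and vary by kernel version. The failure mode is
the "below the window" case: if later Calls advance sd_max by more
than GSS_SEQ_WIN while one Call is stalled in a crypto call, the
stalled Call's sequence number drops below the window.

#define GSS_SEQ_WIN	128

struct gss_svc_seq_data {
	u32		sd_max;	/* highest sequence number seen so far */
	unsigned long	sd_win[GSS_SEQ_WIN / BITS_PER_LONG]; /* seen bitmap */
	spinlock_t	sd_lock;
};

/* Returns false when the server must drop the Call. */
static bool gss_seq_num_ok(struct gss_svc_seq_data *sd, u32 seq_num)
{
	bool ok = true;

	spin_lock(&sd->sd_lock);
	if (seq_num > sd->sd_max) {
		/* New highest number: slide the window forward,
		 * clearing bits for numbers that fall off the back. */
		if (seq_num - sd->sd_max >= GSS_SEQ_WIN) {
			memset(sd->sd_win, 0, sizeof(sd->sd_win));
			sd->sd_max = seq_num;
		} else {
			while (sd->sd_max < seq_num) {
				sd->sd_max++;
				__clear_bit(sd->sd_max % GSS_SEQ_WIN,
					    sd->sd_win);
			}
		}
		__set_bit(seq_num % GSS_SEQ_WIN, sd->sd_win);
	} else if (seq_num + GSS_SEQ_WIN <= sd->sd_max) {
		/* Below the window: later Calls advanced sd_max past
		 * this one while it was delayed. */
		ok = false;
	} else if (__test_and_set_bit(seq_num % GSS_SEQ_WIN, sd->sd_win)) {
		/* Within the window but already seen: a replay. */
		ok = false;
	}
	spin_unlock(&sd->sd_lock);
	return ok;
}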

I've tested this observation by applying the following naive patch to
both my client and server, and got the following results.

I. Is this an effective fix?

With sec=krb5p,proto=rdma, I ran a 12-thread git regression test
(cd git/; make -j12 test).

Without this patch on the server, over 3,000 connection loss events
were observed. With it, the test runs on a single connection. Clearly
the server needs some kind of mitigation in this area.


II. Does the fix cause a performance regression?

Because this patch affects both the client and server paths, I
measured performance with the patch applied in each combination of
client and server, using both sec=krb5 (which checksums just the
RPC message header) and sec=krb5i (which checksums the whole RPC
message).

fio 8KiB 70% read, 30% write for 30 seconds, NFSv3 on RDMA.
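
The job file looked roughly like this (a reconstruction -- the I/O
engine, queue depth, and file size are assumptions, not the exact
settings used):

; hypothetical fio job approximating the workload described above
[krb5-rw]
directory=/mnt
ioengine=libaio
iodepth=16
direct=1
bs=8k
rw=randrw
rwmixread=70
runtime=30
time_based
size=1g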

-- krb5 --

unpatched client and server:
Connect count: 3
read: IOPS=84.3k, 50.00th=[ 1467], 99.99th=[10028] 
write: IOPS=36.1k, 50.00th=[ 1565], 99.99th=[20579]

patched client, unpatched server:
Connect count: 2
read: IOPS=75.4k, 50.00th=[ 1647], 99.99th=[ 7111]
write: IOPS=32.3k, 50.00th=[ 1745], 99.99th=[ 7439]

unpatched client, patched server:
Connect count: 1
read: IOPS=84.1k, 50.00th=[ 1467], 99.99th=[ 8717]
write: IOPS=36.1k, 50.00th=[ 1582], 99.99th=[ 9241]

patched client and server:
Connect count: 1
read: IOPS=74.9k, 50.00th=[ 1663], 99.99th=[ 7046]
write: IOPS=31.0k, 50.00th=[ 1762], 99.99th=[ 7242]

-- krb5i --

unpatched client and server:
Connect count: 6
read: IOPS=35.8k, 50.00th=[ 3228], 99.99th=[49546]
write: IOPS=15.4k, 50.00th=[ 3294], 99.99th=[50594]

patched client, unpatched server:
Connect count: 5
read: IOPS=36.3k, 50.00th=[ 3228], 99.99th=[14877]
write: IOPS=15.5k, 50.00th=[ 3294], 99.99th=[15139]

unpatched client, patched server:
Connect count: 3
read: IOPS=35.7k, 50.00th=[ 3228], 99.99th=[15926]
write: IOPS=15.2k, 50.00th=[ 3294], 99.99th=[15926]

patched client and server:
Connect count: 3
read: IOPS=36.3k, 50.00th=[ 3195], 99.99th=[15139]
write: IOPS=15.5k, 50.00th=[ 3261], 99.99th=[15270]


The good news:
Both results show that I/O tail latency improves significantly when
either the client or the server has this patch applied.

The bad news:
The krb5 performance result shows an IOPS regression when the client
has this patch applied.


So now I'm trying to find a solution that prevents the
rescheduling/connection-loss issue without also causing a
performance regression.

Any thoughts/comments/advice appreciated.

---

Chuck Lever (1):
      SUNRPC: crypto calls should never schedule

--
Chuck Lever


* [PATCH RFC] SUNRPC: crypto calls should never schedule
From: Chuck Lever @ 2020-05-01 19:04 UTC (permalink / raw)
  To: linux-nfs, linux-crypto

Do not allow the crypto core to reschedule while processing RPC
requests. gss_krb5_crypto.c does make crypto API calls in process
context. However:

- When handling a received Call, rescheduling can delay processing
long enough that the Call falls outside the GSS sequence number
window. When that occurs, the server is required to drop that Call
and the connection on which it arrived.

- On the client side, RPC tasks are ordinarily not supposed to sleep
or reschedule outside the RPC scheduler; otherwise there is a risk
of deadlock.

Generally speaking, the risk of removing the cond_resched() is low.
A block of data handled in these paths will not exceed 1MB + a
little overhead, and processing a single RPC is already
appropriately interleaved amongst both processes and CPUs.
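
For context, the reschedule points come from the crypto layer's
yield helper, which at this point amounts to the following
(paraphrasing include/crypto/algapi.h; check your tree):

	/* Called by the crypto walk helpers between chunks; clearing
	 * CRYPTO_TFM_REQ_MAY_SLEEP in the request flags therefore
	 * removes these cond_resched() calls. */
	static inline void crypto_yield(u32 flags)
	{
		if (flags & CRYPTO_TFM_REQ_MAY_SLEEP)
			cond_resched();
	}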

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/sunrpc/auth_gss/gss_krb5_crypto.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/sunrpc/auth_gss/gss_krb5_crypto.c b/net/sunrpc/auth_gss/gss_krb5_crypto.c
index e7180da1fc6a..083438f73e52 100644
--- a/net/sunrpc/auth_gss/gss_krb5_crypto.c
+++ b/net/sunrpc/auth_gss/gss_krb5_crypto.c
@@ -209,7 +209,7 @@ make_checksum_hmac_md5(struct krb5_ctx *kctx, char *header, int hdrlen,
 	if (!req)
 		goto out_free_hmac_md5;
 
-	ahash_request_set_callback(req, CRYPTO_TFM_REQ_MAY_SLEEP, NULL, NULL);
+	ahash_request_set_callback(req, 0, NULL, NULL);
 
 	err = crypto_ahash_init(req);
 	if (err)
@@ -239,7 +239,7 @@ make_checksum_hmac_md5(struct krb5_ctx *kctx, char *header, int hdrlen,
 	if (!req)
 		goto out_free_hmac_md5;
 
-	ahash_request_set_callback(req, CRYPTO_TFM_REQ_MAY_SLEEP, NULL, NULL);
+	ahash_request_set_callback(req, 0, NULL, NULL);
 
 	err = crypto_ahash_setkey(hmac_md5, cksumkey, kctx->gk5e->keylength);
 	if (err)
@@ -307,7 +307,7 @@ make_checksum(struct krb5_ctx *kctx, char *header, int hdrlen,
 	if (!req)
 		goto out_free_ahash;
 
-	ahash_request_set_callback(req, CRYPTO_TFM_REQ_MAY_SLEEP, NULL, NULL);
+	ahash_request_set_callback(req, 0, NULL, NULL);
 
 	checksumlen = crypto_ahash_digestsize(tfm);
 
@@ -403,7 +403,7 @@ make_checksum_v2(struct krb5_ctx *kctx, char *header, int hdrlen,
 	if (!req)
 		goto out_free_ahash;
 
-	ahash_request_set_callback(req, CRYPTO_TFM_REQ_MAY_SLEEP, NULL, NULL);
+	ahash_request_set_callback(req, 0, NULL, NULL);
 
 	err = crypto_ahash_setkey(tfm, cksumkey, kctx->gk5e->keylength);
 	if (err)



* Re: [PATCH RFC] Frequent connection loss using krb5[ip] NFS mounts
From: J. Bruce Fields @ 2020-05-04 19:07 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs, linux-crypto

On Fri, May 01, 2020 at 03:04:00PM -0400, Chuck Lever wrote:
> Over the past year or so I've observed that using sec=krb5i or
> sec=krb5p with NFS/RDMA while testing my Linux NFS client and server
> results in frequent connection loss. I've been able to pin this down
> a bit.
> 
> The problem is that SUNRPC's calls into the kernel crypto functions
> can sometimes reschedule. If that happens between the time the
> server receives an RPC Call and the time it has unwrapped the Call's
> GSS sequence number, the number is very likely to have fallen
> outside the GSS context's sequence number window. When that happens,
> the server is required to drop the Call, and therefore also the
> connection.

Does it help to increase GSS_SEQ_WIN?  I think you could even increase
it by a couple orders of magnitude without getting into higher-order
allocations.
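
(For reference, that's this constant in
net/sunrpc/auth_gss/svcauth_gss.c, where the window is a small
bitmap embedded in the per-context sequence data. A hypothetical
bump of the sort I mean would be:

-#define GSS_SEQ_WIN	128
+#define GSS_SEQ_WIN	12800

i.e. 100x, growing the bitmap from 16 bytes to 1600 bytes -- still
well under a page -- as long as the value stays a multiple of
BITS_PER_LONG.)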

--b.



* Re: [PATCH RFC] Frequent connection loss using krb5[ip] NFS mounts
From: Chuck Lever @ 2020-05-04 19:10 UTC (permalink / raw)
  To: Bruce Fields; +Cc: Linux NFS Mailing List, linux-crypto



> On May 4, 2020, at 3:07 PM, bfields@fieldses.org wrote:
> 
> On Fri, May 01, 2020 at 03:04:00PM -0400, Chuck Lever wrote:
>> Over the past year or so I've observed that using sec=krb5i or
>> sec=krb5p with NFS/RDMA while testing my Linux NFS client and server
>> results in frequent connection loss. I've been able to pin this down
>> a bit.
>> 
>> The problem is that SUNRPC's calls into the kernel crypto functions
>> can sometimes reschedule. If that happens between the time the
>> server receives an RPC Call and the time it has unwrapped the Call's
>> GSS sequence number, the number is very likely to have fallen
>> outside the GSS context's sequence number window. When that happens,
>> the server is required to drop the Call, and therefore also the
>> connection.
> 
> Does it help to increase GSS_SEQ_WIN?  I think you could even increase
> it by a couple orders of magnitude without getting into higher-order
> allocations.

Increasing GSS_SEQ_WIN reduces the frequency of connection loss events,
but does not close the race window. It's a workaround -- the server
scheduler can delay incoming requests arbitrarily, so there will
always be a race opportunity here unless incoming requests are heavily
serialized. For a sense of scale: at the ~50k combined IOPS of the
krb5i runs above, a 128-entry window is overtaken in under 3
milliseconds, so even a short reschedule can leave a stalled Call
below the window.



--
Chuck Lever





