Discussion:
[PATCH] crypto_memcmp: add constant-time memcmp
James Yonan
2013-09-10 18:38:11 UTC
When comparing MAC hashes, AEAD authentication tags, or other hash
values in the context of authentication or integrity checking, it
is important not to leak timing information to a potential attacker.

Bytewise memory comparisons (such as memcmp) are usually optimized so
that they return a nonzero value as soon as a mismatch is found.
This early-return behavior can leak timing information, allowing an
attacker to iteratively guess the correct result.

This patch adds a new method crypto_memcmp that has the same prototype
as standard memcmp, but that compares strings of the same length in
roughly constant time (cache misses could change the timing, but
since they don't reveal information about the content of the strings
being compared, they are effectively benign). Note that crypto_memcmp
(unlike memcmp) can only be used to test for equality or inequality,
NOT greater-than or less-than. This is not an issue for its use-cases
within the Crypto API.

I tried to locate all of the places in the Crypto API where memcmp was
being used for authentication or integrity checking, and convert them
over to crypto_memcmp.

crypto_memcmp is declared noinline and placed in its own source file
because a very smart compiler (or LTO) might notice that the return
value is always compared against zero/nonzero, and might then
reintroduce the same early-return optimization that we are trying to
avoid.

Signed-off-by: James Yonan <***@openvpn.net>
---
crypto/Makefile | 2 +-
crypto/asymmetric_keys/rsa.c | 5 +++--
crypto/authenc.c | 7 ++++---
crypto/authencesn.c | 9 +++++----
crypto/ccm.c | 5 +++--
crypto/crypto_memcmp.c | 31 +++++++++++++++++++++++++++++++
crypto/gcm.c | 3 ++-
include/crypto/internal/crypto_memcmp.h | 20 ++++++++++++++++++++
8 files changed, 69 insertions(+), 13 deletions(-)
create mode 100644 crypto/crypto_memcmp.c
create mode 100644 include/crypto/internal/crypto_memcmp.h

diff --git a/crypto/Makefile b/crypto/Makefile
index 2ba0df2..39a574d 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -3,7 +3,7 @@
#

obj-$(CONFIG_CRYPTO) += crypto.o
-crypto-y := api.o cipher.o compress.o
+crypto-y := api.o cipher.o compress.o crypto_memcmp.o

obj-$(CONFIG_CRYPTO_WORKQUEUE) += crypto_wq.o

diff --git a/crypto/asymmetric_keys/rsa.c b/crypto/asymmetric_keys/rsa.c
index 4a6a069..4f9a250 100644
--- a/crypto/asymmetric_keys/rsa.c
+++ b/crypto/asymmetric_keys/rsa.c
@@ -13,6 +13,7 @@
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/slab.h>
+#include <crypto/internal/crypto_memcmp.h>
#include "public_key.h"

MODULE_LICENSE("GPL");
@@ -189,12 +190,12 @@ static int RSA_verify(const u8 *H, const u8 *EM, size_t k, size_t hash_size,
}
}

- if (memcmp(asn1_template, EM + T_offset, asn1_size) != 0) {
+ if (crypto_memcmp(asn1_template, EM + T_offset, asn1_size) != 0) {
kleave(" = -EBADMSG [EM[T] ASN.1 mismatch]");
return -EBADMSG;
}

- if (memcmp(H, EM + T_offset + asn1_size, hash_size) != 0) {
+ if (crypto_memcmp(H, EM + T_offset + asn1_size, hash_size) != 0) {
kleave(" = -EKEYREJECTED [EM[T] hash mismatch]");
return -EKEYREJECTED;
}
diff --git a/crypto/authenc.c b/crypto/authenc.c
index ffce19d..82ca98f 100644
--- a/crypto/authenc.c
+++ b/crypto/authenc.c
@@ -13,6 +13,7 @@
#include <crypto/aead.h>
#include <crypto/internal/hash.h>
#include <crypto/internal/skcipher.h>
+#include <crypto/internal/crypto_memcmp.h>
#include <crypto/authenc.h>
#include <crypto/scatterwalk.h>
#include <linux/err.h>
@@ -188,7 +189,7 @@ static void authenc_verify_ahash_update_done(struct crypto_async_request *areq,
scatterwalk_map_and_copy(ihash, areq_ctx->sg, areq_ctx->cryptlen,
authsize, 0);

- err = memcmp(ihash, ahreq->result, authsize) ? -EBADMSG : 0;
+ err = crypto_memcmp(ihash, ahreq->result, authsize) ? -EBADMSG : 0;
if (err)
goto out;

@@ -227,7 +228,7 @@ static void authenc_verify_ahash_done(struct crypto_async_request *areq,
scatterwalk_map_and_copy(ihash, areq_ctx->sg, areq_ctx->cryptlen,
authsize, 0);

- err = memcmp(ihash, ahreq->result, authsize) ? -EBADMSG : 0;
+ err = crypto_memcmp(ihash, ahreq->result, authsize) ? -EBADMSG : 0;
if (err)
goto out;

@@ -462,7 +463,7 @@ static int crypto_authenc_verify(struct aead_request *req,
ihash = ohash + authsize;
scatterwalk_map_and_copy(ihash, areq_ctx->sg, areq_ctx->cryptlen,
authsize, 0);
- return memcmp(ihash, ohash, authsize) ? -EBADMSG : 0;
+ return crypto_memcmp(ihash, ohash, authsize) ? -EBADMSG : 0;
}

static int crypto_authenc_iverify(struct aead_request *req, u8 *iv,
diff --git a/crypto/authencesn.c b/crypto/authencesn.c
index ab53762..ec3bef9 100644
--- a/crypto/authencesn.c
+++ b/crypto/authencesn.c
@@ -15,6 +15,7 @@
#include <crypto/aead.h>
#include <crypto/internal/hash.h>
#include <crypto/internal/skcipher.h>
+#include <crypto/internal/crypto_memcmp.h>
#include <crypto/authenc.h>
#include <crypto/scatterwalk.h>
#include <linux/err.h>
@@ -247,7 +248,7 @@ static void authenc_esn_verify_ahash_update_done(struct crypto_async_request *ar
scatterwalk_map_and_copy(ihash, areq_ctx->sg, areq_ctx->cryptlen,
authsize, 0);

- err = memcmp(ihash, ahreq->result, authsize) ? -EBADMSG : 0;
+ err = crypto_memcmp(ihash, ahreq->result, authsize) ? -EBADMSG : 0;
if (err)
goto out;

@@ -296,7 +297,7 @@ static void authenc_esn_verify_ahash_update_done2(struct crypto_async_request *a
scatterwalk_map_and_copy(ihash, areq_ctx->sg, areq_ctx->cryptlen,
authsize, 0);

- err = memcmp(ihash, ahreq->result, authsize) ? -EBADMSG : 0;
+ err = crypto_memcmp(ihash, ahreq->result, authsize) ? -EBADMSG : 0;
if (err)
goto out;

@@ -336,7 +337,7 @@ static void authenc_esn_verify_ahash_done(struct crypto_async_request *areq,
scatterwalk_map_and_copy(ihash, areq_ctx->sg, areq_ctx->cryptlen,
authsize, 0);

- err = memcmp(ihash, ahreq->result, authsize) ? -EBADMSG : 0;
+ err = crypto_memcmp(ihash, ahreq->result, authsize) ? -EBADMSG : 0;
if (err)
goto out;

@@ -568,7 +569,7 @@ static int crypto_authenc_esn_verify(struct aead_request *req)
ihash = ohash + authsize;
scatterwalk_map_and_copy(ihash, areq_ctx->sg, areq_ctx->cryptlen,
authsize, 0);
- return memcmp(ihash, ohash, authsize) ? -EBADMSG : 0;
+ return crypto_memcmp(ihash, ohash, authsize) ? -EBADMSG : 0;
}

static int crypto_authenc_esn_iverify(struct aead_request *req, u8 *iv,
diff --git a/crypto/ccm.c b/crypto/ccm.c
index 499c917..57349d3 100644
--- a/crypto/ccm.c
+++ b/crypto/ccm.c
@@ -12,6 +12,7 @@

#include <crypto/internal/aead.h>
#include <crypto/internal/skcipher.h>
+#include <crypto/internal/crypto_memcmp.h>
#include <crypto/scatterwalk.h>
#include <linux/err.h>
#include <linux/init.h>
@@ -363,7 +364,7 @@ static void crypto_ccm_decrypt_done(struct crypto_async_request *areq,

if (!err) {
err = crypto_ccm_auth(req, req->dst, cryptlen);
- if (!err && memcmp(pctx->auth_tag, pctx->odata, authsize))
+ if (!err && crypto_memcmp(pctx->auth_tag, pctx->odata, authsize))
err = -EBADMSG;
}
aead_request_complete(req, err);
@@ -422,7 +423,7 @@ static int crypto_ccm_decrypt(struct aead_request *req)
return err;

/* verify */
- if (memcmp(authtag, odata, authsize))
+ if (crypto_memcmp(authtag, odata, authsize))
return -EBADMSG;

return err;
diff --git a/crypto/crypto_memcmp.c b/crypto/crypto_memcmp.c
new file mode 100644
index 0000000..ef4101c
--- /dev/null
+++ b/crypto/crypto_memcmp.c
@@ -0,0 +1,31 @@
+/*
+ * Constant-time memcmp.
+ *
+ * Copyright (C) 2013 OpenVPN Technologies Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ */
+
+#include <linux/export.h>
+#include <crypto/internal/crypto_memcmp.h>
+
+/**
+ * Like memcmp(), but constant-time to prevent leakage of timing info.
+ * Returns 0 when data is equal, non-zero otherwise.
+ */
+noinline int crypto_memcmp(const void *a, const void *b, size_t size)
+{
+ const u8 *a1 = a;
+ const u8 *b1 = b;
+ int ret = 0;
+ size_t i;
+
+ for (i = 0; i < size; i++) {
+ ret |= *a1++ ^ *b1++;
+ }
+ return ret;
+}
+EXPORT_SYMBOL_GPL(crypto_memcmp);
diff --git a/crypto/gcm.c b/crypto/gcm.c
index 43e1fb0..6be5bca 100644
--- a/crypto/gcm.c
+++ b/crypto/gcm.c
@@ -12,6 +12,7 @@
#include <crypto/internal/aead.h>
#include <crypto/internal/skcipher.h>
#include <crypto/internal/hash.h>
+#include <crypto/internal/crypto_memcmp.h>
#include <crypto/scatterwalk.h>
#include <crypto/hash.h>
#include "internal.h"
@@ -582,7 +583,7 @@ static int crypto_gcm_verify(struct aead_request *req,

crypto_xor(auth_tag, iauth_tag, 16);
scatterwalk_map_and_copy(iauth_tag, req->src, cryptlen, authsize, 0);
- return memcmp(iauth_tag, auth_tag, authsize) ? -EBADMSG : 0;
+ return crypto_memcmp(iauth_tag, auth_tag, authsize) ? -EBADMSG : 0;
}

static void gcm_decrypt_done(struct crypto_async_request *areq, int err)
diff --git a/include/crypto/internal/crypto_memcmp.h b/include/crypto/internal/crypto_memcmp.h
new file mode 100644
index 0000000..db23cc0
--- /dev/null
+++ b/include/crypto/internal/crypto_memcmp.h
@@ -0,0 +1,20 @@
+/*
+ * Constant-time memcmp.
+ *
+ * Copyright (C) 2013 OpenVPN Technologies Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ */
+
+#ifndef _CRYPTO_INTERNAL_CRYPTO_MEMCMP_H
+#define _CRYPTO_INTERNAL_CRYPTO_MEMCMP_H
+
+#include <linux/types.h>
+#include <linux/compiler.h>
+
+noinline int crypto_memcmp(const void *a, const void *b, size_t size);
+
+#endif
--
1.8.1.2
Daniel Borkmann
2013-09-10 18:57:55 UTC
Post by James Yonan
When comparing MAC hashes, AEAD authentication tags, or other hash
values in the context of authentication or integrity checking, it
is important not to leak timing information to a potential attacker.
Bytewise memory comparisons (such as memcmp) are usually optimized so
that they return a nonzero value as soon as a mismatch is found.
This early-return behavior can leak timing information, allowing an
attacker to iteratively guess the correct result.
This patch adds a new method crypto_memcmp that has the same prototype
as standard memcmp, but that compares strings of the same length in
roughly constant time (cache misses could change the timing, but
since they don't reveal information about the content of the strings
being compared, they are effectively benign). Note that crypto_memcmp
(unlike memcmp) can only be used to test for equality or inequality,
NOT greater-than or less-than. This is not an issue for its use-cases
within the Crypto API.
I tried to locate all of the places in the Crypto API where memcmp was
being used for authentication or integrity checking, and convert them
over to crypto_memcmp.
crypto_memcmp is declared noinline and placed in its own source file
because a very smart compiler (or LTO) might notice that the return
value is always compared against zero/nonzero, and might then
reintroduce the same early-return optimization that we are trying to
avoid.
There was a similar patch posted some time ago [1] on lkml, where
Florian (CC) made a good point in [2] that future compiler optimizations
could short circuit on this. This issue should probably be addressed in
such a patch here as well.

[1] https://lkml.org/lkml/2013/2/10/131
[2] https://lkml.org/lkml/2013/2/11/381
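
A compiler could in principle notice that once all bits of the accumulator
are set, later iterations cannot change the result, and insert an early
exit -- a purely hypothetical sketch of such a transformation in C, not
output of any real compiler:

#include <stddef.h>

/* What a short-circuited version of the ret |= loop could look like. */
int short_circuited_cmp(const unsigned char *a, const unsigned char *b,
			size_t size)
{
	int ret = 0;
	size_t i;

	for (i = 0; i < size; i++) {
		ret |= a[i] ^ b[i];
		/* Once all 8 result bits are set, further bytes cannot change
		 * ret, so exit early -- the timing now depends on where the
		 * buffers first differ, which is exactly the leak to avoid. */
		if (ret == 0xff)
			break;
	}
	return ret;
}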
Marcelo Cerri
2013-09-11 12:19:57 UTC
The discussion that Daniel pointed out has another interesting point
regarding the function name. I don't think it's a good idea to name it
crypto_memcmp, since it doesn't behave the same way as memcmp.

Florian suggested in the thread names such as crypto_mem_equal, which I
think fits better here.

Regards,
Marcelo
James Yonan
2013-09-11 17:20:09 UTC
Post by Daniel Borkmann
There was a similar patch posted some time ago [1] on lkml, where
Florian (CC) made a good point in [2] that future compiler optimizations
could short circuit on this. This issue should probably be addressed in
such a patch here as well.
[1] https://lkml.org/lkml/2013/2/10/131
[2] https://lkml.org/lkml/2013/2/11/381
Post by Marcelo Cerri
The discussion that Daniel pointed out has another interesting point
regarding the function name. I don't think it's a good idea to name it
crypto_memcmp, since it doesn't behave the same way as memcmp.
Florian suggested in the thread names such as crypto_mem_equal, which I
think fits better here.
Ok, here's another stab at this:

* Changed the name to crypto_mem_not_equal. The "not_equal" seems to
make more sense because the function returns a nonzero "true" value if
the memory regions are not equal.

* Good point that a smart optimizer might add instructions to
short-circuit the loop if all bits in ret have been set. One way to
deal with this is to disable optimizations that might increase code
size, since a short-circuit optimization in this case would require
adding instructions.

#pragma GCC optimize ("Os")

The nice thing about using #pragma is that older versions of gcc that
don't recognize it will simply ignore it, and we can probably presume
that older versions of gcc do not support a short-circuit optimization
if the latest one does not. I did a quick test using gcc 3.4.6 at -O2,
and did not see any evidence of a short-circuit optimization.

* Improved performance when CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is
enabled. This makes the performance roughly on-par with memcmp.

----------------

#pragma GCC optimize ("Os")

noinline unsigned long crypto_mem_not_equal(const void *a, const void *b, size_t size)
{
	unsigned long ret = 0;

#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
#if BITS_PER_LONG == 64
	while (size >= 8) {
		ret |= *(unsigned long *)a ^ *(unsigned long *)b;
		a += 8;
		b += 8;
		size -= 8;
	}
	if (!size)
		return ret;
#endif /* BITS_PER_LONG == 64 */
	if (sizeof(unsigned int) == 4) {
		while (size >= 4) {
			ret |= *(unsigned int *)a ^ *(unsigned int *)b;
			a += 4;
			b += 4;
			size -= 4;
		}
		if (!size)
			return ret;
	}
#endif /* CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS */
	while (size > 0) {
		ret |= *(unsigned char *)a ^ *(unsigned char *)b;
		a += 1;
		b += 1;
		size -= 1;
	}
	return ret;
}

James
Daniel Borkmann
2013-09-13 08:33:19 UTC
Post by James Yonan
Post by Daniel Borkmann
There was a similar patch posted some time ago [1] on lkml, where
Florian (CC) made a good point in [2] that future compiler optimizations
could short circuit on this. This issue should probably be addressed in
such a patch here as well.
[1] https://lkml.org/lkml/2013/2/10/131
[2] https://lkml.org/lkml/2013/2/11/381
Post by Marcelo Cerri
The discussion that Daniel pointed out has another interesting point
regarding the function name. I don't think it's a good idea to name it
crypto_memcmp, since it doesn't behave the same way as memcmp.
Florian suggested in the thread names such as crypto_mem_equal, which I
think fits better here.
* Changed the name to crypto_mem_not_equal. The "not_equal" seems to
make more sense because the function returns a nonzero "true" value if
the memory regions are not equal.
Ok, sounds good.
Post by James Yonan
* Good point that a smart optimizer might add instructions to
short-circuit the loop if all bits in ret have been set. One way to
deal with this is to disable optimizations that might increase code
size, since a short-circuit optimization in this case would require
adding instructions.
#pragma GCC optimize ("Os")
The nice thing about using #pragma is that older versions of gcc that
don't recognize it will simply ignore it, and we can probably presume
that older versions of gcc do not support a short-circuit optimization
if the latest one does not. I did a quick test using gcc 3.4.6 at -O2,
and did not see any evidence of a short-circuit optimization.
* Improved performance when CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is
enabled. This makes the performance roughly on-par with memcmp.
Hm, why don't we take fixed-size versions of Daniel J Bernstein from NaCl
library [1], e.g. for comparing hashes?

E.g. for 16 bytes:

int crypto_verify(const unsigned char *x, const unsigned char *y)
{
	unsigned int differentbits = 0;
#define F(i) differentbits |= x[i] ^ y[i];
	F(0)
	F(1)
	F(2)
	F(3)
	F(4)
	F(5)
	F(6)
	F(7)
	F(8)
	F(9)
	F(10)
	F(11)
	F(12)
	F(13)
	F(14)
	F(15)
	return (1 & ((differentbits - 1) >> 8)) - 1;
}

It will return 0 if x[0], x[1], ..., x[15] are the same as y[0], y[1], ..., y[15],
otherwise it returns -1. That's w/o for loops, so probably more "compiler-proof" ...
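
For reference, since the byte-wise ORs keep differentbits in the range
0..255, the final expression maps it to 0 or -1 as follows:

  differentbits == 0:       (0u - 1) >> 8 = 0x00ffffff,  & 1 = 1,  1 - 1 =  0
  differentbits in 1..255:  (x - 1) >> 8  = 0,           & 1 = 0,  0 - 1 = -1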

[1] http://nacl.cr.yp.to/
James Yonan
2013-09-15 15:32:59 UTC
When comparing MAC hashes, AEAD authentication tags, or other hash
values in the context of authentication or integrity checking, it
is important not to leak timing information to a potential attacker.

Bytewise memory comparisons (such as memcmp) are usually optimized so
that they return a nonzero value as soon as a mismatch is found.
This early-return behavior can leak timing information, allowing an
attacker to iteratively guess the correct result.

This patch adds a new method crypto_mem_not_equal that compares
strings of the same length in roughly constant time (cache misses
could change the timing, but since they don't reveal information
about the content of the strings being compared, they are effectively
benign). Note that crypto_mem_not_equal (unlike memcmp) can only be
used to test for equality or inequality, NOT greater-than or
less-than. This is not an issue for its use-cases within the Crypto
API.

I tried to locate all of the places in the Crypto API where memcmp was
being used for authentication or integrity checking, and convert them
over to crypto_mem_not_equal.

crypto_mem_not_equal is declared noinline, placed in its own source
file, and compiled with size-increasing optimizations disabled ("Os"),
because a smart compiler (or LTO) might notice that the return value
is only ever compared against zero/nonzero and might then reintroduce
the very early-return optimization that we are trying to avoid.

Since crypto_mem_not_equal is often called with size == 16 by its
users in the Crypto API, we add a special fast path for this case.

Signed-off-by: James Yonan <***@openvpn.net>
---
crypto/Makefile | 2 +-
crypto/asymmetric_keys/rsa.c | 5 +-
crypto/authenc.c | 7 ++-
crypto/authencesn.c | 9 +--
crypto/ccm.c | 5 +-
crypto/crypto_mem_not_equal.c | 87 ++++++++++++++++++++++++++
crypto/gcm.c | 3 +-
include/crypto/internal/crypto_mem_not_equal.h | 21 +++++++
8 files changed, 126 insertions(+), 13 deletions(-)
create mode 100644 crypto/crypto_mem_not_equal.c
create mode 100644 include/crypto/internal/crypto_mem_not_equal.h

diff --git a/crypto/Makefile b/crypto/Makefile
index 2ba0df2..1f6f44d 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -3,7 +3,7 @@
#

obj-$(CONFIG_CRYPTO) += crypto.o
-crypto-y := api.o cipher.o compress.o
+crypto-y := api.o cipher.o compress.o crypto_mem_not_equal.o

obj-$(CONFIG_CRYPTO_WORKQUEUE) += crypto_wq.o

diff --git a/crypto/asymmetric_keys/rsa.c b/crypto/asymmetric_keys/rsa.c
index 4a6a069..dd82ce5 100644
--- a/crypto/asymmetric_keys/rsa.c
+++ b/crypto/asymmetric_keys/rsa.c
@@ -13,6 +13,7 @@
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/slab.h>
+#include <crypto/internal/crypto_mem_not_equal.h>
#include "public_key.h"

MODULE_LICENSE("GPL");
@@ -189,12 +190,12 @@ static int RSA_verify(const u8 *H, const u8 *EM, size_t k, size_t hash_size,
}
}

- if (memcmp(asn1_template, EM + T_offset, asn1_size) != 0) {
+ if (crypto_mem_not_equal(asn1_template, EM + T_offset, asn1_size) != 0) {
kleave(" = -EBADMSG [EM[T] ASN.1 mismatch]");
return -EBADMSG;
}

- if (memcmp(H, EM + T_offset + asn1_size, hash_size) != 0) {
+ if (crypto_mem_not_equal(H, EM + T_offset + asn1_size, hash_size) != 0) {
kleave(" = -EKEYREJECTED [EM[T] hash mismatch]");
return -EKEYREJECTED;
}
diff --git a/crypto/authenc.c b/crypto/authenc.c
index ffce19d..040ab09 100644
--- a/crypto/authenc.c
+++ b/crypto/authenc.c
@@ -13,6 +13,7 @@
#include <crypto/aead.h>
#include <crypto/internal/hash.h>
#include <crypto/internal/skcipher.h>
+#include <crypto/internal/crypto_mem_not_equal.h>
#include <crypto/authenc.h>
#include <crypto/scatterwalk.h>
#include <linux/err.h>
@@ -188,7 +189,7 @@ static void authenc_verify_ahash_update_done(struct crypto_async_request *areq,
scatterwalk_map_and_copy(ihash, areq_ctx->sg, areq_ctx->cryptlen,
authsize, 0);

- err = memcmp(ihash, ahreq->result, authsize) ? -EBADMSG : 0;
+ err = crypto_mem_not_equal(ihash, ahreq->result, authsize) ? -EBADMSG : 0;
if (err)
goto out;

@@ -227,7 +228,7 @@ static void authenc_verify_ahash_done(struct crypto_async_request *areq,
scatterwalk_map_and_copy(ihash, areq_ctx->sg, areq_ctx->cryptlen,
authsize, 0);

- err = memcmp(ihash, ahreq->result, authsize) ? -EBADMSG : 0;
+ err = crypto_mem_not_equal(ihash, ahreq->result, authsize) ? -EBADMSG : 0;
if (err)
goto out;

@@ -462,7 +463,7 @@ static int crypto_authenc_verify(struct aead_request *req,
ihash = ohash + authsize;
scatterwalk_map_and_copy(ihash, areq_ctx->sg, areq_ctx->cryptlen,
authsize, 0);
- return memcmp(ihash, ohash, authsize) ? -EBADMSG : 0;
+ return crypto_mem_not_equal(ihash, ohash, authsize) ? -EBADMSG : 0;
}

static int crypto_authenc_iverify(struct aead_request *req, u8 *iv,
diff --git a/crypto/authencesn.c b/crypto/authencesn.c
index ab53762..063d550 100644
--- a/crypto/authencesn.c
+++ b/crypto/authencesn.c
@@ -15,6 +15,7 @@
#include <crypto/aead.h>
#include <crypto/internal/hash.h>
#include <crypto/internal/skcipher.h>
+#include <crypto/internal/crypto_mem_not_equal.h>
#include <crypto/authenc.h>
#include <crypto/scatterwalk.h>
#include <linux/err.h>
@@ -247,7 +248,7 @@ static void authenc_esn_verify_ahash_update_done(struct crypto_async_request *ar
scatterwalk_map_and_copy(ihash, areq_ctx->sg, areq_ctx->cryptlen,
authsize, 0);

- err = memcmp(ihash, ahreq->result, authsize) ? -EBADMSG : 0;
+ err = crypto_mem_not_equal(ihash, ahreq->result, authsize) ? -EBADMSG : 0;
if (err)
goto out;

@@ -296,7 +297,7 @@ static void authenc_esn_verify_ahash_update_done2(struct crypto_async_request *a
scatterwalk_map_and_copy(ihash, areq_ctx->sg, areq_ctx->cryptlen,
authsize, 0);

- err = memcmp(ihash, ahreq->result, authsize) ? -EBADMSG : 0;
+ err = crypto_mem_not_equal(ihash, ahreq->result, authsize) ? -EBADMSG : 0;
if (err)
goto out;

@@ -336,7 +337,7 @@ static void authenc_esn_verify_ahash_done(struct crypto_async_request *areq,
scatterwalk_map_and_copy(ihash, areq_ctx->sg, areq_ctx->cryptlen,
authsize, 0);

- err = memcmp(ihash, ahreq->result, authsize) ? -EBADMSG : 0;
+ err = crypto_mem_not_equal(ihash, ahreq->result, authsize) ? -EBADMSG : 0;
if (err)
goto out;

@@ -568,7 +569,7 @@ static int crypto_authenc_esn_verify(struct aead_request *req)
ihash = ohash + authsize;
scatterwalk_map_and_copy(ihash, areq_ctx->sg, areq_ctx->cryptlen,
authsize, 0);
- return memcmp(ihash, ohash, authsize) ? -EBADMSG : 0;
+ return crypto_mem_not_equal(ihash, ohash, authsize) ? -EBADMSG : 0;
}

static int crypto_authenc_esn_iverify(struct aead_request *req, u8 *iv,
diff --git a/crypto/ccm.c b/crypto/ccm.c
index 499c917..1d5c63a 100644
--- a/crypto/ccm.c
+++ b/crypto/ccm.c
@@ -12,6 +12,7 @@

#include <crypto/internal/aead.h>
#include <crypto/internal/skcipher.h>
+#include <crypto/internal/crypto_mem_not_equal.h>
#include <crypto/scatterwalk.h>
#include <linux/err.h>
#include <linux/init.h>
@@ -363,7 +364,7 @@ static void crypto_ccm_decrypt_done(struct crypto_async_request *areq,

if (!err) {
err = crypto_ccm_auth(req, req->dst, cryptlen);
- if (!err && memcmp(pctx->auth_tag, pctx->odata, authsize))
+ if (!err && crypto_mem_not_equal(pctx->auth_tag, pctx->odata, authsize))
err = -EBADMSG;
}
aead_request_complete(req, err);
@@ -422,7 +423,7 @@ static int crypto_ccm_decrypt(struct aead_request *req)
return err;

/* verify */
- if (memcmp(authtag, odata, authsize))
+ if (crypto_mem_not_equal(authtag, odata, authsize))
return -EBADMSG;

return err;
diff --git a/crypto/crypto_mem_not_equal.c b/crypto/crypto_mem_not_equal.c
new file mode 100644
index 0000000..8469edb
--- /dev/null
+++ b/crypto/crypto_mem_not_equal.c
@@ -0,0 +1,87 @@
+/*
+ * Constant-time equality testing of memory regions.
+ *
+ * Copyright (C) 2013 OpenVPN Technologies Inc.
+ * Author: James Yonan <***@openvpn.net>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ */
+
+#include <linux/export.h>
+#include <crypto/internal/crypto_mem_not_equal.h>
+
+/* We want to avoid any optimizations that might shortcut
+ * the comparison and make it non-constant-time.
+ */
+#pragma GCC optimize ("Os")
+
+/**
+ * Constant-time equality testing of memory regions.
+ * Returns 0 when data is equal, non-zero otherwise.
+ * Fast path if size == 16.
+ */
+noinline unsigned long crypto_mem_not_equal(const void *a, const void *b, size_t size)
+{
+ unsigned long ret = 0;
+
+#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
+#if BITS_PER_LONG == 64
+ if (size == 16)
+ return ((*(unsigned long *)(a) ^ *(unsigned long *)(b))
+ | (*(unsigned long *)(a+8) ^ *(unsigned long *)(b+8)));
+ while (size >= 8) {
+ ret |= *(unsigned long *)a ^ *(unsigned long *)b;
+ a += 8;
+ b += 8;
+ size -= 8;
+ }
+ if (!size)
+ return ret;
+#else /* BITS_PER_LONG != 64 */
+ if (size == 16 && sizeof(unsigned int) == 4)
+ return ((*(unsigned int *)(a) ^ *(unsigned int *)(b))
+ | (*(unsigned int *)(a+4) ^ *(unsigned int *)(b+4))
+ | (*(unsigned int *)(a+8) ^ *(unsigned int *)(b+8))
+ | (*(unsigned int *)(a+12) ^ *(unsigned int *)(b+12)));
+#endif /* BITS_PER_LONG */
+ if (sizeof(unsigned int) == 4) {
+ while (size >= 4) {
+ ret |= *(unsigned int *)a ^ *(unsigned int *)b;
+ a += 4;
+ b += 4;
+ size -= 4;
+ }
+ if (!size)
+ return ret;
+ }
+#else /* !defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) */
+ if (size == 16)
+ return ((*(unsigned char *)(a) ^ *(unsigned char *)(b))
+ | (*(unsigned char *)(a+1) ^ *(unsigned char *)(b+1))
+ | (*(unsigned char *)(a+2) ^ *(unsigned char *)(b+2))
+ | (*(unsigned char *)(a+3) ^ *(unsigned char *)(b+3))
+ | (*(unsigned char *)(a+4) ^ *(unsigned char *)(b+4))
+ | (*(unsigned char *)(a+5) ^ *(unsigned char *)(b+5))
+ | (*(unsigned char *)(a+6) ^ *(unsigned char *)(b+6))
+ | (*(unsigned char *)(a+7) ^ *(unsigned char *)(b+7))
+ | (*(unsigned char *)(a+8) ^ *(unsigned char *)(b+8))
+ | (*(unsigned char *)(a+9) ^ *(unsigned char *)(b+9))
+ | (*(unsigned char *)(a+10) ^ *(unsigned char *)(b+10))
+ | (*(unsigned char *)(a+11) ^ *(unsigned char *)(b+11))
+ | (*(unsigned char *)(a+12) ^ *(unsigned char *)(b+12))
+ | (*(unsigned char *)(a+13) ^ *(unsigned char *)(b+13))
+ | (*(unsigned char *)(a+14) ^ *(unsigned char *)(b+14))
+ | (*(unsigned char *)(a+15) ^ *(unsigned char *)(b+15)));
+#endif /* CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS */
+ while (size > 0) {
+ ret |= *(unsigned char *)a ^ *(unsigned char *)b;
+ a += 1;
+ b += 1;
+ size -= 1;
+ }
+ return ret;
+}
+EXPORT_SYMBOL_GPL(crypto_mem_not_equal);
diff --git a/crypto/gcm.c b/crypto/gcm.c
index 43e1fb0..4470b4e 100644
--- a/crypto/gcm.c
+++ b/crypto/gcm.c
@@ -12,6 +12,7 @@
#include <crypto/internal/aead.h>
#include <crypto/internal/skcipher.h>
#include <crypto/internal/hash.h>
+#include <crypto/internal/crypto_mem_not_equal.h>
#include <crypto/scatterwalk.h>
#include <crypto/hash.h>
#include "internal.h"
@@ -582,7 +583,7 @@ static int crypto_gcm_verify(struct aead_request *req,

crypto_xor(auth_tag, iauth_tag, 16);
scatterwalk_map_and_copy(iauth_tag, req->src, cryptlen, authsize, 0);
- return memcmp(iauth_tag, auth_tag, authsize) ? -EBADMSG : 0;
+ return crypto_mem_not_equal(iauth_tag, auth_tag, authsize) ? -EBADMSG : 0;
}

static void gcm_decrypt_done(struct crypto_async_request *areq, int err)
diff --git a/include/crypto/internal/crypto_mem_not_equal.h b/include/crypto/internal/crypto_mem_not_equal.h
new file mode 100644
index 0000000..7b2bd7e
--- /dev/null
+++ b/include/crypto/internal/crypto_mem_not_equal.h
@@ -0,0 +1,21 @@
+/*
+ * Constant-time equality testing of memory regions.
+ *
+ * Copyright (C) 2013 OpenVPN Technologies Inc.
+ * Author: James Yonan <***@openvpn.net>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ */
+
+#ifndef _CRYPTO_INTERNAL_CRYPTO_MEM_NOT_EQUAL_H
+#define _CRYPTO_INTERNAL_CRYPTO_MEM_NOT_EQUAL_H
+
+#include <linux/types.h>
+#include <linux/compiler.h>
+
+noinline unsigned long crypto_mem_not_equal(const void *a, const void *b, size_t size);
+
+#endif
--
1.8.1.2
Florian Weimer
2013-09-15 15:45:53 UTC
Post by James Yonan
+ * Constant-time equality testing of memory regions.
+ * Returns 0 when data is equal, non-zero otherwise.
+ * Fast path if size == 16.
+ */
+noinline unsigned long crypto_mem_not_equal(const void *a, const void *b, size_t size)
I think this should really return unsigned or int, to reduce the risk
that the upper bytes are truncated because the caller uses an
inappropriate type, resulting in a bogus zero result. Reducing the
value to 0/1 probably doesn't hurt performance too much. It also
doesn't encode any information about the location of the difference in
the result value, which helps if that ever leaks.
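
A minimal illustration of the truncation risk (a hypothetical caller,
not one of the call sites in the patch):

static int check_tag(const void *want, const void *got)
{
	/* implicit truncation from unsigned long to int */
	int neq = crypto_mem_not_equal(want, got, 16);

	/* On 64-bit, a difference confined to bytes 4..7 of an 8-byte word
	 * sets only the upper 32 bits of the returned value, so neq can end
	 * up 0 here and the mismatch goes unnoticed. */
	return neq ? -EBADMSG : 0;
}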
James Yonan
2013-09-15 16:59:43 UTC
Post by Florian Weimer
Post by James Yonan
+ * Constant-time equality testing of memory regions.
+ * Returns 0 when data is equal, non-zero otherwise.
+ * Fast path if size == 16.
+ */
+noinline unsigned long crypto_mem_not_equal(const void *a, const void *b, size_t size)
I think this should really return unsigned or int, to reduce the risk
that the upper bytes are truncated because the caller uses an
inappropriate type, resulting in a bogus zero result. Reducing the
value to 0/1 probably doesn't hurt performance too much. It also
doesn't encode any information about the location of the difference in
the result value, which helps if that ever leaks.
The problem with returning 0/1 within the function body of
crypto_mem_not_equal is that it makes it easier for the compiler to
introduce a short-circuit optimization.

It might be better to move the test where the result is compared against
0 into an inline function:

noinline unsigned long __crypto_mem_not_equal(const void *a, const void *b, size_t size);

static inline int crypto_mem_not_equal(const void *a, const void *b, size_t size)
{
	return __crypto_mem_not_equal(a, b, size) != 0UL ? 1 : 0;
}

This hides the fact that we are only interested in a boolean result from
the compiler when it's compiling crypto_mem_not_equal.c, but also
ensures type safety when users test the return value. It's also likely
to have little or no performance impact.

James
Daniel Borkmann
2013-09-16 07:56:23 UTC
Post by Florian Weimer
Post by James Yonan
+ * Constant-time equality testing of memory regions.
+ * Returns 0 when data is equal, non-zero otherwise.
+ * Fast path if size == 16.
+ */
+noinline unsigned long crypto_mem_not_equal(const void *a, const void *b, size_t size)
I think this should really return unsigned or int, to reduce the risk
that the upper bytes are truncated because the caller uses an
inappropriate type, resulting in a bogus zero result. Reducing the
value to 0/1 probably doesn't hurt performance too much. It also
doesn't encode any information about the location of the difference in
the result value, which helps if that ever leaks.
The problem with returning 0/1 within the function body of crypto_mem_not_equal is that it makes it easier for the compiler to introduce a short-circuit optimization.
noinline unsigned long __crypto_mem_not_equal(const void *a, const void *b, size_t size);
static inline int crypto_mem_not_equal(const void *a, const void *b, size_t size) {
return __crypto_mem_not_equal(a, b, size) != 0UL ? 1 : 0;
}
This hides the fact that we are only interested in a boolean result from the compiler when it's compiling crypto_mem_not_equal.c, but also ensures type safety when users test the return value. It's also likely to have little or no performance impact.
Well, the code snippet I've provided from NaCl [1] is not really a "fast
path" as you say, but is rather meant to prevent the compiler from doing
such optimizations: it transforms the "accumulated" bits into 0 or -1 as
the end result (which likely prevents a short circuit), and it has a
static size, so there are no loops here that the compiler could screw up.

Variable size could be done under arch/ in asm, and where that is not
available, it would just fall back to a normal memcmp whose result is
transformed into the same return value. That way, all other archs could
easily migrate afterwards. What do you think?

[1] http://www.spinics.net/lists/linux-crypto/msg09558.html
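
A rough sketch of that split -- the names crypto_mem_neq_16 and
crypto_mem_neq are made up here, and the variable-size branch is the
memcmp fallback described above, which an arch could later replace:

/* fixed-size, constant-time comparison; an arch can provide this in asm */
unsigned long crypto_mem_neq_16(const void *a, const void *b);

static inline unsigned long crypto_mem_neq(const void *a, const void *b,
					   size_t size)
{
	if (size == 16)
		return crypto_mem_neq_16(a, b);
	/* variable-size fallback: plain memcmp, normalized to 0 / non-zero */
	return memcmp(a, b, size) != 0;
}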
James Yonan
2013-09-16 17:10:59 UTC
Post by Daniel Borkmann
Post by James Yonan
Post by Florian Weimer
Post by James Yonan
+ * Constant-time equality testing of memory regions.
+ * Returns 0 when data is equal, non-zero otherwise.
+ * Fast path if size == 16.
+ */
+noinline unsigned long crypto_mem_not_equal(const void *a, const void *b, size_t size)
I think this should really return unsigned or int, to reduce the risk
that the upper bytes are truncated because the caller uses an
inappropriate type, resulting in a bogus zero result. Reducing the
value to 0/1 probably doesn't hurt performance too much. It also
doesn't encode any information about the location of the difference in
the result value, which helps if that ever leaks.
The problem with returning 0/1 within the function body of
crypto_mem_not_equal is that it makes it easier for the compiler to
introduce a short-circuit optimization.
It might be better to move the test where the result is compared against 0 into an inline function:
noinline unsigned long __crypto_mem_not_equal(const void *a, const void *b, size_t size);
static inline int crypto_mem_not_equal(const void *a, const void *b, size_t size) {
return __crypto_mem_not_equal(a, b, size) != 0UL ? 1 : 0;
}
This hides the fact that we are only interested in a boolean result
from the compiler when it's compiling crypto_mem_not_equal.c, but also
ensures type safety when users test the return value. It's also
likely to have little or no performance impact.
Well, the code snippet I've provided from NaCl [1] is not really a "fast
path" as you say, but is rather meant to prevent the compiler from doing
such optimizations: it transforms the "accumulated" bits into 0 or -1 as
the end result (which likely prevents a short circuit), and it has a
static size, so there are no loops here that the compiler could screw up.
Variable size could be done under arch/ in asm, and where that is not
available, it would just fall back to a normal memcmp whose result is
transformed into the same return value. That way, all other archs could
easily migrate afterwards. What do you think?
[1] http://www.spinics.net/lists/linux-crypto/msg09558.html
I'm not sure that the differentbits -> 0/-1 transform in NaCl really
buys us anything because we don't care very much about making the final
test of differentbits != 0 constant-time. An attacker already knows
whether the test succeeded or failed -- we care more about making the
failure cases constant-time.

To do this, we need to make sure that the compiler doesn't insert one or
more early instructions to compare differentbits with 0xFF and then
bypass the rest of the F(n) lines because it knows then that the value
of differentbits cannot be changed by subsequent F(n) lines. It seems
that all of the approaches that use |= to build up the differentbits
value suffer from this problem.

My proposed solution is rather than trying to outsmart the compiler with
code that resists optimization, why not just turn optimization off
directly with #pragma GCC optimize. Or better yet, use an optimization
setting that provides reasonable speed without introducing potential
short-circuit optimizations.

By optimizing for size ("Os"), the compiler will need to turn off
optimizations that add code for a possible speed benefit, and the kind
of short-circuit optimizations that we are trying to avoid fall
precisely into this class -- they add an instruction to check if the OR
accumulator has all of its bits set, then if so, do an early exit. So
by using Os, we still benefit from optimizations that increase speed but
don't increase code size.

Once we eliminate the possibility of short-circuit optimizations, then
it's much more straightforward to make the function fast (e.g. by
comparing 64-bits at a time) and flexible (by handling arbitrary size).
My current implementation of crypto_mem_not_equal does both.

James
Daniel Borkmann
2013-09-17 19:07:12 UTC
Post by James Yonan
I'm not sure that the differentbits -> 0/-1 transform in NaCl really buys us anything because
we don't care very much about making the final test of differentbits != 0 constant-time. An
attacker already knows whether the test succeeded or failed -- we care more about making the
failure cases constant-time.
To do this, we need to make sure that the compiler doesn't insert one or more early instructions
to compare differentbits with 0xFF and then bypass the rest of the F(n) lines because it knows
then that the value of differentbits cannot be changed by subsequent F(n) lines. It seems that
all of the approaches that use |= to build up the differentbits value suffer from this problem.
My proposed solution is rather than trying to outsmart the compiler with code that resists
optimization, why not just turn optimization off directly with #pragma GCC optimize. Or better
yet, use an optimization setting that provides reasonable speed without introducing potential
short-circuit optimizations.
By optimizing for size ("Os"), the compiler will need to turn off optimizations that add code
for a possible speed benefit, and the kind of short-circuit optimizations that we are trying to
avoid fall precisely into this class -- they add an instruction to check if the OR accumulator has
all of its bits set, then if so, do an early exit. So by using Os, we still benefit from
optimizations that increase speed but don't increase code size.
While I was looking a bit further into this, I thought using an attribute like this might be a
more clean variant ...

diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 92669cd..2505b1b 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -351,6 +351,11 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect);
*/
#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))

+/* Tell GCC to turn off optimization for a particular function. */
+#ifndef __do_not_optimize
+#define __do_not_optimize __attribute__((optimize("O0")))
+#endif
+
/* Ignore/forbid kprobes attach on very low level functions marked by this attribute: */
#ifdef CONFIG_KPROBES
# define __kprobes __attribute__((__section__(".kprobes.text")))

... however, then I found on the GCC developer mailing list [1]: "That said, I
consider the optimize attribute code seriously broken and unmaintained (but
sometimes useful for debugging - and only that). [...] Does '#pragma GCC
optimize' work more reliably? No, it uses the same mechanism internally. [...]
And if these options are so broken, should they be marked as such in the
manual? [...] Probably yes."

Therefore, I still need to ask ... what about the option of an arch/*/crypto/... asm
implementation? This still seems safer for something that crucial, imho.
Once we eliminate the possibility of short-circuit optimizations, then it's much more straightforward
to make the function fast (e.g. by comparing 64-bits at a time) and flexible (by handling arbitrary
size). My current implementation of crypto_mem_not_equal does both.
[1] http://gcc.gnu.org/ml/gcc/2012-07/msg00211.html
James Yonan
2013-09-19 00:13:34 UTC
Permalink
Post by James Yonan
Post by James Yonan
Post by Daniel Borkmann
Post by James Yonan
Post by Florian Weimer
Post by James Yonan
+ * Constant-time equality testing of memory regions.
+ * Returns 0 when data is equal, non-zero otherwise.
+ * Fast path if size == 16.
+ */
+noinline unsigned long crypto_mem_not_equal(const void *a, const
void *b, size_t size)
I think this should really return unsigned or int, to reduce the risk
that the upper bytes are truncated because the caller uses an
inappropriate type, resulting in a bogus zero result. Reducing the
value to 0/1 probably doesn't hurt performance too much. It also
doesn't encode any information about the location of the difference in
the result value, which helps if that ever leaks.
The problem with returning 0/1 within the function body of
crypto_mem_not_equal is that it makes it easier for the compiler to
introduce a short-circuit optimization.
It might be better to move the test where the result is compared
noinline unsigned long __crypto_mem_not_equal(const void *a, const
void *b, size_t size);
static inline int crypto_mem_not_equal(const void *a, const void *b, size_t size) {
return __crypto_mem_not_equal(a, b, size) != 0UL ? 1 : 0;
}
This hides the fact that we are only interested in a boolean result
from the compiler when it's compiling crypto_mem_not_equal.c, but also
ensures type safety when users test the return value. It's also
likely to have little or no performance impact.
Well, the code snippet I've provided from NaCl [1] is not really "fast-path"
as you say, but rather to prevent the compiler from doing such optimizations
by having a transformation of the "accumulated" bits into 0 and 1 as an end
result (likely to prevent a short circuit), plus it has static size, so no
loops applied here that could screw up.
Variable size could be done under arch/ in asm, and if not available, that
just falls back to normal memcmp that is being transformed into a same return
value. By that, all other archs could easily migrate afterwards. What do you
think?
[1] http://www.spinics.net/lists/linux-crypto/msg09558.html
I'm not sure that the differentbits -> 0/-1 transform in NaCl really
buys us anything because we don't care very much about making the final
test of differentbits != 0 constant-time. An attacker already knows
whether the test succeeded or failed -- we care more about making the
failure cases constant-time.
To do this, we need to make sure that the compiler doesn't insert one or
more early instructions to compare differentbits with 0xFF and then
bypass the rest of the F(n) lines because it knows then that the value
of differentbits cannot be changed by subsequent F(n) lines. It seems
that all of the approaches that use |= to build up the differentbits
value suffer from this problem.
My proposed solution is rather than trying to outsmart the compiler with
code that resists optimization, why not just turn optimization off
directly with #pragma GCC optimize. Or better yet, use an optimization
setting that provides reasonable speed without introducing potential
short-circuit optimizations.
By optimizing for size ("Os"), the compiler will need to turn off
optimizations that add code for a possible speed benefit, and the kind
of short-circuit optimizations that we are trying to avoid fall
precisely into this class -- they add an instruction to check if the OR
accumulator has all of its bits set, then if so, do an early exit. So
by using Os, we still benefit from optimizations that increase speed
but don't increase code size.
While I was looking a bit further into this, I thought using an
attribute like this might be a
more clean variant ...
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 92669cd..2505b1b 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -351,6 +351,11 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect);
*/
#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
+/* Tell GCC to turn off optimization for a particular function. */
+#ifndef __do_not_optimize
+#define __do_not_optimize __attribute__((optimize("O0")))
+#endif
+
/* Ignore/forbid kprobes attach on very low level functions marked by this attribute: */
#ifdef CONFIG_KPROBES
# define __kprobes __attribute__((__section__(".kprobes.text")))
... however, then I found on the GCC developer mailing list [1]: That said, I consider the
optimize attribute code seriously broken and unmaintained (but sometimes
useful for debugging -
and only that). [...] Does "#pragma Gcc optimize" work more reliably? No, it uses the same
mechanism internally. [...] And if these options are so broken, should
they be marked as such
in the manual? [...] Probably yes.
Therefore, I still need to ask ... what about the option of an
arch/*/crypto/... asm
implementation? This still seems safer for something that crucial, imho.
We can easily specify -Os in the Makefile rather than depending on
#pragma optimize or __attribute__ optimize if they are considered broken.

Re: arch/*/crypto/... asm, not sure it's worth it given the extra effort
to develop, test, and maintain asm for all archs. The two things we
care about (constant time and performance) seem readily achievable in C.

Regarding O0 vs. Os, I would tend to prefer Os because it's much faster
than O0, but still carries the desirable property that optimizations
that increase code size are disabled. It seems that short-circuit
optimizations would be disabled by this, since by definition a
short-circuit optimization requires the addition of a compare and branch.
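
As a concrete illustration (not something from the patch), kbuild's per-object
flag override could do this; I'm using the crypto_memcmp.o object name from the
original patch as a stand-in for whatever the file ends up being called. With
gcc the last -O option on the command line wins, so the per-object -Os should
take precedence over the global -O2:

# crypto/Makefile (sketch): build just this object with -Os so that
# speed-oriented transformations that add a compare-and-branch are avoided.
CFLAGS_crypto_memcmp.o := -Os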

James
Daniel Borkmann
2013-09-19 08:37:28 UTC
Permalink
On 09/19/2013 02:13 AM, James Yonan wrote:
[...]
We can easily specify -Os in the Makefile rather than depending on #pragma optimize or __attribute__ optimize if they are considered broken.
Re: arch/*/crypto/... asm, not sure it's worth it given the extra effort to develop, test, and maintain asm for all archs. The two things we care about (constant time and performance) seem readily achievable in C.
Regarding O0 vs. Os, I would tend to prefer Os because it's much faster than O0, but still carries the desirable property that optimizations that increase code size are disabled. It seems that short-circuit optimizations would be disabled by this, since by definition a short-circuit optimization requires the addition of a compare and branch.
Ok, if we can make sure that this would override the global defaults under all
circumstances, then that approach should be fine, imho.

I would suggest that you use the crypto_mem_not_equal() function that you originally had
or that I was proposing, and still allow the possibility for an arch optimized version,
if people want to.

In that way, it can be kept simple and stupid and easy to review, just like the
other util functions such as memcmp are implemented in [1].

[1] http://lingrok.org/xref/linux-net-next/lib/string.c#643
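
For the sake of discussion, a stupid-simple byte-wise version with a per-arch
escape hatch might look like the sketch below. __HAVE_ARCH_CRYPTO_MEM_NOT_EQUAL
is a made-up guard name here, following the convention that lib/string.c uses
for memcmp and friends:

#ifndef __HAVE_ARCH_CRYPTO_MEM_NOT_EQUAL
/* Generic fallback: OR together the XOR of every byte pair, so the amount
 * of work depends only on size, not on where a mismatch occurs.  An arch
 * can provide its own (e.g. asm) implementation and define the guard to
 * opt out of this one.
 */
noinline unsigned long __crypto_mem_not_equal(const void *a, const void *b,
                                              size_t size)
{
        const unsigned char *pa = a, *pb = b;
        unsigned long neq = 0;

        while (size--)
                neq |= *pa++ ^ *pb++;
        return neq;
}
#endif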

Florian Weimer
2013-09-16 17:25:10 UTC
Permalink
Post by James Yonan
noinline unsigned long __crypto_mem_not_equal(const void *a, const
void *b, size_t size);
static inline int crypto_mem_not_equal(const void *a, const void *b,
size_t size) {
return __crypto_mem_not_equal(a, b, size) != 0UL ? 1 : 0;
}
This hides the fact that we are only interested in a boolean result
from the compiler when it's compiling crypto_mem_not_equal.c, but also
ensures type safety when users test the return value. It's also
likely to have little or no performance impact.
Yes, that looks good to me.

(I haven't checked the other parts of your patch, though, especially
those where you adjust all callers.)
James Yonan
2013-09-15 15:38:12 UTC
Permalink
Post by Daniel Borkmann
Post by James Yonan
Post by Daniel Borkmann
There was a similar patch posted some time ago [1] on lkml, where
Florian (CC) made a good point in [2] that future compiler optimizations
could short circuit on this. This issue should probably be addressed in
such a patch here as well.
[1] https://lkml.org/lkml/2013/2/10/131
[2] https://lkml.org/lkml/2013/2/11/381
The discussion that Daniel pointed out has another interesting point
regarding the function name. I don't think it's a good idea to name it
crypto_memcmp since it doesn't behave the same way as memcmp.
Florian suggested in the thread names such as crypto_mem_equal, which I
think fits better here.
* Changed the name to crypto_mem_not_equal. The "not_equal" seems to
make more sense because the function returns a nonzero "true" value if
the memory regions are not equal.
Ok, sounds good.
Post by James Yonan
* Good point that a smart optimizer might add instructions to
short-circuit the loop if all bits in ret have been set. One way to
deal with this is to disable optimizations that might increase code
size, since a short-circuit optimization in this case would require
adding instructions.
#pragma GCC optimize ("Os")
The nice thing about using #pragma is that older versions of gcc that
don't recognize it will simply ignore it, and we can probably presume
that older versions of gcc do not support a short-circuit optimization
if the latest one does not. I did a quick test using gcc 3.4.6 at -O2,
and did not see any evidence of a short-circuit optimization.
* Improved performance when CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is
enabled. This makes the performance roughly on-par with memcmp.
Hm, why don't we take fixed-size versions of Daniel J Bernstein from NaCl
library [1], e.g. for comparing hashes?
int crypto_verify(const unsigned char *x,const unsigned char *y)
{
unsigned int differentbits = 0;
#define F(i) differentbits |= x[i] ^ y[i];
F(0)
F(1)
F(2)
F(3)
F(4)
F(5)
F(6)
F(7)
F(8)
F(9)
F(10)
F(11)
F(12)
F(13)
F(14)
F(15)
return (1 & ((differentbits - 1) >> 8)) - 1;
}
It will return 0 if x[0], x[1], ..., x[15] are the same as y[0], y[1], ..., y[15],
otherwise it returns -1. That's w/o for loops, so probably more "compiler-proof" ...
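(As a quick sanity check of that last line -- just arithmetic, nothing from the
patch: differentbits always fits in 8 bits, so if it is 0 then
(differentbits - 1) >> 8 has its low bit set and the expression gives
1 - 1 = 0, while for any value 1..255 the shift yields 0 and the expression
gives 0 - 1 = -1. Neither case involves a data-dependent branch.)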
[1] http://nacl.cr.yp.to/
Ok, I've resubmitted full patch with fast path for size == 16.

James