Discussion:
Cryptographic hash functions reloaded [was Interest in cryptographic functions within the standard library]
Markus Mayer
2014-08-23 20:14:24 UTC
Sorry for not responding for such a long time, but private life kept me
busy.

First of all, many thanks for all the valuable feedback you posted in the
'Interest in cryptographic functions within the standard library'
thread. As I will start with hash functions, I decided to start a new thread.

I have revamped (and simplified) my initial design and put it on the
list to get further feedback.

Design (heavily based on boost::crc):

class hash_function
{
public:
    typedef std::array<unsigned char, ALGORITHM_DEFINED> result_type;

    // Default constructible
    // Copyable
    // Movable

    hash_function& process_bytes(void const* buffer, std::size_t byte_count);

    void reset();

    const result_type& hash_value();
};

// I am not sure about this function yet...
template <class hash>
typename hash::result_type calculate_hash(void const* buffer,
                                          std::size_t byte_count);

The implemented algorithms will be (the class name is given in the list):
-md5
-sha_1
-sha_224
-sha_256
-sha_384
-sha_512
-sha3_224
-sha3_256
-sha3_384
-sha3_512
-Various flavors of crc
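
To make the intended use concrete, here is a minimal sketch of how the
interface above would be called (purely illustrative, assuming the
sha_256 class from the list; names are as proposed, not final):

#include <string>

int main()
{
    std::string msg = "The quick brown fox";

    sha_256 hasher;                                     // default constructed
    hasher.process_bytes(msg.data(), msg.size());      // feed bytes
    sha_256::result_type digest = hasher.hash_value(); // 32-byte array

    hasher.reset();                                    // reuse for the next message

    // Or, with the (tentative) free function:
    auto digest2 = calculate_hash<sha_256>(msg.data(), msg.size());
}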


Rationale:
-Why 'result_type'?
To be consistent with std::function.

-Why 'unsigned char' and not 'uint8_t' for result_type?
Most hash algorithms work on an 8-bit basis. But as long as 'unsigned
char' is a multiple of 8 bits wide, the algorithms can still be applied,
so 'unsigned char' enables those architectures to implement the
functions. Architectures where 'unsigned char' is not a multiple of 8
bits wide will be excluded by the proposal.

-Why 'unsigned char' and not 'char' for result_type?
It is to prevent people from thinking that the result is text (a string).
Furthermore, if I interpret a raw byte, it is always non-negative. Or do
you interpret 0xFF as -1 when it appears in a raw byte stream?

-Why 'process_bytes' and not 'write', 'update', ...?
Well, naming is hard. I will stick to 'process_bytes' during design, but
I'm open to suggestions.

-Why not implement operator()?
Having a named function is more vocal (and clearer) than just
parentheses. IMHO operator() is only useful if the object will be used as
a functor (as in std::less), but the signature is too uncommon to be used
in any standard algorithm. Still, I'm willing to change if someone comes
up with a good example.

-Why not replace 'hash_value' with a conversion operator (operator result_type())?
What do you prefer?
auto result = myHash.hash_value();
or
auto result = static_cast<hash_function::result_type>(myHash);

-Why 'sha_512' and not 'sha2_512'?
'SHA-512' is the official name for the algorithm. I know it is
unfortunate, but better to be consistent with the official names.

-Why not add an iterator based process_bytes?
For now I consider it too complex.

-Why not add/delete algorithm XXX?
I think these are the most common. But I am open to suggestions.

-Why not use the naming of 'N3980: Types Don't Know #'?
This is already discussed above.

Open topics:
-How to handle the large state of a hash function?
Hash functions can have a large internal state (64 bytes for SHA-2, 200
bytes for SHA-3). Is it OK to put such objects on the stack, or do we
need to allocate them dynamically (using allocators)?

-How to hash a file?
Hashing a file is quite common. As the iterator interface was removed,
there is no easy way to hash a file (using istream_iterator). How should
it be done now?

-Sync with 'Types Don't Know #'

-Should we only add some crc classes (like crc_32, crc_16 or crc_ccitt)
or a generic (templated) crc algorithm (like in boost::crc)?

-Add 'nothrow' where applicable

-More naming discussions

-Find a suitable header file (maybe functional or algorithm)


regards
Markus
Zhihao Yuan
2014-08-23 21:17:58 UTC
I used boost::detail::sha1 in two projects but later I dropped it,
because it has a serious flaw: the hasher object becomes
unusable after you get the digest (it is finalized), but you
have no way to tell whether such an object is still usable
if it is returned from a function.

Python's hashlib has no such problem, and the solution is
very simple: just copy the digest before you finalize it.

Now my design is a thin wrapper around OpenSSL, and provides all
the functionality of Python's hashlib at compile time:

https://github.com/lichray/cpp-deuceclient/blob/master/src/hashlib.h
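
The copy-before-finalize idea can be sketched like this (illustrative
only, not the actual hashlib.h interface; RawHasher stands for any
copyable hasher type with a destructive finalize()):

#include <cstddef>

template <class RawHasher>
class hasher
{
    RawHasher state_;
public:
    using result_type = typename RawHasher::result_type;

    hasher& update(void const* p, std::size_t n)
    {
        state_.update(p, n);
        return *this;
    }

    // hashlib-style: finalize a *copy* of the state, so *this stays
    // usable and can keep accepting input after the digest is read.
    result_type digest() const
    {
        RawHasher copy(state_);
        return copy.finalize();
    }
};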
Post by Markus Mayer
hash_function& process_bytes( void const *buffer, std::size_t
byte_count);
void* is a very bad interface. It destroys type safety. A user
may call process_bytes(my_std_string, a_shorter_length)
and think he is getting the hash of a prefix of the string.
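
For example (an illustrative fragment), with a void const* parameter
both of these compile, but the first silently hashes bytes of the string
*object* (pointer, size, capacity), not its characters:

std::string s = "hello world";
hash_function h;              // any hasher with the void const* interface
h.process_bytes(&s, 5);       // compiles, but hashes the object header
h.process_bytes(s.data(), 5); // what was actually meant: "hello"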
Post by Markus Mayer
const result_type& hash_value();
The issue I explained above: we need to return a value
to make the object less stateful.
Post by Markus Mayer
-Why 'result_type'?
To be consistent with std::function.
It's not a function, so why be consistent? Plus, the TR1
function interface no longer matters.
Post by Markus Mayer
-Why not add an iterator based process_bytes?
For now I consider it to complex.
Yes. Iterators are for type-generic algorithms, while message
digests have much more limited input types.
Post by Markus Mayer
-How to handle the large state of a hash function?
Hash function can have a large internal state (64 Byte for sha2, 200 Byte
for sha3) is it OK to put such objects on the stack, or do we need to
allocate them dynamically (using allocators)?
I don't think we need to. 200 bytes does not look large to me,
compared with the state of pseudo-random number generators.
Post by Markus Mayer
-How to hash a files?
Hashing a file is quite common. As the iterator interface was removed,
there is no easy way to hash a file (using istream_iterator). How to do it
now?
I tried, but I found that the internal data flow of streambuf is
too complex to hook a hasher into.

Users can use an external buffer to do so, or use system
calls like read(2) directly. To me, it does not seem to be a
problem that must be solved in the standard.
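
For instance, the external-buffer approach might look like this (a
sketch, assuming the process_bytes/hash_value interface proposed above):

#include <cstddef>
#include <fstream>
#include <vector>

template <class Hasher>
typename Hasher::result_type hash_file(char const* path)
{
    std::ifstream in(path, std::ios::binary);
    std::vector<char> buf(1 << 16); // 64 KiB external buffer
    Hasher h;
    // Keep reading full buffers; the final, partial read still has
    // gcount() > 0 and is fed to the hasher before the loop exits.
    while (in.read(buf.data(), buf.size()) || in.gcount() > 0)
        h.process_bytes(buf.data(), static_cast<std::size_t>(in.gcount()));
    return h.hash_value();
}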
--
Zhihao Yuan, ID lichray
The best way to predict the future is to invent it.
___________________________________________________
4BSD -- http://bit.ly/blog4bsd
Howard Hinnant
2014-08-24 01:25:38 UTC
Post by Markus Mayer
class hash_function
{
typedef std::array<unsigned char, ALGORITHM_DEFINED> result_type;
//Default contructable
//Copyable
//Moveable
hash_function& process_bytes( void const *buffer, std::size_t byte_count);
void reset();
const result_type& hash_value();
};
//I am not sure about this function yet...
template<class hash>
typename hash::result_type calculate_hash(void const *buffer, std::size_t byte_count);
-md5
-sha_1
-sha_224
-sha_256
-sha_384
-sha_512
-sha3_224
-sha3_256
-sha3_384
-sha3_512
-Various flavors of crc
-Why 'result_type'?
To be consistent with std::function.
Agreed with result_type, but see below.
Post by Markus Mayer
-Why 'unsigned char' and not 'uint8_t' for result_type?
Most hash algorithms work on an 8 bit basis. But as long as unsigned char has a multiple of 8 bits, the algorithms can still be applied. So 'unsigned char' enables those architectures to implement the functions. Architectures where 'unsigned char' is not a multiple of 8 bits will be excluded by the proposal.
Ditto for uint8_t, except that if you use uint8_t, you'll find out about those excluded architectures at compile time instead of at run time.
Post by Markus Mayer
-Why not implement operator()?
Having a function (with a name) is more vocal (and clear) then just braces. IMHO Operator() is only useful if the object will be used as a functor (as in std::less). But the signature is to uncommon to be used in any standard algorithm. But I'm willing to change if someone came up with a good example.
The "Types Don't Know #" proposal used operator() because it made it much easier to design a type-erased hasher that could be used for pimpl types.

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3980.html#pimpl
Post by Markus Mayer
-Why not rename 'hash_value' to result_type()?
What do you prefer?
auto result = myHash.hash_value;
or
auto result = static_cast<hash_function::result_type>(myHash);
Or:

hash_function::result_type result{myHash};

auto is a double-edged sword. I personally love it. But I'm also finding it can be abused. Sometimes it is helpful to have the type of a variable be more explicit. But see below where it is shown that minimal code interfaces directly with hash algorithms. Only hash functors should call hash algorithms. And hash functors are what everybody else calls, such as unordered containers, or clients needing the result of a sha256. The distinction between hash functors and hash algorithms is what enables end-user-clients to easily swap out hash algorithms, simply by telling the hash functor to switch hash algorithms.
Post by Markus Mayer
-Sync with 'Types Don't Know #'
Here is a 'Types Don't Know #' version of sha256 based on the implementation at http://www.aarongifford.com/computers/sha.html

class sha256
{
    SHA256_CTX state_;
public:
    static constexpr std::endian endian = std::endian::big;
    using result_type = std::array<std::uint8_t, SHA256_DIGEST_LENGTH>;

    sha256() noexcept
    {
        SHA256_Init(&state_);
    }

    void
    operator()(void const* key, std::size_t len) noexcept
    {
        SHA256_Update(&state_, static_cast<uint8_t const*>(key), len);
    }

    explicit
    operator result_type() noexcept
    {
        result_type r;
        SHA256_Final(r.data(), &state_);
        return r;
    }
};

The types, constants, and functions starting with SHA256_ come from the C implementation at http://www.aarongifford.com/computers/sha.html. sha256 is copy constructible (and thus move constructible). It is not complicated -- a simple adaptor of existing C implementations. And it is easy and type-safe to use:

int
main()
{
    std::uhash<sha256> h;
    auto r = h(54);
}

r has type sha256::result_type, which is a 32-byte std::array. It hashes the int 54, converted to big endian prior to feeding it to the hash (if necessary). Little endian or native endian could just as easily have been chosen. One uses native endian if one doesn't care about the results being consistent across platforms with differing endianness. Note that the endian bits are lacking from N3980. They are a later refinement, strictly to aid fingerprinting applications such as sha256.

Note that neither clients such as the main() shown above nor unordered containers use sha256 directly, except to parameterize hash functions such as uhash. The hash functions such as uhash communicate directly with the hash algorithms such as sha256. This (compile-time) level of indirection is what makes it possible to very easily and quickly switch between sha256 and sha3_512, or siphash, etc.

std::uhash is an example, default, std-supplied hash_function, much like std::vector is an example std-supplied container. However std::uhash is far simpler than std::vector. Clients can very easily supply custom hash functors to replace std::uhash to do things such as seeding, padding and salting. For example, from:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3980.html#seeding

std::unordered_set<MyType, randomly_seeded_hash<acme::spooky>> my_set;

Using sha256 in this seeded context would fail at compile time, since sha256 is not seedable (at least not the implementation I used).

Finally note that if, instead of wanting to hash an int, I want to hash a string, it is simply:

int
main()
{
    std::uhash<sha256> h;
    auto r = h(std::string("54"));
}

The remarkable thing to note here is that neither int nor std::string has the slightest notion of sha256, nor even std::uhash. These types are hash-algorithm-agnostic. And this means if tomorrow, someone invents and implements super_sha, then all we have to do to use it is:

int
main()
{
    std::uhash<super_sha> h;
    auto r = h(std::string("54"));
}

This is in stark contrast to today's std::hash<T> model where one would have to revisit int, std::string, and all other types, to teach them about super_sha.

Howard
Miro Knejp
2014-08-24 03:38:06 UTC
Post by Howard Hinnant
Post by Markus Mayer
-Why 'unsigned char' and not 'uint8_t' for result_type?
Most hash algorithms work on an 8 bit basis. But as long as unsigned char has a multiple of 8 bits, the algorithms can still be applied. So 'unsigned char' enables those architectures to implement the functions. Architectures where 'unsigned char' is not a multiple of 8 bits will be excluded by the proposal.
Ditto for uint8_t, except that if you use uint8_t, you'll find out about those excluded architectures at compile time instead of at run time.
Wouldn't it be the job of the implementation on such platforms to make
the algorithm work on the machine? That an algorithm works on an 8 bit
basis does not mean it cannot be implemented on architectures with other
word sizes as long as the input size conforms to the algorithm's
requirements and instructions for the necessary bit fiddling are available.

Besides, such architectures would most likely have a freestanding
implementation (see 17.6.1.3 [compliance]) where most standard headers
are optional and excluded features must be documented, so I see no
reason to add this restriction to a proposal.

Miro
Howard Hinnant
2014-08-24 03:54:55 UTC
Post by Howard Hinnant
Post by Markus Mayer
-Why 'unsigned char' and not 'uint8_t' for result_type?
Most hash algorithms work on an 8 bit basis. But as long as unsigned char has a multiple of 8 bits, the algorithms can still be applied. So 'unsigned char' enables those architectures to implement the functions. Architectures where 'unsigned char' is not a multiple of 8 bits will be excluded by the proposal.
Ditto for uint8_t, except that if you use uint8_t, you'll find out about those excluded architectures at compile time instead of at run time.
Wouldn't it be the job of the implementation on such platforms to make the algorithm work on the machine? That an algorithm works on an 8 bit basis does not mean it cannot be implemented on architectures with other word sizes as long as the input size conforms to the algorithm's requirements and instructions for the necessary bit fiddling are available.
Besides, such architectures would most likely have a freestanding implementation (see 17.6.1.3 [compliance]) where most standard headers are optional and excluded features must be documented, so I see no reason to add this restriction to a proposal.
I misspoke. This sentence of mine is a side issue, not worthy of comment. My apologies for the noise.

Howard
Markus Mayer
2014-08-24 08:33:44 UTC
Post by Miro Knejp
Post by Howard Hinnant
Post by Markus Mayer
-Why 'unsigned char' and not 'uint8_t' for result_type?
Most hash algorithms work on an 8 bit basis. But as long as unsigned char has a multiple of 8 bits, the algorithms can still be applied. So 'unsigned char' enables those architectures to implement the functions. Architectures where 'unsigned char' is not a multiple of 8 bits will be excluded by the proposal.
Ditto for uint8_t, except that if you use uint8_t, you'll find out about those excluded architectures at compile time instead of at run time.
Wouldn't it be the job of the implementation on such platforms to make
the algorithm work on the machine? That an algorithm works on an 8 bit
basis does not mean it cannot be implemented on architectures with other
word sizes as long as the input size conforms to the algorithm's
requirements and instructions for the necessary bit fiddling are available.
Besides, such architectures would most likely have a /freestanding
implementation /(see 17.6.1.3 [compliance]) where most standard headers
are optional and excluded features must be documented, so I see no
reason to add this restriction to a proposal.
Miro
That's a really good point. I will use 'unsigned char' and let
implementations on odd-sized architectures decide for themselves
whether they implement it.
Ion GaztaƱaga
2014-08-24 13:15:44 UTC
Post by Markus Mayer
That's a really good point. I will use 'unsigned char' and let
implementations of odd-sized architectures decide themselves if they
implement it.
Although it's pretty similar, you can use uint_least8_t to express that
the result will be put in array storage of a type of which only 8 bits
are used, interpreted as an unsigned number.

That means that if SHA-256 is used, ALGORITHM_DEFINED will always be 32
on all platforms. Otherwise, on a platform with CHAR_BIT == 32 (say, a
32-bit DSP) one could store 32 bits in an unsigned char and
ALGORITHM_DEFINED would be 8 for SHA-256. This is important for the
user, as she has to correctly interpret how the result is stored.
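
In code, the distinction is simply (a sketch):

// With uint_least8_t the element count is fixed by the algorithm on
// every platform; only the low 8 bits of each element are used:
typedef std::array<std::uint_least8_t, 32> result_type; // SHA-256: always 32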

Best,

Ion
Jens Maurer
2014-08-24 06:44:57 UTC
Post by Howard Hinnant
Post by Markus Mayer
hash_function& process_bytes( void const *buffer, std::size_t byte_count);
I'd like to argue in favor of "unsigned char*", not "void*".
Post by Howard Hinnant
Post by Markus Mayer
-Why not implement operator()?
Having a function (with a name) is more vocal (and clear) then just braces. IMHO Operator() is only useful if the object will be used as a functor (as in std::less). But the signature is to uncommon to be used in any standard algorithm. But I'm willing to change if someone came up with a good example.
The "Types Don't Know #" proposal used operator() because it made it much easier to design a type-erased hasher that could be used for pimpl types.
Yes, operator() is better for an entity that does essentially just one thing.
Post by Howard Hinnant
Post by Markus Mayer
-Sync with 'Types Don't Know #'
Here is a 'Types Don't Know #' version of sha256 based on the implementation at http://www.aarongifford.com/computers/sha.html
class sha256
{
SHA256_CTX state_;
static constexpr std::endian endian = std::endian::big;
using result_type = std::array<std::uint8_t, SHA256_DIGEST_LENGTH>;
sha256() noexcept
{
SHA256_Init(&state_);
}
void
operator()(void const* key, std::size_t len) noexcept
{
SHA256_Update(&state_, static_cast<uint8_t const*>(key), len);
}
I'd like to have "unsigned char *" as the interface. Your implementation
makes it clear it doesn't work on non-8-bit-char machines, which is
fine, but that shouldn't constrain the interface.
Post by Howard Hinnant
int
main()
{
std::uhash<sha256> h;
auto r = h(54);
}
It's definitely the right approach to separate the core hash algorithm from
the preparation and adaptation of values of arbitrary type that get passed in.
Post by Howard Hinnant
r has type sha256::result_type, which is a 32 byte std::array. It hashes the int 54, converted to big endian prior to feeding it to the hash (if necessary). Little endian or native endian could just have easily been chosen. One uses native endian if you don't care about the results being consistent across platforms with differing endian. Note that the endian bits are lacking from N3980. They are a later refinement, strictly to aid fingerprinting applications such as sha256.
I'm not convinced the "endian" parts are well-designed as-is.

A hash function such as SHA256 should be defined to hash a sequence
of octets, accessed via "unsigned char *". At that level, endianness
doesn't matter. I agree there is a choice for std::uhash<> when
converting a, say, "int" to a sequence of "unsigned char"s whether to
perform endian conversion or not, and that influences the cross-platform
reproducibility of the hash when e.g. hashing a sequence of "int"
values.

Some hash functions that process more than one octet in one step
could omit repeated endian conversion when accumulating multiple
"unsigned char"s into a step-unit (e.g. uint64_t). Specializations
of std::uhash<> that exploit that might be useful, possibly with an
optional
operator()(const uint64_t *, size_t len)
interface for the individual hash function that makes the dependency
explicit.
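
Such an optional word-oriented overload might be declared like this
(a hypothetical sketch, not proposed wording):

#include <cstddef>
#include <cstdint>

class word_hasher
{
public:
    // Byte-oriented interface, always available.
    void operator()(unsigned char const* p, std::size_t n);

    // Optional fast path: the input is already a sequence of
    // step-units, so per-byte accumulation (and repeated endian
    // conversion) can be skipped.
    void operator()(std::uint64_t const* p, std::size_t n);
};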

Jens
Olaf van der Spek
2014-08-24 14:30:16 UTC
Post by Markus Mayer
Post by Markus Mayer
hash_function& process_bytes( void const *buffer, std::size_t
byte_count);
I'd like to argue in favor of "unsigned char*", not "void*".
(const void*, size_t) is a pretty standard signature; why use const
unsigned char*? It'd disallow easily hashing a string, for example.
Perhaps we should have (array_view<>) instead?
Markus Mayer
2014-08-24 16:41:07 UTC
Post by Markus Mayer
Post by Markus Mayer
hash_function& process_bytes( void const *buffer, std::size_t
byte_count);
I'd like to argue in favor of "unsigned char*", not "void*".
(const void*, size_t) is a pretty standard type, why use const unsigned
char*?
It'd disallow easily hashing a string for example.
Perhaps we should have (array_view<>) instead?
I reverted to (const void*, size_t). It is used by boost::crc and
boost::asio.

I think a common usage is:
myHashFunc.process_bytes(&myStruct, sizeof(myStruct));

array_view<> would require creating an additional object and writing
more code, so I'll stick with (const void*, size_t).
Jens Maurer
2014-08-24 21:38:02 UTC
Post by Markus Mayer
I reverted to (const void*, size_t). It is used by boost::crc and
boost::asio.
That doesn't mean it's the best choice.
Post by Markus Mayer
myHashFunc.process_bytes(&myStruct, sizeof(myStruct));
which is totally bogus, because it hashes all the padding inside
the struct, which likely has unspecified values.

Jens
Jens Maurer
2014-08-24 21:29:41 UTC
Post by Jens Maurer
Post by Markus Mayer
hash_function& process_bytes( void const *buffer, std::size_t byte_count);
I'd like to argue in favor of "unsigned char*", not "void*".
(const void*, size_t) is a pretty standard type, why use const unsigned char*?
It'd disallow easily hashing a string for example.
And it disallows (directly) hashing

struct S {
    char c;
    int x;
};

S s[10] = { };
my_hash(s, sizeof(s)); // bad: allowed with "void *" proposal

which is a good thing, because all the padding between "c" and "x"
has unspecified value, so your hash is very unpredictable.

Also, simply hashing
int x[] = { 0, 1, 2, 3 };
my_hash(x, sizeof(x));

doesn't give you a consistent hash value across different machines,
because the hash operates on bytes, but the endianness of the "int"
(i.e. the mapping to bytes) differs.
Post by Jens Maurer
Perhaps we should have (array_view<>) instead?
Well, array_view<unsigned char> is probably equivalent to the
"unsigned char" proposal.

(I'm also weakly in favor of an iterator-style
const unsigned char * first, const unsigned char * last
interface, because it seems to save one register in some
hash functions compared to the "pointer, length" style,
but it's certainly possible to convert.)

Jens
Olaf van der Spek
2014-08-25 15:21:23 UTC
Post by Jens Maurer
(const void*, size_t) is a pretty standard type, why use const unsigned char*?
It'd disallow easily hashing a string for example.
And it disallows (directly) hashing ...
which is a good thing, because all the padding between "c" and "x"
has unspecified value, so your hash is very unpredictable.
True
Also, simply hashing
int x[] = { 0, 1, 2, 3 };
my_hash(x, sizeof(x));
doesn't give you a consistent hash value across different machines,
because the hash operates on bytes, but the endianess of the "int"
(i.e. the mapping to bytes) differs.
Post by Jens Maurer
Perhaps we should have (array_view<>) instead?
Well, array_view<unsigned char> is probably equivalent to the
"unsigned char" proposal.
Kinda. Shouldn't both array_view<> and string_view<> be supported?
--
Olaf
Jens Maurer
2014-08-25 20:40:05 UTC
Post by Olaf van der Spek
Post by Jens Maurer
Also, simply hashing
int x[] = { 0, 1, 2, 3 };
my_hash(x, sizeof(x));
doesn't give you a consistent hash value across different machines,
because the hash operates on bytes, but the endianess of the "int"
(i.e. the mapping to bytes) differs.
Post by Olaf van der Spek
Perhaps we should have (array_view<>) instead?
Well, array_view<unsigned char> is probably equivalent to the
"unsigned char" proposal.
Kinda. Shouldn't both array_view<> and string_view<> be supported?
The mapping of application-level data (structs, strings etc.)
should be left to an adaptation layer, independent of the actual
hash algorithm. Howard's proposal has this part right.

Jens
Thiago Macieira
2014-08-24 15:29:51 UTC
Post by Jens Maurer
Yes, operator() is better for an entity that does essentially just one thing.
This one does more than one thing.
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358
Howard Hinnant
2014-08-24 17:39:53 UTC
Post by Jens Maurer
Post by Howard Hinnant
class sha256
{
SHA256_CTX state_;
static constexpr std::endian endian = std::endian::big;
I agree there is a choice for std::uhash<> when
converting a, say, "int" to a sequence of "unsigned char"s whether to
perform endian conversion or not, and that influences the cross-platform
reproducibility of the hash when e.g. hashing a sequence of "int"
values.
This is the only role of the endian specifier, which can be any one of:

static constexpr std::endian endian = std::endian::big;
static constexpr std::endian endian = std::endian::little;
static constexpr std::endian endian = std::endian::native;

std::endian::native will be equal to one of std::endian::little or std::endian::big. These enums are nothing more than the C++ version of the already popular macros: __BYTE_ORDER__, __ORDER_LITTLE_ENDIAN__, __ORDER_BIG_ENDIAN__.
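
On a GCC/Clang-style compiler the enum could be defined directly in
terms of those macros (a sketch of one possible definition, not
standard wording):

enum class endian
{
    little = __ORDER_LITTLE_ENDIAN__,
    big    = __ORDER_BIG_ENDIAN__,
    native = __BYTE_ORDER__   // equals one of the above on most targets
};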

And endian is really not even used directly by the hash functor uhash<>:

template <class Hasher = acme::siphash>
struct uhash
{
    using result_type = typename Hasher::result_type;

    template <class T>
    result_type
    operator()(T const& t) const noexcept
    {
        Hasher h;
        hash_append(h, t);
        return static_cast<result_type>(h);
    }
};

Instead there are now two traits:

template <class T> struct is_uniquely_represented;
template <class T, class HashAlgorithm> struct is_contiguously_hashable;

The first, is_uniquely_represented, is exactly what N3980 called is_contiguously_hashable. And is_contiguously_hashable has evolved into:

template <class T, class HashAlgorithm>
struct is_contiguously_hashable
    : public std::integral_constant<bool, is_uniquely_represented<T>{} &&
                                          (sizeof(T) == 1 ||
                                           HashAlgorithm::endian == endian::native)>
{};

That is, whether or not a type T is contiguously hashable depends not only on if the type is_uniquely_represented, but also on whether the HashAlgorithm needs to reverse the bytes of T before consuming it. If the HashAlgorithm's *requested endian* (HashAlgorithm::endian) is the same as the platform's *native endian* (endian::native), then the bytes do not need to be reversed prior to feeding T to the HashAlgorithm. And thus if T is also uniquely represented, then one can feed T (or an array of T's) directly to the HashAlgorithm. This is the job of this function:

template <class Hasher, class T>
inline
std::enable_if_t
<
    is_contiguously_hashable<T, Hasher>{}
>
hash_append(Hasher& h, T const& t) noexcept
{
    h(std::addressof(t), sizeof(t));
}

If we are dealing with a platform/HashAlgorithm disagreement in endian, then an alternative hash_append can be used for scalars:

template <class Hasher, class T>
inline
std::enable_if_t
<
    !is_contiguously_hashable<T, Hasher>{} &&
    (std::is_integral<T>{} || std::is_pointer<T>{} || std::is_enum<T>{})
>
hash_append(Hasher& h, T t) noexcept
{
    detail::reverse_bytes(t);
    h(std::addressof(t), sizeof(t));
}

Although it doesn't look like it in the source code, reverse_bytes is carefully crafted to compile down to x86 instructions such as bswapl, at least on clang/OSX, and so is maximally efficient.

So in summary, when uhash<HashAlgorithm> says:

hash_append(h, t);

A correct hash_append will be chosen (at compile time), which for scalar types that have endian issues, may or may not reverse the bytes of t depending on what the hash algorithm has requested, and what the platform's native endian is.

Most hash functions won't care about the endian of the scalars fed to them, and they can indicate this by requesting the native endian, whatever that is:

static constexpr std::endian endian = std::endian::native;

Given an underlying C implementation of a hash algorithm (such as SHA-256), it would be quite easy to write adaptors around that C code with varying endian requirements, so that different parts of your code could use SHA-256 but reverse, or not reverse scalars as required:

class sha256
{
    SHA256_CTX state_;
public:
    static constexpr xstd::endian endian = xstd::endian::native;
    // ...
};

class sha256_little
{
    SHA256_CTX state_;
public:
    static constexpr xstd::endian endian = xstd::endian::little;
    // ...
};

// ...

uhash<sha256> h1;        // don't worry about endian
uhash<sha256_little> h2; // ensure scalars are little endian prior to hashing

Finally note that the implementation of hash_append is made simpler by the use of the (const void*, size_t) interface, as opposed to a (const unsigned char*, size_t) interface. With the latter, one would have to code:

template <class Hasher, class T>
inline
std::enable_if_t
<
    is_contiguously_hashable<T, Hasher>{}
>
hash_append(Hasher& h, T const& t) noexcept
{
    h(reinterpret_cast<const unsigned char*>(std::addressof(t)), sizeof(t));
}

See https://github.com/HowardHinnant/hash_append for complete code.

Howard
Myriachan
2014-08-24 21:01:37 UTC
Post by Howard Hinnant
Post by Jens Maurer
I agree there is a choice for std::uhash<> when
converting a, say, "int" to a sequence of "unsigned char"s whether to
perform endian conversion or not, and that influences the cross-platform
reproducibility of the hash when e.g. hashing a sequence of "int"
values.
Sadly, C++ still supports non-two's-complement architectures, so what happens when you hash a negative int?
Post by Howard Hinnant
static constexpr std::endian endian = std::endian::big;
static constexpr std::endian endian = std::endian::little;
static constexpr std::endian endian = std::endian::native;
std::endian::native will be equal to one of std::endian::little or std::endian::big. These enums are nothing more than the C++ version of the already popular macros: __BYTE_ORDER__, __ORDER_LITTLE_ENDIAN__, __ORDER_BIG_ENDIAN__.
What about PDP-endian?
Post by Howard Hinnant
Although it doesn't look like it in the source code, reverse_bytes is carefully crafted to compile down to x86 instructions such as bswapl, at least on clang/OSX, and so is maximally efficient.
On x86:

For 16-bit swaps, use "rol reg, 8". "xchg al, ah" and equivalent work, but may be slower.
For 32-bit swaps, use "bswap reg".
For 64-bit swaps on x86-32, use "bswap reg", but use the opposite registers for results.
For 64-bit swaps on x86-64, use "bswap reg".

On GCC and Clang, the intrinsics for the above are __builtin_bswap16, __builtin_bswap32 and __builtin_bswap64
On Visual C++, the intrinsics for the above are _byteswap_ushort, _byteswap_ulong and _byteswap_uint64 respectively.
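
Wrapped into a single portable helper, the 32-bit case might look like
this (a sketch using the intrinsics listed above, with a plain-C++
fallback):

#include <cstdint>
#if defined(_MSC_VER)
#include <stdlib.h> // _byteswap_ulong
#endif

inline std::uint32_t bswap32(std::uint32_t v)
{
#if defined(_MSC_VER)
    return _byteswap_ulong(v);   // Visual C++ intrinsic
#elif defined(__GNUC__) || defined(__clang__)
    return __builtin_bswap32(v); // GCC/Clang intrinsic
#else
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24); // portable fallback
#endif
}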
Post by Howard Hinnant
A correct hash_append will be chosen (at compile time), which for scalar
types that have endian issues, may or may not reverse the bytes of t
depending on what the hash algorithm has requested, and what the
platform's native endian is.
Does this mechanism support handling multiple scalars at once? On some systems, byte swapping may be more efficient if done as vector math.
Post by Howard Hinnant
See https://github.com/HowardHinnant/hash_append for complete code.
Its SHA-256 code is unsuitable for any platform for which "int" is larger than 32 bits. It will have undefined behavior in those cases due to signed integer overflow or shifting left a negative number.

Melissa
Howard Hinnant
2014-08-24 23:21:17 UTC
Post by Myriachan
Post by Howard Hinnant
Post by Jens Maurer
I agree there is a choice for std::uhash<> when
converting a, say, "int" to a sequence of "unsigned char"s whether to
perform endian conversion or not, and that influences the cross-platform
reproducibility of the hash when e.g. hashing a sequence of "int"
values.
Sadly, C++ still supports non-two's-complement architectures, so what happens when you hash a negative int?
I presume the int's bytes would be hashed just like any other scalar.
Post by Myriachan
Post by Howard Hinnant
static constexpr std::endian endian = std::endian::big;
static constexpr std::endian endian = std::endian::little;
static constexpr std::endian endian = std::endian::native;
std::endian::native will be equal to one of std::endian::little or std::endian::big. These enums are nothing more than the C++ version of the already popular macros: __BYTE_ORDER__, __ORDER_LITTLE_ENDIAN__, __ORDER_BIG_ENDIAN__.
What about PDP-endian?
I stand corrected. On PDP-endian std::endian::native has a value that is not equal to either of std::endian::little or std::endian::big. If the HashAlgorithm specifies little or big endian, the implementation is required to map the bytes of the int (or whatever) to little or big endian prior to feeding it to the HashAlgorithm. For std::endian::big, this is the exact same thing as inserting a call to hton() prior to the HashAlgorithm.
Post by Myriachan
By the late 1990s, not only DEC but most of the New England computer industry which had been built around minicomputers similar to the PDP-11 collapsed in the face of microcomputer-based workstations and servers.
Post by Howard Hinnant
A correct hash_append will be chosen (at compile time), which for scalar
types that have endian issues, may or may not reverse the bytes of t
depending on what the hash algorithm has requested, and what the
platform's native endian is.
Does this mechanism support handling multiple scalars at once? On some systems, byte swapping may be more efficient if does as vector math.
An implementation could conceivably create a hash_append overload on arrays of int and optimize that.
Post by Myriachan
Post by Howard Hinnant
See https://github.com/HowardHinnant/hash_append for complete code.
Its SHA-256 code is unsuitable for any platform for which "int" is larger than 32 bits. It will have undefined behavior in those cases due to signed integer overflow or shifting left a negative number.
Thanks, I'll put in a static_assert.

Howard
'Geoffrey Romer' via ISO C++ Standard - Future Proposals
2014-08-26 18:26:15 UTC
Post by Howard Hinnant
Post by Myriachan
Post by Jens Maurer
I agree there is a choice for std::uhash<> when
converting a, say, "int" to a sequence of "unsigned char"s whether to
perform endian conversion or not, and that influences the cross-platform
reproducibility of the hash when e.g. hashing a sequence of "int"
values.
Sadly, C++ still supports non-two's-complement architectures, so what
happens when you hash a negative int?
I presume the int's bytes would be hashed just like any other scalar.
OK, so the output of this hasher is implementation-defined, and not
suitable for use cases that require cross-binary reproducibility. I think
that's the only reasonable choice, but then why bother with the whole
endian rigamarole?
Post by Jens Maurer
Post by Myriachan
Post by Howard Hinnant
static constexpr std::endian endian = std::endian::big;
static constexpr std::endian endian = std::endian::little;
static constexpr std::endian endian = std::endian::native;
std::endian::native will be equal to one of std::endian::little or
std::endian::big. These enums are nothing more than the C++ version of the
already popular macros: __BYTE_ORDER__, __ORDER_LITTLE_ENDIAN__,
__ORDER_BIG_ENDIAN__.
Post by Myriachan
What about PDP-endian?
I stand corrected. On PDP-endian std::endian::native has a value that is
not equal to either of std::endian::little or std::endian::big. If the
HashAlgorithm specifies little or big endian, the implementation is
required to map the bytes of the int (or whatever) to little or big endian
prior to feeding it to the HashAlgorithm. For std::endian::big, this is
the exact same thing as inserting a call to hton() prior to the
HashAlgorithm.
Post by Myriachan
By the late 1990s, not only DEC but most of the New England computer
industry which had been built around minicomputers similar to the PDP-11
collapsed in the face of microcomputer-based workstations and servers.
Post by Myriachan
Post by Howard Hinnant
A correct hash_append will be chosen (at compile time), which for scalar
types that have endian issues, may or may not reverse the bytes of t
depending on what the hash algorithm has requested, and what the
platform's native endian is.
Does this mechanism support handling multiple scalars at once? On some
systems, byte swapping may be more efficient if does as vector math.
An implementation could conceivably create a hash_append overload on
arrays of int and optimize that.
Post by Myriachan
Post by Howard Hinnant
See https://github.com/HowardHinnant/hash_append for complete code.
Its SHA-256 code is unsuitable for any platform for which "int" is
larger than 32 bits. It will have undefined behavior in those cases due to
signed integer overflow or shifting left a negative number.
Thanks, I'll put in a static_assert.
Howard
Howard Hinnant
2014-08-26 20:02:38 UTC
Post by 'Geoffrey Romer' via ISO C++ Standard - Future Proposals
OK, so the output of this hasher is implementation-defined, and not suitable for use cases that require cross-binary reproducibility. I think that's the only reasonable choice, but then why bother with the whole endian rigamarole?
Because 98% of the platforms that actually have a modern C++ compiler have a pretty standard scalar layout that differs only by endian (or word size). That is probably useful to somebody. No need to take that functionality away just because it wouldn't have worked on a platform that stopped being sold two decades ago.

Howard
--
---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+***@isocpp.org.
To post to this group, send email to std-***@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.
'Geoffrey Romer' via ISO C++ Standard - Future Proposals
2014-08-26 21:28:17 UTC
On Aug 26, 2014, at 2:26 PM, 'Geoffrey Romer' via ISO C++ Standard -
Post by 'Geoffrey Romer' via ISO C++ Standard - Future Proposals
OK, so the output of this hasher is implementation-defined, and not
suitable for use cases that require cross-binary reproducibility. I think
that's the only reasonable choice, but then why bother with the whole
endian rigamarole?
Because 98% of the platforms that actually have a modern C++ compiler have
a pretty standard scalar layout that differs only by endian (or word
size).
Good point about word size, but I don't see where your proposal handles it.
That is probably to be useful to somebody. No need to take that
functionality away just because it wouldn't have worked on a platform that
stopped being sold two decades ago.
OK, let's assume that you're right, and in practice everyone will use the
same byte representation for int (up to endianness and maybe word size).
Can you say the same for an arbitrary type T? In other words, when I define
a hash_append overload, am I required to guarantee that the byte
representation it produces will not vary across platforms, or over time?

You are no doubt right that this functionality will "probably be useful to
somebody", but that's a fairly low bar to clear. Can you point to a
specific use case that would benefit from a generic fingerprinting (as
opposed to hashing) API?
Howard
Howard Hinnant
2014-08-26 22:13:40 UTC
Post by 'Geoffrey Romer' via ISO C++ Standard - Future Proposals
Post by Howard Hinnant
OK, so the output of this hasher is implementation-defined, and not suitable for use cases that require cross-binary reproducibility. I think that's the only reasonable choice, but then why bother with the whole endian rigamarole?
Because 98% of the platforms that actually have a modern C++ compiler have a pretty standard scalar layout that differs only by endian (or word size).
Good point about word size, but I don't see where your proposal handles it.
One can hash a std::int32_t (for example) if you want to specify integral size. Technically, std::int32_t is optional in the standard. But lots of platforms have a std::int32_t, and the client doesn't have to care if that is a short, int or long.
Post by 'Geoffrey Romer' via ISO C++ Standard - Future Proposals
Post by Howard Hinnant
That is probably to be useful to somebody. No need to take that functionality away just because it wouldn't have worked on a platform that stopped being sold two decades ago.
OK, let's assume that you're right, and in practice everyone will use the same byte representation for int (up to endianness and maybe word size). Can you say the same for an arbitrary type T? In other words, when I define a hash_append overload, am I required to guarantee that the byte representation it produces will not vary across platforms, or over time?
I missed the smiley on the end of that sentence. You're kidding me, right?!

If we restrict T to the set of scalars, and if we say that hash_append is part of the std::lib, then the std::lib implementors will be required to provide a hash_append as the standard specifies for all scalars, and hopefully for many std-defined types as well. That's all a standard could possibly do.
Post by 'Geoffrey Romer' via ISO C++ Standard - Future Proposals
You are no doubt right that this functionality will "probably be useful to somebody", but that's a fairly low bar to clear. Can you point to a specific use case that would benefit from a generic fingerprinting (as opposed to hashing) API?
I can well imagine that an application that implements a secure financial P2P protocol consisting of a history of public ledgers might use SHA-256 as a means of securing the integrity of those ledgers. And I can imagine that it might be desirable to have that application run on platforms of differing endian. Say PPC and x86. When a 64-bit integral type is part of a message that gets hashed by the SHA-256 hash algorithm, it will be important that the PPC machine and the x86 machine compute the same result. Otherwise the two machines will not be able to agree upon the contents of the public ledger. Think bitcoin. Or perhaps ripple: https://ripple.com.

Such applications will also have to limit their porting to that tiny slice of platforms that use ASCII/Unicode as well. There will be no porting to EBCDIC-based platforms. :-)

Howard
'Geoffrey Romer' via ISO C++ Standard - Future Proposals
2014-08-27 15:30:09 UTC
Post by Howard Hinnant
Post by 'Geoffrey Romer' via ISO C++ Standard - Future Proposals
OK, so the output of this hasher is implementation-defined, and not
suitable for use cases that require cross-binary reproducibility. I think
that's the only reasonable choice, but then why bother with the whole
endian rigamarole?
Because 98% of the platforms that actually have a modern C++ compiler
have a pretty standard scalar layout that differs only by endian (or word
size).
Post by 'Geoffrey Romer' via ISO C++ Standard - Future Proposals
Good point about word size, but I don't see where your proposal handles it.
One can hash a std::int32_t (for example) if you want to specify integral
size. Technically, std::int32_t is optional in the standard. But lots of
platforms have a std::int32_t, and the client doesn't have to care if that
is a short, int or long.
That is probably to be useful to somebody. No need to take that
functionality away just because it wouldn't have worked on a platform that
stopped being sold two decades ago.
Post by 'Geoffrey Romer' via ISO C++ Standard - Future Proposals
OK, let's assume that you're right, and in practice everyone will use
the same byte representation for int (up to endianness and maybe word
size). Can you say the same for an arbitrary type T? In other words, when I
define a hash_append overload, am I required to guarantee that the byte
representation it produces will not vary across platforms, or over time?
I missed the smiley on the end of that sentence. You're kidding me, right?!
If we restrict T to the set of scalars. And if we say that hash_append is
part of the std::lib. Then the std::lib implementors will be required to
provide a hash_append as the standard specifies for all scalars. Hopefully
for many std-defined types as well. That's all a standard could possibly
do.
Imposing a contract on users of a standard-defined extension point is well
within the standard's purview; see std::hash, to name a particularly
salient example. I agree that the standard shouldn't impose this particular
requirement, but only because it would be exceedingly onerous, not because
the standard can't do it even in principle.
Post by 'Geoffrey Romer' via ISO C++ Standard - Future Proposals
You are no doubt right that this functionality will "probably be useful
to somebody", but that's a fairly low bar to clear. Can you point to a
specific use case that would benefit from a generic fingerprinting (as
opposed to hashing) API?
Post by Howard Hinnant
I can well imagine that an application that implements a secure financial
P2P protocol consisting of a history of public ledgers might use SHA-256 as
a means of securing the integrity of those ledgers. And I can imagine that
it might be desirable to have that application run on platforms of
differing endian. Say PPC and x86. When a 64 byte integral type is part
of a message that gets hashed by the SHA-256 hash algorithm, it will be
important that when this is computed by both the PPC machine and the x86
machine, that they get the same result. Otherwise the two machines will
not be able to agree upon the contents of the public ledger. Think
bitcoin. Or perhaps ripple: https://ripple.com.
How does this use case benefit from a _generic_ fingerprinting API? How
does it benefit from using the same hash_append overloads for both
fingerprinting and in-memory hashing?

Don't get me wrong, this design sounds plausible for this particular use
case; I'm concerned that it may couple the protocol too closely to the C++
implementation, which may create rigidities as the two evolve, but no doubt
the engineers implementing the system have thought that through (or would
have, since this is of course entirely hypothetical ;-) ). However, what
makes this case plausible is the fact that cryptographic fingerprinting is
at the very core of the application, and so every class that's being
fingerprinted has been designed to support efficient, reliable
fingerprinting. That being the case, reusing the existing fingerprint
infrastructure for ordinary in-memory hashing makes sense.

However, if this API is standardized, it seems self-evident that the vast
majority of classes, and in particular their hash_append overloads, will be
written without much attention to fingerprinting, since in-memory hashing
is a much more common use case than persistent fingerprinting.
Consequently, hash_append authors are very likely to do things that are
inappropriate for fingerprinting, such as ignoring the endian flags, using
types with implementation-defined sizes, or changing hash_append to reflect
changes in the underlying class.

In an environment where fingerprinting is not a central concern, using
hash_append for both hashing and fingerprinting seems very likely to lead
to bugs. These would be the worst sort of bugs, too: bugs where the code
initially works just fine, and then breaks months or years later, after
you've had plenty of time for your faulty assumptions to become deeply
embedded in both the codebase and the protocol/serialization format that it
implements.
Post by Howard Hinnant
Such applications will also have to limit their porting to that tiny slice
of platforms that use ASCII/Unicode as well. There will be no porting to
EBCDIC-based platforms. :-)
This is doubtless the right choice for this application, but I don't think
the standard can afford to be so cavalier. Or maybe it can, but if so it
should start with the core language, not with library extensions (I for one
would be perfectly happy for C++ to standardize on Unicode, and for that
matter on two's complement little-endian LP64 integers and IEEE 754
floating-point, but I get the feeling there are good reasons that hasn't
happened). We can't just write off marginal platforms in the library while
claiming to support them in the language, _especially_ if our degraded
support comes in the form of correctness bugs, rather than build failures
or degraded performance.
Howard
'Jeffrey Yasskin' via ISO C++ Standard - Future Proposals
2014-08-27 16:05:32 UTC
Permalink
On Wed, Aug 27, 2014 at 8:30 AM, 'Geoffrey Romer' via ISO C++ Standard
Post by 'Geoffrey Romer' via ISO C++ Standard - Future Proposals
On Aug 26, 2014, at 5:28 PM, 'Geoffrey Romer' via ISO C++ Standard -
Post by 'Geoffrey Romer' via ISO C++ Standard - Future Proposals
On Tue, Aug 26, 2014 at 1:02 PM, Howard Hinnant
On Aug 26, 2014, at 2:26 PM, 'Geoffrey Romer' via ISO C++ Standard -
Post by 'Geoffrey Romer' via ISO C++ Standard - Future Proposals
OK, so the output of this hasher is implementation-defined, and not
suitable for use cases that require cross-binary reproducibility. I think
that's the only reasonable choice, but then why bother with the whole endian
rigamarole?
Because 98% of the platforms that actually have a modern C++ compiler
have a pretty standard scalar layout that differs only by endian (or word
size).
Good point about word size, but I don't see where your proposal handles it.
One can hash a std::int32_t (for example) if you want to specify integral
size. Technically, std::int32_t is optional in the standard. But lots of
platforms have a std::int32_t, and the client doesn't have to care if that
is a short, int or long.
Post by 'Geoffrey Romer' via ISO C++ Standard - Future Proposals
That is probably going to be useful to somebody. No need to take that
functionality away just because it wouldn't have worked on a platform that
stopped being sold two decades ago.
OK, let's assume that you're right, and in practice everyone will use
the same byte representation for int (up to endianness and maybe word size).
Can you say the same for an arbitrary type T? In other words, when I define
a hash_append overload, am I required to guarantee that the byte
representation it produces will not vary across platforms, or over time?
I missed the smiley on the end of that sentence. You're kidding me, right?!
If we restrict T to the set of scalars. And if we say that hash_append is
part of the std::lib. Then the std::lib implementors will be required to
provide a hash_append as the standard specifies for all scalars. Hopefully
for many std-defined types as well. That's all a standard could possibly
do.
Imposing a contract on users of a standard-defined extension point is well
within the standard's purview; see std::hash, to name a particularly salient
example. I agree that the standard shouldn't impose this particular
requirement, but only because it would be exceedingly onerous, not because
the standard can't do it even in principle.
Post by 'Geoffrey Romer' via ISO C++ Standard - Future Proposals
You are no doubt right that this functionality will "probably be useful
to somebody", but that's a fairly low bar to clear. Can you point to a
specific use case that would benefit from a generic fingerprinting (as
opposed to hashing) API?
I can well imagine that an application that implements a secure financial
P2P protocol consisting of a history of public ledgers might use SHA-256 as
a means of securing the integrity of those ledgers. And I can imagine that
it might be desirable to have that application run on platforms of differing
endian. Say PPC and x86. When a 64 bit integral type is part of a message
that gets hashed by the SHA-256 hash algorithm, it will be important that
when this is computed by both the PPC machine and the x86 machine, they
get the same result. Otherwise the two machines will not be able to agree
upon the contents of the public ledger. Think bitcoin. Or perhaps ripple:
https://ripple.com.
How does this use case benefit from a _generic_ fingerprinting API? How does
it benefit from using the same hash_append overloads for both fingerprinting
and in-memory hashing?
Don't get me wrong, this design sounds plausible for this particular use
case; I'm concerned that it may couple the protocol too closely to the C++
implementation, which may create rigidities as the two evolve, but no doubt
the engineers implementing the system have thought that through (or would
have, since this is of course entirely hypothetical ;-) ). However, what
makes this case plausible is the fact that cryptographic fingerprinting is
at the very core of the application, and so every class that's being
fingerprinted has been designed to support efficient, reliable
fingerprinting. That being the case, reusing the existing fingerprint
infrastructure for ordinary in-memory hashing makes sense.
However, if this API is standardized, it seems self-evident that the vast
majority of classes, and in particular their hash_append overloads, will be
written without much attention to fingerprinting, since in-memory hashing is
a much more common use case than persistent fingerprinting. Consequently,
hash_append authors are very likely to do things that are inappropriate for
fingerprinting, such as ignoring the endian flags, using types with
implementation-defined sizes, or changing hash_append to reflect changes in
the underlying class.
In an environment where fingerprinting is not a central concern, using
hash_append for both hashing and fingerprinting seems very likely to lead to
bugs. These would be the worst sort of bugs, too: bugs where the code
initially works just fine, and then breaks months or years later, after
you've had plenty of time for your faulty assumptions to become deeply
embedded in both the codebase and the protocol/serialization format that it
implements.
Such applications will also have to limit their porting to that tiny slice
of platforms that use ASCII/Unicode as well. There will be no porting to
EBCDIC-based platforms. :-)
This is doubtless the right choice for this application, but I don't think
the standard can afford to be so cavalier. Or maybe it can, but if so it
should start with the core language, not with library extensions (I for one
would be perfectly happy for C++ to standardize on Unicode, and for that
matter on two's complement little-endian LP64 integers and IEEE 754
floating-point, but I get the feeling there are good reasons that hasn't
happened). We can't just write off marginal platforms in the library while
claiming to support them in the language, _especially_ if our degraded
support comes in the form of correctness bugs, rather than build failures or
degraded performance.
I am suspicious that I don't know of any other platform that provides
cryptographic hash functions as a generic traversal over user-defined
data types. They all require byte-string inputs instead. If we did
that for the C++ library, users would need to explicitly serialize
their types before feeding them into the cryptographic hash, which
doesn't really seem that bad, especially if we have a way to stream
the serialization into the fingerprint function so there never needs
to be a large string allocated in memory.
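For concreteness, a minimal sketch of such a streaming sink (every name here is hypothetical, nothing proposed): the serializer writes chunks to a sink object, and the hashing sink forwards each chunk straight into the hasher, so no intermediate string is ever materialized.

#include <cstddef>

template <class Hasher>
struct hashing_sink
{
    Hasher& h;
    void write(void const* p, std::size_t n) { h(p, n); }  // forward, don't buffer
};

// Usage, given a user-written serialize(obj, sink) that calls
// sink.write(...) chunk by chunk:
//
//   sha256 h;
//   serialize(obj, hashing_sink<sha256>{h});
//   auto digest = static_cast<sha256::result_type>(h);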

I went back to check
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3980.html#serialization,
and the problem Howard pointed out is that hashing wants to treat 0.0
and -0.0 as equivalent. But I don't see a reason to think that
*fingerprinting* needs to treat them as equivalent, especially if the
serialization code wants to treat them as distinct behaviors. If
you're generating a signature of your current state, and that state
depends on the difference between 0.0 and -0.0, I think you'd want to
incorporate that into the signature.
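As a sketch of that distinction (assuming the Hasher call operator used elsewhere in this thread; illustration only, not proposed wording): an in-memory hashing overload can canonicalize the zeroes, while a fingerprinting overload would simply omit the first two lines and hash the raw bit pattern.

#include <cstring>

template <class Hasher>
void hash_append(Hasher& h, double d)
{
    if (d == 0.0)   // true for both +0.0 and -0.0
        d = 0.0;    // canonicalize away the sign of zero
    unsigned char bytes[sizeof d];
    std::memcpy(bytes, &d, sizeof d);   // the object's raw bytes
    h(bytes, sizeof bytes);
}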

Jeffrey
Jens Maurer
2014-08-27 21:38:44 UTC
Permalink
Post by 'Jeffrey Yasskin' via ISO C++ Standard - Future Proposals
I am suspicious that I don't know of any other platform that provides
cryptographic hash functions as a generic traversal over user-defined
data types. They all require byte-string inputs instead. If we did
that for the C++ library, users would need to explicitly serialize
their types before feeding them into the cryptographic hash, which
doesn't really seem that bad, especially if we have a way to stream
the serialization into the fingerprint function so there never needs
to be a large string allocated in memory.
But hash_append() is exactly that: A "serialization" that streams
directly into the cryptographic hash. You have to define hash_append()
for your own types (thereby defining the serialization format, and
which parts are salient for the hash and which ones aren't), and you
use hash_append() on scalar types and strings as defined by the standard
library.
Post by 'Jeffrey Yasskin' via ISO C++ Standard - Future Proposals
I went back to check
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3980.html#serialization,
and the problem Howard pointed out is that hashing wants to treat 0.0
and -0.0 as equivalent.
It will be hard to specify the details of hash_append(double) ,
given that C++ says nearly nothing about floating-point behavior,
in general.
Post by 'Jeffrey Yasskin' via ISO C++ Standard - Future Proposals
But I don't see a reason to think that
*fingerprinting* needs to treat them as equivalent, especially if the
serialization code wants to treat them as distinct behaviors. If
you're generating a signature of your current state, and that state
depends on the difference between 0.0 and -0.0, I think you'd want to
incorporate that into the signature.
I think/hope hash_append() provides enough customization points to deal
with that. For example, you could define your own SHA256 as a thin
wrapper around the library-provided one. Now you have a customization
point for hash_append(sha256, double) to do what you want.
You can also distinguish that crypto-hashing from other kinds
of hashing, if you're so inclined.

Yes, it's rather brittle if you switch your hash from sha256 to
something else. If you want to avoid that, we should ensure the
design is flexible enough that I can have my own
my_crypto_hash_append() that does what I want, without too much
repetition of code. Yep, I can't use std::uhash<> any more, but
that should contain only trivial code, anyway.
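Concretely, the wrapper could be as thin as this (sha256's update interface is stubbed to match the thread's design; my_sha256 and its double policy are made-up names, not proposals):

#include <cstddef>
#include <cstring>

struct sha256   // stand-in for the library-provided hasher
{
    void operator()(void const* p, std::size_t n) noexcept { /* update state */ }
};

struct my_sha256 : sha256       // distinct type, so overload resolution
{                               // can pick our own customizations
    using sha256::operator();
};

// Our fingerprinting policy: keep the raw bit pattern, so +0.0 and
// -0.0 stay distinct.
void hash_append(my_sha256& h, double d)
{
    unsigned char bytes[sizeof d];
    std::memcpy(bytes, &d, sizeof d);
    h(bytes, sizeof bytes);
}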

Jens
Jens Maurer
2014-08-24 21:35:48 UTC
Permalink
Post by Howard Hinnant
Post by Jens Maurer
Post by Howard Hinnant
class sha256
{
SHA256_CTX state_;
static constexpr std::endian endian = std::endian::big;
I agree there is a choice for std::uhash<> when
converting a, say, "int" to a sequence of "unsigned char"s whether to
perform endian conversion or not, and that influences the cross-platform
reproducibility of the hash when e.g. hashing a sequence of "int"
values.
static constexpr std::endian endian = std::endian::big;
static constexpr std::endian endian = std::endian::little;
static constexpr std::endian endian = std::endian::native;
std::endian::native will be equal to one of std::endian::little or std::endian::big. These enums are nothing more than the C++ version of the already popular macros: __BYTE_ORDER__, __ORDER_LITTLE_ENDIAN__, __ORDER_BIG_ENDIAN__.
We need a better abstraction for this. C++ makes no assumption on
endianess, and there is certainly more than big and little
(VAX-endian springs to mind).
Post by Howard Hinnant
template <class Hasher = acme::siphash>
struct uhash
{
using result_type = typename Hasher::result_type;
template <class T>
result_type
operator()(T const& t) const noexcept
{
Hasher h;
hash_append(h, t);
return static_cast<result_type>(h);
}
};
Uh, is there some incremental functionality (without finalizing)
as well, and a variadic template overload?

Jens
Howard Hinnant
2014-08-24 23:21:42 UTC
Permalink
Post by Jens Maurer
Post by Howard Hinnant
Post by Jens Maurer
Post by Howard Hinnant
class sha256
{
SHA256_CTX state_;
static constexpr std::endian endian = std::endian::big;
I agree there is a choice for std::uhash<> when
converting a, say, "int" to a sequence of "unsigned char"s whether to
perform endian conversion or not, and that influences the cross-platform
reproducibility of the hash when e.g. hashing a sequence of "int"
values.
static constexpr std::endian endian = std::endian::big;
static constexpr std::endian endian = std::endian::little;
static constexpr std::endian endian = std::endian::native;
std::endian::native will be equal to one of std::endian::little or std::endian::big. These enums are nothing more than the C++ version of the already popular macros: __BYTE_ORDER__, __ORDER_LITTLE_ENDIAN__, __ORDER_BIG_ENDIAN__.
We need a better abstraction for this. C++ makes no assumption on
endianess, and there is certainly more than big and little
(VAX-endian springs to mind).
I have a hard time getting excited about hardware that you can't find outside of a museum. I do have fond memories of learning the VAX operating system, but that was a very long time ago. However, on such a machine std::endian::native has a value that is not equal to either of std::endian::little or std::endian::big. If the HashAlgorithm specifies little or big endian, the implementation is required to map the bytes of the int (or whatever) to little or big endian prior to feeding it to the HashAlgorithm. For std::endian::big, this is the exact same thing as inserting a call to hton() prior to the HashAlgorithm.
Post by Jens Maurer
Post by Howard Hinnant
template <class Hasher = acme::siphash>
struct uhash
{
using result_type = typename Hasher::result_type;
template <class T>
result_type
operator()(T const& t) const noexcept
{
Hasher h;
hash_append(h, t);
return static_cast<result_type>(h);
}
};
Uh, is there some incremental functionality (without finalizing)
as well, and a variadic template overload?
Yes to both, for example:

struct S
{
char c;
int x;
};

template <class HashAlgorithm>
void
hash_append(HashAlgorithm& h, const S& s)
{
using std::hash_append;
hash_append(h, s.c, s.x);
}

One cannot hash S without writing this hash_append. And once written, S can be hashed with any hash algorithm conforming to HashAlgorithm. I've demonstrated FNV-1A, Jenkins1, Spooky, Murmur2, SipHash, and SHA-256, none of which require any changes to hash_append(HashAlgorithm& h, const S& s). To choose an algorithm, one simply specifies it as the template parameter to uhash at the point of use:

std::unordered_set<S, uhash<fnv1a>> s;

Within hash_append, the HashAlgorithm is being updated incrementally, first with the char c, then with the int x (not both at once, because of the padding issues you have referred to). That is, one could have written hash_append for S as:

template <class HashAlgorithm>
void
hash_append(HashAlgorithm& h, const S& s)
{
using std::hash_append;
hash_append(h, s.c);
hash_append(h, s.x);
}

These are exercising the incremental hash functionality.

This is composable. For example, consider another type Y that has S as a member:

struct Y
{
S s;
std::string name;
};

hash_append for this could be written:

template <class HashAlgorithm>
void
hash_append(HashAlgorithm& h, const Y& y)
{
using std::hash_append;
hash_append(h, y.s, y.name);
}

And Y can then be hashed:

std::unordered_set<Y, uhash<fnv1a>> y;

or:

std::cout << uhash<siphash>{}(y) << '\n';

Y doesn't have to know how to hash S, nor std::string. And Y doesn't have to use anything like hash_combine on the results of hashing S or std::string. This is because *all* of the hashing is happening incrementally, with only the hash functor (uhash in this case) doing the hash algorithm initialization and finalization. And uhash is only used at the top level of a data structure when you actually need a hash code.

Individual types don't really know how to create a hash code. They only know how to present themselves to a generic hash algorithm so as to incrementally append their state to the state of the hash algorithm. Types Don't Know #.

Hash functors initialize a generic hash algorithm, ask a type to append its state to the hash algorithm (which recursively asks its bases and members to append their state to the generic hash algorithm), and finalize the generic hash algorithm.

Top-level clients, such as unordered containers, combine a hash functor template with a specific hash algorithm, to create a hash functor that will hash any type for which hash_append has been implemented, using that specific hash algorithm as if all of the discontinuous bits of memory had been rearranged into one contiguous chunk of memory.

Howard
Myriachan
2014-08-25 15:16:40 UTC
Permalink
I have a hard time getting excited about hardware that you can't find
outside of a museum. I do have fond memories of learning the VAX
operating system, but that was a very long time ago. However on
such a machine std::endian::native has a value that is not equal to
either of std::endian::little or std::endian::big. If the HashAlgorithm
specifies little or big endian, the implementation is required to map the
bytes of the int (or whatever) to little or big endian prior to feeding it to
the HashAlgorithm. For std::endian::big, this is the exact same thing
as inserting a call to hton() prior to the HashAlgorithm.
Don't you mean std::endian::little, unless the hash algorithm is, say, MD5, or the little-endian version of one of the SHA series? (The latter of which would not make sense in the Standard since they're non-standard, and the former is deprecated for security reasons.)

Also, I would love to say ¡adiós! to such architectures as well, because it means signed integer overflow being undefined would have a chance in Hell of being removed instead of the current "no". Only overzealous compiler optimizers would be in the way, rather than architectures that'd be left behind. >.<

Does your design easily templatize into constructions like HMAC?

Melissa
Howard Hinnant
2014-08-25 15:45:57 UTC
Permalink
Post by Myriachan
Post by Howard Hinnant
I have a hard time getting excited about hardware that you can't find
outside of a museum. I do have fond memories of learning the VAX
operating system, but that was a very long time ago. However on
such a machine std::endian::native has a value that is not equal to
either of std::endian::little or std::endian::big. If the HashAlgorithm
specifies little or big endian, the implementation is required to map the
bytes of the int (or whatever) to little or big endian prior to feeding it to
the HashAlgorithm. For std::endian::big, this is the exact same thing
as inserting a call to hton() prior to the HashAlgorithm.
Don't you mean std::endian::little, unless the hash algorithm is, say, MD5, or the little-endian version of one of the SHA series? (The latter of which would not make sense in the Standard since they're non-standard, and the former is deprecated for security reasons.)
The intended semantics are:

struct MyHashAlgorithm
{
static constexpr std::endian endian = std::endian::big;
// ...
};

means: Attention hash_append function: Prior to feeding MyHashAlgorithm bytes from scalar types, map them (from native) into big endian. So if two platforms had scalars with identical layout except for endian, the two platforms could generate identical hash codes for identical scalar input.
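As a sketch (using the std::endian enumeration discussed earlier; reverse_bytes is a hypothetical helper, essentially hton() generalized to any scalar width):

#include <algorithm>
#include <cstring>
#include <memory>

template <class T>
T reverse_bytes(T t)                      // byte-swap a scalar
{
    unsigned char b[sizeof t];
    std::memcpy(b, &t, sizeof t);
    std::reverse(b, b + sizeof t);
    std::memcpy(&t, b, sizeof t);
    return t;
}

template <class Hasher, class T>
void hash_append(Hasher& h, T const& t) noexcept
{
    T tmp = t;
    if (Hasher::endian != std::endian::native)  // assumes little vs. big only
        tmp = reverse_bytes(tmp);               // map native to requested order
    h(std::addressof(tmp), sizeof tmp);
}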
Post by Myriachan
Also, I would love to say Ā”adiĆ³s! to such architectures as well, because it means signed integer overflow being undefined would have a chance in Hell of being removed instead of the current no. Only overzealous compiler optimizers would be in the way, rather than architectures that'd be left behind. >.<
Does your design easily templatize into constructions like HMAC?
I am unsure.

One can easily build hash functors that can be seeded, for example:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3980.html#seeding

I.e. you choose a seed (key?) and use that to initialize the HashAlgorithm. And then the hash functor runs the update and finalization stages in the same way as previously shown.

One could also prepend a message, by updating the HashAlgorithm with a key prior to updating it with the message. Or one could append a message by updating the HashAlgorithm just prior to running the finalization stage.

But I am unsure if the abilities of seeding, prepending and appending are sufficient to generate an HMAC.

Howard
'Jeffrey Yasskin' via ISO C++ Standard - Future Proposals
2014-08-25 18:01:42 UTC
Permalink
On Mon, Aug 25, 2014 at 8:45 AM, Howard Hinnant
Post by Howard Hinnant
Post by Myriachan
Post by Howard Hinnant
I have a hard time getting excited about hardware that you can't find
outside of a museum. I do have fond memories of learning the VAX
operating system, but that was a very long time ago. However on
such a machine std::endian::native has a value that is not equal to
either of std::endian::little or std::endian::big. If the HashAlgorithm
specifies little or big endian, the implementation is required to map the
bytes of the int (or whatever) to little or big endian prior to feeding it to
the HashAlgorithm. For std::endian::big, this is the exact same thing
as inserting a call to hton() prior to the HashAlgorithm.
Don't you mean std::endian::little, unless the hash algorithm is, say, MD5, or the little-endian version of one of the SHA series? (The latter of which would not make sense in the Standard since they're non-standard, and the former is deprecated for security reasons.)
struct MyHashAlgorithm
{
static constexpr std::endian endian = std::endian::big;
// ...
};
means: Attention hash_append function: Prior to feeding MyHashAlgorithm bytes from scalar types, map them (from native) into big endian. So if two platforms had scalars with identical layout except for endian, the two platforms could generate identical hash codes for identical scalar input.
Post by Myriachan
Also, I would love to say ¡adiós! to such architectures as well, because it means signed integer overflow being undefined would have a chance in Hell of being removed instead of the current "no". Only overzealous compiler optimizers would be in the way, rather than architectures that'd be left behind. >.<
Does your design easily templatize into constructions like HMAC?
I am unsure.
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3980.html#seeding
I.e. you choose a seed (key?) and use that to initialize the HashAlgorithm. And then the hash functor runs the update and finalization stages in the same way as previously shown.
One could also prepend a message, by updating the HashAlgorithm with a key prior to updating it with the message. Or one could append a message by updating the HashAlgorithm just prior to running the finalization stage.
But I am unsure if the abilities of seeding, prepending and appending are sufficient to generate an HMAC.
I think you need a "block size" accessor on cryptographic hashers in
order to write the HMAC<> template, since HMAC involves padding the
key to the block size. You could write an HMAC_SHA256 without that by
hard-coding the block size.
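For what it's worth, the HMAC<> construction is short once block_size exists. A sketch only, under an assumed interface (none of this is settled API): H is default-constructible, exposes a static constexpr block_size in bytes, an update operator()(void const*, std::size_t), a result_type that is a std::array<unsigned char, N> no larger than a block, and finalization via the explicit conversion.

#include <cstddef>
#include <cstring>

template <class H>
typename H::result_type
hmac(unsigned char const* key, std::size_t keylen,
     void const* msg, std::size_t msglen)
{
    unsigned char k[H::block_size] = {};     // key, zero-padded to one block
    if (keylen > H::block_size)
    {
        H kh;                                // overlong keys are hashed first
        kh(key, keylen);
        auto d = static_cast<typename H::result_type>(kh);
        std::memcpy(k, d.data(), d.size());
    }
    else
        std::memcpy(k, key, keylen);

    unsigned char ipad[H::block_size], opad[H::block_size];
    for (std::size_t i = 0; i < H::block_size; ++i)
    {
        ipad[i] = static_cast<unsigned char>(k[i] ^ 0x36);
        opad[i] = static_cast<unsigned char>(k[i] ^ 0x5c);
    }

    H inner;                                 // H((K ^ ipad) || message)
    inner(ipad, sizeof ipad);
    inner(msg, msglen);
    auto ih = static_cast<typename H::result_type>(inner);

    H outer;                                 // H((K ^ opad) || inner digest)
    outer(opad, sizeof opad);
    outer(ih.data(), ih.size());
    return static_cast<typename H::result_type>(outer);
}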
Myriachan
2014-08-25 21:53:06 UTC
Permalink
Post by 'Jeffrey Yasskin' via ISO C++ Standard - Future Proposals
On Mon, Aug 25, 2014 at 8:45 AM, Howard Hinnant
Post by Howard Hinnant
I.e. you choose a seed (key?) and use that to initialize the
HashAlgorithm. And then the hash functor runs the update and finalization
stages in the same way as previously shown.
Post by Howard Hinnant
One could also prepend a message, by updating the HashAlgorithm with a
key prior to updating it with the message. Or one could append a message
by updating the HashAlgorithm just prior to running the finalization stage.
Post by Howard Hinnant
But I am unsure if the abilities of seeding, prepending and appending
are sufficient to generate a HMAC.
I think you need a "block size" accessor on cryptographic hashers in
order to write the HMAC<> template, since HMAC involves padding the
key to the block size. You could write an HMAC_SHA256 without that by
hard-coding the block size.
Yes; it seems like block_size would be a useful property to expose. Maybe
a has_block_size as well, so that it's not necessary to do the whole member
detection mess just to check. Or, perhaps, the block_size static constexpr
member could be mandatory, with the rule that it be 1 if no other value
makes sense for a given hash function.
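The detection itself is small. A sketch (the trait name is hypothetical; void_t is spelled out by hand, since this discussion predates C++17):

#include <type_traits>

template <class...> struct voider { using type = void; };
template <class... Ts> using void_type = typename voider<Ts...>::type;

template <class H, class = void>
struct has_block_size : std::false_type { };

template <class H>
struct has_block_size<H, void_type<decltype(H::block_size)>>
    : std::true_type { };   // matches when H::block_size names something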

So is this system going to be defined as only using the low 8 bits of each
unsigned char, and only for bytewise hash functions (or bitwise hash
functions defined in a certain bit order as multiples of 8)?

Melissa
Jens Maurer
2014-08-27 22:41:14 UTC
Permalink
Hi Howard!

I'll have to ask a few more questions here. If something gets
standardized in this area, I'd like to see a roadmap for how all
platforms supported by C++ can get portable (crypto) hash values,
even if not maximally efficient on "strange" environments.

I'm assuming that (in an abstract sense) a crypto hash algorithm
(and most others) hashes a sequence of octets (i.e. 8-bit quantities).

(SHA256 and others can do odd trailing bits, but let's ignore
this for now.)

I presume I'm passing these octets to the hash algorithm using an
array of (unsigned) char, right?

Is this assumption also true for platforms where (unsigned) char
is e.g. 32-bits (DSPs)? If so, the following implementation doesn't
work there, because it makes no effort to split e.g. T==int (suppose
it's 32-bit) into four individual (unsigned) char objects.

(Oh, and on such platforms, 1 == sizeof(char) == sizeof(int).)
Post by Howard Hinnant
hash_append(Hasher& h, T const& t) noexcept
{
h(std::addressof(t), sizeof(t));
}
So, the endianness is a boolean, not a three-way type? Either you're "native" or not
seems to be all that matters, judging from the code you presented.


I think we're mixing two slightly related, but distinct aspects here.

One aspect is the fact that hashing an array of "short" values with a portable
hash (such as sha256) should give the same value across all platforms.
(The cost of endianness normalization is negligible compared to the cost
of running sha256 at all.) Maybe we need to identify those hash algorithms
that are supposed to give portable results.

So, for example, hashing this:

short a[] = { 1, 2, 3, 4 };

with sha256 should give consistent cross-platform results.
Thus, when feeding octets to the hash algorithm, we must have a
(default) idea of how to map a "short" value into (two? four? more?)
octets. That might be big or little endian, or something entirely
different (e.g. a variable-length encoding, which might actually
turn out to be the most portable).
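To illustrate the variable-length option (an LEB128-style sketch, purely illustrative, not a proposal):

#include <cstdint>

template <class Hasher>
void hash_append_varint(Hasher& h, std::uint64_t v)
{
    do
    {
        unsigned char octet = v & 0x7f;   // 7 value bits per octet
        v >>= 7;
        if (v != 0)
            octet |= 0x80;                // high bit: more octets follow
        h(&octet, 1);
    } while (v != 0);
}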


The other aspect is the fact that hash algorithms such as sha256 like
to process e.g. 32-bits (= 4 octets) at once. When reading four
octets from memory, it's helpful to be able to simply read them
into a register on "suitable" platforms and only do the endianness
(or other) conversion on the remainder of the platforms. But, on
the abstract level, this is not a configuration option, it's a
question of correctness.

[Giving the user a way to opt-out of this endianness correctness is
fine for me (emphasis: opt-out).]


I don't think a single "endian" value captures both aspects.
Post by Howard Hinnant
uhash<sha256> h1; // don't worry about endian
uhash<sha256_little> h2; // ensure scalars are little endian prior to hashing
The simple name must result in the portable hash value.
Post by Howard Hinnant
template <class Hasher, class T>
inline
std::enable_if_t
<
    is_contiguously_hashable<T, Hasher>{}
>
hash_append(Hasher& h, T const& t) noexcept
{
    h(reinterpret_cast<const unsigned char*>(std::addressof(t)), sizeof(t));
}
I agree "void *" is simpler, but I continue to believe this is a more dangerous
interface, allowing to inadvertently pass stuff with padding in it. Note
that the user is not expected to write this code, but rely on the standard
library to hash scalar types.

(If hashing comes up in Urbana-Champaign, please grab me so that I can voice
a "strongly against" for this particular aspect.)

(You can use two static_casts via "void *" if the reinterpret_cast is too
dreadful.)
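Spelled out, that alternative reads:

h(static_cast<const unsigned char*>(static_cast<const void*>(std::addressof(t))), sizeof(t));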

Jens
Miro Knejp
2014-08-28 00:51:07 UTC
Permalink
Maybe this whole issue simply shows that algorithms with strict size
requirements should not be defined in terms of char, short, int but
int8_t, int16_t, and so on. If hash_append has (by default) only
overloads for exactly sized types then the compiler should pick the
correct one when the user feeds it unsized types like int. On machines
where no int8_t exists no hash_append overload for int8_t exists.

Where the same source code is to produce equal hashes for the same data
structures on different machines, the only portable way is to use
the explicitly sized types. That is honestly the *only* way to be sure
of consistent hash values, and should maybe be added as a note somewhere.

Though I think "char" (not "signed char" or "unsigned char", as those are
3 different types) should be treated as implementation-defined, as the
standard uses this type only for character values in strings. On a
machine where char has more than 8 bits, only the implementation knows
which of these bits are representative of the value of a string
character/codepoint. Same goes for wchar_t.

Regarding the void* question, I think Howard's is_contiguously_hashable
covers this nicely. If your type has no padding in it, specialize the
trait. If it does, provide your own hash_append overload and feed it the
required members. I think a good set of predefined overloads for
hash_append would be the following (a sketch of the decomposition
follows the list):

hash_append(Hasher&, const T&) // enable if is_contiguously_hashable<T, Hasher> is true
hash_append(Hasher&, array_view<T>) // enable if hash_append(Hasher, T) is well-formed
hash_append(Hasher&, basic_string_view<Char, Traits>) // enable if hash_append(Hasher, Char) is well-formed
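The two view overloads would decompose element-wise; a sketch (with array_view spelled as a pointer/length pair, since no such standard type exists yet):

#include <cstddef>

template <class Hasher, class T>
void hash_append_range(Hasher& h, T const* p, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        hash_append(h, p[i]);   // defer to the element overload
}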

Having is_contiguously_hashable<T, Hasher> predefined as true for the
types [u]int[8|16|32|64]_t, char, wchar_t and char[16|32]_t allows the
implementation to select which integral types are acceptable and can
then internally provide specializations with tag dispatching. Using
unsized types like short or int would pick the proper overload depending
on what int##_t typedefs alias to.

A further alternative would be to provide a third parameter in the form
hash_append(Hasher&, const T&, valid_bits<N>), so even on architectures
without direct int8_t support one could use ints for storage and only
mask the leading N bits as relevant for the hash.

This should make any need for a hash_append(void*, size_t) overload
obsolete. The Hasher itself needs a (void* p, size_t n) overload where n
denotes the number of valid OCTETS pointed to by p. hash_append() has
then already taken care of endianness and other details by applying
Hasher's traits. The nice thing here is that if a type has no padding and
the user *decided that endianness, etc. does not matter* then enabling
is_contiguously_hashable makes hash_append() feed the entire structure
to (void*, size_t). Appropriate warning signs should be positioned
around is_contiguously_hashable to make the user aware of its positive
and negative implications.

Does this make sense?
Jens Maurer
2014-08-28 14:25:04 UTC
Permalink
Post by Miro Knejp
Maybe this whole issue simply shows that algorithms with strict size
requirements should not be defined in terms of char, short, int but
int8_t, int16_t, and so on. If hash_append has (by default) only
overloads for exactly sized types then the compiler should pick the
correct one when the user feeds it unsized types like int. On machines
where no int8_t exists no hash_append overload for int8_t exists.
If we go that route, we should use "uint8_t" etc., not the signed variants,
which have more platform freedom.

And, for a hash algorithm that operates on octets, what does it mean
to hash a uint16_t value, call it x? I think the definition should be

hash the two octets obtained by x & 0xff and (x>>8) & 0xff,
in sequence

but that should be stated explicitly, and also how the user can get
that result if he chooses to directly call the hash algorithm.
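Stated as code, that definition might read (a sketch, assuming the octet-consuming Hasher interface used elsewhere in this thread):

#include <cstdint>

template <class Hasher>
void hash_append(Hasher& h, std::uint16_t x)
{
    unsigned char octets[2] = {
        static_cast<unsigned char>(x & 0xff),         // low octet first
        static_cast<unsigned char>((x >> 8) & 0xff)   // then the high octet
    };
    h(octets, sizeof octets);
}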
Post by Miro Knejp
Where the same source code is to produce equal hashes for the same data
structures on different machines then the only portable way is to use
the explicitly sized types. That is honestly the *only* way to be sure
of consistent hash values and should maybe be added as a note somewhere.
I don't think so. If you strictly think about portably serializing
values to octets, the types on the platforms (other than the
integer/floating-point distinction) become meaningless, and you should
only think about how to represent values in a portable manner, using
(for example) a value-dependent variable-length integer representation.

(I'm not saying this is the only choice, but we shouldn't claim to provide
hash algorithms such as sha256 if the results aren't really portable.)
Post by Miro Knejp
Though I think "char" (not "signed char" or "unsigned char" as those are
3 different types) should be treated implementation-defined as the
standard uses this type only for character values in strings.
This is not exactly true. Standard filestreams use these for (possibly)
binary data, too. (Unfortunately, in my opinion.)
Post by Miro Knejp
On a
machine where char has more than 8 bit only the implementation knows
which of these bits are representative of the value of a string
character/codepoint. Same goes for wchar_t.
Note that for "unsigned char", all bits contribute to the value
(3.9.1p1 basic.fundamental). (No padding allowed.) That also
holds if "char" happens to be unsigned.
Post by Miro Knejp
Regarding the void* question, I think Howard's is_contiguously_hashable
covers this nicely. If your type has no padding in it, specialize the
trait. If it does, provide your own hash_append overload and feed it the
required members. I think a good set of predefined overloads for
hash_append would be
hash_append(Hasher&, const T&) // enable if is_contiguously_hashable<T, Hasher> is true
Other than for nerd-ness, why is it easier to partially specialize
is_contiguously_hashable<my_T,Hasher> instead of overloading

template <class Hasher>
void hash_append(Hasher& h, const my_T& value) {
    std::hash_append_contiguous(h, value);
}

?

The latter seems easier to read, and makes it easier to provide a
different answer for a different Hasher.
Post by Miro Knejp
hash_append(Hasher&, array_view<T>) // enable if hash_append(Hasher, T) is well-formed
hash_append(Hasher&, basic_string_view<Char, Traits>) // enable if hash_append(Hasher, Char) is well-formed
Yes, this is basic decomposition. We should have something like that.
(I'd be happy to omit the enable_if dance; you'll simply get an error
if T doesn't have a hash_append(), which is good.)

Question: Do you hash the length of the string or array separately, or
just the elements? In the latter case, hashing two empty strings
is indistinguishable from hashing three empty strings. This seems
undesirable for crypto-hashes.
Post by Miro Knejp
Having is_contiguously_hashable<T, Hasher> predefined as true for the
types [u]int[8|16|32|64]_t, char, wchar_t and char[16|32]_t allows the
implementation to select which integral types are acceptable and can
then internally provide specializations with tag dispatching. Using
unsized types like short or int would pick the proper overload depending
on what int##_t typedefs alias to.
So would std::hash_append() overloads that each take one of the
scalar types mentioned above. (The standard library can avoid
duplicate overloads for the uintX_t typedefs conflicting with char etc.)

I don't see why we need an is_contiguously_hashable<T, Hasher> trait.
Post by Miro Knejp
A further alternative would be to provide a third parameter in the form
hash_append(Hasher&, const T&, valid_bits<N>), so even on architectures
without direct int8_t support one could use ints for storage and only
mask the leading N bits as relevant for the hash.
For scalar T, this should be the job of the standard library implementation.
I don't see a need for valid_bits<N> with a generic "T". That said, I'm
not opposed to adding

hash_append(Hasher&, const std::bitset<N>&);

where N is divisible by 8. This allows one to pass an octet without
concerns about C++ built-in types at all.
Post by Miro Knejp
This should make any need for a hash_append(void*, size_t) overload
obsolete. The Hasher itself needs a (void* p, size_t n) overload where n
denotes the number of valid OCTETS pointed to by p.
Then it's hard to call that function in portable code, because
sizeof(int) = 1 on some platforms where "int" is 32 bits (and "char", too).
Post by Miro Knejp
hash_append() has
then already taken care of endianess and other details by applying
Hasher's traits. The nice hing here is that if a type has no padding and
the user *decided that endianess, etc. does not matter* then enabling
is_contiguously_hashable makes hash_append() feed the entire structure
to (void*, size_t).
I believe it's a user-level policy decision for his struct type T to
determine whether it can be hashed contiguously, possibly depending
on the hasher (think two hash tables keyed off on different things,
both pointing to T objects). And that policy decision is best left
to the user's overload of hash_append(T).

Jens
Miro Knejp
2014-08-28 20:42:44 UTC
Permalink
Post by Jens Maurer
Post by Miro Knejp
Maybe this whole issue simply shows that algorithms with strict size
requirements should not be defined in terms of char, short, int but
int8_t, int16_t, and so on. If hash_append has (by default) only
overloads for exactly sized types then the compiler should pick the
correct one when the user feeds it unsized types like int. On machines
where no int8_t exists no hash_append overload for int8_t exists.
If we go that route, we should use "uint8_t" etc, not the signed variants,
which have more platform freedom.
Possibly
Post by Jens Maurer
And, for a hash algorithm that operates on octets, what does it mean
to hash a uint16_t value, call it x? I think the definition should be
hash the two octets obtained by x & 0xff and (x>>8) & 0xff,
in sequence
but that should be stated explicitly, and also how the user can get
that result if he chooses to directly call the hash algorithm.
I'd say it's the job of hash_append(), provided by the implementation for
the native builtin types, to take care of this. If the Hasher has endian
requirements, x gets swapped as necessary and its contiguous
(value-representing) *octets* fed to the Hasher in order of least to most
significant (or reversed). Every introductory course to programming I
have ever witnessed talks at length about bytes and how they make up
ints and what endianness is and so forth. I consider this basic
knowledge. The standardese has to describe this in detail of course (as
far as it can, considering the C++ abstract machine), but I think for the
sake of this discussion what happens is clear. The real challenge I
think is figuring out how to define it portably for machines not
operating on multiples of 8 bits (if this is deemed relevant).
Post by Jens Maurer
Post by Miro Knejp
Where the same source code is to produce equal hashes for the same data
structures on different machines then the only portable way is to use
the explicitly sized types. That is honestly the *only* way to be sure
of consistent hash values and should maybe be added as a note somewhere.
I don't think so. If you strictly think about portably serializing
values to octets, the types on the platforms (other than the
integer/floating-point distinction) becomes meaningless, and you should
only think about how to represent values in a portable manner, using
(for example) a value-dependent variable-length integer representation.
(I'm not saying this is the only choice, but we shouldn't claim
hash algorithms such as sha256 if the results aren't really portable.)
Then I wonder who you want to put the burden on: the implementation or
the user? I do think portability is possible (certainly for machines
with 8*n bits) by only providing hash_append for selected native builtin
(maybe only unsigned) types which are either equivalent on all
architectures supporting them, or don't exist at all. At least if you
are on an architecture without these typedefs the compiler will kindly
remind you that your data structure is incompatible, instead of
producing funny hashes.
Post by Jens Maurer
Post by Miro Knejp
Though I think "char" (not "signed char" or "unsigned char" as those are
3 different types) should be treated implementation-defined as the
standard uses this type only for character values in strings.
This is not exactly true. Standard filestreams use these for (possibly)
binary data, too. (Unfortunately, in my opinion.)
Well, sadly we have the sized typedefs officially only since C++11...
Post by Jens Maurer
Post by Miro Knejp
hash_append(Hasher&, array_view<T>) // enable if hash_append(Hasher, T)
is well-formed
hash_append(Hasher&, basic_string_view<Char, Traits>) // enable if
hash_append(Hasher, Char) is well-formed
Yes, this is basic decomposition. We should have something like that.
(I'd be happy to omit the enable_if dance; you'll simply get an error
if T doesn't have a hash_append(), which is good.)
Whether it's an enable_if dance or predefined overloads doesn't really
matter as long as it does the job. This was more of an exposition.
Post by Jens Maurer
Question: Do you hash the length of the string or array separately, or
just the elements? In the latter case, hashing two empty strings
is indistinguishable from hashing three empty strings. This seems
undesirable for crypto-hashes.
This is an issue that was discussed previously and I don't know if there
was a consensus. It's nested containers where it becomes headache-inducing.
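One remedy is to append the length as well, so that two empty strings and three empty strings feed different octet sequences. A sketch, assuming the scalar overloads exist (appending the size after the elements keeps single-pass traversal possible):

#include <vector>

template <class Hasher, class T, class Alloc>
void hash_append(Hasher& h, std::vector<T, Alloc> const& v)
{
    for (auto const& t : v)
        hash_append(h, t);
    hash_append(h, v.size());   // disambiguates sequences of sequences
}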
Post by Jens Maurer
Post by Miro Knejp
Having is_contiguously_hashable<T, Hasher> predefined as true for the
types [u]int[8|16|32|64]_t, char, wchar_t and char[16|32]_t allows the
implementation to select which integral types are acceptable and can
then internally provide specializations with tag dispatching. Using
unsized types like short or int would pick the proper overload depending
on what int##_t typedefs alias to.
So would std::hash_append() overloads that each take one of the
scalar types mentioned above. (The standard library can avoid
duplicate overloads for the uintX_t typedefs conflicting with char etc.)
is_contiguously_hashable for the scalar types was just a tool for the
enable_if exposition (which may or may not be used). The implementation
should know how to turn these types into octet streams.
Post by Jens Maurer
I don't see why we need an is_contiguously_hashable<T, Hasher> trait.
As far as I understand, it's primarily for optimization and boilerplate
reduction. If all bytes of my struct contribute to its value
representation without any padding, and endianness doesn't apply, I can
shove the entire struct (or array thereof) into the hungry maw of (void*,
size_t), and as has been mentioned earlier there are CPUs with hash/CRC
instructions, so there is optimization potential present. This is usually
true for scalar types. For compound types it is upon the author of the
type to decide if this is true or not. This is obviously not a good
solution if the hash has to be communicated to the outside world where
endianness may matter. The introduction of hash_append_contiguous() may
make this trait obsolete. The only question left is whether
defining a class partial specialization or a forwarding function
overload is less effort and less prone to error by the user. But I think
the trait approach makes it easier to apply the information
transitively. Plus you get a query for static_assert to check whether
your assumptions about the types of member variables hold:

// All three types have no padding, only value bytes, and endianness
// doesn't matter. Ensured with type_traits and whatnot.
struct A { ... };
struct B { ... };
struct C
{
    A a;
    B b;
    int i;
};

// Overload solution 1
template <class Hasher>
void hash_append(Hasher& h, const A& x) { ... }
template <class Hasher>
void hash_append(Hasher& h, const B& x) { ... }
template <class Hasher>
void hash_append(Hasher& h, const C& x)
{
    // Compiler may not figure out the entire struct can go in one
    // big gulp, making the optimization unlikely.
    hash_append(h, x.a);
    hash_append(h, x.b);
    hash_append(h, x.i);
}

// Overload solution 2
template <class Hasher>
void hash_append(Hasher& h, const C& x)
{
    // There is no way to ensure this is actually correct
    // because we cannot check it for A and B.
    hash_append_contiguous(h, x);
}

// Traits solution
template <class Hasher>
struct is_contiguously_hashable<A, Hasher> : true_type { };
template <class Hasher>
struct is_contiguously_hashable<B, Hasher> : true_type { };
template <class Hasher>
struct is_contiguously_hashable<C, Hasher> : true_type
{
    // Sprinkle code with magic static_assert dust to ensure
    // our assumptions about A and B are true
    // (especially if their traits are defined in some other place).
};
Post by Jens Maurer
Post by Miro Knejp
A further alternative would be to provide a third parameter in the form
hash_append(Hasher&, const T&, valid_bits<N>), so even on architectures
without direct int8_t support one could use ints for storage and only
mask the leading N bits as relevant for the hash.
For scalar T, this should be the job of the standard library implementation.
I don't see a need for valid_bits<N> with a generic "T". That said, I'm
not opposed to adding
hash_append(Hasher&, const std::bitset<N>&);
where N is divisible by 8. This allows to pass an octet without
concerns about C++ built-in types at all.
Good point about bitset. About my motivation: consider that you have to use
uint32_t for your data type because the machine has no uint8_t, but only
the lower 8 bits of the integer are to be hashed (the integer is
abstracted in a way as to behave like an 8-bit value); the upper 24 bits
must be discarded (not masked) in order to produce identical signatures
as on some remote machine that has direct 8-bit support. This
information cannot be provided using only the builtin scalar types.
bitset is a nice solution to this.
Post by Jens Maurer
Post by Miro Knejp
This should make any need for a hash_append(void*, size_t) overload
obsolete. The Hasher itself needs a (void* p, size_t n) overload where n
denotes the number of valid OCTETS pointed to by p.
Then it's hard to call that function in portable code, because
sizeof(int) = 1 on some platforms where "int" is 32 bits (and "char", too).
I didn't specify n in terms of sizeof() but in *octets*. The hasher has to
know how to extract the octets from the pointed-to contiguous octet
stream. Forget sizeof(char); this is very carefully defined as the
number of octets pointed to by the pointer. If sizeof(int) ==
sizeof(char) == 1, int has 32 bits, and all bits shall be hashed, then
it still holds that n == 4.
Post by Jens Maurer
Post by Miro Knejp
hash_append() has
then already taken care of endianness and other details by applying
Hasher's traits. The nice thing here is that if a type has no padding and
the user *decided that endianness, etc. does not matter* then enabling
is_contiguously_hashable makes hash_append() feed the entire structure
to (void*, size_t).
I believe it's a user-level policy decision for his struct type T to
determine whether it can be hashed contiguously, possibly depending
on the hasher (think two hash tables keyed off on different things,
both pointing to T objects). And that policy decision is best left
to the user's overload of hash_append(T).
The "user-level policy" you refer to is presented by Howard as
is_contiguously_hashable. But regardless of any traits one has always
the liberty of overloading hash_append() for cases like this or call
hash_append for the relevant members manually for a case like you
describe. I see is_contiguously_hashable primarily as a hint for
optimization (and reduction of boilerplate , and compile time checking
with static_assert).
Post by Jens Maurer
Jens
Post by Miro Knejp
Regarding the void* question, I think Howard's is_contiguously_hashable covers this nicely. If your type has no padding in it, specialize the trait. If it does, provide your own hash_append overload and feed it the required members. I think a good set of predefined overloads for hash_append would be
hash_append(Hasher&, const T&) // enable if is_contiguously_hashable<T, Hasher> is true
hash_append(Hasher&, array_view<T>) // enable if hash_append(Hasher, T) is well-formed
hash_append(Hasher&, basic_string_view<Char, Traits>) // enable if hash_append(Hasher, Char) is well-formed
Having is_contiguously_hashable<T, Hasher> predefined as true for the types [u]int[8|16|32|64]_t, char, wchar_t and char[16|32]_t allows the implementation to select which integral types are acceptable and can then internally provide specializations with tag dispatching. Using unsized types like short or int would pick the proper overload depending on what int##_t typedefs alias to.
A further alternative would be to provide a third parameter in the form hash_append(Hasher&, const T&, valid_bits<N>), so even on architectures without direct int8_t support one could use ints for storage and only mask the leading N bits as relevant for the hash.
This should make any need for a hash_append(void*, size_t) overload obsolete. The Hasher itself needs a (void* p, size_t n) overload where n denotes the number of valid OCTETS pointed to by p. hash_append() has then already taken care of endianness and other details by applying Hasher's traits. The nice thing here is that if a type has no padding and the user *decided that endianness, etc. does not matter* then enabling is_contiguously_hashable makes hash_append() feed the entire structure to (void*, size_t). Appropriate warning signs should be positioned around is_contiguously_hashable to make the user aware of its positive and negative implications.
hash_append(Hasher&, const T&)
for all arithmetic T, T*, and probably enums and nullptr_t as well. Additionally the spec should provide hash_append for containers, pair, tuple, etc. Anything that we today have a std::hash<T> for, we should definitely have a std::hash_append for. And more.
(...)
As soon as we start saying something like: "hash_append only exists for a small set of types", then we have completely changed the entire design, and completely missed the point of it.
hash_append is not about programmers stuffing bytes into a hash algorithm (that's the std::lib's job). hash_append is about building a system whereby programmers can write their hashing support just once, for all hashing algorithms, and with absolutely no need for a hash_combine step.
I agree, and I was merely talking about the atomic building blocks (the
std::lib job of stuffing bytes), which depend on the
implementation/machine, Hasher endianness, and whatnot, and in which the
arithmetic Ts (and the underlying values of enums) are exposed in the
interface using explicitly sized integer types. There should of course
be further overloads for the compound std types, containers, and so on.
Post by Jens Maurer
hash_append should be as ubiquitous as swap, and operator==.
Howard
Miro
Howard Hinnant
2014-08-28 18:00:12 UTC
Permalink
Maybe this whole issue simply shows that algorithms with strict size requirements should not be defined in terms of char, short, int but int8_t, int16_t, and so on. If hash_append has (by default) only overloads for exactly sized types then the compiler should pick the correct one when the user feeds it unsized types like int. On machines where no int8_t exists no hash_append overload for int8_t exists.
Agreed. We should think of hash_append as similar to swap. swap(int8_t&, int8_t&) exists only where int8_t exists. :-) But swap(signed char&, signed char&) exists everywhere. And so should hash_append(H&, signed char).
Where the same source code is to produce equal hashes for the same data structures on different machines then the only portable way is to use the explicitly sized types. That is honestly the *only* way to be sure of consistent hash values and should maybe be added as a note somewhere.
Though I think "char" (not "signed char" or "unsigned char" as those are 3 different types) should be treated implementation-defined as the standard uses this type only for character values in strings. On a machine where char has more than 8 bit only the implementation knows which of these bits are representative of the value of a string character/codepoint. Same goes for wchar_t.
This is pretty much the role of the is_uniquely_represented<T> trait. The implementation will have to decide if a scalar is uniquely represented.

A type T is uniquely represented if for all combinations of two values of a type, say x and y, if x == y, then it must also be true that memcmp(addressof(x), addressof(y), sizeof(T)) == 0. I.e. if x == y, then x and y have the same bit pattern representation. A 2's complement int satisfies this property because every bit pattern an int can have results in a distinct value (rule 2). And there are no "padding bits" which might take on random values.
Regarding the void* question, I think Howard's is_contiguously_hashable covers this nicely. If your type has no padding in it, specialize the trait. If it does, provide your own hash_append overload and feed it the required members. I think a good set of predefined overloads for hash_append would be
hash_append(Hasher&, const T&) // enable if is_contiguously_hashable<T, Hasher> is true
hash_append(Hasher&, array_view<T>) // enable if hash_append(Hasher, T) is well-formed
hash_append(Hasher&, basic_string_view<Char, Traits>) // enable if hash_append(Hasher, Char) is well-formed
Having is_contiguously_hashable<T, Hasher> predefined as true for the types [u]int[8|16|32|64]_t, char, wchar_t and char[16|32]_t allows the implementation to select which integral types are acceptable and can then internally provide specializations with tag dispatching. Using unsized types like short or int would pick the proper overload depending on what int##_t typedefs alias to.
A further alternative would be to provide a third parameter in the form hash_append(Hasher&, const T&, valid_bits<N>), so even on architectures without direct int8_t support one could use ints for storage and only mask the leading N bits as relevant for the hash.
This should make any need for a hash_append(void*, size_t) overload obsolete. The Hasher itself needs a (void* p, size_t n) overload where n denotes the number of valid OCTETS pointed to by p. hash_append() has then already taken care of endianness and other details by applying Hasher's traits. The nice thing here is that if a type has no padding and the user *decided that endianness, etc. does not matter* then enabling is_contiguously_hashable makes hash_append() feed the entire structure to (void*, size_t). Appropriate warning signs should be positioned around is_contiguously_hashable to make the user aware of its positive and negative implications.
The implementation should provide:

hash_append(Hasher&, const T&)

for all arithmetic T, T*, and probably enums and nullptr_t as well. Additionally the spec should provide hash_append for containers, pair, tuple, etc. Anything that we today have a std::hash<T> for, we should definitely have a std::hash_append for. And more.

hash_append is the API that everyday programmers will use to build their own hash_append overloads. For example:

class Customer
{
std::string firstName_;
std::string lastName_;
int age_;
public:
// ...
template <class HashAlgorithm>
friend
void
hash_append(HashAlgorithm& h, const Customer& c)
{
using std::hash_append;
hash_append(h, c.firstName_, c.lastName_, c.age_);
}
};

And this is further composable:

class Sale
{
Customer customer_;
Product product_;
public:
// ...
template <class HashAlgorithm>
friend
void
hash_append(HashAlgorithm& h, const Sale& s)
{
using std::hash_append;
hash_append(h, s.customer_, s.product_);
}
};
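The variadic form used in these examples could be provided as follows (a
minimal sketch, assuming the single-argument overloads for the members
exist):

// Append each argument in turn, recursing until the single-argument
// overloads take over.
template <class HashAlgorithm, class T, class U, class... Rest>
void
hash_append(HashAlgorithm& h, T const& t, U const& u, Rest const&... rest)
{
    hash_append(h, t);
    hash_append(h, u, rest...);
}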

As soon as we start saying something like: "hash_append only exists for a small set of types", then we have completely changed the entire design, and completely missed the point of it.

hash_append is not about programmers stuffing bytes into a hash algorithm (that's the std::lib's job). hash_append is about building a system whereby programmers can write their hashing support just once, for all hashing algorithms, and with absolutely no need for a hash_combine step.

hash_append should be as ubiquitous as swap, and operator==.

Howard
Howard Hinnant
2014-08-28 17:41:54 UTC
Permalink
Post by Jens Maurer
Hi Howard!
Hi Jens! :-)

I suspect you intended this as a private message, but it came through public. I'm responding publicly because I think you make some good points below and I thought everyone here could benefit from your points, and my response to some of them. Hope that's ok.
Post by Jens Maurer
I'll have to ask a few more questions here. If something gets
standardized in this area, I'd like to see a roadmap how all
platforms supported by C++ can get portable (crypto) hash values,
even if not maximally efficient on "strange" environments.
I'm assuming that (in an abstract sense) a crypto hash algorithm
(and most others) hash a sequence of octets (i.e. 8-bit quantities).
(SHA256 and others can do odd trailing bits, but let's ignore
this for now.)
I presume I'm passing these octets to the hash algorithm using an
array of (unsigned) char, right?
As currently coded, there are hash_append overloads for both unsigned char, C-arrays of unsigned char, std::array<unsigned char, N>, etc. There are also hash_append overloads for all other arithmetic types, and std-defined containers. For each type, the hash_append function is responsible for deciding how that type should present itself to a generic hashing algorithm. For example an unsigned char would just say: Consume this byte!
Post by Jens Maurer
Is this assumption also true for platforms where (unsigned) char
is e.g. 32-bits (DSPs)? If so, the following implementation doesn't
work there, because it makes no effort to split e.g. T==int (suppose
it's 32-bit) into four individual (unsigned) char objects.
The hash_append infrastructure makes no assumptions on the size of a byte. However concrete hashing algorithms (such as SHA256) most certainly will make such assumptions.
Post by Jens Maurer
(Oh, and on such platforms, 1 == sizeof(char) == sizeof(int).)
Post by Howard Hinnant
hash_append(Hasher& h, T const& t) noexcept
{
    h(std::addressof(t), sizeof(t));
}
So, the endianness is a boolean, not a three-way type? Either you're "native" or not
seems all that matters, from the code you presented.
As currently coded, a hashing algorithm would set a static constexpr enum member to one of three values:

static constexpr xstd::endian endian = xstd::endian::native;
static constexpr xstd::endian endian = xstd::endian::big;
static constexpr xstd::endian endian = xstd::endian::little;

The meaning of these is to ask the hash_append overloads for scalars influenced by endian (larger than char) to convert the value from native endian to the requested endian prior to sending the bytes into the hashing algorithm. Concretely, in the order shown above:

1. Map native endian to native endian (presumably this is always a no-op).
2. Map native endian to big endian. This will be a no-op on big endian machines.
3. Map native endian to little endian. This will be a no-op on little endian machines.

Non-fingerprinting hash applications will probably always use the native mapping (i.e. they don't care about endian).
Post by Jens Maurer
I think we're mixing two slightly related, but distinct aspects here.
One aspect is the fact that hashing an array of "short" values with a portable
hash (such as sha256) should give the same value across all platforms.
(The cost of endianness normalization is negligible compared to the cost
of running sha256 at all.) Maybe we need to identify those hash algorithms
that are supposed to give portable results.
E.g., hashing
short a[] = { 1, 2, 3, 4 };
with sha256 should give consistent cross-platform results.
Thus, when feeding octets to the hash algorithm, we must have a
(default) idea how to map a "short" value into (two? four? more?)
octets. That might be big or little endian, or something entirely
different (e.g. a variable-length encoding, which might actually
turn out to be the most portable).
<nod> This is the aspect addressed by the enum above.
Post by Jens Maurer
The other aspect is the fact that hash algorithms such as sha256 like
to process e.g. 32-bits (= 4 octets) at once. When reading four
octets from memory, it's helpful to be able to simply read them
into a register on "suitable" platforms and only do the endianness
(or other) conversion on the remainder of the platforms. But, on
the abstract level, this is not a configuration option, it's a
question of correctness.
[Giving the user a way to opt-out of this endianess correctness is
fine for me (emphasis: opt-out).]
I don't think a single "endian" value captures both aspects.
Agreed. I see the second aspect above as an implementation detail of the hashing algorithm. This detail does not impact the hashing algorithm's interface, except insofar as it impacts its results. I see no motivation to "leak" this implementation detail into a standard specification, unless the standard is to specify concrete hashing algorithms. In that event we could choose any number of options, about which I really have little opinion. For example: Sha256_output_little_endian as one hashing algorithm and Sha256_output_big_endian as another. Or perhaps Sha256<endian> is another solution.

<shrug>, the hash_append proposal is hashing algorithm neutral, and does not address concrete hashing algorithms. It only addresses a technique for easily switching among concrete hashing algorithms. And this is why hash_append does have to deal with the "input endian" to the hashing algorithm, but does not address the "output endian" aspect.
Post by Jens Maurer
Post by Howard Hinnant
uhash<sha256> h1; // don't worry about endian
uhash<sha256_little> h2; // ensure scalars are little endian prior to hashing
The simple name must result in the portable hash value.
My understanding from http://en.wikipedia.org/wiki/Sha256 is that SHA256 output endian is always big.
Post by Jens Maurer
Post by Howard Hinnant
template <class Hasher, class T>
inline
std::enable_if_t
<
    is_contiguously_hashable<T, Hasher>{}
>
hash_append(Hasher& h, T const& t) noexcept
{
    h(reinterpret_cast<const unsigned char*>(std::addressof(t)), sizeof(t));
}
I agree "void *" is simpler, but I continue to believe this is a more dangerous
interface, allowing to inadvertently pass stuff with padding in it. Note
that the user is not expected to write this code, but rely on the standard
library to hash scalar types.
(If hashing comes up in Urbana-Champaign, please grab me so that I can voice
a "strongly against" for this particular aspect.)
Will do.

Howard
Post by Jens Maurer
(You can use two static_casts via "void *" if the reinterpret_cast is too
dreadful.)
Jens
Jens Maurer
2014-08-28 22:11:26 UTC
Permalink
Post by Howard Hinnant
I suspect you intended this as a private message, but it came through public.
It was intended as a public message. Let's try something different:

Hi everybody!
Post by Howard Hinnant
Post by Jens Maurer
I'll have to ask a few more questions here. If something gets
standardized in this area, I'd like to see a roadmap how all
platforms supported by C++ can get portable (crypto) hash values,
even if not maximally efficient on "strange" environments.
I'm assuming that (in an abstract sense) a crypto hash algorithm
(and most others) hash a sequence of octets (i.e. 8-bit quantities).
(SHA256 and others can do odd trailing bits, but let's ignore
this for now.)
I presume I'm passing these octets to the hash algorithm using an
array of (unsigned) char, right?
As currently coded, there are hash_append overloads for both unsigned char, C-arrays of unsigned char, std::array<unsigned char, N>, etc. There are also hash_append overloads for all other arithmetic types, and std-defined containers. For each type, the hash_append function is responsible for deciding how that type should present itself to a generic hashing algorithm.
I agree that the standard library should provide hash_append overloads for
all scalar types and standard containers, including C-style arrays.
(Btw, does hash_append on a container also hash the size, or just the
contents, i.e. the sequence of elements?)

That's not what I'm concerned about here. I'm concerned about writing a
crypto hash sum that plays nicely with the framework, and is maximally
portable both in its implementation and its result value.
Post by Howard Hinnant
For example an unsigned char would just say: Consume this byte!
A byte is not (necessarily) an octet. I'm concerned about this
particular gap.
Post by Howard Hinnant
Post by Jens Maurer
Is this assumption also true for platforms where (unsigned) char
is e.g. 32-bits (DSPs)? If so, the following implementation doesn't
work there, because it makes no effort to split e.g. T==int (suppose
it's 32-bit) into four individual (unsigned) char objects.
The hash_append infrastructure makes no assumptions on the size of a byte. However concrete hashing algorithms (such as SHA256) most certainly will make such assumptions.
I want to write a portable implementation of a hash sum that also
works on a machine where 1 == sizeof(char) == sizeof(int) (= 32 bits).

If I've understood the interface correctly, both hashing an int x and
a char c will end up with a call to my hash algorithm h like this:

h(&x, 1);
h(&c, 1);

and the interface is type-erased (i.e. uses "void*" or "unsigned char*").

On a usual platform, the calls will end up like this (ignoring endianness
for now):

h(&x, 4);
h(&c, 1);

I don't think "h" will ever be able to produce the same hash sum on both
platforms, even if specifically tailored for the particular platform.
It seems too much information is lost on the sizeof(char) == sizeof(int)
platform.

One way to address this is to split the "int" into four octets and
assign a separate "unsigned char" for each octet in the hash_append
function on the DSP-style platform. Then both calls end up as

h(&x, 4);
h(&c, 1);
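Such a splitting hash_append might look like this on the DSP-style
platform (a hypothetical sketch; the low-octet-first order is an
arbitrary convention that the platform's std::lib would have to fix):

#include <cstddef>

// Hypothetical: CHAR_BIT == 32 and sizeof(int) == 1. Split the int into
// four octets, each carried in its own unsigned char constrained to the
// portable 0..255 range, so h sees the same four octets as an
// 8-bit-byte machine would send.
template <class Hasher>
void
hash_append(Hasher& h, int x) noexcept
{
    unsigned int v = static_cast<unsigned int>(x);
    unsigned char octets[4];
    for (int i = 0; i < 4; ++i)
    {
        octets[i] = static_cast<unsigned char>(v & 0xFFu);  // one octet
        v >>= 8;
    }
    h(octets, 4);
}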
Post by Howard Hinnant
Post by Jens Maurer
So, the endianness is a boolean, not a three-way type? Either you're "native" or not
seems all that matters, from the code you presented.
static constexpr xstd::endian endian = xstd::endian::native;
static constexpr xstd::endian endian = xstd::endian::big;
static constexpr xstd::endian endian = xstd::endian::little;
1. Map native endian to native endian (presumably this is always a no-op).
2. Map native endian to big endian. This will be a no-op on big endian machines.
3. Map native endian to little endian. This will be a no-op on little endian machines.
Non-fingerprinting hash applications will probably always use the native mapping (i.e. they don't care about endian).
Yes.

It seems to me that these choices are, strictly speaking, not a property of the
(crypto) hash algorithm (that is only concerned with octets coming in), but of
the preferences / situation in which it is used. As someone else pointed out,
we're essentially defining an ephemeral serialization format for purposes of
computing the hash.

I'd like to ask:

- that (core) hash algorithm implementations such as SHA256 do not
specify the "endian" thing (it doesn't mean anything at this level) and

- that there is a config option to do scalar endian conversions if so desired.

Example:

std::uhash<std::sha256> // unportable for scalars > char
std::uhash<std::sha256, std::endian::big> // convert scalars > char to "big endian" prior to feeding octets to hash algorithm

This also supports strange VAX endianness as the native endian
convention, I believe.
Post by Howard Hinnant
Post by Jens Maurer
The other aspect is the fact that hash algorithms such as sha256 like
to process e.g. 32-bits (= 4 octets) at once. When reading four
octets from memory, it's helpful to be able to simply read them
into a register on "suitable" platforms and only do the endianness
(or other) conversion on the remainder of the platforms. But, on
the abstract level, this is not a configuration option, it's a
question of correctness.
Agreed. I see the second aspect above as an implementation detail of the hashing algorithm. This detail does not impact the hashing algorithm's interface, except insofar as it impacts its results. I see no motivation to "leak" this implementation detail into a standard specification, unless the standard is to specify concrete hashing algorithms. In that event we could choose any number of options, about which I really have little opinion. For example: Sha256_output_little_endian as one hashing algorithm and Sha256_output_big_endian as another. Or perhaps Sha256<endian> is another solution.
Well, there is a lost optimization opportunity if you're hashing an
array of unsigned int (32 bits) on a platform and with an endianness
choice that is just "right". Otherwise, you get two endianness conversions
back-to-back: One for the scalar > char thing from above, and one when
sha256 tries to form its internal 32-bit chunks.
Post by Howard Hinnant
Post by Jens Maurer
The simple name must result in the portable hash value.
Let me retract that; see details above.
Post by Howard Hinnant
My understanding from http://en.wikipedia.org/wiki/Sha256 is that SHA256 output endian is always big.
The output is a sequence of octets. It might be that this sequence is interpreted in
big endian style for (some) display purposes, but that's a minor detail.

Jens
Howard Hinnant
2014-08-29 02:58:25 UTC
Permalink
Post by Jens Maurer
Post by Howard Hinnant
I suspect you intended this as a private message, but it came through public.
Hi everybody!
:-)
Post by Jens Maurer
Post by Howard Hinnant
Post by Jens Maurer
I'll have to ask a few more questions here. If something gets
standardized in this area, I'd like to see a roadmap how all
platforms supported by C++ can get portable (crypto) hash values,
even if not maximally efficient on "strange" environments.
I'm assuming that (in an abstract sense) a crypto hash algorithm
(and most others) hash a sequence of octets (i.e. 8-bit quantities).
(SHA256 and others can do odd trailing bits, but let's ignore
this for now.)
I presume I'm passing these octets to the hash algorithm using an
array of (unsigned) char, right?
As currently coded, there are hash_append overloads for both unsigned char, C-arrays of unsigned char, std::array<unsigned char, N>, etc. There are also hash_append overloads for all other arithmetic types, and std-defined containers. For each type, the hash_append function is responsible for deciding how that type should present itself to a generic hashing algorithm.
I agree that the standard library should provide hash_append overloads for
all scalar types and standard containers, including C-style arrays.
(Btw, does hash_append on a container also hash the size, or just the
contents, i.e. the sequence of elements?)
As currently coded, the run-time-sized containers append the size of the container. This is to prevent hash collisions between:

vector<vector<int>>{{}, {1}} and vector<vector<int>>{{1}, {}},

which, if they did not include a size() in their message, would both result in the same message to the hash algorithm, and thus both generate the same hash code.

Append was chosen as opposed to prepend so that forward_list<T> would not have to traverse twice during hash_append.

Markers such as "begin container" and "end container" were also considered, but appending the size was considered to be simpler and more economical than the alternatives.
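A sketch of what this looks like for a run-time-sized container
(assuming hash_append overloads exist for the element type and for
std::size_t; illustrative only):

#include <vector>

// Hash the elements, then append the size, so that
// vector<vector<int>>{{}, {1}} and {{1}, {}} produce distinct messages.
template <class HashAlgorithm, class T, class Alloc>
void
hash_append(HashAlgorithm& h, std::vector<T, Alloc> const& v)
{
    for (T const& e : v)
        hash_append(h, e);
    hash_append(h, v.size());  // appended, not prepended (see above)
}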
Post by Jens Maurer
That's not what I'm concerned about here. I'm concerned about writing a
crypto hash sum that plays nicely with the framework, and is maximally
portable both in its implementation and its result value.
Post by Howard Hinnant
For example an unsigned char would just say: Consume this byte!
A byte is not (necessarily) an octet. I'm concerned about this
particular gap.
I think designing for a portable implementation of (for example) SHA256 for non-8-bit-byte platforms is an over-reaching goal. Forgetting hash_append, and even C++, and designing whatever API you cared for, how would you write an implementation of SHA256 that was byte-size agnostic? And how would that impact the API of SHA256?

Today, the SHA256 implementations I've seen (written in C) are simply #ifdef'd on endian, and are not portable to non-8-bit-byte platforms. The hash_append proposal is not trying to impact this status quo. I consider all of these to be HashAlgorithm details, to be worked out by the HashAlgorithm author.
Post by Jens Maurer
Post by Howard Hinnant
Post by Jens Maurer
Is this assumption also true for platforms where (unsigned) char
is e.g. 32-bits (DSPs)? If so, the following implementation doesn't
work there, because it makes no effort to split e.g. T==int (suppose
it's 32-bit) into four individual (unsigned) char objects.
The hash_append infrastructure makes no assumptions on the size of a byte. However concrete hashing algorithms (such as SHA256) most certainly will make such assumptions.
I want to write a portable implementation of a hash sum that also
works on a machine where 1 == sizeof(char) == sizeof(int) (= 32 bits).
I anxiously await your prototype.
Post by Jens Maurer
If I've understood the interface correctly, both hashing an int x and
h(&x, 1);
h(&c, 1);
and the interface is type-erased (i.e. uses "void*" or "unsigned char*").
You are correct, assuming on this platform that all bits in both int and char participate in the type's representation. OTOH, if char is padded with 24 bits of random values, the hash_append will be prohibited (by the specification) from sending those random bits to the hash algorithm. hash_append might (for example) zero all the padding bits before sending the byte to the hashing algorithm.
Post by Jens Maurer
On a usual platform, the calls will end up like this (ignoring endianness
h(&x, 4);
h(&c, 1);
Correct again.
Post by Jens Maurer
I don't think "h" will ever be able to produce the same hash sum on both
platforms, even if specifically tailored for the particular platform.
It seems too much information is lost on the sizeof(char) == sizeof(int)
platform.
<shrug> My understanding is that hash algorithms consume bytes (and sometimes bits). They don't care what the type is. If we want to enforce that the same byte representation for two different types sends different byte streams to the hashing algorithm, hash_append is the right place to make that customization.
Post by Jens Maurer
One way to address this is to split the "int" into four octets and
assign a separate "unsigned char" for each octet in the hash_append
function on the DSP-style platform. Then both calls end up as
h(&x, 4);
h(&c, 1);
The hash algorithm 'h' is going to see bytes, no matter what we do (void* or unsigned char*). Whether hashing algorithms change those bytes into octets (or words) or not is completely within their implementation.

If the committee proclaims that the "message" sent by a 32-bit int should be different from the "message" sent by a 32-bit char, so be it. The current hash_append proposal purposefully does not dictate such details. My feeling is that dictating such details is bound to adversely impact efficiency, but I am happy to see those details worked out in committee.
Post by Jens Maurer
Post by Howard Hinnant
Post by Jens Maurer
So, the endianness is a boolean, not a three-way type? Either you're "native" or not
seems all that matters, from the code you presented.
static constexpr xstd::endian endian = xstd::endian::native;
static constexpr xstd::endian endian = xstd::endian::big;
static constexpr xstd::endian endian = xstd::endian::little;
1. Map native endian to native endian (presumably this is always a no-op).
2. Map native endian to big endian. This will be a no-op on big endian machines.
3. Map native endian to little endian. This will be a no-op on little endian machines.
Non-fingerprinting hash applications will probably always use the native mapping (i.e. they don't care about endian).
Yes.
It seems to me that these choices are, strictly speaking, not a property of the
(crypto) hash algorithm (that is only concerned with octets coming in),
but the algorithm will see bytes,
Post by Jens Maurer
but of
the preferences / situation in which it is used. As someone else pointed out,
we're essentially defining an ephemeral serialization format for purposes of
computing the hash.
- that (core) hash algorithm implementations such as SHA256 do not
specify the "endian" thing (it doesn't mean anything at this level) and
do you mean aspect 1 (input), or aspect 2 (output), or both?
Post by Jens Maurer
- that there is a config option to do scalar endian conversions if so desired.
std::uhash<std::sha256> // unportable for scalars > char
std::uhash<std::sha256, std::endian::big> // convert scalars > char to "big endian" prior to feeding octets to hash algorithm
This part sounds exactly like what I'm proposing, except that the specification is applied to std::sha256 (and all hashing algorithms in general), instead of to std::uhash (and all hash functors in general).

The hash functor is the wrong place to specify endian requests because the hash functor should be as simple as possible. It is only responsible for:

1. Initializing the hash algorithm.
2. Updating the hash algorithm with a single value.
3. Finalizing the hash algorithm.

template <class T>
result_type
operator()(T const& t) const noexcept
{
    Hasher h;
    hash_append(h, t);
    return static_cast<result_type>(h);
}

With as-simple-as-possible hash functor requirements, one maximizes the ability of the programmer to create a custom hash functor to do things such as seeding and/or salting.
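For example, a salting functor built on the same three steps might look
like this (a sketch; the salted_hash name and the salt handling are
illustrative, not part of any proposal):

#include <cstdint>

template <class Hasher>
struct salted_hash
{
    using result_type = typename Hasher::result_type;
    std::uint64_t salt = 0;

    template <class T>
    result_type
    operator()(T const& t) const noexcept
    {
        Hasher h;
        hash_append(h, salt);  // fold the salt in first
        hash_append(h, t);
        return static_cast<result_type>(h);
    }
};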

On the other hand, hash_append is the perfect place to respond to such a request, as hash_append knows both the type being hashed (which may or may not have endian concerns) and the hash algorithm it is dealing with. If the hash algorithm specifies the desired input endian, this is very easily taken care of. For example:

template <class HashAlgorithm>
void
hash_append(HashAlgorithm& h, char c) noexcept
{
    h(&c, 1); // never any endian concerns
}

On the other hand, for scalars > 1:

template <class HashAlgorithm>
void
hash_append(HashAlgorithm& h, int i) noexcept
{
    if (HashAlgorithm::endian != endian::native)
        i = convert_native_to(i, HashAlgorithm::endian);
    h(&i, sizeof(i));
}

The above does not have to be the literal implementation. I would prefer compile-time branching instead of run-time branching on the compile-time question of endian. But I'm just trying to simplify the presentation. And note that for platforms where sizeof(int) == sizeof(char) (and all bits are part of the representation), the std::lib implementation simplifies down to:

template <class HashAlgorithm>
void
hash_append(HashAlgorithm& h, char c) noexcept
{
    h(&c, 1); // never any endian concerns
}

template <class HashAlgorithm>
void
hash_append(HashAlgorithm& h, int i) noexcept
{
    h(&i, 1); // never any endian concerns
}

I.e. the implementation of the std::lib isn't portable. But that's ok. The std::lib implementors write non-portable code so that we don't have to.

If all bits *are not* part of the representation, then the std::lib implementation must mask off or zero the padding bits prior to sending a message to the hash algorithm.
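For reference, the compile-time branching preferred above could be
spelled with enable_if (a sketch; xstd::endian and convert_native_to are
the names already used in this thread):

#include <type_traits>

// Native endian: no conversion and no run-time test.
template <class HashAlgorithm>
typename std::enable_if<HashAlgorithm::endian == xstd::endian::native>::type
hash_append(HashAlgorithm& h, int i) noexcept
{
    h(&i, sizeof(i));
}

// Non-native endian: convert, then send the bytes.
template <class HashAlgorithm>
typename std::enable_if<HashAlgorithm::endian != xstd::endian::native>::type
hash_append(HashAlgorithm& h, int i) noexcept
{
    i = convert_native_to(i, HashAlgorithm::endian);
    h(&i, sizeof(i));
}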
Post by Jens Maurer
This also supports strange VAX endianness as the native endian
convention, I believe.
Agreed.
Post by Jens Maurer
Post by Howard Hinnant
Post by Jens Maurer
The other aspect is the fact that hash algorithms such as sha256 like
to process e.g. 32-bits (= 4 octets) at once. When reading four
octets from memory, it's helpful to be able to simply read them
into a register on "suitable" platforms and only do the endianness
(or other) conversion on the remainder of the platforms. But, on
the abstract level, this is not a configuration option, it's a
question of correctness.
Agreed. I see the second aspect above as an implementation detail of the hashing algorithm. This detail does not impact the hashing algorithm's interface, except insofar as it impacts its results. I see no motivation to "leak" this implementation detail into a standard specification, unless the standard is to specify concrete hashing algorithms. In that event we could choose any number of options, about which I really have little opinion. For example: Sha256_output_little_endian as one hashing algorithm and Sha256_output_big_endian as another. Or perhaps Sha256<endian> is another solution.
Well, there is a lost optimization opportunity if you're hashing an
array of unsigned int (32 bits) on a platform and with an endianness
choice that is just "right". Otherwise, you get two endianness conversions
back-to-back: One for the scalar > char thing from above, and one when
sha256 tries to form its internal 32-bit chunks.
This is precisely the optimization that my hash_append proposal has gone to great lengths to preserve. And also for std::string, std::vector<int>, std::array<int>, std::pair<int, int>, std::vector<std::pair<int, int>>, std::vector<std::tuple<int, int, int, int>>, etc., etc.
Post by Jens Maurer
Post by Howard Hinnant
Post by Jens Maurer
The simple name must result in the portable hash value.
Let me retract that; see details above.
Post by Howard Hinnant
My understanding from http://en.wikipedia.org/wiki/Sha256 is that SHA256 output endian is always big.
The output is a sequence of octets. It might be that this sequence is interpreted in
big endian style for (some) display purposes, but that's a minor detail.
If by "display purposes" you also mean transmission across a network, I agree, though for me it is a major detail.

Howard
Myriachan
2014-08-29 03:45:28 UTC
Permalink
Post by Jens Maurer
Hi Howard!
I'll have to ask a few more questions here. If something gets
standardized in this area, I'd like to see a roadmap how all
platforms supported by C++ can get portable (crypto) hash values,
even if not maximally efficient on "strange" environments.
I'm assuming that (in an abstract sense) a crypto hash algorithm
(and most others) hash a sequence of octets (i.e. 8-bit quantities).
(SHA256 and others can do odd trailing bits, but let's ignore
this for now.)
I presume I'm passing these octets to the hash algorithm using an
array of (unsigned) char, right?
Is this assumption also true for platforms where (unsigned) char
is e.g. 32-bits (DSPs)? If so, the following implementation doesn't
work there, because it makes no effort to split e.g. T==int (suppose
it's 32-bit) into four individual (unsigned) char objects.
(Oh, and on such platforms, 1 == sizeof(char) == sizeof(int).)
That's kind of the wrong way to look at SHA-256 and related
algorithms--they are actually bitwise algorithms, not bytewise. For a
certain class of hash functions, naming off the top of my head RIPEMD-160,
MD5, SHA-0/1/224/256/384/512, your message is treated as some number of
input bits. These input bits are chopped up into a block size of some
number of bits, typically 512 or 1024. The internal hash state is updated
using a compression function with the previous value of the hash state and
the input block. This repeats until there is no more input.

At the end, a single "1" bit is added to the input bits. Then "0" bits are
added until the input size has reached (BLOCKSIZE - bitsizeof(SIZEFIELD))
mod BLOCKSIZE. The SIZEFIELD is how big the length indicator is; this is
64 bits for the 512-bit blocks and 128 for the 1024-bit blocks. This
length, which is counted in bits, again pointing out that these algorithms
work with bits and not bytes, is appended, which, given the modulo,
completes the final block. The compression function is applied one last
time, and you have a result.

Most implementations simply ignore the fact that these are bitwise
algorithms. They write an extra 0x80 byte--all these algorithms use
big-endian bit order, even the little-endian byte order algorithms, like
MD5--do the zero padding, then append the byte length multiplied by 8.
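The padding arithmetic described above can be captured in a tiny helper
(a sketch for the 512-bit-block, 64-bit-length-field family such as
SHA-256; the function name is illustrative):

#include <cstdint>

// Number of '0' bits inserted after the mandatory '1' bit, so that
// message + 1 + zeros + 64-bit length is a multiple of 512 bits.
constexpr std::uint64_t
sha256_zero_pad_bits(std::uint64_t message_bits)
{
    return (448 + 512 - (message_bits + 1) % 512) % 512;
}

static_assert(sha256_zero_pad_bits(0) == 447,
              "empty message: 1 + 447 + 64 = 512 bits");
static_assert(sha256_zero_pad_bits(448) == 511,
              "no room left for the length field: padding wraps to a second block");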
Post by Jens Maurer
I think designing for a portable implementation of (for example) SHA256
for non-8-bit-byte platforms is an over-reaching goal. Forgetting
hash_append, and even C++, and designing whatever API you cared for, how
would you write an implementation of SHA256 that was byte-size agnostic?
And how would that impact the API of SHA256?
Today, the SHA256 implementations I've seen (written in C) are simply
#ifdef'd on endian, and are not portable to non-8-bit-byte platforms. The
hash_append proposal is not trying to impact this status quo. I consider
all of these to be HashAlgorithm details, to be worked out by the
HashAlgorithm author.
Most are not portable to systems with INT_MAX > UINT32_MAX, either, as I
pointed out earlier...

Melissa
Jens Maurer
2014-08-29 21:48:59 UTC
Permalink
Post by Howard Hinnant
As currently coded, the run-time-sized containers append the size of the container.
Good.
Post by Howard Hinnant
Append was chosen as opposed to prepend so that forward_list<T> would not have to traverse twice during hash_append.
Good point.
Post by Howard Hinnant
I think designing for a portable implementation of (for example) SHA256 for non-8-bit-byte platforms is an over-reaching goal. Forgetting hash_append, and even C++, and designing whatever API you cared for, how would you write an implementation of SHA256 that was byte-size agnostic? And how would that impact the API of SHA256?
I'll take that as a challenge.
Post by Howard Hinnant
Today, the SHA256 implementations I've seen (written in C) are simply #ifdef'd on endian, and are not portable to non-8-bit-byte platforms.
I agree with that.
Post by Howard Hinnant
OTOH, if char is
padded with 24 bits of random values, the hash_append will be
prohibited (by the specification) from sending those random bits to
the hash algorithm. hash_append might (for example) zero all the
padding bits before sending the byte to the hashing algorithm.
This cannot happen for "char" because "char" has no padding.
See 3.9.1p1 "all bits of the object representation participate in the value
representation."

(It would be hard to reliably zero the padding, because random noise might
pop up again any time I read the "char".)

Padding can happen for "int" and other scalars.
Post by Howard Hinnant
<shrug> My understanding is that hash algorithms consume bytes (and sometimes bits). They don't care what the type is.
My understanding is that hash algorithms consume octets (and sometimes bits).
They don't care what the type is as long as they get values in the range
0-255 (one octet) at a time. Anything else would be non-portable on
the input side, so I cannot expect portable output values.
Post by Howard Hinnant
Post by Jens Maurer
One way to address this is to split the "int" into four octets and
assign a separate "unsigned char" for each octet in the hash_append
function on the DSP-style platform. Then both calls end up as
h(&x, 4);
h(&c, 1);
The hash algorithm 'h' is going to see bytes, no matter what we do (void* or unsigned char*). Whether hashing algorithms change those bytes into octets (or words) or not is completely within their implementation.
We could say that we pass "unsigned char *" and constrain each "unsigned char" to the minimum
value range that is portably representable in an "unsigned char", i.e. 0 .. 255.
Post by Howard Hinnant
If the committee proclaims that the "message" sent by a 32-bit int should be different from the "message" sent by a 32-bit char, so be it. The current hash_append proposal purposefully does not dictate such details. My feeling is that dictating such details is bound to adversely impact efficiency, but I am happy to see those details worked out in committee.
See you there :-)
Post by Howard Hinnant
Post by Jens Maurer
- that (core) hash algorithm implementations such as SHA256 do not
specify the "endian" thing (it doesn't mean anything at this level) and
do you mean aspect 1 (input), or aspect 2 (output), or both?
Aspect 1 (input, e.g. mapping of source-level "short" to bytes).
Post by Howard Hinnant
Post by Jens Maurer
- that there is a config option to do scalar endian conversions if so desired.
std::uhash<std::sha256> // unportable for scalars > char
std::uhash<std::sha256, std::endian::big> // convert scalars > char to "big endian" prior to feeding octets to hash algorithm
This part sounds exactly like what I'm proposing, except that the specification is applied to std::sha256 (and all hashing algorithms in general), instead of to std::uhash (and all hash functors in general).
std::uhash<> is one specific hash functor that uses ADL on hash_append() to do its job.
Post by Howard Hinnant
1. Initializing the hash algorithm.
2. Updating the hash algorithm with a single value.
3. Finalizing the hash algorithm.
template <class T>
result_type
operator()(T const& t) const noexcept
{
    Hasher h;
    hash_append(h, t);
    return static_cast<result_type>(h);
}
My suggestion would lead to

template <class T>
result_type
operator()(T const& t) const noexcept
{
    ConvertToBigEndian<Hasher> h;
    hash_append(h, t);
    return static_cast<result_type>(h);
}

in one of the three specializations (the other two are similar).
The hash functor requirements remain the same (and very simple).
Post by Howard Hinnant
With as-simple-as-possible hash functor requirements, one maximizes the ability of the programmer to create a custom hash functor to do things such as seeding and/or salting.
Thinking about it, I see the endian normalization for aspect 1 (input)
as a user-level customization similar to seeding or salting. This
should not be related to something like std::sha256.
Post by Howard Hinnant
template <class HashAlgorithm>
void
hash_append(HashAlgorithm& h, char c) noexcept
{
    h(&c, 1); // never any endian concerns
}
Agreed.
Post by Howard Hinnant
template <class HashAlgorithm>
void
hash_append(HashAlgorithm& h, int i) noexcept
{
    if (HashAlgorithm::endian != endian::native)
        i = convert_native_to(i, HashAlgorithm::endian);
    h(&i, sizeof(i));
}
My suggestion is simpler: Let's have a more specialized overload:

template <class HashAlgorithm>
void
hash_append(ConvertToBigEndian<HashAlgorithm>& h, int i) noexcept
{
    i = convert_native_to_big_endian(i);
    h.base()(&i, sizeof(i));
}

No fancy traits, no template metaprogramming.
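For completeness, the ConvertToBigEndian adaptor assumed by that overload
might look like this (a sketch; only the name and the base() accessor come
from the overload above, the rest is illustrative):

#include <cstddef>

template <class Hasher>
class ConvertToBigEndian
{
    Hasher h_;
public:
    using result_type = typename Hasher::result_type;

    // Forward raw byte ranges untouched; the specialized hash_append
    // overloads do the conversions before calling this.
    void operator()(void const* p, std::size_t n) noexcept { h_(p, n); }

    Hasher& base() noexcept { return h_; }

    explicit operator result_type() noexcept
    {
        return static_cast<result_type>(h_);
    }
};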
Sure. (Not that either option would produce different code with today's optimizers.)
Post by Howard Hinnant
I.e. the implementation of the std::lib isn't portable. But that's ok. The std::lib implementors write non-portable code so that we don't have to.
Fully agreed here. However, I see this interface framework as something that
could have been user-written, so I'd like to see how it breaks for users
trying to do the same thing.
Post by Howard Hinnant
Post by Jens Maurer
Well, there is a lost optimization opportunity if you're hashing an
array of unsigned int (32 bits) on a platform and with an endianness
choice that is just "right". Otherwise, you get two endianness conversions
back-to-back: One for the scalar > char thing from above, and one when
sha256 tries to form its internal 32-bit chunks.
This is precisely the optimization that my hash_proposal has gone to great lengths to preserve.
How so? Suppose we want to hash std::vector<int> and we're on a little endian machine.
Further, suppose the "portable hash value" convention, as defined by some user
community, says you should hash an "int" value as a big-endian sequence of bytes.
So, for "aspect 1" (endian conversion while forming bytes), you'll do a byte reversal
before passing the bytes through the "void *" interface border.

Then, according to http://tools.ietf.org/html/rfc6234#page-47 (bottom part),
we'll have an endian conversion for forming 32-bit "words" from input bytes for
SHA256 again. This is "aspect 2", and beyond the "void *" interface border.

I can't see how to avoid the useless double endian conversion for this specific
case with your design.
Post by Howard Hinnant
Post by Jens Maurer
Post by Howard Hinnant
My understanding from http://en.wikipedia.org/wiki/Sha256 is that SHA256 output endian is always big.
The output is a sequence of octets. It might be that this sequence is interpreted in
big endian style for (some) display purposes, but that's a minor detail.
If by "display purposes" you also mean transmission across a network, I agree, though for me it is a major detail.
Fine. As long as there is a well-defined sequence of octets coming
out of it, I'm happy to call it duck-endian if need be.

Jens
Markus Mayer
2014-08-24 08:52:53 UTC
Permalink
Post by Howard Hinnant
Post by Markus Mayer
-Why not implement operator()?
Having a function (with a name) is more vocal (and clear) than just braces. IMHO operator() is only useful if the object will be used as a functor (as in std::less). But the signature is too uncommon to be used in any standard algorithm. But I'm willing to change if someone comes up with a good example.
The "Types Don't Know #" proposal used operator() because it made it much easier to design a type-erased hasher that could be used for pimpl types.
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3980.html#pimpl
That's a good reason to add operator(). Adding it to the next version.
Post by Howard Hinnant
Post by Markus Mayer
-Why not rename 'hash_value' to result_type()?
What do you prefer?
auto result = myHash.hash_value();
or
auto result = static_cast<hash_function::result_type>(myHash);
hash_function::result_type result{myHash};
With my version it is:
hash_function::result_type result{myHash.hash_value()};

I think my solution allows everything your solution allows plus
auto result = myHash.hash_value();

So I stick with my solution (for now)...
Post by Howard Hinnant
auto is a double-edged sword. I personally love it. But I'm also finding it can be abused. Sometimes it is helpful to have the type of a variable be more explicit. But see below where it is shown that minimal code interfaces directly with hash algorithms.
Only hash functors should call hash algorithms.
I'm not sure if I get your point, but I think others will also call hash
algorithms. Think about implementing a protocol which uses a checksum
for the header and the content.

First you load it into a buffer; then you want to check whether the
checksum matches.

vector<unsigned char> buffer;
myHash(buffer.data(), buffer.size());
//compare checksum...

As you don't know how the hasher for vector works (maybe it adds the size
to the hash), you have to use the algorithm directly.
Post by Howard Hinnant
And hash functors are what everybody else calls, such as unordered containers, or clients needing the result of a sha256. The distinction between hash functors and hash algorithms is what enables end-user-clients to easily swap out hash algorithms, simply by telling the hash functor to switch hash algorithms.
Post by Markus Mayer
-Sync with 'Types Don't Know #'
Here is a 'Types Don't Know #' version of sha256 based on the implementation at http://www.aarongifford.com/computers/sha.html
class sha256
{
    SHA256_CTX state_;
public:
    static constexpr std::endian endian = std::endian::big;
    using result_type = std::array<std::uint8_t, SHA256_DIGEST_LENGTH>;

    sha256() noexcept
    {
        SHA256_Init(&state_);
    }

    void
    operator()(void const* key, std::size_t len) noexcept
    {
        SHA256_Update(&state_, static_cast<uint8_t const*>(key), len);
    }

    explicit
    operator result_type() noexcept
    {
        result_type r;
        SHA256_Final(r.data(), &state_);
        return r;
    }
};
int
main()
{
    std::uhash<sha256> h;
    auto r = h(54);
}
r has type sha256::result_type, which is a 32-byte std::array. It hashes the int 54, converted to big endian prior to feeding it to the hash (if necessary). Little endian or native endian could just as easily have been chosen. One uses native endian if one doesn't care about the results being consistent across platforms with differing endianness. Note that the endian bits are lacking from N3980. They are a later refinement, strictly to aid fingerprinting applications such as sha256.
Note that neither clients such as the main() shown above nor unordered containers use sha256 directly, except to parameterize hash functors such as uhash. The hash functors such as uhash communicate directly with the hash algorithms such as sha256. This (compile-time) level of indirection is what makes it possible to very easily and quickly switch between sha256 and sha3_512, or siphash, etc.
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3980.html#seeding
std::unordered_set<MyType, randomly_seeded_hash<acme::spooky>> my_set;
Using sha256 in this seeded context would fail at compile time, since sha256 is not seedable (at least not the implementation I used).
int
main()
{
    std::uhash<sha256> h;
    auto r = h(std::string("54"));
}
int
main()
{
    std::uhash<super_sha> h;
    auto r = h(std::string("54"));
}
This is in stark contrast to today's std::hash<T> model where one would have to revisit int, std::string, and all other types, to teach them about super_sha.
Howard
Jens Maurer
2014-08-24 06:30:46 UTC
Permalink
Post by Markus Mayer
-Why 'unsigned char' and not 'uint8_t' for result_type?
Most hash algorithms work on an 8 bit basis. But as long as unsigned
char has a multiple of 8 bits, the algorithms can still be applied. So
'unsigned char' enables those architectures to implement the functions.
Architectures where 'unsigned char' is not a multiple of 8 bits will be
excluded by the proposal.
If you specify that a hash function operates on octets, i.e. 8-bit
quantities (I believe all of them do), then it seems OK to have a 9-bit
unsigned char, whose top bit will always be zero.

Conversion from object representation to octets is a separate issue.
Post by Markus Mayer
-Why not implement operator()?
Having a function (with a name) is more vocal (and clear) than just
braces. IMHO operator() is only useful if the object will be used as a
functor (as in std::less). But the signature is too uncommon to be used
in any standard algorithm. But I'm willing to change if someone comes up
with a good example.
In my opinion, a hash is a function object and should have operator().
Post by Markus Mayer
-How to handle the large state of a hash function?
Hash function can have a large internal state (64 Byte for sha2, 200
Byte for sha3) is it OK to put such objects on the stack, or do we need
to allocate them dynamically (using allocators)?
No, you've got constant state size. Putting 200 bytes on the stack
is totally OK; implementations where it isn't are free to use
another strategy.
Post by Markus Mayer
-How to hash files?
Hashing a file is quite common. As the iterator interface was removed,
there is no easy way to hash a file (using istream_iterator). How to do
it now?
It's certainly possible to provide a generic wrapper template that converts
from input iterators to byte buffers, i.e.

template<class H, class InputIterator>
void process(H&, InputIterator first, InputIterator last);
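A sketch of such a wrapper in terms of the process_bytes interface
proposed in this thread (the chunk size is arbitrary):

#include <cstddef>
#include <fstream>
#include <iterator>

// Buffer the input and feed the hasher in chunks.
template <class H, class InputIterator>
void
process(H& h, InputIterator first, InputIterator last)
{
    unsigned char buffer[512];
    std::size_t n = 0;
    for (; first != last; ++first)
    {
        buffer[n++] = static_cast<unsigned char>(*first);
        if (n == sizeof buffer)
        {
            h.process_bytes(buffer, n);
            n = 0;
        }
    }
    if (n != 0)
        h.process_bytes(buffer, n);
}

// Hashing a file then becomes, e.g.:
//   std::ifstream in("file.bin", std::ios::binary);
//   process(h, std::istreambuf_iterator<char>(in),
//              std::istreambuf_iterator<char>());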
Post by Markus Mayer
-Should we only add some crc classes (like crc_32, crc_16 or crc_ccitt)
or a generic (templated) crc algorithm (like in boost::crc)?
Yes, please add crc32c at least. Intel has a special instruction for it
that we should make easily usable.
Post by Markus Mayer
-Add 'nothrow' where applicable
noexcept

Jens
David Krauss
2014-08-24 07:05:35 UTC
Permalink
Post by Jens Maurer
If you specify that a hash function operates on octets, i.e. 8-bit
quantities (I believe all them do), then it seems ok to have a 9-bit
unsigned char, whose top bit will always be zero.
Conversion from object representation to octets is a separate issue.
9-bit systems use the extra bit for parity anyway, and the hash would just ignore it.

Anyway, portable hashes should be defined in terms of values, not representation bytes. For text, any system will give you a well-defined octet sequence. For anything else, it's better to follow the lines of Hinnant's research than to give up and reinterpret_cast< unsigned char * >( obj ), even for simple integers.

I'm even more leery of unsigned char * than void * as a generic interface. The user needs a template to handle anything besides text, which is covered by ambiguously-signed plain char *.
Post by Jens Maurer
In my opinion, a hash is a function object and should have operator().
+1. Function objects are amenable to metaprogramming facilities. Hashes are often used as filters, so they should work with higher-order functions. A method name inside a one-function class only introduces repetition. A method name inside a generic interface implemented by various classes reserves that name from the generic implementations.
Post by Jens Maurer
Post by Markus Mayer
-Should we only add some crc classes (like crc_32, crc_16 or crc_ccitt)
or a generic (templated) crc algorithm (like in boost::crc)?
Yes, please add crc32c at least. Intel has a special instruction for it
that we should make easily usable.
A generic template still allows a hardware-accelerated specialization. Other hardware supports CRC16 in its various flavors.

I think the random number engines ([rand.eng] and [rand.predef]) set a good precedent.
Markus Mayer
2014-08-24 09:05:59 UTC
Permalink
Thanks for all your feedback. Here comes the new version...

Changes:
-Add operator()
-Make hash_value() const and return a value
-Remove question whether an allocator is needed. It is not.
-Rename 'nothrow' to 'noexcept'
-Add a rationale for operator()
-Rewrite 'Why 'unsigned char' and not 'uint8_t' for result_type?'

New open questions:
-Should we add an overload for 'signed char' and 'char' also?

Design:

class hash_function
{
public:
    typedef std::array<unsigned char, ALGORITHM_DEFINED> result_type;
    //Default constructible
    //Copyable
    //Moveable

    hash_function& process_bytes(unsigned char const *buffer,
                                 std::size_t byte_count);

    hash_function& operator()(unsigned char const *buffer,
                              std::size_t byte_count);

    void reset();

    result_type hash_value() const;
};

//I am not sure about this function yet...
template<class hash>
typename hash::result_type calculate_hash(void const *buffer,
std::size_t byte_count);
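
To illustrate the intended usage, here is a sketch (assuming
process_bytes is meant to take a pointer and a byte count):

#include <cstddef>

//Usage sketch; sha_256 is one of the classes listed below.
sha_256::result_type digest_of(const unsigned char *data, std::size_t size)
{
    sha_256 hasher;
    hasher.process_bytes(data, size); //calls can be chained
    return hasher.hash_value();
    //single-shot alternative: return calculate_hash<sha_256>(data, size);
}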

The implemented algorithms will be (the class name is given in the list):
-md5
-sha_1
-sha_224
-sha_256
-sha_384
-sha_512
-sha3_224
-sha3_256
-sha3_384
-sha3_512
-Various flavors of crc


Rationale:
-Why 'result_type'?
To be consistent with std::function. And the name fits pretty well anyway.

-Why 'unsigned char' and not 'uint8_t' for result_type?
Most hash algorithms work on an 8 bit basis. But they can often be
implemented on odd-sized architectures as well. As such architectures
often use freestanding implementations, they are also free to not
implement it.

-Why 'unsigned char' and not 'char' for result_type?
It is to prevent people from thinking that the result is text (a
string). Furthermore, a raw byte, when interpreted, is always
non-negative. Or do you interpret 0xFF as -1 when it appears in a raw
byte stream?

-Why 'process_bytes' and not 'write', 'update', ...?
Well, naming is hard. I will stick to 'process_bytes' during design, but
I'm open to suggestions.

-Why operator()?
To allow the usage of hash functions as function objects.

-Why not rename 'hash_value' to result_type()?
What do you prefer?
auto result = myHash.hash_value();
or
auto result = static_cast<hash_function::result_type>(myHash);

-Why 'sha_512' and not 'sha2_512'?
'sha_512' is the official name for the algorithm. I know it is bad, but
it is better to be consistent.

-Why not add an iterator based process_bytes?
For now I consider it too complex.

-Why not add/delete algorithm XXX?
I think these are the most common. But I am open to suggestions.

-Why not use the naming of 'N3980: Types Don't Know #'?
This is already discussed above.

Open topics:

-How to hash a file?
Hashing a file is quite common. As the iterator interface was removed,
there is no easy way to hash a file (using istream_iterator). How to do
it now?

-Sync with 'Types Don't Know #'

-Should we only add some crc classes (like crc_32, crc_16 or crc_ccitt)
or a generic (templated) crc algorithm (like in boost::crc)?

-Add 'noexcept' where applicable

-More naming discussions

-Find a suitable header file (maybe functional or algorithm)


regards
Markus
David Krauss
2014-08-24 09:41:55 UTC
Permalink
-Should we only add some crc classes (like crc_32, crc_16 or crc_ccitt) or a generic (templated) crc algorithm (like in boost::crc)?
Please use generic templates wherever possible. Provide typedefs for the common cases. Special cases and hardware acceleration can be done in template specializations, but that should be transparent to the user.

It's better to let the library worry about a generic solution than to leave users without a paddle. There are many weird permutations of CRC in the wild. And it shouldn't be much of a burden for the library, given that Boost CRC already exists.
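
For instance, the parameterization could follow Boost.CRC (a rough
sketch; the exact template parameters are up for discussion):

#include <cstddef>
#include <cstdint>

//Sketch: one generic template; hardware-accelerated variants become
//specializations that are invisible to the user.
template<std::size_t Bits, std::uintmax_t TruncPoly,
         std::uintmax_t InitRem, std::uintmax_t FinalXor,
         bool ReflectIn, bool ReflectRem>
class crc_optimal; //implementation elided

//Typedefs for the common cases:
typedef crc_optimal<32, 0x04C11DB7, 0xFFFFFFFF, 0xFFFFFFFF, true, true>
    crc_32; //the "zip" CRC
typedef crc_optimal<16, 0x1021, 0xFFFF, 0, false, false>
    crc_ccitt;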
Jens Maurer
2014-08-24 12:31:26 UTC
Permalink
Post by Markus Mayer
class hash_function
{
typedef std::array<unsigned char, ALGORITHM_DEFINED> result_type;
//Default constructible
//Copyable
//Moveable
hash_function& process_bytes( void unsigned char *buffer, std::size_t
byte_count);
hash_function& operator() ( void unsigned char *buffer, std::size_t
byte_count);
Remove the "void".

Remove "process_bytes()", it's redundant.
Post by Markus Mayer
void reset();
result_type hash_value() const;
I've got a slight preference towards a named "result" function,
but I can also live with the explicit conversion operator.
Post by Markus Mayer
-How to hash a file?
Hashing a file is quite common. As the iterator interface was removed,
there is no easy way to hash a file (using istream_iterator). How to do
it now?
Provide a thin wrapper template that processes input iterators
in e.g. 1024-byte chunks.
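
Something along these lines (just a sketch; all names and the chunk
size are placeholders):

#include <array>
#include <cstddef>

//Sketch: feed an input-iterator range to a hasher in fixed-size chunks.
template<class InputIt, class Hasher>
Hasher& process_chunked(InputIt first, InputIt last, Hasher& h)
{
    std::array<unsigned char, 1024> buf;
    while (first != last) {
        std::size_t n = 0;
        while (first != last && n < buf.size())
            buf[n++] = static_cast<unsigned char>(*first++);
        h.process_bytes(buf.data(), n);
    }
    return h;
}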
Post by Markus Mayer
-Should we only add some crc classes (like crc_32, crc_16 or crc_ccitt)
or a generic (templated) crc algorithm (like in boost::crc)?
I've no objection against providing a fully general CRC capability,
but I'd like to see a typedef for at least crc32c, since Intel has
a hardware instruction for this that asks for specialization.
Post by Markus Mayer
-Find a suitable header file (maybe functional or algorithm)
I'd be inclined to ask for a separate header file.

Are there any legal restrictions on secure hash functions somewhere?
There are definitely some for encryption algorithms in some places.

Jens
Markus Mayer
2014-08-24 16:05:47 UTC
Permalink
Post by Jens Maurer
Are there any legal restrictions on secure hash functions somewhere?
There are definitely some for encryption algorithms in some places.
I have not found evidence that hash functions are export restricted, but
this will have to be checked by a lawyer eventually.
Olaf van der Spek
2014-08-24 14:36:26 UTC
Permalink
Post by Markus Mayer
-Why not add/delete algorithm XXX?
I think these are the most common. But I am open to suggestions.
bcrypt or something suitable for hashing passwords?
Post by Markus Mayer
hash_value()
The 'hash' part seems a bit redundant; isn't there a standard name for
returning the value from such an object?
Myriachan
2014-08-24 15:31:33 UTC
Permalink
I think "unsigned char *" would be a confusing interface, unless the documentation explicitly stated that only the low 8 bits of each char were used as input. Remember, the MD5 and the SHA family are formally defined as a bitwise algorithm, not bytewise, so on an architecture in which CHAR_BIT > 8, does that mean every bit gets hashed, or only the low 8? Keep in mind that there are weird 12-bit microcontrollers and such, though they may be too small to have C++ and these algorithms.

The American government agency NIST actually certifies algorithms as either supporting bitwise mode or only supporting bytewise.

Even if these *are* considered to be just the low 8 bits, I suppose we assume big-endian bit order in all cases? (MD5 is big-endian bit order, little-endian byte order; the SHA-1 and -2 series is big-endian for both; no idea about SHA-3.)

By the way, I would make sure that there is no built-in way in the interface to avoid the += step at the end of an MD5/SHA-1/SHA-256 series of rounds. Otherwise, it's trivial to turn these into block ciphers, and then it'd be a legal mess.

Melissa
Markus Mayer
2014-08-24 16:38:17 UTC
Permalink
Post by Markus Mayer
-Why not add/delete algorithm XXX?
I think these are the most common. But I am open to suggestions.
bcrypt or something suitable for hashing passwords?
I have added bcrypt and scrypt.
Post by Markus Mayer
Post by Markus Mayer
hash_value()
The 'hash' part seems a bit redundant; isn't there a standard name for
returning the value from such an object?
std::future uses get(). But it does not fit that well, imho.
Thiago Macieira
2014-08-24 15:32:37 UTC
Permalink
Post by Markus Mayer
//Default constructible
//Copyable
//Moveable
Do you mean trivially for any of the above?

I don't see how the types could be trivially constructible. And if you allow
the implementations to allocate heap for the state, they can't be trivially
copyable, movable or destructible either.
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358
Markus Mayer
2014-08-24 16:44:41 UTC
Permalink
Post by Thiago Macieira
Post by Markus Mayer
//Default constructible
//Copyable
//Moveable
Do you mean trivially for any of the above?
I don't see how the types could be trivially constructible. And if you allow
the implementations to allocate heap for the state, they can't be trivially
copyable, movable or destructible either.
No, I don't mean trivially. Maybe they will be noexcept, which also
forbids dynamic allocation. I'm not sure if the implementors need that
much flexibility.
Markus Mayer
2014-08-24 17:16:17 UTC
Permalink
Thanks for all your feedback. Here comes the new version...

Changes:
-change signature of process_bytes and operator() (I think I will keep
both of them)
-add more algorithms
-add draft of a function to process a range given by two iterators
blockwise (process_blockwise)

Design:

class hash_function
{
public:
typedef std::array<unsigned char, ALGORITHM_DEFINED> result_type;
//Default constructible
//Copyable
//Moveable

hash_function& process_bytes( const void *buffer, std::size_t
byte_count);

hash_function& operator() ( const void *buffer, std::size_t byte_count);

void reset();

result_type hash_value() const;

};

template<class InputIt, class Func>
process_blockwise(InputIt first, InputIt last, std::size_t block_size,
Func& processing_function);

//Maybe add process_blockwise_n
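
As a usage sketch of mine (not proposal text), this helper answers the
'how to hash a file' open topic below:

#include <fstream>
#include <iterator>

//Sketch: hash a whole file; sha_256 is one of the classes listed below.
sha_256::result_type hash_file(const char *path)
{
    std::ifstream file(path, std::ios::binary);
    sha_256 hasher;
    process_blockwise(std::istreambuf_iterator<char>(file),
                      std::istreambuf_iterator<char>(),
                      1024, hasher);
    return hasher.hash_value();
}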

The implemented algorithms will be (the class name is given in the list):
-md5
-sha_1
-sha_224
-sha_256
-sha_384
-sha_512
-sha3_224
-sha3_256
-sha3_384
-sha3_512
-crc as a generic implementation with typedefs for crc_32, crc_16,
crc_32c, crc_xmodem and crc_ccitt
-bcrypt
-scrypt



Rationale:
-Why 'result_type'?
To be consistent with std::function. And the name fits pretty well anyway.

-Why 'unsigned char' and not 'uint8_t' for result_type?
Most hash algorithms work on an 8 bit basis. But they can often be
implemented on odd-sized architectures as well. As such architectures
often use freestanding implementations, they are also free to not
implement it.

-Why 'unsigned char' and not 'char' for result_type?
It is to prevent people from thinking that the result is text (a
string). Furthermore, a raw byte, when interpreted, is always
non-negative. Or do you interpret 0xFF as -1 when it appears in a raw
byte stream?

-Why 'process_bytes' and not 'write', 'update', ...?
Well, naming is hard. I will stick to 'process_bytes' during design, but
I'm open to suggestions.

-Why operator()?
To allow the usage of hash functions as function objects.

-Why not rename 'hash_value' to result_type()?
What do you prefer?
auto result = myHash.hash_value();
or
auto result = static_cast<hash_function::result_type>(myHash);

-Why 'sha_512' and not 'sha2_512'?
'sha_512' is the official name for the algorithm. I know it is bad, but
it is better to be consistent.

-Why not add an iterator based process_bytes?
For now I consider it too complex.

-Why not add/delete algorithm XXX?
I think these are the most common. But I am open to suggestions.

-Why not use the naming of 'N3980: Types Don't Know #'?
This is already discussed above.

Open topics:

-How to hash a file?
Hashing a file is quite common. As the iterator interface was removed,
there is no easy way to hash a file (using istream_iterator). How to do
it now?

-Sync with 'Types Don't Know #'

-Add 'noexcept' where applicable

-More naming discussions

-Find a suitable header file (maybe functional or algorithm)


regards
Markus
Markus Mayer
2014-08-31 17:15:21 UTC
Permalink
Thanks for your valuable feedback.

I've checked the proposal for its feasibility on different byte orders
and on odd-sized architectures. Here are my findings and fixes:
- The result type is an array of std::uint_least8_t. Each cell contains
a value between 0 and 255 (including 0 and 255). This can be
represented on every architecture. Byte order is no problem because the
result is a sequence of single bytes (see the sketch below).

- process_bytes takes an address and a count. It hashes 'count' bytes
(which have CHAR_BIT bits each) starting at the given address, in the
order they are represented in memory. Every bit of this range
contributes to the result. No byte-order conversion is performed; that
is not the job of the hash function and must be handled by an upper
layer.
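
To make this concrete, here is a sketch of mine (not proposal text) of
such an upper layer:

#include <cstdint>

//Sketch: the caller normalizes the byte order before hashing, so the
//digest matches across architectures. The & 0xFF masks matter because
//a cell may be wider than 8 bits but still only holds values 0 to 255.
template<class Hasher>
void hash_u32_big_endian(Hasher& h, std::uint32_t v)
{
    unsigned char be[4] = {
        static_cast<unsigned char>((v >> 24) & 0xFF),
        static_cast<unsigned char>((v >> 16) & 0xFF),
        static_cast<unsigned char>((v >>  8) & 0xFF),
        static_cast<unsigned char>( v        & 0xFF),
    };
    h.process_bytes(be, sizeof be);
}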

With the current proposal you are unable to hash a range that is not a
multiple of a byte. It is possible to add a process_bits(unsigned char
byte, std::size_t bit_count), but I think it is not that useful. We can
add it once we have experience with how it is actually used.


Regarding 'const void* buffer vs. const unsigned char* buffer':
I think 'const unsigned char*' is clearer, because it requires an
explicit cast. This is like a warning sign: "Warning: you have to take
care of byte order, padding and other stuff yourself!".

The 'const void*' solution is more convenient when you want to hash a
struct or a single variable.

I came to the conclusion that it depends on how common the 'interpret
this as a byte range' case is. I tried to come up with some actual
numbers, but my google-fu isn't strong enough.

In about 80% of my usage I do not have an unsigned char*. But what
about your experience?


Other changes:
- Add a constexpr block_size. It specifies the smallest buffer length,
in bytes (having CHAR_BIT bits each), needed to achieve optimal
performance. It is implementation defined, because it is a property of
the specific implementation.
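
For example (a sketch; sha_256 stands in for any of the hash classes):

#include <cstddef>

//Sketch: block_size lets callers size I/O buffers as a whole number of
//blocks, so every process_bytes call hits the optimal path.
constexpr std::size_t io_buf_size = 16 * sha_256::block_size;
unsigned char io_buf[io_buf_size];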


Open questions:
- process_bytes vs. operator(), or keep both
- const void* buffer vs. const unsigned char* buffer

class hash_function
{
public:
typedef std::array<std::uint_least8_t, ALGORITHM_DEFINED> result_type;
static constexpr std::size_t block_size = IMPLEMENTATION_DEFINED;
//Default constructible
//Copyable
//Moveable

hash_function& process_bytes( const void *buffer, std::size_t
byte_count);

hash_function& operator() ( const void *buffer, std::size_t byte_count);

void reset();

result_type hash_value() const;

};

template<class InputIt, class Hasher>
process_blockwise(InputIt first, InputIt last, Hasher& hash_fn);

//Maybe add process_blockwise_n

regards
Markus
Jens Maurer
2014-09-10 19:32:01 UTC
Permalink
Post by Markus Mayer
I think 'const unsigned char*' is clearer, because it requires an
explicit cast. This is like a warning sign: "Warning: you have to take
care of byte order, padding and other stuff yourself!".
The 'const void*' solution is more convenient when you want to hash a
struct or a single variable.
I came to the conclusion that it depends on how common the 'interpret
this as a byte range' case is. I tried to come up with some actual
numbers, but my google-fu isn't strong enough.
In about 80% of my usage I do not have an unsigned char*. But what
about your experience?
My experience is that type-safety is one of the strongest
assets of C++. Also, I dislike an interface that makes it
easy to fall into a portability trap without warning.
(Yes, it's also bad that C++ allows narrowing conversions
on integers (which is a portability issue), but fortunately,
some compilers warn about these nowadays.)

Jens
Zhihao Yuan
2014-09-10 20:40:55 UTC
Permalink
Post by Jens Maurer
Post by Markus Mayer
In about 80% of my usage I do not have an unsigned char*. But what
about your experience?
My experience is that type-safety is one of the strongest
assets of C++. Also, I dislike an interface that makes it
easy to fall into a portability trap without warning.
(Yes, it's also bad that C++ allows narrowing conversions
on integers (which is a portability issue), but fortunately,
some compilers warn about these nowadays.)
Maybe we should standardize `storageof` as an `addressof`
returning `unsigned char*` for anything contiguously hashable.
The term `storage` comes from `aligned_storage`, which
gives you an `unsigned char[]`.
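
i.e. something like this sketch (storageof is hypothetical, not
standard):

#include <memory>
#include <type_traits>

//Sketch: yields the object representation as unsigned char* for
//trivially copyable ("contiguously hashable") types only.
template<class T>
const unsigned char* storageof(const T& t) noexcept
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "contiguously hashable types only");
    return reinterpret_cast<const unsigned char*>(std::addressof(t));
}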
--
Zhihao Yuan, ID lichray
The best way to predict the future is to invent it.
___________________________________________________
4BSD -- http://bit.ly/blog4bsd