Security programming with OpenSSL

OpenSSL is the ubiquitous toolkit for writing security applications.
It was originally written as an SSL library with a specific focus on
performance on x86 platforms, but today it has become the de facto
standard for implementing user space security applications.

What makes OpenSSL so versatile? One look at the feature set will
probably give us the answer. OpenSSL does the following and much more:

1) Supports nearly every symmetric encryption algorithm and message
   digest algorithm under the sun

2) Supports the major public key technologies (RSA, DSA and
   Diffie-Hellman), including elliptic curve cryptography

3) Has a comprehensive built-in library for ASN.1 (including DER
   encoding) and Base64

4) Generates pseudorandom bits, given an appropriate random seed

5) Supports SSL v2, SSL v3 and TLS v1

6) Implements the X.509 certificate standard with v3 extensions

7) Supports the SMIME, PKCS7, PKCS8 and PKCS12 standards, as well
   as OCSP

8) Supports hardware acceleration through its ENGINE interface

9) Has a low level math library (BIGNUM) for prime number
   generation, modular arithmetic, arbitrary precision
   multiplication, exponentiation and so on

10) Has a BIO (Basic I/O) abstraction with which you can "chain"
    filter BIOs to a source/sink BIO

I stopped at 10 since the list would get too long if I went on.
OpenSSL also has a command line tool, openssl, with an interface to
almost everything mentioned above, and man pages that describe each
of its commands in detail. The library is free for commercial use,
and its license does not oblige you to release the source code of
your changes.
Not surprisingly, OpenSSL is popular in the commercial world as well.

What is still missing, however, after several years in which the
library has steadily gained features, enhancements, bug fixes,
support for different hardware chips and so on, is proper
documentation.

Sure, you can walk to the nearby bookstore and pick up the O'Reilly
book titled "Network Security with OpenSSL". You can also search the
web and come up with a few results. But none of them gives you a
true picture of how to use the full breadth of OpenSSL's features.

This article does not aim to give you advice on security or security
practices. Nor will we talk about topics covered elsewhere, such as
SSL programming with OpenSSL. Instead, we will focus on giving you
enough of a grip on the OpenSSL source code that you can use OpenSSL
profitably. You may have an academic interest or wish to learn the
practical aspects of cryptography. Or you may want to hack just for
the fun of it.

The first thing most people think of when they hear "security" or
"cryptography" is encryption. So why not start with that?

Encryption can be explained, very simply, as a process in which the
input is subjected to "substitution" and "transposition", which
provide what are known as confusion and diffusion, to produce an
output that bears no recognizable relationship to the input. In fact,
the mixing is so thorough that the output looks almost random, which
is why there is no point in trying to compress an encrypted message:
compression relies on patterns in the input data. So compression
should always be done before encryption.

It is for the substitution stage that we need a secret key. The
secret key is used to transform the input in such a manner that it is
very hard to guess either the key or the input from the output. And
all of this is done in a way that is perfectly reversible.

For decryption, we use the same secret key and undo the
transposition and the substitution, in reverse order, to obtain the
original input back.
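The substitution/transposition idea can be illustrated with a deliberately tiny toy in plain C, with no OpenSSL involved. It is not a real cipher and offers no security: the "substitution" here is just an XOR with key bytes, and the "transposition" is a fixed byte permutation (the table perm is an arbitrary choice made for this sketch). What it does show is that decryption applies the inverse steps, in reverse order, with the same key:

```c
#include <stddef.h>

#define TOY_BLOCK 8

/* Output byte i is taken from position perm[i]; must be a permutation of 0..7. */
static const size_t perm[TOY_BLOCK] = {3, 7, 0, 4, 1, 6, 2, 5};

/* Toy one-round "cipher": keyed substitution, then transposition. Insecure. */
void toy_encrypt(const unsigned char key[TOY_BLOCK],
                 const unsigned char in[TOY_BLOCK], unsigned char out[TOY_BLOCK])
{
    unsigned char sub[TOY_BLOCK];
    for (size_t i = 0; i < TOY_BLOCK; i++)
        sub[i] = (unsigned char)(in[i] ^ key[i]);   /* substitution step */
    for (size_t i = 0; i < TOY_BLOCK; i++)
        out[i] = sub[perm[i]];                       /* transposition step */
}

/* Inverse: undo the transposition first, then the substitution. */
void toy_decrypt(const unsigned char key[TOY_BLOCK],
                 const unsigned char in[TOY_BLOCK], unsigned char out[TOY_BLOCK])
{
    unsigned char sub[TOY_BLOCK];
    for (size_t i = 0; i < TOY_BLOCK; i++)
        sub[perm[i]] = in[i];                        /* put bytes back in place */
    for (size_t i = 0; i < TOY_BLOCK; i++)
        out[i] = (unsigned char)(sub[i] ^ key[i]);   /* XOR is its own inverse */
}
```

Real block ciphers repeat steps like these over many rounds, with nonlinear substitution tables and round keys derived from the secret key.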

Obviously this is very simplistic. What happens in reality is that
the secret key is used to generate a key schedule (a sequence of
round keys), which is applied to the input data over multiple rounds,
with a good amount of shuffling (transposition) thrown in at each
round. The input is split into fixed-size blocks and each block is
subjected to the above process. The blocks can be combined in
different ways, called modes of operation, to make sure that the
shuffling is really thorough. In the widely used mode called cipher
block chaining (CBC), each plaintext block is XORed with the previous
ciphertext block before it is encrypted, so each ciphertext block
depends on all of the blocks before it.

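By the way, what I described above is a symmetric encryption scheme,
in which the same key encrypts and decrypts; this is what most people
mean by encryption. And I spoke about block ciphers, which are the
most popular kind. Stream ciphers such as RC4 instead combine the
input with a key-derived stream, byte by byte.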

OpenSSL 0.9.8a supports the following symmetric encryption algorithms.

1) AES (Rijndael)
2) Blowfish
3) CAST
4) DES
5) IDEA
6) RC2
7) RC4
8) DES3

Now that we have covered the theory, it is tempting to see how it
works in practice. That is where an open source implementation helps:
we can go into as much depth as we need. For this article, however,
we will be satisfied with the code for basic encryption and
decryption.

This is the code for encryption:


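#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <openssl/evp.h>
#include <openssl/rand.h>

/* A fixed IV keeps the demo simple; real code should use a fresh random
 * IV for every file and store it with the ciphertext. Only the first
 * 16 bytes (one AES block) of this string are used. */
#define IV "0xdeadbeefdeadbeef"

int main(int argc, char **argv) {
	EVP_CIPHER_CTX *ctx;
	/* obuf has room for the extra block that padding may add */
	unsigned char key[32], iv[16], ibuf[1024], obuf[1024 + EVP_MAX_BLOCK_LENGTH];
	int rfd, wfd, keyfd, ilen, olen, tlen;

	if(argc < 3) {
		printf("Usage: %s infile outfile\n", argv[0]);
		exit(128);
	}

	memcpy(iv, IV, sizeof(iv));

	/* Let us derive a random 256 bit key from OpenSSL's PRNG */
	if(RAND_bytes(key, sizeof(key)) != 1) {
		printf("Couldn't generate a random key\n");
		return 1;
	}

	/* The key is secret, so only the owner may read the key file */
	if((keyfd = creat(".key", 0600)) == -1 ||
	   write(keyfd, key, sizeof(key)) != (ssize_t)sizeof(key)) {
		printf("Couldn't save the key to .key\n");
		return 1;
	}
	close(keyfd);

	/* Contexts are opaque since OpenSSL 1.1.0, so allocate one.
	 * Last parameter of EVP_CipherInit_ex: 1 for encrypt, 0 for decrypt */
	if((ctx = EVP_CIPHER_CTX_new()) == NULL ||
	   !EVP_CipherInit_ex(ctx, EVP_aes_256_cbc(), NULL, key, iv, 1)) {
		printf("Couldn't initialize cipher\n");
		return 1;
	}

	if((rfd = open(argv[1], O_RDONLY)) == -1) {
		printf("Couldn't open input file\n");
		exit(128);
	}
	if((wfd = creat(argv[2], 0644)) == -1) {
		printf("Couldn't open output file for writing\n");
		exit(128);
	}

	while((ilen = read(rfd, ibuf, sizeof(ibuf))) > 0) {
		if(!EVP_CipherUpdate(ctx, obuf, &olen, ibuf, ilen)) {
			printf("Encryption error\n");
			return 1;
		}
		write(wfd, obuf, olen);
	}
	if(ilen < 0) {
		printf("Error reading input file\n");
		return 1;
	}
	/* Everything so far has been written out, so the final
	 * (padded) block can reuse the start of obuf */
	if(!EVP_CipherFinal_ex(ctx, obuf, &tlen)) {
		printf("Trouble with padding the last block\n");
		return 1;
	}
	write(wfd, obuf, tlen);
	EVP_CIPHER_CTX_free(ctx);
	close(rfd);
	close(wfd);

	printf("AES 256 CBC encryption complete\n");
	printf("Secret key is saved to file .key\n");

	return 0;
}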

And here is the code for decryption:


#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <openssl/evp.h>

/* Must match the IV used for encryption; only the first 16 bytes are used */
#define IV "0xdeadbeefdeadbeef"

int main(int argc, char **argv) {
	EVP_CIPHER_CTX *ctx;
	/* obuf has room for one extra block, as EVP_CipherUpdate may need */
	unsigned char key[32], iv[16], ibuf[1024], obuf[1024 + EVP_MAX_BLOCK_LENGTH];
	int rfd, wfd, keyfd, ilen, olen, tlen;

	if(argc < 3) {
		printf("Usage: %s infile outfile\n", argv[0]);
		exit(128);
	}

	memcpy(iv, IV, sizeof(iv));

	/* Read back the 256 bit key saved by the encryption program */
	if((keyfd = open(".key", O_RDONLY)) == -1 ||
	   read(keyfd, key, sizeof(key)) != (ssize_t)sizeof(key)) {
		printf("Couldn't read the key from .key\n");
		return 1;
	}
	close(keyfd);

	/* last parameter 1 for encrypt, 0 for decrypt */
	if((ctx = EVP_CIPHER_CTX_new()) == NULL ||
	   !EVP_CipherInit_ex(ctx, EVP_aes_256_cbc(), NULL, key, iv, 0)) {
		printf("Couldn't initialize cipher\n");
		return 1;
	}

	if((rfd = open(argv[1], O_RDONLY)) == -1) {
		printf("Couldn't open input file\n");
		exit(128);
	}
	if((wfd = creat(argv[2], 0644)) == -1) {
		printf("Couldn't open output file for writing\n");
		exit(128);
	}

	while((ilen = read(rfd, ibuf, sizeof(ibuf))) > 0) {
		if(!EVP_CipherUpdate(ctx, obuf, &olen, ibuf, ilen)) {
			printf("Decryption error\n");
			return 1;
		}
		write(wfd, obuf, olen);
	}
	if(ilen < 0) {
		printf("Error reading input file\n");
		return 1;
	}
	/* Fails if the padding is bad, which usually means a wrong key */
	if(!EVP_CipherFinal_ex(ctx, obuf, &tlen)) {
		printf("Trouble with unpadding the last block\n");
		return 1;
	}
	write(wfd, obuf, tlen);
	EVP_CIPHER_CTX_free(ctx);
	close(rfd);
	close(wfd);

	printf("AES 256 CBC decryption complete\n");

	return 0;
}

As should be obvious by now, the above is only as strong as the
randomness in the key. It also helps if the input does not have
patterns, but that is not something we can change.

There is another encryption methodology, called public key encryption
or asymmetric cryptography, which is very different from symmetric
encryption. I am going to talk about its most popular scheme, the RSA
algorithm.

The goal of public key cryptography is to achieve secrecy without
first sharing a secret key, as symmetric cryptography requires. So
what happens here is slightly more difficult to follow. There are
three components. One is the modulus, whose length is what we mean
by 1024 bit RSA, i.e., the modulus is a 1024 bit integer. It is the
product of two large secret primes, and from those primes we derive
two exponents, which together with the modulus make up the public key
and the private key. These entities are related in such a way that
data encrypted with the private key can be decrypted only with the
public key, and vice versa. The modulus is a common entity required
for both encryption and decryption.

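The way encryption is done is also very curious: it is modular
exponentiation. The message, treated as a number m smaller than the
modulus n, is raised to the public exponent e, and the ciphertext is
the remainder modulo n, c = m^e mod n. Decryption computes c^d mod n
with the private exponent d. Needless to say, this is highly
computation intensive, since very large integers are involved, and
unwieldy for arbitrary data, since m must be smaller than n. So
typically only secret keys are encrypted using RSA, or message
digests are encrypted with RSA for signing.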
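With toy-sized numbers the whole scheme fits in a few lines of plain C. This sketch uses the well-known textbook values p = 61 and q = 53 (so n = 3233), public exponent e = 17 and private exponent d = 2753 (17 * 2753 = 1 mod 60 * 52), and performs modular exponentiation by repeated squaring. Numbers this small are of course trivially breakable, and real RSA also pads the message before exponentiating:

```c
#include <stdint.h>

/* base^exp mod m by square-and-multiply; products fit as long as m < 2^32. */
uint64_t modexp(uint64_t base, uint64_t exp, uint64_t m)
{
    uint64_t result = 1 % m;
    base %= m;
    while (exp > 0) {
        if (exp & 1)
            result = (result * base) % m;   /* multiply in this bit's power */
        base = (base * base) % m;           /* square for the next bit */
        exp >>= 1;
    }
    return result;
}

/* Textbook toy key: n = 61 * 53, e * d = 1 mod (60 * 52). */
enum { TOY_N = 3233, TOY_E = 17, TOY_D = 2753 };
```

modexp(65, TOY_E, TOY_N) gives the ciphertext 2790, and modexp(2790, TOY_D, TOY_N) gives 65 back. Applying d first and e second also recovers the message, which is the "vice versa" mentioned above and the basis of RSA signatures.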

The difficulty in cracking an RSA key lies in the difficulty of
factoring a large number into its two primes. It is very easy to
multiply two primes, but given only their product it is practically
infeasible to recover either of them.
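Trial division makes the asymmetry easy to see: multiplying 61 by 53 is a single operation, while the naive factoring method below may need on the order of sqrt(n) divisions. That is instant for the toy modulus 3233 but hopeless for a 1024 bit modulus; real attacks use far better algorithms, such as the number field sieve, which are still impractical for properly sized keys. A plain C sketch (smallest_factor is a hypothetical name):

```c
#include <stdint.h>

/* Return the smallest prime factor of n (n >= 2) by trial division. */
uint64_t smallest_factor(uint64_t n)
{
    if (n % 2 == 0)
        return 2;
    for (uint64_t f = 3; f <= n / f; f += 2)   /* f <= sqrt(n), overflow-free */
        if (n % f == 0)
            return f;
    return n;   /* no factor up to sqrt(n): n itself is prime */
}
```

For n = 3233 this finds 53 after about two dozen divisions; the cofactor 3233 / 53 = 61 then follows immediately.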

So RSA is not something you would want to use for encrypting your
file on disk!

Bear in mind that in OpenSSL, public keys are uniquely related to
private keys in such a way that you can derive the public key from
the private key at any time: an OpenSSL RSA private key also stores
the modulus and the public exponent.

Next, we come to the topic of message digests. Also known as
cryptographic hashes or one way functions, this class of functions
actually has nothing to do with secrecy, since digests do not by
themselves make your communications secret. When used in conjunction
with encryption, however, they make an invaluable contribution.

The goal of message digests is to fingerprint the input data such
that even a very minor change (introducing a space character, or
flipping a single bit of the input) produces an output that is
completely different. Another very important property is that no
matter what the size of the input, whether it is one byte or a
gigabyte, the output is always of a constant size. Since arbitrarily
long inputs are squeezed into a few bytes, message digests are
irreversible: it is computationally infeasible to reconstruct the
input from the output. In that way they are quite different from the
two encryption techniques we discussed above.

There are different methodologies to achieve this. A cyclic
redundancy check (CRC), used in communications, is the remainder of
dividing the input, viewed as a polynomial, by a fixed generator
polynomial, which lets us detect transmission errors. This is a very
primitive form of message digest. Message digest algorithms like MD5
and SHA-1 use much more sophisticated constructions.
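For comparison, here is the CRC-32 used by Ethernet and zip, in its simplest bitwise form and in plain C with no OpenSSL involved (crc32_ieee is a hypothetical name). It catches accidental corruption well, but it has no key and is easy to manipulate deliberately, which is why it is not a cryptographic digest:

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (IEEE 802.3), reflected polynomial 0xEDB88320. */
uint32_t crc32_ieee(const unsigned char *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;                  /* standard initial value */
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int k = 0; k < 8; k++)              /* one polynomial step per bit */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;                    /* standard final XOR */
}
```

The conventional check value, the CRC-32 of the ASCII string "123456789", is 0xCBF43926.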

Remember that the goal here is not secrecy but detecting any
malicious or accidental change in the input. In real life, digests
are used for integrity checks and signature verification.

Here is the code for SHA-1 digests; MD5 works the same way. MD5
produces a 128 bit output, whereas SHA-1 produces a 160 bit output:


#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <openssl/evp.h>

int main(int argc, char **argv) {
	EVP_MD_CTX *md;
	unsigned char md_value[EVP_MAX_MD_SIZE];
	unsigned int md_len, i;
	int fd, n;
	unsigned char buf[1024];

	if(argc < 2) {
		printf("Please give a filename to compute the SHA-1 digest on\n");
		return 1;
	}

	if((fd = open(argv[1], O_RDONLY)) == -1) {
		printf("Couldn't open input file, try again\n");
		return 1;
	}

	/* Digest contexts are opaque since OpenSSL 1.1.0, so allocate one */
	md = EVP_MD_CTX_new();
	if(md == NULL || !EVP_DigestInit_ex(md, EVP_sha1(), NULL)) {
		printf("Couldn't initialize the digest\n");
		return 1;
	}

	/* Feed the file through the digest one chunk at a time */
	while((n = read(fd, buf, sizeof(buf))) > 0)
		EVP_DigestUpdate(md, buf, n);
	close(fd);
	if(n < 0 || EVP_DigestFinal_ex(md, md_value, &md_len) != 1) {
		printf("Digest computation problem\n");
		return 1;
	}
	EVP_MD_CTX_free(md);

	printf("Digest is: ");
	for(i = 0; i < md_len; i++) printf("%02x", md_value[i]);
	printf("\n");

	return 0;
}

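If you replace EVP_sha1() with EVP_md5() you get an MD5 digest, and
EVP_sha256() gives you SHA-256. For new work SHA-256 is the better
choice, since practical collisions have been demonstrated for both
MD5 and SHA-1.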

Now that we have seen the basic security building blocks, let us
move on to more advanced real-life applications of security. Please
be warned that security is more about understanding the threat model,
getting peer review, following established wisdom and so on. As I
mentioned above, this article is not about security advice. I am just
giving you certain theoretical fundamentals and example code that you
can use to understand the concepts and hack using the OpenSSL toolkit.

Maybe this is the right time to take a look at the source level
organization of OpenSSL. Only three directories are of immediate
interest to us: "apps", "crypto" and "ssl". The ssl directory
contains the SSL/TLS protocol code, which we are not dealing with
here.

The apps directory is a very good place to start, as it contains
plenty of working example code to get you hacking on OpenSSL. The
openssl command line utility is built from the programs in this
directory, which hook into the library.

You may want to try the openssl command line and get familiar with
the tools before hacking on them. Man pages are available for all of
the commands supported by the openssl tool. 'man enc' describes the
interface to symmetric ciphers, 'man pkcs7' describes PKCS7, and so
on. In case you are not able to see any of the man pages, try
"man -M /usr/local/ssl/man enc" and so on.

The man pages are filled with examples of how to use the command
line tools. If you want to find all of OpenSSL's documentation, here
is one simple trick you can try. Go to the "doc" subdirectory below
openssl-source-0.9.8a. Again there are subdirectories called "apps",
"crypto" and "ssl". Go below them and you will find man pages in pod
format (Perl's Plain Old Documentation format). There are around 280
man pages under these directories. To read them, either type
"man <filename without .pod>" or type "perldoc <filename>". This is
by far the most comprehensive documentation available on the OpenSSL
library.

The crypto directory is what will give you the real depth and breadth
of OpenSSL's true capabilities. Take your time and explore the code out
there to your heart's content with any source navigation tools you prefer.

Building OpenSSL is quite easy. Just go to the source directory, say
openssl-source-0.9.8a/, and type "./config shared" followed by
"gmake" and then, as root, "gmake install". OK, now back to some
advanced real-life applications of cryptography.

First, let us take a look at SMIME, or secure e-mail.

What is SMIME? SMIME stands for Secure/Multipurpose Internet Mail
Extensions.

SMIME is described in RFC 3851. SMIME uses the PKCS7 cryptographic
message syntax to represent mail messages, with attachments, as
signed, encrypted or compressed data. SMIME can be used as a generic
secure message transfer mechanism, not necessarily for e-mail. In
fact, secure e-mail needs much more support infrastructure than SMIME
alone. Thus, if you want to send and receive secure mail, you should
consider using GnuPG or a similar tool.

SMIME serves as a good vehicle for sending and receiving PKCS7
wrapped messages. An SMIME signature adds source authentication,
non-repudiation and, of course, message integrity. Signatures in
SMIME can be detached for large messages. In fact, if you are using
OpenSSL's SMIME support for large messages, the signatures had better
be detached, or else it will fail.

Whenever we talk of non-repudiation, the presence of an RSA private
key and the corresponding certificate is usually assumed.

Signatures are the primary means of ensuring non-repudiation. Digital
signatures are easy to understand, since we already know public key
cryptography. An RSA signature is essentially the message digest of
the input message encrypted with the signer's private key. Since the
private key is known only to the signer, and the public key is
available to anyone who needs it, everybody can verify the signature
by decrypting the message digest with the public key and comparing
it with a message digest freshly computed from the input message.


This assures us that the message originated with the holder of the
private key and nobody else.

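HMACs (Hash-based Message Authentication Codes) serve a similar
purpose, with one important difference: the same secret key is shared
by sender and receiver, so an HMAC proves that the message came from
someone holding the key and was not modified, but it cannot provide
non-repudiation. We saw above how message digests are computed. HMAC
mixes a secret key into the digest computation, which turns an
unkeyed digest, one that anyone can recompute, into a keyed one. It
does not simply append the key to the message; RFC 2104 defines it as
two nested hashes, H((K XOR opad) || H((K XOR ipad) || message)). Now
when somebody modifies the data, it is infeasible to generate a valid
HMAC for it without the secret key.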


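#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <openssl/evp.h>
#include <openssl/hmac.h>

/* Demo key only: a real key must be random and kept secret */
#define KEY "deadbeefdeadbeef"

int main(int argc, char **argv) {
	HMAC_CTX *hmac;
	unsigned char buf[1024], hmac_value[EVP_MAX_MD_SIZE];
	unsigned int len, i;
	int fd, n;

	if(argc < 2) {
		printf("Please give a filename to compute the HMAC on\n");
		return 1;
	}

	if((fd = open(argv[1], O_RDONLY)) == -1) {
		printf("Couldn't open input file, try again\n");
		return 1;
	}

	/* HMAC_CTX_new() needs OpenSSL 1.1.0 or later; it is deprecated
	 * in 3.0 in favour of EVP_MAC but still available */
	hmac = HMAC_CTX_new();
	if(hmac == NULL || !HMAC_Init_ex(hmac, KEY, strlen(KEY), EVP_sha1(), NULL)) {
		printf("Couldn't initialize HMAC\n");
		return 1;
	}

	while((n = read(fd, buf, sizeof(buf))) > 0)
		HMAC_Update(hmac, buf, n);
	close(fd);
	if(n < 0 || !HMAC_Final(hmac, hmac_value, &len)) {
		printf("HMAC computation problem\n");
		return 1;
	}
	HMAC_CTX_free(hmac);

	printf("HMAC is: ");
	for(i = 0; i < len; i++) printf("%02x", hmac_value[i]);
	printf("\n");

	return 0;
}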

Please note that there is a mistake in the HMAC man page for OpenSSL
0.9.8a. In that release HMAC_Init_ex() returns nothing, so its
prototype should be this instead (from OpenSSL 1.0.0 on it returns
an int, 1 on success):

void HMAC_Init_ex(HMAC_CTX *ctx, const void *key, int len,
                       const EVP_MD *md, ENGINE *impl)

HMAC is used both in SSL/TLS and in IPsec packet authentication.

Encrypted email is something that has not quite taken off, in spite
of cryptography being widely used for so many years. OpenSSL doesn't
explicitly support any encrypted email facilities, though one can
always build one around the support it provides. The main stumbling
blocks to implementing an encrypted email solution are user
friendliness and the support infrastructure it requires. For example,
for me to send a plain email to you, all I need is your email
address. Unfortunately, that is not the case with sending you a
secret mail message. For that I need to establish a secret with you
first. If I know you, I can call you up, tell you the key and then
send you the mail. Simple. But when there is no such option, I at
least need your public key, one which is *really* your public key and
nobody else's. How do I know that? These are some of the issues. Many
of them have been solved, and PGP is a good solution that works
within a small group of trusted acquaintances. But a scalable secure
email solution that is also user friendly is hard to achieve.

Another very interesting field in security, called steganography, is
worth investigating. It is a form of security in which onlookers have
no idea that any confidential communication is going on. Common
examples are embedding text in the least significant bits of an
image, and composing sensible sentences whose real message must be
read in a different way. We find examples of this in the Sherlock
Holmes stories.

OpenSSL, however, does not have support for steganography. Maybe you
can add it!
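As a starting point, here is the least-significant-bit idea in plain C, with no OpenSSL involved (lsb_embed and lsb_extract are hypothetical names). It treats the cover as raw bytes, for example uncompressed pixel data, and spends eight cover bytes per hidden byte. Each cover byte changes by at most one, which is why the change is invisible in an image. A real tool would encrypt the message first and would have to deal with actual image file formats:

```c
#include <stddef.h>

/* Hide msglen bytes of msg in the least significant bits of cover,
 * most significant bit first, 8 cover bytes per message byte.
 * Returns 0 if the cover is too small, 1 on success. */
int lsb_embed(unsigned char *cover, size_t coverlen,
              const unsigned char *msg, size_t msglen)
{
    if (msglen > coverlen / 8)
        return 0;
    for (size_t i = 0; i < msglen * 8; i++) {
        unsigned bit = (msg[i / 8] >> (7 - i % 8)) & 1u;
        cover[i] = (unsigned char)((cover[i] & 0xFEu) | bit);  /* replace LSB only */
    }
    return 1;
}

/* Recover msglen hidden bytes from the low bits of cover. */
void lsb_extract(const unsigned char *cover, unsigned char *msg, size_t msglen)
{
    for (size_t i = 0; i < msglen; i++) {
        unsigned char b = 0;
        for (size_t k = 0; k < 8; k++)
            b = (unsigned char)((b << 1) | (cover[i * 8 + k] & 1u));
        msg[i] = b;
    }
}
```

Combining this with the AES code earlier in the article, so that only ciphertext is hidden, would be one natural first step toward the OpenSSL addition suggested above.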