Security programming with OpenSSL

OpenSSL is the ubiquitous toolkit for writing security applications. It was originally written as an SSL library with a specific focus on performance on x86 platforms, but today it has become the de facto standard for implementing user space security applications. What makes OpenSSL so versatile? A look at its feature set probably answers that. Among many other things, OpenSSL:

1) Supports nearly every symmetric encryption algorithm and message digest algorithm under the sun
2) Supports the major public key technologies, including elliptic curve cryptography
3) Has comprehensive ASN.1, Base64 and DER libraries built in
4) Can generate pseudorandom bits, given an appropriate random seed
5) Supports SSL v2, SSL v3 and TLS v1
6) Implements the X.509 certificate standard with v3 extensions
7) Supports the SMIME, PKCS7, PKCS8 and PKCS12 standards, as well as OCSP
8) Supports hardware acceleration
9) Has a low-level math library for prime number generation, modular arithmetic, arbitrary-precision multiplication, exponentiation and so on
10) Has a BIO (Basic I/O) abstraction with which you can "chain" filter BIOs onto a source/sink BIO

I stopped at 10 since the list would get too long if I went on. OpenSSL also has a command line tool that provides an interface to almost everything mentioned above, and man pages that describe each command in detail. The library is free for commercial use, and there is no obligation to release the source code of your changes. Not surprisingly, OpenSSL is popular in the commercial world as well. What is still missing, however, after several years in which the library has steadily gained features, enhancements, bug fixes and support for different hardware chips, is proper documentation.
Sure, you can walk to the nearby bookstore and pick up the O'Reilly book "Network Security with OpenSSL". You can also search on Google and come up with a few results. But none of them gives you a true picture of how to use the full length and breadth of OpenSSL's features.

This article does not aim to give you advice on security and security practices. Nor will we cover topics treated elsewhere, such as SSL programming with OpenSSL. Instead, we will focus on giving you enough of a grip on the OpenSSL source code that you can use OpenSSL profitably. You may have an academic interest, or wish to learn the practical aspects of cryptography. Or you may want to hack just for the fun of it.

The first thing most people think of when they hear security or cryptography is encryption. So why not start with that? Very simply, encryption is a process in which the input is subjected to "substitution" and "transposition", otherwise known as confusion and diffusion, to produce an output that bears no recognizable relationship to the input. In fact, the mixing is so thorough that the output looks almost random. That is why there is no point in trying to compress an encrypted message: compression relies on patterns in the input data. So compression should always be done before encryption.

It is for the substitution stage that we need a secret key. The key is used to transform the input in such a way that it is very hard to guess the key or the input from the output. And all of this is perfectly reversible: for decryption, the same secret key is used to undo the transposition and run the reverse substitution, which yields the original input. Obviously this is very simplistic. In reality, the secret key is used to generate a key sequence (the round keys), which is applied to the input data over multiple rounds, with a good amount of shuffling (transposition) thrown in at each round.
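To make the idea of round keys and repeated shuffling concrete, here is a toy sketch in C. It is a Feistel network, the round structure used by DES (AES uses a different, substitution-permutation design), built on a made-up round function and key schedule. The names toy_round, toy_round_key, toy_encrypt, toy_decrypt and TOY_ROUNDS are hypothetical, and the cipher has no real cryptographic strength. It does show the two properties described above: every round mixes a key-derived value into the data and swaps the two halves, and running the rounds backwards with the same key recovers the input exactly.

```c
#include <stdint.h>

/* Toy Feistel cipher for illustration only: NOT secure, NOT AES or DES. */

#define TOY_ROUNDS 16

/* Hypothetical round function: mixes one 32-bit half with a round key.
   In a Feistel network it does not need to be invertible. */
static uint32_t toy_round(uint32_t half, uint32_t round_key)
{
    uint32_t x = half ^ round_key;   /* key-dependent substitution */
    x = (x << 7) | (x >> 25);        /* rotate to spread bits around */
    x *= 0x9E3779B1u;                /* multiply to mix bits further */
    return x ^ (x >> 16);
}

/* Hypothetical key schedule: derives a different 32-bit key for each round. */
static uint32_t toy_round_key(uint64_t key, int round)
{
    uint64_t k = key ^ (0x9E3779B97F4A7C15ull * (uint64_t)(round + 1));
    return (uint32_t)(k ^ (k >> 32));
}

/* Each round: (L, R) -> (R, L ^ F(R, k_r)), i.e. substitute, then swap halves. */
uint64_t toy_encrypt(uint64_t block, uint64_t key)
{
    uint32_t left = (uint32_t)(block >> 32), right = (uint32_t)block;
    for (int r = 0; r < TOY_ROUNDS; r++) {
        uint32_t old_right = right;
        right = left ^ toy_round(right, toy_round_key(key, r));
        left = old_right;
    }
    return ((uint64_t)left << 32) | right;
}

/* Same round keys in reverse order: (L', R') -> (R' ^ F(L', k_r), L'). */
uint64_t toy_decrypt(uint64_t block, uint64_t key)
{
    uint32_t left = (uint32_t)(block >> 32), right = (uint32_t)block;
    for (int r = TOY_ROUNDS - 1; r >= 0; r--) {
        uint32_t old_left = left;
        left = right ^ toy_round(left, toy_round_key(key, r));
        right = old_left;
    }
    return ((uint64_t)left << 32) | right;
}
```

For any 64-bit block x and key k, toy_decrypt(toy_encrypt(x, k), k) returns x. Note that decryption never has to invert toy_round itself: the Feistel structure makes each round reversible whatever the round function is.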
The input is split into multiple blocks, and each block is subjected to the above process. The blocks are combined in different ways to make sure the shuffling is really thorough. In one of the most widely used ways of combining blocks, cipher block chaining (CBC), each ciphertext block depends on all of the blocks before it. By the way, what I described above is a symmetric encryption scheme, which is what most people mean by encryption, and more specifically a block cipher, the most popular kind.

OpenSSL 0.9.8a supports the following symmetric encryption algorithms (all block ciphers except RC4, which is a stream cipher):

1) AES (Rijndael)
2) Blowfish
3) CAST
4) DES
5) IDEA
6) RC2
7) RC4
8) DES3 (Triple DES)

Now that we have covered the theory, it must be really tempting to see how it works in practice. That is where having an open source implementation helps: we can go down to as much depth as we need. For this article, however, we will be satisfied with code for basic encryption and decryption using AES with a 256-bit key in CBC mode through OpenSSL's EVP interface. The listings allocate the cipher context with EVP_CIPHER_CTX_new(), as current OpenSSL releases require; 0.9.8a also allowed a stack-allocated EVP_CIPHER_CTX.

This is the code for encryption:

```c
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <openssl/evp.h>
#include <openssl/rand.h>

#define BUFSIZE 1024

int main(int argc, char **argv)
{
    EVP_CIPHER_CTX *ctx;
    unsigned char key[32], iv[16];   /* AES-256: 32-byte key, 16-byte IV */
    /* EVP_CipherUpdate() may output up to one block more than it was given */
    unsigned char ibuf[BUFSIZE], obuf[BUFSIZE + EVP_MAX_BLOCK_LENGTH];
    int rfd, wfd, keyfd, ilen, olen, tlen;

    if (argc < 3) {
        printf("Usage: %s infile outfile\n", argv[0]);
        exit(128);
    }
    /* Derive a random 256 bit key and a random IV from OpenSSL's PRNG */
    if (!RAND_bytes(key, sizeof(key)) || !RAND_bytes(iv, sizeof(iv))) {
        printf("Couldn't generate a random key\n");
        return 1;
    }
    /* Save key and IV for the decryptor; mode 0600 keeps the file private */
    if ((keyfd = creat(".key", 0600)) == -1) {
        printf("Couldn't save the key\n");
        return 1;
    }
    write(keyfd, key, sizeof(key));
    write(keyfd, iv, sizeof(iv));
    close(keyfd);

    ctx = EVP_CIPHER_CTX_new();
    /* last parameter: 1 for encrypt, 0 for decrypt */
    if (ctx == NULL || !EVP_CipherInit_ex(ctx, EVP_aes_256_cbc(), NULL, key, iv, 1)) {
        printf("Couldn't initialize cipher\n");
        return 1;
    }
    if ((rfd = open(argv[1], O_RDONLY)) == -1) {
        printf("Couldn't open input file\n");
        exit(128);
    }
    if ((wfd = creat(argv[2], 0644)) == -1) {
        printf("Couldn't open output file for writing\n");
        exit(128);
    }
    while ((ilen = read(rfd, ibuf, BUFSIZE)) > 0) {
        if (!EVP_CipherUpdate(ctx, obuf, &olen, ibuf, ilen)) {
            printf("Encryption error\n");
            return 1;
        }
        write(wfd, obuf, olen);
    }
    /* Encrypt any buffered bytes plus PKCS padding as the final block */
    if (!EVP_CipherFinal_ex(ctx, obuf, &tlen)) {
        printf("Trouble with padding the last block\n");
        return 1;
    }
    write(wfd, obuf, tlen);
    EVP_CIPHER_CTX_free(ctx);
    close(rfd);
    close(wfd);
    printf("AES 256 CBC encryption complete\n");
    printf("Secret key and IV are saved to file .key\n");
    return 0;
}
```

And here is the code for decryption:

```c
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <openssl/evp.h>

#define BUFSIZE 1024

int main(int argc, char **argv)
{
    EVP_CIPHER_CTX *ctx;
    unsigned char key[32], iv[16];
    unsigned char ibuf[BUFSIZE], obuf[BUFSIZE + EVP_MAX_BLOCK_LENGTH];
    int rfd, wfd, keyfd, ilen, olen, tlen;

    if (argc < 3) {
        printf("Usage: %s infile outfile\n", argv[0]);
        exit(128);
    }
    /* Read back the key and IV written by the encryption program */
    if ((keyfd = open(".key", O_RDONLY)) == -1
        || read(keyfd, key, sizeof(key)) != (ssize_t)sizeof(key)
        || read(keyfd, iv, sizeof(iv)) != (ssize_t)sizeof(iv)) {
        printf("Couldn't read the key file .key\n");
        return 1;
    }
    close(keyfd);

    ctx = EVP_CIPHER_CTX_new();
    /* last parameter: 1 for encrypt, 0 for decrypt */
    if (ctx == NULL || !EVP_CipherInit_ex(ctx, EVP_aes_256_cbc(), NULL, key, iv, 0)) {
        printf("Couldn't initialize cipher\n");
        return 1;
    }
    if ((rfd = open(argv[1], O_RDONLY)) == -1) {
        printf("Couldn't open input file\n");
        exit(128);
    }
    if ((wfd = creat(argv[2], 0644)) == -1) {
        printf("Couldn't open output file for writing\n");
        exit(128);
    }
    while ((ilen = read(rfd, ibuf, BUFSIZE)) > 0) {
        if (!EVP_CipherUpdate(ctx, obuf, &olen, ibuf, ilen)) {
            printf("Decryption error\n");
            return 1;
        }
        write(wfd, obuf, olen);
    }
    /* Decrypt the last block and strip its padding; a wrong key usually fails here */
    if (!EVP_CipherFinal_ex(ctx, obuf, &tlen)) {
        printf("Trouble with unpadding the last block\n");
        return 1;
    }
    write(wfd, obuf, tlen);
    EVP_CIPHER_CTX_free(ctx);
    close(rfd);
    close(wfd);
    printf("AES 256 CBC decryption complete\n");
    return 0;
}
```

As should be obvious by now, all of this is only as strong as the randomness of the key, which is why the key comes from RAND_bytes() rather than from anything guessable. It also helps if the input does not have patterns, but that is not something we can change.

There is another encryption methodology called public key encryption, or asymmetric cryptography, which is very different from symmetric encryption.
I am going to talk about the most popular scheme, the RSA algorithm. The goal of public key cryptography is to achieve secrecy without first sharing a secret key, as symmetric cryptography requires. What happens here is slightly more difficult to follow. RSA involves three numbers. The first is the modulus, whose length is what we mean by "1024 bit RSA": the modulus is an integer 1024 bits long, the product of two large primes. The other two are exponents generated along with the modulus, one forming the public key and the other the private key. The three are related in such a way that data encrypted with the private key can be decrypted only with the public key, and vice versa; the modulus is needed for both encryption and decryption.

The way encryption is done is also very curious: it is modular exponentiation. Since very large integers are involved, it is highly computation intensive, as well as unwieldy for arbitrary data. So typically RSA is used only to encrypt secret keys, or to encrypt message digests for signing. RSA is not something you would want to use for encrypting your files on disk! The difficulty of cracking an RSA key lies in the difficulty of factoring a large number into its two prime factors: it is very easy to multiply two primes, but given only their product, it is practically impossible to recover either of them. Bear in mind that in OpenSSL, a private key is stored together with its public components, so you can derive the public key from the private key at any time.

Next, we come to the topic of message digests. Also known as cryptographic hashes or one-way functions, this class of functions actually has nothing to do with secrecy, since digests do not by themselves make your communications secret. Used in conjunction with encryption, however, they make an invaluable contribution.
The goal of a message digest is to fingerprint the input data so that even a very minor change, such as an inserted space character or a single flipped bit, produces a completely different output. Another very important property is that no matter what the size of the input, whether one byte or a gigabyte, the output is always of a constant size. Since a fixed-size output has to stand for inputs of any size, message digests are irreversible: it is infeasible to reconstruct the input from the output. That makes them quite different from the two encryption techniques we discussed above.

There are different ways to achieve this. A cyclic redundancy check (CRC), used in communications to detect transmission errors, is essentially the remainder of dividing the input, treated as a polynomial, by a fixed generator polynomial, so that every bit of the input affects the result. This is a very primitive form of message digest. Message digest algorithms like MD5 and SHA-1 use much more sophisticated algorithms. Remember that the goal here is not secrecy but detecting any malicious or accidental change in the input. In real life, digests are used for integrity checks and signature verification. Here is the code for MD5 and SHA-1 digests.
MD5 digests produce a 128 bit output, whereas SHA-1 produces a 160 bit output:

```c
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <openssl/evp.h>

int main(int argc, char **argv)
{
    EVP_MD_CTX *md;
    unsigned char md_value[EVP_MAX_MD_SIZE];
    unsigned char buf[1024];
    unsigned int md_len, i;
    int fd, n;

    if (argc < 2) {
        printf("Please give a filename to compute the SHA-1 digest on\n");
        return 1;
    }
    if ((fd = open(argv[1], O_RDONLY)) == -1) {
        printf("Couldn't open input file, try again\n");
        return 1;
    }
    /* EVP_MD_CTX_new() needs OpenSSL 1.1.0 or later;
       0.9.8a used a stack EVP_MD_CTX with EVP_MD_CTX_init() */
    md = EVP_MD_CTX_new();
    if (md == NULL || !EVP_DigestInit_ex(md, EVP_sha1(), NULL)) {
        printf("Couldn't initialize digest\n");
        return 1;
    }
    /* Feed the file through the digest 1 KB at a time */
    while ((n = read(fd, buf, sizeof(buf))) > 0)
        EVP_DigestUpdate(md, buf, n);
    close(fd);
    if (EVP_DigestFinal_ex(md, md_value, &md_len) != 1) {
        printf("Digest computation problem\n");
        return 1;
    }
    EVP_MD_CTX_free(md);
    printf("Digest is: ");
    for (i = 0; i < md_len; i++)
        printf("%02x", md_value[i]);
    printf("\n");
    return 0;
}
```

If you replace EVP_sha1() with EVP_md5(), you get an MD5 digest.

Now that we have seen the basic security building blocks, let us move on to more advanced, real-life applications of security. Please be warned that security is more about understanding the threat model, getting peer review, following established wisdom and so on. As I mentioned above, this article is not about security advice. I am just giving you some theoretical fundamentals and example code that you can use to understand the concepts and hack on the OpenSSL toolkit.

Maybe this is the right time to take a look at the source-level organization of OpenSSL. Only three directories are of immediate interest to us: "apps", "crypto" and "ssl". The ssl directory contains the SSL code, which we are not dealing with here. The apps directory is a very good place to start, as it is full of example code for hacking OpenSSL: the openssl command line utility is built from the programs in this directory, which hook into the library. You might try the openssl command line tool and get familiar with it before hacking on it.
Man pages are available for all of the commands supported by the openssl tool. 'man enc' describes the interface to symmetric ciphers, 'man pkcs7' describes pkcs7, and so on. If you cannot see any of the man pages, try "man -M /usr/local/ssl/man enc" and so on. The man pages are filled with examples of how to use the command line tools.

If you want to find all of OpenSSL's documentation, here is one simple trick. Go to the "doc" subdirectory below openssl-source-0.9.8a. Again there are subdirectories called "apps", "crypto" and "ssl". Below them you will find man pages in pod format (Perl's Plain Old Documentation format); there are around 280 of them in total. You can read them with either man or perldoc. This is by far the most comprehensive documentation available for the OpenSSL library.

The crypto directory is what will show you the real depth and breadth of OpenSSL's capabilities. Take your time and explore the code there to your heart's content with whatever source navigation tools you prefer.

Building OpenSSL is quite easy. Just go to the source directory, say openssl-source-0.9.8a/, and type "./config shared", followed by "gmake" and then "gmake install" as root.

OK, now back to some advanced real-life applications of cryptography. First let us take a look at SMIME, or secure e-mail. What is SMIME? SMIME (properly written S/MIME) stands for Secure/Multipurpose Internet Mail Extensions and is described in RFC 3851. SMIME uses the PKCS7 cryptographic message syntax to represent mail messages with attachments as signed, encrypted or compressed data. SMIME can be used for any generic secure message transfer mechanism, not necessarily email. In fact, secure e-mail needs much more supporting infrastructure than SMIME alone. Thus, if you want to send and receive secure mail, you should consider using GnuPG or something similar.
SMIME serves as a good vehicle for sending and receiving PKCS7-wrapped messages. An SMIME signature adds source authentication, non-repudiation and, of course, message integrity. Signatures in SMIME can be detached for large messages. In fact, if you are using OpenSSL's SMIME support for large messages, they had better be detached, or else it will fail.

Whenever we talk of non-repudiation, the presence of a private key (usually RSA) and the corresponding certificate is normally assumed. Signatures are a primary means of ensuring non-repudiation, and they are easy to understand now that we know public key cryptography. In the RSA case, a signature is essentially the message digest of the input message encrypted with the signer's private key. Since the private key is known only to its owner and the public key is available to anyone who needs it, everybody can verify the signature by decrypting the digest with the public key and comparing it with a digest freshly computed from the received message. A successful check shows that the message was signed by the holder of that private key and has not been altered since.

HMACs (Hash-based Message Authentication Codes) serve a related purpose: they provide integrity and authentication, but not non-repudiation, because the sender and the receiver share the same secret key. The way they work is quite simple. We saw above how to compute a message digest. HMAC mixes a secret key into the digest computation, so that the result depends on both the message and the key, which turns an otherwise unprotected message digest into an authentication code. Anyone who modifies the data cannot compute a matching HMAC without the secret key.
Here is a program that computes an HMAC-SHA1 over a file using OpenSSL's HMAC interface:

```c
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <openssl/evp.h>
#include <openssl/hmac.h>

/* Demo key only; a real key should be random and kept secret */
#define KEY "deadbeefdeadbeef"

int main(int argc, char **argv)
{
    HMAC_CTX *hmac;
    unsigned char buf[1024], hmac_value[EVP_MAX_MD_SIZE];
    unsigned int len, i;
    int fd, n;

    if (argc < 2) {
        printf("Please give a filename to compute the HMAC on\n");
        return 1;
    }
    if ((fd = open(argv[1], O_RDONLY)) == -1) {
        printf("Couldn't open input file, try again\n");
        return 1;
    }
    /* HMAC_CTX_new() needs OpenSSL 1.1.0 or later (deprecated but available in 3.0);
       0.9.8a used a stack HMAC_CTX with HMAC_CTX_init() */
    hmac = HMAC_CTX_new();
    if (hmac == NULL || !HMAC_Init_ex(hmac, KEY, strlen(KEY), EVP_sha1(), NULL)) {
        printf("Couldn't initialize HMAC\n");
        return 1;
    }
    while ((n = read(fd, buf, sizeof(buf))) > 0)
        HMAC_Update(hmac, buf, n);
    close(fd);
    HMAC_Final(hmac, hmac_value, &len);
    HMAC_CTX_free(hmac);
    printf("HMAC is: ");
    for (i = 0; i < len; i++)
        printf("%02x", hmac_value[i]);
    printf("\n");
    return 0;
}
```

Please note that there is a mistake in the HMAC man page for OpenSSL 0.9.8a. In that release the HMAC_Init_ex prototype is actually: void HMAC_Init_ex(HMAC_CTX *ctx, const void *key, int len, const EVP_MD *md, ENGINE *impl). Since OpenSSL 1.0.0 it returns int, as checked in the listing above.

This technique is used both in SSL/TLS and in IPsec packet authentication.

Encrypted email is something that has not quite taken off, in spite of cryptography being widely used for so many years. OpenSSL provides no complete encrypted email facility, though one can always be built around the support it provides. The main stumbling blocks to an encrypted email solution are user friendliness and the supporting infrastructure it requires. For example, for me to send you a plain email, all I need is your email address. Unfortunately, that is not the case when sending you a secret message. For that I need to establish a secret with you first. If I know you, I can call you up, tell you the key and send you the mail. Simple. But that option is not always available. At the very least I need your public key, one that is *really* yours and nobody else's. How do I know that? These are some of the issues. Many of them have been solved, and PGP is a good solution that works within a small group of trusted acquaintances.
But a scalable secure email solution that is also user friendly is hard to achieve.

Another very interesting field of security, steganography, is worth investigating. It is a form of security in which onlookers have no idea that confidential communication is taking place at all. Common examples are embedding text in the least significant bits of an image's pixels, and writing sensible-looking sentences whose real message is meant to be read some other way. We find examples of this in the Sherlock Holmes stories. OpenSSL, however, does not support steganography. Maybe you can add it!
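The least-significant-bit technique mentioned above is simple enough to sketch in C. The helpers below (lsb_embed and lsb_extract, both hypothetical names) assume the cover is a plain buffer of 8-bit samples, such as the raw pixel bytes of an uncompressed image, and overwrite only the lowest bit of each byte with one bit of the message, so no sample changes by more than one. Decoding an actual image format and choosing which pixels to use are left out.

```c
#include <stddef.h>

/* Hide msg_len bytes in the least significant bits of a cover buffer.
 * Each message bit (most significant first) replaces the LSB of one cover
 * byte, so the cover must hold at least 8 * msg_len bytes.
 * Returns 0 on success, -1 if the cover is too small. */
int lsb_embed(unsigned char *cover, size_t cover_len,
              const unsigned char *msg, size_t msg_len)
{
    if (msg_len > cover_len / 8)
        return -1;
    for (size_t i = 0; i < msg_len; i++) {
        for (int bit = 0; bit < 8; bit++) {
            unsigned char b = (msg[i] >> (7 - bit)) & 1;
            unsigned char *px = &cover[i * 8 + bit];
            *px = (unsigned char)((*px & 0xFE) | b);   /* touch only the LSB */
        }
    }
    return 0;
}

/* Recover msg_len bytes previously hidden by lsb_embed.
 * Returns 0 on success, -1 if the cover is too small. */
int lsb_extract(const unsigned char *cover, size_t cover_len,
                unsigned char *msg, size_t msg_len)
{
    if (msg_len > cover_len / 8)
        return -1;
    for (size_t i = 0; i < msg_len; i++) {
        unsigned char c = 0;
        for (int bit = 0; bit < 8; bit++)
            c = (unsigned char)((c << 1) | (cover[i * 8 + bit] & 1));
        msg[i] = c;
    }
    return 0;
}
```

A natural way to combine this with OpenSSL would be to encrypt the message first, for instance with the EVP code shown earlier, so that the hidden bits look random rather than like text.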