diff --git a/old/botan/doc/api.tex b/old/botan/doc/api.tex
new file mode 100644
index 0000000..556e76a
--- /dev/null
+++ b/old/botan/doc/api.tex
@@ -0,0 +1,3103 @@
+\documentclass{article}
+
+\setlength{\textwidth}{6.5in}
+\setlength{\textheight}{9in}
+
+\setlength{\headheight}{0in}
+\setlength{\topmargin}{0in}
+\setlength{\headsep}{0in}
+
+\setlength{\oddsidemargin}{0in}
+\setlength{\evensidemargin}{0in}
+
+\title{\textbf{Botan API Reference}}
+\author{}
+\date{2009/2/19}
+
+\newcommand{\filename}[1]{\texttt{#1}}
+\newcommand{\manpage}[2]{\texttt{#1}(#2)}
+
+\newcommand{\macro}[1]{\texttt{#1}}
+
+\newcommand{\function}[1]{\textbf{#1}}
+\newcommand{\keyword}[1]{\texttt{#1}}
+\newcommand{\type}[1]{\texttt{#1}}
+\renewcommand{\arg}[1]{\textsl{#1}}
+\newcommand{\namespace}[1]{\texttt{#1}}
+
+\newcommand{\url}[1]{\texttt{#1}}
+
+\newcommand{\ie}[0]{\emph{i.e.}}
+\newcommand{\eg}[0]{\emph{e.g.}}
+
+\begin{document}
+
+\maketitle
+
+\tableofcontents
+
+\parskip=5pt
+
+\pagebreak
+\section{Introduction}
+
+Botan is a C++ library that attempts to provide the most common
+cryptographic algorithms and operations in an easy-to-use, efficient,
+and portable way. It runs on a wide variety of systems, and can be
+used with a number of different compilers.
+
+The base library is written in ISO C++, so it can be ported with
+minimal fuss, but Botan also supports a module system. This system
+exposes system-dependent code to the library through portable
+interfaces, extending the set of services available to users.
+
+\subsection{Targets}
+
+Botan's primary targets (system-wise) are 32- and 64-bit CPUs with a
+flat memory address space of at least 32 bits. Generally, given the
+choice between optimizing for 32-bit systems and 64-bit systems, Botan
+is written to prefer 64-bit, on the theory that where
+performance is a real concern, modern 64-bit processors are the
+obvious choice. However, in most cases this is not an issue, as many
+algorithms are specified in terms of 32-bit operations precisely to
+target commodity processors.
+
+Smaller handhelds, set-top boxes, and the larger smart phones and smart
+cards are also capable of using Botan. However, Botan uses a fairly
+large amount of code space (up to several megabytes, depending upon
+the compiler and options used), which could be prohibitive on some
+systems. RAM usage is fairly modest, usually under 64K.
+
+Botan's design makes it quite easy to remove unused algorithms in such
+a way that applications do not need to be recompiled to work, even
+applications that use the algorithms in question. They can simply ask
+Botan if the algorithm exists, and if Botan says yes, ask the library
+to give them such an object for that algorithm.
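+
+The ``ask first, then create'' pattern described above can be modelled in
+standard C++ as a registry of factories keyed by algorithm name. This is a
+minimal sketch, not Botan's implementation: \type{AlgorithmRegistry},
+\function{have\_algorithm}, \function{create}, and \type{ToyHash} are all
+hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>

// Hypothetical base interface for a named algorithm object.
struct Algorithm {
    virtual ~Algorithm() {}
    virtual std::string name() const = 0;
};

// Hypothetical registry: each algorithm compiled into the library registers
// a factory under its name; applications look algorithms up at run time.
class AlgorithmRegistry {
public:
    typedef std::function<std::unique_ptr<Algorithm>()> Factory;

    void add(const std::string& name, Factory f) { factories_[name] = f; }

    // "Does this build contain the algorithm?"
    bool have_algorithm(const std::string& name) const {
        return factories_.count(name) != 0;
    }

    // "Give me an object for it"; returns nullptr if it was removed.
    std::unique_ptr<Algorithm> create(const std::string& name) const {
        std::map<std::string, Factory>::const_iterator it = factories_.find(name);
        if (it == factories_.end())
            return nullptr;
        return it->second();
    }

private:
    std::map<std::string, Factory> factories_;
};

// Toy algorithm used only to exercise the registry.
struct ToyHash : Algorithm {
    std::string name() const override { return "ToyHash"; }
};
```

+An algorithm removed from the build never registers a factory, so
+\function{have\_algorithm} returns false and the caller can fall back
+without being recompiled.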
+
+\subsection{Why Botan?}
+
+Botan may be the perfect choice for your application. Or it might be a
+terribly bad idea. This section will make clear what Botan is
+and is not.
+
+First, let's cover the major strengths:
+
+\begin{list}{$\cdot$}{}
+ \item Support is (usually) quickly available on the project mailing lists.
+ Commercial support licenses are available for those that desire them.
+
+ \item Is written in a (fairly) clean object-oriented style, and the usual
+ API works in terms of reasonably high-level abstractions.
+
+ \item Supports a huge variety of algorithms, including most of the major
+ public key algorithms and standards (such as IEEE 1363, PKCS, and
+ X.509v3).
+
+ \item Supports a name-based lookup scheme, so you can get hold of any
+ algorithm on the fly.
+
+ \item You can easily extend much of the system at application compile time or
+ at run time.
+
+ \item Works well with a wide variety of compilers, operating systems, and
+ CPUs, and more all the time.
+
+ \item Is the only open source crypto library (that I know of) that has
+ support for memory allocation techniques that prevent an attacker from
+ reading swap in an attempt to gain access to keys or other secrets. In
+ fact several different such methods are supported, depending on the
+ system (two methods for Unix, another for Windows).
+
+ \item Has (optional) support for Zlib and Bzip2 compression/decompression
+ integrated completely into the system -- it only takes a line or two of
+ code to add compression to your application.
+\end{list}
+
+\noindent
+And the major downsides and deficiencies are:
+
+\begin{list}{$\cdot$}{}
+ \item It's written in C++. If your application isn't, Botan is probably
+ going to be more pain than it's worth.
+
+ \item Botan doesn't directly support higher-level protocols and
+ formats like SSL or OpenPGP. SSH support is available from a
+ third party, and there is an alpha-level SSL/TLS library
+ currently available.
+
+ \item Doesn't currently support any very high-level, ``envelope''-style
+ processing -- support for this will probably be added once support for
+ CMS is available, so code using the high-level interface will produce
+ data readable by many other libraries.
+\end{list}
+
+\pagebreak
+\section{Getting Started}
+
+\subsection{Basic Conventions}
+
+With a very small number of exceptions, declarations in the library
+are contained within the namespace \namespace{Botan}. Botan declares
+several typedef'ed types to help buffer it against changes in machine
+architecture. These types are used extensively in the interface,
+so it is often convenient to use them without the
+\namespace{Botan} prefix. You can do so by \keyword{using} the
+namespace \namespace{Botan::types} (this way you can use the type
+names without the namespace prefix, but the remainder of the library
+stays out of the global namespace). The included types are \type{byte}
+and \type{u32bit}, which are unsigned integer types.
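+
+The effect of a using-directive on a nested namespace of typedefs can be
+sketched in standard C++ as follows. \namespace{Demo} is a hypothetical
+stand-in for \namespace{Botan} (this is not Botan's header), and
+\function{checksum} is an invented function used only to show that the
+rest of the namespace still needs its prefix.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical library namespace: the fixed-width typedefs live in a
// nested "types" namespace, everything else lives directly in Demo.
namespace Demo {
    namespace types {
        typedef std::uint8_t byte;     // unsigned 8-bit
        typedef std::uint32_t u32bit;  // unsigned 32-bit
    }
    using namespace types;  // the short names are visible inside Demo too

    // Some other library symbol that should not leak into global scope.
    inline u32bit checksum(const byte* data, u32bit len) {
        u32bit sum = 0;
        for (u32bit i = 0; i != len; ++i)
            sum += data[i];
        return sum;
    }
}

// Pull in only the type names, not the rest of Demo.
using namespace Demo::types;

// byte and u32bit now work unqualified; checksum still needs Demo::.
inline u32bit sum_three() {
    const byte data[3] = { 1, 2, 3 };
    return Demo::checksum(data, 3);
}
```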
+
+The headers for Botan are usually available in the form
+\filename{botan/headername.h}. For brevity in this documentation,
+headers are always just called \filename{headername.h}, but they
+should be used with the \filename{botan/} prefix in your actual code.
+
+\subsection{Initializing the Library}
+
+There is a set of core services that the library needs access to
+while it is performing requests. To ensure these are set up, you must
+create a \type{LibraryInitializer} object (usually called `init' in
+Botan example code; `botan\_library' or `botan\_init' may make more
+sense in real applications) prior to making any calls to Botan. This
+object's lifetime must exceed that of all other Botan objects your
+application creates; for this reason the best place to create the
+\type{LibraryInitializer} is at the start of your \function{main}
+function, since this guarantees that it will be created first and
+destroyed last (via standard C++ RAII rules). The initializer does
+things like setting up the memory allocation system and algorithm
+lookup tables, finding out if there is a high resolution timer
+available to use, and similar such matters. With no arguments, the
+library is initialized with various default settings. So most of the
+time (unless you are writing threaded code; see below), all you need
+is:
+
+\texttt{Botan::LibraryInitializer init;}
+
+at the start of your \texttt{main}.
+
+The constructor takes an optional string that specifies arguments.
+Currently the only possible argument is ``thread\_safe'', which must
+have a Boolean value (for instance ``thread\_safe=false'' or
+``thread\_safe=true''). If ``thread\_safe'' is specified as true, the
+library will attempt to register a mutex type to properly guard access
+to shared resources. However, these locks do not protect individual
+Botan objects: if several threads share one object, you must lock it
+explicitly.
+
+If you do not create a \type{LibraryInitializer} object, pretty much
+any Botan operation will fail, because it will be unable to do basic
+things like allocate memory or get random bits. Note, too, that you
+should be careful to create only one such object.
+
+It is not strictly necessary to create a \type{LibraryInitializer};
+the code that actually performs initialization and shutdown lives in
+static member functions of \type{LibraryInitializer}, called
+\function{initialize} and \function{deinitialize}. A
+\type{LibraryInitializer} merely provides a convenient RAII wrapper
+for these operations (and thus for the internal library state).
+
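+The split between static \function{initialize}/\function{deinitialize}
+functions and an RAII wrapper, together with a checked global state that
+throws when used before initialization, can be sketched as below. All names
+are hypothetical; this models the idea and is not Botan's code.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for a library's global state.
class GlobalState {
public:
    static bool initialized() { return flag(); }
    static void set(bool v) { flag() = v; }
    // Library entry points call this first, so use before initialization
    // throws instead of touching uninitialized state.
    static void check() {
        if (!flag())
            throw std::logic_error("library not initialized");
    }
private:
    static bool& flag() { static bool f = false; return f; }
};

// The static functions do the work; the object is only an RAII wrapper.
class LibraryInitializerModel {
public:
    static void initialize() { GlobalState::set(true); }
    static void deinitialize() { GlobalState::set(false); }

    LibraryInitializerModel() { initialize(); }
    ~LibraryInitializerModel() { deinitialize(); }

    // Only one initializer should exist, so copying is disabled.
    LibraryInitializerModel(const LibraryInitializerModel&) = delete;
    LibraryInitializerModel& operator=(const LibraryInitializerModel&) = delete;
};

// A library operation that depends on the global state.
inline int library_operation() {
    GlobalState::check();
    return 42;
}
```

+Because the wrapper is declared at the top of \function{main}, it is
+destroyed only after every object declared later in the same scope.
+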
+\subsection{Pitfalls}
+
+There are a few things to watch out for to prevent problems when using Botan.
+
+Never allocate any kind of Botan object globally. The problem with
+doing this is that the constructor for such an object will be called
+before the library is initialized. Many Botan objects will, in their
+constructor, make one or more calls into the library global state
+object. Access to this object is checked, so an exception should be
+thrown (rather than a memory access violation or undetected
+uninitialized object access). A rough equivalent that will work is to
+keep a global pointer to the object, initializing it after creating
+your \type{LibraryInitializer}. Merely making the
+\type{LibraryInitializer} also global will probably not help, because
+C++ does not make very strong guarantees about the order that such
+objects will be created.
+
+The same rule applies for making sure the destructors of all your
+Botan objects are called before the \type{LibraryInitializer} is
+destroyed. This implies you can't have static variables that are Botan
+objects inside functions or classes (since in most C++ runtimes, these
+objects will be destroyed after main has returned). This is inelegant,
+but does not seem to cause many problems in practice.
+
+Botan's memory object classes (\type{MemoryVector},
+\type{SecureVector}, \type{SecureBuffer}) are extremely primitive, and
+do not (currently) meet the requirements for an STL container
+object. After Botan starts adopting C++0x features, they will be
+replaced by typedefs of \type{std::vector} with a custom allocator.
+
+Use a \function{try}/\function{catch} block inside your
+\function{main} function, and catch any \type{std::exception} throws
+(remember to catch by reference, as \type{std::exception}'s
+\function{what} method is polymorphic). This is not strictly required,
+but if you don't, and Botan throws an exception, the runtime will call
+\function{std::terminate}, which usually calls \function{abort} or
+something like it, leaving you (or worse, a user of your application)
+wondering what went wrong.
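+
+A minimal sketch of such a guard, written as a reusable helper so that it
+can be tested. \function{run\_guarded} is a hypothetical name, and the
+callable passed to it stands in for your application's code.

```cpp
#include <cassert>
#include <exception>
#include <iostream>
#include <stdexcept>

// Runs the application body; any std::exception becomes a diagnostic and
// a nonzero exit status instead of a call to std::terminate(). Catching by
// reference preserves the dynamic type, so what() is the derived message.
template <typename Body>
int run_guarded(Body body, std::ostream& err) {
    try {
        body();
        return 0;
    } catch (const std::exception& e) {
        err << "Exception caught: " << e.what() << '\n';
        return 1;
    }
}

// Typical use (application body is hypothetical):
//   int main() { return run_guarded([] { /* use the library */ }, std::cerr); }
```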
+
+\subsection{Information Flow: Pipes and Filters}
+
+Many common uses of cryptography involve processing one or more
+streams of data (be it from sockets, files, or a hardware device).
+Botan provides services that make it easy to set up data flows through
+various operations, such as compression, encryption, and base64
+encoding. Each of these operations is implemented in what are called
+\emph{filters} in Botan. A set of filters is created and placed into
+a \emph{pipe}, and information ``flows'' through the pipe until it
+reaches the end, where the output is collected for retrieval. If
+you're familiar with the Unix shell environment, this design will
+sound quite familiar.
+
+Here is an example that uses a pipe to base64 encode some strings:
+
+\begin{verbatim}
+ Pipe pipe(new Base64_Encoder); // pipe owns the pointer
+ pipe.start_msg();
+ pipe.write("message1");
+ pipe.end_msg(); // flushes buffers, increments message number
+
+ // process_msg(x) is equivalent to start_msg(); write(x); end_msg();
+ pipe.process_msg("message2");
+
+ std::string m1 = pipe.read_all_as_string(0); // base64 of "message1"
+ std::string m2 = pipe.read_all_as_string(1); // base64 of "message2"
+\end{verbatim}
+
+Bytestreams in the pipe are grouped into messages: blocks of data that
+are processed in an identical fashion (\ie, with the same sequence of
+\type{Filter}s). Messages are delimited by calls to
+\function{start\_msg} and \function{end\_msg}. Each message in a pipe
+has its own identifier, which currently is an integer that increments
+up from zero.
+
+As you can see, the \type{Base64\_Encoder} was allocated using
+\keyword{new}; but where was it deallocated? When a filter object is
+passed to a \type{Pipe}, the pipe takes ownership of the object, and
+will deallocate it when it is no longer needed.
+
+There are two different ways to make use of messages. One is to send
+several messages through a \type{Pipe} without changing the
+\type{Pipe}'s configuration, so you end up with a sequence of
+messages; one use of this would be to send a sequence of identically
+encrypted UDP packets, for example (note that the \emph{data} need not
+be identical; it is just that each is encrypted, encoded, signed, etc.
+in an identical fashion). Another is to change the filters that are
+used in the \type{Pipe} between each message, by adding or removing
+\type{Filter}s; functions that let you do this are documented in the
+Pipe API section.
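+
+To make the message model concrete, here is a toy pipe in standard C++. It
+is a simplified sketch under several assumptions, not Botan's \type{Pipe}:
+the hypothetical \type{ToyPipe} buffers a whole message and runs a single
+filter at \function{end\_msg} (real filters may emit output incrementally),
+and its reads do not consume data. It does show the points made above:
+writes are valid only between \function{start\_msg} and \function{end\_msg},
+each message receives the next number counting from zero, and the pipe owns
+the filter pointer it is given.

```cpp
#include <cassert>
#include <cctype>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// A toy filter transforms the bytes of one message.
struct ToyFilter {
    virtual ~ToyFilter() {}
    virtual std::string process(const std::string& in) = 0;
};

// Example filter: upper-cases ASCII letters.
struct UpperCaser : ToyFilter {
    std::string process(const std::string& in) override {
        std::string out = in;
        for (char& c : out)
            c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
        return out;
    }
};

class ToyPipe {
public:
    // Takes ownership of the raw pointer, as described for Pipe.
    explicit ToyPipe(ToyFilter* f) : filter_(f) {}

    void start_msg() {
        if (in_msg_) throw std::logic_error("message already started");
        in_msg_ = true;
        pending_.clear();
    }
    void write(const std::string& data) {
        if (!in_msg_) throw std::logic_error("write outside a message");
        pending_ += data;
    }
    // Runs the filter and stores the result under the next message number.
    void end_msg() {
        if (!in_msg_) throw std::logic_error("no message started");
        outputs_.push_back(filter_->process(pending_));
        in_msg_ = false;
    }
    void process_msg(const std::string& data) {
        start_msg();
        write(data);
        end_msg();
    }
    std::string read_all_as_string(std::size_t msg) const { return outputs_.at(msg); }
    std::size_t message_count() const { return outputs_.size(); }

private:
    std::unique_ptr<ToyFilter> filter_;
    std::string pending_;
    bool in_msg_ = false;
    std::vector<std::string> outputs_;
};
```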
+
+Most operations in Botan have a corresponding filter for use in Pipe.
+Here's code that encrypts a string with AES-128 in CBC mode:
+
+\begin{verbatim}
+ AutoSeeded_RNG rng;
+ SymmetricKey key(rng, 16); // a random 128-bit key
+ InitializationVector iv(rng, 16); // a random 128-bit IV
+
+ // Notice the algorithm we want is specified by a string
+ Pipe pipe(get_cipher("AES-128/CBC", key, iv, ENCRYPTION));
+
+ pipe.process_msg("secrets");
+ pipe.process_msg("more secrets");
+
+ MemoryVector<byte> c1 = pipe.read_all(0);
+
+ byte c2[4096] = { 0 };
+ u32bit got_out = pipe.read(c2, sizeof(c2), 1);
+ // use c2[0 .. got_out-1]
+\end{verbatim}
+
+Note the use of \type{AutoSeeded\_RNG}, which is a random number
+generator. You can explicitly set up the random number generators and
+entropy sources yourself, but in 99\% of cases
+\type{AutoSeeded\_RNG} is preferable.
+
+\type{Pipe} also has convenience methods for dealing with
+\type{std::iostream}s. Here is an example of those, using the
+\type{Bzip\_Compression} filter (included as a module; if you have
+bzlib available, check \filename{building.pdf} for how to enable it)
+to compress a file:
+
+\begin{verbatim}
+ std::ifstream in("data.bin", std::ios::binary);
+ std::ofstream out("data.bin.bz2", std::ios::binary);
+
+ Pipe pipe(new Bzip_Compression);
+
+ pipe.start_msg();
+ in >> pipe;
+ pipe.end_msg();
+ out << pipe;
+\end{verbatim}
+
+However, there is a catch in the code above: the complete contents of
+the compressed data will be held in memory until the entire message
+has been compressed, at which time the statement \verb|out << pipe| is
+executed, and the data is freed as it is read from the pipe and
+written to the file. But if the file is very large, we might not have
+enough physical memory (or even enough virtual memory!) for that to be
+practical. So instead of storing the compressed data in the pipe for
+reading it out later, we divert it directly to the file:
+
+\begin{verbatim}
+ std::ifstream in("data.bin", std::ios::binary);
+ std::ofstream out("data.bin.bz2", std::ios::binary);
+
+ Pipe pipe(new Bzip_Compression, new DataSink_Stream(out));
+
+ pipe.start_msg();
+ in >> pipe;
+ pipe.end_msg();
+\end{verbatim}
+
+This is the first code we've seen so far that uses more than one
+filter in a pipe. The output of the compressor is sent to the
+\type{DataSink\_Stream}. Anything written to a \type{DataSink\_Stream}
+is written to a file; the filter produces no output. As soon as the
+compression algorithm finishes up a block of data, it will send it along,
+at which point it will immediately be written to disk; if you were to
+call \verb|pipe.read_all()| after \verb|pipe.end_msg()|, you'd get an
+empty vector out.
+
+Here's an example using two computational filters:
+
+\begin{verbatim}
+ AutoSeeded_RNG rng;
+ SymmetricKey key(rng, 32);
+ InitializationVector iv(rng, 16);
+ std::ifstream file("data.bin", std::ios::binary);
+
+ Pipe encryptor(get_cipher("AES/CBC/PKCS7", key, iv, ENCRYPTION),
+ new Base64_Encoder);
+
+ encryptor.start_msg();
+ file >> encryptor;
+ encryptor.end_msg(); // flush buffers, complete computations
+ std::cout << encryptor;
+\end{verbatim}
+
+\subsection{Fork}
+
+It is fairly common that you might receive some data and want to
+perform more than one operation on it (\ie, encrypt it with Serpent
+and calculate the SHA-256 hash of the plaintext at the same
+time). That's where \type{Fork} comes in. \type{Fork} is a filter that
+takes input and passes it on to \emph{one or more} \type{Filter}s
+that are attached to it. \type{Fork} changes the nature of the pipe
+system completely. Instead of being a linked list, it becomes a tree.
+
+Each \type{Filter} in the fork is given its own output buffer, and
+thus its own message. For example, if you had previously written two
+messages into a \type{Pipe}, then you start a new one with a
+\type{Fork} that has three paths of \type{Filter}s inside it, you
+add three new messages to the \type{Pipe}. The data you put into the
+\type{Pipe} is duplicated and sent into each set of \type{Filter}s,
+and the eventual output is placed into a dedicated message slot in the
+\type{Pipe}.
+
+Messages in the \type{Pipe} are allocated in a depth-first manner. This is only
+interesting if you are using more than one \type{Fork} in a single \type{Pipe}.
+As an example, consider the following:
+
+\begin{verbatim}
+ Pipe pipe(new Fork(
+ new Fork(
+ new Base64_Encoder,
+ new Fork(
+ NULL,
+ new Base64_Encoder
+ )
+ ),
+ new Hex_Encoder
+ )
+ );
+\end{verbatim}
+
+In this case, message 0 will be the output of the first \type{Base64\_Encoder},
+message 1 will be a copy of the input (see below for how \type{Fork} interprets
+NULL pointers), message 2 will be the output of the second
+\type{Base64\_Encoder}, and message 3 will be the output of the
+\type{Hex\_Encoder}. As you can see, this results in message numbers being
+allocated in a top-to-bottom fashion, when looked at on the screen. However,
+this can lead to bugs if it is not anticipated. For
+example, if your code is passed a \type{Filter}, and you assume it is a
+``normal'' one that only uses one message, your message offsets would be
+wrong, leading to some confusion during output.
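+
+The depth-first numbering rule can be checked with a small model of a
+filter tree. In this hypothetical sketch (not Botan code), a fork node
+holds its branches, a leaf stands for a filter that produces one message,
+and a null branch is a leaf producing a ``copy of input''. Message numbers
+come from a left-to-right, depth-first walk over the leaves.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Node;
typedef std::shared_ptr<Node> NodePtr;

struct Node {
    std::string label;              // what a leaf outputs
    std::vector<NodePtr> branches;  // non-empty only for a fork
};

// A leaf standing for a filter that produces exactly one message.
inline NodePtr make_filter(const std::string& label) {
    NodePtr n = std::make_shared<Node>();
    n->label = label;
    return n;
}

// A null branch in a Fork passes a copy of its input straight through.
inline NodePtr make_passthrough() { return make_filter("copy of input"); }

inline NodePtr make_fork(std::vector<NodePtr> branches) {
    NodePtr n = std::make_shared<Node>();
    n->branches = std::move(branches);
    return n;
}

// Each leaf receives the next message number, i.e. its index in `out`.
inline void number_messages(const NodePtr& n, std::vector<std::string>& out) {
    if (n->branches.empty()) {
        out.push_back(n->label);
        return;
    }
    for (const NodePtr& b : n->branches)
        number_messages(b, out);
}
```

+Applied to the four-leaf tree above, the walk reproduces the message order
+given in the text.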
+
+If Fork's first argument is a null pointer, but a later argument is
+not, then Fork will feed a copy of its input directly through. Here's
+a case where that is useful:
+
+\begin{verbatim}
+ // assume: std::string ciphertext (base64 encoded);
+ // SecureVector<byte> auth_code; SymmetricKey key, mac_key;
+ // InitializationVector iv
+
+ Pipe pipe(new Base64_Decoder,
+ get_cipher("AES-128/CBC", key, iv, DECRYPTION),
+ new Fork(
+ 0,
+ new MAC_Filter("HMAC(SHA-1)", mac_key)
+ )
+ );
+
+ pipe.process_msg(ciphertext);
+ std::string plaintext = pipe.read_all_as_string(0);
+ SecureVector<byte> mac = pipe.read_all(1);
+
+ if(mac != auth_code)
+ error(); // placeholder: handle the authentication failure
+\end{verbatim}
+
+Here we wanted to not only decrypt the message, but send the decrypted
+text through an additional computation, in order to compute the
+authentication code.
+
+Any \type{Filter}s that are attached to the \type{Pipe} after the
+\type{Fork} are implicitly attached onto the first branch created by
+the fork. For example, let's say you created this \type{Pipe}:
+
+\begin{verbatim}
+Pipe pipe(new Fork(new Hash_Filter("MD5"), new Hash_Filter("SHA-1")),
+ new Hex_Encoder);
+\end{verbatim}
+
+And then called \function{start\_msg}, inserted some data, then
+\function{end\_msg}. Then \arg{pipe} would contain two messages. The
+first one (message number 0) would contain the MD5 sum of the input in
+hex encoded form, and the other would contain the SHA-1 sum of the
+input in raw binary. However, it's much better to use a \type{Chain}
+instead.
+
+\subsubsection{Chain}
+
+A \type{Chain} filter creates a chain of \type{Filter}s and
+encapsulates them inside a single filter (itself). This allows a
+sequence of filters to become a single filter, to be passed into or
+out of a function, or to a \type{Fork} constructor.
+
+You can call \type{Chain}'s constructor with up to 4 \type{Filter*}s
+(they will be added in order), or with an array of \type{Filter*}s and
+a \type{u32bit} that tells \type{Chain} how many \type{Filter*}s are
+in the array (again, they will be attached in order). Here's the
+example from the last section, using \type{Chain} instead of relying on
+that obscure rule.
+
+\begin{verbatim}
+ Pipe pipe(new Fork(
+ new Chain(new Hash_Filter("MD5"), new Hex_Encoder),
+ new Hash_Filter("SHA-1")
+ )
+ );
+\end{verbatim}
+
+\subsection{The Pipe API}
+
+\subsubsection{Initializing Pipe}
+
+By default, \type{Pipe} will do nothing at all; any input placed into
+the \type{Pipe} will be read back unchanged. Obviously, this has
+limited utility, and presumably you want to use one or more
+\type{Filter}s to somehow process the data. First, you can choose a
+set of \type{Filter}s to initialize the \type{Pipe} via the
+constructor. You can pass it either a set of up to 4 \type{Filter*}s,
+or a pre-defined array and a length:
+
+\begin{verbatim}
+ Pipe pipe1(new Filter1(/*args*/), new Filter2(/*args*/),
+ new Filter3(/*args*/), new Filter4(/*args*/));
+ Pipe pipe2(new Filter1(/*args*/), new Filter2(/*args*/));
+
+ Filter* filters[5] = {
+ new Filter1(/*args*/), new Filter2(/*args*/), new Filter3(/*args*/),
+ new Filter4(/*args*/), new Filter5(/*args*/) /* more if desired... */
+ };
+ Pipe pipe3(filters, 5);
+\end{verbatim}
+
+This is by far the most common way to initialize a \type{Pipe}. However,
+occasionally a more flexible initialization strategy is necessary; this is
+supported by 4 member functions: \function{prepend}(\type{Filter*}),
+\function{append}(\type{Filter*}), \function{pop}(), and \function{reset}().
+These functions may only be used while the \type{Pipe} in question is not in
+use; that is, either before calling \function{start\_msg}, or after
+\function{end\_msg} has been called (and no new calls to \function{start\_msg}
+have been made yet).
+
+The function \function{reset}() simply removes all the \type{Filter}s
+that the \type{Pipe} is currently using~--~it is reset to an
+initial, ``empty'' state. Any data that is being retained by the
+\type{Pipe} is retained after a \function{reset}(), and
+\function{reset}() does not affect the message numbers (discussed
+later).
+
+Calling \function{prepend} and \function{append} will either prepend
+or append the passed \type{Filter} object to the list of
+transformations. For example, if you \function{prepend} a
+\type{Filter} implementing encryption, and the \type{Pipe} already had
+a \type{Filter} that hex encoded the input, then the next set of
+input would be first encrypted, then hex encoded. Alternately, if you
+called \function{append}, then the input would first be hex
+encoded, and then encrypted (which is not terribly useful in this
+particular example).
+
+Finally, calling \function{pop}() will remove the first transformation
+of the \type{Pipe}. Say we had called \function{prepend} to put an
+encryption \type{Filter} into a \type{Pipe}; calling \function{pop}()
+would remove this \type{Filter} and return the \type{Pipe} to its
+state before we called \function{prepend}.
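+
+The ordering rules for \function{prepend}, \function{append},
+\function{pop}, and \function{reset} can be modelled as a list of stages
+applied front to back. In this hypothetical sketch, \function{tag\_e} and
+\function{tag\_h} only wrap their input in a label. They stand in for
+encryption and hex encoding, so that the order of application is visible.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical model of a pipe's transformation list; not Botan code.
class StageList {
public:
    typedef std::function<std::string(const std::string&)> Stage;

    void prepend(Stage s) { stages_.push_front(s); }  // will run first
    void append(Stage s) { stages_.push_back(s); }    // will run last
    void pop() {                                      // drops the first stage
        if (stages_.empty()) throw std::logic_error("no stage to pop");
        stages_.pop_front();
    }
    void reset() { stages_.clear(); }                 // back to pass-through

    // Feeds data through every stage, front to back.
    std::string run(std::string data) const {
        for (const Stage& s : stages_)
            data = s(data);
        return data;
    }

private:
    std::deque<Stage> stages_;
};

// Labelled stand-ins for "encrypt" and "hex encode".
inline std::string tag_e(const std::string& s) { return "E(" + s + ")"; }
inline std::string tag_h(const std::string& s) { return "H(" + s + ")"; }
```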
+
+\subsubsection{Giving Data to a Pipe}
+
+Input to a \type{Pipe} is delimited into messages, which can be read from
+independently (\ie, you can read 5 bytes from one message, and then all of
+another message, without either read affecting any other messages). The
+messages are delimited by calls to \function{start\_msg} and
+\function{end\_msg}. In between these two calls, you can write data into a
+\type{Pipe}, and it will be processed by the \type{Filter}(s) that it
+contains. Writes at any other time are invalid, and will result in an
+exception.
+
+To write data, call one of the \function{write}() overloads, which
+accept a \type{byte[]}/\type{u32bit} pair, a
+\type{SecureVector<byte>}, a \type{std::string}, a \type{DataSource\&}, or a
+single \type{byte}.
+
+Sometimes, you may want to do only a single write per message. In this case,
+you can use the \function{process\_msg} series of functions, which start a
+message, write their argument into the \type{Pipe}, and then end the
+message. In this case you would not make any explicit calls to
+\function{start\_msg}/\function{end\_msg}. The version of \function{write}
+that takes a single \type{byte} is not supported by \function{process\_msg},
+but all the other variants are.
+
+\type{Pipe} can also be used with the \verb|>>| operator, and will accept a
+\type{std::istream} or (on Unix systems with the \verb|fd_unix| module) a
+Unix file descriptor. In either case, the entire contents of the file will be
+read into the \type{Pipe}.
+
+\subsubsection{Getting Output from a Pipe}
+
+Retrieving the processed data from a \type{Pipe} is a bit more complicated, for
+various reasons. In particular, because \type{Pipe} will separate each message
+into a separate buffer, you have to be able to retrieve data from each message
+independently. Each of \type{Pipe}'s read functions has a final parameter that
+specifies what message to read from (as a 32-bit integer). If this parameter is
+set to \type{Pipe::DEFAULT\_MESSAGE}, it will read the current default message
+(\type{DEFAULT\_MESSAGE} is also the default value of this parameter). The
+parameter will not be mentioned in further discussion of the reading API, but
+it is always there (unless otherwise noted).
+
+Reading is done with a variety of functions. The most basic are \type{u32bit}
+\function{read}(\type{byte} \arg{out}[], \type{u32bit} \arg{len}) and
+\type{u32bit} \function{read}(\type{byte\&} \arg{out}). Each reads into
+\arg{out} (either up to \arg{len} bytes, or a single byte for the one taking a
+\type{byte\&}), and returns the total number of bytes read. There is a variant
+of these functions, all named \function{peek}, which performs the same
+operations, but does not remove the bytes from the message (reading is a
+destructive operation with a \type{Pipe}).
+
+There are also the functions \type{SecureVector<byte>} \function{read\_all}(),
+and \type{std::string} \function{read\_all\_as\_string}(), which return the
+entire contents of the message, either as a memory buffer, or a
+\type{std::string} (which is generally only useful if the \type{Pipe} has
+encoded the message into a text string, such as when a \type{Base64\_Encoder}
+is used).
+
+To determine how many bytes are left in a message, call \type{u32bit}
+\function{remaining}() (which can also take an optional message
+number). Finally, there are some functions for managing the default message
+number: \type{u32bit} \function{default\_msg}() will return the current default
+message, \type{u32bit} \function{message\_count}() will return the total number
+of messages (0...\function{message\_count}()-1), and
+\function{set\_default\_msg}(\type{u32bit} \arg{msgno}) will set a new default
+message number (which must be a valid message number for that \type{Pipe}). The
+ability to set the default message number is particularly important in the case
+of using the file output operations (\verb|<<| with a \type{std::ostream} or
+Unix file descriptor), because there is no way to specify it explicitly when
+using the output operator.
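+
+A toy buffer class can illustrate the difference between destructive
+\function{read} and non-destructive \function{peek}, and the role of the
+default message. \type{ToyOutput} is hypothetical. It uses
+\type{std::size\_t} where Botan uses \type{u32bit}, and it models only the
+reading side of a \type{Pipe}.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical model of a Pipe's per-message output buffers; not Botan code.
class ToyOutput {
public:
    // Sentinel meaning "use the current default message".
    static constexpr std::size_t DEFAULT_MESSAGE = static_cast<std::size_t>(-1);

    void add_message(const std::string& data) {
        msgs_.push_back(std::deque<char>(data.begin(), data.end()));
    }
    std::size_t message_count() const { return msgs_.size(); }

    void set_default_msg(std::size_t msgno) {
        if (msgno >= msgs_.size())
            throw std::out_of_range("invalid message number");
        default_ = msgno;
    }
    std::size_t default_msg() const { return default_; }

    std::size_t remaining(std::size_t msg = DEFAULT_MESSAGE) const {
        return buffer(msg).size();
    }

    // Copies up to len bytes but leaves them in the message.
    std::size_t peek(char out[], std::size_t len,
                     std::size_t msg = DEFAULT_MESSAGE) const {
        const std::deque<char>& b = buffer(msg);
        std::size_t n = std::min(len, b.size());
        std::copy(b.begin(), b.begin() + static_cast<std::ptrdiff_t>(n), out);
        return n;
    }

    // Same as peek, then removes the bytes: reading is destructive.
    std::size_t read(char out[], std::size_t len,
                     std::size_t msg = DEFAULT_MESSAGE) {
        std::size_t n = peek(out, len, msg);
        std::deque<char>& b = buffer(msg);
        b.erase(b.begin(), b.begin() + static_cast<std::ptrdiff_t>(n));
        return n;
    }

private:
    std::size_t resolve(std::size_t msg) const {
        return msg == DEFAULT_MESSAGE ? default_ : msg;
    }
    const std::deque<char>& buffer(std::size_t msg) const { return msgs_.at(resolve(msg)); }
    std::deque<char>& buffer(std::size_t msg) { return msgs_.at(resolve(msg)); }

    std::vector<std::deque<char>> msgs_;
    std::size_t default_ = 0;
};
```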
+
+\subsection{A Filter Example}
+
+Here is some code that takes one or more filenames in \arg{argv} and
+calculates the result of several hash functions for each file. The complete
+program can be found as \filename{hasher.cpp} in the Botan distribution. For
+brevity, most error checking has been removed.
+
+\begin{verbatim}
+ string name[3] = { "MD5", "SHA-1", "RIPEMD-160" };
+ Botan::Filter* hash[3] = {
+ new Botan::Chain(new Botan::Hash_Filter(name[0]),
+ new Botan::Hex_Encoder),
+ new Botan::Chain(new Botan::Hash_Filter(name[1]),
+ new Botan::Hex_Encoder),
+ new Botan::Chain(new Botan::Hash_Filter(name[2]),
+ new Botan::Hex_Encoder) };
+
+ Botan::Pipe pipe(new Botan::Fork(hash, 3)); // 3 filters in hash[]
+
+ for(u32bit j = 1; argv[j] != 0; j++)
+ {
+ ifstream file(argv[j], ios::binary);
+ pipe.start_msg();
+ file >> pipe;
+ pipe.end_msg();
+ file.close();
+ for(u32bit k = 0; k != 3; k++)
+ {
+ // each file adds 3 messages: file j's k-th hash is message 3*(j-1)+k
+ pipe.set_default_msg(3*(j-1)+k);
+ cout << name[k] << "(" << argv[j] << ") = " << pipe << endl;
+ }
+ }
+\end{verbatim}
+
+
+\subsection{Filter Catalog}
+
+This section contains descriptions of every \type{Filter} included in
+the portable sections of Botan. \type{Filter}s provided by modules
+are documented elsewhere.
+
+\subsubsection{Keyed Filters}
+
+A few sections ago, it was mentioned that \type{Pipe} can process multiple
+messages, treating each of them exactly the same. Well, that was a bit of a
+lie. There are some algorithms (in particular, block ciphers not in ECB mode,
+and all stream ciphers) that change their state as data is put through them.
+
+Naturally, you might well want to reset the keys or (in the case of block
+cipher modes) IVs used by such filters, so multiple messages can be processed
+using completely different keys, or new IVs, or new keys and IVs, or whatever.
+And in fact, even for a MAC or an ECB block cipher, you might well want to
+change the key used from message to message.
+
+Enter \type{Keyed\_Filter}, which acts as an abstract interface for
+any filter that uses keys: block cipher modes, stream ciphers,
+MACs, and so on. It has two functions, \function{set\_key} and
+\function{set\_iv}. Calling \function{set\_key} will, naturally, set
+(or reset) the key used by the algorithm. Setting the IV only makes
+sense in certain algorithms -- a call to \function{set\_iv} on an
+object that doesn't support IVs will be ignored. You \emph{must} call
+\function{set\_key} before calling \function{set\_iv}: while not all
+\type{Keyed\_Filter} objects require this, you should assume it is
+required anytime you are using a \type{Keyed\_Filter}.
+
+Here's an example:
+
+\begin{verbatim}
+ Keyed_Filter *cast, *hmac;
+ Pipe pipe(new Base64_Decoder,
+ // Note the assignments to the cast and hmac variables
+ cast = new CBC_Decryption("CAST-128", "PKCS7", cast_key, iv),
+ new Fork(
+ 0, // Read the section 'Fork' to understand this
+ new Chain(
+ hmac = new MAC_Filter("HMAC(SHA-1)", mac_key, 12),
+ new Base64_Encoder
+ )
+ )
+ );
+ pipe.start_msg();
+ // ... use pipe for a while, decrypt some stuff, derive new keys and IVs ...
+ pipe.end_msg();
+
+ cast->set_key(cast_key2);
+ cast->set_iv(iv2);
+ hmac->set_key(mac_key2);
+
+ pipe.start_msg();
+ // ... use pipe for some other things ...
+ pipe.end_msg();
+\end{verbatim}
+
+There are some requirements to using \type{Keyed\_Filter} that you must
+follow. If you call \function{set\_key} or \function{set\_iv} on a filter that
+is owned by a \type{Pipe}, you must do so while the \type{Pipe} is
+``unlocked''. This refers to the times when no messages are being processed by
+\type{Pipe} -- either before \type{Pipe}'s \function{start\_msg} is called, or
+after \function{end\_msg} is called (and no new call to \function{start\_msg}
+has happened yet). Doing otherwise will result in undefined behavior, most
+likely silently producing invalid output.
+
+And remember: if you're resetting both values, reset the key \emph{first}.
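+
+The two rules above (no rekeying while a message is in progress, and key
+before IV) can be made explicit in a sketch. \type{ToyKeyedFilter} is
+hypothetical. Where Botan leaves misuse undefined, it throws an exception so
+that mistakes are visible. Clearing the IV inside \function{set\_key} is an
+assumption of this sketch, included to show one reason the order matters.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical keyed filter enforcing the rules above; not Botan code.
class ToyKeyedFilter {
public:
    void set_key(const std::string& key) {
        require_unlocked();
        key_ = key;
        iv_.clear();  // assumption of this sketch: a new key drops the old IV
        has_key_ = true;
    }
    void set_iv(const std::string& iv) {
        require_unlocked();
        if (!has_key_) throw std::logic_error("set_key must come before set_iv");
        iv_ = iv;
    }
    // Called by the owning (toy) pipe around each message.
    void start_msg() { locked_ = true; }
    void end_msg() { locked_ = false; }

    const std::string& key() const { return key_; }
    const std::string& iv() const { return iv_; }

private:
    void require_unlocked() const {
        if (locked_) throw std::logic_error("cannot rekey mid-message");
    }
    std::string key_, iv_;
    bool has_key_ = false;
    bool locked_ = false;
};
```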
+
+\subsubsection{Cipher Filters}
+
+Getting hold of a \type{Filter} implementing a cipher is very easy. Simply
+make sure you're including the header \filename{lookup.h}, and call
+\function{get\_cipher}. Generally you will pass the return value directly into
+a \type{Pipe}. There are actually a couple of different functions, which do
+much the same thing:
+
+\function{get\_cipher}(\type{std::string} \arg{cipher\_spec},
+ \type{SymmetricKey} \arg{key},
+ \type{InitializationVector} \arg{iv},
+ \type{Cipher\_Dir} \arg{dir});
+
+\function{get\_cipher}(\type{std::string} \arg{cipher\_spec},
+ \type{SymmetricKey} \arg{key},
+ \type{Cipher\_Dir} \arg{dir});
+
+The version that doesn't take an IV is useful for things that don't use them,
+like block ciphers in ECB mode, or most stream ciphers. If you specify a
+\arg{cipher\_spec} that does want an IV, and you use the version that doesn't
+take one, an exception will be thrown. The \arg{dir} argument can be either
+\type{ENCRYPTION} or \type{DECRYPTION}. In a few cases, like most (but not all)
+stream ciphers, these are equivalent, but even then it provides a way of
+showing the ``intent'' of the operation to readers of your code.
+
+The \arg{cipher\_spec} is a string that specifies what cipher is to be
+used. The general syntax for \arg{cipher\_spec} is ``STREAM\_CIPHER'',
+``BLOCK\_CIPHER/MODE'', or ``BLOCK\_CIPHER/MODE/PADDING''. In the case of
+stream ciphers, no mode is necessary, so just the name is sufficient. A block
+cipher requires a mode of some sort, which can be ``ECB'', ``CBC'', ``CFB(n)'',
+``OFB'', ``CTR-BE'', or ``EAX(n)''. The argument to CFB mode is how many bits
+of feedback should be used. If you just use ``CFB'' with no argument, it will
+default to using a feedback equal to the block size of the cipher. EAX mode
+also takes an optional argument, which gives the size of the authentication tag
+in bits; if no argument is given, the tag size defaults to the block size of
+the cipher.
+
+In the case of the ECB and CBC modes, a padding method can also be
+specified. If it is not supplied, ECB defaults to not padding, and CBC defaults
+to using PKCS \#5/\#7 compatible padding. The padding methods currently
+available are ``NoPadding'', ``PKCS7'', ``OneAndZeros'', and ``CTS''. CTS
+padding is currently only available for CBC mode, but the others can also be
+used in ECB mode.
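+
+The two most common padding rules can be illustrated with a small standalone
+C++ sketch. It is not Botan's implementation: the helpers
+\function{pkcs7\_pad} and \function{one\_and\_zeros\_pad} and the \type{Bytes}
+typedef are hypothetical names, and block sizes are given in bytes. In both
+schemes, input that is already a multiple of the block size gains one full
+block of padding, so the padding can always be removed unambiguously.
+
+\begin{verbatim}
+#include <cstddef>
+#include <cstdint>
+#include <vector>
+
+typedef std::vector<std::uint8_t> Bytes;
+
+// PKCS #7: append n bytes, each of value n, where 1 <= n <= block_size.
+// Block-aligned input therefore gains a full block of padding.
+Bytes pkcs7_pad(Bytes data, std::size_t block_size)
+{
+   const std::size_t n = block_size - (data.size() % block_size);
+   data.insert(data.end(), n, static_cast<std::uint8_t>(n));
+   return data;
+}
+
+// OneAndZeros: append one 0x80 byte (a 1 bit followed by 0 bits),
+// then zero bytes up to the next block boundary.
+Bytes one_and_zeros_pad(Bytes data, std::size_t block_size)
+{
+   data.push_back(0x80);
+   while(data.size() % block_size != 0)
+      data.push_back(0x00);
+   return data;
+}
+\end{verbatim}
+
+With an 8-byte block, a 5-byte message gains three 0x03 bytes under PKCS \#7,
+or 0x80 followed by two zero bytes under OneAndZeros.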
+
+Some example \arg{cipher\_spec} arguments are: ``DES/CFB(32)'',
+``TripleDES/OFB'', ``Blowfish/CBC/CTS'', ``SAFER-SK(10)/CBC/OneAndZeros'',
+``AES/EAX'', and ``ARC4''.
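+
+As a rough illustration of the ``BLOCK\_CIPHER/MODE/PADDING'' syntax, the
+following standalone C++ sketch splits a \arg{cipher\_spec} into its
+slash-separated parts. The function \function{split\_cipher\_spec} is
+hypothetical and is not how Botan parses specifications internally; it only
+shows how the example strings above decompose.
+
+\begin{verbatim}
+#include <string>
+#include <vector>
+
+// Split a spec such as "Blowfish/CBC/CTS" on '/':
+// [0] = cipher name, [1] = mode (if any), [2] = padding (if any).
+// The parenthesised arguments used in this section ("CFB(32)",
+// "SAFER-SK(10)") contain no slashes, so a plain split suffices here.
+std::vector<std::string> split_cipher_spec(const std::string& spec)
+{
+   std::vector<std::string> parts;
+   std::string::size_type start = 0;
+   while(true)
+   {
+      const std::string::size_type slash = spec.find('/', start);
+      parts.push_back(spec.substr(start, slash - start));
+      if(slash == std::string::npos)
+         break;
+      start = slash + 1;
+   }
+   return parts;
+}
+\end{verbatim}
+
+For example, ``SAFER-SK(10)/CBC/OneAndZeros'' yields three parts, while
+``ARC4'' yields only the stream cipher name.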
+
+``CTR-BE'' refers to counter mode where the counter is incremented as if it
+were a big-endian encoded integer. This is compatible with most other
+implementations, but it is possible some will use the incompatible
+little-endian convention. That variant would be denoted ``CTR-LE'' if it were
+supported.
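+
+The difference between the two counter conventions comes down to which byte
+receives the increment first. A minimal standalone sketch of the big-endian
+rule, using a hypothetical helper \function{increment\_be}, might look like
+this. A little-endian variant would start the carry at the first byte instead
+of the last.
+
+\begin{verbatim}
+#include <cstddef>
+#include <cstdint>
+#include <vector>
+
+// Increment a counter block as a big-endian integer: the last byte is the
+// least significant, and a carry ripples toward the first byte. A counter
+// made entirely of 0xFF bytes wraps around to all zeros.
+void increment_be(std::vector<std::uint8_t>& counter)
+{
+   for(std::size_t i = counter.size(); i > 0; --i)
+   {
+      if(++counter[i - 1] != 0)
+         return; // no carry out of this byte, so we are done
+   }
+}
+\end{verbatim}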
+
+``EAX'' is a new cipher mode designed by Wagner, Rogaway, and Bellare. It is an
+authenticated cipher mode (that is, no separate authentication is needed), has
+provable security, and is free from patent entanglements. It runs about half as
+fast as most of the other cipher modes (like CBC, OFB, or CTR), which is a
+reasonable cost considering that you don't also need to compute a MAC.
+
+\subsubsection{Hashes and MACs}
+
+Hash functions and MACs don't need anything special when it comes to
+filters. Both just take their input and produce no output until
+\function{end\_msg()} is called, at which time they complete the hash or MAC
+and send that as output.
+
+These \type{Filter}s take a string naming the type to be used. If for some
+reason you name something that doesn't exist, an exception will be thrown.
+
+\noindent
+\function{Hash\_Filter}(\type{std::string} \arg{hash},
+ \type{u32bit} \arg{outlength}):
+
+This type hashes its input with \arg{hash}. When \function{end\_msg} is called
+on the owning \type{Pipe}, the hash is completed and the digest is sent on to
+the next thing in the pipe. The argument \arg{outlength} specifies how much of
+the output of the hash will be passed along to the next filter when
+\function{end\_msg} is called. By default, it will pass the entire hash.
+
+Examples of names for \function{Hash\_Filter} are ``SHA-1'' and ``Whirlpool''.
+
+\noindent
+\function{MAC\_Filter}(\type{std::string} \arg{mac},
+ \type{const SymmetricKey\&} \arg{key},
+ \type{u32bit} \arg{outlength}):
+
+The constructor for a \type{MAC\_Filter} takes a key, used in calculating the
+MAC, and a length parameter, which has exactly the same semantics as the one
+passed to \type{Hash\_Filter}'s constructor.
+
+Examples for \arg{mac} are ``HMAC(SHA-1)'', ``CMAC(AES-128)'', and the
+exceptionally long, strange, and probably useless name
+``CMAC(Lion(Tiger(20,3),MARK-4,1024))''.
+
+\subsubsection{PK Filters}
+
+There are four classes in this category, \type{PK\_Encryptor\_Filter},
+\type{PK\_Decryptor\_Filter}, \type{PK\_Signer\_Filter}, and
+\type{PK\_Verifier\_Filter}. Each takes a pointer to an object of the
+appropriate type (\type{PK\_Encryptor}, \type{PK\_Decryptor}, etc.), which the
+filter takes ownership of and deletes in its destructor. These classes are
+found in \filename{pk\_filts.h}.
+
+Three of these, for encryption, decryption, and signing, are conceptually
+almost identical. Each of them buffers its input until the end of the
+message is marked with a call to the \function{end\_msg} function. Then they
+encrypt, decrypt, or sign their input and send the output (the ciphertext, the
+plaintext, or the signature) into the next filter.
+
+Signature verification works a little differently, because it needs to know
+what the signature is in order to check it. You can either pass this in along
+with the constructor, or call the function \function{set\_signature} -- with
+this second method, you need to keep a pointer to the filter around so you can
+send it this command. In either case, after \function{end\_msg} is called, it
+will try to verify the signature (if the signature has not been set by either
+method, an exception will be thrown here). It will then send a single byte onto
+the next filter -- a 1 or a 0, which specifies whether the signature verified
+or not (respectively).
+
+For more information about PK algorithms (including creating the appropriate
+objects to pass to the constructors), read the section ``Public Key
+Cryptography'' in this manual.
+
+\subsubsection{Encoders}
+
+Often you want your data to be in some form of text (for sending over channels
+that aren't 8-bit clean, printing it, etc). The filters \type{Hex\_Encoder}
+and \type{Base64\_Encoder} will convert arbitrary binary data into hex or
+base64 formats. Not surprisingly, you can use \type{Hex\_Decoder} and
+\type{Base64\_Decoder} to convert it back into its original form.
+
+Both of the encoders can take a few options about how the data should be
+formatted (all of which have defaults). The first is a \type{bool} which simply
+says if the encoder should insert line breaks. This defaults to
+false. Line breaks don't matter either way to the decoder, but it makes the
+output a bit more appealing to the human eye, and a few transport mechanisms
+(notably some email systems) limit the maximum line length.
+
+The second encoder option is an integer specifying how long such lines will be
+(obviously this will be ignored if line-breaking isn't being used). The default
+tends to be in the range of 60-80 characters, but is not specified exactly. If
+you want a specific value, set it. Otherwise the default should be fine.
+
+Lastly, \type{Hex\_Encoder} takes an argument of type \type{Case}, which can be
+\type{Uppercase} or \type{Lowercase} (default is \type{Uppercase}). This
+specifies what case the characters A-F should be output as. The base64 encoder
+has no such option, because it uses both upper and lower case letters for its
+output.
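+
+The three encoder options interact roughly as in the following standalone
+C++ sketch. The function \function{hex\_encode} and its default line length of
+72 are assumptions chosen for illustration (the text above only says the
+default is in the 60-80 range). This is not \type{Hex\_Encoder}'s actual code,
+and the exact placement of line breaks in Botan's output may differ.
+
+\begin{verbatim}
+#include <cstddef>
+#include <cstdint>
+#include <string>
+#include <vector>
+
+// Hex-encode bytes. If line_breaks is true, a '\n' is inserted after every
+// line_length output characters (but not after the final line).
+std::string hex_encode(const std::vector<std::uint8_t>& in,
+                       bool line_breaks = false,
+                       std::size_t line_length = 72,
+                       bool uppercase = true)
+{
+   const char* digits = uppercase ? "0123456789ABCDEF" : "0123456789abcdef";
+   std::string out;
+   std::size_t column = 0;
+   for(std::size_t i = 0; i != in.size(); ++i)
+   {
+      // high nibble first, then low nibble
+      for(int shift = 4; shift >= 0; shift -= 4)
+      {
+         if(line_breaks && line_length > 0 && column == line_length)
+         {
+            out += '\n';
+            column = 0;
+         }
+         out += digits[(in[i] >> shift) & 0x0F];
+         ++column;
+      }
+   }
+   return out;
+}
+\end{verbatim}
+
+Encoding the bytes 0xAA, 0xBB, 0xCC, 0xDD with line breaks every 4 characters
+gives two lines, ``AABB'' and ``CCDD''.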
+
+The decoders both take a single option, which controls how the decoder
+behaves in the case of invalid input. The enum (called \type{Decoder\_Checking})
+can take on any of three values: \type{NONE}, \type{IGNORE\_WS}, and
+\type{FULL\_CHECK}. With \type{NONE} (the default, for compatibility with
+previous releases), invalid input (for example, a ``z'' character in supposedly
+hex input) will simply be ignored. With \type{IGNORE\_WS}, whitespace will be
+ignored by the decoder, but receiving other non-valid data will raise an
+exception. Finally, \type{FULL\_CHECK} will raise an exception for \emph{any}
+characters not in the encoded character set, including whitespace.
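+
+The three checking levels can be sketched in standalone C++ as follows. The
+enumerators \type{CHECK\_NONE}, \type{CHECK\_IGNORE\_WS}, and
+\type{CHECK\_FULL} are hypothetical stand-ins for \type{NONE},
+\type{IGNORE\_WS}, and \type{FULL\_CHECK}. The function \function{hex\_decode}
+is an illustrative helper rather than \type{Hex\_Decoder}'s implementation.
+In particular, this sketch drops a trailing unpaired digit, which is an
+assumption.
+
+\begin{verbatim}
+#include <cctype>
+#include <cstdint>
+#include <stdexcept>
+#include <string>
+#include <vector>
+
+enum Checking { CHECK_NONE, CHECK_IGNORE_WS, CHECK_FULL };
+
+// Returns 0-15 for a hex digit, or -1 for anything else.
+int hex_value(char c)
+{
+   if(c >= '0' && c <= '9') return c - '0';
+   if(c >= 'a' && c <= 'f') return c - 'a' + 10;
+   if(c >= 'A' && c <= 'F') return c - 'A' + 10;
+   return -1;
+}
+
+std::vector<std::uint8_t> hex_decode(const std::string& in, Checking mode)
+{
+   std::vector<std::uint8_t> out;
+   int high = -1; // pending high nibble, or -1 if none
+   for(std::string::size_type i = 0; i != in.size(); ++i)
+   {
+      const int v = hex_value(in[i]);
+      if(v < 0)
+      {
+         const bool ws = std::isspace(static_cast<unsigned char>(in[i])) != 0;
+         // FULL rejects everything invalid; IGNORE_WS rejects non-whitespace
+         if(mode == CHECK_FULL || (mode == CHECK_IGNORE_WS && !ws))
+            throw std::invalid_argument("invalid hex input");
+         continue; // otherwise silently skip the character
+      }
+      if(high < 0)
+         high = v;
+      else
+      {
+         out.push_back(static_cast<std::uint8_t>((high << 4) | v));
+         high = -1;
+      }
+   }
+   return out;
+}
+\end{verbatim}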
+
+You can find the declarations for these types in \filename{hex.h} and
+\filename{base64.h}.
+
+\subsection{Rolling Your Own}
+
+The system of filters and pipes was designed in an attempt to make it
+as simple as possible to write new \type{Filter} objects. There are
+essentially four functions that need to be implemented by an object
+deriving from \type{Filter}:
+
+\noindent
+\type{void} \function{write}(\type{byte} \arg{input}[], \type{u32bit}
+\arg{length}):
+
+The \function{write} function is what is called when a filter receives input
+for it to process. The filter is \emph{not} required to process it right away;
+many filters buffer their input before producing any output. A filter will
+usually have \function{write} called many times during its lifetime.
+
+\noindent
+\type{void} \function{send}(\type{byte} \arg{output}[], \type{u32bit}
+\arg{length}):
+
+Eventually, a filter will want to produce some output to send along to the next
+filter in the pipeline. It does so by calling \function{send} with whatever it
+wants to send along to the next filter. There is also a version of
+\function{send} taking a single byte argument, as a convenience.
+
+\noindent
+\type{void} \function{start\_msg()}:
+
+This function is optional. Implement it if your \type{Filter} would like to do
+some processing or setup at the start of each message (for an example, see the
+Zlib compression module).
+
+\noindent
+\type{void} \function{end\_msg()}:
+
+Implementing the \function{end\_msg} function is optional. It is called when
+the filter is asked to finish its computation for the current message. Note
+that filters must \emph{not} deallocate their resources here; that should be
+done by their destructor. They should simply finish whatever computation they
+have been working on (for example, a compressing filter would flush the
+compressor and \function{send} the final block), and empty any buffers in
+preparation for processing a fresh new set of input. It is essentially the
+inverse of \function{start\_msg}.
+
+Additionally, if necessary, filters can define a constructor that takes any
+needed arguments, and a destructor to deal with deallocating memory, closing
+files, etc.
+
+There is also a \type{BufferingFilter} class (in \filename{buf\_filt.h}) that
+will take a message and split it up into an initial block that can be of any
+size (including zero), a sequence of fixed sized blocks of any non-zero size,
+and a final (possibly zero-sized) block. This might make a useful base class
+for your filters, depending on what you have in mind.
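+
+The way such a filter divides a message can be sketched with a standalone C++
+helper. The name \function{split\_message} is hypothetical, and this is not
+the \type{BufferingFilter} interface. The sketch assumes the final block is
+whatever remains after the last full block, so it is always shorter than the
+block size and may be empty. The real class may treat an exact multiple
+differently.
+
+\begin{verbatim}
+#include <cstddef>
+#include <string>
+#include <vector>
+
+// Split msg into: an initial block of initial_size bytes (possibly empty),
+// then as many block_size blocks as fit, then the leftover final block.
+// block_size must be non-zero.
+std::vector<std::string> split_message(const std::string& msg,
+                                       std::size_t initial_size,
+                                       std::size_t block_size)
+{
+   std::vector<std::string> blocks;
+   std::size_t pos = (initial_size < msg.size()) ? initial_size : msg.size();
+   blocks.push_back(msg.substr(0, pos));
+   while(msg.size() - pos >= block_size)
+   {
+      blocks.push_back(msg.substr(pos, block_size));
+      pos += block_size;
+   }
+   blocks.push_back(msg.substr(pos)); // final block, shorter than block_size
+   return blocks;
+}
+\end{verbatim}
+
+Splitting ``abcdefghij'' with an initial size of 3 and a block size of 4 gives
+``abc'', ``defg'', and a final block ``hij''.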
+
+
+\pagebreak
+\section{Public Key Cryptography}
+
+Let's create a 1024-bit RSA private key and encode the public key in X.509
+format with PEM encoding (which can be understood by many other
+cryptographic programs):
+
+\begin{verbatim}
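+// everyone does:
+AutoSeeded_RNG rng;
+
+// Alice
+RSA_PrivateKey priv_rsa(rng, 1024 /* bits */);
+
+// X.509 public key encoding (public part only), PEM armored
+std::string alice_pem = X509::PEM_encode(priv_rsa);
+
+// send alice_pem to Bob, who does
+
+// Bob: a std::string passed to load_key is treated as a filename,
+// so wrap the PEM text in a memory-backed DataSource instead
+DataSource_Memory pem_source(alice_pem);
+std::auto_ptr<X509_PublicKey> alice(X509::load_key(pem_source));
+
+RSA_PublicKey* alice_rsa = dynamic_cast<RSA_PublicKey*>(alice.get());
+if(alice_rsa)
+  {
+  /* ... */
+  }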
+
+\end{verbatim}
+
+\subsection{Creating PK Algorithm Key Objects}
+
+The library has interfaces for encryption, signatures, etc. that do not require
+knowing the exact algorithm in use (for example RSA and Rabin-Williams
+signatures are handled by the exact same code path).
+
+One place where we \emph{do} need to know exactly what kind of algorithm is in
+use is when we are creating a key (\emph{But}: read the section ``Importing and
+Exporting PK Keys'', later in this manual).
+
+There are (currently) two kinds of public key algorithms in Botan: ones based
+on integer factorization (RSA and Rabin-Williams), and ones based on the
+discrete logarithm problem (DSA, Diffie-Hellman, Nyberg-Rueppel, and
+ElGamal). Since discrete logarithm parameters (primes and generators) can be
+shared among many keys, there is the notion of these being a combined type
+(called \type{DL\_Group}).
+
+There are two ways to create a DL private key (such as
+\type{DSA\_PrivateKey}). One is to pass in just a \type{DL\_Group} object -- a
+new key will automatically be generated. The other involves passing in a group
+to use, along with both the public and private values (private value first).
+
+In integer factorization algorithms, the modulus is not shared with other keys,
+so this notion is not used. You can create a new key by passing in a
+\type{RandomNumberGenerator} and a \type{u32bit} telling how long (in bits) the
+key should be, or you can copy a pre-existing key by passing in the appropriate
+parameters (primes, exponents, etc.). For RSA and Rabin-Williams (the two IF
+schemes in Botan), the parameters
+are all \type{BigInt}s: prime 1, prime 2, encryption exponent, decryption
+exponent, modulus. The last two are optional, since they can easily be derived
+from the first three.
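+
+To see why the decryption exponent and modulus are optional, here is a toy
+standalone C++ sketch of the derivation, with $n = pq$ and
+$d = e^{-1} \bmod (p-1)(q-1)$. The helpers are hypothetical and use 64-bit
+integers, which is only adequate for textbook-sized numbers; real keys need
+\type{BigInt}. Some implementations reduce modulo $\mathrm{lcm}(p-1, q-1)$
+instead, which also yields a working exponent.
+
+\begin{verbatim}
+#include <cstdint>
+
+typedef std::int64_t i64;
+
+// Modular inverse of a modulo m via the extended Euclidean algorithm.
+// Assumes gcd(a, m) == 1.
+i64 mod_inverse(i64 a, i64 m)
+{
+   i64 old_r = a, r = m;
+   i64 old_s = 1, s = 0; // coefficients of a
+   while(r != 0)
+   {
+      const i64 q = old_r / r;
+      i64 tmp = old_r - q * r; old_r = r; r = tmp;
+      tmp = old_s - q * s; old_s = s; s = tmp;
+   }
+   return ((old_s % m) + m) % m;
+}
+
+struct ToyRSA { i64 n, d; };
+
+// Derive n = p*q and d = e^-1 mod (p-1)(q-1) from p, q and e.
+ToyRSA derive(i64 p, i64 q, i64 e)
+{
+   ToyRSA k;
+   k.n = p * q;
+   k.d = mod_inverse(e, (p - 1) * (q - 1));
+   return k;
+}
+\end{verbatim}
+
+With $p = 61$, $q = 53$ and $e = 17$, this gives $n = 3233$ and $d = 2753$.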
+
+\subsubsection{Creating a DL\_Group}
+
+There are quite a few ways to get a \type{DL\_Group} object. The best is to use
+the function \function{get\_dl\_group}, which takes a string naming a group; it
+will either return that group, if it knows about it, or throw an
+exception. Names it knows about include ``IETF-n'' where n is 768, 1024, 1536,
+2048, 3072, or 4096, and ``DSA-n'', where n is 512, 768, or 1024. The IETF
+groups are the ones specified for use with IPSec, and the DSA ones are the
+default DSA parameters specified by Java's JCE. For DSA and Nyberg-Rueppel, you
+should only use the ``DSA-n'' groups, while Diffie-Hellman and ElGamal can use
+either type (keep in mind that some applications/standards require DH/ELG to
+use DSA-style primes, while others require strong prime groups).
+
+You can also generate a new random group. This is not recommended, because it
+is quite slow, especially for safe primes.
+
+\subsection{Key Checking}
+
+Most public key algorithms have limitations or restrictions on their
+parameters. For example, RSA requires an odd exponent, and algorithms based on
+the discrete logarithm problem need a generator $> 1$.
+
+Each low-level public key type has a function named \function{check\_key} that
+takes a \type{bool}. This function returns a Boolean value that declares
+whether or not the key is valid (from an algorithmic standpoint). For example,
+it will check to make sure that the prime parameters of a DSA key are, in fact,
+prime. It does not have anything to do with the validity of the key for any
+particular use, nor does it have anything to do with certificates that link a
+key (which, after all, is just some numbers) with a user or other entity. If
+\function{check\_key}'s argument is \type{true}, then it does ``strong''
+checking, which includes fairly expensive operations like primality checking.
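+
+The distinction between cheap and ``strong'' checking can be illustrated with
+a toy standalone C++ sketch on small RSA-style parameters. The function
+\function{toy\_check\_key} is hypothetical and far less thorough than
+\function{check\_key}. It only shows that weak checks catch structural errors
+(such as an even exponent), while the strong mode additionally pays for
+primality testing.
+
+\begin{verbatim}
+#include <cstdint>
+
+typedef std::uint64_t u64;
+
+// Trial-division primality test; only sensible for tiny toy numbers.
+bool is_prime(u64 x)
+{
+   if(x < 2) return false;
+   for(u64 f = 2; f * f <= x; ++f)
+      if(x % f == 0) return false;
+   return true;
+}
+
+// Weak: cheap structural tests. Strong: additionally test p and q for
+// primality, which is the expensive part.
+bool toy_check_key(u64 p, u64 q, u64 e, u64 n, bool strong)
+{
+   if(n != p * q || e < 3 || e % 2 == 0)
+      return false;
+   if(strong && (!is_prime(p) || !is_prime(q)))
+      return false;
+   return true;
+}
+\end{verbatim}
+
+A ``key'' built from the composite 15 passes the weak check but fails the
+strong one.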
+
+Keys are always checked when they are loaded or generated, so typically there
+is no reason to use this function directly. However, you can disable or reduce
+the checks for particular cases (public keys, loaded private keys, generated
+private keys) by setting the right config toggle (see the section on the
+configuration subsystem for details).
+
+\subsection{Getting a PK algorithm object}
+
+The key types, like \type{RSA\_PrivateKey}, do not implement any kind
+of padding or encoding (which is generally necessary for security). To
+get an object that does apply them, the easiest thing to do is call the functions
+found in \filename{look\_pk.h}. Generally these take a key, followed
+by a string that specifies what hashing and encoding method(s) to
+use. Examples of such strings are ``EME1(SHA-256)'' for OAEP
+encryption and ``EMSA4(SHA-256)'' for PSS signatures (where the
+message is hashed using SHA-256).
+
+Here are some basic examples (using an RSA key) to give you a feel for the
+possibilities. These examples assume \type{rsakey} is an
+\type{RSA\_PrivateKey}, since otherwise we would not be able to create a
+decryption or signature object with it (you can create encryption or signature
+verification objects with public keys, naturally). Remember to delete these
+objects when you're done with them.
+
+\begin{verbatim}
+ // PKCS #1 v2.0 / IEEE 1363 compatible encryption
+ PK_Encryptor* rsa_enc1 = get_pk_encryptor(rsakey, "EME1(RIPEMD-160)");
+ // PKCS #1 v1.5 compatible encryption
+ PK_Encryptor* rsa_enc2 = get_pk_encryptor(rsakey, "PKCS1v15");
+
+ // Raw encryption: no padding, input is directly encrypted by the key
+ // Don't use this unless you know what you're doing
+ PK_Encryptor* rsa_enc3 = get_pk_encryptor(rsakey, "Raw");
+
+ // This object can decrypt things encrypted by rsa_enc1
+ PK_Decryptor* rsa_dec1 = get_pk_decryptor(rsakey, "EME1(RIPEMD-160)");
+
+ // PKCS #1 v1.5 compatible signatures
+ PK_Signer* rsa_sig = get_pk_signer(rsakey, "EMSA3(MD5)");
+ PK_Verifier* rsa_verify = get_pk_verifier(rsakey, "EMSA3(MD5)");
+
+ // PKCS #1 v2.1 compatible signatures
+ PK_Signer* rsa_sig2 = get_pk_signer(rsakey, "EMSA4(SHA-1)");
+ PK_Verifier* rsa_verify2 = get_pk_verifier(rsakey, "EMSA4(SHA-1)");
+
+ // Hash input with SHA-1, but don't pad the input in any way; usually
+ // used with DSA/NR, not RSA
+ PK_Signer* rsa_sig3 = get_pk_signer(rsakey, "EMSA1(SHA-1)");
+\end{verbatim}
+
+\subsection{Encryption}
+
+The \type{PK\_Encryptor} and \type{PK\_Decryptor} classes are the interface for
+encryption and decryption, respectively.
+
+Calling \function{encrypt} with a \type{byte} array, a length
+parameter, and an RNG object will return the input encrypted with
+whatever scheme is being used. Calling the similar \function{decrypt}
+will perform the inverse operation. You can also do these operations
+with \type{SecureVector<byte>}s. In all cases, the output is returned
+via a \type{SecureVector<byte>}.
+
+If you attempt an operation with a larger size than the key can
+support (this limit varies based on the algorithm, the key size, and
+the padding method used (if any)), an exception will be
+thrown. Alternatively, you can call \function{maximum\_input\_size},
+which returns the maximum size you can safely encrypt. In fact,
+you can often encrypt an object that is one byte longer, but only if
+enough of the high bits of the leading byte are set to zero. Since
+this is pretty dicey, it's best to stick with the advertised maximum.
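+
+For the two padded RSA encodings, the limits in PKCS \#1 can be worked out
+directly from the modulus length $k$ in bytes. PKCS \#1 v1.5 encryption needs
+at least 11 bytes of overhead, and EME1 (OAEP) needs $2h + 2$ bytes, where $h$
+is the output length of the hash. The standalone C++ sketch below computes
+these (the helper names are hypothetical). The value
+\function{maximum\_input\_size} reports may be slightly smaller, so treat it
+as the authority.
+
+\begin{verbatim}
+#include <cstddef>
+
+// PKCS #1 v1.5 encryption padding: at least 11 bytes of overhead.
+std::size_t max_pkcs1v15(std::size_t k)
+{
+   return (k > 11) ? k - 11 : 0;
+}
+
+// EME1 (OAEP): 2*hash_len + 2 bytes of overhead.
+std::size_t max_eme1(std::size_t k, std::size_t hash_len)
+{
+   return (k > 2 * hash_len + 2) ? k - 2 * hash_len - 2 : 0;
+}
+\end{verbatim}
+
+For a 1024-bit key ($k = 128$), this gives 117 bytes for PKCS \#1 v1.5,
+86 bytes for EME1 with SHA-1, and 62 bytes for EME1 with SHA-256.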
+
+Available public key encryption algorithms in Botan are RSA and ElGamal. The
+encoding methods are EME1, denoted by ``EME1(HASHNAME)'', PKCS \#1 v1.5,
+called ``PKCS1v15'' or ``EME-PKCS1-v1\_5'', and raw encoding (``Raw'').
+
+For compatibility reasons, PKCS \#1 v1.5 is recommended for use with
+ElGamal (most other implementations of ElGamal do not support any
+other encoding format). RSA can also be used with PKCS \#1 v1.5 encoding,
+but because of various possible attacks, EME1 is the preferred
+encoding. EME1 requires the use of a hash function: unless a competent
+applied cryptographer tells you otherwise, you should use SHA-256 or
+SHA-512.
+
+Don't use ``Raw'' encoding unless you need it for backward
+compatibility with old protocols. There are many possible attacks
+against both ElGamal and RSA when they are used in this way.
+
+\subsection{Signatures}
+
+The signature algorithms look quite a bit like the hash functions. You
+can repeatedly call \function{update}, giving more and more of a
+message you wish to sign, and then call \function{signature}, which
+will return a signature for that message. If you want to do it all in
+one shot, call \function{sign\_message}, which will just call
+\function{update} with its argument and then return whatever
+\function{signature} returns. Generating a signature requires random
+numbers with some schemes, so \function{signature} and
+\function{sign\_message} both take a \type{RandomNumberGenerator\&}.
+
+You can validate a signature by passing the message to the verifier with
+\function{update}, and then checking whether \function{check\_signature}
+returns true (you pass the supposed signature to \function{check\_signature} as
+a byte array and a length or as a \type{MemoryRegion<byte>}). There is another
+function, \function{verify\_message}, which takes a pair of byte array/length
+pairs (or a pair of \type{MemoryRegion<byte>} objects), the first of which is
+the message, the second being the (supposed) signature. It returns true if the
+signature is valid and false otherwise.
+
+Available public key signature algorithms in Botan are RSA, DSA,
+Nyberg-Rueppel, and Rabin-Williams. Signature encoding methods include EMSA1,
+EMSA2, EMSA3, EMSA4, and Raw. All of them, except Raw, take a parameter naming
+a message digest function to hash the message with. Raw actually signs the
+input directly; if the message is too big, the signing operation will fail. Raw
+is not useful except in very specialized applications.
+
+There are various interactions that make certain encoding schemes and signing
+algorithms more or less useful.
+
+EMSA2 is the usual method for encoding Rabin-Williams signatures, so for
+compatibility with other implementations you may have to use that. EMSA4 (also
+called PSS) also works with Rabin-Williams. EMSA1 and EMSA3 do \emph{not} work
+with Rabin-Williams.
+
+RSA can be used with any of the available encoding methods. EMSA4 is by far the
+most secure, but is not (as of now) widely implemented. EMSA3 (also called
+``EMSA-PKCS1-v1\_5'') is commonly used with RSA (for example in SSL). EMSA1
+signs the message digest directly, without any extra padding or encoding. This
+may be useful, but is not as secure as either EMSA3 or EMSA4. EMSA2 may be used
+but is not recommended.
+
+For DSA and Nyberg-Rueppel, you should use EMSA1. None of the other encoding
+methods are particularly useful for these algorithms.
+
+\subsection{Key Agreement}
+
+You can get a hold of a \type{PK\_Key\_Agreement\_Scheme} object by calling
+\function{get\_pk\_kas} with a key that is of a type that supports key
+agreement (such as a Diffie-Hellman key stored in a \type{DH\_PrivateKey}
+object), and the name of a key derivation function. This can be ``Raw'',
+meaning the output of the primitive itself is returned as the key, or
+``KDF1(hash)'' or ``KDF2(hash)'' where ``hash'' is any string you happen to
+like (hopefully you like strings like ``SHA-256'' or ``RIPEMD-160''), or
+``X9.42-PRF(keywrap)'', which uses the PRF specified in ANSI X9.42. It takes
+the name or OID of the key wrap algorithm that will be used to encrypt a
+content encryption key.
+
+How key agreement generally works is that you trade public values with some
+other party, and then each of you runs a computation with the other's value and
+your key (this should return the same result to both parties). This computation
+can be called by using \function{derive\_key} with either a byte array/length
+pair, or a \type{SecureVector<byte>} that holds the public value of the other
+party. The last argument to either call is a number that specifies how long a
+key you want.
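+
+The reason both parties arrive at the same value is easiest to see with
+Diffie-Hellman on toy numbers. In the standalone C++ sketch below, each side
+publishes $g^x \bmod p$ and raises the other side's public value to its own
+private exponent. The helper \function{mod\_pow} is hypothetical, and the
+numbers are far too small for real use. The shared value corresponds roughly
+to what ``Raw'' returns, before any key derivation function is applied.
+
+\begin{verbatim}
+#include <cstdint>
+
+typedef std::uint64_t u64;
+
+// Square-and-multiply modular exponentiation (toy sizes only: the
+// products must not overflow 64 bits).
+u64 mod_pow(u64 base, u64 exp, u64 mod)
+{
+   u64 result = 1;
+   base %= mod;
+   while(exp > 0)
+   {
+      if(exp & 1)
+         result = (result * base) % mod;
+      base = (base * base) % mod;
+      exp >>= 1;
+   }
+   return result;
+}
+\end{verbatim}
+
+With $p = 23$, $g = 5$ and private exponents 4 and 3, the public values are
+$5^4 \bmod 23 = 4$ and $5^3 \bmod 23 = 10$. Both parties then compute the
+shared value $10^4 \bmod 23 = 4^3 \bmod 23 = 18$.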
+
+Depending on the key derivation function you're using, you may not
+\emph{actually} get back a key of that size. In particular, ``Raw'' will return
+a number about the size of the Diffie-Hellman modulus, and KDF1 can only return
+a key that is the same size as the output of the hash. KDF2, on the other
+hand, will always give you a key exactly as long as you request, regardless of
+the underlying hash used with it. The key returned is a \type{SymmetricKey},
+ready to pass to a block cipher, MAC, or other symmetric algorithm.
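+
+The length behaviour of KDF1 and KDF2 follows from their structure, shown
+here as a standalone C++ sketch. KDF1 hashes the shared secret once. KDF2
+hashes the secret followed by a 32-bit big-endian counter (1, 2, \ldots) and
+concatenates the results. To keep the sketch self-contained,
+\function{toy\_hash} is a 64-bit FNV-1a function that is \emph{not}
+cryptographically secure, and the optional extra input (salt or ``other
+info'') is omitted. All names here are hypothetical.
+
+\begin{verbatim}
+#include <cstddef>
+#include <cstdint>
+#include <string>
+
+// NOT a cryptographic hash: 64-bit FNV-1a, giving an 8-byte output.
+std::string toy_hash(const std::string& in)
+{
+   std::uint64_t h = 14695981039346656037ULL;
+   for(std::string::size_type i = 0; i != in.size(); ++i)
+   {
+      h ^= static_cast<unsigned char>(in[i]);
+      h *= 1099511628211ULL;
+   }
+   std::string out(8, '\0');
+   for(int i = 0; i != 8; ++i)
+      out[i] = static_cast<char>((h >> (56 - 8 * i)) & 0xFF);
+   return out;
+}
+
+// 4-byte big-endian encoding of a counter.
+std::string be32(std::uint32_t c)
+{
+   std::string s(4, '\0');
+   for(int i = 0; i != 4; ++i)
+      s[i] = static_cast<char>((c >> (24 - 8 * i)) & 0xFF);
+   return s;
+}
+
+// KDF1-style: one hash of the secret, so the output is never longer
+// than the hash output.
+std::string kdf1(const std::string& secret, std::size_t want)
+{
+   const std::string h = toy_hash(secret);
+   return h.substr(0, want < h.size() ? want : h.size());
+}
+
+// KDF2-style: hash(secret || counter) for counter = 1, 2, ...,
+// concatenated and truncated, so any length can be produced.
+std::string kdf2(const std::string& secret, std::size_t want)
+{
+   std::string out;
+   for(std::uint32_t counter = 1; out.size() < want; ++counter)
+      out += toy_hash(secret + be32(counter));
+   return out.substr(0, want);
+}
+\end{verbatim}
+
+Asking either function for 20 bytes shows the difference: \function{kdf1}
+returns only the 8 bytes a single hash provides, while \function{kdf2}
+returns all 20.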
+
+The public value that should be used can be obtained by calling
+\function{public\_data}, which exists for any key that is associated with a
+key agreement algorithm. It returns a \type{SecureVector<byte>}.
+
+``KDF2(SHA-256)'' is by far the preferred algorithm for key derivation
+in new applications. The X9.42 algorithm may be useful in some
+circumstances, but unless you need X9.42 compatibility, KDF2 is easier
+to use.
+
+There is a Diffie-Hellman example included in the distribution, which you may
+want to examine.
+
+\subsection{Importing and Exporting PK Keys}
+
+[This section mentions \type{Pipe} and \type{DataSource}, which are not covered
+until later in the manual. Please read those sections for more about
+\type{Pipe} and \type{DataSource} and their uses.]
+
+There are many, many different (often conflicting) standards surrounding public
+key cryptography. There are, thankfully, only two major standards surrounding
+the representation of a public or private key: X.509 (for public keys), and
+PKCS \#8 (for private keys). Other crypto libraries, like OpenSSL and B-SAFE,
+also support these formats, so you can easily exchange keys with software that
+doesn't use Botan.
+
+In addition to ``plain'' public keys, Botan also supports X.509 certificates.
+These are documented in the section ``Certificate Handling'', later in this
+manual.
+
+\subsubsection{Public Keys}
+
+The interfaces for public and private keys are quite similar. Let's look at
+the X.509 (public key) functions first:
+\begin{verbatim}
+namespace X509 {
+ void encode(const X509_PublicKey& key, Pipe& out, X509_Encoding enc = PEM);
+ std::string PEM_encode(const X509_PublicKey& key);
+
+ X509_PublicKey* load_key(DataSource& in);
+ X509_PublicKey* load_key(const std::string& file);
+ X509_PublicKey* load_key(const SecureVector<byte>& buffer);
+}
+\end{verbatim}
+
+Basically, \function{X509::encode} takes an \type{X509\_PublicKey}
+(as of now, that's any RSA, DSA, or Diffie-Hellman key) and encodes it
+using \arg{enc}, which can be either \type{PEM} or
+\type{RAW\_BER}. Using \type{PEM} is \emph{highly} recommended for
+many reasons, including compatibility with other software, for
+transmission over 8-bit unclean channels, because it can be identified
+by a human without special tools, and because it sometimes allows more
+sane behavior of tools that process the data. It will place the
+encoding into \arg{out}. Remember that if you have just created the
+\type{Pipe} that you are passing to \function{X509::encode}, you need
+to call \function{start\_msg} first. Particularly with public keys,
+about 99\% of the time you just want to PEM encode the key and then
+write it to a file or something. In this case, it's probably easier to
+use \function{X509::PEM\_encode}. This function will simply return the
+PEM encoding of the key as a \type{std::string}.
+
+For loading a public key, the preferred method is one of the variants
+of \function{load\_key}. This function will return a newly allocated
+key based on the data from whatever source it is using (assuming, of
+course, the source is in fact storing a representation of a public
+key). The encoding used (PEM or BER) need not be specified; the format
+will be detected automatically. The key is allocated with
+\function{new}, and should be released with \function{delete} when you
+are done with it. The first takes a generic \type{DataSource} that
+you have to allocate~--~the others are simple wrapper functions that
+take either a filename or a memory buffer.
+
+So what can you do with the return value of \function{load\_key}? On
+its own, an \type{X509\_PublicKey} isn't particularly useful; you can't
+encrypt messages or verify signatures, or much else. But, using
+\function{dynamic\_cast}, you can figure out what kind of operations
+the key supports. Then, you can cast the key to the appropriate type
+and pass it to a higher-level class. For example:
+
+\begin{verbatim}
+ /* Might be RSA, might be ElGamal, might be ... */
+ std::auto_ptr<X509_PublicKey> key(X509::load_key("pubkey.asc"));
+ /* You MUST use dynamic_cast to convert, because of virtual bases */
+ PK_Encrypting_Key* enc_key = dynamic_cast<PK_Encrypting_Key*>(key.get());
+ if(!enc_key)
+    throw std::runtime_error("key does not support encryption");
+ std::auto_ptr<PK_Encryptor> enc(get_pk_encryptor(*enc_key, "EME1(SHA-256)"));
+ /* encrypt() also needs a RandomNumberGenerator, such as an AutoSeeded_RNG */
+ SecureVector<byte> cipher = enc->encrypt(some_message, size_of_message, rng);
+\end{verbatim}
+
+\subsubsection{Private Keys}
+
+There are two different options for private key import/export. The first is a
+plaintext version of the private key. This is supported by the following
+functions:
+
+\begin{verbatim}
+namespace PKCS8 {
+ void encode(const PKCS8_PrivateKey& key, Pipe& to, X509_Encoding enc = PEM);
+
+ std::string PEM_encode(const PKCS8_PrivateKey& key);
+}
+\end{verbatim}
+
+These functions are basically the same as the X.509 functions described
+previously. The only difference is that they take a \type{PKCS8\_PrivateKey}
+type (which, again, can be either RSA, DSA, or Diffie-Hellman, but this time
+the key must be a private key). In most situations, using these is a bad idea,
+because anyone can come along and grab the private key without having to know
+any passwords or other secrets. Unless you have very particular security
+requirements, always use the versions that encrypt the key based on a
+passphrase. For importing, the same functions can be used for encrypted and
+unencrypted keys.
+
+The other way to export a PKCS \#8 key is to first encode it in the same manner
+as done above, then encrypt it (using a passphrase and the techniques of PKCS
+\#5), and store the whole thing into another structure. This method is
+definitely preferred, since otherwise the private key is unprotected. The
+following functions support this technique:
+
+\begin{verbatim}
+namespace PKCS8 {
+ void encrypt_key(const PKCS8_PrivateKey& key, Pipe& out,
+ std::string passphrase, std::string pbe = "",
+ X509_Encoding enc = PEM);
+
+ std::string PEM_encode(const PKCS8_PrivateKey& key, std::string passphrase,
+ std::string pbe = "");
+}
+\end{verbatim}
+
+To export an encrypted private key, call \function{PKCS8::encrypt\_key}. The
+\arg{key}, \arg{out}, and \arg{enc} arguments are similar in usage to the ones
+for \function{PKCS8::encode}. As you might notice, there are two new arguments
+for \function{PKCS8::encrypt\_key}, however. The first is a passphrase (which
+you presumably got from a user somehow). This will be used to encrypt the key.
+The second new argument is \arg{pbe}; this specifies a particular password
+based encryption (or PBE) algorithm.
+
+The \function{PEM\_encode} version shown here is similar to the one that
+doesn't take a passphrase. Essentially it encrypts the key (using the default
+PBE algorithm), and then returns a C++ string with the PEM encoding of the key.
+
+If \arg{pbe} is blank, then the default algorithm (controlled by the
+``base/default\_pbe'' option) will be used. As shipped, this default is
+``PBE-PKCS5v20(SHA-1,TripleDES/CBC)''. This is among the more secure options
+of PKCS \#5, and is widely supported among implementations of PKCS \#5 v2.0. It
+uses a 168-bit TripleDES key (roughly 112 bits of effective security, because
+of meet-in-the-middle attacks), which should be more than sufficient. If you
+need compatibility with systems that only support PKCS \#5 v1.5, pass
+``PBE-PKCS5v15(MD5,DES/CBC)'' as \arg{pbe}. However, be warned that this PBE
+algorithm only has 56 bits of security against brute force attacks. As of
+1.4.5, all three keylengths of AES are also available as options, which can be
+used by specifying a PBE algorithm of ``PBE-PKCS5v20(SHA-1,AES-256/CBC)'' (or
+``AES-128'' or ``AES-192''). Support for AES is slightly non-standard, and
+some applications or libraries might not handle it. It is known that OpenSSL
+(0.9.7 and later) does handle AES for private key encryption.
+
+There may be some strange programs out there that support the v2.0 extensions
+to PBES1 but not PBES2; if you need to inter-operate with a program like that,
+use ``PBE-PKCS5v15(MD5,RC2/CBC)''. For example, OpenSSL supports this format
+(though since it also supports the v2.0 schemes, there is no reason not to just
+use TripleDES or AES). This scheme uses a 64-bit key that, while
+significantly better than a 56-bit key, is a bit too small for comfort.
+
+Last but not least, there are some functions that are basically identical to
+\function{X509::load\_key} that will load, and possibly decrypt, a PKCS \#8
+private key:
+
+\begin{verbatim}
+namespace PKCS8 {
+ PKCS8_PrivateKey* load_key(DataSource& in,
+ RandomNumberGenerator& rng,
+ const User_Interface& ui);
+ PKCS8_PrivateKey* load_key(DataSource& in,
+ RandomNumberGenerator& rng,
+ std::string passphrase = "");
+
+ PKCS8_PrivateKey* load_key(const std::string& filename,
+ RandomNumberGenerator& rng,
+ const User_Interface& ui);
+ PKCS8_PrivateKey* load_key(const std::string& filename,
+ RandomNumberGenerator& rng,
+ const std::string& passphrase = "");
+}
+\end{verbatim}
+
+The versions that take a \type{std::string} \arg{passphrase} are primarily for
+compatibility, but they are useful in limited circumstances. The
+\type{User\_Interface} versions are how \function{load\_key} is actually
+implemented, and provide much more flexibility. Essentially, if the
+passphrase given to a \type{std::string} version is not correct, then an
+exception is thrown and that is that. However, if you pass in a UI object
+instead, then the UI object can keep asking the user for the passphrase until
+they get it right (or until they cancel the action through the UI). A
+\type{User\_Interface} has very little to do with talking to users; it's just a
+way to glue together Botan and whatever user interface you happen to be
+using. You can think of it as a user interface interface. The default
+\type{User\_Interface} is actually very dumb, and effectively acts just like
+the versions taking the \type{std::string}.
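+
+The retry behaviour described above can be pictured with the following
+standalone sketch. It does not use Botan at all: \type{PassphrasePrompt},
+\type{ScriptedPrompt}, and \function{load\_with\_retries} are hypothetical
+names, and the \arg{try\_decrypt} callback stands in for the actual decryption
+attempt. The real \type{User\_Interface} class has its own interface, which this
+code does not reproduce.
+
+\begin{verbatim}
+#include <cstddef>
+#include <functional>
+#include <stdexcept>
+#include <string>
+#include <vector>
+
+// Hypothetical stand-in for a UI object (NOT Botan's User_Interface):
+// it supplies passphrase attempts and reports when the user gives up.
+class PassphrasePrompt {
+public:
+    virtual ~PassphrasePrompt() {}
+    // Stores the next attempt in 'passphrase'; returns false on cancel.
+    virtual bool ask(std::string& passphrase) = 0;
+};
+
+// Keeps asking until try_decrypt accepts a passphrase or the user cancels.
+// Returns the accepted passphrase; throws if the user cancels.
+std::string load_with_retries(
+    PassphrasePrompt& ui,
+    const std::function<bool(const std::string&)>& try_decrypt)
+{
+    std::string pass;
+    while (ui.ask(pass)) {
+        if (try_decrypt(pass))
+            return pass;          // correct passphrase: decryption succeeded
+    }
+    throw std::runtime_error("passphrase entry cancelled");
+}
+
+// Scripted prompt: replays fixed answers, then cancels. With a single
+// answer it behaves like the std::string overloads (one attempt only).
+class ScriptedPrompt : public PassphrasePrompt {
+public:
+    explicit ScriptedPrompt(const std::vector<std::string>& answers)
+        : answers_(answers), next_(0) {}
+
+    bool ask(std::string& passphrase) override {
+        if (next_ == answers_.size())
+            return false;         // out of answers: treat as cancel
+        passphrase = answers_[next_++];
+        return true;
+    }
+
+private:
+    std::vector<std::string> answers_;
+    std::size_t next_;
+};
+\end{verbatim}
+
+A prompt scripted with a wrong and then a correct passphrase succeeds on the
+second attempt, while one scripted with only the wrong passphrase ends in an
+exception, just as the \type{std::string} overloads do.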
+
+All versions need access to a \type{RandomNumberGenerator} in order to
+perform probabilistic tests on the loaded key material.
+
+After loading a key, you can use \function{dynamic\_cast} to find out what
+operations it supports, and use it appropriately. Remember to \function{delete}
+it once you are done with it.
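+
+The following standalone sketch illustrates the pattern. To keep it
+self-contained it uses a hypothetical key hierarchy (\type{PrivateKeyBase},
+\type{SigningKey}, \type{DecryptingKey}) and a hypothetical
+\function{load\_some\_key} in place of Botan's key classes and
+\function{load\_key}. Only the \function{dynamic\_cast} test and the final
+\function{delete} (performed here by \type{std::unique\_ptr}) carry over to
+real code.
+
+\begin{verbatim}
+#include <memory>
+#include <string>
+
+// Hypothetical key hierarchy standing in for the library's key classes.
+class PrivateKeyBase {
+public:
+    virtual ~PrivateKeyBase() {}
+    virtual std::string algo_name() const = 0;
+};
+
+class SigningKey : public PrivateKeyBase {
+public:
+    std::string algo_name() const override { return "DSA"; }
+    std::string sign(const std::string& msg) const { return "sig(" + msg + ")"; }
+};
+
+class DecryptingKey : public PrivateKeyBase {
+public:
+    std::string algo_name() const override { return "ElGamal"; }
+};
+
+// Hypothetical loader: like load_key, it returns a heap-allocated object
+// through a base-class pointer, and the caller owns the result.
+PrivateKeyBase* load_some_key(bool want_signer)
+{
+    if (want_signer)
+        return new SigningKey;
+    return new DecryptingKey;
+}
+
+// Discovers what the key can do by trying each capability with dynamic_cast.
+std::string describe(const PrivateKeyBase& key)
+{
+    if (const SigningKey* signer = dynamic_cast<const SigningKey*>(&key))
+        return key.algo_name() + " can sign: " + signer->sign("hello");
+    if (dynamic_cast<const DecryptingKey*>(&key))
+        return key.algo_name() + " can decrypt";
+    return key.algo_name() + ": no supported operation";
+}
+
+// unique_ptr performs the required delete when it goes out of scope.
+std::string load_and_describe(bool want_signer)
+{
+    std::unique_ptr<PrivateKeyBase> key(load_some_key(want_signer));
+    return describe(*key);
+}
+\end{verbatim}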
+
+\subsubsection{Limitations}
+
+As of now Nyberg-Rueppel and Rabin-Williams keys cannot be imported or
+exported, because they have no official ASN.1 OID or definition. ElGamal keys
+can (as of Botan 1.3.8) be imported and exported, but the only other
+implementation that supports the format is Peter Gutmann's Cryptlib. If you
+can help it, stick to RSA and DSA.
+
+\emph{Note}: Currently NR and RW are given basic ASN.1 key formats (which
+mirror DSA and RSA, respectively), which means that, if they are assigned an
+OID, they can be imported and exported just as easily as RSA and DSA. You can
+assign them an OID by putting a line in a Botan configuration file, calling
+\function{OIDS::add\_oid}, or editing \filename{src/policy.cpp}. Be warned that
+it is possible that a future version will use a format that is different from
+the current one (\ie, a newly standardized format).
+
+\pagebreak
+\section{Certificate Handling}
+
+A certificate is essentially a binding between some identifying information of
+a person or other entity (called a \emph{subject}) and a public key. This
+binding is asserted by a signature on the certificate, which is placed there by
+some authority (the \emph{issuer}) that at least claims that it knows the
+subject named in the certificate really ``owns'' the private key
+corresponding to the public key in the certificate.
+
+The major certificate format in use today is X.509v3, designed by ISO and
+further hacked on by dozens (hundreds?) of other organizations.
+
+When working with certificates, the main class to remember is
+\type{X509\_Certificate}. You can read an object of this type, but you can't
+create one on the fly; a CA object is necessary for actually making a new
+certificate. So for the most part, you only have to worry about reading them
+in, verifying the signatures, and getting the bits of data in them (most
+commonly the public key, and the information about the user of that key). An
+X.509v3 certificate can contain an essentially unlimited number of items
+related to all kinds of things. Botan doesn't support a lot of them, simply
+because nobody uses them and they're an impossible mess to work with. This
+section documents only the most commonly used of the supported items; for the
+rest, read \filename{x509cert.h} and \filename{asn1\_obj.h} (which has the
+definitions of various common ASN.1 constructs used in X.509).
+
+\subsection{So what's in an X.509 certificate?}
+
+Obviously, you want to be able to get the public key. This is achieved by
+calling the member function \function{subject\_public\_key}, which will return
+an \type{X509\_PublicKey*}. As to what to do with this, read about
+\function{load\_key} in the section ``Importing and Exporting PK Keys''. In the
+general case, this could be any kind of public key, though 99\% of the time it
+will be an RSA key. However, Diffie-Hellman and DSA keys are also supported, so
+be careful about how you treat this. It is also a wise idea to examine the
+value returned by \function{constraints}, to see what uses the public key is
+approved for.
+
+The second major piece of information you'll want is the name/email/etc of the
+person to whom this certificate is assigned. Here is where things get a little
+nasty. X.509v3 has two (well, mostly just two $\ldots$) different places where
+you can stick information about the user: the \emph{subject} field, and an
+extension called \emph{subjectAlternativeName}. The \emph{subject} field is
+supposed to include only the following information: country, organization
+(possibly), an organizational sub-unit name (possibly), and a so-called common
+name. The common name is usually the name of the person, or it could be a title
+associated with a position of some sort in the organization. It may also
+include fields for state/province and locality. What exactly a locality is,
+nobody knows, but it's usually given as a city name.
+
+Botan doesn't currently support any of the Unicode variants used in ASN.1
+(UTF-8, UCS-2, and UCS-4), any of which could be used for the fields in the
+DN. This could be problematic, particularly in Asia and other areas where
+non-ASCII characters are needed for most names. The UTF-8 and UCS-2 string
+types \emph{are} accepted (in fact, UTF-8 is used when encoding much of the
+time), but if any of the characters included in the string are not in ISO
+8859-1 (\ie, code points 0 \ldots 255), an exception will be thrown. Currently the
+\type{ASN1\_String} type holds its data as ISO 8859-1 internally (regardless
+of local character set); this would have to be changed to hold UCS-2 or UCS-4
+in order to support Unicode (also, many interfaces in the X.509 code would have
+to accept or return a \type{std::wstring} instead of a \type{std::string}).
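+
+The restriction can be illustrated with a small converter from UTF-8 to ISO
+8859-1 that throws as soon as it meets a character above 255. This is a
+standalone sketch written for this manual (\function{utf8\_to\_latin1} is a
+hypothetical name), not the code \type{ASN1\_String} uses, and it validates its
+UTF-8 input only as far as that check requires.
+
+\begin{verbatim}
+#include <stdexcept>
+#include <string>
+
+// Converts UTF-8 text to ISO 8859-1 (one byte per character), throwing
+// std::invalid_argument for any character above 255 or malformed input.
+std::string utf8_to_latin1(const std::string& in)
+{
+    std::string out;
+    for (std::string::size_type i = 0; i < in.size(); ++i) {
+        unsigned char c = static_cast<unsigned char>(in[i]);
+        if (c < 0x80) {                        // ASCII: copied unchanged
+            out += static_cast<char>(c);
+        } else if ((c & 0xE0) == 0xC0) {       // two-byte sequence
+            if (i + 1 >= in.size())
+                throw std::invalid_argument("truncated UTF-8 sequence");
+            unsigned char c2 = static_cast<unsigned char>(in[i + 1]);
+            if ((c2 & 0xC0) != 0x80)
+                throw std::invalid_argument("bad UTF-8 continuation byte");
+            unsigned int cp = ((c & 0x1Fu) << 6) | (c2 & 0x3Fu);
+            if (cp < 0x80)
+                throw std::invalid_argument("overlong UTF-8 encoding");
+            if (cp > 0xFF)
+                throw std::invalid_argument("character not in ISO 8859-1");
+            out += static_cast<char>(cp);
+            ++i;                               // consumed the continuation byte
+        } else if ((c & 0xC0) == 0x80) {
+            throw std::invalid_argument("unexpected UTF-8 continuation byte");
+        } else {
+            // Remaining lead bytes start sequences of three or more bytes
+            // (code points of at least 0x800) or are invalid; either way
+            // the text cannot be stored as ISO 8859-1.
+            throw std::invalid_argument("character not in ISO 8859-1");
+        }
+    }
+    return out;
+}
+\end{verbatim}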
+
+Like the distinguished names, subject alternative names can contain a lot of
+things that Botan will flat out ignore (most of which you would never actually
+want to use). However, there are three very useful pieces of information that
+this extension might hold: an email address (``person@site1.com''), a DNS name
+(``somehost.site2.com''), or a URI (``http://www.site3.com'').
+
+So, how to get the information? Simply call \function{subject\_info} with the
+name of the piece of information you want, and it will return a
+\type{std::string} that is either empty (signifying that the certificate
+doesn't have this information), or has the information requested. There are
+several names for each possible item, but the most easily readable ones are:
+``Name'', ``Country'', ``Organization'', ``Organizational Unit'', ``Locality'',
+``State'', ``RFC822'', ``URI'', and ``DNS''.
+
+You can also get information about the issuer of the certificate in the same
+way, using \function{issuer\_info}.
+
+\subsubsection{X.509v3 Extensions}
+
+X.509v3 specifies a large number of possible extensions. Botan supports some,
+but by no means all of them. This section lists which ones are supported, and
+notes areas where there may be problems with the handling. You have to be
+pretty familiar with X.509 in order to understand what this is talking about.
+
+\begin{list}{$\cdot$}
+ \item Key Usage and Extended Key Usage: No problems known.
+
+ \item Basic Constraints: No problems known. The default for a v1/v2
+ certificate is to assume it is a CA if and only if the option
+ ``x509/default\_to\_ca'' is set. A v3 certificate is marked as a CA if
+ (and only if) the basic constraints extension is present and set for a
+ CA cert.
+
+ \item Subject Alternative Names: Only the ``rfc822Name'', ``dNSName'', and
+ ``uniformResourceIdentifier'' fields will be stored; all others are
+ ignored.
+
+ \item Issuer Alternative Names: Same restrictions as the Subject Alternative
+ Names extension. New certificates generated by Botan never include the
+ issuer alternative name.
+
+ \item Authority Key Identifier: Only the version using KeyIdentifier is
+ supported. If the GeneralNames version is used and the extension is
+ critical, an exception is thrown. If both the KeyIdentifier and
+ GeneralNames versions are present, then the KeyIdentifier will be
+ used, and the GeneralNames ignored.
+
+ \item Subject Key Identifier: No problems known.
+\end{list}
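+
+The CA rule from the Basic Constraints entry above can be restated as a small
+standalone function. \type{CertSummary} and \function{is\_ca\_cert} are
+hypothetical, and the \arg{default\_to\_ca} parameter simply stands for the
+value of the ``x509/default\_to\_ca'' option.
+
+\begin{verbatim}
+// Hypothetical summary of the fields that matter for the CA decision.
+struct CertSummary {
+    int version;                  // 1, 2, or 3
+    bool has_basic_constraints;   // is the extension present?
+    bool basic_constraints_ca;    // cA flag inside the extension
+};
+
+// v1/v2 certificates follow the configuration option; a v3 certificate
+// is a CA only if Basic Constraints is present and marks it as one.
+bool is_ca_cert(const CertSummary& cert, bool default_to_ca)
+{
+    if (cert.version < 3)
+        return default_to_ca;
+    return cert.has_basic_constraints && cert.basic_constraints_ca;
+}
+\end{verbatim}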
+
+\subsubsection{Revocation Lists}
+
+It will occasionally happen that a certificate must be revoked before its
+expiration date. Examples of this happening include the private key being
+compromised, or the user to which it has been assigned leaving an
+organization. Certificate revocation lists are an answer to this problem
+(though online certificate validation techniques are starting to become
+somewhat more popular). Essentially, every once in a while the CA will release
+a CRL, listing all certificates that have been revoked. The CRL also records
+information such as when a particular certificate was revoked, and for what
+reason. In most systems, it is wise to support some form
+of certificate revocation, and CRLs handle this fairly easily.
+
+For most users, processing a CRL is quite easy. All you have to do is call the
+constructor, which will take a filename (or a \type{DataSource\&}). The CRLs
+can either be in raw BER/DER, or in PEM format; the constructor will figure out
+which format without any extra information. For example:
+
+\begin{verbatim}
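+ // The format (BER/DER or PEM) is detected automatically.
+ X509_CRL crl1("crl1.der");          // construct from a filename
+
+ DataSource_Stream in("crl2.pem");
+ X509_CRL crl2(in);                  // construct from a DataSource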
+\end{verbatim}
+
+After that, pass the \type{X509\_CRL} object to an \type{X509\_Store} object
+using its member function \type{X509\_Code}
+\function{add\_crl}(\type{const X509\_CRL\&}); all future verifications will
+then take into account the certificates listed, assuming \function{add\_crl}
+returns \type{VERIFIED}. If it doesn't return
+\type{VERIFIED}, then the return value is an error code signifying that the CRL
+could not be processed due to some problem (which could range from the issuing
+certificate not being found, to the CRL having some format problem). For more
+about the \type{X509\_Store} API, read the subsection later in this section.
+
+\subsection{Reading Certificates}
+
+\type{X509\_Certificate} has two constructors, each of which takes a source of
+data: one takes the name of a file to read, and the other a \type{DataSource\&}.
+
+\subsection{Storing and Using Certificates}
+
+If you read a certificate, you probably want to verify the signature on
+it. However, consider that to do so, we may have to verify the signature on the
+certificate that we used to verify the first certificate, and on and on until
+we hit the top of the certificate tree somewhere. It would be a huge pain
+to have to handle all of that manually in every application, so there is
+something that does it for you: \type{X509\_Store}.
+
+This is a pretty easy thing to use. The basic operations are: put certificates
+and CRLs into it, search for certificates, and attempt to verify
+certificates. That's about it. In the future, there will be support for online
+retrieval of certificates and CRLs (\eg with the HTTP cert-store interface
+currently under consideration by PKIX).
+
+\subsubsection{Adding Certificates}
+
+You can add new certificates to a certificate store using any of these
+functions:
+
+\function{add\_cert}(\type{const X509\_Certificate\&} \arg{cert},
+ \type{bool} \arg{trusted} \type{= false})
+
+\function{add\_certs}(\type{DataSource\&} \arg{source})
+
+\function{add\_trusted\_certs}(\type{DataSource\&} \arg{source})
+
+The versions that take a \type{DataSource\&} will add all the certificates
+that it can find in that source.
+
+All of them add the cert(s) to the store. The ``trusted'' certificates are the
+ones that you have some reason to trust are genuine. For example, say your
+application is working with certificates that are owned by employees of some
+company, and all of their certificates are signed by the company CA, whose
+certificate is in turn signed by a commercial root CA. What you would then do
+is include the certificate of the commercial CA with your application, and read
+it in as a trusted certificate. From there, you could verify the company CA's
+certificate, and then use that to verify the end user's certificates. Only
+self-signed certificates may be considered trusted.
+
+\subsubsection{Adding CRLs}
+
+\type{X509\_Code} \function{add\_crl}(\type{const X509\_CRL\&} \arg{crl});
+
+This will process the CRL and mark the revoked certificates. This will also
+work if a revoked certificate is added to the store sometime after the CRL is
+processed. The function can return an error code (listed later), or will return
+\type{VERIFIED} if everything completed successfully.
+
+\subsubsection{Storing Certificates}
+
+You can output a set of certificates by calling \function{PEM\_encode}, which
+will return a \type{std::string} containing each of the certificates in the
+store, PEM encoded and concatenated. This simple format can easily be read by
+both Botan and other libraries/applications.
+
+\subsubsection{Searching for Certificates}
+
+You can find certificates in the store with a series of functions contained
+in the \namespace{X509\_Store\_Search} namespace:
+
+\begin{verbatim}
+namespace X509_Store_Search {
+std::vector<X509_Certificate> by_email(const X509_Store& store,
+ const std::string& email_addr);
+std::vector<X509_Certificate> by_name(const X509_Store& store,
+ const std::string& name);
+std::vector<X509_Certificate> by_dns(const X509_Store& store,
+ const std::string& dns_name);
+}
+\end{verbatim}
+
+These functions will return a (possibly empty) vector of certificates from
+\arg{store} matching your search criteria. The email address and DNS name
+searches are case-insensitive but are sensitive to extra whitespace and so
+on. The name search will do case-insensitive substring matching, so, for
+example, calling \function{X509\_Store\_Search::by\_name}(\arg{your\_store},
+``dob'') will return certificates for ``J.R. 'Bob' Dobbs'' and
+``H. Dobbertin'', assuming both of those certificates are in \arg{your\_store}.
+
+You could then display the results to a user, and allow them to select the
+appropriate one. Searching using an email address as the key is usually more
+effective than the name, since email addresses are rarely shared.
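+
+The matching rules above can be mimicked in a few lines of standard C++. This
+is a sketch of the documented behaviour, not Botan's code: \type{CertInfo},
+\function{to\_lower}, \function{match\_email}, and \function{match\_name} are
+hypothetical, and a real search works on the certificates held in an
+\type{X509\_Store}. Case folding here is plain ASCII, which is an assumption.
+
+\begin{verbatim}
+#include <cctype>
+#include <string>
+#include <vector>
+
+// Hypothetical, simplified view of a stored certificate.
+struct CertInfo {
+    std::string name;   // CommonName from the subject DN
+    std::string email;  // RFC822 address
+};
+
+// ASCII lower-casing used for the case-insensitive comparisons below.
+std::string to_lower(std::string s)
+{
+    for (char& c : s)
+        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
+    return s;
+}
+
+// Email search: the whole address must match, ignoring case only;
+// extra whitespace makes the comparison fail.
+std::vector<CertInfo> match_email(const std::vector<CertInfo>& certs,
+                                  const std::string& email)
+{
+    std::vector<CertInfo> out;
+    for (const CertInfo& c : certs)
+        if (to_lower(c.email) == to_lower(email))
+            out.push_back(c);
+    return out;
+}
+
+// Name search: case-insensitive substring match.
+std::vector<CertInfo> match_name(const std::vector<CertInfo>& certs,
+                                 const std::string& fragment)
+{
+    const std::string needle = to_lower(fragment);
+    std::vector<CertInfo> out;
+    for (const CertInfo& c : certs)
+        if (to_lower(c.name).find(needle) != std::string::npos)
+            out.push_back(c);
+    return out;
+}
+\end{verbatim}
+
+With the two example names from the text stored, a name search for ``dob''
+returns both certificates, while an email search with a leading space finds
+nothing.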
+
+\subsubsection{Certificate Stores}
+
+An object of type \type{Certificate\_Store} is a generalized interface to an
+external source for certificates (and CRLs). Examples of such a store would be
+one that looks up certificates in a SQL database, or one that contacts a CGI
+script running on an HTTP server. There are currently three mechanisms for
+looking up a certificate, and one for retrieving CRLs. By default, most of
+these mechanisms will simply return an empty \type{std::vector} of
+\type{X509\_Certificate}. This storage mechanism is \emph{only} queried when
+doing certificate validation: it allows you to distribute only the root key
+with an application, and let some online method handle getting all the other
+certificates that are needed to validate an end entity certificate. In
+particular, the \namespace{X509\_Store\_Search} functions will not attempt to
+access the external database.
+
+The three certificate lookup methods are \function{by\_SKID} (Subject Key
+Identifier), \function{by\_name} (the CommonName DN entry), and
+\function{by\_email} (stored in either the distinguished name, or in a
+subjectAlternativeName extension). The name and email versions take a
+\type{std::string}, while the SKID version takes a \type{SecureVector<byte>}
+containing the subject key identifier in raw binary. You can choose not to
+implement \function{by\_name} or \function{by\_email}, but \function{by\_SKID}
+is mandatory to implement, and, currently, is the only version that is used by
+\type{X509\_Store}.
+
+Finally, there is a method for finding CRLs, called \function{get\_crls\_for},
+that takes an \type{X509\_Certificate} object, and returns a
+\type{std::vector} of \type{X509\_CRL}. While generally there will be only one
+CRL, the use of the vector makes it easy to return no CRLs (\eg, if the
+certificate store doesn't support retrieving them), or return multiple ones
+(for example, if the certificate store can't determine precisely which key was
+used to sign the certificate). Implementing the function is optional, and by
+default will return no CRLs. If it is available, it will be used by
+\type{X509\_Store} during validation.
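+
+The division of labour between the mandatory and the optional lookups can be
+sketched as an abstract class whose optional methods return empty results by
+default. Everything below (\type{ExternalCertSource}, \type{MapCertSource}, and
+the placeholder \type{Cert} and \type{Crl} types) is hypothetical and only
+mirrors the description above; consult the actual \type{Certificate\_Store}
+declaration in the library headers for the real signatures.
+
+\begin{verbatim}
+#include <map>
+#include <string>
+#include <vector>
+
+typedef std::vector<unsigned char> Bytes;
+
+// Placeholder certificate and CRL types for this sketch.
+struct Cert { std::string subject; Bytes skid; };
+struct Crl  { std::string issuer; };
+
+// Hypothetical interface: by_skid must be implemented; the other lookups
+// and CRL retrieval are optional and return nothing by default.
+class ExternalCertSource {
+public:
+    virtual ~ExternalCertSource() {}
+    virtual std::vector<Cert> by_skid(const Bytes& skid) const = 0;
+    virtual std::vector<Cert> by_name(const std::string&) const
+        { return std::vector<Cert>(); }
+    virtual std::vector<Cert> by_email(const std::string&) const
+        { return std::vector<Cert>(); }
+    virtual std::vector<Crl> get_crls_for(const Cert&) const
+        { return std::vector<Crl>(); }
+};
+
+// Example source backed by an in-memory map (standing in for a SQL
+// database or HTTP service); only the mandatory lookup is provided.
+class MapCertSource : public ExternalCertSource {
+public:
+    void add(const Cert& c) { by_key_[c.skid].push_back(c); }
+
+    std::vector<Cert> by_skid(const Bytes& skid) const override {
+        auto it = by_key_.find(skid);
+        return it == by_key_.end() ? std::vector<Cert>() : it->second;
+    }
+
+private:
+    std::map<Bytes, std::vector<Cert> > by_key_;
+};
+\end{verbatim}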
+
+As for actually using such a store, you have to tell \type{X509\_Store} about
+it, by calling the \type{X509\_Store} member function
+
+\function{add\_new\_certstore}(\type{Certificate\_Store}* \arg{new\_store})
+
+The argument, \arg{new\_store}, will be deleted by \type{X509\_Store}'s
+destructor, so make sure to allocate it with \function{new}.
+
+\subsubsection{Verifying Certificates}
+
+There is a single function in \type{X509\_Store} related to verifying a
+certificate:
+
+\type{X509\_Code}
+\function{validate\_cert}(\type{const X509\_Certificate\&} \arg{cert},
+ \type{Cert\_Usage} \arg{usage} = \type{ANY})
+
+To sum things up simply, it returns \type{VERIFIED} if the certificate can
+safely be considered valid for the usage(s) described by \arg{usage}, and an
+error code if it is not. Naturally, things are a bit more complicated than
+that. The enum \type{Cert\_Usage} is defined inside the \type{X509\_Store}
+class and (currently) can take on any of the values \type{ANY} (any usage is
+OK), \type{TLS\_SERVER} (for SSL/TLS server authentication), \type{TLS\_CLIENT}
+(for SSL/TLS client authentication), \type{CODE\_SIGNING},
+\type{EMAIL\_PROTECTION} (email encryption, usually this means S/MIME),
+\type{TIME\_STAMPING} (in theory any time stamp application, usually IETF
+PKIX's Time Stamp Protocol), or \type{CRL\_SIGNING}. Note that Microsoft's code
+signing system, certainly the most widely used, uses a completely different
+(and basically undocumented) method for marking certificates for code signing.
+
+First, how does it know if a certificate is valid? Basically, a certificate is
+valid if both of the following hold: a) the signature in the certificate can be
+verified using the public key in the issuer's certificate, and b) the issuer's
+certificate is a valid CA certificate. Note that this definition is
+recursive. We get out of this by ``bottoming out'' when we reach a certificate
+that we consider trusted. In general this will either be a commercial root CA,
+or an organization or application specific CA.
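+
+The recursive definition, and the way it bottoms out at a trusted certificate,
+can be expressed as the following simplified sketch. \type{Node},
+\type{CertDb}, and \function{validate\_chain} are hypothetical; signature
+checking is replaced by a precomputed flag; validity periods, key usage, and
+CRLs are ignored; and the depth limit of 10 is an arbitrary guard against
+issuer loops.
+
+\begin{verbatim}
+#include <map>
+#include <string>
+
+// Hypothetical certificate record for this sketch.
+struct Node {
+    std::string issuer;   // subject name of the issuing certificate
+    bool signature_ok;    // stands in for verifying with the issuer's key
+    bool is_ca;           // may this certificate issue others?
+    bool trusted;         // explicitly trusted (e.g. a shipped root)
+};
+
+typedef std::map<std::string, Node> CertDb;   // keyed by subject name
+
+// Valid if trusted, or if (a) the signature verifies under the issuer's
+// key and (b) the issuer is itself a valid CA certificate.
+bool validate_chain(const CertDb& db, const std::string& subject, int depth = 0)
+{
+    if (depth > 10)
+        return false;                         // guard against loops
+    CertDb::const_iterator it = db.find(subject);
+    if (it == db.end())
+        return false;                         // certificate not found
+    const Node& cert = it->second;
+    if (cert.trusted)
+        return true;                          // the recursion bottoms out here
+    if (!cert.signature_ok)
+        return false;                         // condition (a) fails
+    CertDb::const_iterator issuer = db.find(cert.issuer);
+    if (issuer == db.end() || !issuer->second.is_ca)
+        return false;                         // issuer missing or not a CA
+    return validate_chain(db, cert.issuer, depth + 1);  // condition (b)
+}
+\end{verbatim}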
+
+There are actually a few other restrictions (validity periods, key usage
+restrictions, etc), but the above summarizes the major points of the validation
+algorithm. In theory, Botan implements the certificate path validation
+algorithm given in RFC 2459, but in practice it does not (yet), because we
+don't support the X.509v3 policy or name constraint extensions.
+
+The default value of \arg{usage}, \type{ANY}, does not mean valid for every
+use; it means ``is valid for some usage''. This is generally
+fine, and in fact requiring that an arbitrary certificate support a particular
+usage will likely result in a lot of failures, unless your application is very
+careful to always issue certificates with the proper extensions, and you never
+use certificates generated by other apps.
+
+Return values for \function{validate\_cert} (and \function{add\_crl}) include:
+
+\begin{list}{$\cdot$}
+ \item VERIFIED: The certificate is valid for the specified use.
+ \item INVALID\_USAGE: The certificate cannot be used for the specified use.
+
+ \item CANNOT\_ESTABLISH\_TRUST: The root certificate was not marked as
+ trusted.
+ \item CERT\_CHAIN\_TOO\_LONG: The certificate chain exceeded the length
+ allowed by a basicConstraints extension.
+ \item SIGNATURE\_ERROR: An invalid signature was found.
+ \item POLICY\_ERROR: Some problem with the certificate policies was found.
+
+ \item CERT\_FORMAT\_ERROR: Some format problem was found in a certificate.
+ \item CERT\_ISSUER\_NOT\_FOUND: The issuer of a certificate could not be
+ found.
+ \item CERT\_NOT\_YET\_VALID: The certificate is not yet valid.
+ \item CERT\_HAS\_EXPIRED: The certificate has expired.
+ \item CERT\_IS\_REVOKED: The certificate has been revoked.
+
+ \item CRL\_FORMAT\_ERROR: Some format problem was found in a CRL.
+ \item CRL\_ISSUER\_NOT\_FOUND: The issuer of a CRL could not be found.
+ \item CRL\_NOT\_YET\_VALID: The CRL is not yet valid.
+ \item CRL\_HAS\_EXPIRED: The CRL has expired.
+
+ \item CA\_CERT\_CANNOT\_SIGN: The CA certificate found does not
+ contain a public key that allows signature verification.
+ \item CA\_CERT\_NOT\_FOR\_CERT\_ISSUER: The CA cert found is not allowed to
+ issue certificates.
+ \item CA\_CERT\_NOT\_FOR\_CRL\_ISSUER: The CA cert found is not allowed to
+ issue CRLs.
+
+ \item UNKNOWN\_X509\_ERROR: Some other error occurred.
+
+\end{list}
+
+\subsection{Certificate Authorities}
+
+Setting up a CA for X.509 certificates is actually probably the easiest thing
+to do related to X.509. A CA is represented by the type \type{X509\_CA}, which
+can be found in \filename{x509\_ca.h}. A CA always needs its own certificate,
+which can either be a self-signed certificate (see below on how to create one)
+or one issued by another CA (see the section on PKCS \#10 requests). Creating
+a CA object is done by the following constructor:
+
+\begin{verbatim}
+ X509_CA(const X509_Certificate& cert, const PKCS8_PrivateKey& key);
+\end{verbatim}
+
+The private key is the private key corresponding to the public key in the
+CA's certificate.
+
+Generally, requests for new certificates are supplied to a CA in the form of
+PKCS \#10 certificate requests (called a \type{PKCS10\_Request} object in
+Botan). These are decoded in a similar manner to
+certificates/CRLs/etc. Generally, a request is vetted by humans (who somehow
+verify that the name in the request corresponds to the name of the person who
+requested it), and then signed by a CA key, generating a new certificate.
+
+\begin{verbatim}
+ X509_Certificate sign_request(const PKCS10_Request&) const;
+\end{verbatim}
+
+\subsubsection{Generating CRLs}
+
+As mentioned previously, the ability to process CRLs is highly important in
+many PKI systems. In fact, according to strict X.509 rules, you must not
+validate any certificate if the appropriate CRLs are not available (though
+hardly any systems are that strict). In any case, a CA should have a valid CRL
+available at all times.
+
+Of course, you might be wondering what to do if no certificates have been
+revoked. CRLs can be issued without any revoked certificates; the list of
+certificates will simply be empty. To generate a new, empty CRL, call
+\type{X509\_CRL}
+\function{X509\_CA::new\_crl}(\type{u32bit}~\arg{seconds}~=~0). If
+\arg{seconds} is the default 0, then the normal default CRL next update time
+(the value of the ``x509/crl/next\_update'' option) will
+be used. Otherwise, \arg{seconds} specifies how long (in seconds) it will be
+until the CRL's next update time (after this time, most clients will reject the
+CRL as too old).
+
+On the other hand, you may have issued a CRL before. In that case, you will
+want to issue a new CRL that contains all previously revoked
+certificates, along with any new ones. This is done by calling the
+\type{X509\_CA} member function
+\function{update\_crl}(\type{X509\_CRL}~\arg{old\_crl},
+\type{std::vector<CRL\_Entry>}~\arg{new\_revoked},
+\type{u32bit}~\arg{seconds}~=~0), where \arg{old\_crl} is the last CRL this
+CA issued, and \arg{new\_revoked} is a list of any newly revoked certificates.
+The function returns a new \type{X509\_CRL} to make available for clients. The
+\arg{seconds} argument has the same meaning as for \function{new\_crl}.
+
+The \type{CRL\_Entry} type is a structure that contains, at a minimum, the
+serial number of the revoked certificate. As an issuer should never reuse a
+serial number, the pair of issuer and serial number should uniquely identify any
+certificate. In this case, we represent the serial number as a
+\type{SecureVector<byte>} called \arg{serial}. There are two additional
+(optional) values, an enumeration called \type{CRL\_Code} that specifies the
+reason for revocation (\arg{reason}), and an object that represents the time
+that the certificate became invalid (if this information is known).
+
+If you wish to remove an old entry from the CRL, insert a new entry for the
+same cert, with a \arg{reason} code of \type{DELETE\_CRL\_ENTRY}. For example,
+if a revoked certificate has expired ``normally'', there is no reason to continue
+to explicitly revoke it, since clients will reject the cert as expired in any
+case.
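+
+The bookkeeping behind \function{update\_crl} (carry over the old entries, add
+the new ones, and drop any serial whose new entry has the reason
+\type{DELETE\_CRL\_ENTRY}), together with the \arg{seconds} rule shared with
+\function{new\_crl}, can be sketched as follows. This is an illustration, not
+Botan's implementation: \type{Entry}, the reduced \type{Reason} enum,
+\function{merge\_crl\_entries}, and \function{next\_update} are hypothetical,
+and serial numbers are simplified to hex strings.
+
+\begin{verbatim}
+#include <ctime>
+#include <map>
+#include <string>
+#include <vector>
+
+// Hypothetical subset of revocation reasons (not Botan's CRL_Code values).
+enum Reason { UNSPECIFIED, KEY_COMPROMISE, CA_COMPROMISE, DELETE_CRL_ENTRY };
+
+// Hypothetical, simplified CRL entry: serial number as a hex string.
+struct Entry {
+    std::string serial;
+    Reason reason;
+};
+
+// Combines the previous CRL's entries with newly revoked ones.
+// A new entry whose reason is DELETE_CRL_ENTRY removes that serial.
+std::vector<Entry> merge_crl_entries(const std::vector<Entry>& old_entries,
+                                     const std::vector<Entry>& new_revoked)
+{
+    std::map<std::string, Entry> by_serial;   // keeps entries sorted by serial
+    for (const Entry& e : old_entries)
+        by_serial[e.serial] = e;
+    for (const Entry& e : new_revoked) {
+        if (e.reason == DELETE_CRL_ENTRY)
+            by_serial.erase(e.serial);
+        else
+            by_serial[e.serial] = e;          // a newer entry replaces an older one
+    }
+    std::vector<Entry> result;
+    for (const auto& kv : by_serial)
+        result.push_back(kv.second);
+    return result;
+}
+
+// Next-update time: seconds == 0 selects the configured default interval.
+std::time_t next_update(std::time_t now, unsigned int seconds,
+                        unsigned int default_seconds)
+{
+    return now + static_cast<std::time_t>(seconds == 0 ? default_seconds : seconds);
+}
+\end{verbatim}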
+
+\subsubsection{Self-Signed Certificates}
+
+Generating a new self-signed certificate can often be useful, for example when
+setting up a new root CA, or for use in email applications. In this case,
+the solution is summed up simply as:
+
+\begin{verbatim}
+namespace X509 {
+ X509_Certificate create_self_signed_cert(const X509_Cert_Options& opts,
+ const PKCS8_PrivateKey& key);
+}
+\end{verbatim}
+
+Where \arg{key} is obviously the private key you wish to use (the public key,
+used in the certificate itself, is extracted from the private key), and
+\arg{opts} is a structure that has various bits of information that will be
+used in creating the certificate (this structure, and its use, is discussed
+below). This function is found in the header \filename{x509self.h}. There is an
+example of using this function in the \filename{self\_sig} example.
+
+\subsubsection{Creating PKCS \#10 Requests}
+
+Also in \filename{x509self.h}, there is a function for generating new PKCS \#10
+certificate requests.
+
+\begin{verbatim}
+namespace X509 {
+ PKCS10_Request create_cert_req(const X509_Cert_Options&,
+ const PKCS8_PrivateKey&);
+}
+\end{verbatim}
+
+This function acts quite similarly to \function{create\_self\_signed\_cert},
+except it instead returns a PKCS \#10 certificate request. After creating it,
+one would typically transmit it to a CA, who signs it and returns a freshly
+minted X.509 certificate. There is an example of using this function in the
+\filename{pkcs10} example.
+
+\subsubsection{Certificate Options}
+
+So what is this \type{X509\_Cert\_Options} thing we've been passing around?
+Basically, it's a bunch of information that will end up being stored into the
+certificate. This information comes in 3 major flavors: information about the
+subject (CA or end-user), the validity period of the certificate, and
+restrictions on the usage of the certificate.
+
+First and foremost are a number of \type{std::string} members, which contain
+various bits of information about the user: \arg{common\_name},
+\arg{serial\_number}, \arg{country}, \arg{organization}, \arg{org\_unit},
+\arg{locality}, \arg{state}, \arg{email}, \arg{dns\_name}, and \arg{uri}. As
+many of these as possible should be filled in (especially an email address),
+though the only required ones are \arg{common\_name} and \arg{country}.
+
+There is another value that is only useful when creating a PKCS \#10 request,
+which is called \arg{challenge}. This is a challenge password, which you can
+later use to request certificate revocation (\emph{if} the CA supports doing
+revocations in this manner).
+
+Then there is the validity period; these are set with \function{not\_before}
+and \function{not\_after}. Both of these functions also take a
+\type{std::string}, which specifies when the certificate should start being
+valid, and when it should stop being valid. If you don't set the starting
+validity period, it will automatically choose the current time. If you don't
+set the ending time, it will choose the starting time plus a default time
+period. The arguments to these functions specify the time in the following
+format: ``2002/11/27 1:50:14''. The time is in 24-hour format, and the date is
+encoded as year/month/day. The date must be specified, but you can omit the
+time or trailing parts of it, for example ``2002/11/27 1:50'' or
+``2002/11/27''.
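+
+A parser for this format might look like the following standalone sketch.
+\type{ParsedTime} and \function{parse\_cert\_time} are hypothetical names, not
+Botan functions; filling omitted time fields with zero and the simple range
+checks are assumptions of the sketch.
+
+\begin{verbatim}
+#include <cctype>
+#include <cstddef>
+#include <stdexcept>
+#include <string>
+#include <vector>
+
+struct ParsedTime {
+    int year, month, day, hour, minute, second;
+};
+
+// Splits s on sep, keeping empty fields so malformed input is detected.
+static std::vector<std::string> split(const std::string& s, char sep)
+{
+    std::vector<std::string> parts;
+    std::string::size_type start = 0, pos;
+    while ((pos = s.find(sep, start)) != std::string::npos) {
+        parts.push_back(s.substr(start, pos - start));
+        start = pos + 1;
+    }
+    parts.push_back(s.substr(start));
+    return parts;
+}
+
+// Converts 1 to 4 decimal digits to an int; anything else throws.
+static int to_int(const std::string& s)
+{
+    if (s.empty() || s.size() > 4)
+        throw std::invalid_argument("bad time field: '" + s + "'");
+    int v = 0;
+    for (char c : s) {
+        if (!std::isdigit(static_cast<unsigned char>(c)))
+            throw std::invalid_argument("bad time field: '" + s + "'");
+        v = v * 10 + (c - '0');
+    }
+    return v;
+}
+
+// Parses "YYYY/MM/DD", optionally followed by " H", " H:MM" or " H:MM:SS".
+// Omitted time fields stay 0.
+ParsedTime parse_cert_time(const std::string& s)
+{
+    ParsedTime t = {0, 0, 0, 0, 0, 0};
+    std::string::size_type space = s.find(' ');
+    std::vector<std::string> date = split(s.substr(0, space), '/');
+    if (date.size() != 3)
+        throw std::invalid_argument("date must be year/month/day");
+    t.year = to_int(date[0]);
+    t.month = to_int(date[1]);
+    t.day = to_int(date[2]);
+    if (space != std::string::npos) {
+        std::vector<std::string> tm = split(s.substr(space + 1), ':');
+        if (tm.size() > 3)
+            throw std::invalid_argument("too many time fields");
+        int* fields[3] = {&t.hour, &t.minute, &t.second};
+        for (std::size_t i = 0; i < tm.size(); ++i)
+            *fields[i] = to_int(tm[i]);
+    }
+    if (t.month < 1 || t.month > 12 || t.day < 1 || t.day > 31 ||
+        t.hour > 23 || t.minute > 59 || t.second > 59)
+        throw std::invalid_argument("time field out of range");
+    return t;
+}
+\end{verbatim}
+
+For example, ``2002/11/27 1:50'' is parsed with the seconds set to zero, and
+``2002/11/27'' with the whole time of day set to zero.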
+
+Lastly, you can set constraints on a key. The one you're most likely to want
+to use is to create (or request) a CA certificate, which can be done by calling
+the member function \function{CA\_key}. This should only be used when needed.
+
+Other constraints can be set by calling the member functions
+\function{add\_constraints} and \function{add\_ex\_constraints}. The first
+takes a \type{Key\_Constraints} value, and replaces any previously set
+value. If no value is set, then the certificate key is marked as being valid
+for any usage. You can set it to any of the following (for more than one
+usage, OR them together): \type{DIGITAL\_SIGNATURE}, \type{NON\_REPUDIATION},
+\type{KEY\_ENCIPHERMENT}, \type{DATA\_ENCIPHERMENT}, \type{KEY\_AGREEMENT},
+\type{KEY\_CERT\_SIGN}, \type{CRL\_SIGN}, \type{ENCIPHER\_ONLY},
+\type{DECIPHER\_ONLY}. Many of these have quite special semantics, so you
+should either consult the appropriate standards document (such as RFC 3280), or
+simply not call \function{add\_constraints}, in which case the appropriate
+values will be chosen for you.
+
+The second function, \function{add\_ex\_constraints}, allows you to specify an
+OID that has some meaning with regards to restricting the key to particular
+usages. You can, if you wish, specify any OID you like, but there is a set of
+standard ones that other applications will be able to understand. These are
+the ones specified by the PKIX standard, and are named ``PKIX.ServerAuth'' (for
+TLS server authentication), ``PKIX.ClientAuth'' (for TLS client
+authentication), ``PKIX.CodeSigning'', ``PKIX.EmailProtection'' (most likely
+for use with S/MIME), ``PKIX.IPsecUser'', ``PKIX.IPsecTunnel'',
+``PKIX.IPsecEndSystem'', and ``PKIX.TimeStamping''. You can call
+\function{add\_ex\_constraints} any number of times~--~each new OID will be
+added to the list to include in the certificate.
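+
+The two functions accumulate differently: \function{add\_constraints} replaces a
+bit mask, while \function{add\_ex\_constraints} appends to a list. The
+standalone sketch below models that difference. \type{CertOptionsSketch} is
+hypothetical, and its flag values are arbitrary distinct bits chosen for
+illustration, not the values of Botan's \type{Key\_Constraints} enum.
+
+\begin{verbatim}
+#include <string>
+#include <vector>
+
+// Illustrative flags only; OR them together to allow several usages.
+enum KeyUsageFlags {
+    NO_CONSTRAINTS    = 0,
+    DIGITAL_SIGNATURE = 1 << 0,
+    NON_REPUDIATION   = 1 << 1,
+    KEY_ENCIPHERMENT  = 1 << 2,
+    KEY_CERT_SIGN     = 1 << 3,
+    CRL_SIGN          = 1 << 4
+};
+
+// Hypothetical model of the constraint-related part of the options object.
+struct CertOptionsSketch {
+    unsigned int constraints;                 // 0 means "any usage"
+    std::vector<std::string> ex_constraints;  // extended key usage OID names
+
+    CertOptionsSketch() : constraints(NO_CONSTRAINTS) {}
+
+    // Replaces any previously set key usage mask.
+    void add_constraints(unsigned int usage) { constraints = usage; }
+
+    // Each call appends one more OID name to the list.
+    void add_ex_constraints(const std::string& oid) { ex_constraints.push_back(oid); }
+
+    bool allows(KeyUsageFlags flag) const {
+        return constraints == NO_CONSTRAINTS || (constraints & flag) != 0;
+    }
+};
+\end{verbatim}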
+
+\pagebreak
+\section{The Low-Level Interface}
+
+Botan has two different interfaces. The one documented in this section is meant
+more for implementing higher-level types (see the section on filters, earlier in
+this manual) than for use by applications. Using it safely requires a solid
+knowledge of encryption techniques and best practices, so unless you know, for
+example, what CBC mode and nonces are, and why PKCS \#1 padding is important,
+you should avoid this interface in favor of something working at a higher level
+(such as the CMS interface).
+
+\subsection{Basic Algorithm Abilities}
+
+There are a small handful of functions implemented by most of Botan's
+algorithm objects. Among these are:
+
+\noindent
+\type{std::string} \function{name}():
+
+Returns a human-readable string of the name of this algorithm. Examples of
+names returned are ``Blowfish'' and ``HMAC(MD5)''. You can turn names back into
+algorithm objects using the functions in \filename{lookup.h}.
+
+\noindent
+\type{void} \function{clear}():
+
+Clear out the algorithm's internal state. A block cipher object will ``forget''
+its key, a hash function will ``forget'' any data put into it, etc. Basically,
+the object will look exactly as it did when you initially allocated it.
+
+\noindent
+\function{clone}():
+
+This function is central to Botan's name-based interface. The \function{clone}
+function has many different return types, such as \type{BlockCipher*} and
+\type{HashFunction*}, depending on what kind of object it is called on. Note
+that unlike Java's clone, this returns a new object in a ``pristine'' state;
+that is, operations done on the initial object before calling \function{clone}
+do not affect the initial state of the new clone.
+
+Cloned objects can (and should) be deallocated with the C++ \texttt{delete}
+operator.
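+
+The following self-contained sketch illustrates the ``pristine clone''
+behavior. The classes \type{AlgorithmSketch} and \type{CounterHash} are
+hypothetical stand-ins, not Botan classes; the point is only that
+\function{clone} is a virtual function with a covariant return type that
+constructs a fresh object rather than copying the current state.
+
+\begin{verbatim}
+#include <string>
+
+// Hypothetical base class mirroring the name()/clear()/clone() trio.
+class AlgorithmSketch
+   {
+   public:
+      virtual std::string name() const = 0;
+      virtual void clear() = 0;
+      virtual AlgorithmSketch* clone() const = 0;
+      virtual ~AlgorithmSketch() {}
+   };
+
+// Toy object whose only state is a count of input bytes (not a real hash).
+class CounterHash : public AlgorithmSketch
+   {
+   public:
+      CounterHash() : count(0) {}
+      std::string name() const { return "CounterHash"; }
+      void clear() { count = 0; }
+
+      // Covariant return type; the new object is default-constructed,
+      // so none of this object's state is carried over.
+      CounterHash* clone() const { return new CounterHash; }
+
+      void update(unsigned long bytes) { count += bytes; }
+      unsigned long bytes_seen() const { return count; }
+   private:
+      unsigned long count;
+   };
+\end{verbatim}
+
+After \verb|h.update(10)|, the object returned by \verb|h.clone()| reports
+\verb|bytes_seen() == 0|, and it must eventually be released with
+\texttt{delete}.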
+
+\subsection{Keys and IVs}
+
+Both symmetric keys and initialization vectors can simply be considered byte
+(or octet) strings. These are represented by the classes \type{SymmetricKey} and
+\type{InitializationVector}, which are subclasses of \type{OctetString}.
+
+Since it is often hard to distinguish between a key and an IV, many functions
+(such as key derivation mechanisms) return an \type{OctetString} instead of a
+\type{SymmetricKey}, so that the result can be used as either a key or an IV.
+
+\noindent
+\function{OctetString}(\type{u32bit} \arg{length}):
+
+This constructor creates a new random key \arg{length} bytes long.
+
+\noindent
+\function{OctetString}(\type{std::string} \arg{str}):
+
+The argument \arg{str} is assumed to be a hex string; it is converted to binary
+and stored. Whitespace is ignored.
+
+\noindent
+\function{OctetString}(\type{const byte} \arg{input}[], \type{u32bit}
+\arg{length}):
+
+This constructor simply copies its input.
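+
+The hex-string rule (convert to binary, ignore whitespace) can be sketched in
+plain C++ as below. \function{hex\_to\_bytes} is a hypothetical helper, not
+Botan's implementation; how Botan treats an odd number of hex digits is not
+covered here, so the sketch simply throws \type{std::invalid\_argument} in that
+case and for any non-hex character.
+
+\begin{verbatim}
+#include <cctype>
+#include <stdexcept>
+#include <string>
+#include <vector>
+
+// Hypothetical helper: decode a hex string, skipping any whitespace.
+std::vector<unsigned char> hex_to_bytes(const std::string& str)
+   {
+   std::string digits;
+   for(std::string::size_type i = 0; i != str.size(); ++i)
+      {
+      const unsigned char c = static_cast<unsigned char>(str[i]);
+      if(std::isspace(c))
+         continue; // whitespace is ignored
+      if(!std::isxdigit(c))
+         throw std::invalid_argument("hex_to_bytes: not a hex digit");
+      digits += static_cast<char>(c);
+      }
+   if(digits.size() % 2 != 0)
+      throw std::invalid_argument("hex_to_bytes: odd number of digits");
+
+   // Every pair of hex digits becomes one byte.
+   std::vector<unsigned char> out;
+   for(std::string::size_type i = 0; i != digits.size(); i += 2)
+      out.push_back(static_cast<unsigned char>(
+                       std::stoul(digits.substr(i, 2), 0, 16)));
+   return out;
+   }
+\end{verbatim}
+
+For example, the input \verb|"00 01 fe"| followed by a newline and
+\verb|"FF"| decodes to the four bytes 00, 01, FE, FF.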
+
+\subsection{Symmetrically Keyed Algorithms}
+
+Block ciphers, stream ciphers, and MACs all handle keys in pretty much the same
+way. To make this similarity explicit, all algorithms of those types are
+derived from the \type{SymmetricAlgorithm} base class. This type has three
+functions:
+
+\noindent
+\type{void} \function{set\_key}(\type{const byte} \arg{key}[], \type{u32bit}
+\arg{length}):
+
+Most algorithms only accept keys of certain lengths. If you attempt to call
+\function{set\_key} with a key length that is not supported, the exception
+\type{Invalid\_Key\_Length} will be thrown. There is also another version of
+\function{set\_key} that takes a \type{SymmetricKey} as an argument.
+
+\noindent
+\type{bool} \function{valid\_keylength}(\type{u32bit} \arg{length}) const:
+
+This function returns true if a key of the given length will be accepted by
+the algorithm.
+
+There are also three constant data members of every \type{SymmetricAlgorithm}
+object, which specify exactly what limits there are on keys which that object
+can accept:
+
+MAXIMUM\_KEYLENGTH: The maximum length of a key. Usually, this is at most 32
+(256 bits), even if the algorithm actually supports more. In a few rare cases
+larger keys will be supported.
+
+MINIMUM\_KEYLENGTH: The minimum length of a key. This is at least 1.
+
+KEYLENGTH\_MULTIPLE: The length of the key must be a multiple of this value.
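+
+Together, the three constants determine what \function{valid\_keylength}
+accepts. The free function below is a hypothetical restatement of that rule
+for illustration; Botan's own implementation may differ in detail.
+
+\begin{verbatim}
+// Hypothetical restatement of the key length rule in terms of
+// MINIMUM_KEYLENGTH, MAXIMUM_KEYLENGTH and KEYLENGTH_MULTIPLE.
+// Assumes keylength_multiple is at least 1.
+bool valid_keylength_sketch(unsigned int length,
+                            unsigned int minimum_keylength,
+                            unsigned int maximum_keylength,
+                            unsigned int keylength_multiple)
+   {
+   if(length < minimum_keylength || length > maximum_keylength)
+      return false;
+   return length % keylength_multiple == 0;
+   }
+\end{verbatim}
+
+With limits of 16, 32 and 8 bytes (matching the AES key sizes), lengths 16,
+24 and 32 are accepted, while 8, 20 and 40 are rejected.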
+
+In all cases, \function{set\_key} must be called on an object before any data
+processing (encryption, decryption, etc.) is done by that object. If this is not
+done, the results are undefined -- that is to say, Botan reserves the right in
+this situation to do anything from printing a nasty, insulting message on the
+screen to dumping core.
+
+\subsection{Block Ciphers}
+
+Block ciphers implement the interface \type{BlockCipher}, found in
+\filename{base.h}, as well as the \type{SymmetricAlgorithm} interface.
+
+\noindent
+\type{void} \function{encrypt}(\type{const byte} \arg{in}[BLOCK\_SIZE],
+ \type{byte} \arg{out}[BLOCK\_SIZE]) const
+
+\noindent
+\type{void} \function{encrypt}(\type{byte} \arg{block}[BLOCK\_SIZE]) const
+
+These functions apply the block cipher transformation to \arg{in} and
+place the result in \arg{out}, or encrypt \arg{block} in place
+(\arg{in} may be the same as \arg{out}). BLOCK\_SIZE is a constant
+member of each class, which specifies how much data (in bytes) a block
+cipher can process at one time. Note that BLOCK\_SIZE is not a static
+class member, meaning that, given a \type{BlockCipher*} named
+\arg{cipher}, you can call \verb|cipher->BLOCK_SIZE| to get the block
+size of that particular object. Block ciphers also have corresponding
+\function{decrypt} functions, which perform the inverse operation.
+
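+\begin{verbatim}
+#include <botan/botan.h>
+#include <botan/aes.h>
+using namespace Botan;
+
+int main()
+   {
+   LibraryInitializer init; // set up the library before using it
+
+   AES_128 cipher;
+   SymmetricKey key(cipher.MAXIMUM_KEYLENGTH); // randomly created
+   cipher.set_key(key);
+
+   byte in[16] = { /* secrets */ };
+   byte out[16];
+   cipher.encrypt(in, out); // encrypt exactly one 16-byte block
+   }
+\end{verbatim}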
+
+\subsection{Stream Ciphers}
+
+Stream ciphers are somewhat different from block ciphers, in that encrypting
+data results in changing the internal state of the cipher. Also, you may
+encrypt data of any length (in whole bytes) in a single call.
+
+\noindent
+\type{void} \function{encrypt}(\type{const byte} \arg{in}[], \type{byte}
+\arg{out}[], \type{u32bit} \arg{length})
+
+\noindent
+\type{void} \function{encrypt}(\type{byte} \arg{data}[], \type{u32bit}
+\arg{length}):
+
+These functions encrypt the arbitrary-length (strictly, less than 4 gigabytes,
+since \arg{length} is a \type{u32bit}) string \arg{in} and place the result
+into \arg{out}, or encrypt \arg{data} in place. The \function{decrypt}
+functions look just like \function{encrypt}.
+
+Stream ciphers implement the \type{SymmetricAlgorithm} interface.
+
+Some stream ciphers support random access to any point in their cipher
+stream. For such ciphers, calling \type{void} \function{seek}(\type{u32bit}
+\arg{byte}) will change the cipher's state so that it is as if the cipher had been
+keyed as normal, then encrypted \arg{byte} -- 1 bytes of data (so the next byte
+in the cipher stream is byte number \arg{byte}).
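+
+The following sketch shows the bookkeeping behind in-place encryption and
+\function{seek} using a deliberately insecure toy keystream.
+\type{ToyStreamCipher} and its keystream formula are hypothetical and exist
+only to illustrate the state handling; unlike the description above, the
+sketch numbers keystream positions from zero, so after \verb|seek(n)| the next
+keystream byte used is the one at offset $n$.
+
+\begin{verbatim}
+#include <cstddef>
+
+// Toy stream cipher: keystream byte i is a simple function of the key and i.
+// NOT secure; it only models how a seekable stream cipher tracks position.
+class ToyStreamCipher
+   {
+   public:
+      explicit ToyStreamCipher(unsigned char k) : key(k), position(0) {}
+
+      // XOR the keystream into data[0..length-1], advancing the position.
+      void encrypt(unsigned char data[], std::size_t length)
+         {
+         for(std::size_t i = 0; i != length; ++i)
+            data[i] ^= keystream_byte(position++);
+         }
+
+      // XOR is its own inverse, so decryption is the same operation.
+      void decrypt(unsigned char data[], std::size_t length)
+         { encrypt(data, length); }
+
+      // Jump to a zero-based keystream offset without generating the
+      // skipped bytes.
+      void seek(std::size_t offset) { position = offset; }
+
+   private:
+      unsigned char keystream_byte(std::size_t i) const
+         { return static_cast<unsigned char>(key * 31u + i * 7u + (i >> 8)); }
+
+      unsigned char key;
+      std::size_t position;
+   };
+\end{verbatim}
+
+Encrypting a 10-byte message as 4 bytes followed by 6 bytes gives the same
+ciphertext as encrypting it in one call, and a fresh object keyed the same way
+reproduces the last 6 ciphertext bytes after \verb|seek(4)|.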
+
+\subsection{Hash Functions / Message Authentication Codes}
+
+Hash functions accept input without producing any output, producing their
+result only once all input has been supplied. MACs are very similar, but are
+additionally keyed. Both of these are derived from the base class
+\type{BufferedComputation}, which has the following functions.
+
+\noindent
+\type{void} \function{update}(\type{const byte} \arg{input}[], \type{u32bit}
+\arg{length})
+
+\noindent
+\type{void} \function{update}(\type{byte} \arg{input})
+
+\noindent
+\type{void} \function{update}(\type{const std::string \&} \arg{input})
+
+Updates the hash/MAC calculation with \arg{input}.
+
+\noindent
+\type{void} \function{final}(\type{byte} \arg{out}[OUTPUT\_LENGTH])
+
+\noindent
+\type{SecureVector<byte>} \function{final}():
+
+Complete the hash/MAC calculation and place the result into \arg{out}.
+OUTPUT\_LENGTH is a public constant in each object that gives the length of the
+hash in bytes. After you call \function{final}, the hash function is reset to
+its initial state, so it may be reused immediately.
+
+The second method of using final is to call it with no arguments at all, as
+shown in the second prototype. It will return the hash/MAC value in a memory
+buffer, which will have size OUTPUT\_LENGTH.
+
+There is also a pair of functions called \function{process}. They are
+essentially a combination of a single \function{update}, and \function{final}.
+Both versions return the final value, rather than placing it in an array. Calling
+\function{process} with a single byte value isn't available, mostly because it
+would rarely be useful.
+
+A MAC can be viewed (in most cases) as simply a keyed hash function, so classes
+that are derived from \type{MessageAuthenticationCode} have \function{update}
+and \function{final} functions just like a \type{HashFunction} (and like a
+\type{HashFunction}, after \function{final} is called, it can be used to make a
+new MAC right away; the key is kept around).
+
+A MAC has the \type{SymmetricAlgorithm} interface in addition to the
+\type{BufferedComputation} interface.
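+
+The \function{update}/\function{final}/\function{process} pattern, including
+the reset after \function{final}, is sketched below. \type{ToyHash} is
+hypothetical and computes the 32-bit FNV-1a function, which is \emph{not} a
+cryptographic hash; it stands in for a real one only to keep the example
+self-contained. A \type{std::vector} takes the place of \type{SecureVector}.
+
+\begin{verbatim}
+#include <cstddef>
+#include <cstdint>
+#include <string>
+#include <vector>
+
+// Hypothetical class following the BufferedComputation pattern.
+class ToyHash
+   {
+   public:
+      static const std::size_t OUTPUT_LENGTH = 4;
+
+      ToyHash() { reset(); }
+
+      void update(const unsigned char input[], std::size_t length)
+         {
+         for(std::size_t i = 0; i != length; ++i)
+            {
+            state ^= input[i];
+            state *= 16777619u; // FNV prime; arithmetic wraps modulo 2^32
+            }
+         }
+      void update(unsigned char input) { update(&input, 1); }
+      void update(const std::string& input)
+         {
+         update(reinterpret_cast<const unsigned char*>(input.data()),
+                input.size());
+         }
+
+      // Write OUTPUT_LENGTH bytes (big-endian), then return to the
+      // initial state so the object can be reused immediately.
+      void final(unsigned char out[OUTPUT_LENGTH])
+         {
+         for(std::size_t i = 0; i != OUTPUT_LENGTH; ++i)
+            out[i] = static_cast<unsigned char>(state >> (24 - 8 * i));
+         reset();
+         }
+
+      std::vector<unsigned char> final()
+         {
+         std::vector<unsigned char> out(OUTPUT_LENGTH);
+         final(&out[0]);
+         return out;
+         }
+
+      // One update() followed by final().
+      std::vector<unsigned char> process(const std::string& input)
+         {
+         update(input);
+         return final();
+         }
+
+   private:
+      void reset() { state = 2166136261u; } // FNV-1a offset basis
+      std::uint32_t state;
+   };
+\end{verbatim}
+
+Passing \verb|"hello "| and then \verb|"world"| to \function{update} before
+calling \function{final} gives the same digest as \verb|process("hello world")|,
+and \verb|process("a")| returns the bytes E4 0C 29 2C.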
+
+\pagebreak
+\section{Random Number Generators}
+
+The random number generators provided in Botan are meant for creating keys,
+IVs, padding, nonces, and anything else that requires ``random'' data. It is
+important to remember that the output of these classes will vary, even if they
+are supplied with exactly the same seed (\ie, two \type{Randpool} objects with
+identical initial states will not produce the same output, because the value of
+high resolution timers is added to the state at various points).
+
+To ensure good quality output, a PRNG needs to be seeded with truly random data
+(such as that produced by a hardware RNG). Typically, you will use an
+\type{EntropySource} (see below). To add entropy to a PRNG, you can use
+\type{void} \function{add\_entropy}(\type{const byte} \arg{data}[],
+\type{u32bit} \arg{length}) or, better, use the \type{EntropySource}
+interface.
+
+Once a PRNG has been initialized, you can get a single byte of random data by
+calling \type{byte} \function{random()}, or get a large block by calling
+\type{void} \function{randomize}(\type{byte} \arg{data}[], \type{u32bit}
+\arg{length}), which will put random bytes into each member of the array from
+indexes 0 $\ldots$ \arg{length} -- 1.
+
+You can avoid all the problems inherent in seeding the PRNG by using the
+globally shared PRNG, described later in this section.
+
+\subsection{Randpool}
+
+\type{Randpool} is the primary PRNG within Botan. In recent versions all uses
+of it have been wrapped by an implementation of the X9.31 PRNG (see below). If
+you have cause to create your own PRNG instead of using the ``global'' one
+owned by the library, it would be wise, on grounds of general caution, to
+consider wrapping it in the same way: while \type{Randpool} is designed with
+known attacks and PRNG weaknesses in mind, it is not a standardized PRNG. The
+remainder of this section is a (fairly technical, though high-level) description
+of the algorithms used in this PRNG. Unless you have a specific interest in
+this subject, the rest of this section might prove somewhat uninteresting.
+
+\type{Randpool} has an internal state called pool, which is 512 bytes
+long. This is where entropy is mixed into and extracted from. There is also a
+small output buffer (called buffer), which holds data that has already been
+generated but not yet output.
+
+It is based around a MAC and a block cipher (which are currently HMAC(SHA-256)
+and AES-256). Where a specific size is mentioned, it should be taken as a
+multiple of the cipher's block size. For example, if a 256-bit block cipher
+were used instead of AES, all the sizes internally would double. Every time
+some new output is needed, we compute the MAC of a counter and a high
+resolution timer. The resulting MAC is XORed into the output buffer (wrapping
+as needed), and the output buffer is then encrypted with AES, producing 16
+bytes of output.
+
+After 8 blocks (or 128 bytes) have been produced, we mix the pool. To do this,
+we first rekey both the MAC and the cipher; the new MAC key is the MAC of the
+current pool under the old MAC key, while the new cipher key is the MAC of the
+current pool under the just-chosen MAC key. We then encrypt the entire pool in
+CBC mode, using the current (unused) output buffer as the IV. We then generate
+a new output buffer, using the mechanism described in the previous paragraph.
+
+To add randomness to the PRNG, we compute the MAC of the input and XOR the
+output into the start of the pool. Then we remix the pool and produce a new
+output buffer. The initial MAC operation should make it very hard for chosen
+inputs to harm the security of \type{Randpool}, and as HMAC should be able to
+hold roughly 256 bits of state, it is unlikely that we are wasting much input
+entropy (or, if we are, it doesn't matter, because we have a very abundant
+supply).
+
+\subsection{ANSI X9.31}
+
+\type{ANSI\_X931\_PRNG} is the standard issue X9.31 Appendix A.2.4 PRNG, though
+using AES-256 instead of 3DES as the block cipher. This PRNG implementation has
+been checked against official X9.31 test vectors.
+
+Internally, the PRNG holds a pointer to another PRNG (typically
+Randpool). This internal PRNG generates the key and seed used by the
+X9.31 algorithm, as well as the date/time vectors. Each time an X9.31
+PRNG object receives entropy, it simply passes it along to the PRNG it
+is holding, and then pulls out some random bits to generate a new key
+and seed. This PRNG considers itself seeded as soon as the internal
+PRNG is seeded.
+
+As of version 1.4.7, the X9.31 PRNG is by default used for all random number
+generation.
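+
+The generation step at the heart of X9.31 A.2.4 is short. The sketch below
+computes one output block with the keyed block cipher abstracted as a function
+object: $I = E_K(DT)$, $R = E_K(I \oplus V)$, and the new seed
+$V = E_K(R \oplus I)$. \type{Block}, \function{xor\_blocks} and
+\function{x931\_step} are hypothetical names, and the sketch does not model how
+Botan obtains the key, seed, or date/time vector from its inner PRNG.
+
+\begin{verbatim}
+#include <array>
+#include <cstddef>
+#include <functional>
+
+typedef std::array<unsigned char, 16> Block; // one AES-sized block
+
+// Byte-wise XOR of two blocks.
+Block xor_blocks(const Block& a, const Block& b)
+   {
+   Block r;
+   for(std::size_t i = 0; i != r.size(); ++i)
+      r[i] = static_cast<unsigned char>(a[i] ^ b[i]);
+   return r;
+   }
+
+// One X9.31 A.2.4 step. E is the block cipher keyed with K, dt is the
+// date/time vector, and v is the seed, which is updated in place.
+Block x931_step(const std::function<Block (const Block&)>& E,
+                const Block& dt, Block& v)
+   {
+   const Block i = E(dt);                 // I = E_K(DT)
+   const Block r = E(xor_blocks(i, v));   // R = E_K(I xor V)
+   v = E(xor_blocks(r, i));               // V = E_K(R xor I)
+   return r;                              // R is the output block
+   }
+\end{verbatim}
+
+Any function object of the right type can be passed as \arg{E}; in the setting
+described above it would wrap AES-256 keyed with $K$.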
+
+\subsection{Entropy Sources}
+
+An \type{EntropySource} is an abstract representation of some method of gathering
+``real'' entropy. This tends to be very system dependent. The \emph{only} way
+you should use an \type{EntropySource} is to pass it to a PRNG that will
+extract entropy from it -- never use the output directly for any kind of key or
+nonce generation!
+
+\type{EntropySource} has a pair of functions for getting entropy from some
+external source, called \function{fast\_poll} and \function{slow\_poll}. Each
+is passed a buffer of bytes to write into, and returns how many bytes
+of entropy were actually gathered. \type{EntropySource}s are usually used to
+seed the global PRNG using the functions found in the \namespace{Global\_RNG}
+namespace.
+
+Note for writers of \type{EntropySource}s: it isn't necessary to use any kind
+of cryptographic hash on your output. The data produced by an EntropySource is
+only used by an application after it has been hashed by the
+\type{RandomNumberGenerator} that asked for the entropy, thus any hashing
+you do will be wasteful of both CPU cycles and possibly entropy.
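+
+A source following these guidelines might look like the sketch below.
+\type{EntropySourceSketch} and \type{DeviceFileSource} are hypothetical; the
+real \type{EntropySource} declaration (argument types, return type, which
+members are pure virtual) should be taken from Botan's headers. As advised
+above, the bytes are returned raw, without any hashing.
+
+\begin{verbatim}
+#include <cstddef>
+#include <fstream>
+#include <string>
+
+// Hypothetical interface modelled on the description above: each poll
+// writes into the caller's buffer and returns how many bytes it filled.
+class EntropySourceSketch
+   {
+   public:
+      virtual std::size_t fast_poll(unsigned char buf[], std::size_t length) = 0;
+      virtual std::size_t slow_poll(unsigned char buf[], std::size_t length) = 0;
+      virtual ~EntropySourceSketch() {}
+   };
+
+// Reads raw bytes from a device file such as /dev/urandom (Unix only).
+class DeviceFileSource : public EntropySourceSketch
+   {
+   public:
+      explicit DeviceFileSource(const std::string& p) : path(p) {}
+
+      std::size_t fast_poll(unsigned char buf[], std::size_t length)
+         { return read_device(buf, length); }
+      std::size_t slow_poll(unsigned char buf[], std::size_t length)
+         { return read_device(buf, length); }
+
+   private:
+      std::size_t read_device(unsigned char buf[], std::size_t length)
+         {
+         std::ifstream in(path.c_str(), std::ios::binary);
+         if(!in)
+            return 0; // device unavailable: no entropy gathered
+         in.read(reinterpret_cast<char*>(buf),
+                 static_cast<std::streamsize>(length));
+         return static_cast<std::size_t>(in.gcount()); // raw, unhashed
+         }
+
+      std::string path;
+   };
+\end{verbatim}
+
+On a Unix system, \verb|DeviceFileSource src("/dev/urandom")| fills the buffer
+passed to either poll; a path that cannot be opened simply yields 0 bytes.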
+
+\pagebreak
+\section{User Interfaces}
+
+Botan has recently changed some infrastructure to better accommodate more
+complex user interfaces, in particular ones that are based on event
+loops. The primary motivation was that when loading a PKCS \#8 encoded private
+key, a passphrase might be needed, but then again it
+might not (a PKCS \#8 key doesn't have to be encrypted). Asking for a
+passphrase to decrypt an unencrypted key is rather pointless. Not only that,
+but the way to handle the user typing the wrong passphrase was complicated,
+undocumented, and inefficient.
+
+So now Botan has an object called \type{UI}, which provides a simple interface
+for the aspects of user interaction the library has to be concerned
+with. Currently, this means getting a passphrase from the user, and that's it
+(\type{UI} will probably be extended in the future to support other operations
+as they are needed). The base \type{UI} class is very simple, because the
+library can't directly assume anything about the environment that it's running
+under (for example, whether there will be someone sitting at the terminal, whether the
+application is even \emph{attached} to a terminal, and so on). But since you
+can subclass \type{UI} to use whatever method happens to be appropriate for
+your application, this isn't a big deal.
+
+There is (currently) a single function that can be overridden by subclasses of
+\type{UI} (the \type{std::string} arguments are actually \type{const
+std::string\&}, but shown as simply \type{std::string} to keep the line from
+wrapping):
+
+\noindent
+\type{std::string} \function{get\_passphrase}(\type{std::string} \arg{what},
+ \type{std::string} \arg{source},
+ \type{UI\_Result\&} \arg{result}) const;
+
+The \arg{what} argument specifies what the passphrase is needed for (for
+example, PKCS \#8 key loading passes \arg{what} as ``PKCS \#8 private
+key''). This lets you provide the user with some indication of \emph{why} your
+application is asking for a passphrase; feel free to pass the string through
+\function{gettext(3)} or moral equivalent for i18n purposes. Similarly,
+\arg{source} specifies where the data in question came from, if available (for
+example, a file name). If the source is not available for whatever reason, then
+\arg{source} will be an empty string; be sure to account for this possibility
+when writing a \type{UI} subclass.
+
+The function returns the passphrase as the return value, and a status code in
+\arg{result} (either \type{OK} or \type{CANCEL\_ACTION}). If
+\type{CANCEL\_ACTION} is returned in \arg{result}, then the return value will
+be ignored, and the caller will take whatever action is necessary (typically,
+throwing an exception stating that the passphrase couldn't be determined). In
+the specific case of PKCS \#8 key decryption, a \type{Decoding\_Error}
+exception will be thrown; your UI should assume this can happen, and provide
+appropriate error handling (such as putting up a dialog box informing the user
+of the situation, and canceling the operation in progress).
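+
+As an illustration, a terminal-style subclass might read the passphrase from a
+stream. So that the sketch compiles on its own, it redeclares a mock-up of the
+interface described above (\type{UI\_Result} with \type{OK} and
+\type{CANCEL\_ACTION}, and a \type{UI} base class, declared abstract here for
+brevity); the exact names, scoping, and base-class behavior in Botan's header
+may differ, and \type{StreamUI} is hypothetical.
+
+\begin{verbatim}
+#include <iostream>
+#include <sstream> // std::istringstream can stand in for a terminal in tests
+#include <string>
+
+// Mock-up of the interface described above (not Botan's actual header).
+enum UI_Result { OK, CANCEL_ACTION };
+
+class UI
+   {
+   public:
+      virtual std::string get_passphrase(const std::string& what,
+                                         const std::string& source,
+                                         UI_Result& result) const = 0;
+      virtual ~UI() {}
+   };
+
+// Hypothetical UI that prompts on one stream and reads from another.
+class StreamUI : public UI
+   {
+   public:
+      StreamUI(std::istream& input, std::ostream& prompt)
+         : in(input), out(prompt) {}
+
+      std::string get_passphrase(const std::string& what,
+                                 const std::string& source,
+                                 UI_Result& result) const
+         {
+         out << "Passphrase for " << what;
+         if(!source.empty()) // source may legitimately be empty
+            out << " (" << source << ")";
+         out << ": ";
+
+         std::string pass;
+         if(!std::getline(in, pass))
+            {
+            result = CANCEL_ACTION; // e.g. end of input: give up
+            return "";
+            }
+         result = OK;
+         return pass;
+         }
+
+   private:
+      std::istream& in;
+      std::ostream& out;
+   };
+\end{verbatim}
+
+Constructed as \verb|StreamUI ui(std::cin, std::cerr)|, it prompts on standard
+error; end of input is reported as \type{CANCEL\_ACTION}, which the caller then
+handles as described above.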
+
+There is an example \type{UI} that uses GTK+ available on the web site. The
+\type{GTK\_UI} code is cleanly separated from the rest of the example, so if
+you happen to be using GTK+, you can copy (and/or adapt) that code for your
+application. If you write a \type{UI} object for another windowing system
+(Win32, Qt, wxWidgets, FOX, etc), and would like to make it available to users
+in general (ideally under a permissive license such as public domain or
+MIT/BSD), feel free to send in a copy.
+
+\pagebreak
+\section{Botan's Modules}
+
+Botan comes with a variety of modules that can be compiled into the system.
+These will not be available on all installations of the library, but you can
+check for their availability based on whether or not certain macros are
+defined.
+
+\subsection{Pipe I/O for Unix File Descriptors}
+
+This is a fairly minor feature, but it comes in handy sometimes. In all
+installations of the library, Botan's \type{Pipe} object overloads the
+\keyword{<<} and \keyword{>>} operators for C++ iostream objects, which is
+usually more than sufficient for doing I/O.
+
+However, there are cases where the iostream hierarchy does not map well to
+local ``file types'', so there is also the ability to do I/O directly with Unix
+file descriptors. This is most useful when you want to read from or write to
+something like a TCP or Unix-domain socket, or a pipe, since for simple file
+access it's usually easier to just use C++'s file streams.
+
+If \macro{BOTAN\_EXT\_PIPE\_UNIXFD\_IO} is defined, then you can use the
+overloaded I/O operators with Unix file descriptors. For an example of this,
+check out the \filename{hash\_fd} example, included in the Botan distribution.
+
+\subsection{Entropy Sources}
+
+All of these are used by the \function{Global\_RNG::seed} function if they are
+available. Since this function is called by the \type{LibraryInitializer} class
+when it is created, it is fairly rare that you will need to deal with any of
+these classes directly. Even in the case of a long-running server that needs to
+renew its entropy pool, it is easier to simply call
+\function{Global\_RNG::seed} (see the section entitled ``The Global PRNG'' for
+more details).
+
+\noindent
+\type{EGD\_EntropySource}: Query an EGD socket. If the macro
+\macro{BOTAN\_EXT\_ENTROPY\_SRC\_EGD} is defined, it can be found in
+\filename{es\_egd.h}. The constructor takes a \type{std::vector<std::string>}
+that specifies the paths to look for an EGD socket.
+
+\noindent
+\type{Unix\_EntropySource}: This entropy source executes programs common on
+Unix systems (such as \filename{uptime}, \filename{vmstat}, and \filename{df})
+and adds their output to a buffer. It's quite slow due to process overhead, and
+(roughly) 1 bit of real entropy is in each byte that is output. It is declared in
+\filename{es\_unix.h}, if \macro{BOTAN\_EXT\_ENTROPY\_SRC\_UNIX} is
+defined. If you don't have \filename{/dev/urandom} \emph{or} EGD, this is
+probably the thing to use. For a long-running process on Unix, keep an object
+of this type around and run fast polls every few minutes.
+
+\noindent
+\type{FTW\_EntropySource}: Walk through a filesystem (the root to start
+searching is passed as a string to the constructor), reading files. This tends
+to only be useful on things like \filename{/proc} that have a great deal of
+variability over time, and even then there is only a small amount of entropy
+gathered: about 1 bit of entropy for every 16 bits of output (and many hundreds
+of bits are read in order to get that 16 bits). It is declared in
+\filename{es\_ftw.h}, if \macro{BOTAN\_EXT\_ENTROPY\_SRC\_FTW} is defined. Only
+use this as a last resort. I don't really trust it, and neither should you.
+
+\noindent
+\type{Win32\_CAPI\_EntropySource}: This routine gathers entropy from a Win32
+CAPI module. It takes an optional \type{std::string} that will specify what
+type of CAPI provider to use. Generally the CAPI RNG is always the same
+software-based PRNG, but there are a few that may use a hardware RNG. By
+default it will use the first provider listed in the option
+``rng/ms\_capi\_prov\_type'' that is available on the machine (currently the
+providers ``RSA\_FULL'', ``INTEL\_SEC'', ``FORTEZZA'', and ``RNG'' are
+recognized).
+
+\noindent
+\type{BeOS\_EntropySource}: Query system statistics using various BeOS-specific
+APIs.
+
+\noindent
+\type{Pthread\_EntropySource}: Attempt to gather entropy based on jitter
+between a number of threads competing for a single mutex. This entropy source
+is \emph{very} slow, and highly questionable in terms of security. However, it
+provides a worst-case fallback on systems that don't have Unix-like features,
+but do support POSIX threads. This module is currently unavailable due to
+problems on some systems.
+
+\subsection{Compressors}
+
+There are two compression algorithms supported by Botan, Zlib and Bzip2 (Gzip
+and Zip encoding will be supported in future releases). Only lossless
+compression algorithms are currently supported by Botan, because they tend to
+be the most useful for cryptography. However, it is very reasonable to consider
+supporting something like GSM speech encoding (which is lossy), for use in
+encrypted voice applications.
+
+You should always compress \emph{before} you encrypt, because encryption hides
+the redundancy that compression relies on finding and removing; well-encrypted
+data is essentially incompressible.
+
+\subsubsection{Bzip2}
+
+To test for Bzip2, check to see if \macro{BOTAN\_EXT\_COMPRESSOR\_BZIP2} is
+defined. If so, you can include \filename{bzip2.h}, which will declare a pair
+of \type{Filter} objects: \type{Bzip2\_Compression} and
+\type{Bzip2\_Decompression}.
+
+Be prepared to catch an exception when using the decompressing filter: if the
+input is not valid Bzip2 data, one will be thrown. You can specify the desired
+level of compression to \type{Bzip2\_Compression}'s constructor as an integer
+between 1 and 9, 1 meaning worst compression, and 9 meaning the best. The
+default is 9, since lower levels take about the same amount of time and merely
+use a little less memory.
+
+The Bzip2 module was contributed by Peter J. Jones.
+
+\subsubsection{Zlib}
+
+Zlib compression works pretty much like Bzip2 compression. The only differences
+in this case are that the macro is \macro{BOTAN\_EXT\_COMPRESSOR\_ZLIB} and the
+header you need to include is called \filename{botan/zlib.h} (remember that you
+shouldn't just \verb|#include <zlib.h>|, or you'll get the regular zlib API,
+which is not what you want). The Botan classes for Zlib
+compression/decompression are called \type{Zlib\_Compression} and
+\type{Zlib\_Decompression}.
+
+Like Bzip2, a \type{Zlib\_Decompression} object will throw an exception if
+invalid (in the sense of not being in the Zlib format) data is passed into it.
+
+With zlib, a lower compression level will be faster than a very high one. For
+this reason, the Zlib compressor defaults to a compression level of 6, which
+tends to give a good trade-off between time spent and compression achieved.
+There are several factors you need to consider in order to decide whether you
+should use a higher compression level:
+
+\begin{list}{$\cdot$}{}
+ \item Better security: the less redundancy in the source text, the harder it
+       is to attack your ciphertext. This is not too much of a concern,
+       because with decent algorithms using sufficiently long keys, it doesn't
+       really matter \emph{that} much (but it certainly can't hurt).
+
+ \item Diminishing returns. Some simple experiments by the author showed
+       minimal decreases in the size between level 6 and level 9 compression
+       with large (1 to 3 megabyte) files. There was some difference, but it
+       wasn't that much.
+
+ \item CPU time. Level 9 zlib compression is often two to four times as slow
+       as level 6 compression. This can make a substantial difference in the
+       overall runtime of a program.
+\end{list}
+
+While the zlib compression library uses the same compression algorithm as the
+gzip and zip programs, the format is different. The zlib format is defined in
+RFC 1950.
+
+\subsection{Data Sources}
+
+A \type{DataSource} is a simple abstraction for a thing that stores bytes. This
+type is used fairly heavily in the areas of the API related to ASN.1
+encoding/decoding. The following types are \type{DataSource}s: \type{Pipe},
+\type{SecureQueue}, and a couple of special purpose ones:
+\type{DataSource\_Memory} and \type{DataSource\_Stream}.
+
+You can create a \type{DataSource\_Memory} with an array of bytes and a length
+field. The object will make a copy of the data, so you don't have to worry
+about keeping that memory allocated. This is mostly for internal use, but if it
+comes in handy, feel free to use it.
+
+A \type{DataSource\_Stream} is probably more useful than the memory based
+one. Its constructors take either a \type{std::istream} or a
+\type{std::string}. If it's a stream, the data source will use the
+\type{istream} to satisfy read requests (this is particularly useful with
+\type{std::cin}). If the string version is used, it will attempt to open
+up a file with that name and read from it.
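+
+Both special-purpose sources can be modelled in a few lines. The classes
+below are a hypothetical, simplified \type{DataSource}-like hierarchy with a
+single \function{read} operation that returns the number of bytes delivered
+(0 once the source is exhausted); Botan's actual \type{DataSource} interface
+offers more operations and should be checked in its header.
+
+\begin{verbatim}
+#include <algorithm>
+#include <cstddef>
+#include <istream>
+#include <sstream> // std::istringstream: an in-memory std::istream
+#include <vector>
+
+// Hypothetical, simplified DataSource: read() returns the number of bytes
+// delivered, which is 0 once the source is exhausted.
+class DataSourceSketch
+   {
+   public:
+      virtual std::size_t read(unsigned char out[], std::size_t length) = 0;
+      virtual ~DataSourceSketch() {}
+   };
+
+// Keeps its own copy of the bytes, so the caller's buffer may be reused.
+class MemorySource : public DataSourceSketch
+   {
+   public:
+      MemorySource(const unsigned char in[], std::size_t length)
+         : data(in, in + length), offset(0) {}
+
+      std::size_t read(unsigned char out[], std::size_t length)
+         {
+         const std::size_t n = std::min(length, data.size() - offset);
+         std::copy(data.begin() + offset, data.begin() + offset + n, out);
+         offset += n;
+         return n;
+         }
+
+   private:
+      std::vector<unsigned char> data;
+      std::size_t offset;
+   };
+
+// Satisfies reads from any std::istream, such as std::cin.
+class StreamSource : public DataSourceSketch
+   {
+   public:
+      explicit StreamSource(std::istream& input) : in(input) {}
+
+      std::size_t read(unsigned char out[], std::size_t length)
+         {
+         in.read(reinterpret_cast<char*>(out),
+                 static_cast<std::streamsize>(length));
+         return static_cast<std::size_t>(in.gcount());
+         }
+
+   private:
+      std::istream& in;
+   };
+\end{verbatim}
+
+Because \type{MemorySource} copies its input, overwriting the original array
+after construction does not change what is later read back.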
+
+\subsection{Data Sinks}
+
+A \type{DataSink} (in \filename{data\_snk.h}) is a \type{Filter} that takes
+arbitrary amounts of input, and produces no output. Generally, this means it's
+doing something with the data outside the realm of what
+\type{Filter}/\type{Pipe} can handle, for example, writing it to a file (which
+is what the \type{DataSink\_Stream} does). There is no need for
+\type{DataSink}s that write to a \type{std::string} or memory buffer, because
+\type{Pipe} can handle that by itself.
+
+Here's a quick example of using a \type{DataSink}, which encrypts
+\filename{in.txt} and sends the output to \filename{out.txt}. There is
+no explicit output operation; the writing of \filename{out.txt} is
+implicit.
+
+\begin{verbatim}
+ DataSource_Stream in("in.txt");
+ Pipe pipe(new CBC_Encryption("Blowfish", "PKCS7", key, iv),
+ new DataSink_Stream("out.txt"));
+ pipe.process_msg(in);
+\end{verbatim}
+
+A real advantage of this is that even if ``in.txt'' is large, only as
+much memory as is needed for the internal I/O buffers will actually be used.
+
+\subsection{Writing Modules}
+
+It's a lot simpler to write modules for Botan than it is to write code
+in the core library, for several reasons. First, a module can rely on
+external libraries and services beyond the base ISO C++ libraries, and
+also machine dependent features. Also, the code can be added at
+configuration time on the user's end with very little effort (\ie the
+code can be distributed separately, and included by the user without
+needing to patch any existing source files).
+
+Each module lives in a subdirectory of the \filename{modules}
+directory, which exists at the top-level of the Botan source tree. The
+``short name'' of the module is the same as the name of this
+directory. The only required file in this directory is
+\filename{info.txt}, which contains directives that specify what a
+particular module does, what systems it runs on, and so on. Comments
+in \filename{info.txt} start with a \verb|#| character and continue
+to the end of the line.
+
+Recognized directives include:
+
+\newcommand{\directive}[2]{
+ \vskip 4pt
+ \noindent
+ \texttt{#1}: #2
+}
+
+\directive{realname <name>}{Specify that the ``real-world'' name of this module
+ is \texttt{<name>}.}
+
+\directive{note <note>}{Add a note that will be seen by the end-user at
+configure time if the module is included into the library.}
+
+\directive{require\_version <version>}{Require at configure time that
+the version of Botan in use be at least \texttt{<version>}.}
+
+\directive{define <macro>[,<macro>[,...]]}{Cause the macro
+ \macro{BOTAN\_EXT\_<macro>} (for each instance of \macro{<macro>}
+ in the directive) to be defined in \filename{build.h}. This should
+ only be used if the module creates user-visible changes. There is a
+ set of conventions that should be followed in deciding what to call
+ this macro (where xxx denotes some descriptive and distinguishing
+ characteristic of the thing implemented, such as
+ \macro{ALLOC\_MLOCK} or \macro{MUTEX\_PTHREAD}):
+
+\begin{itemize}
+\item Allocator: \macro{ALLOC\_xxx}
+\item Compressors: \macro{COMPRESSOR\_xxx}
+\item EntropySource: \macro{ENTROPY\_SRC\_xxx}
+\item Engines: \macro{ENGINE\_xxx}
+\item Mutex: \macro{MUTEX\_xxx}
+\item Timer: \macro{TIMER\_xxx}
+\end{itemize}
+}
+
+\directive{<libs> / </libs>}{This specifies any extra libraries to be
+linked in. It is a mapping from OS to library name, for example
+\texttt{linux -> rt}, which means that on Linux librt should be linked
+in. You can also use ``all'' to force the library to be linked in on
+all systems.}
+
+\directive{<add> / </add>}{Tell the configuration script to add the
+ files named between these two tags into the source tree. All these
+ files must exist in the current module directory.}
+
+\directive{<ignore> / </ignore>}{Tell the configuration script to
+ ignore the files named in the main source tree. This is useful, for
+ example, when replacing a C++ implementation with a pure assembly
+ version.}
+
+\directive{<replace> / </replace>}{Tell the configuration script to
+ ignore the file given in the main source tree, and instead use the
+ one in the module's directory.}
+
+Additionally, the module file can contain blocks, delimited by the
+following pairs:
+
+\texttt{<os> / </os>}, \texttt{<arch> / </arch>}, \texttt{<cc> / </cc>}
+
+\noindent
+For example, putting ``alpha'' and ``ia64'' in a \texttt{<arch>} block will
+make the configuration script only allow the module to be compiled on those
+architectures. Not having a block means any value is acceptable.
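+
+Putting the directives together, the \filename{info.txt} of a hypothetical
+Unix-only entropy source module might look like the following. The module
+name, file names, macro, and library are invented for illustration, and the
+exact syntax (for instance, whether the \texttt{realname} value must be quoted)
+should be checked against a module that ships with Botan.
+
+\begin{verbatim}
+# Hypothetical module: modules/es_example/info.txt
+realname "Example Entropy Source"
+
+# Defines BOTAN_EXT_ENTROPY_SRC_EXAMPLE in build.h
+define ENTROPY_SRC_EXAMPLE
+
+<add>
+es_example.h
+es_example.cpp
+</add>
+
+<libs>
+linux -> rt
+</libs>
+
+<os>
+linux
+freebsd
+</os>
+\end{verbatim}
+
+Because of the \texttt{<os>} block, the configuration script would only allow
+this module to be compiled on Linux and FreeBSD.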
+
+\pagebreak
+\section{Miscellaneous}
+
+This section has documentation for anything that just didn't fit into any of
+the major categories. Many of them (Timers, Allocators) will rarely be used in
+actual application code, but others, like the S2K algorithms, have a wide
+degree of applicability.
+
+\subsection{S2K Algorithms}
+
+There are various procedures (usually fairly ad-hoc) for turning a passphrase
+into a (mostly) arbitrary length key for a symmetric cipher. A general
+interface for such algorithms is presented in \filename{s2k.h}. The main
+function is \function{derive\_key}, which takes a passphrase, and the desired
+length of the output key, and returns a key of that length, deterministically
+produced from the passphrase. If an algorithm can't produce a key of that size,
+it will throw an exception (most notably, PKCS \#5's PBKDF1 can only produce
+strings between 1 and $n$ bytes, where $n$ is the output size of the underlying
+hash function).
+
+Most such algorithms allow the use of a ``salt'', which provides some extra
+randomness and helps against dictionary attacks on the passphrase. Simply call
+\function{change\_salt} (there are variations of it for most of the ways you
+might wish to specify a salt, check the header for details) with a block of
+random data. You can also have the class generate a new salt for you with
+\function{new\_random\_salt}; the salt that was generated can be retrieved with
+\function{current\_salt}.
+
+Additionally, some algorithms allow you to set an iteration count, which
+makes the algorithm take longer to compute the final key (slowing down
+brute-force attacks of various kinds). This can be changed with the
+\function{set\_iterations} function. Most standards recommend an iteration
+count of at least 1000. The currently defined S2K algorithms are
+``PBKDF1(digest)'', ``PBKDF2(digest)'', and ``OpenPGP-S2K(digest)''; you can
+retrieve any of these using the \function{get\_s2k} function, found in
+\filename{lookup.h}. As of this writing, ``PBKDF2(SHA-256)'' with 10000
+iterations and an 8 byte salt is recommended for new applications.
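+
+For reference, PBKDF2 (also from PKCS \#5) avoids PBKDF1's output length limit
+by computing as many output blocks as needed. In the following, $P$, $S$, $c$,
+and $\|$ are as for PBKDF1. We assume, as is conventional, that the
+pseudorandom function $\mathrm{PRF}$ is HMAC with the chosen digest, keyed with
+$P$ and producing $h$ bytes. Block $T_i$ is then computed as:
+
+\begin{eqnarray*}
+U_1 & = & \mathrm{PRF}(P,\; S \,\|\, \mathrm{INT}(i)) \\
+U_j & = & \mathrm{PRF}(P,\; U_{j-1}), \qquad j = 2, \ldots, c \\
+T_i & = & U_1 \oplus U_2 \oplus \cdots \oplus U_c \\
+\mathit{DK} & = & \mbox{the first } \mathit{dkLen} \mbox{ bytes of }
+  T_1 \,\|\, T_2 \,\|\, \cdots \,\|\, T_\ell, \qquad
+  \ell = \lceil \mathit{dkLen}/h \rceil
+\end{eqnarray*}
+
+\noindent
+Here $\mathrm{INT}(i)$ is the block index $i$ encoded as a 4-byte big-endian
+integer. Each output block costs $c$ PRF calls, so the iteration count
+directly scales the cost of a brute-force search.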
+
+\subsubsection{OpenPGP S2K}
+
+OpenPGP's S2K algorithms have some oddities that are worth documenting. Most
+notably, the iteration count is used in an unusual way: instead of specifying
+how many times to iterate the hash, it specifies how many \emph{bytes} should
+be hashed in total (including the salt). The effective number of iterations
+therefore depends on the size of the salt (which is fixed at 8 bytes by the
+OpenPGP standard, though the implementation will allow any salt size) and on
+the size of the passphrase.
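+
+To illustrate the byte-count rule, here is a standalone C++ sketch. It builds
+the exact byte string that an iterated and salted OpenPGP S2K feeds to the
+hash function. The function \function{openpgp\_s2k\_hash\_input} is
+hypothetical: it is not part of Botan, and it does no hashing itself. It
+follows RFC 4880, under which the full salt and passphrase are hashed at least
+once even when the count is smaller than their combined length.
+
```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Botan): returns the bytes an iterated and
// salted OpenPGP S2K would hash. salt || passphrase is repeated until `count`
// bytes are produced; at least one full copy is always used (RFC 4880).
std::string openpgp_s2k_hash_input(const std::string& salt,
                                   const std::string& passphrase,
                                   std::size_t count)
   {
   const std::string unit = salt + passphrase;
   if(unit.empty())
      return std::string();

   const std::size_t total = std::max(count, unit.size());

   std::string input;
   input.reserve(total);
   while(input.size() < total)
      {
      // Append a whole copy, or only the prefix needed to reach `total`
      const std::size_t take = std::min(unit.size(), total - input.size());
      input.append(unit, 0, take);
      }
   return input;
   }
```
+
+For example, with an 8-byte salt and an 8-byte passphrase, a count of 65536
+means the 16-byte unit is repeated 4096 times.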
+
+To get what OpenPGP calls ``Simple S2K'', set the iteration count to 0 (the
+default for OpenPGP S2K) and do not specify a salt. To get ``Salted S2K'',
+again leave the iteration count at 0, but give an 8-byte salt. ``Salted and
+Iterated S2K'' requires an 8-byte salt and a nonzero iteration count, which
+should be significantly larger than the length of the longest passphrase that
+might reasonably be used; somewhere from 1024 to 65536 would probably be about
+right. Using both a reasonably sized salt and a large iteration count is
+highly recommended, since together they make password guessing much more
+expensive.
+
+\subsection{Checksums}
+
+Checksums are very similar to hash functions, and in fact share the same
+interface. There are significant differences, however: the output is very
+small (usually 2 to 4 bytes), and checksums are not cryptographically secure,
+since it is easy to construct a different input with the same checksum. For
+their intended purpose (detecting accidental errors), they perform very well.
+Examples of checksums included in Botan are Adler32 and CRC32.
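+
+The following standalone sketches compute the two checksums named above. They
+show how little work, and therefore how little security, is involved. They
+were written for illustration and are not Botan's own implementations. The CRC
+shown is the common reflected CRC-32 with polynomial \texttt{0xEDB88320}, and
+it is only assumed to match the variant Botan implements. Its standard check
+value for the ASCII string ``123456789'' is \texttt{0xCBF43926}.
+
```cpp
#include <cstdint>
#include <string>

// Adler-32 (RFC 1950): two running sums modulo 65521, packed as (b << 16) | a.
std::uint32_t adler32(const std::string& data)
   {
   const std::uint32_t MOD = 65521;
   std::uint32_t a = 1, b = 0;
   for(unsigned char c : data)
      {
      a = (a + c) % MOD;
      b = (b + a) % MOD;
      }
   return (b << 16) | a;
   }

// CRC-32 (zlib/Ethernet variant): reflected polynomial 0xEDB88320, initial
// value and final XOR of 0xFFFFFFFF, processed one bit at a time.
std::uint32_t crc32(const std::string& data)
   {
   std::uint32_t crc = 0xFFFFFFFF;
   for(unsigned char c : data)
      {
      crc ^= c;
      for(int bit = 0; bit != 8; ++bit)
         crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
      }
   return ~crc;
   }
```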
+
+\subsection{Exceptions}
+
+Sooner or later, something is going to go wrong. Like most C++ software,
+Botan reacts to unusual conditions by throwing an exception. Exceptions in
+Botan are derived from the \type{Exception} class; you can see most of the
+major varieties used in Botan by looking at \filename{exceptn.h}. The only
+function you really need to concern yourself with is \type{const char*}
+\function{what()}, which returns an error message describing the error that
+occurred. For example:
+
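+\begin{verbatim}
+try {
+   // various Botan operations
+   }
+catch(const Botan::Exception& e)
+   {
+   std::cout << "Botan exception caught: " << e.what() << std::endl;
+   // error handling, or just abort
+   }
+catch(const std::exception& e)
+   {
+   std::cout << "Other exception caught: " << e.what() << std::endl;
+   }
+\end{verbatim}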
+
+Botan's exceptions are derived from \type{std::exception}, so you don't need
+to explicitly check for Botan exceptions if you're already catching the ISO
+standard ones.
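+
+To illustrate why a handler for \type{std::exception} is sufficient, the
+sketch below defines a hypothetical \type{Example\_Library\_Error}. It stands
+in for a Botan exception class but is not one. It derives from
+\type{std::exception} through \type{std::runtime\_error`, and the sketch
+catches it through a reference to the base class.
+
```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a library exception (not a Botan class); what
// matters is only that it ultimately derives from std::exception.
class Example_Library_Error : public std::runtime_error
   {
   public:
      explicit Example_Library_Error(const std::string& msg) :
         std::runtime_error("Example_Library_Error: " + msg) {}
   };

void failing_operation()
   {
   throw Example_Library_Error("invalid key length");
   }

// A single std::exception handler also catches the derived library type.
std::string run_and_report()
   {
   try
      {
      failing_operation();
      }
   catch(const std::exception& e)
      {
      return e.what();
      }
   return "no exception";
   }
```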
+
+\subsection{Threads and Mutexes}
+
+Botan includes a mutex system, which is used internally to lock a few data
+structures that are shared for efficiency reasons (mostly in the allocation
+system~--~handing out 1000 separate allocators would hurt performance and
+make caching memory blocks useless). On systems with POSIX threads, this is
+supported by the \texttt{mux\_pthr} module, which implements the \type{Mutex}
+interface.
+
+If your application uses threads, you \emph{must} add the option
+``thread\_safe'' to the options string when you create the
+\type{LibraryInitializer} object. If you specify this option and no mutex
+type is available, an exception is thrown, since otherwise you would likely
+face a nasty crash later on.
+
+\subsection{Secure Memory}
+
+A major concern with mixing modern multiuser OSes and cryptographic code is
+that at any time the process's memory (including secret keys) could be
+swapped to disk, where it can later be read by an attacker. Botan stores
+almost everything (and especially anything sensitive) in memory buffers that
+a) clear out their contents when their destructors are called, and b) have
+easy plugins for various memory locking functions, such as the
+\manpage{mlock}{2} call on many Unix systems.
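+
+A minimal POSIX sketch of these two properties follows. The class
+\type{Locked\_Bytes} is hypothetical and is not one of Botan's allocators. It
+assumes a system such as Linux, where \manpage{mlock}{2} accepts an address
+that is not page aligned (POSIX allows implementations to reject one). If the
+call fails, the buffer simply stays unlocked.
+
```cpp
#include <cstddef>
#include <vector>
#include <sys/mman.h>   // POSIX mlock(2) / munlock(2)

// Hypothetical sketch (not Botan's allocator): a byte buffer that tries to
// lock its pages into RAM and always wipes itself on destruction.
class Locked_Bytes
   {
   public:
      explicit Locked_Bytes(std::size_t n) : buf(n)
         {
         // mlock can fail (EPERM, or ENOMEM when RLIMIT_MEMLOCK is exceeded);
         // the buffer is still usable then, just possibly swappable.
         locked = (n > 0 && ::mlock(buf.data(), n) == 0);
         }

      ~Locked_Bytes()
         {
         // Wipe through a volatile pointer so the stores are not optimized away
         volatile unsigned char* p = buf.data();
         for(std::size_t i = 0; i != buf.size(); ++i)
            p[i] = 0;
         if(locked)
            ::munlock(buf.data(), buf.size());
         }

      Locked_Bytes(const Locked_Bytes&) = delete;
      Locked_Bytes& operator=(const Locked_Bytes&) = delete;

      unsigned char* data() { return buf.data(); }
      std::size_t size() const { return buf.size(); }
      bool is_locked() const { return locked; }

   private:
      std::vector<unsigned char> buf;
      bool locked;
   };
```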
+
+Two of the allocation methods used (``malloc'' and ``mmap'') do not require
+any extra privileges on Unix, but locking memory does. At startup, each
+allocator type will attempt to allocate a few blocks (typically totaling
+128 KiB), so if you want, you can run your application \texttt{setuid}
+\texttt{root}, and then drop privileges immediately after creating your
+\type{LibraryInitializer}. If you end up using more than what has been
+allocated, some of your sensitive data might end up being swappable, but that
+beats running as \texttt{root} all the time. Note also that, at least on
+Linux, you can use a kernel module to give your process extra privileges
+(such as the ability to call \function{mlock}) without being root. For
+example, my Capability Override LSM
+(\url{http://www.randombit.net/projects/cap\_over/}) makes this pretty easy
+to do.
+
+The secure buffer classes described below should also be used in your own
+code for storing sensitive data. They are only meant for primitive data types
+(\type{int}, \type{long}, etc.): if you want a container of higher level
+Botan objects, you can just use a \verb|std::vector|, since these objects know
+how to clear themselves when they are destroyed. You cannot, however, have a
+\verb|std::vector| (or any other container) of \type{Pipe}s or
+\type{Filter}s, because these types hold pointers to other \type{Filter}s, and
+implementing copy constructors for them would be both hard and quite
+expensive (vectors of pointers to such objects are fine, though).
+
+These types are not described in any great detail: for more information,
+consult the definitive sources~--~the header files \filename{secmem.h} and
+\filename{allocate.h}.
+
+\type{SecureBuffer} is a simple array type, whose size is specified at compile
+time. It will automatically convert to a pointer of the appropriate type, and
+has a number of useful functions, including \function{clear()}, and
+\type{u32bit} \function{size()}, which returns the length of the array. It is a
+template that takes as parameters a type, and a constant integer which is how
+long the array is (for example: \verb|SecureBuffer<byte, 8> key;|).
+
+\type{SecureVector} is a variable length array. Its size can be increased or
+decreased as need be, and it has a wide variety of functions useful for copying
+data into its buffer. Like \type{SecureBuffer}, it implements \function{clear}
+and \function{size}.
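+
+To make the described interface concrete, here is a hypothetical fixed-size
+buffer modelled on \type{SecureBuffer}. It is a sketch written for this
+document, not Botan's class. It offers \function{clear()}, \function{size()},
+and implicit conversion to a pointer, and it wipes itself on destruction.
+
```cpp
#include <cstddef>

// Hypothetical fixed-size buffer modelled on the SecureBuffer description
// (not Botan's class). Intended for primitive element types only.
template<typename T, std::size_t N>
class Wiping_Buffer
   {
   public:
      Wiping_Buffer() { clear(); }
      ~Wiping_Buffer() { clear(); }

      // Zero the contents through a volatile pointer so the compiler
      // cannot drop the writes as dead stores.
      void clear()
         {
         volatile T* p = buf;
         for(std::size_t i = 0; i != N; ++i)
            p[i] = T();
         }

      std::size_t size() const { return N; }

      // Implicit conversion to a pointer to the first element
      operator T*() { return buf; }
      operator const T*() const { return buf; }

   private:
      T buf[N];
   };
```
+
+Usage mirrors the example in the text: \verb|Wiping_Buffer<unsigned char, 8> key;|
+declares an 8-byte key that is zeroed on construction and on destruction.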
+
+\subsection{Allocators}
+
+The containers described above get their memory from allocators. As a user of
+the library, you can register new allocator types at run time, which then
+become available to containers, including the ones used internally by the
+library. The interface to this is in \filename{allocate.h}. Code needing an
+allocator calls \function{get\_allocator}, which returns a pointer to an
+allocator. This pointer should not be freed: the caller does not own the
+allocator (it is shared among multiple users, and locks itself as needed). You
+can call \function{get\_allocator} with a specific name to request a
+particular type of allocator; otherwise, a default allocator type is returned.
+
+At start time, the only allocator known is a \type{Default\_Allocator}, which
+just allocates memory using \function{malloc}, and \function{memset}s it to 0
+when the memory is released. It is known by the name ``malloc''. If you ask for
+another type of allocator (``locking'' and ``mmap'' are currently used), and it
+is not available, some other allocator will be returned.
+
+You can add a new allocator type using \function{add\_allocator\_type}. This
+function takes a string and a pointer to an allocator; the string is the name
+under which the allocator can later be requested with
+\function{get\_allocator}. It returns true if the allocator was successfully
+registered, and false if an error occurs (such as the name already being
+registered). If you ask it to, \type{LibraryInitializer} will do this for you.
+
+Finally, you can set the default allocator type that will be returned by
+setting the policy option ``default\_alloc'' to the name of any previously
+registered allocator.
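+
+The registration and lookup rules above can be sketched as follows. All names
+here (\type{Example\_Allocator}, \type{Allocator\_Registry}, and so on) are
+hypothetical and do not reproduce Botan's \filename{allocate.h}. In
+particular, this sketch falls back to the current default when an unknown
+name is requested. Because the registry is shared, it guards its table with a
+\type{std::mutex}.
+
```cpp
#include <cstddef>
#include <cstdlib>
#include <map>
#include <mutex>
#include <string>

// Hypothetical allocator interface (not Botan's).
struct Example_Allocator
   {
   virtual ~Example_Allocator() {}
   virtual void* allocate(std::size_t n) = 0;
   virtual void deallocate(void* p, std::size_t n) = 0;
   };

// Like the "malloc" allocator described above: zeroed again before release.
struct Malloc_Allocator : Example_Allocator
   {
   void* allocate(std::size_t n) override { return std::calloc(n, 1); }
   void deallocate(void* p, std::size_t n) override
      {
      volatile unsigned char* bytes = static_cast<unsigned char*>(p);
      for(std::size_t i = 0; p && i != n; ++i)
         bytes[i] = 0;
      std::free(p);
      }
   };

class Allocator_Registry
   {
   public:
      Allocator_Registry() : default_name("malloc")
         { allocators["malloc"] = &malloc_alloc; }

      // Returns false if the name is taken or the pointer is null
      bool add_allocator_type(const std::string& name, Example_Allocator* alloc)
         {
         std::lock_guard<std::mutex> lock(mutex);
         if(!alloc || allocators.count(name))
            return false;
         allocators[name] = alloc;
         return true;
         }

      // Unknown names fall back to the default; the caller does not own the result
      Example_Allocator* get_allocator(const std::string& name = "")
         {
         std::lock_guard<std::mutex> lock(mutex);
         auto i = allocators.find(name.empty() ? default_name : name);
         if(i == allocators.end())
            i = allocators.find(default_name);
         return i->second;
         }

      // Analogue of the "default_alloc" setting
      bool set_default(const std::string& name)
         {
         std::lock_guard<std::mutex> lock(mutex);
         if(!allocators.count(name))
            return false;
         default_name = name;
         return true;
         }

   private:
      std::mutex mutex;
      Malloc_Allocator malloc_alloc;
      std::map<std::string, Example_Allocator*> allocators;
      std::string default_name;
   };
```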
+
+\subsection{BigInt}
+
+\type{BigInt} is Botan's implementation of a multiple-precision
+integer. Thanks to C++'s operator overloading features, using \type{BigInt} is
+often quite similar to using a native integer type. The number of functions
+related to \type{BigInt} is quite large. You can find most of them in
+\filename{bigint.h} and \filename{numthry.h}.
+
+Due to the sheer number of functions involved, only a few, which a regular
+user of the library might have to deal with, are mentioned here. Fully
+documenting the MPI library would take a significant amount of time, so if
+you need to use it now, the best way to learn is to look at the headers.
+
+Probably the most important are the encoding/decoding functions, which
+transform the normal representation of a \type{BigInt} into some other form,
+such as a decimal string. The most useful of these functions are
+
+\type{SecureVector<byte>} \function{BigInt::encode}(\type{BigInt},
+\type{Encoding})
+
+\noindent
+and
+
+\type{BigInt} \function{BigInt::decode}(\type{SecureVector<byte>},
+\type{Encoding})
+
+\type{Encoding} is an enum that has values \type{Binary}, \type{Octal},
+\type{Decimal}, and \type{Hexadecimal}. The parameter will default to
+\type{Binary}. These functions are static member functions, so they would be
+called like this:
+
+\begin{verbatim}
+ BigInt n1; // some number
+ SecureVector<byte> n1_encoded = BigInt::encode(n1);
+ BigInt n2 = BigInt::decode(n1_encoded);
+ // now n1 == n2
+\end{verbatim}
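+
+For intuition about what the \type{Binary} encoding produces, the following
+standalone sketch performs the same kind of round trip on a 64-bit unsigned
+integer instead of a \type{BigInt}. Two properties are assumptions of this
+sketch, not documented properties of \function{BigInt::encode}: the
+big-endian byte order, and zero encoding as an empty vector.
+
```cpp
#include <cstdint>
#include <vector>

// Sketch only: magnitude as big-endian bytes, without leading zero bytes.
std::vector<std::uint8_t> encode_binary(std::uint64_t n)
   {
   std::vector<std::uint8_t> out;
   while(n > 0)
      {
      out.insert(out.begin(), static_cast<std::uint8_t>(n & 0xFF));
      n >>= 8;
      }
   return out;   // zero encodes as an empty vector in this sketch
   }

// Inverse of encode_binary; assumes at most 8 input bytes.
std::uint64_t decode_binary(const std::vector<std::uint8_t>& bytes)
   {
   std::uint64_t n = 0;
   for(std::uint8_t b : bytes)
      n = (n << 8) | b;
   return n;
   }
```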
+
+There are also C++-style I/O operators defined for use with \type{BigInt}. The
+input operator understands negative numbers, hexadecimal numbers (marked with
+a leading ``0x''), and octal numbers (marked with a leading ``0''). The ``-''
+must come before the ``0x'' or ``0'' marker. The output operator never adorns
+its output; for example, when printing a hexadecimal number, there will not be
+a leading ``0x'' (though a leading ``-'' will be printed if the number is
+negative). If you want such things, you will have to add them yourself.
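+
+The input and output rules above can be sketched with ordinary
+\type{long long} values. The functions \function{parse\_integer} and
+\function{to\_hex\_unadorned} are hypothetical helpers, not Botan's stream
+operators, and they do no overflow checking.
+
```cpp
#include <cstddef>
#include <string>

// Hypothetical parser: optional '-', then "0x" for hexadecimal, a leading
// '0' (followed by more digits) for octal, otherwise decimal.
long long parse_integer(const std::string& s)
   {
   std::size_t i = 0;
   const bool negative = (!s.empty() && s[0] == '-');
   if(negative)
      ++i;

   int base = 10;
   if(s.compare(i, 2, "0x") == 0)
      { base = 16; i += 2; }
   else if(i + 1 < s.size() && s[i] == '0')
      { base = 8; i += 1; }

   const long long magnitude = std::stoll(s.substr(i), nullptr, base);
   return negative ? -magnitude : magnitude;
   }

// Output side: hexadecimal digits with no "0x" prefix, but a leading '-'.
std::string to_hex_unadorned(long long v)
   {
   const char* digits = "0123456789ABCDEF";
   unsigned long long mag = (v < 0) ? 0ULL - static_cast<unsigned long long>(v)
                                    : static_cast<unsigned long long>(v);
   std::string out;
   do
      {
      out.insert(out.begin(), digits[mag % 16]);
      mag /= 16;
      }
   while(mag != 0);
   if(v < 0)
      out.insert(out.begin(), '-');
   return out;
   }
```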
+
+\type{BigInt} has constructors that can create a \type{BigInt} from an unsigned
+integer or a string. You can also decode a \type{byte}[] / length pair into a
+BigInt. There are several other \type{BigInt} constructors, which I would
+seriously recommend you avoid, as they are only intended for use internally by
+the library, and may arbitrarily change, or be removed, in a future release.
+
+An essentially random sampling of \type{BigInt} related functions:
+
+\type{u32bit} \function{BigInt::bytes}(): Return the size of this \type{BigInt}
+in bytes.
+
+\type{BigInt} \function{random\_prime(\type{u32bit} \arg{b})}: Return a prime
+number \arg{b} bits long.
+
+\type{BigInt} \function{gcd}(\type{BigInt} \arg{x}, \type{BigInt} \arg{y}):
+Returns the greatest common divisor of \arg{x} and \arg{y}. Uses the binary
+GCD algorithm.
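+
+For reference, the binary GCD algorithm (Stein's algorithm) replaces division
+with shifts and subtractions. This standalone sketch works on 64-bit unsigned
+integers rather than \type{BigInt}, and \function{binary\_gcd} is a
+hypothetical name.
+
```cpp
#include <cstdint>

// Binary (Stein's) GCD on 64-bit unsigned integers; illustration only.
std::uint64_t binary_gcd(std::uint64_t x, std::uint64_t y)
   {
   if(x == 0) return y;
   if(y == 0) return x;

   // Factor out the power of two common to both inputs
   unsigned shift = 0;
   while(((x | y) & 1) == 0)
      { x >>= 1; y >>= 1; ++shift; }

   // Any remaining factors of two in x are not part of the gcd
   while((x & 1) == 0)
      x >>= 1;

   do
      {
      while((y & 1) == 0)
         y >>= 1;
      // Both odd: keep x <= y, then replace y by the (even) difference
      if(x > y)
         { std::uint64_t t = x; x = y; y = t; }
      y -= x;
      }
   while(y != 0);

   return x << shift;
   }
```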
+
+\type{bool} \function{is\_prime}(\type{BigInt} \arg{x}): Returns true if
+\arg{x} is probably prime. Uses the Miller-Rabin probabilistic primality test
+with fixed bases. For higher assurance, use \function{verify\_prime}, which
+uses more rounds and randomized 48-bit bases.
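+
+The following standalone sketch shows the Miller-Rabin test with fixed bases
+on 64-bit unsigned integers. \function{is\_prime\_u64} and its helpers are
+hypothetical, not Botan functions. Using the first twelve primes as bases is
+known to make the test deterministic for all $n < 2^{64}$. For arbitrarily
+large numbers no fixed set of bases suffices, which is one reason randomized
+bases give higher assurance.
+
```cpp
#include <cstdint>

// (a + b) mod m for a, b < m, without overflowing 64 bits
std::uint64_t add_mod(std::uint64_t a, std::uint64_t b, std::uint64_t m)
   {
   return (a >= m - b) ? a - (m - b) : a + b;
   }

// (a * b) mod m by double-and-add: portable, if slow
std::uint64_t mul_mod(std::uint64_t a, std::uint64_t b, std::uint64_t m)
   {
   std::uint64_t r = 0;
   a %= m;
   while(b)
      {
      if(b & 1) r = add_mod(r, a, m);
      a = add_mod(a, a, m);
      b >>= 1;
      }
   return r;
   }

std::uint64_t pow_mod(std::uint64_t b, std::uint64_t e, std::uint64_t m)
   {
   std::uint64_t r = 1 % m;
   b %= m;
   while(e)
      {
      if(e & 1) r = mul_mod(r, b, m);
      b = mul_mod(b, b, m);
      e >>= 1;
      }
   return r;
   }

// Miller-Rabin with fixed bases 2..37 (deterministic for n < 2^64)
bool is_prime_u64(std::uint64_t n)
   {
   const std::uint64_t bases[] = { 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37 };
   if(n < 2) return false;
   for(std::uint64_t p : bases)
      if(n % p == 0) return n == p;

   // Write n - 1 = d * 2^s with d odd
   std::uint64_t d = n - 1;
   unsigned s = 0;
   while((d & 1) == 0) { d >>= 1; ++s; }

   for(std::uint64_t a : bases)
      {
      std::uint64_t x = pow_mod(a, d, n);
      if(x == 1 || x == n - 1) continue;
      bool witness = true;
      for(unsigned r = 1; r < s; ++r)
         {
         x = mul_mod(x, x, n);
         if(x == n - 1) { witness = false; break; }
         }
      if(witness) return false;   // a proves that n is composite
      }
   return true;
   }
```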
+
+\subsubsection{Efficiency Hints}
+
+If you can, always use expressions of the form \verb|a += b| over
+\verb|a = a + b|. The difference can be \emph{very} substantial, because the
+first form avoids at least one needless memory allocation, and possibly as
+many as three.
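+
+The effect can be seen with a toy multi-limb number type that counts its deep
+copies (moves are not counted). \type{Toy\_Num} is a hypothetical stand-in for
+\type{BigInt}. The in-place \verb|+=| updates existing storage, while the
+binary \verb|+| must first copy its left operand to produce a new object.
+
```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy number with little-endian 32-bit limbs that counts deep copies.
struct Toy_Num
   {
   std::vector<std::uint32_t> limbs;
   static int copies;

   explicit Toy_Num(std::uint32_t v = 0) : limbs(1, v) {}
   Toy_Num(const Toy_Num& o) : limbs(o.limbs) { ++copies; }
   Toy_Num(Toy_Num&&) = default;
   Toy_Num& operator=(const Toy_Num& o) { limbs = o.limbs; ++copies; return *this; }
   Toy_Num& operator=(Toy_Num&&) = default;

   // In-place addition with carry, reusing this object's storage
   Toy_Num& operator+=(const Toy_Num& o)
      {
      if(limbs.size() < o.limbs.size())
         limbs.resize(o.limbs.size(), 0);
      std::uint64_t carry = 0;
      for(std::size_t i = 0; i != limbs.size(); ++i)
         {
         const std::uint64_t s = std::uint64_t(limbs[i]) +
            (i < o.limbs.size() ? o.limbs[i] : 0u) + carry;
         limbs[i] = static_cast<std::uint32_t>(s);
         carry = s >> 32;
         }
      if(carry)
         limbs.push_back(static_cast<std::uint32_t>(carry));
      return *this;
      }
   };

int Toy_Num::copies = 0;

// Binary + needs a fresh object, so the left operand is copied first
Toy_Num operator+(Toy_Num lhs, const Toy_Num& rhs)
   {
   lhs += rhs;
   return lhs;
   }
```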
+
+If you are doing repeated modular exponentiations with the same modulus,
+create a \type{BarrettReducer} ahead of time. If the exponent or base is a
+constant, use the classes in \filename{mod\_exp.h}. The normal high-level
+interfaces handle all of this for you, of course.
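+
+The idea behind a Barrett reducer is to pay for one division up front and
+then reduce with a multiplication and a shift. The sketch below is a toy:
+\type{Toy\_Barrett} is hypothetical, not Botan's \type{BarrettReducer}. It
+precomputes $\mu = \lfloor 2^{32}/m \rfloor$. To keep every product within
+64 bits, it only supports $0 < m < 2^{16}$ and inputs below $2^{32}$.
+
```cpp
#include <cstdint>

// Toy Barrett reducer: mu = floor(2^32 / m) is computed once, after which
// reduce() uses a multiply and a shift instead of a division.
class Toy_Barrett
   {
   public:
      explicit Toy_Barrett(std::uint32_t modulus) :
         m(modulus), mu((std::uint64_t(1) << 32) / modulus) {}

      std::uint32_t reduce(std::uint32_t x) const
         {
         // q never exceeds floor(x / m) and is at most slightly smaller
         const std::uint64_t q = (std::uint64_t(x) * mu) >> 32;
         std::uint64_t r = x - q * m;
         while(r >= m)   // a small number of corrective subtractions
            r -= m;
         return static_cast<std::uint32_t>(r);
         }

      // Modular multiplication reusing the precomputed constant
      std::uint32_t mul(std::uint32_t a, std::uint32_t b) const
         {
         return reduce(reduce(a) * reduce(b));   // product < 2^32 since m < 2^16
         }

   private:
      std::uint64_t m, mu;
   };
```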
+
+Never use the low-level MPI functions (those whose names begin with
+\texttt{bigint\_}). These are completely internal to the library: they may
+make arbitrarily strange and undocumented assumptions about their inputs,
+and they do not check whether those assumptions hold, since only the library
+itself is expected to call them. Their interfaces can change completely
+without notice.
+
+\pagebreak
+\section{Algorithms}
+
+\subsection{Recommended Algorithms}
+
+This section is by no means the last word on selecting which algorithms to use.
+However, Botan includes a sometimes bewildering array of possible algorithms,
+and unless you're familiar with the latest developments in the field, it can be
+hard to know what is secure and what is not. The following attributes of the
+algorithms were evaluated when making this list: security, standardization,
+patent status, support by other implementations, and efficiency (in roughly
+that order).
+
+It is intended as a set of simple guidelines for developers, and nothing more.
+It's entirely possible that there are algorithms in Botan that will turn out to
+be more secure than the ones listed, but the algorithms listed here are
+(currently) thought to be safe.
+
+\begin{list}{$\cdot$}
+ \item Block ciphers: AES or Serpent in CBC or CTR mode
+
+ \item Hash functions: SHA-256, SHA-512
+
+ \item MACs: HMAC with any recommended hash function
+
+ \item Public Key Encryption: RSA with ``EME1(SHA-256)''
+
+ \item Public Key Signatures: RSA with EMSA4 and any recommended hash, or DSA
+ with ``EMSA1(SHA-256)''
+
+ \item Key Agreement: Diffie-Hellman, with ``KDF2(SHA-256)''
+\end{list}
+
+\subsection{Compliance with Standards}
+
+Botan is intended to be at least roughly compatible with many cryptographic
+standards, including the following:
+
+\newcommand{\standard}[2]{
+ \vskip 4pt
+ * #1: \textbf{#2}
+}
+
+\standard{RSA}{PKCS \#1 v2.1, ANSI X9.31}
+
+\standard{DSA}{ANSI X9.30, FIPS 186-2}
+
+\standard{Diffie-Hellman}{ANSI X9.42, PKCS \#3}
+
+\standard{Certificates}{ITU X.509, RFC 3280/3281 (PKIX), PKCS \#9 v2.0,
+PKCS \#10}
+
+\standard{Private Key Formats}{PKCS \#5 v2.0, PKCS \#8}
+
+\standard{DES/DES-EDE}{FIPS 46-3, ANSI X3.92, ANSI X3.106}
+
+\standard{SHA-1}{FIPS 180-2}
+
+\standard{HMAC}{ANSI X9.71, FIPS 198}
+
+\standard{ANSI X9.19 MAC}{ANSI X9.9, ANSI X9.19}
+
+\vskip 8pt
+\noindent
+There is also support for the very general standards \textbf{IEEE 1363-2000}
+and \textbf{1363a}. Most of their contents are also covered, in various
+forms, by the standards mentioned above (usually with extra restrictions that
+1363 does not impose).
+
+\subsection{Algorithms Listing}
+
+Botan includes a very sizable number of cryptographic algorithms. In nearly
+all cases, you never need to know the header file or type name to use them.
+However, you do need to know what string (or strings) are used to identify
+each algorithm. Generally, these names conform to those set out by SCAN
+(Standard Cryptographic Algorithm Naming), a document that specifies how
+strings are mapped onto algorithm objects and is useful for a wide variety of
+crypto APIs (SCAN is oriented towards Java, but Botan and several other
+non-Java libraries also make at least some use of it). For full details, read
+the SCAN document, which can be found at
+\url{http://www.users.zetnet.co.uk/hopwood/crypto/scan/}.
+
+Many of these algorithms can take options (such as the number of
+rounds in a block cipher, the output size of a hash function,
+etc). These are shown in the following list; all of them default to
+reasonable values (unless otherwise marked). There are
+algorithm-specific limits on most of them. When you see something like
+``HASH'' or ``BLOCK'', that means you should insert the name of some
+algorithm of that type. There are no defaults for those options.
+
+A few very obscure algorithms are skipped; if you need one of them, you will
+know it, and you can look in the appropriate header to see what that class's
+\function{name} function returns (the names tend to match those in SCAN, where
+they are defined there).
+
+\begin{list}{$\cdot$}
+ \item ROUNDS: The number of rounds in a block cipher.
+ \item OUTSZ: The output size of a hash function or MAC.
+ \item PASS: The number of passes in a hash function (more passes generally
+ means more security).
+\end{list}
+
+\vskip .05in
+\noindent
+\textbf{Block Ciphers:} ``AES'', ``Blowfish'', ``CAST-128'',
+``CAST-256'', ``DES'', ``DESX'', ``TripleDES'', ``GOST'', ``IDEA'',
+``MARS'', ``MISTY1(ROUNDS)'', ``RC2'', ``RC5(ROUNDS)'', ``RC6'',
+``SAFER-SK(ROUNDS)'', ``SEED'', ``Serpent'', ``Skipjack'', ``Square'',
+``TEA'', ``Twofish'', ``XTEA''
+
+\noindent
+\textbf{Stream Ciphers:} ``ARC4'', ``MARK4'', ``Turing'', ``WiderWake4+1-BE''
+
+\noindent
+\textbf{Hash Functions:} ``FORK-256'', ``HAS-160'', ``GOST-34.11'',
+``MD2'', ``MD4'', ``MD5'', ``RIPEMD-128'', ``RIPEMD-160'',
+``SHA-160'', ``SHA-256'', ``SHA-384'', ``SHA-512'', ``Skein-512'',
+``Tiger(OUTSZ,PASS)'', ``Whirlpool''
+
+\noindent
+\textbf{MACs:} ``HMAC(HASH)'', ``CMAC(BLOCK)'', ``X9.19-MAC''
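+
+The parameterized names above follow a simple pattern. Splitting a name into
+the algorithm and its arguments therefore only requires tracking parenthesis
+depth, so that nested names stay intact. The sketch below is a hypothetical
+helper, not Botan's own parser, and ``Tiger(24,3)'' is merely an illustrative
+instance of ``Tiger(OUTSZ,PASS)''.
+
```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical result type: algorithm name plus its top-level arguments.
struct Algo_Spec
   {
   std::string name;
   std::vector<std::string> args;
   };

// Splits "NAME(ARG1,ARG2,...)" at top-level commas only, so nested specs such
// as "EMSA1(SHA-256)" remain single arguments.
Algo_Spec parse_algo_spec(const std::string& spec)
   {
   Algo_Spec out;
   const std::string::size_type open = spec.find('(');
   if(open == std::string::npos)
      {
      out.name = spec;   // no parameters, e.g. "AES"
      return out;
      }
   if(spec.back() != ')')
      throw std::invalid_argument("Unbalanced parentheses in " + spec);

   out.name = spec.substr(0, open);

   int depth = 0;
   std::string current;
   // Walk the characters between the opening and the final closing parenthesis
   for(std::string::size_type i = open + 1; i + 1 < spec.size(); ++i)
      {
      const char c = spec[i];
      if(c == ',' && depth == 0)
         {
         out.args.push_back(current);
         current.clear();
         continue;
         }
      if(c == '(') ++depth;
      if(c == ')') --depth;
      if(depth < 0)
         throw std::invalid_argument("Unbalanced parentheses in " + spec);
      current += c;
      }
   if(depth != 0)
      throw std::invalid_argument("Unbalanced parentheses in " + spec);
   out.args.push_back(current);
   return out;
   }
```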
+
+\subsection{Compatibility}
+
+Generally, cryptographic algorithms are well standardized, so compatibility
+between implementations is relatively simple (though of course not all
+algorithms are supported by all implementations). But there are a few
+algorithms that are poorly specified, and these should be avoided if you wish
+your data to be processed in the same way by another implementation
+(including future versions of Botan).
+
+The block cipher GOST has a particularly poor specification: there are no
+standard S-boxes, and the specification does not give test vectors even for
+sample boxes, which leads to issues of endian conventions, etc.
+
+If you want maximum portability between different implementations of an
+algorithm, it is best to stick to strongly defined and well standardized
+algorithms, such as TripleDES, AES, HMAC, and SHA-256.
+
+\pagebreak
+\section{Support and Further Information}
+
+\subsection{Patents}
+
+Some of the algorithms implemented by Botan may be covered by patents in some
+locations. Algorithms known to have patent claims on them in the United States
+and that are not available in a license-free/royalty-free manner include:
+IDEA, MISTY1, RC5, RC6, and Nyberg-Rueppel.
+
+You must not assume that, just because an algorithm is not listed here, it is
+not encumbered by patents. If you have any concerns about the patent status of
+any algorithm you are considering using in an application, please discuss it
+with your attorney.
+
+\subsection{Recommended Reading}
+
+It is a very good idea to have some knowledge of cryptography before trying
+to use this library. You really should read one or more of these books
+before seriously using it (note that the Handbook of Applied Cryptography is
+available for free online):
+
+\setlength{\parskip}{5pt}
+
+\noindent
+\textit{Handbook of Applied Cryptography}, Alfred J. Menezes,
+Paul C. Van Oorschot, and Scott A. Vanstone; CRC Press
+
+\noindent
+\textit{Security Engineering -- A Guide to Building Dependable Distributed
+Systems}, Ross Anderson; Wiley
+
+\noindent
+\textit{Cryptography: Theory and Practice}, Douglas R. Stinson; CRC Press
+
+\noindent
+\textit{Applied Cryptography, 2nd Ed.}, Bruce Schneier; Wiley
+
+\noindent
+Once you've got the basics down, these are good things to at least take a look
+at: IEEE 1363 and 1363a, SCAN, NESSIE, PKCS \#1 v2.1, the security related FIPS
+documents, and the CFRG RFCs.
+
+\subsection{Support}
+
+Questions or problems you have with Botan can be directed to the
+development mailing list. Joining this list is highly recommended if
+you are going to be using Botan, since advance notice of upcoming changes
+is often sent there. ``Philosophical'' bug reports, announcements of
+programs using Botan, and basically anything else having to do with
+Botan are also welcome.
+
+The lists can be found at
+\url{http://lists.randombit.net/mailman/listinfo/}.
+
+\subsection{Contact Information}
+
+A PGP key with a fingerprint of
+\verb|621D AF64 11E1 851C 4CF9 A2E1 6211 EBF1 EFBA DFBC| is used to sign all
+Botan releases. This key can be found in the file \filename{doc/pgpkeys.asc};
+PGP keys for the developers are also stored there.
+
+\vskip 5pt \noindent
+Web Site: \url{http://botan.randombit.net}
+
+\subsection{License}
+
+Copyright \copyright{} 2000--2008, Jack Lloyd
+
+Licensed under the same terms as the Botan source
+
+\end{document}