\documentclass{article}

\setlength{\textwidth}{6.75in} % 1 inch side margins
\setlength{\textheight}{9in} % ~1 inch top and bottom margins

\setlength{\headheight}{0in}
\setlength{\topmargin}{0in}
\setlength{\headsep}{0in}

\setlength{\oddsidemargin}{0in}
\setlength{\evensidemargin}{0in}

\title{Botan Internals}
\author{Jack Lloyd (lloyd@randombit.net)}
\date{August 20, 2006}

\newcommand{\filename}[1]{\texttt{#1}}
\newcommand{\manpage}[2]{\texttt{#1}(#2)}

\newcommand{\function}[1]{\textbf{#1}}
\newcommand{\type}[1]{\texttt{#1}}
\renewcommand{\arg}[1]{\textsl{#1}}

\begin{document}

\maketitle

\tableofcontents

\parskip=5pt

\section{Introduction}

This document is intended to document some of the trickier and/or more
complicated parts of Botan. This is not going to be terribly useful if
you just want to use the library, but for people wishing to understand
how it works, or contribute new code to it, it will hopefully prove
helpful.

I've realized that a lot of things Botan does internally are pretty
hard to understand, and that a lot of things are only inside my head,
which is a bad place for them to be (things tend to get lost in there,
not to mention the possibility that I'll get hit by a truck next
week).

This document is currently very incomplete. I'll be working on it as I
have time.

\pagebreak

\section{Filter}

\type{Filter} is one of the core abstractions of the library. It is
used to represent any sort of transformation. Nearly all
\type{Filter}s are linear; they take input from a single source and
send their output (if any) to another single \type{Filter}. The one
exception is \type{Fanout\_Filter}, which uses friend access to
\type{Filter} in order to allow for multiple \type{Filter}s to attach
to its output. This special access is used by the Chain and Fork
filters; Chain encapsulates one or more \type{Filter}s into a single
Filter, and Fork sends its input to a set of several \type{Filter}
objects.

The majority of the relations between filters is maintained by the
\type{Pipe} object which ``owns'' the \type{Filter}s.

\section{Pipe}

\type{Pipe} is, conceptually, a tree structure of \type{Filter}
objects. There is a single unique top, and an arbitrary number of
leaves (which are \type{SecureQueue} objects). \type{SecureQueue} is a
simple \type{Filter} that buffers its input.

Writing into the pipe writes into the top of the tree. The filter at
the top of the tree writes its output into the next \type{Filter}, and
so on until eventually data trickles down into the bottommost
\type{Filter}s, where the data is stored for later retrieval.

When a new message is started, \type{Pipe} searches through the tree
of \type{Filter}s and finds places where the \arg{next} field of the
\type{Filter} is NULL. This implies that it was the lowest layer of
the \type{Filter} tree that the user added. It then adds
\type{SecureQueue} objects onto these \type{Filter}s. These queues are
also stored in an deque; this is so \type{Pipe} can read from them
later without doing a tree traversal each time.

\type{Pipe} will, if asked, destroy the existing tree structure, in
order to create a new one. However, the queue objects are not deleted,
because \type{Pipe} might be asked to read from them later (while
\type{Pipe} could delete all the messages in this case, the principle
of least astonishment suggested keeping them).

What I wrote about \type{Pipe} keeing the queues in a deque is a
lie. Sort of. It keeps them in an object called
\type{Output\_Buffers}, which keeps them in a
deque. \type{Output\_Buffers} is intended to abstract away how message
queues are stored from \type{Pipe}. After a queue has been added to
the output buffers object, \type{Pipe} keeps no references to it
whatsoever; all access is mediated by the \type{Output\_Buffers}.
This allows queues which have been read to be deleted, rather than
leaving empty queue objects all over the place.

\section{Library Initialization}

WRITEME

\section{Lookup Mechanism}

Most objects know their name, and they know how to create a new copy
of themselves. We build mapping tables that map from an algorithm name
into a single instance of that algorithm. The tables themselves can be
found in \filename{src/lookup.cpp}.

There are a set of functions named \function{add\_algorithm} that can
be used to populate the tables. We get something out of the table with
\function{retrieve\_x}, where x is the name of a type
(\texttt{block\_cipher}, \texttt{hash}, etc). This returns a const
pointer to the single unique instance of the algorithm that the lookup
tables know about. If it doesn't know about it, it falls back on
calling a function called \function{try\_to\_get\_x}. These functions
live in \filename{src/algolist.cpp}. They are mostly used to handle
algorithms which need (or at least can have) arguments passed to them,
like \type{HMAC} and \type{SAFER\_SK}. It will return NULL if it can't
find the algorithm at all.

When it's asked for an algorithm it doesn't know about (ie, isn't in
the mapping tables), the retrieval functions will ask the try-to-get
functions if \emph{they} know about it. If they do, then the object
returned will be stored into the table for later retrieval.

The functions \function{get\_x} call the retrieval functions. If we
get back NULL, an exception is thrown. Otherwise it will call the
\function{clone} method to get a new copy of the algorithm, which it
returns.

The various functions like \function{output\_length\_of} call the
retrieval function for each type of object that the parameter in
question (in this case, \texttt{OUTPUT\_LENGTH}) might be meaningful
for. If it manages to get back an object, it will return (in this
case) the \texttt{OUTPUT\_LENGTH} field of the object. No allocations
are required to call this function: all of its operations work
directly on the copies living in the lookup tables.

\section{Allocators}

A big (slow) mess.

\section{BigInt}

Read ``Handbook of Applied Cryptography''.

\section{PEM/BER Identification}

We have a specific algorithm for figuring out if something is PEM or
BER. Previous versions (everything before 1.3.0) requried that the
caller specify which one it was, and they had to be right. Now we use
a hueristic (aka, an algorithm that sometimes doesn't work right) to
figure it out. If the first character is not 0x30 (equal to ASCII
'0'), then it can't possibly be BER (because everything we care about
is enclosed in an ASN.1 SEQUENCE, which for BER/DER is encoded as
beginning with 0x30). Roughly 99.9% of PEM blocks \emph{won't} have a
random 0 character in front of them, so we are mostly safe (unless
someone does it on purpose, in which case, please hit them for me).
But to be sure, if there is a 0, then we search the first \emph{N}
bytes of the block for the string ``-----BEGIN ``, which marks the
typical start of a PEM block. The specific \emph{N} depends on the
variable ``base/pem\_search'', which defaults to 4 kilobytes.

So, you can actually fool it either way: that a PEM file is really
BER, or that a BER file is actually PEM. To fool it that a BER file is
PEM, just have the string ``-----BEGIN `` somewhere (I can't imagine
this string shows up in certificates or CRLs too often, so if it is
there it means somebody is being a jerk). If a file starts with 0 and
has at least ``base/pem\_search'' byte more junk in the way, it won't
notice that its PEM at all. In either case, of course, the loading
will fail, and you'll get a nice exception saying that the decoding
failed.

\end{document}