FreeS/WAN -- KLIPS HARDWARE ACCELERATION NOTES (draft 2)
=============================================================================
Fri Mar 2 23:14:47 EST 2001

This is a rework of the first draft; if you are interested in how it
progressed, go and read the original at
http://www.jukie.net/~bart/linux-ipsec/

In this revision I have removed some of the reasoning behind why the design
is the way it is. There are some clarifications and more points on the
current KLIPS. As before, no attention is given to the specifics of the
implementation. I have heard no rebuttals yet, but if you are interested in
some of the backing for this, read the first draft.

== KLIPS IN PARTS ==

So the first thing to mention is that KLIPS will be split. The parts that
will be created will better define the different jobs that KLIPS performs.
Each separate part will have a well defined interface - this will allow for
independent updates of each without disturbing the rest. The parts would be:

  1) tunnel processing engine (tunnel database and pfkey interface)
  2) protocol processing engine (ESP/AH packet mangling)
  3) crypto processing engine (crypto functionality)

The tunnel processing engine will remain almost unchanged from its current
form. The protocol processing engine will call on the tunnel engine to
locate an SA structure given the skb containing the packet (or possibly
given the parsed parts of the packet that describe the tunnel, like the two
IPs, etc). The protocol processing engine will rely on the crypto engine to
do the - drum roll please - crypto operations.

== CRYPTO ENGINE ==

The crypto engine contains functions like enc_3des, dec_3des, sign_sha1,
verify_sha1, etc. KLIPS comes with software implementations of these
functions in the crypto engine. To make things a bit easier the prototypes
of each of the enc_* and dec_* functions will be defined using a template.
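
As a rough illustration only, such a template could be a single function
pointer type that every enc_* and dec_* function matches. Everything named
here (cipher_fn_t, enc_toy, dec_toy, run_cipher, the argument list) is
hypothetical and not taken from KLIPS, and the XOR "cipher" is a stand-in
that is not real cryptography:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical template: every enc_* and dec_* function has this shape. */
typedef int (*cipher_fn_t)(void *ctx, const unsigned char *src,
                           unsigned char *dst, size_t len);

/* Toy stand-in for enc_3des: XOR with a one-byte key.  NOT real crypto. */
static int enc_toy(void *ctx, const unsigned char *src,
                   unsigned char *dst, size_t len)
{
    unsigned char key = *(const unsigned char *)ctx;
    size_t i;

    for (i = 0; i < len; i++)
        dst[i] = src[i] ^ key;
    return 0;
}

/* Toy stand-in for dec_3des.  XOR is its own inverse, but a separate
 * function is kept so the caller never relies on that. */
static int dec_toy(void *ctx, const unsigned char *src,
                   unsigned char *dst, size_t len)
{
    return enc_toy(ctx, src, dst, len);
}

/* A consumer that knows only the template, not the algorithm behind it. */
static int run_cipher(cipher_fn_t fn, void *ctx, const unsigned char *src,
                      unsigned char *dst, size_t len)
{
    assert(fn != NULL);
    return fn(ctx, src, dst, len);
}
```
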
This means that a consumer of the crypto engine can hold a pointer to a
crypto function and, without knowing its specifics, will be able to encrypt
using it. This is important for the protocol engine; read on.

What is interesting about these functions is that, along with a variety of
error conditions, they can return two kinds of success states. The reason
for this is that the caller cannot be guaranteed that the job is finished
when the function returns. So if the encryption task failed, a negative
value is returned; if the encryption task was completed, a value of FINISHED
(==0) is returned; finally, if the encryption was scheduled on a crypto
accelerator, then DETACHED (==1) is returned (yes, the idea is borrowed from
netfilter). Of course the context must at some point return to the crypto
engine via some callback. This is handled by the protocol engine; read on.

I think that the obvious values that must be given to an encryption function
are the source data pointer, destination data pointer, the data length, the
key pointer, and the key length. These values don't all have to be passed in
when the encryption needs to happen; instead I favor being able to
initialize a key structure of some kind which will keep the state of the
encryption as it progresses (say if RC4 or a similar algorithm is ever used
for encryption). This key structure would be initialized when the SA is
created, used during the lifetime of the key, and destroyed when the SA
expires or is removed.

I am going on the assumption that no symmetric algorithm in use today has a
source data length that differs from the destination length. I may be
completely wrong on this, but for now I will stick with them being passed in
as just one length. On a similar note, authentication functions in IPSec
always generate a signature of the same length (12 bytes), so the length of
the destination of a signing operation also need not be specified.
Thus, as in encryption, we have the source data, destination data, data
length, key data, and key length as operands for any given authentication
function.

Because of the introduction of key initialization I think we need to note
the need for more functions. I propose we have an init_3des, clean_3des,
etc., for each of the encryption and authentication algorithms. These
functions would initialize and clean up a structure holding the internal
variables needed for that algorithm; e.g. MD5 state. Once again these would
be called at SA creation time, and there will have to be some creative way
of making these functions not allocate anything but use the TBD space. But
for now no specifics. :)

== PROTOCOL ENGINE ==

This engine handles the ESP and AH protocols. It has two entry points:
inbound and outbound. And, once again, the objective is to abstract a
protocol handler away from being a specific AH or ESP one. When a protocol
handler's function is called, it's because a specific tunnel was configured
to use it.

When creating a protocol handler you must specify the algorithms to use and
their keys. The init_esp and init_ah functions will in turn call the init_*
functions in the appropriate crypto engine, as defined by the algorithms
specified. It would also be possible to specify whether the algorithm should
be a software one or a hardware one, and so to decide a hardware/software
policy based on connection type. More on this later.

If we have a hardware accelerator that is capable of doing protocol
processing, the hardware module would register a hook for an ESP or AH
handler (or both).

== TUNNEL ENGINE ==

This part will remain in its current incarnation. The engine will handle all
of the PFKey interface as well as ipsec_rcv and ipsec_tunnel. However, in
the processing of skb's most of the work will be moved to the protocol
engine.

Currently in the KLIPS code, the ipsec_rcv and ipsec_tunnel functions lock
the tdb entry while working on that SA.
This is OK for a single processor implementation running only in software.
However, in some instances the tdb may be used by multiple packets
concurrently. So this lock will have to go away in a KLIPS that is hardware
accelerated (or even wants to use two CPUs for a single but very-very-busy
tunnel). I recommend using a 'use count' and a 'delete bit', as outlined in
Rusty's Unreliable Locking Guide, to prevent a tdb from being deleted before
its time is up.

== PACKET PATH IN SOFTWARE ==

Now a question: what happens when a packet arrives?

Well, the tunnel engine locates the appropriate tunnel entry in the tunnel
database. It then calls the appropriate protocol engine (ESP or AH, as
pointed to by the tdb entry, which was set at SA creation) to process it.
The protocol engine selects the appropriate crypto function (using a pointer
that was set during SA creation) and executes it. On completion of the
crypto function the packet is returned for further packet processing, like
another crypto function. Once the packet processing is finished the packet
is returned to the tunnel processor, which may decide that another tunnel is
appropriate for this packet, and the cycle continues.

Here is the above in a diagram (for the software case):

       [tunnel processing]  [protocol processing]  [crypto processing]
  PLUTO
  |
  | tdb_init(tdb, ...)
  |------------>|
  |             | (pp_t*)pp_init(spi,
  |             |                e_alg,e_key,e_len,
  |             |                a_alg,a_key,a_len)
  |             |-------------------->|
  |             |                     | (enc_t*)enc_init(dir,key,len)
  |             |                     |-------------------->|
  |             |                     |pp->enc <------------|
  |             |                     |
  |             |                     | (auth_t*)auth_init(dir,key,len)
  |             |                     |-------------------->|
  |             |                     |pp->auth <-----------|
  |             |tdb->pp <------------|
  |rc <---------|

  NIC
  | ipsec_rcv(skb)
  |
  | tdb_lookup(spi,...)
  |------------>|
  |tdb <--------|
  | tdb->pp.fn(tdb->pp,skb)
  |---------------------------------->|
  |                                   |
  |                                   | pp->enc.fn(pp->enc,
  |                                   |            skb+src_ofs,
  |                                   |            src_len,
  |                                   |            skb+dst_ofs,
  |                                   |            dst_len)
  |                                   |-------------------->|
  |                                   |rc <-----------------|
  |                                   |
  |                                   | pp->auth.fn(pp->auth,
  |                                   |             skb+src_ofs,
  |                                   |             src_len,
  |                                   |             skb+dst_ofs,
  |                                   |             dst_len)
  |                                   |-------------------->|
  |                                   |rc <-----------------|
  |rc <-------------------------------|
  |
  | (loop if unmangled packet is an ESP/AH)
  |
  | netif_rx(skb)

Anyway, you get the picture. Here are some finer points:

  * ipsec_rcv will loop for nested tunnels/SA's
  * ipsec_tunnel will be similar to ipsec_rcv
  * tdb->pp.fn points to process_esp or process_ah depending on SA.
  * pp->enc.fn points to dec_3des.
  * pp->auth.fn points to verify_md5 or verify_sha1.

The point is that each layer makes no guesses about how the next one works.
For example, we know that des encrypt and des decrypt are the same operation
with a different key schedule. However, we do not use this and have two des
functions. The advantage of this is that if a new function comes along that
cannot use this trick we will still be able to use the layer preceding it;
i.e. the processing layer.

== PACKET PATH IN HARDWARE ==

The major issue with hardware acceleration is this: since most of KLIPS runs
in bottom-half time (initiated by the ISR of the NIC driver) it cannot sleep
and wait for a device to complete its computations. Even if it could, we
would not want to do this since we have better things to do; like servicing
other user tasks, etc.

What we must do when we dispatch the job to the crypto device is inform the
IP stack that the packet was stolen and then detach. This is quite
important; if we just returned from ipsec_rcv then the skb could be deleted
while we were still working on it.

Here is my interpretation of what will happen if you add hardware to the
above diagram.

       [tunnel processing]  [protocol processing]  [crypto processing]
  NIC
  | ipsec_rcv(skb)
  |
  | tdb_lookup(spi,...)
  |------------>|
  |tdb <--------|
  |
  |job = alloc_skb(tdb->pp.jobsize)
  |job->skb = skb;
  |job->tdb = tdb;
  | tdb->pp.fn(tdb->pp,job)
  |---------------------------------->|
  |                                   | pp->enc.fn(pp->enc,
  |                                   |            job->skb+src_ofs,
  |                                   |            src_len,
  |                                   |            job->skb+dst_ofs,
  |                                   |            dst_len)
  |                                   |-------------------->|
  |                                   |                     |----> dispatch H/W
  |                                   |rc <-----------------|
  |rc <-------------------------------|

  [ millions of nanoseconds later in code near by... ]

  H/W interrupt
  | ipsec_callback(job)
  |
  | job->tdb->pp.fn(job->tdb->pp,job)
  |---------------------------------->|
  |                                   | pp->auth.fn(pp->auth,
  |                                   |             job->skb+src_ofs,
  |                                   |             src_len,
  |                                   |             job->skb+dst_ofs,
  |                                   |             dst_len)
  |                                   |-------------------->|
  |                                   |                     |----> dispatch H/W
  |                                   |rc <-----------------|
  |rc <-------------------------------|

  [ millions of nanoseconds later in code near by... ]

  H/W interrupt
  | ipsec_callback(job)
  |
  | job->tdb->pp.fn(job->tdb->pp,job)
  |---------------------------------->|
  |rc <-------------------------------|
  |
  | (may need to repeat ipsec_rcv if unmangled packet is an ESP/AH)
  |
  | netif_rx(job->skb)
  | free_skb(job)

A bit more involved but pretty much the same idea. The difference is that
the continuity is broken twice; once to do encryption and once to do
authentication. Here is a list of notes for this diagram:

  * job is a buffer that stores the information about one transaction; it is
    used internally to store all local variables that need to be kept around
    between the dispatch of the operation and the matching interrupt.
  * jobsize is calculated ahead of time during SA creation and contains the
    number of bytes used in the protocol processor and crypto processor.

It is worth mentioning that there may be multiple jobs per tdb. This will
happen most frequently on the receiver if the sender is much faster and can
swamp its counterpart. For this reason we keep a separate 'job' structure
for each packet that comes into KLIPS.
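
To make the detach-and-resume flow above a bit more concrete, here is a
minimal sketch, not actual KLIPS code. It assumes the job records which step
comes next, so that the same protocol handler can be re-entered from the
hardware interrupt. The layout of struct job, the stage enum, and the names
pp_process, sw_op and hw_op are all hypothetical; only FINISHED, DETACHED
and ipsec_callback come from the notes above.

```c
#include <assert.h>
#include <stddef.h>

#define FINISHED 0  /* work completed before the call returned */
#define DETACHED 1  /* work handed to hardware; a callback will follow */

enum job_stage { STAGE_ENC, STAGE_AUTH, STAGE_DONE };

struct job;
typedef int (*crypto_fn_t)(struct job *job);

/* Hypothetical per-packet job: state kept between dispatch and interrupt. */
struct job {
    enum job_stage stage;   /* next step to run for this packet */
    crypto_fn_t enc;        /* stands in for pp->enc.fn */
    crypto_fn_t auth;       /* stands in for pp->auth.fn */
    void *skb;              /* the packet (opaque in this sketch) */
    int dispatched;         /* how many steps went to "hardware" */
};

/* Software stand-in: the work is done by the time we return. */
static int sw_op(struct job *job)
{
    (void)job;
    return FINISHED;
}

/* Hardware stand-in: queue the work and return immediately. */
static int hw_op(struct job *job)
{
    job->dispatched++;
    return DETACHED;
}

/* Stands in for tdb->pp.fn: run steps until one detaches or fails. */
static int pp_process(struct job *job)
{
    while (job->stage != STAGE_DONE) {
        crypto_fn_t fn;
        int rc;

        /* Advance the stage before the call so that the callback
         * resumes at the following step. */
        if (job->stage == STAGE_ENC) {
            fn = job->enc;
            job->stage = STAGE_AUTH;
        } else {
            fn = job->auth;
            job->stage = STAGE_DONE;
        }

        rc = fn(job);
        if (rc != FINISHED)
            return rc;      /* DETACHED or a negative error */
    }
    return FINISHED;        /* caller may now hand the skb to netif_rx */
}

/* Called from the accelerator's completion interrupt. */
static int ipsec_callback(struct job *job)
{
    assert(job != NULL);
    return pp_process(job);
}
```

With two software steps pp_process returns FINISHED on the first call; with
two hardware steps it returns DETACHED twice and FINISHED on the third
entry, which matches the two breaks in continuity shown in the diagram.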

== HARDWARE ACCELERATION POLICY ==

As briefly mentioned above, you should be able to assign a crypto engine -
be it software or an instance of a hardware accelerator - to a connection or
even to an SA.

For example, say you have a massive security gateway (S.G.) with a fair
number of static connections and a large number of possible road-warriors
that may connect. You feel that the static connections are more important,
so you allocate a hardware engine to them but not to the %default
connection. Thus each road-warrior would get the default engine - i.e.
software.

Say that in this box we have two accelerators named alpha and beta. Alpha is
capable of performing whole packet processing for esp only. Beta is only
able to perform 3des; a sample configuration could look like this:

    config setup
        hwload=alpha,beta
        ...

    conn %default
        doesp=software
        doah=software
        do3des=software
        domd5=software
        dosha1=software
        ...

    conn static-1
        doesp=alpha
        ...

    conn static-2
        do3des=beta
        ...

    conn roadwarrior
        ...

In the above example alpha and beta are names of drivers that reside in
/lib/modules/your.kernel.ver/kernel/net/ipsec alongside ipsec.o. Thus after
loading ipsec.o, but before setting up any tunnels, the setup script will
have to check the hwload variable and load the modules listed there. These
modules are most likely just wrappers around crypto drivers and will require
that the main driver be loaded as well - modutils takes care of that for us.

We have discussed signing crypto engines for loading into freeswan and
decided against it, as loading them requires root privileges. If you have
root you can do it all anyway.

== ==

As always... comments and critique are most appreciated.

Regards,
Bart Trojanowski

=============================================================================
Ideas mentioned above are a copyright of Bart Trojanowski. If available, an
updated version of this document can be found at:
http://www.jukie.net/~bart/linux-ipsec/