MPFQ : Fast Finite fields

MPFQ is a software library intended for manipulation of finite fields. The announced objectives of MPFQ are both speed and usability. It is well-known that such objectives are hard to conciliate. We acknowledge a clear bias towards speed, since no concession is made in this regard. However the interface of MPFQ is expected to become sufficient for most intended uses.

The prominent strength of MPFQ is to deliver top-notch performance for finite fields of moderate size when some characteristics (e.g., modulus size) are known in advance. This is a very common application. Existing libraries so far offer the possibility to completely configure the finite fields used at runtime. For this reason alone, serious evaluation of the speed of an algorithm on a precise finite field often requires reimplementing the finite field layer.

MPFQ achieves good performance for predefined fields by providing automatically generated special-purpose code. A large portion of the MPFQ library is therefore written in Perl.

The two main targets of MPFQ are the Galois fields $\mathbb{F}_{2^n}$ and $\mathbb{F}_p$ with $p$ prime. Extensions of prime fields of characteristic other than $2$ will be supported at some point for completeness, but they are not the primary target for optimization.

Specializing code for a given field

An important feature of MPFQ is to allow the user to implement a specialization of a code in order to take advantage of optimizations that are available only for a field or a small class of fields. For instance, once you have an efficient implementation for prime fields for which the modulus fits in 2 machine words, you might have an application that deals with $\mathbb{F}_p$ where $p$ is a Mersenne-like prime for which a very fast reduction algorithm is available. Adding such an optimized reduction is not too hard in MPFQ, and you will get the full speed, since no test is done at runtime: you will have another C-type and new function names for this new field additionally to the previous general ones.

In principle, it is possible to add optimizations written in the C language or in assembly.

Generalizing code for a large set of fields

The MPFQ framework allows also to write generic, high level, code, that will have all the tests and branches at runtime. This is not the first goal, but it might be convenient to have this kind of setting at some point.

In the same spirit, it could be possible to have a C++ wrapper, to obtain all the coding facilities of object-oriented programming, and overriding operators. Again, this is not the priority, and we let this for the future.

Reporting bugs, comments, feature requests...

We acknowledge that it is not unlikely that there are many bugs hidden in parts that the authors do not often use. You can send your bug report, comments and feature requests to the following mailing list that is read by the authors:

        mpfq-discuss@lists.gforge.inria.fr

There are things that we are probably going to fix rapidly, and others that we are going to find non-prioritary. A typical thing that we do not plan to work on is the portability to other architectures, systems, compilers.

On the other side, if you have an application that is time-critical and that uses a particular field, tell us! We might find it fun to optimize it for you.

Chapter 2. Compile and install

Table of Contents

Hardware, system and software requirements

Generating C source and header files

Installing.

Optional targets

Examples
Targets for checking
Targets for benching
Targets for tuning

Hardware, system and software requirements

The MPFQ library is currently developped mostly for the x86_64 architecture (Intel's Core2 and AMD's Opteron processors). In principle, there is a plain C fallback implementation based on GMP for each function; so any hardware supported by GMP should be ok.

As for the operating system, the library is developped under GNU-Linux; it is expected to be not too hard to port to other Unices, but this is not thoroughly tested.

MPFQ requires some tools for the automatic code generation, and for the compilation:

Perl. Tested with versions 5.8.8 and higher
GCC. Better choose a version 4.x or higher. Beware. GCC versions 4.3.0 and 4.3.1 suffer from a severe bug (numbered 37101 in GCC's bug tracking system) which affects SSE-2 code generation. This may impact generated code for $\mathbb{F}_{2^n}$ unpredictably, the outcome being wrong generated code. For this reason, these compiler versions should be avoided. The bug in question is fixed in GCC 4.3.2.
GMP. Tested with version 4.2.2 and higher. Must be compiled without nails. If you give the location of gmp-impl.h and longlong.h, this might yield a speed up to MPFQ.
CMake. Last tested with 2.8.9 ; should work with most recent versions as well.

Building the documentation (which you are currently reading, so probably of little interest) requires some additional tools:

A complete tool chain for generating docbook documents. This includes in particular the xmllint and xsltproc programs, as well as the standard DocBook DTDs and style sheets. The corresponding packages are available in most Linux distributions.
Tools for parsing latex-style math text, and converting this to png images. The current scripts included with MPFQ achieve this goal using the following programs:
- latex
- ImageMagick

Generating C source and header files

First thing is to get a copy of MPFQ from http://www.mpfq.org and to uncompress it to get the sources in a path that we call in the following path_to_mpfq.

With CMake, out-of-source compiling is mandatory, so create a directory to contain the build. Let's call it build_dir in the sequel.

Go to build_dir, run cmake and make:

$ mkdir build_dir
$ cd build_dir
$ cmake path_to_mpfq/src
$ make

This will create in build_dir a directory tree similar to that of path_to_mpfq/src containing a lot of CMake junk, macro-generated source files and compiled files. It is reasonable to pass the -j to the make, in order to use the mulicore / hyperthreaded capabilities of your computer to speed-up this rather long compilation.

Installing.

Typing make install will copy all the required header files in prefix/include/mpfq and libmpfq_gfp.a, libmpfq_gfpe.a and libmpfq_gf2n.a into prefix/lib. The default value for prefix is /usr/local.

Compiling options, including this prefix parameter can be changed using the mechanism provided by CMake, namely typing ccmake . to call a GUI, or calling CMake with the command cmake path_to_mpfq/src -DCMAKE_INSTALL_PREFIX=your_prefix

Furthermore, CMake allows for the typicall ``staged install'' mechanism, by accepting a DESTDIR= argument to the make install command.

Note that once the files are installed, your applications can use them in a classical way, without using CMake.

Optional targets

You can optionally run make with the following targets:

Examples

apps Build the example applications given in the src/apps subdirectory of the distribution.

Targets for checking

check Run all tests listed below.
check-TAG, e.g. check-2_113 Run some test for the arithmetic of one field implementation, $\mathbb{F}_{2^{113}}$ in this example.
check-gfp Run some tests for prime field arithmetic.
check-gfpe Run some tests for extensions of prime field arithmetic.
check-g2n Run some tests for characteristic 2 field arithmetic.
check-apps Run some tests for the example applications.

Targets for benching

bench Measure the running time of basic operations in all implemented fields (takes a long time).
mpfqbench Run a bench that returns a single score for the whole library.
bench-apps Bench applications.

Targets for tuning

The following targets are related to the tuning mechanism documented in the next chapter:

tune-gf2n
tune-TAG

Chapter 3. Tuning

Table of Contents

Build targets for tuning
Exporting and importing tuning results

Build targets for tuning

Parts of MPFQ allow tuning: different possible codes are generated, and the fastest one is marked as preferred for later code generations.

Presently, the only part of MPFQ which allows this mechanism is the multiplication of binary polynomials. Hard-coded defaults offer satisfactory performance, but tuning helps in improving the speed. Possible targets are:

tune-gf2n Run tuning for all binary field sizes configured.
tune-TAG, e.g. tune-0-to-64 Perform tuning for the specified field sizes. The tuning results are saved into the file gf2x/wizard.table which serves as a cache database if the tuning target is run again. Type make help | grep tun to see the list of available tuning targets.

Notice that in order to observe the possible speedup gained from tuning, you must rebuild the affected target. The CMake dependency system should normally see, if you run e.g. make check after tuning, that binaries have to be rebuilt so that the newly chosen code is used.

A word of caution concerning tuning. The tuning process is not immune to speed measurement difficulties. The different possible codes are generated as function in a separate object file, and compared with respect to each other. This is not the same context as your favorite application, which in particular is likely to include the MPFQ functions as inlined code. Future releases of MPFQ will likely try to adopt a different approach to tuning.

A consequence of these measurement difficulties is that if you run make bench-apps, make tune-gf2n, in order, and make bench-apps again, you might not see as noticeable an improvement for $\mathbb{F}_{2^{251}}$ as the tuning code claims. In contrast, the tuning benefit for $\mathbb{F}_{2^{113}}$ is visible.

Exporting and importing tuning results

Tuning results are meaningful across identical CPUs. Therefore, it makes sense to share tuning results across different builds. For this reason, exporting and importing the tuning table is possible.

The file gf2x/wizard.table of the MPFQ build tree can be exported, after tuning, if you want to save your measurements.

Likewise, it is possible to teach the MPFQ build system into using an imported tuning table. For this purpose, after having initiated the build directory by running cmake /path/to/mpfq/source/dir, you may copy your tuning file to the location /path/to/mpfq/source/dir/gf2x/wizard.table.

Chapter 4. Writing applications using MPFQ

Table of Contents

Organization of MPFQ code
Names of MPFQ functions
General API of an MPFQ field
API for fields of characteristic two
API for prime fields in Montgomery representation
API for polynomials
Examples

Organization of MPFQ code

There are two important notions in MPFQ: the api_extension and the FieldFamily (also called the TAG). Their meanings are the following:

In MPFQ, there is a basis API. In some cases, one might want to define more funtions. We call api_extension a package of functions that we can choose to add to the basis API. For instance, CHAR2 is an api_extension containing functions we want for fields of characteristic 2. MGY is another api_extension for fields in Montgomery representation.
A FieldFamily, or a TAG corresponds to an implementation of a (possibly reduced to one element) family of fields. Each FieldFamily have the feature of the basis API plus the api_extensions that were chosen to make his use more convinient. Examples of FieldFamilys are 2_128 that is the field GF(2^{128}), pm_3 that is for the family of prime fields for which the modulus fits in 3 words and elements are in Montgomery representation.

Available api_extensions:

CHAR2: fields of characteristic 2
MGY: prime fields in Montgomery representation
POLY: package that provides univariate polynomials with coefficients in the field

Available FieldFamily:

2_x for x in [2..255]. This gives the finite field GF(2^x) in polynomial representation with a hard-coded (sparse) defining polynomial.
p_x for x in [1..9]. This family implements any prime field for which p fits in x machine words. The elements are stored in classical representation.
p_x_5 for x in [0..8]. This family implements any prime field for which p fits in x and a half machine words. The elements are stored in classical representation.
p_25519. This implements the field GF(2^{255}-19).
p_127_1. This implements the field GF(2^{127}-1).
p_127_735. This implements the field GF(2^{127}-735).
pm_x for x in [1..9]. This family implements any prime field for which p fits in x machine words. The elements are stored in Montgomery representation.
pm_x_5 for x in [0..8]. This family implements any prime field for which p fits in x and a half machine words. The elements are stored in Montgomery representation.
pf_e where pf is one of the four TAGs p_x, p_x_5, pm_x or pm_x_5. This family implements an extention of the base field implemented through pf. The elements are stored as polynomial with coefficent in the base field.

The FieldFamilys 2_x implement the api_extension CHAR2, pm_x and pm_x_5 implement MGY, and all FieldFamily above implement the api_extension POLY.

To each FieldFamily will correspond a C-type for fields and elements, and a set of functions following the basis API and eventually the api_extension s this family requires. All of them are gathered into a .c and a .h files named:

        mpfq_FieldFamily.c   mpfq_FieldFamily.h

Most of the code will actually be inside the .h file, so that the functions get inlined during compilation.

Names of MPFQ functions

The names of the functions in the MPFQ library (tend to) follow the following pattern: mpfq_FieldFamily_ObjectClass_Operation(), where

FieldFamily is a string (also called TAG) that defines the family of fields for which this function is available. Examples: 2_128, p_3, pm_3_5_e.
ObjectClass is a string that gives the class of objects the operation is acting on. It can be field, poly, vec ... It is omitted if the objects are elements in the base field.
Operation is a string that identifies the operation to be performed. Examples: mul, init, print ...

For instance, the function mpfq_2_19_vec_init() is for initializing vectors of elements over the field $\mathbb{F}_{2^{19}}$ .

The function mpfq_p_2_mul() is for multiplying elements of prime fields, for which the prime p fits in two machine word and coded in classical representation.

General API of an MPFQ field

All finite fields in MPFQ follow the same API that is defined in the file api.pl shipped with the MPFQ distribution. We reproduce it here, but if you have doubts about a possible mistake in this doc, please refer to the api.pl file within the source tree. This file is responsible for the enforcement of the calling parameters in the generated code and therefore closest to what is actually implemented.

In what follows, we assume that the FieldFamily is TAG.

Exported types:

    typedef mpfq_TAG_field;
    typedef mpfq_TAG_dst_field;
    
    typedef mpfq_TAG_elt;
    typedef mpfq_TAG_dst_elt;
    typedef mpfq_TAG_src_elt;
    
    typedef mpfq_TAG_elt_ur;
    typedef mpfq_TAG_dst_elt_ur;
    typedef mpfq_TAG_src_elt_ur;

    typedef mpfq_TAG_vec;
    typedef mpfq_TAG_dst_vec;
    typedef mpfq_TAG_src_vec;

    typedef mpfq_TAG_vec_ur;
    typedef mpfq_TAG_dst_vec_ur;
    typedef mpfq_TAG_src_vec_ur;

The field and dst_field types are for storing the current finite field. In some cases, the TAG already defines a unique finite field, however, even in that case, it is mandatory to have such a type, for consistency of the interface. An element of the finite field can be declared using the elt type. The elt type comes with two friends which give the types for passing elements as arguments of functions. The src variant translates into the const keyword. An unreduced element can be stored in a variable of type elt_ur. An unreduced element is a sum of products of two elements, for which the simplification has not been performed: reduction modulo p for prime fields in classical representation, montgomery reduction for prime fields in montgomery representation, and polynomial division for extension field like 2_n or TAG_e . Only a few functions are available for manipulating unreduced elements.

The following functions are available. All of them take a pointer to the finite field as first element. In all cases, the same element can be given in input and ouput. In general, the name is explicit enough, so that for most of them no explanation is really needed.

    /* Functions related to the field structure */
    void mpfq_TAG_field_init(mpfq_TAG_dst_field);
    void mpfq_TAG_field_clear(mpfq_TAG_dst_field);
    void mpfq_TAG_field_specify(mpfq_TAG_dst_field, unsigned long, void *);
    void mpfq_TAG_field_setopt(mpfq_TAG_dst_field, unsigned long, void *);
    int mpfq_TAG_field_degree(mpfq_TAG_dst_field);
    void field_characteristic(mpfq_TAG_dst_field, mpz_t);

Calling the field_init function is mandatory before doing anything with the field. In the case where TAG identifies a unique field, there is nothing more to do, but if TAG covers several finite fields, it is necessary to specify the particular finite field you really want to work in. The field_specify is meant for that purpose. The unsigned long argument describes the kind of specification. It can be MPFQ_PRIME_MPZ or MPFQ_POLYNOMIAL. The last argument identifies the finite field your want. For MPFQ_PRIME_MPZ, you must pass an mpz value.

The field_setopt has the same behaviour: the unsigned long argument gives the kind of optimization you want, and the last argument allows you to pass some information to describe the optimization. Note that this mechanism is intended to allow some optimization or precomputation made at runtime, and is usually not used for a TAG that describes a single field or a small family of fields for which the optimizations and precomputations can be made at compile-time.

    /* Element allocation functions */
    void mpfq_TAG_init(mpfq_TAG_dst_field, mpfq_TAG_elt *);
    void mpfq_TAG_clear(mpfq_TAG_dst_field, mpfq_TAG_elt *);

    /* Element assignment functions */
    void mpfq_TAG_set(mpfq_TAG_dst_field, mpfq_TAG_dst_elt, mpfq_TAG_src_elt);
    void mpfq_TAG_set_ui(mpfq_TAG_dst_field, mpfq_TAG_dst_elt, unsigned long);
    void mpfq_TAG_set_zero(mpfq_TAG_dst_field, mpfq_TAG_dst_elt);
    unsigned long mpfq_TAG_get_ui(mpfq_TAG_dst_field, mpfq_TAG_src_elt);
    void mpfq_TAG_set_mpn(mpfq_TAG_dst_field, mpfq_TAG_dst_elt, mp_limb_t *, size_t);
    void mpfq_TAG_set_mpz(mpfq_TAG_dst_field, mpfq_TAG_dst_elt, mpz_srcptr);
    void mpfq_TAG_get_mpn(mpfq_TAG_dst_field, mp_limb_t *, mpfq_TAG_src_elt);
    void mpfq_TAG_get_mpz(mpfq_TAG_dst_field, mpz_ptr, mpfq_TAG_src_elt);
    void mpfq_TAG_random(mpfq_TAG_dst_field, mpfq_TAG_dst_elt, gmp_randstate_t);
    void mpfq_TAG_random2(mpfq_TAG_dst_field, mpfq_TAG_dst_elt, gmp_randstate_t);

Note that the randomness is taken from the GMP random functions. We refer to the GMP documentation for initializing and seeding the gmp_randstate_t.

    /* Arithmetic operations on elements */
    void mpfq_TAG_add(mpfq_TAG_dst_field, mpfq_TAG_dst_elt, mpfq_TAG_src_elt, mpfq_TAG_src_elt);
    void mpfq_TAG_sub(mpfq_TAG_dst_field, mpfq_TAG_dst_elt, mpfq_TAG_src_elt, mpfq_TAG_src_elt);
    void mpfq_TAG_neg(mpfq_TAG_dst_field, mpfq_TAG_dst_elt, mpfq_TAG_src_elt);
    void mpfq_TAG_mul(mpfq_TAG_dst_field, mpfq_TAG_dst_elt, mpfq_TAG_src_elt, mpfq_TAG_src_elt);
    void mpfq_TAG_sqr(mpfq_TAG_dst_field, mpfq_TAG_dst_elt, mpfq_TAG_src_elt);
    void mpfq_TAG_pow(mpfq_TAG_dst_field, mpfq_TAG_dst_elt, mpfq_TAG_src_elt, unsigned long *, size_t);
    int mpfq_TAG_is_sqr(mpfq_TAG_dst_field, mpfq_TAG_src_elt);
    int mpfq_TAG_sqrt(mpfq_TAG_dst_field, mpfq_TAG_dst_elt, mpfq_TAG_src_elt);
    void mpfq_TAG_frobenius(mpfq_TAG_dst_field, mpfq_TAG_dst_elt, mpfq_TAG_src_elt);
    void mpfq_TAG_add_ui(mpfq_TAG_dst_field, mpfq_TAG_dst_elt, mpfq_TAG_src_elt, unsigned long);
    void mpfq_TAG_sub_ui(mpfq_TAG_dst_field, mpfq_TAG_dst_elt, mpfq_TAG_src_elt, unsigned long);
    void mpfq_TAG_mul_ui(mpfq_TAG_dst_field, mpfq_TAG_dst_elt, mpfq_TAG_src_elt, unsigned long);
    int mpfq_TAG_inv(mpfq_TAG_dst_field, mpfq_TAG_dst_elt, mpfq_TAG_src_elt);
    void mpfq_TAG_hadamard(mpfq_TAG_dst_field, mpfq_TAG_dst_elt, mpfq_TAG_dst_elt, mpfq_TAG_dst_elt, mpfq_TAG_dst_elt);

This last function hadamard is marked as optional in the API, so before using it, one should verify whether it is implemented in the FieldFamilly he wants to use; for all optional calls in the API, a preprocessor macro is defined if the function is provided. This symbol is HAVE_TAG_FunctionName, i.e. in this case HAVE_mpfq_p_1_hadamard.


    /* Operations involving unreduced elements */
    void mpfq_TAG_elt_ur_init(mpfq_TAG_dst_field, mpfq_TAG_elt_ur *);
    void mpfq_TAG_elt_ur_clear(mpfq_TAG_dst_field, mpfq_TAG_elt_ur *);
    void mpfq_TAG_elt_ur_set(mpfq_TAG_dst_field, mpfq_TAG_dst_elt_ur, mpfq_TAG_src_elt_ur);
    void mpfq_TAG_elt_ur_set_elt(mpfq_TAG_dst_field, mpfq_TAG_dst_elt_ur, mpfq_TAG_src_elt);
    void mpfq_TAG_elt_ur_set_zero(mpfq_TAG_dst_field, mpfq_TAG_dst_elt_ur);
    void mpfq_TAG_elt_ur_set_ui(mpfq_TAG_dst_field, mpfq_TAG_dst_elt_ur, unsigned long);
    void mpfq_TAG_mul_ur(mpfq_TAG_dst_field, mpfq_TAG_dst_elt_ur, mpfq_TAG_src_elt, mpfq_TAG_src_elt);
    void mpfq_TAG_sqr_ur(mpfq_TAG_dst_field, mpfq_TAG_dst_elt_ur, mpfq_TAG_src_elt);
    void mpfq_TAG_elt_ur_add(mpfq_TAG_dst_field, mpfq_TAG_dst_elt_ur, mpfq_TAG_src_elt, mpfq_TAG_src_elt);
    void mpfq_TAG_elt_ur_sub(mpfq_TAG_dst_field, mpfq_TAG_dst_elt_ur, mpfq_TAG_src_elt, mpfq_TAG_src_elt);
    void mpfq_TAG_elt_ur_neg(mpfq_TAG_dst_field, mpfq_TAG_dst_elt_ur, mpfq_TAG_src_elt);
    void mpfq_TAG_reduce(mpfq_TAG_dst_field, mpfq_TAG_dst_elt, mpfq_TAG_dst_elt_ur);
    
    /* Comparison functions */
    int mpfq_TAG_cmp(mpfq_TAG_dst_field, mpfq_TAG_src_elt, mpfq_TAG_src_elt);
    int mpfq_TAG_cmp_ui(mpfq_TAG_dst_field, mpfq_TAG_src_elt, unsigned long);
    int mpfq_TAG_is_zero(mpfq_TAG_dst_field, mpfq_TAG_src_elt);

The comparison functions return 0 if the elements are equal and 1 or -1 otherwise, depending on some arbitrary order (this order allows sorting).


    /* Input/output functions */
    void mpfq_TAG_asprint(mpfq_TAG_dst_field, char**, mpfq_TAG_src_elt);
    void mpfq_TAG_fprint(mpfq_TAG_dst_field, FILE*, mpfq_TAG_src_elt);
    void mpfq_TAG_print(mpfq_TAG_dst_field, mpfq_TAG_src_elt);
    int mpfq_TAG_sscan(mpfq_TAG_dst_field, mpfq_TAG_dst_elt, const char*);
    int mpfq_TAG_fscan(mpfq_TAG_dst_field, FILE*, mpfq_TAG_dst_elt);
    int mpfq_TAG_scan(mpfq_TAG_dst_field, mpfq_TAG_dst_elt);
    std::istream& mpfq_TAG_cxx_in(mpfq_TAG_dst_field, std::istream&, mpfq_TAG_dst_elt);
    std::ostream& mpfq_TAG_cxx_out(mpfq_TAG_dst_field, std::ostream&, mpfq_TAG_src_elt);

The function asprint prints the element in the given string, ending with \0 char. The functions sscan, fscan and scan return 1 if the parsing was succesful and 0 otherwise. The functions cxx_in and cxx_out are only exposed if __cplusplus is defined.


    /* Vector allocation functions */
    void mpfq_TAG_vec_init(mpfq_TAG_dst_field, mpfq_TAG_vec *, unsigned int);
    void mpfq_TAG_vec_reinit(mpfq_TAG_dst_field, mpfq_TAG_vec *, unsigned int, unsigned int);
    void mpfq_TAG_vec_clear(mpfq_TAG_dst_field, mpfq_TAG_vec *, unsigned int);

    /* Vector assignment functions */
    void mpfq_TAG_vec_set(mpfq_TAG_dst_field, mpfq_TAG_dst_vec, mpfq_TAG_src_vec, unsigned int);
    void mpfq_TAG_vec_set_zero(mpfq_TAG_dst_field, mpfq_TAG_dst_vec, unsigned int);
    void mpfq_TAG_vec_setcoef(mpfq_TAG_dst_field, mpfq_TAG_dst_vec, mpfq_TAG_src_elt, unsigned int);
    void mpfq_TAG_vec_setcoef_ui(mpfq_TAG_dst_field, mpfq_TAG_dst_vec, unsigned long, unsigned int);
    void mpfq_TAG_vec_getcoef(mpfq_TAG_dst_field, mpfq_TAG_dst_elt, mpfq_TAG_src_vec, unsigned int);
    void mpfq_TAG_vec_random(mpfq_TAG_dst_field, mpfq_TAG_dst_vec, unsigned int, gmp_randstate_t);
    void mpfq_TAG_vec_random2(mpfq_TAG_dst_field, mpfq_TAG_dst_vec, unsigned int, gmp_randstate_t);

    /* Arithmetic operations on vectors */
    void mpfq_TAG_vec_add(mpfq_TAG_dst_field, mpfq_TAG_dst_vec, mpfq_TAG_src_vec, mpfq_TAG_src_vec, unsigned int);
    void mpfq_TAG_vec_neg(mpfq_TAG_dst_field, mpfq_TAG_dst_vec, mpfq_TAG_src_vec, unsigned int);
    void mpfq_TAG_vec_rev(mpfq_TAG_dst_field, mpfq_TAG_dst_vec, mpfq_TAG_src_vec, unsigned int);
    void mpfq_TAG_vec_sub(mpfq_TAG_dst_field, mpfq_TAG_dst_vec, mpfq_TAG_src_vec, mpfq_TAG_src_vec, unsigned int);
    void mpfq_TAG_vec_scal_mul(mpfq_TAG_dst_field, mpfq_TAG_dst_vec, mpfq_TAG_src_vec, mpfq_TAG_src_elt, unsigned int);
    void mpfq_TAG_vec_conv(mpfq_TAG_dst_field, mpfq_TAG_dst_vec, mpfq_TAG_src_vec, unsigned int, mpfq_TAG_src_vec, unsigned int);

    /* Comparison functions */
    int mpfq_TAG_vec_cmp(mpfq_TAG_dst_field, mpfq_TAG_src_vec, mpfq_TAG_src_vec, unsigned int);
    int mpfq_TAG_vec_is_zero(mpfq_TAG_dst_field, mpfq_TAG_src_vec, unsigned int);

    /* Input/output functions */
    void mpfq_TAG_vec_asprint(mpfq_TAG_dst_field, char**, mpfq_TAG_src_vec, unsigned int);
    void mpfq_TAG_vec_fprint(mpfq_TAG_dst_field, FILE*, mpfq_TAG_src_vec, unsigned int);
    void mpfq_TAG_vec_print(mpfq_TAG_dst_field, mpfq_TAG_src_vec, unsigned int);
    int mpfq_TAG_vec_sscan(mpfq_TAG_dst_field, mpfq_TAG_vec *, unsigned int *, const char*);
    int mpfq_TAG_vec_fscan(mpfq_TAG_dst_field, FILE*, mpfq_TAG_vec *, unsigned int *);
    int mpfq_TAG_vec_scan(mpfq_TAG_dst_field, mpfq_TAG_vec *, unsigned int *);

    /* Operations involving unreduced elements */
    void mpfq_TAG_vec_ur_init(mpfq_TAG_dst_field, mpfq_TAG_vec_ur *, unsigned int);
    void mpfq_TAG_vec_ur_reinit(mpfq_TAG_dst_field, mpfq_TAG_vec_ur *, unsigned int, unsigned int);
    void mpfq_TAG_vec_ur_clear(mpfq_TAG_dst_field, mpfq_TAG_vec_ur *, unsigned int);
    void mpfq_TAG_vec_ur_set_zero(mpfq_TAG_dst_field, mpfq_TAG_dst_vec_ur, unsigned int);
    void mpfq_TAG_vec_ur_set_vec(mpfq_TAG_dst_field, mpfq_TAG_dst_vec_ur, mpfq_TAG_src_vec, unsigned int);
    void mpfq_TAG_vec_ur_set(mpfq_TAG_dst_field, mpfq_TAG_dst_vec_ur, mpfq_TAG_src_vec_ur, unsigned int);
    void mpfq_TAG_vec_ur_setcoef(mpfq_TAG_dst_field, mpfq_TAG_dst_vec_ur, mpfq_TAG_src_elt_ur, unsigned int);
    void mpfq_TAG_vec_ur_getcoef(mpfq_TAG_dst_field, mpfq_TAG_dst_elt_ur, mpfq_TAG_src_vec_ur, unsigned int);
    void mpfq_TAG_vec_ur_add(mpfq_TAG_dst_field, mpfq_TAG_dst_vec_ur, mpfq_TAG_src_vec_ur, mpfq_TAG_src_vec_ur, unsigned int);
    void mpfq_TAG_vec_ur_neg(mpfq_TAG_dst_field, mpfq_TAG_dst_vec_ur, mpfq_TAG_src_vec_ur, unsigned int);
    void mpfq_TAG_vec_ur_rev(mpfq_TAG_dst_field, mpfq_TAG_dst_vec_ur, mpfq_TAG_src_vec_ur, unsigned int);
    void mpfq_TAG_vec_ur_sub(mpfq_TAG_dst_field, mpfq_TAG_dst_vec_ur, mpfq_TAG_src_vec_ur, mpfq_TAG_src_vec_ur, unsigned int);
    void mpfq_TAG_vec_scal_mul_ur(mpfq_TAG_dst_field, mpfq_TAG_dst_vec_ur, mpfq_TAG_src_vec, mpfq_TAG_src_elt, unsigned int);
    void mpfq_TAG_vec_conv_ur(mpfq_TAG_dst_field, mpfq_TAG_dst_vec_ur, mpfq_TAG_src_vec, unsigned int, mpfq_TAG_src_vec, unsigned int);
    void mpfq_TAG_vec_reduce(mpfq_TAG_dst_field, mpfq_TAG_dst_vec, mpfq_TAG_dst_vec_ur, unsigned int);

API for fields of characteristic two

If a field family belongs to the CHAR2 realm, then the API provides these additional functions:

    /* Behaves differently from set(): implements the reverse counting
       map from [0..#K-1] -> K. */
    void mpfq_TAG_set_uipoly(mpfq_TAG_dst_field, mpfq_TAG_dst_elt, unsigned long);
    void mpfq_TAG_set_uipoly_wide(mpfq_TAG_dst_field, mpfq_TAG_dst_elt, const unsigned long *, unsigned int );
    unsigned long mpfq_TAG_get_uipoly(mpfq_TAG_dst_field, mpfq_TAG_src_elt);
    void mpfq_TAG_get_uipoly_wide(mpfq_TAG_dst_field, unsigned long *, mpfq_TAG_src_elt);

    /* Arithmetic with uipoly elements */
    void mpfq_TAG_add_uipoly(mpfq_TAG_dst_field, mpfq_TAG_dst_elt, mpfq_TAG_src_elt, unsigned long);
    void mpfq_TAG_sub_uipoly(mpfq_TAG_dst_field, mpfq_TAG_dst_elt, mpfq_TAG_src_elt, unsigned long);
    void mpfq_TAG_mul_uipoly(mpfq_TAG_dst_field, mpfq_TAG_dst_elt, mpfq_TAG_src_elt, unsigned long);

    /* Artin Schreier equations: x^p-x = a. Undefined behaviour if
       Tr(a) is not 0. */
    void mpfq_TAG_as_solve(mpfq_TAG_dst_field, mpfq_TAG_dst_elt, mpfq_TAG_src_elt);
    unsigned long mpfq_TAG_trace(mpfq_TAG_dst_field, mpfq_TAG_src_elt);

API for prime fields in Montgomery representation

In Montgomery representation, an element $x$ is represented by $xR \mod p$ , where $R$ is a power of 2 corresponding to the number of machine-words that can contain $p$ .

Encoding and decoding between classical and Montgomery representation is implemented by the following functions:

    void mpfq_TAG_mgy_enc(mpfq_TAG_dst_field, mpfq_TAG_dst_elt, mpfq_TAG_src_elt);
    void mpfq_TAG_mgy_dec(mpfq_TAG_dst_field, mpfq_TAG_dst_elt, mpfq_TAG_src_elt);

API for polynomials

For FieldFamilys for which the api_extensions POLY is implemented, which are every one for the moment, there are three more types.

    typedef mpfq_TAG_poly;
    typedef mpfq_TAG_dst_poly;
    typedef mpfq_TAG_src_poly;

To manipulate objects of one of these types, one can use the following functions.

    /* Element allocation functions */
    void mpfq_TAG_poly_init(mpfq_TAG_dst_field, mpfq_TAG_poly, unsigned int);
    void mpfq_TAG_poly_clear(mpfq_TAG_dst_field, mpfq_TAG_poly);

    /* Element assignment functions */
    void mpfq_TAG_poly_set(mpfq_TAG_dst_field, mpfq_TAG_dst_poly, mpfq_TAG_src_poly);
    void mpfq_TAG_poly_setmonic(mpfq_TAG_dst_field, mpfq_TAG_dst_poly, mpfq_TAG_src_poly);
    void mpfq_TAG_poly_setcoef(mpfq_TAG_dst_field, mpfq_TAG_dst_poly, mpfq_TAG_src_poly, unsigned int);
    void mpfq_TAG_poly_setcoef_ui(mpfq_TAG_dst_field, mpfq_TAG_dst_poly, unsigned long, unsigned int);
    void mpfq_TAG_poly_getcoef(mpfq_TAG_dst_field, mpfq_TAG_dst_elt, mpfq_TAG_src_poly, unsigned int);
    void mpfq_TAG_poly_random(mpfq_TAG_dst_field, mpfq_TAG_dst_poly, unsigned int, gmp_randstate_t);
    void mpfq_TAG_poly_random2(mpfq_TAG_dst_field, mpfq_TAG_dst_poly, unsigned int, gmp_randstate_t);

    /* Arithmetic operations on elements */
    void mpfq_TAG_poly_add(mpfq_TAG_dst_field, mpfq_TAG_dst_poly, mpfq_TAG_src_poly, mpfq_TAG_src_poly);
    void mpfq_TAG_poly_sub(mpfq_TAG_dst_field, mpfq_TAG_dst_poly, mpfq_TAG_src_poly, mpfq_TAG_src_poly);
    void mpfq_TAG_poly_add_ui(mpfq_TAG_dst_field, mpfq_TAG_dst_poly, mpfq_TAG_src_poly, unsigned long);
    void mpfq_TAG_poly_sub_ui(mpfq_TAG_dst_field, mpfq_TAG_dst_poly, mpfq_TAG_src_poly, unsigned long);
    void mpfq_TAG_poly_neg(mpfq_TAG_dst_field, mpfq_TAG_dst_poly, mpfq_TAG_src_poly);
    void mpfq_TAG_poly_scal_mul(mpfq_TAG_dst_field, mpfq_TAG_dst_poly, mpfq_TAG_src_poly, mpfq_TAG_src_elt);
    void mpfq_TAG_poly_mul(mpfq_TAG_dst_field, mpfq_TAG_dst_poly, mpfq_TAG_src_poly, mpfq_TAG_src_poly);
    void mpfq_TAG_poly_divmod(mpfq_TAG_dst_field, mpfq_TAG_dst_poly, mpfq_TAG_dst_poly, mpfq_TAG_src_poly, mpfq_TAG_src_poly);
    void mpfq_TAG_poly_precomp_mod(mpfq_TAG_dst_field, mpfq_TAG_dst_poly, mpfq_TAG_src_poly);
    void mpfq_TAG_poly_mod_pre(mpfq_TAG_dst_field, mpfq_TAG_dst_poly, mpfq_TAG_src_poly, mpfq_TAG_src_poly, mpfq_TAG_src_poly);

The function divmod implements, for now, the basic algorithm for the Euclidiean division. The functions precomp_mod and mod_pre are related to another algorithm for the Euclidean division. precomp_mod(K,Q,P) computes the inverse of $X^NP(1/X)$ modulo $X^N$ , and puts it in Q, where N is the degree of P. It requires that P be a monic polynomial over the field K. mod_pre(K,R,Q,P,invrevP) computes the reminder of the Euclidean division of Q by P and puts it in R, provided that invrevP is the inverse of $X^NP(1/X)$ modulo $X^N$ , where N is the degree of P. For more information, one can read chapter 9 of Modern Computer Algebra from Joachim von zur Gathen and Jürgen Gerhard, and in particular Algotihm 9.5.

    /* gcd functions */
    void mpfq_TAG_poly_gcd(mpfq_TAG_dst_field, mpfq_TAG_dst_poly, mpfq_TAG_src_poly, mpfq_TAG_src_poly);
    void mpfq_TAG_poly_xgcd(mpfq_TAG_dst_field, mpfq_TAG_dst_poly, mpfq_TAG_dst_poly, mpfq_TAG_dst_poly, mpfq_TAG_src_poly, mpfq_TAG_src_poly);
    /* Comparison function */
    int mpfq_TAG_poly_cmp(mpfq_TAG_dst_field, mpfq_TAG_src_poly, mpfq_TAG_src_poly);
    int mpfq_TAG_poly_deg(mpfq_TAG_dst_field, mpfq_TAG_src_poly);

    /* Input/output functions */
    void mpfq_TAG_poly_asprint(mpfq_TAG_dst_field, char**, mpfq_TAG_src_poly);
    void mpfq_TAG_poly_fprint(mpfq_TAG_dst_field, FILE*, mpfq_TAG_src_poly);
    void mpfq_TAG_poly_print(mpfq_TAG_dst_field, mpfq_TAG_src_poly);
    int mpfq_TAG_poly_sscan(mpfq_TAG_dst_field, mpfq_TAG_dst_poly, const char*);
    int mpfq_TAG_poly_fscan(mpfq_TAG_dst_field, FILE*, mpfq_TAG_dst_poly);
    int mpfq_TAG_poly_scan(mpfq_TAG_dst_field, mpfq_TAG_dst_poly);

Examples

Some examples of using the MPFQ library are given in the apps/ subdirectory of the distribution.

In the test/ subdirectory, there is also an example. In that case, a genericity layer is obtained using CPP macros, automatically computed from the API file. In the future, this layer might evolve (possibly into a C++ wrapper) and be more documented.

Chapter 5. Design of the library

Table of Contents

Foreword

Advantages of automatically generated code
Disadvantages of automatically generated code

Foreword

This chapter comments the main design decisions concerning the MPFQ library. Reading this chapter is not an absolute necessity, but helps understanding the logic behind MPFQ

The most important decision concerning MPFQ is the use of automatically generated code for a wide variety of uses. This is a necessity called for by the efficiency requirements. Obviously it comes with pros and cons, some of which will be explained right below

Advantages of automatically generated code

Generating code automatically enables some benefits which are hardly accessible with static code. The first thing to note is that obviously, special-purpose code, fitted for finite fields satisfying some specified characteristics, may only be issued when these characteristics are known, in compile time.

The data type for finite fields elements is not fixed. It can be trimmed down to only the necessary information. This has the consequence that at least in the cases of $\mathbb{F}_p$ and $\mathbb{F}_{2^n}$ (respectively, for known word size of $p$ , and for known value of $n$ ), the storage size of finite field elements is constant, and hides no indirection. No reallocation is required. Objects can be dealt with at any arbitrary place in memory, every sequence of bytes in memory may be interpreted as a finite field element.
It is possible to handle small fields at best speed; if only one assembly instruction is needed, then generated code will (or at least, should) be an inline function equivalent to exactly that instruction.
Extending this principle, if assembly instructions very well adapted to the finite field in consideration are available on some given platform, then generated code ought to use these instuctions. Examples of such cases are MMX and SSE2 instructions, useful for multiplications in $\mathbb{F}_{2^n}$ .
Using generated code so widely is special to MPFQ and gives an incentive for better dissemination. Having the ability to provide the best possible speed for finite fields known in advance makes MPFQ a goood choice for every situtation where the finite field in which calculations are to be done is known: there is no need to reinvent the wheel. The authors of MPFQ recall having in several occasions re-implemented some dirty interface to finite field operations, only to gain the « expected » improvement from special-purpose code. The very purpose of MPFQ is to address this situation, which existing software libraries do not do.

Disadvantages of automatically generated code

As can be expected, the road of automatically generated code incurs some difficulties. We felt however that none of these difficulties should bar the efficiency goal set forth for MPFQ. We list here the difficulties that we have considered.

Special-purpose code is, well, special-purpose. This clearly forbids generic compiled code that will « just work » on any finite field. This being said, coherence in typing and function naming, through the MPFQ API makes it possible to re-use source code (much in the way C++ templates work — however MPFQ is entirely in C).

Chapter 6. How to implement or modify a field

Table of Contents

Enforcing the API

Input/Output of code_for_xxx()

Splitting information in several files

The inheritance schema, and conflict resolution

The elementary package files: the init_handler functions.

The top-level package file

The main generator file

Some extra notes

Enforcing the API

Each implementation should strictly follow the API. For that an automated generation scheme is present that must be used. Therefore, for each field or class of field, one must provide a list of Perl functions of the form code_for_xxx(), where xxx is the name of a function in the API. The generation scheme will then be able to check that every function in the API is indeed implemented and will produce a prototype directly from the API.

Input/Output of `code_for_xxx()`

For basic functions, the function code_for_xxx() does not take any input. However, in order to give a way to choose between different codes, a hash called $options is passed as a first argument. It is up to the field implementer to decide which options are pertinent for the class of fields he considers. A very typical option will be the word size (32 or 64).

In order for the function code_for_xxx() to have access to appropriate names, the special sequence @! is available: it is verbatim translated to the appropriate prefix mpfq_TAG_ during code creation. Hence, for instance, if the code needs an intermediate field element for the computation, it can be declared like that:

        @!elt tmp_elt;

The output of code_for_xxx() should be an array of two elements: the first follows the prototype given by the API, and give names to the parameters of the function xxx to be used in the implementation. The second element returned by code_for_xxx() is a string giving the C function implementation. For example, for the field GF(7), the addition will be implemented by the following Perl code:

        sub code_for_add {
            return [ 'inline(K!, z, x, y)', "z = (x+y)%7;" ];
        }

If the implementation of code_for_xxx() requires auxiliary functions that are not listed in api.pl (for instance code_for_mul might rely on code_for_karatsuba), then they must be returned as one or several additional members at the end of the returned array. Each such member must be a hash reference of the following form:

        {
          kind => 'inline(K!, z, x, y)',
          code => 'z = x+y',
          name => 'unreduced_add',
          requirements => 'dst_elt src_elt src_elt'
        }

where kind and requirements describe the prototype, code is the body of the function, and name is its name as called by code_for_xxx(). Beware that it will be created with a mpfq_TAG prefix, so that inside code_for_xxx() you must add @! in front of the name you give in the name key.

There is a small example in src/gf7/code_for_7.pm, that is a (very partial!) implementation of GF(7).

Splitting information in several files

The various functions to be implemented are to be provided in the form of code_for_xxx() perl functions. For better organization, these may be organized in several distinct perl modules. This is not a necessity, but it is at least good to know how this is supposed to work. We distinguish three types of files.

The elementary package files. These only have the mission of providing some code_for_xxx functions. The MPFQ distribution contains some such packages, e.g. in Mpfq::defaults. These files must be named with the extension .pm, and abide by the following requirements.
- begin with the proper package statement. This statement must match the way the package is made reachable from perl's includes. E.g. if the file is to be reached at the path some/where/pkg.pm, then it should begin with package some::where::pkg;.
- Terminate with the 1; statement to ensure a true value on package load.
- Each elementary package file may include other package files of the same form. Such included packages are called parents of the current package. This mechanism resembles an object model's multiple inheritance schema, although MPFQ has chosen to divert from perl's built-in MI mechanism. This is detailed later on.
We note that the set of code_for_xxx functions provided by an elementary package file does not have to be complete in any way.
The top-level package file. This file is a wrap-up of several elementary package files. It typically includes many elementary package files, and as such, it obeys the same requirements as the package files themselves. The top-level package files does not absolutely need to rely of inclusion of several package files, as it is possible for all code_for_xxx functions to appear directly at the top-level. The stuff which distinguishes the top-level package file from the more elementary ones is that it must define a perl object, which inherits from Mpfq::engine::handler. This is achieved by including the following statements in the code.
```
use Mpfq::engine::handler;
our @ISA = qw/Mpfq::engine::handler/;
sub new { return bless({}, shift); }
```
An additional requirement for the top-level package file is that the set of code_for_xxx functions it (and its parents) provide has to be complete.
The generator file. This file mainly arranges so that the top-level package file gets created, and calls the appropriate functions. One of the tasks of the main generator file is also to setup the perl include path properly (possibly relatively to the current file).

The inheritance schema, and conflict resolution

Several package files may be stitched together to form a complete implementation. This is achieved, in a package (including, but not restricted to, the top-level package file), by inclusion statements such as:

use Mpfq::defaults;
use Mpfq::defaults::flatdata;
our @parents = qw/
    Mpfq::defaults 
    Mpfq::defaults::flatdata
/;

The question which arises is the decision of what happens if a given code_for_xxx functions exists in package and also one or several of its parents. We distinguish the notion of a function code_for_xxx appearing in a given package, where we say the the package directly provides the function, or appearing in a package or one of this ancestor, in which case we say that the function is reachable from that package. The following rules are applied to constitute the list of functions reachable from a package unambiguously.

If a package directly provides a function, this implementation has precedence over all functions of the same name reachable from any of its parents.
If a package does not provide a given function directly, this function must be reachable by at most one of the package's parents. As a moderating exception to this rule, we also tolerate (albeit at the expense of a warning) the situation where the only implementation of a function which is reachable from a given package is within a single ancestor of the package, but accessible from the package via two distinct ancestry threads.
To alleviate the concern about conflicting ancestries to reach a function from a given package, a conflict resolution mechanism exists. This involves the package variable $resolve_conflicts. For example, some default package files of MPFQ include a code fragment similar to:
```
use Mpfq::defaults;
use Mpfq::defaults::flatdata;
our @parents = qw/
    Mpfq::defaults
    Mpfq::defaults::flatdata
/;  
our $resolve_conflicts = {
    vec_set => 'Mpfq::defaults::flatdata',
    vec_ur_set => 'Mpfq::defaults::flatdata',
};  
```
The syntax of this hash table should be self-explanatory.

The elementary package files: the `init_handler` functions.

There is little to mention about the elementary package files. As stated, these must include some code_for_xxx functions. Beyond that, a couple of extra functions may be provided, names init_handler and exit_handler. As their name suggest, these are called before and after the actual code generation. None of the built-in elementary package files un MPFQ makes use of the exit_handler mechanism, but the init_handler functions are widely used. These are meant to instruct MPFQ to add for example some #include statements in the generated code. The data fed to init_handler on input, and expected from its returned value, is specified as follows.

On input, a reference to a hash is provided. This exactly matches the $options argument which is fed to the create_code method from the main generator file. The init_handler may examine this hash, and base its returned value on that.

The output is also a reference to a hash. The members end up in the final $code hash in the generation mechanism (see main generator file). The expected members may be any of the following.

types. This defines the types provided by the implementation. For example, one may have:

sub init_handler {
  return { types => { elt => "typedef unsigned long @!elt\[1\];" } };
}

banner. This gets inserted verbatim in the generated .c and .h files. For example one may have:
```
sub init_handler {
  my $opt = shift;
  return { banner => "/* $opt->{'poly'} */" };
}
```
includes. This defines files to be #include'd from the generated .h file. As an alternate syntax, the key c:includes may be used for files which are to be included only from the generated .c file. We may have:
```
sub init_handler {
  return { includes => "<stdio.h>", 'c:includes' => "<inttypes.h>" };
}
```

The top-level package file

The goal of the top-level package file is to make as many function of the MPFQ API reachable. The top-level package file also must define a Perl object. We discuss by example the file provided in the MPFQ distribution as an example, under src/gf7/gf7.pm.

We proceed through this file line by line (comments, which exist in the file, are stripped off here).

package gf7;

use strict;
use warnings;

Not much to say about the first fragment above. The package name must match the file name.

use Mpfq::engine::handler;
our @ISA = qw/Mpfq::engine::handler/;
sub new { return bless({},shift); }

Here we set up the object mechanism. Not much (well actually nothing) is conveyed within the object in terms of data^[1]. The main use of the object is to contain a reference to a given package.

use Mpfq::defaults;
use Mpfq::defaults::flatdata;
use Mpfq::defaults::poly;
our @parents = qw/
    Mpfq::defaults
    Mpfq::defaults::flatdata
    Mpfq::defaults::poly
/;

The fragment above lists the parents of the given package. All package files may mention such parents.

our $resolve_conflicts = {
    vec_set => 'Mpfq::defaults::flatdata',
    vec_ur_set => 'Mpfq::defaults::flatdata',
};

This is optional, and is used to disambiguate possible conflicts.

sub code_for_add { return [ 'inline(K!, z, x, y)', "z = (x+y)%7;" ]; }

This is one of the code generation functions expected in this file.

sub code_for_sub_helper {
    return {
        kind=>'inline(z, x, y)',
        name=>'sub_helper',
        requirements=>'dst_elt src_elt src_elt',
        code=>'z = x-y;',
    };
}
# The function called by the automatic generator should return as usual
# its proto and its code, but additionnally a hash describing the helper
# function (there might be several of them).
sub code_for_sub {
    my $proto = 'inline(k, z, x, y)';
    my $code = '';
    $code .= "@!sub_helper(z, x, y);\n";
    $code .= "if ((long)z < 0)\n";
    $code .= "      z += 7;\n";
    return [ $proto, $code ], code_for_sub_helper();
}

This example is slightly more complicated and uses the extended syntax allowed for code_for_xxx, where dependencies between functions are possible.

sub init_handler {
  my $types = {
    elt =>      "typedef unsigned long @!elt\[1\];",
    dst_elt =>  "typedef unsigned long * @!dst_elt;",
    src_elt =>  "typedef const unsigned long * @!src_elt;",

    elt_ur =>   "typedef unsigned long @!elt_ur\[1\];",
    dst_elt_ur =>       "typedef unsigned long * @!dst_elt_ur;",
    src_elt_ur =>       "typedef const unsigned long * @!src_elt_ur;",

    field       =>      'typedef void * @!field;',
    dst_field   =>      'typedef void * @!dst_field;',
  };
  return { types => $types };
}
1;

This finished the package file (note the 1;). The init_handler provides some typedefs to be included in the generated code.

The main generator file

Traditionally, the main generator file is called gen_XYZ.pl and lies in a directory XYZ where the code generation for TAG is going to take place. We start by giving an example that is available in the MPFQ distribution in src/gf7/gen_gf7.pl:

#!/usr/bin/perl -w

use warnings;
use strict;

use File::Spec;
my $dirname;

BEGIN {
    $dirname = (File::Spec->splitpath($0))[1];
    unshift @INC, "$dirname/../perl-lib";
}

use Mpfq::engine::conf qw(read_api);

use gf7;

# Use the default api file.
my $api_file = "$dirname/../api.pl";

MAIN: {

  # Read api file. At that point, one can also pass one or several
  # api_extensions monikers.
  my $api = Mpfq::engine::conf::read_api $api_file, qw/POLY URE/;
  my $options = {};
  my $code = {};

  # This calls code_for_xxx for all xxx functions in the api.

  my $object = gf7->new();

  $object->create_code($api, $code, $options);

  # This creates the .c and .h files.
  $object->create_files('.', 'gf7');
}

The code generation proceeds in 4 steps:

Reading the API. This is done using the read_api() function, which takes a file, and optionally, realms. It returns a hash that is usually stored in a variable $api that will be passed as an argument to other functions of the code generation.
Creating the object of the proper type. For the moment this involves only calling the new method of the chosen package. The methods below for code generations are methods of that object, since all objects considered inherit from common MPFQ bare bones.
Creating the code with create_code(). This is the function that calls all the code_for_xxx functions that creates C code. It works by side-effect on the $code variable. Prior to calling the code generating functions, create_code() calls the init_handler() if present in the packages within the ancestry of the class of the object. Upon exit it similarily calls the exit_handler() function. Each init_handler() should return hash entries to be inserted in the $code parameter. It is also possible to modify the $code hash directly from the main generator, instead of from a package file (although best practices favor the latter).
Creating the files with create_files(). This write the resulting .c and .h files. The first argument specifies a path for the output files, which are named mpfq_TAG.c and mpfq_TAG.h, where TAG denotes the second argument.

The main generator file is rather short, and does very little beyond the steps above. We go through its different sections in order.

#!/usr/bin/perl -w
use warnings;
use strict;

It is handy to make the main generator file an executable perl script.

use File::Spec;
my $dirname;
BEGIN {
    $dirname = (File::Spec->splitpath($0))[1];
    unshift @INC, "$dirname/../perl-lib";
}

The fragment above is the chosen mechanism to setup perl's include path to also contain the perl-lib subdirectory at the upper level.

use gf7;

This loads the top-level package file (described in the previous section).

use Mpfq::engine::conf qw(read_api);
my $api_file = "$dirname/../api.pl";
my $api = Mpfq::engine::conf::read_api $api_file, qw/FIELD POLY/;

This is mandatory as one of the tasks of the main generator file is to read the MPFQ API file.

  my $object = gf7->new();

This creates an object of the class defined by the top-level package file.

  my $code = {};
  my $options = {};

The two (references to) hashes above must be created. The first array will hold the complete generated codde, while the $options hash is passed as an argument to all code generation functions. It is a convenient way to pass information to these functions.

  $object->create_code($api, $code, $options);

This effectively does the code generation in memory (within the $code hash, to which $object takes a reference).

$object->create_files('.', 'gf7');

and this final statement creates the source files.

Some extra notes

The definition of the types required by the API can be done at various places, including in the MAIN, like in the example. These types have to be added as the types member of the $code hash.

All along the generation, one might want to pass some information to the code_for_xxx; the $options is meant for that. What to put in this hash is up to the implementor.

Let us summarize the meaning of the 3 important variables:

$code is a hash. Here is the expected contents at the end of the process (before entering create_files()).
- There should be: an entry for each function to implement:
```
    $code->{'add'} = ...;
            
```
  where ... is a hash reference describing the implementation. The implementation may be tagged as macro, function, or inlined. See perl-lib/Mpfq/handler.pm (comments before the ship_code function) for the documentation of the other hash members.
- There must be an entry describing types (generated either from the gen_TAG.pl or from some init_handler() as described above)
```
     $code->{'types'} = {
         elt => "typedef unsigned long $elt\[17\];",
         dst =>  "typedef unsigned long * $dst;",
         src =>  "typedef const unsigned long * $src;",
         ...
     }
            
```
- It is possible to have a cpp_asserts entry. This will translates into some assertion at the preprocessor level. For instance, use :
```
     $code->{'cpp_asserts'} = [ "GMP_LIMB_BITS == 64" ];
            
```
  if the generated code is only for 64 bit computers.
- Another important field is for #include that are required by the implementation. The includes that must be added at the top of the .h file are to be declared as follows:
```
     $code->{'includes'} = [ '"my_macros.h"', '"grouik.h"' ];
            
```
- More complicated statements that must be put at the top of the .h can be declared in the extra field:
```
     $code->{'extra'} = [ "#ifndef __GNU_C_\n#warning \"non free compiler!\"\n#endif" ];
            
```
- Finally, $code->{'banner'} is a banner to put in the head of .c and .h files.
$options is a hash. It is passed as an argument to every creator. This is up to you to decide which option is relevant for your implementation.
$api is another hash. This is essentially a copy of the api.pl file and there is nothing to play with.

^[1]But this could evolve otherwise in the future.

MPFQ : Fast Finite fields

Pierrick Gaudry

Luc Sanselme

Emmanuel Thomé

Chapter 1. Introduction

Specializing code for a given field

Generalizing code for a large set of fields

Reporting bugs, comments, feature requests...

Chapter 2. Compile and install

Hardware, system and software requirements

Generating C source and header files

Installing.

Optional targets

Examples

Targets for checking

Targets for benching

Targets for tuning

Chapter 3. Tuning

Build targets for tuning

Exporting and importing tuning results

Chapter 4. Writing applications using MPFQ

Organization of MPFQ code

Names of MPFQ functions

General API of an MPFQ field

API for fields of characteristic two

API for prime fields in Montgomery representation

API for polynomials

Examples

Chapter 5. Design of the library

Foreword

Advantages of automatically generated code

Disadvantages of automatically generated code

Chapter 6. How to implement or modify a field

Enforcing the API

Input/Output of code_for_xxx()

Splitting information in several files

The inheritance schema, and conflict resolution

The elementary package files: the init_handler functions.

The top-level package file

The main generator file

Some extra notes

Input/Output of `code_for_xxx()`

The elementary package files: the `init_handler` functions.