Skip to content
dcoetzee edited this page Aug 28, 2013 · 5 revisions

This page is geared towards those interested in writing their own specializer. At this time, there is not documentation for application users looking to use a preexisting specializer in their Python code.

For any specializer, their are two major task that need to happen. The first is recognizing a block of code that needs to be specialized and invoking the appropriate specializer. The second - which is the majority of the work - is actually generating native code for the specialized region.

The first of these tasks is done by overloading a Python method that the specializer writer identifies. After intercepting the method call, the specializer is free to utilize any Asp library functions to specialize the method. The complexity of this step ranges from a few trivial lines of code to hundreds of lines, depending on how much flexibility and power the specializer writer wants.

For the code generation, Asp provides two major mechanisms. You can use Mako C Templates, or you can implement your own tree transformations and codegen. The first is much easier to setup, but the second allows much more powerful optimization and code generation. The remainder of this tutorial covers each of these mechanisms in detail.

Templating and the ArrayDoubler Specializer

See screencast on basic templates in Asp.

To understand the workings of a template driven specializer, we're going to examine the ArrayDoubler example available in the Asp repository. ArrayDoubler itself is trivial; it simply takes an input list of numbers and returns the result of multiplying each of the elements by 2. It's main purpose is to introduce developers to the templating features of Asp and illustrate the power of Asp's JIT specialization approach.

All of the code discussed can be found in the specializers/array_doubler/ directory of Asp.

If you look at tests/arraydoubler_test.py, you can see how an ArrayDoubler object is instantiated and how its pure Python implementation and its more efficient C implementation are called.

In array_doubler.py, you will see two methods for ArrayDoubler: double(), which is the pure Python implementation, and double_using_template(), which uses a template to output C code. Asp uses Mako as its templating engine. The template file is located at templates/double_template.mako, but since this is a trivial example, the only specialization used here is to set the upper bound of the for loop. Back in double_using_template(), the template is rendered as a string containing the code and is passed into an instance of ASPModule with the add_function() method, which handles the dynamic compilation and linking.

(Talk more about writing templates, using the C Python API, etc.)

Generating C code using templates is the fastest way of writing a specializer for Asp, but it is limited in power compared to using tree transformations.

Tree Transformations and the Stencil Specializer

To make more complicated specializers, a specializer writer should take advantage of the tree transformation capabilities of Asp. The general workflow begins with AST of a Python method. The specializer writer can write code to walk the AST, performing transformations on the individual nodes of the tree. Here Python code can be transformed into a semantic model that the specializer writer defines which gives a richer representation of the important aspects of the domain-specific problem. From here, the code is converted into a C/C++ AST using another tree-walking class written by the specializer writer.

There is a class that handles Python AST to C++ AST conversions in asp.codegen.ast_tools.ConvertAST. The output of this, a C++ AST, can be fed directly into asp_module.add_function, which invokes CodePy to compile it directly into a binary.

Note: It is also possible to use Python's or C++'s AST directly as your immediate tree form. Most authors find it simpler to introduce their own, but either model can be used.

Here is a brief list of the files used in the [Stencil Specializer] (https://github.com/shoaibkamil/stencil_specializer), a real-world specializer based on tree transformations:

  • stencil_kernel.py - The main driver of Stencil which calls each of the phases described above
  • stencil_python_front_end.py - Converts the initial Python AST into a special semantic model named StencilModel
  • stencil_model.py - Defines the new nodes used in StencilModel
  • stencil_unroll_neighbor_iter.py - Performs a couple optimizations on the StencilModel
  • stencil_convert.py - Converts a StencilModel into a C++ AST
  • stencil_optimize_cpp.py - Performs a couple optimizations on the C++ AST

Using framework-provided optimizations

The Asp framework provides classes to perform a number of common optimizations, loop unrolling, loop blocking, and loop switching. These are built on top of the general tree transformation mechanisms described above. This section briefly explains how to use them.

Loop unrolling

Loop unrolling is a transformation that takes the body of a loop and replicates it several times, eliminating loop overhead at the expense of code size. The LoopUnroller class acts on a C++ Block AST node object (normally the body of a For node), and unrolls it a specified fixed number of times. For example, the following code takes the "inner" Block object, unrolls it 4 times, and returns a new C++ Block object:

new_inner = ast_tools.LoopUnroller().unroll(inner, 4)

Although backend compilers also perform loop unrolling, explicit loop unrolling is sometimes needed either because the compiler cannot determine that unrolling is safe, or because the specific problem context facilitates a better estimate for the number of unrolls.

Loop blocking

Loop blocking or loop tiling is a transformation that breaks a single loop into a nested loop pair where the outer loop loops over a sequence of blocks (large contiguous ranges of indices) and the inner loop loops over a single block. By following this with loop interchange, data cache reuse can be improved. The LoopBlocker class is given a For node and the block size (in number of array indices) and returned the blocked loop:

blocked_loop = ast_tools.LoopBlocker().loop_block(for_node, 64)