17 February 2017

Introduction to NumExpr-3 (Alpha)

I've been working on a new branch of the venerable NumExpr module for Python.  It's a fairly major extension, as I discuss below.  The alpha release can be cloned from GitHub as follows:

git clone https://github.com/pydata/numexpr.git   # repository URL assumed for the clone step
cd numexpr
git checkout numexpr-3.0
python setup.py install  
(or pip install . but that suppresses error messages)

What's new?


Faster

The operations were re-written so that gcc can auto-vectorize the loops to use SIMD instructions. Each operation now has separate strided and aligned branches, which improves performance on aligned arrays by ~40 %. The setup time for threads has been reduced by removing an unnecessary abstraction layer, and various other minor refactorings have improved thread scaling.

The combination of speed-ups means that NumExpr3 often runs 200-500 % faster than NumExpr 2.6 on a machine with AVX2 support. The break-even point with NumPy is now roughly 64k-element arrays, compared to 256-512k elements for NE2.

Example 1: Multiply-Add


Here 8 threads are used for both NE2 and NE3 (NumPy is single-threaded), and the cache dictionaries for the NumExprs have been disabled.  
The gap for complex-number mathematics is larger because the complex operations within NumExpr3 were implemented entirely with vectorized functions. The cmath functions for real numbers (such as sqrt(), exp(), etc.) haven't been vectorized yet, only the complex functions, but arithmetic operations such as +, -, *, / have been auto-vectorized by gcc. Here we are also cheating, because NumExpr2 doesn't support complex64 as a data type, so it has to up-cast the data to complex128. NE3 enjoys similar advantages in image processing on uint8 or uint16 data, where NE2 would up-cast to int32 and promptly return wrong values in the case of under- or overflow.
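A minimal sketch of this sort of multiply-add comparison is shown below. It assumes the alpha module imports as numexpr3 and that ne3.evaluate() accepts the assignment-style expressions described later in this post; the array size and repeat count are arbitrary.

import timeit
import numpy as np
import numexpr as ne2
import numexpr3 as ne3   # import name of the alpha module assumed

a, b, c = (np.random.rand(2**20).astype('float32') for _ in range(3))
out = np.empty_like(a)   # pre-allocated so the NE3 assignment writes into it

t_np  = timeit.timeit(lambda: a*b + c, number=100)
t_ne2 = timeit.timeit(lambda: ne2.evaluate('a*b + c'), number=100)
t_ne3 = timeit.timeit(lambda: ne3.evaluate('out = a*b + c'), number=100)
print('NumPy: %.3f s,  NE2: %.3f s,  NE3: %.3f s' % (t_np, t_ne2, t_ne3))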


Example 2: Mandelbrot set

Jean-Francois Puget wrote a blog post last year for IBM, "How To Quickly Compute The Mandelbrot Set In Python," in which NumExpr2 didn't perform especially well compared to Numba.  There are some reasons for this. In Puget's code the calculation would have been performed with complex128 instead of the intended complex64, because NE2 up-casts constants to float64 and didn't have a complex64 data type. Also, it's not clear how many threads were used (he ran the calculations on his laptop).

Here's an example of the algorithm in a NumExpr3 format:


import numpy as np
import numexpr3 as ne3   # import name of the alpha module assumed

def mandelbrot_ne3(c, maxiter):
    output = np.zeros(c.shape, dtype='float32')
    notdone = np.zeros(c.shape, dtype='bool')
    z = np.zeros(c.shape, dtype='complex64' )

    # Almost 30 % of the time in a comparison appears to be in the 
    # cast to npy_bool
    neObj1 = ne3.NumExpr( 'notdone = abs2(z) < 4.0' )
    neObj2 = ne3.NumExpr( 'z = where(notdone,z*z+c,z)' )
    for it in np.arange(maxiter, dtype='float32'):
        # Here 'it' changes, but the AST parser doesn't know that and treats it
        # as a const if we use 'where(notdone, it, output)'
        # What we really need is an iter( ops, range ) function inside 
        # ne3.  This is an interesting case, since really here we see a 
        # major limitation in NumExpr working inside a loop.
        neObj1.run( check_arrays=False )
        output[notdone] = it
        neObj2.run( check_arrays=False )
    
    output[output == maxiter-1] = 0
    return output

def mandelbrot_set_ne3(xmin,xmax,ymin,ymax,width,height,maxiter):
    r1 = np.linspace(xmin, xmax, width, dtype='float32')
    r2 = np.linspace(ymin, ymax, height, dtype='float32')
    c = r1 + r2[:,None]*1j
    n3 = mandelbrot_ne3(c,maxiter)
    return (r1,r2,n3.T) 
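An illustrative call (the parameters here are arbitrary, not Puget's benchmark sets):

r1, r2, n3 = mandelbrot_set_ne3(-2.0, 0.5, -1.25, 1.25, 1000, 1000, 80)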

Here I benchmarked with 4 threads on a Xeon CPU with a 2.5 GHz clock:

NE2.6:   365 ms for set #1
NE2.6:   9.73 s for set #2
NE3:     138 ms for set #1
NE3:     4.26 s for set #2

So again we see a nice 250 % speed-up for NE3 compared to NE2.  One limitation of this algorithm for NumExpr is that the variable `it` changes with each iteration but would be treated as a const by NumExpr, which forces us in and out of the interpreter more than we would prefer.  There's a lot more that I believe can be done to improve NE3 performance, especially with regard to thread scaling.


More NumPy Datatypes

The program was refactored from an ASCII-encoded byte code to a struct array, so that the operation space is now 65535 instead of 128.  As such, support for the uint8, int8, uint16, int16, uint32, uint64, and complex64 data types was added.

NumExpr3 now uses NumPy 'safe' casting rules. If an operation doesn't return the same result as NumPy, it's a bug.  In the future other casting styles will be added if there is a demand for them.
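As a hedged illustration of what 'same result as NumPy' means in practice (assuming the module imports as numexpr3 and that a pre-existing array in the calling frame becomes the assignment target, per the compiler section below):

import numpy as np
import numexpr3 as ne3   # import name assumed

a = np.arange(256, dtype='uint8')
b = np.full(256, 200, dtype='uint8')
out = np.empty(256, dtype=(a + b).dtype)   # match NumPy's result dtype
ne3.evaluate('out = a + b')
assert np.array_equal(out, a + b)          # NE3 aims to reproduce NumPy exactly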

More complete function set

With the enhanced operation space, almost the entire C++11 cmath function set is supported (where the compiler's math library provides it; only C99 is required).  Bitwise operations were also added for all integer data types.  There are now 436 operations/functions in NE3, with more to come, compared to 190 in NE2.

A library enum has also been added to the op keys, which allows multiple backend libraries to be linked into the interpreter and switched on a per-expression basis, rather than, for example, picking between the GNU standard library and Intel VML at compile time.

More complete Python language support

The Python compiler was re-written from scratch to use the CPython `ast` module and a functional programming approach. As such, NE3 now compiles a wider subset of the Python language. It supports multi-line evaluation and assignment with named temporaries.  The new compiler also spends considerably less time in Python compiling expressions: about 200 µs for 'a*b', compared to 550 µs for NE2.

Compare for example:

    out_ne2 = ne2.evaluate( 'exp( -sin(2*a**2) - cos(2*b**2) - 2*a**2*b**2 )' )

to:

    neObj = ne3.NumExpr( '''a2 = a*a; b2 = b*b
out_magic = exp( -sin(2*a2) - cos(2*b2) - 2*a2*b2 )''' )

This is a contrived example but the multi-line approach will allow for cleaner code and more sophisticated algorithms to be encapsulated in a single NumExpr call. The convention is that intermediate assignment targets are named temporaries if they do not exist in the calling frame, and full assignment targets if they do, which provides a method for multiple returns. Single-level de-referencing (e.g. `self.data`) is also supported for increased convenience and cleaner code. Slicing still needs to be performed above the ne3.evaluate() or ne3.NumExpr() call. 
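Below is a hedged sketch of how the multiple-return convention could be used, assuming the module imports as numexpr3 and that construction alone does not execute the program (so .run() is called explicitly, as in the Mandelbrot example above); a2 and b2 remain internal temporaries, while out_sin and out_cos already exist in the calling frame and so are written in place.

import numpy as np
import numexpr3 as ne3   # import name assumed

a = np.random.rand(2**20).astype('float32')
b = np.random.rand(2**20).astype('float32')
out_sin = np.empty_like(a)   # exists in the calling frame -> full assignment target
out_cos = np.empty_like(a)

neObj = ne3.NumExpr('''a2 = a*a; b2 = b*b
out_sin = sin(a2 + b2)
out_cos = cos(a2 - b2)''')
neObj.run()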

More maintainable

The code base was generally refactored to increase the prevalence of single-point declarations, so that modifications don't require extensive knowledge of the code. In NE2 a lot of code was generated by the pre-processor using nested #defines.  That has been replaced by an object-oriented Python code generator called from setup.py, which generates about 15k lines of C code from 1k lines of Python. The use of generated code with known line numbers makes debugging threaded code simpler.

The generator also builds the autotest portion of the test submodule, for checking equivalence between NumPy and NumExpr3 operations and functions. 

What's TODO compared to NE2?

  • strided complex functions
  • Intel VML support (less necessary now with gcc auto-vectorization)
  • bytes and unicode support
  • reductions (mean, sum, prod, std)

What I'm looking for feedback on

  • String arrays: How do you use them?  How would unicode differ from bytes strings?
  • Interface: We now have a more object-oriented interface underneath the familiar evaluate() interface. How would you like to use this interface?  Francesc suggested generator support, as currently it's more difficult to use NumExpr within a loop than it should be.

Ideas for the future

  • vectorize real functions (such as exp, sqrt, log) similar to the complex_functions.hpp vectorization.
  • Add a keyword (likely 'yield') to indicate that a token is intended to be changed by a generator inside a loop with each call to NumExpr.run()

If you have any thoughts or find any issues, please don't hesitate to open an issue at the GitHub repo. Although unit tests have been run over the operation space, there are undoubtedly a number of bugs to squash.

31 December 2016

bloscpickle

Edited 27-01-2017 to add RapidJSON results.

Python has a number of libraries available for serializing data and objects, generally for the purpose of passing them around from one process or node to another, or for saving program state to disk. Serialization for a dynamically-typed language such as Python brings with it some challenges, which typically result in modules having limitations on what they can serialize. My interest is mostly in packaging metadata with microscopy binary data, where one might have a few megabytes of metadata alongside gigabytes of image stacks.

The built-in modules are pickle, marshal, and json.  I will also look at the third-party modules ujson and msgpack-python (plus rapidjson, added in the 27-01-2017 edit). All of them produce either text or binary representations, and all are uncompressed. I thought I would test an implementation that wraps them with the blosc meta-compressor library to compress their outputs before writing to disk, to see what sort of space savings, and potentially performance enhancements, could be realized. The code presented herein is available at:

https://github.com/robbmcleod/bloscpickle

It's not intended as a production-ready tool.
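The basic idea only takes a few lines; here is a minimal sketch of the pickle + blosc path, assuming python-blosc is installed (the helper names are my own, not the bloscpickle API).

import pickle
import blosc

def dump(obj, fname, cname='zstd', clevel=1):
    # Serialize, compress the resulting bytes with blosc, and write to disk
    buf = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    with open(fname, 'wb') as fh:
        fh.write(blosc.compress(buf, typesize=8, cname=cname, clevel=clevel,
                                shuffle=blosc.NOSHUFFLE))

def load(fname):
    with open(fname, 'rb') as fh:
        return pickle.loads(blosc.decompress(fh.read()))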

Pickle: is Python's most robust serialization tool.  It can manage custom classes and objects, and circular references. It does not duplicate objects found to have multiple references. It outputs binary. It is not compatible with other languages. Pickle received a major speed upgrade with Python 3, which also came with a new file protocol. Pickle is used, for example, in the multiprocessing module to exchange data between processes.  Pickle is often said to be a potential security hazard as it can potentially carry malicious code, which is the disadvantage of its versatility in serializing objects.

Marshal: is Python's internal file I/O module. It can serialize only Python base types (essentially lists, tuples, and dicts), and will crash if fed a circular reference. It is not even compatible across different versions of Python. It is supposed to be the fastest of the tested standard Python modules.

JSON: otherwise known as JavaScript Object Notation, is the most ubiquitous method of serializing objects. It essentially deals with only two constructs: lists and dicts. As such, it requires helper functions to be implemented in order to serialize objects. It's not binary, so it can be human-edited (with some difficulty; it's picky about commas and similar formatting errors). As with pickle, json received a major performance upgrade in Python 3, such that many external implementations of JSON were obsoleted, with a couple of exceptions such as...

UltraJSON: developed by an Electronic Arts studio (but released under a BSD license), UltraJSON is just like JSON, but faster. One drawback of ujson versus the default library is that it can fail silently.

RapidJSON: another fast JSON parser, built on top of a C++ library.  Here I use Hideo Hattori's wrapper, which is the more complete of the two Python wrappers.

Message Pack: is billed as a binary equivalent of JSON. At first glance it was very intriguing, as it offers a significant encoding-rate and encoded-size advantage over JSON, and it has implementations for basically every programming language in use today. However, I ran into problems in testing. By default it converts Unicode strings to byte strings, which can cause a loss of information. When encoding as Unicode it loses its speed advantage over the faster JSONs. Furthermore, while msgpack will serialize objects, it doesn't serialize all their attributes, and so it fails silently.

Blosc: isn't a serializer, it's a compressor, or more properly a meta-compressor.  Blosc wraps a variety of compression codecs with a thread pool to provide very high performance. The two best compressors in blosc, in my experience, are lz4, which is ultra-fast with middling compression (and the new standard codec for the ZFS file system), and Zstandard, which achieves better compression ratios than zlib/gzip and is still very fast. Zstandard is new as of 2015 and essentially offers something for nothing compared to older compression algorithms. It's usually within spitting distance (a few %) of BZIP2 for compression ratio and far faster, being heavily optimized for parallel computing. In testing on Xeon machines I've achieved compression rates of about 12 GB/s with lz4 and 5 GB/s with zstd.  That is gigabytes per second. Blosc also has a filter stage, which at present is byte- or bit-shuffling. I've found bit-shuffling to be effective when compressing floating-point or otherwise dynamic-range-limited data. It would probably be extremely helpful for DNA or protein sequences, for example. Here I did not find shuffling to be effective. Throughout this post I use compression level 1 for zstd and compression level 9 for lz4.  Lz4 does not really slow down at all with compression level, and zstd saturates much earlier than zlib (there's rarely an advantage to going past 4).

All tests were performed with Python 3.5 on a Core i5-3570K (3.4 GHz), running 4 threads for blosc, with a Western Digital 3 TB 'Red' drive formatted as NTFS. Ideally one would perform this test on a Linux system, flushing the disk cache between each test. I would expect some additional performance from Python 3.6, in particular because we are using dicts here, but I use a lot of libraries so it will be some time before I move to it myself.

High-entropy Benchmark: 256k UUID4 keys

I tested the serialization tools on 256k UUID4 keys with singleton values. This is a fairly challenging data set for compression because there's quite a high degree of randomness inherent in what are supposed to be unique identifiers.
Figure 1: Write times for 256k UUID4 keys.
Figure 2: Read times for 256k UUID4 keys.
Figure 3: Size of the data on disk. The difficulty of this data set is evident in that lz4 compression achieved little. However, zstd shines here, cutting the data in half.

Overall, for pickle, using zstd compression yields about a 25 % write-time penalty, but this is nearly negated by a corresponding reduction in read time. Since the data is small, I expect 'writing' is just depositing the file in the hard drive cache.

The increased performance of blosc-JSON compared to pure JSON is somewhat paradoxical; it's not due to the JSON serialization but to the poor performance of Python in reading and writing Unicode streams to disk. If you encode the JSON output as UTF-8 and write it as bytes, it's much faster. I left the benchmarks as-is, because it's something to keep in mind. Similarly, marshal seems to be faster at reading when it is passed a bytes object instead of a stream.
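A small illustration of that difference, using only the standard library (the file names are placeholders):

import json

data = {'key%d' % i: i for i in range(100000)}

# Text mode: Python encodes the stream as it writes, which is the slow path
with open('meta_text.json', 'w', encoding='utf-8') as fh:
    json.dump(data, fh)

# Bytes mode: encode once, then write raw bytes, which is noticeably faster
with open('meta_bytes.json', 'wb') as fh:
    fh.write(json.dumps(data).encode('utf-8'))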

Message Pack looks, on the surface, to offer impressive performance, but as mentioned above the Python implementation often omits important information from objects. If I worked on an enterprise-level project I might dig more deeply into why and when it fails, but I don't, so I won't.

Low-entropy Benchmark: JSON Generator

Here I generated 10,000 pseudo-phonebook entries with the handy JSON Generator, which corresponds to about 25 MB of JSON data. This data has a lot more repeated elements, in particular a lot of non-unique keys, which should improve the relative performance of the compression tools.


Figure 4: Write times for ten-thousand procedurally generated phonebook-like JSON entries.

Figure 5: Read times for ten-thousand procedurally generated phonebook-like JSON entries.
Figure 6: Size of the data on disk. Here the data is significantly less random, so the compression tools, and especially lz4, perform better than with the high-entropy data set. The blocksize was 64 kB.

Overall lz4 reduces the disk usage by about 40-50 %, and Zstandard shaves another 10 % off of that.  If you are consistently dealing with larger data chunks, the blocksize could be increased.  Typically blosc is fastest when the block fits into L2 cache, but the compression ratio usually keeps increasing up to about 1 MB blocks before saturating.

Here both UltraJSON and Message Pack silently failed the read/write assert test. The ujson error appears to be related to limited precision in reading floats, and for Message Pack the problem was that it converts Unicode to bytes.
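That Unicode behaviour is easy to reproduce directly with msgpack-python; a small sketch (keyword names as of the 0.4.x releases current when this was written):

import msgpack

packed = msgpack.packb('café')               # str packed without a distinct type
print(msgpack.unpackb(packed))               # b'caf\xc3\xa9' -- bytes, not str

# Round-tripping str correctly requires opting in on both ends
packed = msgpack.packb('café', use_bin_type=True)
print(msgpack.unpackb(packed, encoding='utf-8'))   # 'café'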

Conclusions

Overall, on fairly difficult data blosc reduces file size by about 50 % in return for a 25 % write speed penalty. However, the read time is accelerated, such that the net time spent on file I/O is more or less a push. On more highly compressible data (e.g. DNA base pairs, protein sequences) and in particular data large enough to swamp the hard disks' cache (typically 0.5-1.0 GB), one would expect to see blosc + serialization be faster than just pure serialization.

Only pickle offers out-of-the-box functionality for serializing objects. If you want to serialize with JSON, to maintain cross-language compatibility, then you'll need to implement helper methods yourself.  UltraJSON looks great on the surface, but I did manage to break it (the failure could be something as simple as Python's boolean True mapping to the string 'True' on disk and back to 'True' after reading), so I wouldn't consider it an out-of-the-box robust solution.  Still, it beats pickle in speed.  Another potential JSON library with a Python wrapper to examine is RapidJSON, which has two implementations, python-rapidjson and pyrapidjson.
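For the helper-method route mentioned above, the hook is the default= argument of json.dumps; a minimal sketch for a simple custom class (not the code used in the benchmarks):

import json

class Metadata(object):
    def __init__(self, name, shape):
        self.name = name
        self.shape = shape

meta = Metadata('stack_001', (2048, 2048, 50))
# default= is called for any object json doesn't know how to encode
text = json.dumps(meta, default=lambda obj: obj.__dict__)
meta2 = Metadata(**json.loads(text))   # rebuild the object from the dict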

One aspect I wanted to look at was monkey-patching the multiprocessing module to use bloscpickle instead of pickle. However, pickle is not exposed, so one would have to patch the reduction.py file in the module.

One disadvantage of blosc at present is that it does not have a streaming interface, i.e. it deals in whole bytes objects. This means it will store an extra (compressed) copy of the data in memory, relative to vanilla pickling. It also used to hold onto the GIL, although that has been patched out and should go live with the next release.

04 November 2016

Polygon Filling in Parallel Computing

I'm more or less turning into a full-time programmer as I age, so I thought it might be interesting to reactivate the blog and post some discussions on topics I find interesting from time to time. I have here this Voronoi tessellation and I want to extract polygons from it:



How can I do this quickly? Pre-written solutions exist in the Python universe. For example, scikit-image has the function draw.polygon which can be used to generate all points inside a polygon. The function itself is written in Cython and is single-threaded. 

Here's an example of a solution that uses the numexpr package instead of skimage's Cython code to calculate the points inside a polygon. The main advantage of numexpr is that it's multi-threaded and uses blocked calculations. You could try ThreadPool from multiprocessing, but skimage.draw.polygon only releases the GIL intermittently (in the Cython sub-function point_in_polygon), so I'm not sure how well that would work. The per-pixel check in scikit-image can also be a little sub-optimal for a dense grid (e.g. filling); see for example this row-wise fill algorithm: http://alienryderflex.com/polygon_fill/

Here I'm filling a polygon on a naive pixel-by-pixel basis, but because we use meshgrid it could be performed on any gridded basis that you define:

# Generate our mesh.  Here I assume we might re-use the mesh several times for tiles 
# of a different shape (or slicing in 3D), so the scope is global relative to the 
# polygonInterior() function below
import numpy as np
import numexpr as ne
import skimage.draw
from time import time

polyCorners = 5
boxLen = 2048
ne.set_num_threads(1) # Generally NumExpr is fastest when threads = # of physical cores

vertices = np.array( [[ 0,   375.56], [ 578.70,  0], [ 2048, 1345.36 ],
       [ 1318.43,  2048], [ 0, 1712.97] ], dtype='float32' )

tileShape = np.array( [boxLen, boxLen], dtype='int' )
tileExtent = np.array( [0, boxLen, 0, boxLen], dtype='int' )

tileXmesh, tileYmesh = np.meshgrid( np.arange(tileShape[1]), np.arange(tileShape[0]) )
tileXmesh = tileXmesh.astype('float32'); tileYmesh = tileYmesh.astype('float32')

# Numexpr approach
# shape should crop the polygon to its extent=(xmin,xmax,ymin,ymax)
def polygonInterior( vertices, extent ):
    # Slice the pre-generated meshes
    xsub = tileXmesh[ extent[0]:extent[1], extent[2]:extent[3] ]
    ysub = tileYmesh[ extent[0]:extent[1], extent[2]:extent[3] ]
    polyMask = np.zeros( [extent[3]-extent[2], extent[1]-extent[0]], dtype='bool' )
    
    J = vertices.shape[0] - 1
    ypJ = vertices[J,0]

    for I in np.arange( vertices.shape[0] ):
        xpI = vertices[I,1]; ypI = vertices[I,0]
        xpJ = vertices[J,1]; ypJ = vertices[J,0]
        # Could re-use from I: ysub < ypJ, ypJ <= ysub but in testing this led to no speed-up
        polyMask = np.logical_xor( polyMask, ne.evaluate( "( (((ypI  <= ysub) & (ysub < ypJ)) | \
((ypJ <= ysub) & (ysub < ypI)  )) & (xsub < (xpJ - xpI) * (ysub - ypI) / (ypJ - ypI) + xpI) )" ) )

        J = I
    return polyMask
    
t0  = time()
ne_mask = polygonInterior( vertices, tileExtent )
t1 = time()
print( "Numexpr calculated polygon mask over %d points in %f s" %( np.prod(tileShape), t1-t0 ) )

#### skimage approach ####
t2 = time()
xsub = tileXmesh[:tileShape[0],:tileShape[1]]
ysub = tileYmesh[:tileShape[0],:tileShape[1]]
geoMask = np.empty( tileShape, dtype='bool' )
si_indices = skimage.draw.polygon( vertices[:,1], vertices[:,0], tileShape )
t3 = time()
print( "skimage calculated polygon mask over %d points in %f s" %( np.prod(tileShape), t3-t2 )  )
print( "numexpr speed-up: %f %%" % ((t3-t2)/(t1-t0)*100) )


For 1 thread, numexpr is about 240 % faster than skimage.draw (probably because NE is using floats and skimage is double, but also due to the blocked code execution); for 4 threads numexpr is 640 % faster, for 8 threads it's 840 % faster, and for 12 cores it's 1130 % faster (Intel Xeon E5-2680 v3 @ ~2.9 GHz).  We're hurt a bit by the fact that numexpr doesn't have an xor operator, so each loop does have a slow numpy calculation (although maybe I'll fix that in the future). If I include the mesh generation in the numexpr time it's 185 % faster on 1 thread, but the mesh can be reused. One could also cache the meshes on the function handle if you expect to call it repeatedly and don't want the mesh variables cluttering the enclosing scope, as sketched below.
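A sketch of that caching idea, stashing the meshes as an attribute on the function object (the names are illustrative, not from the code above):

import numpy as np

def get_poly_meshes( tileShape=(2048, 2048) ):
    # Build the meshes once and keep them on the function object for re-use
    # (for brevity this ignores calls with a different tileShape)
    if not hasattr( get_poly_meshes, '_cache' ):
        xm, ym = np.meshgrid( np.arange(tileShape[1], dtype='float32'),
                              np.arange(tileShape[0], dtype='float32') )
        get_poly_meshes._cache = (xm, ym)
    return get_poly_meshes._cache

# Inside polygonInterior() one would then slice these instead of the globals:
# tileXmesh, tileYmesh = get_poly_meshes()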

The huge advantage of using a virtual machine like numexpr here is the implementation time and the ease of redistribution. The above code runs just fine in a Python interpreter and doesn't need to be compiled, so the implementation time was very short, and because numexpr has an efficient virtual machine we get fast performance without hand-tuning. Numexpr is available in essentially all scientific Python distributions, as is Cython, but the numexpr code doesn't need to be compiled by the end-user. This can save a lot of pain on Windows and OSX.

The result is that the texture for each polygon can be extracted in a hurry.  This one took 5 ms:


14 December 2012

Why Stretching Is Bad For You

Pulling on a muscle for a long time doesn't make it longer, doesn't make it more pliable, and doesn't improve athletic performance. There's lots of scientific literature that shows that stretching before exercise increases the risk of injury, but still there's a lot of emphasis on it in common fitness literature. It's similar to the demonization of saturated fats: once a meme gets firmly implanted in the culture, it's hard to eradicate it. 

I wanted to share this video from Evan Oscer on why conventional stretching is bad.  I think it's worth your time to watch:


Personally, I happen to do yoga as my means of integrating my muscles and nervous system. Yoga is mentioned in this video around the 23 minute mark in a positive light. A lot of people seem to think yoga is glorified stretching, but that's evidence of a bad yoga teacher (there are many, many bad yoga teachers out there). The physical side of yoga, the asana practice, is the integration of:
  1. Breath
  2. Stability
  3. Movement
Stability and breath are both important for reducing muscle apprehension throughout the movement.  Whenever your nervous system is unsure whether it can support a load at the edge of your range of motion, the interaction between the muscle spindles and the Golgi organs in the tendons causes the muscle to spasm to protect itself. If you want to decrease muscle tightness and improve your range of motion, the thing to work on is improving the stability of the movement while breathing deeply and evenly, _not_ pulling on the muscle harder. The apprehensive reflex is inhibited when the body is convinced, by many repetitions, that the joint is still safe and stable even at the edge of the range of motion. This in turn allows us to be more athletic, more open, and to improve our eccentric muscle control so that we can relax when muscle tension is deleterious.

11 December 2012

Thermoelectric Breakthrough, ZT = 2.2

Recently there was a major advance, reported in Nature (Biswas et al., 2012) and conducted at Northwestern University (NWU), on a new high-temperature thermoelectric material. This particular material is doped and structured lead telluride (PbTe) and is used to produce electricity from high-temperature 'waste' heat, otherwise known as a Seebeck generator (as opposed to a Peltier refrigerator, which is the effect in reverse).

If I may digress for a moment, Tellurium is an element with a lot of interesting chemistry associated with it. Jim Ibers of Northwestern University wrote an interesting article in Nature Chemistry (2009) about it. It's in the same column as Oxygen and Sulfur on the periodic table, so it shares some aspects of the very complicated chemistry that these elements have.  However, unlike Oxygen and Sulfur, it can also form distended Te-Te bonds in addition to Te-Te single bonds, so oxidation states in metal telluride compounds are often indeterminate (which implies that Te-containing compounds can have lots of different phases with very different physical and chemical properties).

Tellurium ain't exactly common (it's less abundant than gold, but not in as high demand) nor environmentally benign.  Also, thermoelectrics are naturally going to be employed more in high-wear, industrial environments.  Furthermore, there is competition for Tellurium from other industries, such as thin-film CdTe photovoltaic manufacturers like First Solar.

Anyway, back to the thermoelectric!  

The advance improved ZT, the figure of merit for thermoelectric performance, from about 1.7 to 2.2.  ZT determines what fraction of the Carnot efficiency a thermoelectric can reach, and hence dictates how close to entropy-limited performance a thermoelectric material can operate.  These devices are supposed to operate from a heat source at about 750-900 Kelvin (roughly 480-630 °C) and dump into room temperature.  This is not low-quality heat by any means, as it's easily hot enough to make steam, so there's no magic when it comes to thermoelectrics.  They still obey the laws of thermodynamics.
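For reference, the maximum conversion efficiency of a thermoelectric generator is usually written in terms of ZT as

η_max = ((T_h − T_c)/T_h) × (√(1 + ZT) − 1)/(√(1 + ZT) + T_c/T_h)

where T_h and T_c are the hot- and cold-side temperatures and ZT is evaluated at the mean temperature; the first factor is the Carnot limit, and the second is the fraction of it the material can capture.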

The basic trick here is the governing equation,
ZT = σS²T/(κ_el + κ_lat)
from which we see that if you decrease the thermal conductivity, κ, you can improve the performance (S is the Seebeck coefficient, σ the electrical conductivity, and T the absolute temperature). Thermal energy in solids is primarily conducted by phonons, which are quantized lattice vibrations (i.e. sound). Electrons carry some thermal energy as well, but at the temperatures we're concerned with it's precious little. In order to decrease the thermal conductivity, but not overly decrease the electrical conductivity, you need structures that are phonon-sized (many atoms across). Phonons have a large range of possible wavelengths, so in order to inhibit thermal conductivity you want to provide all wavelengths with something to reflect and scatter from.  Hence, the desire is to have a material that is structured/disordered on all potential length scales, so there's no window for phonon-driven thermal conductivity to occur in.

The previous body of research, which improved ZT up to 1.7, involved nano-patterning the thermoelectric with precipitates about 5 nm in diameter. In comparison, the micro-scale crystallites have a much broader distribution of grain sizes, but average around 1 μm (or 1000 nm).  The histograms in Figure 3 tell the story.
Figure 3 (from Biswas et al., 2012): Micro- (top) and nano-scale (bottom) structures in Sodium-doped Lead/Strontium Telluride.  The low-mag electron micrograph in (a) shows the crystal grains, while in (e) the nano-scale precipitates are seen in high-resolution transmission electron microscope images. 
It is slightly ironic, given the nanotechnology craze, that the group improved the performance first with nanostructures and then, when that was insufficient, reverted to microstructuring to further improve the performance. So the new model is to develop the sexy technology first and the simple solution later? Of course there's a fairly big gap between the nano-scale precipitates and the micro-scale crystalline boundaries.  Perhaps something in the range of 50-100 nm could be added to further reduce the thermal conductivity?

One concern I would pose is whether the nanostructures and microstructures are stable long-term at the given operating temperature. The melting point of PbTe is 924 °C, but the constituents melt at much lower temperatures, so there is the potential for annealing over time.

04 October 2012

JAMA Study on Efficacy of Vitamin D in Preventing Colds

Whether or not low vitamin D3 levels can affect how often one can get sick is an interesting line of research these days.  The notion is, we spend so much time indoors or slathered in UVB-blocking sun cream that our bodies aren't producing nearly as much vitamin D3 as they did historically before the industrial age.

A new study in the Journal of the American Medical Association treated a population with vitamin D to see if vitamin D serum levels affect how often a person contracts a respiratory infection (Murdoch et al., 2012, free access). This was a randomized, double-blind study with a placebo group, so it's a high quality study and from my reading appears solid.  75 % of the study participants were female.  The treatment method was to give participants an oral dose of 200,000 IU initially and again one month later.  They then followed that up with 100,000 IU per month for the remainder of the study, which is about 3,300 IU per day.  Personally I take 4,000 IU per day in the winter and none in the summer.

My first concern was whether or not the treatment method would be effective: it was a large monthly oral dose, vitamin D is fat soluble, the vitamin D was given in tablet form, and there were no instructions (that I can find) to eat fat with the dose.  However, as shown by Figure 2, the study had no problem creating a statistically significant difference between the treatment and placebo groups.
Figure 2 (from Murdoch et al., 2012): Mean serum levels of vitamin D3 in the treatment and placebo group.  Significantly different levels were achieved in only two months and maintained throughout the study period. 

The authors found no negative effects such as hypercalcemia and no side effects from the high-dose regimen.  The fact that the human body does absorb such large doses of vitamin D3 does sort of suggest that the human body really does aim for higher vitamin D serum levels than are present in the general population.

The results show that there was no difference between the high and low serum vitamin D groups in the incidence of respiratory infections or days of work missed.  So that's a big nada for vitamin D supplementation improving the immune response against these particular viruses.  Interestingly, while the common cold was not heavily affected by vitamin D status, more dangerous respiratory infections such as influenza were statistically significantly reduced, as shown in Table 2.

Table 2 (from Murdoch et al., 2012): Infection rates by common cold associated viruses were unchanged with vitamin D status, but more dangerous flu-like viruses were significantly reduced.
Personally speaking, I fear catching the flu, but a cold is barely worth noticing. It is very rare to die from a cold, but influenza and other serious respiratory infections are often lethal in immunocompromised people. The authors noticed this as well:
Of particular note, there were few cases of confirmed influenza infection among our partly vaccinated group of participants. Although adult data are unavailable, a randomized controlled trial in Japanese schoolchildren, set up to assess the effect of vitamin D supplementation on “doctor-diagnosed influenza,” did not report on that outcome but did report a statistically significant reduction in laboratory-confirmed influenza A infection (relative risk, 0.58; P = .04).
There's still plenty of reason to supplement with vitamin D, cancer risk being a big one, but the common cold does not appear to be one of them.

18 October 2011

Fusion Power, Steampunk-style

Harnessing the power of fusing hydrogen isotopes together has long been a staple of science fiction, a means of achieving otherwise unachievable power densities so as to make so many gee-whiz devices possible.  In reality, fusion design concepts are generally massive, cantankerous, and incredibly expensive (see magnetic confinement fusion and inertial confinement fusion).  Fusion has been achievable for a while now, but it generally requires more energy input than it outputs, and the prototypes have been ludicrously expensive. The joke about fusion, after fifty years of ongoing research, is that it always has been, and always will be, "twenty years away."

In order to make hydrogen fuse, the first step is to get it hot enough that it ionizes, such that the electrons are no longer attached to a nucleus, forming the fourth state of matter, a plasma.  Generally, the aim is to bring a Deuterium ion (proton + neutron) and a Tritium ion (proton + 2 neutrons) close enough together that the strong nuclear force effects the fusion of the two ions.  However, the electrostatic charge on the ions is a longer-range force, and it tends to mess up collision trajectories, such that only very high-energy ions on a direct collision course can ever fuse.  So the temperatures required are quite massive.

The magnetic-confinement tokamak design, which most people will be familiar with due to its widespread coverage in popular science magazines, tries to achieve more or less steady-state fusion power.  Steady-state fusion tends to be plagued by energy losses, particularly turbulence in the plasma, that bleed off power. In comparison, pulsed concepts like inertial confinement are easier to initiate, but the natural tendency of an extremely hot gas is to expand rapidly, so the fusion rapidly slows and stops, limiting the overall efficiency of the process.

Ok, enough background: enter General Fusion, a company based in British Columbia that is angling to build a fusion power generator that, well, seems like it would fit right into a Steampunk science fiction novel! It's the one fusion concept I've seen that one could conceivably build using relatively low-technology components: pistons, microwave ovens, that sort of thing.  It's something MacGyver might build.

Magnetized Target Fusion (MTF) is a hybrid concept that is supposed to be low-cost.  It was first proposed in 1976 as the LINUS concept, and it relies on first forming a small ball of  deuterium and tritium plasma, called a plasmoid (or sometimes a spheromak).  The plasmoid is given some angular momentum, such that it's actually a vortex, so that it has an inherent magnetic field that holds the plasmoid together for a brief moment.  The plasmoid, which is already pretty warm, is then compressed so that a pulse of fusion occurs.

The main advantage of forming a plasmoid first, over the plain inertial design, is efficiency in transferring energy from electricity into the plasma.  Lasers, plain and simple, aren't efficient at converting electricity to coherent light — I don't know what the lasing efficiency is at the National Ignition Facility, but commercial solid-state lasers are usually in the single digits. In comparison, a plasmoid can be formed with basically a high-tech microwave, using radio-frequency radiation, and the conversion efficiency is very very high.

Of course, the next question is: how to compress the plasmoid?  A plasmoid has a lifetime of approximately 100 μs according to General Fusion, so compression has to occur on that timescale. The proposed solution is to use over two hundred pistons driven by compressed air to smash into the 'pot' holding the plasmoid, inducing a converging acoustical wave. As the wave converges its strength increases, and it collapses the plasmoid to very high pressures, ~1 megabar, resulting in an enormously high magnetic field within the collapsed plasmoid, on the order of 1000 Tesla.  Effectively, it's like an artificial implosion nuclear bomb, using a very small amount of material. By using pneumatically driven pistons instead of, say, lasers to achieve compression, General Fusion is again gaining major efficiencies in terms of their energy input-to-output ratio (aka 'gain' in the fusion world).  Air can be compressed relatively efficiently up to thermodynamic limits, so the whole concept doesn't have massively lossy steps that crush the overall system efficiency.

Since the pistons are basically flat, the shock wave will actually not be perfectly spherical.  Also, it's practically impossible to get all the pistons to hit the sphere at the exact same time — General Fusion claims they have accurate control of the impact time down to 5 μs, which is 'good enough.'  Since there will always be some error in the impact timing, the shock wave will imperfectly compress the plasmoid and one can expect a lot of cavitation and other hypervelocity fluid-dynamical effects.  The cavitation is similar to shaped-charge explosives, in that very high-speed jets are formed.  I am not very clear on the physics of these plasma jets, but I would guess that they are basically the source of the ultra-high temperatures that make fusion possible with this concept. So cavitation early in the compression of the plasmoid is bad, because it bleeds off energy and reduces the ultimate compression achieved.  However, a certain amount is probably desirable once the pressure reaches its ultimate limit.
Figure 1: General Fusion's pneumatic fusion reactor concept (http://www.generalfusion.com/generator_design.html). Plasmoids are formed in the plasma injectors (cones on the top and bottom) and then injected into the 'pot' of liquid lead and lithium. The two plasmoids collide in the middle and are metastable for a brief instant.  The pot is surrounded by 220 pneumatically driven pistons which hammer the side of the pot, creating an imploding acoustical wave that compresses the plasmoid, causing a pulse of nuclear fusion.
The pot itself is actually full of a mixture of liquid lead and lithium metals, as there's a need for an 'aether' to transmit the acoustical energy.  The lead acts as a neutron/thermal heat sink, absorbing the energy from neutrons produced by the fusion event both to recover it in the form of heat but also to protect the rest of the machine from high-energy neutron radiation.  The lead-lithium mixture carries the heat produced by the fusion pulse to some working fluid (i.e. water) which can then produce electrical power.

The lithium is a slow-neutron absorber, but it also undergoes fission to hydrogen and helium isotopes (via n + ⁶Li → T + ⁴He and n + ⁷Li → T + ⁴He + n), thus acting as a source of tritium, which is very expensive and radioactive, has a tendency to leak through solid materials, and is dangerous since it can be used to make hydrogen bombs.  Hence the reactor is designed to have a high breeding ratio (claimed at 1.6:1), so that once a little tritium is given as a starter, more comes out.

Flowing the lead-liquid mixture in and out of the pot is likely a little tricky because the mixture has to spin in the pot, so as to setup favourable conditions for the plasmoid collision.

For the test-bed unit, which is smaller than an industrial-scale reactor would likely be due to efficiencies of scale, about 100 megajoules of mechanical energy is required as an input and about 600 MJ of thermal energy is produced.  The heat can then be used to make steam, just like in any other thermal power plant, and recovered at around 33 % efficiency, so that 200 MJ of electrical energy is produced per shot.  Hence the net would be 100 MJ per shot, and the target goal is 1 shot per second, thus producing 100 MW of power.
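A quick sanity check on those numbers, using only the figures quoted above:

# Energy balance per shot, from the figures quoted in the article
E_in_mech   = 100e6              # J of mechanical (piston) energy per shot
E_thermal   = 600e6              # J of thermal energy produced per shot
eta_steam   = 0.33               # steam-cycle conversion efficiency
E_electric  = eta_steam * E_thermal     # ~200 MJ electrical per shot
E_net       = E_electric - E_in_mech    # ~100 MJ net per shot
shots_per_s = 1.0
print('Net electrical power: ~%.0f MW' % (E_net * shots_per_s / 1e6))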

There are of course a variety of problems with the concept.  One of the biggest is getting the two plasmoids to collide and combine in the desired manner to form a little vortex of plasma in the centre of the pot. This is a hard thing to test without two working plasma injectors and a pot of liquid lead-lithium.  Currently they are relying on simulations, and there are plans for an explosive-based compression test to see if their plasma injector is working as desired.  The disadvantage of the explosive-based method is that it's destructive, so they can only get one test per boom-boom.  This makes iterating the design expensive and manpower-intensive, but they are planning a shot in the fall of 2012 without tritium.

Another problem is that material from the pot or the plasma injector nozzles (called spalling in the tokamak field) will be absorbed into the lead-lithium liquid, and these impurities will radically increase the rate at which the plasmoids dissipate.

Irradiation of the machine itself is also a problem.  The lead-lithium matrix will absorb 99.9999 % of the neutrons but the walls of the vessel will still become too radioactive after about six months of use. Fortunately neutron embrittlement should not be a problem because the neutrons should be moving at relatively low velocities by the time they get to the shell of the pot.

Lifetime of the shell and pistons is also a concern, due to the thermal and shock stress caused by the impacts. This is actually something that improves as the machine gets bigger, because the pistons can move slower in order to achieve the same overall compression ratio. 

When I describe this concept as being steampunk-themed, I am exaggerating a bit.  In fact, this concept requires exquisite timing to control all the pneumatically driven pistons, and to form and inject the plasmoids into the liquid lithium-lead chamber, and that means lots of fibre-optics and other high-speed network devices unavailable 20-30 years ago. It is definitely the hipster of fusion power schemes, however. 

The bottom line: I remain skeptical that nuclear fusion can be more economical than either photovoltaics, which will eventually be the cheapest source of power on the planet, or advanced fission reactors.  Fusion is one of those gee-whiz things that sounds really exciting, until you start getting into the details and wonder how it will be economical, and the radiation waste aspect isn't really any better than fission (there's no worry about products decaying into Radon, which is a radioactive gas, but they do have to worry about Tritium contamination of the reactor, and it's a gas that can flow in-between the molecules of solid metal). The company has raised about $40 million thus far, and they probably need more than double that to finish their prototype in 2013/14, so it will be interesting to see if they find it. On the other hand, this concept is ripe for science-fiction fodder.

03 April 2011

Bacon-wrapped Baked Purple Potatoes

So the other day I decided to make myself brunch, for which I wanted to make something to go with poached eggs.  Now, poached eggs boiled in vinegar water for four minutes each are a little runny, with the yolks only semi-congealed at the edges, so they need something starchy to sop up the yolk.  Hashbrowns could work, but I had some purple potatoes that tend to bleed their pigment, so instead I thought I'd bake them.  Then my next brainwave was to wrap them in bacon and chiles, and damn they were tasty.

Ingredients
  • Two small purple potatoes, halved lengthwise
  • Four strips bacon
  • (optional) Pickled green Thai chiles
  • Salt and pepper, to taste
  • Olive oil
Slice the potatoes in half, then salt and pepper them.  The potatoes were small, not fingerlings but not nearly full-sized baking potatoes.  Wrap the bacon around loosely so that the top is well covered.  Insert the chiles, and drizzle olive oil on top as 'starter fat' to prevent the bacon from burning.  Bake at 350 °F for about 45 minutes, until the potato is cooked through. The bottom of the potatoes should come out nice and golden, and the bacon will shrink wrap around the potatoes.  Next time I might microwave the potatoes a bit first to decrease the cooking time and to cook the bacon a little less. 


The finished product.

28 January 2011

Avalanche Safety Training, Level 1

So lest you readers think I am just being lazy in not posting, here are some photos from an avalanche safety course I took last weekend.  We were at Bow Summit for the practical (skiing) portion of the training; here are some pictures sans people.
 Sunset the night before.
 The skin track up.
The snowpack is about 70 cm of wind-loaded slab on top of 80 cm of very weak sugary surface hoar. The interface is obvious in the picture. In other words very dangerous for slab avalanches.  The snowpack was surprisingly strong given how bad it looked (still very dicey over 30°), probably because the top slab was still fairly plastic.  We didn't hear any whumpfting (audible evidence of snow settling when skied over). 
The view across the valley as the clouds broke up.  Ski touring is totally unlike lift-served skiing in tone.

Physical activity should be fun play-time.  I don't really get the emphasis on weight lifting in the paleo community, as it seems very boring to me.

04 January 2011

M1 and M2 macrophages and the Herpes-virus family

Ah, the immune system.  It is what makes trillion-cell organisms possible.  Immune system cells actively patrol the body and attack bacteria, fungi, and parasites.  In the process, they often cause collateral damage to self-tissue.  The innate immune system, compared to the adaptive immune system, is evolutionarily older and more prone to carpet-bombing tactics to defeat pests. As such, when it doesn't operate properly, a broad list of symptoms can present.  The innate immune system actually has two roles: fighting off foreign cells and repairing damaged tissue.  When the innate immune system is always fighting and never repairing, it leads to the state of chronic inflammation at the heart of many of the diseases that plague modern civilization, like diabetes, stroke, and heart disease.

What then determines which role an innate immune system cell body operates in at any given time and place in the body?  Let's look at monocytes/macrophages.

Monocytes are undifferentiated (i.e. unspecialized) immune system cells of the innate immune system that circulate in the bloodstream.  In response to chemical signals from the tissues adjacent to their blood vessels, they enter the tissue to either fight infection or repair tissue damage.  When monocytes enter tissue, their gene expression causes them to become more specialized and they are then called macrophages (they can also become other immune system cells).  There are two basic phenotypes for macrophages, which are essentially the yin and yang of the macrophage community.
  • M1 macrophages are pro-inflammatory and fight infection.  They are the classical state for macrophages that you would find described in a textbook.  Primarily, they detect and fight foreign organisms (viruses, bacteria, and parasites).  They are characterized by the production of pro-inflammatory cytokines, chemicals which alert the other cell types of your immune system to react and destroy the invader (as well as adjacent 'self' cells). 
  • M2 macrophages are anti-inflammatory and repair tissue damage.  For example, when you exercise and your muscle tissue is damaged, it is M2 macrophages that infiltrate your muscles and effect the repairs [Tidball, 2010] after the initial M1 surge.  The characteristic cytokine of M2 macrophages is interleukin-10 (IL-10), which encourages other macrophages to enter the tissue and differentiate into the M2 phenotype, but also discourages the attention of cytotoxic 'killer' cells from the lymphocyte family of the immune system.
The differentiation of macrophages into M1 or M2 is not all that distinct and is generally thought to represent the two extremes of a continuum.  My reading suggests macrophage populations can make the transition from one phenotype to the other without die-offs.  This is probably a bad thing for chronic modern diseases, in that many of the diseases that result from macrophage dysfunction occur when apoptosis (programmed cell death) is impeded.

One of the beneficial effects of eating a diet low in inflammatory factors (e.g. fructose, wheat, smoking) is that the overall levels of pro-inflammatory hormones, such as cortisol or interferon, are low, so the transition from high M1 expression to high M2 expression can occur more rapidly. I strongly suspect this is why most people who transition to the paleo diet are much better able to put on muscle mass. As the Tidball article indicates, chronic exercise is another no-no because it doesn't give enough time for the M2 macrophages to enter and effect repairs, so the muscle is always in an inflamed state.

This doesn't, in general, appear to actually reduce the ability of the body to fight infectious disease, however.  This is probably because even though the overall inflammatory condition (as determined by circulating cortisol globally or cytokines locally) is low, it can easily spike when a foreign body attacks. On the other hand, in actual conditions where adrenal function is suppressed (i.e. Addison's disease), the immune system is hamstrung by the hormonal milieu it finds itself in and doesn't function properly to defend against infection.  One hypothetical cause may be that the ratio of M2 to M1 is tilted in favour of M2 in adrenal insufficiency.

One pathogen that is known to mess with M1/M2 expression in macrophages is human cytomegalovirus (HCMV). HCMV basically takes M2 macrophages or undifferentiated monocytes and reprograms them to be more like M1 macrophages in some ways (Chan et al., 2009 and Chan et al., 2008).  Importantly, they do not become fully like M1 macrophages, in that they continue to release interleukin-10, which discourages the adaptive immune system from deploying 'killer' lymphocytes (natural-killer cells and T-cells) but encourages more M2 macrophages (red shirts, basically); at the same time they up-regulate the production of the protein filaments that make macrophages motile, so that they can better travel about and infect other organs. So let's all remember: whenever you have a massive infiltration of an organ by macrophages, you can probably bet there's too much IL-10 being produced.

To me, the various pathogens that excel at molecular mimicry to hide from the immune system seem to be playing quite the bogey-man role in a whole host of chronic diseases. These are principally the Herpes virus family (which also includes cytomegalovirus and the Epstein-Barr virus, the latter of which has been linked to chronic fatigue syndrome) and the bacterium C. pneumoniae, which has been linked to atherosclerosis and a whole host of other chronic diseases. Paul Jaminet over at Perfect Health Diet has also been covering C. pneumoniae in the brain.

It also seems a lot of disease symptoms occur when macrophages manifest some combination of the M1 and M2 states.  Consider that glucose in the blood is regulated predominantly by the liver, so insulin resistance (aka metabolic syndrome) reflects a failure of the liver to do its job.  Largely, this is blamed on 'inflammation' in the liver, e.g. alcoholic and non-alcoholic fatty liver disease, but what does that mean exactly?

A review (Olefsky and Glass, 2010) states that macrophages in the tissue produce excessive quantities of tumour necrosis factor-α (TNF-α) and interleukin-16 (IL-16), both pro-inflammatory cytokines, to the point that they are detectable in the blood. I like the following quotation from the review, as it really lays out the problem:
The discovery that adipose tissue from obese mice and humans is infiltrated with increased numbers of macrophages provided a major mechanistic advance into understanding how obesity propagates inflammation (4, 5). Adipose tissue contains bone marrow–derived macrophages, and the content of these macrophages tracks with the degree of obesity (4, 5, 31, 32). In some reports, greater than 40% of the total adipose tissue cell content from obese rodents and humans can be composed of macrophages, compared with ~10% in lean counterparts (32).
That nearly half of fat tissue mass is actually not fat cells, but immune system cells, is kind of amazing to me. A very similar thing happens in liver disease. Now, reference #32 is a mouse study (Weisberg et al., 2006), but it in turn cites two other mouse studies that are more pertinent (Weisberg et al., 2003 and Xu et al., 2003).  Both articles show that gene expression for various proteins that attract immune system cells is strongly up-regulated in the adipose tissue of fat mice.  The question is why?  Is it diet?  I suspect partially, but the revelations regarding what cytomegalovirus can do to macrophages make me suspect latent pathogens are attracting macrophages as lambs to the slaughter. Are lab mice susceptible to chronic infections given their conditions and short lifespan?  Are these latent viruses transmitted from mother to infant?

The prevalence of immune cells in the adipose tissue of the obese mice illustrates an example of the "diseases of civilization" being largely driven by dysfunction of the innate immune system, probably egged on by latent viral and bacterial infections and an unnatural diet.  The pieces of the puzzle are mostly there now, and evidence will continue to accumulate until we have a better view of the whole picture.  Stop the sources of inflammation (i.e. immune system activation), give the immune system the substrates it needs to fight effectively, and the other symptoms will go away.