Wednesday, October 26, 2011


  Recently I've been teaching myself lisp-- so far it seems like a very creative language, which I like, as I view creativity as one of the most important skills for a programmer to have.

 There are like a million different versions of lisp it seems and no single standard implementation.

 And for whatever reason I decided to write yet another...

 My lisp compiles to Lua, I figured that LuaJIT2 is the fastest dynamic language VM around, so targeting it should allow my Lisp implementation to perform better than most. I had no experience with Lisp prior to this, but I certainly learned lisp fairly well while writing a compiler for it-- also improved my Lua.

 It is somewhat different from the lisp norm in that it does not use lists and cons cells, instead I use a Lua table in array form.  No named arguments either, although you can fake them just as you would in Lua.  I also use { to introduce raw Lua code, and } to return to Lisp.

 Still have to implement most of the library functions that come with most lisp implementations, and I'm sure I'll have to fix a few bugs in the compiler yet, but it is working, including support for macros.

 I also saw that the creator of Lisp, John McCarthy, died a few days ago:(

Thursday, October 20, 2011

Fast Grid Aligned Noise

  Here is a method to produce noise faster than the traditional approach, such as perlin noise, where you sample the 8 integer corners and perform trilinear interpolation, with the limitation that it must be axis aligned and have a constant frequency for a given octave.

 I'm focusing on perlin/improved noise here, not simplex noise, which is not axis aligned to begin with.

 One of the slowest aspects of perlin noise is generation the pseudo random values for the 8 corners. Perlin uses a LUT, others use integer hashing(particularly necessary if you want to use SSE/AVX).

 Say you want to generate a 3D grid 32^3 of noise values, this requires 32,768 calls to noise, and internally the generation of 262,144 (32^3 * 8) pseudo random values.

 Many of these values are identical, how many depends on the frequency you are sampling at. The smoother the resulting noise is, the less unique pseudo random values were required.

  This approach only requires P^3 pseudo random values, where P is always less than or equal to half of N(most often P is only a small fraction of N, it depends on the frequency). Even at half of N this only requires 4096 (16^3) pseudo random values.

 One case where this is particularly applicable is when summing multiple octaves.  In many cases 20+ octaves will be used.  The outer octaves are very low frequency, there is no need to be calculating so many pseudo random values.

 A second major optimization is the interpolation pass.  These passes are separable.  This means we can interpolate all of the X, then all of the Y, and then all of the Z.  This reduces the number of interpolations required from (N^3 * 7) to (N*P*P + N*N*P + N*N*N).

 Here is the basic algorithm for generating grid aligned noise. It is much more complicated than perlin noise, and much less flexible. But it is far faster. It also allows for the use of cubic noise with no visible grid structure. It operates on blocks of noise, instead of on individual samples.

This is described in the context of 3D noise, but can be applied to other dimensions.
  1. Using the initial location and the frequency determine how many psuedo random values are required for N^3 cube of noise. The result should be a much smaller cube P^3. 
  2. Generate all the pseudo random values required for P^3
  3. Now we must perform some type of interpolation to create a smoothed representation of P^3. Perlins approach can be used, it is fast and only uses linear interpolation, but it does exhibit some grid structure.  Alternatively cubic interpolation can be used, although this requires sampling 4 values per axis, and that we have padded P by 1 on either side.
  4. Perform interpolation along X axis of P^3, this will result in a block of data N*P*P size
  5. Perform interpolation along Y axis, resulting in a block of N*N*P size
  6. Perform interpolation along Z axis, now we have the final result that is N*N*N in size and is properly interpolated.
To make it fast, operate on blocks that fit in the L1 cache.  

 I used this technique for a project I worked on about six years ago.  It is very fast, but I'm not using it for my current project as the grid aligned and constant frequency requirements got in the way. It is also only really efficient when you need large blocks of noise, which my current project does not.

Just thought I'd document it.  I'm sure someone else has done this as it is fairly obvious.

Monday, October 17, 2011

Pacific Crest Trail

 I was gone for a really long time.

 Hiked the Pacific Crest Trail(PCT) from Mexico to Canada, started May 1st, ended on Oct 9th.

It was the best time I've ever had, and along the way I met some great people.

 Now back to what is commonly called the real world...

 Just a few images from the entire 2650 mile trail--

Wilson, my soccer ball-- he lasted 800 miles
One of many raging creeks in the Sierras

Muir Hut 

One of the 10 or so passes in the Sierras

 Southern California, not just a Desert after all

 The monument by the Mexican border

 Eagle Rock 

 We are descending into Tuolumne Meadows