Here is a method to produce noise faster than the traditional approach, such as perlin noise, where you sample the 8 integer corners and perform trilinear interpolation, with the limitation that it must be axis aligned and have a constant frequency for a given octave.

I'm focusing on perlin/improved noise here, not simplex noise, which is not axis aligned to begin with.

One of the slowest aspects of perlin noise is generation the pseudo random values for the 8 corners. Perlin uses a LUT, others use integer hashing(particularly necessary if you want to use SSE/AVX).

Say you want to generate a 3D grid 32^3 of noise values, this requires 32,768 calls to noise, and internally the generation of 262,144 (32^3 * 8) pseudo random values.

Many of these values are identical, how many depends on the frequency you are sampling at. The smoother the resulting noise is, the less unique pseudo random values were required.

This approach only requires P^3 pseudo random values, where P is always less than or equal to half of N(most often P is only a small fraction of N, it depends on the frequency). Even at half of N this only requires 4096 (16^3) pseudo random values.

One case where this is particularly applicable is when summing multiple octaves. In many cases 20+ octaves will be used. The outer octaves are very low frequency, there is no need to be calculating so many pseudo random values.

A second major optimization is the interpolation pass. These passes are separable. This means we can interpolate all of the X, then all of the Y, and then all of the Z. This reduces the number of interpolations required from (N^3 * 7) to (N*P*P + N*N*P + N*N*N).

Here is the basic algorithm for generating grid aligned noise. It is much more complicated than perlin noise, and much less flexible. But it is far faster. It also allows for the use of cubic noise with no visible grid structure. It operates on blocks of noise, instead of on individual samples.

This is described in the context of 3D noise, but can be applied to other dimensions.

- Using the initial location and the frequency determine how many psuedo random values are required for N^3 cube of noise. The result should be a much smaller cube P^3.
- Generate all the pseudo random values required for P^3
- Now we must perform some type of interpolation to create a smoothed representation of P^3. Perlins approach can be used, it is fast and only uses linear interpolation, but it does exhibit some grid structure. Alternatively cubic interpolation can be used, although this requires sampling 4 values per axis, and that we have padded P by 1 on either side.
- Perform interpolation along X axis of P^3, this will result in a block of data N*P*P size
- Perform interpolation along Y axis, resulting in a block of N*N*P size
- Perform interpolation along Z axis, now we have the final result that is N*N*N in size and is properly interpolated.

To make it fast, operate on blocks that fit in the L1 cache.

I used this technique for a project I worked on about six years ago. It is very fast, but I'm not using it for my current project as the grid aligned and constant frequency requirements got in the way. It is also only really efficient when you need large blocks of noise, which my current project does not.

Just thought I'd document it. I'm sure someone else has done this as it is fairly obvious.