A couple of years ago I wrote a very simple implementation of fractal image compression for a student project and posted the code on GitHub.
To my surprise, the repository turned out to be quite popular, so I decided to update the code and write an article explaining it and the underlying theory.
Theory
This part is quite theoretical; if you are only interested in the code documentation, you can skip it.
Contraction Mappings
Let $(E, d)$ be a complete metric space and $f : E \rightarrow E$ a mapping from $E$ to $E$.

We say that $f$ is a contraction mapping if there exists $0 < s < 1$ such that:

$$\forall x, y \in E, \quad d(f(x), f(y)) \leq s \, d(x, y)$$

From now on, $f$ denotes a contraction mapping with contraction factor $s$.
There are two important theorems about contraction mappings: the Banach fixed point theorem and the collage theorem.

Fixed point theorem: $f$ has a unique fixed point $x_0$.
Proof. First we prove that the sequence $(u_n)$ defined by

$$u_0 = x, \qquad u_{n+1} = f(u_n)$$

converges for all $x \in E$.

For all $m < n \in \mathbb{N}$:

$$\begin{aligned}
d(u_m, u_n) &= d(f^m(x), f^n(x)) \\
&\leq s^m \, d(x, f^{n-m}(x)) && \text{since } f \text{ is a contraction mapping} \\
&\leq s^m \sum_{i=0}^{n-m-1} d(f^i(x), f^{i+1}(x)) && \text{by the triangle inequality} \\
&\leq s^m \sum_{i=0}^{n-m-1} s^i \, d(x, f(x)) && \text{since } f \text{ is a contraction mapping} \\
&= s^m \, \frac{1 - s^{n-m}}{1 - s} \, d(x, f(x)) \\
&\leq \frac{s^m}{1 - s} \, d(x, f(x)) \xrightarrow[m \to \infty]{} 0
\end{aligned}$$
Consequently, $(u_n)$ is a Cauchy sequence, and since $E$ is complete, $(u_n)$ converges. Let $x_0$ be its limit.

Moreover, a contraction mapping is Lipschitz and therefore continuous, so $f(u_n) \rightarrow f(x_0)$. Letting $n$ tend to infinity in $u_{n+1} = f(u_n)$, we get $x_0 = f(x_0)$, i.e. $x_0$ is a fixed point of $f$.
We have shown that $f$ has a fixed point. Let us show by contradiction that it is unique. Suppose $y_0 \neq x_0$ is another fixed point. Then:

$$d(x_0, y_0) = d(f(x_0), f(y_0)) \leq s \, d(x_0, y_0) < d(x_0, y_0)$$

This is a contradiction.

From now on, $x_0$ denotes the fixed point of $f$.
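A minimal numerical illustration of the fixed point theorem (my own sketch, not part of the article's code): $f(t) = \frac{t}{2} + 1$ is a contraction on the real line with factor $s = \frac{1}{2}$, and its unique fixed point is $t = 2$.

```python
# Iterating a contraction mapping converges to its unique fixed point
# regardless of the starting value (Banach fixed point theorem).
# f(t) = 0.5*t + 1 has contraction factor s = 0.5 and fixed point t = 2.

def f(t):
    return 0.5 * t + 1

for start in [-100.0, 0.0, 57.0]:
    t = start
    for _ in range(60):
        t = f(t)
    print(start, "->", t)  # every starting point ends up (numerically) at 2
```
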
Collage theorem: if $d(x, f(x)) < \epsilon$, then $d(x, x_0) < \frac{\epsilon}{1 - s}$.

Proof. In the previous proof, we showed that $d(u_m, u_n) \leq \frac{s^m}{1 - s} d(x, f(x)) < \frac{s^m}{1 - s} \epsilon$.
Setting $m = 0$ gives $d(x, u_n) < \frac{\epsilon}{1 - s}$, since $u_0 = x$.
Letting $n$ tend to infinity gives the desired result.
The second theorem tells us that if we find a contraction mapping $f$ such that $f(x)$ is close to $x$, then we can be sure that the fixed point of $f$ is also close to $x$.

This result is the foundation for everything that follows: instead of storing the image, it is enough to store a contraction mapping whose fixed point is close to the image.
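The collage bound can be checked numerically on the same toy contraction $f(t) = \frac{t}{2} + 1$ (a sketch of mine, not from the article):

```python
# Collage theorem check for f(t) = s*t + 1 with s = 0.5 and fixed point
# x0 = 2: whenever d(x, f(x)) < eps, we must have d(x, x0) < eps / (1 - s).

s = 0.5
x0 = 2.0

def f(t):
    return s * t + 1

for x in [-3.0, 0.0, 1.9, 10.0]:
    eps = abs(x - f(x)) + 1e-12        # any eps with d(x, f(x)) < eps
    assert abs(x - x0) < eps / (1 - s)  # the collage bound holds
```
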
Contraction mappings for images

In this part I will show how to build contraction mappings whose fixed point is close to a given image.

First, let us define the image space and the distance. We choose $E = [0,1]^{h \times w}$: $E$ is the set of matrices with $h$ rows, $w$ columns, and coefficients in the interval $[0,1]$. As the distance, we take

$$d(x, y) = \left( \sum_{i=1}^{h} \sum_{j=1}^{w} (x_{ij} - y_{ij})^2 \right)^{1/2}$$

$d$ is the distance induced by the Frobenius norm.
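A quick check (my own, not part of the article's code) that this double-sum distance is exactly the Frobenius norm of the difference, on two tiny $2 \times 2$ "images":

```python
import numpy as np

# The distance d between two images is the Frobenius norm of their difference.
x = np.array([[0.0, 0.5], [1.0, 0.25]])
y = np.array([[0.0, 0.0], [1.0, 0.75]])

d_explicit = np.sqrt(np.sum((x - y) ** 2))  # the double-sum definition
d_frobenius = np.linalg.norm(x - y)         # NumPy's Frobenius norm
assert np.isclose(d_explicit, d_frobenius)
```
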
Let $x \in E$ be the image we want to compress.

We partition the image in two ways:
- First, we partition the image into destination (or range) blocks $R_1, \ldots, R_L$. These blocks are disjoint and cover the entire image.
- Then, we partition the image into source (or domain) blocks $D_1, \ldots, D_K$. These blocks are not necessarily disjoint and do not necessarily cover the entire image.

For example, we can segment the image as follows:
Then, for each destination block $R_l$, we select a source block $D_{k_l}$ and a mapping $f_l : [0,1]^{D_{k_l}} \rightarrow [0,1]^{R_l}$.

We can then define a function $f$ as:

$$f(x)_{ij} = f_l(x_{D_{k_l}})_{ij} \quad \text{if } (i,j) \in R_l$$
Claim: if the $f_l$ are contraction mappings, then $f$ is also a contraction mapping.

Proof. Let $x, y \in E$ and suppose that each $f_l$ is a contraction mapping with contraction factor $s_l$. Then:

$$\begin{aligned}
d(f(x), f(y))^2 &= \sum_{i=1}^{h} \sum_{j=1}^{w} (f(x)_{ij} - f(y)_{ij})^2 && \text{by definition of } d \\
&= \sum_{l=1}^{L} \sum_{(i,j) \in R_l} (f(x)_{ij} - f(y)_{ij})^2 && \text{since } (R_l) \text{ is a partition} \\
&= \sum_{l=1}^{L} \sum_{(i,j) \in R_l} (f_l(x_{D_{k_l}})_{ij} - f_l(y_{D_{k_l}})_{ij})^2 && \text{by definition of } f \\
&= \sum_{l=1}^{L} d(f_l(x_{D_{k_l}}), f_l(y_{D_{k_l}}))^2 && \text{by definition of } d \\
&\leq \sum_{l=1}^{L} s_l^2 \, d(x_{D_{k_l}}, y_{D_{k_l}})^2 && \text{since the } f_l \text{ are contraction mappings} \\
&\leq \max_l s_l^2 \sum_{l=1}^{L} d(x_{D_{k_l}}, y_{D_{k_l}})^2 \\
&= \max_l s_l^2 \sum_{l=1}^{L} \sum_{(i,j) \in R_l} (x_{ij} - y_{ij})^2 && \text{by definition of } d \\
&= \max_l s_l^2 \sum_{i=1}^{h} \sum_{j=1}^{w} (x_{ij} - y_{ij})^2 && \text{since } (R_l) \text{ is a partition} \\
&= \max_l s_l^2 \, d(x, y)^2 && \text{by definition of } d
\end{aligned}$$

Since $\max_l s_l < 1$, $f$ is a contraction mapping.
One question remains: how do we choose $D_{k_l}$ and $f_l$?

The collage theorem suggests a way: if $x_{R_l}$ is close to $f_l(x_{D_{k_l}})$ for every $l$, then $x$ is close to $f(x)$, and by the collage theorem $x_0$ and $x$ are also close.

So, independently for each $l$, we can construct a set of contraction mappings from each $D_k$ to $R_l$ and pick the best one. The next section shows all the details of this operation.
Implementation
In each section I will quote the interesting code fragments; the entire script can be found here.
Partitions
I chose a very simple approach: the source and destination blocks tile the image on a grid, as shown in the image above.

The block sizes are powers of two, which greatly simplifies the work. The source blocks are 8 by 8 and the destination blocks are 4 by 4.

More sophisticated partitioning schemes exist. For example, a quadtree can be used to partition areas with lots of detail more finely.
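To get a feel for the sizes involved, here is a rough count of blocks for a hypothetical 256×256 greyscale image, assuming source blocks are sampled on a grid with step 8 and that each source block yields 8 affine variants (2 flips × 4 rotations, as described in the next section):

```python
# Back-of-the-envelope block counts for a hypothetical 256x256 image.
height = width = 256
source_size, destination_size, step = 8, 4, 8

# Disjoint 4x4 destination blocks tile the whole image.
n_destination_blocks = (height // destination_size) * (width // destination_size)

# 8x8 source blocks sampled every `step` pixels, 8 variants each.
n_source_positions = ((height - source_size) // step + 1) ** 2
n_candidates = n_source_positions * 2 * 4  # flips x rotations

print(n_destination_blocks)  # destination blocks to cover
print(n_candidates)          # transformed source blocks to search
```

So every destination block is matched against thousands of candidates, which is why compression is much slower than decompression.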
Transformations
In this section, I will show how to create contraction mappings from $D_k$ to $R_l$.

Remember that we want to generate a mapping $f_l$ such that $f_l(x_{D_k})$ is close to $x_{R_l}$. The more mappings we generate, the more likely we are to find a good one.

However, the compression quality depends on the number of bits required to store $f_l$: if the set of functions is too large, the compression will be bad. A compromise is needed here.

I decided that $f_l$ will have the following form:

$$f_l(x_{D_k}) = s \times \text{rotate}_{\theta}(\text{flip}_d(\text{reduce}(x_{D_k}))) + b$$
Here, `reduce` is the function that goes from 8×8 blocks to 4×4 blocks, `flip` and `rotate` are affine transformations, $s$ adjusts the contrast, and $b$ the brightness.

The `reduce` function shrinks the image by averaging neighbourhoods:
```python
def reduce(img, factor):
    result = np.zeros((img.shape[0] // factor, img.shape[1] // factor))
    for i in range(result.shape[0]):
        for j in range(result.shape[1]):
            result[i,j] = np.mean(img[i*factor:(i+1)*factor,j*factor:(j+1)*factor])
    return result
```
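As a sanity check (my own sketch, not from the repository), the loop above is equivalent to a vectorized reshape-and-mean, which averages each `factor` × `factor` neighbourhood in one step:

```python
import numpy as np

# Vectorized equivalent of the loop-based `reduce`: group pixels into
# factor x factor tiles via reshape, then average each tile.
def reduce_vectorized(img, factor):
    h, w = img.shape[0] // factor, img.shape[1] // factor
    return img[:h * factor, :w * factor].reshape(h, factor, w, factor).mean(axis=(1, 3))

img = np.arange(64, dtype=float).reshape(8, 8)
small = reduce_vectorized(img, 2)            # 4x4 result of averaging 2x2 tiles
assert small.shape == (4, 4)
assert small[0, 0] == np.mean(img[:2, :2])   # matches the explicit average
```
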
The `rotate` function rotates the image by a given angle:

```python
def rotate(img, angle):
    return ndimage.rotate(img, angle, reshape=False)
```
To preserve the shape of the image, the angle $\theta$ can only take the values $\{0^{\circ}, 90^{\circ}, 180^{\circ}, 270^{\circ}\}$.
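For these four angles the rotation is an exact permutation of the pixels, so no interpolation is needed. A quick check using NumPy's `np.rot90`, which implements the same quarter-turn rotations (my own illustration, not part of the article's code):

```python
import numpy as np

# Quarter-turn rotations permute pixels exactly and preserve the shape
# of a square block.
block = np.arange(16, dtype=float).reshape(4, 4)

for quarter_turns in range(4):           # 0, 90, 180, 270 degrees
    rotated = np.rot90(block, k=quarter_turns)
    assert rotated.shape == block.shape  # shape is preserved

# Four quarter turns bring the block back to itself.
assert np.array_equal(np.rot90(block, k=4), block)
```
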
The `flip` function mirrors the image if `direction` is -1 and leaves it unchanged if `direction` is 1:

```python
def flip(img, direction):
    return img[::direction,:]
```
The complete transformation is performed by the `apply_transformation` function:

```python
def apply_transformation(img, direction, angle, contrast=1.0, brightness=0.0):
    return contrast*rotate(flip(img, direction), angle) + brightness
```
We need 1 bit to store whether the image is mirrored and 2 bits for the rotation angle. Moreover, if we store $s$ and $b$ using 8 bits each, the whole transformation takes only 19 bits.
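As an illustration of the arithmetic (1 + 2 + 8 + 8 = 19 bits), here is a hypothetical packing scheme of my own; the repository does not actually serialize transformations this way:

```python
# Hypothetical 19-bit encoding of one transformation: 1 bit for the flip,
# 2 bits for the rotation angle, and 8 bits each for contrast and
# brightness already quantized to integers in 0..255.

def pack(direction, angle, contrast_q, brightness_q):
    flip_bit = 0 if direction == 1 else 1
    angle_bits = angle // 90  # 0, 90, 180, 270 -> 0..3
    return (flip_bit << 18) | (angle_bits << 16) | (contrast_q << 8) | brightness_q

def unpack(code):
    direction = 1 if (code >> 18) & 0x1 == 0 else -1
    angle = ((code >> 16) & 0x3) * 90
    return direction, angle, (code >> 8) & 0xFF, code & 0xFF

code = pack(-1, 270, 200, 17)
assert code < 2 ** 19                       # fits in 19 bits
assert unpack(code) == (-1, 270, 200, 17)   # lossless round trip
```
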
We should also check that these functions really are contraction mappings. The proof is a little tedious and not really what we need here; perhaps I will add it later as an appendix to the article.
Compression
The compression algorithm is simple. First, we generate all possible affine transformations of all source blocks using the `generate_all_transformed_blocks` function:

```python
# All candidate (direction, angle) pairs: 2 flips x 4 rotations
directions = [1, -1]
angles = [0, 90, 180, 270]
candidates = [[direction, angle] for direction in directions for angle in angles]

def generate_all_transformed_blocks(img, source_size, destination_size, step):
    factor = source_size // destination_size
    transformed_blocks = []
    for k in range((img.shape[0] - source_size) // step + 1):
        for l in range((img.shape[1] - source_size) // step + 1):
            # Extract the source block and reduce it to the size of a destination block
            S = reduce(img[k*step:k*step+source_size,l*step:l*step+source_size], factor)
            # Generate all possible transformed variants of the block
            for direction, angle in candidates:
                transformed_blocks.append((k, l, direction, angle, apply_transformation(S, direction, angle)))
    return transformed_blocks
```
Then, for each destination block, we check all previously generated transformed source blocks. For each one, we optimize the contrast and brightness using the `find_contrast_and_brightness2` function, and if the tested transformation is the best found so far, we save it:

```python
def compress(img, source_size, destination_size, step):
    transformations = []
    transformed_blocks = generate_all_transformed_blocks(img, source_size, destination_size, step)
    for i in range(img.shape[0] // destination_size):
        transformations.append([])
        for j in range(img.shape[1] // destination_size):
            print(i, j)
            transformations[i].append(None)
            min_d = float('inf')
            # Extract the destination block
            D = img[i*destination_size:(i+1)*destination_size,j*destination_size:(j+1)*destination_size]
            # Test all transformed source blocks and keep the best one
            for k, l, direction, angle, S in transformed_blocks:
                contrast, brightness = find_contrast_and_brightness2(D, S)
                S = contrast*S + brightness
                d = np.sum(np.square(D - S))
                if d < min_d:
                    min_d = d
                    transformations[i][j] = (k, l, direction, angle, contrast, brightness)
    return transformations
```
To find the best contrast and brightness, the `find_contrast_and_brightness2` function simply solves a least squares problem:

```python
def find_contrast_and_brightness2(D, S):
    # Fit D ~ contrast*S + brightness by least squares
    A = np.concatenate((np.ones((S.size, 1)), np.reshape(S, (S.size, 1))), axis=1)
    b = np.reshape(D, (D.size,))
    x, _, _, _ = np.linalg.lstsq(A, b, rcond=None)
    return x[1], x[0]
```
Decompression

The decompression algorithm is even simpler: we start from a completely random image, and then apply the contraction mapping $f$ several times:

```python
def decompress(transformations, source_size, destination_size, step, nb_iter=8):
    factor = source_size // destination_size
    height = len(transformations) * destination_size
    width = len(transformations[0]) * destination_size
    iterations = [np.random.randint(0, 256, (height, width))]
    cur_img = np.zeros((height, width))
    for i_iter in range(nb_iter):
        print(i_iter)
        for i in range(len(transformations)):
            for j in range(len(transformations[i])):
                # Apply the saved transformation to the source block
                k, l, direction, angle, contrast, brightness = transformations[i][j]
                S = reduce(iterations[-1][k*step:k*step+source_size,l*step:l*step+source_size], factor)
                D = apply_transformation(S, direction, angle, contrast, brightness)
                cur_img[i*destination_size:(i+1)*destination_size,j*destination_size:(j+1)*destination_size] = D
        iterations.append(cur_img)
        cur_img = np.zeros((height, width))
    return iterations
```
This algorithm works because the contraction mapping has a unique fixed point and, whatever the initial image we choose, the iterations converge to it.

I think it is time for a small example. I will try to compress and decompress an image of a monkey:

The `test_greyscale` function loads the image, compresses it, decompresses it, and shows each decompression iteration:
Not bad at all for such a simple implementation!
RGB images
A very naive approach to compressing RGB images is to compress the three channels separately:

```python
def compress_rgb(img, source_size, destination_size, step):
    img_r, img_g, img_b = extract_rgb(img)
    return [compress(img_r, source_size, destination_size, step),
            compress(img_g, source_size, destination_size, step),
            compress(img_b, source_size, destination_size, step)]
```
To decompress, we simply decompress the three channels separately and reassemble them into an image:

```python
def decompress_rgb(transformations, source_size, destination_size, step, nb_iter=8):
    img_r = decompress(transformations[0], source_size, destination_size, step, nb_iter)[-1]
    img_g = decompress(transformations[1], source_size, destination_size, step, nb_iter)[-1]
    img_b = decompress(transformations[2], source_size, destination_size, step, nb_iter)[-1]
    return assemble_rbg(img_r, img_g, img_b)
```
Another very simple option is to use the same contraction mapping for all three channels, since they are often very similar.

If you want to check how this works, run the `test_rgb` function:

Some artifacts appear. This method is probably too naive to give good results.
Where to go next
If you want to learn more about fractal image compression, I can recommend the book Fractal and Wavelet Image Compression Techniques by Stephen Welstead. It is easy to read and explains more sophisticated techniques.