Fractal image compression


A couple of years ago I wrote a very simple implementation of fractal image compression as part of a student project and posted the code on GitHub.

To my surprise, the repository turned out to be quite popular, so I decided to update the code and write an article explaining it and the theory.

Theory


This part is quite theoretical, and if you are only interested in the code documentation, you can skip it.

Contraction Mappings


Let $(E, d)$ be a complete metric space and $f : E \rightarrow E$ a mapping from $E$ to $E$.

We say that $f$ is a contraction mapping if there exists $0 < s < 1$ such that:

$$\forall x, y \in E, \quad d(f(x), f(y)) \leq s \, d(x, y)$$

From now on, $f$ denotes a contraction mapping with contraction factor $s$.

There are two important theorems on contraction mappings: the Banach fixed point theorem and the collage theorem.

Fixed point theorem: $f$ has a unique fixed point $x_0$.

Proof:
First we prove that the sequence $(u_n)$ defined by $u_0 = x$ and $u_{n+1} = f(u_n)$ converges for all $x \in E$.

For all $m < n$ in $\mathbb{N}$:

\begin{alignat*}{2}
d(u_m, u_n) &= d(f^m(x), f^n(x)) \\
&\leq s^m d(x, f^{n-m}(x)) && \text{since } f \text{ is a contraction mapping} \\
&\leq s^m \left( \sum_{i=0}^{n-m-1} d(f^i(x), f^{i+1}(x)) \right) && \text{by the triangle inequality} \\
&\leq s^m \left( \sum_{i=0}^{n-m-1} s^i d(x, f(x)) \right) && \text{since } f \text{ is a contraction mapping} \\
&= s^m \frac{1 - s^{n-m}}{1 - s} d(x, f(x)) \\
&\leq \frac{s^m}{1 - s} d(x, f(x)) \underset{m \rightarrow \infty}{\longrightarrow} 0
\end{alignat*}


Consequently, $(u_n)$ is a Cauchy sequence, and $E$ is complete, so $(u_n)$ converges. Let $x_0$ denote its limit.

Moreover, $f$ is continuous because it is Lipschitz, so $f(u_n) \rightarrow f(x_0)$. Letting $n$ tend to infinity in $u_{n+1} = f(u_n)$, we get $x_0 = f(x_0)$, i.e. $x_0$ is a fixed point of $f$.

We have shown that $f$ has a fixed point. Let us show by contradiction that it is unique. Suppose $y_0 \neq x_0$ is another fixed point. Then:

$$d(x_0, y_0) = d(f(x_0), f(y_0)) \leq s \, d(x_0, y_0) < d(x_0, y_0)$$

This is a contradiction.

From now on, $x_0$ denotes the fixed point of $f$.
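Before moving on, the theorem is easy to observe numerically. A minimal one-dimensional sketch (my own example, not from the article): iterating the contraction $f(x) = 0.5x + 1$, with factor $s = 0.5$ and fixed point $x_0 = 2$, converges from any starting point:

```python
def f(x):
    # A contraction on (R, |.|) with factor s = 0.5; its fixed point is x0 = 2,
    # since 2 = 0.5 * 2 + 1.
    return 0.5 * x + 1

def iterate(f, x, n):
    # Build the sequence u_0 = x, u_{n+1} = f(u_n) from the proof.
    for _ in range(n):
        x = f(x)
    return x

print(iterate(f, 100.0, 50))   # close to 2 regardless of the start
print(iterate(f, -3.0, 50))    # also close to 2
```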

Collage theorem: if $d(x, f(x)) < \epsilon$, then $d(x, x_0) < \frac{\epsilon}{1 - s}$.

Proof:
In the previous proof, we showed that $d(u_m, u_n) \leq \frac{s^m}{1 - s} d(x, f(x)) < \frac{s^m}{1 - s} \epsilon$.

Setting $m = 0$, we get $d(x, u_n) < \frac{\epsilon}{1 - s}$.

Letting $n$ tend to infinity yields the desired result, since $u_n \rightarrow x_0$.

The second theorem tells us that if we find a contraction mapping $f$ such that $f(x)$ is close to $x$, then we can be sure that the fixed point of $f$ is also close to $x$.

This result is the foundation of everything that follows: instead of storing the image itself, it is enough to store a contraction mapping whose fixed point is close to the image.
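A toy numerical check of the collage bound, reusing the one-dimensional contraction $f(x) = 0.5x + 1$ with $s = 0.5$ and $x_0 = 2$ (my own example, not from the article): for $x = 1.9$, the collage distance $d(x, f(x))$ is $0.05$, so the theorem bounds $d(x, x_0)$ by $0.05 / (1 - 0.5) = 0.1$, which here is exactly the true distance:

```python
s = 0.5

def f(x):
    # Contraction on (R, |.|) with factor s = 0.5 and fixed point x0 = 2
    return s * x + 1

x = 1.9
eps = abs(x - f(x))        # d(x, f(x)) = 0.05
bound = eps / (1 - s)      # collage bound on d(x, x0): 0.1
print(eps, bound, abs(x - 2.0))
```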

Contraction mappings for images


In this part I will show how to construct contraction mappings whose fixed point is close to a given image.

First, let's fix the set of images and the distance between them. We take $E = [0,1]^{h \times w}$: the set of matrices with $h$ rows, $w$ columns and coefficients in the interval $[0,1]$. As the distance we take $d(x, y) = \left( \sum_{i=1}^{h} \sum_{j=1}^{w} (x_{ij} - y_{ij})^2 \right)^{0.5}$, the distance induced by the Frobenius norm.

Let $x \in E$ be the image we want to compress.

We will split the image into blocks twice:

  • First, we partition the image into destination (or range) blocks $R_1, \ldots, R_L$. These blocks are disjoint and cover the entire image.
  • Then we cover the image with source (or domain) blocks $D_1, \ldots, D_K$. These blocks are not necessarily disjoint and do not necessarily cover the entire image.
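The two partitions above can be sketched as follows (my own illustration; the block coordinates match the loop ranges used later in the script, where destination blocks tile the image and source blocks slide with a step):

```python
def destination_blocks(height, width, size):
    # Destination (range) blocks: disjoint tiles that cover the whole image.
    return [(i * size, j * size)
            for i in range(height // size)
            for j in range(width // size)]

def source_blocks(height, width, size, step):
    # Source (domain) blocks: may overlap (step < size) and need not cover everything.
    return [(k * step, l * step)
            for k in range((height - size) // step + 1)
            for l in range((width - size) // step + 1)]

# On a 16x16 image: 16 disjoint 4x4 destination blocks,
# and 9 overlapping 8x8 source blocks with step 4.
print(len(destination_blocks(16, 16, 4)))  # 16
print(len(source_blocks(16, 16, 8, 4)))    # 9
```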

For example, we can segment the image as follows:


Then for each destination block $R_l$ we select a source block $D_{k_l}$ and a mapping $f_l : [0,1]^{D_{k_l}} \rightarrow [0,1]^{R_l}$.

Next we can define the mapping $f$ as:

$$f(x)_{ij} = f_l(x_{D_{k_l}})_{ij} \quad \text{if } (i,j) \in R_l$$


Claim: if the $f_l$ are contraction mappings, then $f$ is also a contraction mapping.

Proof:
Let $x, y \in E$ and suppose each $f_l$ is a contraction mapping with contraction factor $s_l$. Then:

\begin{alignat*}{2}
d(f(x), f(y))^2 &= \sum_{i=1}^{h} \sum_{j=1}^{w} (f(x)_{ij} - f(y)_{ij})^2 && \text{by definition of } d \\
&= \sum_{l=1}^{L} \sum_{(i,j) \in R_l} (f(x)_{ij} - f(y)_{ij})^2 && \text{since } (R_l) \text{ is a partition} \\
&= \sum_{l=1}^{L} \sum_{(i,j) \in R_l} (f_l(x_{D_{k_l}})_{ij} - f_l(y_{D_{k_l}})_{ij})^2 && \text{by definition of } f \\
&= \sum_{l=1}^{L} d(f_l(x_{D_{k_l}}), f_l(y_{D_{k_l}}))^2 && \text{by definition of } d \\
&\leq \sum_{l=1}^{L} s_l^2 \, d(x_{D_{k_l}}, y_{D_{k_l}})^2 && \text{since the } f_l \text{ are contraction mappings} \\
&\leq \max_l s_l^2 \sum_{l=1}^{L} d(x_{D_{k_l}}, y_{D_{k_l}})^2 \\
&= \max_l s_l^2 \sum_{l=1}^{L} \sum_{(i,j) \in R_l} (x_{ij} - y_{ij})^2 && \text{by definition of } d \\
&= \max_l s_l^2 \sum_{i=1}^{h} \sum_{j=1}^{w} (x_{ij} - y_{ij})^2 && \text{since } (R_l) \text{ is a partition} \\
&= \max_l s_l^2 \, d(x, y)^2 && \text{by definition of } d
\end{alignat*}

There remains one question to answer: how do we choose $D_{k_l}$ and $f_l$?

The collage theorem suggests a way to choose them: if $x_{R_l}$ is close to $f_l(x_{D_{k_l}})$ for every $l$, then $x$ is close to $f(x)$, and by the collage theorem $x$ and $x_0$ are then also close.

So, independently for each $l$, we can construct a set of contraction mappings from each $D_k$ onto $R_l$ and choose the best one. The next section shows the details of this operation.

Implementation


In each section I will quote the interesting code fragments; the entire script can be found here.

Partitions


I chose a very simple approach: both the source blocks and the destination blocks segment the image on a regular grid, as shown in the image above.

The block sizes are powers of two, which greatly simplifies the work. Source blocks are 8 by 8 pixels and destination blocks are 4 by 4.

There are more complex partitioning schemes. For example, we could use a quadtree to partition regions with lots of detail more finely.

Conversions


In this section, I will show how to create contraction mappings from $D_k$ onto $R_l$.

Remember that we want to generate a mapping $f_l$ such that $f_l(x_{D_k})$ is close to $x_{R_l}$. The more mappings we generate, the more likely we are to find a good one.

However, the compression quality depends on the number of bits required to store $f_l$. If the set of candidate functions is too large, the compression ratio suffers. A trade-off is needed here.

I decided that $f_l$ will have the following form:

$$f_l(x_{D_k}) = s \times rotate_{\theta}(flip_d(reduce(x_{D_k}))) + b$$


where reduce is the function that converts an 8 by 8 block into a 4 by 4 block, flip and rotate are affine transformations, $s$ adjusts the contrast, and $b$ the brightness.

The reduce function shrinks the image by averaging neighbourhoods:

def reduce(img, factor):
    result = np.zeros((img.shape[0] // factor, img.shape[1] // factor))
    for i in range(result.shape[0]):
        for j in range(result.shape[1]):
            result[i,j] = np.mean(img[i*factor:(i+1)*factor,j*factor:(j+1)*factor])
    return result
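A quick sanity check of reduce (the definition is repeated here so the snippet runs on its own):

```python
import numpy as np

def reduce(img, factor):
    # Same averaging as above: each output pixel is the mean of a factor x factor tile.
    result = np.zeros((img.shape[0] // factor, img.shape[1] // factor))
    for i in range(result.shape[0]):
        for j in range(result.shape[1]):
            result[i, j] = np.mean(img[i*factor:(i+1)*factor, j*factor:(j+1)*factor])
    return result

block = np.arange(16, dtype=float).reshape(4, 4)
print(reduce(block, 2))
# [[ 2.5  4.5]
#  [10.5 12.5]]  e.g. mean(0, 1, 4, 5) = 2.5
```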

The rotate function rotates the image by a given angle:

def rotate(img, angle):
    return ndimage.rotate(img, angle, reshape=False)

To preserve the shape of the image, the angle $\theta$ can only take the values $\{0^{\circ}, 90^{\circ}, 180^{\circ}, 270^{\circ}\}$.
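Since $\theta$ is restricted to multiples of 90°, the rotation can also be done exactly with np.rot90, with no interpolation at all. This is an alternative sketch, not the article's code (the rotation direction convention may differ from ndimage.rotate):

```python
import numpy as np

def flip(img, direction):
    # Mirror vertically when direction == -1, identity when direction == 1.
    return img[::direction, :]

def rotate90(img, angle):
    # Exact rotation for angle in {0, 90, 180, 270}: no interpolation needed.
    return np.rot90(img, k=angle // 90)

block = np.arange(4.0).reshape(2, 2)   # [[0, 1], [2, 3]]
print(rotate90(flip(block, -1), 90))
```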

The flip function mirrors the image when direction is -1 and leaves it unchanged when direction is 1:

def flip(img, direction):
    return img[::direction,:]

Complete conversion is performed by the apply_transformation function:

def apply_transformation(img, direction, angle, contrast=1.0, brightness=0.0):
    return contrast*rotate(flip(img, direction), angle) + brightness

We need 1 bit to store whether the block is mirrored and 2 bits for the rotation angle. Moreover, if we store $s$ and $b$ using 8 bits each, a transformation takes only 19 bits.
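For illustration only (the article includes no packing code), here is one hypothetical way to pack such a transformation into 19 bits; the value range $[-1, 1]$ assumed for $s$ and $b$ is my own choice:

```python
def quantize(value, lo, hi, bits=8):
    # Map value in [lo, hi] to an integer code of the given width (an assumed scheme).
    levels = (1 << bits) - 1
    return round((value - lo) / (hi - lo) * levels)

def pack(direction, angle, s, b):
    # 1 bit for the flip, 2 bits for the angle (0/90/180/270),
    # 8 bits each for contrast s and brightness b: 19 bits total.
    flip_bit = 0 if direction == 1 else 1
    angle_code = angle // 90
    s_code = quantize(s, -1.0, 1.0)
    b_code = quantize(b, -1.0, 1.0)
    return (flip_bit << 18) | (angle_code << 16) | (s_code << 8) | b_code

code = pack(-1, 180, 0.5, -0.25)
print(f"{code:019b}")  # fits in 19 bits
```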

In addition, we should check that these functions are indeed contraction mappings. The proof is a little tedious and not really the point here; I may add it later as an appendix to the article.

Compression


The compression algorithm is simple. First, we generate all possible affine transformations of all source blocks using the generate_all_transformed_blocks function:

# All candidate (direction, angle) pairs for the affine transformations
candidates = [[direction, angle] for direction in [1, -1] for angle in [0, 90, 180, 270]]

def generate_all_transformed_blocks(img, source_size, destination_size, step):
    factor = source_size // destination_size
    transformed_blocks = []
    for k in range((img.shape[0] - source_size) // step + 1):
        for l in range((img.shape[1] - source_size) // step + 1):
            # Extract the source block and reduce it to the shape of a destination block
            S = reduce(img[k*step:k*step+source_size,l*step:l*step+source_size], factor)
            # Generate all possible transformed blocks
            for direction, angle in candidates:
                transformed_blocks.append((k, l, direction, angle, apply_transformation(S, direction, angle)))
    return transformed_blocks

Then, for each destination block, we test all the previously generated transformed source blocks. For each one, we optimize the contrast and brightness using the find_contrast_and_brightness2 function, and if the tested transformation is the best found so far, we save it:

def compress(img, source_size, destination_size, step):
    transformations = []
    transformed_blocks = generate_all_transformed_blocks(img, source_size, destination_size, step)
    for i in range(img.shape[0] // destination_size):
        transformations.append([])
        for j in range(img.shape[1] // destination_size):
            transformations[i].append(None)
            min_d = float('inf')
            # Extract the destination block
            D = img[i*destination_size:(i+1)*destination_size,j*destination_size:(j+1)*destination_size]
            # Test all possible transformations and take the best one
            for k, l, direction, angle, S in transformed_blocks:
                contrast, brightness = find_contrast_and_brightness2(D, S)
                S = contrast*S + brightness
                d = np.sum(np.square(D - S))
                if d < min_d:
                    min_d = d
                    transformations[i][j] = (k, l, direction, angle, contrast, brightness)
    return transformations

To find the best contrast and brightness, the find_contrast_and_brightness2 function simply solves a least squares problem:

def find_contrast_and_brightness2(D, S):
    # Fit the contrast and the brightness
    A = np.concatenate((np.ones((S.size, 1)), np.reshape(S, (S.size, 1))), axis=1)
    b = np.reshape(D, (D.size,))
    x, _, _, _ = np.linalg.lstsq(A, b, rcond=None)
    return x[1], x[0]
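As a standalone sanity check (the function is redefined here so the snippet runs on its own, with rcond=None passed to np.linalg.lstsq to avoid the NumPy deprecation warning): on noiseless synthetic data, least squares recovers the exact contrast and brightness:

```python
import numpy as np

def find_contrast_and_brightness2(D, S):
    # Columns of A: [1, S]; the least squares solution is (brightness, contrast).
    A = np.concatenate((np.ones((S.size, 1)), np.reshape(S, (S.size, 1))), axis=1)
    b = np.reshape(D, (D.size,))
    x, _, _, _ = np.linalg.lstsq(A, b, rcond=None)
    return x[1], x[0]

S = np.arange(16.0).reshape(4, 4)
D = 0.7 * S + 0.1   # destination block built with known contrast and brightness
contrast, brightness = find_contrast_and_brightness2(D, S)
print(round(contrast, 6), round(brightness, 6))  # 0.7 0.1
```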

Decompression


The decompression algorithm is even simpler. We start from a completely random image and then repeatedly apply the contraction mapping $f$:

def decompress(transformations, source_size, destination_size, step, nb_iter=8):
    factor = source_size // destination_size
    height = len(transformations) * destination_size
    width = len(transformations[0]) * destination_size
    iterations = [np.random.randint(0, 256, (height, width))]
    cur_img = np.zeros((height, width))
    for i_iter in range(nb_iter):
        for i in range(len(transformations)):
            for j in range(len(transformations[i])):
                # Apply the transformation
                k, l, direction, angle, contrast, brightness = transformations[i][j]
                S = reduce(iterations[-1][k*step:k*step+source_size,l*step:l*step+source_size], factor)
                D = apply_transformation(S, direction, angle, contrast, brightness)
                cur_img[i*destination_size:(i+1)*destination_size,j*destination_size:(j+1)*destination_size] = D
        iterations.append(cur_img)
        cur_img = np.zeros((height, width))
    return iterations

This algorithm works because a contraction mapping has a unique fixed point, and no matter which initial image we choose, the iterations converge to it.

I think it is time for a small example. I will try to compress and decompress this image of a monkey:


The test_greyscale function loads the image, compresses it, decompresses it, and shows each decompression iteration:


Not bad at all for such a simple implementation!

RGB images


A very naive approach to compressing RGB images is to compress all three channels individually:

def compress_rgb(img, source_size, destination_size, step):
    img_r, img_g, img_b = extract_rgb(img)
    return [compress(img_r, source_size, destination_size, step),
            compress(img_g, source_size, destination_size, step),
            compress(img_b, source_size, destination_size, step)]

For decompression, we simply decompress the three channels separately and assemble them into a single image:

def decompress_rgb(transformations, source_size, destination_size, step, nb_iter=8):
    img_r = decompress(transformations[0], source_size, destination_size, step, nb_iter)[-1]
    img_g = decompress(transformations[1], source_size, destination_size, step, nb_iter)[-1]
    img_b = decompress(transformations[2], source_size, destination_size, step, nb_iter)[-1]
    return assemble_rbg(img_r, img_g, img_b)

Another very simple option is to use the same contraction mapping for all three channels, since they are often very similar.

If you want to check how this works, then run the test_rgb function:


Artifacts appeared. This method is probably too naive to produce good results.

Where to go next


If you want to know more about fractal image compression, I can recommend reading Fractal and Wavelet Image Compression Techniques by Stephen Welstead. It is easy to read and explains more sophisticated techniques.

Source: https://habr.com/ru/post/479200/

