A couple of years ago I wrote a very simple implementation of fractal image compression for a student project and posted the code on GitHub.
To my surprise, the repository turned out to be quite popular, so I decided to update the code and write an article explaining it and the underlying theory.
Theory
This part is quite theoretical; if you are only interested in the code documentation, you can skip it.
Contraction Mappings
Let $(E, d)$ be a complete metric space and $f : E \rightarrow E$ a mapping from $E$ to $E$.

We say that $f$ is a contraction mapping if there exists $0 < s < 1$ such that:

$$\forall x, y \in E, \quad d(f(x), f(y)) \leq s \, d(x, y)$$

From now on, $f$ denotes a contraction mapping with contraction factor $s$.
There are two important theorems about contraction mappings: the Banach fixed point theorem and the collage theorem.

Fixed point theorem: $f$ has a unique fixed point $x_0$.
Proof. First we prove that the sequence $(u_n)$ defined by

$$u_0 = x, \qquad u_{n+1} = f(u_n)$$

converges for all $x \in E$.

For all $m < n \in \mathbb{N}$:

$$\begin{aligned}
d(u_m, u_n) &= d(f^m(x), f^n(x)) \\
&\leq s^m \, d(x, f^{n-m}(x)) && \text{since } f \text{ is a contraction mapping} \\
&\leq s^m \sum_{i=0}^{n-m-1} d(f^i(x), f^{i+1}(x)) && \text{by the triangle inequality} \\
&\leq s^m \sum_{i=0}^{n-m-1} s^i \, d(x, f(x)) && \text{since } f \text{ is a contraction mapping} \\
&= s^m \, \frac{1 - s^{n-m}}{1 - s} \, d(x, f(x)) \\
&\leq \frac{s^m}{1 - s} \, d(x, f(x)) \xrightarrow[m \to \infty]{} 0
\end{aligned}$$
Consequently, $(u_n)$ is a Cauchy sequence, and since $E$ is complete, $(u_n)$ converges. Let $x_0$ be its limit.

Moreover, a contraction mapping is Lipschitz and therefore continuous, so $f(u_n) \rightarrow f(x_0)$. Letting $n$ tend to infinity in $u_{n+1} = f(u_n)$, we get $x_0 = f(x_0)$, i.e. $x_0$ is a fixed point of $f$.
We have shown that $f$ has a fixed point. Let us show by contradiction that it is unique. Suppose $y_0 \neq x_0$ is another fixed point. Then:

$$d(x_0, y_0) = d(f(x_0), f(y_0)) \leq s \, d(x_0, y_0) < d(x_0, y_0)$$

This is a contradiction.

From now on, $x_0$ denotes the fixed point of $f$.
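A minimal numerical illustration of the fixed point theorem (my own sketch, not part of the article's code): $f(t) = \frac{t}{2} + 1$ is a contraction on the real line with factor $s = \frac{1}{2}$, and its unique fixed point is $t = 2$.

```python
# Iterating a contraction mapping converges to its unique fixed point
# regardless of the starting value (Banach fixed point theorem).
# f(t) = 0.5*t + 1 has contraction factor s = 0.5 and fixed point t = 2.

def f(t):
    return 0.5 * t + 1

for start in [-100.0, 0.0, 57.0]:
    t = start
    for _ in range(60):
        t = f(t)
    print(start, "->", t)  # every starting point ends up (numerically) at 2
```
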
Collage theorem: if $d(x, f(x)) < \epsilon$, then $d(x, x_0) < \frac{\epsilon}{1 - s}$.

Proof. In the previous proof, we showed that $d(u_m, u_n) \leq \frac{s^m}{1 - s} d(x, f(x)) < \frac{s^m}{1 - s} \epsilon$.
Setting $m = 0$ gives $d(x, u_n) < \frac{\epsilon}{1 - s}$, since $u_0 = x$.
Letting $n$ tend to infinity gives the desired result.
The second theorem tells us that if we find a contraction mapping $f$ such that $f(x)$ is close to $x$, then we can be sure that the fixed point of $f$ is also close to $x$.

This result is the foundation for everything that follows: instead of storing the image, it is enough to store a contraction mapping whose fixed point is close to the image.
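The collage bound can be checked numerically on the same toy contraction $f(t) = \frac{t}{2} + 1$ (a sketch of mine, not from the article):

```python
# Collage theorem check for f(t) = s*t + 1 with s = 0.5 and fixed point
# x0 = 2: whenever d(x, f(x)) < eps, we must have d(x, x0) < eps / (1 - s).

s = 0.5
x0 = 2.0

def f(t):
    return s * t + 1

for x in [-3.0, 0.0, 1.9, 10.0]:
    eps = abs(x - f(x)) + 1e-12        # any eps with d(x, f(x)) < eps
    assert abs(x - x0) < eps / (1 - s)  # the collage bound holds
```
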
Contraction mappings for images

In this part I will show how to build contraction mappings whose fixed point is close to a given image.

First, let us define the image space and the distance. We choose $E = [0,1]^{h \times w}$: $E$ is the set of matrices with $h$ rows, $w$ columns, and coefficients in the interval $[0,1]$. As the distance, we take

$$d(x, y) = \left( \sum_{i=1}^{h} \sum_{j=1}^{w} (x_{ij} - y_{ij})^2 \right)^{1/2}$$

$d$ is the distance induced by the Frobenius norm.
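A quick check (my own, not part of the article's code) that this double-sum distance is exactly the Frobenius norm of the difference, on two tiny $2 \times 2$ "images":

```python
import numpy as np

# The distance d between two images is the Frobenius norm of their difference.
x = np.array([[0.0, 0.5], [1.0, 0.25]])
y = np.array([[0.0, 0.0], [1.0, 0.75]])

d_explicit = np.sqrt(np.sum((x - y) ** 2))  # the double-sum definition
d_frobenius = np.linalg.norm(x - y)         # NumPy's Frobenius norm
assert np.isclose(d_explicit, d_frobenius)
```
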
Let $x \in E$ be the image we want to compress.

We partition the image in two ways:
- First, we partition the image into destination (or range) blocks $R_1, \ldots, R_L$. These blocks are disjoint and cover the entire image.
- Then, we partition the image into source (or domain) blocks $D_1, \ldots, D_K$. These blocks are not necessarily disjoint and do not necessarily cover the entire image.

For example, we can segment the image as follows:
Then, for each destination block $R_l$, we select a source block $D_{k_l}$ and a mapping $f_l : [0,1]^{D_{k_l}} \rightarrow [0,1]^{R_l}$.

We can then define a function $f$ as:

$$f(x)_{ij} = f_l(x_{D_{k_l}})_{ij} \quad \text{if } (i,j) \in R_l$$
Claim: if the $f_l$ are contraction mappings, then $f$ is also a contraction mapping.

Proof. Let $x, y \in E$ and suppose that each $f_l$ is a contraction mapping with contraction factor $s_l$. Then:

$$\begin{aligned}
d(f(x), f(y))^2 &= \sum_{i=1}^{h} \sum_{j=1}^{w} (f(x)_{ij} - f(y)_{ij})^2 && \text{by definition of } d \\
&= \sum_{l=1}^{L} \sum_{(i,j) \in R_l} (f(x)_{ij} - f(y)_{ij})^2 && \text{since } (R_l) \text{ is a partition} \\
&= \sum_{l=1}^{L} \sum_{(i,j) \in R_l} (f_l(x_{D_{k_l}})_{ij} - f_l(y_{D_{k_l}})_{ij})^2 && \text{by definition of } f \\
&= \sum_{l=1}^{L} d(f_l(x_{D_{k_l}}), f_l(y_{D_{k_l}}))^2 && \text{by definition of } d \\
&\leq \sum_{l=1}^{L} s_l^2 \, d(x_{D_{k_l}}, y_{D_{k_l}})^2 && \text{since the } f_l \text{ are contraction mappings} \\
&\leq \max_l s_l^2 \sum_{l=1}^{L} d(x_{D_{k_l}}, y_{D_{k_l}})^2 \\
&= \max_l s_l^2 \sum_{l=1}^{L} \sum_{(i,j) \in R_l} (x_{ij} - y_{ij})^2 && \text{by definition of } d \\
&= \max_l s_l^2 \sum_{i=1}^{h} \sum_{j=1}^{w} (x_{ij} - y_{ij})^2 && \text{since } (R_l) \text{ is a partition} \\
&= \max_l s_l^2 \, d(x, y)^2 && \text{by definition of } d
\end{aligned}$$

Since $\max_l s_l < 1$, $f$ is a contraction mapping.
One question remains: how do we choose $D_{k_l}$ and $f_l$?

The collage theorem suggests a way: if $x_{R_l}$ is close to $f_l(x_{D_{k_l}})$ for every $l$, then $x$ is close to $f(x)$, and by the collage theorem $x_0$ and $x$ are also close.

So, independently for each $l$, we can construct a set of contraction mappings from each $D_k$ to $R_l$ and pick the best one. The next section shows all the details of this operation.
Implementation
In each section I will quote the interesting code fragments; the entire script can be found here.
Partitions
I chose a very simple approach: the source and destination blocks tile the image on a grid, as shown in the image above.

The block sizes are powers of two, which greatly simplifies the work. The source blocks are 8 by 8 and the destination blocks are 4 by 4.

More sophisticated partitioning schemes exist. For example, a quadtree can be used to partition areas with lots of detail more finely.
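To get a feel for the sizes involved, here is a rough count of blocks for a hypothetical 256×256 greyscale image, assuming source blocks are sampled on a grid with step 8 and that each source block yields 8 affine variants (2 flips × 4 rotations, as described in the next section):

```python
# Back-of-the-envelope block counts for a hypothetical 256x256 image.
height = width = 256
source_size, destination_size, step = 8, 4, 8

# Disjoint 4x4 destination blocks tile the whole image.
n_destination_blocks = (height // destination_size) * (width // destination_size)

# 8x8 source blocks sampled every `step` pixels, 8 variants each.
n_source_positions = ((height - source_size) // step + 1) ** 2
n_candidates = n_source_positions * 2 * 4  # flips x rotations

print(n_destination_blocks)  # destination blocks to cover
print(n_candidates)          # transformed source blocks to search
```

So every destination block is matched against thousands of candidates, which is why compression is much slower than decompression.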
Transformations
In this section, I will show how to create contraction mappings from $D_k$ to $R_l$.

Remember that we want to generate a mapping $f_l$ such that $f_l(x_{D_k})$ is close to $x_{R_l}$. The more mappings we generate, the more likely we are to find a good one.

However, the compression quality depends on the number of bits required to store $f_l$: if the set of functions is too large, the compression will be bad. A compromise is needed here.

I decided that $f_l$ will have the following form:

$$f_l(x_{D_k}) = s \times \text{rotate}_{\theta}(\text{flip}_d(\text{reduce}(x_{D_k}))) + b$$
Here, `reduce` is the function that goes from 8×8 blocks to 4×4 blocks, `flip` and `rotate` are affine transformations, $s$ adjusts the contrast, and $b$ the brightness.

The `reduce` function shrinks the image by averaging neighbourhoods:
```python
def reduce(img, factor):
    result = np.zeros((img.shape[0] // factor, img.shape[1] // factor))
    for i in range(result.shape[0]):
        for j in range(result.shape[1]):
            result[i,j] = np.mean(img[i*factor:(i+1)*factor,j*factor:(j+1)*factor])
    return result
```
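As a sanity check (my own sketch, not from the repository), the loop above is equivalent to a vectorized reshape-and-mean, which averages each `factor` × `factor` neighbourhood in one step:

```python
import numpy as np

# Vectorized equivalent of the loop-based `reduce`: group pixels into
# factor x factor tiles via reshape, then average each tile.
def reduce_vectorized(img, factor):
    h, w = img.shape[0] // factor, img.shape[1] // factor
    return img[:h * factor, :w * factor].reshape(h, factor, w, factor).mean(axis=(1, 3))

img = np.arange(64, dtype=float).reshape(8, 8)
small = reduce_vectorized(img, 2)            # 4x4 result of averaging 2x2 tiles
assert small.shape == (4, 4)
assert small[0, 0] == np.mean(img[:2, :2])   # matches the explicit average
```
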
The `rotate` function rotates the image by a given angle:

```python
def rotate(img, angle):
    return ndimage.rotate(img, angle, reshape=False)
```
To preserve the shape of the image, the angle $\theta$ can only take the values $\{0^{\circ}, 90^{\circ}, 180^{\circ}, 270^{\circ}\}$.
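For these four angles the rotation is an exact permutation of the pixels, so no interpolation is needed. A quick check using NumPy's `np.rot90`, which implements the same quarter-turn rotations (my own illustration, not part of the article's code):

```python
import numpy as np

# Quarter-turn rotations permute pixels exactly and preserve the shape
# of a square block.
block = np.arange(16, dtype=float).reshape(4, 4)

for quarter_turns in range(4):           # 0, 90, 180, 270 degrees
    rotated = np.rot90(block, k=quarter_turns)
    assert rotated.shape == block.shape  # shape is preserved

# Four quarter turns bring the block back to itself.
assert np.array_equal(np.rot90(block, k=4), block)
```
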
The `flip` function mirrors the image if `direction` is -1 and leaves it unchanged if `direction` is 1:

```python
def flip(img, direction):
    return img[::direction,:]
```
The complete transformation is performed by the `apply_transformation` function:

```python
def apply_transformation(img, direction, angle, contrast=1.0, brightness=0.0):
    return contrast*rotate(flip(img, direction), angle) + brightness
```
We need 1 bit to store whether the image is mirrored and 2 bits for the rotation angle. Moreover, if we store $s$ and $b$ using 8 bits each, the whole transformation takes only 19 bits.
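As an illustration of the arithmetic (1 + 2 + 8 + 8 = 19 bits), here is a hypothetical packing scheme of my own; the repository does not actually serialize transformations this way:

```python
# Hypothetical 19-bit encoding of one transformation: 1 bit for the flip,
# 2 bits for the rotation angle, and 8 bits each for contrast and
# brightness already quantized to integers in 0..255.

def pack(direction, angle, contrast_q, brightness_q):
    flip_bit = 0 if direction == 1 else 1
    angle_bits = angle // 90  # 0, 90, 180, 270 -> 0..3
    return (flip_bit << 18) | (angle_bits << 16) | (contrast_q << 8) | brightness_q

def unpack(code):
    direction = 1 if (code >> 18) & 0x1 == 0 else -1
    angle = ((code >> 16) & 0x3) * 90
    return direction, angle, (code >> 8) & 0xFF, code & 0xFF

code = pack(-1, 270, 200, 17)
assert code < 2 ** 19                       # fits in 19 bits
assert unpack(code) == (-1, 270, 200, 17)   # lossless round trip
```
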
We should also check that these functions really are contraction mappings. The proof is a little tedious and not really what we need here; perhaps I will add it later as an appendix to the article.
Compression
The compression algorithm is simple. First, we generate all possible affine transformations of all source blocks using the `generate_all_transformed_blocks` function:

```python
# All candidate (direction, angle) pairs: 2 flips x 4 rotations
directions = [1, -1]
angles = [0, 90, 180, 270]
candidates = [[direction, angle] for direction in directions for angle in angles]

def generate_all_transformed_blocks(img, source_size, destination_size, step):
    factor = source_size // destination_size
    transformed_blocks = []
    for k in range((img.shape[0] - source_size) // step + 1):
        for l in range((img.shape[1] - source_size) // step + 1):
            # Extract the source block and reduce it to the size of a destination block
            S = reduce(img[k*step:k*step+source_size,l*step:l*step+source_size], factor)
            # Generate all possible transformed variants of the block
            for direction, angle in candidates:
                transformed_blocks.append((k, l, direction, angle, apply_transformation(S, direction, angle)))
    return transformed_blocks
```
Then, for each destination block, we check all previously generated transformed source blocks. For each one, we optimize the contrast and brightness using the `find_contrast_and_brightness2` function, and if the tested transformation is the best found so far, we save it:

```python
def compress(img, source_size, destination_size, step):
    transformations = []
    transformed_blocks = generate_all_transformed_blocks(img, source_size, destination_size, step)
    for i in range(img.shape[0] // destination_size):
        transformations.append([])
        for j in range(img.shape[1] // destination_size):
            print(i, j)
            transformations[i].append(None)
            min_d = float('inf')
            # Extract the destination block
            D = img[i*destination_size:(i+1)*destination_size,j*destination_size:(j+1)*destination_size]
            # Test all transformed source blocks and keep the best one
            for k, l, direction, angle, S in transformed_blocks:
                contrast, brightness = find_contrast_and_brightness2(D, S)
                S = contrast*S + brightness
                d = np.sum(np.square(D - S))
                if d < min_d:
                    min_d = d
                    transformations[i][j] = (k, l, direction, angle, contrast, brightness)
    return transformations
```
To find the best contrast and brightness, the `find_contrast_and_brightness2` function simply solves a least squares problem:

```python
def find_contrast_and_brightness2(D, S):
    # Fit D ~ contrast*S + brightness by least squares
    A = np.concatenate((np.ones((S.size, 1)), np.reshape(S, (S.size, 1))), axis=1)
    b = np.reshape(D, (D.size,))
    x, _, _, _ = np.linalg.lstsq(A, b, rcond=None)
    return x[1], x[0]
```
Decompression

The decompression algorithm is even simpler: we start from a completely random image, and then apply the contraction mapping $f$ several times:

```python
def decompress(transformations, source_size, destination_size, step, nb_iter=8):
    factor = source_size // destination_size
    height = len(transformations) * destination_size
    width = len(transformations[0]) * destination_size
    iterations = [np.random.randint(0, 256, (height, width))]
    cur_img = np.zeros((height, width))
    for i_iter in range(nb_iter):
        print(i_iter)
        for i in range(len(transformations)):
            for j in range(len(transformations[i])):
                # Apply the saved transformation to the source block
                k, l, direction, angle, contrast, brightness = transformations[i][j]
                S = reduce(iterations[-1][k*step:k*step+source_size,l*step:l*step+source_size], factor)
                D = apply_transformation(S, direction, angle, contrast, brightness)
                cur_img[i*destination_size:(i+1)*destination_size,j*destination_size:(j+1)*destination_size] = D
        iterations.append(cur_img)
        cur_img = np.zeros((height, width))
    return iterations
```
This algorithm works because the contraction mapping has a unique fixed point and, whatever the initial image we choose, the iterations converge to it.

I think it is time for a small example. I will try to compress and decompress an image of a monkey:

The `test_greyscale` function loads the image, compresses it, decompresses it, and shows each decompression iteration:
Not bad at all for such a simple implementation!
RGB images
A very naive approach to compressing RGB images is to compress the three channels separately:

```python
def compress_rgb(img, source_size, destination_size, step):
    img_r, img_g, img_b = extract_rgb(img)
    return [compress(img_r, source_size, destination_size, step),
            compress(img_g, source_size, destination_size, step),
            compress(img_b, source_size, destination_size, step)]
```
To decompress, we simply decompress the three channels separately and reassemble them into an image:

```python
def decompress_rgb(transformations, source_size, destination_size, step, nb_iter=8):
    img_r = decompress(transformations[0], source_size, destination_size, step, nb_iter)[-1]
    img_g = decompress(transformations[1], source_size, destination_size, step, nb_iter)[-1]
    img_b = decompress(transformations[2], source_size, destination_size, step, nb_iter)[-1]
    return assemble_rbg(img_r, img_g, img_b)
```
Another very simple option is to use the same contraction mapping for all three channels, since they are often very similar.

If you want to check how this works, run the `test_rgb` function:

Some artifacts appear. This method is probably too naive to give good results.
Where to go next
If you want to learn more about fractal image compression, I can recommend the book Fractal and Wavelet Image Compression Techniques by Stephen Welstead. It is easy to read and explains more sophisticated techniques.