# Problem

Convert a string containing 4 uppercase alphanumeric characters, whose values range from A through Z and 0 through 9, into a “HH:MM:SS” (24 hours, 60 minutes, 60 seconds) timestamp whose value ranges from “00:00:00” through “23:59:59”.

The string has 1679616 possible values (36 × 36 × 36 × 36) whereas the timestamp has 86400 possible values (24 × 60 × 60). Thus the challenge is to convey more information using less.

How would you solve this problem? Pause for a moment to think about it. In fact, go now and implement your solution before reading about my particular approach and solution below, as this is quite a fun puzzle! :)

# Approach

Each character in the input string can be represented as a base 36 digit:

• Characters 0 through 9 become integers 0 through 9 respectively.
• Characters A through Z become integers 10 through 35 respectively.

Let’s examine the base 36 and binary representations of each character:

Character Base 36 Bit 6 Bit 5 Bit 4 Bit 3 Bit 2 Bit 1
0 0 . . . . . 0
1 1 . . . . . 1
2 2 . . . . 1 0
3 3 . . . . 1 1
4 4 . . . 1 0 0
5 5 . . . 1 0 1
6 6 . . . 1 1 0
7 7 . . . 1 1 1
8 8 . . 1 0 0 0
9 9 . . 1 0 0 1
A 10 . . 1 0 1 0
B 11 . . 1 0 1 1
C 12 . . 1 1 0 0
D 13 . . 1 1 0 1
E 14 . . 1 1 1 0
F 15 . . 1 1 1 1
G 16 . 1 0 0 0 0
H 17 . 1 0 0 0 1
I 18 . 1 0 0 1 0
J 19 . 1 0 0 1 1
K 20 . 1 0 1 0 0
L 21 . 1 0 1 0 1
M 22 . 1 0 1 1 0
N 23 . 1 0 1 1 1
O 24 . 1 1 0 0 0
P 25 . 1 1 0 0 1
Q 26 . 1 1 0 1 0
R 27 . 1 1 0 1 1
S 28 . 1 1 1 0 0
T 29 . 1 1 1 0 1
U 30 . 1 1 1 1 0
V 31 . 1 1 1 1 1
W 32 1 0 0 0 0 0
X 33 1 0 0 0 0 1
Y 34 1 0 0 0 1 0
Z 35 1 0 0 0 1 1

Notice that bit 6 is present in only 4 characters, W through Z, and absent in all others. Thus, if we randomly picked a character from the set of 36 possible characters, the probability of finding bit 6 in our pick is 4 ÷ 36, or 0.1111111111111111, which approximately denotes a 11% chance of success.

In contrast, notice that bit 1 is present in every character. Thus, if we randomly picked a character from the set of 36 possible characters, the probability of finding bit 1 in our pick is 36 ÷ 36, or 1, which denotes a 100% chance of success.

These facts are recorded in the table below.

Bit Number of characters present in Probability of presence
6 4 0.1111111111111111
5 20 0.5555555555555556
4 28 0.7777777777777778
3 32 0.8888888888888888
2 34 0.9444444444444444
1 36 1

The timestamp, with its 86400 possible values, cannot represent all 1679616 values possible in the string unless we reuse some of them. That is, instead of mapping each unique string value to exactly one unique timestamp value, we map multiple unique string values to the same unique timestamp value, thereby reusing it.

This reuse happens automatically when we discard information. For example, if we discard the fourth character from two different strings “ABCD” and “ABCX”, then they would both become “ABC”. Next, if the string “ABC” were mapped to the timestamp “01:02:03”, then both strings “ABCD” and “ABCX” would map to the same “01:02:03” timestamp, thereby achieving reuse.

We can discard information strategically based on the probabilities listed in the table above. In particular, we can discard bits with lower presence probabilities with less risk of information loss because discarding nonexistent bits from a value does not change it. For example, bit 6 is present 11% of the time, so we only risk information loss 11% of the time by discarding it. In contrast, bit 1 is present 100% of the time, so we risk information loss 100% of the time by discarding it.

In binary, the string occupies 24 bits (4 characters × 6 bits per character) and the timestamp occupies at least 16 bits (216 < 86400 possible values < 217). Therefore, we need to discard at least 8 bits (24 string bits - 16 timestamp bits) of information from the string to fit it within the timestamp’s 16 bits. This translates to 2 bits (8 bits discarded ÷ 4 characters) discarded per character.

Since lower bits are more likely to be present than higher ones, as the table above shows, we should keep as many lower bits as possible and discard as few higher bits as necessary. Thus, we shall discard the upper 2 bits (bit 6 and 5) from each character in the string to fit the string within the timestamp.

# Solution

In the Ruby programming language:

``````
# Converts the given string into a "HH:MM:SS" timestamp.
def solution(string)
digits = digits36(string)
num_seconds = compress(*digits)
Time.at(num_seconds).utc.strftime('%H:%M:%S')
end

# Converts each character in the given string into a base 36 digit.
def digits36(string)
string.each_char.map { |char| digit36(char) }
end

# Converts the given character into a base 36 digit:
# decimal digits (0..9) + uppercase letters (10..35)
def digit36(char)
case char
when '0'..'9' then char.ord - '0'.ord
when 'A'..'Z' then char.ord - 'A'.ord + 10
else raise ArgumentError, "not a base36 digit: #{char.inspect}"
end
end

# Combines the given base 36 digits together into a 16-bit integer.
SECONDS_PER_DAY = 24 * 60 * 60 # hours * minutes/hour * seconds/minute
SECONDS_PER_DAY_BITS = Math.log2(SECONDS_PER_DAY).ceil # 2 ** exponent
def compress(*chars)
return 0 if chars.empty?
bits = SECONDS_PER_DAY_BITS / chars.length
mask = (1 << bits) - 1
chars.inject(0) do |acc, char|
# grab lower bits from each character and place them side by side
(acc << bits) | (char & mask)
end % SECONDS_PER_DAY
end

``````

Save the above code to a `solution.rb` file to play with it:

``````
\$ irb --simple-prompt -r ./solution.rb
## ruby 2.1.2p95 (2014-05-08) [x86_64-linux-gnu]

>> solution "ABCD"
"12:13:01"

>> digits36 "ABCD"
[10, 11, 12, 13]

>> compress 10, 11, 12, 13
43981

>> Time.at(43981).utc
1970-01-01 12:13:01 UTC

``````

The solution above is generic to any number of characters, so try more:

``````
\$ irb --simple-prompt -r ./solution.rb
## ruby 2.1.2p95 (2014-05-08) [x86_64-linux-gnu]

>> solution "ABCDEFG"
"03:09:32"

>> digits36 "ABCDEFG"
[10, 11, 12, 13, 14, 15, 16]

>> compress 10, 11, 12, 13, 14, 15, 16
11372

>> Time.at(11372).utc
1970-01-01 03:09:32 UTC

``````

# Evaluation

To evaluate how well the algorithm performs, we feed it different sets of input strings and analyze its output: the more input strings it maps to unique timestamps, the smaller the reuse probability and the better it is.

Algorithm (f : string → timestamp) characteristics Best case Worst case No case
Number of strings (domain) 65536 160000 1679616
Number of timestamps (codomain) 65536 65536 65536
Number of unique timestamps 65536 20736 0
Number of timestamps reused 0 44800 65536
Probability of timestamp reuse 0 0.68359375 1
Number of strings mapped to reused timestamps 0 139264 1679616
Probability of string mapped to reused timestamp 0 0.8704 1

In the best case, where all input characters fed into the algorithm lack the upper bits it discards, information is never lost. Therefore, the probability of timestamp reuse is zero.

In the worst case, where all input characters fed into the algorithm contain the upper bits it discards, some information is always lost. Therefore, the probability of timestamp reuse is greater than zero.

In the “no” case, where some input characters fed into the algorithm contain the upper bits it discards, information is sometimes lost. However, since this input set contains all 1679616 possible input strings, timestamps will always be reused: the timestamp, with its 86400 possible values, cannot represent all 1679616 values possible in the string unless we reuse some of them.

This observation applies to any such algorithm.

# Comparison

My solution to this problem is based on probabilities. However, there are many other solutions, some of which are known as hash functions. Let’s compare them, along with my solution, to see how well they perform.

To compare these algorithms fairly, we need to feed them the same set of strings. Unlike the singular evaluation of my solution above, here we don’t have the luxury of partitioning the input space into best and worst cases up front, because such cases vary wildly across these algorithms.

Here, I have chosen to feed all 4-subsets of the 36 possible uppercase alphanumeric characters, which produces 58905 unique strings containing 4 unique uppercase alphanumeric characters each, into these algorithms. Although this choice poorly approximates reality, wherein strings may contain repeated characters, some such sacrifice is necessary to produce a non-exhaustive set of inputs with which to compare these algorithms.

The results are recorded in the table below, sorted by reuse probabilities in ascending order: from the best result first to the worst result last.

The key result to notice here is that my solution had the best performance among all of these algorithms. I was delighted to see this because I expected it fare far worse than real algorithms. :-)

One surprising result is that the ∅ solution, which simply discards the fourth character and doesn’t bother with any of this algorithmic nonsense, performs better than half of these algorithms. Although its performance isn’t praiseworthy, it nevertheless hints that perhaps, sometimes, ignoring a problem can be just as effective as solving it.

Augh! What am I saying? Those are not the words of a true engineer! Problems must always be solved and solved well, so long as it’s fun. ;-) Anyway, that’s all folks. I hope you enjoyed this algorithmic treatise. But wait! Don’t leave just yet because there are some fun behind the scenes goodies waiting for you below! :-)

Algorithm (f : string → timestamp) Number of strings (domain) Number of timestamps (codomain) Number of unique timestamps Number of timestamps reused Probability of timestamp reuse Number of strings mapped to reused timestamps Probability of string mapped to reused timestamp
My solution 58905 39531 34690 4841 0.12246085350737396 24215 0.411085646379764
Preserve bits 4,3,2,1 58905 39531 34690 4841 0.12246085350737396 24215 0.411085646379764
Rotating hash 58905 42936 34649 8287 0.193008198248556 24256 0.4117816823699177
Bernstein XOR hash 58905 46718 35673 11045 0.23641851106639838 23232 0.39439775910364144
Shift-Add-XOR hash 58905 44623 32440 12183 0.2730206395804854 26465 0.4492827434003905
Preserve bits 4,3,2,1 then add bits 6,5 58905 40651 29341 11310 0.2782219379597058 29564 0.5018928783634666
Bernstein hash 58905 43281 28707 14574 0.33672974284327994 30198 0.5126559714795009
djb2 hash 58905 43281 28707 14574 0.33672974284327994 30198 0.5126559714795009
Pearson hash 58905 38838 23805 15033 0.3870693650548432 35100 0.5958747135217723
FNV-1a hash 58905 36225 21085 15140 0.4179434092477571 37820 0.6420507596978186
FNV-1 hash 58905 35338 20229 15109 0.42755673778934855 38676 0.6565826330532213
Preserve bits 4,3,2,1 then XOR bits 6,5 58905 29003 15230 13773 0.47488190876805847 43675 0.7414480943892708
Preserve bits 5,3,2,1 58905 21243 6706 14537 0.6843195405545356 52199 0.8861556743909685
Exim nhash 58905 7401 1280 6121 0.8270503985947845 57625 0.9782700959171547
Merge fourth character into others 58905 9467 1477 7990 0.8439843667476498 57428 0.9749257278669043
Fletcher16 checksum 58905 4935 621 4314 0.8741641337386018 58284 0.9894576012223071
Preserve bits 5,4,2,1 58905 12931 1610 11321 0.8754930013146702 57295 0.9726678550207962
sdbm hash 58905 3343 331 3012 0.9009871373018247 58574 0.9943807826160768
Discard fourth character (∅ solution) 58905 6545 561 5984 0.9142857142857143 58344 0.9904761904761905
Preserve bits 6,3,2,1 58905 6561 331 6230 0.9495503734186862 58574 0.9943807826160768
Preserve bits 5,4,3,1 58905 7191 327 6864 0.9545264914476429 58578 0.9944486885663356
BSD checksum 58905 464 21 443 0.9547413793103449 58884 0.9996434937611408
LRC checksum 58905 129 4 125 0.9689922480620154 58901 0.9999320940497411
lose lose hash 58905 129 4 125 0.9689922480620154 58901 0.9999320940497411
Preserve bits 6,4,2,1 58905 6145 171 5974 0.9721724979658258 58734 0.9970970206264324
Preserve bits 6,5,2,1 58905 2625 71 2554 0.9729523809523809 58834 0.9987946693829047
Preserve bits 5,4,3,2 58905 5655 150 5505 0.9734748010610079 58755 0.9974535268652915
Preserve bits 6,5,3,1 58905 1953 39 1914 0.9800307219662059 58866 0.9993379169849758
Preserve bits 6,5,4,1 58905 1073 21 1052 0.9804287045666356 58884 0.9996434937611408
Preserve bits 6,5,4,3 58905 495 9 486 0.9818181818181818 58896 0.9998472116119175
Preserve bits 6,4,3,1 58905 4193 75 4118 0.9821130455521107 58830 0.9987267634326458
Jenkins one-at-a-time hash 58905 9582 127 9455 0.9867459820496765 58778 0.9978439860792802
Preserve bits 6,5,4,2 58905 1005 13 992 0.9870646766169154 58892 0.9997793056616586
Preserve bits 6,5,3,2 58905 1749 21 1728 0.9879931389365352 58884 0.9996434937611408
Preserve bits 6,4,3,2 58905 3621 37 3584 0.9897818282242474 58868 0.9993718699601053

# Behind the scenes

I created the giant data table you see above, which compares different algorithms, by first running the comparison script and then formatting its results into a nice data table:

``````\$ ruby -r ./comparison.rb -r pp -e 'pp Comparator.compare Input.combinations' > comparison.txt

\$ vim comparison.txt # do some post-processing to format the results into a nice data table
``````

Similarly, I created the smaller data table you saw above, which compares the best, worst, and “no” case performance of my solution, by running these commands:

``````
# BEST CASE

\$ ruby -r ./comparison.rb -r pp -e 'pp Comparator.compare Input.best_case, /^preserve_lower_bits\$/'
[{"Compression algorithm"=>:preserve_lower_bits,
"Number of strings (domain)"=>65536,
"Number of timestamps (codomain)"=>65536,
"Number of unique timestamps"=>65536,
"Number of timestamps reused"=>0,
"Probability of timestamp reuse"=>0.0,
"Number of strings mapped to reused timestamps"=>0,
"Probability of string mapped to reused timestamp"=>0.0}]

# WORST CASE

\$ ruby -r ./comparison.rb -r pp -e 'pp Comparator.compare Input.worst_case, /^preserve_lower_bits\$/'
[{"Compression algorithm"=>:preserve_lower_bits,
"Number of strings (domain)"=>160000,
"Number of timestamps (codomain)"=>65536,
"Number of unique timestamps"=>20736,
"Number of timestamps reused"=>44800,
"Probability of timestamp reuse"=>0.68359375,
"Number of strings mapped to reused timestamps"=>139264,
"Probability of string mapped to reused timestamp"=>0.8704}]

# NO CASE

\$ ruby -r ./comparison.rb -r pp -e 'pp Comparator.compare Input.no_case, /^preserve_lower_bits\$/'
[{"Compression algorithm"=>:preserve_lower_bits,
"Number of strings (domain)"=>1679616,
"Number of timestamps (codomain)"=>65536,
"Number of unique timestamps"=>0,
"Number of timestamps reused"=>65536,
"Probability of timestamp reuse"=>1.0,
"Number of strings mapped to reused timestamps"=>1679616,
"Probability of string mapped to reused timestamp"=>1.0}]

``````

You can load the comparison script into an interactive Ruby shell and play with the algorithms and comparisons dynamically:

``````\$ irb --simple-prompt -r ./comparison.rb
## ruby 2.1.2p95 (2014-05-08) [x86_64-linux-gnu]

>> Input.singleton_methods
[:combinations, :permutations, :best_case, :worst_case, :no_case]

>> Input.combinations
#<Enumerator: ...>

>> Input.combinations.take(3)
[["0", "1", "2", "3"], ["0", "1", "2", "4"], ["0", "1", "2", "5"]]

>> Input.permutations.take(3)
[["0", "0", "0", "0"], ["0", "0", "0", "1"], ["0", "0", "0", "2"]]

>> Input.best_case.take(3)
[["0", "0", "0", "0"], ["0", "0", "0", "1"], ["0", "0", "0", "2"]]

>> Input.worst_case.take(3)
[["G", "G", "G", "G"], ["G", "G", "G", "H"], ["G", "G", "G", "I"]]

>> Input.no_case.take(3)
[["0", "0", "0", "0"], ["0", "0", "0", "1"], ["0", "0", "0", "2"]]
``````

You can even define a new algorithm and measure its performance:

``````>> def Compressor.my_new_solution(a, b, c, d)
>>   a + b - c * d
>> end
:my_new_solution

>> Comparator.measure Input.combinations, :my_new_solution
{"Compression algorithm"=>:my_new_solution,
"Number of strings (domain)"=>58905,
"Number of timestamps (codomain)"=>1183,
"Number of unique timestamps"=>7,
"Number of timestamps reused"=>1176,
"Probability of timestamp reuse"=>0.9940828402366864,
"Number of strings mapped to reused timestamps"=>58898,
"Probability of string mapped to reused timestamp"=>0.9998811645870469}
``````

You can also compare it to other algorithms, just as this article did:

``````>> Comparator.compare Input.combinations
# ... too much output, so it's omitted ...
``````

If that’s too much, selectively compare just the algorithms you want:

``````>> Comparator.compare Input.combinations, /my_new|char/
[{"Compression algorithm"=>:merge_fourth_char_into_others,
"Number of strings (domain)"=>58905,
"Number of timestamps (codomain)"=>9936,
"Number of unique timestamps"=>1841,
"Number of timestamps reused"=>8095,
"Probability of timestamp reuse"=>0.8147141706924316,
"Number of strings mapped to reused timestamps"=>57064,
"Probability of string mapped to reused timestamp"=>0.9687462863933453},
"Number of strings (domain)"=>58905,
"Number of timestamps (codomain)"=>6545,
"Number of unique timestamps"=>561,
"Number of timestamps reused"=>5984,
"Probability of timestamp reuse"=>0.9142857142857143,
"Number of strings mapped to reused timestamps"=>58344,
"Probability of string mapped to reused timestamp"=>0.9904761904761905},
{"Compression algorithm"=>:my_new_solution,
"Number of strings (domain)"=>58905,
"Number of timestamps (codomain)"=>1183,
"Number of unique timestamps"=>7,
"Number of timestamps reused"=>1176,
"Probability of timestamp reuse"=>0.9940828402366864,
"Number of strings mapped to reused timestamps"=>58898,
"Probability of string mapped to reused timestamp"=>0.9998811645870469}]
``````

## comparison.rb

In the Ruby programming language:

``````
# Combines the given base 36 digits together into a 16-bit integer.
module Compressor
extend self

#              _ _             _       _   _
#  _ __  _   _| | |  ___  ___ | |_   _| |_(_) ___  _ __
# | '_ \| | | | | | / __|/ _ \| | | | | __| |/ _ \| '_ \
# | | | | |_| | | | \__ \ (_) | | |_| | |_| | (_) | | | |
# |_| |_|\__,_|_|_| |___/\___/|_|\__,_|\__|_|\___/|_| |_|
#

# Why bother with all of this? Just discard the fourth character.
(a * (36 ** 2)) + (b * 36) + c
end

#                              _       _   _
#  _ __ ___  _   _   ___  ___ | |_   _| |_(_) ___  _ __  ___
# | '_ ` _ \| | | | / __|/ _ \| | | | | __| |/ _ \| '_ \/ __|
# | | | | | | |_| | \__ \ (_) | | |_| | |_| | (_) | | | \__ \
# |_| |_| |_|\__, | |___/\___/|_|\__,_|\__|_|\___/|_| |_|___/
#            |___/

def merge_fourth_char_into_others(a, b, c, d)
c += (d & 0b11)
b += (d & 0b1100) >> 2
a += (d & 0b110000) >> 4
(a * (36 ** 2)) + (b * 36) + c
end

def preserve_lower_bits(a, b, c, d)
upper_byte = ((a & 0xF) << 4) | (b & 0xF)
lower_byte = ((c & 0xF) << 4) | (d & 0xF)
(upper_byte << 8) | lower_byte
end

# Generate preserve_bits_WXYZ() functions that try all combinations.
preserve_bits_helper = lambda do |a, b, c, d, bits|
fetch_bits = lambda do |char|
bits.each_with_index.map do |bit, index|
char[bit] << index
end.reduce(:|).to_i
end
upper_byte = (fetch_bits.(a) << 4) | fetch_bits.(b)
lower_byte = (fetch_bits.(c) << 4) | fetch_bits.(d)
((upper_byte << 8) | lower_byte)
end
(0..5).to_a.combination(4).each do |bits|
define_method "preserve_bits_#{bits.sort.reverse.map(&:next).join}" do
|a, b, c, d| preserve_bits_helper.(a, b, c, d, bits)
end
end

lower_bits = lambda {|char| (char & 0b001111) }
upper_bits = lambda {|char| (char & 0b110000) >> 4 }
upper_byte = (lower_bits.(a) << 4) | lower_bits.(c)
lower_byte = (lower_bits.(b) << 4) | lower_bits.(d)
extra_byte = (upper_bits.(a) << 6) | (upper_bits.(c) << 4) |
(upper_bits.(b) << 2) | (upper_bits.(d) << 0)
((upper_byte << 8) | lower_byte) + extra_byte
end

def preserve_lower_bits_then_xor_upper_bits(a, b, c, d)
lower_bits = lambda {|char| (char & 0b001111) }
upper_bits = lambda {|char| (char & 0b110000) >> 4 }
xor_helper = lambda do |char|
((upper_bits.(char) ^ (lower_bits.(char) >> 2)) << 2) |
(lower_bits.(char) & 0b11)
end
upper_byte = (xor_helper.(a) << 4) | xor_helper.(c)
lower_byte = (xor_helper.(b) << 4) | xor_helper.(d)
((upper_byte << 8) | lower_byte)
end

#                 _             _       _   _
#  _ __ ___  __ _| |  ___  ___ | |_   _| |_(_) ___  _ __  ___
# | '__/ _ \/ _` | | / __|/ _ \| | | | | __| |/ _ \| '_ \/ __|
# | | |  __/ (_| | | \__ \ (_) | | |_| | |_| | (_) | | | \__ \
# |_|  \___|\__,_|_| |___/\___/|_|\__,_|\__|_|\___/|_| |_|___/
#

# https://en.wikipedia.org/wiki/Pearson_hashing
PEARSON_TABLE = ( 0 ... 2**16 ).to_a.shuffle
def pearson_hash(a, b, c, d)
hash = 0
[a, b, c, d].each do |char|
index = hash ^ char
hash = PEARSON_TABLE[index]
end
hash
end

# http://eternallyconfuzzled.com/tuts/algorithms/jsw_tut_hashing.aspx
def rotating_hash(a, b, c, d)
hash = 0
[a, b, c, d].each do |char|
hash = (hash << 4) ^ (hash >> 28) ^ char
end
hash
end

# http://eternallyconfuzzled.com/tuts/algorithms/jsw_tut_hashing.aspx
def bernstein_hash(a, b, c, d)
hash = 0
[a, b, c, d].each do |char|
hash = 33 * hash + char
end
hash
end

# http://eternallyconfuzzled.com/tuts/algorithms/jsw_tut_hashing.aspx
def bernstein_xor_hash(a, b, c, d)
hash = 0
[a, b, c, d].each do |char|
hash = 33 * hash ^ char
end
hash
end

# http://eternallyconfuzzled.com/tuts/algorithms/jsw_tut_hashing.aspx
hash = 0
[a, b, c, d].each do |char|
hash ^= (hash << 5) + (hash >> 2) + char
end
hash
end

# http://www.isthe.com/chongo/tech/comp/fnv/#FNV-1
def fnv1_hash(a, b, c, d)
hash = 2166136261
[a, b, c, d].each do |char|
hash = (hash * 16777619) ^ char
end
hash
end

# http://www.isthe.com/chongo/tech/comp/fnv/#FNV-1a
def fnv1a_hash(a, b, c, d)
hash = 2166136261
[a, b, c, d].each do |char|
hash = (hash ^ char) * 16777619
end
hash
end

# https://en.wikipedia.org/wiki/Jenkins_hash_function
def jenkins_one_at_a_time_hash(a, b, c, d)
hash = 0
[a, b, c, d].each do |char|
hash += char
hash += (hash << 10)
hash ^= (hash >> 6)
end
hash += (hash << 3)
hash ^= (hash >> 11)
hash += (hash << 15)
hash
end

# Algorithm::Nhash - Exim nhash algorithm
# http://cpansearch.perl.org/src/MOOLI/Algorithm-Nhash-0.002/lib/Algorithm/Nhash.pm
NHASH_PRIMES = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59,
61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113]
def exim_nhash(a, b, c, d)
sum = 0
i = 0
[a, b, c, d].each do |char|
i += 28
i %= 29
sum += NHASH_PRIMES[i] * char
end
sum
end

# https://en.wikipedia.org/wiki/Fletcher%27s_checksum
def fletcher16_checksum(a, b, c, d)
sum1 = 0
sum2 = 0

[a, b, c, d].each do |char|
sum1 = (sum1 + char) % 255
sum2 = (sum2 + sum1) % 255
end

(sum2 << 8) | sum1
end

# https://en.wikipedia.org/wiki/BSD_checksum
def bsd_checksum(a, b, c, d)
checksum = 0                 # The checksum mod 2^16.
[a, b, c, d].each do |char|
checksum = (checksum >> 1) + ((checksum & 1) << 15)
checksum += char
checksum &= 0xffff         # Keep it within bounds.
end
checksum
end

# https://en.wikipedia.org/wiki/Longitudinal_redundancy_check
def lrc_checksum(a, b, c, d)
lrc = 0
[a, b, c, d].each do |char|
lrc = (lrc + char) & 0xFF
end
lrc = (((lrc ^ 0xFF) + 1) & 0xFF)
end

# http://www.cse.yorku.ca/~oz/hash.html
def djb2_hash(a, b, c, d)
hash = 5381
[a, b, c, d].each do |char|
hash = ((hash << 5) + hash) + char # hash * 33 + c
end
hash
end

# http://www.cse.yorku.ca/~oz/hash.html
def sdbm_hash(a, b, c, d)
hash = 0
[a, b, c, d].each do |char|
hash = char + (hash << 6) + (hash << 16) - hash
end
hash
end

# http://www.cse.yorku.ca/~oz/hash.html
def lose_lose_hash(a, b, c, d)
hash = 0
[a, b, c, d].each do |char|
hash += char
end
hash
end

end

module Solver
extend self

# Converts the given 4-character string into a "HH:MM:SS" timestamp.
def solution(string, compressor)
a, b, c, d  = digits36(string)
num_seconds = Compressor.send(compressor, a, b, c, d)
Time.at(num_seconds).strftime('%H:%M:%S')
end

# Converts each character in the given string into a base 36 digits.
def digits36(string)
string.each_char.map { |char| digit36(char) }
end

# Converts the given character into a base 36 digit:
# decimal digits (0..9) + uppercase letters (10..35)
def digit36(char)
case char
when '0'..'9' then char.ord - '0'.ord
when 'A'..'Z' then char.ord - 'A'.ord + 10
else -1
end
end

# Converts the given base 36 digit into a character.
def undigit36(code)
case code
when  0..9  then ('0'.ord + code).chr
when 10..35 then ('A'.ord + (code - 10)).chr
else '?'
end
end

end

module Input
extend self

DOMAIN = ('0'..'9').to_a + ('A'..'Z').to_a

def combinations
DOMAIN.combination(4)
end

def permutations(domain=DOMAIN)
if block_given?
domain.each do |a|
domain.each do |b|
domain.each do |c|
domain.each do |d|
yield [a, b, c, d]
end
end
end
end
else
enum_for __method__, domain
end
end

def best_case
permutations DOMAIN[0..15]
end

def worst_case
permutations DOMAIN[16..35]
end

alias no_case permutations

end

module Comparator
extend self

# Compares the fitness of all compressors that match the given regexp.
def compare(inputs, compressor_regexp=//)
Compressor.singleton_methods.grep(compressor_regexp).
map { |compressor| measure(inputs, compressor) }.
sort_by { |measurement| measurement['Probability of timestamp reuse'] }
end

# Measures the fitness of the given compressor.
def measure(inputs, compressor)
reuse_by_timestamp = measure_reuse(inputs, compressor)

num_keys = reuse_by_timestamp.count
num_values = reuse_by_timestamp.values.reduce(:+).to_i

actual_reuse = reuse_by_timestamp.select {|k,v| v > 1 }
num_keys_reused = actual_reuse.count
num_values_reused = actual_reuse.values.reduce(:+).to_i

{
'Compression algorithm'                            => compressor,
'Number of strings (domain)'                       => num_values,
'Number of timestamps (codomain)'                  => num_keys,
'Number of unique timestamps'                      => num_keys - num_keys_reused,
'Number of timestamps reused'                      => num_keys_reused,
'Probability of timestamp reuse'                   => num_keys_reused / num_keys.to_f,
'Number of strings mapped to reused timestamps'    => num_values_reused,
'Probability of string mapped to reused timestamp' => num_values_reused / num_values.to_f,
}
end

def measure_reuse(inputs, compressor)
reuse_by_timestamp = Hash.new(0)
inputs.each do |characters|
string = characters.join
timestamp = Solver.solution(string, compressor)
reuse_by_timestamp[timestamp] += 1
end
reuse_by_timestamp
end

end

``````