Encoding four base 36 digits as a timestamp
Problem
Convert a string containing 4 uppercase alphanumeric characters, whose values range from A through Z and 0 through 9, into a “HH:MM:SS” (24 hours, 60 minutes, 60 seconds) timestamp whose value ranges from “00:00:00” through “23:59:59”.
The string has 1679616 possible values (36 × 36 × 36 × 36) whereas the timestamp has 86400 possible values (24 × 60 × 60). Thus the challenge is to convey more information using less.
How would you solve this problem? Pause for a moment to think about it. In fact, go now and implement your solution before reading about my particular approach and solution below, as this is quite a fun puzzle! :)
Approach
Each character in the input string can be represented as a base 36 digit:
- Characters 0 through 9 become integers 0 through 9 respectively.
- Characters A through Z become integers 10 through 35 respectively.
Let’s examine the base 36 and binary representations of each character:
Character | Base 36 | Bit 6 | Bit 5 | Bit 4 | Bit 3 | Bit 2 | Bit 1 |
---|---|---|---|---|---|---|---|
0 | 0 | . | . | . | . | . | 0 |
1 | 1 | . | . | . | . | . | 1 |
2 | 2 | . | . | . | . | 1 | 0 |
3 | 3 | . | . | . | . | 1 | 1 |
4 | 4 | . | . | . | 1 | 0 | 0 |
5 | 5 | . | . | . | 1 | 0 | 1 |
6 | 6 | . | . | . | 1 | 1 | 0 |
7 | 7 | . | . | . | 1 | 1 | 1 |
8 | 8 | . | . | 1 | 0 | 0 | 0 |
9 | 9 | . | . | 1 | 0 | 0 | 1 |
A | 10 | . | . | 1 | 0 | 1 | 0 |
B | 11 | . | . | 1 | 0 | 1 | 1 |
C | 12 | . | . | 1 | 1 | 0 | 0 |
D | 13 | . | . | 1 | 1 | 0 | 1 |
E | 14 | . | . | 1 | 1 | 1 | 0 |
F | 15 | . | . | 1 | 1 | 1 | 1 |
G | 16 | . | 1 | 0 | 0 | 0 | 0 |
H | 17 | . | 1 | 0 | 0 | 0 | 1 |
I | 18 | . | 1 | 0 | 0 | 1 | 0 |
J | 19 | . | 1 | 0 | 0 | 1 | 1 |
K | 20 | . | 1 | 0 | 1 | 0 | 0 |
L | 21 | . | 1 | 0 | 1 | 0 | 1 |
M | 22 | . | 1 | 0 | 1 | 1 | 0 |
N | 23 | . | 1 | 0 | 1 | 1 | 1 |
O | 24 | . | 1 | 1 | 0 | 0 | 0 |
P | 25 | . | 1 | 1 | 0 | 0 | 1 |
Q | 26 | . | 1 | 1 | 0 | 1 | 0 |
R | 27 | . | 1 | 1 | 0 | 1 | 1 |
S | 28 | . | 1 | 1 | 1 | 0 | 0 |
T | 29 | . | 1 | 1 | 1 | 0 | 1 |
U | 30 | . | 1 | 1 | 1 | 1 | 0 |
V | 31 | . | 1 | 1 | 1 | 1 | 1 |
W | 32 | 1 | 0 | 0 | 0 | 0 | 0 |
X | 33 | 1 | 0 | 0 | 0 | 0 | 1 |
Y | 34 | 1 | 0 | 0 | 0 | 1 | 0 |
Z | 35 | 1 | 0 | 0 | 0 | 1 | 1 |
Notice that bit 6 is present in only 4 characters, W through Z, and absent in all others. Thus, if we randomly picked a character from the set of 36 possible characters, the probability of finding bit 6 in our pick is 4 ÷ 36, or 0.1111111111111111, which approximately denotes a 11% chance of success.
In contrast, notice that bit 1 is present in every character. Thus, if we randomly picked a character from the set of 36 possible characters, the probability of finding bit 1 in our pick is 36 ÷ 36, or 1, which denotes a 100% chance of success.
These facts are recorded in the table below.
Bit | Number of characters present in | Probability of presence |
---|---|---|
6 | 4 | 0.1111111111111111 |
5 | 20 | 0.5555555555555556 |
4 | 28 | 0.7777777777777778 |
3 | 32 | 0.8888888888888888 |
2 | 34 | 0.9444444444444444 |
1 | 36 | 1 |
The timestamp, with its 86400 possible values, cannot represent all 1679616 values possible in the string unless we reuse some of them. That is, instead of mapping each unique string value to exactly one unique timestamp value, we map multiple unique string values to the same unique timestamp value, thereby reusing it.
This reuse happens automatically when we discard information. For example, if we discard the fourth character from two different strings “ABCD” and “ABCX”, then they would both become “ABC”. Next, if the string “ABC” were mapped to the timestamp “01:02:03”, then both strings “ABCD” and “ABCX” would map to the same “01:02:03” timestamp, thereby achieving reuse.
We can discard information strategically based on the probabilities listed in the table above. In particular, we can discard bits with lower presence probabilities with less risk of information loss because discarding nonexistent bits from a value does not change it. For example, bit 6 is present 11% of the time, so we only risk information loss 11% of the time by discarding it. In contrast, bit 1 is present 100% of the time, so we risk information loss 100% of the time by discarding it.
In binary, the string occupies 24 bits (4 characters × 6 bits per character) and the timestamp occupies at least 16 bits (216 < 86400 possible values < 217). Therefore, we need to discard at least 8 bits (24 string bits - 16 timestamp bits) of information from the string to fit it within the timestamp’s 16 bits. This translates to 2 bits (8 bits discarded ÷ 4 characters) discarded per character.
Since lower bits are more likely to be present than higher ones, as the table above shows, we should keep as many lower bits as possible and discard as few higher bits as necessary. Thus, we shall discard the upper 2 bits (bit 6 and 5) from each character in the string to fit the string within the timestamp.
Solution
In the Ruby programming language:
# Converts the given string into a "HH:MM:SS" timestamp.
def solution(string)
digits = digits36(string)
num_seconds = compress(*digits)
Time.at(num_seconds).utc.strftime('%H:%M:%S')
end
# Converts each character in the given string into a base 36 digit.
def digits36(string)
string.each_char.map { |char| digit36(char) }
end
# Converts the given character into a base 36 digit:
# decimal digits (0..9) + uppercase letters (10..35)
def digit36(char)
case char
when '0'..'9' then char.ord - '0'.ord
when 'A'..'Z' then char.ord - 'A'.ord + 10
else raise ArgumentError, "not a base36 digit: #{char.inspect}"
end
end
# Combines the given base 36 digits together into a 16-bit integer.
SECONDS_PER_DAY = 24 * 60 * 60 # hours * minutes/hour * seconds/minute
SECONDS_PER_DAY_BITS = Math.log2(SECONDS_PER_DAY).ceil # 2 ** exponent
def compress(*chars)
return 0 if chars.empty?
bits = SECONDS_PER_DAY_BITS / chars.length
mask = (1 << bits) - 1
chars.inject(0) do |acc, char|
# grab lower bits from each character and place them side by side
(acc << bits) | (char & mask)
end % SECONDS_PER_DAY
end
Save the above code to a solution.rb
file to play with it:
$ irb --simple-prompt -r ./solution.rb
## ruby 2.1.2p95 (2014-05-08) [x86_64-linux-gnu]
>> solution "ABCD"
"12:13:01"
>> digits36 "ABCD"
[10, 11, 12, 13]
>> compress 10, 11, 12, 13
43981
>> Time.at(43981).utc
1970-01-01 12:13:01 UTC
The solution above is generic to any number of characters, so try more:
$ irb --simple-prompt -r ./solution.rb
## ruby 2.1.2p95 (2014-05-08) [x86_64-linux-gnu]
>> solution "ABCDEFG"
"03:09:32"
>> digits36 "ABCDEFG"
[10, 11, 12, 13, 14, 15, 16]
>> compress 10, 11, 12, 13, 14, 15, 16
11372
>> Time.at(11372).utc
1970-01-01 03:09:32 UTC
Evaluation
To evaluate how well the algorithm performs, we feed it different sets of input strings and analyze its output: the more input strings it maps to unique timestamps, the smaller the reuse probability and the better it is.
Algorithm (f : string → timestamp) characteristics | Best case | Worst case | No case |
---|---|---|---|
Number of strings (domain) | 65536 | 160000 | 1679616 |
Number of timestamps (codomain) | 65536 | 65536 | 65536 |
Number of unique timestamps | 65536 | 20736 | 0 |
Number of timestamps reused | 0 | 44800 | 65536 |
Probability of timestamp reuse | 0 | 0.68359375 | 1 |
Number of strings mapped to reused timestamps | 0 | 139264 | 1679616 |
Probability of string mapped to reused timestamp | 0 | 0.8704 | 1 |
In the best case, where all input characters fed into the algorithm lack the upper bits it discards, information is never lost. Therefore, the probability of timestamp reuse is zero.
In the worst case, where all input characters fed into the algorithm contain the upper bits it discards, some information is always lost. Therefore, the probability of timestamp reuse is greater than zero.
In the “no” case, where some input characters fed into the algorithm contain the upper bits it discards, information is sometimes lost. However, since this input set contains all 1679616 possible input strings, timestamps will always be reused: the timestamp, with its 86400 possible values, cannot represent all 1679616 values possible in the string unless we reuse some of them.
This observation applies to any such algorithm.
Comparison
My solution to this problem is based on probabilities. However, there are many other solutions, some of which are known as hash functions. Let’s compare them, along with my solution, to see how well they perform.
To compare these algorithms fairly, we need to feed them the same set of strings. Unlike the singular evaluation of my solution above, here we don’t have the luxury of partitioning the input space into best and worst cases up front, because such cases vary wildly across these algorithms.
Here, I have chosen to feed all 4-subsets of the 36 possible uppercase alphanumeric characters, which produces 58905 unique strings containing 4 unique uppercase alphanumeric characters each, into these algorithms. Although this choice poorly approximates reality, wherein strings may contain repeated characters, some such sacrifice is necessary to produce a non-exhaustive set of inputs with which to compare these algorithms.
The results are recorded in the table below, sorted by reuse probabilities in ascending order: from the best result first to the worst result last.
The key result to notice here is that my solution had the best performance among all of these algorithms. I was delighted to see this because I expected it fare far worse than real algorithms. :-)
One surprising result is that the ∅ solution, which simply discards the fourth character and doesn’t bother with any of this algorithmic nonsense, performs better than half of these algorithms. Although its performance isn’t praiseworthy, it nevertheless hints that perhaps, sometimes, ignoring a problem can be just as effective as solving it.
Augh! What am I saying? Those are not the words of a true engineer! Problems must always be solved and solved well, so long as it’s fun. ;-) Anyway, that’s all folks. I hope you enjoyed this algorithmic treatise. But wait! Don’t leave just yet because there are some fun behind the scenes goodies waiting for you below! :-)
Algorithm (f : string → timestamp) | Number of strings (domain) | Number of timestamps (codomain) | Number of unique timestamps | Number of timestamps reused | Probability of timestamp reuse | Number of strings mapped to reused timestamps | Probability of string mapped to reused timestamp |
---|---|---|---|---|---|---|---|
My solution | 58905 | 39531 | 34690 | 4841 | 0.12246085350737396 | 24215 | 0.411085646379764 |
Preserve bits 4,3,2,1 | 58905 | 39531 | 34690 | 4841 | 0.12246085350737396 | 24215 | 0.411085646379764 |
Rotating hash | 58905 | 42936 | 34649 | 8287 | 0.193008198248556 | 24256 | 0.4117816823699177 |
Bernstein XOR hash | 58905 | 46718 | 35673 | 11045 | 0.23641851106639838 | 23232 | 0.39439775910364144 |
Shift-Add-XOR hash | 58905 | 44623 | 32440 | 12183 | 0.2730206395804854 | 26465 | 0.4492827434003905 |
Preserve bits 4,3,2,1 then add bits 6,5 | 58905 | 40651 | 29341 | 11310 | 0.2782219379597058 | 29564 | 0.5018928783634666 |
Bernstein hash | 58905 | 43281 | 28707 | 14574 | 0.33672974284327994 | 30198 | 0.5126559714795009 |
djb2 hash | 58905 | 43281 | 28707 | 14574 | 0.33672974284327994 | 30198 | 0.5126559714795009 |
Pearson hash | 58905 | 38838 | 23805 | 15033 | 0.3870693650548432 | 35100 | 0.5958747135217723 |
FNV-1a hash | 58905 | 36225 | 21085 | 15140 | 0.4179434092477571 | 37820 | 0.6420507596978186 |
FNV-1 hash | 58905 | 35338 | 20229 | 15109 | 0.42755673778934855 | 38676 | 0.6565826330532213 |
Preserve bits 4,3,2,1 then XOR bits 6,5 | 58905 | 29003 | 15230 | 13773 | 0.47488190876805847 | 43675 | 0.7414480943892708 |
Preserve bits 5,3,2,1 | 58905 | 21243 | 6706 | 14537 | 0.6843195405545356 | 52199 | 0.8861556743909685 |
Exim nhash | 58905 | 7401 | 1280 | 6121 | 0.8270503985947845 | 57625 | 0.9782700959171547 |
Merge fourth character into others | 58905 | 9467 | 1477 | 7990 | 0.8439843667476498 | 57428 | 0.9749257278669043 |
Fletcher16 checksum | 58905 | 4935 | 621 | 4314 | 0.8741641337386018 | 58284 | 0.9894576012223071 |
Preserve bits 5,4,2,1 | 58905 | 12931 | 1610 | 11321 | 0.8754930013146702 | 57295 | 0.9726678550207962 |
sdbm hash | 58905 | 3343 | 331 | 3012 | 0.9009871373018247 | 58574 | 0.9943807826160768 |
Discard fourth character (∅ solution) | 58905 | 6545 | 561 | 5984 | 0.9142857142857143 | 58344 | 0.9904761904761905 |
Preserve bits 6,3,2,1 | 58905 | 6561 | 331 | 6230 | 0.9495503734186862 | 58574 | 0.9943807826160768 |
Preserve bits 5,4,3,1 | 58905 | 7191 | 327 | 6864 | 0.9545264914476429 | 58578 | 0.9944486885663356 |
BSD checksum | 58905 | 464 | 21 | 443 | 0.9547413793103449 | 58884 | 0.9996434937611408 |
LRC checksum | 58905 | 129 | 4 | 125 | 0.9689922480620154 | 58901 | 0.9999320940497411 |
lose lose hash | 58905 | 129 | 4 | 125 | 0.9689922480620154 | 58901 | 0.9999320940497411 |
Preserve bits 6,4,2,1 | 58905 | 6145 | 171 | 5974 | 0.9721724979658258 | 58734 | 0.9970970206264324 |
Preserve bits 6,5,2,1 | 58905 | 2625 | 71 | 2554 | 0.9729523809523809 | 58834 | 0.9987946693829047 |
Preserve bits 5,4,3,2 | 58905 | 5655 | 150 | 5505 | 0.9734748010610079 | 58755 | 0.9974535268652915 |
Preserve bits 6,5,3,1 | 58905 | 1953 | 39 | 1914 | 0.9800307219662059 | 58866 | 0.9993379169849758 |
Preserve bits 6,5,4,1 | 58905 | 1073 | 21 | 1052 | 0.9804287045666356 | 58884 | 0.9996434937611408 |
Preserve bits 6,5,4,3 | 58905 | 495 | 9 | 486 | 0.9818181818181818 | 58896 | 0.9998472116119175 |
Preserve bits 6,4,3,1 | 58905 | 4193 | 75 | 4118 | 0.9821130455521107 | 58830 | 0.9987267634326458 |
Jenkins one-at-a-time hash | 58905 | 9582 | 127 | 9455 | 0.9867459820496765 | 58778 | 0.9978439860792802 |
Preserve bits 6,5,4,2 | 58905 | 1005 | 13 | 992 | 0.9870646766169154 | 58892 | 0.9997793056616586 |
Preserve bits 6,5,3,2 | 58905 | 1749 | 21 | 1728 | 0.9879931389365352 | 58884 | 0.9996434937611408 |
Preserve bits 6,4,3,2 | 58905 | 3621 | 37 | 3584 | 0.9897818282242474 | 58868 | 0.9993718699601053 |
Behind the scenes
I created the giant data table you see above, which compares different algorithms, by first running the comparison script and then formatting its results into a nice data table:
$ ruby -r ./comparison.rb -r pp -e 'pp Comparator.compare Input.combinations' > comparison.txt
$ vim comparison.txt # do some post-processing to format the results into a nice data table
Similarly, I created the smaller data table you saw above, which compares the best, worst, and “no” case performance of my solution, by running these commands:
# BEST CASE
$ ruby -r ./comparison.rb -r pp -e 'pp Comparator.compare Input.best_case, /^preserve_lower_bits$/'
[{"Compression algorithm"=>:preserve_lower_bits,
"Number of strings (domain)"=>65536,
"Number of timestamps (codomain)"=>65536,
"Number of unique timestamps"=>65536,
"Number of timestamps reused"=>0,
"Probability of timestamp reuse"=>0.0,
"Number of strings mapped to reused timestamps"=>0,
"Probability of string mapped to reused timestamp"=>0.0}]
# WORST CASE
$ ruby -r ./comparison.rb -r pp -e 'pp Comparator.compare Input.worst_case, /^preserve_lower_bits$/'
[{"Compression algorithm"=>:preserve_lower_bits,
"Number of strings (domain)"=>160000,
"Number of timestamps (codomain)"=>65536,
"Number of unique timestamps"=>20736,
"Number of timestamps reused"=>44800,
"Probability of timestamp reuse"=>0.68359375,
"Number of strings mapped to reused timestamps"=>139264,
"Probability of string mapped to reused timestamp"=>0.8704}]
# NO CASE
$ ruby -r ./comparison.rb -r pp -e 'pp Comparator.compare Input.no_case, /^preserve_lower_bits$/'
[{"Compression algorithm"=>:preserve_lower_bits,
"Number of strings (domain)"=>1679616,
"Number of timestamps (codomain)"=>65536,
"Number of unique timestamps"=>0,
"Number of timestamps reused"=>65536,
"Probability of timestamp reuse"=>1.0,
"Number of strings mapped to reused timestamps"=>1679616,
"Probability of string mapped to reused timestamp"=>1.0}]
You can load the comparison script into an interactive Ruby shell and play with the algorithms and comparisons dynamically:
$ irb --simple-prompt -r ./comparison.rb
## ruby 2.1.2p95 (2014-05-08) [x86_64-linux-gnu]
>> Input.singleton_methods
[:combinations, :permutations, :best_case, :worst_case, :no_case]
>> Input.combinations
#<Enumerator: ...>
>> Input.combinations.take(3)
[["0", "1", "2", "3"], ["0", "1", "2", "4"], ["0", "1", "2", "5"]]
>> Input.permutations.take(3)
[["0", "0", "0", "0"], ["0", "0", "0", "1"], ["0", "0", "0", "2"]]
>> Input.best_case.take(3)
[["0", "0", "0", "0"], ["0", "0", "0", "1"], ["0", "0", "0", "2"]]
>> Input.worst_case.take(3)
[["G", "G", "G", "G"], ["G", "G", "G", "H"], ["G", "G", "G", "I"]]
>> Input.no_case.take(3)
[["0", "0", "0", "0"], ["0", "0", "0", "1"], ["0", "0", "0", "2"]]
You can even define a new algorithm and measure its performance:
>> def Compressor.my_new_solution(a, b, c, d)
>> a + b - c * d
>> end
:my_new_solution
>> Comparator.measure Input.combinations, :my_new_solution
{"Compression algorithm"=>:my_new_solution,
"Number of strings (domain)"=>58905,
"Number of timestamps (codomain)"=>1183,
"Number of unique timestamps"=>7,
"Number of timestamps reused"=>1176,
"Probability of timestamp reuse"=>0.9940828402366864,
"Number of strings mapped to reused timestamps"=>58898,
"Probability of string mapped to reused timestamp"=>0.9998811645870469}
You can also compare it to other algorithms, just as this article did:
>> Comparator.compare Input.combinations
# ... too much output, so it's omitted ...
If that’s too much, selectively compare just the algorithms you want:
>> Comparator.compare Input.combinations, /my_new|char/
[{"Compression algorithm"=>:merge_fourth_char_into_others,
"Number of strings (domain)"=>58905,
"Number of timestamps (codomain)"=>9936,
"Number of unique timestamps"=>1841,
"Number of timestamps reused"=>8095,
"Probability of timestamp reuse"=>0.8147141706924316,
"Number of strings mapped to reused timestamps"=>57064,
"Probability of string mapped to reused timestamp"=>0.9687462863933453},
{"Compression algorithm"=>:discard_fourth_char,
"Number of strings (domain)"=>58905,
"Number of timestamps (codomain)"=>6545,
"Number of unique timestamps"=>561,
"Number of timestamps reused"=>5984,
"Probability of timestamp reuse"=>0.9142857142857143,
"Number of strings mapped to reused timestamps"=>58344,
"Probability of string mapped to reused timestamp"=>0.9904761904761905},
{"Compression algorithm"=>:my_new_solution,
"Number of strings (domain)"=>58905,
"Number of timestamps (codomain)"=>1183,
"Number of unique timestamps"=>7,
"Number of timestamps reused"=>1176,
"Probability of timestamp reuse"=>0.9940828402366864,
"Number of strings mapped to reused timestamps"=>58898,
"Probability of string mapped to reused timestamp"=>0.9998811645870469}]
comparison.rb
In the Ruby programming language:
# Combines the given base 36 digits together into a 16-bit integer.
module Compressor
extend self
# _ _ _ _ _
# _ __ _ _| | | ___ ___ | |_ _| |_(_) ___ _ __
# | '_ \| | | | | | / __|/ _ \| | | | | __| |/ _ \| '_ \
# | | | | |_| | | | \__ \ (_) | | |_| | |_| | (_) | | | |
# |_| |_|\__,_|_|_| |___/\___/|_|\__,_|\__|_|\___/|_| |_|
#
# Why bother with all of this? Just discard the fourth character.
def discard_fourth_char(a, b, c, d)
(a * (36 ** 2)) + (b * 36) + c
end
# _ _ _
# _ __ ___ _ _ ___ ___ | |_ _| |_(_) ___ _ __ ___
# | '_ ` _ \| | | | / __|/ _ \| | | | | __| |/ _ \| '_ \/ __|
# | | | | | | |_| | \__ \ (_) | | |_| | |_| | (_) | | | \__ \
# |_| |_| |_|\__, | |___/\___/|_|\__,_|\__|_|\___/|_| |_|___/
# |___/
def merge_fourth_char_into_others(a, b, c, d)
c += (d & 0b11)
b += (d & 0b1100) >> 2
a += (d & 0b110000) >> 4
(a * (36 ** 2)) + (b * 36) + c
end
# This is the solution described in this article. (((THE WINNER!)))
def preserve_lower_bits(a, b, c, d)
upper_byte = ((a & 0xF) << 4) | (b & 0xF)
lower_byte = ((c & 0xF) << 4) | (d & 0xF)
(upper_byte << 8) | lower_byte
end
# Generate preserve_bits_WXYZ() functions that try all combinations.
preserve_bits_helper = lambda do |a, b, c, d, bits|
fetch_bits = lambda do |char|
bits.each_with_index.map do |bit, index|
char[bit] << index
end.reduce(:|).to_i
end
upper_byte = (fetch_bits.(a) << 4) | fetch_bits.(b)
lower_byte = (fetch_bits.(c) << 4) | fetch_bits.(d)
((upper_byte << 8) | lower_byte)
end
(0..5).to_a.combination(4).each do |bits|
define_method "preserve_bits_#{bits.sort.reverse.map(&:next).join}" do
|a, b, c, d| preserve_bits_helper.(a, b, c, d, bits)
end
end
def preserve_lower_bits_then_add_upper_bits(a, b, c, d)
lower_bits = lambda {|char| (char & 0b001111) }
upper_bits = lambda {|char| (char & 0b110000) >> 4 }
upper_byte = (lower_bits.(a) << 4) | lower_bits.(c)
lower_byte = (lower_bits.(b) << 4) | lower_bits.(d)
extra_byte = (upper_bits.(a) << 6) | (upper_bits.(c) << 4) |
(upper_bits.(b) << 2) | (upper_bits.(d) << 0)
((upper_byte << 8) | lower_byte) + extra_byte
end
def preserve_lower_bits_then_xor_upper_bits(a, b, c, d)
lower_bits = lambda {|char| (char & 0b001111) }
upper_bits = lambda {|char| (char & 0b110000) >> 4 }
xor_helper = lambda do |char|
((upper_bits.(char) ^ (lower_bits.(char) >> 2)) << 2) |
(lower_bits.(char) & 0b11)
end
upper_byte = (xor_helper.(a) << 4) | xor_helper.(c)
lower_byte = (xor_helper.(b) << 4) | xor_helper.(d)
((upper_byte << 8) | lower_byte)
end
# _ _ _ _
# _ __ ___ __ _| | ___ ___ | |_ _| |_(_) ___ _ __ ___
# | '__/ _ \/ _` | | / __|/ _ \| | | | | __| |/ _ \| '_ \/ __|
# | | | __/ (_| | | \__ \ (_) | | |_| | |_| | (_) | | | \__ \
# |_| \___|\__,_|_| |___/\___/|_|\__,_|\__|_|\___/|_| |_|___/
#
# https://en.wikipedia.org/wiki/Pearson_hashing
PEARSON_TABLE = ( 0 ... 2**16 ).to_a.shuffle
def pearson_hash(a, b, c, d)
hash = 0
[a, b, c, d].each do |char|
index = hash ^ char
hash = PEARSON_TABLE[index]
end
hash
end
# http://eternallyconfuzzled.com/tuts/algorithms/jsw_tut_hashing.aspx
def rotating_hash(a, b, c, d)
hash = 0
[a, b, c, d].each do |char|
hash = (hash << 4) ^ (hash >> 28) ^ char
end
hash
end
# http://eternallyconfuzzled.com/tuts/algorithms/jsw_tut_hashing.aspx
def bernstein_hash(a, b, c, d)
hash = 0
[a, b, c, d].each do |char|
hash = 33 * hash + char
end
hash
end
# http://eternallyconfuzzled.com/tuts/algorithms/jsw_tut_hashing.aspx
def bernstein_xor_hash(a, b, c, d)
hash = 0
[a, b, c, d].each do |char|
hash = 33 * hash ^ char
end
hash
end
# http://eternallyconfuzzled.com/tuts/algorithms/jsw_tut_hashing.aspx
def shift_add_xor_hash(a, b, c, d)
hash = 0
[a, b, c, d].each do |char|
hash ^= (hash << 5) + (hash >> 2) + char
end
hash
end
# http://www.isthe.com/chongo/tech/comp/fnv/#FNV-1
def fnv1_hash(a, b, c, d)
hash = 2166136261
[a, b, c, d].each do |char|
hash = (hash * 16777619) ^ char
end
hash
end
# http://www.isthe.com/chongo/tech/comp/fnv/#FNV-1a
def fnv1a_hash(a, b, c, d)
hash = 2166136261
[a, b, c, d].each do |char|
hash = (hash ^ char) * 16777619
end
hash
end
# https://en.wikipedia.org/wiki/Jenkins_hash_function
def jenkins_one_at_a_time_hash(a, b, c, d)
hash = 0
[a, b, c, d].each do |char|
hash += char
hash += (hash << 10)
hash ^= (hash >> 6)
end
hash += (hash << 3)
hash ^= (hash >> 11)
hash += (hash << 15)
hash
end
# Algorithm::Nhash - Exim nhash algorithm
# http://cpansearch.perl.org/src/MOOLI/Algorithm-Nhash-0.002/lib/Algorithm/Nhash.pm
NHASH_PRIMES = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59,
61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113]
def exim_nhash(a, b, c, d)
sum = 0
i = 0
[a, b, c, d].each do |char|
i += 28
i %= 29
sum += NHASH_PRIMES[i] * char
end
sum
end
# https://en.wikipedia.org/wiki/Fletcher%27s_checksum
def fletcher16_checksum(a, b, c, d)
sum1 = 0
sum2 = 0
[a, b, c, d].each do |char|
sum1 = (sum1 + char) % 255
sum2 = (sum2 + sum1) % 255
end
(sum2 << 8) | sum1
end
# https://en.wikipedia.org/wiki/BSD_checksum
def bsd_checksum(a, b, c, d)
checksum = 0 # The checksum mod 2^16.
[a, b, c, d].each do |char|
checksum = (checksum >> 1) + ((checksum & 1) << 15)
checksum += char
checksum &= 0xffff # Keep it within bounds.
end
checksum
end
# https://en.wikipedia.org/wiki/Longitudinal_redundancy_check
def lrc_checksum(a, b, c, d)
lrc = 0
[a, b, c, d].each do |char|
lrc = (lrc + char) & 0xFF
end
lrc = (((lrc ^ 0xFF) + 1) & 0xFF)
end
# http://www.cse.yorku.ca/~oz/hash.html
def djb2_hash(a, b, c, d)
hash = 5381
[a, b, c, d].each do |char|
hash = ((hash << 5) + hash) + char # hash * 33 + c
end
hash
end
# http://www.cse.yorku.ca/~oz/hash.html
def sdbm_hash(a, b, c, d)
hash = 0
[a, b, c, d].each do |char|
hash = char + (hash << 6) + (hash << 16) - hash
end
hash
end
# http://www.cse.yorku.ca/~oz/hash.html
def lose_lose_hash(a, b, c, d)
hash = 0
[a, b, c, d].each do |char|
hash += char
end
hash
end
end
module Solver
extend self
# Converts the given 4-character string into a "HH:MM:SS" timestamp.
def solution(string, compressor)
a, b, c, d = digits36(string)
num_seconds = Compressor.send(compressor, a, b, c, d)
Time.at(num_seconds).strftime('%H:%M:%S')
end
# Converts each character in the given string into a base 36 digits.
def digits36(string)
string.each_char.map { |char| digit36(char) }
end
# Converts the given character into a base 36 digit:
# decimal digits (0..9) + uppercase letters (10..35)
def digit36(char)
case char
when '0'..'9' then char.ord - '0'.ord
when 'A'..'Z' then char.ord - 'A'.ord + 10
else -1
end
end
# Converts the given base 36 digit into a character.
def undigit36(code)
case code
when 0..9 then ('0'.ord + code).chr
when 10..35 then ('A'.ord + (code - 10)).chr
else '?'
end
end
end
module Input
extend self
DOMAIN = ('0'..'9').to_a + ('A'..'Z').to_a
def combinations
DOMAIN.combination(4)
end
def permutations(domain=DOMAIN)
if block_given?
domain.each do |a|
domain.each do |b|
domain.each do |c|
domain.each do |d|
yield [a, b, c, d]
end
end
end
end
else
enum_for __method__, domain
end
end
def best_case
permutations DOMAIN[0..15]
end
def worst_case
permutations DOMAIN[16..35]
end
alias no_case permutations
end
module Comparator
extend self
# Compares the fitness of all compressors that match the given regexp.
def compare(inputs, compressor_regexp=//)
Compressor.singleton_methods.grep(compressor_regexp).
map { |compressor| measure(inputs, compressor) }.
sort_by { |measurement| measurement['Probability of timestamp reuse'] }
end
# Measures the fitness of the given compressor.
def measure(inputs, compressor)
reuse_by_timestamp = measure_reuse(inputs, compressor)
num_keys = reuse_by_timestamp.count
num_values = reuse_by_timestamp.values.reduce(:+).to_i
actual_reuse = reuse_by_timestamp.select {|k,v| v > 1 }
num_keys_reused = actual_reuse.count
num_values_reused = actual_reuse.values.reduce(:+).to_i
{
'Compression algorithm' => compressor,
'Number of strings (domain)' => num_values,
'Number of timestamps (codomain)' => num_keys,
'Number of unique timestamps' => num_keys - num_keys_reused,
'Number of timestamps reused' => num_keys_reused,
'Probability of timestamp reuse' => num_keys_reused / num_keys.to_f,
'Number of strings mapped to reused timestamps' => num_values_reused,
'Probability of string mapped to reused timestamp' => num_values_reused / num_values.to_f,
}
end
def measure_reuse(inputs, compressor)
reuse_by_timestamp = Hash.new(0)
inputs.each do |characters|
string = characters.join
timestamp = Solver.solution(string, compressor)
reuse_by_timestamp[timestamp] += 1
end
reuse_by_timestamp
end
end