Encoding four base 36 digits as a timestamp

Suraj N. Kurapati

25 April 2015
10 May 2015

Problem
Approach
Solution
Evaluation
Comparison
Behind the scenes
1. comparison.rb

Problem

Convert a string containing 4 uppercase alphanumeric characters, whose values range from A through Z and 0 through 9, into a “HH:MM:SS” (24 hours, 60 minutes, 60 seconds) timestamp whose value ranges from “00:00:00” through “23:59:59”.

The string has 1679616 possible values (36 × 36 × 36 × 36) whereas the timestamp has 86400 possible values (24 × 60 × 60). Thus the challenge is to convey more information using less.

How would you solve this problem? Pause for a moment to think about it. In fact, go now and implement your solution before reading about my particular approach and solution below, as this is quite a fun puzzle! :)

Approach

Each character in the input string can be represented as a base 36 digit:

Characters 0 through 9 become integers 0 through 9 respectively.
Characters A through Z become integers 10 through 35 respectively.

Let’s examine the base 36 and binary representations of each character:

Character	Base 36	Bit 6	Bit 5	Bit 4	Bit 3	Bit 2	Bit 1
0	0	.	.	.	.	.	0
1	1	.	.	.	.	.	1
2	2	.	.	.	.	1	0
3	3	.	.	.	.	1	1
4	4	.	.	.	1	0	0
5	5	.	.	.	1	0	1
6	6	.	.	.	1	1	0
7	7	.	.	.	1	1	1
8	8	.	.	1	0	0	0
9	9	.	.	1	0	0	1
A	10	.	.	1	0	1	0
B	11	.	.	1	0	1	1
C	12	.	.	1	1	0	0
D	13	.	.	1	1	0	1
E	14	.	.	1	1	1	0
F	15	.	.	1	1	1	1
G	16	.	1	0	0	0	0
H	17	.	1	0	0	0	1
I	18	.	1	0	0	1	0
J	19	.	1	0	0	1	1
K	20	.	1	0	1	0	0
L	21	.	1	0	1	0	1
M	22	.	1	0	1	1	0
N	23	.	1	0	1	1	1
O	24	.	1	1	0	0	0
P	25	.	1	1	0	0	1
Q	26	.	1	1	0	1	0
R	27	.	1	1	0	1	1
S	28	.	1	1	1	0	0
T	29	.	1	1	1	0	1
U	30	.	1	1	1	1	0
V	31	.	1	1	1	1	1
W	32	1	0	0	0	0	0
X	33	1	0	0	0	0	1
Y	34	1	0	0	0	1	0
Z	35	1	0	0	0	1	1

Notice that bit 6 is present in only 4 characters, W through Z, and absent in all others. Thus, if we randomly picked a character from the set of 36 possible characters, the probability of finding bit 6 in our pick is 4 ÷ 36, or 0.1111111111111111, which approximately denotes a 11% chance of success.

In contrast, notice that bit 1 is present in every character. Thus, if we randomly picked a character from the set of 36 possible characters, the probability of finding bit 1 in our pick is 36 ÷ 36, or 1, which denotes a 100% chance of success.

These facts are recorded in the table below.

Bit	Number of characters present in	Probability of presence
6	4	0.1111111111111111
5	20	0.5555555555555556
4	28	0.7777777777777778
3	32	0.8888888888888888
2	34	0.9444444444444444
1	36	1

The timestamp, with its 86400 possible values, cannot represent all 1679616 values possible in the string unless we reuse some of them. That is, instead of mapping each unique string value to exactly one unique timestamp value, we map multiple unique string values to the same unique timestamp value, thereby reusing it.

This reuse happens automatically when we discard information. For example, if we discard the fourth character from two different strings “ABCD” and “ABCX”, then they would both become “ABC”. Next, if the string “ABC” were mapped to the timestamp “01:02:03”, then both strings “ABCD” and “ABCX” would map to the same “01:02:03” timestamp, thereby achieving reuse.

We can discard information strategically based on the probabilities listed in the table above. In particular, we can discard bits with lower presence probabilities with less risk of information loss because discarding nonexistent bits from a value does not change it. For example, bit 6 is present 11% of the time, so we only risk information loss 11% of the time by discarding it. In contrast, bit 1 is present 100% of the time, so we risk information loss 100% of the time by discarding it.

In binary, the string occupies 24 bits (4 characters × 6 bits per character) and the timestamp occupies at least 16 bits (2¹⁶ < 86400 possible values < 2^17). Therefore, we need to discard at least 8 bits (24 string bits - 16 timestamp bits) of information from the string to fit it within the timestamp’s 16 bits. This translates to 2 bits (8 bits discarded ÷ 4 characters) discarded per character.

Since lower bits are more likely to be present than higher ones, as the table above shows, we should keep as many lower bits as possible and discard as few higher bits as necessary. Thus, we shall discard the upper 2 bits (bit 6 and 5) from each character in the string to fit the string within the timestamp.

Solution

In the Ruby programming language:


# Converts the given string into a "HH:MM:SS" timestamp.
def solution(string)
  digits = digits36(string)
  num_seconds = compress(*digits)
  Time.at(num_seconds).utc.strftime('%H:%M:%S')
end

# Converts each character in the given string into a base 36 digit.
def digits36(string)
  string.each_char.map { |char| digit36(char) }
end

# Converts the given character into a base 36 digit:
# decimal digits (0..9) + uppercase letters (10..35)
def digit36(char)
  case char
  when '0'..'9' then char.ord - '0'.ord
  when 'A'..'Z' then char.ord - 'A'.ord + 10
  else raise ArgumentError, "not a base36 digit: #{char.inspect}"
  end
end

# Combines the given base 36 digits together into a 16-bit integer.
SECONDS_PER_DAY = 24 * 60 * 60 # hours * minutes/hour * seconds/minute
SECONDS_PER_DAY_BITS = Math.log2(SECONDS_PER_DAY).ceil # 2 ** exponent
def compress(*chars)
  return 0 if chars.empty?
  bits = SECONDS_PER_DAY_BITS / chars.length
  mask = (1 << bits) - 1
  chars.inject(0) do |acc, char|
    # grab lower bits from each character and place them side by side
    (acc << bits) | (char & mask)
  end % SECONDS_PER_DAY
end

Save the above code to a solution.rb file to play with it:


$ irb --simple-prompt -r ./solution.rb
## ruby 2.1.2p95 (2014-05-08) [x86_64-linux-gnu]

>> solution "ABCD"
"12:13:01"

>> digits36 "ABCD"
[10, 11, 12, 13]

>> compress 10, 11, 12, 13
43981

>> Time.at(43981).utc
1970-01-01 12:13:01 UTC

The solution above is generic to any number of characters, so try more:


$ irb --simple-prompt -r ./solution.rb
## ruby 2.1.2p95 (2014-05-08) [x86_64-linux-gnu]

>> solution "ABCDEFG"
"03:09:32"

>> digits36 "ABCDEFG"
[10, 11, 12, 13, 14, 15, 16]

>> compress 10, 11, 12, 13, 14, 15, 16
11372

>> Time.at(11372).utc
1970-01-01 03:09:32 UTC

Evaluation

To evaluate how well the algorithm performs, we feed it different sets of input strings and analyze its output: the more input strings it maps to unique timestamps, the smaller the reuse probability and the better it is.

Algorithm (f : string → timestamp) characteristics	Best case	Worst case	No case
Number of strings (domain)	65536	160000	1679616
Number of timestamps (codomain)	65536	65536	65536
Number of unique timestamps	65536	20736	0
Number of timestamps reused	0	44800	65536
Probability of timestamp reuse	0	0.68359375	1
Number of strings mapped to reused timestamps	0	139264	1679616
Probability of string mapped to reused timestamp	0	0.8704	1

In the best case, where all input characters fed into the algorithm lack the upper bits it discards, information is never lost. Therefore, the probability of timestamp reuse is zero.

In the worst case, where all input characters fed into the algorithm contain the upper bits it discards, some information is always lost. Therefore, the probability of timestamp reuse is greater than zero.

In the “no” case, where some input characters fed into the algorithm contain the upper bits it discards, information is sometimes lost. However, since this input set contains all 1679616 possible input strings, timestamps will always be reused: the timestamp, with its 86400 possible values, cannot represent all 1679616 values possible in the string unless we reuse some of them.

This observation applies to any such algorithm.

Comparison

My solution to this problem is based on probabilities. However, there are many other solutions, some of which are known as hash functions. Let’s compare them, along with my solution, to see how well they perform.

To compare these algorithms fairly, we need to feed them the same set of strings. Unlike the singular evaluation of my solution above, here we don’t have the luxury of partitioning the input space into best and worst cases up front, because such cases vary wildly across these algorithms.

Here, I have chosen to feed all 4-subsets of the 36 possible uppercase alphanumeric characters, which produces 58905 unique strings containing 4 unique uppercase alphanumeric characters each, into these algorithms. Although this choice poorly approximates reality, wherein strings may contain repeated characters, some such sacrifice is necessary to produce a non-exhaustive set of inputs with which to compare these algorithms.

The results are recorded in the table below, sorted by reuse probabilities in ascending order: from the best result first to the worst result last.

The key result to notice here is that my solution had the best performance among all of these algorithms. I was delighted to see this because I expected it fare far worse than real algorithms. :-)

One surprising result is that the ∅ solution, which simply discards the fourth character and doesn’t bother with any of this algorithmic nonsense, performs better than half of these algorithms. Although its performance isn’t praiseworthy, it nevertheless hints that perhaps, sometimes, ignoring a problem can be just as effective as solving it.

Augh! What am I saying? Those are not the words of a true engineer! Problems must always be solved and solved well, so long as it’s fun. ;-) Anyway, that’s all folks. I hope you enjoyed this algorithmic treatise. But wait! Don’t leave just yet because there are some fun behind the scenes goodies waiting for you below! :-)

Algorithm (f : string → timestamp)	Number of strings (domain)	Number of timestamps (codomain)	Number of unique timestamps	Number of timestamps reused	Probability of timestamp reuse	Number of strings mapped to reused timestamps	Probability of string mapped to reused timestamp
My solution	58905	39531	34690	4841	0.12246085350737396	24215	0.411085646379764
Preserve bits 4,3,2,1	58905	39531	34690	4841	0.12246085350737396	24215	0.411085646379764
Rotating hash	58905	42936	34649	8287	0.193008198248556	24256	0.4117816823699177
Bernstein XOR hash	58905	46718	35673	11045	0.23641851106639838	23232	0.39439775910364144
Shift-Add-XOR hash	58905	44623	32440	12183	0.2730206395804854	26465	0.4492827434003905
Preserve bits 4,3,2,1 then add bits 6,5	58905	40651	29341	11310	0.2782219379597058	29564	0.5018928783634666
Bernstein hash	58905	43281	28707	14574	0.33672974284327994	30198	0.5126559714795009
djb2 hash	58905	43281	28707	14574	0.33672974284327994	30198	0.5126559714795009
Pearson hash	58905	38838	23805	15033	0.3870693650548432	35100	0.5958747135217723
FNV-1a hash	58905	36225	21085	15140	0.4179434092477571	37820	0.6420507596978186
FNV-1 hash	58905	35338	20229	15109	0.42755673778934855	38676	0.6565826330532213
Preserve bits 4,3,2,1 then XOR bits 6,5	58905	29003	15230	13773	0.47488190876805847	43675	0.7414480943892708
Preserve bits 5,3,2,1	58905	21243	6706	14537	0.6843195405545356	52199	0.8861556743909685
Exim nhash	58905	7401	1280	6121	0.8270503985947845	57625	0.9782700959171547
Merge fourth character into others	58905	9467	1477	7990	0.8439843667476498	57428	0.9749257278669043
Fletcher16 checksum	58905	4935	621	4314	0.8741641337386018	58284	0.9894576012223071
Preserve bits 5,4,2,1	58905	12931	1610	11321	0.8754930013146702	57295	0.9726678550207962
sdbm hash	58905	3343	331	3012	0.9009871373018247	58574	0.9943807826160768
Discard fourth character (∅ solution)	58905	6545	561	5984	0.9142857142857143	58344	0.9904761904761905
Preserve bits 6,3,2,1	58905	6561	331	6230	0.9495503734186862	58574	0.9943807826160768
Preserve bits 5,4,3,1	58905	7191	327	6864	0.9545264914476429	58578	0.9944486885663356
BSD checksum	58905	464	21	443	0.9547413793103449	58884	0.9996434937611408
LRC checksum	58905	129	4	125	0.9689922480620154	58901	0.9999320940497411
lose lose hash	58905	129	4	125	0.9689922480620154	58901	0.9999320940497411
Preserve bits 6,4,2,1	58905	6145	171	5974	0.9721724979658258	58734	0.9970970206264324
Preserve bits 6,5,2,1	58905	2625	71	2554	0.9729523809523809	58834	0.9987946693829047
Preserve bits 5,4,3,2	58905	5655	150	5505	0.9734748010610079	58755	0.9974535268652915
Preserve bits 6,5,3,1	58905	1953	39	1914	0.9800307219662059	58866	0.9993379169849758
Preserve bits 6,5,4,1	58905	1073	21	1052	0.9804287045666356	58884	0.9996434937611408
Preserve bits 6,5,4,3	58905	495	9	486	0.9818181818181818	58896	0.9998472116119175
Preserve bits 6,4,3,1	58905	4193	75	4118	0.9821130455521107	58830	0.9987267634326458
Jenkins one-at-a-time hash	58905	9582	127	9455	0.9867459820496765	58778	0.9978439860792802
Preserve bits 6,5,4,2	58905	1005	13	992	0.9870646766169154	58892	0.9997793056616586
Preserve bits 6,5,3,2	58905	1749	21	1728	0.9879931389365352	58884	0.9996434937611408
Preserve bits 6,4,3,2	58905	3621	37	3584	0.9897818282242474	58868	0.9993718699601053

Behind the scenes

I created the giant data table you see above, which compares different algorithms, by first running the comparison script and then formatting its results into a nice data table:

$ ruby -r ./comparison.rb -r pp -e 'pp Comparator.compare Input.combinations' > comparison.txt

$ vim comparison.txt # do some post-processing to format the results into a nice data table

Similarly, I created the smaller data table you saw above, which compares the best, worst, and “no” case performance of my solution, by running these commands:


# BEST CASE

$ ruby -r ./comparison.rb -r pp -e 'pp Comparator.compare Input.best_case, /^preserve_lower_bits$/'
[{"Compression algorithm"=>:preserve_lower_bits,
  "Number of strings (domain)"=>65536,
  "Number of timestamps (codomain)"=>65536,
  "Number of unique timestamps"=>65536,
  "Number of timestamps reused"=>0,
  "Probability of timestamp reuse"=>0.0,
  "Number of strings mapped to reused timestamps"=>0,
  "Probability of string mapped to reused timestamp"=>0.0}]

# WORST CASE

$ ruby -r ./comparison.rb -r pp -e 'pp Comparator.compare Input.worst_case, /^preserve_lower_bits$/'
[{"Compression algorithm"=>:preserve_lower_bits,
  "Number of strings (domain)"=>160000,
  "Number of timestamps (codomain)"=>65536,
  "Number of unique timestamps"=>20736,
  "Number of timestamps reused"=>44800,
  "Probability of timestamp reuse"=>0.68359375,
  "Number of strings mapped to reused timestamps"=>139264,
  "Probability of string mapped to reused timestamp"=>0.8704}]

# NO CASE

$ ruby -r ./comparison.rb -r pp -e 'pp Comparator.compare Input.no_case, /^preserve_lower_bits$/'
[{"Compression algorithm"=>:preserve_lower_bits,
  "Number of strings (domain)"=>1679616,
  "Number of timestamps (codomain)"=>65536,
  "Number of unique timestamps"=>0,
  "Number of timestamps reused"=>65536,
  "Probability of timestamp reuse"=>1.0,
  "Number of strings mapped to reused timestamps"=>1679616,
  "Probability of string mapped to reused timestamp"=>1.0}]

You can load the comparison script into an interactive Ruby shell and play with the algorithms and comparisons dynamically:

$ irb --simple-prompt -r ./comparison.rb
## ruby 2.1.2p95 (2014-05-08) [x86_64-linux-gnu]

>> Input.singleton_methods
[:combinations, :permutations, :best_case, :worst_case, :no_case]

>> Input.combinations
#<Enumerator: ...>

>> Input.combinations.take(3)
[["0", "1", "2", "3"], ["0", "1", "2", "4"], ["0", "1", "2", "5"]]

>> Input.permutations.take(3)
[["0", "0", "0", "0"], ["0", "0", "0", "1"], ["0", "0", "0", "2"]]

>> Input.best_case.take(3)
[["0", "0", "0", "0"], ["0", "0", "0", "1"], ["0", "0", "0", "2"]]

>> Input.worst_case.take(3)
[["G", "G", "G", "G"], ["G", "G", "G", "H"], ["G", "G", "G", "I"]]

>> Input.no_case.take(3)
[["0", "0", "0", "0"], ["0", "0", "0", "1"], ["0", "0", "0", "2"]]

You can even define a new algorithm and measure its performance:

>> def Compressor.my_new_solution(a, b, c, d)
>>   a + b - c * d
>> end
:my_new_solution

>> Comparator.measure Input.combinations, :my_new_solution
{"Compression algorithm"=>:my_new_solution,
 "Number of strings (domain)"=>58905,
 "Number of timestamps (codomain)"=>1183,
 "Number of unique timestamps"=>7,
 "Number of timestamps reused"=>1176,
 "Probability of timestamp reuse"=>0.9940828402366864,
 "Number of strings mapped to reused timestamps"=>58898,
 "Probability of string mapped to reused timestamp"=>0.9998811645870469}

You can also compare it to other algorithms, just as this article did:

>> Comparator.compare Input.combinations
# ... too much output, so it's omitted ...

If that’s too much, selectively compare just the algorithms you want:

>> Comparator.compare Input.combinations, /my_new|char/
[{"Compression algorithm"=>:merge_fourth_char_into_others,
  "Number of strings (domain)"=>58905,
  "Number of timestamps (codomain)"=>9936,
  "Number of unique timestamps"=>1841,
  "Number of timestamps reused"=>8095,
  "Probability of timestamp reuse"=>0.8147141706924316,
  "Number of strings mapped to reused timestamps"=>57064,
  "Probability of string mapped to reused timestamp"=>0.9687462863933453},
 {"Compression algorithm"=>:discard_fourth_char,
  "Number of strings (domain)"=>58905,
  "Number of timestamps (codomain)"=>6545,
  "Number of unique timestamps"=>561,
  "Number of timestamps reused"=>5984,
  "Probability of timestamp reuse"=>0.9142857142857143,
  "Number of strings mapped to reused timestamps"=>58344,
  "Probability of string mapped to reused timestamp"=>0.9904761904761905},
 {"Compression algorithm"=>:my_new_solution,
  "Number of strings (domain)"=>58905,
  "Number of timestamps (codomain)"=>1183,
  "Number of unique timestamps"=>7,
  "Number of timestamps reused"=>1176,
  "Probability of timestamp reuse"=>0.9940828402366864,
  "Number of strings mapped to reused timestamps"=>58898,
  "Probability of string mapped to reused timestamp"=>0.9998811645870469}]

comparison.rb

In the Ruby programming language:


# Combines the given base 36 digits together into a 16-bit integer.
module Compressor
  extend self

  #              _ _             _       _   _
  #  _ __  _   _| | |  ___  ___ | |_   _| |_(_) ___  _ __
  # | '_ \| | | | | | / __|/ _ \| | | | | __| |/ _ \| '_ \
  # | | | | |_| | | | \__ \ (_) | | |_| | |_| | (_) | | | |
  # |_| |_|\__,_|_|_| |___/\___/|_|\__,_|\__|_|\___/|_| |_|
  #

  # Why bother with all of this? Just discard the fourth character.
  def discard_fourth_char(a, b, c, d)
    (a * (36 ** 2)) + (b * 36) + c
  end

  #                              _       _   _
  #  _ __ ___  _   _   ___  ___ | |_   _| |_(_) ___  _ __  ___
  # | '_ ` _ \| | | | / __|/ _ \| | | | | __| |/ _ \| '_ \/ __|
  # | | | | | | |_| | \__ \ (_) | | |_| | |_| | (_) | | | \__ \
  # |_| |_| |_|\__, | |___/\___/|_|\__,_|\__|_|\___/|_| |_|___/
  #            |___/

  def merge_fourth_char_into_others(a, b, c, d)
    c += (d & 0b11)
    b += (d & 0b1100) >> 2
    a += (d & 0b110000) >> 4
    (a * (36 ** 2)) + (b * 36) + c
  end

  # This is the solution described in this article.  (((THE WINNER!)))
  def preserve_lower_bits(a, b, c, d)
    upper_byte = ((a & 0xF) << 4) | (b & 0xF)
    lower_byte = ((c & 0xF) << 4) | (d & 0xF)
    (upper_byte << 8) | lower_byte
  end

  # Generate preserve_bits_WXYZ() functions that try all combinations.
  preserve_bits_helper = lambda do |a, b, c, d, bits|
    fetch_bits = lambda do |char|
      bits.each_with_index.map do |bit, index|
        char[bit] << index
      end.reduce(:|).to_i
    end
    upper_byte = (fetch_bits.(a) << 4) | fetch_bits.(b)
    lower_byte = (fetch_bits.(c) << 4) | fetch_bits.(d)
    ((upper_byte << 8) | lower_byte)
  end
  (0..5).to_a.combination(4).each do |bits|
    define_method "preserve_bits_#{bits.sort.reverse.map(&:next).join}" do
      |a, b, c, d| preserve_bits_helper.(a, b, c, d, bits)
    end
  end

  def preserve_lower_bits_then_add_upper_bits(a, b, c, d)
    lower_bits = lambda {|char| (char & 0b001111) }
    upper_bits = lambda {|char| (char & 0b110000) >> 4 }
    upper_byte = (lower_bits.(a) << 4) | lower_bits.(c)
    lower_byte = (lower_bits.(b) << 4) | lower_bits.(d)
    extra_byte = (upper_bits.(a) << 6) | (upper_bits.(c) << 4) |
                 (upper_bits.(b) << 2) | (upper_bits.(d) << 0)
    ((upper_byte << 8) | lower_byte) + extra_byte
  end

  def preserve_lower_bits_then_xor_upper_bits(a, b, c, d)
    lower_bits = lambda {|char| (char & 0b001111) }
    upper_bits = lambda {|char| (char & 0b110000) >> 4 }
    xor_helper = lambda do |char|
      ((upper_bits.(char) ^ (lower_bits.(char) >> 2)) << 2) |
       (lower_bits.(char) & 0b11)
    end
    upper_byte = (xor_helper.(a) << 4) | xor_helper.(c)
    lower_byte = (xor_helper.(b) << 4) | xor_helper.(d)
    ((upper_byte << 8) | lower_byte)
  end

  #                 _             _       _   _
  #  _ __ ___  __ _| |  ___  ___ | |_   _| |_(_) ___  _ __  ___
  # | '__/ _ \/ _` | | / __|/ _ \| | | | | __| |/ _ \| '_ \/ __|
  # | | |  __/ (_| | | \__ \ (_) | | |_| | |_| | (_) | | | \__ \
  # |_|  \___|\__,_|_| |___/\___/|_|\__,_|\__|_|\___/|_| |_|___/
  #

  # https://en.wikipedia.org/wiki/Pearson_hashing
  PEARSON_TABLE = ( 0 ... 2**16 ).to_a.shuffle
  def pearson_hash(a, b, c, d)
    hash = 0
    [a, b, c, d].each do |char|
      index = hash ^ char
      hash = PEARSON_TABLE[index]
    end
    hash
  end

  # http://eternallyconfuzzled.com/tuts/algorithms/jsw_tut_hashing.aspx
  def rotating_hash(a, b, c, d)
    hash = 0
    [a, b, c, d].each do |char|
      hash = (hash << 4) ^ (hash >> 28) ^ char
    end
    hash
  end

  # http://eternallyconfuzzled.com/tuts/algorithms/jsw_tut_hashing.aspx
  def bernstein_hash(a, b, c, d)
    hash = 0
    [a, b, c, d].each do |char|
      hash = 33 * hash + char
    end
    hash
  end

  # http://eternallyconfuzzled.com/tuts/algorithms/jsw_tut_hashing.aspx
  def bernstein_xor_hash(a, b, c, d)
    hash = 0
    [a, b, c, d].each do |char|
      hash = 33 * hash ^ char
    end
    hash
  end

  # http://eternallyconfuzzled.com/tuts/algorithms/jsw_tut_hashing.aspx
  def shift_add_xor_hash(a, b, c, d)
    hash = 0
    [a, b, c, d].each do |char|
      hash ^= (hash << 5) + (hash >> 2) + char
    end
    hash
  end

  # http://www.isthe.com/chongo/tech/comp/fnv/#FNV-1
  def fnv1_hash(a, b, c, d)
    hash = 2166136261
    [a, b, c, d].each do |char|
      hash = (hash * 16777619) ^ char
    end
    hash
  end

  # http://www.isthe.com/chongo/tech/comp/fnv/#FNV-1a
  def fnv1a_hash(a, b, c, d)
    hash = 2166136261
    [a, b, c, d].each do |char|
      hash = (hash ^ char) * 16777619
    end
    hash
  end

  # https://en.wikipedia.org/wiki/Jenkins_hash_function
  def jenkins_one_at_a_time_hash(a, b, c, d)
    hash = 0
    [a, b, c, d].each do |char|
      hash += char
      hash += (hash << 10)
      hash ^= (hash >> 6)
    end
    hash += (hash << 3)
    hash ^= (hash >> 11)
    hash += (hash << 15)
    hash
  end

  # Algorithm::Nhash - Exim nhash algorithm
  # http://cpansearch.perl.org/src/MOOLI/Algorithm-Nhash-0.002/lib/Algorithm/Nhash.pm
  NHASH_PRIMES = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59,
                  61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113]
  def exim_nhash(a, b, c, d)
    sum = 0
    i = 0
    [a, b, c, d].each do |char|
      i += 28
      i %= 29
      sum += NHASH_PRIMES[i] * char
    end
    sum
  end

  # https://en.wikipedia.org/wiki/Fletcher%27s_checksum
  def fletcher16_checksum(a, b, c, d)
    sum1 = 0
    sum2 = 0

    [a, b, c, d].each do |char|
      sum1 = (sum1 + char) % 255
      sum2 = (sum2 + sum1) % 255
    end

    (sum2 << 8) | sum1
  end

  # https://en.wikipedia.org/wiki/BSD_checksum
  def bsd_checksum(a, b, c, d)
    checksum = 0                 # The checksum mod 2^16.
    [a, b, c, d].each do |char|
      checksum = (checksum >> 1) + ((checksum & 1) << 15)
      checksum += char
      checksum &= 0xffff         # Keep it within bounds.
    end
    checksum
  end

  # https://en.wikipedia.org/wiki/Longitudinal_redundancy_check
  def lrc_checksum(a, b, c, d)
    lrc = 0
    [a, b, c, d].each do |char|
      lrc = (lrc + char) & 0xFF
    end
    lrc = (((lrc ^ 0xFF) + 1) & 0xFF)
  end

  # http://www.cse.yorku.ca/~oz/hash.html
  def djb2_hash(a, b, c, d)
    hash = 5381
    [a, b, c, d].each do |char|
      hash = ((hash << 5) + hash) + char # hash * 33 + c
    end
    hash
  end

  # http://www.cse.yorku.ca/~oz/hash.html
  def sdbm_hash(a, b, c, d)
    hash = 0
    [a, b, c, d].each do |char|
      hash = char + (hash << 6) + (hash << 16) - hash
    end
    hash
  end

  # http://www.cse.yorku.ca/~oz/hash.html
  def lose_lose_hash(a, b, c, d)
    hash = 0
    [a, b, c, d].each do |char|
      hash += char
    end
    hash
  end

end

module Solver
  extend self

  # Converts the given 4-character string into a "HH:MM:SS" timestamp.
  def solution(string, compressor)
    a, b, c, d  = digits36(string)
    num_seconds = Compressor.send(compressor, a, b, c, d)
    Time.at(num_seconds).strftime('%H:%M:%S')
  end

  # Converts each character in the given string into a base 36 digits.
  def digits36(string)
    string.each_char.map { |char| digit36(char) }
  end

  # Converts the given character into a base 36 digit:
  # decimal digits (0..9) + uppercase letters (10..35)
  def digit36(char)
    case char
    when '0'..'9' then char.ord - '0'.ord
    when 'A'..'Z' then char.ord - 'A'.ord + 10
    else -1
    end
  end

  # Converts the given base 36 digit into a character.
  def undigit36(code)
    case code
    when  0..9  then ('0'.ord + code).chr
    when 10..35 then ('A'.ord + (code - 10)).chr
    else '?'
    end
  end

end

module Input
  extend self

  DOMAIN = ('0'..'9').to_a + ('A'..'Z').to_a

  def combinations
    DOMAIN.combination(4)
  end

  def permutations(domain=DOMAIN)
    if block_given?
      domain.each do |a|
        domain.each do |b|
          domain.each do |c|
            domain.each do |d|
              yield [a, b, c, d]
            end
          end
        end
      end
    else
      enum_for __method__, domain
    end
  end

  def best_case
    permutations DOMAIN[0..15]
  end

  def worst_case
    permutations DOMAIN[16..35]
  end

  alias no_case permutations

end

module Comparator
  extend self

  # Compares the fitness of all compressors that match the given regexp.
  def compare(inputs, compressor_regexp=//)
    Compressor.singleton_methods.grep(compressor_regexp).
      map { |compressor| measure(inputs, compressor) }.
      sort_by { |measurement| measurement['Probability of timestamp reuse'] }
  end

  # Measures the fitness of the given compressor.
  def measure(inputs, compressor)
    reuse_by_timestamp = measure_reuse(inputs, compressor)

    num_keys = reuse_by_timestamp.count
    num_values = reuse_by_timestamp.values.reduce(:+).to_i

    actual_reuse = reuse_by_timestamp.select {|k,v| v > 1 }
    num_keys_reused = actual_reuse.count
    num_values_reused = actual_reuse.values.reduce(:+).to_i

    {
      'Compression algorithm'                            => compressor,
      'Number of strings (domain)'                       => num_values,
      'Number of timestamps (codomain)'                  => num_keys,
      'Number of unique timestamps'                      => num_keys - num_keys_reused,
      'Number of timestamps reused'                      => num_keys_reused,
      'Probability of timestamp reuse'                   => num_keys_reused / num_keys.to_f,
      'Number of strings mapped to reused timestamps'    => num_values_reused,
      'Probability of string mapped to reused timestamp' => num_values_reused / num_values.to_f,
    }
  end

  def measure_reuse(inputs, compressor)
    reuse_by_timestamp = Hash.new(0)
    inputs.each do |characters|
      string = characters.join
      timestamp = Solver.solution(string, compressor)
      reuse_by_timestamp[timestamp] += 1
    end
    reuse_by_timestamp
  end

end

Updates

Comments

NOTE: Comments are temporarily disabled on this page while I migrate them from Disqus to Gisqus (GitHub Discussions) because the former has 💪 forced the display of advertisements 🤑 on this website recently.

Character	Base 36	Bit 6	Bit 5	Bit 4	Bit 3	Bit 2	Bit 1
0	0	.	.	.	.	.	0
1	1	.	.	.	.	.	1
2	2	.	.	.	.	1	0
3	3	.	.	.	.	1	1
4	4	.	.	.	1	0	0
5	5	.	.	.	1	0	1
6	6	.	.	.	1	1	0
7	7	.	.	.	1	1	1
8	8	.	.	1	0	0	0
9	9	.	.	1	0	0	1
A	10	.	.	1	0	1	0
B	11	.	.	1	0	1	1
C	12	.	.	1	1	0	0
D	13	.	.	1	1	0	1
E	14	.	.	1	1	1	0
F	15	.	.	1	1	1	1
G	16	.	1	0	0	0	0
H	17	.	1	0	0	0	1
I	18	.	1	0	0	1	0
J	19	.	1	0	0	1	1
K	20	.	1	0	1	0	0
L	21	.	1	0	1	0	1
M	22	.	1	0	1	1	0
N	23	.	1	0	1	1	1
O	24	.	1	1	0	0	0
P	25	.	1	1	0	0	1
Q	26	.	1	1	0	1	0
R	27	.	1	1	0	1	1
S	28	.	1	1	1	0	0
T	29	.	1	1	1	0	1
U	30	.	1	1	1	1	0
V	31	.	1	1	1	1	1
W	32	1	0	0	0	0	0
X	33	1	0	0	0	0	1
Y	34	1	0	0	0	1	0
Z	35	1	0	0	0	1	1

Character	Base 36	Bit 6	Bit 5	Bit 4	Bit 3	Bit 2	Bit 1
0	0	.	.	.	.	.	0
1	1	.	.	.	.	.	1
2	2	.	.	.	.	1	0
3	3	.	.	.	.	1	1
4	4	.	.	.	1	0	0
5	5	.	.	.	1	0	1
6	6	.	.	.	1	1	0
7	7	.	.	.	1	1	1
8	8	.	.	1	0	0	0
9	9	.	.	1	0	0	1
A	10	.	.	1	0	1	0
B	11	.	.	1	0	1	1
C	12	.	.	1	1	0	0
D	13	.	.	1	1	0	1
E	14	.	.	1	1	1	0
F	15	.	.	1	1	1	1
G	16	.	1	0	0	0	0
H	17	.	1	0	0	0	1
I	18	.	1	0	0	1	0
J	19	.	1	0	0	1	1
K	20	.	1	0	1	0	0
L	21	.	1	0	1	0	1
M	22	.	1	0	1	1	0
N	23	.	1	0	1	1	1
O	24	.	1	1	0	0	0
P	25	.	1	1	0	0	1
Q	26	.	1	1	0	1	0
R	27	.	1	1	0	1	1
S	28	.	1	1	1	0	0
T	29	.	1	1	1	0	1
U	30	.	1	1	1	1	0
V	31	.	1	1	1	1	1
W	32	1	0	0	0	0	0
X	33	1	0	0	0	0	1
Y	34	1	0	0	0	1	0
Z	35	1	0	0	0	1	1

Character	Base 36	Bit 6	Bit 5	Bit 4	Bit 3	Bit 2	Bit 1
0	0	.	.	.	.	.	0
1	1	.	.	.	.	.	1
2	2	.	.	.	.	1	0
3	3	.	.	.	.	1	1
4	4	.	.	.	1	0	0
5	5	.	.	.	1	0	1
6	6	.	.	.	1	1	0
7	7	.	.	.	1	1	1
8	8	.	.	1	0	0	0
9	9	.	.	1	0	0	1
A	10	.	.	1	0	1	0
B	11	.	.	1	0	1	1
C	12	.	.	1	1	0	0
D	13	.	.	1	1	0	1
E	14	.	.	1	1	1	0
F	15	.	.	1	1	1	1
G	16	.	1	0	0	0	0
H	17	.	1	0	0	0	1
I	18	.	1	0	0	1	0
J	19	.	1	0	0	1	1
K	20	.	1	0	1	0	0
L	21	.	1	0	1	0	1
M	22	.	1	0	1	1	0
N	23	.	1	0	1	1	1
O	24	.	1	1	0	0	0
P	25	.	1	1	0	0	1
Q	26	.	1	1	0	1	0
R	27	.	1	1	0	1	1
S	28	.	1	1	1	0	0
T	29	.	1	1	1	0	1
U	30	.	1	1	1	1	0
V	31	.	1	1	1	1	1
W	32	1	0	0	0	0	0
X	33	1	0	0	0	0	1
Y	34	1	0	0	0	1	0
Z	35	1	0	0	0	1	1