Encoding four base 36 digits as a timestamp

Suraj N. Kurapati


  1. Problem
    1. Approach
      1. Solution
        1. Evaluation
          1. Comparison
            1. Behind the scenes
              1. comparison.rb

              Problem

              Convert a string containing 4 uppercase alphanumeric characters, whose values range from A through Z and 0 through 9, into a “HH:MM:SS” (24 hours, 60 minutes, 60 seconds) timestamp whose value ranges from “00:00:00” through “23:59:59”.

              The string has 1679616 possible values (36 × 36 × 36 × 36) whereas the timestamp has 86400 possible values (24 × 60 × 60). Thus the challenge is to convey more information using less.

              How would you solve this problem? Pause for a moment to think about it. In fact, go now and implement your solution before reading about my particular approach and solution below, as this is quite a fun puzzle! :)

              Approach

              Each character in the input string can be represented as a base 36 digit:

              Let’s examine the base 36 and binary representations of each character:

              Character Base 36 Bit 6 Bit 5 Bit 4 Bit 3 Bit 2 Bit 1
              0 0 . . . . . 0
              1 1 . . . . . 1
              2 2 . . . . 1 0
              3 3 . . . . 1 1
              4 4 . . . 1 0 0
              5 5 . . . 1 0 1
              6 6 . . . 1 1 0
              7 7 . . . 1 1 1
              8 8 . . 1 0 0 0
              9 9 . . 1 0 0 1
              A 10 . . 1 0 1 0
              B 11 . . 1 0 1 1
              C 12 . . 1 1 0 0
              D 13 . . 1 1 0 1
              E 14 . . 1 1 1 0
              F 15 . . 1 1 1 1
              G 16 . 1 0 0 0 0
              H 17 . 1 0 0 0 1
              I 18 . 1 0 0 1 0
              J 19 . 1 0 0 1 1
              K 20 . 1 0 1 0 0
              L 21 . 1 0 1 0 1
              M 22 . 1 0 1 1 0
              N 23 . 1 0 1 1 1
              O 24 . 1 1 0 0 0
              P 25 . 1 1 0 0 1
              Q 26 . 1 1 0 1 0
              R 27 . 1 1 0 1 1
              S 28 . 1 1 1 0 0
              T 29 . 1 1 1 0 1
              U 30 . 1 1 1 1 0
              V 31 . 1 1 1 1 1
              W 32 1 0 0 0 0 0
              X 33 1 0 0 0 0 1
              Y 34 1 0 0 0 1 0
              Z 35 1 0 0 0 1 1

              Notice that bit 6 is present in only 4 characters, W through Z, and absent in all others. Thus, if we randomly picked a character from the set of 36 possible characters, the probability of finding bit 6 in our pick is 4 ÷ 36, or 0.1111111111111111, which approximately denotes a 11% chance of success.

              In contrast, notice that bit 1 is present in every character. Thus, if we randomly picked a character from the set of 36 possible characters, the probability of finding bit 1 in our pick is 36 ÷ 36, or 1, which denotes a 100% chance of success.

              These facts are recorded in the table below.

              Bit Number of characters present in Probability of presence
              6 4 0.1111111111111111
              5 20 0.5555555555555556
              4 28 0.7777777777777778
              3 32 0.8888888888888888
              2 34 0.9444444444444444
              1 36 1

              The timestamp, with its 86400 possible values, cannot represent all 1679616 values possible in the string unless we reuse some of them. That is, instead of mapping each unique string value to exactly one unique timestamp value, we map multiple unique string values to the same unique timestamp value, thereby reusing it.

              This reuse happens automatically when we discard information. For example, if we discard the fourth character from two different strings “ABCD” and “ABCX”, then they would both become “ABC”. Next, if the string “ABC” were mapped to the timestamp “01:02:03”, then both strings “ABCD” and “ABCX” would map to the same “01:02:03” timestamp, thereby achieving reuse.

              We can discard information strategically based on the probabilities listed in the table above. In particular, we can discard bits with lower presence probabilities with less risk of information loss because discarding nonexistent bits from a value does not change it. For example, bit 6 is present 11% of the time, so we only risk information loss 11% of the time by discarding it. In contrast, bit 1 is present 100% of the time, so we risk information loss 100% of the time by discarding it.

              In binary, the string occupies 24 bits (4 characters × 6 bits per character) and the timestamp occupies at least 16 bits (216 < 86400 possible values < 217). Therefore, we need to discard at least 8 bits (24 string bits - 16 timestamp bits) of information from the string to fit it within the timestamp’s 16 bits. This translates to 2 bits (8 bits discarded ÷ 4 characters) discarded per character.

              Since lower bits are more likely to be present than higher ones, as the table above shows, we should keep as many lower bits as possible and discard as few higher bits as necessary. Thus, we shall discard the upper 2 bits (bit 6 and 5) from each character in the string to fit the string within the timestamp.

              Solution

              In the Ruby programming language:

              
              # Converts the given string into a "HH:MM:SS" timestamp.
              def solution(string)
                digits = digits36(string)
                num_seconds = compress(*digits)
                Time.at(num_seconds).utc.strftime('%H:%M:%S')
              end
              
              # Converts each character in the given string into a base 36 digit.
              def digits36(string)
                string.each_char.map { |char| digit36(char) }
              end
              
              # Converts the given character into a base 36 digit:
              # decimal digits (0..9) + uppercase letters (10..35)
              def digit36(char)
                case char
                when '0'..'9' then char.ord - '0'.ord
                when 'A'..'Z' then char.ord - 'A'.ord + 10
                else raise ArgumentError, "not a base36 digit: #{char.inspect}"
                end
              end
              
              # Combines the given base 36 digits together into a 16-bit integer.
              SECONDS_PER_DAY = 24 * 60 * 60 # hours * minutes/hour * seconds/minute
              SECONDS_PER_DAY_BITS = Math.log2(SECONDS_PER_DAY).ceil # 2 ** exponent
              def compress(*chars)
                return 0 if chars.empty?
                bits = SECONDS_PER_DAY_BITS / chars.length
                mask = (1 << bits) - 1
                chars.inject(0) do |acc, char|
                  # grab lower bits from each character and place them side by side
                  (acc << bits) | (char & mask)
                end % SECONDS_PER_DAY
              end
              
              

              Save the above code to a solution.rb file to play with it:

              
              $ irb --simple-prompt -r ./solution.rb
              ## ruby 2.1.2p95 (2014-05-08) [x86_64-linux-gnu]
              
              >> solution "ABCD"
              "12:13:01"
              
              >> digits36 "ABCD"
              [10, 11, 12, 13]
              
              >> compress 10, 11, 12, 13
              43981
              
              >> Time.at(43981).utc
              1970-01-01 12:13:01 UTC
              
              

              The solution above is generic to any number of characters, so try more:

              
              $ irb --simple-prompt -r ./solution.rb
              ## ruby 2.1.2p95 (2014-05-08) [x86_64-linux-gnu]
              
              >> solution "ABCDEFG"
              "03:09:32"
              
              >> digits36 "ABCDEFG"
              [10, 11, 12, 13, 14, 15, 16]
              
              >> compress 10, 11, 12, 13, 14, 15, 16
              11372
              
              >> Time.at(11372).utc
              1970-01-01 03:09:32 UTC
              
              

              Evaluation

              To evaluate how well the algorithm performs, we feed it different sets of input strings and analyze its output: the more input strings it maps to unique timestamps, the smaller the reuse probability and the better it is.

              Algorithm (f : string → timestamp) characteristics Best case Worst case No case
              Number of strings (domain) 65536 160000 1679616
              Number of timestamps (codomain) 65536 65536 65536
              Number of unique timestamps 65536 20736 0
              Number of timestamps reused 0 44800 65536
              Probability of timestamp reuse 0 0.68359375 1
              Number of strings mapped to reused timestamps 0 139264 1679616
              Probability of string mapped to reused timestamp 0 0.8704 1

              In the best case, where all input characters fed into the algorithm lack the upper bits it discards, information is never lost. Therefore, the probability of timestamp reuse is zero.

              In the worst case, where all input characters fed into the algorithm contain the upper bits it discards, some information is always lost. Therefore, the probability of timestamp reuse is greater than zero.

              In the “no” case, where some input characters fed into the algorithm contain the upper bits it discards, information is sometimes lost. However, since this input set contains all 1679616 possible input strings, timestamps will always be reused: the timestamp, with its 86400 possible values, cannot represent all 1679616 values possible in the string unless we reuse some of them.

              This observation applies to any such algorithm.

              Comparison

              My solution to this problem is based on probabilities. However, there are many other solutions, some of which are known as hash functions. Let’s compare them, along with my solution, to see how well they perform.

              To compare these algorithms fairly, we need to feed them the same set of strings. Unlike the singular evaluation of my solution above, here we don’t have the luxury of partitioning the input space into best and worst cases up front, because such cases vary wildly across these algorithms.

              Here, I have chosen to feed all 4-subsets of the 36 possible uppercase alphanumeric characters, which produces 58905 unique strings containing 4 unique uppercase alphanumeric characters each, into these algorithms. Although this choice poorly approximates reality, wherein strings may contain repeated characters, some such sacrifice is necessary to produce a non-exhaustive set of inputs with which to compare these algorithms.

              The results are recorded in the table below, sorted by reuse probabilities in ascending order: from the best result first to the worst result last.

              The key result to notice here is that my solution had the best performance among all of these algorithms. I was delighted to see this because I expected it fare far worse than real algorithms. :-)

              One surprising result is that the ∅ solution, which simply discards the fourth character and doesn’t bother with any of this algorithmic nonsense, performs better than half of these algorithms. Although its performance isn’t praiseworthy, it nevertheless hints that perhaps, sometimes, ignoring a problem can be just as effective as solving it.

              Augh! What am I saying? Those are not the words of a true engineer! Problems must always be solved and solved well, so long as it’s fun. ;-) Anyway, that’s all folks. I hope you enjoyed this algorithmic treatise. But wait! Don’t leave just yet because there are some fun behind the scenes goodies waiting for you below! :-)

              Algorithm (f : string → timestamp) Number of strings (domain) Number of timestamps (codomain) Number of unique timestamps Number of timestamps reused Probability of timestamp reuse Number of strings mapped to reused timestamps Probability of string mapped to reused timestamp
              My solution 58905 39531 34690 4841 0.12246085350737396 24215 0.411085646379764
              Preserve bits 4,3,2,1 58905 39531 34690 4841 0.12246085350737396 24215 0.411085646379764
              Rotating hash 58905 42936 34649 8287 0.193008198248556 24256 0.4117816823699177
              Bernstein XOR hash 58905 46718 35673 11045 0.23641851106639838 23232 0.39439775910364144
              Shift-Add-XOR hash 58905 44623 32440 12183 0.2730206395804854 26465 0.4492827434003905
              Preserve bits 4,3,2,1 then add bits 6,5 58905 40651 29341 11310 0.2782219379597058 29564 0.5018928783634666
              Bernstein hash 58905 43281 28707 14574 0.33672974284327994 30198 0.5126559714795009
              djb2 hash 58905 43281 28707 14574 0.33672974284327994 30198 0.5126559714795009
              Pearson hash 58905 38838 23805 15033 0.3870693650548432 35100 0.5958747135217723
              FNV-1a hash 58905 36225 21085 15140 0.4179434092477571 37820 0.6420507596978186
              FNV-1 hash 58905 35338 20229 15109 0.42755673778934855 38676 0.6565826330532213
              Preserve bits 4,3,2,1 then XOR bits 6,5 58905 29003 15230 13773 0.47488190876805847 43675 0.7414480943892708
              Preserve bits 5,3,2,1 58905 21243 6706 14537 0.6843195405545356 52199 0.8861556743909685
              Exim nhash 58905 7401 1280 6121 0.8270503985947845 57625 0.9782700959171547
              Merge fourth character into others 58905 9467 1477 7990 0.8439843667476498 57428 0.9749257278669043
              Fletcher16 checksum 58905 4935 621 4314 0.8741641337386018 58284 0.9894576012223071
              Preserve bits 5,4,2,1 58905 12931 1610 11321 0.8754930013146702 57295 0.9726678550207962
              sdbm hash 58905 3343 331 3012 0.9009871373018247 58574 0.9943807826160768
              Discard fourth character (∅ solution) 58905 6545 561 5984 0.9142857142857143 58344 0.9904761904761905
              Preserve bits 6,3,2,1 58905 6561 331 6230 0.9495503734186862 58574 0.9943807826160768
              Preserve bits 5,4,3,1 58905 7191 327 6864 0.9545264914476429 58578 0.9944486885663356
              BSD checksum 58905 464 21 443 0.9547413793103449 58884 0.9996434937611408
              LRC checksum 58905 129 4 125 0.9689922480620154 58901 0.9999320940497411
              lose lose hash 58905 129 4 125 0.9689922480620154 58901 0.9999320940497411
              Preserve bits 6,4,2,1 58905 6145 171 5974 0.9721724979658258 58734 0.9970970206264324
              Preserve bits 6,5,2,1 58905 2625 71 2554 0.9729523809523809 58834 0.9987946693829047
              Preserve bits 5,4,3,2 58905 5655 150 5505 0.9734748010610079 58755 0.9974535268652915
              Preserve bits 6,5,3,1 58905 1953 39 1914 0.9800307219662059 58866 0.9993379169849758
              Preserve bits 6,5,4,1 58905 1073 21 1052 0.9804287045666356 58884 0.9996434937611408
              Preserve bits 6,5,4,3 58905 495 9 486 0.9818181818181818 58896 0.9998472116119175
              Preserve bits 6,4,3,1 58905 4193 75 4118 0.9821130455521107 58830 0.9987267634326458
              Jenkins one-at-a-time hash 58905 9582 127 9455 0.9867459820496765 58778 0.9978439860792802
              Preserve bits 6,5,4,2 58905 1005 13 992 0.9870646766169154 58892 0.9997793056616586
              Preserve bits 6,5,3,2 58905 1749 21 1728 0.9879931389365352 58884 0.9996434937611408
              Preserve bits 6,4,3,2 58905 3621 37 3584 0.9897818282242474 58868 0.9993718699601053

              Behind the scenes

              I created the giant data table you see above, which compares different algorithms, by first running the comparison script and then formatting its results into a nice data table:

              $ ruby -r ./comparison.rb -r pp -e 'pp Comparator.compare Input.combinations' > comparison.txt
              
              $ vim comparison.txt # do some post-processing to format the results into a nice data table
              

              Similarly, I created the smaller data table you saw above, which compares the best, worst, and “no” case performance of my solution, by running these commands:

              
              # BEST CASE
              
              $ ruby -r ./comparison.rb -r pp -e 'pp Comparator.compare Input.best_case, /^preserve_lower_bits$/'
              [{"Compression algorithm"=>:preserve_lower_bits,
                "Number of strings (domain)"=>65536,
                "Number of timestamps (codomain)"=>65536,
                "Number of unique timestamps"=>65536,
                "Number of timestamps reused"=>0,
                "Probability of timestamp reuse"=>0.0,
                "Number of strings mapped to reused timestamps"=>0,
                "Probability of string mapped to reused timestamp"=>0.0}]
              
              # WORST CASE
              
              $ ruby -r ./comparison.rb -r pp -e 'pp Comparator.compare Input.worst_case, /^preserve_lower_bits$/'
              [{"Compression algorithm"=>:preserve_lower_bits,
                "Number of strings (domain)"=>160000,
                "Number of timestamps (codomain)"=>65536,
                "Number of unique timestamps"=>20736,
                "Number of timestamps reused"=>44800,
                "Probability of timestamp reuse"=>0.68359375,
                "Number of strings mapped to reused timestamps"=>139264,
                "Probability of string mapped to reused timestamp"=>0.8704}]
              
              # NO CASE
              
              $ ruby -r ./comparison.rb -r pp -e 'pp Comparator.compare Input.no_case, /^preserve_lower_bits$/'
              [{"Compression algorithm"=>:preserve_lower_bits,
                "Number of strings (domain)"=>1679616,
                "Number of timestamps (codomain)"=>65536,
                "Number of unique timestamps"=>0,
                "Number of timestamps reused"=>65536,
                "Probability of timestamp reuse"=>1.0,
                "Number of strings mapped to reused timestamps"=>1679616,
                "Probability of string mapped to reused timestamp"=>1.0}]
              
              

              You can load the comparison script into an interactive Ruby shell and play with the algorithms and comparisons dynamically:

              $ irb --simple-prompt -r ./comparison.rb
              ## ruby 2.1.2p95 (2014-05-08) [x86_64-linux-gnu]
              
              >> Input.singleton_methods
              [:combinations, :permutations, :best_case, :worst_case, :no_case]
              
              >> Input.combinations
              #<Enumerator: ...>
              
              >> Input.combinations.take(3)
              [["0", "1", "2", "3"], ["0", "1", "2", "4"], ["0", "1", "2", "5"]]
              
              >> Input.permutations.take(3)
              [["0", "0", "0", "0"], ["0", "0", "0", "1"], ["0", "0", "0", "2"]]
              
              >> Input.best_case.take(3)
              [["0", "0", "0", "0"], ["0", "0", "0", "1"], ["0", "0", "0", "2"]]
              
              >> Input.worst_case.take(3)
              [["G", "G", "G", "G"], ["G", "G", "G", "H"], ["G", "G", "G", "I"]]
              
              >> Input.no_case.take(3)
              [["0", "0", "0", "0"], ["0", "0", "0", "1"], ["0", "0", "0", "2"]]
              

              You can even define a new algorithm and measure its performance:

              >> def Compressor.my_new_solution(a, b, c, d)
              >>   a + b - c * d
              >> end
              :my_new_solution
              
              >> Comparator.measure Input.combinations, :my_new_solution
              {"Compression algorithm"=>:my_new_solution,
               "Number of strings (domain)"=>58905,
               "Number of timestamps (codomain)"=>1183,
               "Number of unique timestamps"=>7,
               "Number of timestamps reused"=>1176,
               "Probability of timestamp reuse"=>0.9940828402366864,
               "Number of strings mapped to reused timestamps"=>58898,
               "Probability of string mapped to reused timestamp"=>0.9998811645870469}
              

              You can also compare it to other algorithms, just as this article did:

              >> Comparator.compare Input.combinations
              # ... too much output, so it's omitted ...
              

              If that’s too much, selectively compare just the algorithms you want:

              >> Comparator.compare Input.combinations, /my_new|char/
              [{"Compression algorithm"=>:merge_fourth_char_into_others,
                "Number of strings (domain)"=>58905,
                "Number of timestamps (codomain)"=>9936,
                "Number of unique timestamps"=>1841,
                "Number of timestamps reused"=>8095,
                "Probability of timestamp reuse"=>0.8147141706924316,
                "Number of strings mapped to reused timestamps"=>57064,
                "Probability of string mapped to reused timestamp"=>0.9687462863933453},
               {"Compression algorithm"=>:discard_fourth_char,
                "Number of strings (domain)"=>58905,
                "Number of timestamps (codomain)"=>6545,
                "Number of unique timestamps"=>561,
                "Number of timestamps reused"=>5984,
                "Probability of timestamp reuse"=>0.9142857142857143,
                "Number of strings mapped to reused timestamps"=>58344,
                "Probability of string mapped to reused timestamp"=>0.9904761904761905},
               {"Compression algorithm"=>:my_new_solution,
                "Number of strings (domain)"=>58905,
                "Number of timestamps (codomain)"=>1183,
                "Number of unique timestamps"=>7,
                "Number of timestamps reused"=>1176,
                "Probability of timestamp reuse"=>0.9940828402366864,
                "Number of strings mapped to reused timestamps"=>58898,
                "Probability of string mapped to reused timestamp"=>0.9998811645870469}]
              

              comparison.rb

              In the Ruby programming language:

              
              # Combines the given base 36 digits together into a 16-bit integer.
              module Compressor
                extend self
              
                #              _ _             _       _   _
                #  _ __  _   _| | |  ___  ___ | |_   _| |_(_) ___  _ __
                # | '_ \| | | | | | / __|/ _ \| | | | | __| |/ _ \| '_ \
                # | | | | |_| | | | \__ \ (_) | | |_| | |_| | (_) | | | |
                # |_| |_|\__,_|_|_| |___/\___/|_|\__,_|\__|_|\___/|_| |_|
                #
              
                # Why bother with all of this? Just discard the fourth character.
                def discard_fourth_char(a, b, c, d)
                  (a * (36 ** 2)) + (b * 36) + c
                end
              
                #                              _       _   _
                #  _ __ ___  _   _   ___  ___ | |_   _| |_(_) ___  _ __  ___
                # | '_ ` _ \| | | | / __|/ _ \| | | | | __| |/ _ \| '_ \/ __|
                # | | | | | | |_| | \__ \ (_) | | |_| | |_| | (_) | | | \__ \
                # |_| |_| |_|\__, | |___/\___/|_|\__,_|\__|_|\___/|_| |_|___/
                #            |___/
              
                def merge_fourth_char_into_others(a, b, c, d)
                  c += (d & 0b11)
                  b += (d & 0b1100) >> 2
                  a += (d & 0b110000) >> 4
                  (a * (36 ** 2)) + (b * 36) + c
                end
              
                # This is the solution described in this article.  (((THE WINNER!)))
                def preserve_lower_bits(a, b, c, d)
                  upper_byte = ((a & 0xF) << 4) | (b & 0xF)
                  lower_byte = ((c & 0xF) << 4) | (d & 0xF)
                  (upper_byte << 8) | lower_byte
                end
              
                # Generate preserve_bits_WXYZ() functions that try all combinations.
                preserve_bits_helper = lambda do |a, b, c, d, bits|
                  fetch_bits = lambda do |char|
                    bits.each_with_index.map do |bit, index|
                      char[bit] << index
                    end.reduce(:|).to_i
                  end
                  upper_byte = (fetch_bits.(a) << 4) | fetch_bits.(b)
                  lower_byte = (fetch_bits.(c) << 4) | fetch_bits.(d)
                  ((upper_byte << 8) | lower_byte)
                end
                (0..5).to_a.combination(4).each do |bits|
                  define_method "preserve_bits_#{bits.sort.reverse.map(&:next).join}" do
                    |a, b, c, d| preserve_bits_helper.(a, b, c, d, bits)
                  end
                end
              
                def preserve_lower_bits_then_add_upper_bits(a, b, c, d)
                  lower_bits = lambda {|char| (char & 0b001111) }
                  upper_bits = lambda {|char| (char & 0b110000) >> 4 }
                  upper_byte = (lower_bits.(a) << 4) | lower_bits.(c)
                  lower_byte = (lower_bits.(b) << 4) | lower_bits.(d)
                  extra_byte = (upper_bits.(a) << 6) | (upper_bits.(c) << 4) |
                               (upper_bits.(b) << 2) | (upper_bits.(d) << 0)
                  ((upper_byte << 8) | lower_byte) + extra_byte
                end
              
                def preserve_lower_bits_then_xor_upper_bits(a, b, c, d)
                  lower_bits = lambda {|char| (char & 0b001111) }
                  upper_bits = lambda {|char| (char & 0b110000) >> 4 }
                  xor_helper = lambda do |char|
                    ((upper_bits.(char) ^ (lower_bits.(char) >> 2)) << 2) |
                     (lower_bits.(char) & 0b11)
                  end
                  upper_byte = (xor_helper.(a) << 4) | xor_helper.(c)
                  lower_byte = (xor_helper.(b) << 4) | xor_helper.(d)
                  ((upper_byte << 8) | lower_byte)
                end
              
                #                 _             _       _   _
                #  _ __ ___  __ _| |  ___  ___ | |_   _| |_(_) ___  _ __  ___
                # | '__/ _ \/ _` | | / __|/ _ \| | | | | __| |/ _ \| '_ \/ __|
                # | | |  __/ (_| | | \__ \ (_) | | |_| | |_| | (_) | | | \__ \
                # |_|  \___|\__,_|_| |___/\___/|_|\__,_|\__|_|\___/|_| |_|___/
                #
              
                # https://en.wikipedia.org/wiki/Pearson_hashing
                PEARSON_TABLE = ( 0 ... 2**16 ).to_a.shuffle
                def pearson_hash(a, b, c, d)
                  hash = 0
                  [a, b, c, d].each do |char|
                    index = hash ^ char
                    hash = PEARSON_TABLE[index]
                  end
                  hash
                end
              
                # http://eternallyconfuzzled.com/tuts/algorithms/jsw_tut_hashing.aspx
                def rotating_hash(a, b, c, d)
                  hash = 0
                  [a, b, c, d].each do |char|
                    hash = (hash << 4) ^ (hash >> 28) ^ char
                  end
                  hash
                end
              
                # http://eternallyconfuzzled.com/tuts/algorithms/jsw_tut_hashing.aspx
                def bernstein_hash(a, b, c, d)
                  hash = 0
                  [a, b, c, d].each do |char|
                    hash = 33 * hash + char
                  end
                  hash
                end
              
                # http://eternallyconfuzzled.com/tuts/algorithms/jsw_tut_hashing.aspx
                def bernstein_xor_hash(a, b, c, d)
                  hash = 0
                  [a, b, c, d].each do |char|
                    hash = 33 * hash ^ char
                  end
                  hash
                end
              
                # http://eternallyconfuzzled.com/tuts/algorithms/jsw_tut_hashing.aspx
                def shift_add_xor_hash(a, b, c, d)
                  hash = 0
                  [a, b, c, d].each do |char|
                    hash ^= (hash << 5) + (hash >> 2) + char
                  end
                  hash
                end
              
                # http://www.isthe.com/chongo/tech/comp/fnv/#FNV-1
                def fnv1_hash(a, b, c, d)
                  hash = 2166136261
                  [a, b, c, d].each do |char|
                    hash = (hash * 16777619) ^ char
                  end
                  hash
                end
              
                # http://www.isthe.com/chongo/tech/comp/fnv/#FNV-1a
                def fnv1a_hash(a, b, c, d)
                  hash = 2166136261
                  [a, b, c, d].each do |char|
                    hash = (hash ^ char) * 16777619
                  end
                  hash
                end
              
                # https://en.wikipedia.org/wiki/Jenkins_hash_function
                def jenkins_one_at_a_time_hash(a, b, c, d)
                  hash = 0
                  [a, b, c, d].each do |char|
                    hash += char
                    hash += (hash << 10)
                    hash ^= (hash >> 6)
                  end
                  hash += (hash << 3)
                  hash ^= (hash >> 11)
                  hash += (hash << 15)
                  hash
                end
              
                # Algorithm::Nhash - Exim nhash algorithm
                # http://cpansearch.perl.org/src/MOOLI/Algorithm-Nhash-0.002/lib/Algorithm/Nhash.pm
                NHASH_PRIMES = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59,
                                61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113]
                def exim_nhash(a, b, c, d)
                  sum = 0
                  i = 0
                  [a, b, c, d].each do |char|
                    i += 28
                    i %= 29
                    sum += NHASH_PRIMES[i] * char
                  end
                  sum
                end
              
                # https://en.wikipedia.org/wiki/Fletcher%27s_checksum
                def fletcher16_checksum(a, b, c, d)
                  sum1 = 0
                  sum2 = 0
              
                  [a, b, c, d].each do |char|
                    sum1 = (sum1 + char) % 255
                    sum2 = (sum2 + sum1) % 255
                  end
              
                  (sum2 << 8) | sum1
                end
              
                # https://en.wikipedia.org/wiki/BSD_checksum
                def bsd_checksum(a, b, c, d)
                  checksum = 0                 # The checksum mod 2^16.
                  [a, b, c, d].each do |char|
                    checksum = (checksum >> 1) + ((checksum & 1) << 15)
                    checksum += char
                    checksum &= 0xffff         # Keep it within bounds.
                  end
                  checksum
                end
              
                # https://en.wikipedia.org/wiki/Longitudinal_redundancy_check
                def lrc_checksum(a, b, c, d)
                  lrc = 0
                  [a, b, c, d].each do |char|
                    lrc = (lrc + char) & 0xFF
                  end
                  lrc = (((lrc ^ 0xFF) + 1) & 0xFF)
                end
              
                # http://www.cse.yorku.ca/~oz/hash.html
                def djb2_hash(a, b, c, d)
                  hash = 5381
                  [a, b, c, d].each do |char|
                    hash = ((hash << 5) + hash) + char # hash * 33 + c
                  end
                  hash
                end
              
                # http://www.cse.yorku.ca/~oz/hash.html
                def sdbm_hash(a, b, c, d)
                  hash = 0
                  [a, b, c, d].each do |char|
                    hash = char + (hash << 6) + (hash << 16) - hash
                  end
                  hash
                end
              
                # http://www.cse.yorku.ca/~oz/hash.html
                def lose_lose_hash(a, b, c, d)
                  hash = 0
                  [a, b, c, d].each do |char|
                    hash += char
                  end
                  hash
                end
              
              end
              
              module Solver
                extend self
              
                # Converts the given 4-character string into a "HH:MM:SS" timestamp.
                def solution(string, compressor)
                  a, b, c, d  = digits36(string)
                  num_seconds = Compressor.send(compressor, a, b, c, d)
                  Time.at(num_seconds).strftime('%H:%M:%S')
                end
              
                # Converts each character in the given string into a base 36 digits.
                def digits36(string)
                  string.each_char.map { |char| digit36(char) }
                end
              
                # Converts the given character into a base 36 digit:
                # decimal digits (0..9) + uppercase letters (10..35)
                def digit36(char)
                  case char
                  when '0'..'9' then char.ord - '0'.ord
                  when 'A'..'Z' then char.ord - 'A'.ord + 10
                  else -1
                  end
                end
              
                # Converts the given base 36 digit into a character.
                def undigit36(code)
                  case code
                  when  0..9  then ('0'.ord + code).chr
                  when 10..35 then ('A'.ord + (code - 10)).chr
                  else '?'
                  end
                end
              
              end
              
              module Input
                extend self
              
                DOMAIN = ('0'..'9').to_a + ('A'..'Z').to_a
              
                def combinations
                  DOMAIN.combination(4)
                end
              
                def permutations(domain=DOMAIN)
                  if block_given?
                    domain.each do |a|
                      domain.each do |b|
                        domain.each do |c|
                          domain.each do |d|
                            yield [a, b, c, d]
                          end
                        end
                      end
                    end
                  else
                    enum_for __method__, domain
                  end
                end
              
                def best_case
                  permutations DOMAIN[0..15]
                end
              
                def worst_case
                  permutations DOMAIN[16..35]
                end
              
                alias no_case permutations
              
              end
              
              module Comparator
                extend self
              
                # Compares the fitness of all compressors that match the given regexp.
                def compare(inputs, compressor_regexp=//)
                  Compressor.singleton_methods.grep(compressor_regexp).
                    map { |compressor| measure(inputs, compressor) }.
                    sort_by { |measurement| measurement['Probability of timestamp reuse'] }
                end
              
                # Measures the fitness of the given compressor.
                def measure(inputs, compressor)
                  reuse_by_timestamp = measure_reuse(inputs, compressor)
              
                  num_keys = reuse_by_timestamp.count
                  num_values = reuse_by_timestamp.values.reduce(:+).to_i
              
                  actual_reuse = reuse_by_timestamp.select {|k,v| v > 1 }
                  num_keys_reused = actual_reuse.count
                  num_values_reused = actual_reuse.values.reduce(:+).to_i
              
                  {
                    'Compression algorithm'                            => compressor,
                    'Number of strings (domain)'                       => num_values,
                    'Number of timestamps (codomain)'                  => num_keys,
                    'Number of unique timestamps'                      => num_keys - num_keys_reused,
                    'Number of timestamps reused'                      => num_keys_reused,
                    'Probability of timestamp reuse'                   => num_keys_reused / num_keys.to_f,
                    'Number of strings mapped to reused timestamps'    => num_values_reused,
                    'Probability of string mapped to reused timestamp' => num_values_reused / num_values.to_f,
                  }
                end
              
                def measure_reuse(inputs, compressor)
                  reuse_by_timestamp = Hash.new(0)
                  inputs.each do |characters|
                    string = characters.join
                    timestamp = Solver.solution(string, compressor)
                    reuse_by_timestamp[timestamp] += 1
                  end
                  reuse_by_timestamp
                end
              
              end
              
              

              Updates