This video goes into the solution of an Advent Of Code Challenge posted in 2022. It is about finding a 14 character substring that has 14 unique characters. One of the speedups is to represent the characters in the window as bits instead of a hashset or an array.

Why a 32-bit number?

We are dealing with 26 letters, so we need at least 26 bits to keep track if the letter is present in the window or not. Clearly a 16-bit number is too short, 32-bit number is the smallest number that can contain at least 26 bits.

Which bits are used?

As mentioned in the video, each character has a numeric representation.

Easy way to map a character to a specific bit, is calculating the remainder by division by 32 (so called mod 32). In case of the character a, it is mapped to bit 1 as 97 divided by 32 is 3 and remainder 1.

Character Numeric mod 32
a 97 1
b 96 2
y 121 25
z 122 26

From the above table:

  • bit 0 (the bit to the far right) is not used;
  • bits 1 through 26 keep track of the presence of a through z;
  • bits 27 through 31 are not used.

Examples

To make it more clear, here are some examples. Redundant zeros on the left are not being shown, so the bit representation in the table can have less than 32 characters.

Bit representation numeric representation characters found
10 2 a is found
100 4 b is found
100000000000000000000000000 67108864 z is found
110 6 a and b are found
100000000000000000000000010 67108866 a and z are found
100000000000000000000000110 67108870 a, b and z are found

High level approach

Let’s get down at the problem at hand. We have a string of 14 characters and we need to check if all characters are unique.

  • Initialize the state variable to 0 (so no character have been found so far);
  • for each character c in the string
    • compute the modulo 32 of c. This will be the position of the bit in the state we need to check - let’s call it modBit;
    • generate the number corresponding to modBit by shifting 1 to the left for modBit positions - let’s call it modValue;
    • We can now check if we already encountered c. Let’s do the OR operation on modValue and state and save it in newState;
      • if newState is different than state, it means we have flipped a bit to 1. So we haven’t encountered c before
      • if newState is equal to state, it means we have not flipped a bit to 1. This means the bit on position modBit was already 1, meaning we already encountered c

Walkthrough

Let’s test the string azb for uniqueness of characters. Each row in the table is a step of the high level approach.

c modBit modValue state newState state != newState
a 1 2 0 2 true
z 26 67108864 2 67108866 true
b 2 4 67108866 67108870 true

So at the end of the string, we had only unique characters.

But what if we check the string azbz?

c modBit modValue state newState state != newState
a 1 2 0 2 true
z 26 67108864 2 67108866 true
b 2 4 67108866 67108870 true
z 26 67108864 67108870 67108870 false

When checking the second z, the state value did not change meaning z was already encountered, hence the characters in the string are not unique.