This video goes into the solution of an Advent Of Code Challenge posted in 2022. It is about finding a 14 character substring that has 14 unique characters. One of the speedups is to represent the characters in the window as bits instead of a hashset or an array.

## Why a 32-bit number?

We are dealing with 26 letters, so we need at least 26 bits to keep track if the letter is present in the window or not. Clearly a 16-bit number is too short, 32-bit number is the smallest number that can contain at least 26 bits.

## Which bits are used?

As mentioned in the video, each character has a numeric representation.

Easy way to map a character to a specific bit, is calculating the remainder by division by 32 (so called mod 32). In case of the character `a`

, it is mapped to bit 1 as 97 divided by 32 is 3 and **remainder 1**.

Character | Numeric | mod 32 |
---|---|---|

a | 97 | 1 |

b | 96 | 2 |

… | … | … |

y | 121 | 25 |

z | 122 | 26 |

From the above table:

- bit 0 (the bit to the far right) is not used;
- bits 1 through 26 keep track of the presence of
`a`

through`z`

; - bits 27 through 31 are not used.

## Examples

To make it more clear, here are some examples. Redundant zeros on the left are not being shown, so the bit representation in the table can have less than 32 characters.

Bit representation | numeric representation | characters found |
---|---|---|

10 | 2 | a is found |

100 | 4 | b is found |

100000000000000000000000000 | 67108864 | z is found |

110 | 6 | a and b are found |

100000000000000000000000010 | 67108866 | a and z are found |

100000000000000000000000110 | 67108870 | a, b and z are found |

## High level approach

Let’s get down at the problem at hand. We have a string of 14 characters and we need to check if all characters are unique.

- Initialize the
`state`

variable to 0 (so no character have been found so far); - for each character
`c`

in the string- compute the modulo 32 of
`c`

. This will be the position of the bit in the state we need to check - let’s call it`modBit`

; - generate the number corresponding to
`modBit`

by shifting 1 to the left for`modBit`

positions - let’s call it`modValue`

; - We can now check if we already encountered
`c`

. Let’s do the`OR`

operation on`modValue`

and`state`

and save it in`newState`

;- if
`newState`

is different than`state`

, it means we have flipped a bit to 1. So we haven’t encountered`c`

before - if
`newState`

is equal to`state`

, it means we have**not**flipped a bit to 1. This means the bit on position`modBit`

was already 1, meaning we already encountered`c`

- if

- compute the modulo 32 of

## Walkthrough

Let’s test the string `azb`

for uniqueness of characters. Each row in the table is a step of the high level approach.

c | modBit | modValue | state | newState | state != newState |
---|---|---|---|---|---|

a | 1 | 2 | 0 | 2 | true |

z | 26 | 67108864 | 2 | 67108866 | true |

b | 2 | 4 | 67108866 | 67108870 | true |

So at the end of the string, we had only unique characters.

But what if we check the string `azbz`

?

c | modBit | modValue | state | newState | state != newState |
---|---|---|---|---|---|

a | 1 | 2 | 0 | 2 | true |

z | 26 | 67108864 | 2 | 67108866 | true |

b | 2 | 4 | 67108866 | 67108870 | true |

z | 26 | 67108864 | 67108870 | 67108870 | false |

When checking the second `z`

, the `state`

value did not change meaning `z`

was already encountered, hence the characters in the string are not unique.