Bits-per-Byte measures **how many bits** a compression program needs to know **to guess the next symbol** on average.
For example, **if** the compression program is **perfect**, then the **next symbol is obvious** to it, and it needs **0 bits**, so \( bpb = 0 \).
Or, if it is the worst possible, it has to be given the exact next symbol from the vocabulary, so \( bpb = \log_2(\mathrm{vocabularySize}) \).
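The two extremes can be checked numerically. The sketch below is only an illustration: it assumes the "worst" program is one that spreads its guess uniformly over the vocabulary, and it assumes a vocabulary of 256 symbols (one per byte value). The helper name `bits_for_symbol` is hypothetical.

```python
import math

def bits_for_symbol(predicted_probability: float) -> float:
    """Ideal code length, in bits, for a symbol that was assigned this probability."""
    return math.log2(1.0 / predicted_probability)

# Assumption for this sketch: one symbol per possible byte value.
vocabulary_size = 256

# Perfect program: it is certain about the next symbol, so no bits are needed.
best_case = bits_for_symbol(1.0)

# Uninformed program: every symbol is equally likely, so log2(vocabulary_size) bits are needed.
worst_case = bits_for_symbol(1.0 / vocabulary_size)

print(f"best case:  {best_case} bits")
print(f"worst case: {worst_case} bits")
```

With a byte-sized vocabulary the upper end is 8 bits, which is the cost of storing each byte uncompressed.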

## How does BPB relate to compression ratio and cross-entropy?

Bits-per-Byte (BPB) and Bits-per-Character (BPC) are metrics used in compression and language modeling, and both are directly related to compression ratio and cross-entropy. BPC equals BPB for extended ASCII text, where every character takes one byte, and the log2 cross-entropy loss of a character-level model equals BPC.

- compression ratio is defined as \( \mathrm{cmpRatio} = \mathrm{unCompressedBytes} / \mathrm{compressedBytes} \)
- Bits-per-byte is defined as \( \mathrm{compressedBits} / \mathrm{unCompressedBytes} \)
- Bits-per-byte (bpb) metric is 8 times the inverse compression ratio: \( bpb = 8 / \mathrm{cmpRatio} \).
- Bits-per-character (bpc) metric equals bits-per-byte (bpb) for extended ASCII characters, because each such character is stored in one byte.
- Cross-entropy loss (log2) for a character-level language model averaged over a dataset equals bpc.
- Perplexity is 2 raised to the power of the cross-entropy (log2): \( PP = 2^{\mathrm{crossEntropy}} \)
- Gzip compresses enwik8 to 2.92 bpb; Morse code needs approximately 10.8 bpc
- SRU++ model achieves 1.02 bpc, which corresponds to a compression ratio of approximately 8
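The two definitions at the top of the list can be tried out with Python's standard `gzip` module. This is a minimal sketch on a made-up repetitive sample, not a benchmark; the helper names and the sample text are assumptions chosen for illustration, and the numbers it prints say nothing about enwik8.

```python
import gzip

def compression_ratio(uncompressed: bytes, compressed: bytes) -> float:
    """Uncompressed size divided by compressed size, both in bytes."""
    return len(uncompressed) / len(compressed)

def bits_per_byte(uncompressed: bytes, compressed: bytes) -> float:
    """Compressed size in bits divided by uncompressed size in bytes."""
    return 8 * len(compressed) / len(uncompressed)

# Hypothetical sample: highly repetitive text, so gzip does well on it.
sample = b"the quick brown fox jumps over the lazy dog. " * 200
compressed = gzip.compress(sample)

ratio = compression_ratio(sample, compressed)
bpb = bits_per_byte(sample, compressed)

print(f"compression ratio: {ratio:.2f}")
print(f"bits per byte:     {bpb:.3f}")
print(f"8 / ratio:         {8 / ratio:.3f}")  # same value as bpb
```

Because compressed bits are 8 times compressed bytes, the two metrics always satisfy bpb = 8 / ratio. Applied to the figures in the list, 2.92 bpb corresponds to a ratio of about 2.74, and 1.02 bpc on one-byte characters to a ratio of about 7.8.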

In short, BPC equals BPB for extended ASCII characters, and the cross-entropy loss of a character-level model, computed with log2, equals BPC.
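As a rough illustration of the cross-entropy and perplexity relations, the sketch below averages the log2 loss over a few characters. The probabilities are invented for the example, and the function names are hypothetical. The `nats_to_bits` helper is included on the assumption that the loss at hand was computed with the natural logarithm, in which case dividing by ln 2 converts it to bits.

```python
import math

def cross_entropy_bits(true_char_probabilities: list[float]) -> float:
    """Average log2 loss, given the probability the model gave to each true character."""
    total_bits = sum(math.log2(1.0 / p) for p in true_char_probabilities)
    return total_bits / len(true_char_probabilities)

def nats_to_bits(loss_in_nats: float) -> float:
    """Convert a natural-log cross-entropy loss to bits."""
    return loss_in_nats / math.log(2)

# Made-up probabilities that a character-level model assigned to the true next character.
probabilities = [0.5, 0.25, 0.5, 0.25]

bpc = cross_entropy_bits(probabilities)  # (1 + 2 + 1 + 2) / 4 bits
perplexity = 2 ** bpc

print(f"bits per character: {bpc}")
print(f"perplexity:         {perplexity:.3f}")
```

A perplexity of 2 raised to 1.5 means the model is, on average, about as uncertain as if it were choosing uniformly among roughly 2.8 characters.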

## Neural Data Compression

Data compression relies on the ability to predict the next symbol. Read more on neural data compression and its applications in machine learning here.