LZ77 Compression - Problem

Implement LZ77 compression and decompression algorithms. LZ77 is a lossless data compression algorithm that uses a sliding window to find repeated patterns in the input data.

For compression, scan through the input text and for each position, find the longest match within a sliding window of previously seen characters. Output a sequence of tokens in the format (offset, length, next_char) where:

offset: Distance back to the start of the match (0 if no match)
length: Length of the matched substring (0 if no match)
next_char: The character following the match

For decompression, process each token and reconstruct the original text by copying characters from the previously decoded output based on the offset and length, then append the next character.

Use a sliding window size of 12 characters for the search buffer.

Input & Output

Example 1 — Basic Compression

$ Input: text = "ABABC", operation = "compress"

› Output: [[0,0,"A"],[0,0,"B"],[2,2,"C"]]

💡 Note: First A: no previous match → (0,0,A). First B: no match → (0,0,B). Second AB: matches AB at positions 0-1, so offset=2 and length=2, and the character after the match is C → (2,2,C)

Example 2 — Decompression

$ Input: text = [[0,0,"A"],[0,0,"B"],[2,2,"C"]], operation = "decompress"

› Output: ABABC

💡 Note: Token (0,0,A): copy nothing, append A → "A". Token (0,0,B): append B → "AB". Token (2,2,C): copy 2 chars starting 2 positions back (AB), append C → "ABABC"
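
The token-by-token procedure described in the note above can be sketched in C as follows. This is a minimal illustration, not the page's reference solution: the function name `lz77_decompress`, the `cap` parameter, and the convention that a `'\0'` next character means "nothing to append" are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Same layout as the (offset, length, next_char) triples in the problem. */
typedef struct {
    int offset;
    int length;
    char nextChar;
} Token;

/*
 * Hypothetical helper: rebuilds the text from `count` tokens into `out`,
 * whose capacity `cap` includes the terminating NUL.
 * Returns the decoded length, or -1 if a token is invalid or `out` is too small.
 */
int lz77_decompress(const Token *tokens, int count, char *out, int cap) {
    assert(tokens != NULL && out != NULL);
    int n = 0;  /* number of characters decoded so far */

    for (int t = 0; t < count; t++) {
        int offset = tokens[t].offset;
        int length = tokens[t].length;

        /* A copy must point inside the text decoded so far. */
        if (offset < 0 || length < 0 || offset > n) return -1;
        if (length > 0 && offset == 0) return -1;
        /* Room for the copied run, the next character, and the NUL. */
        if (n + length + 1 >= cap) return -1;

        /* Copy one character at a time so a run may overlap its own output. */
        for (int k = 0; k < length; k++) {
            out[n] = out[n - offset];
            n++;
        }

        /* Assumption: '\0' means the token carries no next character. */
        if (tokens[t].nextChar != '\0') {
            out[n++] = tokens[t].nextChar;
        }
    }
    out[n] = '\0';
    return n;
}
```

Copying character by character, rather than with a single block copy, is what lets a token whose length exceeds its offset decode correctly, because each copied character becomes available as a source for the next one.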

Example 3 — No Matches Found

$ Input: text = "ABCD", operation = "compress"

› Output: [[0,0,"A"],[0,0,"B"],[0,0,"C"],[0,0,"D"]]

💡 Note: Each character appears for the first time, so no matches are found. Each token has offset=0, length=0, followed by the literal character.

Constraints

1 ≤ text.length ≤ 1000 (for compression)
1 ≤ tokens.length ≤ 1000 (for decompression)
Window size is fixed at 12 characters
Characters are printable ASCII

Asked in

Google 12, Amazon 8, Microsoft 6, Adobe 4

The key insight is using a sliding window to balance compression efficiency with memory usage. Restricting the search to the most recent w characters bounds the work done at each position, so the compressor no longer rescans the whole history, while still finding most nearby repeats. A hash table can speed this up further by jumping directly to candidate match positions instead of scanning the window linearly. Time: O(n×w), Space: O(w), where w = 12 is the window size.

Common Approaches

✓ Brute Force Pattern Matching

⏱️ Time: O(n³) Space: O(n)

For each character position, scan through all previous characters to find the longest matching substring. Use the entire history as the search buffer without any window limitation.

Hash Table Optimization

⏱️ Time: O(n×w) Space: O(w)

Build a hash table that maps character sequences to their positions within the sliding window. This allows for faster lookup of potential matches instead of linear scanning.
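
One possible shape for such an index is sketched below. It is an illustration under stated assumptions, not the page's reference solution: it keys the table on single characters (rather than longer sequences) so that it produces the same tokens as a windowed linear scan, and the names `lz77_compress_indexed`, `head`, `prev`, `WINDOW` and `NO_POS` are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define WINDOW 12      /* search-buffer size from the problem statement */
#define NO_POS (-1)    /* marks "no earlier position" in the index */

typedef struct {
    int offset;
    int length;
    char nextChar;
} Token;

/*
 * Hypothetical helper.
 * head[c]          : most recent position at which byte c was seen
 * prev[p % WINDOW] : previous position holding the same byte as position p
 * Following head -> prev -> prev ... visits only positions that start with
 * text[i], newest first, and stops once a position leaves the window.
 * The caller must provide room for strlen(text) tokens in `out`.
 */
int lz77_compress_indexed(const char *text, Token *out) {
    assert(text != NULL && out != NULL);
    int len = (int)strlen(text);
    int head[256];
    int prev[WINDOW];
    int count = 0;

    for (int c = 0; c < 256; c++) head[c] = NO_POS;

    int i = 0;
    while (i < len) {
        int bestOffset = 0, bestLength = 0;
        int j = head[(unsigned char)text[i]];

        while (j != NO_POS && j >= i - WINDOW) {
            int length = 0;
            /* j + length < i keeps the match inside text before position i. */
            while (i + length < len && j + length < i &&
                   text[j + length] == text[i + length]) {
                length++;
            }
            /* >= lets the oldest of several equally long candidates win,
               like a left-to-right scan that uses a strict >. */
            if (length >= bestLength) {
                bestLength = length;
                bestOffset = i - j;
            }
            j = prev[j % WINDOW];
        }

        out[count].offset = bestOffset;
        out[count].length = bestLength;
        out[count].nextChar = (i + bestLength < len) ? text[i + bestLength] : '\0';
        count++;

        /* Index every position consumed by this token. */
        int end = i + bestLength;
        for (int p = i; p <= end && p < len; p++) {
            unsigned char c = (unsigned char)text[p];
            prev[p % WINDOW] = head[c];
            head[c] = p;
        }
        i = end + 1;
    }
    return count;
}
```

The circular `prev` array keeps the index at a fixed size (WINDOW slots plus one entry per byte value), since a slot is only overwritten by a position that is a full window later.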

Sliding Window Optimization

⏱️ Time: O(n×w) Space: O(1)

Use a sliding window of fixed size (12 characters) to limit the search space. Only look for matches within the most recent window, which improves both time and space efficiency.
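
As a rough sketch of this approach, the search loop can simply start at most 12 positions behind the current one. The function name `lz77_compress_windowed` and the constant `WINDOW_SIZE` are assumptions for this example. Like the brute-force code on this page, it does not let a match run past the current position, and it uses `'\0'` as the next character when a match reaches the end of the input.

```c
#include <assert.h>
#include <string.h>

#define WINDOW_SIZE 12  /* search-buffer size from the problem statement */

typedef struct {
    int offset;
    int length;
    char nextChar;
} Token;

/*
 * Hypothetical helper: linear match search restricted to the last
 * WINDOW_SIZE characters. Writes tokens to `out` (the caller provides room
 * for strlen(text) tokens) and returns the number of tokens written.
 */
int lz77_compress_windowed(const char *text, Token *out) {
    assert(text != NULL && out != NULL);
    int len = (int)strlen(text);
    int count = 0;
    int i = 0;

    while (i < len) {
        int bestOffset = 0, bestLength = 0;
        /* Oldest position still inside the window. */
        int start = (i > WINDOW_SIZE) ? i - WINDOW_SIZE : 0;

        for (int j = start; j < i; j++) {
            int length = 0;
            /* j + length < i keeps the match inside text before position i. */
            while (i + length < len && j + length < i &&
                   text[j + length] == text[i + length]) {
                length++;
            }
            if (length > bestLength) {
                bestLength = length;
                bestOffset = i - j;
            }
        }

        out[count].offset = bestOffset;
        out[count].length = bestLength;
        out[count].nextChar = (i + bestLength < len) ? text[i + bestLength] : '\0';
        count++;
        i += bestLength + 1;  /* skip the matched run and the next character */
    }
    return count;
}
```

Because offsets can never exceed WINDOW_SIZE, a repeat that lies further back is emitted as a literal token rather than as a match.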

Brute Force Pattern Matching — Algorithm Steps

For each position i, check all positions j < i for matches
Find the longest match starting at any position j
Output token (i-j, match_length, next_char)
Continue until end of input

Step-by-Step Walkthrough

Current Position

At position i, we need to find the longest match

Check All Previous

Compare with every position j < i

Find Best Match

Output token with longest match found

Code

solution.c — C

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    int offset;
    int length;
    char nextChar;
} Token;

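// Brute-force LZ77: writes one token per step into result and stores the
// token count in *count. The caller must provide room for strlen(text) tokens.
void compress(const char* text, Token* result, int* count) {
    int len = (int)strlen(text);
    int i = 0;
    *count = 0;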
    
    while (i < len) {
        int bestOffset = 0;
        int bestLength = 0;
        
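        // Brute force: try every earlier start position j (no window limit here)
        for (int j = 0; j < i; j++) {
            int length = 0;
            // j + length < i keeps the match inside the text before position i
            while (i + length < len && j + length < i &&
                   text[j + length] == text[i + length]) {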
                length++;
            }
            if (length > bestLength) {
                bestLength = length;
                bestOffset = i - j;
            }
        }
        
        // '\0' marks "no next character" when the match runs to the end of the input
        char nextChar = (i + bestLength < len) ? text[i + bestLength] : '\0';
        result[*count] = (Token){bestOffset, bestLength, nextChar};
        (*count)++;
        i += bestLength + 1;
    }
}

int main() {
    char text[1002];  // up to 1000 characters, plus '\n' and '\0'
    char operation[20];
    
    fgets(text, sizeof(text), stdin);
    text[strcspn(text, "\n")] = 0;
    
    fgets(operation, sizeof(operation), stdin);
    operation[strcspn(operation, "\n")] = 0;
    
    if (strcmp(operation, "compress") == 0) {
        Token tokens[1000];  // worst case: one token per input character
        int count;
        compress(text, tokens, &count);
        
        printf("[");
        for (int i = 0; i < count; i++) {
            if (i > 0) printf(",");
            printf("[%d,%d,\"%c\"]", tokens[i].offset, tokens[i].length, 
                   tokens[i].nextChar ? tokens[i].nextChar : ' ');
        }
        printf("]\n");
    }
    
    return 0;
}

Time & Space Complexity

Time Complexity

⏱️

O(n³)

For each of the n positions, we try up to n earlier start positions, and each comparison can run for up to n characters, which gives the O(n³) upper bound

⚠ Cubic Growth

Space Complexity

O(n)

Store the entire input text and output tokens

⚡ Linear Space
