Text Similarity Score - Problem

Calculate the similarity between two text documents by finding the longest common word sequence (not substring).

Given two documents represented as arrays of words, find the length of the longest sequence of words that appears in both documents in the same order, but not necessarily consecutively.

For example, if document A contains ["the", "quick", "brown", "fox"] and document B contains ["a", "quick", "brown", "cat"], the longest common word sequence is ["quick", "brown"] with length 2.

This is similar to the classic Longest Common Subsequence problem but applied to word sequences instead of character sequences.

Input & Output

Example 1 — Basic Common Words

$ Input: doc1 = ["the", "cat", "sat"], doc2 = ["a", "cat", "ran"]

› Output: 1

💡 Note: The longest common word sequence is ["cat"] with length 1. "cat" is the only word the two documents share.

Example 2 — Multiple Common Words

$ Input: doc1 = ["quick", "brown", "fox"], doc2 = ["the", "quick", "brown", "dog"]

› Output: 2

💡 Note: The longest common word sequence is ["quick", "brown"] with length 2. Both words appear in the same order in both documents.

Example 3 — No Common Words

$ Input: doc1 = ["hello", "world"], doc2 = ["good", "bye"]

› Output: 0

💡 Note: There are no common words between the two documents, so the similarity score is 0.

Constraints

1 ≤ doc1.length, doc2.length ≤ 1000
1 ≤ doc1[i].length, doc2[i].length ≤ 100
All words consist of lowercase English letters

Asked in

Google 45, Amazon 38, Microsoft 32, Facebook 28

The key insight is that the recursive formulation has overlapping subproblems, so dynamic programming avoids recomputing them when finding the longest common word subsequence. The recommended approach is 2D Dynamic Programming, which runs in O(m×n) time and uses O(m×n) space, where m and n are the word counts of doc1 and doc2.

Common Approaches

✓ 2D Dynamic Programming

⏱️ Time: O(m×n) Space: O(m×n)

Build a 2D DP table where dp[i][j] represents the longest common subsequence length for the first i words of doc1 and the first j words of doc2. Fill the table iteratively, row by row, starting from the empty-prefix base cases.

Brute Force Recursion

⏱️ Time: O(2^(m+n)) Space: O(m+n)

For each position in both documents, recursively try matching or skipping words to find the longest common sequence. This explores all possible combinations but recalculates many subproblems.
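The recursion described above can be sketched as follows. This is a minimal illustration, not the page's reference solution: the function name `lcs_brute` and the `const char *` array representation of a document are assumptions made for the example.

```c
#include <string.h>

/* Hypothetical helper: length of the longest common word sequence of
 * doc1[i..m) and doc2[j..n), computed with no caching at all. */
int lcs_brute(const char *const *doc1, int i, int m,
              const char *const *doc2, int j, int n) {
    /* One document is exhausted: nothing left to match. */
    if (i == m || j == n) {
        return 0;
    }
    /* Words match: count this pair and advance in both documents. */
    if (strcmp(doc1[i], doc2[j]) == 0) {
        return 1 + lcs_brute(doc1, i + 1, m, doc2, j + 1, n);
    }
    /* Words differ: skip a word in either document and keep the better result. */
    int skip1 = lcs_brute(doc1, i + 1, m, doc2, j, n);
    int skip2 = lcs_brute(doc1, i, m, doc2, j + 1, n);
    return skip1 > skip2 ? skip1 : skip2;
}
```

A call such as `lcs_brute(doc1, 0, m, doc2, 0, n)` starts at the beginning of both documents. The two recursive calls in the mismatch branch are what cause the exponential blow-up, so this version is only practical for very short inputs.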

Memoized Recursion

⏱️ Time: O(m×n) Space: O(m×n)

Use the same recursive approach but store results of subproblems in a memoization table. This eliminates redundant calculations while maintaining the intuitive recursive structure.
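One way to add the memoization table is sketched below. The names `lcs_memo` and `lcs_memo_rec`, the flat heap-allocated table, and the use of -1 as the "not computed yet" marker are all assumptions made for this example.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical recursive helper: memo[i * n + j] caches the answer for
 * the suffixes doc1[i..m) and doc2[j..n); -1 means "not computed yet". */
static int lcs_memo_rec(const char *const *doc1, int i, int m,
                        const char *const *doc2, int j, int n, int *memo) {
    if (i == m || j == n) {
        return 0;
    }
    int *slot = &memo[(size_t)i * (size_t)n + (size_t)j];
    if (*slot >= 0) {
        return *slot; /* already solved: reuse the stored result */
    }
    int best;
    if (strcmp(doc1[i], doc2[j]) == 0) {
        best = 1 + lcs_memo_rec(doc1, i + 1, m, doc2, j + 1, n, memo);
    } else {
        int skip1 = lcs_memo_rec(doc1, i + 1, m, doc2, j, n, memo);
        int skip2 = lcs_memo_rec(doc1, i, m, doc2, j + 1, n, memo);
        best = skip1 > skip2 ? skip1 : skip2;
    }
    *slot = best;
    return best;
}

/* Hypothetical entry point. Returns -1 if the table cannot be allocated. */
int lcs_memo(const char *const *doc1, int m, const char *const *doc2, int n) {
    if (m <= 0 || n <= 0) {
        return 0;
    }
    size_t cells = (size_t)m * (size_t)n;
    int *memo = malloc(cells * sizeof *memo);
    if (memo == NULL) {
        return -1;
    }
    for (size_t k = 0; k < cells; k++) {
        memo[k] = -1;
    }
    int result = lcs_memo_rec(doc1, 0, m, doc2, 0, n, memo);
    free(memo);
    return result;
}
```

Each of the m×n index pairs is solved at most once, which is where the O(m×n) time bound comes from. The recursion depth is at most m+n.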

2D Dynamic Programming — Algorithm Steps

Create a 2D DP table with dimensions (m+1)×(n+1)
Initialize the base cases: dp[i][0] = 0 and dp[0][j] = 0 (an empty document shares no words)
For each cell, if doc1[i-1] equals doc2[j-1]: dp[i][j] = dp[i-1][j-1] + 1
If the words don't match: dp[i][j] = max(dp[i-1][j], dp[i][j-1])
Return dp[m][n] as the final result

Step-by-Step Walkthrough

Initialize Table

Create (m+1)×(n+1) table with base cases

Fill Cells

For each cell, compare words and fill based on recurrence

Final Answer

Bottom-right cell contains the maximum LCS length
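To make the walkthrough concrete, here is the table for Example 2 (doc1 = ["quick", "brown", "fox"], doc2 = ["the", "quick", "brown", "dog"]), with rows indexed by i and columns by j:

          ""  the  quick  brown  dog
    ""     0    0      0      0    0
    quick  0    0      1      1    1
    brown  0    0      1      2    2
    fox    0    0      1      2    2

The sketch below fills such a table into a caller-provided buffer so that individual cells can be inspected. The function name `fill_lcs_table` and the flat row-major layout are assumptions made for this example.

```c
#include <string.h>

/* Hypothetical helper: fills dp, a row-major (m+1) x (n+1) buffer supplied
 * by the caller, and returns the bottom-right cell dp[m][n]. */
int fill_lcs_table(const char *const *doc1, int m,
                   const char *const *doc2, int n, int *dp) {
    int cols = n + 1;
    /* Base cases: row 0 and column 0 describe an empty document. */
    for (int j = 0; j <= n; j++) {
        dp[j] = 0;
    }
    for (int i = 1; i <= m; i++) {
        dp[i * cols] = 0;
        for (int j = 1; j <= n; j++) {
            if (strcmp(doc1[i - 1], doc2[j - 1]) == 0) {
                /* Match: extend the diagonal neighbour. */
                dp[i * cols + j] = dp[(i - 1) * cols + (j - 1)] + 1;
            } else {
                /* No match: carry over the better of "up" and "left". */
                int up = dp[(i - 1) * cols + j];
                int left = dp[i * cols + (j - 1)];
                dp[i * cols + j] = up > left ? up : left;
            }
        }
    }
    return dp[m * cols + n];
}
```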

Code

solution.c — C

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_WORDS 1000
#define MAX_WORD_LEN 101  // up to 100 letters plus the terminating '\0'

int solution(char doc1[][MAX_WORD_LEN], char doc2[][MAX_WORD_LEN], int len1, int len2) {
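    // static: the table is about 4 MB, too large to place safely on the stack
    static int dp[MAX_WORDS + 1][MAX_WORDS + 1];
    
    // Reset the table; row 0 and column 0 are the empty-document base cases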
    for (int i = 0; i <= len1; i++) {
        for (int j = 0; j <= len2; j++) {
            dp[i][j] = 0;
        }
    }
    
    for (int i = 1; i <= len1; i++) {
        for (int j = 1; j <= len2; j++) {
            if (strcmp(doc1[i - 1], doc2[j - 1]) == 0) {
                dp[i][j] = dp[i - 1][j - 1] + 1;
            } else {
                dp[i][j] = dp[i - 1][j] > dp[i][j - 1] ? dp[i - 1][j] : dp[i][j - 1];
            }
        }
    }
    
    return dp[len1][len2];
}

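int main() {
    // static: keeps the large buffers off the stack
    static char doc1[MAX_WORDS][MAX_WORD_LEN], doc2[MAX_WORDS][MAX_WORD_LEN];
    // Worst case per word: the word itself, two quotes, a comma and a space
    static char line[MAX_WORDS * (MAX_WORD_LEN + 4) + 16];
    // Brackets, commas, quotes and whitespace all separate words, so
    // one-letter words such as "a" are kept instead of being filtered out
    const char *delims = "[,]\" \t\r\n";
    int len1 = 0, len2 = 0;
    
    // Parse first array
    if (fgets(line, sizeof(line), stdin)) {
        for (char *token = strtok(line, delims); token && len1 < MAX_WORDS;
             token = strtok(NULL, delims)) {
            // Bounded copy: never writes past MAX_WORD_LEN bytes
            snprintf(doc1[len1++], MAX_WORD_LEN, "%s", token);
        }
    }
    
    // Parse second array
    if (fgets(line, sizeof(line), stdin)) {
        for (char *token = strtok(line, delims); token && len2 < MAX_WORDS;
             token = strtok(NULL, delims)) {
            snprintf(doc2[len2++], MAX_WORD_LEN, "%s", token);
        }
    }
    
    int result = solution(doc1, doc2, len1, len2);
    printf("%d\n", result);
    
    return 0;
}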

Time & Space Complexity

Time Complexity

⏱️ O(m×n)

The nested loops visit each of the m×n word pairs once. Each visit performs one strcmp, whose cost is bounded by the 100-letter word limit.

✓ Quadratic Growth

Space Complexity

O(m×n)

2D DP table stores results for all (m+1)×(n+1) combinations

⚡ Quadratic Space
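Because each row of the table depends only on the row above it, the length alone can be computed with two rows of n+1 cells. The sketch below is a variant of the solution above, not the page's reference code; the name `lcs_two_rows` and the `const char *` array representation are assumptions made for the example.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant: same recurrence as the 2D table, but only the
 * previous and current rows are kept. Returns -1 on allocation failure. */
int lcs_two_rows(const char *const *doc1, int m,
                 const char *const *doc2, int n) {
    if (m <= 0 || n <= 0) {
        return 0;
    }
    /* calloc zero-fills both rows, which covers the row-0 base case. */
    int *prev = calloc((size_t)n + 1, sizeof *prev);
    int *curr = calloc((size_t)n + 1, sizeof *curr);
    if (prev == NULL || curr == NULL) {
        free(prev);
        free(curr);
        return -1;
    }
    for (int i = 1; i <= m; i++) {
        curr[0] = 0; /* column-0 base case */
        for (int j = 1; j <= n; j++) {
            if (strcmp(doc1[i - 1], doc2[j - 1]) == 0) {
                curr[j] = prev[j - 1] + 1;
            } else {
                curr[j] = prev[j] > curr[j - 1] ? prev[j] : curr[j - 1];
            }
        }
        /* The row just filled becomes "previous" for the next iteration. */
        int *tmp = prev;
        prev = curr;
        curr = tmp;
    }
    int result = prev[n]; /* after the final swap, prev holds the last row */
    free(prev);
    free(curr);
    return result;
}
```

This keeps the O(m×n) running time while using O(n) extra space. The trade-off is that the full table is no longer available, so the matching words themselves cannot be recovered by backtracking.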
