Menu

String Compression and Concatenation Performance

Here is a whiteboarding problem that may come up in interviews.

You are given a string. Implement a function to perform basic string compression using the counts of repeated characters. For example, the string aabcccccaaa would become a2blc5a3. If the "compressed" string would not become smaller than the original string, the function should return the original string. For example, the string aabc would remain aabc because a2b1c1 is larger than the original.

This is a pretty simple problem to solve. Here is a basic implementation:

var compressedString = function(str) {  
  if (str.length < 3) return str;

  var compressedStr = '';
  var prevChar = str.charAt(0);
  var count = 1;

  for (var i = 1; i < str.length; i++) {
    if (str[i] !== prevChar) {
      compressedStr += prevChar + count;
      prevChar = str[i];
      count = 1;
    } else {
      count++;
    }
  }
  compressedStr += prevChar + count;

  return compressedStr.length >= str.length ? str : compressedStr;
};

That was my initial stab at the solution. I was then curious to see how others went about it. While I was doing research, I learned something new. The string concatenation happening within the for-loop using compressedStr += prevChar + count can potentially operate in O(n²) time, depending on the environment; some browsers have optimizations to make string concatenation more performant.

In JavaScript, strings are immutable—they cannot be changed. As a result, doing concatenation using += will create a copy of the string by iterating through each character one at a time (O(n)), and then appending the new string to the end. So because the string concatenation happens in linear time, and you have to iterate up to n characters with the for-loop, that results to O(n²).

If you have a large string and time is important, then a data structure like a StringBuffer would give you O(n) time. In a nutshell, a StringBuffer is an array that progressively collects each character of a string, and then at the end, it joins all the characters as a string. Thus, the concatenation only happens once at the very end instead of each time a new character is added. That does mean you end up with O(n) space though.

But wait, StringBuffer doesn't exist in JavaScript! Well, we can create a basic version of a StringBuffer. Check it:

var compressedString = function(str) {  
  if (str.length < 3) return str;

  var compressedStrArray = [];
  var index = 0;
  var char = str.charAt(0);
  var count = 1;

  for (var i = 1; i < str.length; i++) {
    if (str[i] !== char) {
      index = setChar(compressedStrArray, index, char, count);
      char = str[i];
      count = 1;
    } else {
      count++;
    }
  }
  setChar(compressedStrArray, index, char, count);
  var compressedStr = compressedStrArray.join('');

  return compressedStr.length >= str.length ? str : compressedStr;
};

var setChar = function(array, index, char, count) {  
  array[index++] = char;
  array[index++] = count;
  return index;
};

One optimization included is using the helper function, setChar, to append characters to the array representing the StringBuffer. The Array.prototype.push method could have been used to append new characters to the StringBuffer, but referencing an index is faster.

So is it still "safe" to use += concatenation? Absolutely, but you need to consider the environment where your code is running. Recent browsers will have optimizations for +=. Also, if you are doing a one-time concatenation and not concatenating within a loop, feel free to use +=.