Problem

You are given multiple integer sets and you need to merge them into a single set in the fastest possible way.

Solution

The always updated article is always available on Github:sangupta/ps repo

Suppose we have 3 arrays of integer sets and we need to merge them. The fastest solution would be possible in time of O(n1 + n2 + n3) where n1, n2, n3 are the lengths of the three sets respectively. The solution lies in using a bit-vector (also called as bit-set or a bit-array) to represent elements using the index and then iterating over them.

Construct a bit-array
Iterate over the first array and for each integer set the corresponding bit to true
Repeat the above process for remaining arrays
Any duplicate element will result in setting true an already set bit
The resultant bit-array is the merge of all the arrays

public void mergeArrays(int[] array1, int[] array2, int[] array3) {
  BitSet bitSet = new BitSet();

  // start merging
  for(int num : array1) {
    bitSet.setBit(num, true);
  }
  for(int num : array2) {
    bitSet.setBit(num, true);
  }
  for(int num : array3) {
    bitSet.setBit(num, true);
  }

  // output
  int index = -1;
  do {
    index = bitSet.getNextSetBit(index);
    if(index < 0) {
      break;
    }

    System.out.println(index);
  } while(true);
}

The solution above can be extended to as many arrays as are provided in the problem definition. The time to sort will still remain O(N) where N is the sum of total number of elements across all provided arrays.

Optimizations available

One can make use of sparsed-bit-arrays to reduce memory consumption. Refer to [brettwooldridge/SparseBitSet] (https://github.com/brettwooldridge/SparseBitSet) for one such implementation.
If the arrays are really, really huge - an implementation that uses file-based persistence of a bit-array can be used. Refer to one such implementation available in the jerry-core project.

Fastest way to merge multiple integer sets

Posted on 18 September 2016

Problem

Solution

Optimizations available