Solution to Codility lesson3-exercise3 Genomic Range Query problem, link to Codility problem and complete solution at the end of the post.

Definitions:

    \[\mathbf{P} = \{x \in \mathbb{N}\ |0\leq x \leq N\land |P| = M\},\mathbf{P}\, is\, a\, multiset.\]

    \[\mathbf{Q} = \{x \in \mathbb{N}\ |0\leq x \leq N\land |Q| = M\},\mathbf{Q}\, is\, a\, multiset.\]

    \[\{N \in \mathbb{N}\ |\ 1 \leq N \leq 100,000\}\]

    \[\{M \in \mathbb{N}\ |\ 1 \leq M\leq 50,000\}\]

    \[p_0\leq q_0\, such\, that\, p_0 \in \mathbf{P}=\{p_0,p_1,p_2,...\}\land q_0 \in \mathbf{Q}=\{q_0,q_1,q_2,...\}\]

S: string of size N, consisting of characters A, C, G and T.
Query: Consists in analyzing the substring S[P[K]] to S[Q[K]] with

    \[0\leq K<N\]

, and finding the minimal factor, such that A has factor 1, C has factor 2, G has factor 3 and T has factor 4.

Problem:
Define a function solution with input SP and Q, such that solution returns an array with the result of each query from 0 to N – 1.

  • Expected worst-case time complexity is O(N + M);
  • Expected worst-case space complexity is O(N), beyond input storage (not counting the storage required for input arguments).

Elements of input arrays can be modified. 

Analysis:

To solve this problem we are going to use, Prefix Sums. For each factor of string S except for T we are going to made a prefix array where S[K] adds 1 to the prefix array of the factor at that position, and o to the others, so when doing the query we have only to check if the sum from P[K] to Q[K] of each prefix array is greater than 1.

To calculate the prefix sum we have to visit every character of S, since  has size N,  our time Complexity until here is O(N)
After doing the prefix sums, we only have to perform the query with the precedence order, factor A, factor C, factor G and factor T, and add the result to the result array.

To perform all queries we have to traverse both arrays P and Q, at the same time, both have size M, which gives us a time complexity of O(M).
To facilitate I defined a function that translates the nucleotide to the corresponding factor.

The total time complexity is O(N) + O(M) = O(N + M).
For the space complexity we have O(N) since we have 3 prefix arrays, depending on the size of the string S.

Complete Solution:

Time Complexity O(N + M)
Space Complexity O(N)

Link to GitHub solution here .
Link to Codility problem here .

/*Comment here*/