Sum of Digits

Here’s an interesting little puzzle/coding exercise. I use it as one of the examples in a presentation I give about how to prepare for a white board coding interview. It’s a fun question, with plenty of depth.

Given a number N, calculate the sum of all the digits in 1–N

It’s easy to misread/misinterpret this question. People quick off the blocks interpret the question as “sum 1–N”, which is a much simpler problem. An inefficient way of performing this is a simple loop, but most people remember the apocryphal story of the young mathematician Carl Friedrich Gauss.

According to the legend, in the late 1700’s, Gauss was asked, by his teacher, to sum up all the numbers 1–100 inclusive. His teacher thought that this busy work would keep him occupied for a long time, but Gauss, correctly, gave the answer (5,050) within a couple of seconds. Gauss realized these numbers could be paired up 1+100, 2+99, 3+98 … with each of the fifty pairs having the same sum of 101, for the answer 50×101=5,050.

Another way to think of this is that these are triangular numbers, and we can see how we can calculate the answer by copying, rotating, and making a rectangle N×(N+1), half of which is the solution.

The sum of the numbers 1–N is N(N+1)/2

This is NOT the answer I was looking for!

Sum of Digits

If you read the question carefully, I’m asking for the sum of the digits of the numbers 1–N.

As I advise the people I tutor, the first thing to do is draw a couple of examples on the board to make sure you understand the question, and what the expected outcome is. Here’s a couple of examples for N=5, and N=12.

Can you see the difference? We’re looking for the sum of digits. When we get past nine, the sum of digits for ‘10’ is just ‘1’. The answer for N=12 should be 51, not 78.

(For the record, when I’m performing a coding interview, my goal is to find out if you can code, not if you are good at puzzles. I want to understand that you can break a problem down, select a good algorithm, turn this into functioning (tested) code, understand the time and space complexity of your solution, appreciate any limitations, and understand possible optimizations. If you fell into the ‘trap’ above, it would not be a ‘ding’ against you, we’d have a chuckle, I’d clarify the question, and we’d move on. Interviews are stressful enough as it is. A good interview question is one in which it is clear ‘what’ it is you need to do, and you need to be able to code the ‘how’.)

Naïve Solution

Now that we know the true question, we can code a solution. A first approach you might consider is a brute force approach, looping through each number 1–N, summing the digits. We can neaten the code by breaking this up and using a helper function. Here’s the main function:


int Sum (int n) {
 int res = 0; 
 for (int x=1; x<=n; x++)
   {res += SumOfDigits(x);}
 return res;
}

Now let’s flesh out this helper function:


int SumOfDigits(int x) {
 int sum = 0;
 while (x != 0)
 {
  sum += x % 10;
  x /= 10;
 }
 return sum;
}

The helper function uses modula arithmetic to peel-off the last digit each time, then shift it down one place. Nice!

This is certainly a clean solution, but can we do better? (I certainly hope so, because the time complexity of this solution is horrendous!)

Optimizing

If we think about this for a little while we can see that, if we're dealing with bigger values, that the lower values never change. The sum of the first nine numbers is 45, and this is constant however large N gets.

Sum(9) = 1+2+3+4+5+6+7+8+9 = 45

If we need to calculate N=12, for instance, we know Sum(9)=45, so all we need to do is add the digits for '10', '11', '12' to 45.

Sum(13) = (1+2+3+4+5+6+7+8+9) + '10' + '11' + '12'

Can you see a pattern emerging? If we need to calculate the teens (anything '1?'), this is just the same as the single numbers 0-? with an extra count for each leading digit (plus the 45), so for 19, 29, 39 … the answers are:

Sum(19) = 45 + (10 + 45)
Sum(29) = 45 + (10 + 45) + (20 + 45)
Sum(39) = 45 + (10 + 45) + (20 + 45) + (30 + 45)

Taking this further:

Sum(99) = 45 + (10 + 45) + (20 + 45) … (90 + 45)
Sum(99) = (45 × 10) + (10 + 20 + 30 … 90)
Sum(99) = (Sum(9) × 10) + (10 + 20 + 30 … 90)

We can see that the leading digit (shown in red) below, follows the same pattern as 1-10, shifted down a decade:

1–10	45 + 0 × 10
11–19	45 + 1 × 10 + ↑
21–29	45 + 2 × 10 + ↑
31–39	45 + 3 × 10 + ↑
…	45 + … × 10 + ↑
91–99	45 + 9 × 10 + ↑

This relationship carries on for each power of ten:

Sum(999) = (Sum(99) × 10) + (45 × 10)
Sum(9999) = (Sum(999) × 10) + (45 × 100)
Sum(99999) = (Sum(9999) × 10) + (45 × 1000)

With a little bit of thinking we can come up with the formula for the sum of the digits for these powers of ten. When we move up decade we have the sum of all the leading digits, multiplied by ten. If d represents the number of power of 10 (d=1, d=2, d=3 …), then:

Sum((10^d)- 1) = (Sum( (10^(d-1)) - 1 ) × 10) + (45 × (10^(d-1))

d=0, Sum((10^d)-1) = Sum(0) = 0
d=1, Sum((10^d)-1) = Sum(9) = 45
d=2, Sum((10^d)-1) = Sum(99) = 900
d=3, Sum((10^d)-1) = Sum(999) = 13,500
d=4, Sum((10^d)-1) = Sum(9999) = 2,250,000

Putting it together, to calculate Sum(n):

Find the number of significant digits for n, and subtract 1. Call this d. (e.g. for 678, d=2).
Using the formula above we can work out the Sum((10^d)-1) e.g. Sum(99)=900
Find the most significant digit (in this case 6), call this msd. We know Sum(99), and to get sum in range 100-199 we take 1×100 and add Sum(99), to get 200-299 we take 2×100 and add Sum(99) … to get to Sum(599) we can say this is (6 × Sum(99)) + (1+2+3+4+5) × 100. The latter part can be written as msd × (msd-1)/2. This has calculated the sum of the digits Sum(599). To this we also add (78+1) lots of six because the digit 6 occurs in all numbers from 600 through to 678. This is the contribution to the final answer based on the first digit.
If we now divide by 10, we can repeat the same calculation with what is left, in this case 78. This is also added on.
And so on through the digits.

Updated Function

In the real world, in the interests of performance, we'd generate a look-up table of the precomputed values in advance for Sum((10^d)-1) and use these values. Imagine we had these stored in an array a[d]. The updated helper function would look something like this:


int SumOfDigits(int x) {
 
 if (x<10) return x*(x+1)/2;
 
 int d = log10(x); //number of digits less one
 int p = pow(10,d);
 int msd = x/p;
 
 return (msd * a[d]) + (msd * (msd-1)/2 * p) + (msd * (1+ x%p) + SumOfDigits(x%p);
}

The first line short-circuits and calculates when there is just a single digit. This uses the triangular number formula we described at the start of this posting. If there is more than one digit, we take the base10 log, and the integer part of this is one less than the number of digits. Raising this to the power 10, and using this to divide the input, we get the most significant digit.

From above, we have shown how to calculate the contribution to the digit sum because of the most significant digit power, then we recurse down to look at the contribution from dividing from what's left.

Generating Array

All we need to do now is generate the array a[d]. It's possible to generate this in advacnce based on the maximum size input we expect to see.


int PopulateArray(int MAXSIZE) {

 int d = log10(MAXSIZE); //number of digits less one
 int p = pow(10,d);
 int a[] = new int[d+1]; 
 a[0]=0; a[1]=45;
 for (int i = 2; i <= d; i++) { a[i] = a[i-1]*10 + 45*pow(10,i-1); }

}

It is possible to remove the need for the array as we can come up with an explicit formula for each term. As with all things, this is a compromise. Pre-computing the array takes a little time, and a little memory, but the computations only need to get done once. With pre-computing, memory access to the array will be quick. Without using the array, it takes less memory (and there is a, marginally, quicker start-up time). Execution time will be slightly slower each time a value is needed as there is one more calculation.

Sum((10^d)- 1) = (Sum( (10^(d-1)) - 1 ) × 10) + (45 × (10^(d-1)) - 1 )) = 45 × (10^(d-1)) × d

Inserting this into our function we get this final result:


int SumOfDigits(int x) {
 
 if (x<10) return x*(x+1)/2;
 
 int d = log10(x); //number of digits less one
 int p = pow(10,d);
 int msd = x/p;
 
 return (msd * 45 * d * pow(10,d-1)) + (msd * (msd-1)/2 * p) + (msd * (1+ x%p) + SumOfDigits(x%p);
}