# Sumproduct

"Whenever you find yourself on the side of the majority, it is time to pause and reflect." [Mark Twain]

SUMPRODUCT is a very powerful function. You can conditionally count or sum large ranges of data. On the other hand, if you need to maintain statistics on ALL elements of a larger list, the manual maintenance effort and sometimes the runtime of a recalculation can be too high.

SUMPRODUCT is more flexible than COUNTIF or SUMIF which support ONE condition only.

Example: If you want to count all cells in A1:A100 which show a positive numerical value, e.g. which are greater zero: =COUNTIF(A1:A100,”>0”) can be replaced by =SUMPRODUCT(--(A1:A100>0))

And if you need to count how many numbers of A1:A100 are greater than zero AND where the string in column B is "LONDON", use =SUMPRODUCT(--(A1:A100>0),--(B1:B100="LONDON")) This cannot be done with COUNTIF.

Instead of using SUMIF if you can sum up conditionally with SUMPRODUCT.

Example: If you want to sum up all cells in B1:B100 where the corresponding cells in A1:A100 show a positive numerical value: =SUMIF(A1:A100,”>0”,B1:B100) can be replaced by =SUMPRODUCT(--(A1:A100>0),B1:B100)

Please note that the double unary minus translates the Boolean values True and False into the numbers 1 and 0. You could also have used 0+ or 1* to do this but I think the double unary minus canonically indicates a condition and if that's missing you know that you look at the values which you need to sum up.

Now the fun starts because you can combine conditions which COUNTIF and SUMIF cannot deal with (since there are more than one of them).

Example: [AND Condition] If you want to add all numbers in C1:C100 where A1:A100 are negative and where B1:B100 show a "YES": =SUMPRODUCT(--(A1:A100<0),--(B1:B100="YES"),C1:C100) [OR Condition] If you want to add all numbers in C1:C100 where A1:A100 are negative or where B1:B100 show a "YES": =SUMPRODUCT(SIGN((A1:A100<0)+(B1:B100=”YES”)),C1:C100) [Please notice that the SIGN function reduces possible overlapping OR criteria - which would sum up to more than one - to one exactly! I suggest to always wrap OR criteria checks into a SIGN() function. A simple sum is naive and outright dangerous because a less skilled person could amend non-overlapping XOR criteria to overlapping OR ones.]

More complex examples:

Please note that I normally prefer SUMPRODUCT(--(),--()) to SUMPRODUCT(()*()) because this separates the conditions quite clearly, but for multi-column ranges such as in cell B17 above you can only use the *-approach.

SUMPRODUCT offers you a nice and complex functionality of an array function. You can count and sum up attributes/properties of cell ranges and put the result into a cell. If you need to do this for all appearing characteristics you face some difficulties, though. You need to prepare the set of unique characteristics for use with SUMPRODUCT first. If you do this manually you end up with manual maintenance if your input changes. If you want to do this with worksheet functions you increase the complexity of your worksheet.

Example: If you need to count all different strings in a column and to list them together with their number of occurrences, for example, you have two reasonable options for a worksheet function approach with SUMPRODUCT:

a) To manually write down all different entries and to count these with COUNTIF or SUMPRODUCT. You would have to maintain this list manually then. b) A feed of unique values with worksheet functions increases the complexity of the worksheet (please see here).

For cases like this one I suggest to use my user-defined-function (UDF) Pfreq instead of SUMPRODUCT or COUNTIF, if you need to count. If you want to sum up specified cells for all occurring corresponding cells/combinations, use sbSfreq, please.

A comparison of three different approaches to form a statistic on about 30,000 records of data:

We have a data table with 30,000 rows of data: Each row consists of a department key, a date, cost and identifier. Now we want to show for April 2010 the accumulated costs per department (leftmost column) and per identifier (topmost row).

1. Worksheet formulas

Sheet "Worksheet Formula" is built with SUMPRODUCT worksheet formulas. You need to specify all different departments and Id's correctly and completely or the statistic will not be correct. The MONTH-part of the formula is dangerous because data might be older than 1 year. This sheet needs about 1,200ms (1.2s) to be calculated (please see sheet "FastExcel" for details).

Sheet "Worksheet Formula (2)" is also built with SUMPRODUCT worksheet formulas. It fixes the MONTH issue but its runtime is about 10,600ms (10.6s).

Sheet "Worksheet Formula (3)" is built with SUMIFS worksheet formulas. It needs a helper column YearMonth in sheet tblProcessorActivity and runs about 270ms + 110ms = 0.38s.

2. VBA

Sheet "VBA" is built with a VBA macro. All different departments and Id's will be collected automatically. This sheet needs about 860ms (0.86s) to be calculated (according to FastExcel).

Please note that this code also uses the class SystemState.

`Option Explicit​'Enumerate source sheet columns - easier to maintain laterEnum tblProcessorActivity_Columns    tpa_Department = 2    tpa_Date    tpa_Cost    tpa_Id = 14End Enum 'tblProcessorActivity_Columns​Const CSep = ","​Sub CurrentMonthsCost()'Reverse("moc.liborplus.www") PB V0.11 20-Jan-2013Dim objCosts As Object, objDepts As Object, objIds As ObjectDim lRow As Long, lDeptCount As Long, lIdCount As LongDim dtMaxDate As DateDim sMaxDateYYYYMM As String, sIdx As StringDim v As Variant, vDeptId As VariantDim state As SystemStateSet state = New SystemState​Set objCosts = CreateObject("Scripting.Dictionary")Set objDepts = CreateObject("Scripting.Dictionary")Set objIds = CreateObject("Scripting.Dictionary")Sheets("VBA").SelectCells.ClearContentsWith Sheets("tblProcessorActivity")dtMaxDate = Application.WorksheetFunction.Max(.Range("C2:C30000"))sMaxDateYYYYMM = Format(dtMaxDate, "YYYYMM")​'Main data processing looplRow = 2Do While Not IsEmpty(.Cells(lRow, tpa_Department))    If sMaxDateYYYYMM = Format(.Cells(lRow, tpa_Date), "YYYYMM") Then        sIdx = .Cells(lRow, tpa_Department) & CSep & .Cells(lRow, tpa_Id)        objCosts.Item(sIdx) = objCosts.Item(sIdx) + .Cells(lRow, tpa_Cost)    End If    lRow = lRow + 1Loop​'We are almost finished - rest is formatting outputCells(1, 1) = dtMaxDateFor Each v In objCosts.keys    vDeptId = Split(v, CSep)    If objDepts.Item(vDeptId(0)) = 0 Then        lDeptCount = lDeptCount + 1        objDepts.Item(vDeptId(0)) = lDeptCount        Cells(lDeptCount + 1, 1) = vDeptId(0)    End If    If objIds.Item(vDeptId(1)) = 0 Then        lIdCount = lIdCount + 1        objIds.Item(vDeptId(1)) = lIdCount        Cells(1, lIdCount + 1) = vDeptId(1)    End If    Cells(objDepts.Item(vDeptId(0)) + 1, objIds.Item(vDeptId(1)) + 1) = _        objCosts.Item(v)Next vEnd With​'Sort result row-wise by department and column-wise by Id'Surely you could and you maybe even would prefer Excel's built-in sort ...'Please include code from http://sulprobil.com/html/gsort.htmlRange(Cells(2, 1), Cells(lDeptCount + 1, lIdCount + 1)).FormulaArray = _    GSort(Range(Cells(2, 1), Cells(lDeptCount + 1, lIdCount + 1)))Range(Cells(1, 2), Cells(lDeptCount + 1, lIdCount + 1)).FormulaArray = _    Application.WorksheetFunction.Transpose(GSort( _    Application.WorksheetFunction.Transpose(Range(Cells(1, 2), _    Cells(lDeptCount + 1, lIdCount + 1))), "A", "N"))​End Sub`
`Option Explicit​Sub RefreshPivotTable()ActiveSheet.PivotTables("PivotTable1").PivotCache.RefreshEnd Sub`