sbCountUniq
There are quite a few different approaches circulating on the web for counting unique (distinct) entries only. My opinion here is blunt and simple: use my derivation sbCountUniq of Charles Williams' COUNTU function.
But let us have a look at a simple example:
As you can see, we have 10 random integers between 1 and 5 in cells A2 through A11 - actually, the 1 does not occur. This means that we have only 4 different (unique) values. This result, 4, is calculated in five different ways in cells A13:A17. For information purposes, these four different entries and their frequencies are shown in the upper right corner in cells C2:D5. This array has been calculated with an array formula using my user-defined function Pfreq.
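Pfreq returns its result as a multi-cell array formula. Assuming the layout above (and Pfreq's exact signature may differ), it would be entered roughly like this: select C2:D5, type the formula and confirm with Ctrl+Shift+Enter:

=Pfreq(A2:A11)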
This simple example is based on numerical inputs only. But of course we could also have encountered alphanumerical values - for example, a mixture of strings and numbers. The formulas in A14 and A15 would not be able to cope with non-numerical inputs, though.
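The five formulas themselves are only visible in the screenshot, but typical candidates for such a unique count look like this (the exact formulas in A13:A17 may have differed; the SUM variants are array formulas to be confirmed with Ctrl+Shift+Enter where required):

=SUMPRODUCT((A2:A11<>"")/COUNTIF(A2:A11,A2:A11&""))
=SUM(IF(FREQUENCY(A2:A11,A2:A11)>0,1))
=SUM(--(FREQUENCY(A2:A11,A2:A11)>0))
=SUM(1/COUNTIF(A2:A11,A2:A11))
=sbCountUniq(A2:A11)

FREQUENCY only bins numbers, so the two FREQUENCY-based variants count numerical entries only - which would match the remark above about A14 and A15.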
Now let us have a look at the runtimes of these different approaches (I used Charles Williams' FastExcel © here):
For fewer than 1,000 records it does not really matter which solution you choose. But if you have more than a few thousand records you should take Charles Williams' COUNTU function.
By the way: COUNTU needed 54 milliseconds to calculate the result 1,000 (= 1,000 different numbers out of 10,000 random numbers). My UDF Pfreq needed 68 milliseconds to list all 1,000 different entries with their numbers of occurrences (a total of 10,000, of course)!
I think it is pretty evident that for this number of entries - or for simulations - you really should apply a VBA solution.
A more flexible function than COUNTU is sbCountUniq, which works on ranges from more than one workbook as well as on an array:
Please read my Disclaimer.
Function sbCountUniq(ParamArray v() As Variant) As Long
'Count unique values over all input areas (ranges or one array).
'Inspired by Charles Williams' COUNTU function, see:
'http://msdn.microsoft.com/en-us/library/aa730921.aspx#Office2007excelPerf_FindingPrioritizingCalculationBottlenecks
'Reverse("moc.LiborPlus.www") V0.10 10-Jan-2011
Dim colUniques As New Collection
Dim vCell As Variant
Dim vLcell As Variant
Dim j As Long

'Adding a duplicate key to the collection raises an error which we
'deliberately ignore - so the collection keeps exactly one entry per
'distinct value.
On Error Resume Next
For j = LBound(v) To UBound(v)
    For Each vCell In v(j)
        If vCell <> vLcell Then 'Skip immediate repeats cheaply
            If Len(CStr(vCell)) > 0 Then 'Ignore empty cells
                colUniques.Add vCell, CStr(vCell)
            End If
        End If
        vLcell = vCell
    Next vCell
Next j
sbCountUniq = colUniques.Count
End Function
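Since the input is a ParamArray you can mix several ranges - even from different workbooks - or pass a single VBA array. For example (the workbook and sheet names are just placeholders):

=sbCountUniq(A2:A11)
=sbCountUniq(Sheet1!A2:A11,[OtherBook.xlsx]Sheet1!$B$2:$B$100)

Or from VBA: MsgBox sbCountUniq(Array(1, 2, 2, "a", "a", 3)) shows 4.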
But what if you have some criteria and you do not want to re-arrange your spreadsheet so that it shows only relevant data?
One option might be a Pivot table with a helper column. Personally I would prefer two subsequent Pivot tables.
Another option is using the advanced filter.
The two options I would like to present here, though, are a worksheet formula approach and a VBA approach. This is because I think it is worth learning from the pitfalls of the suboptimal worksheet formula.
Let us assume you have a company in 10 regions with 200 employees who are selling some goods each day. And now you want to know how many (unique) employees per region sold anything in the first quarter of 2010.
We have the sales data in sheet Data:
Now we define some convenient named ranges:
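The name definitions are only shown in the screenshot. Judging from the formula and the VBA code below, they presumably look similar to this (all addresses assumed; dynamic OFFSET-based names would also explain why the formulas below become volatile):

Start      =ResultWS!$B$1
End        =ResultWS!$B$2
Date       =OFFSET(Data!$A$2,0,0,COUNTA(Data!$A:$A)-1,1)
Employee   =OFFSET(Date,0,1)
CSO        =OFFSET(Date,0,2)
Region     =OFFSET(Date,0,3)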
In sheet ResultWS we can now provide T. Valko's (Biff) clever worksheet array formula solution:
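The screenshot of the formula is not reproduced here, but Biff's published solutions to this kind of problem typically follow this pattern (an assumption - the exact original may have differed), entered with Ctrl+Shift+Enter, with the region to count in cell A2 of the result sheet and the named ranges from above:

=SUM(IF(FREQUENCY(IF(Date>=Start,IF(Date<=End,IF(CSO>0,IF(Region=$A2,MATCH(Employee,Employee,0))))),ROW(Employee)-ROW(INDEX(Employee,1))+1),1))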
The named ranges help us to phrase this formula in a neat way. We need to copy this array formula for each region that appears, and the named ranges make these formulae volatile, but that does not matter too much, right?
Hmm, let us see ...
The VBA approach is pretty straightforward:
Please read my Disclaimer.
Option Explicit

Enum Data_Columns
    dDate = 1
    dEmployee
    dCSO
    dRegion
End Enum 'Data_Columns

Sub UniqEmployeesPerRegion()
'Reverse("moc.liborplus.www") V0.10 23-Oct-2010
Dim objEmployeesRegions As Object, objRegions As Object
Dim v As Variant
Dim lRow As Long
Dim s() As String

Const sC = "|"

Sheets("ResultVBA").Range("A2:B10001").ClearContents
Set objEmployeesRegions = CreateObject("Scripting.Dictionary")
Set objRegions = CreateObject("Scripting.Dictionary")
Sheets("Data").Select
'First pass: use a dictionary as a set of distinct Employee|Region
'pairs which meet all criteria (date window, CSO > 0).
lRow = 2
Do While Not IsEmpty(Cells(lRow, 1))
    If Cells(lRow, dDate) >= Range("Start") And _
       Cells(lRow, dDate) <= Range("End") And _
       Cells(lRow, dCSO) > 0 Then
        objEmployeesRegions.Item(Cells(lRow, dEmployee) & sC & _
            Cells(lRow, dRegion)) = 1
    End If
    lRow = lRow + 1
Loop
'Second pass: count the distinct pairs per region.
For Each v In objEmployeesRegions.keys
    s = Split(v, sC)
    objRegions.Item(s(1)) = objRegions.Item(s(1)) + 1
Next v
Sheets("ResultVBA").Select
Range("A2").Resize(objRegions.Count).Value = _
    Application.WorksheetFunction.Transpose(objRegions.keys)
Range("B2").Resize(objRegions.Count).Value = _
    Application.WorksheetFunction.Transpose(objRegions.items)
If CDbl(Application.Version) >= 14# Then 'Try, might not work
                                         'in older versions, could not test
    Range(Range("A1"), Range("B1").End(xlDown)).Select
    ActiveWorkbook.Worksheets("ResultVBA").Sort.SortFields.Clear
    ActiveWorkbook.Worksheets("ResultVBA").Sort.SortFields.Add Key:=Range( _
        Range("A2"), Range("A1").End(xlDown)), SortOn:=xlSortOnValues, _
        Order:=xlAscending, DataOption:=xlSortNormal
    With ActiveWorkbook.Worksheets("ResultVBA").Sort
        .SetRange Range(Range("A1"), Range("B1").End(xlDown))
        .Header = xlYes
        .MatchCase = False
        .Orientation = xlTopToBottom
        .SortMethod = xlPinYin
        .Apply
    End With
End If
Set objEmployeesRegions = Nothing
Set objRegions = Nothing
End Sub
For both the worksheet formula approach and the VBA approach I ran simulations on my dual-core laptop for 10, 50 and 100 regions and for 1,000, 5,000, 10,000 and 20,000 rows of data, and I measured the runtimes with FastExcel ©.
Now you can see that the runtime of the worksheet formula approach is quadratic in the number of data rows: if you double the number of data rows, the runtime increases by a factor of 4! The runtime of the VBA approach is linear - doubling the data rows only doubles the runtime.
Please note that a Pivot table approach would presumably be even faster than the VBA approach. But with runtimes around 1 second for up to 20,000 rows of data I would be able to live pretty well.