How to make nested loop faster to find instr in vba
Problem Description: Loop through excel max rows(approx. 10000000) to find instr. After finding instr, taking the values and copy the values to different sheet. Every time find the match which is instr, copy the value only the matches and paste it to different sheet.
Problem: I am using nested loop and my loop is running slow, so for 10 millions rows its taking around 19:37 mins. I timed it. So first question is there different ways of doing it or how do i make it faster instead of 20 mins, is it possible to compare 20 millions( each sheet 10 million rows, 10 million strings) strings within 1 min or two. Here is my current code
Sub zym()
Dim x As Long, lastrow As Long, lastrowx As Long, i As Long, ii As Long
Dim ws As Worksheet, ws2 As Worksheet, b As String
Dim j As Long
Set ws = Worksheets("Sheet1")
Set ws2 = Worksheets("Sheet2")
Set ws3 = Worksheets("Sheet3")
j = 1
T1 = GetTickCount
lastrow = ws.UsedRange.Rows.Count + 1
lastrowx = ws2.UsedRange.Rows.Count + 1
ReDim sheet1array(1 To lastrow)
ReDim sheet2array(1 To lastrowx)
For i = LBound(sheet1array) To UBound(sheet1array)
b = "-" & ws.Range("A" & i) & "-"
For ii = LBound(sheet2array) To UBound(sheet2array)
If InStr(1, ws2.Range("A" & ii), b) > 0 Then
ws3.Range("A" & j) = ws2.Range("A" & ii)
j = j + 1
End If
Next ii
Next i
Debug.Print "Array Time = " & (GetTickCount - T1) / 1000#
Debug.Print "Array Count = " & Format(ii, "#,###")
End Sub
microsoft-excel vba
add a comment |
Problem Description: Loop through excel max rows(approx. 10000000) to find instr. After finding instr, taking the values and copy the values to different sheet. Every time find the match which is instr, copy the value only the matches and paste it to different sheet.
Problem: I am using nested loop and my loop is running slow, so for 10 millions rows its taking around 19:37 mins. I timed it. So first question is there different ways of doing it or how do i make it faster instead of 20 mins, is it possible to compare 20 millions( each sheet 10 million rows, 10 million strings) strings within 1 min or two. Here is my current code
Sub zym()
Dim x As Long, lastrow As Long, lastrowx As Long, i As Long, ii As Long
Dim ws As Worksheet, ws2 As Worksheet, b As String
Dim j As Long
Set ws = Worksheets("Sheet1")
Set ws2 = Worksheets("Sheet2")
Set ws3 = Worksheets("Sheet3")
j = 1
T1 = GetTickCount
lastrow = ws.UsedRange.Rows.Count + 1
lastrowx = ws2.UsedRange.Rows.Count + 1
ReDim sheet1array(1 To lastrow)
ReDim sheet2array(1 To lastrowx)
For i = LBound(sheet1array) To UBound(sheet1array)
b = "-" & ws.Range("A" & i) & "-"
For ii = LBound(sheet2array) To UBound(sheet2array)
If InStr(1, ws2.Range("A" & ii), b) > 0 Then
ws3.Range("A" & j) = ws2.Range("A" & ii)
j = j + 1
End If
Next ii
Next i
Debug.Print "Array Time = " & (GetTickCount - T1) / 1000#
Debug.Print "Array Count = " & Format(ii, "#,###")
End Sub
microsoft-excel vba
You are not comparing 20 million strings. You are comparing 10,000,000*10,000,000 = 100 trillion strings. While I'm sure optimizations could be made, the expectation of getting this down to one minute is simply unrealistic. Is there a reason you don't break out of the inner loop once a match is found?
– Kyle
Sep 23 '15 at 14:32
@Kyle 19:37 mins for 100 Trillion strings? I'd be happy with that ;-)
– misha256
Sep 23 '15 at 20:43
add a comment |
Problem Description: Loop through excel max rows(approx. 10000000) to find instr. After finding instr, taking the values and copy the values to different sheet. Every time find the match which is instr, copy the value only the matches and paste it to different sheet.
Problem: I am using nested loop and my loop is running slow, so for 10 millions rows its taking around 19:37 mins. I timed it. So first question is there different ways of doing it or how do i make it faster instead of 20 mins, is it possible to compare 20 millions( each sheet 10 million rows, 10 million strings) strings within 1 min or two. Here is my current code
Sub zym()
Dim x As Long, lastrow As Long, lastrowx As Long, i As Long, ii As Long
Dim ws As Worksheet, ws2 As Worksheet, b As String
Dim j As Long
Set ws = Worksheets("Sheet1")
Set ws2 = Worksheets("Sheet2")
Set ws3 = Worksheets("Sheet3")
j = 1
T1 = GetTickCount
lastrow = ws.UsedRange.Rows.Count + 1
lastrowx = ws2.UsedRange.Rows.Count + 1
ReDim sheet1array(1 To lastrow)
ReDim sheet2array(1 To lastrowx)
For i = LBound(sheet1array) To UBound(sheet1array)
b = "-" & ws.Range("A" & i) & "-"
For ii = LBound(sheet2array) To UBound(sheet2array)
If InStr(1, ws2.Range("A" & ii), b) > 0 Then
ws3.Range("A" & j) = ws2.Range("A" & ii)
j = j + 1
End If
Next ii
Next i
Debug.Print "Array Time = " & (GetTickCount - T1) / 1000#
Debug.Print "Array Count = " & Format(ii, "#,###")
End Sub
microsoft-excel vba
Problem Description: Loop through excel max rows(approx. 10000000) to find instr. After finding instr, taking the values and copy the values to different sheet. Every time find the match which is instr, copy the value only the matches and paste it to different sheet.
Problem: I am using nested loop and my loop is running slow, so for 10 millions rows its taking around 19:37 mins. I timed it. So first question is there different ways of doing it or how do i make it faster instead of 20 mins, is it possible to compare 20 millions( each sheet 10 million rows, 10 million strings) strings within 1 min or two. Here is my current code
Sub zym()
Dim x As Long, lastrow As Long, lastrowx As Long, i As Long, ii As Long
Dim ws As Worksheet, ws2 As Worksheet, b As String
Dim j As Long
Set ws = Worksheets("Sheet1")
Set ws2 = Worksheets("Sheet2")
Set ws3 = Worksheets("Sheet3")
j = 1
T1 = GetTickCount
lastrow = ws.UsedRange.Rows.Count + 1
lastrowx = ws2.UsedRange.Rows.Count + 1
ReDim sheet1array(1 To lastrow)
ReDim sheet2array(1 To lastrowx)
For i = LBound(sheet1array) To UBound(sheet1array)
b = "-" & ws.Range("A" & i) & "-"
For ii = LBound(sheet2array) To UBound(sheet2array)
If InStr(1, ws2.Range("A" & ii), b) > 0 Then
ws3.Range("A" & j) = ws2.Range("A" & ii)
j = j + 1
End If
Next ii
Next i
Debug.Print "Array Time = " & (GetTickCount - T1) / 1000#
Debug.Print "Array Count = " & Format(ii, "#,###")
End Sub
microsoft-excel vba
microsoft-excel vba
edited Sep 23 '15 at 13:32
Raystafarian
19.5k105089
19.5k105089
asked Sep 23 '15 at 3:39
user3795861user3795861
18210
18210
You are not comparing 20 million strings. You are comparing 10,000,000*10,000,000 = 100 trillion strings. While I'm sure optimizations could be made, the expectation of getting this down to one minute is simply unrealistic. Is there a reason you don't break out of the inner loop once a match is found?
– Kyle
Sep 23 '15 at 14:32
@Kyle 19:37 mins for 100 Trillion strings? I'd be happy with that ;-)
– misha256
Sep 23 '15 at 20:43
add a comment |
You are not comparing 20 million strings. You are comparing 10,000,000*10,000,000 = 100 trillion strings. While I'm sure optimizations could be made, the expectation of getting this down to one minute is simply unrealistic. Is there a reason you don't break out of the inner loop once a match is found?
– Kyle
Sep 23 '15 at 14:32
@Kyle 19:37 mins for 100 Trillion strings? I'd be happy with that ;-)
– misha256
Sep 23 '15 at 20:43
You are not comparing 20 million strings. You are comparing 10,000,000*10,000,000 = 100 trillion strings. While I'm sure optimizations could be made, the expectation of getting this down to one minute is simply unrealistic. Is there a reason you don't break out of the inner loop once a match is found?
– Kyle
Sep 23 '15 at 14:32
You are not comparing 20 million strings. You are comparing 10,000,000*10,000,000 = 100 trillion strings. While I'm sure optimizations could be made, the expectation of getting this down to one minute is simply unrealistic. Is there a reason you don't break out of the inner loop once a match is found?
– Kyle
Sep 23 '15 at 14:32
@Kyle 19:37 mins for 100 Trillion strings? I'd be happy with that ;-)
– misha256
Sep 23 '15 at 20:43
@Kyle 19:37 mins for 100 Trillion strings? I'd be happy with that ;-)
– misha256
Sep 23 '15 at 20:43
add a comment |
3 Answers
3
active
oldest
votes
Reading from and writing to cells on a sheet slows down any macro. The following code copies cell values into arrays and loops through these. Output is copied in chunks from a result array into the target sheet.
On my notebook the original code took 56 sec, this code 3.7 sec:
Sub zym2()
Dim lastrow As Long, i As Long, j As Long, start As Long
Dim ws As Worksheet, ws2 As Worksheet, ws3 As Worksheet
Dim b As String
Dim T1 As Long
Dim arr1, arr2, arr3, c
Set ws = Worksheets("sh1")
Set ws2 = Worksheets("sh2")
Set ws3 = Worksheets("sh3")
ws3.Columns(1).Clear
T1 = Timer
arr1 = Intersect(ws.Columns(1), ws.UsedRange)
lastrow = UBound(arr1)
arr2 = ws2.UsedRange
ReDim arr3(1 To lastrow / 10, 2) ' initial length is arbitrary
j = 0
start = 1
For i = 1 To lastrow
b = "-" & arr1(i, 1) & "-"
For Each c In arr2
If InStr(1, c, b) > 0 Then
If j = UBound(arr3) Then
ws3.Range(Cells(start, 1), Cells(start - 1 + j, 1)) = arr3
start = start + j
j = 0
End If
j = j + 1
arr3(j, 1) = c
End If
Next c
Next i
If j > 0 Then
ws3.Range(Cells(start, 1), Cells(start - 1 + j, 1)) = arr3
End If
Debug.Print "Array Time = " & Format(Timer - T1, "##.0")
Debug.Print "Array Count = " & Format(start - 1 + j, "#,###")
End Sub
add a comment |
Although I have already offered an answer I want to propose a totally different algorithm here in order to improve the performance by another order of magnitude.
When the "big list" on sheet1 is scanned and matches in sheet2 are searched, the information about a successful search is thrown away after one pass. Sheet1 will contain repetitions of a search value, and when scanning sheet2 we can make use of it's frequency.
The means to finding unique search values and their frequencies is a dictionary object. To use it in VBA one has to add a reference to "Microsoft Scripting" in the VBA editor.
The second assumption is that the output list does not need to preserve the input order (because it will be sorted anyway). The following code will produce an output list in sheet3 with search values in the order in which they occur in the big list but with all repetitions in one block. Statements for timing have been commented out as an external class definition is needed for this.
Sub zym_dict()
' http://superuser.com/questions/976906/how-to-make-nested-loop-faster-to-find-instr-in-vba
' by E/S/P 2015-09-25
' 2nd improvement: use a dictionary object to count unique search items and loop over these
' speed 1:13 vs. array version; 1:186 vs. original (cell) version
Dim numvalues As Long, i As Long, j As Long, nextresult As Long
Dim numcompared As Long, numresults As Long
Dim cnt As Long
Dim ws1 As Worksheet, ws2 As Worksheet, ws3 As Worksheet
Dim searchterm As String
Dim values, arr2, results, c, v
Dim uniq As New Scripting.Dictionary
' Dim mStopWatch As New clsStopWatch
Set ws1 = Worksheets("sheet1")
Set ws2 = Worksheets("sheet2")
Set ws3 = Worksheets("sheet3")
' mStopWatch.StartWatch
values = Intersect(ws1.Columns(1), ws1.UsedRange)
arr2 = Intersect(ws2.Range("A:B"), ws2.UsedRange)
numcompared = UBound(arr2, 1)
' collect unique values and their frequencies
For i = 1 To UBound(values, 1)
uniq(values(i, 1)) = uniq(values(i, 1)) + 1
Next i
numresults = 0
' 2nd index is repeat count
For j = 1 To numcompared
arr2(j, 2) = 0
Next j
For Each v In uniq
searchterm = "-" & v & "-"
cnt = uniq.Item(v)
For j = 1 To numcompared
If InStr(1, arr2(j, 1), searchterm) > 0 Then
' copy this value multiple times into result array
arr2(j, 2) = arr2(j, 2) + cnt ' repeat count
numresults = numresults + cnt
End If
Next j
Next
' generate output list
ReDim results(1 To numresults, 1 To 2)
ws3.Columns(1).Clear
nextresult = 0
For i = 1 To numcompared
v = arr2(i, 1)
cnt = arr2(i, 2) ' may be 0!
For j = 1 To cnt
results(nextresult + j, 1) = v
Next j
nextresult = nextresult + cnt
Next i
' copy values to sheet
ws3.Range(Cells(1, 1), Cells(nextresult, 2)) = results
' Debug.Print "runtime = " & Format(mStopWatch.StopWatch, "#0.00 ms")
Debug.Print Format(nextresult, "#,### resulting lines")
End Sub
Compared to the OP's code the speed improvement is 1:186. A 20 minute run would then only take a couple of seconds.
add a comment |
I would use the Power Query Add-In for this. It has a Text.Contains function that is roughly similar to VB's InStr. I had a go at this particular challenge and got it working. You can download and use my demo file from my OneDrive:
http://1drv.ms/1AzPAZp
It's the file: Power Query demo - Searching for a list of strings among another list of strings.xlsx.
As described on the ReadMe sheet, I didn't have to write many functions - it was mostly built by clicking around the UI.
My design is to cross-join the Search and Target tables (the equivalent of your Sheet1 and Sheet2 I think) to get every possibly combination, then apply the Text.Contains function and filter on the result.
A key design objective is speed - it runs in about 1 second for the current semi-random test data:
19 Search Strings (currently single words)
78780 Target Strings (currently lines from War and Peace)
(so around 1.5 million combinations)
9268 Output matches.
So non-trivial scale, but nowhere near your requirements. Hopefully that will scale up to your needs - I'm keen to hear how it goes.
Note that the Target_Strings query could be replaced with one querying data directly from a database or website - Power Query is not limited to Excel as a datasource.
add a comment |
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "3"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fsuperuser.com%2fquestions%2f976906%2fhow-to-make-nested-loop-faster-to-find-instr-in-vba%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
Reading from and writing to cells on a sheet slows down any macro. The following code copies cell values into arrays and loops through these. Output is copied in chunks from a result array into the target sheet.
On my notebook the original code took 56 sec, this code 3.7 sec:
Sub zym2()
Dim lastrow As Long, i As Long, j As Long, start As Long
Dim ws As Worksheet, ws2 As Worksheet, ws3 As Worksheet
Dim b As String
Dim T1 As Long
Dim arr1, arr2, arr3, c
Set ws = Worksheets("sh1")
Set ws2 = Worksheets("sh2")
Set ws3 = Worksheets("sh3")
ws3.Columns(1).Clear
T1 = Timer
arr1 = Intersect(ws.Columns(1), ws.UsedRange)
lastrow = UBound(arr1)
arr2 = ws2.UsedRange
ReDim arr3(1 To lastrow / 10, 2) ' initial length is arbitrary
j = 0
start = 1
For i = 1 To lastrow
b = "-" & arr1(i, 1) & "-"
For Each c In arr2
If InStr(1, c, b) > 0 Then
If j = UBound(arr3) Then
ws3.Range(Cells(start, 1), Cells(start - 1 + j, 1)) = arr3
start = start + j
j = 0
End If
j = j + 1
arr3(j, 1) = c
End If
Next c
Next i
If j > 0 Then
ws3.Range(Cells(start, 1), Cells(start - 1 + j, 1)) = arr3
End If
Debug.Print "Array Time = " & Format(Timer - T1, "##.0")
Debug.Print "Array Count = " & Format(start - 1 + j, "#,###")
End Sub
add a comment |
Reading from and writing to cells on a sheet slows down any macro. The following code copies cell values into arrays and loops through these. Output is copied in chunks from a result array into the target sheet.
On my notebook the original code took 56 sec, this code 3.7 sec:
Sub zym2()
Dim lastrow As Long, i As Long, j As Long, start As Long
Dim ws As Worksheet, ws2 As Worksheet, ws3 As Worksheet
Dim b As String
Dim T1 As Long
Dim arr1, arr2, arr3, c
Set ws = Worksheets("sh1")
Set ws2 = Worksheets("sh2")
Set ws3 = Worksheets("sh3")
ws3.Columns(1).Clear
T1 = Timer
arr1 = Intersect(ws.Columns(1), ws.UsedRange)
lastrow = UBound(arr1)
arr2 = ws2.UsedRange
ReDim arr3(1 To lastrow / 10, 2) ' initial length is arbitrary
j = 0
start = 1
For i = 1 To lastrow
b = "-" & arr1(i, 1) & "-"
For Each c In arr2
If InStr(1, c, b) > 0 Then
If j = UBound(arr3) Then
ws3.Range(Cells(start, 1), Cells(start - 1 + j, 1)) = arr3
start = start + j
j = 0
End If
j = j + 1
arr3(j, 1) = c
End If
Next c
Next i
If j > 0 Then
ws3.Range(Cells(start, 1), Cells(start - 1 + j, 1)) = arr3
End If
Debug.Print "Array Time = " & Format(Timer - T1, "##.0")
Debug.Print "Array Count = " & Format(start - 1 + j, "#,###")
End Sub
add a comment |
Reading from and writing to cells on a sheet slows down any macro. The following code copies cell values into arrays and loops through these. Output is copied in chunks from a result array into the target sheet.
On my notebook the original code took 56 sec, this code 3.7 sec:
Sub zym2()
Dim lastrow As Long, i As Long, j As Long, start As Long
Dim ws As Worksheet, ws2 As Worksheet, ws3 As Worksheet
Dim b As String
Dim T1 As Long
Dim arr1, arr2, arr3, c
Set ws = Worksheets("sh1")
Set ws2 = Worksheets("sh2")
Set ws3 = Worksheets("sh3")
ws3.Columns(1).Clear
T1 = Timer
arr1 = Intersect(ws.Columns(1), ws.UsedRange)
lastrow = UBound(arr1)
arr2 = ws2.UsedRange
ReDim arr3(1 To lastrow / 10, 2) ' initial length is arbitrary
j = 0
start = 1
For i = 1 To lastrow
b = "-" & arr1(i, 1) & "-"
For Each c In arr2
If InStr(1, c, b) > 0 Then
If j = UBound(arr3) Then
ws3.Range(Cells(start, 1), Cells(start - 1 + j, 1)) = arr3
start = start + j
j = 0
End If
j = j + 1
arr3(j, 1) = c
End If
Next c
Next i
If j > 0 Then
ws3.Range(Cells(start, 1), Cells(start - 1 + j, 1)) = arr3
End If
Debug.Print "Array Time = " & Format(Timer - T1, "##.0")
Debug.Print "Array Count = " & Format(start - 1 + j, "#,###")
End Sub
Reading from and writing to cells on a sheet slows down any macro. The following code copies cell values into arrays and loops through these. Output is copied in chunks from a result array into the target sheet.
On my notebook the original code took 56 sec, this code 3.7 sec:
Sub zym2()
Dim lastrow As Long, i As Long, j As Long, start As Long
Dim ws As Worksheet, ws2 As Worksheet, ws3 As Worksheet
Dim b As String
Dim T1 As Long
Dim arr1, arr2, arr3, c
Set ws = Worksheets("sh1")
Set ws2 = Worksheets("sh2")
Set ws3 = Worksheets("sh3")
ws3.Columns(1).Clear
T1 = Timer
arr1 = Intersect(ws.Columns(1), ws.UsedRange)
lastrow = UBound(arr1)
arr2 = ws2.UsedRange
ReDim arr3(1 To lastrow / 10, 2) ' initial length is arbitrary
j = 0
start = 1
For i = 1 To lastrow
b = "-" & arr1(i, 1) & "-"
For Each c In arr2
If InStr(1, c, b) > 0 Then
If j = UBound(arr3) Then
ws3.Range(Cells(start, 1), Cells(start - 1 + j, 1)) = arr3
start = start + j
j = 0
End If
j = j + 1
arr3(j, 1) = c
End If
Next c
Next i
If j > 0 Then
ws3.Range(Cells(start, 1), Cells(start - 1 + j, 1)) = arr3
End If
Debug.Print "Array Time = " & Format(Timer - T1, "##.0")
Debug.Print "Array Count = " & Format(start - 1 + j, "#,###")
End Sub
answered Sep 24 '15 at 18:55
user1016274user1016274
1,246713
1,246713
add a comment |
add a comment |
Although I have already offered an answer I want to propose a totally different algorithm here in order to improve the performance by another order of magnitude.
When the "big list" on sheet1 is scanned and matches in sheet2 are searched, the information about a successful search is thrown away after one pass. Sheet1 will contain repetitions of a search value, and when scanning sheet2 we can make use of it's frequency.
The means to finding unique search values and their frequencies is a dictionary object. To use it in VBA one has to add a reference to "Microsoft Scripting" in the VBA editor.
The second assumption is that the output list does not need to preserve the input order (because it will be sorted anyway). The following code will produce an output list in sheet3 with search values in the order in which they occur in the big list but with all repetitions in one block. Statements for timing have been commented out as an external class definition is needed for this.
Sub zym_dict()
' http://superuser.com/questions/976906/how-to-make-nested-loop-faster-to-find-instr-in-vba
' by E/S/P 2015-09-25
' 2nd improvement: use a dictionary object to count unique search items and loop over these
' speed 1:13 vs. array version; 1:186 vs. original (cell) version
Dim numvalues As Long, i As Long, j As Long, nextresult As Long
Dim numcompared As Long, numresults As Long
Dim cnt As Long
Dim ws1 As Worksheet, ws2 As Worksheet, ws3 As Worksheet
Dim searchterm As String
Dim values, arr2, results, c, v
Dim uniq As New Scripting.Dictionary
' Dim mStopWatch As New clsStopWatch
Set ws1 = Worksheets("sheet1")
Set ws2 = Worksheets("sheet2")
Set ws3 = Worksheets("sheet3")
' mStopWatch.StartWatch
values = Intersect(ws1.Columns(1), ws1.UsedRange)
arr2 = Intersect(ws2.Range("A:B"), ws2.UsedRange)
numcompared = UBound(arr2, 1)
' collect unique values and their frequencies
For i = 1 To UBound(values, 1)
uniq(values(i, 1)) = uniq(values(i, 1)) + 1
Next i
numresults = 0
' 2nd index is repeat count
For j = 1 To numcompared
arr2(j, 2) = 0
Next j
For Each v In uniq
searchterm = "-" & v & "-"
cnt = uniq.Item(v)
For j = 1 To numcompared
If InStr(1, arr2(j, 1), searchterm) > 0 Then
' copy this value multiple times into result array
arr2(j, 2) = arr2(j, 2) + cnt ' repeat count
numresults = numresults + cnt
End If
Next j
Next
' generate output list
ReDim results(1 To numresults, 1 To 2)
ws3.Columns(1).Clear
nextresult = 0
For i = 1 To numcompared
v = arr2(i, 1)
cnt = arr2(i, 2) ' may be 0!
For j = 1 To cnt
results(nextresult + j, 1) = v
Next j
nextresult = nextresult + cnt
Next i
' copy values to sheet
ws3.Range(Cells(1, 1), Cells(nextresult, 2)) = results
' Debug.Print "runtime = " & Format(mStopWatch.StopWatch, "#0.00 ms")
Debug.Print Format(nextresult, "#,### resulting lines")
End Sub
Compared to the OP's code the speed improvement is 1:186. A 20 minute run would then only take a couple of seconds.
add a comment |
Although I have already offered an answer I want to propose a totally different algorithm here in order to improve the performance by another order of magnitude.
When the "big list" on sheet1 is scanned and matches in sheet2 are searched, the information about a successful search is thrown away after one pass. Sheet1 will contain repetitions of a search value, and when scanning sheet2 we can make use of it's frequency.
The means to finding unique search values and their frequencies is a dictionary object. To use it in VBA one has to add a reference to "Microsoft Scripting" in the VBA editor.
The second assumption is that the output list does not need to preserve the input order (because it will be sorted anyway). The following code will produce an output list in sheet3 with search values in the order in which they occur in the big list but with all repetitions in one block. Statements for timing have been commented out as an external class definition is needed for this.
Sub zym_dict()
' http://superuser.com/questions/976906/how-to-make-nested-loop-faster-to-find-instr-in-vba
' by E/S/P 2015-09-25
' 2nd improvement: use a dictionary object to count unique search items and loop over these
' speed 1:13 vs. array version; 1:186 vs. original (cell) version
Dim numvalues As Long, i As Long, j As Long, nextresult As Long
Dim numcompared As Long, numresults As Long
Dim cnt As Long
Dim ws1 As Worksheet, ws2 As Worksheet, ws3 As Worksheet
Dim searchterm As String
Dim values, arr2, results, c, v
Dim uniq As New Scripting.Dictionary
' Dim mStopWatch As New clsStopWatch
Set ws1 = Worksheets("sheet1")
Set ws2 = Worksheets("sheet2")
Set ws3 = Worksheets("sheet3")
' mStopWatch.StartWatch
values = Intersect(ws1.Columns(1), ws1.UsedRange)
arr2 = Intersect(ws2.Range("A:B"), ws2.UsedRange)
numcompared = UBound(arr2, 1)
' collect unique values and their frequencies
For i = 1 To UBound(values, 1)
uniq(values(i, 1)) = uniq(values(i, 1)) + 1
Next i
numresults = 0
' 2nd index is repeat count
For j = 1 To numcompared
arr2(j, 2) = 0
Next j
For Each v In uniq
searchterm = "-" & v & "-"
cnt = uniq.Item(v)
For j = 1 To numcompared
If InStr(1, arr2(j, 1), searchterm) > 0 Then
' copy this value multiple times into result array
arr2(j, 2) = arr2(j, 2) + cnt ' repeat count
numresults = numresults + cnt
End If
Next j
Next
' generate output list
ReDim results(1 To numresults, 1 To 2)
ws3.Columns(1).Clear
nextresult = 0
For i = 1 To numcompared
v = arr2(i, 1)
cnt = arr2(i, 2) ' may be 0!
For j = 1 To cnt
results(nextresult + j, 1) = v
Next j
nextresult = nextresult + cnt
Next i
' copy values to sheet
ws3.Range(Cells(1, 1), Cells(nextresult, 2)) = results
' Debug.Print "runtime = " & Format(mStopWatch.StopWatch, "#0.00 ms")
Debug.Print Format(nextresult, "#,### resulting lines")
End Sub
Compared to the OP's code the speed improvement is 1:186. A 20 minute run would then only take a couple of seconds.
add a comment |
Although I have already offered an answer I want to propose a totally different algorithm here in order to improve the performance by another order of magnitude.
When the "big list" on sheet1 is scanned and matches in sheet2 are searched, the information about a successful search is thrown away after one pass. Sheet1 will contain repetitions of a search value, and when scanning sheet2 we can make use of it's frequency.
The means to finding unique search values and their frequencies is a dictionary object. To use it in VBA one has to add a reference to "Microsoft Scripting" in the VBA editor.
The second assumption is that the output list does not need to preserve the input order (because it will be sorted anyway). The following code will produce an output list in sheet3 with search values in the order in which they occur in the big list but with all repetitions in one block. Statements for timing have been commented out as an external class definition is needed for this.
Sub zym_dict()
' http://superuser.com/questions/976906/how-to-make-nested-loop-faster-to-find-instr-in-vba
' by E/S/P 2015-09-25
' 2nd improvement: use a dictionary object to count unique search items and loop over these
' speed 1:13 vs. array version; 1:186 vs. original (cell) version
Dim numvalues As Long, i As Long, j As Long, nextresult As Long
Dim numcompared As Long, numresults As Long
Dim cnt As Long
Dim ws1 As Worksheet, ws2 As Worksheet, ws3 As Worksheet
Dim searchterm As String
Dim values, arr2, results, c, v
Dim uniq As New Scripting.Dictionary
' Dim mStopWatch As New clsStopWatch
Set ws1 = Worksheets("sheet1")
Set ws2 = Worksheets("sheet2")
Set ws3 = Worksheets("sheet3")
' mStopWatch.StartWatch
values = Intersect(ws1.Columns(1), ws1.UsedRange)
arr2 = Intersect(ws2.Range("A:B"), ws2.UsedRange)
numcompared = UBound(arr2, 1)
' collect unique values and their frequencies
For i = 1 To UBound(values, 1)
uniq(values(i, 1)) = uniq(values(i, 1)) + 1
Next i
numresults = 0
' 2nd index is repeat count
For j = 1 To numcompared
arr2(j, 2) = 0
Next j
For Each v In uniq
searchterm = "-" & v & "-"
cnt = uniq.Item(v)
For j = 1 To numcompared
If InStr(1, arr2(j, 1), searchterm) > 0 Then
' copy this value multiple times into result array
arr2(j, 2) = arr2(j, 2) + cnt ' repeat count
numresults = numresults + cnt
End If
Next j
Next
' generate output list
ReDim results(1 To numresults, 1 To 2)
ws3.Columns(1).Clear
nextresult = 0
For i = 1 To numcompared
v = arr2(i, 1)
cnt = arr2(i, 2) ' may be 0!
For j = 1 To cnt
results(nextresult + j, 1) = v
Next j
nextresult = nextresult + cnt
Next i
' copy values to sheet
ws3.Range(Cells(1, 1), Cells(nextresult, 2)) = results
' Debug.Print "runtime = " & Format(mStopWatch.StopWatch, "#0.00 ms")
Debug.Print Format(nextresult, "#,### resulting lines")
End Sub
Compared to the OP's code the speed improvement is 1:186. A 20 minute run would then only take a couple of seconds.
Although I have already offered an answer I want to propose a totally different algorithm here in order to improve the performance by another order of magnitude.
When the "big list" on sheet1 is scanned and matches in sheet2 are searched, the information about a successful search is thrown away after one pass. Sheet1 will contain repetitions of a search value, and when scanning sheet2 we can make use of it's frequency.
The means to finding unique search values and their frequencies is a dictionary object. To use it in VBA one has to add a reference to "Microsoft Scripting" in the VBA editor.
The second assumption is that the output list does not need to preserve the input order (because it will be sorted anyway). The following code will produce an output list in sheet3 with search values in the order in which they occur in the big list but with all repetitions in one block. Statements for timing have been commented out as an external class definition is needed for this.
Sub zym_dict()
' http://superuser.com/questions/976906/how-to-make-nested-loop-faster-to-find-instr-in-vba
' by E/S/P 2015-09-25
' 2nd improvement: use a dictionary object to count unique search items and loop over these
' speed 1:13 vs. array version; 1:186 vs. original (cell) version
Dim numvalues As Long, i As Long, j As Long, nextresult As Long
Dim numcompared As Long, numresults As Long
Dim cnt As Long
Dim ws1 As Worksheet, ws2 As Worksheet, ws3 As Worksheet
Dim searchterm As String
Dim values, arr2, results, c, v
Dim uniq As New Scripting.Dictionary
' Dim mStopWatch As New clsStopWatch
Set ws1 = Worksheets("sheet1")
Set ws2 = Worksheets("sheet2")
Set ws3 = Worksheets("sheet3")
' mStopWatch.StartWatch
values = Intersect(ws1.Columns(1), ws1.UsedRange)
arr2 = Intersect(ws2.Range("A:B"), ws2.UsedRange)
numcompared = UBound(arr2, 1)
' collect unique values and their frequencies
For i = 1 To UBound(values, 1)
uniq(values(i, 1)) = uniq(values(i, 1)) + 1
Next i
numresults = 0
' 2nd index is repeat count
For j = 1 To numcompared
arr2(j, 2) = 0
Next j
For Each v In uniq
searchterm = "-" & v & "-"
cnt = uniq.Item(v)
For j = 1 To numcompared
If InStr(1, arr2(j, 1), searchterm) > 0 Then
' copy this value multiple times into result array
arr2(j, 2) = arr2(j, 2) + cnt ' repeat count
numresults = numresults + cnt
End If
Next j
Next
' generate output list
ReDim results(1 To numresults, 1 To 2)
ws3.Columns(1).Clear
nextresult = 0
For i = 1 To numcompared
v = arr2(i, 1)
cnt = arr2(i, 2) ' may be 0!
For j = 1 To cnt
results(nextresult + j, 1) = v
Next j
nextresult = nextresult + cnt
Next i
' copy values to sheet
ws3.Range(Cells(1, 1), Cells(nextresult, 2)) = results
' Debug.Print "runtime = " & Format(mStopWatch.StopWatch, "#0.00 ms")
Debug.Print Format(nextresult, "#,### resulting lines")
End Sub
Compared to the OP's code the speed improvement is 1:186. A 20 minute run would then only take a couple of seconds.
answered Sep 25 '15 at 17:07
user1016274user1016274
1,246713
1,246713
add a comment |
add a comment |
I would use the Power Query Add-In for this. It has a Text.Contains function that is roughly similar to VB's InStr. I had a go at this particular challenge and got it working. You can download and use my demo file from my OneDrive:
http://1drv.ms/1AzPAZp
It's the file: Power Query demo - Searching for a list of strings among another list of strings.xlsx.
As described on the ReadMe sheet, I didn't have to write many functions - it was mostly built by clicking around the UI.
My design is to cross-join the Search and Target tables (the equivalent of your Sheet1 and Sheet2 I think) to get every possibly combination, then apply the Text.Contains function and filter on the result.
A key design objective is speed - it runs in about 1 second for the current semi-random test data:
19 Search Strings (currently single words)
78780 Target Strings (currently lines from War and Peace)
(so around 1.5 million combinations)
9268 Output matches.
So non-trivial scale, but nowhere near your requirements. Hopefully that will scale up to your needs - I'm keen to hear how it goes.
Note that the Target_Strings query could be replaced with one querying data directly from a database or website - Power Query is not limited to Excel as a datasource.
add a comment |
I would use the Power Query Add-In for this. It has a Text.Contains function that is roughly similar to VB's InStr. I had a go at this particular challenge and got it working. You can download and use my demo file from my OneDrive:
http://1drv.ms/1AzPAZp
It's the file: Power Query demo - Searching for a list of strings among another list of strings.xlsx.
As described on the ReadMe sheet, I didn't have to write many functions - it was mostly built by clicking around the UI.
My design is to cross-join the Search and Target tables (the equivalent of your Sheet1 and Sheet2 I think) to get every possibly combination, then apply the Text.Contains function and filter on the result.
A key design objective is speed - it runs in about 1 second for the current semi-random test data:
19 Search Strings (currently single words)
78780 Target Strings (currently lines from War and Peace)
(so around 1.5 million combinations)
9268 Output matches.
So non-trivial scale, but nowhere near your requirements. Hopefully that will scale up to your needs - I'm keen to hear how it goes.
Note that the Target_Strings query could be replaced with one querying data directly from a database or website - Power Query is not limited to Excel as a datasource.
add a comment |
I would use the Power Query Add-In for this. It has a Text.Contains function that is roughly similar to VB's InStr. I had a go at this particular challenge and got it working. You can download and use my demo file from my OneDrive:
http://1drv.ms/1AzPAZp
It's the file: Power Query demo - Searching for a list of strings among another list of strings.xlsx.
As described on the ReadMe sheet, I didn't have to write many functions - it was mostly built by clicking around the UI.
My design is to cross-join the Search and Target tables (the equivalent of your Sheet1 and Sheet2 I think) to get every possibly combination, then apply the Text.Contains function and filter on the result.
A key design objective is speed - it runs in about 1 second for the current semi-random test data:
19 Search Strings (currently single words)
78780 Target Strings (currently lines from War and Peace)
(so around 1.5 million combinations)
9268 Output matches.
So non-trivial scale, but nowhere near your requirements. Hopefully that will scale up to your needs - I'm keen to hear how it goes.
Note that the Target_Strings query could be replaced with one querying data directly from a database or website - Power Query is not limited to Excel as a datasource.
I would use the Power Query Add-In for this. It has a Text.Contains function that is roughly similar to VB's InStr. I had a go at this particular challenge and got it working. You can download and use my demo file from my OneDrive:
http://1drv.ms/1AzPAZp
It's the file: Power Query demo - Searching for a list of strings among another list of strings.xlsx.
As described on the ReadMe sheet, I didn't have to write many functions - it was mostly built by clicking around the UI.
My design is to cross-join the Search and Target tables (the equivalent of your Sheet1 and Sheet2 I think) to get every possibly combination, then apply the Text.Contains function and filter on the result.
A key design objective is speed - it runs in about 1 second for the current semi-random test data:
19 Search Strings (currently single words)
78780 Target Strings (currently lines from War and Peace)
(so around 1.5 million combinations)
9268 Output matches.
So non-trivial scale, but nowhere near your requirements. Hopefully that will scale up to your needs - I'm keen to hear how it goes.
Note that the Target_Strings query could be replaced with one querying data directly from a database or website - Power Query is not limited to Excel as a datasource.
edited Oct 2 '15 at 3:43
answered Sep 24 '15 at 7:45
Mike HoneyMike Honey
1,7791611
1,7791611
add a comment |
add a comment |
Thanks for contributing an answer to Super User!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fsuperuser.com%2fquestions%2f976906%2fhow-to-make-nested-loop-faster-to-find-instr-in-vba%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
You are not comparing 20 million strings. You are comparing 10,000,000*10,000,000 = 100 trillion strings. While I'm sure optimizations could be made, the expectation of getting this down to one minute is simply unrealistic. Is there a reason you don't break out of the inner loop once a match is found?
– Kyle
Sep 23 '15 at 14:32
@Kyle 19:37 mins for 100 Trillion strings? I'd be happy with that ;-)
– misha256
Sep 23 '15 at 20:43