How to make nested loop faster to find instr in vba












2















Problem Description: Loop through excel max rows(approx. 10000000) to find instr. After finding instr, taking the values and copy the values to different sheet. Every time find the match which is instr, copy the value only the matches and paste it to different sheet.



Problem: I am using nested loop and my loop is running slow, so for 10 millions rows its taking around 19:37 mins. I timed it. So first question is there different ways of doing it or how do i make it faster instead of 20 mins, is it possible to compare 20 millions( each sheet 10 million rows, 10 million strings) strings within 1 min or two. Here is my current code



  Sub zym()
Dim x As Long, lastrow As Long, lastrowx As Long, i As Long, ii As Long
Dim ws As Worksheet, ws2 As Worksheet, b As String
Dim j As Long

Set ws = Worksheets("Sheet1")
Set ws2 = Worksheets("Sheet2")
Set ws3 = Worksheets("Sheet3")
j = 1
T1 = GetTickCount

lastrow = ws.UsedRange.Rows.Count + 1
lastrowx = ws2.UsedRange.Rows.Count + 1

ReDim sheet1array(1 To lastrow)
ReDim sheet2array(1 To lastrowx)

For i = LBound(sheet1array) To UBound(sheet1array)
b = "-" & ws.Range("A" & i) & "-"
For ii = LBound(sheet2array) To UBound(sheet2array)
If InStr(1, ws2.Range("A" & ii), b) > 0 Then
ws3.Range("A" & j) = ws2.Range("A" & ii)
j = j + 1
End If

Next ii
Next i
Debug.Print "Array Time = " & (GetTickCount - T1) / 1000#
Debug.Print "Array Count = " & Format(ii, "#,###")

End Sub









share|improve this question

























  • You are not comparing 20 million strings. You are comparing 10,000,000*10,000,000 = 100 trillion strings. While I'm sure optimizations could be made, the expectation of getting this down to one minute is simply unrealistic. Is there a reason you don't break out of the inner loop once a match is found?

    – Kyle
    Sep 23 '15 at 14:32













  • @Kyle 19:37 mins for 100 Trillion strings? I'd be happy with that ;-)

    – misha256
    Sep 23 '15 at 20:43


















2















Problem Description: Loop through excel max rows(approx. 10000000) to find instr. After finding instr, taking the values and copy the values to different sheet. Every time find the match which is instr, copy the value only the matches and paste it to different sheet.



Problem: I am using nested loop and my loop is running slow, so for 10 millions rows its taking around 19:37 mins. I timed it. So first question is there different ways of doing it or how do i make it faster instead of 20 mins, is it possible to compare 20 millions( each sheet 10 million rows, 10 million strings) strings within 1 min or two. Here is my current code



  Sub zym()
Dim x As Long, lastrow As Long, lastrowx As Long, i As Long, ii As Long
Dim ws As Worksheet, ws2 As Worksheet, b As String
Dim j As Long

Set ws = Worksheets("Sheet1")
Set ws2 = Worksheets("Sheet2")
Set ws3 = Worksheets("Sheet3")
j = 1
T1 = GetTickCount

lastrow = ws.UsedRange.Rows.Count + 1
lastrowx = ws2.UsedRange.Rows.Count + 1

ReDim sheet1array(1 To lastrow)
ReDim sheet2array(1 To lastrowx)

For i = LBound(sheet1array) To UBound(sheet1array)
b = "-" & ws.Range("A" & i) & "-"
For ii = LBound(sheet2array) To UBound(sheet2array)
If InStr(1, ws2.Range("A" & ii), b) > 0 Then
ws3.Range("A" & j) = ws2.Range("A" & ii)
j = j + 1
End If

Next ii
Next i
Debug.Print "Array Time = " & (GetTickCount - T1) / 1000#
Debug.Print "Array Count = " & Format(ii, "#,###")

End Sub









share|improve this question

























  • You are not comparing 20 million strings. You are comparing 10,000,000*10,000,000 = 100 trillion strings. While I'm sure optimizations could be made, the expectation of getting this down to one minute is simply unrealistic. Is there a reason you don't break out of the inner loop once a match is found?

    – Kyle
    Sep 23 '15 at 14:32













  • @Kyle 19:37 mins for 100 Trillion strings? I'd be happy with that ;-)

    – misha256
    Sep 23 '15 at 20:43
















2












2








2








Problem Description: Loop through excel max rows(approx. 10000000) to find instr. After finding instr, taking the values and copy the values to different sheet. Every time find the match which is instr, copy the value only the matches and paste it to different sheet.



Problem: I am using nested loop and my loop is running slow, so for 10 millions rows its taking around 19:37 mins. I timed it. So first question is there different ways of doing it or how do i make it faster instead of 20 mins, is it possible to compare 20 millions( each sheet 10 million rows, 10 million strings) strings within 1 min or two. Here is my current code



  Sub zym()
Dim x As Long, lastrow As Long, lastrowx As Long, i As Long, ii As Long
Dim ws As Worksheet, ws2 As Worksheet, b As String
Dim j As Long

Set ws = Worksheets("Sheet1")
Set ws2 = Worksheets("Sheet2")
Set ws3 = Worksheets("Sheet3")
j = 1
T1 = GetTickCount

lastrow = ws.UsedRange.Rows.Count + 1
lastrowx = ws2.UsedRange.Rows.Count + 1

ReDim sheet1array(1 To lastrow)
ReDim sheet2array(1 To lastrowx)

For i = LBound(sheet1array) To UBound(sheet1array)
b = "-" & ws.Range("A" & i) & "-"
For ii = LBound(sheet2array) To UBound(sheet2array)
If InStr(1, ws2.Range("A" & ii), b) > 0 Then
ws3.Range("A" & j) = ws2.Range("A" & ii)
j = j + 1
End If

Next ii
Next i
Debug.Print "Array Time = " & (GetTickCount - T1) / 1000#
Debug.Print "Array Count = " & Format(ii, "#,###")

End Sub









share|improve this question
















Problem Description: Loop through excel max rows(approx. 10000000) to find instr. After finding instr, taking the values and copy the values to different sheet. Every time find the match which is instr, copy the value only the matches and paste it to different sheet.



Problem: I am using nested loop and my loop is running slow, so for 10 millions rows its taking around 19:37 mins. I timed it. So first question is there different ways of doing it or how do i make it faster instead of 20 mins, is it possible to compare 20 millions( each sheet 10 million rows, 10 million strings) strings within 1 min or two. Here is my current code



  Sub zym()
Dim x As Long, lastrow As Long, lastrowx As Long, i As Long, ii As Long
Dim ws As Worksheet, ws2 As Worksheet, b As String
Dim j As Long

Set ws = Worksheets("Sheet1")
Set ws2 = Worksheets("Sheet2")
Set ws3 = Worksheets("Sheet3")
j = 1
T1 = GetTickCount

lastrow = ws.UsedRange.Rows.Count + 1
lastrowx = ws2.UsedRange.Rows.Count + 1

ReDim sheet1array(1 To lastrow)
ReDim sheet2array(1 To lastrowx)

For i = LBound(sheet1array) To UBound(sheet1array)
b = "-" & ws.Range("A" & i) & "-"
For ii = LBound(sheet2array) To UBound(sheet2array)
If InStr(1, ws2.Range("A" & ii), b) > 0 Then
ws3.Range("A" & j) = ws2.Range("A" & ii)
j = j + 1
End If

Next ii
Next i
Debug.Print "Array Time = " & (GetTickCount - T1) / 1000#
Debug.Print "Array Count = " & Format(ii, "#,###")

End Sub






microsoft-excel vba






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Sep 23 '15 at 13:32









Raystafarian

19.5k105089




19.5k105089










asked Sep 23 '15 at 3:39









user3795861user3795861

18210




18210













  • You are not comparing 20 million strings. You are comparing 10,000,000*10,000,000 = 100 trillion strings. While I'm sure optimizations could be made, the expectation of getting this down to one minute is simply unrealistic. Is there a reason you don't break out of the inner loop once a match is found?

    – Kyle
    Sep 23 '15 at 14:32













  • @Kyle 19:37 mins for 100 Trillion strings? I'd be happy with that ;-)

    – misha256
    Sep 23 '15 at 20:43





















  • You are not comparing 20 million strings. You are comparing 10,000,000*10,000,000 = 100 trillion strings. While I'm sure optimizations could be made, the expectation of getting this down to one minute is simply unrealistic. Is there a reason you don't break out of the inner loop once a match is found?

    – Kyle
    Sep 23 '15 at 14:32













  • @Kyle 19:37 mins for 100 Trillion strings? I'd be happy with that ;-)

    – misha256
    Sep 23 '15 at 20:43



















You are not comparing 20 million strings. You are comparing 10,000,000*10,000,000 = 100 trillion strings. While I'm sure optimizations could be made, the expectation of getting this down to one minute is simply unrealistic. Is there a reason you don't break out of the inner loop once a match is found?

– Kyle
Sep 23 '15 at 14:32







You are not comparing 20 million strings. You are comparing 10,000,000*10,000,000 = 100 trillion strings. While I'm sure optimizations could be made, the expectation of getting this down to one minute is simply unrealistic. Is there a reason you don't break out of the inner loop once a match is found?

– Kyle
Sep 23 '15 at 14:32















@Kyle 19:37 mins for 100 Trillion strings? I'd be happy with that ;-)

– misha256
Sep 23 '15 at 20:43







@Kyle 19:37 mins for 100 Trillion strings? I'd be happy with that ;-)

– misha256
Sep 23 '15 at 20:43












3 Answers
3






active

oldest

votes


















0














Reading from and writing to cells on a sheet slows down any macro. The following code copies cell values into arrays and loops through these. Output is copied in chunks from a result array into the target sheet.

On my notebook the original code took 56 sec, this code 3.7 sec:



Sub zym2()
Dim lastrow As Long, i As Long, j As Long, start As Long
Dim ws As Worksheet, ws2 As Worksheet, ws3 As Worksheet
Dim b As String
Dim T1 As Long
Dim arr1, arr2, arr3, c

Set ws = Worksheets("sh1")
Set ws2 = Worksheets("sh2")
Set ws3 = Worksheets("sh3")
ws3.Columns(1).Clear
T1 = Timer

arr1 = Intersect(ws.Columns(1), ws.UsedRange)
lastrow = UBound(arr1)
arr2 = ws2.UsedRange
ReDim arr3(1 To lastrow / 10, 2) ' initial length is arbitrary

j = 0
start = 1
For i = 1 To lastrow
b = "-" & arr1(i, 1) & "-"
For Each c In arr2
If InStr(1, c, b) > 0 Then
If j = UBound(arr3) Then
ws3.Range(Cells(start, 1), Cells(start - 1 + j, 1)) = arr3
start = start + j
j = 0
End If
j = j + 1
arr3(j, 1) = c
End If
Next c
Next i
If j > 0 Then
ws3.Range(Cells(start, 1), Cells(start - 1 + j, 1)) = arr3
End If
Debug.Print "Array Time = " & Format(Timer - T1, "##.0")
Debug.Print "Array Count = " & Format(start - 1 + j, "#,###")
End Sub





share|improve this answer































    0














    Although I have already offered an answer I want to propose a totally different algorithm here in order to improve the performance by another order of magnitude.

    When the "big list" on sheet1 is scanned and matches in sheet2 are searched, the information about a successful search is thrown away after one pass. Sheet1 will contain repetitions of a search value, and when scanning sheet2 we can make use of it's frequency.



    The means to finding unique search values and their frequencies is a dictionary object. To use it in VBA one has to add a reference to "Microsoft Scripting" in the VBA editor.

    The second assumption is that the output list does not need to preserve the input order (because it will be sorted anyway). The following code will produce an output list in sheet3 with search values in the order in which they occur in the big list but with all repetitions in one block. Statements for timing have been commented out as an external class definition is needed for this.



    Sub zym_dict()
    ' http://superuser.com/questions/976906/how-to-make-nested-loop-faster-to-find-instr-in-vba
    ' by E/S/P 2015-09-25
    ' 2nd improvement: use a dictionary object to count unique search items and loop over these
    ' speed 1:13 vs. array version; 1:186 vs. original (cell) version

    Dim numvalues As Long, i As Long, j As Long, nextresult As Long
    Dim numcompared As Long, numresults As Long
    Dim cnt As Long
    Dim ws1 As Worksheet, ws2 As Worksheet, ws3 As Worksheet
    Dim searchterm As String
    Dim values, arr2, results, c, v
    Dim uniq As New Scripting.Dictionary

    ' Dim mStopWatch As New clsStopWatch

    Set ws1 = Worksheets("sheet1")
    Set ws2 = Worksheets("sheet2")
    Set ws3 = Worksheets("sheet3")

    ' mStopWatch.StartWatch

    values = Intersect(ws1.Columns(1), ws1.UsedRange)
    arr2 = Intersect(ws2.Range("A:B"), ws2.UsedRange)
    numcompared = UBound(arr2, 1)

    ' collect unique values and their frequencies
    For i = 1 To UBound(values, 1)
    uniq(values(i, 1)) = uniq(values(i, 1)) + 1
    Next i

    numresults = 0
    ' 2nd index is repeat count
    For j = 1 To numcompared
    arr2(j, 2) = 0
    Next j

    For Each v In uniq
    searchterm = "-" & v & "-"
    cnt = uniq.Item(v)
    For j = 1 To numcompared
    If InStr(1, arr2(j, 1), searchterm) > 0 Then
    ' copy this value multiple times into result array
    arr2(j, 2) = arr2(j, 2) + cnt ' repeat count
    numresults = numresults + cnt
    End If
    Next j
    Next

    ' generate output list
    ReDim results(1 To numresults, 1 To 2)
    ws3.Columns(1).Clear
    nextresult = 0
    For i = 1 To numcompared
    v = arr2(i, 1)
    cnt = arr2(i, 2) ' may be 0!
    For j = 1 To cnt
    results(nextresult + j, 1) = v
    Next j
    nextresult = nextresult + cnt
    Next i

    ' copy values to sheet
    ws3.Range(Cells(1, 1), Cells(nextresult, 2)) = results

    ' Debug.Print "runtime = " & Format(mStopWatch.StopWatch, "#0.00 ms")
    Debug.Print Format(nextresult, "#,### resulting lines")
    End Sub


    Compared to the OP's code the speed improvement is 1:186. A 20 minute run would then only take a couple of seconds.






    share|improve this answer































      0














      I would use the Power Query Add-In for this. It has a Text.Contains function that is roughly similar to VB's InStr. I had a go at this particular challenge and got it working. You can download and use my demo file from my OneDrive:



      http://1drv.ms/1AzPAZp



      It's the file: Power Query demo - Searching for a list of strings among another list of strings.xlsx.



      As described on the ReadMe sheet, I didn't have to write many functions - it was mostly built by clicking around the UI.



      My design is to cross-join the Search and Target tables (the equivalent of your Sheet1 and Sheet2 I think) to get every possibly combination, then apply the Text.Contains function and filter on the result.



      A key design objective is speed - it runs in about 1 second for the current semi-random test data:
      19 Search Strings (currently single words)
      78780 Target Strings (currently lines from War and Peace)
      (so around 1.5 million combinations)
      9268 Output matches.



      So non-trivial scale, but nowhere near your requirements. Hopefully that will scale up to your needs - I'm keen to hear how it goes.



      Note that the Target_Strings query could be replaced with one querying data directly from a database or website - Power Query is not limited to Excel as a datasource.






      share|improve this answer


























        Your Answer








        StackExchange.ready(function() {
        var channelOptions = {
        tags: "".split(" "),
        id: "3"
        };
        initTagRenderer("".split(" "), "".split(" "), channelOptions);

        StackExchange.using("externalEditor", function() {
        // Have to fire editor after snippets, if snippets enabled
        if (StackExchange.settings.snippets.snippetsEnabled) {
        StackExchange.using("snippets", function() {
        createEditor();
        });
        }
        else {
        createEditor();
        }
        });

        function createEditor() {
        StackExchange.prepareEditor({
        heartbeatType: 'answer',
        autoActivateHeartbeat: false,
        convertImagesToLinks: true,
        noModals: true,
        showLowRepImageUploadWarning: true,
        reputationToPostImages: 10,
        bindNavPrevention: true,
        postfix: "",
        imageUploader: {
        brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
        contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
        allowUrls: true
        },
        onDemand: true,
        discardSelector: ".discard-answer"
        ,immediatelyShowMarkdownHelp:true
        });


        }
        });














        draft saved

        draft discarded


















        StackExchange.ready(
        function () {
        StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fsuperuser.com%2fquestions%2f976906%2fhow-to-make-nested-loop-faster-to-find-instr-in-vba%23new-answer', 'question_page');
        }
        );

        Post as a guest















        Required, but never shown

























        3 Answers
        3






        active

        oldest

        votes








        3 Answers
        3






        active

        oldest

        votes









        active

        oldest

        votes






        active

        oldest

        votes









        0














        Reading from and writing to cells on a sheet slows down any macro. The following code copies cell values into arrays and loops through these. Output is copied in chunks from a result array into the target sheet.

        On my notebook the original code took 56 sec, this code 3.7 sec:



        Sub zym2()
        Dim lastrow As Long, i As Long, j As Long, start As Long
        Dim ws As Worksheet, ws2 As Worksheet, ws3 As Worksheet
        Dim b As String
        Dim T1 As Long
        Dim arr1, arr2, arr3, c

        Set ws = Worksheets("sh1")
        Set ws2 = Worksheets("sh2")
        Set ws3 = Worksheets("sh3")
        ws3.Columns(1).Clear
        T1 = Timer

        arr1 = Intersect(ws.Columns(1), ws.UsedRange)
        lastrow = UBound(arr1)
        arr2 = ws2.UsedRange
        ReDim arr3(1 To lastrow / 10, 2) ' initial length is arbitrary

        j = 0
        start = 1
        For i = 1 To lastrow
        b = "-" & arr1(i, 1) & "-"
        For Each c In arr2
        If InStr(1, c, b) > 0 Then
        If j = UBound(arr3) Then
        ws3.Range(Cells(start, 1), Cells(start - 1 + j, 1)) = arr3
        start = start + j
        j = 0
        End If
        j = j + 1
        arr3(j, 1) = c
        End If
        Next c
        Next i
        If j > 0 Then
        ws3.Range(Cells(start, 1), Cells(start - 1 + j, 1)) = arr3
        End If
        Debug.Print "Array Time = " & Format(Timer - T1, "##.0")
        Debug.Print "Array Count = " & Format(start - 1 + j, "#,###")
        End Sub





        share|improve this answer




























          0














          Reading from and writing to cells on a sheet slows down any macro. The following code copies cell values into arrays and loops through these. Output is copied in chunks from a result array into the target sheet.

          On my notebook the original code took 56 sec, this code 3.7 sec:



          Sub zym2()
          Dim lastrow As Long, i As Long, j As Long, start As Long
          Dim ws As Worksheet, ws2 As Worksheet, ws3 As Worksheet
          Dim b As String
          Dim T1 As Long
          Dim arr1, arr2, arr3, c

          Set ws = Worksheets("sh1")
          Set ws2 = Worksheets("sh2")
          Set ws3 = Worksheets("sh3")
          ws3.Columns(1).Clear
          T1 = Timer

          arr1 = Intersect(ws.Columns(1), ws.UsedRange)
          lastrow = UBound(arr1)
          arr2 = ws2.UsedRange
          ReDim arr3(1 To lastrow / 10, 2) ' initial length is arbitrary

          j = 0
          start = 1
          For i = 1 To lastrow
          b = "-" & arr1(i, 1) & "-"
          For Each c In arr2
          If InStr(1, c, b) > 0 Then
          If j = UBound(arr3) Then
          ws3.Range(Cells(start, 1), Cells(start - 1 + j, 1)) = arr3
          start = start + j
          j = 0
          End If
          j = j + 1
          arr3(j, 1) = c
          End If
          Next c
          Next i
          If j > 0 Then
          ws3.Range(Cells(start, 1), Cells(start - 1 + j, 1)) = arr3
          End If
          Debug.Print "Array Time = " & Format(Timer - T1, "##.0")
          Debug.Print "Array Count = " & Format(start - 1 + j, "#,###")
          End Sub





          share|improve this answer


























            0












            0








            0







            Reading from and writing to cells on a sheet slows down any macro. The following code copies cell values into arrays and loops through these. Output is copied in chunks from a result array into the target sheet.

            On my notebook the original code took 56 sec, this code 3.7 sec:



            Sub zym2()
            Dim lastrow As Long, i As Long, j As Long, start As Long
            Dim ws As Worksheet, ws2 As Worksheet, ws3 As Worksheet
            Dim b As String
            Dim T1 As Long
            Dim arr1, arr2, arr3, c

            Set ws = Worksheets("sh1")
            Set ws2 = Worksheets("sh2")
            Set ws3 = Worksheets("sh3")
            ws3.Columns(1).Clear
            T1 = Timer

            arr1 = Intersect(ws.Columns(1), ws.UsedRange)
            lastrow = UBound(arr1)
            arr2 = ws2.UsedRange
            ReDim arr3(1 To lastrow / 10, 2) ' initial length is arbitrary

            j = 0
            start = 1
            For i = 1 To lastrow
            b = "-" & arr1(i, 1) & "-"
            For Each c In arr2
            If InStr(1, c, b) > 0 Then
            If j = UBound(arr3) Then
            ws3.Range(Cells(start, 1), Cells(start - 1 + j, 1)) = arr3
            start = start + j
            j = 0
            End If
            j = j + 1
            arr3(j, 1) = c
            End If
            Next c
            Next i
            If j > 0 Then
            ws3.Range(Cells(start, 1), Cells(start - 1 + j, 1)) = arr3
            End If
            Debug.Print "Array Time = " & Format(Timer - T1, "##.0")
            Debug.Print "Array Count = " & Format(start - 1 + j, "#,###")
            End Sub





            share|improve this answer













            Reading from and writing to cells on a sheet slows down any macro. The following code copies cell values into arrays and loops through these. Output is copied in chunks from a result array into the target sheet.

            On my notebook the original code took 56 sec, this code 3.7 sec:



            Sub zym2()
            Dim lastrow As Long, i As Long, j As Long, start As Long
            Dim ws As Worksheet, ws2 As Worksheet, ws3 As Worksheet
            Dim b As String
            Dim T1 As Long
            Dim arr1, arr2, arr3, c

            Set ws = Worksheets("sh1")
            Set ws2 = Worksheets("sh2")
            Set ws3 = Worksheets("sh3")
            ws3.Columns(1).Clear
            T1 = Timer

            arr1 = Intersect(ws.Columns(1), ws.UsedRange)
            lastrow = UBound(arr1)
            arr2 = ws2.UsedRange
            ReDim arr3(1 To lastrow / 10, 2) ' initial length is arbitrary

            j = 0
            start = 1
            For i = 1 To lastrow
            b = "-" & arr1(i, 1) & "-"
            For Each c In arr2
            If InStr(1, c, b) > 0 Then
            If j = UBound(arr3) Then
            ws3.Range(Cells(start, 1), Cells(start - 1 + j, 1)) = arr3
            start = start + j
            j = 0
            End If
            j = j + 1
            arr3(j, 1) = c
            End If
            Next c
            Next i
            If j > 0 Then
            ws3.Range(Cells(start, 1), Cells(start - 1 + j, 1)) = arr3
            End If
            Debug.Print "Array Time = " & Format(Timer - T1, "##.0")
            Debug.Print "Array Count = " & Format(start - 1 + j, "#,###")
            End Sub






            share|improve this answer












            share|improve this answer



            share|improve this answer










            answered Sep 24 '15 at 18:55









            user1016274user1016274

            1,246713




            1,246713

























                0














                Although I have already offered an answer I want to propose a totally different algorithm here in order to improve the performance by another order of magnitude.

                When the "big list" on sheet1 is scanned and matches in sheet2 are searched, the information about a successful search is thrown away after one pass. Sheet1 will contain repetitions of a search value, and when scanning sheet2 we can make use of it's frequency.



                The means to finding unique search values and their frequencies is a dictionary object. To use it in VBA one has to add a reference to "Microsoft Scripting" in the VBA editor.

                The second assumption is that the output list does not need to preserve the input order (because it will be sorted anyway). The following code will produce an output list in sheet3 with search values in the order in which they occur in the big list but with all repetitions in one block. Statements for timing have been commented out as an external class definition is needed for this.



                Sub zym_dict()
                ' http://superuser.com/questions/976906/how-to-make-nested-loop-faster-to-find-instr-in-vba
                ' by E/S/P 2015-09-25
                ' 2nd improvement: use a dictionary object to count unique search items and loop over these
                ' speed 1:13 vs. array version; 1:186 vs. original (cell) version

                Dim numvalues As Long, i As Long, j As Long, nextresult As Long
                Dim numcompared As Long, numresults As Long
                Dim cnt As Long
                Dim ws1 As Worksheet, ws2 As Worksheet, ws3 As Worksheet
                Dim searchterm As String
                Dim values, arr2, results, c, v
                Dim uniq As New Scripting.Dictionary

                ' Dim mStopWatch As New clsStopWatch

                Set ws1 = Worksheets("sheet1")
                Set ws2 = Worksheets("sheet2")
                Set ws3 = Worksheets("sheet3")

                ' mStopWatch.StartWatch

                values = Intersect(ws1.Columns(1), ws1.UsedRange)
                arr2 = Intersect(ws2.Range("A:B"), ws2.UsedRange)
                numcompared = UBound(arr2, 1)

                ' collect unique values and their frequencies
                For i = 1 To UBound(values, 1)
                uniq(values(i, 1)) = uniq(values(i, 1)) + 1
                Next i

                numresults = 0
                ' 2nd index is repeat count
                For j = 1 To numcompared
                arr2(j, 2) = 0
                Next j

                For Each v In uniq
                searchterm = "-" & v & "-"
                cnt = uniq.Item(v)
                For j = 1 To numcompared
                If InStr(1, arr2(j, 1), searchterm) > 0 Then
                ' copy this value multiple times into result array
                arr2(j, 2) = arr2(j, 2) + cnt ' repeat count
                numresults = numresults + cnt
                End If
                Next j
                Next

                ' generate output list
                ReDim results(1 To numresults, 1 To 2)
                ws3.Columns(1).Clear
                nextresult = 0
                For i = 1 To numcompared
                v = arr2(i, 1)
                cnt = arr2(i, 2) ' may be 0!
                For j = 1 To cnt
                results(nextresult + j, 1) = v
                Next j
                nextresult = nextresult + cnt
                Next i

                ' copy values to sheet
                ws3.Range(Cells(1, 1), Cells(nextresult, 2)) = results

                ' Debug.Print "runtime = " & Format(mStopWatch.StopWatch, "#0.00 ms")
                Debug.Print Format(nextresult, "#,### resulting lines")
                End Sub


                Compared to the OP's code the speed improvement is 1:186. A 20 minute run would then only take a couple of seconds.






                share|improve this answer




























                  0














                  Although I have already offered an answer I want to propose a totally different algorithm here in order to improve the performance by another order of magnitude.

                  When the "big list" on sheet1 is scanned and matches in sheet2 are searched, the information about a successful search is thrown away after one pass. Sheet1 will contain repetitions of a search value, and when scanning sheet2 we can make use of it's frequency.



                  The means to finding unique search values and their frequencies is a dictionary object. To use it in VBA one has to add a reference to "Microsoft Scripting" in the VBA editor.

                  The second assumption is that the output list does not need to preserve the input order (because it will be sorted anyway). The following code will produce an output list in sheet3 with search values in the order in which they occur in the big list but with all repetitions in one block. Statements for timing have been commented out as an external class definition is needed for this.



                  Sub zym_dict()
                  ' http://superuser.com/questions/976906/how-to-make-nested-loop-faster-to-find-instr-in-vba
                  ' by E/S/P 2015-09-25
                  ' 2nd improvement: use a dictionary object to count unique search items and loop over these
                  ' speed 1:13 vs. array version; 1:186 vs. original (cell) version

                  Dim numvalues As Long, i As Long, j As Long, nextresult As Long
                  Dim numcompared As Long, numresults As Long
                  Dim cnt As Long
                  Dim ws1 As Worksheet, ws2 As Worksheet, ws3 As Worksheet
                  Dim searchterm As String
                  Dim values, arr2, results, c, v
                  Dim uniq As New Scripting.Dictionary

                  ' Dim mStopWatch As New clsStopWatch

                  Set ws1 = Worksheets("sheet1")
                  Set ws2 = Worksheets("sheet2")
                  Set ws3 = Worksheets("sheet3")

                  ' mStopWatch.StartWatch

                  values = Intersect(ws1.Columns(1), ws1.UsedRange)
                  arr2 = Intersect(ws2.Range("A:B"), ws2.UsedRange)
                  numcompared = UBound(arr2, 1)

                  ' collect unique values and their frequencies
                  For i = 1 To UBound(values, 1)
                  uniq(values(i, 1)) = uniq(values(i, 1)) + 1
                  Next i

                  numresults = 0
                  ' 2nd index is repeat count
                  For j = 1 To numcompared
                  arr2(j, 2) = 0
                  Next j

                  For Each v In uniq
                  searchterm = "-" & v & "-"
                  cnt = uniq.Item(v)
                  For j = 1 To numcompared
                  If InStr(1, arr2(j, 1), searchterm) > 0 Then
                  ' copy this value multiple times into result array
                  arr2(j, 2) = arr2(j, 2) + cnt ' repeat count
                  numresults = numresults + cnt
                  End If
                  Next j
                  Next

                  ' generate output list
                  ReDim results(1 To numresults, 1 To 2)
                  ws3.Columns(1).Clear
                  nextresult = 0
                  For i = 1 To numcompared
                  v = arr2(i, 1)
                  cnt = arr2(i, 2) ' may be 0!
                  For j = 1 To cnt
                  results(nextresult + j, 1) = v
                  Next j
                  nextresult = nextresult + cnt
                  Next i

                  ' copy values to sheet
                  ws3.Range(Cells(1, 1), Cells(nextresult, 2)) = results

                  ' Debug.Print "runtime = " & Format(mStopWatch.StopWatch, "#0.00 ms")
                  Debug.Print Format(nextresult, "#,### resulting lines")
                  End Sub


                  Compared to the OP's code the speed improvement is 1:186. A 20 minute run would then only take a couple of seconds.






                  share|improve this answer


























                    0












                    0








                    0







                    Although I have already offered an answer I want to propose a totally different algorithm here in order to improve the performance by another order of magnitude.

                    When the "big list" on sheet1 is scanned and matches in sheet2 are searched, the information about a successful search is thrown away after one pass. Sheet1 will contain repetitions of a search value, and when scanning sheet2 we can make use of it's frequency.



                    The means to finding unique search values and their frequencies is a dictionary object. To use it in VBA one has to add a reference to "Microsoft Scripting" in the VBA editor.

                    The second assumption is that the output list does not need to preserve the input order (because it will be sorted anyway). The following code will produce an output list in sheet3 with search values in the order in which they occur in the big list but with all repetitions in one block. Statements for timing have been commented out as an external class definition is needed for this.



                    Sub zym_dict()
                    ' http://superuser.com/questions/976906/how-to-make-nested-loop-faster-to-find-instr-in-vba
                    ' by E/S/P 2015-09-25
                    ' 2nd improvement: use a dictionary object to count unique search items and loop over these
                    ' speed 1:13 vs. array version; 1:186 vs. original (cell) version

                    Dim numvalues As Long, i As Long, j As Long, nextresult As Long
                    Dim numcompared As Long, numresults As Long
                    Dim cnt As Long
                    Dim ws1 As Worksheet, ws2 As Worksheet, ws3 As Worksheet
                    Dim searchterm As String
                    Dim values, arr2, results, c, v
                    Dim uniq As New Scripting.Dictionary

                    ' Dim mStopWatch As New clsStopWatch

                    Set ws1 = Worksheets("sheet1")
                    Set ws2 = Worksheets("sheet2")
                    Set ws3 = Worksheets("sheet3")

                    ' mStopWatch.StartWatch

                    values = Intersect(ws1.Columns(1), ws1.UsedRange)
                    arr2 = Intersect(ws2.Range("A:B"), ws2.UsedRange)
                    numcompared = UBound(arr2, 1)

                    ' collect unique values and their frequencies
                    For i = 1 To UBound(values, 1)
                    uniq(values(i, 1)) = uniq(values(i, 1)) + 1
                    Next i

                    numresults = 0
                    ' 2nd index is repeat count
                    For j = 1 To numcompared
                    arr2(j, 2) = 0
                    Next j

                    For Each v In uniq
                    searchterm = "-" & v & "-"
                    cnt = uniq.Item(v)
                    For j = 1 To numcompared
                    If InStr(1, arr2(j, 1), searchterm) > 0 Then
                    ' copy this value multiple times into result array
                    arr2(j, 2) = arr2(j, 2) + cnt ' repeat count
                    numresults = numresults + cnt
                    End If
                    Next j
                    Next

                    ' generate output list
                    ReDim results(1 To numresults, 1 To 2)
                    ws3.Columns(1).Clear
                    nextresult = 0
                    For i = 1 To numcompared
                    v = arr2(i, 1)
                    cnt = arr2(i, 2) ' may be 0!
                    For j = 1 To cnt
                    results(nextresult + j, 1) = v
                    Next j
                    nextresult = nextresult + cnt
                    Next i

                    ' copy values to sheet
                    ws3.Range(Cells(1, 1), Cells(nextresult, 2)) = results

                    ' Debug.Print "runtime = " & Format(mStopWatch.StopWatch, "#0.00 ms")
                    Debug.Print Format(nextresult, "#,### resulting lines")
                    End Sub


                    Compared to the OP's code the speed improvement is 1:186. A 20 minute run would then only take a couple of seconds.






                    share|improve this answer













                    Although I have already offered an answer I want to propose a totally different algorithm here in order to improve the performance by another order of magnitude.

                    When the "big list" on sheet1 is scanned and matches in sheet2 are searched, the information about a successful search is thrown away after one pass. Sheet1 will contain repetitions of a search value, and when scanning sheet2 we can make use of it's frequency.



                    The means to finding unique search values and their frequencies is a dictionary object. To use it in VBA one has to add a reference to "Microsoft Scripting" in the VBA editor.

                    The second assumption is that the output list does not need to preserve the input order (because it will be sorted anyway). The following code will produce an output list in sheet3 with search values in the order in which they occur in the big list but with all repetitions in one block. Statements for timing have been commented out as an external class definition is needed for this.



                    Sub zym_dict()
                    ' http://superuser.com/questions/976906/how-to-make-nested-loop-faster-to-find-instr-in-vba
                    ' by E/S/P 2015-09-25
                    ' 2nd improvement: use a dictionary object to count unique search items and loop over these
                    ' speed 1:13 vs. array version; 1:186 vs. original (cell) version

                    Dim numvalues As Long, i As Long, j As Long, nextresult As Long
                    Dim numcompared As Long, numresults As Long
                    Dim cnt As Long
                    Dim ws1 As Worksheet, ws2 As Worksheet, ws3 As Worksheet
                    Dim searchterm As String
                    Dim values, arr2, results, c, v
                    Dim uniq As New Scripting.Dictionary

                    ' Dim mStopWatch As New clsStopWatch

                    Set ws1 = Worksheets("sheet1")
                    Set ws2 = Worksheets("sheet2")
                    Set ws3 = Worksheets("sheet3")

                    ' mStopWatch.StartWatch

                    values = Intersect(ws1.Columns(1), ws1.UsedRange)
                    arr2 = Intersect(ws2.Range("A:B"), ws2.UsedRange)
                    numcompared = UBound(arr2, 1)

                    ' collect unique values and their frequencies
                    For i = 1 To UBound(values, 1)
                    uniq(values(i, 1)) = uniq(values(i, 1)) + 1
                    Next i

                    numresults = 0
                    ' 2nd index is repeat count
                    For j = 1 To numcompared
                    arr2(j, 2) = 0
                    Next j

                    For Each v In uniq
                    searchterm = "-" & v & "-"
                    cnt = uniq.Item(v)
                    For j = 1 To numcompared
                    If InStr(1, arr2(j, 1), searchterm) > 0 Then
                    ' copy this value multiple times into result array
                    arr2(j, 2) = arr2(j, 2) + cnt ' repeat count
                    numresults = numresults + cnt
                    End If
                    Next j
                    Next

                    ' generate output list
                    ReDim results(1 To numresults, 1 To 2)
                    ws3.Columns(1).Clear
                    nextresult = 0
                    For i = 1 To numcompared
                    v = arr2(i, 1)
                    cnt = arr2(i, 2) ' may be 0!
                    For j = 1 To cnt
                    results(nextresult + j, 1) = v
                    Next j
                    nextresult = nextresult + cnt
                    Next i

                    ' copy values to sheet
                    ws3.Range(Cells(1, 1), Cells(nextresult, 2)) = results

                    ' Debug.Print "runtime = " & Format(mStopWatch.StopWatch, "#0.00 ms")
                    Debug.Print Format(nextresult, "#,### resulting lines")
                    End Sub


                    Compared to the OP's code the speed improvement is 1:186. A 20 minute run would then only take a couple of seconds.







                    share|improve this answer












                    share|improve this answer



                    share|improve this answer










                    answered Sep 25 '15 at 17:07









                    user1016274user1016274

                    1,246713




                    1,246713























                        0














                        I would use the Power Query Add-In for this. It has a Text.Contains function that is roughly similar to VB's InStr. I had a go at this particular challenge and got it working. You can download and use my demo file from my OneDrive:



                        http://1drv.ms/1AzPAZp



                        It's the file: Power Query demo - Searching for a list of strings among another list of strings.xlsx.



                        As described on the ReadMe sheet, I didn't have to write many functions - it was mostly built by clicking around the UI.



                        My design is to cross-join the Search and Target tables (the equivalent of your Sheet1 and Sheet2 I think) to get every possibly combination, then apply the Text.Contains function and filter on the result.



                        A key design objective is speed - it runs in about 1 second for the current semi-random test data:
                        19 Search Strings (currently single words)
                        78780 Target Strings (currently lines from War and Peace)
                        (so around 1.5 million combinations)
                        9268 Output matches.



                        So non-trivial scale, but nowhere near your requirements. Hopefully that will scale up to your needs - I'm keen to hear how it goes.



                        Note that the Target_Strings query could be replaced with one querying data directly from a database or website - Power Query is not limited to Excel as a datasource.






                        share|improve this answer






























                          0














                          I would use the Power Query Add-In for this. It has a Text.Contains function that is roughly similar to VB's InStr. I had a go at this particular challenge and got it working. You can download and use my demo file from my OneDrive:



                          http://1drv.ms/1AzPAZp



                          It's the file: Power Query demo - Searching for a list of strings among another list of strings.xlsx.



                          As described on the ReadMe sheet, I didn't have to write many functions - it was mostly built by clicking around the UI.



                          My design is to cross-join the Search and Target tables (the equivalent of your Sheet1 and Sheet2 I think) to get every possibly combination, then apply the Text.Contains function and filter on the result.



                          A key design objective is speed - it runs in about 1 second for the current semi-random test data:
                          19 Search Strings (currently single words)
                          78780 Target Strings (currently lines from War and Peace)
                          (so around 1.5 million combinations)
                          9268 Output matches.



                          So non-trivial scale, but nowhere near your requirements. Hopefully that will scale up to your needs - I'm keen to hear how it goes.



                          Note that the Target_Strings query could be replaced with one querying data directly from a database or website - Power Query is not limited to Excel as a datasource.






                          share|improve this answer




























                            0












                            0








                            0







                            I would use the Power Query Add-In for this. It has a Text.Contains function that is roughly similar to VB's InStr. I had a go at this particular challenge and got it working. You can download and use my demo file from my OneDrive:



                            http://1drv.ms/1AzPAZp



                            It's the file: Power Query demo - Searching for a list of strings among another list of strings.xlsx.



                            As described on the ReadMe sheet, I didn't have to write many functions - it was mostly built by clicking around the UI.



                            My design is to cross-join the Search and Target tables (the equivalent of your Sheet1 and Sheet2 I think) to get every possibly combination, then apply the Text.Contains function and filter on the result.



                            A key design objective is speed - it runs in about 1 second for the current semi-random test data:
                            19 Search Strings (currently single words)
                            78780 Target Strings (currently lines from War and Peace)
                            (so around 1.5 million combinations)
                            9268 Output matches.



                            So non-trivial scale, but nowhere near your requirements. Hopefully that will scale up to your needs - I'm keen to hear how it goes.



                            Note that the Target_Strings query could be replaced with one querying data directly from a database or website - Power Query is not limited to Excel as a datasource.






                            share|improve this answer















                            I would use the Power Query Add-In for this. It has a Text.Contains function that is roughly similar to VB's InStr. I had a go at this particular challenge and got it working. You can download and use my demo file from my OneDrive:



                            http://1drv.ms/1AzPAZp



                            It's the file: Power Query demo - Searching for a list of strings among another list of strings.xlsx.



                            As described on the ReadMe sheet, I didn't have to write many functions - it was mostly built by clicking around the UI.



                            My design is to cross-join the Search and Target tables (the equivalent of your Sheet1 and Sheet2 I think) to get every possibly combination, then apply the Text.Contains function and filter on the result.



                            A key design objective is speed - it runs in about 1 second for the current semi-random test data:
                            19 Search Strings (currently single words)
                            78780 Target Strings (currently lines from War and Peace)
                            (so around 1.5 million combinations)
                            9268 Output matches.



                            So non-trivial scale, but nowhere near your requirements. Hopefully that will scale up to your needs - I'm keen to hear how it goes.



                            Note that the Target_Strings query could be replaced with one querying data directly from a database or website - Power Query is not limited to Excel as a datasource.







                            share|improve this answer














                            share|improve this answer



                            share|improve this answer








                            edited Oct 2 '15 at 3:43

























                            answered Sep 24 '15 at 7:45









                            Mike HoneyMike Honey

                            1,7791611




                            1,7791611






























                                draft saved

                                draft discarded




















































                                Thanks for contributing an answer to Super User!


                                • Please be sure to answer the question. Provide details and share your research!

                                But avoid



                                • Asking for help, clarification, or responding to other answers.

                                • Making statements based on opinion; back them up with references or personal experience.


                                To learn more, see our tips on writing great answers.




                                draft saved


                                draft discarded














                                StackExchange.ready(
                                function () {
                                StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fsuperuser.com%2fquestions%2f976906%2fhow-to-make-nested-loop-faster-to-find-instr-in-vba%23new-answer', 'question_page');
                                }
                                );

                                Post as a guest















                                Required, but never shown





















































                                Required, but never shown














                                Required, but never shown












                                Required, but never shown







                                Required, but never shown

































                                Required, but never shown














                                Required, but never shown












                                Required, but never shown







                                Required, but never shown







                                Popular posts from this blog

                                How do I know what Microsoft account the skydrive app is syncing to?

                                When does type information flow backwards in C++?

                                Grease: Live!