How to Create a Proxy Grabber/Scraper in Visual Basic


Introduction:
Welcome to my tutorial on how to create a proxy IP:port grabber in Visual Basic.

Steps of Creation:
Step 1:

First, create a form with one button. Clicking it will let the user choose a save location and start the process.

We also want to import a few namespaces and create a global string holding the link from which we want to extract the proxy information:

  Imports System.Text.RegularExpressions
  Imports System.Net
  Imports System.IO

  Dim link As String = "http://free-proxy-list.net/uk-proxy.html"

Step 2:
Now we want to make a couple of functions. We will use these later on to extract smaller strings from bigger strings.

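  ' Returns the text between Str1 and Str2. Index selects which occurrence of Str1 to use.
  ' Regex.Escape makes both delimiters match literally rather than as regex patterns.
  Private Function GetBetween(ByVal Source As String, ByVal Str1 As String, ByVal Str2 As String, Optional ByVal Index As Integer = 0) As String
      Return Regex.Split(Regex.Split(Source, Regex.Escape(Str1))(Index + 1), Regex.Escape(Str2))(0)
  End Function

  ' Returns every piece of text found between a Str1 and the next Str2.
  Private Function GetBetweenAll(ByVal Source As String, ByVal Str1 As String, ByVal Str2 As String) As String()
      Dim Results As New List(Of String)
      Dim T As New List(Of String)
      T.AddRange(Regex.Split(Source, Regex.Escape(Str1)))
      ' The first piece is whatever came before the first Str1, so discard it.
      T.RemoveAt(0)
      For Each I As String In T
          Results.Add(Regex.Split(I, Regex.Escape(Str2))(0))
      Next
      Return Results.ToArray()
  End Function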
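
To see what these helpers return, here is a minimal console sketch that runs them on a single made-up table row. The module name, the sample row and its address are assumptions for illustration only, and the helpers are repeated (with the delimiters escaped so they match literally) so the sketch compiles on its own.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module GetBetweenDemo
    ' Helpers in the style of Step 2; Regex.Escape makes the delimiters match literally.
    Function GetBetween(ByVal Source As String, ByVal Str1 As String, ByVal Str2 As String, Optional ByVal Index As Integer = 0) As String
        Return Regex.Split(Regex.Split(Source, Regex.Escape(Str1))(Index + 1), Regex.Escape(Str2))(0)
    End Function

    Function GetBetweenAll(ByVal Source As String, ByVal Str1 As String, ByVal Str2 As String) As String()
        Dim Results As New List(Of String)
        Dim T As New List(Of String)
        T.AddRange(Regex.Split(Source, Regex.Escape(Str1)))
        T.RemoveAt(0) ' drop the text before the first Str1
        For Each I As String In T
            Results.Add(Regex.Split(I, Regex.Escape(Str2))(0))
        Next
        Return Results.ToArray()
    End Function

    Sub Main()
        ' Made-up sample row; the address comes from a documentation-only range.
        Dim row As String = "<tr><td>192.0.2.10</td><td>8080</td></tr>"

        Console.WriteLine(GetBetween(row, "<td>", "</td>"))     ' prints 192.0.2.10
        Console.WriteLine(GetBetween(row, "<td>", "</td>", 1))  ' prints 8080

        ' GetBetweenAll returns both cells in order.
        For Each cell As String In GetBetweenAll(row, "<td>", "</td>")
            Console.WriteLine(cell)
        Next
    End Sub
End Module
```

Note that GetBetween throws an IndexOutOfRangeException if Str1 does not occur at least Index + 1 times, so it is worth checking with Contains first, as the later steps do.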

Step 3:
For the button click event, we first let the user choose where to save the text file. Then we check that the chosen path is not Nothing or empty.

  Dim fo As New SaveFileDialog()
  fo.Filter = "Text Files|*.txt"
  fo.FilterIndex = 1
  ' Continue only if the user confirmed the dialog and a file path was chosen.
  If fo.ShowDialog() = DialogResult.OK AndAlso Not String.IsNullOrEmpty(fo.FileName) Then
  End If

Step 4:
Within the If statement, we fetch the source code of the page at the given link and extract the information. Once it is extracted, we write it line by line to the chosen text file.

  ' Request the page, identifying as a regular browser.
  Dim r As HttpWebRequest = DirectCast(WebRequest.Create(link), HttpWebRequest)
  r.UserAgent = "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.69 Safari/537.36"
  r.KeepAlive = True
  Dim src As String
  ' The Using blocks close the response and the reader once the page has been read.
  Using re As HttpWebResponse = DirectCast(r.GetResponse(), HttpWebResponse)
      Using reader As New StreamReader(re.GetResponseStream())
          src = reader.ReadToEnd()
      End Using
  End Using
  ' Split the page into table rows.
  Dim rows As String() = GetBetweenAll(src, "<tr>", "</tr>")
  Dim dones As New List(Of String)
  For Each s As String In rows
      ' Skip the header row and any row without <td> cells.
      If (Not s = rows(0) AndAlso s.Contains("<td>") AndAlso s.Contains("</td>")) Then
          Dim td As String() = GetBetweenAll(s, "<td>", "</td>")
          ' The first cell holds the IP address and the second holds the port.
          If td.Length >= 2 Then
              Dim ip As String = td(0)
              Dim port As String = td(1)
              dones.Add(ip & ":" & port)
          End If
      End If
  Next
  ' Write one ip:port entry per line.
  Using sw As New StreamWriter(fo.FileName)
      For Each s As String In dones
          sw.WriteLine(s)
      Next
  End Using
  MsgBox("Finished and wrote!")
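
The extraction part of this step can be tried without a network connection. The sketch below is one possible way to do that, not part of the original project: `ExtractProxies`, the module name and the sample table are hypothetical, and the extra validation with `IPAddress.TryParse` and a port range check is an addition that the tutorial code does not perform.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Net
Imports System.Text.RegularExpressions

Module ExtractDemo
    ' Returns the text between each Str1 ... Str2 pair, matching both literally.
    Function GetBetweenAll(ByVal Source As String, ByVal Str1 As String, ByVal Str2 As String) As String()
        Dim Results As New List(Of String)
        Dim Parts As New List(Of String)(Regex.Split(Source, Regex.Escape(Str1)))
        Parts.RemoveAt(0) ' drop the text before the first Str1
        For Each Part As String In Parts
            Results.Add(Regex.Split(Part, Regex.Escape(Str2))(0))
        Next
        Return Results.ToArray()
    End Function

    ' Hypothetical helper: turns table HTML into validated "ip:port" strings.
    Function ExtractProxies(ByVal html As String) As List(Of String)
        Dim found As New List(Of String)
        For Each row As String In GetBetweenAll(html, "<tr>", "</tr>")
            Dim cells As String() = GetBetweenAll(row, "<td>", "</td>")
            ' Rows without at least two <td> cells (such as a <th> header row) are skipped.
            If cells.Length < 2 Then Continue For

            Dim address As IPAddress = Nothing
            Dim port As Integer
            ' Keep the row only if the first cell parses as an IP address
            ' and the second cell is a port number between 1 and 65535.
            If IPAddress.TryParse(cells(0).Trim(), address) AndAlso
               Integer.TryParse(cells(1).Trim(), port) AndAlso
               port >= 1 AndAlso port <= 65535 Then
                found.Add(address.ToString() & ":" & port.ToString())
            End If
        Next
        Return found
    End Function

    Sub Main()
        ' Made-up sample table; the addresses come from documentation-only ranges.
        Dim sample As String =
            "<table><tr><th>IP</th><th>Port</th></tr>" &
            "<tr><td>192.0.2.10</td><td>8080</td></tr>" &
            "<tr><td>not-an-ip</td><td>80</td></tr>" &
            "<tr><td>198.51.100.7</td><td>99999</td></tr></table>"

        ' Only the first data row passes both checks, so this prints 192.0.2.10:8080
        For Each entry As String In ExtractProxies(sample)
            Console.WriteLine(entry)
        Next
    End Sub
End Module
```

Keeping the parsing in a function that takes the HTML as a string means it can be checked against a saved copy of the page before the download and file-writing code is wired in.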

Project Complete!
Below is the full source code for the project.

  Imports System.Text.RegularExpressions
  Imports System.Net
  Imports System.IO

  Public Class Form1
      Dim link As String = "http://free-proxy-list.net/uk-proxy.html"

      ' Returns the text between Str1 and Str2. Index selects which occurrence of Str1 to use.
      ' Regex.Escape makes both delimiters match literally rather than as regex patterns.
      Private Function GetBetween(ByVal Source As String, ByVal Str1 As String, ByVal Str2 As String, Optional ByVal Index As Integer = 0) As String
          Return Regex.Split(Regex.Split(Source, Regex.Escape(Str1))(Index + 1), Regex.Escape(Str2))(0)
      End Function

      ' Returns every piece of text found between a Str1 and the next Str2.
      Private Function GetBetweenAll(ByVal Source As String, ByVal Str1 As String, ByVal Str2 As String) As String()
          Dim Results As New List(Of String)
          Dim T As New List(Of String)
          T.AddRange(Regex.Split(Source, Regex.Escape(Str1)))
          ' The first piece is whatever came before the first Str1, so discard it.
          T.RemoveAt(0)
          For Each I As String In T
              Results.Add(Regex.Split(I, Regex.Escape(Str2))(0))
          Next
          Return Results.ToArray()
      End Function

      Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
          Dim fo As New SaveFileDialog()
          fo.Filter = "Text Files|*.txt"
          fo.FilterIndex = 1
          ' Continue only if the user confirmed the dialog and a file path was chosen.
          If fo.ShowDialog() = DialogResult.OK AndAlso Not String.IsNullOrEmpty(fo.FileName) Then
              ' Request the page, identifying as a regular browser.
              Dim r As HttpWebRequest = DirectCast(WebRequest.Create(link), HttpWebRequest)
              r.UserAgent = "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.69 Safari/537.36"
              r.KeepAlive = True
              Dim src As String
              ' The Using blocks close the response and the reader once the page has been read.
              Using re As HttpWebResponse = DirectCast(r.GetResponse(), HttpWebResponse)
                  Using reader As New StreamReader(re.GetResponseStream())
                      src = reader.ReadToEnd()
                  End Using
              End Using
              ' Split the page into table rows.
              Dim rows As String() = GetBetweenAll(src, "<tr>", "</tr>")
              Dim dones As New List(Of String)
              For Each s As String In rows
                  ' Skip the header row and any row without <td> cells.
                  If (Not s = rows(0) AndAlso s.Contains("<td>") AndAlso s.Contains("</td>")) Then
                      Dim td As String() = GetBetweenAll(s, "<td>", "</td>")
                      ' The first cell holds the IP address and the second holds the port.
                      If td.Length >= 2 Then
                          Dim ip As String = td(0)
                          Dim port As String = td(1)
                          dones.Add(ip & ":" & port)
                      End If
                  End If
              Next
              ' Write one ip:port entry per line.
              Using sw As New StreamWriter(fo.FileName)
                  For Each s As String In dones
                      sw.WriteLine(s)
                  Next
              End Using
              MsgBox("Finished and wrote!")
          End If
      End Sub
  End Class
