How to Create a Web Scraper in ASP.NET MVC and jQuery

A Web Scraper is a piece of software that extracts data from websites. It can be used to extract typical information such as emails, telephone numbers and addresses from different URLs.

This extraction technique is also known as ‘Data Harvesting’.

I created this tutorial to teach you how to create your own Web Scraper in ASP.NET MVC and jQuery. The Scraper extracts all emails and telephone numbers from a specified URL and shows them in an HTML div control.

It is quite easy to create, and the code I have provided is simple to follow.

Web Scraper HTML Design

The HTML design of the Web Scraper consists of:

  • An input control of type text where the URL of the page (to be crawled) is entered.
  • A button which when clicked will start the data harvesting procedure.
  • A div where the extracted emails and telephone numbers will be shown.

ASP.NET MVC Controller

First create a Controller in your ASP.NET MVC application. Name it WebScrapingController; if you choose another name, change the URL in the jQuery AJAX call to match.

Now create a function named GetUrlSource in this controller and mark it with the [HttpPost] attribute. This function will be called by the jQuery AJAX method on the button click event.

The code of the GetUrlSource function is:

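// Requires: using System; using System.Net;
[HttpPost]
public string GetUrlSource(string url)
{
    // Nothing to download when no URL was posted
    if (string.IsNullOrWhiteSpace(url))
        return "";

    // Add a scheme when it is missing; StartsWith is safe for input shorter than 4 characters
    url = url.StartsWith("http") ? url : "http://" + url;
    string htmlCode = "";
    using (WebClient client = new WebClient())
    {
        try
        {
            htmlCode = client.DownloadString(url);
        }
        catch (Exception)
        {
            // The page could not be downloaded, so an empty string is returned
        }
    }
    return htmlCode;
}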

Explanation – The GetUrlSource function receives the URL of the page as its parameter. It reads the HTML (page source) of that URL with the WebClient.DownloadString() method and returns it. If the download fails, the exception is swallowed and an empty string is returned.
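The scheme check at the top of GetUrlSource can also be mirrored on the client, for example to show the user the final address before it is sent. The sketch below is only an illustration: normalizeUrl is a hypothetical helper that is not part of this tutorial's code, and unlike the controller it compares the prefix case-insensitively.

```javascript
// Hypothetical helper (not part of the tutorial's code): adds "http://"
// when the input does not already start with "http", like GetUrlSource does.
function normalizeUrl(url) {
    var trimmed = String(url || "").trim();
    if (trimmed === "") {
        return ""; // nothing to normalize
    }
    // Case-insensitive prefix test; safe for inputs shorter than four characters.
    return /^http/i.test(trimmed) ? trimmed : "http://" + trimmed;
}

console.log(normalizeUrl("example.com"));         // http://example.com
console.log(normalizeUrl("https://example.com")); // https://example.com
```

Since the controller performs the same check, this client-side step is optional.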

ASP.NET MVC View

Create a view named Index for the WebScrapingController controller and place the HTML code below in it.

<div id="message"></div>
<input id="urlInput" type="text" placeholder="Enter URL" />
<button id="submit">Submit</button>
<button id="reset">Reset</button>
<div class="textAlignCenter">
    <img src="~/Content/Image/loading.gif" />
</div>
<div id="twoColumn">
    <div></div>
    <div></div>
</div>

Explanation – The HTML above contains a div with the id twoColumn that holds two inner divs. The first inner div shows the fetched emails and the second shows the fetched telephone numbers.

I have written two tutorials on validation in ASP.NET Core that you may find useful:

1. Server Side Validation in ASP.NET Core

2. Client Side Validation in ASP.NET Core

Now add the jQuery code below to the view:

<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.1.0/jquery.min.js"></script>
<script>
    $(document).ready(function () {
        $("#reset").click(function (e) {
            $("#urlInput").val("")
            $("#twoColumn > div").html("")
        });
 
        $("#submit").click(function (e) {
            var validate = Validate();
            $("#message").html(validate);
 
            if (validate.length == 0) {
                $.ajax({
                    type: "POST",
                    url: "/WebScraping/GetUrlSource",
                    contentType: "application/json; charset=utf-8",
                    data: JSON.stringify({ url: $("#urlInput").val() }),
                    dataType: "html",
                    success: function (result, status, xhr) {
                        GetUrlTelePhone(result);
                    },
                    error: function (xhr, status, error) {
                        $("#message").html("Result: " + status + " " + error + " " + xhr.status + " " + xhr.statusText)
                    }
                });
            }
        });
 
        function GetUrlTelePhone(html) {
            // $.uniqueSort is meant for DOM elements, so the strings are de-duplicated by hand.
            function unique(list) {
                return (list || []).filter(function (item, index, all) {
                    return all.indexOf(item) === index;
                });
            }

            // The doubled at-sign below is Razor's escape for a single literal one.
            var emails = unique(html.match(/([a-zA-Z0-9._-]+@@[a-zA-Z0-9._-]+\.[a-zA-Z0-9._-]+)/gi));
            var email = $("<p><u>Emails Found:-</u></p>");
            for (var i = 0; i < emails.length; i++)
                email.append("<p>" + (i + 1) + ". " + emails[i] + "</p>");
            $("#twoColumn > div").first().html(email);

            // The g flag returns every number instead of only the first match and its groups.
            var tels = unique(html.match(/\(?([0-9]{3})\)?([ .-]?)([0-9]{3})\2([0-9]{4})/g));
            var tel = $("<p><u>Telephones Found:-</u></p>");
            for (var j = 0; j < tels.length; j++)
                tel.append("<p>" + (j + 1) + ". " + tels[j] + "</p>");
            $("#twoColumn > div:nth-child(2)").html(tel);
        }
 
        $(document).ajaxStart(function () {
            $("img").show();
        });
 
        $(document).ajaxStop(function () {
            $("img").hide();
        });
 
 
        function Validate() {
            var errorMessage = "";
            if ($("#urlInput").val() == "") {
                errorMessage += "► Enter URL<br/>";
            }
            else if (!(isUrlValid($("#urlInput").val()))) {
                errorMessage += "► Invalid URL<br/>";
            }
 
            return errorMessage;
        }
 
        function isUrlValid(url) {
            var urlregex = new RegExp(
          "^(http[s]?:\\/\\/(www\\.)?|ftp:\\/\\/(www\\.)?|www\\.){1}([0-9A-Za-z-\\.@@:%_\+~#=]+)+((\\.[a-zA-Z]{2,3})+)(/(.)*)?(\\?(.)*)?");
            return urlregex.test(url);
        }
    });
</script>

Explanation – When the button is clicked, the jQuery AJAX method calls the GetUrlSource function of the controller.
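The body of that AJAX request is a JSON string with a single url property. As a small sketch of how the body could be built so that quotes or backslashes in the input cannot break the JSON, the helper below uses JSON.stringify. The name buildPayload is an assumption made for illustration and does not appear in the tutorial's code.

```javascript
// Hypothetical helper: builds the JSON body for the POST to GetUrlSource.
function buildPayload(url) {
    // JSON.stringify escapes quotes and backslashes inside the value.
    return JSON.stringify({ url: url });
}

console.log(buildPayload("www.example.com")); // {"url":"www.example.com"}
```

In the view, the returned string would be passed as the data option of $.ajax.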

In the success callback of the AJAX call, I call the GetUrlTelePhone function and pass the page's HTML code to it.

The GetUrlTelePhone function extracts the emails and telephone numbers with regular expressions and shows them in the two inner divs. Because the script sits in a Razor view, each @ in the regular expressions is written as @@.
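To try the extraction step outside the view, the following sketch runs the same two patterns against a sample string. It makes a few assumptions. The names extractUnique, EMAIL_PATTERN, PHONE_PATTERN and sample are made up for this example, the email pattern uses a single @ because the code is not inside a Razor view, and duplicates are removed with plain array methods because $.uniqueSort is documented for arrays of DOM elements rather than strings.

```javascript
// Sketch: pull unique emails and phone numbers out of an HTML string.
var EMAIL_PATTERN = /[a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9._-]+/gi;
// \2 forces the second separator to be the same as the first one.
var PHONE_PATTERN = /\(?([0-9]{3})\)?([ .-]?)([0-9]{3})\2([0-9]{4})/g;

// Returns the matches of pattern in text, in first-seen order, without duplicates.
function extractUnique(text, pattern) {
    var matches = text.match(pattern) || []; // match() returns null when nothing is found
    return matches.filter(function (item, index) {
        return matches.indexOf(item) === index;
    });
}

var sample = "<p>Mail sales@example.com or sales@example.com, " +
             "call 555-123-4567 or 555.123.4567</p>";
console.log(extractUnique(sample, EMAIL_PATTERN)); // one email, the duplicate is dropped
console.log(extractUnique(sample, PHONE_PATTERN)); // two phone numbers
```

Note that the phone pattern only accepts numbers whose two separators are identical, so a number written as (555) 123-4567 would not be matched.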
