Smart Crawler: A Two-Stage Crawler for Efficiently Harvesting Deep-Web Interfaces
As the deep web grows at a very fast pace, there has been increased interest in techniques that help efficiently locate deep-web interfaces. However, due to the large volume of web resources and the dynamic nature of the deep web, achieving wide coverage and high efficiency is a challenging issue. We propose a two-stage framework, namely Smart Crawler, for efficiently harvesting deep-web interfaces. In the first stage, Smart Crawler performs site-based searching for center pages with the help of search engines, avoiding visiting a large number of pages. To achieve more accurate results for a focused crawl, Smart Crawler ranks websites to prioritize highly relevant ones for a given topic. In the second stage, Smart Crawler achieves fast in-site searching by excavating the most relevant links with adaptive link ranking. To eliminate bias toward visiting some highly relevant links in hidden web directories, we design a link tree data structure to achieve wider coverage of a website. Our experimental results on a set of representative domains show the agility and accuracy of our proposed crawler framework, which efficiently retrieves deep-web interfaces from large-scale sites and achieves higher harvest rates than other crawlers.
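The abstract does not spell out how the link tree is built, so the following is only a minimal sketch of the general idea: group candidate links by directory and take turns across directories, so that one highly scored directory cannot crowd out the rest of the site. The function names `build_link_tree` and `select_links`, the use of the first path segment as the grouping key, and the sample URLs and scores are all assumptions made for illustration, not details taken from Smart Crawler itself.

```python
from collections import defaultdict
from itertools import zip_longest
from urllib.parse import urlparse


def build_link_tree(links):
    """Group (url, score) pairs by the first path segment of each URL.

    Hypothetical helper: a one-level stand-in for a link tree.
    """
    tree = defaultdict(list)
    for url, score in links:
        segments = [s for s in urlparse(url).path.split("/") if s]
        # Links that sit directly under the site root share the "" bucket.
        directory = segments[0] if len(segments) > 1 else ""
        tree[directory].append((score, url))
    # Within each directory, the highest-scoring links come first.
    for bucket in tree.values():
        bucket.sort(reverse=True)
    return tree


def select_links(links, limit):
    """Pick up to `limit` links, taking turns across directories."""
    if limit <= 0:
        return []
    tree = build_link_tree(links)
    # Visit directories in order of their best link.
    buckets = sorted(tree.values(), key=lambda b: b[0], reverse=True)
    picked = []
    # Each round takes at most one link from every directory.
    for round_ in zip_longest(*buckets):
        for item in round_:
            if item is not None:
                picked.append(item[1])
                if len(picked) == limit:
                    return picked
    return picked


# Illustrative input: made-up URLs with made-up relevance scores.
candidates = [
    ("http://example.com/books/search", 0.9),
    ("http://example.com/books/advanced", 0.8),
    ("http://example.com/books/browse", 0.7),
    ("http://example.com/authors/find", 0.6),
    ("http://example.com/about", 0.1),
]
top3 = select_links(candidates, 3)
```

With these inputs, ranking purely by score would return three links from the `books` directory, whereas the round-robin selection returns one link from each of the three directories. This trades some per-link relevance for broader coverage of the site, which is the effect the abstract attributes to the link tree.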
ClickMyProject Specifications
Including Packages
* Supporting Software
* Complete Source Code
* Complete Documentation
* Complete Presentation Slides
* Flow Diagram
* Database File
* Screenshots
* Execution Procedure
* Readme File
* Addons
* Video Tutorials

Specialization
* 24/7 Support
* Ticketing System
* Voice Conference
* Video On Demand *
* Remote Connectivity *
* Code Customization **
* Document Customization **
* Live Chat Support
* Toll Free Support *

* - Premium Support Service (Based on Service Hours)
** - Premium Development Service (Based on Requirements)

This product was added to our catalog on Friday 28 July, 2017.