Case Study:

Website Harvesting Solutions

For Different Scenarios and Objectives

1. Helped the client analyze the competition in the pharmaceutical sector for a given region, city, state, or country.

2. Predicting personality traits from social media footprints saved the client the money they had been spending on a manual process.

3. The WhatsApp dashboard helped the client analyze messages and send appropriate responses to their customers in less time.

What Client Wanted

The client came with three separate requirements, all heavily data-driven and all needing the data delivered in the right format. The client asked for an intelligent system that could handle the data for the following purposes:

  • WhatsApp Task

    Capture live message feeds from the instant messaging platform WhatsApp. The collected data should be shown in a dashboard broken down into Sentiment Analysis, Message Timeline, Active Groups, Trending Words, and Geolocation Plotting.

  • Personality Trait Task

    Using social media data from Facebook, predict a person's five personality traits, known as O.C.E.A.N. (Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism).

  • Pharmacy Store Task

    Harvest the web to collect data (store name, address, contact number, operating hours, etc.) for all pharmacy stores in a particular country.

What were the Challenges

  • 01 The biggest challenge was understanding what relevant data sets could feasibly be acquired from the web.
  • 02 After mapping the data, the next challenge was transforming the collected data into a more structured form.
  • 03 Creating a tool and methodology that saves both cost and time and complies with the copyright policies and terms of use of the websites and platforms harvested.
  • 04 Messages received on WhatsApp are dynamic in nature, arriving as images, videos, and text, and no WhatsApp API was available.

Technology Used

  • WhatsApp Task

    The programming language is Python, with Selenium for harvesting data and Dash (Plotly) for visualizations.

  • Personality Trait Task

    The programming language is Python, with a machine learning regression algorithm.

  • Pharmacy Store Task

    The programming language is Python, together with the Google API.
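To illustrate the kind of aggregation that could sit behind the dashboard's Trending Words panel, here is a minimal sketch using only the Python standard library. The function name `trending_words`, the stop-word set, and the sample messages are hypothetical; the case study does not describe the actual implementation.

```python
import re
from collections import Counter

# Hypothetical stop-word set; a real deployment would likely use a fuller list.
STOP_WORDS = {"the", "a", "an", "is", "to", "and", "of", "in", "for", "on", "it"}


def trending_words(messages, top_n=3):
    """Return the top_n most frequent non-stop-words across all messages."""
    counts = Counter()
    for text in messages:
        # Lowercase the message and keep runs of letters and apostrophes.
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(top_n)


if __name__ == "__main__":
    sample = [
        "Delivery is late again",
        "Is the delivery on schedule?",
        "Late delivery, please refund",
    ]
    print(trending_words(sample, top_n=2))  # [('delivery', 3), ('late', 2)]
```

In a setup like the one described, the resulting word counts could be passed to a Dash bar chart, with the message list refreshed as new text messages are harvested.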

The Process

Success Criteria

  • A Python library for pulling and extracting data out of HTML and XML files saved hours, even days, of manual work.
  • Mapping the data with machine learning helped structure it for analysis.
  • Dashboards present the analyzed data visually, helping clients understand what the data is saying in an easy, organized manner.
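As a rough picture of what pulling structured fields out of HTML can look like, the sketch below uses Python's built-in `html.parser` module. The case study does not name the library that was used, and the markup, the CSS class names (`store`, `store-name`, `address`, `phone`, `hours`), and the helper `parse_stores` are all assumptions made for illustration.

```python
from html.parser import HTMLParser


class StoreParser(HTMLParser):
    """Collect text from elements whose class attribute names a store field."""

    # Hypothetical class names; real pages would need their own mapping.
    FIELDS = {"store-name", "address", "phone", "hours"}

    def __init__(self):
        super().__init__()
        self.stores = []
        self._field = None  # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if cls == "store":
            # A new store container starts a new record.
            self.stores.append({})
        elif cls in self.FIELDS and self.stores:
            self._field = cls

    def handle_data(self, data):
        # Store non-empty text only while inside a recognized field element.
        if self._field and data.strip():
            self.stores[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        self._field = None


def parse_stores(html):
    """Return a list of dicts, one per store found in the HTML string."""
    parser = StoreParser()
    parser.feed(html)
    parser.close()
    return parser.stores


SAMPLE = """
<div class="store">
  <h2 class="store-name">Example Pharmacy</h2>
  <p class="address">12 Sample Street</p>
  <p class="phone">555-0100</p>
  <p class="hours">9am-6pm</p>
</div>
"""

if __name__ == "__main__":
    print(parse_stores(SAMPLE))
```

Each record comes back as a plain dictionary, so the output can be written straight to CSV or a database once the fields are mapped.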

Client Testimonial

  • The Mamsys web harvesting team is quick at their job. The extraction process proved quite helpful to our business, and the data collected and processed was accurate and cost-effective. We are really happy with the excellence they have brought to our job.
