My First Web Scraping Project: Analyzing Flipkart Product Reviews Using Text Mining

E-commerce websites generate huge amounts of text data. These companies hire data science professionals to refine this unstructured data and extract meaningful insights that help them understand their end users better. For example, by analyzing product reviews, Flipkart can learn how customers perceive a product, and Netflix can discover what its users like about its content. This kind of analysis would be hard to imagine without text analytics.

Topics we cover in this article:

  • How to extract product reviews from the Flipkart website
  • Preprocessing the extracted reviews
  • Extracting and analyzing positive reviews
  • Extracting and analyzing negative reviews

In this article, we'll extract the reviews of the MacBook Air laptop from the Flipkart website and perform text analysis on them.

Hands-on implementation of Flipkart review scraping

#Importing required libraries
import requests   
from bs4 import BeautifulSoup as bs 
import re 
import nltk
import matplotlib.pyplot as plt
from wordcloud import WordCloud
import os

Extracting reviews from Flipkart for MacBook Air

Here we're going to extract the reviews of the MacBook Air laptop from its Flipkart product-review URL, one page at a time.
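The scraping loop builds each page URL by appending `&page=` and the page number to the review URL. As a hedged alternative, this sketch uses the standard library's `urllib.parse` to set the `page` query parameter explicitly, which keeps working if the base URL already carries a `page` value. `page_url` and `BASE_URL` are hypothetical names introduced here.

```python
from urllib.parse import urlencode, urlparse, parse_qsl, urlunparse

# Review URL from the article, without the page number
BASE_URL = ("https://www.flipkart.com/apple-macbook-air-core-i5-5th-gen-8-gb-128-gb-ssd-"
            "mac-os-sierra-mqd32hn-a-a1466/product-reviews/itmevcpqqhf6azn3?pid=COMEVCPQBXBDFJ8C")

def page_url(base_url, page):
    """Return base_url with its 'page' query parameter set to the given page number."""
    parts = urlparse(base_url)
    query = dict(parse_qsl(parts.query))  # keep existing parameters such as pid
    query["page"] = str(page)             # add or overwrite the page number
    return urlunparse(parts._replace(query=urlencode(query)))

# Pages 1 to 29, matching range(1, 30) in the scraping loop
urls = [page_url(BASE_URL, p) for p in range(1, 30)]
print(urls[0])
```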





# Scraping the reviews using BeautifulSoup
macbook_reviews = []
for page in range(1, 30):
    mac = []
    url = "https://www.flipkart.com/apple-macbook-air-core-i5-5th-gen-8-gb-128-gb-ssd-mac-os-sierra-mqd32hn-a-a1466/product-reviews/itmevcpqqhf6azn3?pid=COMEVCPQBXBDFJ8C&page=" + str(page)
    response = requests.get(url)
    # Creating a soup object to iterate over the extracted content
    soup = bs(response.content, "html.parser")
    # Extracting the review text under the "qwjRop" div class (Flipkart's markup may have changed since)
    reviews = soup.find_all("div", attrs={"class": "qwjRop"})
    for review in reviews:
        mac.append(review.text)
    macbook_reviews = macbook_reviews + mac
# Saving the extracted reviews to a file
with open("macbook.txt", "w", encoding="utf8") as output:
    output.write(str(macbook_reviews))
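To try the extraction step offline, or without BeautifulSoup, here is a minimal sketch that swaps in the standard library's `html.parser.HTMLParser`. It collects the text of every `<div>` carrying the `qwjRop` class used in the scraper; `ReviewTextParser` and the sample HTML are hypothetical, and Flipkart's real markup may differ.

```python
from html.parser import HTMLParser

class ReviewTextParser(HTMLParser):
    """Collect the text of every <div> whose class attribute contains target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # > 0 while inside a matching div (counts nested divs)
        self.current = []   # text fragments of the review being read
        self.reviews = []   # finished review strings

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            self.depth += 1  # a nested div inside a review
        elif self.target_class in (dict(attrs).get("class") or "").split():
            self.depth = 1   # entering a matching review div

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1
            if self.depth == 0:  # the matching review div just closed
                self.reviews.append("".join(self.current).strip())
                self.current = []

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)

html = '<div class="qwjRop"><div>Great <b>laptop</b></div></div><div class="other">ad</div>'
parser = ReviewTextParser("qwjRop")
parser.feed(html)
print(parser.reviews)  # ['Great laptop']
```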

So far, we have extracted the reviews from the website, collected them in the list macbook_reviews, and saved them to a file named macbook.txt.
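Writing `str(macbook_reviews)` stores the Python representation of the list, which is awkward to read back as a list later. A sketch using the standard `json` module instead, so the reviews can be reloaded in another session; the file name `macbook.json` and the sample reviews are assumptions.

```python
import json

# Hypothetical sample standing in for the scraped reviews
sample_reviews = ["Great laptop, very fast.", 'Battery life is "okay".']

# Save the list as JSON so it round-trips as a list
with open("macbook.json", "w", encoding="utf8") as output:
    json.dump(sample_reviews, output, ensure_ascii=False)

# Load it back later
with open("macbook.json", "r", encoding="utf8") as f:
    loaded_reviews = json.load(f)

print(loaded_reviews == sample_reviews)  # True
```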

Preprocessing 

The extracted product reviews include unwanted characters such as extra spaces, symbols, digits, and smiley emojis, as well as mixed capitalization. We don't want these in the text analysis, so during preprocessing we clean the data by removing the unwanted characters and converting everything to lowercase.


# Moving to the working directory used in the original Colab setup and checking it
os.chdir("/content/chider")
print(os.getcwd())
# Joining all the reviews into a single paragraph
mac_rev_string = " ".join(macbook_reviews)
# Removing unwanted symbols, digits and emojis, and converting to lowercase
mac_rev_string = re.sub("[^A-Za-z ]+", " ", mac_rev_string).lower()
# Splitting the text into individual words (split() also drops empty strings)
mac_reviews_words = mac_rev_string.split()
# Stop words are removed below with a custom list; NLTK's list is an alternative:
# stop_words = nltk.corpus.stopwords.words("english")
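The cleaning steps can be wrapped in a small reusable function and checked on a couple of made-up reviews. `clean_reviews` and the sample strings are hypothetical; the regular expression is equivalent to the `re.sub` pattern used above.

```python
import re

def clean_reviews(reviews):
    """Join reviews, keep only letters and spaces, lowercase, and split into words."""
    text = " ".join(reviews)
    # Anything that is not A-Z, a-z or a space (digits, punctuation, emojis) becomes a space
    text = re.sub("[^A-Za-z ]+", " ", text).lower()
    # split() with no argument splits on runs of whitespace and drops empty strings
    return text.split()

sample = ["Awesome laptop!! 10/10 \U0001F60A", "Battery: GOOD, delivery late."]
print(clean_reviews(sample))
# ['awesome', 'laptop', 'battery', 'good', 'delivery', 'late']
```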

In the code snippet below, we'll remove the stop words from the review words and display the remaining words using a word cloud.

# Loading a custom stop-word list as a set of whole words
with open("/content/stop.txt", "r") as sw:
    stopwords = set(sw.read().split())
# Removing the stop words (set membership matches whole words, not substrings)
mac_reviews_words = [w for w in mac_reviews_words if w not in stopwords]
mac_rev_string = " ".join(mac_reviews_words)
# Creating a word cloud for all the words
wordcloud_mac = WordCloud(
                      background_color='black',
                      width=1800,
                      height=1400
                     ).generate(mac_rev_string)
plt.imshow(wordcloud_mac)
plt.axis("off")
plt.show()
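Why a set of words rather than the raw file contents? Reading the stop-word file with `read()` alone yields one long string, so `w not in stopwords` becomes a substring test and silently drops short words that happen to occur inside a stop word. A small illustration with a made-up stop list:

```python
# Hypothetical stand-in for the contents of stop.txt
stop_text = "to\nand\nbecause\nis\n"
words = ["easy", "to", "use", "and", "fast"]

# Substring test: "use" is dropped because it occurs inside "because"
substring_filtered = [w for w in words if w not in stop_text]

# Whole-word test against a set
stop_set = set(stop_text.split())
set_filtered = [w for w in words if w not in stop_set]

print(substring_filtered)  # ['easy', 'fast']
print(set_filtered)        # ['easy', 'use', 'fast']
```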

In the word cloud output, words such as good, read, and laptop appear in a larger size, which shows that they are repeated more often in the MacBook Air reviews. The word cloud also highlights words like performance, battery, delivery, and laptop, but from it alone we can't conclude how the battery holds up or how the laptop performs. To get such insights, we need to split the words into a positive and a negative word cloud.
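A word cloud draws frequent words larger, so the same ranking can also be read off as numbers. A sketch with `collections.Counter` on made-up words; in a real run you would pass `mac_reviews_words` instead, and the resulting counts can also be handed to the `wordcloud` package's `WordCloud.generate_from_frequencies`.

```python
from collections import Counter

# Hypothetical cleaned, stop-word-free review words
words = ["good", "laptop", "battery", "good", "performance", "laptop", "good", "delivery"]

freq = Counter(words)
# Ties keep the order in which words were first seen
print(freq.most_common(3))  # [('good', 3), ('laptop', 2), ('battery', 1)]
```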


In the code snippet below, we'll extract the positive words from the product reviews.

# Loading the positive-word lexicon; the slice skips the header lines at the top of the file
with open("/content/positive-words.txt", "r") as pos:
    poswords = pos.read().split("\n")
    poswords = set(poswords[36:])
# Keeping only the review words that appear in the positive-word list
mac_pos_in_pos = " ".join([w for w in mac_reviews_words if w in poswords])
wordcloud_pos_in_pos = WordCloud(
                      background_color='black',
                      width=1800,
                      height=1400
                     ).generate(mac_pos_in_pos)
# Displaying the word cloud of all positive words in the reviews
plt.imshow(wordcloud_pos_in_pos)
plt.axis("off")
plt.show()
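Slicing off a fixed number of lines (`[36:]`) depends on the exact header length of the lexicon file. A hedged alternative, assuming the header lines start with `;` (check this against your copy of the file), is to skip comment and blank lines explicitly. `load_lexicon` is a hypothetical helper.

```python
def load_lexicon(lines):
    """Return the set of lexicon words, skipping blank lines and ';' comment lines.

    Assumes the lexicon file marks its header lines with a leading ';'.
    """
    words = set()
    for line in lines:
        word = line.strip()
        if word and not word.startswith(";"):
            words.add(word)
    return words

# Hypothetical excerpt of a lexicon file
sample_lines = [";; Opinion lexicon header", ";", "", "awesome", "fast", "smooth"]
sample_poswords = load_lexicon(sample_lines)

review_words = ["awesome", "laptop", "fast", "battery"]
matched = [w for w in review_words if w in sample_poswords]
print(matched)  # ['awesome', 'fast']
```

With the real file, passing the open file object works, because iterating over a file yields its lines: `with open("/content/positive-words.txt") as f: poswords = load_lexicon(f)`.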

Through this positive word cloud, we get insights such as: the product is good, easy, fast, a superior product, recommended to others, portable to use, and beautiful. These are the positive insights about the MacBook Air.

In the code snippet below, we'll extract the negative words from the product reviews.

# Loading the negative-word lexicon (read as ISO-8859-1); the slice skips its header lines
with open("/content/negative-words.txt", "r", encoding="ISO-8859-1") as neg:
    negwords = neg.read().split("\n")
    negwords = set(negwords[37:])
# Negative word cloud: keeping only the review words that are present in negwords
mac_neg_in_neg = " ".join([w for w in mac_reviews_words if w in negwords])
wordcloud_neg_in_neg = WordCloud(
                      background_color='black',
                      width=1800,
                      height=1400
                     ).generate(mac_neg_in_neg)
# Displaying the word cloud of the most repeated negative words
plt.imshow(wordcloud_neg_in_neg)
plt.axis("off")
plt.show()
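The positive and negative snippets repeat the same filter-then-draw pattern. A hedged refactor sketch: a hypothetical helper `lexicon_text` does the filtering once, plus a guard for the case where no lexicon word occurs, since the `wordcloud` package raises an error when asked to generate a cloud from empty text.

```python
def lexicon_text(words, lexicon):
    """Join the words that appear in the lexicon into one space-separated string."""
    return " ".join(w for w in words if w in lexicon)

sample_negwords = {"lag", "slow", "expensive", "crashed"}  # hypothetical excerpt
review_words = ["laptop", "slow", "battery", "expensive", "slow"]

neg_text = lexicon_text(review_words, sample_negwords)
print(neg_text)  # slow expensive slow

# Only build a word cloud when at least one lexicon word was found
if neg_text:
    print("ready for WordCloud(...).generate(neg_text)")
else:
    print("no lexicon words found; skipping the word cloud")
```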

This negative word cloud shows that some reviewers found the product laggy and slow, reported crashes and other issues, felt it was too expensive, and called it pathetic.
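The word clouds pool all reviews together, so they show which words are common but not how many individual reviews lean positive or negative. A simple lexicon-count score per review is one way to take this a step further. This is an illustrative sketch, not part of the original analysis: it assumes each review is cleaned and tokenized separately, and `review_score` and the sample data are hypothetical.

```python
def review_score(review_words, poswords, negwords):
    """Net lexicon score of one review: (#positive - #negative) / #words (0.0 if empty)."""
    if not review_words:
        return 0.0
    pos = sum(w in poswords for w in review_words)  # booleans sum as 0/1
    neg = sum(w in negwords for w in review_words)
    return (pos - neg) / len(review_words)

sample_pos = {"good", "fast", "awesome"}   # hypothetical lexicon excerpts
sample_neg = {"slow", "lag", "expensive"}
sample_reviews = [["awesome", "laptop", "fast"], ["slow", "and", "expensive"], ["okay", "laptop"]]

scores = [review_score(r, sample_pos, sample_neg) for r in sample_reviews]
print([round(s, 2) for s in scores])  # [0.67, -0.67, 0.0]
```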

Conclusion

By analyzing the product reviews with text mining, we found the most frequent positive and negative words and visualized them with word clouds. Text mining gives insight into customer sentiment and can help companies identify and address issues. This approach offers an opportunity to improve the overall customer experience, which in turn can benefit the business.


Prudhvi varma


AI enthusiast, currently working with Analytics India Magazine. I have experience working on real-time machine learning and deep learning problems, neural networks, and structuring machine learning projects. I'm a computer vision researcher, and I'm interested in solving real-time computer vision problems.
