E-commerce websites generate huge quantities of textual data. These companies hire data science professionals to refine this unstructured data and extract meaningful insights that help them understand the end user better. For example, by analyzing product reviews, Flipkart can learn what customers think of a product, and Netflix can discover which of its content users like. This kind of analysis would be hard to imagine without text analytics.
Topics we cover in this article:
- How to extract product reviews from the Flipkart website
- Preprocessing the extracted reviews
- Extracting and analyzing positive reviews
- Extracting and analyzing negative reviews
In this article, we'll extract the reviews of the MacBook Air laptop from the Flipkart website and perform text analysis on them.
Hands-on implementation of Flipkart review scraping
```python
# Importing required libraries
import requests
from bs4 import BeautifulSoup as bs
import re
import nltk
import matplotlib.pyplot as plt
from wordcloud import WordCloud
import os
```
Extracting reviews from Flipkart for the MacBook Air
Here we're going to extract the reviews of the MacBook Air laptop from its review pages, one page at a time, by changing the page number at the end of the URL.
```python
# Scraping reviews using BeautifulSoup
macbook_reviews = []
for i in range(1, 30):
    mac = []
    url = "https://www.flipkart.com/apple-macbook-air-core-i5-5th-gen-8-gb-128-gb-ssd-mac-os-sierra-mqd32hn-a-a1466/product-reviews/itmevcpqqhf6azn3?pid=COMEVCPQBXBDFJ8C&page=" + str(i)
    response = requests.get(url)
    # creating a soup object to iterate over the extracted content
    soup = bs(response.content, "html.parser")
    # extracting the content under specific tags; this class name reflects
    # Flipkart's markup when this was written and may have changed since
    reviews = soup.findAll("div", attrs={"class": "qwjRop"})
    for review in reviews:
        mac.append(review.text)
    macbook_reviews = macbook_reviews + mac

# here we save the extracted data
with open("macbook.txt", "w", encoding="utf8") as output:
    output.write(str(macbook_reviews))
```
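The loop above builds each page URL by concatenating the page number onto the end of the string. As an alternative sketch (not the author's code), the `page` query parameter can be set with the standard library's `urllib.parse`, which keeps the existing `pid` parameter intact. `review_page_urls` is a hypothetical helper name, and pages 1 to 29 mirror `range(1, 30)` above.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse

def review_page_urls(base_url, first_page, last_page):
    """Yield one URL per review page by setting the `page` query parameter."""
    parts = urlparse(base_url)
    query = parse_qs(parts.query)  # e.g. {'pid': ['COMEVCPQBXBDFJ8C']}
    for page in range(first_page, last_page + 1):
        query["page"] = [str(page)]  # overwrite the page number on each pass
        yield urlunparse(parts._replace(query=urlencode(query, doseq=True)))

base = ("https://www.flipkart.com/apple-macbook-air-core-i5-5th-gen-8-gb-128-gb-ssd-"
        "mac-os-sierra-mqd32hn-a-a1466/product-reviews/itmevcpqqhf6azn3"
        "?pid=COMEVCPQBXBDFJ8C")
urls = list(review_page_urls(base, 1, 29))
print(urls[0])
```

Each generated URL could then be passed to `requests.get` exactly as in the loop above.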
Up to this point, we have extracted the reviews from the website, collected them in the list macbook_reviews, and saved them to a file named macbook.txt.
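Because the scraper writes `str(macbook_reviews)`, macbook.txt holds a Python list literal rather than one review per line. A minimal sketch for reloading it in a later session, assuming the file was written exactly as above, uses `ast.literal_eval`. `save_reviews` and `load_reviews` are hypothetical helper names, and the demo writes to a temporary directory instead of the real file.

```python
import ast
import os
import tempfile

def save_reviews(path, reviews):
    # write the Python repr of the list, as the scraping code does
    with open(path, "w", encoding="utf8") as f:
        f.write(str(reviews))

def load_reviews(path):
    # safely parse the list literal back into a list of strings
    with open(path, "r", encoding="utf8") as f:
        return ast.literal_eval(f.read())

sample = ["Great laptop!", "Battery life is 'okay'"]
path = os.path.join(tempfile.mkdtemp(), "macbook.txt")
save_reviews(path, sample)
restored = load_reviews(path)
print(restored == sample)  # True
```

`ast.literal_eval` only evaluates Python literals, so unlike `eval` it will not execute arbitrary code found in the file.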
The extracted product reviews include unwanted content such as extra whitespace, uppercase letters, symbols, digits, and smiley emojis. We don't want these in the text analysis, so during preprocessing we clean the data by lowercasing it and removing the unwanted characters.
```python
print(os.getcwd())
# change to the directory that holds your data files
os.chdir("/content/chider")
# Joining all the reviews into a single paragraph
mac_rev_string = " ".join(macbook_reviews)
# Replacing every run of non-letters (symbols, digits, emojis) with a space, then lowercasing
mac_rev_string = re.sub("[^A-Za-z]+", " ", mac_rev_string).lower()
# Splitting the text into individual words (split() with no argument drops empty strings)
mac_reviews_words = mac_rev_string.split()
# Removing the stop words: below we read them from a file; alternatively, NLTK ships a list
# from nltk.corpus import stopwords  # requires nltk.download("stopwords")
# stop_words = stopwords.words("english")
```
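The cleaning steps above can be wrapped in one small function and checked on a sample. This is a sketch; `clean_text` is a hypothetical name and the two sample reviews are made up for illustration. It also shows why `split()` without an argument is used: `split(" ")` would leave empty strings wherever several spaces are adjacent.

```python
import re

def clean_text(reviews):
    """Join the reviews, keep letters only, lowercase, and split into words."""
    text = " ".join(reviews)
    # any run of non-letters (digits, punctuation, emojis) becomes one space
    text = re.sub("[^A-Za-z]+", " ", text).lower()
    # split() with no argument ignores leading, trailing and repeated whitespace
    return text.split()

words = clean_text(["Awesome product!!! 10/10", "Battery lasts 12 hrs 😊"])
print(words)  # ['awesome', 'product', 'battery', 'lasts', 'hrs']
```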
In the code snippet below, we remove the stop words from the reviews and display the remaining words as a word cloud.
```python
# Reading the whitespace-separated stop words into a set for exact word matching
with open("/content/stop.txt", "r") as sw:
    stopwords = set(sw.read().split())
mac_reviews_words = [w for w in mac_reviews_words if w not in stopwords]
mac_rev_string = " ".join(mac_reviews_words)
# Creating a word cloud for all words
wordcloud_mac = WordCloud(
    background_color='black',
    width=1800,
    height=1400
).generate(mac_rev_string)
plt.imshow(wordcloud_mac)
plt.axis("off")
plt.show()
```
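If the stop-word file is kept as one raw string, `w not in stopwords` performs a substring search rather than a word match, so a meaningful word can vanish just because it occurs inside a longer stop word. The sketch below contrasts the two behaviours; `remove_stopwords` is a hypothetical helper and the three-word stop list stands in for the contents of stop.txt.

```python
def remove_stopwords(words, stopword_text):
    # split the raw file contents into individual stop words and store them in a
    # set, so membership is an exact word match rather than a substring search
    stop = set(stopword_text.split())
    return [w for w in words if w not in stop]

stopword_text = "the\nwithin\nis\n"  # stand-in for the contents of stop.txt
words = ["the", "laptop", "is", "thin"]

substring_filtered = [w for w in words if w not in stopword_text]
print(substring_filtered)                      # ['laptop']  ("thin" is inside "within")
print(remove_stopwords(words, stopword_text))  # ['laptop', 'thin']
```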
In the word cloud output, words like good, read, and laptop appear in a larger size, which indicates that they are repeated more often in the MacBook Air reviews. The word cloud also highlights words such as performance, battery, delivery, and laptop, but from it alone we can't tell how the battery holds up or how the laptop performs. To get insights from this output, we need to divide it into a positive and a negative word cloud.
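A word's size in the cloud reflects how often it occurs, so the same ranking can be checked numerically with `collections.Counter`. This is a small sketch on a made-up word list; `top_words` is a hypothetical helper. The wordcloud package also provides `WordCloud.generate_from_frequencies`, which accepts a word-to-count mapping such as a `Counter` directly.

```python
from collections import Counter

def top_words(words, n):
    """Return the n most frequent words with their counts."""
    return Counter(words).most_common(n)

words = ["good", "read", "laptop", "good", "battery", "laptop", "good"]
print(top_words(words, 2))  # [('good', 3), ('laptop', 2)]
```

On the real data, `top_words(mac_reviews_words, 20)` would list the words that dominate the cloud above.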
In the code snippet below, we extract the positive words from the product reviews.
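```python
# Reading the positive-word lexicon, one word per line
with open("/content/positive-words.txt", "r") as pos:
    poswords = pos.read().split("\n")
# skipping the header lines at the top of the file
poswords = set(poswords[36:])
# Keeping only the review words that appear in the positive lexicon
mac_pos_in_pos = " ".join([w for w in mac_reviews_words if w in poswords])
wordcloud_pos_in_pos = WordCloud(
    background_color='black',
    width=1800,
    height=1400
).generate(mac_pos_in_pos)
# here we get the word cloud of all positive words in the reviews
plt.imshow(wordcloud_pos_in_pos)
plt.axis("off")
plt.show()
```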
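The hard-coded slices `[36:]` here and `[37:]` for the negative list depend on the exact number of header lines in each lexicon file. Assuming those header lines begin with `;` (an assumption about the files' format, not something stated above), a loader can skip them regardless of how many there are. `load_lexicon` is a hypothetical helper, and the in-memory sample stands in for the real file.

```python
import io

def load_lexicon(lines):
    """Collect one word per line, skipping blank lines and ';' comment lines."""
    # assumption: the lexicon's header/comment lines start with ";"
    return {line.strip() for line in lines
            if line.strip() and not line.startswith(";")}

# stand-in for open("/content/positive-words.txt")
sample = io.StringIO(";; header text\n;; more notes\n\na+\nabound\nabundance\n")
poswords = load_lexicon(sample)
print(sorted(poswords))  # ['a+', 'abound', 'abundance']
```

With a real file this would read `with open("/content/negative-words.txt", encoding="ISO-8859-1") as f: negwords = load_lexicon(f)`.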
Through this positive word cloud, we can see that reviewers describe the product as good, smooth, fast, and awesome, recommend it to others, and find it handy to use and beautiful. These are the positive insights about the MacBook Air.
In the code snippet below, we extract the negative words from the product reviews.
```python
# Reading the negative-word lexicon (this file is not UTF-8 encoded)
with open("/content/negative-words.txt", "r", encoding="ISO-8859-1") as neg:
    negwords = neg.read().split("\n")
# skipping the header lines at the top of the file
negwords = set(negwords[37:])
# Negative word cloud: choosing only the words that are present in negwords
mac_neg_in_neg = " ".join([w for w in mac_reviews_words if w in negwords])
wordcloud_neg_in_neg = WordCloud(
    background_color='black',
    width=1800,
    height=1400
).generate(mac_neg_in_neg)
# here we get the word cloud of the most repeated negative words
plt.imshow(wordcloud_neg_in_neg)
plt.axis("off")
plt.show()
```
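The two word clouds aggregate words across all reviews. Applying the same lexicon lookup to each review separately shows which individual reviews lean positive or negative. The sketch below is a simple count of lexicon hits, not the author's method; `review_polarity` is a hypothetical helper, the tiny lexicons stand in for the two word files, and it ignores negation such as "not good".

```python
import re

def review_polarity(review, poswords, negwords):
    """Return (positive hits, negative hits) for a single review."""
    words = re.sub("[^A-Za-z]+", " ", review).lower().split()
    pos = sum(1 for w in words if w in poswords)
    neg = sum(1 for w in words if w in negwords)
    return pos, neg

# tiny hypothetical lexicons standing in for positive-words.txt / negative-words.txt
poswords = {"good", "fast", "smooth"}
negwords = {"lag", "slow", "expensive"}
reviews = ["Good and fast laptop!", "Too expensive, and it is slow."]
print([review_polarity(r, poswords, negwords) for r in reviews])  # [(2, 0), (0, 2)]
```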
Through this negative word cloud, we can see that reviewers describe the product as laggy and slow, mention crashes and other issues, and call it too expensive or even pathetic.
By analyzing the product reviews with text mining, we gathered the most frequent positive and negative words and visualized them as word clouds. Text mining can thus reveal customer sentiment and help companies address the issues customers raise. This approach offers an opportunity to improve the overall customer experience, which can in turn drive higher profits.