Hello Pythonistas and SEO wizards! I have been working on a Python script for quick content analysis of search engine results pages (SERPs), and I've got some good results. I am sharing the Colab notebook here.
It starts by fetching the first 50 links (the number can be changed to suit your needs) for a keyword from the search results. It then fetches the content, title and headings from each of those pages.
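A minimal sketch of the parsing half of that step, using BeautifulSoup (the notebook's actual code and function names may differ; `extract_page_parts` is illustrative, and the fetch itself would use something like `requests.get(url).text`):

```python
from bs4 import BeautifulSoup

def extract_page_parts(html):
    """Pull the title, headings, and body text out of a fetched page."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.get_text(strip=True) if soup.title else ""
    headings = [h.get_text(strip=True)
                for h in soup.find_all(["h1", "h2", "h3"])]
    text = " ".join(soup.get_text(" ", strip=True).split())
    return title, headings, text

sample = ("<html><head><title>SEO Guide</title></head>"
          "<body><h1>Intro</h1><p>Learn SEO.</p></body></html>")
print(extract_page_parts(sample))
```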
Now the real analysis starts.
I went ahead and did part-of-speech tagging on the content, then counted all the tags. This gives the number of occurrences of each part of speech present. I chose to limit my results to nouns, verbs, adjectives and adverbs.
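The counting step could look something like this. It assumes the tokens are already tagged with Penn Treebank tags (e.g. via `nltk.pos_tag(nltk.word_tokenize(text))`); the tagging call itself is left out, so this is just the filter-and-count part:

```python
from collections import Counter

# Penn Treebank prefixes: NN* = nouns, VB* = verbs, JJ* = adjectives, RB* = adverbs.
KEEP = {"NN": "Noun", "VB": "Verb", "JJ": "Adjective", "RB": "Adverb"}

def count_pos(tagged_tokens):
    """Count words per coarse part of speech, keeping only the four we care about."""
    counts = Counter()
    for word, tag in tagged_tokens:
        label = KEEP.get(tag[:2])
        if label:
            counts[label] += 1
    return counts

tagged = [("Google", "NNP"), ("quickly", "RB"), ("ranks", "VBZ"),
          ("fresh", "JJ"), ("content", "NN")]
print(count_pos(tagged))  # Noun counted twice; one each of Adverb, Verb, Adjective
```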
Next comes entity analysis. The script uses spaCy to run named-entity recognition over the text. All the entities end up in a DataFrame with their counts and entity types.
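A sketch of the aggregation behind that DataFrame. The spaCy call is shown only in comments (it needs a downloaded model); the counting itself works on already-extracted `(text, label)` pairs, and `aggregate_entities` is an illustrative name, not the notebook's:

```python
from collections import Counter

# In the notebook the pairs would come from spaCy, e.g.:
#   nlp = spacy.load("en_core_web_sm")
#   ents = [(e.text, e.label_) for e in nlp(text).ents]

def aggregate_entities(ents):
    """Count each (entity text, entity type) pair, most common first."""
    counts = Counter(ents)
    return [{"entity": text, "type": label, "count": n}
            for (text, label), n in counts.most_common()]

ents = [("Google", "ORG"), ("Google", "ORG"), ("Python", "PRODUCT")]
print(aggregate_entities(ents))
```

The list of dicts drops straight into `pandas.DataFrame` if you want the tabular view.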
Fetching common words and phrases is one of the most useful things this script can do. It looks for phrases of variable length in titles and content: 2 to 4 words for titles and 3 to 6 words for content. This gives a good idea of which word combinations are used the most.
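The phrase search is essentially an n-gram count over a window of lengths. A toy version (the notebook's implementation may differ; here the title range 2–4 from the post is used):

```python
from collections import Counter

def common_phrases(tokens, min_len, max_len, top=10):
    """Count all n-grams of length min_len..max_len and return the most common."""
    counts = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts.most_common(top)

tokens = "best seo tools for small business best seo tools".split()
print(common_phrases(tokens, 2, 4, top=3))
```

For content you would call the same function with `min_len=3, max_len=6`.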
Visualizing all these findings makes reading the data easier. I tried to add graphs for almost everything important and also tried to make them look pretty xD
Now comes the most important part, which took me a lot of time to implement. At this point we have the counts of all the common words and their tags, how they are used and in which sentences. But raw counts alone do not reflect how important each keyword actually is.
There has to be a better, trustworthy metric.
Taking the inverse of the counts could serve as a metric, but it seemed a bit too naïve.
While looking for a suitable method, I went ahead and built a graph with words as nodes, linked by edges when they occur together.
After building the graph, we can compute betweenness centrality. Betweenness centrality, in simple terms, is the number of shortest paths that pass through a node: the more shortest paths, the more important the word. In contextual terms, if a word is linked to almost all other keywords, entities and categories, it should be the parent word, and thus the most important one. This concept is easier to grasp from the graphs I have attached below; visualizing the graph makes it concrete.
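The centrality step could be sketched with networkx. The linking rule here (an edge between words that co-occur in the same sentence) is my assumption about how the graph is built, not necessarily the notebook's:

```python
from itertools import combinations
import networkx as nx

# Nodes are words; edges link words co-occurring in the same sentence.
sentences = [
    ["seo", "content", "ranking"],
    ["content", "keywords"],
    ["keywords", "ranking"],
]
G = nx.Graph()
for sent in sentences:
    G.add_edges_from(combinations(set(sent), 2))

# Betweenness centrality: share of shortest paths passing through each node.
scores = nx.betweenness_centrality(G)
most_central = max(scores, key=scores.get)
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

In this tiny graph, "content" and "ranking" sit on the only shortest paths between "seo" and "keywords", so they come out as the most central words.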
Time to end it here. You can export all the data to an Excel file, in separate sheets, by running the final cell, and analyze it further from there.
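The multi-sheet export typically looks like this with pandas (assuming the `openpyxl` engine is installed; the sheet names and sample data here are made up for illustration):

```python
import pandas as pd

pos_df = pd.DataFrame({"tag": ["Noun", "Verb"], "count": [120, 45]})
ents_df = pd.DataFrame({"entity": ["Google"], "type": ["ORG"], "count": [12]})

# One sheet per analysis table, all in a single workbook.
with pd.ExcelWriter("serp_analysis.xlsx") as writer:
    pos_df.to_excel(writer, sheet_name="pos_counts", index=False)
    ents_df.to_excel(writer, sheet_name="entities", index=False)
```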
Edit: One more thing, I am looking for an SEO role at a good agency that can promise learning and growth.
This is pretty interesting – and very ambitious. There are a few things in there that I'm either not fully understanding or that may not be as important/relevant to ranking as you think they are, though. (NOTE: It's probably the former – me not fully understanding – not the latter. lol)
I've got a few tips here that may give you some ideas on extending this even further, too… so I figured I'd share. You've already got it doing the foundational work – so you're already a fair way toward having this anyway.
The knowledge graph works by looking at "entities" and "relationships" – and as such, it's looking for what are called "semantic triples". These come in the form of <entity> <relationship> <entity> – or more basically <noun> <verb> <noun>.
Amazon sells books.
Ford makes trucks.
You've got it pulling the nouns and verbs – so now it just needs to seek patterns. If you can find two nouns in close proximity with a verb in between – now you've got something important. Even if those aren't currently understood entities, this is how we lay the foundation to create our entities and their relationship to the rest of the knowledge graph. Google didn't always know that Amazon sells books – but it saw it written somewhere and also confirmed that Amazon does, in fact, have books for sale on the site – so now it knows. Amazon sells books and the knowledge graph expands to include that fact.
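That pattern hunt could be sketched as a simple scan over POS-tagged tokens (tags as from `nltk.pos_tag`). This toy version only matches strictly adjacent noun–verb–noun runs; handling "close proximity" with filler words would need a gap window:

```python
def find_triples(tagged_tokens):
    """Scan tagged tokens for adjacent <noun> <verb> <noun> triples."""
    is_noun = lambda t: t.startswith("NN")
    is_verb = lambda t: t.startswith("VB")
    triples = []
    for (w1, t1), (w2, t2), (w3, t3) in zip(
            tagged_tokens, tagged_tokens[1:], tagged_tokens[2:]):
        if is_noun(t1) and is_verb(t2) and is_noun(t3):
            triples.append((w1, w2, w3))
    return triples

tagged = [("Amazon", "NNP"), ("sells", "VBZ"), ("books", "NNS"),
          ("Ford", "NNP"), ("makes", "VBZ"), ("trucks", "NNS")]
print(find_triples(tagged))  # [('Amazon', 'sells', 'books'), ('Ford', 'makes', 'trucks')]
```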
I really like the visualization here too – and it would be cool if, one day, if you're ambitious enough, it could take your entities as they're visualized now and highlight their relationships to other entities in a different way. Visualizing that could help you figure out the primary and secondary elements too.
A service page for example would likely have two or three key things – The Service Itself – The Brand/Entity/Company that is providing the service – and possibly the location(s) in which that service is provided.
It wouldn't be able to take in everything, of course. For example, if we're Amazon and analyzing a page that lists books by Douglas Adams, we probably aren't going to explicitly say that each title we mention <IS> a book. So it wouldn't be good for that. But I still think some very useful information could come out of it – especially on new sites where you're trying to establish your brand entity from zero.
Very fun stuff. Keep up the good work. I love seeing people going beyond the simple "Match the Keyword" game. It brings me joy.
Thanks a lot. This is some interesting stuff. I'll surely try my best to improve and enhance this over time and keep on trying new things.
I am trying to find/build something that not only can figure out the relationship between entities but also try to predict/build relationships using some metric. This is all in my brain right now and will be trying to realize it soon. Would love your insights on it whenever I am onto something.
Would be happy to help out sometime if you hit a wall. You may have to remind me who you are at first… I suck at names horribly.
Here… this may be helpful if your brain works like mine and enjoys reverse engineering things.
Google's Natural Language API: https://cloud.google.com/natural-language#section-2
If you haven't played with that – it basically analyzes text to extract entities and relationships from natural language. Plug some text into the "Try the API" box and it shows you what it found – vastly more comprehensive than what you could do, because it actually has the knowledge graph at its disposal… but…
If you look at the "syntax" tab – it can sort of give you an idea of how Google is connecting the dots. The sentiment tab might give you ideas too – but again, that's really only useful if you're one of us "reverse engineer" types. lol
Hammad ✍️ » Truslow
Thanks for your offer to help. I didn't know about this and will surely give it a try. Looks really interesting.
Do you have a git repo? I'd love to lend a hand with future developments.
hammad-m – Overview
This is my github profile. I haven't added this project yet though. Will be doing that in a few days.
So, what's the goal, in short?
This can help you get a quick analysis of the search results for a certain keyword. Through the adjectives and verbs you can get an idea of the intent, and well, entities are entities. The graph with the circles tries to predict/figure out how important each keyword is – the bigger the circle, the more important it is.
All in all, while creating content, you can give it a run and automatically get some insight into what you need to do while you work on other tasks. Or you can get an idea of what to tell the writer when ordering content, based on the search engine results pages (SERPs).
Super good work!
Consider me a non-techie in Python.
Can I play with the script by putting in a new query, or does it only work for the "SEO" query for now?
If I can use it, can you please mention the steps?
Looks cool anyways though
Make a copy of the Colab.
Change the query and the number of links you want to analyze from the SERPs under (#fetching links from search results).
Ctrl+F9 will run the script for you. You can also go to the Runtime menu and select "Run all".
It will take some time to connect, but it won't take long at all.