Google uses Natural Language Processing (NLP) instead of Latent Semantic Indexing (LSI) due to Their Different Patent Owners



Clint
Latent Semantic Indexing vs. LSI Keywords
The argument is that the definition of LSI used by the academics who developed it is different from how search engine optimization (SEO) users and SEO tool creators use it. Thus, because Google doesn't specifically cite LSI (the academic version) in its patents, it can't possibly be using the SEO definition of LSI in its algorithms.
Now, he goes on to say that he does manually review competing pages and find words commonly used among them to add to his content, which is something the SEO tools he's railing against do.
But he doesn't go deeply into how far he takes that comparison; most "LSI tools," for example, give you way more than just common niche vocabulary in their reports.
Nevertheless, he's manually doing the same thing the tool creators are doing, while saying the tool creators are capitalizing on a bad definition of the process.
So the question is: can acronyms be allowed to carry different meanings in the SEO world, where the "gurus" accept the alternative definition and then argue the merits of applying that definition, instead of playing wordsmith games?
Case in point: one SEO talks about entities, another SEO says entities are legally structured businesses, and therefore entities in SEO are not a thing. (Yes, this conversation happened.)
Even Webster's allows for multiple definitions of words. I think it's time SEOs get their shit together and do the same.
seobythesea.com
What are LSI Keywords and What I Use Instead of Them?
LSI Keywords are a myth, however Google patents do describe ways to add terms and phrases to pages that optimize them better for keywords


Truslow 🎓
The big difference is that there ARE other terms that Google uses that more accurately describe what they are doing. So, by using the LSI term – and constantly hearkening back to an origin story that really never went anywhere – you're not able to see the proper path of evolution and you'll have a rough time figuring out where it's going in the future.
LSI makes assumptions such as that "Running" and "Jogging" are synonyms, or at least very closely related, because they occur together frequently. What Google does is actually vastly more accurate than that.
Jogging is a form of running. It knows that if you're jogging, you're running, but that if you're running you're not necessarily jogging. Jogging is a subset of Running.
This is an important distinction when researching keywords. And neither version of LSI (your SEO version or the real one) takes that into consideration.
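To illustrate the difference Truslow describes, here is a toy Python sketch. The three-document corpus and the is-a table are made up for illustration and are not anything Google has published. Co-occurrence counting is symmetric: it can only say that "running" and "jogging" show up together. A directional is-a hierarchy can say that jogging is a kind of running without implying the reverse.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: each document is a set of words.
DOCS = [
    {"running", "jogging", "shoes"},
    {"running", "jogging", "pace"},
    {"running", "sprinting", "track"},
]

# Hypothetical is-a (hypernym) hierarchy: child -> parent.
IS_A = {"jogging": "running", "sprinting": "running", "running": "exercise"}


def cooccurrence(docs):
    """Count how many documents contain each unordered word pair (symmetric)."""
    counts = Counter()
    for doc in docs:
        for a, b in combinations(sorted(doc), 2):
            counts[(a, b)] += 1
    return counts


def is_a(child, ancestor):
    """Walk up the hierarchy; True if `ancestor` is `child` or one of its parents."""
    while child is not None:
        if child == ancestor:
            return True
        child = IS_A.get(child)
    return False


pairs = cooccurrence(DOCS)
print(pairs[("jogging", "running")])  # 2: the pair is stored once, with no direction
print(is_a("jogging", "running"))     # True: if you're jogging, you're running
print(is_a("running", "jogging"))     # False: running isn't necessarily jogging
```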

Clint ✍️ » Truslow
What other terms more accurately describe what they are doing?
Truslow 🎓
It's hard to say, because the LSI tools combine a bunch of different things. The graphs of related words are called "Co-Occurrence" by Google, but this isn't indexed (latently or semantically); it's done through word vectors. So it's not just words that "happen to be" on the same page – it's a vector through many documents in a structured cluster that shows not only that they are "related" but "how" they are related.
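A rough sketch of the word-vector idea, using made-up three-dimensional vectors (real embeddings are learned from large corpora and have far more dimensions). Relatedness is scored as the cosine of the angle between two vectors rather than by whether the words sit on the same page. This only covers the "related" part; capturing "how" words relate needs richer models than a toy like this.

```python
import math

# Hypothetical toy embeddings; real word vectors are learned, not hand-written.
VECTORS = {
    "running": [0.9, 0.8, 0.1],
    "jogging": [0.8, 0.9, 0.2],
    "fishing": [0.1, 0.2, 0.9],
}


def cosine(u, v):
    """Cosine similarity: 1.0 means same direction, 0.0 means orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word):
    """Rank every other word by cosine similarity to `word`, highest first."""
    target = VECTORS[word]
    others = [(w, cosine(target, v)) for w, v in VECTORS.items() if w != word]
    return sorted(others, key=lambda pair: pair[1], reverse=True)


print(most_similar("running"))
```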
So again… you can't really say that your "SEO Version of LSI" is a thing that equates to anything that is ACTUALLY happening at Google. That's why Bill is pointing this out. They do NOT correlate, so the "A rose by any other name…" argument just doesn't work.
They not only have different names but accomplish different things in different ways.
Clint ✍️
I would argue that there is plenty of evidence in the SERPs to suggest that Google "tries" to do what you're saying.
Second, the "LSI Keyword" concept being described does, in practice, take into consideration the topics and sub-topics addressed by words, just not in tools that are only programmed to analyze the text.
It's the SEO's job to arrange the content with other words in order to shape the overall intent of the content.
For example, an SEO could be writing about Deadliest Catch and reference jogging when referring to moving at a steady pace. That has nothing to do with running, so it's no longer a sub-topic of running; it's an analogy used, in this case, when talking about fishing. Google doesn't know that until it analyzes the rest of the common words in other articles that talk about maintaining pace on a fishing boat so you don't burn out your crew or equipment.
I still want to hear the other terms that SEO users use that more accurately describe what is simply word association in content to form topics.
Nor can you or Bill say they are not using the method described, because I doubt anyone has run a test that deliberately removed "LSI keywords" or "Co-Occurrence" keywords from an article and successfully ranked for their target keyword.
I think in this case, the patent says one thing, and there is an automatic assumption that Google is doing X based on the patent and, because of that patent, clearly can't be doing Y.
Versus actually running tests to prove one side or the other.
Truslow 🎓
And you can say that they aren't using it because it has already been determined that LSI indexing only works on a small corpus of documents. Several Google heads have confirmed that.
Clint ✍️
Oh, well, if Google heads said it, it must be true.
Truslow 🎓
So you ignore the information above and focus on the one statement I made about Google saying something.
Yeah. That's how things like "LSI Keywords Are A Thing" type misinformation keeps going around.
Clint ✍️
I didn't ignore anything. I said that whether it's a legit thing or not, it hasn't been tested and publicly published. Your core argument, while a valid point of view, is based on "Google heads said it" versus "I've tested it, and here are the results."
That's how things like "LSI Keywords are a thing" type information keeps going around: people are no longer buying your "Google said so, so it's so" argument, and they're implementing concepts anyway to see what happens.
Truslow 🎓
No. My argument is that Bill has said it. I've said it. Many others have said it. And to bolster that, Google has confirmed that.
The LSI argument has never been confirmed by Google, and the very document it comes from states very clearly that it will only work on a very limited number of documents – meaning that not only is it not what's happening at Google, the document itself states that it cannot work at the scale Google operates at.
Clint ✍️
In what situation would Google come out and officially declare to the public the importance of any given factor in its algorithm? They protect that algorithm so much they don't even disclose it fully to their own teams that work on it.
And I've said in the past that LSI doesn't work, but that doesn't make it so.
As valuable as your opinion may be, it's still nothing but an opinion until you have test results to back it up.
Truslow 🎓
For testing and experiment results follow @KorayGubur on Twitter.
Clint ✍️
I do. Here is his website:
https://www.holisticseo.digital/on-page-SEO/ranking-factors/#3-Keyword-Research-TF-IDF-and-LSI-Analysis-for-On-Page-SEO
Here is a quote from it:
"Among other things, it is also important to expand the range of your topic using LSI keywords in order to signal Google that your content covers the topic in full. LSI stands for "latent semantic indexing". This tool helps Google identify the depth of the topic in your article based on related synonyms in your text. "
On-Page SEO: Definition and Ranking Factors – Holistic SEO
HOLISTICSEO.DIGITAL
Clint ✍️
He even offers "Three simple tricks help you to find relevant LSI keywords".
Truslow 🎓
In this case, he's "calling it that" – but it's still not really what he's doing. He's looking at Google for the source – in the keyword suggestion tool (which can produce flawed results if you aren't careful and don't know what you're doing) and through other means, by analyzing Google's actual results.
Usage of the tools for this is very limited though. The tools don't have access to Google's database, so they have to make their own. Without the same database, the same results can't occur – so… what you see there may not be at all like what Google knows and sees.
As such, it may be a good starting point. And Koray may be using the term LSI in there – but it's a small part of it. He's not then just putting those words onto a page and saying "rank!" He's using those as a seed to use in something else.
Clint ✍️
Yeah. That's how things like "LSI Keywords Are A Thing" type misinformation keeps going around.


Keith L Evans 🎓
Semantics are important in SEO. Whether it's latent or indexed is something for SEJ to publish and mislead 1,000s.

Clint ✍️
I think that's the point, and one of the things that detracts from our industry: so much double talk and playing wordsmith games, versus just saying "This is the process we're doing and how we're doing it; we've named it X," without having to worry about some know-it-all coming back with different definitions of the same descriptor so they can feel smart.
Case in point:
Article marketing became guest posting, which became outreach.
Link building became Digital Public Relations.
All wordsmith games, when in all those cases you're saying "link building".


Dang
I'd wager 99% of people who cite LSI don't even know what it means. So what Slawski is saying is, "This is what LSI keywords really mean; it's not what Google is doing. Google is doing that instead."
Yet it doesn't matter, because most people don't really know what LSI means anyway. It could be called bullfrack index keywords. πŸ˜ƒ

Clint ✍️
I think Bill is taking one definition and using that to discount another one, instead of proving empirically whether Google is using "LSI keywords" or not.
Whether or not people know the SEO definition of "LSI keywords" doesn't support Bill's premise.


Dang
Isn't "phrase based indexing" essentially LSI phrases? πŸ˜ƒ
"One of the examples from the patent os[sic] for the word "horse. To an equestrian, a "horse" is an animal. To a carpenter, a horse is a tool. To a gymnast, a horse is a vault of exercise equipment. If you include domain terms such as "saddle," "stirrups," and "thoroughbreds" on that page, those words help a search engine understand that the page is about animals or horses that equestrians might write about."
Isn't that essentially LSI? I'm no patent reader but it sure looks like it.
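The domain-term idea in that quote can be sketched, very loosely, as sense scoring: count how many terms from each domain list appear on a page and pick the domain with the most hits. The domain lists here are hypothetical, and this illustrates the quoted idea only; it is not the procedure from any patent.

```python
import re

# Hypothetical domain-term lists for the ambiguous word "horse".
DOMAIN_TERMS = {
    "equestrian": {"saddle", "stirrups", "thoroughbreds", "bridle"},
    "carpentry": {"sawhorse", "lumber", "saw", "plank"},
    "gymnastics": {"vault", "pommel", "routine", "gymnast"},
}


def guess_sense(page_text):
    """Return the domain whose terms appear most often on the page, or None."""
    words = re.findall(r"[a-z]+", page_text.lower())
    scores = {
        domain: sum(1 for w in words if w in terms)
        for domain, terms in DOMAIN_TERMS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


page = "Choosing a saddle and stirrups for thoroughbreds is part of caring for a horse."
print(guess_sense(page))  # equestrian
```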

Truslow 🎓
Not really. LSI simply understands that "horse" can be related to each of those things. What Google does is know what a horse is in relationship to those things – it's an animal, it's a tool, it's a piece of exercise equipment.
So basically, LSI would be the next evolutionary step IF Google were still playing that "Match the Keyword" game exclusively to rank pages.
It's not doing that, though – and only really does that in areas where the knowledge graph isn't properly built out… yet. In those areas you COULD probably win and win big for a "while" using LSI strategies.
BUT… all it is going to take is someone with knowledge of how to actually connect the words with NLP, hitting semantic triples, good structured data breakdowns, etc., and they'll be kicking your butt in no time. And at that point, there's no going back to the match-the-keyword game in that niche.
Ultimately, if you play that game, one day you'll be back here going "Google killed me on the update! What do I do?"
If you forget about that game and learn the more accurate (and complex) systems, then you can just keep ranking and not really have to worry about updates.
Nathan » Truslow
I've seen you mention semantic triplets and NLP and vectors for a while now… I want to learn this – could you point me in the right direction (even a course)? It seems crucial.
Truslow 🎓
This is probably the best 30 minutes you can spend this weekend in terms of learning the basics of this stuff.
https://jasonbarnard.com/digital-marketing/published-content/videos/how-to-help-google-make-sense-of-a-chaotic-unstructured-web/
The video is a few years old now, but Jason makes it fun. Focuses mainly on the Semantic Triples and basics of teaching the knowledge graph to understand the <entity> <relationship> <entity> concepts (which is what semantic triples are).
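A semantic triple is just <entity> <relationship> <entity>. The minimal Python sketch below stores a few hand-written, hypothetical facts as triples and queries them with wildcards. Unlike a list of co-occurring words, each fact records which way the relationship points. The `Triple` and `query` names are illustrative and not part of any real knowledge graph API.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical facts, written as <entity> <relationship> <entity>.
FACTS = [
    Triple("horse", "is_a", "animal"),
    Triple("sawhorse", "is_a", "tool"),
    Triple("pommel horse", "is_a", "exercise equipment"),
    Triple("jogging", "is_a", "running"),
    Triple("equestrian", "rides", "horse"),
]


def query(subject: Optional[str] = None, predicate: Optional[str] = None,
          obj: Optional[str] = None):
    """Return every triple matching the given parts; None acts as a wildcard."""
    return [
        t for t in FACTS
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


print(query(subject="horse"))               # what do we know about "horse"?
print(query(predicate="is_a", obj="tool"))  # which things are tools?
```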
As for Vectors and the finer points of NLP, that's a bit trickier. The video above has some basics of how the extractions work – but it's entry level stuff.
If you want it, here is a great source, though:
https://www.seobythesea.com/category/semantic-search/
This can get a little deep – so you'll need to read, digest for a bit and reread. Once you start to get the hang of it all, it gets easier though.
You may want to go back a bit and work toward the newer stuff.
And starting with this post and then going to the specific posts Bill links to within it can be a good way to hit on various aspects of it too.
https://gofishdigital.com/blog/what-is-semantic-SEO
Just remember – it's not something you're going to just learn and understand in a day or even a week. (This is, in no small part, why there aren't courses or a lot of people putting out helpful posts about it – without some specific context like a question on here to start with, it's hard. I have thought about it a hundred times and I always end up with, "Well Frell, where the heck do I even start trying to explain this?!?!?" lol)
JASONBARNARD.COM
Jason Barnard – How to Help Google Make Sense of a Chaotic, Unstructured Web #SEOisAEO
Ammon Johns 🎓 » Dang
Latent Semantic Indexing is a concept based on three words that had never been put together before Bell Labs put a patent on the term, in the 1980s.
The most important word of the three is 'Indexing' as that's the 'what' part, the what it does, while the Latent Semantic component is the how. LSI is for building an index. The next most important word is 'Latent' because it qualifies the kind of semantic signals used. Latent Semantics is not the same as Semantics, in the same way that "Meat Free" is not the same as "meat".
The paper and patent are both dealing with a way of building an index around the pre-scoring of words by their relationship to the whole corpus. Add just one document and you have to rescore and thus rebuild the entire index.
It is extremely precise for dealing with a fixed corpus of documents, such as if you wanted to collate all the personal letters ever written by Mark Twain, and be able to find almost anything in them extremely quickly. The same for something like the War Journals of Winston Churchill. It's great for those kinds of purposes because they fit the criteria (as is natural, given this is precisely the kind of thing LSI was invented for). The corpus of all documents is fixed, and while it is possible that some new 'lost letter' or document might surface every now and then, it is rare that the corpus will have any additions or changes.
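To see why adding a document forces a rebuild, here is a deliberately tiny pure-Python sketch of the core LSI step: factorizing a term-document matrix and keeping its strongest latent dimension. It keeps only one dimension (k = 1) and uses plain power iteration instead of a full SVD library; both are simplifications for illustration, and real LSI keeps many more dimensions. The corpus is hypothetical. The point is that each term's latent weight is computed from the whole corpus at once, so one new document can shift the weights of terms that were already indexed.

```python
import math


def term_doc_matrix(docs):
    """Rows are terms, columns are documents, cells are raw term counts."""
    vocab = sorted({w for d in docs for w in d.split()})
    return vocab, [[d.split().count(t) for d in docs] for t in vocab]


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def top_latent_dimension(matrix, iters=100):
    """Power iteration on A A^T. Converges to the first left singular vector of A:
    each term's weight on the strongest latent dimension (k = 1)."""
    a_t = [list(col) for col in zip(*matrix)]  # A^T: one row per document
    v = [1.0] * len(matrix)                     # start with one entry per term
    for _ in range(iters):
        w = matvec(matrix, matvec(a_t, v))      # A (A^T v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def lsi_weights(docs):
    vocab, matrix = term_doc_matrix(docs)
    return dict(zip(vocab, top_latent_dimension(matrix)))


# Hypothetical fixed corpus (think: a closed archive of letters).
corpus = ["horse saddle stirrups", "horse saddle thoroughbred", "carpenter saw plank"]
before = lsi_weights(corpus)

# One "lost letter" surfaces: the factorization must be recomputed from scratch,
# and terms that were already indexed (horse, saddle, ...) get new weights.
after = lsi_weights(corpus + ["horse vault gymnast"])

for term in sorted(before):
    print(f"{term:12} {before[term]:.3f} -> {after[term]:.3f}")
```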
LSI has nothing at all to do with 'Semantic Search', and indeed the first discussions of the Knowledge Graph were going on concurrently, on a different subject. The Link Graph also predates Google as a concept, and indeed, the first actual search engine that I heard talking about it openly in the media and science journals was AltaVista, with their "Bow-Tie Theory of the Web". (There's a reason that the majority of Google's early hires were heavily headhunted from AltaVista.)
So people leaping on LSI as anything to do with the web is kind of like people thinking that tunnel-boring for underground highways and railways uses drills, and therefore must somehow relate to a design they found for a dentist's drill in the 1970s.
But it goes further. Synonyms are the *opposite* of keywords. Keywords are all about saying that the exact word matters, right down to whether it is singular or plural (which, as we see with intent, it does). Synonyms, on the other hand, say that any of a bunch of words all mean much the same thing.
Meanwhile, Google gave us their strategy openly with the statement "moving from strings to things", which is saying that the only value in any string of words was in relating it to the precise thing(s) intended.
Google doesn't want to use either the exact word (string) or even the synonyms, but rather to understand the thing, the intent, the purpose itself, and then rewrite the query in the way that best helps its engine to answer it.
Kristine » Doc Sheldon
That


Cory
This is me… beating the proverbial horse of topical relationship. Bill nailed it in the article, imo.
"Phrase-Based indexing means adding complete phrases on pages that rank highly for a specific word or term and frequently co-occur on those pages."
I spent enough time there in 2017-2018 to know this is their path. LSI is for quick hit bs and link stuffing in 2022.
In fairness, LSI can be a great starting point for competition discovery. Competitor rank + keyword attribution + topical intent is the only goal. Process the thought and the intent. "Keywords" with high LSI usually have an orgy of links that are ridiculous and, in my opinion, offer little value for content creation.
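Taking that one-line description of phrase-based indexing at face value, the counting step might look roughly like this: collect two-word phrases from pages that rank for a term, and keep those found on several of the pages. The page snippets and the `min_pages` threshold are hypothetical, and the phrase-based indexing patents themselves go well beyond simple counting.

```python
import re
from collections import Counter


def bigrams(text):
    """Lower-cased two-word phrases, in order of appearance."""
    words = re.findall(r"[a-z']+", text.lower())
    return [f"{a} {b}" for a, b in zip(words, words[1:])]


def co_occurring_phrases(top_pages, min_pages=2):
    """Phrases found on at least `min_pages` of the pages, most widespread first."""
    page_counts = Counter()
    for page in top_pages:
        page_counts.update(set(bigrams(page)))  # count each phrase once per page
    common = [(p, n) for p, n in page_counts.items() if n >= min_pages]
    return sorted(common, key=lambda item: (-item[1], item[0]))


# Hypothetical snippets from pages ranking for "horse riding".
pages = [
    "Horse riding lessons start with how to mount and how to hold the reins.",
    "Before your first horse riding lesson, learn to hold the reins and check the saddle.",
    "Check the saddle fit, then hold the reins loosely while horse riding.",
]
print(co_occurring_phrases(pages))
```

Phrases like "hold the" show why real systems filter out stop-word noise; that step is left out here to keep the sketch short.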

Clint ✍️
Agreed, but when you filter out those terms with an orgy of links and narrow in specific market-related terms, the ridiculous turns into curious. i.e. optimizing your content using entities in Google's own NLP tool to form topics.
So employing the "LSI Keywords" concept using specific entities is just more refined versus using all general words in common amongst content.
Cory » Clint
Yes sir. In the end it's a solid strategy for some audit scope, imo. Great post, brother.
Clint ✍️
Thank you, sir.


Ammon Johns 🎓
Many of us old-timers were around when the first big hype and hoopla about LSI came around in the very late 90s and early noughties. Personally, I was even fascinated by how LSI functioned and what it was for – relatively small (20,000 max), unchanging, corpuses of documents. As an insight into a specific way of dealing with a specific IR challenge which was utterly different to the web, it was insightful.
So we were also around when people first jumped on LSI, citing the patent specifically, often lifting and misquoting from it without any understanding of the fact that it only works with a small, fixed, unchanging set of documents because the entire index has to be rebuilt for any change. I mean, seriously, it is right there in the patent and papers why this cannot possibly be used for the web, and would be entirely the wrong kind of tool to even think of adapting.
We were also there when LSI Keywords specifically first launched and directly leaned into the papers and patent they didn't understand to give pseudo-science for a tool that generated synonyms (a thing LSI specifically does not do, and in fact ranks according to the rarity of the specific word used where there were alternatives). In other words, the very first thing we knew about the people behind it was that they were bullshit merchants, selling FairyDust Keywords, because Fairies must be real, otherwise who left money under their pillow when they were kids who lost a tooth.
Now, I don't know about you, but I find it hard to look past that start to any relationship. When someone's first instinct is to con me, my last instinct is to think they deserve my money and support. I'm cranky that way.
Over the many, many years since, they've changed their story a lot, and distanced from the LSI patent they finally realised had nothing at all to do with synonyms, or semantic search. Had nothing to do with the words you used as such (That's why it is "Latent Semantics" – the things you say without knowing you are saying, rather than Open/Overt Semantics which would be the words you knowingly chose). But never once did they ever acknowledge the lie, the errors, or the scam.
THAT is the part that means I will never support, endorse, or condone their product, or the specifically crappy tactics they used to sell it. Like I said, I'm cranky that way.

Clint ✍️ » Ammon Johns
Preach it! Was definitely looking forward to your insights
Ammon Johns 🎓 » Clint
The problem is that for a lot of people, 'Semantics' is a word they have never, ever heard outside of SEO and semantic search, and so they latch on to any use of that word and think it has to all be part of the same thing.
The problem with that is that Semantics is a whole field of its own, along with Proxemics (the meaning of the distance or closeness of things), and several others.
Latent Semantics is different to just Semantics, as the important qualifier is the 'latent' bit. The unconscious. In speech, the latent semantics would be the pauses between words, the ums and ers. In writing, it is more about the quirks, like in my writing I use the word 'However' a lot more than is typical of writing in general. Or how often 'it depends' unthinkingly occurs in SEO discussions compared to discussions of other topics.
It has absolutely NOTHING to do with synonyms, or the conscious choices of words.
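Ammon's "However" example is easy to measure. A minimal sketch, using hypothetical sample texts, compares how often a marker word appears per 1,000 words in one author's writing versus a reference text. The marker words and the per-1,000 scale are illustrative choices, and this is a stylometry toy, not LSI itself.

```python
import re


def rate_per_thousand(text, marker):
    """Occurrences of `marker` per 1,000 words of `text` (case-insensitive)."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return 1000 * words.count(marker.lower()) / len(words)


# Hypothetical samples.
author = "However, it depends. However, the index matters. It depends on the corpus."
reference = "The index matters. The corpus is fixed. Search engines rank pages."

for marker in ("however", "depends"):
    print(marker, rate_per_thousand(author, marker), rate_per_thousand(reference, marker))
```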
Clint ✍️
I think you're kind of reinforcing the point of the post: Bill is discounting a method because of the definition of the process he's choosing to follow, ergo one can't exist because the other does.
Full Definition of semantics
1: the study of meanings:
a: the historical and psychological study and the classification of changes in the signification of words or forms viewed as factors in linguistic development
b(1): SEMIOTICS
(2): a branch of semiotics dealing with the relations between signs and what they refer to and including theories of denotation, extension, naming, and truth
2: GENERAL SEMANTICS
3a: the meaning or relationship of meanings of a sign or set of signs
especially : connotative meaning
b: the language used (as in advertising or political propaganda) to achieve a desired effect on an audience especially through the use of words with novel or dual meanings
With your choice of the word Semantics, there are 3 accepted definitions. One, in particular, is 3a, the meaning or relationship of meanings of a sign or set of signs.
Words, in content, are signs, so it's reasonable to think that a set of signs (words) forms a relationship that determines meaning; thus Semantic Keywords (while not a thing) could be described using that definition.
And if you believe that Google's ML and AI are smart enough to take those signs, determine meaning, and then sort on relevance to generate search results, you sort of have to think that the SEO application of "LSI keywords" (while general as all hell in tools) has some merit at its core and is well worth testing.
Ammon Johns 🎓 » Clint
I have a problem with the completely nonsensical term "LSI Keywords", and it generally makes me feel as confident in the knowledge and expertise of those using it as I feel about 'Prancersize' – https://www.YouTube.com/watch?v=o-50GjySwew
Now, in theory, getting some exercise is better than getting none, so to that extent 'Prancersize' *could* be argued, by someone utterly batshit crazy, as a legitimate thing, I guess. But we are not in a world where no other form of exercising is available. I'd rather go with any of the thousands of people who actually know what they are talking about at least a bit, right?
So, just like that, there are advantages to using synonyms *IF* your only thinkable alternative to that is to just spam the same exact phrase matching keywords over and over. (And yes, we both know there are people who spam exact match keywords on a page today, just as there always were.) For those people, even the mis-sold, bullshit-laden con-pitch behind 'LSI Keywords' might be worth overlooking.
But, the truth is that if you were the kind of person who didn't write well enough to naturally be using synonyms and rich language like a literate and educated native of whatever language, then you probably won't be picking the right synonyms, or using them well either. The net result may be that it still makes no difference.
However, (there's my fave word), just like with exercises, there are tons of ways to get synonyms that don't require dealing with people who prance and think you are dumb enough to buy it. (LSI keywords my arse!).
As a kid at school, I regularly referred to a thesaurus (Roget's Thesaurus in my case, an actual book, used just like we used a dictionary) when I wanted to be prompted for a synonym. Those are freely available online, right alongside dictionaries.
Today, I'd be more looking at https://inlinks.net/ which is far, far better at showing me what actually matters in semantics and topic/sub-topic coverage.
And I'm still going to laugh my ass off at anyone prancersizing around with LSI keywords.
YouTube.COM
Original – Prancercise: A Fitness Workout
🀭3
Clint ✍️
I think limiting LSI to synonyms is an inaccurate assessment of what the original LSI tools used to provide users. Those tools, back then, did much the same as inLinks, though inLinks has since improved the technology tenfold over those original tools, as progress often does.
Today's tools, like LSIGraph, do use synonyms and are complete shit, something we can both agree on. And what they do is nothing like the original LSI Keywords concept that was coined in the early 2000s.


Ammon Johns 🎓
Incidentally, word2vec was far more the kind of algorithm in use for a while, and you can immediately see the difference: https://www.tensorflow.org/tutorials/text/word2vec
Then along came the next step where entire phrases were converted to vectors with Phrase2vec: https://www.semanticscholar.org/paper/Phrase2Vec%3A-Phrase-embedding-based-on-parsing-Wu-Zhao/a32f6470c2d388369ee176346359f2278cd04041
We can instantly see these are far closer to the kind of language modelling Google openly talk about, though of course BERT and MUM go far, far beyond this in complexity and scale.
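For contrast with LSI's corpus-wide factorization: word2vec's skip-gram variant learns from (target, context) pairs taken from a small sliding window over each sentence. The sketch below generates only those training pairs; the neural-network training step (covered in the TensorFlow tutorial linked above) is omitted, and the `window` size is an arbitrary choice.

```python
def skipgram_pairs(sentence, window=2):
    """(target, context) pairs: each word paired with neighbours up to `window` positions away."""
    words = sentence.lower().split()
    pairs = []
    for i, target in enumerate(words):
        lo, hi = max(0, i - window), min(len(words), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((target, words[j]))
    return pairs


print(skipgram_pairs("jogging is a form of running", window=1))
```

Each pair depends only on nearby words, not on a factorization of the whole corpus.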
Slawski
LSI as seen by SEO toolmakers is a chance for them to make money, as long as people buy their tools. However, many things being called "LSI Keywords" come from completely free user experience tools for searchers, such as query refinements or autocomplete suggestions. These things have nothing to do with LSI and very little to do with semantics, but SEO users aren't hurt by knowing about them. It doesn't hurt an SEO to write using synonyms, and working with a thesaurus isn't a bad idea.
I pointed out phrase-based indexing and domain terms because they are worth exploring. There are over 20 phrase-based indexing patents, and at least one white paper from Google on semantic topic models. Get an idea of why you might be using the tools that you use, and the reasons why you use them. It's worth looking around and experimenting. It likely has nothing to do with latent semantic indexing, which has nothing to do with entities or object attributes or the semantic web.

Clint ✍️
LSI keywords (more commonly known as semantic keywords today): Google uses them to decide the link between different entities in web content. So to say that semantic keywords are not related to entities is blatantly false.
Slawski » Clint
What planet does Google use LSI Keywords on?
Slawski » Clint
LSI Keywords have absolutely nothing to do with entities. Maybe you mean attributes, but those have nothing to do with links. Entities don't care very much about links either.
Kristine
Clint, Slawski is right. That's not how it works. Terms are given mathematical numbers placed in the vectors, and these vectors indicate relationships with nodes and the topical mesh. None of that is related to LSI anything.
Slawski » Clint
Absolutely! Google avoided using LSI, which was patented by Bell Labs, and only works on small static websites. They supposedly used Probabilistic Latent Semantic Indexing for paid search, but that works differently than LSI. And LSI Keywords are something from SEO Toolmakers and SEO users, and have never been used by Google ever.
Slawski » Clint
Clint, I was around at the start of SEO, and wrote about entities years ago, with absolutely nothing to do with LSI. I optimized a page on the Baltimore.org website for entities in 2005, because it helped that page to be better found for what I was trying to optimize it for – much better than repeating the keyword phrase over and over again. No LSI, and a knowledge that Google didn't care about LSI at all. Google couldn't care less about LSI. It wasn't something they have ever cared about, not even in 1999, when Brin filed the DIPRE patent to optimize for sites that had book databases.
BALTIMORE.ORG
Visit Baltimore | Official Travel Website for Baltimore Maryland


Michał
Why can't we just call these key-words? 😃
🀭3

Clint ✍️
Words, words around words form topics
Ammon Johns 🎓 » Michał
You certainly could. But, ironically that would eliminate the Latent Semantics of the fact that certain tool providers *chose*, entirely deliberately, to produce a so-so tool and pretend it had a secret sauce based on a technology and a patent they didn't even understand.
(The use of the terminology of LSI, but none of the terms that showed actual understanding, would be a Latent Semantic signal, were anyone ever to actually use LSI.)
It would ignore that this is a concept that came out of outright, cold-blooded fraud, to prey on the ignorant. Because oddly enough the more honest "SnakeOil and Bullshit Keywords" wasn't quite as convincing. Go figure, huh?
It would ignore that no matter what has changed with modern re-creations or reexaminations of the strategy, ultimately it has that same genesis – either someone was so dishonest and deceptive as to be in on the vile scam, or they were dumb enough to have bought into the scam and become a convert to bullshit.
We don't have the same hate for KeyWord Density (KWD). Sure, we'll tell you straight out it is a made-up thing with no basis at all in how any search engine ever worked. But KWD never pretended to be based on patents, merely on correlatory studies of pages. KWD was something sold in ignorance, not deliberate fraud and malice.
Gillispie » Ammon Johns
Keyword density matters. Last time I said this I got in a fight with the west coast SEJ crowd and deleted every post I made in this group.
But it matters (or did several years ago)…but not necessarily in the way most use it.
Stuffing gets smacked with a Panda-like filter/penalty.
And it's fairly easy to trigger.
Somewhat goes away with higher authority. I assume it's still there. But I haven't played around with a low authority site in a while. And we are careful with this on my existing sites. Might have to dust off a few old project sites and see if it's still in effect.
And in avoiding stuffing, you also naturally improve your LSI/Phrase based indexing or whatever you want to call it…and likely capture additional keyword search variations at the same time.
And it generally reads better to a human. After doing this for eight years, word repetition is like fingernails on a chalkboard.
I understand that synonyms aren't the only part of this, but it's a good chunk and what a lot of people mean when they say "LSI."
Our take on it is to create the most in depth content on the net in our niches. Basically what Kyle preaches minus the hard core testing and we do siloing quite a bit looser.
Ammon Johns 🎓 » Gillispie
Absolutely, KWD matters… to you. You've made it a part of how *you* determine what you are comfortable with in measuring content, and in combination with other things, including some common sense and human judgement, you've found success with it.
And that's all fine. I wouldn't argue with one word of that.
I just know that KWD isn't a measure any major search engine EVER measured or used in any way.
The funny bit, to me, is that most people today have no idea that Keyword Density was only one of three metrics that had to be used in combination to matter. The other two were Keyword Count, and Keyword Prominence.
This gave SEOs a pretty accurate way to talk to each other about exactly how they were using keywords in their content without having to either tell another SEO what those keywords were, or even to show the content (because if a rival SEO were in the same group/forum/list you didn't want to be giving away your exact formula for free).
In the days before Google, when on-page was all there was to worry about, it was 'good enough' as a metric for SEOs to use when sharing what was working.
The major industry tool of those days was WebPosition Gold, and it used the three keyword metrics not just for the page as a whole but also broken down by section, so a Title tag or heading could have its own metrics, and you could specify exact values for a leading paragraph separately from the rest of the page. It was extremely granular: you could share a doorway page template purely by description.
But you know what else worked even better in those same days? Just copy the leading page in the SERP, and serve it cloaked to the engines, and you'd naturally have the exact same rankings because on-page was all there was, and you had the exact same page.
Or just copy the top ranked page and change the brand name and product names, keeping the rest of the copy identical.
That gives you an idea of how basic those days were, and how even a made-up, ultra-basic way of describing an exact template for an exact page might work.
But the fact remains that no search engine ever used such a basic measure, and the SEOs who made it up knew they didn't. They just knew that the metric was a good way to talk specifics without giving away keywords.
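To make the three old-school metrics concrete, here is a minimal Python sketch. The formulas are assumptions for illustration only, not WebPosition Gold's actual math: density is taken as (occurrences × words in the phrase) ÷ total words, and "prominence" is a hypothetical score for how early the first occurrence appears (100% at the very first word). The function names `tokenize` and `keyword_metrics` are made up for this example.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def keyword_metrics(text: str, keyword: str) -> tuple[int, float, float]:
    """Return (count, density %, prominence %) for a keyword phrase.

    Assumed formulas (illustrative only, not any tool's real math):
      count      = number of exact phrase matches
      density    = count * words_in_phrase / total_words * 100
      prominence = 100 * (1 - first_match_offset / total_words)
    """
    words = tokenize(text)
    phrase = tokenize(keyword)
    n = len(phrase)
    if not words or n == 0:
        return 0, 0.0, 0.0
    # Word offsets at which the full phrase starts.
    positions = [i for i in range(len(words) - n + 1) if words[i:i + n] == phrase]
    count = len(positions)
    density = 100.0 * count * n / len(words)
    # Hypothetical prominence: the earlier the first match, the higher the score.
    prominence = 100.0 * (1 - positions[0] / len(words)) if positions else 0.0
    return count, density, prominence


# A page that is only an image plus one exact-match heading:
print(keyword_metrics("Blue Widgets", "blue widgets"))  # (1, 100.0, 100.0)
# An ordinary sentence using the same phrase:
print(keyword_metrics("Buy blue widgets here today", "blue widgets"))
```

The first call mirrors the thought experiment Ammon gives later in the thread: a page that is nothing but one exact-match heading scores a "perfect" 100% on both made-up metrics, which shows why these numbers only ever worked as shorthand between SEOs and say nothing about page quality on their own.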
Gillispie » Ammon Johns
You are actually arguing that stuffing wasn't likely something included in Panda? Back when the SERPs were still fluid, you could actually test this in damn near real time (with every crawl). Trigger the spam algo, go down 20 positions. Lighten your keyword density, recrawl, regain your rank, repeat. I want to say the October 2016 core update was when the SERPs got a hell of a lot stickier, likely to defeat testing and "churn n burn." They aren't as sticky now, as I think Google does more UX testing.
Oh yeah, that was the other half of the prior argument. Data collection from Chrome and UX testing. "No way does Google collect UX signals from Chrome." Despite data collection being in the Chrome and Google TOS (to "improve user experience") and obviously being used to calculate Core Web Vitals scores.
Ammon Johns 🎓 » Gillispie
No, but keyword density has absolutely NOTHING to do with how Panda worked.
Create a page, put a big infographic on it with no words or alt text, and give the page one exact-match heading. It has a keyword density of 100% and won't trigger Panda, because keyword density isn't a proper metric on its own, or at all. If it is a useful framework for you as a creator to make pages, that's great. But it is utter BS in terms of how search engines work.
Gillispie » Ammon Johns
There was a negative filter there. Might still be there. I saw it with my own eyes/testing, as did others. Whether it was part of Panda or its own thing, I don't know. Believe what you want. I'm out.
Ammon Johns 🎓 » Gillispie
Funny how that happens when I give an actual, easy to replicate example. πŸ˜‰
Gillispie » Ammon Johns
That's a dumb-assed example. You don't think Google put measures in place to stop churn n burn and shit content? During the end of days of Ezines and stuffing? Lolz. Anyhow, it's your show. I'm out. Probably won't post here again for six months. Got it out of my system.
Ammon Johns 🎓 » Gillispie
Well, that would give you plenty of time to consider the following as a freebie tip.
Why would Google need, or ever want, to use Chrome for data they already get when they render a page, given that they ONLY apply CWV to pages they render?
Seriously, what exactly do you think 'rendering' is? It is running it through a virtual browser built not for human eyes, but purely for machine reading and translation. Every page gets run through that before being added to the index, and THAT gives a 100% consistent CWV rating.
Why would they spend time on a redundant and inferior Chrome solution?
The thing about user data being collected to improve user experience is standard language in ANY tracked application, even stuff as simple as a text editor if it tracks usage stats. They use it to decide if the current levels of pre-fetching are right, if certain things crash it and might be fixed with better error handling or cache management, etc.
Personally, I think they do use Chrome data for some stuff, but only for stuff that (a) can't be gotten better another way and (b) is consistent and clear as a signal. I think aggregated Chrome usage data would certainly help spot malware, sneaky redirects, and cloaking, where the file sizes a user gets are significantly different to those Google's bots are getting. But there it is being used only as a signal of a potential thing, something to check out another way for a more reliable method. After all, anyone can 'drive' Chrome, and anyone could hack it to send different data. It's a cinch to hijack data-packets from your own machine or network, and change them before pass-through. Google are not going to give hackers that kind of manipulation tool.
Gillispie » Ammon Johns
It's (likely) for basic signals (which would change based on query intent). Time on page, scroll depth, shit like that. Things that you can get from GA – which they say they don't use. Yeah, cause they have better tools. And it only needs a sampling. And only for the top 5 or 10 positions. Basic algo math handles anything below that.
Ammon Johns 🎓 » Gillispie
But you are assuming that stuff like time on page and scroll depth would always, consistently, be a good signal for whether or not a page was a good result for a query. It isn't.
These are what is called noisy signals. I can make users scroll further by simply slapping in more white space, or taller images. It hasn't improved the page. In fact, less scrolling is something most users prefer, given the option.
Same with time on page, I can use denser, harder to read language, or simply swap fonts to make text slower to read, changing nothing else. Does this indicate a better searcher result?
Again, actual users prefer to be satisfied, to get the answer they wanted, in the shortest time possible. Time on page would be the reverse of what mattered.
This is why I say that webmasters and SEOs often have the wrong perspective. Time on page is good for us, because we're not thinking of whether this is a universal signal, but of whether people are taking the time our SPECIFIC content deserves. It's a creator-centric bias. But it is totally irrelevant to search result quality in any universal way.
Gillispie » Ammon Johns
Which is why I said based on query or query type: long-form info (how to grow x), short-form info (what is the weather x), buying intent, etc. F*ck with the reader too much (bad UX), and he/she/it will bounce quicker – maybe not pogosticking quick, but quicker.
Regardless, I'm done here.
Ammon Johns 🎓 » Gillispie
I just told you exactly how to test this on any query type at all: find ways to extend the time on page without changing the content value (there are tons of ways to do this). Then compare two test sites, both with the same content (there is no duplicate content penalty), one with more time on page forced, the other without. Control all other variables. Which version does Google rank top? Then change the attributes (easy if it was simply a switch to a harder/slower-to-read font) and see if the rankings swap.
Anytime you think something may be a signal, think about how you could cheat it if you knew it was. Then test the cheat.
Oddly enough, it turns out that the combined efforts of a few hundred Information Retrieval scientists have often already thought of most of the stuff I can come up with, and already decided not to use such an easily gamed metric.


Melissa
Bill is right. Google says the same thing Bill does. Don't ever believe any SEO or any tool that uses the phrase LSI.
Slawski
Thanks, Melissa.
There are a lot of people who strive for a magical solution that lets them do little work, add some terms from a tool, rank highly, and profit insanely. And there are people treating any Google feature designed to improve the search experience as if it were an SEO tool for generating LSI Keywords. Query refinements in SERPs are not LSI Keywords. Autocomplete suggestions are also not LSI Keywords. Terms from Google's Keyword Planner are not LSI Keywords.
A lot of blinded people looking for a magical solution, too afraid of doing actual work, have been sold a bill of goods that won't solve their inadequacies as SEOs.
Clint ✍️
So when I originally started this thread I thought the conversation was more a terminology point versus a ranking factor point. As the conversations grew, we had some great discussions about the topic where opinions were heard and ideas traded back and forth, at least for the most part.
But I was wondering, why in the world are we talking about LSI keywords again anyway? Then I did a little digging into why I'd run across this article again in the SEO news cycle during my research.
So here are my thoughts on LSI keywords and the whole situation in our industry surrounding why it's a topic again.
https://youtu.be/DqSv6N1k8r0
LSI Keywords Are A Thing – SEO This Week V2 Episode 4

Ammon Johns 🎓 » Clint
I do get your point, that from your perspective it's just a name, and that shouldn't matter.
Sadly, I'm not so sure you get ours: that it genuinely does matter, and for a specific reason.
When LSIkeywords first came along as a concept, in the Noughties, they chose to put those words together, which had never been together before, for a single reason – to claim that the entire idea was based on Latent Semantic Indexing.
The first page about LSI had misquotes from the patent and everything. It just didn't actually understand the patent, and conveniently left out all those little details like the fact it doesn't work for large corpuses (more than a max of around 20,000 total documents), isn't suited to any use where new documents are added with any regularity over at most about 1 per month, and was never meant for the web in any way at all.
https://web.archive.org/web/20140123175709/http://lsikeywords.com/ is a capture of the lsikeywords domain from 2014 (the domain was first registered in the Noughties, but I wanted to show how they were still firmly tying this to the specifics of LSI years and years later), where you can see they are specifically tying this to LSI (Latent Semantic Indexing). No mistake, and no pretence that LSIkeywords are a different and separate thing from LSI the patent and methodology.
In other words, it was a shyster's con, pitched to pretend it was based on fancy techniques and patents it had nothing to do with. It was flat out fraud, purpose built to fool people who wouldn't read the patents.
They never picked LSIkeywords to mean Lazy Shyster Idiocy keywords. They picked it to deceive people. The inventor of the term picked it specifically, deliberately, and openly, to lean on the LSI patent for 'authority'. And sadly, a LOT of people were taken in. Reading patents is damn hard work, and even after all my years of practice now, I seriously struggle sometimes to fully get my head around the math of the actual equations. Most people won't bother, or will just take someone else's word for what a patent is about.
Anyone not into the fraud, and not fooled by it, obviously didn't take the name on to use for something else. So the continuation of that name is *all* down to either someone who understood the con, liked it, and wanted to borrow it to con people for themselves, or someone taken in by the con, who thought LSIkeywords was a real thing and did NO level of research at all (thus not finding the hundreds of pages explaining what a crock it all was).
Either way, not exactly someone you'd want to trust. And that's the bottom line.
Look, there's a lot of cons in SEO. Manipulations and deceptions. I mean, right at the start of your video there you say you "were suddenly hearing about LSI keywords everywhere" on Facebook and twitter, and didn't mention that the only person I've seen start a conversation about this in the past 6 months is you, in this thread. But making out it was something people other than you were talking about made it seem a more topical video, and that's just marketing.
But there's a big difference between spotting some minor marketing manipulation, knowing it for what it is, and overlooking it as harmless; and not spotting outright lies or what they say about the person telling them. LSIkeywords was never 'harmless'; it was always deliberately deceptive. It was always outright fraud. Anyone who then picked that up and ran with it as a good idea… that's someone you never want to trust.
The mere serious mention of LSI or LSIkeywords as if it were a real thing is like walking around with a giant sign that says "Hey, I'm either a con-man or a fool, come find out which!"
Clint ✍️
I'm not some child who doesn't understand what the grown-ups are talking about, I totally understand your argument, and it's just as I said.
Because LSI Keywords have nothing to do with LSI (Latent Semantic Indexing), the name is wrong, so the strategy that was developed under the name LSI Keywords must be a strategy used by fools.
This brings me full circle back to the point of my original post where, as pointed out, we're talking about a naming convention. Regardless of the reason, it was named as such.
If the strategy was named CRK (Common Related Keywords) you wouldn't have a problem with it, even if the idea came from someone who read the LSI document, didn't completely understand or ignored the entire concept, and came up with CRK.
To this day, we've been following the basic tenets of common related keywords (CRK) ever since tool makers (by hook or by crook) coined LSI Keywords.
The true fool in this entire absurd argument is the person who completely discounts a strategy because the name was hijacked from something else.
A reasonable person would point out that the name is misleading while the concept has merit, then suggest a new one. An unreasonable one will go on Twitter and bash a successful guy over a misleading name, on a subject he wrote about in the past, just to boost their ego and pretend they are doing it for ethics' sake alone.
💟3
Ammon Johns 🎓 » Clint
The word 'synonyms' and the term 'keyword variants' had been in the SEO vocabulary since the '90s and were already well known and established.
If someone had wanted a brandable term to pretend they'd invented it, and that it was a 'strategy' (keyword choices are NOT a 'strategy', nor even a tactic, but merely just a basic part of creating content.), and they'd named it Moving Man Keywords, or The Apartment Block Keyword Strategy, stuff that didn't mislead, sure. I'd still think anyone talking about it was a fool who'd been taken for a mook, but everyone starts out uneducated and ignorant until they learn.
But this wasn't 'branding'. This was fraud. This was deliberately lying to people to take advantage of them in pretending it was based on a science, rather than based simply on stuff that worked. That's a HUGE difference in intentions.
Why did they pretend it was based on a patent and some super-secret science recipe? Simple. Because you couldn't go to any SEO forum or site and NOT read advice to use lots of different keyword variations, keyword combinations, and synonyms. Without the LSI bullshine, there simply was no product at all. Without the fraudulent pretense that it was all based on some patent, they were selling air that was free and abundant everywhere.
What about today?
We know for a fact this is confusing and damaging. I mean, you're an experienced SEO who already knows this stuff, and even so, in your video you kept referring to it as just "LSI". If someone with your experience can't keep LSI separate from LSIkeywords when explaining stuff, what hope has someone just starting out got?
One thing I have done a lot over the years as a consultant is advise on recruiting. Sometimes I even run the recruitment and handle interviews, but much more often I'll help a client come up with good questions to ask that help tell the difference between an 'in the trenches' SEO, and someone who's read a bunch of blogs, and most especially, people who have bought into common myths.
For more senior roles especially, LSI and LSI-derived keywords have functioned more than once as questions where if they believe they are a real thing, they are instantly disqualified.
Even someone really, REALLY bad at reading patents can see within moments that LSI is an index, built around scoring a page for how its patterns compare against all the rest of the documents in the entire index. That's why when you add or remove just a single document you have to rescore all the documents and rebuild the entire index. So by the exact same blatantly obvious logic, if anyone built an index with one single page more or less or different to the version Google had in their index, all the scores would be entirely different. It's a Mug Metric.
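The rescoring point can be seen in a toy example. The Python sketch below is a deliberate simplification, assuming raw term counts and a single latent dimension (the top singular vector of the term-document matrix, found by power iteration); real LSI keeps many dimensions and normally weights terms first. It scores three tiny documents, adds one more, and prints how the original three documents' scores move even though none of them was edited. The function names `term_doc_counts` and `lsi_scores` and the sample documents are hypothetical.

```python
import math


def term_doc_counts(docs):
    """Build a term-by-document matrix A of raw word counts (rows = terms)."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({w for words in tokenized for w in words})
    return [[words.count(t) for words in tokenized] for t in vocab]


def lsi_scores(docs, iterations=500):
    """Return each document's coordinate on the top latent dimension.

    Rank-1 truncated SVD of A: power iteration on G = A^T A converges to the
    top right singular vector v (assuming G's top eigenvalue is dominant),
    and each document's coordinate is sigma * v[i].
    """
    A = term_doc_counts(docs)
    n = len(docs)
    # G[i][j] is the dot product of documents i and j's term-count columns.
    G = [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]
    v = [1.0] * n
    for _ in range(iterations):
        w = [sum(G[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # For a unit vector v, v^T G v is the top eigenvalue, i.e. sigma squared.
    sigma = math.sqrt(sum(v[i] * sum(G[i][j] * v[j] for j in range(n)) for i in range(n)))
    return [sigma * x for x in v]


corpus = ["seo keyword research", "keyword density myth", "seo content writing"]
before = lsi_scores(corpus)
after = lsi_scores(corpus + ["keyword seo seo patent"])

for doc, old, new in zip(corpus, before, after):
    print(f"{doc!r}: {old:.3f} -> {new:.3f}")
```

Nothing about the first three documents changed, yet every one of their scores moves, because each score is defined relative to the whole collection. That is the practical sense of the "Mug Metric" point: a score computed over any corpus other than the engine's own index would not match the engine's.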
Clint ✍️
Dude, you're arguing with yourself at this point. I stopped reading when you chose to ignore the fact that I'm agreeing with you about the naming but saying that the method behind it works.
Doc Sheldon 🎓 » Clint
If "LSI keywords" doesn't refer to latent semantic indexing, then all it really refers to is "keywords" – and all the offshoots, such as synonyms, misspellings, etc.


Josh
Damn, this thread is long – can anyone do a TL;DR? thanks😘
🤭2

Marco » Josh
Kazanseo will have a text summarization tool soon lmao (for real)
💟


Fleming
Google does not use it, period. The internet is far too large for Google to even consider using it, as it would eat most of their time and server resources. Instead of using this term, let's just say "use synonyms." Voilà, job done.



