Does Google Crawl and Index User-Generated Content Sites like Quora and Medium More Easily than Yours?



Ammon Johns 👑
Indexing Paradigm Shift
It's too early to be absolutely certain, but it is my strong suspicion that the changes we have seen in indexing are not entirely a temporary situation. Many sites at the lower end of the 'global importance' spectrum are not getting content indexed as quickly as before, and in some cases are having difficulty getting it indexed at all.
You see, it comes after some years of mumblings and rumblings within Google about their issues with quantity-over-quality spam. And coincidentally, it follows right on the heels of vague announcements about new methods of dealing with spam in a spam update.
The problem for Google was sites like Quora, and a hundred other similar sites, where masses of user-generated content could produce literally hundreds of thousands of new pages in the course of just a few hours. Yet the vast majority of the new URLs this created were most often just rehashes of the same discussions found on a thousand already-indexed threads and URLs from the same site.
Then there is the increasing availability of so-called Artificial Intelligence (AI) copywriting software: software that can create huge volumes of very low-quality content at almost no cost in resources. Now, machine-generated content spam is, despite the bullshit from the so-called AI content software vendors, nothing new. There was software that did much the same thing years and years ago, scraping and spinning existing content into new forms according to a set of rules and instructions, and all the new software is doing is calling those rules and instructions "Artificial Intelligence".
Google have systems to deal with this.
I just believe that they are improving and extending them. Crawl Prioritization has always existed in Google, and is largely a scalable, self-regulating thing. Every known URL that Google wants to crawl, either to index for the first time or to revisit to check for updates, is assigned a place in the long queue according to a priority scoring system. Ultra-high priority pages and revisits get taken care of almost at once, while at the other end, ultra-low priority URLs may wait months or even years, and possibly never get a turn at all, as higher priority items get added to the queue ahead of them.
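As a rough mental model (not Google's actual system; the URLs and scores below are invented), the queue behaves like a priority queue: the highest-scored URL is always fetched next, and low-scored URLs can wait indefinitely as new, higher-priority items keep arriving ahead of them.

```python
import heapq

def crawl_order(url_scores):
    """Return URLs in the order a priority-based crawler would fetch them."""
    # heapq is a min-heap, so negate scores to pop the highest score first.
    heap = [(-score, url) for url, score in url_scores.items()]
    heapq.heapify(heap)
    order = []
    while heap:
        _, url = heapq.heappop(heap)
        order.append(url)
    return order

queue = {
    "bigbrand.example/new-product": 95,    # high importance: crawled almost at once
    "avg-site.example/blog/post-412": 20,  # low priority: may wait months
    "news.example/breaking-story": 99,     # trending: jumps the queue
}
print(crawl_order(queue))
```

In the real system the queue is never empty, which is the point: a URL scored like that blog post never reaches the front if higher-scored URLs keep being added.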
It has always been quite typical that an 'average' website with 'average' importance and link popularity values for its size would have about 90% of its content indexed, and about 10% not indexed, at any given point in time. Sites such as Amazon and eBay might have a far, far higher percentage of unindexed product pages simply because they change so quickly, and are often at such deep levels of the site in terms of links to follow.
However, over the past years, I have seen a steady increase in the number of regular, average-ish business sites, advised to blog and create fresh content every few days, that have a MAJORITY of their content not freshly indexed and reindexed.
Basically, what PageRank was going into and around the site from the few pages that earned any genuine links was being spread so thin, flowing to all the thousands of essentially pointless blog posts nobody cared about, that the strength in any one page was diluted to the point where it just wasn't seen as important.
My suspicion then is that Google have tightened this up, just a little, to help them find truly worthwhile pages even in a swamp of dross, and at the same time, make it much clearer to quantity spammers that their tactic is self-destructive.
So, all of this means that the following is my advice on how to focus with content going forward:
First, do not publish new stuff just to publish new stuff. If a new page is not built to convert for a specific campaign (e.g. a new product line), or is not a well-thought-out piece of content you are sure will attract a bunch of genuine new citations, rethink it.
You need to focus on more results from fewer pages going forward.
It is better to spend 2-3 months creating one, absolute killer piece of content, such as a major study, a really good survey with expert insights, or otherwise something truly special and remarkable that will gain you a bunch of high-value links and buzz, and have people searching *specifically* for that study/page, than to churn out minor posts a few times a week that gain minor links.
Crawl Prioritization was always going to become more and more of an issue over time. That's how power laws work. The rich get richer.
Crawl Priority is a complex system, and some of it is driven by news events, big trends, and other circumstances beyond your control. But a lot of it is stuff you can have some control over, or at least intelligently leverage and influence.
'Importance' of a site is one of the signals for priority, and the factors that show importance are the strength and power (not volume) of links and citations. When a major or local news site mentions you, that is a sign of importance. When your granny mentions what a good boy you are, that isn't. If you go through some incredible shenanigans to get adopted so that you have a thousand 'Grannies', and they all cite what a good boy you are, that STILL isn't important. Focus on quality links, links that can't be bought, or faked, or gained by the worst of your rivals.
If you have pages that get a lot of search impressions, but almost no clicks, consider revising, updating, or else pruning them. If Google see that the pages it already has for your site just rarely perform, they don't tend to adjust the ranking of the site, but they may lower the priority of grabbing any more.
Raise your profile. Be part of communities and conversations. Not to have your contributions there filled with links, but so that you get mentioned more, that you are discussed because of who you collaborated with, assisted, or generally were there for. This shows a level of importance wider than just what you are saying about yourself.
If you are regularly adding content to your site, for promotions, products, or simply news and updates, make absolutely certain that you are gaining links and citations, signals of importance, even faster than you are adding content that dilutes what you have. Focus on earning great citations.
Another part of crawl priority is in supply and demand. Think very, very hard before writing content that effectively already exists out there from a thousand other sources. Focus on content that is hot topically, that is 'on trend', and especially where you have a unique perspective (or can hire one), that separates your content from the masses.
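The impressions-versus-clicks advice above is easy to turn into a quick audit. Here's a minimal sketch, assuming you've exported per-page impressions and clicks (e.g. from Search Console); the thresholds are arbitrary placeholders for illustration, not anything Google has published.

```python
# Flag pages that Google shows often but searchers almost never click:
# candidates for revising, updating, or pruning, per the advice above.

def pages_to_review(stats, min_impressions=1000, max_ctr=0.005):
    """Return URLs with plenty of impressions but a click-through rate near zero."""
    flagged = []
    for url, (impressions, clicks) in stats.items():
        # Only judge pages with enough impressions to be meaningful.
        if impressions >= min_impressions and clicks / impressions <= max_ctr:
            flagged.append(url)
    return flagged

stats = {
    "/killer-study": (50_000, 2_400),     # performs well: leave it alone
    "/blog/filler-post-17": (8_000, 12),  # ~0.15% CTR: revise or prune
    "/blog/tiny-post": (40, 0),           # too little data to judge
}
print(pages_to_review(stats))  # ['/blog/filler-post-17']
```

The exact cutoffs matter less than doing the exercise: the pages this surfaces are the ones most likely dragging down the site's crawl priority for no return.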


Roger
I think you're right, Ammon. I think it's part of a years-long trend, as there have been a LOT of people complaining about getting indexed. But it does seem to have been turned up a little bit.
I don't mean to spam your post, but Fabrice Canel, the person in charge of crawling and other things at Bing, said a lot about choosing which pages get indexed, and I think much of what he said reflects concerns at all search engines, not just Bing.
"We are business-driven obviously to satisfy the end customer but we have to pick and choose… We are guided by key pages that are important on the internet and we follow links to understand what's next… So the view that we have of the internet is not to go deep forever and crawl useless content."
https://www.searchenginejournal.com/want-to-learn-about-Bing-and-search-ranking-here-you-go/426219/
Want to Learn About Bing and Search Ranking? Here You Go!

Ammon Johns ✍️👑
Yup, Crawl Priority has always been something to think about in the background. Even in the Cre8asite days I did sometimes talk about it. Crawling the entire Internet, or at least, all of the bits of it likely to be useful, is an incredible logistical challenge, and I always felt that the amazing complexity of Crawl Prioritization was sadly completely overlooked by far too many.
Back when we did the Q&A with Andrey Lipattsev of Google, one of the things that stuck with me was how quickly he dismissed it when he asked what use we had for knowing PageRank, and I said I liked it for knowing a factor of crawl priority. Not because it is *not* a factor necessarily, but because of how many other factors come into account and consideration.
We know, for example, that one reason Google so readily made their unprecedented move of PAYING Twitter for access to the firehose is that Twitter is almost uniquely useful for quickly detecting new trends, changes in meaning or intent, etc. That detection can prompt Google to prioritize the crawling and indexing of matching keywords, to be even faster at picking up on news, etc.


Mark
I feel a lot of the confusion around the indexing subject has been created by Google themselves by saying "create better content and we may or may not index the page in the future". Folk are just a little confused as to what exactly they need to do in order to get a page indexed and keep it indexed. It's great that you are taking some of that confusion away in your post here. My question to you, Ammon, is… Do you know of any tools that people can use to grade the quality of their content, as a guide to see if it will get indexed or not?

Ammon Johns ✍️👑 » Mark
You mean other than using Google? 😃
There's no one simple tool, Mark, because the priority depends on so many different factors, each of which is complex in its own right.
Google will prioritize stuff that people search for over stuff people don't. (Search volume metrics, both for the topic, and for the brand it is published on, as both tell them there's an audience wanting it).
They'll prioritize content on important sites over less important sites. (PageRank and Page Citation metrics)
They'll prioritize any fresh content relating to keywords that are trending in news or showing burstiness, to see if there's something happening, changing, or new. (General and topic-specific news feeds help here, but prediction is better than reaction.)
But which of these is most important, or even present at all, is going to vary, making it difficult if not impossible to 'backward engineer' just by looking at what gets picked up and what doesn't.
But that's all great news. Complex things that take an experienced eye, rather than a cheap tool, to determine are WHY people hire an SEO.
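To make Ammon's "many factors, varying weights" point concrete, you could imagine the blend as something like the function below. To be clear, the formula and the weights are pure invention for illustration; only the three ingredients come from the list above, and the whole problem of "backward engineering" is that you don't know the real weights, or even which factors are in play.

```python
# An invented scoring function blending the three kinds of signals discussed:
# search demand, site importance (PageRank-like), and trend burstiness.
# All inputs are normalized 0-1; the weights are arbitrary assumptions.

def crawl_priority(search_demand, site_importance, trend_burst,
                   weights=(0.4, 0.4, 0.2)):
    """Blend demand, importance, and trendiness into one 0-1 priority score."""
    w_demand, w_importance, w_trend = weights
    return round(w_demand * search_demand
                 + w_importance * site_importance
                 + w_trend * trend_burst, 3)

# A trending post on a strong site vs. the same-demand post on a weak site:
print(crawl_priority(0.6, 0.9, 0.8))  # 0.76
print(crawl_priority(0.6, 0.2, 0.0))  # 0.32
```

Change the weights and the ranking of the two pages can flip, which is exactly why no single cheap tool can tell you whether a given page will get picked up.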
Mark » Ammon Johns
Maybe it's time I went back into freelancing… Complexity excites me!
Ammon Johns ✍️👑 » Mark
It has always been what I love too. Complexity is what means every day is different to the one before. Two different clients who, to the uninitiated, seem to need exactly the same thing can, in reality, be completely different jobs, each fresh and exciting.
And the complexity is layered.
For instance, when I said "prediction is better than reaction" that is carefully nuanced, because, obviously, if everyone starts reading the trends to publish what changes, then all of that content from all of those sites is hitting the queue at the same time, all with that higher priority. Getting the jump, even already having a page on the change in the crawl queue, could be a huge advantage.
Mark » Ammon Johns
Well, it's not as if the SEO industry ever takes anything literally! The way I see it, people such as yourself and this group are taking SEOs on a journey from being SEO Robots to becoming SEO Marketers who use their brains. We know that most SEOs love nothing better than a checklist.
Ammon Johns ✍️👑 » Mark
Which is why I have no doubt that before the month is out, as soon as all the articles on Crawl Prioritization and not getting indexed start to really fly, some of the more exploitative out there will offer simple tools and checklists, designed to fool and take the money of those who seek simplicity and reject complexity.
Hopefully, those in this group, and who read this post won't be among the suckers taken in.


Jake
Always great to read a bit of your wisdom Ammon. Thanks for sharing your thoughts.
I'll see if/what this changes for a couple of cheap mass-page sites I have.
It seems that, over time, Google is getting better at making us do better. I have the feeling that, more and more, many tricks work just for a short period of time or don't work anymore.

Ammon Johns ✍️👑 » Jake
Most of it isn't retroactive, in that I don't see any signs Google or others are going back through what they have already indexed and applying new standards. BUT it can and will of course affect recrawling, so you may find that older pages slowly and gradually drop out of the index because the last crawled versions are deemed too 'stale' a capture to be reliable. That can take years to become critical, though it would start to be noticeable in about 6 months or so.
Jake
I'll keep an eye on these! It's going to be interesting.
One thing that popped up in my mind just now… Mass-page sites with thousands of pages, each targeting a location with the same text except for the location name, have been quite successful up until now.
The reasoning behind that was that Google wouldn't expect you to write a different page for each and every location you offer your service in. But then again, that's a lot of indexing for low value.
Will the algorithm lean more towards better service pages that just mention the locations the service is offered in? A good test to perform. I'll try to set something like that up if I have the time.
Thanks again for sharing your thoughts!


Buth
However, over the past years, I have seen a steady increase in the number of regular, average-ish business sites that have been advised to blog and create fresh content every few days, have a MAJORITY of their content not be freshly indexed and reindexed.
Hell yeah!
It was a big mistake for Google to focus on quantity, not content/page quality. I notice spam content wins even now. It means spam techniques work and Google can't handle that. On the other hand, Google's spam updates hit "average" business sites which focus on content creation, such as writing "How to do…" for the umpteenth time.

Ammon Johns ✍️👑 » Buth
In the early days for Google, the first few years, the size of the index was incredibly important. Google and FAST were constantly trying to outdo each other on just how much they had indexed, how complete their databases were.
However, once FAST went the same way as AltaVista and other rivals, sold off to Yahoo (which also owned a sizeable number of shares in Google, maybe as high as 20%), and Google were the only engine people cared about appearing in, well, the focus had to change.
To be fair, it did change too. Some of the big updates back then are still something webmasters of the time remember and shudder about, such as Jagger, or the Caffeine update.
Later we had years of Panda and Penguin updates, and now both of those just run constantly, completely automated, integrated into the core functionality of Google.
Let's face it, most of the complaints about spam on Google are from people who *think* the spam is what is holding the site in its position, but when they tried the same it didn't work. 😃
I see a lot of sites using very questionable tactics, but most of the ones that actually rank also have very good reasons for that ranking too. Most. Not all, of course. But it is very, very rare for a site that successfully manages to fool the algo, or spam its way into a position, to still be there 6 months later.
Google operate on a massive scale. Almost unimaginably massive, and sometimes that includes their timescales too. They don't like to penalize for spam, because that doesn't scale. They far prefer to work out a better algorithm, better signals, so that a hundred thousand other searches also get cleaned up. Even if it takes them a few extra months to perfect the tweak.
Buth » Ammon Johns
Thank you for the explanation. I notice that many SEO agencies use "spam" techniques. These techniques work, but only for a few days or a couple of weeks. Also, I know people who try to replicate these techniques and do it the wrong way. And I've noticed that many spam tactics don't work now, but people try to replicate them anyway and sometimes see some results (position increases).
Yes, Google has improved its algos and continues to improve them. So SEO is interesting work, and I'm happy to be here.




Adam J. Humphreys
I recently gutted a dental group site, consolidating a ton of pages. This resulted in a 45% lower bounce rate, considerably longer time on page, and ultimately way more conversions. Page speed optimization was definitely a factor: they went from a 25 mobile score on PageSpeed Insights to 92 (adequate), and 99 on desktop (enough). What I found from orgs told to write lots of keyword fluff is that they might initially get seen, but ultimately Google tests pages on a query-by-query basis. If people don't find a page useful, the links will only help so much. The more focused, useful, and relevant it is to people, the better. Instead of paying someone 150 to write 10 articles, maybe get them to do 3 of the highest quality possible instead.

Ammon ✍️🎓 » Adam J. Humphreys
Yup. For years a lot of small-to-mid-sized businesses were told by their SEOs that they needed to constantly be updating their blogs, creating fresh content, etc. We've all seen that same advice shared hundreds of times, for well over a decade, and closer to 15 years.
And never once did they think about the long term consequence, or how self-destructive that actually is.
It was about 8 years ago now that I started to notice that, more often than not, a lot of the issues sites had were that they had too much content for too little authority (PageRank) or, if you prefer, too little PageRank to keep the amount of content they had indexed. They'd kept on blogging, creating hundreds of mostly pointless pages per year, until ending up with a 20,000-page site that only had the importance to justify indexing maybe 200 pages.


Ammon ✍️🎓
The closest I can give to a firm confirmation is NOT especially scientific, but John Mueller did specifically give a like to a tweet I made covering these specifics, where his position didn't allow him to say so directly. https://twitter.com/Ammon_Johns/status/1457754631463968773
Ammon♞ on Twitter

Truslow 🎓
When first reading that earlier… I found it interesting that he said "we don't understand the URL" (because it's not indexed). There's no forced directive that makes that true. "Not crawled" means they know nothing about it, but "noindex, follow" directives suggest that you could have a page known and crawled (even frequently) that affects other pages without being in the index. There are other cases where I've seen a page that exists, and has been crawled, but is not indexed, too.
Can't tell what to infer from that statement. I've been trying to wrap my head around it for almost a month now. It may also be significant that he used the word URL rather than Page.
Ammon ✍️🎓 » Truslow
He was talking about not having an understanding of the URL based on itself, on its content, and instead using the context around references and citations of the URL to get data. Make sense?
Also, sidenote, I still try to never use or recommend NOINDEX,FOLLOW, as the pairing is a tiny bit oxymoronic. Google index a link by referring to the words around it, in the anchor, etc., but a NOINDEX says not to do that.
AltaVista famously would NOT index any page that had any ROBOTS meta tag at all (because the only purpose of one is *exclusion*, and to them any form of exclusion was total exclusion). Almost half of Google's first hires came from AltaVista, including at the most senior levels. As in, that famous AltaVista patent where, by the time it was granted, all three of the names on it were Google employees…
Truslow 🎓
Agreed on the noindex, follow. The only time I ever use that in practice is when it's decided that a specific tag (or attribute) for posts or products doesn't warrant having those archive pages indexed, but we would still like to have the connections implied. Now – whether or not Google actually makes those connections – I can't say. All I can say is that in that scenario, that's how I would LIKE to have them treat it and that is the best way that I know to signal those to be my wishes.
Ammon ✍️🎓 » Truslow
This one at least is a pretty simple experiment to set up and test. Have a page with high link value link to a noindex,follow page that in turn links to a document with no other inbound links. My bet is that the third page is treated as an orphan, and thus excluded, showing that the links, being part of the content they were forbidden to index, are not counted.
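For anyone who wants to reason the experiment through first, it can be simulated as a tiny link-graph exercise. This is only a toy model (the pages and the graph are invented), built on the very assumption being tested: that links found on noindexed pages are simply ignored.

```python
# Toy model of the three-page experiment: A (indexed, well-linked)
# links to B (noindex,follow), which links to C (no other inbound links).
# If links on noindexed pages are ignored, C is never discovered.

def discoverable(links, noindex, start):
    """Pages reachable from `start` when links on noindexed pages are dropped."""
    seen, stack = set(), [start]
    while stack:
        page = stack.pop()
        if page in seen:
            continue
        seen.add(page)
        if page in noindex:
            continue  # the page itself is known, but its outbound links are unused
        stack.extend(links.get(page, []))
    return seen

links = {"A": ["B"], "B": ["C"]}
print(sorted(discoverable(links, noindex={"B"}, start="A")))  # ['A', 'B'] - C orphaned
print(sorted(discoverable(links, noindex=set(), start="A")))  # ['A', 'B', 'C']
```

If a live version of the test then finds the third page indexed anyway, it's the model's assumption about dropped links that has been falsified.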
Truslow 🎓
Right… that would help me determine if PageRank is passing through. But it doesn't really say whether the thing I'm particularly interested in is happening.
The way I use tags (and categories and any taxonomy, really) is to show relationships between posts that naturally occur based upon subject matter, or things represented by something more than a brief mention in the post. The most powerful way to do that is, of course, a link from the post itself. But my hope is that if I tag all posts about "Link Value", which then basically connects older posts on the subject to the newer ones (one step removed, of course), Google understands that connection. And I'd really like that connection to be understood whether or not the "gap page" (i.e. the archive page set for that tag) is actually indexed and rankable itself.
Haven't figured out a good way to test that one yet. I don't care about the link so much as "When I say <this>, I'm talking about the same specific <this> that I'm talking about in these other articles." If I get link juice… cool. If I get some extra confidence from Google that what it thinks I'm saying is what I'm saying, that's what I'm hoping for. And if I get none of that, the user still gets some benefit, and I have some other techniques that reinforce the same idea anyway.
Ammon ✍️🎓 » Truslow
It's more than *just* a measure of PageRank, as in it lets you see if links on a noindexed page allow the page it links to to even get indexed. If they don't even count for that much, it's a pretty safe bet they don't count in any other meaningful or useful practical sense.
Moiz » Ammon
What are your thoughts on making pagination pages noindex but follow? Yoast removed this feature, and I've been indexing only the first page while /2/, /3/, etc. are set to noindex via a custom plugin.
Here is what Yoast says.
For a while, SEOs thought it might be a good idea to add a noindex robots meta tag to page 2 and further of a paginated archive. This would prevent people from finding page 2 and further in the search results. The idea was that the search engine would still follow all these links, so all the linked pages would still be properly indexed.
The problem is that in late 2017, Google said something that caught our attention: long-term noindex on a page will lead to them not following links on that page. More recent statements imply that if a page isn't in their index, the links on/from it can't be evaluated at all โ€“ their indexing of pages is tied to their processing of pages.
This makes adding noindex to page 2 and further of paginated archives a bad idea, as it might lead to your articles no longer getting the internal links they need.
Because of what Google said about long-term noindex, in Yoast SEO 6.3 we removed the option to add noindex to subpages of archives.
Ammon ✍️🎓 » Moiz
My understanding is very similar – that ultimately to NOT index the content of a page includes the links on that page as surely as anything else.
Links *to* a noindex page get indexed (if allowed) and means Google may have content *about* a page not indexed. But links *on* a page that is not indexed should not be indexed or recorded at all.
Moiz » Ammon
A bit confusing, but let's put it this way.
If I noindex a category, which is very normal, does this mean that Google wouldn't follow the content inside those paginated pages, as Yoast claims?
Truslow 🎓 » Moiz
Somewhere in this comment thread, Ammon and I were talking a bit about the "noindex,follow" thing. Not sure where it is right now… but it's in here somewhere. lol
Moiz » Truslow
I think I may have already gotten the answer going over the threads ✅
Ammon ✍️🎓 » Moiz
It would need to find other links to crawl than those on the noindexed page. So if they were crosslinked somewhere, then so long as one had some inbound link from an indexed page, it could follow the cross-linking without needing the blocked category page.
Also, if someone were using tags, and categories, Google could follow the tag pages links even if not the category ones.


Kelly
That's a great post Ammon, thanks for that… On a similarly related note, I know Matt Diggity is going to test 125 pieces of AI-written content on an expired domain this month… so it'll be interesting to see the results on that.

Ammon ✍️🎓 » Kelly
It will indeed. My prediction is that the quality of those links will be absolutely key to whether or not it works, but that it may also be modified by the precise niche and search terms involved.
My theory is that where search terms favour freshness, the freshness of the links count more, and aged links count less. While in search topics where information is relatively set, based on solid and established (older, authoritative) knowledge, older, more authoritative links can count more.
Kelly
That makes perfect sense… it'll be interesting to see the outcome
Kevin » Randy
Sooo quality content 🙂


Peter
Interesting things to think through. The one I still struggle with is content getting stuck in "discovered" where it targets long-tail keywords that don't have focused content already published (i.e. no direct competition). And I know there's search volume for at least one example, because I'm getting traffic to a mismatched post.
Your comments above make a lot of sense for competitive content. It's low to no competition keywords that have me baffled currently.

Ammon ✍️ 🎓 » Peter
If you can give a specific example, I may be able to give some equally specific insight into what may be in play.
Peter » Ammon
Thanks. I have 2 more things I'm going to try first before reaching out. On a different site I "accidentally" got a super-thin, very unoptimized page indexed within days (without Google Search Console (GSC) submission) for the keyword I stuck in the slug. And it was in a YMYL niche.
Resubmitted the same post with an updated URL last night. It's showing as indexed within a few hours of the request. We'll see if it stays 🙂.


Davison
Totally agree with this, and have been saying the same thing about marketplace websites (Amazon, Redbubble, Etsy, eBay, etc.) for ages on my YouTube channel.
I am also having problems with the site I work for as an in-house SEO. This is related to the website no longer being accessible to Googlebot – they seem to have changed their system in the latest algo update.
Alex » Ammon
For the last 6 months I've been cleaning up a site that had a few shockingly bad $5-per-1000-words texts full of mistakes (20 pages), and I've added 100 new, well-written articles that cover different parts of the topic, but the rankings have been slowly dipping. It's been driving me nuts! I've slashed the bounce rate, massively increased time on site, massively improved load speed, added schema, improved the design and conversion rate, as well as E-A-T signals, but the rankings keep dipping. Could it be dilution of link equity from the increased page count?

Ammon ✍️ 🎓 » Alex
It could be, but there are other potential things you'd want to rule out too. If the site has very few genuine citations, definitely look at what you can do to get at least a few going and see if it sparks any improvement.
Usually anything much under several hundred pages wouldn't be too much of an issue so long as the site had at least the usual basic kinds of citations – some reputable directories, local business links, that sort of thing.
That said, this is a new thing, so the 'usual' may no longer apply, and quite possibly a stricter requirement of 'importance' signal is required now.
Alex » Ammon
Thanks, that's my next step. Going to check pages that have low traffic and merge them to create more in-depth ones. Then time to get some more links!


Nick
Agreed on slower indexing being a counter to the flurry of AI content.
Do you think Google will rely on UX metrics to determine whether one page is more valuable than another page?
Authority?
Something else? Like branded searches? Or referral traffic?

Ammon ✍️ 🎓 » Nick
I believe it is a combination of a very large number of signals, just as ranking itself is. Even the topic (search volume of it), whether or not it is trending, and such counts. It is basically a supply vs demand kind of assessment. Does Google already have more supply of a given quality of content than there is demand for on the topic that links and site-based signals say your page is likely to be about? If they already have higher supply than demand, are there signals that your content is likely to be better than what they already have?
It's Google trying to stop wasting time crawling content that nobody wants or needs, so it can focus more on things that are in demand.
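The supply-vs-demand prioritization Ammon describes can be sketched as a priority queue, the same structure the original post attributes to crawl scheduling. The signals and weights below are invented assumptions for illustration; Google's actual scoring is not public.

```python
import heapq
import itertools

# Toy crawl scheduler: URLs are queued with a priority score combining
# illustrative signals. All weights here are made-up assumptions.
counter = itertools.count()  # tie-breaker so heapq never compares URL strings

def priority(site_importance, demand, supply, freshness_need):
    # Higher importance, unmet demand, and freshness raise priority;
    # existing supply of similar content lowers it.
    return 3 * site_importance + 2 * max(demand - supply, 0) + freshness_need

queue = []

def enqueue(url, **signals):
    # heapq is a min-heap, so negate the score to pop highest priority first
    heapq.heappush(queue, (-priority(**signals), next(counter), url))

enqueue("/breaking-news",   site_importance=9, demand=8, supply=2, freshness_need=5)
enqueue("/thin-rehash",     site_importance=2, demand=1, supply=9, freshness_need=0)
enqueue("/evergreen-guide", site_importance=6, demand=5, supply=3, freshness_need=1)

crawl_order = [heapq.heappop(queue)[2] for _ in range(len(queue))]
print(crawl_order)  # high-importance, high-demand URLs crawled first
```

In this model the thin rehash of already-well-supplied content sits at the back of the queue indefinitely as higher-priority URLs keep arriving ahead of it – which matches the "may wait months or even years" behaviour described in the post.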
Lonnie » Stewart
Big surprise
Stewart » Lonnie
Pillars gonna pillar.


Alex » Ammon
This post has really got me thinking. How does this tie in to covering a topic well? I'm seeing lots of lower-volume keyword terms, and it's unlikely a killer piece that gets great links will target those lower-volume terms. Each site will have a combination of low-volume terms and higher-volume terms you can create a killer piece for.
Here's a big/valuable site, https://www.investopedia.com/terms/g/gross_profit_margin.asp, and you'll see a ton of pages covering basic terms of the topic. (Interesting strategy, using a terminology lexicon on the site.) They're not amazing pieces of link-worthy content – these terms have all been defined a thousand times on the web – but I think they're there to show Google depth of topic coverage and to help a few users, though they'll definitely cause link dilution like you mentioned.
Quick questions:
1) In this case, are these pages you'd keep, or would you tell them to gut them?
2) What about indexed pages on a site that have no links and get under 50 visits per month? Would you gut these, or are you only worried about pages that aren't indexed?
3) What if a section of a site adds value to the user, but not to Google?
Let's say there's a site that compares 500 mortgage or car-financing schemes. Many of the schemes are almost the same, with only minor differences like interest rate, term, etc. – for the most part they're 95% identical. Would you noindex these, canonical them to a comparison-table page, etc.? They still add value to users, who can compare the different mortgages. So how does your principle of bigger killer pieces and not diluting link juice apply to these types of pages?
I really appreciate the time you've taken to post! I hope I haven't asked too many questions! 🙂

Ammon ✍️ 🎓 » Alex
Sometimes you may have a strategy that, for good reasons, is going to result in a lot of long-tail, low-individual-value pages that are unlikely to get many (if any) direct links – the classic glossary, for example. In such cases, as with anything else in any strategy, you have to plan for it.
In such a specific case, that means making sure you have more than enough links and 'juice' flowing into the site to support it. You plan ahead for the fact that the dilution is going to happen, and you ensure you have enough of the good stuff that it isn't *too* diluted by the time it has flowed into all the volume you need.
Think for a moment about a concert. There's a headline act, usually a support act, maybe more than one support act, that people pay to see. How do you ensure the necessary roadies, riggers, lighting techs, sound techs, support staff, and security, all get paid? By having a headline act powerful enough to bring enough paying customers to cover it, and a venue big enough to fit them in. Same thing at the core.
If there *need* to be a lot of ancillary pages that won't bring or drive enough value on their own, then you need to have something else that does the job of bringing the links and the business.
BUT, and this is important, you would still need to be sure that overall those glossary pages were, in some way, enhancing the overall value of the site, helping maintain its reputation (and link-worthiness) as a really important site on the topic. So that individual glossary pages may not bring many links, but the homepage gets a ton of extras because the site contains a full glossary. Otherwise, that glossary is a vanity project, and a site with similar power pages, but not carrying so much dead weight, would crush it since the links *it* gets on similar 'headliners' wouldn't be so diluted.
I *think* this addresses all those questions when you get your head round it, but if you still have any, fire away.
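The dilution Ammon is warning about can be made concrete with toy numbers – a drastically simplified, PageRank-style model where a page's link equity is split evenly among the pages it links into. The figures are illustrative only, not Google's actual math.

```python
# Simplified model: incoming link equity is divided evenly among all the
# internal pages it must support. All numbers here are illustrative.

def equity_per_page(total_equity, page_count):
    """Equity each internal page receives under an even split."""
    return total_equity / page_count

# Same inbound equity, with and without a 650-entry glossary in tow:
lean    = equity_per_page(100.0, 20)        # 20 "headliner" pages only
bloated = equity_per_page(100.0, 20 + 650)  # plus the glossary dead weight

print(f"lean site:     {lean:.2f} equity per page")     # 5.00
print(f"with glossary: {bloated:.2f} equity per page")  # 0.15
```

The 30-fold drop per page is the "dead weight" effect: either the glossary earns extra inbound links to the site (raising `total_equity`), or a leaner competitor with similar headliners outguns every page.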
Alex » Ammon
Thanks, that really helps me get my head around it. I've gone ahead and canonicalized 650 of these pages for now to a central page, and I've unpublished 40% of the remaining pages that had no traffic in the last 30 days. Will give it 30 days and see the effect, then will focus on links and slowly republish them as and when we have time to promote them and get them the publicity they deserve.


Lisa
With regards to clicks and impressions, that was an interesting point. It's something I feel totally out of control with: I type one thing, and Google edits and changes it. This point is bothering me.

Ammon ✍️ 🎓
Query rewriting is a very tough thing to grasp. Take, for example, how widespread it has become for SEO users who haven't really thought things through to use the keyword "near me" on their pages.
In reality, of course, when Google sees the words "near me" in a query, it knows the user wants results highly localized to their current location, and *rewrites* the "near me" into whatever the location is, as closely as Google can determine it.
So if you are on a smartphone, standing on the corner of Main and 31st Street, and search for "pizza restaurant near me", Google rewrites that into a local search for "pizza restaurant close to Main and 31st", or by your GPS coordinates. If you were on a laptop at home, Google would (if you were logged into your Google profile) pull your home location from your profile. If not logged in, it will estimate the location as best it can by ISP or whatever.
But never, in any circumstances, is Google going to be looking for pages that include the *words* "near me" in them. That's not how it works. I see literally thousands of sites getting this wrong, including some pretty big players.
But it is this rise of query rewriting that means a lot of smarter SEO users are moving away from the now-outdated approaches to 'keywords'. No longer is the whole phrase, the whole query, a keyword phrase; rather, there are keywords within any search that Google will generally match to entities in the Knowledge Graph, using synonyms wherever appropriate.
So "pizza restaurant near me" may actually get rewritten to "pizza restaurant (or any other synonym for a dining establishment that offers pizza on its menu or is associated with pizza in reviews) around [user location here]"…
To cope with this change and fluidity, which is no more than we've done all our lives when dealing with other humans, tools such as Inlinks ( https://inlinks.net/p/category/case-studies/ ) become particularly useful, helping you to map out the 'fact fingerprint' of an entity and its associations.
Remember, what the Link Graph does/did for links, is what the Knowledge Graph is doing for facts and information.
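The "near me" rewriting Ammon describes can be sketched in a few lines: the literal words are replaced with a resolved location before any page matching happens. The location signals and their fallback order here are invented for illustration; Google's actual pipeline is not public.

```python
import re

def resolve_location(gps=None, profile_home=None, isp_city=None):
    """Best-available location: GPS first, then profile, then ISP estimate."""
    return gps or profile_home or isp_city or "unknown area"

def rewrite_query(query, **location_signals):
    location = resolve_location(**location_signals)
    # The literal words "near me" never reach the page-matching stage;
    # they are replaced by the resolved location.
    return re.sub(r"\bnear me\b", f"near {location}", query, flags=re.I)

print(rewrite_query("pizza restaurant near me", gps="Main & 31st St"))
# -> "pizza restaurant near Main & 31st St"
print(rewrite_query("pizza restaurant near me", isp_city="Springfield"))
# -> "pizza restaurant near Springfield"
```

This is why putting the words "near me" on a page cannot match the rewritten query: by the time matching happens, those words are gone.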



