Does Google Usually Crawl the Text Content of a URL First and Delay Indexing It?



Clint
Question: do you think that when Google indexes a URL, it has already analyzed the content and processed it for placement for a query?
Or do you think they are indexing first, then coming back to analyze the content later?

Ammon Johns 🎓
They have parsed it for content, and processed a lot of the data, but rankings are determined only at the point of query. Some of the signals it is ranked on, of course, are processed in advance, especially those relating to links, trust, authority, quality.
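The split described above, between signals processed in advance and rankings determined at the point of query, can be sketched roughly as follows. This is only an illustration under stated assumptions: the `IndexedPage` fields, the `authority` and `trust` scores, and the weighting formula are all hypothetical and say nothing about how Google actually stores or combines signals.

```python
from dataclasses import dataclass


@dataclass
class IndexedPage:
    url: str
    terms: dict[str, int]  # term -> count, parsed when the page was indexed
    authority: float       # hypothetical precomputed link-based score, 0..1
    trust: float           # hypothetical precomputed trust score, 0..1


def score(page: IndexedPage, query: str) -> float:
    """Combine query-time relevance with signals stored at index time."""
    query_terms = query.lower().split()
    # Relevance depends on the query, so it can only be computed now.
    relevance = sum(page.terms.get(term, 0) for term in query_terms)
    if relevance == 0:
        return 0.0
    # Authority and trust were computed earlier and are just looked up.
    # The weighting is invented purely for illustration.
    return relevance * (1.0 + page.authority + page.trust)


def rank(index: list[IndexedPage], query: str) -> list[str]:
    """Return matching URLs, best score first, for a single query."""
    scored = [(score(page, query), page.url) for page in index]
    return [url for s, url in sorted(scored, reverse=True) if s > 0]
```

In this sketch, changing a page's stored `authority` changes its position for every future query without the page being parsed again, which is the distinction the answer draws.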

Nathan » Ammon Johns
When is trust and authority re-processed? Is it entity-based or page-based?
Ammon Johns 🎓 » Nathan
A large part of Authority *seems* to be link based, thus it is continually updated along with the link graph recalculations. Authority in the usual sense of Search Engine Optimization (SEO) is not Topical Authority. However, Topical Authority would also be updated with the link graph.
Trust is probably more based on what is on the site (although this includes links), and is probably mainly reevaluated with each crawl. If there is a link graph based component as well, with trusted sites 'lending' trust to those they link to, then it probably gets updated every time the value is known to change.
Just to add, if Brand search has any effect on 'Authority' rather than being its own 'importance' metric, then again, whenever the score is known to be adjusted, Google have to store it somewhere, and the most efficient place is where they index it.
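To make "link graph recalculation" concrete, here is a minimal sketch of the classic published PageRank power iteration. It is an assumption that this resembles what is meant by link-based Authority here; the function name, the damping factor of 0.85 and the iteration count are illustrative choices, not anything Google has confirmed about its current systems.

```python
def pagerank(
    links: dict[str, list[str]],
    damping: float = 0.85,
    iterations: int = 50,
) -> dict[str, float]:
    """Approximate PageRank scores for a small link graph.

    `links` maps each page to the pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with the same small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current score evenly across its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank
```

Because every score depends on the scores of the linking pages, adding or removing a single link shifts values across the whole graph, which is why a link-based score would need to be recalculated continually rather than once per page.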


Mišo
It takes months for Google to evaluate a website's quality and relevance. Rankings for various queries can keep moving for as long as a year before settling down.
If the position where a page landed was deserved (i.e. no signal forcing was used), it can stay there for years, unless the competition does something significantly better; stagnation can mean moving backwards.

Ammon Johns 🎓
Things will certainly fluctuate over months, but that's more about changes, and slowly picking up links and recalculating the ever-changing link graph repeatedly over time. Remember, Google make tweaks to the algorithms almost constantly, and we tend to get major infrastructure changes, ones that affect all searches, a couple of times each year.
The initial ranking of on-page relevancy and quality is instantaneous. How the rest of the Internet relates to your page is what changes over the months, along with how Google approach the query, and what signals they have available (infrastructure changes almost always add new capabilities to process new data, or old data in new ways).
Mišo » Ammon Johns
Many factors are at play, I agree. I should have mentioned the trend factors that can boost things such as news articles. In my mind, I was referring to articles that are meant to be relevant for a longer period of time. When no forcing signals are used, algorithm tweaks have little to no negative impact on this kind of content.
Ammon Johns 🎓 » Mišo
When Google brought in the Hummingbird update, it affected more than 10% of all searches, and had nothing to do with spam or quality, merely in enabling better query rewriting.
The infamous 'Medic' update affected thousands of legitimate, non-forced webpages, just by tightening up how strong their brand/authority needed to be.
There are a lot of changes Google introduce that massively affect pages not forced, not spammed into place, simply no longer meeting some threshold Google set, and those thresholds move and change frequently.
Mišo » Ammon Johns
Yes, that is true. It seems my viewpoint is simply too far outside the scope of SEO alone, so I understand that it does not fit into this topic well. I had been working on recovering some major medical providers that have millions of monthly searches, and always brought them back. My reply focused too much on SEO alone, as if all other aspects of the web were already in place. However, I keep forgetting that in most cases they are not.


Robert
Good question. I think it depends on the queue: how many pages are waiting to be indexed versus how many pages are waiting to be evaluated for relevance and quality. Those are different algorithms.
I think we're seeing right now that Google is strapped for resources and there's a backlog on all kinds of different things. I think this is related to the explosion in online interactions and business since COVID. Their IT infrastructure improvement projections were rendered useless when COVID came and accelerated the move online, and they quickly had trouble keeping up. Combine that with supply chain issues for computer and server hardware, and you have a backlog in crawling, indexing, and quality and relevance analysis.
I think in the past, all of these things seemed to happen at roughly the same time. This is because Google was able to predict fairly accurately how many new web pages were being created over time and plan their IT infrastructure improvements accordingly. But because of COVID, all bets are off, and now they're scrambling to keep up.
Jubair
On one of my blogs, a new one, we have published 12 to 15 articles. Some posts cover very unique topics, and those get indexed quickly; I didn't even request indexing manually, but when I checked after 2 days they were indexed. The problem is that other posts cover very common topics, where plenty of people have already published on the same topic and concept. Even though those posts are unique, they still take a long time to index; sometimes I see one not indexed after 20+ days.

Ammon Johns 🎓 » Jubair
Yes, the subject matter, and how topical or useful Google believe it will be to search users is absolutely a factor.
If there are keywords that are trending in links to the new page, meaning Google is looking for fresh content because something has changed in context, then it has a higher than normal crawl priority.
If content is published on a site for which Google already have a lot of other pages, and most of those pages don't do especially well in search, and there is little to no brand search, then that would have a lower priority.
I have talked about this in more detail elsewhere in the group.
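The priority factors mentioned in this reply can be sketched as a simple priority queue. Everything here is a hypothetical illustration: the `CrawlCandidate` fields, the 0..1 scales and the weights are assumptions made up for the example, and the real signals and their weighting are not public.

```python
import heapq
from dataclasses import dataclass


@dataclass
class CrawlCandidate:
    url: str
    trending_link_terms: bool       # keywords in links to the page are trending
    site_search_performance: float  # 0..1, how well the site's other pages do
    brand_search: float             # 0..1, relative brand search volume


def priority(candidate: CrawlCandidate) -> float:
    """Higher means crawl sooner. The weights are invented for illustration."""
    score = 0.5 * candidate.site_search_performance + 0.5 * candidate.brand_search
    if candidate.trending_link_terms:
        # Fresh content on a trending subject jumps the queue.
        score += 1.0
    return score


def crawl_order(candidates: list[CrawlCandidate]) -> list[str]:
    """Return URLs in the order a priority queue would hand them out."""
    # heapq is a min-heap, so priorities are negated; the position index
    # breaks ties in favour of the candidate that was queued first.
    heap = [(-priority(c), i, c.url) for i, c in enumerate(candidates)]
    heapq.heapify(heap)
    order = []
    while heap:
        _, _, url = heapq.heappop(heap)
        order.append(url)
    return order
```

Under these assumptions, a new post on a common topic, published on a site with weak search performance and no brand search, sits near the back of the queue, which matches the slow indexing described above.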
Jubair » Ammon Johns
Exactly, thanks for making it clear 😊





