If there’s one thing that can drive a veteran SEO nuts, it’s the myths and misconceptions. Beyond the fact that they’re just damned irritating, there are real-world effects, as we often have to debate them with clients, to the point of wasting valuable time disproving them.
And if you’re me? There are some mental health issues, as I tend to lose my mind when I hear such crap out there… (no worries, I still manage to find it again each time).
One of the more common ones that I’ve railed against over the years is that Google is using click-data (such as SERP click-through rates) in their scoring (ranking) algorithms. Yes, I understand that it seemingly makes sense, but I’ve really never seen much to prove that it’s the case. For their part, they’ve generally stated that it is “noisy and spammable” and that they generally use it for “evaluation purposes and training data”.
Does Google use CTR in ranking algorithms?
First things first… I’ve covered this one a few times before: countless times in interviews and hangouts, in a post back in 2016, and most recently in May of this year (2019). In those articles I’ve already noted the countless occasions on which Googlers have chimed in on the topic. So, why would I be back beating this beast once again?
There’s a patent… something beyond mere statements from Google.
Deriving and using interaction profiles
Filed: December 11th, 2015
Awarded: August 20th, 2019
Ok, so the one thing that Googlers have often talked about is that click-data, and many other forms of implicit user feedback, are more about evaluating existing algorithms than they are scoring mechanisms for the data sets. And from my many years of studying Google’s approaches, it actually makes sense.
While talking about this patent, my pal Bill Slawski mentioned another that was related to this one. And it certainly is related, given that two of the Google engineers on this one are also on that one (Alexis J. Battle and David Ariel Cohn).
The Meat on The Bones
Ok, so let’s look at a few elements from this one:
“For example, if a commercial search engine has a new algorithm for determining search results for a search query, the commercial search engine may present results from the new algorithm, and compare the click rate of the results from the new algorithm to the click rate of the results from the old algorithm. A higher click rate on results from the new algorithm suggests that it is superior.”
And really, that’s the crux of it right there. But that’s just a start; it doesn’t account for various issues such as click-baity titles etc., as noted by: “(…) evaluating the quality of search results based solely on which results in a result set are selected by users (or “clicked on”) may not yield an effective evaluation.” And interestingly, they also talk about an “interaction profile” for users in this type of evaluation, which leans more on implicit user feedback, not just click-data per se.
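To make the evaluation idea concrete, here’s a minimal sketch of that sort of old-vs-new click-rate comparison. To be clear, this is entirely my own illustration; the log counts, names, and the simple comparison are assumptions, not anything lifted from the patent:

```python
# A hypothetical sketch of the old-vs-new click-rate comparison the patent
# describes. The log counts and structure here are my own assumptions.

def click_rate(impressions: int, clicks: int) -> float:
    """Fraction of impressions on a result set that led to a click."""
    return clicks / impressions if impressions else 0.0

# Assumed aggregate logs for result sets served by each algorithm.
old_algo = {"impressions": 10_000, "clicks": 3_100}
new_algo = {"impressions": 10_000, "clicks": 3_450}

old_ctr = click_rate(old_algo["impressions"], old_algo["clicks"])
new_ctr = click_rate(new_algo["impressions"], new_algo["clicks"])

# Per the patent's logic, a higher click rate on the new algorithm's
# results merely *suggests* it is superior; it is not proof.
if new_ctr > old_ctr:
    print(f"New algorithm looks better on CTR: {new_ctr:.3f} vs {old_ctr:.3f}")
else:
    print(f"No CTR improvement: {new_ctr:.3f} vs {old_ctr:.3f}")
```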
They mention:
- click-duration data
- multiple-click data
- and query-refinement data.
The interaction profile can include:
- which results in a result set a user clicked
- how long the user remained on the target web site
- and other user-behavior information.
Which, of course, is what’s known as implicit user feedback (more on that later).
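Reading between the lines, you could picture such a profile as a simple per-session record. This is just my own sketch of what that structure might look like; the patent enumerates the signal types, but these field names are hypothetical:

```python
# A purely hypothetical sketch of an "interaction profile" record; the patent
# lists the kinds of signals, but the field names are my own invention.

from dataclasses import dataclass, field

@dataclass
class InteractionProfile:
    clicked_results: list[str] = field(default_factory=list)   # which results were clicked
    click_durations: list[float] = field(default_factory=list) # seconds spent on each target site
    multiple_click_count: int = 0   # repeat clicks within the same result set
    query_refinements: int = 0      # follow-up queries refining the original query
```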
And the usages for this method include:
- determining the quality of ranking algorithms
- optimizing algorithms
- and detecting undesirable search results.
All of this, of course, can also be used in a sort of A/B testing scenario when doing quality-assurance testing on a new algorithm they’re considering implementing. Which, I guess, would be kind of funny if you’re checking rankings, or the test gets landed upon by a rank-tracking tool. But that’s another story…
Data that can be tracked can include the following (see the sketch after this list):
- the total number of clicks made on the set during a user session,
- the total number of single, long clicks made on the set during a user session,
- the ratio of long clicks to short clicks on the result set during a user session,
- the number of refinements of the original query made during a session.
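As a rough illustration, here’s how those session-level tallies might be computed. Again, this is my own sketch, not the patent’s code; it assumes clicks have already been labeled using the short/long definitions quoted further down:

```python
# A hypothetical sketch of tallying the per-session click data listed above.
# Assumes clicks are pre-labeled "short" or "long" (the patent's own
# threshold definitions are quoted later in this post).

def session_metrics(click_labels: list[str], refinements: int) -> dict:
    long_clicks = click_labels.count("long")
    short_clicks = click_labels.count("short")
    return {
        "total_clicks": len(click_labels),
        "long_clicks": long_clicks,
        "long_to_short_ratio": long_clicks / short_clicks if short_clicks else float("inf"),
        "query_refinements": refinements,
    }

# e.g. a session with three clicks and one refinement of the original query:
print(session_metrics(["long", "short", "long"], refinements=1))
```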
Certainly there are some things we can glean from this as far as methodology goes, but let’s not run off and start thinking that these, in themselves, are being used as scoring elements in the core results. There simply isn’t much out there to qualify those types of statements.
Short and Long Clicks
It’s also probably worth noting what exactly this is. For most of us it’s what we’d talk about as “pogo-sticking” and “time on page” type metrics. But the specifics are mentioned in the patent, so it’s worth highlighting here as well…
“A short click indicates that the user returns to the result set shortly after clicking on one of the results (e.g., a user clicks on a URL in a result in a result set, views the page associated with the URL, and immediately clicks the “Back” button on the user’s Internet browser). In one embodiment, a short click is a click that is less than 80 seconds in duration.”
“A long click indicates that the user either returns to the result set after a relatively significant time viewing other pages (e.g., after 200 seconds or more), or does not return to the result set. A long click often indicates that the user located content of interest upon clicking on a result in the result set.”
That’s actually somewhat interesting, as I’ve not previously seen them stated with actual temporal elements (time-on-page thresholds, etc.).
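Here’s a minimal sketch of that classification. The 80-second and 200-second thresholds come straight from the quoted embodiments; the function itself (and the “medium” bucket for the in-between range) is my own illustration:

```python
# A hypothetical sketch of the short/long click classification. Only the
# 80s and 200s thresholds are from the patent; the rest is my own framing.

def classify_click(seconds_on_page: float, returned_to_serp: bool) -> str:
    if not returned_to_serp or seconds_on_page >= 200:
        return "long"    # user likely found content of interest
    if seconds_on_page < 80:
        return "short"   # quick bounce back to the result set
    return "medium"      # the quoted embodiments leave this range undefined

print(classify_click(12, returned_to_serp=True))    # short
print(classify_click(240, returned_to_serp=True))   # long
print(classify_click(45, returned_to_serp=False))   # long
```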
They also get into click bias and session data (does the user tend to have a lot of long or short clicks within a given session?). But we’ll leave that for now.
What’s the point?
As far as using such methods to deal with search quality, they discuss some of the purposes behind such methodologies, including:
- to detect an article with an undesired attribute,
- to detect a manipulated article (such as spam),
- to detect a high-quality search result that should appear higher in search rankings,
- and to evaluate the quality and usefulness of a search algorithm.
And what exactly is a “manipulated article” you ask? It’s what many of us do for a living… LOL.
“A manipulated article comprises an article that has been manipulated. For example, a manipulated article may comprise an article that has been manipulated specifically to influence a search engine’s treatment of the article.”
Sound familiar? Yea, I thought it would…
The Web Spam Connection
In another potential use for these systems, they also talk about spam detection, stated as: “a click profile is developed for manipulated-article results, such as spam results”. I found this fairly interesting, as in my experience, patented approaches that serve multiple purposes are more likely to be implemented. Bear in mind: a patent is just that, a patent. It doesn’t mean that it’s been implemented.
They state some qualifiers, such as:
- spam results comprise results that are overly optimized for search engines
- the pages tend to score high in page-scoring algorithms, but they are seldom visited by navigational client users;
- they are linked to by guestbook spammers
- they can be readily identified by human evaluators.
Guestbooks? What year is this? Meh. Never mind… moving on. They also state:
“It has been observed that, for spam results, long duration clicks are only half as likely as for non-spam; and results with very short clicks are the most likely to be spam results. The duration for spam results is typically short, evidencing a mean staytime of 51.3 seconds. Also, the ratio of short to long clicks is twice as high as for non-spam results, and spam results evidence more multiple clicks than other results. Generally, examining a particular result is not as effective as examining results for a domain. These and other factors may be considered in determining a click profile for a class of spam.”
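As an illustration of how such a click profile might be operationalized, here’s a minimal sketch. To be clear, the function and the hard thresholds are my own assumptions loosely derived from the quoted figures, not a documented Google scoring rule:

```python
# A hypothetical sketch of flagging a spam-like click profile. The quoted
# observations (~51.3s mean staytime, a short-to-long click ratio twice the
# non-spam level) inspire the checks, but these thresholds are assumptions.

def looks_like_spam_profile(mean_staytime_seconds: float,
                            short_to_long_ratio: float,
                            non_spam_baseline_ratio: float) -> bool:
    short_staytime = mean_staytime_seconds < 60.0
    ratio_doubled = short_to_long_ratio >= 2.0 * non_spam_baseline_ratio
    return short_staytime and ratio_doubled

# e.g. examining a domain (as the patent suggests) whose visitors bounce
# quickly and short-click twice as often as the non-spam baseline:
print(looks_like_spam_profile(mean_staytime_seconds=48.0,
                              short_to_long_ratio=3.2,
                              non_spam_baseline_ratio=1.4))  # True
```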
And hey, even if you’re not a spammer, you might want to look at some of your engagement data to ensure you don’t fit that profile.
The Takeaway
Ultimately, this patent (and the ones shared by Bill) lends more credence to the historic statements that have been made by various Googlers. But interestingly, we can also glean some other elements of search quality if we read between the lines. You can certainly find some nuggets in here that might change how you think about optimizing (or over-optimizing?) the pages you’re working on, and about engagement data in general.
It’s not often that I come across patents that give me pause to think about how I go about what I am doing… and this one certainly did that.
I sure hope you didn’t TL;DR this, as it is certainly quite enlightening…
Keep it geeky out there my friends and I’ll see you again soon…