Google and AI Content: The Originator Signal as a Solution

Organic search is facing a tsunami of synthetic content. The hypothetical ‘originator signal’ could be a solution.

ChatGPT and other generative AI technologies are changing how content is created – faster than marketers can keep up. Exactly what that change means for SEO is still up for debate.

For the better part of the last decade, SEO has very much been a quality–quantity game: outputting as much content as possible while preserving a certain level of quality. The most successful brands found the right middle ground between the two for their respective search landscapes.

But generative AI means that virtually anyone can churn out huge volumes of reasonable-quality text (if we’re defining ‘reasonable’ as being 50–60% on a quality bell curve). The days of balancing quality and quantity are over, and there’s nothing preventing a tsunami of synthetic content from flooding the SERPs.

Many marketers have speculated about the impacts on organic search. Ryan Law at Animalz offers some of the most probable outcomes:

  • There will be hugely increased competition in organic search.
  • Articles will be longer.
  • Keyword targeting will be broader.
  • Programmatic SEO will be more common.
  • Off-page ranking factors will become more important.
  • Authorship will be emphasised.
  • Information gain will be prioritised.
  • Returns from search will drop.

Most of those predictions, particularly the ones around information gain and credibility, are broadly agreed upon by SEOs and content marketers. After all, if all AI content is being produced by the same models using the same data, the only real differentiator will be new data.

The Conundrum: Decreasing Returns on New Data

Let’s assume that this new AI world comes to pass. Let’s also assume that Google does prioritise information gain and authorial credibility. What happens when brands invest tens of thousands of dollars into original research … only to have it plagiarised and repurposed hours or days after publication?

With large language models (LLMs) like ChatGPT, brands could feasibly copy competitor research, rewrite it using AI, and possibly enrich it with plagiarised data from other sources to create one highly optimised article. That expensive original research will end up with a very short lifespan.

And what about accidental plagiarism? LLMs like Bard have live access to information from indexed pages, which means that original concepts or data could be published and then regurgitated into AI-generated articles without the prompter realising. (While Bard’s reliability is currently undercut by the sometimes-severe hallucinations it suffers, we can assume that future iterations will be able to accurately reproduce indexed information.)

If we get to that point, the return on original research will be so low that many brands will give up on organic search as a channel altogether.

A Hypothetical Solution: The ‘Originator’ Signal

But I don’t think Google and its competitors will let search get to that stage. I think that, for now, at least, search engines will remain the best way for people to find reliable information from sources they trust – and that means there need to be incentives for brands to stay and play.

Let’s step away from digital marketing for a moment and think about the rest of the world. How do we incentivise authors, musicians, and other artists to keep making original content? How do we stop copycats profiting from plagiarism and corrupting the heart of our creative industries? How do we reward innovation in commerce and encourage entrepreneurship? The answer is simple: IP law.

From automatic protections like copyright to more comprehensive measures like trade marks and patents, IP law means that innovations are legally safe from theft. In general, if you got there first, the credit and the benefits are yours.

So, is IP law the answer to AI infringements? I’m not a legal expert, but I don’t think so – at least, not anytime soon. After all, you can’t legally protect ideas, although some jurisdictions may attempt to regulate the datasets used to train AI models. Either way, legislation is notoriously slow to catch up, and AI is evolving at a breakneck pace.

But what if Google, Bing, and other search engines took the principle behind IP law – that the first creator of a work or a model has the right to it – and translated that into an ‘originator’ signal? Something that tied the first appearance of specific information in the SERPs to a specific website and, in doing so, gave that website an advantage for keywords related to that informational entity?

In many ways, this would act as a counter to both AI and idea theft at a general level. Smaller brands could publish content without worrying as much about search leaders swiping their data or concepts and out-competing them based on factors like backlinks, on-page optimisation, and topical authority.

The consequences would be significant for search engines too. We know that content quality has been an increasing concern for Google; the helpful content update was a clear step in moving towards more user-centric SERPs. Adding an originator signal into the mix would be a good incentive for brands to publish more original, value-adding content for search. That content, in turn, attracts more searchers, which gives brands a better search ROI. Better ROI leads to more investment in content, which leads to more value for searchers, and so on. It’s a virtuous cycle that benefits everyone.

Originator Signal vs. Information Gain

The hypothetical originator signal isn’t just information gain by another name. According to the patent filed by Google, information gain scores reward sites that provide new information relative to other sites that a user might encounter.

‘Some search engines will provide summary information from one or more responsive and/or relevant documents, in addition to or instead of links to responsive and/or relevant documents, in response to a user’s search query.

‘However, when a set of documents is identified that share a topic, many of the documents may include similar information […] Thus, although two documents that share a topic may be relevant to the request or interest of the user, the user may have less interest in viewing a second document after already viewing the same or similar information in a first document or set of documents.’

So information gain is relative – a transient competitive advantage that only benefits sites as long as they aren’t copied by competitors. The originator signal, on the other hand, would be absolute. The first site to publish new information would be permanently linked to that information in the context of the relevant entity.

Potentially, there could even be some connection between topical authority and originator signals. Sites with a high level of originator signals relating to entities categorised under a single topic might be viewed as more authoritative in relation to that topic – in essence, they would be rewarded for being thought leaders.

Challenges With the Originator Signal

Technical implementation aside, the idea of an originator signal does have certain issues.

The most obvious is that there might be multiple originators. For larger concepts rather than specific pieces of information, there could be cases of co-development or parallel evolution. In these scenarios, it wouldn’t make sense to designate a single originator. Instead, the originator signal could be a score or a percentage, with each subsequent publisher of the same information/concept receiving a slightly lower score until the SERPs reach a ‘saturation point’ (10 publications? 20 publications?), after which new publishers of the same information would no longer receive an originator score.
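To make the sliding-score idea concrete, here is a minimal Python sketch. Everything in it is my own assumption rather than anything Google has described: the function name `originator_scores`, the linear decay, the `saturation_point` parameter, and the example domains are all hypothetical. It simply ranks sites by the date their version of a piece of information was first seen and gives each later publisher a slightly lower score until the saturation point is reached.

```python
from datetime import date


def originator_scores(first_seen, saturation_point=10):
    """Return a hypothetical originator score between 0 and 1 for each site.

    first_seen maps a site to the date its version of the information was
    first indexed. The earliest publisher scores 1.0, each later publisher
    scores slightly less (linear decay, an assumption), and any publisher
    at or beyond the saturation point scores 0.
    """
    # Order sites from earliest to latest publication.
    ordered = sorted(first_seen, key=lambda site: first_seen[site])
    scores = {}
    for position, site in enumerate(ordered):
        remaining = saturation_point - position
        scores[site] = max(remaining, 0) / saturation_point
    return scores


# Hypothetical example: three sites publishing the same data point.
example = {
    "original-research.example": date(2023, 5, 1),
    "fast-follower.example": date(2023, 5, 2),
    "late-rewrite.example": date(2023, 6, 15),
}
scores = originator_scores(example, saturation_point=4)
for site, score in scores.items():
    print(f"{site}: {score:.2f}")
```

This sketch gives sites that publish on the same date different scores, so it doesn't yet handle true co-development. A fuller version would probably let simultaneous publishers share a position, and might use a gentler decay curve than a straight line.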

The originator signal would also need to be weighted correctly. It would have to complement existing ranking factors (such as recency and information gain) so that being first to a topic doesn’t take precedence over having the most helpful content. In other words, like all of Google’s ranking factors, it would be a small piece of a big puzzle.

And what about cross-channel idea aggregation – when one party’s idea (published somewhere other than a website) is taken and republished by a second party’s site? That second party would receive the main benefits of the originator signal, which could result in commercial gain. While this might seem unfair, I think it’s ultimately a non-issue for two reasons.

Firstly, concepts/data that aren’t covered by IP protections can be exploited by non-originators anyway. There’s nothing stopping you from taking an idea you read on social media, talking about it on your website, and benefitting from it.

Secondly, from Google’s perspective, this could actually be a good thing. Rather than originators starting idea distribution with non-search channels – podcasts, social media, radio, and so on – the incentive to gain the top originator score would lead to information being first published on websites. Distribution via other channels would occur either simultaneously or at a later date. This prioritisation of search distribution would consolidate Google’s position as being the best source of information, potentially slowing or even reversing the trend towards dark social research over the last five years.

The biggest issue is possibly around retrospective application. Would the originator signal be applied to information that has previously been indexed by Google? Would this even be possible – or fair or accurate? This is veering into technical territory (in other words, away from what I feel comfortable speculating about), but it’s definitely a consideration.

Summary

I can’t speak to the technical feasibility of an originator signal. But I do believe that Google and other search engines will have to take radical steps to dampen the impact of synthetic content – or risk search as a category becoming significantly devalued.

An originator signal would be a powerful counter to AI-generated text and, at the same time, would address the chronic quality issues that have been plaguing organic search. Innovative, fast-moving brands would gain an advantage, leading to increased competition, richer search results, and a significantly better experience for consumers.

Regardless of whether something like the originator signal becomes a reality, the SEO industry is at the frontier of an incredibly interesting time. Navigating it will require a first-principles approach – focusing on engaging content that adds unique value for users. So who knows? Generative AI might actually prove to be the catalyst for a reinvigorated, more rewarding world of search.

Changelogs provide transparency into when and why we make changes to certain articles. We do not log minor stylistic changes or grammatical fixes.

15 May 2023

Two paragraphs updated to reflect the 10 May release of Bard.

By Duncan Croker

Duncan is a copywriter with a background in editing and storytelling. He loves collaborating with brands big and small, and thrives on the challenges of hard marketing.