
> IN THE LAST WEEK OF APRIL, nearly 23 percent of all traffic to news sites tracked by web analytics firm Parse.ly came from search engines. Google alone accounts for nearly half of external referral traffic...

Is this surprisingly low to anyone else?

Depending on how you parse it, only 10% or 20% of news sites' traffic comes from Google.

When I worked in comparison shopping, 80-90% of our inbound traffic came from Google, as we failed and failed again to cultivate any loyal, direct users.



Hey, CTO of Parse.ly here. Might be surprising, but news sites get their traffic from 5 broad categories: (1) search engines (mostly Google in US); (2) social networks (Facebook, LinkedIn, Twitter, Pinterest, etc.); (3) editorial & recirculation (homepage promotion, article-to-article promotion); (4) direct, text, & email, which covers things like WhatsApp and manually shared email links; and (5) the long tail ("other"), which covers things like Google News, Flipboard, blogs, and site-to-site links.

These 5 categories are roughly equally split in aggregate traffic -- somewhere between 15-25% per category. You're right that certain kinds of sites, like e-commerce, are heavily weighted toward search -- but this is not broadly or necessarily true for the whole "content universe" of news, information, & entertainment sites, including blogs and so on.
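To make the five buckets concrete, here is a minimal Python sketch of how a referrer URL might be sorted into them. The domain lists, the `classify_referrer` and `_matches` names, and the choice to count Google News and Flipboard as "other" before checking search are illustrative assumptions drawn from the description above, not Parse.ly's actual classification rules.

```python
from typing import Optional
from urllib.parse import urlparse

# Illustrative domain lists only; a real classifier would need far larger,
# regularly updated lists (e.g. country-specific Google domains).
SEARCH_DOMAINS = {"google.com", "bing.com", "duckduckgo.com", "yahoo.com"}
SOCIAL_DOMAINS = {"facebook.com", "linkedin.com", "twitter.com", "t.co", "pinterest.com"}
# Aggregators assumed to belong to the long tail ("other") rather than search.
OTHER_HOSTS = {"news.google.com", "flipboard.com"}


def _matches(host: str, domains: set) -> bool:
    """True if host equals one of the domains or is a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in domains)


def classify_referrer(referrer: Optional[str], site_domain: str) -> str:
    """Sort a page view's referrer URL into one of five broad traffic buckets."""
    if not referrer:
        # No referrer: typed URLs, bookmarks, and many links opened from email or chat apps.
        return "direct"
    host = (urlparse(referrer).hostname or "").removeprefix("www.")
    if _matches(host, {site_domain}):
        return "internal"  # homepage promotion and article-to-article links
    if _matches(host, OTHER_HOSTS):
        return "other"  # checked before search so news.google.com is not counted as search
    if _matches(host, SEARCH_DOMAINS):
        return "search"
    if _matches(host, SOCIAL_DOMAINS):
        return "social"
    return "other"  # blogs, site-to-site links, and everything else
```

Counting the returned labels over a day of page views would give the kind of per-category split described above; the ordering of the checks matters because `news.google.com` would otherwise match `google.com`.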

Our data reveal all sorts of interesting patterns that go against mainstream assumptions about how people read/watch content online. For example, a measly 1% of traffic to content publishers comes from Twitter, even though Twitter certainly seems like it drives way more than 1% of the conversation, especially in certain categories of content. I wrote about that phenomenon here:

https://twitter.com/amontalenti/status/1126913440746962944

If you care to go deeper, one of our data analysts, Kelsey, did a nice deep dive on the different kinds of traffic sources that resonate with different content categories here:

https://blog.parse.ly/post/8329/the-14-top-referrer-sources-...


Thanks for posting here, pixelmonkey! I had a question: I asked the Parse.ly help/marketing people a while ago whether you were tracking the rates of invalid traffic on articles, and they said you don't track it. Is that accurate? Or are there any estimates of how much noise/dirt there is in the data?


Hmm, that's an interesting question, but I'm not sure I fully understand it. By invalid traffic, do you just mean, non-human (bot) traffic?

If so, I can say that over time, we have improved our use of bot lists, though that's just an IP blocking thing. Non-human traffic detection is not presently a strong focus of the company, though people have asked us to invest there. The issue is that non-human traffic detection is a somewhat gnarly problem in its own right, with its own vendors (mostly cybersecurity vendors) trying to figure that problem out.
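As a rough illustration of what "just an IP blocking thing" means, here is a minimal Python sketch that drops page views whose client IP falls inside a listed network. The ranges below are reserved documentation ranges used as placeholders, and `is_listed_bot` / `drop_listed_bots` are hypothetical names; a real list would come from crawler operators or a bot-detection vendor.

```python
import ipaddress
from typing import Iterable, List

# Placeholder bot list built from reserved documentation ranges (not real bots).
BOT_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_listed_bot(ip: str, networks=BOT_NETWORKS) -> bool:
    """True if the address falls inside any listed network (IPv4 or IPv6)."""
    addr = ipaddress.ip_address(ip)
    # Membership across IP versions is simply False, so a mixed list is safe.
    return any(addr in net for net in networks)


def drop_listed_bots(events: Iterable[dict]) -> List[dict]:
    """Keep only page-view events whose client IP is not on the bot list."""
    return [e for e in events if not is_listed_bot(e["ip"])]
```

Anything outside the listed ranges passes straight through, which is why a list like this catches self-declared crawlers but not bots running from residential IPs or human click farms.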

We do know we are missing some traffic due to ad/analytics blockers and pi-hole style VPNs, which is fine.

One way we have thought about guesstimating "noise/dirt" in the data is to use one of our universally measured metrics, engaged time. We could separate really short page sessions from the rest, under the assumption that if a page session is super short, it's either a mindless human click or a JavaScript-enabled bot crawl. I discussed this on our blog a while back when we did a data study on the subject:

https://blog.parse.ly/post/6509/replacing-bounce-rate-with-e...

In that study, we found that 32% of visits to pages were "bad visits" (page session <15s), a pretty high number, but that would include not just bots, but also humans queuing up tabs, Instapaper/Pocket saves, and so on.
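The heuristic is simple enough to sketch. Below is a minimal Python version, assuming each page session carries a single engaged-time value in seconds; the `PageSession` type and function names are hypothetical, and only the 15-second cutoff comes from the study mentioned above.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Cutoff from the study described above: sessions under 15 s count as "bad visits".
BAD_VISIT_THRESHOLD_S = 15.0


@dataclass
class PageSession:
    url: str
    engaged_seconds: float  # assumed to be active engaged time for the page session


def split_sessions(
    sessions: List[PageSession], threshold: float = BAD_VISIT_THRESHOLD_S
) -> Tuple[List[PageSession], List[PageSession]]:
    """Separate short ("bad") visits from the rest; exactly 15 s counts as good."""
    bad = [s for s in sessions if s.engaged_seconds < threshold]
    good = [s for s in sessions if s.engaged_seconds >= threshold]
    return bad, good


def bad_visit_share(
    sessions: List[PageSession], threshold: float = BAD_VISIT_THRESHOLD_S
) -> float:
    """Fraction of sessions below the threshold (0.0 for an empty list)."""
    if not sessions:
        return 0.0
    bad, _ = split_sessions(sessions, threshold)
    return len(bad) / len(sessions)
```

Note the limitation: the cutoff cannot tell a bot from a human opening a background tab, and a bot that deliberately accumulates engaged time would land on the "good" side.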


Apologies for the terminology: by invalid traffic I'm referring to bots as well as click farms and other issues, as used in the Media Rating Council's definition (they divide it into general and sophisticated invalid traffic, each of which covers many types of traffic, http://mediaratingcouncil.org/101515_IVT%20Addendum%20FINAL%...).

I'm just a bit concerned that the Russian malware dudes were re-purposing their click fraud for astroturfing way back in 2015 and they had no problem just sitting and building dwell time instead of bouncing (https://www.trustwave.com/en-us/resources/blogs/spiderlabs-b...). I haven't been able to find anything indicating that US media companies have any kind of tracking to defend against or even identify a similar strategy being used to hit their article analytics to influence article production/placement, especially when it's now known that a Russian information campaign against the US was going on at the time.


I'd love to have us do better here and you sound very knowledgeable on the subject. Willing to reach out to me by email? ~email redacted~


When comparison shopping all anyone really cares about is the price, and you're comparing like for like in most cases so Google works really well. News is something where I at least rarely find myself wanting just any old opinion on a story, so I'm more likely to go to a news source that I trust already for their take on that.


> When comparison shopping all anyone really cares about is the price

I disagree - people care about delivery times, quality of goods (is it what I ordered), and in some cases they are open to alternatives (I would like a cheap Android phone; if you can give me a Nokia instead of a Huawei, I don't care).

Similarly for news, people may have some preferences in their browsing, but ultimately for breaking news it doesn't matter whether it comes from CNN or the India Times if there's no opinion involved in it yet, and let's be honest, most short-form journalism has very little research/opinion.

For long form/blog-like content I agree with you though.


> When comparison shopping all anyone really cares about is the price

While I certainly also care about price: it's not the only thing I care about. I care about length of warranty, user-serviceability and/or ease of return for defect or repair, and reliability.

Why should I spend $800 on something that the manufacturer only warrants is useful for 30 days? Why should I spend $80 on something that I can't repair using my own tools? Why should I spend $8 on something that breaks within a week and it's more expensive to return than buy again?

I take pride in the things I've acquired. I provide care and maintenance to them. I think only the poorest, least savvy, and/or most lavish people worry only about price.

> and you're comparing like for like in most cases so Google works really well.

I've found that most "marketplaces" provide really poor experiences for customers like me. Amazon is right at the top of the poor experiences. Google Shopping comes next in line. Even something like Newegg will frequently have some pretty iffy deals going on.

It makes you (me) really wonder about other "marketplaces" such as the ACA.


Not really. I would bet that Twitter, Reddit, Facebook, and links from other news-like content generate the lion's share of traffic.


For news I go to a small set of specific sites, combined with links from a number of social media sites like Reddit and HN for wider coverage.

I think it's very different than comparison shopping - for news what someone you trust considers important and what communities you care about matters.

I usually only look at Google if I want to dive into more sources about a specific news item, which is fairly rare, and usually indicates I have reason to mistrust that something is covered properly by my usual sources, or if it's something that for other reasons will not be covered by my usual sources (e.g. let's say some local news item in a location where I don't know what the trusted local media is).

For comparison shopping on the other hand, I want to find who can sell me something cheapest - if your site shows up in Google, then I don't have a reason to go to you directly vs. going to Google and getting others too.


Who needs to go to the news site anymore when Google already shows you the content? I bet they are losing visits from that, with the remaining news readers most likely bookmarking their favourite/trusted sites.


Because headlines are _not_ a substitute for quality journalism and despite falling ad revenue, people will still seek it out to some extent. It's why the EU link tax is a terrible idea - when Google News pulled out of Spain it damaged the online news industry badly[0] with 6-14% drops.

[0]https://www.zdnet.com/article/the-google-news-effect-spain-r...
