My posts today on SecurityRatty inspired a bit more debate than I expected. A number of commenters asked: if someone still links back to my site, how can I consider it theft? What makes it different from other content aggregators?
This is actually a big problem on many of the sites where I contribute content. From TidBITS to industry news sites, skimmers scrape the content, and often present it as their own. Some, like Ratty, aren’t as bad since they still link back. Others I never even see since they skip the linking process. I’ve been in discussions with other bloggers, analysts, and journalists where we all struggle with this issue. The good news is most of it is little more than an annoyance; my popularity is high enough now that people who search for my content will hit me on Google long before any of these other sites. But it’s still annoying.
Here’s my take on theft vs. legal use:
- Per my Creative Commons license, I allow non-commercial use of my content if it’s attributed back to me. By “non-commercial” I mean you don’t directly profit from the content. A security vendor linking into my posts and commenting on them is totally fine, since they aren’t using the content directly to profit. Reposting every single post I put up, with full content (as Ratty does), and placing advertising around it, is a violation. I purposely don’t sell advertising on this site; the closest I come is something like the SANS affiliate program, which is a partner organization that I think offers value to my readers.
- Thieves take entire posts (attributed or not) and do not contribute their own content. They leech off others. By contrast, if someone produces a feed with my headlines, and maybe a couple-line summary, and then links into the original posts, I consider that legitimate.
- Related to the previous point, search engines and feed aggregators are fine since they don’t repurpose the entire content. Technorati, Google, and others help people find my content, but they don’t host it. To get the full content people need to visit my site, or subscribe to my feed. Yes, they sell advertising, but not on my full content, for which readers need to visit my site.
- In some cases I may authorize a full representation of my content/feed, but it’s *my* decision. I do this with the Security Bloggers Network since it expands my reach, I have full access to readership statistics, and it’s content I like to be associated with.
- Many people use large chunks of my content on their sites, but they attribute back and use my content as something to blog about, thus contributing to the collective dialog. Thieves just scrape, and don’t contribute.
- Thieves steal content even when asked to cease and desist. I know 2 other bloggers that asked Ratty to drop them and he didn’t. I know one that did get dropped on request, but I only found that out after I put up my post (and knew the other requests were ignored). I didn’t ask myself, based on reports from others that were ignored.
Thus thieves violate content licenses, take full content and not just snippets, ignore requests to stop, and don’t contribute to the community dialog/discussion. Attributed or not, it’s still theft (albeit slightly less evil than unattributed theft).
I’m not naive; I don’t expect the problem to ever go away. To be honest, if it does it means my content is no longer of value. But that doesn’t mean I don’t reserve the right to protect my content when I can. I’ve been posting nearly daily for 2 years, trying to put up a large volume of valuable content that helps people in their day-to-day jobs, not just comments on news stories. It’s one of the most difficult undertakings of my life, and even though I don’t directly generate revenue from advertising, I get both personal satisfaction and other business benefits from having readers on my site, or reading my feed. To be blunt, my words feed my family.
The content is free, but I own my words – they are not in the public domain.
Reader interactions
7 Replies to “Defining (Blog) Content Theft”
Like Rich, I also publish full content because that’s how I like to get it. Very few blogs that only post excerpts are worth subscribing to. I only click through to a blog if I’m interested in reading any comments or posting one anyway.
First, the good news. That site seems to have converted to snippets with links back to the original site. Hard to complain about that.
Now to some of the comments here. First, I put all my content into the RSS feed because that’s how I, as a consumer of information, like to get it. I rarely click through to full articles via my RSS reader; at the volume I consume, it just takes too much time.
I also fully recognize that once you put content out there, there is absolutely no way to protect it. It will be used and abused in every way imaginable. For the most part this works in my favor.
But it doesn’t mean I don’t have the right to get a little nasty myself towards the occasional abuser that ends up on my radar. In this case, I not only improved the situation for myself but for every other feed that was being pulled.
While I accept my content will be misused, I get really irked at all the SEO crap and blog spam taking advantage of average Internet users (who are more the victims than I am). Every now and then I have the urge to lash out, and in this case it made the world just a smidge better.
@Pepper:
First off, good link. Well worth the read.
Second, he retains copyright. Copyright is simply (very, very simply) the right to enforce your ownership of a particular type of intellectual property through the law. He can show he originated the content and that another party has used it. So he is within his rights to send a C&D or pursue a lawsuit to enforce his rights. A certain amount of that is necessary, depending on context, in order to keep his rights enforceable.
The question is at what point enforcing these rights is of maximal benefit based on the content provider’s purpose for publication. The differentiation in my logical construct was never binary; it was an examination of the spectrum between the primary and secondary goals. As an object example, look at my use of Wiki for the definition of theft. Despite the negative connotations of depending on Wikipedia as a source, I used it instead of the West’s Business Law sitting on the shelf next to my desk. Why? Because of my perception of the relative risk in using content from the given provider. Sometimes holding content too tightly makes it valueless in the Internet context.
And Rich can and should do whatever he thinks best in support of his interests. All I’m doing is commenting on them.
Why put all the content in a feed? I read about 90% of most blogs through my RSS reader, clicking through to about the last 10%. I habitually unsubscribe from blogs that don’t put full content in feeds, as the cost in time to read a given blog through a web interface is greater than the benefit received for doing so in most cases. So the calculation here is, which is of greater benefit to the content creator: 10% click through or 0% click through?
Let’s assume I am atypical user, representing 10% of the average blog’s readership and that readership follows the continuum from 100% RSS, 0% click through (e.g., using RSS only and not going to the web or being unsubscribed) to 0% RSS, 100% click through (e.g., reading only on the web interface and not using RSS). What are the benefits for the content creator for each of these classes of users? How does the limiting of RSS feed size affect the population? At what point do users switch to an alternate provider of similar content?
I’m asking the questions because I don’t have the answers, though I suspect someone has done this market analysis. The irony would be if someone did know and blogged or commented about the analysis results, wouldn’t they be stealing the original content?
Daniel Philpott
Um, no. Rich doesn’t have to choose only one goal (communications) and forgo all others, as you suggest. It’s also absurd to say that he retains copyright as a good, when discussing the lack of enforcement.
Rich is entirely entitled to say he wants to make no technical restrictions on the content, and expects people to restrict themselves to snippets and commentary (per fair use) instead of republishing. Declining to use copy protection techniques, which would impair human (RSS) readers, does not remove Rich’s right to point out this misbehavior, or to jab SR with humor if he feels like it.
Rich,
James Duncan Davidson has a long and rambling post where he ruminates on republishing. The comments there are interesting too.
@ Daniel, just to be clear… My repost of Rich’s content was intended as a joke. I also cleared it with Rich beforehand 😉