This morning the WSJ reported that at a private meeting of newspaper executives last week, one of the models they considered for extracting revenue online was ASCAP, the American Society of Composers, Authors, and Publishers. (Page A11, or online here). That’s great, but there are a few technical considerations, and you can do a lot better than ASCAP. So I thought I would weigh in with some concrete suggestions about how such a system might be implemented so that it is accurate and fair.
First, some background. I was the Director of Research at MediaSentry, where we provided music companies with intelligence about online piracy, attempted to interdict or frustrate would-be downloaders, and collected evidence for civil actions. So I know quite a bit about monitoring and protecting online content. Also, through a composer I’m acquainted with, I’ve heard a good deal about the drawbacks of ASCAP.
ASCAP pays composers according to a formula based on the amount of play-time their songs receive. That play-time includes venues such as bars, and just about anything else short of humming the song in the shower. On the other end of the cash flow, bars, radio stations, and television programs all pay into ASCAP based on estimates of the size of their audience. The tricky part is figuring out how much each composer gets paid. Your neighborhood bar doesn't report what songs it plays, so this requires a lot of sampling: listening to some radio stations, spending time in bars, or getting a few of them to self-report. The sampling is such that small composers tend to get nothing, because they fall below ASCAP's threshold of statistical significance.
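To see why sampling is hard on small contributors, here is a rough sketch. It assumes simple random sampling of individual plays, which is my simplification and not a description of ASCAP's actual method; the function name and the numbers are made up for illustration.

```python
def chance_of_zero_samples(share, sample_size):
    """Probability that a composer never shows up in the sample.

    share       -- the composer's fraction of all plays (0.0 to 1.0)
    sample_size -- number of plays sampled independently at random
    """
    return (1.0 - share) ** sample_size


# Hypothetical numbers: a composer with one play in a million,
# and a survey that samples 10,000 plays.
small = chance_of_zero_samples(1e-6, 10_000)
# A composer with 1% of all plays, same survey.
large = chance_of_zero_samples(0.01, 10_000)
print(f"small composer missed: {small:.3f}, large composer missed: {large:.3g}")
```

Under these assumptions the small composer is almost always missed entirely, while the large one is essentially never missed.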
How do you build a system that is accurate and fair? How would sampling work for online news, and how would you price membership for participants? You just can't get reliable pageview information from the outside. You could ask sites for it, but it would have to be taken on good faith, and that's no way to determine payments. You can't even get reliable traffic estimates for a site. Alexa and Compete.com aren't accurate enough, and won't distinguish between a site's own content and content it has taken from elsewhere. There's the Nielsen Ratings system, but read the criticisms on its Wikipedia entry before you seriously consider replicating its sampling method.
One practical system would be to put a callback on every article with an identifier for the site. That callback reports to the newspapers' version of ASCAP that the article has been viewed. It's basically the same as how Google Analytics works. It can be defeated client-side, by simply blocking the request, but there is no incentive for consumers to do so: it only affects payments between other parties, and does not interfere with their web experience. Sampling would not be required, as nearly complete information is available.
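A minimal sketch of that callback, from the collecting society's side, might look like the following. The endpoint URL, the parameter names (site, article, channel) and the class name are all hypothetical; a real deployment would embed the beacon URL in the page as a script or image request, much as analytics tags do.

```python
from collections import Counter
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical collector endpoint, not a real service.
COLLECTOR = "https://collector.example/view"


def beacon_url(site_id, article_id, channel="web"):
    """Build the callback URL a page would request when an article is viewed."""
    query = urlencode({"site": site_id, "article": article_id, "channel": channel})
    return f"{COLLECTOR}?{query}"


class ViewCollector:
    """Tallies one view per callback, keyed by (site, article, channel)."""

    def __init__(self):
        self.views = Counter()

    def record(self, url):
        params = parse_qs(urlparse(url).query)
        key = (params["site"][0], params["article"][0], params["channel"][0])
        self.views[key] += 1


collector = ViewCollector()
url = beacon_url("example-times", "article-123")
collector.record(url)
collector.record(url)
print(collector.views)
```

The channel parameter is there so that views arriving by other routes can be counted separately from ordinary page views.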
There are a few loopholes. How would you count full-text RSS feeds? Triggering the same callback for every feed request would overcount, and there is no mechanism for triggering callbacks in the reader, since there are many different types of RSS consumers. Here I think you would make a different callback and discount RSS ingestion by some agreed-upon factor, say counting each feed request as 10% of a view. One would need to run a study to determine, given a certain number of RSS feed requests, how many result in someone reading or browsing an article. What about other devices, such as the Amazon Kindle? Here again, you need a separate type of callback, with its own factor. APIs? Easy, just do a callback server-side whenever an article is fed out, though this is less easy to verify.
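The per-channel discounting could be as simple as a lookup table. In this sketch the 10% RSS factor is the figure suggested above; the Kindle and API factors are placeholders I made up, and the real values would have to come from the kind of study just described.

```python
# Weight applied to each raw callback, by delivery channel.
# Only the RSS value comes from the text; the others are placeholders.
CHANNEL_FACTORS = {
    "web": 1.0,     # ordinary page view, counted in full
    "rss": 0.10,    # full-text feed request, discounted
    "kindle": 0.5,  # placeholder, to be set by a study
    "api": 1.0,     # placeholder, server-side callback per article served
}


def effective_views(raw_counts, factors=CHANNEL_FACTORS):
    """Convert raw callback counts per channel into weighted views."""
    return sum(count * factors[channel] for channel, count in raw_counts.items())


# Hypothetical month for one article: 1,000 page views and 5,000 feed requests.
print(effective_views({"web": 1000, "rss": 5000}))
```

An unknown channel raises a KeyError here, which seems preferable to silently counting it at full weight.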
The newspaper version of ASCAP also needs verifiability. This is relatively easy. Sites can be crawled, and the presence of a client-side callback function verified. In the same way, you can look at the source of any web page and determine whether it is using Google Analytics: the JavaScript code is in plain view. For APIs and RSS feeds, it's not as easy to verify. However, one can do a simple statistical analysis using a method similar to how lock-in amplifiers extract signals in the presence of even very high noise (here, the other users). Make RSS or API calls yourself with some fixed modulation, at a level far below the site's overall usage. That it can be so far below is the magic of the lock-in amp. Then confirm the presence of the same signal in the callbacks you receive. It's relatively easy to implement, and the statistics are bomb-proof. Do it randomly, and relatively infrequently, to verify compliance. For full-text RSS and API methods of delivery, you'll need to warn participants to make the server-side callback within a second of the request, and not to cache that request. Yes, it's being a bit anal, but just the presence of a verification mechanism will keep everyone in line.
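Here is a small simulation of the lock-in idea. It assumes the auditor sends probe requests in a square-wave pattern (on for a few time bins, off for a few) and then correlates the per-bin callback counts against that pattern. The background traffic is simulated as Gaussian noise around a fixed mean, which is a convenient stand-in rather than a model of real traffic, and all function names and numbers are hypothetical.

```python
import random


def square_wave(n_bins, half_period):
    """Reference pattern: 1 while probing, 0 while idle."""
    return [1 if (i // half_period) % 2 == 0 else 0 for i in range(n_bins)]


def lock_in_estimate(observed, reference):
    """Estimate how many observed counts per bin follow the reference.

    This is the least-squares slope of observed against reference:
    anything uncorrelated with the reference averages away.
    """
    mean_ref = sum(reference) / len(reference)
    centered = [r - mean_ref for r in reference]
    numerator = sum(c * o for c, o in zip(centered, observed))
    denominator = sum(c * c for c in centered)
    return numerator / denominator


def simulate_callbacks(reference, probes_per_bin, background_mean,
                       reported_fraction, rng):
    """Callbacks per bin: noisy background plus whatever share of probes is reported."""
    counts = []
    for r in reference:
        background = rng.gauss(background_mean, background_mean ** 0.5)
        counts.append(background + reported_fraction * probes_per_bin * r)
    return counts


rng = random.Random(0)
reference = square_wave(n_bins=20_000, half_period=5)
# 10 probe requests per "on" bin against roughly 1,000 real requests per bin.
honest = simulate_callbacks(reference, 10, 1000, 1.0, rng)
silent = simulate_callbacks(reference, 10, 1000, 0.0, rng)
print("honest site:", round(lock_in_estimate(honest, reference), 2))
print("silent site:", round(lock_in_estimate(silent, reference), 2))
```

With these settings the noise on the estimate is roughly half a request per bin, so a site that reports its callbacks comes out near 10 and one that drops them comes out near 0, even though the probes are only about 1% of the traffic. More bins shrink the noise further, which is what lets the probe level stay so low.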
With all of that readership data in hand, you can both accurately bill participants in the system, and accurately pay content owners. It doesn’t have the sampling issues of ASCAP that tend to short-change smaller contributors. You could even bill based on actual usage, instead of signing a contract upfront and calculating fees with an obscure formula, as ASCAP does.
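As a sketch of what usage-based accounting could look like, assume payments are split pro rata by weighted views and participants are billed a flat rate per view. Neither rule is a settled design; the function names, owner names and amounts are hypothetical.

```python
def payouts(pool, views_by_owner):
    """Split a revenue pool among content owners in proportion to their views."""
    total = sum(views_by_owner.values())
    if total == 0:
        return {owner: 0.0 for owner in views_by_owner}
    return {owner: pool * views / total for owner, views in views_by_owner.items()}


def usage_bill(views_served, rate_per_view):
    """Bill a participant for the views it actually served."""
    return views_served * rate_per_view


# Hypothetical pool of 1,000 split between a large and a small publisher.
print(payouts(1000.0, {"big_daily": 300, "small_weekly": 100}))
```

Because every recorded view counts, the small publisher receives its proportional share rather than falling below a sampling threshold.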
In short, this system seems to have some promise, and might in fact work. It's accurate, easy to implement, and fair to all content owners and consumers.