Google InQuotes and Silobreaker Quote Attribution

September 25, 2008

This is something of a followup to yesterday’s post about Google’s InQuotes feature.  I was reading a post from Mark Forscher, who laments the lack of access through an API and the inability to plug in any person.  It occurred to me that at Daylife we opted to package the quotes somewhat differently because ours are served through an API.  You will note that Google displays a fairly large chunk of the excerpt, as in this one:

“At a time of crisis, when leadership is needed, Senator Obama has not provided it,” McCain said Sunday in a speech to the National Guard Association in Baltimore. “We saw the same lack of leadership on Iraq.”

Whereas Daylife would report this as:

“At a time of crisis, when leadership is needed, Senator Obama has not provided it… We saw the same lack of leadership on Iraq.”

A very long time ago, if you went to the Daylife web site, you would have seen something more like the former.  It requires more real estate to display, but it has a nice feature that I have sorely missed: it covers your butt if your attribution is inaccurate.  It also provides a bit more context, in this case telling you it was a speech to the National Guard Association in Baltimore.  You could also do an easy but not terribly accurate job of searching attributed quotes by indexing the quote and non-quote text separately, although both Daylife and Google opted for a more sophisticated approach.
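The "easy but not terribly accurate" approach can be sketched roughly as follows. This is a minimal illustration, not Daylife's or Google's method: it assumes quotes are delimited by double quotes, and the function names are hypothetical. Topic words must appear inside a quote and speaker words outside it.

```python
import re
from collections import defaultdict

# Assumption: quotes use curly or straight double quotes; single quotes are ignored.
QUOTE_RE = re.compile(r'[“"]([^”"]+)[”"]')
WORD_RE = re.compile(r"[a-z0-9']+")

def split_quoted(text):
    """Return (quoted_text, unquoted_text) for one passage."""
    quoted = " ".join(m.group(1) for m in QUOTE_RE.finditer(text))
    unquoted = QUOTE_RE.sub(" ", text)
    return quoted, unquoted

def tokenize(text):
    return WORD_RE.findall(text.lower())

def build_indexes(docs):
    """docs maps doc_id -> text; returns (quote_index, context_index), word -> doc_ids."""
    quote_index = defaultdict(set)
    context_index = defaultdict(set)
    for doc_id, text in docs.items():
        quoted, unquoted = split_quoted(text)
        for word in tokenize(quoted):
            quote_index[word].add(doc_id)
        for word in tokenize(unquoted):
            context_index[word].add(doc_id)
    return quote_index, context_index

def search_quotes(quote_index, context_index, topic=None, speaker=None):
    """Topic must occur inside a quote; speaker must occur outside one."""
    results = None
    if topic:
        results = set(quote_index.get(topic.lower(), set()))
    if speaker:
        hits = context_index.get(speaker.lower(), set())
        results = set(hits) if results is None else results & hits
    return results or set()
```

The inaccuracy comes from treating the whole passage as one unit: any name in the non-quote text counts as a "speaker", even when that person merely appears near a quote someone else said.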

You will note that in the Google application, if you look for quotes on “immigration” or “bush”, the words always appear within the quote, and never only in the non-quote text that they display.  That’s some nice attention to detail for a feature like this.  They’re indexing only the quote but displaying a broader swath of text, so presumably they index the quote and separately store the quote with its context, or perhaps index its position within the full article body.  Similarly, if you search for “Depp” to find quotes by Johnny Depp, you correctly get nothing, since his name appears only outside the quotes.  They’re also probably doing some simple sentence chunking to select better endpoints.  That, for those of you who haven’t tried it, involves more than just looking for periods, exclamation points, and question marks.
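Here is a rough sketch of "index the quote, display the context." It assumes you store the quote's character offsets within the article and, at display time, widen them to the enclosing sentence boundaries. The abbreviation list and function names are hypothetical, and a real chunker needs far more rules. It shows why periods alone are not enough: "Sen." and initials like "F." must not end a sentence.

```python
import re

# Hypothetical, deliberately tiny abbreviation list.
ABBREVIATIONS = {"mr.", "mrs.", "ms.", "dr.", "sen.", "rep.", "gov.", "gen.",
                 "st.", "jr.", "sr.", "u.s.", "u.k."}

# Candidate boundary: . ! or ?, optional closing quotes/brackets,
# then whitespace followed by an uppercase letter or an opening quote.
BOUNDARY_RE = re.compile(r'[.!?]["”’)\]]*(?=\s+[“"A-Z])')

def sentence_spans(text):
    """Split text into (start, end) sentence spans."""
    spans, start = [], 0
    for m in BOUNDARY_RE.finditer(text):
        # The whitespace-delimited token ending at the punctuation mark.
        token = text[start:m.start() + 1].split()[-1].lstrip('“"(').lower()
        if token in ABBREVIATIONS or re.fullmatch(r"[a-z]\.", token):
            continue  # "Sen. Obama" or "John F. Kennedy": not a sentence end
        spans.append((start, m.end()))
        start = m.end()
        while start < len(text) and text[start].isspace():
            start += 1
    if start < len(text):
        spans.append((start, len(text)))
    return spans

def context_for(text, quote_start, quote_end):
    """Widen a quote's offsets to the sentences that contain it."""
    spans = sentence_spans(text)
    first = max(s for s, _ in spans if s <= quote_start)
    last = min(e for _, e in spans if e >= quote_end)
    return text[first:last]
```

For a made-up passage such as `'Stocks fell again. “At a time of crisis, ...,” Sen. John McCain said in Baltimore. Aides did not elaborate.'`, only the quote's words would be indexed. `context_for` would return the middle sentence, attribution included, and would not stop at "Sen.".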

For the Daylife API, all you get is the quote, with ellipses joining fragments where appropriate.  That requires an algorithm for bridging quote fragments, which is not a trivial task for a quote-heavy document, and it complicates attribution somewhat.  I’m curious, though, how others feel about quote-only vs. quote-plus-context.  Certainly it depends on the application, but since the Daylife API is intended to support a wide array of applications, that might argue for supporting both.  On the other hand, one should avoid complicating an API needlessly.
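A naive version of fragment bridging might look like the sketch below. It is not Daylife's actual algorithm. It assumes curly double quotes and uses a hypothetical `max_gap` threshold: consecutive fragments separated by a short unquoted gap are merged with an ellipsis. It does not check that the gap introduces no new speaker, which is exactly where the harder attribution work lies in a quote-heavy document.

```python
import re

# Assumption: curly double quotes only.
QUOTE_RE = re.compile(r"“([^”]+)”")

def bridge_fragments(paragraph, max_gap=120):
    """Merge quoted fragments whose unquoted gap is at most max_gap characters.

    The earlier fragment loses its trailing punctuation (e.g. the comma
    before ” McCain said) and the pieces are joined with an ellipsis.
    """
    merged = []        # finished quotes
    current = None     # quote currently being built
    prev_end = 0       # offset just past the previous closing quote mark
    for m in QUOTE_RE.finditer(paragraph):
        fragment = m.group(1).strip()
        if current is not None and m.start() - prev_end <= max_gap:
            current = current.rstrip(" ,;:.") + "… " + fragment
        else:
            if current is not None:
                merged.append(current)
            current = fragment
        prev_end = m.end()
    if current is not None:
        merged.append(current)
    return [f"“{q}”" for q in merged]
```

Applied to the McCain excerpt above, this produces the single ellipsis-joined quote shown in the Daylife version. Fragments separated by more than `max_gap` characters stay separate quotes.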

Another company, Silobreaker, also does quote extraction and attribution.  Their quotes module for Barack Obama is shown below, and it has a few problems.  Too much of the surrounding text is included, and it isn’t selected as neatly as Google’s.  Two of the four quotes weren’t actually said by Obama.  They’ve changed the bolded quote block to double quotes, but elsewhere the documents use single quotes, which makes the original text confusing to follow.  Perhaps they didn’t intend these as quotes by Obama; indeed, the module says “Quotes for Barack Obama”.  He will be glad to hear of their support.  That, however, is not what the placement of the module suggests, nor what most users would expect.  Lowering expectations in such a way takes more than adding language like that; it involves the entire context of the presentation.  So I would put Silobreaker a distant third for quality of quote attribution.

Silobreaker.com quotes for Barack Obama
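One way to avoid crediting Obama with quotes that merely mention him is to look at the attribution clause next to each quote. The heuristic below is a hypothetical sketch, not how any of the services mentioned actually work. It checks only the text right after the closing quote for patterns like "McCain said" or "said Sen. Barack Obama". The verb list, window size and function names are assumptions. It ignores attributions placed before the quote and single-quoted text.

```python
import re

# Assumption: curly double quotes only.
QUOTE_RE = re.compile(r"“([^”]+)”")
# Hypothetical list of speech verbs; a real system would need many more.
SAID = r"(?:said|says|told|added|stated)"

def attributed_to(passage, quote_end, surname, window=80):
    """Heuristic: does the clause right after a quote name `surname` as speaker?

    Accepts '<Titles/Names> <Surname> said' or 'said <Titles/Names> <Surname>'.
    """
    tail = passage[quote_end:quote_end + window]
    tail = tail.split("“", 1)[0]          # don't read into the next quote
    name = re.escape(surname)
    titles = r"(?:[A-Z][\w.]*\s+)*"       # e.g. "Sen. Barack "
    pattern = rf"^\W*(?:{titles}{name}\s+{SAID}|{SAID}\s+{titles}{name})\b"
    return re.search(pattern, tail) is not None

def quotes_by(passage, surname, window=80):
    """Return the quoted fragments in `passage` attributed to `surname`."""
    return [m.group(1) for m in QUOTE_RE.finditer(passage)
            if attributed_to(passage, m.end(), surname, window)]
```

For the McCain excerpt, `quotes_by(..., "Obama")` finds nothing even though Obama is named inside the quote, because the attribution clause names McCain.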