Evri took the wraps off their site today, and so far it looks good. Their angle seems to be establishing connections between people, products, places, and things (what the business calls named entities) and creating a browsable experience. Compared to others in that market, they are doing a nice job. It’s something that Silobreaker has been doing for a fairly long time, without significant traction. Silobreaker has a broader range of analytics but has neither nice widgets nor entity disambiguation, and Evri’s chance at success seems better for several reasons that are detailed below.
A good starting point for Evri is their profile pages. Start by looking at the Mike Walker page. That page lists articles for any Mike Walker, and offers you a selection of several individuals with that name. Five “Mike Walker” pages are listed, but from their person finder menu I see the following six:
Mike Walker, Author and Journalist
Mike Walker, Coach and Soccer Player
Mike Walker, Football Player
Mike Walker, Musician
Mike Walker, Playwright
Mike Walker, Football Player
These map directly to Wikipedia entries, with Wikipedia’s parenthetical notations converted to a more naturally readable form. There are two “Football Player” entries because there were two football players with that name. One is listed as “Mike Walker (American football)”, the other “Mike Walker (Canadian football)”. The last one has no traffic.
Mike Walker the playwright is a very interesting case. One of the links is to this Wikipedia page, which mentions Mike Walker without linking to him, and has the word “play” in the vicinity of his name. Also on the playwright’s page is a link to this news article. It clearly should have been linked to the football player, but you’ll notice that the word “play” appears fairly close to Mike Walker’s name. The language of the document as a whole, however, is easily distinguished as a football piece and not one on plays or playwrights, except perhaps for a play about football.
So here’s what I think they’re doing with topics. They scrape Wikipedia for a bunch of people, and fit them into categories, perhaps a few hundred: football players, soccer players and coaches, musicians, playwrights, and so on. For example, Joe Biden is listed as “lawyer and U.S. politician”, right out of the sidebar on his Wikipedia entry. The playwright is filed under “Mike Walker (radio dramatist)”, but is assigned the category “British dramatists and playwrights”. So you pigeonhole him in with all of the other playwrights. This is a decent amount of manual work, but far more tractable than trying to do it all manually, and far more accurate than doing it all algorithmically. In the case of playwrights, soccer players, and so on, you define some words that will tend to isolate the type of person. For playwrights, the word “play”. Score each Mike Walker entry based on the words in the vicinity of the name, and the words attached to the category of each candidate. You get fairly good precision deciding between entities. But sometimes word sense will mess you up, like the word “play” in the football article resulting in an assignment to the playwright. Still, I think it’s a very nice solution.
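To make that guess concrete, here is a minimal sketch of the kind of scoring I have in mind. This is my speculation, not Evri’s actual method: the cue-word lists, the window size, and the function names are all made up for illustration.

```python
import re

# Hypothetical cue words per category (assumed for illustration,
# not Evri's real lists).
CATEGORY_CUES = {
    "playwright": {"play", "drama", "theatre", "stage"},
    "football player": {"football", "quarterback", "touchdown", "tackle"},
    "musician": {"album", "band", "guitar", "tour"},
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def score_candidates(text, name, window=10):
    """Count category cue words within `window` tokens of each name mention."""
    tokens = tokenize(text)
    name_tokens = tokenize(name)
    n = len(name_tokens)
    scores = {category: 0 for category in CATEGORY_CUES}
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] != name_tokens:
            continue
        # Tokens just before and just after this mention of the name.
        context = tokens[max(0, i - window):i] + tokens[i + n:i + n + window]
        for category, cues in CATEGORY_CUES.items():
            scores[category] += sum(1 for token in context if token in cues)
    return scores

def disambiguate(text, name, window=10):
    """Return the best-scoring category, or None if no cue word was seen."""
    scores = score_candidates(text, name, window)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

# The word-sense problem: a football sentence lands on the playwright.
print(disambiguate("Mike Walker made a big play late in the game", "Mike Walker"))
```

With these made-up cue lists the last line picks the playwright, because “play” is the only cue word near the name. Scoring against the whole document rather than a small window would probably catch cases like this one.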
The images and video are somewhat lacking. You will notice that all of the Mike Walkers have exactly the same images and videos. In fact, Silobreaker has exactly the same videos as Evri, and they’re just about all unrelated and of poor quality. That goes for Evri’s images as well. That’s one of the problems with matching between data sets. I hear the Semantic Web will eradicate this type of problem at some point in the distant future and bring peace to all the peoples of the earth.
The related content widget is nice, but mostly a nice UI for what they are already doing to support the profile pages, so I won’t discuss it in too much detail. You stuff some text through your topic extraction pipeline, and use it to get related content.
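As a rough sketch of that pipeline, under heavy assumptions: the index, the entity names in it, and the naive substring matcher below are placeholders I invented, standing in for whatever topic extraction Evri really runs.

```python
from collections import Counter

# Hypothetical index from entity name to article titles (placeholder data).
ARTICLES_BY_ENTITY = {
    "Joe Biden": ["article A", "article B"],
    "Wikipedia": ["article C"],
    "Silobreaker": ["article B"],
}

def extract_entities(text):
    """Naive stand-in for topic extraction: case-insensitive substring match."""
    lowered = text.lower()
    return [name for name in ARTICLES_BY_ENTITY if name.lower() in lowered]

def related_content(text, limit=5):
    """Rank articles by how many of the extracted entities they are filed under."""
    counts = Counter()
    for entity in extract_entities(text):
        counts.update(ARTICLES_BY_ENTITY[entity])
    return [title for title, _ in counts.most_common(limit)]

print(related_content("Joe Biden was mentioned on Silobreaker today"))
```

An article that shares more entities with the input text ranks higher, which is about the simplest notion of “related” you can get out of the profile-page machinery.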
Also, as a side note, I just want to say what an amazing thing Wikipedia is for getting sites like this off the ground. So many of the companies in this field have used it, and it’s great to see all of the human effort curating that site get leveraged across the industry.
– Sept 26 –
I realized subsequently that they enumerate what they call “taxonomical paths”, which are the categories referenced above that likely help with disambiguating entities. The list is available on their web site. Great transparency.
Posted by Ken Ellis