Searching for what doesn’t exist…
Posted on June 27, 2006
Filed Under search, user-generated content, web 2.0, yahoo
As an industry, we’ve made a ton of progress in search over the last several years. Yet there is a subtle but profound limitation to “web search” as currently realized: search engines can only return results that… well… you know… exist.
At a glance this doesn’t seem to be much of a hindrance. It’s obvious, expected, rational. I’ve heard a most excellent and engaging spiel from Google’s Craig Silverstein that acknowledges that their current search index captures only a fraction of the information that’s “out there.” The punchline of Craig’s talk was that they’d only indexed a tiny fraction of what’s possible - hence the efforts to digitize, crawl the “dark web”, extend to other media types, etc. The spirit of the talk was indeed inspirational, in the vein of “we’re just getting started…”
But the very comment that we’re only x% “done” implies that there is some finite body of knowledge out there, and if we could only digitize faster, crawl harder, buy more servers, etc. then we’d be able to improve that percentage and ultimately get “all” that information into the index (and presumably sleep well at night again.)
Noble as this goal may be, if you pause to think about it, it’s obvious (to me anyway) that humankind’s “potential knowledge” is greater than our “realized knowledge” to date. This is admittedly “cosmic” or metaphysical, but I mean this in a practical sense as well. Barring apocalyptic scenarios, there are more web pages yet to be written than have already been written. (For the sake of discussion, let’s use “web page” as proxy for discrete knowledge element while confessing that we’ve already moved beyond the “page” as a paradigm.)
Where am I going with this? Perhaps not surprisingly, Yahoo! Answers.
Some of the magic of Yahoo! Answers is revealed through examining its provenance. The category of knowledge search sprang up in Korea. Korea has what is arguably the world’s most sophisticated online population… but they are disadvantaged by the lack of Korean-language documents (relative to English-language ones). It didn’t matter how hard we crawled or how much attention we put on ranking and relevance: if the document itself did not exist, then web search wasn’t going to find it, rank it, or present it.
Y! Answers turns the current search paradigm on its head. Rather than the current industry search paradigm (connecting the average 2.4 keywords to some extant “web page” out there), Y! Answers attempts to distill knowledge out of the very ether… Actually, “ether” is a rather inappropriate term, as Y! Answers attempts to distill knowledge from a very real asset: Yahoo!’s pool of half a billion monthly users. It turns this audience into the world’s most liquid knowledge marketplace.
(This also reminds me a bit of PubSub’s spiel about “prospective” vs. “retrospective” search. The premise here is that PubSub could “search the future.” What’s different about Y! Answers is that PubSub had a relatively passive relationship to the knowledge itself: “We’ll tell you when…” Y! Answers actually has the reach, platform and mechanism to invoke the knowledge versus passively monitoring it. Moreover, it evokes it in a “lazy migration”, generating knowledge precisely in response to demand for that knowledge.)
It’s fun and illuminating to think about all of the knowledge that doesn’t yet exist on a web page. Trust me, there’s lots. One obvious category is what might be referred to as “colloquial” knowledge, e.g. the shortcut to my house that the online mapping services always seem to get wrong. Or “Where’s a good place to get authentic matzah ball soup in Times Sq. at noon where I won’t have to wait in line?” The kind of stuff my mother and father know from a collective 142 years on the planet… but alas, they’ve never authored a web page (let alone written a book, made a movie, etc.) so the only beneficiaries of their wisdom to date have been their immediate friends and family. (Tom Coates will rap my knuckles for invoking the dreaded “parents as naive users” meme…)
Yahoo! Answers serves many, many more purposes than just colloquial knowledge, however. It’s fascinating to spend time in there… it’s an incredibly revealing lens into the multitude of categories underserved by web search today. While the original motivation for knowledge search might be attributed to “lack of Korean language documents,” the success of the product worldwide indicates that this was just the tip of the iceberg… there is something more substantial, subtle, and universal going on: knowledge yet to exist > knowledge that exists. I find something incredibly uplifting and optimistic about this.
And with a push of the “Publish” button, yet another web page springs into existence. This one unasked for, but hopefully useful all the same.
Ps.
Tempted to title this post, “I still haven’t found what I’m looking for…” but reconsidered…
Comments
12 Responses to “Searching for what doesn’t exist…”
I agree that the paradigm of connecting search term(s) X to result Y needs to be extended. It would be nice if Yahoo or Google tried to collate info across webpages to answer one question.
That is to say that there is no single web page out there for my search phrase “alpha gamma” but webpage1 ties in alpha and beta really well and webpage2 ties in beta and gamma somewhat well. Therefore while there is no single result, Yahoo comes up with a top result that is a combination of multiple URLs or extracts info from each URL and comes up with the relevant answers in a small text section.
Why must all search results point to discrete URLs?
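The “alpha gamma” scenario in this comment can be pictured with a minimal sketch, assuming each page is reduced to the set of terms it covers. The page names, term sets and the `bridged_results` helper are hypothetical and made up for illustration; this is not how Yahoo or Google actually work.

```python
from itertools import combinations

# Hypothetical toy index: page name -> terms that page covers.
PAGES = {
    "webpage1": {"alpha", "beta"},
    "webpage2": {"beta", "gamma"},
    "webpage3": {"delta"},
}

def bridged_results(query_terms, pages):
    """Return single pages covering every query term; if there are none,
    return pairs of pages that together cover the query and share a term."""
    query = set(query_terms)
    single = sorted(name for name, terms in pages.items() if query <= terms)
    if single:
        return [(name,) for name in single]
    pairs = []
    for a, b in combinations(sorted(pages), 2):
        covers_query = query <= (pages[a] | pages[b])
        shared_terms = pages[a] & pages[b]  # the "beta" that links the two
        if covers_query and shared_terms:
            pairs.append((a, b))
    return pairs

print(bridged_results(["alpha", "gamma"], PAGES))
```

Here no single page mentions both terms, so the result is the pair of pages linked through the shared term “beta”. A real system would also have to rank the combinations and extract the relevant text, which this sketch does not attempt.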
Excellent post, Bradley. Yahoo! Answers is not only addictive but also an extremely useful tool for finding information that refuses to exist!
[…] Relating back to the previous post, I recall soon after Flickr joined Yahoo asking Heather if there was a way I could solicit more photos of Westbeth. (A building in NYC I’m fond of…) She said, “Sure! I can make that happen for you!” But Heather, being the community manager of Flickr, had the means to rally the troops toward any cause… But I said, “No. I’m not interested in how you would do it… I’m interested in how one would do it…” And she suggested finding a relevant group (in this case maybe this one) and just sending up a “Would someone go take a picture of Westbeth for me?” flare. […]
One angle you didn’t explore is the difference between objective and subjective answers. Web search and Q&A services do a good job at answering objective questions but we have to get much, much better at personalization/behavioral targeting to have any hope of decently answering subjective questions. To do that we have to see through the eye of the beholder, not the answerer.
On a separate note, are you still seeing an average of 2.4 keywords per search, or is that figure dated (e.g. 2004 data)?
Agree with this point. Our services like del.icio.us are headed in that direction.
The 2.4 keywords is most certainly stale. I don’t have new data, however.
If information retrieval moves past search (which, as has been pointed out in the comments above, is based on the answer/end point) to find (which is based on the question/start point), then many of the problems of search go away.
For example, if someone wanted to find everything related to something known, the results would be displayed and organised differently in find. Say one wanted to know everything within two relationships of the words “alpha” and “gamma”: the first relationship level results would be for sites containing alpha, gamma or both. At the second relationship level the results would broaden to include the Greek alphabet, beta, new software in alpha release, etc. So in find the user controls the scope of the results and may select a result either as an end point or a new start point/centre.
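The relationship levels described above can be read as a breadth-first walk over a graph of related terms. What follows is a minimal sketch under that assumption; the `RELATED` graph and the `within_levels` helper are hypothetical illustrations, not part of any real search product.

```python
from collections import deque

# Hypothetical relationship graph: term -> directly related terms.
RELATED = {
    "alpha": ["beta", "Greek alphabet", "alpha release"],
    "gamma": ["beta", "Greek alphabet"],
    "beta": ["alpha", "gamma"],
    "Greek alphabet": ["alpha", "gamma"],
    "alpha release": ["alpha", "new software"],
}

def within_levels(starts, graph, max_level):
    """Breadth-first walk mapping each reachable term to its relationship
    level, where the start terms themselves sit at level 1."""
    level = {term: 1 for term in starts}
    queue = deque(starts)
    while queue:
        term = queue.popleft()
        if level[term] == max_level:
            continue  # the user-chosen scope stops the walk here
        for neighbour in graph.get(term, []):
            if neighbour not in level:
                level[neighbour] = level[term] + 1
                queue.append(neighbour)
    return level

print(within_levels(["alpha", "gamma"], RELATED, 2))
# Picking a result as a new start point simply restarts the walk from it.
print(within_levels(["alpha release"], RELATED, 2))
```

The `max_level` argument stands in for the user controlling the scope of the results, and calling the function again from a chosen result models using it as a new start point.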
The major obstacle to this is the fact that search controls users through a process based approach - of course 40 years of EDP has drilled this approach into “technical” people.
To overcome the search process, find uses factorial level relationships (e.g. a practical need for a number bigger than a googolplex), allowing people control of their start point, intermediate points and end points rather than being controlled. It makes no attempt to restrict the data sources (e.g. all the data that the Yahoos and Googles can’t index) or the results, as each person is different in the way they search, what they know and where they start from. This would allow two people to come to the same answer via two different lines of thinking – e.g. the first excludes irrelevant items and the second includes relevant items, catering for the fact that no two people think the same or follow the same process.
Thus find is based on the presumption that most people know about things around what they are looking for even if they don’t know the exact keywords contained in the result (if they knew what they were looking for they would have already found it). This means that with find the queries are not restricted to specific terms like search but rather vague things and semantic “outside” of keywords that progressively reduce the number of relationships from factorial to ever smaller numbers until one is close enough to pick what is wanted.
At this point it may not be obvious that find is based on semantics and the semantic relationships required must both be virtual - e.g. every user dynamically creates and constantly changes their own set of relationships as they define what they want to find; and automated - as there cannot be a reliance on people to consistently and unbiasedly create the voluminous quantities of relationships required. This also means that the process doesn’t have to change to incorporate new data, data types, etc - just more relationships.
The big questions are: is IT willing to move beyond EDP, and when it does, who will marginalise search with find?
I doubt today’s search companies can make the transition away from process. Some may attempt find processes which may provide slightly better results than search processes, but not the levels of control, sophistication and depth that are being increasingly demanded and that non-process find can deliver.
Remember, from a user’s perspective many don’t want to search, they want to find – especially as the volume of information grows.
Excellent post about Q&A search, but I find it a bit strange that you neglected to mention Naver in Korea (although you did mention Korea) in your post. AFAIK they are the ones who pioneered knowledge search and have the strongest foothold in Korea.
I do understand that Naver is a competitor… but mentioning competitors in a blog post never really hurts if your product is as good if not better. Kinda helps that your product is in a language most of your readers understand as well :-).
[…] I recently mentioned how traditional web search is generally retrospective or forensic, but Answers lets one search for knowledge which does not yet exist. Cool stuff, still blows my mind. […]
Yes! Naver definitely deserves huge credit for the great job they did in pioneering knowledge search.
I’m definitely not shy about crediting them publicly (did so at eTech where a Naver employee came by and thanked me afterwards…) In this post (which was already long by my standards anyway) the mention of Korea was more about the unique boundary conditions that made it fertile ground for knowledge search (v. a history.)
Naver Naver Naver!
[…] Searching for what doesn’t exist - Bradley Horowitz talks about answers.yahoo.com. Existing search engines point you to the best match on pages that exist. Answers is the first service that coaxes knowledge from the collective smarts of Yahoo’s pool of half a billion monthly users. […]
[…] As Yahoo’s Bradley Horowitz said, […]