How Google and Microsoft taught search to "understand" the Web
Despite the massive amounts of computing power dedicated by search engine companies to crawling and indexing trillions of documents on the Internet, search engines still can't do what nearly any human can: tell the difference between a star, a 1970s TV show, and a Turkish alternative rock band.
That’s because Web indexing has been based on the bare words found on webpages, not on what they mean. Since the beginning, search engines have essentially matched strings of text, says Shashi Thakur, a technical lead for Google’s search team. “When you try to match strings, you don't get a sense of what those strings mean. We should have a connection to real-world knowledge of things and their properties and connections to other things.” Making those connections is the reason for recent major changes within the search engines at Microsoft and Google. Microsoft’s Satori and Google’s Knowledge Graph both extract data from the unstructured information on webpages to create a structured database of the “nouns” of the Internet: people, places, things, and the relationships between them all.
How Google and Microsoft taught search to "understand" the Web