Friday 14 December 2018

5 reasons why Enterprise Search will never be as good as Google

All the time we hear managers saying "we want a search engine as good as Google". Here are 5 reasons why you can never even get close.

Google is the yardstick for search, and managers seem to want internal enterprise search that works as well, and as (apparently) intuitively, as Google. But there are 5 good reasons why this will never happen (bearing in mind that I am by no means a search specialist).


1) Search engine optimisation - webpages want to be found

Do you have a website? If you do, you will be as familiar as I am with the deluge of spam emails offering to optimise your website for Google search. SEO (Search Engine Optimisation) is big business, and the owners of webpages are doing lots of work on Google's behalf to ensure their pages are indexable, findable and optimised for search.

But who, in an organisation, optimises their documents and sites for internal search? Let me tell you who - Nobody; that's who.  Unless you are very lucky, few if any people think about the issues of findability when they publish content.

Google is successful in finding sites because those sites want to be found. They are often very keen to be found, because they are trying to sell you something. The search results at the top of Google's list are often the ones most desperate to be found. Many documents in your enterprise system do not want to be found, often for reasons of confidentiality, as described below.

2) The web is interlinked HTML pages, whereas your content is usually isolated Word documents (if you're lucky!)

Sometimes it's not even Word documents - I know organisations that save their critical knowledge in PDF form!

The difference between interlinked web pages and isolated documents is critical. Google can crawl through the web of interlinked sites, can understand the context of a site partly through its links, and can identify authoritative or important sites based on the number of links that point to them. The search engine results at the top of the list are often the ones with the most backlinks. The components of the page are also obvious to Google - the title, the first level headings, the metadata - and these are also used to understand what the page is about.

Your documents are not linked. Each stands alone. Each has to be searched and indexed separately. There are no backlinks. There is no visible structure to the document, other than to the human eye, and the search engine cannot tell a footnote from a level 1 heading.
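To make the point about backlinks concrete, here is a minimal sketch that simply counts inbound links per page. It is not how Google actually ranks anything (the real algorithms are far more elaborate); the page names, file names and the `backlink_counts` function are all made up for illustration.

```python
from collections import Counter

def backlink_counts(links):
    """Count inbound links per page.

    `links` maps each page to the list of pages it links to
    (a hypothetical, hand-built link graph).
    """
    counts = Counter({page: 0 for page in links})
    for source, targets in links.items():
        for target in set(targets):      # count each source once per target
            if target != source:         # ignore self-links
                counts[target] += 1
    return counts

# A hypothetical mini-web: pages point at each other.
web = {
    "home": ["products", "about"],
    "products": ["home"],
    "about": ["home", "products"],
}

# A hypothetical document store: nothing links to anything.
documents = {"report_v3.docx": [], "minutes.pdf": [], "policy.doc": []}

web_rank = backlink_counts(web)
doc_rank = backlink_counts(documents)
print(web_rank.most_common())
print(doc_rank.most_common())
```

On the mini-web the link counts separate the pages; on the document store every count is zero, so a link-based signal has nothing to work with.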

3) The hordes of search engine specialists employed by Google.

How many search engine specialists do you employ? None, right? Google employs tens of thousands. That's one of the reasons their search works better than yours.

This is especially an issue if you are planning to use semantic search, or to optimise customer search of your knowledge base. In these cases you will need a search engine specialist to build and evolve the ontology, track and improve the search accuracy, and define the synonyms and stop words. However, managers often neglect this aspect, and assume a search engine is a one-off purchase that will run itself.
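As a rough illustration of what "defining the synonyms and stop words" involves, here is a minimal sketch of query expansion. The word lists and the `expand_query` function are invented for this example; a real search product has its own configuration format, and someone has to keep those lists up to date.

```python
# Hypothetical, hand-maintained lists. Keeping these current is ongoing work.
STOP_WORDS = {"the", "a", "an", "of", "for", "and", "to", "in"}
SYNONYMS = {
    "holiday": {"vacation", "leave"},
    "policy": {"procedure", "guideline"},
}

def expand_query(query):
    """Drop stop words, then attach known synonyms to each remaining term."""
    terms = [t for t in query.lower().split() if t not in STOP_WORDS]
    return [{term} | SYNONYMS.get(term, set()) for term in terms]

expanded = expand_query("The policy for holiday")
print(expanded)
```

Each set in the result is a group of interchangeable terms, so a document that mentions "vacation procedure" can still match a search for "holiday policy".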

4) Google doesn't do "security levels"

Google assumes everything is available and visible to everyone. It doesn't do passwords or access restrictions or security levels. It searches everything that is not on the Dark Web.

A lot of your documents are effectively on the Dark Web - they are in secure folders on Box, or Dropbox, or SharePoint. I consulted recently for an organisation that had 300 separate databases or document management systems. They had opened about 6 of these for indexing; the rest were effectively "dark" as far as search was concerned.
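Even when a secure repository is indexed, the search engine has to filter every result list against what the person searching is allowed to see (often called security trimming). Here is a minimal sketch of the idea; the hits, the group names and the `trim_results` function are all hypothetical, and real systems take permissions from the source repository rather than from a hand-written list.

```python
def trim_results(results, user_groups):
    """Keep only the hits that at least one of the user's groups may read."""
    return [hit for hit in results if hit["allowed_groups"] & user_groups]

# Hypothetical hits, each tagged with the groups allowed to read it.
hits = [
    {"title": "Canteen menu", "allowed_groups": {"all_staff"}},
    {"title": "Salary review", "allowed_groups": {"hr"}},
    {"title": "Bid strategy", "allowed_groups": {"sales", "board"}},
]

visible = trim_results(hits, user_groups={"all_staff", "sales"})
print([hit["title"] for hit in visible])
```

Two people typing the same query get different result lists, which is one more thing Google never has to worry about.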


5) The web doesn't do version control

Every webpage on the web is the only version. Rather than storing a webpage as version 3.5 and writing version 4.0, you just rewrite and publish the page. Every page on the web is the current version, and is constantly under development. Google only returns one version of the page - the current version.

You don't treat documents in this way. Very often, unless your document management is very good, you will have multiple versions of the same document stored in different places. One of the bugbears of enterprise search is that it will often find all these versions in your search results.
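If your documents did carry a reliable identifier and version number, collapsing the duplicates would be simple, as the minimal sketch below shows. The big assumption, flagged in the code too, is the `doc_id` and `version` fields: in most organisations nobody records them consistently, and that is the real problem. The paths and the `latest_versions` function are hypothetical.

```python
def latest_versions(hits):
    """Keep one hit per document: the one with the highest version.

    Assumes every hit has a shared `doc_id` and a comparable `version`
    tuple, which many real document stores do not provide.
    """
    latest = {}
    for hit in hits:
        current = latest.get(hit["doc_id"])
        if current is None or hit["version"] > current["version"]:
            latest[hit["doc_id"]] = hit
    return list(latest.values())

# Hypothetical search hits: the same policy saved twice, in two places.
hits = [
    {"doc_id": "travel-policy", "version": (3, 5), "path": "/hr/travel_policy_v3.5.docx"},
    {"doc_id": "travel-policy", "version": (4, 0), "path": "/shared/travel_policy_v4.0.docx"},
    {"doc_id": "site-plan", "version": (1, 0), "path": "/projects/site_plan.pdf"},
]

deduplicated = latest_versions(hits)
print([hit["path"] for hit in deduplicated])
```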


So the next time your managers ask "Why can't we have search like Google" - 

you can reply - "Yes, we can, IF

  • You move all content out of documents onto wikis
  • You keep only one version of every document
  • You train all staff in search engine optimisation
  • You hire a team of search engine specialists, and
  • You make all documents open to everyone".
Then see what they say!


Enterprise search can work, but it will never work like Google.

