This article was originally published on BigDataMadeSimple.
I’ve become a big fan of Google. Before I start heaping praise on the search giant, it’s quite amusing to note that their well-loved search routine begins with the modest process of web crawling, performed by crawlers (aka spiders or bots) commonly referred to as Googlebots.
How they tested, tried, and played with the crawled data is what turned them into an unparalleled search sensation.
So what set them apart?
The Google Journal
About two decades back, Google, now one of the most valuable companies in the world, started out with a rather simple-looking mission statement: to organize the world’s information and make it universally accessible and useful.
Founded by Stanford students Larry Page and Sergey Brin, the search engine was launched in 1998 from a garage in suburban Menlo Park, California. The building blocks of their search engine, as covered in Google engineer Matt Cutts’ video on the fundamentals of search, include –
- Web crawling – The web crawling process starts from a list of web addresses gathered from sitemaps and previous crawls. Computer programs called Googlebots decide which sites to crawl and how often each crawl should happen. It is up to the web crawlers to discover new and updated information on the websites in their list.
- Indexing – Just as you would check the index of a huge book to look up a particular page, Google picks key signals from the crawled data and maintains a huge search index that is claimed to be over 100,000,000 gigabytes.
- Serving – With the information stacked up, Google then serves the relevant pages to users by ranking them. The PageRank algorithm measures how important a page is based on its incoming links, and Google combines it with a list of about 200 other factors!
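To make the three building blocks concrete, here is a minimal sketch in Python. It is not Google's implementation: the `TOY_WEB` dictionary, the `.example` addresses, and the function names are all hypothetical, the "web" lives in memory instead of being fetched over HTTP, and ranking uses only a simplified PageRank with none of the other factors.

```python
from collections import deque

# Hypothetical in-memory "web": address -> (page text, outgoing links).
# A real crawler would fetch pages over HTTP and respect robots.txt.
TOY_WEB = {
    "a.example": ("fresh bread bakery", ["b.example", "c.example"]),
    "b.example": ("bakery opening hours", ["c.example"]),
    "c.example": ("city news and bakery reviews", ["a.example"]),
    "d.example": ("local clothing trends", ["c.example"]),
}


def crawl(seeds):
    """Web crawling: follow links breadth-first, starting from seed addresses."""
    seen, queue, pages = set(seeds), deque(seeds), {}
    while queue:
        url = queue.popleft()
        text, links = TOY_WEB[url]
        pages[url] = (text, links)
        for link in links:
            if link in TOY_WEB and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


def build_index(pages):
    """Indexing: map every word to the set of pages that contain it."""
    index = {}
    for url, (text, _) in pages.items():
        for word in text.split():
            index.setdefault(word, set()).add(url)
    return index


def pagerank(pages, damping=0.85, iterations=100):
    """Simplified PageRank: importance flows along incoming links."""
    n = len(pages)
    rank = {url: 1.0 / n for url in pages}
    for _ in range(iterations):
        new_rank = {url: (1 - damping) / n for url in pages}
        for url, (_, links) in pages.items():
            targets = [t for t in links if t in pages]
            if not targets:  # a page without links shares its rank with all pages
                targets = list(pages)
            share = damping * rank[url] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


def search(word, index, rank):
    """Serving: return the pages containing the word, most important first."""
    return sorted(index.get(word, ()), key=lambda url: rank[url], reverse=True)


pages = crawl(["a.example"])
index = build_index(pages)
rank = pagerank(pages)
results = search("bakery", index, rank)
print(results)
```

In this toy run the crawler never discovers `d.example`, because no crawled page links to it, and `c.example` comes first for "bakery" because both of the other crawled pages link to it.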
No wonder you get most of your answers from Google.
The Google “WOW” factor
Search engines like Archie, WebCrawler, and Yahoo existed before Google but lost ground as Google came to dominate the Internet in the years that followed. A few key areas where Google stood out were –
- Passion and innovation – Founder Larry Page had a clear vision for his search engine: to understand exactly what a user wanted and give back exactly that. Google was not satisfied with returning just a bunch of websites as results but had a passion to serve the right answer. They worked hard at fine-tuning their algorithms to produce better search answers, making about 1600 improvements in 2016 alone.
Google was also forward-looking in trying out innovative features such as correcting spelling in search queries, finishing off people’s thoughts with Google Autocomplete, and returning answers in different languages along with relevant images through universal search. It didn’t stop there.
Handling billions of search queries a day, Google went the extra mile to work out why a user was typing in a particular search query. Was it for directions to a particular place, to find a bakery nearby, or to check out local clothing trends? With these questions, they went on to build more search-based products like Google Maps, Google News, Google Trends, Google Alerts, and Google Flights, growing to be more than just another search engine.
- Faster results – In a short video presented by Google on the Evolution of Search, Ben Gomes, a Google Fellow, outlines the goal of the Google team to get answers to users faster and faster. Amit Singhal, another Google Fellow, describes how Google failed to give relevant search results during the Twin Towers attack in 2001 because the Google index had been built from a crawl a month earlier. To address this, they began crawling news sites more quickly and increased the frequency of their web crawling in the years that followed.
- Staying relevant always – Though Google made great strides in their search and software products, they stayed true to their initial mission of making as much relevant information as possible available, not just to a chosen few but to everyone. Rather than crowding their homepage with irrelevant content, they stuck to a simple box for typing in the search query. Serving the most relevant answers to all users mattered the most to them.
The Google lesson in web crawling
Google kicked off with the humble web crawling process of systematically looking at web pages across the Internet. But by understanding user intent, Google knew the kinds of questions they had to answer and could present the right answers from their crawled data.
There is a lot of information out there on the World Wide Web that holds answers to your business questions. Web crawling can gather information according to the nature of your business query. It could be anything like: What is the best price I can sell my commodity at? Who are the competitors in my area of expertise? What do they lack that I possess? What companies are venture capitalists interested in? Is there a relevant business technology I can invest in? Which potential clients could my solutions help, and can I reach out to them?
Ask the right, innovative questions, as Google did, and look for answers in the information gathered by the web crawling process. If you are new to web crawling, do get in touch with us!