pkb contents > search engines | just under 2262 words | updated 12/29/2017

1. What are search engines?

"Technically, 'search engine' is the popular term for information retrieval systems. Although Web search engines are the most popular, search engines are often used in other than the Web, such as desktop search engines and document search engines ... perhaps a more appropriate name for them would have been finding engines" (Sharda et al., 2014, p. 243).

1.1. Business applications of search engines

1.1.1. Enterprise IR

1.1.2. Search marketing

1.1.2.1. Search engine optimization

Per Sharda et al. (2014, pp. 246-248), SEO is the "intentional activity of affecting the visibility of an e-commerce site or a Web site in a search engine's natural (unpaid or organic) search results ... As an Internet marketing strategy, SEO considers how search engines work, what people search for, the actual search terms or keywords typed into search engines, and which search engines are preferred by their targeted audience. Optimizing a Web site may involve editing its content, HTML, and associated coding to both increase its relevance to specific keywords and to remove barriers to the indexing activities of search engines. Promoting a site to increase the number of backlinks, or inbound links, is another SEO tactic."

From VanFossen (2006), some more SEO tactics:

Per Sharda et al. (2014), 'black-hat SEO' tactics include cloaking (the crawler and the human visitor see different versions of a page) and using HTML, JavaScript, etc. to create content that a crawler sees but a human doesn't (e.g., through text color).

Per VanFossen (2005b), Google used the following factors to rank websites as of 2005, implying some SEO practices:

1.1.2.1.1. Keywords in SEO

Per VanFossen (2005a):

Per VanFossen (2005c):

Tools:

Re: long-tail (uncommon) keywords,

1.1.2.2. Content marketing

1.1.2.2.1. Content curation

Per Good (2014), a content curator is distinguished from a content marketer in the following ways; he [sic]:

  1. "Is not after quantity. Quality is his key measure.
  2. Does not ever curate something without having thoroughly looked at it, multiple times.
  3. Always provides insight as to why something is relevant and where the item fits in its larger collection (stream, catalog, list, etc.)
  4. Adds personal evaluation, judgment, critique or praise.
  5. Integrates a personal touch, in the way it presents the curated object.
  6. Provides useful information about other related, connected or similar objects of interest.
  7. Credits and thanks anyone who has helped in the discovery, identification and analysis of any curated item and links relevant names of people present in the content.
  8. Does not ever republish content “as is” without adding extra value to it.
  9. Does not curate, select, personalize or republish his own content in an automated way.
  10. Discloses bias, affiliation and other otherwise non self-evident contextual clues."

Or, the short version: “[never] forget what these people are looking for and what they really expect from anyone providing them with an answer”; “[take] seriously the information needs of your niche”.

Kanter (2013) provides an excellent summary (as she must) of how curation adds value to information:

1.1.2.2.2. Writing roundup posts

Per Daly (2015):

  1. Identify and automate the collection of good content
  2. Get inspired by top-notch curators
  3. Establishing a well-trafficked blog takes 6-12 months; be patient, consistent, and promote the blog through links and complementary media channels

1.1.2.3. Search engine marketing

AKA SEM; paid search

Per EBizMBA, the most popular search engines, based on 2017 traffic data from Alexa, Compete, and Quantcast and ordered from most to least popular:

  1. Google
  2. Bing
  3. Yahoo
  4. Baidu
  5. Ask
  6. AOL Search
  7. DuckDuckGo
  8. WolframAlpha
  9. Yandex
  10. WebCrawler
  11. Search
  12. dogpile
  13. ixquick
  14. excite
  15. Info

2. How do search engines work?

In general, search engines work by crawling and automatically indexing content, thus creating metadata. This index may be fairly shallow, e.g., based on the contents of the tag or headers; it may also be quite deep, using natural language processing (NLP) techniques like grammatical stemming. User search terms are then matched against the index.
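
The index-then-match idea can be sketched in a few lines. This is a toy illustration only, not how any production engine is built: the function names (`crude_stem`, `build_index`, `search`), the sample documents, and the suffix-stripping rule are all assumptions made for the example, and the "stemmer" is a crude stand-in for a real NLP stemmer.

```python
import re
from collections import defaultdict


def crude_stem(token: str) -> str:
    """Strip a few common English suffixes (a toy stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def tokenize(text: str) -> list[str]:
    """Lowercase the text, split it into alphanumeric tokens, and stem each one."""
    return [crude_stem(t) for t in re.findall(r"[a-z0-9]+", text.lower())]


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Build an inverted index mapping each stemmed term to the documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the documents that contain every (stemmed) query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Hypothetical sample documents, for illustration only.
docs = {
    "a": "Crawling the web",
    "b": "Search engines index crawled pages",
    "c": "Indexing engines",
}
index = build_index(docs)
print(search(index, "indexing engine"))
```

Because both documents and queries pass through the same `tokenize` step, a query for "indexing engine" can match a document that says "engines index".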

In the early days, search engines and library catalogs used sharply different techniques. Increasingly, though, knowledge organization systems (KOSs) from information architecture (IA), which take advantage of human knowledge by formalizing it for use by an information system, play a role in improving search engine performance.

2.1. Search engine process

Per Sharda et al. (2014, pp. 243-246), a search engine involves two simultaneous cycles: "[w]hile one is interfacing with the World Wide Web, the other is interfacing with the user."

2.1.1. Development cycle

2.1.1.1. Web crawler

(AKA Web spider, spider)

"A Web crawler starts with a list of URLs to visit, which are listed in the schedule and are often called the seeds. These URLs may come from submissions made by Webmasters, or, more often, they come from the internal hyperlinks of previously crawled documents/pages. As the crawler visits these URLs, it identifies all the hyperlinks in the page and adds them to the list of URLs to visit. As the documents are found and fetched by the crawler, they are stored in a temporary staging area for the document indexer to grab and process."

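The crawl loop described in the quote above (seeds, a schedule of URLs to visit, newly found hyperlinks appended, fetched pages placed in a staging area) might look roughly like this. It is a minimal sketch under stated assumptions: the function `crawl`, the `fetch_links` callback, and the in-memory `TOY_WEB` link graph are hypothetical, and no real network access, politeness rules, or robots.txt handling is attempted.

```python
from collections import deque
from typing import Callable, Iterable


def crawl(
    seeds: Iterable[str],
    fetch_links: Callable[[str], list[str]],
    max_pages: int = 100,
) -> list[str]:
    """Visit URLs breadth-first from the seeds; return them in fetch order."""
    frontier: deque[str] = deque()  # the "schedule" of URLs still to visit
    seen: set[str] = set()          # every URL ever scheduled, to avoid revisits
    for seed in seeds:
        if seed not in seen:
            seen.add(seed)
            frontier.append(seed)

    staging: list[str] = []         # fetched pages waiting for the document indexer
    while frontier and len(staging) < max_pages:
        url = frontier.popleft()
        links = fetch_links(url)    # "fetch" the page and extract its hyperlinks
        staging.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return staging


# Hypothetical in-memory link graph standing in for the Web.
TOY_WEB = {
    "http://example.com/": ["http://example.com/a", "http://example.com/b"],
    "http://example.com/a": ["http://example.com/b", "http://example.com/c"],
    "http://example.com/b": ["http://example.com/"],
    "http://example.com/c": [],
}

order = crawl(["http://example.com/"], lambda url: TOY_WEB.get(url, []))
print(order)
```

Passing the fetcher in as a callback keeps the scheduling logic separate from page retrieval, so the same loop can be exercised against a toy graph.
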
2.1.1.2. Document indexer

2.1.2. Response cycle

2.1.2.1. Query analyzer

"[R]esponsible for receiving a search request from the user (via the search engine's Web server interface) and converting it into a standardized data structure, so that it can be easily queried/matched against the entries in the document database ... quite similar to what the document indexer does ..."

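As a rough illustration of converting a raw search request into a "standardized data structure", the sketch below lowercases the query, pulls out quoted phrases, and drops stop words. The `ParsedQuery` structure, the `analyze_query` function, and the stop-word list are assumptions made for this example; the source does not specify what the structure contains.

```python
import re
from dataclasses import dataclass, field

# Assumed, deliberately tiny stop-word list (illustrative only).
STOP_WORDS = frozenset({"a", "an", "and", "of", "the", "to"})


@dataclass
class ParsedQuery:
    """A hypothetical standardized form of a user query."""
    terms: list[str] = field(default_factory=list)    # individual keywords
    phrases: list[str] = field(default_factory=list)  # exact phrases given in quotes


def analyze_query(raw: str) -> ParsedQuery:
    """Normalize a raw query string into lowercase terms and quoted phrases."""
    # Quoted phrases are kept whole, with case and internal whitespace normalized.
    phrases = [" ".join(p.lower().split()) for p in re.findall(r'"([^"]+)"', raw)]
    # Everything outside the quotes is split into keywords, minus stop words.
    remainder = re.sub(r'"[^"]*"', " ", raw)
    terms = [
        t for t in re.findall(r"[a-z0-9]+", remainder.lower())
        if t not in STOP_WORDS
    ]
    return ParsedQuery(terms=terms, phrases=phrases)


print(analyze_query('The History of "Web  Crawlers" and Indexing'))
```
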
2.1.2.2. Document matcher

Per some search algorithm,

2.1.2.3. Postdelivery

"Leading search engines like Google monitor the performance of their search results by capturing, recording, and analyzing postdelivery user actions and experiences. These analyses often lead to more and more rules to further refine the ranking of the documents/pages so that the links at the top are more preferable to the end users" (Sharda et al., 2014, p. 246).

2.2. Measuring search engine performance

Per Sharda et al. (2014):

2.3. Major search algorithms

"[E]arly search engines used a simple keyword match against the document database and returned a list of ordered documents/pages, where the determinant of the order was a function that used the number of words/terms matched between the query and the document along with the weights of those words/terms" (Sharda et al., 2014, p. 246).

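One simple way to read the early keyword-match approach quoted above is to score each document by summing the weights of the query terms it contains. The exact scoring function is not given in the source, so the sum-of-weights rule, the function names, and the sample weights below are assumptions chosen for illustration.

```python
def keyword_score(
    query_terms: list[str],
    doc_terms: list[str],
    weights: dict[str, float],
) -> float:
    """Sum the weights of the query terms that also occur in the document.

    Terms missing from `weights` default to 1.0 (an assumption of this sketch).
    """
    matched = set(query_terms) & set(doc_terms)
    return sum(weights.get(term, 1.0) for term in matched)


def rank(
    query_terms: list[str],
    docs: dict[str, list[str]],
    weights: dict[str, float],
) -> list[str]:
    """Return the IDs of matching documents, highest score first (ties by ID)."""
    scored = [
        (keyword_score(query_terms, terms, weights), doc_id)
        for doc_id, terms in docs.items()
    ]
    matching = [(score, doc_id) for score, doc_id in scored if score > 0]
    return [doc_id for score, doc_id in sorted(matching, key=lambda p: (-p[0], p[1]))]


# Hypothetical documents (already tokenized) and term weights.
sample_docs = {
    "d1": ["search", "engine", "marketing"],
    "d2": ["web", "crawler", "search"],
    "d3": ["cooking"],
}
sample_weights = {"crawler": 3.0, "search": 1.0}

print(rank(["search", "crawler"], sample_docs, sample_weights))
```
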
2.3.2. PageRank
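
These notes do not yet describe PageRank, so the following is only a sketch of the commonly taught power-iteration formulation, not Google's production algorithm. The function name `pagerank`, the damping factor of 0.85, the fixed iteration count, the even redistribution of rank from pages with no outlinks, and the three-page link graph are all assumptions of this example.

```python
def pagerank(
    links: dict[str, list[str]],
    damping: float = 0.85,
    iterations: int = 50,
) -> dict[str, float]:
    """Estimate PageRank scores by power iteration over a link graph.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start from a uniform distribution

    for _ in range(iterations):
        # Rank held by pages with no outlinks is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {
            page: (1.0 - damping) / n + damping * dangling / n for page in pages
        }
        # Every other page splits its damped rank equally among its outlinks.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical three-page link graph.
toy_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
scores = pagerank(toy_links)
print(sorted(scores, key=scores.get, reverse=True))
```

In this toy graph, page "c" is linked to by both other pages and page "b" by only one, so "b" ends up with the lowest score.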

2.3.3. Hilltop

2.3.4. Topic-Sensitive

(or Hypertext Induced Topic Selection??)

2.3.6. Panda

2.3.7. Penguin

2.3.8. Hummingbird

(semantic reasoning and query rewriting)

2.3.9. RankBrain

(machine learning)

3. Sources

3.1. Cited

Daly, J. (2015, May 5). How to write a great roundup post. Retrieved from http://www.cornerstonecontent.com/how-to-write-a-great-roundup-post/

Good, R. (2014, March 18). Content curation is not content marketing. MasterNewMedia. Retrieved from http://www.masternewmedia.org/content-curation-is-not-content-marketing/

Kanter, B. (2013, December 13). How nonprofits get significant value from content curation. Beth's Blog. Retrieved from http://www.bethkanter.org/content-curation-2/

Sharda, R., Delen, D., & Turban, E. (2014). Business intelligence: A managerial perspective on analytics (3rd ed.). New York City, NY: Pearson.

VanFossen, L. (2005a, October 16). How people search the web, and how they can find your blog. Retrieved from https://lorelle.wordpress.com/2005/10/16/how-people-search-the-web-and-how-they-can-find-your-blog/

VanFossen, L. (2005b, September 19). Secret out -- How Google ranks websites. Retrieved from https://lorelle.wordpress.com/2005/09/19/secret-out-how-google-ranks-websites/

VanFossen, L. (2005c, November 26). What are keywords? Retrieved from https://lorelle.wordpress.com/2005/11/26/what-are-keywords/

VanFossen, L. (2006, January 15). Do-It-Yourself Search Engine Optimization. Retrieved from https://lorelle.wordpress.com/2006/01/15/dyi-search-engine-optimization/

3.2. References

3.3. Read

Hedden, H. (2016). The accidental taxonomist (2nd ed.). Medford, NJ: Information Today, Inc.

3.4. Unread