Web Content Mining - TKS

March 23, 2018 | Author: Anonymous | Category: N/A


Web Usage Mining



Contents

- Web Mining
- Web Mining Taxonomy
- Web Usage Mining
- Web Analysis Tools
- Pattern Discovery Tools and Their Different Stages
- Pattern Analysis Tools and Techniques Employed
- Web Usage Mining Process
- Web Usage Mining Architecture
- Research Directions
- Conclusion
- References


Web Mining

Web mining is the use of data mining techniques to automatically discover and extract information from Web documents and services.

Web mining research integrates work from several research communities, such as:
- Databases (DB)
- Information retrieval (IR)
- The sub-areas of machine learning (ML)
- Natural language processing (NLP)


Mining the World Wide Web

The WWW is a huge, widely distributed, global information source for:
- Information services: news, advertisements, consumer information, financial management, education, government, e-commerce, etc.
- Hyperlink information
- Access and usage information
- Web site contents and organization


Challenges in WWW Interactions

- Finding relevant information
- Creating knowledge from the information available
- Personalizing the information
- Learning about customers and individual users

Web mining can play an important role!


Web Mining Taxonomy

Web Mining
- Web Content Mining
- Web Structure Mining
- Web Usage Mining


Web Usage Mining

Web usage mining, also known as Web log mining, applies data mining techniques to discover interesting usage patterns from the secondary data derived from users' interactions while they surf the Web.



Organizations often generate and collect large volumes of data in their daily operations through users' interactions with their Web sites. Most of this information is generated automatically by Web servers and collected in server access logs. Other sources of user information include referrer logs, which contain information about the referring page for each page reference, and user registration or survey data gathered via tools such as CGI scripts. Analysis of server access logs and user registration data provides valuable information on how to better structure a Web site in order to create a more effective presence for the organization. Most existing Web analysis tools provide mechanisms for reporting user activity on the servers and various forms of data filtering.

Web Analysis Tools

Using these tools, it is possible to determine the number of accesses to the server and to the individual files within the organization's Web space, the times or time intervals of visits, and the domain names and URLs of users of the Web server.

- Pattern Discovery Tools
- Pattern Analysis Tools
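As an illustration of the simplest kind of report such tools produce, the Python sketch below counts total accesses, accesses per file and accesses per client host. It assumes the server writes the Common Log Format; the regular expression, the function names and the sample lines are hypothetical and not taken from any particular tool.

```python
import re
from collections import Counter

# Common Log Format (assumed): host ident authuser [date] "request" status bytes
CLF = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) \S+'
)

def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    m = CLF.match(line)
    return m.groupdict() if m else None

def access_report(lines):
    """Count total accesses, accesses per file (URL) and accesses per client host."""
    entries = [e for e in map(parse_line, lines) if e is not None]
    per_file = Counter(e["url"] for e in entries)
    per_host = Counter(e["host"] for e in entries)
    return len(entries), per_file, per_host

log = [
    '10.0.0.1 - - [23/Mar/2018:10:00:00 +0000] "GET /company/ HTTP/1.0" 200 512',
    '10.0.0.1 - - [23/Mar/2018:10:00:05 +0000] "GET /company/products/ HTTP/1.0" 200 734',
    'gw.example.org - - [23/Mar/2018:10:01:00 +0000] "GET /company/ HTTP/1.0" 200 512',
]
total, per_file, per_host = access_report(log)
print(total, per_file.most_common(1), per_host["10.0.0.1"])
```

Visit times and time intervals could be reported the same way from the `time` field.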


Pattern Discovery Tools

Emerging tools for user pattern discovery use sophisticated techniques from AI, data mining, psychology, and information theory to mine knowledge from the collected data. The WEBMINER system introduces a general architecture for Web usage mining; it automatically discovers association rules and sequential patterns from server access logs. Pirolli et al. use information foraging theory to combine path traversal patterns, Web page typing, and site topology information to categorize pages for easier access by users.


Pattern Analysis Tools

Once access patterns have been discovered, analysts need appropriate tools and techniques to understand, visualize, and interpret them. Examples include the WebViz system for visualizing access patterns, and OLAP techniques such as data cubes, which simplify the analysis of usage statistics from server access logs. The WEBMINER system proposes an SQL-like query mechanism for querying the discovered knowledge.

Pattern Discovery from Web Transactions

Preprocessing Tasks
- Data Cleaning
- Transaction Identification

Discovery Techniques on Web Transactions
- Path Analysis
- Association Rules
- Sequential Patterns
- Clustering and Classification

Preprocessing Tasks

Data Cleaning: techniques to clean a server log by eliminating irrelevant items. This can reasonably be accomplished by checking the suffix of the URL name; for example, all log entries with filename suffixes such as gif, jpeg, GIF, JPEG, jpg, JPG, and map can be removed.
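A minimal sketch of suffix-based cleaning, assuming each log entry has already been reduced to its requested URL. The suffix list mirrors the one above, compared case-insensitively so that GIF and gif are both caught; `is_relevant` and `clean` are hypothetical helper names.

```python
# Suffixes from the slide, lower-cased; matching is case-insensitive
IRRELEVANT_SUFFIXES = (".gif", ".jpeg", ".jpg", ".map")

def is_relevant(url):
    """Keep an entry unless its file name ends in an irrelevant suffix."""
    path = url.split("?", 1)[0]  # ignore any query string
    return not path.lower().endswith(IRRELEVANT_SUFFIXES)

def clean(urls):
    """Drop image and map requests from a list of requested URLs."""
    return [u for u in urls if is_relevant(u)]

requests = ["/company/", "/img/logo.GIF", "/company/products/file1.html",
            "/maps/site.map", "/photo.JPG?size=2"]
print(clean(requests))  # ['/company/', '/company/products/file1.html']
```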


Transaction Identification

Here, sequences of page references are grouped into logical units representing Web transactions or user sessions. Two types of transactions are defined:

Navigation-content: each transaction consists of a single content reference and all of the navigation references in the traversal path leading to that content reference. These transactions can be used to mine for path traversal patterns.

Content-only: each transaction consists of all of the content references for a given user session. These transactions can be used to discover associations between the content pages of a site.
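The sketch below shows one way to build both transaction types. Two points are assumptions not stated in the slides: sessions are split per client host using a hypothetical 30-minute inactivity timeout, and the caller supplies an `is_content` predicate that decides which pages are content pages. The navigation-content builder also simplifies "the traversal path" to the navigation references seen since the previous content reference.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds; hypothetical inactivity threshold

def sessionize(refs, timeout=SESSION_TIMEOUT):
    """Group (host, timestamp, url) references into per-host sessions.

    A new session starts when the gap between consecutive references
    from the same host exceeds `timeout` seconds.
    """
    by_host = defaultdict(list)
    for host, ts, url in sorted(refs, key=lambda r: (r[0], r[1])):
        by_host[host].append((ts, url))
    sessions = []
    for hits in by_host.values():
        current = [hits[0][1]]
        for (prev_ts, _), (ts, url) in zip(hits, hits[1:]):
            if ts - prev_ts > timeout:
                sessions.append(current)
                current = []
            current.append(url)
        sessions.append(current)
    return sessions

def navigation_content(session, is_content):
    """One transaction per content reference: the navigation references
    since the previous content reference, followed by the content page."""
    transactions, path = [], []
    for url in session:
        if is_content(url):
            transactions.append(path + [url])
            path = []
        else:
            path.append(url)
    return transactions

def content_only(session, is_content):
    """All content references of one session, as a single transaction."""
    return [url for url in session if is_content(url)]

content_pages = {"/company/products/file1.html", "/company/products/file2.html"}
is_content = content_pages.__contains__
refs = [("h1", 0, "/company"), ("h1", 10, "/company/products"),
        ("h1", 20, "/company/products/file1.html"),
        ("h1", 30, "/company/products/file2.html"),
        ("h1", 5000, "/company")]
sessions = sessionize(refs)
print(navigation_content(sessions[0], is_content))
print(content_only(sessions[0], is_content))
```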

Discovery Techniques on Web Transactions

Path Analysis: here a graph represents the physical layout of a Web site, with Web pages as nodes and hypertext links between pages as directed edges. Other graphs could be formed based on the types of Web pages, with edges representing similarity between pages, or with edges giving the number of users that go from one page to another.

Path analysis could be used to determine the most frequently visited paths in a Web site.
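A small sketch of path analysis, assuming sessions have already been identified as lists of URLs in visit order. `transition_graph` builds the "number of users that go from one page to another" graph as weighted directed edges, and `frequent_paths` counts contiguous paths of a fixed length; both names are illustrative.

```python
from collections import Counter

def transition_graph(sessions):
    """Directed edges (page, next_page) weighted by the number of sessions
    that moved directly from one page to the other (once per session)."""
    edges = Counter()
    for session in sessions:
        edges.update(set(zip(session, session[1:])))
    return edges

def frequent_paths(sessions, length, top=3):
    """Most frequent contiguous paths of `length` pages, counted once per session."""
    paths = Counter()
    for s in sessions:
        paths.update({tuple(s[i:i + length]) for i in range(len(s) - length + 1)})
    return paths.most_common(top)

sessions = [
    ["/company", "/company/whatsnew", "/company/products"],
    ["/company", "/company/products", "/company/products/file1.html"],
    ["/company", "/company/whatsnew", "/company/products", "/company/products/file1.html"],
]
g = transition_graph(sessions)
print(g[("/company", "/company/whatsnew")])  # 2
print(frequent_paths(sessions, 3, top=1))
```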


Other examples of information that can be discovered through path analysis: 70% of clients who accessed /company/products/file2.html did so by starting at /company and proceeding through /company/whatsnew, /company/products, and /company/products/file1.html; 80% of clients who accessed the site started from /company/products; and 65% of clients left the site after four or fewer page references.
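Statistics like these reduce to ratios over sessions. This sketch assumes each session is a list of URLs in visit order; the sample sessions are made up, so the percentages it prints are not the figures quoted above.

```python
def share(sessions, predicate):
    """Fraction of sessions (client visits) that satisfy `predicate`."""
    return sum(1 for s in sessions if predicate(s)) / len(sessions)

sessions = [
    ["/company/products", "/company/products/file1.html"],
    ["/company/products", "/company/whatsnew", "/company",
     "/company/products", "/company/products/file2.html"],
    ["/company", "/company/products"],
    ["/company/whatsnew"],
]
started_at_products = share(sessions, lambda s: s[0] == "/company/products")
left_early = share(sessions, lambda s: len(s) <= 4)
print(f"{started_at_products:.0%} started from /company/products")
print(f"{left_early:.0%} left after four or fewer page references")
```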

Association Rules

This technique is generally applied to databases of transactions in which each transaction consists of a set of items; the problem is to discover all associations and correlations among the data items. In Web usage mining, each transaction is the set of URLs accessed by a client in one visit to the server. For example, association rule discovery techniques can find correlations such as: 40% of clients who accessed the Web page with URL /company/products/product1.html also accessed /company/products/product2.html; or 30% of clients who accessed /company/announcements/special-offer.html placed an online order in /company/products/product1.
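A percentage such as "40% of clients who accessed product1 also accessed product2" corresponds to the confidence of a rule, computed from how many transactions contain the items. A minimal sketch with made-up visits, where each transaction is the set of URLs from one visit; the helper names are illustrative.

```python
def count(transactions, itemset):
    """Number of transactions containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t))

def support(transactions, itemset):
    """Fraction of transactions containing `itemset`."""
    return count(transactions, itemset) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the transactions containing `antecedent`, the fraction that also contain `consequent`."""
    both = count(transactions, set(antecedent) | set(consequent))
    return both / count(transactions, antecedent)

p1, p2 = "/company/products/product1.html", "/company/products/product2.html"
visits = [
    {p1, p2}, {p1}, {p1, "/company/"}, {p1, p2, "/company/"}, {p1},
    {p2}, {"/company/"},
]
print(count(visits, {p1, p2}))         # 2 of 7 visits contain both pages
print(confidence(visits, {p1}, {p2}))  # 0.4
```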


Association Rules (Contd.)

Because such transaction databases usually contain extremely large amounts of data, current association rule discovery techniques try to prune the search space according to the support for the items under consideration. Support is a measure based on the number of user transactions in the logs that contain the items. Discovering such rules can help organizations engaged in electronic commerce develop effective marketing strategies.
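Support-based pruning of this kind is what the classic Apriori algorithm does: candidate itemsets grow one item at a time, and only itemsets that meet a minimum support are extended. The compact Apriori-style search below is offered for illustration; it is not claimed to be the specific algorithm used in WEBMINER, and `min_support` is expressed as a fraction of transactions.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise search that extends only itemsets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def sup(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    items = {frozenset([i]) for t in transactions for i in t}
    level = {s for s in items if sup(s) >= min_support}
    result, k = {}, 1
    while level:
        for s in level:
            result[s] = sup(s)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(sub) in level for sub in combinations(c, k - 1))}
        level = {c for c in candidates if sup(c) >= min_support}
    return result

transactions = [
    {"/p1", "/p2", "/p3"}, {"/p1", "/p2"}, {"/p1", "/p3"}, {"/p2", "/p3"}, {"/p1", "/p2", "/p3"},
]
frequent = frequent_itemsets(transactions, min_support=0.5)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

Here the three-page itemset appears in only 2 of 5 transactions, so it is discarded and the search stops.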


Sequential Patterns

The problem of discovering sequential patterns is to find inter-transaction patterns such that the presence of a set of items is followed by another item in the time-stamp-ordered transaction set. By analyzing this information, a Web mining system can determine temporal relationships among data items such as: 30% of clients who visited /company/products/ had done a search in Yahoo within the past week on keyword w; or 60% of clients who placed an online order in /company/products/product1.html also placed an online order in /company1/products/product4 within 15 days.
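A temporal relationship such as "ordered product1 and then product4 within 15 days" can be checked directly on time-stamped client histories. In this sketch each history is a list of (timestamp, item) pairs; the item labels such as `order:product1` and the function name `followed_within` are hypothetical. It tests one candidate pattern rather than discovering all sequential patterns.

```python
from datetime import datetime, timedelta

def followed_within(events, first, then, window):
    """Among clients who did `first`, the fraction who later did `then` within `window`.

    `events` maps a client id to a list of (timestamp, item) pairs.
    """
    did_first = matched = 0
    for history in events.values():
        history = sorted(history)
        starts = [ts for ts, item in history if item == first]
        if not starts:
            continue
        did_first += 1
        if any(start < ts <= start + window
               for start in starts
               for ts, item in history if item == then):
            matched += 1
    return matched / did_first if did_first else 0.0

d = datetime(2018, 3, 1)
events = {
    "c1": [(d, "order:product1"), (d + timedelta(days=3), "order:product4")],
    "c2": [(d, "order:product1"), (d + timedelta(days=20), "order:product4")],
    "c3": [(d, "order:product1")],
    "c4": [(d, "order:product4"), (d + timedelta(days=1), "order:product1")],
    "c5": [(d, "order:product1"), (d + timedelta(days=14), "order:product4")],
}
print(followed_within(events, "order:product1", "order:product4", timedelta(days=15)))  # 0.4
```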

Clustering and Classification

Discovering classification rules allows one to develop a profile of items belonging to a particular group according to their common attributes. This profile can then be used to classify new data items added to the database. For example: clients from state or government agencies who visit the site tend to be interested in the page /company/products/product1.html; or 50% of clients who placed an online order in /company/products/product2 were in the 20-25 age group and lived on the West Coast. Clustering analysis allows one to group together clients or data items that have similar characteristics. Clustering client information or data items from Web transaction logs can facilitate the development and execution of future marketing strategies, both online and offline.
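For clustering, one simple option is to represent each client by the set of pages visited and group clients whose sets are similar. The sketch uses Jaccard similarity and single-pass "leader" clustering, chosen only because they are short and deterministic; the 0.5 threshold and the sample clients are assumptions.

```python
def jaccard(a, b):
    """Size of the intersection divided by size of the union of two page sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def leader_cluster(clients, threshold=0.5):
    """Each client joins the first cluster whose leader is at least
    `threshold`-similar; otherwise it becomes the leader of a new cluster."""
    clusters = []  # list of (leader_pages, member_ids)
    for cid, pages in clients.items():
        for leader, members in clusters:
            if jaccard(leader, pages) >= threshold:
                members.append(cid)
                break
        else:
            clusters.append((pages, [cid]))
    return [members for _, members in clusters]

clients = {
    "c1": {"/company/products/product1.html", "/company/products/product2.html"},
    "c2": {"/company/products/product1.html", "/company/products/product2.html", "/company/"},
    "c3": {"/company/announcements/special-offer.html"},
    "c4": {"/company/announcements/special-offer.html", "/company/"},
}
print(leader_cluster(clients))  # [['c1', 'c2'], ['c3', 'c4']]
```

The result depends on the order in which clients are visited; k-means or hierarchical clustering would avoid that at the cost of more code.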


Analysis of Discovered Patterns

Web site administrators are extremely interested in questions like "How are people using the site?" and "Which pages are being accessed most frequently?". Answering these questions requires analyzing the hyperlink structure as well as the page contents. The end products of such analysis might include 1) the frequency of visits per document, 2) the most recent visit per document, 3) who is visiting which documents, 4) the frequency of use of each hyperlink, and 5) the most recent use of each hyperlink.

- Visualization Techniques
- OLAP Techniques
- Data & Knowledge Querying
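The five end products listed above can be computed in a single pass over the log. This sketch assumes entries of the form (timestamp, host, url, referrer) with integer timestamps, and approximates a hyperlink use by a (referrer, url) pair, which requires referrer data such as that kept in referrer logs.

```python
from collections import Counter, defaultdict

def usage_summary(entries):
    """Compute the five summaries from (timestamp, host, url, referrer) tuples."""
    visits = Counter()           # 1) frequency of visits per document
    last_visit = {}              # 2) most recent visit per document
    visitors = defaultdict(set)  # 3) who is visiting which documents
    link_use = Counter()         # 4) frequency of use of each hyperlink
    last_link_use = {}           # 5) most recent use of each hyperlink
    for ts, host, url, referrer in entries:
        visits[url] += 1
        last_visit[url] = max(ts, last_visit.get(url, ts))
        visitors[url].add(host)
        if referrer:
            link = (referrer, url)
            link_use[link] += 1
            last_link_use[link] = max(ts, last_link_use.get(link, ts))
    return visits, last_visit, visitors, link_use, last_link_use

log = [
    (100, "h1", "/company/", None),
    (160, "h1", "/company/products/", "/company/"),
    (200, "h2", "/company/", None),
    (260, "h2", "/company/products/", "/company/"),
]
visits, last_visit, visitors, link_use, last_link_use = usage_summary(log)
link = ("/company/", "/company/products/")
print(visits["/company/"], last_visit["/company/"], sorted(visitors["/company/products/"]))
# 2 200 ['h1', 'h2']
print(link_use[link], last_link_use[link])
# 2 260
```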

Visualization Techniques

Visualization has been used very successfully to help people understand various kinds of phenomena, both real and abstract. The WebViz system is used for visualizing WWW access patterns. WebViz lets the analyst selectively analyze the portion of the Web that is of interest by filtering out the irrelevant portions. The Web is visualized as a directed graph with cycles, where nodes are pages and edges are (inter-page) hyperlinks. The visualization is composed of two windows: the WebViz control window and the display window. The control window lets the analyst adjust the bindings, select a specific time to view, control the animation, and rearrange the layout. In the display window, a document's access frequency is represented by the width of the node representing it, while the node's color represents its recency of access.
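The same visual encoding (node width for access frequency, color for recency) can be imitated by emitting a Graphviz DOT description. This is not WebViz itself: the function `to_dot`, the width range and the grey scale (darker means more recently accessed) are assumptions made for the sketch. Nodes are fixed-size circles with the page name as an external label, so the width actually reflects frequency.

```python
def to_dot(links, freq, last_access, now, max_width=2.0):
    """Render pages and hyperlinks as a DOT digraph.

    Node diameter grows with access frequency (0.5 to max_width inches);
    fill goes from dark grey (just accessed) to white (least recent).
    """
    top = max(freq.values())
    oldest = min(last_access.values())
    span = max(now - oldest, 1)
    lines = [
        "digraph site {",
        '  node [shape=circle, fixedsize=true, style=filled, label=""];',
    ]
    for page in sorted(freq):
        width = 0.5 + (max_width - 0.5) * freq[page] / top
        age = (now - last_access[page]) / span  # 0 = most recent, 1 = oldest
        g = f"{int(80 + 175 * age):02x}"
        lines.append(
            f'  "{page}" [xlabel="{page}", width={width:.2f}, height={width:.2f}, '
            f'fillcolor="#{g}{g}{g}"];'
        )
    for src, dst in links:  # cycles are allowed in a DOT digraph
        lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)

dot = to_dot([("/a", "/b"), ("/b", "/a")],
             freq={"/a": 10, "/b": 5}, last_access={"/a": 100, "/b": 40}, now=100)
print(dot)
```

The output can be rendered with Graphviz, for example `dot -Tpng site.dot -o site.png`.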

OLAP Techniques

On-Line Analytical Processing (OLAP) is emerging as a very powerful paradigm for strategic analysis of databases in business settings. The key characteristics of strategic analysis include very large data volume, explicit support for the temporal dimension, support for various kinds of information aggregation, and long-range analysis. This has led to the development of the data cube information model and of techniques for its efficient implementation. Web usage data have much in common with the data in a data warehouse, so OLAP techniques are quite applicable, although the issue needs further investigation.
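The data cube idea can be sketched by aggregating hit counts over every subset of a few dimensions, so that any roll-up becomes a dictionary lookup. The dimensions `page`, `day` and `domain` and the sample hits are hypothetical; a real OLAP engine would store and compute the cube far more efficiently.

```python
from collections import Counter
from itertools import combinations

DIMENSIONS = ("page", "day", "domain")  # hypothetical dimensions of a hit record

def data_cube(hits):
    """Count hits for every combination of dimension values, over all
    subsets of DIMENSIONS. The empty key () is the grand total."""
    cube = Counter()
    for hit in hits:
        for r in range(len(DIMENSIONS) + 1):
            for dims in combinations(DIMENSIONS, r):
                cube[tuple((d, hit[d]) for d in dims)] += 1
    return cube

hits = [
    {"page": "/company/products/", "day": "2018-03-01", "domain": "edu"},
    {"page": "/company/products/", "day": "2018-03-02", "domain": "com"},
    {"page": "/company/", "day": "2018-03-01", "domain": "com"},
]
cube = data_cube(hits)
print(cube[()])                                          # 3 (grand total)
print(cube[(("page", "/company/products/"),)])           # 2 (rolled up over day and domain)
print(cube[(("day", "2018-03-01"), ("domain", "com"))])  # 1
```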

Data & Knowledge Querying

One reason for the great success of relational database technology has been the existence of a high-level, declarative query language, which allows an application to express what conditions must be satisfied by the data it needs, rather than having to specify how to get the required data. For Web usage mining, such focus may be provided in at least two ways. First, constraints may be placed on the database (perhaps in a declarative language) to restrict the portion of the database to be mined. Second, querying may be performed on the knowledge that has been extracted by the mining process. An SQL-like querying mechanism has been proposed for the WEBMINER system.
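The second approach, querying the extracted knowledge, can be approximated by storing mined rules in a relational table and querying them with ordinary SQL. The sketch uses Python's built-in `sqlite3` module; the `rules` table and its rows are made up, and the syntax is standard SQL, not the WEBMINER query language.

```python
import sqlite3

# Hypothetical table of mined association rules
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rules (antecedent TEXT, consequent TEXT, support REAL, confidence REAL)"
)
conn.executemany(
    "INSERT INTO rules VALUES (?, ?, ?, ?)",
    [
        ("/company/products/product1.html", "/company/products/product2.html", 0.05, 0.40),
        ("/company/announcements/special-offer.html", "/company/products/product1.html", 0.02, 0.30),
        ("/company/", "/company/products/", 0.20, 0.10),
    ],
)
# Knowledge query: rules leading to a product page with confidence of at least 25%
rows = conn.execute(
    "SELECT antecedent, consequent, confidence FROM rules "
    "WHERE consequent LIKE '/company/products/%' AND confidence >= ? "
    "ORDER BY confidence DESC",
    (0.25,),
).fetchall()
for row in rows:
    print(row)
```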

Web Usage Mining Process


Web Usage Mining Architecture


Research Directions

Web usage mining, which is just starting as an area of research, has a number of open issues. The following are some directions for future research:
- Data Pre-Processing for Mining
- The Mining Process
- Analysis of Mined Knowledge
- WebSIFT Example


WebSIFT Example

The Web Site Information Filter System (WebSIFT) is a Web usage mining framework that uses the content and structure information from a Web site to identify the interesting results from mining usage data.
- Input to the mining process: server logs (access, referrer, and agent), HTML files, and optional data.
- A prototypical Web usage mining system.


Conclusion

Web usage mining, the use of data mining to find usage patterns, is a growing area as Web-based applications grow. Web usage data can be used to better understand Web usage and to apply this specific knowledge to better serve users. Web usage patterns and data mining can be the basis for a great deal of future research.


References

- J. Srivastava, R. Cooley, M. Deshpande, P.-N. Tan (Dept. of CSE, University of Minnesota). Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data. SIGKDD Explorations, Vol. 1, Issue 2, 2000.
- Web Mining: Pattern Discovery from World Wide Web Transaction.
- R. Kosala, H. Blockeel (Dept. of CS, Katholieke Universiteit Leuven). Web Mining Research: A Survey.
- B. Mobasher, R. Cooley, J. Srivastava. Web Mining: Information and Pattern Discovery on the World Wide Web. Proceedings of the 9th IEEE International Conference on Tools with Artificial Intelligence (ICTAI'97), November 1997.
- www.wikipedia.org
