Web information extraction and retrieval

Lectures: 45

Seminars: 10

Tutorials: 20

ECTS credits: 6

Content of the course:
This course will cover the following topics:
Information Retrieval and Web Search
  Basic Concepts of Information Retrieval
  Information Retrieval Models
  Relevance Feedback
  Evaluation Measures
  Text and Web Page Pre-Processing
  Inverted Index and Its Compression
  Latent Semantic Indexing
  Web Search
  Meta-Search: Combining Multiple Rankings
Web Crawling
  A Basic Crawler Algorithm
  Implementation Issues
  Universal Crawlers
  Focused Crawlers
  Topical Crawlers
Structured Data Extraction
  Wrapper Induction
  Instance-Based Wrapper Learning
  Automatic Wrapper Generation
  String Matching and Tree Matching
  Multiple Alignment
  Building DOM Trees
  Extraction Based on a Single List Page or Multiple Pages
Information Integration
  Schema-Level Matching
  Domain and Instance-Level Matching
  Combining Similarities
  1:m Match
  Integration of Web Query Interfaces
  Constructing a Unified Global Query Interface
Opinion Mining and Sentiment Analysis
  Document Sentiment Classification
  Sentence Subjectivity and Sentiment Classification
  Opinion Lexicon Expansion
  Aspect-Based Opinion Mining
  Opinion Search and Retrieval

Bing Liu: Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data (Data-Centric Systems and Applications), Springer, August 2013
Ricardo Baeza-Yates, Berthier Ribeiro-Neto: Modern Information Retrieval: The Concepts and Technology behind Search, 2nd Edition, ACM Press Books, 2010