PV211 -- Introduction to Information Retrieval (Spring 2020)

Intro

The course is based on the textbook Introduction to Information Retrieval by Manning, Raghavan and Schütze (hard copies are available in the library at FI), which is taught at Stanford, Munich and other places. Among other things, you will learn how it is possible to fulfill seekers' information needs on the global web scale, answering 10,000+ questions per second within milliseconds.
This year, parts on machine (deep) learning have been added, while others have been dropped for capacity reasons. Students will be encouraged to try active/flipped learning approaches wherever possible. Course trailer (in Czech)

News

Lecture slides and other materials

  1. 19. 2. 2020 12:00 D3: Introduction to IR, Boolean Retrieval.
    Boolean retrieval slides 1, IIR chapter 1
    Exercises (week 1+2, auth IS)
  2. 26. 2. 2020 12:00 D3: Dictionary and Postings' storage (Indexing). Tolerant Retrieval.
    Readings: ternary trees, Soundex demo. Explore Google datacenters (YouTube video).
    Term vocabulary and postings lists slides 2, IIR chapter 2
    Dictionaries and tolerant retrieval slides 3, IIR chapter 3
    Exercises (week 1+2, auth IS)
  3. 4. 3. 2020 12:00 D3: Tolerant retrieval (cont.), Index construction, MapReduce.
    Readings: Index construction slides 4, IIR chapter 4
    Exercises (week 3+4, auth IS)
  4. 11. 3. 2020 12:00 D3: Index Compression, Scoring.
    Readings: Compression slides 5, IIR chapter 5
    Scoring, term weighting, the vector space model slides 6, IIR chapter 6
    Exercises (week 3+4, auth IS)
  5. 18. 3. 2020 12:00 D3: Vector Space Model, Anatomy of the web scale IR system.
    Readings: Vector space model (slides Arguello)
    Google: Anatomy paper from 1998 (PDF), (HTML), slides Google infrastructure by Jeff Dean, Jeff Dean (YouTube video), Google File System, How Google works (YouTube in Czech), Challenges in Building Google... (slides by Jeff Dean from Stanford CS276 course in 2015), Google crash course (in Czech), slides Google architecture (Ed Austin).
    Exercises (week 5+6, auth IS), Sketch Engine
  6. 25. 3. 2020 Distributed Word Representations for Information Retrieval.
    Readings: slides, main "word2vec" paper, Building scalable systems that understand content.
    Exercises (week 5+6, auth IS)
  7. 1. 4. 2020 Computing scores in a complete search system. Ranking.
    Scoring slides 7, IIR chapter 7
    Exercises (week 7+8, auth IS)
  8. 8. 4. 2020 Evaluation in IR and result summaries.
    Readings: slides 8, IIR chapter 8
    Exercises (week 7+8, auth IS)
  9. 15. 4. 2020 Relevance feedback and Query expansion.
    Readings: Query expansion slides 9, IIR chapter 9
    Exercises (week 9+10, auth IS)
  10. 22. 4. 2020 Text classification, Naive Bayes, Evaluation, Clustering, kNN.
    Readings: Text Classification and Naive Bayes slides 13, IIR chapter 13
    Clustering Introduction slides Cvinčeková, Flat Clustering slides 16, IIR chapter 16
    Exercises (week 9+10, auth IS), Term project information: deadline by April 29th.
  11. 29. 4. 2020 Vector Space Classification.
    Readings: slides 14, IIR chapter 14.
    Exercises (week 11+12, auth IS)
  12. 6. 5. 2020 Latent Semantic Models.
    Readings: Latent Semantic Indexing slides 18, IIR chapter 18, Gensim, Latent Dirichlet Allocation, Topic similarity by LDA: intro, LDA slides by Blei, LDA visual browser demo
    Similarity search with Gensim (Exercise materials from 2019)
    Exercises (week 11+12, auth IS)
  13. 13. 5. 2020 Web search.
    Readings: slides 19, IIR chapter 19.
    Exercises (week 13+14, auth IS)
  14. 20. 5. 2020 Link Analysis.
    Readings: slides 21, IIR chapter 21, How Google finds a needle....
    Exercises (week 13+14, auth IS)

Topics not covered in 2020, course runs from previous years

Also, due to coronavirus-related limitations, these topics will not be covered in the 2020 course run:

Projects and miniprojects

I will be glad if the course topics spark your interest and you decide to gain deeper insight into them by solving [mini]projects. Such activities will be rewarded with a nontrivial number of bonus points towards your grade. The number of stars below is an estimate of project difficulty, from a miniproject [(*), 10 points] to a big project [(*****), 30+ points]. I am also open to assigning or extending a project as a Bachelor's, Master's or dissertation thesis.

To a pupil who was in danger, the Master said, "Those who do not make mistakes are the most mistaken of all – they do not try anything new." Anthony de Mello

sojka at fi dot muni dot cz