PhD Scholarship at Inria on Crawling Algorithms

PhD Opening at Inria Sophia Antipolis, France

at Team NEO

under the supervision of Prof. K. Avrachenkov


The project is in the framework of the joint Inria - Qwant Search Engine Research Lab.

Topic: Adaptive crawling with machine learning techniques


We shall consider the problem of web crawling with limited bandwidth and computational resources. Some web sites may be crawled too rarely, resulting in resource underutilization, while others may be crawled too frequently, resulting in wasted resources. We shall aim to design an adaptive crawling algorithm, based on machine learning techniques such as clustering and reinforcement learning, that dynamically finds optimal crawling frequencies from web site classification, behavior and changes.

Related references:

Lefortier, D., Ostroumova, L., Samosvat, E. and Serdyukov, P., 
"Timely crawling of high-quality ephemeral new content". 
In Proceedings of the 22nd ACM international conference on Information & Knowledge Management 
(pp. 745-750). 2013.

Faheem, M., and Senellart, P.,
"Adaptive Web Crawling Through Structure-Based Link Classification". 
In Proceedings of ICADL (pp. 39-51), 2015.

Avrachenkov, K., and Borkar, V.,
"Whittle Index Policy for Crawling Ephemeral Content".
To appear in IEEE Transactions on Control of Network Systems.

Required skills: Solid knowledge of mathematics and, in particular, probability and statistics; experience in machine learning or control theory is a plus; knowledge of Python is another plus.

Application: Please apply with a CV, two reference letters and an academic transcript.

Job location: 
Inria Sophia Antipolis
2004 Route des Lucioles
06902 Sophia Antipolis
Contact and application information
Thursday, May 31, 2018
Contact name: 
Konstantin Avrachenkov