@techreport{TR-IC-PFG-16-17,
   number = {IC-PFG-16-17},
   author = {Felipe Lemes Galvão and Alexandre Xavier Falcão},
   title = {{Evaluating Active Learning Strategies for Image
                   Annotation of Intestinal Parasites}},
   month = {December},
   year = {2016},
   institution = {Institute of Computing, University of Campinas},
   note = {In English, 14 pages.
    \par\selectlanguage{english}\textbf{Abstract}
       Manually annotating large datasets is infeasible, and annotating
       them automatically with a pattern classifier depends on the
       quality of a much smaller training set. Active learning
       techniques have been proposed to select the relevant samples
       from large datasets by prompting a user with label suggestions
       to be confirmed or corrected. \par In this work we explore
       variations  of  an  active  learning methodology that, given an
       organization  of  the  data  computed beforehand and only once,
       allows  interactive  response  times during the active learning
       process  involving  an  expert.  \par  We  use the optimum-path
       forest  (OPF) data clustering algorithm for the \emph{a priori}
       organization   and   some   combinations   of  active  learning 
       algorithms and classifiers to test the methodology. The active
       learning algorithms considered are root distance-based
       sampling (RDS), a new variation of it that we call root
       path-weight-based sampling (RWS), and two additional random
       selection baselines. The classifiers included are the OPF-based
       supervised and semi-supervised learning methods and an
       ensemble of logistic regression classifiers. \par We tested
       each combination of active learning and classification
       algorithm on a dataset extracted from images of intestinal
       parasites,  in  which  the  presence  of a large diverse class,
       namely  impurities,  mixed  with  the  actual parasites poses a
       challenge for learning methods.
  }
}