KEYNOTE 1: Getting Rid of the Ten Blue Links

ABSTRACT: In this talk, I will first give a brief overview of the IR group at RMIT. Then I will describe the work we are doing at RMIT to change one of the most common web pages we all look at: the Search Engine Result Page (SERP). In our work, we are looking to replace the SERP with a set of answer passages that directly address the user’s query. In the context of general web search, the problem of finding answer passages has not been explored extensively. Previous studies have found that many informational queries can be answered by a passage of text extracted from a retrieved document, relieving the user from having to read the document itself. However, current passage retrieval methods that focus on topical relevance have been shown to be ineffective at finding answers; this result suggests that more knowledge is required to identify answers within documents.

We have been formulating the answer passage extraction problem as a summarisation task. We initially used term distributions extracted from a Community Question Answering (CQA) service to generate more effective summaries of retrieved web pages. An experiment was conducted to measure the benefit of using the CQA data in finding answer passages: we analysed the fraction of answers covering a set of queries, the quality of the corresponding results from the answering service, and their impact on the generated summaries. I will also talk about recent work in which we re-rank retrieved passages according to summary quality and incorporate document summarisability into the ranking function.

SPEAKER: Prof. Dr. Mark Sanderson

School of Computer Science and Information Technology, RMIT University, Melbourne, Australia


Mark Sanderson is a Professor at the School of Computer Science and Information Technology at RMIT University in Melbourne, Australia. As the head of the RMIT Information Retrieval group, he is particularly interested in the evaluation of search engines, summarisation, geographic search and log analysis. Prof. Sanderson is an associate editor of ACM Transactions on the Web and IEEE Transactions on Knowledge and Data Engineering. He is also a co-editor of Foundations and Trends in Information Retrieval.

KEYNOTE 2: Information Retrieval to Support Computer-aided Drug Design and Development Process

ABSTRACT: The vast amount of data in chemical and document databases offers many opportunities to aid the process of drug design and development. The bioactivity of unknown compounds can be predicted based on their structural similarity to known drug compounds. Many information retrieval techniques that have been widely used for text retrieval have been studied in this application domain. Traditionally, the Vector Space Model, using bit-string representations of compounds and the Tanimoto coefficient, has been used to rank molecules. However, we have shown that this representation and method of comparison are not necessarily the best way to find similarly bioactive compounds among the top-ranked results. In this talk, a number of approaches to enhance molecular search will be discussed. We will discuss the different ways molecules can be represented for retrieval purposes, and the different ways molecules can be compared in order to find molecules that are similar in terms of bioactivity among the top-ranking structures. The representation approaches include a number of shape-based molecular representations and deep learning approaches. The similarity enhancement approaches to be presented include modification of the Simple Matching Similarity Measure with bit-string re-weighting, Bayesian network-based similarity measures, fragment selection, fragment weighting, relevance feedback and quantum-based similarity searching. Finally, a number of clustering techniques for chemical compound databases, the use of consensus clustering for virtual screening of chemical compounds, and a number of approaches for automatic detection of adverse drug effects from medical text reports will also be presented.
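The Tanimoto coefficient mentioned in the abstract compares two bit-string fingerprints as the ratio of shared set bits to total set bits. A minimal sketch of this traditional ranking approach, using invented toy fingerprints (the bit positions and molecule names are hypothetical, for illustration only):

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient over the positions of set bits in two
    molecular fingerprints: |A ∩ B| / |A ∪ B|."""
    if not a and not b:
        return 1.0  # two empty fingerprints are conventionally identical
    return len(a & b) / len(a | b)

# Toy fingerprints: each set holds the positions of "on" bits.
query = {1, 4, 7, 9}
candidates = {
    "mol_A": {1, 4, 7, 8},
    "mol_B": {2, 3, 9},
    "mol_C": {1, 4, 7, 9, 12},
}

# Rank candidate molecules by similarity to the query compound.
ranked = sorted(candidates,
                key=lambda m: tanimoto(query, candidates[m]),
                reverse=True)
print(ranked)  # mol_C (4/5) ahead of mol_A (3/5) ahead of mol_B (1/6)
```

In practice the fingerprints would come from a cheminformatics toolkit rather than hand-written sets; the point of the sketch is only the similarity computation and the ranking step that the talk argues is not necessarily optimal for finding bioactive compounds.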

SPEAKER: Prof. Dr. Naomie Salim

Faculty of Computing, Universiti Teknologi Malaysia, Malaysia


Professor Salim’s main research goal is to design new algorithms to improve the effectiveness of searching and mining new knowledge from various kinds of datasets, including unstructured, semi-structured and structured databases. The current focus of her research is on chemical databases and text databases to support computer-aided drug design, text summarisation, plagiarism detection, automatic information extraction, sentiment analysis and recommendation systems. The output of this research has been incorporated into a number of software systems, such as the UTMChem Workbench, KimiaSpace and the NADI Natural Products Database System to support drug design and drug optimisation, the UTMCLPD Cross Language Plagiarism Detection System to summarise documents and check for plagiarism, and Oricheck for cross-language idea similarity checking and plagiarism detection. She has been involved in 53 research projects, 21 of which she has led. She has authored over 170 journal articles. Among the research and innovation awards received by Professor Salim are the PECIPTA 2011 Gold Medal award for her UTMCLP cross-language semantic plagiarism detection system, the I-inova 2010 Gold Medal award for her Islamic Ontology-based Quran search engine, the BioInnovation 2011 Bronze Award for the UTMChem Workbench Molecular Database System, the iPhex Gold Medal Award for innovation in teaching and learning, the UTM 2011 Best Research Award, the UTM 2014 Best Research Award and the INATEX Distinction Award (1998). She is a fellow of the Japan Society for the Promotion of Science (JSPS) and heads the Soft Computing Research Group at UTM.

KEYNOTE 3: Learning in a Dynamic and Ever-Changing World

ABSTRACT: The world is dynamic – in a constant state of flux – but most learned models are static. Models learned from historical data are likely to decline in accuracy over time. This talk presents formal tools for analysing non-stationary distributions and some insights that they provide. Shortcomings of standard approaches to learning from non-stationary distributions are discussed together with strategies for developing more effective techniques.

SPEAKER: Prof. Dr. Geoff Webb

Centre for Data Science, Monash University, Australia


Geoff Webb is Director of the Monash Centre for Data Science. He is a technical advisor to the data science startups BigML and FROOMLE. He was Editor-in-Chief of the premier data mining journal, Data Mining and Knowledge Discovery (2005–2014), and Program Committee Chair of the two top data mining conferences, ACM SIGKDD (2015) and IEEE ICDM (2010), as well as General Chair of ICDM (2012). His primary research areas are machine learning, data mining, user modelling and computational structural biology. Many of his learning algorithms are included in the widely used BigML, R and Weka machine learning workbenches. He is an IEEE Fellow and received the inaugural Eureka Prize for Excellence in Data Science in 2017, the 2013 IEEE ICDM Service Award, a 2014 Australian Research Council Discovery Outstanding Researcher Award, the 2016 Australian Computer Society ICT Researcher of the Year Award and the 2016 Australasian Artificial Intelligence Distinguished Research Contributions Award.