*** TALK ANNOUNCEMENT ***

Friday 21st, 12:00h, DSIC: sem. 202

Speaker: David L. Dowe, Associate Professor, Clayton School of Information Technology, Monash University, Melbourne, Australia.

"Database Normalization as a By-product of Minimum Message Length Inference"

Minimum Message Length (MML) (Wallace and Boulton, Computer J, 1968) is a unifying information-theoretic Bayesian principle of machine learning, statistics, econometrics, inductive inference and (so-called) data mining. It can deal with discrete and continuous-valued variables, and its applications span a wide range of areas, including (e.g.) the definition of intelligence and the philosophy of science (Ockham's razor, "grue" and the problem of induction, etc.).

In this talk, we look at how (even) database normalization can be thought of in terms of Minimum Message Length. Database normalization is a central part of database design in which we re-organise the stored data so as to progressively ensure that as few anomalies as possible occur upon insertions, deletions and/or modifications. Successive normalizations of a database to higher normal forms continue to reduce the potential for such anomalies.

We show here that database normalization follows as a consequence (or special case, or by-product) of the Minimum Message Length (MML) principle of machine learning and inductive inference. In other words, someone (previously) oblivious to database normalization but well-versed in MML could examine a database and, using MML considerations alone, normalize it, and even discover the notion of attribute inheritance.

The work is done by calculating the MML message lengths for 1st normal form (1NF), 2nd normal form (2NF) and 3rd normal form (3NF). MML then advocates choosing the model with the shortest message length. Although these (1NF, 2NF and 3NF) are the only examples explicitly given in the paper, it is quite clear how to generalise this work (e.g.) to higher normal forms (BCNF, 4NF, 5NF) and to attribute inheritance.

The above work was presented in Dowe & Zaidi (2010, LNAI 6464), and the paper can be downloaded.

The speaker will welcome questions about MML (in general), his uniqueness result about log-loss probabilistic scoring and (in particular) the issue of MML and database normalization.

(I am not a "giri", I am a "guiri". Cesar is not a sailor, he is a captain.)
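
For readers who want a feel for the message-length comparison mentioned in the abstract, here is a toy sketch in Python. It is not the encoding used in Dowe & Zaidi (2010). It assumes a deliberately crude two-part code: a flat, made-up cost of 8 bits per declared column for the schema, plus log2(number of distinct values in the column) bits for every stored cell. The tables, the function names and the 8-bit constant are all hypothetical, chosen only to show how redundancy in an unnormalized table can lengthen the message.

```python
from math import log2


def column_bits(values):
    """Bits to state one cell, assuming a uniform code over the column's distinct values."""
    distinct = len(set(values))
    return log2(distinct) if distinct > 1 else 0.0


def table_bits(rows):
    """Data part of the message: every cell is charged its column's per-cell cost."""
    if not rows:
        return 0.0
    columns = list(zip(*rows))
    return len(rows) * sum(column_bits(col) for col in columns)


def schema_bits(tables, bits_per_column=8.0):
    """Model part of the message: an assumed flat cost per declared column."""
    return bits_per_column * sum(len(rows[0]) for rows in tables if rows)


def message_length(tables):
    """Two-part message length (in bits) of a candidate database design."""
    return schema_bits(tables) + sum(table_bits(rows) for rows in tables)


# Hypothetical data: 20 students, 3 courses, every student takes every course.
students = [(f"s{i}", f"name{i}") for i in range(20)]
courses = [("c0", "Logic"), ("c1", "Databases"), ("c2", "Statistics")]
grades = "ABC"
enrollments = [
    (sid, cid, grades[(i + j) % 3])
    for i, (sid, _) in enumerate(students)
    for j, (cid, _) in enumerate(courses)
]

# The unnormalized design repeats student and course names in every row.
student_name = dict(students)
course_name = dict(courses)
flat = [(sid, student_name[sid], cid, course_name[cid], g) for sid, cid, g in enrollments]

flat_len = message_length([flat])
normalized_len = message_length([students, courses, enrollments])

print(f"single flat table : {flat_len:.1f} bits")
print(f"three tables      : {normalized_len:.1f} bits")
print("shorter message   :", "three tables" if normalized_len < flat_len else "single flat table")
```

Under these assumptions the decomposed design pays more for its schema but saves on the data part, because each name is stated once rather than once per enrollment. With very few rows the schema cost can dominate and the flat table can come out shorter, which is the usual two-part trade-off between model cost and data cost.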