News 18/03/2019: The speakers' presentation slides are available in the abstract of each talk below.
News 14/01/2019: The restaurant for the dinner on the 21st has been announced: it will be "Assiette au boeuf".
News 21/11/2018: Deadline extended to 26/11/2018.
News 05/11/2018: GDR MADICS mobility grants are available for PhD students attending the EGC school and workshops. More information: www.madics.fr/reseaux/formation/doctorants.
The fifth é-EGC Winter School, on the theme “Privacy Preserving, Reasoning, Explaining”, is an event organized by the Association Extraction et Gestion de Connaissances (EGC, http://www.egc.asso.fr/).
The event is organized around two main activities:
The big data phenomenon is here to stay, and data sources are many and ubiquitous (IoT, social networks, institutional information systems, hospital information systems, etc.). In this context, two crucial objectives seem to conflict. The first is opening up data for a wide range of uses, from public research to commercial exploitation, some of which rely on sophisticated techniques for reasoning and for building explanations that can reassure experts and guarantee the traceability of the analysis and decision processes put in place. The second is protecting individuals' privacy by not disclosing information that identifies them, either directly or indirectly through cross-referencing with other data. Flexible and robust mechanisms must therefore be designed to support the open data movement without compromising the security of data handling and access.
The main goal of the two days of training is to offer participants introductory tutorials on the research area covered by the school's themes, as well as more specialized tutorials presenting recent advances that provide new solutions and techniques for the various problems that have emerged in this area. The talks, lasting 1h30 or 2h30, will cover a wide range of problems and existing solutions. Some talks will be paired with hands-on sessions so that participants can try out some tools.
Attending the conference will allow participants to take part in one of the major events of the French-speaking knowledge extraction and management community. They will attend presentations of new advances and approaches developed in the community, which may inspire their future scientific path.
Finally, the School aims to give young researchers (PhD students, postdocs and engineers) and established researchers in the field the opportunity to meet and exchange ideas, which should also help young researchers expand their network.
Monday 21/01/2019
09h00 - 09h30 - Welcome! “Privacy Preserving, Reasoning, Explaining”
09h30 - 10h30 - Dimitris Kotzinos (ETIS, University of Cergy-Pontoise) - "Introduction to privacy: theory and applications"
10h30 - 11h00 - Coffee break
11h00 - 12h30 - Fatiha Sais (LRI, Paris Sud University, France) - “Identity Management in the Web of Data”
12h30 - 14h00 - Lunch
14h00 - 16h30 - Benjamin Nguyen (LIFO, INSA Centre Val de Loire, France) - “Anonymization techniques: theory and practice”
16h30 - 17h00 - Coffee break
17h00 - 18h30 - Ioannis Krontiris (Huawei, Munich, Germany) - “Anonymization algorithms based on differential privacy with emphasis on optimizing the performance of data mining algorithms”
20h00 - Dinner at the restaurant "Assiette au boeuf"
Assiette au boeuf
1 rue du pont des morts
57000 Metz
Tel: 03 87 32 43 12
Email: metz@assietteauboeuf.fr
Tuesday 22/01/2019
09h00 - 10h30 - Marie-Laure Mugnier (LIRMM, University of Montpellier): “An introduction to Ontology-Based Data Access”
10h30 - 11h00 - Coffee break
11h00 - 12h30 - Marie-Christine Rousset (LIG, University of Grenoble Alpes) - “RDF dataset anonymization robust to data interlinking”
12h30 - 14h00 - Lunch
14h00 - 15h30 - Yücel Saygın (Sabanci University, Istanbul, Turkey) - “Improving the accuracy of differentially private algorithms through sensitivity analysis”
15h30 - 16h00 - Coffee break
16h00 - 17h30 - Vincent Rasneur (CNIL, France) - “General Data Protection Regulation in research”
Dimitris Kotzinos (ETIS, University of Cergy-Pontoise, France) - “Introduction to privacy: theory and applications”
Slides: eEGC-Kotzinos.pdf
Biography. Dimitris Kotzinos is a Professor at the Department of Computer Science of the University of Cergy-Pontoise, a member of the ETIS Lab and head of the lab's MIDI team. He holds a Ph.D. on “Application of digital map technologies on developing internet based Advanced Traveler Information Systems (ATIS)” and an M.Sc. in Transportation, where he studied networks and their applications in transportation systems. His main research interests include data management algorithms, techniques and tools; the development of methodologies, algorithms and tools for web-based information systems, portals and web services; and the understanding of the meaning (semantics) of interoperable data and services on the web. Recently he has started studying the formation and evolution of discussions in online social networks. He is also interested in data privacy issues, especially their intersection with the publication of linked open data. Dimitris has published in various journals, books, conferences and workshops and serves as a program committee member and reviewer for various conferences and journals. He also participates in nationally and internationally funded research programs on data analytics, data models and networks and their integration into everyday life.
Abstract. In this introduction we survey problems that arise from privacy concerns around data and data processing, and discuss relevant applications that can be seen both as a source of inspiration for these problems and as possible demonstrations of the different solutions. We will discuss the different aspects of privacy and show how they fit in the context of the school.
Fatiha Sais (LRI, Paris Sud University, France) - “Identity Management in the Web of Data”
Slides: eEGC-Sais.pdf
Abstract. Recent years have seen the rise of large knowledge bases (KBs), such as YAGO, Wikidata, and DBpedia on the academic side, and the Google Knowledge Graph or Microsoft’s Satori graph on the commercial side. These KBs contain millions of entities (such as people, places, or organizations) and millions of facts about them. This knowledge is typically expressed in RDF (Resource Description Framework), i.e., as triples of the form <Macron, presidentOf, France>. Some of these knowledge graphs are published and linked to other knowledge graphs through Linked Data platforms. In this context, owl:sameAs links are declared to express an identity relation between resources that refer to the same real-world entity. Many approaches have recently been proposed to detect identity links automatically; some of them use keys or linking specifications to detect links with higher precision. However, recent research discussions have shown that many owl:sameAs links are erroneous and have raised several issues in the use of owl:sameAs. In this talk, we will present the identity problem and give an overview of existing approaches that aim to detect invalid owl:sameAs statements or to represent alternative identity links.
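Because owl:sameAs is reflexive, symmetric and transitive, each link merges whole equivalence classes, so a single wrong link can conflate every resource on both sides. The minimal Python sketch below illustrates this, assuming triples stored as plain tuples (no RDF library) and hypothetical resource names (kb1:, kb2:, kb3:). It computes identity classes with a union-find structure; it is not one of the detection approaches surveyed in the talk.

```python
# Minimal sketch: owl:sameAs identity classes via union-find.
# Triples are plain (subject, predicate, object) tuples; all names are hypothetical.
SAME_AS = "owl:sameAs"

triples = [
    ("kb1:Paris", "rdf:type", "kb1:City"),
    ("kb2:Paris", SAME_AS, "kb1:Paris"),               # correct: both denote Paris, France
    ("kb3:Paris_France", SAME_AS, "kb2:Paris"),        # correct
    ("kb3:Paris_Texas", SAME_AS, "kb3:Paris_France"),  # erroneous: a different city
]


def find(parent, x):
    """Return the representative of x's class, registering x if unseen (path halving)."""
    while parent.setdefault(x, x) != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def identity_classes(triples):
    """Group resources linked by owl:sameAs into equivalence classes."""
    parent = {}
    for s, p, o in triples:
        if p == SAME_AS:
            rs, ro = find(parent, s), find(parent, o)
            if rs != ro:
                parent[rs] = ro  # merge the two classes
    classes = {}
    for node in list(parent):
        classes.setdefault(find(parent, node), set()).add(node)
    return list(classes.values())


for cls in identity_classes(triples):
    print(sorted(cls))
# ['kb1:Paris', 'kb2:Paris', 'kb3:Paris_France', 'kb3:Paris_Texas']
```

Dropping the last triple leaves Paris, Texas out of the class, which is why detecting invalid links matters: the error propagates to every resource declared identical to either side.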
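Benjamin Nguyen (LIFO, INSA Centre Val de Loire, France) - “Anonymization techniques: theory and practice”
Slides: eEGC-Nguyen.pdf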
Biography. Benjamin Nguyen has been a Professor at INSA Centre Val de Loire since 2014 and is director of the Laboratoire d'Informatique Fondamentale d'Orléans (LIFO), where he is a member of the Sécurité des Données et des Systèmes (Data and Systems Security) team and an external collaborator of the Inria Private and Trusted Cloud (PETRUS) team. He is a board member of the GDR Sécurité Informatique, where he leads the "Privacy" working group, and a member of the steering committee of APVP, the French conference on privacy protection. His research focuses on security, privacy and efficiency problems related to the management of personal data, e.g. privacy-preserving logical architectures, the security and confidentiality of distributed computations over personal data, the personalization of anonymization models, and the formal definition of privacy-related concepts such as limited data collection.
Abstract. In European countries, the processing of personal data is governed by the new General Data Protection Regulation (GDPR). To protect individuals' privacy, any processing of such data is closely monitored by the regulatory authorities (the CNIL in France). However, if a piece of data is anonymous, i.e. it is difficult or even impossible to link it to an individual in the physical world, then it carries less risk, and it can therefore be processed freely. In this tutorial, we present the classical anonymization models for tabular data (models based on k-anonymity and differential privacy), as well as for geolocation data. We also offer a hands-on lab session using ARX, the open-source anonymization software developed by the Technical University of Munich.
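k-anonymity requires every combination of quasi-identifier values to be shared by at least k records, so that no individual can be singled out from those attributes alone. A minimal sketch, assuming a small table with hypothetical columns and a hypothetical generalization (10-year age bands, truncated ZIP codes), computes the largest k the table satisfies before and after generalization. Choosing a good generalization automatically is what tools such as ARX do; this sketch only checks the property.

```python
from collections import Counter

# Hypothetical table: "age" and "zip" are quasi-identifiers, "diagnosis" is sensitive.
raw = [
    {"age": 23, "zip": "57000", "diagnosis": "flu"},
    {"age": 27, "zip": "57001", "diagnosis": "asthma"},
    {"age": 29, "zip": "57003", "diagnosis": "flu"},
    {"age": 34, "zip": "57100", "diagnosis": "diabetes"},
    {"age": 38, "zip": "57102", "diagnosis": "flu"},
]
QIS = ("age", "zip")


def generalize(row):
    """Hypothetical generalization: 10-year age bands, ZIP codes truncated to 3 digits."""
    low = row["age"] // 10 * 10
    return {"age": f"{low}-{low + 9}", "zip": row["zip"][:3] + "**", "diagnosis": row["diagnosis"]}


def k_anonymity(rows, quasi_identifiers):
    """Size of the smallest group of rows sharing the same quasi-identifier values,
    i.e. the largest k for which the table is k-anonymous (0 for an empty table)."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values(), default=0)


print(k_anonymity(raw, QIS))                           # 1: every row is unique
print(k_anonymity([generalize(r) for r in raw], QIS))  # 2
```

k-anonymity alone does not prevent attribute disclosure: if every row in a group had the same diagnosis, an attacker who locates someone's group would learn it.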
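Ioannis Krontiris (Huawei, Munich, Germany) - “Anonymization algorithms based on differential privacy with emphasis on optimizing the performance of data mining algorithms”
Slides: eEGC-Krontiris.pdf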
Biography. Ioannis Krontiris holds a PhD in Computer Science from Mannheim University, Germany. He has over ten years of experience working on privacy protection at EU and international level. He has worked extensively on differential privacy algorithms and big data analytics, privacy-respecting identity management, privacy in the Internet of Things, and security and privacy in the Internet of Vehicles. He served as technical coordinator of the EU project ABC4Trust and as chair of IFIP WG 11.2 - Pervasive Systems Security until June 2014. He currently works as a Privacy Expert at the European Research Center of Huawei Technologies.
Abstract. The recent, remarkable growth of machine learning has drawn attention to the privacy of the data on which machine learning relies and to new techniques for preserving it. Privacy in big data can be achieved through various means; here we will focus on differential privacy, which protects big data with a formal mathematical guarantee, among the strongest available, and still has a large scope for future development. Along these lines, in this talk we will discuss the fundamental ideas of sensitivity and privacy budget in differential privacy, the noise mechanisms it relies on, its composition properties, and the ways in which it can be achieved. We will emphasize the performance of data mining algorithms executed on anonymized big data and discuss which anonymization algorithms should be applied to optimize this performance. Research gaps and future directions will also be part of this talk.
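The notions of sensitivity, privacy budget, noise mechanism and composition can be made concrete with the textbook Laplace mechanism for a counting query. The sketch below is a minimal illustration, not the algorithms discussed in the talk. The class and function names are hypothetical; the Laplace sample is built from two exponential draws because Python's random module has no Laplace sampler; and budget accounting uses basic sequential composition (privacy costs add up). Naive floating-point noise of this kind has known subtle weaknesses, so it is for illustration only.

```python
import random


class PrivacyAccountant:
    """Hypothetical budget tracker using basic sequential composition:
    answering queries with eps_1, ..., eps_k costs eps_1 + ... + eps_k in total."""

    def __init__(self, total_epsilon):
        self.total = total_epsilon
        self.spent = 0.0

    def charge(self, epsilon):
        if self.spent + epsilon > self.total + 1e-12:  # small tolerance for float rounding
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials with mean scale."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(data, predicate, epsilon, accountant, rng):
    """Laplace mechanism for a counting query: changing one record changes the
    count by at most 1, so the sensitivity is 1 and the noise scale is 1 / epsilon."""
    accountant.charge(epsilon)
    sensitivity = 1.0
    true_count = sum(1 for x in data if predicate(x))
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)
ages = [23, 35, 41, 29, 52, 67, 38]
budget = PrivacyAccountant(total_epsilon=1.0)
print(private_count(ages, lambda a: a >= 40, 0.5, budget, rng))  # noisy version of 3
print(private_count(ages, lambda a: a < 30, 0.5, budget, rng))   # noisy version of 2
# A third query with epsilon = 0.5 would exceed the total budget and raise RuntimeError.
```

Halving epsilon doubles the noise scale, which is the accuracy-for-privacy trade-off that the privacy budget controls.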
Marie-Laure Mugnier (LIRMM, University of Montpellier) - “An introduction to Ontology-Based Data Access”
Slides: eEGC-Mugnier.pdf
Biography. Marie-Laure Mugnier is a professor at the University of Montpellier and the scientific leader of a research team in knowledge representation and reasoning (KR) at LIRMM and Inria. Her current research mainly focuses on KR languages for reasoning on data. She regularly publishes in the main international conferences and journals of artificial intelligence. In 2017, she co-initiated a national action named “Reasoning on Data”, shared by the CNRS research groups (GDR) Artificial Intelligence and MaDICS.
Abstract. Ontology-Based Data Access is a recent paradigm based on knowledge representation and reasoning techniques for data management in modern information systems. Within this approach, the information system is structured around an ontology, which provides a high-level vocabulary as well as knowledge relevant to the target domain, and offers uniform access to heterogeneous data sources. This tutorial will introduce Ontology-Based Data Access and give an overview of techniques and current research in this area.
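As a toy illustration only, assume an ontology made of atomic subclass axioms (far weaker than the description logics or existential rules used in actual Ontology-Based Data Access) and data given as class-membership facts; all class and individual names are hypothetical. A query for one class can then be answered either by saturating the data with the axioms (materialization) or by rewriting the query into the union of the class and its subclasses and evaluating it on the unchanged data. The sketch implements both strategies so they can be compared.

```python
# Hypothetical ontology: atomic subclass axioms (subclass, superclass).
SUBCLASS_OF = [
    ("Professor", "Teacher"),
    ("Lecturer", "Teacher"),
    ("Teacher", "Employee"),
]

# Hypothetical data source: class-membership facts (class, individual).
DATA = {("Professor", "alice"), ("Lecturer", "bob"), ("Employee", "carol")}


def rewrite(query_class, axioms):
    """Rewrite the atomic query query_class(x) into the set of classes whose
    instances are answers: query_class and all of its (transitive) subclasses."""
    result, frontier = {query_class}, [query_class]
    while frontier:
        current = frontier.pop()
        for sub, sup in axioms:
            if sup == current and sub not in result:
                result.add(sub)
                frontier.append(sub)
    return result


def answer_by_rewriting(query_class, axioms, data):
    """Evaluate the rewritten query directly on the original data."""
    classes = rewrite(query_class, axioms)
    return {ind for cls, ind in data if cls in classes}


def answer_by_saturation(query_class, axioms, data):
    """Materialize all entailed facts, then evaluate the original query."""
    facts = set(data)
    changed = True
    while changed:
        changed = False
        for sub, sup in axioms:
            for cls, ind in list(facts):
                if cls == sub and (sup, ind) not in facts:
                    facts.add((sup, ind))
                    changed = True
    return {ind for cls, ind in facts if cls == query_class}


print(sorted(answer_by_rewriting("Employee", SUBCLASS_OF, DATA)))  # ['alice', 'bob', 'carol']
```

Rewriting leaves the sources untouched, which suits settings where the data cannot be copied or modified; saturation instead pays the reasoning cost up front and enlarges the stored data.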
Marie-Christine Rousset (LIG, University of Grenoble Alpes) - “Linked data anonymization”
Slides: eEGC-Rousset.pdf
Biography. Marie-Christine Rousset is a Professor of Computer Science at the University of Grenoble Alpes and a senior member of the Institut Universitaire de France. Her areas of research are Knowledge Representation, Information Integration, Linked Data and the Semantic Web. She has published around 100 refereed international journal articles and conference papers, and has participated in several cooperative industry-university projects. She received a best paper award from AAAI in 1996 and was named an ECCAI Fellow in 2005. She has served on many program committees of international conferences and on the editorial boards of several journals.
Abstract. Linked Open Data (LOD) provides access to continuously increasing amounts of RDF data that describe properties of and links among entities referenced by Uniform Resource Identifiers (URIs). While many organizations, institutions and governments participate in the LOD movement by making their data accessible and reusable to citizens, the risks of identity disclosure in this process are not completely understood. For example, in smart city applications, information about users' journeys on public transportation can help re-identify individuals if it is joined with other public data sources by leveraging quasi-identifiers. The main problem for data providers willing to publish useful data is to determine which anonymization operations must be applied to the original dataset in order to preserve both individuals' privacy and data utility. In this tutorial, we will present a novel query-based approach to Linked Open Data anonymization in the form of delete and update operations on RDF graphs. We consider policies as sets of privacy and utility specifications, which data providers can readily write as queries. This allows a data-independent approach to compute sets of anonymization operations guaranteed to satisfy both privacy and utility policies on any input RDF graph. We will then focus on the problem of ensuring robustness to linking attacks, which raises difficult issues, since the goal is now to guarantee that the union of the anonymized dataset with any other dataset will still preserve privacy.
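The linking attack on journey data can be illustrated with a toy example, assuming RDF triples stored as Python tuples, a hypothetical ex: vocabulary, and an attacker who matches a published subject to every external subject whose quasi-identifier values include the published ones. Deleting one quasi-identifying predicate stands in for a delete operation. This is not the query-based algorithm presented in the tutorial, and it only shows robustness against this one external dataset, not against any dataset.

```python
# Hypothetical published graph: pseudonymized travellers (blank nodes) with quasi-identifiers.
published = {
    ("_:u1", "ex:boardsAt", "ex:StopA"), ("_:u1", "ex:birthYear", "1985"),
    ("_:u1", "ex:journey", "ex:J42"),
    ("_:u2", "ex:boardsAt", "ex:StopB"), ("_:u2", "ex:birthYear", "1990"),
    ("_:u2", "ex:journey", "ex:J57"),
}

# Hypothetical external public dataset using the same predicates.
external = {
    ("ex:Alice", "ex:boardsAt", "ex:StopA"), ("ex:Alice", "ex:birthYear", "1985"),
    ("ex:Bob", "ex:boardsAt", "ex:StopA"), ("ex:Bob", "ex:birthYear", "1990"),
    ("ex:Carol", "ex:boardsAt", "ex:StopB"), ("ex:Carol", "ex:birthYear", "1990"),
    ("ex:Dave", "ex:boardsAt", "ex:StopB"), ("ex:Dave", "ex:birthYear", "1985"),
}

QUASI_IDENTIFIERS = {"ex:boardsAt", "ex:birthYear"}


def qi_signature(graph, subject):
    """(predicate, value) pairs of a subject, restricted to quasi-identifying predicates."""
    return frozenset((p, o) for s, p, o in graph if s == subject and p in QUASI_IDENTIFIERS)


def reidentified(published, external):
    """Map each published subject to its external match when exactly one external
    subject has all of its quasi-identifier values."""
    ext_subjects = {s for s, _, _ in external}
    result = {}
    for subj in {s for s, _, _ in published}:
        sig = qi_signature(published, subj)
        if not sig:
            continue  # nothing to link on
        candidates = [e for e in ext_subjects if sig <= qi_signature(external, e)]
        if len(candidates) == 1:
            result[subj] = candidates[0]
    return result


def delete_predicate(graph, predicate):
    """Toy anonymization by deletion: drop every triple that uses the given predicate."""
    return {t for t in graph if t[1] != predicate}


print(reidentified(published, external))  # both travellers are re-identified
print(reidentified(delete_predicate(published, "ex:birthYear"), external))  # {}
```

Before deletion, each traveller's stop and birth year match exactly one external person; after deleting ex:birthYear, each stop matches two people. Adding a third dataset that links stops to names could break this again, which is the robustness problem the tutorial addresses.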
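Yücel Saygın (Sabanci University, Istanbul, Turkey) - “Improving the accuracy of differentially private algorithms through sensitivity analysis”
Slides: eEGC-Saygin.pdf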
Biography. Yücel Saygın is a Professor of Computer Science with the Faculty of Engineering and Natural Sciences at Sabanci University in Istanbul, Turkey. He received his B.S., M.S., and Ph.D. degrees from the Department of Computer Engineering at Bilkent University in 1994, 1996, and 2001, respectively. His main research interests include data mining and privacy-preserving data management. He has published in international journals such as ACM Transactions on Database Systems, VLDB Journal, IEEE Transactions on Knowledge and Data Engineering, and IEEE Transactions on Dependable and Secure Computing, and in the proceedings of international conferences. He has co-chaired various conferences and workshops in the area of data mining and privacy-preserving data management. He was the coordinator of the MODAP (Mobility, Data Mining, and Privacy) project funded by EU FP7 under the Future and Emerging Technologies Program.
Abstract. Differential privacy has recently become the most popular standard for privacy protection. A prominent way to achieve differential privacy is to add noise to the outputs of an algorithm. The required amount of noise depends on the sensitivity of the algorithm, which captures the maximum possible change in the algorithm’s output when any single record in its input changes. However, calculating or finding an upper bound for the sensitivity of an arbitrary algorithm or query set is NP-hard, and therefore existing approaches resort to safe but sub-optimal overestimations of sensitivity. In this talk, I will discuss our recent research on automated techniques for sensitivity analysis using graph algorithms. Our theoretical contributions are twofold. First, with the help of our graph structure, we formulate a tighter upper bound on sensitivity than those found in the literature, thus improving the accuracy of differentially private algorithms without sacrificing privacy. Second, we convert the problem of implementing our sensitivity bounds into clique problems on graphs and, by employing state-of-the-art clique solvers, achieve very efficient (almost instant) realisations. I will also describe how these theoretical contributions fuelled further work on differentially private machine learning through instance-based classifiers, such as k-nearest neighbour, with accuracy superior to existing machine learning algorithms.
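For reference, the standard definitions behind this abstract are the global ℓ1 sensitivity of a query f that maps datasets to ℝ^k, where D ≃ D' denotes two datasets differing in one record, and the Laplace mechanism that calibrates its noise to that sensitivity. They are written here in the usual textbook notation, not in the speaker's own formulation.

```latex
\Delta f = \max_{D \simeq D'} \lVert f(D) - f(D') \rVert_1,
\qquad
\mathcal{M}(D) = f(D) + (Y_1, \dots, Y_k), \quad
Y_i \overset{\text{i.i.d.}}{\sim} \mathrm{Lap}\!\left(\frac{\Delta f}{\varepsilon}\right)
```

The mechanism M satisfies ε-differential privacy, and so does any variant that uses an upper bound B ≥ Δf in place of Δf. Since E|Y_i| = Δf / ε, using B instead multiplies the expected absolute error per coordinate by B / Δf, so a tighter sensitivity bound translates directly into more accurate outputs at the same privacy level.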
Vincent Rasneur (CNIL, France) - “General Data Protection Regulation in research”
Biography.
Abstract.
The é-EGC 2019 Winter School is aimed particularly at PhD students, and at students in general, who wish to deepen their knowledge of privacy preservation in the handling and analysis of personal data, as well as of the issues related to reasoning and to producing explanations from knowledge acquired from experts and then formalized, or extracted through data mining.
Registration takes place in two steps:
Pre-registration is only taken into account if it is accompanied by a recent CV of the participant.
The total number of participants is limited to 30. The registration fee will be posted on the EGC conference website and includes:
Discounted hotel rates have been negotiated; you will find this information in the pre-registration form or on the conference website.