
Edited by

CHARU C. AGGARWAL
IBM T. J. Watson Research Center, Hawthorne, NY 10532

PHILIP S. YU
University of Illinois at Chicago, Chicago, IL 60607

Kluwer Academic Publishers

Boston/Dordrecht/London

Contents

List of Figures
List of Tables
Preface

An Introduction to Privacy-Preserving Data Mining
Charu C. Aggarwal, Philip S. Yu

1. Introduction
2. Privacy-Preserving Data Mining Algorithms
3. Conclusions and Summary
References

A General Survey of Privacy-Preserving Data Mining Models and Algorithms
Charu C. Aggarwal, Philip S. Yu

1. Introduction
2. The Randomization Method
2.1 Privacy Quantification
2.2 Adversarial Attacks on Randomization
2.3 Randomization Methods for Data Streams
2.4 Multiplicative Perturbations
2.5 Data Swapping
3. Group Based Anonymization
3.1 The k-Anonymity Framework
3.2 Personalized Privacy-Preservation
3.3 Utility Based Privacy Preservation
3.4 Sequential Releases
3.5 The l-diversity Method
3.6 The t-closeness Model
3.7 Models for Text, Binary and String Data
4. Distributed Privacy-Preserving Data Mining
4.1 Distributed Algorithms over Horizontally Partitioned Data Sets
4.2 Distributed Algorithms over Vertically Partitioned Data
4.3 Distributed Algorithms for k-Anonymity
5. Privacy-Pr

Preface

In recent years, advances in hardware technology have led to an increase in the capability to store and record personal data about consumers and individuals. This has led to concerns that the personal data may be misused for a variety of purposes. To alleviate these concerns, a number of techniques have recently been proposed for performing data mining tasks in a privacy-preserving way. These techniques are drawn from a wide array of related topics such as data mining, cryptography and information hiding. The material in this book is drawn from these different topics so as to provide a good overview of the important topics in the field.

While a large number of research papers are now available in this field, many of the topics have been studied by different communities with different styles. At this stage, it becomes important to organize the topics in such a way that the relative importance of different research areas is recognized. Furthermore, the field of privacy-preserving data mining has been explored independently by the cryptography, database and statistical disclosure control communities. In some cases, the parallel lines of work are quite similar, but the communities are not sufficiently integrated to provide a broader perspective. This book will contain chapters from researchers of all three communities and will therefore try to provide a balanced perspective of the work done in this field.

This book will be structured as an edited book with chapters from prominent researchers in the field. Each chapter will contain a survey of the key research content on its topic and of the future directions of research in the field. Emphasis will be placed on making each chapter self-sufficient. While the chapters will be written by different researchers, the topics and content are organized so as to present the most important models, algorithms, and applications in the privacy field in a structured and concise way. In addition, attention is paid to drawing chapters from researchers working in different areas in order to provide different points of view. Given the lack of structurally organized information on the topic of privacy, the book will provide insights which are not easily accessible otherwise. A few chapters in the book are not surveys, since the corresponding topics fall in the emerging category, and not enough material is available to create a survey. In such cases, the individual results have been included to give a flavor of the emerging research in the field. It is expected that the book will be a great help to researchers and graduate students interested in the topic. While the privacy field clearly falls in the emerging category because of its recency, it is now beginning to reach a point of maturity and popularity at which the development of an overview book on the topic becomes both possible and necessary. It is hoped that this book will provide a reference for students, researchers and practitioners, both in introducing the topic of privacy-preserving data mining and in understanding the practical and algorithmic aspects of the area.

Chapter 1

An Introduction to Privacy-Preserving Data Mining

Charu C. Aggarwal
IBM T. J. Watson Research Center
Hawthorne, NY 10532
charu@us.ibm.com

Philip S. Yu
IBM T. J. Watson Research Center
Hawthorne, NY 10532
psyu@us.ibm.com

Abstract

The field of privacy has seen rapid advances in recent years because of the increases in the ability to store data. In particular, recent advances in the data mining field have led to increased concerns about privacy. While the topic of privacy has been traditionally studied in the context of cryptography and information hiding, recent emphasis on data mining has led to renewed interest in the field. In this chapter, we will introduce the topic of privacy-preserving data mining and provide an overview of the different topics covered in this book.

1. Introduction

The problem of privacy-preserving data mining has become more important in recent years because of the increasing ability to store personal data about users, and the increasing sophistication of data mining algorithms to leverage this information. A number of techniques such as randomization and k-anonymity [1, 4, 16] have been suggested in recent years in order to perform privacy-preserving data mining. Furthermore, the problem has been discussed in multiple communities such as the database community, the statistical disclosure control community and the cryptography community. In some cases, the different communities have explored parallel lines of work which are quite similar. This book will try to explore different topics from the perspective of different communities, and will try to give a fused idea of the work in different communities.

The key directions in the field of privacy-preserving data mining are as follows:

Privacy-Preserving Data Publishing: These techniques tend to study different transformation methods associated with privacy. These techniques include methods such as randomization [1], k-anonymity [16, 7], and l-diversity [11]. Another related issue is how the perturbed data can be used in conjunction with classical data mining methods such as association rule mining [15]. Other related problems include that of determining privacy-preserving methods to keep the underlying data useful (utility-based methods), or the problem of studying the different definitions of privacy, and how they compare in terms of effectiveness in different scenarios.

Changing the Results of Data Mining Applications to Preserve Privacy: In many cases, the results of data mining applications such as association rule or classification rule mining can compromise the privacy of the data. This has spawned a field of privacy in which the results of data mining algorithms such as association rule mining are modified in order to preserve the privacy of the data. A classic example of such techniques is association rule hiding, in which some of the association rules are suppressed in order to preserve privacy.

Query Auditing: Such methods are akin to the previous case of modifying the results of data mining algorithms. Here, we are either modifying or restricting the results of queries. Methods for perturbing the output of queries are discussed in [8], whereas techniques for restricting queries are discussed in [9, 13].

Cryptographic Methods for Distributed Privacy: In many cases, the data may be distributed across multiple sites, and the owners of the data across these different sites may wish to compute a common function. In such cases, a variety of cryptographic protocols may be used in order to communicate among the different sites, so that secure function computation is possible without revealing sensitive information. A survey of such methods may be found in [14].

Theoretical Challenges in High Dimensionality: Real data sets are usually extremely high dimensional, and this makes the process of privacy-preservation extremely difficult both from a computational and effectiveness point of view. In [12], it has been shown that optimal k-anonymization is NP-hard. Furthermore, the technique is not even effective with increasing dimensionality, since the data can typically be combined with either public or background information to reveal the identity of the underlying record owners. A variety of methods for adversarial attacks in the high dimensional case are discussed in [5, 6].
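As a small illustration of the distributed setting described under Cryptographic Methods for Distributed Privacy, the following sketch simulates a secure sum based on additive secret sharing. It is not taken from this book or from [14]; the modulus, the function names and the site counts are hypothetical, and the sketch assumes that every site follows the protocol honestly and that sites do not collude.

```python
import secrets

# Hypothetical public modulus; it must exceed any possible true sum.
MODULUS = 2**61 - 1


def make_shares(value, n_parties, modulus=MODULUS):
    """Split value into n_parties random shares that add up to value mod modulus."""
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % modulus
    return shares + [last]


def secure_sum(private_values, modulus=MODULUS):
    """Simulate the protocol in one process: every site splits its value,
    sends one share to each site, and each site publishes only the sum
    of the shares it received."""
    n = len(private_values)
    # all_shares[i][j] is the share that site i sends to site j.
    all_shares = [make_shares(v, n, modulus) for v in private_values]
    # Site j sees one random-looking share per site, never a raw value.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % modulus for j in range(n)
    ]
    return sum(partial_sums) % modulus


site_counts = [120, 45, 310]  # hypothetical local counts held by three sites
print(secure_sum(site_counts))  # 475
```

Each published partial sum looks random on its own, so the common function (here, a total) is computed without any site revealing its local value.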

This book will attempt to cover the different topics from the point of view of different communities in the field. This chapter will provide an overview of the different privacy-preserving algorithms covered in this book. We will discuss the challenges associated with each kind of problem, and give an overview of the material in the corresponding chapter.

2. Privacy-Preserving Data Mining Algorithms

In this section, we will discuss the key privacy-preserving data mining problems and the challenges associated with each problem. We will also give an overview of the material covered in each chapter of this book. The broad topics covered in this book are as follows:

General Survey. In chapter 2, we provide a broad survey of privacy-preserving data mining methods. We provide an overview of the different techniques and how they relate to one another. The individual topics will be covered in sufficient detail to provide the reader with a good reference point. The idea is to provide an overview of the field for a new reader from the perspective of the data mining community. However, more detailed discussions are deferred to later chapters which contain descriptions of different data mining algorithms.

Statistical Methods for Disclosure Control. The topic of privacy-preserving data mining has often been studied extensively by the data mining community without sufficient attention to the conventional work done by the statistical disclosure control community. In chapter 3, detailed methods for statistical disclosure control are presented along with some of the relationships to the parallel work done in the database and data mining community. This includes methods such as k-anonymity, swapping, randomization, microaggregation and synthetic data generation. The idea is to give the readers an overview of the common themes in privacy-preserving data mining by different communities.

Measures of Anonymity. There are a very large number of definitions of anonymity in the privacy-preserving data mining field. This is partially because of the varying goals of different privacy-preserving data mining algorithms. For example, methods such as k-anonymity, l-diversity and t-closeness are all designed to prevent identification, though the final goal is to preserve the underlying sensitive information. Each of these methods is designed to prevent disclosure of sensitive information in a different way. Chapter 4 is a survey of different measures of anonymity. The chapter tries to define privacy from the perspective of anonymity measures and classifies such measures. The chapter also compares and contrasts different measures, and discusses their relative advantages. This chapter thus provides an overview and perspective of the different ways in which privacy could be defined, and what the relative advantages of each method might be.
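To make the idea of differing anonymity measures concrete, the sketch below computes two numbers for a released table: the size of the smallest group of records that share the same identifying attribute values (the k of k-anonymity) and the smallest number of distinct sensitive values in any such group. The second number follows one common reading of l-diversity ("distinct" l-diversity), which is an assumption here rather than a definition taken from this chapter. The function name, attribute names and records are hypothetical.

```python
from collections import defaultdict


def anonymity_measures(records, identifying_attrs, sensitive_attr):
    """Return (k, l): the smallest group size and the smallest number of
    distinct sensitive values, over groups of records that agree on all
    identifying attributes."""
    groups = defaultdict(list)
    for row in records:
        key = tuple(row[attr] for attr in identifying_attrs)
        groups[key].append(row[sensitive_attr])
    k = min(len(values) for values in groups.values())
    l = min(len(set(values)) for values in groups.values())
    return k, l


# Hypothetical released table with already-coarsened attributes.
released = [
    {"age": "30-39", "zip": "105**", "disease": "flu"},
    {"age": "30-39", "zip": "105**", "disease": "flu"},
    {"age": "40-49", "zip": "606**", "disease": "asthma"},
    {"age": "40-49", "zip": "606**", "disease": "diabetes"},
]
print(anonymity_measures(released, ["age", "zip"], "disease"))  # (2, 1)
```

In this toy table every record is hidden among two, yet the first group shares a single sensitive value, so the two measures disagree about how well the table protects its subjects.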

The k-anonymity Method. An important method for privacy de-identification is the method of k-anonymity [16]. The motivating factor behind the k-anonymity technique is that many attributes in the data can often be considered pseudo-identifiers, which can be used in conjunction with public records in order to uniquely identify the records. For example, even if the explicit identifiers are removed from the records, attributes such as the birth date and zip code can be used to uniquely identify the underlying records. The idea in k-anonymity is to reduce the granularity of representation of the data in such a way that a given record cannot be distinguished from at least (k − 1) other records. In chapter 5, the k-anonymity method is discussed in detail. A number of important algorithms for k-anonymity are discussed in the same chapter.
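The reduction in granularity described above can be sketched as follows. This is a minimal illustration, not one of the algorithms from chapter 5: it coarsens a zip code to a prefix and an age (used here as a stand-in for birth date) to a range, then checks whether every combination of coarsened values occurs at least k times. All names, parameters and records are hypothetical.

```python
from collections import Counter


def generalize(record, zip_digits, age_width):
    """Coarsen the pseudo-identifiers: keep a zip prefix and bucket the age."""
    zip_code = record["zip"]
    masked_zip = zip_code[:zip_digits] + "*" * (len(zip_code) - zip_digits)
    low = (record["age"] // age_width) * age_width
    return (masked_zip, f"{low}-{low + age_width - 1}")


def is_k_anonymous(records, k, zip_digits, age_width):
    """True if every generalized pseudo-identifier combination occurs
    at least k times in the table."""
    counts = Counter(generalize(r, zip_digits, age_width) for r in records)
    return min(counts.values()) >= k


people = [
    {"zip": "10532", "age": 34},
    {"zip": "10533", "age": 37},
    {"zip": "60607", "age": 41},
    {"zip": "60611", "age": 48},
]
print(is_k_anonymous(people, 2, zip_digits=5, age_width=1))   # False
print(is_k_anonymous(people, 2, zip_digits=3, age_width=10))  # True
```

With full zip codes and exact ages each record is unique; after coarsening, each record shares its pseudo-identifier values with one other record, so the table is 2-anonymous at the cost of less precise data.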

The Randomization Method. The randomization technique uses data distortion methods in order to create private representations of the records [1, 4]. In most cases, the individual records cannot be recovered, but only aggregate distributions can be recovered. These aggregate distributions can be used for data mining purposes. Two kinds of perturbation are possible with the randomization method:

Additive Perturbation: In this case, randomized noise is added to the data records. The overall data distributions can be recovered from the randomized records. Data mining and management algorithms are designed to work with these data distributions. A detailed discussion of these methods is provided in chapter 6.

Multiplicative Perturbation: In this case, the random projection or random rotation techniques are used in order to perturb the records. A detailed discussion of these methods is provided in chapter 7.

In addition, these chapters deal with the issue of adversarial attacks and vulnerabilities of these methods.
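The two kinds of perturbation can be illustrated with a short sketch. It is a simplification under stated assumptions, not the methods of chapters 6 and 7: the additive part recovers only the mean and standard deviation rather than a full distribution, and the multiplicative part uses a single two-dimensional rotation, relying on the fact that rotations preserve distances between records. The noise level, seed and data are hypothetical.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Additive perturbation: publish x + noise, where the noise distribution
# (but not the individual noise values) is public.
NOISE_STD = 20.0  # hypothetical noise level
ages = [rng.gauss(40, 10) for _ in range(20000)]  # synthetic private values
published = [x + rng.gauss(0, NOISE_STD) for x in ages]

# Individual values are masked, but aggregates can still be estimated,
# because independent noise adds its variance to that of the data.
est_mean = statistics.fmean(published)
est_var = statistics.pvariance(published) - NOISE_STD ** 2
print(round(est_mean, 1), round(math.sqrt(est_var), 1))  # close to 40 and 10


# Multiplicative perturbation: rotate every record by a secret angle.
def rotate(point, theta):
    x, y = point
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))


theta = rng.uniform(0, 2 * math.pi)
a, b = (1.0, 2.0), (4.0, 6.0)
ra, rb = rotate(a, theta), rotate(b, theta)
print(math.dist(a, b), round(math.dist(ra, rb), 6))  # 5.0 5.0
```

The rotated coordinates differ from the originals, yet pairwise distances are unchanged, which is why distance-based mining methods can still be applied to data perturbed in this way.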
