File deduplication with cloud storage file system

Chan I. Ku, Guo Heng Luo, Che Pin Chang, Shyan-Ming Yuan

Research output: Contribution to conference > Paper

6 Scopus citations

Abstract

The Hadoop Distributed File System (HDFS) addresses the problem of storing very large volumes of data, but it provides no mechanism for handling duplicate files. In this study, a middle-layer file system built on the HBase virtual architecture is used to deduplicate files in HDFS. Two architectures are proposed according to the reliability requirements of the application: RFD-HDFS (Reliable File Deduplicated HDFS), which permits no errors, and FD-HDFS (File Deduplicated HDFS), which can tolerate a very small number of errors. In addition to the space savings, the marginal benefits of deduplication are explored. Suppose a popular video is uploaded to HDFS by one million users. With Hadoop replication, it is stored as three million files, which wastes a great deal of disk space and can be avoided only if the cloud removes the duplicates. With deduplication, only three file spaces are occupied, so the removal of duplicate files reaches 100% effectiveness. The experimental architecture is a cloud-based documentation system, similar to a cloud version of EndNote, that simulates the clustering effect of a massive database when researchers synchronize their data with cloud storage.
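The abstract does not describe how the two architectures detect duplicates, so the following is only a minimal sketch of single-instance storage under stated assumptions. Files are fingerprinted with SHA-256. A fingerprint-only mode stands in for the error-tolerant FD-HDFS idea, and a mode that also compares bytes on a fingerprint match stands in for the error-free RFD-HDFS idea. The class `DedupStore`, the helper `physical_files`, and the in-memory dictionaries that replace HBase and HDFS are all hypothetical names introduced for illustration, not the authors' implementation.

```python
import hashlib


class DedupStore:
    """Toy single-instance store; dicts stand in for the HBase index and HDFS."""

    def __init__(self, verify: bool):
        self.verify = verify  # True: compare bytes whenever a fingerprint matches
        self.blobs = {}       # fingerprint -> list of distinct stored contents
        self.paths = {}       # user-visible path -> (fingerprint, slot index)

    def put(self, path: str, data: bytes) -> None:
        fp = hashlib.sha256(data).hexdigest()
        slots = self.blobs.setdefault(fp, [])
        if not self.verify:
            # Fingerprint-only mode (assumed): trust the hash and accept a
            # very small risk that two different files collide.
            if not slots:
                slots.append(data)
            self.paths[path] = (fp, 0)
            return
        # Verified mode (assumed): link to a stored copy only if bytes match.
        for i, stored in enumerate(slots):
            if stored == data:
                self.paths[path] = (fp, i)
                return
        slots.append(data)
        self.paths[path] = (fp, len(slots) - 1)

    def get(self, path: str) -> bytes:
        fp, slot = self.paths[path]
        return self.blobs[fp][slot]

    def stored_copies(self) -> int:
        """Number of distinct contents actually kept."""
        return sum(len(slots) for slots in self.blobs.values())


def physical_files(logical_copies: int, replication: int = 3) -> int:
    """Files on disk once each kept copy is replicated `replication` times."""
    return logical_copies * replication


if __name__ == "__main__":
    store = DedupStore(verify=True)
    video = b"popular video bytes"
    users = 1000  # scaled down from the abstract's one million uploads
    for user in range(users):
        store.put(f"/user{user}/video.mp4", video)
    print(physical_files(users))                  # 3000 without deduplication
    print(physical_files(store.stored_copies()))  # 3 with deduplication
```

Scaled back up to one million uploads, the same arithmetic gives the abstract's figures: three million files without deduplication and three with it. The byte comparison in verified mode costs an extra read per upload, a trade-off the abstract only hints at through its reliability distinction.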

Original language: English
Pages: 280-287
Number of pages: 8
DOIs: https://doi.org/10.1109/CSE.2013.52
State: Published - 1 Dec 2013
Event: 2013 16th IEEE International Conference on Computational Science and Engineering, CSE 2013 - Sydney, NSW, Australia
Duration: 3 Dec 2013 to 5 Dec 2013

Conference

Conference: 2013 16th IEEE International Conference on Computational Science and Engineering, CSE 2013
Country: Australia
City: Sydney, NSW
Period: 3/12/13 to 5/12/13

Keywords

  • Cloud Computing
  • Data Deduplication
  • HDFS
  • Single instance storage

Cite this

Ku, C. I., Luo, G. H., Chang, C. P., & Yuan, S-M. (2013). File deduplication with cloud storage file system. 280-287. Paper presented at 2013 16th IEEE International Conference on Computational Science and Engineering, CSE 2013, Sydney, NSW, Australia. https://doi.org/10.1109/CSE.2013.52