USE OF THE KERNEL-GENERALIZED PARETO DISTRIBUTION MIXTURE MODEL AND D-VINE COPULA IN ANALYZING DATA BREACH SIZES

*Fathiyyah Yolianda Dzikra  -  Department of Statistics, Faculty of Science and Mathematics, Diponegoro University, Indonesia
Yuciana Wilandari  -  Department of Statistics, Faculty of Science and Mathematics, Diponegoro University, Indonesia
Arief Rachman Hakim  -  Department of Statistics, Faculty of Science and Mathematics, Diponegoro University, Indonesia
Open Access Copyright 2023 Jurnal Gaussian under http://creativecommons.org/licenses/by-nc-sa/4.0.

Abstract
This study uses the 2015-2021 data breach reports of the U.S. Department of Health and Human Services to model and estimate breach sizes for each entity type with a Kernel-Generalized Pareto Distribution (GPD) Mixture Model, and to estimate the dependence of breach sizes between years with a D-Vine Copula. The D-Vine Copula can accommodate the complex dependencies that data breach reports show across all enterprise categories. Before the D-Vine Copula is fitted, breach size parameters are modeled and estimated for each entity type using the Kernel-GPD Mixture Model. The mixture model captures large breach sizes through the GPD, while a non-parametric kernel distribution models the smaller breach sizes. After logarithmic transformation, the data for the Business Associate and Healthcare Provider entity types have a short right tail of Weibull type, while the data for the Health Plan category have a heavy right tail of Fréchet type. Parameters for the three entity types were estimated using the cross-validation maximum likelihood method. Dependence estimation with the D-Vine Copula shows positive dependence between breach sizes across years.
Keywords: Breach Sizes; Generalized Pareto Distribution; Kernel; D-Vine Copula
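
The bulk-and-tail structure described in the abstract can be illustrated with a minimal sketch. It assumes a Gaussian kernel and the commonly used form of a kernel-GPD mixture density: a kernel estimate rescaled to carry mass 1 - phi below a threshold u, and a GPD carrying mass phi above it. The function names, the sample data, and the parameter values are hypothetical and are not taken from the paper, and the sketch does not reproduce the authors' estimation procedure.

```python
import math


def gaussian_kde_pdf(x, data, bandwidth):
    """Gaussian kernel density estimate h(x) built from `data`."""
    norm = len(data) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data) / norm


def gaussian_kde_cdf(x, data, bandwidth):
    """CDF H(x) of the same kernel estimate (an average of normal CDFs)."""
    root2 = math.sqrt(2.0)
    return sum(0.5 * (1.0 + math.erf((x - d) / (bandwidth * root2)))
               for d in data) / len(data)


def gpd_pdf(x, threshold, scale, shape):
    """Generalized Pareto density for values above `threshold`."""
    z = (x - threshold) / scale
    if z < 0:
        return 0.0
    if abs(shape) < 1e-12:          # shape -> 0 gives the exponential tail
        return math.exp(-z) / scale
    base = 1.0 + shape * z
    if base <= 0:                   # beyond the finite endpoint (shape < 0)
        return 0.0
    return base ** (-1.0 / shape - 1.0) / scale


def mixture_pdf(x, data, bandwidth, threshold, scale, shape, tail_fraction):
    """Kernel bulk below the threshold, GPD tail above it."""
    if x <= threshold:
        # Rescale the kernel estimate so the bulk carries mass 1 - tail_fraction.
        bulk_mass = gaussian_kde_cdf(threshold, data, bandwidth)
        return (1.0 - tail_fraction) * gaussian_kde_pdf(x, data, bandwidth) / bulk_mass
    return tail_fraction * gpd_pdf(x, threshold, scale, shape)


# Hypothetical log breach sizes and parameter values, for illustration only.
log_sizes = [0.5, 1.0, 1.5, 2.0, 2.5, 4.0]
density_in_bulk = mixture_pdf(2.0, log_sizes, 0.5, 3.0, 1.0, -0.2, 0.1)
density_in_tail = mixture_pdf(3.5, log_sizes, 0.5, 3.0, 1.0, -0.2, 0.1)
```

A negative shape parameter, as in the illustrative call, gives a tail with a finite upper endpoint (the short-tailed, Weibull-type case), while a positive shape gives a heavy, Fréchet-type tail.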

