A Study of Big Data in Cloud Computing

  • Imran Khan, Department of CSE, Harcourt Butler Technical University, Kanpur, India

Abstract

Over the last two decades, the size and amount of data have increased enormously, which has changed traditional methods of data management and introduced two new technological terms: big data and cloud computing. Addressing big data, characterized by massive volume, high velocity and variety, is quite challenging, as it requires large computational infrastructure to store, process and analyze it. Cloud computing has emerged as a reliable technique for carrying out sophisticated, large-scale data processing because it eliminates the need to manage advanced hardware and software, and offers various services to users. Presently, big data and cloud computing are attracting significant interest in both academia and industrial research. In this review, we introduce various characteristics, applications and challenges of big data and cloud computing. We provide a brief overview of the different platforms available to handle big data, including a critical analysis of them based on different parameters. We also discuss the correlation between big data and cloud computing. We focus on the life cycle of big data and its vital analysis applications in various fields and domains. At the end, we present the open research issues that still need to be addressed and give some pointers to future scholars in the fields of big data and cloud computing.

Keywords

big data, cloud computing, distributed computing, data mining, Hadoop

Published
Aug 14, 2024
How to Cite
KHAN, Imran. A Study of Big Data in Cloud Computing. Computer Assisted Methods in Engineering and Science, [S.l.], v. 31, n. 3, p. 313–349, aug. 2024. ISSN 2956-5839. Available at: <https://cames.ippt.gov.pl/index.php/cames/article/view/906>. Date accessed: 23 dec. 2024. doi: http://dx.doi.org/10.24423/cames.2024.906.
Section
[CLOSED] AI-based Future Intelligent Networks and Communication Security