A Study of Big Data in Cloud Computing

Authors

  • Imran Khan, Department of CSE, Harcourt Butler Technical University, Kanpur, India

Abstract

Over the last two decades, the volume of data has increased enormously, which has changed traditional methods of data management and introduced two new technological terms: big data and cloud computing. Handling big data, characterized by massive volume, high velocity and variety, is challenging because it requires large computational infrastructure to store, process and analyze it. Cloud computing has emerged as a reliable way to carry out sophisticated, large-scale data processing because it eliminates the need to manage advanced hardware and software and offers various services to users. Presently, big data and cloud computing are attracting significant interest in both academia and industrial research. In this review, we introduce various characteristics, applications and challenges of big data and cloud computing. We provide a brief overview of the platforms available for handling big data, together with a critical analysis of them based on different parameters. We also discuss the relationship between big data and cloud computing. We focus on the life cycle of big data and its vital analysis applications in various fields and domains. Finally, we present the open research issues that still need to be addressed and give some pointers to future scholars in the fields of big data and cloud computing.

Keywords:

big data, cloud computing, distributed computing, data mining, Hadoop
