User:Arcpkl/Big data

From Wikipedia, the free encyclopedia

"Big data" - Lead:

Big data is a field that treats ways to analyze, systematically extract information from, or otherwise deal with data sets that are too large or complex for traditional data-processing application software. It was originally associated with three key concepts: volume, variety, and velocity; veracity was added later. Big data often includes data sets whose size exceeds the capacity of traditional software to process within an acceptable time.

Current usage of the term big data tends to refer to the use of advanced data analytics methods that extract value from data, and seldom to a particular size of data set. Analysis of data sets can find new correlations to "spot business trends, prevent diseases, combat crime and so on."[1] Scientists, business executives, medical practitioners, advertisers and governments alike regularly meet difficulties with large data sets in areas including Internet searches, fintech, urban informatics, and business informatics. Scientists encounter limitations in e-Science work, including meteorology, genomics,[2] connectomics, complex physics simulations, biology and environmental research.[3]

Data sets grow rapidly, in part because they are increasingly gathered by cheap and numerous information-sensing Internet of things devices and sources such as mobile devices, aerial remote sensing, software logs, cameras, microphones, radio-frequency identification (RFID) readers and wireless sensor networks.[4][5] One question large enterprises face is who should own big-data initiatives that affect the entire organization.[6]

Relational database management systems, desktop statistics and software packages used to visualize data often have difficulty handling big data. The work may require "massively parallel software running on tens, hundreds, or even thousands of servers".[7] What qualifies as "big data" varies depending on the capabilities of the users and their tools, and expanding capabilities make big data a moving target. "For some organizations, facing hundreds of gigabytes of data for the first time may trigger a need to reconsider data management options. For others, it may take tens or hundreds of terabytes before data size becomes a significant consideration."[8]

Definition

The term "big data" has been in use since the 1990s, with some giving credit to John Mashey for popularizing the term.[9][10] Big data usually includes data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process data within a tolerable elapsed time.[11] Big data philosophy encompasses unstructured, semi-structured and structured data; however, the main focus is on unstructured data.[12] Big data requires a set of techniques and technologies with new forms of integration to reveal insights from data sets that are diverse, complex, and of a massive scale.[13]

"Variety", "veracity" and various other "Vs" are added by some organizations to describe big data, a revision challenged by some industry authorities.[14]

Big data "size" is a constantly moving target; as of 2012 it ranged from a few dozen terabytes to many zettabytes of data.[15] The world's technological per-capita capacity to store information has roughly doubled every 40 months since the 1980s;[16] as of 2017, 2.5 exabytes (2.5×2^60 bytes) of data were generated every day.[17] An IDC report predicted that the global data volume would grow exponentially from 4.4 zettabytes to 44 zettabytes between 2013 and 2020, and that there will be 163 zettabytes of data by 2025.[18]
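
The growth figures in this paragraph can be turned into yearly rates with a little arithmetic. The following Python sketch is illustrative only: the function and variable names are hypothetical, an exabyte is taken as 2^60 bytes as in the text, and a zettabyte is assumed to be 10^21 bytes for the final conversion.

```python
# Figures quoted in the paragraph above.
DOUBLING_MONTHS = 40            # doubling period of per-capita storage capacity
ZB_2013, ZB_2020 = 4.4, 44.0    # IDC forecast endpoints, in zettabytes
DAILY_BYTES = 2.5 * 2**60       # 2.5 exabytes per day (exabyte taken as 2^60 bytes)


def annual_growth_from_doubling(months: float) -> float:
    """Yearly growth factor implied by a doubling period given in months."""
    return 2 ** (12 / months)


def compound_annual_growth(start: float, end: float, years: float) -> float:
    """Constant yearly growth factor that takes `start` to `end` in `years`."""
    return (end / start) ** (1 / years)


capacity_factor = annual_growth_from_doubling(DOUBLING_MONTHS)
idc_factor = compound_annual_growth(ZB_2013, ZB_2020, 2020 - 2013)

# Assumption: 1 zettabyte = 10^21 bytes (decimal prefix).
yearly_zb = DAILY_BYTES * 365 / 10**21

print(f"storage capacity grows about {capacity_factor - 1:.0%} per year")
print(f"IDC forecast implies about {idc_factor - 1:.0%} per year")
print(f"2.5 exabytes per day is about {yearly_zb:.2f} zettabytes per year")
```

Under these assumptions, a 40-month doubling period corresponds to growth of roughly 23% per year, while the tenfold IDC forecast over seven years implies roughly 39% per year.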

A 2018 definition states "Big data is where parallel computing tools are needed to handle data", and notes, "This represents a distinct and clearly defined change in the computer science used, via parallel programming theories, and losses of some of the guarantees and capabilities made by Codd's relational model."[19]
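
As a rough illustration of what "parallel computing tools" means in practice, the Python sketch below splits a data set into partitions, processes each partition independently, and merges the partial results. It is not taken from the cited source: the function names and sample log lines are hypothetical, and threads on a single machine stand in for the separate servers a real deployment would use.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def partition(records, n_parts):
    """Split a list of records into n_parts roughly equal slices."""
    return [records[i::n_parts] for i in range(n_parts)]


def count_words(lines):
    """Map step: word frequencies for one partition, computed in isolation."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge(a, b):
    """Reduce step: combine two partial results into one."""
    return a + b


def parallel_word_count(lines, workers=4):
    parts = partition(lines, workers)
    # Assumption: threads stand in for servers. In a cluster, each partition
    # would be shipped to a different machine and only the counts sent back.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_words, parts))
    return reduce(merge, partials, Counter())


# Hypothetical sample data.
log_lines = ["GET /index", "GET /about", "POST /login", "GET /index"]
totals = parallel_word_count(log_lines, workers=2)
print(totals.most_common(2))
```

Each worker sees only its own partition, so nothing in this pattern enforces consistency across partitions while the job runs. That is one simple way to picture the loss of relational guarantees that the definition mentions.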

The growing maturity of the concept more starkly delineates the difference between "big data" and "Business Intelligence".[20]


  1. ^ Cite error: The named reference Economist was invoked but never defined.
  2. ^ "Community cleverness required". Nature. 455 (7209): 1. September 2008. Bibcode:2008Natur.455....1.. doi:10.1038/455001a. PMID 18769385.
  3. ^ Reichman OJ, Jones MB, Schildhauer MP (February 2011). "Challenges and opportunities of open data in ecology". Science. 331 (6018): 703–5. Bibcode:2011Sci...331..703R. doi:10.1126/science.1197962. PMID 21311007. S2CID 22686503.
  4. ^ Hellerstein, Joe (9 November 2008). "Parallel Programming in the Age of Big Data". Gigaom Blog.
  5. ^ Segaran, Toby; Hammerbacher, Jeff (2009). Beautiful Data: The Stories Behind Elegant Data Solutions. O'Reilly Media. p. 257. ISBN 978-0-596-15711-1.
  6. ^ Oracle and FSN, "Mastering Big Data: CFO Strategies to Transform Insight into Opportunity" Archived 4 August 2013 at the Wayback Machine, December 2012
  7. ^ Jacobs, A. (6 July 2009). "The Pathologies of Big Data". ACMQueue.
  8. ^ Magoulas, Roger; Lorica, Ben (February 2009). "Introduction to Big Data". Release 2.0 (11). Sebastopol CA: O'Reilly Media.
  9. ^ John R. Mashey (25 April 1998). "Big Data ... and the Next Wave of InfraStress" (PDF). Slides from invited talk. Usenix. Retrieved 28 September 2016.
  10. ^ Steve Lohr (1 February 2013). "The Origins of 'Big Data': An Etymological Detective Story". The New York Times. Retrieved 28 September 2016.
  11. ^ Snijders, C.; Matzat, U.; Reips, U.-D. (2012). "'Big Data': Big gaps of knowledge in the field of Internet". International Journal of Internet Science. 7: 1–5.
  12. ^ Dedić, N.; Stanier, C. (2017). "Towards Differentiating Business Intelligence, Big Data, Data Analytics and Knowledge Discovery". Innovations in Enterprise Information Systems Management and Engineering. Lecture Notes in Business Information Processing. Vol. 285. Berlin ; Heidelberg: Springer International Publishing. pp. 114–122. doi:10.1007/978-3-319-58801-8_10. ISBN 978-3-319-58800-1. ISSN 1865-1356. OCLC 909580101.
  13. ^ Hashem, Ibrahim Abaker Targio; Yaqoob, Ibrar; Anuar, Nor Badrul; Mokhtar, Salimah; Gani, Abdullah; Khan, Samee Ullah (2015). "The rise of "big data" on cloud computing: Review and open research issues". Information Systems. 47: 98–115. doi:10.1016/j.is.2014.07.006.
  14. ^ Grimes, Seth. "Big Data: Avoid 'Wanna V' Confusion". InformationWeek. Retrieved 5 January 2016.
  15. ^ Everts, Sarah (2016). "Information Overload". Distillations. Vol. 2, no. 2. pp. 26–33. Retrieved 22 March 2018.
  16. ^ Hilbert M, López P (April 2011). "The world's technological capacity to store, communicate, and compute information" (PDF). Science. 332 (6025): 60–5. Bibcode:2011Sci...332...60H. doi:10.1126/science.1200970. PMID 21310967. S2CID 206531385.
  17. ^ "Domo Resource - Data Never Sleeps 5.0". www.domo.com. Retrieved 2020-11-05.
  18. ^ Reinsel, David; Gantz, John; Rydning, John (13 April 2017). "Data Age 2025: The Evolution of Data to Life-Critical" (PDF). seagate.com. Framingham, MA, US: International Data Corporation. Retrieved 2 November 2017.
  19. ^ Fox, Charles (25 March 2018). Data Science for Transport. Springer Textbooks in Earth Sciences, Geography and Environment. Springer. ISBN 9783319729527.
  20. ^ "avec focalisation sur Big Data & Analytique" (PDF). Bigdataparis.com. Retrieved 8 October 2017.
  21. ^ a b Billings S.A. "Nonlinear System Identification: NARMAX Methods in the Time, Frequency, and Spatio-Temporal Domains". Wiley, 2013
  22. ^ "le Blog ANDSI » DSI Big Data". Andsi.fr. Retrieved 8 October 2017.
  23. ^ Les Echos (3 April 2013). "Les Echos – Big Data car Low-Density Data ? La faible densité en information comme facteur discriminant – Archives". Lesechos.fr. Retrieved 8 October 2017.