{{short description|Probability distribution}} {{for|the linguistics law on word length|Zipf's law of abbreviation}} [[File:Zipf 30wiki en labels.png|thumbnail|A plot of the number of occurrences ''y'' of each word as a function of its frequency rank ''x'' for the first 10 million words in 30 Wikipedias (dumps from October 2015) on a [[log-log]] scale.]]

'''Zipf's law''' ({{IPAc-en|z|ɪ|f}}, {{IPA-de|t͡sɪpf|lang}}) is an [[empirical law]] that often holds, approximately, when a list of measured values is sorted in decreasing order. It states that the value of the ''n''th entry is [[inversely proportional]] to ''n''.

The best-known instance of Zipf's law applies to the [[frequency table]] of words in a text or [[text corpus|corpus]] of [[natural language]]: the most common word is usually found to occur approximately twice as often as the next most common one, three times as often as the third most common, and so on. For example, in the [[Brown Corpus]] of American English text, the word "''[[English articles#Definite article|the]]''" is the most frequently occurring word, and by itself accounts for nearly 7% of all word occurrences (69,971 out of slightly over 1 million). True to Zipf's law, the second-place word "''of''" accounts for slightly over 3.5% of words (36,411 occurrences), followed by "''and''" (28,852 occurrences).

The "law" is named after the American [[linguistics|linguist]] [[George Kingsley Zipf]] and is still an important concept in [[quantitative linguistics]]. It has been found to apply to many other types of data studied in the [[physical science|physical]] and [[social science|social]] sciences. In [[mathematical statistics]], the concept has been formalized as the '''Zipfian distribution''': a family of related discrete [[probability distribution]]s whose [[rank-frequency distribution]] is an inverse [[power law]] relation. They are related to [[Pareto distribution]]s and to [[Benford's law]]. Some sets of time-dependent empirical data deviate somewhat from Zipf's law; such empirical distributions are said to be quasi-Zipfian.

==History==
In 1913, the German physicist [[Felix Auerbach]] observed an inverse proportionality between the population sizes of cities and their ranks when sorted in decreasing order of that variable. The French stenographer [[Jean-Baptiste Estoup]] had also noticed the regularity before Zipf. The same relation for frequencies of words in natural-language texts was observed by George Zipf in 1932, but he never claimed to have originated it. In fact, Zipf disliked mathematics. In his 1932 publication,{{cn|date=May 2023}} he speaks with disdain about mathematical involvement in linguistics, for example (ibidem, p. 21): ''(…) let me say here for the sake of any mathematician who may plan to formulate the ensuing data more exactly, the ability of the highly intense positive to become the highly intense negative, in my opinion, introduces the devil into the formula in the form of √(-i)''. The only mathematical expression Zipf used, ''a''·''b''<sup>2</sup> = constant, he "borrowed" from [[Alfred J. Lotka]]'s 1926 publication.

The same relationship has since been found in many other contexts, and for variables other than word frequency. For example, when corporations are ranked by decreasing size, their sizes are found to be inversely proportional to their rank. The same relation holds for personal incomes,{{cn|date=May 2023}} the number of people watching the same TV channel, [[musical note|notes]] in music, cell [[transcriptomes]], and more.
==Formal definition==
{{Probability distribution
| name = Zipf's law
| type = mass
| pdf_image = [[Image:Zipf distribution PMF.png|325px|Plot of the Zipf PMF for ''N'' = 10]]<br />Zipf PMF for ''N'' = 10 on a log–log scale. The horizontal axis is the index ''k''. (The function is only defined at integer values of ''k''; the connecting lines do not indicate continuity.)
| cdf_image = [[Image:Zipf distribution CMF.png|325px|Plot of the Zipf CDF for ''N'' = 10]]<br />Zipf CDF for ''N'' = 10. The horizontal axis is the index ''k''. (The function is only defined at integer values of ''k''; the connecting lines do not indicate continuity.)
| parameters = <math>s \geq 0</math> ([[real number|real]])<br />
<math>N \in \{1,2,3,\ldots\}</math> ([[integer]])
| support = <math>k \in \{1,2,\ldots,N\}</math>
| pdf = <math>\frac{1/k^s}{H_{N,s}}</math> where <math>H_{N,s}</math> is the <math>N</math>th generalized [[harmonic number]]
| cdf = <math>\frac{H_{k,s}}{H_{N,s}}</math>
| mean = <math>\frac{H_{N,s-1}}{H_{N,s}}</math>
| median =
| mode = <math>1</math>
| variance = <math>\frac{H_{N,s-2}}{H_{N,s}} - \frac{H^2_{N,s-1}}{H^2_{N,s}}</math>
| skewness =
| kurtosis =
| entropy = <math>\frac{s}{H_{N,s}} \sum_{k=1}^N \frac{\ln(k)}{k^s} + \ln(H_{N,s})</math>
| mgf = <math>\frac{1}{H_{N,s}} \sum_{n=1}^N \frac{e^{nt}}{n^s}</math>
| char = <math>\frac{1}{H_{N,s}} \sum_{n=1}^N \frac{e^{int}}{n^s}</math>
}}

Formally, the Zipf distribution on {{mvar|N}} elements assigns to the element of rank {{mvar|k}} (counting from 1) the probability
:<math>f(k;N) = \frac{1}{H_N}\,\frac{1}{k}</math>
where <math>H_N</math> is a normalization constant, the {{mvar|N}}th [[harmonic number]]:
:<math>H_N = \sum_{k=1}^N \frac{1}{k}\ .</math>

The distribution is sometimes generalized to an inverse power law with exponent {{mvar|s}} instead of 1. Namely,
:<math>f(k;s,N) = \frac{1}{H_{s,N}}\,\frac{1}{k^s}</math>
where <math>H_{s,N}</math> is a [[generalized harmonic number]]:
:<math>H_{s,N} = \sum_{k=1}^N \frac{1}{k^s}\ .</math>

The generalized Zipf distribution can be extended to infinitely many items ({{mvar|N}} = ∞) only if the exponent {{mvar|s}} exceeds 1. In that case, the normalization constant <math>H_{s,N}</math> becomes [[Riemann zeta function|Riemann's zeta function]],
:<math>\zeta(s) = \sum_{k=1}^\infty \frac{1}{k^s} < \infty\ .</math>
If the exponent {{mvar|s}} is 1 or less, the normalization constant <math>H_{s,N}</math> diverges as {{mvar|N}} tends to infinity.

==Empirical testing==
Empirically, a data set can be tested to see whether Zipf's law applies by checking the [[goodness of fit]] of an empirical distribution to the hypothesized power-law distribution with a [[Kolmogorov–Smirnov test]], and then comparing the (log) likelihood ratio of the power-law distribution to alternative distributions such as an exponential or lognormal distribution (Clauset, A., Shalizi, C. R., & Newman, M. E. J. (2009). "Power-Law Distributions in Empirical Data". ''SIAM Review'', 51(4), 661–703. {{doi|10.1137/070710111}}).

Zipf's law can be visualized by [[graph of a function|plotting]] the item frequency data on a [[log-log]] graph, with the axes being the [[logarithm]] of rank order and the logarithm of frequency. For example, as described in the introduction, the word ''"the"'' would appear at {{mvar|x}} = log(1) (order rank = 1) and {{nobr|{{mvar|y}} {{=}} log(69,971).}} It is also possible to plot reciprocal rank against frequency, or reciprocal frequency or interword interval against rank. The data conform to Zipf's law to the extent that the plot is [[linear equation|linear]].

==Statistical explanations==
Although Zipf's law holds for most natural languages, and even for some non-natural ones like [[Esperanto]], the reason is still not well understood. It may, however, be partially explained by the statistical analysis of randomly generated texts. Wentian Li has shown that in a document in which each character is chosen randomly from a uniform distribution over all letters (plus a space character), the "words" of different lengths follow the macro-trend of Zipf's law (the more probable words are the shortest, and all words of a given length are equally probable).
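Li's random-typing experiment is straightforward to reproduce. The following Python sketch is a minimal illustration, not code from the cited paper; the alphabet size, text length, and sampled ranks are arbitrary choices. It generates a random text, ranks the resulting "words" by frequency, and prints a few points of the rank-frequency relation, which fall near a straight line in log-log coordinates (with the plateau structure expected for equally probable words of the same length):

<syntaxhighlight lang="python">
import math
import random
from collections import Counter

random.seed(1)

# "Monkey typing": every character is drawn uniformly from
# 26 letters plus a space; spaces delimit the "words".
alphabet = "abcdefghijklmnopqrstuvwxyz "
text = "".join(random.choice(alphabet) for _ in range(5_000_000))
ranked = Counter(w for w in text.split() if w).most_common()

# Sample the rank-frequency curve; approximate linearity of
# log(frequency) versus log(rank) is the Zipf-like macro-trend.
for rank in (1, 2, 4, 8, 16, 32, 64, 128, 256):
    word, freq = ranked[rank - 1]
    print(f"rank {rank:3d}  word {word!r:>6}  freq {freq:6d}  "
          f"log-log point ({math.log10(rank):.2f}, {math.log10(freq):.2f})")
</syntaxhighlight>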
In 1959, [[Vitold Belevitch]] observed that if any of a large class of well-behaved [[statistical distribution]]s (not only the [[normal distribution]]) is expressed in terms of rank and expanded into a [[Taylor series]], the first-order truncation of the series results in Zipf's law. Further, a second-order truncation of the Taylor series results in [[Zipf–Mandelbrot law|Mandelbrot's law]].

The [[principle of least effort]] is another possible explanation: Zipf himself proposed that neither speakers nor hearers using a given language want to work any harder than necessary to reach understanding, and the process that results in approximately equal distribution of effort leads to the observed Zipf distribution.

Another possible cause for the Zipf distribution is a [[preferential attachment]] process, in which the value {{mvar|x}} of an item tends to grow at a rate proportional to {{mvar|x}} (intuitively, "the rich get richer" or "success breeds success"). Such a growth process results in the [[Yule–Simon distribution]], which has been shown to fit word frequency versus rank in language, and population versus city rank, better than Zipf's law. It was originally derived by Yule to explain population versus rank in species, and was applied to cities by Simon.

A similar explanation is based on [[Atlas models]], systems of exchangeable positive-valued [[diffusion process]]es with drift and variance parameters that depend only on the rank of the process. It has been shown mathematically that Zipf's law holds for Atlas models that satisfy certain natural regularity conditions. Quasi-Zipfian distributions can result from quasi-Atlas models.{{cn|date=May 2023}}

In the figure above of the 10 million Wikipedia words, the log-log plots are not precisely straight lines but rather slightly concave curves with a tangent of slope −1 at some point along the curve.

==Related laws==
A generalization of Zipf's law is the [[Zipf–Mandelbrot law]], proposed by [[Benoit Mandelbrot]], whose frequencies are:
:<math>f(k;N,q,s) = \frac{1}{C}\,\frac{1}{(k+q)^s}\ .</math>
The constant {{mvar|C}} is the [[Hurwitz zeta function]] evaluated at ''s''.

Zipfian distributions can be obtained from [[Pareto distribution]]s by an exchange of variables. The Zipf distribution is sometimes called the '''discrete Pareto distribution''' because it is analogous to the continuous [[Pareto distribution]] in the same way that the [[Uniform distribution (discrete)|discrete uniform distribution]] is analogous to the [[Uniform distribution (continuous)|continuous uniform distribution]].

The tail frequencies of the [[Yule–Simon distribution]] are approximately
:<math>f(k;\rho) \approx \frac{[\text{constant}]}{k^{\rho+1}}</math>
for any choice of {{mvar|ρ}} > 0.

In the [[parabolic fractal distribution]], the logarithm of the frequency is a quadratic polynomial of the logarithm of the rank. This can markedly improve the fit over a simple power-law relationship. By analogy with fractal dimension, it is possible to calculate a Zipf dimension, which is a useful parameter in the analysis of texts.

It has been argued that [[Benford's law]] is a special bounded case of Zipf's law, with the connection between the two laws being explained by their both originating from scale-invariant functional relations in statistical physics and critical phenomena. The ratios of probabilities in Benford's law are not constant, however. The leading digits of data satisfying Zipf's law with {{mvar|s}} = 1 satisfy Benford's law, as the following table illustrates.

{| class="wikitable" style="text-align: center;"
|-
! <math>n</math>
! Benford's law: <math>P(n) = \log_{10}(n+1) - \log_{10}(n)</math>
! <math>\frac{\log(P(n)/P(n-1))}{\log(n/(n-1))}</math>
|-
| 1 || 0.30103000 ||
|-
| 2 || 0.17609126 || −0.7735840
|-
| 3 || 0.12493874 || −0.8463832
|-
| 4 || 0.09691001 || −0.8830605
|-
| 5 || 0.07918125 || −0.9054412
|-
| 6 || 0.06694679 || −0.9205788
|-
| 7 || 0.05799195 || −0.9315169
|-
| 8 || 0.05115252 || −0.9397966
|-
| 9 || 0.04575749 || −0.9462848
|}
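This correspondence can be checked numerically. The following Python sketch is an illustrative calculation, not taken from the cited literature; the cutoff <math>N = 10^6</math> is an arbitrary choice. It accumulates the Zipf ({{mvar|s}} = 1) probability mass falling on each leading digit and compares the result with the Benford probabilities, which agree approximately (and better as the cutoff grows):

<syntaxhighlight lang="python">
from collections import Counter
from math import log10

# Weight each integer 1..N by its Zipf (s = 1) mass 1/k and
# accumulate that mass on the integer's leading digit.
N = 10**6
digit_mass = Counter()
for k in range(1, N + 1):
    digit_mass[int(str(k)[0])] += 1.0 / k
total = sum(digit_mass.values())  # the harmonic number H_N

for d in range(1, 10):
    zipf_p = digit_mass[d] / total
    benford_p = log10(d + 1) - log10(d)
    print(f"digit {d}:  Zipf mass {zipf_p:.4f}   Benford {benford_p:.4f}")
</syntaxhighlight>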
==Occurrences==

===City sizes===
Following Auerbach's 1913 observation, there has been substantial examination of Zipf's law for city sizes. However, more recent empirical and theoretical studies have challenged the relevance of Zipf's law for cities.

===Word frequencies in natural languages===
[[Image:Wikipedia-n-zipf.png|thumb|A log-log plot of word frequency in Wikipedia (November 27, 2006). The most popular words are "the", "of" and "and", as expected. Zipf's law corresponds to the middle linear portion of the curve, roughly following the green (1/''x'') line, while the early part is closer to the magenta (1/''x''<sup>0.5</sup>) line and the later part is closer to the cyan (1/(''k'' + ''x'')<sup>2.0</sup>) line. These lines correspond to three distinct parameterizations of the Zipf–Mandelbrot distribution, overall a [[broken power law]] with three segments: a head, middle, and tail.]]

In many texts in human languages, word frequencies approximately follow a Zipf distribution with exponent {{mvar|s}} close to 1: that is, the most common word occurs about ''n'' times as often as the ''n''th most common one. The law cannot hold exactly, because words must occur an integer number of times, and the approximation gets worse as the rank approaches {{mvar|N}}. Analysis of a corpus of 30,000 English texts showed that only about 15% of the texts have a good fit to Zipf's law. Slight variations in the definition of Zipf's law can increase this percentage to close to 50%.

In large corpora, the observed frequency-rank relation can be modelled more accurately by separate Zipf–Mandelbrot distributions for different subsets or subtypes of words. In particular, the frequencies of the closed class of [[functional word]]s in English are better described with {{mvar|s}} lower than 1, while open-ended vocabulary growth with document size and corpus size requires {{mvar|s}} greater than 1 for convergence of the [[harmonic series (mathematics)|generalized harmonic series]].

Zipf's law has been used for the extraction of parallel fragments of texts out of comparable corpora. It has also been used in the [[search for extraterrestrial intelligence]], and in the analysis of texts for authorship attribution. The word-like sign groups of the 15th-century codex [[Voynich manuscript|Voynich Manuscript]] have been found to satisfy Zipf's law, suggesting that the text is most likely not a hoax but rather written in an obscure language or cipher.

===Information theory===
In [[information theory]], a symbol (event, signal) of probability <math>p</math> contains <math>-\log_2(p)</math> [[bit]]s of information. Hence, Zipf's law for natural numbers, <math>\Pr(x) \approx 1/x</math>, is equivalent to the number <math>x</math> containing <math>\log_2(x)</math> bits of information. To add information from a symbol of probability <math>p</math> to information already stored in a natural number <math>x</math>, we should go to <math>x'</math> such that <math>\log_2(x') \approx \log_2(x) + \log_2(1/p)</math>, or equivalently <math>x' \approx x/p</math>. For instance, in the standard binary system we would have <math>x' = 2x + s</math>, which is optimal for the probability distribution <math>\Pr(s=0) = \Pr(s=1) = 1/2</math>.
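As a concrete illustration, the following Python sketch is a toy example, assuming an initial state of 1; it is not an implementation of any particular coder. It applies the binary update <math>x' = 2x + s</math> to append uniformly distributed symbols to the state, and inverts the update to read them back:

<syntaxhighlight lang="python">
def encode(bits, x=1):
    """Append bits to the state; log2(x) grows by about 1 per symbol."""
    for s in bits:
        x = 2 * x + s
    return x

def decode(x, n):
    """Undo x' = 2x + s n times; symbols come back in reverse order."""
    bits = []
    for _ in range(n):
        x, s = divmod(x, 2)
        bits.append(s)
    return bits[::-1], x

state = encode([1, 0, 1, 1])
print(state)             # 27, i.e. binary 11011: the initial 1 followed by the bits
print(decode(state, 4))  # ([1, 0, 1, 1], 1): the original symbols and initial state
</syntaxhighlight>

Replacing this uniform update with one approximating <math>x' \approx x/p</math> for unequal symbol probabilities is what the construction described next makes exact.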
Using the <math>x' \approx x/p</math> rule for a general probability distribution is the basis of the [[asymmetric numeral system]]s family of [[entropy coding]] methods used in [[data compression]], whose state distribution is also governed by Zipf's law.

==See also==
{{div col|colwidth=20em}}
* [[1% rule (Internet culture)]]
* [[Benford's law]]
* [[Bradford's law]]
* [[Brevity law]]
* [[Demographic gravitation]]
* [[Frequency list]]
* [[Gibrat's law]]
* [[Hapax legomenon]]
* [[Heaps' law]]
* [[King effect]]
* [[Long tail]]
* [[Lorenz curve]]
* [[Lotka's law]]
* [[Menzerath's law]]
* [[Pareto distribution]]
* [[Pareto principle]], a.k.a. the "80–20 rule"
* [[Price's law]]
* [[Principle of least effort]]
* [[Rank-size distribution]]
* [[Stigler's law of eponymy]]
{{Div col end}}

==References==
* Auerbach, F. (1913). Das Gesetz der Bevölkerungskonzentration. ''Petermann's Geographische Mitteilungen'' 59, 74–76.
* George K. Zipf (1935): ''The Psychobiology of Language''. Houghton-Mifflin.
* {{Cite journal |last=Zipf |first=George Kingsley |date=1942 |title=The Unity of Nature, Least-Action, and Natural Social Science |url=https://www.jstor.org/stable/2784953 |journal=Sociometry |volume=5 |issue=1 |pages=48–62 |doi=10.2307/2784953 |jstor=2784953 |issn=0038-0431}}
* {{cite book| author = George K. Zipf | title = Human Behavior and the Principle of Least Effort| location = Cambridge, Massachusetts | publisher = Addison-Wesley | year =1949 | page = 1 | url = https://archive.org/details/in.ernet.dli.2015.90211}}
* {{cite journal| author = Belevitch V | title = On the statistical laws of linguistic distributions | journal = Annales de la Société Scientifique de Bruxelles | volume = 73 | series = I | date = 18 December 1959 | pages = 310–326 |url = http://www.csl.sri.com/users/neumann/belevitch.pdf }}
* [[Léon Brillouin]], ''La science et la théorie de l'information'', 1959; reissued 1988; English translation reissued 2004.
* {{cite journal |author=Wentian Li |title=Random Texts Exhibit Zipf's-Law-Like Word Frequency Distribution |journal=[[IEEE Transactions on Information Theory]] |volume=38 |issue=6 |year=1992 |pages=1842–1845 |doi=10.1109/18.165464 |citeseerx=10.1.1.164.8422 }}
* {{cite book|title=Univariate Discrete Distributions|edition=second|year=1992|author1=N. L. Johnson |author2=S. Kotz |author3=A. W. Kemp |name-list-style=amp |publisher=John Wiley & Sons, Inc.|location=New York|isbn=978-0-471-54897-3}}, p. 466.
* {{cite conference| last = Powers | first = David M W| url=http://aclweb.org/anthology/W98-1218 |title=Applications and explanations of Zipf's law| year = 1998| conference = Joint conference on new methods in language processing and computational natural language learning| pages = 151–160| publisher = Association for Computational Linguistics }}
* {{Cite Q | Q58629995 }}
* Christopher D. Manning, Hinrich Schütze, ''Foundations of Statistical Natural Language Processing'', MIT Press (1999), {{isbn|978-0-262-13360-9}}, p. 24.
* {{Cite journal|last=Gabaix|first=Xavier|date=1999|title=Zipf's Law for Cities: An Explanation|url=https://www.jstor.org/stable/2586883|journal=The Quarterly Journal of Economics|volume=114|issue=3|pages=739–767|doi=10.1162/003355399556133|jstor=2586883|issn=0033-5533}}
* {{cite report |author=Adamic, Lada A.
|year=2000 |title=Zipf, power-laws, and Pareto - a ranking tutorial |publisher=[[Hewlett-Packard]] Company |url=http://www.hpl.hp.com/research/idl/papers/ranking/ranking.html |archive-url=https://web.archive.org/web/20071026062626/http://www.hpl.hp.com/research/idl/papers/ranking/ranking.html |archive-date=2007-10-26 }} {{cite web |title=originally published |website=www.parc.xerox.com |publisher=[[Xerox Corporation]] |url=http://www.parc.xerox.com/istl/groups/iea/papers/ranking/ranking.html}}
* {{cite journal |first1=L. |last1=Pietronero |first2=E. |last2=Tosatti |first3=V. |last3=Tosatti |first4=A. |last4=Vespignani |date=2001 |title=Explaining the uneven distribution of numbers in nature: The laws of Benford and Zipf |journal=[[Physica A]] |volume=293 |issue=1–2 |pages=297–304 |doi=10.1016/S0378-4371(00)00633-6|bibcode=2001PhyA..293..297P }}
* {{cite journal |author1 = Ramon Ferrer i Cancho |author2 = Ricard V. Sole | name-list-style = amp |year= 2003 |title = Least effort and the origins of scaling in human language |journal= [[Proceedings of the National Academy of Sciences of the United States of America]] |volume= 100 |pages= 788–791 |issue= 3 |doi= 10.1073/pnas.0335980100 |pmid= 12540826 |pmc= 298679 |bibcode = 2003PNAS..100..788C |doi-access = free}}
* {{cite web |url=http://home.zonnet.nl/galien8/factor/factor.html |title=Factorial randomness: the Laws of Benford and Zipf with respect to the first digit distribution of the factor sequence from the natural numbers |author=Johan Gerard van der Galien |date=2003-11-08 |access-date=8 July 2016 |archive-url=https://web.archive.org/web/20070305150334/http://home.zonnet.nl/galien8/factor/factor.html |archive-date=2007-03-05}}
* {{cite arXiv |last=Zanette |first=Damián H. |eprint=cs/0406015 |title=Zipf's law and the creation of musical context |date=June 7, 2004}}
* {{cite journal |first=Ali |last=Eftekhari |date=2006 |title=Fractal geometry of texts: An initial application to the works of Shakespeare |journal=Journal of Quantitative Linguistics |volume=13 |issue=2–3 |pages=177–193 |doi=10.1080/09296170600850106|s2cid=17657731 }}
* {{Cite journal|last1=Gan|first1=Li|last2=Li|first2=Dong|last3=Song|first3=Shunfeng|date=2006-08-01|title=Is the Zipf law spurious in explaining city-size distributions?|url=https://www.sciencedirect.com/science/article/pii/S0165176506000772|journal=Economics Letters|language=en|volume=92|issue=2|pages=256–262|doi=10.1016/j.econlet.2006.03.004|issn=0165-1765}}
* {{cite conference |author1=Bill Manaris |author2=Luca Pellicoro |author3=George Pothering |author4=Harland Hodges |title=Investigating Esperanto's statistical proportions relative to other languages using neural networks and Zipf's law |url=http://www.cs.cofc.edu/~manaris/uploads/Main/IASTED2006.pdf |journal=[[Artificial Intelligence and Applications]] |date=13 February 2006 |location=Innsbruck, Austria |pages=102–108 |url-status=dead |archive-url=https://web.archive.org/web/20160305040450/http://www.cs.cofc.edu/~manaris/uploads/Main/IASTED2006.pdf |archive-date=5 March 2016 }}
* {{citation|contribution=An introduction to textual econometrics|first1=Stephen|last1=Fagan|first2=Ramazan|last2=Gençay|pages=133–153|title=Handbook of Empirical Economics and Finance|editor1-first=Aman|editor1-last=Ullah|editor2-first=David E. A.|editor2-last=Giles|publisher=CRC Press|year=2010|isbn=9781420070361}}. [https://books.google.com/books?hl=en&lr=&id=QAUv9R6bJzwC&oi=fnd&pg=PA139 P. 139]: "For example, in the Brown Corpus, consisting of over one million words, half of the word volume consists of repeated uses of only 135 words."
139]: "For example, in the Brown Corpus, consisting of over one million words, half of the word volume consists of repeated uses of only 135 words." [[Peter G. Neumann|Neumann, Peter G.]] [http://www.csl.sri.com/users/neumann/#12a "Statistical metalinguistics and Zipf/Pareto/Mandelbrot"], ''SRI International Computer Science Laboratory'', accessed and [https://web.archive.org/web/20110605012951/http://www.csl.sri.com/users/neumann/ archived] 29 May 2011. {{Cite journal|last1=Montemurro|first1=Marcelo A.|last2=Zanette|first2=Damián H.|date=2013-06-21|title=Keywords and Co-Occurrence Patterns in the Voynich Manuscript: An Information-Theoretic Analysis|journal=PLOS ONE|volume=8|issue=6|pages=e66344|doi=10.1371/journal.pone.0066344|issn=1932-6203|pmc=3689824|pmid=23805215|bibcode=2013PLoSO...866344M |doi-access=free}} {{cite journal |last1=Piantadosi |first1=Steven |date=March 25, 2014 |title=Zipf's word frequency law in natural language: A critical review and future directions |journal=Psychon Bull Rev |volume=21 |issue=5 |pages=1112–1130 |doi=10.3758/s13423-014-0585-6 |pmid=24664880 |pmc=4176592}} {{cite arXiv |title=Scaling laws in human speech, decreasing emergence of new words and a generalized model |eprint = 1412.4846|last1 = Lin|first1 = Ruokuang|last2 = Ma|first2 = Qianli D. Y.|last3 = Bian|first3 = Chunhua|class = cs.CL|year = 2014}} M. Eriksson, S.M. Hasibur Rahman, F. Fraille, M. Sjöström, [http://apachepersonal.miun.se/~mageri/myresearch/bmsb2013-Eriksson.pdf Efficient Interactive Multicast over DVB-T2 - Utilizing Dynamic SFNs and PARPS] {{webarchive|url=https://web.archive.org/web/20140502183246/http://apachepersonal.miun.se/~mageri/myresearch/bmsb2013-Eriksson.pdf |date=2014-05-02}}, 2013 IEEE International Conference on Computer and Information Technology (BMSB'13), London, UK, June 2013. Suggests a heterogeneous Zipf-law TV channel-selection model {{cite journal |title=Test of two hypotheses explaining the size of populations in a system of cities |journal = Journal of Applied Statistics|volume = 42|issue = 12|pages = 2686–2693|arxiv = 1506.08535|doi = 10.1080/02664763.2015.1047744|year = 2015|last1 = Vitanov|first1 = Nikolay K.|last2 = Ausloos|first2 = Marcel|last3 = Bian|first3 = Chunhua|bibcode = 2015arXiv150608535V|s2cid = 10599428}} {{Cite journal|last1=Doyle|first1=Laurance R.|last2=Mao|first2=Tianhua|date=2016-11-18|title=Why Alien Language Would Stand Out Among All the Noise of the Universe|url=http://cosmos.nautil.us/feature/54/listening-for-extraterrestrial-blah-blah|journal=[[Nautilus Quarterly]]|language=en}} {{cite conference |url=https://comparable.limsi.fr/bucc2016/pdf/BUCC04.pdf |title=Parallel Document Identification using Zipf's Law |last1=Mohammadi |first1=Mehdi |date=2016 |book-title=Proceedings of the Ninth Workshop on Building and Using Comparable Corpora |pages=21–25 |location=Portorož, Slovenia |conference=LREC 2016 |url-status=live |archive-url=https://web.archive.org/web/20180323154706/https://comparable.limsi.fr/bucc2016/pdf/BUCC04.pdf |archive-date=2018-03-23 }} {{cite journal |last1 = Moreno-Sánchez |first1 = I. |last2 = Font-Clos |first2 = F. |last3 = Corral |first3 = A. 
|year = 2016 |title = Large-scale analysis of Zipf's Law in English texts |journal = PLOS ONE |volume = 11 |issue = 1 |page = e0147073 |doi = 10.1371/journal.pone.0147073 |doi-access = free |pmid =26800025 |pmc = 4723055 |arxiv = 1509.04486 |bibcode = 2016PLoSO..1147073M }}
* {{Cite journal|last1=Arshad|first1=Sidra|last2=Hu|first2=Shougeng|last3=Ashraf|first3=Badar Nadeem|date=2018-02-15|title=Zipf's law and city size distribution: A survey of the literature and future research agenda|url=https://www.sciencedirect.com/science/article/pii/S0378437117310130|journal=Physica A: Statistical Mechanics and Its Applications|language=en|volume=492|pages=75–92|doi=10.1016/j.physa.2017.10.005|bibcode=2018PhyA..492...75A|issn=0378-4371}}
* {{Cite web|last=Boyle|first=Rebecca|title=Mystery text's language-like patterns may be an elaborate hoax|url=https://www.newscientist.com/article/2106915-mystery-texts-language-like-patterns-may-be-an-elaborate-hoax/|access-date=2022-02-25|website=New Scientist|language=en-US}}
* {{Cite journal|last1=Verbavatz|first1=Vincent|last2=Barthelemy|first2=Marc|date=November 2020|title=The growth equation of cities|url=https://www.nature.com/articles/s41586-020-2900-x|journal=Nature|language=en|volume=587|issue=7834|pages=397–401|doi=10.1038/s41586-020-2900-x |pmid=33208958|arxiv=2011.09403|bibcode=2020Natur.587..397V|s2cid=227012701|issn=1476-4687}}
* {{cite journal | author1 = Ricardo T. Fernholz | author2 = Robert Fernholz | title = Zipf's law for atlas models | journal = Journal of Applied Probability | volume = 57 | issue = 4 | date = December 2020 | pages = 1276–1297 | doi = 10.1017/jpr.2020.64 | s2cid = 146808080 | url = https://www.cambridge.org/core/journals/journal-of-applied-probability/article/abs/zipfs-law-for-atlas-models/5D6B730DDEE4C05CF494213FDA57B064 }}
* {{Cite book|last=Kershenbaum|first=Arik|title=The Zoologist's Guide to the Galaxy: What Animals on Earth Reveal About Aliens--and Ourselves|title-link=The Zoologist's Guide to the Galaxy|date=2021-03-16|publisher=Penguin|isbn=978-1-9848-8197-7|pages=251–256|language=en|oclc=1242873084|author-link=Arik Kershenbaum}}
* {{cite journal| doi = 10.1101/2021.06.16.448706| pages = 2021–06.16.448706| last1 = Lazzardi| first1 = Silvia| last2 = Valle| first2 = Filippo| last3 = Mazzolini| first3 = Andrea| last4 = Scialdone| first4 = Antonio| last5 = Caselle| first5 = Michele| last6 = Osella| first6 = Matteo| title = Emergent Statistical Laws in Single-Cell Transcriptomic Data| journal = bioRxiv| accessdate = 2021-06-18| date = 2021-06-17| s2cid = 235482777| url = https://www.biorxiv.org/content/10.1101/2021.06.16.448706v1}}
* {{cite journal | author1 = Terence Tao | title = E Pluribus Unum: From Complexity, Universality | journal = Daedalus | volume = 141 | issue = 3 | date = 2012 | pages = 23–34 | doi = 10.1162/DAED_a_00158 | s2cid = 14535989 | url = https://direct.mit.edu/daed/article/141/3/23/27037/E-pluribus-unum-From-Complexity-Universality }}
* Frans J. Van Droogenbroeck (2016): [https://www.academia.edu/24147736/ Handling the Zipf distribution in computerized authorship attribution]
* Frans J. Van Droogenbroeck (2019): [https://www.academia.edu/40029629 An essential rephrasing of the Zipf-Mandelbrot law to solve authorship attribution applications by Gaussian statistics]
* Axtell, Robert L (2001): [https://www.science.org/doi/abs/10.1126/science.1062081 Zipf distribution of US firm sizes], Science, 293, 5536, 1818, American Association for the Advancement of Science.
* Ramu Chenna, Toby Gibson; [http://www.worldcomp-proceedings.com/proc/p2011/BIC4329.pdf Evaluation of the Suitability of a Zipfian Gap Model for Pairwise Sequence Alignment], International Conference on Bioinformatics Computational Biology: 2011.

==Further reading==
* Alexander Gelbukh and Grigori Sidorov (2001) [http://www.gelbukh.com/CV/Publications/2001/CICLing-2001-Zipf.htm "Zipf and Heaps Laws’ Coefficients Depend on Language"]. Proc. [[CICLing]]-2001, ''Conference on Intelligent Text Processing and Computational Linguistics'', February 18–24, 2001, Mexico City. Lecture Notes in Computer Science N 2004, {{ISSN|0302-9743}}, {{isbn|3-540-41687-0}}, Springer-Verlag: 332–335.
* Kali R. (2003) "The city as a giant component: a random graph approach to Zipf's law," ''Applied Economics Letters 10'': 717–720(4).
* Shyklo A. (2017); [https://ssrn.com/abstract=2918642 Simple Explanation of Zipf's Mystery via New Rank-Share Distribution, Derived from Combinatorics of the Ranking Process], Available at SSRN: https://ssrn.com/abstract=2918642.

==External links==
{{Library resources box}}
{{Commons category}}
* {{Cite news | last = Strogatz | first = Steven | author-link = Steven Strogatz | title = Guest Column: Math and the City | date = 2009-05-29 | url = http://judson.blogs.nytimes.com/2009/05/19/math-and-the-city/ | access-date = 2009-05-29 | work = The New York Times | archive-date = 2015-09-27 | archive-url = https://web.archive.org/web/20150927204318/http://judson.blogs.nytimes.com/2009/05/19/math-and-the-city/ | url-status = dead }}—An article on Zipf's law applied to city populations
* [https://www.theatlantic.com/issues/2002/04/rauch.htm Seeing Around Corners (Artificial societies turn up Zipf's law)]
* [https://web.archive.org/web/20021018011011/http://planetmath.org/encyclopedia/ZipfsLaw.html PlanetMath article on Zipf's law]
* [http://www.hubbertpeak.com/laherrere/fractal.htm Distributions de type "fractal parabolique" dans la Nature (French, with English summary)] {{Webarchive|url=https://web.archive.org/web/20041024144850/http://www.hubbertpeak.com/laherrere/fractal.htm |date=2004-10-24 }}
* [https://www.newscientist.com/article.ns?id=mg18524904.300 An analysis of income distribution]
* [http://www.lexique.org/listes/liste_mots.txt Zipf List of French words] {{Webarchive|url=https://web.archive.org/web/20070623154627/http://www.lexique.org/listes/liste_mots.txt |date=2007-06-23 }}
* [http://1.1o1.in/en/webtools/semantic-depth Zipf list for English, French, Spanish, Italian, Swedish, Icelandic, Latin, Portuguese and Finnish from Gutenberg Project and online calculator to rank words in texts] {{Webarchive|url=https://web.archive.org/web/20110408115104/http://1.1o1.in/en/webtools/semantic-depth |date=2011-04-08 }}
* [https://arxiv.org/abs/physics/9901035 Citations and the Zipf–Mandelbrot's law]
* [http://www.geoffkirby.co.uk/ZIPFSLAW.pdf Zipf's Law examples and modelling (1985)]
* [http://www.nature.com/nature/journal/v474/n7350/full/474164a.html Complex systems: Unzipping Zipf's law (2011)]
* [http://terrytao.wordpress.com/2009/07/03/benfords-law-zipfs-law-and-the-pareto-distribution/ Benford's law, Zipf's law, and the Pareto distribution] by Terence Tao.
* {{springer|title=Zipf law|id=p/z130110}}

{{ProbDistributions|discrete-finite}}
{{Authority control}}

[[Category:Discrete distributions]]
[[Category:Computational linguistics]]
[[Category:Power laws]]
[[Category:Statistical laws]]
[[Category:Empirical laws]]
[[Category:Eponyms]]
[[Category:Tails of probability distributions]]
[[Category:Quantitative linguistics]]
[[Category:Bibliometrics]]
[[Category:Corpus linguistics]]
[[Category:1949 introductions]]