Binary search algorithm
Visualization of the binary search algorithm
| Class | Search algorithm |
|---|---|
| Data structure | Array |
| Worst case performance | O(log n) |
| Best case performance | O(1) |
| Average case performance | O(log n) |
| Worst case space complexity | O(1) |
In computer science, binary search, also known as half-interval search[1] or logarithmic search[2], is a search algorithm that finds the position of a target value within a sorted array.[3][4] It works by comparing the target value to the middle element of the array; if they are not equal, the lower or upper half of the array is eliminated depending on the result and the search is repeated until the position of the target value is found.
Binary search can be described intuitively as a recursive algorithm; however, it is often implemented iteratively, keeping track of the bounds of the search with two pointers. Binary search is efficient for sorted arrays that are stored contiguously (close together) in memory,[5] making O(log n) comparisons, where n is the number of elements in the array. Despite its simplicity, a precise implementation of binary search can be difficult.
A binary search tree (BST) is a type of tree data structure inspired by binary search; values are inserted into a BST by comparing them with values already in the tree, allowing them to be searched by comparisons.[6] The binary and BST search algorithms are dichotomic and divide and conquer, with two choices to be made per step.[7]
Algorithm
Binary search only works on sorted arrays. A binary search begins by comparing the middle element of the array with the target value. If the target value matches the middle element, its position in the array is returned. If the target value is less than or greater than the middle element, the search continues in the lower or upper half of the array, respectively, with a new middle element, eliminating the other half from consideration.[8]
Procedure
The logical procedure for a basic binary search is as follows: given an array A of n elements with values A_1, ..., A_n and target value T, the following subroutine finds the index of T in A.[8]
1. Set L to 1 and R to n.
2. If L > R, the search terminates as unsuccessful.
3. Set m, the position of the middle element, to the floor of (L + R) / 2.
4. If A_m = T, the search is done; return m.
5. If A_m < T, set L to m + 1 and go to step 2.
6. If A_m > T, set R to m − 1 and go to step 2.
This is the inclusive bounds version. The iterative procedure keeps track of the search boundaries via two variables; a recursive version would keep its boundaries in its recursive calls. Some implementations may place the comparison for equality at the end, resulting in a faster comparison loop but costing one more iteration on average; this is more efficient for large n. The array may also contain records containing other information instead of merely values, which can be exploited to implement an associative array, where binary search searches for the target key; however, hash tables are usually superior to such a scheme.
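The iterative procedure with inclusive bounds can be sketched in Python as follows (zero-based indexing, so the bounds start at 0 and n − 1 rather than 1 and n; names are illustrative):

```python
def binary_search(a, target):
    """Return an index of target in sorted list a, or -1 if absent."""
    lo, hi = 0, len(a) - 1          # inclusive bounds: the span is [lo, hi]
    while lo <= hi:                 # an empty span means the search failed
        mid = (lo + hi) // 2        # middle element of the current span
        if a[mid] == target:
            return mid              # found: return its position
        elif a[mid] < target:
            lo = mid + 1            # discard the lower half
        else:
            hi = mid - 1            # discard the upper half
    return -1
```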
Exclusive or inclusive bounds
In the exclusive bound form, the span to be searched is (L+1) to (R−1); this may seem clumsy when the span could equally be described in the "inclusive" form, as L to R. Although the details differ, the two forms are equivalent, as can be seen by transforming one version into the other. The inclusive bound form can be obtained by replacing all appearances of "L" by "(L−1)" and "R" by "(R+1)" and then rearranging. Thus, the initialisation L := 0 becomes (L−1) := 0, i.e. L := 1, and R := N+1 becomes (R+1) := N+1, i.e. R := N. Likewise, the changes to L and R no longer simply transfer the value of m to L or R as appropriate: they become (R+1) := m, i.e. R := m−1, and (L−1) := m, i.e. L := m+1.
Another common variation uses inclusive bounds for the left bound, but exclusive bounds for the right bound. This is derived from the fact that the bounds in a language with zero-based arrays can be simply initialized to 0 and the size of the array, respectively.[9]
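This mixed variant can be sketched in Python (the span is half-open, [lo, hi), initialized to 0 and the array length; names are illustrative):

```python
def binary_search_half_open(a, target):
    """Search sorted list a using a half-open span [lo, hi)."""
    lo, hi = 0, len(a)              # left bound inclusive, right bound exclusive
    while lo < hi:                  # the span is non-empty while lo < hi
        mid = (lo + hi) // 2
        if a[mid] == target:
            return mid
        elif a[mid] < target:
            lo = mid + 1            # lo stays inclusive
        else:
            hi = mid                # mid itself is excluded from the new span
    return -1
```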
Performance

The performance of binary search can be analyzed by reducing the procedure to a binary comparison tree, where the root node is the middle element of the array; the middle element of the lower half is left of the root and the middle element of the upper half is right of the root. The rest of the tree is built in a similar fashion. This model represents binary search; starting from the root node, the left subtree is traversed if the target value is less than the node under consideration and vice versa, representing the successive elimination of elements.
The worst case is reached when the search reaches the deepest level of the tree; this is equivalent to a binary search where the element is found or the search is terminated only after the search has reduced to one element, and it is reached after ⌊log2(n)⌋ + 1 iterations (of the comparison loop), where the ⌊ ⌋ notation denotes the floor function that rounds its argument down to an integer.[10] On average, by the time the search completes, there will be one layer of the tree remaining; this is equivalent to a binary search that has reduced to two elements, and it is reached after about log2(n) − 1 iterations.[5] In the best case, where the first middle element selected is equal to the target value, its position is returned after one iteration.
Each iteration of the binary search algorithm defined above makes either one or two comparisons, for an average of 1.5 comparisons; by moving the check for equality to the end, one comparison is eliminated, decreasing the average number of comparisons by 0.5.[11] In practice, for all but large n, this is less efficient than traditional binary search, as it decreases the time taken per iteration only slightly on most computers, while guaranteeing that the search takes the maximum number of iterations, on average adding one iteration to the search. For small n, the slight increase in comparison loop efficiency does not compensate for the extra iteration.[12]
Binary search can interact poorly with the memory hierarchy (i.e. caching), because of its random-access nature. For in-memory searching, if the span to be searched is small, a linear search may have superior performance simply because it exhibits better locality of reference. Due to this, it is usually more efficient to convert an array to a B-tree or similar data structure where feasible and use their respective binary search-like algorithms for searching.[13]
When multiple binary searches are to be performed for the same key in related lists, fractional cascading can be used to speed up successive searches after the first one.
In theory, binary search is usually faster than linear search (which searches the array by looking at every element), but in practice that may not hold true. For an unsorted array, the cost of sorting the array may outweigh the efficiency of binary search. In addition, linear search is simpler than binary search; the extra steps required in binary search may not justify its use for small arrays.[14]
Binary search versus other schemes
For associative arrays
Especially for implementing associative arrays, hash tables (a data structure that maps keys to records using a hash function) can be preferable to binary search on sorted records.[15]
A search for the associated record that corresponds to a key in hash tables is usually more efficient than the equivalent binary search, requiring amortized constant time on average, at the cost of slightly more space. The hash function may sometimes map two keys to the same index in the array, a phenomenon known as a hash collision. There exist various insertion algorithms to resolve collisions. However, for most of these techniques, additional time is required for a search, in some cases rendering it worse than binary search.[16] However, it is possible to search a hash table in amortized constant time in all cases, but with a significant cost in space.[17]
Thus, for random, exact accesses of the record associated with a key, hash tables are superior in performance to binary searches of a sorted array. However, hashing is not useful for approximate matches, such as the next-smallest, nearest, or next-largest key,[18] while binary search can do such matches in logarithmic time, as they are trivial after the position of the key the matches are relative to is known. Although there exist other data structures (such as van Emde Boas trees)[a] with sometimes faster approximate matching algorithms, many of them consume large amounts of memory and become inefficient when the range of the set of keys is large. An approximate match can be computed using binary search in at worst linearithmic time regardless of any feature of the keys.[20]
Variations
Uniform binary search
A variation of the algorithm forgoes the lower and upper pointers or variables, instead storing the index of the middle element and the width of the search; i.e., the number of elements around the middle element that have not been eliminated yet. Each step reduces the width by about half. This algorithm is called the uniform binary search because the difference between the indices of middle elements and the preceding middle elements chosen remains constant between searches of arrays of the same length.[21]
A lookup table containing the widths at each step of the search can be calculated in advance, eliminating the need to keep track of a middle element as it can be calculated using the table.[22]
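A possible Python sketch of the idea follows. The width table here uses delta[j] = ⌊(n + 2^j) / 2^(j+1)⌋, one common formulation; the names and the handling of out-of-range probes (treated as automatic mismatches) are illustrative choices, not the only way to write it:

```python
def uniform_binary_search(a, key):
    """Search sorted list a using a precomputed table of search widths."""
    n = len(a)
    if n == 0:
        return -1
    # Precompute the widths; each entry is roughly half the previous one,
    # ending with a zero sentinel that signals an exhausted search.
    delta, j = [], 0
    while True:
        d = (n + (1 << j)) >> (j + 1)
        delta.append(d)
        if d == 0:
            break
        j += 1
    i, j = delta[0], 1              # i is a 1-based probe position
    while True:
        if 1 <= i <= n and a[i - 1] == key:
            return i - 1            # zero-based index of the match
        if delta[j] == 0:
            return -1               # widths exhausted: key is absent
        # Move left if the key is smaller (or the probe ran past the end),
        # right otherwise (including probes that ran off the front).
        if i > n or (i >= 1 and key < a[i - 1]):
            i -= delta[j]
        else:
            i += delta[j]
        j += 1
```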
Fibonaccian search
Fibonaccian search exploits the Fibonacci numbers to eliminate division from the algorithm, instead using addition and subtraction; this is faster on some computers. It works like the uniform binary search, except it bases the width of the search on the Fibonacci numbers. This search is theoretically less efficient in the worst case than traditional uniform binary search, although more efficient in the average case.[23]
Exponential search
Exponential search is an algorithm primarily for searching in infinite lists, but it can also be applied to select the upper bound for a binary search. It starts by finding the first element with index 2^j that exceeds the target value, where j starts at zero and increases by one per iteration. Afterwards, it sets that index as the upper bound and switches to binary search. This is efficient if the target value is located near the beginning of the array, taking ⌊log2(x)⌋ + 1 iterations of the exponential search and at most ⌊log2(x)⌋ iterations of the binary search, where x is the position of the target value.[24]
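A short Python sketch of this scheme, doubling the probe index and then delegating the final step to the standard library's `bisect` (function name is illustrative):

```python
import bisect

def exponential_search(a, target):
    """Search sorted list a by doubling a bound, then binary searching."""
    if not a:
        return -1
    bound = 1
    while bound < len(a) and a[bound] < target:
        bound *= 2                  # probe indices 1, 2, 4, 8, ...
    lo = bound // 2                 # last probe that did not pass the target
    hi = min(bound, len(a) - 1)     # first probe at or past it (clamped)
    i = bisect.bisect_left(a, target, lo, hi + 1)
    return i if i < len(a) and a[i] == target else -1
```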
Interpolation search
Instead of merely calculating the midpoint, interpolation search attempts to calculate the position of the target value, taking into account the lowest and highest elements in the array and the length of the array. It works on the basis that the midpoint is not the best guess in many cases; for example, if the target value is close to the highest element in the array, it is likely to be located near the end of the array. It is theoretically efficient for arrays whose elements successively increase by a constant or close to constant amount, making about log2(log2(n)) comparisons in the average case.
In practice, interpolation search is faster than binary search only when the search space of the array is large, as the time difference between interpolation and binary search does not compensate for the extra computation required for small arrays. It is most successful as a precursor to binary search, eliminating elements while the search space is large and switching to binary search afterwards.[25]
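The position estimate can be sketched in Python as follows (numeric keys assumed, so the probe can interpolate linearly between the endpoint values; names are illustrative):

```python
def interpolation_search(a, target):
    """Search sorted list a of numbers by interpolating the probe position."""
    lo, hi = 0, len(a) - 1
    while lo <= hi and a[lo] <= target <= a[hi]:
        if a[hi] == a[lo]:          # all remaining keys are equal
            return lo if a[lo] == target else -1
        # Estimate where target falls between a[lo] and a[hi].
        pos = lo + (target - a[lo]) * (hi - lo) // (a[hi] - a[lo])
        if a[pos] == target:
            return pos
        elif a[pos] < target:
            lo = pos + 1
        else:
            hi = pos - 1
    return -1
```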
Equality test at end of program
Binary search is dichotomic, meaning that there are two choices to be made per step, but only after the target element is compared with the middle element. This comparison can be moved to the end of the program, after the search is reduced to one element. This approach allows the comparison loop to save on average 0.5 comparisons per iteration as it would not have to check for equality;[11] however, it foregoes the possibility of early termination on discovery of a match. In practice, since the comparison loop is performed only log n times and only slightly decreases the time required to perform the loop, it is slower than traditional binary search for all but large n.[12]
Its main advantage is that if the elements are not unique, it is guaranteed to return the largest index where the element is equal to the target value. The traditional version would return the first match it found, and that match is not guaranteed to be the largest-indexed match.[26]
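A Python sketch of this variant: the loop shrinks the span without ever testing for equality, a single comparison after the loop decides, and ties resolve to the largest index (names are illustrative):

```python
def binary_search_rightmost(a, target):
    """Return the largest index of target in sorted list a, or -1."""
    if not a:
        return -1
    lo, hi = 0, len(a) - 1
    while lo < hi:
        mid = (lo + hi + 1) // 2    # round up so the span always shrinks
        if a[mid] > target:
            hi = mid - 1            # everything from mid upward is too large
        else:
            lo = mid                # a[mid] <= target: keep mid in play
    return lo if a[lo] == target else -1   # single equality test at the end
```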
History
The first example of a list of numbers sorted to allow for faster searching dates back to the Babylonian Inakibit-Anu tablet from c. 200 BCE; it contained a sorted list of about 500 sexagesimal numbers and their reciprocals. The first mention of binary search was made in 1946 by John Mauchly while giving one of the Moore School Lectures, the first ever course regarding any computer-related topic.[12]
Until 1960, no published binary search algorithm worked for arrays of arbitrary length, as opposed to lengths of the form 2^n − 1 for a natural number n; in that year, Derrick Henry Lehmer published, along with the Lehmer code, a binary search algorithm that worked on all arrays.[27] In 1962, Hermann Bottenbruch presented an ALGOL 60 implementation of binary search that placed the comparison for equality at the end, increasing the number of steps by one but reducing to one the number of comparisons per step, as well as ensuring that the position of the rightmost element is returned if the target value is duplicated in the array.[28] The uniform binary search was presented to Donald Knuth in 1971 by A. K. Chandra of Stanford University and published in Knuth's The Art of Computer Programming.[12]
Implementation issues
Although the basic idea of binary search is comparatively straightforward, the details can be surprisingly tricky… — Donald Knuth[2]
When Jon Bentley assigned it as a problem in a course for professional programmers, he found that an astounding ninety percent failed to code a binary search correctly after several hours of working on it,[29] and another study shows that accurate code for it is only found in five out of twenty textbooks.[30] Furthermore, Bentley's own implementation of binary search, published in his 1986 book Programming Pearls, contained an overflow error that remained undetected for over twenty years.[31]
Arithmetic
In a practical implementation, the variables used to represent the indices will often be of finite size, hence only capable of representing a finite range of values. For example, 32-bit unsigned integers can only hold values from 0 to 4294967295. 32-bit signed integers can only hold values from -2147483648 to 2147483647. If the binary search algorithm is to operate on large arrays, this has two implications:
- The values `first − 1` and `last + 1` must both be representable within the finite bounds of the chosen integer type. Continuing the 32-bit unsigned example, the largest value that `last` may take is therefore 4294967294, not 4294967295.[32]
- If the midpoint of the span is calculated as `p := (L + R)/2`, then the value `(L + R)` will exceed the number range if `last` is greater than (for unsigned) 4294967295/2 or (for signed) 2147483647/2 and the search wanders toward the upper end of the search space. This can be avoided by performing the calculation as `p := (R − L)/2 + L`. For example, this bug existed in the Java SDK's `Arrays.binarySearch()` from version 1.2 to 5.0 and was fixed in 6.0.[33] However, if the binary search is over a range of numbers that includes large negative numbers (e.g. if the basic binary search technique is applied in areas other than searching an array), neither `(R + L)/2` nor `(R − L)/2 + L` will always be correct in a language with fixed-width integers.
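Python's integers do not overflow, but the failure mode can be imitated by wrapping intermediate results to 32 bits, as a fixed-width language would; the bound values below are illustrative:

```python
def to_int32(x):
    """Wrap x to a signed 32-bit two's-complement value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x

L, R = 2_000_000_000, 2_100_000_000    # both fit in a signed 32-bit int
bad_mid = to_int32(L + R) // 2         # the sum wraps negative: a bogus index
safe_mid = (R - L) // 2 + L            # intermediate values stay in range
```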
Language support
Many standard libraries provide a way to do a binary search:
- C provides the function `bsearch()` in its standard library.
- C++'s STL provides the algorithm functions `binary_search()`, `lower_bound()`, `upper_bound()` and `equal_range()`.
- Java offers a set of overloaded `binarySearch()` static methods in the classes `Arrays` and `Collections` in the standard `java.util` package for performing binary searches on Java arrays and on `List`s, respectively. They must be arrays of primitives, or the arrays or `List`s must be of a type that implements the `Comparable` interface, or a custom `Comparator` object must be specified.
- Microsoft's .NET Framework 2.0 offers static generic versions of the binary search algorithm in its collection base classes. An example would be `System.Array`'s method `BinarySearch<T>(T[] array, T value)`.
- Python provides the `bisect` module.
- COBOL can perform binary search on internal tables using the `SEARCH ALL` statement.
- Perl can perform a generic binary search using the CPAN module `List::BinarySearch`.[34]
- Ruby's `Array` class has included a `bsearch` method since version 2.0.0.
- Go's `sort` standard library package contains the functions `Search`, `SearchInts`, `SearchFloat64s`, and `SearchStrings`, which implement general binary search, as well as specific implementations for searching slices of integers, floating-point numbers, and strings, respectively.[35]
- For Objective-C, the Cocoa framework provides the `NSArray` `-indexOfObject:inSortedRange:options:usingComparator:` method in Mac OS X 10.6+. Apple's Core Foundation C framework also contains a `CFArrayBSearchValues()` function.
Notes
- ^ Where u is the cardinality of its universe (its set of keys {1 .. u}), a van Emde Boas tree can perform them in O(log2(log2(u))) time.[19]
See also
- Interpolation search, similar method with better average complexity
- Index (information technology), fast 'lookup' using an index to directly select an entry
- Branch table, alternative indexed 'lookup' method for decision making
- Self-balancing binary search tree
- Run-time analysis, illustrating binary search method on machines of differing speeds
- Bisection method, the same idea used to solve equations in the real numbers
References
- ^ Williams, Jr., Louis F. (1975). "A modification to the half-interval search (binary search) method". ACM-SE 14: 96–101. doi:10.1145/503561.503582.
- ^ a b Knuth 1998, §6.2, "Binary search".
- ^ Cormen et al. 2009, p. 39.
- ^ Weisstein, Eric W., "Binary Search", MathWorld.
- ^ a b Flores, Ivan; Madpis, George (1971). "Average binary search length for dense ordered lists". CACM 14 (9): 602–603. doi:10.1145/362663.362752.
- ^ Sedgewick & Wayne 2011, §3.2, "Basic implementation".
- ^ Knuth 1998, §6.2, "A tree representation".
- ^ a b Knuth 1998, §6.2, "Algorithm B".
- ^ Gast, Holger (2015). How to Use Objects: Code and Concepts. Upper Saddle River, NJ: Addison-Wesley Professional. ISBN 978-0321995544.
- ^ Knuth 1998, §6.2, "Further analysis of binary search".
- ^ a b Rolfe, Timothy (1997). "Analytic derivation of comparisons in binary search". ACM SIGNUM Newsletter 32 (4): 15–19. doi:10.1145/289251.289255.
- ^ a b c d Knuth 1998, §6.2, "History and bibliography".
- ^ Morin, Pat. "Memory Layouts for Binary Search". Retrieved 19 March 2016.
- ^ Hester, J. H.; Hirschberg, D. S. (1985). "Self-organizing linear search". ACM Computing Surveys 17 (3): 296. doi:10.1145/5505.5507.
- ^ Knuth 1998, §6.4, "Hashing".
- ^ Knuth 1998, §6.4, "Algorithm L" et seq.
- ^ Dietzfelbinger, Martin; Karlin, Anna; Mehlhorn, Kurt; Meyer auf der Heide, Friedhelm; Rohnert, Hans; Tarjan, Robert E. (August 1994). "Dynamic Perfect Hashing: Upper and Lower Bounds". SIAM Journal on Computing 23 (4): 738–761. doi:10.1137/S0097539791194094.
- ^ Morin, Pat. "Hash Tables" (PDF). p. 1. Retrieved 28 March 2016.
- ^ Cormen et al. 2009, p. 552.
- ^ Beame, Paul; Fich, Faith E. (1999). "Optimal bounds for the predecessor problem". STOC '99: 295–304. doi:10.1145/301250.301323.
- ^ Knuth 1998, §6.2, "An important variation".
- ^ Knuth 1998, §6.2, "Algorithm U".
- ^ Knuth 1998, §6.2, "Fibonaccian search".
- ^ Moffat & Turpin 2002, p. 33.
- ^ Knuth 1998, §6.2, "Interpolation search".
- ^ Wirth, Niklaus (1983), Programming in Modula-2, p. 35
- ^ Lehmer, Derrick (1960). "Teaching combinatorial tricks to a computer". Proceedings of Symposia in Applied Mathematics 10: 180–181.
- ^ Bottenbruch, Hermann (1962). "Structure and Use of ALGOL 60". Journal of the ACM 9 (2): 214.
- ^ Bentley, Jon (2000) [1986]. Programming Pearls (2nd ed.). Addison-Wesley. p. 341. ISBN 0-201-65788-0.
- ^ Pattis, Richard E. (1988). "Textbook errors in binary searching". SIGCSE Bulletin 20: 190–194. doi:10.1145/52965.53012. cited at Kruse, Robert (1998). Data Structures and Program Design in C++. Prentice Hall. p. 280. ISBN 0-13-768995-0.
- ^ Bloch, Joshua (June 2, 2006) [Updated 17 Feb 2008]. "Extra, Extra – Read All About It: Nearly All Binary Searches and Mergesorts are Broken". Google Research Blog.
- ^ Ruggieri, Salvatore (2003). "On computing the semi-sum of two integers" (PDF). Information Processing Letters 87 (2): 67–71. doi:10.1016/S0020-0190(03)00263-1.
- ^ "Bug ID: 5045582 (coll) binarySearch() fails for size larger than 1<<30". Java Bug Database. Oracle. 11 May 2004.
- ^ "List::BinarySearch". CPAN.
- ^ "Package sort". The Go Programming Language.
Bibliography
- Kruse, Robert L.: "Data Structures and Program Design in C++", Prentice-Hall, 1999, ISBN 0-13-768995-0, page 280.
- Knuth, Donald (1998). Sorting and Searching. The Art of Computer Programming 3 (2nd ed.). Reading, MA: Addison-Wesley Professional.
- Sedgewick, Robert; Wayne, Kevin (2011). Algorithms (4th ed.). Upper Saddle River, NJ: Addison-Wesley Professional. ISBN 978-0-321-57351-3.
- Moffat, Alistair; Turpin, Andrew (2002). Compression and Coding Algorithms. Hamburg, Germany: Kluwer Academic Publishers. doi:10.1007/978-1-4615-0935-6. ISBN 978-0-7923-7668-2.
- Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2009) [1990]. Introduction to Algorithms (3rd ed.). MIT Press and McGraw-Hill. ISBN 0-262-03384-4.