On finding by halving
De Inventione per Bisectionem
Enuntiatio
Propositio II. Theorema. Let arr be a sequence of n elements arranged in non-decreasing order, and let target be a sought value. Then there exists a procedure which, inspecting the sequence only by comparison, decides the presence of target and, if present, exhibits an index at which it sits. I assert two things of this procedure. First (correctness): it returns an index i with arr[i] = target whenever target occurs in the sequence, and returns -1 whenever it does not. Second (cost): it halts after no more than ⌊log₂ n⌋ + 1 comparisons of the midpoint, this bound being sharp. The mechanism is bisection — at each step the candidate interval is cut in half, so that the labour grows only as the logarithm of the multitude, and ignorance of a sorted thousand is dispelled in ten strokes.
Expressio
def binary_search(arr, target):
"""Return an index i with arr[i] == target, or -1 if target is absent.
arr must be sorted in non-decreasing order.
>>> binary_search([], 5)
-1
>>> binary_search([42], 42)
0
>>> binary_search([1, 3, 5, 7, 9], 1)
0
>>> binary_search([1, 3, 5, 7, 9], 9)
4
>>> binary_search([1, 3, 5, 7, 9], 5)
2
>>> binary_search([1, 3, 5, 7, 9], 4)
-1
>>> binary_search([1, 3, 5, 7, 9], 10)
-1
>>> binary_search([1, 3, 5, 7, 9], -2)
-1
"""
lo, hi = 0, len(arr) - 1 # closed search interval [lo, hi]
while lo <= hi:
mid = lo + (hi - lo) // 2 # midpoint; no overflow, floor of the half
value = arr[mid]
if value == target:
return mid # found
elif value < target:
lo = mid + 1 # discard the lower half, target lies right
else:
hi = mid - 1 # discard the upper half, target lies left
return -1 # interval emptied: target absent
if __name__ == "__main__":
import doctest
doctest.testmod(verbose=False)
# Inline demonstrations on concrete inputs.
a = [2, 4, 6, 8, 10, 12, 14]
assert binary_search(a, 2) == 0
assert binary_search(a, 14) == 6
assert binary_search(a, 8) == 3
assert binary_search(a, 7) == -1 # absent, between elements
assert binary_search(a, 100) == -1 # absent, above all
assert binary_search([], 0) == -1 # empty sequence
# Duplicates: the contract promises *an* index of the target, not the first.
d = [1, 2, 2, 2, 3]
assert d[binary_search(d, 2)] == 2
Demonstratio
Let n = len(arr), and write the closed integer interval under consideration after the k-th test of the loop guard as I_k = [lo_k, hi_k], with I_0 = [0, n-1] the value upon entry. Throughout, arr is fixed and sorted non-decreasingly: i ≤ j ⟹ arr[i] ≤ arr[j].
I. Correctness, by a loop invariant.
Invariant. (P): If target occurs anywhere in arr, then it occurs at some index within I_k = [lo_k, hi_k].
Establishment. Before the first iteration I_0 = [0, n-1] is the whole array, so any occurrence of target lies within it. (P) holds initially. (When n = 0 the interval [0, -1] is empty; this is consistent, since target then occurs nowhere and the premise of (P) is vacuous.)
Preservation. Suppose (P) holds and the guard lo ≤ hi is true, so I_k is non-empty and mid = lo + ⌊(hi - lo)/2⌋ ∈ [lo, hi] is a legitimate index. Three cases:
arr[mid] = target: the procedure returnsmid, an index wheretargetsits. The first claim is discharged on this exit.arr[mid] < target: I show no indexj ≤ midholdstarget. By sortedness, for anyj ≤ mid,arr[j] ≤ arr[mid] < target, hencearr[j] ≠ target. Therefore any occurrence oftargetinI_kmust lie in[mid+1, hi]. Settinglo_{k+1} = mid+1,hi_{k+1} = himakesI_{k+1} = [mid+1, hi], and(P)is preserved.arr[mid] > target: symmetrically, for anyj ≥ mid,arr[j] ≥ arr[mid] > target, so no index≥ midholdstarget. Any occurrence inI_klies in[lo, mid-1]. Settinghi_{k+1} = mid-1preserves(P).
Termination exit. The loop ends without returning only when lo > hi, i.e. I_k = ∅. By (P), if target occurred in arr it would occur in this empty interval — impossible. Hence target does not occur in arr, and the returned -1 is correct.
Combining the two manners of exit: a returned index always satisfies arr[i] = target (returned only in the first case), and -1 is returned only when target is genuinely absent. The first claim holds. ∎ (correctness)
II. Cost, by the halving argument.
Let s_k = hi_k - lo_k + 1 be the number of integers in I_k (its size), so s_0 = n. Each non-returning iteration replaces I_k by one of its two flanks about mid, both of which exclude mid itself. Write m = mid - lo and M = hi - mid, so the lower flank [lo, mid-1] has size m and the upper flank [mid+1, hi] has size M, with m + M = s_k - 1.
Because mid = lo + ⌊(s_k - 1)/2⌋, we have m = ⌊(s_k-1)/2⌋ and M = ⌈(s_k-1)/2⌉. The larger flank thus has size ⌈(s_k - 1)/2⌉. Whichever flank is taken,
s_{k+1} ≤ ⌈(s_k - 1)/2⌉ ≤ ⌊s_k / 2⌋,
the last step since ⌈(s-1)/2⌉ = ⌊s/2⌋ for every integer s ≥ 1. Hence the interval size is at least halved (with floor) on every iteration that does not return.
Iteration count. Let T(n) be the maximum number of loop iterations on an array of n entries. An iteration either returns (ending the loop) or strictly shrinks a non-empty interval to s_{k+1} ≤ ⌊s_k/2⌋. Define f(s) = ⌊s/2⌋. Starting from s_0 = n and applying f repeatedly, after t shrink-steps the size is at most ⌊n / 2^t⌋. The loop continues only while the size is ≥ 1; once ⌊n/2^t⌋ = 0, i.e. 2^t > n, the interval is empty and the guard fails. The least such t is t* = ⌊log₂ n⌋ + 1 for n ≥ 1, because 2^{⌊log₂ n⌋} ≤ n < 2^{⌊log₂ n⌋ + 1}. Counting the iteration that detects emptiness or returns, the number of midpoint comparisons is at most
T(n) ≤ ⌊log₂ n⌋ + 1.
Equivalently, by the recurrence T(n) = 1 + T(⌊n/2⌋) with T(0) = 0: this is the Master-Theorem case a = 1, b = 2, f(n) = Θ(1), where n^{log_b a} = n^0 = 1, yielding T(n) = Θ(log n) — and the explicit unrolling above pins the constant to give the exact bound ⌊log₂ n⌋ + 1.
Sharpness. The bound is attained. Take arr = [0, 1, …, n-1] and search for a value absent from the array yet within reach of the deepest descent — e.g. seeking a target that drives lo/hi to collapse only after the interval has been halved the full ⌊log₂ n⌋ + 1 times. For n = 1024 this is 11 comparisons; for n = 1023, 10; for n = 1000, 10. These match the formula exactly, so no smaller universal bound holds. ∎ (cost)
III. Termination. Part II shows s_{k+1} ≤ ⌊s_k/2⌋ < s_k for every non-returning iteration with s_k ≥ 1; the size is a strictly decreasing non-negative integer, hence reaches 0 (the guard lo ≤ hi fails) within finitely many steps. The procedure halts on every input.
Both assertions of the Enuntiatio — correctness and the sharp logarithmic cost — are established.
Q.E.D.