Scope (computer science)
In computer programming, a scope is the context within a computer program in which a variable name or other identifier is valid and can be used, or within which a declaration has effect. Outside of the scope of a variable name, the variable's value may still be stored, and may even be accessible in some way, but the name does not refer to it; that is, the name is not bound to the variable's storage.
Various programming languages have various different scoping rules for different kinds of declarations and identifiers. Such scoping rules have a large effect on language semantics and, consequently, on the behavior and correctness of programs. In languages like C++, accessing an unbound variable does not have well-defined semantics and may result in undefined behavior; and declarations or identifiers used outside their scope will generate syntax errors.
Scopes are frequently tied to other language constructs, but many languages also offer constructs specifically for controlling scope.
Contents |
Scope within a function
Function scope
function square(n) { return n * n; } function sum_of_squares(n) { var ret = 0; // the value to return var i = 1; // a counter to go from 1 to n while(i <= n) { ret = ret + square(i); i = i + 1; } return ret; }
Most of the commonly used programming languages offer a way to create a local variable in a function: a variable that, in some sense, disappears when the function returns.
For example, in the snippet of JavaScript code at right, two functions are defined: square and sum_of_squares. square computes the square of a number; sum_of_squares computes the sum of all squares up to a number. (For example, square(4) is 42 = 16, and sum_of_squares(4) is 12 + 22 + 32 + 42 = 30.)
Each of these functions has a variable named n that represents the argument to the function. These two n variables are completely separate and unrelated, despite having the same name, because they are local variables, with function scope: each one's scope is its own function, so they don't overlap. Therefore, sum_of_squares can call square without its own n being altered. Similarly, sum_of_squares has variables named ret and i; these variables, because of their limited scope, will not interfere with any variables named ret or i that might belong to any other function. In other words, there is no risk of a name collision between these identifiers and any unrelated identifiers, even if they are identical.
Block scope within a function
int sum_of_squares(int n) { int ret = 0; int i = 1; while(i <= n) { int n = i * i; ret = ret + n; i = i + 1; } return ret; }
Many languages take function scope slightly further, allowing variables to be made local to just part of a function; rather than having the entire function as its scope, a variable might have block scope, meaning that it's scoped to just a single block of statements. This is demonstrated in the C code at right, which has two distinct local variables named n: one whose scope is the entire function, and one that exists only inside the while-loop. In fact, each iteration of the while-loop creates a new instance of n. This inner n completely "hides" or "shadows" the outer n, such that the outer n is invisible inside the while-loop — but can still be used in the loop header, while(i <= n), which always refers to the outer n.
This is a rather contrived example; real-world C programmers do not usually name two local variables in such a way that one hides the other. Furthermore, some descendants of C, such as Java and C#, despite having support for block scope (in that a local variable can be made to go out of scope before the end of a function), do not allow one local variable to hide another. In such languages, the attempted declaration of the second n would result in a syntax error, and one of the n variables would have to be renamed.
Since the main use of blocks within a function is in control structures such as if-statements and while-loops, block scope tends to link the scope of variables to the structure of a function's flow of execution. However, languages with block scope typically also allow the use of "naked" blocks, which frequently serve no other purpose than to allow fine-grained control of variable scope.
Let-expressions
Many languages, especially functional languages, offer a feature called let-expressions, which allow a declaration's scope to be a single expression. This is convenient if, for example, an intermediate value is needed for a computation. For example, in Standard ML, if f() returns 12, then let val x = f() in x * x end is an expression that evaluates to 144, using a temporary variable named x to avoid calling f() twice. Some languages with block scope approximate this functionality by offering syntax for a block to be embedded into an expression; for example, the aforementioned Standard ML expression could be written in Perl as do { my $x = f(); $x * $x }, or in GNU C as ({ int x = f(); x * x; }).
Scope outside a function
A declaration has global scope if it has effect throughout an entire program, or (in some languages) if it has effect from the point of its occurrence until the end of the source-file it occurs in. The latter is also called file scope. Variable names with global scope — called global variables — are frequently considered bad practice, at least in some languages; but global scope is typically used (depending on the language) for various other sorts of identifiers, such as names of functions, and names of classes and other data types. In the JavaScript snippet we saw above, the function-names square and sum_of_squares have truly global scope, while in the C snippet, the function-name sum_of_squares has file scope.
{ my $counter = 0; sub increment_counter() { $counter = $counter + 1; return $counter; } }
Some languages allow the concept of block scope to be applied, to varying extents, outside of a function. For example, in the Perl snippet at right, $counter is a variable name with block scope (due to the use of the my keyword), while increment_counter is a function name with global scope. Each call to increment_counter will increase the value of $counter by one, and return the new value. Code outside of this block can call increment_counter, but cannot otherwise obtain or alter the value of $counter.
As we saw above, function scope and block scope are very useful for avoiding name collisions, but even at global scope, many languages have mechanisms such as namespaces to help mitigate this problem.
Lexical scoping and dynamic scoping
As we have seen, the use of local variables — of variable names with limited scope, that only exist within a specific function — helps avoid the risk of a name collision between two identically named variables. However, we have sidestepped an important question: what does it mean to be "within" a function? There are two very different approaches to scoping that answer this question in different ways.
In lexical scoping (or lexical scope; also called static scoping or static scope), if a variable name's scope is a certain function, then its scope is the program text of the function definition: within that text, the variable name exists, and is bound to its variable, but outside that text, the variable name does not exist. By contrast, in dynamic scoping (or dynamic scope), if a variable name's scope is a certain function, then its scope is the time-period during which the function is executing: while the function is running, the variable name exists, and is bound to its variable, but after the function returns, the variable name does not exist. This means that if function f invokes a separately defined function g, then under lexical scoping, function g does not have access to f's local variables (since the text of g is not inside the text of f), while under dynamic scoping, function g does have access to f's local variables (since the invocation of g is inside the invocation of f).
x=1 function g () { echo $x ; x=2 ; } function f () { local x=3 ; g ; } f # does this print 1, or 3? echo $x # does this print 1, or 2?
Consider, for example, the program at right. The first line, x=1, creates a global variable x and initializes it to 1. The second line, function g () { echo $x ; x=2 ; }, defines a function g that prints out ("echoes") the current value of x, and then sets x to 2 (overwriting the previous value). The third line, function f () { local x=3 ; g ; } defines a function f that creates a local variable x (hiding the identically named global variable) and initializes it to 3, and then calls g. The fourth line, f, calls f. The fifth line, echo $x, prints out the current value of x.
So, what exactly does this program print? It depends on the scoping rules. If the language of this program is one that uses lexical scoping, then g prints and modifies the global variable x (because g is defined outside f), so the program prints 1 and then 2. By contrast, if this language uses dynamic scoping, then g prints and modifies f's local variable x (because g is called from within f), so the program prints 3 and then 1. (As it happens, the language of the program is Bash, which uses dynamic scoping; so the program prints 3 and then 1.)
Lexical scoping
With lexical scope, a name always refers to its (more or less) local lexical environment. This is a property of the program text and is made independent of the runtime call stack by the language implementation. Because this matching only requires analysis of the static program text, this type of scoping is also called static scoping. Lexical scoping is standard in all ALGOL-based languages such as Pascal, Modula2 and Ada as well as in modern functional languages such as ML and Haskell. It is also used in the C language and its syntactic and semantic relatives, although with different kinds of limitations. Static scoping allows the programmer to reason about object references such as parameters, variables, constants, types, functions, etc. as simple name substitutions. This makes it much easier to make modular code and reason about it, since the local naming structure can be understood in isolation. In contrast, dynamic scope forces the programmer to anticipate all possible dynamic contexts in which the module's code may be invoked.
program A; var I:integer; K:char; procedure B; var K:real; L:integer; procedure C; var M:real; begin (*scope A+B+C*) end; (*scope A+B*) end; (*scope A*) end.
For example, consider the Pascal program fragment at right. The variable I
is visible at all points, because it is never hidden by another variable of the same name. The char
variable K
is visible only in the main program because it is hidden by the real
variable K
visible in procedure B
and C
only. Variable L
is also visible only in procedure B
and C
but it does not hide any other variable. Variable M
is only visible in procedure C
and therefore not accessible either from procedure B
or the main program. Also, procedure C
is visible only in procedure B
and can therefore not be called from the main program.
There could have been another procedure C
declared in the program outside of procedure B
. The place in the program where "C
" is mentioned then determines which of the two procedures named C
it represents, thus precisely analogous with the scope of variables.
Correct implementation of static scope in languages with first-class nested functions is not trivial, as it requires each function value to carry with it a record of the values of the variables that it depends on (the pair of the function and this environment is called a closure). Depending on implementation and computer architecture, variable lookup may become slightly inefficient[citation needed] when very deeply lexically nested functions are used, although there are well-known techniques to mitigate this[citation needed]. Also, for nested functions that only refer to their own arguments and (immediately) local variables, all relative locations can be known at compile time. No overhead at all is therefore incurred when using that type of nested function. The same applies to particular parts of a program where nested functions are not used, and, naturally, to programs written in a language where nested functions are not available (such as in the C language).
History
Lexical scoping was used for ALGOL and has been picked up in most other languages since then. Deep binding, which approximates static (lexical) scoping, was introduced in LISP 1.5 (via the Funarg device developed by Steve Russell, working under John McCarthy). The original Lisp interpreter (1960) and most early Lisps used dynamic scoping, but descendants of dynamically scoped languages often adopt static scoping; Common Lisp has both dynamic and static scoping while Scheme uses static scoping exclusively. Perl is another language with dynamic scoping that added static scoping afterwards. Languages like Pascal and C have always had lexical scoping, since they are both influenced by the ideas that went into ALGOL 60 (although C did not include lexically nested functions).
Dynamic scoping
With dynamic scope, each identifier has a global stack of bindings. Introducing a local variable with name x
pushes a binding onto the global x
stack (which may have been empty), which is popped off when the control flow leaves the scope. Evaluating x
in any context always yields the top binding. In other words, a global identifier refers to the identifier associated with the most recent environment. Note that this cannot be done at compile-time because the binding stack only exists at run-time, which is why this type of scoping is called dynamic scoping.
Generally, certain blocks are defined to create bindings whose lifetime is the execution time of the block; this adds some features of static scoping to the dynamic scoping process. However, since a section of code can be called from many different locations and situations, it can be difficult to determine at the outset what bindings will apply when a variable is used (or if one exists at all). This can be beneficial; application of the principle of least knowledge suggests that code avoid depending on the reasons for (or circumstances of) a variable's value, but simply use the value according to the variable's definition. This narrow interpretation of shared data can provide a very flexible system for adapting the behavior of a function to the current state (or policy) of the system. However, this benefit relies on careful documentation of all variables used this way as well as on careful avoidance of assumptions about a variable's behavior, and does not provide any mechanism to detect interference between different parts of a program. Dynamic scoping also voids all the benefits of referential transparency. As such, dynamic scoping can be dangerous and few modern languages use it. Some languages, like Perl and Common Lisp, allow the programmer to choose static or dynamic scoping when defining or redefining a variable. Logo and Emacs lisp are examples of languages that use dynamic scoping.
Dynamic scoping is fairly easy to implement. To find an identifier's value, the program could traverse the runtime stack, checking each activation record (each function's stack frame) for a value for the identifier. In practice, this is made more efficient via the use of an association list, which is a stack of name/value pairs. Pairs are pushed onto this stack whenever declarations are made, and popped whenever variables go out of scope.[1] Shallow binding is an alternative strategy that is considerably faster, making use of a central reference table, which associates each name with its own stack of meanings. This avoids a linear search during run-time to find a particular name, but care should be taken to properly maintain this table.[2] Note that both of these strategies assume a last-in-first-out (LIFO) ordering to bindings for any one variable; in practice all bindings are so ordered.
An even simpler implementation is the representation of dynamic variables with simple global variables. The local binding is performed by saving the original value in an anonymous location on the stack that is invisible to the program. When that binding scope terminates, the original value is restored from this location. In fact, dynamic scope originated in this manner. Early implementations of Lisp used this obvious strategy for implementing local variables, and the practice survives in some dialects which are still in use, such as GNU Emacs Lisp. Lexical scope was introduced into Lisp later. This is equivalent to the above shallow binding scheme, except that the central reference table is simply the global variable binding environment, in which the current meaning of the variable is its global value. Maintaining global variables isn't complex. For instance, a symbol object can have a dedicated slot for its global value.
Dynamic scoping provides an excellent abstraction for thread local storage, but if it is used that way it cannot be based on saving and restoring a global variable. A possible implementation strategy is for each variable to have a thread-local key. When the variable is accessed, the thread-local key is used to access the thread-local memory location (by code generated by the compiler, which knows which variables are dynamic and which are lexical). If the thread-local key does not exist for the calling thread, then the global location is used. When a variable is locally bound, the prior value is stored in a hidden location on the stack. The thread-local storage is created under the variable's key, and the new value is stored there. Further nested overrides of the variable within that thread simply save and restore this thread-local location. When the initial, outer-most override's scope terminates, the thread-local key is deleted, exposing the global version of the variable once again to that thread.
Qualified identifiers
As we have seen, one of the key reasons for scope is that it helps prevent name collisions, by allowing identical identifiers to refer to distinct things, with the restriction that the identifiers must have separate scopes. Sometimes this restriction is inconvenient; when many different things need to be accessible throughout a program, they generally all need identifiers with global scope, so different techniques are required to avoid name collisions.
To address this, many languages offer mechanisms for organizing global identifiers. The details of these mechanisms, and the terms used, depend on the language; but the general idea is that a group of identifiers can itself be given a name — a prefix — and, when necessary, an entity can be referred to by a qualified identifier consisting of the identifier plus the prefix. Normally such identifiers will have, in a sense, two sets of scopes: a scope (usually the global scope) in which the qualified identifier is visible, and one or more narrower scopes in which the unqualified identifier (without the prefix) is visible as well. And normally these groups can themselves be organized into groups; that is, they can be nested.
Although many languages support this concept, the details vary greatly. Some languages have mechanisms, such as namespaces in C++ and C#, that serve almost exclusively to enable global identifiers to be organized into groups. Other languages have mechanisms, such as packages in Ada and structures in Standard ML, that combine this with the additional purpose of allowing some identifiers to be visible only to other members of their group. And object-oriented languages often allow classes or singleton objects to fulfill this purpose (whether or not they also have a mechanism for which this is the primary purpose). Furthermore, languages often meld these approaches; for example, Perl's packages are largely similar to C++'s namespaces, but optionally double as classes for object-oriented programming; and Java organizes its variables and functions into classes, but then organizes those classes into Ada-like packages.
See also
- Closure (computer science)
- Global variable
- Local variable
- Non-local variable
- Name binding
- Name resolution
- Variables (scope and extent)
- Information hiding
References
Further reading
- Harold Abelson and Gerald Jay Sussman. "Lexical addressing". Structure and Interpretation of Computer Programs.
- Scott, M. (2006). Programming Language Pragmatics, 2nd edition. Morgan Kauffman Publishers, San Francisco, CA.