Added documentation for type inference.

2024-08-17 03:40:44 +03:00 · 2020-07-14 23:02:50 -07:00 · 2020-07-14 23:02:50 -07:00 · 531ebc4694
commit 531ebc4694
parent 1cc56a58c2
2 changed files with 286 additions and 0 deletions
--- a/README.md
+++ b/README.md
@ -84,6 +84,7 @@ Installing both Pyright and Pylance at the same time is not recommended. If both
 * [Configuration](/docs/configuration.md)
 * [Settings](/docs/settings.md)
 * [Comments](/docs/comments.md)
+* [Type Inference](/docs/type-inference.md)
 * [Import Resolution](/docs/import-resolution.md)
 * [Type Stubs](/docs/type-stubs.md)
 * [Commands](/docs/commands.md)
--- a/docs/type-inference.md
+++ b/docs/type-inference.md
@ -0,0 +1,285 @@
+# Understanding Type Inference
+
+## Symbols and Scopes
+
+In Python, a _symbol_ is any name that is not a keyword. Symbols can represent classes, functions, methods, variables, parameters, modules, type aliases, type variables, etc.
+
+Symbols are defined within _scopes_. A scope is associated with a block of code and defines which symbols are visible to that code block. Scopes can be “nested” allowing code to see symbols within its immediate scope and all “outer” scopes.
+
+The following constructs within Python define a scope:
+1. The “builtins” scope is always present and is always the outermost scope. It is pre-populated by the Python interpreter with symbols like “int” and “list.
+2. The module scope (sometimes called the “global” scope) is defined by the current source code file.
+3. Each class defines its own scope. Symbols that represent methods, class variables, or instance variables appear within a class scope.
+4. Each function and lambda defines its own scope. The function’s parameters are symbols within its scope, as are any variables defined within the function.
+5. List comprehensions define their own scope.
+
+## Type Declarations
+
+A symbol can be declared with an explicit type. The “def” and “class” keywords, for example, declare a function or a class. Other symbols in Python can be introduced into a scope with no explicit type. Newer versions of Python have introduced syntax for providing type annotations to input parameters, return parameters, and variables.
+
+When a parameter or variable is annotated with a type, the type checker verifies that all values assigned to that parameter or variable conform to that type.
+
+Consider the following example:
+```
+def func1(p1: float, p2: str, p3, **p4) -> None:
+    var1: int = p1    # This is a type violation
+    var2: str = p2    # This is allowed because the types match
+    var2: int         # This is allowed because it redeclares var2
+    var3 = p1         # var3 does not have a declared type
+    return var1       # This is a type violation
+```
+
+Symbol    | Symbol Type   | Scope     | Declared Type
+----------|---------------|-----------|----------------------------------------------------
+func1     | Function      | Module    | (float, str, Any, Dict[str, Any]) -> None
+p1        | Parameter     | func1     | float
+p2        | Parameter     | func1     | str
+p3        | Parameter     | func1     | <none>
+p4        | Parameter     | func1     | <none>
+var1      | Variable      | func1     | int
+var2      | Variable      | func1     | str
+var3      | Variable      | func1     | <none>
+
+
+Note that once a symbol’s type is declared, it cannot be redeclared to a different type.
+
+## Type Inference
+
+Some languages require every symbol to be explicitly typed. Python allows a symbol to be bound to different types at runtime. A symbol’s type doesn’t need to declared statically.
+
+When Pyright encounters a symbol with no type declaration, it attempts to _infer_ the type based on the values assigned to it. As we will see below, type inference cannot always determine the correct (intended) type, so type annotations are still required in some cases. Furthermore, type inference can require significant computation, so it is much less efficient than when type annotations are provided.
+
+If a symbol’s type cannot be inferred, Pyright internally sets its type to “Unknown”, which is a special form of “Any”. The “Unknown” type allows Pyright to optionally warn when types are not declared and cannot be inferred, thus leaving potential “blind spots” in type checking. 
+
+### Single-Assignment Type Inference
+
+The simplest form of type inference is one that involves a single assignment to a symbol. The inferred type comes from the type of the source expression. Examples include:
+
+```
+var1 = 3                        # Inferred type is int
+var2 = "hi"                     # Inferred type is str
+var3 = list()                   # Inferred type is List[Unknown]
+var4 = [3, 4]                   # Inferred type is List[int]
+for var5 in [3, 4]: ...         # Inferred type is int
+var6 = [p for p in [1, 2, 3]]   # Inferred type is List[int]
+```
+
+### Multi-Assignment Type Inference
+
+When a symbol is assigned values in multiple places within the code, those values may have different types. The inferred type of the variable is the union of all such types.
+
+```
+# In this example, symbol var1 has an inferred type of Union[str, int].
+class Foo:
+    def __init__(self):
+        self.var1 = ""
+    
+    def do_something(self, val: int):
+        self.var1 = val
+
+# In this example, symbol var2 has an inferred type of Optional[Foo].
+if __debug__:
+    var2 = None
+else:
+    var2 = Foo()
+```
+
+### Ambiguous Type Inference
+
+In some cases, an expression’s type is ambiguous. For example, what is the type of the expression `[]`? Is it `List[None]`, `List[int]`, `List[Any]`, `Sequence[Any]`, `Iterable[Any]`? These ambiguities can lead to unintended type violations. Pyright uses several techniques for reducing these ambiguities based on contextual information. Heuristics are used in other cases.
+
+### Bidirectional Type Inference (Expected Types)
+
+One powerful technique the Pyright uses to eliminate type inference ambiguities is _bidirectional inference_. This technique makes use of an “expected type”.
+
+As we saw above, the type of the expression `[]` is ambiguous, but if this expression is passed as an argument to a function, and the parameter is annotated with the type `List[int]`, Pyright can now assume that the type of `[]` in this context must be `List[int]`. Ambiguity eliminated!
+
+This technique is called “bidirectional inference” because type inference for an assignment normally proceeds by first determining the type of the right-hand side (RHS) of the assignment, which then informs the type of the left-hand side (LHS) of the assignment. With bidirectional inference, if the LHS of an assignment has a declared type, it can influence the inferred type of the RHS.
+
+Let’s look at a few examples:
+
+```
+var1 = []                       # Type of RHS is ambiguous
+var2: List[int] = []            # Type of LHS now makes type of RHS unambiguous
+var3 = [4]                      # Type is assumed to be List[int] 
+var4: List[float] = [4]         # Type of RHS is now List[float]
+var5 = (3,)                     # Type is assumed to be Tuple[int]
+var6: Tuple[float, ...] = (3,)  # Type of RHS is now Tuple[float, ...]
+```
+
+### Return Type Inference
+
+As with variable and assignments, function return types can be inferred from the `return` statements found within that function. The returned type is assumed to be the union of all types returned from all `return` statements. If a `return` statement is not followed by an expression, it is assumed to return `None`. Likewise, if the function does not end in a `return` statement, and it is possible for the execution to the end of the function, an implicit `return None` is assumed.
+
+```
+# This function has two explicit return statements and one implicit
+# return (at the end). It does not have a declared return type,
+# so Pyright infers its return type based on the return expressions.
+# In this case, the inferred return type is Union[str, bool, None].
+
+def func1(val: int):
+    if val > 3:
+        return ""
+    elif val < 1:
+        return True
+```
+
+### NoReturn return type
+
+If there is no code path that returns from a function (e.g. all code paths raise an exception), Pyright infers a return type of `NoReturn`. As an exception to this rule, if the function is decorated with `@abstractmethod`, the return type is not inferred as `NoReturn` even if there is no return. This accommodates a common practice where an abstract method is implemented with a `raise NotImplementedError()` statement.
+
+```
+class Foo:
+    # The inferred return type is NoReturn.
+    def method1(self):
+        raise Exception()
+    
+    # The inferred return type is Unknown.
+    @abstractmethod
+    def method2(self):
+        raise NotImplementedError()
+```
+
+### Generator return types
+
+Pyright can infer the return type for a generator function from the `yield` statements contained within that function.
+
+### Call-site Return Type Inference
+
+It is common for input parameters to be unannotated. This can make it difficult for Pyright to infer the correct return type for a function. For example:
+
+```
+# The return type of this function cannot be fully inferred based
+# on the information provided because the types of parameters
+# `a` and `b` are unknown. In this case, the inferred return
+# type is Union[Unknown, None].
+
+def func1(a, b, c):
+    if c:
+        return a
+    elif c > 3:
+        return b
+    else:
+        return None
+```
+
+In cases where all parameters are unannotated, Pyright uses a technique called _call-site return type inference_. It performs type inference using the the types of arguments passed to the function in a call expression. If the unannotated function calls other functions, call-site return type inference can be used recursively. Pyright limits this recursion to a small number for practical performance reasons.
+
+```
+def func2(p_int: int, p_str: str, p_flt: float):
+    # The type of var1 is inferred to be Union[int, None] based
+    # on call-site return type inference.
+    var1 = func1(p_int, p_int, p_int)
+
+    # The type of var2 is inferred to be Union[str, float, None].
+    var2 = func1(p_str, p_flt, p_int)
+```
+
+### Literals
+
+Python 3.8 introduced support for _literal types_. This allows a type checker like Pyright to track specific literal values of str, bytes, int, bool, and enum values. As with other types, literal types can be declared.
+
+```
+# This function is allowed to return only values 1, 2 or 3.
+def func1() -> Literal[1, 2, 3]:
+    ...
+
+# This function must be passed one of three specific string values.
+def func2(mode: Literal["r", "w", "rw"]) -> None:
+    ...
+```
+
+When Pyright is performing type inference, it generally does not infer literal types. Consider the following example:
+
+```
+# If Pyright inferred the type of var1 to be List[Literal[4]],
+# any attempt to append a value other than 4 to this list would
+# generate an error. Pyright therefore infers the type List[int].
+var1 = [4]
+```
+
+### Tuple Expressions
+
+When inferring the type of a tuple expression (in the absence of bidirectional inference hints), Pyright assumes that the tuple has a fixed length, and each tuple element is typed as specifically as possible.
+
+```
+# The inferred type is Tuple[Literal[1], Literal["a"], Literal[True]]
+var1 = (1, "a", True)
+
+def func1(a: int):
+    # The inferred type is Tuple[int, int]
+    var2 = (a, a)
+
+    # If you want the type to be Tuple[int, ...]
+    # (i.e. a homogenous tuple of indeterminate length),
+    # use a type annotation.
+    var3: Tuple[int, ...] = (a, a)
+```
+
+### List Expressions
+
+When inferring the type of a list expression (in the absence of bidirectional inference hints), Pyright uses the following heuristics:
+
+1. If the list is empty (`[]`), assume `List[Unknown]`.
+2. If the list contains at least one element and all elements are the same type T, infer the type `List[T]`.
+3. If the list contains multiple elements that are of different types, the behavior depends on the `strictListInference` configuration setting. By default this setting is off.
+
+    * If `strictListInference` is off, infer `List[Unknown]`.
+    * Otherwise use the union of all element types and infer `List[Union[...]]`.
+
+These heuristics can be overridden through the use of bidirectional inference hints (e.g. by providing a declared type for the target of the assignment expression).
+
+```
+var1 = []                       # Infer List[Unknown]
+
+var2 = [1, 2]                   # Infer List[int]
+
+# Type depends on strictListInference config setting
+var3 = [1, 3.4]                 # Infer List[Unknown] (off)
+var3 = [1, 3.4]                 # Infer List[Union[int, float]] (on)
+
+var4: List[float] = [1, 3.4]    # Infer List[float]
+```
+
+
+### Dictionary Expressions
+
+When inferring the type of a dictionary expression (in the absence of bidirectional inference hints), Pyright uses the following heuristics:
+
+1. If the dict is empty (`{}`), assume `Dict[Unknown, Unknown]`.
+2. If the dict contains at least one element and all keys are the same type K and all values are the same type V, infer the type `Dict[K, V]`.
+3. If the dict contains multiple elements where the keys or values differ in type, the behavior depends on the `strictDictionaryInference` configuration setting. By default this setting is off.
+
+    * If `strictDictionaryInference` is off, infer `Dict[Unknown, Unknown]`.
+    * Otherwise use the union of all key and value types `Dict[Union[(keys), Union[(values)]]]`.
+
+
+```
+var1 = {}                       # Infer Dict[Unknown, Unknown]
+
+var2 = {1: ""}                  # Infer Dict[int, str]
+
+# Type depends on strictDictionaryInference config setting
+var3 = {"a": 3, "b": 3.4}       # Infer Dict[str, Unknown] (off)
+var3 = {"a": 3, "b": 3.4}       # Infer Dict[str, Union[int, float]] (on)
+
+var4: Dict[str, float] = {"a": 3, "b": 3.4}
+```
+
+### Lambdas
+
+Lambdas present a particular challenge for a Python type checker because there is no provision in the Python syntax for annotating the types of a lambda’s input parameters. The types of these parameters must therefore be inferred based on context using bidirectional type inference. Absent this context, a lambda’s input parameters (and therefore its return type) will be unknown.
+
+```
+# The type of var1 is (a: Unknown, b: Unknown) -> Unknown.
+var1 = lambda a, b: a + b
+
+CompareFloats = Callable[[float, float], bool]
+def float_sort(list: List[float], comp: CompareFloats): ...
+
+# In this example, the types of the lambda’s input parameters
+# a and b can be inferred to be float because the float_sort
+# function expects a callback that accepts two floats as
+# inputs.
+float_sort([2, 1.3], lambda a, b: False if a < b else True)
+```