sys.path[0] breaks out of runfile tree. #382

@dillon-giacoppo

🐞 bug report

Affected Rule

py_binary, py_test.

Is this a regression?

No

Description

When Python initializes, it inserts the directory containing the script at the front of sys.path. From the Python documentation:

As initialized upon program startup, the first item of this list, path[0], is the directory containing the script that was used to invoke the Python interpreter. If the script directory is not available (e.g. if the interpreter is invoked interactively or if the script is read from standard input), path[0] is the empty string, which directs Python to search modules in the current directory first. Notice that the script directory is inserted before the entries inserted as a result of PYTHONPATH.

This would imply that a py target such as {REPO_DIR}/bazel-bin/{target_path}/{target}.runfiles/{script_path}/{python_script}.py would cause {REPO_DIR}/bazel-bin/{target_path}/{target}.runfiles/{script_path}/ to be inserted as sys.path[0].

However, because the script is a symlink to the real Python script in the repository, Python follows the symlink and inserts the real underlying directory as sys.path[0], in this case {REPO_DIR}/{script_path}/. This was raised in Python's bug tracker (issue17639) and described there as expected behaviour that won't be fixed.
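The symlink-following behaviour can be reproduced outside Bazel. The following is a minimal sketch, assuming a POSIX system where symlinks can be created freely. The directory names `repo_src` and `runfiles` and the helper `reported_script_dir` are made up for this illustration and are not part of Bazel or rules_python.

```python
import os
import subprocess
import sys
import tempfile


def reported_script_dir() -> str:
    """Run a script through a symlink; return the name of the directory in sys.path[0]."""
    with tempfile.TemporaryDirectory() as tmp:
        real_dir = os.path.join(tmp, "repo_src")   # stands in for {REPO_DIR}/{script_path}
        link_dir = os.path.join(tmp, "runfiles")   # stands in for the runfiles tree
        os.makedirs(real_dir)
        os.makedirs(link_dir)

        real_script = os.path.join(real_dir, "main.py")
        with open(real_script, "w") as f:
            f.write("import sys\nprint(sys.path[0])\n")

        # The runfiles tree only contains a symlink to the real script.
        link_script = os.path.join(link_dir, "main.py")
        os.symlink(real_script, link_script)

        # PYTHONSAFEPATH (Python 3.11+) would suppress sys.path[0] entirely, so drop it.
        env = {k: v for k, v in os.environ.items() if k != "PYTHONSAFEPATH"}
        result = subprocess.run(
            [sys.executable, link_script],
            capture_output=True, text=True, check=True, env=env,
        )
        reported = result.stdout.strip()
        return os.path.basename(os.path.realpath(reported))


print(reported_script_dir())
```

Although the interpreter is invoked on the path under `runfiles`, the directory it reports is `repo_src`, the one that holds the real file.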

This allows the script to "break out" of the runfiles tree and import code from its source directory, even though that code is not necessarily available to it via srcs or deps.

This causes issues when creating multiple targets in the same directory which should not automatically depend on each other.

For example, given:

repo/
  src/
    main.py
    foo.py

main.py can import foo without foo.py being explicitly listed in the BUILD file.

🔬 Minimal Reproduction

src/BUILD

package(default_visibility = ["//visibility:public"])

py_binary(
    name = "main",
    srcs = ["main.py"],
    python_version = "PY3",
    srcs_version = "PY3",
)

src/main.py

import sys

import foo

print(sys.path[0])

src/foo.py

print("foo imported")

The output of bazel run //src:main will be:

foo imported
{REPO_DIR}/src/

This should not be possible, because foo.py is not an input to //src:main.

🌍 Your Environment

Operating System:

  
macOS 11.0.1 (20B29)
  

Output of bazel version:

  
Build label: 3.7.0-homebrew
Build target: bazel-out/darwin-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Sat Nov 14 13:29:41 2020 (1605360581)
Build timestamp: 1605360581
Build timestamp as int: 1605360581
  

Rules_python version:

  
0.1.0
  

Anything else relevant?
The most obvious solution that comes to mind would be to copy the sources into the runfiles tree rather than symlinking them. It seems unlikely that this behaviour is going to change upstream in Python.
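The difference between symlinking and copying can be sketched with a small stand-alone experiment. This is not how rules_python builds runfiles. It only mimics the layout from the reproduction, with a hypothetical `main.runfiles` directory and a hypothetical helper `sibling_import_succeeds`, and it assumes a POSIX system.

```python
import os
import shutil
import subprocess
import sys
import tempfile


def sibling_import_succeeds(materialize) -> bool:
    """Return True if main.py manages to import its undeclared sibling foo.py.

    `materialize(src, dst)` decides how main.py is placed into the
    stand-in runfiles directory (for example os.symlink or shutil.copyfile).
    """
    with tempfile.TemporaryDirectory() as tmp:
        src_dir = os.path.join(tmp, "src")
        runfiles_dir = os.path.join(tmp, "main.runfiles")
        os.makedirs(src_dir)
        os.makedirs(runfiles_dir)

        with open(os.path.join(src_dir, "main.py"), "w") as f:
            f.write("import foo\n")
        with open(os.path.join(src_dir, "foo.py"), "w") as f:
            f.write("print('foo imported')\n")

        # Only main.py is staged; foo.py is deliberately left out.
        staged = os.path.join(runfiles_dir, "main.py")
        materialize(os.path.join(src_dir, "main.py"), staged)

        # Drop variables that would change how sys.path is built.
        env = {k: v for k, v in os.environ.items()
               if k not in ("PYTHONSAFEPATH", "PYTHONPATH")}
        result = subprocess.run(
            [sys.executable, staged],
            capture_output=True, text=True, env=env, cwd=runfiles_dir,
        )
        return result.returncode == 0 and "foo imported" in result.stdout


print("symlink:", sibling_import_succeeds(os.symlink))
print("copy:   ", sibling_import_succeeds(shutil.copyfile))
```

With the symlink, the import of foo succeeds because sys.path[0] points back at `src`. With the copy, sys.path[0] stays inside `main.runfiles` and the import fails, which is the isolation the BUILD file implies.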

Labels: core-rules (Issues concerning core bin/test/lib rules)
