mirror of
https://github.com/roc-lang/roc.git
synced 2024-09-22 08:17:40 +03:00
324 lines
16 KiB
Plaintext
interface Path
    exposes [
        Path,
        PathComponent,
        CanonicalizeErr,
        WindowsRoot,
        # toComponents,
        # walkComponents,
        display,
        fromStr,
        fromBytes,
        withExtension,
    ]
    imports [InternalPath.{ InternalPath }]

## You can canonicalize a [Path] using [Path.canonicalize].
##
## Comparing canonical paths is often more reliable than comparing raw ones.
## For example, `Path.fromStr "foo/bar/../baz" == Path.fromStr "foo/baz"` will return `Bool.false`,
## because those are different paths even though their canonical equivalents would be equal.
##
## Also note that canonicalization reads from the file system (in order to resolve symbolic
## links, and to convert relative paths into absolute ones). This means that it is not only
## a [Task] (which can fail), but also that running [canonicalize] on the same [Path] twice
## may give different answers. An example of a way this could happen is if a symbolic link
## in the path changed on disk to point somewhere else in between the two [canonicalize] calls.
##
## Similarly, remember that canonical paths are not guaranteed to refer to a valid file. They
## might have referred to one when they were canonicalized, but that file may have moved or
## been deleted since the canonical path was created. So you might [canonicalize] a [Path],
## and then immediately use that [Path] to read a file from disk, and still get back an error
## because something relevant changed on the filesystem between the two operations.
##
## Also note that different filesystems have different rules for syntactically valid paths.
## Suppose you're on a machine with two disks, one formatted as ext4 and another as FAT32.
## It's possible to list the contents of a directory on the ext4 disk, and get a [Path] which
## is valid on that disk, but invalid on the other disk. One way this could happen is if the
## directory on the ext4 disk has a filename containing a `:` in it. `:` is allowed in ext4
## paths but is considered invalid in FAT32 paths.
Path : InternalPath

CanonicalizeErr a : [
    PathCanonicalizeErr {},
]a

## Note that the path may not be valid depending on the filesystem where it is used.
## For example, paths containing `:` are valid on ext4 and NTFS filesystems, but not
## on FAT ones. So if you have multiple disks on the same machine, but they have
## different filesystems, then this path could be valid on one but invalid on another!
##
## It's safest to assume paths are invalid (even syntactically) until given to an operation
## which uses them to open a file. If that operation succeeds, then the path was valid
## (at the time). Otherwise, error handling can happen for that operation rather than validating
## up front for a false sense of security (given symlinks, parts of a path being renamed, etc.).
fromStr : Str -> Path
fromStr = \str ->
    FromStr str
    |> InternalPath.wrap

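# A minimal sketch of how `fromStr` could be exercised with `roc test`, assuming this
# module is built with a Roc version that supports top-level `expect`. A path built from
# a [Str] is stored as-is, so `display` returns exactly the string it was given.
expect
    # No normalization happens here: resolving `..` would require canonicalization.
    path = fromStr "foo/bar/../baz"
    display path == "foo/bar/../baz"
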
## Not all filesystems use Unicode paths. This function can be used to create a path which
## is not valid Unicode (unlike a Roc [Str], which always is), but which is valid for a particular filesystem.
##
## Note that if the list contains any `0` bytes, sending this path to any file operations
## (e.g. `File.read` or `WriteStream.openPath`) will fail.
fromBytes : List U8 -> Path
fromBytes = \bytes ->
    ArbitraryBytes bytes
    |> InternalPath.wrap

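# A minimal sketch, assuming top-level `expect` support: a path made from raw bytes.
# The bytes below are the ASCII (and therefore also UTF-8) encoding of "foo/bar",
# so `display` can decode them back into the same [Str].
expect
    # 102 = 'f', 111 = 'o', 47 = '/', 98 = 'b', 97 = 'a', 114 = 'r'
    path = fromBytes [102, 111, 111, 47, 98, 97, 114]
    display path == "foo/bar"
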
## Note that canonicalization reads from the file system (in order to resolve symbolic
## links, and to convert relative paths into absolute ones). This means that it is not only
## a [Task] (which can fail), but also that running [canonicalize] on the same [Path] twice
## may give different answers. An example of a way this could happen is if a symbolic link
## in the path changed on disk to point somewhere else in between the two [canonicalize] calls.
##
## Returns an effect type of `[Metadata, Read [Env]]` because it can resolve symbolic links
## (which requires reading file metadata) and can read the current working directory when
## turning a relative path into an absolute one (by prepending the absolute path of the
## current working directory to the relative path).
# canonicalize : Path -> Task Path (CanonicalizeErr *) [Metadata, Read [Env]]*

## Unfortunately, operating system paths do not include information about which charset
## they were originally encoded with. It's most common (but not guaranteed) that they will
## have been encoded with the same charset as the operating system's current locale (which
## typically does not change after it is set during installation of the OS), so
## this should convert a [Path] to a valid string as long as the path was created
## with the given `Charset`. (Use `Env.charset` to get the current system charset.)
##
## For a conversion to [Str] that is lossy but does not return a [Result], see
## [display].
# toInner : Path -> [Str Str, Bytes (List U8)]

## Assumes a path is encoded as [UTF-8](https://en.wikipedia.org/wiki/UTF-8),
## and converts it to a string using `Str.display`.
##
## This conversion is lossy because the path may contain invalid UTF-8 bytes. If that happens,
## any invalid bytes will be replaced with the [Unicode replacement character](https://unicode.org/glossary/#replacement_character)
## instead of returning an error. As such, it's rarely a good idea to use the [Str] returned
## by this function for any purpose other than displaying it to a user.
##
## When you don't know for sure what a path's encoding is, UTF-8 is a popular guess because
## it's the default on UNIX and also is the encoding used in Roc strings. This platform also
## automatically runs applications under the [UTF-8 code page](https://docs.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page)
## on Windows.
##
## Converting paths to strings can be an unreliable operation, because operating systems
## don't record the paths' encodings. This means it's possible for the path to have been
## encoded with a different character set than UTF-8 even if UTF-8 is the system default,
## which means that when [display] converts it to a string, the string may include gibberish.
## [Here is an example.](https://unix.stackexchange.com/questions/667652/can-a-file-path-be-invalid-utf-8/667863#667863)
##
## If you happen to know the `Charset` that was used to encode the path, you can use
## `toStrUsingCharset` instead of [display].
display : Path -> Str
|
||
display = \path ->
|
||
when InternalPath.unwrap path is
|
||
FromStr str -> str
|
||
FromOperatingSystem bytes | ArbitraryBytes bytes ->
|
||
when Str.fromUtf8 bytes is
|
||
Ok str -> str
|
||
# TODO: this should use the builtin Str.display to display invalid UTF-8 chars in just the right spots, but that does not exist yet!
|
||
Err _ -> "<22>"
|
||
|
||
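# A sketch of the round trip `display` supports, assuming top-level `expect` support:
# bytes that are valid UTF-8 (here produced by `Str.toUtf8`, including non-ASCII
# characters) decode back to the original string, so nothing is lost in this case.
expect
    original = "café/naïve.txt"
    display (fromBytes (Str.toUtf8 original)) == original
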
# isEq : Path, Path -> Bool
# isEq = \p1, p2 ->
#     when InternalPath.unwrap p1 is
#         FromOperatingSystem bytes1 | ArbitraryBytes bytes1 ->
#             when InternalPath.unwrap p2 is
#                 FromOperatingSystem bytes2 | ArbitraryBytes bytes2 -> bytes1 == bytes2
#                 # We can't know the encoding that was originally used in the path, so we convert
#                 # the string to bytes and see if those bytes are equal to the path's bytes.
#                 #
#                 # This may sound unreliable, but it's how all paths are compared; since the OS
#                 # doesn't record which encoding was used to encode the path name, the only
#                 # reasonable definition for path equality is byte-for-byte equality.
#                 FromStr str2 -> Str.isEqUtf8 str2 bytes1
#         FromStr str1 ->
#             when InternalPath.unwrap p2 is
#                 FromOperatingSystem bytes2 | ArbitraryBytes bytes2 -> Str.isEqUtf8 str1 bytes2
#                 FromStr str2 -> str1 == str2

# compare : Path, Path -> [Lt, Eq, Gt]
# compare = \p1, p2 ->
#     when InternalPath.unwrap p1 is
#         FromOperatingSystem bytes1 | ArbitraryBytes bytes1 ->
#             when InternalPath.unwrap p2 is
#                 FromOperatingSystem bytes2 | ArbitraryBytes bytes2 -> Ord.compare bytes1 bytes2
#                 FromStr str2 -> Str.compareUtf8 str2 bytes1 |> Ord.reverse
#         FromStr str1 ->
#             when InternalPath.unwrap p2 is
#                 FromOperatingSystem bytes2 | ArbitraryBytes bytes2 -> Str.compareUtf8 str1 bytes2
#                 FromStr str2 -> Ord.compare str1 str2

## ## Path Components
PathComponent : [
    ParentDir, # e.g. ".." on UNIX or Windows
    CurrentDir, # e.g. "." on UNIX
    Named Str, # e.g. "stuff" on UNIX
    DirSep Str, # e.g. "/" on UNIX, "\" or "/" on Windows. Or, sometimes, "¥" on Windows - see
    # https://docs.microsoft.com/en-us/windows/win32/intl/character-sets-used-in-file-names
    #
    # This is included as an option so if you're transforming part of a path,
    # you can write back whatever separator was originally used.
]

## Note that a root of Slash (`/`) has different meanings on UNIX and on Windows.
## * On UNIX, `/` at the beginning of the path refers to the filesystem root, and means the path is absolute.
## * On Windows, `/` at the beginning of the path refers to the current disk drive, and means the path is relative.
# PathRoot : [
#     WindowsSpecificRoot WindowsRoot, # e.g. "C:" on Windows
#     Slash,
#     None,
# ]
# TODO see https://doc.rust-lang.org/std/path/enum.Prefix.html
WindowsRoot : []

## Returns the root of the path.
# root : Path -> PathRoot

# components : Path -> { root : PathRoot, components : List PathComponent }

## Walk over the path's [components].
# walk :
#     Path,
#     # None means it's a relative path
#     (PathRoot -> state),
#     (state, PathComponent -> state)
#     -> state

## Returns the path without its last [`component`](#components).
##
## If the path was empty or contained only a [root](#PathRoot), returns the original path.
# dropLast : Path -> Path

# TODO see https://doc.rust-lang.org/std/path/struct.Path.html#method.join for
# the definition of the term "adjoin" - should we use that term?

# append : Path, Path -> Path
# append = \prefix, suffix ->
#     content =
#         when InternalPath.unwrap prefix is
#             FromOperatingSystem prefixBytes ->
#                 when InternalPath.unwrap suffix is
#                     FromOperatingSystem suffixBytes ->
#                         # Neither prefix nor suffix had interior nuls, so the answer won't either
#                         List.concat prefixBytes suffixBytes
#                         |> FromOperatingSystem
#                     ArbitraryBytes suffixBytes ->
#                         List.concat prefixBytes suffixBytes
#                         |> ArbitraryBytes
#                     FromStr suffixStr ->
#                         # Append suffixStr by writing it to the end of prefixBytes
#                         Str.writeUtf8 suffixStr prefixBytes (List.len prefixBytes)
#                         |> ArbitraryBytes
#             ArbitraryBytes prefixBytes ->
#                 when InternalPath.unwrap suffix is
#                     ArbitraryBytes suffixBytes | FromOperatingSystem suffixBytes ->
#                         List.concat prefixBytes suffixBytes
#                         |> ArbitraryBytes
#                     FromStr suffixStr ->
#                         # Append suffixStr by writing it to the end of prefixBytes
#                         Str.writeUtf8 suffixStr prefixBytes (List.len prefixBytes)
#                         |> ArbitraryBytes
#             FromStr prefixStr ->
#                 when InternalPath.unwrap suffix is
#                     ArbitraryBytes suffixBytes | FromOperatingSystem suffixBytes ->
#                         # The prefix's bytes must come before the suffix's bytes
#                         List.concat (Str.toUtf8 prefixStr) suffixBytes
#                         |> ArbitraryBytes
#                     FromStr suffixStr ->
#                         Str.concat prefixStr suffixStr
#                         |> FromStr
#
#     InternalPath.wrap content

# appendStr : Path, Str -> Path
# appendStr = \prefix, suffixStr ->
#     content =
#         when InternalPath.unwrap prefix is
#             FromOperatingSystem prefixBytes | ArbitraryBytes prefixBytes ->
#                 # Append suffixStr by writing it to the end of prefixBytes
#                 Str.writeUtf8 suffixStr prefixBytes (List.len prefixBytes)
#                 |> ArbitraryBytes
#             FromStr prefixStr ->
#                 Str.concat prefixStr suffixStr
#                 |> FromStr
#
#     InternalPath.wrap content

## Returns `Bool.true` if the first path begins with the second.
# startsWith : Path, Path -> Bool
# startsWith = \path, prefix ->
#     when InternalPath.unwrap path is
#         FromOperatingSystem pathBytes | ArbitraryBytes pathBytes ->
#             when InternalPath.unwrap prefix is
#                 FromOperatingSystem prefixBytes | ArbitraryBytes prefixBytes ->
#                     List.startsWith pathBytes prefixBytes
#                 FromStr prefixStr ->
#                     strLen = Str.countUtf8Bytes prefixStr
#                     # The prefix can only match if it is no longer than the path.
#                     if strLen <= List.len pathBytes then
#                         # Grab the first N bytes of the list, where N = byte length of string.
#                         bytesPrefix = List.takeAt pathBytes 0 strLen
#
#                         # Compare the two for equality.
#                         Str.isEqUtf8 prefixStr bytesPrefix
#                     else
#                         Bool.false
#         FromStr pathStr ->
#             when InternalPath.unwrap prefix is
#                 FromOperatingSystem prefixBytes | ArbitraryBytes prefixBytes ->
#                     Str.startsWithUtf8 pathStr prefixBytes
#                 FromStr prefixStr ->
#                     Str.startsWith pathStr prefixStr

## Returns `Bool.true` if the first path ends with the second.
# endsWith : Path, Path -> Bool
# endsWith = \path, suffix ->
#     when InternalPath.unwrap path is
#         FromOperatingSystem pathBytes | ArbitraryBytes pathBytes ->
#             when InternalPath.unwrap suffix is
#                 FromOperatingSystem suffixBytes | ArbitraryBytes suffixBytes ->
#                     List.endsWith pathBytes suffixBytes
#                 FromStr suffixStr ->
#                     strLen = Str.countUtf8Bytes suffixStr
#                     pathLen = List.len pathBytes
#                     # The suffix can only match if it is no longer than the path.
#                     if strLen <= pathLen then
#                         # Grab the last N bytes of the list, where N = byte length of string.
#                         bytesSuffix = List.takeAt pathBytes (pathLen - strLen) strLen
#
#                         # Compare the two for equality.
#                         Str.isEqUtf8 suffixStr bytesSuffix
#                     else
#                         Bool.false
#         FromStr pathStr ->
#             when InternalPath.unwrap suffix is
#                 FromOperatingSystem suffixBytes | ArbitraryBytes suffixBytes ->
#                     Str.endsWithUtf8 pathStr suffixBytes
#                 FromStr suffixStr ->
#                     Str.endsWith pathStr suffixStr

# TODO https://doc.rust-lang.org/std/path/struct.Path.html#method.strip_prefix

# TODO idea: what if it's File.openRead and File.openWrite? And then e.g. File.metadata,
# File.isDir, etc.

## If the last component of this path has no `.`, appends `.` followed by the given string.
## Otherwise, replaces everything after the last `.` with the given string.
##
## Examples:
##
##     Path.fromStr "foo/bar/baz" |> Path.withExtension "txt" # foo/bar/baz.txt
##     Path.fromStr "foo/bar/baz." |> Path.withExtension "txt" # foo/bar/baz.txt
##     Path.fromStr "foo/bar/baz.xz" |> Path.withExtension "txt" # foo/bar/baz.txt
withExtension : Path, Str -> Path
withExtension = \path, extension ->
    when InternalPath.unwrap path is
        FromOperatingSystem bytes | ArbitraryBytes bytes ->
            beforeDot =
                when List.splitLast bytes (Num.toU8 '.') is
                    Ok { before } -> before
                    Err NotFound -> bytes

            beforeDot
            |> List.reserve (1 + Str.countUtf8Bytes extension)
            |> List.append (Num.toU8 '.')
            |> List.concat (Str.toUtf8 extension)
            |> ArbitraryBytes
            |> InternalPath.wrap

        FromStr str ->
            beforeDot =
                when Str.splitLast str "." is
                    Ok { before } -> before
                    Err NotFound -> str

            beforeDot
            |> Str.reserve (1 + Str.countUtf8Bytes extension)
            |> Str.concat "."
            |> Str.concat extension
            |> FromStr
            |> InternalPath.wrap

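# A sketch of the documented `withExtension` examples as tests, assuming top-level
# `expect` support. The last case goes through the byte-based branch: it builds the
# path with `fromBytes` and checks the result via `display`.
expect
    display (fromStr "foo/bar/baz" |> withExtension "txt") == "foo/bar/baz.txt"

expect
    display (fromStr "foo/bar/baz." |> withExtension "txt") == "foo/bar/baz.txt"

expect
    display (fromStr "foo/bar/baz.xz" |> withExtension "txt") == "foo/bar/baz.txt"

expect
    path = fromBytes (Str.toUtf8 "foo/bar/baz.xz")
    display (withExtension path "txt") == "foo/bar/baz.txt"
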
# NOTE: no withExtensionBytes because it's too narrow. If you really need to get some
# non-Unicode in there, do it with