Bash is not the most programmer-friendly tool. It requires a lot of caution, low-level knowledge and doesn’t allow the slightest mistake (you know you can’t type foo = 42, right?). On the other hand, bash is everywhere (even on Windows 10), it’s quite portable and powerful, and in effect is the most pragmatic choice when automating tasks. Luckily, following a set of simple rules can save you from many of its minefields.
There are a number of possible shebangs you can use to refer to the interpreter you want to execute your code under. Some of them are:
#!/usr/bin/env bash
#!/bin/bash
#!/bin/sh
#!/bin/sh -
A shebang is just the path (absolute, or relative to the current working directory) to the interpreter, but which one is preferred?
Long story short: use #!/usr/bin/env bash for portability. POSIX does not standardize path names, so different UNIX-like systems may place bash in different locations. You cannot safely assume that, for example, /bin/bash even exists (some BSD systems install the bash binary at /usr/local/bin/bash).
The env utility helps us work around this limitation: #!/usr/bin/env bash runs the script under the first bash interpreter found in PATH. It is not a perfect solution (what if the same problem applies to /usr/bin/env? Luckily, every UNIX OS I know of places env exactly there), but it is the best we can do.
However, there is one exception I’m aware of: for a system boot script, use /bin/sh since it’s the standard command interpreter for the system.
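As a rough illustration of what env does, a script started through #!/usr/bin/env bash can report which interpreter was found in PATH. This is only a sketch; the variable name bash_path is made up for the example.

```bash
#!/usr/bin/env bash
# env searches PATH for "bash", so the same script works whether bash
# lives in /bin, /usr/bin or /usr/local/bin.

# Path of the first bash found in PATH (bash_path is a made-up name).
bash_path="$(command -v bash)"

echo "bash found at:   ${bash_path}"
echo "running version: ${BASH_VERSION}"
```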
This and this article are worth checking out for more information.
Quoting is the simplest and best advice you can follow to save yourself from many possible pitfalls. Incorrect shell quoting is the most common cause of a bash programmer’s headaches. Unfortunately, it is not as easy as it is important.
There are many great articles covering this topic thoroughly. I have nothing to add except to recommend this and this article.
Keep in mind that you should generally use double quotes.
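To make the effect of quoting concrete, here is a small sketch that counts how many arguments a function receives with and without double quotes. The function name count_args and the file name are invented for the illustration.

```bash
#!/usr/bin/env bash

# Print how many arguments were received (hypothetical helper).
count_args() {
  echo "$#"
}

filename="my summer holiday.txt"   # made-up value containing spaces

# Unquoted: the value is split on whitespace into separate words.
unquoted_count="$(count_args ${filename})"

# Quoted: the value is passed as a single argument, as intended.
quoted_count="$(count_args "${filename}")"

echo "unquoted: ${unquoted_count}, quoted: ${quoted_count}"
```

With the default IFS the unquoted call sees three arguments and the quoted call sees one.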
$foo is the classic form of variable referencing in bash. The alternative notation, known as parameter expansion, wraps the variable identifier in curly braces, like ${foo}. The plain brace form is standard shell syntax, while several of the forms listed below need bash 2 or newer (see echo $BASH_VERSION). Why is it considered good practice? It gives us a whole set of features:
array elements expanding: ${array[42]}
parameter expansion, like ${filename%.*} (removes the file extension), ${foo// } (removes all spaces) and ${BASH_VERSION%%.*} (gets the major version of bash)
variable concatenation: ${dirname}/${filename}
appending string to a variable: ${HOME}/.bashrc
accessing positional parameters (arguments to a script) beyond $9, such as ${10}
substring support: ${foo:1:5}
indirect referencing: ${!foo} expands to the value held by the parameter whose name is stored in foo (bar=42; foo="bar"; echo "${!foo}" prints 42)
case modification (bash 4 and later): ${foo^} expands to the value of foo with its first character in uppercase, and the , operator does the same for lowercase. Their doubled forms (^^ and ,,) convert all characters
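The forms listed above can be tried out in a short script. This is a sketch with made-up variable names and values; case modification is left out so that it also runs on bash versions older than 4.

```bash
#!/usr/bin/env bash

filename="report.final.txt"
foo="a b c"
array=(zero one two)
bar=42
ref="bar"                    # holds the NAME of another variable

stem="${filename%.*}"        # shortest ".*" suffix removed: report.final
no_spaces="${foo// }"        # every space removed: abc
major="${BASH_VERSION%%.*}"  # everything from the first dot removed
second="${array[1]}"         # array element: one
substring="${filename:1:5}"  # 5 characters starting at offset 1: eport
indirect="${!ref}"           # value of the variable named by ref: 42
config="${HOME}/.bashrc"     # braces mark where the name ends

# "$10" is read as "${1}0", so braces are required past the ninth argument.
set -- a b c d e f g h i j
tenth="${10}"                # j

echo "${stem} ${no_spaces} ${second} ${substring} ${indirect} ${tenth}"
```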
In most common cases, the brace form gives us no advantage over the classic one, but using it everywhere keeps code consistent and can be considered good practice. Read more about it here.
You also have to know that variables in bash are global by default. This can cause problems such as shadowing, accidental overwriting or ambiguous references. The local keyword restricts a variable to the function that declares it (and the functions it calls), keeping it out of the global namespace. Just remember: declare all of your functions’ variables as local.
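The difference between a global and a local variable can be seen in a few lines. The function names leaky and contained, and the variable counter, are invented for this sketch.

```bash
#!/usr/bin/env bash

counter="global"

leaky() {
  counter="changed by leaky"              # no local: overwrites the global
}

contained() {
  local counter="only inside contained"   # gone again when the function returns
}

contained
after_contained="${counter}"    # the global value is untouched

leaky
after_leaky="${counter}"        # the global value was overwritten

echo "after contained: ${after_contained}"
echo "after leaky:     ${after_leaky}"
```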
Within a bash script, you will often operate on other files, so you have to be really careful when using relative paths. By default, a script inherits its current working directory from the parent shell.
$ pwd
/home/jakub
$ cat test/test
#!/usr/bin/env bash
echo "$(pwd)"
$ ./test/test
/home/jakub
The problem appears when the current working directory and the script’s location differ. You cannot simply refer to ./some_file then, since it does not point to the some_file placed next to your script. To operate easily on files in the script’s directory and avoid messing up random system files, consider using this handy one-liner to change the script’s working directory to the directory containing the script:
cd "$(cd "$(dirname "${BASH_SOURCE[0]}")" >/dev/null && pwd)" || exit 1

$ pwd
/home/jakub
$ cat test/test
#!/usr/bin/env bash
cd "$(cd "$(dirname "${BASH_SOURCE[0]}")" >/dev/null && pwd)" || exit 1
echo "$(pwd)"
$ ./test/test
/home/jakub/test
Looks much more natural, doesn’t it?
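A variation on the same idea, offered only as a sketch: instead of changing directory, store the script’s directory in a variable and build paths from it. The names script_dir and config_file are assumptions made for this example, and some_file is the hypothetical file from the text.

```bash
#!/usr/bin/env bash

# Absolute path of the directory containing this script.
# The inner cd runs inside a command substitution (a subshell), so the
# working directory of the script itself is left alone.
script_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" >/dev/null && pwd)" || exit 1

# Hypothetical file expected to sit next to the script.
config_file="${script_dir}/some_file"

echo "script directory: ${script_dir}"
echo "would read:       ${config_file}"
```

Building paths from script_dir keeps relative arguments passed by the user meaningful, since the working directory never changes.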
Using ls inside a bash script is almost always flawed. I cannot recall a single good reason to do it. To explain why, let’s go through two common examples:
for file in $(ls *.txt)
Word splitting will ruin this for loop as soon as any filename contains whitespace. What’s more, if a filename contains a glob character (also known as a wildcard, like *, ?, [, ]), it will be treated as a glob pattern and expanded by the shell. That’s probably not what you want. Another problem is that POSIX allows pathnames to contain any character except \0 (including |, / and even newline). This makes it impossible to determine where the first pathname ends and the second one begins when dealing with ls output.
for file in "$(ls *.txt)"
Double quotes around ls will cause its output to be treated as a single word – not as a list of files, as desired.
How do you iterate over a list of files the right way? There are a few possibilities:
for file in ./*.txt
This uses bash globbing feature mentioned above. Remember to double quote "${file}"!
find . -type f -name '*.txt' -exec ...
This one is probably the best solution. The find utility lets you search with regular expressions (-regex), recurses into subdirectories and has many other built-in features you may find useful. Here is a great synopsis of this tool.
find . -type f -name '*.txt' -print0 | xargs -0 ...
An alternative that combines find and xargs. It’s neither simpler nor shorter, but the advantage of xargs is that it can run commands in parallel (the -P option in GNU and BSD xargs). Read more about the differences here.
To summarize, never try to parse the output of the ls command. It’s simply not intended to be parsed, and there is no way to make it work reliably. Read more here.
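The two safe approaches can be compared on a scratch directory containing deliberately awkward file names. This is a sketch; the directory, the file names and the array names are made up, and reading the find output back with read -d '' is one common way of consuming -print0 output in bash, not the only one.

```bash
#!/usr/bin/env bash

# Scratch directory with awkward (hypothetical) file names.
work_dir="$(mktemp -d)" || exit 1
touch "${work_dir}/plain.txt" "${work_dir}/with space.txt" "${work_dir}/star*.txt"

# 1) Globbing: each match arrives as one word, whatever it contains.
glob_files=()
for file in "${work_dir}"/*.txt; do
  [[ -e "${file}" ]] || continue   # the pattern stays literal when nothing matches
  glob_files+=("${file}")
done

# 2) find with NUL-separated output, read back without word splitting.
find_files=()
while IFS= read -r -d '' file; do
  find_files+=("${file}")
done < <(find "${work_dir}" -type f -name '*.txt' -print0)

echo "glob: ${#glob_files[@]} files, find: ${#find_files[@]} files"

rm -rf -- "${work_dir:?}"
```

Both loops see exactly three files, including the ones with a space and an asterisk in their names.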
Checking for non-zero exit statuses of commands executed within a bash script is often forgotten. It’s easy to imagine what happens when a cd command preceding file operations fails and nobody checks the result (because of “No such file or directory”, for example).
#!/usr/bin/env bash
cd "${some_directory}"
rm -rf ./*
The example above works well, but only if nothing goes wrong. The intention was to delete the contents of some_directory/, but it may end up executing rm -rf ./* in a completely different location.
cd "${some_directory}" && rm -rf ./* and cd "${some_directory}" || return (inside a function; use exit at the top level of a script) are the simplest, self-descriptive solutions. In both cases, the deletion won’t run if cd returns non-zero. It’s worth pointing out that this code is still vulnerable to a common programming error: misspelling.
Executing cd "${some_dierctory}" && rm -rf ./* will end up deleting files you probably want to keep (unless a variable with the misspelled name some_dierctory happens to be set). "${some_dierctory}" expands to "", which bash accepts as a cd argument without reporting an error, so rm -rf ./* runs in a directory you never meant to empty. Don’t worry though, that’s not the end of the story.
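The guarded form can be wrapped in a small function that also rejects an empty argument, which covers the misspelled-variable case. This is a sketch under stated assumptions: the function name empty_dir, the scratch directory layout and the status variables are all invented for the example.

```bash
#!/usr/bin/env bash

# Delete everything matched by ./* inside the directory given as $1.
# (empty_dir is a hypothetical helper, not a standard command.)
empty_dir() {
  local target="${1-}"
  if [[ -z "${target}" ]]; then
    echo "empty_dir: no directory given" >&2
    return 1
  fi
  # The subshell keeps the caller's working directory unchanged;
  # rm only runs when cd succeeded.
  ( cd -- "${target}" && rm -rf -- ./* )
}

# Scratch area so the demonstration cannot touch real files.
scratch="$(mktemp -d)" || exit 1
mkdir "${scratch}/cache"
touch "${scratch}/cache/a.tmp" "${scratch}/keep.me"

ok_status=0;      empty_dir "${scratch}/cache" || ok_status=$?
missing_status=0; empty_dir "${scratch}/no_such_dir" 2>/dev/null || missing_status=$?
empty_status=0;   empty_dir "" 2>/dev/null || empty_status=$?

# Record what survived before cleaning up.
kept="no";    [[ -e "${scratch}/keep.me" ]] && kept="yes"
removed="no"; [[ ! -e "${scratch}/cache/a.tmp" ]] && removed="yes"

echo "ok=${ok_status} missing=${missing_status} empty=${empty_status}"
echo "keep.me kept: ${kept}, a.tmp removed: ${removed}"

rm -rf -- "${scratch:?}"
```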
Bash has some programmer-friendly switches you should be aware of:
set -o nounset tells bash to treat references to unset variables as an error. This one saves us from many typos.
set -o errexit tells bash to exit the script immediately if any statement returns a non-zero status. One may say that errexit gives us error checking for free, but it can be tricky to use correctly. Some commands return non-zero for a mere warning, and sometimes you know exactly how to handle a particular command’s error. Read more here.
set -o pipefail changes the default behavior when using pipes. By default, bash takes the status code of the last command in a pipeline, meaning that false | true will be considered to return 0. It may not be what you want, since this approach ignores errors raised by earlier commands in the pipeline. This is where pipefail comes in. This option sets the exit code of a pipeline to that of the rightmost command that exited with a non-zero status (or to 0 if all commands exit successfully).
set -x causes bash to print each command right before executing it (i.e. after globbing and argument expansion). Definitely a great help when trying to debug a bash script failure.
Of course, error handling applies not only to the cd command described above. Your script should take into account the vast majority of possible problems, like spaces in pathnames, missing files, directories that were not created or commands that do not exist (you know, awk isn’t always present on the OS you’re about to run your script on).
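The effect of these switches can be observed safely by enabling them inside subshells and recording the exit status. This is a sketch; the status variable names and the require_command helper are assumptions made for the example, and no_such_command_hopefully stands in for any missing tool.

```bash
#!/usr/bin/env bash

# Without pipefail a pipeline reports the status of its last command only.
default_status=0
( set +o pipefail; false | true ) || default_status=$?

# With pipefail the failing "false" is no longer hidden.
pipefail_status=0
( set -o pipefail; false | true ) || pipefail_status=$?

# With nounset, expanding a misspelled (never assigned) variable aborts
# the subshell with a non-zero status instead of yielding "".
nounset_status=0
( set -o nounset; unset -v some_dierctory; echo "${some_dierctory}" ) >/dev/null 2>&1 || nounset_status=$?

# Hypothetical helper: fail early when an external command is missing.
require_command() {
  command -v "$1" >/dev/null 2>&1 || {
    echo "required command not found: $1" >&2
    return 1
  }
}

require_status=0
require_command no_such_command_hopefully 2>/dev/null || require_status=$?

echo "default=${default_status} pipefail=${pipefail_status}"
echo "nounset=${nounset_status} require=${require_status}"
```

In a real script the switches would normally be set once near the top, for example set -o nounset and set -o pipefail, with require_command awk called before the first use of awk.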