Bash tips

Translators (3): Tocy, SVD, tsingkuo2019

Bash is not the most programmer-friendly tool. It requires a lot of caution and low-level knowledge, and it doesn't tolerate the slightest mistake (you know you can't type foo = 42, right?). On the other hand, bash is everywhere (even on Windows 10), it's quite portable and powerful, and in practice it is often the most pragmatic choice for automating tasks. Luckily, following a set of simple rules can save you from many of its minefields.
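To make the foo = 42 remark concrete, here is a minimal sketch. It assumes no command or function named foo exists on the system, which is what makes the spaced form fail.

```bash
#!/usr/bin/env bash
# Correct: no spaces around '=' makes this a variable assignment.
foo=42
echo "foo is ${foo}"

# Wrong: with spaces, bash parses 'foo = 42' as a command named foo,
# called with the two arguments '=' and '42'. Assuming no such command
# exists, it fails with "command not found" and foo is not reassigned.
if ! foo = 42 2>/dev/null; then
  echo "'foo = 42' was treated as a command, not an assignment"
fi
```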

1. Shebang

There are a number of possible shebangs you can use to refer to the interpreter you want to execute your code under. Some of them are:

  • #!/usr/bin/env bash

  • #!/bin/bash

  • #!/bin/sh

  • #!/bin/sh -

We all know a shebang is nothing but the path (absolute, or relative to the current working directory) to the shell interpreter, but which one should you prefer?

Long story short: you should use #!/usr/bin/env bash for portability. The thing is that POSIX does not standardize path names, so different UNIX-based systems may have bash installed in different locations. You cannot safely assume that, for example, /bin/bash even exists (some BSD systems place the bash binary at /usr/local/bin/bash).

The env utility helps us work around this limitation: #!/usr/bin/env bash runs the code under the first bash interpreter found in PATH. It's not a perfect solution (what if the same problem applies to /usr/bin/env itself? Luckily, every UNIX OS I know of has env in exactly that place), but it's the best we can do.
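A minimal sketch of what the env-based shebang implies in practice. It uses the standard command -v builtin as a stand-in for the PATH search that env performs; the two can differ if the PATH seen by env is not the one you have at your prompt.

```bash
#!/usr/bin/env bash
# With this shebang, env searches the directories in PATH for "bash"
# and runs the first match, just as typing "bash" at a prompt would.

# command -v performs the same kind of PATH lookup, so this prints the
# bash that the shebang would most likely select.
command -v bash

# Confirm that the script really is running under bash.
echo "Running under bash ${BASH_VERSION}"
```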

However, there is one exception I’m aware of: for a system boot script, use /bin/sh since it’s the standard command interpreter for the system.

It's worth checking out this and this article for more information.

2. Always use quotes

This is the simplest and best advice you can follow to save yourself from many possible pitfalls. Incorrect shell quoting is the most common cause of bash programmers' headaches. Unfortunately, getting it right is not as easy as it is important.

There are many great articles covering this specific topic in depth. I have nothing to add beyond recommending this and this article.

It's worth remembering that you should generally use double quotes: they still allow variable and command substitution, but they stop the result from being split into words and expanded as a glob pattern.
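A small sketch of what quoting changes. The filename and the count_args helper are made up for illustration; counting the arguments a function receives shows the effect without touching any files.

```bash
#!/usr/bin/env bash
# Hypothetical filename containing a space and a glob character.
file='my *report*.txt'

# count_args prints how many arguments it received.
count_args() { echo "$#"; }

# Unquoted: the value is split on whitespace, and each resulting word is
# then subject to glob expansion against files in the current directory.
# Usually 2 here ('my' plus the unmatched pattern), more if files match.
unquoted=$(count_args $file)

# Double quotes: expansion still happens, but the result is passed as
# exactly one argument, with no word splitting and no globbing.
quoted=$(count_args "$file")

# Single quotes: no expansion at all, the text $file is passed literally.
printf '%s\n' '$file'

echo "unquoted: ${unquoted} argument(s), quoted: ${quoted} argument(s)"
```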

3. Variables usage

$foo is the classic form of variable referencing in bash. The alternative is to wrap the variable name in curly braces, like ${foo}, a form known as parameter expansion; bash 2 (see echo $BASH_VERSION) greatly extended what can go inside those braces. Why is the braced form considered good practice? It unlocks a whole set of features:

  • array elements expanding: ${array[42]}

  • parameter expansion, like ${filename%.*} (removes the file extension), ${foo// } (removes all spaces) and ${BASH_VERSION%%.*} (gets the major version of bash)

  • variable concatenation: ${dirname}/${filename}

  • appending a string to a variable: ${HOME}/.bashrc

  • access to positional parameters (arguments to a script) beyond $9, such as ${10}

  • substring support: ${foo:1:5}

  • indirect referencing: ${!foo} expands to the value held by the variable whose name is stored in foo (bar=42; foo="bar"; echo "${!foo}" prints 42)

  • case modification (bash 4 and newer): ${foo^} converts foo's first character to uppercase, and the , operator converts it to lowercase. Their doubled forms (^^ and ,,) convert all characters
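The sketch below exercises each form from the list on made-up sample values (none of these names come from the article). The case-modification lines need bash 4 or newer.

```bash
#!/usr/bin/env bash
# Illustrative sample values.
filename='archive.tar.gz'
dirname='/tmp/demo'
foo='hello world'
array=(zero one two)
bar=42
ref='bar'                      # holds the *name* of another variable
word='bash'

element="${array[2]}"          # array element              -> two
stem="${filename%.*}"          # drop shortest '.*' suffix  -> archive.tar
nospace="${foo// }"            # delete every space         -> helloworld
major="${BASH_VERSION%%.*}"    # bash major version, e.g.   -> 5
path="${dirname}/${filename}"  # concatenation              -> /tmp/demo/archive.tar.gz
sub="${foo:6:5}"               # 5 chars from offset 6      -> world
indirect="${!ref}"             # value of $bar              -> 42
first_upper="${word^}"         # first char to uppercase    -> Bash
all_upper="${word^^}"          # all chars to uppercase     -> BASH
all_lower="${all_upper,,}"     # all chars to lowercase     -> bash

# Positional parameters past $9 need braces: $10 would mean "${1}0".
set -- a b c d e f g h i j
tenth="${10}"                  # -> j

printf '%s\n' "$element" "$stem" "$nospace" "$major" "$path" \
  "$sub" "$indirect" "$first_upper" "$all_upper" "$all_lower" "$tenth"
```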

In most cases the braced form gives no advantage over the classic one, but using it everywhere keeps code consistent and can be considered good practice. Read more about it here.

What you also have to know about variables in bash is that, by default, all of them are global. This can result in problems like shadowing, overriding or ambiguous referencing. The local builtin restricts a variable's scope to the function that declares it (and the functions it calls), protecting it from leaking into the global namespace. Just remember: declare all of your functions' variables as local.
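A minimal sketch of the difference local makes. The counter variable and both functions are hypothetical, chosen only to show a global being overwritten versus shadowed.

```bash
#!/usr/bin/env bash
counter=0   # a global variable

# Without local, the assignment overwrites the global counter.
bump_global() {
  counter=100
}

# With local, the function gets its own counter, which shadows the
# global only while the function runs.
bump_local() {
  local counter=100
  echo "inside bump_local: ${counter}"   # prints 100
}

bump_local
after_local="${counter}"
echo "after bump_local: ${after_local}"  # prints 0: the global is untouched

bump_global
echo "after bump_global: ${counter}"     # prints 100: the global was overwritten
```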

4. Watch the script’s working directory

Within a bash script you will often operate on other files, so you have to be really careful with relative paths. By default, a script inherits its current working directory from the parent shell.

$ pwd
/home/jakub

$ cat test/test
#!/usr/bin/env bash
echo "$(pwd)"

$ ./test/test
/home/jakub

The problem arises when the current working directory differs from the script's location. You then cannot simply refer to ./some_file, since it does not point to the some_file placed next to your script. To operate easily on files in the script's directory and avoid messing up unrelated system files, consider this handy one-liner, which changes the script's working directory to the directory containing the script. The trailing || exit 1 stops the script if cd fails; a bare return only works inside a function or a sourced file, and in an executed script bash reports an error and carries on.

cd "$(cd "$(dirname "${BASH_SOURCE[0]}")" > /dev/null && pwd)" || exit 1

$ pwd
/home/jakub

$ cat test/test
#!/usr/bin/env bash
cd "$(cd "$(dirname "${BASH_SOURCE[0]}")" >/dev/null && pwd)" || exit 1
echo "$(pwd)"

$ ./test/test
/home/jakub/test

Looks much more natural, doesn’t it?

5. You don’t really need ls

Using ls inside a bash script is almost always a flawed approach; I can't recall a single good reason to do it. To explain why, let's go through two common examples:

for file in $(ls *.txt)

Word splitting will break this for loop as soon as any filename contains whitespace. What's more, if a filename contains a glob character (also known as a wildcard, like *, ?, [ or ]), it will be recognized as a glob pattern and expanded by the shell, which is probably not what you want. Another problem is that POSIX allows pathnames to contain any character except \0 (including |, / and even newline). This makes it impossible to tell where one pathname ends and the next one begins when dealing with ls output.

for file in "$(ls *.txt)"

Double quotes around $(ls *.txt) cause its whole output to be treated as a single word, not as a list of files as desired.

How do you iterate over a list of files the right way? There are a few options:

for file in ./*.txt

This uses the bash globbing feature mentioned above. Remember to double-quote "${file}"!
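A runnable sketch of the glob-based loop. To stay self-contained it works in a temporary directory with made-up filenames and globs "${demo_dir}"/*.txt instead of ./*.txt; enabling nullglob is an extra safeguard not mentioned above, so that a pattern matching nothing yields zero iterations rather than the literal pattern.

```bash
#!/usr/bin/env bash
# Throwaway directory with awkward (hypothetical) filenames.
demo_dir=$(mktemp -d)
touch "${demo_dir}/plain.txt" "${demo_dir}/with space.txt" "${demo_dir}/star*.txt"

# nullglob: an unmatched pattern expands to nothing instead of itself.
shopt -s nullglob

count=0
for file in "${demo_dir}"/*.txt; do
  # Quoting "${file}" keeps spaces and glob characters intact.
  printf 'found: %s\n' "${file##*/}"
  count=$((count + 1))
done

shopt -u nullglob
echo "${count} file(s)"
rm -rf -- "${demo_dir}"
```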

find . -type f -name '*.txt' -exec ...

This is probably the best solution. The find utility supports regex-based search (-regex), recurses into subdirectories and has many other built-in features you may find useful. Here is a great synopsis of this tool.
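A sketch of the find -exec form on made-up sample files. It uses the POSIX '{} +' terminator, which passes many paths to one command invocation; the second find call runs a tiny bash -c snippet only to show that both paths arrive as separate, intact arguments.

```bash
#!/usr/bin/env bash
# Throwaway directory: a subdirectory, a name with spaces, and a
# non-matching file.
demo_dir=$(mktemp -d)
mkdir -p "${demo_dir}/sub dir"
touch "${demo_dir}/a.txt" "${demo_dir}/sub dir/b c.txt" "${demo_dir}/skip.log"

# find descends into "sub dir" on its own. With '-exec ... {} +', the
# matching paths are appended to the command line as separate arguments,
# so no word splitting is involved.
find "${demo_dir}" -type f -name '*.txt' -exec printf 'found: %s\n' {} +

# Count the arguments a single command invocation received.
arg_count=$(find "${demo_dir}" -type f -name '*.txt' -exec bash -c 'echo "$#"' _ {} +)
echo "arguments passed: ${arg_count}"

rm -rf -- "${demo_dir}"
```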

find . -type f -name '*.txt' -print0 | xargs -0 ...

An alternative that pipes the output of find into xargs. It's neither simpler nor shorter, but the advantage of xargs is that it can run commands in parallel. Read more about the differences here.
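A sketch of NUL-delimited processing on made-up sample files. The -P option that enables parallel runs is supported by GNU and BSD xargs but is not required by POSIX. The loop at the end is an extra variation not in the text above: the same NUL-separated stream read safely with read -d ''.

```bash
#!/usr/bin/env bash
demo_dir=$(mktemp -d)
touch "${demo_dir}/one.txt" "${demo_dir}/two words.txt" "${demo_dir}/three.txt"

# -print0 ends each path with a NUL byte, the one character that cannot
# appear in a pathname; xargs -0 splits its input on NUL to match.
# -n 1 gives each wc one file; -P 2 allows up to two wc processes at once.
find "${demo_dir}" -type f -name '*.txt' -print0 | xargs -0 -n 1 -P 2 wc -c

# Reading the same stream in bash: IFS= and -r keep each name verbatim,
# and -d '' makes read stop at each NUL byte.
count=0
while IFS= read -r -d '' file; do
  printf 'found: %s\n' "${file##*/}"
  count=$((count + 1))
done < <(find "${demo_dir}" -type f -name '*.txt' -print0)

echo "${count} file(s)"
rm -rf -- "${demo_dir}"
```

Process substitution (< <(...)) is used instead of a pipe so that the loop runs in the current shell and count keeps its value afterwards.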

To summarize: never try to parse the output of the ls command. It simply isn't intended to be parsed, and there is no way to make that work reliably. Read more here.

6. Expect the unexpected

Checking for non-zero exit statuses of commands executed within a bash script is often forgotten. It's easy to imagine what would happen if the cd command preceding some file operations failed (for example, with "No such file or directory") and the script simply carried on.

#!/usr/bin/env bash
cd "${some_directory}"
rm -rf ./*

The example above works well, but only if nothing goes wrong. The intention was to delete the contents of some_directory/, but it may end up executing rm -rf ./* in a completely different location.

cd "${some_directory}" && rm -rf ./* and cd "${some_directory}" || exit 1 are the simplest, self-describing solutions (inside a function or a sourced script, || return works as well). In both cases, the deletion won't run if cd returns non-zero. It's worth pointing out that this code is still vulnerable to a common programming error: misspelling.

Executing cd "${some_dierctory}" && rm -rf ./* will end up deleting files you probably want to keep (unless a variable with that exact misspelled name, some_dierctory, happens to be set). "${some_dierctory}" expands to "", an argument that cd accepts without failing, so rm -rf ./* runs somewhere other than some_directory. Don't worry though, that's not the end of the story.
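A defensive sketch of the same clean-up. It is safe to run because it creates its own temporary directory; clean_dir is a hypothetical helper, and its body is a ( ... ) subshell so the cd does not affect the caller and exit only leaves the subshell. The explicit empty-argument check means even the misspelled variable from the text can never reach rm.

```bash
#!/usr/bin/env bash
# Safe stand-in for the article's some_directory.
some_directory=$(mktemp -d)
touch "${some_directory}/junk1" "${some_directory}/junk2"

clean_dir() (
  # Refuse an empty or missing argument before doing anything else.
  if [ -z "${1-}" ]; then
    echo 'clean_dir: no directory given' >&2
    exit 1
  fi
  # Only delete if cd actually succeeded.
  cd -- "$1" || exit 1
  rm -rf -- ./*
)

clean_dir "${some_directory}"

# The typo from the text: some_dierctory is unset, so it expands to "".
if clean_dir "${some_dierctory}" 2>/dev/null; then
  refused=no
else
  refused=yes
fi
echo "empty argument refused: ${refused}"

# Count what is left (nullglob: no match gives an empty array).
shopt -s nullglob
leftover=("${some_directory}"/*)
shopt -u nullglob
echo "files left: ${#leftover[@]}"
rmdir -- "${some_directory}"
```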

Bash has some programmer-friendly switches you should be aware of:

  • set -o nounset tells bash to treat references to unset variables as errors. This one saves us from many typos.

  • set -o errexit tells bash to exit the script immediately if any statement returns a non-zero status. One might say that errexit gives us error checking for free, but it can be tricky to use correctly: some commands return a non-zero status to signal a warning, and sometimes you know exactly how to handle a particular command's error. Read more here.

  • set -o pipefail changes the default behavior of pipes. By default, the exit status of a pipeline is the status of its last command, so false | true is considered to return 0. That may not be what you want, since errors raised by earlier commands in the pipeline are ignored. This is where pipefail comes in: it sets the exit status of a pipeline to that of the rightmost command that exited with a non-zero status (or to 0 if all commands succeed).

  • set -x causes bash to print each command right before executing it (that is, after globbing and argument expansion). Definitely a great help when debugging a failing bash script.
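A small sketch of the four switches. Each demo runs in a child process started with bash -c, an arrangement chosen here only so that the options cannot leak into the rest of the script; the printed statuses show the behavior described in the list.

```bash
#!/usr/bin/env bash
# nounset: expanding an unset variable is an error that stops the child.
bash -c 'set -o nounset; echo "${not_set}"; echo "not reached"' 2>/dev/null
nounset_status=$?

# errexit: the first failing command ends the child with its status.
bash -c 'set -o errexit; false; echo "not reached"'
errexit_status=$?

# pipefail: compare a pipeline's status without and with the option.
bash -c 'false | true'
plain_pipe_status=$?
bash -c 'set -o pipefail; false | true'
pipefail_status=$?

# xtrace: each command is echoed to stderr after expansion, prefixed
# with the default PS4 of '+ '.
bash -c 'set -x; name=world; echo "hello ${name}"'

echo "nounset: ${nounset_status}, errexit: ${errexit_status}"
echo "false | true without pipefail: ${plain_pipe_status}, with: ${pipefail_status}"
```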

Of course, error handling applies not only to the cd command described above. Your script should account for the vast majority of possible problems, like spaces in pathnames, missing files, directories that were not created, or non-existent commands (you know, awk isn't always present on the OS you're about to run your script on).
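For the missing-command case, a minimal sketch: require is a hypothetical helper built on the standard command -v builtin, and the command lists passed to it are only examples.

```bash
#!/usr/bin/env bash
# require: report the first named command that cannot be found.
require() {
  local cmd
  for cmd in "$@"; do
    if ! command -v "${cmd}" >/dev/null 2>&1; then
      printf 'error: required command not found: %s\n' "${cmd}" >&2
      return 1
    fi
  done
}

# Abort early if something essential is missing.
require mkdir rm || exit 1

# Or degrade gracefully when an optional tool such as awk is absent.
if require awk 2>/dev/null; then
  echo "awk is available"
else
  echo "awk is missing, falling back to pure bash"
fi
```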
