# 基础正则表达式

既然正则表达式是处理字符串的一种表示方式，那么对字符排序有影响的语系数据就会对正则表达式的结果有影响。此外也需要有支持工具程序来辅助才行。

因此这里先介绍一个最简单的字符串摘取工具程序 grep。前面讲解了 grep 的相关参数与参数，本章着重讲解进阶的 grep 选项说明，介绍完 grep 的功能后，就进入正则表达式的特殊字符处理能力

# 语系对正则表达式的影响

为什么语系数据会影响正则表达式的输出结果？在第 0 章计算器概论的文字编码系统里面谈到，文件其实记录的仅有 0 与 1，我们看到的字符与数值都是通过编码表转换来的。

由于不同语系的编码数据不同，就会造成数据处理结果的差异了，举例说明，假设两种语系输出结果为：

LANG=C：0 1 2 3 ... A B C D ..Z a b c d .. z
LANG=zh_TW ：0 1 2 3 ... a A b c C D .. z Z

两种语系明显不一样，如果想获取大写字符使用 [A-Z]时，会发现 C 可以获取到正确的大写字符（因为是连续的），zh_TW 连同小写也会 b-z 也会获取到，因为就编码的顺序来看，big5 语系可以获取到 A b B c C .. z Z 这一堆字符。

所以使用正则表达式时，需要留意当前的语系，否则可能发现与别人不同的截取结果

由于一般我们再联系正则表达式时，使用的是兼容于 POSIX 的标准，因此就使用 C 这个语系，因此下面的练习都是使用 LANG=C来练习的。为了避免这样编码所造成的英文与数字截取问题，因此特殊符号需要了解下

[:alnum:]：代表英文大小写字符及数字，即 0-9、A-Z、a-z
[:alpha:]：代表任何英文大小写字符，A-Z、a-z
[:blank:]：代表空格与 tab
[:cntrl:]：代表键盘上面的控制按键，包括 CR、LF、TAB、Del 等
[:digit:]：代表数字，0-9
[:graph:]：除了空格符（空格键与 tab 键）外其他的所有按键
[:lower:]：代表些小字符，a-z
[:print:]：代表任何可以被打印出来的字符
[:punct:]：代表标点符号（punctuation symbol）
[:upper:]：代表大写字符，A-Z
[:space:]：任何会产生空白的字符，包括空格、tab、CR 等
[:xdigit:]：代表 16 进制的数值类型，包括 0-9、A-F、a-f 的数字与字符

尤其是 [:alnum:]、[:alpha:]、[:upper:]、[:lower:]、[:digit:]一定要知道代表什么意思，因为他们要比 a-z 或 A-Z 的用途要确定。

# grep 的一些进阶选项

在第十章 BASH 中的 grep 谈论过一些基础用法，下面列出较进阶的 grep 选项与参数

grep [-A] [-B] [--color='auto'] '关键词' filename

-A：后面可以加数字，为 after 的意思，除了列出该行外，后续的 n 行也列出来
-B：后面可以加数字，为 befer 的意思，处理列出该行外，前面的 n 行也列出来
--colort=auto：可将正确的哪个截取数据列出颜色

1
2
3
4
5

实践与练习

# 范例 1：用 dmesg 列出核心信息，再以 grep 找出含有 qx1 那一行
dmesg | grep 'qx1'
# 笔者不知道自己使用的显卡是什么，而且使用的是虚拟机，而作者使用的显卡是 qx1，所以查看显卡信息

# 范例 2：用 --color=auto 显示查找到的关键词高亮,并显示行号
dmesg | grep -n --color=auto ‘qx1’

# 范例 3：在关键词所在行的前两行与后三行也一起显示出来
dmest | grep -n -A2 -B3 --color=auto 'qx1'

1
2
3
4
5
6
7
8
9

grep 是一个很常见也很常用的指令，最重要的功能就是进行字符串的比对，然后将符合用户需求的字符串打印出来。需要注意的是：grep 是已整行为单位来进行数据截取的

# 基础正则表达式练习

要了解正则表达式最简单的方法就是由实际练习去感受，所以在汇总特殊符号前，先以下面这个文件的内容来进行正则表达式的练习，练习前提为：

语系已经使用 export LANG=C；export LC_ALL=C
grep 已经使用 alias 设置为 grep --color=auto

本机默认为 LANG=en_US.UTF-8;LC_ALL=

文件为 regular——express.txt ，该文件内容是在 windows 系统下编辑的，所以包含 dos 的换行符；

"Open Source" is a good mechanism to develop programs.
apple is my favorite food.
Football game is not use feet only.
this dress doesn't fit me.
However, this dress is about $ 3183 dollars.
GNU is free air not free beer.
Her hair is very beauty.
I can't finish the test.
Oh! The soup taste good.
motorcycle is cheap than car.
This window is clear.
the symbol '*' is represented as start.
Oh!	My god!
The gd software is a library for drafting programs.
You are the best is mean you are the no. 1.
The world <Happy> is the same with "glad".
I like dog.
google is the best tools for search keyword.
goooooogle yes!
go! go! Let's go.
# I am VBird

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22

文件最后一行为空白行。

# 范例 1：搜索特定字符

从文件中取得 the 这个特定字符串，最简单的方式如下

[mrcode@study tmp]$ grep -n 'the' regular_express.txt
8:I can't finish the test.
12:the symbol '*' is represented as start.
15:You are the best is mean you are the no. 1.
16:The world <Happy> is the same with "glad".
18:google is the best tools for search keyword.

1
2
3
4
5
6

反向选择，可以看到输出结果少了上面的 8、12、15、16、18 行

[mrcode@study tmp]$ grep -vn 'the' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
4:this dress doesn't fit me.
5:However, this dress is about $ 3183 dollars.
6:GNU is free air not free beer.
7:Her hair is very beauty.
9:Oh! The soup taste good.
10:motorcycle is cheap than car.
11:This window is clear.
13:Oh!  My god!
14:The gd software is a library for drafting programs.
17:I like dog.
19:goooooogle yes!
20:go! go! Let's go.
21:# I am VBird
22:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18

忽略大小写，多出来几行

[mrcode@study tmp]$ grep -in 'the' regular_express.txt
8:I can't finish the test.
9:Oh! The soup taste good.
12:the symbol '*' is represented as start.
14:The gd software is a library for drafting programs.
15:You are the best is mean you are the no. 1.
16:The world <Happy> is the same with "glad".
18:google is the best tools for search keyword.

1
2
3
4
5
6
7
8

# 范例 2：利用中括号`[]`来搜索集合字符

如果要搜索 test 或 taste 这两个单词时，可以发现他们其实有共同的 t?st 存在

[mrcode@study tmp]$ grep -n 't[ae]st' regular_express.txt
8:I can't finish the test.
9:Oh! The soup taste good.

1
2
3

中括号中，无论几个字符都表示任意一个字符。如果想要搜索到所有 oo 字符时

[mrcode@study tmp]$ grep -n 'oo' regular_express.txt
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
9:Oh! The soup taste good.
18:google is the best tools for search keyword.
19:goooooogle yes!

1
2
3
4
5
6
7

如果不想要 oo 前面的 g 呢？

[mrcode@study tmp]$ grep -n '[^g]oo' regular_express.txt
2:apple is my favorite food.
3:Football game is not use feet only.
18:google is the best tools for search keyword.
19:goooooogle yes!

1
2
3
4
5

会发现可能会有一部分是正确的，一部分是错误的，比如 1、9 行少了，但是 google 和 goooooogle 还是出来了，是怎么回事？第 18 行，出现了 tools 所以也符合 [^g]oo，而 19 行，中间有那么多的 oo，也符合

继续，不想要 oo 前面是小写字符的

# 由于小写字符的 ASCII 编码顺序是连续的，所以可以简化为，否则就需要把 a-z 都写出来
[mrcode@study tmp]$ grep -n '[^a-z]oo' regular_express.txt
3:Football game is not use feet only.

# 取得有数字那一行
[mrcode@study tmp]$ grep -n '[0-9]' regular_express.txt
5:However, this dress is about $ 3183 dollars.
15:You are the best is mean you are the no. 1.

1
2
3
4
5
6
7
8

由于考虑到语系对于编码顺序的影响，因此除了连续编码使用减号 -，还可以使用如下的方法来取得前面两个测试的结果

[mrcode@study tmp]$ grep -n '[^[:lower:]]oo' regular_express.txt
3:Football game is not use feet only.

[mrcode@study tmp]$ grep -n '[[:digit:]]' regular_express.txt
5:However, this dress is about $ 3183 dollars.
15:You are the best is mean you are the no. 1.

1
2
3
4
5
6

# 范例 3：行首与行位字符 `^、$`

# 只要行首是 the 的
[mrcode@study tmp]$ grep -n '^the' regular_express.txt 
12:the symbol '*' is represented as start.

# 想要行首是小写字符开头的
[mrcode@study tmp]$ grep -n '^[a-z]' regular_express.txt 
2:apple is my favorite food.
4:this dress doesn't fit me.
10:motorcycle is cheap than car.
12:the symbol '*' is represented as start.
18:google is the best tools for search keyword.
19:goooooogle yes!
20:go! go! Let's go.
# 下面的等效
# [mrcode@study tmp]$ grep -n '^[[:lower:]]' regular_express.txt 

# 不要英文字母开头的
# ^ 在中括号内表示反选，在外表示定位首航
[mrcode@study tmp]$ grep -n '^[^a-zA-Z]' regular_express.txt 
1:"Open Source" is a good mechanism to develop programs.
21:# I am VBird

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21

行尾练习

# 找出行尾为 . 符号的数据
# 使用 \ 对 小数点转义
[mrcode@study tmp]$ grep -n '\.$' regular_express.txt 
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
4:this dress doesn't fit me.
5:However, this dress is about $ 3183 dollars.
6:GNU is free air not free beer.
7:Her hair is very beauty.
8:I can't finish the test.
9:Oh! The soup taste good.
10:motorcycle is cheap than car.
11:This window is clear.
12:the symbol '*' is represented as start.
14:The gd software is a library for drafting programs.
15:You are the best is mean you are the no. 1.
16:The world <Happy> is the same with "glad".
17:I like dog.
18:google is the best tools for search keyword.
20:go! go! Let's go.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22

这里需要说一句，原本的文件 5-9 行默认是 .^M$ 结尾的，也就是 \r\n，由于没有网络，无法下载文件，所以复制粘贴丢失了这些换行符，和书上结果不一样。

也就是说上面的示例 5-9 不应该出来的，使用命令查看特殊字符应该如下

[mrcode@study tmp]$ cat -An regular_express.txt | head -n 10 | tail -n 6
     5  However, this dress is about $ 3183 dollars.^M$
     6  GNU is free air not free beer.^M$
     7  Her hair is very beauty.^M$
     8  I can't finish the test.^M$
     9  Oh! The soup taste good.^M$
    10  motorcycle is cheap than car.$		# 但实际上 ^M 被丢失了

1
2
3
4
5
6
7
8

找出空白行

[mrcode@study tmp]$ grep -n '^$' regular_express.txt 
22:
# 只有行首和行尾的表示法，中间没有任何字符，所以是 ^$

1
2
3

假设你已经知道 shell script 或则是配置文件中，空白行与开头为 # 的那一行是批注，想要将这些数据忽略掉，该怎么做？

[mrcode@study tmp]$ cat -n /etc/rsyslog.conf 
# 在 centOS 7 中可以看到有 91 行，有大量的空行与批注信息

# 第一种写法：-v '^$' 是反选，也就是排除空行的，-v ‘^#’ 排除开头是 # 号的
# 但是这里的行号与源文件对不上了，后面的行号是针对前面排除空行后的行号
[mrcode@study tmp]$ grep -v '^$' /etc/rsyslog.conf | grep -vn '^#'
6:$ModLoad imuxsock # provides support for local system logging (e.g. via logger command)
7:$ModLoad imjournal # provides access to the systemd journal
18:$WorkDirectory /var/lib/rsyslog
20:$ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat
25:$IncludeConfig /etc/rsyslog.d/*.conf
28:$OmitLocalLogging on
30:$IMJournalStateFile imjournal.state
37:*.info;mail.none;authpriv.none;cron.none                /var/log/messages
39:authpriv.*                                              /var/log/secure
41:mail.*                                                  -/var/log/maillog
43:cron.*                                                  /var/log/cron
45:*.emerg                                                 :omusrmsg:*
47:uucp,news.crit                                          /var/log/spooler
49:local7.*                                                /var/log/boot.log

# 第二种实现：直接匹配行首非 # 开头的
# 因为使用了中括号表示需要有一个字符存在，所以空行的不会被匹配
[mrcode@study tmp]$ grep -n '^[^#]' /etc/rsyslog.conf 
9:$ModLoad imuxsock # provides support for local system logging (e.g. via logger command)
10:$ModLoad imjournal # provides access to the systemd journal
26:$WorkDirectory /var/lib/rsyslog
29:$ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat
36:$IncludeConfig /etc/rsyslog.d/*.conf
40:$OmitLocalLogging on
43:$IMJournalStateFile imjournal.state
54:*.info;mail.none;authpriv.none;cron.none                /var/log/messages
57:authpriv.*                                              /var/log/secure
60:mail.*                                                  -/var/log/maillog
64:cron.*                                                  /var/log/cron
67:*.emerg                                                 :omusrmsg:*
70:uucp,news.crit                                          /var/log/spooler
73:local7.*                                                /var/log/boot.log

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39

这里要注意的是批注可以出现在任意处，所以匹配行首的是最安全的做法

# 范例 4：任意一个字符 `.` 与重复字符 `*`

在第十章 bash 中，通配符 *表示任意（0 或多个）字符，但是正则表达式中并不是这样，他们含义如下：

.：一定有一个任意字符
*：重复前一个字符，0 到任意次，为组合形态

# 找出 g??d 的字符串，也就是 g 开头 d 结尾的 4 字符的字符串
[mrcode@study tmp]$ grep -n 'g..d' regular_express.txt 
1:"Open Source" is a good mechanism to develop programs.
9:Oh! The soup taste good.
16:The world <Happy> is the same with "glad".

# 找出 oo、ooo、ooo 等数据，至少含有 2 个 o
# 注意，这里不能写 oo* 因为，*是作用于第二个 o 的，表示 0 到任意个
# 也就是说如果是 oo* 有可能匹配到一个 o 
[mrcode@study tmp]$ grep -n 'ooo*' regular_express.txt 
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
9:Oh! The soup taste good.
18:google is the best tools for search keyword.
19:goooooogle yes!

# 找出 开头与结尾都是 g ，并且中间至少含有一个 o 的数据
# 也就是 gog、goog 之类的数据
[mrcode@study tmp]$ grep -n 'goo*g' regular_express.txt 
18:google is the best tools for search keyword.
19:goooooogle yes!

# 找出 开头与结尾都是 g，中间有无字符均可
[mrcode@study tmp]$ grep -n 'g*g' regular_express.txt 
1:"Open Source" is a good mechanism to develop programs.
3:Football game is not use feet only.
9:Oh! The soup taste good.
13:Oh!  My god!
14:The gd software is a library for drafting programs.
16:The world <Happy> is the same with "glad".
17:I like dog.
18:google is the best tools for search keyword.
19:goooooogle yes!
20:go! go! Let's go.
# 使用 g*g 发现第一行的数据就不匹配，这个还是需要再终端看，因为可以开启高亮，方便查看哈
# 原因是 * 作用于 g，g* 代表空字符或一个以上的 g，因此应该匹配 g、gg、ggg 等

# 正确的应该这样实现
[mrcode@study tmp]$ grep -n 'g.*g' regular_express.txt 
1:"Open Source" is a good mechanism to develop programs.
14:The gd software is a library for drafting programs.
18:google is the best tools for search keyword.
19:goooooogle yes!
20:go! go! Let's go.

# 找出包含任意数字的数据
# 同上，[0-9]* 只作用于一个中括号
[mrcode@study tmp]$ grep -n '[0-9][0-9]*' regular_express.txt 
5:However, this dress is about $ 3183 dollars.
15:You are the best is mean you are the no. 1.
# 直接使用 grep -n '[0-9]' regular_express.txt 也可以得到相同结果哈

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52

# 范例 5：限定连续正则字符范围 `{}`

找出 2 个到 5 个 o 的连续字符串

# 华括弧在 shell 中是特殊符号，需要转义
[mrcode@study tmp]$ grep -n 'o\{2\}' regular_express.txt 
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
9:Oh! The soup taste good.
18:google is the best tools for search keyword.
19:goooooogle yes!
# 上述结果是至少是 2 个 oo 的出来了

# 单词开头结尾都是 g，中间 o，至少 2 个，最多 5 个
[mrcode@study tmp]$ grep -n 'go\{2,5\}g' regular_express.txt 
18:google is the best tools for search keyword.

# 承上，只是中间的 o 至少 2 个
[mrcode@study tmp]$ grep -n 'go\{2,\}g' regular_express.txt 
18:google is the best tools for search keyword.
19:goooooogle yes!

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19

# 基础正则表示法字符汇总

^word：搜索的关键词 word 在行首

范例：搜索行首为 # 的，并列出行号 grep -n '^#' file
word$：搜索的关键词 word 在行尾

范例：搜索以！结尾的，grep -n '!$' file
.：一定有一个任意字符

范例：搜索字符串可以是 eve、eae、eee、e e；grep -n 'e.e' file
\：转义字符

范例：搜索含有单引号数据。grep -n '\’' file
*：重复另个到无穷多个前一个字符

范例：找出含有 es、ess、esss 等字符串；grep -n 'es*' file
[list]：里面列出想要截取的字符合集

范例：找出含有 g1 或 gd 的数据；grep -n 'g[1d]' file
[n1-n2]：字符合集范围

范例：找出含有任意大写字母的数据；grep -n '[A-Z]' file
[^list]：不要包含该集合中的字符或该范围的字符

范例：找出 ooa、oog 但是不包含 oot 的数据; grep -n 'oo[^t]'
\{n,m\}：连续 n 到 m 个前一个字符
\{n\}：连续 n 个前一个字符
\{n,\}：至少 n 个以上的前一个字符；咋效果上感觉和 \{n\} 是一样的

最后再强调，通配符和正则表达式不一样，比如在 ls 命令中找出以 a 开头的文件

通配符：ls -l a*
正则表达式：ls | grep -n '^a' 或则 ls | grep -n '^a.*'

# 范例：以 ls -l 配合 grep 找出 /etc/ 下文件类型为链接文件属性的文件名
# 符号链接文件的特点是权限前面一位是 l，根据 ls 的输出，只要找到行首为 l 的即可
[mrcode@study tmp]$ ls -l /etc | grep '^l'
lrwxrwxrwx.  1 root root       56 Oct  4 18:22 favicon.png -> /usr/share/icons/hicolor/16x16/apps/fedora-logo-icon.png
lrwxrwxrwx.  1 root root       22 Oct  4 18:23 grub2.cfg -> ../boot/grub2/grub.cfg

1
2
3
4
5
6

# sed 工具

了解了一些正则基础使用后，可以来玩一玩 sed 和 awk ；作者就利用他们两个实现了一个小工具：logfile.sh 分析登录文件（第十八章会讲解）。里面绝大部分关键词的提取、统计等都是通过他们来完成的

sed：本身是一个管线命令，可以分析 standard input 的数据，还可以将数据进行替换、新增、截取特定行等功能

sed [-nefr] [动作]

选项与参数：

n：使用安静（silent）模式

在一般 sed 的用法中，所有来自 STDIN 的数据一般都会列出到屏幕上，加上 -n 之后，只有经过 sed 特殊处理的那一行（或则动作）才会被打印出来
e：直接在指令模式上进行 sed 的动作编辑
f：直接将 sed 的动作写在一个文件内，- f filename 则可以执行 filename 内的 sed 动作
r：sed 的动作支持是延伸类型正则表达式的语法（预设是基础正则表达式语法）
i：直接修改读取的文件内容，而不是由屏幕输出

动作说明：[n1[,n2]]function

n1,n2：不见得会存在，一般代表「选择进行动作的行数」，比如：如果我的动作是需要再 10 到 20 行之间进行的，则「10,20[动作行为]」

function 有如下：

a：新增，a 后面可以接字符串，这些字符串会在新的一行出现（当前的下一行）
c：替换，c 后面可以接字符串，这些字符串替换 n1,n2 之间的行
d：删除，后面不接任何字符串
i：插入，i 的后面可以接字符串，而这些字符串会在新的一行出现（当前的上一行）
p：打印，将某个选择的数据打印。通常 p 会参与 sed -n 一起运作
s：替换，可以直接进行替换工作。通常这个 s 的动作可以搭配正则表达式，例如：1,20s/old/new/g

# 以行为单位的新增/删除功能

# 范例1：将 /etc/passwd 的内容列出并且打印行号，同时将第 2~5 行删除
[mrcode@study ~]$ nl /etc/passwd | sed '2,5d'   	# 注意写法和结果
     1  root:x:0:0:root:/root:/bin/bash
     6  sync:x:5:0:sync:/sbin:/bin/sync
     7  shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
# 另外这里，应该带上 sed -e '2,5d' 才标准，不过不带也可以，但是需要使用单引号括起来
# 实测不用单引号也可以实现

# 范例2：只删除第二行
[mrcode@study ~]$ nl /etc/passwd | sed '2d'

# 范例3：删除第三行到最后一行
[mrcode@study ~]$ nl /etc/passwd | sed '3,$d'
     1  root:x:0:0:root:/root:/bin/bash
     2  bin:x:1:1:bin:/bin:/sbin/nologin

# 范例4：在第二行后（也就是加载第三行）加上「drink tea？」字样
[mrcode@study ~]$ nl /etc/passwd | sed '2a drink tea?'
     1  root:x:0:0:root:/root:/bin/bash
     2  bin:x:1:1:bin:/bin:/sbin/nologin
drink tea?
     3  daemon:x:2:2:daemon:/sbin:/sbin/nologin

# 范例5：在第二行后面加入两行字
# 注意：不要一开始就写好所有的单引号，因为需要使用 \ + 回车触发换行
[mrcode@study ~]$ nl /etc/passwd | sed '2a drink tea \
> drink beer?'
     1  root:x:0:0:root:/root:/bin/bash
     2  bin:x:1:1:bin:/bin:/sbin/nologin
drink tea 
drink beer?

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32

# 以行为单位的取代显示功能

# 范例1：将第 2-5 行的内容替换为 no 2-5 nuber
[mrcode@study ~]$ nl /etc/passwd | sed '2,5c no 2-5 number'
     1  root:x:0:0:root:/root:/bin/bash
no 2-5 number

# 范例2：取出第 11-20 行
# 通过之前的知识来达成需要这样写
[mrcode@study ~]$ nl /etc/passwd | head -n 20 | tail -n 10
    11  games:x:12:100:games:/usr/games:/sbin/nologin
    12  ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
    13  nobody:x:99:99:Nobody:/:/sbin/nologin
    14  systemd-network:x:192:192:systemd Network Management:/:/sbin/nologin
    15  dbus:x:81:81:System message bus:/:/sbin/nologin
    16  polkitd:x:999:998:User for polkitd:/:/sbin/nologin
    17  colord:x:998:997:User for colord:/var/lib/colord:/sbin/nologin
    18  libstoragemgmt:x:997:995:daemon account for libstoragemgmt:/var/run/lsm:/sbin/nologin
    19  rpc:x:32:32:Rpcbind Daemon:/var/lib/rpcbind:/sbin/nologin
    20  saslauth:x:996:76:Saslauthd user:/run/saslauthd:/sbin/nologin
# 注意需要使用 -n 只输出 sed 处理过的数据
[mrcode@study ~]$ nl /etc/passwd | sed -n '11,20p'

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21

# 部分数据的搜索并替换功能

除了整行的处理模式外，还可以用行为单位进行部分数据的搜索并替换的功能，基本上 sed 的搜索与替换与 vi 类似

sed 's/要被替换的字符串/新的字符串/g'

# 范例1：先观察原始信息，利用 /sbin/ifconfig 查询 IP
[mrcode@study ~]$ /sbin/ifconfig 
enp0s3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.0.128  netmask 255.255.255.0  broadcast 192.168.0.255
        inet6 fe80::deb9:3a1b:fd0f:f6c2  prefixlen 64  scopeid 0x20<link>
        ether 08:00:27:a0:49:8f  txqueuelen 1000  (Ethernet)
        RX packets 2436261  bytes 219827411 (209.6 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2011081  bytes 319310584 (304.5 MiB)
# 还未讲解到 IP,这里先关注第二行的 IP

# 利用关键词配合 grep 截取出关键的一行数据
[mrcode@study ~]$ /sbin/ifconfig enp0s3 | grep 'inet '
        inet 192.168.0.128  netmask 255.255.255.0  broadcast 192.168.0.255

# 将 ip 前面的信息删除，也就是 inet 
[mrcode@study ~]$ /sbin/ifconfig enp0s3 | grep 'inet ' | sed 's/inet //g'
        192.168.0.128  netmask 255.255.255.0  broadcast 192.168.0.255
# 需要使用通配符，不然会留下前面的空白符号：任意字符开头另个或多个
[mrcode@study ~]$ /sbin/ifconfig enp0s3 | grep 'inet ' | sed 's/^.*inet //g'
192.168.0.128  netmask 255.255.255.0  broadcast 192.168.0.255

# 再删除后续部分，只剩下 192.168.0.128
# 注意这里需要使用：空格任意个，来匹配前面多个空格
[mrcode@study ~]$ /sbin/ifconfig enp0s3 | grep 'inet ' | sed 's/^.*inet //g' | sed 's/ *netmask.*$//g'
192.168.0.128

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27

上面例子建议一步一步的来做，下面继续研究 sed 与正则表示法配合练习

# 范例2：只要 MAN 存在的那几行数据，但是含有 # 在内的批注和空白行不要
# 步骤1：先使用 grep 将关键词 MAN 所在行取出来
[mrcode@study ~]$ cat /etc/man_db.conf | grep 'MAN'
# MANDATORY_MANPATH                     manpath_element
# MANPATH_MAP           path_element    manpath_element
# MANDB_MAP             global_manpath  [relative_catpath]
# every automatically generated MANPATH includes these fields
#MANDATORY_MANPATH                      /usr/src/pvm3/man
MANDATORY_MANPATH                       /usr/man
MANDATORY_MANPATH                       /usr/share/man
...省略...
# 步骤2：删除掉批注数据行
[mrcode@study ~]$ cat /etc/man_db.conf | grep 'MAN' | sed 's/^#.*$//g'





MANDATORY_MANPATH                       /usr/man
MANDATORY_MANPATH                       /usr/share/man
MANDATORY_MANPATH                       /usr/local/share/man
# 步骤3：删除空白行
# 注意这里使用了动作里面的 d 命令，前面是正则匹配？
[mrcode@study ~]$ cat /etc/man_db.conf | grep 'MAN' | sed 's/^#.*$//g' | sed '/^$/d'
MANDATORY_MANPATH                       /usr/man
MANDATORY_MANPATH                       /usr/share/man
MANDATORY_MANPATH                       /usr/local/share/man

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28

# 直接修改文件内容（危险动作）

# 范例1：利用 sed 将 /tmp/regular_express.txt 内每一行结尾若为 . 则换成 ！
# 下面还是使用了动作 s 替换，后面的是转义 . 和 !
# 这样可以直接修改文件内容
[mrcode@study tmp]$ sed -i 's/\./\!/g' regular_express.txt 

# 范例2：利用 sed 直接在 /tmp/regular_express.txt 最后一行加入 # This is a test
# $ 表示最后一行
[mrcode@study tmp]$ sed -i '$a # This is a test ' regular_express.txt 
# 想要删除最后一行就简单了
[mrcode@study tmp]$ sed -i '$d' regular_express.txt

1
2
3
4
5
6
7
8
9
10

← 开始之前：上面是正则表达式延伸正则表示法 →

# 基础正则表达式

# 语系对正则表达式的影响

# grep 的一些进阶选项

# 基础正则表达式练习

# 范例 1：搜索特定字符

# 范例 2：利用中括号[]来搜索集合字符

# 范例 3：行首与行位字符 ^、$

# 范例 4：任意一个字符 . 与重复字符 *

# 范例 5：限定连续 正则字符范围 {}

# 基础正则表示法字符汇总

# sed 工具

# 以行为单位的新增/删除功能

# 以行为单位的取代显示功能

# 部分数据的搜索并替换功能

# 直接修改文件内容（危险动作）

# 范例 2：利用中括号`[]`来搜索集合字符

# 范例 3：行首与行位字符 `^、$`

# 范例 4：任意一个字符 `.` 与重复字符 `*`

# 范例 5：限定连续正则字符范围 `{}`