Mastering Linux Server Monitoring Commands in One Article

1. CPU
cat /proc/cpuinfo
# Number of physical CPUs
cat /proc/cpuinfo | grep 'physical id' | sort | uniq | wc -l
# Number of cores per CPU
cat /proc/cpuinfo | grep 'core id' | sort | uniq | wc -l
# Number of logical CPUs
cat /proc/cpuinfo | grep 'processor' | sort | uniq | wc -l
# mpstat
mpstat
mpstat 2 10
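The three pipelines above can be folded into a single pass over the file. The following is a minimal sketch, assuming an x86-style /proc/cpuinfo that contains physical id and core id fields (some virtual machines and ARM systems omit them). cpu_topology is a hypothetical helper name, not a standard tool. It counts cores as distinct (physical id, core id) pairs, so the figure is the total across all sockets rather than the count per socket.

```shell
# cpu_topology [FILE]: summarise sockets, cores and logical CPUs.
# Hypothetical helper; FILE defaults to /proc/cpuinfo.
cpu_topology() {
    awk -F': *' '
        /^physical id/ { sockets[$2] = 1; current = $2 }  # one entry per socket
        /^core id/     { cores[current "/" $2] = 1 }      # one entry per (socket, core)
        /^processor/   { logical++ }                      # one line per logical CPU
        END {
            s = 0; for (k in sockets) s++
            c = 0; for (k in cores) c++
            printf "sockets=%d cores=%d logical=%d\n", s, c, logical
        }
    ' "${1:-/proc/cpuinfo}"
}
```

Called without an argument it reads /proc/cpuinfo; passing a path makes it easy to try on a saved copy of the file.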
2. Memory
cat /proc/meminfo
free -gt
df -hT
du -csh ./*
Operating system IPC shared memory and queues:
ipcs  # (shmems, queues, semaphores)
Memory usage needs to be monitored regularly. Commonly used commands include free, vmstat, top, and dstat -m.
2.1 free
> free -h
             total       used       free     shared    buffers     cached
Mem:          7.7G       6.2G       1.5G        17M        33M       184M
-/+ buffers/cache:       6.0G       1.7G
Swap:          24G       581M        23G
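Meaning of each row
First row, Mem:
total: total memory, 7.7G. This is the physical memory size, that is, the RAM actually installed in the machine
used: memory in use, 6.2G. This value includes buffers and cached as well as the memory actually used by applications
free: free memory, 1.5G, that is, memory not used for anything
shared: size of shared memory, 17M
buffers: memory occupied by buffers, 33M
cached: memory occupied by the cache, 184M
These values satisfy:
total = used + free
Second row, -/+ buffers/cache, shows memory usage from the applications' point of view:
The first value is used - (buffers + cached), the memory actually used by applications
The second value is free + (buffers + cached), the memory that could in theory be made available
As you can see, these two values also add up to total
Third row, Swap, shows swap usage: total, used, and free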
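The arithmetic behind the second row can be checked by hand. The sketch below assumes all inputs are integers in the same unit (MB here); buffers_cache_row is a hypothetical helper name, and the numbers in the usage note are round values chosen to approximate the sample output, not figures taken from a real machine.

```shell
# buffers_cache_row TOTAL USED FREE BUFFERS CACHED
# Recomputes the "-/+ buffers/cache" row of the older free layout.
# Hypothetical helper; all arguments are integers in the same unit.
buffers_cache_row() {
    total=$1 used=$2 free=$3 buffers=$4 cached=$5
    app_used=$((used - buffers - cached))   # memory applications really hold
    app_free=$((free + buffers + cached))   # memory that could be reclaimed
    echo "app_used=$app_used app_free=$app_free sum=$((app_used + app_free)) total=$total"
}
```

For example, buffers_cache_row 7885 6349 1536 33 184 reports app_used=6132 and app_free=1753, which sum to 7885, the same as total.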
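Cache
cache is the file cache. When the system reads a file, the data first has to be read from disk into memory, and because disks are much slower than memory, this step is expensive.
To improve efficiency, Linux keeps the files it has read cached in memory (the principle of locality). The cache is not released automatically even after the program exits, so when a program reads a large number of files, memory usage appears to rise.
When other programs need memory, Linux releases cache that nobody is using according to its caching policy (for example LRU) and hands the memory over. The cache can also be released manually:
echo 1 > /proc/sys/vm/drop_caches  # run as root; 1 drops the page cache only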
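Buffer
Consider writing a file from memory to disk. Because the disk is so slow, making the program wait until the data is fully written before it continues would be very inefficient and would slow the program down. That is why buffers exist.
When a program writes data to disk, the data is first placed in a buffer. Writing to the buffer is fast, so the program can carry on with other work while the buffer is drained to disk in the background, which improves I/O efficiency.
For example, when copying a very large file to a USB drive, the file sometimes appears to have finished copying while the system still reports that the drive is in use. The buffer is the reason: the copy program has placed the data in the buffer, but not all of it has been written to the USB drive yet.
Likewise, the sync command can be used to flush the buffer contents manually:
> sync --help
Usage: sync [OPTION] [FILE]...
Synchronize cached writes to persistent storage

If one or more files are specified, sync only them,
or their containing file systems.

  -d, --data             sync only file data, no unneeded metadata
  -f, --file-system      sync the file systems that contain the files
      --help     display this help and exit
      --version  output version information and exit

GNU coreutils online help:
Full documentation at:
or available locally via: info '(coreutils) sync invocation'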
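Swap
Swap is an important concept in implementing virtual memory. It uses part of the disk as if it were memory. Running programs use physical memory; moving memory that is not in use out to disk is called swap out, and moving memory from the swap area back into physical memory is called swap in.
Swap logically enlarges the memory space, but it also slows the system down because disk reads and writes are slow. Linux moves infrequently used memory to swap.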
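Swap usage can also be read straight from /proc/meminfo. This is a small sketch, assuming the usual SwapTotal and SwapFree fields reported in kB; swap_used_mb is a hypothetical helper name.

```shell
# swap_used_mb [FILE]: swap currently in use, in MB (rounded down).
# Hypothetical helper; FILE defaults to /proc/meminfo.
swap_used_mb() {
    awk '
        $1 == "SwapTotal:" { t = $2 }   # values are in kB
        $1 == "SwapFree:"  { f = $2 }
        END { printf "swap_used_mb=%d\n", (t - f) / 1024 }
    ' "${1:-/proc/meminfo}"
}
```

A non-zero result on its own is not a problem; what matters is whether pages keep moving in and out, which is what the si and so columns of vmstat show.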
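Difference between cache and buffer
cache: memory used as the page cache, which is the file-system cache. Data accessed at the file level is cached in the page cache
buffer: memory used as the buffer cache, which caches disk blocks. Data read or written directly on the disk is cached in the buffer cache
In short: the page cache caches file data, and the buffer cache caches disk data. When files are accessed through a file system, the data is cached in the page cache. When a tool such as dd reads or writes the disk directly, the data is cached in the buffer cache.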
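To watch the two kinds of cache separately, the Buffers and Cached fields of /proc/meminfo can be extracted directly. The sketch below assumes those fields are present and reported in kB; meminfo_mb is a hypothetical helper name.

```shell
# meminfo_mb [FILE]: buffer cache and page cache sizes in MB (rounded down).
# Hypothetical helper; FILE defaults to /proc/meminfo.
meminfo_mb() {
    awk '
        $1 == "Buffers:" { b = $2 }   # buffer cache, kB
        $1 == "Cached:"  { c = $2 }   # page cache, kB (exact match skips SwapCached)
        END { printf "buffers_mb=%d cached_mb=%d\n", b / 1024, c / 1024 }
    ' "${1:-/proc/meminfo}"
}
```

Running it before and after reading a large file through the file system should show cached_mb growing, while a raw read of a block device with dd should mostly affect buffers_mb.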
2.2 vmstat
vmstat (Virtual Memory Statistics) reports on the system as a whole, including statistics on processes, virtual memory, disks, interrupts, and CPU activity:
> vmstat --help
Usage:
 vmstat [options] [delay [count]]

Options:
 -a, --active           active/inactive memory
 -f, --forks            number of forks since boot
 -m, --slabs            slabinfo
 -n, --one-header       do not redisplay header
 -s, --stats            event counter statistics
 -d, --disk             disk statistics
 -D, --disk-sum         summarize disk statistics
 -p, --partition <dev>  partition specific statistics
 -S, --unit <char>      define display unit
 -w, --wide             wide output
 -t, --timestamp        show timestamp

 -h, --help     display this help and exit
 -V, --version  output version information and exit

For more details see vmstat(8).
> vmstat -SM 1 100  # 1 is the refresh interval in seconds, 100 is the number of reports, -SM sets the unit to MB
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0    470    188   1154    0    0     0     4    3    0  0  0 99  0  0
 0  0      0    470    188   1154    0    0     0     0  112  231  1  1 98  0  0
 0  0      0    470    188   1154    0    0     0     0   91  176  0  0 100  0  0
 0  0      0    470    188   1154    0    0     0     0  118  229  1  0 99  0  0
 0  0      0    470    188   1154    0    0     0     0   78  156  0  0 100  0  0
 0  0      0    470    188   1154    0    0     0    64   84  186  0  1 97  2  0
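procs
r column: number of processes running or waiting for a CPU time slice. If this value stays above the number of CPUs for a long time, CPU resources are insufficient and adding CPUs is worth considering
b column: number of processes waiting for a resource, for example I/O or memory swapping
memory
swpd column: amount of memory moved to swap. If swpd is non-zero, or even fairly large, while si and so stay at 0 over the long term, system performance is not affected for the time being
free column: amount of physical memory currently free
buff column: memory used as buffer cache; buffering is generally only needed for reads and writes on block devices
cache column: memory used as page cache, generally the file-system cache; frequently accessed files are cached. A large cache value means many files are cached. If bi under io is small at the same time, the file system is working efficiently
swap
si column: swap in, memory moved from swap back into physical memory
so column: swap out, unused memory moved to the swap area on disk
io
bi column: amount of data read from block devices, that is, disk reads, in KB/s
bo column: amount of data written to block devices, that is, disk writes, in KB/s
A reference value of 1000 is used here for bi + bo. If the sum exceeds 1000 and wa is also high, disk I/O is a bottleneck
system
in column: number of device interrupts per second observed over the interval
cs column: number of context switches per second
The larger these two values are, the more CPU time the kernel consumes
cpu
us column: percentage of CPU time consumed by user processes. A high us value means user processes consume a lot of CPU time; if it stays above 50% for a long time, consider optimizing the program
sy column: percentage of CPU time consumed by the kernel. A high sy value means the kernel consumes a lot of CPU time; if us + sy exceeds 80%, CPU resources are insufficient
id column: percentage of time the CPU is idle
wa column: percentage of CPU time spent in I/O wait. The higher the wa value, the more serious the I/O wait; above 20% it is severe
st column: CPU steal time, relevant to virtual machines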
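The rules of thumb above can be applied mechanically to vmstat output. The following is a rough sketch, assuming the default 17-column layout shown earlier (not the -w or -t variants); vmstat_flags and its label names are hypothetical, and treating "wa is also high" as wa above 20 is an assumption borrowed from the wa threshold.

```shell
# vmstat_flags: read vmstat output on stdin and print one verdict per sample.
# Hypothetical helper; thresholds follow the reference values in this section.
vmstat_flags() {
    awk '
        $1 ~ /^[0-9]+$/ && NF >= 16 {          # skip the two header lines
            msg = ""
            if ($13 + $14 > 80)              msg = msg " cpu-saturated"    # us + sy
            if ($16 > 20)                    msg = msg " io-wait-high"     # wa
            if ($9 + $10 > 1000 && $16 > 20) msg = msg " disk-bottleneck"  # bi + bo and wa
            print (msg == "" ? "ok" : substr(msg, 2))
        }
    '
}
```

A typical invocation would be vmstat 1 5 | vmstat_flags. A single flagged sample means little; the thresholds are meant for values that persist over time.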
3. Network
3.1 Interfaces
ifconfig
iftop
ethtool
3.2 Ports
# Ports
netstat -ntlp # TCP
netstat -nulp # UDP
netstat -nxlp # UNIX
netstat -nalp # show not only listening ports but also connections in other states
lsof -p <pid> -P
lsof -i :5900
sar -n DEV 1  # network traffic
ss
ss -s
3.3 tcpdump
sudo tcpdump -i any -xnnvvv
sudo tcpdump -i any udp -xnnvvv
sudo tcpdump -i any udp port 20112 -xnnvvv
sudo tcpdump -i any 'udp port 20112 and ip[0x1f:02]=0x4e91' -xnnvvv
3.4 nethogs
Monitors network traffic per process
nethogs
4. I/O performance
iotop
iostat
iostat -kx 2
vmstat -SM
vmstat 2 10
dstat
dstat --top-io --top-bio
5. Processes
top
top -H
htop
ps auxf
ps -eLf # show threads
ls /proc/<pid>/task
5.1 top
Take top, the most commonly used command, as an example:
Help for Interactive Commands - procps version 3.2.8
Window 1 Cumulative mode Off.  System: Delay 3.0 secs; Secure mode Off.

  Z,B       Global: 'Z' change color mappings; 'B' disable/enable bold
  l,t,m     Toggle Summaries: 'l' load avg; 't' task/cpu stats; 'm' mem info
  1,I       Toggle SMP view: '1' single/separate states; 'I' Irix/Solaris mode

  f,o     . Fields/Columns: 'f' add or remove; 'o' change display order
  F or O  . Select sort field
  <,>     . Move sort field: '<' next col left; '>' next col right
  R,H     . Toggle: 'R' normal/reverse sort; 'H' show threads
  c,i,S   . Toggle: 'c' cmd name/line; 'i' idle tasks; 'S' cumulative time
  x,y     . Toggle highlights: 'x' sort field; 'y' running tasks
  z,b     . Toggle: 'z' color/mono; 'b' bold/reverse (only if 'x' or 'y')
  u       . Show specific user only
  n or #  . Set maximum tasks displayed

  k,r       Manipulate tasks: 'k' kill; 'r' renice
  d or s    Set update interval
  W         Write configuration file
  q         Quit
          ( commands shown with '.' require a visible task display window )
Press 'h' or '?' for help with Windows,
any other key to continue
1: show usage for each CPU
c: show the full command path of each process
H: show threads
P: sort by CPU usage
M: sort by memory usage
R: reverse the sort order
Z: change color mappings
B: disable/enable bold
l: toggle load avg
t: toggle task/cpu stats
m: toggle mem info
us - time spent in user space
sy - time spent in kernel space
ni - time spent running niced user processes (user-defined priority)
id - time spent idle
wa - time spent waiting on I/O peripherals (e.g. disk)
hi - time spent handling hardware interrupt routines (whenever a peripheral unit wants attention from the CPU, it literally pulls a line to signal the CPU to service it)
si - time spent handling software interrupt routines (a piece of code calls an interrupt routine...)
st - time spent on involuntary waits by the virtual CPU while the hypervisor is servicing another processor (stolen from a virtual machine)
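These percentages come from the cumulative tick counters on the first line of /proc/stat (user, nice, system, idle, iowait, irq, softirq, steal). The sketch below shows one way to turn two samples into percentages; it approximates the idea and is not top's actual implementation, and cpu_breakdown is a hypothetical helper name.

```shell
# cpu_breakdown BEFORE AFTER
# BEFORE and AFTER are the aggregate "cpu ..." line of /proc/stat taken at
# two points in time. Hypothetical helper; prints each state's share of the
# ticks that elapsed between the two samples.
cpu_breakdown() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        NR == 1 { for (i = 2; i <= 9; i++) prev[i] = $i }
        NR == 2 {
            total = 0
            for (i = 2; i <= 9; i++) { d[i] = $i - prev[i]; total += d[i] }
            if (total == 0) { print "no ticks elapsed"; exit }
            # /proc/stat order is user nice system idle iowait irq softirq steal
            printf "us=%.1f sy=%.1f ni=%.1f id=%.1f wa=%.1f hi=%.1f si=%.1f st=%.1f\n",
                100 * d[2] / total, 100 * d[4] / total, 100 * d[3] / total,
                100 * d[5] / total, 100 * d[6] / total, 100 * d[7] / total,
                100 * d[8] / total, 100 * d[9] / total
        }
    '
}
```

To sample a live system, capture a=$(head -n 1 /proc/stat), sleep for a second, capture b the same way, then run cpu_breakdown "$a" "$b".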
5.2 lsof
lsof -P -p 123
6. Performance testing
stress --cpu 8 \
       --io 4 \
       --vm 2 \
       --vm-bytes 128M \
       --timeout 60s
The time command
7. Users
w
whoami
8. System status
uptime
htop
vmstat
mpstat
dstat
9. Hardware devices
lspci
lscpu
lsblk
lsblk -fm # show file systems and permissions
lshw -c display
dmidecode
10. File systems
# Mounting
mount
umount
cat /etc/fstab
# LVM
pvdisplay
pvs
lvdisplay
lvs
vgdisplay
vgs
df -hT
lsof
11. Kernel and interrupts
cat /proc/modules
sysctl -a | grep ...
cat /proc/interrupts
12. System logs and kernel logs
dmesg
less /var/log/messages
less /var/log/secure
less /var/log/auth
13. cron scheduled jobs
crontab -l
crontab -l -u nobody
# View the cron jobs of all users
sudo find /var/spool/cron/ -type f | sudo xargs cat
14. Debugging tools
14.1 perf
14.2 strace
The strace command prints system calls and signals:
strace -p <pid>
strace -p 5191 -f
strace -e trace=signal -p 5191
-e trace=open
-e trace=file
-e trace=process
-e trace=network
-e trace=signal
-e trace=ipc
-e trace=desc
-e trace=memory
14.3 ltrace
The ltrace command prints calls into dynamically linked libraries:
ltrace -p <pid>
ltrace -S # syscall
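15. Scenarios
Scenario 1: right after connecting to a server
w       # show logged-in users, their login IPs, the processes they are running, etc.
last    # see who logged in recently and when the server was rebooted
uptime  # time since boot, number of logged-in users, load averages
history # show command history
Scenario 2: what information is in the /proc directory
cat /proc/...
cgroups
cmdline
cpuinfo
crypto
devices
diskstats
filesystems
iomem
ioports
kallsyms
meminfo
modules
partitions
uptime
version
vmstat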
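Most of these files are plain text and easy to post-process. As one illustration, the first field of /proc/uptime is the number of seconds since boot; the sketch below converts it to days, hours, and minutes. uptime_human is a hypothetical helper name.

```shell
# uptime_human [FILE]: print time since boot as days, hours and minutes.
# Hypothetical helper; FILE defaults to /proc/uptime, whose first field
# is the uptime in seconds.
uptime_human() {
    awk '{
        s = int($1)
        printf "%dd %02dh %02dm\n",
            int(s / 86400), int((s % 86400) / 3600), int((s % 3600) / 60)
    }' "${1:-/proc/uptime}"
}
```

The same pattern of an awk filter with a default path works for the other files in the list, such as meminfo or diskstats.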
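Scenario 3: running a command in the background
nohup <command> &>[some.log] &
Some commands
# General
top
htop
glances
dstat & sar
mpstat
# Performance analysis
perf
# Processes
ps
pstree -p
pgrep
pkill
pidof
ctrl+z & jobs & fg
# Network
ip
ifconfig
dig
ping
traceroute
iftop
pingtop
nload
netstat
vnstat
slurm
scp
tcpdump
# Disk I/O
iotop
iostat
# Virtual machines
virt-top
# Users
w
whoami
# Uptime
uptime
# Disk
du
df
lsblk
# Permissions
chown
chmod
# Services
systemctl list-unit-files
# Locating files
find
locate
# Performance testing
time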
