Linux

Handling a few disk and filesystem problems

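Linux

One of the disks developed errors. The partition mounted from it could be entered with cd, but ls could not list the files in the directory. The error was:

cannot list ......:Bad message

I had never run into this before and was at a loss. Perhaps it was related to an earlier dd operation? The material I found suggested a corrupted inode, so I tried to clean up the inode. The suggested fix was:

First, list the bad file with its inode, e.g.:
$ ls -il
Output:
14071947 -rw-r--r-- 1 dba 0 2010-01-27 15:49 -®Å
Note: 14071947 is the inode number. Now use the find command to delete the file by inode:
$ find . -inum 14071947 -exec rm -f {} \;
It will find the bad file and remove it forcibly, i.e. without prompting.

It did not work, though. Here the directory could not list any files at all, which is a different problem from the one the original poster had (a file that could not be removed with rm). ...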
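The quoted fix boils down to two steps: look up the inode, then let find delete whatever entry has that inode. Below is a minimal sketch that assumes bash and GNU find. The helper names `inode_of` and `rm_by_inode` are hypothetical. It adds `-xdev` because inode numbers are only unique within a single filesystem. It presumably would not have rescued the case above either, since find has to read the same unreadable directory.

```bash
#!/usr/bin/env bash
# Sketch: delete a file by inode number, for names that are awkward to type
# (such as the garbled "-®Å" above). Assumes bash and GNU find.

# Print the inode number of a path: the first field of `ls -i`.
# -d makes a directory report its own inode instead of listing its contents.
inode_of() {
  ls -id -- "$1" | awk '{print $1}'
}

# Hypothetical helper: remove every entry under DIR whose inode is INODE.
# -xdev keeps find on DIR's filesystem, because inode numbers are only
# unique within one filesystem.
rm_by_inode() {
  local dir=$1 inode=$2
  find "$dir" -xdev -inum "$inode" -exec rm -f -- {} +
}

# Example:
#   rm_by_inode . 14071947
```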

Using an in-memory filesystem

Linux Ubuntu

Memory is fast. A chunk of RAM can therefore be carved out as storage and mounted at a specific mount point, which makes a fast cache.

tmpfs is a virtual-memory filesystem. Its contents live in VM (virtual memory), which is managed by the VM subsystem of the Linux kernel. Most modern operating systems manage memory this way, using virtual memory backed by the MMU.

$ mount -t tmpfs -o size=1024m tmpfs /mnt

Pros:
- The size limit can be set freely.
- Actual memory usage grows and shrinks with the amount of data stored.
- If size is not specified, the limit defaults to half of physical memory.
- Reads and writes are extremely fast.

Cons:
- The contents are lost on power-off.

To mount it at boot, append an entry to /etc/fstab. echo already adds the trailing newline:

$ echo "tmpfs /mnt tmpfs size=1024m 0 0" >> /etc/fstab

Now that the filesystem exists, let's benchmark it. The fio job file:

[global]
ioengine=libaio
direct=0
thread=1
norandommap=1
randrepeat=0
runtime=60
ramp_time=6
size=1g
directory=/path
numjobs=16
iodepth=128

[read4k-rand]
stonewall
group_reporting
bs=4k
rw=randread

[read64k-seq]
stonewall
group_reporting
bs=64k
rw=read

[write4k-rand]
stonewall
group_reporting
bs=4k
rw=randwrite

[write64k-seq]
stonewall
group_reporting
bs=64k
rw=write

Results:

Jobs: 16 (f=0): [_(48),/(16)][-.-%][r=0KiB/s,w=13.0GiB/s][r=0,w=3411k IOPS][eta 01m:05s]
read4k-rand: (groupid=0, jobs=16): err= 0: pid=13736: Mon Jun 8 15:25:25 2020
  read: IOPS=3534k, BW=13.5GiB/s (14.5GB/s)(16.0GiB/1187msec)
    clat percentiles (nsec):
     | 1.00th=[ 0], 5.00th=[ 0], 10.00th=[ 0], 20.00th=[ 0],
     | 30.00th=[ 0], 40.00th=[ 0], 50.00th=[ 0], 60.00th=[ 0],
     | 70.00th=[ 0], 80.00th=[ 0], 90.00th=[ 0], 95.00th=[ 0],
     | 99.00th=[ 0], 99.50th=[ 0], 99.90th=[ 0], 99.95th=[ 0],
     | 99.99th=[ 0]
  cpu : usr=14.95%, sys=84.75%, ctx=1711, majf=0, minf=2049
  IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwt: total=4194304,0,0, short=0,0,0, dropped=0,0,0
     latency : target=0, window=0, percentile=100.00%, depth=128
read64k-seq: (groupid=1, jobs=16): err= 0: pid=13752: Mon Jun 8 15:25:25 2020
  read: IOPS=267k, BW=16.3GiB/s (17.5GB/s)(16.0GiB/981msec)
    clat percentiles (nsec):
     | 1.00th=[ 0], 5.00th=[ 0], 10.00th=[ 0], 20.00th=[ 0],
     | 30.00th=[ 0], 40.00th=[ 0], 50.00th=[ 0], 60.00th=[ 0],
     | 70.00th=[ 0], 80.00th=[ 0], 90.00th=[ 0], 95.00th=[ 0],
     | 99.00th=[ 0], 99.50th=[ 0], 99.90th=[ 0], 99.95th=[ 0],
     | 99.99th=[ 0]
  cpu : usr=1.14%, sys=98.40%, ctx=1344, majf=0, minf=32784
  IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.2%, >=64=99.6%
     submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwt: total=262144,0,0, short=0,0,0, dropped=0,0,0
     latency : target=0, window=0, percentile=100.00%, depth=128
write4k-rand: (groupid=2, jobs=16): err= 0: pid=13768: Mon Jun 8 15:25:25 2020
  write: IOPS=1572k, BW=6141MiB/s (6439MB/s)(16.0GiB/2668msec)
    clat percentiles (nsec):
     | 1.00th=[ 0], 5.00th=[ 0], 10.00th=[ 0], 20.00th=[ 0],
     | 30.00th=[ 0], 40.00th=[ 0], 50.00th=[ 0], 60.00th=[ 0],
     | 70.00th=[ 0], 80.00th=[ 0], 90.00th=[ 0], 95.00th=[ 0],
     | 99.00th=[ 0], 99.50th=[ 0], 99.90th=[ 0], 99.95th=[ 0],
     | 99.99th=[ 0]
  cpu : usr=11.57%, sys=88.18%, ctx=3745, majf=0, minf=2424
  IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwt: total=0,4194304,0, short=0,0,0, dropped=0,0,0
     latency : target=0, window=0, percentile=100.00%, depth=128
write64k-seq: (groupid=3, jobs=16): err= 0: pid=13784: Mon Jun 8 15:25:25 2020
  write: IOPS=254k, BW=15.5GiB/s (16.6GB/s)(16.0GiB/1033msec)
    clat percentiles (nsec):
     | 1.00th=[ 0], 5.00th=[ 0], 10.00th=[ 0], 20.00th=[ 0],
     | 30.00th=[ 0], 40.00th=[ 0], 50.00th=[ 0], 60.00th=[ 0],
     | 70.00th=[ 0], 80.00th=[ 0], 90.00th=[ 0], 95.00th=[ 0],
     | 99.00th=[ 0], 99.50th=[ 0], 99.90th=[ 0], 99.95th=[ 0],
     | 99.99th=[ 0]
  cpu : usr=14.67%, sys=84.97%, ctx=1430, majf=0, minf=16
  IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.2%, >=64=99.6%
     submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwt: total=0,262144,0, short=0,0,0, dropped=0,0,0
     latency : target=0, window=0, percentile=100.00%, depth=128

Run status group 0 (all jobs):
   READ: bw=13.5GiB/s (14.5GB/s), 13.5GiB/s-13.5GiB/s (14.5GB/s-14.5GB/s), io=16.0GiB (17.2GB), run=1187-1187msec
Run status group 1 (all jobs):
   READ: bw=16.3GiB/s (17.5GB/s), 16.3GiB/s-16.3GiB/s (17.5GB/s-17.5GB/s), io=16.0GiB (17.2GB), run=981-981msec
Run status group 2 (all jobs):
  WRITE: bw=6141MiB/s (6439MB/s), 6141MiB/s-6141MiB/s (6439MB/s-6439MB/s), io=16.0GiB (17.2GB), run=2668-2668msec
Run status group 3 (all jobs):
  WRITE: bw=15.5GiB/s (16.6GB/s), 15.5GiB/s-15.5GiB/s (16.6GB/s-16.6GB/s), io=16.0GiB (17.2GB), run=1033-1033msec

I/O bandwidth is very high. 4k random reads reached 3534k IOPS, and even 4k random writes reached an impressive 1572k IOPS. Because time_based is not set, each job stops once it has transferred its 1 GiB (16 jobs x 1 GiB = the 16.0 GiB per group above). Every group therefore finished in 1 to 3 seconds instead of running for the full 60-second runtime.
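A quick way to sanity-check these numbers, and to tell which job produced a given IOPS figure, is the identity bandwidth = IOPS x block size. The sketch below assumes bash and awk; the function names are made up for illustration. Plugging in 1572k x 4 KiB gives about 6141 MiB/s, which matches the write4k-rand line. A 64k job at that rate would need roughly 96 GiB/s.

```bash
#!/usr/bin/env bash
# Cross-check a fio result: bandwidth = IOPS x block size.
# Hypothetical helpers; 1 MiB = 1024 KiB.

# IOPS (plain number) and block size in KiB -> bandwidth in MiB/s, rounded.
bw_mib_per_s() {
  awk -v iops="$1" -v bs_kib="$2" 'BEGIN { printf "%.0f\n", iops * bs_kib / 1024 }'
}

# Inverse: bandwidth in MiB/s and block size in KiB -> implied IOPS, rounded.
iops_from_bw() {
  awk -v bw="$1" -v bs_kib="$2" 'BEGIN { printf "%.0f\n", bw * 1024 / bs_kib }'
}

# The two 4k groups from the run above (fio printed 6141MiB/s and 13.5GiB/s):
echo "write4k-rand: $(bw_mib_per_s 1572000 4) MiB/s"
echo "read4k-rand:  $(bw_mib_per_s 3534000 4) MiB/s"
```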

Quick hardware RAID configuration

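Linux RAID

1. View and clear any foreign configuration:

/opt/MegaRAID/storcli/storcli64 /c0/fall show
/opt/MegaRAID/storcli/storcli64 /c0/fall del

2. List the physical disks to determine how many there are and their eid and sid. These are usually contiguous.

/opt/MegaRAID/storcli/storcli64 /c0/eall/sall show

3. Configuring RAID generally requires the enclosure ID (eid) and slot ID (sid). Here the eid is 8 and the sids are 0-9, ten disks in total. Adjust these values to match what step 2 showed. In the command, AWB is the always-write-back write policy, ra enables read-ahead, direct selects direct I/O, and strip is the strip size in KB.

/opt/MegaRAID/storcli/storcli64 /c0 add vd r5 size=all name=lotus drives=8:0-9 AWB ra direct strip=256

The above is a typical RAID creation. Our storage machines have 36 disks, so the first 18 disks form one RAID 5, the next 17 disks form another RAID 5, and the last disk is a global hot spare. Taking 172.16.10.11 as an example:

sudo /opt/MegaRAID/storcli/storcli64 /c0 add vd r5 size=all name=lotus drives=25:1-18 AWB ra direct strip=1024
sudo /opt/MegaRAID/storcli/storcli64 /c0 add vd r5 size=all name=lotus-2 drives=20:1-23 AWB ra direct strip=1024
sudo /opt/MegaRAID/storcli/storcli64 /c0/e26/s12 add hotsparedrive

Determine the eid and slot IDs for your actual setup and create the RAID volumes. As the last step, designate a global hot spare. ...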
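Counting the drives in a `drives=EID:FIRST-LAST` range is an easy place to slip, since both ends are inclusive (0-9 is ten drives). A small planning sketch, assuming bash: the helpers are hypothetical, and the 4 TB disk size is only an example value, not taken from the post. RAID 5 usable capacity is (n - 1) x disk size, because one drive's worth of space holds parity.

```bash
#!/usr/bin/env bash
# Planning helpers for storcli drive ranges (hypothetical, integer sizes only).

# Number of drives in an inclusive slot range such as "0-9".
# A single slot such as "12" counts as one drive.
count_slots() {
  local first=${1%-*} last=${1#*-}
  echo $(( last - first + 1 ))
}

# Usable capacity of a RAID 5 volume: one drive's worth goes to parity.
raid5_usable() {
  local drives=$1 size_tb=$2
  echo $(( (drives - 1) * size_tb ))
}

echo "drives=8:0-9 covers $(count_slots 0-9) drives"
echo "18 x 4 TB in RAID 5 gives $(raid5_usable 18 4) TB usable"
```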

Setting up BlueKing (蓝鲸智云) 5.1.29

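Monitoring

The official installation guide was last updated on 2020-01-20, but the 5.1.29 package was released on 2020-09-27, so the installation has some pitfalls. This post records the whole process.

Download the package and verify its MD5 checksum:

axel -n 10 --output=/root/bkce_src-6.0.0.tgz https://bkopen-1252002024.file.myqcloud.com/ce/bkce_src-6.0.0.tgz
md5sum /root/bkce_src-6.0.0.tgz

Operating system configuration

1. Update the yum repositories

# base repositories (CentOS 7)
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.cloud.tencent.com/repo/centos7_base.repo
yum clean all
yum makecache
# CentOS 7 EPEL
mv /etc/yum.repos.d/epel.repo /etc/yum.repos.d/epel.repo.backup
wget -O /etc/yum.repos.d/epel.repo http://mirrors.cloud.tencent.com/repo/epel-7.repo
yum clean all
yum makecache

2. Disable SELinux

# Check SELinux status; if it is already disabled, skip the commands below
sestatus
# Disable SELinux temporarily
setenforce 0
# Or disable it in the config file (takes effect after the reboot)
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
reboot

3. Disable the default firewall (firewalld)

# Check the firewall state; if it returns "not running", skip the commands below
firewall-cmd --state
systemctl stop firewalld     # stop firewalld
systemctl disable firewalld  # do not start firewalld at boot

4. Install rsync

# Check whether rsync exists; if a path is printed, skip the command below
which rsync
# Install rsync
yum -y install rsync

5. Stop and disable NetworkManager

# Check NetworkManager status
systemctl status NetworkManager
# Stop and disable NetworkManager
systemctl stop NetworkManager
systemctl disable NetworkManager

6. Raise the maximum number of open files

# Check the current max open files value for root
ulimit -n
# Back up the existing config file
cp /etc/security/limits.conf /etc/security/limits.conf.bak
# Append the new limits
cat << EOF >> /etc/security/limits.conf
root soft nofile 102400
root hard nofile 102400
EOF

7. Make sure the server clocks are synchronized

# Check that all machines show the same time and time zone. If they differ by more than 3 s (allowing for the lag of batch execution), synchronize them.
date -R
# Compare with an NTP server (requires Internet access; if there is an ntpd server on the internal network, replace the hostname with its address)
ntpdate -d cn.pool.ntp.org
If the offset on the last line of the output exceeds 1 s, synchronize the clock:
# One-off sync with the NTP server
ntpdate cn.pool.ntp.org
# For continuous synchronization with ntpd, see http://xstarcd.github.io/wiki/sysadmin/ntpd.html

8. Check that resolv.conf can be modified

Check whether /etc/resolv.conf has the immutable attribute set, which makes it impossible to modify (even for root). Run the following command and look for the "i" attribute: ...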
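Two of the checks above are "compare and decide" steps: the MD5 against the value on the download page, and the NTP offset against the 1 s threshold. A sketch of both, assuming bash, GNU coreutils md5sum and awk. The function names are hypothetical. The offset parser also assumes the relevant line contains the word `offset` followed by the value in seconds.

```bash
#!/usr/bin/env bash
# Pre-install checks (hypothetical helpers; assumes bash, md5sum, awk).

# Compare a file's MD5 with the expected value from the download page.
# Returns 0 on a match, non-zero otherwise.
verify_md5() {
  local file=$1 expected=$2 actual
  actual=$(md5sum -- "$file" | awk '{print $1}')
  [ "$actual" = "$expected" ]
}

# Print the number that follows the word "offset" in one line of
# `ntpdate -d` output (assumed format: "... offset <seconds> sec").
offset_from_line() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "offset") { print $(i + 1); exit } }' <<< "$1"
}

# Return 0 when |offset| exceeds the threshold (default 1 s), i.e. when
# the clock should be resynchronized.
needs_resync() {
  local offset=$1 threshold=${2:-1}
  awk -v o="$offset" -v t="$threshold" \
    'BEGIN { o = o + 0; t = t + 0; if (o < 0) o = -o; exit !(o > t) }'
}

# Possible use:
#   verify_md5 /root/bkce_src-6.0.0.tgz "<md5 from the download page>" && echo OK
#   line=$(ntpdate -d cn.pool.ntp.org 2>&1 | tail -n 1)
#   needs_resync "$(offset_from_line "$line")" 1 && ntpdate cn.pool.ntp.org
```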

Installing BlueKing (蓝鲸智云) v6.0.3

Monitoring

Environment checks and hardware configuration reference:

Environment preparation script:

curl -sSL http://172.16.0.219:8080/directlink/2/sh/sudo-2-firefly.sh | bash

fdisk /dev/sda
# inside fdisk, enter: n (new partition), t then 31 (change the partition type), w (write and exit)
pvcreate /dev/sda1
vgcreate -s 32M data /dev/sda1
lvcreate -L 300G -n data00 data
mkfs.ext4 /dev/data/data00
blkid | grep data
# Add the filesystem to fstab using the UUID printed by blkid (the UUID below is from the author's machine)
echo 'UUID="26130f2b-ceb7-40d4-b1d9-8e2712735c55" /data ext4 defaults 0 0' >> /etc/fstab
mkdir /data
mount -a
df -h

wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.cloud.tencent.com/repo/centos7_base.repo
yum clean all
yum makecache
wget -O /etc/yum.repos.d/epel.repo http://mirrors.cloud.tencent.com/repo/epel-7.repo
yum clean all
yum makecache
systemctl disable firewalld
yum -y install rsync pssh
cat >> /etc/security/limits.conf << EOF
root soft nofile 102400
root hard nofile 102400
EOF
ntpdate cn.pool.ntp.org
hostnamectl set-hostname tencent-bk1
timedatectl set-timezone Asia/Shanghai
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
reboot

Get the installation package and the certificate file: download page, certificate download page. The certificate requires the MAC address of the first NIC on each of the three nodes, and downloading it requires a QQ login. ...
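The fstab line above hard-codes a UUID from one specific machine. One way to avoid pasting it by hand is to build the entry from the blkid output. The sketch below assumes bash and sed. `fstab_entry` is a hypothetical helper, and it assumes blkid prints `UUID="..."` and `TYPE="..."` tags on the line. The mount options (`defaults 0 0`) are copied from the post.

```bash
#!/usr/bin/env bash
# Build an /etc/fstab entry from one line of blkid output (hypothetical helper).

# $1: blkid line, e.g.
#     /dev/mapper/data-data00: UUID="..." BLOCK_SIZE="4096" TYPE="ext4"
# $2: mount point
# Prints "UUID=<uuid> <mount point> <fstype> defaults 0 0".
fstab_entry() {
  local line=$1 mnt=$2 uuid fstype
  # The leading space keeps PARTUUID= / PTTYPE= / SEC_TYPE= from matching.
  uuid=$(sed -n 's/.* UUID="\([^"]*\)".*/\1/p' <<< "$line")
  fstype=$(sed -n 's/.* TYPE="\([^"]*\)".*/\1/p' <<< "$line")
  if [ -z "$uuid" ] || [ -z "$fstype" ]; then
    echo "no UUID/TYPE found in: $line" >&2
    return 1
  fi
  printf 'UUID=%s %s %s defaults 0 0\n' "$uuid" "$mnt" "$fstype"
}

# Possible use on the target machine (as root, after checking the output):
#   fstab_entry "$(blkid /dev/data/data00)" /data >> /etc/fstab
```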

[Translation] Killing a process and all of its descendants

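Linux

Killing a process in a Unix-like system can be trickier than expected. Last week I was debugging an odd issue related to stopping jobs on Semaphore (the CI service). More specifically, the issue involved killing a running process in a job. Here are the highlights of what I learned:

Unix-like operating systems have sophisticated process relationships: parent and child processes, process groups, sessions, and session leaders. However, the details are not uniform across operating systems such as Linux and macOS.
POSIX-compliant operating systems support sending a signal to a process group by using a negative PID.
Sending a signal to every process in a session is not straightforward with system calls.
Child processes started with exec inherit their parent's signal configuration. For example, a signal that the parent ignores stays ignored in the child.

Killing a parent doesn't kill its children

Every process has a parent. We can observe this with the pstree or ps programs.

# start two dummy processes
$ sleep 100 &
$ sleep 101 &

$ pstree -p
init(1)-+
        |-bash(29051)-+-pstree(29251)
                      |-sleep(28919)
                      `-sleep(28964)

$ ps j -A
 PPID   PID  PGID   SID TTY   TPGID STAT  UID  TIME COMMAND
    0     1     1     1 ?        -1 Ss      0  0:03 /sbin/init
29051  1470  1470 29051 pts/2  2386 SN   1000  0:00 sleep 100
29051  1538  1538 29051 pts/2  2386 SN   1000  0:00 sleep 101
29051  2386  2386 29051 pts/2  2386 R+   1000  0:00 ps j -A
    1 29051 29051 29051 pts/2  2386 Ss   1000  0:00 -bash

The ps command shows the PID (process ID) and the PPID (parent process ID). ...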
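The two claims above can be seen side by side: killing only the parent leaves its children running, while a negative PID reaches the whole process group. The sketch below assumes Linux, bash and GNU sleep. `start_family`, `kill_parent_only` and `kill_group` are hypothetical names. `set -m` is used so that each background job gets its own process group whose PGID equals `$!`.

```bash
#!/usr/bin/env bash
# Killing a parent vs. killing its whole process group (sketch).
set -m  # job control on: each background job gets its own process group, PGID == $!

# Hypothetical helper: start a bash parent with two children. Each child
# appends a line to the marker file $1 after 2 s, unless it is killed first.
# The PGID is stored in FAMILY_PGID rather than printed, so no $(...)
# subshell is needed and job control stays in effect.
start_family() {
  bash -c '(sleep 2; echo child >> "$1") & (sleep 2; echo child >> "$1") & wait' _ "$1" &
  FAMILY_PGID=$!
}

# Signal only the parent: its children are re-parented and keep running.
kill_parent_only() {
  kill -TERM "$1"
}

# Signal every member of the group: kill(2) treats a negative PID as a PGID.
kill_group() {
  kill -TERM -- "-$1"
}

# Example:
#   a=$(mktemp); b=$(mktemp)
#   start_family "$a"; p1=$FAMILY_PGID
#   start_family "$b"; p2=$FAMILY_PGID
#   sleep 0.5                      # let both parents fork their children
#   kill_parent_only "$p1"; kill_group "$p2"
#   wait "$p1" "$p2" 2>/dev/null   # reap the two parents
#   sleep 3
#   wc -l < "$a"                   # lines written by the orphaned children
#   wc -l < "$b"                   # the group kill took the children too
```

With these timings, the first marker file should end up with one line per surviving child and the second should stay empty.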

Simple software RAID configuration

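Linux RAID

Configuring software RAID takes the following steps: stop existing md devices, partition the disks, create the software RAID virtual disk, format it, and mount it.

Stop existing md devices

Run ls -al /dev/md* and look for any mdxxx entries. If there are any, clear them first.

all_disk=$(fdisk -l | grep '1.8 T' | awk '{print $2}' | cut -d":" -f1)
part_disk=$(fdisk -l | grep 'Linux raid autodetect' | awk '{print $1}')
umount /dev/md/cache
md_list=$(ls -al /dev/md/* | awk '{print $9}')
for i in $md_list
do
    mdadm -S $i
done
mdadm --misc --zero-superblock $part_disk
rm -f /etc/mdadm.conf
rm -f /etc/mdadm/mdadm.conf

Partition the disks

Use fdisk -l to check that every NVMe disk has one partition and that its type is Linux raid autodetect. If not, configure it:

fdisk /dev/nvme?n1
# What to do here depends on your needs. Typically n creates a partition, t followed by type code fd sets the type to Linux raid autodetect, and w writes the changes.

Create the software RAID virtual disk

This step is easy if the previous ones were done correctly.

blkid $part_disk
/sbin/mdadm --create cache --auto yes --level 0 -n$(echo $part_disk | wc -w) $part_disk

The question marks in the commands stand for digits: the number after -n is the total number of disks, and the digit after nvme is the drive number. ...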
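The script derives both the member list and the `-n` count from `fdisk -l` instead of typing digits by hand. The same idea can be split into two small functions so that the command can be printed and checked before it is run. A sketch, assuming bash and awk: the helper names are hypothetical. The parser assumes fdisk's MBR listing, where the device name is the first field of each partition line.

```bash
#!/usr/bin/env bash
# Dry-run helpers for the RAID 0 creation step (hypothetical names).

# stdin: `fdisk -l` output. Prints partitions whose type is
# "Linux raid autodetect" (MBR type code fd), one per line.
raid_partitions() {
  awk '/Linux raid autodetect/ { print $1 }'
}

# Print (do not run) the mdadm command used above for the given array
# name and member partitions; -n is simply the number of members.
mdadm_create_cmd() {
  local name=$1
  shift
  echo "/sbin/mdadm --create $name --auto yes --level 0 -n$# $*"
}

# Possible use (as root, after checking the printed command):
#   parts=$(fdisk -l | raid_partitions)
#   mdadm_create_cmd cache $parts
```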

Mounting an NFS disk on Alibaba Cloud

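Linux [[Alibaba Cloud]]

Once you have bought a service on Alibaba Cloud, a developer can solve most problems by following the documentation. The ready-made commands even have the parameters filled in for you, so you just copy and paste. Because the product is complex, however, the instructions are spread over several separate documents that lack consistency.

Install components and tune

$ sudo yum install nfs-utils
$ echo "options sunrpc tcp_slot_table_entries=128" | sudo tee -a /etc/modprobe.d/sunrpc.conf
$ echo "options sunrpc tcp_max_slot_table_entries=128" | sudo tee -a /etc/modprobe.d/sunrpc.conf
$ sudo reboot
$ cat /proc/sys/sunrpc/tcp_slot_table_entries

Mount the NFS disk

First purchase a NAS file system. Once it is bought, you can think of it as a disk whose root directory has a very large capacity. It can be mounted into the ECS instance as an ordinary directory with mount. Open the NAS file system console and click into the resource. The 挂载使用 (Mount Usage) section gives a "command with parameters already substituted"; simply copy it to the ECS host and run it.

$ sudo mount -t nfs -o vers=4,minorversion=0,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport file-system-id.region.nas.aliyuncs.com:/ /mnt

Parameters:
file-system-id.region.nas.aliyuncs.com:/ /mnt: <mount target address>:<NAS file system directory> <local path on this server to mount at>.
vers: the NFS protocol version. Currently only NFSv3 and NFSv4 are supported.

Several mount options are available when mounting the file system; see the options below for details.
- If you must change the I/O size parameters (rsize and wsize), use the largest possible value (1048576) to avoid performance degradation.
- If you must change the timeout parameter (timeo), use 150 or larger. timeo is measured in tenths of a second, so 150 means 15 seconds.
- The soft option is not recommended because it risks data inconsistency. If you use it, you bear that risk yourself.
- Avoid setting any other mount options that differ from the defaults. Changing the read or write buffer sizes or disabling attribute caching degrades performance.

Options:
rsize: the size of the data blocks read between your client and the file system in the cloud. Recommended: 1048576.
wsize: the size of the data blocks written between your client and the file system in the cloud. Recommended: 1048576.
hard: if the NAS is temporarily unavailable, local applications using a file on the file system stop and wait until the file system is back online. Recommended.
timeo: how long (in tenths of a second) the NFS client waits for a response before retrying a request to the file system in the cloud. Recommended: 600 (60 seconds).
retrans: how many times the NFS client retries a request. Recommended: 2.
noresvport: use a new TCP port when reconnecting to the network, so the connection is not interrupted when the network recovers from a failure. Recommended.

Check the mount:

$ df -h | grep aliyun

Troubleshooting

Some problems may prevent the mount from succeeding. Alibaba Cloud provides a Python script to diagnose them. ...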
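The timeo unit (tenths of a second) is the detail most easily misread in the option list. Keeping the recommended options in one place also avoids typos when the command is retyped on several ECS hosts. A sketch, assuming bash and awk. The helper names are hypothetical. The mount target below is the placeholder from the post, not a real address.

```bash
#!/usr/bin/env bash
# Helpers for the NAS mount options above (hypothetical names).

# timeo is given in tenths of a second: 600 -> 60 s, 150 -> 15 s.
timeo_to_seconds() {
  awk -v t="$1" 'BEGIN { printf "%g\n", t / 10 }'
}

# Print (do not run) the mount command from the post for a mount target
# ($1, e.g. file-system-id.region.nas.aliyuncs.com) and a local directory ($2).
nas_mount_cmd() {
  local target=$1 mnt=$2
  local opts="vers=4,minorversion=0,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport"
  echo "sudo mount -t nfs -o $opts $target:/ $mnt"
}

echo "timeo=600 means $(timeo_to_seconds 600) s"
nas_mount_cmd file-system-id.region.nas.aliyuncs.com /mnt
```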