Basic usage of the Slurm cluster resource manager

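I. Base environment
1. Hostnames and IPs
Control node: 192.168.8.150 m1
Compute node: 192.168.8.145 c1
Compute node: 192.168.8.144 c2
Set the hostname on each of the three nodes (run each command on its own node):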
# hostnamectl set-hostname m1
# hostnamectl set-hostname c1
# hostnamectl set-hostname c2
2. Host configuration
OS: CentOS 7.6 x86_64
192.168.8.145: disk 234G, CPU 2 cores, memory 15G
192.168.8.150: disk 234G, CPU 2 cores, memory 15G
3. Disable the firewall
# systemctl stop firewalld
# systemctl disable firewalld
# systemctl stop iptables
# systemctl disable iptables
4. Adjust resource limits
# cat /etc/security/limits.conf
* hard nofile 1000000
* soft nofile 1000000
* soft core unlimited
* soft stack 10240
* soft memlock unlimited
* hard memlock unlimited
// Linux notes: limits.conf configuration
vi /etc/security/limits.conf
* soft nofile 655360   # open files (-n); do not set to unlimited
* hard nofile 655360   # must not exceed the maximum of 1048576; do not set to unlimited
* soft nproc 655650
* hard nproc 655650   # max user processes (-u)
hive - nofile 655650
hive - nproc 655650
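The comments above warn against setting nofile to unlimited or to a value above 1048576. A minimal sketch of a checker for that rule follows. The function name check_nofile_limits and its optional ceiling argument are illustrative assumptions, not part of any standard tool.

```shell
#!/bin/bash
# Hypothetical helper (not a standard tool): report nofile entries in a
# limits.conf-style file that are unlimited or exceed a ceiling.
# The default ceiling is 1048576, the maximum quoted in the text above.
check_nofile_limits() {
    local file="$1" ceiling="${2:-1048576}"
    awk -v max="$ceiling" '
        /^[ \t]*(#|$)/ { next }                  # skip comments and blank lines
        $3 == "nofile" {
            # -1, unlimited and infinity all mean "no limit" in limits.conf
            if ($4 == "unlimited" || $4 == "infinity" || $4 == "-1" || $4 + 0 > max) {
                printf "BAD: %s\n", $0
                bad = 1
            }
        }
        END { exit (bad ? 1 : 0) }               # non-zero when a bad entry was found
    ' "$file"
}
```

Calling check_nofile_limits /etc/security/limits.conf prints each offending line and returns a non-zero status if any is found. After logging in again, ulimit -n shows the limit that is actually in effect.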
5. Configure the time zone
Set the time zone to CST (China Standard Time)
# ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
Synchronize time with an NTP server
# yum install ntp -y
# systemctl start ntpd
# systemctl enable ntpd
Install the EPEL repository
# yum install
6. Install NFS (control node)
# yum -y install nfs-utils rpcbind
Create the exports file
# cat /etc/exports
Edit the file with vim /etc/exports so that it contains the following line:
/software/ *(rw,async,insecure,no_root_squash)
Check the status
# systemctl status nfs
Start NFS:
# systemctl start nfs
# systemctl start rpcbind
# systemctl enable nfs
# systemctl enable rpcbind
Mount the NFS share on the clients
On the first client:
# yum -y install nfs-utils
# mkdir /software
# mount 192.168.8.150:/software /software
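On the second client:
$ mount -t nfs 192.168.8.150:/software /software
The general form is mount -t nfs <server IP>:<exported directory> <local mount point>. Once mounted, working in the local directory is the same as working directly in the server's exported directory, and changes made in the exported directory on the server are visible on the client as well.
7. Configure passwordless SSH login: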
# ssh-keygen
# ssh-copy-id -i .ssh/id_rsa.pub c1
# ssh-copy-id -i .ssh/id_rsa.pub c2
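// Linux: passwordless SSH and the basic usage of "ssh-keygen"
1. ssh-keygen creates a public/private key pair
2. ssh-copy-id sends A's public key to B
3. Server A can then log in to server B without a password
Step 1: generate the key pair on the local machine:
$ ssh-keygen
Step 2: copy the public key to the remote machine with ssh-copy-id:
$ ssh-copy-id -i .ssh/id_rsa.pub root@192.168.8.150
Note: ssh-copy-id appends the key to ~/.ssh/authorized_keys on the remote machine.
Step 3: log in to the remote machine without entering a password:
$ ssh username@192.168.x.xxx
// Error while setting up passwordless SSH: /usr/bin/ssh-copy-id: ERROR: failed to open ID file '.ssh/id_rsa.pub': No such file or directory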

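The relative path .ssh/id_rsa.pub is resolved against the current directory, so the command fails when it is not run from the home directory. First log in to 180.8.5.101 and carry out the following three steps.
Step 1: run ssh-keygen to generate the key pair (it is stored under /root/.ssh by default)
ssh-keygen
Press Enter at every prompt to accept the defaults.
Step 2: copy the public key to the remote machine with ssh-copy-id
ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.8.145
Note: ssh-copy-id appends the key to ~/.ssh/authorized_keys on the remote machine.
Step 3: log in to the remote machine without entering a password
ssh root@192.168.8.145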
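With several compute nodes, the ssh-copy-id step is repeated once per host. The loop below is a minimal sketch of that repetition. The function name distribute_key and the DRY_RUN variable are assumptions introduced for illustration. By default it only prints the commands it would run.

```shell
#!/bin/bash
# Hypothetical helper: run ssh-copy-id for every host given on the command line.
# DRY_RUN is an illustrative variable; with DRY_RUN=1 (the default) the
# commands are only printed, with DRY_RUN=0 they are executed.
distribute_key() {
    local key="$1"
    shift
    local host
    for host in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "ssh-copy-id -i $key root@$host"
        else
            ssh-copy-id -i "$key" "root@$host"
        fi
    done
}

# Dry run for the two compute nodes of this guide: prints one command per host.
distribute_key "$HOME/.ssh/id_rsa.pub" c1 c2
```

Running DRY_RUN=0 distribute_key ~/.ssh/id_rsa.pub c1 c2 performs the copy for real and prompts for each host's password once.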
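II. Configure Munge
Remove a failed munge installation:
yum remove munge munge-libs munge-devel -y
Delete the user:
userdel -r munge
1. Create the Munge user
The munge user must have the same UID and GID on the master node and on the compute nodes, and Munge must be installed on every node.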
# groupadd -g 1108 munge
# useradd -m -c "Munge Uid 'N' Gid Emporium" -d /var/lib/munge -u 1108 -g munge -s /sbin/nologin munge
2. Fill the entropy pool:
# yum install -y rng-tools
Use /dev/urandom as the entropy source
# rngd -r /dev/urandom
# vim /usr/lib/systemd/system/rngd.service
Change the following parameter:
[Service]
ExecStart=/sbin/rngd -f -r /dev/urandom
Check the status
# systemctl status rngd
Save the unit file and exit, then reload systemd and start the service
# systemctl daemon-reload
# systemctl start rngd
# systemctl enable rngd
3. Deploy Munge
Munge is an authentication service that validates the UID and GID of processes on local or remote hosts.
# yum install munge munge-libs munge-devel -y
Create the global key
Create the key shared by the whole cluster on the master node
# /usr/sbin/create-munge-key -r
# dd if=/dev/urandom bs=1 count=1024 > /etc/munge/munge.key
Copy the key to every compute node, then set its ownership and permissions (on every node):
# scp -p /etc/munge/munge.key root@192.168.8.145:/etc/munge
# scp -p /etc/munge/munge.key root@192.168.8.144:/etc/munge
# chown munge: /etc/munge/munge.key
# chmod 400 /etc/munge/munge.key
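The key must end up on every node with mode 400. The following is a minimal sketch of a check for that condition. The function name check_munge_key is an assumption made for illustration, and stat -c is the GNU form available on CentOS.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the key file exists and has mode 400.
# The default path is the one used in this guide.
check_munge_key() {
    local key="${1:-/etc/munge/munge.key}" mode
    if [ ! -f "$key" ]; then
        echo "missing: $key" >&2
        return 1
    fi
    mode=$(stat -c '%a' "$key")      # octal permission bits (GNU stat)
    if [ "$mode" != "400" ]; then
        echo "mode $mode on $key, expected 400" >&2
        return 1
    fi
    echo "ok: $key"
}
```

Run check_munge_key as root on each node after copying the key. Comparing the output of md5sum /etc/munge/munge.key across nodes additionally confirms that the copies are identical.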
Check the status
# systemctl status munge
Run the start commands on every node:
# systemctl start munge
# systemctl enable munge
// Stop the service
systemctl stop munge
// Check the status
systemctl status munge
// Startup failure: Job for munge.service failed because the control process exited with error code. See "systemctl status slurmd.service" and "journalctl -xe" for details.
Check /var/log/munge/munged.log:
Error: Failed to check pidfile dir "/var/run/munge": cannot canonicalize "/var/run/munge": Permission denied
Check the ownership and permissions under /var/run, then fix the directory named in the error:
chown -R munge /var/run/munge
-rwxr-xr-x (755): the owner can read, write, and execute; the group and others can only read and execute
4. Test the Munge service
Verify the connection between each compute node and the control node.
Generate and print a credential locally:
# munge -n
Decode locally:
# munge -n | unmunge
Verify a compute node by decoding remotely:
# munge -n | ssh 192.168.8.145 unmunge
// Error: "unmunge: Error: Invalid credential". Restart the munge service on the compute node.
Benchmark Munge credentials
# remunge
III. Configure Slurm
1. Create the Slurm user
# groupadd -g 1109 slurm
# useradd -m -c "Slurm manager" -d /var/lib/slurm -u 1109 -g slurm -s /bin/bash slurm
2. Install the Slurm dependencies
# yum install gcc gcc-c++ readline-devel perl-ExtUtils-MakeMaker pam-devel rpm-build mysql-devel -y
Build Slurm
# wget
Install rpm-build and use rpmbuild to build the Slurm RPM packages
# yum install rpm-build
# rpmbuild -ta slurm-21.08.0-0rc1.tar.bz2
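// If rpmbuild fails with the following error:
error: Failed build dependencies: python3 is needed by slurm-21.08.0-0rc1.el7.x86_64
Fix: pass --nodeps so that rpmbuild skips the build-dependency check:
rpmbuild -ta --nodeps slurm-21.08.0-0rc1.tar.bz2
--nodeps   # do not verify build dependencies
Change to the directory that holds the built RPM packages: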
# cd /root/rpmbuild/RPMS/x86_64/
Install Slurm on every node
yum localinstall slurm-*
// yum install error: "Error: Protected multilib versions:"
Fix: append --setopt=protected_multilib=false to the command.
Installation error on a compute node: "Bad exit status from /var/tmp/rpm-tmp.EJn6d9 (%build)". python3 needs to be installed:
Unpack: tar -Jxf Python-3.6.2.tar.xz
Enter the directory: cd Python-3.6.2
Create the installation directory: mkdir /usr/local/python3
Set the installation prefix: ./configure --prefix=/usr/local/python3
Build and install: make && make install
Create the symbolic links:
ln -s /usr/local/python3/bin/python3 /usr/bin/python3   # symlink for python3
ln -s /usr/local/python3/bin/pip3 /usr/bin/pip3   # symlink for pip3
Verify:
python3
pip3 -V   # capital V
3. Configure Slurm on the control node
# cp /etc/slurm/cgroup.conf.example /etc/slurm/cgroup.conf
# cp /etc/slurm/slurm.conf.example /etc/slurm/slurm.conf
# vim /etc/slurm/slurm.conf
## Modify the following entries
ControlMachine=m1
ControlAddr=192.168.8.150
SlurmUser=slurm
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdLogFile=/var/log/slurm/slurmd.log
SelectType=select/cons_res
SelectTypeParameters=CR_CPU_Memory
NodeName=c[1-2] RealMemory=3400 Sockets=1 CoresPerSocket=4 State=IDLE
PartitionName=all Nodes=c[1-2] Default=YES State=UP
Check the configuration:
scontrol show config
Copy the control node's configuration files to the compute nodes:
# scp /etc/slurm/*.conf root@192.168.8.145:/etc/slurm/
# scp /etc/slurm/*.conf c2:/etc/slurm/
Create the spool and log directories on the control and compute nodes and give them to the slurm user
# mkdir /var/spool/slurm
# chown slurm: /var/spool/slurm
# mkdir /var/log/slurm
# chown slurm: /var/log/slurm
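5. Configure Slurm Accounting on the control node
Accounting records collect information about each job step for Slurm. They can be written to a plain text file or to a database. Because the text file keeps growing, the simplest approach is to store the records in MySQL.
Create the Slurm database user (install MySQL yourself beforehand):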
mysql -u root -p
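// If this fails with "Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)", connect to the remote server instead:
mysql -h192.168.8.144 -uroot -proot
mysql> grant all on slurm_acct_db.* to 'slurm'@'%' identified by 'root' with grant option;
// MySQL password policy error: "Your password does not satisfy the current policy requirements"
Check MySQL's initial password policy:
SHOW VARIABLES LIKE 'validate_password%';
First lower the validation strength by setting the global validate_password_policy to LOW:
set global validate_password_policy=LOW;
The current minimum password length is 8; keep it if that is acceptable. To allow the 4-character password used in this guide, set the global validate_password_length to 4:
set global validate_password_length=4;
MySQL password policy parameters:
1) validate_password_length: minimum total length of the password;
2) validate_password_dictionary_file: path of the dictionary file used for validation;
3) validate_password_mixed_case_count: minimum number of uppercase and lowercase letters the password must contain;
4) validate_password_number_count: minimum number of digits the password must contain;
5) validate_password_policy: validation strength level, MEDIUM by default;
   LOW: checks length only;
   MEDIUM: checks length, digits, mixed case, and special characters;
   STRONG: checks length, digits, mixed case, special characters, and the dictionary file;
6) validate_password_special_char_count: minimum number of special characters the password must contain;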
Configure the slurmdbd.conf file
# cp /etc/slurm/slurmdbd.conf.example /etc/slurm/slurmdbd.conf
# cat /etc/slurm/slurmdbd.conf
AuthType=auth/munge
AuthInfo=/var/run/munge/munge.socket.2
DbdAddr=192.168.8.150
DbdHost=m1
SlurmUser=slurm
DebugLevel=verbose
LogFile=/var/log/slurm/slurmdbd.log
PidFile=/var/run/slurmdbd.pid
StorageType=accounting_storage/mysql
StorageHost=192.168.8.144
StorageUser=slurm
StoragePass=root
StorageLoc=slurm_acct_db

chown slurm: /etc/slurm/slurmdbd.conf
chmod 600 /etc/slurm/slurmdbd.conf
chown slurm: /var/log/slurm/slurmdbd.log
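slurmdbd.conf is a plain Key=Value file, so a value such as StorageUser can be read back and compared with the account created in MySQL. The function below is a minimal sketch of such a lookup. The name conf_get is an assumption for illustration, and the match is case-sensitive even though Slurm itself treats keys case-insensitively.

```shell
#!/bin/bash
# Hypothetical helper: print the value assigned to a key in a Key=Value file.
# Comment lines are ignored; if the key appears several times the last one wins.
# Returns a non-zero status when the key is absent.
conf_get() {
    local file="$1" key="$2"
    awk -F= -v k="$key" '
        /^[ \t]*#/ { next }                               # skip comment lines
        $1 == k { val = substr($0, index($0, "=") + 1) }  # text after the first "="
        END { if (val == "") exit 1; print val }
    ' "$file"
}
```

For example, conf_get /etc/slurm/slurmdbd.conf StorageUser should print slurm, the user that was granted access to slurm_acct_db.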
6. Start the services on the nodes
Check the status
# systemctl status slurmdbd
Start the slurmdbd service on the control node
# systemctl start slurmdbd
# systemctl enable slurmdbd
// Error: "Failed to start slurmdbd.service: Unit not found"
First check whether the service appears in the unit list:
systemctl list-unit-files --type=service
If it does, reload systemd:
systemctl daemon-reload
Start the slurmctld service on the control node
Start the cluster:
The master node runs slurmctld -c and slurmd -c, both as root; every slave node runs slurmd -c.
# systemctl start slurmctld
# systemctl status slurmctld
# systemctl enable slurmctld
Start the service on the compute nodes:
# systemctl start slurmd
# systemctl status slurmd
# systemctl enable slurmd
// Error from systemctl status slurmdbd on the control node: "Can't open PID file /var/run/slurmdbd.pid (yet?) after start: No such file or directory"
Check whether the file exists. If it does not, create the PID file and grant permissions:
touch /var/run/slurmdbd.pid
chmod -R 777 /var/run/slurmdbd.pid
// Error from systemctl enable slurmctld on the control node: "Failed to parse PID from file /var/run/slurmctld.pid: Invalid argument"
Check the log with journalctl -xe.
Then inspect the slurmctld unit file, /usr/lib/systemd/system/slurmctld.service. It contains the line
PIDFile=/var/run/slurmctld.pid
Comment out that line, reload systemd, and restart the service:
systemctl daemon-reload
If the problem persists, close and reopen the Xshell session, then run systemctl start slurmd.
// sinfo error: "slurm_load_partitions: Unable to contact slurm controller (connect failure)"
# vim /etc/slurm/slurm.conf
## Modify the following entries
ControlMachine=m1
ControlAddr=192.168.8.150
SlurmUser=slurm
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdLogFile=/var/log/slurm/slurmd.log
SelectType=select/cons_res
SelectTypeParameters=CR_CPU_Memory
NodeName=m1 NodeAddr=192.168.8.150 CPUs=1 CoresPerSocket=1 ThreadsPerCore=1 RealMemory=200 Procs=1 State=UNKNOWN
NodeName=c1 NodeAddr=192.168.8.145 CPUs=1 CoresPerSocket=1 ThreadsPerCore=1 RealMemory=200 Procs=1 State=UNKNOWN
PartitionName=control Nodes=m1 Default=NO MaxTime=INFINITE State=UP
PartitionName=compute Nodes=c1 Default=YES MaxTime=INFINITE State=UP
Note: values such as CPUs=24 CoresPerSocket=6 ThreadsPerCore=2 RealMemory=30000 Procs=1 must be adjusted to match the resources of your own servers.
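The CPU count in a NodeName line has to agree with the socket, core, and thread counts (CPUs = Sockets x CoresPerSocket x ThreadsPerCore). The sketch below builds such a line from those inputs. The function name node_line and its argument order are assumptions made for illustration. On a real node, slurmd -C prints the detected hardware in slurm.conf format and is the better source for these numbers.

```shell
#!/bin/bash
# Hypothetical helper: build a slurm.conf NodeName line whose CPUs value is
# derived from sockets * cores per socket * threads per core.
# Arguments: name address sockets cores_per_socket threads_per_core memory_mb
node_line() {
    local name="$1" addr="$2" sockets="$3" cores="$4" threads="$5" mem_mb="$6"
    local cpus=$(( sockets * cores * threads ))
    printf 'NodeName=%s NodeAddr=%s CPUs=%d Sockets=%d CoresPerSocket=%d ThreadsPerCore=%d RealMemory=%d State=UNKNOWN\n' \
        "$name" "$addr" "$cpus" "$sockets" "$cores" "$threads" "$mem_mb"
}

# Example with the figures quoted above: 2 sockets x 6 cores x 2 threads = 24 CPUs.
node_line c1 192.168.8.145 2 6 2 30000
```

RealMemory is given in megabytes and should not exceed what the node actually reports, otherwise Slurm marks the node as misconfigured.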
Check the configuration:
scontrol show config
If you changed slurm.conf, run scontrol reconfig on the master to apply the new configuration. Then reload systemd:
systemctl daemon-reload
IV. Check the Slurm cluster
Inspect the cluster
# sinfo
# scontrol show partition
# scontrol show node
Submit a job
# srun -N2 hostname
# scontrol show jobs
List jobs
# squeue -a
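srun runs a command interactively; longer work is usually submitted as a batch script with sbatch. The sketch below writes a small batch script. The function name write_job_script, the job name hello, and the output file pattern are illustrative assumptions. The partition name compute is taken from the slurm.conf shown earlier.

```shell
#!/bin/bash
# Hypothetical helper: write a minimal Slurm batch script to the given path.
# The #SBATCH lines are read by sbatch and are plain comments to bash.
write_job_script() {
    local path="$1"
    cat > "$path" <<'EOF'
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --partition=compute
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --output=hello-%j.out
echo "Running on $(hostname)"
EOF
}

# Typical use (not executed here):
#   write_job_script hello.sh
#   sbatch hello.sh
#   squeue -a
```

In the output file name, %j is replaced by the job ID, so each submission writes to its own file.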