Day 1: HDFS
The Hadoop ecosystem
Hadoop: HDFS | MapReduce. HDFS originates from Google's GFS paper: GFS -- Nutch (NDFS) -- HDFS.
HDFS: solves the storage problem; distributed file storage improves the system's access throughput.
Disk size    Transfer speed    Time to read the whole disk
1 GB         4.4 MB/s          ~4 minutes
1 TB         100 MB/s          ~2.9 hours
The distributed solution: split 1 TB across 100 machines, ~10.24 GB per machine:
10.24 GB     100 MB/s          ~2 minutes
(1 TB ≈ 1,048,576 MB; at 100 MB/s that is ≈ 10,486 s ≈ 2.9 hours, versus ≈ 105 s when 100 machines each read ~10.24 GB in parallel.)
MapReduce: computation; originates from Google's MapReduce paper.
A compute job is split into many small tasks, the tasks are assigned to the nodes that store the data, and the results are aggregated.
HBase: a database built on top of HDFS, originating from Google's Bigtable paper; designed for random access over data with hundreds of millions of rows and millions of columns.
Hive: HQL queries are translated into MapReduce programs.
ZooKeeper: a distributed coordination service.
HDFS architecture diagram
Task 1: Setting up the Hadoop environment
System: CentOS 6.5 (32-bit), with JDK 1.7+ installed (and the JAVA_HOME environment variable already configured).
1. Install the JDK
JDK setup steps:
① Copy jdk-7u71-linux-i586.rpm into /usr/local with WinSCP.
② Install the JDK: rpm -ivh jdk-7u71-linux-i586.rpm
③ ls -a shows the .bashrc file; edit it:
vi .bashrc
Set the environment variables:
CLASSPATH=.
JAVA_HOME=/usr/java/latest
PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH
export JAVA_HOME
export PATH
④ Make the change take effect: run source .bashrc, or simply open a new session (e.g. clone the SecureCRT session). Verify with jps or java -version.
2. Configure the hostname:
[root ~]# cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=CentOS
3. Configure the hostname-to-IP mapping
Note: on the Windows side, first add the entry 192.168.0.8 CentOS to C:\Windows\System32\drivers\etc\hosts.
[root ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.0.8 CentOS
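A quick optional check that the mapping resolves:
[root@CentOS ~]# ping -c 1 CentOS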
4. Configure passwordless SSH login for the machine (details in step 6)
5. Disable the firewall
[root ~]# service iptables stop
iptables: Setting chains to policy ACCEPT: filter [ OK ]
iptables: Flushing firewall rules: [ OK ]
iptables: Unloading modules: [ OK ]
[root ~]# chkconfig --del iptables    -- keep iptables from starting automatically at boot
6. Configure passwordless SSH login
(1) Generate a public/private key pair
[root ~]# ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
Generating public/private dsa key pair.
Your identification has been saved in /root/.ssh/id_dsa.
Your public key has been saved in /root/.ssh/id_dsa.pub.
The key fingerprint is:
06:76:81:51:1f:94:7c:02:6b:49:c5:e8:cc:80:df:8b root@CentOS
The key's randomart image is:
+--[ DSA 1024]----+
| ..+=B+. |
| . o..==.. |
| .o*= .o |
| ..+= |
| .S. |
| E.. |
| |
| |
| |
+-----------------+
(2) Upload the public key to the target machine you want to log in to
(omitted; use WinSCP)
(3) On the target machine, append the uploaded public key to its trusted key list
[root@CentOS ~]# cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
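A quick check: logging in to the machine itself should now succeed without a password prompt.
[root@CentOS ~]# ssh CentOS    -- should not ask for a password
[root@CentOS ~]# exit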
Diagram: how passwordless SSH login works
7. Upload hadoop-2.6.0.tar.gz and extract it into the /usr directory
[root@CentOS ~]# tar -zxf hadoop-2.6.0.tar.gz -C /usr/
8. Edit the Hadoop configuration files (minimal example contents follow this list):
(1) etc/hadoop/core-site.xml
(2) etc/hadoop/hdfs-site.xml
(3) etc/hadoop/slaves
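The notes do not reproduce the contents of these files. A minimal single-node sketch (the values here are assumptions, chosen to match the hdfs://CentOS:9000 URI and the single DataNode used elsewhere in these notes):
core-site.xml -- the NameNode RPC address:
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://CentOS:9000</value>
</property>
hdfs-site.xml -- replication factor of 1, since there is only one DataNode:
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
etc/hadoop/slaves -- one DataNode hostname per line:
CentOS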
9. Format the NameNode (creates the fsimage file)
[root@CentOS hadoop-2.6.0]# ./bin/hdfs namenode -format
16/07/27 23:25:12 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
.....
16/07/27 23:25:13 INFO namenode.NNConf: XAttrs enabled? true
16/07/27 23:25:13 INFO namenode.NNConf: Maximum size of an xattr: 16384
16/07/27 23:25:13 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1909604994-192.168.0.8-1469633113883
16/07/27 23:25:13 INFO common.Storage: Storage directory /tmp/hadoop-root/dfs/name has been successfully formatted.
16/07/27 23:25:14 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
16/07/27 23:25:14 INFO util.ExitUtil: Exiting with status 0
16/07/27 23:25:14 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at CentOS/192.168.0.8
************************************************************/
10. Start Hadoop
[root@CentOS hadoop-2.6.0]# ./sbin/start-dfs.sh
Starting namenodes on [CentOS]
CentOS: starting namenode, logging to /usr/hadoop-2.6.0/logs/hadoop-root-namenode-CentOS.out
CentOS: starting datanode, logging to /usr/hadoop-2.6.0/logs/hadoop-root-datanode-CentOS.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/hadoop-2.6.0/logs/hadoop-root-secondarynamenode-CentOS.out
11. Verify that the daemons are running
[root@CentOS hadoop-2.6.0]# jps
3459 NameNode
3713 SecondaryNameNode
3571 DataNode
12. Stop Hadoop
[root@CentOS hadoop-2.6.0]# ./sbin/stop-dfs.sh
Stopping namenodes on [CentOS]
CentOS: stopping namenode
CentOS: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode
Task 2: Learn the HDFS shell commands on your own
[root@CentOS hadoop-2.6.0]# ./bin/hdfs dfs -help
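A few commonly used HDFS shell commands to try (the file names here are only examples):
./bin/hdfs dfs -mkdir -p /user/root             -- create a directory
./bin/hdfs dfs -put hadoop-2.6.0.tar.gz /       -- upload a local file
./bin/hdfs dfs -ls /                            -- list a directory
./bin/hdfs dfs -get /hadoop-2.6.0.tar.gz .      -- download a file
./bin/hdfs dfs -rm /hadoop-2.6.0.tar.gz         -- delete (moves to the trash when it is enabled)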
Task 3: Operating HDFS with the Java API
1. Configure the Windows development environment
(1) Extract hadoop-2.6.0.tar.gz to C:/
(2) Set the HADOOP_HOME environment variable
(the hadoop-2.6.0 path must not contain Chinese characters)
(3) Add Hadoop's Windows development support
a) Copy hadoop.dll and winutils.exe into hadoop-2.6.0's bin directory
2. Copy core-site.xml and hdfs-site.xml into the project's src directory
3. Add the JVM startup argument -DHADOOP_USER_NAME=root
A second way to solve the permission problem:
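The notes do not spell out this second approach. A common alternative (an assumption here, not taken from the notes) is to name the user in code, either with System.setProperty("HADOOP_USER_NAME", ...) or with the three-argument FileSystem.get(uri, conf, user). Below is a minimal sketch of basic HDFS operations via the Java API, assuming the configuration files from step 2 are on the classpath and using made-up local paths:

import java.io.FileOutputStream;
import java.io.InputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsDemo {
    public static void main(String[] args) throws Exception {
        // Same effect as the -DHADOOP_USER_NAME=root startup argument.
        System.setProperty("HADOOP_USER_NAME", "root");
        // Reads core-site.xml / hdfs-site.xml from the classpath (the src directory).
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        fs.mkdirs(new Path("/demo"));                        // create a directory
        fs.copyFromLocalFile(new Path("C:/data/a.txt"),      // upload a local file
                             new Path("/demo/a.txt"));
        // Download by streaming the HDFS file back to the local disk.
        InputStream in = fs.open(new Path("/demo/a.txt"));
        FileOutputStream out = new FileOutputStream("C:/data/a_copy.txt");
        IOUtils.copyBytes(in, out, 4096, true);              // true: close both streams
        fs.close();
    }
}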
Supplementary notes:
1. The base path for Hadoop's data
a) hadoop.tmp.dir
hadoop.tmp.dir is the base setting that the Hadoop file system depends on; many other paths derive from it. If hdfs-site.xml does not set storage locations for the NameNode and DataNode, they default to subdirectories of this path (by default under /tmp, which is typically cleared on reboot; note that after changing this value the NameNode must be formatted again).
Configure it in core-site.xml:
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop-2.6.0/hadoop-${user.name}</value>
</property>
2. Configure the trash (recycle bin)
a) fs.trash.interval
Configure it in core-site.xml (the value is the number of minutes that deleted files are retained; 0 disables the trash):
<property>
<name>fs.trash.interval</name>
<value>2</value>
</property>
Trash path:
hdfs://CentOS:9000/user/root/.Trash/Current
Inspect the trash:
./bin/hdfs dfs -ls /user/root/.Trash/Current
Restore a file:
./bin/hdfs dfs -mv /user/root/.Trash/Current/hadoop-2.6.0.tar.gz /
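To delete a file permanently, bypassing the trash, add the -skipTrash flag:
./bin/hdfs dfs -rm -skipTrash /hadoop-2.6.0.tar.gz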
3. Explore on your own
a) ./bin/hdfs -help
b) ./bin/hdfs dfsadmin -help    -- study on your own
Report cluster capacity and DataNode status:
./bin/hdfs dfsadmin -report
Enter safe mode (use get to check the current state, leave to exit it):
./bin/hdfs dfsadmin -safemode enter
View the rack topology:
./bin/hdfs dfsadmin -printTopology