云帆大数据学院 Hadoop/Spark Technical Documentation Series
Hive Installation Manual
1. Environment Preparation
1. Software environment:
o Hadoop and HBase are already installed.
o hive-0.10.0-cdh4.7.0.tar.gz
2. Hardware resource plan:
o 10.1.1.117 hive
2. Install MySQL
Log in to master as the root user.
1. Install MySQL:
# yum install mysql mysql-server
2. Start MySQL:
# service mysqld start
3. Set the root password:
# mysqladmin -uroot password '123456'
# mysqladmin -uroot -hmaster password '123456'
4. Set the MySQL character encoding: edit /etc/my.cnf and add the following:
[mysqld]
default-character-set=utf8
[mysql]
default-character-set=utf8
5. Restart MySQL:
# service mysqld restart
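A quick optional check (our own addition, not in the original steps): after the restart, the character_set variables for client and server should report utf8:
# mysql -uroot -p123456 -e "show variables like 'character_set%';"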
3. Install Hive
Log in to the master node as the hadoop user.
1. Extract the Hive package and configure environment variables. Copy hive-0.10.0-cdh4.7.0.tar.gz to the /home/hadoop directory and extract it:
$ tar -zxvf hive-0.10.0-cdh4.7.0.tar.gz
Edit ~/.bashrc and append the following variables at the end:
export HIVE_HOME=/home/hadoop/hive-0.10.0-cdh4.7.0
export PATH=$PATH:$HIVE_HOME/bin
Save the file and reload ~/.bashrc:
$ source ~/.bashrc
Alternatively, log in again as the hadoop user.
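To verify that the variables took effect (an optional check), confirm the shell now resolves the Hive binary from the new path:
$ echo $HIVE_HOME
$ which hive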
2. Initialize the Hive metastore database. Log in to MySQL:
$ mysql -uroot -p
In MySQL, create the metastore database, load the metastore initialization script, and add the hadoop user:
mysql> create database metastore;
mysql> use metastore;
mysql> source /home/hadoop/hive-0.10.0-cdh4.7.0/scripts/metastore/upgrade/mysql/hive-schema-0.10.0.mysql.sql
mysql> grant all on metastore.* to hadoop@localhost identified by '123456';
mysql> grant all on metastore.* to hadoop@master identified by '123456';
mysql> grant all on metastore.* to hadoop@'%' identified by '123456';
mysql> flush privileges;
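As an optional sanity check (our own addition), log back in as the hadoop user and list the freshly loaded schema; metastore tables such as DBS and TBLS should appear:
$ mysql -uhadoop -p123456 metastore -e "show tables;"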
Copy the MySQL JDBC driver mysql-connector-java.jar to ~/hive-0.10.0-cdh4.7.0/lib.
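For example, assuming the driver jar sits in the current directory (the original does not say where it comes from, so adjust the source path for your system):
$ cp mysql-connector-java.jar ~/hive-0.10.0-cdh4.7.0/lib/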
Configure ~/hive-0.10.0-cdh4.7.0/conf/hive-env.sh:
HADOOP_HOME=/home/hadoop/hadoop-2.0.0-cdh4.7.0
export HIVE_CONF_DIR=/home/hadoop/hive-0.10.0-cdh4.7.0/conf
export HIVE_AUX_JARS_PATH=/home/hadoop/hive-0.10.0-cdh4.7.0/lib
Configure ~/hive-0.10.0-cdh4.7.0/conf/hive-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>

<!-- Hive Execution Parameters -->

<property>
  <name>mapred.reduce.tasks</name>
  <value>-1</value>
  <description>The default number of reduce tasks per job. Typically set to a prime close to the number of available hosts. Ignored when mapred.job.tracker is "local". Hadoop set this to 1 by default, whereas hive uses -1 as its default value. By setting this property to -1, Hive will automatically figure out what should be the number of reducers.</description>
</property>

<property>
  <name>hive.exec.reducers.bytes.per.reducer</name>
  <value>1000000000</value>
  <description>size per reducer. The default is 1G, i.e if the input size is 10G, it will use 10 reducers.</description>
</property>

<property>
  <name>hive.exec.reducers.max</name>
  <value>999</value>
  <description>max number of reducers will be used. If the one specified in the configuration parameter mapred.reduce.tasks is negative, hive will use this one as the max number of reducers when automatically determine number of reducers.</description>
</property>

<property>
  <name>hive.cli.print.header</name>
  <value>false</value>
  <description>Whether to print the names of the columns in query output.</description>
</property>

<property>
  <name>hive.cli.print.current.db</name>
  <value>false</value>
  <description>Whether to include the current database in the hive prompt.</description>
</property>

<property>
  <name>hive.cli.prompt</name>
  <value>hive</value>
  <description>Command line prompt configuration value. Other hiveconf can be used in this configuration value. Variable substitution will only be invoked at the hive cli startup.</description>
</property>

<property>
  <name>hive.exec.scratchdir</name>
  <value>/tmp/hive-${user.name}</value>
  <description>Scratch space for Hive jobs</description>
</property>

<property>
  <name>hive.exec.local.scratchdir</name>
  <value>/tmp/${user.name}</value>
  <description>Local scratch space for Hive jobs</description>
</property>

<property>
  <name>hive.test.mode</name>
  <value>false</value>
  <description>whether hive is running in test mode. If yes, it turns on sampling and prefixes the output tablename</description>
</property>

<property>
  <name>hive.test.mode.prefix</name>
  <value>test_</value>
  <description>if hive is running in test mode, prefixes the output table by this string</description>
</property>

<!-- If the input table is not bucketed, the denominator of the tablesample is determined by the parameter below -->
<!-- For example, the following query: -->
<!--   INSERT OVERWRITE TABLE dest -->
<!--   SELECT col1 from src -->
<!-- would be converted to -->
<!--   INSERT OVERWRITE TABLE test_dest -->
<!--   SELECT col1 from src TABLESAMPLE (BUCKET 1 out of 32 on rand(1)) -->
<property>
  <name>hive.test.mode.samplefreq</name>
  <value>32</value>
  <description>if hive is running in test mode and table is not bucketed, sampling frequency</description>
</property>

<property>
  <name>hive.test.mode.nosamplelist</name>
  <value></value>
  <description>if hive is running in test mode, dont sample the above comma separated list of tables</description>
</property>

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://master:9083</value>
  <description>Thrift uri for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://master/metastore</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.PersistenceManagerFactoryClass</name>
  <value>org.datanucleus.api.jdo.JDOPersistenceManagerFactory</value>
  <description>class implementing the jdo persistence</description>
</property>

<property>
  <name>javax.jdo.option.DetachAllOnCommit</name>
  <value>true</value>
  <description>detaches all objects from session so that they can be used after transaction is committed</description>
</property>

<property>
  <name>javax.jdo.option.NonTransactionalRead</name>
  <value>true</value>
  <description>reads outside of transactions</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hadoop</value>
  <description>username to use against metastore database</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>123456</value>
  <description>password to use against metastore database</description>
</property>

<property>
  <name>javax.jdo.option.Multithreaded</name>
  <value>true</value>
  <description>Set this to true if multiple threads access metastore through JDO concurrently.</description>
</property>

<property>
  <name>datanucleus.connectionPoolingType</name>
  <value>DBCP</value>
  <description>Uses a DBCP connection pool for JDBC metastore</description>
</property>

<property>
  <name>datanucleus.validateTables</name>
  <value>false</value>
  <description>validates existing schema against code. turn this on if you want to verify existing schema</description>
</property>

<property>
  <name>datanucleus.validateColumns</name>
  <value>false</value>
  <description>validates existing schema against code. turn this on if you want to verify existing schema</description>
</property>

<property>
  <name>datanucleus.validateConstraints</name>
  <value>false</value>
  <description>validates existing schema against code. turn this on if you want to verify existing schema</description>
</property>

<property>
  <name>datanucleus.storeManagerType</name>
  <value>rdbms</value>
  <description>metadata store type</description>
</property>

<property>
  <name>datanucleus.autoCreateSchema</name>
  <value>false</value>
  <description>creates necessary schema on a startup if one doesn't exist. set this to false, after creating it once</description>
</property>

<property>
  <name>datanucleus.autoStartMechanismMode</name>
  <value>checked</value>
  <description>throw exception if metadata tables are incorrect</description>
</property>

<property>
  <name>datanucleus.transactionIsolation</name>
  <value>read-committed</value>
  <description>Default transaction isolation level for identity generation.</description>
</property>

<property>
  <name>datanucleus.cache.level2</name>
  <value>false</value>
  <description>Use a level 2 cache. Turn this off if metadata is changed independently of hive metastore server</description>
</property>

<property>
  <name>datanucleus.cache.level2.type</name>
  <value>SOFT</value>
  <description>SOFT=soft reference based cache, WEAK=weak reference based cache.</description>
</property>

<property>
  <name>datanucleus.identifierFactory</name>
  <value>datanucleus1</value>
  <description>Name of the identifier factory to use when generating table/column names etc. 'datanucleus1' is used for backward compatibility with DataNucleus v1</description>
</property>

<property>
  <name>datanucleus.plugin.pluginRegistryBundleCheck</name>
  <value>LOG</value>
  <description>Defines what happens when plugin bundles are found and are duplicated [EXCEPTION|LOG|NONE]</description>
</property>

<property>
  <name>hive.aux.jars.path</name>
  <value>file:///home/hadoop/hive-0.10.0-cdh4.7.0/lib/zookeeper-3.4.5-
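Because hive.metastore.uris above points clients at a remote metastore on thrift://master:9083, the metastore service must be running before the Hive CLI is used. A minimal smoke test (our own sketch, not part of the original manual):
$ hive --service metastore &
$ hive
hive> show tables;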
Please credit 云帆大数据学院 » Hive Installation Manual when reposting.
For more material, join the QQ group: 115438979
Hive hands-on course: http://www.yfteach.com/h-pd-j-6-3_1.html