Solr Index on Hive - Loading data into External table fails #42

Open
gseshwar opened this issue Mar 15, 2018 · 1 comment
Labels
information needed More information is needed to answer

Comments

@gseshwar

Hello,
I created a Solr index on a Hive table using the steps below. When I try to load rows from the Hive internal table into the Hive external table, the insert fails. Please help.

  1. CREATE TABLE ER_ENTITY1000(entityid INT,claimid_s INT,firstname_s STRING,lastname_s STRING,addrline1_s STRING, addrline2_s STRING, city_s STRING, state_S STRING, country_s STRING, zipcode_s STRING, dob_s STRING, ssn_s STRING, dl_num_s STRING, proflic_s STRING, policynum_s STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

  2. LOAD DATA LOCAL INPATH '/home/Solr1.csv' OVERWRITE INTO TABLE ER_ENTITY1;

  3. add jar /home/solr-hive-serde-3.0.0.jar;

CREATE EXTERNAL TABLE SOLR_ENTITY999(entityid INT,claimid_s INT,firstname_s STRING,lastname_s STRING,ssn_s STRING,dl_num_s STRING,city_s STRING,state_s STRING,country_s STRING,zipcode_s STRING)
STORED BY 'com.lucidworks.hadoop.hive.LWStorageHandler'
LOCATION '/user/i98779/SOLR_ENTITY1'
TBLPROPERTIES('solr.server.url' = 'http://10.52.192.108:8983/solr','solr.collection' = 'er_entity','solr.query' = '*:*');

********** All above steps work fine **********

  4. ********** This step fails **********
    INSERT OVERWRITE TABLE SOLR_ENTITY999 SELECT * FROM ER_ENTITY1000;

... With error:
hive> INSERT OVERWRITE TABLE SOLR_ENTITY999 SELECT * FROM ER_ENTITY1000;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = i98779_20180308085142_3918b9ea-2158-4b0e-865f-2fcdefc17e4b
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2018-03-08 08:51:45,993 Stage-1 map = 0%, reduce = 0%
Ended Job = job_local1283927429_0001 with errors
Error during job, obtaining debugging information...
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: MAPRFS Read: 0 MAPRFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec

********** ERROR FROM HIVE JOB LOG is as below **********
java.lang.Exception: Unknown container. Container either has not started or has already completed or doesn't belong to this node at all.

@acesar
Contributor

acesar commented Mar 15, 2018

@gseshwar Not sure if it is a typo, but in the second step you load into a different table:

LOAD DATA LOCAL INPATH '/home/Solr1.csv' OVERWRITE INTO TABLE ER_ENTITY1;

ER_ENTITY1 -> ER_ENTITY1000

As for this log line: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
It does not say what actually failed, so you need to check the YARN logs.
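
For reference, here is a minimal sketch of how steps 2 and 4 could look with the table name made consistent. It assumes the CSV was meant to go into ER_ENTITY1000, the table created in step 1. The explicit column list is a further assumption and is not confirmed anywhere in this thread: ER_ENTITY1000 has 15 columns and SOLR_ENTITY999 has 10, so SELECT * does not line up with the target schema. Neither change is confirmed as the fix for the reported error.

```sql
-- Step 2, loading into the table created in step 1
-- (assumes ER_ENTITY1 in the original statement was a typo).
LOAD DATA LOCAL INPATH '/home/Solr1.csv' OVERWRITE INTO TABLE ER_ENTITY1000;

-- Step 4, selecting only the columns SOLR_ENTITY999 declares, in its declared order
-- (assumption: the 15-column SELECT * is not what the 10-column target expects).
INSERT OVERWRITE TABLE SOLR_ENTITY999
SELECT entityid, claimid_s, firstname_s, lastname_s, ssn_s,
       dl_num_s, city_s, state_s, country_s, zipcode_s
FROM ER_ENTITY1000;
```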

@ctargett ctargett added the information needed More information is needed to answer label May 3, 2018