A quick existence check before counting the rows:

if spark.sql("SHOW TABLES IN test LIKE 'randomDataDelta'").count() == 1:
    # the table-name variable inside the original f-string was lost in formatting;
    # {tableName} is a hypothetical stand-in for it
    spark.sql(f"SELECT COUNT(1) FROM {tableName}").show()

Sample contents of the table:

|ID |CLUSTERED |SCATTERED|RANDOMISED|RANDOM_STRING |SMALL_VC |PADDING |
|1 |0.0 |0.0 |2.0 |KZWeqhFWCEPyYngFbyBMWXaSCrUZoLgubbbPIayRnBUbHoWCFJ| 1 |xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx|
|2 |0.07142857142857142|1.0 |13.0 |dffxkVZQtqMnMcLRkBOzZUGxICGrcbxDuyBHkJlpobluliGGxG| 2 |xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx|
|3 |0.14285714285714285|2.0 |3.0 |LIixMEOLeMaEqJomTEIJEzOjoOjHyVaQXekWLctXbrEMUyTYBz| 3 |xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx|
|4 |0.21428571428571427|3.0 |3.0 |tgUzEjfebzJsZWdoHIxrXlgqnbPZqZrmktsOUxfMvQyGplpErf| 4 |xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx|
|5 |0.2857142857142857 |4.0 |9.0 |qVwYSVPHbDXpPdkh圎pyIgKpaUnArlXykWZeiNNCiiaanXnkks| 5 |xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx|
|6 |0.35714285714285715|5.0 |12.0 |KfFWqcajQLEWVxuXbrFZmUAIIRgmKJSZUqQZNRfBvfxZAZqCSg| 6 |xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx|
|7 |0.42857142857142855|6.0 |5.0 |jzPdeIgxLdGncfBAepfJBdKhoOOLdKLzdocJisAjIhKtJRlgLK| 7 |xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx|
|8 |0.5 |7.0 |3.0 |xyimTcfipZGnzPbDFDyFKmzfFoWbSrHAEyUhQqgeyNygQdvpSf| 8 |xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx|
|9 |0.5714285714285714 |8.0 |7.0 |NxrilRavGDMfvJNScUykTCUBkkpdhiGLeXSyYVgsnRoUYAfXrn| 9 |xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx|
|10 |0.6428571428571429 |9.0 |9.0 |cBEKanDFrPZkcHFuepVxcAiMwyAsRqDlRtQxiDXpCNycLapimt| 10 |xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx|
|11 |0.7142857142857143 |10.0 |7.0 |JXDCGLmlZGEONYlgCtjfIZSOcMzCPVNPkNaHedcmpMbXDuCLmH| 11 |xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx|

Please let me know if you need more information.
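For the table check itself, newer Spark releases expose this through the catalog API rather than string-matching SHOW TABLES output. A minimal sketch, assuming Spark 3.3+ (where Catalog.tableExists was added) and that test.randomDataDelta is the table in question:

from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

table = "test.randomDataDelta"

# tableExists() avoids pattern-matching the SHOW TABLES result by hand
if spark.catalog.tableExists(table):
    # spark.table() gives a DataFrame over the Hive table, so count() works directly
    print(f"{table} exists, rows = {spark.table(table).count()}")
else:
    print(f"{table} does not exist")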
If I use the first script, via Spark SQL, and store the file as ORC with snappy compression, it works. If I store the ORC file with snappy compression and use Hive to create the table with script 1, it also works fine. But when I take an existing table, alter it to add a new column using the Spark Hive context, and save it as ORC with snappy compression, I get the following error: ORC does not support type conversion from STRING to VARCHAR. If I use the same ORC file but have Hive create the table using the second query, I still get the same error. After I changed VARCHAR to string and CHAR to string, it worked fine. I am still investigating the best way to handle VARCHAR/CHAR types through a Spark DataFrame; I noticed some columns are defined as VARCHAR(35), and I think those columns may be the issue (see the sketch after the two scripts below).

1. The following is the SHOW CREATE TABLE output for the table created with Spark SQL:

CREATE TABLE `testtabletmp1`(
  `person_key` bigint,
  `pat_last` string,
  `pat_first` string,
  `pat_dob` timestamp,
  `pat_zip` string,
  `pat_gender` string,
  `pat_chksum1` bigint,
  `pat_chksum2` bigint,
  `dimcreatedgmt` timestamp,
  `pat_mi` string,
  `h_keychksum` string,
  `patmd5` string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION 'hdfs://hdp-cent7-01:8020/apps/hive/warehouse/datawarehouse.db/testtabledimtmp1'
TBLPROPERTIES (
  'orc.compress'='SNAPPY',
  'transient_lastDdlTime'='1469207216')

2. The original table, created when we sqooped the data from SQL Server using a Sqoop import:

CREATE TABLE `testtabledim`(
  `person_key` bigint,
  `pat_last` varchar(35),
  `pat_first` varchar(35),
  `pat_dob` timestamp,
  `pat_zip` char(5),
  `pat_gender` char(1),
  `pat_chksum1` bigint,
  `pat_chksum2` bigint,
  `dimcreatedgmt` timestamp,
  `pat_mi` char(1),
  `h_keychksum` string,
  `patmd5` string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION 'hdfs://hdp-cent7-01:8020/apps/hive/warehouse/datawarehouse.db/testtabledim'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='false',
  'last_modified_by'='hdfs',
  'last_modified_time'='1469026541',
  'numFiles'='1',
  'numRows'='-1',
  'orc.compress'='SNAPPY',
  'rawDataSize'='-1',
  'totalSize'='11144909',
  'transient_lastDdlTime'='1469026541')
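Since the error comes from the ORC writer trying to reconcile STRING with VARCHAR/CHAR, one workaround in the spirit of the "change VARCHAR to String" fix above is to cast those columns on the Spark side before writing. A minimal sketch, assuming a Hive-enabled SparkSession; the target table name testtabledim_str is hypothetical, and depending on the Spark version the char/varchar columns may already surface as string, in which case the cast loop is a harmless no-op:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Source table from the DDL above
df = spark.table("datawarehouse.testtabledim")

# Cast every char/varchar column to string so the ORC writer never needs a
# STRING <-> VARCHAR conversion
for name, dtype in df.dtypes:
    if dtype.startswith("varchar") or dtype.startswith("char"):
        df = df.withColumn(name, col(name).cast("string"))

# Write back as snappy-compressed ORC; the target name is hypothetical
(df.write
   .format("orc")
   .option("compression", "snappy")
   .mode("overwrite")
   .saveAsTable("datawarehouse.testtabledim_str"))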