Microsoft Excel Spreadsheet is one of the most commonly used
office file but still in ETL world we don’t have that much cases where Excel
file load requirement comes up. Recently in one of my project we have come across
with this requirement where business users are updating excel spreadsheet and
it’s getting uploaded directly in intranet. For some business analysis purpose
they want to merge excel data with other DW data and generate reports.
Informatica Power Center is getting used here for ETL needs. In Power Center you can import excel source definition using Microsoft Excel ODBC driver but if excel file is getting replaced or new file is getting added then you won't be able to use this feature. I think best way would be to convert the excel file in text file format and then use Power Center. This requirement is to do the load every week so we won’t be able to do this manually. We need to have automated scripts which will convert excel into text file. There are many ways of converting excel file into text file. Since I am familiar with Python I decided to use it for converting and use the script in Pre-Session Command of Session task.
Pre-Session Command that we used -
xls2flatfile.py -i "$HOME/ifile/InputFile.xlsx"
-s Sheet2 -o $HOME/ofile/Outfile.txt -d "|"
Description of above command –
xls2flatfile.py: Python script name
-i: Input file name
-s: sheet (tab) name, in our input file
we had multiple tabs.
-o: Output file name
-d: Output file delimiter type, default
is comma delimited type.
Since Python was not getting used in our project I started
from installation steps to writing scripts. I will provide python installation
steps and script information that we used.
Python Installation
-
Our server is Linux so I will be providing Linux command. If
you have Unix Admin group then you can simply ask them to do the installation
for you. You can skip this part.
1. Download the latest version of python from https://www.python.org/download/.
2. Decompress the above file. If its tar file then you could use (tar -xvf Python-2.7.6.tar)
3. Since we don’t have root access, we decided to use our application directory to do the installation. Change directory to decompressed (Python-2.7.6) directory and execute following commands.
a.
./configure --prefix=$HOME/python
b.
make
install
After successful execution of step b Python
installation is complete.
4. In order to read the excel file in Python we need to download xlrd module. Download the latest version from https://pypi.python.org/pypi/xlrd/0.9.3
5. Decompress the xlrd tar file and change directory to decompressed directory.
6. Installation command –
a.
$HOME/python/bin/python setup.py build
b.
$HOME/python/bin/python install --prefix=$HOME/python
After successful execution of step b xlrd installation is
complete.
Excel to Text file
Converter –
I tried to write a generic script as much I could think.
This code can easily be adoptable for any kind of project. It’s very small
script and it’s only around 60 lines of code. I think python code is very much
self-explanatory.
#!$HOME/python/bin/python
import sys
import argparse
import time
import xlrd
## define parameter ##
parser = argparse.ArgumentParser(description='This script is for to convert xls to text file format')
parser.add_argument('-i','--input', help='Input xls file name', required=True)
parser.add_argument('-s','--sheet', help='Input sheet name', required=True)
parser.add_argument('-o','--output',help='Output file name', required=True)
parser.add_argument('-d','--delimiter',help='File delimiter type', default = ',', required=False)
args = parser.parse_args()
## define variable ##
ifile = args.input
isheet = args.sheet
ofile = args.output
delimiter = args.delimiter
print ("Script execution started at %s" % time.ctime())
## show values ##
print ("Input file: %s" % ifile )
print ("Input sheet name: %s" % isheet )
print ("Output file: %s" % ofile )
print ("Output file delimiter: %s" % delimiter )
## Read excel file and write text file ##
workbook = xlrd.open_workbook(ifile)
worksheet = workbook.sheet_by_name(isheet)
out_file = open(ofile, 'w')
num_rows = worksheet.nrows - 1
curr_row = -1
print ("Reading excel file now....")
while curr_row < num_rows:
curr_row += 1
row = worksheet.row(curr_row)
num_cols = worksheet.ncols - 1
curr_col = -1
while curr_col < num_cols:
curr_col += 1
try:
col = worksheet.cell_value(rowx=curr_row, colx=curr_col).encode('ASCII')
except:
col = worksheet.cell_value(rowx=curr_row, colx=curr_col)
#print col
out_file.write('"%s"%s' % (str(col).strip(), delimiter))
out_file.write('\n')
out_file.close()
print ("Script completed at %s" % time.ctime())
exit()
import sys
import argparse
import time
import xlrd
## define parameter ##
parser = argparse.ArgumentParser(description='This script is for to convert xls to text file format')
parser.add_argument('-i','--input', help='Input xls file name', required=True)
parser.add_argument('-s','--sheet', help='Input sheet name', required=True)
parser.add_argument('-o','--output',help='Output file name', required=True)
parser.add_argument('-d','--delimiter',help='File delimiter type', default = ',', required=False)
args = parser.parse_args()
## define variable ##
ifile = args.input
isheet = args.sheet
ofile = args.output
delimiter = args.delimiter
print ("Script execution started at %s" % time.ctime())
## show values ##
print ("Input file: %s" % ifile )
print ("Input sheet name: %s" % isheet )
print ("Output file: %s" % ofile )
print ("Output file delimiter: %s" % delimiter )
## Read excel file and write text file ##
workbook = xlrd.open_workbook(ifile)
worksheet = workbook.sheet_by_name(isheet)
out_file = open(ofile, 'w')
num_rows = worksheet.nrows - 1
curr_row = -1
print ("Reading excel file now....")
while curr_row < num_rows:
curr_row += 1
row = worksheet.row(curr_row)
num_cols = worksheet.ncols - 1
curr_col = -1
while curr_col < num_cols:
curr_col += 1
try:
col = worksheet.cell_value(rowx=curr_row, colx=curr_col).encode('ASCII')
except:
col = worksheet.cell_value(rowx=curr_row, colx=curr_col)
#print col
out_file.write('"%s"%s' % (str(col).strip(), delimiter))
out_file.write('\n')
out_file.close()
print ("Script completed at %s" % time.ctime())
exit()
Very helpful.. I was looking for this. Thanks
ReplyDeleteAksaray
ReplyDeleteAydın
Kütahya
Rize
Bingöl
56541J
görüntülüshow
ReplyDeleteücretli show
VN2BA
malatya evden eve nakliyat
ReplyDeleteartvin evden eve nakliyat
kocaeli evden eve nakliyat
ankara evden eve nakliyat
düzce evden eve nakliyat
MT7PX
85FBC
ReplyDeleteElazığ Parça Eşya Taşıma
Ünye Oto Lastik
Samsun Lojistik
Afyon Lojistik
Kripto Para Nedir
İstanbul Şehirler Arası Nakliyat
Yozgat Şehirler Arası Nakliyat
Mamak Fayans Ustası
Van Parça Eşya Taşıma
8ED95
ReplyDeleteAntalya Lojistik
Çerkezköy Çekici
Eskişehir Şehir İçi Nakliyat
Karaman Parça Eşya Taşıma
Okex Güvenilir mi
Kocaeli Lojistik
Çerkezköy Organizasyon
Bolu Lojistik
Tekirdağ Şehir İçi Nakliyat
14F34
ReplyDeleteCoin Nasıl Oynanır
Btcturk Borsası Güvenilir mi
Sohbet
Tiktok Takipçi Satın Al
Arg Coin Hangi Borsada
Periscope Takipçi Satın Al
Twitter Beğeni Satın Al
Facebook Takipçi Satın Al
Area Coin Hangi Borsada
78ADB
ReplyDeleteYeni Çıkacak Coin Nasıl Alınır
Coin Kazma Siteleri
Tiktok Takipçi Satın Al
Chat Gpt Coin Hangi Borsada
Telegram Görüntüleme Satın Al
Tesla Coin Hangi Borsada
Omlira Coin Hangi Borsada
Görüntülü Sohbet
Kripto Para Nasıl Üretilir
نفخ المجاري بالاحساء b5sx9L7cOW
ReplyDelete<a href="https://elasraa.com/%d8%b4%d8%b1%d9%83%d8%a9-%d9%85%d9%83%d8%a7%d9%81%d8%ad%d8%a9-%d8%a
ReplyDeleteشركة مكافحة الفئران بالدمام hIZRz5Rpdb
ReplyDeleteشركة مكافحة الفئران بالاحساء VY7wkMWcvI
ReplyDeleteشركة تنظيف بيوت بالدمام v4uEgGApml
ReplyDelete