Thursday, July 23, 2009

Handling oversized spreadsheets


Oh yeah, those log files with hundreds of thousands lines... Apart from the latest Excel release from MS, you can't handle them at all. Being an SAP consultant, I face this problem frequently. With no MS Office on my notebook, I had to find a work around to open and edit log files longer than 65536 lines. Instead of MS Office, I use OpenOffice 3.0, which is available for Linux, too. The solution is easy, but you should keep in mind some small tricks listed below:
  • Create an OpenOffice Base database, set it's type to "text database". You will need to select a folder containing the oversized log files as texts. This database will not read data from the text files, but create a reference to them as virtual tables
  • Create another OpenOffice Base database, this time with standard type
  • Drag and drop the virtual table from the text-type database to the standard one. This will take a while, depending on the size of data in the text file. To my Intel Core 2 Duo with 2GB of RAM, 300k lines in a single column took around 1 minute to copy. Do not change anything at this step (eg the name of the column in the standard table); to me this led to endless calculations
  • Once you have the data in the standard table, save the database. Now you can edit eg the column name. It is recommended to add a simple name (no spaces, etc), otherwise you queries might fail later...
  • Create a query to remove unnecessary data - maybe you get the net record count under 65536, so that you can swith to a spreadsheet manager, namely OpenOffice Calc. Save the database containing both the data and the query
  • Drag and drop the query into the upper left cell (A1) of an empty worksheet in OpenOffice Calc. This might a few minutes again
  • And, you are done: edit your data in the spreadsheet as you like!