Authored by Don Park ⛹🏽

Janky DSV to CSV Converter

Quick Notes

This piece of quick and dirty code parses Data Source View files (MSSQL DSV) and exports them as CSV files.

This code assumes many things. Some of them are:

  1. One table per file
  2. The values in the dataset follow the same order as the first "column" information
  3. The same number of lines needs to be skipped at the top of every file

Also note that the code:

  1. Is slow (the logic works, but it takes about a minute or two per file).
  2. Assumes every file in a folder is a DSV file (I'll explain later)

How to Use

The code can be modified to support parallelization (the original version actually is parallelized), but it's dependent on the running environment and situation.
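
As a rough illustration of what a parallel version could look like, here is a minimal sketch using foreach and doParallel (both in the install list below). The names convert_one and convert_all are hypothetical and not taken from script.R, and convert_one is only a placeholder that works out the output path instead of doing the real parsing.

```r
library(foreach)
library(doParallel)

# Hypothetical placeholder for the per-file conversion. It only computes the
# output path; the real parsing and CSV writing would go where the comment is.
convert_one <- function(path, out_dir) {
  out <- file.path(out_dir, paste0(tools::file_path_sans_ext(basename(path)), ".csv"))
  # ... parse `path` and write the result to `out` here ...
  out
}

# Run `convert_fn` over all files on a small local cluster.
# The function is passed in as an argument so foreach exports it to the workers.
convert_all <- function(files, out_dir, convert_fn, workers = 2) {
  cl <- makeCluster(workers)
  registerDoParallel(cl)
  on.exit(stopCluster(cl), add = TRUE)  # always shut the workers down
  foreach(f = files, .combine = c) %dopar% convert_fn(f, out_dir)
}
```

Calling convert_all(files, outputFolder, convert_one) would return the output paths in the same order as the input files. How many workers make sense depends on the machine, which is why this is left as a parameter.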

Libraries

This code is confirmed to work with R 3.6.1. To set up the environment, run the following command:

install.packages(c('tidyverse', 'doParallel', 'foreach'))

Configuring the Script

Most of these are straightforward.

  1. Set folderLocation to the path of a folder that contains only DSV files. The script scans the folder for all files and has no logic to filter out anything that isn't a DSV file.
  2. Set outputFolder to the folder where all the .CSV files will be placed.
  3. Set ignoreLines to the number of lines to skip at the top of each file before parsing starts.
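
For illustration, the three settings might look like the sketch below. The paths and the value of ignoreLines are placeholders. The two helper functions are hypothetical additions that are not part of script.R: one restricts the folder scan to a file extension (assuming your DSV files end in .dsv, which may not be true), and the other shows what skipping the first lines amounts to.

```r
# The three settings described above; all values are placeholders.
folderLocation <- "path/to/dsv/folder"
outputFolder   <- "path/to/csv/output"
ignoreLines    <- 3

# Hypothetical helper: list only files with the given extension.
# Assumption: DSV files end in ".dsv". The original script takes every file.
list_dsv_files <- function(folder, extension = "dsv") {
  pattern <- paste0("\\.", extension, "$")
  list.files(folder, pattern = pattern, full.names = TRUE, ignore.case = TRUE)
}

# Hypothetical helper: read a file and drop its first `n` lines.
read_without_header <- function(path, n) {
  lines <- readLines(path, warn = FALSE)
  lines[seq_along(lines) > n]
}
```

With these, list_dsv_files(folderLocation) gives the files to convert and read_without_header(file, ignoreLines) gives the lines that are left to parse.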

How to use the Output

The output can still be a little wonky: it doesn't remove the single quotes (') that surround string elements in the DSV files. I haven't gotten around to removing them, so you might need some minor post-processing to strip those quotes. It's not as clean as I'd like, but it's a start and good enough to work with.
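
One possible way to do that post-processing is sketched below in base R. The function name strip_quotes and the sample data are made up for illustration. It only removes a single quote at the very start or end of a value, so an apostrophe inside a string is left alone.

```r
# Remove a leading and a trailing single quote from every character column.
strip_quotes <- function(df) {
  is_text <- vapply(df, is.character, logical(1))
  if (any(is_text)) {
    df[is_text] <- lapply(df[is_text], function(x) gsub("^'|'$", "", x))
  }
  df
}

# Made-up sample that resembles a converted file with the quotes still in it.
example_df <- data.frame(
  id = c(1, 2),
  name = c("'Ann'", "'O'Brien'"),
  stringsAsFactors = FALSE
)
cleaned <- strip_quotes(example_df)
```

When loading one of the exported files first, read it with read.csv(path, stringsAsFactors = FALSE) so the text columns stay character vectors instead of becoming factors (the default in R 3.6.1).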

Quick Breakdown of the Code

The code uses R's regular expression functions to parse each line of the DSV file. It first builds a two-column data.frame where the first column holds the column headers and the second column holds the unparsed dataset. It then parses that content and builds the final table from it.
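
The two steps can be sketched roughly as follows. This is not the code from script.R: the input line layout, the regular expression, and all variable names are made up purely to show the idea, and the real DSV lines may look quite different.

```r
# Made-up input lines; the real DSV layout may differ.
lines <- c(
  "(id, name) VALUES (1, 'Ann')",
  "(id, name) VALUES (2, 'Bob')"
)

# Step 1: use a regex to split each line into a header part and an
# unparsed value part, stored as a two-column data.frame.
pattern <- "^\\((.*)\\)[[:space:]]+VALUES[[:space:]]+\\((.*)\\)$"
parts <- regmatches(lines, regexec(pattern, lines))
raw <- data.frame(
  header = vapply(parts, `[`, character(1), 2),
  values = vapply(parts, `[`, character(1), 3),
  stringsAsFactors = FALSE
)

# Step 2: split both parts on commas and build the table, taking the
# column names from the first header.
# Note: this naive split breaks on commas inside quoted strings.
split_fields <- function(x) trimws(strsplit(x, ",", fixed = TRUE)[[1]])
rows <- lapply(raw$values, split_fields)
result <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
names(result) <- split_fields(raw$header[1])
```

Every column in result is still a character vector, and string values keep their surrounding single quotes, which matches the leftover quotes described in the output notes above.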

script.R 1.42 KB