Janky DSV to CSV Converter
This quick-and-dirty script parses Microsoft SQL Server Data Source View (DSV) files and exports them as CSV files.
This code assumes many things. Some of them are:
- One table per file
- The order of the data matches the order of the first "column" information
- The same number of lines needs to be skipped at the top of every file
Also note that the code:
- Is slow (the logic works, but expect roughly one to two minutes per file)
- Assumes every file in a folder is a DSV file (explained under Configuring the Script)
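Because the script treats every file in the folder as a DSV, one possible workaround is to filter the file list before converting. The following is a minimal sketch, not part of the original script: the helper name list_dsv_files is hypothetical, and it assumes the files carry a .dsv extension.

```r
# Hypothetical helper (not in the original script): keep only files whose
# name ends in ".dsv", ignoring case. Assumes DSV files use that extension.
list_dsv_files <- function(folder) {
  list.files(folder, pattern = "\\.dsv$", ignore.case = TRUE, full.names = TRUE)
}

# Example usage with a placeholder path:
# dsv_files <- list_dsv_files("path/to/dsv/folder")
```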
How to Use
The code can be modified to run in parallel (the original version actually is parallelized), but whether that works well depends on the environment it runs in.
This code is confirmed working with R 3.6.1. To set up the environment, run the following command:
install.packages(c('tidyverse', 'doParallel', 'foreach'))
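With doParallel and foreach installed, the parallelization mentioned above could look roughly like the sketch below. It is an illustration under assumptions, not the original parallel version: convert_all_parallel and its convert_fn argument are hypothetical names, and convert_fn stands in for whatever function converts a single file.

```r
library(foreach)
library(doParallel)

# Hypothetical wrapper: run convert_fn on every file using a local cluster.
# convert_fn must be self-contained, since it is shipped to the workers.
convert_all_parallel <- function(files, convert_fn, workers = 2) {
  cl <- parallel::makeCluster(workers)
  on.exit(parallel::stopCluster(cl), add = TRUE)  # always release the workers
  doParallel::registerDoParallel(cl)
  # One task per file; results are combined in input order.
  foreach(f = files, .combine = c) %dopar% convert_fn(f)
}
```

The right number of workers depends on the machine, so treat the default of 2 as a placeholder.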
Configuring the Script
Most of these are straightforward.
- folderLocation: the path to the folder that contains the DSV files. The script scans the folder for all files and has no logic to filter out files that are not DSVs.
- outputFolder: the folder where all the .CSV files will be placed.
- ignoreLines: the number of lines to skip at the top of each file before parsing starts.
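As an illustration, the three settings might be filled in as shown below. Only the variable names come from the script; the paths and the line count are placeholders. The helper drop_leading_lines is hypothetical and only sketches what skipping the first ignoreLines lines could mean in practice.

```r
# Placeholder values; adjust to your own environment.
folderLocation <- "C:/data/dsv_files"   # folder containing only DSV files
outputFolder   <- "C:/data/csv_output"  # where the .CSV files are written
ignoreLines    <- 10                    # lines to skip at the top of each file

# Hypothetical helper (not in the original script): read a file and
# drop its first n_skip lines. Returns all lines when n_skip is 0.
drop_leading_lines <- function(path, n_skip) {
  lines <- readLines(path, warn = FALSE)
  lines[seq_along(lines) > n_skip]
}
```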
How to Use the Output
The output can still be a little wonky: it doesn't remove the ' characters found on string elements in the DSV files. I haven't gotten around to removing them, so you might need some minor post-processing to strip those quotes. It's not as clean as I'd like, but it's a start and good enough to work with.
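The post-processing could be as small as the sketch below. It assumes the stray quotes sit at the very start and end of each string value, which may not hold for every file; strip_single_quotes is a hypothetical helper, not part of the script.

```r
# Hypothetical helper: remove one leading and one trailing single quote
# from every character column. Quotes inside a value (O'Brien) are kept.
strip_single_quotes <- function(df) {
  is_text <- vapply(df, is.character, logical(1))
  if (any(is_text)) {
    df[is_text] <- lapply(df[is_text], function(col) gsub("^'|'$", "", col))
  }
  df
}

# Example usage with a placeholder file name:
# cleaned <- strip_single_quotes(read.csv("output.csv", stringsAsFactors = FALSE))
# write.csv(cleaned, "output_clean.csv", row.names = FALSE)
```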
Quick Breakdown of the Code
The code uses R's regular expression support to parse each line of the DSV file. It first creates a two-column data.frame in which the first column holds the column headers and the second holds the unparsed data. It then parses that content into a data.frame and builds the final table from it.
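To make the two steps concrete, here is a toy sketch in base R. The input layout ("headers | values", both comma-separated) is invented purely for illustration and is not the real DSV format; the function names split_lines and build_table are hypothetical as well.

```r
# Step 1: use a regex to split each line into a two-column data.frame
# (headers, raw). Assumes the invented "headers | values" layout.
split_lines <- function(lines) {
  parts <- regmatches(lines, regexec("^([^|]*)\\|(.*)$", lines))
  data.frame(
    headers = trimws(vapply(parts, `[`, character(1), 2)),
    raw     = trimws(vapply(parts, `[`, character(1), 3)),
    stringsAsFactors = FALSE
  )
}

# Step 2: parse the raw column into a table named after the headers.
# Naive comma splitting: values that contain commas would break this.
build_table <- function(two_col) {
  col_names <- trimws(strsplit(two_col$headers[1], ",", fixed = TRUE)[[1]])
  values <- strsplit(two_col$raw, ",", fixed = TRUE)
  tbl <- as.data.frame(do.call(rbind, values), stringsAsFactors = FALSE)
  names(tbl) <- col_names
  tbl
}

lines <- c("id,name | 1,'Alice'", "id,name | 2,'Bob'")
tbl <- build_table(split_lines(lines))
```

Note that the string values keep their single quotes, which matches the behaviour described in the output section.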