I recently started hanging out on StackOverflow where I found a great question: Dealing with large number of text strings. My answer to the question is here. I’ve dealt with this type of problem in the past: A program that collects a huge pile of data, does some processing to the data and then stuffs the results in a database. In some ways you could solve the stuffing into the database using an ETL tool like SSIS. However, that would require dumping out a file and then firing up the SSIS package. Not hard, but I don’t like working that way.
The solution in my opinion is to use ADO.NET’s SQLBulkCopy. The only problem is that SQLBulkCopy wants you to setup a data source that closely matches the destination for the data. Not really a big deal but it does take a little work. A further issue is dealing with referential integrity. If the destination table has foreign keys that are requires you’ll need to fill in values for them.
To demonstrate this I used the Adventure Works database and created 2 programs to load up the products table. The first program creates as many products you want to test with and saves them to a CSV file. The second program reads the CSV file, sets up a DataTable and then loads it using SqlBulkCopy. To see how fast it is, the time is taken before and after the bulk copy.
You can grab the code here
Creating the data
Creating the data file isn’t a big deal. The only problem was picking out what fields I’d provide. The rest is just making all of the products.
The file layout
To create the file layout I first looked at the definition for the Product table in Adventure works to see what fields absolutely had to be provided. A few of the required fields could be calculated on the file (date modified for instance), others would just be hard coded to whatever value I chose. This is just a demo after all.
The layout I arrived at is:
- Name
- ProductNumber
- Color
- Category
- Cost
- Listprice
- Size
- Sellstartdate
As a matter of convention I went ahead and added the column names to the file header so somebody that picks up the file might have a chance of figuring out what I’ve done.
Creating the file
With the file layout decided on all I have to do is dump everything to a file:
private void button1_Click(object sender, RoutedEventArgs e)
{
int numberOfItems = 0;
string fileName = "";
int.TryParse(txtProductsCount.Text, out numberOfItems);
if(0 == numberOfItems)
{
return;
}
SaveFileDialog sfd = new SaveFileDialog();
if(true == sfd.ShowDialog())
{
fileName = sfd.FileName;
}
else
{
return;
}
labelStatus.Content = "Status:processing";
this.Cursor = Cursors.Wait;
if (true == File.Exists(sfd.FileName))
{
File.Delete(sfd.FileName);
}
StreamWriter writer = new StreamWriter(sfd.FileName);
writer.WriteLine("Name,ProductNumber,Color,category,cost,listprice,size,sellstartdate");
for(int counter = 0; counter < numberOfItems; counter++)
{
writer.WriteLine(string.Format("Name{0},Tst{1},Red,Brakes,1,2,sm,1.0,{2}"
,counter, DateTime.Now.ToString("hhmmss")
,DateTime.Now.ToString("yyyy-MM-dd")));
}
writer.Close();
labelStatus.Content = "Status:idle";
this.Cursor = Cursors.Arrow;
}
Nothing terribly exciting here, but it does generate the test data I want.
Loading the data
The loader itself reads the CSV file and uses the data to create rows in a DataTable. This program is a little more interesting. The first problem is to setup a DataTable that matches the destination. Then you have to load the data and do any adjustments (transforms) to make sure it will load in to the table. Finally you use SQLBulkCopy to move the data in the table to the server.
Setting up the DataTable
SqlBulkCopy can use other data sources as long as they provide an IDataReader interface, but why go through the trouble when DataTable already does what you need? All you have to do is setup your DataTable to match the destination and then fill it up. To setup my DataTable I created a method to create columns for me:
private static DataColumn GetColumn (string columnType, string columnName)
{
DataColumn newColumn = new DataColumn();
newColumn.DataType = System.Type.GetType(columnType);
newColumn.ColumnName = columnName;
return newColumn;
}
This function just creates a DataColumn using the provided type string and column name. This column is then added to the DataTable:
private static DataTable CreateTable()
{
DataTable newProducts = new DataTable("Products");
// Add the column objects to the table. col file
newProducts.Columns.Add(GetColumn("System.Int32", "ProductID")); //0
newProducts.Columns.Add(GetColumn("System.String", "Name")); //1 0
newProducts.Columns.Add(GetColumn("System.String","ProductNumber")); //2 1
newProducts.Columns.Add(GetColumn("System.String","Color")); //3 2
newProducts.Columns.Add(GetColumn("System.Decimal","StandardCost")); //4 4
newProducts.Columns.Add(GetColumn("System.Decimal","ListPrice")); //5 5
newProducts.Columns.Add(GetColumn("System.String","Size")); //6 6
newProducts.Columns.Add(GetColumn("System.Decimal","Weight")); //7
newProducts.Columns.Add(GetColumn("System.Int32", "ProductCategoryID")); //8 3
newProducts.Columns.Add(GetColumn("System.Int32", "ProductModelID")); //9
newProducts.Columns.Add(GetColumn("System.DateTime", "SellStartDate")); //10 7
newProducts.Columns.Add(GetColumn("System.DateTime", "SellEndDate")); //11
newProducts.Columns.Add(GetColumn("System.DateTime", "DiscontinuedDate")); //12
newProducts.Columns.Add(GetColumn("System.Byte[]", "ThumbNailPhoto")); //13
newProducts.Columns.Add(GetColumn("System.String", "ThumbnailPhotoFileName")); //14
newProducts.Columns.Add(GetColumn("System.Guid", "rowguid")); //15
newProducts.Columns.Add(GetColumn("System.DateTime", "ModifiedDate")); //16
return newProducts;
}
The DataTable setup is really pretty easy. This matches up the columns and I added columns to help make it easier for me to remember which column in the file goes where in the DataTable.
Transforming the Data
Before actually loading the file I need to have a way to look up the product categories. For this I made a quick EntityFramework model of the AdventureWorks database and then wrote a method to lookup ProductCategories by name:
private int GetCateogryByName(string categoryName)
{
ProductCategory category = (from cat in Categories
where cat.Name == categoryName
select cat).FirstOrDefault<ProductCategory>();
if(null == category)
{
return -1;
}
return category.ProductCategoryID;
}
This method just uses a simple LINQ statement to find a category whose name matches the one provided and then returns the ProductCateogryID. With all of this in place we can now load the file in to the DataTable:
private int LoadFile(string fileName, DataTable productTable)
{
StreamReader reader = new StreamReader(fileName);
try
{
bool done = false;
int count = 0;
reader.ReadLine(); //get rid of the header
while (!done)
{
string row = reader.ReadLine();
if (true == string.IsNullOrEmpty(row))
{
done = true;
break;
}
DataRow product = productTable.NewRow();
GetDataRow(row, product);
productTable.Rows.Add(product);
count++;
}
reader.Close();
return count;
}
catch (Exception exc)
{
reader.Close();
throw;
}
}
This has been a million times so it isn’t really worth talking about. Now that we have the data loaded we just call SqlBulkCopy:
private void BulkLoadProducts()
{
SqlConnection connection = GetConnection();
connection.Open();
SqlCommand countCommand = new SqlCommand("SELECT COUNT(*) FROM SalesLT.Product", connection);
long count = (int) countCommand.ExecuteScalar();
listBox1.Items.Add("product count before:" + count.ToString());
SqlBulkCopy bulkCopy = new SqlBulkCopy(connection);
bulkCopy.DestinationTableName = "SalesLT.Product";
DateTime start = DateTime.Now;
bulkCopy.WriteToServer(ProductTable);
DateTime end = DateTime.Now;
TimeSpan diff = end.Subtract(start);
connection.Close();
listBox1.Items.Add(string.Format("took {0}ms", diff.Milliseconds));
}
The magic really takes two three lines: first, create the SqlBulkCopy instance. Second tell it what table it will be loading. Third, call WriteToServer passing in the data. The other stuff in the method is just there to provide us some timing information. On my laptop running everything I’m able to load 1200 rows in about a third of a second. Running on a real server I’d imagine you’d be able to load data even faster.
SqlBulkCopy
For most of us we won’t be using SqlBulkCopy everyday. But on the rare occasions where you need to blast a ton of data in to a database this is a handy object to have around. You can read the MSDN docs here (http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlbulkcopy.aspx). And David Hayden also has a good write up on SqlBulkCopy here (http://davidhayden.com/blog/dave/archive/2006/03/08/2877.aspx).
I hope you this helps you. Please let me know what you think or if you know of any better alternatives for this situation.