Blogger Jeff = new Blogger

Programming and stuff in Western PA

Using Linq to query a CSV File

I know there have been a ton of times I’ve had to deal directly with Comma Delimited files. Searching through the contents for certain pieces of data is usually not a whole lot of fun. However with Linq, things become a bit more enjoyable!

To start off, we’ll create a csv file with the following contents :

"Mets","New York","NL"
"Marlins","Florida","NL"
"Orioles","Baltimore","AL"
"Pirates","Pittsburgh","NL"
"Phillies","Philadelphia","NL"

Save that some place and name it “teams.csv”. Then we’ll add the following namespace to our class file :

using System.Data.OleDb

Then using OleDb, we can access the csv file, and load a data table into a data adapter :

String conn = @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\;
    Extended Properties=""Text;HDR=No;FMT=Delimited""";

OleDbConnection cn = new OleDbConnection(conn);
OleDbCommand cmd = new OleDbCommand(@"SELECT * FROM C:\Temp\teams.csv", cn);
OleDbDataAdapter da = new OleDbDataAdapter(cmd);

cn.Open();

DataTable dt = new DataTable();

da.Fill(dt);

At this point, we are good to go to query the datatable with Linq. Our entire function looks like this :

static void QueryCsv()
{

    OleDbConnection cn = new OleDbConnection (@"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=
C:\;Extended Properties=""Text;HDR=No;FMT=Delimited""");
    OleDbCommand cmd = new OleDbCommand(@"SELECT * FROM C:\Temp\teams.csv", cn);
    OleDbDataAdapter da = new OleDbDataAdapter(cmd);

    cn.Open();

    DataTable dt = new DataTable();

    da.Fill(dt);

    var teams = from r in dt.AsEnumerable()
                where r.Field(2) == "NL"
                select new { City = r.Field(0), TeamName= r.Field(1) };

    foreach (var team in teams)
    {
        Console.WriteLine(String.Format("The {0} {1}",team.TeamName,team.City));
    }

    Console.ReadLine();

    cn.Close();

}

If you run this you should see :

Result Set

Now if your life is like mine, the client will always throw a screwball at you and say something like : “We need to return everything from the csv file where the record also exists in this other data source.” Linq makes this easy too. We’ll create a quick class to demonstrate to represent the other data source :

public class Mascot
{
    public string Name;
    public string City;
    public string Player;
}

We’ll then create a list with our data in it

List mlb = new List
{
    new Mascot{City="Philadelphia",Name="Philllies",Player="Cole Hammels"},
    new Mascot{City="Boston",Name="Red Sox",Player="Curt Schilling"},
    new Mascot{City="Florida",Name="Marlins",Player="Dontrelle Willis"},
    new Mascot{City="Cleveland",Name="Indians",Player="Jake Westbrook"},
    new Mascot{City="Baltimore",Name="Braves",Player="Eric Bedard"},
    new Mascot{City="Los Angeles",Name="Dodgers",Player="Randy Wolf"}
};

Adjusting our original code, we can query and join the two data sources to display some common data :

static void QueryCsvWithJoin()
{

	 String conn = @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\;
	     Extended Properties=""Text;HDR=No;FMT=Delimited""";

	 List mlb = new List
	 {
	     new Mascot{City="Philadelphia",Name="Philllies",Player="Cole Hammels"},
	     new Mascot{City="Boston",Name="Red Sox",Player="Curt Schilling"},
	     new Mascot{City="Florida",Name="Marlins",Player="Dontrelle Willis"},
	     new Mascot{City="Cleveland",Name="Indians",Player="Jake Westbrook"},
	     new Mascot{City="Baltimore",Name="Braves",Player="Eric Bedard"},
	     new Mascot{City="Los Angeles",Name="Dodgers",Player="Randy Wolf"}
	 };

	 OleDbConnection cn = new OleDbConnection(conn);
	 OleDbCommand cmd = new OleDbCommand(@"SELECT * FROM C:\Temp\teams.csv", cn);
	 OleDbDataAdapter da = new OleDbDataAdapter(cmd);

	 cn.Open();

	 DataTable dt = new DataTable();

	 da.Fill(dt);

	 var teams = from r in dt.AsEnumerable()
	             join o in mlb on r.Field(1) equals o.City
	             where r.Field(2) == "NL"
	             select new { City = r.Field(0), TeamName= r.Field(1),Player=o.Player }; 

	 foreach (var team in teams)
	 {
	     Console.WriteLine(String.Format("Pitcher {0} from the {1} {2}", team.Player,team.TeamName, team.City));
	 }

	 Console.ReadLine();

	 cn.Close(); 

}

When run this, we get

Result Set 2

Tags: , , , ,

Advertisements

January 28, 2008 - Posted by | .Net, C#, Linq

14 Comments »

  1. Sweet, I can really use this.

    Comment by Jared Coakley | January 31, 2008 | Reply

  2. This works for the most part if the csv file is comma delimited. However if anything is different from the norm you will need to use a Schema.ini file to specify the extra parameters. For example, for a tab delimited file named data.txt you would need to add the file Schema.ini in the same folder with the following contents:

    [data.txt]
    ColNameHeader=True
    Format=TabDelimited
    MaxScanRows=0

    Comment by DigitalMan | March 12, 2008 | Reply

  3. For carat delimited can use Schema.ini with:

    [data.txt]
    ColNameHeader=True
    Format=Delimited(^)
    MaxScanRows=0

    More on Schema.ini http://msdn2.microsoft.com/en-us/library/ms709353.aspx

    Comment by DigitalMan | March 12, 2008 | Reply

  4. @Dan : Yes, and thanks for pointing out the use of the schema files.

    Comment by rudesyle | March 13, 2008 | Reply

  5. I guess an alternative to using OleDb to get the CSV data would be to use the free LINQ to CSV library at
    http://www.codeproject.com/KB/linq/LINQtoCSV.aspx

    That allows you to use tab delimited data, dates in international formats, etc., without using a Schema.ini file.

    Comment by maryjames | April 12, 2008 | Reply

  6. var teams = from r in dt.AsEnumerable()
    where r.Field (2) == “NL”
    select new { City = r.Field (0), TeamName = r.Field (1) };

    Comment by Nigel Story | April 22, 2008 | Reply

  7. When trying to compile the LINQ to CSV code above (in VS 2008), I get this err msg:

    “The type arguments for method ‘System.Data.DataRowExtensions.Field(System.Data.DataRow, string)’ cannot be inferred from the usage. Try specifying the type arguments explicitly.”

    on the line:

    where r.Field(2) == “NL”

    Do you know why?

    BTW, as an obvious sports fan, you might like my site:

    http://www.knowitallsports.com

    Comment by Clay Shannon | September 17, 2008 | Reply

  8. I got it to work by changing the code to this (thanks to Joe Mayonnaise on Microsoft’s C# forum):

    var teams = (from r in dt.AsEnumerable()
    where (r.Field(2) == “NL”)
    select new { City = r.Field(0),
    TeamName = r.Field(1) });

    Joe said, “Field is a generic method and isn’t able to infer the type, so you should specify the type like this: r.Field(2).”

    Comment by Clay Shannon | September 19, 2008 | Reply

  9. Thanks for following up Clay. I haven’t had a chance to see why it didn’t work for you. But when I wrote this article, I wrote the above program first and it did run error free.

    Comment by rudesyle | September 19, 2008 | Reply

  10. Aha!

    I see in my post that the necessary “left angle bracket string right angle bracket” got stripped out. These should be placed after each instance of “r.Field” and before the “left paren integer right paren”

    So the original code as shown in the article probably had this, too, I assume…

    Comment by Clay Shannon | September 19, 2008 | Reply

  11. Can you email an example of what you mean by: “the necessary ‘left angle bracket string right angle bracket’ got stripped out. These should be placed after each instance of ‘r.Field’ and before the ‘left paren integer right paren'”

    Comment by Jason | February 10, 2009 | Reply

  12. To get this to work I had to make write my r.Fields like this:

    where r.Field(2) == “NL”
    select new { City = r.Field(0), TeamName= r.Field(1) };

    Comment by William Lingle | March 24, 2010 | Reply

  13. Last post didn’t work

    where r.Field “left angle bracket” then the word string “right angle bracket” then the (2)

    Comment by William Lingle | March 24, 2010 | Reply

  14. Hi There,

    I just got given a task to write a function that sorts and returns a datatable strictly in Linq. However got into difficulty using this row.Field(sortType): The type arguments for method ‘System.Data.DataRowExtensions.Field(System.Data.DataRow, int)’ cannot be inferred from the usage. Try specifying the type arguments explicitly.

    Here is the code: Can any one tell me what I am doing wrong?

    public enum SortDirection{ ascending=0, descending=1};

    public DataTable SortTableByLinq(DataTable tb, string sortType, SortDirection sortDirection)
    {

    switch (sortDirection)
    {
    case SortDirection.descending:
    var resultSet = from row in tb.AsEnumerable()
    orderby row.Field(sortType) descending
    select row;

    DataTable dt = new DataTable();
    dt.Rows.Add(resultSet);
    return dt;
    default:
    resultSet = from row in tb.AsEnumerable()
    orderby row.Field(sortType) ascending
    select row;

    dt = new DataTable();
    dt.Rows.Add(resultSet);
    return dt;
    break;
    }

    }

    Comment by Martin Okello | August 9, 2010 | Reply


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: