Getting Code Review Statistics Programmatically from TFS

This post describes how to use LINQPad to query a Team Foundation Server (TFS) instance so that code review statistics can be collected.

Context

The project I work on for my employer has an external release cycle of six months. In between releases, the developers propose ideas about how things could be improved for the following release. One of the top suggestions was more code reviews.

Our project uses TFS for source control and work item tracking. TFS supports developers requesting code reviews by creating a collection of files (changeset) and tagging other developers as reviewers. The reviewers can accept or decline said review, and if they accept they close their end by marking it as Looks Good, Needs Work, or With Comments.

Having improvement goals is all well and fine, but if there’s no way to measure whether we’re actually doing more reviews, it’s difficult to tell how big of an impact it made. I set out to come up with a script that could be run every so often that would collect statistics about the reviews during each sprint:

  1. Who created reviews, and how many?
  2. Who took part in reviews, and in how many?
  3. For each developer, how many reviews are open/closed?
  4. For each reviewer, how many are not finished?

TFS is backed by SQL Server, so it shouldn’t be too much of a surprise that you can run queries against it. (This MSDN post is a good starting point if you’re interested in what else you can do with such queries.) The remainder of this post will describe

  1. my choice of tools for this task,
  2. how to connect to and query TFS programatically,
  3. the structure TFS uses when it comes to code reviews,
  4. how to collect the data about code reviews.

Choice of tools

LINQPad is a handy tool when you have a few lines (or a few methods) of C# or VB.NET code you want to throw together to test something out. You don’t need to involve Visual Studio to create a new console app that you’ll just delete when you’re done experimenting.

As the name implies, LINQPad makes it very convenient to put together LINQ queries and have their results rendered in grids, which can in turn be collapsed, have columns summed, etc. Considering the relational nature of how TFS stores its data, LINQ was a logical choice for getting at the pieces I needed.

My implementation involved a single .linq file, that I copied to a shared location for other developers to access. (If your organization allows it, you could put it in a repository on Bitbucket or GitHub.) Simply open the file, hit F5, and watch the results pour onto the screen.

You may need to add the following references to your script (by pressing F4 in LINQPad):

  • Microsoft.TeamFoundation.Client.dll
  • Microsoft.TeamFoundation.Common.dll
  • Microsoft.TeamFoundation.VersionControl.Client.dll
  • Microsoft.TeamFoundation.WorkItemTracking.Client.dll
  • Microsoft.TeamFoundation.WorkItemTracking.Tracking.Common.dll

My particular instances of these DLLs were in C:\Program Files (x86)\Microsoft Visual Studio 12.0\Common7\IDE\ReferencesAssemblies\v2.0\.

Aside: Because of the nature of my work, I have a fairly locked-down development environment. To spare myself some typing, I don’t have a full implementation of the script to share; however, you should be able to take the code snippets from this post and run with them for your own purposes.

Connecting to TFS

Your specific connection process may vary, but here’s how I get access to the Work Item Store, which does most of the heavy lifting.

var tfsUrl = new Uri("http://someserver:8080/tfs/defaultcollection");
var server = new TfsTeamProjectCollection(tfsUrl, CredentialCache.DefaultNetworkCredentials);
_workItemStore = server.GetService<WorkItemStore>();

To help with filtering by project later on, I also hang on to the exact project instance — assuming the title of my project is “Awesome App”.

_project = _workItemStore.Projects.Cast<Project>().Where(p => p.Name == "Awesome App").FirstOrDefault();

I have to use the Cast extension method because Projects is of type ProjectCollection, which is not a generic type, thus I don’t get my usual LINQ extension methods like Where.

Running a query

I’m interested in breaking down the number of reviews for each two-week sprint, so I need to get the names for each sprint first. (In our instance of TFS, the sprint name is under Iteration Path.)

var sprints = _workItemStore.Query(String.Format("Select * from WorkItems where [System.TeamProject] = '{0}' and [System.WorkItemType] = 'Sprint'", _project.Name))
    .OfType<WorkItem>()
    .Select(r => new
    {
        r.Id,
        r.IterationPath,
        Fields = r.Fields.OfType<Field>().Select(x => new { x.Name, x.Value })
    })
    .Select(r => new
    {
        r.Id,
        r.IterationPath,
        StartDate = r.Fields.First(x => x.Name == "Start Date").Value,
        EndDate = r.Fields.First(x => x.Name == "Finish Date").Value
    })
    .Where(r => r.StartDate != null && r.EndDate != null)
    .Select(r => new
    {
        r.Id, 
        r.IterationPath, 
        StartDate = DateTime.Parse(r.StartDate.ToString()),
        EndDate = DateTime.Parse(r.EndDate.ToString())
     });

Note that there’s a Fields property on the work item that has the start and end dates I need for the sprints.

The iteration path also contains the named release as well, and I only want to get sprint names that apply for release “v12.1” that are either in progress or in the past.

return sprints.Where(r => r.IterationPath.Contains("v12.1") && (r.EndDate <= DateTime.Now || r.StartDate <= DateTime.Now)) .Select(r => r.IterationPath)
              .ToList();

By this point, I know the names of each sprint so that I can get just the code review work items that belong to a particular sprint. Now let’s look at how code reviews are stored in TFS.

How code reviews work in TFS

The primary class used when dealing with TFS data is WorkItem; work items can be tasks, bugs, etc. When it comes to code reviews, there are two types of work items:

  • Code Review Request
  • Code Review Response

When I ask for a review, TFS creates a Code Review Request, and then behind the scenes creates one Code Review Response for every person I’ve asked to review my work. So if I tag Alice, Bob, and Charlie on my review, there are four work items created for me: one request plus three responses.

The glue is something called a Related Link. The parent review relates to the children responses and vice versa. This simplifies things a bit because once I have the review, I can easily determine the response work items based on their IDs.

Collecting the data

I need a helper method that gets me all the requests for the iterations.

private List<WorkItem> GetCodeReviewRequestsForIteration(string iteration)
{
    var query = String.Format(@"
        select *
        from WorkItems
        where
        [System.IterationPath] under '{0}'
        and [System.WorkItemType] = 'Code Review Request'", iteration);

    return _workItemStore.Query(query).OfType<WorkItem>().Select(r => r).ToList();
}

Showing code review initiators

Here’s how I can use LINQ to group the code reviews by developer, get the total number of reviews, and then display it to the screen using the Dump() method.

private void ShowCodeReviewInitiators(List<WorkItem> codeReviewRequest)
{
    var creators = codeReviewRequests.Select(r => r.CreatedBy).GroupBy(r => r);
    var reviewCounts = new Dictionary<string, int>();
    foreach (var creatorGroup in creators)
        reviewCounts[creatorGroup.Key] = creatorGroup.Count();

    reivewCounts.OrderByDescending(x => x.Value).Dump();
}

Here’s some example output:

review-creator-count

Showing review states

To simplify things, I only care about the state of the review and who created it. To make the code a little cleaner, I created a class with two properties.

public class MemberWithState
{
    public string Member { get; set; }
    public string State { get; set; }
}

Next, I get the creator and state for all the reviews, group them, and then count up the states of the reviews.

private void ShowCodeReviewRequestStates(List<WorkItem> codeReviewRequests)
{
    var collatedStates = new Dictionary<string, string>();
    var groupedCreators = codeReviewRequests
                             .Select(r => new MemberWithState { Member = r.CreatedBy, State = r.State })
                             .GroupBy(g => g.Member);
    foreach (var creatorGroup in groupedCreators) 
    {
        var numRequested = creatorGroup.Count(g => g.State == "Requested");
        var numAccepted = creatorGroup.Count(g => g.State == "Accepted");
        var numClosed = creatorGroup.Count(g => g.State == "Closed");

        collatedStates[creatorGroup.Key] = String.Format("Still open: {0} | Accepted: {1} | Closed: {2}",
            numRequested, numAccepted, numClosed);
    }

    collatedStates.Dump();
}

The output could look something like this:

review-state

Showing code review initiators

As mentioned previously, the review responses are their own work items that are linked to the parent review. The WorkItem class as a Links property (which is a non-generic LinkCollection). In this case, we only care about RelatedLink instances (as opposed to external links).

var relatedLinks = codeReviewRequest.Links.OfType<RelatedLink>().Select(x => x);
var relatedLinkIds = relatedLinks.Cast<RelatedLink>().Select(x => x.RelatedWorkItemId);

Now here’s where things get a little hack-y. There is an AssignedTo field in the WorkItem.Fields property. However, if the reviewer hasn’t accepted the review, that field is empty. Fortunately there’s another mechanism we can use to find out who the reviewers are: TFS creates the response work items with a title that has the name of the reviewer appended to it inside parentheses. For example, the review titled “Added business logic” can have a review response named “Added business logic (Alice Smith)”.

Here’s the code that finds out who the reviews belong to. (So that I don’t have to re-query, I go ahead and grab the response states while I’m looking at the related work items.)

var reviewerStates = new List<MemberWithState>();
foreach (var relatedLinkId in relatedLinkIds)
{
    var responseWorkItem = GetWorkItemById(linkedId);
    var indexOfParen = responseWorkItem.Title.LastIndexOf('(');
    var memberName = responseWorkItem.Title.Substring(indexOfParen + 1).Trim(')');
    reviewerStates.Add(new MemberWithState { Member = memberName, State = responseWorkItem.State });
}

I wrote a helper function to get a work item by ID, which basically does the following: select * from WorkItems where [System.Id] = {0}.

Showing review request states

The same process for showing review states works for review requests as well — group them, and then count up the states of the requests. (Because the code is so similar to showing review states as above, I’ve omitted the code here.)

Wrapping up

By now you should have a better understanding of how to create a LINQPad query (or at least some .NET code) to collect code review information from a TFS instance. There’s a wealth of information in a well-populated TFS system, and the API for querying it is fairly straightforward. LINQ really helps tie everything together so you can aggregate whatever values you hope to analyze.

Caveat: Because of the deep object hierarchy of WorkItem — and also possible circular references with other WorkItems — I don’t recommend calling .Dump() on a particular WorkItem instance. (I ended up getting an OutOfMemoryException in LINQPad.) To be honest, I did end up creating a throwaway project in Visual Studio so that I could use the Locals window to inspect objects while stopped at a breakpoint. I used both tools in tandem to figure out where I needed to go.

Now that this script is in place for my project, I look forward to seeing if

  1. code reviews become more commonplace and frequent, and
  2. there’s an inverse correlation between code reviews and number of defects reported.

8 Comments

Comments are closed.