Thursday 3 January 2019

SQL pagination with hierarchical tables


         Some time ago I was in quite a pickle to paginate over a database table, which contained hierarchical data, with the requirement of pagination done on a specific level of the hierarchy. I had no ownership of the DB and had to do everything from the side of the code. I've put quite a bit of time into looking for a solution, finally a colleague has proposed a solution I'm describing here.

Scenerio:
·         No control over the database
·         Hierarchical data in a flat table
·         Specific hierarchy level should be paginated and returned (parent nodes of specific paged nodes and children only of the paged nodes)
·         Decrease the amount of data, which is sent through the wire
·         Use CTE

Table: Users

Id
Name
Level
Parent
11
Janet
1
NULL
21
Tom
1
NULL
31
Anthony
2
11
41
Jack
2
21
51
Roberta
3
31
61
Ron
3
41

In this case, if we paginate by 1 record single page we'd want to get Anthony, his parent and all his children, their children etc.

Here's one of the solution to paginating on a specific level of hierarchy:


with UsersSecondLevel as
(
select * from Users where Level = 2
), UsersSecondLevelPage as
(
select * from UsersSecodLevel
OFFSET {pageIndex * pageSize} ROWS FETCH NEXT {(pageIndex + 1) * pageSize} ROWS ONLY
), UsersSecondLevelPageWithParents as
(
select parents.* from Users parents where parents.Level = 1 and parents.Id in (select Id from UsersSecondLevelPage)
union all
select * from UsersSecondLevelPage
), UsersSecondLevelPageChildren as
(
select * from UsersSecondLevelPageWithParents
union all
Select u.* form Users u
Inner join UsersSecondLevelPageChildren c on c.Parent = u.Id and c.Level > u.Level
)
select * from UsersSecondLevelPageChildren

This can be used to get the data as a single SQL liner, for instance with the entity framework and an SQL block.
If the source of the data is a function, we may have to use a temp table as a proxy for the data to avoid repeated calls to the function.


Thursday 22 November 2018

Testing API with HTTP requests containg sizable JSON body


          Testing APIs with big size JSON can be troublesome. Sometimes the easiest way to test our API is to be able to get a pregenerated JSON from a separate file and send it to the API. Keeping JSON samples for testing in a file can also make it easier for us to segragate them and have a well maintained collection of them.

Using a text file with JSON for testing an API is not always straight forward. Here are 2 approaches I've used to use a big JSON file (~60MB) to test an API.

CURL

One option is to use CURL, which I've already described in: http://simonkatanski.blogspot.com/2018/10/testing-net-core-with-jmeter-and-curl.html
However it takes a bit of time, CURL prints the response, which also is not ideal in most situations.

JMeter

To send requests using JMeter and loading body from file we need to do the following:
  • Create a HTTP Header Manager, we add all the headers in it
  • Create a HTTP Request we set it as POST, set the Server Name, Port Number
  • In Body Data tab of the HTTP Request we enter the following:
                       ${__FileToString(C:\\JsonSamples\\big.json,,)}
  • Now we can add the listeners to view the results and we are done

It's also possible to use an incrementing parameter in the file name and load different files with different requests.

Tuesday 30 October 2018

Testing .NET Core with JMeter and Curl


          There are different ways of testing your APIs, I've recently started using 2 new to me and quite useful. Given I come from the Windows world I never had the pleasure to use Curl. Recently I've started using it specifically to test the API with big json files. All we need to do is to install it, it will run within the gitbash. Nowadays we can use HTTP-REPL to discover and test APIs from command line, however Curl can still be useful. An example of a call:

curl -X POST "http://localhost:9000/api/resource" -H "accept: text/plain" -H "header1: headerValue1" -H "header2: headerValue2" -H "Content-Type: application/json" --data-binary @file.json

I think most of the above is pretty self-explanatory.
--data-binary allows to load the file at the current folder, while maintaining the line breaks in json
--data is an option we can use instead --data-binary, it will however strip all the line breaks
We can get the file from the folder one level above our level by using @../file.json instead

Another tool is JMeter. JMeter allows you to run a plan of API tests list response times, put them on a chart, calculate averages etc. The plan can be multi request, we can add authentication against a token end point, and then do an authenticated call to our API and check the summarized timings. Perhaps it looks a bit dated, but it's a pretty powerful tool. It's also free.

To do a quick start you need to:
  • Create a Test Plan
  • Add a Thread Group to the Test Plan (TestPlan -> Add -> Threads (Users) -> Thread Group
    • In Thread Group you can change Thread properties to influence how many threads are sending requests at a time and how long the pause is between the requests. Also set Loop Count to Forver
  • Add HTTP Request to the Thread Group (Thread Group -> Add -> Sampler -> HTTP Request
    • In HTTP Request set the HTTP Method, Path should contain the API endpoint URL, Server Name should contain the ip or the host, in Parameters add the headers
  • Add Listeners to the Thread Group (Thread Group -> Add -> Listener -> Summary Report or Response Time Graph
     
Now you can click play and check the Summary Report and the Time Graph after multiple requests have been processed.


JMeter: https://jmeter.apache.org/index.html
CURL for Windows: https://curl.haxx.se/download.html

Thursday 22 March 2018

Linq and entity framework: Chaining where clauses to produce OR query. Part 2

1. Generic approach to generating LINQ OR statements

In the previous post Linq and entity framework OR chaining I've given a quick overview of how can we have a quick and dirty SQL translatable solution to filtering IQueryable collection using a list of strings.

I'd like to take this idea further and expand on it, by proposing a generic approach. This is how it's going to work:

var filterList = new List<string> { "Name1""Name3" };
var resultQuery = queryableContainers.WhereOr(
    (container, filterItem) => container.InnerContainer.Name.Contains(filterItem),
    filterList);

It will query each item in the entity collection against each item in passed list. The resulting SQL will be containing a WHERE clause with multiple OR connected conditions (as many as there are items in the filter list.

Here's there WhereOr method:

/// <summary>
///     WhereOr extension allows to chain together predicates.
///     Against a common collection with OR operator.
/// </summary>
/// <typeparam name="T">Queried item.</typeparam>
/// <typeparam name="TFilter">Filter item type.</typeparam>        
/// <param name="filterPredicate">Predicate, which will return true or false.
/// Resulting from an operation between T and TFilter parameters.</param>
/// <param name="filterItems">Filter items collection to WHERE against. 
/// The number of OR predicate calls will be dependant on the number of filter items.</param>
public static IQueryable<T> WhereOr<TTFilter>(
    this IQueryable<T> self,            
    Expression<Func<TTFilterbool>> filterPredicate,
    IEnumerable<TFilter> filterItems)
{            
    var orChainedWhereClause = self.WhereItemFiltered(
        filterPredicate,
        filterItems.First());

    foreach (var filterItem in filterItems.Skip(1))
    {
        orChainedWhereClause = orChainedWhereClause.Or(
            self.WhereItemFiltered(
                filterPredicate,
                filterItem));
    };

    return self.Where(orChainedWhereClause);
}

The method does pretty much the same, what we've done in the example from the previous post. The only difference is that the condition can be passed from the outside - the rest is generic.

First WhereItemFiltered call produces the initial expression, which the will be joined with the use of Or against other expressions based on the filterItems collection.

To complete the walkthrough, this is the WhereItemFiltered method, which creates an expression, which allows to pass two parameters into the lambda. One is the item of the queryable collection the other is parameter passed using the filter collection.

public static Expression<Func<Tbool>> WhereItemFiltered<TTFilter>(
    this IEnumerable<T> self,
    Expression<Func<TTFilterbool>> filterPredicate,
    TFilter filterItem)
{
    var queryableItemParamExpr = Expression.Parameter(typeof(T));
    var filterItemConstExpr = Expression.Constant(filterItem, typeof(TFilter));
    var predicateInvokeExpr = Expression.Invoke(filterPredicate, queryableItemParamExpr, filterItemConstExpr);
    return Expression.Lambda<Func<Tbool>>(predicateInvokeExpr, queryableItemParamExpr);
}

Here is the generated SQL:

SELECT [c].[Id], [c].[InnerContainerId]
FROM [Containers] AS [c]
LEFT JOIN [InnerContainer] AS [c.InnerContainer] ON [c].[InnerContainerId] = [c.InnerContainer].[Id]
WHERE (CHARINDEX(N'Name1', [c.InnerContainer].[Name]) > 0) 
OR (CHARINDEX(N'Name3', [c.InnerContainer].[Name]) > 0)

As you can see it's a little bit different from our previous example. That's because in the previous post we've used EndsWith and not Contains. We can use any predicate, which is translatable to SQL.

Potential issues, which I see with the approach:

  • Complex objects can't be used in the filter item collection. An approach like this:

    var filterObjList = new List<FilterObj> {
        new FilterObj { Name = "Name1" },
        new FilterObj { Name = "Name2" }
    };            
    var resultQuery = queryableContainers.WhereOr(
        (container, filterItem) => container.InnerContainer.Name.Contains(filterItem.Name),
        filterObjList);
    var result = resultQuery.ToList();
    

    Will cause a WARNING, stating that the Where part of the query will be executed locally (in memory). To make it work we'd need a rewrite of the item filtering expression. 
  • As mentioned in the previous post - filter item collection should not contain too many items.


Tuesday 20 March 2018

Linq and entity framework: Chaining where clauses to produce OR query. Part 1

1. Entity framework LINQ translation troubles

Another topic I wanted to write about was my effort to find a solution for chaining together LINQ's "Wheres" to produce a SQL with OR connected conditions against a list of values.

The need was caused by Entity Framework not being able to properly compile certain LINQ queries to SQL. For instance, imagine our database has a container table, each container references an inner container, and each inner container has a "Name" property. We want to get all the items, which contain a string from a list we're holding, like this:

var filterList = new List<string> { "Name1""Name3" };
var containers = context.Containers
    .Where(container => filterList.Any(filterName => container.InnerContainer.Name.EndsWith(filterName)))
    .ToList();

If we look at the log file however, we will see the following warning:

The LINQ expression 'where {from String filterName in __filterList_0 where ([container.InnerContainer]?.Name?.EndsWith([filterName]) == True) select [filterName] => Any()}' could not be translated and will be evaluated locally.

Entity framework will fail silently, whole Containers table will be extracted from the database and the filtering will happen in memory. This can be very surprising and in case you are not checking your EF logs, you can be dumbfounded by sudden performance issues.

Now, this is not to nitpick on EF Core. Some methods obviously don't have a corresponding SQL version and can't be translated. In this case however I was pretty sure I'll get a nice SQL statement.

2. Simple non-generic solution

After a bit of digging I decided the way to go would be to use expression trees. There were many examples in stackoverflow or msdn forums, but none was, what I've exactly needed. The first simple approach allows us to chain together passed predicates using an Or LINQ extension.

public static Expression<Func<Tbool>> Or<T>(
    this Expression<Func<Tbool>> expr1, 
    Expression<Func<Tbool>> expr2)
{
    var invokedExpr = Expression.Invoke(expr2, expr1.Parameters);
    return Expression.Lambda<Func<Tbool>>
          (Expression.OrElse(expr1.Body, invokedExpr), expr1.Parameters);
}

       The extension allows us to chain two other expressions. What's important to note - passing same parameters expressions is required for the chaining to work. Think of one lambda with multiple || inside. Each condition will be using same lambda parameters passed into it.

var filterNames = new List<string> { "Name1""Name3" };
Expression<Func<Containerbool>> whereClause = c => c.InnerContainer.Name.Contains(filterNames.First());
foreach (var filterName in filterNames.Skip(1))
{
    whereClause = whereClause.Or(c => c.InnerContainer.Name.Contains(filterName));
};  

var result = queryableContainers
    .Where(whereClause)
    .ToList();

Following SQL will be generated:

SELECT [c].[Id], [c].[InnerContainerId]
FROM [Containers] AS [c]
LEFT JOIN [InnerContainer] AS [c.InnerContainer] ON [c].[InnerContainerId] = [c.InnerContainer].[Id]
WHERE ((CHARINDEX(@__First_0, [c.InnerContainer].[Name]) > 0) 
OR (@__First_0 = N'')) 
OR ((CHARINDEX(@__filterName_1, [c.InnerContainer].[Name]) > 0) 
OR (@__filterName_1 = N''))

This is precisely what we've wanted. Unfortunately, that's quite a lot of code to write each time we want to do something like this.

It's also important to add that we should not take this approach, if we will have many of such items in the filtering collection - the SQL can grow quite big, each case should be considered separately.

In the next post I'll walk through a generic example.

Wednesday 27 December 2017

Accept header based response formatting in asp net core

To be able to return responses from controller's actions based of on the returned type dot net core uses formatters. I've cobbled together an example how boilerplate code can look. This is pretty much similar to what you can find on msdn with some extra flair.

First, I've implemented the following 2 adapters (bear in mind there's a built-in xml serializer/formatter in .net core, I've created one for the sake of an example). They perform formatting on the list of cities passed in the form of dtos.

public class XmlCityAdapter : ICityTextAdapter
{
    public AdapterOutputFormat AdapterOutputFormat => AdapterOutputFormat.Xml;

    public string GetFormatted(IEnumerable<CityDto> cities)
    {
        var citiesElement = cities.Select(c =>
            new XElement("City",
                new XAttribute("Name", c.CityName),
                new XAttribute("Population", c.Population)));
        
        var doc = new XElement("Cities", citiesElement);
        return doc.ToString();
    }
}

and

public class CsvCityAdapter : ICityTextAdapter
{
    public AdapterOutputFormat AdapterOutputFormat => AdapterOutputFormat.Csv;

    public string GetFormatted(IEnumerable<CityDto> cities)
    {
        var stringBuilder = new StringBuilder();
        foreach (var city in cities)
        {
            stringBuilder.AppendLine($"{city.CityName};{city.Population}");
        }
        return stringBuilder.ToString();
    }
}

There are many, many ways to perform such serialization. Here's the interface behind these classes:

public interface ICityTextAdapter
{
    AdapterOutputFormat AdapterOutputFormat { get; }

    string GetFormatted(IEnumerable<CityDto> cities);
}

Next step was to add some boilerplate base class for the formatters. The formatter needs to derive from one of the base classes offered by core libraries. My choice was to use the OutputFormatter.
For our response formatter to work we need to override the following methods:
  • CanWriteType method (to asses, whether the passed in type should be formatted or not)
  • WriteResponseBodyAsync to do the actual formatting
Another method which we could override is WriteResponseHeaders. It is not really needed though, because the headers can be safely expanded in the WriteResponseBodyAsync method.
The base class will implement the above, it will also use ServiceProvider, which will allow us to use the container to get our adapter classes.

Here's the base class:

public abstract class BaseFormatter<TFormattedTypeTFormatter> : OutputFormatter
{
    private const string ContentDispositionHeader = "Content-Disposition";

    protected override bool CanWriteType(Type type)
    {
        return base.CanWriteType(type) && type.IsAssignableFrom(typeof(TFormattedType));
    }
    
    public override async Task WriteResponseBodyAsync(OutputFormatterWriteContext context)
    {
        if (!(context.Object is TFormattedType contextObject))
        {
            return;
        }

        var response = context.HttpContext.Response;
        var serviceProvider = response.HttpContext.RequestServices;
        var availableAdapters = serviceProvider
            .GetServices(typeof(TFormatter))
            .Cast<TFormatter>();                

        if(availableAdapters != null)
        {
            response.Headers.Add(ContentDispositionHeader, $"attachment;filename={GetFileName(contextObject)}");
            var formattedResult = GetFormattedResult(availableAdapters, contextObject);
            await response.WriteAsync(formattedResult);
        }             
    }

    protected abstract string GetFileName(TFormattedType contextObject);

    protected abstract string GetFormattedResult(
        IEnumerable<TFormatter> availableAdapters, 
        TFormattedType contextObject);
}

TFormattedType is the type, which we will validate against to check, if we can perform the formatting. If CanWriteType returns false other methods from the class won't be called.
TFormatter is the interface, which implementations of we will resolve from the container.

Given the formatters can't get dependencies through the constructor we need to get the container through the HttpContext. We will try and resolve all the implementation of our TFormatter type from it.

Another functionality of this class is to attach the disposition header. In this case each deriving class will set its disposition header as a filename passed through the overridable method.
Last thing to note is that the use of adapters have also been delegated to the deriving classes.

An example of the csv formatter is as follows, as you can see it passes List<CityDto> as the type, which is returned by the controller's action and which will get formatted. It also passes the interface ICityTextAdapter used for resolving the adapter implementations:

public class CsvCityFormatter : BaseFormatter<List<CityDto>, ICityTextAdapter>
{
    public CsvCityFormatter()
    {
        SupportedMediaTypes.Add(MediaTypeHeaderValue.Parse("text/csv"));
    }
    
    protected override string GetFileName(List<CityDto> contextObject)
    {
        return "Cities.csv";
    }

    protected override string GetFormattedResult(
        IEnumerable<ICityTextAdapter> availableAdapters, 
        List<CityDto> contextObject)
    {
        var csvAdapter = availableAdapters.FirstOrDefault(p => p.AdapterOutputFormat == AdapterOutputFormat.Csv);
        if (csvAdapter == null)
            throw new NullReferenceException(nameof(csvAdapter));
        return csvAdapter.GetFormatted(contextObject);
    }
}

The actual implementation of the formatter adds the supported media type as well. Uses the abstract methods to:
  • pass the filename, which will be set as disposition header
  • chooses and uses the adapter
Last piece to note is the registration of the formatter in the Startup class:

services.AddMvc(options =>
{                
    options.OutputFormatters.Add(new XmlCityFormatter());
    options.OutputFormatters.Add(new CsvCityFormatter());
});

What could be further added is:

  • null/error checking
  • logging
  • selection of the adapter could be based on another generic type, instead of an enum - which would make it much more robust and solid.

The code sample will be added on github soon.

Sunday 26 November 2017

.net core and entity framework integration tests with in memory database

    Integration testing is really useful, and any tool or framework which makes it easier will be welcomed. For .net core we are given the TestServer class and the entity framework's in memory database setup. I've built a small example, with a useful base class, which I'll present in this post.

Here's the whole code sample, including a small API in .net core 2,  and integration tests prepared using xUnit:
https://github.com/simonkatanski/coreintegrationtests

In the following I'm not going into basic knowledge of Entity Framework, or the setup of .net Core API. There's plenty articles about it elsewhere. This post shouldn't also be taken as an example of good design, I'm trying to make is as small and self-contained as possible.

I'll start with showing a simple DbContext which I want to "mock" with an "in-memory" version in my tests. A small db containing some exemplary city data.

public class CityContext : DbContextICityContext
{
    public DbSet<City> Cities { getset; }
    
    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        => optionsBuilder.UseSqlite("Data Source=Cities.db");
}

It is used directly in our controller (not a good thing IRL ;)) like this:

[Route("api")]
public class CityController : Controller
{
    private readonly ICityContext _cityContext;

    public CityController(ICityContext cityContext)
    {
        _cityContext = cityContext;
    }
    
    [HttpGet("cities")]
    public IEnumerable<string> Get()
    {
        var cities = _cityContext.Cities.ToList();
        if (cities.Any())
        {
            return cities.Select(c => c.CityName);
        }
        return Enumerable.Empty<string>();
    }

    [HttpPost("cities")]
    public void Post([FromBody]City city)
    {
        _cityContext.Cities.Add(city);
        _cityContext.SaveChanges();
    }
}

As you can see it is minimal. Having created our small API we do want to create integration tests for the implemented actions. Now after creation of the integration tests project, we can create our "in-memory" version of CityContext.

public class InMemoryCityContext : CityContext
{
    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        => optionsBuilder.UseInMemoryDatabase(Guid.NewGuid().ToString());
}

We derive from the context and override the OnConfiguring method. Bear in mind this might work slightly differently, if your DbContext is configured in a different way. Now to make it possible to swap our CityContext for the InMemoryCityContext implementation we need to prepare by using appropriate container registration.

In our case (we are using .net core built-in Dependency Injection IOC framework) we will use the following registration method, it will not register another ICityContext implementation, if there's already one existing in the container:

services.TryAddScoped<ICityContextCityContext>();

This means we need to register "in-memory" version before the actual API registration takes place. Now other containers are much more robust and performing a registration overriding is much easier using them. I'm going to focus on the built-in container.

Microsoft provides the following package for integration testing: "Microsoft.AspNetCore.TestHost". It contains the TestServer class, which can use our API's Startup class and set up an in-memory API for testing.

I've built a wrapper for it, to make it easier to manage the mocked DB context.

public class ApiTestServer : IDisposable
{
    private readonly TestServer _server;
    public InMemoryCityContext CityContext { get; }

    /// <summary>
    ///     A wrapper around the TestServer, which also contains the 
    ///     EF contexts used in the API.
    /// </summary>
    public ApiTestServer()
    {
        _server = new TestServer(new WebHostBuilder()
            .UseStartup<Startup>()
            .ConfigureServices(RegisterDependencies));

        CityContext = new InMemoryCityContext();
    }

    public RequestBuilder CreateRequest(string path)
    {
        return _server.CreateRequest(path);
    }
    
    /// <summary>
    ///     Register dependencies, which differ from the ordinary setup of the API. 
    ///     For the registration here to work, you need to use the TryAdd* versions
    ///     of container registration methods.
    /// </summary>
    private void RegisterDependencies(IServiceCollection service)
    {
        service.AddSingleton<ICityContextInMemoryCityContext>(serviceProvider => CityContext);
    }

    public void Dispose()
    {
        CityContext?.Dispose();
        _server?.Dispose();
    }
}

As you can see above the class exposes the Context outside, so that we can both set our test data in the context, and validate the changes introduced by the API. It also exposes the RequestBuilder, which allows us to send http requests. ApiTestServer registers the context as a singleton, the context is later used thanks to the TryAddScoped registration of the original base context.

An example created with xUnit:

[Fact]
public async Task GivenNonEmptyDb_ThenExpectCityToBeAdded()
{
    using (var server = new ApiTestServer())
    {
        //Arrange
        server.CityContext.Add(new City { CityName = "Bristol", Population = 100000, Id = 0 });
        server.CityContext.SaveChanges();

        var cityToAdd = new City { CityName = "Berlin", Population = 100000 };
        var stringData = JsonConvert.SerializeObject(cityToAdd);
        var request = server.CreateRequest("/api/cities")
            .And(c => c.Content = new StringContent(stringData, Encoding.UTF8, "application/json"))

        //Act
        var response = await request.PostAsync();

        //Assert
        Assert.True(response.StatusCode == HttpStatusCode.OK);
        Assert.True(server.CityContext.Cities.Count() == 2);
    }
}

In the example we:

  • create our test server
  • setup the context (state of the db before the API call)
  • prepare the request
  • call the API with the request
  • assert the context state

This way, in a real-life case, by adding all our contexts into the ApiTestServer class we can prepare a complete setup for our DB and our integration tests.

UPDATE:
I've explicitly set the ID to 0 for the entity added to the context prior to the http call. Obviously that's the default value for that parameter - but in case you were using some special data generator like Fizzware's NBuilder, you might forget about setting it to 0 explicitly and struggle to find the reason why EF is not assigning a new value to the newly added entity. For the Entity Framework's primary key's integer value generator to generate a new value it has to have 0 assigned, otherwise it'll be omitted from the generation.