Getting Started with Solr & Carrot2 Clustering

October 8, 2013

Every time I start a new Solr project I have to remind myself of the basics again, such as setting up and configuring Solr cores, updating indexes and plugging in projects such as Carrot2’s clustering engine, so I decided to write this post as a reminder of how to perform the main operations you might encounter when playing around with a Solr engine.

Read more →

Keyword Stemming and Lemmatisation with Apache Solr

August 2, 2013

I started working recently with Apache Solr, and I am hugely impressed; the search technology is very solid and packs many IR, advanced search and NLP features out of the box.

In this post I will provide an overview of how to set up keyword stemming on a field in your Solr core. A stemming filter essentially expands the input Solr search term so that the results include documents containing other words that share the stem of the original search term, in addition to the search term itself.
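As a rough illustration of the idea (not Solr's actual implementation), the sketch below uses a hypothetical `toy_stem` function that strips a few English suffixes, and a hypothetical `search` helper that matches documents on stems rather than exact words. Real stemmers handle far more cases than this.

```python
def toy_stem(word: str) -> str:
    """Strip a few common English suffixes (a toy stand-in for a real stemmer)."""
    word = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters of the word remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def search(query: str, documents: list[str]) -> list[str]:
    """Return documents containing any word that shares the query's stem."""
    query_stem = toy_stem(query)
    return [
        doc
        for doc in documents
        if any(toy_stem(token) == query_stem for token in doc.split())
    ]


documents = ["walking the dog", "he walked home", "a long walk", "talk radio"]
matches = search("walks", documents)
print(matches)
```

Here a search for "walks" also matches documents containing "walking", "walked" and "walk", because all four words reduce to the same stem under this toy rule.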

Read more →

In-Memory (Memory Optimized) Tables in SQL Server 2014

July 12, 2013

In-Memory storage technology finally makes its debut on SQL Server 2014’s BI stack, with the introduction of proper memory-optimized tables and stored procedures, unlike the Columnstore feature, which offers a read-only memory-optimized solution that does not work overly well in a true transactional environment.

In this post I hope to dissect the new In-Memory tables feature of SQL Server 2014, providing an overview of how the technology works, how to create and maintain in-memory tables, and any pitfalls to watch out for. Mainly though, I am writing this as a reminder to myself of the latest articles I have been reading about this cool new feature.

Read more →

Connecting SQL Server and Analysis Services to Hadoop Hive

July 9, 2013

Hadoop is a pretty neat set of tools for processing loads of data in a distributed, parallel and easy to scale-out manner, so the Hadoop toolset rightfully holds a pretty high position in the data analysis and BI game, and is a must-consider when embarking on any new big data project. That being said, the Hadoop ecosystem, however advanced in many areas, is still some way from being a complete end-to-end BI solution, particularly when it comes to supporting emerging data analysis and business intelligence concepts, such as exploratory data analysis and real-time data querying, or even fully integrated data visualization and report authoring tools.

Read more →

Visual Studio 2013 Preview is Available for Download

June 29, 2013

Well, it’s Preview release week at Microsoft, and the new and shiny Visual Studio 2013 is now available for download in multiple editions and with additional VS13 tools.

Read more →

8 Ways to Optimize and Improve Performance of your SSIS Package

June 28, 2013

The title should actually read “8 Random Ways to Optimise SSIS”.

One of the recent projects I have been working on involved building a distributed (scaled-out) SSIS environment; this means multiple VMs, each with a standalone SSIS instance (2012 in Package mode) installed (so no SQL Server Database Engine), all pushing massive amounts of data to a staging database.

I have been brought in on this project to suggest a few techniques to improve the performance of the scaled-out SSIS environment, basically by increasing the throughput to the staging database. Below I discuss some of the general approaches I have taken to achieve that goal. Some of the advice might be a bit random, and some might not be pertinent to your particular situation, but overall you should find a gem or two on optimising SSIS performance in there somewhere!

Read more →

SQL Server 2014 CTP is Out!

June 25, 2013

Image by James Shearer

It’s an RDBMS, no it’s a ColumnStore, no it’s a cloud-integrated storage platform… Oh, it’s actually SQL Server 2014!

Exciting news, as the new (project code-name: Hekaton) SQL Server 2014 community technology preview (CTP1) is out and can be downloaded and evaluated by the community.

Read more →

Altering Calculations for a Deployed (Live) SSAS Cube

June 18, 2013

This is a pretty simple post to show how to alter (add, remove or edit) a calculated field in an SSAS cube without redeploying the whole project, a useful technique if you do not have the SSAS cube project handy or wish to quickly implement changes on a live cube.

Read more →

Diagnosing Kerberos Delegation Issues on SQL Server, SharePoint, SSRS and SSAS

June 1, 2013

Until now, I have found working with Kerberos when setting up a SQL Server stack to be a completely nightmarish experience, mainly for two reasons:

  • Working with Kerberos usually requires the account setting up this authentication protocol on the stack to have access rights to Active Directory, in order to effectively diagnose the setup, configure the Service Principal Names (SPN) for the various SQL Server and SharePoint service accounts, and set up delegation. This means SQL Server architects and Network Administrators need to collaborate in order to correctly configure the stack, which is often an unpleasant and long-winded experience of trial and error.
  • The lack of centralized diagnostic and configuration tools for Kerberos setup on SQL Server makes this task very tedious, particularly if you follow the limited number of online resources out there to set up Kerberos and find that they do not apply exactly to your situation, or do not work exactly as intended after following the lengthy steps, leaving you with very limited options for diagnosing exactly what went wrong.

Read more →

4 Ways to Visualize Geographical (Location) Data in Excel 2013

May 31, 2013

Excel 2013 brings an array of new and exciting features to the fingertips of data analysts, ranging from a shiny new visualization and exploratory data analysis platform (PowerView) to a number of new pivoting features, as well as a powerful in-memory data modelling engine (PowerPivot) enabled by default.

Among all these features, the new Excel delivers a few different options for visualizing geographical and location-based data, each serving a different purpose (with a specific set of features) or targeting a particular segment of the overall Excel user base. This is a short post introducing some techniques for visualizing geo information in Excel.

Read more →