Saturday, June 19, 2010

blog moving to 42sight.blogspot.com

moving the blog to 42sight.blogspot.com - where we continue the Data warehousing related discussions and our upcoming book
http://42sight.blogspot.com/

Wednesday, March 17, 2010

Experimenting with Recursive Common Table Expressions (CTE)

Having read Johnathon Parker's excellent post on how Recursive CTEs work in SQL Server
http://jonathanparker.com.au/archive/2007/04/16/recursive-common-table-expressions-ctes-in-sql-server-2005.aspx

We decided to try it out on something more complicated

The challenge we gave oursleves was to take the "SalesQuota" table which is in quartlery buckets and split it into monthly Buckets.

This can be acheived with 3 queries in Union as follows:

SELECT SalesPersonID, QuotaDate, SalesQuota / 3 AS SalesQuota

FROM Sales.SalesPersonQuotaHistory

UNION ALL


SELECT SalesPersonID, DATEADD(m, 1, QuotaDate) AS QuotaDate, SalesQuota / 3 AS SalesQuota
FROM Sales.SalesPersonQuotaHistory AS SalesPersonQuotaHistory_1


UNION ALL


SELECT SalesPersonID, DATEADD(m, 2, QuotaDate) AS QuotaDate, SalesQuota / 3 AS SalesQuota
FROM Sales.SalesPersonQuotaHistory AS SalesPersonQuotaHistory_2

An alternative Solution using Recursive CTE is as follows:

WITH SPQ AS (SELECT SalesPersonID, QuotaDate, SalesQuota / 3 AS SalesQuota , 1 as c1

FROM Sales.SalesPersonQuotaHistory


UNION ALL


SELECT SalesPersonID, DATEADD(m, 1 , QuotaDate) AS QuotaDate, SalesQuota, c1+1 as c1
FROM SPQ
WHERE c1<3)

SELECT SalesPersonID, QuotaDate, SalesQuota
from SPQ
ORDER BY SalesPersonID, QuotaDate

This repeats the second part twice in a recursive manner - providing the same results as the 1st triple union query.

Tuesday, November 17, 2009

Microsoft SQL - Update a Table with results from a query.

i.e. update with join to a derived table.

Scenario - You have a table (TableA) where you would like to keep track of, among other statistics and information, the number of records for each Entity from another table (DataTable). This requires a method to Update one table while joined to the results of query on another table (a derived table).

The join is by EntityID.

This is how you could do it in Microsoft SQL using a T-SQL statement.

UPDATE TableA
SET TableA.RecordCount = RecCount
FROM TableA,
(SELECT EntityID, Count(EntityID) as RecCount
From DataTable Group by EntityID) as DT
where TableA.EntityID = DT.EntityID

Sunday, November 2, 2008

Adventure works 2005- faulty report samples - fixes/solution

Solution for faulty Product line sales:

The Product line sales report has currently for "top employees" the following incorrect query in the dataset TopEmployees:

SELECT TOP 5 C.LastName, C.FirstName, E.EmployeeID, SUM(SOH.SubTotal) AS SaleAmount
FROM Sales.SalesPerson SP INNER JOIN HumanResources.Employee E ON SP.SalesPersonID = E.EmployeeID INNER JOIN Person.Contact C ON E.ContactID = C.ContactID INNER JOIN Sales.SalesOrderHeader SOH ON SP.SalesPersonID = SOH.SalesPersonID INNER JOIN Sales.SalesOrderDetail SOD ON SOH.SalesOrderID = SOD.SalesOrderID INNER JOIN Production.Product P ON SOD.ProductID = P.ProductID INNER JOIN Production.ProductSubcategory PS ON P.ProductSubcategoryID = PS.ProductSubcategoryID INNER JOIN Production.ProductCategory PC ON PS.ProductCategoryID = PC.ProductCategoryID
WHERE (PC.ProductCategoryID = @ProductCategory) AND (PS.ProductSubcategoryID IN (@ProductSubcategory)) AND (SOH.OrderDate > @StartDate) AND (SOH.OrderDate < @EndDate)
GROUP BY C.LastName, C.FirstName, E.EmployeeID, PC.ProductCategoryID, PS.ProductSubcategoryID
ORDER BY SUM(SOH.SubTotal) DESC

The problem is that it is adding up the Sales Order header - for every line on the order - so an order with 10 lines has the Order total multiplied 10 times. The "SUM(SOH.SubTotal)" bits need to be substituted with SUM(SOD.LineTotal).

So the top 5 report should use the following query:
SELECT TOP (100) PERCENT C.LastName, C.FirstName, E.EmployeeID, SUM(SOD.LineTotal) AS SaleAmount
FROM Sales.SalesPerson AS SP INNER JOIN HumanResources.Employee AS E ON SP.SalesPersonID = E.EmployeeID INNER JOIN Person.Contact AS C ON E.ContactID = C.ContactID INNER JOIN Sales.SalesOrderHeader AS SOH ON SP.SalesPersonID = SOH.SalesPersonID INNER JOIN Sales.SalesOrderDetail AS SOD ON SOH.SalesOrderID = SOD.SalesOrderID INNER JOIN Production.Product AS P ON SOD.ProductID = P.ProductID INNER JOIN Production.ProductSubcategory AS PS ON P.ProductSubcategoryID = PS.ProductSubcategoryID INNER JOIN Production.ProductCategory AS PC ON PS.ProductCategoryID = PC.ProductCategoryID
WHERE (PC.ProductCategoryID = @ProductCategory) AND (PS.ProductSubcategoryID IN (@ProductSubcategory)) AND (SOH.OrderDate > @StartDate) AND (SOH.OrderDate < @EndDate)
GROUP BY C.LastName, C.FirstName, E.EmployeeID, PC.ProductCategoryID, PS.ProductSubcategoryID
ORDER BY SUM(SOD.LineTotal) DESC

For the dataset TopCustomers there is an identical mistake
SELECT TOP 5 S.Name AS StoreName, SUM(SOH.SubTotal) AS SaleAmount, PS.ProductSubcategoryID,nPS.ProductCategoryID
FROM Production.Product P INNER JOIN Production.ProductSubcategory PS ON P.ProductSubcategoryID = PS.ProductSubcategoryID INNER JOIN Sales.SalesOrderDetail SOD ON P.ProductID = SOD.ProductID INNER JOIN Sales.Customer CU INNER JOIN Sales.SalesOrderHeader SOH ON CU.CustomerID = SOH.CustomerID INNER JOIN Sales.Store S ON CU.CustomerID = S.CustomerID ON SOD.SalesOrderID = SOH.SalesOrderID
WHERE (SOH.OrderDate > @StartDate) AND (SOH.OrderDate < @EndDate) AND (PS.ProductCategoryID = @ProductCategory) AND (PS.ProductSubcategoryID IN (@ProductSubcategory))
GROUP BY S.Name, PS.ProductSubcategoryID, PS.ProductCategoryID
ORDER BY SUM(SOH.SubTotal) DESC

The correct query is:
SELECT TOP 5 S.Name AS StoreName, SUM(SOD.LineTotal) AS SaleAmount, PS.ProductSubcategoryID,nPS.ProductCategoryID
FROM Production.Product P INNER JOIN Production.ProductSubcategory PS ON P.ProductSubcategoryID = PS.ProductSubcategoryID INNER JOIN Sales.SalesOrderDetail SOD ON P.ProductID = SOD.ProductID INNER JOIN Sales.Customer CU INNER JOIN Sales.SalesOrderHeader SOH ON CU.CustomerID = SOH.CustomerID INNER JOIN Sales.Store S ON CU.CustomerID = S.CustomerID ON SOD.SalesOrderID = SOH.SalesOrderID

WHERE (SOH.OrderDate > @StartDate) AND (SOH.OrderDate < @EndDate) AND (PS.ProductCategoryID = @ProductCategory) AND (PS.ProductSubcategoryID IN (@ProductSubcategory)) GROUP BY S.Name, PS.ProductSubcategoryID, PS.ProductCategoryIDORDER BY SUM(SOD.LineTotal) DESC

I think these work :)

As for the sales territory drilldown the only fix I can see is to do an update query of the Sales Order Header file of the correct line totals from the Sales Order Detail files.

Friday, October 31, 2008

Problems we are having with Adventure works

Product Line Sales:

Top 5 Customers - sales are inflated significantly. Adventure works sales in total (from the Sales order header) are approx $97M

Do a report (Bikes, road bikes, Calendar year 2003) and the Top 5 Sales people sold over $158 millions - does anyone notice this as a problem ? Just a slight reconciliation difference I would think

Or is this report deliberately broken so people can fix it?


Now my next favourite report:

Sales Territory Drill down

The Sales by Order by Header does agree to the sales order total from the 'Drill down' - hmm housten we have a problem

Australia , Lynn Tsoflias Sales order SO51150 $40,274.01 =/= the drilldown $26,328 (after discount of $1,996)

In fact the Adventure works database is broken - Sales order Header subtotal =/= sum(Sales Order Detail . line total) - total discrepancy a cool $17 mill or so. A redememing fact is that the dbo.FactResellerSales table does indeed have the 'correct' sales number.(It reconciles to the Sales order Detail)

ALSO this report is missing sales made by managers - or dont managers count? (they are also missing from the view Sales.vSalesPerson in the

=======================================
So is there something I am completely missing?

Seriously though I am using Adventure works for SQL server 2005 - are these problems that are fixed in the Adventure works 2008? (havent installed yet as I dont have SQL 2008 yet)

Please feel free to contribute to the discussion

Tuesday, September 2, 2008

Our aim of this Blog

Once we have published our work we aim to provide support and answers to all your queries regarding the use of our model on Microsoft SQL Server and reporting in Microsoft Excel and we will also provide more technical details behind our methods for the curious.

In the meantime we are throwing up some topics for discussion that we are passionate about in our journey of building and designing a data warehouse

Computing Efficiency vs Economic Efficiency

Is there still any point to building “Efficient” computer models. Today’s computers are infinitely more powerful than when many of the design guidelines were set.

Which is better – an inefficient computer model that is easy to understand, implement and use or an efficient computer model that is cumbersome, complicated and takes an inordinate amount of time to implement?

I.E. Computing Efficiency vs Economic Efficiency

This is Jane's ask the Experts Question for Tech'ed this year