moving the blog to 42sight.blogspot.com - where we continue the Data warehousing related discussions and our upcoming book
http://42sight.blogspot.com/
Saturday, June 19, 2010
Wednesday, March 17, 2010
Experimenting with Recursive Common Table Expressions (CTE)
Having read Johnathon Parker's excellent post on how Recursive CTEs work in SQL Server
http://jonathanparker.com.au/archive/2007/04/16/recursive-common-table-expressions-ctes-in-sql-server-2005.aspx
We decided to try it out on something more complicated
The challenge we gave oursleves was to take the "SalesQuota" table which is in quartlery buckets and split it into monthly Buckets.
This can be acheived with 3 queries in Union as follows:
SELECT SalesPersonID, QuotaDate, SalesQuota / 3 AS SalesQuota
FROM Sales.SalesPersonQuotaHistory
UNION ALL
SELECT SalesPersonID, DATEADD(m, 1, QuotaDate) AS QuotaDate, SalesQuota / 3 AS SalesQuota
FROM Sales.SalesPersonQuotaHistory AS SalesPersonQuotaHistory_1
UNION ALL
SELECT SalesPersonID, DATEADD(m, 2, QuotaDate) AS QuotaDate, SalesQuota / 3 AS SalesQuota
FROM Sales.SalesPersonQuotaHistory AS SalesPersonQuotaHistory_2
An alternative Solution using Recursive CTE is as follows:
WITH SPQ AS (SELECT SalesPersonID, QuotaDate, SalesQuota / 3 AS SalesQuota , 1 as c1
FROM Sales.SalesPersonQuotaHistory
UNION ALL
SELECT SalesPersonID, DATEADD(m, 1 , QuotaDate) AS QuotaDate, SalesQuota, c1+1 as c1
FROM SPQ
WHERE c1<3)
SELECT SalesPersonID, QuotaDate, SalesQuota
from SPQ
ORDER BY SalesPersonID, QuotaDate
This repeats the second part twice in a recursive manner - providing the same results as the 1st triple union query.
http://jonathanparker.com.au/archive/2007/04/16/recursive-common-table-expressions-ctes-in-sql-server-2005.aspx
We decided to try it out on something more complicated
The challenge we gave oursleves was to take the "SalesQuota" table which is in quartlery buckets and split it into monthly Buckets.
This can be acheived with 3 queries in Union as follows:
SELECT SalesPersonID, QuotaDate, SalesQuota / 3 AS SalesQuota
FROM Sales.SalesPersonQuotaHistory
UNION ALL
SELECT SalesPersonID, DATEADD(m, 1, QuotaDate) AS QuotaDate, SalesQuota / 3 AS SalesQuota
FROM Sales.SalesPersonQuotaHistory AS SalesPersonQuotaHistory_1
UNION ALL
SELECT SalesPersonID, DATEADD(m, 2, QuotaDate) AS QuotaDate, SalesQuota / 3 AS SalesQuota
FROM Sales.SalesPersonQuotaHistory AS SalesPersonQuotaHistory_2
An alternative Solution using Recursive CTE is as follows:
WITH SPQ AS (SELECT SalesPersonID, QuotaDate, SalesQuota / 3 AS SalesQuota , 1 as c1
FROM Sales.SalesPersonQuotaHistory
UNION ALL
SELECT SalesPersonID, DATEADD(m, 1 , QuotaDate) AS QuotaDate, SalesQuota, c1+1 as c1
FROM SPQ
WHERE c1<3)
SELECT SalesPersonID, QuotaDate, SalesQuota
from SPQ
ORDER BY SalesPersonID, QuotaDate
This repeats the second part twice in a recursive manner - providing the same results as the 1st triple union query.
Tuesday, November 17, 2009
Microsoft SQL - Update a Table with results from a query.
i.e. update with join to a derived table.
Scenario - You have a table (TableA) where you would like to keep track of, among other statistics and information, the number of records for each Entity from another table (DataTable). This requires a method to Update one table while joined to the results of query on another table (a derived table).
The join is by EntityID.
This is how you could do it in Microsoft SQL using a T-SQL statement.
UPDATE TableA
SET TableA.RecordCount = RecCount
FROM TableA,
(SELECT EntityID, Count(EntityID) as RecCount
From DataTable Group by EntityID) as DT
where TableA.EntityID = DT.EntityID
Scenario - You have a table (TableA) where you would like to keep track of, among other statistics and information, the number of records for each Entity from another table (DataTable). This requires a method to Update one table while joined to the results of query on another table (a derived table).
The join is by EntityID.
This is how you could do it in Microsoft SQL using a T-SQL statement.
UPDATE TableA
SET TableA.RecordCount = RecCount
FROM TableA,
(SELECT EntityID, Count(EntityID) as RecCount
From DataTable Group by EntityID) as DT
where TableA.EntityID = DT.EntityID
Sunday, November 2, 2008
Adventure works 2005- faulty report samples - fixes/solution
Solution for faulty Product line sales:
The Product line sales report has currently for "top employees" the following incorrect query in the dataset TopEmployees:
SELECT TOP 5 C.LastName, C.FirstName, E.EmployeeID, SUM(SOH.SubTotal) AS SaleAmount
FROM Sales.SalesPerson SP INNER JOIN HumanResources.Employee E ON SP.SalesPersonID = E.EmployeeID INNER JOIN Person.Contact C ON E.ContactID = C.ContactID INNER JOIN Sales.SalesOrderHeader SOH ON SP.SalesPersonID = SOH.SalesPersonID INNER JOIN Sales.SalesOrderDetail SOD ON SOH.SalesOrderID = SOD.SalesOrderID INNER JOIN Production.Product P ON SOD.ProductID = P.ProductID INNER JOIN Production.ProductSubcategory PS ON P.ProductSubcategoryID = PS.ProductSubcategoryID INNER JOIN Production.ProductCategory PC ON PS.ProductCategoryID = PC.ProductCategoryID
WHERE (PC.ProductCategoryID = @ProductCategory) AND (PS.ProductSubcategoryID IN (@ProductSubcategory)) AND (SOH.OrderDate > @StartDate) AND (SOH.OrderDate < @EndDate)
GROUP BY C.LastName, C.FirstName, E.EmployeeID, PC.ProductCategoryID, PS.ProductSubcategoryID
ORDER BY SUM(SOH.SubTotal) DESC
The problem is that it is adding up the Sales Order header - for every line on the order - so an order with 10 lines has the Order total multiplied 10 times. The "SUM(SOH.SubTotal)" bits need to be substituted with SUM(SOD.LineTotal).
So the top 5 report should use the following query:
SELECT TOP (100) PERCENT C.LastName, C.FirstName, E.EmployeeID, SUM(SOD.LineTotal) AS SaleAmount
FROM Sales.SalesPerson AS SP INNER JOIN HumanResources.Employee AS E ON SP.SalesPersonID = E.EmployeeID INNER JOIN Person.Contact AS C ON E.ContactID = C.ContactID INNER JOIN Sales.SalesOrderHeader AS SOH ON SP.SalesPersonID = SOH.SalesPersonID INNER JOIN Sales.SalesOrderDetail AS SOD ON SOH.SalesOrderID = SOD.SalesOrderID INNER JOIN Production.Product AS P ON SOD.ProductID = P.ProductID INNER JOIN Production.ProductSubcategory AS PS ON P.ProductSubcategoryID = PS.ProductSubcategoryID INNER JOIN Production.ProductCategory AS PC ON PS.ProductCategoryID = PC.ProductCategoryID
WHERE (PC.ProductCategoryID = @ProductCategory) AND (PS.ProductSubcategoryID IN (@ProductSubcategory)) AND (SOH.OrderDate > @StartDate) AND (SOH.OrderDate < @EndDate)
GROUP BY C.LastName, C.FirstName, E.EmployeeID, PC.ProductCategoryID, PS.ProductSubcategoryID
ORDER BY SUM(SOD.LineTotal) DESC
For the dataset TopCustomers there is an identical mistake
SELECT TOP 5 S.Name AS StoreName, SUM(SOH.SubTotal) AS SaleAmount, PS.ProductSubcategoryID,nPS.ProductCategoryID
FROM Production.Product P INNER JOIN Production.ProductSubcategory PS ON P.ProductSubcategoryID = PS.ProductSubcategoryID INNER JOIN Sales.SalesOrderDetail SOD ON P.ProductID = SOD.ProductID INNER JOIN Sales.Customer CU INNER JOIN Sales.SalesOrderHeader SOH ON CU.CustomerID = SOH.CustomerID INNER JOIN Sales.Store S ON CU.CustomerID = S.CustomerID ON SOD.SalesOrderID = SOH.SalesOrderID
WHERE (SOH.OrderDate > @StartDate) AND (SOH.OrderDate < @EndDate) AND (PS.ProductCategoryID = @ProductCategory) AND (PS.ProductSubcategoryID IN (@ProductSubcategory))
GROUP BY S.Name, PS.ProductSubcategoryID, PS.ProductCategoryID
ORDER BY SUM(SOH.SubTotal) DESC
The correct query is:
SELECT TOP 5 S.Name AS StoreName, SUM(SOD.LineTotal) AS SaleAmount, PS.ProductSubcategoryID,nPS.ProductCategoryID
FROM Production.Product P INNER JOIN Production.ProductSubcategory PS ON P.ProductSubcategoryID = PS.ProductSubcategoryID INNER JOIN Sales.SalesOrderDetail SOD ON P.ProductID = SOD.ProductID INNER JOIN Sales.Customer CU INNER JOIN Sales.SalesOrderHeader SOH ON CU.CustomerID = SOH.CustomerID INNER JOIN Sales.Store S ON CU.CustomerID = S.CustomerID ON SOD.SalesOrderID = SOH.SalesOrderID
WHERE (SOH.OrderDate > @StartDate) AND (SOH.OrderDate < @EndDate) AND (PS.ProductCategoryID = @ProductCategory) AND (PS.ProductSubcategoryID IN (@ProductSubcategory)) GROUP BY S.Name, PS.ProductSubcategoryID, PS.ProductCategoryIDORDER BY SUM(SOD.LineTotal) DESC
I think these work :)
As for the sales territory drilldown the only fix I can see is to do an update query of the Sales Order Header file of the correct line totals from the Sales Order Detail files.
The Product line sales report has currently for "top employees" the following incorrect query in the dataset TopEmployees:
SELECT TOP 5 C.LastName, C.FirstName, E.EmployeeID, SUM(SOH.SubTotal) AS SaleAmount
FROM Sales.SalesPerson SP INNER JOIN HumanResources.Employee E ON SP.SalesPersonID = E.EmployeeID INNER JOIN Person.Contact C ON E.ContactID = C.ContactID INNER JOIN Sales.SalesOrderHeader SOH ON SP.SalesPersonID = SOH.SalesPersonID INNER JOIN Sales.SalesOrderDetail SOD ON SOH.SalesOrderID = SOD.SalesOrderID INNER JOIN Production.Product P ON SOD.ProductID = P.ProductID INNER JOIN Production.ProductSubcategory PS ON P.ProductSubcategoryID = PS.ProductSubcategoryID INNER JOIN Production.ProductCategory PC ON PS.ProductCategoryID = PC.ProductCategoryID
WHERE (PC.ProductCategoryID = @ProductCategory) AND (PS.ProductSubcategoryID IN (@ProductSubcategory)) AND (SOH.OrderDate > @StartDate) AND (SOH.OrderDate < @EndDate)
GROUP BY C.LastName, C.FirstName, E.EmployeeID, PC.ProductCategoryID, PS.ProductSubcategoryID
ORDER BY SUM(SOH.SubTotal) DESC
The problem is that it is adding up the Sales Order header - for every line on the order - so an order with 10 lines has the Order total multiplied 10 times. The "SUM(SOH.SubTotal)" bits need to be substituted with SUM(SOD.LineTotal).
So the top 5 report should use the following query:
SELECT TOP (100) PERCENT C.LastName, C.FirstName, E.EmployeeID, SUM(SOD.LineTotal) AS SaleAmount
FROM Sales.SalesPerson AS SP INNER JOIN HumanResources.Employee AS E ON SP.SalesPersonID = E.EmployeeID INNER JOIN Person.Contact AS C ON E.ContactID = C.ContactID INNER JOIN Sales.SalesOrderHeader AS SOH ON SP.SalesPersonID = SOH.SalesPersonID INNER JOIN Sales.SalesOrderDetail AS SOD ON SOH.SalesOrderID = SOD.SalesOrderID INNER JOIN Production.Product AS P ON SOD.ProductID = P.ProductID INNER JOIN Production.ProductSubcategory AS PS ON P.ProductSubcategoryID = PS.ProductSubcategoryID INNER JOIN Production.ProductCategory AS PC ON PS.ProductCategoryID = PC.ProductCategoryID
WHERE (PC.ProductCategoryID = @ProductCategory) AND (PS.ProductSubcategoryID IN (@ProductSubcategory)) AND (SOH.OrderDate > @StartDate) AND (SOH.OrderDate < @EndDate)
GROUP BY C.LastName, C.FirstName, E.EmployeeID, PC.ProductCategoryID, PS.ProductSubcategoryID
ORDER BY SUM(SOD.LineTotal) DESC
For the dataset TopCustomers there is an identical mistake
SELECT TOP 5 S.Name AS StoreName, SUM(SOH.SubTotal) AS SaleAmount, PS.ProductSubcategoryID,nPS.ProductCategoryID
FROM Production.Product P INNER JOIN Production.ProductSubcategory PS ON P.ProductSubcategoryID = PS.ProductSubcategoryID INNER JOIN Sales.SalesOrderDetail SOD ON P.ProductID = SOD.ProductID INNER JOIN Sales.Customer CU INNER JOIN Sales.SalesOrderHeader SOH ON CU.CustomerID = SOH.CustomerID INNER JOIN Sales.Store S ON CU.CustomerID = S.CustomerID ON SOD.SalesOrderID = SOH.SalesOrderID
WHERE (SOH.OrderDate > @StartDate) AND (SOH.OrderDate < @EndDate) AND (PS.ProductCategoryID = @ProductCategory) AND (PS.ProductSubcategoryID IN (@ProductSubcategory))
GROUP BY S.Name, PS.ProductSubcategoryID, PS.ProductCategoryID
ORDER BY SUM(SOH.SubTotal) DESC
The correct query is:
SELECT TOP 5 S.Name AS StoreName, SUM(SOD.LineTotal) AS SaleAmount, PS.ProductSubcategoryID,nPS.ProductCategoryID
FROM Production.Product P INNER JOIN Production.ProductSubcategory PS ON P.ProductSubcategoryID = PS.ProductSubcategoryID INNER JOIN Sales.SalesOrderDetail SOD ON P.ProductID = SOD.ProductID INNER JOIN Sales.Customer CU INNER JOIN Sales.SalesOrderHeader SOH ON CU.CustomerID = SOH.CustomerID INNER JOIN Sales.Store S ON CU.CustomerID = S.CustomerID ON SOD.SalesOrderID = SOH.SalesOrderID
WHERE (SOH.OrderDate > @StartDate) AND (SOH.OrderDate < @EndDate) AND (PS.ProductCategoryID = @ProductCategory) AND (PS.ProductSubcategoryID IN (@ProductSubcategory)) GROUP BY S.Name, PS.ProductSubcategoryID, PS.ProductCategoryIDORDER BY SUM(SOD.LineTotal) DESC
I think these work :)
As for the sales territory drilldown the only fix I can see is to do an update query of the Sales Order Header file of the correct line totals from the Sales Order Detail files.
Friday, October 31, 2008
Problems we are having with Adventure works
Product Line Sales:
Top 5 Customers - sales are inflated significantly. Adventure works sales in total (from the Sales order header) are approx $97M
Do a report (Bikes, road bikes, Calendar year 2003) and the Top 5 Sales people sold over $158 millions - does anyone notice this as a problem ? Just a slight reconciliation difference I would think
Or is this report deliberately broken so people can fix it?
Now my next favourite report:
Sales Territory Drill down
The Sales by Order by Header does agree to the sales order total from the 'Drill down' - hmm housten we have a problem
Australia , Lynn Tsoflias Sales order SO51150 $40,274.01 =/= the drilldown $26,328 (after discount of $1,996)
In fact the Adventure works database is broken - Sales order Header subtotal =/= sum(Sales Order Detail . line total) - total discrepancy a cool $17 mill or so. A redememing fact is that the dbo.FactResellerSales table does indeed have the 'correct' sales number.(It reconciles to the Sales order Detail)
ALSO this report is missing sales made by managers - or dont managers count? (they are also missing from the view Sales.vSalesPerson in the
=======================================
So is there something I am completely missing?
Seriously though I am using Adventure works for SQL server 2005 - are these problems that are fixed in the Adventure works 2008? (havent installed yet as I dont have SQL 2008 yet)
Please feel free to contribute to the discussion
Top 5 Customers - sales are inflated significantly. Adventure works sales in total (from the Sales order header) are approx $97M
Do a report (Bikes, road bikes, Calendar year 2003) and the Top 5 Sales people sold over $158 millions - does anyone notice this as a problem ? Just a slight reconciliation difference I would think
Or is this report deliberately broken so people can fix it?
Now my next favourite report:
Sales Territory Drill down
The Sales by Order by Header does agree to the sales order total from the 'Drill down' - hmm housten we have a problem
Australia , Lynn Tsoflias Sales order SO51150 $40,274.01 =/= the drilldown $26,328 (after discount of $1,996)
In fact the Adventure works database is broken - Sales order Header subtotal =/= sum(Sales Order Detail . line total) - total discrepancy a cool $17 mill or so. A redememing fact is that the dbo.FactResellerSales table does indeed have the 'correct' sales number.(It reconciles to the Sales order Detail)
ALSO this report is missing sales made by managers - or dont managers count? (they are also missing from the view Sales.vSalesPerson in the
=======================================
So is there something I am completely missing?
Seriously though I am using Adventure works for SQL server 2005 - are these problems that are fixed in the Adventure works 2008? (havent installed yet as I dont have SQL 2008 yet)
Please feel free to contribute to the discussion
Tuesday, September 2, 2008
Our aim of this Blog
Once we have published our work we aim to provide support and answers to all your queries regarding the use of our model on Microsoft SQL Server and reporting in Microsoft Excel and we will also provide more technical details behind our methods for the curious.
In the meantime we are throwing up some topics for discussion that we are passionate about in our journey of building and designing a data warehouse
In the meantime we are throwing up some topics for discussion that we are passionate about in our journey of building and designing a data warehouse
Computing Efficiency vs Economic Efficiency
Is there still any point to building “Efficient” computer models. Today’s computers are infinitely more powerful than when many of the design guidelines were set.
Which is better – an inefficient computer model that is easy to understand, implement and use or an efficient computer model that is cumbersome, complicated and takes an inordinate amount of time to implement?
I.E. Computing Efficiency vs Economic Efficiency
This is Jane's ask the Experts Question for Tech'ed this year
Which is better – an inefficient computer model that is easy to understand, implement and use or an efficient computer model that is cumbersome, complicated and takes an inordinate amount of time to implement?
I.E. Computing Efficiency vs Economic Efficiency
This is Jane's ask the Experts Question for Tech'ed this year
Subscribe to:
Posts (Atom)