|
|
-
T-SQL provides functions for determining the greatest and the lowest values in a column of data using the MAX() and MIN() aggregate functions respectively. The functions only work with one variable and may require the use of the GROUP BY clause if other columns of the same table are included in the SELECT list. This means that if you wanted to get the higher of any two independent variables or columns just like the .NET Math.Max() function, you would have to write your own implementation or convert the independent variables into a set (i.e. a one column table variable). Consider the following the table and data: 1: CREATE TABLE t1(col1 int, col2 int,col3 int);
2:
3: GO
4:
5: INSERT INTO t1 VALUES(1,0,1),(2,1,0),(3,1,NULL),(4,NULL,1),(5,NULL,NULL)
Supposing we want to pick the greater of the 2 columns (col1 and col2) for each row in the table, the easiest route would be to use a CASE statement like this:
1: SELECT col1, CASE WHEN col2 > col3 THEN col2 ELSE col3 END AS greaterCol
2:
3: FROM t1;
This would return the following results.
col1 greaterCol
----------- -----------
1 1
2 1
3 NULL
4 1
5 NULL
5 1
The result for the 3rd row may not be correct if in ‘our design’, we were expecting 1 to be greater than NULL. SQL Server (and NULL logic), will give different results for comparisons with ‘unknowns’. Apparently, the answer depends on which column is on the right hand side of the comparison.
To get the correct results, we could modify our query like this.
1: SELECT col1,
2: CASE WHEN ISNULL(col2,0) > ISNULL(col3,0) THEN col2 ELSE col3 END AS greaterCol
3:
4: FROM t1;
We could also implement our own MAX () function that takes 2 parameters and returns the greater value. Of course, our function should be able to handle NULLs appropriately. The user-defined MAX () function looks like the listing below:
1: CREATE FUNCTION dbo.MAX(@val1 int,@val2 int)
2:
3: RETURNS int
4:
5: AS
6:
7: BEGIN
8:
9: IF ISNULL(@val1 , 0) >= ISNULL(@val2 , 0)
10:
11: RETURN @val1;
12:
13: RETURN @val2;
14:
15: END
.csharpcode, .csharpcode pre
{
font-size: small;
color: black;
font-family: consolas, "Courier New", courier, monospace;
background-color: #ffffff;
/*white-space: pre;*/
}
.csharpcode pre { margin: 0em; }
.csharpcode .rem { color: #008000; }
.csharpcode .kwrd { color: #0000ff; }
.csharpcode .str { color: #006080; }
.csharpcode .op { color: #0000c0; }
.csharpcode .preproc { color: #cc6633; }
.csharpcode .asp { background-color: #ffff00; }
.csharpcode .html { color: #800000; }
.csharpcode .attr { color: #ff0000; }
.csharpcode .alt
{
background-color: #f4f4f4;
width: 100%;
margin: 0em;
}
.csharpcode .lnum { color: #606060; }
and use it (repeatedly if we wish), like this:
1: SELECT col1, dbo.MAX(col2,col3) AS greaterCol FROM t1;
The user-defined function, as created above, is limited to one explicit data type and we may have to change its definition to cater for more comparable data types. It is also limited to only 2 parameters, although I am sure we can extend it to accept more parameters with some recursive code.
Despite the ease of usage, the dbo.MAX () function, may be a tad slower than the CASE version for very large tables and the CASE statement seems to benefit from parallelism.
As an aside, for the functionality discussed above, Oracle PL/SQL provides the GREATEST () and LEAST () functions. The beauty of these functions is that they can accept more than 2 parameters. The downside is that they intrinsically return NULL if any of the parameters is a NULL value – no fault on Oracle’s part since the functions are intended to be ANSI compliant.
I wonder why Microsoft has not bothered to provide similar (or better) functionality
del.icio.us Tags: T-SQL, MAX() function
|
-
The Damerau–Levenshtein distance or the “edit distance” between two strings can be used in SQL Server database applications to search for strings when you’ve reached the limit with the LIKE clause or the full-text indexing features. Used directly in T-SQL, the algorithm has some obvious limitations like: - When used for fuzzy matches, the strings need to be relatively short since the algorithm operates in the worst case at O(m*n) where m and n are the respective string lengths. This fact can be made worse if the algorithm is called as a scalar-valued function as part of a multi-column select over a very large table. The entire search then runs at O(k*m*n*) where k is the row count.
- There is little(no) chance of the query optimizer using any indexes on the searched column when you use the algorithm in your WHERE clause.
Those grim facts aside and the debatable instinct that you ought to consider implementing this either using LINQ or just plain .NET, here is my adapted T-SQL version of the Damerau–Levenshtein distance. CREATE FUNCTION DamerauLevenshteinDistance(@str1 varchar(max), @str2 varchar(max)) RETURNS int AS BEGIN DECLARE @str1_len int=(SELECT DATALENGTH(@str1)); DECLARE @str2_len int=(SELECT DATALENGTH(@str2)); if (@str1 IS NULL AND @str2 IS NULL) return 0; if (@str1 IS null) return @str2_len if (@str2 IS null) return @str1_len; -- matrix that stores cumulative scores as strings are traversed DECLARE @d TABLE (i int NULL,j int NULL,value int NULL) declare @i int, @j int, @cost int; -- fill in scores for first row and column SET @i = 0; WHILE (@i <= @str1_len) BEGIN INSERT into @d (i,j,value) VALUES(@i,0,@i) SET @i +=1; END SET @j = 0; WHILE (@j <= @str2_len) BEGIN INSERT into @d (i,j,value) VALUES(0,@j,@j) SET @j +=1; END declare @newvalue int; declare @x int; declare @y int; declare @z int; declare @tcost int; declare @a int; declare @b int; SET @i = 1; WHILE (@i <= @str1_len) BEGIN SET @j = 1; WHILE (@j <= @str2_len) BEGIN if (SUBSTRING(@str1,@i,1) = SUBSTRING(@str2,@j,1)) set @cost = 0; else set @cost = 1; -- addition, subtraction, substitution costs set @x=(SELECT value + 1 from @d WHERE i=@i - 1 AND j=@j); set @y=(SELECT value + 1 from @d WHERE i=@i AND j=@j-1); set @z=(SELECT DISTINCT value from @d WHERE i=@i-1 AND j=@j-1); set @z += @cost; SET @newvalue=dbo.Minimum(dbo.Minimum(@x, @y),@z); INSERT into @d (i,j,value) VALUES(@i,@j,@newvalue) if (@i > 1 AND @j > 1 AND SUBSTRING(@str1,@i,1) = SUBSTRING(@str2,@j - 1,1) AND SUBSTRING(@str1,@i - 1,1) = SUBSTRING(@str2,@j,1)) BEGIN -- transposition cost set @a=(SELECT value from @d WHERE i=@i AND j=@j); set @b=(SELECT DISTINCT value from @d WHERE i=@i - 2 AND j=@j-2); set @b +=@cost; set @tcost=dbo.Minimum(@a,@b); Update @d SET value=@tcost WHERE i=@i AND j=@j; END SET @j +=1; END SET @i +=1; END declare @distance int=(select value from @d WHERE i=@str1_len AND j=@str2_len); return @distance; END The DL distance function above makes use of another scalar-valued function for getting the minimum of two numbers, Minimum, shown below: CREATE FUNCTION Minimum(@val1 float, @val2 float) RETURNS Float AS BEGIN IF (@val1 < @val2) RETURN @val1; RETURN @val2; END It is up to you to decide the DL distance that meets the threshold of relevance of your search.
|
-
I recently stumbled upon this cool online toy by Jonathan Feinberg from the IBM Research Visual Communication Lab. It is called WORDLE, uses JQuery and generates "word clouds" from a bunch of text or any URL with an RSS feed. It is a tagging engine and if the gallery serves as an indicator of its usage, you will notice that people who have tried it range from one word bloggers to pundits analyzing American presidential inaugural speeches. I will confess that it is the latter that led me to this toy. Well, I decided to plug in this blog and here is the result. I wasn't terribly surprised by the Wordle analysis of my blog. I realize that the blog could benefit from more technical writing on SQL Server and .NET. The only gripe I have with it is the fact that I had to change to my IE 7 security settings to medium and allow some worrying Microsoft VM Java options to make it work. I suspect that it has nothing to do with the tool itself but just a damning reminder of why I have always been wary of Java applets in general. Since I have been fascinated with image generation for a while now, I am going to use this experience as a launching pad for my next .NET drawing and AJAX learning project. it would be interesting to do this in ASP.NET or WPF
|
-
SQL Server allows us to decide whether query objects we create referencing tables can be tightly or loosely bound to the table structure. When you create a view or stored procedure you can specify the WITH SCHEMABINDING option to indicate that the view or stored procedure is invalidated if the underlying table structure is modified. In such a situation, the developer will need to drop and recreate the view or stored procedure. It also, inadvertently , serves as a check against 'breaking' changes like dropping columns and changing column data types without regard for dependencies. Most of the views I see in practice do not specify this option and a reasonable percentage use the "SELECT *" construct to retrieve data from the base tables. This means that if the underlying table(s) change, the developers will be scrambling to either fix bugs or if they are in the know, running sp_refreshview to synchronize the view metadata. One of the practices I preach is the use of WITH SCHEMABINDING at all times. This will consequently force you - Not to use the "SELECT *" construct in your views at all. This makes total sense to me even without considering some positive performance points you might gain by specifying exactly what you need to retrieve from the database.
- To use two part names for every column and table you reference i.e. table.column or schema.table.
- To carefully consider making 'breaking' changes to your table structure. In any case, good and thorough designing at all tiers of your application should reduce the need for frequent structural changes at the table level. If you must change stuff, script these changes and apply them as part of a bug fix or patch.
- To seriously re-evaluate your usage of nested views. Yes, I think that they are bad.
I have not yet seen any single advantage (apart from excuses) of not tightly binding your references to tables in your database. If the guys at Redmond were listening, I would suggest that all views be implicitly schema bound instead of allowing the T-SQL developer to shoot themselves in the foot. In fact, it sounds better to me to have a WITH NOSCHEMABINDING option just in case some developer out there wanted to do the un-recommended.
|
-
In SQL Server 2005, if you create a view utilizing the TOP (100) PERCENT and ORDER BY clauses, you will not get back your results in the order specified. Example: USE AdventureWorks GO CREATE VIEW CustomersOrderedbyTerritory AS SELECT TOP(100) PERCENT * FROM Sales.Customer ORDER BY TerritoryID GO SELECT * FROM customersOrderedbyTerritory You will notice that SELECTing from this view returns a result set 'ignorant' of the ORDER BY directive. In essence, the optimizer ignores the TOP and ORDER BY clauses. To achieve the desired result, there is a hot fix available from MSFT which introduces trace flag 168 and also requires you set the database compatibility level to 80 (SQL Server 2000). Apparently, the problem also extends to Common Table Expressions (CTE) which use the same constructs such as : WITH territorialcustomers AS ( SELECT TOP(100) PERCENT * FROM Sales.Customer ORDER BY TerritoryID ) SELECT * FROM territorialcustomers If you don't want to set your compatibility level to 80, you can use TOP (99.999999999999) instead. This somehow fools the optimizer to sort the data the way you want it to. Of course, it is only useful if your table contains less than 10 million records since the 0.000000000001% difference may start to show. Well, if you insist on using the view, you can avoid the hot fix by using the ORDER BY clause in a table-valued function to retrieve your ordered data . You can then wrap your view around this UDF. I have not checked if this behavior has been carried forward to SQL Server 2008.
|
-
A substantial part of my development career was spent on the assembly line churning out code based on someone else specifications. There was always somebody else to handle the financial, administrative, marketing and sales aspects of the whole manufacturing process - what I used to call the 'social' face of software. Now that I am running my own small software outfit and having gone through a few projects, the lessons seem to be coming my way fast and hard on a daily basis. Most of them are the 'social' kind and I will proceed to randomly throw them at you, mainly because I don't want to forget them. - Document all proposed requirements (features) to the most atomic level of detail possible. Ambiguous or general statements can fool you into optimism as far as the project timeline and budget is concerned. A specification statement like "The new system should allow employees to view their own data currently stored in the HR management system oracle database" cannot be the basis of a design. Break it down with questions like this:
- Allow - May suggest security concerns. Will this be integrated security (implicit) or forms-based (explicit)?. Is there some form of identity management in place? Will we have to log (keep track) access to the system somewhere?
- Employees - How many? Where are they located? Will each computer need some form of installation or setup procedure before using the system?
- View - Where is the viewpoint? i.e. can the employee view the data while outside the LAN. What will they use to view the data? i.e a browser, a Windows forms application through a VPN? Do we need to encrypt the data? What obstacles lie between the user and the data? Firewalls,
- Own - Are there people who can view other peoples' data? Who manages all these permissions?
- Data - Which data exactly? i.e. Salaries, leave (vacation), loans, employment history (vertical and horizontal movement within the firm). How should each be displayed?
- Oracle - Which version? Oracle client software installation? Who is responsible for this other database? Will they help with scripts to get the data out? Will we be allowed access to some of the data during development? Silently:(Do we have skills to deal with PL/SQL and ADO.NET?). Any documentation available?
- If most of the above questions can be answered satisfactorily by the client, you may be able to do a better estimate than just a '1 man-week' guess. In fact, you might realize that this could very easily end up at slightly more than '1 man-month'. Remember that I am talking about one requirements specification statement.
- You might, of course, want to wear the 'architects' hat and start thinking of performance, availability, scalability, extensibility and interoperability at the statement level. These thoughts will not reduce the value of your requirements and may uncover other concerns and wishes not previously stated.
- Communicate anything involving an agreed or suggested flux in resources and system features in writing (e-mail) as much as possible. Copy (cc) the communication to all the people relevant to that part of the project. After meetings, e-mail the client highlighting the main points agreed or disagreed during that meeting and ask for confirmation. You may look dumb and slow but this will cover your bases and avoid any ambiguity or disputes later.
- Perform user acceptance testing as early as sanely possible. Unit and functional tests can be greatly complemented by the almost natural ability of a user to always crash a system. The fact that the software is incomplete only serves to gauge your feature prioritization skills. This could be a better indicator of where the project status instead of just checking off a feature list in a garage or on an Ms Project gantt chart. In any case, a client nod is worth more than any test case scenarios you can cook up.
- Demarcate or at least try to understand the boundary between the user and the client. These two entities are important to the closure of the project. The client will pay for the software. Users will use your software as often as necessary. Users will certify your system. The client may not pay if the user is not happy. Your will be supporting users later - despite what the contract says. When the client is the user, you may have an easier job navigating through the acceptance and invoicing phases. If they are not , get a good project liaison to grease the path between them.
- Listen to all levels of the user spectrum. Every data entry and retrieval scenario should have a 'persona' and a real blood-and-bones user early in the project. Different people have different views of design. You may be able to merge these different views with your own or even learn something new. Not listening creates room for assumptions. Assumptions can spawn undefined elasticity in resources and features. Listening later only serves to widen the delicate gap between project initiation and final invoicing.
- Know the accounting practices and procedures of the client. What they may consider fast payment processing may be too slow for you. Your bills and obligations may not dance to the same tune as your clients' workflow and traditions. This will also allow you to plan for cash flow.
- If you can, start creating user documentation (help files) early. Consider this as a validation of the requirements and testing activities. This also brings the end game nearer.
- If there is a training clause in the contract, make sure the scope is clear. You may find yourself teaching PC skills, database administration, active directory services or even VB.NET for free or at a lower price than you would have if it was stated clearly from the beginning.
- If you can help it, get somebody who knows their way around the establishment and has a goodwill for the project as a liaison. Canceled or missed appointments with various people, internal events or staff deployment changes in the client firm can affect your ship date. You may also need help bullying the accounts people - they usually respond better to internal pressure and nagging. A good rapport with the internal IT team can be good for repeat or referential business. It might just help to reduce your medical bills as you struggle to deploy your new software.
- It may be that you got the job through a competitive process (a.k.a tender). Make sure you get to know why you won/lost the bid and what your competition had to offer in their technical and financial proposals. There will always be a next time and it is very likely that you will meet your competitors again. Such knowledge can only serve to improve your chances.
- In case you were wondering, I have little to say about software maintenance contracts. You will not need to do much if the agreement (SLA) is clear and your software approximates the elusive zero-bug state. On the other hand, it is very easy to spend all your money 'maintaining' the software as you respond to every user SOS and hunt down all kinds of bugs.
Whether my experiences are out of place is subject to the test of time and the pressure of future projects. In the meantime, they will serve as a reminder that the world is still interesting outside the confines of Visual Studio and the SQL Server query editor.
|
-
Mark Miller (DevExpress) is offering a free for all download of Refactor! for Visual Studio 2005. It includes 25 refactorings, nine of which are dedicated to ASP.NET development (the other 16 work in VB and C# code-behind files). There is also a full-screen training video available for your viewing pleasure.Having used the Pro version of the tool for some time now, I am happy to see that Mark has decided to make some of the features free for public consumption. The features that are available while working with ASP.NET 2.0 are:
- Add Validator
- Extract ContentPlaceHolder
- Extract ContentPlaceHolder (and create master page)
- Extract Style (Class) : Not available inside content pages
- Extract Style (id) : Not available inside content pages
- Move to Code-behind
- Move Style Attributes to CSS
- Rename Style
- Surround with Update Panel: Does not modify the Web.Config file
|
-
T-SQL may not be a 'true' programming language but it sure has lots of language features that make manipulating data possible. I spend quite a reasonable amount of time writing T-SQL scripts; either in production or learning or sometimes while teaching. While doing this I have often found myself cursing at the amount of typing required to accomplish otherwise simple constructs. Here is a sampling what I am rambling about: 1. Separation of variable declaration and initialization. Put simply, I would love to be able to do the following, at least for scalar types DECLARE @var int=0 instead of having to do it over two lines, thus DECLARE @var int SET @var=0 2. Creating database objects using IF ..DROP..CREATE. It is just involves too many keystrokes. Oracle PL/SQL has the CREATE OR REPLACE syntax. Why wouldn't the SQL Server team indulge us on this one? For those who are wondering what this is all about, if you want to create a table, you must check if it exists before issuing the CREATE statement. You could go about it like this IF OBJECT_ID('myTable') IS NOT NULL DROP TABLE myTable CREATE TABLE myTable(col1 int) Please, just give us this instead CREATE OR REPLACE TABLE myTable(col1 int) If I am not mistaken, somebody has already asked for this on the Microsoft Connect site. You may want to vote for that feature. 3. Is there any hope for a more compact loop construct instead of this? DECLARE @var int SET @var=0 WHILE @var<1000 BEGIN SET @var=@var+1 --Some code here END You notice that the six uncommented lines are required to do a simple loop. This can quickly increase if you have more than one loop (a.k.a an 'inner' loop). Since I don't expect to see curly brackets in T-SQL anytime soon, what about a FOR...NEXT simple VB-like loop? I may come off as a truly lazy DBA, but after exposure to other 'programming' languages, I feel that once in a while my productivity is impaired with this chatful state of affairs. I am not sure if there is anything in the ANSI-SQL standard that prohibits these wishes from coming true, but I am definitely sure that the SQL Server programmability team should think about it. Note To Self: If wishes were horses, beggars would surely ride.
|
-
The July Refresh of the SQL Server Books Online is now available. This edition of the documentation is supposed to cater for all current editions of SQL Server including Express. Download
|
-
The public release of Office 2007 Beta 2 is now available. Download
|
-
As an SQL Server admin, there are times when the combination of many users, groups, roles, logins and the various permissions that can be given to each to do stuff on various databases and objects within your server may just be too confusing. Just like the previous sentence.
One way of coming out of this mess unscarthed is to rigorously document every permission granted to every login/group and keep track of it all. I know many people who don't. I don't either. Since you basically can't do much with SQL Server unless 'explicity' granted or 'implicitly' allowed by virtue of your membership in the public or other groups/roles, I am normally much more worried about what I have GRANTED my logins rather than what I have explicitly DENIED them. In other words, most people would love to know who CAN do stuff rather than who CANNOT do stuff. So what else can you do about it? I wrote a script (stored procedure actually) to show me all the stuff that a given user/login can do within a particular database and on its hosting server. Here we go.
CREATE PROC [dbo].[loginperms]
@login varchar(128)
AS
SET NOCOUNT ON
IF (SELECT is_disabled from sys.server_principals WHERE name=@login)=1
SELECT 'The login <' + @login + '> is disabled'
ELSE
SELECT [Object Class]=class_desc, [Object]=CASE class WHEN 100 THEN @@SERVERNAME
WHEN 105 THEN (SELECT name FROM sys.endpoints WHERE endpoint_id=major_id)
ELSE object_name(major_id) END ,[column]='',[Permission]=permission_name,
[Permission Type] =CASE spr.type WHEN 'U' THEN 'Direct' WHEN 'R' THEN 'Inherited' END
FROM sys.server_permissions spm JOIN sys.server_principals spr
ON spm.grantee_principal_id=spr.principal_id
WHERE spr.type NOT IN('C') AND is_disabled=0 AND spr.name=@login
UNION
SELECT [Object Class]=class_desc,Object=CASE class WHEN 100 THEN @@SERVERNAME
WHEN 105 THEN (SELECT name FROM sys.endpoints WHERE endpoint_id=major_id)
ELSE object_name(major_id) END,'' ,permission_name,
[Permission Type] =CASE spr.type WHEN 'U' THEN 'Direct' WHEN 'R' THEN 'Inherited' END
FROM sys.server_permissions spm JOIN sys.server_principals spr
ON spm.grantee_principal_id=spr.principal_id
WHERE spr.type NOT IN('C') AND is_disabled=0 AND spr.name='public'
UNION
SELECT [Object Class]=CASE class WHEN 1 THEN (SELECT REPLACE(type_desc,'_',' ') FROM sys.all_objects WHERE object_id=major_id) ELSE class_desc END ,
Object =CASE class WHEN 0 THEN DB_NAME() ELSE object_name(major_id) END ,
[Column] =CASE minor_id WHEN 0 then '' ELSE COL_NAME(major_id,minor_id) END,permission_name,
[Permission Type] =CASE dpri.type WHEN 'U' THEN 'Direct' WHEN 'R' THEN 'Inherited' END
FROM sys.database_permissions dperm JOIN sys.database_principals dpri
ON dperm.grantee_principal_id=dpri.principal_id
WHERE dpri.name=@login AND state_desc LIKE 'G%' AND major_id>=0
UNION
SELECT [Object Class]=class_desc, Object=CASE class WHEN 0 THEN DB_NAME() ELSE object_name(major_id) END,
[Column] =CASE minor_id WHEN 0 then '' ELSE COL_NAME(major_id,minor_id) END,
permission_name ,[Permission Type]=CASE dpri.type WHEN 'U' THEN 'Direct' WHEN 'R' THEN 'Inherited' END
FROM sys.database_permissions dperm JOIN sys.database_principals dpri
ON dperm.grantee_principal_id=dpri.principal_id
WHERE dpri.name='public' AND state_desc LIKE 'G%' AND major_id>=0
GO
You may want to use the procedure like this:
EXECUTE [dbo].[loginperms] 'DOMAIN\user'
or
EXECUTE [dbo].[loginperms] 'sql_login'
I know, it is even easier get this information using the brand new SQL Server Management Objects (SMO). Maybe I will blog about that on another day. In the meantime, let's just be happy with T-SQL.
|
-
The delayed SQL Server 2005 Books Online (BOL) April 2006 refresh should now be up and ready for downloading.
Get it here
|
-

Details:
Event: Mobile and Embedded Devcon (MEDC) 2006 Europe
Dates: 6th-8th June 2006
Place: Nice, France
Sessions:
- Using Microsoft Windows Embedded Products to Bring to Market a Broad Range of Devices
- Windows CE Overview
- What's New in the Next Version of Windows CE Tools
- Writing Windows CE Services
- Stabilizing the Platform
- Inside the Windows CE Kernel
- Windows CE Memory Tools for Native Code
- Windows CE Performance Tools & Techniques
- How to write a Windows CE Boot Loader (combine with Secure Bootloader)
Discounts are available if you register before the end of next week. Get more details at http://www.microsoft.com/europe/medc/
|
-
-
I just downloaded Chris
Frazier's PostXing Beta
(2.0.6068.2) and I am using it post this offline. If you can read it, then
it works and it rocks!
I was totally unable to use IMHO Instant blogger despite Ryan Rinaldi's
suggestions. This might just change the way I've been
blogging.
|
|
|