Managed meets relational

Blog about .Net and SQL

Sets and set operations in .Net or "Why do we reinvent the wheel?"

May 11, 2009 23:27 by author alaudo

Resurrection of Delphi

As a diligent subscriber to DotnetPro magazine (a monthly developer magazine in German), I read in the latest issue about the reincarnation of Delphi and Object Pascal on the .Net platform. Before .Net, Delphi was the language I had the most experience with; I spent days and weeks trying to develop my own visual components and to build something worth showing to others. Delphi is still the IDE of choice if you need to program something that must be compiled into native code and run on a variety of Windows systems, from Win98SE onwards. In my last project, where I had to write the software that controls an acoustic modem used in marine research, I had to use it again (in 2009, imagine!) to make sure that the final code would run on both the old and the new laptops they have.

Sets in .Net and C#

One of the articles in the current issue, by Bernd Klaiber, was about "mimicking Object Pascal sets under .Net"; in it he describes a class he developed to get behaviour similar to what Delphi calls "sets". I was a little surprised to read about this implementation, and even more by the recommendation to use this "generic implementation with overloaded operators", which is loosely compared to the classical C# Flags. That comparison is, by the way, the only point I agree with: the implementation offers much more functionality than Flags. But do we really need this functionality?

Classical implementation from 2002

First of all, there is a classical and well-known implementation of sets for .Net, which can be found and downloaded from CodeProject. The implementation (and the accompanying CodeProject article) was published as early as 2002 and revised in 2004. It provides a whole hierarchy of Set classes, covering not only the classical set but also dictionary and hash sets. The implementation does not overload any operators, probably because its design dates back to the earliest .Net versions (even though the latest version available on CodeProject is implemented as a generic class).

Drawbacks and what we actually need

I used this implementation in one of my projects at work and found it not very handy, because you have to deploy 2 separate DLLs together with your code (which is itself just one stand-alone DLL), even if you use only the smallest part of the overall functionality. By the way, when I think of sets, I basically think of the following features: a collection of data that I can add to, remove from, and query to check whether a certain element belongs to it or not. Other operations like union or intersection are nice, but if I need them I would rather use something along the lines of SQL Server or an SQL provider. That's why, when speaking of sets, we basically need three operations:

  • Add something to the set.
  • Remove something from the set.
  • Check whether something belongs to the set or not.

Flags

For static values one should use the classical C# flags, which are nothing more than a bit-field enum in which you can check for a particular bit. Here is a small extract from my C# workshop demo project to illustrate this:


[Flags]
enum DaysOfWeek : byte
   { Monday = 1, Tuesday = 2, Wednesday = 4, Thursday = 8,
         Friday = 16, Saturday = 32, Sunday = 64 }

public static void FlagsTest()
{
   DaysOfWeek workingDays = DaysOfWeek.Monday | DaysOfWeek.Tuesday
                 | DaysOfWeek.Thursday | DaysOfWeek.Wednesday | DaysOfWeek.Friday;
   DaysOfWeek today = DaysOfWeek.Monday; 
   DaysOfWeek day1 = today;
   day1 |= DaysOfWeek.Sunday; // adding new day
   day1 |= DaysOfWeek.Tuesday; // adding another day
   day1 &= ~DaysOfWeek.Saturday; // removing one day (a no-op here: Saturday was never set)
   if ((today & workingDays) != 0)
     { Console.WriteLine("Today is " + today + ", working day"); }
   else
     { Console.WriteLine("Today is " + today + ", holiday"); }
 }

What if we need more?

For some reason unknown to me, quite few people know that there has been a full-fledged set class in .Net since version 3.5. This is the HashSet<T> class, which implements all the methods we need to manipulate elements and check for them. It supports .Add(), .Remove() (together with .Clear()) and .Contains(), the 3 basic operations I have ever needed from a set. Besides, it also supports many real set operations like .ExceptWith() or .Overlaps(), which were so arduously implemented in the article.
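To make this concrete, here is a minimal sketch of those operations on HashSet<T>. The class name, the method name and the sample day names are made up for illustration and are not taken from the article or from my project.

```csharp
using System;
using System.Collections.Generic;

static class HashSetDemo
{
    // Hypothetical helper: builds a small set using the three basic operations.
    public static HashSet<string> BuildWorkingDays()
    {
        var days = new HashSet<string> { "Monday", "Tuesday", "Wednesday" };
        days.Add("Thursday");
        days.Add("Friday");
        days.Add("Monday");    // duplicate: Add returns false, the set is unchanged
        days.Add("Sunday");
        days.Remove("Sunday"); // take it out again
        return days;
    }

    public static void Main()
    {
        HashSet<string> workingDays = BuildWorkingDays();
        var weekend = new HashSet<string> { "Saturday", "Sunday" };

        Console.WriteLine(workingDays.Contains("Monday")); // True
        Console.WriteLine(workingDays.Overlaps(weekend));  // False

        // Real set operations modify the set in place.
        var all = new HashSet<string>(workingDays);
        all.UnionWith(weekend);  // all seven days
        all.ExceptWith(weekend); // back to the working days
        Console.WriteLine(all.SetEquals(workingDays));     // True
    }
}
```

Note that UnionWith, ExceptWith and the like change the set they are called on, so copy the set first if you need to keep the original.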

When I switched to this class in a newer implementation of my project at work, the total project size shrank from about 800 KB down to 16 KB (imagine!) and I didn't have to deploy two separate DLLs with my project anymore (that was great!). The only method I really missed was .AddAll(T[] items), which would allow me to add a whole array of elements to the set at once without writing a loop. That's why I quickly concocted the following class:


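    using System;
    using System.Collections.Generic;

    class SetOfStrings : HashSet<string>
    {
        // Adds every element of the array to the set.
        // Returns false if anything goes wrong (e.g. the array is null).
        public bool AddAll(string[] newItems)
        {
            try
            {
                foreach (string newitem in newItems)
                {
                    base.Add(newitem); // duplicates are silently ignored
                }
            }
            catch (Exception)
            {
                return false;
            }
            return true;
        }
    }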

Alas, this new class was not generic anymore (I did not care, because my set was strongly typed anyway), but it implemented the functionality I needed. Apparently this was the only difference between the CodeProject class and HashSet, since the rest of the code just worked without further changes. Now I use this code in several projects I work on, and I am fully satisfied with its execution speed and flexibility.
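If the loss of genericity matters, the same idea can be sketched as an extension method instead of a subclass. This is only a sketch under my own assumptions: SetExtensions and this AddAll are hypothetical names, and unlike the class above it returns the number of newly added elements instead of a bool. The built-in HashSet<T>.UnionWith() already adds a whole sequence at once, so the helper is mainly useful when you want that count.

```csharp
using System;
using System.Collections.Generic;

static class SetExtensions
{
    // Hypothetical helper: adds every item and returns how many were actually new.
    public static int AddAll<T>(this HashSet<T> set, IEnumerable<T> newItems)
    {
        if (set == null) throw new ArgumentNullException("set");
        if (newItems == null) throw new ArgumentNullException("newItems");

        int added = 0;
        foreach (T item in newItems)
        {
            if (set.Add(item)) added++; // Add returns false for duplicates
        }
        return added;
    }
}

static class AddAllDemo
{
    public static void Main()
    {
        var names = new HashSet<string> { "alpha" };
        int added = names.AddAll(new[] { "alpha", "beta", "gamma" });
        Console.WriteLine(added);       // 2
        Console.WriteLine(names.Count); // 3

        // Built-in alternative when the count is not needed.
        names.UnionWith(new[] { "delta" });
        Console.WriteLine(names.Count); // 4
    }
}
```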



Brosius, Scheerer, Wolff: Business Intelligence mit Office 2007 und SQL Server

May 10, 2009 12:06 by author alaudo


Business Intelligence mit Office 2007 und SQL Server
Data Mining und Datenanalyse mit Excel, SharePoint und SQL Server
By: Brosius, Gerhard / Scheerer, Benjamin / Wolff, Ulrich

328 pages
Published by Microsoft Press, 01/2009 (hardcover)
ISBN-10: 3-86645-637-9
ISBN-13: 978-3-86645-637-2
Price: €49.90 (when ordered from the MS Press site, order no. MS-5637).

This is already the second MS Press book by this team from Hamburg that I have ordered in order to prepare for my talk at the University of Hamburg. The first book was published back in mid-2006 under the title "Business Intelligence und Reporting mit Microsoft SQL Server 2005", but it was aimed primarily at developers and at database or data warehouse administrators who want to deepen their knowledge of MS SQL Server 2005.

This book, by contrast, targets managers and administrative staff. In other words, it mainly addresses the people in a company who are responsible for certain business decisions. As the authors themselves write in the introduction: "The book is aimed at readers who have to analyze and evaluate data in their day-to-day work". Everything is explained with well-chosen examples; the book does not go deep into the analysis running in the background, but it does present a very broad spectrum of common business situations.

The book assumes an installed MS SQL Server 2005, MS Office 2007 and the Data Mining Add-In for Office 2007. An "out-of-the-box" installation of MS SQL Server is entirely sufficient for working through the examples in the book yourself. Although this is not explicitly mentioned on the title page, the book is based on MS SQL Server 2005 and the add-in for that server version. If you use SQL Server 2008 for the analysis, you have to be careful: some forms have already been reworked in that version of the Office add-in and generally offer a better overview or better analysis options. The results of the analysis will also differ slightly in places from what is shown in the book, which can only be attributed to the server version. For example, the data is now partitioned automatically for training and validation, so there is no longer any need to partition it manually before the analysis -- this is simply configured in the analysis form and handled in the background by the add-in. So if you can no longer split the data according to the procedure given in the book, do not be surprised; simply skip that section while reading.

Experience with Excel is desirable but not strictly required: the necessary material is explained in detail, and the book contains many figures, so almost every step can also be followed visually. In addition, the first part, consisting of 3 chapters, deals with data analysis in Excel and explains basic concepts such as Excel tables, Excel charts and pivots thoroughly and with illustrations. An Excel file with all the examples used in the book can be downloaded from the authors' site. This part is a perfect introduction to data analysis with Excel and can be recommended to anyone who uses Excel daily at work.

The second part conveys the fundamentals of data mining, going systematically through the individual analysis options of the Office add-in. Here, however, the authors mostly take a "button-and-form" approach: instead of explaining the question being asked and how to solve the problem, they simply explain, step by step, the functions behind the buttons. Anyone who is not yet really familiar with data analysis should therefore read another book first; those who want a quick start with the Office Data Mining Add-In, however, will be satisfied.

This part also frequently attempts to work out a particular strategy for data analysis, but the attempt is not always followed through consistently. Moreover, there is an embarrassing error here that unfortunately undermines the whole logic of the data analysis. It concerns chapter 5, "Data Mining Tools". Page 189 shows an accuracy chart for a classification model (figure 5.14) in which all three lines of the chart lie on top of one another. The figure immediately raises doubts in an attentive reader: if the ideal model cannot be distinguished from a random prediction, there is no point in building any model at all; one would simply take the random model and apply it.

I had to rebuild the model myself, and the accuracy chart, surprisingly, looked clearly different. Examining the two models also reveals the original error: on page 188, in step 3, where the column to predict is chosen, the text says "But instead of the suggested value 0, enter 1 as the value to predict." If you look at the figure in the book, however, you can read "Predicted column 'Kunde = 0'" under the chart title. That also explains the result of the accuracy check: the 0 values occur in almost 100% of the cases in the original data set, so a random model fits best. Unfortunately this breaks the whole logic of the subsequent model development: the data selection (downsampling) proposed in the book does not contribute to model accuracy; quite the contrary, it makes the predictive power of the model slightly worse. The error has already been reported to the authors and will hopefully be corrected in the next edition.
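A small calculation shows why the lines in such a chart collapse onto each other when the predicted value is the near-universal one. The sketch below assumes that the accuracy chart is a cumulative gains (lift) chart: the x axis is the share of the population examined, the y axis the share of target cases found. The base rates 0.99 and 0.05 are illustrative assumptions of mine, not figures from the book, and the class and method names are hypothetical.

```csharp
using System;

static class LiftChartSketch
{
    // Share of all target cases found after examining the top `fraction`
    // of the population, given a perfect ranking and the target's base rate.
    public static double IdealGain(double fraction, double baseRate)
    {
        return Math.Min(1.0, fraction / baseRate);
    }

    // A random ordering finds target cases in proportion to the population examined.
    public static double RandomGain(double fraction)
    {
        return fraction;
    }

    public static void Main()
    {
        double[] baseRates = { 0.99, 0.05 }; // assumed, for illustration only
        foreach (double baseRate in baseRates)
        {
            Console.WriteLine("Target base rate: " + baseRate);
            for (int step = 1; step <= 4; step++)
            {
                double fraction = step * 0.25;
                Console.WriteLine("  population " + fraction
                    + "  random " + RandomGain(fraction)
                    + "  ideal " + IdealGain(fraction, baseRate));
            }
        }
    }
}
```

With a base rate close to 1, the ideal line is practically identical to the random one, so any real model squeezed between them must coincide with both. With a rare target value the ideal line reaches 100% almost immediately, and there is room for a model to show its worth.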

The third and final part of the book examines the interplay between MS Excel and SharePoint 2007. It explains Excel Services in the context of a SharePoint server and covers business intelligence with this server. As I am unfortunately no expert in this area, I cannot properly assess the content here.

All in all, this book too leaves a good and solid impression; the only thing I find odd is its publication in the MS Press "Fachbibliothek" series, because a large part of the target audience can easily overlook the book there. A stand-alone edition, or one within an Office series, would in my opinion be much more appropriate.

The book can be ordered directly from the MS Press site.



Off we go!! // Auf die Plätze! Fertig! Los!

May 10, 2009 10:13 by author alaudo

Everything seems to be ready: the server works, the blog engine is correctly installed, and it is high time to write the first blog entry. But what should I write here? Do I need to write anything here at all? A difficult question. At least you don't need to read it, that's good. :) Let me just briefly describe my motivation for starting this blog. You may find it boring if you continue to read, so be warned and don't complain afterwards!

Who am I?

A good question, I wish I knew the answer. Basically I can only tell you with some degree of certainty what I am, and with even more certainty what I am not.

I was born many years ago in Kazan, Russia. At that time it was still the Soviet Union, and I was happy to experience some of the benefits and disadvantages of that era. I was 12 when the regime fell, and the time immediately following this event will always remain fresh in my mind. I have never felt so uncertain about what was coming and what I was going to be as during that time in Russia, when overnight we lost our former identity and got nothing in return. But now this time is over, and may it never come again!

Medical School in Russia

So, technically, I am a paediatrician. Yes, it may sound ridiculous, but this is the only university degree I have completed, back in 2002. I have never practised medicine since then, except for the numerous pieces of medical advice I am (even now) asked for. Strangely enough, although I was a good student during my time at medical school, I completely abandoned this activity and was never (or better to say, almost never) sorry about it. Medical school is a good school, where I learned how to learn in general, for the only way to complete it is to learn everything by heart. This does not mean there is no logic in medicine -- surely there is -- but you are usually too short of time to grasp it while learning, and the understanding comes after you have stuffed everything into your head. That trains your brain to function well and to infer the logic in things which seemingly do not have any. This skill -- I don't know what to call it -- helps one later in one's career.

Move to Germany

After finishing medical school I didn't work a single day as a doctor. The week after my graduation I moved to Germany to start my PhD, which I pursued actively for 3 years before switching to IT without actually completing it. I am still on my way to finishing it, and everything looks optimistic except for free time. Ultimately, I need to revise my priorities somehow...

Study of Computer Science

So, since 2005 I have been pursuing my IT career as a database and software developer. In 2007 I enrolled at the Technical University Hamburg-Harburg, initially as a student of General Engineering Sciences (GIS, AIS in German), but during my first term I switched to Informatics and Engineering Sciences (IIW in German) and completed my basic studies (Grundstudium) early in 2009. Even though my course is called "Informatik-" in German, I am inclined to think that Computer Science would be an adequate name for it as well. It is just historically called informatics, from the time when everything that touched computers was called so.

This does not mean I started from scratch in 2005, not at all. I had some experience of application development for Windows before that: during my time as a medical student I developed several rather sophisticated Windows applications with Borland Delphi 3 and 5, which I used in my scientific work at the Department of Human Physiology of my medical school. We used them for data acquisition and modelling. That was already a good start; I knew the basics of OOP and had a broad overview of IT when I started, to say nothing of a tremendous interest in the whole area.

Career

I was lucky to find my first IT job rather quickly, thanks to my wife, who spotted the right ad on the pinboard during one of her sporadic visits to our university canteen. As of now I have over 2 years of experience as a database and application developer, and this experience motivated and helped me to obtain the certifications I hold now. After the first 1-2 months of euphoria about working as a developer, I finally got the feeling that I am in the right place now: I do something I was always eager to do, and I can do it well enough to be paid for it. That was a big relief after 10 years of pursuing somebody else's career, not mine. Even though I would probably have been a not-so-bad physician, I feel a sort of calling for informatics, and merely practising it makes me happy.

Microsoft Academic Program

Almost a year after my enrollment at the TUHH I came across an event arranged by Microsoft Student Partners. It was a 4-day workshop about a not yet officially released MS technology called Windows Presentation Foundation, held by two ordinary (i.e. undergraduate) students of TUHH, Pawel and Björn. In spite of my initial prejudice that a workshop prepared and held by students might not be of such high quality, it was absolutely great. I saw people who were really interested in the technology they were presenting and were able to infect others with their enthusiasm. This event became the key point which finally brought me to the Microsoft Academic Program.

The Microsoft Academic Program is just another student society, under the auspices of Microsoft, where students engaged with technology can pursue their certifications and a career as a trainer for Microsoft technologies. The flat hierarchy, so common in the US, was rather new for me here. And receiving benefits for things I would normally rather pay for myself was also a very nice surprise, to say the least. Thanks to this program I could start pursuing my certifications, develop my soft skills as a trainer and coach, and see something close to the overall goal of my career. And it is the Academic Program which inspired me to start this webpage and this blog.

In that respect you may consider me biased towards Microsoft technologies and solutions. That is only partially true. Yes, I am prone to use MS technologies because I know them and hold certifications for them, which means I have studied them thoroughly from books. But I am open to every new technology: I have an LPIC-1 certification (Linux Professional Institute), Linux is installed on every machine I use, and I develop a lot for Linux as well. But even there I try to use technologies originally suggested or developed by MS, like Mono, for example. So I may be just a little biased, not more, I admit it!

Why this blog?

Why start a new blog? This is not my first blog; I have been blogging for 7 years using the gradually degrading livejournal.com service. But that is a personal blog, where I write primarily about myself and about events which are directly related to me. Here I would like to have a sort of "career diary", blogging about events in IT which seem interesting to me and worth writing about.

An attentive reader would ask: why blog in English if it is not my primary language? Well, I am not going to blog only in English here. I do realize that some entries will require German, as I live in Germany and it would be ridiculous to blog in any other language about events which are intimately related to, and probably also confined to, Germany. I will, however, try not to use the other languages I speak, for it would hinder you from following me. I assume that every reader of my blog (and there are none at the moment :) ) understands enough English to get the information he or she is looking for; still, I will reply to every comment written in a language I can understand.

What is this blog about?

At the moment I am trying to become an expert in the area of database development and administration. I use .Net for application logic and MS SQL Server 2008 as RDBMS, and I will probably blog about them. I will also blog about the articles and books I read, about the wiki-dot.net webpage and about the wiki which is going to emerge here in the near future. I will also blog about anything else I think is interesting for you to read or useful to know.

But I will not blog about my personal life, about my wife and son, about my trips etc. If you want to know about that, you are welcome at my personal blog at livejournal. I also promise not to blog much about local topics which may be interesting for German readers only. I will try to keep a balance; let's see if I manage it.

The first entry is threatening to grow beyond any boundary of politeness, so I want to call it a day.

So, off we go!



About me

 Venice, 2009

Welcome to my blog!
My name is Alexander Galkin. I was born in 1979 in Kazan, Russia, where I graduated in paediatrics.
I have lived in Hamburg, Germany since 2001 and work as a freelance software and database architect and trainer for Microsoft technologies.

 Microsoft Certified Trainer   Microsoft Certified Professional Developer
 