Managed meets functional

Blog about programming and having fun with .Net

About me


Welcome to my blog!
My name is Alexander Galkin. I was born in 1979 in Kazan, Russia, where I graduated in pediatrics.
Since 2001 I have lived in Hamburg, Germany, and work as a freelance software and database architect and as a trainer for Microsoft technologies.

Microsoft Certified Trainer
Microsoft Certified Professional Developer
MCTS
MCITP


Updated version of GLSL parser by Laurent Le Brun to compile with FParsec 0.9.1

The GLSL parser by Laurent Le Brun is a gold nugget for anyone who wants to use FParsec to parse C-like languages. Since I plan to use this script as a starting point for my TouchDevelop parser, it was the first thing I looked at after fixing F# support in my Visual Studio 2010.

It turned out, however, that the script is somewhat outdated: written in 2010 (according to the date of the blog entry), it requires an earlier version of the FParsec library. Compiling it against the latest stable version of FParsec (0.9.1) fails because of some deprecated names (mostly abbreviations previously used in FParsec, like "Assoc", are now written in full). Besides, the syntax of the SyntaxParser generic constructor has been changed to include the UserState type.

I adapted the code by Laurent Le Brun; the working version of the script with minimal changes (listed below) can be downloaded from this blog entry.

Changes compared to the original version of the script:

  1. Assoc -> Associativity
  2. XOp -> XOperator, where X is one of Infix, Prefix, Postfix, Ternary (e.g. InfixOp -> InfixOperator)
  3. Parse -> GlslParser
  4. Ast -> GlslAst
I also put together a small VS2010 project containing the latest version of FParsec along with a sample GLSL script that is parsed automatically when run in debug mode.
Files:

glsl_parser.fs (8 Kb) 

VisualStudio 2010 solution with FParsec 0.9.1 and sample script (1.36 Mb)


Categories: .net | F# | parser
Permalink | Comments (0) | Post RSSRSS comment feed

Using LINQPad as scrapbook for FParsec

FParsec is a smart implementation of the famous Parsec library from Haskell.

FParsec belongs to the class of parser combinator libraries: instead of writing a formal grammar definition in a dedicated tool or IDE (as with ANTLR/ANTLRWorks), you deal with primitive parsers that consume strings, digits etc. and combine them in clever ways to implement your parser. The main advantage is that you have full control over what you are designing and can use the full power of the host language (in this case F#).
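To illustrate the combinator idea itself, here is a minimal C# sketch of a hypothetical mini-library. It is not FParsec's API (FParsec provides all of this out of the box, e.g. through parsers like pint32 and combinators like .>>. and |>>). In this sketch a parser is just a function from input and position to an optional result, and bigger parsers are built by combining small ones. The names Parser, Satisfy, Many1, Then and Map are illustrative choices, and the code assumes C# 7 or later for value tuples.

```csharp
using System;
using System.Text;

// Hypothetical mini-library (not FParsec's API): a parser reads 'input'
// starting at 'pos' and returns null on failure, or the parsed result
// together with the position right after the consumed text.
public delegate (T Result, int Next)? Parser<T>(string input, int pos);

public static class Combinators
{
    // Primitive parser: a single character that satisfies 'predicate'.
    public static Parser<char> Satisfy(Func<char, bool> predicate) =>
        (input, pos) => pos < input.Length && predicate(input[pos])
            ? (input[pos], pos + 1)
            : ((char, int)?)null;

    // Combinator: apply 'p' one or more times and concatenate the characters.
    public static Parser<string> Many1(Parser<char> p) =>
        (input, pos) =>
        {
            var text = new StringBuilder();
            int current = pos;
            for (var r = p(input, current); r != null; r = p(input, current))
            {
                text.Append(r.Value.Result);
                current = r.Value.Next;
            }
            return text.Length == 0 ? ((string, int)?)null : (text.ToString(), current);
        };

    // Combinator: run 'first', then 'second' on the remaining input.
    public static Parser<(A, B)> Then<A, B>(Parser<A> first, Parser<B> second) =>
        (input, pos) =>
        {
            var a = first(input, pos);
            if (a == null) return null;
            var b = second(input, a.Value.Next);
            if (b == null) return null;
            return ((a.Value.Result, b.Value.Result), b.Value.Next);
        };

    // Combinator: transform the result of a successful parse.
    public static Parser<U> Map<T, U>(Parser<T> p, Func<T, U> f) =>
        (input, pos) =>
        {
            var r = p(input, pos);
            if (r == null) return null;
            return (f(r.Value.Result), r.Value.Next);
        };

    // Built purely by combination: an unsigned integer ...
    public static readonly Parser<int> Integer =
        Map(Many1(Satisfy(char.IsDigit)), s => int.Parse(s));

    // ... and an expression of the form "a+b", evaluated to its sum.
    public static readonly Parser<int> Sum =
        Map(Then(Then(Integer, Satisfy(c => c == '+')), Integer),
            t => t.Item1.Item1 + t.Item2);
}
```

For example, Combinators.Sum("12+30", 0) yields the result 42 with the next position 5, while Combinators.Integer("abc", 0) returns null because no digit could be consumed.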

Since I had problems with F# in my main Visual Studio installation that I couldn't repair (here is the respective StackOverflow question), I decided to use my favorite tool LINQPad for learning and designing parsers, and it worked perfectly.

So, if you want to use LINQPad with FParsec, the only thing you have to do is add references to the FParsec DLLs. This is done via the application menu, where you add both FParsec.dll and FParsecCS.dll to the "additional references". You will find these DLLs after the first compilation of the FParsec source code.

For your convenience I am attaching an archive with pre-compiled DLLs to this post.

From now on you can use FParsec freely, just don't forget to open the FParsec namespace in your code.

I also adapted the examples from the FParsec tutorial to run in LINQPad. These samples expect the DLLs to be stored under the following path: D:\Dev\FParsec\DLL\

You can download the complete tutorial scripts and the samples provided with FParsec using the links below.

FParsec.dll and FParsecCS.dll (.zip, 176 kb) 

LINQ queries for tutorial and samples (.zip, 13 kb)


Categories: F# | parser

About Refactoring

I just answered a question on Programmers@SO and would like to post my answer here as well:

Q:

  1. How does [refactoring] takes place in the Software development process and how far it effects the system?
  2. Does Refactoring using these tools really speed up the process of development/maintenance?

A: First of all, depending on the target of refactoring, one can distinguish several types of it: code refactoring, database (schema) refactoring, refactoring of unit tests, refactoring of GUI etc.

There are several situations where you can encounter refactoring during software development:

  1. Refactoring is a mandatory step in certain agile development techniques like test-driven development, where a refactoring step is supposed to follow every implementation step. In this case the refactoring targets only the latest implementation, and its goal is to integrate the new code into the existing code base in the best possible way.

  2. Refactoring can be done when internal problems are detected in working code; such symptoms are called "code smells". This assessment is in many respects rather subjective, even though it can be based on certain code metrics (like the number of lines of code per method, the cyclomatic complexity of the code etc.). Here the goal of refactoring is to improve code quality by changing the code so that the metrics used for quality assessment return to the expected range.

  3. You often need to refactor your code so that it follows certain programming principles; look for Clean Code development to learn more about such principles.

  4. You may need to refactor your code and database schema to prepare them for upcoming changes, especially if those were not considered during the design phase of the project. For example, data normalization and denormalization often take place during data-driven software development to prepare the database for possible extensions.

Refactoring tools available on the market basically support the developer in two ways:

  1. While you write your code, you get suggestions on how to improve it "on the fly". Whereas many flaws can be detected directly by your IDE, like Visual Studio or Eclipse (for example dead code, variables declared but never used etc.), refactoring tools like ReSharper can reveal far less evident problems, like loops that could be rewritten as LINQ queries.

  2. These tools also support you with custom refactoring steps, like global renaming of your identifiers, splitting your class declarations into separate properly named files, extracting interfaces and base classes from your class implementation etc. They save a lot of work here, especially if your project has a large code base, but you must first know what you really want to refactor.
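To make the loop-to-LINQ rewrite from point 1 concrete, here is a minimal C# sketch of such a before/after pair. The task (squares of the even numbers) and the method names are illustrative assumptions, not an actual ReSharper suggestion; the point is only that both versions return the same result for the same input.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class RefactoringExample
{
    // Before: an explicit loop collecting the squares of the even numbers.
    public static List<int> EvenSquaresLoop(IEnumerable<int> numbers)
    {
        var result = new List<int>();
        foreach (int n in numbers)
        {
            if (n % 2 == 0)
                result.Add(n * n);
        }
        return result;
    }

    // After: the same logic expressed as a LINQ query (filter, then project).
    public static List<int> EvenSquaresLinq(IEnumerable<int> numbers) =>
        numbers.Where(n => n % 2 == 0).Select(n => n * n).ToList();
}
```

The LINQ version states the intent (filter, then project) directly, which is the kind of readability gain such a tool suggestion aims at.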

Actually, tools like ReSharper are so useful in everyday development that you become almost dependent on them: they really accelerate code writing, especially if you know how to use them appropriately!



Class diagram for CodeDom namespace in .Net

The CodeDom namespace in .Net is one of several ways to develop your own compiler or source code generator in .Net. Although somewhat abandoned, it currently supports the following languages:

  • C# (native, out-of-the-box),
  • VB.NET (native, out-of-the-box)
  • F# (via the CodeDom provider from the F# PowerPack)
  • IronPython

CodeDom provides you with classes to build your own abstract syntax tree of your code, which can either be compiled to "binary" (aka CIL) or translated "back" into a high-level general-purpose language from the list above. This explains why it is a rather complex namespace with many classes and a non-trivial class hierarchy. If you try to build a class diagram for this namespace, you come up with the following image:

Since it is practically impossible to work with this diagram, I prepared a set of diagrams covering large groups of classes.

So, here you have:

  1. A general overview of CodeDom namespace classes with "pruned" tree.

  2. Collections and enums.

  3. CodeStatement

  4. CodeType

  5. CodeExpression

     

So, feel free to use these diagrams in your work. There is no explicit license attached to these images (I didn't draw them by hand, I just used the VS2010 designer), so if in doubt, just consider my work to be in the public domain.
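To give an idea of what "building your own syntax tree" looks like in practice, here is a minimal C# sketch that assembles a namespace with one class and one method and translates it back into C# source via CSharpCodeProvider. The names Demo, Greeter and Hello are arbitrary illustrations. On .NET Core / .NET 5+ the System.CodeDom types come from the System.CodeDom NuGet package, and only source generation (not compilation) is supported there.

```csharp
using System.CodeDom;
using System.CodeDom.Compiler;
using System.IO;
using Microsoft.CSharp;

public static class CodeDomDemo
{
    public static string GenerateGreeterSource()
    {
        // Root of the tree: a compile unit containing the namespace "Demo".
        var unit = new CodeCompileUnit();
        var ns = new CodeNamespace("Demo");
        unit.Namespaces.Add(ns);

        // A class "Greeter" (CodeTypeDeclaration describes a public class by default).
        var greeter = new CodeTypeDeclaration("Greeter");
        ns.Types.Add(greeter);

        // A public, non-virtual method: string Hello() { return "Hello, CodeDom"; }
        var hello = new CodeMemberMethod
        {
            Name = "Hello",
            Attributes = MemberAttributes.Public | MemberAttributes.Final,
            ReturnType = new CodeTypeReference(typeof(string))
        };
        hello.Statements.Add(
            new CodeMethodReturnStatement(new CodePrimitiveExpression("Hello, CodeDom")));
        greeter.Members.Add(hello);

        // Translate the tree "back" into C# source text.
        using (var provider = new CSharpCodeProvider())
        using (var writer = new StringWriter())
        {
            provider.GenerateCodeFromCompileUnit(
                unit, writer, new CodeGeneratorOptions { BracingStyle = "C" });
            return writer.ToString();
        }
    }
}
```

Swapping CSharpCodeProvider for another CodeDomProvider (e.g. VBCodeProvider) would emit the same tree in that language, which is the main selling point of the namespace.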



How To: Improve the performance of Visual Studio UML and class diagram designers

If you feel that the Visual Studio 2010 UML or class diagram editor has become slow and not as responsive as it used to be, especially when you are working with many objects at a time, try closing the Properties window. Strangely enough, this window gets populated every time you select an object, and if you select many objects it shows only the properties they have in common. This is why the editor becomes so slow (even though it does not explain why populating the Properties window takes so long).


Categories: how-to

A wonderful collection of bit hacks, interesting for every programmer

Have you ever tried to count the bits in a bit-array structure without using shifts? Or perhaps used bitwise XOR for primitive but effective cryptography? Then you will definitely like the following web page, which provides an absolutely wonderful collection of non-trivial bit manipulations for achieving marvelous results. It is really a great place to read and to take away some pieces of bit manipulation magic.
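Two classic examples of the kind of tricks meant here, as a minimal C# sketch. These are standard textbook versions, not copied from the page referred to above: counting set bits without any shifts via Brian Kernighan's x & (x - 1) trick, and a toy XOR "cipher" showing that applying the same key twice restores the input. The class and method names are hypothetical, and the XOR part is an illustration, not real cryptography.

```csharp
public static class BitHacks
{
    // Kernighan's trick: x & (x - 1) clears the lowest set bit, so the loop
    // runs once per set bit and needs no shift operations at all.
    public static int CountBits(uint x)
    {
        int count = 0;
        while (x != 0)
        {
            x &= x - 1;
            count++;
        }
        return count;
    }

    // Toy XOR "cipher": since (b ^ k) ^ k == b, the same call encrypts and decrypts.
    public static byte[] XorWithKey(byte[] data, byte key)
    {
        var result = new byte[data.Length];
        for (int i = 0; i < data.Length; i++)
            result[i] = (byte)(data[i] ^ key);
        return result;
    }
}
```

For example, CountBits(11u) returns 3, because 11 is 1011 in binary.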

 



Object-Relational Mapping: a handy design pattern or a spoiling anti-pattern?

Today somebody asked the question about the nature of ORM in development: should we consider it a useful pattern or an anti-pattern.

Here are my thoughts on the topic (just copied and pasted from StackOverflow):

Actually, ORM helps you quickly implement database connectivity and your application logic without paying much attention to the actual connection to the database. You can use the entities of your programming language while implementing the logic, and you don't have to care about how these are then translated into the relational model of the database. This is the main advantage for me, and that is why ORM is so popular: you can develop a simple data-driven application in just a couple of hours.

So ORM, like many other technologies such as managed code, garbage collection, generics etc., is optimized for developer productivity, i.e. to minimize the number of (normally quite expensive) developer hours needed to implement certain functionality.

As soon as you have other criteria that override the one mentioned above, like performance, application size, flexibility of the logic, network throughput or code size (both source and compiled), ORM is no longer your friend. But since this is not a common scenario, people usually don't care and use ORM in their applications.



Why C++ is not good as the first programming language

This term I am teaching the course "Introduction into Programming with C++" for first-year students of Engineering Sciences.

The course was requested by the university and targets students with no prior knowledge of programming. The choice of C++ as the language of the course was not mine: it is the default language for teaching OOP at our university (Hamburg University of Technology), since the professor in charge does his research primarily in C++. So the decision to use C++ as the first language was more or less imposed on me by the dean's office.

During the very first session I gave the students several motivating examples. One of them was the classic task about the chessboard and the grains of wheat: a vizier, truly enchanted by the wonderful game of chess, asks its inventor to name a decent reward, and the inventor asks him to put one grain of wheat on the first square, two grains on the second square and so on, doubling each time, until every square is covered. This seemingly trivial task results in a very large number that is not so easy to compute directly with a brute-force approach (without deriving the summation formula for this geometric series).
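The summation formula alluded to above is the one for a geometric series with ratio 2:

```latex
\sum_{i=0}^{63} 2^i = \frac{2^{64} - 1}{2 - 1} = 2^{64} - 1 = 18\,446\,744\,073\,709\,551\,615
```

This is exactly the largest value of a 64-bit unsigned integer (ulong.MaxValue in C#), so the accumulating ulong in the C# version just fits without overflowing.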

So, we implemented this task in several languages. Here is the one-line implementation in F# (Microsoft's OCaml dialect for .Net), run in LINQPad, hence the Dump call:

let rice = 
     [0 .. 63] |> List.map (fun x -> 2I ** x)  |> List.sum |> Dump

The equivalent C# (and, with small changes, C++) code is much longer and requires you to keep an eye on many more things:

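void Main()   // LINQPad "C# Program" entry point
{
    ulong sum = 0;   // 2^0 + ... + 2^63 = 2^64 - 1 still fits into a ulong
    for (int i = 0; i < 64; i++)
    {
        sum += power2(i);
        sum.Dump();  // LINQPad: show each partial sum
    }
    sum.Dump();      // LINQPad: show the final result
}

// Define other methods and classes here

// Computes 2^power by repeated doubling
ulong power2(int power)
{
    ulong answer = 1;
    for (int i = 0; i < power; i++) answer *= 2UL;
    return answer;
}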

This is why I agree with the widespread opinion that computer science, and especially algorithms, should first be taught using a non-imperative language like Haskell, OCaml, Erlang, Scala or F#.



I will be at TechEd 2010 Europe in Berlin

Hi, everyone!

It has been quite some time since I wrote something in my blog, mostly because I was busy with my studies.

But today's news could not evade my blog: I will be attending TechEd 2010 in Berlin this year.

I will be working there as an invited MS Expert at the Silverlight booth.

So, if you happen to be there as well, come to our booth!


Categories: general

The Zen of live coding: Win7 Zoom feature and ZoomIt from Sysinternals!

If you have ever presented something to a developer audience, you have definitely had to show some features or code samples live. I often combine live coding with slide presentations, not only because it wakes people up, but also because it gives your presentation a professional and lively touch, helping you win over the audience even if the topic does not seem extremely interesting to most of them.

During these live shows I often feel the need to zoom into a certain area of my screen in order to emphasize the actions I am taking or the code and menu items I choose. The easiest way to achieve this if you present on someone else's laptop is to use the new Win7 zoom feature. By holding the Windows key (located between Ctrl and Alt on the left-hand side of the keyboard) and pressing "+" and "-" on the numeric keypad (known as "Grey Plus" and "Grey Minus") you can zoom in and out respectively.

There are several zoom levels you can reach by pressing these keys repeatedly, enabling you to focus on a tiny part of your screen. Your system remains fully responsive during and after zooming: you can just continue typing or choosing menu items, and the magnifier normally follows the mouse cursor (this feature is called "live zoom") unless explicitly set otherwise. To leave zoom mode, left-click on the magnifying glass and close the magnifier panel.

This built-in option equips you with a very handy presentation tool that works right out of the box. However, you might sometimes need a bit more than just zooming. During my live presentations I often need to freeze the screen content for a while and explain something in more detail. I was desperately looking for a free solution to help me here and found a very nice tool on the Sysinternals web page called ZoomIt.

This is a small single executable (about 500 KB) which has to be started manually and, once started, sits in your system tray watching for keystrokes. The default key mappings (Ctrl + 1..4) collide with the keyboard layout switch on my system, which is why I changed them; you may wish to do the same. Just right-click on the magnifying glass icon in the tray and choose "Options".

As you can already guess from the context menu, there are basically four different functions you can invoke by keystroke or from this context menu.

  1. Zoom

    Zoom works similarly to the Win7 zoom feature, except that the mouse pointer is no longer visible; the zoomed area, however, follows your mouse movements until the first mouse click, which reveals the reason for this strange behavior. Basically, zoom mode captures a screenshot of your desktop, and you zoom into this still, non-interactive image. In this mode you have two options: by clicking the right mouse button you return to the normal screen and can continue working; by clicking the left mouse button you enter drawing mode and can draw with your mouse pointer, shown as a small cross.

    By clicking the right mouse button you can go one step back and continue zooming around the screenshot of your desktop. You can return to normal working mode either by clicking the right mouse button twice or just by pressing Esc in drawing mode. There are multiple keystrokes you can use in drawing mode; you get the full description on the "Draw" tab of the "Options" dialog.

    Strangely enough, one keystroke is missing from this otherwise very detailed description (you will find it on the adjacent tab, though). If you press "t" in drawing mode you can enter any text (it will be printed in the color of your pen) and finish the text entry by pressing Esc. This helps you annotate screenshots quickly.

     

  2. Draw

    Draw mode is practically the same as the one described just above; the only difference is that you don't need to zoom first, and you get a screenshot of your complete desktop as the canvas for your artistic exercises.

  3. Timer

    This is a nice option if you deliver a talk or workshop with assignments (like implementing "Hello, World!"). To dim your presentation or IDE and to stress the importance of the exercise, you can start the timer, which counts down the time (10 minutes by default) until the deadline.

  4. LiveZoom

    Available only on Windows Vista and later, this mode completely mimics the zoom feature of Win7, i.e. you have a fully functional desktop you zoom into and can continue working (typing) while part of your desktop is shown magnified. Even though it may also be handy as an alternative to the built-in feature, I often feel trapped in it because it is not obvious how to leave this mode (the Esc key does not help). Just press the keystroke once more to go back.

In conclusion, I can only recommend keeping ZoomIt on your memory stick together with your PowerPoint presentation, so you can start this handy tool whenever you need to present something (no elevated privileges are needed to start it). With some practice you will be able to produce rather complex annotations on your desktop using this tool alone!


Categories: english | sysinternals

How to disable password request on logon in Windows 2000 – Windows7?

Password protection is an essential part of the security policy, and you should consider disabling it only as a last resort. A possible scenario is a home machine without domain membership that is used only at home by family members. Do not disable the password request on your laptop under any circumstances: you may want to configure your laptop not to re-request your password on wake-up, but the password is essential for keeping your private data secure against unauthorized access in case your laptop is lost or stolen.

So, in order to disable logon password request you have to do the following:

  1. Download the Autologon utility from the Sysinternals collection here:
    http://technet.microsoft.com/de-de/sysinternals/bb963905.aspx
  2. Start the tool with elevated privileges.
  3. Confirm the start with elevated privileges (using the secure desktop if you have UAC activated on Vista/Win7).
  4. Accept the license agreement.
  5. Enter your computer name for logon as a local user, or the domain name for domain logon.
    Then enter your username (local or domain) and password (shown as asterisks).
  6. Click "Enable" to enable autologon.

From now on your system will log on automatically.

To disable autologon in the future, either change the password of your account or restart the tool and click "Disable". A message box will show the new status of autologon (enabled or disabled).

This solution was tested on Windows XP Professional and Windows 7 Ultimate; reportedly it also works on Windows 2000.


Categories: sysinternals | english

Technorati relaunched! How to make your blog discoverable.

After reviving my blog with the Sysinternals entry, I went to Google Analytics to check what kind of users visit this blog. To my surprise, more people come looking for my German CV than for anything IT-related. That made me think it would be nice to register my blog somewhere like Technorati to attract more visitors from my target group.

Since the relaunch of Technorati there is no longer any need to add Technorati tags to every post: the site implemented what the IT world knows as "inversion of control", meaning that rather than submitting your posts with certain tags, you just post normally and the system tries to discover your blog posts and put them into the respective categories.

In order to make your blog "discoverable" you have to sign up (or sign in, depending on your current status at Technorati) and claim your blog. After filling in all necessary fields, such as the blog URL and feed URL, you are asked to put a code (PBFE7PQ3VWN9) provided by the page into your most recent blog entry to prove that you own the blog and to verify it.

As soon as the Technorati crawler verifies the token, your blog is scheduled for review, which may take some time.



How to measure the execution time of your program in Windows?

There are often cases where you need to measure the execution time of your program. Working in Linux, one gets used to the "time" utility, which reports the real (wall-clock), user and system (kernel) time of any executable when you simply put it in front of the complete command line (for instance "time du").

Unfortunately, a standard Windows installation has no such handy utility, and the "timeit.exe" solution that one finds on many Internet forums does not seem to work in all cases (at least it does not in my case of Windows 7 Ultimate). There is, however, a nice workaround if you want to measure the execution time of one or a few runs: Process Explorer from the Sysinternals toolset by Mark Russinovich. This solution does not help if you need to perform multiple performance test runs of your software; please refer to the Visual Studio Profiler in that case.

  1. So, first you need to download the latest version of Process Explorer from here:
    http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx
       
  2. Once you have downloaded it and accepted the license agreement, you can see the main screen.

  3. First, add the parameters you'd like to get for your program to the list of visible columns. For this purpose, go to View and choose Select Columns.

  4. Go to the "Process Performance" tab



    and select the counters you'd like to see in the general process list. In order to get the total execution time you'll need to tick the "CPU Time" option:



    Click OK to go back to the main screen, and you will see the new CPU Time column at the very right of the process table. The order of columns can easily be changed by clicking on a column caption and dragging it to the desired position.

    Note: You might need to enlarge your Process Explorer window to see the new column.
  5. Now you are almost ready to capture the information you need. If you start your process, immediately switch back to Process Explorer, find it in the list of running processes and wait until it terminates, you can read the total execution time in the CPU Time column. This works fine in many cases, but you may encounter two types of problems:
    1. If the process you start runs only briefly, you may miss it, because it has already finished by the time you have switched back to Process Explorer and found it in the list.
    2. Even if you manage to catch your process, the information about its total runtime is quickly purged from the process list.
  6. To solve these problems, let us increase the time a process is shown in the list after it has terminated. For this purpose, go to Options -> Difference Highlight Duration and set the duration to 9 seconds:



    Now the process will remain in the list for 9 seconds after its termination.



    That should be enough to write down its execution time. Now just start your program and read the execution time after it terminates. You can easily identify a process that has just terminated by its red background highlighting (on the screenshot you can see "pan.exe", the compiled verifier from Spin).
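If the code you want to time runs inside your own .Net program rather than being an arbitrary executable, a rough in-process counterpart of "time" can be sketched with Stopwatch (wall-clock time) and Process.TotalProcessorTime (user plus kernel CPU time). This is a minimal sketch under that assumption, not a replacement for Process Explorer; the class name Timing and the method Measure are hypothetical. Note that the CPU figure covers the whole process, including other threads such as the garbage collector.

```csharp
using System;
using System.Diagnostics;

public static class Timing
{
    // Runs 'action' and returns the elapsed wall-clock time together with the
    // CPU time (user + kernel) the current process consumed meanwhile.
    public static (TimeSpan Wall, TimeSpan Cpu) Measure(Action action)
    {
        using (Process self = Process.GetCurrentProcess())
        {
            TimeSpan cpuBefore = self.TotalProcessorTime;
            Stopwatch watch = Stopwatch.StartNew();

            action();

            watch.Stop();
            self.Refresh(); // discard any cached process information
            TimeSpan cpuAfter = self.TotalProcessorTime;
            return (watch.Elapsed, cpuAfter - cpuBefore);
        }
    }
}
```

A call like var (wall, cpu) = Timing.Measure(() => DoWork()); (DoWork being a placeholder for your own code) then gives you both figures, similar to the real and user + sys lines printed by "time".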


Categories: general | sysinternals

Office Palmistry and SELECT DISTINCT

Das wahre Office

As part of its support for the "Das wahre Office" campaign ("The authentic office"; search the Internet for "Ultimate Steal" to find a similar campaign in Ireland; in a nutshell, undergraduate students can obtain MS Office 2007 Ultimate for barely €50), the Microsoft Academic Program plans to arrange several lectures and workshops throughout Germany under the tagline "Office Themen-Tagen" ("Office Theme Days"). As a big and longtime fan of MS Office (starting as early as its version for DOS!) I am going to deliver several lectures on applied Office topics. I will mostly concentrate on MS Word, MS Excel and MS Access, whereas other Microsoft Student Partners will cover the rest of Office 2007 Ultimate Edition, except for the products with dubious end-user value (like InfoPath, for example).

Office Palmistry, Taming the Word

This campaign prodded me to produce several video tutorials for these MS products and probably to devote some more time to Office. So, just to contradict Shakespeare's "a rose by any other name would smell as sweet", I came up with a provocative and edgy title for the whole series of videos, "Office Palmistry", and for its first part, officially known as "Scientific Writing with Word 2007" but published under its working title "Taming the Word". Here you can read the script of the first video.

I haven't decided yet where to host these videos, and so far I am just learning how to produce videos under Vista. Meanwhile, the script of the very first tutorial is ready, and I am publishing it here in my blog. I will probably move it somewhere else in the future (do you have any suggestions?), especially since the screencast for it is already finished.

SELECT DISTINCT with several columns

As regards the pivotal topic of this blog: I am currently working on a rather big tutorial on the pseudo-SQL command SELECT DISTINCT (col1, col2, col3), col4, col5 FROM Table1 and on how to write queries that do what such a crippled SQL query is meant to do. I will try to publish it this week (since this is the only week I have off at the university).



Sets and set operations in .Net or "Why do we reinvent the wheel?"

Resurrection of Delphi

As a diligent subscriber of DotnetPro magazine (a monthly developer magazine in German), I read in the latest issue about the reincarnation of Delphi and Object Pascal on the .Net platform. Before .Net, Delphi was the language I had the most experience with, spending days and weeks trying to develop my own visual components and something worth showing to others. Delphi is still the IDE of choice if you need to program something that should be compiled into native code and run on a variety of Windows systems, from Win98SE on. In my last project, where I had to program the software controlling an acoustic modem in marine research, I had to use it again (in 2009, imagine!) to make sure that the final code would run on both the old and the new laptops they have.

Sets in .Net and C#

One of the articles in the current issue was about "mimicking Object Pascal sets under .Net" by Bernd Klaiber, where he describes a class he developed to get behaviour similar to what is called "sets" in Delphi. I was a little bit surprised to read about this implementation, and even more so by the recommendation to use this "generic implementation with overloaded operators", which is compared to the classical C# Flags. This is, by the way, the only point where I agree: the implementation offers much more functionality than Flags, but do we really need this functionality?

Classical implementation from 2002

First of all, there is a classical and well-known implementation of sets under .Net, which can be found and downloaded on CodeProject. The implementation (and the CodeProject article accompanying it) was published as early as 2002 and revised in 2004; it implements a whole hierarchy of Set classes, covering not only the classical set but also dictionary and hash sets. The implementation does not overload any operators, even though C# has supported operator overloading since its first version (and the latest version available on CodeProject is even implemented as a generic class).

Drawbacks and what we actually need

I used this implementation in one of my projects at work and found it not so handy, because you have to deploy two additional DLLs together with your code (which itself is just one stand-alone DLL), even if you use only the smallest part of the overall functionality. Besides, when I think of sets, I basically think of the following features: the ability to have a set of data that I can add to and remove from, and to check whether a certain element belongs to the set or not. Other operations like union or intersection are nice, but if I need them I would rather use something in the direction of SQL Server or an SQL provider. That's why, basically speaking, we need three set operations:

  • Add something to set.
  • Remove something from the set.
  • Check, if something belongs to the set or not.

Flags

For static values one should take the classical C# flags, which are nothing more than a bit-field enum with the possibility to check for certain bits. Here is a small extract from my C# workshop demo project to illustrate the idea:


using System;

[Flags]
enum DaysOfWeek : byte
{
    Monday = 1, Tuesday = 2, Wednesday = 4, Thursday = 8,
    Friday = 16, Saturday = 32, Sunday = 64
}

static class FlagsDemo
{
    public static void FlagsTest()
    {
        DaysOfWeek workingDays = DaysOfWeek.Monday | DaysOfWeek.Tuesday
            | DaysOfWeek.Wednesday | DaysOfWeek.Thursday | DaysOfWeek.Friday;
        DaysOfWeek today = DaysOfWeek.Monday;
        DaysOfWeek day1 = today;
        day1 |= DaysOfWeek.Sunday;     // add a day (set its bit)
        day1 |= DaysOfWeek.Tuesday;    // add another day
        day1 &= ~DaysOfWeek.Saturday;  // remove a day (clear its bit)
        if ((today & workingDays) != 0) // membership check (test the bit)
            Console.WriteLine("Today is " + today + ", working day");
        else
            Console.WriteLine("Today is " + today + ", holiday");
    }
}
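Since .Net 4.0 the Enum type also offers HasFlag for the membership check. The following minimal C# sketch repeats the DaysOfWeek enum from the extract (with a None = 0 member added here for completeness); the helper names ContainsAll and ContainsAny are hypothetical. Note the semantic difference: HasFlag requires all bits of its argument to be set, whereas the bitwise & test in the extract succeeds if any bit is shared. For a single day, both give the same answer.

```csharp
using System;

[Flags]
public enum DaysOfWeek : byte
{
    None = 0, Monday = 1, Tuesday = 2, Wednesday = 4, Thursday = 8,
    Friday = 16, Saturday = 32, Sunday = 64
}

public static class FlagsHelpers
{
    public const DaysOfWeek WorkingDays =
        DaysOfWeek.Monday | DaysOfWeek.Tuesday | DaysOfWeek.Wednesday |
        DaysOfWeek.Thursday | DaysOfWeek.Friday;

    // Enum.HasFlag: true only if ALL bits of 'flag' are set in 'set'.
    public static bool ContainsAll(DaysOfWeek set, DaysOfWeek flag) => set.HasFlag(flag);

    // The bitwise test from the extract: true if ANY bit of 'flag' is set in 'set'.
    public static bool ContainsAny(DaysOfWeek set, DaysOfWeek flag) => (set & flag) != 0;
}
```

For example, ContainsAny(WorkingDays, Monday | Sunday) is true, while ContainsAll(WorkingDays, Monday | Sunday) is false.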

What if we need more?

For some reason unknown to me, surprisingly few people know that there has actually been a full-fledged set class in .Net since version 3.5. This is the HashSet<T> class, which implements all the methods we need to manipulate elements and check for them. It supports .Add(), .Remove() (together with .Clear()) and .Contains(), the three basic operations I have ever needed from a set. Besides, it also supports many real set operations like .ExceptWith() or .Overlaps(), which were so arduously implemented in the article.
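A minimal C# sketch of the three basic operations plus two of the "real" set operations on HashSet<T>, mirroring the Flags example. Using System.DayOfWeek instead of a custom enum and the class name HashSetDemo are choices made for this illustration only.

```csharp
using System;
using System.Collections.Generic;

public static class HashSetDemo
{
    public static readonly HashSet<DayOfWeek> WorkingDays = new HashSet<DayOfWeek>
    {
        DayOfWeek.Monday, DayOfWeek.Tuesday, DayOfWeek.Wednesday,
        DayOfWeek.Thursday, DayOfWeek.Friday
    };

    // Add and Remove: start with Monday, add Sunday and Tuesday, and remove
    // Saturday (a no-op here; Remove simply returns false).
    public static HashSet<DayOfWeek> MyDays()
    {
        var days = new HashSet<DayOfWeek> { DayOfWeek.Monday };
        days.Add(DayOfWeek.Sunday);
        days.Add(DayOfWeek.Tuesday);
        days.Remove(DayOfWeek.Saturday);
        return days;
    }

    // Contains: the third basic set operation.
    public static bool IsWorkingDay(DayOfWeek day) => WorkingDays.Contains(day);

    // A "real" set operation: the days in 'days' that are not working days.
    public static HashSet<DayOfWeek> DaysOff(HashSet<DayOfWeek> days)
    {
        var result = new HashSet<DayOfWeek>(days); // copy, since ExceptWith works in place
        result.ExceptWith(WorkingDays);
        return result;
    }
}
```

Here MyDays() contains Monday, Sunday and Tuesday, MyDays().Overlaps(WorkingDays) is true, and DaysOff(MyDays()) leaves only Sunday.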

When I switched to this class in a newer version of my project at work, the total project size shrank from about 800 KB to 16 KB (imagine!), and I didn't have to deploy two separate DLLs with my project anymore (that was great!). The only method I really missed was an .AddAll(T[] items) method, which would allow me to add a whole array of elements to the set at once without writing a loop. That's why I quickly concocted the following class:


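    using System;
    using System.Collections.Generic;

    // A HashSet<string> with a convenience method for adding a whole array at once
    class SetOfStrings : HashSet<string>
    {
        // Adds every element of newItems (duplicates are silently ignored).
        // Returns false if adding failed, e.g. because newItems is null.
        public bool AddAll(string[] newItems)
        {
            try
            {
                foreach (string newItem in newItems)
                {
                    base.Add(newItem);
                }
            }
            catch (Exception)
            {
                return false;
            }
            return true;
        }
    }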

Alas, this new class was no longer generic (I did not care, because my set was strongly typed), but it implemented the functionality I needed. Apparently this was the only difference between the CodeProject class and HashSet<T>, since the rest of the code just worked without further changes. Now I use this code in several projects I work on, and I am fully satisfied with its execution speed and flexibility.
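The subclass is not strictly necessary. HashSet<T> already provides UnionWith(IEnumerable<T>), which adds every element of a sequence in one call. If a method named AddAll is still wanted, an extension method keeps it generic without subclassing. Here is a minimal sketch. The class name HashSetExtensions and the int return value (the number of newly added elements) are my own hypothetical choices, not part of the original code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: a generic AddAll for any HashSet<T>, without subclassing
static class HashSetExtensions
{
    // Adds all items to the set and returns how many of them were actually new
    public static int AddAll<T>(this HashSet<T> set, IEnumerable<T> items)
    {
        if (set == null) throw new ArgumentNullException(nameof(set));
        if (items == null) throw new ArgumentNullException(nameof(items));

        int added = 0;
        foreach (T item in items)
        {
            if (set.Add(item)) added++;  // Add returns false for duplicates
        }
        return added;
    }
}
```

Suppose a set already contains "Monday". Calling days.AddAll(new[] { "Monday", "Tuesday", "Friday" }) on it adds only the two new days and returns 2. If the count is not needed, days.UnionWith(...) does the same job with the built-in API.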



Brosius, Scheerer, Wolff: Business Intelligence mit Office 2007 und SQL Server


Business Intelligence mit Office 2007 und SQL Server
Data Mining und Datenanalyse mit Excel, SharePoint und SQL Server
By: Brosius, Gerhard / Scheerer, Benjamin / Wolff, Ulrich

328 pages
Published by Microsoft Press, 01/2009 (hardcover)
ISBN-10: 3-86645-637-9
ISBN-13: 978-3-86645-637-2
Price: €49.90 (when ordered from the MS Press website, order no. MS-5637)

This was the second MS Press book by this team from Hamburg that I ordered to prepare for my talk at the University of Hamburg. The first book was published in mid-2006 under the title "Business Intelligence und Reporting mit Microsoft SQL Server 2005". It was, however, aimed primarily at developers and at database and data warehouse administrators who want to deepen their knowledge of MS SQL Server 2005.

This book, by contrast, targets managers and administrative staff. In other words, it mainly addresses the people in a company who are responsible for business decisions. As the authors themselves write in the introduction: "The book is aimed at readers who have to analyze and evaluate data in their everyday work." Everything is explained with well-chosen examples. The book does not go very deep into the analysis running in the background, but it presents a very broad spectrum of common business situations.

The book assumes that MS SQL Server 2005, MS Office 2007 and the Data Mining Add-In for Office 2007 are installed. A default ("out-of-the-box") installation of MS SQL Server is fully sufficient to work through the examples in the book yourself. Although the cover does not say so explicitly, the book assumes MS SQL Server 2005 and the add-in for that server version. If you use SQL Server 2008 for Analysis Services, be careful: some forms in that version of the Office add-in have already been reworked and generally offer a better overview and more analysis options. The analysis results will also differ slightly in places from what the book shows, which can only be attributed to the server version. For example, the data is now automatically partitioned for training and validation. There is no longer any need to partition it manually before the analysis: this is simply configured in the analysis form and done by the add-in in the background. So if you cannot split the data following the procedure described in the book, don't be surprised; simply skip that section while reading.

Experience with Excel is desirable but not strictly required. The necessary concepts are explained in detail, and the book contains many figures that make almost every step easy to follow visually. Moreover, the first part, consisting of three chapters, deals with data analysis in Excel and explains basic concepts such as Excel tables, Excel charts and PivotTables thoroughly and with illustrations. An Excel file with all the examples used in the book can be downloaded from the authors' website. This part is a perfect introduction to data analysis with Excel, and I recommend it to everyone who uses Excel daily at work.

The second part teaches the basics of data mining and systematically goes through the individual analysis options of the Office add-in. Here, however, the book mostly takes a "buttons and forms" approach. It does not explain the question being asked or how to solve the problem; it simply explains the functions behind the buttons step by step. Readers who are not yet familiar with data analysis should therefore read another book first. Those who want a quick start with the Office Data Mining Add-In, however, will be satisfied.

This part also frequently tries to work out a specific strategy for data analysis, but the attempt is not always followed through consistently. Moreover, there is an embarrassing error here that unfortunately undermines the whole logic of the data analysis. It concerns chapter 5, "Data Mining Tools". Page 189 shows an accuracy chart for a classification model (figure 5.14) in which all three lines of the chart lie on top of each other. An attentive reader immediately doubts this figure. If the ideal model cannot be distinguished from random prediction, there is no point in building any model at all; you might as well take the random model and apply it.

I rebuilt the model myself, and surprisingly, the accuracy chart looked quite different. Comparing the two models makes the original error clear. On page 188, step 3, where the column to predict is chosen, the book says: "However, instead of the suggested value 0, enter 1 as the value to predict." Yet the figure in the book shows under the chart title: predicted column "Kunde = 0". This also explains the result of the accuracy check: the value 0 occurs in almost 100% of the rows of the original data set, so a random model fits best. Unfortunately, this breaks the whole logic of the subsequent model development. The data selection (downsampling) proposed in the book does not improve the model's accuracy; on the contrary, it makes the model's predictive power somewhat worse. I have already reported the error to the authors, and it will hopefully be corrected in the next edition.
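A little arithmetic shows why the three lines coincide. This sketch assumes the usual cumulative-gains convention for such charts. The x axis is the share of the population examined, ordered by predicted probability, and the y axis is the share of all rows with the predicted value found so far. Here p denotes the share of rows that have the predicted value. The numbers are illustrative and not taken from the book.

```latex
% random model: the target rows are spread evenly over the population
y_{\text{random}}(x) = x, \qquad 0 \le x \le 1
% ideal model: all target rows come first, so they are exhausted at x = p
y_{\text{ideal}}(x) = \min\!\left(\frac{x}{p},\, 1\right)
% predicting the common value "Kunde = 0", e.g. p = 0.98:
y_{\text{ideal}}(0.5) = \frac{0.5}{0.98} \approx 0.51 \quad\text{vs.}\quad y_{\text{random}}(0.5) = 0.5
% as p approaches 1, the ideal line collapses onto the random line:
\lim_{p \to 1} y_{\text{ideal}}(x) = x = y_{\text{random}}(x)
% predicting the rare value 1 instead, e.g. p = 0.02:
y_{\text{ideal}}(0.02) = 1 \quad\text{vs.}\quad y_{\text{random}}(0.02) = 0.02
```

A model that is at least as good as random guessing lies between these two lines. With p close to 1, there is practically no room left between them, which matches what figure 5.14 shows.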

The third and final part of the book covers how MS Excel and SharePoint 2007 work together. It explains Excel Services within a SharePoint server and discusses business intelligence with this server. Unfortunately, I am not an expert in this field, so I cannot properly evaluate this content.

All in all, the book leaves a good and solid impression. The only thing I find odd is its publication in the MS Press "Fachbibliothek" series, because a large part of the target audience may easily overlook it there. In my opinion, a stand-alone edition or one within an Office series would have been much more appropriate.

The book can be ordered directly from the MS Press website.


Categories: book | database | german

Off we go!! // On your marks! Get set! Go!

Everything seems to be ready: the server works, the blog engine is correctly installed, and it is high time to write the first blog entry. But what should I write here? Do I need to write anything here at all? A difficult question. At least you don't need to read it, that's good. :) Let me just briefly describe my motivation for starting this blog. You may find it boring if you keep reading, so be warned and don't complain afterwards!

Who am I?

Good question; I wish I knew. Basically, I can only tell you with some degree of certainty what I am, and with even more certainty what I am not.

I was born many years ago in Kazan, Russia. At that time it was still the Soviet Union, and I experienced some of the benefits and disadvantages of that era. I was 12 when the regime fell, and the time immediately after this event will always remain fresh in my mind. I have never felt so uncertain about what was coming and what I was going to become as during that time in Russia, when overnight we lost our former identity and got nothing in return. But that time is over, and it may never come again!

Medical School in Russia

So, technically, I am a child doctor. Yes, it may sound ridiculous, but this is the only university degree I have completed, back in 2002. I have never practised medicine since then, except for the numerous pieces of medical advice I am (even now) asked for. Strangely enough, although I was a good student in medical school, I completely abandoned this activity and was never (or, better said, almost never) sorry about it. Medical school is a good school, where I learned how to learn in general, because the only way to get through it is to learn everything by heart. This does not mean there is no logic in medicine (surely there is), but you are usually too short on time to grasp it while learning, and understanding comes only after you have stuffed everything into your head. This trains your brain to work well and to infer logic in things that seemingly have none. This skill, whatever it should be called, helps you further in your career.

Move to Germany

After finishing medical school, I didn't work a single day as a doctor. The week after my graduation I moved to Germany to start my PhD, which I pursued actively for three years before switching to IT without completing it. I am still working on finishing it, and everything looks optimistic except for my free time. Ultimately, I need to revise my priorities somehow...

Study of Computer Science

So, since 2005 I have been pursuing my IT career as a database and software developer. In 2007 I enrolled at the Technical University Hamburg-Harburg, initially as a student of General Engineering Sciences (GIS, AIS in German). During my first term I switched to Informatics and Engineering Sciences (IIW in German) and completed my basic studies (Grundstudium) early in 2009. Even though my course is called "Informatik-" in German, I tend to think that Computer Science would be an adequate name for it as well. It is called informatics for historical reasons, dating from a time when everything related to computers was called that.

This does not mean I started from scratch in 2005, not at all. I already had some experience in application development for Windows. During my time as a medical student, I developed several rather sophisticated Windows applications with Borland Delphi 3 and 5, which I used in my scientific work at the Department of Human Physiology of my medical school. We used them for data acquisition and modelling. That was already a good start: when I began, I knew the basics of OOP and had a broad view of IT, to say nothing of my tremendous interest in the whole field.

Career

I was lucky to find my first IT job rather quickly, thanks to my wife, who spotted the right ad on the pinboard during one of her sporadic visits to our university canteen. As of now, I have over two years of experience as a database and application developer, and this experience motivated and helped me to obtain the certifications I now hold. After the first one or two months of euphoria, I finally felt that I was in the right place. I was doing something I had always been eager to do, and I could do it well enough to be paid for it. That was a big relief after ten years of pursuing somebody else's career, not mine. Even though I was probably not a bad physician, I feel a sort of calling for informatics, and merely practising it makes me happy.

Microsoft Academic Program

Almost a year after my enrollment at TUHH, I came across an event arranged by Microsoft Student Partners. It was a four-day workshop about a not yet officially released MS technology called Windows Presentation Foundation, held by two ordinary (i.e. undergraduate) TUHH students, Pawel and Björn. Despite my initial prejudice that a workshop prepared and held by students might not be of very high quality, it was absolutely great. I saw people who were really interested in the technology they were presenting and were able to infect others with their enthusiasm. This event became the turning point that finally brought me to the Microsoft Academic Program.

The Microsoft Academic Program is a student society under the auspices of Microsoft, where technology-minded students can pursue certifications and a career as a trainer for Microsoft technologies. The flat hierarchy, so common in the US, was rather new to me. And receiving, for free, things I would normally have paid for myself was a very nice surprise, to say the least. Thanks to this program, I could start pursuing my certifications, develop my soft skills as a trainer and coach, and see something close to the overall goal of my career. And it was the Academic Program that inspired me to start this webpage and this blog.

In that respect, you may consider me biased toward Microsoft technologies and solutions. That is only partially true. Yes, I tend to use MS technologies because I know them and hold certifications for them, which means I have studied them thoroughly from books. But I am open to every new technology. I hold an LPIC-1 certification (Linux Professional Institute), Linux is installed on every machine I use, and I develop a lot for Linux as well. But even there I try to use technologies originally proposed or developed by MS, like Mono, for example. So I may be a little biased, but no more than that, I admit it!

Why this blog?

Why start a new blog? This is not my first blog; I have been blogging for seven years on the gradually degrading livejournal.com service. But that is a personal blog, where I write primarily about myself and events directly related to me. Here I would like to keep a sort of "career diary", blogging about IT events that seem interesting to me and worth writing about.

An attentive reader might ask: why blog in English if it is not my native language? In fact, I am not going to blog only in English here. Some entries will require German, since I live in Germany, and it would be ridiculous to write in any other language about events that are closely tied to, and probably confined to, Germany. I will, however, try not to use any other languages I speak here, since that would keep you from following me. I assume that every reader of my blog (and there are none at the moment :) ) understands enough English to get the information he or she is looking for. Still, I will reply to every comment written in a language I understand.

What is this blog about?

At the moment I am trying to become an expert in database development and administration. I use .Net for application logic and MS SQL Server 2008 as the RDBMS, and I will probably blog about them. I will also blog about the articles and books I read, and about the wiki-dot.net webpage and the wiki that is going to appear here in the near future. I will also blog about anything I find interesting to read or useful to know.

But I will not blog about my personal life, my wife and son, my trips and so on. If you want to know about those, you are welcome to visit my personal blog at livejournal. I also promise not to blog much about local topics that may interest German readers only. I will try to strike a balance; let's see if I manage it.

This first entry threatens to grow beyond all bounds of politeness, so I will call it a day.

So, off we go!


Categories: english | general