Shareware Beach

Wednesday, 26 October 2005

A New Era of Computing

Filed under: Software Development — Jan @ 10:31

With the limited release of the EditPad Pro 6 beta, JGsoft has entered a new era of computing. Previously, all our products were designed for computers with a single processing core (CPU). Almost all present Windows applications and utilities fall into this category, because multi-processor systems used to be very expensive. Putting multiple CPUs into a single computer only made–and still makes–sense for high-end workstations and servers.

The new PC I got a few months ago has a Pentium D 820 CPU. I got this system because it was the cheapest dual-core system locally available at the time. In fact, the Pentium D 820 was cheaper than Intel’s fastest single-core Pentiums.

Soon, run-of-the-mill computers and even notebooks will come equipped with multi-core CPUs. Intel has announced quad-core server CPUs for 2006, and quad-core mainstream CPUs for 2007. If Moore’s law continues to hold true–the number of transistors that can be put cost-effectively into a single CPU doubles every 18 months–we’ll likely see the number of CPU cores increase further in the future.

The significance of this is not that we’ll have faster computers. In fact, when running the applications I have, my 2.8 GHz Pentium D 820 isn’t much faster than the 2.4 GHz Pentium 4 I bought three years ago. The reason is that applications designed with a single active thread in mind don’t take advantage of the extra CPU core.

I’ve made a screen recording of EditPad Pro 5 opening a huge file. The example is somewhat artificial, since few people put a million lines of source code into a single file. But it makes the situation easier to observe in the Windows Task Manager. As you’ll see in the movie, EditPad Pro 5 “goes stupid” the whole time while it scans the file for line breaks and applies syntax coloring. All the while, CPU usage is pegged at 50%. Both cores share the load, but it’s only 50% since there’s only one active thread. The other running threads–the Windows Task Manager and other OS threads–don’t use any measurable amount of CPU time. But they do cause EditPad Pro’s thread to be switched between the cores.

The significance of affordable multi-core systems is that consumers will become increasingly intolerant of applications that “go stupid”, particularly when the application isn’t even using 100% CPU time. An application can always use one CPU core for foreground GUI handling, while lengthy operations run in the background.

EditPad Pro 6 will use up to 4 threads. The main thread handles all GUI and editing tasks, while 3 background threads take care of finding line breaks and word wrapping, applying syntax coloring, and building the file navigation tree.

Even this modest amount of threading makes a dramatic difference, as you can see in the recording of EditPad Pro 6 opening a huge file. The file appears instantly, complete with syntax coloring. The only time you’d have to wait is when you want to jump to the end of the file and EditPad Pro 6 hasn’t finished scanning for line breaks yet. If syntax coloring isn’t done yet, the end of the file is simply displayed without coloring. (Since Pascal supports multi-line comments, the whole file has to be colored.) The movie clearly shows that the line break scanning and syntax coloring threads each tie up one CPU core. The foreground thread doesn’t use much CPU, since I’m simply scrolling through the file.

Making your application multi-threaded isn’t enough. My first attempt at separating line break scanning into its own thread worked wonderfully on a single core CPU. But on a dual core system, CPU usage was still pegged at 50% while scrolling and line break scanning at the same time. The reason is that while my benchmark app had two threads, they weren’t running simultaneously.

Whenever the foreground thread had to repaint the screen, it would block the line scanning thread completely. Doing that is fine on a single core computer, but kills performance on a multi core system. The final solution uses critical sections to only block either thread for the smallest amount of time when line break information is updated.
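
The idea of blocking either thread only while line break information is updated can be sketched roughly as follows. This is a minimal Python illustration, not EditPad Pro's Delphi code: the names `LineIndex`, `scan_line_breaks` and `chunk_size` are hypothetical, and `threading.Lock` stands in for a Windows critical section. CPython's interpreter lock limits true parallelism, so the sketch only shows the lock granularity, not the speedup.

```python
import threading


class LineIndex:
    """Shared list of line-start offsets, guarded by a lock (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._starts = [0]  # offset 0 always starts the first line

    def extend(self, new_starts):
        # Hold the lock only while publishing results, never while scanning.
        with self._lock:
            self._starts.extend(new_starts)

    def snapshot(self, first, count):
        # The foreground thread copies just the slice it needs, then lets go.
        with self._lock:
            return self._starts[first:first + count]

    def line_count(self):
        with self._lock:
            return len(self._starts)


def scan_line_breaks(text, index, chunk_size=4096):
    """Background worker: find line starts one chunk at a time."""
    for base in range(0, len(text), chunk_size):
        chunk = text[base:base + chunk_size]
        # Scanning happens outside the lock, so readers are not blocked.
        found = [base + i + 1 for i, ch in enumerate(chunk) if ch == "\n"]
        if found:
            index.extend(found)


text = "line\n" * 10000
index = LineIndex()
worker = threading.Thread(target=scan_line_breaks, args=(text, index))
worker.start()
visible = index.snapshot(0, 25)  # the foreground can read while the scan runs
worker.join()
```

The scanner does its expensive work on a private chunk and takes the lock only to append the results, so a repaint that calls `snapshot` waits for at most one short list update instead of the whole scan.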

In the end, what matters is perceived speed. How many seconds of CPU time your app clocks up in the task manager doesn’t matter. How many seconds your customers spend waiting for your software is what it’s all about. The ideal is obviously not to make them wait at all.

In EditPad Pro 5, performance of the syntax coloring mechanism was vital. If it was slow, users would spend time waiting for it. Therefore, EditPad Pro 5 came with a lot of built-in coloring schemes for various popular file formats and programming languages. I coded these schemes directly into the Delphi source code, so they’d run as fast as possible. By contrast, user-contributed schemes used a system based on regular expressions, which precluded many of the assumptions and optimizations the built-in schemes could make. It also made it impossible for users to customize the built-in schemes other than choosing the color palette.

In EditPad Pro 6, syntax coloring performance is irrelevant, as long as it’s reasonable. If a scheme is too complex or a file too large for the syntax coloring to keep up, the coloring will simply disappear temporarily until it’s done. As a result, all syntax coloring schemes included with EditPad Pro 6 now use the regex-based custom scheme system, making them flexible and easy to customize. Though EditPad Pro 6 needs significantly more CPU time for syntax coloring, it feels much faster since you don’t have to wait for it, and it’s still fast enough to keep up while you’re scrolling. Only quickly hitting Ctrl+End on a large file or using a ridiculously complicated scheme (some users love ‘em) might make it disappear for a while.

When benchmarking your application, think about human time rather than CPU time, and parallelize those tasks that users shouldn’t have to wait for.
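
One way to benchmark "human time" is to measure how long the foreground thread is blocked, rather than how much work gets done in total. The Python sketch below is only an illustration under assumptions: `slow_task` and `blocked_time` are hypothetical names, and a 0.2 second sleep stands in for a lengthy operation such as scanning a large file.

```python
import threading
import time


def slow_task(seconds):
    # Stand-in for a lengthy operation the user should not have to wait for.
    time.sleep(seconds)


def blocked_time(run_in_background):
    """Seconds the foreground thread is blocked before it can respond again."""
    start = time.perf_counter()
    if run_in_background:
        worker = threading.Thread(target=slow_task, args=(0.2,))
        worker.start()  # returns almost immediately
        waited = time.perf_counter() - start
        worker.join()  # cleanup only; not counted as user waiting
    else:
        slow_task(0.2)
        waited = time.perf_counter() - start
    return waited


inline_wait = blocked_time(False)
background_wait = blocked_time(True)
print(f"inline: {inline_wait:.3f}s, background: {background_wait:.3f}s")
```

The total work is the same in both cases. Only the time the foreground spends unable to respond differs, and that is the number the user experiences.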

Tuesday, 11 October 2005

Delphi 2006 and Beyond

Filed under: Software Development — Jan @ 18:34

Borland has announced Delphi 2006. The most significant difference from the 2005 version will be that C++Builder is now part of the package, integrated into a single IDE with Delphi (Win32 and .NET) and C#. All editions of Delphi include all 4 language platforms.

Also new is that the Professional and Enterprise editions will now contain key parts of ECO, Borland’s Enterprise Core Objects modelling framework for Delphi for .NET and C#. Previously, ECO was only available in the expensive Architect edition. ECO looks like it should make database development a lot easier, by automatically generating Delphi or C# code based on the models you create.

The Delphi roadmap also looks promising, with a VCL version for Avalon (or whatever Microsoft calls it now) and native 64-bit Delphi and C++ compilers announced for 2007. Though some may claim that’s royally late, it seems right on time to me. I prefer that Borland take their time to get things right, rather than rushing an unfinished product to market, as happened with Delphi 2005. By 2007/2008, enough of our customers should have 64-bit systems to make the effort worthwhile. Don’t forget that while 64-bit hardware is common, 64-bit operating systems are not.

My new PC has a 64-bit CPU. Windows XP x64 runs just fine on it. The motherboard included 64-bit drivers for everything, and Windows automatically installed a Microsoft driver for the GeForce 6600 graphics card. The problem is external hardware. According to Canon’s web site, my old S820 printer will never be supported on x64. Drivers for the new MP780 all-in-one I bought are “coming soon”. Since I don’t have any 64-bit software, it doesn’t really matter. 32-bit applications get no benefit from a 64-bit OS.

The result is that I’m still using my trusty Windows 2000 installation, and will continue to do so until Vista arrives. At least it is good to know that all JGsoft products run just fine on XP x64.

Monday, 8 August 2005

CHM Files No Longer Work Across The Network

Filed under: Software Development — Jan @ 11:01

I’ve been getting a lot of complaints from HelpScribble users that their CHM files don’t work. It’s not a problem with HelpScribble, but with CHM files in general.

On Windows systems with the latest security patches installed, CHM files (compiled HTML Help files) can no longer be viewed across the network. If you open a CHM file that is stored on a network drive, you will see the table of contents and the index, but you won’t see the page content.

If you’re a shareware author, the story basically ends here. Make sure your application installs its CHM files on the local PC instead of the network by default. If you get a complaint that your CHM file shows up without content, by far the most likely cause is that it was installed on the network.

WinHelp (HLP) files suffer no such problems. If you use a help authoring tool like HelpScribble, you can switch between the CHM and HLP formats at the click of a button. If your software is primarily used in a networked environment, switching to HLP is probably a good idea.

The reason for this mess is that a CHM file basically consists of a bunch of HTML files that are rendered using Internet Explorer. This means that any kind of malware that works on a web page can also be put into a CHM file. It seems Microsoft has decided that CHM files are no longer to be trusted.

It is possible to change the Windows security settings so that CHM files can be viewed across the network. That’s something you can do on your own network. You can’t ask users of your shareware to do this, however, since the changes affect the security of the whole network, not just the limitations on CHM files.

Wednesday, 20 July 2005

Underscore vs Dash

Filed under: Software Development — Jan @ 9:40

Dave Collins wonders about the difference between the underscore and the dash. In particular, he wonders why search engines treat “my_page” as one word, and “my-page” as two.

Easy. Search engines are developed by programmers, and programmers generally treat the underscore as part of a word or identifier, but not the dash. In most programming languages, an identifier can start with a letter or an underscore, which can be followed by a number of letters, digits and underscores. The dash is used as the subtraction operator. In other words, “my_page” is an identifier, while “my-page” subtracts “page” from “my”.
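
As a small illustration of that distinction, here is how one language, Python, sees the two spellings. This is just a sketch using Python's own identifier rules and parser. Other languages differ in the details.

```python
import ast

# An underscore is a legal identifier character; a dash is not.
print("my_page".isidentifier())  # True
print("my-page".isidentifier())  # False

# Parsed as an expression, "my-page" is a subtraction of two names.
expr = ast.parse("my-page", mode="eval").body
print(type(expr).__name__, type(expr.op).__name__)  # BinOp Sub
print(expr.left.id, expr.right.id)  # my page
```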

In regular expressions, the short-hand character class \w matches letters, digits and the underscore. While the actual set of characters matched by \w is implementation-specific, it always includes the underscore. The difference is in matching non-English letters and digits. So if a search engine programmer uses \w+ to get a list of words, “my_page” is matched entirely, and “my-page” matches twice as “my” and “page”.
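
The word-splitting behavior is easy to check. The following is a minimal sketch using Python's `re` module as one example regex implementation. It does not represent how any particular search engine tokenizes text.

```python
import re

# \w+ matches runs of letters, digits and underscores.
word = re.compile(r"\w+")

print(word.findall("my_page"))  # ['my_page']
print(word.findall("my-page"))  # ['my', 'page']
```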

While I’m sure search engines differ in the way they treat the underscore, it’s a safe assumption that most of them will treat underscores as part of the word, while a dash connects two separate words.
