A set of factors is bringing Microsoft and mainstream parallel computing face to face, so there is a need to ensure that a robust and workable bridge exists between the two. That was the theme underpinning the recent Intel Parallel Computing Conference in Salzburg.
On the hardware side, the factors are the multicore processors from the likes of Intel and AMD. All servers sold by the major vendors now have four-core processors as standard, which means that, as the server upgrade cycle works its way through the user base, even the smallest user is equipped for a measure of parallel computing. With six-core devices becoming available at the server high end, and eight-core devices due next year, the next 12 months will see the parallel processing performance of bog-standard servers expand significantly.
And that is just in the traditional x86-architecture environment. There is widespread speculation and experimentation about the potential of so-called 'manycore' specialist devices such as Nvidia's graphics processors. These devices, with 32 or more cores per processor, are seen as a powerful alternative platform capable of running mainstream business services and applications for those users looking to take the parallel route seriously. Intel is well aware of this potential, and is already talking openly about its own upcoming graphics chip, codenamed Larrabee, as a contender for mainstream parallel processing as well. Published simulation results suggest this will have up to 32 x86-architecture cores available to start with.
On the software side, Microsoft is a major player in mainstream business applications and operating systems. While there have to be some doubts about its current capabilities in supporting and exploiting the potential of parallelism, neither Microsoft nor Intel is going to ignore the need for new tools that help applications developers move towards parallelism, both with new applications and with the adaptation of legacy ones. The need is, after all, already growing fast as eight-core processors, and potentially 32-core Larrabee devices, arrive next year.
It fell to James Reinders, Intel's Chief Software Evangelist and Director of Software Development Products, to outline the new package of development tools the company has brought together to help applications developers be more productive when parallelizing C++ applications using Visual Studio on Windows. Intel already has a suite of tools for developing C++ and Fortran applications on Linux and MacOS, such as the MPI Library, Cluster Toolkit, Thread Checker, VTune and the Math Kernel Library. Reinders describes these as tools for experts, and the need now is for tools that serve the rest of the world. This is the objective behind the new packaged set of tools, Parallel Studio, that Intel is formally introducing on May 26th. It combines updated versions of the existing tools with Visual Studio-related additions in a single integrated package.
According to Reinders, all tools for parallel applications development need to address two aspects: helping with correctness and helping with scaling. They should also provide the appropriate level of abstraction so that maintainability and future-proofing are ensured. This is particularly important in the mainstream, where operating systems, language compilers and applications code need to demonstrate a degree of platform independence as upgraded hardware is introduced.
Parallel Studio addresses these development issues, supporting Microsoft developers from the starting point of applications design (deciding where to start parallelizing) through to the tuning of the final program. Everything plugs in with Visual Studio 'very tightly', according to Reinders.
Parallel Studio includes a number of Visual Studio-specific component tools. Parallel Composer provides the coding and debug capabilities, with an optimised C++ compiler. It can also invoke Integrated Performance Primitives directly from the math library component, and it supports lambda functions.
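To illustrate why lambda support matters for parallel code, here is a minimal sketch of a loop body written as a lambda and handed to worker threads. It uses only standard C++11 threads, not Intel's compiler extensions or libraries, and the names `parallel_for_index` and `squares` are hypothetical helpers invented for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: applies body(i) to every index in [0, n),
// splitting the range into contiguous chunks, one per worker thread.
template <typename Body>
void parallel_for_index(std::size_t n, unsigned workers, Body body) {
    assert(workers > 0);
    std::vector<std::thread> pool;
    const std::size_t chunk = (n + workers - 1) / workers;  // ceiling division
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(n, w * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        if (begin == end) break;  // nothing left to hand out
        // The lambda captures its chunk bounds by value and runs on its own thread.
        pool.emplace_back([begin, end, &body] {
            for (std::size_t i = begin; i < end; ++i) body(i);
        });
    }
    for (auto& t : pool) t.join();  // wait for every chunk to finish
}

// Fills a vector with i*i; each index is written by exactly one thread,
// so no synchronisation is needed on the output.
std::vector<int> squares(std::size_t n, unsigned workers) {
    std::vector<int> out(n);
    parallel_for_index(n, workers, [&out](std::size_t i) {
        out[i] = static_cast<int>(i * i);
    });
    return out;
}
```

The point of the lambda is that the loop body stays inline at the call site, rather than being moved into a separate function object just to satisfy the threading interface.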
Parallel Inspector helps with determinism issues such as data races and deadlocks. This is an updated version of Intel Thread Checker that adds a memory checker capable of handling threading in a parallel environment. It is designed to help identify problem areas when parallelising existing applications that have worked correctly in serial form but fail in a parallel environment because of memory problems. It is particularly useful for identifying memory leaks and threading errors, which can be a real problem in parallelising applications. As well as identifying the source of a threading problem, it also locates and displays the relevant source code.
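For readers unfamiliar with the term, a data race of the kind such a checker looks for can be sketched in a few lines. The example below is an illustration in standard C++11, not output from or input to Parallel Inspector, and `guarded_count` is a hypothetical name. It shows the corrected form; the comment describes what goes wrong if the lock is removed.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Counts to workers * increments using one counter shared by all threads.
// Without the mutex, ++total from several threads would be a data race:
// the read-modify-write steps can interleave and updates can be lost.
long guarded_count(unsigned workers, long increments) {
    assert(increments >= 0);
    long total = 0;
    std::mutex total_mutex;
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&total, &total_mutex, increments] {
            for (long i = 0; i < increments; ++i) {
                // The lock makes each increment indivisible with respect
                // to the other threads.
                std::lock_guard<std::mutex> lock(total_mutex);
                ++total;
            }
        });
    }
    for (auto& t : pool) t.join();
    return total;
}
```

Code like this often works by luck in testing when the lock is missing, which is why a serial program that "worked OK" can start producing wrong answers once it is threaded.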
Parallel Amplifier is designed to identify bottlenecks and can show what in the source code is causing the problem. It can also show where locks (used to ensure function synchronisation in parallel applications) are in practice causing delays, and how performance scales as additional cores are used. This is expected to be very useful in the applications tuning phase, as it can generate a wide range of statistics and analysis, such as identifying differences that might occur across multiple runs.
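As an illustration of the kind of fix a lock-delay report might prompt, the sketch below sums a vector with each thread accumulating privately and taking the shared lock only once, instead of once per element. This is a generic technique in standard C++11, assumed here for illustration rather than taken from Intel's material, and `sum_with_local_totals` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Sums data using several threads. Each thread adds up its own chunk
// without locking, then takes the mutex once to publish its subtotal.
long sum_with_local_totals(const std::vector<int>& data, unsigned workers) {
    assert(workers > 0);
    long total = 0;
    std::mutex total_mutex;
    std::vector<std::thread> pool;
    const std::size_t chunk = (data.size() + workers - 1) / workers;
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(data.size(), w * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        if (begin == end) break;  // no elements left for this worker
        pool.emplace_back([&data, &total, &total_mutex, begin, end] {
            long local = 0;  // private to this thread, so no lock is needed
            for (std::size_t i = begin; i < end; ++i) local += data[i];
            // One lock acquisition per thread rather than one per element.
            std::lock_guard<std::mutex> lock(total_mutex);
            total += local;
        });
    }
    for (auto& t : pool) t.join();
    return total;
}
```

Locking inside the inner loop would give the same answer, but the threads would spend much of their time queueing for the mutex, which is the sort of delay a locks-and-waits view is meant to expose.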
A short-term goal with Amplifier is to help make applications run faster on multicore platforms by helping developers work out how the application will perform and identify potential trouble spots in advance. This is achieved with three kinds of analysis: hot spot analysis, which finds where the application is spending too much time; locks and waits analysis, which identifies where bottlenecks exist; and concurrency analysis, which determines where and when cores are idle. According to Reinders, concurrency analysis should be particularly useful for helping to move serial code to parallel and keep it effective.
Most of these tools work on AMD processors just as well as on Intel, though Reinders did state that this was not always the case for Parallel Amplifier because of the way it has been optimised to work with Intel processors.
At the first upgrade of Parallel Studio, probably later this year, Intel will be adding Parallel Advisor, though, in an interesting marketing tactic, the plan is to have a 'lite' version of the tool available with the initial launch. This will be aimed at helping developers think parallel and work out the best way to implement parallelism in their applications. According to Reinders, it will do this by providing some 'what if' analysis of what happens if parallelism is used 'here' and 'in this way'. The objective is to use Advisor as part of the initial design phase, to help developers identify possible problem areas such as data races before the coding phase starts. This will be available to purchasers of Parallel Studio, but not to purchasers of the individual tools for other platforms.
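A back-of-the-envelope version of this kind of 'what if' question can be answered with Amdahl's law, which bounds the speedup from parallelising a given fraction of a program's run time. The sketch below is only that textbook formula; it is not a description of how Parallel Advisor works, and `amdahl_speedup` is a hypothetical name.

```cpp
#include <cassert>

// Amdahl's law: if a fraction p of the run time can be spread perfectly
// over n cores, the overall speedup is 1 / ((1 - p) + p / n).
double amdahl_speedup(double parallel_fraction, unsigned cores) {
    assert(parallel_fraction >= 0.0 && parallel_fraction <= 1.0);
    assert(cores > 0);
    const double serial_fraction = 1.0 - parallel_fraction;
    return 1.0 / (serial_fraction + parallel_fraction / cores);
}
```

Under this idealised model, a program that is 90 per cent parallelisable gains roughly 4.7 times on eight cores and roughly 7.8 times on 32, which shows why deciding where to parallelise matters more as core counts rise.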
Intel is, of course, not pursuing this track alone, and Microsoft has got a new version of Visual Studio (VS 2010) specifically for parallel systems on the stocks. This has already reached the community technology preview stage, and should be available in beta by mid-year. Reinders was, however, keen to point out that Parallel Studio is better than Visual Studio 2010, not least because it supports multiple versions of Visual Studio—and because it is available now.
The support for multiple versions of Visual Studio is obviously important, in that Microsoft's approach is inevitably geared to providing parallelism in the next iteration of Visual Studio—therefore pushing developers towards an upgrade. The ability of Parallel Studio to work with existing versions of Visual Studio—2005, 2008 and the upcoming 2010—should allow developers to add parallelization capabilities while remaining on a development tool they know and understand.
The existing toolset has not been forgotten in all this, with many of the enhancements developed for Parallel Studio being incorporated in the next upgrade. VTune will get new features in June, including key enhancements taken from Parallel Amplifier—though some of these are only expected to appear next year. Thread Checker will get enhancements taken from Parallel Inspector, and the Fortran and C++ compilers will continue to track standards.