Copy files 86% faster with Vista SP1

Testing Setup

Although Vista SP1 has many documented improvements, we aimed to test one particular scenario that has proved to be a major problem for pre-SP1 users, namely file copy performance, particularly over a network. The perspective was that of a home user running Vista Home Premium on a fast, low-latency network with decent hardware. All file copy tests were initiated from the main client machine.

Our main client system was an Acer Veriton 7900 Pro (Intel Core 2 Duo 6700/4GB RAM/ATI RADEON X1950/2xSATA-II HDD) running Windows Vista Home Premium, connected via a dedicated gigabit network to two remote systems: one also running Windows Vista Home Premium, the other running a fully patched Windows XP SP2 installation.

Each system used the latest available vendor (non-Microsoft) drivers and the November release of DirectX. No modifications were made to the operating system so as to represent as closely as possible the configuration of an OEM machine.

We used two test file batches: the first was a single 2.5GB ISO and the second was 2.5GB of small files (over 300 MP3s). Each file batch was copied to a destination (write) and then written back to the test system (read/write). The destinations were the second hard drive in the main testing system, a SanDisk Cruzer Micro 8GB USB flash drive, the remote Vista system and the remote XP system. File copies were timed from the moment “Copy” was clicked to the moment the copy dialog disappeared.

We also ran PCMark Vantage on the test system to get an overall impression of system performance.

The aim of the tests was to see how changes in the test machine’s patch level affected performance. The tests were therefore run three times: once with Vista Home Premium RTM, once with all the available patches from Windows Update applied (RTM Patched) and once with SP1 RTM applied. The remote Vista system was also patched to keep it consistent with the test machine. The XP system was unchanged.

Hard drives on all three systems were defragmented before each file copy test.

The Hard Numbers

Although the data were more or less as anticipated, there were some surprising results as well as some interesting anomalies. These were reported back to Microsoft for clarification and explanation.

PCMark Vantage

The RTM results of PCMark Vantage provided baseline performance metrics for our test system. Vantage essentially tests system hardware performance, so, as neither hardware nor drivers were changed, any changes across the various benchmarks would be the result of changes in patch level alone.

Overall PCMark results jumped 12.8% from RTM to RTM Patched, quite an increase. An overall increase was expected, as the two main performance, compatibility and stability updates for Windows Vista, KB938979 and KB938194, are installed by default via Windows Update and were therefore deployed to the RTM Patched system. Variation between individual benchmark tests was mostly very slight, with three notable exceptions: Music and Communications actually dropped by 1.6% and 6.5% respectively, while Productivity increased by 5.25%.

Moving to SP1 brought general improvements across the board. The overall score was 1.5% higher than RTM Patched and 14.5% higher than RTM. The anomalous Music and Communications results in RTM Patched were reversed, with the end results being 4% and 12.8% higher than RTM respectively. Productivity was another stand-out result, 9.7% higher than RTM and 4.2% higher than RTM Patched.
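Because the SP1 figures are quoted against two different baselines, it helps to remember that successive percentage changes multiply rather than add. The short Python check below uses only the percentages quoted above (the helper names are our own) to confirm that the overall and Productivity figures are consistent with each other. It also works out the implied RTM Patched to SP1 gain for Communications, a number we derived arithmetically rather than measured.

```python
def compound_pct(*steps_pct: float) -> float:
    """Combine successive percentage changes into one overall change."""
    factor = 1.0
    for step in steps_pct:
        factor *= 1 + step / 100
    return (factor - 1) * 100


def implied_step_pct(total_pct: float, first_pct: float) -> float:
    """Given an overall change and the first step, return the second step."""
    return ((1 + total_pct / 100) / (1 + first_pct / 100) - 1) * 100


# Overall score: +12.8% (RTM -> RTM Patched), then +1.5% (RTM Patched -> SP1)
print(f"Overall, RTM -> SP1: {compound_pct(12.8, 1.5):.1f}%")
# Productivity: +5.25%, then +4.2%
print(f"Productivity, RTM -> SP1: {compound_pct(5.25, 4.2):.1f}%")
# Communications: -6.5% after patching, but +12.8% vs RTM after SP1
print(f"Communications, RTM Patched -> SP1: {implied_step_pct(12.8, -6.5):.1f}%")
```

This prints roughly 14.5%, 9.7% and 20.6%: the first two match the reported figures, and the third shows how large SP1's recovery on Communications was relative to the patched baseline.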

Graphics and multimedia performance showed consistent improvements across the tests, but the Gaming benchmarks showed a very slight decline of 0.2% compared with RTM. Microsoft couldn’t specifically explain the downward trend in this particular benchmark but aren’t too concerned, as PCMark’s gaming benchmarks are very sensitive to driver performance (especially graphics drivers) and the version of ATI Catalyst installed isn’t configured to take advantage of Vista SP1. They expect to see this trend reversed in the coming months with the release of updated vendor drivers alongside the general release of SP1.

We also reported the anomalies between the RTM and RTM Patched systems to Microsoft. The explanation given wasn’t specific to the problems we encountered, but it did give us an idea of why some areas of performance might drop, only to recover when they are fixed later on. Essentially, a patch or hotfix is designed in relative isolation from the rest of the operating system. It’s tested for compatibility, but in general it’s designed to fix only one very specific issue. As such, there is a risk that fixing one component actually causes issues elsewhere, and Microsoft believe this is the main reason for the anomalies we encountered in the RTM Patched tests.

A service pack, on the other hand, takes a much more holistic approach, examining all components and testing them for interoperability. SP1 incorporates a number of architectural changes as well as various fixes and updates, which is why our benchmark data was not only much better than RTM but also reversed the negative trends seen in RTM Patched.

Vista SP1 is being marketed as a significant update to improve the performance and responsiveness of Vista systems, and these benchmark results certainly bear this out. The biggest improvements are in communications and overall performance, which have been the main areas of criticism of Vista to date.

Disk-to-Disk File Copy

Performance in this benchmark was already quite good and few changes were anticipated. The most notable finding was that in RTM there was an initial difference of up to 12 seconds between the single-file and multi-file tests, but this variation was largely eradicated under SP1, with the two tests producing almost identical results.

Disk-to-USB File Copy

These results showed an interesting divergence between the two tests. The results were exactly the same in RTM, but the write times started to deviate in RTM Patched and were almost a minute apart in SP1, in favour of the single-file test. Multi-file write times under SP1 were actually worse than under RTM/RTM Patched, whereas single-file write times improved significantly. We reported this to Microsoft, who confirmed that it was not an expected result, and the data has been submitted for further examination. In both tests, the writeback times improved marginally with SP1.

Vista-to-XP File Copy

Given the problems that have dogged Vista in the realm of XP/2003 networking, this was the area where we expected to see the greatest improvements… and we weren’t disappointed.

There was almost no variation in the test results between RTM and RTM Patched, but it’s worth mentioning just how bad performance was. The single-file write test was almost 3.5 minutes slower than the same test copying to USB, and the multi-file test was only a minute quicker. The writeback results were appalling, between 6 and 8 minutes slower than the USB copy tests, and, interestingly, there was a difference of between 2 and 3 minutes between the single-file and multi-file tests in favour of the multi-file test.

With SP1, however, the improvements were astounding. The single-file write test dropped from just under 9 minutes to less than 2, a 77% improvement, while the writeback test dropped from 9:33 to just over a minute, an 86% improvement. The multi-file results were equally impressive, improving by 64% and 83% respectively. Interestingly, the massive variation between the tests on RTM was almost completely ironed out in SP1.
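The headline 86% figure is a reduction in copy time. The only exact pre-SP1 time quoted is the 9:33 writeback, so the small sketch below (helper names are ours) works backwards from that figure to the SP1 time an 86% improvement implies. This is a derived number, not a separately measured result.

```python
def to_seconds(mm_ss: str) -> int:
    """Convert an "m:ss" duration string into seconds."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) * 60 + int(seconds)


def improvement_pct(before_s: float, after_s: float) -> float:
    """Percentage reduction in copy time (higher is better)."""
    return (before_s - after_s) / before_s * 100


def time_after_improvement(before_s: float, pct: float) -> float:
    """Copy time implied by a given percentage improvement."""
    return before_s * (1 - pct / 100)


before = to_seconds("9:33")  # pre-SP1 writeback time: 573 seconds
after = time_after_improvement(before, 86)
print(f"Implied SP1 writeback time: {after:.0f} s")
print(f"Check: {improvement_pct(before, round(after)):.1f}% faster")
```

The implied time of about 80 seconds is consistent with the “just over a minute” reported for SP1.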

Vista-to-Vista File Copy

This test produced the greatest number of anomalous results, on which we consulted Microsoft for explanation and clarification. While the overall times were good, they worsened across the board from RTM to RTM Patched by 2 to 5 seconds. However, this was all resolved in SP1, where results were 6 to 14 seconds better than RTM, with the notable exception of the single-file writeback test, which was a substantial 21 seconds slower than RTM. The explanation has much to do with how file copying has been redesigned in Vista SP1, and it also explains why the Vista-to-XP tests in RTM and RTM Patched were so poor.

The changes to the file copy algorithm introduced in Windows Vista were designed to remedy a number of problems with the algorithm used in Windows XP. In copies involving lots of data, the Cache Manager’s write-behind thread on the target system could not keep up with the rate at which data was written to the cache, eventually turning the target system’s memory into a bottleneck in the copying process. In addition, file copies from a remote system cached the content twice, once as the source file was read and once as the target file was written. This puts extra memory pressure on the client system, as well as extra CPU load because of the Cache Manager working overtime.
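To illustrate why a write-behind thread that can’t keep up turns memory into the bottleneck, here is a toy model. It is our own simplification, not how the Windows Cache Manager is actually implemented: a copier hands data to a fixed-size cache at one rate while a write-behind thread flushes it to disk at a slower one. The function name and all rates and sizes are hypothetical.

```python
def cached_copy_timeline(
    total_mb: int, copier_mb_s: int, flush_mb_s: int, cache_limit_mb: int
) -> tuple[int, int]:
    """Toy model of a cached copy with a write-behind thread (hypothetical).

    Returns (seconds until the copier has handed off all data,
             seconds until all data has been flushed to disk).
    """
    if min(total_mb, copier_mb_s, flush_mb_s, cache_limit_mb) <= 0:
        raise ValueError("all sizes and rates must be positive")
    dirty = 0         # MB sitting in the cache, not yet on disk
    handed_off = 0    # MB the copier has written into the cache
    seconds = 0
    copier_done = None
    while handed_off < total_mb or dirty > 0:
        seconds += 1
        # The copier can only push as much as the cache has room for.
        room = cache_limit_mb - dirty
        push = min(copier_mb_s, total_mb - handed_off, room)
        handed_off += push
        dirty += push
        # The write-behind thread flushes at disk speed.
        dirty -= min(dirty, flush_mb_s)
        if copier_done is None and handed_off == total_mb:
            copier_done = seconds
    return copier_done, seconds


# Hypothetical: 2,560MB copy, copier 100MB/s, disk 50MB/s, 1,024MB cache
print(cached_copy_timeline(2560, 100, 50, 1024))
```

With these made-up numbers the copier finishes after 32 seconds, but the data isn’t fully on disk until 52 seconds. Once the cache fills, the copier is throttled to disk speed anyway, and a large backlog is still being flushed after the copier reports that it is done.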

Vista uses an algorithm that performs cached disk I/O for files of 256KB or less and non-cached I/O for larger files, issuing 2 I/Os for files 2MB in size or smaller, ranging up to a maximum of 8 I/Os for files larger than 8MB. The size of each I/O also varies with the file being copied: the actual file size for files smaller than 1MB, 1MB for files between 1MB and 2MB, and 2MB for anything bigger than 2MB. Therefore, to copy a 16MB file, Vista would issue 8 x 2MB non-cached I/Os and wait for the writes to complete; larger files repeat this cycle until all the data has been copied. Although this algorithm does improve on the previous one, it suffers from a number of drawbacks. On network file copies, the process sporadically writes the I/Os out of order, which causes excessive disk seeks on the destination system and, over time, bottlenecking as the Cache Manager effectively backs up at both ends of the wire. Another issue, and probably the most significant, is poor performance on copies involving large files and/or large groups of files.
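The size rules above can be written down directly. The sketch below encodes them as a Python function, with two stated assumptions. First, we treat the quoted I/O counts as the number of non-cached I/Os issued per cycle before Explorer waits for them to complete. Second, the article gives no count for files between 2MB and 8MB, so the function uses a hypothetical one-I/O-per-megabyte rule there. The function and type names are ours, not Windows APIs.

```python
import math
from typing import NamedTuple, Optional

KB = 1024
MB = 1024 * KB


class CopyPlan(NamedTuple):
    cached: bool
    io_size: Optional[int]        # bytes per I/O (None for cached copies)
    ios_per_cycle: Optional[int]  # I/Os issued before waiting (assumed meaning)
    cycles: Optional[int]         # rounds of issue-then-wait


def vista_rtm_copy_plan(file_size: int) -> CopyPlan:
    """Sketch of the Vista RTM copy parameters described in the text."""
    if file_size <= 256 * KB:
        # Small files use ordinary cached I/O.
        return CopyPlan(cached=True, io_size=None, ios_per_cycle=None, cycles=None)

    # I/O size: the file size below 1MB, 1MB up to 2MB, 2MB beyond that.
    if file_size < 1 * MB:
        io_size = file_size
    elif file_size <= 2 * MB:
        io_size = 1 * MB
    else:
        io_size = 2 * MB

    # I/Os per cycle: 2 up to 2MB, 8 above 8MB.
    if file_size <= 2 * MB:
        ios_per_cycle = 2
    elif file_size > 8 * MB:
        ios_per_cycle = 8
    else:
        # ASSUMPTION: the article gives no count between 2MB and 8MB;
        # we hypothetically use one I/O per started megabyte.
        ios_per_cycle = math.ceil(file_size / MB)

    chunks = math.ceil(file_size / io_size)
    cycles = math.ceil(chunks / ios_per_cycle)
    return CopyPlan(False, io_size, ios_per_cycle, cycles)


print(vista_rtm_copy_plan(16 * MB))
```

For the 16MB example this yields 8 x 2MB I/Os in a single cycle. A 32MB file would need two cycles, with a full wait for completion in between, which is where the stop-start behaviour on large copies comes from.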

The exact reason for this is explained in detail by Mark Russinovich, Technical Fellow in the Platform and Services Division at Microsoft and previously of Winternals. In his blog, he explains that “…the previous algorithm’s use of cached file I/O lets Explorer finish writing destination files to memory and dismiss the copy dialog long before the Cache Manager’s write-behind thread has actually committed the data to disk; with Vista’s non-cached implementation, Explorer is forced to wait for each write operation to complete before issuing more, and ultimately for all copied data to be on disk before indicating a copy’s completion. In Vista, Explorer also waits 12 seconds before making an estimate of the copy’s duration, and the estimation algorithm is sensitive to fluctuations in the copy speed, both of which exacerbate user frustration with slower copies.”
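To show why an estimate that is sensitive to speed fluctuations feels erratic, the sketch below compares two generic remaining-time estimators. This is not Explorer’s actual algorithm, which the quote does not describe. One estimator divides the remaining data by the speed measured in the last second. The other divides it by an exponentially weighted moving average of the speed. The function name, the alternating speeds and the smoothing factor are all hypothetical.

```python
def eta_series(
    total_mb: float, rates_mb_s: list[float], alpha: float
) -> tuple[list[float], list[float]]:
    """Compare two hypothetical remaining-time estimators, one per second.

    naive:    remaining data / speed measured in the last second
    smoothed: remaining data / exponentially weighted moving average of speed
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    copied = 0.0
    smoothed_rate = None
    naive, smoothed = [], []
    for rate in rates_mb_s:
        if rate <= 0:
            raise ValueError("rates must be positive")
        copied += rate
        remaining = max(total_mb - copied, 0.0)
        if smoothed_rate is None:
            smoothed_rate = rate
        else:
            smoothed_rate = alpha * rate + (1 - alpha) * smoothed_rate
        naive.append(remaining / rate)
        smoothed.append(remaining / smoothed_rate)
    return naive, smoothed


# Hypothetical copy speed alternating between 40MB/s and 10MB/s
naive, smoothed = eta_series(1000, [40, 10, 40, 10, 40, 10], alpha=0.2)
print("naive:   ", [round(s) for s in naive])
print("smoothed:", [round(s) for s in smoothed])
```

Here the naive estimate swings between roughly 20 and 95 seconds from one second to the next, while the smoothed one stays within a few seconds of itself. That kind of jumpiness is the sort of fluctuation sensitivity the quote describes.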

Vista’s reliance on non-cached I/O effectively increases the perception of poor performance, regardless of whether the data is actually being moved more efficiently. This certainly explains our benchmark data, which was measured from the “perceived start” to the “perceived finish” of each file copy, as well as the serious drop in system responsiveness during each file copy.

SP1 seeks to remedy the situation by going back to cached I/O for all file sizes, except in the case of remote file copies. For these, it instructs the system not to cache the remote file locally during the copy, which prevents the double-buffering problem previously encountered, and it spools the I/Os passed from source to destination into memory, where they are assembled in the correct order before being written to disk. This prevents the hard drive thrashing, high CPU utilisation and sluggish system performance previously encountered. Additionally, for SMB 1.0 (the file transfer protocol used by Windows XP/2003), the size of the I/Os passed on the wire has been reduced from 60KB (the previous default) to 32KB. In short, file transfers between Vista SP1 and non-SP1 systems have been made vastly more efficient and require fewer resources on both the source and destination systems, but at a cost of throughput in some scenarios, hence the unexpectedly slower results in some of our tests.
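The idea of assembling I/Os in memory before writing them is essentially a reorder buffer. The Python sketch below illustrates that general technique and is not Windows’ implementation. It splits a file into 32KB I/Os (the SMB 1.0 size quoted above) and shuffles them to simulate out-of-order arrival. A buffer then holds early chunks until every preceding offset has arrived, so the “disk” only ever sees sequential writes. `ReorderBuffer` and `split_into_ios` are hypothetical names.

```python
import os
import random

SMB1_IO_SIZE = 32 * 1024  # SP1's SMB 1.0 I/O size, per the text


def split_into_ios(data: bytes, io_size: int = SMB1_IO_SIZE) -> list[tuple[int, bytes]]:
    """Split data into (offset, chunk) pairs of at most io_size bytes."""
    return [(off, data[off:off + io_size]) for off in range(0, len(data), io_size)]


class ReorderBuffer:
    """Hold out-of-order chunks in memory and release them strictly in order."""

    def __init__(self) -> None:
        self.pending: dict[int, bytes] = {}
        self.next_offset = 0

    def receive(self, offset: int, chunk: bytes) -> list[bytes]:
        """Accept one chunk; return any chunks now ready for sequential writing."""
        self.pending[offset] = chunk
        ready = []
        while self.next_offset in self.pending:
            piece = self.pending.pop(self.next_offset)
            ready.append(piece)
            self.next_offset += len(piece)
        return ready


# Simulate a 1MB file whose I/Os arrive in random order.
source = os.urandom(1024 * 1024)
ios = split_into_ios(source)
random.shuffle(ios)

buffer = ReorderBuffer()
written = bytearray()  # stands in for sequential writes to the target disk
for offset, chunk in ios:
    for piece in buffer.receive(offset, chunk):
        written += piece

assert bytes(written) == source
print(f"{len(ios)} I/Os of up to {SMB1_IO_SIZE // 1024}KB reassembled in order")
```

The trade-off is memory: in the worst case the buffer holds nearly the whole transfer while it waits for the first missing chunk. In exchange, the destination disk never has to seek back and forth to place chunks that arrived out of order.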