Using the VTune™ Performance Enhancement Environment for the Streaming SIMD Extensions

Copyright © 1999, Intel Corporation. All rights reserved.
*Third-party brands and names are the property of their respective owners.

Agenda
  - Background
  - The VTune™ Performance Enhancement Environment for the Streaming SIMD Extensions
  - Development Methods for the Streaming SIMD Extensions
  - Summary

Background: MMX™ Technology Tools
  - Enabled low-level work (assembly language)
  - Efforts to provide high-level support (compilers) were:
      - late
      - not utilized by ISVs
      - immature technically
      - not adopted by the industry quickly, or at all

MMX™ Technology Tools: What Developers Told Us
  - It is painful to realize performance benefits from MMX™ technology
  - Compilers are not capable of taking high-level code and automatically producing optimized MMX™ technology instructions
  - You, the developer, had to use assembly
  Lack of good tools creates big headaches for developers

The VTune™ Performance Enhancement Environment, 4.0
  - Intel® C/C++ Compiler
  - VTune™ Analyzer
  - Register Viewing Tool
  - Performance Library Suite
  - Intel® Architecture Training Center
  The definitive toolkit for Streaming SIMD Extensions programming

VTune™ Performance Enhancement Environment: Intel® C/C++ Compiler
  - A "plug-in" to Microsoft* Developer Studio versions 5.0 and 6.0, Microsoft Visual Studio 97*
  - Object, language compatible with MSVC v5.0 and v6.0

Intel® C/C++ Compiler Optimization
  - For SIMD coding:
      - inlined-asm, intrinsics, vector classes, and vectorization
      - data alignment mechanisms
  - CPU Dispatch:
      - different code for different processors - one executable
  - Scalar optimization:
      - aggressive floating-point optimization
      - profile-guided and inter-procedural optimization
  Let the compiler deal with optimization

VTune™ Analyzer
  Performance tune apps via several methods:
  - Processor sampling for CPU usage without binary instrumentation
  - CPU simulation in software (Dynamic Analysis)
  - Chronologies of performance counters from OS, processor, 740 Graphics chipset
  - Call graph profiling for C/C++ and Java* with binary instrumentation

VTune™ Analyzer, cont'd
  - All tuning methods centered around source code views
  - Offers performance tuning advice for C/C++, Fortran, Java*, assembly
  - Teaches how to write better performing code
  - Supports Pentium® III, Pentium II, Pentium Pro, and Pentium processors on Windows* 95/98, Windows NT* 4.0/5.0

VTune™ Analyzer: Hotspots

VTune™ Analyzer: Coach Advice
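
The "CPU Dispatch" bullet on the compiler optimization slide (different code for different processors, one executable) can be illustrated by hand. The sketch below is not the Intel compiler's mechanism, which the slides do not describe. It assumes GCC or Clang on x86 and uses their `__builtin_cpu_supports` builtin; on any other compiler or architecture it falls back to the scalar path. The names `sum_scalar`, `sum_sse`, and `pick_sum` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: the runtime check is only attempted with GCC/Clang on x86.
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__i386__) || defined(__x86_64__))
#include <xmmintrin.h>
#define SKETCH_X86_GNU 1
#else
#define SKETCH_X86_GNU 0
#endif

using SumFn = float (*)(const float*, std::size_t);

// Portable version: runs on any processor.
float sum_scalar(const float* x, std::size_t n) {
    float s = 0.0f;
    for (std::size_t i = 0; i < n; ++i) s += x[i];
    return s;
}

#if SKETCH_X86_GNU
// SSE version: accumulates four partial sums in one xmm register.
__attribute__((target("sse")))
float sum_sse(const float* x, std::size_t n) {
    __m128 acc = _mm_setzero_ps();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)
        acc = _mm_add_ps(acc, _mm_loadu_ps(x + i));  // unaligned load
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float s = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    for (; i < n; ++i) s += x[i];                    // scalar tail
    return s;
}
#endif

// Choose an implementation once, at run time, inside a single executable.
SumFn pick_sum() {
    assert(sum_scalar != nullptr);
#if SKETCH_X86_GNU
    if (__builtin_cpu_supports("sse")) return sum_sse;
#endif
    return sum_scalar;
}
```

A caller would typically store the result of `pick_sum()` once at startup and call through that pointer afterwards, so the processor check is not repeated in hot loops.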

VTune™ Analyzer: Dynamic Analysis

VTune™ Analyzer: Call Graph Profiling

Register Viewing Tool
  - Shows xmm registers during execution, debugging

Performance Libraries
  - Image Processing
  - Signal Processing
  - Recognition Primitives
  - Math Kernel
  - JPEG Library
  Streaming SIMD Extensions and MMX™ Technology used extensively

Intel® Architecture Training Center
  - Computer Based Training (CBT):
      - interactive tutorial on Streaming SIMD Extensions

Intel® Architecture Training Center, cont'd
  - Pentium® II and Pentium III processors Programmer's Reference Manuals
  - Optimization Manual
  - Application notes and code samples using Streaming SIMD Extensions:
      - 3D lighting/transform, filters, min/max, Newton-Raphson, FFT, deformable surfaces, & lots more

Development Methods for the Streaming SIMD Extensions
  Hand coded assembly: Bit Bangers Only!
      movaps xmm0, b[i]
      movaps xmm1, c[i]
      addps  xmm0, xmm1
      movaps a[i], xmm0
  Intrinsics: Fast food assembly
      a[i]=_mm_add_ps(b[i],c[i])
  C++ class library: Performance for the masses
      a[i]=b[i]+c[i]
  Vectorization: Let the compiler do the work, sort of.
      #pragma vector
  Performance libraries: You make the call
      RLsbAdd3()
  [diagram axis: Difficulty]

Development Methods: Intrinsics
  - New data type: __m128
  - No need to schedule and register allocate
  - Can still choose instruction sequences
  - Use for MMX™ Technology and the Streaming SIMD Extensions
  - Definitions in Pentium® III Processor Programmer's Reference Manual
  - Near hand-coded assembly performance (< 15% difference)
  a[i]=_mm_add_ps(b[i],c[i])

C++ class library
  - Abstract the underlying technology
  - Performance gain everywhere class used
  - New packed data types:
      - I32vec2 (2 32-bit ints), I16vec4 (4 16-bit ints), I8vec8 (8 8-bit ints), and unsigned versions
      - F32vec4 (4 32-bit floats)
  - Extensible, easy to use, keeps code readable and portable
  - Nearly matches intrinsics performance (0-5% difference)
  a[i]=b[i]+c[i]

Vectorization
  - Compiler generates SIMD integer or FP code for you under strict conditions:
      - countable loops with single-unit stride
      - body of loop must be single block: no internal branching, single entry/exit
      - data types: float, char/short/int
      - user ensures correct alignment for floats
      - user ensures no aliases for pointers
      - no function calls in loop

Vector-Multiply-Add in C
  void do_c(float *ac, float *m, float *v, int n) { for(int i=0; i
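
The intrinsics and C++ class library slides both reduce to the one-line example `a[i]=b[i]+c[i]`. As a rough illustration of how the two styles differ in source code, here is a minimal sketch. It assumes `<xmmintrin.h>` is available when the compiler targets SSE and falls back to scalar code otherwise. `Vec4f` is a hypothetical stand-in written for this example, not Intel's `F32vec4` class, and the function names `add_intrinsics` and `add_classes` are likewise made up. Unaligned loads and stores are used so that no alignment is assumed.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: use SSE only when the compiler says it is targeting it.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAVE_SSE 1
#else
#define SKETCH_HAVE_SSE 0
#endif

// Intrinsics style: a[i] = b[i] + c[i], four floats per step.
void add_intrinsics(float* a, const float* b, const float* c, std::size_t n) {
    assert(a && b && c);
    std::size_t i = 0;
#if SKETCH_HAVE_SSE
    for (; i + 4 <= n; i += 4) {
        __m128 vb = _mm_loadu_ps(b + i);
        __m128 vc = _mm_loadu_ps(c + i);
        _mm_storeu_ps(a + i, _mm_add_ps(vb, vc));
    }
#endif
    for (; i < n; ++i) a[i] = b[i] + c[i];  // tail, or the whole loop without SSE
}

// Class style: a hypothetical 4-float wrapper with an overloaded operator+.
struct Vec4f {
#if SKETCH_HAVE_SSE
    __m128 v;
    static Vec4f load(const float* p) { return Vec4f{_mm_loadu_ps(p)}; }
    void store(float* p) const { _mm_storeu_ps(p, v); }
    friend Vec4f operator+(Vec4f x, Vec4f y) { return Vec4f{_mm_add_ps(x.v, y.v)}; }
#else
    float v[4];
    static Vec4f load(const float* p) {
        Vec4f r;
        for (int k = 0; k < 4; ++k) r.v[k] = p[k];
        return r;
    }
    void store(float* p) const { for (int k = 0; k < 4; ++k) p[k] = v[k]; }
    friend Vec4f operator+(Vec4f x, Vec4f y) {
        Vec4f r;
        for (int k = 0; k < 4; ++k) r.v[k] = x.v[k] + y.v[k];
        return r;
    }
#endif
};

// Same loop as add_intrinsics, but the body reads like ordinary arithmetic.
void add_classes(float* a, const float* b, const float* c, std::size_t n) {
    assert(a && b && c);
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)
        (Vec4f::load(b + i) + Vec4f::load(c + i)).store(a + i);
    for (; i < n; ++i) a[i] = b[i] + c[i];
}
```

Both functions compute the same result. The class version hides the instruction choice behind `operator+`, which is the readability trade-off the class library slide describes.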