3.17.11 Intel 386 and AMD x86-64 Options
These -m options are defined for the i386 and x86-64 family of
computers:
-mtune=cpu-type- Tune everything applicable about the generated code to cpu-type, except
for the ABI and the set of available instructions. The choices for
cpu-type are:
- i386
- Original Intel i386 CPU.
- i486
- Intel's i486 CPU. (No scheduling is implemented for this chip.)
- i586, pentium
- Intel Pentium CPU with no MMX support.
- pentium-mmx
- Intel Pentium MMX CPU, based on the Pentium core, with MMX instruction set support.
- i686, pentiumpro
- Intel PentiumPro CPU.
- pentium2
- Intel Pentium2 CPU, based on the PentiumPro core, with MMX instruction set support.
- pentium3, pentium3m
- Intel Pentium3 CPU, based on the PentiumPro core, with MMX and SSE instruction set
support.
- pentium-m
- Low-power version of the Intel Pentium3 CPU with MMX, SSE and SSE2 instruction set
support. Used by Centrino notebooks.
- pentium4, pentium4m
- Intel Pentium4 CPU with MMX, SSE and SSE2 instruction set support.
- prescott
- An improved version of the Intel Pentium4 CPU with MMX, SSE, SSE2 and SSE3 instruction
set support.
- nocona
- An improved version of the Intel Pentium4 CPU with 64-bit extensions, MMX, SSE,
SSE2 and SSE3 instruction set support.
- k6
- AMD K6 CPU with MMX instruction set support.
- k6-2, k6-3
- Improved versions of the AMD K6 CPU with MMX and 3DNow! instruction set support.
- athlon, athlon-tbird
- AMD Athlon CPU with MMX, 3DNow!, enhanced 3DNow! and SSE prefetch instruction
support.
- athlon-4, athlon-xp, athlon-mp
- Improved AMD Athlon CPU with MMX, 3DNow!, enhanced 3DNow! and full SSE
instruction set support.
- k8, opteron, athlon64, athlon-fx
- AMD K8 core based CPUs with x86-64 instruction set support. (This is a superset of
MMX, SSE, SSE2, 3DNow!, enhanced 3DNow! and 64-bit instruction set extensions.)
- winchip-c6
- IDT WinChip C6 CPU, treated the same as the i486 with additional MMX instruction
set support.
- winchip2
- IDT WinChip2 CPU, treated the same as the i486 with additional MMX and 3DNow!
instruction set support.
- c3
- VIA C3 CPU with MMX and 3DNow! instruction set support. (No scheduling is
implemented for this chip.)
- c3-2
- VIA C3-2 CPU with MMX and SSE instruction set support. (No scheduling is
implemented for this chip.)
While picking a specific cpu-type schedules code appropriately
for that particular chip, the compiler does not generate any code that
cannot run on the i386 unless the -march=cpu-type option is also
used.
-march=cpu-type- Generate instructions for the machine type cpu-type. The choices
for cpu-type are the same as for -mtune. Moreover,
specifying -march=cpu-type implies -mtune=cpu-type.
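One way to see the difference between -mtune and -march is to compile a small probe once with each and compare which instruction-set extensions the compiler considers enabled. The sketch below assumes GCC's predefined macros __MMX__, __SSE__, __SSE2__ and __SSE3__, which track the enabled instruction set rather than the tuning target; the helper names are hypothetical.

```c
#include <stdio.h>

/* Each helper reports whether GCC predefined the matching macro for this
   compilation.  The macros follow the enabled instruction set (-march,
   -msse2, ...), not the tuning target, so -mtune alone does not change them. */
int isa_mmx(void) {
#ifdef __MMX__
    return 1;
#else
    return 0;
#endif
}

int isa_sse(void) {
#ifdef __SSE__
    return 1;
#else
    return 0;
#endif
}

int isa_sse2(void) {
#ifdef __SSE2__
    return 1;
#else
    return 0;
#endif
}

int isa_sse3(void) {
#ifdef __SSE3__
    return 1;
#else
    return 0;
#endif
}

/* Print one line summarizing the enabled extensions. */
void print_enabled_isas(void) {
    printf("MMX=%d SSE=%d SSE2=%d SSE3=%d\n",
           isa_mmx(), isa_sse(), isa_sse2(), isa_sse3());
}
```

Calling print_enabled_isas() from a small main and building it with gcc -m32 -mtune=pentium4 and then with gcc -m32 -march=pentium4 should show SSE2 switching on only in the second build, unless the compiler was configured to enable SSE2 by default for 32-bit code. The x86-64 compiler enables SSE and SSE2 in both cases.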
-mcpu=cpu-type- A deprecated synonym for -mtune.
-m386, -m486, -mpentium, -mpentiumpro- These options are synonyms for -mtune=i386, -mtune=i486,
-mtune=pentium, and -mtune=pentiumpro respectively.
These synonyms are deprecated.
-mfpmath=unit- Generate floating-point arithmetic for the selected unit unit. The choices
for unit are:
- 387
- Use the standard 387 floating-point coprocessor present on the majority of chips and
emulated otherwise. Code compiled with this option runs almost everywhere.
The temporary results are computed in 80-bit precision instead of the precision
specified by the type, resulting in slightly different results compared to most
other chips. See -ffloat-store for a more detailed description.
This is the default choice for the i386 compiler.
- sse
- Use scalar floating-point instructions present in the SSE instruction set.
This instruction set is supported by Pentium3 and newer chips, and in the AMD line
by Athlon-4, Athlon-xp and Athlon-mp chips. The earlier version of the SSE
instruction set supports only single-precision arithmetic, so double- and
extended-precision arithmetic is still done using the 387. A later version, present
only in Pentium4 and the future AMD x86-64 chips, supports double-precision
arithmetic too.
For the i386 compiler, you need to use the -march=cpu-type, -msse or
-msse2 switches to enable SSE extensions and make this option
effective. For the x86-64 compiler, these extensions are enabled by default.
The resulting code should be considerably faster in the majority of cases and avoid
the numerical instability problems of 387 code, but may break some existing
code that expects temporaries to be 80-bit.
This is the default choice for the x86-64 compiler.
- sse,387
- Attempt to utilize both instruction sets at once. This effectively doubles the
number of available registers and, on chips with separate execution units for
387 and SSE, the execution resources too. Use this option with care, as it is
still experimental: the GCC register allocator does not model separate
functional units well, resulting in unstable performance.
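The precision difference between the 387 and SSE units can be observed from C. The sketch below assumes a C99 <float.h>, whose FLT_EVAL_METHOD macro reports how intermediate results are evaluated, and uses a hypothetical helper whose intermediate product overflows double but not the 80-bit x87 format.

```c
#include <float.h>
#include <math.h>

/* FLT_EVAL_METHOD (C99, <float.h>): 0 means each operation is rounded to its
   own type (typical of -mfpmath=sse); 2 means temporaries are kept in
   long double precision (typical of -mfpmath=387). */
int fp_eval_method(void) {
    return FLT_EVAL_METHOD;
}

/* For x = 1e308, x * 10.0 exceeds the double range.  Rounded to double
   (SSE), the product is +inf and stays +inf after the division; kept in an
   80-bit x87 temporary, the division can bring it back into range.  Whether
   a 387 build really keeps the temporary in a register depends on the
   optimization level, so this is only an illustration. */
double scale_roundtrip(double x) {
    return (x * 10.0) / 10.0;
}
```

Built for x86-64, or for i386 with -msse2 -mfpmath=sse, fp_eval_method() is typically 0 and scale_roundtrip(1e308) returns infinity; a -mfpmath=387 build typically reports 2 and may return 1e308.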
-masm=dialect- Output asm instructions using the selected dialect. Supported choices are
intel or att (the default).
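Inline assembly written only in AT&T syntax does not assemble under -masm=intel. GCC's extended asm lets one template carry both spellings as {att|intel} alternatives; the sketch below assumes an x86 target, falls back to plain C elsewhere, and uses a hypothetical function name.

```c
/* Add b to a with a single ADD instruction.  The template
   "add{l %1, %0 | %0, %1}" yields "addl %1, %0" in AT&T syntax and
   "add %0, %1" in Intel syntax, so the same source builds with either
   -masm=att or -masm=intel. */
int asm_add(int a, int b) {
#if defined(__i386__) || defined(__x86_64__)
    __asm__("add{l %1, %0 | %0, %1}" : "+r"(a) : "r"(b) : "cc");
    return a;
#else
    return a + b; /* portable fallback for non-x86 targets */
#endif
}
```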
-mieee-fp, -mno-ieee-fp- Control whether or not the compiler uses IEEE floating-point
comparisons. These correctly handle the case where the result of a
comparison is unordered.
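A comparison is unordered when either operand is a NaN. The sketch below (hypothetical helper names, assuming C99 <math.h>) shows the results that IEEE-conforming comparisons must produce in that case.

```c
#include <math.h>
#include <stdbool.h>

/* Under IEEE comparisons, a NaN operand makes every ordered relation
   (<, <=, >, >=, ==) false; only != is true.  C99's isunordered()
   tests for that case explicitly. */
bool less_than(double a, double b) {
    return a < b;
}

bool unordered(double a, double b) {
    return isunordered(a, b);
}
```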
-msoft-float- Generate output containing library calls for floating point.
Warning: the requisite libraries are not part of GCC.
Normally the facilities of the machine's usual C compiler are used, but
this can't be done directly in cross-compilation. You must make your
own arrangements to provide suitable library functions for
cross-compilation.
On machines where a function returns floating point results in the 80387
register stack, some floating point opcodes may be emitted even if
-msoft-float is used.
-mno-fp-ret-in-387- Do not use the FPU registers for return values of functions.
The usual calling convention has functions return values of types
float and double in an FPU register, even if there
is no FPU. The idea is that the operating system should emulate
an FPU.
The option -mno-fp-ret-in-387 causes such values to be returned
in ordinary CPU registers instead.
-mno-fancy-math-387- Some 387 emulators do not support the sin, cos and
sqrt instructions for the 387. Specify this option to avoid
generating those instructions. This option is the default on FreeBSD,
OpenBSD and NetBSD. This option is overridden when -march
indicates that the target cpu will always have an FPU and so the
instruction will not need emulation. As of revision 2.6.1, these
instructions are not generated unless you also use the
-funsafe-math-optimizations switch.
-malign-double, -mno-align-double- Control whether GCC aligns
double, long double, and
long long variables on a two-word boundary or a one-word
boundary. Aligning double variables on a two-word boundary
produces code that runs somewhat faster on a Pentium at the
expense of more memory.
Warning: if you use the -malign-double switch,
structures containing the above types will be aligned differently than
the published application binary interface specifications for the 386
and will not be binary compatible with structures in code compiled
without that switch.
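The layout change is visible through offsetof. In the sketch below (hypothetical struct and function names), the offset of a double that follows a char shows which alignment rule is in effect: 4 under the traditional i386 ABI, 8 with -malign-double, and 8 under the x86-64 ABI.

```c
#include <stddef.h>

/* A char followed by a double: the double's offset reveals how doubles
   are aligned inside structures for this compilation. */
struct char_then_double {
    char c;
    double d;
};

size_t double_member_offset(void) {
    return offsetof(struct char_then_double, d);
}

size_t char_then_double_size(void) {
    return sizeof(struct char_then_double);
}
```

Printing these values from code built with gcc -m32 and with gcc -m32 -malign-double should give 4/12 and 8/16 respectively, which is exactly the binary incompatibility the warning describes.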
-m96bit-long-double, -m128bit-long-double- These switches control the size of the
long double type. The i386
application binary interface specifies the size to be 96 bits,
so -m96bit-long-double is the default in 32-bit mode.
Modern architectures (Pentium and newer) prefer long double
to be aligned to an 8- or 16-byte boundary. In arrays or structures
conforming to the ABI, this is not possible. So specifying
-m128bit-long-double aligns long double
to a 16-byte boundary by padding the long double with an additional
32-bit zero.
In the x86-64 compiler, -m128bit-long-double is the default choice, as
its ABI specifies that long double is to be aligned on a 16-byte boundary.
Notice that neither of these options enables any extra precision over the x87
standard of 80 bits for a long double.
Warning: if you override the default value for your target ABI, the
structures and arrays containing long double variables change
their size, and the calling convention for functions taking
long double changes as well. Hence they are not binary
compatible with arrays or structures in code compiled without that switch.
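The storage size and the precision of long double can be checked separately. The sketch below assumes an x86 target where long double is the x87 80-bit extended format; the helper names are hypothetical.

```c
#include <float.h>
#include <stddef.h>

/* sizeof(long double) reflects the storage format: 12 bytes (96 bits)
   with -m96bit-long-double, 16 bytes (128 bits) with -m128bit-long-double.
   LDBL_MANT_DIG stays 64 either way, because the value is still the
   80-bit x87 format; the extra bytes are padding. */
size_t long_double_size(void) {
    return sizeof(long double);
}

int long_double_mantissa_bits(void) {
    return LDBL_MANT_DIG;
}
```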
-msvr3-shlib, -mno-svr3-shlib- Control whether GCC places uninitialized local variables into the
bss or data segments. -msvr3-shlib places them
into bss. These options are meaningful only on System V Release 3.
-mrtd- Use a different function-calling convention, in which functions that
take a fixed number of arguments return with the
ret num
instruction, which pops their arguments while returning. This saves one
instruction in the caller since there is no need to pop the arguments
there.
You can specify that an individual function is called with this calling
sequence with the function attribute stdcall. You can also
override the -mrtd option by using the function attribute
cdecl. See Function Attributes.
Warning: this calling convention is incompatible with the one
normally used on Unix, so you cannot use it if you need to call
libraries compiled with the Unix compiler.
Also, you must provide function prototypes for all functions that
take variable numbers of arguments (including printf);
otherwise incorrect code will be generated for calls to those
functions.
In addition, seriously incorrect code will result if you call a
function with too many arguments. (Normally, extra arguments are
harmlessly ignored.)
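The per-function attributes mentioned above let a program opt individual functions into or out of the callee-pops convention. The sketch below assumes a 32-bit x86 target for the attributes to take effect; on other targets the hypothetical CALLEE_POPS and CALLER_POPS macros expand to nothing, since x86-64 has a single calling convention.

```c
/* On 32-bit x86, stdcall selects the callee-pops convention that -mrtd
   uses by default; cdecl forces the usual caller-pops convention.
   Elsewhere the macros expand to nothing. */
#if defined(__i386__)
#define CALLEE_POPS __attribute__((stdcall))
#define CALLER_POPS __attribute__((cdecl))
#else
#define CALLEE_POPS
#define CALLER_POPS
#endif

/* Fixed number of arguments, so callee-pops is safe.  Every caller must
   see this prototype so it does not pop the arguments itself. */
int CALLEE_POPS sum3(int a, int b, int c);

/* Keeps the conventional Unix calling sequence even under -mrtd. */
int CALLER_POPS sum3_cdecl(int a, int b, int c);

int CALLEE_POPS sum3(int a, int b, int c) { return a + b + c; }
int CALLER_POPS sum3_cdecl(int a, int b, int c) { return a + b + c; }
```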
-mregparm=num- Control how many registers are used to pass integer arguments. By
default, no registers are used to pass arguments, and at most 3
registers can be used. You can control this behavior for a specific
function by using the function attribute regparm.
See Function Attributes.
Warning: if you use this switch, and
num is nonzero, then you must build all modules with the same
value, including any libraries. This includes the system libraries and
startup modules.
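The regparm attribute applies the same idea to one function instead of a whole module. The sketch below assumes a 32-bit x86 target, where regparm(3) passes the first three integer arguments in EAX, EDX and ECX; the FAST_ARGS macro and mul_add function are hypothetical.

```c
/* regparm(n) is honored on 32-bit x86 only.  The attribute must appear on
   the declaration every caller sees, not only on the definition, or callers
   and callee disagree about where the arguments are. */
#if defined(__i386__)
#define FAST_ARGS(n) __attribute__((regparm(n)))
#else
#define FAST_ARGS(n) /* x86-64 already passes arguments in registers */
#endif

int FAST_ARGS(3) mul_add(int a, int b, int c);

int FAST_ARGS(3) mul_add(int a, int b, int c) {
    return a * b + c;
}
```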
-mpreferred-stack-boundary=num- Attempt to keep the stack aligned to a boundary of 2 raised to
the power num bytes. If -mpreferred-stack-boundary is not specified,
the default is 4 (16 bytes or 128 bits), except when optimizing for code
size (-Os), in which case the default is the minimum correct
alignment (4 bytes for x86, and 8 bytes for x86-64).
On Pentium and PentiumPro, double and long double values
should be aligned to an 8 byte boundary (see -malign-double) or
suffer significant run time performance penalties. On Pentium III, the
Streaming SIMD Extension (SSE) data type __m128 suffers similar
penalties if it is not 16 byte aligned.
To ensure proper alignment of these values on the stack, the stack boundary
must be at least as aligned as that required by any value stored on the stack.
Further, every function must be generated such that it keeps the stack
aligned. Thus calling a function compiled with a higher preferred
stack boundary from a function compiled with a lower preferred stack
boundary will most likely misalign the stack. It is recommended that
libraries that use callbacks always use the default setting.
This extra alignment does consume extra stack space, and generally
increases code size. Code that is sensitive to stack space usage, such
as embedded systems and operating system kernels, may want to reduce the
preferred alignment to -mpreferred-stack-boundary=2.
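The boundary is 2 raised to num bytes, so num=2 gives 4 bytes and the default num=4 gives 16. For callbacks that may be entered from code built with a smaller boundary, GCC also provides the force_align_arg_pointer function attribute (not described in this section), which realigns the stack on entry. The sketch below assumes that attribute on 32-bit x86 only; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes of alignment requested by -mpreferred-stack-boundary=num. */
size_t stack_boundary_bytes(unsigned num) {
    return (size_t)1 << num;
}

/* Realign the stack on entry so 16-byte-aligned locals (such as __m128
   values) stay aligned even if the caller kept only a 4-byte boundary. */
#if defined(__i386__)
#define REALIGN_ON_ENTRY __attribute__((force_align_arg_pointer))
#else
#define REALIGN_ON_ENTRY
#endif

int REALIGN_ON_ENTRY callback_local_is_aligned(void) {
    _Alignas(16) float v[4] = {0};
    return ((uintptr_t)v % 16) == 0;
}
```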
-mmmx, -mno-mmx
-msse, -mno-sse
-msse2, -mno-sse2
-msse3, -mno-sse3
-m3dnow, -mno-3dnow- These switches enable or disable the use of built-in functions that allow
direct access to the MMX, SSE, SSE2, SSE3 and 3DNow! extensions of the
instruction set.
See X86 Built-in Functions, for details of the functions enabled
and disabled by these switches.
To have SSE/SSE2 instructions generated automatically from floating-point
code, see -mfpmath=sse.
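Most code reaches these built-in functions through the intrinsic headers. The sketch below assumes <xmmintrin.h>, whose _mm_loadu_ps, _mm_add_ps and _mm_storeu_ps GCC implements on top of its x86 built-ins, and falls back to a scalar loop when SSE is not enabled; add4f is a hypothetical name.

```c
#ifdef __SSE__
#include <xmmintrin.h>
#endif

/* Add two arrays of four floats.  With SSE enabled (-msse, a suitable
   -march, or the x86-64 default) this compiles to one packed ADDPS;
   otherwise it uses a plain loop. */
void add4f(const float *a, const float *b, float *out) {
#ifdef __SSE__
    __m128 va = _mm_loadu_ps(a);   /* unaligned loads: no 16-byte requirement */
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
#else
    for (int i = 0; i < 4; i++)
        out[i] = a[i] + b[i];
#endif
}
```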
-mpush-args, -mno-push-args- Use PUSH operations to store outgoing parameters. This method is shorter
and usually as fast as the method using SUB/MOV operations, and it is enabled
by default. In some cases disabling it may improve performance because of
improved scheduling and reduced dependencies.
-maccumulate-outgoing-args- If enabled, the maximum amount of space required for outgoing arguments will be
computed in the function prologue. This is faster on most modern CPUs
because of reduced dependencies, improved scheduling and reduced stack usage
when the preferred stack boundary is not equal to 2. The drawback is a notable
increase in code size. This switch implies -mno-push-args.
-mthreads- Support thread-safe exception handling on Mingw32. Code that relies
on thread-safe exception handling must compile and link all code with the
-mthreads option. When compiling, -mthreads defines
-D_MT; when linking, it links in a special thread helper library
-lmingwthrd which cleans up per thread exception handling data.
-mno-align-stringops- Do not align the destination of inlined string operations. This switch reduces
code size and improves performance in case the destination is already aligned
but GCC doesn't know about it.
-minline-all-stringops- By default GCC inlines string operations only when the destination is known to be
aligned to at least a 4-byte boundary. This option enables more inlining and increases code
size, but may improve performance of code that depends on fast memcpy, strlen
and memset for short lengths.
-momit-leaf-frame-pointer- Don't keep the frame pointer in a register for leaf functions. This
avoids the instructions to save, set up and restore frame pointers and
makes an extra register available in leaf functions. The option
-fomit-frame-pointer removes the frame pointer for all functions,
which might make debugging harder.
-mtls-direct-seg-refs, -mno-tls-direct-seg-refs- Controls whether TLS variables may be accessed with offsets from the
TLS segment register (%gs for 32-bit, %fs for 64-bit),
or whether the thread base pointer must be added. Whether or not this
is legal depends on the operating system, and whether it maps the
segment to cover the entire TLS area.
For systems that use GNU libc, the default is on.
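For reference, the variables these switches affect are ordinary thread-local objects. The sketch below assumes GCC's __thread storage class on GNU/Linux, where each access goes through the TLS segment register named above; the counter and function are hypothetical.

```c
/* A thread-local counter: every thread gets its own copy, reached through
   %gs (32-bit) or %fs (64-bit).  -mtls-direct-seg-refs lets GCC address it
   as an offset from that segment register; -mno-tls-direct-seg-refs makes
   it load the thread base pointer and add the offset instead. */
static __thread int calls_in_this_thread;

int count_call(void) {
    return ++calls_in_this_thread;
}
```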
These -m switches are supported in addition to the above
on AMD x86-64 processors in 64-bit environments.
-m32, -m64- Generate code for a 32-bit or 64-bit environment.
The 32-bit environment sets int, long and pointer to 32 bits and
generates code that runs on any i386 system.
The 64-bit environment sets int to 32 bits and long and pointer
to 64 bits and generates code for AMD's x86-64 architecture.
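The two environments can be told apart from C by the widths of int, long and pointers. The sketch below assumes the Unix data models described above (ILP32 for -m32, LP64 for -m64); the helper names are hypothetical.

```c
#include <limits.h>

/* Width in bits of int, long and data pointers for this compilation.
   -m32 gives 32/32/32; -m64 gives 32/64/64. */
int int_bits(void)     { return (int)(sizeof(int) * CHAR_BIT); }
int long_bits(void)    { return (int)(sizeof(long) * CHAR_BIT); }
int pointer_bits(void) { return (int)(sizeof(void *) * CHAR_BIT); }
```

Building the same file with gcc -m32 and gcc -m64 and printing the three values should show int unchanged while long and pointer widths move together.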
-mno-red-zone- Do not use a so-called red zone for x86-64 code. The red zone is mandated
by the x86-64 ABI; it is a 128-byte area beyond the location of the
stack pointer that will not be modified by signal or interrupt handlers
and therefore can be used for temporary data without adjusting the stack
pointer. The flag -mno-red-zone disables this red zone.
-mcmodel=small- Generate code for the small code model: the program and its symbols must
be linked in the lower 2 GB of the address space. Pointers are 64 bits.
Programs can be statically or dynamically linked. This is the default
code model.
-mcmodel=kernel- Generate code for the kernel code model. The kernel runs in the
negative 2 GB of the address space.
This model has to be used for Linux kernel code.
-mcmodel=medium- Generate code for the medium model: the program is linked in the lower 2
GB of the address space, but symbols can be located anywhere in the
address space. Programs can be statically or dynamically linked, but
building shared libraries is not supported with the medium model.
-mcmodel=large- Generate code for the large model: This model makes no assumptions
about addresses and sizes of sections. Currently GCC does not implement
this model.