How exactly does gcc do optimization?

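To understand how gcc actually optimizes, I wrote two programs and compiled both with -O2, but the resulting assembly differs. In my programs I want to print "hello" in a loop, with some delay between outputs. The two programs only serve to illustrate my question; I know I could use volatile or asm in program 1 to get the behavior I want.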

Program 1

#include <stdio.h>

int main(int argc, char **argv)
{
    unsigned long i = 0;
    while (1) {
        if (++i > 0x1fffffffUL) {
            printf("hello\n");
            i = 0;
        }
    }
}

Compiled with -O2, the assembly code is:

Disassembly of section .text.startup:

00000000 <_main>:
#include <stdio.h>

int main(int argc, char **argv)
{
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 e4 f0                and    $0xfffffff0,%esp
   6:   83 ec 10                sub    $0x10,%esp
   9:   e8 00 00 00 00          call   e <_main+0xe>
   e:   66 90                   xchg   %ax,%ax
  10:   c7 04 24 00 00 00 00    movl   $0x0,(%esp)
  17:   e8 00 00 00 00          call   1c <_main+0x1c>
  1c:   eb f2                   jmp    10 <_main+0x10>
  1e:   90                      nop
  1f:   90                      nop

Program 2

#include <stdio.h>

int main(int argc, char **argv)
{
    unsigned long i = 0;
    while (1) {
        if (i > 0x1fffffffUL) {
            printf("hello\n");
            i = 0;
        }
        i++;
    }
}

Compiled with -O2, the assembly code is:

Disassembly of section .text.startup:

00000000 <_main>:
#include <stdio.h>

int main(int argc, char **argv)
{
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 e4 f0                and    $0xfffffff0,%esp
   6:   83 ec 10                sub    $0x10,%esp
   9:   e8 00 00 00 00          call   e <_main+0xe>
   e:   31 c0                   xor    %eax,%eax
  10:   83 c0 01                add    $0x1,%eax
  13:   3d ff ff ff 1f          cmp    $0x1fffffff,%eax
  18:   76 f6                   jbe    10 <_main+0x10>
  1a:   c7 04 24 00 00 00 00    movl   $0x0,(%esp)
    while (1) {
        if (i > 0x1fffffffUL) {
            printf("hello\n");
            i = 0;
        }
        i++;
  21:   e8 00 00 00 00          call   26 <_main+0x26>

int main(int argc, char **argv)
{
    unsigned long i = 0;
    while (1) {
        if (i > 0x1fffffffUL) {
  26:   31 c0                   xor    %eax,%eax
  28:   eb e6                   jmp    10 <_main+0x10>
            printf("hello\n");
  2a:   90                      nop
  2b:   90                      nop
  2c:   90                      nop
  2d:   90                      nop
  2e:   90                      nop
  2f:   90                      nop

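In program 1 the increment of i is optimized away, but in program 2 it is not. Why does this happen? What rules does gcc apply when optimizing these two programs at -O2?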

2 Answers

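The branch in the if statement now depends on what happened in the previous iteration of the loop. In particular, in program 1 the compiler can easily determine that i is incremented on every iteration of the while loop (because the increment sits right at the top), whereas in program 2 that is not the case.

In any case, compiler optimization is very complex. See below:

gcc -O2 is shorthand for enabling these flags (from the documentation; the exact set depends on the GCC version):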

      -fauto-inc-dec 
      -fbranch-count-reg 
      -fcombine-stack-adjustments 
      -fcompare-elim 
      -fcprop-registers 
      -fdce 
      -fdefer-pop 
      -fdelayed-branch 
      -fdse 
      -fforward-propagate 
      -fguess-branch-probability 
      -fif-conversion2 
      -fif-conversion 
      -finline-functions-called-once 
      -fipa-pure-const 
      -fipa-profile 
      -fipa-reference 
      -fmerge-constants 
      -fmove-loop-invariants 
      -freorder-blocks 
      -fshrink-wrap 
      -fsplit-wide-types 
      -fssa-backprop 
      -fssa-phiopt 
      -ftree-bit-ccp 
      -ftree-ccp 
      -ftree-ch 
      -ftree-coalesce-vars 
      -ftree-copy-prop 
      -ftree-dce 
      -ftree-dominator-opts 
      -ftree-dse 
      -ftree-forwprop 
      -ftree-fre 
      -ftree-phiprop 
      -ftree-sink 
      -ftree-slsr 
      -ftree-sra 
      -ftree-pta 
      -ftree-ter 
      -funit-at-a-time
      -fthread-jumps 
      -falign-functions  -falign-jumps 
      -falign-loops  -falign-labels 
      -fcaller-saves 
      -fcrossjumping 
      -fcse-follow-jumps  -fcse-skip-blocks 
      -fdelete-null-pointer-checks 
      -fdevirtualize -fdevirtualize-speculatively 
      -fexpensive-optimizations 
      -fgcse  -fgcse-lm  
      -fhoist-adjacent-loads 
      -finline-small-functions 
      -findirect-inlining 
      -fipa-cp 
      -fipa-cp-alignment 
      -fipa-sra 
      -fipa-icf 
      -fisolate-erroneous-paths-dereference 
      -flra-remat 
      -foptimize-sibling-calls 
      -foptimize-strlen 
      -fpartial-inlining 
      -fpeephole2 
      -freorder-blocks-algorithm=stc 
      -freorder-blocks-and-partition -freorder-functions 
      -frerun-cse-after-loop  
      -fsched-interblock  -fsched-spec 
      -fschedule-insns  -fschedule-insns2 
      -fstrict-aliasing -fstrict-overflow 
      -ftree-builtin-call-dce 
      -ftree-switch-conversion -ftree-tail-merge 
      -ftree-pre 
      -ftree-vrp 
      -fipa-ra

Each of these flags corresponds to a different optimization that the compiler is allowed to perform.
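To make one of these flags concrete, here is a hand-written sketch of the kind of rewrite that loop-invariant code motion (-fmove-loop-invariants in the list above) is about. The function names are hypothetical, and the second function is only an illustration written by hand: GCC works on its internal representation, and whether it performs this particular rewrite depends on the version and target.

```c
#include <stddef.h>

/* As written: `factor * factor` never changes inside the loop,
   yet the source asks for it on every iteration. */
long sum_scaled_naive(const int *v, size_t n, int factor)
{
    long total = 0;
    for (size_t k = 0; k < n; k++) {
        total += (long)v[k] * (factor * factor);
    }
    return total;
}

/* Hand-hoisted equivalent: the invariant product is computed once.
   This is the shape an optimizer may produce from the function above. */
long sum_scaled_hoisted(const int *v, size_t n, int factor)
{
    const int sq = factor * factor;  /* moved out of the loop */
    long total = 0;
    for (size_t k = 0; k < n; k++) {
        total += (long)v[k] * sq;
    }
    return total;
}
```

Both functions return the same value for the same inputs (as long as the product does not overflow), which is what allows a compiler to substitute one form for the other.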

Answered on 2024-04-28T08:10:21+08:00


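Asking "why" about an optimizer is usually a waste of time, because an optimizer has no "rules" it operates by, except for the "as if" rule: the optimizer may not change the observable behavior of conforming code.

The "observable behavior" of your program is to print "hello" repeatedly.

In your first program the counting is optimized away, so the observable behavior happens sooner. That is the optimizer's job. Be happy that your code is now more efficient!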
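A minimal sketch of what the "as if" rule permits here. Both functions below are hypothetical, bounded variants of program 1 (they stop after `lines` outputs so they can be run to completion, and `threshold` stands in for 0x1fffffffUL). They produce identical output, so a compiler is free to turn the first into something like the second; that matches the disassembly of program 1, where only the call and the jump remain.

```c
#include <stdio.h>

/* Program 1's loop, bounded so that it terminates.
   The counter `i` only delays the output; it is never observable. */
int hello_counted(int lines, unsigned long threshold)
{
    unsigned long i = 0;
    int printed = 0;
    while (printed < lines) {
        if (++i > threshold) {
            printf("hello\n");
            i = 0;
            printed++;
        }
    }
    return printed;
}

/* Same output with the counting removed: an allowed result of
   optimizing the function above. */
int hello_direct(int lines)
{
    int printed = 0;
    while (printed < lines) {
        printf("hello\n");
        printed++;
    }
    return printed;
}
```

The time that passes between two lines of output is not part of the observable behavior, which is why removing the delay is a legal transformation.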

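In your second program the counting is not optimized away because, somehow, the optimizer (in this version of this compiler, with these settings) did not see that it could do without it. Why? Who knows, other than the maintainers of the compiler's optimizer module?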
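If the counting has to survive regardless of compiler version, the question's own suggestion of volatile is one way to get there. The sketch below uses a hypothetical helper name. Accesses to a volatile object count as observable behavior in C, so the compiler has to perform every read and write of `i` instead of deleting the loop. How long the loop takes still depends on the machine, so this remains a poor way to measure out a delay.

```c
/* Hypothetical helper: spin until a volatile counter reaches `limit`.
   Every read and write of `i` must actually be performed, so the loop
   cannot be removed by the optimizer. Returns the final count. */
unsigned long spin_volatile(unsigned long limit)
{
    volatile unsigned long i = 0;
    while (i < limit) {
        i++;
    }
    return i;
}
```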

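If the behavior you want is a delay between outputs, use something like thrd_sleep(). Empty counting loops were a way to delay BASIC 2.0 programs on the C64, but they should not be used in C, for exactly the reason you just observed: you never know what the optimizer will do.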
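A minimal sketch of the thrd_sleep() approach, assuming a C11 library that ships <threads.h> (recent glibc and musl do; some platforms do not, in which case POSIX nanosleep() is the usual substitute). The helper name and its parameters are hypothetical, and the loop is bounded so that the example terminates.

```c
#include <stdio.h>
#include <threads.h>
#include <time.h>

/* Hypothetical helper: print "hello" `count` times, sleeping
   `delay_ms` milliseconds before each line. Returns lines printed. */
int print_hello_with_delay(int count, long delay_ms)
{
    struct timespec delay = {
        .tv_sec  = delay_ms / 1000,
        .tv_nsec = (delay_ms % 1000) * 1000000L
    };
    int printed = 0;
    for (int n = 0; n < count; n++) {
        /* Ask the OS to suspend this thread; the return value
           (nonzero if interrupted) is ignored in this sketch. */
        thrd_sleep(&delay, NULL);
        printf("hello\n");
        fflush(stdout);
        printed++;
    }
    return printed;
}
```

Because the delay is a library call with a visible effect on the thread, the optimizer cannot remove it, and the waiting thread does not burn CPU time the way a counting loop does.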

Answered on 2024-04-28T08:10:21+08:00
