近期通过后台的监控,看到某台 ORACLE 9i数据库服务器整体负载有升高趋势,部分时段loadavg峰值到8,平均在4-5之间,调整之后,系统负载下降非常显著,现将调整的过程分享出来,文档可能会有些长度,为了打消阅读的畏难情绪和普遍面对调优无从着手的印象,我想先强调一点:数据库性能调优,有时候简单到超乎你的想像~~~

现状描述:

先来看看调整前后整机负载状况的对比,如图:

收集时间为上午10:00-12:00 之间,分别收集10.27和28号两天数据的对比~~

由上图可以看出,通过特别调整后,系统整体负载下降了数倍,现在平均负载保持在0.x左右,高峰时期也没有超过1!

接下来再通过ORACLE的STATSPACK生成性能报告来对比,报表信息采集时间均为下午14:00---15:00之间:

 

A>. 数据读写

修改前:

    Load Profile

    ~~~~~~~~~~~~                            Per Second       Per Transaction

                                       ---------------       ---------------

                      Redo size:             26,567.21              1,611.03

                  Logical reads:            389,897.11             23,643.26

                  Block changes:                165.96                 10.06

                 Physical reads:             30,068.50              1,823.35

修改后:

    Load Profile

    ~~~~~~~~~~~~                            Per Second       Per Transaction

                                       ---------------       ---------------

                      Redo size:             32,102.45              2,132.94

                  Logical reads:             31,283.94              2,078.55

                  Block changes:                173.87                 11.55

                 Physical reads:                457.09                 30.37

=================================

由上述信息可以看出,Redo size/Block changes基本无变化(后者甚至更高),说明涉及数据的修改操作基本相同,数据读则有十倍以上的差距,修改后系统物理读和逻辑读都有近10倍的下降,可以简单理解为,同等应用环境情况下,对IO的负载要求减轻了10倍!

 

B>. TOP5等待事件

修改前:

    Top 5 Timed Events

    ~~~~~~~~~~~~~~~~~~                                                     % Total

    Event                                               Waits    Time (s) Ela Time

    -------------------------------------------- ------------ ----------- --------

    CPU time                                                       11,495    92.03

    db file sequential read                        47,569,569         369     2.95

    db file scattered read                         16,255,400         355     2.84

    log file sync                                      60,633         101      .81

    log file parallel write                            61,910          60      .48

修改后:

    Top 5 Timed Events

    ~~~~~~~~~~~~~~~~~~                                                     % Total

    Event                                               Waits    Time (s) Ela Time

    -------------------------------------------- ------------ ----------- --------

    CPU time                                                        1,254    87.71

    db file sequential read                           367,961          55     3.85

    log file sync                                      55,857          35     2.42

    LGWR wait on LNS                                  195,613          17     1.22

    LGWR-LNS wait on channel                           68,833          17     1.19

=================================

从排名前5的等待事件能够清晰的看到,曾经最严重的磁盘文件读等待现在已经完全可以忽视,并且CPU time时间也有10倍的下降~ 

 

C>. Buffer Gets

修改前:

                                                         CPU      Elapsd

      Buffer Gets    Executions  Gets per Exec  %Total Time (s)  Time (s) Hash Value

    --------------- ------------ -------------- ------ -------- --------- ----------

        450,247,504        1,974      228,088.9   32.1  2445.17   2405.51 1551938908

    select count(1) as num from t_wiki_doc_topics where doc_id_encry

    pt=¨BBwReRlFZV0RRXlkG¨ and topic_state=1

        446,599,884        1,959      227,973.4   31.8  2450.06   2406.25 1250588569

    select b.* from (select rownum as r,a.* from (select t.user_id,t

    .last_post_user_id,t.doc_title,t.topic_title,t.click_count+1 as

    click_count,t.posts_count-1 as posts_count,t.user_nick,to_char(t

    .last_post_time,¨yy-mm-dd hh24:mi:ss¨) last_post_time,t.last_pos

    t_user_nick,t.last_post_user_id_encrypt,t.user_id_encrypt,t.onto

        216,979,036        3,602       60,238.5   15.4  1878.72   1850.94   52310006

    select e.*, rownum from T_WIKI_EVENT e where (rownum <= :1) and

    NEXT_SEND_TIME < SYSDATE and EVENT_TYPE>=0 order by NEXT_SEND_TIME

              98,751,515       14,404        6,855.8    7.4  2176.33   2182.08 3309920380

    update T_WIKI_DOC_LOCK set DOC_LOCK_USER_ID=:1,DOC_LOCK_START_TI

    ME=to_date(:2,¨yyyy-mm-dd hh24:mi:ss¨),DOC_LOCK_HEARTBEAT_TIME=t

    o_date(:3,¨yyyy-mm-dd hh24:mi:ss¨),DOC_LOCK_STATE=:4,DOC_LOCK_US

    ER_ID_ENCRYPT=:5,DOC_ID_ENCRYPT=:6,DOC_LOCK_USER_IP=:7,DOC_TITLE

    =:8,DOC_USER_ID_EN=:9,DOC_USER_NICK=:10 where DOC_ID=:11

修改后:

                                                         CPU      Elapsd

      Buffer Gets    Executions  Gets per Exec  %Total Time (s)  Time (s) Hash Value

    --------------- ------------ -------------- ------ -------- --------- ----------

         31,281,969       18,319        1,707.6   27.8    86.73     96.33 3049882434

    select COUNT(*) from t_wiki_user_msg where ((dest_user_id_encryp

    t = :1) AND (msg_state = 1) AND (open_time IS NULL))

    .........

    ...........

         13,723,327        2,037        6,737.0   12.2   345.14    344.57 1551938908

    select count(1) as num from t_wiki_doc_topics where doc_id_encry

    pt=¨BBwReRlFZV0RRXlkG¨ and topic_state=1

         12,717,774        1,999        6,362.1   11.3   115.08    115.46 1250588569

    select b.* from (select rownum as r,a.* from (select t.user_id,t

    .last_post_user_id,t.doc_title,t.topic_title,t.click_count+1 as

    click_count,t.posts_count-1 as posts_count,t.user_nick,to_char(t

    .last_post_time,¨yy-mm-dd hh24:mi:ss¨) last_post_time,t.last_pos

    t_user_nick,t.last_post_user_id_encrypt,t.user_id_encrypt,t.onto

=================================

可以看到,修改前那几条占用CPU/BUFFER最为夸张的几条语句,在新的性能REPORT中都基本可以忽略,新的统计报表中,排名最靠前的buffer gets不过3kw,CPU时间不到1分半钟,与之前的SQL语句完全不在同等重量级,相同SQL语句的统计信息相比修改之前,都有十数倍以上的降低。

 

D>. 物理读

修改前:

    SQL ordered by Reads for DB: ORACLE9I  Instance: oracle9i  Snaps: 1014 -1015

    -> End Disk Reads Threshold:      1000

                                                         CPU      Elapsd

     Physical Reads  Executions  Reads per Exec %Total Time (s)  Time (s) Hash Value

    --------------- ------------ -------------- ------ -------- --------- ----------

        104,960,046        3,602       29,139.4   96.9  1878.72   1850.94   52310006

    select e.*, rownum from T_WIKI_EVENT e where (rownum <= :1) and

    NEXT_SEND_TIME < SYSDATE and EVENT_TYPE>=0 order by NEXT_SEND_TIME

修改后:

    SQL ordered by Reads for DB: ORACLE9I  Instance: oracle9i  Snaps: 1090 -1091

    -> End Disk Reads Threshold:      1000

                                                         CPU      Elapsd

     Physical Reads  Executions  Reads per Exec %Total Time (s)  Time (s) Hash Value

    --------------- ------------ -------------- ------ -------- --------- ----------

              4,225       18,319            0.2    0.3    86.73     96.33 3049882434

    select COUNT(*) from t_wiki_user_msg where ((dest_user_id_encryp

    t = :1) AND (msg_state = 1) AND (open_time IS NULL))

=================================

曾经排名最靠前的产生物理读的SQL语句,在新的报表中已经不再出现,并且新的性能报表中,排名最高的SQL所产生的物理读也只有4,225个(该语句执行次数稍高了一点,不过这个就非DBA所能控制了,需要开发人员及设计人员配合处理才能有可能降低其请求次数)。

 

E>. 实例整体统计信息

修改前:

    Statistic                                      Total     per Second    per Trans

    --------------------------------- ------------------ -------------- ------------

    CPU used by this session                   1,149,502          319.1         19.4

    CPU used when call started                 1,149,615          319.2         19.4

    CR blocks created                             10,632            3.0          0.2

    Cached Commit SCN referenced              11,403,832        3,166.0        192.0

    Commit SCN cached                                105            0.0          0.0

    DBWR buffers scanned                          68,437           19.0          1.2

    DBWR summed scan depth                        68,437           19.0          1.2

    buffer is not pinned count             1,369,310,429      380,152.8     23,052.4

    buffer is pinned count                 6,491,211,605    1,802,113.2    109,279.7

    .......

    consistent changes                            16,402            4.6          0.3

    consistent gets                        1,403,720,942      389,706.0     23,631.7

    consistent gets - examination              6,368,534        1,768.1        107.2

    .........

    no work - consistent read gets         1,396,760,295      387,773.5     23,514.5

    .........

    physical reads                           108,306,724       30,068.5      1,823.4

    physical reads direct                         46,758           13.0          0.8

    physical reads direct (lob)                    2,161            0.6          0.0

    .........

    session logical reads                  1,404,409,388      389,897.1     23,643.3

    .........

    table fetch by rowid                   3,496,022,454      970,578.1     58,855.6

    table fetch continued row                395,372,678      109,764.8      6,656.1

    table scan blocks gotten                 453,859,894      126,002.2      7,640.7

    table scan rows gotten                 6,925,949,059    1,922,806.5    116,598.5

修改后:

    Statistic                                      Total     per Second    per Trans

    --------------------------------- ------------------ -------------- ------------

    CPU used by this session                     125,354           34.8          2.3

    CPU used when call started                   125,227           34.8          2.3

    CR blocks created                                278            0.1          0.0

    Cached Commit SCN referenced              12,475,990        3,464.6        230.2

    Commit SCN cached                                 66            0.0          0.0

    DBWR buffers scanned                              50            0.0          0.0

    DBWR summed scan depth                            50            0.0          0.0

    buffer is not pinned count                86,322,220       23,971.7      1,592.7

    buffer is pinned count                   315,042,719       87,487.6      5,812.8

    .........

    no work - consistent read gets           103,809,445       28,828.0      1,915.4

    .........

    physical reads                             1,645,979          457.1         30.4

    physical reads direct                              0            0.0          0.0

    physical reads direct (lob)                   11,441            3.2          0.2

    .........

    session logical reads                    112,653,483       31,283.9      2,078.6

    .........

    table fetch by rowid                     178,037,671       49,441.2      3,285.0

    table fetch continued row                    640,957          178.0         11.8

    table scan blocks gotten                  27,871,674        7,740.0        514.3

    table scan rows gotten                   435,414,252      120,914.8      8,033.8

=================================

在consistent gets,consistent read gets,physical reads,session logical reads,table fetch,table scan 等方面都有数倍到数十倍的下降

 

F>. 表空间IO

修改前:

    Tablespace

    ------------------------------

                     Av      Av     Av                    Av        Buffer Av Buf

             Reads Reads/s Rd(ms) Blks/Rd       Writes Writes/s      Waits Wt(ms)

    -------------- ------- ------ ------- ------------ -------- ---------- ------

    GAIBAN_DATA

        63,821,302  17,718    0.0     1.7       10,734        3      5,633    1.5

修改后:

    Tablespace

    ------------------------------

                     Av      Av     Av                    Av        Buffer Av Buf

             Reads Reads/s Rd(ms) Blks/Rd       Writes Writes/s      Waits Wt(ms)

    -------------- ------- ------ ------- ------------ -------- ---------- ------

    GAIBAN_DATA

           562,450     156    0.1     2.9       17,842        5          6    6.7

=================================

从统计结果看,调整主要涉及表空间的读和等待均有百倍的提升~~~