db2诊断系列之---定位锁等待问题
作者:tacy lee
在应用中,我们经常会碰到sql执行很慢,但是数据库cpu和内存使用率又不高的情况,类似的问题基本上由于锁,排序等原因造成,本文主要描述如何去定位锁等待问题,谁在锁等待?等待谁持有的锁?锁在那个表?
一、测试准备
1、先在session1执行如下操作,创建测试表
#db2 connect to eos
#export DB2OPTIONS=+C
#db2 "create table tacy_test (a int not null primary key,b varchar(10))"
#db2 "insert into tacy_test values(1,'a')"
#db2 "insert into tacy_test values(2,'a')"
#db2 "insert into tacy_test values(3,'a')"
#db2 "insert into tacy_test values(4,'a')"
#db2 commit
2、在session2执行如下操作
#db2 connect to eos
#export DB2OPTIONS=+C
二、产生一个lock wait
在session1做一个表更新:
#db2 "update tacy_test set b='b' where a=4"
sql执行成功
在session2做同样更新操作:
#db2 "update tacy_test set b='c' where a=4"
进程被挂起等待
三、定位锁等待
1、先来看看应用的情况:
#db2pd -db eos -applications
Database Partition 0 -- Database EOS -- Active -- Up 0 days 07:37:37
Applications:
Address AppHandl [nod-index] NumAgents CoorPid Status C-AnchID C-StmtUID L-AnchID L-StmtUID Appid
0x10140040 8 [000-00008] 1 8425 Lock-wait 80 2 66 1 *LOCAL.db2inst1.071124043739
0x100CE540 7 [000-00007] 1 8358 UOW-Waiting 0 0 80 2 *LOCAL.db2inst1.071124043708
可以看到有一个应用的状态处于Lock-wait
2、现在我们来看看应用在等什么
#db2pd -db eos -locks showlock wait
Database Partition 0 -- Database EOS -- Active -- Up 0 days 07:42:56
Locks:
Address TranHdl Lockname Type Mode Sts Owner Dur HldCnt Att Rlse
0x2C8E0760 3 02001806078066020000000052 Row ..X W 2 1 0 0 0x0 TbspaceID 2 TableID 1560 RecordID 0x2668007
锁的类型为Row(行锁),X锁(排他锁),下面是我们最关心的锁的位置
TbspaceID 2 TableID 1560 RecordID 0x2668007
其中TbspaceID为表空间ID,TableID为表的ID,RecordID代表具体位置,全部应该是0x0266807,其中前面三个字节为page number,为0x026680,后面一个字节代表solt identifier,为0x07
3、找到相应的表
#db2 "select tbspace,tabschema,tabname,tableid,tbspaceid from syscat.tables where tbspaceid=2 and tableid=1560"
TBSPACE TABSCHEMA TABNAME TABLEID TBSPACEID
------------ ----------- ---------- ------- ---------
USERSPACE1 DB2INST1 TACY_TEST 1560 2
1 record(s) selected.
4、根据RecordID找到锁在哪行
db2提供了一个强大的数据分析工具db2dart,可以dump出相应的page数据
#db2dart eos /dd /tsi 2 /oi 1560 /ps 157312p /np 1 /v y
Warning: The database state is not consistent.
Warning: Reorg rows MAY be due to the inconsistent state of the database.
DB2DART Processing completed with warning(s)!
Complete DB2DART report found in:
/home/db2inst1/sqllib/db2dump/DART0000/EOS.RPT
其中tsi为表空间id(2),oi为表id(1560),ps为page number(0x0266807),需要转换为十进制,在结尾必须加p,np代表你要获取的页数,v为是否详细输出
现在我们来看看EOS.RPT
______________________________________________________________________________
_______ DART _______
D a t a b a s e A n a l y s i s a n d R e p o r t i n g T o o l
IBM DB2 6000
______________________________________________________________________________
DART (V8.1.0) Report:
2007-11-24-20.59.51.355893
Database Name: EOS
Report name: EOS.RPT
Old report back-up: EOS.BAK
Database Subdirectory: /opt/db2/db2inst1/NODE0000/SQL00001
Operational Mode: Database Inspection Only (INSPECT)
______________________________________________________________________________
------------------------------------------------------------------------------
Action option: DD
Table-object-ID: 1560; Tablespace-ID: 2; First-page: 157312p; Number-pages: 1; Verbose: y
Warning: The database state is not consistent.
Warning: Reorg rows MAY be due to the inconsistent state of the database.
Connecting to Buffer Pool Services...
Table object report phase start.
Dump format is verbose.
______________________________________
Page 0 of object 1560 from table space 2.
BPS Page Header:
Page Data Offset = 48
Page Data Length = 4048
Page LSN = 0000 AE97 AE41
Object Page Number = 0
Pool Page Number = 157312
Object ID = 1560
Object Type = Data Object
Data Page Header:
Slot Count = 8
Total Free Space = 2784
Total Reserve Space = 0
Youngest Reserve Space = n/a
Youngest TID = n/a
Free Space Offset = 2799
Maximum Record Size = 23
Data Records:
Slot 0:
Offset Location = 3996 (xF9C)
Record Length = 32 (x20)
Record Type = Data Object Header Control Record
Page count = 1
Object Creation LSN = 0000 AE97 800C
Object State = x0000
UDI Since Runstats = 0
DART Field = x00000000
Slot 1:
Offset Location = 2992 (xBB0)
Record Length = 1004 (x3EC)
Record Type = Free Space Control Record
Free space entries:
0: 2884 (x0B44), 4028 (x0FBC), 4028 (x0FBC), 4028 (x0FBC)
4: 4028 (x0FBC), 4028 (x0FBC), 4028 (x0FBC), 4028 (x0FBC)
8: 4028 (x0FBC), 4028 (x0FBC), 4028 (x0FBC), 4028 (x0FBC)
省略。。。
492: 4028 (x0FBC), 4028 (x0FBC), 4028 (x0FBC), 4028 (x0FBC)
496: 4028 (x0FBC), 4028 (x0FBC), 4028 (x0FBC), 4028 (x0FBC)
Slot 2:
Offset Location = 2916 (xB64)
Record Length = 76 (x4C)
Record Type = Table Directory Record
MetaIndex Root Page = 157377
Index Type = 2
Table Descriptor Pointer -- Page 157312 Slot 3
Max Insert Search = 0
Flags = x02000200
bit representation = 00000010 00000000 00000010 00000000
Check pending info:
Constraint status = x00
Constraint RID = Page 0 Slot 0
last BID = x00000000
Slot 3:
Offset Location = 2892 (xB4C)
Record Length = 24 (x18)
Record Type = Table Description Record
Number of Columns = 2
Column 1:
Type is Long Integer
Length = 4
Prohibits NULLs
Prohibits Default
Fixed offset: 0
Column 2:
Type is Fixed Length Character String
Length = 10
Allows NULLs
Prohibits Default
Fixed offset: 4
Slot 4:
Offset Location = 2869 (xB35)
Record Length = 23 (x17)
Record Type = Table Data Record (FIXEDVAR)
Fixed part length value = 15
Column 1:
Fixed offset: 0
Type is Long Integer
Value = 1
Column 2:
Fixed offset: 4
Type is Fixed Length Character String
61202020 20202020 2020 a
Slot 5:
Offset Location = 2846 (xB1E)
Record Length = 23 (x17)
Record Type = Table Data Record (FIXEDVAR)
Fixed part length value = 15
Column 1:
Fixed offset: 0
Type is Long Integer
Value = 2
Column 2:
Fixed offset: 4
Type is Fixed Length Character String
61202020 20202020 2020 a
Slot 6:
Offset Location = 2823 (xB07)
Record Length = 23 (x17)
Record Type = Table Data Record (FIXEDVAR)
Fixed part length value = 15
Column 1:
Fixed offset: 0
Type is Long Integer
Value = 3
Column 2:
Fixed offset: 4
Type is Fixed Length Character String
61202020 20202020 2020 a
Slot 7:
Offset Location = 2800 (xAF0)
Record Length = 23 (x17)
Record Type = Table Data Record (FIXEDVAR)
Fixed part length value = 15
Column 1:
Fixed offset: 0
Type is Long Integer
Value = 4
Column 2:
Fixed offset: 4
Type is Fixed Length Character String
61202020 20202020 2020 a
Slots Summary: Total=8, In-use=8, Deleted=0.
Table object report phase end.
______________________________________
DB2DART Processing completed with warning(s)!
Warning(s) detected during processing.
______________________________________
Complete DB2DART report found in:
/home/db2inst1/sqllib/db2dump/DART0000/EOS.RPT
_______ D A R T P R O C E S S I N G C O M P L E T E _______
找到Solt 7 (0x07),ok,你现在可以清楚的知道应用等待的Row为(4,a)
总结
通过上面的方法,我们简单描述了一个db2锁问题的定位方法,希望能给大家在分析和定位应用性能问题的时候起到一定的帮助
根据rid能查到该row数据
==========================
在程序出现问题的时候,锁等待往往得不到释放,就是那个application执行完毕都还hold住表,在这种情况下,我们需要尽可能多的提供信息给开发人员,帮助他们定位问题所在,确实这个很useful。
1,db2pd -d mall -locks show detail 查看到锁等待row所在的page和slot
2,根据数据库ROWID的组成,pages为640和slot为4,如果是 large tablespace,则rowid转换为整数为:640*65536+4 ( 补充:一般说来,RID在v8里面是4字节,在v9里面的large tablespace(默认)是6字节。4字节的版本是3字节page id加上1字节的slot。6字节版本是4字节page id加上2字节的slot
) 3,select * from ORDERS where rid(ORDERS)= 54758604808 with ur 根据rid能查到该row数据
------
这块为什么要乘以65536 大家可能听说过RID,也很困惑RID是什么咚咚(只知道是指向某一个record的,但是不知道具体是怎么指的)。 一般说来,RID在v8里面是4字节,在v9里面的large tablespace(默认)是6字节。4字节的版本是3字节page id加上1字节的slot。 6字节版本是4字节page id加上2字节的slot 这块是v9 ,large tablespace ,所以2的16次方=65536 ,也就是一页有65536个slot, 故当前的row对应的rowid为640*65536+4