SRID: JPRDS6DSJTY Stat: OP1L2 PMR: 63359,693,760 Entitle: N___ S************S Rqst: SOFTWARE SubRqst: SOFTWARE DEFECT SUPPORT * PRB REC * Queu: TSCTQ_ Ctr: 30X S3 P3 04 Con: (V10J1)__________ * PRINT * Comp: 5752SC1C3___ Rel: 790 ConPh: ___________________ SC: 75 * IBM CONFID * RtID: 5752SC1C3___ 760 Lvl: 1 GenPh: ___________________ Cat: MVS Surv: __ PID#: _______ VRM: ___ NxtQ: IOSHCD NxtCtr: 36D PA: ______ -00 CUS SP: Y Cust: 0993046 IKT1CCMK-STC________________ CPU: XXXX 0993046 SecC: N Cmts: IOS DPSVAL in HyperPAV environment____________________ Ar: 25 Eng: _ 11/08/23 modify Don't disp,handle by isc Ent: 10:28/17 170647 Dly: __ : __ / __ APAR: _______ BU: AFE O15/08/17 10:28 Dsp: __:__/__ ______ T/D: ________ ECT: _________ Ter: 127 A15/10/20 11:24 Req: 03:10/03 441998 ESN: _____________ Sysdown: N FUPQ/C: ______ ___ FUP/MM/DD Own: ______________________ OwnID: O______ SWBO/Cty: SMA1760 Res: BARB EIKOV BONANNO ResID: R441998 FixNm: _______________ RS4 EMail: __________________________________ AltPh: ___________________ AltPhSource: ______ Page 1 Service Options: ___ ___ ___ ___ ___ ___ ___ CritSit: ______________ of KW1: @SKM________#SG1__ KW2: _________ KW3: PB=TSMVS,30X 30 -SAKAI, M M -5752SC1C3 -L30X/-------P3S3-15/08/18-19:35 -SP DIA: IOS DPSVAL in HyperPAV environment. Long I/O. 
CL15/08/17 ZZZLL
-SAKAI, M M -5752SC1C3 -L30X/TSCTQ -P3S3-15/08/17-10:28 -CE
+ECUREP PMRUPDATE R27 P-5752SC1C3 -L203/-------P3S3-15/08/17-11:15 -AT
Material received from FTP Server and stored in ECuRep:
Directory: /ecurep/pmr/6/3/63359,693,760/2015-08-17
File: 63359.693.760.syslog58 3072 bytes
File: 63359.693.760.gtf58 754688 bytes
+ECUREP PMRUPDATE R27 P-5752SC1C3 -L30X/TSCTQ -P3S3-15/08/17-11:15 SCG
+ECUREP PMRUPDATE R27 P-5752SC1C3 -L30X/TSCTQ -P3S3-15/08/17-11:15 SAT
+ECUREP PMRUPDATE R27 P-5752SC1C3 -L203/-------P3S3-15/08/17-11:15 -AT
Untersed data now available on MCEVS1-System :
/ecurep/pmr/6/3/63359,693,760/2015-08-17/63359.693.760.syslog58
->ONTOP.GS693.P63359.C760.SYSLOG58
+ECUREP PMRUPDATE R27 P-5752SC1C3 -L203/-------P3S3-15/08/17-11:16 -AT
Untersed data now available on MCEVS1-System :
/ecurep/pmr/6/3/63359,693,760/2015-08-17/63359.693.760.gtf58
->ONTOP.GS693.P63359.C760.GTF58
+ECUREP PMRUPDATE R27 P-5752SC1C3 -L203/-------P3S3-15/08/17-11:20 -AT
Material received from FTP Server and stored in ECuRep:
Directory: /ecurep/pmr/6/3/63359,693,760/2015-08-17
File: 63359.693.760.dump58 59362304 bytes
+ECUREP PMRUPDATE R27 P-5752SC1C3 -L203/-------P3S3-15/08/17-11:20 -AT
No call generated.
A Call exists already on Queue: TSCTQ,30X
+ECUREP PMRUPDATE R27 P-5752SC1C3 -L203/-------P3S3-15/08/17-11:21 -AT
Untersed data now available on MCEVS1-System :
/ecurep/pmr/6/3/63359,693,760/2015-08-17/63359.693.760.dump58
->ONTOP.GS693.P63359.C760.DUMP58
+ECUREP PMRUPDATE R27 P-5752SC1C3 -L203/-------P3S3-15/08/17-11:27 -AT
Material received from FTP Server and stored in ECuRep:
Directory: /ecurep/pmr/6/3/63359,693,760/2015-08-17
File: 63359.693.760.syslog85 15360 bytes
+ECUREP PMRUPDATE R27 P-5752SC1C3 -L203/-------P3S3-15/08/17-11:27 -AT
No call generated.
A Call exists already on Queue: TSCTQ,30X
+ECUREP PMRUPDATE R27 P-5752SC1C3 -L203/-------P3S3-15/08/17-11:31 -AT
Untersed data now available on MCEVS1-System :
/ecurep/pmr/6/3/63359,693,760/2015-08-17/63359.693.760.syslog85
->ONTOP.GS693.P63359.C760.SYSLOG85
+ECUREP PMRUPDATE R27 P-5752SC1C3 -L203/-------P3S3-15/08/17-11:34 -AT
Material received from FTP Server and stored in ECuRep:
Directory: /ecurep/pmr/6/3/63359,693,760/2015-08-17
File: 63359.693.760.dump85 52421632 bytes
+ECUREP PMRUPDATE R27 P-5752SC1C3 -L203/-------P3S3-15/08/17-11:34 -AT
No call generated.
A Call exists already on Queue: TSCTQ,30X
+ECUREP PMRUPDATE R27 P-5752SC1C3 -L203/-------P3S3-15/08/17-11:36 -AT
Untersed data now available on MCEVS1-System :
/ecurep/pmr/6/3/63359,693,760/2015-08-17/63359.693.760.dump85
->ONTOP.GS693.P63359.C760.DUMP85
+ECUREP PMRUPDATE R27 P-5752SC1C3 -L203/-------P3S3-15/08/17-12:30 -AT
Material received from FTP Server and stored in ECuRep:
Directory: /ecurep/pmr/6/3/63359,693,760/2015-08-17
File: 63359.693.760.gtf85 638832640 bytes
+ECUREP PMRUPDATE R27 P-5752SC1C3 -L203/-------P3S3-15/08/17-12:30 -AT
No call generated.
A Call exists already on Queue: TSCTQ,30X
+ECUREP PMRUPDATE R27 P-5752SC1C3 -L203/-------P3S3-15/08/17-12:32 -AT
Untersed data now available on MCEVS1-System :
/ecurep/pmr/6/3/63359,693,760/2015-08-17/63359.693.760.gtf85
->ONTOP.GS693.P63359.C760.GTF85
-SAKAI, M M -5752SC1C3 -L30X/TSCTQ -P3S3-15/08/17-12:56 S2D
+ECUREP PMRUPDATE R27 P-5752SC1C3 -L203/-------P3S3-15/08/17-14:12 -AT
Material received from FTP Server and stored in ECuRep:
Directory: /ecurep/pmr/6/3/63359,693,760/2015-08-17
File: 63359.693.760.gtfaa18 194560 bytes
+ECUREP PMRUPDATE R27 P-5752SC1C3 -L30X/TSCTQ -P3S3-15/08/17-14:12 SCG
+ECUREP PMRUPDATE R27 P-5752SC1C3 -L30X/TSCTQ -P3S3-15/08/17-14:12 SAT
+ECUREP PMRUPDATE R27 P-5752SC1C3 -L203/-------P3S3-15/08/17-14:15 -AT
Untersed data now available on MCEVS1-System :
/ecurep/pmr/6/3/63359,693,760/2015-08-17/63359.693.760.gtfaa18
->ONTOP.GS693.P63359.C760.GTFAA18
-SAKAI, M M -5752SC1C3 -L30X/TSCTQ -P3S3-15/08/17-16:06 S2D
-SAKAI, M M -5752SC1C3 -L30X/-------P3S3-15/08/17-19:33 -AL
-SAKAI, M M -5752SC1C3 -L30X/-------P3S3-15/08/18-19:35 -AT
Customer has concern about the IOS recovery time when the dasd devices
are unresponsive.
When IFCC or MIH type error occurs, IOS performs the DPS validation.
During this DPSVAL, if the SNID cmd fails, the defective path is taken
offline with message IOS450E PERMANENT I/O, PATH TAKEN OFFLINE.
If the device is completely unresponsive, the SNID cmd would fail as
MIH. DPSVAL processing sets the special timeout value, 15 seconds once,
5 seconds twice. So, the path should be taken offline after about 25
seconds and the application I/O should resume after that.
The problem here is that in HyperPAV environment, the time for the
IOS recovery (the time between the triggering H/W error and the
application I/O resume) seems to be much longer because the
application I/O resumes after DPSVAL against both PAV alias and base
devices.
.
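The timeout arithmetic above can be written out as a small sketch. This
is only a hypothetical model, not IOS code: the function names are made
up for illustration, and it assumes that every SNID attempt runs to its
full timeout and that the base and each alias are validated one after
another.

```python
# Hypothetical timing model of the DPSVAL behaviour described above.
# It is not IOS code; it only reproduces the arithmetic in the text.

# SNID timeout sequence stated in the text: 15 seconds once, 5 seconds twice.
SNID_TIMEOUTS_S = (15, 5, 5)


def dpsval_seconds(timeouts=SNID_TIMEOUTS_S):
    """Worst-case DPSVAL time for one completely unresponsive device."""
    return sum(timeouts)


def serial_recovery_seconds(alias_count, timeouts=SNID_TIMEOUTS_S):
    """Worst case when the base and each alias are validated in turn (assumed)."""
    if alias_count < 0:
        raise ValueError("alias_count must be non-negative")
    return (1 + alias_count) * dpsval_seconds(timeouts)


if __name__ == "__main__":
    for aliases in (0, 1, 2):
        print(aliases, "alias(es):", serial_recovery_seconds(aliases), "seconds")
```

With no alias involved the model gives the 25 seconds quoted above. Each
additional device that goes through DPSVAL adds another 25 seconds under
these assumptions.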
Our H/W engineer helped us to inject the IFCC and the unresponsive type
path failures
 - 3 channel path onlined
   CHPID A0 ( Always channel busy )
   CHPID A1 ( Good state )
   CHPID A8 ( Temporary IFCC )
.
The docs collected are syslog / GTF CCW trace / console dump of IOSAS+
dataspace (for SYSIOS ctrace).
.
> Docs in MCEVS1;
< case 1 >
ONTOP.GS693.P63359.C760.SYSLOG58
ONTOP.GS693.P63359.C760.GTF58 (sorry, this is formatted trace)
ONTOP.GS693.P63359.C760.DUMP58
.
< case 2 >
ONTOP.GS693.P63359.C760.SYSLOG85
ONTOP.GS693.P63359.C760.GTF85 (this one is raw GTF)
ONTOP.GS693.P63359.C760.DUMP85
.
> test case 1) simple case - syslog & GTF summary
16:24:01.96 IOS1051I INTERFACE TIMEOUT DETECTED ON 1AAED,A8,E7,**02,PCH
16:24:17.78 IOS071I AA00,A0,*MASTER*, START PENDING (1AAED) alias
16:24:23.78 IOS071I AA00,A0,*MASTER*, START PENDING (1AAED) alias
16:24:29.78 IOS071I AA00,A0,*MASTER*, START PENDING (1AAED) alias
            ** SPID Disband/Establish for 1AAED **
16:24:45.79 IOS071I AA00,A0,*MASTER*, START PENDING (0AA00) base
16:24:51.79 IOS071I AA00,A0,*MASTER*, START PENDING (0AA00) base
16:24:57.79 IOS071I AA00,A0,*MASTER*, START PENDING (0AA00) base
            ** SPID Disband/Establish for 0AA00 **
16:24:57.80 *IOS450E AA00,A0, PERMANENT I/O, PATH TAKEN OFFLINE
16:24:57.81 *** application I/O resumed ***
.
Note that the device address in IOS071I is always PAV base. The actual
address (1AAED) (0AA00) in above is taken from GTF CCW trace.
In this case, the IFCC occurred on PAV alias 1AAED. GTF trace shows
SNID was issued just after this IFCC. After 15-16 seconds, 1st MIH on
alias, then two MIH each after 5-6 seconds. Although no IOS message was
issued, GTF trace shows SPID disband and SPID establish was issued for
the alias 1AAED.
Then another 15 sec MIH on PAV base, two 5 sec MIH, then SPID disband
and establish and IOS450E for the base.
The application I/O resumed after the IOS450E.
.
.
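The intervals in the test case 1 summary above can be checked with a
short script. This is a minimal sketch: the timestamps are copied from
the summary, the helper names are hypothetical, and it assumes all
events fall on the same day.

```python
from datetime import datetime

# Timestamps copied from the test case 1 summary above.
EVENTS = [
    ("16:24:01.96", "IOS1051I interface timeout on alias 1AAED"),
    ("16:24:17.78", "IOS071I start pending, alias 1AAED"),
    ("16:24:23.78", "IOS071I start pending, alias 1AAED"),
    ("16:24:29.78", "IOS071I start pending, alias 1AAED"),
    ("16:24:45.79", "IOS071I start pending, base 0AA00"),
    ("16:24:51.79", "IOS071I start pending, base 0AA00"),
    ("16:24:57.79", "IOS071I start pending, base 0AA00"),
    ("16:24:57.80", "IOS450E path taken offline"),
    ("16:24:57.81", "application I/O resumed"),
]


def to_time(stamp):
    """Parse an hh:mm:ss.ff syslog timestamp (same day assumed)."""
    return datetime.strptime(stamp, "%H:%M:%S.%f")


def gaps(events):
    """Seconds between each pair of consecutive events."""
    times = [to_time(stamp) for stamp, _ in events]
    return [round((b - a).total_seconds(), 2) for a, b in zip(times, times[1:])]


def total_seconds(events):
    """Seconds from the first event to the last one."""
    first, last = to_time(events[0][0]), to_time(events[-1][0])
    return round((last - first).total_seconds(), 2)


if __name__ == "__main__":
    for (stamp, text), gap in zip(EVENTS[1:], gaps(EVENTS)):
        print(f"{stamp}  +{gap:6.2f}s  {text}")
    print("total:", total_seconds(EVENTS), "seconds")
```

The total from the IOS1051I message to the I/O resume comes to 55.85
seconds, a little more than twice the 25 seconds expected for a single
DPSVAL.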
> test case 2) many jobs and many devices case - syslog & GTF summary
< for device AA09 > error on PAV alias case
01:13:30.99 IOS1051I INTERFACE TIMEOUT DETECTED ON 1AADD,A8,E7,**02,PCH
01:13:46.15 IOS071I AA09,A0,*MASTER*, START PENDING (1AADD) alias
01:13:51.17 IOS071I AA09,A0,*MASTER*, START PENDING (1AADD) alias
01:13:57.21 IOS071I AA09,A0,*MASTER*, START PENDING (1AADD) alias
            ** SPID Disband/Establish for 1AADD **
01:14:13.25 IOS071I AA09,A0,*MASTER*, START PENDING (0AA09) base
01:14:19.28 IOS071I AA09,A0,*MASTER*, START PENDING (0AA09) base
01:14:25.37 IOS071I AA09,A0,*MASTER*, START PENDING (0AA09) base
            ** SPID Disband/Establish for 0AA09 **
01:14:25.42 *IOS450E AA09,A0, PERMANENT I/O, PATH TAKEN OFFLINE
01:14:25.43 *** application I/O resumed ***
.
< for device AC11 > error on PAV alias case
01:13:30.99 IOS1051I INTERFACE TIMEOUT DETECTED ON 1ACD5,A8,E7,**02,PCH
01:13:46.15 IOS071I AC11,A0,*MASTER*, START PENDING (1ACD5) alias
01:13:51.17 IOS071I AC11,A0,*MASTER*, START PENDING (1ACD5) alias
01:13:57.20 IOS071I AC11,A0,*MASTER*, START PENDING (1ACD5) alias
            ** SPID Disband/Establish for 1ACD5 **
01:14:13.25 IOS071I AC11,A0,*MASTER*, START PENDING (0AC11) base
01:14:19.28 IOS071I AC11,A0,*MASTER*, START PENDING (0AC11) base
01:14:25.37 IOS071I AC11,A0,*MASTER*, START PENDING (0AC11) base
            ** SPID Disband/Establish for 0AC11 **
01:14:25.44 *IOS450E AC11,A0, PERMANENT I/O, PATH TAKEN OFFLINE
01:14:25.46 *** application I/O resumed ***
.
< for device AA0B > error on PAV base case
01:13:30.99 IOS051I INTERFACE TIMEOUT DETECTED ON AA0B,A8,E7,**02,PCH
01:13:46.15 IOS071I AA0B,A0,*MASTER*, START PENDING (0AA0B) base
01:13:51.18 IOS071I AA0B,A0,*MASTER*, START PENDING (0AA0B) base
01:13:57.21 IOS071I AA0B,A0,*MASTER*, START PENDING (0AA0B) base
            ** SPID Disband/Establish for 0AA0B **
01:13:57.25 *IOS450E AA0B,A0, PERMANENT I/O, PATH TAKEN OFFLINE
01:14:13.25 IOS071I AA0B,A0,*MASTER*, START PENDING (1AAF2) alias
01:14:19.28 IOS071I AA0B,A0,*MASTER*, START PENDING (1AAF2) alias
01:14:25.37 IOS071I AA0B,A0,*MASTER*, START PENDING (1AAF2) alias
            ** SPID Disband/Establish for 1AAF2 **
01:14:41.47 IOS071I AA0B,A0,*MASTER*, START PENDING (1AA92) alias
01:14:47.49 IOS071I AA0B,A0,*MASTER*, START PENDING (1AA92) alias
01:14:53.52 IOS071I AA0B,A0,*MASTER*, START PENDING (1AA92) alias
            ** SPID Disband/Establish for 1AA92 **
            *** application I/O resumed ***
.
< for device AC13 > error on PAV base case
01:13:30.99 IOS051I INTERFACE TIMEOUT DETECTED ON AC13,A8,E7,**02,PCH
01:13:46.16 IOS071I AC13,A0,*MASTER*, START PENDING (0AC13) base
01:13:51.17 IOS071I AC13,A0,*MASTER*, START PENDING (0AC13) base
01:13:57.21 IOS071I AC13,A0,*MASTER*, START PENDING (0AC13) base
            ** SPID Disband/Establish for 0AC13 **
01:13:57.26 *IOS450E AC13,A0, PERMANENT I/O, PATH TAKEN OFFLINE
01:14:13.25 IOS071I AC13,A0,*MASTER*, START PENDING (1ACEB) alias
01:14:19.28 IOS071I AC13,A0,*MASTER*, START PENDING (1ACEB) alias
01:14:25.37 IOS071I AC13,A0,*MASTER*, START PENDING (1ACEB) alias
            ** SPID Disband/Establish for 1ACEB **
01:14:41.47 IOS071I AC13,A0,*MASTER*, START PENDING (1ACDC) alias
01:14:47.48 IOS071I AC13,A0,*MASTER*, START PENDING (1ACDC) alias
01:14:53.52 IOS071I AC13,A0,*MASTER*, START PENDING (1ACDC) alias
            ** SPID Disband/Establish for 1ACDC **
            *** application I/O resumed ***
.
The 'error on PAV alias case' is same as the case1) simple case.
In the 'error on PAV base case', IOS450E was issued after the DPSVAL
for the PAV base device. But DPSVAL processing continued on two alias
devices. For the alias devices, IOS450E is not issued but GTF shows
SPID Disband/Establish was issued after the 3rd MIH each.
Because GTF trace was stopped at 01:14:49, it is not certain when the
application I/O resumed. At least, it had not resumed before the GTF
stop; presumably it resumed after the DPSVAL for the two aliases.
.
.
.
In all, the application I/O halted until the DPSVAL for PAV base and
alias finished and SPID Disband/Establish was issued.
Modules IOSRSLH and IOSRMIHR, which call the DPS validation module
IOSRDPSV, contain the comment 'if this device is in HPAV mode, we set
DPSPDEVO to tell IOSRDPSV to process just this device'.
In this IFCC case, the caller of IOSRDPSV would be IOSRSLH. In the
dump, there is residual DPSP for IOSRSLH. It shows DPSPDEVO on.
.
01F661F0 C4D7E2D7 024C2B08 C9D6E2D9 E2D3C840 | DPSP.<..IOSRSLH |
01F66200 80000000 00008000 00000000 00000000 | ................ |
* DPSPDEVO on
.
So, why is DPSVAL done for both the PAV base and the aliases?
-SAKAI, M M -5752SC1C3 -L30X/-------P3S3-15/08/18-19:40 -AT
Hello, IOS support.
Env: z/OS V2R1
Problem:
In HyperPAV environment, DPS validation is executed for both
PAV base and alias(es) after H/W error. If the device/path is
unresponsive and SNID for DPSVAL gets MIH condition, the application
I/O does not resume until these DPSVAL for base and alias finish.
See above pages for our test scenario and docs.
.
Please take a look and advise on this matter.
Thanks and regards,
Sakai
-SAKAI, M M -5752SC1C3 -L30X/TSCTQ -P3S3-15/08/18-19:41 -CR
S5> SERVICE GIVEN= 19 SG/19/
S6> SERVICE GIVEN= 19 SG/19/
+PROG.OPERATOR5 -5752SC1C3 -L108/-------P3S3-15/08/18-19:46 -AL
=BARB EIKOV BONANNO -5752SC1C3 -L36D/IOSHCD-P3S3-15/08/18-23:10 -CT
+PROG.OPERATOR5 -5752SC1C3 -L108/-------P3S3-15/08/18-23:16 -AL
=BARB EIKOV BONANNO -5752SC1C3 -L108/-------P3S3-15/08/19-03:06 -AT
Action taken: Noting update.
Action plan: Reviewing documentation.
-VERNON, CRAIG L2 -5752SC1C3 -L36D/IOSHCD-P3S3-15/08/19-13:20 -CT
-VERNON, CRAIG L2 -5752SC1C3 -L36D/-------P3S3-15/08/19-13:21 -AL
-VERNON, CRAIG L2 -5752SC1C3 -L36D/-------P3S3-15/08/19-14:02 -AL
-VERNON, CRAIG L2 -5752SC1C3 -L36D/IOSHCD-P3S3-15/08/19-14:02 -CR
S5> SERVICE GIVEN= 99 SG/99/
S6> SERVICE GIVEN= 99 SG/99/
Is the question: is this WAD?
=BARB EIKOV BONANNO -5752SC1C3 -L108/-------P3S3-15/08/20-05:19 -AL
=BARB EIKOV BONANNO -5752SC1C3 -L36D/IOSHCD-P3S3-15/08/20-05:19 -CR
S5> SERVICE GIVEN= 99 SG/99/
S6> SERVICE GIVEN= 99 SG/99/
Action taken: We have reviewed the GTF traces provided, and will be
further discussing with our colleagues.
Action plan: Discussing with colleagues.
=BARB EIKOV BONANNO -5752SC1C3 -L36D/IOSHCD-P3S3-15/08/25-21:45 -CR
S5> SERVICE GIVEN= 99 SG/99/
S6> SERVICE GIVEN= 99 SG/99/
Hello Sakai,

We are still in the process of discussing the design of DPSVAL related
to Hyperpav devices. However, we wanted to mention that there is a new
function APAR which will improve the timeouts during DPSVAL. This will
improve the overall time that the devices are held at the high UCB
level, blocking out user level I/O. The new function APAR is OA45514.
We can certainly provide a ++APAR if you would be interested in
performing the same type of test to see the improvements that are being
made. If you are interested in the ++APAR, please let us know and we
can either put it on the FTP site or if you provide a userid/node we
can transmit the ++APAR to you.
Regards,
Barbara

Action taken: Providing additional information regarding a new
function APAR which will make some improvements in this area
Action plan: Continue discussions with colleagues.
-SAKAI, M M -5752SC1C3 -L30X/-------P3S3-15/08/26-10:28 -AT
Hello, Barbara.
Okay, we'd like to test the ++APAR of OA45514.
Please place the fix into testcase or EcuRep fromibm site.
Thank you,
Sakai
-SAKAI, M M -5752SC1C3 -L30X/TSCTQ -P3S3-15/08/26-10:30 -CR
S5> SERVICE GIVEN= 99 SG/99/
S6> SERVICE GIVEN= 99 SG/99/
+PROG.OPERATOR5 -5752SC1C3 -L108/-------P3S3-15/08/26-10:31 -AL
=BARB EIKOV BONANNO -5752SC1C3 -L36D/IOSHCD-P3S3-15/08/26-21:36 -CT
+PROG.OPERATOR5 -5752SC1C3 -L108/-------P3S3-15/08/26-21:46 -AL
=BARB EIKOV BONANNO -5752SC1C3 -L36D/IOSHCD-P3S3-15/08/26-21:48 -CR
S5> SERVICE GIVEN= 99 SG/99/
S6> SERVICE GIVEN= 99 SG/99/
Hello Sakai,

No problem. We have FTP'd the ++APARs to testcase.boulder.ibm.com,
directory fromibm/mvs. The following files have been sent:
ha45514.hbb7780
ia45514.hbb7790
ka45514.hbb77a0

Regards,
Barbara

Action taken: FTP'd the ++APARs to the FTP site.
Action plan: Continue discussions with colleagues.
+BARB EIKOV BONANNO -5752SC1C3 -L16D/-------P3S3-15/08/26-23:33 -AT
Hello Sakai,

Upon further review of the ++APAR, we do not believe that the ++APAR
will help reduce the recovery time for this case. We will continue
investigating to determine why the DPSVAL is being driven to the
aliases after the path has already been removed from the base.

Regards,
Barbara

Action taken: Providing additional information.
Action plan: Continue investigations.
-SAKAI, M M -5752SC1C3 -L30X/-------P3S3-15/08/27-15:52 -AT
Update noted, thanks Barbara.
+ECUREP PMRUPDATE R27 P-5752SC1C3 -L203/-------P3S3-15/09/01-18:16 -AT
Material received from FTP Server and stored in ECuRep:
Directory: /ecurep/pmr/6/3/63359,693,760/2015-09-01
File: 63359.693.760.D0901.gtf.trs 17619968 bytes
File: 63359.693.760.D0901.syslog.trs 4096 bytes
+ECUREP PMRUPDATE R27 P-5752SC1C3 -L36D/IOSHCD-P3S3-15/09/01-18:16 SCG
+ECUREP PMRUPDATE R27 P-5752SC1C3 -L36D/IOSHCD-P3S3-15/09/01-18:16 SAT
+ECUREP PMRUPDATE R27 P-5752SC1C3 -L203/-------P3S3-15/09/01-18:16 -AT
Material received from FTP Server and stored in ECuRep:
Directory: /ecurep/pmr/6/3/63359,693,760/2015-09-01
File: 63359.693.760.D0901.slipdump.trs 61818880 bytes
+ECUREP PMRUPDATE R27 P-5752SC1C3 -L203/-------P3S3-15/09/01-18:16 -AT
No call generated.
A Call exists already on Queue: IOSHCD,36D
+ECUREP PMRUPDATE R27 P-5752SC1C3 -L203/-------P3S3-15/09/01-18:20 -AT
Untersed data now available on MCEVS1-System :
/ecurep/pmr/6/3/63359,693,760/2015-09-01/63359.693.760.D0901.syslog.trs
->ONTOP.GS693.P63359.C760.D0901.SYSLOG
+ECUREP PMRUPDATE R27 P-5752SC1C3 -L203/-------P3S3-15/09/01-18:20 -AT
Untersed data now available on MCEVS1-System :
/ecurep/pmr/6/3/63359,693,760/2015-09-01/63359.693.760.D0901.gtf.trs
->ONTOP.GS693.P63359.C760.D0901.GTF
+ECUREP PMRUPDATE R27 P-5752SC1C3 -L203/-------P3S3-15/09/01-18:21 -AT
Untersed data now available on MCEVS1-System :
/ecurep/pmr/6/3/63359,693,760/2015-09-01/63359.693.760.D0901.slipdump.trs
->ONTOP.GS693.P63359.C760.D0901.SLIPDUMP
-SAKAI, M M -5752SC1C3 -L30X/-------P3S3-15/09/01-19:14 -AT
Hello,
First,
Customer (actually, internal customer) applied ++APAR OA45514.
The test result with the ++APAR shows that the MIH timeout value for
DPSVAL SNID gets shorter ( 5 seconds * 3 ? ), but the behavior - do
DPSVAL against all PAV base/alias - seems to be same. Sorry, we did not
collect docs with this ++APAR on. Anyway, this ++APAR was restored.
.
Next,
Because I suspected IOSRDPSV proc DPSVALID iterates 'CALL DPSVPROC'
for each PAV base/alias, I asked customer to take the following IF
SLIP dump.
SLIP SET,IF,ID=DPSV,N=(IOSRDPSV,B28),DATA=(6R?+143,NE,00),
A=SYNCSVCD,JL=(IOSAS),DSPNAME=('IOSAS'.*),E
R6 +140 is LoopCounter. This SLIP was intended to take a dump when
DPSVPROC is called with LoopCounter > 0, i.e., 'GoTo
RecursiveProcessing' is done to process 'next' base/alias device.
This SLIP dump was taken with the following syslog msgs;
(note that (xxxxx) at end of IOS071I is the actual device number from
GTF CCW trace.)
.
14:09:37.53 IOS051I INTERFACE TIMEOUT DETECTED ON AA00,A8,E7,**02,PCHID
14:09:37.53 IOS1051I INTERFACE TIMEOUT DETECTED ON 1AAF5,A8,E7,**02,PCH
14:09:37.53 IOS1051I INTERFACE TIMEOUT DETECTED ON 1AAFF,A8,E7,**02,PCH
14:09:40.63 IOS1051I INTERFACE TIMEOUT DETECTED ON 1AAF7,A8,E7,**02,PCH
14:09:53.51 IOS071I AA00,A0,*MASTER*, START PENDING ( AA00)
14:09:59.51 IOS071I AA00,A0,*MASTER*, START PENDING ( AA00)
14:10:05.52 IOS071I AA00,A0,*MASTER*, START PENDING ( AA00)
14:10:05.53 *IOS450E AA00,A0, PERMANENT I/O, PATH TAKEN OFFLINE
.
14:10:05.54 IEA045I AN SVC DUMP HAS STARTED AT TIME=14.10.05 DATE=09/01
14:10:05.54 IEA992I SLIP TRAP ID=DPSV MATCHED. JOBNAME=*MASTER*, ASID=
.
14:10:21.53 IOS071I AA00,A0,*MASTER*, START PENDING (1AAFA)
14:10:27.53 IOS071I AA00,A0,*MASTER*, START PENDING (1AAFA)
14:10:33.53 IOS071I AA00,A0,*MASTER*, START PENDING (1AAFA)
.
14:10:49.53 IOS071I AA00,A0,*MASTER*, START PENDING (1AAF7)
14:10:55.54 IOS071I AA00,A0,*MASTER*, START PENDING (1AAF7)
14:11:01.53 IOS071I AA00,A0,*MASTER*, START PENDING (1AAF7)
.
14:11:17.59 IOS071I AA00,A0,*MASTER*, START PENDING (1AAF5)
14:11:23.60 IOS071I AA00,A0,*MASTER*, START PENDING (1AAF5)
14:11:29.60 IOS071I AA00,A0,*MASTER*, START PENDING (1AAF5)
.
14:11:45.60 IOS071I AA00,A0,*MASTER*, START PENDING (1AAFF)
14:11:51.60 IOS071I AA00,A0,*MASTER*, START PENDING (1AAFF)
14:11:57.60 IOS071I AA00,A0,*MASTER*, START PENDING (1AAFF)
.
In this case, DPSV ran against the base AA00 first, then DPSV for 4
aliases (1AAFA/1AAF7/1AAF5/1AAFF) followed. IP LISTU AA00 indicates
the base device AA00 had these 4 aliases.
.
I'm not sure if this dump is helpful. Could you take a look at these
additional docs if interested ?
ONTOP.GS693.P63359.C760.D0901.SYSLOG
ONTOP.GS693.P63359.C760.D0901.GTF
ONTOP.GS693.P63359.C760.D0901.SLIPDUMP
.
Thanks and regards,
Sakai
-SAKAI, M M -5752SC1C3 -L30X/TSCTQ -P3S3-15/09/01-19:17 -CR
S5> SERVICE GIVEN= 99 SG/99/
S6> SERVICE GIVEN= 99 SG/99/
... requeue the primary as well ...
+PROG.OPERATOR5 -5752SC1C3 -L108/-------P3S3-15/09/01-19:31 -AL
=BARB EIKOV BONANNO -5752SC1C3 -L36D/IOSHCD-P3S3-15/09/01-21:08 S2D
=BARB EIKOV BONANNO -5752SC1C3 -L36D/IOSHCD-P3S3-15/09/01-21:08 -CT
+PROG.OPERATOR5 -5752SC1C3 -L108/-------P3S3-15/09/01-21:16 -AL
=BARB EIKOV BONANNO -5752SC1C3 -L108/-------P3S3-15/09/01-23:21 -AT
Hi Sakai,

Thank you very much for the feedback with the ++APAR. We agree that
there would have been no changes with the I/Os being redriven to the
alias devices when this ++APAR is applied.

Thank you very much for the additional documentation. We will continue
investigating.

Regards,
Barbara

Action taken: Acknowledging update.
Action plan: Reviewing dump/trace.
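The serial base-then-alias sequence in the 09/01 syslog excerpt can be
tallied per device with a short script. This is a minimal sketch under
stated assumptions: the helper names are hypothetical, the device in
parentheses is the GTF-derived annotation from the excerpt, and each
device's phase is taken to end at its last IOS071I message. Only the
first interface timeout and the IOS071I lines are included.

```python
import re
from datetime import datetime

# First interface timeout plus the IOS071I lines from the excerpt above.
SYSLOG = """\
14:09:37.53 IOS051I INTERFACE TIMEOUT DETECTED ON AA00,A8,E7,**02,PCHID
14:09:53.51 IOS071I AA00,A0,*MASTER*, START PENDING ( AA00)
14:09:59.51 IOS071I AA00,A0,*MASTER*, START PENDING ( AA00)
14:10:05.52 IOS071I AA00,A0,*MASTER*, START PENDING ( AA00)
14:10:21.53 IOS071I AA00,A0,*MASTER*, START PENDING (1AAFA)
14:10:27.53 IOS071I AA00,A0,*MASTER*, START PENDING (1AAFA)
14:10:33.53 IOS071I AA00,A0,*MASTER*, START PENDING (1AAFA)
14:10:49.53 IOS071I AA00,A0,*MASTER*, START PENDING (1AAF7)
14:10:55.54 IOS071I AA00,A0,*MASTER*, START PENDING (1AAF7)
14:11:01.53 IOS071I AA00,A0,*MASTER*, START PENDING (1AAF7)
14:11:17.59 IOS071I AA00,A0,*MASTER*, START PENDING (1AAF5)
14:11:23.60 IOS071I AA00,A0,*MASTER*, START PENDING (1AAF5)
14:11:29.60 IOS071I AA00,A0,*MASTER*, START PENDING (1AAF5)
14:11:45.60 IOS071I AA00,A0,*MASTER*, START PENDING (1AAFF)
14:11:51.60 IOS071I AA00,A0,*MASTER*, START PENDING (1AAFF)
14:11:57.60 IOS071I AA00,A0,*MASTER*, START PENDING (1AAFF)
"""

# Captures the annotated device number at the end of an IOS071I line.
PENDING = re.compile(r"START PENDING \(\s*([0-9A-F]+)\)")


def to_time(stamp):
    """Parse an hh:mm:ss.ff syslog timestamp (same day assumed)."""
    return datetime.strptime(stamp, "%H:%M:%S.%f")


def phases(syslog):
    """Return (device, seconds) pairs in the order the devices appear."""
    start = None
    last_seen = {}
    for line in syslog.splitlines():
        if not line.strip():
            continue
        stamp = to_time(line.split()[0])
        if start is None:
            start = stamp  # first line is the triggering timeout
        match = PENDING.search(line)
        if match:
            last_seen[match.group(1)] = stamp
    result, previous = [], start
    for device, end in last_seen.items():
        result.append((device, round((end - previous).total_seconds(), 2)))
        previous = end
    return result


if __name__ == "__main__":
    for device, seconds in phases(SYSLOG):
        print(f"{device:>5}  {seconds:6.2f}s")
    print("total:", round(sum(s for _, s in phases(SYSLOG)), 2), "seconds")
```

Each of the five devices accounts for about 28 seconds, and the span
from the first interface timeout to the last IOS071I is 140.07 seconds.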
=BARB EIKOV BONANNO -5752SC1C3 -L36D/IOSHCD-P3S3-15/09/01-23:46 -CR
=BARB EIKOV BONANNO -5752SC1C3 -L36D/IOSHCD-P3S3-15/09/01-23:46 -AT
S5> SERVICE GIVEN= 99 SG/99/
S6> SERVICE GIVEN= 99 SG/99/
=BARB EIKOV BONANNO -5752SC1C3 -L36D/IOSHCD-P4S3-15/09/24-22:33 S2D
=BARB EIKOV BONANNO -5752SC1C3 -L36D/IOSHCD-P3S3-15/09/25-01:54 -CR
S5> SERVICE GIVEN= 99 SG/99/
S6> SERVICE GIVEN= 99 SG/99/
Action taken: Review of the dump shows that the LPM for the base shows
only 2 paths while the LPM for the aliases has all 3 paths. After DPSV
for the base completes, we then go on to do DPSV for the bound aliases.
We have gone through and reviewed the code in IOSRDPSV, and believe
that this is the expected behavior.
Action plan: We are in the process of discussing this with our
colleagues.
=BARB EIKOV BONANNO -5752SC1C3 -L36D/IOSHCD-P3S3-15/10/03-03:10 -CR
S5> SERVICE GIVEN= 99 SG/99/
S6> SERVICE GIVEN= 99 SG/99/
Hello Sakai,

We apologize for the delay. We have been in discussions with our team
regarding the design when DPS validation is performed. As we've seen
here, in a Hyperpav environment, DPS validation occurs serially. First
DPSV occurs on the base, followed by each of the alias devices which
are bound. We certainly agree that there could be some improvements
here, in order to reduce the recovery time when there is unresponsive
hardware involved. Our recommendation would be to open a requirement to
request a design change. In the meantime, the fix for OA45514 could
be used in order to cut down the MIH timeouts to reduce some of the
overall recovery time.

Regards,
Barbara

Action taken: Continued discussing design with colleagues.
Action plan: Recommending opening a requirement.
-SAKAI, MASAKI -5752SC1C3 -L30X/-------P3S3-15/10/05-17:27 -AT
Update noted. Thanks Barbara.
-SAKAI, MASAKI -5752SC1C3 -L30X/TSCTQ -P4S3-15/10/20-11:24 S2D