udebug - Reports Ubik process status for a database server process
B<udebug> B<-servers> <I<server machine>> [B<-port> <I<IP port>>]
B<udebug> B<-s> <I<server machine>> [B<-p> <I<IP port>>] [B<-l>] [B<-h>]
The B<udebug> command displays the status of the lightweight Ubik process
for the database server process identified by the B<-port> argument that
is running on the database server machine named by the B<-servers>
argument. The output identifies the machines where peer database server
processes are running, which of them is the synchronization site (Ubik
coordinator), and the status of the connections between them.
=item B<-servers> <I<server machine>>
Names the database server machine that is running the process for which to
display status information. Provide the machine's IP address in dotted
decimal format, its fully qualified host name (for example,
B<fs1.abc.com>), or the shortest abbreviated form of its host name that
distinguishes it from other machines. Successful use of an abbreviated
form depends on the availability of a name resolution service (such as the
Domain Name Service or a local host table) at the time the command is
=item B<-port> <I<IP port>>
Identifies the database server process for which to display status
information, either by its process name or port number. Provide one of the
=item B<buserver> or 7021 for the Backup Server
=item B<kaserver> or 7004 for the Authentication Server
=item B<ptserver> or 7002 for the Protection Server
=item B<vlserver> or 7003 for the Volume Location Server
Reports additional information about each peer of the machine named by the
B<-servers> argument. The information appears by default if that machine
is the synchronization site.
Prints the online help for this command. All other valid options are
Several of the messages in the output provide basic status information
about the Ubik process on the machine specified by the B<-servers>
argument, and the remaining messages are useful mostly for debugging
To check basic Ubik status, issue the command for each database server
machine in turn. In the output for each, one of the following messages
appears in the top third of the output.
   I am sync site . . . (<#_sites> servers)
For the synchronization site, the following message indicates that all
sites have the same version of the database, which implies that Ubik is
functioning correctly. See the following for a description of values other
For correct Ubik operation, the database server machine clocks must agree
on the time. The following messages, which are the second and third lines
in the output, report the current date and time according to the database
server machine's clock and the clock on the machine where the B<udebug>
   Host's <IP_addr> time is <dbserver_date/time>
   Local time is <local_date/time> (time differential <skew> secs)
The I<skew> is the local clock's time minus the database server machine's
time, so a negative value means that the local clock is behind. Its
absolute value is not vital for Ubik functioning, but a difference of more
than a few seconds between the I<skew> values for the database server
machines indicates that their clocks are not synchronized and that Ubik
performance might be hampered.
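Because every I<skew> is measured against the same local clock, the local clock's own error cancels when two servers' values are subtracted: apart from any drift of the local clock between the two B<udebug> commands, the difference between two I<skew> values equals the offset between those two servers' clocks. The following Python sketch applies that reasoning. The names C<parse_skew> and C<skews_out_of_sync>, the host names other than C<afs1>, and the five-second C<tolerance> (standing in for "a few seconds") are hypothetical and not part of AFS.

```python
import re

# Matches the "time differential" clause of udebug's "Local time is" line.
_DIFF_RE = re.compile(r"time differential (-?\d+) secs")


def parse_skew(line: str) -> int:
    """Return the skew in seconds from a udebug "Local time is ..." line."""
    match = _DIFF_RE.search(line)
    if match is None:
        raise ValueError(f"no time differential in: {line!r}")
    return int(match.group(1))


def skews_out_of_sync(skews: dict, tolerance: int = 5) -> bool:
    """Report whether the database server clocks appear to disagree.

    skews maps each database server machine to the skew udebug reported
    for it. Each skew is (local clock - server clock), so the local
    clock's error cancels in max - min, leaving the spread between the
    server clocks themselves. tolerance is a hypothetical threshold.
    """
    values = list(skews.values())
    if len(values) < 2:
        return False
    return max(values) - min(values) > tolerance


if __name__ == "__main__":
    line = "Local time is Wed Oct 27 09:50:08 1999 (time differential -247 secs)"
    print(parse_skew(line))                                       # -247
    print(skews_out_of_sync({"afs1": 2, "afs2": 3, "afs3": 2}))   # False
    print(skews_out_of_sync({"afs1": 2, "afs2": 3, "afs3": 40}))  # True
```

A large I<skew> that is the same for every server (as in the second example below, where the local clock alone is about 4 minutes behind) does not trip this check, which matches the point that only the differences between I<skew> values matter.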
Following is a description of all messages in the output. As noted, most
of them are useful mainly for debugging and are most meaningful to someone
who understands Ubik's implementation.
The output begins with the following messages. The first message reports
the IP addresses that are configured with the operating system on the
machine specified by the B<-servers> argument. As previously noted, the
second and third messages report the current date and time according to
the clocks on the database server machine and the machine where the
B<udebug> command is issued, respectively. All subsequent timestamps in
the output are expressed in terms of the local clock rather than the
database server machine clock.
   Host's addresses are: <list_of_IP_addrs>
   Host's <IP_addr> time is <dbserver_date/time>
   Local time is <local_date/time> (time differential <skew> secs)
If the I<skew> is more than about 10 seconds, the following message
appears. As noted, it does not necessarily indicate Ubik malfunction: it
denotes clock skew between the database server machine and the local
machine, rather than among the database server machines.
If the B<udebug> command is issued during the coordinator election process
and voting has not yet begun, the following message appears next.
   Last yes vote not cast yet
Otherwise, the output continues with the following messages.
   Last yes vote for <sync_IP_addr> was <last_vote> secs ago (sync site);
   Last vote started <vote_start> secs ago (at <date/time>)
   Local db version is <db_version>
The first message indicates which peer this Ubik process last voted for as
coordinator (it can vote for itself) and how long ago it sent the vote.
The second message indicates how long ago the Ubik coordinator requested
confirming votes from the secondary sites. Usually, the I<last_vote> and
I<vote_start> values are the same; a difference between them can indicate
clock skew or a slow network connection between the two database server
machines. A small difference is not harmful. The third message reports the
current version number I<db_version> of the database maintained by this
Ubik process. It has two fields separated by a period. The field before
the period is based on a timestamp that reflects when the database first
changed after the most recent coordinator election, and the field after
the period indicates the number of changes since the election.
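Because the two fields are separate integers, it seems natural to order two versions by the timestamp field first and the change count second. Treating the whole value as a decimal fraction gives wrong answers: C<940902602.674> would compare as newer than C<940902602.1000>, even though the change count 1000 is higher than 674. The following is a minimal Python sketch under that reading of the fields; C<parse_db_version> and C<newer_version> are hypothetical helper names.

```python
from typing import Tuple


def parse_db_version(text: str) -> Tuple[int, int]:
    """Split a Ubik db version such as "940902602.674" into its two fields.

    Returns (timestamp_field, change_count) as integers.
    """
    timestamp_field, change_count = text.strip().split(".")
    return int(timestamp_field), int(change_count)


def newer_version(a: str, b: str) -> str:
    """Return whichever version is higher, comparing the timestamp field
    first and the change count second (Python tuple comparison)."""
    return a if parse_db_version(a) >= parse_db_version(b) else b


if __name__ == "__main__":
    print(parse_db_version("940902602.674"))                 # (940902602, 674)
    print(newer_version("940902602.674", "940902602.1000"))  # 940902602.1000
    # A float comparison gets this pair backwards:
    print(float("940902602.674") > float("940902602.1000"))  # True
```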
The output continues with messages that differ depending on whether the
Ubik process is the coordinator or not.
If there is only one database server machine, it is always the coordinator
(synchronization site), as indicated by the following message.
   I am sync site forever (1 server)
If there are multiple database sites, and the B<-servers> argument names
the coordinator (synchronization site), the output continues with the
following two messages.
   I am sync site until <expiration> secs from now (at <date/time>)
   Recovery state <flags>
The first message (which is reported on one line) reports how much longer
the site remains coordinator even if the next attempt to maintain quorum
fails, and how many sites are participating in the quorum. The I<flags>
field in the second message is a hexadecimal number that indicates the
current state of the quorum. A value of C<1f> indicates complete database
synchronization, whereas a value of C<f> means that the coordinator has
the correct database but cannot contact all secondary sites to determine
if they also have it. Lesser values are acceptable if the B<udebug>
command is issued during coordinator election, but they denote a problem
if they persist. The individual flags have the following meanings:
This machine is the coordinator.
The coordinator has determined which site has the database with the
highest version number.
The coordinator has a copy of the database with the highest version
The database's version number has been updated correctly.
All sites have the database with the highest version number.
If the B<udebug> command is issued while the coordinator is writing a change
into the database, the following additional message appears.
   I am currently managing write transaction <identifier>
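The I<flags> value in the C<Recovery state> message can be decoded by testing individual bits. This Python sketch assumes that the five flags described above occupy the bits C<0x01> through C<0x10> in the order listed, which is consistent with C<1f> meaning complete synchronization and C<f> meaning every condition except the last one (all sites have the highest-version database). C<decode_recovery_state> and C<missing_flags> are hypothetical names.

```python
# Assumed bit assignments, in the order the flags are listed above.
RECOVERY_FLAGS = [
    (0x01, "this machine is the coordinator"),
    (0x02, "coordinator has found the site with the highest db version"),
    (0x04, "coordinator has a copy of the highest-version database"),
    (0x08, "database version number has been updated correctly"),
    (0x10, "all sites have the highest-version database"),
]


def decode_recovery_state(flags_hex: str) -> list:
    """Return the descriptions of the flags set in a hex value like "1f"."""
    value = int(flags_hex, 16)
    return [text for bit, text in RECOVERY_FLAGS if value & bit]


def missing_flags(flags_hex: str) -> list:
    """Return the descriptions of the flags that are NOT set."""
    value = int(flags_hex, 16)
    return [text for bit, text in RECOVERY_FLAGS if not value & bit]


if __name__ == "__main__":
    print(missing_flags("1f"))  # []
    print(missing_flags("f"))   # ['all sites have the highest-version database']
```

Listing the missing conditions, rather than the set ones, points directly at what a persistent value below C<1f> is waiting for.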
If the B<-servers> argument names a secondary site, the output continues
with the following messages.
   Lowest host <lowest_IP_addr> was set <low_time> secs ago
   Sync host <sync_IP_addr> was set <sync_time> secs ago
The I<lowest_IP_addr> is the lowest IP address of any peer from which the
Ubik process has received a message recently, whereas the I<sync_IP_addr>
is the IP address of the current coordinator. If they differ, the machine
with the lowest IP address is not currently the coordinator. The Ubik
process continues voting for the current coordinator as long as they
remain in contact, which provides for maximum stability. However, in the
event of another coordinator election, this Ubik process votes for the
I<lowest_IP_addr> site instead (assuming they are in contact), because it
has a bias to vote in elections for the site with the lowest IP address.
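The bias toward the lowest IP address depends on numeric address order, not on how the addresses sort as text: as strings, C<192.12.107.100> sorts before C<192.12.107.33>. This Python sketch models only that preference, using the standard C<ipaddress> module; it ignores quorum rules, vote expiration, and every other part of the Ubik election. C<lowest_peer> and C<would_switch> are hypothetical names.

```python
import ipaddress
from typing import Iterable


def lowest_peer(peers_in_contact: Iterable[str]) -> str:
    """Return the peer with the numerically lowest IPv4 address."""
    return min(peers_in_contact, key=ipaddress.IPv4Address)


def would_switch(lowest_ip: str, sync_ip: str) -> bool:
    """True if the "Lowest host" and "Sync host" lines name different
    machines, meaning (in this simplified model, and assuming the two stay
    in contact) this Ubik process would vote for a different site in the
    next election."""
    return ipaddress.IPv4Address(lowest_ip) != ipaddress.IPv4Address(sync_ip)


if __name__ == "__main__":
    peers = ["192.12.107.35", "192.12.107.33", "192.12.107.100"]
    print(lowest_peer(peers))  # 192.12.107.33
    print(min(peers))          # 192.12.107.100 (string order, wrong here)
    print(would_switch("192.12.107.33", "192.12.107.33"))  # False
```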
For both the synchronization and secondary sites, the output continues
with the following messages. The first message reports the version number
of the database at the synchronization site, which needs to match the
I<db_version> reported by the preceding C<Local db version> message. The
second message indicates how many database pages are currently locked for
any operation, and how many of those are locked for writing. The values are
nonzero if the B<udebug> command is issued while an operation is in
progress.
   Sync site's db version is <db_version>
   <locked> locked pages, <writes> of them for write
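A script that monitors several servers might extract these values together with the earlier C<Local db version> message and compare them. The Python sketch below assumes the message wording shown in this section; C<check_db_sync> is a hypothetical name.

```python
import re
from typing import Optional


def check_db_sync(output: str) -> dict:
    """Summarize the db-version and lock lines of one udebug output."""
    local = re.search(r"Local db version is (\S+)", output)
    sync = re.search(r"Sync site's db version is (\S+)", output)
    locks = re.search(r"(\d+) locked pages, (\d+) of them for write", output)
    locked: Optional[int] = int(locks.group(1)) if locks else None
    write_locked: Optional[int] = int(locks.group(2)) if locks else None
    return {
        # Both versions present and identical: this site is current.
        "in_sync": bool(local and sync and local.group(1) == sync.group(1)),
        "locked_pages": locked,
        "write_locked_pages": write_locked,
    }


if __name__ == "__main__":
    sample = (
        "Local db version is 940906574.25\n"
        "Sync site's db version is 940906574.25\n"
        "0 locked pages, 0 of them for write\n"
    )
    print(check_db_sync(sample))
    # {'in_sync': True, 'locked_pages': 0, 'write_locked_pages': 0}
```

Nonzero lock counts are expected while an operation is in progress, so a monitor built this way would more sensibly alert on counts that stay nonzero across several runs than on a single sample.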
The following messages appear next only if there are any read or write
locks on database records:
   There are read locks held
   There are write locks held
Similarly, one or more of the following messages appear next only if there
are any read or write transactions in progress when the B<udebug> command
   There is an active write transaction
   There is at least one active read transaction
   Transaction tid is <tid>
If the machine named by the B<-servers> argument is the coordinator, the
next message reports when the current coordinator last updated the
   Last time a new db version was labelled was:
            <last_restart> secs ago (at <date/time>)
If the machine named by the B<-servers> argument is the coordinator, the
output concludes with an entry for each secondary site that is
participating in the quorum, in the following format.
   Server (<IP_address>): (db <db_version>)
       last vote rcvd <last_vote> secs ago (at <date/time>),
       last beacon sent <last_beacon> secs ago (at <date/time>),
       last vote was { yes | no }
       dbcurrent={ 0 | 1 }, up={ 0 | 1 } beaconSince={ 0 | 1 }
The first line reports the site's IP address and the version number of the
database it is maintaining. The I<last_vote> field reports how long ago the
coordinator received a vote message from the Ubik process at the site, and
the I<last_beacon> field reports how long ago the coordinator last requested
a vote message. If the B<udebug> command is issued during the coordinator
election process and voting has not yet begun, the following messages
   Last beacon never sent
On the final line of each entry, the fields have the following meaning:
C<dbcurrent> is C<1> if the site has the database with the highest version
number, C<0> if it does not.
C<up> is C<1> if the Ubik process at the site is functioning correctly,
C<beaconSince> is C<1> if the site has responded to the coordinator's last
request for votes, C<0> if it has not.
Including the B<-long> flag produces peer entries even when the
B<-servers> argument names a secondary site, but in that case only the
I<IP_address> field is guaranteed to be accurate. For example, the value
in the I<db_version> field is usually C<0.0>, because secondary sites do
not poll their peers for this information. The values in the I<last_vote>
and I<last_beacon> fields indicate when this site last received or
requested a vote as coordinator; they generally indicate the time of the
last coordinator election.
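When B<-servers> names the coordinator, the final line of each peer entry can be checked mechanically. As the preceding paragraph explains, the same check is not meaningful for peer entries produced at a secondary site. This Python sketch uses the hypothetical names C<parse_peer_flags> and C<peer_is_healthy>, and it treats a peer as healthy only when all three fields are C<1>, which is a simplifying assumption.

```python
import re

# Matches the key=value pairs on the last line of each peer entry.
_FLAG_RE = re.compile(r"\b(dbcurrent|up|beaconSince)=([01])\b")


def parse_peer_flags(line: str) -> dict:
    """Map each field on a "dbcurrent=..., up=... beaconSince=..." line to a bool."""
    return {name: value == "1" for name, value in _FLAG_RE.findall(line)}


def peer_is_healthy(line: str) -> bool:
    """Simplifying assumption: healthy only if all three fields are 1."""
    flags = parse_peer_flags(line)
    return all(flags.get(name, False) for name in ("dbcurrent", "up", "beaconSince"))


if __name__ == "__main__":
    print(peer_is_healthy("dbcurrent=1, up=1 beaconSince=1"))  # True
    print(peer_is_healthy("dbcurrent=0, up=1 beaconSince=1"))  # False
```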
This example checks the status of the Ubik process for the Volume Location
Server on the machine C<afs1>, which is the synchronization site.
   % udebug afs1 vlserver
   Host's addresses are: 192.12.107.33
   Host's 192.12.107.33 time is Wed Oct 27 09:49:50 1999
   Local time is Wed Oct 27 09:49:52 1999 (time differential 2 secs)
   Last yes vote for 192.12.107.33 was 1 secs ago (sync site);
   Last vote started 1 secs ago (at Wed Oct 27 09:49:51 1999)
   Local db version is 940902602.674
   I am sync site until 58 secs from now (at Wed Oct 27 09:50:50 1999) (3 servers)
   Sync site's db version is 940902602.674
   0 locked pages, 0 of them for write
   Last time a new db version was labelled was:
            129588 secs ago (at Mon Oct 25 21:50:04 1999)
   Server( 192.12.107.35 ): (db 940902602.674)
       last vote rcvd 2 secs ago (at Wed Oct 27 09:49:50 1999),
       last beacon sent 1 secs ago (at Wed Oct 27 09:49:51 1999), last vote was yes
       dbcurrent=1, up=1 beaconSince=1
   Server( 192.12.107.34 ): (db 940902602.674)
       last vote rcvd 2 secs ago (at Wed Oct 27 09:49:50 1999),
       last beacon sent 1 secs ago (at Wed Oct 27 09:49:51 1999), last vote was yes
       dbcurrent=1, up=1 beaconSince=1
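The sync-site line in this output carries two numbers that a monitoring script may want: how many seconds remain before the site's coordinator status lapses (C<58>) and how many servers take part (C<3>). The following Python sketch extracts both and also handles the single-server C<forever> form. C<parse_sync_line> is a hypothetical name, and the patterns assume the exact message wording shown on this page.

```python
import re
from typing import Optional, Tuple

# Greedy ".*" skips the "(at <date/time>)" part and lands on the final "(N servers)".
_UNTIL_RE = re.compile(r"I am sync site until (\d+) secs from now .*\((\d+) servers?\)")
_FOREVER_RE = re.compile(r"I am sync site forever \((\d+) servers?\)")


def parse_sync_line(line: str) -> Optional[Tuple[Optional[int], int]]:
    """Return (seconds_remaining, server_count) for a sync-site line.

    seconds_remaining is None for the single-server "forever" form.
    Returns None if the line is not a sync-site line.
    """
    line = line.strip()
    match = _FOREVER_RE.match(line)
    if match:
        return None, int(match.group(1))
    match = _UNTIL_RE.match(line)
    if match:
        return int(match.group(1)), int(match.group(2))
    return None


if __name__ == "__main__":
    line = "I am sync site until 58 secs from now (at Wed Oct 27 09:50:50 1999) (3 servers)"
    print(parse_sync_line(line))                                  # (58, 3)
    print(parse_sync_line("I am sync site forever (1 server)"))  # (None, 1)
```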
This example checks the status of the Authentication Server on the machine
with IP address 192.12.107.34, which is a secondary site. The local clock
is about 4 minutes behind the database server machine's clock.
   % udebug 192.12.107.34 7004
   Host's addresses are: 192.12.107.34
   Host's 192.12.107.34 time is Wed Oct 27 09:54:15 1999
   Local time is Wed Oct 27 09:50:08 1999 (time differential -247 secs)
   Last yes vote for 192.12.107.33 was 6 secs ago (sync site);
   Last vote started 6 secs ago (at Wed Oct 27 09:50:02 1999)
   Local db version is 940906574.25
   Lowest host 192.12.107.33 was set 6 secs ago
   Sync host 192.12.107.33 was set 6 secs ago
   Sync site's db version is 940906574.25
   0 locked pages, 0 of them for write
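The basic check described at the start of the output description (issuing the command for each database server machine in turn and finding the one that reports C<I am sync site>) can be scripted. This Python sketch runs B<udebug> with its documented B<-servers> and B<-port> arguments and reports which machines claim to be the synchronization site. The function names are hypothetical, and the sketch assumes that B<udebug> is on the C<PATH>.

```python
import subprocess
from typing import Dict, List


def run_udebug(server: str, port: str) -> str:
    """Run udebug for one server and return its standard output.

    check=True raises CalledProcessError if udebug exits with an error.
    """
    result = subprocess.run(
        ["udebug", "-servers", server, "-port", port],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def find_sync_sites(outputs: Dict[str, str]) -> List[str]:
    """Return, sorted, the servers whose udebug output says "I am sync site"."""
    return sorted(
        server
        for server, text in outputs.items()
        if any(line.strip().startswith("I am sync site") for line in text.splitlines())
    )
```

For example, C<find_sync_sites({s: run_udebug(s, "vlserver") for s in servers})>, with C<servers> listing your database server machines, should return exactly one host name in normal operation. An empty result or more than one name deserves a closer look, unless the command ran during a coordinator election.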
=head1 PRIVILEGE REQUIRED
IBM Corporation 2000. <http://www.ibm.com/> All Rights Reserved.
This documentation is covered by the IBM Public License Version 1.0. It was
converted from HTML to POD by software written by Chas Williams and Russ
Allbery, based on work by Alf Wachsmann and Elizabeth Cassell.