Discussion:
Clarification on some ZEO server log activity
evadmin
2009-12-10 16:00:47 UTC
Permalink
Background:
We have a cluster of 2 clients for a ZEO server (plone 2.5.4) that is having
some performance issues (occasional hanging of 20-40 seconds). We've done
call & page profiling and the initial data suggests that the hangups are
either random or located in a more common piece of the code than a specific
custom script or page template (say, a custom global_sections).

While I'm pretty sure our problem lies in the Plone layer I wanted to rule
out lower level stuff - especially since we've had some hardware issues
[failing RAID controller, bad NIC, etc] in the recent past.

Queston:
When I check the ZEO log it shows that one of the two clients is
disconnecting every three minutes. Is this normal behavior? The
http://wiki.zope.org/ZODB/howto.html wiki simply describes how disconnects
are handled by ZEO, but not what the "normal" occurrence of disconnects are
(ideally 0?) .

Can anyone tell me if this is something I should be looking at further in
regards to our server hangups? Or is it likely a redherring?
--
View this message in context: http://n2.nabble.com/Clarification-on-some-ZEO-server-log-activity-tp4146105p4146105.html
Sent from the General Questions mailing list archive at Nabble.com.
ajung
2009-12-10 16:37:09 UTC
Permalink
Please check your client logs for crashes or restarts. A client is usually
connected as long as it it running (or until it goes down for whatever
reason).

-aj
Post by evadmin
When I check the ZEO log it shows that one of the two clients is
disconnecting every three minutes. Is this normal behavior?
--
View this message in context: http://n2.nabble.com/Clarification-on-some-ZEO-server-log-activity-tp4146105p4146305.html
Sent from the General Questions mailing list archive at Nabble.com.
evadmin
2009-12-10 16:50:45 UTC
Permalink
Thanks for your response - There are no corresponding crashes on any client
log that match up with disconnect/reconnects.

we have two servers, each server has 2 clients.
All use the DB on the primary server.
The second server appears to be the one disconnecting, based on IP address.

would more detailed ZEO config information help? I apologize but I'm not
familiar with ZEO in this configuration (I previously used it with just 1
server that had two clients)

-------
[from zeo.log]

2009-12-10T10:05:35 INFO ZEO.StorageServer (30736/10.2.1.2:39168)
disconnected
------
2009-12-10T10:08:41 INFO ZEO.StorageServer (30736) new connection
('10.2.1.2', 57021): <ManagedServerConnection ('10.2.1.2', 57021)>
------
2009-12-10T10:08:41 INFO ZEO.StorageServer (30736/10.2.1.2:57021)
disconnected
------
2009-12-10T10:11:42 INFO ZEO.StorageServer (30736) new connection
('10.2.1.2', 58746): <ManagedServerConnection ('10.2.1.2', 58746)>
------
Post by ajung
Please check your client logs for crashes or restarts. A client is usually
connected as long as it it running (or until it goes down for whatever
reason).
-aj
Post by evadmin
When I check the ZEO log it shows that one of the two clients is
disconnecting every three minutes. Is this normal behavior?
--
View this message in context: http://n2.nabble.com/Clarification-on-some-ZEO-server-log-activity-tp4146105p4146381.html
Sent from the General Questions mailing list archive at Nabble.com.
ajung
2009-12-10 16:54:29 UTC
Permalink
Any firewall between ZEO clients and server closing the connection due to
inactivity timeouts?

Andreas
Post by evadmin
Thanks for your response - There are no corresponding crashes on any
client log that match up with disconnect/reconnects.
we have two servers, each server has 2 clients.
All use the DB on the primary server.
The second server appears to be the one disconnecting, based on IP address.
would more detailed ZEO config information help? I apologize but I'm not
familiar with ZEO in this configuration (I previously used it with just 1
server that had two clients)
-------
[from zeo.log]
2009-12-10T10:05:35 INFO ZEO.StorageServer (30736/10.2.1.2:39168)
disconnected
------
2009-12-10T10:08:41 INFO ZEO.StorageServer (30736) new connection
('10.2.1.2', 57021): <ManagedServerConnection ('10.2.1.2', 57021)>
------
2009-12-10T10:08:41 INFO ZEO.StorageServer (30736/10.2.1.2:57021)
disconnected
------
2009-12-10T10:11:42 INFO ZEO.StorageServer (30736) new connection
('10.2.1.2', 58746): <ManagedServerConnection ('10.2.1.2', 58746)>
------
Post by ajung
Please check your client logs for crashes or restarts. A client is
usually connected as long as it it running (or until it goes down for
whatever reason).
-aj
Post by evadmin
When I check the ZEO log it shows that one of the two clients is
disconnecting every three minutes. Is this normal behavior?
--
View this message in context: http://n2.nabble.com/Clarification-on-some-ZEO-server-log-activity-tp4146105p4146399.html
Sent from the General Questions mailing list archive at Nabble.com.
evadmin
2009-12-10 17:02:57 UTC
Permalink
I'm checking with our hosting engineer, will get back asap. Originally there
was none but they have moved hardware in the datacenter.

thanks again...


Any firewall between ZEO clients and server closing the connection due to
inactivity timeouts?

Andreas


Thanks for your response - There are no corresponding crashes on any client
log that match up with disconnect/reconnects.

we have two servers, each server has 2 clients.
All use the DB on the primary server.
The second server appears to be the one disconnecting, based on IP address.

would more detailed ZEO config information help? I apologize but I'm not
familiar with ZEO in this configuration (I previously used it with just 1
server that had two clients)

-------
[from zeo.log]

2009-12-10T10:05:35 INFO ZEO.StorageServer (30736/10.2.1.2:39168)
disconnected
------
2009-12-10T10:08:41 INFO ZEO.StorageServer (30736) new connection
('10.2.1.2', 57021): <ManagedServerConnection ('10.2.1.2', 57021)>
------
2009-12-10T10:08:41 INFO ZEO.StorageServer (30736/10.2.1.2:57021)
disconnected
------
2009-12-10T10:11:42 INFO ZEO.StorageServer (30736) new connection
('10.2.1.2', 58746): <ManagedServerConnection ('10.2.1.2', 58746)>
------
--
View this message in context: http://n2.nabble.com/Clarification-on-some-ZEO-server-log-activity-tp4146105p4146443.html
Sent from the General Questions mailing list archive at Nabble.com.
ajung
2009-12-10 17:36:21 UTC
Permalink
You may also try to increase the logging level (DEBUG) both on server and
client side. A client trying to reconnect should be visible within the logs.

-aj


I'm checking with our hosting engineer, will get back asap. Originally there
was none but they have moved hardware in the datacenter.

thanks again...


Any firewall between ZEO clients and server closing the connection due to
inactivity timeouts?

Andreas
Post by evadmin
Thanks for your response - There are no corresponding crashes on any
client log that match up with disconnect/reconnects.
we have two servers, each server has 2 clients.
All use the DB on the primary server.
The second server appears to be the one disconnecting, based on IP address.
would more detailed ZEO config information help? I apologize but I'm not
familiar with ZEO in this configuration (I previously used it with just 1
server that had two clients)
-------
[from zeo.log]
2009-12-10T10:05:35 INFO ZEO.StorageServer (30736/10.2.1.2:39168)
disconnected
------
2009-12-10T10:08:41 INFO ZEO.StorageServer (30736) new connection
('10.2.1.2', 57021): <ManagedServerConnection ('10.2.1.2', 57021)>
------
2009-12-10T10:08:41 INFO ZEO.StorageServer (30736/10.2.1.2:57021)
disconnected
------
2009-12-10T10:11:42 INFO ZEO.StorageServer (30736) new connection
('10.2.1.2', 58746): <ManagedServerConnection ('10.2.1.2', 58746)>
------
--
View this message in context: http://n2.nabble.com/Clarification-on-some-ZEO-server-log-activity-tp4146105p4146663.html
Sent from the General Questions mailing list archive at Nabble.com.
evadmin
2009-12-10 19:27:01 UTC
Permalink
Okay - I'll turn the debug level on tonight (we're in production so I don't
want to take it down during the day). Oh, i got word back from the
datacenter :

"I checked on the architecture and there is no firewall between E1 and E2,
only a layer 2 switch.

I checked on the TCP timeouts on both servers and they are set to the
default value of 2 hours. All of the other TCP/IP level timeouts are the
normal value, and all at least 2 hours.

I checked the Zeo config on E1 to make sure that there was not a low timeout
value set there (/opt/Plone-2.5.4/zeocluster/server/etc/zeo.conf) and the
transaction timeout is not set, so it is forever."



You may also try to increase the logging level (DEBUG) both on server and
client side. A client trying to reconnect should be visible within the logs.

-aj


I'm checking with our hosting engineer, will get back asap. Originally there
was none but they have moved hardware in the datacenter.

thanks again...
--
View this message in context: http://n2.nabble.com/Clarification-on-some-ZEO-server-log-activity-tp4146105p4147290.html
Sent from the General Questions mailing list archive at Nabble.com.
evadmin
2009-12-11 14:46:05 UTC
Permalink
Mystery solved - it turns out our nagios monitoring connects directly to the
Zope server. This is done because monitoring just the port 80 had previously
gave false 'staus ok' readings when Zope hung up (but the Apache server was
still able to serve up cached pages, i.e. the home page).

I was surprised that the ZEO log actually logs this as a 'connection', I
guess I was thinking of it in a "ZEO client" connection sense than a network
request.

Thanks for your time, Andreas.


You may also try to increase the logging level (DEBUG) both on server and
client side. A client trying to reconnect should be visible within the logs.

-aj


I'm checking with our hosting engineer, will get back asap. Originally there
was none but they have moved hardware in the datacenter.

thanks again...
Post by ajung
Any firewall between ZEO clients and server closing the connection due to
inactivity timeouts?
Andreas
Post by evadmin
Thanks for your response - There are no corresponding crashes on any
client log that match up with disconnect/reconnects.
we have two servers, each server has 2 clients.
All use the DB on the primary server.
The second server appears to be the one disconnecting, based on IP address.
would more detailed ZEO config information help? I apologize but I'm not
familiar with ZEO in this configuration (I previously used it with just
1 server that had two clients)
-------
[from zeo.log]
2009-12-10T10:05:35 INFO ZEO.StorageServer (30736/10.2.1.2:39168)
disconnected
------
2009-12-10T10:08:41 INFO ZEO.StorageServer (30736) new connection
('10.2.1.2', 57021): <ManagedServerConnection ('10.2.1.2', 57021)>
------
2009-12-10T10:08:41 INFO ZEO.StorageServer (30736/10.2.1.2:57021)
disconnected
------
2009-12-10T10:11:42 INFO ZEO.StorageServer (30736) new connection
('10.2.1.2', 58746): <ManagedServerConnection ('10.2.1.2', 58746)>
------
--
View this message in context: http://n2.nabble.com/Clarification-on-some-ZEO-server-log-activity-tp4146105p4151660.html
Sent from the General Questions mailing list archive at Nabble.com.
ajung
2009-12-12 09:03:51 UTC
Permalink
I have doubts with this explanation. A HTTP request by Nagios to the Zope ZEO
client will unlikely result in such a logging message. I assume that Nagios
is directly connecting to the ZEO server.

-aj
Post by evadmin
Mystery solved - it turns out our nagios monitoring connects directly to
the Zope server. This is done because monitoring just the port 80 had
previously gave false 'staus ok' readings when Zope hung up (but the
Apache server was still able to serve up cached pages, i.e. the home
page).
--
View this message in context: http://n2.nabble.com/Clarification-on-some-ZEO-server-log-activity-tp4146105p4155596.html
Sent from the General Questions mailing list archive at Nabble.com.
Dieter Maurer
2009-12-11 18:23:30 UTC
Permalink
Post by evadmin
...
[from zeo.log]
...
------
2009-12-10T10:08:41 INFO ZEO.StorageServer (30736) new connection
('10.2.1.2', 57021): <ManagedServerConnection ('10.2.1.2', 57021)>
------
2009-12-10T10:08:41 INFO ZEO.StorageServer (30736/10.2.1.2:57021)
disconnected
This looks very much like a probe.
For a typical Zope, the disconnect comes too fast after the connect --
unless the ZODB versions of ZEO and Zope were incompatible.
--
Dieter
Dieter Maurer
2009-12-11 18:21:30 UTC
Permalink
Post by evadmin
...
When I check the ZEO log it shows that one of the two clients is
disconnecting every three minutes. Is this normal behavior?
No. At least it is unusual.

My former employer had a high availability software which monitored
the ZEO by opening a connection (and doing some operations).
This monitoring caused connects and disconnects every few minutes.

The ZEO connect log entries tell you from which IP the connection
request came. Maybe, this can give you a hint.
Post by evadmin
...
Can anyone tell me if this is something I should be looking at further in
regards to our server hangups? Or is it likely a redherring?
It is likely not related to your hangups.

A lost ZEO connection causes requests to fail with a
"ClientDisconnected" exception. The connection is automatically
reopened but you should nevertheless see the exceptions in
your "error_log" object.

In addition, Zope logs ZEO disconnections and connections at
log level "INFO". Usually, you should see them in your Zope "event" logfile.
--
Dieter
Derek Broughton
2009-12-11 22:32:33 UTC
Permalink
Post by Dieter Maurer
A lost ZEO connection causes requests to fail with a
"ClientDisconnected" exception. The connection is automatically
reopened
Really? I had an issue on a Windows server that would result in a
ClientDisconnected exception, that promptly resulted in a flood of log
messages that quickly either filled up the disk or, after I installed
rotating logs, rotated 10MBs of logs out of existence every minute. I guess
the loop could have been caused by trying to reconnect - and failing - but
because of the speed with which it filled the logs, I never was able to
debug it. I haven't seen it since we changed server hardware.
--
derek
Dieter Maurer
2009-12-14 17:48:05 UTC
Permalink
Post by Derek Broughton
Post by Dieter Maurer
A lost ZEO connection causes requests to fail with a
"ClientDisconnected" exception. The connection is automatically
reopened
Really?
Yes, in general (but there are exceptions).
Post by Derek Broughton
I had an issue on a Windows server that would result in a
ClientDisconnected exception, that promptly resulted in a flood of log
messages that quickly either filled up the disk or, after I installed
rotating logs, rotated 10MBs of logs out of existence every minute. I guess
the loop could have been caused by trying to reconnect - and failing - but
because of the speed with which it filled the logs, I never was able to
debug it. I haven't seen it since we changed server hardware.
You get a "ClientDisconnected" exception for each request that
fails because the ZEO is not (yet fully) connected.


Reconnecting to ZEO can need an incredibly long time -- due to
the verification of a potentially large ZEO client cache (and
inefficiency in the ZEO protocol for this task). I have seen delays of several
minutes. This problem usually occurs when the ZEO server is
restarted (special ZEO verification optimizations usually ensure
quick reconnects for other disconnect causes).

For modern ZODB versions (>= 3.8.2), I have implemented an
option "drop-rather-than-verify" (or similarly spelled) that
drops the client cache rather than perform verification.
This option can drastically speed up ZEO reconnections made
necessary by ZEO restarts
(This time, the enhancement is in the ZODB core (not in a private copy, as
usual)).
As far as I have heared, Plone 3 works together with ZODB 3.8.2.
--
Dieter
Loading...