Frequently Asked Questions
General

What is the difference between RHQ, Jopr and JON?

RHQ is an extensible management platform, licensed as a fully open source project. Jopr was an open source project that contained JBoss middleware-specific plugins, such as the JBossAS plugin, the Tomcat plugin, et al. Jopr followed the same licensing agreement as the RHQ project. The Jopr codebase has been rolled into the RHQ project and there is no longer a standalone "Jopr" project. All plugin code that used to be in the Jopr source code repository has been copied into the RHQ project and is being further developed there. Building RHQ today gives you both RHQ and what was termed "Jopr" combined. Therefore, you will find the development team now uses the term "RHQ", and rarely mentions Jopr except in historical contexts (like this FAQ entry).

JON (aka "JBoss Operations Network" or "JBoss ON") is a commercial product offered to Red Hat customers and is a fully tested, QA'ed and certified distribution. Of the three projects mentioned here (RHQ, Jopr, JON), the JON product is the only one officially supported by Red Hat. Note that an older version of JON (version 1.x) exists but it is not based on RHQ/Jopr - instead, JON 1.x was closed-source only.

What documentation is available?

RHQ user documentation can be found here. Note: the commercially-available-only JBoss ON product has its documentation at http://www.redhat.com/docs/en-US/JBoss_ON/.

Is there a publicly available issue tracker system to search for bugs and submit enhancement requests?

Yes. If you would like to search for a bug, report a bug, or submit an enhancement request, use the RHQ Bugzilla located at https://bugzilla.redhat.com/browse.cgi?product=RHQ%20Project.

Is XXX database supported?

No. PostgreSQL and Oracle are the only supported databases for production usage. RHQ 2.3 and up can also support an H2 Embedded Database for demo and developer usage only.

What is the syntax for regular expressions used within RHQ?

RHQ uses regular expressions in several places. When you encounter a place (user interface, configuration file, etc.) that requires you to enter a regular expression, consult Java's Javadoc documentation for the syntax rules. The regular expression syntax is described in the Javadoc for java.util.regex.Pattern, and the date/time syntax in the Javadoc for java.text.SimpleDateFormat (Java 5 and Java 6).

Data Model

What is a "Measurement Definition" versus a "Measurement Schedule"?

In order to know what a schedule and a definition represent, you need to understand the underlying data model for the measurement data. In RHQ, there are "resource types" and "resources". A "resource type" represents a kind of resource (like "JBossAS server" or "Apache Web Server"). A "resource" is an instance of a resource type (like "My JBossAS App Server" or "hostname foo Apache").

RHQ has analogous entities in the realm of the measurement subsystem, too. For each resource type, we have "measurement definitions", sometimes alternatively called "metric definitions" (because these are the <metric> definitions in the XML plugin descriptors) - these represent a "kind" of measurement. For each resource, there is an instance of each metric definition called a "measurement schedule".

So, for example, a "Linux Platform resource type" has a "Free Memory metric definition". Each of your Linux boxes would therefore have a "Free Memory measurement schedule". Each schedule has its own collection interval - that's why you can collect Free Memory for platform A every 30 minutes but collect Free Memory for platform B every 15 minutes. So, it is like this:

Resource Type  ->  Measurement Definition
Resource       ->  Measurement Schedule
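
The relationship described above can be sketched in a few lines of Python. This is only an illustrative model, not RHQ's actual entity classes: the class names, the `schedule_for` helper and the `default_interval_minutes` field are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceType:
    """A kind of resource, e.g. "Linux Platform" (hypothetical model class)."""
    name: str


@dataclass(frozen=True)
class MeasurementDefinition:
    """A kind of metric, defined once per resource type."""
    resource_type: ResourceType
    name: str
    default_interval_minutes: int  # assumed field, for illustration only


@dataclass
class Resource:
    """One concrete instance of a resource type."""
    name: str
    resource_type: ResourceType


@dataclass
class MeasurementSchedule:
    """One instance of a definition, bound to a single resource."""
    resource: Resource
    definition: MeasurementDefinition
    interval_minutes: int


def schedule_for(resource, definition, interval_minutes=None):
    """Create the schedule that ties a definition to one resource."""
    if definition.resource_type != resource.resource_type:
        raise ValueError("definition belongs to a different resource type")
    if interval_minutes is None:
        interval_minutes = definition.default_interval_minutes
    return MeasurementSchedule(resource, definition, interval_minutes)


linux = ResourceType("Linux Platform")
free_memory = MeasurementDefinition(linux, "Free Memory", default_interval_minutes=30)

platform_a = Resource("platform A", linux)
platform_b = Resource("platform B", linux)

# Same definition, two schedules, each with its own collection interval.
schedule_a = schedule_for(platform_a, free_memory)
schedule_b = schedule_for(platform_b, free_memory, interval_minutes=15)
```

Both schedules point at the same definition object, while the interval lives on the schedule, which is why it can differ per resource.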
Therefore, "measurement definition" refers to the kind of metric being collected for that resource's type (e.g. it refers to the "Free Memory" metric defined on the "Linux platform" resource type - it does not refer to any specific resource or any specific piece of data, rather it identifies the "kind" of metric). "Measurement schedule" refers to the specific measurement data that was collected for a specific resource (e.g. it refers to the Free Memory measurement data for the specific Linux platform resource named "myhost").

User Interface

How can I ignore an autodiscovered resource?

If your agent discovered a new platform and found a few resources that you do not want to take into inventory, you have to tell the RHQ Server to ignore those resources.

First, you can just select the resources to import in the auto-discovery portlet and deselect the unwanted resources. As long as they are shown in the portlet, they are not imported. Of course this has the disadvantage that it might be confusing to always see them.

The other option is to select the resource you do not want to import and click on "Ignore", so it no longer shows up in the portlet. However, if you try this on a resource on a freshly discovered platform, it will fail. The reason is that the inventory is organized in a tree-like manner with the platform as the tree root, and when a server or service is taken into the system (no matter if imported or ignored) it is attached below that root. When the platform is not yet imported into the inventory, there is no root to which the ignored resource can be attached.

So to ignore a server on a platform: first import the platform and leave that server unchecked. When the platform is successfully imported, select the server and click on "Ignore".

From the above explanation you can see that it is not possible to ignore just a platform. If you want to ignore a platform, just do not run an agent on it.
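
The reason why ignoring fails on a freshly discovered platform can be illustrated with a toy model of the inventory tree. This is a sketch under stated assumptions, not RHQ's implementation: the `Inventory` class, its method names and the "IMPORTED"/"IGNORED" labels are hypothetical.

```python
class Inventory:
    """Toy inventory: each imported platform is a root with children below it."""

    def __init__(self):
        # platform name -> {child resource name: status}
        self.platforms = {}

    def import_platform(self, platform):
        """Create the tree root for a platform."""
        self.platforms.setdefault(platform, {})

    def _attach(self, platform, child, status):
        # Imported and ignored resources alike must hang below a root.
        if platform not in self.platforms:
            raise LookupError(f"platform {platform!r} is not in inventory yet")
        self.platforms[platform][child] = status

    def import_resource(self, platform, child):
        self._attach(platform, child, "IMPORTED")

    def ignore_resource(self, platform, child):
        self._attach(platform, child, "IGNORED")


inventory = Inventory()
try:
    inventory.ignore_resource("myhost", "unwanted server")
except LookupError as error:
    print("ignore failed:", error)

# Import the platform first, then the ignore has a root to attach to.
inventory.import_platform("myhost")
inventory.ignore_resource("myhost", "unwanted server")
```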

Server

How do I get debug messages from the RHQ Server?

You can edit the <server-install-dir>/jbossas/server/default/conf/jboss-log4j.xml configuration file to enable debug messages. Generally, you will want to just uncomment the org.rhq category, which will set its priority to DEBUG. This will emit debug messages for all RHQ subsystems to the log file. If you want to emit debug messages only for a smaller subset of the RHQ server internals, you can uncomment only those categories you are interested in, or add your own categories. There are several commented-out categories in jboss-log4j.xml, with comments that briefly explain what types of debug messages can be expected from a particular category. You can also emit debug messages for third-party subsystems - again, there are some already commented out in jboss-log4j.xml - JBoss/Remoting and Hibernate are examples of third-party subsystems that you can configure to emit debug messages.

After you make your changes to the jboss-log4j.xml file, save the file and the RHQ server will hot-deploy the changes within a few seconds. The debug messages can be found by examining the file <server-install-dir>/logs/rhq-server-log4j.log.

Note that by default the console window will not show the debug messages. This is because the log4j CONSOLE appender has a threshold at INFO. If you want your debug messages to also go to the console, you must change the CONSOLE appender's Threshold setting to DEBUG.

In some cases, you will want to get debug messages from the RHQ Server launcher scripts. To do this, set the environment variable RHQ_SERVER_DEBUG to "true". Now when you start the server, the launcher scripts will output debug messages.

How does RHQ integrate with external LDAP user repositories?

RHQ uses passwords to authenticate users. Authentication information, comprising user names and passwords, can be stored in an internal database (the default) or in an external LDAP repository. It is important to note that support for LDAP currently does not include storing attributes other than user names and passwords. In particular, authorization information such as roles used to control access to RHQ resources is persisted in the internal database.
RHQ does not currently check server certificates for LDAP over SSL, nor can it provide client-side certificates to the LDAP server. However, developers should be able to customize RHQ to perform these tasks - please see https://bugzilla.redhat.com/show_bug.cgi?id=RHQ-2064 for more information.

How can I specify command-line options for the Server JVM?

On UNIX

If you want to override the default max heap and permgen sizes, set them via the RHQ_SERVER_JAVA_OPTS environment variable, e.g.:
RHQ_SERVER_JAVA_OPTS="-Dapp.name=rhq-server -Xms256M -Xmx1024M
-XX:PermSize=128M -XX:MaxPermSize=256M
-Djava.net.preferIPv4Stack=true"
export RHQ_SERVER_JAVA_OPTS
Set all other JVM options via the RHQ_SERVER_ADDITIONAL_JAVA_OPTS environment variable, e.g.:
RHQ_SERVER_ADDITIONAL_JAVA_OPTS="-Dfoo=true"
export RHQ_SERVER_ADDITIONAL_JAVA_OPTS

On Windows

For all other JVM options, add "wrapper.java.additional.n" lines to <server-install-dir>\bin\wrapper\rhq-server-wrapper.inc (creating that file if necessary), e.g.:

wrapper.java.additional.12=-verbosegc:file=gc-log.txt

How can I confirm my server's email/SMTP settings are correct?

Each server is configured to talk to a particular SMTP server. This configuration looks like the following:

# Email
rhq.server.email.smtp-host=localhost
rhq.server.email.smtp-port=25
rhq.server.email.from-address=rhqadmin@localhost

If you want to confirm that these settings are correct and the server can actually send emails successfully, log into the GUI as the "rhqadmin" user and go to the "test email" page located at http://<your-server>:7080/admin/test/email.jsp.

When do Baselines auto-calculate?

Go to the Administration>SystemConfiguration>Settings page of the RHQ GUI. You will see settings for Automatic Baseline Configuration Properties.

Baseline Frequency determines how often the baselines will be calculated. By default it is 3 days. This means that every 3 days a new set of baselines is calculated (except for those that were manually set by the user - those remain pinned to the baselines set by the user).

Baseline Dataset determines the minimum set of data that must have been collected for a measurement before a baseline for that measurement is calculated. The default is 7 days. For example, when it is determined that baselines should be calculated (every 3rd day by default), only those measurements that have data that is 7 days old or older will get a baseline calculated. Any measurements that do not yet have data from 7 days ago will be skipped. This ensures that when a measurement's baseline is calculated, you have a good representative set of data to include in the calculation (e.g. by default, you will have 7 days' worth of data included in the baseline calculation).

I deleted a Platform from inventory.
How do I get it to be rediscovered, so I can re-import it?

Just force an Agent discovery by issuing the following command at the Agent command prompt:

> discovery -f

Alternatively, you can register a new Agent by restarting the agent, specifying the -l option, on the machine corresponding to the Platform you deleted. The Platform will get rediscovered. That is, first quit the agent (using the 'quit' command), then run:

<agent-install-dir>/bin/rhq-agent.sh -l

My server machine does not have a writable directory called /var/run. How can I get my rhq-server.sh script to successfully write out its pidfile?

Set the environment variable RHQ_SERVER_PIDFILE_DIR to the full path of the directory where you want the pidfile to be stored. When you run the script, that variable's value will override the default location. If you have an older script (2.1 or older), directly edit rhq-server.sh and change /var/run to the directory that you want. The default location for this pid file has changed in 2.2 and up - it is now written to the bin directory of the server install directory.

When I try to start the server, I get an exception whose cause says "Exception creating identity" and the server fails to start. How can I fix this?

The message you are probably referring to looks something like this:

Caused by: java.lang.RuntimeException: Exception creating identity: my.host.name.com: my.host.name.com
at org.jboss.remoting.ident.Identity.get(Identity.java:211)

This is not RHQ-specific - it is JBoss/Remoting failing. See: https://jira.jboss.org/jira/browse/JBREM-769. The core issue (which is hidden from you, because JBoss/Remoting isn't bubbling up the real error message, as per that JIRA) is typically that your hostname is not resolvable. Make sure your hostname (as reported to you in that exception message, e.g. "my.host.name.com") is a valid hostname and make sure it is resolvable by your machine (i.e. is it in /etc/hosts? Can you get an IP for it via nslookup?).
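
As a quick way to run the resolvability check described above from the server machine itself, a minimal Python sketch could look like the following. It is independent of RHQ; the `resolve` helper is a hypothetical name, and it checks only IPv4 resolution through the standard `socket` module.

```python
import socket


def resolve(hostname):
    """Return the IPv4 address for hostname, or None if it cannot be resolved."""
    try:
        return socket.gethostbyname(hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


if __name__ == "__main__":
    # Check the name this machine reports for itself; substitute the name
    # shown in the exception message if it differs.
    name = socket.gethostname()
    address = resolve(name)
    if address is None:
        print(f"{name} does not resolve; check /etc/hosts or DNS")
    else:
        print(f"{name} resolves to {address}")
```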

My server logs are showing the message "Have not heard from agent ... Will be backfilled since we suspect it is down". What does that mean?

When you see

[org.rhq.enterprise.server.core.AgentManagerBean] Have not heard from agent [<some agent name>] since [<some date/time>]. Will be backfilled since we suspect it is down

it means that the agent did not send up its availability report in the required amount of time (which is called the "agent quiet time" - 15 minutes by default, but configurable in the Administration>SystemConfiguration>Settings page). When this happens, the server suspects that the agent is down - at which time it "backfills" the availability of ALL resources managed by that agent to DOWN (you'll see the availabilities turn RED). This can happen for a number of reasons:

What ports do I have to be concerned about when setting up a firewall between servers and agents?

(Note: the following refers to out-of-box defaults. If you configured your RHQ Servers or RHQ Agents to use different ports than their out-of-box defaults, use your custom port numbers instead.)

The agent is configured to talk to one or more RHQ Server endpoints. Each RHQ Server endpoint includes the port. This port is typically 7080 (non-secure) or 7443 (SSL-secured). So, when the agent starts up, it will try to communicate with an RHQ Server over one of those two ports (an agent only needs one of those ports to talk to a single RHQ Server - which one depends on the transport used). If your server expects to be communicated with via the "servlet" transport, that is unsecured and over 7080 by default; if it uses the "sslservlet" transport, that is SSL-secured and over 7443 by default. That's the only port from agent to server.

When the server needs to talk to the agent, the server talks over the bind port that the agent was configured for. Out of box, that is 16163 (and that can be the same port whether the server talks to the agent over the socket or sslsocket transport - in other words, it doesn't matter whether it's secured or not, by default out of box its port is 16163). That's the only port from server to agent.

Of course, the server needs to talk to its database too - so make sure any ports that your database requires to be open are actually open to the server - otherwise, the server will not be able to talk to the database.

Note: servers do not talk to one another directly - there are no ports required for server-to-server links because there is no communication like that going on.

So in summary:

agent -> server : port 7080 (servlet) or 7443 (sslservlet)
server -> agent : port 16163

I installed the Server as a Windows Service, but it is failing to start with no error messages.
How can I start the Server as a Windows Service?

You probably installed the Server to run as the "Local System Account" and that account probably doesn't have the proper permissions to run the Server. Perhaps your machine has been locked down due to security concerns and the Local System Account cannot access the network or run Java or any number of things. To solve this, create a user on your Windows box that can run the Server properly (to test it, log in as that user and execute "rhq-server.bat console" to see if the Server can be run by that user). Then, install the Server as a Windows Service with the RHQ_SERVER_RUN_AS_ME environment variable set to "true":

rhq-server.bat remove
set RHQ_SERVER_RUN_AS_ME=true
rhq-server.bat install

For more information on installing the Server as a Windows Service, see Installing and Running as a Windows Service.

How do I fix a 'ORA-12519, TNS:no appropriate service handler found' error when using Oracle XE?

Although Oracle XE is not for production use, it is not uncommon to use it for test or development environments. For Oracle XE 10g the following setting should be applied in addition to any other settings, if it has not already been set to a non-default value:

ALTER SYSTEM SET PROCESSES=150 SCOPE=SPFILE;

This setting requires a restart of Oracle XE.

When I try to create a bundle by uploading an Ant recipe XML directly, the XML content seems to get corrupted and tags are placed out of order.

If you upload an Ant script file as the recipe, you can't use self-closing XML notation like <property name="a" /> - you need to explicitly provide the ending tag like this: <property name="a"></property>. If you don't want to be forced into using that notation, just copy-and-paste the Ant script content directly into the text field, as opposed to using the file upload mechanism.
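
If a recipe contains many self-closing tags, rewriting them by hand is tedious. The following sketch shows one possible way to expand them before uploading, using Python's standard `xml.etree.ElementTree`; it is not part of RHQ, and the `expand_empty_elements` helper is a hypothetical name. Note that this round trip drops XML comments and the XML declaration, so review the output before using it.

```python
import xml.etree.ElementTree as ET


def expand_empty_elements(xml_text):
    """Rewrite <tag ... /> as <tag ...></tag> throughout the document."""
    root = ET.fromstring(xml_text)
    # short_empty_elements=False forces explicit closing tags on output.
    return ET.tostring(root, encoding="unicode", short_empty_elements=False)


recipe = '<project name="demo"><property name="a" /></project>'
print(expand_empty_elements(recipe))
# prints: <project name="demo"><property name="a"></property></project>
```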

Agent

How do I get debug messages from the RHQ Agent?

The easiest and quickest way to get your agent to start logging debug messages is to set the environment variable RHQ_AGENT_DEBUG to "true" before starting your RHQ Agent. Now when you start the agent, both the launcher scripts and the agent itself will output debug messages. When you use this environment variable, the agent will use an internal log4j configuration file called "log4j-debug.xml" which is located in the agent's main jar file.

If you want more fine-grained control over which log4j categories have DEBUG priority, you can directly edit the conf/log4j.xml file (modifying this file requires an agent restart in order to pick up the changes). You must not set RHQ_AGENT_DEBUG if you want the agent to use this log4j.xml file (setting that environment variable will cause the agent to override this log4j.xml with the internally configured log4j-debug.xml file, which enables the DEBUG level for all categories).

The log messages can be found in the log files located in the <agent-install-dir>/logs directory.

If you are launching the RHQ Agent on Windows using the service wrapper, you must set RHQ_AGENT_DEBUG and then install the service via rhq-agent-wrapper.bat install.

If you want to enable or disable debug messages while the agent is still running, you can use the "debug" prompt command (type "help debug" at the agent prompt for more info). You can write your own log4j.xml files, put them in <agent-install-dir>/conf and use them via the debug -f command. For example, debug -f custom-log4j.xml. This means that while the agent is running, you can switch between log4j.xml files by simply using debug -f and passing in the log4j.xml file you want to use. RHQ also ships with log4j-warn.xml in the agent jar - this can be used if you want the agent to be especially quiet (only WARN and above messages are logged, INFO and below are not).
For example, during runtime you can invoke debug -f log4j-debug.xml, which will "turn on debugging" while the agent is still running. When you are done debugging, you can invoke debug -f log4j.xml, which switches the agent to the default log4j configuration without having to shut down and restart the agent. You can get fancy with your own log4j XML files - so if you want to enable debug just for your own plugin, for example, you can write your own log4j.xml, put it in conf/ and switch between that log4j configuration and the default one, all without having to recycle the agent.

How do I start the RHQ Agent fresh, as if newly installed?

If you want the agent to clean itself of all previous inventory and force itself to re-register with the server, shut down the agent and restart it with the --cleanconfig command line option. If you do this, you may also pass in the --config argument to have it start up with a configuration file you specify (otherwise, the default conf/agent-configuration.xml will be used). The -l option is an alias for --cleanconfig and -c is an alias for --config - therefore your command line can be similar to the examples below (if on Windows, replace .sh with .bat):

rhq-agent.sh -l

or

rhq-agent.sh -l -c my-agent-configuration.xml

Both will clean the old configuration, but the first loads the default agent-configuration.xml and the second loads your custom my-agent-configuration.xml (it will look in the conf/ directory, unless you specify a full path to a location other than conf/).

My resources went "red" after starting the agent with -u / --purgedata or -l / --cleanconfig

If you purge the persisted data that the Agent maintains, you must also reset the "connection properties" for each resource that Agent is managing. If your resource had manually overridden connection properties (ones that you used the web console to set), then you will need to set those again.
To ease the burden of doing this, consider creating compatible groups for these resources; this will enable you to set the connection properties across all members in the group at the same time.

How can I update the plugins on all my agents?

When you add a new plugin to your system, or you upgrade an existing plugin, you normally want to tell all of your agents to update their existing plugins with the new plugin versions. You can do this for an individual agent by executing the prompt command "plugins update" at that agent's prompt, or by executing the operation "Update All Plugins" from the UI's Operation Tab for that "RHQ Agent" resource.

If you want to update all of your agents so they all download the latest plugins, you can use the DynaGroup feature along with the Group Operation feature to do this. First, create a DynaGroup with the expression:

resource.type.plugin = RHQAgent
resource.type.name = RHQ Agent

This creates a compatible group that dynamically adds all RHQ Agents as members of that group. Note that if you already have a compatible group with your agents as members, you can skip this group creation step.

Next, navigate to the compatible group that contains all your agents. You should see an Operations tab. From here, just invoke the "Update All Plugins" operation on that group. This will tell all of your agents in that group to update their plugins. Once that group operation is completed, all of your agents will have the latest, most up-to-date versions of all plugins.

How can I change the agent name after it has already been registered?

When you start the agent for the first time, the first setup question asked is for the "agent name". This is a name that must be unique across all agents in your environment. Once registered, you cannot change this name. Anytime you attempt to re-register this agent, you must re-register it with the same name that it was registered under before.

Note that this "agent name" is not the same as the "RHQ Agent resource name" that you see in the UI.
If you import an RHQ Agent resource into inventory, that resource's name will be something like "agentname RHQ Agent" where "agentname" is the agent name you provided at agent setup time. This RHQ Agent resource name can be changed by editing its value within the Inventory tab. Changing this name does not change the name that the agent is registered under. Your agent is still registered under its original agent name.

I want to run agents on all my machines, but only one starts OK - the rest fail due to binding to a wrong address

If you want to run multiple agents, but many fail to start with this error:

FATAL [main] (org.jboss.on.agent.AgentMain)-
{AgentMain.startup-error}The agent encountered an error during startup
and must abort java.net.BindException: Cannot assign requested address
then there are a couple of things you need to consider.

First, if you changed your agent-configuration.xml manually (say, to change IP addresses), did you do that after you initially set up the agent? The agent's configuration XML file is not referenced after the agent is set up - it doesn't need to be, because the agent's configuration is persisted using Java Preferences (this is so it can support agent updates or agent re-installs without losing its configuration). If you want to change the agent's configuration file and have those changes picked up, restart the agent and pass it the --config command line option (or -c, which is shorthand for --config). This tells the agent to re-read the configuration file and make that its configuration, overriding any old configuration it persisted before.

The other question to ask is - is your home directory stored on NFS? If so, then you are probably picking up the same Java Preferences across all your machines (see $HOME/.java - that is the default location where Java stores Java Preferences on UNIX - on Microsoft Windows, it goes in the registry so this might not be relevant if you are on Windows). If you are running the agents as the same user and your user's home directory is shared (via NFS or some other sharing technology), then one solution is to have your agents use different Java Preferences names. Each time that you start your agents, you need to tell them where they can find their preferences. You tell the agent your new preference name via --pref (or its shorthand notation, -p). Each agent must have its own preference node name. On UNIX, you could use `hostname` as its value, for example. Read the comments at the top of agent-configuration.xml; they contain some relevant information. You can also read the usage help: rhq-agent.sh --help.
If you do not want to be forced to edit your configuration files or pass the -p option, the other alternative is to define the system property java.util.prefs.userRoot to point to some other, unique location (e.g. /etc/rhq-agent-prefs). When the agent starts, Java will use the value of that system property as the location where it will store its Java Preferences. You set this system property on the agent via the environment variable RHQ_AGENT_ADDITIONAL_JAVA_OPTS. When you set that environment variable, rhq-agent.sh will add its value to the default set of Java options when passing in options to the agent's Java VM:

RHQ_AGENT_ADDITIONAL_JAVA_OPTS="-Djava.util.prefs.userRoot=/etc/rhq-agent-prefs"
export RHQ_AGENT_ADDITIONAL_JAVA_OPTS
rhq-agent.sh

When starting the Agent via a Windows service, the Agent fails to start, and I see the error "java.lang.IllegalStateException: The name of this agent is not defined - you cannot start the agent until you give it a valid name" in the Agent wrapper log file. What does this mean?

The Agent cannot ask for its initial setup configuration when running as a Windows service (because there is no console for the user to see and answer the prompts). This means that you need to either preconfigure the agent or run the agent in standard (non-service) mode once, as the user that should run the service, in order to answer the setup questions and configure it before installing it as a service.

My Agent setup is correct but my Agent is getting "Cause: org.jboss.remoting.CannotConnectException: Can not connect http client invoker."

Starting in RHQ 1.1, the Server information defined in your Agent setup is used only for initial contact with an RHQ Server (i.e. the server hostname/IP address you provide at the agent setup prompt is only used when initially registering with the server). Since RHQ 1.1 supports a multi-Server "High Availability Cloud", the Agent may be serviced by any Server in your RHQ Server network.
The Agent will try to connect to any Server in the cloud - and it does so via the Server endpoint as defined for the Server at Server install time, or via the RHQ GUI's server details pages (Administration>HighAvailability>Servers). This error is typically seen when the Server's endpoint address is not set to something that can be resolved by the Agent. The Public Endpoint Address set for each Server must be resolvable by every RHQ Agent. Check your Server endpoint information via the GUI's HA Administration page and update it if necessary. After the update, restart your Agent.

My agent machine does not have a writable directory called /var/run. How can I get my rhq-agent-wrapper.sh script to successfully write out its pidfile?

Set the environment variable RHQ_AGENT_PIDFILE_DIR to the full path of the directory where you want the pidfile to be stored. When you run the script, that variable's value will override the default location. If you have an older script (2.1 or older), directly edit rhq-agent-wrapper.sh and change /var/run to the directory that you want.

Explain how the agent scans for resources

When the agent performs discovery, it does so using two different types of "scans" to try to find resources.

A "server scan" detects top-level servers that run on your platform - things like JBossAS servers, Postgres servers and the like. These scans run by default every 15 minutes. The setting that controls this is "rhq.agent.plugins.server-discovery.period-secs".

A "service scan" detects lower-level and more fine-grained services that are running in already detected and imported top-level servers - things like EJBs running in JBossAS, tables in Postgres databases or VHosts in Apache. These scans run by default every 24 hours (i.e. 1 day). You must have already imported the servers into inventory before services can be discovered!
These types of scans are normally very "expensive" to perform since they probe inside the managed resource, so we don't do them often (which is why the default is 24 hours). The setting that controls this is "rhq.agent.plugins.service-discovery.period-secs".

The above two types of scans are "discovery" scans - in other words, they attempt to discover new resources that the agent does not have in its inventory yet but that you might want to manage.

There is also a third type of scan - an "availability scan". An availability scan is not a discovery scan; however, it is very important to understand what it is. When the agent performs an availability scan, it tries to determine the availability of resources that are already discovered and committed to inventory (i.e. these resources were previously discovered by one of the two types of discovery scans mentioned above). These availability scans run by default every 5 minutes - the setting that controls this is "rhq.agent.plugins.availability-scan.period-secs". After an availability scan completes, the agent will have an up-to-date status of which resources are UP or DOWN. Once the availability scan is finished, the agent will send an "availability report" to the server. This is how the server knows which resources should currently be displayed as UP or DOWN (aka "green" or "red").

Note that this availability report serves a second purpose - it informs the server of the agent's own availability! In other words, when the server receives an availability report from agent A, not only does the server now know the UP or DOWN status of that agent's managed resources, but it also implicitly knows that agent A itself is UP. This availability report will thus reset the clock on that agent's "quiet time", which is used by the server to determine when it should suspect that agent is DOWN.
For example, if the "max agent quiet time" server setting is set to 10 minutes, and the server hasn't received an availability report from agent A in over 10 minutes, the server will suspect that agent A is DOWN (which has the side effect of causing the server to "backfill" all of agent A's managed resources to the availability status of DOWN).

How can I see the agent persisted configuration?

The agent's configuration is initially read from agent-configuration.xml and overlaid with values you enter at the setup prompts at startup. After the agent is initially configured, it will persist that configuration and never look at agent-configuration.xml again (unless you clear the configuration). The actual location on the file system where the configuration is persisted is platform dependent - for example, on UNIX, it's typically "$HOME/.java" (see the Java Preferences API documentation for more information on how and where Java persists preferences). For more details, read the comments at the top of the agent-configuration.xml file. Configure the RHQ Agent and Preconfiguring the Agent also have more information on this. There are several ways in which you can view the agent's persisted configuration.

How can I get a dump of inventory information from an agent running on another machine?
The use case here is that someone (call him "the customer") is running an agent in their environment and is having problems. You suspect the customer's agent inventory is corrupted somehow. As a developer, you would like to know exactly what the agent thinks is in its inventory so you can debug the problem. To get this information, you must get the customer's agent "data/inventory.dat" file. Copy that file to your local machine (it doesn't matter what directory you put it in). Now, run your own agent on your own local machine - make sure you run that agent with the same plugins that the customer was running with. The agent doesn't necessarily have to be connected to a server, but the plugin container must be started (that means the agent has to have been registered). Now, execute this agent prompt command:
inventory --xml --export=/customer-inventory.xml /the/customer/inventory.dat
where /the/customer/inventory.dat is the full path to where you copied the customer's inventory.dat file. If you do not specify the --export option, the XML will simply be dumped to the stdout console window; otherwise, the XML is stored in the full path you specify.

I need to change the IP address of my agent machine - how do I keep my server and agent up to date with that change?
The agent has a configuration preference named "rhq.communications.connector.bind-address" whose value is the IP address the agent binds to when it starts its server socket (the thing it listens to for incoming messages from the server). If you change the agent's IP address (and invalidate the old agent IP address), you have to do a couple of things:
Once the agent is restarted, it will use that new IP address.

When I shut down the agent, the RHQ Server takes more than 14 minutes to detect the agent was down. Can I configure it to not take so long?
You are killing the agent entirely, so the agent never reports any availability data to the server. To support cases like this (where the agent is completely down or unresponsive), the server periodically checks which agents it hasn't heard from in a long time and then determines which of these "suspect" agents are really down. Read this for background on this issue: https://bugzilla.redhat.com/show_bug.cgi?id=RHQ-1098. That issue tells you why we increased the default time. Read this for more information - it talks about the new default time: https://bugzilla.redhat.com/show_bug.cgi?id=RHQ-2349. It states, in part, "We have a quiet time of 15m right now (recently changed to that)." What does this mean? It means that, by default, only if we have not heard from an agent in 15 minutes (what we call the agent's "quiet time") do we mark that agent and all of its resources down. This is why it takes more than 14 minutes to detect your agent was down. If you do not like that, and you want it to report "down" faster, then, yes, you can change this - it is configurable in the GUI. Go to the main menu "Administration>SystemConfiguration>Settings" and change the setting "Agent Max Quiet Time Allowed" to something shorter. Note: the shorter your allowed quiet time interval is, the greater the possibility of a "false negative" - for example, if you set quiet time to 5 minutes and your server can't process all your agents' availability reports fast enough, it may think it hasn't heard from an agent when in fact it just hasn't had time to process the latest availability report. When an agent is determined to be down, the server has to "backfill" it - marking all of its resources down - and this is expensive. So you don't want to do this often.
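The quiet-time check described above can be illustrated with a minimal Python sketch. This is not the RHQ server's actual implementation; the function names (find_suspect_agents, backfill), the agent names and the data structures are hypothetical and exist only to show the logic: an agent becomes "suspect" once its last availability report is older than the allowed quiet time, and its resources are then backfilled to DOWN.

```python
from datetime import datetime, timedelta

# Hypothetical helper: returns the agents whose last availability report
# is older than the allowed quiet time (15 minutes, the default quoted above).
def find_suspect_agents(last_report_times, now, max_quiet_time=timedelta(minutes=15)):
    return sorted(
        name
        for name, last_report in last_report_times.items()
        if now - last_report > max_quiet_time
    )

# Hypothetical helper: "backfills" every resource of each suspect agent to DOWN.
def backfill(resources_by_agent, suspects):
    return {
        agent: {resource: "DOWN" for resource in resources}
        for agent, resources in resources_by_agent.items()
        if agent in suspects
    }

now = datetime(2010, 1, 1, 12, 0)
last_reports = {
    "agentA": now - timedelta(minutes=16),  # quiet for longer than 15 minutes
    "agentB": now - timedelta(minutes=4),   # reported recently
}
resources = {"agentA": ["platform", "postgres"], "agentB": ["platform"]}

suspects = find_suspect_agents(last_reports, now)
print(suspects)
print(backfill(resources, suspects))
```

With a shorter quiet time, an agent that is merely slow to report is more likely to end up in the suspect list, which is the trade-off the note above warns about.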

Do I have to run the agent as root?
You do not necessarily have to run the agent as root. It all depends on how much and how deeply you want to manage your resources. For example, there is a Postgres plugin that lets the agent probe the Postgres configuration file postgresql.conf. However, by default, Postgres installs itself with very strict file permissions on that file - and if you run the agent as a non-root, non-postgres-privileged user, it won't be able to read that file and manage it (and you'll see agent log messages saying so). The same is true for lots of other plugins that try to manage things that touch privileged files (like iptables and things like that; even JBossAS app servers might be installed with strict file privileges that might cause this). If you run the agent as root, you are giving the agent privileges to manage all those things - if you don't, you are giving the agent restricted views of your managed resources. This might be what you want; hence, you don't have to run the agent as root. But if you don't run the agent as root, you must be willing to accept that the agent will not be able to manage some things and will log messages saying so.

How can I find out what environment variables and Java system properties are set in my agent JVM process?
The prompt command "version" can give you a list of the agent process' environment variables and system properties. At the agent prompt, type "help version" for the syntax of that command. In short, "version --sysprops" will provide a list of all the system properties and "version --env" will provide a list of all the environment variables.

Log messages

What are "Command failed to be authenticated" messages?
Agents are assigned security tokens when they first register with the server. The token is one way an agent identifies itself with the server.
If an agent does not identify itself with any token, or if it identifies itself with a wrong token, the server will deny access to that agent - in other words, the server will reject commands that come from that agent until that agent has properly registered. If an agent is continually causing "failed to be authenticated" errors on the server similar to this:
02:31:33,095 WARN [CommandProcessor] {CommandProcessor.failed-authentication}
Command failed to be authenticated! This command will be ignored and not processed:
Command: type=[identify]; cmd-in-response=[false]; config=[{}]; params=[null]
then it usually means the agent has been misconfigured, or it is an unknown agent attempting to identify itself as another agent. Restart your agent with the "--cleanconfig" command line option to clean out its configuration and re-register.

What are "fail-safe cleanup" messages?
You'll often see messages in your logs that look like:
13:43:10,781 WARN [LoadContexts] fail-safe cleanup (collections) : org.hibernate.engine.loading.CollectionLoadContext@103583b <rs=org.postgresql.jdbc3.Jdbc3ResultSet@d16f5b>
Please ignore these messages as they are normal and expected. The messages deal with the underlying ORM technology used (Hibernate) and how it automatically cleans up after itself to prevent memory leaks.

Plugins

Platform Plugin

How can I collect syslog messages as RHQ Events?
The Linux platform plugin can monitor syslog messages by emitting them as events. Syslog messages can be collected by the plugin either by reading syslog message files or by receiving them over a socket listener. In either case, syslog must be configured to format the messages in a way that RHQ can parse. You can either tell RHQ (in the platform's plugin configuration - aka connection properties) what regular expressions can parse your syslog messages, or, in your syslog config file (e.g. /etc/rsyslog.conf), format your messages in a way that RHQ understands out of the box. In the latter case, if you define the syslog message format as below, the Linux platform plugin can parse it:
$template RHQfmt,"%timegenerated:::date-rfc3339%,%syslogpriority-text%,%syslogfacility-text%:%msg%\n"
If you then use "RHQfmt" in your syslog configuration so it writes messages out in that format, RHQ will be able to understand the log messages fully. For example:
$template RHQfmt,"%timegenerated:::date-rfc3339%,%syslogpriority-text%,%syslogfacility-text%:%msg%\n"
*.* /var/log/messages-for-rhq;RHQfmt
*.* @@127.0.0.1:5514;RHQfmt
That will both write syslog messages to /var/log/messages-for-rhq and send the messages over TCP to a listener on port 5514 (you would configure the platform's connection properties to listen to this port).
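To make the "RHQfmt" layout concrete, here is a minimal Python sketch that splits one line written in that format into its four fields (timestamp, priority, facility, message). The regular expression, the function name parse_rhqfmt and the sample line are assumptions made for illustration; they are not the expressions the Linux platform plugin actually uses.

```python
import re

# Assumed pattern for "<rfc3339 timestamp>,<priority>,<facility>:<message>".
# The timestamp itself contains colons, so the first two fields are split on
# commas and only the facility is terminated by the first following colon.
RHQFMT_LINE = re.compile(
    r"^(?P<timestamp>[^,]+),(?P<severity>[^,]+),(?P<facility>[^:]+):(?P<message>.*)$"
)

def parse_rhqfmt(line):
    """Return the fields of one RHQfmt line as a dict, or None if it does not match."""
    match = RHQFMT_LINE.match(line.rstrip("\n"))
    if match is None:
        return None
    fields = match.groupdict()
    # The message part usually carries a leading space after the colon.
    fields["message"] = fields["message"].strip()
    return fields

# Hypothetical sample line in the RHQfmt layout.
sample = "2010-03-01T13:43:10.781234+01:00,err,daemon: disk almost full\n"
print(parse_rhqfmt(sample))
```

A line that does not follow the template (for example, one written with the default syslog format) simply yields None, which is a quick way to check whether the template is really in effect for a given log file.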

JBossAS Plugin

Why does only 1 JBossAS server show "green" availability and all the rest show "red" even though I made sure all of my JNP credentials are configured properly in my resources' connection properties?
There is a problem in the way the JBossAS JNP client works. See RHQ-1030 for the full description of the problem, but in short, if you are managing multiple JBossAS servers on a single box, all of your security credentials for those servers must be the same (i.e. the JNP username and password must be the same).

Postgres Plugin

Why is the agent showing an error in my postgres discovery about authentication failed for user "postgres"?
The Postgres plugin attempts to log into the database server using the username and password of "postgres". In many installations, this is a default superuser and will work. However, it is also possible that this login could fail for a number of reasons:
In many cases, this can be alleviated as follows:
Additionally, Postgres may need to be changed on Linux systems to allow password-based logins (i.e. "md5" vs. "ident sameuser" settings in the pg_hba.conf file). Consult the Postgres documentation for more details.

Why are most of the metrics for my Postgres resource showing up as NaN?
In many installations, Postgres will not start its statistics collector by default. To enable statistics collection, add (or change) the following line in the postgresql.conf file:

How many database connections are necessary to monitor a Postgres database?
Each Postgres database inventoried in RHQ requires 1 connection.

Why can't I drop my database that is inventoried in RHQ?
Because of the frequency of availability and statistics monitoring, the Postgres plugin keeps an open connection to the database. As such, when attempting to drop a database currently inventoried in RHQ, an error will be thrown about the database being in use. In order to drop the database, the RHQ Agent monitoring the database must be shut down or the database resource must be removed from RHQ. This will close the Postgres plugin's connection to the Postgres server and thus allow you to drop the database.

Apache Plugin

Where can I get the connectors?
The Apache plugin monitors an Apache Web Server via custom modules like the SNMP connector. You can download the open-source versions of these connectors and install them in your Apache Web Server.

Augeas-based Plugins

What is this augeas plugin?
The augeas plugin is an "abstract" plugin that exists solely as an extension point for other plugins to extend. The augeas plugin provides the Java JNI classes that dependent plugins need in order to access the Augeas native library. For example, the opensshd plugin depends on the augeas plugin because it uses the Augeas library to access the OpenSSH daemon configuration. The other RHQ plugins known to use this augeas plugin are: hosts, grub and apt.

Why does my agent log have this in it: "java.lang.UnsatisfiedLinkError: Unable to load library 'augeas': libaugeas.so: cannot open shared object file: No such file or directory"
This occurs when you have deployed one or more augeas-based plugins but your Linux machine does not have the augeas native library installed. See http://augeas.net for more information on Augeas and how you can install it on your machine.

Troubleshooting

Installer fails on PostgreSQL with "Relation RHQ_Principal does not exist"
First make sure that the RHQ server / installer is allowed to connect to PostgreSQL. You should look at the PostgreSQL configuration file pg_hba.conf where the permissions are configured. If this is OK, and the installer is able to connect to the database, please check the PostgreSQL page for a workaround.

RHQ 1.0 has trouble starting on Java 6
Java 6 is not supported on RHQ 1.0 - please use Java 5. RHQ 1.1 and up support Java 6.

RHQ 4.0 has trouble starting on Java 5
Support for Java 5 was dropped in RHQ 4.0. For RHQ 4.0 and later, please use Java 6.

The execution of a Script resource fails on Unix
When I invoke the "Execute" operation on a Script resource, it immediately fails and I get an error saying that the script cannot be executed.
Make sure that the execute bit is set on the script file. You can set it via:
chmod +x scriptname

Install fails on Oracle with ORA-01843
This issue happens when Oracle runs in a locale where the abbreviation for April is not 'APR' as it is in EN or DE locales. There are currently two workarounds:

When trying to monitor a JBoss EAP instance, I get the error "Connection failure Failed to authenticate principal=null, securityDomain=jmx-console"
As explained in the JBoss EAP documentation, the jmx-console is secured by default. Follow the instructions listed in the EAP Installation Guide to define a username/password. Then, in the RHQ GUI, go to the Inventory > Connection tab of the JBoss EAP Resource and set the username and password properties to the same values. Also note that when starting a JBoss EAP instance without specifying a configuration parameter (-c), it will be started with the "production" configuration, as described in JBPAPP-198.

Why does my Apache SNMP module fail to start with the error ...?

"Syntax error on line 1376 of /etc/httpd/conf/httpd.conf: Unable to write to SNMPvar directory" (on stderr)
Please ensure the directory specified via the "SNMPVar" directive exists and is writable by the user that owns the Apache process.

"init_master_agent: Invalid local port (Permission denied)" (in the error_log file)
See if your Apache error_log contains a log message similar to "[notice] SELinux policy enabled; httpd running as context user_u:system_r:httpd_t:s0". If so, the SELinux (Security-Enhanced Linux) policy is preventing the httpd process from binding to the SNMP agent port (1610 by default). The easiest solution is to put SELinux in permissive mode by running the command "/usr/bin/setenforce 0" and then restart Apache. You should then see a message similar to "[notice] SELinux policy enabled; httpd running as context user_u:system_r:unconfined_t" in your error_log; note the "unconfined_t" portion, which indicates SELinux is no longer restricting the process.

When monitoring a JBAS instance, why am I not seeing any JVM resources beneath it?
In order for RHQ to discover JVM resources for a JBAS resource, the corresponding JBAS instance needs to be running on Java 5 or later, and it needs to have been started with the jboss.platform.mbeanserver system property set. For example, in UNIX-type environments, you can specify the following in the ${JBOSS_HOME}/bin/run.conf file:
JAVA_OPTS="$JAVA_OPTS -Djboss.platform.mbeanserver"
Note: With RHQ 1.0 and 1.0.1, if the system property com.sun.management.jmxremote is also specified this will prevent the JVM resources being discovered by RHQ. Removing this property will allow those resources to be found. In RHQ 1.0.1 this restriction is lifted and even if the system property com.sun.management.jmxremote is specified JVM resources should still be added to the RHQ inventory.

How can I debug JDBC access and trace SQL?
Use log4jdbc.

How can I stop my agent from thinking the server keeps going up and down when the server has remained running the whole time?
If you see information like this in your agent logs:
INFO (org.rhq.enterprise.agent.AgentAutoDiscoveryListener)- {AgentAutoDiscoveryListener.server-offline}
The Agent has auto-detected the Server going offline [InvokerLocator
[servlet://server:7080/jboss-remoting-servlet-invoker
/ServerInvokerServlet?rhq.communications.connector.rhqtype=server]] -
the agent will stop sending new messages
...
INFO (org.rhq.enterprise.agent.AgentAutoDiscoveryListener)- {AgentAutoDiscoveryListener.server-online}
The Agent has auto-detected the Server coming online [InvokerLocator
[servlet://server:7080/jboss-remoting-servlet-invoker
/ServerInvokerServlet?rhq.communications.connector.rhqtype=server]] -
the agent will be able to start sending messages now
it means the agent has auto-detected its server going down and back up. This auto-detection was done through the multicast detector (it is different from the detection-via-polling, which is the second way the agent attempts to detect the server's status). If you think the agent is erroneously detecting the server going up or down, it is possible your network does not support multicast traffic or the multicast network is acting abnormally. In either case, you should disable the agent multicast detector and just have the agent rely on polling to detect changes in the server status. To turn off the multicast detection, set the following agent preferences to false:
rhq.agent.server-auto-detection
Those are the actual Java Preference names; you may often see these in the user interface as the following:
Auto-Detect RHQ Server?
Since you are disabling multicast detection, make sure you keep the polling detection feature enabled (i.e. rhq.agent.client.server-polling-interval-msecs should be larger than 0, typically 60000); otherwise, the agent will never be able to know when the server goes down. Once you reconfigure the agent, you need to restart it so the communications subsystem can pick up the changes.

My Agent fails to start with "[: 207: ==: unexpected operator".
This is a known bug in RHQ 1.2/Jopr 2.2. There is a syntax error in rhq-agent.sh that causes the script to fail when executed by non-bash shells (e.g. /bin/sh on Solaris, HP-UX, or AIX). To fix the issue, edit rhq-agent.sh and change the "==" on line 207 to "=".

Why are the graphs and charts on the Monitor tab in the GUI not displayed?
If you see errors in the RHQ Server log such as:
java.lang.NoClassDefFoundError: Could not initialize class org.rhq.enterprise.gui.image.chart.ColumnChart
it is probably because you are missing some system fonts needed by Java to generate the text in the graphs/charts. If you are on Linux, make sure you have the urw-fonts package installed.
On Fedora or RHEL, use:
yum install urw-fonts
If you are on another OS, make sure you have all the default fonts installed.

To help debug Out Of Memory conditions, how do I get the agent or server to dump heap when it runs out of memory or on demand?
Pass these JVM arguments to the server or agent, e.g. via RHQ_AGENT_ADDITIONAL_JAVA_OPTS or RHQ_SERVER_ADDITIONAL_JAVA_OPTS:
-XX:+HeapDumpOnOutOfMemoryError -XX:+HeapDumpOnCtrlBreak
If you want the heap dump file to be dropped in a particular location, additionally specify:
-XX:HeapDumpPath=<where you want the hprof file>
See Sun JVM Debugging Options for more info.