1.5 The Use of Data in the PSA

Indicators and statistics are an integral part of the process of undertaking the PSA. They play a prominent role in achieving a quality Population Situation Analysis and in monitoring progress towards the goals that are set. The PSA is a data-intensive process, and attention should be paid to the capacity of the national statistical system to deliver the appropriate data. It requires a comprehensive approach, with data and information produced and analyzed at the macro level, at the level of key individual sectors (both productive and social) and at the household or individual level.

The availability of data for analysis is one of the aspects where the practical feasibility of executing a complete PSA may vary greatly from one country to the next. This is due to two distinct reasons. On the one hand, the level of development of the statistical system differs between countries. Some developing countries have reliable civil registration systems; others have long series of Demographic and Health Surveys (DHS), going as far back as the 1980s; still others have neither the former nor the latter. Obviously, this may impose major limitations on the types of analyses that can be performed. On the other hand, countries differ in the degree to which national statistical authorities provide public access to the data they collect. Some countries, such as Brazil, nowadays have very liberal data policies that make it possible for any legitimate user to carry out their own analysis on micro-data. In other parts of the world, such as some countries in Central and Eastern Asia, this access can be much more problematic, to the point where in practice the national statistical office (NSO) is the only entity that can generate analyses based on national data. Even though the situation is improving, census data in many parts of the world are still treated as a national security issue. Under such circumstances, the role of UNFPA or even the UN system as a whole in the process of applying the PSA may be limited to compiling existing data and research and using the process of the PSA to advocate for further data analysis on the part of the government and greater public access to data.

In selecting indicators, one should consider the two main categories of indicators: intermediate and final. Final (outcome or impact) indicators measure the outcome or impact of interventions on individuals’ well-being, e.g. freedom from hunger, literacy, good health, security, etc. They capture behaviour change, the use of services such as health clinics, and satisfaction with public services. Intermediate or process indicators measure the factors that determine an outcome or contribute to the process of achieving it. They are also called “input” or “output” indicators, depending on the stage of the process. For example, many inputs may be needed to raise literacy levels of the population: more schools and teachers, better textbooks, etc. While measures of public expenditure on classrooms and teachers would be input indicators, measures of classrooms built in compliance with the rules and of well-trained teachers at work would be output indicators. Outputs are the final direct deliverables of a project or specific intervention. They differ from outcomes, which depend on contributions beyond the exclusive control of the given intervention. While the number of schools built in compliance with the rules is an output, the number of children who attend those schools is an outcome, because it depends on the behaviour of children and their families.

Although in practice it may not always be possible to consider both indicator categories, the PSA should attempt to consider at least those indicators that are likely to be used in the next Common Country Assessment (CCA). This will guarantee consistency with the current Policies and Procedures Manual (PPM) guidelines. In situations where the PSA is carried out well before the CCA, there may be some uncertainty about precisely what these indicators will be. Previous CCAs and recent CCAs from neighbouring countries may provide some guidance on what to expect and, of course, UNFPA is always free to promote the use of new indicators within the CCA process.

Although indicators are important, especially for programmatic purposes, care should be taken not to reduce the analysis of social issues to the mere construction of indicators. Indicators can be misleading if they are applied outside the context for which they were constructed. For example, the female labour force participation rate is the percentage of women who declare having some kind of economic activity. It cannot be used to measure the number of hours spent on economic activities by women in comparison to men, the percentage of GDP produced by women or the percentage of household income generated by women. Nor is it true that equal participation rates between men and women imply that inequalities in the labour market have been eradicated. In addition, the value of an indicator may change for different reasons, which point to different policy implications, and the indicator itself may not offer any clues as to how the change should be interpreted. For example, female labour force participation may decline as a consequence of increased discrimination against women in the labour market, but, depending on the circumstances, it may also decline because of legislation to raise the wages of domestic servants, thereby making it less attractive for middle class women to work outside the home. It may even decline because of a general increase in wages, which makes it less necessary for women in menial occupations to continue complementing the family income. Which of these possible causes is at work is something that the indicator by itself does not reveal and that requires more detailed research.
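The point about participation rates can be illustrated with a small worked example. The sketch below uses invented survey records (the values, field layout and function names are all hypothetical) and follows the definition given above, i.e. the percentage of persons who declare some economic activity. It shows that men and women can have identical participation rates while women account for a much smaller share of the hours worked.

```python
# Hypothetical survey records: (sex, economically_active, weekly_hours_worked).
# All values are invented for illustration only.
respondents = [
    ("F", True, 20), ("F", True, 15), ("F", False, 0), ("F", True, 25),
    ("M", True, 45), ("M", True, 40), ("M", False, 0), ("M", True, 50),
]


def participation_rate(records, sex):
    """Percentage of persons of the given sex who report any economic activity."""
    group = [r for r in records if r[0] == sex]
    active = [r for r in group if r[1]]
    return 100.0 * len(active) / len(group)


def share_of_hours(records, sex):
    """Percentage of all hours of economic activity worked by the given sex."""
    total_hours = sum(r[2] for r in records)
    return 100.0 * sum(r[2] for r in records if r[0] == sex) / total_hours


female_rate = participation_rate(respondents, "F")
male_rate = participation_rate(respondents, "M")
female_hours_share = share_of_hours(respondents, "F")

print(f"Female participation rate: {female_rate:.1f}%")
print(f"Male participation rate:   {male_rate:.1f}%")
print(f"Female share of hours:     {female_hours_share:.1f}%")
```

In this invented data set both rates are 75 per cent, yet women work under one third of the total hours, which is why the participation rate alone says little about inequality in the labour market.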

To the extent feasible, the PSA needs to strike a balance between quantitative and qualitative data. Qualitative data collection methods include:

  • Beneficiary assessments: Participant observation and more systematic data collection methods like structured interviews over a limited time span;
  • Ethnographic investigations: Anthropological research techniques, especially direct observation, to analyze the influence of ethnicity, gender and village stratification on the household and group well-being and behaviour;
  • Longitudinal village studies: Wide variety of methods ranging from direct observation and recording (tabulation) and periodic semi-structured interviews with key informants (e.g. health center staff) and the village population, to survey interviews in several different observation periods;
  • Participatory assessments: Ranking, mapping, diagramming and scoring methods are prominent, together with open interviews and participant observation, usually over a relatively short time span. These methods build on local populations describing and analyzing their own reality surrounding poverty and well-being.

Qualitative methods provide information that can be analyzed on both ordinal and nominal scales. Examples include focus group discussions, in-depth interviews, and client exit satisfaction interviews using open-ended questions. These are useful for seeking opinions. However, the results are generally not representative and therefore do not allow generalizations, and they are susceptible to biases introduced by interviewers, observers and informants. Although these kinds of data are rarely considered to be part of a formal statistical system, the information they provide is of the utmost importance for the development of a comprehensive PSA.

Any quantitative data and indicators that are presented need to be accompanied by meta-data that underpin the interpretation of the levels and trends implied by the quantitative data. This becomes especially important in the face of apparent inconsistencies in the values of indicators between different data sources. In such situations, qualitative information may help in understanding the nature of the discrepancies between indicators and, in some cases, in identifying which estimates are the more likely. When facing such inconsistencies, consideration must also be given to alternative indicators that are known to be highly correlated with the one for which inconsistencies are observed. The estimation of maternal mortality may serve as an appropriate example in this regard. For many countries there will be at least three different estimates of the Maternal Mortality Ratio (MMR): one or more derived from national surveys, censuses or vital registration; another from the modeled estimates prepared every five years by WHO / UNICEF / UNFPA / World Bank; and yet another from a comprehensive study by Hogan et al. published in The Lancet in April 2010. Each of these estimates is based on a different methodology, and they are likely to indicate different levels and different trends. The PSA needs to report each of these estimates and discuss them in the light of what country information was considered in the external estimates and of actual developments in the maternity/obstetric care delivery system in the country. In doing so, it may have to rely on qualitative information (has there been notably improved training of health personnel, improvements in physical infrastructure, etc.?) as well as quantitative data (levels and trends in deliveries attended by skilled personnel, numbers of basic and comprehensive obstetric care facilities, etc.). Thus, the PSA could arrive at an educated guess as to the real situation with regard to inconsistent indicators.
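As a minimal sketch of how such estimates might be laid side by side, the example below computes an MMR (maternal deaths per 100,000 live births) from raw counts and compares it with two externally supplied values. All figures and source labels are invented, and the 20 per cent threshold used to flag a discrepancy is an arbitrary assumption for illustration, not a recommended standard.

```python
def maternal_mortality_ratio(maternal_deaths, live_births):
    """Maternal deaths per 100,000 live births."""
    return 100_000 * maternal_deaths / live_births


# Hypothetical estimates for one country and one reference year.
estimates = {
    "national survey": maternal_mortality_ratio(420, 150_000),
    "interagency model": 350.0,
    "independent study": 310.0,
}

lowest = min(estimates.values())
highest = max(estimates.values())
relative_spread = (highest - lowest) / lowest

# List the estimates from lowest to highest so the range is easy to read.
for source, value in sorted(estimates.items(), key=lambda item: item[1]):
    print(f"{source:20s} {value:6.0f}")

# Assumed threshold: flag the indicator when sources differ by more than 20%.
if relative_spread > 0.20:
    print("Estimates diverge: report all of them and discuss the methods used.")
```

A table of this kind does not settle which estimate is right; it only makes the size of the disagreement explicit so that it can be discussed alongside the qualitative and service-delivery evidence.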

The substantive chapters in the second part of this manual contain more systematic references to data sources on particular subjects. These are broken down by primary sources and secondary sources. Although this is not always the case, the first tend to be national data, whereas the second tend to be data that have already undergone some processing and that are being used for inter-country comparisons by international agencies. Most international organizations, such as the United Nations departments and the specialized UN agencies, generate global and country statistics that could be used when no reliable national indicators or information are available. Sometimes, the use of internationally produced indicators may be contested by partner governments. As in the case of maternal mortality referred to in the previous paragraph, the secondary data frequently do not agree with the primary sources. The main potential reasons for such inconsistencies are:

a) Countries are using more recent data that have not yet found their way to the international agencies;
b) Even though the international agencies have the most recent data, they prefer not to use them before their quality has been assessed;
c) Rather than using the most recent data, the international agencies prefer to use a trend line, based on several recent data points;
d) Data sources between countries are not comparable and the international agencies are applying adjustments to improve comparability;
e) Due to the poor quality of national data, the international agencies are ignoring any national data that may exist and inferring values based on some sort of model;
f) National data are based on incomplete geographical coverage, so that in international data compilations they have either been ignored or adjusted to the national level.
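Reason (c) can be illustrated with a short sketch. The methods actually used by the international agencies are not described here, so the example simply assumes an ordinary least-squares line fitted through a few data points; the observations are invented. It shows how a trend-based value for the latest year can differ from the figure reported by the latest national survey.

```python
def linear_trend(points):
    """Ordinary least-squares line through (year, value) points.

    Returns (slope, intercept).
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical indicator values from four successive surveys.
observations = [(1995, 60.0), (2000, 52.0), (2005, 47.0), (2010, 33.0)]

slope, intercept = linear_trend(observations)
latest_year, latest_value = observations[-1]
trend_value = slope * latest_year + intercept

print(f"Latest survey value ({latest_year}): {latest_value:.1f}")
print(f"Trend-line value    ({latest_year}): {trend_value:.1f}")
```

With these invented points the latest survey reports 33.0 while the fitted line gives approximately 35.1 for the same year, so a national report quoting the survey and an international compilation quoting the trend would disagree without either being in error.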

While specific international data sources will be referenced in the substantive chapters, it may be appropriate to refer here to a particularly comprehensive collection of UN data that was recently created by the UN Statistics Division, namely the UNdata site (http://data.un.org), which brings together a wide variety of economic, social, health, and demographic statistics. For more comprehensive demographic data, the Demographic Yearbook, also published by the UN Statistics Division, continues to be an important data source. Another important data source is the Integrated Public Use Microdata Series (IPUMS) at the University of Minnesota, which maintains the original micro-data from a large number of censuses from around the world, so that these will be available for secondary data analysis.

The objective of national ownership might be understood to imply that priority needs to be given to national data over data that are internationally compiled. This is not always the case. It must be considered, however, that only a fraction of all national data makes its way into international databases and that a thorough (re)analysis of available national data sources is often likely to yield richer information as to the population situation and its differentials, trends, and correlates. Yet caution must be exercised with regard to the comparability of data: national data may suffer to a greater or lesser extent from incompatibilities due to different methodologies and definitions. The quality of data may vary by source and over time. No simple solution exists for resolving or circumventing such data problems. The PSA needs to document such issues, could raise them in further dialogues with data producers, and could incorporate proposals for strengthening national statistical systems in the ensuing policy recommendations. When using international data, it is important that any deviations from national values, due to different definitions or adjustments, are appropriately footnoted.

Similar considerations apply to situations where data gaps are encountered. The PSA would note such gaps and, in the appropriate section of the PSA, formulate recommendations on ways to overcome these. In the meantime, the PSA would investigate any available information that pertains to the subject matter on which data are missing, in order to arrive at a reasonable assessment of the situation that the missing data would have measured. Sometimes the lack of exact quantitative indicators can be remedied by providing less precise qualitative characterizations of the situation, which are more likely to be correct, e.g. “high but declining” or “unlikely to be less than 100”.

In most countries, the national statistical institute is responsible for the large-scale and regular data collection processes. These will include population and housing censuses (PHC), censuses of agriculture and businesses, sample surveys, especially household-based enumerations, and other kinds of data collection, such as price collections. However, even in fairly centralized systems, different central government ministries and departments will also collect data. In some cases these agencies may carry out specialized data collections such as a school census or a survey of small businesses. A substantial amount of information may also be collected on a routine basis in the course of regular administrative processes. For example, where people using a public service are required to make some payment, such as when applying for a driving license, some information is collected on individuals that could be processed to produce statistics.

For the purposes of the PSA, some important management information systems will include: i) school records, which will provide information on the education system, including indicators on issues such as enrolment, academic outcomes and progress through the educational system; ii) population registers, providing information on births and deaths, as well as registrations of foreigners and notifications of change of residence; iii) health records, providing information on access to and use of health facilities, morbidity and mortality data for important diseases, the use of preventative health services and important outcomes such as the nutritional status of children; iv) social security records, providing information on changes in employment and the labour market. However, information derived from the records of service delivery systems such as clinics or schools will only cover the specific people and households that make use of the given services.

Although the substantive sections will make more specific suggestions about the use of data, the following overview, which was adapted from the Resource Guide for Youth and Poverty Reduction (UNFPA, 2011), provides a general mapping of the most relevant sources of information.

The Demographic and Health Surveys (DHS) – and similar surveys such as the reproductive health surveys fielded by the Centers for Disease Control (CDC) and the Pan Arab Project for Family Health (PAPFAM) – continue to be one of the main sources of data on SRH. Their main limitation is that they do not provide a lot of contextual socio-economic information, although this situation has been remedied to some extent by the construction of the wealth quintiles, which provide a reasonable proxy for more specific poverty indicators in many cases. These quintiles are constructed on the basis of up to 30 household attributes, including type of flooring and/or roof, source of water, availability of electricity, ownership of various consumer durables, etc. An interesting point made by the Resource Guide for Youth and Poverty Reduction (UNFPA, 2011) is that the DHS data on literacy are to be preferred over census data because they actually call for the respondent to read a simple sentence, based on their everyday life.
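The general idea behind wealth quintiles can be sketched as follows. This is a deliberately simplified illustration, not the DHS procedure: the published DHS wealth index derives statistical weights for the household attributes (using principal components analysis), whereas the sketch below assumes equal weights and simply counts the attributes each household has. The household identifiers and attribute names are hypothetical.

```python
# Hypothetical household records: the listed attributes are those present.
households = {
    "hh01": ["electricity", "piped_water", "finished_floor", "radio", "bicycle"],
    "hh02": [],
    "hh03": ["radio"],
    "hh04": ["electricity", "radio"],
    "hh05": ["electricity", "piped_water", "radio"],
    "hh06": ["electricity", "piped_water", "finished_floor", "radio"],
    "hh07": ["bicycle"],
    "hh08": ["radio", "bicycle"],
    "hh09": ["electricity", "finished_floor", "radio"],
    "hh10": [],
}


def asset_score(attributes):
    """Equal-weight score (an assumption): number of distinct attributes present."""
    return len(set(attributes))


def assign_quintiles(scores):
    """Map each household id to a quintile from 1 (poorest) to 5 (richest).

    Ties in the score are broken by household id so the result is reproducible.
    """
    ranked = sorted(scores, key=lambda hh: (scores[hh], hh))
    n = len(ranked)
    return {hh: (rank * 5) // n + 1 for rank, hh in enumerate(ranked)}


scores = {hh: asset_score(attrs) for hh, attrs in households.items()}
quintiles = assign_quintiles(scores)

for hh in sorted(quintiles, key=lambda h: (quintiles[h], h)):
    print(hh, "score", scores[hh], "quintile", quintiles[hh])
```

Once each household carries a quintile label, any SRH indicator from the same survey can be tabulated by quintile, which is how the index serves as a proxy for more specific poverty measures.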

For the analysis and dissemination of various demographic and socioeconomic indicators, two software instruments may be useful: REDATAM and DevInfo.

REDATAM (REtrieval of DATa for small Areas by Microcomputer) is a software package developed in 1985 by the Population Division of the UN Economic Commission for Latin America and the Caribbean (ECLAC) for data processing, dissemination and analysis. It allows the analysis of micro-data, particularly from censuses, in order to construct new indicators. It allows users to get the most out of population information either in a standalone version (CD) or through direct on-line processing via the web. Since the late 1980s REDATAM has been used extensively for processing census micro-data, including requests for results from whole censuses, taking advantage of its friendly interface, high data compression, data processing speed, data confidentiality, data disaggregation into geographical subareas, and the capability to read only the data for the selected geographical areas, the results of which can be presented in tables, graphs and maps.

The REDATAM family provides three options for disseminating census data: 1) the R+Process module of the REDATAM+SP software, the data dictionary and the census database, together with all relevant census documentation, can be placed on a compact disk (CD) to allow users full data access, at a given level of protection, through the REDATAM software and its programming language; 2) the R+xPlan module of the REDATAM+SP software can be used to create applications (customized database interfaces plus pre-defined indicators) that can be placed on a CD to provide a simple way for end-users to obtain pre-defined indicators, with some user specification, for any geographical area from the census and other data, without knowing how to use the REDATAM programming language; 3) the R+WebServer can provide end-users with direct on-line data processing via the Internet or an intranet.

One of the main advantages of REDATAM in this regard is that all three ways of accessing the data provide data security and access restrictions, using encrypted data compression, passwords and the deletion of sensitive variables. The micro-data are organized in a way that makes it impossible for users to access data on individual persons or households. This protects the confidentiality of the census data, concern for which is one of the main obstacles to the distribution of census micro-data to the public. With REDATAM, there is really no justification for National Statistical Offices not to disseminate their census micro-data.

The desired level of access can always be set in a given application by each NSO or database owner. An additional advantage of the R+xPlan and the R+WebServer options is that they allow the design of tailored applications for specific users and can more easily be written in the local language (as has been done by the NSO of Mongolia), since they have a much smaller number of screens than the entire REDATAM+SP software itself.

REDATAM facilitates the analysis of census and other data, particularly because of its user-friendly character and the high speed of data processing. The mapping facilities of the software are frequently used to highlight the spatial distribution of poverty-related indicators, access to facilities such as sanitation, or characteristics of special groups (disabled, elderly, indigenous groups, migrants, etc.). Finally, the REDATAM family also includes stand-alone applications for the indirect estimation of infant and child mortality and fertility (see note 5).

DevInfo is a database system used to monitor human development. It has been endorsed by the UN Development Group (UNDG) to support countries in monitoring progress toward the MDGs. DevInfo is compliant with international statistical standards to support open access and widespread data exchange. It operates as a tool for organizing, storing and presenting data in a uniform way to facilitate data sharing at the country level, as well as with UN agencies and development partners. It generates tables, graphs and maps for inclusion in reports, presentations and advocacy materials. Data can be analyzed at different geographical levels, from the country level down to the community district level. The software supports both standard indicators (e.g. 48 MDG indicators) and user-defined indicators, but it does not provide a framework for the creation of these indicators. Database administrators can add their own national datasets, and regional and local indicators (see note 6).

DevInfo is being used to monitor comprehensive plans for sustainable development, including poverty reduction strategies, health and nutrition plans, environmental plans and education plans. UN Country Teams use DevInfo to support the CCA process. The system is also used to set up and monitor key indicators of the UN Development Assistance Framework (UNDAF). Specific applications have been developed for the tracking of census data (CensusInfo) and gender data (GenderInfo). Within UNFPA, the system has been customized to monitor key performance indicators of MDG 5b (MDG5b+ Info). UNFPA has been collaborating with UNICEF and DHS to ensure the availability of information for monitoring MDG 5b on universal access to reproductive health and other population and development related indicators, and has developed the respective indicator framework. MDG5b+ Info contains data on sexual and reproductive health indicators and other MDG indicators at the global, national and sub-national levels, where available. See www.devinfo.info/mdg5b for UNFPA’s online database.

The Training Manual on integration of population issues in African Development Bank programmes and projects, developed by the African Development Bank and UNFPA, includes one module on Population data (including gender statistics) in a multi-sectoral database for planning, monitoring and evaluation. The module teaches the user how to explain the need for good data for population and development plans and projects, how to make the best use of such data, and how and where to look for the required statistics in various contexts, especially with respect to monitoring and evaluation.


5 See http://www.eclac.cl/celade/ingles/redatam/ for more information on REDATAM and its accessory application ZonPlan.
6 For more detailed information, see http://www.devinfo.org.