Use of the Statistical Package SAS: A Case Study from the Coconut Research Institute of Sri Lanka

The Coconut Research Institute in Sri Lanka (CRI) began using microcomputers for the analysis of their experimental data in 1986. Four case studies of typical sets of field experimental data from the CRI are used to illustrate the potential benefits and possible problems in moving from a manual system of data processing to one which uses a powerful statistical package.


INTRODUCTION
The Coconut Research Institute of Sri Lanka (CRI) acquired microcomputers in 1986 for data processing and analysis. Other agricultural research institutes in Sri Lanka are also currently acquiring microcomputers, and all have a joint licence to use the statistical package Statistical Analysis System (SAS). This paper considers some ways in which computerization may affect the biometrical work at these institutes.
The structure of the CRI, the collection of data from field experiments and their analyses are briefly described. Then four typical sets of experimental data and their analyses are outlined. The emphasis in this paper is on the potential of the computing facilities to enhance the work done by the Biometry Unit rather than on the sets of data themselves. Some of the dangers of uncritical use of a package for the analysis are also considered.
In the final section of the paper we consider briefly other ways in which the computer could help in data processing. Many agricultural research institutes conduct experiments for more than one year. This is, however, taken to extremes with a crop like coconut, where palm yields are taken six times a year for trees which yield for over 60 years. The entry and management of such long-term data is a formidable task which could involve other software in addition to a statistical package.

BACKGROUND
The Coconut Research Institute (CRI) at Lunuwila in the North Western Province of Sri Lanka was established in 1929. It is primarily concerned with research on the agronomic aspects of coconut and on the processing of coconut and its byproducts, and it conducts a large number of field and laboratory experiments. The field experiments are carried out either at the main station (Bandirippuwa Estate), at one of the eight substations or in state-owned estates. Most of the field experiments are essentially long-term and, on average, eight new field experiments are initiated annually.
The CRI has eight research divisions, but most of the field experiments are conducted by the Agronomy, Soils and Plant Nutrition, Genetics and Plant Breeding, and Crop Protection Divisions. The research staff number about 40 and all divisions have officers with postgraduate degrees.
The Biometry Unit at the CRI is the oldest biometry facility in the research institutes in Sri Lanka, and its responsibility is to provide a statistical service to the other research divisions. This involves the design of experiments, analysis of data and the interpretation of results in consultation with the experimenter concerned. In addition, the Biometry Unit conducts research on crop-weather relationships and maintains three agrometeorological stations. The Unit has two biometricians with a number of support staff.
Normally, the Biometry Unit designs experiments in consultation with the experimenter concerned. The most commonly used design is the randomised block design with factorial treatment structure. The number of blocks is usually two or three, with plots often consisting of six to eight coconut palms.
In field experiments, coconuts are harvested once every two months, and at each harvest two bunches are picked. The common variables recorded are the number of nuts and their copra content (kernel dried to 6% moisture) and the number of female flowers. The data are usually recorded for individual palms. After each pick the data are summarised over plots, and at the end of the last pick the data are totalled over the year. The original data, recorded in the field books, are transferred to record books. The yearly summary data are kept in files.
These summary data are then analysed by the staff in the Biometry Unit. The methods of analysis are standard, involving the analysis of variance, covariance and regression analysis. The data are sometimes transformed before analysis. Most analyses, regardless of the size of the data set, are carried out with large electric desk calculators. Complicated analyses such as principal components analysis and multiple regression studies are sent to the computer at the Statistics and Computer Science Department in the University of Colombo.
The summary of the analysis is returned to the experimenter with the data. This summary usually consists of the analysis of variance table together with the means and standard errors etc. The method of the analysis and the interpretation of the results are explained verbally to the experimenter. Occasionally a brief written report is prepared.
In mid-1986, two microcomputers were obtained for the Biometry Unit. Until the SAS package was acquired, the support staff, who were not familiar with computers, were trained in the general use of computers. They also entered and analysed a few sets of data with a simple statistical package provided by the manufacturer.
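The summarisation step described above (palm records totalled over plots after each pick, then over the year) is the kind of routine copying that a computer can take over. The following is a minimal Python sketch of that step, not the procedure used at the CRI. The record layout, the plot labels and all the numbers are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict

# Hypothetical palm-level records: (plot, palm, pick, nuts).
# All values are invented for illustration; they are not CRI data.
records = [
    ("P1", 1, 1, 12), ("P1", 2, 1, 9),
    ("P1", 1, 2, 10), ("P1", 2, 2, 11),
    ("P2", 1, 1, 7), ("P2", 2, 1, 8),
    ("P2", 1, 2, 6), ("P2", 2, 2, 9),
]


def plot_totals_by_pick(rows):
    """Sum nuts over the palms in each plot, separately for each pick."""
    totals = defaultdict(int)
    for plot, _palm, pick, nuts in rows:
        totals[(plot, pick)] += nuts
    return dict(totals)


def yearly_plot_totals(rows):
    """Sum nuts over all palms and all picks in each plot."""
    totals = defaultdict(int)
    for plot, _palm, _pick, nuts in rows:
        totals[plot] += nuts
    return dict(totals)


per_pick = plot_totals_by_pick(records)
per_year = yearly_plot_totals(records)
```

Keeping the palm-level records as the stored form, and deriving the plot and yearly totals from them, means that the summaries can always be regenerated without recopying.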

Data sets used in the Study
Four sets of experimental data have been selected from three research divisions to compare the analysis with and without the use of the computer. They are typical of the experiments conducted by the CRI. A brief summary of the experiments is given in Fig. 1.
The first experiment had been conducted for eight years by the Soils and Plant Nutrition Division in an estate in Chilaw district. It is a fertilizer experiment with two quantitative factors, consisting of all combinations of three levels of nitrogen and potash. The layout is a randomised block design with eight palms in each plot and three blocks. The nut yield and copra content had been collected. The yield in the first year is used for illustration.
The second experiment had been conducted in a shade house by the Agronomy Division to evaluate three varieties of each of three legumes: cowpea, black gram and green gram, each grown under five different shade levels. The layout is a split-plot design in three randomised blocks. The five shade levels are the main plots and the nine combinations of legume varieties are the subplots. A large number of variables were measured. The seed yield is analysed for illustration.
The third experiment had been conducted for nine years by the Soils and Plant Nutrition Division in an estate in Chilaw district and is a comparison of three sources of nitrogen: urea, ammonium chloride and ammonium sulphate, each of which had been applied at two levels. There was also a control treatment with no added nitrogen, replicated three times within each block. The treatment structure is therefore a simple factorial with unequal replication for the control. The analysis described in this example is an analysis of covariance with the yield of the final year as the variable and the yield in the year before the treatments were applied as a covariate. The analysis of covariance is often used in the analysis of coconut data.
These three examples are typical of much of the current work of the Biometry Unit.
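The covariance adjustment mentioned for the third experiment can be illustrated with a short Python sketch. It is deliberately simplified: blocks are ignored (a completely randomised layout is assumed), so it is not the analysis carried out by the Biometry Unit, in which the block effects would also be removed. The function name, the treatment labels and the yields are all hypothetical. The pooled within-treatment slope of final yield on pre-treatment yield is estimated, and each treatment mean is then adjusted to the overall mean of the covariate.

```python
def ancova_adjusted_means(groups):
    """Adjust treatment means for a single covariate.

    groups maps a treatment label to a list of (x, y) pairs, where x is the
    pre-treatment yield (covariate) and y is the post-treatment yield.
    Assumes a completely randomised layout (blocks ignored), which is a
    simplification. Unequal replication is allowed.
    """
    all_x = [x for pairs in groups.values() for x, _ in pairs]
    grand_x = sum(all_x) / len(all_x)

    exx = 0.0  # within-treatment sum of squares of x
    exy = 0.0  # within-treatment sum of products of x and y
    means = {}
    for name, pairs in groups.items():
        xbar = sum(x for x, _ in pairs) / len(pairs)
        ybar = sum(y for _, y in pairs) / len(pairs)
        means[name] = (xbar, ybar)
        exx += sum((x - xbar) ** 2 for x, _ in pairs)
        exy += sum((x - xbar) * (y - ybar) for x, y in pairs)

    slope = exy / exx  # pooled within-treatment regression coefficient
    adjusted = {name: ybar - slope * (xbar - grand_x)
                for name, (xbar, ybar) in means.items()}
    return slope, adjusted


# Hypothetical yields (nuts per plot); the control has extra replication.
demo = {
    "control": [(410, 400), (450, 455), (430, 420)],
    "urea_low": [(420, 470), (460, 500)],
    "urea_high": [(400, 480), (440, 530)],
}
slope, adjusted = ancova_adjusted_means(demo)
```

The adjusted means estimate what each treatment would have yielded had all plots started from the same pre-treatment yield, which is why the method suits long-term experiments on palms that differ before treatment begins.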
The final example involves data for individual palms. The volume of data is large, so the problems of manual analysis are correspondingly greater and there is more potential for improvement. It therefore typifies the kind of data where the computer should have a greater impact. This experiment had been carried out by the Crop Protection Division at Bandirippuwa Estate to evaluate insecticides for the control of the rhinoceros beetle, which attacks the fronds of coconut palms. Seven insecticides and a control were used in a randomised block design with blocks of eight palms. The insecticides were applied once. The total number of coconut fronds and the number attacked were counted at two-month intervals. The first measurement had been taken before the insecticides were applied.

Analysis of the four data sets
In this section the manual analysis is compared with the results from the use of SAS. Two SAS procedures are used. PROC ANOVA handles balanced designs, while PROC GLM (General Linear Models) can be used whether the design is balanced or not. When it is appropriate, PROC ANOVA is faster than PROC GLM and presents the results in a way that can be easier to interpret. However, even for balanced designs, some options (such as the division of treatment effects into individual contrasts, and covariance analysis) are not available in PROC ANOVA. In such situations PROC GLM has to be used.

Analysis 1. (RBD with factorial structure)
The experimenter was interested in the overall analysis of variance and also in the linear and quadratic components of the main effects, plus possible interactions between the two factors. The results as prepared manually by the Biometry Unit are given in Fig. 2. This was a standard routine for the staff and involved no difficulty.
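The arithmetic behind such an analysis can be sketched in Python as follows. This is an illustration of the standard calculations for a balanced two-factor factorial in randomised blocks, not a reproduction of Fig. 2. The function names and the demonstration yields are hypothetical, and the contrast coefficients (-1, 0, 1) and (1, -2, 1) assume three equally spaced levels, which the paper does not state.

```python
from itertools import product
from math import sqrt


def rbd_factorial_anova(y, n_levels, k_levels, blocks):
    """Sums of squares for a balanced N x K factorial in randomised blocks.

    y maps (block, n_level, k_level) to a plot yield, one plot per
    block and treatment combination.
    """
    a, b, r = len(n_levels), len(k_levels), len(blocks)
    cells = list(product(blocks, n_levels, k_levels))
    cf = sum(y[c] for c in cells) ** 2 / (a * b * r)  # correction factor

    def totals(keys, position):
        # One total per key, over the plots whose index at `position` matches.
        return [sum(y[c] for c in cells if c[position] == key) for key in keys]

    block_tot = totals(blocks, 0)
    n_tot = totals(n_levels, 1)
    k_tot = totals(k_levels, 2)
    treat_tot = [sum(y[(j, i, k)] for j in blocks)
                 for i, k in product(n_levels, k_levels)]

    ss_total = sum(y[c] ** 2 for c in cells) - cf
    ss_block = sum(t ** 2 for t in block_tot) / (a * b) - cf
    ss_n = sum(t ** 2 for t in n_tot) / (b * r) - cf
    ss_k = sum(t ** 2 for t in k_tot) / (a * r) - cf
    ss_treat = sum(t ** 2 for t in treat_tot) / r - cf
    ss_error = ss_total - ss_block - ss_treat
    df_error = (r - 1) * (a * b - 1)
    return {
        "ss": {"blocks": ss_block, "N": ss_n, "K": ss_k,
               "NxK": ss_treat - ss_n - ss_k,
               "error": ss_error, "total": ss_total},
        "df_error": df_error,
        "ms_error": ss_error / df_error,
        "n_totals": n_tot,
        "k_totals": k_tot,
    }


def contrast_ss(level_totals, coeffs, obs_per_total):
    """Single-degree-of-freedom sum of squares for a contrast of totals."""
    value = sum(c * t for c, t in zip(coeffs, level_totals))
    return value ** 2 / (obs_per_total * sum(c * c for c in coeffs))


def se_of_mean(ms_error, n_obs):
    """Standard error of a mean of n_obs plots, from the pooled error."""
    return sqrt(ms_error / n_obs)


# Hypothetical yields for 3 blocks x 3 N levels x 3 K levels.
levels, blocks = [0, 1, 2], [1, 2, 3]
demo = {(j, n, k): 40 + 5 * n + 3 * k + j + (j + 2 * n + 3 * k) % 4
        for j, n, k in product(blocks, levels, levels)}
result = rbd_factorial_anova(demo, levels, levels, blocks)
lin_n = contrast_ss(result["n_totals"], (-1, 0, 1), obs_per_total=9)
quad_n = contrast_ss(result["n_totals"], (1, -2, 1), obs_per_total=9)
se_cell = se_of_mean(result["ms_error"], n_obs=3)
```

With three levels, the linear and quadratic sums of squares add up to the main-effect sum of squares. The standard error of a treatment-combination mean is based on the pooled error mean square rather than on the three observations in that cell.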
The input of the data and the factor levels to SAS was straightforward and is shown, together with the results from PROC ANOVA, in Fig. 3. Minor disappointments in the presentation of the results are the excessive number of figures given for each of the terms and the fact that the two-way table of means is not presented as such. A more disturbing omission is the lack of standard errors for the treatment means. The closest option (given in Fig. 3) is to ask for least significant differences (LSD). However, this gives the LSD for the main effects only, which is easy to misinterpret if there is a substantial interaction. In addition to the LSD there are 17 different multiple comparison options. Such tests could easily mislead the experimenter, and they still give no measure of precision for the interaction means. The standard deviations given for the interaction means in Fig. 3 should not be used to interpret the results. They are based only on the three observations from each treatment combination, while the overall analysis assumes the same variance for all treatments.
Since PROC ANOVA does not present the subdivision of the treatment sums of squares, PROC GLM was used for this. The additional output is shown in Fig. 4. The output in these two figures is now quite extensive compared with the information given from the manual analysis in Fig. 2. It is acceptable as a basis but requires cautionary notes and editing before it could be returned to the experimenter. A further point is that the default option for the GLM procedure produces two sets of sums of squares (Type I and III) even for a simple balanced design, where they are identical. To prevent this, the user must know something of the theory of General Linear Models.

Analysis 2. (Split-plot design)
The results from PROC ANOVA in SAS, together with the SAS program, are given in Fig. 6. The MEANS statement was omitted from the program because its output was illustrated in the first example.
Subdivision of the variety term is not possible in the ANOVA procedure. A more serious problem, however, is that the presentation of the results for a split-plot experiment could be confusing. It is not satisfactory to return results to the experimenter in which the main plot term appears twice with identical mean squares but different F values. This is a problem for all analyses that involve different error levels.

Analysis 3. (Analysis of covariance with unequal replication)
In this experiment the control treatment was replicated three times within each block, while the six treatments (3 sources x 2 levels) were replicated once within a block. This was a long-term experiment, and in such cases the post-treatment yield is often analysed with a pre-treatment yield as a covariate. The adjusted ANOVA table prepared manually is shown in Fig. 7. The analysis of covariance cannot be carried out with PROC ANOVA, so the GLM procedure was used. The SAS program to produce results similar to those in Fig. 7 is shown in Fig. 8. The SAS program is relatively straightforward but not as simple as in the first two examples. The results from PROC GLM are given in Fig. 9.

Analysis 4. (Repeated measurements experiment)
In this experiment the total number of coconut fronds and the number attacked by the rhinoceros beetle were counted on 64 palms (8 palms with each of 8 treatments) on six occasions at equal intervals, including the pre-treatment stage. Treatments were applied once. This is a typical set of repeated measurements data, which can be analysed in a variety of ways including split-plot analysis, multivariate analysis and ante-dependence covariance analysis (Kenward, 1987). The manual analysis for the experimenter consisted simply of five separate two-way ANOVA tables. The variable considered was the difference between the percentage of fronds attacked at consecutive periods. This can easily be obtained from SAS. In addition, other methods of analysis can easily be conducted. Thus, this set of data is typical of many that are at present only partially analysed manually and that require a comprehensive statistical package if there is to be any chance of realising the full potential of the data.
In addition to using a computer package for analyses that are impracticable by hand, it is also valuable to use the computer to present simple graphical displays. This is often of much benefit at the start of an analysis and is very tedious to do manually. Plots of two of the treatments are given in Fig. 10. We feel such plots can be of considerable benefit to the experimenter.
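The derived variable used in the manual analysis (the change in the percentage of fronds attacked between consecutive counts) is simple to compute once the palm records are stored. The following Python sketch shows the calculation for one palm. The function names and the counts are hypothetical. Six counts give five differences, one for each of the five separate analyses described above.

```python
def attack_percentages(total_fronds, attacked_fronds):
    """Percentage of fronds attacked at each count for one palm."""
    return [100.0 * attacked / total
            for total, attacked in zip(total_fronds, attacked_fronds)]


def consecutive_differences(values):
    """Change from each count to the next (later minus earlier)."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


# Hypothetical counts for one palm at six two-monthly counts,
# the first taken before the insecticide was applied.
total_fronds = [20, 20, 25, 25, 20, 20]
attacked_fronds = [5, 4, 5, 10, 2, 1]

percentages = attack_percentages(total_fronds, attacked_fronds)
changes = consecutive_differences(percentages)
```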

DISCUSSION
The statistical package SAS is an extremely powerful tool for the analysis of the experimental data collected by the Coconut Research Institute. The examples in the last section show that the presentation of results for those analyses that were done efficiently prior to computerization is not always ideal. Some notes will have to be supplied with the results, and the output could be edited to bring it closer to the form that is needed by the experimenters. An alternative would be to use a different statistical package for some of the standard analyses instead. The use of GENSTAT Version 5 will be investigated in a future study, though there are facilities in SAS, such as high resolution plotting, that are not available in GENSTAT.
When staff become experienced in using a statistical package, the production of the standard analyses should be far quicker than previously. It will be important to utilise the time saved constructively to do more analyses than were possible manually. These should include more work on the simple analysis and presentation of data. Some experiments may require a range of more complex analyses.
Perhaps the most important aspect will be to encourage experimenters to bring the raw data (rather than the semi-processed data) to the Biometry Unit. The repeated copying and summarising of the observations can lead to many errors as well as loss of information. The computers could be used for the storage of both the raw data and information about the experiments, and not just as a large statistical calculator. This requires a well-defined system of data entry and management. Although SAS could be used for this purpose, a standard database package such as DATAEASE may be preferable.
The observations reported here indicate that the computers and SAS provide tremendous potential for the management and analysis of data at the CRI. Similar advantages are likely at other agricultural research institutes. It will, however, require careful planning, as well as hard work, to ensure that this potential is realised.