Many ecological data are compositional, and various quantitative techniques have been used to analyze them, some of which are methodologically inappropriate for such data. The aim of this contribution is to apply the compositional data approach to forestry data and demonstrate the strengths of this method for percentage or relative data with infrequent zero values. Basal areas of three dominant tree species (

Many ecological data are compositional (composed of only relative information) or may be regarded as compositional. The forest species composition is expressed as the number of trees, the basal area, or the biomass of each species in a given area. These data may be viewed as absolute values when studying biomass increment, tree density, or diameter structure (

Different quantitative techniques have been used to analyze patterns in community species composition and dynamics (

However, in some cases zeros are very rare or do not occur at all, for example when investigating the occurrence of a small number of dominant tree species, plant functional groups, or diameter classes. In these cases, it is desirable to use all of the information available in the data,

Compositional data contain only relative information and present certain peculiarities that preclude traditional statistical methods. The mathematical foundations of compositional data analysis and terminology are explained in

The purpose of this contribution is to introduce the compositional data approach to forest science and demonstrate its applicability in forest dynamics research. The forest species composition of uneven-aged forests in the Snežnik area in southwest Slovenia was analyzed using different methods within the compositional data framework. A brief theory of compositional data analysis is first presented, followed by an example of its application to 3-component tree species forest compositions. The graphical representations of the compositional data and descriptive statistics are presented and interpreted. Statistical analyses were applied in a compositional context, including MANOVA and compositional linear modeling.

Based on the nature of compositional data, which seeks to describe the parts of some whole and provide only relative information between components, three principles can be derived (

The results of any statistical analysis must be the same if all components of the composition are scaled by a positive number (scale invariance). This relationship is ensured by closing the data to 1 or 100% prior to any analysis. Changing the units of the components does not change the results of the compositional data analysis.

The results of any statistical analysis must be the same if previously unmeasured components are added to the composition (subcompositional coherence); the statistical result for some components must not depend on the other reported components.

Any statistical result for compositions must not depend on the order of the reported components (permutation invariance).
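The first two principles can be illustrated with a minimal sketch. The basal areas below are hypothetical values chosen for illustration, and `closure` is an assumed helper name, not a function from the original analysis.

```python
def closure(parts, total=1.0):
    """Rescale positive parts so that they sum to `total`."""
    s = sum(parts)
    return [total * p / s for p in parts]

# Hypothetical basal areas (m^2 per ha): silver fir, Norway spruce, European beech.
basal_area = [24.0, 6.0, 10.0]

# Scale invariance: a change of units (here a factor of 100)
# gives the same closed composition.
comp = closure(basal_area)
comp_scaled = closure([100.0 * b for b in basal_area])

# Subcompositional coherence: the fir-to-spruce ratio is identical in the
# full composition and in the two-part fir-spruce subcomposition.
sub = closure(basal_area[:2])
ratio_full = comp[0] / comp[1]
ratio_sub = sub[0] / sub[1]

print(comp, ratio_full, ratio_sub)
```

Ratios between parts, rather than the closed proportions themselves, carry the information that survives rescaling and the removal of parts.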

A composition is represented as a real vector x = [x_{1}, x_{2}, …, x_{D}] of positive components, in which the sum of positive constant

where

The sample space of compositions is the simplex (S^{D}), which is a restricted space of a real vector space (

where

Simplicial geometry does not follow ordinary Euclidean rules observed in real vector space. Nevertheless, concepts such as distance between points, straight lines, orthogonality, norm, and inner vector product exist for a simplex vector space, although these concepts are highly non-intuitive (see

The compositional operations perturbation and powering satisfy the properties required to give the simplex S^{D} a vector space structure (

Perturbation plays the same role in S^{D} as the sum of vectors in real space. Given two compositions x = [x_{1}, x_{2}, …, x_{D}] and y = [y_{1}, y_{2}, …, y_{D}], with x, y ∈ S^{D}, their perturbation x⊕y is defined as follows (

The neutral element of the simplicial vector space is the identity composition I = C[1, …, 1], where C denotes closure to the constant sum. The inverse x^{-1} of a composition x = [x_{1}, …, x_{D}] is given by the composition x^{-1} = C[1/x_{1}, …, 1/x_{D}].

The perturbation of a composition x with the inverse of a composition y is denoted by symbol “•” (

This operation is equivalent to subtraction in real vector spaces and is called the perturbation difference. For instance, the inference about mean difference between paired compositions is based on the inference about perturbation differences (
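Perturbation, the inverse composition, and the perturbation difference can be sketched as follows. The function names and the two example compositions are assumptions made for illustration; the operations follow the standard definitions (component-wise product or reciprocal, followed by closure).

```python
def closure(parts):
    """Rescale positive parts so that they sum to 1."""
    total = sum(parts)
    return [p / total for p in parts]

def perturb(x, y):
    """Perturbation: component-wise product followed by closure."""
    return closure([a * b for a, b in zip(x, y)])

def inverse(x):
    """Inverse composition: closed component-wise reciprocals."""
    return closure([1.0 / a for a in x])

def perturb_diff(x, y):
    """Perturbation difference: x perturbed by the inverse of y."""
    return perturb(x, inverse(y))

# Two hypothetical 3-part compositions.
x = [0.6, 0.3, 0.1]
y = [0.2, 0.3, 0.5]
identity = closure([1.0, 1.0, 1.0])

same_as_x = perturb(x, identity)            # the identity leaves x unchanged
neutral = perturb(x, inverse(x))            # x and its inverse cancel out
recovered = perturb(perturb_diff(x, y), y)  # perturbing the difference by y recovers x
```

A perturbation difference equal to the identity composition [1/3, 1/3, 1/3] therefore corresponds to no change, in the same way that a zero difference does in real space.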

The powering of a composition x = [x_{1}, x_{2}, …, x_{D}] is an operation by a real number or scalar

Powering plays the same role in S^{D} as multiplication by scalars in real space.
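A minimal sketch of powering, assuming the standard definition (each part raised to the scalar, followed by closure). The name `power` and the example composition are illustrative only.

```python
def closure(parts):
    """Rescale positive parts so that they sum to 1."""
    total = sum(parts)
    return [p / total for p in parts]

def power(x, alpha):
    """Powering: raise every part to the scalar alpha, then close."""
    return closure([a ** alpha for a in x])

x = [0.6, 0.3, 0.1]            # hypothetical composition

doubled = power(x, 2.0)        # analogue of multiplying a vector by 2
restored = power(doubled, 0.5) # analogue of multiplying by 1/2 again
flat = power(x, 0.0)           # scalar 0 yields the identity composition
```

Powering by 2 gives the same result as perturbing x by itself, which mirrors the relation 2v = v + v in real space.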

The principle of working in coordinates states that each composition in the simplex (S^{D}) can be expressed as a perturbation linear combination (

where coefficients _{i}^{*} (_{1}, e_{2}, …, e_{D-1}] of the simplex ^{D}. The isometric log-ratio transformation (^{*} to x. Given a basis, the coordinates of x are uniquely determined. The vector x^{*} of ^{D-1}, and the values of coordinates are not constrained to be positive or less than 1. In a 2-simplex, the composition is expressed with two

The corresponding general equation for isometric log-ratio transformation from simplex to real space y = ilr(x) is defined using orthonormal basis, e_{1}, e_{2}, …, e_{D-1} (

with

where _{1}, …, _{i}) = _{1}·_{2}···_{D}]^{1/D} is the geometric mean of the parts of composition x with
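The isometric log-ratio transformation can be sketched for one particular orthonormal basis. The basis used here is an assumption (a commonly used choice in which the i-th coordinate contrasts the geometric mean of the first i parts with part i+1); the basis used in the original analysis may differ, in which case the coordinates are rotated but distances are unchanged.

```python
import math

def ilr(x):
    """ilr coordinates for an assumed basis:
    y_i = sqrt(i / (i + 1)) * ln(g(x_1, ..., x_i) / x_{i+1}),  i = 1, ..., D-1,
    where g is the geometric mean of the first i parts."""
    logs = [math.log(v) for v in x]
    coords = []
    for i in range(1, len(x)):
        mean_log = sum(logs[:i]) / i  # log of the geometric mean of parts 1..i
        coords.append(math.sqrt(i / (i + 1)) * (mean_log - logs[i]))
    return coords

# Hypothetical 3-part composition (silver fir, Norway spruce, European beech).
x = [0.6, 0.3, 0.1]
y = ilr(x)                                   # two unconstrained real coordinates
y_barycenter = ilr([1 / 3, 1 / 3, 1 / 3])    # barycenter maps to the origin
```

For a 3-part composition the result is a pair of real numbers that can be analyzed with ordinary multivariate methods.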

Standard descriptive statistics, such as the arithmetic mean and the variance of the individual components, do not fit with Aitchison geometry as measures of central tendency and dispersion. The closed sample geometric mean is a measure of the central tendency for compositional data. For a data set

with _{i} (

The variability of a set of compositions can be expressed using various measures. One can either use metric (or total) variance based on simplicial distances between a composition and the geometric mean of a set of compositions (
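The closed geometric mean and the metric variance can be sketched as below. The data are hypothetical, the function names are illustrative, and the divisor n - 1 is an assumption (some authors divide by n). Squared simplicial (Aitchison) distances are computed here through centered log-ratios, which give the same distances as ilr coordinates.

```python
import math

def closure(parts):
    """Rescale positive parts so that they sum to 1."""
    total = sum(parts)
    return [p / total for p in parts]

def center(data):
    """Closed geometric mean: part-wise geometric means, then closure."""
    n = len(data)
    d = len(data[0])
    geo = [math.exp(sum(math.log(row[j]) for row in data) / n) for j in range(d)]
    return closure(geo)

def clr(x):
    """Centered log-ratio: logs minus their mean."""
    logs = [math.log(v) for v in x]
    m = sum(logs) / len(logs)
    return [l - m for l in logs]

def aitchison_dist_sq(x, y):
    """Squared simplicial distance between two compositions."""
    return sum((a - b) ** 2 for a, b in zip(clr(x), clr(y)))

def metric_variance(data):
    """Sum of squared distances to the center, divided by n - 1 (assumed divisor)."""
    c = center(data)
    return sum(aitchison_dist_sq(row, c) for row in data) / (len(data) - 1)

# Hypothetical compositions of three forest compartments.
data = [[0.70, 0.10, 0.20], [0.60, 0.15, 0.25], [0.80, 0.05, 0.15]]
print(center(data), metric_variance(data))
```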

Compositions can depend on one or more external covariables such as time, space, or different environmental variables. The explanatory variable can be continuous, discrete, real, or even compositional. The strategy of linear modeling of compositional processes in the simplex can be replaced with linear modeling of compositional coordinates in real space (
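The idea of modeling coordinates instead of raw proportions can be sketched with a single covariable. Everything below is illustrative: the data are hypothetical, the ilr basis is an assumed choice, and a simple least-squares line per coordinate stands in for the multivariate linear models used in the study.

```python
import math

def closure(parts):
    total = sum(parts)
    return [p / total for p in parts]

def ilr(x):
    """ilr coordinates for an assumed orthonormal basis."""
    logs = [math.log(v) for v in x]
    return [math.sqrt(i / (i + 1)) * (sum(logs[:i]) / i - logs[i])
            for i in range(1, len(x))]

def ilr_inv(y):
    """Back-transform ilr coordinates (same assumed basis) to a composition."""
    d = len(y) + 1
    clr = [0.0] * d
    for i, coord in enumerate(y, start=1):
        for j in range(i):
            clr[j] += coord / math.sqrt(i * (i + 1))
        clr[i] -= coord * math.sqrt(i / (i + 1))
    return closure([math.exp(c) for c in clr])

def ols(t, y):
    """Least-squares fit of y = a + b * t; returns (a, b)."""
    n = len(t)
    tm, ym = sum(t) / n, sum(y) / n
    b = (sum((ti - tm) * (yi - ym) for ti, yi in zip(t, y))
         / sum((ti - tm) ** 2 for ti in t))
    return ym - b * tm, b

# Hypothetical covariable (e.g. a median DBH in cm) and compositions.
covariate = [20.0, 25.0, 30.0, 35.0, 40.0]
compositions = [[0.70, 0.10, 0.20], [0.66, 0.12, 0.22], [0.61, 0.15, 0.24],
                [0.57, 0.17, 0.26], [0.52, 0.20, 0.28]]

coords = [ilr(c) for c in compositions]
fits = [ols(covariate, [c[k] for c in coords]) for k in range(2)]

def predict(t):
    """Predicted composition: fitted coordinates mapped back to the simplex."""
    return ilr_inv([a + b * t for a, b in fits])

print(predict(28.0))
```

Because predictions are back-transformed from real coordinates, they are always positive and sum to 1, which a regression on raw proportions does not guarantee.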

The study site is situated in the predominantly forested Natura 2000 Snežnik area in southwest Slovenia (Lon. 14° 26′ E, Lat. 45° 35′ N). The area is located in the Dinaric Mountains, which range from Slovenia to Montenegro. The forest management unit Leskova dolina (2.392 ha) was chosen as the research area. The altitude ranges from 610 m to 1796 m a.s.l. The topography of the area is very diverse, with abundant sinkholes typical of high Karst geology with limestone and dolomite as the parent material. Due to the high variability in topography, different pedogenetic associations developed, primarily lithosols, rendzic leptosols, cambisols, and luvisols. The soil depth varies between 10 and 300 cm, depending on the micro-topographic position, and precipitation is evenly distributed throughout the year, with a mean annual precipitation of 2150 mm. The average mean temperature is 6.5 °C, and late spring and early autumn frosts are common. The prevalent plant community in the study area is the Dinaric silver fir - European beech forest (

The analysis utilized the basal area, expressed as m^{2} ha^{-1}, which represents the area of a given section of land that is occupied by the cross-section of silver fir, Norway spruce and European beech trunks measured at the diameter at breast height (1.3 m above ground) per forest compartment in 1954 and 2004. In the 1890s, the forests in the Snežnik area were divided into forest compartments, each of which has relatively homogeneous site characteristics and stand type. The shapes of the compartments have not changed throughout the last century. Forest compartments with similar site and stand characteristics (parent material, soil, vegetation community, stand structure, etc.) determine the forest management system and are thus grouped into forest management classes (FMCs). These classes were named after the prevalent vegetation units in this area of Dinaric silver fir and European beech forests:

In each compartment, all trees with a diameter greater than 10 cm were measured (full calipering) at breast height in 1954, and the trees were classified into 5-cm DBH classes for each tree species (

We utilized several consecutive steps to perform the compositional analysis of this example data set. First, descriptive statistics, such as means and variation, were computed and coupled with a general graphic representation of the data. A ternary diagram was most suitable for our 3-part compositions. In a ternary diagram, each point represents a 3-part composition in a forest compartment. The vertices of the ternary diagram correspond to the three components or, in this study, species. Each component is proportional to the perpendicular distance from the point to the side of the triangle opposite the corresponding vertex.
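The placement of a 3-part composition in a ternary diagram can be sketched as a barycentric mapping. The vertex layout (first part at the lower left, second at the lower right, third at the top of a unit-side triangle) and the function name are assumptions made for illustration.

```python
import math

def ternary_xy(comp):
    """Cartesian position of a 3-part composition in a unit-side ternary diagram.
    Assumed vertices: part 1 at (0, 0), part 2 at (1, 0), part 3 at (0.5, sqrt(3)/2)."""
    total = sum(comp)
    a, b, c = (p / total for p in comp)
    # Barycentric combination of the three vertices; `a` weights the origin.
    return (b + 0.5 * c, math.sqrt(3) / 2 * c)

# Hypothetical compartment: silver fir, Norway spruce, European beech.
point = ternary_xy([0.6, 0.3, 0.1])
barycenter = ternary_xy([1.0, 1.0, 1.0])
```

The height of the point above the base equals the third part times the triangle height, which is the perpendicular-distance property described above.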

Second, we separately assessed the relative differences in the mean species compositions between the 4 FMCs for each year (1954 and 2004) by classical MANOVA of the

Third, the compositional change was evaluated with a calculation of perturbation differences (

Finally, to obtain a deeper explanation of the compositional change, we related it to the explanatory variables by regressing the perturbation differences against these variables. We modeled the perturbation differences against FMC; tree species composition of thin trees (DBH < 30 cm) in 1954; median DBH of silver fir, Norway spruce, and European beech; and percentage of salvage cut in the 1954-2004 period. The dependent variables in the multivariate linear models included both

The data were analyzed in the R environment (

The subcomposition of the three dominant tree species included in this analysis represents 97.6% of the total basal area. The tree species abundance in each compartment was expressed as part of the total basal area of all trees. The

We applied a classical MANOVA test to the_{1954} < 0.001, p_{2004} < 0.001) in both investigated periods. The results indicate that the ratio between silver fir and Norway spruce in 1954 is significantly higher in TYPC and MERC compared to HOMO and LYCO. The second

The perturbation differences for all forest compartments were calculated to analyze the difference (distance) between forest compositions in 1954 and 2004 (

The significant compositional change is quite evident in

We also compared the change over the 1954-2004 period for each of the three dominant species using three different measures, namely change in absolute terms (in basal area per hectare), in percentage points, and in perturbation differences.

The initial stand structure was well correlated with the perturbation differences. The site characteristics, expressed as FMC, seem to play an important role in the forest species composition dynamics. In both cases, the first and second ^{2}) is 0.56, and residuals are uniformly distributed around the barycenter of the ternary diagram (

The effect of unique variables on perturbation differences is presented in

Compartments with a small silver fir DBH do not change greatly (a colored line begins near the center of the ternary diagram on

We have outlined the fundamentals of compositional data analysis and have presented some examples of its application in the study of forest structure and forest dynamics.

Compositional data analysis presents two common challenges: (i) characterizing a change in composition; and (ii) testing hypotheses concerning the nature of the change. In general, a lack of change in a value indicates that the expected difference between two paired samples is zero. For compositions, the perturbation difference is the corresponding measure of change. In our 3-species Snežnik forest composition, we showed that the forest composition change over the last 50 years is highly significant. The interpretation of perturbation differences as compositions is somewhat complicated when investigating the bidirectional dynamics of forests, in which trees are simultaneously eliminated (felled or dying trees) and growing (recruitment of new trees). The perturbation difference characterizes the net change in species composition. The forest species composition in the forest compartment 10a is x_{1954 }= [0.77, 0.16, 0.07] in 1954 and x_{2004 }= [0.48, 0.24, 0.29] in 2004, where the first, second, and third parts are silver fir, Norway spruce, and European beech, respectively. The compositional change expressed as the perturbation difference is x_{dif }=
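The net change for compartment 10a can be recomputed from the two compositions quoted above. This is a sketch under the assumption that the difference is taken as the 2004 composition perturbed by the inverse of the 1954 composition; the helper names are illustrative.

```python
def closure(parts):
    """Rescale positive parts so that they sum to 1."""
    total = sum(parts)
    return [p / total for p in parts]

def perturb_diff(x, y):
    """Perturbation difference: part-wise ratios x_i / y_i, then closure."""
    return closure([a / b for a, b in zip(x, y)])

# Compartment 10a: silver fir, Norway spruce, European beech (values from the text).
x_1954 = [0.77, 0.16, 0.07]
x_2004 = [0.48, 0.24, 0.29]   # sums to 1.01 because of rounding; closure absorbs this

# Assumed direction of the difference: 2004 relative to 1954.
x_dif = perturb_diff(x_2004, x_1954)
print([round(v, 2) for v in x_dif])   # approximately [0.10, 0.24, 0.66]
```

Parts above 1/3 (here European beech) gained relative to the other species, and parts below 1/3 (silver fir and, slightly, Norway spruce) lost relative weight, although Norway spruce increased in raw proportion.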

The species composition of the investigated forests displayed relative increases in Norway spruce and European beech and decreases in silver fir. To explain this change some historical facts should be presented. Mature forests of European beech and silver fir prevailed in the Dinaric parts of Slovenia and beyond in the middle of 19^{th} century, and young trees below the canopy mainly consisted of shade-tolerant silver fir, which was capable of effective regeneration due to low ungulate densities. Adult beech trees were intensively harvested for firewood and charcoal production until the middle of the 20^{th} century, while silver fir became a dominant tree species, desired for its timber and suitability for selection management (^{th} century with substantial silver fir dieback. Recent studies have offered some possible explanations for this event (

Using linear modeling we showed which direct and indirect variables were substantially related to forest compositional change. The perturbation differences identified in this study demonstrate the advantage of this approach for ecological studies.

In general, in the case of forest species composition, the total sum of forest basal area (m^{2} ha^{-1}) is not irrelevant, since the total wood biomass or tree coverage of a site tells a lot about forest productivity and structure (

This paper does not consider null values, which are the principal problem that can arise in compositional data analysis of ecological data (

This paper presents a natural and mathematically clean framework to deal with closed data, particularly when the data are subjected to physical constraints, such as mass or area conservation (

We showed that certain aspects of forest dynamics, such as species composition, can be successfully evaluated using the compositional data approach. For the investigated forests of SW Slovenia, we showed a significant compositional change over a 50-year period, caused by the interplay between environmental factors and forest regime with their differential effects on the three dominant tree species. In absolute terms, the amount of silver fir changed the most; the decrease could be attributed to fir dieback and poor regeneration/recruitment due to ungulates. In relative terms, expressed in perturbation differences, the largest change was shown for Norway spruce, with its increase being predominantly the effect of promotion through forest management measures.

We thank the anonymous reviewers for their valuable comments which greatly contributed to the quality of the paper. We acknowledge the financial support from the Slovenian Research Agency (Young Researcher’s programme [MK] and research core funding No. P4-0059 and No. P4-0085).

Basal area of 119 forest compartments according to 4 different forest management classes (FMCs) for 1954 and 2004.

Ternary diagrams of tree species compositions determined in 1954 (left) and 2004 (right). Each symbol represents a single forest compartment (n = 119). Different color symbols represent different FMCs: (blue)

(a): Shift in tree species composition between 1954 and 2004 shown on ternary diagram. The circles are forest compartments in 1954, and the line segments are compositional straight lines toward the 2004 forest compositions. (b): The perturbation differences between the two observation periods. The magnified symbols in both plots represent two compartments shown as compositional change and the corresponding perturbation difference. Different color symbols represent different FMCs (see

A comparison of different measures to evaluate forest species compositional change during the investigated period between 1954 and 2004. Grey horizontal lines indicate no change.

Predicted perturbation differences from final model (a) and the distribution of model residuals (b). Different color symbols represent different FMCs (see

The component effect for each co-variable in a linear model of perturbation differences in the forest composition between 2004 and 1954. The small points indicate perturbation differences. The color ramp represents the component effect (orange and blue colors represent high and low values of the explanatory variable, respectively).

Explanatory variables and data sources of forest structure and terrain topography specified for each forest compartment. (

| Data source | Explanatory variable per forest compartment | Units |
|---|---|---|
| Calculated from full calipering from 1954 data | Median silver fir diameter at DBH | cm |
| | Median Norway spruce diameter at DBH | cm |
| | Median European beech diameter at DBH | cm |
| | Species composition of trees with diameter < 30 cm | - |
| Records of harvested trees in 1954-2004 | Harvesting rates | m^{3} |
| | Salvage logging relative to total harvesting rates | % |
| | Broadleaf logging relative to total harvesting rates | % |
| | Conifers logging relative to total harvesting rates | % |
| Digital elevation model 12.5 × 12.5 m | Mean altitude | m |
| | Mean terrain curvature | - |
| | Mean slope | degrees |

Mean percentages per tree species (SF: silver fir; NS: Norway spruce; EB: European beech), total variance of the dataset, and significance of differences between four FMCs in 1954 and 2004. Significant differences between FMCs are described for each

| Year | Parameter | Species / | HOMO | LYCO | MERC | TYPC |
|---|---|---|---|---|---|---|
| - | No of compartments | - | 33 | 13 | 13 | 60 |
| 1954 | Mean portion of | Silver fir (SF) | 0.66 | 0.77 | 0.77 | 0.79 |
| | | Norway spruce (NS) | 0.09 | 0.14 | 0.05 | 0.04 |
| | | European beech (EB) | 0.25 | 0.09 | 0.18 | 0.17 |
| | Total variance | - | 0.43 | 0.98 | 1.30 | 1.04 |
| | Homogeneous groups | | a | a | b | b |
| | | | a | b | a | a |
| 2004 | Mean portion of | Silver fir (SF) | 0.46 | 0.49 | 0.54 | 0.61 |
| | | Norway spruce (NS) | 0.21 | 0.37 | 0.19 | 0.12 |
| | | European beech (EB) | 0.32 | 0.13 | 0.27 | 0.27 |
| | Total variance | - | 0.30 | 0.14 | 0.49 | 0.46 |
| | Homogeneous groups | | b | a | b | c |
| | | | a | b | a | a |

Estimates of the parameters, residual standard error, and adjusted coefficients of determination for modeling perturbation differences.

| Coordinate | Coefficients | Estimate | Std. Error | t value | p value | Res. Std. Error | Adj. R^{2} |
|---|---|---|---|---|---|---|---|
| | (Intercept) | -0.692 | 0.434 | -1.592 | 0.114 | 0.358 | 0.588 |
| | TT | -0.488 | 0.051 | -9.531 | 0.000 | | |
| | TT | -0.078 | 0.064 | -1.223 | 0.224 | | |
| | NS median | -0.039 | 0.009 | -4.642 | 0.000 | | |
| | SF median | 0.038 | 0.011 | 3.553 | 0.001 | | |
| | EB median | 0.056 | 0.017 | 3.377 | 0.001 | | |
| | Salvage cut | -0.004 | 0.001 | -2.861 | 0.005 | | |
| | (Intercept) | 0.069 | 0.379 | 0.182 | 0.856 | 0.312 | 0.437 |
| | TT | 0.034 | 0.045 | 0.754 | 0.452 | | |
| | TT | -0.354 | 0.056 | -6.336 | 0.000 | | |
| | NS median | 0.039 | 0.007 | 5.236 | 0.000 | | |
| | SF median | 0.008 | 0.009 | 0.905 | 0.367 | | |
| | EB median | -0.063 | 0.015 | -4.324 | 0.000 | | |
| | Salvage cut | 0.004 | 0.001 | 3.742 | 0.000 | | |