Wagner’s hypothesis in Europe: a causality analysis with disaggregated data

This paper examines Wagner’s hypothesis that public expenditure grows alongside economic activity, using a panel of 28 European economies over the 1995-2018 period. The hypothesis is tested using the Pesaran (2007) panel unit root and Westerlund (2007) cointegration tests, which account for cross-sectional dependence in the series, and three panel causality tests (Toda-Yamamoto, Dumitrescu-Hurlin and Juodis-Karavias-Sarafidis) that are suitable for series with mixed orders of integration, heterogeneous balanced panels and cases of limited evidence of cointegration. The empirical results suggest that the expenditure and output variables are non-stationary in levels and stationary in first differences; that the variables are cointegrated; that causality is principally uni-directional (from output to public expenditure), in line with Wagner’s hypothesis, or bi-directional; and that causality from public expenditure to output along Keynesian lines is limited.


Introduction
This paper empirically tests Wagner's hypothesis, which posits that government expenditure grows faster than the economy (in both absolute and relative terms) and that government spending (and, more generally, government activity) expands at the expense of the private sector (Wagner, 1912). Previous examinations of Wagner's hypothesis have differed in the definition and measurement of the public expenditure variable, the selection of independent variables in addition to GDP, the country coverage, the econometric methods and tests, and the findings (the following section provides a cursory review of the empirical studies).
In contrast to many previous studies, this paper focuses on the relationship between output and expenditure at both the aggregate and the disaggregated level. The aggregate level of analysis is justified by the fiscal sustainability, budget deficit and public debt problems that plague the developed economies particularly strongly (Koester & Priesmeier, 2013): the validity of Wagner's hypothesis may provide a solid theoretical explanation of these problems (in addition to other explanatory factors). On the other hand, while Wagner's hypothesis may hold at the aggregate expenditure level, this may not necessarily be the case for the individual expenditure categories, e.g. the growth of health expenditure may exceed output growth, but the growth of expenditure on agriculture may lag behind it (as noted by Peacock & Scott, 2000, references to the differing pace of government expansion may be found in Wagner's original works). Additionally, an exclusive focus on the aggregate level may be unsound from an econometric perspective: Rajaguru (2002) and Kucukkale et al. (2012, pp. 1-2) note the distortionary and bias-inducing effects of expenditure aggregation on cointegration and causality relationships, while Granger (1988) mentions the incongruity of the unit root properties of the aggregate and component (disaggregated) series.
Given the specific aim of the paper (to make certain generalisations as to the validity of Wagner's hypothesis in a sufficiently large group of economies) and the lack of public expenditure data with a sufficient time series dimension (such data are typically available only for a limited number of economies), we use a panel data set that covers 28 European economies over the 1995-2018 period. Given the economic and political integration that has been underway in Europe since the 1950s and 1960s, it is reasonable to assume that the panel data are characterised by cross-sectional dependence (stemming principally from intra-regional trade and investment flows, the implementation of common economic policies, and public expenditure decisions made at the supra-national level), and we therefore use panel data econometric techniques that account for it.
The rest of the paper is organized as follows. Section 2 contains a review of the theoretical basis of Wagner's hypothesis and of the relevant empirical studies. Section 3 provides the modeling framework, the data and econometric methodology, and the description of the panel data. Section 4 presents the empirical results. Section 5 concludes the paper, discusses the policy implications and outlines the avenues for future research.

Theoretical framework and empirical literature
Some ambiguity exists regarding the exact definition of Wagner's hypothesis, its applicability in a specific socio-economic setting and its formulation for modelling purposes. Peacock and Scott (2000, pp. 2-3), based on a thorough examination of Wagner's original works, note that Wagner preferred to call the concept an 'empirical observed uniformity' rather than a law; tended to apply the concept both to the traditional services of the government (e.g. defence, law and order) and to the newer functions (welfare provision); included the expenditure of central and local government as well as the activity of public enterprises; noted the problem of public sector expansion at the expense of private sector growth; and conceptualised public sector growth not purely as a quantitative but also as a qualitative phenomenon (manifested in the sophistication and greater extent of government regulation). Timm (1961) argued that Wagner formulated the hypothesis for the analysis of 19th-century governments but did not restrict the applicability of the concept, stating that government size would grow and involvement would extend as long as "cultural and economic progress" continues. Thus, the hypothesis may be equally applicable to economies at an early stage of capitalist development (as were the European economies in the 19th century) and to other economies at various development stages.
The growth of public expenditure at a faster rate than economic growth has been attributed to a number of objective factors. Firstly, industrialisation, large-scale production, and the application of science and technology necessitate the accompanying growth of the organising force, embodied in government and manifested in greater spending (North & Wallis, 1982, p. 336; Oxley, 1994, p. 287). Secondly, in contrast to the earlier epochs (e.g. the 18th and 19th centuries, when the role of the government was restricted to the provision of public order, law and basic services), the 20th century witnessed much broader roles of the government. The popularity of socialist ideas and the accompanying class struggle, the rise of central or indicative planning (along with socialist economic doctrine), the two world wars and the Cold War, and the failures of the capitalist economies at certain junctures (e.g. the Great Depression) led to greater public expenditure in such areas as welfare, education, health, and defence. Thirdly, the growing sophistication of economic activity and transactions, together with scientific and technological progress, required greater regulation and oversight by the government in such functional areas as the design and implementation of legislation, the improvement of economic and political institutions, consumer protection from monopolies, and the prevention of unfair competition, and also justified greater spending to support scientific and technological capability.
Economic theory provides a number of explanations for Wagner's hypothesis. The demand for education, health and social protection has a high income elasticity, hence the expenditure in these categories is more likely to conform to Wagner's predictions. Public spending is the outcome of a public decision-making process that is subject to the influence of vested interests and special interest groups through their lobbying efforts, which brings about the expansion of the government and greater public spending in a number of categories (Mueller, 1987). The relative cost of government services may rise over time, resulting in spending growth in nominal terms (Baumol, 1967). Lastly, according to the bureaucracy theory of government, the proliferation of policies and the growth of spending (frequently at the expense of economic efficiency) result from the immanent features of bureaucracy and from bureaucratic concerns that may be summarised as striving for power and prestige, the preservation of policies and positions, and the expansion of the bureaucratic apparatus (Niskanen, 1971; Oxley, 1994, p. 286).
An alternative hypothesis of reverse causality, running from government expenditure to output, is rooted in Keynesian economics and is also supported by other lines of economic theory. In the short run, during a cyclical downswing, an increase in government expenditure may stimulate output, provided that there is idle capacity and the economy is below potential output (Ray & Pal, 2022).
This view has been challenged in a number of studies: while causality originating from government expenditure may be present, the effects on output may be negative. Barro (1991) notes the inefficiencies that are caused by public sector expansion and that potentially slow down economic growth. Tobin (2005) argues that the strength and sign of the expenditure-output effect hinge on the specific role of the state as a modernisation agent (as opposed to a product of vested interests and political pressures, or a source of bureaucratic failure and slack). The positive effects of expenditure on output are, therefore, not guaranteed.
The empirical testing of Wagner's hypothesis has been diverse in many respects. With regard to the definition of government expenditure, the empirical studies chose between nominal and real expenditure values (for instance, Gandhi, 1971, and Lin, 1995, used expenditure in current prices, while Bohl, 1996, defined expenditure in constant prices).
In a number of studies, defence expenditure was excluded to isolate the expenditure-output relationships in a civil economy (Abizadeh, Gray, 1985). The empirical research focused on both the expenditure of the central government (Mann, 1980) and spending at the sub-national level (e.g. Narayan et al, 2012, in the study of Wagner's hypothesis for a group of Indian states), but tended, in the absence of disaggregated expenditure data, to examine aggregate expenditure (a few studies, such as the analysis of the hypothesis in Canada by Biswal et al, 1999, stand as exceptions).
Regarding the selection of regressors, the following variables were used in addition to GDP (per capita): the sectoral shares of output, to account for structural influences on the 'expenditure-output' nexus (Mann, 1980); the values of exports and imports, to account for the influence of the external economy on government spending decisions (Abizadeh, Gray, 1985); permanent income and the deviation of current demand from the trend (Courakis et al, 1993); and country size and population density (Alesina, Wacziarg, 1998; Dao, 1995).
In terms of geographic coverage, a large number of studies focused on a single economy: Wagner's hypothesis in the USA (Islam, 2001), Spain (Jaen Garcia, 2004), Italy (Magazzino, 2012), Canada (Ahsan et al, 1996), and Egypt (Ghazy et al, 2021), among others. Selected studies looked at groups defined by region or level of development (developing economies by Diamond, 1977; 14 European economies by Afonso, Alves, 2017), economies with specific characteristics (e.g. the study of the hypothesis for petroleum exporters, where oil export revenues constitute a major source of government revenue and thus influence fiscal policy significantly, Burney, 2002), or arbitrary sets of economies (e.g. the comparative study of Spain and Armenia by Sedrakyan and Varela-Candamio, 2019; or of three African economies, Ansari et al, 1997).
In terms of econometric methods and tests, the earlier studies tended to rely on descriptive statistics (Bairam, 1992), linear regression (Lin, 1995), or the specification of simultaneous equation models for the demand and supply of public expenditure (Dollery, Singh, 2000). Such methods suffered from 'spurious regression' and identification problems and ignored the non-stationarity of the time series. To address these problems, more recent empirical analyses have utilised cointegration and causality tests (Oxley, 1994; Legrenzi, 2000), vector autoregression models (Benavides et al, 2013), panel data techniques (Afonso, Jalles, 2014), or non-linear models (Karagianni, Pempetzoglou, 2009).
The empirical research has delivered conflicting results. A number of studies indicated the absence of causality between expenditure and GDP in either direction (Huang, 2006; Sinha, 2007). Unidirectional causality that supports Wagner's hypothesis was identified in the studies by Courakis et al (1993), Tang (2001), Chang et al (2004), Jaen Garcia (2004), Sideris (2007), Lamartina and Zaghini (2011) and, in selected specifications, Benavides et al (2013). Causality along Keynesian lines was demonstrated by Iyare and Lorde (2004) and Babatunde (2011). Bi-directional causality was indicated by Narayan et al (2008), Ziramba (2009), and Yay and Tastan (2009). Given the variety of empirical outcomes in the studies that focused on aggregate expenditure and individual economies (or narrow groups of economies), it is advantageous to consider specific expenditure categories and a broader panel that encompasses the maximum possible number of countries.

Model
Following Loizides and Vamvoukas (2005), a bivariate representation of Wagner's hypothesis is used, with no additional variables included. The five specifications of the hypothesis suggested by Peacock and Wiseman (1979), Mann (1980), Musgrave (1969), Gupta (1967), and Goffman (1968) were considered, where G, Y and P respectively represent government expenditure, real GDP and population.
It is assumed that causality runs from the right-hand to the left-hand side of these specifications under Wagner's hypothesis, and in the opposite direction under the Keynesian hypothesis.
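For orientation, the functional forms most commonly attributed to these five authors in the empirical literature on Wagner's hypothesis can be sketched as follows. This is a sketch under stated assumptions rather than a statement of the exact equations estimated here: the log-linear form, the coefficient names $\alpha_k$ and $\beta_k$, the error term $u_{it}$ and the ordering are assumptions.

```latex
\begin{align}
\ln G_{it}       &= \alpha_1 + \beta_1 \ln Y_{it} + u_{it}       && \text{(Peacock and Wiseman)} \\
\ln (G/Y)_{it}   &= \alpha_2 + \beta_2 \ln Y_{it} + u_{it}       && \text{(Mann)} \\
\ln (G/Y)_{it}   &= \alpha_3 + \beta_3 \ln (Y/P)_{it} + u_{it}   && \text{(Musgrave)} \\
\ln (G/P)_{it}   &= \alpha_4 + \beta_4 \ln (Y/P)_{it} + u_{it}   && \text{(Gupta)} \\
\ln G_{it}       &= \alpha_5 + \beta_5 \ln (Y/P)_{it} + u_{it}   && \text{(Goffman)}
\end{align}
```

Under these forms, the left-hand side is expenditure in absolute terms, as a share of GDP, or per capita, which matches the three ways in which the expenditure series are described in the empirical section.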

Data
The paper uses the GDP and public expenditure data released by Eurostat. Both GDP and expenditure [...] The public expenditure categories follow the Classification of the Functions of Government (COFOG), Version 1999 (OECD, 2011, pp. 194-5). In addition to total expenditure, the following expenditure categories are delineated: defence; economic affairs; education; environmental protection; general public services; health; housing and community amenities; public order and safety; recreation, culture and religion; social protection.

Econometric methods
Initially, we examine the integration order and the unit root properties of the expenditure and output.
There is a possibility of cross-sectional dependence in the series (correlation of the panel members in the cross-section due to various unobserved common factors, such as policy and political integration or policy learning). We therefore conduct the Pesaran (2004, 2015) cross-sectional dependence tests, which are flexible with regard to the panel data structure (specifically, when the time series dimension is smaller than the cross-sectional dimension, $T < N$), and the Pesaran (2007) cross-sectionally augmented IPS (CIPS) test, which is robust to cross-sectional correlation (in lieu of the first-generation panel unit root tests that ignore the phenomenon).
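The cross-sectional dependence statistic of Pesaran (2004) can be illustrated with a minimal sketch. It assumes a balanced panel stored as an N x T array (of the series themselves or of residuals from unit-specific regressions, a choice not specified in the text) and computes CD = sqrt(2T / (N(N-1))) times the sum of the pairwise correlations, which is asymptotically standard normal under the null of cross-sectional independence. The function name `pesaran_cd` is hypothetical.

```python
import numpy as np

def pesaran_cd(panel):
    """Pesaran (2004) CD statistic for a balanced panel.

    panel: array of shape (N, T), one row per cross-section unit
    (assumed layout; rows may hold series or regression residuals).
    """
    panel = np.asarray(panel, dtype=float)
    n, t = panel.shape
    # N x N matrix of pairwise correlations between units over time
    corr = np.corrcoef(panel)
    # Sum the correlations above the main diagonal (pairs i < j)
    pair_sum = corr[np.triu_indices(n, k=1)].sum()
    return np.sqrt(2.0 * t / (n * (n - 1))) * pair_sum
```

Large absolute values of the statistic relative to the standard normal distribution point to cross-sectional dependence.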
For the variables that are integrated of order one, $I(1)$, and thus contain unit roots, we consider the possibility of cointegration and conduct the Westerlund panel cointegration tests (Westerlund, 2007). The Westerlund test accommodates cross-sectional dependence, removes the common factor restriction, and is based on an error-correction model (Persyn & Westerlund, 2008, p. 233).
As a next step, in order to ascertain causality between public expenditure and output, we employ the Toda-Yamamoto (TY) causality test (Toda & Yamamoto, 1995). The test is applicable when the variables have different orders of integration, for instance a combination of $I(0)$ and $I(1)$ series (a data characteristic that renders the conventional Granger causality test unreliable), or when the cointegration tests give conflicting indications. The TY test is invariant to the integration orders (it allows any combination of the orders) and is valid irrespective of the presence or absence of cointegration. Implementation-wise, the TY test establishes the maximum integration order of the series, $d_{\max}$, uses it in the estimation of an augmented VAR model in levels (where the optimal lag order is determined using the conventional lag selection criteria, and the selected lag for each variable is increased by $d_{\max}$) and constructs the modified Wald (MWald) statistic to test for causality (similarly to the conventional Granger causality test). The inclusion of lags is necessary because the effects of GDP on government expenditure are delayed rather than instantaneous.
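The Toda-Yamamoto steps described above can be sketched for a single pair of time series. This is a minimal illustration under stated assumptions, not the panel implementation used in the paper: the function name `toda_yamamoto_wald` and the symbol `k` for the selected lag order are hypothetical, the equation is estimated by OLS with a constant, and, following the standard form of the procedure, the Wald restriction is placed only on the first `k` lags of the causing variable, with the extra `d_max` lags left unrestricted.

```python
import numpy as np

def toda_yamamoto_wald(y, x, k, d_max):
    """Modified Wald statistic for 'x does not Granger-cause y'.

    Estimates the y-equation of a levels VAR with k + d_max lags and
    tests that the first k lags of x are jointly zero.
    Returns (wald, degrees_of_freedom); under the null the statistic
    is asymptotically chi-square with k degrees of freedom.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    p = k + d_max                      # augmented lag order
    n_obs = len(y) - p                 # usable observations
    # Regressors: constant, lags 1..p of y, lags 1..p of x
    cols = [np.ones(n_obs)]
    cols += [y[p - j: len(y) - j] for j in range(1, p + 1)]
    cols += [x[p - j: len(x) - j] for j in range(1, p + 1)]
    X = np.column_stack(cols)
    target = y[p:]
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    sigma2 = resid @ resid / (n_obs - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    # Positions of the first k lags of x in the coefficient vector
    idx = np.arange(1 + p, 1 + p + k)
    b = beta[idx]
    wald = float(b @ np.linalg.solve(cov[np.ix_(idx, idx)], b))
    return wald, k
```

Calling the function with the roles of the two series swapped gives the test in the opposite direction, which is how the Wagner and Keynesian directions of causality would be distinguished in this sketch.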
For the public expenditure-output relationship, the augmented VAR model is estimated in levels, with the selected lag order increased by $d_{\max}$. Granger causality from $X$ to $Y$ in Equation (7) exists when the coefficients on the lagged values of $X$ are not jointly equal to zero.
For the purpose of a robustness check, we conducted the Dumitrescu-Hurlin (2012) test, which is based on individual regressions for each cross-section $i = 1, \dots, N$. The null hypothesis is of no causality for any cross-section in the panel; under the alternative hypothesis, causality in some of the cross-sections is allowed. For each $i$, the Wald test is conducted for the hypothesis $\gamma_{i1} = \dots = \gamma_{iK} = 0$, and the Wald test statistics for each panel member and for the aggregate panel are obtained. The standardized $\bar{Z}$ and $\tilde{Z}$ statistics are then calculated.
As a last step, we perform the causality test proposed by Juodis et al. (2021). In contrast to the Dumitrescu-Hurlin test, which has the highest power with a large $T$ dimension and $N/T^2$ approaching zero, it is suitable for panels with a moderate time series dimension. The test performs well in both homogeneous and heterogeneous panels, assumes that the Granger causality parameters are equal to zero under the null, and thus allows the use of a pooled estimator for these parameters (Xiao et al., 2021, p. 4). Pooled estimators have a faster rate of convergence but suffer from the 'Nickell bias', a problem that is corrected in the test using the half-panel jackknife method (Xiao et al., 2021, p. 3).
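The aggregation step of the Dumitrescu-Hurlin test can be sketched as follows, taking the individual Wald statistics as given. The formulas are the standard ones for this test (average Wald statistic, asymptotic standardisation, and the finite-T approximation); the function name, the argument names and the convention that `T` counts the time periods before lags are lost are assumptions of this sketch.

```python
import numpy as np

def dumitrescu_hurlin_stats(wald_i, T, K):
    """Average Wald statistic and its two standardised versions.

    wald_i: individual Wald statistics, one per cross-section unit
    T:      number of time periods (requires T > 5 + 3K)
    K:      lag order used in the individual regressions
    Returns (w_bar, z_bar, z_tilde).
    """
    w = np.asarray(wald_i, dtype=float)
    N = w.size
    w_bar = w.mean()
    # Asymptotic standardisation (T and then N going to infinity)
    z_bar = np.sqrt(N / (2.0 * K)) * (w_bar - K)
    # Approximate standardisation for a fixed time dimension
    scale = np.sqrt(N / (2.0 * K) * (T - 3 * K - 5) / (T - 2 * K - 3))
    z_tilde = scale * ((T - 3 * K - 3) / (T - 3 * K - 1) * w_bar - K)
    return w_bar, z_bar, z_tilde
```

Both standardised statistics are compared with the standard normal distribution; with a short time dimension the second one is usually the more relevant of the two.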

Empirical results
Figure 1 (Appendix) shows the dynamics of cross-sectional means of GDP and expenditure series.
Both GDP and GDP per capita exhibited an upward trend throughout the period (with the exception of the global financial crisis, GFC, years of 2008-09). With regard to the absolute values of expenditure, an upward trend during the whole period was observed for the total, education, health, public order and safety, social protection, and recreation, culture and religion categories, while the other categories were characterised by a stabilisation of expenditure (economic affairs, general public services, and environmental protection) or a decline (defence, and housing and community amenities) in the post-GFC years. The expenditure shares of GDP did not follow any specific trend (possibly with the exception of defence and general public services). For expenditure per capita, deterministic patterns were observed only for the education, health, public order and safety, social protection, and recreation, culture and religion categories. The visual inspection is supplemented with formal unit root tests, since visual inspection alone is deemed misleading (Cuddington et al, 2002, p. 21).
As demonstrated in Table 1, according to the Pesaran (2004) and Pesaran (2015) tests, the null hypotheses of cross-sectional independence and weak dependence, respectively, were rejected: each of the variables in question was found to be characterised by strong cross-sectional correlation. The pattern is not unusual, given the high degree of economic and political integration in Europe: previous research indicated convergence in public expenditure across European economies (Apergis et al., 2013) and also highlighted GDP and business cycle comovement in the region (Azcona, 2022). The incorporation of cross-sectional dependence into the unit root and cointegration tests is therefore warranted.
The results of the Pesaran CADF unit root test are presented in Table 1. The test was implemented with a trend component in order to account for the growth of public expenditure and GDP in the long run. The test was run on both the levels and the first differences of the variables in order to determine their integration order, which is instrumental for the Toda-Yamamoto causality test (which requires knowledge of the maximum order of integration). The test indicates that the GDP and GDP per capita variables were trend stationary in both levels and first differences (in the latter case at the 5% and 10% significance levels), and are thus $I(0)$ variables. Most other variables in level form contained a unit root, i.e. the null hypothesis of the test was not rejected (the exceptions were the absolute level of total, education, health, public order and safety, environmental protection, and housing and community amenities expenditure; the per capita expenditure on education and on public order and safety; and public expenditure as a proportion of GDP in general public services and environmental protection). In first-difference form, all expenditure variables were trend stationary at the 5% level (social protection expenditure as a proportion of GDP at the 10% level). Overall, we conclude that the variables are a mix of $I(0)$ and $I(1)$ series, and the maximum order of integration is one.
The Westerlund cointegration test was implemented with a maximum of two lags, one lead and a short kernel window of two (in line with the recommendations for panels with a small time-series dimension, Persyn & Westerlund, 2008).
As part of the robustness check, we conducted the Dumitrescu-Hurlin causality test on each specification, with the variables expressed in first differences. The optimal lag length was determined by the Bayesian-Schwarz criterion from a maximum of eight lags, and in all cases was found to be one. Table 4 contains the asymptotically well-behaved average Wald statistic ($\bar{W}$), the standardised $\bar{Z}$ statistic and the approximated standardised $\tilde{Z}$ statistic, which follow the normal distribution (the latter is used for panels with a fixed time-series dimension), together with the respective p-values. We note that in a number of instances the $\bar{Z}$ and $\tilde{Z}$ statistics gave conflicting results; however, based on $\tilde{Z}$, the test rejected the null hypothesis that public expenditure does not Granger-cause GDP in only five cases (three specifications for the general public services category, and two specifications for the defence category). In contrast, the null hypothesis that GDP does not Granger-cause public expenditure was rejected in a total of 26 cases, while bi-directional causality was indicated in 7 cases.
Lastly, given the relatively short N and T dimensions of the panel, we implemented the bias-corrected Granger causality test of Juodis et al. (2021), following Xiao et al. (2021). The test was run on the first differences of the series, with the optimal lag selected on the basis of the Bayesian-Schwarz criterion (as in the previous test, the optimal lag length was one in all cases). The results are presented in Table 5: the half-panel jackknife (HPJ) Wald test statistics and the estimator, both with the respective p-values. The test rejected the null of no unidirectional causality from GDP to public expenditure in a total of 29 out of 55 cases, but rejected the null of no unidirectional causality from public expenditure to GDP in only 2 cases (the Mann and Musgrave specifications for the public order and safety category). Bi-directional causality was indicated in 11 cases.
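The half-panel jackknife correction that underlies the HPJ statistics can be sketched generically. The sketch assumes a panel stored with the time dimension on the second axis, an even number of periods, and an arbitrary user-supplied pooled estimator; the names `half_panel_jackknife` and `estimator` are hypothetical, and the sketch covers only the bias correction of the estimate, not the Wald statistic built from it.

```python
import numpy as np

def half_panel_jackknife(estimator, panel):
    """Half-panel jackknife bias correction.

    estimator: function mapping a panel array to a scalar or vector
               estimate (for instance a pooled fixed-effects estimate)
    panel:     array with time on axis 1, e.g. shape (N, T) or (N, T, m)
    Returns 2 * theta_full - (theta_first_half + theta_second_half) / 2,
    which removes a bias term of order 1/T.
    """
    panel = np.asarray(panel, dtype=float)
    T = panel.shape[1]
    if T % 2 != 0:
        raise ValueError("this sketch assumes an even number of periods")
    half = T // 2
    theta_full = estimator(panel)
    theta_first = estimator(panel[:, :half])
    theta_second = estimator(panel[:, half:])
    return 2.0 * theta_full - 0.5 * (theta_first + theta_second)
```

The logic is that an estimator with bias B/T has bias 2B/T on each half-panel, so the combination above cancels the leading bias term.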

Robustness checks
In addition to causality testing, the study has experimented with cross-sectional analysis of the data.
For each year of the period (1995-2018), the cross-sectional average was calculated for each output and expenditure variable. Wagner's hypothesis was then verified through the significance of the coefficient in the regression that includes expenditure as the dependent variable and GDP (output) as the regressor. Table 6 in the Appendix demonstrates the coefficient significance in most expenditure categories in Specifications 1 and 4, in the majority of categories in Specification 5, and in many categories in the other specifications.
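This check can be sketched as a two-step procedure: average each variable across countries for every year, then regress the averaged expenditure series on the averaged output series and inspect the t-statistic of the slope. The sketch assumes balanced N x T arrays, a log-log form and plain OLS standard errors; these choices and the function name `wagner_slope` are assumptions, not details given in the text.

```python
import numpy as np

def wagner_slope(g_panel, y_panel):
    """Slope and t-statistic from regressing ln(mean G) on ln(mean Y).

    g_panel, y_panel: arrays of shape (N, T) with expenditure and output
    for N countries over T years (assumed layout).
    """
    # Step 1: cross-sectional averages, one value per year
    g_bar = np.asarray(g_panel, dtype=float).mean(axis=0)
    y_bar = np.asarray(y_panel, dtype=float).mean(axis=0)
    lg, ly = np.log(g_bar), np.log(y_bar)
    # Step 2: OLS of ln(G) on a constant and ln(Y)
    X = np.column_stack([np.ones_like(ly), ly])
    beta, *_ = np.linalg.lstsq(X, lg, rcond=None)
    resid = lg - X @ beta
    sigma2 = resid @ resid / (len(lg) - 2)
    se_slope = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1], beta[1] / se_slope
```

With only 24 annual averages the regression has few degrees of freedom, so the t-statistic is best read as indicative.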
The Toda-Yamamoto causality results were consistent over time. Two checks that capture the effect of the business cycle on causality and that distinguish causality in expansions and downturns were implemented. The first check introduced a cyclical dummy variable into the Toda-Yamamoto relationship (the dummy takes the value one when GDP growth is negative and zero when the growth rate is positive). As indicated in Table 7 in the Appendix, the inclusion of the dummy variable did not alter the results substantially, with bi-directional causality replacing Wagner-type causality in a few categories (specifications) and vice versa. Importantly, Keynesian-type causality from expenditure to output was not identified in any of the cases, as in the baseline model.
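The cyclical dummy described above can be constructed as in the following sketch, which assumes GDP is stored in levels with time on the last axis. The function name `downturn_dummy` is hypothetical, and coding years of exactly zero growth as zero is an assumption, since the text defines only the negative and positive cases.

```python
import numpy as np

def downturn_dummy(gdp):
    """Return 1 for periods with negative GDP growth, 0 otherwise.

    gdp: array of GDP levels with time on the last axis; the output has
    one fewer period because growth is undefined for the first year.
    """
    gdp = np.asarray(gdp, dtype=float)
    growth = np.diff(gdp, axis=-1) / gdp[..., :-1]
    return (growth < 0).astype(int)
```

The resulting series would enter the augmented VAR as an additional exogenous regressor.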

Discussion and conclusion
The paper examined alternative hypotheses that describe the relationship between public expenditure and output in a panel of 28 European economies over the 1995-2018 period using the panel data methods.
The respective variables were found to be cross-sectionally dependent (an expected result, given the tight and extensive political and economic integration between European economies), and thus the appropriate unit root and cointegration tests (Pesaran CADF and Westerlund) were used. The variables were characterised by a mix of $I(1)$ and $I(0)$ integration orders, and a long-run equilibrium relationship between public expenditure and output was established in every specification and expenditure category. The strong evidence of cointegration is an indication of robust results, as Wagner's hypothesis presumes a strong equilibrium relationship between public expenditure and GDP at earlier stages of development, and a weaker relationship at later stages. This result is in line with other studies of Wagner's hypothesis for the advanced and industrialised economies in the post-WWII period (the study of G7 economies by Kolluri et al., 2000; of six European countries by Thornton, 1999; and of Italy by Magazzino, 2012).
The evidence of cointegration was somewhat weaker in the social protection expenditure category.
While, as noted by Shelton (2007), population ageing in European countries has necessitated greater social security and protection expenditure and thus a stronger relationship with GDP (per capita), the previously strong relationship between social expenditure and GDP in the well-established welfare states of Europe has weakened in recent decades, owing to structural change in the economy and to the reforms of the social security and welfare systems that these economies have been undergoing (Carter, 2003; Kuckuck, 2014, p. 150).
There are multiple policy implications of the results. Firstly, the strong evidence of cointegration between output and expenditure and the absence of causality from government expenditure to output along Keynesian lines may imply limited effectiveness of short-term cuts and rises in government expenditure, given that expenditure will tend to return to a long-run equilibrium level. In this context, as noted by Magazzino (2011) and Akitoby et al (2006), structural reforms would be needed to help achieve the social and economic objectives that fiscal policy is supposed to accomplish. Secondly, the strong output-to-expenditure causality along Wagner's lines and sustained public expenditure were observed despite the slowdown of European economic growth in recent decades. The factors that helped to maintain strong long-term relationships and causality along Wagner's lines included increased revenue collection by governments, a decrease in the size of the shadow economy, and the growing demand for public expenditure in certain functional areas, e.g. the increase in health spending due to new health threats, or the projected expansion of defence expenditure due to the uncertain geopolitical situation in Europe.
Thirdly, as argued by Afonso et al (2005), Wagner's relationship necessarily weakens over time, owing to the improvement in the quality of goods and services provided by the public sector, to the higher efficiency of the sector and to a slower increase (and perhaps a decrease) in spending in the industrial economies. The analysis conducted in this paper for the sub-periods did not demonstrate any weakening of the 'output-spending' cointegration in later periods. Fourthly, as noted by Benavides et al (2013, pp. 71-72), causality along Wagner's as opposed to Keynes' lines points to the limited role of political factors and political decision-making in the determination of economic outcomes. This empirical result would be contrary to the actual experience of the developed economies in the post-WWII period (expansion of public policies, growth of bureaucracy and the predominance of redistribution as opposed to efficiency objectives and logic in public policies), and to the premises of the public choice and new political economy theories (Buchanan, Tullock, 1977). Lastly, the identified weakness of the Keynesian hypothesis may be attributed to the crowding out of private investment by public spending, a phenomenon that has been well documented in the literature (Aschauer, 1989; Erden, Holcombe, 2005). The relevant policy implication is the absence of any automatic positive effect of greater public spending on growth and productivity. In a related vein, the weakness of Keynesian causality may be explained by the weakening of the positive effects of public spending on total factor productivity that has been observed since the 1980s. With regard to both phenomena, Lachler and Aschauer (1998) stressed the importance of a savings-financed increase in public spending and capital (as opposed to an increase in public consumption that merely brings greater public debt and higher current and future taxes) for the full manifestation of the positive gains of fiscal effort.
Future research may modify the present model and setting by considering the cyclical components of government expenditure and output and alternative channels through which expenditure is directed in the economy, by incorporating the effects of discretionary fiscal policy, and by investigating Wagner's hypothesis at the level of sub-national finances (Kucukkale, Yamak, 2012; Inshauspe et al, 2020). Future empirical studies may consider the role of public enterprises (in addition to the level of spending), the role of government regulation and legislative activity as a proxy or measure of government size, and the breaks and non-linearities in the relationship (in line with the Armey-Rahn hypothesis of the optimal size of government). They may also continue to experiment with alternative functional forms of the relationship and alternative definitions of the dependent and independent variables (Peacock, Scott, 2000).
The bulk of the research on Wagner's hypothesis has focused on variables (e.g. public spending or the scope of the legislative activity of the government) that are amenable to quantitative analysis. A more important issue is the nature of the government itself. As noted by Lamartina and Zaghini (2011), the pursuit of self- or vested interests (alongside corruption and moral hazard behaviour) by the bureaucracy would render the expansion of government involvement in social and economic affairs undesirable, notwithstanding the growth of the economy and the objective need for 'bigger' government. In a related vein, the growth of spending on public services and administration that is motivated by self-interest in the expansion of bureaucratic prominence and functions, and the proliferation of government programmes due to lobbying efforts, are examples of public spending growth that takes place irrespective of output or economic growth (Niskanen, 1971). Wagner's hypothesis is thus only a partial explanation of public sector growth.