*by R. K. Sinha*^{^}

*The article analyses the statistical characteristics of the Consumer Price Index-Combined (CPI-C) based inflation and inflation expectations datasets and identifies suitable statistical distributions for them. Identifying appropriate distributions makes it possible to establish a one-to-one mapping between these distributions. The mapping converts a data point of one dataset into the corresponding data point of the other. These models have the potential to forecast inflation and are also potentially useful for measuring Inflation-at-Risk (IaR).*

I. Introduction

The CPI-C based inflation data are published by the Ministry of Statistics and Programme Implementation (MoSPI) together with granular-level data. One type of granularity is by product item at the all-India level; another is by product group and sub-group across States/Union Territories (UTs) and regions (Rural/Urban). The Reserve Bank conducts the Inflation Expectations Survey of Households (IESH), which captures the expectations of the respondents (surveyed households) on inflation for the near term. Such surveys are known internationally for their biases, and accordingly, the levels of inflation expectations often differ from realised inflation. Nevertheless, they have proved very useful for tracking directional changes. Several recent studies (Das et al., 2019; Shaw, 2019; Muduli et al., 2022) have attempted to assess the inherent biases in such surveys and to remove them, so that inflation and inflation expectations can be compared meaningfully. In this article, we carry out a comparative study of the statistical characteristics of the entire distributions of the datasets of actual inflation from MoSPI and of inflation expectations^{1} of the surveyed households, rather than just modelling and mapping the central tendencies of the two datasets.
It may be noted that comparing and modelling aggregate inflation/inflation expectation numbers often loses information inherent in the datasets, as these are derived numbers. The article is divided into five sections. After the introductory section, the datasets of inflation and inflation expectations are described in the second and third sections, respectively. The fourth section connects the findings of these two sections through suitable mappings and suggests their possible uses. The last section concludes the article.

II. Statistical Characteristics of CPI-C based Inflation Dataset

The data on CPI-C based inflation (aggregate as well as granular level) are published by MoSPI at a monthly frequency. Statistically, the mean inflation of the aggregate and granular-level datasets over the same period should match closely, while the standard deviation (SD) of the granular data can be expected to be higher than that of the aggregate data, as each aggregate data point is essentially an average of the corresponding granular data points. For the period January 2014 to June 2023, the modal inflation of the aggregate data falls in the band of 5 per cent to 6 per cent, whereas for the disaggregated data it falls in the band of 4 per cent to 5 per cent. The greater variability in the granular data reflects individual product-level shocks, which can be favourable (bringing aggregate inflation towards the target) or adverse (moving aggregate inflation away from the target). The lowest and highest aggregate inflation during January 2014 to June 2023 were 1.46 per cent (June 2017) and 8.60 per cent (January 2014), respectively (Chart 1). The distribution of granular-level inflation has varied significantly across months, driven by the relative presence of extreme values. We analyse the statistical properties of the granular dataset^{2} over the period January 2014 to June 2023.
At first sight, the disaggregated CPI-C dataset may appear to share some characteristics of a normal distribution (bell curve).^{3} However, the dataset is highly leptokurtic, i.e., more peaked than the normal, with a kurtosis of 15.856 (against 3 for a normal distribution). The distribution appears visually more or less symmetric, although it has a mild positive skewness of 0.869. The best-fit normal distribution, N(μ = 5.0430, σ = 7.1185), is also plotted; it fits poorly, under-estimating the density around the centre and in the extremes and compensating by over-estimating it in between (Chart 2). The underlying leptokurtic dataset has fat tails, with around 2.5 per cent of observations in each extreme, i.e., inflation below -10 per cent in the left tail and above 20 per cent in the right tail, representing severe shocks (Chart 1). As the normal distribution fails to explain the characteristics of the dataset, we search for other statistical distributions that may explain it. No single statistical distribution is found to explain the dataset adequately. The two best-fitting distributions are the Cauchy (μ = 4.7930, σ = 2.4758) and the Laplace (μ = 5.0430, λ = 0.1987), though they too do not fit the dataset adequately (Chart 2 and Table 1). It may be mentioned that the granular-level dataset is composed of various product/sub-product groups across regions (Rural/Urban) and States/UTs, which leads to wide heterogeneity. Fitting subsets of the data by product category, which accounts for more heterogeneity than region or State/UT, allows more precise modelling for some products. We also observe larger variations in the descriptive statistics of these subsets. For example, inflation in ‘cloth and footwear’ appeared closest to a bell curve (normal), while inflation in ‘housing’ hovered within a narrow range over time (Annex Chart A1 and Table A1).
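The single-distribution fits and Kolmogorov-Smirnov (K-S) tests behind Table 1 can be sketched with SciPy. This is not the author's code, and the article does not say which software or estimation method was used. Here the Normal and Laplace parameters are obtained by moment matching, which is an assumption; the reported Laplace rate is consistent with it, since a Laplace distribution with rate λ has SD √2/λ and √2/7.1185 ≈ 0.1987. The Cauchy is fitted by maximum likelihood, and the 5 per cent critical value uses the large-sample approximation 1.358/√n, which reproduces the critical value of 0.12832 reported for n = 112 in Table 3. The helper name `fit_single_distributions` is hypothetical, and the data are synthetic, not CPI-C.

```python
import numpy as np
from scipy import stats


def fit_single_distributions(x, crit_coef=1.358):
    """Fit Normal, Laplace and Cauchy distributions and run K-S tests.

    Assumption: Normal and Laplace use moment matching; the Laplace rate is
    lam = sqrt(2) / SD because a Laplace with rate lam has variance 2 / lam**2.
    Cauchy location and scale come from scipy's maximum-likelihood fit.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    mean, sd = x.mean(), x.std(ddof=1)
    lam = np.sqrt(2.0) / sd
    c_loc, c_scale = stats.cauchy.fit(x)
    candidates = {
        "Normal": (stats.norm(loc=mean, scale=sd), (mean, sd)),
        "Laplace": (stats.laplace(loc=mean, scale=1.0 / lam), (mean, lam)),
        "Cauchy": (stats.cauchy(loc=c_loc, scale=c_scale), (c_loc, c_scale)),
    }
    crit = crit_coef / np.sqrt(n)  # large-sample 5 per cent K-S critical value
    results = {}
    for name, (dist, params) in candidates.items():
        d = stats.kstest(x, dist.cdf).statistic
        results[name] = {"params": params, "ks": float(d), "reject": bool(d > crit)}
    return results, crit


# Synthetic heavy-tailed sample for illustration only (not CPI-C data).
rng = np.random.default_rng(0)
x_demo = 5.0 + 3.0 * rng.standard_t(df=2, size=5000)
results, crit = fit_single_distributions(x_demo)
print(f"critical value = {crit:.5f}")
for name, r in results.items():
    print(f"{name:8s} params = {np.round(r['params'], 4)}  K-S = {r['ks']:.5f}  reject = {r['reject']}")
```

With real data, the rejection flags should be read against the K-S column of Table 1.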
Table 1: Fitting of CPI-C Disaggregate Dataset through Suitable Statistical Distributions

| Name of the distribution | Estimated parameters | Kolmogorov-Smirnov (K-S) statistic | Rejected at 5 per cent level of significance |
|---|---|---|---|
| Normal | μ = 5.0430, σ = 7.1185 | 0.12413 | Yes |
| Cauchy | μ = 4.7930, σ = 2.4758 | 0.03687 | Yes |
| Laplace | μ = 5.0430, λ = 0.1987 | 0.06548 | Yes |

Note: Critical value of the K-S statistic at 5 per cent = 0.01944.
Source: Author’s calculations.

When a single distribution fails to fit the underlying dataset adequately, various studies have explored and demonstrated the use of mixture distributions, which can capture the characteristics of the dataset better. These mixture distributions can be constructed with or without identifying a threshold, i.e., a particular value of the random variable. The threshold approach partitions the dataset into two parts, which are modelled separately using different statistical distributions. For instance, Cooray and Ananda (2005) and Scollnik (2007) used the Lognormal-Pareto model, Ciumara (2006) and Scollnik and Sun (2012) applied the Weibull-Pareto model, and Nadarajah and Bakar (2014) suggested the Lognormal-Burr model, each mixing two distributions with a threshold. Other approaches mix two statistical distributions across the entire range without any threshold, using either fixed or dynamic weights (mixing parameters). Frigessi et al. (2002) demonstrated a dynamic mixture model for unsupervised tail estimation without threshold selection. The study used a Weibull-Pareto pair in which the weight assigned to the Weibull starts at one in the left part of the distribution and declines gradually towards zero in the right tail. All these studies used the same dataset, viz., the Danish fire loss data, a well-known insurance dataset with a heavy right tail.
Unlike the Danish fire loss data, which can have only one heavy tail because losses are bounded below at zero, the underlying inflation dataset has two distinct tails, which have varied significantly across months. We split the data into two parts, using inflation of 4 per cent as the threshold, and model the two parts separately. Before modelling, however, we transform the data as detailed below.

Let $\{x_L\}$ and $\{x_R\}$ be the data points of the initial granular inflation dataset in the ranges $(-\infty, 4)$ and $[4, \infty)$ per cent, respectively. We define:

$$X_L = 4 - x_L, \qquad X_R = x_R - 4.$$

Both $X_L$ and $X_R$ now range from 0 to ∞. Fitting the data, we find that the 3-parameter Burr and the 3-parameter Dagum^{4} are the two distributions that explain the characteristics of both parts adequately. We use the Burr distribution, and the rest of the analysis is centred on it. The Burr distribution is versatile and has been found suitable for many insurance datasets. Sastry and Sinha (2010) used a 4-parameter Burr distribution to describe the Danish fire loss data and found it competitive with several of the mixture distributions proposed for this dataset by the studies mentioned earlier in this section. The probability density function (pdf) of a 3-parameter Burr distribution is defined as:

$$f(x) = \frac{\alpha k \left(x/\beta\right)^{\alpha - 1}}{\beta \left[1 + \left(x/\beta\right)^{\alpha}\right]^{k+1}}, \qquad x > 0,$$

where k (>0) and α (>0) are the first and second shape parameters, respectively, and β (>0) is the scale parameter. The distribution function (df) of a 3-parameter Burr distribution is defined as:

$$F(x) = 1 - \left[1 + \left(x/\beta\right)^{\alpha}\right]^{-k}, \qquad x > 0.$$

The descriptive statistics (DS) of each part and the corresponding Burr fits are shown in Table 2; the fitted Burr distributions pass the K-S test at the 5 per cent level for both parts. This mixture approach is a marked improvement over the initial approach of using a single distribution.
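The split-and-transform step and the per-part Burr fits can be sketched as follows. The sketch assumes the standard Burr XII form F(x) = 1 − [1 + (x/β)^α]^(−k), which corresponds to SciPy's `burr12(c=α, d=k, scale=β)`. Maximum-likelihood estimation with the location fixed at zero is also an assumption, since the article does not state its estimation method. The data are synthetic, simulated from the Table 2 parameters purely to exercise the code, and the helper names are hypothetical.

```python
import numpy as np
from scipy import stats

THRESHOLD = 4.0  # per cent, the split point used in the article


def split_and_transform(x, u=THRESHOLD):
    """Return X_L = u - x (for x < u) and X_R = x - u (for x >= u); both are >= 0."""
    x = np.asarray(x, dtype=float)
    return u - x[x < u], x[x >= u] - u


def fit_burr3(y):
    """Maximum-likelihood fit of a 3-parameter Burr with location fixed at 0.

    Assumes F(y) = 1 - (1 + (y/beta)**alpha)**(-k), i.e. scipy's
    burr12(c=alpha, d=k, scale=beta).
    """
    c, d, _loc, scale = stats.burr12.fit(y, floc=0)
    return {"k": d, "alpha": c, "beta": scale}


# Synthetic data simulated from the Table 2 parameters (illustration only).
rng = np.random.default_rng(1)
n = 4000
n_left = int((rng.random(n) < 0.41015).sum())
x_sim = np.concatenate([
    THRESHOLD - stats.burr12.rvs(c=1.1129, d=2.2685, scale=5.6631, size=n_left, random_state=rng),
    THRESHOLD + stats.burr12.rvs(c=1.2135, d=2.7037, scale=7.9923, size=n - n_left, random_state=rng),
])

left, right = split_and_transform(x_sim)
weight_left = left.size / x_sim.size  # share of observations below the threshold
fit_left, fit_right = fit_burr3(left), fit_burr3(right)
print(f"weight of left part: {weight_left:.4f}")
print("left fit :", fit_left)
print("right fit:", fit_right)
```

Burr shape parameters can trade off against each other, so refitted values on a finite sample need not match the generating parameters closely.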
Now the derived variables ($X_L$ and $X_R$) are transformed back to the original variable (X), and the two fitted probability density functions are scaled by their respective weights and stitched together. The resulting single pdf, derived from the synthetic pair Burr (2.2685, 1.1129, 5.6631) and Burr (2.7037, 1.2135, 7.9923), integrates to unity with the appropriate weights^{5} and explains the data much better (Chart 3).

Table 2: Descriptive Statistics of the Two Parts of the Distribution

| DS | Left part ($X_L$) | Right part ($X_R$) |
|---|---|---|
| Weight in full data | 0.41015 | 0.58985 |
| Mean | 4.0977 (-0.0977 for X) | 4.6112 (8.6112 for X) |
| Range | 0 to 46.90 (-42.90 to +4.00 for X) | 0 to 89.19 (+4.00 to +93.19 for X) |
| SD | 5.4349 | 5.8530 |
| Skewness | 2.7690 | 4.1868 |
| Kurtosis | 11.933 | 31.462 |
| Fitted Burr parameters | k = 2.2685, α = 1.1129, β = 5.6631 | k = 2.7037, α = 1.2135, β = 7.9923 |
| K-S statistic (critical value at 5 per cent) | 0.01839 (0.02611) | 0.01381 (0.02384) |

Source: Author’s calculations.

If a distribution, whether a standalone distribution or a mixture, fails to capture the characteristics of the extreme tails adequately and precisely, its estimates of tail probabilities are neither reliable nor usable, as it may consistently under- or over-estimate them. In such cases, the alternative is to model the extreme tails exclusively with Extreme Value Theory (EVT) tools. In the current case, the synthetic Burr appears to fit the entire curve well, including the tails (Chart 3). We examine this statistically and explore whether EVT tools would add value in this context.
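Going back to the stitching step described earlier in this section, a short sketch with the Table 2 parameters shows how the two Burr pieces map back to the inflation scale and why the stitched density integrates to one. Each piece is a proper density on [0, ∞), the transformations x = 4 − X_L and x = 4 + X_R have a unit Jacobian, and the weights 0.41015 and 0.58985 sum to one. The mapping onto SciPy's `burr12` (c = α, d = k, scale = β) is an assumption, as is the use of numerical integration for the check; the function names are illustrative.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

U = 4.0                              # threshold (per cent)
W_LEFT, W_RIGHT = 0.41015, 0.58985   # weights from Table 2
# Assumed mapping: Burr(k, alpha, beta) -> scipy burr12(c=alpha, d=k, scale=beta)
LEFT = stats.burr12(c=1.1129, d=2.2685, scale=5.6631)   # models X_L = 4 - x
RIGHT = stats.burr12(c=1.2135, d=2.7037, scale=7.9923)  # models X_R = x - 4


def stitched_pdf(x):
    """Weighted Burr densities mapped back to the inflation scale (unit Jacobian)."""
    x = np.asarray(x, dtype=float)
    return np.where(x < U, W_LEFT * LEFT.pdf(U - x), W_RIGHT * RIGHT.pdf(x - U))


def stitched_cdf(x):
    """P(X <= x): the left piece uses P(X_L >= 4 - x); the right piece adds W_LEFT."""
    x = np.asarray(x, dtype=float)
    return np.where(x < U, W_LEFT * LEFT.sf(U - x), W_LEFT + W_RIGHT * RIGHT.cdf(x - U))


# Numerical check that each piece carries its weight and the total mass is one.
mass_left = quad(lambda t: float(stitched_pdf(t)), -np.inf, U)[0]
mass_right = quad(lambda t: float(stitched_pdf(t)), U, np.inf)[0]
print(f"mass left = {mass_left:.5f}, mass right = {mass_right:.5f}, total = {mass_left + mass_right:.5f}")
print("P(X < 0) =", float(stitched_cdf(0.0)), " P(X > 10) =", 1 - float(stitched_cdf(10.0)))
```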
The distribution of excesses over a high threshold u in the right tail of inflation is defined as:

$$F_u(x_R) = P\left(X - u \le x_R \mid X > u\right), \qquad 0 \le x_R < \infty.$$

The distribution of excesses is the probability that inflation (X) exceeds the threshold u by at most an amount $x_R$, where $x_R = x - u$, given that X exceeds u. In terms of the underlying distribution function, it can be written as:

$$F_u(x_R) = \frac{F(x_R + u) - F(u)}{1 - F(u)}.$$

The functions $F_u(x_R)$ and $F(x_R + u)$ are the conditional and unconditional distribution functions, respectively. $F(x_R + u)$ is equivalent to $F(x)$, as $x_R = x - u$, and $F(u)$ is the cumulative probability at the threshold u. The underlying distribution function may have an infinite right endpoint, i.e., it allows arbitrarily large inflation values, albeit with very small probabilities. Similarly, the distribution of shortfalls below a low threshold u in the left tail of inflation is defined as:

$$F_u(x_L) = P\left(u - X \le x_L \mid X < u\right), \qquad 0 \le x_L \le u - x_{min},$$

where $x_{min}$ is the smallest observation. The distribution of shortfalls is the probability that inflation (X) falls short of the threshold u by at most an amount $x_L$, where $x_L = u - x$, given that X falls short of u. In terms of the underlying distribution function:

$$F_u(x_L) = \frac{F(u) - F(u - x_L)}{F(u)}.$$

EVT considers only the few largest (or smallest) observations at the extreme ends of the dataset, not the complete dataset. It deals with conditional probabilities, for example, the probability that inflation exceeds 25 per cent given that it is above 10 per cent. The challenge with EVT is determining the threshold. Ideally, a higher threshold is preferred, as the limiting results of EVT apply more accurately further out in the tail. However, as the threshold increases, the modeller is left with very few observations, which raises questions about the reliability of the probability estimates. Accordingly, an optimum threshold has to be found. There are a few standard techniques to determine the threshold statistically.
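The conditional distributions of excesses and shortfalls can be evaluated from any unconditional CDF F. The sketch below uses the stitched Burr model with the Table 2 parameters as F, which is an illustrative choice. It computes the example in the text, P(X > 25 | X > 10) = 1 − F_u(15) with u = 10. The SciPy parameterisation (`burr12` with c = α, d = k, scale = β) is an assumption, and the printed probabilities are model outputs, not figures reported in the article.

```python
import numpy as np
from scipy import stats

U0 = 4.0                             # split point of the stitched model
W_LEFT, W_RIGHT = 0.41015, 0.58985   # Table 2 weights
LEFT = stats.burr12(c=1.1129, d=2.2685, scale=5.6631)   # assumed mapping, Table 2
RIGHT = stats.burr12(c=1.2135, d=2.7037, scale=7.9923)


def F(x):
    """Unconditional CDF of the stitched Burr model."""
    x = float(x)
    if x < U0:
        return W_LEFT * LEFT.sf(U0 - x)
    return W_LEFT + W_RIGHT * RIGHT.cdf(x - U0)


def excess_cdf(x_r, u):
    """F_u(x_R) = [F(x_R + u) - F(u)] / [1 - F(u)] for the right tail."""
    return (F(x_r + u) - F(u)) / (1.0 - F(u))


def shortfall_cdf(x_l, u):
    """F_u(x_L) = [F(u) - F(u - x_L)] / F(u) for the left tail."""
    return (F(u) - F(u - x_l)) / F(u)


# P(inflation > 25 | inflation > 10) = 1 - F_u(15) with u = 10
p_exceed = 1.0 - excess_cdf(15.0, 10.0)
# P(inflation < -10 | inflation < 0) = 1 - F_u(10) with u = 0
p_short = 1.0 - shortfall_cdf(10.0, 0.0)
print(f"P(X > 25 | X > 10) = {p_exceed:.4f}")
print(f"P(X < -10 | X < 0) = {p_short:.4f}")
```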
One such technique is plotting the mean excess function. If u is the threshold, the mean excess function e(u) can be estimated as the sum of exceedances (or shortfalls, for the left tail) over the threshold u divided by the number of data points exceeding u:

$$\hat{e}(u) = \frac{\sum_{i=1}^{n} (x_i - u)\,\mathbf{1}\{x_i > u\}}{\sum_{i=1}^{n} \mathbf{1}\{x_i > u\}}.$$

In other words, the mean excess function is the expected overshoot of a threshold, given that the threshold is exceeded. For the right tail, an upward trend in the mean excess function may indicate heavy-tailed data, a horizontal line may suggest exponentially distributed data, and a downward trend may indicate short-tailed data. The data may follow a Generalised Pareto Distribution (GPD) if the empirical mean excess function shows an upward trend, in particular a positively sloped straight line (McNeil, 1997). We compute the mean excess function for our dataset^{6} and choose thresholds by inspecting its plot. The mean excess plot (Chart 4) is a clear upward-sloping line that remains more or less linear and does not diverge across the range of thresholds, indicating that the GPD is potentially an appropriate choice for fitting the exceedances at various possible thresholds. We examine the appropriateness of the GPD in our context and compare it with the fitted synthetic Burr distribution. We find that the GPD is inferior to the synthetic Burr: it fits poorly at many thresholds (especially at lower values of u) and has higher K-S statistics. The GPD fit improves as the threshold increases and converges towards the Burr fit, but does not improve upon it. We demonstrate this for the right tail of the data (Chart 5). We now move to the distribution of inflation in the aggregate data.
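Before moving on, the mean excess function and the GPD check on exceedances can be sketched as follows. The sketch uses SciPy's `genpareto` with the location fixed at zero and the K-S statistic as the fit measure. The synthetic sample, drawn from the right-part Burr of Table 2 shifted to start at 4 per cent, and the grid of thresholds are illustrative assumptions, not the author's dataset or thresholds. Function names are hypothetical.

```python
import numpy as np
from scipy import stats


def mean_excess(x, thresholds):
    """Empirical mean excess e(u): mean of (x_i - u) over observations with x_i > u."""
    x = np.asarray(x, dtype=float)
    out = []
    for u in thresholds:
        exceed = x[x > u] - u
        out.append(exceed.mean() if exceed.size else np.nan)  # NaN if nothing exceeds u
    return np.array(out)


def gpd_fit_over(x, u):
    """Fit a GPD (location fixed at 0) to exceedances over u; return shape, scale, K-S, count."""
    x = np.asarray(x, dtype=float)
    exceed = x[x > u] - u
    xi, _loc, sigma = stats.genpareto.fit(exceed, floc=0)
    ks = stats.kstest(exceed, stats.genpareto(xi, loc=0, scale=sigma).cdf).statistic
    return xi, sigma, float(ks), exceed.size


# Synthetic right-tail sample (illustration only): 4 + Burr(2.7037, 1.2135, 7.9923).
x_demo = 4.0 + stats.burr12.rvs(c=1.2135, d=2.7037, scale=7.9923, size=4000, random_state=7)
thresholds = np.arange(5.0, 30.0, 5.0)
e_hat = mean_excess(x_demo, thresholds)
for u, e in zip(thresholds, e_hat):
    xi, sigma, ks, m = gpd_fit_over(x_demo, u)
    print(f"u = {u:5.1f}  e(u) = {e:6.2f}  GPD xi = {xi:.3f}  sigma = {sigma:.3f}  K-S = {ks:.4f}  n = {m}")
```

An upward-sloping e(u) across thresholds is the visual signal discussed above; the printed K-S statistics show how the GPD fit changes as u rises.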
As observed earlier (Chart 1), the distribution of CPI-C based aggregate inflation has much shorter tails than the granular-level inflation distribution. The distribution is almost symmetric and platykurtic (less peaked than normal). The Log-Pearson Type III distribution is identified as the closest representation of the aggregate-level inflation data (Table 3 and Chart 6).

Table 3: Descriptive Statistics and Fitting of CPI-C Aggregate Inflation

| DS | Estimates |
|---|---|
| Sample size (N) | 112 |
| Mean | 5.1039 |
| Range | 7.14 (1.46 to 8.60) |
| SD | 1.5925 |
| Skewness | 0.0653 |
| Kurtosis | 2.4621 |
| Fitting | 3-parameter Log-Pearson Type III: α = 6.4118, β = -0.1371, γ = 2.4545; K-S statistic = 0.03671 (critical value at 5 per cent = 0.12832) |

Note: The data period is from January 2014 to June 2023. The data for April 2020 and May 2020, which were imputed and published by MoSPI subsequently, are not included.
Source: Author’s calculations.

The quantile-quantile (QQ) plot follows a straight line, highlighting the appropriateness of the 3-parameter Log-Pearson Type III distribution for CPI-C headline inflation (Chart 7). We also analyse how the inflation distribution evolves as each new monthly data point is added to the CPI-C aggregate data.^{7}

Evolution and Stabilisation of Statistical Moments of Inflation (January 2014 to June 2023)

The mean inflation^{8} declined more or less consistently from January 2014 to September 2019, reaching a trough of 4.54 per cent, and rose gradually thereafter to 5.10 per cent in June 2023. The SD of the distribution appears to be settling at around 1.6 per cent (Chart 8). The skewness of the distribution has declined gradually towards zero in recent years, making the distribution nearly symmetric.
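Tracking how the moments evolve as each new monthly observation arrives amounts to computing statistics over an expanding window. The sketch below reports Pearson (non-excess) kurtosis, so that a normal distribution has kurtosis 3; this is consistent with Table 3 describing a kurtosis of 2.4621 as platykurtic. The article does not state whether bias-corrected moment estimators were used, so SciPy's default (biased) skewness and kurtosis estimators are an assumption. The series is synthetic, and the helper name is hypothetical.

```python
import numpy as np
from scipy import stats


def expanding_moments(series, min_obs=3):
    """Mean, SD, skewness and Pearson kurtosis over expanding windows.

    Row t uses the first t observations; kurtosis is non-excess (normal = 3).
    """
    x = np.asarray(series, dtype=float)
    rows = []
    for t in range(min_obs, x.size + 1):
        w = x[:t]
        rows.append((t, w.mean(), w.std(ddof=1), stats.skew(w), stats.kurtosis(w, fisher=False)))
    return rows


# Synthetic monthly series of 112 points (illustration only, not CPI-C).
rng = np.random.default_rng(3)
monthly = rng.normal(loc=5.0, scale=1.6, size=112)
rows = expanding_moments(monthly)
for t, mean, sd, skew, kurt in rows[::20]:
    print(f"n = {t:3d}  mean = {mean:5.2f}  SD = {sd:4.2f}  skew = {skew:6.3f}  kurtosis = {kurt:5.2f}")
```

Early rows rest on very few observations, which is why skewness and kurtosis look unstable at the start of such a plot.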
The distribution, which had been leptokurtic (more peaked than normal) from December 2017 to January 2020, turned platykurtic (less peaked than normal) again in February 2020, just before the emergence of COVID-19. The unstable values of skewness and kurtosis during 2014 reflect the small sample size at that time. Moreover, these moments do not yet appear to have converged precisely at the current sample size (Chart 9). The following section explores the statistical properties of inflation expectations sourced from the Inflation Expectations Survey of Households (IESH) conducted by the Reserve Bank. There are other sources of inflation expectations/forecasts, such as the Survey of Professional Forecasters (SPF), also conducted by the Reserve Bank. As mentioned in the introductory section, we restrict the analysis to IESH in the current context.

III. Statistical Characteristics of the Inflation Expectations Dataset

The IESH is a bimonthly survey in which qualitative and quantitative expectations on inflation are sought from around 6,000 households^{9} in select urban cities. Here, we analyse only the quantitative inflation expectations of households, which are captured for three time horizons: the current period, 3 months ahead and 12 months ahead. We consider the IESH dataset from March 2014 (Round 35) to May 2023 (Round 71B), including the two bimonthly surveys conducted every year in addition to the four quarterly surveys. Besides web-releasing the summary (aggregate) data, the Reserve Bank also releases granular (unit-level) data on inflation expectations. Accordingly, we analyse both datasets, as was done for CPI-C based inflation in the previous section. Households generally report higher inflation than the realised inflation.
Moreover, they tend to report higher inflation for the 3-month-ahead horizon, and higher still for the 12-month-ahead horizon, than for the current period (Table 4). We find that the 4-parameter Burr distribution explains the IESH unit-level data adequately, just as a mixture of two 3-parameter Burr distributions was found in the previous section to explain the distribution of CPI-C granular inflation well. The IESH granular dataset shows that households prefer to report inflation in round numbers. This preference leads to a bunching of frequencies at round numbers, which distorts the distribution. Further, unlike CPI-C granular inflation, the IESH granular inflation expectations have only one tail, viz., the right tail. The lowest response band (inflation below one per cent) is theoretically unbounded below, but respondents are likely to interpret it as between zero and one per cent, so it does not produce an extreme left tail; indeed, this band contains few responses, except in the 12-month-ahead dataset. The descriptive statistics and fitted distributions are provided in Table 4 and Chart 10, respectively.

Table 4: Descriptive Statistics and Fitting of IESH Granular Data

| DS | Current period | 3-month ahead | 12-month ahead |
|---|---|---|---|
| Mean | 11.765 | 13.150 | 13.849 |
| Median | 8.50 | 9.50 | 10.50 |
| Range | 99.50 (0.50 to 100.00) | 98.50 (0.50 to 99.00) | 99.50 (0.50 to 100.00) |
| SD | 11.241 | 12.689 | 14.131 |
| Skewness | 3.1441 | 2.9822 | 2.7168 |
| Kurtosis | 15.916 | 14.001 | 12.163 |
| Fitted 4-parameter Burr | k = 0.2894, α = 8.0122, β = 9.8506, γ = -4.7867 | k = 0.3447, α = 6.3927, β = 10.1440, γ = -4.2926 | k = 0.9789, α = 2.4867, β = 11.8570, γ = -2.0122 |
| K-S statistic (critical value at 5 per cent) | 0.09987 (0.13675) | 0.07008 (0.13446) | 0.08236 (0.13446) |

Source: Author’s calculations.
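The 4-parameter Burr in Table 4 adds a location parameter γ to the 3-parameter form. Assuming it maps onto SciPy's `burr12(c=α, d=k, loc=γ, scale=β)`, the fitted medians can be compared with the observed medians in Table 4. For the current period, the fitted median is about 8.34 against an observed 8.50. The `round_number_share` helper is a hypothetical illustration of the round-number bunching discussed above and is not part of the article's method.

```python
import numpy as np
from scipy import stats

# Table 4: (k, alpha, beta, gamma) of the 4-parameter Burr for IESH unit-level data.
TABLE4 = {
    "current": (0.2894, 8.0122, 9.8506, -4.7867),
    "3-month ahead": (0.3447, 6.3927, 10.1440, -4.2926),
    "12-month ahead": (0.9789, 2.4867, 11.8570, -2.0122),
}
OBSERVED_MEDIAN = {"current": 8.50, "3-month ahead": 9.50, "12-month ahead": 10.50}


def burr4(k, alpha, beta, gamma):
    """Assumed mapping to scipy: burr12(c=alpha, d=k, loc=gamma, scale=beta)."""
    return stats.burr12(c=alpha, d=k, loc=gamma, scale=beta)


def round_number_share(responses, base=5.0):
    """Hypothetical helper: share of responses that are exact multiples of `base`."""
    r = np.asarray(responses, dtype=float)
    return float(np.mean(np.isclose(np.mod(r, base), 0.0)))


for horizon, params in TABLE4.items():
    dist = burr4(*params)
    print(f"{horizon:15s} fitted median = {dist.median():6.2f}  observed median = {OBSERVED_MEDIAN[horizon]:5.2f}")

print("share of multiples of 5 in a toy sample:", round_number_share([5, 7.5, 10, 12, 15]))
```

A smooth Burr cannot reproduce the spikes at multiples of 5, which is one reason a mixture might fit better.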
Chart 10 shows the respondents' preference for round numbers, particularly multiples of 5 (5, 10, 15, 20 and so on). The distribution is highly positively skewed as well as highly leptokurtic for all three datasets (current, 3-month-ahead and 12-month-ahead inflation). The 4-parameter Burr passes the goodness-of-fit test at the 5 per cent level for all three datasets, although a better fit may still be feasible through a mixture distribution that accounts for the respondents' preference for round numbers. This is not attempted in the current context, though.

Now, we move to the fitting of IESH aggregate data. The summary statistics of the IESH data are released for two measures of central tendency (mean^{10} and median). We find that the 4-parameter Johnson SB distribution^{11} tracks the distribution of the mean inflation of IESH well. The descriptive statistics and fitting are presented in Table 5 and Chart 11, respectively.

Table 5: Descriptive Statistics and Fitting of IESH Aggregate Data (Mean)

| DS | Current period | 3-month ahead | 12-month ahead |
|---|---|---|---|
| N | 51 | 51 | 51 |
| Mean | 9.284 | 9.961 | 10.051 |
| Range | 5.40 (7.30 to 12.70) | 4.80 (7.90 to 12.70) | 5.20 (8.30 to 13.50) |
| SD | 1.246 | 1.258 | 1.171 |
| Skewness | 0.7234 | 0.2401 | 0.9043 |
| Kurtosis | 3.7781 | 2.2039 | 4.0463 |
| Fitted 4-parameter Johnson SB | γ = 5.0416, δ = 2.8226, λ = 27.7870, ξ = 5.1452 | γ = 0.3289, δ = 0.9778, λ = 6.0331, ξ = 7.3584 | γ = 3.0075, δ = 1.8969, λ = 15.1540, ξ = 7.2878 |
| K-S statistic | 0.08065 | 0.06640 | 0.08959 |

Note: Critical value at 5 per cent = 0.18659 (the same for all the datasets, as these are ungrouped datasets with the same sample size).
Source: Author’s calculations.
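The Johnson SB distribution has bounded support [ξ, ξ + λ], with z = γ + δ ln[(x − ξ)/(ξ + λ − x)] following a standard normal distribution. Assuming this standard form, the Table 5 parameters translate to SciPy's `johnsonsb(a=γ, b=δ, loc=ξ, scale=λ)`. With that translation, the fitted medians for the current and 3-month-ahead horizons (about 9.13 and 9.87) match the 0.50 row of Table 6. The helper name `johnson_sb` is illustrative.

```python
from scipy import stats

# Table 5: (gamma, delta, lambda, xi) of the 4-parameter Johnson SB for IESH mean data.
TABLE5 = {
    "current": (5.0416, 2.8226, 27.7870, 5.1452),
    "3-month ahead": (0.3289, 0.9778, 6.0331, 7.3584),
    "12-month ahead": (3.0075, 1.8969, 15.1540, 7.2878),
}


def johnson_sb(gamma, delta, lam, xi):
    """Assumed mapping: z = gamma + delta*ln((x-xi)/(xi+lam-x)) ~ N(0,1)
    corresponds to scipy's johnsonsb(a=gamma, b=delta, loc=xi, scale=lam)."""
    return stats.johnsonsb(gamma, delta, loc=xi, scale=lam)


for horizon, params in TABLE5.items():
    dist = johnson_sb(*params)
    lo, hi = dist.support()  # bounded support [xi, xi + lambda]
    print(f"{horizon:15s} support = [{lo:.2f}, {hi:.2f}]  median = {dist.median():.2f}  mean = {dist.mean():.3f}")
```

The bounded support is worth keeping in mind when reading tail quantiles of the fitted IESH distributions: no quantile can exceed ξ + λ.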
The distribution of the mean inflation of IESH is positively skewed, unlike CPI-C aggregate inflation, which was found to be almost symmetric. Having analysed the statistical properties of inflation and inflation expectations at the granular and aggregate levels, we attempt to map them in the following section.

IV. Mapping of Distributions

The findings of Sections II and III reveal that the statistical moments of the distributions of the analysed datasets differ significantly from each other. The findings are summarised in Chart 12. The mappings collated in Chart 12 establish a correspondence between each pair of distributions. For example, a data point of IESH (aggregate) current inflation, which follows the 4-parameter Johnson SB (γ = 5.0416, δ = 2.8226, λ = 27.7870, ξ = 5.1452), corresponds to a data point of CPI-C (aggregate) inflation, which follows the Log-Pearson Type III (α = 6.4118, β = -0.1371, γ = 2.4545). Such a functional relationship between two datasets can be used in many ways. A simple approach is to map through the cumulative distribution function (CDF). These mappings are potentially useful because IESH is forward-looking, whereas CPI-C realised inflation is known only ex post. Accordingly, the mapping has the potential to forecast inflation. CPI-C inflation can be forecast from IESH data under two mappings, direct and indirect, as described below.

Direct mapping

Direct mapping is based on aggregate numbers and does not use granular-level information. Let X and Y be random variables representing realised inflation and 3-month-ahead IESH inflation expectations, respectively, both at the aggregate level. A particular value y of the 3-month-ahead inflation expectation can be mapped to a value x of realised inflation by equating the two CDFs:

$$F_{\text{Johnson SB}}(y) = F_{\text{log-Pearson III}}(x) \quad\Longrightarrow\quad x = F^{-1}_{\text{log-Pearson III}}\left(F_{\text{Johnson SB}}(y)\right).$$
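In code, equating the two CDFs takes only a few lines. The sketch assumes the standard Log-Pearson Type III form, in which (ln x − γ)/β follows a Gamma distribution with shape α. Because β is negative in Table 3, inflation rises as that Gamma variable falls, so the CDF uses the Gamma survival function. The Johnson SB translation to SciPy is likewise an assumption, and the function names are illustrative. With the published parameters, the sketch gives F_JSB(10.20) ≈ 0.5853 and a mapped CPI-C inflation close to the 5.43 per cent reported in the worked example. Small differences reflect rounding of the published parameters.

```python
import numpy as np
from scipy import stats

# 3-month-ahead IESH mean: Johnson SB (gamma, delta) with loc = xi, scale = lambda (Table 5)
JSB = stats.johnsonsb(0.3289, 0.9778, loc=7.3584, scale=6.0331)
# CPI-C aggregate: Log-Pearson Type III (alpha, beta, gamma) from Table 3
LP3_ALPHA, LP3_BETA, LP3_GAMMA = 6.4118, -0.1371, 2.4545


def lp3_cdf(x, a=LP3_ALPHA, b=LP3_BETA, g=LP3_GAMMA):
    """Assumed form: T = (ln x - g) / b ~ Gamma(a). With b < 0, X rises as T falls."""
    t = (np.log(x) - g) / b
    return stats.gamma.sf(t, a) if b < 0 else stats.gamma.cdf(t, a)


def lp3_ppf(p, a=LP3_ALPHA, b=LP3_BETA, g=LP3_GAMMA):
    """Inverse of lp3_cdf."""
    t = stats.gamma.isf(p, a) if b < 0 else stats.gamma.ppf(p, a)
    return np.exp(g + b * t)


def direct_map(y):
    """Map an IESH 3-month-ahead mean y to a realised CPI-C inflation value x."""
    p = JSB.cdf(y)          # step 1: cumulative probability of y
    return p, lp3_ppf(p)    # step 2: invert the Log-Pearson III CDF


p, x_hat = direct_map(10.20)
print(f"F_JSB(10.20) = {p:.5f}; mapped CPI-C inflation = {x_hat:.2f} per cent")
```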
The mapping involves two steps:

1. Compute the cumulative probability $p = F_{\text{Johnson SB}}(y)$ of the observed 3-month-ahead inflation expectation y.
2. Compute $x = F^{-1}_{\text{log-Pearson III}}(p)$, where $F^{-1}$ denotes the inverse CDF.

For example, the 72nd round of IESH, conducted during July 1-10, 2023, indicated a mean of 10.20 per cent for 3-month-ahead inflation (i.e., the expectation for October 2023). Using the 4-parameter Johnson SB (γ = 0.3289, δ = 0.9778, λ = 6.0331, ξ = 7.3584) distribution of 3-month-ahead inflation in IESH, we compute $F_{\text{Johnson SB}}(10.20) = 0.58525$. We then compute x by solving $x = F^{-1}_{\text{log-Pearson III}}(0.58525)$, which gives an estimate of x = 5.43 per cent. This mapping could serve as an alternative to the traditional econometric models commonly used to forecast inflation from forward-looking inflation expectations. Using this approach with the identified distributions and estimated parameters, we estimate inflation for the months since March 2014, excluding months for which either inflation or inflation expectations are unavailable. Comparing these estimates, and those from a simple regression-based (econometric) approach, with realised inflation, we find the proposed approach quite competitive; it additionally provides valuable insights into the detailed profile of the datasets (Chart 13). More complex mappings of distributions could be done using copula^{12} functions, although they are not attempted in this article.

Indirect mapping

In direct mapping, one data point of one distribution is mapped to one data point of the other distribution, which suits the aggregate datasets, as there is only one (aggregate) number per month for inflation expectations and one for realised inflation. Instead of mapping single aggregate numbers, we can establish a mapping between the granular datasets of inflation expectations and realised inflation through many one-to-one mappings.
These could include mappings of inflation at the disaggregated level, such as an urban city (centre) in IESH versus the corresponding State in CPI-C. The indirect mapping could be complex and could be done in two stages, as below:

In the indirect mapping, it is feasible to analyse the granular-level distribution of a month's incoming (new) data and compare it with the historical distribution (covering several months) to identify changes in the shape of the distribution, i.e., in moments such as skewness and kurtosis. This is not possible with aggregate data, which consist of a single number per month. We also fit the 4-parameter Burr distribution to each survey round under study individually, to obtain round-wise parameter estimates as well as round-wise descriptive statistics (Annex Table A2). We find that the mean and standard deviation of the granular-level IESH data are positively correlated with realised CPI-C inflation, while the skewness and kurtosis are negatively correlated with it. Similarly, two parameters (α and β) appear to be positively linked with realised inflation, while the other two (k and γ) are inversely correlated. These indicators may be useful as input variables in econometric models for forecasting inflation (Annex Table A3). Identifying the one-to-one mappings in stage 1 that exhibit closer co-movement would be a useful exercise; it is not explored in the current article. The second stage of the indirect mapping is expected to be stronger, as it involves the same dataset.

Inflation-at-Risk (IaR)

The above mappings may also be useful in assessing Inflation-at-Risk (IaR)^{13}. As low inflation (the left tail) has not been a concern in the Indian context, we compute IaR at 95 per cent and 99 per cent for the CPI-C aggregate data using historical (observed) inflation, as well as using inflation expectations from IESH.
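IaR and the quantile mapping in Table 6 amount to evaluating inverse CDFs at chosen probabilities. The sketch below uses the same assumed Log-Pearson Type III form, with (ln x − γ)/β Gamma-distributed with shape α, and the assumed SciPy `johnsonsb` translation (a = γ, b = δ, loc = ξ, scale = λ). With the published parameters, it reproduces the IaR figures quoted in the following paragraph and the rows of Table 6 up to rounding. Names such as `quantile_map` are illustrative.

```python
import numpy as np
from scipy import stats

LP3 = (6.4118, -0.1371, 2.4545)  # CPI-C aggregate: (alpha, beta, gamma), Table 3
# IESH mean data, Johnson SB (gamma, delta) with loc = xi and scale = lambda, Table 5
JSB_CURRENT = stats.johnsonsb(5.0416, 2.8226, loc=5.1452, scale=27.7870)
JSB_3M = stats.johnsonsb(0.3289, 0.9778, loc=7.3584, scale=6.0331)


def lp3_ppf(p, alpha, beta, gamma):
    """Inverse CDF assuming (ln x - gamma)/beta ~ Gamma(alpha); beta < 0 reverses the order."""
    t = stats.gamma.isf(p, alpha) if beta < 0 else stats.gamma.ppf(p, alpha)
    return np.exp(gamma + beta * t)


def quantile_map(probs):
    """Rows of (p, CPI-C, IESH current, IESH 3-month ahead) quantiles, as in Table 6."""
    return [(p, lp3_ppf(p, *LP3), JSB_CURRENT.ppf(p), JSB_3M.ppf(p)) for p in probs]


iar_cpi = {lvl: lp3_ppf(lvl, *LP3) for lvl in (0.95, 0.99)}
iar_iesh_3m = {lvl: JSB_3M.ppf(lvl) for lvl in (0.95, 0.99)}
print("IaR (CPI-C):", {k: round(float(v), 2) for k, v in iar_cpi.items()})
print("IaR (IESH 3-month ahead):", {k: round(float(v), 2) for k, v in iar_iesh_3m.items()})
for row in quantile_map([0.01, 0.05, 0.25, 0.50, 0.75, 0.95, 0.99]):
    print("  ".join(f"{v:6.2f}" for v in row))
```

Because both quantile functions are monotone, reading across a row of the output gives the one-to-one correspondence between expectation and realised-inflation levels at the same probability.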
For the estimated distribution of CPI-C inflation, i.e., the Log-Pearson Type III (α = 6.4118, β = -0.1371, γ = 2.4545), the inverse CDF values $F^{-1}(0.95)$ and $F^{-1}(0.99)$, i.e., $\text{IaR}_{0.95}$ and $\text{IaR}_{0.99}$, are 7.84 per cent and 8.84 per cent, respectively. For the estimated distribution of 3-month-ahead IESH inflation expectations, i.e., the 4-parameter Johnson SB (γ = 0.3289, δ = 0.9778, λ = 6.0331, ξ = 7.3584), the corresponding values of $\text{IaR}_{0.95}$ and $\text{IaR}_{0.99}$ are 12.15 per cent and 12.70 per cent, respectively. Thus, $\text{IaR}_{0.95}$ = 12.15 per cent and $\text{IaR}_{0.99}$ = 12.70 per cent for 3-month-ahead inflation expectations are equivalent to $\text{IaR}_{0.95}$ = 7.84 per cent and $\text{IaR}_{0.99}$ = 8.84 per cent for realised inflation, respectively. The detailed quantile mapping of IESH current and 3-month-ahead inflation with CPI-C inflation is provided for completeness (Table 6). Similar computations could be carried out with granular-level data under the indirect mapping approach discussed earlier.

Table 6: Quantile Mapping of IESH Inflation Expectations and CPI-C Aggregate Inflation

| Quantiles | CPI-C aggregate inflation | IESH current inflation perception | IESH 3-month-ahead inflation expectation |
|---|---|---|---|
| 0.01 | 1.78 | 7.05 | 7.73 |
| 0.05 | 2.55 | 7.52 | 8.07 |
| 0.10 | 3.04 | 7.82 | 8.33 |
| 0.15 | 3.39 | 8.04 | 8.56 |
| 0.20 | 3.69 | 8.22 | 8.76 |
| 0.25 | 3.95 | 8.38 | 8.95 |
| 0.30 | 4.18 | 8.54 | 9.14 |
| 0.35 | 4.41 | 8.69 | 9.32 |
| 0.40 | 4.63 | 8.84 | 9.50 |
| 0.45 | 4.84 | 8.98 | 9.69 |
| 0.50 | 5.06 | 9.13 | 9.87 |
| 0.55 | 5.27 | 9.29 | 10.06 |
| 0.60 | 5.49 | 9.45 | 10.26 |
| 0.65 | 5.72 | 9.62 | 10.46 |
| 0.70 | 5.96 | 9.81 | 10.68 |
| 0.75 | 6.22 | 10.02 | 10.90 |
| 0.80 | 6.51 | 10.26 | 11.15 |
| 0.85 | 6.84 | 10.56 | 11.42 |
| 0.90 | 7.25 | 10.95 | 11.74 |
| 0.95 | 7.84 | 11.56 | 12.15 |
| 0.99 | 8.84 | 12.83 | 12.70 |

Source: Author’s calculations.

V.
Conclusion

The statistical properties of granular-level inflation and inflation expectation datasets are important and can be analysed through suitable statistical distributions. This article attempts to map the datasets of survey-based inflation expectations and actual inflation through their long-run statistical distributions, which appears to be an unexplored area of research. Variants of the Burr distribution are found to be appropriate for explaining the statistical characteristics of both granular-level datasets, viz., survey-based inflation expectations and realised inflation. Aggregating these datasets provides useful summary statistics such as headline inflation numbers. Survey-based inflation expectations are forward-looking and have been useful in forecasting short-term inflation, for which econometric tools are widely used; the functional relationships derived in this article through suitable statistical distributions may facilitate short-term forecasting as a non-econometric tool. Further, the round-wise estimated parameters for survey-based inflation expectations may also be used as inputs to suitable econometric models. The identified distributions can also be used to measure Inflation-at-Risk for both the observed inflation and the survey-based inflation expectations datasets.

References

Andrade, P., Ghysels, E. and Idier, J. (2012), “Tails of Inflation Forecasts and Tales of Monetary Policy”, Working Papers, Banque de France.

Carreau, J. and Bengio, Y. (2009), “A hybrid Pareto model for asymmetric fat-tailed data: the univariate case”, Extremes, 12, 53-76.

Cooray, K. and Ananda, M. M. A. (2005), “Modelling actuarial data with a composite Lognormal-Pareto model”, Scandinavian Actuarial Journal, (5), 321-334.

Das, A., Lahiri, K. and Zhao, Y. (2019), “Inflation expectations in India: learning from household tendency surveys”, International Journal of Forecasting, 35 (3), 980-993.

Frigessi, A., Haug, O. and Rue, H.
(2002), “A dynamic mixture model for unsupervised tail estimation without threshold selection”, Extremes, 5, 219-235.

McNeil, A. J. (1997), “Estimating the tails of loss severity distributions using extreme value theory”, ASTIN Bulletin, 27 (1), 117-137.

Muduli, S., Nadhanael, G. V. and Pattanaik, S. (2022), “Assessing inflation expectations adjusting for households’ biases”, Monthly Bulletin, Reserve Bank of India, December.

Nadarajah, S. and Bakar, S. A. A. (2014), “New Composite Models for the Danish Fire Insurance Data”, Scandinavian Actuarial Journal, 2014 (2), 180-187.

Salido, D. L. and Loria, F. (2021), “Inflation at Risk”, Federal Reserve Board, September 08.

Sastry, D. V. S. and Sinha, R. K. (2010), “A Revisit to Danish fire loss data”, Conference Proceedings, 12th Global Conference of Actuaries (GCA), Mumbai, India.

Scollnik, D. P. (2007), “On composite Lognormal-Pareto model”, Scandinavian Actuarial Journal, 2007 (1), 20-33.

Scollnik, D. P. and Sun, C. (2012), “Modelling with Weibull-Pareto models”, North American Actuarial Journal, 16 (2), 260-272.

Shaw, P. (2019), “Using rational expectations to predict inflation”, Reserve Bank of India Occasional Papers, 40 (1).

Sinha, R. K. (2023), “India’s Steady State Equilibrium Inflation: A Revisit”, Monthly Bulletin, Reserve Bank of India, May.

Annex

Table A1: CPI-C Granular Data (Goodness of Fit)

| Group | Mean | SD | Range | Skewness | Kurtosis | Best Fit (Estimated Parameters) | K-S Statistic | Critical Value at 5 per cent |
|---|---|---|---|---|---|---|---|---|
| 1. Food and Beverages | 4.72 | 9.46 | 136.09 | 0.8244 | 10.5708 | Cauchy (μ, σ): μ = 4.455, σ = 3.420 | 0.03111 | 0.01928 |
| 2. Pan and Tobacco | 6.22 | 4.84 | 53.85 | 0.6025 | 7.3770 | Johnson SU (γ, δ, λ, ξ): γ = -0.2939, δ = 1.5054, λ = 5.6234, ξ = 4.8419 | 0.03005 | 0.03314 |
| 3. Cloth and Footwear | 5.65 | 3.32 | 30.16 | 0.0040 | 3.5698 | Generalised Gamma (k, α, β, γ): k = 1.6752, α = 94.813, β = 3.5953, γ = -48.695 | 0.02143 | 0.0331 |
| 4. Housing | 5.44 | 2.92 | 19.38 | 1.7588 | 7.5862 | Burr (k, α, β, γ): k = 0.3489, α = 28.030, β = 20.683, γ = -17.264 | 0.02326 | 0.04537 |
| 5. Fuel and Light | 5.45 | 6.30 | 104.16 | 0.2176 | 7.8465 | Johnson SU (γ, δ, λ, ξ): γ = -0.0883, δ = 1.4241, λ = 6.8561, ξ = 4.9099 | 0.03855 | 0.02995 |
| 6. Miscellaneous | 5.08 | 3.69 | 74.42 | 0.5900 | 6.3318 | Johnson SU (γ, δ, λ, ξ): γ = -0.3685, δ = 1.6384, λ = 4.8154, ξ = 3.7645 | 0.00780 | 0.02852 |

Source: Author’s calculations.

Table A2: IESH Data (Round-Wise Fitting of 3-Month Ahead Inflation)

The columns are grouped as follows:

* k, α, β and γ are the estimated parameters of the 4-parameter Burr distribution.
* Mean*, SD, Skewness and Kurtosis are descriptive statistics.
* Published Mean and Published Median are the published prints of IESH 3-month ahead mean and median inflation.

| Survey Round | Period of Survey | k | α | β | γ | Mean* | SD | Skewness | Kurtosis | Published Mean | Published Median | Realised CPI-C Inflation# |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 35 | Mar-14 | 0.3890 | 7.320 | 18.287 | -9.292 | 17.601 | 14.993 | 2.486 | 10.233 | 12.3 | 12.9 | 6.77 |
| 36 | Jun-14 | 0.2289 | 11.157 | 23.521 | -16.494 | 20.748 | 19.468 | 2.112 | 7.531 | 12.5 | 14.0 | 5.63 |
| 37 | Sep-14 | 1.0350 | 2.039 | 15.455 | 0.324 | 21.805 | 19.744 | 2.004 | 7.277 | 12.7 | 14.6 | 4.28 |
| 38 | Dec-14 | 0.9008 | 2.974 | 8.637 | -1.012 | 10.365 | 9.494 | 3.827 | 25.393 | 8.8 | 8.3 | 5.25 |
| 39 | Mar-15 | 0.7021 | 3.756 | 8.626 | -1.498 | 10.731 | 9.528 | 3.882 | 25.077 | 9.0 | 8.5 | 5.40 |
| 40 | Jun-15 | 0.6444 | 4.283 | 10.484 | -2.526 | 12.231 | 10.429 | 3.580 | 21.570 | 10.1 | 10.0 | 4.41 |
| 41 | Sep-15 | 0.5529 | 4.307 | 10.101 | -2.216 | 13.519 | 11.909 | 3.255 | 17.591 | 10.6 | 10.4 | 5.61 |
| 42 | Dec-15 | 0.3618 | 6.313 | 10.005 | -3.071 | 13.749 | 12.144 | 2.983 | 14.159 | 10.5 | 9.9 | 4.83 |
| 43 | Mar-16 | 0.6304 | 3.272 | 6.807 | -0.547 | 10.946 | 11.036 | 3.730 | 21.811 | 8.8 | 8.1 | 5.77 |
| 44 | Jun-16 | 0.4702 | 5.158 | 10.030 | -3.533 | 12.364 | 11.773 | 3.231 | 16.556 | 9.7 | 9.2 | 4.39 |
| 45 | Sep-16 | 0.6900 | 3.805 | 9.991 | -2.241 | 12.109 | 11.521 | 3.573 | 19.768 | 9.7 | 9.5 | 3.41 |
| 45B | Nov-16 | 0.5190 | 4.441 | 8.395 | -2.288 | 11.356 | 11.504 | 3.709 | 21.040 | 9.1 | 8.2 | 3.65 |
| 46 | Dec-16 | 0.6270 | 5.581 | 10.085 | -4.180 | 8.843 | 7.692 | 4.331 | 32.604 | 7.9 | 7.3 | 3.89 |
| 47 | Mar-17 | 0.6111 | 4.951 | 9.175 | -3.145 | 9.477 | 8.723 | 3.982 | 24.903 | 8.2 | 7.5 | 1.46 |
| 47B | May-17 | 0.5241 | 6.050 | 9.976 | -4.386 | 9.401 | 9.014 | 4.182 | 26.427 | 8.1 | 7.3 | 3.28 |
| 48 | Jun-17 | 0.6939 | 5.146 | 9.604 | -3.202 | 8.899 | 7.203 | 4.481 | 34.669 | 8.1 | 7.5 | 3.28 |
| 49 | Sep-17 | 0.4139 | 7.261 | 9.362 | -4.067 | 9.503 | 8.886 | 4.255 | 27.648 | 8.2 | 7.2 | 5.21 |
| 49B | Nov-17 | 0.3860 | 6.654 | 8.724 | -3.494 | 10.304 | 10.210 | 3.773 | 21.608 | 8.4 | 7.5 | 4.44 |
| 50 | Dec-17 | 0.4357 | 5.831 | 8.724 | -3.502 | 10.159 | 10.146 | 3.736 | 21.247 | 8.4 | 7.5 | 4.28 |
| 51 | Mar-18 | 0.4533 | 6.737 | 10.251 | -4.550 | 9.967 | 8.848 | 4.042 | 26.851 | 8.6 | 7.8 | 4.92 |
| 51B | May-18 | 0.4994 | 4.642 | 8.454 | -2.125 | 11.544 | 10.892 | 3.553 | 19.713 | 9.4 | 8.7 | 3.69 |
| 52 | Jun-18 | 0.3841 | 6.238 | 10.144 | -4.293 | 12.201 | 11.690 | 3.122 | 15.356 | 9.5 | 8.9 | 3.70 |
| 53 | Sep-18 | 0.3098 | 6.523 | 9.215 | -3.723 | 13.224 | 12.953 | 2.779 | 11.973 | 9.9 | 9.4 | 2.11 |
| 53B | Nov-18 | 0.3102 | 7.708 | 10.619 | -5.227 | 12.166 | 11.315 | 3.025 | 14.568 | 9.7 | 9.0 | 2.57 |
| 54 | Dec-18 | 0.3726 | 6.174 | 8.590 | -3.194 | 11.263 | 11.273 | 3.522 | 18.303 | 9.1 | 8.2 | 2.86 |
| 55 | Mar-19 | 0.4732 | 5.271 | 8.028 | -2.320 | 10.162 | 9.210 | 3.986 | 25.473 | 8.7 | 7.8 | 3.18 |
| 55B | May-19 | 0.3909 | 6.603 | 8.869 | -3.649 | 10.119 | 9.225 | 3.642 | 21.563 | 8.6 | 7.6 | 3.28 |
| 56 | Jul-19 | 0.4151 | 5.723 | 7.895 | -2.590 | 10.254 | 9.702 | 3.646 | 20.489 | 8.6 | 7.6 | 4.62 |
| 57 | Sep-19 | 0.4767 | 4.941 | 8.098 | -2.364 | 10.554 | 9.325 | 3.119 | 16.027 | 8.9 | 8.0 | 7.35 |
| 57B | Nov-19 | 0.2914 | 8.021 | 10.718 | -5.315 | 12.540 | 11.751 | 3.018 | 14.512 | 9.8 | 9.2 | 6.58 |
| 58 | Jan-20 | 0.2456 | 9.788 | 11.045 | -6.055 | 12.192 | 11.575 | 3.088 | 15.106 | 9.5 | 8.6 | NA |
| 59 | Mar-20 | 0.4305 | 5.954 | 8.876 | -2.782 | 10.980 | 9.610 | 3.510 | 19.517 | 9.2 | 8.5 | 6.23 |
| 59B | May-20 | 0.1809 | 16.522 | 19.864 | -14.896 | 14.411 | 13.375 | 2.654 | 11.462 | 10.6 | 10.4 | 6.69 |
| 60 | Jul-20 | 0.1979 | 15.068 | 20.132 | -14.933 | 14.735 | 13.551 | 2.582 | 11.019 | 10.8 | 10.5 | 7.61 |
| 61 | Sep-20 | 0.1767 | 16.053 | 20.088 | -15.110 | 15.261 | 14.410 | 2.492 | 10.252 | 10.8 | 10.4 | 4.59 |
| 61B | Nov-20 | 0.2505 | 8.788 | 11.589 | -5.986 | 14.202 | 13.503 | 2.653 | 10.997 | 10.4 | 10.1 | 5.03 |
| 62 | Jan-21 | 0.2865 | 7.708 | 10.275 | -4.722 | 13.110 | 12.510 | 2.869 | 12.823 | 10.0 | 9.3 | 4.23 |
| 63 | Mar-21 | 0.2244 | 10.097 | 12.130 | -6.629 | 14.178 | 13.369 | 2.689 | 11.339 | 10.4 | 10.1 | 6.26 |
| 63B | May-21 | 0.1680 | 15.823 | 19.861 | -14.492 | 16.561 | 15.527 | 2.299 | 8.642 | 11.3 | 10.8 | 5.30 |
| 64 | Jul-21 | 0.1335 | 23.541 | 26.553 | -21.489 | 16.904 | 15.415 | 2.149 | 7.879 | 11.7 | 11.3 | 4.48 |
| 65 | Sep-21 | 0.1571 | 16.348 | 19.905 | -14.466 | 17.231 | 16.112 | 2.193 | 8.020 | 11.4 | 10.8 | 5.66 |
| 65B | Nov-21 | 0.2908 | 6.865 | 13.349 | -6.371 | 17.914 | 15.994 | 2.141 | 7.856 | 11.9 | 12.3 | 6.07 |
| 66 | Jan-22 | 0.1776 | 14.816 | 17.495 | -12.027 | 15.480 | 14.241 | 2.474 | 9.884 | 11.1 | 10.6 | 7.79 |
| 67 | Mar-22 | 0.1706 | 16.596 | 20.122 | -14.744 | 15.628 | 14.173 | 2.312 | 8.772 | 11.1 | 10.7 | 7.01 |
| 67B | May-22 | 0.1419 | 23.030 | 26.925 | -21.733 | 16.545 | 14.856 | 2.233 | 8.514 | 11.4 | 10.8 | 7.00 |
| 68 | Jul-22 | 0.3039 | 7.324 | 10.986 | -4.521 | 14.302 | 12.710 | 2.633 | 11.188 | 10.7 | 10.3 | 6.77 |
| 69 | Sep-22 | 0.2155 | 11.932 | 17.219 | -11.210 | 16.070 | 14.675 | 2.465 | 9.847 | 11.3 | 10.8 | 5.72 |
| 69B | Nov-22 | 0.2651 | 9.224 | 13.205 | -6.982 | 14.432 | 12.714 | 2.691 | 11.788 | 10.9 | 10.4 | 6.44 |
| 70 | Jan-23 | 0.3470 | 6.865 | 11.370 | -4.566 | 13.917 | 11.883 | 2.747 | 12.329 | 10.8 | 10.5 | 4.70 |
| 71 | Mar-23 | 0.3251 | 7.186 | 10.753 | -4.299 | 13.504 | 11.829 | 2.937 | 13.665 | 10.5 | 10.2 | 4.87 |
| 71B | May-23 | 0.4260 | 5.205 | 9.225 | -2.338 | 13.186 | 11.666 | 2.939 | 13.570 | 10.3 | 10.1 | 6.83 |

Notes:

* \*: Computed from the raw data, without any filtering or trimming, and unweighted.
* \#: Realised CPI-C inflation is the print for the month three months after the survey month.
* NA: MoSPI did not initially publish CPI-C inflation for April 2020 (three months ahead of Round 58); it was imputed and published subsequently.

Source: Author’s calculations.

Table A3: Correlation Matrix of IESH Indicators (with Published CPI-C Inflation)

|  | Realised CPI-C Inflation | k | α | β | γ | Mean* | SD | Skewness | Kurtosis | Published Mean | Published Median |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Realised CPI-C Inflation | 1.000 | | | | | | | | | | |
| k | -0.378 | 1.000 | | | | | | | | | |
| α | 0.406 | -0.774 | 1.000 | | | | | | | | |
| β | 0.433 | -0.577 | 0.894 | 1.000 | | | | | | | |
| γ | -0.414 | 0.738 | -0.967 | -0.954 | 1.000 | | | | | | |
| Mean* | 0.424 | -0.380 | 0.514 | 0.766 | -0.600 | 1.000 | | | | | |
| SD | 0.367 | -0.408 | 0.522 | 0.749 | -0.603 | 0.982 | 1.000 | | | | |
| Skewness | -0.496 | 0.591 | -0.625 | -0.746 | 0.655 | -0.927 | -0.919 | 1.000 | | | |
| Kurtosis | -0.474 | 0.628 | -0.586 | -0.672 | 0.602 | -0.878 | -0.888 | 0.983 | 1.000 | | |
| Published Mean | 0.482 | -0.429 | 0.524 | 0.751 | -0.599 | 0.976 | 0.937 | -0.940 | -0.900 | 1.000 | |
| Published Median | 0.426 | -0.271 | 0.396 | 0.695 | -0.502 | 0.975 | 0.932 | -0.878 | -0.825 | 0.975 | 1.000 |

Source: Author’s calculations.
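The IaR figures in Section IV and the entries of Table 6 come from two steps. First, evaluate one fitted distribution's CDF. Second, apply the other distribution's inverse CDF to the result. In other words, a CPI-C inflation value x maps to the IESH value y = F_IESH^{-1}(F_CPI(x)).

The Python sketch below illustrates this with the reported parameters, using only the standard library. It is not the author's code, and it rests on two assumptions that the article does not state:

* The Log-Pearson Type III parameters follow the convention in which (ln X - γ)/β has a standard gamma distribution with shape α.
* The Johnson SB parameters follow γ + δ ln[(x - ξ)/(ξ + λ - x)] ~ N(0, 1).

Under these assumptions, the sketch should reproduce the reported 7.84, 8.84, 12.15 and 12.70 per cent to within about 0.01 percentage point. All function names are hypothetical. The bisection-based gamma inverse is a simple stand-in for a library routine.

```python
import math
from statistics import NormalDist

# Fitted parameters as reported in the article. ASSUMED conventions (not stated there):
#   Log-Pearson III: (ln X - gamma) / beta ~ Gamma(shape=alpha, scale=1)
#   Johnson SB:      g + d * ln((x - xi) / (xi + lam - x)) ~ N(0, 1)
LP3_CPI = {"alpha": 6.4118, "beta": -0.1371, "gamma": 2.4545}
JSB_IESH_3M = {"g": 0.3289, "d": 0.9778, "lam": 6.0331, "xi": 7.3584}

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def gamma_cdf(x: float, a: float) -> float:
    """Regularised lower incomplete gamma P(a, x) from its power series."""
    if x <= 0.0:
        return 0.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-16 * total and n < 10_000:
        n += 1
        term *= x / (a + n)
        total += term
    return min(1.0, total * math.exp(a * math.log(x) - x - math.lgamma(a)))


def gamma_ppf(q: float, a: float) -> float:
    """Inverse of gamma_cdf by bisection: simple and dependency-free, not fast."""
    lo, hi = 0.0, a + 40.0 * math.sqrt(a) + 40.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, a) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lp3_cdf(x: float, alpha: float, beta: float, gamma: float) -> float:
    """Log-Pearson III CDF; with beta < 0 the support is 0 < x < exp(gamma)."""
    if x <= 0.0:
        return 0.0
    g = (math.log(x) - gamma) / beta  # standardised gamma variate
    if g <= 0.0:
        return 0.0 if beta > 0 else 1.0
    p = gamma_cdf(g, alpha)
    return p if beta > 0 else 1.0 - p


def lp3_ppf(p: float, alpha: float, beta: float, gamma: float) -> float:
    """Log-Pearson III inverse CDF; beta < 0 means large X comes from small gamma variates."""
    q = p if beta > 0 else 1.0 - p
    return math.exp(gamma + beta * gamma_ppf(q, alpha))


def jsb_cdf(x: float, g: float, d: float, lam: float, xi: float) -> float:
    """Johnson SB CDF on the bounded support xi < x < xi + lam."""
    if x <= xi:
        return 0.0
    if x >= xi + lam:
        return 1.0
    return _STD_NORMAL.cdf(g + d * math.log((x - xi) / (xi + lam - x)))


def jsb_ppf(p: float, g: float, d: float, lam: float, xi: float) -> float:
    """Johnson SB inverse CDF: solve the normalising transform for x."""
    z = _STD_NORMAL.inv_cdf(p)
    return xi + lam / (1.0 + math.exp(-(z - g) / d))


def cpi_to_iesh(x: float) -> float:
    """Hypothetical helper: quantile mapping y = F_IESH^{-1}(F_CPI(x))."""
    return jsb_ppf(lp3_cdf(x, **LP3_CPI), **JSB_IESH_3M)


def iesh_to_cpi(y: float) -> float:
    """Hypothetical helper: reverse mapping x = F_CPI^{-1}(F_IESH(y))."""
    return lp3_ppf(jsb_cdf(y, **JSB_IESH_3M), **LP3_CPI)


if __name__ == "__main__":
    for p in (0.95, 0.99):
        print(f"IaR_{p}: CPI-C {lp3_ppf(p, **LP3_CPI):.2f}, "
              f"IESH 3-month ahead {jsb_ppf(p, **JSB_IESH_3M):.2f}")
    print(f"CPI-C 6.00 per cent maps to IESH {cpi_to_iesh(6.0):.2f} per cent")
```

If SciPy is available, `scipy.stats.johnsonsb(a=γ, b=δ, loc=ξ, scale=λ)` and `scipy.stats.gamma` give the same quantities directly. The standard-library version is used here only to keep the sketch self-contained.

Note that `NormalDist.inv_cdf` rejects probabilities of exactly 0 or 1. The mapping helpers are therefore only meaningful for values strictly inside both fitted supports.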
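The round-wise Burr parameters in Table A2 can be turned back into fitted quantities. Examples are a round's fitted median, an upper-tail percentile of 3-month ahead expectations, or the fitted share of households expecting inflation above a threshold. This is one way they might serve as inputs to other models, as the Conclusion suggests.

The Python sketch below assumes the 4-parameter form F(x) = 1 - [1 + ((x - γ)/β)^α]^(-k) for x > γ. This is a shifted and scaled Burr Type XII, equivalent to SciPy's `burr12` with c = α, d = k, loc = γ and scale = β. The article does not state its parameterisation. As a rough plausibility check, under this assumption the fitted mean for Round 71B comes out at about 13.5, close to the raw Mean* of 13.186 in Table A2. The `ROUNDS` selection and the helper names are hypothetical.

```python
import math

# ASSUMED 4-parameter Burr form (the article does not state its convention):
#   F(x) = 1 - [1 + ((x - gamma) / beta) ** alpha] ** (-k),  for x > gamma
# Hypothetical selection of rows copied from Table A2: (k, alpha, beta, gamma).
ROUNDS = {
    "35 (Mar-14)": (0.3890, 7.320, 18.287, -9.292),
    "58 (Jan-20)": (0.2456, 9.788, 11.045, -6.055),
    "71B (May-23)": (0.4260, 5.205, 9.225, -2.338),
}


def burr_cdf(x: float, k: float, alpha: float, beta: float, gamma: float) -> float:
    """Fitted share of respondents expecting inflation at or below x."""
    if x <= gamma:
        return 0.0
    return 1.0 - (1.0 + ((x - gamma) / beta) ** alpha) ** (-k)


def burr_ppf(p: float, k: float, alpha: float, beta: float, gamma: float) -> float:
    """Closed-form inverse CDF of the assumed Burr form."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return gamma + beta * ((1.0 - p) ** (-1.0 / k) - 1.0) ** (1.0 / alpha)


def burr_mean(k: float, alpha: float, beta: float, gamma: float) -> float:
    """Fitted mean gamma + beta*k*B(k - 1/alpha, 1 + 1/alpha); infinite if k*alpha <= 1."""
    if k * alpha <= 1.0:
        return math.inf
    a, b = k - 1.0 / alpha, 1.0 + 1.0 / alpha
    beta_fn = math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))
    return gamma + beta * k * beta_fn


if __name__ == "__main__":
    for name, params in ROUNDS.items():
        print(
            f"Round {name}: median {burr_ppf(0.5, *params):.2f}, "
            f"95th percentile {burr_ppf(0.95, *params):.2f}, "
            f"mean {burr_mean(*params):.2f}, "
            f"share above 10 per cent {1.0 - burr_cdf(10.0, *params):.3f}"
        )
```

Under this form, the inverse CDF has a closed form, so no numerical root-finding is needed. The fitted mean exists only when kα > 1, which holds for the three rounds shown.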