Title: | House Price Indexes |
---|---|
Description: | Compute house price indexes and series using a variety of methods and models common throughout the real estate literature. Evaluate index 'goodness' based on accuracy, volatility and revision statistics. Background on basic model construction for repeat sales models can be found at: Case and Quigley (1991) <https://ideas.repec.org/a/tpr/restat/v73y1991i1p50-58.html> and for hedonic pricing models at: Bourassa et al (2006) <doi:10.1016/j.jhe.2006.03.001>. The package author's working paper on the random forest approach to house price indexes can be found at: <https://www.github.com/andykrause/hpi_research>. |
Authors: | Andy Krause [aut, cre] |
Maintainer: | Andy Krause <[email protected]> |
License: | GPL-3 |
Version: | 0.3.4 |
Built: | 2024-10-10 05:46:28 UTC |
Source: | https://github.com/andykrause/hpir |
Generate a vector of row IDs for use in forecast accuracy tests
buildForecastIDs(time_cut, hpi_df, forecast_length = 1, train = TRUE)
time_cut |
Period after which to cut off data |
hpi_df |
Data to be converted to training or scoring |
forecast_length |
default = 1; Length of forecasting to do |
train |
Default=TRUE; Create training data? FALSE = Scoring data |
vector of row_ids indicating inclusion in the forecasting data as either the training set (train = TRUE) or the scoring set (train = FALSE)
This function is rarely (if ever) used directly. Most often called by 'calcForecastError()'
It is a generic method that dispatches on the 'hpi_df' object.
# Load example sales
data(ex_sales)

# Create RT data
rt_data <- rtCreateTrans(trans_df = ex_sales,
                         prop_id = 'pinx',
                         trans_id = 'sale_id',
                         price = 'sale_price',
                         periodicity = 'monthly',
                         date = 'sale_date')

# Create ids
fc_ids <- buildForecastIDs(time_cut = 27,
                           hpi_df = rt_data,
                           forecast_length = 2,
                           train = TRUE)
Generate a vector of row IDs for use in forecast accuracy tests (hed approach)
## S3 method for class 'heddata'
buildForecastIDs(time_cut, hpi_df, forecast_length = 1, train = TRUE)
time_cut |
Period after which to cut off data |
hpi_df |
Data to be converted to training or scoring |
forecast_length |
default = 1; Length of forecasting to do |
train |
Default=TRUE; Create training data? FALSE = Scoring data |
Generate a vector of row IDs for use in forecast accuracy tests (rt approach)
## S3 method for class 'rtdata'
buildForecastIDs(time_cut, hpi_df, forecast_length = 1, train = TRUE)
time_cut |
Period after which to cut off data |
hpi_df |
Data to be converted to training or scoring |
forecast_length |
default = 1; Length of forecasting to do |
train |
Default=TRUE; Create training data? FALSE = Scoring data |
Estimate index accuracy using one of a variety of approaches
calcAccuracy( hpi_obj, test_method = "insample", test_type = "rt", pred_df = NULL, smooth = FALSE, in_place = FALSE, in_place_name = "accuracy", ... )
hpi_obj |
Object of class 'hpi' |
test_method |
default = 'insample'; Also 'kfold' |
test_type |
default = 'rt'; Type of data to use for test. See details. |
pred_df |
default = NULL; Extra data if the test_type doesn't match data in hpi_obj |
smooth |
default = FALSE; calculated on the smoothed index(es) |
in_place |
default = FALSE; Should the result be returned into an existing 'hpi' object |
in_place_name |
default = 'accuracy'; Name for returning in place |
... |
Additional Arguments |
object of class 'hpiaccuracy' inheriting from class 'data.frame' containing the following fields:
Property Identification number
Transaction Price
Predicted price
(Prediction - Actual) / Actual
log(prediction) - log(actual)
Period of the prediction
'rt' test type tests the ability of the index to correctly predict the second value in a repeat transaction pair.
FUTURE: 'hed' test type tests the ability of the index to improve an OLS model that doesn't account for time. (This approach is not ready yet.)
# Load Data
data(ex_sales)

# Create Index
rt_index <- rtIndex(trans_df = ex_sales,
                    periodicity = 'monthly',
                    min_date = '2010-06-01',
                    max_date = '2015-11-30',
                    adj_type = 'clip',
                    date = 'sale_date',
                    price = 'sale_price',
                    trans_id = 'sale_id',
                    prop_id = 'pinx',
                    estimator = 'robust',
                    log_dep = TRUE,
                    trim_model = TRUE,
                    max_period = 48,
                    smooth = FALSE)

# Calculate insample accuracy
hpi_accr <- calcAccuracy(hpi_obj = rt_index,
                         test_type = 'rt',
                         test_method = 'insample')
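The 'rt' test and the two error fields listed for the returned object can be illustrated with a small base R sketch. It assumes the prediction for the second sale in a pair is the first sale price scaled by the ratio of index values at the two periods; the toy index, the pair data and the column names (`price_1`, `period_1` and so on) are invented for illustration and are not taken from the package.

```r
# Toy index: one value per period (base period = 100). Illustrative values only.
index <- c(100, 102, 105, 103, 108, 112)

# Hypothetical repeat-transaction pairs: first and second sale of each property.
pairs <- data.frame(
  pair_id  = 1:3,
  period_1 = c(1, 2, 1),
  period_2 = c(4, 6, 5),
  price_1  = c(200000, 310000, 150000),
  price_2  = c(210000, 335000, 165000)
)

# Assumed prediction rule: grow the first price by the index ratio.
pairs$pred_price <- pairs$price_1 * index[pairs$period_2] / index[pairs$period_1]

# Error measures matching the documented fields.
pairs$error     <- (pairs$pred_price - pairs$price_2) / pairs$price_2
pairs$log_error <- log(pairs$pred_price) - log(pairs$price_2)

print(pairs[, c("pair_id", "price_2", "pred_price", "error", "log_error")])
```

A negative error means the index under-predicted the second sale price; the log version treats over- and under-prediction symmetrically.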
Estimate the index accuracy with forecasting for a (progressive) series of indexes
calcForecastError( is_obj, pred_df, return_forecasts = FALSE, forecast_length = 1, ... )
is_obj |
Object of class 'hpiseries' |
pred_df |
Set of sales to be used for predictive quality of index |
return_forecasts |
default = FALSE; return the forecasted indexes |
forecast_length |
default = 1; Length of period(s) in time to forecast |
... |
Additional Arguments |
object of class 'hpiaccuracy' inheriting from class 'data.frame' containing the following fields:
Property Identification number
Transaction Price
Predicted price
(Prediction - Actual) / Actual
log(prediction) - log(actual)
Period of the prediction
Series position from which the prediction was generated
If you set 'return_forecasts' = TRUE, the forecasted indexes for each period will be returned in the 'forecasts' attribute of the 'hpiaccuracy' object (attr(accr_obj, 'forecasts')).
For now, the 'pred_df' object must be a set of repeat transactions with the class 'rtdata', inheriting from 'hpidata'.
# Load example sales
data(ex_sales)

# Create Index
hed_index <- hedIndex(trans_df = ex_sales,
                      periodicity = 'monthly',
                      max_date = '2011-12-31',
                      adj_type = 'clip',
                      date = 'sale_date',
                      price = 'sale_price',
                      trans_id = 'sale_id',
                      prop_id = 'pinx',
                      estimator = 'robust',
                      log_dep = TRUE,
                      trim_model = TRUE,
                      max_period = 24,
                      dep_var = 'price',
                      ind_var = c('tot_sf', 'beds', 'baths'),
                      smooth = FALSE)

# Create Series (suppressing messages due to the small sample size of this example)
suppressMessages(
  hpi_series <- createSeries(hpi_obj = hed_index,
                             train_period = 12))

# Create Prediction data
rt_data <- rtCreateTrans(trans_df = ex_sales,
                         prop_id = 'pinx',
                         max_date = '2011-12-31',
                         trans_id = 'sale_id',
                         price = 'sale_price',
                         periodicity = 'monthly',
                         date = 'sale_date',
                         min_period_dist = 12)

# Calculate forecast accuracy
fc_accr <- calcForecastError(is_obj = hpi_series,
                             pred_df = rt_data)
Estimate the predictive error of an index via an in-sample approach.
calcInSampleError(pred_df, index, ...)
pred_df |
Set of sales against which to test predictions |
index |
Index (of class 'ts') to be tested for accuracy |
... |
Additional Arguments |
object of class 'hpiaccuracy' inheriting from class 'data.frame' containing the following fields:
Unique Pair ID number
Transaction Price
Predicted price
(Prediction - Actual) / Actual
log(prediction) - log(actual)
Period of the prediction
In addition to being a stand-alone function, it is also used by 'calcForecastError' and 'calcKFoldError'.
# Load example data
data(ex_sales)

# Create index with raw transaction data
rt_index <- rtIndex(trans_df = ex_sales,
                    periodicity = 'monthly',
                    min_date = '2010-06-01',
                    max_date = '2015-11-30',
                    adj_type = 'clip',
                    date = 'sale_date',
                    price = 'sale_price',
                    trans_id = 'sale_id',
                    prop_id = 'pinx',
                    estimator = 'robust',
                    log_dep = TRUE,
                    trim_model = TRUE,
                    max_period = 48,
                    smooth = FALSE)

# Calculate accuracy
in_accr <- calcInSampleError(pred_df = rt_index$data,
                             index = rt_index$index$value)
Estimate the predictive error of an index via an in-sample approach (hed approach)
## S3 method for class 'heddata'
calcInSampleError(pred_df, index, ...)
pred_df |
Set of sales against which to test predictions |
index |
Index (of class 'ts') to be tested for accuracy |
... |
Additional Arguments |
Estimate the predictive error of an index via an in-sample approach (rt approach)
## S3 method for class 'rtdata'
calcInSampleError(pred_df, index, ...)
pred_df |
Set of sales against which to test predictions |
index |
Index (of class 'ts') to be tested for accuracy |
... |
Additional Arguments |
Use a KFold (out of sample) approach to estimate index accuracy
calcKFoldError(hpi_obj, pred_df, k = 10, seed = 1, smooth = FALSE, ...)
hpi_obj |
HPI object of class 'hpi' |
pred_df |
Data.frame of sales to be used for assessing predictive quality of index |
k |
default=10; Number of folds to apply to holdout process |
seed |
default=1; Random seed generator to control the folding process |
smooth |
default = FALSE; Calculate on the smoothed index |
... |
Additional Arguments |
object of class 'hpiaccuracy' inheriting from class 'data.frame' containing the following fields:
Unique Pair ID
Transaction Price
Predicted price
(Prediction - Actual) / Actual
log(prediction) - log(actual)
Period of the prediction
# Load data
data(ex_sales)

# Create index with raw transaction data
rt_index <- rtIndex(trans_df = ex_sales,
                    periodicity = 'monthly',
                    min_date = '2010-06-01',
                    max_date = '2015-11-30',
                    adj_type = 'clip',
                    date = 'sale_date',
                    price = 'sale_price',
                    trans_id = 'sale_id',
                    prop_id = 'pinx',
                    estimator = 'robust',
                    log_dep = TRUE,
                    trim_model = TRUE,
                    max_period = 48,
                    smooth = FALSE)

# Create prediction data
rt_data <- rtCreateTrans(trans_df = ex_sales,
                         prop_id = 'pinx',
                         trans_id = 'sale_id',
                         price = 'sale_price',
                         periodicity = 'monthly',
                         date = 'sale_date')

# Calc Accuracy
kf_accr <- calcKFoldError(hpi_obj = rt_index,
                          pred_df = rt_data,
                          k = 10,
                          seed = 123,
                          smooth = FALSE)
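To make the roles of 'k' and 'seed' concrete, here is a minimal base R sketch of a seeded fold assignment. It is an assumption about how a k-fold holdout can be organised, not a copy of the package internals; `n_obs` and the fold sizes are hypothetical.

```r
# Hypothetical number of observations in the prediction data.
n_obs <- 25
k <- 5

# Fix the seed so the fold assignment is reproducible.
set.seed(123)

# Shuffle balanced fold labels across the row ids.
fold_label <- sample(rep(seq_len(k), length.out = n_obs))
folds <- split(seq_len(n_obs), fold_label)

# For each fold: rows in the fold are scored, all other rows train the model.
for (i in seq_len(k)) {
  score_ids <- folds[[i]]
  train_ids <- setdiff(seq_len(n_obs), score_ids)
  cat("fold", i, ":", length(train_ids), "training rows,",
      length(score_ids), "scoring rows\n")
}
```

Because every row lands in exactly one fold, each observation is scored once by a model that never saw it.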
Create estimates of the revision statistics for a house price index
calcRevision( series_obj, in_place = FALSE, in_place_name = "rev", smooth = FALSE, ... )
series_obj |
A list of progressively longer indexes (a 'serieshpi' object from 'createSeries()') |
in_place |
default = FALSE; Calculating in place (adding to hpi) |
in_place_name |
default = 'rev'; Name of revision object in_place |
smooth |
default = FALSE; Use smoothed indexes |
... |
Additional Arguments |
list of length 3 containing:
Data.frame containing the period number, mean and median for that period
Mean revision for all periods
Median revision for all periods
The revision object can be generated "in place" inside of the 'serieshpi' object by setting 'in_place' equal to TRUE.
# Load example sales
data(ex_sales)

# Create Index
rt_index <- rtIndex(trans_df = ex_sales,
                    periodicity = 'monthly',
                    min_date = '2010-06-01',
                    max_date = '2015-11-30',
                    adj_type = 'clip',
                    date = 'sale_date',
                    price = 'sale_price',
                    trans_id = 'sale_id',
                    prop_id = 'pinx',
                    estimator = 'robust',
                    log_dep = TRUE,
                    trim_model = TRUE,
                    max_period = 48,
                    smooth = FALSE)

# Create Series (suppressing messages due to the small sample size of this example)
suppressMessages(
  hpi_series <- createSeries(hpi_obj = rt_index,
                             train_period = 12))

# Calculate revision
series_rev <- calcRevision(series_obj = hpi_series)
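The three-part return value (per-period table, overall mean, overall median) can be sketched in base R. This sketch assumes that a revision is the change in a period's index value between one index and the next longer one; that definition and the toy `series` list are assumptions made for illustration, and the package may measure revision differently.

```r
# Toy series: each element is an index re-estimated with one more period of data.
series <- list(
  c(100, 104, 107),
  c(100, 105, 108, 111),
  c(100, 105, 106, 112, 115)
)

# Only periods present in the second-longest index can have been revised.
n_rev <- length(series[[length(series) - 1]])

# Difference between each index and the next longer one, padded with NA.
diffs <- t(sapply(seq_len(length(series) - 1), function(i) {
  shorter <- series[[i]]
  longer  <- series[[i + 1]][seq_along(shorter)]
  c(longer - shorter, rep(NA, n_rev - length(shorter)))
}))

# Per-period mean and median revision, then overall summaries.
period_rev <- data.frame(
  period = seq_len(n_rev),
  mean   = apply(diffs, 2, function(x) mean(x, na.rm = TRUE)),
  median = apply(diffs, 2, function(x) median(x, na.rm = TRUE))
)
revision <- list(period = period_rev,
                 mean   = mean(period_rev$mean),
                 median = median(period_rev$median))
print(revision)
```

Values near zero indicate that adding new data barely changes earlier index estimates.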
Estimate the index accuracy for a (progressive) series of indexes
calcSeriesAccuracy( series_obj, test_method = "insample", test_type = "rt", pred_df = NULL, smooth = FALSE, summarize = FALSE, in_place = FALSE, in_place_name = "accuracy", ... )
series_obj |
Serieshpi object to be analyzed |
test_method |
default = 'insample'; Also 'kfold' or 'forecast' |
test_type |
default = 'rt'; Type of data to use for test. See details. |
pred_df |
default = NULL; Extra data if the test_type doesn't match data in hpi_obj |
smooth |
default = FALSE; Analyze the smoothed indexes |
summarize |
default = FALSE; When an observation has multiple accuracy measurements, take the mean of them all. |
in_place |
default = FALSE; Should the result be returned into an existing 'hpi' object |
in_place_name |
default = 'accuracy'; Name for returning in place |
... |
Additional Arguments |
'seriesaccuracy' object (unless calculated 'in_place')
Unless using 'test_method = "forecast"' with a 'forecast_length' of 1, the results will have more than one accuracy estimate per observation. Setting 'summarize = TRUE' will take the mean accuracy for each observation across all indexes.
# Load data
data(ex_sales)

# Create index
rt_index <- rtIndex(trans_df = ex_sales,
                    periodicity = 'monthly',
                    min_date = '2010-06-01',
                    max_date = '2015-11-30',
                    adj_type = 'clip',
                    date = 'sale_date',
                    price = 'sale_price',
                    trans_id = 'sale_id',
                    prop_id = 'pinx',
                    estimator = 'robust',
                    log_dep = TRUE,
                    trim_model = TRUE,
                    max_period = 48,
                    smooth = FALSE)

# Create Series (suppressing messages due to the small sample size of this example)
suppressMessages(
  hpi_series <- createSeries(hpi_obj = rt_index,
                             train_period = 12))

# Calculate insample accuracy
hpi_series_accr <- calcSeriesAccuracy(series_obj = hpi_series,
                                      test_type = 'rt',
                                      test_method = 'insample')
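The effect of 'summarize = TRUE' can be pictured with a small base R sketch. The stacked results table and its column names (`pair_id`, `series`, `log_error`) are hypothetical; the sketch only shows the idea of averaging several accuracy estimates down to one per observation.

```r
# Hypothetical stacked accuracy results: the same pair scored by several indexes.
accr <- data.frame(
  pair_id   = c(1, 1, 1, 2, 2, 3),
  series    = c(1, 2, 3, 2, 3, 3),
  log_error = c(0.04, 0.02, 0.03, -0.01, -0.03, 0.05)
)

# Collapse to one row per observation by averaging across indexes.
accr_summary <- aggregate(log_error ~ pair_id, data = accr, FUN = mean)
print(accr_summary)
```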
Calculates volatility over a (progressive) series of indexes
calcSeriesVolatility( series_obj, window = 3, smooth = FALSE, in_place_name = "volatility", ... )
series_obj |
Series object to be calculated |
window |
default = 3; Rolling periods over which to calculate the volatility |
smooth |
default = FALSE; Also calculate volatility for smoothed indexes |
in_place_name |
name if saving in place |
... |
Additional Arguments |
'serieshpi' object
Leaving 'window' at its default calculates volatility over a rolling window of 3 periods.
# Load example sales
data(ex_sales)

# Create Index
rt_index <- rtIndex(trans_df = ex_sales,
                    periodicity = 'monthly',
                    min_date = '2010-06-01',
                    max_date = '2015-11-30',
                    adj_type = 'clip',
                    date = 'sale_date',
                    price = 'sale_price',
                    trans_id = 'sale_id',
                    prop_id = 'pinx',
                    estimator = 'robust',
                    log_dep = TRUE,
                    trim_model = TRUE,
                    max_period = 48,
                    smooth = FALSE)

# Create Series (suppressing messages due to the small sample size of this example)
suppressMessages(
  hpi_series <- createSeries(hpi_obj = rt_index,
                             train_period = 12))

# Calculate series volatility
series_vol <- calcSeriesVolatility(series_obj = hpi_series,
                                   window = 3)
Create estimate of index volatility given a window
calcVolatility( index, window = 3, in_place = FALSE, in_place_name = "volatility", smooth = FALSE, ... )
index |
An object of class 'hpiindex' |
window |
default = 3; Rolling periods over which to calculate the volatility |
in_place |
default = FALSE; Adds volatility metric to the 'hpiindex' object (may be within an 'hpi' object) |
in_place_name |
default = 'volatility'; Name of volatility object in 'hpiindex' object |
smooth |
default = FALSE; Calculate on the smoothed index? |
... |
Additional arguments |
an 'indexvolatility' (S3) object, the 'index' slot of which is a 'ts' object
volatility at each rolling point
overall mean volatility
overall median volatility
You may also provide an 'hpi' object to this function. If you do, it will extract the 'hpiindex' object from the 'index' slot in the 'hpi' class object.
# Load Data
data(ex_sales)

# Create index with raw transaction data
rt_index <- rtIndex(trans_df = ex_sales,
                    periodicity = 'monthly',
                    min_date = '2010-06-01',
                    max_date = '2015-11-30',
                    adj_type = 'clip',
                    date = 'sale_date',
                    price = 'sale_price',
                    trans_id = 'sale_id',
                    prop_id = 'pinx',
                    estimator = 'robust',
                    log_dep = TRUE,
                    trim_model = TRUE,
                    max_period = 48,
                    smooth = FALSE)

# Calculate Volatility
index_vol <- calcVolatility(index = rt_index,
                            window = 3)
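The three returned components (volatility at each rolling point, overall mean, overall median) can be sketched in base R. The sketch assumes volatility is the rolling standard deviation of period-over-period proportional changes in the index; that definition is an assumption made for illustration, and the toy index values are invented.

```r
# Toy index values (illustrative only).
index <- c(100, 102, 101, 105, 108, 107, 111)
window <- 3

# Period-over-period proportional change.
deltas <- diff(index) / index[-length(index)]

# Rolling standard deviation: each row of embed() holds one window of changes.
roll <- apply(embed(deltas, window), 1, sd)

volatility <- list(roll = roll, mean = mean(roll), median = median(roll))
print(volatility)
```

A larger 'window' smooths the rolling measure but leaves fewer rolling points: with `n` index values there are `n - window` of them.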
Internal function to validate (or convert) the provided date field
checkDate(x_date, name)
x_date |
Date string or vector |
name |
Name of argument to return in error/warning message |
Adjusted date field
# Load Data
data(ex_sales)

# Check date
date_checked <- checkDate(x_date = ex_sales$sale_date,
                          name = 'sale date')
Generic method for creating KFold testing data
createKFoldData(score_ids, full_data, pred_df)
score_ids |
Vector of row ids to be included in scoring data |
full_data |
Complete dataset (class 'hpidata') of this model type (rt or hed) |
pred_df |
Data to be used for prediction |
list of length 2 containing:
Training data.frame
Scoring data.frame
Called from 'calcKFoldError()'
# Load Data
data(ex_sales)

# Create RT Data
rt_data <- rtCreateTrans(trans_df = ex_sales,
                         prop_id = 'pinx',
                         trans_id = 'sale_id',
                         price = 'sale_price',
                         periodicity = 'monthly',
                         date = 'sale_date')

# Create folds
k_folds <- split(x = 1:nrow(rt_data),
                 f = sample(1:10, nrow(rt_data), replace = TRUE))

# Create data from folds
kfold_data <- createKFoldData(score_ids = k_folds[[1]],
                              full_data = rt_data,
                              pred_df = rt_data)
'rtdata' method for creating KFold testing data
## S3 method for class 'rtdata'
createKFoldData(score_ids, full_data, pred_df)
score_ids |
Vector of row ids to be included in scoring data |
full_data |
Complete dataset (class 'hpidata') of this model type (rt or hed) |
pred_df |
Data to be used for prediction |
Generate a series of progressive indexes
createSeries(hpi_obj, train_period = 12, max_period = NULL, ...)
hpi_obj |
Object of class 'hpi' |
train_period |
default = 12; Number of periods to use as purely training before creating indexes |
max_period |
default = NULL; Maximum number of periods to create the index up to |
... |
Additional Arguments |
A 'serieshpi' object: a list of 'hpi' objects.
'train_period' represents the shortest index that you will create. For certain approaches, such as a repeat transaction model, indexes shorter than 10 periods will likely be highly unstable.
If 'max_period' is left NULL, then it will forecast up to the end of the data.
# Load example sales
data(ex_sales)

# Create Index
rt_index <- rtIndex(trans_df = ex_sales,
                    periodicity = 'monthly',
                    min_date = '2010-06-01',
                    max_date = '2015-11-30',
                    adj_type = 'clip',
                    date = 'sale_date',
                    price = 'sale_price',
                    trans_id = 'sale_id',
                    prop_id = 'pinx',
                    estimator = 'robust',
                    log_dep = TRUE,
                    trim_model = TRUE,
                    max_period = 48,
                    smooth = FALSE)

# Create Series (suppressing messages due to the small sample size of this example)
suppressMessages(
  hpi_series <- createSeries(hpi_obj = rt_index,
                             train_period = 12))
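The idea of a progressive series, where 'train_period' sets the shortest index and each later index adds one period, can be sketched in base R. The `toy_index()` estimator below is a stand-in (median price relative to the first period), not one of the package's models, and the sales data are invented.

```r
# Hypothetical transactions: 3 sales in each of 6 periods.
sales <- data.frame(
  period = rep(1:6, each = 3),
  price  = c(100, 110, 120, 105, 115, 125, 110, 120, 130,
             108, 118, 128, 115, 125, 135, 120, 130, 140)
)

# Stand-in index estimator (median price relative to period 1), NOT a package model.
toy_index <- function(df) {
  med <- tapply(df$price, df$period, median)
  as.numeric(100 * med / med[1])
}

train_period <- 3
max_period <- max(sales$period)

# One index per end period: train_period, train_period + 1, ..., max_period.
series <- lapply(train_period:max_period, function(end) {
  toy_index(sales[sales$period <= end, ])
})

# Index lengths grow by one period each step.
print(lengths(series))
```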
Create a relative period variable from a date variable
dateToPeriod( trans_df, date, periodicity = NULL, min_date = NULL, max_date = NULL, adj_type = "move", ... )
trans_df |
data.frame of raw transactions |
date |
name of field containing the date of the sale in Date or POSIXt format |
periodicity |
type of periodicity to use ('yearly', 'quarterly', 'monthly' or 'weekly') |
min_date |
default = NULL; optional minimum date to use |
max_date |
default = NULL; optional maximum date to use |
adj_type |
default = 'move'; how to handle min and max dates within the range of transactions. 'move' min and/or max date or 'clip' the data |
... |
Additional arguments |
original data frame ('trans_df' object) with two new fields:
trans_period: integer value counting from the minimum transaction date in the periodicity selected. Base value is 1. Primarily for modeling.
trans_date: properly formatted transaction date.
"trans_period" counts from the minimum transaction date provided. As such, the period counts are relative, not absolute.
Additionally, this function modifies the data.frame that it is given and returns that same data.frame with the new fields attached.
# Load data
data(ex_sales)

# Convert to period df
hpi_data <- dateToPeriod(trans_df = ex_sales,
                         date = 'sale_date',
                         periodicity = 'monthly')
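What a relative, 1-based monthly period count looks like can be shown with a short base R sketch. It is an illustration of the concept only, under the assumption that the earliest sale date defines period 1; the dates are invented and the package's own handling of 'min_date', 'max_date' and 'adj_type' is not reproduced.

```r
# Hypothetical sale dates.
sale_date <- as.Date(c("2010-06-15", "2010-07-01", "2011-01-20", "2012-06-30"))

# Use the earliest date as the base of the count.
min_date <- min(sale_date)

# Months elapsed since the base month, shifted so the first period is 1.
year  <- as.integer(format(sale_date, "%Y"))
month <- as.integer(format(sale_date, "%m"))
base_year  <- as.integer(format(min_date, "%Y"))
base_month <- as.integer(format(min_date, "%m"))

trans_period <- 12L * (year - base_year) + (month - base_month) + 1L
print(data.frame(sale_date, trans_period))
```

Because the count starts from the earliest date supplied, the same calendar month can receive a different period number in a different dataset.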
Seattle home sales from areas 13, 14, and 15 (central Seattle) 2010 to 2016. Includes only detached single family residences and townhomes. Data gathered from the King County Assessor's FTP site. A number of initial data munging tasks were necessary to bring the data into this format.
data(ex_sales)
A "data.frame"
with 5,348 rows and 16 variables
The unique property identifying code. Original value is preceded by two '..'s to prevent the dropping of leading zeros
The unique transaction identifying code.
Price of the home
Date of sale
Property use type
Assessment area or zone
Size of lot in square feet
Is property waterfront?
Quality of the building construction (higher is better)
Size of home in square feet
Number of bedrooms
Number of bathrooms
Age of home
Age of home, considering major remodels
Longitude
Latitude
King County Assessor: https://info.kingcounty.gov/assessor/DataDownload/
Generate standardized data for the 'hed' modeling approach
hedCreateTrans( trans_df, prop_id, trans_id, price, date = NULL, periodicity = NULL, ... )
trans_df |
sales transaction in either a data.frame or a trans_df class from dateToPeriod() function |
prop_id |
field containing the unique property identification |
trans_id |
field containing the unique transaction identification |
price |
field containing the transaction price |
date |
default=NULL, field containing the date of the transaction. Only necessary if not passing an 'hpidata' object |
periodicity |
default=NULL, field containing the desired periodicity of analysis. Only necessary if not passing an 'hpidata' object |
... |
Additional arguments |
data.frame of transactions with standardized period field. Note that a full data.frame of the possible periods, their values and names can be found in the attributes to the returned 'hed' object
# Load example data
data(ex_sales)

# Create Hed Data
ex_heddata <- hedCreateTrans(trans_df = ex_sales,
                             prop_id = 'pinx',
                             trans_id = 'sale_id',
                             price = 'sale_price',
                             date = 'sale_date',
                             periodicity = 'monthly')
Wrapper to create index object via entire hedonic approach
hedIndex(trans_df, dep_var = NULL, ind_var = NULL, hed_spec = NULL, ...)
trans_df |
data.frame of transactions |
dep_var |
default = NULL; Dependent variable in hedonic model |
ind_var |
default = NULL; Independent variables in the hedonic model |
hed_spec |
default = NULL; Full hedonic model specification |
... |
Additional Arguments |
'hpi' object. S3 list with:
'hpidata' object
'hpimodel' object
'hpiindex' object
Additional arguments must supply everything needed to create an 'hpidata' object if the 'trans_df' object is not already of that class.
# Load data
data(ex_sales)

# Create index with raw transaction data
hed_index <- hedIndex(trans_df = ex_sales,
                      periodicity = 'monthly',
                      min_date = '2010-06-01',
                      max_date = '2015-11-30',
                      adj_type = 'clip',
                      date = 'sale_date',
                      price = 'sale_price',
                      trans_id = 'sale_id',
                      prop_id = 'pinx',
                      estimator = 'robust',
                      log_dep = TRUE,
                      trim_model = TRUE,
                      max_period = 48,
                      dep_var = 'price',
                      ind_var = c('tot_sf', 'beds', 'baths'),
                      smooth = FALSE)
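For readers new to hedonic indexes, the following base R sketch shows the common time-dummy formulation: regress log price on home characteristics plus one dummy per period, then exponentiate the period coefficients. It is a generic illustration on simulated data under that assumption and does not claim to reproduce what 'hedIndex()' does internally.

```r
# Simulated sales with a known price trend (illustrative data only).
set.seed(1)
n <- 600
sales <- data.frame(
  period = sample(1:12, n, replace = TRUE),
  tot_sf = runif(n, 800, 3500)
)
true_trend <- 0.01 * (sales$period - 1)   # 1% log-price growth per period
sales$price <- exp(11 + 0.0004 * sales$tot_sf + true_trend + rnorm(n, sd = 0.05))

# Hedonic regression: log price on a home characteristic plus period dummies.
fit <- lm(log(price) ~ tot_sf + factor(period), data = sales)

# Period coefficients (period 1 is the omitted base), converted to an index.
period_coefs <- coef(fit)[grep("^factor\\(period\\)", names(coef(fit)))]
index <- 100 * exp(c(0, unname(period_coefs)))
print(round(index, 1))
```

The base period is fixed at 100 by construction, and later values estimate quality-adjusted price levels relative to it.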
Estimate coefficients for an index via the hedonic approach (generic method)
hedModel(estimator, hed_df, hed_spec, ...)
estimator |
Type of model to estimate (base, robust, weighted) |
hed_df |
Hedonic sales dataset from hedCreateTrans() |
hed_spec |
Model specification ('formula' object) |
... |
Additional arguments |
'hedmodel' object: model object of the estimator (ex.: 'lm')
The 'estimator' argument must be of class 'base', 'weighted' or 'robust'. This function is not generally called directly, but rather from 'hpiModel()'.
# Load example data
data(ex_sales)

# Create hedonic data
hed_data <- hedCreateTrans(trans_df = ex_sales,
                           prop_id = 'pinx',
                           trans_id = 'sale_id',
                           price = 'sale_price',
                           date = 'sale_date',
                           periodicity = 'monthly')

# Estimate Model
hed_model <- hedModel(estimator = structure('base', class = 'base'),
                      hed_df = hed_data,
                      hed_spec = as.formula(log(price) ~ baths + tot_sf))
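The `structure('base', class = 'base')` idiom in the example works because S3 generics dispatch on the class of their first argument. The base R sketch below shows the mechanism with a hypothetical generic `fitModel()`; the function name and the returned labels are invented for illustration and are not part of the package.

```r
# Hypothetical generic that dispatches on the class of 'estimator'.
fitModel <- function(estimator, ...) UseMethod("fitModel")

# One method per estimator class; each returns a label for illustration.
fitModel.base     <- function(estimator, ...) "ordinary least squares"
fitModel.robust   <- function(estimator, ...) "robust regression"
fitModel.weighted <- function(estimator, ...) "weighted least squares"

# Fallback for unrecognized estimator classes.
fitModel.default <- function(estimator, ...) {
  stop("Unknown estimator: ", estimator)
}

# Tag a plain string with a class so that dispatch picks the matching method.
est <- structure("robust", class = "robust")
print(fitModel(est))
```

A plain, unclassed string such as `"robust"` would fall through to the default method, which is why the class tag is needed when calling a method-dispatching function directly.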
Use of base estimator in hedonic model approach
## S3 method for class 'base'
hedModel(estimator, hed_df, hed_spec, ...)
estimator |
Type of model to estimate (base, robust, weighted) |
hed_df |
Hedonic transaction dataset from hedCreateTrans() |
hed_spec |
Model specification ('formula' object) |
... |
Additional arguments |
See '?hedModel' for more information
Use of robust estimator in hedonic model approach
## S3 method for class 'robust'
hedModel(estimator, hed_df, hed_spec, ...)
estimator |
Type of model to estimate (base, robust, weighted) |
hed_df |
Hedonic transaction dataset from hedCreateTrans() |
hed_spec |
Model specification ('formula' object) |
... |
Additional arguments |
See '?hedModel' for more information
Use of weighted estimator in hedonic model approach
## S3 method for class 'weighted'
hedModel(estimator, hed_df, hed_spec, ...)
estimator |
Type of model to estimate (base, robust, weighted) |
hed_df |
Hedonic transaction dataset from hedCreateTrans() |
hed_spec |
Model specification ('formula' object) |
... |
Additional arguments |
See '?hedModel' for more information
Generic method to estimate modeling approaches for indexes
hpiModel( model_type, hpi_df, estimator = "base", log_dep = TRUE, trim_model = TRUE, mod_spec = NULL, ... )
model_type |
Type of model to estimate ('rt', 'hed', 'rf') |
hpi_df |
Dataset created by one of the *CreateTrans() functions in this package. |
estimator |
Type of estimator to be used ('base', 'weighted', 'robust') |
log_dep |
default TRUE, should the dependent variable (change in price) be logged? |
trim_model |
default TRUE, should excess be trimmed from model results ('lm' or 'rlm' object)? |
mod_spec |
Model specification |
... |
Additional Arguments |
hpimodel object consisting of:
Type of estimator
Data.frame of coefficients
class 'rtmodel' or 'hedmodel'
Full model specification
Binary: is the dependent variable in logged format
Mean price in the base period
'data.frame' of periods
Type of model used
# Load data
data(ex_sales)
# With a raw transaction data.frame
rt_data <- rtCreateTrans(trans_df = ex_sales, prop_id = 'pinx',
                         trans_id = 'sale_id', price = 'sale_price',
                         periodicity = 'monthly', date = 'sale_date')
# Create model object
hpi_model <- hpiModel(model_type = 'rt', hpi_df = rt_data,
                      estimator = 'base', log_dep = TRUE)
# For custom weighted repeat transaction model
hpi_model_wgt <- hpiModel(model_type = 'rt', hpi_df = rt_data,
                          estimator = 'weighted',
                          weights = runif(nrow(rt_data), 0, 1))
Estimate hpi models with hed approach
## S3 method for class 'hed'
hpiModel(model_type, hpi_df, estimator = "base", log_dep = TRUE, trim_model = TRUE, mod_spec = NULL, dep_var = NULL, ind_var = NULL, ...)
model_type |
Type of model to estimate ('rt', 'hed', 'rf') |
hpi_df |
Dataset created by one of the *CreateTrans() functions in this package. |
estimator |
Type of estimator to be used ('base', 'weighted', 'robust') |
log_dep |
default=TRUE; should the dependent variable (change in price) be logged? |
trim_model |
default TRUE, should excess be trimmed from model results ('lm' or 'rlm' object)? |
mod_spec |
default=NULL; hedonic model specification |
dep_var |
default=NULL; dependent variable of the model |
ind_var |
default=NULL; independent variable(s) of the model |
... |
Additional Arguments |
hpimodel object consisting of:
Type of estimator
Data.frame of coefficients
class 'rtmodel' or 'hedmodel'
Full model specification
Binary: is the dependent variable in logged format
Mean price in the base period
'data.frame' of periods
Type of model used
Estimate hpi models with rf approach
## S3 method for class 'rf'
hpiModel(model_type, hpi_df, estimator = "pdp", log_dep = TRUE, trim_model = TRUE, mod_spec = NULL, dep_var = NULL, ind_var = NULL, ...)
model_type |
Type of model ('rt', 'hed', 'rf') |
hpi_df |
Dataset created by one of the *CreateTrans() functions in this package. |
estimator |
Type of estimator to be used ('pdp') |
log_dep |
default=TRUE; should the dependent variable (change in price) be logged? |
trim_model |
default TRUE, should excess be trimmed from model results ('lm' or 'rlm' object)? |
mod_spec |
default=NULL; hedonic model specification |
dep_var |
default=NULL; dependent variable of the model |
ind_var |
default=NULL; independent variable(s) of the model |
... |
Additional Arguments |
hpimodel object consisting of:
Type of estimator
Data.frame of coefficients
class 'rtmodel' or 'hedmodel'
Full model specification
Binary: is the dependent variable in logged format
Mean price in the base period
'data.frame' of periods
Type of model used
Estimate hpi models with rt approach
## S3 method for class 'rt'
hpiModel(model_type, hpi_df, estimator = "base", log_dep = TRUE, trim_model = TRUE, mod_spec = NULL, ...)
model_type |
Type of model to estimate ('rt', 'hed', 'rf') |
hpi_df |
Dataset created by one of the *CreateTrans() functions in this package. |
estimator |
Type of estimator to be used ('base', 'weighted', 'robust') |
log_dep |
default TRUE, should the dependent variable (change in price) be logged? |
trim_model |
default TRUE, should excess be trimmed from model results ('lm' or 'rlm' object)? |
mod_spec |
Model specification |
... |
Additional Arguments |
hpimodel object consisting of:
Type of estimator
Data.frame of coefficients
class 'rtmodel' or 'hedmodel'
Full model specification
Binary: is the dependent variable in logged format
Mean price in the base period
'data.frame' of periods
Type of model used
House Price Indexes in R: A set of tools to create house price indexes and analyze their various performance metrics.
Function to help create KFold data based on approach (Generic Method)
matchKFold(train_df, pred_df)
train_df |
Data.frame of training data |
pred_df |
Data.frame (class 'hpidata') to be used for prediction |
list
Training data
Scoring data
Helper function called from createKFoldData
Function to help create KFold data based on hed approach
## S3 method for class 'heddata'
matchKFold(train_df, pred_df)
train_df |
Data.frame of training data |
pred_df |
Data.frame (class 'hpidata') to be used for prediction |
Function to help create KFold data based on rt approach
## S3 method for class 'rtdata'
matchKFold(train_df, pred_df)
train_df |
Data.frame of training data |
pred_df |
Data.frame (class 'hpidata') to be used for prediction |
Converts model results to standardized index objects
modelToIndex(model_obj, max_period = max(model_obj$coefficients$time), ...)
model_obj |
Model results object |
max_period |
Maximum number of periods that should have been estimated. |
... |
Additional arguments |
'hpiindex' object containing:
name |
vector of period names |
numeric |
vector of period in numeric form |
period |
vector of period numbers |
value |
'ts' object of the index values |
imputed |
vector of binary values indicating imputation |
# Load data
data(ex_sales)
# With a raw transaction data.frame
rt_data <- rtCreateTrans(trans_df = ex_sales, prop_id = 'pinx',
                         trans_id = 'sale_id', price = 'sale_price',
                         periodicity = 'monthly', date = 'sale_date')
# Create model object
hpi_model <- hpiModel(model_type = 'rt', hpi_df = rt_data,
                      estimator = 'base', log_dep = TRUE)
# Create Index
hpi_index <- modelToIndex(hpi_model, max_period = 84)
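As a rough illustration of how period coefficients can become index values, the sketch below assumes a common convention: the base period is fixed at 100, coefficients from a logged dependent variable are exponentiated, and level coefficients are rescaled by the mean price in the base period. These formulas and all values are assumptions made for illustration; the package's internal calculation may differ.

```r
# Assumed coefficients from a model with a logged dependent variable
# (base period coefficient is 0 by construction)
coefs_log <- c(0, 0.02, 0.05, 0.04)
index_log <- 100 * exp(coefs_log)

# Assumed coefficients in price levels, with a made-up base period mean price
base_price  <- 400000
coefs_level <- c(0, 8000, 20000, 16000)
index_level <- 100 * (coefs_level + base_price) / base_price

index_level
```

Under these assumptions both series start at 100 in the base period, which makes indexes from different model types directly comparable.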
Generic method for creating a simple table of all selected periods. Used within 'dateToPeriod()'.
periodTable(trans_df, periodicity, ...)
trans_df |
Transaction data.frame |
periodicity |
Periodicity option ('weekly', 'monthly', 'quarterly', 'annually') |
... |
Additional Arguments |
[data.frame] consisting of
Period number
start date of each period
end date of each period
name of the period
# Load data
data(ex_sales)
ex_sales$trans_date <- checkDate(ex_sales[['sale_date']], 'date')
# With a raw transaction data.frame
pt_df <- periodTable(trans_df = ex_sales, periodicity = 'annual')
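To make the shape of a period table concrete, the following base R sketch builds a monthly table with a period number, start date, end date and name, then assigns each sale to a period. The column names and sale dates are assumptions chosen for this illustration and may not match the columns the package returns.

```r
# Made-up sale dates
sale_dates <- as.Date(c("2010-01-15", "2010-03-02", "2010-03-28", "2010-06-10"))

# First day of the first and last months observed
first_month <- as.Date(format(min(sale_dates), "%Y-%m-01"))
last_month  <- as.Date(format(max(sale_dates), "%Y-%m-01"))

# One row per month, including months with no sales
starts <- seq(first_month, last_month, by = "month")
ends   <- seq(first_month, by = "month", length.out = length(starts) + 1)[-1] - 1

period_table <- data.frame(period     = seq_along(starts),
                           start_date = starts,
                           end_date   = ends,
                           name       = format(starts, "%Y-%m"))

# Period number for each sale: the last period whose start is on or before the sale
sale_period <- findInterval(as.numeric(sale_dates), as.numeric(starts))
```

Months without any sales still receive a row, so period numbers stay contiguous across the full date range.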
Specific method for creating annual period table
## S3 method for class 'annual'
periodTable(trans_df, periodicity, ...)
trans_df |
Transaction data.frame |
periodicity |
Periodicity option ('weekly', 'monthly', 'quarterly', 'annually') |
... |
Additional Arguments |
Specific method for creating flexible frequency periods
## S3 method for class 'equalfreq'
periodTable(trans_df, periodicity, freq = NULL, start = NULL, first_date = NULL, last_date = NULL, ...)
trans_df |
Transaction data.frame |
periodicity |
Periodicity option ('weekly', 'monthly', 'quarterly', 'annually') |
freq |
[30] Frequency width of each period in days |
start |
['first'] Where to start counting ('first' or 'last') |
first_date |
[NULL] If NULL, the data determines the first date. Otherwise, set your own. Note that the first_date must be outside of the range of transaction dates. It can only extend the time period, not clip it; clipping should be done elsewhere. |
last_date |
[NULL] If NULL, the data determines the last date. Otherwise, set your own. Note that the last_date must be outside of the range of transaction dates. It can only extend the time period, not clip it; clipping should be done elsewhere. |
... |
Additional Arguments |
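The idea of fixed-width periods can be sketched in a few lines of base R. This example assumes counting starts from the first observed date (the 'first' option) and that 'freq' is a width in days; the sale dates are made up, and how the package treats the 'last' option or partial final periods is not shown here.

```r
# Made-up sale dates
sale_dates <- as.Date(c("2010-01-01", "2010-01-20", "2010-02-05", "2010-04-15"))

# Width of each period in days
freq <- 30

# Days elapsed since the first date, then integer-divide into periods
first_date   <- min(sale_dates)
days_elapsed <- as.numeric(sale_dates - first_date)
period       <- days_elapsed %/% freq + 1

# Start date of each period up to the last one observed
period_starts <- first_date + freq * (seq_len(max(period)) - 1)
```

Period 3 contains no sales in this toy data, which shows why a full period table is useful alongside the per-sale assignments.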
Specific method for creating flexible frequency periods
## S3 method for class 'equalsample'
periodTable(trans_df, periodicity, nbr_periods, ...)
trans_df |
Transaction data.frame |
periodicity |
Periodicity option ('weekly', 'monthly', 'quarterly', 'annually') |
nbr_periods |
Number of periods to use |
... |
Additional Arguments |
Specific method for creating monthly period table
## S3 method for class 'monthly'
periodTable(trans_df, periodicity, ...)
trans_df |
Transaction data.frame |
periodicity |
Periodicity option ('weekly', 'monthly', 'quarterly', 'annually') |
... |
Additional Arguments |
Specific method for creating quarterly period table
## S3 method for class 'quarterly'
periodTable(trans_df, periodicity, ...)
trans_df |
Transaction data.frame |
periodicity |
Periodicity option ('weekly', 'monthly', 'quarterly', 'annually') |
... |
Additional Arguments |
Specific method for creating weekly period table
## S3 method for class 'weekly'
periodTable(trans_df, periodicity, ...)
trans_df |
Transaction data.frame |
periodicity |
Periodicity option ('weekly', 'monthly', 'quarterly', 'annually') |
... |
Additional Arguments |
Specific plotting method for hpi objects
## S3 method for class 'hpi'
plot(x, ...)
x |
Object to plot of class 'hpi' |
... |
Additional Arguments |
'plotindex' object inheriting from a ggplot object
Additional arguments can include those for 'plot.hpiindex()'.
# Load data
data(ex_sales)
# Create index with raw transaction data
rt_index <- rtIndex(trans_df = ex_sales, periodicity = 'monthly',
                    min_date = '2010-06-01', max_date = '2015-11-30',
                    adj_type = 'clip', date = 'sale_date', price = 'sale_price',
                    trans_id = 'sale_id', prop_id = 'pinx', estimator = 'robust',
                    log_dep = TRUE, trim_model = TRUE, max_period = 48,
                    smooth = FALSE)
# Plot data
plot(rt_index)
plot(rt_index, smooth = TRUE)
Specific plotting method for hpiaccuracy objects
## S3 method for class 'hpiaccuracy'
plot(x, return_plot = FALSE, do_plot = TRUE, use_log_error = FALSE, ...)
x |
Object to plot of class 'hpiaccuracy' |
return_plot |
default = FALSE; Return the plot to the function call |
do_plot |
default = TRUE; Execute plotting to terminal/console |
use_log_error |
[FALSE] Use the log error? |
... |
Additional Arguments |
'plotaccuracy' object inheriting from a ggplot object
# Load Data
data(ex_sales)
# Create Index
rt_index <- rtIndex(trans_df = ex_sales, periodicity = 'monthly',
                    min_date = '2010-06-01', max_date = '2015-11-30',
                    adj_type = 'clip', date = 'sale_date', price = 'sale_price',
                    trans_id = 'sale_id', prop_id = 'pinx', estimator = 'robust',
                    log_dep = TRUE, trim_model = TRUE, max_period = 48,
                    smooth = FALSE)
# Calculate insample accuracy
hpi_accr <- calcAccuracy(hpi_obj = rt_index, test_type = 'rt',
                         test_method = 'insample')
# Make Plot
plot(hpi_accr)
Specific plotting method for hpiindex objects
## S3 method for class 'hpiindex'
plot(x, show_imputed = FALSE, smooth = FALSE, ...)
x |
Object to plot of class 'hpiindex' |
show_imputed |
default = FALSE; highlight the imputed points |
smooth |
default = FALSE; plot the smoothed index |
... |
Additional Arguments |
'plotindex' object inheriting from a ggplot object
# Load data
data(ex_sales)
# With a raw transaction data.frame
rt_data <- rtCreateTrans(trans_df = ex_sales, prop_id = 'pinx',
                         trans_id = 'sale_id', price = 'sale_price',
                         periodicity = 'monthly', date = 'sale_date')
# Create model object
hpi_model <- hpiModel(model_type = 'rt', hpi_df = rt_data,
                      estimator = 'base', log_dep = TRUE)
# Create Index
hpi_index <- modelToIndex(hpi_model, max_period = 84)
# Make Plot
plot(hpi_index)
Specific plotting method for indexvolatility objects
## S3 method for class 'indexvolatility'
plot(x, ...)
x |
Object to plot of class 'indexvolatility' |
... |
Additional Arguments |
'plotvolatility' object inheriting from a ggplot object
# Load Data
data(ex_sales)
# Create index with raw transaction data
rt_index <- rtIndex(trans_df = ex_sales, periodicity = 'monthly',
                    min_date = '2010-06-01', max_date = '2015-11-30',
                    adj_type = 'clip', date = 'sale_date', price = 'sale_price',
                    trans_id = 'sale_id', prop_id = 'pinx', estimator = 'robust',
                    log_dep = TRUE, trim_model = TRUE, max_period = 48,
                    smooth = FALSE)
# Calculate Volatility
index_vol <- calcVolatility(index = rt_index, window = 3)
# Make Plot
plot(index_vol)
Specific plotting method for seriesaccuracy objects
## S3 method for class 'seriesaccuracy'
plot(x, return_plot = FALSE, ...)
x |
Object of class 'hpiaccuracy' |
return_plot |
default = FALSE; Return the plot to the function call |
... |
Additional arguments (passed to 'plot.hpiaccuracy()') |
'plotaccuracy' object inheriting from a ggplot object
# Load data
data(ex_sales)
# Create index
rt_index <- rtIndex(trans_df = ex_sales, periodicity = 'monthly',
                    min_date = '2010-06-01', max_date = '2015-11-30',
                    adj_type = 'clip', date = 'sale_date', price = 'sale_price',
                    trans_id = 'sale_id', prop_id = 'pinx', estimator = 'robust',
                    log_dep = TRUE, trim_model = TRUE, max_period = 48,
                    smooth = FALSE)
# Create Series (suppressing messages due to the small sample size of this example)
suppressMessages(
  hpi_series <- createSeries(hpi_obj = rt_index, train_period = 12))
# Calculate insample accuracy
hpi_series_accr <- calcSeriesAccuracy(series_obj = hpi_series,
                                      test_type = 'rt',
                                      test_method = 'insample')
# Make Plot
plot(hpi_series_accr)
Specific plotting method for serieshpi objects
## S3 method for class 'serieshpi'
plot(x, smooth = FALSE, ...)
x |
Object of class 'serieshpi' |
smooth |
default = FALSE; plot the smoothed object |
... |
Additional Arguments |
'plotseries' object inheriting from a ggplot object
# Load data
data(ex_sales)
# Create index
rt_index <- rtIndex(trans_df = ex_sales, periodicity = 'monthly',
                    min_date = '2010-06-01', max_date = '2015-11-30',
                    adj_type = 'clip', date = 'sale_date', price = 'sale_price',
                    trans_id = 'sale_id', prop_id = 'pinx', estimator = 'robust',
                    log_dep = TRUE, trim_model = TRUE, max_period = 48,
                    smooth = FALSE)
# Create Series (suppressing messages due to the small sample size of this example)
suppressMessages(
  hpi_series <- createSeries(hpi_obj = rt_index, train_period = 12))
# Make Plot
plot(hpi_series)
Specific plotting method for seriesrevision objects
## S3 method for class 'seriesrevision'
plot(x, measure = "median", ...)
x |
Object to plot of class 'seriesrevision' |
measure |
default = 'median'; Metric to plot ('median' or 'mean') |
... |
Additional Arguments |
'plotrevision' object inheriting from a ggplot object
# Load example sales
data(ex_sales)
# Create Index
rt_index <- rtIndex(trans_df = ex_sales, periodicity = 'monthly',
                    min_date = '2010-06-01', max_date = '2015-11-30',
                    adj_type = 'clip', date = 'sale_date', price = 'sale_price',
                    trans_id = 'sale_id', prop_id = 'pinx', estimator = 'robust',
                    log_dep = TRUE, trim_model = TRUE, max_period = 48,
                    smooth = FALSE)
# Create Series (suppressing messages due to the small sample size of this example)
suppressMessages(
  hpi_series <- createSeries(hpi_obj = rt_index, train_period = 12))
# Calculate revision
series_rev <- calcRevision(series_obj = hpi_series)
# Make Plot
plot(series_rev)
Wrapper to create index object via entire random forest approach
rfIndex(trans_df, dep_var = NULL, ind_var = NULL, rf_spec = NULL, ...)
trans_df |
data.frame of transactions |
dep_var |
default = NULL; Dependent variable in hedonic model |
ind_var |
default = NULL; Independent variables in the hedonic model |
rf_spec |
default = NULL; Full random forest model specification |
... |
Additional Arguments |
'hpi' object. S3 list with:
'hpidata' object
'hpimodel' object
'hpiindex' object
Additional arguments must supply whatever is needed to create an 'hpidata' object if the 'trans_df' object is not already of that class.
# Load data
data(ex_sales)
# Create index with raw transaction data
rf_index <- rfIndex(trans_df = ex_sales, periodicity = 'monthly',
                    min_date = '2010-06-01', max_date = '2015-11-30',
                    adj_type = 'clip', date = 'sale_date', price = 'sale_price',
                    trans_id = 'sale_id', prop_id = 'pinx', estimator = 'pdp',
                    log_dep = TRUE, trim_model = TRUE, max_period = 48,
                    dep_var = 'price', ind_var = c('tot_sf', 'beds', 'baths'),
                    smooth = FALSE, ntrees = 10, sim_count = 2)
Estimate coefficients for an index via the random forest approach (generic method)
rfModel(estimator, rf_df, rf_spec, ntrees = 200, seed = 1, ...)
estimator |
Type of model to estimate (pdp) |
rf_df |
Transactions dataset from hedCreateTrans() |
rf_spec |
Model specification ('formula' object) |
ntrees |
[200] Set number of trees to use |
seed |
[1] Random seed for reproducibility |
... |
Additional arguments |
'rfmodel' object: model object of the estimator (ex.: 'lm')
The 'estimator' argument must be in a class of 'pdp'. This function is not generally called directly, but rather from 'hpiModel()'.
# Load example data
data(ex_sales)
# Create hedonic data
hed_data <- hedCreateTrans(trans_df = ex_sales, prop_id = 'pinx',
                           trans_id = 'sale_id', price = 'sale_price',
                           date = 'sale_date', periodicity = 'monthly')
# Estimate Model
rf_model <- rfModel(estimator = structure('pdp', class = 'pdp'),
                    rf_df = hed_data,
                    rf_spec = as.formula(log(price) ~ baths + tot_sf),
                    ntrees = 10, sim_count = 1)
Use of pdp estimator in random forest approach
## S3 method for class 'pdp'
rfModel(estimator, rf_df, rf_spec, ntrees = 200, seed = 1, ...)
estimator |
Type of model to estimate (pdp) |
rf_df |
Transactions dataset from hedCreateTrans() |
rf_spec |
Model specification ('formula' object) |
ntrees |
[200] Set number of trees to use |
seed |
[1] Random seed for reproducibility |
... |
Additional arguments |
See '?rfModel' for more information
Create data to use in PDP simulation
rfSimDf(rf_df, seed, sim_ids = NULL, sim_count = NULL, sim_per = NULL, ...)
rf_df |
Full training dataset |
seed |
Random seed for reproducibility |
sim_ids |
row ids to simulate |
sim_count |
number of random rows to simulate |
sim_per |
percent of rows to randomly simulate |
... |
Additional arguments |
See '?rfModel' for more information
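The three selection arguments ('sim_ids', 'sim_count', 'sim_per') can be read as alternative ways of choosing which rows to simulate. The base R sketch below shows one plausible way such a choice could work. The helper pickSimRows() is hypothetical, the order of precedence among the arguments is an assumption, and 'sim_per' is treated here as a fraction between 0 and 1, which may not match the package's behavior.

```r
# Hypothetical helper: choose row ids to simulate, reproducibly
pickSimRows <- function(n_rows, seed, sim_ids = NULL,
                        sim_count = NULL, sim_per = NULL) {
  set.seed(seed)
  # Explicit ids win (assumed precedence)
  if (!is.null(sim_ids)) return(sim_ids)
  # A fixed number of random rows
  if (!is.null(sim_count)) return(sample.int(n_rows, min(sim_count, n_rows)))
  # A share of rows, assumed to be given as a fraction
  if (!is.null(sim_per)) {
    return(sample.int(n_rows, max(1, floor(n_rows * sim_per))))
  }
  # Nothing specified: use every row
  seq_len(n_rows)
}

rows_a <- pickSimRows(100, seed = 1, sim_count = 10)
rows_b <- pickSimRows(100, seed = 1, sim_count = 10)
rows_c <- pickSimRows(100, seed = 1, sim_per = 0.25)
rows_d <- pickSimRows(100, seed = 1, sim_ids = c(2, 5))
```

Fixing the seed makes repeated calls return the same rows, which is the role the 'seed' argument plays in reproducibility.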
Generate standardized object for rt estimate approach
rtCreateTrans( trans_df, prop_id, trans_id, price, date = NULL, periodicity = NULL, seq_only = FALSE, min_period_dist = NULL, ... )
trans_df |
transactions in either a data.frame or an 'hpidata' class from the dateToPeriod() function |
prop_id |
field containing the unique property identification |
trans_id |
field containing the unique transaction identification |
price |
field containing the transaction price |
date |
default=NULL, field containing the date of the sale. Only necessary if not passing an 'hpidata' object |
periodicity |
default=NULL, field containing the desired periodicity of analysis. Only necessary if not passing a 'hpidata' object |
seq_only |
default=FALSE, indicating whether to only include sequential repeat observations 1 to 2 and 2 to 3. False returns 1 to 2, 1 to 3 and 2 to 3. |
min_period_dist |
[12] Minimum number of periods required between repeat sales |
... |
Additional arguments |
data.frame of repeat transactions. Note that a full data.frame of the possible periods, their values and names can be found in the attributes to the returned 'rtdata' object
Properties with more than two transactions during the period will make pairwise matches among all sales. For any property transacting twice in the same period, the lower priced of the two transactions is removed. If passing a raw data.frame (not an 'hpidata' object), the "date" field should refer to a field containing a vector of class POSIXt or Date.
# Load data
data(ex_sales)
# With a raw transaction data.frame
rt_data <- rtCreateTrans(trans_df = ex_sales, prop_id = 'pinx',
                         trans_id = 'sale_id', price = 'sale_price',
                         periodicity = 'monthly', date = 'sale_date')
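The difference between 'seq_only = TRUE' and 'seq_only = FALSE' is easiest to see on a property with three sales. The base R sketch below pairs sales per property both ways. The helper pairSales(), the column names and the toy data are assumptions for illustration only and do not reproduce the package's implementation.

```r
# Made-up sales: property 'a' sells three times, property 'b' twice
sales <- data.frame(prop_id = c("a", "a", "a", "b", "b"),
                    period  = c(1, 5, 9, 2, 7),
                    price   = c(100, 120, 150, 300, 330),
                    stringsAsFactors = FALSE)

# Hypothetical helper: build repeat-sale pairs for one property
pairSales <- function(df, seq_only = FALSE) {
  if (nrow(df) < 2) return(NULL)
  df <- df[order(df$period), ]
  n <- nrow(df)
  idx <- if (seq_only) {
    # Sequential pairs only: 1 to 2, 2 to 3, ...
    cbind(seq_len(n - 1), seq_len(n - 1) + 1)
  } else {
    # Every pairwise combination: 1 to 2, 1 to 3, 2 to 3, ...
    t(utils::combn(n, 2))
  }
  data.frame(prop_id  = df$prop_id[idx[, 1]],
             period_1 = df$period[idx[, 1]],
             period_2 = df$period[idx[, 2]],
             price_1  = df$price[idx[, 1]],
             price_2  = df$price[idx[, 2]])
}

by_prop   <- split(sales, sales$prop_id)
all_pairs <- do.call(rbind, lapply(by_prop, pairSales))
seq_pairs <- do.call(rbind, lapply(by_prop, pairSales, seq_only = TRUE))
```

With all pairwise matches, property 'a' contributes three pairs; with sequential matches it contributes two.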
Wrapper to create index object via entire repeat transaction approach
rtIndex(trans_df, ...)
trans_df |
data.frame of transactions. Can be a 'hpidata' or an 'rtdata' object. |
... |
Additional Arguments |
'hpi' object. S3 list with:
'hpidata' object
'hpimodel' object
'hpiindex' object
Additional arguments must supply whatever is needed to create an 'hpidata' object if the 'trans_df' object is not already of that class.
# Load data
data(ex_sales)
# Create index with raw transaction data
rt_index <- rtIndex(trans_df = ex_sales, periodicity = 'monthly',
                    min_date = '2010-06-01', max_date = '2015-11-30',
                    adj_type = 'clip', date = 'sale_date', price = 'sale_price',
                    trans_id = 'sale_id', prop_id = 'pinx', estimator = 'robust',
                    log_dep = TRUE, trim_model = TRUE, max_period = 48,
                    smooth = FALSE)
Estimate coefficients for an index via the repeat transaction approach (generic method)
rtModel(rt_df, time_matrix, price_diff, estimator, lm_recover = TRUE, ...)
rt_df |
Repeat transactions dataset from rtCreateTrans() |
time_matrix |
Time matrix object from rtTimeMatrix() |
price_diff |
Difference in price between the two transactions |
estimator |
Type of model to estimate (base, robust, weighted). Must be in that class. |
lm_recover |
(TRUE) Allows robust model to use linear model if it fails |
... |
Additional arguments |
'rtmodel' object
Three available specific methods: 'base', 'robust' and 'weighted'
# Load data
data(ex_sales)
# With a raw transaction data.frame
rt_data <- rtCreateTrans(trans_df = ex_sales, prop_id = 'pinx',
                         trans_id = 'sale_id', price = 'sale_price',
                         periodicity = 'monthly', date = 'sale_date')
# Calc price differences
price_diff <- rt_data$price_2 - rt_data$price_1
# Create time matrix
rt_matrix <- rtTimeMatrix(rt_data)
# Calculate model
rt_model <- rtModel(rt_df = rt_data, price_diff = price_diff,
                    time_matrix = rt_matrix,
                    estimator = structure('base', class = 'base'))
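The roles of the time matrix and the price difference can be pictured with a toy repeat-sales regression in base R. The sketch assumes the textbook design: each row has -1 in the column of the first sale period and +1 in the column of the second, the base period column is dropped, and the dependent variable is the difference in log prices. All data and column names are made up, and the package's own matrix layout may differ.

```r
# Made-up repeat-sale pairs over four periods
rt_toy <- data.frame(period_1 = c(1, 2, 1, 3),
                     period_2 = c(3, 4, 2, 4),
                     price_1  = c(100, 120, 200, 121),
                     price_2  = c(121, 150, 210, 131.25))

# One column per period after the base period (period 1)
n_periods <- max(rt_toy$period_2)
time_matrix <- matrix(0, nrow = nrow(rt_toy), ncol = n_periods - 1,
                      dimnames = list(NULL, paste0("time_", 2:n_periods)))
for (i in seq_len(nrow(rt_toy))) {
  p1 <- rt_toy$period_1[i]
  p2 <- rt_toy$period_2[i]
  if (p1 > 1) time_matrix[i, p1 - 1] <- -1  # first sale
  time_matrix[i, p2 - 1] <- 1               # second sale
}

# Log price change between the two sales of each pair
price_diff <- log(rt_toy$price_2) - log(rt_toy$price_1)

# Regression through the origin; coefficients are log index levels
fit <- stats::lm(price_diff ~ time_matrix - 1)
index <- 100 * exp(c(0, unname(stats::coef(fit))))
```

Because the toy prices were chosen to be mutually consistent, the regression recovers an index of 100, 105, 121 and 131.25 across the four periods.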
Use of base estimator in repeat transactions model approach
## S3 method for class 'base' rtModel(rt_df, time_matrix, price_diff, estimator, ...)
rt_df |
Repeat transactions dataset from rtCreateTrans() |
time_matrix |
Time matrix object from rtTimeMatrix() |
price_diff |
Difference in price between the two transactions |
estimator |
Type of model to estimate ('base', 'robust' or 'weighted'). The object must carry the matching class. |
... |
Additional arguments |
See '?rtModel' for more information
Use of robust estimator in repeat transactions model approach
## S3 method for class 'robust' rtModel(rt_df, time_matrix, price_diff, estimator, lm_recover = TRUE, ...)
rt_df |
Repeat transactions dataset from rtCreateTrans() |
time_matrix |
Time matrix object from rtTimeMatrix() |
price_diff |
Difference in price between the two transactions |
estimator |
Type of model to estimate ('base', 'robust' or 'weighted'). The object must carry the matching class. |
lm_recover |
default = TRUE; allows the robust model to fall back to a linear model if it fails |
... |
Additional arguments |
See '?rtModel' for more information
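The 'lm_recover' behaviour can be sketched as a generic fallback pattern. The code below is an assumption about the general idea only, written in base R; 'fit_with_recovery' and 'failing_fit' are hypothetical helpers, and the package's actual robust fitting routine is not reproduced here.

```r
# Hypothetical helper: try a robust fit, fall back to lm() if it errors
fit_with_recovery <- function(formula, data, robust_fit, lm_recover = TRUE) {
  tryCatch(
    robust_fit(formula, data),
    error = function(e) {
      # Without recovery, re-raise the original error
      if (!lm_recover) stop(e)
      message("Robust fit failed; falling back to lm()")
      stats::lm(formula, data = data)
    }
  )
}

# Stand-in for a robust estimator that fails (for illustration only)
failing_fit <- function(formula, data) stop("robust fit did not converge")

# Small made-up data set
toy <- data.frame(x = 1:10)
toy$y <- 2 * toy$x + c(0.1, -0.1, 0.2, -0.2, 0, 0.1, -0.1, 0.2, -0.2, 0)

fit <- fit_with_recovery(y ~ x, toy, robust_fit = failing_fit)
print(class(fit))
```

With lm_recover = FALSE the same call would propagate the error instead of returning a linear model.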
Use of weighted estimator in repeat transactions model approach
## S3 method for class 'weighted' rtModel(rt_df, time_matrix, price_diff, estimator, ...)
rt_df |
Repeat transactions dataset from rtCreateTrans() |
time_matrix |
Time matrix object from rtTimeMatrix() |
price_diff |
Difference in price between the two transactions |
estimator |
Type of model to estimate ('base', 'robust' or 'weighted'). The object must carry the matching class. |
... |
Additional arguments |
See '?rtModel' for more information
Generates the array necessary to estimate a repeat transactions model
rtTimeMatrix(rt_df)
rt_df |
object of class 'rtdata': repeat transaction data.frame created by rtCreateTrans() |
matrix to be used on the right-hand side of a repeat sales regression model
Time periods are calculated from the data provided.
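As a rough illustration of what such a matrix looks like, the sketch below builds the conventional repeat-sales design in base R: one row per sale pair, one column per period after the base period, with -1 in the period of the first sale and +1 in the period of the second. The column names 'period_1' and 'period_2', the helper 'build_time_matrix' and the toy data are assumptions made for this example; the package's own implementation may differ.

```r
# Hypothetical sale pairs: period of the first and second sale of each property
pairs <- data.frame(period_1 = c(1, 2, 1),
                    period_2 = c(3, 4, 2))

# Build a repeat-sales time matrix (illustrative helper, not package code).
# Assumes period_2 > period_1 for every row.
build_time_matrix <- function(period_1, period_2) {
  # Periods are derived from the data; the earliest period is the base
  # period and gets no column
  periods <- seq(min(period_1) + 1, max(period_2))
  tm <- matrix(0, nrow = length(period_1), ncol = length(periods),
               dimnames = list(NULL, paste0("time_", periods)))
  for (i in seq_along(period_1)) {
    # -1 marks the first sale, unless it falls in the base period
    if (period_1[i] %in% periods) tm[i, match(period_1[i], periods)] <- -1
    # +1 marks the second sale
    tm[i, match(period_2[i], periods)] <- 1
  }
  tm
}

tm <- build_time_matrix(pairs$period_1, pairs$period_2)
print(tm)
```

Regressing the price differences on a matrix of this shape yields one coefficient per period relative to the base period.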
# Load data
data(ex_sales)

# With a raw transaction data.frame
rt_data <- rtCreateTrans(trans_df = ex_sales,
                         prop_id = 'pinx',
                         trans_id = 'sale_id',
                         price = 'sale_price',
                         periodicity = 'monthly',
                         date = 'sale_date')

# Create Matrix
rt_matrix <- rtTimeMatrix(rt_data)
Seattle home sales from 2010 to 2016. Includes only detached single family residences and townhomes. Data gathered from the King County Assessor's FTP site. A number of initial data munging tasks were necessary to bring the data into this format.
data(seattle_sales)
A "data.frame" with 43,313 rows and 16 variables
The unique property identifying code. The original value is prefixed with two periods ('..') to prevent leading zeros from being dropped
The unique transaction identifying code.
Price of the home
Date of sale
Property use type
Assessment area or zone
Size of lot in square feet
Is property waterfront?
Quality of the building construction (higher is better)
Size of home in square feet
Number of bedrooms
Number of bathrooms
Age of home
Age of home, considering major remodels
Longitude
Latitude
King County Assessor: https://info.kingcounty.gov/assessor/DataDownload/
Smooths an existing hpiindex object
smoothIndex(index_obj, order = 3, in_place = FALSE, ...)
index_obj |
Index to be smoothed |
order |
default = 3; Number of nearby periods to smooth with; multiple values mean multiple iterations |
in_place |
default = FALSE; adds smoothed index to the 'hpiindex' object |
... |
Additional Arguments |
a 'ts' and 'smooth_index' object with smoothed index
Leaving order blank defaults to a moving average with order 3.
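To make the moving-average idea concrete, here is a minimal base R sketch of a centered moving average of order 3 applied to made-up index values. It uses stats::filter() purely for illustration; it is not the package's smoothing code, and the endpoints are left as NA here, whereas the package may treat them differently.

```r
# Made-up index values for illustration
index_values <- c(100, 102, 101, 105, 107, 106, 110)

# Centered moving average: each value is averaged with its neighbours
ma_order <- 3
smoothed <- stats::filter(index_values,
                          filter = rep(1 / ma_order, ma_order),
                          sides = 2)
smoothed <- as.numeric(smoothed)

# First and last values are NA because they lack a neighbour on one side
print(round(smoothed, 2))
```

A larger order averages over more neighbouring periods and produces a flatter series at the cost of responsiveness.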
# Load data
data(ex_sales)

# Create index with raw transaction data
rt_index <- rtIndex(trans_df = ex_sales,
                    periodicity = 'monthly',
                    min_date = '2010-06-01',
                    max_date = '2015-11-30',
                    adj_type = 'clip',
                    date = 'sale_date',
                    price = 'sale_price',
                    trans_id = 'sale_id',
                    prop_id = 'pinx',
                    estimator = 'robust',
                    log_dep = TRUE,
                    trim_model = TRUE,
                    max_period = 48,
                    smooth = FALSE)

# Create Smooth index
sm_index <- smoothIndex(index_obj = rt_index,
                        order = 3,
                        in_place = FALSE)

# Create Smooth index (in place)
sm_index <- smoothIndex(index_obj = rt_index,
                        order = 3,
                        in_place = TRUE)
Smooths all indexes within a progressive series of indexes
smoothSeries(series_obj, order = 3, ...)
series_obj |
Series to be smoothed |
order |
Number of nearby periods to smooth with |
... |
Additional Arguments |
a 'serieshpi' object with a smoothed index in each 'hpiindex' object
Leaving order blank defaults to a moving average with order 3.
# Load data
data(ex_sales)

# Create index
rt_index <- rtIndex(trans_df = ex_sales,
                    periodicity = 'monthly',
                    min_date = '2010-06-01',
                    max_date = '2015-11-30',
                    adj_type = 'clip',
                    date = 'sale_date',
                    price = 'sale_price',
                    trans_id = 'sale_id',
                    prop_id = 'pinx',
                    estimator = 'robust',
                    log_dep = TRUE,
                    trim_model = TRUE,
                    max_period = 48,
                    smooth = FALSE)

# Create Series (suppressing messages due to the small sample size of this example)
suppressMessages(
  hpi_series <- createSeries(hpi_obj = rt_index,
                             train_period = 12))

# Smooth indexes
sm_series <- smoothSeries(series_obj = hpi_series,
                          order = 5)