Bee the Predictor

We forecast using two bee population targets:

  1. Top 20% Density - Counties with the highest density of bee colonies, measured relative to the size (land area) of each county.
  2. Top 20% Growth - Counties with the largest change in bee population between years.
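
A minimal sketch of how a growth target could be derived, assuming a long table with hypothetical columns Fips, Year and Colonies, and assuming growth is measured as relative change between two years. The source does not say whether relative or absolute change is used, and the sample numbers are made up.

```python
import pandas as pd

def top_growth_target(df, start_year, end_year, top_share=0.20):
    """Flag counties whose colony count grew the most between two years."""
    # One row per county, one column per year
    wide = df.pivot(index="Fips", columns="Year", values="Colonies")
    # Keep counties reported in both years with a nonzero starting count
    wide = wide.dropna(subset=[start_year, end_year])
    wide = wide[wide[start_year] > 0]
    # Relative change (an assumption; absolute change would also be plausible)
    growth = (wide[end_year] - wide[start_year]) / wide[start_year]
    cutoff = growth.quantile(1 - top_share)
    return pd.DataFrame({
        "Fips": growth.index,
        "Target": (growth >= cutoff).astype(int).to_numpy(),
    })

# Illustrative data only
sample = pd.DataFrame({
    "Fips": ["01001", "01003", "01005", "01007", "01009"] * 2,
    "Year": [2017] * 5 + [2022] * 5,
    "Colonies": [100, 200, 50, 80, 40, 150, 180, 60, 80, 100],
})
growth_targets = top_growth_target(sample, 2017, 2022)
print(growth_targets)
```

With five counties, only the single fastest-growing county clears the 80th-percentile cutoff.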

Bee Population Density as Target

Our Bee Density Colab prepares honeybee population data for use in our RealityStream ML. The dataset comes from the USDA National Agricultural Statistics Service (NASS) Quick Stats, and geographic data is retrieved from the U.S. Census Bureau's GEOINFO API.
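
As an illustration of the land-area lookup, the sketch below parses a response in the Census Data API's usual JSON layout (a header row followed by data rows). The HTTP request itself is omitted. The variable name AREALAND, its unit (square meters), and the sample payload are assumptions for illustration, not taken from the Colab.

```python
import json

SQ_METERS_PER_SQ_MILE = 2_589_988.110336

def parse_land_area(payload):
    """Map 5-digit county FIPS codes to land area in square miles."""
    rows = json.loads(payload)
    header, records = rows[0], rows[1:]
    i_area = header.index("AREALAND")   # assumed variable name
    i_state = header.index("state")
    i_county = header.index("county")
    areas = {}
    for rec in records:
        # State (2 digits) + county (3 digits) form the county FIPS code
        fips = rec[i_state] + rec[i_county]
        areas[fips] = float(rec[i_area]) / SQ_METERS_PER_SQ_MILE
    return areas

# Illustrative payload only
sample_payload = json.dumps([
    ["NAME", "AREALAND", "state", "county"],
    ["Autauga County, Alabama", "1539634184", "01", "001"],
])
areas = parse_land_area(sample_payload)
print(areas)
```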

We compute county-level bee population density and classify the top 20% of counties as high-density.

The Bee Density Colab generates a .csv file containing county FIPS codes and their binary classification. The Target column is set to 1 for counties with high bee population density and 0 otherwise.

View .csv output: bee-data/targets

This dataset was sourced directly from the USDA Quick Stats website and contains county-level bee population data for five years (2002, 2007, 2012, 2017, 2022) across all U.S. states. (See the Quick Stats export image near the bottom of the current page.)

Processing Steps

1. Data Cleaning:

2. Fetching County-Level Land Area Data:

3. Merging Bee Population Data with Geographic Data:

4. Bee Population Density Calculation:

5. Creating a Binary Classification Target:

6. Uploading the .csv File to GitHub:
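
The merge, density, and classification steps above can be sketched as follows. This is a rough outline under stated assumptions, not the Colab's actual code: the column names Fips, Colonies and LandAreaSqMi are hypothetical, the cutoff is assumed to be the 80th percentile of density, and the sample numbers are made up.

```python
import pandas as pd

def build_density_target(bees, areas, top_share=0.20):
    """Merge colony counts with land area and flag the densest counties."""
    # Keep FIPS codes as zero-padded 5-character strings so merge keys match
    bees = bees.assign(Fips=bees["Fips"].astype(str).str.zfill(5))
    areas = areas.assign(Fips=areas["Fips"].astype(str).str.zfill(5))
    # Merge bee population data with geographic data
    merged = bees.merge(areas, on="Fips", how="inner")
    # Density = colonies per square mile of land
    merged["Density"] = merged["Colonies"] / merged["LandAreaSqMi"]
    # Binary target: 1 for counties at or above the cutoff, 0 otherwise
    cutoff = merged["Density"].quantile(1 - top_share)
    merged["Target"] = (merged["Density"] >= cutoff).astype(int)
    return merged[["Fips", "Target"]]

# Illustrative data only; integer FIPS codes show why zero-padding matters
bees = pd.DataFrame({"Fips": [1001, 1003, 1005, 1007, 1009],
                     "Colonies": [100, 400, 90, 50, 300]})
land = pd.DataFrame({"Fips": ["01001", "01003", "01005", "01007", "01009"],
                     "LandAreaSqMi": [100.0, 200.0, 300.0, 100.0, 50.0]})
density_targets = build_density_target(bees, land)
print(density_targets)
```

The resulting two-column table could then be written out with to_csv(..., index=False) to produce the target file.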

How to Use

The .csv output is sent directly to GitHub.

1. Open the Notebook in Google Colab:

2. Run All Code Cells:

3. Modify the bee-data:

4. Save a Backup Periodically:
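
The source does not describe how the .csv reaches GitHub. One possible approach is GitHub's REST "create or update file contents" endpoint, sketched below. The function name, owner, repository, path, and token are all placeholders, and the request is only built here, not sent.

```python
import base64
import json
import urllib.request

def build_github_upload(owner, repo, path, csv_text, token, message, sha=None):
    """Build (but do not send) a PUT request for GitHub's contents endpoint."""
    body = {
        "message": message,
        # The endpoint expects the file content base64-encoded
        "content": base64.b64encode(csv_text.encode("utf-8")).decode("ascii"),
    }
    if sha is not None:
        body["sha"] = sha  # blob SHA, needed when replacing an existing file
    url = f"https://api.github.com/repos/{owner}/{repo}/contents/{path}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )

# Placeholder values only
request = build_github_upload(
    "example-owner", "example-repo", "targets/example.csv",
    "Fips,Target\n01001,1\n", token="PLACEHOLDER", message="Update bee targets",
)
print(request.get_method(), request.full_url)
```

Sending the request would be a call to urllib.request.urlopen(request) with a valid token. In Colab, the token is better read from a secret than hard-coded in a cell.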

Random Forests for Bees

Using county industry changes to predict honey bee populations.

The following are probably all represented by the Run Models Colab now.

NOTE: bees-targets.csv was a copy of bees-targets-increase2022.csv
bees-targets-top-20-percent.csv is the top 20% of colony density (not change over time). It's a copy of bees-population-usda.csv
bees-targets-top-20-percent.csv is the default when the target path is simply "bees"

Backup and run locally in models/location-forest:

python location-forest-input-bkup.ipynb bees
python location-forest-output-bkup.ipynb bees

2-column Target tables containing county FIPS codes.

Our Run Models Colab merges the 2-column bee target data for counties with features data, which holds industry and demographic data for each county.
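
A minimal sketch of that merge, assuming both tables share a Fips key and that the features table has one row per county. The feature column names and all values are hypothetical, and the actual Run Models Colab may shape its features differently.

```python
import pandas as pd

# Illustrative 2-column target table
targets = pd.DataFrame({
    "Fips": ["01001", "01003", "01005"],
    "Target": [1, 0, 0],
})
# Illustrative features table (hypothetical columns)
features = pd.DataFrame({
    "Fips": ["01001", "01003", "01007"],
    "Population": [58000, 230000, 22000],
    "Establishments": [900, 6100, 300],
})

# Inner join keeps only counties present in both tables;
# validate raises an error if a county appears more than once on either side
merged = features.merge(targets, on="Fips", how="inner", validate="one_to_one")

X = merged.drop(columns=["Fips", "Target"])  # model inputs
y = merged["Target"]                          # model labels
print(merged)
```

Counties missing from either table (01005 and 01007 here) drop out of the training set, so it is worth checking the row count after the merge.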

Bee Pollinators

[Prior change] predicting [future] change at locations or in industry mix
For model training, a "y" column value of 1 indicates locations where [Attribute(s)] that changed in a [prior year] predict a later year.


Best Params: max depth: 8; n-estimators: 100
Accuracy before tuning: 69%.  Accuracy after tuning: 71%.
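
For reference, the parameters above could be applied as in the sketch below, assuming scikit-learn's RandomForestClassifier (the source does not name the library). The data is synthetic, so the accuracy it prints says nothing about the 69% and 71% figures reported above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the merged county features and targets
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Parameters taken from the "Best Params" line
model = RandomForestClassifier(n_estimators=100, max_depth=8, random_state=0)
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Accuracy on synthetic hold-out data: {accuracy:.2f}")
```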

Prior Bees Output

Factors that contribute to bee colony decline: nutrition deficiencies, mite infestations, viral diseases, pesticide exposure, temperature.