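Blog post

[Repost] [Computer Science] [2017.07] Machine Learning Strategies for Plant Growth Prediction Based on Time-Series Data

Viewed 1917 times | 2020-12-15 22:12 | Category: Research notes | Source: repost

This post presents a master's thesis from Eindhoven University of Technology in the Netherlands (author: Aditya Vikram Singh Bhadoria), 87 pages in total.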

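In a typical greenhouse environment, plant growth depends on input control parameters such as temperature, radiation level, vapour pressure, and CO2 level. Temperature, vapour pressure, and CO2 level are controlled artificially in these environments, whereas the radiation level depends on sunlight and weather patterns. Plant growth fluctuates as the radiation level changes and drops markedly on days with little sunshine. In the Netherlands, constantly changing weather affects the radiation level of sunlight, which is a major problem for growers and farmers. Philips Lighting Research recognized this problem and developed an artificial lighting solution that supplements the overall radiation level with light from LEDs. Philips Lighting Research then decided to study the prediction of plant growth in a greenhouse environment for a given set of input control parameters. The idea is to understand how these input parameters affect growth, and in particular how important the radiation level is for predicting plant growth.

The aim of the thesis is to identify appropriate steps for accurately predicting plant growth in a greenhouse environment for a given set of input control parameters. At present, plant growth is simulated mainly with biomechanical models, which simulate the biological and mechanical processes involved in photosynthesis and related processes. These models are proprietary and act as black boxes, and growth predictions based on them take considerable time and resources. The thesis treats the research problem as a time-series prediction problem in which the output variable (growth) depends on several input parameters. After data exploration on time-series data for tomato plants, ARIMA-family models and exponential smoothing models are used to forecast tomato growth. Machine learning (ML) methods are then applied to the same prediction task. These models include regression splines, decision trees, kernel-based methods, nearest-neighbour methods, Gaussian-process-based methods, and bagging and boosting methods. Root mean square error (RMSE), the coefficient of determination (R²), and algorithm running time are the three measures used to evaluate the predictive performance of the models. ARIMA models prove unsuitable for this study because of their high RMSE and low R². Regression-based machine learning models give lower RMSE and higher R², but some of them require long training times and do not scale well as the amount of training data grows. Gradient boosting models outperform the other machine learning models on all evaluation criteria. They are a variant of boosting algorithms, developed to be faster and more accurate than earlier boosting algorithms. The final prediction model is generalized to plants whose data are similar to the data used in the project. Several datasets that resemble the project dataset and have been modelled with gradient boosting are discussed to demonstrate the effectiveness of these ML models. The approach presented in the thesis provides steps for predicting plant growth accurately, and the whole approach is tested on simulated data for tomato plants. The work provides the knowledge needed to predict plant growth in a greenhouse environment accurately and efficiently with machine learning models.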

Plant growth in a typical greenhouse setting depends upon input control parameters like temperature, radiation levels, vapour pressure and CO2 levels. Parameters such as temperature, vapour pressure, and CO2 levels are controlled artificially in these greenhouse environments. However, radiation levels depend on the sunlight and weather patterns. Plant growth fluctuates with the changing levels of radiation and significantly diminishes on days with low amounts of sunlight. In the Netherlands, constantly changing weather affects the radiation levels of sunlight, which is a huge issue for growers and farmers. Philips Lighting Research recognized this problem and developed an artificial lighting solution. This solution supplements the overall radiation levels by providing artificial light with the help of LEDs. Philips Lighting Research has decided to perform research on predicting plant growth in a greenhouse environment for a given set of input control parameters. The idea is to understand the effect of these input parameters on growth, especially the importance of radiation levels in predicting plant growth.

The aim of this thesis is to discover appropriate steps for accurate prediction of plant growth for a given set of changing input control parameters in a greenhouse environment. Currently, plant growth modelling is performed mostly with biomechanical models, which simulate the biological and mechanical processes involved in photosynthesis and related processes. These models act as a black box and are proprietary. It takes significant time and resources to perform growth predictions based on these models. In this thesis, the research problem has been addressed as a time-series prediction problem, in which the output variable (growth) is dependent upon several input parameters. After performing data exploration exercises on such time-series data for tomato plants, ARIMA family models and exponential smoothing models have been used for forecasting the growth. After that, machine learning (ML) methods have been utilized to perform the prediction of plant growth. These machine learning models include regression splines, decision trees, kernel-based methods, nearest neighbour methods, Gaussian process based methods, and bagging and boosting methods. Root Mean Square Error (RMSE), Coefficient of Determination (R²), and running time of the algorithm are the three measures used for evaluating the efficacy of different models in growth prediction. ARIMA models are not found to be suitable for this exercise because of their higher RMSE and low R². Regression-based machine learning models provide low values of RMSE and higher R². However, some of these models require significant training time and do not scale well with the amount of training data. Gradient boosting models have outperformed other machine learning models in terms of all evaluation criteria. They are a variant of boosting algorithms and have been developed to be faster and more accurate than previous boosting algorithms. The final prediction model is generalized for plants with data similar to the data used in our project. Some datasets which are similar to our dataset and have been modelled with gradient boosting models are discussed to demonstrate the effectiveness of these ML models. The approach given in this thesis provides steps to predict plant growth accurately. Simulated plant data for tomato plants have been used to test the overall approach. This research work provides the necessary knowledge to perform accurate and efficient prediction of plant growth in a greenhouse environment using machine learning models.
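The three evaluation measures named in the abstract can be sketched in plain Python. This is only an illustration under stated assumptions, not the thesis code: the function names `rmse`, `r2_score`, and `timed_evaluation` are hypothetical, the formulas are the standard textbook definitions, and timing one combined fit-and-predict call is an assumption about how running time might be measured.

```python
import math
import time
from typing import Callable, Sequence, Tuple


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error: sqrt(mean((y - y_hat)^2))."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


def timed_evaluation(
    run_model: Callable[[], Sequence[float]],
    y_true: Sequence[float],
) -> Tuple[float, float, float]:
    """Run a model and return (RMSE, R^2, elapsed seconds).

    `run_model` is a hypothetical zero-argument callable that trains a
    model and returns its predictions for the held-out growth values.
    """
    start = time.perf_counter()
    y_pred = run_model()
    elapsed = time.perf_counter() - start
    return rmse(y_true, y_pred), r2_score(y_true, y_pred), elapsed


if __name__ == "__main__":
    # Made-up numbers, purely to show the call pattern.
    observed = [1.0, 2.0, 3.0, 4.0]
    error, r2, seconds = timed_evaluation(lambda: [1.0, 2.0, 3.0, 2.0], observed)
    print(f"RMSE={error:.3f}  R2={r2:.3f}  time={seconds:.6f}s")
```

A lower RMSE and an R² closer to 1 indicate a better fit. A model that always predicts the mean of the observed values scores R² = 0, which gives a natural baseline for comparing the model families listed above.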

Shared from the blog 大工至善|大学至真: http://blog.sciencenet.cn/u/lcj2212916

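1. Introduction

2. Literature Review

3. Data Exploration and Prerequisites

4. Traditional Time Series Models

5. Machine Learning Models

6. Results

7. Conclusion

Appendix A: Computing Environment

Appendix B: Hyperparameter Estimation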

Reposted by 刘春静
