R Gleanings: the hdf5r package

Source: 一氧化碳中毒   Published: 2021-6-21 21:14:18
Built for big data: an overview of hdf5r

HDF5 is a file format for storing large datasets. Besides the hdf5r package covered here, the h5 package on CRAN and rhdf5 on Bioconductor provide similar functionality.

A simple start: creating files, groups, and datasets

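library(hdf5r)
# Create a temporary HDF5 file name
test_filename <- tempfile(fileext = ".h5")
# Open the file for writing; mode = "w" creates it and overwrites any existing file
file.h5 <- H5File$new(test_filename, mode = "w")
file.h5
# Class: H5File
# Filename: C:\Users\cmusunqi\TMP\Rtmp2Vb8Pj\file29ac4ea.h5
# Access type: H5F_ACC_RDWR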

Create two groups: one to hold the mtcars data and one for the nycflights13 data.

mtcars.grp <- file.h5$create_group("mtcars")
flights.grp <- file.h5$create_group("flights")

Writing data

library(datasets)
library(nycflights13)
library(reshape2)
# Add the mtcars data to its group
mtcars.grp[["mtcars"]] <- datasets::mtcars
# Put the weather data into the flights group
flights.grp[["weather"]] <- nycflights13::weather
# Put the flights data into the flights group
flights.grp[["flights"]] <- nycflights13::flights

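From the weather data, extract the wind direction and wind speed for station EWR and store each as a matrix, with hours as columns and dates as rows.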

# Take a subset with subset()
weather_wind_dir <- subset(
  # Select rows
  nycflights13::weather, origin == "EWR",
  # Select columns
  select = c("year", "month", "day", "hour", "wind_dir")
)
# Drop rows with missing values
weather_wind_dir <- na.exclude(weather_wind_dir)
# Convert wind direction to integer
weather_wind_dir$wind_dir <- as.integer(weather_wind_dir$wind_dir)
# acast casts long data into a matrix; it is the array counterpart of dcast
weather_wind_dir <- acast(weather_wind_dir, year + month + day ~ hour, value.var = "wind_dir")
# Put the wind direction matrix into the flights group
flights.grp[["wind_dir"]] <- weather_wind_dir

# Same processing for wind speed
weather_wind_speed <- subset(
  nycflights13::weather, origin == "EWR",
  select = c("year", "month", "day", "hour", "wind_speed")
)
weather_wind_speed <- na.exclude(weather_wind_speed)
# Convert the long data into a wide matrix
weather_wind_speed <- acast(weather_wind_speed, year + month + day ~ hour, value.var = "wind_speed")
# Put the wind speed matrix into the flights group
flights.grp[["wind_speed"]] <- weather_wind_speed
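To make the reshaping step easier to picture, here is a minimal sketch on a hypothetical four-row data frame. The name `toy` and its values are made up for illustration and do not come from nycflights13. It assumes reshape2 is installed, and shows `acast` turning one row per day and hour into a matrix with one row per day and one column per hour.

```r
library(reshape2)

# Hypothetical long-format data: two days, two hours each
toy <- data.frame(
  year = 2013, month = 1,
  day = c(1, 1, 2, 2),
  hour = c(0, 1, 0, 1),
  wind_dir = c(230L, 240L, 250L, 260L)
)

# Left of ~ defines the rows (one per date), right of ~ defines the columns (one per hour)
wide <- acast(toy, year + month + day ~ hour, value.var = "wind_dir")

dim(wide)       # 2 rows (days) by 2 columns (hours)
rownames(wide)  # row labels built from year, month, and day
colnames(wide)  # column labels built from hour
```

Because the result is a plain matrix with row and column names, it can be written to an HDF5 dataset directly, while the names have to be stored separately.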

Define attributes: store the row and column names of the wind direction and wind speed matrices as attributes of their datasets.

h5attr(flights.grp[["wind_dir"]], "colnames") <- colnames(weather_wind_dir)
h5attr(flights.grp[["wind_dir"]], "rownames") <- rownames(weather_wind_dir)
h5attr(flights.grp[["wind_speed"]], "colnames") <- colnames(weather_wind_speed)
h5attr(flights.grp[["wind_speed"]], "rownames") <- rownames(weather_wind_speed)

Getting information

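This part matters more. For now, what I actually need is to read data; as for building HDF5 files, I don't expect to be doing that for the time being.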
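Since reading is the main use case, the following is a minimal sketch of reading from an HDF5 file with hdf5r. The file, the group name `demo`, the dataset name `m`, and the attribute values are all hypothetical and exist only for this illustration. It assumes hdf5r is installed, and it writes a small file first so that the example stands alone.

```r
library(hdf5r)

# Hypothetical demo file, written first so the example is self-contained
demo_path <- tempfile(fileext = ".h5")
demo.h5 <- H5File$new(demo_path, mode = "w")
demo.grp <- demo.h5$create_group("demo")
demo.grp[["m"]] <- matrix(1:20, nrow = 4, ncol = 5)
h5attr(demo.grp[["m"]], "colnames") <- paste0("c", 1:5)
demo.h5$close_all()

# Reopen the file read-only and get a handle to the dataset by its path
demo.h5 <- H5File$new(demo_path, mode = "r")
ds <- demo.h5[["demo/m"]]

ds_dims <- ds$dims                    # dimensions, without loading the data
first_rows <- ds[1:2, ]               # read only the first two rows
everything <- ds$read()               # read the whole dataset
col_names <- h5attr(ds, "colnames")   # read an attribute back

# Close the file and every object opened from it
demo.h5$close_all()
```

Indexing the dataset handle reads only the requested slice from disk, which is the point of the format when the full dataset does not fit comfortably in memory.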

File and group information

# List the groups under file.h5
names(file.h5)
# [1] "flights" "mtcars"
# List the datasets in the flights group
names(flights.grp)
## [1] "flights" "weather" "wind_dir" "wind_speed"
# ls() returns names, link types, dataset dimensions, and other details
flights.grp$ls()
##         name     link.type    obj_type num_attrs group.nlinks group.mounted
## 1    flights H5L_TYPE_HARD H5I_DATASET         0           NA            NA
## 2    weather H5L_TYPE_HARD H5I_DATASET         0           NA            NA
## 3   wind_dir H5L_TYPE_HARD H5I_DATASET         2           NA            NA
## 4 wind_speed H5L_TYPE_HARD H5I_DATASET         2           NA            NA
##   dataset.rank dataset.dims dataset.maxdims dataset.type_class
##                                         Inf       H5T_COMPOUND
##                                         Inf       H5T_COMPOUND
##                        x 24       Inf x Inf        H5T_INTEGER
##                        x 24       Inf x Inf        H5T_INTEGER
##   dataset.space_class
