Getting Started with TensorFlow: Exploring tfrecord and TFRecordDataset

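Updated 2024-08-31

"Getting started with TensorFlow: using tfrecord and tf.data.TFRecordDataset"

In TensorFlow, `tfrecord` is an efficient storage format for large volumes of raw data, and it is especially suited to training machine learning and deep learning models. Data is serialized into records, which makes it easy to move between computing environments and to read back. `tf.data.TFRecordDataset` is the TensorFlow interface that reads `tfrecord` files and turns them into a data stream a model can train on.

1. Creating a tfrecord

Creating a `tfrecord` file means converting the data into a specific structure and then writing it to a file. Values must first be converted to one of the supported types: byte strings (`tf.train.BytesList`), integers (`tf.train.Int64List`), or floats (`tf.train.FloatList`). A multi-dimensional array is usually converted to a byte string with `tobytes()` (`tostring()` is the older, deprecated NumPy name for the same method), and its shape is stored alongside it because the conversion discards the shape. In the example below, `feature` is stored as a byte string and `label` as a float list.

```python
def get_tfrecords_example(feature, label):
    tfrecords_features = {}
    feat_shape = feature.shape
    tfrecords_features['feature'] = tf.train.Feature(bytes_list=tf.train.BytesList(value=[feature.tobytes()]))
    tfrecords_features['shape'] = tf.train.Feature(int64_list=tf.train.Int64List(value=list(feat_shape)))
    tfrecords_features['label'] = tf.train.Feature(float_list=tf.train.FloatList(value=[label]))
    return tf.train.Example(features=tf.train.Features(feature=tfrecords_features))
```

Here `tfrecords_features` is a dict holding the `tf.train.Feature` objects for `feature`, `shape`, and `label`, which together make up a `tf.train.Example`.

2. Building tf.train.Example

`tf.train.Example` is the basic unit stored in a `tfrecord`. It holds a set of key-value pairs in which every value is a `tf.train.Feature`. The `get_tfrecords_example` function above returns a `tf.train.Example`; calling its `SerializeToString()` method produces the byte string that is written to the `tfrecord` file.

3. Writing the tfrecord file

Once a `tf.train.Example` has been serialized, it can be written to a `tfrecord` file with `tf.io.TFRecordWriter`. For example:

```python
# `examples` is any iterable of objects with `feature` and `label` attributes
with tf.io.TFRecordWriter('data.tfrecord') as writer:
    for example in examples:
        serialized_example = get_tfrecords_example(example.feature, example.label).SerializeToString()
        writer.write(serialized_example)
```

4. Using tf.data.TFRecordDataset

To read a `tfrecord` file, use `tf.data.TFRecordDataset`. This class provides an efficient interface for parsing the records in a `tfrecord` file into a data stream for model training:

```python
def parse_function(example_proto):
    feature_description = {
        'feature': tf.io.FixedLenFeature([], tf.string),
        'shape': tf.io.VarLenFeature(tf.int64),  # number of dimensions depends on the array
        'label': tf.io.FixedLenFeature([], tf.float32),
    }
    parsed_example = tf.io.parse_single_example(example_proto, feature_description)
    # Restore the feature to its original shape (assumes it was stored as float32)
    shape = tf.sparse.to_dense(parsed_example['shape'])
    feature = tf.reshape(tf.io.decode_raw(parsed_example['feature'], tf.float32), shape)
    return feature, parsed_example['label']

batch_size = 32  # example value; set the batch size as needed
dataset = tf.data.TFRecordDataset('data.tfrecord')
dataset = dataset.map(parse_function)
dataset = dataset.batch(batch_size)
dataset = dataset.prefetch(1)  # prefetch data to improve throughput
```

In this example, `parse_function` parses each `tf.train.Example` and restores the feature and label to their original form. The resulting `dataset` can then be fed to a model for training. In summary, `tfrecord` and `tf.data.TFRecordDataset` are key parts of data preprocessing and the input pipeline in TensorFlow: they make it practical to store and efficiently process large-scale data, which helps training efficiency.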

Getting started with TensorFlow: using tfrecord and tf.data.TFRecordDataset

Today I am sharing an article on using tfrecord and tf.data.TFRecordDataset in TensorFlow. It should be a useful reference, and I hope it helps. Let's walk through it.

1. Creating a tfrecord

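A tfrecord can store three kinds of values: string (bytes), int64, and float32. Each is passed as a list through tf.train.BytesList, tf.train.Int64List, or tf.train.FloatList respectively and wrapped in a tf.train.Feature, as shown below: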

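tf.train.Feature(bytes_list=tf.train.BytesList(value=[feature.tobytes()])) # feature is usually a multi-dimensional array; serialize it to bytes first (tobytes() replaces the deprecated tostring())
tf.train.Feature(int64_list=tf.train.Int64List(value=list(feature.shape))) # serializing to bytes discards the array's shape, so store the shape too
tf.train.Feature(float_list=tf.train.FloatList(value=[label])) # a scalar label is wrapped in a list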
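The three wrappers above can be tried in isolation. The following is a minimal sketch, not code from the original article: the helper names `bytes_feature`, `int64_feature`, and `float_feature` and the sample array are assumptions made for illustration. It also shows why the shape has to be stored: the bytes alone only recover a flat buffer.

```python
import numpy as np
import tensorflow as tf


def bytes_feature(value):
    """Wrap one bytes object in a BytesList feature (hypothetical helper)."""
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


def int64_feature(values):
    """Wrap a sequence of integers in an Int64List feature (hypothetical helper)."""
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[int(v) for v in values]))


def float_feature(values):
    """Wrap a sequence of floats in a FloatList feature (hypothetical helper)."""
    return tf.train.Feature(float_list=tf.train.FloatList(value=[float(v) for v in values]))


# Hypothetical sample: a 2x3 float32 array and a scalar label.
feature = np.arange(6, dtype=np.float32).reshape(2, 3)
label = 1.0

raw = bytes_feature(feature.tobytes())
shape = int64_feature(feature.shape)
lab = float_feature([label])

# The byte string carries no shape, so the stored shape is needed to rebuild the array.
restored = np.frombuffer(raw.bytes_list.value[0], dtype=np.float32).reshape(
    list(shape.int64_list.value))
```

Note that the dtype (`float32` here) is not stored either; the reader has to know it or it has to be saved as a separate feature.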

Using the operations above, collect the values to be written in a dict, build a tf.train.Features from it, and then build a tf.train.Example, as follows:

def get_tfrecords_example(feature, label):
    tfrecords_features = {}
    feat_shape = feature.shape
    tfrecords_features['feature'] = tf.train.Feature(bytes_list=tf.train.BytesList(value=[feature.tobytes()]))
    tfrecords_features['shape'] = tf.train.Feature(int64_list=tf.train.Int64List(value=list(feat_shape)))
    # label is already a sequence here (e.g. a one-hot vector), so it is passed directly
    tfrecords_features['label'] = tf.train.Feature(float_list=tf.train.FloatList(value=label))
    return tf.train.Example(features=tf.train.Features(feature=tfrecords_features))
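To check what such an Example actually contains, it can be serialized and decoded again without touching the file system. This is a small sketch under stated assumptions: the 784-element zero array and the one-hot list are made-up stand-ins for one MNIST sample, and `tf.train.Example.FromString` is the standard protobuf decoding method rather than anything used in the original article.

```python
import numpy as np
import tensorflow as tf


def get_tfrecords_example(feature, label):
    """Same structure as the function above: raw bytes, shape, and label."""
    tfrecords_features = {
        'feature': tf.train.Feature(bytes_list=tf.train.BytesList(value=[feature.tobytes()])),
        'shape': tf.train.Feature(int64_list=tf.train.Int64List(value=list(feature.shape))),
        'label': tf.train.Feature(float_list=tf.train.FloatList(value=label)),
    }
    return tf.train.Example(features=tf.train.Features(feature=tfrecords_features))


# Hypothetical sample: a flattened 28x28 image and a one-hot label for class 3.
feature = np.zeros(784, dtype=np.float32)
label = [0.0] * 10
label[3] = 1.0

# Round trip: Example -> bytes -> Example.
serialized = get_tfrecords_example(feature, label).SerializeToString()
decoded = tf.train.Example.FromString(serialized)

feat_map = decoded.features.feature
keys = sorted(feat_map.keys())
stored_shape = list(feat_map['shape'].int64_list.value)
stored_label = list(feat_map['label'].float_list.value)
```

Inspecting `keys`, `stored_shape`, and `stored_label` this way is a quick sanity check before writing thousands of records with a wrong schema.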

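Serialize the tf.train.Example, and it can then be written to a tfrecord file with tf.io.TFRecordWriter (named tf.python_io.TFRecordWriter in older TensorFlow 1.x code), as follows: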

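tfrecord_wrt = tf.io.TFRecordWriter('xxx.tfrecord') # create the tfrecord writer; the output file is xxx.tfrecord (older TF 1.x code uses tf.python_io.TFRecordWriter)
exmp = get_tfrecords_example(feats[inx], labels[inx]) # pack one sample into an Example
exmp_serial = exmp.SerializeToString() # serialize the Example
tfrecord_wrt.write(exmp_serial) # write it to the tfrecord file
tfrecord_wrt.close() # close the writer once everything is written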
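The writer can also be used as a context manager, which closes the file even if the loop raises. The sketch below is illustrative only: it assumes TensorFlow 2.x with eager execution, and the `index_example` helper, its one-field schema, and the temporary file path are hypothetical.

```python
import os
import tempfile

import tensorflow as tf


def index_example(i):
    """Build a tiny Example holding a single int64 feature (hypothetical schema)."""
    feature = {'index': tf.train.Feature(int64_list=tf.train.Int64List(value=[i]))}
    return tf.train.Example(features=tf.train.Features(feature=feature))


path = os.path.join(tempfile.mkdtemp(), 'demo.tfrecord')

# The with-block closes the writer even if an exception interrupts the loop.
with tf.io.TFRecordWriter(path) as writer:
    for i in range(3):
        writer.write(index_example(i).SerializeToString())

# Read the raw records back (eager mode) and decode each one to verify the file.
indices = [
    tf.train.Example.FromString(raw.numpy()).features.feature['index'].int64_list.value[0]
    for raw in tf.data.TFRecordDataset(path)
]
```

Records in a single file come back in the order they were written, so `indices` mirrors the write loop.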

Complete code:

import tensorflow as tf
# Requires TensorFlow 1.x: the tensorflow.contrib package was removed in TensorFlow 2.x
from tensorflow.contrib.learn.python.learn.datasets.mnist import read_data_sets

mnist = read_data_sets("MNIST_data/", one_hot=True)

# Pack one sample into an Example
def get_tfrecords_example(feature, label):
    tfrecords_features = {}
    feat_shape = feature.shape
    tfrecords_features['feature'] = tf.train.Feature(bytes_list=tf.train.BytesList(value=[feature.tobytes()]))
    tfrecords_features['shape'] = tf.train.Feature(int64_list=tf.train.Int64List(value=list(feat_shape)))
    # label is a one-hot vector, so it is passed directly as the value list
    tfrecords_features['label'] = tf.train.Feature(float_list=tf.train.FloatList(value=label))
    return tf.train.Example(features=tf.train.Features(feature=tfrecords_features))

# Write all samples to a tfrecord file
def make_tfrecord(data, outf_nm='mnist-train'):
    feats, labels = data
    outf_nm += '.tfrecord'
    # tf.python_io.TFRecordWriter is the older TF 1.x name for the same class
    tfrecord_wrt = tf.io.TFRecordWriter(outf_nm)
    ndatas = len(labels)
    for inx in range(ndatas):
        exmp = get_tfrecords_example(feats[inx], labels[inx])
        exmp_serial = exmp.SerializeToString()
        tfrecord_wrt.write(exmp_serial)
    tfrecord_wrt.close()

import random

nDatas = len(mnist.train.labels)
# random.shuffle needs a mutable sequence; a bare range object fails in Python 3
inx_lst = list(range(nDatas))
random.shuffle(inx_lst)
ntrains = int(0.85*nDatas)

# make training set (first 85% of the shuffled indices)
data = ([mnist.train.images[i] for i in inx_lst[:ntrains]],
        [mnist.train.labels[i] for i in inx_lst[:ntrains]])
make_tfrecord(data, outf_nm='mnist-train')

# make validation set (remaining 15%)
data = ([mnist.train.images[i] for i in inx_lst[ntrains:]],
        [mnist.train.labels[i] for i in inx_lst[ntrains:]])
make_tfrecord(data, outf_nm='mnist-val')

# make test set
data = (mnist.test.images, mnist.test.labels)
make_tfrecord(data, outf_nm='mnist-test')
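The listing above stops at writing the files. As a rough sketch of the reading side, the code below writes a small file with the same three keys and reads it back through `tf.data.TFRecordDataset`. It is not the article's own reader: it assumes TensorFlow 2.x with eager execution, uses synthetic random arrays in place of MNIST, and the names `parse_example`, `images`, and `labels` as well as the batch size of 4 are made up for illustration.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


def get_tfrecords_example(feature, label):
    """Same schema as the listing above: raw bytes, shape, and one-hot label."""
    feats = {
        'feature': tf.train.Feature(bytes_list=tf.train.BytesList(value=[feature.tobytes()])),
        'shape': tf.train.Feature(int64_list=tf.train.Int64List(value=list(feature.shape))),
        'label': tf.train.Feature(float_list=tf.train.FloatList(value=label.tolist())),
    }
    return tf.train.Example(features=tf.train.Features(feature=feats))


# Synthetic stand-in for MNIST: 8 flattened 28x28 float32 images with one-hot labels.
rng = np.random.default_rng(0)
images = rng.random((8, 784), dtype=np.float32)
labels = np.eye(10, dtype=np.float32)[rng.integers(0, 10, size=8)]

path = os.path.join(tempfile.mkdtemp(), 'synthetic.tfrecord')
with tf.io.TFRecordWriter(path) as writer:
    for img, lab in zip(images, labels):
        writer.write(get_tfrecords_example(img, lab).SerializeToString())


def parse_example(serialized):
    """Decode one record; assumes float32 pixels and a 10-class one-hot label."""
    spec = {
        'feature': tf.io.FixedLenFeature([], tf.string),
        'shape': tf.io.VarLenFeature(tf.int64),  # rank is not fixed by the schema
        'label': tf.io.FixedLenFeature([10], tf.float32),
    }
    parsed = tf.io.parse_single_example(serialized, spec)
    shape = tf.sparse.to_dense(parsed['shape'])
    feature = tf.reshape(tf.io.decode_raw(parsed['feature'], tf.float32), shape)
    return feature, parsed['label']


dataset = (
    tf.data.TFRecordDataset(path)
    .map(parse_example)
    .batch(4)
    .prefetch(1)
)

# Materialize the batches as NumPy arrays to inspect them.
batches = [(x.numpy(), y.numpy()) for x, y in dataset]
```

For training, a `shuffle` call before `batch` is the usual addition; it is left out here so that the records come back in write order and can be compared with the inputs.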


