Overall
=======
This is a fork of Hadoop 3.3.0 which implements zonefs support in HDFS as a
research project. Zonefs is a simple filesystem which exposes the zones of a
zoned storage device as files. For details on zoned storage and zonefs, please
visit zonedstorage.io:

https://zonedstorage.io/
https://zonedstorage.io/docs/linux/fs#zonefs

For information about Hadoop and HDFS, please visit the Hadoop web site and
wiki:

http://hadoop.apache.org/
https://cwiki.apache.org/confluence/display/HADOOP/

This implementation allows Hadoop and HDFS applications to run on zonefs set up
on top of zoned storage. Two benchmark applications, DFSIO and TeraSort, were
confirmed to work on zonefs.
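For reference, these benchmarks can be run with the stock Hadoop example jars,
for instance as sketched below (jar versions, file counts, and sizes are
illustrative, not the exact parameters used in this project):

   # DFSIO write test: 16 files of 1 GB each
   hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.3.0-tests.jar \
       TestDFSIO -write -nrFiles 16 -size 1GB

   # TeraSort: generate 10 million 100-byte records, then sort them
   hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.0.jar \
       teragen 10000000 /tera/in
   hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.0.jar \
       terasort /tera/in /tera/out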
This implementation was developed and verified with Fedora 35 on the Hadoop
nodes. SMR HDDs attached to the DataNodes were used for the verification.

How to set up zonefs for Hadoop HDFS
====================================
1. Follow the Hadoop user guide on the Hadoop web site to set up the Hadoop
   and HDFS nodes. HDFS DataNodes shall run a Linux-based OS which supports
   zoned block devices and zonefs; Fedora 35 is the recommended OS. DataNodes
   also require the libaio runtime library, so install the libaio package on
   each DataNode, for example:
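      # On each DataNode (Fedora; package names may differ on other distros):
      sudo dnf install libaio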
2. Follow BUILDING.txt to build this Hadoop fork, then deploy it to the Hadoop
   nodes. Building this fork requires libaio.h and libaio.so; to build on
   Fedora, install the libaio-devel package in your build environment, for
   example:
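      # In the build environment (Fedora):
      sudo dnf install libaio-devel
      # Native binary distribution build as described in BUILDING.txt
      # (exact profiles and flags may vary with your environment):
      mvn package -Pdist,native -DskipTests -Dtar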
3. Set up zoned storage on the DataNodes and create zonefs instances on it.
   Format and mount zonefs on all DataNodes in the same manner, for example:
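      # Format the zoned block device with mkzonefs (from zonefs-tools) and
      # mount it; /dev/sda and /zonefs0 are illustrative, match your devices:
      sudo mkzonefs /dev/sda
      sudo mkdir -p /zonefs0
      sudo mount -t zonefs /dev/sda /zonefs0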
4. Modify hdfs-site.xml and set the parameters below:
   - dfs.blocksize:
     Default size of the HDFS block files. Set this to the zone size of the
     zoned storage on the DataNodes (the zone size can be checked as in the
     sketch after this list).
   - dfs.datanode.zonefs.dir:
     Determines where the zonefs instances are mounted. Multiple mount points
     can be listed as comma-separated values.
   - dfs.datanode.zonefs.meta.dir:
     Determines where the zonefs metadata is saved. The metadata keeps the
     valid data size and the sector remainder of each zone.
   - dfs.datanode.zonefs.buffer.size:
     Specifies the buffer size used for each data write to a ZoneFs file. Its
     default value is 0; when 0 is set, the value of the sysfs max_sectors_kb
     attribute is read and used as the buffer size.
   - dfs.datanode.zonefs.max.io:
     Defines the limit on the number of I/Os, or the queue depth, which can be
     queued and executed in parallel. Its default value is 32.
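   The zone size and max_sectors_kb can be read from sysfs on each DataNode,
   for example (/dev/sda is illustrative):

      # Zone size in 512-byte sectors; e.g. 524288 sectors = 256 MiB
      cat /sys/block/sda/queue/chunk_sectors
      # Maximum I/O size in KiB, used when dfs.datanode.zonefs.buffer.size is 0
      cat /sys/block/sda/queue/max_sectors_kb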
5. Start HDFS and confirm that the DataNodes recognize the zonefs instances.
   In the DataNode log, you will see a ZoneFs message like this:
      INFO org.apache.hadoop.hdfs.server.datanode.ZoneFs: ZoneFs device: \
          root=/zonefs0, dev=/dev/sda, max I/O bytes=524288, align bytes=4096
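   One way to check is to grep the DataNode log; the log file name follows the
   standard Hadoop convention and depends on your HADOOP_LOG_DIR, user, and
   host names:

      grep ZoneFs $HADOOP_LOG_DIR/hadoop-*-datanode-*.log
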
Example of hdfs-site.xml
========================
This is an example of the parameter settings for zonefs. In this example, the
zone size of the zoned storage is 256 MiB, so this value is set to
dfs.blocksize. Each DataNode has two zonefs instances mounted at /zonefs0 and
/zonefs1. The zonefs metadata is kept in the /opt/hadoop/zonefs_meta directory.
Non-default values are also set for the zonefs I/O buffer size and the maximum
I/O limit.

<property>
  <name>dfs.blocksize</name>
  <value>256m</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/opt/hadoop/home/data</value>
</property>
<property>
  <name>dfs.datanode.zonefs.dir</name>
  <value>/zonefs0,/zonefs1</value>
</property>
<property>
  <name>dfs.datanode.zonefs.meta.dir</name>
  <value>/opt/hadoop/zonefs_meta</value>
</property>
<property>
  <name>dfs.datanode.zonefs.buffer.size</name>
  <value>524288</value>
</property>
<property>
  <name>dfs.datanode.zonefs.max.io</name>
  <value>64</value>
</property>

Remaining work
==============
1. Conventional zone support
   Zonefs has two sub-directories: "cnv" for conventional zones and "seq" for
   sequential-write-required zones. The current HDFS implementation supports
   only zonefs files in the "seq" sub-directory. Further work is required to
   use zonefs files in the "cnv" sub-directory and conventional zones.
2. Free space accounting
   Free space accounting for zonefs instances needs a zonefs-unique
   implementation. The current implementation lacks it, so the "hdfs dfs -df"
   command does not report valid numbers.
3. Two ZoneFsFileIoProvider methods not yet implemented
   The ZoneFsFileIoProvider class extends the FileIoProvider class and hides
   the zonefs-unique file operations. The ZoneFsFileIoProvider does not yet
   implement two override methods, getRandomAccessFile() and
   nativeCopyFileUnbuffered(). Zonefs-unique implementations of these methods
   are not required for the HDFS data I/O benchmarks on zonefs, but for
   completeness the methods will need to be revisited and implemented.