
compute the Jacobian matrix with respect to the profiles of the 6 chemical species considered in Table 2 as well as those of temperature and LOS wind. They are defined for layers of 1.5 km thickness from 20 to 130 km (x = 75*8 parameters).

3.3 Radiative transfer with Zeeman effect and polarization

Table 7 shows the commands to include the Zeeman effect on the O2 line. The same setting as that in the previous section is used except for the frequency grid, which is reduced to 300 MHz and 1000 samples. The Zeeman lines of the O2 transition are determined with the method “setZabs” and the polarized radiative transfer is applied (method “zrad” instead of “rad”). The spectrum of the total radiance at 50 km is shown in Fig. 5. Differences with the non-Zeeman calculations (previous section) are only seen near the line center.

4 Performances

4.1 Benchmarks

Runtime improvements when using the GPU are illustrated with the benchmarks presented in Table 8. The computations for unpolarized radiation presented in Sect. 3.2 are considered. The absorption coefficient (150 altitudes, 10000 frequencies) includes 105 spectroscopic lines. The Voigt line shape is used and computed with the Hui algorithm taken from the model GARLIC [14]. The radiances are computed for 50 tangent heights. All calculations are done in double precision (FLT64) on a workstation equipped with 2 CPUs Intel(R) Xeon(R) Silver 4108 @ 1.80GHz (32 threads) and a GPU NVIDIA TITAN V.

Table 8 Runtimes for the unpolarized radiative transfer calculations in Sect. 3.2.

                                  CPU - 1 thread   CPU - 32 threads   GPU
Absorption coefficient TF / V1    36 s / 1.1 s     7 s / x            0.24 s / x
Radiative transfer TF / V1        25 s / 13 s      7.2 s / x          0.09 s / x

Poor performance is found for the single-thread CPU computation of the absorption coefficient with TF (36 s) compared with V1 (1.1 s). Our implementation of the Voigt function with TF has been identified as the bottleneck. This will be improved.
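The per-stage runtimes of Table 8 can be combined into full runtimes with a few lines of arithmetic. The sketch below only uses the values printed in the table; the dictionary layout and variable names are illustrative choices and are not part of AMATERASU.

```python
# Runtimes in seconds, copied from Table 8 (TensorFlow version).
tf_runtimes = {
    "cpu_1_thread": {"absorption": 36.0, "radiative_transfer": 25.0},
    "cpu_32_threads": {"absorption": 7.0, "radiative_transfer": 7.2},
    "gpu": {"absorption": 0.24, "radiative_transfer": 0.09},
}
# Table 8 gives V1 runtimes only in the single-thread CPU column.
v1_runtimes = {"absorption": 1.1, "radiative_transfer": 13.0}


def total(stages):
    """Full runtime: absorption coefficient plus radiative transfer."""
    return sum(stages.values())


totals = {name: total(stages) for name, stages in tf_runtimes.items()}
v1_total = total(v1_runtimes)

# Ratio of the 32-thread CPU runtime to the GPU runtime (TF version).
gpu_speedup_vs_32_threads = totals["cpu_32_threads"] / totals["gpu"]

if __name__ == "__main__":
    for name, seconds in totals.items():
        print(f"TF {name}: {seconds:.2f} s")
    print(f"V1 (1 thread): {v1_total:.2f} s")
    print(f"GPU speed-up vs 32 threads: {gpu_speedup_vs_32_threads:.0f}x")
```

With these values the full TF runtime is 14.2 s on 32 CPU threads and 0.33 s on the GPU, against 14.1 s for V1 on a single thread.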
When using the CPU with 32 threads, the full TF runtime (absorption coefficient and radiative transfer) is similar to that with V1 (14 s), but it takes less than 0.35 s on the GPU. The Jacobian computations (Table 6) take about 50 s with both AMATERASU-TF (GPU) and AMATERASU-V1 (not shown). The runtimes are similar even though the perturbation method used in the TF version is much less computationally efficient than the analytical one in V1. The GPU runtimes for the absorption coefficients presented in Sect. 3.3 (150 altitudes, 1000 frequencies) are 44.1 ms and 11 ms for the non-Zeeman transitions (104 lines) and for the O2 transition (25 Zeeman lines, 7 coefficients), respectively. The polarized radiative transfer is computed in 2.4 s (4 Stokes parameters), which is about 300 times slower than the calculation done without polarization (7 ms). This should be improved.

4.2 TensorFlow

Though it is developed for implementing machine learning codes, TF can be used for other applications where large arrays are computed and handled. The model has been implemented in a relatively short time, as we could expect from using a high-level language, allowing us to proceed smoothly in our studies. This must be underlined since we had very little knowledge of TF when we started and, likely, we can still improve the way the model has been implemented. Using the GPU clearly helps us to get satisfactory computational performance. The overall result is very positive.

We have chosen TF because it was the most used library when we started this project and a large user community provides support on the internet. Other similar tools (e.g., https://pytorch.org/ or https://cupy.chainer.org/ ) are likely as valuable choices as TF, and moving to another framework is still under investigation.
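To illustrate why the perturbation method mentioned in Sect. 4.1 is costly, a minimal NumPy sketch is given here. It is not the AMATERASU-TF code: the forward model is a hypothetical toy function, and the forward-difference scheme and the step size are assumptions, since the text does not specify them. The point is that one extra forward-model evaluation is needed per state parameter, so a state vector of 75*8 = 600 parameters would require 601 forward calculations, whereas an analytical Jacobian is obtained alongside a single one.

```python
import numpy as np


def forward_model(x):
    """Hypothetical toy model standing in for a radiative transfer code."""
    # Three "measurements" depending nonlinearly on two state parameters.
    return np.array([x[0] * x[1], np.exp(-x[0]), x[1] ** 2])


def analytical_jacobian(x):
    """Exact derivatives of the toy model, used as the reference."""
    return np.array([
        [x[1], x[0]],
        [-np.exp(-x[0]), 0.0],
        [0.0, 2.0 * x[1]],
    ])


def perturbation_jacobian(f, x, rel_step=1e-6):
    """Forward-difference Jacobian K[i, j] = d f_i / d x_j."""
    x = np.asarray(x, dtype=np.float64)
    y0 = f(x)  # one unperturbed evaluation
    K = np.empty((y0.size, x.size))
    for j in range(x.size):  # one perturbed evaluation per parameter
        step = rel_step * max(abs(x[j]), 1.0)
        x_pert = x.copy()
        x_pert[j] += step
        K[:, j] = (f(x_pert) - y0) / step
    return K


x0 = np.array([0.5, 2.0])
K_fd = perturbation_jacobian(forward_model, x0)
K_exact = analytical_jacobian(x0)

if __name__ == "__main__":
    print("max abs difference:", np.max(np.abs(K_fd - K_exact)))
```

For this toy model the two Jacobians agree closely, but the perturbation version calls the forward model once per parameter plus once for the reference state.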
The main appealing aspects of TF (version 1.12) are:
- The same code can be executed on GPUs and/or CPUs
- It is implemented on top of NumPy and provides a relatively comprehensive linear algebra library
- The Eager mode allows us to use TF dynamically together with AMATERASU-v1
- It is portable to the main operating systems (AMATERASU-v1/TF tested on Linux, macOS and Windows 10)
- Automatic differentiation tools are available

We must also deal with some issues, some of which may be resolved in future versions of TF:
- Assignment such as “tensor[i] = …” is not directly possible with a TF constant under Eager mode.
- Selecting a tensor element (e.g., “y = tensor[k]” or similar functions such as gather or slice) has poor perfor-

114   Research Report of the National Institute of Information and Communications Technology, Vol. 65 No. 1 (2019)   4 Observation of the Earth's Environment from Space by Satellite Sensors
