Direct numerical simulation (DNS) of a round jet in crossflow based on lattice Boltzmann method (LBM) is carried out on multi-GPU cluster. Data parallel SIMT (single instruction multiple thread) characteristic of GPU matches the parallelism of LBM well, which leads to the high efficiency of GPU on the LBM solver. With present GPU settings (6 Nvidia Tesla K20M), the present DNS simulation can be completed in several hours. A grid system of 1.5 × 10^{8} is adopted and largest jet Reynolds number reaches 3000. The jet-to-free-stream velocity ratio is set as 3.3. The jet is orthogonal to the mainstream flow direction. The validated code shows good agreement with experiments. Vortical structures of CRVP, shear-layer vortices and horseshoe vortices, are presented and analyzed based on velocity fields and vorticity distributions. Turbulent statistical quantities of Reynolds stress are also displayed. Coherent structures are revealed in a very fine resolution based on the second invariant of the velocity gradients.
1. Introduction
A jet in crossflow (JICF), also known as transverse jet, normally describes a jet of fluid which enters and interacts with the crossflow and the resulting flow field. JICF has wide applications in many engineering fields, such as gas turbine impingement cooling and film cooling, fuel injection in combustors, thrust vectoring system in turbojet propulsion, and reaction control for missiles. For a traditional JICF problem, near the flow field of the transverse jet, four types of vortical structures can be observed [1] due to the interaction between mainstream and jet as shown in Figure 1: (1) the counter-rotating vortex pair (CRVP), also known as kidney vortices; (2) the horseshoe vortices system; (3) the jet shear-layer vortices due to Kelvin-Helmholtz instabilities; (4) the wake vortices. The CRVP and the horseshoe vortices are normally defined by mean flow even though they may have unsteady part. The shear-layer vortices and the wake vortices are naturally unsteady.
Four vortical structures associated with the JICF as in [1].
Those vortical structures are very important to fluid flow and heat transfer behaviors of those fields where JICF is applied. If gas turbine film cooling is taken as an example, film coverage and the cooling effectiveness are closely relevant to those large structures, no matter whether they are defined as steady or naturally unsteady. The “steady” CRVP lifts the coolant up and mixes with the mainstream, weakening the film coverage and cooling effectiveness. The “unsteady” shear-layer vortices have different directions determined by the shape of the ejection hole and consequently either strengthen or suppress the CRVP [2]. To accurately predict those large structures in which unsteadiness and anisotropy are inherent, a time-resolved scheme is required in the flow field calculation with fine spatial discretization.
The flow field of JICF is characterized by both anisotropic large-scale structures which break down to smaller sizes. These resultant small scale structures are isotropic and dissipation is dominant. The typical energy spectrum for the JICF inherently requires capturing the flow structures across the spectrum of all scales and thus a time and space accurate calculation is needed. Direct numerical simulation (DNS) and large eddy simulation (LES) are commonly used to resolve the turbulent characteristics of JICF in large-scale parallel computation devices. However, inherent features of Navier-Stokes equation or transport equation make the parallel processing with low efficiency. Thus, a highly parallel scheme is extremely helpful to those large-scale simulations of turbulent flow field.
Lattice Boltzmann method (LBM) has been regarded as a promising candidate for years due to its major merits of fully parallel algorithm. Unlike conventional numerical schemes based on assumption and discretizations of macroscopic continuum, the LBM adopts microscopic models [3]. Other advantages of LBM include (1) easy treatment of boundary conditions and (2) easy programming. As a result, the LBM has its wide application in scientific and engineering research, such as Biofluid and porous medium [4–6]. However, to resolve the turbulent flow, LBM-based DNS and LES require a very high grid resolution and thus huge computation resource is needed.
Graphic Processing Unit (GPU) brings excellent hardware conditions for these simulations. With the development of computing platforms, for example, CUDA [7] and OpenCL, the use of GPU to accelerate nongraphic computations has drawn much more attention [8, 9]. Due to its high performance of floating-point arithmetic operation, together with wide memory bandwidth and good programmability, general-purpose GPU (GPGPU) has its huge advantage over CPU in turbulent flow simulation [10–12] and thus has been applied in fields like weather prediction, crystal growth process, and air flow in the city [13–15].
The feasibility for LBM-based DNS has some preliminary validation for turbulent flows between parallel plates [16]; further verification for JICF is still empty. In this paper, the lattice Boltzmann based DNS on JICF model is performed using D3Q19 model in multi-GPUs. With further validation of this code, this paper aims to resolve the JICF flow field, including the time-averaged and instantaneous velocities, vorticities, and turbulent quantities of Reynolds stress. The characteristic coherent hairpin structures are also revealed and discussed.
2. Lattice Boltzmann Equations
Lattice Boltzmann equation is the special form of the Boltzmann-BGK equation [17], in which discretization applies to velocity, time, and space. The Boltzmann equation for the discrete velocity distribution on a discrete lattice is as follows: (1)∂fi∂t+ei·∇fi=Ωi,where fi is the particle velocity distribution function and ei is the particle velocity in the ith direction. Here, the scheme of LBM is described as DmQn [18]. “Dm” means “m dimension” and “Qn” stands for “n lattice speeds.” For two-dimensional problems, D2Q9 model is most commonly used and means 2D and 9 speeds. For three-dimensional problems, several cubic models are used, such as D3Q13, D3Q15, D3Q19, and D3Q27 (n=13, 15, 19, or 27). Ωi is the collision operator. With Boltzmann-BGK approximation,(2)Ωi=-1τfi-fieq.Combine (1) and (2); then it changes to(3)∂fi∂t+ei·∇fi=-1τfi-fieq,where fieq is the local equilibrium and τ is the relaxation time. The equilibrium distribution form has to be correctly adopted to restore the Navier-Stokes equations. For 3D models with the 13-, 15-, 19-, and 27-speed lattice, an appropriate distribution function is written as(4)fieq=ρϖi1+3ei·u+92ei·u2-32u·u.
In the present paper, D3Q19 scheme is used with particle velocity as shown in Figure 2: e00,0,0, e11,0,0, e2-1,0,0, e30,1,0, e40,-1,0, e50,0,1, e60,0,-1, e71,1,0, e8-1,1,0, e91,-1,0, e10-1,-1,0, e111,0,1, e12-1,0,1, e131,0,-1, e14-1,0,-1, e150,1,1, e160,-1,1, e170,1,-1, and e180,-1,-1. Corresponding weighting factors are ϖ0=1/3, ϖ1~ϖ6=1/18, and ϖ7~ϖ18=1/36. If (3) is further discretized in both space x and time t, the discretized form can be written as(5)fix+eiδt,t+δt-fix,t=-1τfix,t-fieqx,t.The above equation is the LBGK model, in which τ=λ/δt is the nondimensional relaxation time. The viscosity in the macroscopic Navier-Stokes equation can be expressed as ν=τ-1/2Cs2δt. Equation (5) is usually solved in a standard way assuming δt=1 and divided into two steps as follows.
D3Q19 LBM model.
Collision step is as follows:(6)f~ix,t=fx,t-1τfx,t-fieqx,t;streaming step is as follows:(7)fix+ei,t+1=f~ix,t,where fi and f~i denote the pre- and postcollision states of the distribution function. Please note that in the collision step f~i is calculated absolutely locally and in streaming step fi is only relevant to its neighboring nodes. Thus, the LBM code itself is highly suitable for parallel processing and capable of obtaining high efficiency. Macroscopic quantities, such as ρ and, can be calculated as follows:(8)ρ=∑i=0Nfi=∑i=0Nfieq,ρu=∑i=0Nfiei=∑i=0Nfieqei.The lattice speed of sound is Cs=1/3, and pressure can be obtained from the state equation of the ideal gas:(9)p=ρCs2=ρ3.Considering boundaries of f~ in transverse direction (front and back boundaries), periodic boundary conditions are used in this paper. For solid walls, the following boundary conditions are applied [19]:(10)f~i=fieq+fineq=fieq+fi-neighborneq=fieq+f~i-nieghbor-fi-neighboreqin which fieq is calculated by macroscopic u and ρ on boundaries by (4). The nonequilibrium distribution on boundaries fineq is assumed to be of the same value as that in the neighboring inner node fi-neighborneq. f~i-neighbor and fi-neighboreq on the inner node are computed by (6) and (4), respectively.
3. GPU Settings
A well-known inherent advantage of lattice Boltzmann method is its parallelism. Previous study [20] compared the acceleration performance between Navier-Stokes solver and current LBM solver, in which incompressible flow around a cylinder was simulated on GPU (GeForce GTX280) and CPU (Xeon E5420, 2.5 GHz) separately. For the Navier-Stokes solver in which Red-Black scheme and multi-grids were applied, the GPU acceleration over CPU was 13.7. Comparatively, for LBM on the same grid scale, the acceleration was 87.4. In addition to the acceleration performance of LBM on GPU as mentioned, the difficulty of coding is much smaller for LBM than Navier-Stokes on GPU.
However, the expense to LBM’s parallelism is also obvious due to substantial variables and consequent huge memory consumption. For the D3Q19 model in single precision calculation, the theoretical GPU memory requirement on 1.5 × 10^{8} grids (1024 × 256 × 600) is over 30 G. Besides, data transfer between GPUs is realized by MPI (message passing interface) and CudaMemcpy() sentences in CUDA. From the previous experience [11], if the number of GPUs is less than 10, the 1D domain partitioning is more efficient than 2D and 3D. In the current research, 6 GPUs of Nvidia Tesla K20M are used and the current 1D partitioning of the GPUs is illustrated in Figure 3.
1D partitioning in z-direction.
4. Physical and Numerical Models
The physical model of the jet in cross flow problem is shown in Figure 4. The computational domain is hexahedral in shape. The mainstream flow is in x-direction, while the jet is orthogonal to the mainstream in z-direction and ejected from the bottom plane uniformly with diameter D. Origin of the coordinates is located at the center of the round jet exit. The flow domain is zero-gradient in pressure and no-slip boundary conditions are applied on the bottom plane excluding the jet exit. On transverse planes (with normal in y-direction), periodic boundary conditions are applied. The inlet turbulent profile of velocity is given through a calculation procedure similar to the so-called “rescaling process” as described in [21]. The results presented in this paper are based on flow parameters and boundary conditions listed as follows.
Ejecting hole diameter (D)=75.
Domain length (Lx/D=Lx1+Lx2/D=5+7)=12.
Domain span Ly/D=3.4.
Domain height Lz/D=6.4.
Free-stream turbulence intensity (TI)=3%.
Boundary layer height (δ/D)=0.2.
Velocity ratio (VR=Uj/U∞)=3.3.
Physical model of jet in crossflow.
Reynolds number Rej is defined based on jet velocity Uj and film hole diameter D as Rej=UjD/υ. The largest Reynolds number Rej reaches 3,000 on grids (1024 × 256 × 600), and several calculations are performed with different dimensions, grids, and jet locations trialed as shown in Table 1. The discussion is based on the 1st case in the table. The flow domain is kept in the “low-speed” range, and velocity magnitude is smaller than 0.3 times of the lattice sound speed Cs. Thus, the LBGK model used in current study is suitable [22].
Testing cases matrix.
Jet diameter (D)
Domain (Lx×Ly×Lz)
Jet location (Lx1)
Mesh
75
12D × 3.4D × 8D
2.6D
896 × 256 × 600
75
13.6D × 3.4D × 8D
4D
1024 × 256 × 600
100
10D × 5D × 8D
4D
1024 × 512 × 600
75
8.5D × 3.4D × 6.4D
2.6D
640 × 256 × 480
75
8.5D × 4D × 6.4D
2.6D
640 × 300 × 480
5. Model Validation
Nondimensional time-averaged streamwise velocity U+=U/uτ versus z+=zuτ/υ is presented in Figure 5. The location is selected before the jet at x=-2D and y=NY/2 to eliminate the influence from the jet. A turbulent boundary layer profile is obtained, consisting of laminar sublayer and log layer, which is consistent with the Von Karman mixing length theory.
Streamwise velocity boundary layer profile.
Further comparisons with experiments are made in regions where the mainstream is disturbed by the jet as shown in Figure 6. The experiments were done by Meyer et al. [23, 24] using stereoscopic PIV in a JICF problem. Time-averaged streamwise and spanwise velocities (U/U∞ and W/U∞) are presented in spanwise direction (with respect to z/D) at locations of x=0.5D, y=NY/2 and x=3D, y=NY/2. Results show good agreement with experimental data. In current simulation, a uniformly distributed velocity profile is set at the jet exit, while, in experiment, the jet is supplied through a tube and a turbulent profile has been developed inside it. Thus, an overestimation of the jet’s momentum near its boundary of the jet hole can be expected and seen in Figure 6.
Streamwise and spanwise velocities profiles after the jet.
6. Results and Analysis6.1. Energy Spectra
Fast Fourier transformation (FFT) is performed to a time series of turbulent energy Etur at the sample frequency of 2000 at x=6D, z=4D, and y=NY/2. The turbulent energy is defined as Etur=1/2u′2+v′2+w′2. The corresponding turbulent power spectrum is presented versus frequency as shown in Figure 7. The calculated turbulence decay rate is close to the theoretical canonical value of −5/3 and no obvious spectrum buildup is observed, which indicate the current mesh resolution is adequate.
Spectral power density of velocity fluctuations.
6.2. Velocity Field and Vorticity
The 2D time-averaged velocity (U,W) and instantaneous velocity (u,w) fields in the middle plane of y-direction (y=NY/2) are plotted in Figures 8(a), 8(b), and 8(c), respectively. The corresponding vorticity distributions based on the time-averaged and instantaneous velocities are also shown in the figure. In Figure 8(a), the vertical jet is bended toward exit by the mainstream flow and phenomenon like flow passing a solid obstacle is observed in the wake region. As shown in the enlarging plot of flow field, the approaching wall boundary layer meets with adverse pressure gradient before the jet and consequently separates and forms the horseshoe vortex. Considering the vorticity field, negative and positive values are observed in leading edge and trailing edge along the jet trajectory and trivial values exist in the wake region. Instantaneous velocity and vorticity fields at time steps 80000 and 120000 are shown in Figures 8(b) and 8(c), respectively. Generation and shedding of Kelvin-Helmholtz vortices are introduced by the shear layers between the mainstream and the jet on both leading edge and trailing edge. The shed vortices are mixing with the mainstream and dissipated in the downstream region quickly. In the wake region of the jet, wake vortices are observed through those nontrivial vorticity values; however, no vortex tube is directly observed in this cross section view. To compare the time-averaged fields with their instantaneous counterparts, the Kelvin-Helmholtz vortices are not available after averaging, which proves the shear-layer induced vortices are inherently unsteady and periodic.
Time-averaged and instantaneous 2D velocity vector and vorticity at y=NY/2 plane.
Time-averaged
Time step = 80000
Time step = 120000
The 2D instantaneous velocity (u,v) and time-averaged velocity (U,V) fields are displayed at locations =0.5D, 1D, and 2D, respectively, in Figure 9. The corresponding vorticity distributions are also displayed in the figure. In Figure 9(a), the instantaneous velocity vectors and vorticity distributions at time step 80000 are shown. Separation of mainstream occurs near the mid-chord portion. Further, vortex shedding and dissipation are observed in the downstream area. Wake region is formed at the backside of the jet. A characteristic flow pattern of the wake by the transverse jet is obtained which is significantly different from that by a solid cylinder as described in [1]. In Figures 9(b) and 9(c), the time-averaged velocity vectors and vorticity distributions are displayed. Similar to flow passing a solid cylinder, along the streamwise direction, positive and negative values appear when the mainstream starts to separate. In downstream region, the vorticity recovers to trivial values due to the dissipation of those periodically-shed vortices.
2D velocity vector and vorticity at planes of z=0.5D, 1D, and 2D.
z=0.5D
z=1D
z=2D
The 2D time-averaged velocity (V,W) fields are presented in several streamwise planes at locations x=1D, 1.5D, and 3D in Figures 10(a), 10(b), and 10(c), respectively, with vorticity distributions. As shown in Figures 10(a) and 10(b), the CRVP system consists of two layers, the upper one and the lower one. For a round hole, as used in the current study, the rotating directions of the two pairs are the same. This “two-deck” vortices structure is very similar to that observed in [2] by laser-induced fluorescence. By comparing among Figures 10(a), 10(b), and 10(c), it is found that when moving downstream, the vertical component (W) of the 2D velocity reduces because of the jet’s bending. At the same time, the lower layer vortices lift in vertical direction and weaken in strength. Finally, in far downstream region (x=3D), the W-component is at the same level of V-component since the jet is almost fully bended and those two layers fully emerge with strength further weakened.
Time-averaged 2D velocity vector and vorticity at planes of x=1D, 1.5D, and 3D.
x=1D
x=1.5D
x=3D
6.3. Turbulent Statistics
Reynolds stress distributions of u′w′¯, u′u′¯, and w′w′¯ are plotted in the midplane of y-direction (y=NY/2) in Figures 11(a), 11(b), and 11(c), respectively. The shear stress component of the Reynolds stress tensor is shown in Figure 11(a). Strong anisotropy of the flow field can be observed in the flow field. Negative values of u′w′¯ appear in the leading edge of the jet with positive values in the trailing portion. In the wake region, both positive and negative values exist with big nonuniformity. In Figures 11(b) and 11(c), the normal Reynolds stress is displayed. For both u′u′¯ and w′w′¯, the highest values appear in the leading and trailing portions of the jet due to large velocity gradients. Right behind the jet in the recirculation zone, both components maintain some nontrivial values, especially w′w′¯, indicating the wrapping motion around the jet by the fluid flow.
Reynolds stress distribution at y=NY/2 plane.
6.4. Coherent Structure
The isosurfaces of second invariant of velocity gradient Q at time steps 80,000 and 120,000 with domain height of NZ/2 are plotted in Figure 12. The definition of Q is given by (11) in tensor form:(11)Q=12WijWij-SijSij=-12∂uj∂xi∂ui∂xj,Wij=∂ui∂xj-∂uj∂xi,Sij=∂ui∂xj+∂uj∂xi,where Wij and Sij denote the rate of rotation and rate of shearing, respectively. The hair-pin-shaped coherent structures are originated from the side and trailing edge of the ejecting hole and wrapping around the jet. In the downstream region, the hair-pin structure is weakened and dissipated into the mainstream.
Isosurfaces of second invariant of velocity gradient.
Time step = 80000
Time step = 120000
7. Conclusion
A direct numerical simulation of jet in crossflow based on lattice Boltzmann method is performed on multi-GPUs. The current simulation takes about 10 hours based on largest grid of 1.5 × 10^{8}. Several points can be summarized and concluded as follows.
A turbulent boundary layer is observed before the jet (x=-2D, y=NY/2), consisting of laminar sublayer and turbulent core. After the jet at x=0.5D, y=NY/2 and x=3D, y=NY/2, time-averaged streamwise and spanwise velocity components (U/U∞ and W/U∞) are compared with experiments, and good agreement is obtained.
The turbulent energy spectrum is plotted versus frequency at x=6D, z=4D, and y=NY/2. A decay rate close to the theoretical value of −5/3 is observed.
2D velocity vectors and vorticity fields are plotted in y-, z-, and x-direction planes. Unsteady shear-layer vortices are formed and shed along the leading and trailing side of the jet trajectory. Horseshoe vortex is generated in the plate boundary layer before the jet. Two-deck structure of the CRVP is retained and the lower pair is observed to rise along the streamwise direction.
The normal components (u′u′¯ and w′w′¯) and shear component (u′w′¯) of the Reynolds stress are revealed in the y=NY/2 plane. Strong anisotropy is observed for the flow field due to the shear layer and wake.
Coherent structure represented by the second invariant of velocity gradient (Q) is plotted at different time steps. The characteristic hair-pin vortices are observed.
Conflict of Interests
The authors declare that there is no any conflict of interests regarding the publication of this paper.
Acknowledgments
The work is supported by 973 Project of China (2013CB035702), the National Natural Science Foundation of China (11302165 and 11402191), and Postdoctoral Science Foundation of China (2013M540741).
FricT. F.RoshkoA.Vortical structure in the wake of a transverse jetHavenB. A.KurosakaM.Kidney and anti-kidney vortices in crossflow jetsChenS. Y.DoolenG. D.Lattice Boltzmann method for fluid flowsHeY. L.WangY.LiQ.ChenL.KangQ.MuY.HeY.-L.TaoW.-Q.A critical review of the pseudopotential multiphase lattice Boltzmann model: methods and applicationsAidunC. K.ClausenJ. R.Lattice-Boltzmann method for complex flowsnVIDIANVIDIA CUDA Compute Unified Device Architecture Programming Guide Version 2.0nVIDIA, 2008BuckI.FoleyT.HornD.Brook for GPUs: stream computing on graphics hardwareKrügerJ.WestermannR.Linear algebra operators for GPU implementation of numerical algorithmsOgawaS.AokiT.GPU Computing for 2-dimensional incompressible-flow simulation based on multi-grid methodXianW.TakayukiA.Multi-GPU performance of incompressible flow computation by lattice Boltzmann method on GPU clusterRossinelliD.BergdorfM.CottetG.-H.KoumoutsakosP.GPU accelerated simulations of bluff body flows using vortex particle methodsShimokawabeT.AokiT.IshidaJ.KawanoK.MuroiC.145 TFlops performance on 3990 GPUs of TSUBAME 2.0 supercomputer for an operational weather prediction4Proceedings of the 11th International Conference on Computational Science (ICCS '11)June 20111535154410.1016/j.procs.2011.04.1662-s2.0-79958268442WangX.AokiT.High performance computation by multi-node GPU cluster-TSUBAME 2.0 on the air flow in an urban city using lattice Boltzmann methodMikiT.WangX.AokiT.ImaiY.IshikawaT.TakaseK.YamaguchiT.Patient-specific modelling of pulmonary airflow using GPU cluster for the application in medical practiceWangX.ShangguanY.OnoderaN.KobayashiH.AokiT.Direct numerical simulation and large eddy simulation on a turbulent wall-bounded flow using lattice boltzmann method and multiple GPUsBhatnagarP. L.GrossE. P.KrookM.A model for collision processes in gasesQianY. H.D'HumièresD.LallemandP.Lattice BGK models for navier-stokes equationGuoZ.-L.ZhengC.-G.ShiB.-C.Non-equilibrium extrapolation method for velocity and pressure boundary conditions in the lattice boltzmann methodXuD.ChenG.WangX.LiY.Direct numerical simulation of the wall-bounded turbulent flow by Lattice Boltzmann method based on multi-GPUsMuldoonF.AcharyaS.Direct numerical simulation of pulsed jets-in-crossflowGuoZ. L.ZhengC. G.MeyerK. E.ÖzcanO.LarsenP. S.WestergaardC. H.Steroscopic PIV measurements in a jet in crossflowProceedings of the 2nd International Symposium on Turbulence and Shear Flow PhenomenaJune 2001Stockholm, SwedenMeyerK. E.ÖzcanO.LarsenP. S.WestergaardC. H.Flow mapping of a jet in crossflow with stereoscopic PIV