EMsolver
The program solves Jefimenko's equations for given charge and current densities; the objective is to obtain the electromagnetic fields produced by varying charge and current distributions.
For the background and principles behind the program, refer to the article JefiGPU: Jefimenko's Equations on GPU.
Advantages of the program
- Jefimenko's equations are evaluated with CUDA, enabling fast parallel computation.
Structure of the program
- Class EBsolver()
- init()
- update_rho_J()
- Jefimenko_solver()
- Class EMsolver()
- init()
- terminate()
- Jefimenko_solver()
- acquire_EB_field()
- areal_position()
- generate_rho_J()
- real_position()
- copy_new_rho_J_to_GPU()
- retarded_time_index()
- r_distance()
The overall logic of the program can be described as follows:
EMsolver()
Main class for solving E and B from Jefimenko's equations. The supplied rho and J are always stored as data on the class instance (self).
The Params:
len_time_snapshots: int, the maximum number of time snapshots that can be stored in GPU memory
x_grid_size_o, y_grid_size_o, z_grid_size_o: number of grid points in the observation region
x_grid_size_s, y_grid_size_s, z_grid_size_s: number of grid points in the source region
dx_o, dy_o, dz_o: spatial grid spacings in the observation region
dx_s, dy_s, dz_s: spatial grid spacings in the source region
x_left_boundary_o, y_left_boundary_o, z_left_boundary_o: the left boundaries of the observation region
x_left_boundary_s, y_left_boundary_s, z_left_boundary_s: the left boundaries of the source region
dt: time step
epsilon0: the numerical value of epsilon0 used in FU (Flexible Unit)
c: the numerical value of c used in FU (Flexible Unit)
import math
import numpy as np
import cupy
from numba import cuda

class EMsolver():
def __init__(self, \
len_time_snapshots, \
x_grid_size_o, y_grid_size_o, z_grid_size_o, \
x_grid_size_s, y_grid_size_s, z_grid_size_s, \
dx_o, dy_o, dz_o, x_left_boundary_o, y_left_boundary_o, z_left_boundary_o, \
dx_s, dy_s, dz_s, x_left_boundary_s, y_left_boundary_s, z_left_boundary_s, \
dt, epsilon0, c):
# save the variables in the class
self.len_time_snapshots = len_time_snapshots
self.dx_o, self.dy_o, self.dz_o = dx_o, dy_o, dz_o
self.dx_s, self.dy_s, self.dz_s = dx_s, dy_s, dz_s
self.x_left_boundary_o, self.y_left_boundary_o, self.z_left_boundary_o = x_left_boundary_o, y_left_boundary_o, z_left_boundary_o
self.x_left_boundary_s, self.y_left_boundary_s, self.z_left_boundary_s = x_left_boundary_s, y_left_boundary_s, z_left_boundary_s
self.x_grid_size_o, self.y_grid_size_o, self.z_grid_size_o = x_grid_size_o, y_grid_size_o, z_grid_size_o
self.x_grid_size_s, self.y_grid_size_s, self.z_grid_size_s = x_grid_size_s, y_grid_size_s, z_grid_size_s
self.dt = dt
self.epsilon0 = epsilon0
self.c = c
# total grid numbers and total time ticks in GPU
self.total_grid_o = x_grid_size_o*y_grid_size_o*z_grid_size_o
self.total_grid_s = x_grid_size_s*y_grid_size_s*z_grid_size_s
self.all_grids = self.total_grid_o*self.total_grid_s
# Configure the blocks
self.threadsperblock = 32
# configure the grids
self.blockspergrid_o = (self.total_grid_o + (self.threadsperblock - 1)) // self.threadsperblock
self.blockspergrid_s = (self.total_grid_s + (self.threadsperblock - 1)) // self.threadsperblock
self.blockspergrid_all = (self.all_grids + (self.threadsperblock - 1)) // self.threadsperblock
# define device array of rho and J with zero initial values of the source region
self.rho_GPU, self.Jx_GPU, self.Jy_GPU, self.Jz_GPU = \
(cupy.zeros([len_time_snapshots, self.total_grid_s], dtype=np.float64) for _ in range(4))
# coefficient for E and B when doing integration in the source region
self.coeff_E = 1/(4*math.pi*epsilon0)*dx_s*dy_s*dz_s
self.coeff_B = -1/(4*math.pi*epsilon0*c**2)*dx_s*dy_s*dz_s
# store the sequences of the time steps chosen
self.dt_series = cupy.zeros([len_time_snapshots])
self.dt_list = []
# time tick
self.time_tick = 0
# accumulated time steps from the current dt till the previous
self.accumu_dt_series = cupy.zeros([len_time_snapshots])
self.accumu_dt_index = cupy.zeros([len_time_snapshots], dtype=np.int64)
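A usage sketch for constructing the solver (all numerical values below are illustrative assumptions, not defaults; epsilon0 = c = 1.0 mimics a natural choice in the FU system):

solver = EMsolver(len_time_snapshots=100,
                  x_grid_size_o=16, y_grid_size_o=16, z_grid_size_o=16,
                  x_grid_size_s=16, y_grid_size_s=16, z_grid_size_s=16,
                  dx_o=0.1, dy_o=0.1, dz_o=0.1,
                  x_left_boundary_o=-0.8, y_left_boundary_o=-0.8, z_left_boundary_o=-0.8,
                  dx_s=0.1, dy_s=0.1, dz_s=0.1,
                  x_left_boundary_s=-0.8, y_left_boundary_s=-0.8, z_left_boundary_s=-0.8,
                  dt=0.01, epsilon0=1.0, c=1.0)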
update_rho_J()
The function computes the time index for the current iteration and saves the current charge and current densities during the time iteration process. It relies on two sub-functions.
def update_rho_J(self, rho, Jx, Jy, Jz):
# i_time: int, index at which to copy the data
# i_round: i_step = i_rounds*len_time_snapshots + i_time
self.i_round, self.i_time = time_index_in_GPU(self.time_tick, self.len_time_snapshots)
self.t = self.time_tick*self.dt
self.time_tick += 1
# send these data to GPU
rho, Jx, Jy, Jz = (cupy.asarray(i) for i in [rho, Jx, Jy, Jz])
# copy newly calculated rho and J to GPU
copy_new_rho_J_to_GPU[self.blockspergrid_s, self.threadsperblock](self.i_time, rho, Jx, Jy, Jz, \
self.rho_GPU, self.Jx_GPU, self.Jy_GPU, self.Jz_GPU, \
self.total_grid_s)
- time_index_in_GPU(): computes the wrap-around time index so that the iteration never exceeds the maximum number of stored time steps;
def time_index_in_GPU(i_time, len_time_snapshots):
    return (i_time//len_time_snapshots, i_time%len_time_snapshots)
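For example, with len_time_snapshots = 100 the time tick wraps around the GPU buffer:

time_index_in_GPU(105, 100)   # returns (1, 5): round 1, row 5 of the buffer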
- copy_new_rho_J_to_GPU(): records the current and charge densities at the given time index;
@cuda.jit
def copy_new_rho_J_to_GPU(i_time, rho, Jx, Jy, Jz, \
                          rho_GPU, Jx_GPU, Jy_GPU, Jz_GPU, \
                          total_grid_s):
    i = cuda.grid(1)
    # loop through each spatial grid
    if i < total_grid_s:
        rho_GPU[i_time, i] = rho[i]
        Jx_GPU[i_time, i] = Jx[i]
        Jy_GPU[i_time, i] = Jy[i]
        Jz_GPU[i_time, i] = Jz[i]
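A usage sketch (the arrays are placeholders; any array-likes of shape [total_grid_s] accepted by cupy.asarray will do, and solver is the instance from the construction sketch above):

rho = np.zeros(solver.total_grid_s)   # flattened charge density on the source grid
Jx, Jy, Jz = (np.zeros(solver.total_grid_s) for _ in range(3))
solver.update_rho_J(rho, Jx, Jy, Jz)  # stores the data in row solver.i_time on the GPU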
signal_indicator()
The function determines whether the user-defined source region and observation region overlap, and returns the gap distance for the different non-overlap scenarios (0. when the regions overlap).
The Params:
the suffix o denotes the observation region and s the source region
dx_o, dy_o, dz_o, x_left_boundary_o, y_left_boundary_o, z_left_boundary_o
dx_s, dy_s, dz_s, x_left_boundary_s, y_left_boundary_s, z_left_boundary_s
x_grid_size_o, y_grid_size_o, z_grid_size_o
x_grid_size_s, y_grid_size_s, z_grid_size_s
def signal_indicator(dx_o, dy_o, dz_o, x_left_boundary_o, y_left_boundary_o, z_left_boundary_o, \
dx_s, dy_s, dz_s, x_left_boundary_s, y_left_boundary_s, z_left_boundary_s, \
x_grid_size_o, y_grid_size_o, z_grid_size_o, \
x_grid_size_s, y_grid_size_s, z_grid_size_s):
# find the left and right boundaries
x_right_boundary_o, y_right_boundary_o, z_right_boundary_o = x_left_boundary_o + dx_o*x_grid_size_o, \
y_left_boundary_o + dy_o*y_grid_size_o, \
z_left_boundary_o + dz_o*z_grid_size_o
x_right_boundary_s, y_right_boundary_s, z_right_boundary_s = x_left_boundary_s + dx_s*x_grid_size_s, \
                                                                  y_left_boundary_s + dy_s*y_grid_size_s, \
                                                                  z_left_boundary_s + dz_s*z_grid_size_s
xro,yro,zro,xlo,ylo,zlo,xrs,yrs,zrs,xls,yls,zls = x_right_boundary_o, y_right_boundary_o, z_right_boundary_o,\
x_left_boundary_o, y_left_boundary_o, z_left_boundary_o, \
x_right_boundary_s, y_right_boundary_s, z_right_boundary_s, \
x_left_boundary_s, y_left_boundary_s, z_left_boundary_s
    # there are four cases for the relative spatial positions of the two regions
    # we compare the boundary points of the observation region with the boundaries of the source region
ob = []
for i in (xlo,xro):
ii=step_func(i,xls,xrs)
for j in (ylo,yro):
jj=step_func(j,yls,yrs)
for k in (zlo,zro):
kk=step_func(k,zls,zrs)
ob.append([ii,jj,kk])
    # count how many entries equal 8 or -8, and where they are
    sob = np.sum(ob, 0)
    count, index, sign = count_eights_or_minus_eights(sob)
# the two regions overlap
if count == 0:
return 0.
# one direction exceeds
elif count == 1:
return sign_distance(sign[0], xlo,ylo,zlo, xrs,yrs,zrs, xls,yls,zls, xro,yro,zro, index[0])
# two directions exceed
elif count == 2:
r1 = sign_distance(sign[0], xlo,ylo,zlo, xrs,yrs,zrs, xls,yls,zls, xro,yro,zro, index[0])
r2 = sign_distance(sign[1], xlo,ylo,zlo, xrs,yrs,zrs, xls,yls,zls, xro,yro,zro, index[1])
return math.sqrt(r1**2+r2**2)
    elif count == 3:
r1 = sign_distance(sign[0], xlo,ylo,zlo, xrs,yrs,zrs, xls,yls,zls, xro,yro,zro, index[0])
r2 = sign_distance(sign[1], xlo,ylo,zlo, xrs,yrs,zrs, xls,yls,zls, xro,yro,zro, index[1])
r3 = sign_distance(sign[2], xlo,ylo,zlo, xrs,yrs,zrs, xls,yls,zls, xro,yro,zro, index[2])
return math.sqrt(r1**2+r2**2+r3**2)
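For instance, two identical boxes fully overlap and the function returns 0. (a sketch, assuming step_func(), count_eights_or_minus_eights() and sign_distance(), which is not shown in this document, are in scope):

# two identical 10x10x10 boxes covering [0, 1]^3: every corner test yields 0
d = signal_indicator(0.1, 0.1, 0.1, 0., 0., 0., \
                     0.1, 0.1, 0.1, 0., 0., 0., \
                     10, 10, 10, \
                     10, 10, 10)
# d == 0., i.e. the two regions overlap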
- step_func()
The array sob is built from the eight corner combinations of the observation-region boundaries (x_l, x_r; y_l, y_r; z_l, z_r), each compared against the source region with step_func(): a coordinate below the source region's left boundary contributes -1, one lying between the boundaries contributes 0, and one above the right boundary contributes +1. The eight resulting triples are then summed along each axis.
ob = []
for i in (xlo,xro):
ii=step_func(i,xls,xrs)
for j in (ylo,yro):
jj=step_func(j,yls,yrs)
for k in (zlo,zro):
kk=step_func(k,zls,zrs)
ob.append([ii,jj,kk])
def step_func(x, l, r):
'''compare x with l and r. return -1 or 0 or 1.'''
if x<l:
return -1
elif x>r:
return 1
else:
return 0
- count_eights_or_minus_eights()
Evaluates the summed array sob:
- The variable count indicates whether the source region and the observation region overlap: count = 0 means they overlap; count = 1 means they are separated along one direction; count = 2 means they are separated along two directions; count = 3 means they are separated along all three directions.
- The variable index records along which axes the two regions do not overlap: index = 0 for the x-axis, index = 1 for the y-axis, index = 2 for the z-axis.
- The variable sign records on which side the observation region lies along that axis: sign = -1 means the observation region is to the left of the source region; sign = 1 means it is to the right.
def count_eights_or_minus_eights(x):
count = 0
index = []
sign = []
for i in range(3):
if abs(x[i]) == 8:
count+=1
index.append(i)
sign.append(x[i]/8)
return count,index,sign
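For example, if the observation box clears the source box from above along x and from below along z but overlaps in y, the eight corner rows of ob sum to sob = [8, 0, -8]:

count, index, sign = count_eights_or_minus_eights([8, 0, -8])
# count == 2, index == [0, 2], sign == [1.0, -1.0]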
Jefimenko_solver()
The function calculates the electric and magnetic fields from Jefimenko's equations. It also supports a variable time step: the program can adjust dt automatically as the simulation proceeds, which widens the range of problems it can handle.
The Params:
rho, Jx, Jy, Jz: physical quantities of shape [total_grid_s]
current_dt: the time step used in the current update
quasi_neutral: if quasi_neutral == 1, the system only considers the contribution of the electric current
back_to_previous: whether to roll back to a previous distribution before updating
time_stride_back: if the distribution f(i) at time step i is illegal, replace it with f(i - time_stride_back)
def Jefimenko_solver(self, rho, Jx, Jy, Jz, current_dt, quasi_neutral, back_to_previous, time_stride_back):
self.dt_list.append(current_dt)
if back_to_previous == 0:
# i_time: int, index at which to copy the data
self.i_time = self.time_tick%self.len_time_snapshots # time_index_in_GPU(self.time_tick, self.len_time_snapshots)
# copy newly calculated rho and J to GPU
self.rho_GPU[self.i_time], self.Jx_GPU[self.i_time], self.Jy_GPU[self.i_time], self.Jz_GPU[self.i_time] = rho, Jx, Jy, Jz
self.dt_series[self.i_time] = current_dt
self.t = sum(self.dt_list)
        else:
            # i_time: int, index to roll back to
            self.i_time = (self.time_tick - time_stride_back)%self.len_time_snapshots
            self.time_tick = self.time_tick - time_stride_back
            # clear the rolled-back rows of rho and J on the GPU; an out-of-range
            # slice does not raise, so the wrap-around case is tested explicitly
            # '''Jun-Jie Zhang 2023/07/20'''
            if self.i_time + time_stride_back <= self.len_time_snapshots:
                self.rho_GPU[self.i_time+1 : self.i_time+time_stride_back] = 0
                self.Jx_GPU[self.i_time+1 : self.i_time+time_stride_back] = 0
                self.Jy_GPU[self.i_time+1 : self.i_time+time_stride_back] = 0
                self.Jz_GPU[self.i_time+1 : self.i_time+time_stride_back] = 0
            else:
                # the cleared window wraps past the end of the ring buffer
                self.rho_GPU[self.i_time+1 :] = 0
                self.Jx_GPU[self.i_time+1 :] = 0
                self.Jy_GPU[self.i_time+1 :] = 0
                self.Jz_GPU[self.i_time+1 :] = 0
                wrap = time_stride_back - (self.len_time_snapshots - (self.i_time+1))
                self.rho_GPU[:wrap] = 0
                self.Jx_GPU[:wrap] = 0
                self.Jy_GPU[:wrap] = 0
                self.Jz_GPU[:wrap] = 0
            self.dt_series[self.i_time:] = 0
            self.dt_series[self.i_time] = current_dt
            self.dt_list[-time_stride_back:] = [0] * time_stride_back
            self.t = sum(self.dt_list)
# save the sequences of dt from the current time step till the previous len_time_snapshots-th and the index
accumu_i_time = self.i_time
accumu_time_steps = current_dt
for i_reorder_time in range(self.len_time_snapshots):
self.accumu_dt_series[i_reorder_time] = accumu_time_steps
self.accumu_dt_index[i_reorder_time] = accumu_i_time
accumu_i_time -= 1
if accumu_i_time<0:
accumu_i_time += self.len_time_snapshots
accumu_time_steps += self.dt_series[accumu_i_time]
# define device array of E and B with zero initial values
GEx, GEy, GEz, GBx, GBy, GBz = (cupy.zeros(self.total_grid_o, dtype=np.float64) for _ in range(6))
if quasi_neutral == 1:
# calculate E and B
Jefimenko_kernel[self.blockspergrid_o, self.threadsperblock](self.rho_GPU, self.Jx_GPU, self.Jy_GPU, self.Jz_GPU, \
self.len_time_snapshots, current_dt, self.t, \
self.accumu_dt_index, self.accumu_dt_series, \
self.total_grid_o, self.total_grid_s, \
self.x_grid_size_o, self.y_grid_size_o, self.z_grid_size_o, \
self.x_grid_size_s, self.y_grid_size_s, self.z_grid_size_s, \
self.dx_o, self.dy_o, self.dz_o, \
self.dx_s, self.dy_s, self.dz_s, \
self.x_left_boundary_o, self.y_left_boundary_o, self.z_left_boundary_o, \
self.x_left_boundary_s, self.y_left_boundary_s, self.z_left_boundary_s, \
GEx, GEy, GEz, GBx, GBy, GBz, self.coeff_E, self.coeff_B, self.c)
        self.GEx, self.GEy, self.GEz, self.GBx, self.GBy, self.GBz = \
            GEx*self.coeff_E, GEy*self.coeff_E, GEz*self.coeff_E, \
            GBx*self.coeff_B, GBy*self.coeff_B, GBz*self.coeff_B
# very small values are forced to be zero
self.GEx, self.GEy, self.GEz, self.GBx, self.GBy, self.GBz = \
cupy.where(cupy.abs(self.GEx)<10**(-18),0.,self.GEx),\
cupy.where(cupy.abs(self.GEy)<10**(-18),0.,self.GEy),\
cupy.where(cupy.abs(self.GEz)<10**(-18),0.,self.GEz),\
cupy.where(cupy.abs(self.GBx)<10**(-18),0.,self.GBx),\
cupy.where(cupy.abs(self.GBy)<10**(-18),0.,self.GBy),\
cupy.where(cupy.abs(self.GBz)<10**(-18),0.,self.GBz)
# add one executed time step on time_tick
self.time_tick += 1
# save the passed time in accumulate time_seg
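A usage sketch for one update step (placeholder arrays; per the parameter list above, quasi_neutral=1 keeps only the electric-current contribution):

rho = cupy.zeros(solver.total_grid_s)        # flattened source-region densities
Jx, Jy, Jz = (cupy.zeros(solver.total_grid_s) for _ in range(3))
solver.Jefimenko_solver(rho, Jx, Jy, Jz,
                        current_dt=0.01,     # illustrative time step
                        quasi_neutral=1,
                        back_to_previous=0,  # ordinary forward update
                        time_stride_back=0)
Ex = cupy.asnumpy(solver.GEx)                # fields live on the GPU; copy back if needed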
- Jefimenko_kernel(): performs the parallel computation.
First, it maps each 1D thread index to the corresponding grid position in the observation region:
i_o = cuda.grid(1)
# loop through each spatial grid in the observation region
if i_o < total_grid_o:
# convert 1D grid index into 3D
iz_o = i_o%z_grid_size_o
iz_rest_o = i_o//z_grid_size_o
iy_o = iz_rest_o%y_grid_size_o
iy_rest_o = iz_rest_o//y_grid_size_o
ix_o = iy_rest_o%x_grid_size_o
# express the position corresponding to the index
x_o, y_o, z_o = real_position(ix_o, dx_o, x_left_boundary_o), \
real_position(iy_o, dy_o, y_left_boundary_o), \
real_position(iz_o, dz_o, z_left_boundary_o)
        # loop through all spatial grids in the source region for integration
for i_s in range(total_grid_s):
# 1D to 3D
iz_s = i_s%z_grid_size_s
iz_rest_s = i_s//z_grid_size_s
iy_s = iz_rest_s%y_grid_size_s
iy_rest_s = iz_rest_s//y_grid_size_s
ix_s = iy_rest_s%x_grid_size_s
# positions
x_s, y_s, z_s = real_position(ix_s, dx_s, x_left_boundary_s), \
real_position(iy_s, dy_s, y_left_boundary_s), \
real_position(iz_s, dz_s, z_left_boundary_s)
# dr = r - r' = r_o - r_s
dx = x_o - x_s
dy = y_o - y_s
dz = z_o - z_s
dr = r_distance(x_o, y_o, z_o, x_s, y_s, z_s)
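real_position() and r_distance() appear in the program structure but are not shown in this document; a minimal sketch of what they plausibly compute (the cell-edge convention in real_position() is an assumption):

@cuda.jit(device=True)
def real_position(i, d, left_boundary):
    # assumed convention: grid point i lies i spacings from the left boundary
    return left_boundary + i*d

@cuda.jit(device=True)
def r_distance(x_o, y_o, z_o, x_s, y_s, z_s):
    # Euclidean distance |r_o - r_s|
    return math.sqrt((x_o-x_s)**2 + (y_o-y_s)**2 + (z_o-z_s)**2)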
Next, it computes the retarded time and converts it to a ring-buffer index, so that the lookup never exceeds the number of stored snapshots; grid points the signal has not yet reached are skipped:
if dr > 10**-10:
            # find the retarded time for positions r_o
            # rtime_i gives the index of time in time_snapshots; it can be negative
rtime_i = retarded_time_index(t, dr, dt, i_round, len_time_snapshots)
# rtime_i_minus: index for retarded_time - dt
if rtime_i < 0:
                # no update since the signal has not arrived yet
pass
else:
if rtime_i == 0:
rtime_i_minus = len_time_snapshots-1
elif rtime_i > 0:
rtime_i_minus = rtime_i - 1
@cuda.jit(device=True)
def retarded_time_index(t, dr, dt, i_round, len_time_snapshots):
tr = t - dr
# if the signal has not been transmitted
if tr < 0.:
return -1
# the current time tr corresponds to the ti-th row
ti = int(math.floor(tr/dt)%len_time_snapshots)
# if ti corresponds to the last row
if ti == len_time_snapshots - 1:
return 0
else:
return ti + 1
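As a worked example (values illustrative, with c absorbed into FU so that tr = t - dr): for t = 5.0, dr = 1.0, dt = 0.5 and len_time_snapshots = 100, the retarded time is tr = 4.0 and floor(4.0/0.5) % 100 = 8, so the function returns ring-buffer row 9; had tr been negative, it would return -1 and the grid point would be skipped.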
Calculate the time derivatives of current density and charge density:
rho_s, rho_minus_s = rho_GPU_s[rtime_i, i_s], rho_GPU_s[rtime_i_minus, i_s]
Jx_s, Jx_minus_s = Jx_GPU_s[rtime_i, i_s], Jx_GPU_s[rtime_i_minus, i_s]
Jy_s, Jy_minus_s = Jy_GPU_s[rtime_i, i_s], Jy_GPU_s[rtime_i_minus, i_s]
Jz_s, Jz_minus_s = Jz_GPU_s[rtime_i, i_s], Jz_GPU_s[rtime_i_minus, i_s]
prho_pt_s = (rho_s - rho_minus_s)/dt
pJx_pt_s = (Jx_s - Jx_minus_s)/dt
pJy_pt_s = (Jy_s - Jy_minus_s)/dt
pJz_pt_s = (Jz_s - Jz_minus_s)/dt
The contributions of the charge and current densities are accumulated for each observation point, component by component, using CUDA atomic operations:
cuda.atomic.add(GEx, i_o, (x_o-x_s)/dr**3*rho_s+(x_o-x_s)/dr**2*prho_pt_s-1/dr*pJx_pt_s)
cuda.atomic.add(GEy, i_o, (y_o-y_s)/dr**3*rho_s+(y_o-y_s)/dr**2*prho_pt_s-1/dr*pJy_pt_s)
cuda.atomic.add(GEz, i_o, (z_o-z_s)/dr**3*rho_s+(z_o-z_s)/dr**2*prho_pt_s-1/dr*pJz_pt_s)
cuda.atomic.add(GBx, i_o, -(((y_o-y_s)*Jz_s-(z_o-z_s)*Jy_s)/dr**3+((y_o-y_s)*pJz_pt_s-(z_o-z_s)*pJy_pt_s)/dr**2))
cuda.atomic.add(GBy, i_o, -(((z_o-z_s)*Jx_s-(x_o-x_s)*Jz_s)/dr**3+((z_o-z_s)*pJx_pt_s-(x_o-x_s)*pJz_pt_s)/dr**2))
cuda.atomic.add(GBz, i_o, -(((x_o-x_s)*Jy_s-(y_o-y_s)*Jx_s)/dr**3+((x_o-x_s)*pJy_pt_s-(y_o-y_s)*pJx_pt_s)/dr**2))
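After the kernel finishes, the sums are multiplied by coeff_E and coeff_B (see __init__), so the accumulated quantities are the discretized Jefimenko integrals, with factors of c absorbed into the FU unit system and $\Delta\mathbf{r} = \mathbf{r}_o - \mathbf{r}_s$:

$$\mathbf{E}(\mathbf{r}_o,t)=\frac{1}{4\pi\epsilon_0}\sum_{s}\Delta V\left[\frac{\Delta\mathbf{r}}{|\Delta\mathbf{r}|^{3}}\,\rho+\frac{\Delta\mathbf{r}}{|\Delta\mathbf{r}|^{2}}\,\frac{\partial\rho}{\partial t}-\frac{1}{|\Delta\mathbf{r}|}\,\frac{\partial\mathbf{J}}{\partial t}\right]_{t_r}$$

$$\mathbf{B}(\mathbf{r}_o,t)=\frac{1}{4\pi\epsilon_0 c^{2}}\sum_{s}\Delta V\left[\frac{\Delta\mathbf{r}\times\mathbf{J}}{|\Delta\mathbf{r}|^{3}}+\frac{\Delta\mathbf{r}\times\partial\mathbf{J}/\partial t}{|\Delta\mathbf{r}|^{2}}\right]_{t_r}$$

where $\Delta V = dx_s\,dy_s\,dz_s$ and all source quantities are evaluated at the retarded time.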