Application of the Lagrangian Meshfree Approach to Modelling of Batch Crystallisation : Part II — An Efficient Solution of Integrated CFD and Population Balance Equations

The second article in the series presents the application of the Smoothed Particle Hydrodynamics (SPH) method to modelling of batch crystallisation in stirred tanks. A methodology to integrate the population balance equations (PBE) in parallel and independently from the Navier-Stokes equations is demonstrated. The benefits of the proposed methodology in terms of computational requirements, accuracy and availability of the crystal size distribution are discussed. The specific formulation of the SPH equations where the resulting system of ordinary differential equations is solved using the weighted contributions rather than numerically by solving a linear system of equations allows for massive parallelisation and a very loose coupling of the population balance and the fluid dynamics. It has been demonstrated, that the population balance equations can be solved on a Shared Memory Architecture (SMA) system using the OpenMP interface while the fluid dynamics equations being computed independently on a General Purpose Graphics Processing Unit (GPGPU) using the NVidia CUDA technology. This way, a significant portion of the computational overhead due to the large number of additional transport equations resulting from the discretisation of the population balance was removed: the SPH simulation coupled with 200 population balance equations was only 40% slower compared to SPH-only simulation. Two methods for the solution of population balance equations that preserve full crystal size distribution were implemented: discretised population balance (DPB) and method of characteristics (MOCH). The DPB equations are solved using the high-resolution finite-volume method with flux limiter and the effect of a large number of different flux limiters have been investigated. Both methods were validated using the case studies from the literature where an analytical solution can be derived. The developed models were applied to a numerical solution of coupled computational fluid dynamics and population balance equations to model a batch crystallization process. The effect of the hydrodynamics on the local temperature/supersaturation and the resulting crystal size Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 2 November 2016 doi:10.20944/preprints201611.0012.v1


Introduction
A large number of engineering fields such as chemical, petrochemical, pharmaceutical, microelectronics, aerosol formation, food processing, colloid chemistry, growth of cell populations and atmospheric physics involve a dynamic evolution of a distribution of particles.The performance of many industrial downstream processes is directly affected by particle characteristics and product qualities such as morphology, bulk density, average particle size, filterability and dry solid flow properties are directly related to the crystal size distribution (CSD).Therefore, the CSD is one of the most important process performance parameters.The dynamical behaviour of particles can be best described by population balance equations (PBE) (Ramkrishna, 2000).The population balance modelling framework provides a deterministic description of the dynamic evolution of the crystal size distribution by forming a balance to calculate the number of crystals in the crystalliser.Since all the above-mentioned processes are largely affected by hydrodynamics the population balance equations are often coupled with computational fluid dynamics (CFD) modelling to correctly predict the key process performance indicators.
The solution of the generic population balance equations usually requires computationally expensive, complex numerical solution techniques (Ramkrishna, 2000).These approaches can be categorised in five main classes: a) Standard method of moments (SMOM) (Randolph and Larson, 1971), b) Numerical non-linear model reduction approaches: method of characteristics (MOCH) (Hounslow and Reynolds, 2006), quadrature method of moments (QMOM) (McGraw, 1997;Marchisio et al., 2003), fixed quadrature method of moments (FQMOM) (Alopaeus, 2006), Jacobian matrix transformation (JMT) (McGraw and Wright, 2003) and direct quadrature method of moments (DQMOM) (Fan et al., 2004), c) Direct numerical solution approaches involving finite-element or finite-volume discretisation of the partial differential equation (discretised population balances, DPB) (LeVeque, 2002;Gunawan et al., 2004;Costa et al., 2007), d) various methods in the weighted residuals framework such as the least squares, orthogonal collocation and Galerkin methods (Singh and Ramkrishna, 1977;Sporleder et al., 2011;van den Bosch and Padmanabhan, 1974;Nigam and Nigam, 1980;Roussos et al., 2005), and e) Dynamic Monte Carlo simulation (DMC) (Haseltine and Rawlings, 2005;Rosner et al., 2003).The two most often used techniques are the standard method of moments and the quadrature method of moments.The method of moments is based on the idea of tracking only the moments of the crystal size distribution, rather than the entire distribution.In addition, there were some efforts to use combined methods to achieve a computationally efficient technique for the estimation of the crystal size distribution, such as combined QMOM and MOCH (Aamir et al., 2009).The common problem encountered with the method of moments is that the CSD is lost and needs to be reconstructed from its distribution moments.Many reconstruction techniques were developed however, no unified technique for reconstruction of a complete distribution from a finite number of moments is available in literature due to the fact that mathematically all the distribution moments up to infinity are required to achieve an accurate reconstruction (John et al., 2007).
The overall goal in this work is to develop a computationally efficient mathematical model of integrated CFD and population balance equations.The main idea is to use the SPH method for rapid prediction of the global mean flow in stirred reactors presented in Nikolic and Frawley (2015) and apply accurate and computationally efficient methods for the solution of the population balance equations to preserve the crystal size distribution.The focus in developing models was put on computational efficiency and accuracy of the CSD prediction.

Population Balance modelling
The mathematical description of the evolution of particle properties during a process can be written as (Ramkrishna, 2000): (1) where n(L,t) is the number density function, L is the internal property such as characteristic length, volume, mass etc., G(L,t) is growth rate and s(L,t,n) is the generation/depletion rate.
The term s(L,t,n) can include nucleation, breakage, agglomeration, aggregation, attrition and other phenomena that contribute to the change of the number density function.Often, these terms are defined as integral functions making the whole equation ( 1) an integro partial-differential equation.There is a large number of methods to solve the population balance equations numerically and in some case analytically, as already discussed in the previous section.In this work the methods for solution of population balance equations that are capable to preserve the full crystal size distribution are of a special interest.The most suitable in terms of implementation cost and computational requirements are the well known discretised population balance (DPB) and the method of characteristics (MOCH).
The methods are briefly described in the subsequent sections.

Discretised population balance method (DPB)
The most commonly used direct numerical solution approaches to population balance equations are finite-difference, finite-element and finite-volume discretisation techniques.
The equation ( 1) is hyperbolic in its nature and a typical problem is accurate tracking of sharp fronts in the particle distribution since the differential equation is not valid at discontinuities, which lead to numerical diffusion and non-physical oscillations (LeVeque, 2002).The high-resolution finite-volume methods (LeVeque, 2002) have been developed to address these problems.Finite volumes allow direct satisfaction of conservation laws while cell-centering leads to a natural situation where domain boundaries coincide with cell faces (Koren, 1993).
The DPB equations are obtained by discretising the domain L (particle size) into N cells/elements as given in Fig. 1.Applying the cell-centered finite volume scheme on the one-dimensional population balance equation we get (Koren, 1993): (2) where the first integral term represents the volume-average number density in every cell: ( 3) Since the discretisation of the source term should be consistent with the discretisation of the advection term to avoid numerical problems (Koren, 1993) the new source term integral S is introduced: (4) Replacing the new terms in the integral form in equation ( 2) and integrating in every cell i, we get: (5) where the term (Gn-S) denotes an extended advective flux while the half-integer indices i-1/2 and i+1/2 refer to the cell faces between cell centres Li-1 -Li, and Li -Li+1, respectively; consequently, the terms (Gn)i-1/2 and (Gn)i +1/2 denote the cell-face fluxes of the element i at its left and right edge, respectively.The accuracy of the numerical scheme is determined by the way in which the extended advective fluxes are calculated.Assuming that the flow is in positive L-direction: and applying the van Leer piecewise polynomial κ-interpolation (van Leer, 1985) we get: (6) The parameter κ is used to control the accuracy of the scheme: a) for κ = -1 we get the fully one-sided upwind scheme of the second order, b) for κ = 1 we get the standard second order scheme, and c) for other values in the range [-1, 1] we have a weighted blend between the central and fully one-side upwind schemes (Koren, 1993).In the case where κ = 1/3 the scheme becomes third-order accurate for steady linear problems (Koren, 1993).To suppress the possible occurrences of non-physical oscillations and possible occurrences of negative values the flux limiter function are introduced (Koren, 1993).Assuming κ=1/3 the highresolution upwind scheme for cell-fluxes can be now derived as: where φ is a flux limiter function and r represents the upwind ratio of two consecutive solution gradients (Koren, 1993): (8) There are a large number of flux limiter functions and a list of the most frequently used ones is given in Table 1.The limiter functions must satisfy certain conditions such as they must lay inside or at the boundary of the Sweby's monotonicity domain (Sweby, 1984).
Limiter functions used in this work in relation to 2 nd order total variation diminishing (TVD) region are given in Fig. 2. Since the piecewise polynomial interpolation cannot be applied around and at boundaries (Koren, 1993) the equation ( 6) must be modified accordingly: a) the flux at the inflow boundary L1/2 is known exactly through the boundary condition, b) at L3/2 only the central interpolation with κ = 1 can be consistently applied, and c) at the outflow boundary Ln+1/2 only the fully one-sided upwind scheme κ = -1 can be applied.The full set of equations can be found in Koren (1993).

Method of characteristics (MOCH)
The generic partial differential equation (1) can be reduced to a system of ordinary differential equations by finding the characteristic curves in the L -t plane (Aamir et al., 2009).The L -t plane is given in a parametric form by L = L(z) and t = t(z) where the parameter z gives the measure of the distance along the characteristic curve.For the partial differential equation (1), the characteristic curves are given parametrically: (9) such that the following system of ODEs is satisfied: with the following initial conditions: (13) ( 14) The equation (1) must be linear or quasi-linear, that is the growth rate can be sizedependent but must not depend on any derivative.Since the number density function is now a function of L(z) and t(z), applying the chain rule: (16) we get: (17) Comparing equations ( 16) and ( 18) it can shown that (Aamir et al., 2009): (18) and consequently, the characteristic equations can be described by the following system of equations: To obtain the dynamic evolution of the crystal size distribution the system of equations (19)(20) has to be integrated for different initial values L0 and n(L0,0), accompanied with growth and generation/depletion rates.Typically, they are a function of supersaturation and can be obtained from the mass balance for the solute using the moment transformation of equation (1).Aamir et al. (2009) used the combined quadrature method of moments with method of characteristics, however, only the change of the total mass of crystals is required to form the solute mass balance.For the size-independent growth-only process the system (19-20) simplifies to: Phenomena that cause the change of the distribution like primary/secondary nucleation, breakage, agglomeration etc. need a special treatment.Regarding the primary nucleation, the method requires that a new characteristic line is added every time step.However, in this work it is assumed that crystals in the characteristic line with the smallest crystal size L=L0 do not grow -only the newly born crystals are accumulated.When the size of crystals in the second characteristic line increases for the specified ΔL value (typically equal to the bin size in the CSD) the size of the crystals in the first characteristic line is set to L0+ΔL and the new characteristic line inserted at the beginning with the size L0.From that point in time, the crystals in the characteristic line with size L0+ΔL are allowed to grow according to the equations (19)(20).This way, the computational and memory requirements are kept low.
Secondary nucleation, breakage and agglomeration require a mathematical procedure to maintain internal consistency with regard to the desired moments of the distribution (Kumar andRamkrishna, 1996a, 1996b).In this work, the fixed pivot technique proposed by Kumar and Ramkrishna (1996a) is selected and implemented.The pivoting technique is used to numerically preserve two properties of the distribution: the total number and mass of crystals.In addition, the identical procedure is applied at the end of the simulation to assemble the final crystal size distribution from local distributions available in every particle.

Integrated Smoothed Particle Hydrodynamics and Population Balance model
Two different algorithms for coupled SPH and population balance have been developed in this work: a) both SPH and population balance equations integrated on GPU, b) SPH equations integrated on GPU while population balance equations integrated on CPUs using the OpenMP interface.The following basic assumptions have been made: the solid phase is assumed to closely follow the flow field that is the Stokes number is sufficiently low, the presence of the solid phase does not affect the flow field, the growth rate is independent of crystal size, and the perfect micromixing is assumed to exist in each fluid particle.
The first algorithm is an extension of the SPH model (Nikolic and Frawley, 2015) by using additional transport equations.Three scalar transport equations were added: the heat conduction (between fluid particles and between the fluid particles and the reactor wall), the mass diffusion (concentration of dissolved crystals) and a set of population balance equations (either DPB or MOCH).The heat conduction and mass diffusion equations were adopted from the work of Monaghan et al. (2005): where λi is thermal conductivity and Di is mass diffusion coefficient.Similar to the pressure gradient and shear forces, the above equations conserve the heat and mass exactly and ensure that the flux will be continuous even when the conductivity is discontinuous (at the phase interfaces; here, at the fluid-wall interface).The SPH model of a stirred tank is implemented in c++ using Fluidix software (MacDonald, 2014) and identical to that described in Nikolic and Frawley (2015).The simulation procedure consists of the initialization phase and the main SPH loop.First, triangular surface meshes are generated for stirred tank walls, baffles and a stirrer shaft with impellers.These meshes can be directly loaded into the Fluidix software.Then, uniformly distributed fluid particles of the specified size are generated inside the whole volume of the tank and the particles that are positioned inside the baffles and stirrer meshes are removed.An excess number of particles from the top of the reactor, up to the prescribed volume, is also removed.Now, "wall" particles are created at all mesh vertices.The SPH main loop is implemented in a standard fashion.First, a density of all particles is estimated and internal forces evaluated such as the pressure gradient, shear forces, gravity and the surface tension.Stirrer mesh and its wall particles are rotated and the forces between impeller and fluid particles are calculated.The system is now integrated in time using the second order Verlet scheme and particle-mesh collisions processed to update particle positions.This procedure is repeated until the defined time horizon is reached.
The second algorithm is more complex.The computational procedure is divided into three parts: 1) the master thread -a controlling thread that manages the calls to CUDA kernels and controls a team of OpenMP threads, 2) the team of (Nthreads-1) threads used to integrate population balance in parallel, 3) GPU used to calculate particle interactions and integrate Navier-Stokes, heat and mass balance equations.The solution algorithm is presented in Fig. 3. First, the system is initialized in the master thread in an identical fashion as in the first algorithm.Next, the master thread repeatedly calls CUDA kernels to calculate the forces and gradients in Navier-Stokes, heat and mass balance equations and to estimate the time step.All other threads are idle.When the master thread finishes the work all threads are joined at the OpenMP barrier 1. Next, the solute concentration in particles is copied from the main memory to GPU and the temperature and the concentration gradient in particles copied from GPU to the main memory by the master thread.All other threads are idle.
When the master thread is done with the memory synchronisation all threads are joined at the OpenMP barrier 2. Now, (Nthreads-1) population balance threads are forked to integrate the population balance equations while the master thread continues with integration of Navier-Stokes, heat and mass balance equations on the GPU.The integration of population balance proceeds to the OpenMP barrier 1 where the threads join the master thread.Finally, after the integration stage, all threads check whether the time horizon has been reached and exit if it has.If the time for integration of population balance equations is lower then combined time to calculate the forces and gradients and integrate the Navier-Stokes, heat and mass balance equations the total simulation time should be equal to the SPH-only simulation and the combined algorithm should be faster than the initial GPU-only algorithm.However, this is not the case in reality since some time is always spent on thread scheduling, memory copy between the GPU and the main memory and waiting at barriers.
Therefore, the algorithm is always somewhat slower then the SPH only simulation.The benchmarks and a detailed discussion on computational requirements is given in section 4.4.

High-resolution schemes
In this section the simulation results obtained by using the high-resolution scheme for solution of DPB equations were compared to the analytical solutions for different cases using the numerical examples from Qamar et al. (2006).A range of different flux limiter functions given in Table 1 have been used in all tests.All models in this section were developed in Python using DAE Tools software (Nikolic, 2014).The quality of prediction of different flux limiters was assessed by calculating the p-norms in the L p space: (25 where nHR is the number density function from the high-resolution scheme and nAnalytical is the number density function from the analytical solution.Two error norms have been used:

Size-independent growth I
The aim of this example is to illustrate the importance of high resolution schemes and to point out the numerical problems present in the hyperbolic partial differential equations with sharp discontinuities in the solution.Although most size distributions in real processes are smooth, sharp fronts occur very often during the primary nucleation, especially in antisolvent crystallisations, due to high supersaturation values when a burst in the number density function of nearly zero-sized crystals occurs.Here, the growth-only crystallisation process was considered with the constant growth rate of 1μm/s and the following initial number density function: (28) The crystal size distribution in the size range of [0, 100]μm discretised into 100 elements was used.The analytical solution in this case is equal to the initial profile translated right in time by a distance Gt (the growth rate multiplied by the time elapsed in the process): (29 This problem is a good numerical benchmark because it is a combination of a sharp step function in the distribution and a very high growth rate.If the method works under these extreme conditions one can expect very good numerical results in the real problems without sharp fronts and with lower growth rates (Qamar et al., 2006).The simulations were performed using the I-order upwind, II-order central, and high-resolution schemes with different flux limiters and the results compared to the analytical solution.The number density functions for the I-order upwind and II-order central are shown in Fig. 4 and the best three high-resolution schemes (a-c) and the worst one (d) are presented in Fig. 5. L 1 and L 2 norms for high-resolution schemes are given in Table 2.As expected, the I-order upwind scheme produces the solution with a very high numerical diffusion while the IIorder central scheme produces non-physical oscillation in the presence of sharp fronts.The high-resolution schemes suppressed the numerical diffusion, non-physical oscillations and negative values in the solution to a different degree.In this case, the superbee, Sweby and Koren flux limiters showed the best prediction compared to the analytical solution and the superbee outperformed all the rest in terms of both L 1 and L 2 error tests.

Size-independent growth II
The first region is a square step with a sharp discontinuity, the second region is a cosinesquared wave with the discontinuities at each side, the third region is a semi-ellipse with a combination of sharp and gradual changes in gradient, and the last region represents a narrow Gaussian distribution (σ = 0.778ΔL) with a very sharp peak.The results are compared to the analytical solution: the best three high-resolution schemes (a-c) and the worst one (d) are presented in Fig. 6.L 1 and L 2 norms for high-resolution schemes are given in Table 3. Again, the high-resolution schemes suppressed the numerical diffusion, non-physical oscillations and negative values in the solution to a different degree.Overall, the superbee, smart and Koren flux limiters showed the best prediction compared to the analytical solution and the superbee again produced the best prediction in both error tests.
The results in the first three regions are fairly good for all flux limiters; however, a certain degree of numerical diffusion is still present in all of them.In the most challenging fourth region we can clearly observe the limitations of discretisation schemes.The very sharp peaks commonly occur at the beginning of the distribution during an intense nucleation.
The solution to this phenomena would be a very fine grid for the smallest sizes in the distribution or some sort of an adaptive grid.However, both approaches demand a much higher computational power to successfully resolve this kind of discontinuities.

Size-dependent growth
In this example the performance of high resolution schemes for a batch process with the size-dependent growth rate was analysed.Again, the same growth-only crystallisation process was considered with the growth rate given by: and the crystal size distribution in the size range of [0, 1]μm discretised into 100 elements.
The analytical solution in this case and is given by: (32 where G0 is a constant (0.1μm/s), Lmean is the distribution mean size (0.01μm) and N0 is the total number of particles in the distribution (100).The initial number density function is given by: (33) The results are compared to the analytical solution: the best three high-resolution schemes (a-c) and the worst one (d) are presented in Fig. 7 and 8 (the zoomed region).L 1 and L 2 norms for high-resolution schemes are given in Table 4.All high-resolution schemes performed approximately equally well with very small differences.Overall, van Albada 2, ospre and van Albada 1 flux limiters showed somewhat better prediction compared to the analytical solution.

Stiff nucleation at a negligible size
In this problem, the performance of high resolution schemes for a batch process with stiff nucleation and constant growth rate is analysed.It was assumed that the nucleation takes place at the minimum size L0 = 0 with the nucleation rate given by: (34) The growth rate is 1μm/s and the crystal size distribution is given in the size range of [0, 2]μm discretised into 200 elements.The analytical solution for this case is: (35) The initial number density function is given by: (36) The first region in the solution is a narrow wave originating from the nucleation while the second region is a square step with a sharp discontinuity.The results are compared to the analytical solution: the best three high-resolution schemes (a-c) and the worst one (d) are presented in Fig. 9. L 1 and L 2 norms for high-resolution schemes are given in Table 5.
Similarly to the case study in the section 4.1.2,discretisation schemes have problems with the very sharp peaks in the size distribution.Again, all high-resolution schemes performed approximately equally well with a significant degree of numerical diffusion present in all schemes.Overall, smart, HQUICK and HCUS flux limiters showed somewhat better prediction than the rest.
Based on the numerical problems in sections 4.1.1 to 4.1.4an overall conclusion is that superbee and Koren flux limiter showed better results in most of the cases considered and that all schemes produce a significant numerical diffusion when sharp peak-like fronts are present in the distribution.Therefore, in a general case, selection of a suitable flux limiter function is case-dependent and some experimenting is necessary.

Growth-only cooling crystallisation (hydrodynamics effects)
In this case study, the effect of hydrodynamics in a poorly mixed reactor were analysed.
The growth only cooling crystallization problem has been used, since the analytical solution can be easily obtained while the effect of the other phenomena is absent.The results from four different cases were compared: a) ideal mixing assumed and crystallization modelled using the discretised population balance (IDMIX+DPB), b) ideal mixing assumed and crystallization modelled using the method of characteristics (IDMIX+MOCH), c) SPH used for fluid dynamics coupled with the discretised population balance model, applied to every fluid particle (SPH+DPB), and d) SPH coupled with the method of characteristics, again applied to every fluid particle (SPH+MOCH).A low rpm of the stirrer and short impeller blades were used to create poor mixing conditions.The temperature profile in both ideal mixing cases was set to the calculated average temperature profile of the SPH cases to create the identical cooling conditions.The domain was discretized using 200 elements.The process studied was crystallisation of paracetamol from ethanol solution in the 1 lit Labmax reactor using the kinetic data from Mitchell et al. (2011a).The initial supersaturation ratio was set to 1.5667, the initial temperature to 20 0 C, the rotational speed to 150 rpm and the radius of the stirrer to 2cm (4 blade paddle turbine used).The reactor is then cooled using the reactor jacket.It was assumed that the wall temperature was constant and equal to 0 0 C. The stirred tank wall, baffles and stirrer surface meshes are presented in the Fig. 10.The stirred tank filled with fluid particles of the radius 1.5mm is given in Fig. 11.The ideal mixing models coupled with DPB/MOCH methods were developed in Python using DAE Tools software (Nikolic, 2014).The model equations are identical to the model equations presented in (Mitchell et al., 2011a).The only difference is that the method of moments equations were replaced by the DPB or MOCH equations and that the temperature profile in the ideal mixing case was set to the calculated average temperature of the SPH cases.
Simulations were executed on NVidia GeForce 780 Ti graphics card with 2880 cores and 3GB of RAM and Dell Precision 7500 workstation running Debian GNU/Linux operating system.The temperature of the fluid particles in the reactor's axial cross-section at t = 300 s is given in Fig. 12.The calculated average temperature in the reactor for SPH+DPB and SPH+MOCH cases is presented in Fig. 13.The performance of the developed models was analysed using the seed with sharp discontinuities in its CSD as shown in Fig. 14  (yellow) and have approximately uniform growth rate.Next, the mostly static particles along the wall in the zone 2 (blue) have a high cooling rate and a faster growth and form the front end of the CSD.Finally, the particles that belong to the poorly mixed zone 3 (red) away from the walls, have a low cooling rate and therefore a lower growth rate and form the CSD tail.

Cooling crystallisation with primary nucleation (SPH+DPB vs SPH+MOCH)
In this case study, the capability of SPH+DPB and SPH+MOCH methods to handle sharp fronts/peaks in the distribution occurring during the primary nucleation was studied.The process was cooling crystallization starting with primary nucleation followed by growth.
All runtime parameters and kinetics data are identical to those in the previous problem (section 4.2).The equation for the primary nucleation kinetics is adopted from the work of Mitchell et al. (2011b): (37) with kn = 1.597 x 10 10 #/(min x m 3 ) n and n = 2.276.The initial CSD is assumed to be empty and the models were again simulated for 300s.The simulation results are shown in Fig. 16.
As it was previously discussed in the section 4.1 the DPB method is not able to accurately track sharp peak-like discontinuities in the solution and a significant degree of numerical diffusion is present in those cases.On the other hand, the MOCH method produces the results identical to the analytical solution.

Benchmarking and computational requirements
In this case study, the computational efficiency of four different methods for the solution of algorithm, analyse the data in terms of the computational requirements, detect the bottlenecks and propose the most efficient method from the computational perspective.The results are presented in Tables 6 to 9, while a graphical comparison for 300 bins is given in The general conclusion is that MOCH method is computationally more efficient.In particular, if we compare the total step durations we can observe that SPH+MOCH+OpenMP method is superior to the others reducing the computational overhead caused by population balance equations to only ~40%, compared to SPH only.
Although, the parallel algorithm does not bring many benefits for a low number of bins/characteristic lines, the difference gets more pronounced for higher numbers.In addition, the population balance model did not include any other phenomena such as secondary nucleation, breakage, agglomeration etc. typically described by kernels given in an integral form.These integrals typically require a rather expensive numerical evaluation and significantly contribute to the total simulation time.Looking at the duration of PB integration relative to the total time we can see that it is very short in SPH+MOCH+OpenMP case and the additional integral evaluation calculations will not cause a slowdown.Also, a good indicator of whether an additional processing can be performed is a time spent by the master thread at the OpenMP barrier 1.A short waiting time (i.e. less than 1 ms) is an indicator that the time for integration of population balance equations is shorter than the processing and integration of the Navier-Stokes forces and gradients.The results also show that a certain overhead must always be present.The data to be fetched from the main memory).For instance, the overhead due to the time spent on barriers and memory synchronisation in the case with 200 bins is significant: 21% in SPH+DPB+OpenMP and 17% in SPH+MOCH+OpenMP case.The other contributing factor is longer calculation of forces and longer integration of Navier-Stokes equations since the data structure in memory holding the particle information is larger and an additional time is required to fetch larger chunks of memory.
The advantage of application of the SPH method to modelling of the integrated fluid dynamics and crystallisation process is that a very large number of bins or characteristic lines can be used.For instance, the SPH+MOCH+OpenMP simulation with 500 characteristic lines is only 18% slower compared to 200 characteristic lines.Obviously, there is a lot of room for further optimisations: use of more processors, multiple GPUs, and merging the memory copy routine with the population balance integration stage.The above-mentioned optimisations will be a part of the future work.

Conclusions
The integrated SPH and population balance model has been developed based on the fluid dynamics model presented in Nikolic and Frawley (2015).Two existing methods for solution of population balance equations have been implemented: a) discretised population balance method solved using the high-resolution finite volume schemes with flux limiter, and b) method of characteristics.Both methods have successfully been applied to numerical benchmarking problems available in the literature where the analytical solutions are available.The algorithm to integrate the population balance equations in parallel and independently from the Navier-Stokes equations has been developed.It has been shown that the population balance equations can be solved using the OpenMP interface while the fluid dynamics equations being computed independently on a GPGPU using the NVidia CUDA technology.The benefits of the proposed methodology were pointed out such as significantly lower computational requirements and availability of the crystal size distribution.This way, a significant portion of the computational overhead due to a large number of additional transport equations resulting from the discretisation of the population balance can be removed: the total overhead was reduced to only 40% for 200 additional equations, compared to CFD-only simulation.
The developed models were applied to a numerical solution of coupled computational fluid dynamics and population balance equations to capture the effect of the hydrodynamics on the local temperature/supersaturation and the resulting crystal size distribution.The method   Table 1.Flux limiters

Preprints
(black dotted line).Since the process is growth-only the analytical solution for the ideal-mixing case is simply the initial CSD translated right by a cumulative distance ΣGi•Δti (a sum of the growth rate multiplied by the time elapsed for every time step).The simulation results for all cases at time t = 300s are presented in Fig.14and the zoomed areas in Fig.15.The results for the ideal mixing cases is shown in Fig.14a.It can be observed that the scheme suppressed oscillations in the solution and decreased the amount of numerical diffusion, although not completely.However, the ideal mixing models do not take into the account the non-idealities in the reactor with poor mixing since they do not capture the local hydrodynamics conditions.From the Fig.12we can observe three zones in the reactor: zone 1 (yellow) around the impellers with the relatively good mixing, zone 2 (blue) along the reactor wall with poor mixing and good heat transfer, and zone 3 (red) in the upper part of the reactor with poor mixing and low heat transfer rate.Overall CSDs in the reactor for the SPH+DPB and SPH+MOCH cases is obtained by combining the CSD from individual particles and presented in Fig.14b.Obviously, the non-idealities in the reactor resulted in some fluid particles being well mixed while some other are located in poorly mixed zones (i.e.trapped in the zones around baffles or located in poorly mixed upper part of the volume) and contribute to the dispersion of the CSD.Similarly to the temperature profile in the reactor, we can observe three zones in the CSD plot in Fig.14b.First, the particles that are located in the middle part of the CSD in the Fig.14belong to the well mixed zone 1 coupled SPH and population balance equations were compared: a) both SPH and DPB equations integrated on GPU (SPH+DPB), b) both SPH and MOCH equations integrated on GPU (SPH+MOCH), c) SPH equations integrated on GPU and DPB equations integrated on CPUs using the OpenMP interface (SPH+DPB+OpenMP), and d) SPH equations integrated on GPU and MOCH equations integrated on CPUs using the OpenMP interface (SPH+MOCH+OpenMP).The identical reactor and input parameters as previously discussed in sections 4.2 and 4.3 were used again.Simulations were executed using NVidia GeForce 780 Ti graphics card with 2880 cores and 3GB of RAM and Dell Precision 7500 dual-cpu workstation using 24 OpenMP threads.The total number of particles was 37038 and the particle radius 1.5mm.The simulation runs were repeated for different number of bins in CSD (200 to 1000) for all four cases.The aim was to profile the simulation Preprints (www.preprints.org)| NOT PEER-REVIEWED | Posted: 2 November 2016 doi:10.20944/preprints201611.0012.v1

Fig. 17 .
Fig. 17.The data represent durations in milliseconds of individual stages in the algorithm for a single time step during the simulation.Three main stages were considered: calculation of forces for Navier-Stokes equations and gradients for heat/mass transfers (stage: N-S forces), integration of Navier-Stokes equations and heat/mass balances (stage: N-S integration), integration of population balance equations (stage: PB integration).OpenMP methods contain two additional stages: waiting on OpenMP barriers (stages: OpenMP barrier 1 and 2) and data synchronisation between GPU and main memory (stage: Memory copy).All results are averaged over approximately two thousand time steps.
reasons for this are: a) thread management in general (i.e.time spent by threading library scheduling chunks of work on each thread), b) waiting at the barriers (i.e.7% of the total step time in SPH+DPB+OpenMP and 4% in SPH+MOCH+OpenMP case for 200 bins), c) memory copy (i.e.14% of the total step time in SPH+DPB+OpenMP and 12% in SPH+MOCH+OpenMP case for 200 bins), d) one of the threads lags behind the rest causing the whole team of threads to wait for the slowest one (i.e. a thread is waiting for the Preprints (www.preprints.org)| NOT PEER-REVIEWED | Posted: 2 November 2016 doi:10.20944/preprints201611.0012.v1

Figure 12 .
Figure 12.Temperature of the particles in the axial cross section of the reactor for the growth-only crystallisation at t = 300s (SPH+DPB and SPH+MOCH cases)

Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 2 November 2016 doi:10.20944/preprints201611.0012.v1 of
characteristics has been selected as computationally more efficient and more accurate for problems where sharp or peak-like gradients are present in the distribution.This research has been conducted as part of the Synthesis and Solid State Pharmaceutical Centre (SSPC) and funded by Science Foundation Ireland (SFI).

Table 2 .
Performance of flux limiters for the size-independent growth I case (L 2 sorted)

Table 3 .
Performance of flux limiters for the size-independent growth II case (L 2 sorted)

Table 4 .
Performance of flux limiters for the size-dependent growth case (L 2 sorted)

Table 5 .
Performance of flux limiters for the stiff nucleation at negligible size case (L 2 sorted) Preprints (www.