In this paper we show how to efficiently implement parallel discrete simulations on Multi-Core and GPU architectures through a real example of application: a cellular automata model of laser dynamics. We describe the techniques employed to build and optimize the implementations using OpenMP and CUDA frameworks. We have evaluated the performance on two different hardware platforms that represent different target market segments: high-end platforms for scientific computing, using an Intel Xeon Platinum 8259CL server with48cores and also an NVIDIA Tesla6V100 GPU, both running on Amazon Web Server (AWS) Cloud, and on a consumer-oriented platform, using an Intel Core i9 9900k CPU and an NVIDIA GeForce GTX 1050 TI GPU. Performance results are compared and analysed in detail. We show that excellent performance and scalability can be obtained in both platforms, and we extract some important issues that imply a performance degradation for them. We also found that current Multi-Core CPUs with large core numbers can bring a performance very near to that of GPUs, even similar in some cases.