<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Tue, Mar 13, 2018 at 5:39 PM, Ard Biesheuvel <span dir="ltr"><<a href="mailto:ard.biesheuvel@linaro.org" target="_blank">ard.biesheuvel@linaro.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 13 March 2018 at 15:31, Frederik Lotter<br>
<div><div class="gmail-h5"><<a href="mailto:frederik.lotter@netronome.com">frederik.lotter@netronome.com</a><wbr>> wrote:<br>
> On Tue, Mar 13, 2018 at 3:45 PM, Ard Biesheuvel <<a href="mailto:ard.biesheuvel@linaro.org">ard.biesheuvel@linaro.org</a>><br>
> wrote:<br>
>><br>
>> On 13 March 2018 at 13:44, Frederik Lotter<br>
>> <<a href="mailto:frederik.lotter@netronome.com">frederik.lotter@netronome.com</a><wbr>> wrote:<br>
>> > On Tue, Mar 13, 2018 at 2:55 PM, Ard Biesheuvel<br>
>> > <<a href="mailto:ard.biesheuvel@linaro.org">ard.biesheuvel@linaro.org</a>><br>
>> > wrote:<br>
>> >><br>
>> >> On 13 March 2018 at 12:48, Frederik Lotter<br>
>> >> <<a href="mailto:frederik.lotter@netronome.com">frederik.lotter@netronome.com</a><wbr>> wrote:<br>
>> >> > On Tue, Mar 13, 2018 at 2:36 PM, Ard Biesheuvel<br>
>> >> > <<a href="mailto:ard.biesheuvel@linaro.org">ard.biesheuvel@linaro.org</a>><br>
>> >> > wrote:<br>
>> >> >><br>
>> >> >> On 13 March 2018 at 12:26, Frederik Lotter<br>
>> >> >> <<a href="mailto:frederik.lotter@netronome.com">frederik.lotter@netronome.com</a><wbr>> wrote:<br>
>> >> >> > Hi Ard,<br>
>> >> >> ><br>
>> >> >> > On Mon, Mar 12, 2018 at 6:22 PM, Ard Biesheuvel<br>
>> >> >> > <<a href="mailto:ard.biesheuvel@linaro.org">ard.biesheuvel@linaro.org</a>><br>
>> >> >> > wrote:<br>
>> >> >> >><br>
>> >> >> >> On 12 March 2018 at 16:15, Frederik Lotter<br>
>> >> >> >> <<a href="mailto:frederik.lotter@netronome.com">frederik.lotter@netronome.com</a><wbr>> wrote:<br>
>> >> >> >> > Hi,<br>
>> >> >> >> > I am getting CPU stall warnings when booting up using the EFI<br>
>> >> >> >> > route.<br>
>> >> >> >> > I<br>
>> >> >> >> > suspect the PCIe interface, as the stall warning sometimes<br>
>> >> >> >> > contains<br>
>> >> >> >> > the<br>
>> >> >> >> > probe<br>
>> >> >> >> > function. Other times is seems to get further than PCIe init,<br>
>> >> >> >> > but<br>
>> >> >> >> > still<br>
>> >> >> >> > stall interrupt handling.<br>
>> >> >> >> > Here are some facts around my observation:<br>
>> >> >> >> ><br>
>> >> >> >> > I have two sdcards for my MACCHIATObin board. They have<br>
>> >> >> >> > identical<br>
>> >> >> >> > kernels<br>
>> >> >> >> > (4.16 rc5) with Ubuntu 16.04 rootfs. The one sdcard uses a<br>
>> >> >> >> > uboot,<br>
>> >> >> >> > DT<br>
>> >> >> >> > and<br>
>> >> >> >> > kernel boot. The second sdcard has EDKII, grub kernel boot. The<br>
>> >> >> >> > EDKII<br>
>> >> >> >> > build<br>
>> >> >> >> > includes the device tree DTB (and DTS which I believe is<br>
>> >> >> >> > unused)<br>
>> >> >> >> > from<br>
>> >> >> >> > the<br>
>> >> >> >> > one used on the uboot sdcard.<br>
>> >> >> >> ><br>
>> >> >> >> > EFI stub: Booting Linux Kernel...<br>
>> >> >> >> > EFI stub: Using DTB from configuration table<br>
>> >> >> >> > EFI stub: Exiting boot services and installing virtual address<br>
>> >> >> >> > map...<br>
>> >> >> >> > [ 0.000000] Booting Linux on physical CPU 0x0000000000<br>
>> >> >> >> > [0x410fd081]<br>
>> >> >> >> > [ 0.000000] Linux version 4.16.0-rc5-mbcin-netronome-2-<wbr>dirty<br>
>> >> >> >> > (root@mcb1-cpt) (gcc version 5.4.0 20160609 (Ubuntu/Linaro<br>
>> >> >> >> > 5.4.0-6ubuntu1~16.04.9)) #2 SMP PREEMPT Mon Mar 12 14:40:25 UTC<br>
>> >> >> >> > 2018<br>
>> >> >> >> > [ 0.000000] Machine model: Marvell 8040 MACHIATOBin<br>
>> >> >> >> > [ 0.000000] efi: Getting EFI parameters from FDT:<br>
>> >> >> >> > [ 0.000000] efi: EFI v2.70 by EDK II<br>
>> >> >> >> > [ 0.000000] efi: SMBIOS 3.0=0xbfd00000 ACPI 2.0=0xb6760000<br>
>> >> >> >> > MEMATTR=0xb8973418 RNG=0xbffdbf98<br>
>> >> >> >> > [ 0.000000] random: fast init done<br>
>> >> >> >> > [ 0.000000] efi: seeding entropy pool<br>
>> >> >> >> > :<br>
>> >> >> >> ><br>
>> >> >> >> > (I am using the latest EDKII master, the Marvell<br>
>> >> >> >> > edk2-open-platform<br>
>> >> >> >> > 17.10<br>
>> >> >> >> > branch, with all the latest mv-ddr, ATF, etc.).<br>
>> >> >> >> ><br>
>> >> >> >> > The DT data appears in the EFI boot, but the PCIe<br>
>> >> >> >> > interface<br>
>> >> >> >> > fails,<br>
>> >> >> >> > and<br>
>> >> >> >> > results (I believe) in the CPU stall warnings:<br>
>> >> >> >> ><br>
>> >> >> >> > [ 717.453025] INFO: rcu_preempt self-detected stall on CPU<br>
>> >> >> >> > :<br>
>> >> >> >> > :<br>
>> >> >> >> > [ 717.589783] armada8k_pcie_probe+0x140/<wbr>0x240<br>
>> >> >> >> > :<br>
>> >> >> >> ><br>
>> >> >> >> > Other times, the pcie gets further:<br>
>> >> >> >> ><br>
>> >> >> >> > [ 3.312127] PCI: OF: host bridge /cp0/pcie@f2600000 ranges:<br>
>> >> >> >> > [ 3.317740] PCI: OF: IO 0xf9000000..0xf900ffff -><br>
>> >> >> >> > 0xf9000000<br>
>> >> >> >> > [ 3.323692] PCI: OF: MEM 0xc0000000..0xdfffffff -><br>
>> >> >> >> > 0xc0000000<br>
>> >> >> >> > [ 3.328915] random: crng init done<br>
>> >> >> >> > [ 4.326158] armada8k-pcie f2600000.pcie: phy link never came<br>
>> >> >> >> > up<br>
>> >> >> >> > [ 4.332109] armada8k-pcie f2600000.pcie: Link not up after<br>
>> >> >> >> > reconfiguration<br>
>> >> >> >> > [ 4.339056] armada8k-pcie f2600000.pcie: PCI host bridge to<br>
>> >> >> >> > bus<br>
>> >> >> >> > 0000:00<br>
>> >> >> >><br>
>> >> >> >><br>
>> >> >> >> To be brutally honest, the armada8k-pcie driver is a piece of<br>
>> >> >> >> junk,<br>
>> >> >> >> and you're much better off using the generic ECAM driver, which<br>
>> >> >> >> now<br>
>> >> >> >> includes special handling for the missing root port on Synopsys<br>
>> >> >> >> IP.<br>
>> >> >> >><br>
>> >> >> >> It also allows you to have both MMIO32 and MMIO64 regions, which<br>
>> >> >> >> can<br>
>> >> >> >> be useful with some PCIe cards with large BARs<br>
>> >> >> >><br>
>> >> >> >> Could you try<br>
>> >> >> >><br>
>> >> >> >> compatible = "marvell,armada8k-pcie-ecam";<br>
>> >> >> >><br>
>> >> >> >> in the DT node, please?<br>
>> >> >> >><br>
>> >> >> >> (Before you do that, please check whether UEFI recognizes your<br>
>> >> >> >> PCI<br>
>> >> >> >> hardware using the 'pci' command in the shell)<br>
>> >> >> ><br>
>> >> >> ><br>
>> >> >> > This exercise helped a lot. Thank you for the proposal.<br>
>> >> >> ><br>
>> >> >> > So now I can consistently boot using uboot and efi.<br>
>> >> >> ><br>
>> >> >> > However, the pcie driver init fails. I have provided boot logs and<br>
>> >> >> > also<br>
>> >> >> > my<br>
>> >> >> > DT entry - we need custom BAR ranges, and I am not sure if this<br>
>> >> >> > driver<br>
>> >> >> > understands everything.<br>
>> >> >> ><br>
>> >> >> > cp0_pcie0: pcie@f2600000 {<br>
>> >> >> > compatible = "marvell,armada8k-pcie-ecam", "snps,dw-pcie";<br>
>> >> >> > reg = <0 0xf2600000 0 0x10000>,<br>
>> >> >> > <0 ((0xf6000000 + (0 * 0x1000000)) + 0xf00000) 0 0x80000>;<br>
>> >> >> > reg-names = "ctrl", "config";<br>
>> >> >> > #address-cells = <3>;<br>
>> >> >> > #size-cells = <2>;<br>
>> >> >> > #interrupt-cells = <1>;<br>
>> >> >> > device_type = "pci";<br>
>> >> >> > dma-coherent;<br>
>> >> >> > msi-parent = <&gic_v2m0>;<br>
>> >> >> ><br>
>> >> >> > bus-range = <0 0xff>;<br>
>> >> >> > ranges =<br>
>> >> >> ><br>
>> >> >> > <0x81000000 0 (0xf9000000 + (0 * 0x10000)) 0 (0xf9000000 + (0 *<br>
>> >> >> > 0x10000))<br>
>> >> >> > 0 0x10000<br>
>> >> >> ><br>
>> >> >> > 0x82000000 0 (0xf6000000 + (0 * 0x1000000)) 0 (0xf6000000 + (0 *<br>
>> >> >> > 0x1000000)) 0 0xf00000>;<br>
>> >> >> > interrupt-map-mask = <0 0 0 0>;<br>
>> >> >> > interrupt-map = <0 0 0 0 &cp0_icu 0x0 22 4>;<br>
>> >> >> > interrupts = <0x0 22 4>;<br>
>> >> >> > num-lanes = <1>;<br>
>> >> >> > clocks = <&cp0_clk 1 13>;<br>
>> >> >> > status = "disabled";<br>
>> >> >> > };<br>
>> >> >> ><br>
>> >> >> ><br>
>> >> >> > Error:<br>
>> >> >> ><br>
>> >> >> > [ 1.396968] PCI: OF: host bridge /cp0/pcie@f2600000 ranges:<br>
>> >> >> > [ 1.396979] PCI: OF: IO 0xf9000000..0xf900ffff -> 0xf9000000<br>
>> >> >> > [ 1.396984] PCI: OF: MEM 0xc0000000..0xdfffffff -> 0xc0000000<br>
>> >> >> > [ 1.396998] pci-host-generic f2600000.pcie: ECAM area [mem<br>
>> >> >> > 0xf2600000-0xf260ffff] can only accommodate [bus<br>
>> >> >> > 00-ffffffffffffffff]<br>
>> >> >> > (reduced from [bus 00-ff] desired)<br>
>> >> >> > [ 1.397002] pci-host-generic f2600000.pcie: ECAM ioremap failed<br>
>> >> >> > [ 1.397011] pci-host-generic: probe of f2600000.pcie failed<br>
>> >> >> > with<br>
>> >> >> > error<br>
>> >> >> > -12<br>
>> >> >> ><br>
>> >> >> ><br>
>> >> >> > Thanks for the support.<br>
>> >> >> ><br>
>> >> >><br>
>> >> >> Please try the following config<br>
>> >> >><br>
>> >> >> cp0_pcie0: pcie@e0000000 {<br>
>> >> >> compatible = "marvell,armada8k-pcie-ecam", "snps,dw-pcie";<br>
>> >> >> reg = <0 0xe0000000 0 0xff00000>;<br>
>> >> >> #address-cells = <3>;<br>
>> >> >> #size-cells = <2>;<br>
>> >> >> #interrupt-cells = <1>;<br>
>> >> >> device_type = "pci";<br>
>> >> >> dma-coherent;<br>
>> >> >> msi-parent = <&gic_v2m0>;<br>
>> >> >><br>
>> >> >> bus-range = <0 0xfe>;<br>
>> >> >> ranges = <0x1000000 0x0 0x00000000 0x0 0xeff00000 0x0<br>
>> >> >> 0x00010000>,<br>
>> >> >> <0x2000000 0x0 0xc0000000 0x0 0xc0000000 0x0<br>
>> >> >> 0x20000000>,<br>
>> >> >> <0x3000000 0x8 0x00000000 0x8 0x00000000 0x1<br>
>> >> >> 0x00000000>;<br>
>> >> >><br>
>> >> >> interrupt-map-mask = <0 0 0 0>;<br>
>> >> >> interrupt-map = <0 0 0 0 &cp0_icu 0x0 22 4>;<br>
>> >> >> };<br>
>> >> ><br>
>> >> ><br>
>> >> > I am trying it now.<br>
>> >> ><br>
>> >> > Could you just give me some insight on how the peripheral base<br>
>> >> > address<br>
>> >> > can<br>
>> >> >> > simply be modified like that?<br>
>> >> ><br>
>> >> > Is there a mapping change somewhere?<br>
>> >> ><br>
>> >><br>
>> >> All those addresses are configurable, and the default armada8k-pcie<br>
>> >> driver sets up all the translation windows from scratch (in a rather<br>
>> >> limited way, mind you)<br>
>> >><br>
>> >> The armada8k-pcie-ecam driver just reuses the configuration set by the<br>
>> >> firmware, allowing for a larger bus range and an additional 4 GB<br>
>> >> window for 64-bit MMIO<br>
>> ><br>
>> ><br>
>> > The new DTS extract:<br>
>> ><br>
>> > cp0_pcie0: pcie@0xe0000000 {<br>
>> > compatible = "marvell,armada8k-pcie-ecam", "snps,dw-pcie";<br>
>> > reg = <0 0xe0000000 0 0x10000>;<br>
>> ><br>
>> > #address-cells = <3>;<br>
>> > #size-cells = <2>;<br>
>> > #interrupt-cells = <1>;<br>
>> > device_type = "pci";<br>
>> > dma-coherent;<br>
>> > msi-parent = <&gic_v2m0>;<br>
>> ><br>
>> > bus-range = <0 0xfe>;<br>
>> > ranges = <0x1000000 0x0 0x00000000 0x0 0xeff00000 0x0 0x00010000>,<br>
>> > <0x2000000 0x0 0xc0000000 0x0 0xc0000000 0x0 0x20000000>,<br>
>> > <0x3000000 0x8 0x00000000 0x8 0x00000000 0x1 0x00000000>;<br>
>> ><br>
>> > interrupt-map-mask = <0 0 0 0>;<br>
>> > interrupt-map = <0 0 0 0 &cp0_icu 0x0 22 4>;<br>
>> > };<br>
>> ><br>
>> > The result:<br>
>> ><br>
>> > [ 1.463594] PCI: OF: host bridge /cp0/pcie@0xe0000000 ranges:<br>
>> > [ 1.463608] PCI: OF: IO 0xeff00000..0xeff0ffff -> 0x00000000<br>
>> > [ 1.463616] PCI: OF: MEM 0xc0000000..0xdfffffff -> 0xc0000000<br>
>> > [ 1.463622] PCI: OF: MEM 0x800000000..0x8ffffffff -> 0x800000000<br>
>> > [ 1.463638] pci-host-generic e0000000.pcie: ECAM area [mem<br>
>> > 0xe0000000-0xe000ffff] can only accommodate [bus 00-ffffffffffffffff]<br>
>> > (reduced from [bus 00-fe] desired)<br>
>> > [ 1.463646] pci-host-generic e0000000.pcie: ECAM ioremap failed<br>
>> > [ 1.463657] pci-host-generic: probe of e0000000.pcie failed with<br>
>> > error<br>
>> > -12<br>
>> ><br>
>> ><br>
>><br>
>> Please use the size I suggested for the 'reg' property<br>
><br>
><br>
> Sorry I missed that:<br>
><br>
> [ 1.463413] PCI: OF: host bridge /cp0/pcie@0xe0000000 ranges:<br>
> [ 1.463427] PCI: OF: IO 0xeff00000..0xeff0ffff -> 0x00000000<br>
> [ 1.463435] PCI: OF: MEM 0xc0000000..0xdfffffff -> 0xc0000000<br>
> [ 1.463442] PCI: OF: MEM 0x800000000..0x8ffffffff -> 0x800000000<br>
> [ 1.463481] pci-host-generic e0000000.pcie: ECAM at [mem<br>
> 0xe0000000-0xefefffff] for [bus 00-fe]<br>
> [ 1.463525] pci-host-generic e0000000.pcie: PCI host bridge to bus<br>
> 0000:00<br>
> [ 1.463531] pci_bus 0000:00: root bus resource [bus 00-fe]<br>
> [ 1.463536] pci_bus 0000:00: root bus resource [io 0x0000-0xffff]<br>
> [ 1.463541] pci_bus 0000:00: root bus resource [mem<br>
> 0xc0000000-0xdfffffff]<br>
> [ 1.463547] pci_bus 0000:00: root bus resource [mem<br>
> 0x800000000-0x8ffffffff]<br>
><br>
><br>
> So I assume this works, and I am super grateful. I will test it tomorrow<br>
> with our Smart NIC.<br>
><br>
<br>
</div></div>This doesn't tell you all that much, to be honest. But at least the<br>
numbers look sane now, and appear to match the UEFI configuration.<br></blockquote><div><br></div><div>Perhaps I was too optimistic too quickly.</div><div><br></div><div><div>root@localhost:~# lspci</div><div>root@localhost:~# lspci -v</div><div>root@localhost:~# echo 1 > /sys/bus/pci/</div><div>devices/ drivers_probe slots/</div><div>drivers/ rescan uevent</div><div>drivers_autoprobe resource_alignment</div><div>root@localhost:~# echo 1 > /sys/bus/pci/rescan</div><div><br></div><div>[ 176.977408] INFO: rcu_preempt detected stalls on CPUs/tasks:</div><div>[ 176.983100] 1-...0: (1 GPs behind) idle=242/1/4611686018427387904 softirq=3435/3435 fqs=22</div><div>[ 176.991572] (detected by 2, t=5375 jiffies, g=1169, c=1168, q=60)</div></div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<span class="gmail-"><br>
> However, we are building a product that obviously requires long term<br>
> maintenance, so may I please get your input on a strategy with this?<br>
><br>
> If we decide to stick with this driver, would it be easy for things to<br>
> become disjointed?<br>
><br>
> The hope with going the EFI route is that we could boot "generic" Ubuntu and<br>
> CentOS installs, so I guess as long as we keep the DT and the EDKII snapshot<br>
> in sync on our side, the risk is low.<br>
><br>
<br>
</span>I'm afraid you are getting caught in the middle of a philosophical<br>
debate here: many engineers that are involved with the Marvell support<br>
in Linux feel that a device tree is not something that should be<br>
supported long term, and needs to be bundled with the OS. Over the<br>
last couple of kernel releases, the Marvell 8040 support was changed<br>
in a non-backward compatible manner numerous times.<br>
<br>
This conflicts badly with the idea that the firmware provides the<br>
hardware description (using DT or ACPI), and that the contract with<br>
the OS is kept by both sides for longer than a single release.<br>
<br>
So I cannot really answer that question, unfortunately. If you don't<br>
intend to use the onboard network controller, you could go the ACPI<br>
route, I guess.<br>
<br>
Another problem is that none of this UEFI/ACPI support is upstream in<br>
the Tianocore project, and trying random trees left and right doesn't<br>
really help when assessing whether a platform is suitable as a long<br>
term investment.<br>
<div><div class="gmail-h5"><br>
<br>
<br>
> For example, using the same DT with uboot, it fails:<br>
><br>
> [ 0.294942] sysfs: cannot create duplicate filename<br>
> '/bus/platform/devices/<wbr>e0000000.pcie'<br>
> [ 0.294950] CPU: 2 PID: 1 Comm: swapper/0 Not tainted<br>
> 4.16.0-rc5-mbcin-netronome-2-<wbr>dirty #2<br>
> [ 0.294952] Hardware name: Marvell 8040 MACHIATOBin (DT)<br>
> [ 0.294955] Call trace:<br>
> [ 0.294967] dump_backtrace+0x0/0x150<br>
> [ 0.294970] show_stack+0x14/0x20<br>
> [ 0.294976] dump_stack+0x98/0xbc<br>
> [ 0.294980] sysfs_warn_dup+0x60/0x78<br>
> [ 0.294983] sysfs_do_create_link_sd.isra.<wbr>0+0xd8/0xe0<br>
> [ 0.294986] sysfs_create_link+0x20/0x40<br>
> [ 0.294990] bus_add_device+0x88/0x148<br>
> [ 0.294993] device_add+0x394/0x568<br>
> [ 0.294997] of_device_add+0x5c/0x70<br>
> [ 0.295000] of_platform_device_create_<wbr>pdata+0x80/0xd0<br>
> [ 0.295003] of_platform_bus_create+0xdc/<wbr>0x300<br>
> [ 0.295006] of_platform_bus_create+0x11c/<wbr>0x300<br>
> [ 0.295008] of_platform_populate+0x4c/0xb0<br>
> [ 0.295014] of_platform_default_populate_<wbr>init+0xa4/0xc0<br>
> [ 0.295017] do_one_initcall+0x38/0x120<br>
> [ 0.295020] kernel_init_freeable+0x134/<wbr>0x1d4<br>
> [ 0.295025] kernel_init+0x10/0x100<br>
> [ 0.295028] ret_from_fork+0x10/0x18<br>
><br>
> So I think this confirms that the pcie setup is different between EDKII and<br>
> uboot (unless I am doing something stupid here).<br>
><br>
<br>
</div></div>It looks like you have two copies of the pcie node here, no?<br>
</blockquote></div><br></div></div>
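<div dir="ltr"><div>As a sanity check on the 'reg' size discussed in the quoted thread, here is a rough sketch of the arithmetic. It assumes the standard ECAM layout of 1 MiB of configuration space per bus (32 devices x 8 functions x 4 KiB); the helper names are made up for illustration and are not taken from the kernel.</div>
<pre>
```python
# Assumption: standard ECAM layout, 4 KiB of config space per function,
# 8 functions per device, 32 devices per bus, i.e. 1 MiB per bus.
ECAM_BYTES_PER_BUS = 32 * 8 * 4096


def ecam_bus_capacity(reg_size):
    """Number of whole buses that fit in an ECAM window of reg_size bytes."""
    return reg_size // ECAM_BYTES_PER_BUS


def ecam_size_for_bus_range(first_bus, last_bus):
    """ECAM window size in bytes needed to cover an inclusive bus range."""
    return (last_bus - first_bus + 1) * ECAM_BYTES_PER_BUS


if __name__ == "__main__":
    # The 64 KiB 'reg' from the failing attempts holds no complete bus.
    print(ecam_bus_capacity(0x10000))
    # Size needed for bus-range = 0 .. 0xfe.
    print(hex(ecam_size_for_bus_range(0, 0xFE)))
    # Last byte of a window of that size starting at 0xe0000000.
    print(hex(0xE0000000 + ecam_size_for_bus_range(0, 0xFE) - 1))
```
</pre>
<div>Under that assumption, a 0x10000 window covers zero buses, which would be consistent with the odd "[bus 00-ffffffffffffffff]" range and the ioremap failure in the earlier logs. Buses 0x00 to 0xfe need 0xff00000 bytes, which matches the suggested 'reg' size and the "ECAM at [mem 0xe0000000-0xefefffff]" line in the last log.</div></div>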